fix(bitcoin): knots catalog default must equal top-level version

The knots versions[] marked 29.3.knots20260508 as default while the top-level catalog version is the floating 'latest' tag — violating the generator's own invariant (default:true MUST equal the top-level version so selecting it un-pins / tracks latest). Live effect via package.versions: catalog_default_version='latest' so the UI-highlighted default actually PINS+recreates (opposite of un-pin) and 'latest' was unreachable from the Version & Updates card. Add a 'latest' default entry (== the manifest's floating tag) and keep 29.3.knots20260508 as a pinnable option. Verified on .228: package.versions now returns default=latest with 2 selectable versions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge bitcoin-multi-version: multi-version support for Core & Knots
2026-06-28 19:56:49 -04:00 · 2026-06-28 18:48:38 -04:00 · 2026-06-28 18:46:17 -04:00 · 2026-06-28 16:09:05 -04:00 · 2026-06-28 15:09:34 -04:00 · 2026-06-28 14:04:41 -04:00
147 changed files with 13199 additions and 10031 deletions
--- a/.githooks/pre-push
+++ b/.githooks/pre-push
@ -2,7 +2,7 @@
 # Keep the served companion APK in sync with main on every push.
 #
 # When a push to main includes Android changes, rebuild the APK, refresh
-# neode-ui/public/packages/archipelago-companion.apk.zip, commit it, and ask
+# neode-ui/public/packages/archipelago-companion.apk, commit it, and ask
 # you to push again (so the refreshed APK rides along in the same push).
 #
 # Enable once per clone:  git config core.hooksPath .githooks
@ -40,7 +40,7 @@ fi

 bash scripts/publish-companion-apk.sh || exit 0

-DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
+DEST="neode-ui/public/packages/archipelago-companion.apk"
 if git diff --cached --quiet -- "$DEST"; then
  exit 0   # APK unchanged — nothing to do
 fi
--- a/Android/COMPANION_RELEASE.md
+++ b/Android/COMPANION_RELEASE.md
@ -0,0 +1,94 @@
+# Companion App — Build, Ship & "App Not Installed" Runbook
+
+Canonical procedure for releasing the Archipelago Companion Android app and for
+debugging install failures. Read this before touching the companion release flow.
+Hard lessons from 2026-06-26 are baked in below — don't relearn them.
+
+## Ship the companion (the only sanctioned way)
+
+```bash
+./Android/ship-companion.sh
+```
+
+This calls `scripts/publish-companion-apk.sh` (the single source of truth, also
+used by the `.githooks/pre-push` hook), which:
+
+1. **Removes/rejects resource dirs whose names contain spaces.** Empty stray
+   `mipmap-* NNN` dirs (left by icon-export tools) break a *clean* build with
+   `Invalid resource directory name`. Incremental builds hide them — clean builds
+   don't.
+2. **Always does a CLEAN build** (`:app:clean :app:assembleDebug`).
+3. **Forces v1 + v2 + v3 signing** via `zipalign` + `apksigner`.
+4. **Verifies all three schemes** (`apksigner verify --min-sdk-version 21`) and
+   **aborts** if any is missing.
+5. Stages the signed APK at `neode-ui/public/packages/archipelago-companion.apk`,
+   commits, and pushes with `SHIP_COMPANION=1` (the sanctioned pre-push bypass).
+
+**Never** hand-roll `gradlew assembleDebug` + `cp` to the served path. That path
+skips the clean build and the signature enforcement and is exactly how a broken
+APK shipped.
+
+### Bump the version first
+Edit `Android/app/build.gradle.kts` — `versionCode` (must strictly increase) and
+`versionName`. The committed value can drift AHEAD of what's actually built into
+the served APK, so verify the served APK's real version after shipping:
+`aapt2 dump badging neode-ui/public/packages/archipelago-companion.apk | grep version`.
+
+## Signing facts (important)
+
+- Debug builds are signed with the **committed** `Android/app/debug.keystore`
+  (store/key pass `android`, alias `androiddebugkey`) so every machine and the
+  served download share ONE signing key. Cert SHA-256: `D6:22:E0:7E:…:66:4D`.
+- **AGP silently ignores `enableV1Signing = true` for `minSdk ≥ 24`**, so a plain
+  gradle build produces a **v2-only** APK. The `apksigner` step in the publish
+  script is what actually guarantees v1+v2+v3 — do not remove it.
+- **Changing the signing key forces every existing install to be uninstalled
+  once.** Android blocks in-place upgrades across different signatures. Treat the
+  keystore as permanent; never regenerate it casually.
+
+## Debugging "App Not Installed" — DIAGNOSE FIRST
+
+Do **not** theorize about signing schemes / OEM quirks. Get the real reason:
+
+```bash
+adb install ~/Desktop/archipelago-companion-<ver>.apk
+# -> Failure [INSTALL_FAILED_<REASON>: ...]
+```
+
+Map the reason:
+
+| `INSTALL_FAILED_*` | Cause | Fix |
+|---|---|---|
+| `UPDATE_INCOMPATIBLE … signatures do not match` | Old install signed with a **different key** (e.g. pre-shared-keystore per-machine key `58:31:12…`). | Uninstall the old package, then install. **One-time** per device after a key change. |
+| `INVALID_APK` / parse error | Corrupt/incomplete download or bad signing. | Re-download; re-run the publish script. |
+| `INSUFFICIENT_STORAGE` | Storage. | Free space. |
+| `OLDER_SDK` | Device below `minSdk` (26 = Android 8.0). | Unsupported device. |
+
+> A manual uninstall on the phone may NOT clear `UPDATE_INCOMPATIBLE` if the
+> package is registered under another user/profile — `pm path <pkg>` under user 0
+> can show nothing while the conflict persists. `adb uninstall <pkg>` clears it
+> across all users.
+
+## Phone / adb safety (non-negotiable)
+
+When acting on the user's physical phone, be surgical — the user once had all
+home-screen app layouts wiped by an over-broad action.
+
+- Default to **read-only** adb (`devices`, `getprop`, `pm path/list`, `dumpsys`).
+- Mutations (`adb install`, `adb uninstall com.archipelago.app.debug`) only with
+  explicit go-ahead and **scoped to our exact package** — echo it first.
+- **Never** run launcher/system resets: no `pm clear` on launchers, no
+  `reset-permissions`, no factory wipe, no uninstalling apps you didn't build.
+
+## Verify the published download after shipping
+
+The download served to nodes is Gitea raw-on-main. Confirm the live bytes match
+what you built and signed:
+
+```bash
+SERVED=neode-ui/public/packages/archipelago-companion.apk
+URL=http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/$SERVED
+curl -sS -o /tmp/live.apk "$URL"
+shasum -a 256 "$SERVED" /tmp/live.apk          # must match
+apksigner verify -v --min-sdk-version 21 /tmp/live.apk | grep -i "scheme"  # v1/v2/v3 = true
+```
--- a/Android/app/build.gradle.kts
+++ b/Android/app/build.gradle.kts
@ -11,8 +11,8 @@ android {
        applicationId = "com.archipelago.app"
        minSdk = 26
        targetSdk = 35
-        versionCode = 11
-        versionName = "0.4.7"
+        versionCode = 16
+        versionName = "0.4.12"

        vectorDrawables {
            useSupportLibrary = true
--- a/Android/app/src/main/java/com/archipelago/app/data/ServerPreferences.kt
+++ b/Android/app/src/main/java/com/archipelago/app/data/ServerPreferences.kt
@ -112,6 +112,37 @@ class ServerPreferences(private val context: Context) {
        }
    }

+    /**
+     * Replace a saved server in place. Matches the existing entry by connection
+     * identity (address/port/scheme) so edits that change the name or password —
+     * or that touch a legacy 4-field entry — still update the right record. If the
+     * edited server is also the active one, the active record is kept in sync.
+     */
+    suspend fun updateSavedServer(original: ServerEntry, updated: ServerEntry) {
+        context.dataStore.edit { prefs ->
+            val current = prefs[savedServersKey] ?: emptySet()
+            val filtered = current.filterNot { raw ->
+                val e = ServerEntry.deserialize(raw)
+                e != null &&
+                    e.address == original.address &&
+                    e.port == original.port &&
+                    e.useHttps == original.useHttps
+            }.toSet()
+            prefs[savedServersKey] = filtered + updated.serialize()
+
+            val isActive = prefs[activeAddressKey] == original.address &&
+                (prefs[activePortKey] ?: "") == original.port &&
+                (prefs[activeHttpsKey] ?: false) == original.useHttps
+            if (isActive) {
+                prefs[activeAddressKey] = updated.address
+                prefs[activeHttpsKey] = updated.useHttps
+                prefs[activePortKey] = updated.port
+                prefs[activePasswordKey] = updated.password
+                prefs[activeNameKey] = updated.name
+            }
+        }
+    }
+
    suspend fun removeSavedServer(server: ServerEntry) {
        context.dataStore.edit { prefs ->
            val current = prefs[savedServersKey] ?: emptySet()
--- a/Android/app/src/main/java/com/archipelago/app/ui/components/NESMenu.kt
+++ b/Android/app/src/main/java/com/archipelago/app/ui/components/NESMenu.kt
@ -75,6 +75,7 @@ fun NESMenu(
    onDismiss: () -> Unit,
    onSelectServer: (ServerEntry) -> Unit,
    onAddServer: (ServerEntry) -> Unit,
+    onEditServer: (ServerEntry, ServerEntry) -> Unit,
    onRemoveServer: (ServerEntry) -> Unit,
    onToggleMode: () -> Unit,
    onToggleStyle: () -> Unit,
@ -87,7 +88,7 @@ fun NESMenu(
            contentAlignment = Alignment.Center,
        ) {
            AnimatedVisibility(visible = visible, enter = fadeIn() + scaleIn(initialScale = 0.95f), exit = fadeOut() + scaleOut(targetScale = 0.95f)) {
-                MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
+                MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onEditServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
            }
        }
    }
@ -102,21 +103,39 @@ private fun MenuPanel(
    onDismiss: () -> Unit,
    onSelectServer: (ServerEntry) -> Unit,
    onAddServer: (ServerEntry) -> Unit,
+    onEditServer: (ServerEntry, ServerEntry) -> Unit,
    onRemoveServer: (ServerEntry) -> Unit,
    onToggleMode: () -> Unit,
    onToggleStyle: () -> Unit,
    onBackToWebView: (() -> Unit)?,
 ) {
    var showAdd by remember { mutableStateOf(false) }
+    // The saved server being edited, or null when adding a new one.
+    var editing by remember { mutableStateOf<ServerEntry?>(null) }
    var nm by remember { mutableStateOf("") }
    var addr by remember { mutableStateOf("") }
    var pwd by remember { mutableStateOf("") }

+    fun resetForm() {
+        nm = ""; addr = ""; pwd = ""; showAdd = false; editing = null
+    }
+
+    fun startEdit(server: ServerEntry) {
+        editing = server
+        nm = server.name; addr = server.address; pwd = server.password
+        showAdd = false
+    }
+
    fun submit() {
-        if (addr.isNotBlank()) {
+        if (addr.isBlank()) return
+        val orig = editing
+        if (orig != null) {
+            // Preserve fields the compact form doesn't expose (scheme, port).
+            onEditServer(orig, orig.copy(address = addr, password = pwd, name = nm))
+        } else {
            onAddServer(ServerEntry(addr, false, password = pwd, name = nm))
-            nm = ""; addr = ""; pwd = ""; showAdd = false
        }
+        resetForm()
    }

    Column(
@ -149,6 +168,7 @@ private fun MenuPanel(
                label = server.displayName(),
                selected = active,
                onClick = { onSelectServer(server) },
+                onEdit = { startEdit(server) },
                onRemove = { onRemoveServer(server) },
            )
        }
@ -157,8 +177,8 @@ private fun MenuPanel(
            Text("No servers", color = TextMuted, fontSize = 14.sp, modifier = Modifier.padding(vertical = 4.dp))
        }

-        // Add server
-        if (showAdd) {
+        // Add / edit server
+        if (showAdd || editing != null) {
            Column(
                Modifier
                    .fillMaxWidth()
@ -168,6 +188,25 @@ private fun MenuPanel(
                    .padding(12.dp),
                verticalArrangement = Arrangement.spacedBy(8.dp),
            ) {
+                Row(
+                    Modifier.fillMaxWidth(),
+                    verticalAlignment = Alignment.CenterVertically,
+                    horizontalArrangement = Arrangement.SpaceBetween,
+                ) {
+                    Text(
+                        if (editing != null) "Edit Server" else "Add Server",
+                        color = TextMuted,
+                        fontSize = 13.sp,
+                        letterSpacing = 1.sp,
+                        fontWeight = FontWeight.Medium,
+                    )
+                    Text(
+                        "Cancel",
+                        color = TextMuted,
+                        fontSize = 13.sp,
+                        modifier = Modifier.clickable { resetForm() }.padding(start = 8.dp),
+                    )
+                }
                GlassField(
                    value = nm, onValueChange = { nm = it },
                    placeholder = "Name (optional)",
@ -228,6 +267,7 @@ private fun MenuItem(
    selected: Boolean = false,
    labelColor: Color = TextPrimary,
    onClick: () -> Unit,
+    onEdit: (() -> Unit)? = null,
    onRemove: (() -> Unit)? = null,
 ) {
    Row(
@ -247,7 +287,16 @@ private fun MenuItem(
            color = if (selected) BitcoinOrange else labelColor,
            fontSize = 16.sp,
            fontWeight = FontWeight.Medium,
+            modifier = Modifier.weight(1f),
        )
+        if (onEdit != null) {
+            Text(
+                "✎",
+                color = TextMuted,
+                fontSize = 16.sp,
+                modifier = Modifier.clickable { onEdit() }.padding(horizontal = 8.dp),
+            )
+        }
        if (onRemove != null) {
            Text(
                "✕",
--- a/Android/app/src/main/java/com/archipelago/app/ui/screens/RemoteInputScreen.kt
+++ b/Android/app/src/main/java/com/archipelago/app/ui/screens/RemoteInputScreen.kt
@ -216,6 +216,17 @@ fun RemoteInputScreen(onBack: () -> Unit) {
            onAddServer = { server ->
                scope.launch { prefs.addSavedServer(server); if (activeServer == null) prefs.setActiveServer(server) }
            },
+            onEditServer = { original, updated ->
+                scope.launch {
+                    prefs.updateSavedServer(original, updated)
+                    // If the edited server is the live one, reconnect with the new
+                    // address/credentials so the change takes effect immediately.
+                    if (original.serialize() == activeServer?.serialize()) {
+                        ws.disconnect()
+                        prefs.setActiveServer(updated)
+                    }
+                }
+            },
            onRemoveServer = { server ->
                scope.launch {
                    prefs.removeSavedServer(server)
--- a/Android/app/src/main/java/com/archipelago/app/ui/screens/ServerConnectScreen.kt
+++ b/Android/app/src/main/java/com/archipelago/app/ui/screens/ServerConnectScreen.kt
@ -30,6 +30,7 @@ import androidx.compose.material.icons.filled.VisibilityOff
 import androidx.compose.foundation.verticalScroll
 import androidx.compose.material.icons.Icons
 import androidx.compose.material.icons.filled.Close
+import androidx.compose.material.icons.filled.Edit
 import androidx.compose.material.icons.filled.Lock
 import androidx.compose.material.icons.filled.LockOpen
 import androidx.compose.material3.CircularProgressIndicator
@ -106,9 +107,50 @@ fun ServerConnectScreen(
    var useHttps by remember { mutableStateOf(false) }
    var isConnecting by remember { mutableStateOf(false) }
    var errorMessage by remember { mutableStateOf<String?>(null) }
+    // The saved server currently being edited, or null when adding/connecting.
+    var editingServer by remember { mutableStateOf<ServerEntry?>(null) }

    val savedServers by prefs.savedServers.collectAsState(initial = emptyList())

+    fun clearForm() {
+        name = ""
+        address = ""
+        port = ""
+        password = ""
+        useHttps = false
+        passwordVisible = false
+        errorMessage = null
+    }
+
+    fun startEdit(server: ServerEntry) {
+        editingServer = server
+        name = server.name
+        address = server.address
+        port = server.port
+        password = server.password
+        useHttps = server.useHttps
+        passwordVisible = false
+        errorMessage = null
+    }
+
+    fun cancelEdit() {
+        editingServer = null
+        clearForm()
+    }
+
+    fun saveEdit() {
+        val original = editingServer ?: return
+        if (address.isBlank()) {
+            errorMessage = "Enter a server address"
+            return
+        }
+        val updated = ServerEntry(address, useHttps, port, password, name)
+        scope.launch {
+            prefs.updateSavedServer(original, updated)
+            cancelEdit()
+        }
+    }
+
    fun connect(server: ServerEntry) {
        if (isConnecting) return
        if (server.address.isBlank()) {
@ -178,7 +220,7 @@ fun ServerConnectScreen(
            Spacer(modifier = Modifier.height(4.dp))

            Text(
-                text = "Connect to Server",
+                text = if (editingServer != null) stringResource(R.string.edit_server_title) else "Connect to Server",
                style = MaterialTheme.typography.headlineMedium,
                color = TextPrimary,
                textAlign = TextAlign.Center,
@ -324,7 +366,11 @@ fun ServerConnectScreen(
                            keyboardActions = KeyboardActions(
                                onGo = {
                                    keyboard?.hide()
-                                    connect(ServerEntry(address, useHttps, port, password, name))
+                                    if (editingServer != null) {
+                                        saveEdit()
+                                    } else {
+                                        connect(ServerEntry(address, useHttps, port, password, name))
+                                    }
                                },
                            ),
                            colors = OutlinedTextFieldDefaults.colors(
@ -389,15 +435,40 @@ fun ServerConnectScreen(
                }
            }

-            // Connect button — glass style
-            GlassButton(
-                text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
-                onClick = {
-                    keyboard?.hide()
-                    connect(ServerEntry(address, useHttps, port, password, name))
-                },
-                modifier = Modifier.fillMaxWidth().height(56.dp),
-            )
+            if (editingServer != null) {
+                // Save / Cancel while editing an existing saved server
+                Row(
+                    modifier = Modifier.fillMaxWidth(),
+                    horizontalArrangement = Arrangement.spacedBy(12.dp),
+                ) {
+                    GlassButton(
+                        text = stringResource(R.string.cancel),
+                        onClick = {
+                            keyboard?.hide()
+                            cancelEdit()
+                        },
+                        modifier = Modifier.weight(1f).height(56.dp),
+                    )
+                    GlassButton(
+                        text = stringResource(R.string.save_changes),
+                        onClick = {
+                            keyboard?.hide()
+                            saveEdit()
+                        },
+                        modifier = Modifier.weight(1f).height(56.dp),
+                    )
+                }
+            } else {
+                // Connect button — glass style
+                GlassButton(
+                    text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
+                    onClick = {
+                        keyboard?.hide()
+                        connect(ServerEntry(address, useHttps, port, password, name))
+                    },
+                    modifier = Modifier.fillMaxWidth().height(56.dp),
+                )
+            }

            if (isConnecting) {
                CircularProgressIndicator(
@ -407,8 +478,8 @@ fun ServerConnectScreen(
                )
            }

-            // Saved servers
-            if (savedServers.isNotEmpty()) {
+            // Saved servers (hidden while editing one to keep focus on the form)
+            if (editingServer == null && savedServers.isNotEmpty()) {
                Spacer(modifier = Modifier.height(8.dp))
                Text(
                    text = stringResource(R.string.saved_servers),
@ -422,6 +493,7 @@ fun ServerConnectScreen(
                    SavedServerItem(
                        server = server,
                        onConnect = { connect(it) },
+                        onEdit = { startEdit(it) },
                        onRemove = { scope.launch { prefs.removeSavedServer(it) } },
                    )
                }
@ -434,6 +506,7 @@ fun ServerConnectScreen(
 private fun SavedServerItem(
    server: ServerEntry,
    onConnect: (ServerEntry) -> Unit,
+    onEdit: (ServerEntry) -> Unit,
    onRemove: (ServerEntry) -> Unit,
 ) {
    Row(
@ -476,6 +549,9 @@ private fun SavedServerItem(
                }
            }
        }
+        IconButton(onClick = { onEdit(server) }) {
+            Icon(imageVector = Icons.Default.Edit, contentDescription = stringResource(R.string.edit_server), modifier = Modifier.size(18.dp), tint = TextMuted)
+        }
        IconButton(onClick = { onRemove(server) }) {
            Icon(imageVector = Icons.Default.Close, contentDescription = stringResource(R.string.remove_server), modifier = Modifier.size(18.dp), tint = TextMuted)
        }
--- a/Android/app/src/main/java/com/archipelago/app/ui/screens/WebViewScreen.kt
+++ b/Android/app/src/main/java/com/archipelago/app/ui/screens/WebViewScreen.kt
@ -2,6 +2,7 @@ package com.archipelago.app.ui.screens

 import android.annotation.SuppressLint
 import android.graphics.Bitmap
+import android.graphics.BitmapFactory
 import android.view.ViewGroup
 import android.webkit.CookieManager
 import android.webkit.WebChromeClient
@ -45,6 +46,7 @@ import androidx.compose.material3.LinearProgressIndicator
 import androidx.compose.material3.MaterialTheme
 import androidx.compose.material3.Text
 import androidx.compose.runtime.Composable
+import androidx.compose.runtime.LaunchedEffect
 import androidx.compose.runtime.getValue
 import androidx.compose.runtime.mutableIntStateOf
 import androidx.compose.runtime.mutableStateOf
@ -65,6 +67,8 @@ import com.archipelago.app.ui.theme.BitcoinOrange
 import com.archipelago.app.ui.theme.SurfaceBlack
 import com.archipelago.app.ui.theme.TextMuted
 import com.archipelago.app.ui.theme.TextPrimary
+import kotlinx.coroutines.Dispatchers
+import kotlinx.coroutines.withContext

 /** Open a URL in the phone's default browser (genuinely external links). */
 private fun openExternalUrl(context: android.content.Context, url: String) {
@ -319,6 +323,26 @@ fun WebViewScreen(
                                }
                            }

+                            // Node apps (e.g. NetBird) terminate TLS with a
+                            // self-signed cert — the dashboard needs a secure
+                            // context for OIDC/window.crypto.subtle (#15). The
+                            // WebView default is to CANCEL untrusted certs, so
+                            // those apps render blank. The user explicitly trusts
+                            // their own node, so proceed for same-host certs only;
+                            // reject anything else (don't blanket-trust the web).
+                            override fun onReceivedSslError(
+                                view: WebView?,
+                                handler: android.webkit.SslErrorHandler?,
+                                error: android.net.http.SslError?,
+                            ) {
+                                val u = error?.url
+                                if (u != null && isSameHost(u, serverUrl)) {
+                                    handler?.proceed()
+                                } else {
+                                    handler?.cancel()
+                                }
+                            }
+
                            override fun shouldOverrideUrlLoading(
                                view: WebView?,
                                request: WebResourceRequest?,
@ -437,6 +461,27 @@ fun WebViewScreen(
    }
 }

+/** Best-effort fetch of the origin's /favicon.ico, so the launched app's icon
+ *  can be shown on the loading screen before the WebView reports onReceivedIcon
+ *  (which only fires once the page's <head> has parsed). Blocking — call on IO. */
+private fun fetchFavicon(pageUrl: String): Bitmap? {
+    return try {
+        val u = android.net.Uri.parse(pageUrl)
+        val scheme = u.scheme ?: return null
+        val host = u.host ?: return null
+        val portPart = if (u.port > 0) ":${u.port}" else ""
+        val conn = (java.net.URL("$scheme://$host$portPart/favicon.ico").openConnection()
+            as java.net.HttpURLConnection).apply {
+            connectTimeout = 4000
+            readTimeout = 4000
+            instanceFollowRedirects = true
+        }
+        conn.inputStream.use { BitmapFactory.decodeStream(it) }
+    } catch (_: Exception) {
+        null
+    }
+}
+
 /**
 * Lightweight in-app browser used when the kiosk hands off an app that can't be
 * shown in an iframe. Loads the app in a local WebView with a centered loading
@ -461,6 +506,15 @@ private fun InAppBrowser(
    var canGoBack by remember { mutableStateOf(false) }
    var canGoForward by remember { mutableStateOf(false) }

+    // Seed the loading-screen icon immediately from a best-effort favicon
+    // pre-fetch (main's app-icon work), then onReceivedIcon upgrades it — so the
+    // loader shows an icon right away instead of staying blank until the page
+    // parses its <head> (which is what made the loader look stuck).
+    LaunchedEffect(url) {
+        val fetched = withContext(Dispatchers.IO) { fetchFavicon(url) }
+        if (fetched != null && favicon == null) favicon = fetched
+    }
+
    // Back: walk the in-app history first, then close the overlay.
    BackHandler {
        val b = browser
@ -519,6 +573,23 @@ private fun InAppBrowser(
                                canGoForward = view?.canGoForward() == true
                            }

+                            // Self-signed TLS on the node's apps (e.g. NetBird on
+                            // :8087) would otherwise be cancelled by the WebView
+                            // and render blank. Proceed for the user's own node
+                            // (same host); reject any other untrusted cert.
+                            override fun onReceivedSslError(
+                                view: WebView?,
+                                handler: android.webkit.SslErrorHandler?,
+                                error: android.net.http.SslError?,
+                            ) {
+                                val u = error?.url
+                                if (u != null && isSameHost(u, serverUrl)) {
+                                    handler?.proceed()
+                                } else {
+                                    handler?.cancel()
+                                }
+                            }
+
                            override fun shouldOverrideUrlLoading(
                                view: WebView?,
                                request: WebResourceRequest?,
--- a/Android/app/src/main/res/drawable/ic_nav_back.xml
+++ b/Android/app/src/main/res/drawable/ic_nav_back.xml
@ -0,0 +1,12 @@
+<vector xmlns:android="http://schemas.android.com/apk/res/android"
+    android:width="24dp"
+    android:height="24dp"
+    android:viewportWidth="24"
+    android:viewportHeight="24">
+    <path
+        android:pathData="M15,19l-7,-7 7,-7"
+        android:strokeColor="#FFFFFF"
+        android:strokeWidth="2"
+        android:strokeLineCap="round"
+        android:strokeLineJoin="round" />
+</vector>
--- a/Android/app/src/main/res/drawable/ic_nav_close.xml
+++ b/Android/app/src/main/res/drawable/ic_nav_close.xml
@ -0,0 +1,12 @@
+<vector xmlns:android="http://schemas.android.com/apk/res/android"
+    android:width="24dp"
+    android:height="24dp"
+    android:viewportWidth="24"
+    android:viewportHeight="24">
+    <path
+        android:pathData="M6,18L18,6M6,6l12,12"
+        android:strokeColor="#FFFFFF"
+        android:strokeWidth="2"
+        android:strokeLineCap="round"
+        android:strokeLineJoin="round" />
+</vector>
--- a/Android/app/src/main/res/drawable/ic_nav_forward.xml
+++ b/Android/app/src/main/res/drawable/ic_nav_forward.xml
@ -0,0 +1,12 @@
+<vector xmlns:android="http://schemas.android.com/apk/res/android"
+    android:width="24dp"
+    android:height="24dp"
+    android:viewportWidth="24"
+    android:viewportHeight="24">
+    <path
+        android:pathData="M9,5l7,7 -7,7"
+        android:strokeColor="#FFFFFF"
+        android:strokeWidth="2"
+        android:strokeLineCap="round"
+        android:strokeLineJoin="round" />
+</vector>
--- a/Android/app/src/main/res/drawable/ic_nav_newtab.xml
+++ b/Android/app/src/main/res/drawable/ic_nav_newtab.xml
@ -0,0 +1,12 @@
+<vector xmlns:android="http://schemas.android.com/apk/res/android"
+    android:width="24dp"
+    android:height="24dp"
+    android:viewportWidth="24"
+    android:viewportHeight="24">
+    <path
+        android:pathData="M10,6H6a2,2 0,0 0,-2 2v10a2,2 0,0 0,2 2h10a2,2 0,0 0,2 -2v-4M14,4h6m0,0v6m0,-6L10,14"
+        android:strokeColor="#FFFFFF"
+        android:strokeWidth="2"
+        android:strokeLineCap="round"
+        android:strokeLineJoin="round" />
+</vector>
--- a/Android/app/src/main/res/drawable/ic_nav_refresh.xml
+++ b/Android/app/src/main/res/drawable/ic_nav_refresh.xml
@ -0,0 +1,12 @@
+<vector xmlns:android="http://schemas.android.com/apk/res/android"
+    android:width="24dp"
+    android:height="24dp"
+    android:viewportWidth="24"
+    android:viewportHeight="24">
+    <path
+        android:pathData="M4,4v6h6M20,20v-6h-6M5.64,15.36A8,8 0,0 0,18.36 18M18.36,8.64A8,8 0,0 0,5.64 6"
+        android:strokeColor="#FFFFFF"
+        android:strokeWidth="2"
+        android:strokeLineCap="round"
+        android:strokeLineJoin="round" />
+</vector>
--- a/Android/app/src/main/res/values/strings.xml
+++ b/Android/app/src/main/res/values/strings.xml
@ -23,6 +23,13 @@
    <string name="remote_input_hint">Use your phone as a keyboard and mouse for the kiosk</string>
    <string name="close">Close</string>
    <string name="open_in_browser">Open in browser</string>
+    <string name="back">Back</string>
+    <string name="forward">Forward</string>
+    <string name="refresh">Refresh</string>
    <string name="server_name_label">Server Name (optional)</string>
    <string name="server_name_placeholder">My Archipelago</string>
+    <string name="edit_server">Edit</string>
+    <string name="edit_server_title">Edit Server</string>
+    <string name="save_changes">Save Changes</string>
+    <string name="cancel">Cancel</string>
 </resources>
--- a/Android/ship-companion.sh
+++ b/Android/ship-companion.sh
@ -1,13 +1,18 @@
 #!/usr/bin/env bash
 #
 # Build the Android companion app and publish it as the served download
-# (neode-ui/public/packages/archipelago-companion.apk.zip), then commit + push.
+# (neode-ui/public/packages/archipelago-companion.apk — a plain APK a phone can
+# install straight from the link), then commit + push.
 #
 # Use this INSTEAD of `git push` when shipping the companion app, so the
 # downloadable APK on the node always matches what's on main.
 #
 #   ./Android/ship-companion.sh
 #
+# The actual build/sign/verify/stage is done by scripts/publish-companion-apk.sh
+# (single source of truth, shared with the pre-push hook). It does a CLEAN build,
+# forces v1+v2+v3 signing, and ABORTS if any signature scheme is missing — so a
+# broken or v2-only APK can never be shipped.
 set -euo pipefail

 ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
@ -16,21 +21,15 @@ cd "$ROOT"
 export JAVA_HOME="${JAVA_HOME:-/opt/homebrew/opt/openjdk@17}"
 export ANDROID_HOME="${ANDROID_HOME:-$HOME/Library/Android/sdk}"

-APK="Android/app/build/outputs/apk/debug/app-debug.apk"
-DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
+DEST="neode-ui/public/packages/archipelago-companion.apk"

-echo "==> Building debug APK"
-( cd Android && ./gradlew :app:assembleDebug --console=plain -q )
-[ -f "$APK" ] || { echo "ERROR: APK not found at $APK" >&2; exit 1; }
+echo "==> Building + signing + verifying companion APK"
+bash scripts/publish-companion-apk.sh

-echo "==> Publishing -> $DEST"
-mkdir -p "$(dirname "$DEST")"
-rm -f "$DEST"
-( cd "$(dirname "$APK")" && zip -j -q "$ROOT/$DEST" "$(basename "$APK")" )
+[ -f "$DEST" ] || { echo "ERROR: served APK not found at $DEST" >&2; exit 1; }

-git add "$DEST"
-if git diff --cached --quiet; then
-  echo "==> Nothing to commit (working tree + APK unchanged)"
+if git diff --cached --quiet -- "$DEST"; then
+  echo "==> Nothing to commit (APK unchanged)"
 else
  git commit -q -m "chore(android): update companion apk download"
  echo "==> Committed"
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -0,0 +1,57 @@
+# Archipelago — agent guide
+
+## ✅ Single-node production gate is GREEN (2026-06-23)
+
+`tests/lifecycle/run-gate.sh` is **5/5 on .228, 0 failures** — the single-node exit
+criterion is met and the priority banner is demoted. Next exit-criteria: the
+**multinode pass** (`docs/multinode-testing-plan.md`) and workstreams B/C/D.
+
+**Read `docs/PRODUCTION-MASTER-PLAN.md` first** — it is still the authoritative plan
+for the north star: a world-class, **developer-ready app platform** where every app
+is manifest-driven, manifests ship via the **signed registry** (not OTA disk files),
+and **third-party developers publish apps via an external/decentralized registry** —
+all rootless, secure, robust, and 100%-uptime-capable. It no longer overrides all
+ad-hoc direction now that the gate is green, but it remains the source of truth for
+sequencing the remaining workstreams.
+
+Detailed sub-plans (all linked from the master):
+- App platform / packaging phases + security model → `docs/APP-PACKAGING-MIGRATION-PLAN.md`
+- Registry-distributed manifests (in progress) → `docs/registry-manifest-design.md`
+- External/decentralized marketplace for devs → `docs/marketplace-protocol.md`
+- Current per-app state → `docs/app-registry-status-2026-06-21.md`
+- Production test gate (exit criterion) → `tests/lifecycle/TESTING.md`
+
+## Invariants (never violate)
+
+- **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged
+  containers unless explicitly approved.
+- **No per-app Rust installers / no OS-level reliance.** Apps are declarative;
+  the orchestrator owns the lifecycle. `install_immich_stack` (hardcoded
+  `podman run` + `sudo chown`) is the anti-pattern being deleted, not a template.
+- **Secrets are manifest-declared** (`generated_secrets`, materialised by
+  `container::secrets`, 0600/rootless) — never hardcoded, per-app, or logged.
+- **Migrations never destroy data** — preserve `/var/lib/archipelago/<app>`,
+  secrets, credentials, ports, and adoption container names; keep a rollback path.
+- **Verify on the real node .228 before any tag.** (Fleet-wide multinode
+  verification is a separate plan: `docs/multinode-testing-plan.md`.)
+
+## Build / verify
+
+- Rust workspace root is `core/` (no Cargo.toml at repo root). `cargo` from `core/`.
+- If a `cargo test`/build hits `rust-lld: undefined hidden symbol`, it's
+  incremental-cache corruption — rebuild with `CARGO_INCREMENTAL=0`.
+- Frontend: `neode-ui/` → `npm run build` outputs to `web/dist/neode-ui/`.
+  Grep the built bundle for new strings before shipping (build can silently no-op).
+- App manifests load from disk on nodes at `/opt/archipelago/apps/*/manifest.yml`
+  (today); the goal is to distribute them via the signed catalog instead.
+
+## Production test gate (definition of done)
+
+`tests/lifecycle/run-gate.sh` green across install / UI / stop / start / restart /
+reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on
+.228** (`ARCHY_ITERATIONS=5`). **Run the gate ON the node** (it uses local podman/systemctl/bitcoin
+probes), not via RPC from another host. **✅ GREEN 2026-06-23 (5/5, 0 not-ok)** — keep it
+green (re-run after orchestrator/lifecycle changes); regressions are top priority again.
+**Multinode testing (.198 + the rest of the fleet) is a SEPARATE plan** —
+`docs/multinode-testing-plan.md` — not part of this single-node gate criterion, and is
+the next exit criterion now that single-node is green.
--- a/app-catalog/catalog.json
+++ b/app-catalog/catalog.json
@ -73,7 +73,7 @@
      "author": "Mempool",
      "category": "money",
      "tier": "core",
-      "dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0",
+      "dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1",
      "repoUrl": "https://github.com/mempool/mempool",
      "requires": [
        "bitcoin-knots",
@ -214,31 +214,6 @@
        ]
      }
    },
-    {
-      "id": "meshtastic",
-      "title": "Meshtastic",
-      "version": "2-daily-alpine",
-      "description": "Open-source mesh networking for LoRa radios. Create decentralized communication networks.",
-      "icon": "/assets/img/app-icons/meshcore.svg",
-      "author": "Meshtastic",
-      "category": "networking",
-      "tier": "recommended",
-      "dockerImage": "docker.io/meshtastic/meshtasticd:daily-alpine",
-      "repoUrl": "https://github.com/meshtastic/firmware",
-      "containerConfig": {
-        "ports": [
-          "4403:4403"
-        ],
-        "volumes": [
-          "/var/lib/archipelago/meshtastic:/var/lib/meshtasticd"
-        ],
-        "env": [
-          "MESHTASTIC_PORT=/dev/ttyUSB0",
-          "MESHTASTIC_SERIAL=true"
-        ],
-        "notes": "Requires a LoRa radio device at /dev/ttyUSB0. The config file is rendered from the app manifest before container start."
-      }
-    },
    {
      "id": "vaultwarden",
      "title": "Vaultwarden",
@ -281,7 +256,7 @@
    },
    {
      "id": "fedimint",
-      "title": "Fedimint",
+      "title": "Fedimint Guardian",
      "version": "0.10.0",
      "description": "Federated Bitcoin minting service with built-in Guardian UI. Privacy-preserving Bitcoin custody.",
      "icon": "/assets/img/app-icons/fedimint.png",
@ -299,7 +274,7 @@
      "author": "Fedimint",
      "category": "money",
      "tier": "core",
-      "dockerImage": "146.59.87.168:3000/lfg2025/fmcd:0.8.0",
+      "dockerImage": "146.59.87.168:3000/lfg2025/fmcd:0.8.1",
      "repoUrl": "https://github.com/minmoto/fmcd"
    },
    {
--- a/apps/archy-mempool-web/manifest.yml
+++ b/apps/archy-mempool-web/manifest.yml
@ -1,12 +1,12 @@
 app:
  id: archy-mempool-web
  name: Mempool Web
-  version: 3.0.0
+  version: 3.0.1
  description: Frontend web UI for mempool explorer.
  container_name: mempool

  container:
-    image: git.tx1138.com/lfg2025/mempool-frontend:v3.0.0
+    image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
    pull_policy: if-not-present
    network: archy-net

@ -33,7 +33,10 @@ app:

  health_check:
    type: http
-    endpoint: http://localhost:8080
+    # 127.0.0.1 not localhost: the image's wget resolves localhost to ::1 (IPv6)
+    # first, but nginx binds 0.0.0.0:8080 (IPv4) only -> localhost probe gets
+    # "connection refused" -> perpetual unhealthy -> health_monitor restart loop.
+    endpoint: http://127.0.0.1:8080
    path: /
    interval: 30s
    timeout: 5s
--- a/apps/bitcoin-core/Dockerfile
+++ b/apps/bitcoin-core/Dockerfile
@ -1,5 +1,29 @@
-# Bitcoin Core - uses official image
-FROM bitcoin/bitcoin:24.0
-
-# Default user is already 'bitcoin'
-# No additional setup needed
+# Bitcoin Core — minimal rootless image built from the OFFICIAL upstream release.
+#
+# The CANONICAL, verified build path is scripts/build-bitcoin-image.sh, which
+# downloads the upstream tarball, verifies SHA-256 + the OpenPGP signature
+# (fail-closed), and tags/pushes <registry>/bitcoin:<version>. This Dockerfile
+# mirrors that image for a manual/local build and replaces the old stale
+# community base (`FROM bitcoin/bitcoin:24.0`).
+#
+# Build (binaries must be pre-fetched + verified into ./bin — see the script):
+#   scripts/build-bitcoin-image.sh core 31.0
+FROM debian:bookworm-slim
+ARG BITCOIN_VERSION=31.0
+RUN set -eux; \
+    apt-get update; \
+    apt-get install -y --no-install-recommends ca-certificates; \
+    rm -rf /var/lib/apt/lists/*; \
+    useradd -m -u 1000 -s /bin/bash bitcoin; \
+    mkdir -p /home/bitcoin/.bitcoin; \
+    chown -R bitcoin:bitcoin /home/bitcoin
+# bin/ holds the SHA-256 + GPG-verified bitcoind / bitcoin-cli (Guix-built,
+# x86_64-linux-gnu) extracted from the official release tarball.
+COPY bin/bitcoind /usr/local/bin/bitcoind
+COPY bin/bitcoin-cli /usr/local/bin/bitcoin-cli
+RUN chmod 0755 /usr/local/bin/bitcoind /usr/local/bin/bitcoin-cli
+USER bitcoin
+WORKDIR /home/bitcoin
+VOLUME ["/home/bitcoin/.bitcoin"]
+EXPOSE 8332 8333
+ENTRYPOINT ["bitcoind"]
--- a/apps/bitcoin-knots/Dockerfile
+++ b/apps/bitcoin-knots/Dockerfile
@ -0,0 +1,30 @@
+# Bitcoin Knots — minimal rootless image built from the OFFICIAL upstream release.
+#
+# Knots previously had NO Dockerfile (the :latest tag was built/pushed by hand).
+# The CANONICAL, verified build path is scripts/build-bitcoin-image.sh, which
+# downloads the upstream tarball, verifies SHA-256 + the OpenPGP signature
+# (fail-closed, Luke-Jr release key), and tags/pushes
+# <registry>/bitcoin-knots:<version>. Knots version strings embed a build date,
+# e.g. 29.3.knots20260508 — the full string is the tag.
+#
+# Build (binaries must be pre-fetched + verified into ./bin — see the script):
+#   scripts/build-bitcoin-image.sh knots 29.3.knots20260508
+FROM debian:bookworm-slim
+ARG KNOTS_VERSION=29.3.knots20260508
+RUN set -eux; \
+    apt-get update; \
+    apt-get install -y --no-install-recommends ca-certificates; \
+    rm -rf /var/lib/apt/lists/*; \
+    useradd -m -u 1000 -s /bin/bash bitcoin; \
+    mkdir -p /home/bitcoin/.bitcoin; \
+    chown -R bitcoin:bitcoin /home/bitcoin
+# bin/ holds the SHA-256 + GPG-verified bitcoind / bitcoin-cli (Knots, Guix-built,
+# x86_64-linux-gnu) extracted from the official release tarball.
+COPY bin/bitcoind /usr/local/bin/bitcoind
+COPY bin/bitcoin-cli /usr/local/bin/bitcoin-cli
+RUN chmod 0755 /usr/local/bin/bitcoind /usr/local/bin/bitcoin-cli
+USER bitcoin
+WORKDIR /home/bitcoin
+VOLUME ["/home/bitcoin/.bitcoin"]
+EXPOSE 8332 8333
+ENTRYPOINT ["bitcoind"]
--- a/apps/fedimint-clientd/manifest.yml
+++ b/apps/fedimint-clientd/manifest.yml
@ -9,13 +9,18 @@ app:
    # 0.8.2 — iroh-capable). No usable upstream image exists, so we build + push
    # this to the node registry. Pin the tag to match the REST shapes coded in
    # core/archipelago/src/wallet/fedimint_client.rs (validated against 0.8.2).
-    image: 146.59.87.168:3000/lfg2025/fmcd:0.8.0
+    image: 146.59.87.168:3000/lfg2025/fmcd:0.8.1
    pull_policy: if-not-present
    network: archy-net
    # No entrypoint override: the image's resilient `fmcd-run` launcher loops
    # fmcd and retries on join failure (fmcd needs >=1 federation to boot), so an
    # unreachable default never crash-loops. All config comes from FMCD_* env
    # below. Nodes can join more federations via wallet.fedimint-join.
+    # Auto-generated on first install (random hex, 0600, rootless-owned) so the
+    # app needs no host provisioning. The wallet bridge reads the same file.
+    generated_secrets:
+      - name: fmcd-password
+        kind: hex16
    secret_env:
      - key: FMCD_PASSWORD
        secret_file: fmcd-password
@ -28,7 +33,12 @@ app:
    - storage: 2Gi

  resources:
-    cpu_limit: 1
+    # fmcd's embedded iroh networking can hot-loop on relay/hole-punch retries
+    # on NAT'd nodes that reach the federation neither directly nor via iroh's
+    # public relays, pegging its whole allotment. Cap it low so a stuck instance
+    # can't starve the node (steady-state is <3% of a core; joins are brief);
+    # the fmcd-run watchdog additionally restarts a sustained-hot process.
+    cpu_limit: 0.25
    memory_limit: 1Gi
    disk_limit: 2Gi

--- a/apps/fedimint-gateway/manifest.yml
+++ b/apps/fedimint-gateway/manifest.yml
@ -16,6 +16,14 @@ app:
        else
          exec gatewayd --data-dir /data --listen 0.0.0.0:8176 --bcrypt-password-hash "$FEDI_HASH" --network bitcoin --bitcoind-url http://host.archipelago:8332 --bitcoind-username "$FM_BITCOIND_USERNAME" --bitcoind-password "$FM_BITCOIND_PASSWORD" ldk --ldk-lightning-port 9737 --ldk-alias archipelago-gateway;
        fi
+    # The gateway's admin API is gated by a bcrypt password hash. Generate it on
+    # first install (random password + its bcrypt hash, both 0600 rootless-owned)
+    # so the app installs from its manifest alone — `fedimint-gateway-hash` holds
+    # the hash passed to gatewayd, `fedimint-gateway-hash.pw` the plaintext for
+    # any client that must authenticate. Self-heals a wrongly root-owned hash.
+    generated_secrets:
+      - name: fedimint-gateway-hash
+        kind: bcrypt
    secret_env:
      - key: FM_BITCOIND_PASSWORD
        secret_file: bitcoin-rpc-password
--- a/apps/fedimint/manifest.yml
+++ b/apps/fedimint/manifest.yml
@ -1,6 +1,6 @@
 app:
  id: fedimint
-  name: Fedimint
+  name: Fedimint Guardian
  version: 0.10.0
  description: Federated Bitcoin minting service with built-in Guardian UI. Privacy-preserving Bitcoin custody.

--- a/apps/immich-postgres/manifest.yml
+++ b/apps/immich-postgres/manifest.yml
@ -0,0 +1,58 @@
+app:
+  id: immich-postgres
+  name: Immich Postgres
+  version: "14-vectorchord0.4.3-pgvectors0.2.0"
+  description: Postgres (pgvecto.rs / vectorchord) backend for Immich.
+
+  # Container named immich_postgres (underscore) to match the runtime's existing
+  # per-app references (lifecycle/health/crash-recovery/config) and serve as the
+  # server's DB_HOSTNAME alias. Top-level key → serde(flatten) → extensions →
+  # compute_container_name.
+  container_name: immich_postgres
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/immich-postgres:14-vectorchord0.4.3-pgvectors0.2.0
+    pull_policy: if-not-present
+    network: archy-net
+    # postgres drops to its own uid (container 999 → host 100998 under rootless),
+    # so the data dir must be owned by that mapped uid — mirrors archy-btcpay-db.
+    # Verified on .228: the live immich-db is owned 100998. Without this a FRESH
+    # install's dir would be service-user-owned and postgres would EACCES.
+    data_uid: "100998:100998"
+    generated_secrets:
+      - name: immich-db-password
+        kind: hex32
+    secret_env:
+      - key: POSTGRES_PASSWORD
+        secret_file: immich-db-password
+
+  dependencies:
+    - storage: 40Gi
+
+  resources:
+    memory_limit: 2Gi
+    disk_limit: 40Gi
+
+  security:
+    capabilities: [CHOWN, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  volumes:
+    - type: bind
+      source: /var/lib/archipelago/immich-db
+      target: /var/lib/postgresql/data
+      options: [rw]
+
+  environment:
+    - POSTGRES_USER=postgres
+    - POSTGRES_DB=immich
+
+  health_check:
+    type: tcp
+    endpoint: localhost:5432
+    interval: 30s
+    timeout: 5s
+    retries: 3
--- a/apps/immich-redis/manifest.yml
+++ b/apps/immich-redis/manifest.yml
@ -0,0 +1,37 @@
+app:
+  id: immich-redis
+  name: Immich Redis
+  version: "7-alpine"
+  description: Valkey (Redis-compatible) cache for Immich.
+
+  # Container named immich_redis (underscore) to match runtime per-app references
+  # and serve as the server's REDIS_HOSTNAME alias on archy-net.
+  container_name: immich_redis
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/valkey:7-alpine
+    pull_policy: if-not-present
+    network: archy-net
+
+  dependencies: []
+
+  resources:
+    memory_limit: 128Mi
+
+  security:
+    capabilities: [SETGID, SETUID]
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  volumes: []
+
+  environment: []
+
+  health_check:
+    type: tcp
+    endpoint: localhost:6379
+    interval: 30s
+    timeout: 5s
+    retries: 3
--- a/apps/immich/manifest.yml
+++ b/apps/immich/manifest.yml
@ -0,0 +1,74 @@
+app:
+  id: immich
+  name: Immich
+  version: "2.7.4"
+  description: Self-hosted photo and video backup with mobile apps and search.
+
+  # app_id "immich" = the user-facing launcher (matches the catalog entry's title
+  # + icon). The container is named "immich_server" so it matches the runtime's
+  # existing per-app container references (lifecycle/health/crash-recovery/ports);
+  # `container_name` is a top-level app key (captured by serde(flatten) into
+  # extensions, read by compute_container_name). It reaches its backends by their
+  # underscore aliases on archy-net (DB_HOSTNAME / REDIS_HOSTNAME below).
+  container_name: immich_server
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/immich-server:release
+    pull_policy: if-not-present
+    network: archy-net
+    secret_env:
+      - key: DB_PASSWORD
+        secret_file: immich-db-password
+
+  dependencies:
+    - app_id: immich-postgres
+    - app_id: immich-redis
+    - storage: 200Gi
+
+  resources:
+    memory_limit: 2Gi
+    disk_limit: 200Gi
+
+  security:
+    capabilities: []
+    readonly_root: false
+    network_policy: isolated
+
+  ports:
+    - host: 2283
+      container: 2283
+      protocol: tcp
+
+  volumes:
+    - type: bind
+      source: /var/lib/archipelago/immich
+      target: /usr/src/app/upload
+      options: [rw]
+
+  environment:
+    - DB_HOSTNAME=immich_postgres
+    - DB_USERNAME=postgres
+    - DB_DATABASE_NAME=immich
+    - REDIS_HOSTNAME=immich_redis
+    - UPLOAD_LOCATION=/usr/src/app/upload
+
+  health_check:
+    type: http
+    endpoint: http://localhost:2283
+    path: /api/server/ping
+    interval: 30s
+    timeout: 5s
+    retries: 20
+
+  interfaces:
+    main:
+      name: Web UI
+      description: Immich photo library
+      type: ui
+      port: 2283
+      protocol: http
+      path: /
+
+  metadata:
+    launch:
+      open_in_new_tab: true
--- a/apps/indeedhub-api/manifest.yml
+++ b/apps/indeedhub-api/manifest.yml
@ -0,0 +1,77 @@
+app:
+  id: indeedhub-api
+  name: IndeedHub API
+  version: "1.0.0"
+  description: IndeedHub backend API (Nostr auth, media, payments).
+  category: community
+
+  # Hyphen name matches runtime references + the live container (adoption);
+  # alias `api` is the short hostname the frontend nginx proxies to
+  # (http://api:4000). Reaches its backends by their short aliases
+  # (postgres/redis/minio) on indeedhub-net — unchanged from the legacy installer.
+  container_name: indeedhub-api
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/indeedhub-api:1.0.0
+    pull_policy: if-not-present
+    network: indeedhub-net
+    network_aliases: [api]
+    # The JWT signing secret is owned here (no backend container owns it); the
+    # db + minio passwords are owned by indeedhub-postgres / indeedhub-minio and
+    # only consumed here. ensure_generated_secrets no-ops when a file already
+    # exists, so live values on .228 are preserved (postgres pw is fixed at
+    # PGDATA init — regenerating would lock the API out).
+    generated_secrets:
+      - name: indeedhub-jwt
+        kind: hex32
+    secret_env:
+      - key: DATABASE_PASSWORD
+        secret_file: indeedhub-db-password
+      - key: AWS_SECRET_KEY
+        secret_file: indeedhub-minio-password
+      - key: NOSTR_JWT_SECRET
+        secret_file: indeedhub-jwt
+
+  dependencies:
+    - app_id: indeedhub-postgres
+    - app_id: indeedhub-redis
+    - app_id: indeedhub-minio
+
+  resources:
+    memory_limit: 2Gi
+
+  security:
+    capabilities: []
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  volumes: []
+
+  environment:
+    - PORT=4000
+    - DATABASE_HOST=postgres
+    - DATABASE_PORT=5432
+    - DATABASE_USER=indeedhub
+    - DATABASE_NAME=indeedhub
+    - QUEUE_HOST=redis
+    - QUEUE_PORT=6379
+    - S3_ENDPOINT=http://minio:9000
+    - AWS_REGION=us-east-1
+    - AWS_ACCESS_KEY=indeeadmin
+    - S3_PUBLIC_BUCKET_NAME=indeedhub-public
+    - S3_PRIVATE_BUCKET_NAME=indeedhub-private
+    - S3_PUBLIC_BUCKET_URL=/storage
+    - NOSTR_JWT_EXPIRES_IN=7d
+    # Fixed across the fleet (envelope-encryption master key baked by the legacy
+    # installer); not node-specific, so a plain env literal, not a secret.
+    - AES_MASTER_SECRET=0123456789abcdef0123456789abcdef
+    - ENVIRONMENT=production
+
+  health_check:
+    type: tcp
+    endpoint: localhost:4000
+    interval: 30s
+    timeout: 5s
+    retries: 10
--- a/apps/indeedhub-ffmpeg/manifest.yml
+++ b/apps/indeedhub-ffmpeg/manifest.yml
@ -0,0 +1,51 @@
+app:
+  id: indeedhub-ffmpeg
+  name: IndeedHub FFmpeg Worker
+  version: "1.0.0"
+  description: IndeedHub background media transcoding worker.
+  category: community
+
+  # Hyphen name matches runtime references + the live container (adoption). No
+  # network_alias: nothing connects TO the worker — it only dials out to
+  # postgres/redis/minio (resolved by their aliases on indeedhub-net).
+  container_name: indeedhub-ffmpeg
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/indeedhub-ffmpeg:1.0.0
+    pull_policy: if-not-present
+    network: indeedhub-net
+    secret_env:
+      - key: DATABASE_PASSWORD
+        secret_file: indeedhub-db-password
+      - key: AWS_SECRET_KEY
+        secret_file: indeedhub-minio-password
+
+  dependencies:
+    - app_id: indeedhub-api
+
+  resources:
+    memory_limit: 4Gi
+
+  security:
+    capabilities: []
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  volumes: []
+
+  environment:
+    - DATABASE_HOST=postgres
+    - DATABASE_PORT=5432
+    - DATABASE_USER=indeedhub
+    - DATABASE_NAME=indeedhub
+    - QUEUE_HOST=redis
+    - QUEUE_PORT=6379
+    - S3_ENDPOINT=http://minio:9000
+    - AWS_REGION=us-east-1
+    - AWS_ACCESS_KEY=indeeadmin
+    - S3_PUBLIC_BUCKET_NAME=indeedhub-public
+    - S3_PRIVATE_BUCKET_NAME=indeedhub-private
+    - ENVIRONMENT=production
+    - AES_MASTER_SECRET=0123456789abcdef0123456789abcdef
--- a/apps/indeedhub-minio/manifest.yml
+++ b/apps/indeedhub-minio/manifest.yml
@ -0,0 +1,60 @@
+app:
+  id: indeedhub-minio
+  name: IndeedHub MinIO
+  version: "RELEASE.2024-11-07T00-52-20Z"
+  description: MinIO S3-compatible object storage for IndeedHub media.
+  category: community
+
+  # Hyphen name matches runtime references + the live container (adoption);
+  # alias `minio` is the short hostname the api/ffmpeg use (S3_ENDPOINT=
+  # http://minio:9000) AND the frontend nginx proxies to (http://minio:9000).
+  container_name: indeedhub-minio
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/minio:RELEASE.2024-11-07T00-52-20Z
+    pull_policy: if-not-present
+    network: indeedhub-net
+    network_aliases: [minio]
+    # `server /data` — the minio entrypoint args from the legacy installer.
+    custom_args: [server, /data]
+    generated_secrets:
+      - name: indeedhub-minio-password
+        kind: hex32
+    secret_env:
+      - key: MINIO_ROOT_PASSWORD
+        secret_file: indeedhub-minio-password
+
+  dependencies:
+    - storage: 50Gi
+
+  resources:
+    memory_limit: 1Gi
+    disk_limit: 50Gi
+
+  security:
+    capabilities: []
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  # Named volume matches the live indeedhub-minio-data volume on .228.
+  volumes:
+    - type: volume
+      source: indeedhub-minio-data
+      target: /data
+      options: [rw]
+
+  # MINIO_ROOT_USER "indeeadmin" is the fixed admin identity baked by the legacy
+  # installer (api/ffmpeg use it as AWS_ACCESS_KEY); the password is the
+  # generated secret above. Not secret, so it stays a plain env value.
+  environment:
+    - MINIO_ROOT_USER=indeeadmin
+
+  health_check:
+    type: http
+    endpoint: http://localhost:9000
+    path: /minio/health/live
+    interval: 30s
+    timeout: 5s
+    retries: 5
--- a/apps/indeedhub-postgres/manifest.yml
+++ b/apps/indeedhub-postgres/manifest.yml
@ -0,0 +1,59 @@
+app:
+  id: indeedhub-postgres
+  name: IndeedHub Postgres
+  version: "16.13-alpine"
+  description: Postgres database backend for IndeedHub.
+  category: community
+
+  # Container named indeedhub-postgres (hyphen) to match the runtime's existing
+  # per-app references (health_monitor tiers/deps, crash_recovery) and the live
+  # .228 install, so the orchestrator ADOPTS the running container instead of
+  # recreating it. `network_aliases: [postgres]` keeps the short hostname the
+  # api/ffmpeg/relay reach by (DATABASE_HOST=postgres) resolvable on
+  # indeedhub-net, reproducing the legacy `--network-alias postgres`.
+  container_name: indeedhub-postgres
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/postgres:16.13-alpine
+    pull_policy: if-not-present
+    network: indeedhub-net
+    network_aliases: [postgres]
+    generated_secrets:
+      - name: indeedhub-db-password
+        kind: hex32
+    secret_env:
+      - key: POSTGRES_PASSWORD
+        secret_file: indeedhub-db-password
+
+  dependencies:
+    - storage: 10Gi
+
+  resources:
+    memory_limit: 1Gi
+    disk_limit: 10Gi
+
+  security:
+    capabilities: [CHOWN, DAC_OVERRIDE, FOWNER, SETGID, SETUID]
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  # Named podman volume (matches the live indeedhub-postgres-data volume on .228);
+  # preserves all existing database content across the migration.
+  volumes:
+    - type: volume
+      source: indeedhub-postgres-data
+      target: /var/lib/postgresql/data
+      options: [rw]
+
+  environment:
+    - POSTGRES_USER=indeedhub
+    - POSTGRES_DB=indeedhub
+
+  health_check:
+    type: tcp
+    endpoint: localhost:5432
+    interval: 30s
+    timeout: 5s
+    retries: 3
--- a/apps/indeedhub-redis/manifest.yml
+++ b/apps/indeedhub-redis/manifest.yml
@ -0,0 +1,45 @@
+app:
+  id: indeedhub-redis
+  name: IndeedHub Redis
+  version: "7.4.8-alpine"
+  description: Redis queue/cache backend for IndeedHub.
+  category: community
+
+  # Hyphen name matches runtime references + the live container (adoption);
+  # alias `redis` is the short hostname the api/ffmpeg reach (QUEUE_HOST=redis).
+  container_name: indeedhub-redis
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/redis:7.4.8-alpine
+    pull_policy: if-not-present
+    network: indeedhub-net
+    network_aliases: [redis]
+
+  dependencies:
+    - storage: 1Gi
+
+  resources:
+    memory_limit: 256Mi
+
+  security:
+    capabilities: [SETGID, SETUID]
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  # Named volume matches the live indeedhub-redis-data volume on .228.
+  volumes:
+    - type: volume
+      source: indeedhub-redis-data
+      target: /data
+      options: [rw]
+
+  environment: []
+
+  health_check:
+    type: tcp
+    endpoint: localhost:6379
+    interval: 30s
+    timeout: 5s
+    retries: 3
--- a/apps/indeedhub-relay/manifest.yml
+++ b/apps/indeedhub-relay/manifest.yml
@ -0,0 +1,47 @@
+app:
+  id: indeedhub-relay
+  name: IndeedHub Nostr Relay
+  version: "0.9.0"
+  description: nostr-rs-relay backing IndeedHub's Nostr identity + comments.
+  category: community
+
+  # Hyphen name matches runtime references + the live container (adoption);
+  # alias `relay` is the short hostname the frontend nginx proxies to
+  # (http://relay:8080 for the /relay websocket).
+  container_name: indeedhub-relay
+
+  container:
+    image: 146.59.87.168:3000/lfg2025/nostr-rs-relay:0.9.0
+    pull_policy: if-not-present
+    network: indeedhub-net
+    network_aliases: [relay]
+
+  dependencies:
+    - storage: 2Gi
+
+  resources:
+    memory_limit: 256Mi
+    disk_limit: 2Gi
+
+  security:
+    capabilities: []
+    readonly_root: false
+    network_policy: isolated
+
+  ports: []
+
+  # Named volume matches the live indeedhub-relay-data volume on .228.
+  volumes:
+    - type: volume
+      source: indeedhub-relay-data
+      target: /usr/src/app/db
+      options: [rw]
+
+  environment: []
+
+  health_check:
+    type: tcp
+    endpoint: localhost:8080
+    interval: 30s
+    timeout: 5s
+    retries: 3
--- a/apps/indeedhub/manifest.yml
+++ b/apps/indeedhub/manifest.yml
@ -1,63 +1,84 @@
 app:
  id: indeedhub
  name: IndeeHub
-  version: 1.0.0
+  version: "1.0.0"
  description: Bitcoin documentary streaming platform featuring God Bless Bitcoin and other educational content about Bitcoin, sovereignty, and decentralized technology. Sign in with your Nostr identity.
  category: community

+  # The user-facing launcher (app_id "indeedhub"). Container is named "indeedhub"
+  # (matches the runtime's per-app references + the live container, so the
+  # orchestrator adopts it). Its nginx (listen 7777) proxies to the backends by
+  # their short aliases on indeedhub-net: api:4000, minio:9000, relay:8080.
+  container_name: indeedhub
+
  container:
    image: 146.59.87.168:3000/lfg2025/indeedhub:1.0.0
-    pull_policy: always  # Pull from registry; falls back to local build
+    pull_policy: if-not-present
    network: indeedhub-net

  dependencies:
+    - app_id: indeedhub-api
    - storage: 1Gi

  resources:
-    cpu_limit: 2
    memory_limit: 512Mi
    disk_limit: 1Gi

  security:
-    capabilities: []
-    readonly_root: true
-    no_new_privileges: true
-    user: 1001
-    seccomp_profile: default
-    network_policy: bridge
-    apparmor_profile: default
+    # nginx master runs as root and drops workers to the nginx user (uid/gid
+    # 101) — needs SET{UID,GID}; CHOWN + DAC_OVERRIDE let it own + write the
+    # proxy cache under the tmpfs /var/cache/nginx. The orchestrator does
+    # --cap-drop=ALL, so (unlike the legacy `podman run` default caps) these
+    # must be declared or nginx workers die with "setgid(101) failed".
+    capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID]
+    readonly_root: false
+    network_policy: isolated

  ports:
    - host: 7778
      container: 7777
-      protocol: tcp  # Web UI. Port 7777 on the host is reserved for Nostr relay.
+      protocol: tcp  # Web UI. Port 7777 on the host is reserved for the Nostr relay.

+  # Writable scratch the baked nginx needs; matches the legacy installer's
+  # --tmpfs /run + /var/cache/nginx.
  volumes:
-    - type: tmpfs
-      target: /tmp
-      options: [rw,noexec,nosuid,size=64m]
-    - type: tmpfs
-      target: /app/.next/cache
-      options: [rw,noexec,nosuid,size=128m]
    - type: tmpfs
      target: /run
-      options: [rw,nosuid,nodev,size=16m]
+      options: [rw, nosuid, nodev, size=16m]
    - type: tmpfs
      target: /var/cache/nginx
-      options: [rw,nosuid,nodev,size=32m]
+      options: [rw, nosuid, nodev, size=32m]

-  environment:
-    - NODE_ENV=production
-    - NEXT_TELEMETRY_DISABLED=1
+  environment: []

+  # Defensive + idempotent. The current indeedhub:1.0.0 image already bakes the
+  # iframe-friendly nginx (X-Frame-Options omitted, nostr-provider.js present +
+  # <script> injected), so these are mostly no-ops on that tag — but they keep
+  # the app iframe-loadable + the provider script fresh for any image build that
+  # predates the bake. copy_from_host pulls /opt/archipelago/web-ui/nostr-provider.js
+  # (kept current by frontend OTA releases). Replaces the legacy hardcoded
+  # patch_indeedhub_nostr_provider() Rust hook.
+  hooks:
+    post_install:
+      - exec: ["sed", "-i", "/X-Frame-Options/d", "/etc/nginx/conf.d/default.conf"]
+      - copy_from_host:
+          src: "web-ui/nostr-provider.js"
+          dest: "/usr/share/nginx/html/nostr-provider.js"
+      - exec: ["sh", "-c", "grep -q nostr-provider /etc/nginx/conf.d/default.conf || sed -i 's#</head>#<script src=\"/nostr-provider.js\"></script></head>#' /etc/nginx/conf.d/default.conf"]
+      - exec: ["nginx", "-s", "reload"]
+
+  # TCP liveness on the nginx port, NOT an http GET of /. nginx binds 7777 at
+  # startup (before workers), so this passes immediately and stays green under
+  # load. An http check of / runs the SPA + sub_filter and false-fails when the
+  # node is busy → the reconciler then treats the frontend as wedged and
+  # recreates it in a loop (observed churning the frontend on the loaded .198).
  health_check:
-    type: http
-    endpoint: http://localhost:3000
-    path: /
+    type: tcp
+    endpoint: localhost:7777
    interval: 30s
-    timeout: 10s
-    retries: 3
-    start_period: 40s
+    timeout: 5s
+    retries: 5
+    start_period: 30s

  interfaces:
    main:
--- a/apps/mempool/manifest.yml
+++ b/apps/mempool/manifest.yml
@ -5,7 +5,7 @@ app:
  description: Bitcoin mempool and blockchain explorer. Real-time transaction and block visualization.
  
  container:
-    image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0
+    image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
    image_signature: cosign://...
    pull_policy: if-not-present
    
@ -30,7 +30,7 @@ app:
    
  ports:
    - host: 4080
-      container: 4080
+      container: 8080  # mempool-frontend nginx listens on 8080 (FRONTEND_HTTP_PORT=8080)
      protocol: tcp  # Web UI
    
  volumes:
--- a/apps/meshtastic/Dockerfile
+++ b/apps/meshtastic/Dockerfile
@ -1,5 +0,0 @@
-# Meshtastic - uses official image
-FROM meshtastic/meshtastic:latest
-
-# Default configuration is in the image
-# No additional setup needed
--- a/apps/meshtastic/manifest.yml
+++ b/apps/meshtastic/manifest.yml
@ -1,69 +0,0 @@
-app:
-  id: meshtastic
-  name: Meshtastic
-  version: 2-daily-alpine
-  description: Open-source mesh networking for LoRa radios. Create decentralized communication networks.
-  
-  container:
-    image: docker.io/meshtastic/meshtasticd:daily-alpine
-    pull_policy: if-not-present
-    
-  dependencies:
-    - storage: 1Gi
-    
-  resources:
-    cpu_limit: 1
-    memory_limit: 512Mi
-    disk_limit: 1Gi
-    
-  security:
-    capabilities: [NET_ADMIN, SYS_ADMIN]  # Required for LoRa radio access
-    readonly_root: false  # Needs write access for device management
-    no_new_privileges: true
-    user: 1000
-    seccomp_profile: default
-    network_policy: host  # Requires host network for radio access
-    apparmor_profile: meshtastic
-    
-  ports:
-    - host: 4403
-      container: 4403
-      protocol: tcp  # Meshtastic TCP API
-    
-  devices:
-    - /dev/ttyUSB0  # LoRa radio device (if connected)
-    
-  volumes:
-    - type: bind
-      source: /var/lib/archipelago/meshtastic
-      target: /var/lib/meshtasticd
-      options: [rw]
-
-  files:
-    - path: /var/lib/archipelago/meshtastic/config.yaml
-      content: |
-        General:
-          MACAddress: AA:BB:CC:DD:EE:01
-        Webserver:
-          Port: 4403
-      
-  environment:
-    - MESHTASTIC_PORT=/dev/ttyUSB0
-    - MESHTASTIC_SERIAL=true
-    
-  health_check:
-    type: cmd
-    endpoint: test -f /var/lib/meshtasticd/config.yaml
-    interval: 30s
-    timeout: 30s
-    retries: 5
-    
-  networking:
-    mesh_enabled: true
-    local_network_access: true
-
-  metadata:
-    icon: /assets/img/app-icons/meshcore.svg
-    category: networking
-    tier: recommended
-    repo: https://github.com/meshtastic/firmware
--- a/apps/netbird-dashboard/manifest.yml
+++ b/apps/netbird-dashboard/manifest.yml
@ -0,0 +1,77 @@
+app:
+  id: netbird-dashboard
+  name: NetBird Dashboard
+  version: "2.38.0"
+  description: NetBird management dashboard (SPA). Internal stack member served through the netbird proxy.
+  category: networking
+
+  # Hyphen name matches runtime references + the live container (adoption).
+  # Alias `netbird-dashboard` is the short hostname the proxy's nginx proxies to.
+  container_name: netbird-dashboard
+
+  container:
+    image: docker.io/netbirdio/dashboard:v2.38.0
+    pull_policy: if-not-present
+    network: netbird-net
+    network_aliases: [netbird-dashboard]
+    # The dashboard SPA bakes its API/OIDC base URL from these at container
+    # start. They must point at the proxy's public HTTPS origin (8087) so the
+    # browser uses a secure context (window.crypto.subtle / OIDC PKCE, #15).
+    # {{HOST_IP}} is the node's primary host IP, resolved at apply time.
+    derived_env:
+      - key: NETBIRD_MGMT_API_ENDPOINT
+        template: "https://{{HOST_IP}}:8087"
+      - key: NETBIRD_MGMT_GRPC_API_ENDPOINT
+        template: "https://{{HOST_IP}}:8087"
+      - key: AUTH_AUTHORITY
+        template: "https://{{HOST_IP}}:8087/oauth2"
+
+  dependencies:
+    - app_id: netbird-server
+
+  resources:
+    memory_limit: 256Mi
+
+  security:
+    # cap-drop=ALL is applied by the orchestrator. The dashboard image runs
+    # nginx (master as root, drops workers) binding :80 — needs the worker-drop
+    # caps + NET_BIND_SERVICE for the privileged port.
+    capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
+    readonly_root: false
+    network_policy: isolated
+
+  # Internal only — reached container-to-container by the proxy via netbird-net.
+  ports: []
+
+  volumes: []
+
+  environment:
+    - AUTH_AUDIENCE=netbird-dashboard
+    - AUTH_CLIENT_ID=netbird-dashboard
+    - AUTH_CLIENT_SECRET=
+    - USE_AUTH0=false
+    - AUTH_SUPPORTED_SCOPES=openid profile email groups
+    - AUTH_REDIRECT_URI=/nb-auth
+    - AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
+    - NETBIRD_TOKEN_SOURCE=idToken
+    - NGINX_SSL_PORT=443
+    - LETSENCRYPT_DOMAIN=none
+
+  health_check:
+    type: tcp
+    endpoint: localhost:80
+    interval: 30s
+    timeout: 5s
+    retries: 5
+    start_period: 20s
+
+  metadata:
+    author: NetBird
+    icon: /assets/img/app-icons/netbird.svg
+    website: https://netbird.io
+    repo: https://github.com/netbirdio/dashboard
+    license: BSD-3-Clause
+    tags:
+      - networking
+      - vpn
+      - dashboard
--- a/apps/netbird-server/manifest.yml
+++ b/apps/netbird-server/manifest.yml
@ -0,0 +1,122 @@
+app:
+  id: netbird-server
+  name: NetBird Server
+  version: "0.71.2"
+  description: NetBird combined management / signal / relay server with an embedded identity provider and STUN. Backend for the self-hosted NetBird mesh VPN.
+  category: networking
+
+  # Hyphen name matches the runtime references (crash_recovery / dependencies /
+  # config startup order) + the live container, so on an existing node the
+  # orchestrator ADOPTS the running server rather than recreating it (data +
+  # the sqlite store under /var/lib/netbird preserved). Alias `netbird-server`
+  # is the short hostname the proxy's nginx proxies/grpc-passes to.
+  container_name: netbird-server
+
+  container:
+    image: docker.io/netbirdio/netbird-server:0.71.2
+    pull_policy: if-not-present
+    network: netbird-net
+    network_aliases: [netbird-server]
+    # The relay authSecret and the sqlite store encryptionKey are base64 keys
+    # (the server base64-decodes them to recover raw bytes — hex would decode to
+    # the wrong value). Generated once and reused: ensure_generated_secrets
+    # no-ops when the file already exists, so a re-render of config.yaml on an
+    # adopted node keeps the same keys (regenerating would orphan the store).
+    generated_secrets:
+      - name: netbird-relay-auth-secret
+        kind: base64
+      - name: netbird-store-encryption-key
+        kind: base64
+    # Pass the rendered config explicitly, mirroring the legacy `--config` arg.
+    custom_args: ["--config", "/etc/netbird/config.yaml"]
+
+  dependencies:
+    - storage: 1Gi
+
+  resources:
+    memory_limit: 1Gi
+
+  security:
+    # cap-drop=ALL is applied by the orchestrator. The server binds :80
+    # (management/signal/relay HTTP + gRPC) inside the container — a privileged
+    # port — so it needs NET_BIND_SERVICE. STUN is 3478/udp (unprivileged).
+    capabilities: [NET_BIND_SERVICE]
+    readonly_root: false
+    network_policy: isolated
+
+  ports:
+    - host: 8086
+      container: 80
+      protocol: tcp   # management API + embedded OIDC issuer (/oauth2)
+    - host: 3478
+      container: 3478
+      protocol: udp   # STUN — must be UDP; tcp here breaks relay discovery
+
+  volumes:
+    - type: bind
+      source: /var/lib/archipelago/netbird/data
+      target: /var/lib/netbird
+      options: [rw]
+    # The rendered config.yaml, read-only. Re-rendered on every reconcile from
+    # host facts + the base64 secrets; idempotent (stable bytes → no restart).
+    - type: bind
+      source: /var/lib/archipelago/netbird/config.yaml
+      target: /etc/netbird/config.yaml
+      options: [ro]
+
+  environment: []
+
+  # The server's config. {{HOST_IP}} is the node's primary host IP (the proxy's
+  # public origin is https on 8087 — the dashboard needs a secure context for
+  # OIDC PKCE, issue #15). {{secret:...}} are read 0600 from the secrets dir.
+  files:
+    - path: /var/lib/archipelago/netbird/config.yaml
+      overwrite: true
+      content: |
+        server:
+          listenAddress: ":80"
+          exposedAddress: "https://{{HOST_IP}}:8087"
+          stunPorts:
+            - 3478
+          metricsPort: 9090
+          healthcheckAddress: ":9000"
+          logLevel: "info"
+          logFile: "console"
+          authSecret: "{{secret:netbird-relay-auth-secret}}"
+          dataDir: "/var/lib/netbird"
+          auth:
+            issuer: "https://{{HOST_IP}}:8087/oauth2"
+            localAuthDisabled: false
+            signKeyRefreshEnabled: false
+            dashboardRedirectURIs:
+              - "https://{{HOST_IP}}:8087/nb-auth"
+              - "https://{{HOST_IP}}:8087/nb-silent-auth"
+            dashboardPostLogoutRedirectURIs:
+              - "https://{{HOST_IP}}:8087/"
+            cliRedirectURIs:
+              - "http://localhost:53000/"
+          store:
+            engine: "sqlite"
+            encryptionKey: "{{secret:netbird-store-encryption-key}}"
+
+  # TCP liveness on the management port. Binds at startup, stays green; an http
+  # check of /oauth2 would false-fail while the issuer warms up.
+  health_check:
+    type: tcp
+    endpoint: localhost:80
+    interval: 30s
+    timeout: 5s
+    retries: 10
+    start_period: 30s
+
+  metadata:
+    author: NetBird
+    icon: /assets/img/app-icons/netbird.svg
+    website: https://netbird.io
+    repo: https://github.com/netbirdio/netbird
+    license: BSD-3-Clause
+    tags:
+      - networking
+      - vpn
+      - wireguard
+      - mesh
--- a/apps/netbird/manifest.yml
+++ b/apps/netbird/manifest.yml
@ -0,0 +1,182 @@
+app:
+  id: netbird
+  name: NetBird
+  version: "2.38.0"
+  description: Self-hosted WireGuard mesh VPN control plane with dashboard, embedded identity provider, management API, signal, relay, and STUN. The user-facing entry point — a TLS proxy in front of the dashboard + server.
+  category: networking
+
+  # The user-facing launcher (app_id + container both "netbird", matching the
+  # runtime references + the live container so the orchestrator adopts it). This
+  # is the nginx that terminates TLS on 8087 and fans out to the dashboard +
+  # server by their short aliases on netbird-net.
+  container_name: netbird
+
+  container:
+    image: docker.io/library/nginx:1.27-alpine
+    pull_policy: if-not-present
+    network: netbird-net
+    # Self-signed TLS cert materialised before create — the dashboard needs a
+    # secure context (window.crypto.subtle / OIDC PKCE, issue #15), so the proxy
+    # serves HTTPS. Idempotent: kept as-is when crt+key already exist (a user
+    # accepts it once). SAN defaults to the host IP + 127.0.0.1 + localhost.
+    generated_certs:
+      - crt: /var/lib/archipelago/netbird/tls.crt
+        key: /var/lib/archipelago/netbird/tls.key
+
+  dependencies:
+    - app_id: netbird-server
+    - app_id: netbird-dashboard
+    - storage: 1Gi
+
+  resources:
+    memory_limit: 256Mi
+
+  security:
+    # cap-drop=ALL is applied by the orchestrator. nginx (master as root, drops
+    # workers) binds :443 — needs the worker-drop caps + NET_BIND_SERVICE.
+    capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
+    readonly_root: false
+    network_policy: isolated
+
+  ports:
+    # 8087 publishes the TLS listener (container :443). HTTPS is required for the
+    # dashboard's secure context (issue #15).
+    - host: 8087
+      container: 443
+      protocol: tcp
+
+  volumes:
+    - type: bind
+      source: /var/lib/archipelago/netbird/nginx.conf
+      target: /etc/nginx/conf.d/default.conf
+      options: [ro]
+    - type: bind
+      source: /var/lib/archipelago/netbird/tls.crt
+      target: /etc/nginx/tls.crt
+      options: [ro]
+    - type: bind
+      source: /var/lib/archipelago/netbird/tls.key
+      target: /etc/nginx/tls.key
+      options: [ro]
+
+  environment: []
+
+  # The proxy config. {{NETWORK_GATEWAY}} is the netbird-net bridge gateway =
+  # Podman's aardvark DNS. nginx uses it as an explicit `resolver` with VARIABLE
+  # upstreams so it re-resolves container names per request — without it nginx
+  # pins a container IP at startup and 502s forever once that IP moves on a
+  # restart/reboot (issue #15, observed live on .198). Every #15 fix below
+  # (CORS $http_origin reflect, grpc pass, nb-auth/nb-silent-auth rewrite to
+  # index.html, /relay websocket) is preserved verbatim from the legacy config.
+  files:
+    - path: /var/lib/archipelago/netbird/nginx.conf
+      overwrite: true
+      content: |
+        server {
+            listen 443 ssl;
+            server_name _;
+
+            # netbird's dashboard needs a secure context (window.crypto.subtle for
+            # OIDC PKCE), so the proxy terminates TLS with a self-signed cert (#15).
+            ssl_certificate /etc/nginx/tls.crt;
+            ssl_certificate_key /etc/nginx/tls.key;
+
+            # Rootless Podman can hand a container a new IP across restarts/reboots.
+            # nginx resolves a literal upstream name ONCE at startup and caches it,
+            # so after the IP moves every request 502s with "host unreachable"
+            # (issue #15, observed live on .198: nginx pinned to a dead
+            # netbird-dashboard IP). Fix: point `resolver` at the netbird-net
+            # gateway (Podman's aardvark DNS) and use VARIABLE upstreams, which
+            # forces nginx to re-resolve the container names at request time.
+            resolver {{NETWORK_GATEWAY}} valid=10s ipv6=off;
+
+            proxy_set_header Host $host;
+            proxy_set_header X-Real-IP $remote_addr;
+            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+            proxy_set_header X-Forwarded-Proto $scheme;
+            proxy_http_version 1.1;
+
+            location ~ ^/(relay|ws-proxy/) {
+                set $nb_server netbird-server;
+                proxy_pass http://$nb_server:80;
+                proxy_set_header Upgrade $http_upgrade;
+                proxy_set_header Connection "upgrade";
+                proxy_read_timeout 1d;
+            }
+
+            location ~ ^/(api|oauth2)(/|$) {
+                # The dashboard is a SPA whose API/OIDC base URL is baked at build
+                # time to one host:port. A single box is reached via several
+                # addresses, so those fetches are cross-origin and the browser
+                # blocks them with no Access-Control-Allow-Origin (#15, live on
+                # .198). Reflect the caller's Origin and answer the CORS preflight.
+                if ($request_method = OPTIONS) {
+                    add_header Access-Control-Allow-Origin $http_origin always;
+                    add_header Access-Control-Allow-Credentials true always;
+                    add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
+                    add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
+                    add_header Access-Control-Max-Age 86400 always;
+                    add_header Content-Length 0;
+                    return 204;
+                }
+                add_header Access-Control-Allow-Origin $http_origin always;
+                add_header Access-Control-Allow-Credentials true always;
+                add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
+                add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
+                set $nb_server netbird-server;
+                proxy_pass http://$nb_server:80;
+            }
+
+            location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {
+                set $nb_server netbird-server;
+                grpc_pass grpc://$nb_server:80;
+                grpc_read_timeout 1d;
+                grpc_send_timeout 1d;
+            }
+
+            # OIDC callback routes are client-side SPA routes with NO prebuilt page
+            # in the dashboard bundle, so proxying them straight through 404s —
+            # which crashes the dashboard's auth init and shows "Unauthenticated"
+            # with dead buttons (#15, live on .198: /nb-auth + /nb-silent-auth
+            # returned 404). Serve index.html at these paths (URL unchanged) so
+            # react-oidc boots and completes the login / silent-SSO.
+            location ~ ^/(nb-auth|nb-silent-auth) {
+                set $nb_dashboard netbird-dashboard;
+                rewrite ^.*$ /index.html break;
+                proxy_pass http://$nb_dashboard:80;
+            }
+
+            location / {
+                set $nb_dashboard netbird-dashboard;
+                proxy_pass http://$nb_dashboard:80;
+            }
+        }
+
+  health_check:
+    type: tcp
+    endpoint: localhost:443
+    interval: 30s
+    timeout: 5s
+    retries: 5
+    start_period: 20s
+
+  interfaces:
+    main:
+      name: Dashboard
+      description: Manage your self-hosted NetBird mesh VPN
+      type: ui
+      port: 8087
+      protocol: https
+      path: /
+
+  metadata:
+    author: NetBird
+    icon: /assets/img/app-icons/netbird.svg
+    website: https://netbird.io
+    repo: https://github.com/netbirdio/netbird
+    license: BSD-3-Clause
+    tags:
+      - networking
+      - vpn
+      - wireguard
+      - mesh
--- a/core/archipelago/src/api/rpc/container.rs
+++ b/core/archipelago/src/api/rpc/container.rs
@ -171,6 +171,13 @@ impl RpcHandler {
        // than the WebSocket-delivered package_data, which caused apps to flicker
        // between "installed" and "not-installed" in the UI.
        let (data, _) = self.state_manager.get_snapshot().await;
+        // Apps the user explicitly stopped must read as "stopped" even though a
+        // UI companion (electrs-ui, bitcoin-ui, …) keeps serving the launch port:
+        // launch_port_reachable() below would otherwise upgrade an exited backend
+        // back to "running". The reconcile guard keeps these backends down, so the
+        // marker is authoritative here.
+        let user_stopped =
+            crate::crash_recovery::load_user_stopped(&self.config.data_dir).await;
        if data.server_info.status_info.containers_scanned && !data.package_data.is_empty() {
            let mut containers = Vec::with_capacity(data.package_data.len());
            for (id, pkg) in &data.package_data {
@ -202,7 +209,11 @@ impl RpcHandler {
                // Scanner backoff preserves cached package_data. Refresh stable
                // states so callers do not see stale `running`/`exited` after
                // health-monitor recovery or Quadlet --rm container removal.
-                if state == "running" && requires_launch_port_for_health(id) {
+                if user_stopped.contains(id) {
+                    // User stopped it → authoritative "stopped". Do NOT let a
+                    // still-running UI companion's launch port mark it running.
+                    state = "stopped".to_string();
+                } else if state == "running" && requires_launch_port_for_health(id) {
                    if !self.cached_reachable_health(id).await?.is_some() {
                        state = live_state_for_app(id)
                            .await
--- a/core/archipelago/src/api/rpc/dispatcher.rs
+++ b/core/archipelago/src/api/rpc/dispatcher.rs
@ -57,6 +57,8 @@ impl RpcHandler {
            "package.uninstall" => self.clone().spawn_package_uninstall(params).await,
            "package.update" => self.clone().spawn_package_update(params).await,
            "package.check-updates" => self.handle_package_check_updates(params).await,
+            "package.versions" => self.handle_package_versions(params).await,
+            "package.set-config" => self.clone().handle_package_set_config(params).await,
            "package.credentials" => self.handle_package_credentials(params).await,
            "app.filebrowser-token" => self.handle_filebrowser_token().await,

--- a/core/archipelago/src/api/rpc/package/dependencies.rs
+++ b/core/archipelago/src/api/rpc/package/dependencies.rs
@ -376,16 +376,31 @@ pub(super) fn startup_order(package_id: &str) -> &'static [&'static str] {
 /// order for the given app. Unknown containers sort to the end.
 pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec<String>> {
    let containers = get_containers_for_app(package_id).await?;
+    Ok(order_present_containers(package_id, containers))
+}
+
+/// Order the *actually-present* containers of an app by its dependency-aware
+/// startup order. Containers whose name is unknown to the order list sort to
+/// the end, preserving their relative input order.
+///
+/// This deliberately does NOT inject order entries that aren't live
+/// containers. `startup_order` is a union of container-name variants across
+/// install generations (e.g. `mysql-mempool` vs `archy-mempool-db`), so any
+/// single install only ever has a subset of those names. Injecting a phantom
+/// name makes the start path fail on a "no such object" inspect — and because
+/// `do_orchestrator_package_start` propagates the unknown-app-id fallback
+/// error via `?`, every later member (the api + frontend) is then skipped,
+/// leaving the stack down until the health monitor recovers it minutes later.
+/// That was the source of mempool gate flakes #73 (frontend) / #74 (api).
+fn order_present_containers(package_id: &str, containers: Vec<String>) -> Vec<String> {
+    if containers.is_empty() {
+        // Nothing is live under any known name. Fall back to the package id so
+        // a single-container app whose container matches its id still gets one
+        // start attempt; multi-container stacks with no live members are
+        // surfaced as "no containers" by the caller's emptiness check.
+        return vec![package_id.to_string()];
+    }
    let order = startup_order(package_id);
-    if order.is_empty() && containers.is_empty() {
-        return Ok(vec![package_id.to_string()]);
-    }
-    let mut sorted = containers;
-    for required in order {
-        if !sorted.iter().any(|name| name == required) {
-            sorted.push((*required).to_string());
-        }
-    }
    // If no special order is defined, fall back to mempool order for legacy
    // multi-container names that may still be returned by config lookups.
    let effective_order: &[&str] = if order.is_empty() {
@ -393,8 +408,14 @@ pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec
    } else {
        order
    };
-    sorted.sort_by_key(|c| effective_order.iter().position(|o| *o == c).unwrap_or(99));
-    Ok(sorted)
+    let mut sorted = containers;
+    sorted.sort_by_key(|c| {
+        effective_order
+            .iter()
+            .position(|o| *o == c)
+            .unwrap_or(usize::MAX)
+    });
+    sorted
 }

 /// Configure Fedimint Gateway to use LND instead of LDK.
@ -452,7 +473,48 @@ pub(super) fn configure_fedimint_lnd(

 #[cfg(test)]
 mod tests {
-    use super::{requires_unpruned_bitcoin, startup_order};
+    use super::{order_present_containers, requires_unpruned_bitcoin, startup_order};
+
+    #[test]
+    fn order_present_containers_never_injects_phantom_stack_members() {
+        // The live mempool stack on a node: db + api + frontend. These are the
+        // only real container names; the startup_order list also contains
+        // variant/legacy names (mysql-mempool, archy-mempool-api, ...) that are
+        // NOT live here and must never appear in the result — a phantom name in
+        // the start list aborts the orchestrator start mid-sequence (gate
+        // #73/#74).
+        let present = vec![
+            "mempool".to_string(),
+            "mempool-api".to_string(),
+            "archy-mempool-db".to_string(),
+        ];
+        let ordered = order_present_containers("mempool", present);
+        // Dependency order: db -> api -> frontend.
+        assert_eq!(ordered, vec!["archy-mempool-db", "mempool-api", "mempool"]);
+        // No phantom variants leaked in.
+        for phantom in ["mysql-mempool", "archy-mempool-api", "archy-mempool-web"] {
+            assert!(
+                !ordered.iter().any(|c| c == phantom),
+                "phantom {phantom} must not be injected"
+            );
+        }
+    }
+
+    #[test]
+    fn order_present_containers_orders_known_before_unknown() {
+        let present = vec!["mempool".to_string(), "some-sidecar".to_string()];
+        let ordered = order_present_containers("mempool", present);
+        // The known frontend sorts ahead of an unknown sidecar.
+        assert_eq!(ordered, vec!["mempool", "some-sidecar"]);
+    }
+
+    #[test]
+    fn order_present_containers_empty_falls_back_to_package_id() {
+        assert_eq!(
+            order_present_containers("mempool", vec![]),
+            vec!["mempool".to_string()]
+        );
+    }

    #[test]
    fn btcpay_start_order_includes_required_stack_members() {
--- a/core/archipelago/src/api/rpc/package/install.rs
+++ b/core/archipelago/src/api/rpc/package/install.rs
@ -243,6 +243,17 @@ impl RpcHandler {
            }
        }

+        // Multi-version support: honor an install-time version selection for the
+        // orchestrator-managed Bitcoin apps. Selecting the catalog default (or
+        // omitting `version`) leaves the app unpinned (tracks latest); selecting
+        // an older version pins it so install_fresh resolves that image and the
+        // update badge stays suppressed. See docs/bitcoin-multi-version-design.md.
+        if matches!(package_id, "bitcoin-core" | "bitcoin-knots") {
+            if let Some(version) = params.get("version").and_then(|v| v.as_str()) {
+                persist_install_version_selection(package_id, version).await;
+            }
+        }
+
        // Phase: Preparing — emit BEFORE the stack dispatch so multi-container
        // stacks also flip state to Installing immediately. Without this, the
        // backend's package state for stack apps stayed empty until the first
@ -2427,6 +2438,36 @@ exit 2
    }
 }

+/// Persist an install-time version selection for a multi-version app. Selecting
+/// the catalog default (or a version equal to it) un-pins so the app tracks
+/// latest; selecting any other version pins it. Best-effort: a write failure
+/// just means the app installs at the catalog default.
+async fn persist_install_version_selection(app_id: &str, version: &str) {
+    use crate::container::version_config::{read, write, AppVersionConfig};
+    let is_default = crate::container::app_catalog::catalog_default_version(app_id)
+        .map(|d| d == version)
+        .unwrap_or(false);
+    let existing = read(app_id);
+    let cfg = AppVersionConfig {
+        pinned_version: if is_default {
+            None
+        } else {
+            Some(version.to_string())
+        },
+        auto_update: existing.auto_update,
+    };
+    if let Err(e) = write(app_id, &cfg) {
+        tracing::warn!(app_id, version, error = %e, "failed to persist install-time version selection");
+    } else {
+        tracing::info!(
+            app_id,
+            version,
+            pinned = !is_default,
+            "persisted install-time version selection"
+        );
+    }
+}
+
 fn should_try_orchestrator_install(package_id: &str, orchestrator_available: bool) -> bool {
    orchestrator_available && uses_orchestrator_install_flow(package_id)
 }
--- a/core/archipelago/src/api/rpc/package/mod.rs
+++ b/core/archipelago/src/api/rpc/package/mod.rs
@ -5,6 +5,7 @@ mod install;
 mod lifecycle;
 mod progress;
 mod runtime;
+mod set_config;
 mod stacks;
 mod update;
 mod validation;
--- a/core/archipelago/src/api/rpc/package/runtime.rs
+++ b/core/archipelago/src/api/rpc/package/runtime.rs
@ -22,6 +22,11 @@ const PODMAN_LOG_TIMEOUT: Duration = Duration::from_secs(15);
 /// Per-container graceful shutdown timeout in seconds.
 /// Bitcoin Core needs 600s to flush UTXO set, LND 330s for channel state,
 /// indexers 300s for index flush, databases 120s for WAL/transaction commit.
+///
+/// MIRRORS `archipelago_container::runtime::stop_grace_secs_for` (which returns
+/// `u64` and is the canonical table used by the orchestrator stop path). This
+/// `&str` variant exists for the legacy `podman stop -t <s>` call sites here —
+/// keep the two tables in sync until those are migrated to the orchestrator.
 pub fn stop_timeout_secs(container_name: &str) -> &'static str {
    let id = container_name
        .strip_prefix("archy-")
@ -307,7 +312,16 @@ impl RpcHandler {

        let mut stopped = 0u32;
        let mut removed = 0u32;
-        let mut errors = Vec::new();
+        // Two distinct failure classes, kept separate so they don't get
+        // conflated (the old single `errors` vec did, which caused the "ghost in
+        // My Apps" bug): `container_errors` means a container could NOT be
+        // removed (force-rm failed too) — the app is genuinely still present, so
+        // we keep its state entry and surface a hard error. `cleanup_errors`
+        // means volume/network/data-dir teardown left residue — the containers
+        // are already gone, so the app IS uninstalled and MUST disappear from My
+        // Apps; the residue is logged but never ghosts the app.
+        let mut container_errors: Vec<String> = Vec::new();
+        let mut cleanup_errors: Vec<String> = Vec::new();

        self.set_uninstall_stage(
            package_id,
@ -365,7 +379,7 @@ impl RpcHandler {
                            let msg =
                                format!("Failed to remove {}: {}; {}", name, stderr.trim(), e);
                            tracing::error!("Uninstall {}: {}", package_id, msg);
-                            errors.push(msg);
+                            container_errors.push(msg);
                        }
                    }
                }
@ -374,12 +388,35 @@ impl RpcHandler {
                    Err(force_err) => {
                        let msg = format!("Failed to remove {}: {}; {}", name, e, force_err);
                        tracing::error!("Uninstall {}: {}", package_id, msg);
-                        errors.push(msg);
+                        container_errors.push(msg);
                    }
                },
            }
        }

+        // A container that survived even force-remove means the app is NOT
+        // actually uninstalled — keep its state entry and fail so the spawned
+        // task reverts it to its prior state (and the user can retry), rather
+        // than orphaning a live container that's missing from My Apps.
+        if !container_errors.is_empty() {
+            tracing::error!(
+                "Uninstall {}: containers could not be removed: {:?}",
+                package_id,
+                container_errors
+            );
+            return Err(anyhow::anyhow!(
+                "Uninstall {} failed: {}",
+                package_id,
+                container_errors.join("; ")
+            ));
+        }
+
+        // Containers are gone → the app is uninstalled. Remove its state entry
+        // NOW, before the (possibly slow, possibly fallible) volume/data
+        // teardown below, so My Apps updates immediately and a residue failure
+        // can never leave a ghost. Reinstall/scan no longer see a stale entry.
+        self.remove_package_state_entry(package_id).await;
+
        self.set_uninstall_stage(package_id, "Cleaning up volumes")
            .await;
        // Avoid global Podman volume prune on production nodes: store-wide
@ -427,70 +464,73 @@ impl RpcHandler {
                        let stderr = String::from_utf8_lossy(&o.stderr);
                        let msg = format!("Failed to remove data {}: {}", dir, stderr.trim());
                        tracing::error!("Uninstall {}: {}", package_id, msg);
-                        errors.push(msg);
+                        cleanup_errors.push(msg);
                    }
                    Err(e) => {
                        let msg = format!("Failed to remove data {}: {}", dir, e);
                        tracing::error!("Uninstall {}: {}", package_id, msg);
-                        errors.push(msg);
+                        cleanup_errors.push(msg);
                    }
                    _ => {}
                }
            }
        }

-        if !errors.is_empty() {
+        // The app is already gone from My Apps (entry removed above). Residual
+        // volume/data cleanup failures are logged but NEVER ghost the app — a
+        // reinstall and the next uninstall both tolerate leftover dirs.
+        if !cleanup_errors.is_empty() {
            tracing::error!(
-                "Uninstall {} completed with errors: {:?}",
+                "Uninstall {} removed but left cleanup residue: {:?}",
                package_id,
-                errors
+                cleanup_errors
            );
-            return Err(anyhow::anyhow!(
-                "Uninstall {} partially failed: {}",
-                package_id,
-                errors.join("; ")
-            ));
        }

        tracing::info!(
-            "Uninstall {} complete: stopped={}, removed={}",
+            "Uninstall {} complete: stopped={}, removed={}, cleanup_errors={}",
            package_id,
            stopped,
-            removed
+            removed,
+            cleanup_errors.len()
        );

-        // Immediately remove from in-memory state so the UI updates without
-        // waiting for the scanner's absence threshold (3 scans × 60s each).
-        {
-            let (mut data, _rev) = self.state_manager.get_snapshot().await;
-            let before = data.package_data.len();
-            data.package_data.remove(package_id);
-            // Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin")
-            let aliases: Vec<String> = data
-                .package_data
-                .keys()
-                .filter(|k| {
-                    super::config::all_container_names(package_id)
-                        .iter()
-                        .any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
-                })
-                .cloned()
-                .collect();
-            for alias in &aliases {
-                data.package_data.remove(alias);
-            }
-            if data.package_data.len() < before {
-                self.state_manager.update_data(data).await;
-            }
-        }
-
        Ok(serde_json::json!({
            "status": "uninstalled",
            "stopped": stopped,
            "removed": removed,
+            "cleanup_warnings": cleanup_errors,
        }))
    }

+    /// Remove a package's entry (and any alias keys) from persisted state so it
+    /// disappears from My Apps immediately, without waiting for the scanner's
+    /// absence threshold (3 scans × 60s). Called as soon as an uninstall has
+    /// removed the app's containers — before the slower volume/data teardown —
+    /// so a residue failure can never leave a ghost entry behind.
+    async fn remove_package_state_entry(&self, package_id: &str) {
+        let (mut data, _rev) = self.state_manager.get_snapshot().await;
+        let before = data.package_data.len();
+        data.package_data.remove(package_id);
+        // Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin").
+        let aliases: Vec<String> = data
+            .package_data
+            .keys()
+            .filter(|k| {
+                super::config::all_container_names(package_id)
+                    .iter()
+                    .any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
+            })
+            .cloned()
+            .collect();
+        for alias in &aliases {
+            data.package_data.remove(alias);
+        }
+        if data.package_data.len() < before {
+            self.state_manager.update_data(data).await;
+        }
+    }
+
    /// Start a bundled app (create container from pre-loaded image if needed).
    pub(in crate::api::rpc) async fn handle_bundled_app_start(
        &self,
--- a/core/archipelago/src/api/rpc/package/set_config.rs
+++ b/core/archipelago/src/api/rpc/package/set_config.rs
@ -0,0 +1,268 @@
+//! Multi-version support — version listing + in-app version switch / pin /
+//! auto-update toggle (`docs/bitcoin-multi-version-design.md` §3 Phase 3).
+//!
+//! Two RPCs:
+//!   - `package.versions` — read the selectable versions for an app plus the
+//!     runner's current pin / auto-update preference and (best-effort) the
+//!     version actually running. Drives the install modal + "Version & Updates"
+//!     card.
+//!   - `package.set-config` — persist a version pin (or un-pin to track latest)
+//!     and/or the auto-update toggle, then recreate the app at the chosen image
+//!     when the version actually changed. A DOWNGRADE (older release over a
+//!     newer chainstate — the highest-risk operation, design §4) is refused
+//!     unless the caller passes `confirm: true`, so the UI can warn first.
+
+use super::config::get_containers_for_app;
+use super::install::install_log;
+use super::validation::validate_app_id;
+use crate::api::rpc::RpcHandler;
+use crate::container::{app_catalog, version_config};
+use anyhow::Result;
+use std::sync::Arc;
+use tracing::{info, warn};
+
+/// Apps that participate in multi-version selection today. Kept narrow on
+/// purpose: version switching recreates the container, which is only safe for
+/// the single-container, orchestrator-managed Bitcoin backends whose data and
+/// downgrade semantics we understand. Any app the catalog gives a `versions[]`
+/// list also qualifies (third-party registry apps inherit the capability).
+fn supports_versions(app_id: &str) -> bool {
+    matches!(app_id, "bitcoin-core" | "bitcoin-knots")
+        || !app_catalog::catalog_versions(app_id).is_empty()
+}
+
+/// Extract the tag from a full image reference, leaving a `registry:port/repo`
+/// host-port colon intact (only a colon AFTER the last `/` is a tag).
+fn image_tag(image: &str) -> Option<String> {
+    let after_slash = image.rsplit_once('/').map(|(_, r)| r).unwrap_or(image);
+    after_slash
+        .rsplit_once(':')
+        .map(|(_, tag)| tag.to_string())
+        .filter(|t| !t.is_empty())
+}
+
+/// Best-effort: the version tag of the backend container actually running for
+/// `app_id`, by inspecting its image. `None` when not installed or unreadable.
+async fn installed_version(app_id: &str) -> Option<String> {
+    let containers = get_containers_for_app(app_id).await.ok()?;
+    // Prefer the backend container (exact id / `archy-<id>`) over UI companions.
+    let name = containers
+        .iter()
+        .find(|n| n.as_str() == app_id || n.as_str() == format!("archy-{app_id}"))
+        .or_else(|| containers.first())?;
+    let out = tokio::process::Command::new("podman")
+        .args(["inspect", name, "--format", "{{.ImageName}}"])
+        .output()
+        .await
+        .ok()?;
+    if !out.status.success() {
+        return None;
+    }
+    let image = String::from_utf8_lossy(&out.stdout).trim().to_string();
+    image_tag(&image)
+}
+
+impl RpcHandler {
+    /// `package.versions` — what a runner can install / switch to for this app,
+    /// plus their current preference and the running version.
+    pub(in crate::api::rpc) async fn handle_package_versions(
+        &self,
+        params: Option<serde_json::Value>,
+    ) -> Result<serde_json::Value> {
+        let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
+        let app_id = params
+            .get("id")
+            .and_then(|v| v.as_str())
+            .ok_or_else(|| anyhow::anyhow!("Missing package id"))?;
+        validate_app_id(app_id)?;
+
+        let versions = app_catalog::catalog_versions(app_id);
+        let default = app_catalog::catalog_default_version(app_id);
+        let cfg = version_config::read(app_id);
+        let installed = installed_version(app_id).await;
+
+        Ok(serde_json::json!({
+            "id": app_id,
+            "supportsVersions": supports_versions(app_id),
+            "default": default,
+            "installedVersion": installed,
+            "pinnedVersion": cfg.pinned_version,
+            "autoUpdate": cfg.auto_update,
+            "versions": versions.iter().map(|v| serde_json::json!({
+                "version": v.version,
+                "default": v.default,
+                "deprecated": v.deprecated,
+                "eol": v.eol,
+            })).collect::<Vec<_>>(),
+        }))
+    }
+
+    /// `package.set-config` — persist version pin + auto-update preference and
+    /// recreate on an actual version change. Downgrades require `confirm:true`.
+    pub(in crate::api::rpc) async fn handle_package_set_config(
+        self: Arc<Self>,
+        params: Option<serde_json::Value>,
+    ) -> Result<serde_json::Value> {
+        let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
+        let app_id = params
+            .get("id")
+            .and_then(|v| v.as_str())
+            .ok_or_else(|| anyhow::anyhow!("Missing package id"))?
+            .to_string();
+        validate_app_id(&app_id)?;
+
+        if !supports_versions(&app_id) {
+            return Err(anyhow::anyhow!(
+                "{} has no selectable versions in the catalog",
+                app_id
+            ));
+        }
+
+        let confirm = params
+            .get("confirm")
+            .and_then(|v| v.as_bool())
+            .unwrap_or(false);
+        let existing = version_config::read(&app_id);
+        let default = app_catalog::catalog_default_version(&app_id);
+
+        // ---- Resolve the requested pin (if a version was supplied) ----------
+        // Absent `version` => leave the pin unchanged (an auto-update-only edit).
+        // `version == default` => un-pin (track latest). Any other version must
+        // exist in the catalog and resolve to a same-repo image, else reject.
+        let version_param = params
+            .get("version")
+            .and_then(|v| v.as_str())
+            .map(str::to_string);
+        let mut new_pin = existing.pinned_version.clone();
+        let mut version_changed = false;
+        if let Some(req) = version_param.as_deref() {
+            let resolved_pin = if default.as_deref() == Some(req) {
+                None // selecting the default un-pins
+            } else {
+                // Validate the version is real + same-repo before pinning.
+                if !app_catalog::catalog_versions(&app_id)
+                    .iter()
+                    .any(|v| v.version == req)
+                {
+                    return Err(anyhow::anyhow!(
+                        "version {} is not offered for {}",
+                        req,
+                        app_id
+                    ));
+                }
+                Some(req.to_string())
+            };
+            version_changed = resolved_pin != existing.pinned_version;
+            new_pin = resolved_pin;
+        }
+
+        let new_auto_update = params
+            .get("autoUpdate")
+            .and_then(|v| v.as_bool())
+            .unwrap_or(existing.auto_update);
+
+        // ---- Downgrade gate (design §4: warn + confirm + allow) -------------
+        // "Current" = what wrote the on-disk chainstate: the running version if
+        // we can read it, else the existing pin, else the catalog default.
+        if version_changed {
+            let target = version_param.as_deref().unwrap_or_default();
+            let current = installed_version(&app_id)
+                .await
+                .or_else(|| existing.pinned_version.clone())
+                .or_else(|| default.clone());
+            if let Some(current) = current {
+                if version_config::is_downgrade(&current, target) && !confirm {
+                    warn!(
+                        "set-config {}: refusing un-confirmed downgrade {} -> {}",
+                        app_id, current, target
+                    );
+                    return Ok(serde_json::json!({
+                        "status": "confirm_required",
+                        "kind": "downgrade",
+                        "id": app_id,
+                        "currentVersion": current,
+                        "targetVersion": target,
+                        "warning": format!(
+                            "Switching {app_id} from {current} down to {target} is a \
+                             downgrade. Bitcoin may refuse to start on a chainstate \
+                             written by the newer version without a full reindex, and \
+                             a pruned node can lose block data. Re-confirm to proceed."
+                        ),
+                    }));
+                }
+            }
+        }
+
+        // ---- Persist preference --------------------------------------------
+        version_config::write(
+            &app_id,
+            &version_config::AppVersionConfig {
+                pinned_version: new_pin.clone(),
+                auto_update: new_auto_update,
+            },
+        )?;
+        install_log(&format!(
+            "SET-CONFIG {}: pinned={:?} autoUpdate={} (version_changed={})",
+            app_id, new_pin, new_auto_update, version_changed
+        ))
+        .await;
+        info!(
+            app_id = %app_id,
+            pinned = ?new_pin,
+            auto_update = new_auto_update,
+            version_changed,
+            "package.set-config applied"
+        );
+
+        // ---- Recreate when the version actually changed + app is installed --
+        // The orchestrator's install/recreate path reads the pin we just wrote
+        // (prod_orchestrator image resolution), so reusing the update machinery
+        // pulls + recreates at the chosen image. An auto-update-only edit, or a
+        // change to a not-installed app, just persists the preference.
+        let mut recreating = false;
+        if version_changed {
+            let installed = get_containers_for_app(&app_id)
+                .await
+                .map(|c| !c.is_empty())
+                .unwrap_or(false);
+            if installed {
+                recreating = true;
+                // Fire the existing async update flow; it flips state to
+                // Updating and recreates honoring the new pin. The UI polls.
+                self.clone()
+                    .spawn_package_update(Some(serde_json::json!({ "id": app_id })))
+                    .await?;
+            }
+        }
+
+        Ok(serde_json::json!({
+            "status": "ok",
+            "id": app_id,
+            "pinnedVersion": new_pin,
+            "autoUpdate": new_auto_update,
+            "versionChanged": version_changed,
+            "recreating": recreating,
+        }))
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::image_tag;
+
+    #[test]
+    fn image_tag_keeps_registry_port_colon() {
+        assert_eq!(
+            image_tag("146.59.87.168:3000/lfg2025/bitcoin:28.4").as_deref(),
+            Some("28.4")
+        );
+        assert_eq!(
+            image_tag("146.59.87.168:3000/lfg2025/bitcoin-knots:29.3.knots20260508")
+                .as_deref(),
+            Some("29.3.knots20260508")
+        );
+        // No tag => None (don't mistake the registry port for a tag).
+        assert_eq!(image_tag("146.59.87.168:3000/lfg2025/bitcoin"), None);
+        assert_eq!(image_tag("docker.io/library/redis:7"), Some("7".to_string()));
+    }
+}
--- a/core/archipelago/src/api/rpc/package/stacks.rs
+++ b/core/archipelago/src/api/rpc/package/stacks.rs
@ -6,7 +6,6 @@
 use crate::api::rpc::RpcHandler;
 use crate::data_model::InstallPhase;
 use anyhow::{Context, Result};
-use base64::Engine;
 use std::process::Output;
 use std::time::Duration;
 use tracing::info;
@ -620,16 +619,25 @@ async fn install_stack_via_orchestrator(
    ))
    .await;

+    let mut installed = 0usize;
    for app_id in app_ids {
        match orchestrator.install(app_id).await {
            Ok(container_name) => {
+                installed += 1;
                install_log(&format!(
                    "INSTALL ORCH: {} stack — app {} installed as {}",
                    stack_name, app_id, container_name
                ))
                .await;
            }
-            Err(e) if e.to_string().contains("unknown app_id") => {
+            Err(e) if e.to_string().contains("unknown app_id") && installed == 0 => {
+                // None of the stack's manifests are known — the orchestrator
+                // can't render this stack at all, so defer to the legacy
+                // installer. Only safe when NOTHING was installed yet: once an
+                // earlier member is up, falling back would let the legacy path
+                // double-create containers on the same data dir (observed
+                // corrupting an immich postgres cluster — two postmasters, one
+                // PGDATA). A partial set means a deploy bug, not a legacy node.
                install_log(&format!(
                    "INSTALL ORCH SKIP: {} stack — app {} unknown, falling back to legacy stack installer",
                    stack_name, app_id
@ -637,6 +645,17 @@ async fn install_stack_via_orchestrator(
                .await;
                return Ok(None);
            }
+            Err(e) if e.to_string().contains("unknown app_id") => {
+                install_log(&format!(
+                    "INSTALL ORCH FAIL: {} stack — app {} unknown AFTER {} installed; refusing legacy fallback (would double-create on shared data)",
+                    stack_name, app_id, installed
+                ))
+                .await;
+                return Err(e.context(format!(
+                    "orchestrator stack install {} aborted: app {} has no manifest but {} member(s) already installed — deploy all stack manifests",
+                    stack_name, app_id, installed
+                )));
+            }
            Err(e) => {
                install_log(&format!(
                    "INSTALL ORCH FAIL: {} stack — app {} failed: {}",
@ -668,11 +687,42 @@ fn mempool_stack_app_ids() -> &'static [&'static str] {
    &["archy-mempool-db", "mempool-api", "archy-mempool-web"]
 }

-const REGISTRY: &str = "146.59.87.168:3000/lfg2025";
+fn immich_stack_app_ids() -> &'static [&'static str] {
+    // Install order = dependency order: db + cache before the server. The server
+    // app_id is the user-facing "immich" (canonical name + icon); its install is
+    // handled here (not recursively) since orchestrator.install bypasses the
+    // package.install routing that maps "immich" → this stack installer.
+    &["immich-postgres", "immich-redis", "immich"]
+}

-const NETBIRD_DASHBOARD_IMAGE: &str = "docker.io/netbirdio/dashboard:v2.38.0";
-const NETBIRD_SERVER_IMAGE: &str = "docker.io/netbirdio/netbird-server:0.71.2";
-const NETBIRD_PROXY_IMAGE: &str = "docker.io/library/nginx:1.27-alpine";
+fn netbird_stack_app_ids() -> &'static [&'static str] {
+    // Dependency/startup order: the combined management/signal/relay server
+    // first (it owns the base64 relay/store secrets + the sqlite store, and is
+    // the OIDC issuer the others point at), then the dashboard SPA, then the
+    // user-facing TLS proxy ("netbird", which carries the self-signed cert +
+    // the templated nginx.conf and is the launcher). Mirrors the netbird
+    // startup_order in dependencies.rs.
+    &["netbird-server", "netbird-dashboard", "netbird"]
+}
+
+fn indeedhub_stack_app_ids() -> &'static [&'static str] {
+    // Dependency order: backends + their generated secrets first, then the api
+    // (owns indeedhub-jwt; reads the db/minio secrets the backends materialised),
+    // then the ffmpeg worker, then the user-facing frontend ("indeedhub", which
+    // carries the post_install nginx hook). The frontend's nginx reaches the
+    // backends by their short network_aliases (api/minio/relay) on indeedhub-net.
+    &[
+        "indeedhub-postgres",
+        "indeedhub-redis",
+        "indeedhub-minio",
+        "indeedhub-relay",
+        "indeedhub-api",
+        "indeedhub-ffmpeg",
+        "indeedhub",
+    ]
+}
+
+const REGISTRY: &str = "146.59.87.168:3000/lfg2025";

 /// Pull an image with retry and exponential backoff (3 attempts).
 async fn pull_image_with_retry(image: &str) -> Result<()> {
@ -734,6 +784,17 @@ async fn pull_image_with_retry(image: &str) -> Result<()> {
 impl RpcHandler {
    /// Install Immich stack (postgres + redis + server).
    pub(super) async fn install_immich_stack(&self) -> Result<serde_json::Value> {
+        // Manifest-driven path (workstream B/C): render the stack from
+        // apps/immich-*/manifest.yml via the orchestrator (rootless Quadlet
+        // units, generated_secrets, reboot-survivable). Falls back to the legacy
+        // installer below only when the orchestrator doesn't know these app_ids
+        // (manifests not yet deployed). See docs/PRODUCTION-MASTER-PLAN.md.
+        if let Some(orchestrated) =
+            install_stack_via_orchestrator(self, "immich", immich_stack_app_ids()).await?
+        {
+            return Ok(orchestrated);
+        }
+
        if let Some(adopted) = adopt_stack_if_exists(
            "immich_server",
            "immich",
@ -1383,6 +1444,20 @@ impl RpcHandler {

    /// Install the IndeedHub multi-container stack.
    pub(super) async fn install_indeedhub_stack(&self) -> Result<serde_json::Value> {
+        // Manifest-driven path (#20 phase 3): render the 7-member stack from
+        // apps/indeedhub-*/manifest.yml via the orchestrator (dedicated
+        // indeedhub-net + network_aliases, generated_secrets, the frontend's
+        // post_install nginx hook, reboot-survivable). The manifests use the exact
+        // live container names / named volumes, so on an existing node this ADOPTS
+        // the running stack rather than recreating it (data preserved). Falls back
+        // to the legacy installer below only when the orchestrator doesn't know
+        // these app_ids (manifests not yet deployed). See PRODUCTION-MASTER-PLAN.md.
+        if let Some(orchestrated) =
+            install_stack_via_orchestrator(self, "indeedhub", indeedhub_stack_app_ids()).await?
+        {
+            return Ok(orchestrated);
+        }
+
        let registry = crate::container::registry::load_registries(&self.config.data_dir)
            .await
            .unwrap_or_default()
@ -1758,6 +1833,27 @@ impl RpcHandler {

    /// Install self-hosted NetBird (dashboard + combined management/signal/relay server).
    pub(super) async fn install_netbird_stack(&self) -> Result<serde_json::Value> {
+        // Manifest-driven path (#20 phase 4): render the 3-member stack from
+        // apps/netbird-*/manifest.yml via the orchestrator — dedicated
+        // netbird-net + network_aliases, base64 generated_secrets, a self-signed
+        // TLS cert (generated_certs) so the dashboard gets a secure context for
+        // OIDC PKCE (#15), and templated config.yaml/nginx.conf rendered from
+        // host facts + the netbird-net gateway. The manifests use the exact live
+        // container names, so on an existing node this ADOPTS the running stack
+        // rather than recreating it (the sqlite store + base64 keys are
+        // preserved — ensure_generated_secrets no-ops on existing files).
+        //
+        // #20 ph4: the legacy hardcoded `podman run` installer was DELETED — the
+        // signed catalog always ships apps/netbird-*/manifest.yml, so there is no
+        // in-Rust fallback. If the orchestrator doesn't know these app_ids and no
+        // running stack exists to adopt, install errors rather than silently
+        // diverging from the manifest contract.
+        if let Some(orchestrated) =
+            install_stack_via_orchestrator(self, "netbird", netbird_stack_app_ids()).await?
+        {
+            return Ok(orchestrated);
+        }
+
        if let Some(adopted) = adopt_stack_if_exists(
            "netbird",
            "netbird",
@ -1768,491 +1864,12 @@ impl RpcHandler {
            return Ok(adopted);
        }

-        install_log("INSTALL START: netbird stack (dashboard + server)").await;
-        info!("Installing self-hosted NetBird stack");
-
-        self.set_install_phase("netbird", InstallPhase::PullingImage)
-            .await;
-        for (i, image) in [
-            NETBIRD_DASHBOARD_IMAGE,
-            NETBIRD_SERVER_IMAGE,
-            NETBIRD_PROXY_IMAGE,
-        ]
-        .iter()
-        .enumerate()
-        {
-            self.set_install_progress("netbird", i as u64, 3).await;
-            pull_image_with_retry(image)
-                .await
-                .with_context(|| format!("Failed to pull NetBird image: {}", image))?;
-        }
-        self.set_install_progress("netbird", 3, 3).await;
-
-        for name in ["netbird", "netbird-dashboard", "netbird-server"] {
-            let _ = podman_stack_status(&["rm", "-f", name], PODMAN_STACK_PROBE_TIMEOUT).await;
-        }
-        let _ = podman_stack_status(
-            &["network", "rm", "-f", "netbird-net"],
-            PODMAN_STACK_PROBE_TIMEOUT,
+        anyhow::bail!(
+            "netbird manifests not available on this node — the signed catalog must provide apps/netbird-*/manifest.yml (legacy hardcoded installer removed in #20 ph4)"
        )
-        .await;
-
-        self.set_install_phase("netbird", InstallPhase::CreatingContainer)
-            .await;
-
-        tokio::fs::create_dir_all("/var/lib/archipelago/netbird/data")
-            .await
-            .context("Failed to create NetBird data directory")?;
-
-        let host_ip = detect_netbird_public_host_ip()
-            .await
-            .unwrap_or_else(|| self.config.host_ip.clone());
-
-        // Create the network FIRST so we can read back the gateway it was
-        // assigned — that gateway is Podman's aardvark DNS, which the proxy's
-        // nginx needs as an explicit `resolver` to re-resolve container names
-        // (issue #15: without it nginx caches a container IP and 502s forever
-        // once that IP changes on restart/reboot).
-        let _ = podman_stack_status(
-            &["network", "create", "netbird-net"],
-            PODMAN_STACK_PROBE_TIMEOUT,
-        )
-        .await;
-
-        let resolver_ip = netbird_net_resolver_ip().await;
-        write_netbird_config_files(&host_ip, &self.config.host_ip, &resolver_ip).await?;
-        ensure_netbird_tls_cert(&host_ip).await?;
-
-        let mut server_cmd = tokio::process::Command::new("podman");
-        server_cmd.args([
-            "run",
-            "-d",
-            "--name",
-            "netbird-server",
-            "--network",
-            "netbird-net",
-            "--network-alias",
-            "netbird-server",
-            "--restart=unless-stopped",
-            "-p",
-            "8086:80",
-            "-p",
-            "3478:3478/udp",
-            "-v",
-            "/var/lib/archipelago/netbird/data:/var/lib/netbird",
-            "-v",
-            "/var/lib/archipelago/netbird/config.yaml:/etc/netbird/config.yaml:ro",
-            NETBIRD_SERVER_IMAGE,
-            "--config",
-            "/etc/netbird/config.yaml",
-        ]);
-        run_required_stack_command("netbird", "create server", &mut server_cmd).await?;
-
-        self.set_install_phase("netbird", InstallPhase::StartingContainer)
-            .await;
-        tokio::time::sleep(std::time::Duration::from_secs(5)).await;
-
-        let mut dashboard_cmd = tokio::process::Command::new("podman");
-        dashboard_cmd.args([
-            "run",
-            "-d",
-            "--name",
-            "netbird-dashboard",
-            "--network",
-            "netbird-net",
-            // Explicit alias so the proxy can always resolve `netbird-dashboard`
-            // via Podman DNS — don't rely on implicit container-name aliasing.
-            "--network-alias",
-            "netbird-dashboard",
-            "--restart=unless-stopped",
-            "--env-file",
-            "/var/lib/archipelago/netbird/dashboard.env",
-            NETBIRD_DASHBOARD_IMAGE,
-        ]);
-        run_required_stack_command("netbird", "create dashboard", &mut dashboard_cmd).await?;
-
-        let mut proxy_cmd = tokio::process::Command::new("podman");
-        proxy_cmd.args([
-            "run",
-            "-d",
-            "--name",
-            "netbird",
-            "--network",
-            "netbird-net",
-            "--restart=unless-stopped",
-            // 8087 publishes the TLS listener — netbird's dashboard requires a
-            // secure context (window.crypto.subtle / OIDC PKCE), issue #15.
-            "-p",
-            "8087:443",
-            "-v",
-            "/var/lib/archipelago/netbird/nginx.conf:/etc/nginx/conf.d/default.conf:ro",
-            "-v",
-            "/var/lib/archipelago/netbird/tls.crt:/etc/nginx/tls.crt:ro",
-            "-v",
-            "/var/lib/archipelago/netbird/tls.key:/etc/nginx/tls.key:ro",
-            NETBIRD_PROXY_IMAGE,
-        ]);
-        run_required_stack_command("netbird", "create unified proxy", &mut proxy_cmd).await?;
-
-        wait_for_stack_containers(
-            "netbird",
-            &["netbird-server", "netbird-dashboard", "netbird"],
-            60,
-        )
-        .await?;
-
-        self.set_install_phase("netbird", InstallPhase::WaitingHealthy)
-            .await;
-        // Containers being "running" is NOT the same as the embedded OIDC
-        // provider being ready (#10). The dashboard SPA opens right after install
-        // and, if it loads before /oauth2/.well-known is served, caches a bad
-        // auth state — the user appears logged-in but can't log out until it
-        // self-corrects. Wait (best-effort) for OIDC discovery to answer before
-        // we report Done, so the first dashboard load sees a ready provider.
-        wait_for_netbird_oidc_ready(Duration::from_secs(60)).await;
-
-        self.set_install_phase("netbird", InstallPhase::PostInstall)
-            .await;
-        self.set_install_phase("netbird", InstallPhase::Done).await;
-        self.clear_install_progress("netbird").await;
-
-        install_log("INSTALL OK: netbird stack").await;
-        info!("NetBird stack installed");
-        Ok(serde_json::json!({
-            "success": true,
-            "package_id": "netbird",
-            "message": "NetBird self-hosted stack installed",
-        }))
    }
 }

-/// Best-effort wait for NetBird's embedded OIDC provider to start serving its
-/// discovery document. The management server publishes 8086:80 on the host and
-/// is the issuer at `/oauth2`, so its `.well-known/openid-configuration` is the
-/// signal that the dashboard's login/logout flow will work. Polls until a 2xx
-/// or the timeout — NEVER fails the install (the stack is already running; this
-/// only narrows the post-install race window in #10).
-async fn wait_for_netbird_oidc_ready(timeout: Duration) {
-    let url = "http://127.0.0.1:8086/oauth2/.well-known/openid-configuration";
-    let client = match reqwest::Client::builder()
-        .timeout(Duration::from_secs(5))
-        .build()
-    {
-        Ok(c) => c,
-        Err(_) => return,
-    };
-    let deadline = tokio::time::Instant::now() + timeout;
-    loop {
-        if let Ok(resp) = client.get(url).send().await {
-            if resp.status().is_success() {
-                info!("NetBird OIDC discovery is ready");
-                return;
-            }
-        }
-        if tokio::time::Instant::now() >= deadline {
-            info!("NetBird OIDC discovery not ready within timeout — proceeding anyway");
-            return;
-        }
-        tokio::time::sleep(Duration::from_secs(2)).await;
-    }
-}
-
-async fn read_or_generate_b64_secret(name: &str) -> String {
-    let path = format!("/var/lib/archipelago/secrets/{}", name);
-    if let Ok(val) = tokio::fs::read_to_string(&path).await {
-        let trimmed = val.trim().to_string();
-        if !trimmed.is_empty() {
-            return trimmed;
-        }
-    }
-    let mut buf = [0u8; 32];
-    rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut buf);
-    let secret = base64::engine::general_purpose::STANDARD.encode(buf);
-    let _ = tokio::fs::create_dir_all("/var/lib/archipelago/secrets").await;
-    let _ = tokio::fs::write(&path, &secret).await;
-    secret
-}
-
-/// Read the gateway of the `netbird-net` bridge. Podman runs its aardvark DNS
-/// resolver on this address, so nginx can use it as an explicit `resolver` to
-/// re-resolve container names at request time. Falls back to Podman's usual
-/// first-pool gateway if the inspect fails (best effort — config is rewritten
-/// on every (re)install).
-async fn netbird_net_resolver_ip() -> String {
-    let out = tokio::process::Command::new("podman")
-        .args([
-            "network",
-            "inspect",
-            "netbird-net",
-            "--format",
-            "{{range .Subnets}}{{.Gateway}}{{end}}",
-        ])
-        .output()
-        .await;
-    if let Ok(o) = out {
-        let gw = String::from_utf8_lossy(&o.stdout).trim().to_string();
-        if !gw.is_empty() && gw.parse::<std::net::IpAddr>().is_ok() {
-            return gw;
-        }
-    }
-    "10.89.0.1".to_string()
-}
-
-/// Generate a self-signed TLS cert for the netbird proxy if absent. The
-/// dashboard needs a secure context (window.crypto.subtle / OIDC PKCE), so the
-/// proxy serves HTTPS; a self-signed cert is sufficient (the user accepts it
-/// once when opening netbird in a tab). SAN covers the LAN IP plus
-/// localhost/127.0.0.1 so it's valid however the box is reached locally.
-async fn ensure_netbird_tls_cert(host_ip: &str) -> Result<()> {
-    let dir = "/var/lib/archipelago/netbird";
-    let crt = format!("{dir}/tls.crt");
-    let key = format!("{dir}/tls.key");
-    if tokio::fs::metadata(&crt).await.is_ok() && tokio::fs::metadata(&key).await.is_ok() {
-        return Ok(());
-    }
-    let _ = tokio::fs::create_dir_all(dir).await;
-    let san = format!("subjectAltName=IP:{host_ip},IP:127.0.0.1,DNS:localhost");
-    let status = tokio::process::Command::new("openssl")
-        .args([
-            "req",
-            "-x509",
-            "-newkey",
-            "rsa:2048",
-            "-nodes",
-            "-keyout",
-            &key,
-            "-out",
-            &crt,
-            "-days",
-            "3650",
-            "-subj",
-            &format!("/CN={host_ip}"),
-            "-addext",
-            &san,
-        ])
-        .status()
-        .await
-        .context("failed to run openssl for netbird TLS cert")?;
-    if !status.success() {
-        anyhow::bail!("openssl failed to generate netbird TLS cert");
-    }
-    Ok(())
-}
-
-async fn write_netbird_config_files(host_ip: &str, lan_ip: &str, resolver_ip: &str) -> Result<()> {
-    // netbird's dashboard uses window.crypto.subtle (OIDC PKCE), which browsers
-    // only expose in a SECURE context — so the proxy serves HTTPS and every
-    // origin here is https (issue #15: over plain http the dashboard threw
-    // "window.crypto.subtle is unavailable" and never reached login).
-    let public_origin = format!("https://{}:8087", host_ip);
-    let server_origin = format!("http://{}:8086", host_ip);
-    // A single box is reached via several addresses. Allow the OIDC login flow
-    // to redirect back to whichever origin the user actually used, otherwise
-    // post-login lands on the wrong host and the dashboard shows
-    // "Unauthenticated" (issue #15). The browser-side CORS is handled in the
-    // nginx proxy; this covers the redirect-URI allow-list.
-    let lan_origin = format!("https://{}:8087", lan_ip);
-    let mut redirect_origins = vec![public_origin.clone()];
-    if lan_origin != public_origin {
-        redirect_origins.push(lan_origin);
-    }
-    let dashboard_redirect_uris = redirect_origins
-        .iter()
-        .flat_map(|o| {
-            [
-                format!("      - \"{o}/nb-auth\""),
-                format!("      - \"{o}/nb-silent-auth\""),
-            ]
-        })
-        .collect::<Vec<_>>()
-        .join("\n");
-    let dashboard_logout_uris = redirect_origins
-        .iter()
-        .map(|o| format!("      - \"{o}/\""))
-        .collect::<Vec<_>>()
-        .join("\n");
-    let relay_secret = read_or_generate_b64_secret("netbird-relay-auth-secret").await;
-    let encryption_key = read_or_generate_b64_secret("netbird-store-encryption-key").await;
-    let config = format!(
-        r#"server:
-  listenAddress: ":80"
-  exposedAddress: "{public_origin}"
-  stunPorts:
-    - 3478
-  metricsPort: 9090
-  healthcheckAddress: ":9000"
-  logLevel: "info"
-  logFile: "console"
-  authSecret: "{relay_secret}"
-  dataDir: "/var/lib/netbird"
-  auth:
-    issuer: "{public_origin}/oauth2"
-    localAuthDisabled: false
-    signKeyRefreshEnabled: false
-    dashboardRedirectURIs:
-{dashboard_redirect_uris}
-    dashboardPostLogoutRedirectURIs:
-{dashboard_logout_uris}
-    cliRedirectURIs:
-      - "http://localhost:53000/"
-  store:
-    engine: "sqlite"
-    encryptionKey: "{encryption_key}"
-"#
-    );
-    tokio::fs::write("/var/lib/archipelago/netbird/config.yaml", config)
-        .await
-        .context("Failed to write NetBird config.yaml")?;
-
-    let dashboard_env = format!(
-        r#"NETBIRD_MGMT_API_ENDPOINT={public_origin}
-NETBIRD_MGMT_GRPC_API_ENDPOINT={public_origin}
-AUTH_AUDIENCE=netbird-dashboard
-AUTH_CLIENT_ID=netbird-dashboard
-AUTH_CLIENT_SECRET=
-AUTH_AUTHORITY={public_origin}/oauth2
-USE_AUTH0=false
-AUTH_SUPPORTED_SCOPES=openid profile email groups
-AUTH_REDIRECT_URI=/nb-auth
-AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
-NETBIRD_TOKEN_SOURCE=idToken
-NGINX_SSL_PORT=443
-LETSENCRYPT_DOMAIN=none
-"#
-    );
-    tokio::fs::write("/var/lib/archipelago/netbird/dashboard.env", dashboard_env)
-        .await
-        .context("Failed to write NetBird dashboard.env")?;
-
-    let nginx_conf = format!(
-        r#"server {{
-    listen 443 ssl;
-    server_name _;
-
-    # netbird's dashboard needs a secure context (window.crypto.subtle for OIDC
-    # PKCE), so the proxy terminates TLS with a self-signed cert (issue #15).
-    ssl_certificate /etc/nginx/tls.crt;
-    ssl_certificate_key /etc/nginx/tls.key;
-
-    # Rootless Podman can hand a container a new IP across restarts/reboots.
-    # nginx resolves a literal upstream name ONCE at startup and caches it, so
-    # after the IP moves every request 502s with "host unreachable" (issue #15,
-    # observed live on .198: nginx pinned to a dead netbird-dashboard IP). Fix:
-    # point `resolver` at the netbird-net gateway (Podman's aardvark DNS) and
-    # use VARIABLE upstreams, which forces nginx to re-resolve the container
-    # names at request time. Everything is reached container-to-container by
-    # name so nothing depends on host-published ports either.
-    resolver {resolver_ip} valid=10s ipv6=off;
-
-    proxy_set_header Host $host;
-    proxy_set_header X-Real-IP $remote_addr;
-    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
-    proxy_set_header X-Forwarded-Proto $scheme;
-    proxy_http_version 1.1;
-
-    location ~ ^/(relay|ws-proxy/) {{
-        set $nb_server netbird-server;
-        proxy_pass http://$nb_server:80;
-        proxy_set_header Upgrade $http_upgrade;
-        proxy_set_header Connection "upgrade";
-        proxy_read_timeout 1d;
-    }}
-
-    location ~ ^/(api|oauth2)(/|$) {{
-        # The dashboard is a SPA whose API/OIDC base URL is baked at build time
-        # to one host:port. A single box is reached via several addresses (LAN
-        # IP, Tailscale 100.x, hostname), so those fetches are cross-origin and
-        # the browser blocks them with no Access-Control-Allow-Origin (issue
-        # #15, observed live on .198). Reflect the caller's Origin so the
-        # self-hosted management/OIDC API is reachable from any of them, and
-        # answer the CORS preflight here.
-        if ($request_method = OPTIONS) {{
-            add_header Access-Control-Allow-Origin $http_origin always;
-            add_header Access-Control-Allow-Credentials true always;
-            add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
-            add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
-            add_header Access-Control-Max-Age 86400 always;
-            add_header Content-Length 0;
-            return 204;
-        }}
-        add_header Access-Control-Allow-Origin $http_origin always;
-        add_header Access-Control-Allow-Credentials true always;
-        add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
-        add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
-        set $nb_server netbird-server;
-        proxy_pass http://$nb_server:80;
-    }}
-
-    location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {{
-        set $nb_server netbird-server;
-        grpc_pass grpc://$nb_server:80;
-        grpc_read_timeout 1d;
-        grpc_send_timeout 1d;
-    }}
-
-    # OIDC callback routes are client-side SPA routes with NO prebuilt page in
-    # the dashboard bundle, so proxying them straight through 404s — which
-    # crashes the dashboard's auth init and shows "Unauthenticated" with dead
-    # buttons (issue #15, confirmed live on .198: /nb-auth + /nb-silent-auth
-    # returned 404). Serve the dashboard's index.html at these paths (URL
-    # unchanged) so react-oidc boots and completes the login / silent-SSO.
-    location ~ ^/(nb-auth|nb-silent-auth) {{
-        set $nb_dashboard netbird-dashboard;
-        rewrite ^.*$ /index.html break;
-        proxy_pass http://$nb_dashboard:80;
-    }}
-
-    location / {{
-        set $nb_dashboard netbird-dashboard;
-        proxy_pass http://$nb_dashboard:80;
-    }}
-}}
-
-# Direct server remains available for diagnostics at {server_origin}.
-"#
-    );
-    tokio::fs::write("/var/lib/archipelago/netbird/nginx.conf", nginx_conf)
-        .await
-        .context("Failed to write NetBird nginx.conf")?;
-
-    Ok(())
-}
-
-async fn detect_netbird_public_host_ip() -> Option<String> {
-    let output = tokio::process::Command::new("hostname")
-        .args(["-I"])
-        .output()
-        .await
-        .ok()?;
-    let stdout = String::from_utf8_lossy(&output.stdout);
-    let ips: Vec<&str> = stdout
-        .split_whitespace()
-        .filter(|s| s.contains('.'))
-        .collect();
-
-    // Prefer the LAN address as the canonical origin — that's what users browse
-    // to on the local network. Baking the Tailscale 100.x address here broke
-    // LAN access with cross-origin/redirect mismatches (issue #15). Tailscale
-    // (100.64.0.0/10 CGNAT) is only a fallback for nodes with no LAN IP.
-    let is_private_lan = |ip: &str| {
-        ip.starts_with("192.168.")
-            || ip.starts_with("10.")
-            || (ip.starts_with("172.")
-                && ip
-                    .split('.')
-                    .nth(1)
-                    .and_then(|o| o.parse::<u8>().ok())
-                    .map(|o| (16..=31).contains(&o))
-                    .unwrap_or(false))
-    };
-    if let Some(lan) = ips.iter().find(|ip| is_private_lan(ip)) {
-        return Some(lan.to_string());
-    }
-    ips.iter()
-        .find(|ip| ip.starts_with("100."))
-        .map(|s| s.to_string())
-}
-
 #[cfg(test)]
 mod tests {
    use super::{btcpay_stack_app_ids, mempool_stack_app_ids};
--- a/core/archipelago/src/api/rpc/package/update.rs
+++ b/core/archipelago/src/api/rpc/package/update.rs
@ -32,19 +32,27 @@ impl RpcHandler {
            .ok_or_else(|| anyhow::anyhow!("Missing package id"))?;
        validate_app_id(package_id)?;

-        // Verify an update is actually available. Prefer the remote app catalog
-        // (decoupled from the binary OTA), falling back to the image-versions.sh
-        // pin when the catalog is absent or doesn't cover this app.
+        // Resolve the target image. Prefer the remote app catalog (decoupled
+        // from the binary OTA), falling back to the image-versions.sh pin. This
+        // is OPTIONAL for orchestrator-managed apps: the orchestrator resolves
+        // the image itself (manifest + catalog + version_config pin) in its
+        // upgrade path, so an app the catalog doesn't carry a primary image for
+        // (e.g. bitcoin-core, image lives in the embedded manifest + versions[])
+        // still upgrades. Only the legacy/stack path below hard-requires it.
        let pinned = crate::container::app_catalog::catalog_primary_image(package_id)
-            .or_else(|| image_versions::pinned_image_for_app(package_id))
-            .ok_or_else(|| anyhow::anyhow!("No pinned image found for {}", package_id))?;
+            .or_else(|| image_versions::pinned_image_for_app(package_id));

        // Note: the `already updating` guard lives in `spawn_package_update`
        // (the async wrapper that dispatch actually routes to). By the time
        // this inner function runs, the wrapper has already flipped state to
        // `Updating`, so duplicating the check here would be a false positive.

-        install_log(&format!("UPDATE: {} → {}", package_id, pinned)).await;
+        install_log(&format!(
+            "UPDATE: {} → {}",
+            package_id,
+            pinned.as_deref().unwrap_or("(orchestrator-resolved)")
+        ))
+        .await;

        // Set state to Updating
        {
@ -114,6 +122,16 @@ impl RpcHandler {
            }
        }

+        // Legacy/stack path hard-requires a concrete primary image (the
+        // orchestrator path above already returned for apps it manages).
+        let pinned = match pinned {
+            Some(p) => p,
+            None => {
+                self.clear_update_state(package_id).await;
+                return Err(anyhow::anyhow!("No pinned image found for {}", package_id));
+            }
+        };
+
        // Resolve images to pull — either a stack or single container
        let images_to_pull = self.resolve_images_to_pull(package_id, &pinned);

--- a/core/archipelago/src/config.rs
+++ b/core/archipelago/src/config.rs
@ -66,7 +66,7 @@ pub struct Config {
    /// through Quadlet (`.container` units in ~/.config/containers/systemd
    /// + systemctl --user start) instead of `podman create + start`. Default
    /// off so the legacy path stays the production path until the harness
-    /// at tests/lifecycle/run-20x.sh has gone green against the new path
+    /// at tests/lifecycle/run-gate.sh has gone green against the new path
    /// on .228 + .198. See `project_v1_7_52_phase3_quadlet_design`.
    #[serde(default)]
    pub use_quadlet_backends: bool,
@ -487,7 +487,7 @@ mod tests {

    #[test]
    fn test_config_use_quadlet_backends_defaults_off() {
-        // Phase 3.2 of v1.7.52 — the new path stays gated until the 20×
+        // Phase 3.2 of v1.7.52 — the new path stays gated until the 5×
        // harness goes green on .228 and .198. Flipping this default
        // ahead of that would route every backend install through code
        // we haven't fleet-validated yet.
--- a/core/archipelago/src/container/app_catalog.rs
+++ b/core/archipelago/src/container/app_catalog.rs
@ -86,6 +86,44 @@ pub struct AppCatalogEntry {
    /// Optional human-readable changelog lines for this version.
    #[serde(default, skip_serializing_if = "Vec::is_empty")]
    pub changelog: Vec<String>,
+    /// Multi-version support (`docs/bitcoin-multi-version-design.md`): the bounded
+    /// set of versions a user may install or switch to for this app. Empty for
+    /// single-version apps; `version`/`image` above remain the default/latest for
+    /// back-compat. Old nodes ignore this field (no `deny_unknown_fields`).
+    #[serde(default, skip_serializing_if = "Vec::is_empty")]
+    pub versions: Vec<CatalogVersion>,
+    /// Full app manifest, embedded so the app installs from the registry alone —
+    /// no OTA-shipped `apps/<id>/manifest.yml`. Carried as the raw value the
+    /// publisher signed (so it stays part of the verified preimage) and
+    /// deserialized into an `AppManifest` by the orchestrator at load time, where
+    /// it overrides the disk manifest (origin-wins). Absent during the migration
+    /// window => the node falls back to the disk manifest. See
+    /// `docs/registry-manifest-design.md`.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub manifest: Option<serde_json::Value>,
+}
+
+/// One selectable version in an app's `versions[]` list. The catalog carries a
+/// curated, bounded set (current + a few majors back); see
+/// `docs/bitcoin-multi-version-design.md` §3 Phase 1.
+#[derive(Debug, Clone, Serialize, Deserialize, Default, PartialEq, Eq)]
+pub struct CatalogVersion {
+    /// User-facing + tag-matching version string (e.g. `31.0`,
+    /// `29.3.knots20260508`). Treated as the image tag.
+    pub version: String,
+    /// Concrete image reference for this version. When omitted the orchestrator
+    /// falls back to composing `<default-repo>:<version>` from the entry image.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub image: Option<String>,
+    /// Marks the default / latest version pre-selected in the install modal.
+    #[serde(default, skip_serializing_if = "std::ops::Not::not")]
+    pub default: bool,
+    /// Deprecated versions are still installable but badged in the UI.
+    #[serde(default, skip_serializing_if = "std::ops::Not::not")]
+    pub deprecated: bool,
+    /// Optional end-of-life date (YYYY-MM-DD), surfaced in the UI.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub eol: Option<String>,
 }

 /// Read-side cache file search order. Mirrors `image_versions.rs`: the running
@ -166,6 +204,76 @@ pub fn catalog_stack_images(app_id: &str) -> HashMap<String, String> {
    entry_for(app_id).and_then(|e| e.images).unwrap_or_default()
 }

+/// All `(app_id, manifest-value)` pairs the registry catalog carries. The
+/// orchestrator deserializes + validates each into an `AppManifest` and prefers
+/// it over the disk manifest (origin-wins); disk remains the migration fallback.
+/// Empty when the catalog is absent or no entry embeds a manifest.
+pub fn catalog_manifest_values() -> Vec<(String, serde_json::Value)> {
+    load_catalog()
+        .apps
+        .into_iter()
+        .filter_map(|(id, e)| e.manifest.map(|m| (id, m)))
+        .collect()
+}
+
+/// The catalog's default/latest version string for an app (the top-level
+/// `version` field), if covered. Used to decide whether an install-time
+/// selection should pin (older) or track-latest (default).
+pub fn catalog_default_version(app_id: &str) -> Option<String> {
+    entry_for(app_id).map(|e| e.version).filter(|v| !v.is_empty())
+}
+
+/// Curated, selectable versions for an app per the remote catalog. Empty when
+/// the catalog is absent or the app is single-version. The default entry (if
+/// any) sorts first so callers can pre-select it.
+pub fn catalog_versions(app_id: &str) -> Vec<CatalogVersion> {
+    let mut versions = entry_for(app_id).map(|e| e.versions).unwrap_or_default();
+    versions.sort_by_key(|v| !v.default); // default first, stable otherwise
+    versions
+}
+
+/// Resolve the image for a specific selectable `version` of `app_id`, validated
+/// same-repo against `manifest_image` (the same guard `catalog_image_override`
+/// applies). The version's explicit `image` is used when present; otherwise the
+/// repo of `manifest_image` is retagged with `version`. Returns `None` when the
+/// version is unknown or would point at a different repository — the caller then
+/// keeps the default resolution and the switch is refused upstream.
+pub fn catalog_image_for_version(
+    app_id: &str,
+    version: &str,
+    manifest_image: &str,
+) -> Option<String> {
+    let entry = catalog_versions(app_id)
+        .into_iter()
+        .find(|v| v.version == version)?;
+    let manifest_repo =
+        crate::container::image_versions::image_without_registry_or_tag(manifest_image);
+    let candidate = match entry.image {
+        Some(img) => img,
+        None => {
+            // Retag the manifest's full registry/repo with the requested version.
+            let repo = manifest_image
+                .rsplit_once(':')
+                // keep registry:port colons intact: only strip a tag after the last '/'
+                .filter(|(left, _)| left.contains('/'))
+                .map(|(left, _)| left)
+                .unwrap_or(manifest_image);
+            format!("{repo}:{version}")
+        }
+    };
+    let same_repo =
+        crate::container::image_versions::image_without_registry_or_tag(&candidate) == manifest_repo;
+    if same_repo {
+        Some(candidate)
+    } else {
+        warn!(
+            "app-catalog: ignoring version {} for {} — repo mismatch (candidate={}, manifest={})",
+            version, app_id, candidate, manifest_image
+        );
+        None
+    }
+}
+
 /// Image override for the orchestrator's install/upgrade path. Returns the
 /// catalog's primary image for `app_id` ONLY when it refers to the same
 /// repository as the manifest's current image — a guard so a catalog typo can
@ -193,6 +301,12 @@ pub fn catalog_image_override(app_id: &str, manifest_image: &str) -> Option<Stri
 /// newer catalog, nor vice-versa). Falls back to the deployed pin only when the
 /// catalog is missing or doesn't cover the app.
 pub fn available_update_for_app(app_id: &str, running_image: &str) -> Option<String> {
+    // A runner-pinned version is an explicit "stay here" choice — never advertise
+    // an update over it (design §3 Phase 3). Auto-update, when enabled, ignores
+    // the pin and is driven by the catalog tick, not this badge.
+    if crate::container::version_config::pinned_version(app_id).is_some() {
+        return None;
+    }
    if let Some(catalog_image) = catalog_primary_image(app_id) {
        // Catalog covers this app with a concrete image -> authoritative.
        return crate::container::image_versions::available_update_for_images(
@ -346,6 +460,30 @@ mod tests {
        assert_eq!(e.digest.as_deref(), Some("blake3:deadbeef"));
    }

+    #[test]
+    fn entry_carries_embedded_manifest() {
+        let json = r#"{
+            "schema": 1,
+            "apps": {
+                "demo": {
+                    "version": "1.0.0",
+                    "manifest": {
+                        "app": {
+                            "id": "demo",
+                            "name": "Demo",
+                            "version": "1.0.0",
+                            "container": { "image": "registry/demo:1.0.0" }
+                        }
+                    }
+                }
+            }
+        }"#;
+        let cat: AppCatalog = serde_json::from_str(json).unwrap();
+        let e = cat.apps.get("demo").unwrap();
+        let m = e.manifest.as_ref().expect("manifest present");
+        assert_eq!(m["app"]["id"], "demo");
+    }
+
    #[test]
    fn empty_catalog_when_absent_is_default() {
        let cat = AppCatalog::default();
--- a/core/archipelago/src/container/boot_reconciler.rs
+++ b/core/archipelago/src/container/boot_reconciler.rs
@ -96,6 +96,35 @@ impl BootReconciler {
            }
        }

+        // Companion self-heal runs on its OWN cadence, decoupled from the
+        // per-app reconcile pass. On a heavily loaded node `reconcile_existing`
+        // over dozens of apps can take well over a minute, which would delay a
+        // companion-unit repair (deleted/lost unit file) past any reasonable
+        // safety window. Detecting + rewriting a companion unit is cheap, so it
+        // gets a dedicated `interval` loop. The handle is aborted when the main
+        // loop exits (shutdown uses `notify_one`, so we must NOT add a second
+        // waiter on `self.shutdown` — it would steal the single wake permit).
+        let companion_handle = if self.companion_stage {
+            let orchestrator = self.orchestrator.clone();
+            let interval = self.interval;
+            Some(tokio::spawn(async move {
+                loop {
+                    let installed = orchestrator.manifest_ids().await;
+                    for (companion, err) in crate::container::companion::reconcile(&installed).await
+                    {
+                        tracing::warn!(
+                            companion = %companion,
+                            error = %err,
+                            "companion reconcile failed"
+                        );
+                    }
+                    time::sleep(interval).await;
+                }
+            }))
+        } else {
+            None
+        };
+
        // Initial pass: no delay.
        self.tick().await;

@ -111,23 +140,15 @@ impl BootReconciler {
                }
            }
        }
+
+        if let Some(handle) = companion_handle {
+            handle.abort();
+        }
    }

    async fn tick(&self) {
        let report = self.orchestrator.reconcile_existing().await;
        Self::log_report(&report);
-
-        if !self.companion_stage {
-            return;
-        }
-        let installed = self.orchestrator.manifest_ids().await;
-        for (companion, err) in crate::container::companion::reconcile(&installed).await {
-            tracing::warn!(
-                companion = %companion,
-                error = %err,
-                "companion reconcile failed"
-            );
-        }
    }

    fn log_report(report: &ReconcileReport) {
--- a/core/archipelago/src/container/companion.rs
+++ b/core/archipelago/src/container/companion.rs
@ -221,13 +221,26 @@ async fn ensure_image_present(spec: &CompanionSpec) -> Result<String> {
    for dir in spec.build_dir_candidates {
        let dockerfile = PathBuf::from(dir).join("Dockerfile");
        if fs::try_exists(&dockerfile).await.unwrap_or(false) {
+            // `:local` is a deliberate manual override — never auto-rebuild it.
            if image_exists(&local_image_compat).await {
                return Ok(local_image_compat);
            }
+            // Reuse the auto-built `:latest` only when the build context has NOT
+            // changed since it was built. Without this staleness check an
+            // already-present image is reused forever, so edits to the baked-in
+            // context (Dockerfile, nginx.conf, …) never reach the node — this is
+            // exactly why the guardian-CSS nginx fix never reached the fleet.
            if image_exists(&local_image).await {
-                return Ok(local_image);
+                if !context_is_newer_than_image(dir, &local_image).await {
+                    return Ok(local_image);
+                }
+                info!(
+                    companion = spec.name,
+                    "build context changed since image built; rebuilding {dir}"
+                );
+            } else {
+                info!(companion = spec.name, "building locally from {dir}");
            }
-            info!(companion = spec.name, "building locally from {dir}");
            let out = command_output_with_timeout(
                Command::new("podman").args(["build", "-t", &local_image, dir]),
                COMPANION_BUILD_TIMEOUT,
@ -272,7 +285,15 @@ async fn ensure_image_present(spec: &CompanionSpec) -> Result<String> {

 async fn image_exists(image: &str) -> bool {
    let mut cmd = Command::new("podman");
-    cmd.args(["image", "inspect", image]);
+    // Only the exit status matters. WITHOUT a `--format`, `podman image inspect`
+    // prints the image's full multi-KB manifest JSON; `.status()` inherits the
+    // service's stdout, so on a hit that whole blob lands in the journal — once
+    // per companion image, every reconcile pass. That flood spikes journald +
+    // IO and starves the async runtime (UI websocket then drops → "connection
+    // lost"/reconnect). Discard the child's stdout/stderr; we read neither.
+    cmd.args(["image", "inspect", image])
+        .stdout(std::process::Stdio::null())
+        .stderr(std::process::Stdio::null());
    match tokio::time::timeout(COMPANION_IMAGE_CHECK_TIMEOUT, cmd.status()).await {
        Ok(Ok(status)) => status.success(),
        Ok(Err(err)) => {
@ -286,6 +307,73 @@ async fn image_exists(image: &str) -> bool {
    }
 }

+/// Returns true if any file in the build context `dir` is newer than the
+/// already-built `image`, signalling the cached image is stale and must be
+/// rebuilt. Conservative: if either timestamp can't be determined we return
+/// false (reuse the cache) to avoid rebuild storms on every reconcile pass.
+async fn context_is_newer_than_image(dir: &str, image: &str) -> bool {
+    let image_created = match image_created_unix(image).await {
+        Some(t) => t,
+        None => return false,
+    };
+    match newest_mtime_unix(PathBuf::from(dir)).await {
+        Some(ctx) => ctx > image_created,
+        None => false,
+    }
+}
+
+/// Build timestamp of `image` as Unix seconds, via `podman image inspect`.
+async fn image_created_unix(image: &str) -> Option<i64> {
+    let mut cmd = Command::new("podman");
+    cmd.args(["image", "inspect", "--format", "{{.Created.Unix}}", image]);
+    let out = command_output_with_timeout(
+        &mut cmd,
+        COMPANION_IMAGE_CHECK_TIMEOUT,
+        "podman image created time",
+    )
+    .await
+    .ok()?;
+    if !out.status.success() {
+        return None;
+    }
+    String::from_utf8_lossy(&out.stdout).trim().parse::<i64>().ok()
+}
+
+/// Newest modification time (Unix seconds) across all files under `dir`,
+/// walked recursively. Runs on a blocking thread since it touches the fs.
+async fn newest_mtime_unix(dir: PathBuf) -> Option<i64> {
+    tokio::task::spawn_blocking(move || newest_mtime_blocking(&dir))
+        .await
+        .ok()
+        .flatten()
+}
+
+fn newest_mtime_blocking(dir: &std::path::Path) -> Option<i64> {
+    let mut newest: Option<i64> = None;
+    let mut stack = vec![dir.to_path_buf()];
+    while let Some(p) = stack.pop() {
+        let entries = match std::fs::read_dir(&p) {
+            Ok(e) => e,
+            Err(_) => continue,
+        };
+        for entry in entries.flatten() {
+            let meta = match entry.metadata() {
+                Ok(m) => m,
+                Err(_) => continue,
+            };
+            if meta.is_dir() {
+                stack.push(entry.path());
+            } else if let Ok(modified) = meta.modified() {
+                if let Ok(dur) = modified.duration_since(std::time::UNIX_EPOCH) {
+                    let secs = dur.as_secs() as i64;
+                    newest = Some(newest.map_or(secs, |n| n.max(secs)));
+                }
+            }
+        }
+    }
+    newest
+}
+
 async fn command_output_with_timeout(
    cmd: &mut Command,
    timeout: Duration,
--- a/core/archipelago/src/container/docker_packages.rs
+++ b/core/archipelago/src/container/docker_packages.rs
@ -691,16 +691,37 @@ fn extract_lan_address(ports: &[String]) -> Option<String> {
    None
 }

+/// netbird's dashboard launch URL: HTTPS on 8087 (the proxy terminates TLS —
+/// the dashboard needs a secure context for OIDC PKCE, issue #15) at the node's
+/// primary host IP so it's reachable from the LAN. Manifest-driven netbird no
+/// longer writes `dashboard.env`, so this is derived from host facts (the same
+/// `{{HOST_IP}}` the orchestrator bakes into the cert/config); it falls back to
+/// the static localhost mapping when the host IP can't be read. URL shape is
+/// identical to the legacy installer's, so the existing https reachability
+/// wrapper still applies.
 async fn netbird_configured_launch_url() -> Option<String> {
-    let env = tokio::fs::read_to_string("/var/lib/archipelago/netbird/dashboard.env")
+    if let Some(ip) = first_host_ip().await {
+        return Some(format!("https://{ip}:8087"));
+    }
+    PodmanClient::lan_address_for("netbird")
+}
+
+/// First address from `hostname -I` — the node's primary host IP. Mirrors the
+/// orchestrator's `detect_host_ip` so launch URLs match the cert/config the
+/// orchestrator renders for `{{HOST_IP}}`.
+async fn first_host_ip() -> Option<String> {
+    let out = tokio::process::Command::new("hostname")
+        .arg("-I")
+        .output()
        .await
        .ok()?;
-    env.lines()
-        .find_map(|line| line.strip_prefix("NETBIRD_MGMT_API_ENDPOINT="))
-        .map(str::trim)
-        .filter(|s| !s.is_empty())
+    if !out.status.success() {
+        return None;
+    }
+    String::from_utf8_lossy(&out.stdout)
+        .split_whitespace()
+        .next()
        .map(ToOwned::to_owned)
-        .or_else(|| PodmanClient::lan_address_for("netbird"))
 }

 async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Option<String> {
--- a/core/archipelago/src/container/hooks.rs
+++ b/core/archipelago/src/container/hooks.rs
@ -0,0 +1,203 @@
+//! Manifest-driven lifecycle hook executor (Task #20).
+//!
+//! Runs an app's declarative `post_install` hooks against its **own** running
+//! container. Hooks are an allowlisted, reviewed escape hatch — NOT arbitrary
+//! host scripts:
+//!
+//! - `exec` runs *inside the container* (`podman exec`), never on the host, and
+//!   inherits the container's (already dropped) capabilities.
+//! - `copy_from_host.src` is resolved against an allowlist root, canonicalised,
+//!   and rejected on any escape; only then is it `podman cp`'d into the container.
+//! - Execution is **best-effort + idempotent**: each step is logged, a failure is
+//!   warned and the remaining steps still run, so a transient hook error never
+//!   bricks an install. Authors must make steps safe to re-run (e.g. `grep -q … ||`).
+//!
+//! See `docs/manifest-hooks-design.md`.
+
+use std::path::{Path, PathBuf};
+use std::time::Duration;
+
+use anyhow::{bail, Result};
+use archipelago_container::{AppManifest, HookStep};
+
+/// Upper bound on a single hook command. Generous — config rewrites + nginx
+/// reloads are fast, but an image with a hung entrypoint shouldn't wedge install.
+const HOOK_TIMEOUT: Duration = Duration::from_secs(60);
+
+/// Roots a `copy_from_host.src` may resolve within. A src is joined onto each
+/// root, canonicalised, and accepted only if it stays inside that root:
+/// - the app's own data dir (`<data_dir>/<app_id>`), and
+/// - `/opt/archipelago` (covers the orchestrator's bundled `web-ui/` assets,
+///   e.g. indeedhub's `web-ui/nostr-provider.js`).
+fn allowlist_roots(app_id: &str, data_dir: &Path) -> Vec<PathBuf> {
+    vec![data_dir.join(app_id), PathBuf::from("/opt/archipelago")]
+}
+
+/// Resolve a hook copy source against the allowlist. Returns the canonical
+/// absolute path iff it exists and lies within an allowlist root. Defence in
+/// depth: `AppManifest::validate` already rejects absolute / `..` srcs, but we
+/// re-check here and canonicalise so a symlink inside a root can't escape it.
+fn resolve_copy_src(src: &str, app_id: &str, data_dir: &Path) -> Result<PathBuf> {
+    if src.is_empty() || src.starts_with('/') || src.contains("..") {
+        bail!("hook copy src '{src}' is not an allowlisted relative path");
+    }
+    for root in allowlist_roots(app_id, data_dir) {
+        let Ok(root_canon) = root.canonicalize() else {
+            continue;
+        };
+        let Ok(canon) = root.join(src).canonicalize() else {
+            continue;
+        };
+        if canon.starts_with(&root_canon) {
+            return Ok(canon);
+        }
+    }
+    bail!("hook copy src '{src}' did not resolve inside an allowlist root")
+}
+
+/// Run an app's declarative `post_install` hooks against its running container.
+/// Best-effort: never returns an error — a failed step is warned and skipped.
+/// Called from the install path after the container is created + running, and
+/// only when a fresh container was created (see `install_fresh`).
+pub async fn run_post_install(manifest: &AppManifest, container_name: &str, data_dir: &Path) {
+    let steps = &manifest.app.hooks.post_install;
+    if steps.is_empty() {
+        return;
+    }
+    let app_id = &manifest.app.id;
+    tracing::info!(
+        app_id = %app_id,
+        container = %container_name,
+        steps = steps.len(),
+        "running manifest post_install hooks"
+    );
+    for (i, step) in steps.iter().enumerate() {
+        match run_step(step, container_name, app_id, data_dir).await {
+            Ok(()) => tracing::debug!(app_id = %app_id, step = i, "post_install hook step ok"),
+            Err(err) => tracing::warn!(
+                app_id = %app_id,
+                container = %container_name,
+                step = i,
+                error = %err,
+                "post_install hook step failed (continuing best-effort)"
+            ),
+        }
+    }
+}
+
+async fn run_step(
+    step: &HookStep,
+    container: &str,
+    app_id: &str,
+    data_dir: &Path,
+) -> Result<()> {
+    match step {
+        HookStep::Exec { exec } => {
+            let mut args: Vec<&str> = Vec::with_capacity(exec.len() + 2);
+            args.push("exec");
+            args.push(container);
+            args.extend(exec.iter().map(String::as_str));
+            // `exec` spawns a process INSIDE the container's cgroup. When the
+            // container was started by archipelago.service, that cgroup is under
+            // the service's slice and a bare `podman exec` from the service can't
+            // write its `cgroup.procs` ("crun: ... Permission denied / OCI
+            // permission denied"). Run it in a transient user scope (its own
+            // delegated cgroup) — mirrors `podman_user_scope` for pasta starts.
+            run_podman(&args, /* scoped */ true).await
+        }
+        HookStep::CopyFromHost { copy_from_host } => {
+            let abs = resolve_copy_src(&copy_from_host.src, app_id, data_dir)?;
+            let abs = abs.to_string_lossy().into_owned();
+            let dest = format!("{container}:{}", copy_from_host.dest);
+            // `cp` is a host-side copy (no in-container process), so no scope needed.
+            run_podman(&["cp", &abs, &dest], /* scoped */ false).await
+        }
+    }
+}
+
+/// Run a podman command, optionally inside a transient systemd user scope. The
+/// scope gives the invocation its own delegated cgroup so `podman exec` can
+/// place its child process — without it, an exec launched from the service's
+/// own cgroup is denied write to the container's `cgroup.procs`.
+async fn run_podman(args: &[&str], scoped: bool) -> Result<()> {
+    let rendered = args.join(" ");
+    let mut cmd = if scoped {
+        let mut c = tokio::process::Command::new("systemd-run");
+        c.args(["--user", "--scope", "--quiet", "--collect", "podman"]);
+        c.args(args);
+        c
+    } else {
+        let mut c = tokio::process::Command::new("podman");
+        c.args(args);
+        c
+    };
+    let out = tokio::time::timeout(HOOK_TIMEOUT, cmd.output())
+        .await
+        .map_err(|_| anyhow::anyhow!("podman {rendered} timed out after {:?}", HOOK_TIMEOUT))?
+        .map_err(|e| anyhow::anyhow!("podman {rendered}: {e}"))?;
+
+    if !out.status.success() {
+        bail!(
+            "podman {rendered} exited {}: {}",
+            out.status,
+            String::from_utf8_lossy(&out.stderr).trim()
+        );
+    }
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn resolve_copy_src_accepts_file_in_app_data_dir() {
+        let tmp = tempfile::tempdir().unwrap();
+        let data_dir = tmp.path();
+        let app_dir = data_dir.join("myapp/web-ui");
+        std::fs::create_dir_all(&app_dir).unwrap();
+        std::fs::write(app_dir.join("provider.js"), b"x").unwrap();
+
+        let got = resolve_copy_src("web-ui/provider.js", "myapp", data_dir).unwrap();
+        assert!(got.ends_with("myapp/web-ui/provider.js"));
+        assert!(got.is_absolute());
+    }
+
+    #[test]
+    fn resolve_copy_src_rejects_absolute() {
+        let tmp = tempfile::tempdir().unwrap();
+        assert!(resolve_copy_src("/etc/passwd", "myapp", tmp.path()).is_err());
+    }
+
+    #[test]
+    fn resolve_copy_src_rejects_traversal() {
+        let tmp = tempfile::tempdir().unwrap();
+        assert!(resolve_copy_src("web-ui/../../etc/shadow", "myapp", tmp.path()).is_err());
+    }
+
+    #[test]
+    fn resolve_copy_src_rejects_missing_file() {
+        // Inside the allowlist shape but the file doesn't exist → canonicalize fails.
+        let tmp = tempfile::tempdir().unwrap();
+        std::fs::create_dir_all(tmp.path().join("myapp")).unwrap();
+        assert!(resolve_copy_src("nope.js", "myapp", tmp.path()).is_err());
+    }
+
+    #[test]
+    fn resolve_copy_src_rejects_symlink_escape() {
+        // A symlink inside the app dir pointing outside it must be rejected by
+        // the post-canonicalisation prefix check.
+        let tmp = tempfile::tempdir().unwrap();
+        let app_dir = tmp.path().join("myapp");
+        std::fs::create_dir_all(&app_dir).unwrap();
+        let secret = tmp.path().join("secret.txt");
+        std::fs::write(&secret, b"s").unwrap();
+        let link = app_dir.join("link.js");
+        if std::os::unix::fs::symlink(&secret, &link).is_ok() {
+            // `secret.txt` lives in the tmp root, NOT under <data_dir>/myapp, so
+            // the canonical target escapes the app-data root. It also isn't under
+            // /opt/archipelago. Must be rejected.
+            assert!(resolve_copy_src("link.js", "myapp", tmp.path()).is_err());
+        }
+    }
+}
--- a/core/archipelago/src/container/mod.rs
+++ b/core/archipelago/src/container/mod.rs
@ -6,12 +6,15 @@ pub mod data_manager;
 pub mod dev_orchestrator;
 pub mod docker_packages;
 pub mod filebrowser;
+pub mod hooks;
 pub mod image_versions;
 pub mod lnd;
 pub mod prod_orchestrator;
 pub mod quadlet;
 pub mod registry;
+pub mod secrets;
 pub mod traits;
+pub mod version_config;

 pub use boot_reconciler::{BootReconciler, DEFAULT_INTERVAL as RECONCILER_DEFAULT_INTERVAL};
 pub use dev_orchestrator::DevContainerOrchestrator;
--- a/core/archipelago/src/container/prod_orchestrator.rs
+++ b/core/archipelago/src/container/prod_orchestrator.rs
--- a/core/archipelago/src/container/quadlet.rs
+++ b/core/archipelago/src/container/quadlet.rs
@ -227,13 +227,20 @@ impl QuadletUnit {
                mode
            );
        }
-        for (host, container, proto) in &self.ports {
-            let p = if proto.is_empty() {
-                "tcp"
-            } else {
-                proto.as_str()
-            };
-            let _ = writeln!(s, "PublishPort={host}:{container}/{p}");
+        // Host networking exposes the container's ports on the host directly.
+        // Podman rejects PublishPort combined with Network=host ("published
+        // ports cannot be used with host network") and the unit crash-loops
+        // (exit 125). Skip publishing in host mode — matches the NetworkMode
+        // doc note that Podman discards port mappings under host networking.
+        if !matches!(self.network, NetworkMode::Host) {
+            for (host, container, proto) in &self.ports {
+                let p = if proto.is_empty() {
+                    "tcp"
+                } else {
+                    proto.as_str()
+                };
+                let _ = writeln!(s, "PublishPort={host}:{container}/{p}");
+            }
        }
        for env in &self.environment {
            // env entries already arrive shaped as "KEY=VALUE"; quadlet
@ -403,7 +410,18 @@ impl QuadletUnit {
            environment: app.environment.clone(),
            devices: app.devices.clone(),
            add_hosts: vec![("host.archipelago".into(), "10.89.0.1".into())],
-            network_aliases: vec![name.to_string()],
+            // Container always answers to its own name; manifest extras add the
+            // short hostnames peers bake in (e.g. indeedhub api/minio/relay).
+            // Only emitted for Bridge networks (slirp/pasta reject aliases).
+            network_aliases: {
+                let mut a = vec![name.to_string()];
+                for extra in &app.container.network_aliases {
+                    if !a.iter().any(|x| x == extra) {
+                        a.push(extra.clone());
+                    }
+                }
+                a
+            },
            entrypoint: app.container.entrypoint.clone(),
            command: app.container.custom_args.clone(),
            read_only_root: app.security.readonly_root,
@ -563,11 +581,12 @@ pub async fn write_if_changed(unit: &QuadletUnit, dir: &Path) -> Result<bool> {
 /// Reload the user systemd manager. Required after any quadlet write
 /// or removal so systemd picks up the generated `.service` translation.
 pub async fn daemon_reload_user() -> Result<()> {
-    let status = Command::new("systemctl")
-        .args(["--user", "daemon-reload"])
-        .status()
+    // Bounded: a wedged user manager (e.g. a unit stuck "deactivating" while
+    // podman hangs) could otherwise block daemon-reload indefinitely and freeze
+    // any caller — notably uninstall teardown.
+    let status = systemctl_user_status(&["daemon-reload"], Duration::from_secs(30))
        .await
-        .context("spawn systemctl --user daemon-reload")?;
+        .context("systemctl --user daemon-reload")?;
    if !status.success() {
        return Err(anyhow!("systemctl --user daemon-reload exited {status}"));
    }
@ -624,7 +643,17 @@ pub async fn restart_service(service: &str) -> Result<()> {

 /// Stop a generated Quadlet service without removing its unit file.
 pub async fn stop_service(service: &str) -> Result<()> {
-    match systemctl_user_status(&["stop", service], QUADLET_STOP_TIMEOUT).await {
+    stop_service_with_timeout(service, QUADLET_STOP_TIMEOUT).await
+}
+
+/// Stop a user service, waiting up to `timeout` for a graceful stop before
+/// force-killing the app-scoped unit. Slow-to-SIGTERM apps (bitcoin-core ~600s,
+/// lnd ~330s) must not be SIGKILLed at the default 45s — that risks data
+/// corruption — so the orchestrator passes the per-app grace here. Never waits
+/// less than `QUADLET_STOP_TIMEOUT`.
+pub async fn stop_service_with_timeout(service: &str, timeout: Duration) -> Result<()> {
+    let timeout = timeout.max(QUADLET_STOP_TIMEOUT);
+    match systemctl_user_status(&["stop", service], timeout).await {
        Ok(status) if status.success() => Ok(()),
        Ok(status) => Err(anyhow!("systemctl --user stop {service} exited {status}")),
        Err(err) => {
@ -759,11 +788,19 @@ fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> {
 /// that systemd no longer knows about.
 pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
    let svc = format!("{unit_name}.service");
-    // Stop first; ignore failure (unit may already be down).
-    let _ = Command::new("systemctl")
-        .args(["--user", "stop", &svc])
-        .status()
-        .await;
+    // Stop first; ignore failure (unit may already be down). BOUNDED — on
+    // rootless podman a generated unit can wedge in "deactivating" while
+    // `podman rm -f` hangs underneath it, and an unbounded `systemctl stop`
+    // would block the entire uninstall forever: the progress bar freezes and
+    // the package entry is stranded in `Removing` (a ghost in My Apps that also
+    // blocks reinstall). If the graceful stop times out, escalate to
+    // SIGKILL + reset-failed so teardown always proceeds.
+    if systemctl_user_status(&["stop", &svc], QUADLET_STOP_TIMEOUT)
+        .await
+        .is_err()
+    {
+        let _ = kill_and_reset_service(&svc).await;
+    }
    let path = dir.join(format!("{unit_name}.container"));
    if fs::try_exists(&path).await.unwrap_or(false) {
        match fs::remove_file(&path).await {
@ -774,10 +811,15 @@ pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
    }
    daemon_reload_user().await.ok();
    // Defensive: kill the actual container too, in case quadlet left it.
-    let _ = Command::new("podman")
-        .args(["rm", "-f", unit_name])
-        .status()
-        .await;
+    // Bounded so a hung podman store can't re-introduce the stall this function
+    // exists to avoid.
+    let _ = tokio::time::timeout(
+        QUADLET_STOP_TIMEOUT,
+        Command::new("podman")
+            .args(["rm", "-f", unit_name])
+            .status(),
+    )
+    .await;
    Ok(())
 }

@ -852,6 +894,26 @@ mod tests {
        assert!(!s.contains("Network=host"));
    }

+    #[test]
+    fn render_host_network_omits_publish_ports() {
+        // Podman rejects PublishPort with Network=host (crash-loop exit 125).
+        let mut u = sample_unit();
+        u.network = NetworkMode::Host;
+        u.ports = vec![(3000, 3000, "tcp".into())];
+        let s = u.render();
+        assert!(s.contains("Network=host"));
+        assert!(!s.contains("PublishPort"));
+    }
+
+    #[test]
+    fn render_non_host_network_emits_publish_ports() {
+        let mut u = sample_unit();
+        u.network = NetworkMode::Bridge("archy-net".into());
+        u.ports = vec![(3000, 3000, "tcp".into())];
+        let s = u.render();
+        assert!(s.contains("PublishPort=3000:3000/tcp"));
+    }
+
    #[test]
    fn unit_filename_and_service_name_are_consistent() {
        let u = sample_unit();
@ -1033,6 +1095,7 @@ app:
  version: 1.0.0
  container:
    image: registry/bitcoin-knots:1.0
+    network: archy-net
    entrypoint: ["/usr/local/bin/bitcoind"]
    custom_args: ["-server=1", "-rpcbind=0.0.0.0"]
  ports:
@ -1053,7 +1116,7 @@ app:
  security:
    capabilities: ["NET_BIND_SERVICE"]
    readonly_root: true
-    network_policy: archy-net
+    network_policy: isolated
 "#;
        let m = AppManifest::parse(yaml).expect("manifest must parse");
        let u = QuadletUnit::from_manifest(&m, "bitcoin-knots");
@ -1193,7 +1256,7 @@ app:
    image: x:latest
  volumes:
    - type: bind
-      source: /etc/host-conf
+      source: /var/lib/archipelago/x-conf
      target: /etc/conf
      options: ["ro"]
 "#;
@ -1217,7 +1280,7 @@ app:
      target: /tmp
      tmpfs_options: "rw,size=64m"
    - type: bind
-      source: /var/lib/x
+      source: /var/lib/archipelago/x
      target: /data
      options: []
 "#;
@ -1225,7 +1288,7 @@ app:
        let u = QuadletUnit::from_manifest(&m, "x");
        // tmpfs entry is dropped from bind_mounts; bind entry survives.
        assert_eq!(u.bind_mounts.len(), 1);
-        assert_eq!(u.bind_mounts[0].host, PathBuf::from("/var/lib/x"));
+        assert_eq!(u.bind_mounts[0].host, PathBuf::from("/var/lib/archipelago/x"));
    }

    #[test]
@ -1404,6 +1467,31 @@ app:
        assert!(!publish_ports_changed(new, new));
    }

+    #[test]
+    fn from_manifest_appends_manifest_network_aliases_for_bridge() {
+        let yaml = r#"
+app:
+  id: indeedhub-api
+  name: IndeedHub API
+  version: 1.0.0
+  container:
+    image: registry/indeedhub-api:1.0.0
+    network: indeedhub-net
+    network_aliases: [api]
+  security:
+    capabilities: []
+    network_policy: isolated
+"#;
+        let m = AppManifest::parse(yaml).expect("manifest must parse");
+        let u = QuadletUnit::from_manifest(&m, "indeedhub-api");
+        assert!(matches!(u.network, NetworkMode::Bridge(ref n) if n == "indeedhub-net"));
+        // Own name first, then the baked-in short alias the frontend nginx uses.
+        assert_eq!(u.network_aliases, vec!["indeedhub-api", "api"]);
+        let s = u.render();
+        assert!(s.contains("NetworkAlias=api"));
+        assert!(s.contains("PodmanArgs=--network-alias=api"));
+    }
+
    #[test]
    fn network_aliases_changed_detects_service_discovery_drift() {
        let old = "[Container]\nNetwork=archy-net\n";
@ -1462,6 +1550,7 @@ app:
  version: 1.0.0
  container:
    image: registry/lnd:latest
+    network: archy-net
  ports:
    - host: 10009
      container: 10009
@ -1477,7 +1566,7 @@ app:
    memory_limit: 1g
  security:
    capabilities: []
-    network_policy: archy-net
+    network_policy: isolated
 "#;
        let m = AppManifest::parse(yaml).unwrap();
        let body = QuadletUnit::from_manifest(&m, "lnd").render();
--- a/core/archipelago/src/container/secrets.rs
+++ b/core/archipelago/src/container/secrets.rs
@ -0,0 +1,208 @@
+//! Declarative, self-healing generation of app secrets.
+//!
+//! An app declares `generated_secrets` in its manifest; this module materialises
+//! them just before `secret_env` is resolved. That keeps the migration's
+//! data-driven bar: an app installs from its manifest alone — no host
+//! provisioning and no per-app Rust — and every secret lands `0600`, owned by
+//! the unprivileged (rootless) service user.
+//!
+//! Two properties make it safe to call on every install/reconcile tick:
+//!
+//! * **Idempotent** — a target file that already exists, is readable and
+//!   non-empty is left untouched, so values are stable across ticks.
+//! * **Self-healing without privilege** — a target file that exists but is
+//!   *unreadable* (the classic `root:root`-owned secret left by some earlier
+//!   path) is unlinked and rewritten. Unlinking needs write on the
+//!   service-owned secrets dir, not on the file, so this recovers the broken
+//!   state with no `chown` and no root — exactly what a rootless node needs.
+
+use anyhow::{Context, Result};
+use archipelago_container::{AppManifest, GeneratedSecret, SecretGenKind};
+use rand::RngCore;
+use std::fs;
+use std::io::Write;
+use std::os::unix::fs::OpenOptionsExt;
+use std::path::Path;
+
+/// Plaintext-password length (bytes of entropy) for [`SecretGenKind::Bcrypt`].
+const BCRYPT_PASSWORD_BYTES: usize = 24;
+
+/// Materialise every declared generated secret for `manifest` under
+/// `secrets_dir`. No-op when the manifest declares none. Safe to call on every
+/// reconcile/install tick (idempotent + self-healing).
+pub fn ensure_generated_secrets(secrets_dir: &Path, manifest: &AppManifest) -> Result<()> {
+    let specs = &manifest.app.container.generated_secrets;
+    if specs.is_empty() {
+        return Ok(());
+    }
+    fs::create_dir_all(secrets_dir)
+        .with_context(|| format!("creating secrets dir {}", secrets_dir.display()))?;
+    for gs in specs {
+        ensure_one(secrets_dir, gs).with_context(|| format!("generating secret '{}'", gs.name))?;
+    }
+    Ok(())
+}
+
+fn ensure_one(dir: &Path, gs: &GeneratedSecret) -> Result<()> {
+    let files = gs.target_files();
+
+    // Idempotent fast path: every target file present, readable and non-empty.
+    if files.iter().all(|f| readable_nonempty(&dir.join(f))) {
+        return Ok(());
+    }
+
+    // Self-heal: drop any stale/unreadable target so the write below recreates
+    // it owned by us. Unlinking uses the (service-owned) dir's write bit, so a
+    // wrongly root-owned secret is recovered with no privilege escalation.
+    for f in &files {
+        let p = dir.join(f);
+        if p.exists() && !readable_nonempty(&p) {
+            tracing::warn!("regenerating unreadable/stale secret {}", p.display());
+            fs::remove_file(&p)
+                .with_context(|| format!("removing stale secret {}", p.display()))?;
+        }
+    }
+
+    match gs.kind {
+        SecretGenKind::Hex16 => write_secret(&dir.join(&gs.name), &random_hex(16))?,
+        SecretGenKind::Hex32 => write_secret(&dir.join(&gs.name), &random_hex(32))?,
+        SecretGenKind::Base64 => write_secret(&dir.join(&gs.name), &random_base64(32))?,
+        SecretGenKind::Bcrypt => {
+            let password = random_hex(BCRYPT_PASSWORD_BYTES);
+            let hash = bcrypt::hash(&password, bcrypt::DEFAULT_COST)
+                .context("bcrypt-hashing generated password")?;
+            // Primary (server-facing hash) first, then the plaintext sibling.
+            write_secret(&dir.join(&gs.name), &hash)?;
+            write_secret(&dir.join(format!("{}.pw", gs.name)), &password)?;
+        }
+    }
+    Ok(())
+}
+
+/// True when `path` exists, is readable by this process, and is non-empty after
+/// trimming. Any error (missing, permission denied, empty) reads as false.
+fn readable_nonempty(path: &Path) -> bool {
+    fs::read_to_string(path)
+        .map(|s| !s.trim().is_empty())
+        .unwrap_or(false)
+}
+
+fn random_hex(bytes: usize) -> String {
+    let mut buf = vec![0u8; bytes];
+    rand::thread_rng().fill_bytes(&mut buf);
+    hex::encode(buf)
+}
+
+/// `bytes` of entropy, standard base64 (with padding). For keys that a service
+/// base64-decodes to recover the raw bytes (e.g. netbird's store encryptionKey).
+fn random_base64(bytes: usize) -> String {
+    use base64::Engine as _;
+    let mut buf = vec![0u8; bytes];
+    rand::thread_rng().fill_bytes(&mut buf);
+    base64::engine::general_purpose::STANDARD.encode(buf)
+}
+
+/// Atomically write a `0600` secret: a temp file in the same dir (so the rename
+/// is atomic), fsynced, then renamed over the target.
+fn write_secret(path: &Path, value: &str) -> Result<()> {
+    let dir = path
+        .parent()
+        .context("secret path has no parent directory")?;
+    let name = path
+        .file_name()
+        .and_then(|n| n.to_str())
+        .context("secret path has no filename")?;
+    let tmp = dir.join(format!(".{name}.tmp"));
+
+    let mut f = fs::OpenOptions::new()
+        .write(true)
+        .create(true)
+        .truncate(true)
+        .mode(0o600)
+        .open(&tmp)
+        .with_context(|| format!("creating temp secret {}", tmp.display()))?;
+    f.write_all(value.as_bytes())
+        .with_context(|| format!("writing temp secret {}", tmp.display()))?;
+    f.sync_all()
+        .with_context(|| format!("fsync temp secret {}", tmp.display()))?;
+    drop(f);
+
+    fs::rename(&tmp, path)
+        .with_context(|| format!("renaming {} -> {}", tmp.display(), path.display()))?;
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use archipelago_container::SecretGenKind;
+    use std::os::unix::fs::PermissionsExt;
+
+    fn manifest_with(secrets: Vec<GeneratedSecret>) -> AppManifest {
+        let mut m: AppManifest = serde_yaml::from_str(
+            "app:\n  id: t\n  name: t\n  version: 1.0.0\n  container:\n    image: x:y\n",
+        )
+        .unwrap();
+        m.app.container.generated_secrets = secrets;
+        m
+    }
+
+    fn gs(name: &str, kind: SecretGenKind) -> GeneratedSecret {
+        GeneratedSecret {
+            name: name.to_string(),
+            kind,
+        }
+    }
+
+    #[test]
+    fn generates_hex_and_bcrypt_with_0600() {
+        let dir = tempfile::tempdir().unwrap();
+        let m = manifest_with(vec![
+            gs("tok", SecretGenKind::Hex16),
+            gs("admin", SecretGenKind::Bcrypt),
+        ]);
+        ensure_generated_secrets(dir.path(), &m).unwrap();
+
+        let tok = std::fs::read_to_string(dir.path().join("tok")).unwrap();
+        assert_eq!(tok.trim().len(), 32, "hex16 = 16 bytes = 32 hex chars");
+
+        let hash = std::fs::read_to_string(dir.path().join("admin")).unwrap();
+        let pw = std::fs::read_to_string(dir.path().join("admin.pw")).unwrap();
+        assert!(hash.starts_with("$2"), "bcrypt hash shape");
+        assert!(bcrypt::verify(pw.trim(), hash.trim()).unwrap(), "pw matches hash");
+
+        for f in ["tok", "admin", "admin.pw"] {
+            let mode = std::fs::metadata(dir.path().join(f))
+                .unwrap()
+                .permissions()
+                .mode()
+                & 0o777;
+            assert_eq!(mode, 0o600, "{f} must be 0600");
+        }
+    }
+
+    #[test]
+    fn idempotent_value_is_stable() {
+        let dir = tempfile::tempdir().unwrap();
+        let m = manifest_with(vec![gs("tok", SecretGenKind::Hex32)]);
+        ensure_generated_secrets(dir.path(), &m).unwrap();
+        let first = std::fs::read_to_string(dir.path().join("tok")).unwrap();
+        ensure_generated_secrets(dir.path(), &m).unwrap();
+        let second = std::fs::read_to_string(dir.path().join("tok")).unwrap();
+        assert_eq!(first, second, "a present readable secret is never rewritten");
+    }
+
+    #[test]
+    fn self_heals_unreadable_secret() {
+        // Simulate the root-owned case: a present-but-unreadable file. We can't
+        // chmod-away read as the owner in a unit test, so emulate "unreadable"
+        // via the empty-file branch (readable_nonempty == false), which drives
+        // the same unlink+regenerate path.
+        let dir = tempfile::tempdir().unwrap();
+        std::fs::write(dir.path().join("tok"), "").unwrap();
+        let m = manifest_with(vec![gs("tok", SecretGenKind::Hex16)]);
+        ensure_generated_secrets(dir.path(), &m).unwrap();
+        let v = std::fs::read_to_string(dir.path().join("tok")).unwrap();
+        assert_eq!(v.trim().len(), 32, "stale/empty secret was regenerated");
+    }
+}
--- a/core/archipelago/src/container/version_config.rs
+++ b/core/archipelago/src/container/version_config.rs
@ -0,0 +1,278 @@
+//! Per-app version preferences — the persistence layer for multi-version support.
+//!
+//! Multi-version support (`docs/bitcoin-multi-version-design.md`) lets a node
+//! runner pin Bitcoin Core / Knots to a specific version and opt into
+//! auto-update-to-latest. Both choices live in the existing per-app config file
+//! at `/var/lib/archipelago/app-configs/<id>.json` as two keys:
+//!
+//! ```jsonc
+//! { "pinnedVersion": "29.3.knots20260508", "autoUpdate": false }
+//! ```
+//!
+//! This is the single source of truth the orchestrator's install path reads to
+//! resolve the image, and that the auto-update tick + "available update" badge
+//! consult. Reads/writes are merge-preserving so they never clobber any
+//! `containerConfig` (ports/volumes/env) a generic app may also store here.
+//!
+//! Platform-managed apps (bitcoin-core/knots/…) never use the
+//! `containerConfig`-style keys (see `config.rs::dynamic_app_config`, which
+//! returns early for them), so adding these keys to their file is collision-free.
+
+use serde_json::{Map, Value};
+use std::path::PathBuf;
+
+/// Resolved version preferences for one app. Defaults: no pin, auto-update off
+/// (consensus-critical apps opt in explicitly — design open-question #4).
+#[derive(Debug, Clone, Default, PartialEq, Eq)]
+pub struct AppVersionConfig {
+    /// The version string the runner pinned, if any. Suppresses the update badge
+    /// and overrides the catalog default at install/recreate time.
+    pub pinned_version: Option<String>,
+    /// When true, the hourly catalog tick updates this app to the catalog
+    /// default automatically. Ignored while a version is pinned.
+    pub auto_update: bool,
+}
+
+fn config_dir() -> PathBuf {
+    let base = std::env::var("ARCHIPELAGO_DATA_DIR")
+        .unwrap_or_else(|_| "/var/lib/archipelago".to_string());
+    PathBuf::from(base).join("app-configs")
+}
+
+fn config_path(app_id: &str) -> PathBuf {
+    config_dir().join(format!("{app_id}.json"))
+}
+
+/// App ids that have opted into auto-update-to-latest AND are not pinned (a pin
+/// is an explicit "stay here"). Drives the hourly per-app auto-update tick. The
+/// app id is the config file stem. Returns empty when the dir is absent.
+pub fn auto_update_apps() -> Vec<String> {
+    let mut out = Vec::new();
+    let Ok(entries) = std::fs::read_dir(config_dir()) else {
+        return out;
+    };
+    for entry in entries.flatten() {
+        let path = entry.path();
+        if path.extension().and_then(|e| e.to_str()) != Some("json") {
+            continue;
+        }
+        let Some(app_id) = path.file_stem().and_then(|s| s.to_str()) else {
+            continue;
+        };
+        let cfg = read(app_id);
+        if cfg.auto_update && cfg.pinned_version.is_none() {
+            out.push(app_id.to_string());
+        }
+    }
+    out
+}
+
+fn read_raw(app_id: &str) -> Map<String, Value> {
+    let path = config_path(app_id);
+    match std::fs::read_to_string(&path) {
+        Ok(s) => serde_json::from_str::<Value>(&s)
+            .ok()
+            .and_then(|v| v.as_object().cloned())
+            .unwrap_or_default(),
+        Err(_) => Map::new(),
+    }
+}
+
+/// Read the version preferences for `app_id`. Returns defaults when the file is
+/// absent or the keys are unset.
+pub fn read(app_id: &str) -> AppVersionConfig {
+    let obj = read_raw(app_id);
+    AppVersionConfig {
+        pinned_version: obj
+            .get("pinnedVersion")
+            .and_then(Value::as_str)
+            .filter(|s| !s.is_empty())
+            .map(String::from),
+        auto_update: obj
+            .get("autoUpdate")
+            .and_then(Value::as_bool)
+            .unwrap_or(false),
+    }
+}
+
+/// The pinned version for `app_id`, if set. Convenience for the hot path.
+pub fn pinned_version(app_id: &str) -> Option<String> {
+    read(app_id).pinned_version
+}
+
+/// Parse the leading numeric `major.minor.patch` of a version string into a
+/// comparable tuple. Stops at the first non-numeric component, so Bitcoin Core
+/// (`31.0`, `28.4`) and the Knots date-suffixed form (`29.3.knots20260508` →
+/// `(29, 3, 0)`) both compare on their consensus-relevant major/minor. The
+/// Knots build-date suffix is intentionally ignored — a same-major.minor Knots
+/// rebuild is not a chainstate downgrade.
+fn version_key(version: &str) -> (u64, u64, u64) {
+    let mut it = version.split('.').map(|c| {
+        // Take the leading digit run of each dotted component (`knots20260508`
+        // yields no leading digits → 0; `3` → 3).
+        c.chars()
+            .take_while(|ch| ch.is_ascii_digit())
+            .collect::<String>()
+            .parse::<u64>()
+            .unwrap_or(0)
+    });
+    (
+        it.next().unwrap_or(0),
+        it.next().unwrap_or(0),
+        it.next().unwrap_or(0),
+    )
+}
+
+/// True when installing `candidate` over `current` is a DOWNGRADE — an older
+/// Bitcoin release over a chainstate written by a newer one. This is the
+/// highest-risk operation (Core refuses to start on a newer chainstate without
+/// an expensive reindex; pruned nodes can lose data), so the UI must warn and
+/// the switch must be explicitly confirmed (design §4). Equal or newer → false.
+pub fn is_downgrade(current: &str, candidate: &str) -> bool {
+    version_key(candidate) < version_key(current)
+}
+
+/// Merge `cfg` into the on-disk config, preserving every other key. A
+/// `pinned_version` of `None` removes the `pinnedVersion` key (un-pins / "track
+/// latest"). Creates the directory and file on first write.
+pub fn write(app_id: &str, cfg: &AppVersionConfig) -> std::io::Result<()> {
+    let path = config_path(app_id);
+    let mut obj = read_raw(app_id);
+    match &cfg.pinned_version {
+        Some(v) => {
+            obj.insert("pinnedVersion".to_string(), Value::String(v.clone()));
+        }
+        None => {
+            obj.remove("pinnedVersion");
+        }
+    }
+    obj.insert("autoUpdate".to_string(), Value::Bool(cfg.auto_update));
+
+    if let Some(parent) = path.parent() {
+        std::fs::create_dir_all(parent)?;
+    }
+    let serialized = serde_json::to_string_pretty(&Value::Object(obj))
+        .map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
+    // Atomic-ish write: temp + rename so a crash mid-write can't truncate config.
+    let tmp = path.with_extension("json.tmp");
+    std::fs::write(&tmp, serialized.as_bytes())?;
+    std::fs::rename(&tmp, &path)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    // `ARCHIPELAGO_DATA_DIR` is process-global, so the write/read tests must not
+    // run concurrently — serialize them and give each a unique dir. Without this
+    // lock, parallel `cargo test` races on the env var (poisoning is fine: a
+    // panicking test still releases a usable guard).
+    static ENV_LOCK: std::sync::Mutex<u64> = std::sync::Mutex::new(0);
+
+    fn with_tmp_data_dir<F: FnOnce()>(f: F) {
+        let mut counter = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
+        *counter += 1;
+        let dir = std::env::temp_dir().join(format!(
+            "archy-vc-test-{}-{}",
+            std::process::id(),
+            *counter
+        ));
+        let _ = std::fs::remove_dir_all(&dir);
+        std::fs::create_dir_all(&dir).unwrap();
+        std::env::set_var("ARCHIPELAGO_DATA_DIR", &dir);
+        f();
+        std::env::remove_var("ARCHIPELAGO_DATA_DIR");
+        let _ = std::fs::remove_dir_all(&dir);
+        // `counter` guard drops here, releasing the lock for the next test.
+    }
+
+    #[test]
+    fn defaults_when_absent() {
+        with_tmp_data_dir(|| {
+            let cfg = read("bitcoin-core");
+            assert_eq!(cfg.pinned_version, None);
+            assert!(!cfg.auto_update);
+        });
+    }
+
+    #[test]
+    fn write_then_read_roundtrips() {
+        with_tmp_data_dir(|| {
+            write(
+                "bitcoin-knots",
+                &AppVersionConfig {
+                    pinned_version: Some("29.3.knots20260508".into()),
+                    auto_update: false,
+                },
+            )
+            .unwrap();
+            let cfg = read("bitcoin-knots");
+            assert_eq!(cfg.pinned_version.as_deref(), Some("29.3.knots20260508"));
+            assert!(!cfg.auto_update);
+        });
+    }
+
+    #[test]
+    fn write_preserves_existing_keys() {
+        with_tmp_data_dir(|| {
+            // Simulate a generic app's containerConfig already on disk.
+            let path = config_path("someapp");
+            std::fs::create_dir_all(path.parent().unwrap()).unwrap();
+            std::fs::write(&path, r#"{"ports":["80:80"],"autoUpdate":false}"#).unwrap();
+            write(
+                "someapp",
+                &AppVersionConfig {
+                    pinned_version: Some("1.2.3".into()),
+                    auto_update: true,
+                },
+            )
+            .unwrap();
+            let raw = read_raw("someapp");
+            assert!(raw.contains_key("ports"), "ports key must survive");
+            assert_eq!(raw.get("pinnedVersion").unwrap(), "1.2.3");
+            assert_eq!(raw.get("autoUpdate").unwrap(), &Value::Bool(true));
+        });
+    }
+
+    #[test]
+    fn downgrade_detection() {
+        // Older over newer = downgrade.
+        assert!(is_downgrade("31.0", "30.0"));
+        assert!(is_downgrade("28.4", "27.2"));
+        // Same or newer = not a downgrade.
+        assert!(!is_downgrade("30.0", "31.0"));
+        assert!(!is_downgrade("28.4", "28.4"));
+        // Knots date-suffixed strings compare on major.minor only.
+        assert!(is_downgrade("29.3.knots20260508", "28.1.knots20251010"));
+        assert!(!is_downgrade(
+            "29.3.knots20260101",
+            "29.3.knots20260508"
+        ));
+    }
+
+    #[test]
+    fn unpin_removes_key() {
+        with_tmp_data_dir(|| {
+            write(
+                "bitcoin-core",
+                &AppVersionConfig {
+                    pinned_version: Some("31.0".into()),
+                    auto_update: true,
+                },
+            )
+            .unwrap();
+            write(
+                "bitcoin-core",
+                &AppVersionConfig {
+                    pinned_version: None,
+                    auto_update: true,
+                },
+            )
+            .unwrap();
+            let raw = read_raw("bitcoin-core");
+            assert!(!raw.contains_key("pinnedVersion"));
+            assert_eq!(read("bitcoin-core").pinned_version, None);
+            assert!(read("bitcoin-core").auto_update);
+        });
+    }
+}
--- a/core/archipelago/src/crash_recovery.rs
+++ b/core/archipelago/src/crash_recovery.rs
@ -61,6 +61,22 @@ pub async fn load_user_stopped(data_dir: &Path) -> std::collections::HashSet<Str
    }
 }

+/// Names of the containers that were running at the last periodic snapshot
+/// (`running-containers.json`, saved every ~120s by `save_container_snapshot`).
+/// Unlike `check_for_crash`, this reads the snapshot unconditionally (no PID/crash
+/// gate) — it's the durable "what was running" signal the boot reconciler uses to
+/// recreate a previously-running app whose container vanished. Empty if absent.
+pub async fn load_last_running_names(data_dir: &Path) -> std::collections::HashSet<String> {
+    let path = data_dir.join(CONTAINER_STATE_FILE);
+    match fs::read_to_string(&path).await {
+        Ok(content) => match serde_json::from_str::<ContainerSnapshot>(&content) {
+            Ok(snapshot) => snapshot.containers.into_iter().map(|c| c.name).collect(),
+            Err(_) => std::collections::HashSet::new(),
+        },
+        Err(_) => std::collections::HashSet::new(),
+    }
+}
+
 /// Save the set of user-stopped containers to disk.
 pub async fn save_user_stopped(data_dir: &Path, stopped: &std::collections::HashSet<String>) {
    let path = data_dir.join(USER_STOPPED_FILE);
@ -898,6 +914,43 @@ mod tests {
        assert_eq!(containers[1].name, "archy-mempool-web");
    }

+    #[tokio::test]
+    async fn test_load_last_running_names_reads_snapshot_without_pid_gate() {
+        let tmp = TempDir::new().unwrap();
+        // No PID file written — load_last_running_names must NOT require a crash.
+        let snapshot = ContainerSnapshot {
+            timestamp: 1000,
+            containers: vec![
+                RunningContainerRecord {
+                    name: "immich_server".to_string(),
+                    image: "immich:2.7".to_string(),
+                },
+                RunningContainerRecord {
+                    name: "immich_postgres".to_string(),
+                    image: "postgres:16".to_string(),
+                },
+            ],
+        };
+        fs::write(
+            tmp.path().join(CONTAINER_STATE_FILE),
+            serde_json::to_string(&snapshot).unwrap(),
+        )
+        .await
+        .unwrap();
+
+        let names = load_last_running_names(tmp.path()).await;
+        assert_eq!(names.len(), 2);
+        assert!(names.contains("immich_server"));
+        assert!(names.contains("immich_postgres"));
+        assert!(!names.contains("immich_redis"));
+    }
+
+    #[tokio::test]
+    async fn test_load_last_running_names_empty_when_absent() {
+        let tmp = TempDir::new().unwrap();
+        assert!(load_last_running_names(tmp.path()).await.is_empty());
+    }
+
    #[tokio::test]
    async fn test_write_and_remove_pid_marker() {
        let tmp = TempDir::new().unwrap();
--- a/core/archipelago/src/main.rs
+++ b/core/archipelago/src/main.rs
@ -198,14 +198,53 @@ async fn main() -> Result<()> {
        (Some(trait_obj), Some(dev))
    } else {
        let prod = Arc::new(ProdContainerOrchestrator::new(config.clone()).await?);
+        // Pull the freshest signed app-catalog BEFORE loading manifests, so any
+        // registry-embedded manifest (the origin-wins overlay in load_manifests)
+        // is in place on THIS boot — not a restart later. Without this the boot
+        // would overlay the previous run's cached catalog and a newly-published
+        // app (e.g. a registry-only install) wouldn't appear until the next
+        // restart. Bounded + best-effort: on timeout/unreachable origin the
+        // last-cached catalog (or the disk manifests) still load — registry is
+        // an overlay on top of disk, never a hard dependency.
+        match tokio::time::timeout(
+            std::time::Duration::from_secs(25),
+            crate::container::app_catalog::refresh_catalog(&config.data_dir),
+        )
+        .await
+        {
+            Ok(Ok(n)) => info!("🛰️  app-catalog refreshed before manifest load ({n} apps)"),
+            Ok(Err(e)) => tracing::debug!("app-catalog pre-load refresh failed (using cache): {e}"),
+            Err(_) => tracing::debug!("app-catalog pre-load refresh timed out (using cache)"),
+        }
        // Best-effort manifest load; a missing /opt/archipelago/apps is
        // logged inside load_manifests and not fatal.
        match prod.load_manifests().await {
-            Ok(n) => info!("📦 Loaded {n} app manifest(s) from disk"),
+            Ok(n) => info!("📦 Loaded {n} app manifest(s) (disk + registry catalog)"),
            Err(e) => {
                tracing::error!(error = %e, "prod orchestrator: load_manifests failed at startup");
            }
        }
+        // Reboot-survival safety net for the podman `--restart` path: ensure the
+        // user's podman-restart.service is enabled so `unless-stopped` containers
+        // come back after a reboot even when the Quadlet backend path is off
+        // (orchestrator-installed backends like immich/btcpay run as plain podman
+        // containers until the Phase-3 Quadlet rollout). Idempotent + best-effort.
+        {
+            let out = tokio::process::Command::new("systemctl")
+                .args(["--user", "enable", "--now", "podman-restart.service"])
+                .output()
+                .await;
+            match out {
+                Ok(o) if o.status.success() => {
+                    info!("🔁 podman-restart.service enabled (reboot-survival for --restart containers)")
+                }
+                Ok(o) => tracing::debug!(
+                    "podman-restart.service enable skipped: {}",
+                    String::from_utf8_lossy(&o.stderr).trim()
+                ),
+                Err(e) => tracing::debug!("podman-restart.service enable skipped: {e}"),
+            }
+        }
        // Adoption pass: link existing podman containers back to their
        // manifests so the reconciler doesn't recreate them.
        match tokio::time::timeout(Duration::from_secs(35), prod.adopt_existing()).await {
@ -249,7 +288,9 @@ async fn main() -> Result<()> {
    // via auth.setup RPC. The Login page detects is_setup=false and shows
    // "Create Password" form instead of login form.

-    // Create server
+    // Create server. Keep a clone of the orchestrator handle for the background
+    // update scheduler (per-app auto-update applies via the orchestrator).
+    let update_orchestrator = orchestrator.clone();
    let server = Server::new(config.clone(), orchestrator, dev_orchestrator).await?;

    // Start server
@ -274,10 +315,12 @@ async fn main() -> Result<()> {
        });
    }

-    // Spawn background update scheduler
+    // Spawn background update scheduler. Pass the orchestrator so the scheduler
+    // can apply per-app auto-update-to-latest (multi-version support) via the
+    // safe orchestrator upgrade path; None in dev mode disables it.
    let update_data_dir = config.data_dir.clone();
    tokio::spawn(async move {
-        update::run_update_scheduler(update_data_dir).await;
+        update::run_update_scheduler(update_data_dir, update_orchestrator).await;
    });

    // Synchronize host-side doctor artifacts (script + systemd units) with
--- a/core/archipelago/src/mesh/listener/mod.rs
+++ b/core/archipelago/src/mesh/listener/mod.rs
@ -373,6 +373,8 @@ pub fn spawn_mesh_listener(
    our_x25519_secret: [u8; 32],
    our_x25519_pubkey_hex: String,
    server_name: Option<String>,
+    lora_region: Option<String>,
+    channel_name: Option<String>,
    shutdown: tokio::sync::watch::Receiver<bool>,
    cmd_rx: mpsc::Receiver<MeshCommand>,
 ) -> tokio::task::JoinHandle<()> {
@ -394,6 +396,8 @@ pub fn spawn_mesh_listener(
                &our_x25519_secret,
                &our_x25519_pubkey_hex,
                server_name.as_deref(),
+                lora_region.as_deref(),
+                channel_name.as_deref(),
                &mut shutdown,
                &mut cmd_rx,
            )
--- a/core/archipelago/src/mesh/listener/session.rs
+++ b/core/archipelago/src/mesh/listener/session.rs
@ -39,6 +39,30 @@ impl MeshRadioDevice {
        }
    }

+    /// Provision the operator-configured LoRa region. Meshcore radios manage
+    /// their own band on the device, so this is a no-op for them; Meshtastic
+    /// radios ship region-UNSET (RF-silent) and must be set or they never mesh.
+    /// Returns `Ok(true)` when a region was written (the device reboots to
+    /// apply, so the caller should restart the session).
+    async fn ensure_lora_region(&mut self, region: Option<&str>) -> Result<bool> {
+        match self {
+            Self::Meshcore(_) => Ok(false),
+            Self::Meshtastic(device) => device.ensure_lora_region(region).await,
+        }
+    }
+
+    /// Provision the shared archy primary channel so all nodes can decode each
+    /// other. No-op for meshcore (it joins its channel by name on the device);
+    /// Meshtastic radios can sit on mismatched channels otherwise and silently
+    /// drop every packet as undecryptable. Returns `Ok(true)` when a channel was
+    /// written (device reboots; caller should restart the session).
+    async fn ensure_channel(&mut self, channel_name: Option<&str>) -> Result<bool> {
+        match self {
+            Self::Meshcore(_) => Ok(false),
+            Self::Meshtastic(device) => device.ensure_channel(channel_name).await,
+        }
+    }
+
    async fn send_self_advert(&mut self) -> Result<()> {
        match self {
            Self::Meshcore(device) => device.send_self_advert().await,
@ -46,6 +70,17 @@ impl MeshRadioDevice {
        }
    }

+    /// Actively advertise our identity over the air. Meshcore already does this
+    /// inside `send_self_advert` (CMD_SEND_SELF_ADVERT), so this is a no-op for
+    /// it; Meshtastic needs an explicit NodeInfo broadcast or peers never learn
+    /// about an already-running node.
+    async fn send_nodeinfo_advert(&mut self, want_response: bool) -> Result<()> {
+        match self {
+            Self::Meshcore(_) => Ok(()),
+            Self::Meshtastic(device) => device.send_nodeinfo_broadcast(want_response).await,
+        }
+    }
+
    async fn send_channel_text(&mut self, channel: u8, payload: &[u8]) -> Result<()> {
        match self {
            Self::Meshcore(device) => device.send_channel_text(channel, payload).await,
@ -471,6 +506,23 @@ async fn sync_queued_messages(
    }
 }

+/// How many times we will try to write the LoRa region across reconnects before
+/// giving up. A healthy radio accepts it on the first try (the reboot-and-verify
+/// resolves on the next session). A radio that silently refuses to persist
+/// config — corrupt/full flash, managed mode, etc. — would otherwise reboot-loop
+/// forever; after this many attempts we stop, log, and run without it.
+const MAX_REGION_PROVISION_ATTEMPTS: u32 = 3;
+
+/// Process-global count of LoRa-region writes attempted (one radio per process).
+/// Reset to 0 whenever the radio reports the desired region, so genuine later
+/// drift re-provisions but a broken radio doesn't loop.
+static REGION_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
+    std::sync::atomic::AtomicU32::new(0);
+
+/// Same retry-cap idea as the region, for the shared-channel write.
+static CHANNEL_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
+    std::sync::atomic::AtomicU32::new(0);
+
 /// Run a single mesh session (connect, initialize, main loop).
 pub(super) async fn run_mesh_session(
    state: &Arc<MeshState>,
@ -480,6 +532,8 @@ pub(super) async fn run_mesh_session(
    our_x25519_secret: &[u8; 32],
    our_x25519_pubkey_hex: &str,
    server_name: Option<&str>,
+    lora_region: Option<&str>,
+    channel_name: Option<&str>,
    shutdown: &mut tokio::sync::watch::Receiver<bool>,
    cmd_rx: &mut mpsc::Receiver<MeshCommand>,
 ) -> Result<()> {
@ -512,6 +566,73 @@ pub(super) async fn run_mesh_session(

    let _ = state.event_tx.send(MeshEvent::DeviceConnected(device_info));

+    // Provision the LoRa region before anything else. A fresh Meshtastic radio
+    // is region-UNSET and therefore RF-silent — it can neither hear nor be
+    // heard, so contact discovery and DMs would all silently fail. If we write
+    // a new region the firmware reboots to apply it; restart the session so we
+    // re-handshake the freshly-rebooted radio (and then set its name on the
+    // reconnect, where the region already matches and no reboot occurs).
+    use std::sync::atomic::Ordering;
+    let region_attempts = REGION_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
+    if region_attempts < MAX_REGION_PROVISION_ATTEMPTS {
+        match device.ensure_lora_region(lora_region).await {
+            Ok(true) => {
+                REGION_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
+                info!(
+                    region = lora_region.unwrap_or(""),
+                    attempt = region_attempts + 1,
+                    max = MAX_REGION_PROVISION_ATTEMPTS,
+                    "Provisioned LoRa region — radio rebooting, restarting mesh session"
+                );
+                // Give the radio time to reboot before the reconnect re-opens it.
+                tokio::time::sleep(Duration::from_secs(10)).await;
+                return Ok(());
+            }
+            // Radio reports the desired region (or none configured): clear the
+            // attempt counter so a future genuine drift re-provisions cleanly.
+            Ok(false) => REGION_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
+            Err(e) => warn!("Failed to provision LoRa region: {}", e),
+        }
+    } else if lora_region.is_some() {
+        warn!(
+            region = lora_region.unwrap_or(""),
+            attempts = MAX_REGION_PROVISION_ATTEMPTS,
+            "Radio did not persist the configured LoRa region after repeated \
+             attempts — continuing without it. The radio likely needs a manual \
+             factory reset / reflash; mesh discovery stays offline until its \
+             region is set."
+        );
+    }
+
+    // Provision the shared primary channel (after the region, since both reboot
+    // the radio). Without a matching channel two same-region radios still can't
+    // decode each other's traffic. Same retry-cap + restart-on-change pattern.
+    let channel_attempts = CHANNEL_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
+    if channel_attempts < MAX_REGION_PROVISION_ATTEMPTS {
+        match device.ensure_channel(channel_name).await {
+            Ok(true) => {
+                CHANNEL_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
+                info!(
+                    channel = channel_name.unwrap_or(""),
+                    attempt = channel_attempts + 1,
+                    max = MAX_REGION_PROVISION_ATTEMPTS,
+                    "Provisioned shared mesh channel — radio rebooting, restarting mesh session"
+                );
+                tokio::time::sleep(Duration::from_secs(10)).await;
+                return Ok(());
+            }
+            Ok(false) => CHANNEL_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
+            Err(e) => warn!("Failed to provision mesh channel: {}", e),
+        }
+    } else if channel_name.is_some() {
+        warn!(
+            channel = channel_name.unwrap_or(""),
+            attempts = MAX_REGION_PROVISION_ATTEMPTS,
+            "Radio did not persist the shared mesh channel after repeated \
+             attempts — continuing without it; the radio may need a manual reset."
+        );
+    }
+
    // Set advert name to the server's human-readable name (e.g. "ThinkPad"),
    // falling back to the DID fragment if no name is configured.
    let advert_name = if let Some(name) = server_name {
@ -536,6 +657,13 @@ pub(super) async fn run_mesh_session(
    if let Err(e) = device.send_self_advert().await {
        warn!("Failed to send initial advert: {}", e);
    }
+    // Actively announce our identity over the air with want_response, so any
+    // already-running neighbour both learns about us and replies with its own
+    // NodeInfo — immediate two-way discovery instead of waiting for the radio's
+    // multi-hour NodeInfo cycle. (No-op for meshcore.)
+    if let Err(e) = device.send_nodeinfo_advert(true).await {
+        warn!("Failed to send initial NodeInfo advert: {}", e);
+    }

    // NOTE: Archipelago identity adverts (`ARCHY:2:{ed}:{x25519}`) are intentionally
    // NOT broadcast on the shared public channel (channel 0). Doing so spams every
@ -615,6 +743,13 @@ pub(super) async fn run_mesh_session(
                } else {
                    consecutive_write_failures = 0;
                }
+                // Periodic over-air identity beacon (no want_response, to avoid
+                // reply storms) so peers that come online later still discover
+                // us between the radio's own infrequent NodeInfo broadcasts.
+                // No-op for meshcore (its self-advert above already goes out).
+                if let Err(e) = device.send_nodeinfo_advert(false).await {
+                    debug!("Periodic NodeInfo advert failed: {}", e);
+                }
                // (Identity re-broadcast on the public channel intentionally
                // removed — see the note at session startup. It spammed the
                // shared channel every advert tick.)
--- a/core/archipelago/src/mesh/meshtastic.rs
+++ b/core/archipelago/src/mesh/meshtastic.rs
@ -22,6 +22,10 @@ const START2: u8 = 0xc3;
 const TO_RADIO_MAX: usize = 512;
 const BROADCAST_NUM: u32 = 0xffff_ffff;
 const TEXT_MESSAGE_APP: u32 = 1;
+/// Meshtastic PortNum for NodeInfo (identity) packets — used to actively
+/// advertise ourselves over the air so neighbours discover us, the parity
+/// equivalent of meshcore's self-advert.
+const NODEINFO_APP: u32 = 4;
 /// Meshtastic PortNum for admin (config) packets.
 const ADMIN_APP: u32 = 6;
 /// AdminMessage.set_owner oneof field number (carries a `User`).
@ -37,9 +41,31 @@ const TO_RADIO_HEARTBEAT: u64 = 7;
 const FROM_RADIO_PACKET: u64 = 2;
 const FROM_RADIO_MY_INFO: u64 = 3;
 const FROM_RADIO_NODE_INFO: u64 = 4;
+/// FromRadio.config (field 5): a `Config` block streamed during want_config.
+const FROM_RADIO_CONFIG: u64 = 5;
 const FROM_RADIO_CONFIG_COMPLETE_ID: u64 = 7;
 const FROM_RADIO_REBOOTED: u64 = 8;

+/// AdminMessage.set_config oneof field number (carries a `Config`). NB: 33 is
+/// `set_channel` — `set_config` is 34 (verified against meshtastic/protobufs).
+const ADMIN_SET_CONFIG_FIELD: u64 = 34;
+/// AdminMessage.set_channel oneof field number (carries a `Channel`).
+const ADMIN_SET_CHANNEL_FIELD: u64 = 33;
+/// FromRadio.channel (field 10): a `Channel` streamed during want_config.
+const FROM_RADIO_CHANNEL: u64 = 10;
+/// Channel.role value for the PRIMARY channel (broadcasts ride here).
+const CHANNEL_ROLE_PRIMARY: u64 = 1;
+/// Config.lora oneof field number (carries a `LoRaConfig`).
+const CONFIG_LORA_FIELD: u64 = 6;
+/// LoRaConfig field numbers we set when provisioning the radio's region.
+const LORA_USE_PRESET_FIELD: u64 = 1;
+const LORA_REGION_FIELD: u64 = 7;
+const LORA_HOP_LIMIT_FIELD: u64 = 8;
+const LORA_TX_ENABLED_FIELD: u64 = 9;
+/// RegionCode::UNSET — a radio in this state refuses to transmit or receive on
+/// LoRa, so it can never mesh. Fresh-flashed radios ship UNSET.
+const REGION_UNSET: u32 = 0;
+
 /// Async Meshtastic device handle.
 pub struct MeshtasticDevice {
    port: serial2_tokio::SerialPort,
@ -57,6 +83,19 @@ pub struct MeshtasticDevice {
    /// records which peers are PKC-capable, so we can tell a true end-to-end
    /// (PKI) DM from a channel-PSK fallback.
    peer_pubkeys: HashMap<u32, Vec<u8>>,
+    /// The radio's currently-configured LoRa region code, learned from the
+    /// `Config.lora` block during `initialize`. `None` until that frame is
+    /// seen; `Some(REGION_UNSET)` for a fresh radio that has never had a region
+    /// set (which means it is RF-silent). Used to decide whether we need to
+    /// provision the operator-configured region — and to avoid a reboot loop by
+    /// only writing when it actually differs.
+    current_region: Option<u32>,
+    /// The radio's current PRIMARY channel as `(name, psk)`, learned from the
+    /// `Channel` blocks during `initialize`. Two radios only decode each other
+    /// when their primary channel (name + psk → channel hash) matches, so archy
+    /// provisions a shared channel here the same way it provisions the region.
+    /// `None` until a primary `Channel` frame is seen.
+    current_primary_channel: Option<(String, Vec<u8>)>,
    device_path: String,
 }

@ -84,6 +123,8 @@ impl MeshtasticDevice {
            short_name: None,
            contacts: HashMap::new(),
            peer_pubkeys: HashMap::new(),
+            current_region: None,
+            current_primary_channel: None,
            device_path: path.to_string(),
        })
    }
@ -203,10 +244,207 @@ impl MeshtasticDevice {
        Ok(())
    }

+    /// Ensure the radio is provisioned for the operator-configured LoRa region.
+    /// A freshly-flashed Meshtastic radio ships with `region = UNSET`, which
+    /// makes the firmware refuse to transmit or receive anything — so two such
+    /// radios can never see each other and the mesh appears empty. This is the
+    /// Meshtastic analog of how a meshcore radio comes up on its configured
+    /// band: archy brings every node onto the same region automatically.
+    ///
+    /// Returns `Ok(true)` when it actually wrote a new region (the device then
+    /// reboots to apply it, so the caller should restart the session). Returns
+    /// `Ok(false)` when no change was needed (already correct, no region
+    /// configured, or an unrecognised region string) — never reboot-loops.
+    pub async fn ensure_lora_region(&mut self, region: Option<&str>) -> Result<bool> {
+        let Some(region_str) = region else {
+            return Ok(false);
+        };
+        let Some(code) = region_name_to_code(region_str) else {
+            warn!(
+                region = region_str,
+                "Unknown LoRa region in mesh-config — leaving radio region unchanged"
+            );
+            return Ok(false);
+        };
+        if code == REGION_UNSET {
+            // Operator explicitly asked for UNSET (or blank) — don't fight it.
+            return Ok(false);
+        }
+        match self.current_region {
+            Some(cur) if cur == code => Ok(false),
+            _ => {
+                self.set_lora_region(code).await?;
+                Ok(true)
+            }
+        }
+    }
+
+    /// Write a LoRa region to the locally-connected radio via an
+    /// `AdminMessage { set_config: Config { lora: LoRaConfig { … } } }` on the
+    /// ADMIN_APP port — the same local-admin path `set_advert_name` uses (no
+    /// session passkey needed over serial). We send a minimal, valid preset
+    /// config: `use_preset` + `LONG_FAST` (the default modem preset), the
+    /// chosen `region`, a sane `hop_limit`, and `tx_enabled`. The firmware
+    /// reboots to apply the change.
+    pub async fn set_lora_region(&mut self, region_code: u32) -> Result<()> {
+        let Some(node_num) = self.node_num else {
+            anyhow::bail!("Meshtastic set_lora_region: node_num unknown");
+        };
+
+        // LoRaConfig { use_preset(1)=true, region(7)=code, hop_limit(8)=3,
+        // tx_enabled(9)=true }. modem_preset defaults to LONG_FAST (0) and
+        // tx_power defaults to max, which is what we want for a stock mesh.
+        let mut lora = Vec::new();
+        encode_varint_field_into(LORA_USE_PRESET_FIELD, 1, &mut lora);
+        encode_varint_field_into(LORA_REGION_FIELD, region_code as u64, &mut lora);
+        encode_varint_field_into(LORA_HOP_LIMIT_FIELD, 3, &mut lora);
+        encode_varint_field_into(LORA_TX_ENABLED_FIELD, 1, &mut lora);
+
+        // Config { lora(6): LoRaConfig }
+        let mut config = Vec::new();
+        encode_len_field(CONFIG_LORA_FIELD, &lora, &mut config);
+
+        // AdminMessage { set_config(33): Config }
+        let mut admin = Vec::new();
+        encode_len_field(ADMIN_SET_CONFIG_FIELD, &config, &mut admin);
+
+        let packet = encode_mesh_packet(node_num, ADMIN_APP, &admin);
+        self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
+            .await
+            .context("Failed to send Meshtastic set_config(LoRa region) admin packet")?;
+
+        info!(
+            node_num,
+            region_code, "Set Meshtastic LoRa region (device will reboot to apply)"
+        );
+        self.current_region = Some(region_code);
+        Ok(())
+    }
+
+    /// Ensure the radio's PRIMARY channel matches the shared archy channel so
+    /// all nodes can decode each other. Region gets two radios onto the same
+    /// band; a matching channel (name + psk → channel hash) gets them decoding
+    /// each other's traffic — without it they hear each other but drop every
+    /// packet as undecryptable. The psk is derived deterministically from the
+    /// channel name, so every archy node with the same `channel_name` converges
+    /// on the same channel (the parity equivalent of meshcore's named channel).
+    ///
+    /// Returns `Ok(true)` when it wrote a new channel (the device reboots to
+    /// apply, so the caller should restart the session); `Ok(false)` when no
+    /// change was needed — never reboot-loops.
+    pub async fn ensure_channel(&mut self, channel_name: Option<&str>) -> Result<bool> {
+        let Some(channel_name) = channel_name else {
+            return Ok(false);
+        };
+        if channel_name.is_empty() {
+            return Ok(false);
+        }
+        let desired_psk = derive_channel_psk(channel_name);
+        let already = matches!(
+            &self.current_primary_channel,
+            Some((name, psk)) if name == channel_name && psk == &desired_psk
+        );
+        if already {
+            Ok(false)
+        } else {
+            self.set_channel(channel_name, &desired_psk).await?;
+            Ok(true)
+        }
+    }
+
+    /// Write the PRIMARY channel via `AdminMessage { set_channel: Channel { … } }`
+    /// (the same local-admin path as `set_advert_name`). The firmware reboots to
+    /// apply it.
+    pub async fn set_channel(&mut self, name: &str, psk: &[u8]) -> Result<()> {
+        let Some(node_num) = self.node_num else {
+            anyhow::bail!("Meshtastic set_channel: node_num unknown");
+        };
+
+        // ChannelSettings { psk(2), name(3) }
+        let mut settings = Vec::new();
+        encode_len_field(2, psk, &mut settings);
+        encode_len_field(3, name.as_bytes(), &mut settings);
+
+        // Channel { index(1)=0, settings(2), role(3)=PRIMARY }
+        let mut channel = Vec::new();
+        encode_varint_field_into(1, 0, &mut channel);
+        encode_len_field(2, &settings, &mut channel);
+        encode_varint_field_into(3, CHANNEL_ROLE_PRIMARY, &mut channel);
+
+        // AdminMessage { set_channel(33): Channel }
+        let mut admin = Vec::new();
+        encode_len_field(ADMIN_SET_CHANNEL_FIELD, &channel, &mut admin);
+
+        let packet = encode_mesh_packet(node_num, ADMIN_APP, &admin);
+        self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
+            .await
+            .context("Failed to send Meshtastic set_channel admin packet")?;
+
+        info!(node_num, channel = %name, "Set Meshtastic primary channel (device will reboot to apply)");
+        self.current_primary_channel = Some((name.to_string(), psk.to_vec()));
+        Ok(())
+    }
+
    pub async fn send_self_advert(&mut self) -> Result<()> {
        self.send_to_radio(&encode_heartbeat()).await
    }

+    /// Build our own `User` protobuf (id/long_name/short_name) for a NodeInfo
+    /// advert. Returns `None` until the handshake has learned our identity.
+    fn build_self_user(&self) -> Option<Vec<u8>> {
+        let mut user = Vec::new();
+        if let Some(id) = &self.user_id {
+            encode_len_field(1, id.as_bytes(), &mut user);
+        }
+        if let Some(long_name) = &self.long_name {
+            encode_len_field(2, long_name.as_bytes(), &mut user);
+        }
+        if let Some(short_name) = &self.short_name {
+            encode_len_field(3, short_name.as_bytes(), &mut user);
+        }
+        if user.is_empty() {
+            None
+        } else {
+            Some(user)
+        }
+    }
+
+    /// Actively advertise our identity over the air by broadcasting a NodeInfo
+    /// packet (our `User`) on the primary channel. Meshtastic radios otherwise
+    /// only emit NodeInfo on boot and every few hours, so without this two
+    /// already-running nodes can sit forever without discovering each other.
+    /// This is the Meshtastic analog of meshcore's periodic self-advert.
+    ///
+    /// `want_response` solicits each neighbour to reply with its own NodeInfo —
+    /// use it on connect for immediate two-way discovery; leave it off for the
+    /// periodic beacon so a busy mesh doesn't trigger reply storms.
+    pub async fn send_nodeinfo_broadcast(&mut self, want_response: bool) -> Result<()> {
+        let Some(user) = self.build_self_user() else {
+            debug!("Meshtastic NodeInfo advert skipped — local identity not known yet");
+            return Ok(());
+        };
+
+        // Data { portnum(1)=NODEINFO_APP, payload(2)=User, want_response(3)? }
+        let mut data = Vec::new();
+        encode_varint_field_into(1, NODEINFO_APP as u64, &mut data);
+        encode_len_field(2, &user, &mut data);
+        if want_response {
+            encode_varint_field_into(3, 1, &mut data);
+        }
+
+        // MeshPacket { to(2)=BROADCAST (fixed32), decoded(4)=Data }. The firmware
+        // fills in `from` = our node-num when it transmits.
+        let mut packet = Vec::new();
+        encode_fixed32_field(2, BROADCAST_NUM, &mut packet);
+        encode_len_field(4, &data, &mut packet);
+
+        self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
+            .await
+            .context("Failed to send Meshtastic NodeInfo broadcast")?;
+        debug!(want_response, "Broadcast Meshtastic NodeInfo advert");
+        Ok(())
+    }
+
    pub async fn send_channel_text(&mut self, _channel: u8, msg: &[u8]) -> Result<()> {
        let text = String::from_utf8_lossy(msg);
        let packet = encode_mesh_packet(BROADCAST_NUM, TEXT_MESSAGE_APP, text.as_bytes());
@ -339,12 +577,36 @@ impl MeshtasticDevice {
            return Ok(Some(frame));
        }

+        // Drain aggressively. Meshtastic firmware interleaves verbose debug-log
+        // text with protobuf frames on the same serial line, so a single small
+        // read per poll can fall behind the byte stream, overflow the OS serial
+        // buffer, and corrupt/drop inbound frames — which silently kills message
+        // reception while leaving sends working. Pull up to a bounded burst of
+        // bytes per call, decoding as soon as a complete frame appears.
        let mut tmp = [0u8; READ_BUF_SIZE];
-        match tokio::time::timeout(Duration::from_millis(50), self.port.read(&mut tmp)).await {
-            Ok(Ok(0)) => anyhow::bail!("Meshtastic serial port closed"),
-            Ok(Ok(n)) => self.read_buf.extend_from_slice(&tmp[..n]),
-            Ok(Err(e)) => return Err(e).context("Meshtastic serial read error"),
-            Err(_) => return Ok(None),
+        for _ in 0..32 {
+            match tokio::time::timeout(Duration::from_millis(30), self.port.read(&mut tmp)).await {
+                Ok(Ok(0)) => anyhow::bail!("Meshtastic serial port closed"),
+                Ok(Ok(n)) => {
+                    self.read_buf.extend_from_slice(&tmp[..n]);
+                    if let Some(frame) = decode_serial_frame(&mut self.read_buf) {
+                        return Ok(Some(frame));
+                    }
+                    // Bound memory if it's a pure-debug flood with no frames:
+                    // keep only from the last possible frame-start marker.
+                    if self.read_buf.len() > 64 * 1024 {
+                        if let Some(pos) =
+                            self.read_buf.windows(2).rposition(|w| w == [START1, START2])
+                        {
+                            self.read_buf.drain(..pos);
+                        } else {
+                            self.read_buf.clear();
+                        }
+                    }
+                }
+                Ok(Err(e)) => return Err(e).context("Meshtastic serial read error"),
+                Err(_) => break, // no more bytes available right now
+            }
        }

        Ok(decode_serial_frame(&mut self.read_buf))
@ -352,8 +614,14 @@ impl MeshtasticDevice {

    fn handle_from_radio(&mut self, frame: &[u8]) -> Option<InboundFrame> {
        let Some((field, value)) = decode_top_level_variant(frame) else {
+            debug!(
+                len = frame.len(),
+                head = %hex::encode(&frame[..frame.len().min(8)]),
+                "Meshtastic FromRadio frame did not decode to a known top-level field"
+            );
            return None;
        };
+        debug!(field, value_len = value.len(), "Meshtastic FromRadio field");
        match field {
            FROM_RADIO_MY_INFO => {
                if let Some((node_num, user_id)) = parse_my_info(value) {
@ -369,6 +637,22 @@ impl MeshtasticDevice {
                None
            }
            FROM_RADIO_PACKET => self.packet_to_inbound_frame(value),
+            FROM_RADIO_CONFIG => {
+                // Only the LoRa sub-config carries a region; other Config
+                // variants (device/position/…) return None and are ignored.
+                if let Some(region) = parse_config_lora_region(value) {
+                    self.current_region = Some(region);
+                    debug!(region, "Meshtastic LoRa region from device config");
+                }
+                None
+            }
+            FROM_RADIO_CHANNEL => {
+                if let Some((name, psk)) = parse_primary_channel(value) {
+                    debug!(name = %name, psk_len = psk.len(), "Meshtastic primary channel from device");
+                    self.current_primary_channel = Some((name, psk));
+                }
+                None
+            }
            FROM_RADIO_CONFIG_COMPLETE_ID | FROM_RADIO_REBOOTED => None,
            other => {
                debug!(
@ -424,6 +708,12 @@ impl MeshtasticDevice {
        if Some(from) == self.node_num {
            return None;
        }
+        info!(
+            from = format!("!{:08x}", from),
+            len = packet.payload.len(),
+            pki = packet.pki_encrypted,
+            "Meshtastic received text packet over the air"
+        );
        // Record E2E status: a `pki_encrypted` packet (or one carrying the
        // sender's `public_key`) proves this DM arrived end-to-end encrypted via
        // the PKI, not the shared channel PSK. We learn the sender's key here too
@ -504,6 +794,116 @@ fn encode_heartbeat() -> Vec<u8> {
    encode_to_radio_variant(TO_RADIO_HEARTBEAT, &[])
 }

+/// Extract `LoRaConfig.region` from a `Config` message, returning the region
+/// code. Returns `Some(REGION_UNSET)` when the LoRa block is present but has no
+/// region field (a fresh radio), and `None` when this Config carries a
+/// non-LoRa variant (device/position/…) so the caller keeps the prior value.
+fn parse_config_lora_region(data: &[u8]) -> Option<u32> {
+    let mut idx = 0;
+    while idx < data.len() {
+        let (field, value, next) = next_field(data, idx)?;
+        idx = next;
+        if field == CONFIG_LORA_FIELD {
+            if let FieldValue::Bytes(b) = value {
+                let mut j = 0;
+                let mut region = REGION_UNSET;
+                while j < b.len() {
+                    let (lf, lv, ln) = next_field(b, j)?;
+                    j = ln;
+                    if lf == LORA_REGION_FIELD {
+                        if let FieldValue::Varint(v) = lv {
+                            region = v as u32;
+                        }
+                    }
+                }
+                return Some(region);
+            }
+        }
+    }
+    None
+}
+
+/// Extract `(name, psk)` from a `Channel` message, but only for the PRIMARY
+/// channel (role == 1) — that's the one broadcasts ride on and whose hash must
+/// match for two radios to decode each other. Returns `None` for secondary /
+/// disabled channels so the caller keeps the primary it already learned.
+fn parse_primary_channel(data: &[u8]) -> Option<(String, Vec<u8>)> {
+    let mut role = 0u64;
+    let mut name = String::new();
+    let mut psk = Vec::new();
+    let mut idx = 0;
+    while idx < data.len() {
+        let (field, value, next) = next_field(data, idx)?;
+        idx = next;
+        match (field, value) {
+            (3, FieldValue::Varint(v)) => role = v,
+            (2, FieldValue::Bytes(b)) => {
+                let mut j = 0;
+                while j < b.len() {
+                    let (sf, sv, sn) = next_field(b, j)?;
+                    j = sn;
+                    match (sf, sv) {
+                        (2, FieldValue::Bytes(p)) => psk = p.to_vec(),
+                        (3, FieldValue::Bytes(n)) => {
+                            name = String::from_utf8_lossy(n).to_string()
+                        }
+                        _ => {}
+                    }
+                }
+            }
+            _ => {}
+        }
+    }
+    if role == CHANNEL_ROLE_PRIMARY {
+        Some((name, psk))
+    } else {
+        None
+    }
+}
+
+/// Derive the 32-byte channel PSK deterministically from the channel name, so
+/// every archy node configured with the same `channel_name` converges on the
+/// exact same primary channel (identical hash) and meshes automatically.
+fn derive_channel_psk(channel_name: &str) -> Vec<u8> {
+    use sha2::{Digest, Sha256};
+    let mut hasher = Sha256::new();
+    hasher.update(b"archipelago-mesh:");
+    hasher.update(channel_name.as_bytes());
+    hasher.finalize().to_vec()
+}
+
+/// Map a Meshtastic `RegionCode` name (as set in `mesh-config.json`, e.g.
+/// "EU_868", "US", "ANZ") to its protobuf enum value. Case-insensitive.
+/// Returns `None` for an unrecognised name so we never write a bogus region.
+fn region_name_to_code(name: &str) -> Option<u32> {
+    Some(match name.trim().to_uppercase().as_str() {
+        "UNSET" => 0,
+        "US" => 1,
+        "EU_433" => 2,
+        "EU_868" | "EU868" => 3,
+        "CN" => 4,
+        "JP" => 5,
+        "ANZ" => 6,
+        "KR" => 7,
+        "TW" => 8,
+        "RU" => 9,
+        "IN" => 10,
+        "NZ_865" => 11,
+        "TH" => 12,
+        "LORA_24" => 13,
+        "UA_433" => 14,
+        "UA_868" => 15,
+        "MY_433" => 16,
+        "MY_919" => 17,
+        "SG_923" => 18,
+        "PH_433" => 19,
+        "PH_868" => 20,
+        "PH_915" => 21,
+        "ANZ_433" => 22,
+        _ => return None,
+    })
+}
+
 fn encode_to_radio_variant(field: u64, bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    encode_len_field(field, bytes, &mut out);
@ -544,7 +944,11 @@ fn decode_top_level_variant(buf: &[u8]) -> Option<(u64, &[u8])> {
                }
                if matches!(
                    field,
-                    FROM_RADIO_PACKET | FROM_RADIO_MY_INFO | FROM_RADIO_NODE_INFO
+                    FROM_RADIO_PACKET
+                        | FROM_RADIO_MY_INFO
+                        | FROM_RADIO_NODE_INFO
+                        | FROM_RADIO_CONFIG
+                        | FROM_RADIO_CHANNEL
                ) {
                    return Some((field, &buf[idx..end]));
                }
--- a/core/archipelago/src/mesh/mod.rs
+++ b/core/archipelago/src/mesh/mod.rs
@ -326,6 +326,14 @@ pub struct MeshConfig {
    /// Channel name for broadcasts.
    #[serde(default)]
    pub channel_name: Option<String>,
+    /// Meshtastic LoRa region (e.g. "EU_868", "US", "ANZ"). Fresh-flashed
+    /// Meshtastic radios ship region-UNSET and are RF-silent until a region is
+    /// set, so archy provisions this region on connect to bring every node onto
+    /// the same band automatically (the parity equivalent of a meshcore radio
+    /// coming up on its configured band). Ignored for meshcore devices and when
+    /// unset/None.
+    #[serde(default)]
+    pub lora_region: Option<String>,
    /// Whether to periodically broadcast our identity.
    #[serde(default)]
    pub broadcast_identity: bool,
@ -385,6 +393,7 @@ impl Default for MeshConfig {
            enabled: false,
            device_path: None,
            channel_name: Some("archipelago".to_string()),
+            lora_region: None,
            broadcast_identity: true,
            advert_name: None,
            mesh_only_mode: None,
@ -675,6 +684,8 @@ impl MeshService {
            self.our_x25519_secret,
            self.our_x25519_pubkey_hex.clone(),
            self.server_name.clone(),
+            self.config.lora_region.clone(),
+            self.config.channel_name.clone(),
            shutdown_rx,
            cmd_rx,
        );
--- a/core/archipelago/src/update.rs
+++ b/core/archipelago/src/update.rs
@ -1702,7 +1702,67 @@ pub async fn get_schedule(data_dir: &Path) -> Result<UpdateSchedule> {

 /// Background update scheduler. Runs in a loop, checking/applying based on schedule.
 /// Call this once at startup via `tokio::spawn`.
-pub async fn run_update_scheduler(data_dir: std::path::PathBuf) {
+/// Apply per-app auto-update-to-latest for apps the runner opted in
+/// (`docs/bitcoin-multi-version-design.md` §3 Phase 3). Independent of the
+/// binary OTA schedule below. Conservative: only upgrades an app when the fresh
+/// catalog actually advertises a newer image than the one running, and only via
+/// the orchestrator's normal upgrade lifecycle (the same safe path as the
+/// manual "Update" button). Pinned apps are excluded upstream in
+/// `auto_update_apps()`. Best-effort — failures are logged, never fatal.
+async fn apply_per_app_auto_updates(
+    orchestrator: &Option<std::sync::Arc<dyn crate::container::traits::ContainerOrchestrator>>,
+) {
+    let Some(orchestrator) = orchestrator.as_ref() else {
+        return;
+    };
+    for app_id in crate::container::version_config::auto_update_apps() {
+        // Determine the version actually running by inspecting the backend
+        // container's image. Skip when not installed / unreadable.
+        let running_image = ["", "archy-"]
+            .iter()
+            .map(|p| format!("{p}{app_id}"))
+            .collect::<Vec<_>>();
+        let mut current_image = None;
+        for name in &running_image {
+            if let Ok(out) = tokio::process::Command::new("podman")
+                .args(["inspect", name, "--format", "{{.ImageName}}"])
+                .output()
+                .await
+            {
+                if out.status.success() {
+                    let img = String::from_utf8_lossy(&out.stdout).trim().to_string();
+                    if !img.is_empty() {
+                        current_image = Some(img);
+                        break;
+                    }
+                }
+            }
+        }
+        let Some(current_image) = current_image else {
+            continue;
+        };
+        // Only act when the catalog advertises a genuine update over what's
+        // running (this also re-checks the pin guard inside the helper).
+        if crate::container::app_catalog::available_update_for_app(&app_id, &current_image)
+            .is_none()
+        {
+            continue;
+        }
+        info!(
+            "auto-update: {} has a newer catalog image (running {}), upgrading",
+            app_id, current_image
+        );
+        match orchestrator.upgrade(&app_id).await {
+            Ok(()) => info!("auto-update: {} upgraded to catalog latest", app_id),
+            Err(e) => warn!("auto-update: {} upgrade failed: {}", app_id, e),
+        }
+    }
+}
+
+pub async fn run_update_scheduler(
+    data_dir: std::path::PathBuf,
+    orchestrator: Option<std::sync::Arc<dyn crate::container::traits::ContainerOrchestrator>>,
+) {
    use tokio::time::{interval, Duration};

    // Check every hour; act based on schedule setting
@ -1728,6 +1788,10 @@ pub async fn run_update_scheduler(data_dir: std::path::PathBuf) {
            debug!("Update scheduler: app-catalog refresh failed: {}", e);
        }

+        // Per-app auto-update-to-latest (multi-version support). Runs every tick
+        // regardless of the binary-OTA schedule below; opt-in + pin-respecting.
+        apply_per_app_auto_updates(&orchestrator).await;
+
        let state = match load_state(&data_dir).await {
            Ok(s) => s,
            Err(e) => {
--- a/core/archipelago/src/wallet/fedimint_client.rs
+++ b/core/archipelago/src/wallet/fedimint_client.rs
@ -50,38 +50,12 @@ pub struct FederationRegistry {
 const REGISTRY_FILE: &str = "wallet/fedimint_federations.json";

 /// Shared HTTP-Basic password between the fmcd container and this bridge. The
-/// fedimint-clientd manifest reads it via `secret_env: fmcd-password`, resolved
-/// from `<data_dir>/secrets/`; the bridge reads the same file in `from_node`.
+/// fedimint-clientd manifest generates it via `generated_secrets: [fmcd-password]`
+/// and injects it through `secret_env`; the bridge reads the same file in
+/// `from_node`. (Generation lives in `container::secrets`, not here — it's a
+/// generic, manifest-declared concern, not fedimint-specific.)
 const FMCD_PASSWORD_SECRET: &str = "fmcd-password";

-/// Generate the fmcd Basic-auth password once, so the fmcd container
-/// (`secret_env: fmcd-password`) and this bridge (`from_node`) agree on it.
-/// Idempotent: a non-empty existing secret is left untouched. Mirrors the
-/// bitcoin-rpc secret pattern (random hex, 0600). Called from the orchestrator's
-/// `ensure_app_secrets` before the container's `secret_env` is resolved.
-pub async fn ensure_fmcd_password(secrets_dir: &Path) -> Result<()> {
-    let path = secrets_dir.join(FMCD_PASSWORD_SECRET);
-    if let Ok(existing) = fs::read_to_string(&path).await {
-        if !existing.trim().is_empty() {
-            return Ok(());
-        }
-    }
-    fs::create_dir_all(secrets_dir)
-        .await
-        .context("creating secrets dir for fmcd password")?;
-    let bytes: [u8; 16] = rand::random();
-    let password = hex::encode(bytes);
-    fs::write(&path, &password)
-        .await
-        .context("writing fmcd password secret")?;
-    #[cfg(unix)]
-    {
-        use std::os::unix::fs::PermissionsExt;
-        let _ = fs::set_permissions(&path, std::fs::Permissions::from_mode(0o600)).await;
-    }
-    Ok(())
-}
-
 pub async fn load_registry(data_dir: &Path) -> Result<FederationRegistry> {
    let path = data_dir.join(REGISTRY_FILE);
    if !path.exists() {
--- a/core/container/src/lib.rs
+++ b/core/container/src/lib.rs
@ -8,9 +8,11 @@ pub mod runtime;
 pub use bitcoin_simulator::{BitcoinSimulationMode, BitcoinSimulator};
 pub use health_monitor::HealthMonitor;
 pub use manifest::{
-    AppInterface, AppManifest, BuildConfig, ContainerConfig, Dependency, DerivedEnv, GeneratedFile,
-    HealthCheck, HostFacts, ManifestError, ResolvedSource, ResourceLimits, SecretEnv,
-    SecretsProvider, SecurityPolicy, Volume,
+    AppInterface, AppManifest, BuildConfig, ContainerConfig, Dependency, DerivedEnv, GeneratedCert,
+    GeneratedFile, GeneratedSecret, HealthCheck, HookStep, HostCopy, HostFacts, LifecycleHooks,
+    ManifestError,
+    ResolvedSource, ResourceLimits, SecretEnv, SecretGenKind, SecretsProvider, SecurityPolicy,
+    Volume,
 };
 pub use podman_client::{
    image_uses_insecure_registry, ContainerState, ContainerStatus, PodmanClient,
--- a/core/container/src/manifest.rs
+++ b/core/container/src/manifest.rs
@ -57,10 +57,88 @@ pub struct AppDefinition {
    #[serde(default)]
    pub interfaces: HashMap<String, AppInterface>,

+    /// Controlled post-install / pre-start lifecycle hooks. Declarative,
+    /// allowlisted operations run against the app's OWN container — never the
+    /// host. See `docs/manifest-hooks-design.md`.
+    #[serde(default)]
+    pub hooks: LifecycleHooks,
+
    #[serde(flatten)]
    pub extensions: HashMap<String, serde_yaml::Value>,
 }

+/// Declarative lifecycle hooks for an app. Absent = none (forward-compatible).
+#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)]
+pub struct LifecycleHooks {
+    /// Run once after a successful install, with the container created + running.
+    #[serde(default)]
+    pub post_install: Vec<HookStep>,
+    /// Run before each start (repair/ownership). Reserved; not yet executed.
+    #[serde(default)]
+    pub pre_start: Vec<HookStep>,
+}
+
+/// A single controlled hook operation. Each list item is a one-key map, e.g.
+/// `- exec: [...]` or `- copy_from_host: { src, dest }`.
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(untagged)]
+pub enum HookStep {
+    /// Run a command vector INSIDE the app's container (`podman exec`). Never on
+    /// the host; inherits the container's (already dropped) capabilities.
+    Exec { exec: Vec<String> },
+    /// Copy a file from an allowlisted host root into the container. `src` is
+    /// relative to the allowlist (data dir / web-ui) — no absolute paths, no `..`.
+    CopyFromHost {
+        #[serde(rename = "copy_from_host")]
+        copy_from_host: HostCopy,
+    },
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub struct HostCopy {
+    pub src: String,
+    pub dest: String,
+}
+
+impl LifecycleHooks {
+    fn validate(&self) -> Result<(), ManifestError> {
+        for step in self.post_install.iter().chain(self.pre_start.iter()) {
+            step.validate()?;
+        }
+        Ok(())
+    }
+}
+
+impl HookStep {
+    fn validate(&self) -> Result<(), ManifestError> {
+        match self {
+            HookStep::Exec { exec } => {
+                if exec.is_empty() {
+                    return Err(ManifestError::Invalid(
+                        "hooks: exec must be a non-empty command vector".to_string(),
+                    ));
+                }
+            }
+            HookStep::CopyFromHost { copy_from_host } => {
+                let s = &copy_from_host.src;
+                if s.is_empty() || s.starts_with('/') || s.contains("..") {
+                    return Err(ManifestError::Invalid(format!(
+                        "hooks: copy_from_host.src must be a relative allowlisted path \
+                         (no leading '/', no '..'), got '{s}'"
+                    )));
+                }
+                if copy_from_host.dest.is_empty() || !copy_from_host.dest.starts_with('/') {
+                    return Err(ManifestError::Invalid(format!(
+                        "hooks: copy_from_host.dest must be an absolute container path, got '{}'",
+                        copy_from_host.dest
+                    )));
+                }
+            }
+        }
+        Ok(())
+    }
+}
+
 #[derive(Debug, Clone, Serialize, Deserialize, Default)]
 pub struct ContainerConfig {
    /// Pull source. Mutually exclusive with `build`. Exactly one of the two must be present.
@ -92,6 +170,17 @@ pub struct ContainerConfig {
    #[serde(default)]
    pub network: Option<String>,

+    /// Extra DNS aliases the container answers to on its `network`, in addition
+    /// to its own container name (which is always added). Mirrors podman
+    /// `--network-alias`. Used by multi-container stacks whose images reference
+    /// peers by a short baked-in hostname — e.g. indeedhub's frontend nginx
+    /// proxies to `api:4000` / `minio:9000` / `relay:8080`, so the api/minio/relay
+    /// members declare `network_aliases: [api]` / `[minio]` / `[relay]` to keep
+    /// those short names resolvable on the dedicated `indeedhub-net`. Ignored for
+    /// slirp4netns/pasta (podman rejects aliases there).
+    #[serde(default)]
+    pub network_aliases: Vec<String>,
+
    /// Extra positional arguments appended to the container command
    /// after the image. Mirrors `SPEC_CUSTOM_ARGS` in
    /// `scripts/container-specs.sh` (bitcoin-knots prune/dbcache flags,
@ -122,6 +211,31 @@ pub struct ContainerConfig {
    #[serde(default)]
    pub secret_env: Vec<SecretEnv>,

+    /// Secrets the orchestrator generates on first use when absent, so an app
+    /// installs from its manifest alone — no host provisioning, no per-app Rust.
+    /// Materialised before `secret_env` is resolved, written `0600` and owned by
+    /// the unprivileged (rootless) service user. Idempotent and self-healing: a
+    /// file that already exists and is readable is left untouched; one that is
+    /// present-but-unreadable (e.g. wrongly created `root`-owned) is recreated
+    /// in place via the service-owned secrets dir — no `chown`, no privilege.
+    ///
+    /// Example: `- { name: fmcd-password, kind: hex16 }`
+    #[serde(default)]
+    pub generated_secrets: Vec<GeneratedSecret>,
+
+    /// Self-signed TLS certificates the orchestrator materialises before the
+    /// container is created (so a bind-mounted cert path resolves to a real
+    /// file, not a stale/missing path). Like `generated_secrets`, this keeps an
+    /// app data-driven: a service that needs a secure context (e.g. netbird's
+    /// dashboard — OIDC PKCE / `window.crypto.subtle` only works over HTTPS,
+    /// issue #15) declares the cert here instead of relying on per-app Rust.
+    /// Idempotent: an entry whose `crt` and `key` already exist is left
+    /// untouched. SAN/CN templates are rendered against host facts at apply time.
+    ///
+    /// Example: `- { crt: /var/lib/archipelago/netbird/tls.crt, key: /var/lib/archipelago/netbird/tls.key }`
+    #[serde(default)]
+    pub generated_certs: Vec<GeneratedCert>,
+
    /// Rootless-mapped UID:GID applied to the container's data directory
    /// (the `bind`-mounted host path with `target` inside the container's
    /// data root) before creation. Mirrors `SPEC_DATA_UID`.
@ -151,6 +265,66 @@ pub struct SecretEnv {
    pub secret_file: String,
 }

+/// How a [`GeneratedSecret`] is produced. Each kind is deterministic in shape
+/// (so the orchestrator knows which files to expect) but random in value.
+#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
+#[serde(rename_all = "snake_case")]
+pub enum SecretGenKind {
+    /// 16 random bytes, lowercase hex (32 chars). Service passwords/API tokens.
+    Hex16,
+    /// 32 random bytes, lowercase hex (64 chars). Longer keys/cookies.
+    Hex32,
+    /// 32 random bytes, standard base64 (44 chars incl. padding). For services
+    /// that require a base64-encoded key rather than hex — e.g. netbird's relay
+    /// `authSecret` and the SQLite store `encryptionKey`, which base64-decode
+    /// their configured value (hex would decode to the wrong bytes).
+    Base64,
+    /// A random password and its bcrypt hash. `<name>` holds the bcrypt hash
+    /// (what a server is configured with); the plaintext is stored alongside as
+    /// `<name>.pw` for any client that must authenticate. `secret_env` injects
+    /// whichever file it references.
+    Bcrypt,
+}
+
+/// A secret materialised by the orchestrator on demand. See
+/// [`ContainerConfig::generated_secrets`]. `name` is a bare filename under the
+/// secrets dir — validated (no `/`, no `..`) at [`AppManifest::validate`] time.
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub struct GeneratedSecret {
+    pub name: String,
+    pub kind: SecretGenKind,
+}
+
+impl GeneratedSecret {
+    /// Every file this secret materialises, in the order they should be written
+    /// (primary first). A consumer references one of these via `secret_env`.
+    pub fn target_files(&self) -> Vec<String> {
+        match self.kind {
+            SecretGenKind::Hex16 | SecretGenKind::Hex32 | SecretGenKind::Base64 => {
+                vec![self.name.clone()]
+            }
+            SecretGenKind::Bcrypt => vec![self.name.clone(), format!("{}.pw", self.name)],
+        }
+    }
+}
+
+/// A self-signed TLS certificate materialised by the orchestrator. See
+/// [`ContainerConfig::generated_certs`]. `crt`/`key` are absolute host paths
+/// (typically under `/var/lib/archipelago/<app>/`) that the container
+/// bind-mounts read-only. `common_name` and `sans` are rendered against host
+/// facts (`{{HOST_IP}}`) at apply time; when omitted they default to the
+/// node's host IP plus `IP:127.0.0.1,DNS:localhost` so the cert is valid for
+/// however the box is reached locally.
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
+pub struct GeneratedCert {
+    pub crt: String,
+    pub key: String,
+    #[serde(default)]
+    pub common_name: Option<String>,
+    #[serde(default)]
+    pub sans: Vec<String>,
+}
+
 fn default_pull_policy() -> String {
    "if-not-present".to_string()
 }
@ -413,6 +587,25 @@ impl AppManifest {
            }
        }

+        // network_aliases: each must be a non-empty DNS label (lowercase
+        // alphanumeric + hyphen, no leading/trailing hyphen) so it renders as a
+        // valid podman --network-alias / aardvark-dns name.
+        for (i, alias) in self.app.container.network_aliases.iter().enumerate() {
+            let ok = !alias.is_empty()
+                && alias.len() <= 63
+                && alias
+                    .chars()
+                    .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-')
+                && !alias.starts_with('-')
+                && !alias.ends_with('-');
+            if !ok {
+                return Err(ManifestError::Invalid(format!(
+                    "container.network_aliases[{i}] '{alias}' must be a non-empty DNS label \
+                     (lowercase a-z, 0-9, '-'; no leading/trailing '-')"
+                )));
+            }
+        }
+
        // custom_args: no empty strings (would inject literal "" into
        // the podman command line and confuse downstream parsing).
        for (i, a) in self.app.container.custom_args.iter().enumerate() {
@ -487,6 +680,40 @@ impl AppManifest {
            }
        }

+        // generated_secrets: bare-filename names, unique across every file the
+        // set materialises (so a Bcrypt's `.pw` sibling can't collide with
+        // another secret). Path-safety mirrors secret_env.
+        {
+            let mut names: std::collections::HashSet<String> = std::collections::HashSet::new();
+            for (i, g) in self.app.container.generated_secrets.iter().enumerate() {
+                if g.name.is_empty() || g.name.contains('/') || g.name.contains("..") {
+                    return Err(ManifestError::Invalid(format!(
+                        "container.generated_secrets[{}].name must be a bare filename (no '/', no '..'), got '{}'",
+                        i, g.name
+                    )));
+                }
+                for f in g.target_files() {
+                    if !names.insert(f.clone()) {
+                        return Err(ManifestError::Invalid(format!(
+                            "container.generated_secrets produces duplicate file '{f}'"
+                        )));
+                    }
+                }
+            }
+        }
+
+        // generated_certs: crt/key must be non-empty absolute paths with no
+        // traversal (they become bind-mount sources, same safety bar as files).
+        for (i, c) in self.app.container.generated_certs.iter().enumerate() {
+            for (field, val) in [("crt", &c.crt), ("key", &c.key)] {
+                if val.is_empty() || !val.starts_with('/') || val.contains("..") {
+                    return Err(ManifestError::Invalid(format!(
+                        "container.generated_certs[{i}].{field} must be an absolute path with no '..', got '{val}'"
+                    )));
+                }
+            }
+        }
+
        // data_uid: if set, must look like "NNNNN:NNNNN".
        if let Some(u) = &self.app.container.data_uid {
            let parts: Vec<&str> = u.split(':').collect();
@ -587,6 +814,10 @@ impl AppManifest {
            }
        }

+        // Lifecycle hooks: declarative, allowlisted (no host exec, no absolute /
+        // `..` copy sources). See docs/manifest-hooks-design.md.
+        self.app.hooks.validate()?;
+
        Ok(())
    }
 }
@ -1002,6 +1233,57 @@ mod tests {
    use std::fs;
    use std::path::{Path, PathBuf};

+    #[test]
+    fn hooks_parse_and_validate() {
+        let yaml = r#"
+app:
+  id: indeedhub
+  name: IndeedHub
+  version: 1.0.0
+  container:
+    image: test/indeedhub:1.0.0
+  hooks:
+    post_install:
+      - exec: ["sed", "-i", "/X-Frame-Options/d", "/etc/nginx/conf.d/default.conf"]
+      - copy_from_host:
+          src: "web-ui/nostr-provider.js"
+          dest: "/usr/share/nginx/html/nostr-provider.js"
+"#;
+        let m = AppManifest::parse(yaml).unwrap();
+        assert_eq!(m.app.hooks.post_install.len(), 2);
+        match &m.app.hooks.post_install[0] {
+            HookStep::Exec { exec } => assert_eq!(exec[0], "sed"),
+            _ => panic!("expected exec step"),
+        }
+        match &m.app.hooks.post_install[1] {
+            HookStep::CopyFromHost { copy_from_host } => {
+                assert_eq!(copy_from_host.dest, "/usr/share/nginx/html/nostr-provider.js")
+            }
+            _ => panic!("expected copy_from_host step"),
+        }
+        m.validate().unwrap();
+    }
+
+    #[test]
+    fn hooks_reject_absolute_or_traversal_copy_src() {
+        for bad in ["/etc/passwd", "../../etc/shadow", "web-ui/../../etc/x"] {
+            let yaml = format!(
+                "app:\n  id: a\n  name: a\n  version: 1.0.0\n  container:\n    image: x:y\n  \
+                 hooks:\n    post_install:\n      - copy_from_host:\n          src: \"{bad}\"\n          dest: \"/x\"\n"
+            );
+            assert!(
+                AppManifest::parse(&yaml).is_err(),
+                "src '{bad}' must be rejected"
+            );
+        }
+    }
+
+    #[test]
+    fn hooks_reject_empty_exec() {
+        let yaml = "app:\n  id: a\n  name: a\n  version: 1.0.0\n  container:\n    image: x:y\n  hooks:\n    post_install:\n      - exec: []\n";
+        assert!(AppManifest::parse(yaml).is_err());
+    }
+
    #[test]
    fn test_manifest_parse() {
        let yaml = r#"
@ -1459,6 +1741,7 @@ app:
            pull_policy: "if-not-present".to_string(),
            build: None,
            network: None,
+            network_aliases: vec![],
            custom_args: vec![],
            entrypoint: None,
            derived_env: vec![
@ -1476,6 +1759,8 @@ app:
                },
            ],
            secret_env: vec![],
+            generated_secrets: vec![],
+            generated_certs: vec![],
            data_uid: None,
        };
        let facts = HostFacts {
@ -1512,6 +1797,7 @@ app:
            pull_policy: "if-not-present".to_string(),
            build: None,
            network: None,
+            network_aliases: vec![],
            custom_args: vec![],
            entrypoint: None,
            derived_env: vec![],
@ -1525,6 +1811,8 @@ app:
                    secret_file: "fedimint-gateway-password".to_string(),
                },
            ],
+            generated_secrets: vec![],
+            generated_certs: vec![],
            data_uid: None,
        };
        let p = MapSecretsProvider {
@ -1553,6 +1841,7 @@ app:
            pull_policy: "if-not-present".to_string(),
            build: None,
            network: None,
+            network_aliases: vec![],
            custom_args: vec![],
            entrypoint: None,
            derived_env: vec![],
@ -1560,6 +1849,8 @@ app:
                key: "BITCOIN_RPC_PASS".to_string(),
                secret_file: "bitcoin-rpc-password".to_string(),
            }],
+            generated_secrets: vec![],
+            generated_certs: vec![],
            data_uid: None,
        };
        let p = MapSecretsProvider {
--- a/core/container/src/podman_client.rs
+++ b/core/container/src/podman_client.rs
@ -121,10 +121,16 @@ impl PodmanClient {
            "cryptpad" => "http://localhost:3003",
            "penpot" => "http://localhost:9001",
            "immich_server" | "immich" => "http://localhost:2283",
+            // Gitea publishes SSH (2222) and web (3001). Without a manifest on
+            // disk, extract_lan_address() returns whichever podman lists first —
+            // which can be the SSH port, breaking the launch. Pin the web UI.
+            "gitea" => "http://localhost:3001",
            "nginx-proxy-manager" => "http://localhost:8081",
            "fedimint-gateway" => "http://localhost:8176",
            "endurain" => "http://localhost:8080",
-            "netbird" => "http://localhost:8087",
+            // HTTPS: netbird's dashboard needs a secure context for OIDC PKCE
+            // (window.crypto.subtle), so the proxy serves TLS on 8087 (issue #15).
+            "netbird" => "https://localhost:8087",
            "electrs" | "archy-electrs-ui" => "http://localhost:50002",
            _ => return None,
        };
@ -275,10 +281,18 @@ impl PodmanClient {
        // Build the container spec for the API
        let mut port_mappings = Vec::new();
        for port in &manifest.app.ports {
+            // Honour the manifest's protocol (default tcp). netbird's STUN port
+            // is 3478/udp; forcing tcp here would publish the wrong protocol and
+            // silently break relay discovery.
+            let protocol = match port.protocol.to_ascii_lowercase().as_str() {
+                "udp" => "udp",
+                "sctp" => "sctp",
+                _ => "tcp",
+            };
            port_mappings.push(serde_json::json!({
                "container_port": port.container,
                "host_port": port.host,
-                "protocol": "tcp",
+                "protocol": protocol,
            }));
        }

@ -385,11 +399,21 @@ impl PodmanClient {
            },
        });
        if let Some(network) = custom_network {
+            // The container always answers to its own name; manifest
+            // network_aliases add extra short hostnames peers may bake in
+            // (e.g. indeedhub's api/minio/relay). Dedup so a manifest that
+            // redundantly lists its own name doesn't double it.
+            let mut aliases = vec![name.to_string()];
+            for a in &manifest.app.container.network_aliases {
+                if !aliases.iter().any(|x| x == a) {
+                    aliases.push(a.clone());
+                }
+            }
            body.as_object_mut()
                .expect("container create body is a JSON object")
                .insert(
                    "networks".to_string(),
-                    serde_json::json!({ network: { "aliases": [name] } }),
+                    serde_json::json!({ network: { "aliases": aliases } }),
                );
        }

@ -412,11 +436,22 @@ impl PodmanClient {
    }

    pub async fn stop_container(&self, name: &str) -> Result<()> {
+        self.stop_container_with_grace(name, 10).await
+    }
+
+    /// Stop via libpod honouring a per-app grace (seconds). The HTTP deadline is
+    /// kept above the grace so the post-grace SIGKILL lands before we give up —
+    /// otherwise slow-to-SIGTERM apps (fedimint, bitcoin-core, electrumx…) time
+    /// out at exactly the grace boundary and the stop is reported as failed.
+    pub async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
+        let deadline = std::time::Duration::from_secs(
+            grace_secs + crate::runtime::STOP_GRACE_DEADLINE_BUFFER_SECS,
+        );
        self.api_request(
            "POST",
-            &format!("libpod/containers/{}/stop?t=10", name),
+            &format!("libpod/containers/{}/stop?t={}", name, grace_secs),
            None,
-            DEFAULT_TIMEOUT,
+            deadline,
        )
        .await
        .map(|_| ())
--- a/core/container/src/runtime.rs
+++ b/core/container/src/runtime.rs
@ -10,6 +10,35 @@ const PODMAN_CLI_DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
 const PODMAN_CLI_IMAGE_CHECK_TIMEOUT: Duration = Duration::from_secs(10);
 const PODMAN_CLI_BUILD_TIMEOUT: Duration = Duration::from_secs(900);

+/// Default graceful-stop grace (seconds) when a caller doesn't supply a per-app
+/// value. Mirrors the historical `podman stop -t 30`.
+pub const DEFAULT_STOP_GRACE_SECS: u64 = 30;
+/// Headroom added to a stop grace to form the await/HTTP deadline, so podman's
+/// post-grace SIGKILL completes before the wrapper times out.
+pub const STOP_GRACE_DEADLINE_BUFFER_SECS: u64 = 15;
+
+/// Canonical per-app graceful-stop grace (seconds), keyed by container name.
+/// Slow-to-SIGTERM apps need far longer than the 30s default: bitcoin-core
+/// flushes its chainstate, lnd closes channels, electrumx finishes indexing,
+/// stack DBs checkpoint. Used as the fallback when a manifest doesn't declare
+/// `stop_grace_secs`. NOTE: the RPC layer's `stop_timeout_secs` mirrors this
+/// (returns the same values as `&str` for legacy `podman stop -t` call sites) —
+/// keep the two in sync until that path is retired.
+pub fn stop_grace_secs_for(container_name: &str) -> u64 {
+    let id = container_name
+        .strip_prefix("archy-")
+        .unwrap_or(container_name);
+    match id {
+        "bitcoin-knots" | "bitcoin-core" | "bitcoin" => 600,
+        "lnd" => 330,
+        "electrumx" | "electrs" | "mempool-electrs" => 300,
+        "btcpay-db" | "mempool-db" | "penpot-postgres" | "immich_postgres" | "nextcloud-db"
+        | "endurain-db" => 120,
+        "btcpay-server" | "nbxplorer" | "fedimint" | "fedimint-gateway" => 60,
+        _ => DEFAULT_STOP_GRACE_SECS,
+    }
+}
+
 #[async_trait]
 pub trait ContainerRuntime: Send + Sync {
    async fn pull_image(&self, image: &str, signature: Option<&str>) -> Result<()>;
@ -21,6 +50,19 @@ pub trait ContainerRuntime: Send + Sync {
    ) -> Result<String>;
    async fn start_container(&self, name: &str) -> Result<()>;
    async fn stop_container(&self, name: &str) -> Result<()>;
+    /// Stop a container honouring a per-app graceful-shutdown grace (seconds).
+    ///
+    /// Slow-to-SIGTERM apps (bitcoin-core, lnd, electrumx, fedimint, immich…)
+    /// need a longer `podman stop -t` than the default 30s, or `podman stop`
+    /// returns before the container exits and the orchestrator treats the stop
+    /// as failed (the container keeps running). The wrapping deadline is always
+    /// kept strictly greater than `grace_secs` so podman's post-grace SIGKILL
+    /// lands inside the await. The default impl ignores the grace and calls
+    /// `stop_container` — only the real podman runtime honours it.
+    async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
+        let _ = grace_secs;
+        self.stop_container(name).await
+    }
    async fn remove_container(&self, name: &str) -> Result<()>;
    async fn get_container_status(&self, name: &str) -> Result<ContainerStatus>;
    async fn get_container_logs(&self, name: &str, lines: u32) -> Result<Vec<String>>;
@ -122,10 +164,23 @@ impl ContainerRuntime for PodmanRuntime {
    }

    async fn stop_container(&self, name: &str) -> Result<()> {
-        match self.client.stop_container(name).await {
+        self.stop_container_with_grace(name, DEFAULT_STOP_GRACE_SECS)
+            .await
+    }
+
+    async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
+        match self.client.stop_container_with_grace(name, grace_secs).await {
            Ok(()) => Ok(()),
            Err(api_err) => {
-                let output = self.podman_cli(&["stop", "-t", "30", name]).await?;
+                // CLI fallback. Keep the wrapper deadline strictly above the
+                // `-t` grace so podman's post-grace SIGKILL completes before the
+                // await gives up (otherwise a deadline == grace races the kill
+                // and reports a spurious timeout).
+                let grace = grace_secs.to_string();
+                let deadline = Duration::from_secs(grace_secs + STOP_GRACE_DEADLINE_BUFFER_SECS);
+                let output = self
+                    .podman_cli_timeout(&["stop", "-t", &grace, name], deadline)
+                    .await?;
                if output.status.success() {
                    Ok(())
                } else {
@ -841,6 +896,10 @@ impl ContainerRuntime for AutoRuntime {
        self.runtime.stop_container(name).await
    }

+    async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
+        self.runtime.stop_container_with_grace(name, grace_secs).await
+    }
+
    async fn remove_container(&self, name: &str) -> Result<()> {
        self.runtime.remove_container(name).await
    }
--- a/docker/fmcd/fmcd-run
+++ b/docker/fmcd/fmcd-run
@ -1,5 +1,5 @@
 #!/bin/sh
-# Resilient launcher for fmcd.
+# Resilient launcher for fmcd, with a stuck-CPU watchdog.
 #
 # fmcd requires >=1 federation to boot — if the default federation is
 # unreachable at first boot it exits non-zero. Rather than let the container
@ -9,9 +9,72 @@
 #
 # All config comes from FMCD_* env (FMCD_ADDR, FMCD_MODE, FMCD_DATA_DIR,
 # FMCD_INVITE_CODE, FMCD_PASSWORD), so fmcd needs no CLI args here.
+#
+# WATCHDOG: on NAT'd nodes that can reach the iroh federation neither directly
+# nor via iroh's public relays, fmcd's embedded iroh networking enters a
+# relay/hole-punch reconnect hot-loop that pegs its entire CPU allotment
+# indefinitely (observed: ~1 core sustained for 4 days on a Tailscale node,
+# while LAN nodes that reach the guardian directly stay <3%). fmcd exposes no
+# iroh/relay knobs, but a restart demonstrably clears the stuck iroh state
+# (a fresh process idles at <1%). So we sample fmcd's own CPU usage and, if it
+# stays near its full allotment for a sustained window, restart it. Real work
+# (federation joins, ecash ops) is bursty and measured in seconds — it never
+# flat-pegs a core for many consecutive minutes — so the threshold below does
+# not fire on legitimate load.
 set -u
+
+CLK=$(getconf CLK_TCK 2>/dev/null || echo 100)
+WATCH_SAMPLE="${FMCD_WATCH_SAMPLE:-60}"     # seconds between CPU samples
+WATCH_CORES="${FMCD_WATCH_CORES:-0.18}"     # cores; "hot" if usage exceeds this
+WATCH_HITS="${FMCD_WATCH_HITS:-15}"         # consecutive hot samples -> restart (~15 min)
+
+# Total CPU ticks (utime+stime, fields 14+15 of /proc/PID/stat) for $1; 0 if gone.
+cpu_ticks() {
+    awk '{print $14 + $15}' "/proc/$1/stat" 2>/dev/null || echo 0
+}
+
+# Watch fmcd ($1). Returns (so the caller can kill it) once fmcd has been hot
+# for WATCH_HITS consecutive samples; exits quietly if fmcd dies on its own.
+watchdog() {
+    pid="$1"
+    hot=0
+    prev=$(cpu_ticks "$pid")
+    while kill -0 "$pid" 2>/dev/null; do
+        sleep "$WATCH_SAMPLE"
+        cur=$(cpu_ticks "$pid")
+        cores=$(awk -v c="$cur" -v p="$prev" -v clk="$CLK" -v s="$WATCH_SAMPLE" \
+            'BEGIN{ d=c-p; if (d<0) d=0; printf "%.3f", d/clk/s }')
+        prev="$cur"
+        if [ "$(awk -v c="$cores" -v t="$WATCH_CORES" 'BEGIN{print (c>t)?1:0}')" = "1" ]; then
+            hot=$((hot + 1))
+            echo "[fmcd-run] watchdog: fmcd hot (${cores} cores) ${hot}/${WATCH_HITS}" >&2
+            if [ "$hot" -ge "$WATCH_HITS" ]; then
+                echo "[fmcd-run] watchdog: fmcd stuck high-CPU — restarting to clear iroh state" >&2
+                kill -TERM "$pid" 2>/dev/null
+                sleep 5
+                kill -KILL "$pid" 2>/dev/null
+                return 0
+            fi
+        else
+            hot=0
+        fi
+    done
+    return 0
+}
+
+# Forward container stop signals to the running fmcd (FMCD_PID is reread when
+# the trap fires, so it always targets the current child).
+FMCD_PID=
+trap 'kill -TERM "$FMCD_PID" 2>/dev/null; exit 0' TERM INT
+
 while true; do
-    fmcd || true
-    echo "[fmcd-run] fmcd exited (federation unreachable?); retrying in 30s" >&2
+    fmcd &
+    FMCD_PID=$!
+    watchdog "$FMCD_PID" &
+    WD_PID=$!
+    wait "$FMCD_PID" 2>/dev/null
+    kill -TERM "$WD_PID" 2>/dev/null
+    wait "$WD_PID" 2>/dev/null
+    echo "[fmcd-run] fmcd exited (federation unreachable or watchdog restart); retrying in 30s" >&2
    sleep 30
 done
--- a/docker/mempool-frontend/Dockerfile
+++ b/docker/mempool-frontend/Dockerfile
@ -0,0 +1,14 @@
+# Archipelago mempool frontend — adds a resilient nginx backend proxy.
+#
+# The only delta vs the upstream image is /patch/entrypoint.sh, which rewrites
+# the generated nginx-mempool.conf to use `resolver` + a variable proxy_pass so
+# the frontend re-resolves the backend (mempool-api) via DNS on every request.
+# Without this, nginx pins the backend IP at startup and serves 502 / "offline"
+# after any backend restart (podman reassigns the IP). See the script header.
+ARG BASE=146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0
+FROM ${BASE}
+
+# --chmod keeps the exec bit (build runs as USER 1000, plain COPY lands root:0644
+# → "not executable"). Base USER/ENTRYPOINT/CMD (1000 / /patch/entrypoint.sh /
+# nginx -g "daemon off;") are inherited unchanged.
+COPY --chmod=0755 entrypoint.sh /patch/entrypoint.sh
--- a/docker/mempool-frontend/entrypoint.sh
+++ b/docker/mempool-frontend/entrypoint.sh
@ -0,0 +1,137 @@
+#!/bin/sh
+__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__=${BACKEND_MAINNET_HTTP_HOST:=127.0.0.1}
+__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__=${BACKEND_MAINNET_HTTP_PORT:=8999}
+__MEMPOOL_FRONTEND_HTTP_PORT__=${FRONTEND_HTTP_PORT:=8080}
+
+CONF=/etc/nginx/conf.d/nginx-mempool.conf
+
+# ─── archipelago patch ────────────────────────────────────────────────────
+# The stock frontend writes `proxy_pass http://<backend>:8999` with a literal
+# hostname and NO resolver, so nginx resolves the backend IP ONCE at worker
+# start and caches it for the process lifetime. Podman reassigns the backend
+# container's IP whenever it is restarted/recreated (gate, OTA, crash, reboot
+# re-IPAM), after which nginx keeps proxying to the dead IP → /api hangs, the
+# websocket 502s, and the mempool UI shows "offline" until nginx is reloaded.
+#
+# Fix: force per-request DNS re-resolution via `resolver` + a variable in
+# proxy_pass. Because a variable in proxy_pass disables nginx's automatic
+# location→URI rewriting, each block is rewritten to preserve its original
+# path mapping exactly:
+#   /api/v1/ws, /ws → "/"            (var + "/" replaces the whole URI)
+#   /api/v1         → identity       (no-URI proxy_pass passes $uri unchanged)
+#   /api/           → /api/v1/$1     (explicit rewrite, then no-URI proxy_pass)
+# Operates on the __PLACEHOLDER__ tokens so the host/port sed below fills in
+# the concrete values (incl. the `set $mp_backend` line). Idempotent.
+# Resolver address: podman's aardvark-dns answers on the network gateway
+# (e.g. 10.89.0.1), NOT Docker's 127.0.0.11. Read it from resolv.conf so this
+# works on any podman network/subnet (and still falls back for Docker).
+ARCHY_RESOLVER=$(awk '/^nameserver/ { print $2; exit }' /etc/resolv.conf 2>/dev/null)
+ARCHY_RESOLVER=${ARCHY_RESOLVER:-127.0.0.11}
+
+if ! grep -q 'set \$mp_backend' "$CONF"; then
+  awk -v res_addr="$ARCHY_RESOLVER" '
+    BEGIN { res = 0 }
+    /^[[:space:]]*location / && res == 0 {
+      print "\tresolver " res_addr " valid=10s ipv6=off;"
+      res = 1
+    }
+    /proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/;/ {
+      print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
+      print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__/;"
+      next
+    }
+    /proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/api\/v1\/;/ {
+      print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
+      print "\t\trewrite ^/api/(.*)$ /api/v1/$1 break;"
+      print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__;"
+      next
+    }
+    /proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/api\/v1;/ {
+      print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
+      print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__;"
+      next
+    }
+    { print }
+  ' "$CONF" > "$CONF.archy" && mv "$CONF.archy" "$CONF"
+fi
+# ─── end archipelago patch ────────────────────────────────────────────────
+
+sed -i "s/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__/${__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__}/g" /etc/nginx/conf.d/nginx-mempool.conf
+sed -i "s/__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__/${__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__}/g" /etc/nginx/conf.d/nginx-mempool.conf
+
+cp /etc/nginx/nginx.conf /patch/nginx.conf
+sed -i "s/__MEMPOOL_FRONTEND_HTTP_PORT__/${__MEMPOOL_FRONTEND_HTTP_PORT__}/g" /patch/nginx.conf
+cat /patch/nginx.conf > /etc/nginx/nginx.conf
+
+if [ "${LIGHTNING_DETECTED_PORT}" != "" ];then
+  export LIGHTNING=true
+fi
+
+# Runtime overrides - read env vars defined in docker compose
+
+__MAINNET_ENABLED__=${MAINNET_ENABLED:=true}
+__TESTNET_ENABLED__=${TESTNET_ENABLED:=false}
+__TESTNET4_ENABLED__=${TESTNET_ENABLED:=false}
+__SIGNET_ENABLED__=${SIGNET_ENABLED:=false}
+__LIQUID_ENABLED__=${LIQUID_ENABLED:=false}
+__LIQUID_TESTNET_ENABLED__=${LIQUID_TESTNET_ENABLED:=false}
+__ITEMS_PER_PAGE__=${ITEMS_PER_PAGE:=10}
+__KEEP_BLOCKS_AMOUNT__=${KEEP_BLOCKS_AMOUNT:=8}
+__NGINX_PROTOCOL__=${NGINX_PROTOCOL:=http}
+__NGINX_HOSTNAME__=${NGINX_HOSTNAME:=localhost}
+__NGINX_PORT__=${NGINX_PORT:=8999}
+__BLOCK_WEIGHT_UNITS__=${BLOCK_WEIGHT_UNITS:=4000000}
+__MEMPOOL_BLOCKS_AMOUNT__=${MEMPOOL_BLOCKS_AMOUNT:=8}
+__BASE_MODULE__=${BASE_MODULE:=mempool}
+__ROOT_NETWORK__=${ROOT_NETWORK:=}
+__MEMPOOL_WEBSITE_URL__=${MEMPOOL_WEBSITE_URL:=https://mempool.space}
+__LIQUID_WEBSITE_URL__=${LIQUID_WEBSITE_URL:=https://liquid.network}
+__MINING_DASHBOARD__=${MINING_DASHBOARD:=true}
+__LIGHTNING__=${LIGHTNING:=false}
+__AUDIT__=${AUDIT:=false}
+__MAINNET_BLOCK_AUDIT_START_HEIGHT__=${MAINNET_BLOCK_AUDIT_START_HEIGHT:=0}
+__TESTNET_BLOCK_AUDIT_START_HEIGHT__=${TESTNET_BLOCK_AUDIT_START_HEIGHT:=0}
+__SIGNET_BLOCK_AUDIT_START_HEIGHT__=${SIGNET_BLOCK_AUDIT_START_HEIGHT:=0}
+__ACCELERATOR__=${ACCELERATOR:=false}
+__ACCELERATOR_BUTTON__=${ACCELERATOR_BUTTON:=true}
+__SERVICES_API__=${SERVICES_API:=https://mempool.space/api/v1/services}
+__PUBLIC_ACCELERATIONS__=${PUBLIC_ACCELERATIONS:=false}
+__HISTORICAL_PRICE__=${HISTORICAL_PRICE:=true}
+__ADDITIONAL_CURRENCIES__=${ADDITIONAL_CURRENCIES:=false}
+
+# Export as environment variables to be used by envsubst
+export __MAINNET_ENABLED__
+export __TESTNET_ENABLED__
+export __TESTNET4_ENABLED__
+export __SIGNET_ENABLED__
+export __LIQUID_ENABLED__
+export __LIQUID_TESTNET_ENABLED__
+export __ITEMS_PER_PAGE__
+export __KEEP_BLOCKS_AMOUNT__
+export __NGINX_PROTOCOL__
+export __NGINX_HOSTNAME__
+export __NGINX_PORT__
+export __BLOCK_WEIGHT_UNITS__
+export __MEMPOOL_BLOCKS_AMOUNT__
+export __BASE_MODULE__
+export __ROOT_NETWORK__
+export __MEMPOOL_WEBSITE_URL__
+export __LIQUID_WEBSITE_URL__
+export __MINING_DASHBOARD__
+export __LIGHTNING__
+export __AUDIT__
+export __MAINNET_BLOCK_AUDIT_START_HEIGHT__
+export __TESTNET_BLOCK_AUDIT_START_HEIGHT__
+export __SIGNET_BLOCK_AUDIT_START_HEIGHT__
+export __ACCELERATOR__
+export __ACCELERATOR_BUTTON__
+export __SERVICES_API__
+export __PUBLIC_ACCELERATIONS__
+export __HISTORICAL_PRICE__
+export __ADDITIONAL_CURRENCIES__
+
+folder=$(find /var/www/mempool -name "config.js" | xargs dirname)
+echo ${folder}
+envsubst < ${folder}/config.template.js > ${folder}/config.js
+
+exec "$@"
--- a/docs/1.8-alpha-improvements-tracker.md
+++ b/docs/1.8-alpha-improvements-tracker.md
@ -1,231 +0,0 @@
-# 1.8-alpha Improvements Tracker
-
-Last updated: 2026-06-12 01:15 EDT
-
-This tracks the user-facing improvement list that must land with the `1.8-alpha`
-container migration release and the next ISO cut produced from that release. It
-is intentionally separate from the container handoff docs, but should be treated
-as release and ISO smoke-test scope.
-
-Status legend:
-
- `todo`: not started.
- `in-progress`: active local work or validation.
- `blocked`: needs host access, hardware, credentials, a product decision, or an
-  external artifact.
- `done`: implemented and validated for this release.
- `defer?`: candidate to explicitly defer from `1.8-alpha` after product review.
-
-Resume protocol:
-
-1. Read this file after `docs/NEXT_TERMINAL_HANDOFF.md`.
-2. Keep every user-requested improvement represented here until it is either
-   `done` or explicitly moved out of `1.8-alpha` by product decision.
-3. When implementation starts, change status to `in-progress` and add the file,
-   test, host, or design decision being worked.
-4. Mark `done` only after the change is implemented and validated locally or on
-   the release validation host, as appropriate.
-5. Before cutting the next ISO, run this checklist as part of ISO smoke testing.
-
-Active-session note, 2026-06-10 05:48 EDT: resumed from
-`docs/NEXT_TERMINAL_HANDOFF.md`; no `.198` host actions have been run yet. The
-immediate tracker-affecting local gate is rerunning the focused Rust
-`container::image_versions::tests` validation for the Nextcloud false-update
-row, then continuing lifecycle/control-plane truthfulness work.
-
-Resume-save checkpoint, 2026-06-10 08:32 EDT: the current pass stayed on the
-fixes backlog, not app migration. No `.198` host actions were run, no dev server
-was intentionally left running, and no long-running validation command is
-expected to still be active. Continue from the in-progress `Make tabs info load
-quickly or show loading states` row or the next unresolved fixes-backlog row.
-
-Active-session progress: `git diff --check` passed. Focused image-version Rust
-validation is still inconclusive because the tool PTY stayed open with no
-active compiler process visible, a bounded 300s retry using the normal
-workspace target exited `124` before test output, and a fresh 600s retry in
-`/tmp/archy-cargo-image-versions-2` also exited `124` after compiling into the
-`archipelago` crate without reaching test output. The Nextcloud false-update
-row remains `in-progress`. A local lifecycle fix is in progress so migrated
-single-orchestrator app stops return immediately with a transitional state
-instead of blocking the UI while Podman cleanup runs; `cargo fmt --check` and
-focused backend compile check passed, and `git diff --check` is clean. Latest
-credentials backlog follow-up added backend PhotoPrism credentials, centered
-the mobile credential pre-launch modal in My Apps and the icon grid, and passed
-focused frontend tests, type-check, backend compile check, `cargo fmt --check`,
-and `git diff --check`. Web5 Connected Nodes Messages/Requests, Web5
-Identities, and DWN message browsing now preserve visible content during
-refresh/failure and show compact refresh labels instead of replacing populated
-tabs with loading panels; focused tests and type-check passed. Server Network
-overview, Network Interfaces, and Tor Services cards now keep visible values
-during refresh or refresh failure and show compact refresh labels instead of
-reverting to skeletons or false empty states; focused test and type-check
-passed. The standalone Credentials view now keeps credential rows visible
-during refresh/failure and shows `Refreshing credentials...`; focused test and
-type-check passed. Lightning Channels now keeps existing channels visible
-during refresh/failure and shows `Refreshing channels...`; focused test and
-type-check passed. Peer Files now keeps existing peer catalog items visible
-during Tor refresh/failure and shows `Refreshing peer files...`; focused test,
-type-check, and `git diff --check` passed. Cloud peer cards now remain visible
-during federation peer-list refresh/failure with `Refreshing peer nodes...`;
-focused test, type-check, and `git diff --check` passed. The Web5 Verifiable
-Credentials summary now keeps credential rows visible during refresh/failure
-with `Refreshing credentials...`; focused test, type-check, and
-`git diff --check` passed. Web5 Nostr Relays now keeps relay stats visible
-during refresh/failure with `Refreshing relays...`; focused test, type-check,
-and `git diff --check` passed. Web5 Domains now keeps registered-name counts
-visible during refresh/failure with `Refreshing domains...`; focused test,
-type-check, and `git diff --check` passed. Settings Backups now keeps existing
-backup rows visible during refresh/failure with `Refreshing backups...`;
-focused test, type-check, and `git diff --check` passed. Settings Transport
-Preferences now keeps preference controls visible during refresh/failure with
-`Refreshing transport preferences...`; focused test, type-check, and
-`git diff --check` passed. Settings VPN status now keeps current connection
-details visible during refresh/failure with `Refreshing VPN status...`;
-focused test, type-check, and `git diff --check` passed. Web5 Federation now
-shows `Refreshing federation...` during summary refresh and keeps existing node
-counts/DID visible on refresh failure; focused test, type-check, and
-`git diff --check` passed. Mesh map denied-location behavior now has component
-coverage proving browser location denial reports that peer positions can still
-appear without requiring local location; focused test, type-check, and
-`git diff --check` passed. Companion/app-session mobile tab-app handling now
-keeps apps that require a new tab inside the mobile session fallback instead of
-auto-opening an external tab and closing; focused app-session, launcher, and
-config tests passed with type-check and `git diff --check`.
-Nostr Discoverable Nodes now keeps discovered rows visible during relay refresh
-or relay failure and shows `Searching relays...`; focused test, type-check, and
-`git diff --check` passed. App Store/App Details screenshot sections now render
-only real screenshot metadata and no longer show fake placeholder tiles when no
-assets exist; focused App Details content and marketplace handoff tests,
-type-check, and `git diff --check` passed. Home now has an App Store
-recommendations card driven by uninstalled core/recommended marketplace apps;
-the recommendations respect installed aliases so apps drop out after install
-and move into normal My Apps/Home behavior. Focused helper tests, type-check,
-`git diff --check`, and the Playwright Home dashboard smoke passed. Easy Mode
-goal configure steps now route to their owning app/screen, verify steps have an
-explicit `Check & Continue` action, and configure/info/verify actions start
-goal progress before completing the step; focused goal action/store tests,
-type-check, and `git diff --check` passed. Setup path selection no longer shows
-the disabled `Connect Existing (Coming Soon)` option; Fresh Start and Restore
-from Seed are the only visible choices and route correctly. Focused onboarding
-option/composable tests, type-check, and `git diff --check` passed. Header
-responsiveness follow-up restored the primary My Apps/App Store/Websites
-navigation to persistent desktop tabs at `md+` on My Apps, Discover, and
-Marketplace; removed the desktop primary dropdowns; kept mobile dropdown
-behavior; delayed App Store category collapse by lowering the search reserve and
-header gap; and removed the My Apps desktop category dropdown. Focused
-Marketplace/App config tests, type-check, and scoped `git diff --check` passed.
-Browser smoke against the already-running local Vite/mock session is still next.
-
-Active-session update, 2026-06-12 01:15 EDT: system update UX hardening landed
-locally. `load_state()` now clears stale `update_in_progress` when no staged OTA
-files exist, so failed legacy update attempts cannot leave the update screen
-permanently stuck. Direct `update.git-apply` is gated behind
-`ARCHIPELAGO_GIT_UPDATES`, preventing production nodes from accidentally entering
-the local git/self-build path that requires `cargo`. `.116` was recovered from a
-failed self-build attempt by applying its already-staged manifest OTA; it is now
-on `1.7.84-alpha`, backend health is OK, nginx is active/config-valid, HTTP UI
-returns `200`, `update_in_progress=false`, and staging was removed. Validation:
-`cargo fmt --check`, `cargo check -p archipelago`, and scoped `git diff --check`
-passed; focused `cargo test` was blocked by a local `rust-lld` undefined hidden
-symbol linker failure unrelated to the updater patch.
-
-Done criteria for this tracker:
-
- Code/UI items: implemented, covered by targeted test or manual smoke check,
-  and no known regression against the container migration work.
- Runtime/container items: validated on the release host named in
-  `docs/NEXT_TERMINAL_HANDOFF.md`, then included in ISO smoke test scope.
- Product-decision items: documented decision plus implementation task if the
-  decision keeps it in `1.8-alpha`.
- External/hardware items: hardware/document/access obtained, or explicitly
-  deferred from the release by product decision.
-
-## Release-Critical Runtime Gates
-
-| Item | Status | Release question / blocker |
-| --- | --- | --- |
-| Check logs of every server for errors and fix | blocked | Needs explicit target server list. Current docs name `.198`; are there more production validation hosts? |
-| Go through issues on gate | blocked | Need location of "gate" issue tracker/board and access details. |
-| Sort out container tagging so databases, backend, etc are sorted properly | in-progress | Tie to manifest/catalog metadata and My Apps grouping. |
-| Sort out supplementary container naming so it is better | in-progress | Needs naming convention for dependencies: app-prefixed service names vs role-first names. |
-| Figure out how we offer updates to apps | todo | Product/runtime design needed: manual update, scheduled checks, or auto-update by app tier. |
-| Figure out how we provide different versions for Bitcoin to download and keep updated automatically | todo | Requires release policy for Knots/Core versions and whether users may pin old versions. |
-| Make sure all credentials are given for apps without registration | in-progress | File Browser now exposes credentials on App Details and in the pre-launch interstitial. Backend `package.credentials` returns the secured File Browser password from `/var/lib/archipelago/secrets/filebrowser/password` when present, with `admin/admin` fallback matching the install hook. PhotoPrism now exposes manifest-backed `admin` / `archipelago` credentials from both backend `package.credentials` and the frontend fallback. My Apps and mobile icon-grid credential pre-launch modals are vertically centered on mobile. Covered by `appCredentials.test.ts`, `AppIconGrid.test.ts`, local type-check, backend compile check, `cargo fmt --check`, and `git diff --check`. Grafana was not added because `GRAFANA_ADMIN_PASSWORD` is not resolved to a known repo default/secret. Remaining no-registration apps still need inventory. |
-| Nextcloud always shows update, and how are apps actually updated? | in-progress | Nextcloud manifest/catalog metadata is aligned to the pinned `nextcloud:29` image, and update detection now ignores registry-host-only image changes while still reporting real same-repo tag drift. Catalog drift check passed. Backend focused test was added but local validation hit a Rust linker/incremental artifact failure, then bounded retries exited `124` before test output, including a 600s fresh-target retry on 2026-06-10. Broader app update UX/policy design still needed. |
-| Make sure Tor is solid as having to rotate addresses to get it to work | todo | Needs `.198`/target-host Tor logs and reproducible failure case. |
-| Fix fleet it does not seem to work | done | Fleet data now preserves existing nodes during refresh, exposes an explicit refreshing state, sorts online nodes first, avoids duplicate history fetches when selecting a node, accepts backend `entries` and legacy `history` response shapes for per-node charts, and uses readable loading/auto-refresh UI. Covered by `useFleetData.test.ts`, local type-check, targeted tests, and user visual review of the Fleet header/card treatment. |
-| Check Beta Telemetry and how it works | done | Telemetry is opt-in via `analytics-config.json`; the background reporter runs every 15 minutes only when enabled, saves `telemetry-latest.json`, writes local Fleet reports/history under `telemetry-fleet/`, and optionally POSTs a `telemetry.ingest` JSON-RPC envelope to `TELEMETRY_COLLECTOR_URL`. The systemd unit now reads optional `/var/lib/archipelago/telemetry.env`, and deploys write that file when `TELEMETRY_COLLECTOR_URL` is exported in `scripts/deploy-config.sh`. Manual and periodic report schemas now both include metric percentages and container inventory, and the Fleet UI normalizes older reports with missing fields. Covered by local type-check, `useFleetData.test.ts`, `cargo check -p archipelago`, deploy-script syntax check, and `git diff --check`. Remaining ops step: choose the real collector URL, deploy it, restart the service, and confirm central Fleet ingest. |
-| Get Netbird working | todo | Requires app/runtime validation and credentials/config expectations. |
-| Sort out how we are going to manage lightning channel creation | todo | Product design needed for UX, safety limits, fees, and peer selection. |
-| Make sure old health notifications do not return on refresh/new login when stale/out of date | done | Health toasts now require a current app-linked unhealthy package state and hide stale package health notifications after 30 minutes on reload/new login. Backend monitoring notifications now prune duplicate active alerts and old generic alerts before pushing new ones. Covered by `HealthNotifications.test.ts`, local type-check, targeted frontend tests, and backend notification unit test work. |
-| Fix BTCPay issue from desktop file "BTCPay Issues" | blocked | Need file contents or path to that desktop artifact. |
-| Check Nostr Discoverable Nodes and get it working correctly | in-progress | Discover modal now keeps discovered rows visible during relay refresh/failure and shows `Searching relays...` instead of dropping to an empty state. Covered by `DiscoverModal.test.ts`, local type-check, and `git diff --check`. Needs live relay/trust validation before marking done. |
-| Make sure update password is working properly | done | Backend now returns separate SSH update status so a successful web password change is not reported as a full failure when optional SSH password update fails. Settings modal shows success plus SSH warning and stays open for review. Covered by local type-check, focused modal/RPC tests, auth unit test, `cargo check -p archipelago`, and `git diff --check`. |
-| Prevent System Update screen from getting permanently stuck | done | Update state loading now reconciles `update_in_progress` with the actual manifest OTA staging directory and clears stale stuck state when no staged files exist. Direct git/self-build apply is disabled unless `ARCHIPELAGO_GIT_UPDATES` is explicitly set, so production nodes cannot fall into the old `self-update.sh` path that requires local `cargo`. `.116` was recovered by applying its valid staged manifest OTA and verified on `1.7.84-alpha` with backend health OK, nginx active/config-valid, HTTP UI `200`, `update_in_progress=false`, and staging removed. Validated locally with `cargo fmt --check`, `cargo check -p archipelago`, and scoped `git diff --check`; focused `cargo test` was blocked by a local `rust-lld` linker artifact failure unrelated to the updater patch. |
-| Do UI performance and general performance improvements | todo | Needs profiling target; start with obvious loading/render issues. |
-| Make sure companion app is all working well, had issues with tab apps | in-progress | Mobile app-session now keeps apps that require a new tab inside the session fallback instead of auto-opening an external tab and closing immediately. Covered by `AppSessionMobileNewTab.test.ts`, existing app-session config tests, app launcher tests, local type-check, and `git diff --check`. Broader companion smoke test still needed before marking done. |
-| Even though performance is better, on reboot/restart backend/update show checking-containers notification instead of no apps | done | My Apps now shows a dedicated `Checking containers` card when initial backend data has loaded but `server-info.status-info.containers-scanned` is still false and no apps are ready to render, instead of falling through to the no-apps empty state. A follow-up UI pass preserves the last known app list when a later scanner/backoff update reports an empty package map with `containers-scanned=false`, and shows a refresh status banner above the grid. Validated by local type-check, targeted tests, and `git diff --check`; follow-up validation passed `npm test -- --run src/views/apps/__tests__/appPackageCache.test.ts` and `npm run type-check`. |
-| Check mesh core is picking up public channel/other devices, not just Archipelago ones | blocked | Needs Meshtastic hardware/radio environment. |
-| Make tabs info load quickly or show loading states | in-progress | Fleet now has initial loading/background-refresh states, and node history keeps showing while the next sample is fetched instead of blanking out. Web5 Connected Nodes Trusted/Observers tabs now show loading instead of empty states while peer data is pending and keep existing lists visible during refresh; Messages and Requests now also keep populated lists visible during refresh/failure. Web5 Shared Content now keeps My Content visible during refresh/failure with `Refreshing shared content...`, and Browse Peers keeps current same-peer results visible during refresh with `Refreshing peer content...` instead of replacing lists with full loading panels. Web5 Identities now keeps the identity list visible during refresh/failure with `Refreshing identities...`; Web5 DWN message browsing keeps stored messages visible during refresh/failure with `Refreshing messages...`. The Web5 Verifiable Credentials summary keeps credential rows visible during refresh/failure with `Refreshing credentials...`. Web5 Nostr Relays keeps relay stats visible during refresh/failure with `Refreshing relays...`. Web5 Domains keeps registered-name counts visible during refresh/failure with `Refreshing domains...`. Web5 Federation keeps summary node counts/DID visible during refresh/failure with `Refreshing federation...`. Server Network overview, Network Interfaces, and Tor Services cards now keep visible values during refresh/failure with `Refreshing network...`, `Refreshing interfaces...`, and `Refreshing Tor services...`. Credentials keeps credential rows visible during refresh/failure with `Refreshing credentials...`. Settings Backups keeps backup rows visible during refresh/failure with `Refreshing backups...`. Settings Transport Preferences keeps preference controls visible during refresh/failure with `Refreshing transport preferences...`. Settings VPN status keeps current connection details visible during refresh/failure with `Refreshing VPN status...`. Lightning Channels keeps existing channels visible during refresh/failure with `Refreshing channels...`. Peer Files keeps existing peer catalog items visible during Tor refresh/failure with `Refreshing peer files...`. Cloud keeps existing peer cards visible during federation peer-list refresh/failure with `Refreshing peer nodes...`. Covered by focused Web5/Server/Credentials/Backups/Transport/VPN/Lightning/Peer Files/Cloud tests and local type-check. Broader tab-info audit still needed for other slow panels before marking done. |
-| Add states about why Bitcoin address is not ready | in-progress | Receive Bitcoin on-chain flows now reject blank LND address responses and translate common LND/Bitcoin readiness failures into user-facing reasons: wallet locked, wallet uninitialized, Bitcoin/LND still syncing, LND unreachable, or LND REST/newaddress transport issues. The receive modals now show a live “checking wallet readiness” message while the request is in flight. Backend `lnd.newaddress` now errors if LND returns an error or no address. Needs live wallet-state smoke test before marking done. |
-| Add new Bitcoin wallets easily and securely | todo | Product/security design needed. |
-| Add the new gate instead of gate | blocked | Need definition of "new gate" and target integration. |
-| Local Nostr signer app should ask which account after logout/re-login | todo | Needs signer/session state validation. |
-| See what apps can migrate to local Nostr signer sign-in | todo | Needs app-by-app auth inventory. |
-| Make server name change change the host name | in-progress | Settings label changed to `Hostname`. `server.set-name` now persists the display name, derives a Linux-safe hostname slug, attempts `sudo -n hostnamectl set-hostname`, and returns non-fatal hostname warning fields if OS update fails. Covered by hostname slug unit test, local type-check, `cargo check -p archipelago`, and `git diff --check`. Impact audit: mDNS/SSH/Tailscale labels may change; already-created app configs using old `HOST_MDNS` (notably Fedimint derived env) are not automatically rewritten by hostnamectl, so this needs release-host smoke validation before marking done. |
-| Sort out HTTPS certificate, what is best way? | todo | Needs product decision: self-signed local CA, ACME DNS, Tailscale certs, or reverse proxy model. |
-
-## User Interface And App Experience
-
-| Item | Status | Release question / blocker |
-| --- | --- | --- |
-| LND Channels then back/back gets stuck between LND detail and channels | done | App Details back now routes explicitly to the parent surface, and Lightning Channels back replaces history so browser back no longer bounces between LND detail and Channels. Validated by local type-check and targeted tests. |
-| Add a Meshtastic icon | done | Added `meshcore.svg` asset and manifest-owned icon metadata. Catalog generation is idempotent and strict catalog drift is clean. |
-| Improve default app icon fallback | done | Missing/broken app icons now fall back to the centered Archipelago `A` mark using the same black fill and gradient-border treatment as the custom UI icon asset, instead of the old generic placeholder. Applied to My Apps cards, mobile icons, Marketplace cards, and App Details. Validated by local type-check, targeted tests, Rust check, and `git diff --check`. |
-| Use favicon for Portainer apps? | todo | Need decision: use upstream favicons dynamically or ship curated icons. |
-| Settings for apps | blocked | Needs definition: per-app config screen, runtime env vars, credentials, or install options? |
-| Update SearXNG app icon | blocked | Needs user-provided/approved icon asset. User said to move past this until they can make icons. |
-| Once an app is installed remove recommended/core pills | done | Marketplace cards hide tier badges when installed. Validated by `MarketplaceAppCard.test.ts`, targeted Vitest, type-check, and `git diff --check`. |
-| Get Bitcoin / LND UI fully done with all options and controls | todo | Large feature area; needs scope for `1.8-alpha` vs post-release. |
-| Fix intro always showing on new browser sessions | done | Splash gating now checks the backend onboarding-complete state before showing the intro when this browser has no local intro flag. Already-onboarded nodes skip the splash and seed `neode_intro_seen`; fresh installs still show it. Covered by `introSplash.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Fix App Store tabs/categories/search overflow | done | Discover/App Store and Marketplace render one shared App Store section list. Follow-up after user review restored the primary My Apps/App Store/Websites navigation to persistent desktop tabs at `md+` on My Apps, Discover, and Marketplace; mobile keeps dropdown behavior. App Store category collapse now happens later by starting uncollapsed and using a smaller header gap/search reserve, and the My Apps category dropdown no longer appears on desktop. Covered by local type-check, focused Marketplace/App config tests, and scoped `git diff --check`; browser smoke remains the next resume step. |
-| Add a test harness for all of the application | in-progress | Lifecycle harness exists; need expand UI/e2e coverage definition. |
-| Fix app details screen links | done | App Details sidebar no longer renders dead `href="#"` links. It now renders only real manifest website/marketing, upstream/wrapper repo, and support URLs, and hides the Links card when no usable URLs exist. Covered by `AppSidebar.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Fix FIPS anchoring, update FIPS | todo | Needs expected FIPS UX/API behavior. |
-| Fix generate receive address not working on nodes and identify wallet management | todo | Needs wallet API/backend validation. |
-| Fix mesh page on larger screens so it scales nicely | done | Mesh keeps the tabbed tools layout on normal desktop/1920px widths and only splits Off-Grid Bitcoin, Dead Man, and Map into separate stacked containers on very large screens (`>=2560px` wide and `>=1200px` tall). The desktop tools column now fills its panel instead of using a wrapper scroll container. Validated by local type-check, targeted tests, and `git diff --check`. |
-| Mesh map should handle denied location permission and still show other devices | in-progress | Mesh map now treats browser geolocation as optional in the UI: denied local location reports that peer locations can still appear, and the empty hint waits for mesh device positions instead of saying location sharing is required. Covered by `MeshMap.test.ts`. Needs browser smoke test with denied location plus a peer coordinate message before marking done. |
-| Make tablet-size Meshtastic scrollable | done | Tablet/mobile Mesh tools panels now have bounded heights and internal scrolling so the selected Bitcoin/Dead Man/Map panel can scroll without blowing out the page. Validated by local type-check, targeted tests, and `git diff --check`. |
-| Make mobile screens have gap below lowest container and tab bar | done | Dashboard route panels, including the separate Chat/Mesh branch, now use mobile tab-bar bottom clearance so the lowest content clears the bottom tab bar. |
-| Add Trusted tab to Connected Nodes container and have Peers and Observers | done | Connected Nodes now labels trusted peers as Trusted and splits federation nodes with `trust_level: observer` into the Observers tab. Observer nodes are excluded from Trusted, shown with their own count/badge, and refresh from the same live federation list. Validated by local type-check and targeted tests. |
-| Add more tree navigation to cloud files so they do not all go back to first screen | done | Cloud folder navigation now persists the current folder path in the route query so refresh/browser back keeps nested folders instead of resetting to the section root. The Cloud back button now walks up to the parent folder before returning to Cloud home. Covered by `cloudPath.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Fix visible UI refreshing on find nodes screens | done | Federation node auto-refresh no longer blanks/replaces the visible node lists after the initial load. Existing nodes stay visible during background refreshes, covered by `NodeList.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Remove dead UI components/ones that are coming soon | done | Removed the dead Web3/coming-soon Network card, disabled local-network placeholder button, and the non-interactive Spotlight AI Assistant coming-soon block. Verified active UI no longer contains explicit `Coming soon` copy outside historical release-note text. Covered by local type-check and `git diff --check`. |
-| Hide Web3 container on network for now and move FIPS Mesh up | done | Network page now places the live FIPS Mesh card in the top overview grid where the dead Web3 card was, removes the duplicate lower FIPS card, and updates the Home Network description to remove Web3 language. Validated by local type-check, targeted tests, and `git diff --check`. |
-| Make cool screens less hidden: Find Nodes, Fleet, Monitoring, etc. | done | Existing Web5 summary cards now expose Monitoring, Find Nodes/Federation, and Fleet directly. Federation card has separate `Find Nodes` and `Fleet` actions instead of hiding Find Nodes behind Fleet. Covered by `Web5Federation.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Fix dashboard container/card square rendering corruption | done | Generalized the App Store compositor workaround to dashboard scroll-panel glass cards/buttons/inputs and removed transform-based stagger movement so Chromium/Brave no longer paints random large black square/rectangle layers over containers. Kept the Web5 bottom-action placement change. Validated by local type-check, targeted tests, and `git diff --check`. |
-| Move constrained card header actions to bottom buttons | done | Web5 summary actions and Network actions for Add Device, Scan WiFi, Restart Tor, and Add Service now stay in the card header only on very wide screens; otherwise they render at the card bottom as full-width or 50/50 buttons. Button icons were removed from those action buttons. Validated by local type-check, targeted tests, and `git diff --check`. |
-| Work on setup screens function and flows | in-progress | Onboarding setup choice now shows only usable paths: Fresh Start and Restore from Seed. Removed the disabled `Connect Existing (Coming Soon)` option, and covered default Fresh routing plus Restore routing with `OnboardingOptions.test.ts`; `useOnboarding.test.ts`, local type-check, and `git diff --check` passed. Broader onboarding/setup audit still needed before marking done. |
-| Work on Easy Mode experience | in-progress | Easy Mode goal configure steps now route to their owning app/screen instead of silently completing without navigation; verify steps now expose a `Check & Continue` action; configure/info/verify actions start goal progress before completing the active step. Covered by `goalStepActions.test.ts`, existing goal store tests, local type-check, and `git diff --check`. Broader Easy Mode product scope still needed before marking done. |
-| Update My Apps homescreen to show most-used apps instead of hardcoded | done | App launches are recorded locally through the app launcher, and the Home My Apps card now shows the top three installed user apps by launch count/recency with a running-app/name fallback when there is no history. Covered by `appUsage.test.ts`, existing app launcher tests, local type-check, targeted tests, and `git diff --check`. |
-| Improve Full Archive Node dependent apps UX | in-progress | Electrum-style apps already block install on pruned Bitcoin nodes; Marketplace/App Store cards now surface an inline warning that a full archive Bitcoin node is required instead of only showing a terse `Bitcoin Pruned` button. Covered by `MarketplaceAppCard.test.ts` and local type-check. Broader dependency UX remains. |
-| Fix incorrect modals that are wrong color and are not full-screen overlay | done | Custom Teleport modals that still used the old light `bg-black/10` overlay now use the same full-screen `bg-black/60` overlay treatment as BaseModal/newer modals. Verified no fixed modal overlays retain `bg-black/10`; validated by local type-check, targeted tests, and `git diff --check`. |
-| Prevent modals from allowing background scroll | done | Added shared scroll-lock composable, root-level body lock, wheel/touch containment, and explicit dashboard route-panel locking. User validated the background no longer scrolls behind modal overlays. |
-| Look over gamepad navigation | todo | Needs focused controller-nav pass. |
-| App Store screenshots | in-progress | Placeholder policy fixed: Marketplace App Details and installed App Details now render screenshot sections only when real screenshot metadata exists, and otherwise hide the fake placeholder tiles. Metadata can be string URLs or `{ src, alt }` objects. Covered by `AppContentSection.test.ts`, `useMarketplaceApp.test.ts`, local type-check, and `git diff --check`. Needs actual screenshot assets/metadata before marking done. |
-| Fix App Detail page issues; container controls are not good | done | App Details container controls now disable while start/stop/restart/update/uninstall RPCs are running and show action-specific progress labels. Header actions collapse into the bottom 50/50 grid below `1280px` to avoid tablet/smaller desktop overlap. Credentials now show a loading state while package credentials are being fetched. Covered by `AppHeroSection.test.ts`, `AppSidebar.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Add setup instructions for apps that need them | done | App Details now renders a dedicated Setup Instructions card from `static-files.instructions` when present, so apps can show install/setup notes without a new schema. Covered by `AppSidebar.test.ts`, local type-check, and `git diff --check`. |
-| Add press-and-hold option for apps on mobile app screen | done | Mobile My Apps icons now support long press/context menu to open the app detail/options screen while a normal tap still launches the app. Space key opens the same options path for keyboard users. Covered by `AppIconGrid.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Side-load: add port-not-available validation | done | Sideload modal now validates app ID collisions, malformed `host:container` mappings, reserved Archipelago/package host ports, and host ports already exposed by installed packages before queueing install. Backend install remains the final bind authority. Covered by `sideloadValidation.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Delete app data option and uninstall warning | done | Uninstall dialogs in My Apps and App Details now include a clear warning plus a `Delete app data and reset it` choice. Leaving it off preserves app data for later reinstall; checking it passes `preserve_data=false` through `package.uninstall` so the app is fully reset. Covered by `AppsUninstallModal.test.ts`, `rpc-client.test.ts`, local type-check, targeted tests, and `git diff --check`. |
-| Add App Store container with recommended apps that change to Home Screen | done | Home now shows up to three uninstalled core/recommended App Store apps and routes clicks through the existing Marketplace App Details handoff. Installed aliases are honored, so recommendations disappear once the app is installed and the app moves into normal My Apps/Home behavior. Follow-up layout polish moved Cloud back into the second card slot, moved Recommended Apps into Cloud's previous slot, and placed Quick Start inside the grid next to Wallet to avoid an odd-width row. Covered by `homeRecommendations.test.ts`, local type-check, `git diff --check`, and Playwright Home dashboard smoke against local Vite/mock backend. |
-| Add QR code to download mobile companion app in login-triggered modal and improve modal | done | Companion intro modal now renders a QR code on desktop and a direct download button on mobile. It reads `VITE_COMPANION_APK_URL` and falls back to `/packages/archipelago-companion.apk.zip`; the APK zip is now published at `neode-ui/public/packages/archipelago-companion.apk.zip` so the modal can serve it immediately. Covered by local type-check, `git diff --check`, and manual file placement verification. |
-| Fix TV HDMI overscan clipping in kiosk mode | in-progress | Kiosk launcher now passes a browser safe-area fallback through `/kiosk?safe_area=...`; `/kiosk` now persists the safe-area value during redirect; self-update and deploy paths refresh kiosk launcher/services. The X11 safe-area attempt is opt-in because it stretched the live TV output on `100.66.157.120`. Wi-Fi UI fixes are included in the same OTA patch: scan errors are visible, scans can be retried, escaped SSIDs parse correctly, and open networks do not require a password. Needs live validation on HDMI node `100.66.157.120` after applying the visible OTA update. |
-| Video calling Picture-in-Picture | blocked | Need referenced document or desired provider/library. |
-| Card-based loading visuals on App Store pages | done | Discover and Marketplace now show app-card skeleton grids while community/Nostr catalog data is loading and no cards are available yet, instead of a centered spinner/empty state. Validated by local type-check, targeted tests, and `git diff --check`. |
-
-## External / Hardware Items
-
-| Item | Status | Release question / blocker |
-| --- | --- | --- |
-| Buy a HaLow device and start integration | blocked | Requires hardware purchase and driver/device target. Not a code-only `1.8-alpha` item unless hardware is available now. |
--- a/docs/BETA-ISSUES-20260328.md
+++ b/docs/BETA-ISSUES-20260328.md
@ -1,96 +0,0 @@
-# Beta Test Issues — 2026-03-28 (ISO build 2137)
-
-Hardware: Dell OptiPlex 3020M, i5, 8GB RAM, 465G HDD, UEFI+Legacy
-
-## ISO / Boot (image-recipe)
-
-### 1. UEFI autodetect broken
- **Severity**: High
- **Detail**: Only autodetects/boots in Legacy BIOS mode. UEFI boot does not autodetect the install disk.
- **Where**: `build-auto-installer-iso.sh` GRUB config, EFI boot chain
- **Status**: TODO
-
-### 2. Installation TUI screens need redesign
- **Severity**: Medium
- **Detail**: Current installer output is plain/ugly. Needs polished design.
- **Action**: User will provide .md mockup for each screen, then we implement.
- **Where**: `build-auto-installer-iso.sh` auto-install.sh embedded script
- **Status**: AWAITING DESIGN
-
-### 3. No TUI animations
- **Severity**: Low
- **Detail**: Would like Claude-style spinner/progress animations during install. May not be possible with bash.
- **Where**: auto-install.sh
- **Status**: TODO (investigate)
-
-### 4. USB read errors on boot
- **Severity**: Medium (cosmetic but bad first impression)
- **Detail**: Read errors scroll on screen during USB boot before installer loads. Scares new users.
- **Where**: Kernel/initramfs boot, possibly `quiet` not suppressing early messages
- **Status**: TODO
-
-### 5. GRUB background tiling + text cutoff
- **Severity**: Medium
- **Detail**: Boot menu background image tiles instead of scaling. Menu text ("Install Archipelago", "Failsafe mode") is cut off.
- **Where**: `branding/grub-theme/`, `boot/grub/grub.cfg`, theme.txt resolution settings
- **Status**: TODO
-
-### 6. USB removal drops to command line
- **Severity**: Medium
- **Detail**: After install completes, removing USB drops to shell before user presses Enter to reboot. Confuses non-technical users.
- **Where**: auto-install.sh — end of install, before `read -s` / `reboot`
- **Status**: TODO
-
-## Frontend / UI (neode-ui)
-
-### 7. Broken splash screen flashes before onboarding
- **Severity**: High
- **Detail**: Black screen with "online/offline" top-right, broken archipelago image top-left, "use arrow keys" text. Flashes briefly before onboarding loads.
- **Where**: Likely `RootRedirect.vue` or `SplashScreen.vue` — routing/transition timing
- **Status**: TODO (reported before, persists)
-
-### 8. Skip buttons still visible in onboarding
- **Severity**: Medium
- **Detail**: Onboarding flow still shows skip buttons. Should be removed for clean UX.
- **Where**: `src/views/onboarding/` components
- **Status**: TODO
-
-### 9. App install UX outdated
- **Severity**: High
- **Detail**: Missing the yellow "Installing..." button that persists across navigation. Apps don't show as "installing" in My Apps view during install.
- **Where**: `src/views/marketplace/`, `src/views/myapps/`, app install store
- **Status**: TODO
-
-### 10. Login requires double Enter
- **Severity**: Medium
- **Detail**: Password field on login page requires pressing Enter twice to submit.
- **Where**: `src/views/LoginView.vue` — form submission handler
- **Status**: TODO (reported before, persists)
-
-### 11. No password setting UI
- **Severity**: High
- **Detail**: No way for user to set/change their password from the web UI. Currently hardcoded `password123`.
- **Where**: Settings view, backend auth API
- **Status**: TODO
-
-### 12. Browser login loops (non-kiosk)
- **Severity**: High
- **Detail**: Logging in from a browser (not kiosk) on the same network redirects back to login in a loop. Kiosk mode works fine.
- **Where**: Auth/session handling — possibly cookie `SameSite` or redirect logic in `RootRedirect.vue`
- **Status**: TODO
-
-### 13. Can't exit input fields with arrow keys
- **Severity**: Medium
- **Detail**: When focused on a text input, up/down arrow keys don't move focus to adjacent UI elements. Stuck in the field.
- **Where**: `useControllerNav.ts` — input field focus trap logic
- **Status**: TODO (reported before, persists)
-
---
-
-## Summary
-
-| Category | Critical | High | Medium | Low |
-|----------|----------|------|--------|-----|
-| ISO/Boot | 0 | 1 | 4 | 1 |
-| Frontend | 0 | 4 | 3 | 0 |
-| **Total** | **0** | **5** | **7** | **1** |
--- a/docs/BETA-PROGRESS.md
+++ b/docs/BETA-PROGRESS.md
@ -1,335 +0,0 @@
-# Beta Progress Tracker
-
-> **Goal**: Flawless beta that works perfectly on every machine we install it on.
-> **Freeze started**: 2026-03-18
-> **Last updated**: 2026-03-25
-
---
-
-## Pipeline
-
-```
-PHASE 1: Feature Testing (internal)     ← WE ARE HERE
-    ↓
-PHASE 2: User Testing (real users, controlled)
-    ↓
-PHASE 3: Beta Live (public release)
-```
-
-**Current phase**: PHASE 1 — Feature Testing
-**Gate to Phase 2**: Every feature works, all bugs fixed, security hardened, ISO verified
-**Gate to Phase 3**: User testing feedback resolved, no P0/P1 issues remaining
-
---
-
-## Phase 1: Feature Testing (Internal)
-
-Everything in this phase must pass before we hand it to real users.
-
-### Overall Status: IN PROGRESS (~65%)
-
-| Workstream | Status | Completion | Gate-blocking? |
-|------------|--------|------------|----------------|
-| 1A. Critical Bugs (BUG-1 CSRF) | DONE | 100% | ~~YES~~ |
-| 1B. Boot Screen (FEATURE-4) | IN PROGRESS | ~80% (needs hardware test) | YES |
-| 1C. Security Hardening (TASK-8) | DONE (12/12 + code audit) | 100% | ~~YES~~ |
-| 1D. Rootless Podman (TASK-11) | DONE (.228), IN PROGRESS (.198) | ~80% | YES |
-| 1E. Beta Telemetry (TASK-12) | NOT STARTED | 0% | YES |
-| 1F. App Testing — every feature | NOT STARTED | 0% | YES |
-| 1G. ISO Build & Fresh Install | NOT STARTED | 0% | YES |
-| 1H. UI Polish & Layout | DONE (batch + What's New) | ~90% | No |
-| 1I. WebSocket Reliability | NOT STARTED | 0% | No |
-| 1J. Quality Baseline Check | NOT STARTED | 0% | No |
-| 1K. Architecture Review Fixes | DONE (4/4 items) | 100% | ~~YES~~ |
-| 1L. Update System (git.tx1138.com) | DONE | 100% | No |
-
-### 1A. Critical Bugs
-
-#### BUG-1: Random logout / CSRF mismatch — P0
-**Status**: PLANNED
-**Impact**: Users get randomly logged out. Blocks user testing — unacceptable UX.
-
-**What's known**:
- Sessions now persist to disk (fixed)
- CSRF token mismatch between cookie and header still causes 403s
- Likely caused by cookie rotation in multi-tab or deploy scenarios
-
-**Remaining work**:
- [ ] Add debug logging to capture actual cookie vs header values
- [ ] Reproduce reliably (multi-tab, deploy, long idle)
- [ ] Fix the root cause
- [ ] Verify fix survives deploys and multi-tab use
-
-#### BUG-3: IndeedHub WebSocket spam — P2
-**Status**: PLANNED
-**Impact**: Console noise, minor. Should fix before user testing.
-
- [ ] Rebuild IndeedHub with relative WebSocket URL
- [ ] Verify fix
-
---
-
-### 1B. Boot Screen (FEATURE-4)
-
-**Status**: IN PROGRESS (~80% complete)
-**Impact**: Users hit errors on first boot before backend is ready. Blocks user testing.
-
- [x] Audit current `/health` endpoint — returns trivial "OK"
- [x] Add granular service readiness to health endpoint (JSON with version + services)
- [x] Design boot screen component — BootScreen.vue (379 lines, starfield + terminal log + orb)
- [x] Create pixel art icon animations (6 SVG icons cycling)
- [x] Implement health polling with smooth transition (server.echo RPC, 2s interval)
- [x] Handle edge cases (timeout, 502/503 detection, boot-reset)
- [ ] Test on fresh ISO install (first-boot path)
- [ ] Test on normal reboot (existing user path)
-
---
-
-### 1C. Security Hardening (TASK-8)
-
-**Status**: DONE — 12/12 pentest findings fixed + additional hardening from code audit
-
-#### Pentest (12/12 fixed)
- [x] C1: /lnd-connect-info requires session auth
- [x] C3: DEV_MODE removed from production service
- [x] H1: node-message verifies ed25519 signatures
- [x] H2: federation.peer-joined verifies ed25519 signature
- [x] H3: federation.peer-address-changed requires signed proof
- [x] H4: Backend binds to 127.0.0.1
- [x] M1: content.add rejects `..` path traversal
- [x] M2: NIP-07 postMessage uses specific origin
- [x] M3: AIUI nginx checks session_id cookie
- [x] L2: Strict v3 onion validation
- [x] MED-03: Shell injection in bitcoin.conf generation
- [x] MED-07: No body size limit on /rpc/
-
-#### Code audit (additional)
- [x] CSRF: HMAC-derived from session token (BUG-1 fix)
- [x] Argon2id password hashing (bcrypt auto-upgrade)
- [x] Random Bitcoin RPC password on first boot
- [x] RBAC Viewer role: explicit allowlist
- [x] Error sanitization tightened
- [x] Identity label max length enforced
- [ ] Cosign image verification (large scope — post-beta candidate)
-
---
-
-### 1D. Rootless Podman (TASK-11)
-
-**Status**: DONE on .228 (30 containers rootless), IN PROGRESS on .198
-**Impact**: Security posture — containers no longer require root.
-
- [x] Migrate existing root Podman containers to rootless (archipelago user)
- [x] Update PodmanClient to run `podman` directly (no sudo) — 9 Rust files
- [x] Deploy script auto-fixes ownership + sysctl + linger on every deploy
- [x] All 30 containers running rootless on .228
- [ ] .198: only 2 containers running — needs full container recreation (TASK-39)
- [x] Tailscale deploy script: full deploy-tailscale.sh with split-mode SSH, rootful→rootless migration, container creation, all infrastructure
- [ ] Test full deploy on .198 (validation before Tailscale)
- [ ] Deploy to Tailscale nodes (Arch 1/2/3)
-
---
-
-### 1E. Beta Telemetry — Node Reporting (TASK-12)
-
-**Status**: NOT STARTED
-**Impact**: Without this we're blind during user testing — can't see what's broken on their machines.
-
-All beta nodes report health/errors to a central log. We build a panel to monitor and triage issues.
-
-**Design**:
- Opt-in telemetry (user consents during onboarding or settings)
- Each node periodically reports: health status, error log digest, container states, uptime
- Central endpoint collects reports (could be a simple API on one of our servers)
- Dashboard panel shows all reporting nodes, their status, recent errors
- Privacy: no wallet data, no keys, no personal data — only system health and error logs
- Nodes identified by anonymous ID (hash of DID), not IP or name
-
-**Tasks**:
- [ ] Design report payload (health, errors, container states, versions, uptime)
- [ ] Design privacy model — what's collected, what's NOT, user consent flow
- [ ] Build reporting endpoint (backend RPC → central collector)
- [ ] Build central collector service (receives + stores reports)
- [ ] Build monitoring dashboard/panel (view all nodes, filter by error type)
- [ ] Add opt-in toggle to Settings UI
- [ ] Add reporting interval config (default: every 15 min?)
- [ ] Test with multi-node fleet (.228, .198, Tailscale nodes)
-
---
-
-### 1F. App Testing — Every Feature
-
-**Status**: NOT STARTED
-**Reference**: `docs/BETA-RELEASE-CHECKLIST.md` — full matrix
-
-Systematic test of **every feature** on the dev server, then on fresh install.
-
-#### Core Flows
- [ ] Onboarding: welcome → password → path → DID → backup → dashboard
- [ ] Login / logout / re-login
- [ ] Password change (invalidates other sessions)
- [ ] 2FA enrollment and verification
- [ ] Settings: view server name, version, DID, Tor address
- [ ] Dashboard: all overview cards render with data
-
-#### App Lifecycle (every app)
- [ ] Bitcoin Knots: install, sync starts, UI loads, uninstall
- [ ] Electrs: install, auto-connects to Bitcoin, UI loads, uninstall
- [ ] LND: install, auto-connects to Bitcoin, UI loads, uninstall
- [ ] BTCPay Server: install, connects, Lightning available, uninstall
- [ ] Mempool: install with Bitcoin+Electrs, shows data, uninstall
- [ ] Fedimint + Gateway: install, UI loads, uninstall
- [ ] File Browser: install, UI loads, uninstall
- [ ] Immich: install, UI loads, uninstall
- [ ] PhotoPrism: install, UI loads, uninstall
- [ ] Penpot: install, UI loads, uninstall
- [ ] SearXNG: install, UI loads, uninstall
- [ ] Ollama: install, UI loads, uninstall
- [ ] Nostr Relay: install, UI loads, uninstall
- [ ] Nginx Proxy Manager: install, UI loads, uninstall
- [ ] Tailscale: install, UI loads, uninstall
- [ ] Home Assistant: install, UI loads (new tab), uninstall
- [ ] IndeedHub: opens external URL in iframe
-
-#### Dependency Chain Errors
- [ ] Electrs without Bitcoin → clear error message
- [ ] LND without Bitcoin → clear error message
- [ ] Mempool without Bitcoin+Electrs → clear error message
-
-#### Federation & Identity
- [ ] Federation invite + join between nodes
- [ ] DWN sync between federated nodes
- [ ] Backup create + download
- [ ] Backup restore on fresh install
-
-#### WebSocket
- [ ] Connects on login, receives initial data
- [ ] Reconnects after network drop
- [ ] Ping/pong heartbeat both directions
- [ ] Connection state visible in UI
- [ ] Install progress delivered real-time
-
-#### Nginx Proxies
- [ ] Every `/app/*` proxy resolves correctly
- [ ] BTCPay and Home Assistant open in new tab
- [ ] Tor hidden services resolve
-
---
-
-### 1G. ISO Build & Fresh Install
-
-**Status**: NOT STARTED
-
- [ ] ISO builds successfully on dev server
- [ ] ISO size < 10 GB
- [ ] All container images captured
- [ ] Boot from USB on x86_64 hardware
- [ ] Auto-installer partitions correctly
- [ ] Services start on first boot
- [ ] Web UI accessible within 3 minutes
- [ ] Full onboarding flow completes
- [ ] Second machine test (different hardware)
- [ ] ARM64 test (if targeting)
-
---
-
-### 1H. UI Polish & Layout
-
-**Status**: MOSTLY DONE — batch of fixes shipped 2026-03-18
-**Note**: Layout rearrangements and UX improvements allowed during freeze.
-
- [x] Rename fedimintd → "Fedimint Guardian" + icon (TASK-26)
- [x] Tab-launch icons for apps opening in new tabs (TASK-27)
- [x] Installed apps sorted to end of marketplace (TASK-28)
- [x] Mesh mobile: header hidden, overflow fixed (TASK-29)
- [x] On-Chain first in receive modals (TASK-30)
- [x] Federation node names — show name not DID, hover for key (TASK-35)
- [x] Cleaner iframe error screen with remediation (TASK-36)
- [x] CPU alert threshold fixed (BUG-33)
- [x] ElectrumX shows index size during indexing
- [x] Container startup "Checking..." shimmer
- [ ] Sticky nav header (TASK-31)
- [ ] Review all views for consistent glass design
- [ ] Verify all loading/empty/error states work
- [ ] Check responsive layout on tablet/mobile
-
---
-
-### 1I. WebSocket Reliability
-
-Covered under 1F testing — no separate workstream needed.
-
---
-
-### 1J. Quality Baseline Check
-
-**Last known** (2026-03-11):
- Silent catches: 0
- Console statements: 0
- `any` types: 0
- TypeScript errors: 0
- Tests: 515 passed
- npm audit (runtime): 0
-
- [ ] Re-run full quality sweep — verify no regressions
- [ ] Fix any new violations
-
---
-
-## Phase 2: User Testing (Controlled)
-
-**Gate**: All Phase 1 items pass. No P0/P1 bugs open.
-
-Starts when we hand ISOs to real users on real hardware we don't control.
-
-| Item | Status |
-|------|--------|
-| Recruit test users (3-5 people, varied hardware) | NOT STARTED |
-| Provide ISOs + install instructions | NOT STARTED |
-| Beta telemetry collecting reports from user nodes | NOT STARTED |
-| Monitor dashboard for errors across fleet | NOT STARTED |
-| Triage + fix reported issues | NOT STARTED |
-| User feedback collection (structured form or channel) | NOT STARTED |
-| Fix all P0/P1 issues from user reports | NOT STARTED |
-| Rebuild ISO with fixes, re-test | NOT STARTED |
-
---
-
-## Phase 3: Beta Live (Public)
-
-**Gate**: User testing complete. No P0/P1 issues. Telemetry shows stable fleet.
-
-| Item | Status |
-|------|--------|
-| Final ISO build with all fixes | NOT STARTED |
-| Release notes / changelog | NOT STARTED |
-| Download page / distribution | NOT STARTED |
-| Public announcement | NOT STARTED |
-| Telemetry monitoring active for early adopters | NOT STARTED |
-
---
-
-## Session Log
-
-| Date | Session | Work Done | Items Closed |
-|------|---------|-----------|--------------|
-| 2026-03-18 | #1 | Created beta freeze plan, progress tracker | — |
-| 2026-03-18 | #2 | Restructured into 3-phase pipeline, added telemetry workstream | — |
-| 2026-03-18 | #3 | Updated tracking to reflect completed work — TASK-11 done, TASK-8 9/12, UI batch done | TASK-11, TASK-26-30, TASK-32, TASK-34-36, BUG-33 |
-| 2026-03-18 | #4 | Rewrote deploy-tailscale.sh (full deploy with split-mode SSH, rootful migration, containers, infra). Fixed first-boot-containers.sh rootless bugs (subnet, UID mapping, prereqs). Dynamic HTTPS certs. | — |
-| 2026-03-18 | #5 | BUG-1 CSRF fix, TASK-8 12/12 done, 7 bugs fixed, Argon2id migration, random BTC RPC, RBAC hardened, What's New history, Bitcoin sync gauge. Tagged v1.2.0-alpha.9. | BUG-1, TASK-8, BUG-20/37/40/41, TASK-31/38 |
-| 2026-03-25 | #6 | Architecture review audit: all P0s+P1s verified fixed. Fixed remaining items: Nostr timeouts (6 calls), crypto dep pinning (12 deps), container image pinning (15 images), CI pipeline. Update system wired to git.tx1138.com. Cleaned stale branches. Docs updated. | Architecture review 4/4, CI pipeline |
-
---
-
-## Post-Beta Parking Lot
-
-These are explicitly deferred until after beta ships:
- FEATURE-6: Watch-only wallet architecture
- TASK-7: Mesh Bitcoin security hardening
- INQUIRY-5: Offline balance check via mesh relay
- TASK-2: Roll incoming-tx into deploy & ISO (P2, not blocking)
- did:dht integration
- Multi-user support
- Cluster mode
- Mobile companion PWA
--- a/docs/BETA-RELEASE-CHECKLIST.md
+++ b/docs/BETA-RELEASE-CHECKLIST.md
@ -1,269 +0,0 @@
-# Beta Release Checklist (v0.5.0-beta)
-
-## Pre-Build Verification
-
-### Source Code
-
- [ ] All changes committed and pushed to `main`
- [ ] `cargo clippy --all-targets --all-features` passes (zero warnings)
- [ ] `cargo fmt --all` applied
- [ ] `cd neode-ui && npm run type-check` passes (zero errors)
- [ ] `cd neode-ui && npm test` passes (all tests green)
- [ ] `cargo test --all-features` passes on dev server
-
-### Critical Files
-
- [ ] `core/container/src/podman_client.rs` — rootless Podman REST API socket
- [ ] `core/archipelago/src/container/docker_packages.rs` — app metadata + UI mapping
- [ ] `core/archipelago/src/api/rpc/package.rs` — app configs, capabilities, dependencies
- [ ] `core/archipelago/src/session.rs` — session security hardening
- [ ] `core/security/src/secrets_manager.rs` — encryption + rotation
- [ ] `neode-ui/src/views/Marketplace.vue` — all app entries with pinned image versions
- [ ] `neode-ui/src/api/websocket.ts` — heartbeat + reconnection
- [ ] `image-recipe/configs/nginx-archipelago.conf` — all app proxies + path traversal blocks
- [ ] All app icons present in `neode-ui/public/assets/img/app-icons/`
-
---
-
-## App Integration Matrix
-
-Every app must be tested for install, launch, and uninstall on a fresh system.
-
-### Core Bitcoin Stack
-
-| App | Image | Version | Install | Launch | UI Loads | Uninstall |
-|-----|-------|---------|---------|--------|----------|-----------|
-| Bitcoin Knots | `bitcoinknots/bitcoin` | `v28.1` | [ ] | [ ] | [ ] | [ ] |
-| Electrs | `mempool/electrs` | `v0.4.1` | [ ] | [ ] | [ ] | [ ] |
-| LND | `lightninglabs/lnd` | `v0.18.4` | [ ] | [ ] | [ ] | [ ] |
-| BTCPay Server | `btcpayserver/btcpayserver` | `2.0.6` | [ ] | [ ] | [ ] | [ ] |
-| Mempool | `mempool/frontend` | `v3.0.0` | [ ] | [ ] | [ ] | [ ] |
-| Fedimint | `fedimintui/fedimint` | `0.5.0` | [ ] | [ ] | [ ] | [ ] |
-| Fedimint Gateway | `fedimintui/gateway-ui` | `0.5.0` | [ ] | [ ] | [ ] | [ ] |
-
-### Storage & Media
-
-| App | Image | Version | Install | Launch | UI Loads | Uninstall |
-|-----|-------|---------|---------|--------|----------|-----------|
-| File Browser | `filebrowser/filebrowser` | `v2` | [ ] | [ ] | [ ] | [ ] |
-| Immich | `ghcr.io/immich-app/immich-server` | `v1.121.0` | [ ] | [ ] | [ ] | [ ] |
-| PhotoPrism | `photoprism/photoprism` | `240915` | [ ] | [ ] | [ ] | [ ] |
-
-### Productivity & Privacy
-
-| App | Image | Version | Install | Launch | UI Loads | Uninstall |
-|-----|-------|---------|---------|--------|----------|-----------|
-| Penpot | `penpotapp/frontend` | `2.4` | [ ] | [ ] | [ ] | [ ] |
-| SearXNG | `searxng/searxng` | `2024.11.17-e2554de75` | [ ] | [ ] | [ ] | [ ] |
-| Ollama | `ollama/ollama` | `0.5.4` | [ ] | [ ] | [ ] | [ ] |
-
-### Network & Infrastructure
-
-| App | Image | Version | Install | Launch | UI Loads | Uninstall |
-|-----|-------|---------|---------|--------|----------|-----------|
-| Nostr Relay | `scsiblade/nostr-rs-relay` | `0.9.0` | [ ] | [ ] | [ ] | [ ] |
-| Nginx Proxy Manager | `jc21/nginx-proxy-manager` | `2.12.1` | [ ] | [ ] | [ ] | [ ] |
-| Tailscale | `tailscale/tailscale` | pinned | [ ] | [ ] | [ ] | [ ] |
-| Home Assistant | `homeassistant/home-assistant` | pinned | [ ] | [ ] | [ ] | [ ] |
-
-### Virtual Apps (No Container)
-
-| App | Behavior | Works |
-|-----|----------|-------|
-| IndeedHub | Opens external URL | [ ] |
-
---
-
-## Dependency Chain Tests
-
-These must be tested in order on a fresh install:
-
- [ ] Install Bitcoin Knots → starts and begins syncing
- [ ] Install Electrs while Bitcoin running → connects to Bitcoin automatically
- [ ] Install LND while Bitcoin running → connects to Bitcoin automatically
- [ ] Install BTCPay while Bitcoin running → connects; Lightning available if LND present
- [ ] Install Mempool while Bitcoin + Electrs running → shows blockchain data
- [ ] Try installing Electrs without Bitcoin → shows clear error message
- [ ] Try installing LND without Bitcoin → shows clear error message
- [ ] Try installing Mempool without Bitcoin + Electrs → shows missing deps error
- [ ] Fedimint Gateway auto-detects LND credentials when available
-
---
-
-## Security Hardening Verification
-
-### Session Security
-
- [ ] Sessions expire after 24 hours of inactivity
- [ ] Password change invalidates all other sessions
- [ ] Maximum 5 concurrent sessions (oldest evicted when exceeded)
- [ ] Session tokens are SHA-256 hashed in memory (never stored as plaintext)
- [ ] Login rate limiting: 5 failures per 60 seconds per IP
-
-### Container Security
-
- [ ] All container images use pinned versions (no `:latest`)
- [ ] Read-only root filesystem enabled for compatible apps
- [ ] `--cap-drop=ALL` applied to all containers
- [ ] `--security-opt=no-new-privileges:true` applied to all containers
- [ ] Required capabilities added explicitly per app (e.g., CHOWN for File Browser)
-
-### Secrets Management
-
- [ ] Secrets encrypted with AES-256-GCM on disk
- [ ] Secret metadata tracked (creation date, rotation count)
- [ ] Secret rotation generates new random values and re-encrypts
- [ ] `security.list-expiring` RPC returns secrets older than threshold
-
-### Path Traversal Prevention
-
- [ ] Nginx blocks `..` in filebrowser API paths (403 response)
- [ ] Frontend `sanitizePath()` strips `..` and resolves paths
- [ ] File Browser token not exposed in URLs
-
-### Authentication
-
- [ ] TOTP 2FA enrollment and verification works
- [ ] TOTP backup codes work for recovery
- [ ] Maximum 5 TOTP attempts before session invalidation
- [ ] Pending TOTP sessions expire after 5 minutes
- [ ] Cookie-based auth (no tokens in query strings)
-
---
-
-## WebSocket & Connectivity
-
- [ ] WebSocket connects on login and receives initial data dump
- [ ] WebSocket reconnects after network interruption (exponential backoff, max 30s)
- [ ] Server sends ping every 30s; client responds with pong
- [ ] Client sends JSON ping every 30s; server responds with JSON pong
- [ ] Server closes inactive connections after 5 minutes
- [ ] Connection state shown in UI (connected/reconnecting/disconnected)
- [ ] Install progress updates delivered in real-time via WebSocket
-
---
-
-## Fresh Install Testing Matrix
-
-### ISO Build
-
- [ ] ISO builds successfully on dev server
- [ ] ISO size is reasonable (< 10 GB)
- [ ] All container images captured in ISO
-
-### Installation
-
- [ ] Boot from USB on x86_64 hardware
- [ ] Auto-installer partitions disk correctly
- [ ] Debian 13 installs without errors
- [ ] Archipelago services start on first boot
- [ ] Web UI accessible at server IP within 3 minutes of first boot
-
-### Onboarding Flow
-
- [ ] Welcome screen displays with intro video
- [ ] Password creation enforces minimum requirements
- [ ] Path selection shows all 6 options
- [ ] DID generation completes within 60 seconds
- [ ] Identity naming is optional and skippable
- [ ] Backup download produces valid JSON file
- [ ] Onboarding completes and reaches Dashboard
-
-### Post-Onboarding
-
- [ ] Dashboard shows all overview cards
- [ ] App Store loads with all curated apps
- [ ] Settings shows server name, version, DID, Tor address
- [ ] Logout and re-login works
- [ ] Password change works and invalidates other sessions
-
---
-
-## Performance Targets
-
- [ ] Backend startup: < 3 seconds
- [ ] Frontend initial load: < 500 KB gzipped
- [ ] WebSocket initial data: < 1 second after connection
- [ ] App install progress visible in UI within 5 seconds of starting
-
---
-
-## Nginx Proxy Verification
-
-All app proxies must work in both HTTP and HTTPS blocks:
-
- [ ] `/rpc/` → backend:5678
- [ ] `/ws/` → backend:5678 (WebSocket upgrade)
- [ ] `/health` → backend:5678
- [ ] `/app/filebrowser/` → filebrowser:80
- [ ] `/app/searxng/` → searxng:8080
- [ ] `/app/immich/` → immich:2283
- [ ] `/app/penpot/` → penpot-frontend:80
- [ ] `/app/ollama/` → ollama:11434
- [ ] `/app/photoprism/` → photoprism:2342
- [ ] `/app/nginx-proxy-manager/` → npm:81
- [ ] `/app/tailscale/` → tailscale:8240
- [ ] BTCPay (port 23000) opens in new tab
- [ ] Home Assistant (port 8123) opens in new tab
- [ ] Tor hidden services resolve for all configured apps
-
---
-
-## Rollback Procedures
-
-### If Backend Fails to Start
-
-```bash
-# Check logs
-sudo journalctl -u archipelago -n 50 --no-pager
-
-# Restore previous binary
-sudo cp /usr/local/bin/archipelago.bak /usr/local/bin/archipelago
-sudo systemctl restart archipelago
-```
-
-### If Frontend is Broken
-
-```bash
-# Restore previous frontend build
-sudo cp -r /opt/archipelago/web-ui.bak/* /opt/archipelago/web-ui/
-sudo systemctl reload nginx
-```
-
-### If Container Won't Start
-
-```bash
-# Check container logs
-podman logs <container-name>
-
-# Remove and recreate
-podman rm -f <container-name>
-# Reinstall from App Store
-```
-
-### If ISO Install Fails
-
-1. Boot into rescue mode from USB
-2. Check `/var/log/installer.log` on target disk
-3. Verify disk partitioning with `lsblk`
-4. Re-run installer with `INSTALLER_STARTED= /opt/installer.sh`
-
-### Full System Rollback
-
-If the beta is unusable:
-1. Re-flash the ISO from the last known good build
-2. Restore user data from `/var/lib/archipelago/` backup
-3. Re-import DID from backup JSON file
-
---
-
-## Sign-Off
-
-| Reviewer | Area | Date | Pass/Fail |
-|----------|------|------|-----------|
-| | Backend | | |
-| | Frontend | | |
-| | Security | | |
-| | ISO Build | | |
-| | Fresh Install | | |
-| | App Integrations | | |
--- a/docs/CHAT_TRANSCRIPT_2026-05-02.md
+++ b/docs/CHAT_TRANSCRIPT_2026-05-02.md
@ -1,317 +0,0 @@
-# Chat Transcript And Working Notes
-
-Date: 2026-05-02
-
-This file captures the current chat context, decisions, progress, and next steps so work can continue from another device/session.
-
-## User Request
-
-The user asked to continue hardening Archipelago app/container lifecycle, then asked multiple times to save the plan/progress/next steps and finally to save the entire chat to Markdown.
-
-Key user constraints and corrections:
-
- Continue if next steps are clear; ask only if blocked.
- Exhaustively harden app/container lifecycle before release.
- Preserve data during destructive lifecycle testing unless explicitly instructed otherwise.
- Do not rely on `/app/...` proxy paths for app launch/testing. The user corrected: “we never use paths only ports.”
- LND/Electrum wallet-connect tests must validate real connection details and QR, including Tor.
-
-## Earlier Progress Summary
-
-Before the latest work, the project already had substantial lifecycle hardening in progress:
-
- Remote lifecycle harness exists at `tests/lifecycle/remote-lifecycle.sh`.
- `.198` SSH works with `/home/archipelago/.ssh/id_ed25519`.
- `.228` RPC works, but SSH is blocked with `Permission denied (publickey,password)`.
- Multiple backend release binaries were built and deployed to `.198` with backups in `/usr/local/bin/archipelago.bak-*`.
- Fixed stale package scanner state recovery from `Removing -> Running` when a container is actually live.
- Fixed startup ordering so crash recovery runs before BootReconciler.
- Removed dangerous automatic Podman runtime directory deletion on `podman info` failure.
- Narrowed generic crash recovery to safe legacy containers.
- Fixed companion reconciliation on install/start/restart.
- Fixed uninstall/reinstall behavior so uninstall disables manifest apps instead of deleting manifest availability, and reinstall re-enables them.
- Fixed LND config generation/repair:
-  - `bitcoin.active=true`
-  - `bitcoin.mainnet=true`
-  - `bitcoin.node=bitcoind`
-  - `bitcoind.rpchost=bitcoin-knots:8332`
-  - sudo fallback for writing container-owned config paths.
- `.198` had previously passed focused lifecycle for `filebrowser`, `bitcoin-knots`, and a looser LND launch test.
-
-## Major Files Touched In This Session
-
- `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
- `docs/CHAT_TRANSCRIPT_2026-05-02.md`
- `tests/lifecycle/remote-lifecycle.sh`
- `core/archipelago/src/container/lnd.rs`
- `core/archipelago/src/container/companion.rs`
- `core/archipelago/src/container/prod_orchestrator.rs`
- `core/archipelago/src/container/docker_packages.rs`
- `core/container/src/podman_client.rs`
- `core/archipelago/src/port_allocator.rs`
- `apps/lnd-ui/manifest.yml`
- `neode-ui/src/views/appSession/appSessionConfig.ts`
- `neode-ui/src/stores/container.ts`
- `neode-ui/src/stores/appLauncher.ts`
- `neode-ui/src/views/appDetails/appDetailsData.ts`
- nginx config/snippet files under `scripts/` and `image-recipe/`
-
-## LND Wallet Bootstrap Investigation
-
-Initial strict LND probe failed because `/lnd-connect-info` could not read `admin.macaroon`:
-
-```text
-Failed to read LND admin macaroon — is LND installed?
-direct: Permission denied (os error 13)
-sudo: cat: /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon: No such file or directory
-```
-
-LND logs showed the wallet was uninitialized/locked:
-
-```text
-Waiting for wallet encryption password. Use lncli create...
-```
-
-Tests showed `lncli create` is interactive and does not support `--stdin`:
-
-```text
-[lncli] flag provided but not defined: -stdin
-```
-
-`lncli unlock --stdin` is supported, so the final approach was:
-
- Use LND REST unlocker endpoints for new wallet creation.
- Use `lncli unlock --stdin` only for an existing wallet.
- Treat “wallet already exists” from REST as a signal to unlock.
- Use sudo-aware checks/reads for wallet artifacts because LND data directories are container-owned and `0700`.
-
-Implemented in `core/archipelago/src/container/lnd.rs`:
-
- `ensure_wallet_initialized()`
- `file_exists_as_root()`
- `read_file_as_root()`
- `init_wallet_via_rest()`
- `get_lnd_unlocker_json()`
- `post_lnd_unlocker_json()`
- `unlock_existing_wallet()`
- `wait_for_admin_macaroon()`
- `lnd_getinfo_ready()`
-
-Focused Rust test passes:
-
-```bash
-cd /home/archipelago/Projects/archy/core
-cargo test -p archipelago --bin archipelago lnd
-```
-
-Result:
-
-```text
-7 passed; 0 failed
-```
-
-## LND UI Port Collision
-
-The strict LND UI test then failed with `502`.
-
-Investigation found a real port collision:
-
- `nostr-rs-relay` uses host `8081`.
- Old `archy-lnd-ui` also used host `8081`.
- nginx `/app/lnd/` proxy also pointed at `8081`.
-
-Fix implemented:
-
- Move LND UI companion to host port `18083`, container port `80`.
- Keep `nostr-rs-relay` on `8081`.
- Update app metadata/routing to `18083`.
- Update tests to expect direct port launch.
-
-Important correction from user:
-
-```text
-we never use paths only ports, how many times do you need to be told
-```
-
-Action taken after correction:
-
- Stop validating through `/app/lnd/` and `/app/electrumx/` in the lifecycle harness.
- Switch `launch_url_for()` to direct app ports.
- Switch app session resolver to direct `http://host:port` launch, even from HTTPS parent pages.
- Remove use of `HTTPS_PROXY_PATHS[id]` in `resolveAppUrl()`.
-
-Direct-port LND audit command:
-
-```bash
-ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh
-```
-
-Result:
-
-```text
-### 192.168.1.198 iteration 1 / 1 ###
-lnd                    state=running
-all checks passed
-```
-
-The audit now validates `http://192.168.1.198:18083/`, not `/app/lnd/`.
-
-## Lifecycle Harness Changes
-
-`tests/lifecycle/remote-lifecycle.sh` changes made:
-
- Normalize package states with `ascii_downcase` because API returned `Running`.
- Direct port launch URLs:
-  - LND: `http://${ARCHY_HOST}:18083/`
-  - Electrum/Electrs: `http://${ARCHY_HOST}:50002/`
-  - Bitcoin UI: `http://${ARCHY_HOST}:8334/`
-  - Other apps mapped to direct ports where known.
- LND probe checks:
-  - `Connect Your Wallet`
-  - `id="lndQrBox"`
-  - `id="connHost"`
-  - `value="rest-tor"`
-  - `value="grpc-tor"`
-  - `value="rest-local"`
-  - `value="grpc-local"`
-  - `Copy lndconnect URI`
-  - `/lnd-connect-info` cert, macaroon, ports, and Tor onion.
- Electrum probe checks:
-  - local QR container and address field
-  - Tor QR container and onion field
-  - port `50001`
-  - QR renderer
-  - direct `http://${ARCHY_HOST}:50002/qrcode.js`
-  - `/electrs-status` Tor onion.
- Full lifecycle now fails immediately on any failed phase with `|| return 1` so a later reinstall cannot mask a failed restart/probe.
-
-## Deployments To `.198`
-
-Several release builds were made and deployed:
-
-```bash
-cd /home/archipelago/Projects/archy/core
-cargo build -p archipelago --bin archipelago --release
-```
-
-Deploy pattern:
-
-```bash
-scp -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
-  /home/archipelago/Projects/archy/core/target/release/archipelago \
-  archipelago@192.168.1.198:/tmp/archipelago.new
-
-ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
-  archipelago@192.168.1.198 \
-  "sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.bak-<timestamp> && \
-   sudo install -m 0755 /tmp/archipelago.new /usr/local/bin/archipelago && \
-   sudo systemctl restart archipelago.service && \
-   systemctl is-active archipelago.service"
-```
-
-Latest deploy returned:
-
-```text
-active
-```
-
-## `.198` Current Observations
-
-After forcing LND package restart, companion reconciliation succeeded:
-
-```text
-nostr-rs-relay Up ... 0.0.0.0:8081->8080/tcp
-lnd Up ... 0.0.0.0:8080->8080/tcp, 0.0.0.0:9735->9735/tcp, 0.0.0.0:10009->10009/tcp
-archy-lnd-ui Up ... 0.0.0.0:18083->80/tcp
-```
-
-Direct UI test from `.198` returned `200`:
-
-```bash
-curl -i http://127.0.0.1:18083/
-```
-
-Strict direct-port LND audit is green:
-
-```text
-lnd                    state=running
-all checks passed
-```
-
-## Full LND Lifecycle Status
-
-Full direct-port lifecycle was started:
-
-```bash
-ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
-```
-
-It reached:
-
-```text
-### 192.168.1.198 iteration 1 / 1 ###
-== lnd: install ==
-== lnd: stop ==
-```
-
-Then the user aborted the command while asking to save memory/transcript.
-
-The next continuation point is to rerun full LND direct-port lifecycle from scratch and inspect the stop phase if it hangs/fails.
-
-## Handoff File
-
-A durable handoff file was also created:
-
-```text
-docs/CONTAINER_LIFECYCLE_HANDOFF.md
-```
-
-It contains the plan, progress, current blockers, and next steps.
-
-## Immediate Next Steps
-
-1. Rerun full strict LND direct-port lifecycle:
-
-```bash
-ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
-```
-
-2. If it hangs/fails at `stop`, inspect package runtime stop path and logs:
-
-```bash
-ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 \
-  'journalctl -u archipelago.service -n 260 --no-pager | egrep -i "package\.(stop|start|restart|install|uninstall)|lnd|companion|error|failed" | sed -n "1,220p"; podman ps -a --format "{{.Names}} {{.Status}} {{.Ports}}" | egrep "lnd|nostr" || true'
-```
-
-3. If stop is unreliable, inspect/fix:
-
- `core/archipelago/src/api/rpc/package/runtime.rs`
- `core/archipelago/src/container/prod_orchestrator.rs`
-
-Likely causes to check:
-
- Reconciler restarting LND while stop is expected.
- State scanner reporting stale `running`.
- Companion handling interfering with parent app state.
- Async lifecycle returning before actual stop completes.
-
-4. Once LND full lifecycle is green, run Electrum strict lifecycle with direct port `50002`:
-
-```bash
-ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
-```
-
-5. Continue with app groups after LND/Electrum:
-
- `filebrowser`
- `bitcoin-knots`
- `lnd`
- `electrumx`
- `mempool`
- `btcpay-server`
- `fedimint`
- remaining catalog apps.
-
-## Important Instruction To Preserve
-
-Use ports only for app launch/testing. Do not add or rely on `/app/...` path proxy launch behavior unless the user explicitly changes this requirement.
--- a/docs/CONTAINER-ISSUES-REPORT.md
+++ b/docs/CONTAINER-ISSUES-REPORT.md
@ -1,508 +0,0 @@
-# Archipelago Container Infrastructure — Critical Issues Report
-
-**Date:** 2026-03-31
-**Status:** Server .228 rebooted — some apps recovered, many did not. UI showed everything as "crashed" during recovery window.
-**Purpose:** Fix guide for getting container lifecycle to production quality.
-
---
-
-## Executive Summary
-
-The container system has **7 systemic failures** that compound each other:
-
-1. **Silent failures everywhere** — errors are swallowed with `|| true`, `.unwrap_or_default()`, and warn-level logs. Nothing actually tells the user (or the system) that something broke.
-2. **Health checks are fake** — manifests define real health checks (HTTP probes, exec checks) but they are **never executed**. "Healthy" just means `podman ps` shows "running".
-3. **Duplicate polling burns CPU** — health monitor + metrics collector both call `podman stats` every 60 seconds independently. Add crash recovery snapshots, disk monitor, and frontend polling = constant subprocess spawning.
-4. **Uninstall doesn't clean up** — no volume removal, no network cleanup, force-kills stateful containers (risking wallet/DB corruption), returns 200 OK on partial failure.
-5. **Two divergent install paths** — `first-boot-containers.sh` and the Rust RPC installer use different passwords, ports, capabilities, memory limits, and Bitcoin config. They are never in sync.
-6. **UI misrepresents state** — `Exited` (even clean exit code 0) shows as "crashed". No "recovering" or "starting up" state exists. During boot recovery, UI shows a wall of red/gray "crashed" labels.
-7. **Dependency-blind restarts** — health monitor restarts services without restarting their dependencies first, so they immediately fail again and burn through the 3-attempt limit.
-
---
-
-## LIVE EVIDENCE: .228 Reboot on 2026-03-31
-
-After rebooting .228, here's the actual container state 30 minutes later:
-
-### Permanently Dead (exceeded 3 restart attempts, abandoned)
-| Container | Exit Code | Cause |
-|-----------|-----------|-------|
-| `indeedhub-postgres` | 0 (clean) | Shut down by reboot. Health monitor tried 3 restarts, it keeps exiting cleanly. Once abandoned, all dependent services die too. |
-| `indeedhub-redis` | 0 | Same — clean exit, 3 failed restart attempts, abandoned |
-| `indeedhub-minio` | 0 | Same |
-| `indeedhub-relay` | 0 | Same |
-| `indeedhub` | 0 | Same |
-| `indeedhub-api` | 1 | Can't resolve hostname `indeedhub-postgres` (postgres is dead, DNS entry gone from network) |
-| `jellyfin` | 137 (OOM) | "Failed to create CoreCLR" — memory limit too low for .NET runtime. SIGKILL = OOM. 3 attempts exhausted. |
-
-### Crash-Looping (still failing on every restart)
-| Container | Cause |
-|-----------|-------|
-| `mempool-api` | `ECONNREFUSED 10.89.0.42:3306` — DB (`archy-mempool-db`) just restarted, not ready yet |
-| `portainer` | "database schema version does not align with server version" — image upgraded, DB not migrated. Will NEVER recover. |
-| `photoprism` | "Failed creating test file in storage folder" — volume permission issue (rootless UID mapping) |
-
-### Never Started (stuck in "Created" state)
-| Container | Cause |
-|-----------|-------|
-| `archy-mempool-web` | "cannot assign requested address" — network binding failure |
-| `fedimint` | Same network error |
-
-### Running but Unhealthy
-| Container | Notes |
-|-----------|-------|
-| `homeassistant` | Up 14 min, health check failing |
-| `searxng` | Up 13 min, health check failing |
-| `onlyoffice` | Up 10 min, health check failing |
-
-### Actually Recovered (healthy)
-`filebrowser`, `bitcoin-knots`, `vaultwarden`, `nginx-proxy-manager`, `archy-btcpay-db`, `lnd`, `electrumx`, `grafana`
-
-### Key Observations
-1. **All containers have `unless-stopped` restart policy** — but this doesn't help because containers that exit cleanly (code 0) don't get restarted by Podman. The health monitor is the only restart mechanism, and it gives up after 3 attempts.
-2. **The entire IndeedHub stack died** because postgres was abandoned first. Once postgres hit 3 restart attempts, every dependent service (api, redis, minio, relay, main) also failed and hit their own 3-attempt limit. **No dependency awareness.**
-3. **Containers in "Created" state** were never even started — some kind of network assignment failure during creation. The health monitor doesn't handle "Created" state containers.
-4. **The UI showed ALL apps as "crashed"** during the first few minutes, even the ones that eventually recovered. This is because `Exited` state (even exit code 0) maps to the label "crashed" in `appsConfig.ts`.
-
---
-
-## Problem 1: Containers Don't Start or Recover After Reboot
-
-**Confirmed:** All apps crashed after .228 reboot on 2026-03-31.
-
-### Root Causes
-
-#### A. Crash recovery has a 30-second timeout that's too short
-**File:** `core/archipelago/src/crash_recovery.rs:265-271`
-```rust
-let result = tokio::time::timeout(
-    std::time::Duration::from_secs(30),
-    tokio::process::Command::new("podman").args(["start", &record.name]).output(),
-).await;
-```
-On a cold boot with many containers, Podman is under load. 30 seconds is not enough. If it times out, the container is **skipped** — no retry.
-
-#### B. If `podman ps` itself times out, recovery finds zero containers
-**File:** `core/archipelago/src/crash_recovery.rs:318`
-The `podman ps -a` call to discover stopped containers has a 30-second timeout. On a busy system post-reboot, this can timeout. Result: `all_names` is empty, recovery silently exits having started nothing.
-
-#### C. Boot tier ordering uses a catch-all that misses dependencies
-**File:** `core/archipelago/src/crash_recovery.rs:374-385`
-```rust
-fn container_boot_tier(name: &str) -> u8 {
-    match id {
-        "btcpay-db" | "mempool-db" | ... => 0,  // databases
-        "bitcoin-knots" | ... => 1,               // bitcoin
-        "lnd" | "electrumx" | ... => 2,           // depends on bitcoin
-        "mempool-web" | ... => 4,                  // frontend
-        _ => 3,  // EVERYTHING ELSE - may start before its dependencies
-    }
-}
-```
-Any app not explicitly listed gets tier 3, which may be before its dependencies are ready.
-
-#### D. First-boot script swallows ALL errors
-**File:** `scripts/first-boot-containers.sh:8` — no `set -e`
-48+ commands have `|| true` appended. Every `podman run` failure is silently ignored. The script always exits 0 and reports "complete" to systemd even if 50% of containers failed.
-
-#### E. Install RPC returns success before container is actually running
-**File:** `core/archipelago/src/api/rpc/package/install.rs:260-294`
-After container creation, the installer polls for 30 seconds (6 checks x 5 seconds). If the container is still in "created" or "starting" state after 30 seconds:
-```rust
-if i == 5 {
-    debug!("Container {} health check timeout (30s) -- continuing anyway");
-}
-```
-It logs at debug level and **returns success**. The user sees "installed" but the container never actually started.
-
-### Fixes Required
-
-1. **Increase crash recovery timeout to 120s** and add retry with backoff (3 attempts per container)
-2. **Increase `podman ps` timeout to 60s** during boot recovery
-3. **Replace tier catch-all** — every container must be explicitly listed or derived from manifest dependencies
-4. **Remove `|| true`** from critical commands in first-boot-containers.sh. Use proper error handling: log the error, record the failure, continue to next container, but report actual failures at the end
-5. **Install RPC must return failure** if container isn't running after timeout, not silently succeed
-6. **Add `--restart unless-stopped`** to container creation in the Podman client (`core/container/src/podman_client.rs:303-335`) — currently missing, so Podman itself never auto-restarts crashed containers
-
---
-
-## Problem 2: Health Checks Are Fake
-
-### Root Causes
-
-#### A. "Healthy" just means "running" — application health is never checked
-**File:** `core/archipelago/src/container/dev_orchestrator.rs:239-249`
-```rust
-pub async fn get_health_status(&self, app_id: &str) -> Result<String> {
-    match status.state {
-        ContainerState::Running => Ok("healthy".to_string()),  // <-- THIS IS THE ENTIRE CHECK
-        ContainerState::Stopped | ContainerState::Exited => Ok("unhealthy".to_string()),
-        ...
-    }
-}
-```
-A container can be "running" but the application inside is completely broken. This is reported as "healthy".
-
-#### B. Manifest health checks exist but are never executed
-All 30+ app manifests in `image-recipe/build/debian-iso/custom/archipelago/apps/*/manifest.yml` define health checks like:
-```yaml
-health_check:
-  type: http
-  endpoint: http://localhost:4080
-  path: /api/health
-  interval: 30s
-  timeout: 5s
-  retries: 3
-```
-The `HealthMonitor` struct at `core/container/src/health_monitor.rs` can execute these checks. **But it is never instantiated.** No code path creates a `HealthMonitor` from the manifest health check definitions.
-
-#### C. Health status is never pushed to the frontend via WebSocket
-**File:** `core/archipelago/src/data_model.rs:120-127`
-```rust
-pub struct PackageDataEntry {
-    pub health: Option<String>,  // Field exists but is NEVER POPULATED
-}
-```
-The health field in the data model is always `None`. Frontend can only get health via explicit RPC call, which it almost never makes.
-
-#### D. Frontend never polls health status
-**File:** `neode-ui/src/stores/container.ts:169-175`
-`fetchHealthStatus()` is only called after `startContainer()` and `startBundledApp()`. There is **no setInterval, no periodic polling, no watch**. After the initial call, health status is never refreshed.
-
-### Fixes Required
-
-1. **Wire up manifest health checks** — instantiate `HealthMonitor` from manifest definitions, run actual HTTP/exec probes instead of just checking `podman ps`
-2. **Populate the `health` field in `PackageDataEntry`** so WebSocket pushes real health status to frontend
-3. **Add 30-second health polling** in the frontend container store (with backoff to 60s when all healthy)
-4. **Fix `get_health_status()`** in dev_orchestrator to call actual health checks, not just check container state
-
---
-
-## Problem 3: CPU Exhaustion from Duplicate Polling
-
-### Root Causes
-
-#### A. Two independent monitors both call `podman stats` every 60 seconds
- **Health monitor:** `core/archipelago/src/health_monitor.rs:17` — `CHECK_INTERVAL_SECS = 60`
-  - Runs `podman ps -a --format json` (line 305-323)
-  - Runs `podman stats --no-stream` every 5 cycles (line 442-450)
- **Metrics collector:** `core/archipelago/src/monitoring/mod.rs:28` — 60-second interval
-  - Runs `podman stats --no-stream --format json` independently (collector.rs:220-224)
-
-These are **not coordinated**. Both spawn separate subprocesses. On a system with 15+ containers, each `podman stats` call is expensive.
-
-#### B. Total subprocess spawning frequency
-| Component | Interval | What it runs |
-|-----------|----------|-------------|
-| Health monitor | 60s | `podman ps`, `podman stats` (every 5th), restart attempts |
-| Metrics collector | 60s | `podman stats` (duplicate!) |
-| Crash recovery snapshot | 120s | `podman ps` |
-| Disk monitor | 300s | `df`, `sudo dmesg`, potentially `podman image prune` |
-| Telemetry | 900s | `podman stats` (another duplicate) |
-| Systemd watchdog | 120s | sd_notify ping |
-| Frontend fleet polling | 60s | RPC calls that trigger more podman commands |
-
-That's roughly **one `podman` subprocess every 10-15 seconds** on average, plus all the triggered operations.
-
-#### C. No restart policy means polling-driven restarts
-**File:** `core/container/src/podman_client.rs:303-335`
-Container creation spec does NOT include `RestartPolicy`. Podman itself never restarts crashed containers. Instead, the health monitor's 60-second poll detects the crash and attempts a restart. This is far more CPU-intensive than Podman's built-in restart mechanism.
-
-#### D. Health monitor restart attempts with exponential backoff still spawn processes
-When a container fails, the health monitor tries restarts at 10s, 30s, 90s backoff. Each attempt spawns `podman start`, `podman inspect`, etc. If multiple containers are unhealthy, this multiplies.
-
-### Fixes Required
-
-1. **Deduplicate `podman stats`** — create a shared cache layer. One component fetches, others read from cache (TTL: 30s)
-2. **Add `RestartPolicy: unless-stopped` with MaxRetryCount: 5** to all container creation — let Podman handle restarts natively instead of polling
-3. **Increase health monitor interval to 120s** (60s is too aggressive when health checks are just `podman ps`)
-4. **Remove duplicate `podman stats`** call from metrics collector — share data with health monitor
-5. **Make frontend fleet polling viewport-aware** — only poll when user is actually viewing the fleet page
-6. **Batch all container queries** — use a single `podman ps -a --format json` per check cycle, shared across all consumers
-
---
-
-## Problem 4: Uninstall Doesn't Work
-
-### Root Causes
-
-#### A. No volume removal
-**File:** `core/archipelago/src/api/rpc/package/runtime.rs:172-289`
-The uninstall function stops containers, removes containers, releases ports, and attempts data directory cleanup. It **never removes Podman volumes**. Orphaned volumes accumulate forever.
-
-#### B. No network cleanup
-**File:** `core/archipelago/src/api/rpc/package/runtime.rs:172-289`
-Multi-container stacks create networks (`archy-net`, `immich-net`, `penpot-net`) during install (`stacks.rs:89, 211`). These are **never cleaned up** during uninstall. Leftover networks can prevent reinstallation.
-
-#### C. Force-kills stateful containers without graceful shutdown
-**File:** `core/archipelago/src/api/rpc/package/runtime.rs:226`
-```rust
-let rm_out = tokio::process::Command::new("podman")
-    .args(["rm", "-f", name])  // -f = force kill
-    .output().await;
-```
-The code defines proper shutdown timeouts (Bitcoin: 600s, LND: 330s, databases: 120s) but only uses them for `stop`. The `rm -f` that follows **ignores these timeouts** and force-kills immediately. This risks corrupting Bitcoin's UTXO set, LND channel state, or database WAL.
-
-#### D. Returns 200 OK even on partial failure
-**File:** `core/archipelago/src/api/rpc/package/runtime.rs:268-289`
-```rust
-Ok(serde_json::json!({
-    "status": if errors.is_empty() { "uninstalled" } else { "partial" },
-    ...
-}))
-```
-Returns HTTP 200 with `"partial"` status. Frontend at `neode-ui/src/views/apps/useAppsActions.ts:74` doesn't check for "partial" — it deletes the app from the UI regardless.
-
-#### E. Data directory cleanup requires sudo and fails silently
-**File:** `core/archipelago/src/api/rpc/package/runtime.rs:256-265`
-```rust
-let rm_out = tokio::process::Command::new("sudo")
-    .args(["rm", "-rf", dir]).output().await;
-if let Ok(o) = rm_out {
-    if !o.status.success() {
-        tracing::warn!(...);  // Warning only, continues
-    }
-}
-```
-If sudo isn't configured or fails, data remains on disk but UI shows "uninstalled".
-
-#### F. Container name detection has gaps
-**File:** `core/archipelago/src/api/rpc/package/config.rs:287-340`
-Container names are hardcoded patterns. If a container was created with a different naming convention (e.g., by first-boot-containers.sh vs RPC installer), it won't be found and won't be removed.
-
-### Fixes Required
-
-1. **Add `podman volume rm`** for all volumes associated with the app after container removal
-2. **Add network cleanup** — remove app-specific networks after all containers on that network are gone
-3. **Use `podman stop -t {timeout}` then `podman rm`** (without -f) — respect graceful shutdown timeouts, especially for Bitcoin/LND/databases
-4. **Return an error (not 200)** when uninstall has failures. Frontend must check and display errors
-5. **Surface "partial" failures to the user** with specific error messages
-6. **Unify container naming** — derive names from a single source (manifest), not hardcoded patterns in multiple files
-
---
-
-## Problem 5: Two Divergent Install Paths
-
-The first-boot bash script and the Rust RPC installer create containers with **different configurations**. This is a major source of bugs.
-
-### Specific Divergences
-
-#### A. Database passwords
- **First-boot** (`scripts/first-boot-containers.sh:118-127`): Generates random passwords with `openssl rand -base64 24`, stores in `/var/lib/archipelago/secrets/`
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:456,484,514-515,610`): Uses hardcoded `"btcpaypass"`, `"mempoolpass"`, `"rootpass"`, `"immichpass"`
-
-**Result:** Apps installed via RPC after first-boot can't connect to databases because passwords don't match.
-
-#### B. Bitcoin configuration
- **First-boot** (`scripts/first-boot-containers.sh:295-313`): Dynamically sets `-prune=550` on small disks, `-txindex=1` on large disks
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:415-420`): No custom args at all
-
-**Result:** Bitcoin installed via RPC has no pruning or txindex regardless of disk size.
-
-#### C. ZMQ configuration for LND
- **First-boot** (`scripts/first-boot-containers.sh:100-114`): Bitcoin.conf generated without ZMQ publisher settings
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:438-439`): LND configured to connect to `tcp://bitcoin-knots:28332` and `tcp://bitcoin-knots:28333`
-
-**Result:** LND can't receive block notifications from Bitcoin because ZMQ isn't configured on either path.
-
-#### D. Port conflicts
- **First-boot** (`scripts/first-boot-containers.sh:813,835`): Both strfry and indeedhub bind to host port 7777
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:734`): IndeedHub uses `8190:3000`
-
-**Result:** On first-boot, whichever of strfry/indeedhub starts second fails. Via RPC, different port entirely.
-
-#### E. Memory limits
- **First-boot** (`scripts/first-boot-containers.sh:253-283`): Ollama gets 1g on low-mem systems
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:245-280`): Ollama gets 4g always
-
-**Result:** Same app gets different resource limits depending on how it was installed.
-
-#### F. Version mismatches in marketplace UI
- `scripts/image-versions.sh:17`: LND image is `v0.18.4-beta`
- `neode-ui/src/views/marketplace/marketplaceData.ts:155`: Shows `0.17.4`
- `scripts/image-versions.sh:21-22`: Mempool images are `v3.0.0`
- `neode-ui/src/views/marketplace/marketplaceData.ts:177`: Shows `2.5.0`
-
-### Fixes Required
-
-1. **Single source of truth for container config** — Rust config must read passwords from `/var/lib/archipelago/secrets/`, not hardcode them
-2. **Add ZMQ config** to Bitcoin startup in both paths: `zmqpubrawblock=tcp://0.0.0.0:28332` and `zmqpubrawtx=tcp://0.0.0.0:28333`
-3. **Fix port 7777 conflict** — assign unique ports to strfry and indeedhub
-4. **Add disk-aware Bitcoin config** to Rust installer (prune/txindex based on disk size)
-5. **Sync memory limits** between first-boot and Rust config
-6. **Update marketplace version strings** to match actual image versions in `image-versions.sh`
-7. **Long-term: eliminate first-boot-containers.sh** — have the backend handle all container creation using the same Rust code path
-
---
-
-## Problem 6: Post-Install Hooks Run Async and Fail Silently
-
-**File:** `core/archipelago/src/api/rpc/package/install.rs:541-625`
-
-Post-install hooks (setting FileBrowser password, configuring NextCloud, etc.) are spawned as background tasks:
-```rust
-tokio::spawn(async move {
-    let _ = tokio::fs::create_dir_all(secret_dir).await;
-    let _ = tokio::fs::write(...).await;
-});
-```
-
-The install RPC returns success **before hooks complete**. If a hook fails (network timeout, service not ready), the error is logged but the user is told installation succeeded. Credentials aren't set, configs aren't applied.
-
-### Fix Required
-
-Await post-install hooks before returning success, or return a "configuring" status and let the frontend poll for completion.
-
---
-
-## Problem 7: Podman Client Swallows Errors
-
-**File:** `core/container/src/podman_client.rs`
-
-#### A. JSON serialization failures return empty strings (line 182-183)
-```rust
-let body_str = body.map(|b| serde_json::to_string(&b).unwrap_or_default()).unwrap_or_default();
-```
-
-#### B. Container ID parsing failures return empty string (line 344-348)
-```rust
-let id = result["Id"].as_str().unwrap_or("").to_string();
-Ok(id)  // Empty string = success?
-```
-
-#### C. Socket timeout is only 5 seconds (line 154-160)
-On a busy system or during boot, Podman socket may take >5s to respond. Every API call fails. No retry logic.
-
-### Fixes Required
-
-1. Replace `.unwrap_or_default()` with proper error propagation using `?`
-2. Return `Err` when container ID is empty
-3. Increase socket timeout to 15-30s
-4. Add retry with backoff (3 attempts) on socket connection
-
---
-
-## Problem 8: UI Misrepresents Container State
-
-### Root Causes
-
-#### A. "Exited" always displays as "Crashed" — even for clean shutdowns
-**File:** `neode-ui/src/views/apps/appsConfig.ts:119-146`
-```typescript
-getStatusLabel(state, health):
-  - "exited" → "crashed"     // <-- THIS IS THE PROBLEM
-```
-Every container that exited — whether from a clean reboot (exit 0), OOM kill (exit 137), or app error (exit 1) — shows the same "crashed" label. After a reboot, the UI is a wall of "crashed" labels even though containers are in the process of starting up.
-
-#### B. No "recovering" or "boot in progress" state exists
-**File:** `core/archipelago/src/data_model.rs:103-119`
-PackageState enum has `Starting`, but it's only set during **explicit user start actions**, not during automatic crash recovery. During boot recovery, containers transition from `Exited → Running` without ever passing through `Starting`, so the UI never shows a spinner or "starting up" message.
-
-#### C. Backend skips sub-containers from package listing, so their state is invisible
-**File:** `core/archipelago/src/container/docker_packages.rs:39-117`
-The excluded_services list filters out backend services like `mempool-db`, `btcpay-db`, `nbxplorer`, `penpot-postgres`, etc. UI containers ending in `-ui` are also skipped. These containers are invisible to the user even when they're the actual cause of a stack failure (e.g., `indeedhub-postgres` being dead kills the entire IndeedHub stack, but only `indeedhub-api` errors are visible).
-
-#### D. No distinction between "needs manual intervention" and "will recover soon"
-The UI shows the same visual treatment for:
- Portainer (DB migration error — will NEVER recover without manual intervention)
- mempool-api (DB not ready yet — will recover in 30 seconds)
- IndeedHub (dependencies abandoned — won't recover until deps are manually restarted)
-
-### Fixes Required
-
-1. **Differentiate exit codes**: Exit 0 = "stopped" (gray), Exit non-zero = "crashed" (red), Exit 137 = "killed (OOM)" (red with warning)
-2. **Add a "recovering" state**: During boot/crash recovery window (first 5 minutes after backend start), show "Starting up..." instead of "crashed" for exited containers
-3. **Show sub-container health**: When a parent app is unhealthy, show which sub-service caused the failure (e.g., "IndeedHub: postgres is down")
-4. **Distinguish recoverable from permanent failures**: After health monitor gives up (3 attempts), change label to "Needs attention" instead of keeping "crashed"
-5. **Add recovery progress indicator**: During boot, show "Recovering containers: 15/22 started" on the dashboard
-
---
-
-## Problem 9: Dependency-Blind Restarts
-
-### Root Cause (Confirmed by .228 reboot)
-
-The health monitor restarts containers individually without considering dependencies. This was proven by the IndeedHub stack failure:
-
-1. `indeedhub-postgres` exits cleanly (code 0) on reboot
-2. Health monitor restarts postgres — it starts, but exits again (likely needs volume mount or network ready)
-3. After 3 attempts, postgres is **abandoned**
-4. Meanwhile, `indeedhub-api` tries to connect to postgres → `ENOTFOUND indeedhub-postgres` → exits
-5. Health monitor restarts api → same DNS failure → exits
-6. After 3 attempts, api is **abandoned**
-7. Same cascade for redis, minio, relay, main container — all abandoned within minutes
-
-**File:** `core/archipelago/src/health_monitor.rs:500-530`
-The restart loop treats each container independently. There's no logic to:
- Check if a container's dependencies are running before restarting it
- Restart dependencies first when a dependent container fails
- Reset attempt counters when a dependency comes back online
-
-**3 attempts is too few**, especially when dependencies need time:
- Attempt 1: 10s backoff → dependency still starting
- Attempt 2: 30s backoff → dependency crashed and is being restarted
- Attempt 3: 90s backoff → dependency hit its own 3-attempt limit and was abandoned
- Game over. Entire stack is dead.
-
-### Fixes Required
-
-1. **Dependency-aware restart ordering**: Before restarting a container, check if its dependencies are running. If not, restart dependencies first.
-2. **Increase max restart attempts to 5-10** for containers with dependencies
-3. **Reset attempt counters** when a dependency comes back online (the dependent container failed because of the dependency, not itself)
-4. **Add a "stack restart" concept**: When restarting any container in a multi-container stack (indeedhub, mempool, btcpay, immich, penpot), restart the entire stack in dependency order
-5. **Handle "Created" state containers**: `archy-mempool-web` and `fedimint` are in "Created" state (never started). The health monitor should detect these and attempt to start them.
-
---
-
-## Priority Order for Fixes
-
-### P0 — System is broken without these (reboot = broken system)
-1. **Dependency-aware restarts** in health_monitor.rs — restart dependencies before dependents, reset attempt counters when deps recover
-2. **Increase max restart attempts to 10** (currently 3) — dependency chains need more time on boot
-3. **Handle "Created" state** — containers stuck in Created are never started by health monitor
-4. **Fix UI state labels** — "exited" code 0 should say "stopped", not "crashed". Add "recovering" state during boot window.
-5. Fix Rust config to read secrets from `/var/lib/archipelago/secrets/` instead of hardcoded passwords
-6. Fix port 7777 conflict (strfry vs indeedhub)
-7. Add ZMQ config to Bitcoin for LND block notifications
-
-### P1 — Core functionality broken
-8. Wire up manifest health checks (replace fake "running = healthy" with actual HTTP/exec probes)
-9. Fix uninstall to clean up volumes, networks, and respect graceful shutdown timeouts
-10. Return actual errors from install/uninstall instead of silent success on partial failure
-11. Remove `|| true` from critical first-boot commands
-12. Show sub-container health in UI (which dependency is actually broken)
-
-### P2 — Performance and CPU
-13. Deduplicate `podman stats` calls (health monitor + metrics collector both call every 60s independently)
-14. Increase health monitor interval to 120s
-15. Add frontend health polling via WebSocket push (populate `health` field in data model)
-16. Make fleet polling viewport-aware (don't poll when user isn't viewing)
-
-### P3 — Consistency and correctness
-17. Sync memory limits between first-boot and Rust config
-18. Update marketplace version strings (LND shows 0.17.4, actual is 0.18.4; Mempool shows 2.5.0, actual is 3.0.0)
-19. Unify container naming conventions between first-boot script and Rust config
-20. Add disk-aware Bitcoin config (prune/txindex) to Rust installer
-21. Distinguish "needs manual intervention" from "will recover soon" in UI
-
---
-
-## Key Files to Modify
-
-| File | What to fix |
-|------|-------------|
-| `core/archipelago/src/health_monitor.rs` | Dependency-aware restarts, increase MAX_RESTART_ATTEMPTS to 10, handle Created state, deduplicate with metrics collector |
-| `core/container/src/podman_client.rs` | Add RestartPolicy to container creation spec, fix `.unwrap_or_default()` error swallowing, increase socket timeout to 15-30s |
-| `core/archipelago/src/crash_recovery.rs` | Increase timeouts to 120s, add retry with backoff, fix tier ordering catch-all |
-| `core/archipelago/src/api/rpc/package/install.rs` | Return failure on timeout (not silent success), await post-install hooks |
-| `core/archipelago/src/api/rpc/package/runtime.rs` | Add volume/network cleanup on uninstall, use `podman stop -t` then `podman rm` (not `-f`), return errors on partial failure |
-| `core/archipelago/src/api/rpc/package/config.rs` | Read secrets from disk, fix port 7777, add ZMQ config, sync memory limits |
-| `core/archipelago/src/container/dev_orchestrator.rs` | Wire up manifest-defined health checks instead of just checking podman state |
-| `core/archipelago/src/container/docker_packages.rs` | Stop filtering sub-containers from state — or expose their health as part of parent app status |
-| `core/archipelago/src/data_model.rs` | Populate `health` field for WebSocket push, add exit code to state |
-| `core/archipelago/src/monitoring/mod.rs` | Share podman stats data with health monitor instead of duplicate subprocess calls |
-| `neode-ui/src/views/apps/appsConfig.ts` | Fix state labels: exit 0 = "stopped", exit non-zero = "crashed", add "recovering" during boot window |
-| `neode-ui/src/stores/container.ts` | Add periodic health polling (30s) |
-| `neode-ui/src/views/apps/useAppsActions.ts` | Check for "partial" uninstall status, show errors to user |
-| `neode-ui/src/views/marketplace/marketplaceData.ts` | Fix version strings to match image-versions.sh |
-| `scripts/first-boot-containers.sh` | Remove `\|\| true` from critical commands, fix port 7777 conflict, add proper error reporting |
--- a/docs/CONTAINER_LIFECYCLE_HANDOFF.md
+++ b/docs/CONTAINER_LIFECYCLE_HANDOFF.md
--- a/docs/CURRENT_AGENT_HANDOFF.md
+++ b/docs/CURRENT_AGENT_HANDOFF.md
@ -1,216 +0,0 @@
-# Current Agent Handoff - Bitcoin UI Recovery And `1.8-alpha` Resume
-
-Last updated: 2026-06-10 05:33 EDT
-
-## Read This First
-
-This is a separate handoff from `docs/NEXT_TERMINAL_HANDOFF.md`. That file tracks
-an older/broader plan. For the next agent resuming this machine-switch pause,
-read this file first, then read:
-
- `docs/RESUME.md`
- `docs/1.8-alpha-improvements-tracker.md`
- `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
- `docs/MIGRATION_STATUS_REPORT.md`
-
-Do not assume `docs/NEXT_TERMINAL_HANDOFF.md` is the current short-term plan.
-
-## Current Goal
-
-Cut Archipelago `1.8-alpha`, including a ready-to-test ISO image.
-
-The release goal is not just "apps launch once"; the app/container system needs
-to be developer-ready and production-release ready:
-
- manifests and docs must describe the real runtime contract;
- apps must install, start, stop, restart, uninstall, reinstall, survive reboot,
-  report truthful status, and show useful progress;
- My Apps must preserve last-known truth during Podman/scanner backoff instead
-  of showing false empty/no-app states;
- Bitcoin-dependent apps must explain sync/wallet readiness instead of looking
-  broken;
- final validation needs focused lifecycle, broad non-destructive lifecycle,
-  then repeated reboot checks before ISO cut/smoke test.
-
-## Current Estimate
-
-As of this pause:
-
- Credible release candidate: roughly `87-91%`.
- Production-quality release developers will love: roughly `73-79%`.
- Calendar estimate if the remaining systemic lifecycle issues are bounded:
-  `1-2 focused engineering days` for a release candidate, then additional
-  reboot/ISO smoke time.
- The biggest remaining risk is not catalog wiring; it is rootless Podman
-  control-plane responsiveness, stale scanner state, lifecycle progress UX, and
-  reboot validation.
-
-## Validation Host
-
- Host: `192.168.1.198`
- SSH user: `archipelago`
- Password used in this session: `password123`
- Active Bitcoin app on this host: `bitcoin-knots`, not `bitcoin-core`
- Keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive
-  for deterministic validation unless intentionally testing them.
- Preserve app data.
- Avoid broad Podman store/image cleanup commands on `.198`.
-
-## Bitcoin UI Incident Summary
-
-User reported the Bitcoin custom UI showing:
-
-`Bitcoin node is starting or busy syncing; retrying automatically. Detail:
-getblockchaininfo: Bitcoin RPC request failed ... operation timed out`
-
-Then after listener repair, the message changed through:
-
- `Connection refused`
- `Verifying blocks...`
- then the user reported it looked fine again.
-
-What happened:
-
- The node is a `bitcoin-knots` node.
- During live debugging, the wrong alias, `bitcoin-core`, was started/stopped.
- `bitcoin-core` and `bitcoin-knots` compete for the same Bitcoin RPC/P2P ports.
- That action left the real `bitcoin-knots` service active but without the host
-  `8332` rootlessport listener for a while.
- Stopping the stray `bitcoin-core.service` and restarting only
-  `bitcoin-knots.service` recreated listeners on `8332` and `8333`.
- After restart, bitcoind entered the normal `-28 Verifying blocks...` phase.
- The user later reported the Bitcoin UI looked fine again.
-
-Known live state observed during recovery:
-
- `bitcoin-knots.service`: active
- `bitcoin-core.service`: inactive
- `archy-bitcoin-ui.service`: active
- listeners present after repair:
-  - `8332` via `rootlessport`
-  - `8333` via `rootlessport`
-  - `8334` via nginx/Bitcoin UI
- `bitcoin-knots` logs showed active IBD around height `4137xx` and progress
-  about `0.09438`.
-
-Do not restart Bitcoin again unless there is a fresh confirmed service/listener
-failure. If checking status, prefer read-only probes and avoid starting the
-wrong variant.
-
-## Source Fixes Made Locally
-
-These local edits were made after live Bitcoin recovered. They are not deployed
-yet and were not fully validated before the user paused.
-
-### `core/archipelago/src/bitcoin_status.rs`
-
-Changed Bitcoin status cache behavior and copy:
-
- refresh interval changed from `5s` to `10s`;
- transient error backoff added at `15s`;
- RPC client timeout increased from `8s` to `20s`;
- error context now uses full anyhow chain with `{e:#}`;
- transient classifications now include common overloaded/backend states;
- user-facing copy now distinguishes:
-  - `verifying blocks after restart`;
-  - `waiting for the Bitcoin RPC listener`;
-  - `busy and not answering RPC before the timeout`;
-  - generic `starting or busy syncing`;
- added unit tests for the three user-visible states above.
-
-Intent: stop collapsing distinct backend states into the same stale
-"starting or busy syncing" timeout message.
-
-### `core/archipelago/src/api/rpc/package/update.rs`
-
-Narrow Bitcoin alias fix added:
-
- `orchestrator_update_app_id("bitcoin-knots")` now remains
-  `"bitcoin-knots"` instead of mapping to `"bitcoin-core"`;
- candidate app IDs for a Bitcoin container now prefer `bitcoin-knots` before
-  `bitcoin-core`;
- tests updated to lock this behavior.
-
-Intent: `bitcoin-core` and `bitcoin-knots` can be dependency/status aliases,
-but must not be interchangeable lifecycle/update targets on a node that has a
-specific installed variant.
-
-Important: this file also already contained other uncommitted update/pull
-timeout changes from prior work. Do not assume every diff in this file came
-from this interruption.
-
-## Validation Status At Pause
-
-Completed:
-
- `cargo fmt --manifest-path core/Cargo.toml --all` passed after the local
-  Bitcoin edits.
-
-Attempted but not completed:
-
- Targeted Cargo tests were first launched in three separate `/tmp` target dirs
-  and failed due `/tmp` filling with `No space left on device`.
- Those temporary dirs were removed:
-  - `/tmp/archy-cargo-bitcoin-status`
-  - `/tmp/archy-cargo-update-alias`
-  - `/tmp/archy-cargo-container-candidates`
- A second run using `CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix` was still
-  compiling when the user paused. It was terminated for handoff.
- No successful Rust test result exists yet for the new Bitcoin status/alias
-  tests.
-
-Recommended validation after resume:
-
-```bash
-git diff --check -- core/archipelago/src/bitcoin_status.rs core/archipelago/src/api/rpc/package/update.rs docs/CURRENT_AGENT_HANDOFF.md
-CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago bitcoin_status::tests
-CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago update_aliases_map_to_manifest_app_ids
-CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago container_name_candidates_cover_common_aliases
-```
-
-If Cargo target locking appears stale, check for real `cargo`/`rustc` workers
-before deleting anything. Prefer workspace-local target dirs under `.codex-tmp`
-over new cold `/tmp` targets.
-
-## Immediate Next Steps
-
-1. Confirm no lingering Cargo process:
-
-   ```bash
-   pgrep -af "cargo|rustc|cargo-bitcoin-fix"
-   ```
-
-2. Validate the local Bitcoin source fixes listed above.
-
-3. If validation passes, build/deploy the backend to `.198` only after
-   confirming the user still wants deployment.
-
-4. Recheck live Bitcoin non-destructively:
-
-   - `bitcoin-knots.service` active;
-   - `bitcoin-core.service` inactive;
-   - listeners on `8332`, `8333`, `8334`;
-   - Bitcoin UI loads on `8334`;
-   - `/bitcoin-status` returns useful copy if backend is busy.
-
-5. Resume release backlog:
-
-   - rootless Podman lifecycle/control-plane responsiveness;
-   - My Apps last-known-state truthfulness during scanner backoff;
-   - progress UX for install/uninstall/start/stop/restart;
-   - remaining tracker rows in `docs/1.8-alpha-improvements-tracker.md`;
-   - focused lifecycle matrix on `.198`;
-   - broad non-destructive lifecycle;
-   - 3 clean reboot validations minimum, 5 preferred;
-   - ISO cut and ISO smoke test.
-
-## Cautions For Next Agent
-
- Do not start `bitcoin-core` on `.198` unless intentionally migrating variants.
- Treat `bitcoin-knots` as the installed Bitcoin variant.
- Do not run broad Podman prune/store cleanup.
- Do not revert unrelated dirty worktree changes.
- `docs/NEXT_TERMINAL_HANDOFF.md` exists but is not the short-term handoff for
-  this pause.
- Many repo files are dirty from broader release hardening. Read diffs before
-  attributing changes.
--- a/docs/HANDOFF-2026-06-20-mesh-netbird.md
+++ b/docs/HANDOFF-2026-06-20-mesh-netbird.md
@ -1,144 +0,0 @@
-# Handoff — Mesh device rename, mesh routing, duplicate contacts, netbird logout (2026-06-20)
-
-Session is a **test-build iteration toward the 1.8.0 bug-bash release** — sideload patched binaries
-to test nodes, NO version bump / NO OTA release (manifest stays `1.7.99-alpha`). Because the version
-string never changes, **verify a deploy by sha256-matching the deployed binary**, not by `current_version`.
-
-## Test node roster (creds in the operator's local notes / agent memory — NOT in this repo)
- `.116` 192.168.1.116 — this build host (archi-thinkpad), dev/validation.
- `.198` 192.168.1.198, `.228` 192.168.1.228 — LAN resilience nodes.
- `.5`  Tailscale 100.72.136.5  (archy-x250-beta) — **Meshtastic radio**.
- `.120` Tailscale 100.66.157.120 (archy-x250-exp) — **Meshtastic radio**.
- `.89` Tailscale 100.89.209.89 (archy-x250-pa) — **dual radio**: ttyACM0 Meshtastic (probe FAILS),
-  ttyUSB0 MeshCore (active). Configured device_path = ttyACM0. Runs netbird (v2.38.0).
-
-Deploy driver used this session: `/tmp/archy-deploy/deploy-node.sh <user@host> <pw> <label>`
-(scp binary + stream `web/dist/neode-ui` + sudo swap `/usr/local/bin/archipelago`, preserve aiui +
-claude-login.html, chown 1000:1000, restart, verify sha256+health). Recreate from this doc if /tmp is gone.
-
-## Deploy state (binary sha) at handoff
- `b5183dfc…` (HEAD d00d1b20, includes Meshtastic rename) → on **.5 and .120** (verified).
- `f702b4f1…` (the 3 wallet/mesh/ui fixes, pre-rename) → on **.116, .198, .228**.
- `7c17a96…` (OLD, pre-f702b4f1) → **.89 is STALE** — update before re-testing .120→.89.
-
-## DONE
-1. **Meshtastic device rename → server name** — committed `d00d1b20` (pushed to gitea-vps2/main).
-   `meshtastic.rs set_advert_name` was a no-op (in-memory only). Now sends
-   `AdminMessage{set_owner=User{long_name,short_name}}` to the local node on ADMIN_APP port (6),
-   set_owner field = 32. long_name = server name (≤39), short_name = first 4 alphanumerics upper-cased.
-   **Hardware-verified**: .120 radio now reads back `Archy-X250-EXP`, .5 reads back `Archy-X250-Beta`.
-   MeshCore already renamed (CMD_SET_ADVERT_NAME, serial.rs:147) — unchanged, now at parity.
-2. **Routing priority confirmed = Mesh → FIPS → Tor**. `send_typed_wire` (mesh/mod.rs:1007): reachable
-   radio peer → LoRa; federation-synthetic OR (`!reachable && arch_pubkey_hex.is_some()`) → federation.
-   `send_typed_wire_via_federation` (mod.rs:1124): FIPS first w/ `.fips_timeout(8s)`, Tor fallback.
-3. **`.120`→`.89` "non-delivery" diagnosed — it is NOT a delivery failure.** `.120` sends to .89's
-   federation contact_id `3027572739`, logs `Federation envelope delivered transport=tor` (gated on
-   HTTP 2xx, mod.rs:1185). The receiver returns 2xx ONLY after ed25519-verify + successful
-   `inject_typed_from_federation` (node_message.rs:217-263). Identity matches (.89 pubkey 031875b4…).
-   `.89`→`.120` works. So .120's messages ARE injected into .89's state under contact_id
-   `2679725907` = federation_peer_contact_id(.120 pubkey 535fb91f…), name "Archy-X250-EXP".
-   It's a **duplicate-contact SURFACING** problem (user confirmed doubles).
-
-## SESSION 3 PROGRESS (2026-06-20 — deployed fleet-wide, binary `e1f2e88`)
- **#5 Arch Mobile messages CONFIRMED FIXED** by the #12 dedup — user verified MeshCore surfaces them.
- **#3 ecash pay-for-file — confirm UI + auto-refund** (`12f54e39`): PeerFiles shows a confirmation
-  step (amount + which wallet Cashu/Fedimint + balances + switch + styled Confirm); `content.download-peer-paid`
-  takes `method`, logs the backend+outcome, gives backend-specific rejection errors, and RECLAIMS the
-  spent token on any failure (fedimint reissue / cashu receive) so funds aren't lost. Root cause of the
-  user's failed pay: `.198` had no Cashu → spent Fedimint notes → seller `.89` not in the SAME federation
-  → rejected → notes stuck (now auto-refunded; old stuck notes auto-return in ~1h via the 3600s spend timeout).
-  To COMPLETE a fedimint pay, payer+seller must share a federation (or share a Cashu mint w/ balance).
- **#1 companion crash** — added an on-screen red error overlay (`242baf5d`) since chrome://inspect isn't
-  reachable on the WebView; user reproduces → screenshots the box → that's the real error to fix on.
- **#7 NEW: can't add Fedimint federations on `.116`** — fmcd sidecar crash-loops `Operation not permitted
-  (os error 1)`, so `:8178` answers HTTP 000 and `wallet.fedimint-join` fails. fmcd WORKS on `.198`/`.89`.
-  EXHAUSTIVE black-box isolation on `.116` (seccomp default vs unconfined; cap-drop ALL vs caps restored;
-  fresh data vs a `cp -a` COPY of the real /data; default net vs archy-net; /data 755 vs 777) — **fmcd ran
-  in EVERY standalone `podman run` config**, including full real security (cap-drop ALL + readonly +
-  no-new-priv + archy-net + copy of real data). Only the ORCHESTRATOR-created container EPERMs. So:
-  - **seccomp is NOT the cause** (default-seccomp standalone runs) — the seccomp "fix" was reverted (`63b98599`).
-  - NOT caps, NOT /data perms/ownership, NOT the existing multimint.db (the copy runs), NOT archy-net.
-  - The differentiator is something specific to the orchestrator's libpod-API create vs `podman run` that I
-    did NOT pin (a related symptom: the orchestrator's volume self-heal logs `chown /data: Operation not
-    permitted` because the container has cap-drop ALL → no CAP_CHOWN). NEXT: create fmcd via the libpod API
-    socket directly (replicating prod_orchestrator's exact body) to repro outside the orchestrator, then diff.
-  WORKAROUND for now: **test Fedimint on `.198`/`.89` (working fmcd), not `.116`.** Not the ecash code.
- Deploy: all 6 nodes verified on `e1f2e88`; pushed gitea-vps2 (gitea-local token still 401s).
-
-## SESSION 2 PROGRESS (2026-06-20, code-complete — NOT yet deployed; user held deploy)
-All committed to local `main`; NOT pushed to gitea-vps2/origin yet, NOT sideloaded.
- **#12 dup contacts DONE** (`f92e442b`, +3 unit tests pass). Backend `group_peer_twins()`
-  helper (mesh/mod.rs) dedups by `arch_pubkey_hex`, radio twin = canonical send id, unions
-  messages; wired into conversations.list/messages + mesh.contacts-list. **KEY FINDING:**
-  conversations.list/messages have NO frontend consumer — the live chat list renders the
-  *frontend* merge `mergedPeers` (Mesh.vue), which matched twins by the `Archy-z6Mk…` advert
-  prefix that the device RENAME broke. Real fix = merge by `arch_pubkey_hex` (now exposed on the
-  MeshPeer TS type). Should also clear `.120→.89` and likely **#5** (Arch Mobile on .116, same bug).
- **Companion crash diagnostic SHIPPED** (`b3633ec5`): main.ts global handler now shows the REAL
-  error + keeps a 25-entry `window.__archyErrors` ring buffer + catches async/unhandledrejection.
-  Still need to deploy + repro on the optiplex node (read `window.__archyErrors` via chrome://inspect)
-  to get the actual throw. User says LAN/mobile-browser fine → Tailscale-WebView-specific.
- **#3 dual-ecash pay-for-file DONE** (`8f06d88f`, compiles): payer tries Cashu→Fedimint, seller
-  accepts both (verify_and_receive_payment: non-"cashu" = reissue_into_any), new
-  fedimint_client::spend_from_any(), wallet.ecash-balance reports total_sats. LIVE federation
-  validation pending (two nodes sharing a federation).
- **#2 mobile scroll cutoff DONE** (`a8c668ee`): DashboardMobileNav wrote `--mobile-tab-bar-height:0px`
-  when the bar was hidden/unlaid-out, defeating the `,88px` fallback → bar covered last row. Now never
-  writes 0 (removes var → fallback), re-measures on rAF + post-WebView-injection. Backup hypothesis if
-  it persists: `.dashboard-view` is `min-h-screen`(100vh) → mobile-browser toolbar overlap, switch to dvh.
-
-DEPLOYED 2026-06-20 to ALL 6 nodes — binary sha `4a8f2198…` (release build of commit a6957a48 +
-this handoff), FE rebuilt, all sha-verified + service active: .116(local) .198 .228 .89 .5 .120.
-.5/.120 needed a 30-min timeout (slow DERP). #10 netbird OIDC gate also shipped in this build.
-REMAINING VERIFICATION (on real hardware, user-side):
- #12/#5: open mesh chat on .116 (and .89/.120) — confirm a federated node shows ONCE with its
-  messages (no radio/federation double), and that "Arch Mobile" messages now surface.
- #1 companion crash: open the companion app to the optiplex node over Tailscale, reproduce the
-  crash, then read the REAL error from `window.__archyErrors` (chrome://inspect the WebView) or the
-  now-detailed toast. That error is what's needed to write the actual fix. Confirm which node = optiplex.
- #3: pay for a peer file when the buyer's balance is only in Fedimint (needs two nodes in a federation).
- #2: check Cloud/files bottom rows clear the tab bar on mobile browser.
-Commits are LOCAL on main (f92e442b/b3633ec5/8f06d88f/a8c668ee/a6957a48 + docs) — NOT pushed to
-gitea-vps2/origin (no version bump; bug-bash sideload only).
-
-## TODO (original resume — #12 now DONE above)
-### #12 Fix duplicate mesh contacts  ← DONE this session (see SESSION 2 PROGRESS)
-Root cause: `handle_mesh_contacts_list` (api/rpc/mesh/typed_messages.rs:1126) and
-`handle_conversations_list` (api/rpc/mesh/status.rs:89) emit **one row per `state.peers` entry** with
-**no cross-transport dedup**. A node can have TWO peers: a radio peer (low contact_id, firmware key)
-and a federation peer (high contact_id ≥ 0x8000_0000, archipelago key). `bind_federation_twins`
-(mesh/mod.rs:85) correlates them by exact advert_name and copies `arch_pubkey_hex` onto the radio
-twin, but LEAVES BOTH ROWS. Messages are keyed by `peer_contact_id` (split across the two ids), so
-the federation-injected messages sit on the federation row while the user may open the radio row → empty.
-
-**Design constraint (important):** the two twins have DIFFERENT routing. Collapsing must NOT break
-"mesh-first": the canonical SEND contact_id should be the RADIO twin when one exists (so send_typed_wire
-routes LoRa-if-reachable, else federation via the bound arch key), else the federation id. The merged
-THREAD must union messages from ALL twin contact_ids (group by `arch_pubkey_hex`). Apply the dedup in:
- `handle_conversations_list` (status.rs:89) — one conversation per identity group; last msg = newest across twins.
- `handle_mesh_contacts_list` (typed_messages.rs:1126).
- `handle_conversations_messages` (status.rs ~146) — when asked for a contact_id, resolve its group's
-  twin ids and filter messages by ANY of them.
-Add a shared helper (e.g. group peers by `arch_pubkey_hex` when Some, else singleton by contact_id).
-Do NOT merge/re-key at `bind_federation_twins` time — that would force federation routing and break mesh-first.
-MeshPeer struct: mesh/types.rs:28 (fields: contact_id, advert_name, did, pubkey_hex, arch_pubkey_hex, reachable…).
-
-**Before testing #12:** update `.89` to the current build (it's on stale 7c17a96), then re-check whether
-.120 ("Archy-X250-EXP") shows once with its messages. NB: .89 had 0 journal mentions of "Archy-X250-EXP"
-and no radio contact for .120 — so its specific double may be a stale-binary artifact; confirm on fresh build.
-
-### #10 Netbird logout race
-Symptom: right after install netbird shows logged-in but can't log out; self-corrects after a while.
-Map: install `stacks.rs install_netbird_stack` (~1760-1918): 3 containers (netbird-server :8086, dashboard,
-nginx proxy :8087→443 self-signed TLS). `wait_for_stack_containers` waits for "running", NOT OIDC-ready.
-Dashboard is netbird's own SPA, opened in a NEW TAB (appLauncher.ts ~52-60, secure-context/crypto.subtle).
-Hypothesis: startup race — dashboard loads before netbird-server's OIDC provider is ready, caches a bad auth
-state; logout endpoint not ready. Likely fix: gate install completion / launch on netbird-server OIDC
-readiness (poll an endpoint) rather than container "running". Repro on `.89` (has netbird running).
-Prior note: AccountInfoSection.vue ~602 release note claims a previous unified-origin fix for the 404
-logout/login loop — the initial-state race remains.
-
-## Mesh parity directive
-MeshCore "works great"; Meshtastic must reach the SAME parity (rename done; duplicate-contact + routing
-fallback shared across both). Meshtastic↔MeshCore are INCOMPATIBLE over-the-air, so cross-protocol
-federated peers (.120↔.89) rely entirely on the FIPS/Tor fallback.
--- a/docs/MARKETPLACE-QA.md
+++ b/docs/MARKETPLACE-QA.md
@ -1,58 +0,0 @@
-# Marketplace QA — app-by-app install walk
-
-Purpose: track install/launch/uninstall health for every app in the marketplace catalog on `.228`. User installs each app one by one; for each broken one we triage, fix at the right layer (app recipe / registry image / backend / frontend), commit, redeploy, and re-verify.
-
-Target build: `v1.7.43-alpha` + backend md5 `9b8ead06aaf210b85cd78fce270384e3` (image-versions path fix included).
-
-## Status key
-
- ✅ install, launch, uninstall all clean
- ⚠️  installs and runs but has cosmetic or partial issues (note in details)
- ❌ broken — fix needed
- ⏳ pending verification
-
-## Catalog
-
-Pull the authoritative list from Marketplace page on `.228` during the walk. Fill in as you go.
-
-| App | Status | Notes / fix applied |
-|---|---|---|
-| _(to be filled during walk)_ | ⏳ | |
-
-## Known issues going in
-
- **Vaultwarden** — container exits immediately on start. Pre-existing. Backend async wrapper correctly detects + removes the install state entry. Needs container-config investigation (image pin / env vars / volume layout).
-
-## Fix layers cheat-sheet
-
-When an app breaks, identify which layer to fix at:
-
-1. **App recipe** — `apps/<app>/package.yaml` or wherever the Podman manifest lives. Ports, volumes, env vars, healthcheck, resource caps.
-2. **Registry image** — if image itself is missing/wrong-tag on `.168`:3000/lfg2025 or `git.tx1138.com`. Push corrected image, bump `scripts/image-versions.sh`.
-3. **Backend orchestrator** — `core/archipelago/src/container/` or `core/archipelago/src/api/rpc/package/` if the install flow mishandles this app's shape.
-4. **Frontend** — `neode-ui/src/views/marketplace/` or curated data in `neode-ui/src/views/marketplace/marketplaceData.ts` if catalog entry is wrong or UI can't render this app correctly.
-
-## Per-app fix workflow
-
-For each broken app:
-
-1. Capture failure mode:
-   ```
-   ssh archy228 'sudo journalctl -u archipelago --since "5 minutes ago" --no-pager | tail -80'
-   ssh archy228 'podman ps -a --format "{{.Names}}\t{{.Status}}\t{{.Image}}" | grep <app>'
-   ssh archy228 'podman logs <container-name> 2>&1 | tail -60'
-   ```
-2. Diagnose — which layer.
-3. Fix in repo (use SSHFS mount for edits).
-4. `cargo check` if backend changed; `npm run build` if frontend changed.
-5. Commit with `fix(app/<name>): ...` or `fix(registry/<image>): ...` etc.
-6. Redeploy as needed (binary via Mac ferry; frontend via rsync; registry via podman push).
-7. User re-verifies on `.228`. Mark ✅.
-
-## Release-notes policy
-
-For each app fix, append a bullet to the current in-flight release entry in `neode-ui/src/views/settings/AccountInfoSection.vue`. If the fix pile gets large enough to warrant its own release, bump to v1.7.44-alpha and start a new block at the top. Keep entries operator-focused ("Nostr Relay no longer crashes on first start"), not implementation-focused.
-
-## Running log
-
-_Add dated notes here as we progress through the catalog._
--- a/docs/MASTER_PLAN.md
+++ b/docs/MASTER_PLAN.md
@ -1,476 +0,0 @@
-# MASTER PLAN
-
-> Archipelago project task tracking and roadmap.
->
-> **BETA FREEZE ACTIVE (2026-03-18)** — No new features. Fix bugs, harden security, test everything.
-> Pipeline: **Feature Testing** → **User Testing** → **Beta Live**
-> Progress: `docs/BETA-PROGRESS.md` | Acceptance: `docs/BETA-RELEASE-CHECKLIST.md`
-
-## Roadmap
-
-### Phase 1: Feature Testing (internal) — CURRENT
-
-| ID | Title | Priority | Status | Dependencies |
-|----|-------|----------|--------|--------------|
-| **FEATURE-4** | **Onboarding loading screen with progress** | **P1** | IN PROGRESS | - |
-| **TASK-9** | **Full feature testing sweep** | **P1** | PLANNED | - |
-| **TASK-10** | **ISO build verification + multi-hardware test** | **P1** | PLANNED | - |
-| **TASK-12** | **Beta telemetry — reporter + toggle + collector POST** | **P1** | IN PROGRESS | - |
-| **TASK-39** | **Finish .198 rootless container migration** | **P1** | PLANNED | TASK-11 |
-| **TASK-42** | **LUKS2 full-partition encryption for /var/lib/archipelago/** | **P1** | IN PROGRESS | - |
-| **TASK-49** | **Container app reliability — bulletproof installs + recovery** | **P0** | PLANNED | - |
-| **TASK-50** | **Networking stack: first-install → reboot-proof** | **P0** | IN PROGRESS | - |
-| **BUG-44** | **App iframe shows blank/broken when container is starting or crashed** | **P2** | PLANNED | - |
-| **TASK-45** | **Deploy script: auto-chown data dirs after rootful→rootless migration** | **P2** | PLANNED | - |
-| **BUG-46** | **FileBrowser missing in unbundled ISO + Cloud auto-login broken** | **P1** | IN PROGRESS | - |
-| **BUG-47** | **Onboarding: DID sign 403 + blob HTTPS + no password setup** | **P1** | IN PROGRESS | - |
-| **FEATURE-48** | **Meshtastic support for mesh (plug and play)** | **P1** | PLANNED | - |
-
-### Phase 2: User Testing (controlled, real hardware)
-
-| ID | Title | Priority | Status | Dependencies |
-|----|-------|----------|--------|--------------|
-| **TASK-13** | **Recruit 3-5 test users, distribute ISOs** | **P1** | NOT STARTED | Phase 1 complete |
-| **TASK-14** | **Monitor telemetry, triage + fix user-reported issues** | **P1** | NOT STARTED | TASK-12, TASK-13 |
-| **TASK-15** | **Rebuild ISO with fixes, re-verify** | **P1** | NOT STARTED | TASK-14 |
-
-### Phase 3: Beta Live (public)
-
-| ID | Title | Priority | Status | Dependencies |
-|----|-------|----------|--------|--------------|
-| **TASK-16** | **Final ISO build + release notes + distribution** | **P1** | NOT STARTED | Phase 2 complete |
-
-### Post-Beta (FROZEN — do not start)
-
-| ID | Title | Priority | Status | Dependencies |
-|----|-------|----------|--------|--------------|
-| **TASK-2** | **Roll incoming-tx into deploy & ISO** | **P2** | DEFERRED | - |
-| **INQUIRY-5** | **Offline balance check via mesh relay** | **P2** | DEFERRED | - |
-| **FEATURE-6** | **Watch-only wallet architecture** | **P1** | DEFERRED | - |
-| **TASK-7** | **Mesh Bitcoin security hardening** | **P1** | DEFERRED | FEATURE-6 |
-| **FEATURE-43** | **P2P encrypted voice/video calling (WebRTC over federation)** | **P1** | DEFERRED | - |
-| **FEATURE-48** | **Meshtastic support for mesh (plug and play)** | **P1** | PLANNED | - |
-
-## Active Work
-
-### FEATURE-4: Onboarding loading screen with progress (IN PROGRESS)
-**Priority**: P1 — High
-**Status**: IN PROGRESS (2026-03-17)
-
-Users hit the onboarding screen before the backend is ready, resulting in "Server is still starting up" errors that block identity creation. The onboarding flow should not begin until the server is fully operational.
-
-**Solution**: Show the existing screensaver as a loading/boot screen with server startup progress. Swap the inner logo for animated pixel art icons (smiley face, Bitcoin logo, etc.) that cycle while services come online. Show progress indicators for each backend service (identity store, container runtime, LND, etc.). Only transition to onboarding once `/health` returns ready.
-
-**Key considerations**:
- Reuse the existing screensaver component as the boot screen
- Animated pixel art icons rotate in the center (smiley, BTC, lightning bolt, etc.)
- Progress bar or status checklist showing which services are ready
- Poll `/health` endpoint for service readiness
- Smooth transition from boot screen → onboarding once all critical services are up
- First-boot vs normal boot: first boot shows onboarding after, normal boot goes to dashboard
-
-**Key files**:
- `neode-ui/src/views/Onboarding.vue` — current onboarding flow
- `neode-ui/src/components/Screensaver.vue` — existing screensaver to repurpose
- `core/archipelago/src/api/rpc/mod.rs` — health endpoint
- `core/archipelago/src/server.rs` — startup sequence and service initialization
-
-**Tasks**:
- [ ] Investigate current health endpoint — what services does it check, what's missing
- [ ] Design boot screen component: screensaver background + animated pixel icons + progress
- [ ] Create pixel art icon set (smiley, BTC, lightning, shield, etc.) as SVG/CSS animations
- [ ] Implement service readiness polling (health check with granular service status)
- [ ] Add backend support for granular startup progress (which services are ready)
- [ ] Build boot screen component with smooth transition to onboarding/dashboard
- [ ] Handle edge cases: very slow starts, partial service failures, timeout fallback
- [ ] Test on fresh ISO install (first-boot scenario)
-
-### TASK-9: Full app testing matrix on fresh install (PLANNED)
-**Priority**: P1 — High
-**Status**: PLANNED (2026-03-18)
-
-Run through the complete `docs/BETA-RELEASE-CHECKLIST.md` app matrix on a fresh ISO install. Every app: install, launch, UI loads, uninstall. Every dependency chain: correct errors when deps missing.
-
-### TASK-10: ISO build verification + multi-hardware test (PLANNED)
-**Priority**: P1 — High
-**Status**: PLANNED (2026-03-18)
-
-Build a fresh ISO, install on at least 2 different hardware configurations, verify full onboarding flow, app installs, and multi-day uptime.
-
---
-
-### TASK-17: Alpha version tags + rollback strategy (PLANNED)
-**Priority**: P2 — Medium
-**Status**: PLANNED (2026-03-18)
-
-Tag every significant alpha version with git tags for easy rollback. Each tag should correspond to a deployable state. Maintain a version log so any alpha can be rebuilt and deployed.
-
-**Tasks**:
- [ ] Tag current state as `v1.2.0-alpha.1` (pre-rootless-podman)
- [ ] Establish naming convention: `v{major}.{minor}.{patch}-alpha.{build}`
- [ ] Tag after rootless podman migration: `v1.2.0-alpha.2`
- [ ] Document rollback procedure (git checkout tag + deploy)
- [ ] Add version tag step to deploy script (auto-tag on successful deploy)
- [ ] Update CHANGELOG.md with each alpha milestone
-
---
-
-### TASK-42: LUKS2 full-partition encryption for /var/lib/archipelago/ (IN PROGRESS)
-**Priority**: P1 — High
-**Status**: IN PROGRESS (2026-03-26)
-
-Encrypt all Archipelago app data at rest using LUKS2 full-partition encryption. Protects Bitcoin wallet data, LND macaroons, FileBrowser files, Vaultwarden vault, secrets, and everything else from physical disk seizure. Seamless UX — user never interacts with encryption directly.
-
-**Design**:
- LUKS2 partition for `/var/lib/archipelago/` created during ISO install
- Cipher: AES-256-XTS (hardware AES-NI on x86_64, ChaCha20 fallback on ARM without AES-NI)
- Key derived from setup password via Argon2id + hardware salt (`/sys/class/dmi/id/product_uuid`)
- Key file stored at `/root/.luks-archipelago.key` (root:600, on boot partition)
- Auto-unlock via `/etc/crypttab` on every boot — no passphrase prompt
- Password change in Settings re-derives key and rotates LUKS keyslot
-
-**Threat model**:
- Disk removed from machine = fully encrypted, unreadable
- Running machine with login = transparent (same as today)
- Forgot password = cannot decrypt (correct sovereign behavior)
-
-**Tasks**:
- [x] ISO installer: create LUKS2 partition, format + mount at `/var/lib/archipelago/`
- [ ] First-boot: derive LUKS key from setup password via Argon2id + hardware salt
- [x] Store key file at `/root/.luks-archipelago.key` with 600 perms
- [x] Configure `/etc/crypttab` for auto-unlock at boot
- [ ] Settings password change: re-derive LUKS key, add new keyslot, remove old
- [x] Detect AES-NI availability, fall back to ChaCha20 on ARM without it
- [ ] Test: fresh install, reboot survives, power-cycle survives, password change works
- [ ] Test: disk removed from machine is unreadable
- [x] Update `image-recipe/build-auto-installer-iso.sh`
-
-**Key files**:
- `image-recipe/build-auto-installer-iso.sh` — partition creation
- `scripts/first-boot-containers.sh` — runs after LUKS mount
- `core/archipelago/src/api/rpc/system.rs` — password change handler
- `core/archipelago/src/server.rs` — startup checks
-
-### TASK-49: Container app reliability — bulletproof installs + recovery (PLANNED)
-**Priority**: P0 — Critical
-**Status**: PLANNED (2026-03-29)
-
-Every marketplace app must install cleanly, survive failures, auto-recover from unhealthy states, and uninstall without residue. Currently: some apps fail silently, health checks are inconsistent, and there's no systematic testing.
-
-**Scope**: All 25+ marketplace apps — install, health, restart, uninstall, dependency chains.
-
-#### Phase A: Audit & Fix Install Flow (Days 1-2)
-Test every app install on a fresh .198 node. Fix failures as found.
-
- [ ] **A1**: Create install test matrix — spreadsheet of all apps with columns: installs?, starts?, healthy?, UI loads?, uninstalls?, deps correct?
- [ ] **A2**: Test core apps: Bitcoin Knots, LND, Mempool, BTCPay, Electrumx, FileBrowser
- [ ] **A3**: Test recommended apps: Fedimint, Vaultwarden, Grafana, SearXNG, Tailscale, Portainer
- [ ] **A4**: Test optional apps: Home Assistant, Jellyfin, PhotoPrism, Nextcloud, Ollama, Immich, Penpot, OnlyOffice
- [ ] **A5**: Test web-only/L484 apps: noStrudel, BotFights, NWNN, IndeedHub, DWN
- [ ] **A6**: Test Nostr relay (nostr-rs-relay) install + relay functionality
- [ ] **A7**: Fix all install failures found in A2-A6
-
-#### Phase B: Health Checks & Restart Policies (Days 2-3)
-Ensure every container has proper health checks and restart policies.
-
- [ ] **B1**: Audit all container manifests for `--health-cmd`, `--health-interval`, `--health-retries`
- [ ] **B2**: Add health checks to containers missing them (curl endpoint or process check)
- [ ] **B3**: Verify `--restart unless-stopped` on all containers
- [ ] **B4**: Test failure recovery: `podman kill <container>` → verify auto-restart
- [ ] **B5**: Test OOM recovery: set low memory limit → trigger OOM → verify restart
- [ ] **B6**: Verify container-doctor.sh runs on timer and fixes unhealthy containers
- [ ] **B7**: Verify reconcile-containers.sh detects and recreates missing containers
-
-#### Phase C: Dependency Chain Validation (Day 3)
-Apps with dependencies (BTCPay→Bitcoin+Postgres, Mempool→Bitcoin+MariaDB) must handle missing deps gracefully.
-
- [ ] **C1**: Map all dependency chains (which app needs which)
- [ ] **C2**: Test installing dependent app without dependency → verify error message
- [ ] **C3**: Test stopping dependency while dependent is running → verify graceful degradation
- [ ] **C4**: Test restarting dependency → verify dependent reconnects automatically
- [ ] **C5**: Ensure backend `dependency_resolver.rs` handles all chains correctly
-
-#### Phase D: Uninstall & Cleanup (Day 4)
-Every app must uninstall cleanly — no orphaned volumes, networks, or config.
-
- [ ] **D1**: Test uninstall for each app — verify container, volumes, config removed
- [ ] **D2**: Verify no orphaned podman volumes after uninstall (`podman volume ls`)
- [ ] **D3**: Verify no orphaned networks after uninstall
- [ ] **D4**: Test reinstall after uninstall — must work cleanly
- [ ] **D5**: Fix any cleanup issues found
-
-#### Phase E: Stress & Soak Testing (Day 5)
-Multi-day uptime test with all core apps running.
-
- [ ] **E1**: Install all core + recommended apps on .198
- [ ] **E2**: Let run for 24h — check for crashes, memory leaks, disk growth
- [ ] **E3**: Simulate power failure (hard reboot) — verify all apps come back
- [ ] **E4**: Simulate network failure — verify apps recover when network returns
- [ ] **E5**: Run container-doctor after soak test — should report all healthy
-
-#### Phase E2: FileBrowser Auto-Login (Day 5)
-FileBrowser must auto-login seamlessly after install — user should never see a separate login screen. Still protected via nginx session cookie validation.
-
- [ ] **E2a**: Fix FileBrowser auto-login flow: nginx auth_request validates Archipelago session, injects FileBrowser auth token
- [ ] **E2b**: Verify auto-login works on fresh bundled install (first boot)
- [ ] **E2c**: Verify auto-login works on unbundled install (Marketplace install)
- [ ] **E2d**: Verify FileBrowser is NOT accessible without valid Archipelago session (security)
- [ ] **E2e**: Test auto-login after session expiry → re-login to Archipelago → FileBrowser works again
-
-#### Phase F: Frontend UX (Day 5-6)
-The UI must accurately reflect container state at all times.
-
- [ ] **F1**: Installing state persists across navigation (DONE — TASK-49 server store)
- [ ] **F2**: App card shows correct state: stopped, starting, running, unhealthy, crashed
- [ ] **F3**: App iframe shows contextual error when container is down (BUG-44)
- [ ] **F4**: Uninstall progress shown in My Apps
- [ ] **F5**: Error toast when install fails with actionable message
-
-**Key files**:
- `core/archipelago/src/container/` — PodmanClient, manifests, health
- `core/archipelago/src/api/rpc/package/` — install/uninstall RPC handlers
- `scripts/container-doctor.sh` — health check + auto-fix
- `scripts/reconcile-containers.sh` — recreate missing containers
- `scripts/image-versions.sh` — pinned image versions
- `scripts/first-boot-containers.sh` — first-boot container creation
- `neode-ui/src/views/marketplace/` — install UI
- `neode-ui/src/views/apps/` — My Apps state display
-
-**Testing approach**:
- Fresh .198 install as test bed
- SSH in, run installs via web UI, check with `podman ps -a`
- Automated: `scripts/container-doctor.sh --local` after each test
- Manual: kill containers, pull power, break networks, verify recovery
-
---
-
-### BUG-44: App iframe shows blank/broken when container is starting or crashed (PLANNED)
-**Priority**: P2 — Medium
-**Status**: PLANNED (2026-03-21)
-
-When an app container is still starting up or has crashed, the iframe overlay shows a blank/broken page with no feedback. Should show contextual loading states:
- **Starting**: skeleton loader or "App is starting up..." with spinner
- **Crashed**: "App has stopped" with restart button and link to logs
- **Port not ready**: "Waiting for app to become available..." with timeout warning
- **X-Frame-Options blocked**: Detect and open in new tab automatically
-
-**Key files**:
- `neode-ui/src/views/AppSession.vue` — iframe container
- `neode-ui/src/stores/appLauncher.ts` — app launch state
- `neode-ui/src/api/container-client.ts` — container status checks
-
-### TASK-45: Deploy script: auto-chown data dirs after rootful→rootless migration (PLANNED)
-**Priority**: P2 — Medium
-**Status**: PLANNED (2026-03-21)
-
-When `deploy-tailscale.sh` migrates from rootful to rootless Podman, all files in `/var/lib/archipelago/` created by the old root-running backend are owned by `root:root`. The new backend runs as `archipelago` user and can't read them (node-key.pem, credentials, sessions, identity, etc.). Deploy script must auto-detect and fix ownership after migration.
-
-Also fix:
- `/run/user/1000/crun` ownership (left as root from rootful container creation)
- Container recreation needs `--cap-add NET_BIND_SERVICE` for apps binding port 80 (nextcloud)
- Container recreation needs config volume mounts for apps writing to `/etc/` (searxng)
- Frontend should be copied from .228, not built locally (prevents build mismatches)
-
-**Key files**:
- `scripts/deploy-tailscale.sh` — Step 14 (UID mapping) and Step 22 (container creation)
- `scripts/first-boot-containers.sh` — container creation reference
-
-### BUG-46: FileBrowser missing in unbundled ISO + Cloud auto-login broken (IN PROGRESS)
-**Priority**: P1 — High
-**Status**: IN PROGRESS (2026-03-26)
-
-Two issues with the Cloud feature on fresh installs:
-
-1. **FileBrowser not prepackaged in unbundled ISO** — The unbundled ISO variant doesn't include the FileBrowser container image, so Cloud doesn't work out of the box. FileBrowser is a core dependency (not an optional app) since it powers the Cloud file manager. Must be bundled even in the unbundled variant.
-
-2. **FileBrowser auto-login not working** — The auto-login flow (so users don't need to enter separate FileBrowser credentials) appears broken. Need to investigate whether the auth proxy/token injection is functioning correctly on fresh installs.
-
-**Tasks**:
- [x] Add FileBrowser image to unbundled ISO build (core dependency, always bundled)
- [x] Create minimal first-boot script for unbundled mode (FileBrowser only)
- [x] Fix auto-login: `Secure` cookie flag silently fails on HTTP — made conditional
- [x] Changed `SameSite=Strict` to `SameSite=Lax` for better navigation compatibility
- [ ] Test Cloud feature end-to-end on a fresh install (both bundled and unbundled)
-
-**Key files**:
- `image-recipe/build-auto-installer-iso.sh` — UNBUNDLED container image list
- `scripts/first-boot-containers.sh` — FileBrowser container creation
- `image-recipe/configs/nginx-archipelago.conf` — FileBrowser proxy config
- `neode-ui/src/views/Cloud.vue` — Cloud UI / auto-login logic
-
-### BUG-47: Onboarding: DID sign 403 + blob HTTPS + no password setup (IN PROGRESS)
-**Priority**: P1 — High
-**Status**: IN PROGRESS (2026-03-26)
-
-Three onboarding issues on clean install:
-
-1. **Sign DID returns 403 Forbidden** — The DID verification/signing step during onboarding fails with a 403 response from the backend.
-2. **Blob URL HTTPS warning** — Browser complains about blob URL loaded over insecure connection (`blob:http://...` should be served over HTTPS). Likely related to the backup download on HTTP connections.
-3. **No password setup on clean install** — Users cannot set a password during onboarding. The setup password flow is missing or broken.
-
-**Root causes found**:
- `node.did`, `node.signChallenge`, `node.nostr-pubkey`, `node.createBackup`, `identity.verify` were NOT in `UNAUTHENTICATED_METHODS` — onboarding has no session, so they all returned 403
- `auth.setup` and `auth.isSetup` RPC methods were missing from the dispatcher — the frontend called them but no handler existed
- Blob HTTPS warning is a browser security feature on HTTP connections (not a code bug)
-
-**Tasks**:
- [x] Add onboarding methods to UNAUTHENTICATED_METHODS in middleware.rs
- [x] Add `auth.setup` RPC handler (creates user with password, prevents re-setup)
- [x] Add `auth.isSetup` RPC handler (checks if user.json exists)
- [x] Rust compiles clean
- [ ] Blob URL HTTPS warning — known browser limitation on HTTP, no code fix needed
- [ ] Test full onboarding flow end-to-end on fresh ISO
-
-**Key files**:
- `neode-ui/src/views/OnboardingVerify.vue` — DID signing step
- `neode-ui/src/views/OnboardingBackup.vue` — Backup download (blob URL)
- `neode-ui/src/views/OnboardingIntro.vue` — Password setup entry point
- `core/archipelago/src/api/rpc/auth.rs` — Auth RPC endpoints
- `core/archipelago/src/api/rpc/middleware.rs` — Request auth middleware
-
---
-
-### TASK-50: Networking stack: first-install → reboot-proof (IN PROGRESS)
-**Priority**: P0 — Critical
-**Status**: IN PROGRESS (2026-04-08)
-
-Every networking service must work from first install, survive reboots, and never go down. Covers the full stack: WireGuard (traditional peer VPN), NostrVPN (mesh VPN), Tor, Tor hidden services, Tor Electrum, and LND Connect wallet.
-
-**Why**: These are the sovereignty backbone — if any of them fail silently after a reboot or fresh install, the node is useless as a self-sovereign server. Users shouldn't need to SSH in to fix networking.
-
-**Services**:
- **WireGuard** (port 51820) — traditional peer VPN for direct connections
- **NostrVPN** (port 51821) — mesh VPN with Nostr identity, `nvpn` daemon
- **nostr-rs-relay** (port 7777) — private relay for NostrVPN signaling + general use
- **Tor** — SOCKS proxy + hidden services for all apps
- **Tor hidden services** — .onion addresses for node access without public IP
- **Tor Electrum** — Electrum server accessible over Tor
- **LND Connect** — wallet connect URIs over Tor for mobile wallets
-
-**Tasks**:
- [x] NostrVPN systemd service (`nostr-vpn.service`) — enabled, reboot-proof
- [x] WireGuard interface (`wg0`) — configured, auto-start
- [ ] Build nvpn v0.3.7 from source (fixes event processing bug in v0.3.4)
- [ ] Verify NostrVPN mesh forms between server and phone after v0.3.7 upgrade
- [ ] nostr-rs-relay service — systemd unit, auto-start, in-memory mode
- [ ] Each node runs its own relay on port 7777
- [ ] Tor service — systemd, auto-start, SOCKS on 9050
- [ ] Tor hidden services — auto-generate .onion for web UI, LND, Electrum
- [ ] Nodes without public IP use Tor hidden service as relay endpoint
- [ ] Tor Electrum — Electrumx/Fulcrum accessible over .onion
- [ ] LND Connect — generate wallet connect URI over Tor
- [ ] Show relay URLs in VPN card UI
- [ ] ISO first-boot: all networking services configured and started automatically
- [ ] Reboot test: power cycle → all services come back without intervention
- [ ] Fresh install test: ISO → boot → all networking operational
-
-**Key files**:
- `/etc/systemd/system/nostr-vpn.service` — NostrVPN daemon
- `/var/lib/archipelago/nostr-vpn/.config/nvpn/config.toml` — nvpn config
- `image-recipe/configs/nginx-archipelago.conf` — proxy rules
- `scripts/first-boot-containers.sh` — first-boot service setup
- `scripts/image-versions.sh` — pinned versions
- `neode-ui/src/views/apps/VpnCard.vue` — VPN UI card
- `core/archipelago/src/vpn.rs` — VPN status backend
-
---
-
-## Post-Beta (FROZEN)
-
-*These tasks are deferred until after beta ships. Do not start.*
-
- **INQUIRY-5**: Offline balance check via mesh relay
- **FEATURE-6**: Watch-only wallet architecture
- **TASK-7**: Mesh Bitcoin security hardening
- **TASK-2**: Roll incoming-tx into deploy & ISO
- **FEATURE-43**: P2P encrypted voice/video calling (WebRTC over federation)
-
---
-
-### FEATURE-43: P2P encrypted voice/video calling — WebRTC over federation (DEFERRED)
-**Priority**: P1 — High
-**Status**: DEFERRED (post-beta)
-
-Self-sovereign encrypted voice and video calling between Archipelago peers. Zero new containers or dependencies — uses browser-native WebRTC with signaling over the existing federation WebSocket. Integrates directly into peer tabs/chat.
-
-**Security & Privacy**:
- All media encrypted via DTLS/SRTP (WebRTC mandatory encryption — no opt-out)
- Signaling (SDP offers, ICE candidates) transmitted over existing federation WebSocket through Tor
- ICE candidate filtering: strip local/public IP candidates in Tor-relay mode
- No central server, no metadata leakage — true P2P between browsers
- Two privacy modes:
-  - **LAN Direct**: <50ms latency, IPs visible to peer (trusted same-network peers)
-  - **Tor Relay**: 300-800ms latency, full anonymity via coturn TURN server on .onion
-
-**Architecture**:
- Signaling reuses existing federation WebSocket — new message types: `call-offer`, `call-answer`, `call-ice`, `call-hangup`, `call-reject`, `call-busy`
- Browser `getUserMedia()` + `RTCPeerConnection` — no backend media processing
- Opus codec for voice (~30kbps, handles Tor latency well)
- VP8/VP9 adaptive bitrate for video (720p on LAN, degrades gracefully)
- Optional `coturn` container (~10MB RAM) for Tor-relay media mode only
-
-**UX**:
- Voice and video call buttons in peer chat (federation contacts)
- Incoming call: glass modal slides up with peer name + avatar, accept/decline
- In-call: floating glass PIP overlay — navigate while talking
- One-tap mute, camera toggle, speaker toggle, hangup
- Call quality indicator (green/yellow/red based on RTT)
- Ring timeout (30s) → missed call notification
- Call history in peer chat thread
-
-**Tasks**:
- [ ] `CallService.ts` — WebRTC wrapper (offer/answer, ICE management, stream handling, codec negotiation)
- [ ] Federation signaling protocol — new message types over existing WS (`call-offer`, `call-answer`, `call-ice`, `call-hangup`)
- [ ] Rust backend — relay call signaling messages between federation peers (pass-through, no media processing)
- [ ] ICE candidate filtering — strip public IPs in privacy mode, force relay-only
- [ ] `CallOverlay.vue` — incoming call modal (glass aesthetic, ring animation, accept/decline)
- [ ] `CallPIP.vue` — floating picture-in-picture during active call (draggable, minimize/expand)
- [ ] `CallControls.vue` — mute, camera toggle, speaker, hangup, privacy mode switch
- [ ] Voice-only mode — Opus codec, bandwidth-optimized, Tor-friendly
- [ ] Video mode — VP8/VP9 adaptive bitrate, resolution scaling based on connection quality
- [ ] Optional `coturn` container manifest — TURN relay for Tor-routed media
- [ ] Call quality monitoring — RTT measurement, packet loss detection, quality indicator
- [ ] Call history — persist in peer chat thread, missed call notifications
- [ ] Multi-peer consideration — design for 1:1 first, extensible to group calls later
- [ ] Test: LAN direct call (voice + video)
- [ ] Test: Tor relay call (voice — verify latency is acceptable)
- [ ] Test: call during active chat, call while navigating other views
- [ ] Test: network interruption recovery (ICE restart)
-
-**Key files** (new):
- `neode-ui/src/services/CallService.ts` — WebRTC engine
- `neode-ui/src/components/call/CallOverlay.vue` — incoming call UI
- `neode-ui/src/components/call/CallPIP.vue` — in-call floating overlay
- `neode-ui/src/components/call/CallControls.vue` — call action buttons
- `apps/coturn/manifest.yml` — optional TURN relay container
-
-**Key files** (modified):
- `neode-ui/src/views/Federation.vue` — call buttons in peer chat
- `core/archipelago/src/api/rpc/federation.rs` — call signaling relay
- `neode-ui/src/stores/federation.ts` — call state management
-
-## Completed
-
-| ID | Title | Completed |
-|----|-------|-----------|
-| **TASK-11** | Rootless podman migration (.228 — 30 containers) | 2026-03-18 |
-| **TASK-32** | Integrate boot loader into deploy + build + production | 2026-03-17 |
-| **TASK-34** | Pentest findings remediation plan | 2026-03-18 |
-| **TASK-26** | Rename fedimintd to "Fedimint Guardian" + icon | 2026-03-18 |
-| **TASK-27** | Add tab-launch icon to apps that open in tabs | 2026-03-18 |
-| **TASK-28** | Sort installed apps to end of marketplace | 2026-03-18 |
-| **TASK-29** | Fix mesh mobile: remove title/flash/peers header, fix gutters | 2026-03-18 |
-| **TASK-30** | On-Chain as first tab in receive Bitcoin modals | 2026-03-18 |
-| **TASK-35** | Federation node names (show name not DID, hover for key) | 2026-03-18 |
-| **TASK-36** | Cleaner iframe error screen with remediation | 2026-03-18 |
-| **BUG-1** | Random logout / CSRF mismatch — HMAC-derived tokens | 2026-03-18 |
-| **TASK-8** | Security hardening — 12/12 pentest findings fixed | 2026-03-18 |
-| **BUG-20** | ElectrumX index estimate string ~55→~130 GB | 2026-03-18 |
-| **BUG-37** | App card Start/Launch flicker during container scan | 2026-03-18 |
-| **BUG-40** | Uninstall dialog not full-screen modal | 2026-03-18 |
-| **BUG-41** | Uninstall loader ends but app card persists | 2026-03-18 |
-| **BUG-33** | CPU load alert threshold too low (8 = 2x cores) | 2026-03-18 |
-| **TASK-31** | Sticky nav header (Apps page) | 2026-03-18 |
-| **TASK-38** | Blockchain sync info on homepage System card | 2026-03-18 |
-| **TASK-17** | Alpha version tags + deploy auto-tag | 2026-03-18 |
-| **BUG-3** | IndeedHub WebSocket spam — removed dead nostrConfig | 2026-03-18 |
--- a/docs/MIGRATION_STATUS_REPORT.md
+++ b/docs/MIGRATION_STATUS_REPORT.md
@ -1,252 +0,0 @@
-# Migration Status Report
-
-Last updated: 2026-06-14
-
-## RESUME CHECKPOINT (2026-06-14, after SSH drop)
-
-State right now, so any disconnect resumes cleanly:
-
- **`main` = `a483fe4b`** = the other agent's 4 fixes (`0ed892a4`: wallet receive / bitcoin
-  install self-heal / ElectrumX tile / extended test gate) + **my F1 fix committed on top**
-  (`launch_url_port` in `docker_packages.rs` + 3 regression tests). Tree is clean (only two
-  untracked `docs/*.md` tracking files remain). Not pushed.
- The old isolated `archy-f1` worktree was **removed** — built the combined tree in-place.
- ✅ **DONE — combined backend release build** (`cd core && TMPDIR=/home/archipelago/.buildtmp
-  cargo build --release -p archipelago`, 7m46s, exit 0). `/tmp` is a full tmpfs so `TMPDIR`
-  MUST point at `/home/archipelago/.buildtmp`.
- ✅ **DONE — sideloaded + restarted on `.116`.** Backed up old binary to
-  `/usr/local/bin/archipelago.pre-f1.bak`, `install`ed new binary (root:root 755),
-  `sudo systemctl restart archipelago` (new MainPID 2885863).
- ✅ **F1 VALIDATED LIVE on `.116` (2026-06-14).** See "FINDING F1" below — before/after proves
-  the fix. Harness focused audit `jellyfin,filebrowser` → **all checks passed, exit 0**.
- **IMPORTANT — restart is SAFE on this node:** containers run rootless under
-  `user-1000.slice/user@1000.service/app.slice`, a DIFFERENT cgroup from
-  `/system.slice/archipelago.service`. They survived both the 01:47 and this restart
-  (bitcoin/lnd/btcpay/immich/indeedhub all intact, count stayed 36). The
-  `feedback_no_systemctl_deploy_until_quadlet` cgroup-cascade warning does NOT apply to `.116`'s
-  current config. (The reconciler does recreate a few app containers like jellyfin/fedimint on
-  adoption — normal level-triggered behavior, not casualties.)
- **RELEASE IN PROGRESS — v1.7.91-alpha (user approved 2026-06-14).** Bundles the other agent's
-  4 fixes (`0ed892a4`) + F1 (`a483fe4b`) + changelog (`ab858271`). Steps:
-  1. ✅ Freed `/tmp` (removed stale published frontend tarballs 1.7.83→1.7.89; ~1.1G free) —
-     `create-release.sh` writes the 184MB frontend tarball to `/tmp` (hardcoded, NOT TMPDIR).
-  2. ✅ `cargo fmt -p archipelago --check` clean; curated layman changelog added + committed.
-  3. 🔄 `TMPDIR=/home/archipelago/.buildtmp scripts/create-release.sh 1.7.91-alpha`
-     (runs `tests/release/run.sh` gate → bumps Cargo.toml/package.json → builds backend+frontend
-     → manifest → commit "chore: release v1.7.91-alpha" → tag `v1.7.91-alpha`). MUST set TMPDIR
-     or cargo's ring C-build fails on the full `/tmp` tmpfs.
-  - **AFTER create-release.sh:** `scripts/publish-release-assets.sh 1.7.91-alpha gitea-vps2`
-     → `git push origin main && git push gitea-local main` → `git push --tags` (origin+gitea-local).
-     Ship target per memory: vps2 (146.59.87.168) is PRIMARY OTA manifest; tx1138 RETIRED.
-  - Verify packaged tarball actually contains the new version string before trusting the build
-    (npm run build can silently produce stale dist — see `feedback_frontend_build_verify`).
-
-## Validation node (ACTIVE)
-
-As of 2026-06-14 the app-migration lifecycle validation moves from `.198` (remote, OVH) to
-**`.116` — the local dev node (`archi-thinkpad`, `192.168.1.116`)** because it is the machine
-this session runs on, so the harness drives it over loopback instead of SSH (much faster, no
-network latency). A separate agent owns OS-level fixes + its own test harness; this track owns
-the **app-packaging migration** lifecycle validation only.
-
-How to drive the harness against `.116` (local):
-
-```bash
-ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http ARCHY_PASSWORD='ThisIsWeb54321@' \
-  ARCHY_APPS='meshtastic,jellyfin,filebrowser,uptime-kuma' \
-  tests/lifecycle/remote-lifecycle.sh        # focused, audit-only (non-destructive)
-```
-
- `.116` serves nginx on **:80 only** (443 is tailscale's) → use `ARCHY_SCHEME=http`, `ARCHY_HOST=127.0.0.1`.
- Local node is healthy: `update_state.json.current_version == 1.7.90-alpha`, `update_in_progress=false`
-  (the OTA self-heal that was a follow-up gap in PROGRESS_MEMORY is now confirmed resolved on .116).
- Login password for `.116`: `ThisIsWeb54321@` (verified against `auth.login`). Note: auth.login
-  has a login rate-limiter — avoid rapid repeated attempts.
- `.198` results below remain the prior baseline; new results are tagged `[.116]`.
-
-### [.116] audit log (newest first)
-
- **2026-06-14 — focused audit `meshtastic,jellyfin,filebrowser,uptime-kuma` (audit-only, non-destructive):**
-  harness exit 1, FAILED checks: 1.
-  - `filebrowser` — running, pass (also passed a standalone single-app smoke run).
-  - `uptime-kuma` — running, pass.
-  - `meshtastic` — `state=absent`. Not installed on `.116` (was installed/validated on `.198`).
-    Not a regression; just node state. To exercise meshtastic here, install it first (it needs
-    `/dev/ttyUSB0`, which `.116` may not have) or drop it from the focused set on this node.
-  - `jellyfin` — **running but FAILED: "launch metadata missing: jellyfin has no lan_address".**
-    **ROOT-CAUSED 2026-06-14 — real, current bug in the working tree (a regression).** See
-    "FINDING F1" below.
-
-### [.116] FINDING F1 — manifest launch URLs with a path are silently dropped (OPEN, fix pending)
-
-**Symptom:** `jellyfin` is `running` and genuinely serving (`curl 127.0.0.1:8096/` → 302), but
-`container-list` reports `lan_address: null`, so the UI/harness sees no launch URL.
-
-**Root cause:** `core/archipelago/src/container/docker_packages.rs::reachable_lan_address()` parses
-the port out of the candidate URL with `url.rsplit(':').next()`. When the candidate comes from the
-manifest `interfaces.main` (via `PodmanClient::lan_address_for` →
-`core/container/src/podman_client.rs::manifest_primary_interface_url`), the URL **includes the
-manifest `path`** — e.g. jellyfin → `http://localhost:8096/`. Then `rsplit(':').next()` yields
-`"8096/"`, which **fails to `parse::<u16>()`**, so the function hits its `else { return None }`
-branch and drops a perfectly reachable launch URL. (Diagnostic tell: the dropped-at-parse path
-emits **no** log, whereas a genuine unreachable port logs "suppressing unreachable launch URL".
-jellyfin has no such log; uptime-kuma — whose candidate `…:3002` has no path — does.)
-
-**Why it's a regression:** the old `extract_lan_address(ports)` produced `http://localhost:PORT`
-(no path), which parsed fine. The newer manifest-interface feature appends the declared `path`,
-so any app routed through `lan_address_for` now yields `…:PORT/` and trips the parser.
-
-**Blast radius (apps in `requires_reachable_launch` whose `interfaces.main.path` = `/`):**
-`botfights`, `btcpay-server`, `fedimint`, `jellyfin`, `gitea`, `nextcloud`, `portainer`.
-(`filebrowser`/`nextcloud`/`nginx-proxy-manager`/`vaultwarden` are in `uses_allocated_launch_port`
-so they hit `extract_lan_address` first and dodge it; `grafana`/`mempool`/`uptime-kuma`/`searxng`
-have no manifest `interfaces.main` path.) On `.198` this likely went unnoticed because those apps
-weren't all running during the launch-metadata assertion, or predated the interfaces.main addition.
-
-**Fix (IMPLEMENTED in working tree, uncommitted):**
-`docker_packages.rs::reachable_lan_address` now parses the port via a new `launch_url_port()`
-helper that reads digits after the final colon (`take_while(is_ascii_digit)`), mirroring the
-RPC-layer `port_from_url`, so `http://localhost:8096/` → `Some(8096)`. Added unit tests
-(`launch_url_port_tests`) covering the trailing-path regression, the bare-authority case, and a
-no-port reject. The existing `lan_address_prefers_manifest_main_interface` test only exercised
-`lan_address_for` (which always returned `…:8175/`) and never the `reachable_lan_address` wrapper,
-which is why the bug slipped through.
-
-**Unit validation: GREEN (2026-06-14).** `cargo test -p archipelago --bin archipelago launch_url_port`
-→ 3 passed / 0 failed (trailing-path, bare-authority, no-port-reject); crate compiles clean.
-
-**Coordination note (shared tree):** the repo is on branch `fix/wallet-receive-portdrift-secrets`
-at commit `bb808df8` (= the deployed 1.7.90-alpha). A parallel agent has uncommitted changes here
-(lnd `wallet.rs`, `bitcoin_relay.rs`, `prod_orchestrator.rs`, electrumx manifest, neode-ui, new
-bats). To validate F1 in isolation (and NOT deploy their in-flight work onto the live node, nor
-disturb their tree), the live-validation build is done in a detached git worktree at
-`/home/archipelago/archy-f1` = clean `bb808df8` + only the F1 `docker_packages.rs` change. Build:
-`cd /home/archipelago/archy-f1/core && TMPDIR=/home/archipelago/.buildtmp cargo build --release -p archipelago`
-(`.116`'s `/tmp` is a 7.7G tmpfs that runs 100% full → the ring crate's C compile fails with
-"No space left on device"; redirect `TMPDIR` to `/` which has ~399G). After validation the
-worktree is removed (`git worktree remove`). NOTE: sideloading replaces the OTA-managed
-`/usr/local/bin/archipelago` with a local 1.7.90-alpha+F1 build until the next OTA — back up the
-current binary first (`/usr/local/bin/archipelago.pre-f1.bak`).
-
-**Live validation status — ✅ GREEN on `.116` (2026-06-14).** Built combined tree (`a483fe4b`),
-sideloaded, restarted `archipelago.service`. Before/after on the live node (old buggy binary → new):
-
-| app | OLD lan_address | NEW lan_address |
-|---|---|---|
-| jellyfin | `None` ❌ | `http://localhost:8096/` ✅ |
-| btcpay-server | `None` ❌ | `http://localhost:23000/` ✅ |
-| fedimint | `None` ❌ | `http://localhost:8175/` ✅ |
-| gitea | `None` ❌ | `http://localhost:3001/` ✅ |
-| portainer | `None` ❌ | `http://localhost:9000/` ✅ |
-| botfights | `None` ❌ | `http://localhost:9100/` ✅ |
-| nextcloud | `:8085` ✓ | `:8085` (unchanged — allocated-port path) |
-| filebrowser | `:8083` ✓ | `:8083` (unchanged) |
-
-Harness focused audit `jellyfin,filebrowser` → **all checks passed, exit 0**. Unit tests green.
-No container casualties (all 36 survived; see RESUME CHECKPOINT for the cgroup detail).
-
-NOTE: Do NOT run the prod binary directly to "check a version" —
-`/usr/local/bin/archipelago <anyflag>` boots a whole second node instance (learned the hard way
-2026-06-14; it exited without leaving a stray, but don't repeat).
-
-## Goal
-
-Make Archipelago's app/container system developer-ready and release-ready: app installs, lifecycle, recovery, and integrations should be portable, manifest-driven, and not rely on one-off OS-level changes or hardcoded Rust branches for each new app. The OS/backend should provide generic primitives for manifests, Quadlet rendering, lifecycle, health/readiness, dependency ordering, data ownership, image availability, bind mounts, secrets, app files, networking, bridge/signer integrations, and recovery.
-
-The developer contract should be clear enough that a third-party developer can build and ship an Archipelago app from documentation plus manifest/schema examples. If an app needs a capability the platform does not yet expose, the release direction is to add a reusable manifest/orchestrator primitive rather than a special case tied to that app. This is the standard for the `1.8-alpha` app migration: professional app delivery, predictable behavior after restart/reboot, and a path for user-installed/community apps that does not require rebuilding the OS image for every app.
-
-Release quality bar: every supported app must install, stop, start, restart, uninstall, survive host reboot, report accurate status, and expose clear install/uninstall progress. Stale health notifications must not persist across login or refresh after the underlying condition has cleared. Final release validation should run on the intended release validation server, not drift between appliances without an explicit checkpoint.
-
-Target release: `1.8-alpha`, including a cut and smoke-tested ISO once validation is green.
-
-Current release readiness estimate: about `82%`. The remaining percentage is mostly post-reboot recovery confidence, repeated reboot validation, and ISO creation/smoke testing rather than the core manifest/catalog migration itself.
-
-## Current Result
-
- The migration is not final-release complete yet, but the core direction is being met.
- Portainer, Filebrowser, BTCPay, Grafana, Nostr Relay, SearXNG, Gitea, and key dependency units have moved further into the manifest/orchestrator path.
- `.198` has passed focused and broad lifecycle audits for the already migrated set.
- Meshtastic is now routed through the orchestrator path, no longer falls back to legacy `localhost/meshtastic:latest`, and has passed full lifecycle validation on `.198`.
- On 2026-06-02, focused and broad `.198` non-destructive lifecycle audits passed after clearing a wedged `nextcloud` Podman record. The live registry config already has OVH primary plus tx1138 mirror, and Meshtastic/Portainer were added to the catalog surfaces.
- Later on 2026-06-02, the current release backend hash `579b823cf4a4b8c50bb3d0c3d49449c58101b016eb6ebc8049975dce98e34265` was found active and stable on `.198`. Meshtastic `app.files` rendering was proven live by removing `/var/lib/archipelago/meshtastic/config.yaml`, restarting through `package.restart`, and verifying the manifest recreated the file. Focused Meshtastic, focused `meshtastic,jellyfin,filebrowser`, and broad non-destructive audits all passed afterward; raw Podman sweep was clean.
- The remaining release gate was continued on 2026-06-02: bounded disk cleanup, journal retention, backend-backup retention, and release-focused catalog drift classification were added. `.198` is active on backend hash `e285d421cef497beb6b4b929f36fb4296d6db1f4a4c786157b6751eec51619ca`; focused and broad post-cleanup lifecycle audits passed, and final raw Podman sweep was clean.
- Follow-up found Podman store commands can hang on `.198` beyond image prune (`podman system df`, image list/exists, and sometimes broad ps/inspect). The release cleanup path now skips Podman image/volume prune rather than touching that unstable path. `.198` is active on backend hash `c9695dc3db10ff6e593cdbcfbbdc94b2e98b6008aa62655bba51b9879b549e8c`; Uptime Kuma was repaired with a normal `package.restart`; focused and broad post-repair lifecycle audits passed, and final raw bad-state sweep was clean.
- On 2026-06-03, startup/adoption scanner hardening and pasta restart repair were deployed. `.198` is active on backend hash `2b72e83ff368e4a696ad701f8985b0a8e1e889d9f4844056dc063455df973b28`; `package.restart` for Uptime Kuma now returns successfully and restores the `3002` pasta listener; focused `meshtastic,jellyfin,filebrowser,uptime-kuma` and broad lifecycle audits passed.
- Later on 2026-06-03, expanded rollback cleanup and store-safe uninstall hardening were deployed. `.198` is active on backend hash `7f90345b75148b7ed748e1a417f31d1273e1646a9b742891858df11c5397051b`; `system.disk-cleanup` reclaimed `10.3 GB` from old backend and web UI rollback artifacts while still skipping Podman prune, and focused `meshtastic,jellyfin,filebrowser,uptime-kuma` lifecycle passed afterward.
- Latest 2026-06-03 follow-up deployed backend hash `d21202cd79794e3bfc882d37134afd7a41dac766bae386a675714e5fa030e94e`. It mitigates stale cached `container-list` state during Podman scan backoff, adds a bounded TCP reachability fallback for `container-health`, and adds Jellyfin `8096` to legacy pasta host-listener repair. Focused `meshtastic,jellyfin,filebrowser,uptime-kuma` lifecycle passed on this hash. Broad lifecycle still needs rerun on this latest hash.
- Current validation backend hash is `14d360a206d1e58f287c5722d709dace0284b0dea56b66aa4bce0f57c631631b`. It keeps the generic host-listener health direction, preserves the `container-health` fallback fix from `be95ea...`, hardens fresh local-build installs so `podman image exists <local-build-tag>` failures/timeouts rebuild instead of failing the lifecycle operation, and reduces duplicated legacy runtime port repair by deriving host ports from manifests. Targeted PhotoPrism and broad non-destructive `.198` lifecycle audits passed on this hash.
- Catalog metadata generation from manifests is now implemented via `scripts/generate-app-catalog.py`. The canonical catalog and UI public catalog are synced from manifest-owned fields, strict release drift is zero, and frontend build validation passed.
- Current live `.198` validation backend hash is `95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de`. Broad non-destructive lifecycle is green on that deployed line after app health/port recovery, IndeedHub recovery, scoped legacy install hardening, and bounded Podman pull hardening.
- Local release validation now passes the full backend binary test target and every Rust workspace member after release cleanup fixes for scanner backoff wakeups, crash-recovery tests, manifest-port lookup, journal parsing, and boot-reconciler test determinism.
- Frontend release validation now passes `npm run type-check`, `npm test` (`548` tests), and `npm run build` after fixing mobile app-launch routing for new-tab apps and updating stale launch tests. Local `npm ci` is blocked by root-owned `neode-ui/node_modules` entries, so dependency reinstall remains a local environment cleanup item requiring explicit approval.
- Reboot validation is not yet green. User reported that a reboot test left IndeeHub stopped afterward, with multiple containers killed by SIGKILL during shutdown/reboot and at least one crash. Treat post-reboot recovery as the active release blocker.
- Local follow-up now hardens IndeeHub stack boot recovery and updates lifecycle validation so IndeeHub must still serve the Nostr signer bridge (`/nostr-provider.js`) before a launch probe passes.
-
-## Completed In This Pass
-
- Pause checkpoint for resume: generated app-session metadata now covers manifest-owned launch ports, titles, and new-tab behavior. The next migration step should continue from proxy path/companion UI alias generation or return to the release blocker around post-reboot IndeeHub recovery.
- Updated `docs/APP-PACKAGING-MIGRATION-PLAN.md` to reflect the current `apps/<app-id>/manifest.yml` contract, replacing stale `archy-app.yml` next-step language with the actual parser/generator/orchestrator progress and the remaining migration blockers.
- Updated `docs/app-developer-guide.md` so developers see the current manifest fields, generated catalog flow, validation commands, and release lifecycle expectations instead of the older Nostr marketplace publish/trust-score draft.
- Verified the developer-guide manifest example parses as YAML, `scripts/generate-app-catalog.py` is idempotent, strict release catalog drift remains zero, and `git diff --check` is clean for the migration docs.
- Extended `scripts/generate-app-catalog.py` to also emit `neode-ui/src/views/appSession/generatedAppSessionConfig.ts` from manifests, and wired `appSessionConfig.ts` to merge generated launch ports/titles/new-tab launch behavior with the existing manual overrides for companion UIs and aliases.
- Added a Fedimint `interfaces.main` launch declaration for the Guardian wait/proxy UI on port `8175`, so that public launch surface is now represented in the manifest.
- Focused validation passed for the generated app-session path: Python helper compile, generator idempotence, strict catalog drift, `appSessionConfig.test.ts`, and frontend type-check.
- Aligned `docs/APP-PACKAGING-MIGRATION-PLAN.md` and `docs/app-developer-guide.md` with the current manifest/runtime contract so the release docs no longer describe the stale marketplace-style schema.
- Removed the hardcoded Portainer host-prep path and replaced it with a manifest plus generic Podman socket bind-mount preparation.
- Added generic Quadlet health drift detection for command, interval, timeout, and retry changes.
- Made rendered HTTP health helpers honor manifest timeouts.
- Added image availability guards before Quadlet starts/restarts so pruned images are pulled or built before systemd tries to start them.
- Fixed stale dependency handling so active manifest dependencies are not suppressed by old `user-stopped.json` entries.
- Added parent-app reconcile syncing for dependency Quadlet units.
- Validated Portainer, Filebrowser, BTCPay, and broad non-destructive audits on `.198`.
- Updated Meshtastic manifest to use a real available image, the real `/dev/ttyUSB0` device, the actual daemon data path, and a non-HTTP health check.
- Updated the lifecycle harness so non-HTTP apps do not require launch metadata.
- Added a generic manifest-owned file rendering primitive under `app.files` so apps can declare required bind-mounted config files without adding app-specific Rust/OS branches.
-
-## Current `.198` State
-
- `archipelago.service`: active.
- `archipelago-doctor.timer`: inactive.
- `archipelago-reconcile.timer`: inactive.
- Current validation backend hash: `95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de`.
- `.198` root filesystem pressure is currently resolved for release validation: latest sweep showed `/` at 65% used with about 9.6G free after expanded rollback cleanup.
- Latest focused Fedimint, Immich, IndeedHub, and PhotoPrism audits passed on the current hash.
- Broad non-destructive lifecycle passed on the current hash before and after backend restart validation.
-
-## Meshtastic Status
-
- Orchestrator routing is fixed and verified by the generated Quadlet unit.
- Current generated unit uses:
-  - `Image=docker.io/meshtastic/meshtasticd:daily-alpine`
-  - `Volume=/var/lib/archipelago/meshtastic:/var/lib/meshtasticd:Z`
-  - `AddDevice=/dev/ttyUSB0`
-  - `HealthCmd=test -f /var/lib/meshtasticd/config.yaml`
- The daemon starts and accepts TCP API connections on port `4403`.
- Full lifecycle passed on `.198`: install, stop, start, restart, uninstall with preserved data, and reinstall.
- A persisted `config.yaml` is required. The release path is now the generic `app.files` manifest primitive rather than a Meshtastic-specific backend hook, and this has been verified live on `.198` by deleting the file and proving `package.restart` recreates it from the manifest.
-
-## Release Blockers
-
- Continue monitoring the current optimized release backend on `.198`; the previously observed release-binary segfault is not reproducing with hash `95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de`.
- `system.disk-cleanup` now handles journal, backend-backup, legacy backend rollback, and web UI rollback retention while intentionally skipping Podman image/volume prune because Podman store commands can hang on `.198` under current load. Diagnose Podman store health separately from the release cleanup path.
- Release image probes have been further quarantined from the fragile Podman store commands and deployed to `.198` on backend hash `7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412`: runtime, legacy install, and companion image checks now use bounded targeted `podman image inspect` instead of `podman image exists` or `podman images -q`. Focused and broad non-destructive lifecycle validation passed on the deployed hash.
- Podman socket/runtime health remains a release blocker: `package.restart jellyfin` stopped the container but failed to complete because Podman reported `Cannot connect to Podman socket at /run/user/1000/podman/podman.sock: Permission denied`; `package.start jellyfin` recovered the app and the focused lifecycle passed afterward.
- Release-focused catalog drift now has zero missing catalog/manifest entries and zero metadata drift after generating catalog metadata from manifests.
- Backend-restart validation passed. Host-reboot validation is currently failed/pending due to post-reboot IndeeHub recovery. Reboot retests should run only after an explicit release checkpoint/approval.
- Local code-review/refactor cleanup gate has full local validation coverage now:
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago` passed (`688` tests);
-  - all other workspace packages check/test clean;
-  - frontend type-check/tests/build passed;
-  - release build, catalog drift, catalog idempotence, Python helper compile, and whitespace checks passed.
- Before `1.8-alpha` release:
-  - deploy the post-reboot recovery fixes;
-  - prove focused IndeeHub lifecycle with Nostr signer injection intact;
-  - update the app packaging/developer docs so `docs/APP-PACKAGING-MIGRATION-PLAN.md` and `docs/app-developer-guide.md` match the current manifest/runtime contract and release-quality lifecycle expectations;
-  - complete the required refactor/remove-dead-code gate after correctness validation: remove obsolete transitional code, stale per-app hacks, duplicate lifecycle paths, and misleading compatibility fallbacks, then rerun release validation;
-  - require at least 3 consecutive clean post-fix reboots with broad non-destructive lifecycle green after each;
-  - prefer 5 consecutive clean reboots for production-release confidence;
-  - cut and smoke-test the `1.8-alpha` ISO.
-
-## Bottom Line
-
-We are working toward the intended goal: better than Umbrel/StartOS by making app behavior declarative and registry/manifest-owned. The migration is substantially advanced, Meshtastic manifest-owned config generation is verified live, catalog metadata is generated from manifests, disk cleanup/backup retention is in place without Podman prune risk, and full local backend/frontend workspace validation has been green. Remaining follow-up for `1.8-alpha` is post-reboot recovery validation, especially IndeeHub plus Nostr signer behavior, repeated reboot passes, ISO cut/smoke test, separate Podman socket/store-health diagnosis, and optional local cleanup of root-owned frontend dependencies before rerunning `npm ci`.
--- a/docs/NEXT_TERMINAL_HANDOFF.md
+++ b/docs/NEXT_TERMINAL_HANDOFF.md
@ -1,572 +0,0 @@
-# Next Terminal Handoff - Archipelago `1.8-alpha`
-
-Last updated: 2026-06-11 00:17 America/New_York
-
-## Resume Prompt
-
-Paste this into the next terminal/session:
-
-> Continue Archipelago `1.8-alpha` release hardening from `/home/archipelago/Projects/archy`. First read `docs/NEXT_TERMINAL_HANDOFF.md`, then `docs/RESUME.md`, `docs/CONTAINER_LIFECYCLE_HANDOFF.md`, `docs/MIGRATION_STATUS_REPORT.md`, and `docs/1.8-alpha-improvements-tracker.md`. Active validation node is `.198` at `192.168.1.198` with user `archipelago` and password `password123`. Keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive for deterministic validation. Do not run broad Podman store/image cleanup commands on `.198` (`podman prune`, `podman image list`, `podman system df`, broad image-exists/list/store-wide cleanup); the store/control path is known to hang under load. Preserve app data. Latest deployed backend hash on `.198` is `159e0daf13fca2df7e831122cb0e6c84223a7e5b7433f5dd0b7eec263233e228`. Fedimint Guardian public launch is fixed: `8175` serves the styled wait/proxy UI with real background/icon assets and proxies to backend Guardian on `8177`; `package.restart fedimint` now returns immediately and settled with both services active. Latest local-only tracker pass added uninstall preserve/delete-data UI, companion APK QR/download, setup instructions rendering, Fleet/Bitcoin receive-state loading improvements, Nextcloud false-update work, PhotoPrism credential fallback, and removed the Spotlight AI coming-soon block. Continue with the broader rootless Podman lifecycle/control-plane blocker, My Apps state truthfulness, progress UX, remaining in-progress tracker items, full lifecycle, clean reboot iterations, ISO cut, and ISO smoke test.
-
-## Current Goal
-
-Cut Archipelago `1.8-alpha`, including a ready-to-test ISO image.
-
-Release status is still not green. The remaining work is mostly systemic hardening and final gates, not basic app catalog wiring.
-
-The user improvement list in `docs/1.8-alpha-improvements-tracker.md` is part of
-the same release and next ISO cut. Keep that tracker updated as items move from
-`todo` to `in-progress`, `blocked`, `done`, or explicit release deferral.
-
-## Active Session Checkpoint - 2026-06-10 05:48 EDT
-
-New terminal resumed from this handoff. No `.198` host actions have been run in
-this resumed pass yet.
-
-Resume-save checkpoint, 2026-06-10 08:32 EDT: progress is saved in this handoff
-and `docs/1.8-alpha-improvements-tracker.md`. No `.198` host actions were run
-after the 05:48 checkpoint, no dev server was intentionally left running, and no
-long-running validation command is expected to still be active from this pass.
-The user explicitly wants the fixes backlog continued, not app migration work,
-unless they redirect. Start a resumed session by re-reading the tracker row
-`Make tabs info load quickly or show loading states`, then continue the slow
-panel audit or move to the next unresolved fixes-backlog row.
-
-Resume-save checkpoint, 2026-06-10 23:15 EDT: continued only frontend fixes
-backlog work and avoided Bitcoin/Tor RPC/backend paths because another agent is
-working there. No `.198` host actions were run, no dev server was intentionally
-left running, and no long-running validation command is expected to still be
-active from this pass.
-
-Resume-save checkpoint, 2026-06-11 00:17 EDT: continued the fixes backlog only,
-not app migration. Avoid Bitcoin/Tor RPC/backend work because a separate agent
-is working there. The latest local change fixes the header responsiveness
-regression the user flagged: primary My Apps/App Store/Websites navigation is
-restored to persistent desktop tabs at `md+` on My Apps, Discover, and
-Marketplace; desktop primary dropdowns were removed; mobile dropdown behavior
-remains; App Store category collapse is delayed by starting uncollapsed and
-using a smaller header gap/search reserve; My Apps desktop category dropdown was
-removed. Validation passed `npm run type-check`,
-`npm test -- --run src/views/marketplace/__tests__/MarketplaceAppCard.test.ts src/views/apps/__tests__/appsConfig.test.ts`,
-and scoped `git diff --check`. Browser smoke against the already-running local
-Vite/mock session (`http://127.0.0.1:8102` and mock backend `5959`) is still
-pending. Leave that existing session alone unless it has already exited.
-
-Exact first step for this pass:
-
-1. Update the handoff docs with this fresh checkpoint.
-2. Rerun local resume gates that were pending after the 05:30 checkpoint:
-   `git diff --check` and the focused Rust image-version test for the
-   Nextcloud false-update work.
-3. If local gates are clean, continue the rootless Podman lifecycle/control-plane
-   blocker by inspecting the backend scanner/backoff and package stop/start/
-   restart paths before touching `.198`.
-
-Progress in this resumed pass:
-
- `git diff --check` passed.
- `/tmp` has sufficient build headroom for focused Rust validation
-  (`/tmp` was 14% used at the start of the pass).
- Focused Rust validation for Nextcloud/image-version work is still
-  inconclusive, not green:
-  `env CARGO_INCREMENTAL=0 CARGO_TARGET_DIR=/tmp/archy-cargo-image-versions cargo test --manifest-path core/Cargo.toml -p archipelago container::image_versions::tests`
-  compiled through the `archipelago` crate, then the tool PTY stayed open with
-  no active `cargo`, `rustc`, or linker process visible in `ps`.
- A bounded retry using the normal workspace target also did not finish:
-  `timeout 300s cargo test --manifest-path core/Cargo.toml -p archipelago container::image_versions::tests`
-  exited `124` after compiling the `archipelago` test target without reaching
-  test output. Keep the Nextcloud false-update row `in-progress`.
- Found and fixed a lifecycle asymmetry in
-  `core/archipelago/src/api/rpc/package/runtime.rs`: `package.stop` claimed to
-  return immediately but single-orchestrator apps still stopped synchronously
-  before responding. The local change now lets migrated single-orchestrator apps
-  return `{"status":"stopping"}` immediately and finish stop in the background,
-  matching start/restart behavior. This is not deployed yet and still needs
-  local validation.
- Separate UI-only pass on port-review track:
-  - My Apps now preserves the last known backend package list when a later
-    scanner/backoff update reports `containers-scanned=false` with an empty
-    package map;
-  - the page shows `Refreshing container state. Showing the last known app list
-    until the scan finishes.` above the app grid while cached app state is being
-    rendered;
-  - this touched only `neode-ui` UI files and this handoff/tracker note, so it
-    should not conflict with the backend app migration/control-plane pass;
-  - focused validation passed:
-    `npm test -- --run src/views/apps/__tests__/appPackageCache.test.ts` and
-    `npm run type-check`.
-  - Web5 Shared Content My Content tab now keeps the current content list
-    visible during refresh/failure and shows `Refreshing shared content...`;
-  - Web5 Shared Content Browse Peers tab now keeps the current peer content list
-    visible while refreshing the same peer, and shows `Refreshing peer content...`
-    instead of replacing the tab with a full loading panel;
-  - switching to a different peer still clears stale content and shows the full
-    connecting state;
-  - focused validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5SharedContent.test.ts` and
-    `npm run type-check`.
-  - Local review services are running for user review:
-    Vite `http://localhost:8102/` / `http://192.168.1.116:8102/` and mock
-    backend `http://localhost:5959`; `curl` probes returned HTTP `200` for both
-    the Vite root and proxied `server.get-state`.
- `cargo fmt --manifest-path core/Cargo.toml --all --check` passed after the
-  stop-path fix.
- Backend compile validation for the stop-path fix passed:
-  `env CARGO_TARGET_DIR=/tmp/archy-cargo-runtime-check cargo check --manifest-path core/Cargo.toml -p archipelago --bin archipelago`.
-  The first check session also eventually returned success after the bounded
-  rerun waited on its build-directory lock.
- `git diff --check` passed again after the stop-path edit and doc updates.
- Follow-up inspection confirmed the lower-level Quadlet/orchestrator stop path
-  is already bounded: `quadlet::stop_service` uses timed `systemctl --user stop`
-  with app-scoped kill/reset recovery, and the runtime fallback treats missing
-  containers as success. No additional lower-level stop change was made in this
-  pass.
- Latest backlog-fix pass stayed on the fixes tracker, not new app migration:
-  - backend `package.credentials` now returns manifest-backed PhotoPrism
-    credentials (`admin` / `archipelago`) directly, matching the existing UI
-    fallback;
-  - My Apps and mobile icon-grid credential pre-launch modals are centered
-    vertically on mobile instead of behaving like bottom sheets;
-  - validation passed:
-    `npm test -- --run src/views/apps/__tests__/appCredentials.test.ts src/views/apps/__tests__/AppIconGrid.test.ts`,
-    `npm run type-check`,
-    `env CARGO_TARGET_DIR=/tmp/archy-cargo-runtime-check timeout 300s cargo check --manifest-path core/Cargo.toml -p archipelago --bin archipelago`,
-    `cargo fmt --manifest-path core/Cargo.toml --all --check`, and
-    `git diff --check`.
- Focused Nextcloud/image-version Rust test is still not green:
-  `env CARGO_INCREMENTAL=0 CARGO_TARGET_DIR=/tmp/archy-cargo-image-versions-2 timeout 600s cargo test --manifest-path core/Cargo.toml -p archipelago container::image_versions::tests -- --nocapture`
-  again exited `124` after compiling into the `archipelago` crate without
-  reaching test output. Keep that tracker row `in-progress`.
- Continued the tab loading-state backlog:
-  - Web5 Connected Nodes Messages and Requests tabs keep populated lists
-    visible during refresh or refresh failure;
-  - Web5 Identities keeps the current identity list visible during refresh or
-    refresh failure and shows `Refreshing identities...`;
-  - Web5 DWN message browsing keeps stored messages visible during refresh or
-    refresh failure and shows `Refreshing messages...`;
-  - validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5ConnectedNodes.test.ts src/views/web5/__tests__/Web5Identities.test.ts src/views/web5/__tests__/Web5DWN.test.ts`
-    and `npm run type-check`.
- Continued the same tab/loading-state backlog on Server networking:
-  - Server Network overview keeps current values visible during refresh/failure
-    and shows `Refreshing network...`;
-  - Server Network Interfaces keeps current detected interfaces visible during
-    refresh/failure and shows `Refreshing interfaces...`;
-  - Server Tor Services keeps existing hidden-service rows visible during
-    refresh/failure and shows `Refreshing Tor services...`;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/ServerNetworkRefresh.test.ts` and
-    `npm run type-check`.
- Continued the same loading-state backlog on Credentials:
-  - the Credentials list keeps existing credential rows visible during
-    refresh/failure and shows `Refreshing credentials...`;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/CredentialsRefresh.test.ts src/views/__tests__/ServerNetworkRefresh.test.ts`
-    and `npm run type-check`.
- Continued the same loading-state backlog on Lightning Channels:
-  - the channels list keeps existing channels visible during refresh/failure
-    and shows `Refreshing channels...`;
-  - validation passed:
-    `npm test -- --run src/views/apps/__tests__/LightningChannels.test.ts src/views/__tests__/CredentialsRefresh.test.ts src/views/__tests__/ServerNetworkRefresh.test.ts`
-    and `npm run type-check`.
- Continued the same loading-state backlog on Peer Files:
-  - the peer catalog keeps existing file cards visible during Tor
-    refresh/failure and shows `Refreshing peer files...`;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/PeerFilesRefresh.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Cloud peer cards:
-  - Cloud keeps existing peer cards visible during federation peer-list
-    refresh/failure and shows `Refreshing peer nodes...`;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/CloudPeersRefresh.test.ts src/views/__tests__/PeerFilesRefresh.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on the Web5 Verifiable Credentials
-  summary:
-  - the summary keeps existing credential rows visible during refresh/failure
-    and shows `Refreshing credentials...`;
-  - validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5CredentialsSummary.test.ts src/views/__tests__/CloudPeersRefresh.test.ts src/views/__tests__/PeerFilesRefresh.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Web5 Nostr Relays:
-  - relay stats stay visible during refresh/failure and show
-    `Refreshing relays...`;
-  - validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5NostrRelays.test.ts src/views/web5/__tests__/Web5CredentialsSummary.test.ts src/views/__tests__/CloudPeersRefresh.test.ts src/views/__tests__/PeerFilesRefresh.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Web5 Domains:
-  - registered-name counts stay visible during refresh/failure and show
-    `Refreshing domains...`;
-  - validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5Domains.test.ts src/views/web5/__tests__/Web5NostrRelays.test.ts src/views/web5/__tests__/Web5CredentialsSummary.test.ts src/views/__tests__/CloudPeersRefresh.test.ts src/views/__tests__/PeerFilesRefresh.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Settings Backups:
-  - existing backup rows stay visible during refresh/failure and show
-    `Refreshing backups...`;
-  - validation passed:
-    `npm test -- --run src/views/settings/__tests__/BackupSection.test.ts src/views/web5/__tests__/Web5Domains.test.ts src/views/web5/__tests__/Web5NostrRelays.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Settings Transport Preferences:
-  - existing preference controls stay visible during refresh/failure and show
-    `Refreshing transport preferences...`;
-  - validation passed:
-    `npm test -- --run src/views/settings/__tests__/TransportPrefsCard.test.ts src/views/settings/__tests__/BackupSection.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Settings VPN status:
-  - current VPN connection details stay visible during refresh/failure and show
-    `Refreshing VPN status...`;
-  - validation passed:
-    `npm test -- --run src/views/settings/__tests__/VpnStatusSection.test.ts src/views/settings/__tests__/TransportPrefsCard.test.ts src/views/settings/__tests__/BackupSection.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the same loading-state backlog on Web5 Federation:
-  - summary node counts and node DID stay visible during refresh/failure and
-    show `Refreshing federation...`;
-  - validation passed:
-    `npm test -- --run src/views/web5/__tests__/Web5Federation.test.ts`,
-    `npm run type-check`, and `git diff --check`.
- Continued the Mesh map denied-location backlog:
-  - added component coverage that browser geolocation denial remains optional
-    and tells the user peer positions can still appear;
-  - validation passed:
-    `npm test -- --run src/components/__tests__/MeshMap.test.ts`,
-    `npm run type-check`, and `git diff --check`.
-  - row remains `in-progress` until browser smoke validates denied location
-    with a real peer coordinate message.
- Continued the companion/tab-app backlog:
-  - mobile app-session keeps apps that require a new tab inside the mobile
-    session fallback instead of auto-opening an external tab and closing;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/AppSessionMobileNewTab.test.ts src/views/appSession/__tests__/appSessionConfig.test.ts src/stores/__tests__/appLauncher.test.ts`,
-    `npm run type-check`, and `git diff --check`.
-  - row remains `in-progress` until broader companion smoke testing is done.
- Continued the Nostr Discoverable Nodes UI backlog:
-  - Discover modal keeps existing discovered rows visible during relay
-    refresh/failure and shows `Searching relays...`;
-  - validation passed:
-    `npm test -- --run src/views/federation/__tests__/DiscoverModal.test.ts`,
-    `npm run type-check`, and `git diff --check`.
-  - row remains `in-progress` until live relay/trust validation is done.
- Continued the App Store screenshots backlog:
-  - Marketplace App Details and installed App Details no longer show fake
-    screenshot placeholder tiles when no screenshot metadata exists;
-  - both views now render real screenshot URLs when metadata is provided as
-    strings or `{ src, alt }` objects;
-  - validation passed:
-    `npm test -- --run src/views/appDetails/__tests__/AppContentSection.test.ts src/composables/__tests__/useMarketplaceApp.test.ts`,
-    `npm run type-check`, and `git diff --check`;
-  - row remains `in-progress` until real screenshot assets/metadata are added.
- Continued the Home/App Store recommendations backlog:
-  - Home now shows an App Store recommendations card with up to three
-    uninstalled core/recommended marketplace apps;
-  - the selector respects installed aliases, so recommended apps drop out once
-    installed and then rely on normal My Apps/Home behavior;
-  - card clicks reuse the existing Marketplace App Details handoff;
-  - card animation ordering was tightened so Home cards have a stable stagger
-    sequence as the recommendations card appears/disappears;
-  - validation passed:
-    `npm test -- --run src/views/home/__tests__/homeRecommendations.test.ts`,
-    `npm run type-check`,
-    `git diff --check`, and
-    `ARCHY_BASE_URL=http://127.0.0.1:8103 npx playwright test e2e/visual-regression.spec.ts -g 'home / dashboard' --project=chromium`;
-  - temporary Vite on `8103` was stopped after the smoke. An older local
-    dev/mock session on `8102`/`5959` was already present and was left alone.
-  - tracker row is `done`.
- Home layout follow-up:
-  - Cloud was moved back into the second card slot;
-  - Recommended Apps moved into Cloud's previous position;
-  - Quick Start now lives inside the dashboard grid next to Wallet, with
-    stacked goal buttons, instead of rendering as a separate odd-width row;
-  - validation passed:
-    `npm test -- --run src/views/home/__tests__/homeRecommendations.test.ts`,
-    `npm run type-check`,
-    `git diff --check`, and
-    `ARCHY_BASE_URL=http://127.0.0.1:8102 npx playwright test e2e/visual-regression.spec.ts -g 'home / dashboard' --project=chromium`.
- Continued the Easy Mode experience backlog:
-  - goal configure steps now route to their owning app/screen instead of
-    silently completing without navigation;
-  - verify steps now show `Check & Continue`, so goals that start with a verify
-    step are no longer stuck without an active action;
-  - configure/info/verify actions start goal progress before completing the
-    current step;
-  - validation passed:
-    `npm test -- --run src/views/goals/__tests__/goalStepActions.test.ts src/stores/__tests__/goals.test.ts`,
-    `npm run type-check`, and `git diff --check`;
-  - tracker row is `in-progress` because broader Easy Mode product scope still
-    needs review.
- Continued the setup screens/function/flow backlog:
-  - onboarding setup choice now shows only usable paths, Fresh Start and
-    Restore from Seed;
-  - removed the disabled `Connect Existing (Coming Soon)` option;
-  - validation passed:
-    `npm test -- --run src/views/__tests__/OnboardingOptions.test.ts src/composables/__tests__/useOnboarding.test.ts`,
-    `npm run type-check`, and `git diff --check`;
-  - tracker row is `in-progress` because broader onboarding/setup audit still
-    needs review.
-
-## Latest Local Checkpoint - 2026-06-10 05:30 EDT
-
-User paused work to switch machines. No dev server or validation command should
-be intentionally left running from this checkpoint.
-
-Latest local-only release-tracker work since the older `.198` handoff:
-
- Uninstall/data reset:
-  - My Apps and App Details uninstall dialogs now include `Delete app data and reset it`;
-  - unchecked preserves app data and sends `preserve_data=true`;
-  - checked sends `preserve_data=false`;
-  - covered by `AppsUninstallModal.test.ts`, `rpc-client.test.ts`, type-check, and `git diff --check`;
-  - tracker row is `done`.
- Companion APK:
-  - companion intro modal uses `VITE_COMPANION_APK_URL` or `/packages/archipelago-companion.apk.zip`;
-  - desktop shows a centered QR image generated with the same `qrcode` library used by wallet flows;
-  - mobile shows a direct download button;
-  - visible close button restored;
-  - APK exists at `neode-ui/public/packages/archipelago-companion.apk.zip`;
-  - tracker row is `done`.
- Setup instructions:
-  - App Details sidebar renders `static-files.instructions` when non-empty;
-  - covered by `AppSidebar.test.ts`, type-check, and `git diff --check`;
-  - tracker row is `done`.
- Fleet / tab loading:
-  - Fleet auto-refresh header/sort controls were tightened;
-  - node history no longer blanks during refresh and now shows `Refreshing history...`;
-  - covered by `useFleetData.test.ts`, type-check, and `git diff --check`;
-  - tracker row remains `in-progress` pending broader slow-tab audit.
- Bitcoin receive readiness:
-  - receive modals show a live `Checking Lightning wallet readiness...` message while on-chain address generation is in flight;
-  - shared helper now distinguishes LND REST/newaddress transport failures;
-  - covered by `bitcoinReceive.test.ts`, type-check, and `git diff --check`;
-  - tracker row remains `in-progress` pending live wallet-state smoke test.
- Nextcloud false update:
-  - Nextcloud manifest/catalog/static UI metadata moved from `28` to pinned `29`;
-  - update comparison now ignores registry-host-only image changes while reporting same-repo tag drift;
-  - `python3 scripts/check-app-catalog-drift.py --release --strict` passed;
-  - `cargo test -p archipelago container::image_versions::tests` from `core/` failed first with a Rust linker/incremental artifact issue after `/tmp` was full, then the non-incremental retry was killed because it ran too long;
-  - old `/tmp/archy-cargo-*` build-cache directories were removed and `/tmp` recovered to about 14% used;
-  - tracker row is `in-progress`; rerun the focused Rust test before marking done.
- Dead/coming-soon UI:
-  - removed the non-interactive Spotlight AI Assistant coming-soon block;
-  - verified no active UI `Coming soon` strings remain outside historical release-note text;
-  - type-check passed and `git diff --check` passed;
-  - tracker row is `done`.
- No-registration credentials:
-  - added PhotoPrism fallback credentials from its manifest (`admin` / `archipelago`);
-  - did not add Grafana because its `GRAFANA_ADMIN_PASSWORD` is not resolved to a known local secret/default in the repo;
-  - `npm test -- --run src/views/apps/__tests__/appCredentials.test.ts` passed;
-  - `npm run type-check` passed;
-  - tracker row still `in-progress` because other no-registration apps still need inventory.
-
-Most recent validations before pause:
-
- `npm run type-check` passed after the PhotoPrism credential fallback.
- `npm test -- --run src/views/apps/__tests__/appCredentials.test.ts` passed.
- `git diff --check` passed after the Spotlight cleanup and before the PhotoPrism fallback; rerun it after resuming.
- `python3 scripts/check-app-catalog-drift.py --release --strict` passed during the Nextcloud pass.
- Backend Rust focused validation for image versions is still not clean because of the local linker/incremental artifact failure and the killed retry; rerun from `core/` when convenient.
-
-## Latest Known `.198` State
-
- Host: `192.168.1.198`.
- Backend deployed: `/usr/local/bin/archipelago` sha256 `159e0daf13fca2df7e831122cb0e6c84223a7e5b7433f5dd0b7eec263233e228`.
- `archipelago.service`: active after deploy.
- `archipelago-doctor.timer`: inactive.
- `archipelago-reconcile.timer`: inactive.
- No reboot validation should be started yet.
-
-## What Was Just Done
-
- Investigated current Fedimint Guardian UI report:
-  - live `.198` RPC reports `fedimint` as `starting` and `container-health {"fedimint":"starting"}`;
-  - direct `http://192.168.1.198:8175/` returns HTTP `000` because the manifest wrapper has not exec'd `fedimintd` yet;
-  - `bitcoin-knots` is `running` and `http://192.168.1.198:8334/` returns HTTP `200`;
-  - `bitcoin.status` RPC returned an operation-failed error during the check, consistent with the current Bitcoin-dependent-app wait-state problem.
- Added frontend Fedimint-specific wait-state copy:
-  - My Apps/App card now says `Waiting for Bitcoin to finish initial sync before Guardian starts.` when Fedimint is starting or running with `health=starting`;
-  - App session fallback title now says `Waiting for Bitcoin sync` instead of generic `App not reachable` for that state.
- Validated frontend changes:
-  - `npm test -- --run src/views/apps/__tests__/appsConfig.test.ts` passed (`7` tests);
-  - `npm run type-check` passed;
-  - `npm run build` passed.
- Deployed rebuilt static frontend to `.198` only:
-  - preserved `aiui/` and `claude-login.html`;
-  - backed up previous web root at `/opt/archipelago/rollback/web-ui-fedimint-ui-20260610-042927.tar`;
-  - reloaded nginx;
-  - confirmed deployed assets contain the new Fedimint copy.
- Fixed Fedimint Guardian launch on `.198` while Bitcoin is still syncing:
-  - added `docker/fedimint-ui`, an nginx wait/proxy companion;
-  - changed Fedimint backend manifest so real Guardian UI maps to host `8177` instead of the public launch port;
-  - public launch port `8175` is now owned by `archy-fedimint-ui`, which serves `Waiting for Bitcoin sync` until `fedimintd` binds behind it;
-  - fixed the Fedimint wait command to avoid `printf '%s'` in Quadlet `Exec=` because systemd expands `%s` to the user shell (`/bin/bash`);
-  - live `.198` `fedimint.service` unit has `TimeoutStartSec=infinity` so systemd does not kill the intentional Bitcoin-sync wait loop;
-  - rebuilt and deployed frontend static files so Fedimint remains launchable while `health=starting`;
-  - confirmed `http://192.168.1.198:8175/` returns HTTP `200` with `Waiting for Bitcoin sync`.
- Restyled the Fedimint wait/proxy page:
-  - `docker/fedimint-ui/index.html` now uses Archipelago-style `glass-card`, app icon block, Montserrat-like heading stack, orange focus/glow accents, and yellow starting badge styling;
-  - rebuilt `localhost/fedimint-ui:latest` on `.198`;
-  - restarting `archy-fedimint-ui.service` hit the known rootless Podman cleanup slowness and left the unit temporarily `deactivating`;
-  - recovered with app-scoped `systemctl --user kill --kill-whom=all -s SIGKILL archy-fedimint-ui.service`, `reset-failed`, and `start`;
-  - final LAN validation: `http://192.168.1.198:8175/` returns HTTP `200`, size `6419`, and contains `glass-card`, `app-icon`, `Archipelago App`, and `Waiting for Bitcoin sync`.
- Updated the Fedimint wait/proxy page again per design feedback:
-  - uses the Bitcoin custom UI's `/assets/img/bg-network.jpg` full-screen background + dark overlay pattern;
-  - uses the real Fedimint icon inside the Bitcoin custom UI `logo-gradient-border` treatment instead of text initials;
-  - copied those assets into `docker/fedimint-ui/assets/`;
-  - rebuilt `localhost/fedimint-ui:latest` on `.198`;
-  - fixed nginx routing so `/assets/...` is served statically instead of being proxied to the not-yet-running Guardian backend;
-  - corrected the companion page to reference `fedimint.jpg` because the catalog icon bytes are JPEG despite the old `.png` extension;
-  - final LAN validation: `http://192.168.1.198:8175/` returns HTTP `200`, size `11328`; `/assets/img/app-icons/fedimint.jpg` returns `200 image/jpeg`; `/assets/img/bg-network.jpg` returns `200 image/jpeg`;
-  - Playwright render validation confirmed title `Fedimint Guardian`, status `Waiting for Bitcoin sync`, background URL `/assets/img/bg-network.jpg`, and icon natural width `860`.
- Hardened Fedimint/backend lifecycle enough for this path:
-  - generated Quadlet services now include `TimeoutStartSec=0` so systemd does not kill dependency-gated container entrypoints while they wait for Bitcoin IBD;
-  - `package.restart` now returns `{"status":"restarting"}` immediately instead of blocking the RPC call for minutes in the single-orchestrator path;
-  - `quadlet::restart_service` now uses bounded stop/start, app-scoped kill/reset recovery, and settle waits instead of opaque `systemctl restart`;
-  - deployed backend hash `159e0daf13fca2df7e831122cb0e6c84223a7e5b7433f5dd0b7eec263233e228` to `.198`;
-  - backup made at `/opt/archipelago/rollback/archipelago-before-quadlet-timeout0-20260610-082535`;
-  - `package.restart fedimint` returned `{"status":"restarting"}` in `0s`;
-  - restart observation: `8175` stayed HTTP `200` throughout; generated `fedimint.container` gained `TimeoutStartSec=0`; `fedimint.service` and `archy-fedimint-ui.service` settled `active`; ports `8175` and `8177` listened.
- Final Fedimint live validation after restart:
-  - `container-health` returned `{"fedimint":"healthy"}`;
-  - `container-list` returned `fedimint` `state:"running"` and `lan_address:"http://localhost:8175"`;
-  - services: `fedimint.service` active, `archy-fedimint-ui.service` active;
-  - unit contains `TimeoutStartSec=0` at line `42`;
-  - public wait/proxy UI and both image assets returned `200`.
- Fedimint live rollback references:
-  - previous frontend backup: `/opt/archipelago/rollback/web-ui-fedimint-guardian-launch-20260610-045949.tar`;
-  - previous Fedimint Quadlet backup: `/home/archipelago/.config/containers/systemd/fedimint.container.guardian-fix-rewrite-20260610-050607.bak`.
- Earlier backend hash `7f58da80063f58574675256913ac9cddf131e65d8935015748a70adffc228f83` was superseded by `159e0daf13fca2df7e831122cb0e6c84223a7e5b7433f5dd0b7eec263233e228`.
- Added explicit release gates:
-  - app packaging docs must match current manifest/runtime contract before `1.8-alpha`;
-  - refactor/remove-dead-code is mandatory before `1.8-alpha`, after correctness validation and before final ISO/release gates.
- Validated IndeeHub:
-  - `container-list` reported `indeedhub` running;
-  - `container-health` returned `{"indeedhub":"healthy"}`;
-  - `http://192.168.1.198:7778/` returned HTTP `200`;
-  - `http://192.168.1.198:7778/nostr-provider.js` returned HTTP `200` and contains the Archipelago NIP-07/NIP-98 provider shim.
- Validated Immich launch:
-  - `http://192.168.1.198:2283/` returned HTTP `200`;
-  - one `container-health` check returned `{"immich":"unknown"}`, so health truthfulness still needs follow-up.
- Fixed Tailscale launch UI:
-  - patched `app-catalog/catalog.json`, `neode-ui/public/catalog.json`, and `scripts/first-boot-containers.sh`;
-  - command now waits for `/var/run/tailscale/tailscaled.sock` before starting `tailscale web`;
-  - copied updated catalog to `/opt/archipelago/web-ui/catalog.json` on `.198`;
-  - patched the live generated Tailscale `.container` unit and restarted only `tailscale.service`;
-  - confirmed `container-list` reports Tailscale running;
-  - confirmed `container-health` returns `{"tailscale":"healthy"}`;
-  - confirmed `http://192.168.1.198:8240/` returns HTTP `200` with Tailscale UI content.
-
-## Important Caveat
-
-Tailscale launch is fixed, but Tailscale lifecycle is not fully passing:
-
- `package.restart tailscale` failed through RPC with `podman ps timed out while listing containers`.
- Manual app-scoped restart showed old container stop needed SIGKILL and Podman cleanup took roughly 2 minutes.
- Logs still showed `podman ps timed out`, `podman stats timed out`, scan backoff, and slow cleanup.
-
-This confirms the active blocker is the rootless Podman control-plane/lifecycle path, not just individual app launch URLs.
-
-## Active Blockers
-
- Rootless Podman/control-plane responsiveness:
-  - `podman ps` and cleanup paths time out;
-  - backend scan/backoff causes stale or slow UI state;
-  - app stop/start/restart can look frozen or fail through RPC.
- My Apps state truthfulness:
-  - do not show false empty/no-apps while scanner/Podman is in backoff;
-  - preserve last-known apps and show explicit stale/checking state.
- Progress UX:
-  - install/uninstall/start/stop/restart must show meaningful phase progress and not appear frozen.
- Immich health truthfulness:
-  - HTTP launch works, but health may still report `unknown`.
- Portainer:
-  - HTTP `9000` returned `200`;
-  - user still needs to retry environment wizard and confirm `/var/run/docker.sock` works.
- Fedimint:
-  - public Guardian launch URL now loads on `8175` even while Bitcoin is in IBD;
-  - `archy-fedimint-ui` owns `8175` and proxies to the real Guardian backend on `8177` when `fedimintd` eventually starts;
-  - durable manifest/companion/frontend/backend changes are now deployed on `.198`;
-  - `package.restart fedimint` fast-returned and settled active with `TimeoutStartSec=0`, but keep Fedimint in the broader lifecycle matrix because rootless Podman cleanup slowness remains a systemic blocker.
- Reboot validation:
-  - require at least 3 clean consecutive post-fix reboots with broad lifecycle green after each;
-  - prefer 5 clean reboots;
-  - do not start until lifecycle/control-plane is stable.
- App packaging docs:
-  - aligned `docs/APP-PACKAGING-MIGRATION-PLAN.md` and `docs/app-developer-guide.md` with the current manifest/runtime contract.
- Refactor/remove-dead-code:
-  - required before `1.8-alpha`;
-  - remove stale per-app hacks, duplicate lifecycle paths, stale fallback metadata, misleading compatibility shims;
-  - rerun release gates afterward.
-
-## Local Validation Already Run
-
- `bash -n tests/lifecycle/remote-lifecycle.sh` passed.
- `bash -n scripts/first-boot-containers.sh tests/lifecycle/remote-lifecycle.sh` passed.
- `cargo fmt --manifest-path core/Cargo.toml --all` was run.
- `cargo test --manifest-path core/Cargo.toml -p archipelago-container` passed (`45` tests).
- `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container` passed.
- `python3 scripts/check-app-catalog-drift.py --release --strict` passed.
- `cmp -s app-catalog/catalog.json neode-ui/public/catalog.json` passed.
- `git diff --check` passed.
- `npm test -- --run src/views/apps/__tests__/appsConfig.test.ts` passed.
- `npm run type-check` passed.
- `npm run build` passed.
- `python3 scripts/check-app-catalog-drift.py --release --strict` passed after Fedimint manifest changes.
- `git diff --check` passed for Fedimint manifest, companion, frontend, and new `docker/fedimint-ui` files.
- `cargo fmt --manifest-path core/Cargo.toml --all` passed.
- `CARGO_TARGET_DIR=/tmp/archy-cargo-check-quadlet cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container` passed after Quadlet/restart changes.
- `CARGO_TARGET_DIR=/tmp/archy-cargo-final-quadlet cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release` produced the deployed backend binary (tool PTY heartbeat wrapper became stale after link; artifact hash was validated separately before deploy).
- Live Fedimint restart validation passed on `.198`:
-  - `package.restart fedimint` returned `{"status":"restarting"}` immediately;
-  - `8175` remained HTTP `200`;
-  - `fedimint.service` and `archy-fedimint-ui.service` settled `active`;
-  - `container-health fedimint` returned `healthy`.
- `cargo test --manifest-path core/Cargo.toml -p archipelago companion::tests` compiled then the tool PTY stuck with no active `cargo`/`rustc` process visible; treat as inconclusive, not failed.
- Filtered `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago indeedhub` appeared wedged in the tool PTY after compilation started; no local cargo/rustc worker remained visible. Treat as inconclusive, not failed.
-
-## Immediate Next Step
-
-Do not reboot yet.
-
-Start with the rootless Podman lifecycle/control-plane blocker:
-
-1. Inspect the backend stop/start/restart path around `package.restart`, scanner backoff, and `podman ps` dependency.
-2. Make stop/restart tolerate slow cleanup without wedging RPC/UI state.
-3. Keep last-known app state during scanner backoff.
-4. Revalidate focused apps on `.198`: `tailscale`, `indeedhub`, `immich`, `portainer`, `vaultwarden`, `botfights`; keep `fedimint` in the matrix but its focused Guardian launch/restart path is currently green.
-5. Only after focused lifecycle is clean, run broad non-destructive lifecycle.
-6. Only after that, begin 3/5 reboot validation.
-
-## Files Touched In Last Mini-Pass
-
- `docs/NEXT_TERMINAL_HANDOFF.md` - this file.
- `neode-ui/src/views/apps/appsConfig.ts` - Fedimint launch-blocked reason helper.
- `neode-ui/src/views/apps/AppCard.vue` - show Fedimint Bitcoin-sync wait copy on app cards.
- `neode-ui/src/views/AppSession.vue` - pass app-specific blocked reason into app session.
- `neode-ui/src/views/appSession/AppSessionFrame.vue` - show app-specific blocked title/reason instead of generic unreachable fallback.
- `neode-ui/src/views/apps/__tests__/appsConfig.test.ts` - regression coverage for Fedimint wait-state copy.
- `apps/fedimint/manifest.yml` - backend real Guardian UI now maps host `8177` and wait command avoids systemd `%` expansion.
- `core/archipelago/src/container/companion.rs` - added `archy-fedimint-ui` companion mapping.
- `core/archipelago/src/container/quadlet.rs` - generated unit `TimeoutStartSec=0` plus bounded stop/restart recovery helpers.
- `core/archipelago/src/api/rpc/package/runtime.rs` - restart RPC returns immediately and runs restart async.
- `docker/fedimint-ui/` - new nginx wait/proxy companion image for Fedimint Guardian launch.
- `docs/RESUME.md` - checkpoint and gates.
- `docs/MIGRATION_STATUS_REPORT.md` - packaging/refactor release gates.
- `docs/CONTAINER_LIFECYCLE_HANDOFF.md` - packaging/refactor release gates.
- `docs/APP-PACKAGING-MIGRATION-PLAN.md` - updated manifest/runtime contract documentation.
- `docs/app-developer-guide.md` - updated manifest/runtime contract documentation.
- `docs/MIGRATION_STATUS_REPORT.md` - noted that the docs gate is being closed in this pass.
- `app-catalog/catalog.json` - Tailscale socket-wait startup command.
- `neode-ui/public/catalog.json` - same Tailscale catalog update.
- `scripts/first-boot-containers.sh` - same Tailscale first-boot startup update.
- `neode-ui/src/views/apps/appPackageCache.ts` - UI-only last-known package
-  cache for scanner backoff.
- `neode-ui/src/views/apps/__tests__/appPackageCache.test.ts` - cache behavior
-  coverage.
- `neode-ui/src/views/Apps.vue` - uses cached packages during scanner backoff
-  and shows a refresh status banner.
- `docs/1.8-alpha-improvements-tracker.md` - noted My Apps backoff cache
-  improvement.
- `neode-ui/src/views/web5/Web5SharedContent.vue` - preserves shared/peer
-  content during refresh and shows compact refresh states.
- `neode-ui/src/views/web5/__tests__/Web5SharedContent.test.ts` - shared and
-  peer content refresh regression coverage.
-
-The worktree has many other pre-existing release-hardening changes. Do not revert unrelated dirty files.
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@ -0,0 +1,906 @@
+# PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
+
+> **✅ SINGLE-NODE PRODUCTION GATE IS GREEN (2026-06-23): `run-gate.sh` 5/5 on .228, 0 failures.**
+> This remains the authoritative plan for the broader north star (manifest-driven
+> platform, registry-distributed manifests, external marketplace), but it is no
+> longer a hard priority banner blocking all other work. Remaining workstreams are
+> in §6 / §8b. Next exit-criteria: multinode (`docs/multinode-testing-plan.md`) +
+> workstreams B/C/D.
+>
+> Last updated: 2026-06-26 · zombie-container guard + gitea launch-port fix shipped, binary `040df5ce` rolled to the fleet (see §8b SESSION h). Prior: orchestrator Fix A+B (`a721532f`/`e0343137`) deployed + proven.
+
+---
+
+## 1. The North Star
+
+Make Archipelago a **world-class, developer-ready app platform** where:
+
+1. **Every app is manifest-driven** — install/run/update/uninstall needs only the
+   app's manifest (+ catalog entry). **Zero OS-level code reliance**: no per-app
+   Rust installers, no `sudo mkdir/chown`, no host provisioning.
+2. **Manifests are distributed via the (signed) registry**, not baked into the
+   binary OTA as disk files. Bumping/adding an app = a signed catalog change.
+3. **Third-party developers can build and ship apps via an external registry** —
+   a decentralized marketplace (DID-signed manifests, Nostr discovery, reputation),
+   not a gatekept central store. `archy app validate/render/install/test` tooling.
+4. The platform stays **rootless, secure-by-default, elegant, robust, and
+   100%-uptime-capable** (reboot-survivable, self-healing, no data loss on migrate).
+
+**Definition of done:** the production test gate (§5) is green for the app set on
+real nodes. Until then, this plan is the priority.
+
+## 2. Invariants (never violate)
+
+- **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged
+  containers unless explicitly approved. (ADR-001, ADR-009.)
+- **No app-specific business logic in the Rust backend.** The orchestrator owns
+  the lifecycle state machine; apps are declarative. Legacy `install_immich_stack`
+  (hardcoded `podman run` + `sudo chown`) is the anti-pattern being deleted.
+- **Secrets are manifest-declared** (`generated_secrets`, materialised by
+  `container::secrets` 0600/rootless, idempotent + self-healing) — never hardcoded,
+  per-app, or logged. Replaces the deleted `ensure_fmcd_password`.
+- **Migrations never destroy data.** Preserve `/var/lib/archipelago/<app>`,
+  generated secrets, displayed credentials, public ports, and adoption container
+  names. Always provide a rollback path. Stop/recreate only when necessary.
+- **Verify on the real node .228 before any tag.** (Fleet/multinode verification is
+  a separate pass → `docs/multinode-testing-plan.md`.)
+
+## 3. Current state (2026-06-21)
+
+- **~40 apps are manifest-based and Quadlet-migrated** (survive
+  `archipelago.service` restart + reboot). Exhaustive per-app table:
+  `docs/app-registry-status-2026-06-21.md`.
+- **Legacy holdout: immich** — the one app with **no manifest** and a hardcoded
+  Rust stack installer (in-cgroup, not Quadlet). 3 containers, healthy, live data.
+  The migration proof case.
+- **Manifests still travel by OTA disk rsync** (`apps/ → /opt/archipelago/apps`).
+  The signed catalog (`app-catalog.json`) currently distributes **only image
+  overrides** — not full manifests. Gap closed by workstream B.
+- **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`,
+  `-fedimint-ui`) build from `docker/<name>` contexts via `companion.rs`, not the
+  manifest registry — a later phase folds them in.
+- **No app has passed the formal production gate.** That is the blocker.
+
+## 4. Workstreams (each links its authoritative detail doc)
+
+| # | Workstream | Detail doc | Status |
+|---|-----------|-----------|--------|
+| A | **Manifest-driven app platform** — packaging contract, single/multi-container runtime, routing, controlled hooks, dev tooling (6 phases, security model, migration rules) | `APP-PACKAGING-MIGRATION-PLAN.md` | mostly done; immich + multi-container polish remain |
+| B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
+| C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
+| D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
+| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **✅ .228 5×-GREEN (110/110 ×5, 0 not-ok, 2026-06-23)** — but this is DESTRUCTIVE-tier / ~8 core apps only; see §6c for the coverage gaps |
+| F | **Lifecycle perfection — cascade + progress + ALL apps** — extend the gate to uninstall/reinstall (cascade), real install/uninstall progress UI, and EVERY installed app (not just the 8 core). The "insanely-perfect OS/container environment" bar. | §6c (below), `tests/lifecycle/TESTING.md` | **IN PROGRESS (2026-06-26)** — root bug FIXED: uninstall could hang → ghost/stuck-bar/reinstall-block (`71cc9ac4`, unbounded systemctl/podman in `quadlet::disable_remove`); `cascade-uninstall.bats` **7/7 green on .228** w/ binary `ae349a75`. Remaining: wire CASCADE into the canonical gate run, progress-UI truthfulness, all-apps matrix, guardian/IBD state. |
+
+**Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
+(ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
+scan, Quadlet rendering) and `bulletproof-containers.md` (the six container failure
+modes FM1–FM6 + the desired-state-first reconciler that fixes them).
+
+## 5. Production test gate (exit criterion)
+
+An app is **production-ready** only when `tests/lifecycle/run-gate.sh` is green
+across the full matrix — install / UI-reachable / stop / start / restart /
+reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall —
+**5× on .228** (`ARCHY_ITERATIONS=5`). **The gate runs ON the node** (it uses local
+podman/systemctl/bitcoin probes; running it via RPC from another host silently
+tests the runner). **Multinode / fleet verification (.198 + others) is a SEPARATE
+plan — `docs/multinode-testing-plan.md` — NOT part of this single-node criterion.**
+Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps, L2 UI ● dashboard +
+proxies; L3 survival ◐; ~30 apps have zero automated coverage.
+
+> ⚠️ **The 2026-06-23 5×-green is NOT the full bar.** `run-gate.sh` runs only the
+> **DESTRUCTIVE tier** (stop/start/restart/survive) over ~8 core apps; it **skips
+> uninstall/reinstall** (CASCADE is gated behind `ARCHY_ALLOW_CASCADE_DESTRUCTIVE`,
+> never set by the gate) and tests no install/uninstall **progress UI**. Real
+> uninstall/reinstall/progress bugs (immich + grafana) were found in manual testing
+> right after — see **§6c (workstream F)** for the gap and the expanded-gate plan.
+> The true "every app, fully" criterion is F's definition-of-done, not this run.
+
+## 6. Immediate sequence (live workstream)
+
+1. ✅ **B-phase 1** — `manifest` field on `AppCatalogEntry`; `load_manifests`
+   catalog-wins merge; `manifest_dir` kept (build-source catalog manifests skipped
+   in phase 1); unit tests. *(commit 220666d3)*
+2. ✅ **B-phase 2** — `EMBED_MANIFESTS` publisher generator + round-trip guard.
+   *(7bfbe8fe; signing via existing ceremony — not yet flipped on for the fleet.)*
+3. ✅ **C immich proof** — immich is a manifest-driven stack (immich + immich-postgres
+   + immich-redis) installed via `install_stack_via_orchestrator`; legacy installer
+   is now fallback-only. Live-migrated + verified on .228. Found+fixed: container_name
+   duplicate-on-shared-PGDATA, version-digit validation, partial-fallback hardening,
+   data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
+4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide)
+   for the podman-`--restart` path. *(f160e0c4)*
+5. ✅ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`) is **GREEN: 5/5, 0 not-ok**
+   (2026-06-23). Two real orchestrator bugs were found + fixed en route (package.stop
+   per-app grace; package.restart phantom stack-member injection → `order_present_containers`,
+   commit 92d7f52d) plus two single-shot-read probes hardened (bitcoin-knots state, immich
+   lan_address). The single-node criterion is met.
+6. ✅ Banner demoted (this doc, 2026-06-23). Next: multinode pass + workstreams B/C/D.
+
+**Multinode / fleet verification (.198 and the rest) is split into its own plan:**
+`docs/multinode-testing-plan.md`. Do it AFTER the .228 single-node gate is green.
+
+**Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the
+published catalog (then sign) to actually distribute manifests via the registry;
+Phase-3 `use_quadlet_backends` rollout so orchestrator backends are Quadlet (not
+just podman-`--restart`).
+
+## 6b. Post-deploy task order (agreed 2026-06-23)
+
+After the 2026-06-23 multinode test deploy (latest backend + UX frontend to .116/.198/.228
+ Tailscale testers), do these IN ORDER:
+1. **netbird #20 ph4** — the last real manifest migration (workstream A).
+2. **Phase-3 `use_quadlet_backends`** — orchestrator backends become Quadlet units.
+3. **§6c Lifecycle perfection** (workstream F) — the comprehensive uninstall/reinstall +
+   progress-UI + all-apps gate expansion below.
+
+## 6c. Lifecycle perfection — what "green" MISSED (workstream F, the perfection bar)
+
+**Why this exists:** the 2026-06-23 single-node gate went 5×-green but is **NOT** the
+"every app fully lifecycle-tested" guarantee a user reasonably assumes. The canonical gate
+(`run-gate.sh`) only runs the **DESTRUCTIVE tier** (stop / start / restart / survive) over
+**~8 core apps** (bitcoin-knots, btcpay, electrumx, lnd, mempool, immich, fedimint,
+filebrowser). It explicitly **SKIPS uninstall/reinstall** (the CASCADE tier is gated behind
+`ARCHY_ALLOW_CASCADE_DESTRUCTIVE`, which `run-gate.sh` never sets) and has **zero coverage**
+for the other ~30 apps (grafana, jellyfin, vaultwarden, penpot, nextcloud, photoprism,
+uptime-kuma, homeassistant, … — see `app-registry-status-2026-06-21.md`). So uninstall,
+reinstall, install-progress UI, and most apps were never under test.
+
+**Real bugs found in manual multinode testing on .198 (2026-06-23) — the motivating evidence:**
+- **Uninstall is broken for immich + grafana:** takes very long, the progress bar sits at a
+  **solid full-red with no real progression**, and the app **does not actually uninstall** —
+  it still appears in **My Apps** afterward (ghost entry / state not cleared).
+- **grafana reinstall just stops** partway (no completion, no clear error).
+- **fedimint guardian** suddenly showed **"starting up — Guardian opens a wait page until
+  Bitcoin finishes initial sync" / "starting"** on that node — verify this is correct
+  wait-for-IBD behavior vs a stuck/false state (it's a backend that depends on bitcoin sync).
+
+**✅ 2026-06-26 — root cause of the immich/grafana uninstall trio FOUND + FIXED (`71cc9ac4`).**
+Single cause: `quadlet::disable_remove()` (first op in uninstall teardown, via companion +
+orchestrator) ran `systemctl --user stop` / `daemon-reload` / `podman rm -f` with **no timeout**.
+On rootless podman a generated unit can wedge "deactivating" while podman hangs → `systemctl stop`
+blocks forever → the spawned uninstall task returns neither Ok nor Err, so (a) `set_uninstall_stage`
+never fires → **frozen full-red bar**, (b) `remove_package_state_entry` never runs → **ghost stuck in
+`Removing`**, (c) the install guard rejects reinstall (`already Removing`). The spawn wrapper already
+reverts state on Err/removes on Ok — only a *hang* stranded it. Fix bounds all three calls
+(stop→`QUADLET_STOP_TIMEOUT` + SIGKILL/reset-failed escalation; daemon-reload→30s; podman rm→timeout).
+**Validated live: `cascade-uninstall.bats` 7/7 on .228** (binary `ae349a75`) — grafana install →
+uninstall (no ghost, data dir gone) → reinstall → running → cleanup. NOTE: proves the happy path +
+no-regression; the original hang was load/timing-induced and not separately reproduced.
+
+**Workstream F scope — the gate must grow to (in priority order):**
+1. **CASCADE tier in the canonical gate:** uninstall → verify the app is GONE from My Apps /
+   `container-list` / package state (no ghost), data preserved per policy, then reinstall →
+   verify it returns healthy. Catch the immich/grafana ghost + reinstall-stops bugs.
+   *(✅ DONE `b7d92107`: `run-gate.sh` now runs ONE cascade pass after the 5× loop when
+   `ARCHY_GATE_CASCADE=1` (+`ARCHY_ALLOW_DESTRUCTIVE=1`), counted into the tally — opt-in so default
+   behavior is unchanged, and deliberately NOT folded into all 5 iterations. `cascade-uninstall.bats`
+   7/7 on .228. Next: extend cascade coverage beyond the single throwaway app to the multi-container
+   stacks, e.g. an immich/btcpay cascade variant.)*
+2. **Progress-UI assertions:** install AND uninstall must report monotonic, truthful progress
+   (not a stuck full-red bar); a long op must surface a real stage/percentage and a terminal
+   success/failure — no silent hang. (Likely both a backend progress-event fix AND a UI fix.)
+   *(✅ 2026-06-26 `9f17ba68`: the "stuck full-red bar" was `AppCard.vue` hardcoding the uninstall
+   bar to `w-full bg-red-400/60 animate-pulse` — solid, full, red, fake-pulse. Now derives a real
+   percentage from the backend's existing `uninstall-stage` label ("Stopping containers (X/N)"→10–50%,
+   "Cleaning up volumes"→70%, "Removing app data"→90%) and renders like install (neutral fill, real
+   width+%, shimmer). FE built `index-DtZyZomC.js`, rolled to .228/.116/.198/.89 (+.88/.5/.120).
+   STILL TODO: a bats/UI assertion that the bar is monotonic + lands on a terminal state; possibly a
+   backend numeric-progress field so the UI doesn't parse stage strings.)*
+3. **ALL-apps coverage:** a generic per-app lifecycle matrix (install / UI-reach / stop / start /
+   restart / uninstall / reinstall / reboot-survive) driven by the manifest set, so grafana and
+   the ~30 uncovered apps are gated too — not just the 8 core. Manifest-driven, so new apps are
+   covered automatically.
+   *(✅ 2026-06-26 `43934eef`: `bats/all-apps-lifecycle.bats` — DESTRUCTIVE counterpart to the
+   read-only `all-apps-matrix.bats`. Discovers the app set from My Apps ∩ the node `catalog.json`;
+   drives stop/start/restart for every app and, under `ARCHY_ALLOW_CASCADE_DESTRUCTIVE`, a FULL
+   teardown (uninstall→no-ghost→reinstall) with the catalog `{dockerImage, containerConfig}` as the
+   reinstall spec. PROTECTED (never touched): bitcoin*/electrum* (resync cost) + lnd/btcpay*/fedimint*
+   (irreversible wallet loss — user asked to protect only bitcoin+electrum; wallet apps added for
+   safety, override via `ARCHY_MATRIX_PROTECT`). Validated on .228 (discovery + 1-app lifecycle
+   green). HEAVY/destructive → a supervised pass on LAN nodes (.116/.198/.228), NOT folded into
+   run-gate. Invoke: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1 ARCHY_PASSWORD=…
+   ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`.)*
+   **✅ FIRST FULL DESTRUCTIVE RUN on .228 (2026-06-26):** lifecycle **11/11 clean**; teardown
+   **8/11** (immich 3-container stack incl.) — and it surfaced **3 real reinstall bugs** (the payoff):
+   1. **fresh-install bind-dir ownership = root:root** → EACCES on reinstall (jellyfin `/config`
+      denied exit 139; netbird-server can't open its SQLite store). Fix B's chown-to-parent only
+      runs on the reconcile path, **not** `package.install`. The important orchestrator fix.
+   2. **netbird reinstall adopts leftover containers → skips the manifest cert/file render**
+      (tls.crt/key/nginx.conf never written → proxy can't start → app reads absent). Only a fully
+      clean reinstall renders them.
+   3. **portainer image pin `lfg2025/portainer:2.19.4` is `manifest unknown`** (never pushed to the
+      registry) and the pin OVERRIDES the RPC dockerImage → portainer is un(re)installable
+      fleet-wide. Registry/catalog data bug (push the image or change the pin).
+   .228 restored (jellyfin+netbird via manual chown / clean reinstall; all installed apps running,
+   28 ctrs; portainer left uninstalled — uninstallable until #3 fixed). TODO: fix #1 (extend chown
+   to install path) + #2 + #3; add reboot-survive + UI-reach per app to the matrix.
+4. **Guardian/IBD-dependent states:** assert that "waiting for bitcoin sync"-style states are a
+   legitimate, surfaced wait (with a path to ready) and never a permanent stuck state.
+
+**Definition of done for F:** the expanded gate (CASCADE + progress + all-apps) is 5×-green on
+.228, then re-verified across the multinode fleet — i.e. an *insanely-perfect* OS/container
+environment where every app installs, runs, updates, uninstalls, and reinstalls cleanly with
+honest progress, no ghosts, no data loss, reboot-survivable.
+
+## 7. Release blockers & operational gotchas (durable)
+
+Carried forward from prior handoffs (deduped against persistent memory):
+
+- **Rootless control-plane responsiveness** — slow `podman ps`/store cleanup at
+  startup must not surface a false "no apps installed" UI. **My Apps must preserve
+  last-known apps during scanner backoff**, never show empty during a transient.
+- **Reboot survival** — gate on ≥3 (prefer 5) consecutive clean post-reboot
+  lifecycle passes. Quadlet units under `user.slice` survive `archipelago.service`
+  restart; legacy in-cgroup containers get SIGKILLed and reconciled back.
+- **Startup patterns** — wait on a socket/health, never `sleep`. Tailscale waits
+  for its socket; Fedimint Guardian waits for Bitcoin RPC `initialblockdownload:false`
+  before launching fedimintd (proxy/wait companion on :8175 during IBD).
+- **Bitcoin must run full** (`txindex=1`, non-pruned) for ElectrumX/mempool.
+- **Adoption** — match existing containers by name and adopt without recreate;
+  record a migration version in app state; preserve Nostr signer bridges
+  (IndeeHub needs `/nostr-provider.js` served, not just port reachability).
+- **Image presence** — use bounded targeted `podman image inspect`, not
+  `podman image exists` (avoids store-walk stalls).
+- **Companion rebuilds** — `companion.rs` must rebuild `:latest` when the build
+  context changes (staleness check), else baked-in fixes (e.g. guardian CSS) never
+  reach nodes. `:local` is a manual override, never auto-rebuilt.
+
+## 8. Roadmap
+
+**Pipeline:** Feature Testing (internal) → User Testing (controlled hardware) →
+Beta Live (public). Hardening priorities feeding the gate:
+
+- **P0** Container app reliability — bulletproof install/health/restart/uninstall
+  across all apps, dependency chains, multi-container stacks.
+- **P0** Networking stack first-install → reboot-proof (WireGuard/NetBird, Tor
+  hidden services, LND Connect).
+- **P1** LUKS2 full-partition encryption for `/var/lib/archipelago/`
+  (AES-256-XTS, Argon2id, key from setup password + hardware salt).
+- **P1** Meshtastic plug-and-play parity with MeshCore.
+- **P1 ✅ CODE-COMPLETE** (branch `companion-mobile-ux`, 2026-06-23; needs
+  on-device + mobile-web verification before merge to `main`) — Mobile app-launch
+  UX — drop the "this app opens in a tab" interstitial.
+  Two surfaces (both: no interstitial screen, launch the app directly):
+  - **Companion app (Android):** open **every** app in the **in-app WebView**
+    (not just non-iframeable ones) — *and* carry the current mobile-iframe footer
+    controls into the WebView (back/forward/reload/close — good, useful UX).
+  - **Mobile web browser (PWA):** open tab-apps directly in a **new browser tab**.
+  Touch points: `neode-ui/src/stores/appLauncher.ts`, `AppLauncherOverlay.vue`,
+  the Android in-app WebView bridge, and the mesh-mobile iframe footer controls.
+  (Reference prior work: `b5a9deb8` in-app webview for non-iframeable apps,
+  `d1fbcd9b` "open in browser" via native bridge.)
+  - **✅ Done (branch `companion-mobile-ux`):** mobile launches now use the
+    store-driven panel (no route push) so the background tab no longer changes and
+    closing returns you where you launched; tab-only apps open directly (in-app
+    WebView on companion via `openInApp`, new browser tab on PWA) with **no
+    interstitial**; the Android `InAppBrowser` (`WebViewScreen.kt`) gained a bottom
+    footer bar (back/forward/reload/open-in-browser/close) + a centered loading
+    screen (favicon + progress); a shared `AppLoadingScreen` (icon + progress)
+    replaced the black/spinner loaders on the app session **and** legacy iframe
+    overlay; the dashboard is pinned to `100dvh` on mobile so the mesh chat/tools
+    panes stop sliding under the tab bar in mobile browsers (no-op in companion);
+    ElectrumX shows its real icon in My Apps. Companion APK bumped to **v0.4.7**
+    (versionCode 11) with a committed shared debug keystore so updates install
+    without an uninstall. **Not yet:** merge to `main`; publish the 0.4.7 companion
+    download (deferred until the gate work lands so they ship together).
+
+**Post-beta (deferred — do not start until gate is green):** P2P encrypted
+voice/video (WebRTC over federation via Tor); watch-only wallet + mesh BTC
+hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.md`);
+Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
+phases 2–6 (`dual-ecash-design.md`).
+
+## 8b. SESSION STATE + RESUME (updated 2026-06-26) — READ §8b "CURRENT STATE + RESUME" FIRST
+
+### ▶ SESSION h (2026-06-26) — LATEST, RESUME FROM HERE
+
+**Canonical resume detail: memory `project_session_resume_2026_06_23b` (▶️ top of MEMORY.md).**
+Local main = `670ebb06` (3 commits past the previously-pushed `43e70049`: `0a8db904` zombie
+guard + `670ebb06` gitea launch-port fix; `43e70049` webview was already pushed). **Combined
+release binary `040df5ce2551d17b` rolled to the fleet.** Binary+FE not in git — rebuild on a
+fresh machine (`cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`).
+
+**DONE this session:**
+1. ✅ **Zombie-container guard** (`0a8db904`) — the reconciler's Running branch now verifies a
+   container's `State.Pid` is alive (`/proc/<pid>` exists) before trusting podman's "Up"; on a
+   concrete dead PID it stop+remove+`install_fresh` from the manifest. Conservative: any
+   uncertainty (inspect fail / unparseable PID) assumes alive, so a transient hiccup never
+   destroys a healthy container. Fixes the class that broke NetBird login on .228 (dashboard
+   "Up" w/ dead PID → proxy 502, no host port → reconciler never recovered it). Unit test +
+   **live-proven on .228**: synthetic zombie on `jellyfin` (killed conmon+PID → podman still
+   "Up") → guard logged `…process is dead (zombie) — recreating app_id=jellyfin` → recreated →
+   settled to NoOp. **Zero false-positives across the other 33 healthy containers.**
+2. ✅ **Gitea launch-port fix** (`670ebb06`) — gitea launched at **:2222 (SSH)** instead of
+   **:3001 (web)** on nodes without the gitea manifest on disk (`manifest_lan_address_for`
+   returns None → fell through to `extract_lan_address`, which returns podman's first-listed
+   port; podman lists `2222->22` before `3001->3000`). Added `"gitea" => http://localhost:3001`
+   to the static `lan_address_for` map (`core/container/src/podman_client.rs`) like every other
+   core app. Reported on tailscale node **100.82.34.38** — that node still needs the new binary
+   (or a refreshed gitea manifest) to pick it up.
+3. ✅ **Rolled `040df5ce`** to .228/.116/.198/.89 (verified sha+active); .88/.5/.120 rolling.
+
+**OPEN follow-ups (logged, NOT regressions):**
+- **mempool env-drift recreate-loop on .228** — reconciler logs `container env drift detected —
+  recreating app_id=mempool` every ~30-90s, never converges (pre-existing; the known mempool
+  nginx stale-IP class, [[project_mempool_nginx_stale_ip_fix]]). mempool stays running but churns.
+- **nostr-rs-relay** stuck "Stopping" + ~2s create-loop on .228 (from session g).
+
+**NEXT:** finish .88/.5/.120 roll → push main to gitea-vps2 → Phase-3 quadlet / Workstream F /
+multinode. SSH/sudo pw `ThisIsWeb54321@` (**.88 = `ThisIsWeb54321!`**); UI/RPC .228/.198 =
+`ThisIsWeb54321@`. Reusable tooling in scratchpad: `deploy-bin.sh`/`remote-apply.sh` (EXPECT_SHA
+= `040df5ce…`), `rpc.sh`.
+
+---
+
+### ▶ SESSION g (2026-06-25) — earlier, historical
+
+**Canonical resume detail: memory `project_session_resume_2026_06_23b` + `project_netbird_ph4_legacy_deletion_map` + `project_workstream_f_lifecycle_perfection`.**
+`gitea-vps2/main = a721532f` (pushed). **Local main = `89d397bb`** (2 new commits this session, NOT pushed/deployed: `41e7f500` harness tolerance + `89d397bb` netbird ph4 legacy delete). Binary+FE are NOT in git — rebuild on a fresh machine.
+
+**TL;DR (SESSION g, 2026-06-25) — everything below DONE this session:**
+1. ✅ **Rolled** `e0343137` + fresh FE (`index-a75rd6Hy.js`) to **7 nodes** (.116/.198/.228/.89/.88/.5/.120), all verified. **.15 SKIPPED** (auth rejected — creds don't match).
+2. ✅ **Harness tolerance fixes COMMITTED** `41e7f500` (run-gate settle/immich + immich.bats 90s + mempool.bats poll).
+3. ✅ **mempool RESOLVED** fleet-wide — see mempool note below.
+4. ✅ **netbird #20 ph4 DONE** — legacy Rust installer DELETED, committed `89d397bb` (492 lines gone, manifest-driven only, `cargo check` clean). Release binary BUILDING for the .228 live-verify (build left running — check after).
+
+**NEXT (resume here):** (a) check the release build, deploy the `89d397bb` binary to .228, live-verify netbird adopts via manifest (https:8087→200, no `bail!`); (b) roll `89d397bb` to the rest of the fleet (behavior-neutral — manifest path already executed); (c) **push local main → gitea-vps2** (2 commits ahead); then **Phase-3 `use_quadlet_backends` → Workstream F → multinode**.
+
+**ROLL RESULTS (2026-06-25, binary `e0343137b99bf066` + fresh FE bundled):**
+| Node | Result |
+|------|--------|
+| .228 | ✅ already on `e0343137` (prior session, binary-only) |
+| .116 (local) | ✅ binary + fresh FE; 36 containers survived restart; UI 200; `index-a75rd6Hy.js` live |
+| .198 (LAN) | ✅ binary + fresh FE; 38 containers up; UI 200 |
+| .89 (100.89.209.89) | ✅ binary + fresh FE; service active |
+| .88 (100.70.96.88, pw `ThisIsWeb54321!`) | ✅ binary + fresh FE; service active |
+| .5 (100.72.136.5) | ⏳ attempted — see resume note (cellular x250) |
+| .120 (100.66.157.120) | ⏳ attempted — see resume note (cellular x250) |
+| .15 (100.64.83.15, archy-dev-pa) | ❌ SKIPPED — `archipelago@` + `ThisIsWeb54321@` rejected (`Permission denied (publickey,password)`); node creds unknown |
+
+Deploy tooling (reusable): scratchpad `deploy-bin.sh <label> <local\|ssh\|ts> <host> <pw>` + `remote-apply.sh` (mv binary avoids ETXTBSY, atomic FE swap preserving `aiui`/APK/`claude-login.html`, chown 1000:1000, restart, sha+health verify). Frontend tarball = `tar -C web/dist/neode-ui -czf neode-ui.tgz .` (flat). Full sha `e0343137b99bf06642c45da67bb092e9a411190ff59eda8e5177c2a06b6f6e89`.
+
+**Focus: validate the two UNVALIDATED-WIP orchestrator fixes (commit `a721532f`) on the .228 canary, then roll to the 7-node fleet.**
+- **Fix A** — desired-state recovery: a was-running app that vanished (e.g. lost through a failed teardown + reboot) auto-recreates on reconcile, via new `crash_recovery::load_last_running_names` (reads `running-containers.json` sans PID gate) + exact container-name match in `reconcile_all_with_mode`. Zero false-positives (uninstalled/user-stopped excluded).
+- **Fix B** — recreate volume-ownership: a freshly-created bind dir for a NO-`data_uid` app gets `chown --reference=<parent>` so container-root can write → kills the immich-class recreate EACCES crash-loop. Only fresh dirs (zero regression for existing installs).
+
+VALIDATION PROGRESS (sessions e→f):
+1. ✅ Release binary built — sha16 `e0343137b99bf066` (differs from pre-fix `f2aa2fab` → fixes compiled in).
+2. ✅ `cargo test -p archipelago crash_recovery` — **13/13 green**, incl. the two new Fix A tests.
+3. ✅ Deployed new binary to **.228 canary** (binary-only; FE unchanged at `435b9f92`). Verified live sha `e0343137`, active, RPC OK. Container cgroup confirmed in `user@1000.service` (NOT archipelago.service) → `systemctl stop` is container-safe on .228.
+4. ✅ **Fix A PROVEN** — `podman rm -f jellyfin` (non-baseline, no-data_uid) → periodic ExistingOnly reconciler (30s) recreated it; journal: `previously-running app has no container after boot — recreating (desired-state recovery) app_id=jellyfin`.
+5. ✅ **Fix B PROVEN** — fresh `package.install uptime-kuma` (no-data_uid, no prior data dir) → bind dir chowned to parent owner `1000:1000` (NOT root:root), state=running, RestartCount=0, no EACCES, app wrote its own subdirs → clean uninstall (container+data-dir gone). all-apps matrix read-only **5/5 (17 apps)**.
+6. 🟡 **5× DESTRUCTIVE gate on .228 — NOT yet 5/5, but failures are HARNESS-TOLERANCE FLAKES, NOT Fix A/B regressions** (proven: Fix A logged **0** desired-state-recovery firings during the failures; immich/lnd `RestartCount: 0`, no crashes). Under sustained 5× churn on this 34-app node a *different* heavy-app recovery probe slips each iteration:
+   - immich `lan_address` (test 64): 30s probe too tight after archipelago-restart recovery. **FIXED** (settle_stack now waits on immich :2283 when present, cap 180→300s; test 64 deadline 30→90s). Went **ok/ok/ok 3×** after fix.
+   - mempool orphan count (test 82): single-shot count caught a transient extra container mid-recreate (clears to 3=3). **FIXED locally** (poll for steady-state ≤30s) — fix is in local `tests/lifecycle/bats/mempool.bats`, NOT yet re-gated.
+   - lnd `getinfo recovers after restart` (test 77): already has a generous 240s deadline; peak concurrent load occasionally beats it. lnd itself **HEALTHY** (wallet unlocked — "wallet already unlocked, WalletUnlocker no longer available", RestartCount 0). Likely needs deadline bump or lnd added to within-iteration tolerance. **NOT yet fixed.**
+   - NOTE: the 300s settle bump made iterations very long (iter2=1062s) and a diagnostic run wedged in iter3; killed it. Re-think settle (maybe per-app readiness with shorter caps) before the next run.
+7. ✅ **DECISION RESOLVED (2026-06-25):** user chose **(B) roll now** AND bundle the fresh UX frontend (per `feedback_deploy_targets_and_ux_bundle`). Gate load-robustness deferred to a separate hardening pass.
+8. ✅ **ROLLED** `e0343137` + fresh FE (`index-a75rd6Hy.js`) to .116/.198/.89/.88/.5/.120 (.228 already on it) — all verified `sha=e0343137`, service active. **.15 skipped** (auth reject). See roll table above.
+9. ✅ **Harness fixes COMMITTED** `41e7f500` (no longer uncommitted).
+10. ✅ **netbird #20 ph4 — legacy installer DELETED**, committed `89d397bb`. `install_netbird_stack` is now orchestrator-manifest → adopt → `bail!` (no in-Rust installer); removed 6 dead helpers + 3 `NETBIRD_*_IMAGE` consts + unused import (~492 lines). `cargo check` clean (0 warnings). Manifest path verified live pre-delete (.228 https:8087→200). **Release binary BUILT: sha `cccb7cfd9c38a651`** (`core/target/release/archipelago`, supersedes `e0343137`) — NOT yet deployed; deploy to .228 + live-verify then roll. Map+rationale: memory `project_netbird_ph4_legacy_deletion_map`. **Pre-existing follow-up (NOT introduced by delete): the manifest path lacks an active #10 OIDC-readiness gate — if that login race resurfaces, add an OIDC-ready gate to the netbird manifest.**
+
+**✅ 2026-06-25 — STRAY 13h GATE on .228 found + killed; mempool RESOLVED.** A `setsid` gate run from session-e was still churning .228 ~13h later (pathologically slow — only reached test 71/lnd; the 300s settle bump is the suspect). Killed its process group (note: `pkill -f bats` self-matches the ssh command's own argv → kill by numeric PID/PGID instead). After kill, `crash_recovery` (Fix A) auto-recovered the immich/indeedhub/netbird stacks — **good live exercise of Fix A**. **mempool fallout RESOLVED:** the gate churn left .228's podman **overlay storage corrupt** (mempool frontend crash-looped — container couldn't write `/etc/nginx`, same image serves fine on .116) → **fixed by rebooting .228** (clears overlay corruption; Fix A staggered-recovered all apps; mempool stable 200). **.198 is PRUNED** bitcoin → mempool requires archival (install correctly refused) → **cleanly uninstalled** the orphan mempool-db. All nodes now correct. LESSON: never leave the gate running unsupervised; reconsider the 300s settle before re-running.
+
+Fleet on `e0343137` + FE `index-a75rd6Hy.js` on .116/.198/.228/.89/.88/.5/.120 (.15 still old). **`89d397bb` (netbird-delete) binary NOT yet deployed anywhere — verify on .228 then roll.** SSH/sudo pw UNIFORM `ThisIsWeb54321@` (**.88 = `ThisIsWeb54321!`**); **UI/RPC: .228=`ThisIsWeb54321@`, .198=`ThisIsWeb54321@`.** Reusable tooling in scratchpad: `deploy-bin.sh`/`remote-apply.sh` (binary+FE swap), `rpc.sh <host> <pw> <method> [params]` (auth.login→call). Gate harness at `~/lifecycle/lifecycle` on .228 — **CHECK it isn't already running/wedged before re-launching**.
+
+---
+
+### ▶ SESSION b (2026-06-23 PM) — earlier, historical
+
+**Canonical resume detail: memory `project_session_resume_2026_06_23b` (▶️ top of MEMORY.md).**
+`gitea-vps2/main = 4346007d` pushed; local HEAD `e57514b6` (uninstall fix, committed, **not pushed/deployed**).
+
+Shipped + verified live on .228 (all in 4346007d):
+- **Connection-lost FULLY fixed** — companion `image_exists` journal-flood (Stdio::null) + netbird UDP-port reconcile churn (`wait_for_manifest_host_ports` tcp-only). .228: flood→0, ws/db→0 disconnects, load 3.95→2.26.
+- **netbird → manifest-driven** (#20 ph4) — 3 manifests + 4 orchestrator primitives (base64 secret, GeneratedCert+`ensure_manifest_certs`, templated-file render `{{HOST_IP}}/{{NETWORK_GATEWAY}}/{{secret:}}`, udp port protocol). Live: https 8087→200, OIDC→200, resolver=gateway. Legacy-Rust delete deferred to post-full-verify.
+- **registry-manifest flip (code)** — `EMBED_MANIFESTS` default-on, `main.rs` bounded pre-load `refresh_catalog`. Catalog regenerated w/ 52 embedded manifests but **NOT published** (gitignored + never committed; publish = force-add to gitea-vps2 main). Do after fleet binary roll.
+- **UX regression root-caused + fixed** — the mobile/desktop UX (loader/AppLoadingScreen, store-driven launch, app icons, android webview footer) was on `companion-mobile-ux` and **never merged to main**, so any main build silently dropped it. **Merged → main**, frontend redeployed to .228. Android 0.4.9/code13 pushed for user to build APK elsewhere.
+
+In progress — **Workstream F lifecycle bugs** (this §, user-picked next):
+- **uninstall ghost — FIXED + pushed (e57514b6) + DEPLOYED to .228.** `handle_package_uninstall` returned Err on any cleanup-residue failure *before* removing the package state entry → ghost in My Apps + revert-to-Installed. Now: split container vs cleanup errors; remove state entry as soon as containers gone (before slow data rm). **LIVE-VERIFY IN PROGRESS:** fresh grafana (not previously installed → no data risk) install→uninstall→reinstall on .228; install was mid image-pull at handoff. RPC recipe + caution in memory `project_session_resume_2026_06_23b`.
+- **#15 fedimint guardian — RESOLVED, not stuck** (legit `until` IBD-gate → setup wizard now bitcoin synced; no code change).
+- #14 grafana reinstall-stops — verify in the same grafana test (likely same root cause as #13).
+
+Next: finish grafana uninstall/reinstall live-verify on .228 → roll the new binary to the rest of the fleet (.116/.198/.5/.120 still on old binary) → publish embedded catalog (#8) → finish Workstream F (gate CASCADE+progress+all-apps expansion) → Phase 3 Quadlet → multinode.
+WATCH: main.rs pre-load `refresh_catalog` (≤25s) slows startup — sanity-check startup→RPC-ready isn't egregious on the fleet roll.
+
+---
+
+### ▶ CURRENT STATE + RESUME (2026-06-23) — earlier session-a baseline (historical)
+
+**✅ HEADLINE (2026-06-23): single-node gate GREEN (`run-gate.sh` 5/5 on .228, 0 not-ok) +
+multinode test deploy DONE to 6 nodes.** The exit criterion (§5) is met. Green took fixing **two real
+orchestrator bugs** (package.stop per-app grace, 2026-06-22; package.restart phantom stack-member
+injection, 2026-06-23 — `order_present_containers`, commit 92d7f52d) plus hardening two single-shot
+probes (bitcoin-knots state, immich lan_address). All work is **committed + PUSHED to `gitea-vps2`
+(146) `main` @ `ccb594fb`** — the local-only state is resolved. Binary = release sha `5472c575…`.
+
+**▶ DEPLOY STATE (latest backend `5472c575` + UX frontend + one-tap companion APK) — 2026-06-23:**
+
+| Node | Pw | Done | Notes |
+|------|----|----|-------|
+| .116 (local, http:80) | `ThisIsWeb54321@` | ✅ | dev node: bitcoin mid-IBD + http-only |
+| .198 | `archipelago` | ✅ | resilience; user manual-testing here |
+| .228 | `archipelago` | ✅ | canonical gate node (5×-green) |
+| 100.82.34.38 (archipelago-1) | `archipelago` | ✅ | |
+| 100.89.209.89 (archy-x250-pa) | `ThisIsWeb54321@` | ✅ | |
+| 100.70.96.88 (archipelago node) | `ThisIsWeb54321!` | ✅ | note the `!` |
+| 100.64.83.15 (archy-dev-pa) | ? | ⏳ | UP (tailscale ping ok) but `ThisIsWeb54321@` REJECTED — **need correct pw** |
+| 100.66.157.120 (archy-x250-exp) | `ThisIsWeb54321@` | ⏭️ | DOWN — user said leave it |
+
+Deploy scripts saved in scratchpad: `deploy-node.sh` (full binary+FE, sha+health verify) and
+`fe-only.sh` (FE-only, no archipelago restart). Reusable: `bash deploy-node.sh <host> <pw> <scheme> 127.0.0.1`.
+
+**▶ COMPANION APK fixed (other agent's commit `5c43e127` + my reconcile):** QR + download were a
+zip-wrapped `.apk.zip` (forced unzip). Now serve raw `archipelago-companion.apk` (one-tap) from the
+146 raw URL; `CompanionIntroOverlay.vue` + ship/publish scripts repointed; old `.zip` dropped. The
+OLD `.apk.zip` URL now 404s, so EVERY node was FE-refreshed to the new build (all 6 verified
+`/ : 200` + bundle references `archipelago-companion.apk`).
+
+**▶ MANUAL-TEST BUGS FOUND on .198 → workstream F (§4/§6c).** The green gate is DESTRUCTIVE-tier /
+~8 core apps; it SKIPS uninstall/reinstall and has no progress-UI / all-apps coverage. Real bugs:
+immich+grafana **uninstall hangs at a solid full-red bar + leaves a ghost in My Apps** (doesn't
+actually remove); grafana **reinstall stops**; fedimint guardian shows "waiting for bitcoin sync"
+(verify legit vs stuck). These motivate **workstream F** (cascade + progress + all-apps gate).
+Also added **§10**: investigate TanStack-Query/push-based state mgmt for neode-ui (the state-drift
+root cause behind the stuck bar + ghosts).
+
+**▶ NEXT — agreed task order (do IN ORDER, see §6b):**
+1. **netbird #20 ph4** — last real manifest migration.
+2. **Phase-3 `use_quadlet_backends`** — orchestrator backends → Quadlet units.
+3. **§6c workstream F** — cascade/uninstall + progress-UI + ALL-apps gate; fix the immich/grafana
+   uninstall + ghost-My-Apps + reinstall-stops bugs to a 5×-green; then §10 state-mgmt investigation.
+4. **Multinode pass** — `docs/multinode-testing-plan.md` (the 6 deployed nodes are ready for manual
+   testing now).
+
+**▶ LOOSE ENDS / gotchas for the resuming session:**
+- **`neode-ui/src/components/AppLoadingScreen.vue` is UNTRACKED** on .116 — the other agent created it
+  but NO committed code imports it (orphan, not in `e825bbed`). Left in place; decide whether to wire
+  it in or delete. Not deployed (committed UX doesn't reference it).
+- **gitea-local mirror (`localhost:3000`) push is BROKEN** (token redirects to `/login`); push to
+  `gitea-vps2` works and is primary. Reconcile the local mirror token if you need it.
+- **Don't delete bitcoin/electrum data** (user directive) — run only the DESTRUCTIVE gate
+  (`run-gate.sh` default; never set `ARCHY_ALLOW_CASCADE_DESTRUCTIVE` on real nodes with synced chains).
+- **.198 gate not run this session** (user was manual-testing there + restarting). .116 gate ran but
+  failed 12 tests — ALL environmental (.116 is http-only → ui-coverage hardcodes `https://`; + bitcoin
+  mid-IBD → bitcoin/lnd preconditions). NOT product regressions. `gate-116.log` on .116.
+
+**(historical resume notes for the 5× chase below — superseded by the green result above)**
+
+**Headline (2026-06-22):** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
+(110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a
+real mempool bug found + fixed (below). The gate is now single-node (.228); multinode is split out
+(`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`
+naming/script was removed 2026-06-22, commit `57a013bc`).
+
+**2026-06-22 (late) — mempool stale-IP bug FOUND + FIXED (real production bug, not a flake):**
+The 1st 5× attempt failed iteration 1 on `#74 mempool api backend remains queryable`. Root cause was
+NOT timing — the frontend nginx pinned mempool-api's IP at startup (no `resolver`); after the gate
+restarts mempool-api (new podman IP) nginx 502s and the UI shows "offline". Fixed in
+`mempool-frontend:v3.0.1` (resolver+variable proxy_pass; see `[[project_mempool_nginx_stale_ip_fix]]`
+/ `docker/mempool-frontend/`), pushed to vps2, manifests bumped 3.0.0→3.0.1, deployed + resilience-
+verified live on .228 (backend restart now auto-recovers). Also fixed the test itself (`mempool.bats`
+#74: 180s→300s + real `fail` helper). Commits `0f05f73a` (fix) `57a013bc` (gate rename).
+
+**THE 5× RUN IS DETACHED ON .228 — survives terminal/session close. Check it from any machine:**
+```
+sshpass -p archipelago ssh archipelago@192.168.1.228 \
+  'grep -E "iteration [0-9]+: (PASS|FAIL)|RESULTS|passed:|failed:" /tmp/gate-5x3.log; \
+   echo "running pid: $(pgrep -f run-gate.sh$ || echo DONE)"; grep "^not ok" /tmp/gate-5x3.log | sort -u'
+```
+- Log: `/tmp/gate-5x3.log` on .228 · launched `nohup` · `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`,
+  run **ON the node** from `/tmp/lifecycle-run/tests/lifecycle` via `./run-gate.sh` (ARCHY_HOST=127.0.0.1).
+  `bats` 1.11.1 + static `jq` 1.7.1 are installed on .228.
+- **If all 5 iterations PASS → .228 has met the single-node criterion → demote the banner.**
+- If it flakes again: readiness-under-churn (lnd/mempool); hardening in `98f4fa44` (inter-iteration
+  `settle_stack()` + readiness windows). Re-copy repo `tests/lifecycle` to /tmp/lifecycle-run, relaunch.
+
+**▶ 2026-06-23 (morning) — 5× FINISHED 2/5; both mempool fails ROOT-CAUSED to ONE real
+orchestrator bug (NOT flakes) + FIXED:** the overnight run finished `passed: 2 / failed: 3` on
+`gate-5x3.log`, three *distinct one-off* fails, none repeating:
+- iter1 `#5 container-list valid state for bitcoin-knots` — pre-launch churn (as predicted); didn't
+  repeat. **Hardened anyway:** the probe was a single-shot read; now polls ≤30s for a settled valid
+  state so a momentary `restarting`/transient can't flake a 20-min iteration (`bitcoin-knots.bats`).
+- iter2 `#74 mempool api queryable` + iter5 `#73 mempool stack running` — **SAME root cause.**
+  `package.restart mempool` resolves its container list via `ordered_containers_for_start`, which was
+  **injecting phantom stack-member names** (`mysql-mempool`, `archy-mempool-api`, `archy-mempool-web`
+  — variant names from the union `startup_order` list that aren't live on this node). The phantom
+  `mysql-mempool` is 2nd in the start order; `do_orchestrator_package_start` hits its unknown-app-id
+  fallback → `do_package_start` inspect fails "no such object" → the `?` **aborts the whole start
+  sequence**, so `mempool-api` (pos 5) + `mempool` frontend (pos 8) never start. They then sat down
+  ~6 min until the health monitor independently recovered them → #73 (frontend not running in 180s)
+  and #74 (api not queryable in 300s) both flake. Journal proof on .228: `package.restart mempool
+  failed: Start failed: mysql-mempool: ... no such object`, 23:27:32.
+  **Fix:** `ordered_containers_for_start` now orders only the *actually-present* containers and never
+  injects phantom order entries (new pure helper `order_present_containers` + 3 unit tests,
+  `dependencies.rs`). This is the SAME class as the mempool nginx bug — a hardcoded-name/reality
+  mismatch — and is exactly the manifest-driven-lifecycle anti-pattern the master plan targets.
+- **Deploy + relaunch:** built release binary on .116, swapped `/usr/local/bin/archipelago` on .228
+  (containers live under `user@1000.service`, NOT the `archipelago.service` cgroup, so a service
+  restart does NOT kill them — verified via conmon cgroup paths). Manually verified mempool restart
+  keeps the stack up, then relaunched a clean 5× → see `gate-5x4.log` (check cmd above, swap the
+  filename). Expectation: all three fixed → 5/5 green → demote the banner.
+
+**Code fixes shipped this session (all on `main`, built + DEPLOYED to .228 AND .198):**
+- `2dad64b2` stop honours per-app grace (was `-t 30` deadline racing SIGKILL).
+- `760a32bc` reconciler stops resurrecting user-stopped apps (dep-override + host-port watchdog).
+- `6e49ce6f` container-list reports user-stopped apps as `stopped` despite a live UI companion.
+- `452f05d8` companion self-heal on its own ~30s loop (was gated behind the slow per-app pass).
+- Test-harness hardening: `88930558` `53b8e47f` `892ff083` `98f4fa44` (readiness retries, immich/
+  fedimint/NPM/lnd windows, inter-iteration settle). Binary built on .116
+  `core/target/release/archipelago` (4-fix); deploy = stop archipelago, cp to /usr/local/bin, start.
+
+**NODE-STATE fixes on .228 NOT in the repo (re-apply if .228 is reset/reimaged):**
+- nginx `/app/lnd/` proxy target was stale `8081` → fixed to `18083` (sed in
+  /etc/nginx/sites-{available,enabled}/archipelago + snippets, then `nginx -s reload`). Repo code is
+  correct (18083); old node config was stale.
+- Removed a stale orphan `~/.config/containers/systemd/home-assistant.container` (ContainerName
+  `home-assistant` ≠ the real `homeassistant` container; it was stuck "activating"). Real app fine.
+- electrumx was re-installed (`package.install` w/ image `146.59.87.168:3000/lfg2025/electrumx:v1.18.0`)
+  to re-register it as a tracked manifest app (it had become adopted plain-podman).
+
+**KEY LESSON:** run the lifecycle gate **ON the node**, not via RPC from .116 — its bitcoin/companion/
+orphan/endpoint tests use local `podman`/`systemctl`/`bitcoin-cli`/`curl`, so a remote run silently
+tests the *runner* (this is why earlier runs from .116 falsely showed "bitcoin in IBD" etc.).
+
+**Remaining (after 5× green):** netbird migration (#20 ph4 — the one real migration left) + btcpay/
+mempool stack polish; Phase-3 `use_quadlet_backends`; B flip-on (EMBED_MANIFESTS+sign); per-app test
+coverage (~30 apps unwritten); the mobile app-launch UX (§8 Roadmap P1). Multinode → its own plan.
+
+---
+
+### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified
+
+Manifest-driven lifecycle hooks + the IndeedHub stack migration are **complete and
+live-verified on BOTH .228 and .198** (adoption + fresh-create + post_install hook
+exec, stable under load). 15 commits this session: `4c1a4e59`..`e2a012d0`. Working
+tree clean. The release lifecycle gate is **5×** (`ARCHY_ITERATIONS=5`).
+
+**Shipped (all on `main`, newest first):**
+- `e2a012d0` indeedhub frontend health → `tcp:7777` (was http GET `/`; the http check
+  false-failed under load and the reconciler churned the frontend — fixed).
+- `ff78b312` hook `exec` runs in a transient user scope
+  (`systemd-run --user --scope --quiet --collect podman exec …`) — fixes
+  "crun: write cgroup.procs: Permission denied" when exec'ing from archipelago.service.
+- `ff8f11b8` indeedhub frontend caps `[CHOWN,DAC_OVERRIDE,SETGID,SETUID]` — nginx
+  workers died "setgid(101) failed" under the orchestrator's `--cap-drop=ALL`.
+- `b73084db` DELETED the legacy indeedhub orchestrator special-cases (−382 lines:
+  reconcile_indeedhub_stack, start_indeedhub_backends, the 120s dependency-DNS gate,
+  patch_indeedhub_nostr_provider, repair_indeedhub_network_aliases, INDEEDHUB_* consts)
+  → "indeedhub" now uses the GENERIC install_fresh/reconcile path.
+- `b1eea8c0` 7 indeedhub manifests (apps/indeedhub{,-postgres,-redis,-minio,-relay,-api,
+  -ffmpeg}) + `install_indeedhub_stack` orchestrator-first (immich pattern).
+- `b94b61f6` `network_aliases` ContainerConfig field (podman_client + quadlet rendering,
+  DNS-label validated) — lets the frontend nginx reach `api:4000`/`minio:9000`/`relay:8080`
+  on the dedicated `indeedhub-net`.
+- `955c54b7`/`4c1a4e59` #20 hooks phases 1-2: schema (LifecycleHooks/HookStep/HostCopy in
+  archipelago-container::manifest) + executor `container::hooks::run_post_install`
+  (allowlist-canonicalised copy_from_host + scoped exec), wired into `install_fresh`.
+- `84031e62` gate 20×→5× (docs only: CLAUDE.md, this file, tests/lifecycle/TESTING.md).
+
+**Design = adoption-safe + manifest-driven.** Manifests reproduce the live install exactly
+so existing nodes ADOPT (NoOp) instead of recreate: hyphen container_names the runtime
+already references, named volumes `indeedhub-{postgres,redis,minio,relay}-data`,
+`indeedhub-net` + network_aliases [postgres|redis|minio|relay|api], generated_secrets reuse
+the live /var/lib/archipelago/secrets values (ensure_one no-ops on existing; postgres pw is
+fixed at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept. The
+frontend image indeedhub:1.0.0 already bakes the iframe nginx (X-Frame omit + nostr-provider.js
+ sub_filter), so the post_install hook (sed X-Frame / copy nostr-provider.js / inject /
+nginx reload) is defensive/idempotent. crash_recovery.rs's frontend-after-deps ordering
+guard is KEPT on purpose (beneficial; not a blocker).
+
+### ⛔ GATE BLOCKER 2026-06-22 — `package.stop` ignores the per-app stop grace (REAL, fleet-wide, ROOT-CAUSED)
+
+Step 1 (sync .228 tcp-health manifest) is **DONE + verified**. Step 2 (the 5× gate) surfaced a
+real, fleet-wide `package.stop` bug — **reproduced on the CLEAN, quadlet-correct .198**, so it is a
+genuine product bug, not node contamination. Root cause is fully pinned (below).
+
+**Symptom.** `package.stop <app>` returns `{"status":"stopping"}` but the container **never stops**
+(`container-list` shows `running` 60s+); the gate's `wait_for_container_status … stopped 60` times
+out. Hits **fedimint, electrumx, bitcoin-knots, btcpay-server, immich** (slow-to-SIGTERM apps).
+`filebrowser` passes because it exits on SIGTERM in <30s.
+
+**ROOT CAUSE (from .198 journal during a live `package.stop fedimint`):**
+```
+WARN  quadlet: systemctl --user stop fedimint.service timed out after 45s
+ERROR runtime: package.stop fedimint failed: stop_container fedimint:
+      podman stop -t 30 fedimint timed out after 30s: deadline has elapsed
+```
+The orchestrator stop path **ignores the per-app graceful-stop table** and the wrapper deadline
+equals the grace:
+- `archipelago::api::rpc::package::runtime::stop_timeout_secs()` defines per-app grace
+  (**bitcoin 600s, lnd 330s, electrumx 300s, immich_postgres 120s, fedimint/btcpay 60s**, default 30).
+  The **legacy** stop paths use it (runtime.rs:329/607/1060 `podman stop -t <stop_timeout_secs>`).
+- The **orchestrator** path does NOT: `prod_orchestrator::stop()` → `ContainerRuntime::stop_container`
+  (`container/src/runtime.rs:124`) → API `PodmanClient::stop_container` hardcodes **`?t=10`**
+  (podman_client.rs) and the CLI fallback hardcodes **`-t 30`** (runtime.rs:128). fedimint needs 60s
+  but gets 10s/30s ⇒ SIGTERM grace expires; the API/CLI stop errors out and the whole stop fails →
+  state reverts to `running`.
+- **Compounding:** `PODMAN_CLI_DEFAULT_TIMEOUT = 30s` (runtime.rs:9) wraps `podman stop -t 30`, so
+  the await fires **exactly** when podman would SIGKILL → "timed out after 30s" even though the kill
+  would land a moment later. The wrapper deadline must exceed the `-t` grace.
+
+**FIX (two parts, design choice flagged):**
+1. **Thread the per-app stop grace into the orchestrator stop path.** Either (A) move/duplicate
+   `stop_timeout_secs` into the `container` crate and have `stop_container` use it, (B) extend the
+   `ContainerRuntime::stop_container` signature to take a `grace: Duration` and have
+   `prod_orchestrator::stop()` compute it from the loaded manifest, or **(C, north-star-aligned)**
+   add a `stop_grace_secs` field to the manifest (default 30) and read it from `lm.manifest` in
+   `stop()`. (C) is the manifest-driven choice; bitcoin/lnd/electrumx/fedimint manifests then declare
+   their value. **DECISION NEEDED from owner: A/B (fast, table-based) vs C (manifest-driven).**
+2. **Make the CLI/API wrapper deadline = grace + buffer** (e.g. grace + 15s) so podman's SIGKILL
+   completes inside the await. Apply to both `PodmanClient::stop_container` (`?t=`+HTTP timeout) and
+   the `runtime.rs` CLI fallback (`-t`+`PODMAN_CLI_DEFAULT_TIMEOUT`).
+   Add a mock-orchestrator test: a container that ignores SIGTERM for >30s must still end `stopped`.
+
+**Build/deploy after the fix:** `cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`
+→ sideload to .228 + .198 (stop archipelago, cp binary, start) → **re-quadletize .228** (its backend
+`.container` files are gone from my cascade-gate contamination — reinstall its apps so units
+regenerate, matching .198) → re-run the canonical gate (DESTRUCTIVE only).
+
+### ✅/⚠️ FIX SHIPPED + VALIDATED 2026-06-22 — and the gate has MORE causes than the grace bug
+
+**Done:** the grace fix is implemented (option **C+table fallback**: manifest `stop_grace_secs` →
+`stop_grace_secs_for()` table; deadline = grace + 15s), unit-tested (3 tests green), committed
+(`2dad64b2`), release-built, and **deployed to BOTH .228 and .198** (active, UI 200). Quadlet
+regression suite green (37/37). **Validated:** healthy app `vaultwarden` stops cleanly on .198
+(running→exited→removed) — no regression; the deployed binary's stop path works.
+
+**The gate stop-failure was MULTI-CAUSED (3 real product bugs) — all 3 now FIXED + the electrumx
+lifecycle suite is GREEN (10/10, 66s) on .228:**
+1. ✅ **Stop ignored per-app grace** (`podman stop -t 30` spurious 30s timeout) — commit `2dad64b2`.
+   Orchestrator now uses manifest `stop_grace_secs` → `stop_grace_secs_for()` table; deadline =
+   grace + 15s; applied to quadlet stop + API + CLI.
+2. ✅ **Reconciler resurrected user-stopped apps** — commit `760a32bc`. The reconcile filter's
+   `dependency_required` override re-included a user-stopped dependency (electrumx ← active mempool),
+   the in-memory `disabled` set is wiped on manifest reload, and the host-port "repair" then restarted
+   the stopped backend within ~8s. Fix: `ensure_running_with_mode` now bails `Left("user-stopped")`
+   when the on-disk `user_stopped` marker is set (the single choke point all reconcile flows through);
+   install/start clear the marker first so user actions are unaffected.
+3. ✅ **container-list reported user-stopped apps as `running`** — commit `6e49ce6f`. The backend was
+   Exited but its UI companion (electrs-ui/bitcoin-ui/…) kept serving the launch port, and the
+   state-refresh upgraded any reachable launch port to `running`. Fix: `handle_container_list` forces
+   `stopped` for `user_stopped` apps before the launch-port refresh.
+
+**Earlier theories now RESOLVED/superseded:** "fedimint crash-looping" was **probe-induced churn** —
+left alone, fedimint is stable (Up 48 min, 0 watchdog restarts/30 min); its restarts during testing
+were the host-port watchdog firing while I rapid-cycled stop/start (fixed by #2). "Exited→Stopped
+key mismatch" was actually the live-UI-companion launch-port issue (#3). "Grace vs gate-timeout"
+(electrumx 300s) was moot — a healthy electrumx honours SIGQUIT and stops in <1s.
+
+**TWO-NODE GATE RESULT (1×, DESTRUCTIVE, both with the 3-fix binary):**
+- **.228: 104/110.** All previously-failing `package.stop` tests now PASS (bitcoin/btcpay/electrumx/
+  fedimint/immich). Remaining 6: test 31 (companion recreate), 44 (fedimint orphan — probe
+  pollution), 55 (immich restart timing), 83 (bitcoin not archival-synced), 94/99 (endpoint/lnd-proxy
+  cascade from 83).
+- **.198: 94/110.** **14 of 16 failures are one root cause: bitcoin is in IBD** (test 83 says
+  `blocks=817652 headers=954850` — ~137k behind). Everything chained to bitcoin cascades: lnd
+  (16,85), btcpay (22,23,103), electrumx (37), mempool stack (71,72,73,101), endpoints (94),
+  bitcoin.getinfo (7,12). The other 2 are node-independent: **31** (companion recreate) and **44**
+  (fedimint orphan pollution).
+
+**CONCLUSION: the lifecycle-stop blocker is FIXED and validated on both nodes.** The residual red is
+NOT lifecycle bugs — it is (a) **bitcoin still syncing (IBD)** on the test nodes [test 83 is an
+explicit precondition; nothing electrumx/lnd/btcpay/mempool can pass until it finishes], (b) **.228
+plain-podman contamination** (my cascade-gate), and (c) two minor items: **test 31** companion-unit
+recreate (both nodes — likely the 90s window vs reconcile tick + image step; investigate) and **test
+44** orphan fedimint container left by my probing.
+
+**EVERY gate failure is now FIXED or explained — NO lifecycle code bugs remain.** Final read:
+- ✅ `package.stop` (the blocker): 3 bugs fixed (`2dad64b2`/`760a32bc`/`6e49ce6f`), green both nodes.
+- **bitcoin-IBD cascade** (most of .198's red): environmental — bitcoin syncing (test 83 precondition).
+- **test 31** companion-recreate: NOT a product bug. Two things: (a) **FIXED** — the companion
+  reconcile stage was gated behind the slow per-app pass; now it runs on its own ~30s loop
+  (`452f05d8`). Validated on .228 with the new binary: a deleted `archy-electrs-ui` unit self-heals
+  in **~10s** (was stuck 100s+), journal: `companion not active, repairing → wrote quadlet unit →
+  companion started`. (b) **HARNESS CAVEAT** — the companion-survives bats does LOCAL `rm`/`systemctl
+  --user` (no ssh), so running the gate from .116 against a remote node actually tests **.116's**
+  companions with **.116's** (old) binary, not the RPC target. ⇒ the companion-survives suite must be
+  run ON the target node (or with the new binary on .116) to be meaningful. This explains the
+  "failed on both nodes" runs — both were silently testing .116.
+- **test 55** immich restart: NOT a bug — the heavy 3-container stack (postgres+redis+server) restarts
+  in >120s under load; immich DOES return to running. *Optional:* bump the immich restart wait.
+- **test 44** fedimint orphan: my probe pollution; a teardown clears it.
+
+**To reach a literally-green 5× gate (now infra/node-prep + minor test-window tuning, not lifecycle code):**
+1. Let bitcoin finish IBD on a test node (or point the gate at an archival-synced bitcoin).
+2. Re-quadletize .228 (reinstall its backends so `.container` units regenerate, matching .198).
+   electrumx done; bitcoin/btcpay/fedimint/immich/etc. remain. (Most backends ARE in manifest_ids
+   already; this is about regenerating quadlet units + clearing adopted plain-podman state.)
+3. Optional: faster companion-reconcile cadence (test 31) + longer immich-restart wait (test 55) +
+   clear the test-44 orphan — or simply run the gate on a less-loaded, bitcoin-synced node.
+4. ✅ **test 31 ROOT-CAUSED = contamination + load (NOT a product bug).** `companion::reconcile` only
+   recreates a deleted companion unit (e.g. `archy-electrs-ui`) when its PARENT backend (electrumx)
+   is in `manifest_ids`. On contaminated .228 electrumx ran as plain podman and was NOT a tracked
+   manifest install (its `/opt/.../electrumx/manifest.yml` exists on disk but wasn't loaded), so the
+   reconciler never iterated it → companion orphaned. **Proven fix:** `package.install electrumx`
+   re-registered it (now `reconcile action app_id=electrumx` fires) AND restored the companion (unit
+   present, service active). The companion self-heal logic is correct. ⇒ test 31 clears once .228 is
+   re-quadletized (step 2). electrumx on .228 is now de-contaminated. Still: clear test-44 orphans.
+4. Then run `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1` on the synced+quadlet node, then the other.
+
+**Quadlet context (still true, but SEPARATE from the bug above):** quadlet IS the intended backend
+runtime — .198 has the backend `.container` files (bitcoin-knots/btcpay-server/fedimint/filebrowser/
+indeedhub/gitea/grafana/botfights/…). .228 lost them (only UI companions + home-assistant remain;
+`bitcoin-core.container` is `.disabled-20260506`) **because my cascade-gate uninstalled its apps and
+my `package.start` restore recreated them as bare `podman run --restart=unless-stopped`** without
+regenerating units. Two related hardening items: (a) `package.start` should regenerate a missing
+quadlet unit, not fall back to bare podman; (b) re-survey the status doc's "Quadlet-everywhere ~96%"
+from `.container`-file presence + `PODMAN_SYSTEMD_UNIT`, not from "container running".
+
+The **stop→stopped STATE reporting is correct** once the container actually stops (server.rs:1334
+keeps a `--rm`'d app visible as `Stopped` via the `user_stopped` guard — proven on filebrowser); the
+bug is purely "container never stops", not "state not reported".
+
+### MY-SESSION ERRATA (own it on resume)
+- I ran the gate with `ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1`, which is **NOT** the canonical gate (that
+  is `ARCHY_ALLOW_DESTRUCTIVE=1` only — stop/start/restart, no uninstall/reinstall; see run-gate.sh
+  "Suggested release-gate invocation"). Cascade ran uninstall/reinstall on every app and, when I
+  killed the run mid-iteration, left bitcoin-knots/electrumx/btcpay/fedimint/immich uninstalled or
+  stranded. **I fully restored .228** (reinstalled bitcoin-knots with the correct image
+  `146.59.87.168:3000/lfg2025/bitcoin-knots:latest`; started the rest; cleared a stale
+  `user-stopped.json`). Verified healthy: UI 200, 35 containers, 17 apps `running`.
+- Reinstall gotcha: `package.install` needs a REAL image ref in `dockerImage`; a bare app name
+  → `Invalid Docker image format`.
+
+### NEXT STEPS (in order) — SINGLE-NODE (.228) criterion
+1. ✅ **DONE** — 4 stop/reconcile bugs fixed + deployed (`2dad64b2` grace, `760a32bc`
+   reconcile-resurrection guard, `6e49ce6f` container-list user-stopped, `452f05d8` companion
+   cadence). Plus test-harness fixes (lnd/immich/fedimint/NPM readiness + config).
+2. ✅ **DONE** — gate run **ON .228** (synced bitcoin): **110/110 GREEN** (1×). Key lesson:
+   **run the gate on the node**, not via RPC from .116 (local podman/systemctl/bitcoin probes).
+3. ◧ **5× run on .228 in progress** (`ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`, on the node).
+   5 consecutive clean iterations = the single-node gate criterion → demote the banner.
+4. **netbird migration (#20 phase 4)** — the one real migration left; assess setup steps first (TLS
+   cert gen, config files, resolver IP — may need host-file-write hooks beyond exec/copy_from_host;
+   legacy is install_netbird_stack in stacks.rs). Then btcpay/mempool stack polish.
+5. Hardening: `package.start` should regenerate a missing quadlet unit, not fall back to bare podman.
+
+**Multinode / fleet (.198 + the rest) → `docs/multinode-testing-plan.md` (separate, after .228 green).**
+Carry-over notes for that plan: .198 bitcoin was mid-IBD; the lnd `/app/lnd/` nginx proxy had a
+stale `8081` target on .228 (repo code is correct at 18083 — re-check on other nodes).
+
+### KNOWN ISSUES / WATCH-OUTS
+- **.198 is a weak/loaded node** (load avg ~3–5). The generic reconcile recreates
+  containers it deems unhealthy; under load, false-failing health checks → churn. The
+  tcp-health fix (`e2a012d0`) mitigated the frontend case. If the lifecycle gate churns on
+  .198, look for other apps whose http health checks false-fail under load → prefer tcp.
+- **Many concurrent SSH sessions to .198 wedge its sshd** (MaxStartups) — it pings but SSH
+  hangs for minutes. Use ONE ssh at a time to .198; `pkill -f 192.168.1.198` to clear strays.
+- Hook `exec` only works in the scoped form (committed). `copy_from_host` is direct `cp`.
+
+### DEPLOY / VERIFY FACTS (both nodes, ISO Debian, glibc 2.41 — binary built on .116 runs on both)
+- **Build:** `cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`
+  (~12 min, opt-level=3). Binary at `core/target/release/archipelago`. Linker
+  "undefined hidden symbol" → rebuild with CARGO_INCREMENTAL=0. `archipelago` is a
+  bin-only crate (no lib). Filtered tests: `cargo test -p archipelago --bin archipelago -- hooks quadlet`.
+- **Sideload:** `scp binary $H:/tmp/archipelago-new` → `sudo systemctl stop archipelago;
+  sudo cp /tmp/archipelago-new /usr/local/bin/archipelago; sudo chmod +x …; sudo systemctl
+  start archipelago`. Containers SURVIVE the restart (--restart unless-stopped +
+  podman-restart.service). Binary path is /usr/local/bin/archipelago.
+- **Manifests** live at /opt/archipelago/apps/<app_id>/manifest.yml (root-owned ok). The
+  orchestrator CACHES them at startup → **edit on disk then RESTART archipelago to reload**.
+  Bulk deploy: `tar czf t.tgz -C apps indeedhub indeedhub-postgres indeedhub-redis
+  indeedhub-minio indeedhub-relay indeedhub-api indeedhub-ffmpeg`; scp; `sudo tar xzf t.tgz
+  -C /opt/archipelago/apps`.
+- **Nodes:** .228 = 192.168.1.228, SSH pw `archipelago`, RPC/UI pw `password123` (https).
+  .198 = 192.168.1.198, SSH pw `archipelago`, **RPC/UI pw `ThisIsWeb54321@`** (https). Both
+  have the 7-container indeedhub stack + secrets + named volumes pre-existing.
+- **Trigger install via RPC:** `auth.login` (sets session+csrf cookies) → send the csrf
+  cookie value as `X-CSRF-Token` header → `package.install` with params
+  `{"id":"indeedhub","dockerImage":"<any>"}` (dockerImage required even for stacks; install
+  is async → returns `{"status":"installing"}`). install logs go to
+  /var/log/archipelago/container-installs.log (best-effort) AND journalctl -u archipelago.
+- **Fresh-create test recipe:** `podman rm -f indeedhub` (stateless frontend) → package.install
+  indeedhub → expect install_fresh + post_install hook (all 4 steps `ok`) + UI 200 on :7778
+  (/ , /nostr-provider.js, /api/). On adoption the frontend is NoOp (hook does NOT run —
+  install_fresh is the only hook trigger).
+
+## 9. Documentation map (what survives)
+
+This master plan is the hub. Authoritative standalone docs (linked above), kept:
+
+- **Design:** `architecture.md`, `app-developer-guide.md`,
+  `APP-PACKAGING-MIGRATION-PLAN.md`, `registry-manifest-design.md`,
+  `marketplace-protocol.md`, `dht-distribution-design.md`,
+  `multi-node-architecture.md`, `rust-orchestrator-migration.md`,
+  `bulletproof-containers.md`, `three-mode-ui-design.md`, `dual-ecash-design.md`,
+  `meshroller-integration-design.md`, `phase4-streaming-ecash-plan.md`, `adr/*`.
+- **Reference:** `app-manifest-spec.md`, `api-reference.md`, `developer-guide.md`,
+  `operations-runbook.md`, `troubleshooting.md`, `user-walkthrough.md`,
+  `bitcoin-rpc-relay.md`, `security-code-audit-2026-03.md`, `GAMEPAD-NAV.md`,
+  `SEED-VERIFICATION.md`, `hotfix-process.md`, `app-registry-status-2026-06-21.md`.
+
+All dated handoffs/resumes/transcripts/superseded trackers were consolidated here
+and removed (recoverable via git) on 2026-06-21.
+
+## 10. Backlog — investigate frontend state management (2026-06-23)
+
+**Investigate adopting a real client-state/data-fetching layer for `neode-ui`** instead of
+the current hand-rolled Pinia stores + ad-hoc fetch/poll patterns. Motivation: lifecycle/UX
+bugs like the stuck "full-red" install/uninstall progress bar and ghost **My Apps** entries
+(see §6c) are partly a *state-sync* problem — the UI's view of package state drifts from the
+backend and isn't reliably invalidated/refetched. A principled query/cache layer (request
+dedup, background refetch, cache invalidation on mutation, optimistic updates, retry/stale
+handling) would make these classes of bug structurally hard.
+
+**Research → recommend → (maybe) adopt:**
+- Evaluate **TanStack Query** (Vue Query) as the leading candidate, plus alternatives
+  (Pinia Colada, vue-query alternatives, plain Pinia + a disciplined invalidation layer, or
+  an SSE/WebSocket push model for package-state events instead of polling).
+- Criteria: fit with the existing Pinia/RPC architecture, bundle-size cost, offline/PWA
+  behaviour, how cleanly it models long-running mutations (install/uninstall with progress),
+  and whether a push channel for package-state changes is the better root-cause fix.
+- Deliverable: a short design note + a recommendation, then a scoped migration of the
+  package-lifecycle surfaces (My Apps / install / uninstall / update progress) as the proof
+  case — sequence AFTER workstream F (it informs F's progress-UI fix and vice-versa).
+
+## 10b. Backlog — intelligent launch-port selection (2026-06-26)
+
+**Replace the per-app static launch-port map with a smart, manifest-first heuristic.** Gitea
+launched at **:2222 (SSH)** instead of **:3001 (web)** on a node missing the gitea manifest on
+disk: `manifest_lan_address_for` returned None → the code fell through to `extract_lan_address`,
+which returns podman's **first-listed** published port, and podman lists `2222->22` before
+`3001->3000`. Patched 2026-06-26 (`670ebb06`) with a static `"gitea" => 3001` entry in
+`lan_address_for` (`core/container/src/podman_client.rs`) — but that's a per-app band-aid (the
+anti-pattern CLAUDE.md warns against; the map already carries bitcoin/lnd/mempool/immich/… by hand).
+
+**Real fix (do this, then delete the static entries):**
+- **Primary** is already correct — derive the launch URL from the manifest's declared
+  `interfaces.main` port. The failure was only the *fallback*. The north-star cure is
+  registry-distributed manifests (workstream B) so the manifest is always present and we never
+  guess.
+- **Smart fallback** — make `extract_lan_address` stop returning the blind first port: **skip
+  container-side ports that are known non-HTTP (22/SSH, etc.) and prefer the published port whose
+  container side matches the manifest `health_check` endpoint / a known web port.** Fixes the whole
+  multi-port-app class generically (no per-app hardcoding), and lets us drop the static map.
+- ~20-line change to one function + unit tests; rides the next fleet roll. NOT a free-port
+  remap (that's `port_allocator.rs`, which already resolves host-port *collisions* — a different
+  problem; gitea's web UI was never in conflict).
+
+## 10c. Backlog — generalize the archival/full-node install blocker (2026-06-26)
+
+**Make "this app needs an un-pruned (archival, txindex) Bitcoin node" a manifest-declared
+dependency, applied to every app that needs it — using the electrumX/mempool blocker as the
+reference behavior.** Today the gate works but is **hardcoded**: `requires_unpruned_bitcoin()` in
+`core/archipelago/src/api/rpc/package/dependencies.rs` is a literal `matches!(package_id, "electrumx"
+| "electrs" | "mempool-electrs" | "mempool" | "mempool-web")`, and install `bail!`s with
+`archival_bitcoin_required_message` when `bitcoin.pruned` is true or disk < `ARCHIVAL_BITCOIN_DISK_GB`
+(1 TB). That's the same per-app-hardcoding anti-pattern as the gitea static map (§10b) and the
+`install_*_stack` Rust — any new app needing a full node is silently *un*-gated until someone edits
+this match.
+
+**Do:**
+- **Declare it in the manifest** — e.g. `requires: { bitcoin: archival }` (or a
+  `dependencies.bitcoin.pruned: false` constraint) so the install pre-flight reads the requirement
+  from the manifest set instead of a hardcoded list. Covers future apps automatically (manifest-driven
+  north star).
+- **Audit coverage** — confirm EVERY archival-dependent app is gated (electrumX, electrs,
+  mempool + its electrs, and any BTC-indexer/explorer added later); add a unit test asserting the
+  manifest constraint ⇒ blocker fires.
+- **UX** — the blocker must be a clear, surfaced **pre-install** state in the UI (not just an RPC
+  `bail!` string): explain *why* (pruned node / insufficient disk), what to do (add ~1 TB, resync
+  un-pruned with txindex), and keep the app visibly "requires archival node" rather than a confusing
+  generic failure. Pairs with workstream F's honest-progress/blocker UX.
+- Reference: the existing `package-install-prune-check` dependency descriptor (dependencies.rs:208)
+  is the seam to make data-driven.
+
+## 10d. Mesh — Meshtastic MeshCore-parity (in the fleet binary; one open bug) (2026-06-26)
+
+**Status: shipped as commit `8fdb45e8` and now riding in the rolled fleet binary** (built into the
+#9 deploy from HEAD, sha `0060dcd6…`). The Meshtastic driver auto-provisions LoRa **region (EU_868)**
+and a shared **channel "archipelago"** via the official admin API (`set_config`=field34,
+`set_channel`=field33) — discovery, bidirectional RF, and **sending** are all verified on **.116 + .228**.
+Detail + history: [[project_meshtastic_parity]].
+
+**Open work (slot after WS-F #9–11, before/with multinode):**
+- **RECEIVED-message surfacing bug** — the running driver does **not** surface received messages
+  (`mesh.messages` stays `[]`) even though the radio physically receives them. An instrumentation
+  build was in flight to locate where the inbound packet is dropped between the radio serial/BLE read
+  and the `mesh.messages` store. This is the one blocker to closing MeshCore parity.
+- **.198 radio is bad** — won't persist config (needs a reflash) so it's not a usable mesh test node;
+  use .116/.228 for mesh verification.
+- Definition of done: a message sent from a MeshCore/Meshtastic peer on channel "archipelago" appears
+  in `mesh.messages` on the receiving archipelago node, end-to-end, on ≥2 LAN nodes.
--- a/docs/PROGRESS_MEMORY.md
+++ b/docs/PROGRESS_MEMORY.md
@ -1,44 +0,0 @@
-# Progress Memory
-
-Last updated: 2026-06-13
-
-## Current State
-
- `v1.7.90-alpha` release is complete, tagged, pushed, uploaded, and verified on vps2.
- Release commit: `bb808df8` (chore: release v1.7.90-alpha).
- Feature commit:  `c800293f` (fix: bitcoin receive, AIUI pointer input, electrs self-heal, OTA timeout).
- Gitea tag: `v1.7.90-alpha` (on origin/gitea-vps2).
- Live OTA manifest on the update host (146.59.87.168) now resolves to `1.7.90-alpha`; both
-  artifact download URLs (binary + frontend tarball) return HTTP 200.
- v1.7.89-alpha was already fully shipped before this session.
-
-## What shipped in v1.7.90-alpha
-
- Bitcoin receive address generation fixed (correct address type, no more 400).
- AIUI/app session: on-screen pointer can click + type into app content (incl. app store
-  search); "open in new tab" opens the phone browser; mobile credential modal centered.
- Electrs self-heals from a corrupt index and shows a percent/block-height progress screen.
- update.rs: retired tx1138 secondary mirror dropped (one-time migration); longer download
-  timeout for slow connections.
-
-## Verification
-
- Full release harness green (8 stages): git-diff, cargo-fmt, catalog-drift, release-manifest,
-  ui-type-check, ui-unit-tests (80 files / 655 tests), cargo-check, cargo-test-weekly.
- Freshly built binary embeds `1.7.90-alpha` (no stale 1.7.89); frontend dist rebuilt fresh
-  (new AppSession bundle); manifest sha256 + size match on-disk artifacts.
-
-## Known gaps / follow-ups
-
- `gitea-local` (localhost:3000) push FAILS from this node — redirects to /login (auth).
-  The v1.7.88 and v1.7.89 tags were also already missing there, so this is a pre-existing
-  condition on this node, not a v1.7.90 regression. vps2 is the primary OTA mirror and is fine.
- OTA self-update verification on THIS node (.116) not yet observed this session — the node
-  should auto-apply from the live 1.7.90-alpha manifest; confirm
-  `update_state.json.current_version == 1.7.90-alpha` after the scheduler runs.
-
-## Resume Context
-
- If a later session resumes, continue from the next active product/release task, not this
-  finished release.
- Broader context: docs/WEEKLY_RELEASE_TRACKER.md, docs/RESUME.md, docs/NEXT_TERMINAL_HANDOFF.md
--- a/docs/REMAINING-ISSUES-PLAN.md
+++ b/docs/REMAINING-ISSUES-PLAN.md
@ -1,224 +0,0 @@
-# Remaining issues — implementation plans
-
-Written 2026-06-17. Covers the open Gitea issues not closeable in the single-box
-dev env. Each plan lists the files to touch, the approach, and how to verify
-(most need .116 + .198, a companion phone, or funded wallets). Issues #3 (VPN)
-and #5 (OpenWRT/TollGate) are intentionally out of scope per the user.
-
-Status of the rest at time of writing:
- **#31** group chat over Tor — dedup-by-`msg_id` fix already shipped (open only
-  for a 2-node Tor confirmation). See its Gitea comment.
- **#43** install on .70 — blocked: .70 unreachable. Plan below is a code-side
-  hardening that doesn't depend on .70's logs.
-
---
-
-## #46 — Pay for peer files (local wallet OR invoice+QR to seller)
-
-> **Status (2026-06-17): Phase 1 DONE & compiles** (LN invoice + QR + release).
-> Seller: `content_invoice.rs` entitlement store, `GET /content/{id}/invoice`
-> + `/invoice-status/{hash}`, invoice-paid path in `serve_content`
-> (`X-Invoice-Hash`), LND `create_invoice`/`invoice_is_settled`. Buyer:
-> `content.request-invoice` / `.invoice-status` / `.download-peer-invoice` +
-> `PeerFiles.vue` picker modal + QR + poll. Phases 2 (on-chain) and 3 (local
-> LN/on-chain methods) remain; needs live funded-wallet verify. Issue left open.
-
-**Goal.** At the paid-download step in Cloud → peer files, let the buyer choose
-how to pay: (a) their local wallet (ecash today; LN/on-chain later), or (b) get
-an invoice with a QR drawn on the **selling** node's wallet, pay from any
-external wallet, and have the file release on confirmation.
-
-**What exists already**
- Buyer ecash auto-pay: `content.download-peer-paid` (mints ecash, downloads
-  atomically) — wired in `neode-ui/src/views/PeerFiles.vue` `downloadFile()`.
- Payer-side builder: `streaming.prepare-payment` RPC + `wallet/ecash.rs`
-  (`build_payment_token`, cross-mint), `swarm/payment.rs`.
- Free streaming download: `/api/peer-content/:onion/:id` (Range-capable).
- LND invoice RPC: `lnd.createinvoice`; ecash balance: `wallet.ecash-balance`.
-
-**Backend work**
-1. **Seller-side invoice RPC** (new), e.g. `content.request-invoice`
-   `{ onion, content_id }` → asks the *selling* node (over the existing
-   `/archipelago/...` peer transport, same path machinery as
-   `content.download-peer-paid`) to produce a payment request for `price_sats`:
-   - LN: `lnd.createinvoice` on the seller, return `bolt11` + `payment_hash`.
-   - on-chain: `lnd.newaddress` on the seller, return `address` + `amount`.
-   - Seller records a pending entitlement keyed by `payment_hash`/address →
-     content_id → buyer.
-2. **Payment confirmation + release**: seller polls its own LND
-   (`lnd.lookup-invoice` / address watch); on settle, marks the entitlement
-   paid. Buyer side polls `content.invoice-status { payment_hash }` → when paid,
-   downloads via the existing `/api/peer-content` (gate now passes because the
-   entitlement is satisfied). Reuse the streaming gate in `streaming/` — add an
-   "invoice-paid" path alongside the ecash-token path.
-3. Keep `content.download-peer-paid` (local-ecash) as the (a) fast path.
-
-**Frontend work** (`PeerFiles.vue`)
-1. Before a paid download, open a small **payment-method picker** modal:
-   - "Pay from this node's wallet" → existing ecash flow (show balance; if
-     insufficient, the LN/on-chain local options when those land).
-   - "Pay from another wallet (QR)" → call `content.request-invoice`, render the
-     `bolt11`/address as a **QR** (add a tiny QR lib or reuse one already in the
-     bundle — check `package.json`), show amount + a live "waiting for
-     payment…" state polling `content.invoice-status`, then auto-download.
-2. Reuse the existing `purchaseError`/`downloading` state + `triggerDownload`.
-
-**Verify**: .116 (seller) + .198 (buyer), a funded regtest/LN wallet. Buyer
-picks QR, pays from a 3rd wallet, file releases. Then the local-ecash path.
-
-**Effort**: large (multi-day). Phase it: (1) LN-invoice + QR + release, (2)
-on-chain, (3) local LN/on-chain methods.
-
---
-
-## #18 — Companion app: "open in external browser" apps don't work
-
-> **Status (2026-06-17): DONE & compiles (Rust + TS); Android unbuilt here.**
-> Reverse relay hop added: `external_open_tx` channel, kiosk publishes
-> `{"t":"o","url"}` on `/ws/remote-relay` (URL-validated), forwarded to the
-> companion's `/ws/remote-input`. `requestExternalOpen()` in `remote-relay.ts`
-> wired into all four `appLauncher.ts` external-open sites; `InputWebSocket.kt`
-> + `RemoteInputScreen.kt` open it via `ACTION_VIEW`. Issue closed; live pairing
-> test pending.
-
-**Goal.** Apps configured to open in a new/external browser should launch on the
-**phone** when driven from the companion controller, using the phone-default-
-browser request pattern.
-
-**What exists**
- Relay protocol in `neode-ui/src/api/remote-relay.ts` — message cases `m`
-  (move cursor), `c` (click), `s` (scroll, just fixed in #7). Click resolves the
-  element under the virtual cursor via `deepElementFromPoint`.
- The kiosk side runs the dashboard; "open external" apps currently try to
-  `window.open` on the **kiosk**, which the phone never sees.
-
-**Approach**
-1. **Detect external-open intent on the kiosk**: when a click lands on an
-   element that would open externally (anchor with `target=_blank` / an app
-   flagged `opensExternally`, or an intercepted `window.open`), instead of
-   opening locally, send a new relay message to the phone:
-   `{ t: 'open-url', url }` over the `/ws/remote-relay` channel (the kiosk is the
-   relay server side — find where it sends frames back to the companion).
-2. **Companion (phone) side** handles `open-url` by doing `window.open(url,
-   '_blank')` / `location.href = url` so it opens in the phone's default browser.
-   - If the companion is the **Android APK** (separate codebase, see
-     `Android/` + memory `feedback_companion_apk_not_in_update`), add an
-     intent-based handler there; if it's a mobile web client, handle in JS.
-3. Intercept `window.open` on the kiosk dashboard globally (a small shim that,
-   when remote-relay is active, forwards to the phone instead of opening).
-
-**Verify**: phone + kiosk paired; tap an "open external" app from the companion;
-it opens in the phone browser.
-
-**Effort**: medium; needs the companion device + possibly an APK change.
-
---
-
-## #50 — Integrate Meshroller into our mesh features
-
-> **Decision made 2026-06-17: seam (a) — Rust-native lift.** Full design with
-> verified seam anchors (message types, dispatch, send API, event/trust gates,
-> Ollama call) is in **`docs/meshroller-integration-design.md`**. Summary below.
-
-Source: https://gitea.l484.com/clasko/Meshroller
-
-**Phase 0 — review (DONE 2026-06-17)**
- Reviewed. Meshroller is a single ~29KB Python script (`meshroller.py`): a
-  daemon that bridges a **Meshtastic** radio (via the `meshtastic` Python serial
-  module, `SerialInterface`) to an **Ollama** LLM (`qwen2.5-coder`). It has
-  trusted-node auth, scheduled/queued messaging, and command handling on mesh
-  channels. It is a **daemon**, not firmware or a library.
- **License**: in-house (our own developer) — no third-party license blocker.
- **Hardware/transport reality**: it rides **Meshtastic serial + a local
-  Ollama**. Our radio is **Meshcore** (Heltec V3) and our mesh stack targets
-  meshcore. The `meshtastic` module does NOT speak meshcore, so the script
-  cannot drive our radio unmodified.
- **Decision needed (architecture)**: per user, integration **must work with
-  meshcore**. Two seams:
-  - (a) Lift Meshroller's *behaviors* (LLM bridge, trusted-node auth, scheduled
-    messaging, command parser) into our Rust mesh stack as typed message kinds —
-    native to meshcore, no Python/Meshtastic dependency. Preferred for meshcore.
-  - (b) Package the Python daemon as a container app and add a meshcore serial
-    backend to it (keeps the script, but requires writing meshcore I/O the
-    `meshtastic` module doesn't provide).
-  This choice is the remaining gate; the rest of Phase 1 below stands.
-
-**Phase 1 — choose the seam**
- Our mesh stack: `core/archipelago/src/mesh/` (`mod.rs` `MeshService`,
-  `listener/`, `protocol.rs`, `types.rs`). Decide:
-  - If Meshroller is a *protocol/feature on the same radio* → implement it as a
-    typed message kind in our `MeshMessageType` + `listener/dispatch.rs`
-    (mirrors how block headers / alerts are handled).
-  - If it's a *separate transport/daemon* → wrap it behind our transport router
-    (`transport/`) like FIPS/LAN/Tor.
- Reuse the event seam (`MeshEvent`) so the UI gets pushes (same path we just
-  wired for #48).
-
-**Phase 2 — UX** (ties into `project_mesh_telegram_plan`)
- A dead-simple onboarding + usage flow in the Mesh tab. Define the 1–2 killer
-  actions and design the setup wizard.
-
-**Verify**: 2 radios (the .116 Meshcore + a second). 
-
-**Effort**: multi-day; gated on the Phase 0 review + a license/architecture
-decision.
-
---
-
-## #15 — netbird app doesn't work (LOW PRIORITY)
-
-> **Status (2026-06-17): DIAGNOSED LIVE on .198 + FIXED (option A shipped); login works.**
-> THE real blocker: the dashboard needs a **secure context** —
-> `window.crypto.subtle is unavailable` over plain http, so OIDC PKCE threw
-> before login. Fix: proxy now serves **HTTPS** (self-signed cert at install,
-> `8087:443`, all origins `https://`); frontend opens netbird in a **new tab**
-> (self-signed-HTTPS iframe is blocked). Layered fixes also in `stacks.rs`:
-> nginx `resolver <gateway>` + variable upstreams (IP-cache 502; `resolver
-> local=on`/`${NGINX_LOCAL_RESOLVERS}` FAIL on nginx:1.27-alpine), LAN-IP
-> canonical origin + CORS + multi-origin redirect URIs, `/nb-auth`+`/nb-silent-auth`
-> SPA fallback (were 404), and a stale-store note (wipe to re-init). Also found:
-> `conmon died` zombie containers (recreate fixes; #53). Validated on .198,
-> registration+login succeed. Trusted-cert/iframe (option B) = #56;
-> registry-app migration = #52. Existing nodes need a clean reinstall.
-
-**Diagnose first** (likely a container/config issue, like other app fixes):
-1. On a node: `podman logs <netbird container>` — capture the actual failure.
-2. Check the app manifest + install path (`container/` install, env, ports,
-   the four iframe-sync places per memory `feedback_gitea_iframe_setup` if it
-   has a UI).
-3. netbird needs a management URL / setup key — confirm whether the app expects
-   config we don't provide, or a host capability (TUN device / NET_ADMIN) the
-   rootless-podman setup lacks.
-
-**Likely fix**: either supply the missing env/setup-key UI, or add the required
-container capability. Low priority — schedule after the above.
-
---
-
-## #43 — Install errors at DID-creation + password screens (.70); FIPS slow
-
-`.70` is unreachable, so we can't read its logs. Code-side hardening that helps
-regardless:
-> **Status (2026-06-17): hardening DONE & compiles.** Root cause was a
-> non-idempotent `seed.generate` that overwrote node keys under the client's
-> retry storm on slow first boot. Fixed: idempotent generate + retry-safe
-> verify (`seed_rpc.rs`), transient-vs-genuine error handling in
-> `OnboardingSeedGenerate/Verify.vue`, and a non-blocking FIPS status on
-> `OnboardingDone.vue`. Issue closed; full closure wants a fresh install on a
-> reachable node + re-test on .70.
-
-1. **Onboarding error surfacing** — in the seed/DID + password onboarding views
-   (`OnboardingSeed*`, the password step) and their RPC handlers
-   (`seed.generate` / `seed.verify` / `auth.setup`), make a *successful*
-   operation never show an error toast, and make genuinely-failed ops show the
-   real message + a retry — so cosmetic errors (op actually succeeded) stop
-   alarming users. Audit the promise/catch paths for races where a slow backend
-   resolves after a timeout fires.
-2. **FIPS start delay** — confirm `spawn_post_onboarding_fips_activate`
-   (`api/rpc/seed_rpc.rs`) isn't blocking onboarding; it already runs detached.
-   Consider surfacing "FIPS starting…" status instead of letting it look stuck.
-
-**Verify**: a fresh ISO install on a reachable node (.198 or a scratch box),
-watch the DID + password screens; then re-test on .70 once reachable.
-
-**Effort**: small–medium (the hardening); full closure needs a repro node.
--- a/docs/RESUME.md
+++ b/docs/RESUME.md
@ -1,840 +0,0 @@
-# RESUME - Archipelago Release Hardening on `.198`
-
-Last updated: 2026-06-10
-
-## 2026-06-10 05:48 EDT Active Session Checkpoint
-
-Work resumed from `docs/NEXT_TERMINAL_HANDOFF.md`. No `.198` host actions have
-been run yet in this resumed pass.
-
-Current first steps:
-
-1. Rerun `git diff --check`.
-2. Rerun the focused Rust image-version test for the Nextcloud false-update
-   helper.
-3. If those are clean, inspect and continue the rootless Podman lifecycle/
-   scanner-backoff work before any `.198` validation.
-
-Progress:
-
- `git diff --check` passed.
- Focused Rust image-version test in `/tmp/archy-cargo-image-versions` remains
-  inconclusive: the tool PTY stayed open after compile output stopped, with no
-  active `cargo`, `rustc`, or linker process visible.
- Bounded retry of the focused image-version test using the normal workspace
-  target also timed out: `timeout 300s cargo test --manifest-path core/Cargo.toml -p archipelago container::image_versions::tests`
-  exited `124` after compiling the `archipelago` test target without reaching
-  test output. Nextcloud false-update validation is still not closed.
- Local code change in progress: single-orchestrator `package.stop` now returns
-  immediately with `stopping` and runs the orchestrator stop in the background,
-  instead of blocking the RPC/UI while Podman cleanup happens.
- `cargo fmt --manifest-path core/Cargo.toml --all --check` passed.
- Compile check passed in `/tmp/archy-cargo-runtime-check`:
-  `cargo check --manifest-path core/Cargo.toml -p archipelago --bin archipelago`.
- `git diff --check` passed after the stop-path edit and doc updates.
- Lower-level stop path inspection: Quadlet service stop is already bounded
-  with kill/reset recovery, and the runtime fallback treats already-absent
-  containers as success. No extra lower-level stop change was made.
-
-## 2026-06-10 05:30 EDT Pause Checkpoint
-
-User paused to switch machines. Continue from `/home/archipelago/Projects/archy`
-and read `docs/NEXT_TERMINAL_HANDOFF.md` plus
-`docs/1.8-alpha-improvements-tracker.md` first. No dev server or validation
-command should be intentionally left running from this checkpoint.
-
-Latest local-only tracker progress:
-
- Done: uninstall preserve/delete-data choice, companion APK QR/download modal,
-  App Details setup-instructions card, dead/coming-soon UI cleanup via Spotlight
-  AI placeholder removal.
- In progress: Fleet/tab loading polish, Bitcoin receive-address readiness
-  states, no-registration credentials inventory, Nextcloud false-update fix.
- New credential fallback: PhotoPrism now shows manifest-backed credentials
-  (`admin` / `archipelago`) when backend credentials are empty. Grafana was not
-  added because `GRAFANA_ADMIN_PASSWORD` is not resolved to a known repo
-  default/secret.
- Nextcloud local fix: manifest/catalog/UI metadata now points at `nextcloud:29`
-  and image update detection ignores registry-host-only changes. Catalog drift
-  passed, but backend focused Rust validation did not complete cleanly. First
-  `cargo test -p archipelago container::image_versions::tests` from `core/`
-  hit a Rust linker/incremental artifact failure while `/tmp` was full; a
-  non-incremental retry was killed after running too long. Old
-  `/tmp/archy-cargo-*` build-cache directories were removed and `/tmp` recovered.
-
-Latest local validations:
-
- `npm run type-check` passed after the PhotoPrism credential fallback.
- `npm test -- --run src/views/apps/__tests__/appCredentials.test.ts` passed.
- `git diff --check` passed after the Spotlight cleanup and should be rerun
-  after resuming.
- `python3 scripts/check-app-catalog-drift.py --release --strict` passed during
-  the Nextcloud pass.
-
-Immediate next steps:
-
-1. Rerun `git diff --check`.
-2. Rerun `cargo test -p archipelago container::image_versions::tests` from
-   `core/` when ready to validate the Nextcloud update-detection helper.
-3. Continue the `docs/1.8-alpha-improvements-tracker.md` rows that remain
-   `todo` or `in-progress`, avoiding host-gated items until `.198` access is
-   intentionally resumed.
-
-## 2026-06-09 Resume Handoff - Read First
-
-Last user prompt to preserve:
-
-> please can we save all our progress, backlog, and goal to memory so I can resume on another device please
->
-> including the last prompt
-
-Ultimate release goal:
-
-Archipelago's app/container system must be developer-ready and production-release ready. New apps should be supported through manifest/runtime contracts and clear developer documentation, not one-off OS-level changes or fragile per-app hacks. The app system must be professional, secure, elegant, lightweight, and predictable: apps install, start, stop, restart, uninstall, reinstall, survive reboot, show correct status/progress, and launch correctly from tabs/iframes. Developers should be able to package apps for Archipelago clearly from the migration/developer docs.
-
-Important target node:
-
- Validation node: `archipelago@192.168.1.198`, password `password123`.
- Current release deadline pressure from user: production release target was Thursday, 2026-06-11.
- Tests have been run mostly on `.198`; user noted we may also need to validate on the current intended release server, not only `.198`.
- Avoid broad/destructive Podman store cleanup. Do not use `git reset --hard` or revert unrelated user changes.
-
-Current deployed backend on `.198`:
-
- Latest deployed `/usr/local/bin/archipelago` sha256: `9a00e5432dd9241a9a54087cc87ede46fc0c77a5051dbfb2d34112b9b12e902f`.
- A later local-only code change exists and passed `cargo check`: cached web-app health now requires HTTP reachability, not just TCP. This was not deployed because the user interrupted the release build/deploy flow. No build process was left running at handoff.
-
-Major progress achieved in the latest session:
-
- Beta Telemetry / Fleet collector:
-  - Confirmed `TELEMETRY_COLLECTOR_URL` was not set in the current shell and no repo/service config was setting it.
-  - Fixed the periodic reporter to POST a `telemetry.ingest` JSON-RPC envelope to the configured collector endpoint instead of POSTing the raw telemetry report body.
-  - Added optional systemd env loading with `EnvironmentFile=-/var/lib/archipelago/telemetry.env` in `image-recipe/configs/archipelago.service`.
-  - Updated `scripts/deploy-to-target.sh` so deployments write `/var/lib/archipelago/telemetry.env` when `TELEMETRY_COLLECTOR_URL` is exported in `scripts/deploy-config.sh`.
-  - Documented the expected value shape in `scripts/deploy-config.example`: `https://<collector-host>/rpc/v1`.
-  - Verification passed: `cargo fmt -p archipelago --manifest-path core/Cargo.toml`, `bash -n scripts/deploy-to-target.sh`, `git diff --check` for the touched files, and `CARGO_TARGET_DIR=/tmp/archy-cargo-check cargo check -p archipelago --manifest-path core/Cargo.toml`.
-  - `systemd-analyze verify image-recipe/configs/archipelago.service` could not run in the sandbox because systemd bus access failed with `SO_PASSCRED failed: Operation not permitted`.
-  - Still needed: choose the real collector host, create or update local `scripts/deploy-config.sh` with `export TELEMETRY_COLLECTOR_URL='https://<collector-host>/rpc/v1'`, deploy, restart `archipelago`, and confirm opted-in nodes ingest into Fleet.
- IndeeHub:
-  - Recovered stale/corrupt metadata/container state enough for fresh lifecycle.
-  - Full lifecycle passed earlier on `.198`.
-  - Verified launch on `7778`.
-  - Verified `/nostr-provider.js` is served and the Nostr signer bridge requirement is preserved.
- Saleor:
-  - Removed from app catalog/server as requested.
- Bitcoin Knots / Bitcoin UI:
-  - Fixed false health path so `bitcoin-knots` health no longer just probes the UI bridge on `8334`.
-  - Patched Bitcoin UI wording to show retrying/busy sync states instead of scary permanent failure.
-  - Verified `/bitcoin-status` recovered; node is in IBD and pruned, progress around 6-7% during latest checks.
- Fedimint:
-  - Restored/kept Fedimint Gateway as separate catalog app. Do not make Guardian launch Gateway.
-  - Fixed Guardian startup path so `fedimint` uses manifest-backed Quadlet/orchestrator, not legacy startup.
-  - Fixed generated unit regeneration by removing the pre-orchestrator Podman inspect gate for orchestrator starts.
-  - Fedimint Guardian unit now includes `FM_BITCOIND_URL=http://bitcoin-knots:8332`.
-  - Added manifest wrapper that waits for Bitcoin RPC sync with `"initialblockdownload":false` before launching `fedimintd`.
-  - Current correct behavior on `.198`: `fedimint.service` active and logging `Waiting for Bitcoin RPC sync at http://bitcoin-knots:8332...`; RPC health returns `starting`; container-list now reports `fedimint` as `starting` instead of stale `stopping`.
-  - Guardian iframe/tab does not yet show UI because `fedimintd` is intentionally gated until Bitcoin leaves IBD. The UI should explain "waiting for Bitcoin sync" rather than opening a blank/dead iframe.
- BotFights:
-  - User reported stopped/unhealthy.
-  - Added `botfights` to manifest-backed orchestrator start path so it no longer fails immediately on legacy Podman discovery.
-  - Deployed backend hash `9a00e543...`.
-  - BotFights started and is active.
-  - Direct checks after it finished booting: `/` returned HTTP 200; `/api/health` returned `{"status":"ok","name":"botfights"}`.
-  - Note: `.198` manifests still use `git.tx1138.com/lfg2025/botfights:1.1.0`; local repo manifest shows `146.59.87.168:3000/lfg2025/botfights:1.1.0`. Reconcile this catalog/manifest mismatch later.
- Status/health correctness:
-  - Reduced container health/status Podman timeouts to avoid UI hanging forever.
-  - `container-list` now refreshes stale cached states and uses Quadlet service-active fallback for stale `stopping` states.
-  - Fedimint stale `stopping` fixed to `starting`.
-  - Local-only patch passed `cargo check`: web-app cached health requires HTTP success/redirect, not just open TCP. This fixes false healthy during app boot, seen with BotFights.
- Filebrowser/Home Assistant/Immich/Bitcoin:
-  - Latest RPC health check showed filebrowser healthy, homeassistant healthy, immich healthy, bitcoin-knots healthy.
-  - Still treat Home Assistant setup/restart hang and Immich post-setup HTTP 500 as backlog blockers needing focused validation.
-
-Current critical blockers:
-
- Runtime control plane / Podman scanning:
-  - Backend restarts repeatedly take 1-2 minutes because startup/crash recovery synchronously waits on slow `podman ps`.
-  - Logs show repeated `podman ps -a --format json timed out after 30s` and crash recovery `podman ps stopped timed out after 60s`.
-  - This is causing bad UX: "checking forever", false "no apps installed", intermittent "loading apps", stale statuses, slow lifecycle actions.
-  - Next platform fix should move Podman/crash-recovery scans out of the service readiness path and keep last-known app state during scanner backoff.
- My Apps UI false negatives:
-  - User reports apps sometimes do not show, "checking" forever, "loading apps" sometimes good but often false "no apps installed".
-  - Required fix: do not show empty/no-apps while scanner or Podman is in backoff. Keep last known apps, show explicit loading/checking/stale state, and avoid destructive UI conclusions from scan timeout.
- Fedimint Guardian:
-  - Current "starting/waiting for Bitcoin sync" is correct while Bitcoin is in IBD.
-  - Need UI/status copy that explains waiting for Bitcoin sync, and later validate Guardian UI on `8175` once Bitcoin sync condition is satisfied.
- Progress UX:
-  - User explicitly requires install/uninstall/start/stop/restart progress to be accurate and not look frozen.
-  - Uninstall indicator currently poor/no progress. Must fix with clear phase updates and no stale notifications.
- Stale health notifications:
-  - Must not persistently trigger on new logins/refreshes after no longer valid.
-  - Some UI filtering was patched earlier, but keep this in regression backlog.
- Reboot survival:
-  - Must pass repeated reboot validation after runtime/status fixes.
-  - Acceptance target from user: minimum 3 clean consecutive reboots, preferably 5.
-
-Backlog captured from user reports:
-
- Portainer:
-  - Environment wizard error: `Dial unix /var/run/docker.sock: connect: connection refused`.
-  - User noted Portainer does Podman orchestration well; compare/learn from its socket/control flow where useful.
- Fedimint:
-  - Setup after guardian confirmation caused app not to launch.
-  - Guardian launch was opening Gateway before; do not regress. Guardian and Gateway must remain distinct.
-  - Gateway app disappeared from catalog before; it has been restored but keep in regression tests.
- Bitcoin Knots:
-  - User saw missing app/launch issues and status bridge messages. UI now improved, but include in lifecycle/reboot regression.
- Home Assistant:
-  - Setup has issues on this node and restart hung for a long time.
- Immich:
-  - After setup user saw HTTP 500 stacktrace from `loadServerConfig`. Needs focused post-setup validation, not just "healthy".
- Filebrowser:
-  - User saw erroneous stopped status while app was working. Status ordering was patched; keep in regression.
- Tailscale:
-  - Launch must show local login/auth UI, not merely container running.
- BTCPay/Fedimint/Gateway/other Bitcoin-dependent apps:
-  - Need clearer dependency wait states when Bitcoin RPC is slow/IBD.
- App catalog/developer readiness:
-  - Apps should not require OS-level changes per app.
-  - App migration document and developer guide must include this principle and current app packaging contract.
- Saleor:
-  - Removed from catalog/server and should stay removed unless intentionally reintroduced.
-
-Release readiness estimate:
-
- Prior estimate was 68%; after latest IndeeHub/Fedimint/BotFights/status progress, a realistic estimate is about 72%.
- Remaining 28% is not feature volume; it is systemic hardening: runtime control-plane responsiveness, truthful UI during Podman backoff, lifecycle/reboot gates, and focused app-specific post-setup validation.
-
-Suggested immediate next steps after resuming:
-
-1. Read this file and verify no background build/process is running.
-2. Build/deploy the local-only HTTP-health tightening patch if not already deployed.
-3. Patch backend startup/crash recovery so Podman scans are async/non-blocking and service readiness is not held hostage by `podman ps`.
-4. Patch My Apps UI/data flow to preserve last-known apps during scanner backoff and never show false empty state while checking.
-5. Run focused status checks on `.198`: fedimint, botfights, filebrowser, bitcoin-knots, immich, homeassistant, portainer.
-6. Continue lifecycle gates only after the runtime scan/control path is stable enough that tests measure apps, not Podman timeouts.
-
-Read this first if resuming in a fresh OpenCode session. Paste the resume prompt below verbatim.
-
---
-
-## Resume Prompt
-
-> Continue Archipelago release hardening from `docs/RESUME.md`. First read `docs/RESUME.md`, `docs/CONTAINER_LIFECYCLE_HANDOFF.md`, and `docs/MIGRATION_STATUS_REPORT.md`. The active validation node is `.198` at `192.168.1.198`; keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive for deterministic tests. Do not run Podman prune/image-list/system-df/image-exists/store-wide cleanup commands on `.198`; the store is known to hang under load. Preserve app data. Latest deployed backend hash is `f1f5c61c9f66ae58e3cb0c7f1cb390777814d162345685c1ddec099057ba2fe3`. This includes the rootless Podman socket fix that treats `/run/user/1000/podman/podman.sock` as a socket bind, never a directory/data bind, prefers persistent `podman-archy-api.service` for Portainer, and changes absent cached `Stopping` entries to `Stopped`. User reported host reboot validation was not clean: many containers were SIGKILLed during reboot/shutdown and IndeeHub was stopped after boot. User also reported Immich, IndeeHub, Tailscale, Vaultwarden, Portainer, Home Assistant, Uptime Kuma, Nextcloud, Fedimint, and Botfights app lifecycle/launch/state issues. BTCPay was a false alarm: slow but fine. Current live validation: Vaultwarden full preserve-data lifecycle passed; Portainer full preserve-data lifecycle passed and its socket mount is no longer `//deleted`, but the user still needs to retry the Portainer environment wizard. Fedimint direct container state is running/healthy. IndeeHub remains P0: Podman still has a corrupted `indeedhub|Removing|97cf9fd13bb2` record; targeted `podman rm -f`, `podman rm -f --time 0`, and `podman container cleanup --rm indeedhub` hang and must be killed. Treat post-reboot recovery, launch reachability, lifecycle correctness, progress indication, and rootless Podman socket-backed apps as active release blockers. IndeeHub is not passing unless `http://<node>:7778/` is reachable and `/nostr-provider.js` is injected/served so the Nostr signer works as before. Tailscale is not passing unless launch presents the Tailscale login/auth UI. Before editing or touching `.198`, summarize current state and your exact first step.
-
---
-
-## Current Goal
-
-Cut Archipelago `1.8-alpha`, including a ready-to-test ISO image.
-
-Current status estimate: about 68% of the way to release. The app migration, manifest/catalog generation, and many local gates are advanced, and the latest pass fixed Vaultwarden plus the concrete Portainer stale socket mount. Live `.198` testing still shows the app platform is not production-bulletproof. Remaining release blockers include app install/start truthfulness, frontend launch readiness gating, IndeeHub recovery and Nostr signer compatibility, Tailscale login-link launch, Home Assistant/Uptime Kuma/Nextcloud install/start failures, full lifecycle coverage, progress indication quality, app packaging documentation, refactor/dead-code cleanup, repeated reboot validation, final `.198` lifecycle confidence, and cutting/smoke-testing the `1.8-alpha` ISO.
-
-## Release Readiness Estimate
-
- Estimated completion: `68%`.
- What is already achieved:
-  - manifest-driven app migration is substantially advanced;
-  - catalog metadata generation and strict drift checks are green;
-  - local backend/frontend release gates have been green in prior passes;
-  - broad non-destructive lifecycle has passed on the deployed release-candidate line before the reboot-gate finding;
-  - Podman store-risk paths have been quarantined from known fragile broad image/store commands;
-  - IndeeHub recovery now has local hardening in progress, including explicit Nostr signer validation in the lifecycle harness;
-  - targeted Immich fixes now make dependency creation fail fast instead of silently reporting install success, and a follow-up readiness-gating patch is in progress so the app does not look launchable before HTTP readiness;
-  - mobile and desktop app progress UX now has clearer install/remove phase labels in local changes;
-  - Vaultwarden full preserve-data lifecycle passed on `.198` after the rootless socket fix;
-  - Portainer full preserve-data lifecycle passed on `.198` after recreating the container against persistent `podman-archy-api.service`; its mount now points at `/podman/podman.sock`, not `/podman/podman.sock//deleted`.
- What must still pass before release:
-  - deploy the current Immich readiness-gating backend and frontend progress UX changes;
-  - focused Immich validation: install must stay in progress until `http://<node>:2283/` returns HTTP success and app launch opens the frontend;
-  - focused IndeeHub validation: recover stale/corrupt frontend container, prove `http://<node>:7778/`, and prove `/nostr-provider.js` signer bridge is injected/served;
-  - keep Vaultwarden in regression coverage even though the latest full lifecycle passed;
-  - focused Tailscale validation: launch must present the local login/auth link/UI on `8240`;
-  - focused Portainer validation: user must retry the environment wizard and confirm it can connect to the rootless Podman socket at `/var/run/docker.sock`;
-  - full preserve-data lifecycle testing for representative migrated apps and key stacks: `install -> launch -> stop -> start -> restart -> uninstall preserve_data -> reinstall -> launch`;
-  - progress indication validation for install, uninstall, start, stop, restart, reboot recovery, and failed transitions; generic "running" or "removing" pills are not enough;
-  - app packaging documentation gate: update `docs/APP-PACKAGING-MIGRATION-PLAN.md` and `docs/app-developer-guide.md` so they match the current manifest/runtime contract, include lifecycle/progress/reboot expectations, and clearly tell developers to use reusable manifest/orchestrator primitives instead of OS-level per-app hacks;
-  - required refactor/remove-dead-code gate: after correctness is proven and before cutting `1.8-alpha`, remove obsolete app-specific paths, stale fallback metadata, duplicate lifecycle logic, unused scripts/hooks, and misleading compatibility shims; rerun lifecycle, launch, and release gates afterward;
-  - broad non-destructive lifecycle after the deploy;
-  - at least 3 consecutive clean post-fix reboot iterations, with broad lifecycle green after each;
-  - preferably 5 consecutive clean reboot iterations before calling `1.8-alpha` production-release ready;
-  - final local release gates after any additional fixes;
-  - cut the `1.8-alpha` ISO;
-  - boot/smoke-test the ISO enough to prove installability, backend startup, UI startup, app catalog availability, and at least a focused app lifecycle.
-
---
-
-## Latest User Directive
-
-> A lot were killed SIGKILL and one crashed, a couple stopped. Not sure if we did fixes but we should be a few reboot tests until 3/4/5 reboots are clean I guess, unless you advise a different passing criteria
->
-> please do not forget that indeehub must work with the nostr signer just like before, I hope we haven't broken that or anything, please add to tasks
->
-> also please note that immich and tailscale are not launching on the front-ends on their ports from the app screen, they say running/healthy but clearly aren't
->
-> Also BTCPay is not running either
->
-> no my bad, wrong server, BTCPay is fine just slow, please continue
->
-> Yes, as shown in trying to complete the environment wizard in portainer you get "Failure Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
->
-> please confirm there is a refactor/remove dead code release gate too
-
-Passing criterion adopted: after the post-reboot recovery fix is deployed, require at least 3 consecutive clean reboots with broad non-destructive lifecycle green after each; prefer 5 consecutive clean reboots for production-release confidence. SIGKILL during shutdown is not automatically disqualifying if every managed app recovers and is reachable after boot, but any app left stopped/crashed/unreachable after boot is a failed reboot iteration. IndeeHub validation must include the Nostr signer bridge, not just HTTP reachability.
-
-Immich, Tailscale, Vaultwarden, and Portainer are explicit blockers. Container `running`/`healthy` is not enough for Immich/Tailscale; direct/app-screen launch routes must work. Tailscale launch must present the login/auth UI. Vaultwarden must survive install/start/restart. Portainer must be able to talk to the rootless Podman socket from inside its Docker-compatible socket bind. BTCPay is not currently a blocker; it was a wrong-server/slow-app false alarm.
-
-There is also an explicit app packaging documentation gate and an explicit required refactor/remove-dead-code release gate. The packaging docs must be current enough for a third-party developer to package an app against the actual manifest/runtime contract. Do the refactor/dead-code cleanup after current correctness fixes are validated, not before, but do not cut `1.8-alpha` without it: remove stale per-app hacks, dead legacy code paths, duplicate lifecycle helpers, obsolete scripts/hooks, and misleading fallback metadata that would make `1.8-alpha` hard to maintain, then rerun the release gates.
-
---
-
-## Live `.198` State
-
- Host: `192.168.1.198`.
- Password for lifecycle harness/RPC login: `password123`.
- Latest recorded `/usr/local/bin/archipelago` sha256: `f1f5c61c9f66ae58e3cb0c7f1cb390777814d162345685c1ddec099057ba2fe3`.
- `archipelago.service`: active.
- `archipelago-doctor.timer`: inactive.
- `archipelago-reconcile.timer`: inactive.
- `/`: `65%` used, about `9.6G` free.
- `/var/lib/archipelago`: about `9-10%` used, about `370G` free.
-
-Current active app blockers:
-
- Immich: after deploying hash `54d781...`, reinstall no longer immediately stops. Live test showed `immich_postgres` and `immich_redis` healthy and `immich_server` running; first launch had a readiness gap while Immich ran migrations/geodata import, then `2283` returned HTTP `200`. Local follow-up changes add an Immich server health check and require healthy status before install completes.
- IndeeHub: still blocked. Latest targeted check after hash `f1f5c61c...` showed a corrupted Podman ghost record: `indeedhub|Removing|97cf9fd13bb2`; `podman inspect indeedhub` fails with `layer not known`. Targeted `podman rm -f`, `podman rm -f --time 0`, and `podman container cleanup --rm indeedhub` hang and must be killed. Must recover this record without broad store cleanup and then verify `http://<node>:7778/` plus `/nostr-provider.js` for the Nostr signer.
- Home Assistant: user reports install completes then app stops. Treat as part of the migrated single-container/rootless Podman control-plane blocker.
- Uptime Kuma: user reports install takes ages then app stops. Live logs showed `package.install uptime-kuma failed: systemctl --user restart podman.socket exited exit status: 1`.
- Nextcloud: user reports same install-then-stop behavior. Live logs showed `package.install nextcloud failed: systemctl --user restart podman.socket exited exit status: 1`.
- Vaultwarden: latest full preserve-data lifecycle passed on hash `2a168489...`: install -> launch on `8082` -> stop -> start -> restart -> uninstall preserve_data -> reinstall -> launch. Keep in regression tests because the user-visible transition/progress UX still looked like it was stuck while stopping.
- Portainer: latest full preserve-data lifecycle passed on hash `2a168489...`. The stale mount was confirmed as `/run/user/1000/podman/podman.sock//deleted`; after persistent `podman-archy-api.service` and Portainer recreate, mountinfo shows `/podman/podman.sock` without `//deleted` and `http://127.0.0.1:9000/` returns HTTP `200`. User still needs to retry the environment wizard; do not close this blocker until the wizard no longer reports `Cannot connect to the Docker daemon at unix:///var/run/docker.sock`.
- Tailscale: still blocked. Container running is not enough; launch must present local login/auth UI on `8240`.
- Fedimint: user reported it showed `stopping`; after hash `f1f5c61c...`, direct targeted state shows `fedimint|Up ... (healthy)` and RPC `container-list` shows `fedimint running`. Keep in focused regression/launch checks.
- Botfights: newly reported stopped/broken. Direct probe after the report showed `botfights` running/healthy and `http://127.0.0.1:9100/` returning `200`; keep in focused lifecycle/launch validation after Podman control-plane recovery.
- Rootless Podman socket/control plane: improved but still a release-risk area. Fixed the concrete bug where `/run/user/1000/podman/podman.sock` could be created as a directory and the Portainer bind could point at a deleted socket inode. The current deployed backend prefers persistent `podman-archy-api.service`. Continue watching scanner timeouts and lifecycle behavior for Home Assistant, Uptime Kuma, Nextcloud, and Portainer.
- Stuck Podman records: P0 migration blocker. IndeeHub proves ordinary targeted `podman rm` fallbacks are not sufficient once a record is wedged in `Removing`.
- Progress UX: still blocked until live validation proves install/uninstall/start/stop/restart show phase detail and do not appear frozen.
-
-Do not treat root disk pressure as a current blocker anymore. It was reduced from `99%` used with under `600M` free to about `65%` used with roughly `10G` free.
-
-### 2026-06-10 Resume Continuation Checkpoint
-
- Deployed backend hash `7f58da80063f58574675256913ac9cddf131e65d8935015748a70adffc228f83` to `.198`.
-  - Previous live hash observed before deploy: `9a00e5432dd9241a9a54087cc87ede46fc0c77a5051dbfb2d34112b9b12e902f`.
-  - `archipelago.service` is active.
-  - `archipelago-doctor.timer` and `archipelago-reconcile.timer` are inactive.
- Added explicit release gates to this handoff:
-  - app packaging docs must be updated before `1.8-alpha`;
-  - refactor/remove-dead-code is required before `1.8-alpha`, after correctness validation and before final release gates/ISO.
- Local validation before deploy:
-  - `bash -n tests/lifecycle/remote-lifecycle.sh` passed;
-  - `cargo fmt --manifest-path core/Cargo.toml --all`;
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-container` passed (`45` tests);
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container` passed;
-  - `python3 scripts/check-app-catalog-drift.py --release --strict` passed;
-  - `git diff --check` passed.
-  - Filtered `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago indeedhub` appeared wedged in the tool PTY after compilation started; no local cargo/rustc worker remained visible. Treat that one filtered run as inconclusive, not failed.
- IndeeHub live validation after deploy:
-  - `container-list` reports `indeedhub` running;
-  - `container-health` reports `{"indeedhub":"healthy"}`;
-  - `http://192.168.1.198:7778/` returns HTTP `200`;
-  - `http://192.168.1.198:7778/nostr-provider.js` returns HTTP `200` and contains the Archipelago NIP-07/NIP-98 Nostr provider shim.
- Immich live validation after deploy:
-  - `container-list` reports `immich` running;
-  - direct `http://192.168.1.198:2283/` returns HTTP `200`;
-  - `container-health` reported `{"immich":"unknown"}` during one focused check, so health truthfulness still needs follow-up even though launch HTTP is reachable.
- Tailscale live validation after deploy:
-  - Found the live generated unit still used the stale catalog command `sleep 2; tailscale web...`; locally patched `app-catalog/catalog.json`, `neode-ui/public/catalog.json`, and `scripts/first-boot-containers.sh` to use the safer socket-wait startup, and copied the catalog to `/opt/archipelago/web-ui/catalog.json`.
-  - App-scoped `package.restart tailscale` failed via RPC with `podman ps timed out while listing containers`.
-  - Patched the live generated Tailscale `.container` unit to match the catalog fix and restarted only `tailscale.service`; the old container required SIGKILL during stop and Podman cleanup took roughly 2 minutes.
-  - After restart, the Tailscale unit runs both `tailscaled` and `tailscale web`, `container-list` reports `tailscale` running, `container-health` reports `{"tailscale":"healthy"}`, and `http://192.168.1.198:8240/` returns HTTP `200` with Tailscale UI content.
-  - Do not close Tailscale lifecycle as fully passing yet: launch UI is fixed, but stop/restart behavior exposed the rootless Podman cleanup/control-plane blocker.
- Other live probes after deploy:
-  - `portainer` HTTP `9000` returns `200`; user still needs to retry the environment wizard.
-  - `vaultwarden` HTTP `8082` returns `200` from localhost on `.198`.
-  - `botfights` HTTP `9100` returns `200` from localhost on `.198`.
-  - `btcpay-server` returned `302` then timed out under a short probe; continue treating BTCPay as slow rather than a current blocker unless a focused check fails.
-  - `fedimint` port `8175` reset during probe while RPC showed `starting`; keep expected Bitcoin-sync wait-state/status copy in scope.
- Podman/control-plane remains the active systemic blocker:
-  - logs still show `podman ps timed out`, `podman stats timed out`, scan backoff, and slow app cleanup;
-  - do not start reboot-count validation until app stop/start/restart and post-reboot recovery are clean enough that tests measure app behavior instead of Podman timeouts.
-
---
-
-## Latest Completed Work
-
-### 2026-06-08 Rootless Socket, Vaultwarden, and Portainer Fix
-
- Built and deployed backend hash `2a168489737180b4088503dd93ef89c11da13e64790b324db8baea8ca05d3536` to `.198`; then built and deployed follow-up hash `f1f5c61c9f66ae58e3cb0c7f1cb390777814d162345685c1ddec099057ba2fe3`; `archipelago.service` active, `archipelago-doctor.timer` inactive, `archipelago-reconcile.timer` inactive.
- Fixed rootless Podman socket bind handling in `core/archipelago/src/container/prod_orchestrator.rs`:
-  - `/run/user/1000/podman/podman.sock` is skipped by bind-directory creation and data UID/chown prep;
-  - socket bind mounts call explicit socket repair before other bind prep;
-  - `ensure_user_podman_socket()` now prefers persistent `podman-archy-api.service` at `unix:///run/user/1000/podman/podman.sock`, falling back to `podman.socket` only if needed.
- Validated locally before deploy:
-  - `cargo fmt --manifest-path core/Cargo.toml --all`.
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago absent_` (`4 passed`, including the stale absent `Stopping` regression tests).
-  - `git diff --check`.
-  - `timeout 900s cargo build --manifest-path core/Cargo.toml -p archipelago --release`.
- Vaultwarden full preserve-data lifecycle passed on `.198`:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=vaultwarden ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Portainer full preserve-data lifecycle passed on `.198`:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=portainer ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Portainer stale socket mount was confirmed and repaired:
-  - Before recreate, mountinfo showed `/run/user/1000/podman/podman.sock//deleted -> /var/run/docker.sock`.
-  - After persistent `podman-archy-api.service` and Portainer recreate, mountinfo shows `/podman/podman.sock -> /var/run/docker.sock`, host socket exists, and Portainer UI returns HTTP `200`.
-  - User still needs to retry the Portainer environment wizard; do not close the blocker until that wizard can connect.
- Direct state check after deploy:
-  - `fedimint|Up ... (healthy)` and RPC `container-list` shows `fedimint running`.
-  - `indeedhub|Removing|97cf9fd13bb2`; `podman inspect` fails with `layer not known`; targeted removal/cleanup hangs and had to be killed.
-  - `vaultwarden running true`.
-  - `portainer running true`.
-
-### 2026-06-08 Reboot Blocker Follow-up In Progress
-
- User reported host reboot validation was not clean: many containers were killed with SIGKILL during reboot/shutdown, one crashed, a couple stopped, and IndeeHub was stopped after boot.
- Treat this as a failed reboot gate. Do not call the release ready until post-fix reboot iterations are clean.
- Local changes made in this pass:
-  - hardened `core/archipelago/src/container/prod_orchestrator.rs` IndeeHub stack recovery so reboot reconcile starts existing backend containers through a user scope when possible, waits for backend containers and API dependency DNS, starts/restarts the frontend, verifies it remains running, and verifies host port `7778`;
-  - hardened `core/container/src/manifest.rs` package validation for app IDs, ports, env keys, capabilities, devices, volume sources/options, network policy, and reviewed host-bind exceptions while preserving all current real manifests;
-  - updated `tests/lifecycle/remote-lifecycle.sh` so IndeeHub launch validation requires `/nostr-provider.js` to be injected into the HTML and served from the app, preserving the Nostr signer requirement.
- Deployed follow-up backend hash `4108ca146b482c028ae8d7c4bec314b71ef3412f15efd2e61846a2c345b36aba` to `.198`; service active, timers inactive. Focused audit still showed:
-  - `indeedhub` stuck `stopping` and unhealthy;
-  - `immich` stopped/unhealthy;
-  - `tailscale` running/healthy but direct launch `8240` returned `000`;
-  - `vaultwarden` health RPC errored and launch `8082` returned `000`;
-  - `btcpay-server` was fine (`23000` returned HTTP 200); user confirmed BTCPay was a wrong-server/slow-app false alarm.
- Targeted diagnostics on `.198` found:
-  - IndeeHub frontend Podman state `removing`/`stopping` with no `7778` listener;
-  - Immich server stopped, Redis exited, Postgres unhealthy, no `2283` listener;
-  - Tailscale listener process existed on `8240`, but direct HTTP still returned `000`; logs show Tailscale is `NeedsLogin`/`WantRunning=false`, so launch must present the login/auth UI rather than a generic daemon endpoint;
-  - Vaultwarden container was absent; public `package.start vaultwarden` failed on stale/refused Podman socket before local fixes;
-  - Portainer launches but the environment wizard reports `Cannot connect to the Docker daemon at unix:///var/run/docker.sock`, confirming socket-backed apps are not release-ready.
- Local follow-up fixes after those diagnostics:
-  - `core/container/src/runtime.rs` now tries `podman rm -f --time 0`, targeted `podman container cleanup`, and another `rm -f` when normal forced remove fails;
-  - `ensure_user_podman_socket()` now verifies the rootless Podman socket accepts Unix connections, not just that the socket path exists;
-  - IndeeHub readiness now falls back to platform-managed network-alias presence when `getent` inside the API image cannot prove DNS;
-  - lifecycle harness now requires Tailscale launch content to look like login/auth UI.
- Local validation passed after those fixes:
-  - `cargo fmt --manifest-path core/Cargo.toml --all`.
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-container` (`45 passed`).
-  - `bash -n tests/lifecycle/remote-lifecycle.sh`.
-  - `git diff --check`.
- Deployed second follow-up backend hash `06420c0377fff650a2bf3211f13c1e0754bf8df81345b8485f4c9a30cb552439` to `.198`; service active, timers inactive.
- Public RPC recovery attempts on hash `06420c...`:
-  - `package.restart indeedhub` still failed;
-  - `package.start immich` accepted async start but app remained `starting` with no `2283` launch;
-  - `package.start vaultwarden` accepted async start but no `8082` launch appeared;
-  - `package.restart portainer` failed;
-  - `package.restart tailscale` accepted async restart but no `8240` launch UI appeared.
- Latest focused probe after hash `06420c...`:
-  - `tailscale` `running`, `http://192.168.1.198:8240/` returns `000`;
-  - `immich` `starting`, `http://192.168.1.198:2283/` returns `000`;
-  - `indeedhub` `stopping`, `http://192.168.1.198:7778/` returns `000`;
-  - `portainer` `running`, `http://192.168.1.198:9000/` returns `000`;
-  - `vaultwarden` absent/not listed, `http://192.168.1.198:8082/` returns `000`.
- Conclusion: do not proceed to reboot testing or ISO work. The rootless Podman control-plane/socket health and stuck container-state recovery need a deeper platform fix before lifecycle/reboot gates are meaningful.
- Local validation passed so far:
-  - `cargo fmt --manifest-path core/Cargo.toml --all`.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-container` (`45 passed`).
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `bash -n tests/lifecycle/remote-lifecycle.sh`.
-  - `git diff --check`.
- A filtered `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago indeedhub` compiled and ran the matching existing IndeedHub test (`1 passed`); it did not exercise the new reboot recovery branch because there is no direct unit for that path yet.
- Next steps:
-  - deploy the new backend only after approval;
-  - verify focused `indeedhub,immich,tailscale,vaultwarden,portainer` lifecycle/launch, including IndeeHub Nostr provider check and Portainer socket usability;
-  - run reboot validation iterations on `.198` only after explicit approval;
-  - pass threshold: 3 consecutive clean post-fix reboots minimum, 5 preferred for production-release confidence.
-  - cut and smoke-test the `1.8-alpha` ISO after reboot validation is green.
-
-### Local Release Gate Completion After `.198` App Recovery
-
- Did not touch `.198`, reboot the host, change timers, or run Podman store-wide commands.
- Fixed scanner backoff/in-flight skip behavior: skipped scans now bump `scan_tick`, so install/update success paths that kicked the scanner do not wait for their timeout when Podman scan backoff is active.
- Fixed stale crash-recovery unit tests after `should_auto_start_stopped_container` gained the `include_stack_members` flag; coverage now asserts generic boot recovery skips stack helpers while stack recovery can include them.
- Fixed local runtime manifest-port lookup so tests and local backend runs can find workspace `apps/*/manifest.yml` via `CARGO_MANIFEST_DIR`; this covers new public apps such as PhotoPrism.
- Fixed journal usage parsing for real `journalctl --disk-usage` compact output such as `463.9M`.
- Fixed boot-reconciler cadence tests so `without_companion_stage()` also bypasses the global crash-recovery wait gate in tests; production still waits for recovery completion.
- Verified catalog generation is idempotent: `python3 scripts/generate-app-catalog.py` reported `updated 0 fields` for both catalogs.
- Validation passed locally:
-  - `cargo fmt --manifest-path core/Cargo.toml --all`.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago` (`688 passed`).
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-container` (`43 passed`).
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago-performance -p archipelago-security`.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-performance -p archipelago-security` (`12 security tests passed`; performance has no tests).
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `python3 scripts/check-app-catalog-drift.py --release --strict`.
-  - `python3 -m py_compile scripts/generate-app-catalog.py scripts/check-app-catalog-drift.py scripts/app-catalog-image-smoke-test.py`.
-  - `git diff --check`.
-  - `cmp -s app-catalog/catalog.json neode-ui/public/catalog.json`.
- Remaining gated item remains host reboot validation on `.198`, only if explicitly approved.
-
-### Frontend Release Gate Completion
-
- Did not touch `.198`, reboot the host, change timers, or run Podman store-wide commands.
- Found and fixed a mobile app-launch regression in `neode-ui/src/stores/appLauncher.ts`:
-  - desktop-only new-tab apps still open directly on desktop;
-  - mobile now routes those apps through the app-session route instead of escaping Archipelago in a new browser tab;
-  - `dashboardReturnPath()` now tolerates tests/minimal router mocks with no `currentRoute`.
- Updated frontend tests to match current desktop new-tab policy and mobile in-app routing behavior.
- Fixed `AppIconGrid` test setup so it shares the mounted Pinia instance and mocks credential lookup before launch.
- Fixed onboarding retry test timing to cover the actual exponential retry budget.
- Validation passed locally:
-  - `npm run type-check` from `neode-ui`.
-  - `npm test` from `neode-ui` (`548 passed`).
-  - `npm run build` from `neode-ui`.
-  - `python3 scripts/generate-app-catalog.py` (`updated 0 fields`).
-  - `python3 scripts/check-app-catalog-drift.py --release --strict`.
-  - `python3 -m py_compile scripts/generate-app-catalog.py scripts/check-app-catalog-drift.py scripts/app-catalog-image-smoke-test.py`.
-  - `cmp -s app-catalog/catalog.json neode-ui/public/catalog.json`.
-  - `git diff --check`.
- Local caveat: `npm ci` is currently blocked because existing `neode-ui/node_modules/@alloc` entries are owned by `root:root`. Existing installed modules were sufficient for type-check, tests, and build. Do not delete or chown this tree without explicit approval.
-
-### Fedimint/File Browser, Nostr/NPM, and IndeedHub Recovery
-
- Built and deployed backend hash `95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de` to `.198`.
- Fixed UI-facing package health for reachable running apps whose Podman health stayed `starting`, `unhealthy`, or a numeric exit value while the launch port was reachable.
- Confirmed Fedimint Guardian and File Browser were actually reachable; their `server.get-state` package-data now reports healthy instead of “starting up”.
- Fixed Nostr relay port conflict by moving `apps/nostr-rs-relay/manifest.yml` host port from `8081` to `18081`.
- Recovered Nginx Proxy Manager admin launch on `8081`; Nostr now launches on `18081` and no longer captures the NPM launch port.
- Hardened legacy package install so scoped web-app installs use `podman create` plus `systemd-run --user --scope podman start`, avoiding backend-cgroup coupling without hanging the install RPC.
- Recovered IndeedHub without deleting data: started the stopped `indeedhub-minio` dependency, repaired frontend reachability, and verified `7778` returns the app.
- Validation passed:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `python3 scripts/check-app-catalog-drift.py --release --strict`.
-  - Focused lifecycle for `indeedhub,nginx-proxy-manager,nostr-rs-relay,fedimint,filebrowser`.
-  - Direct launch checks returned HTTP `200` for `7778`, `8081`, `18081`, `8175`, and `8083`.
-  - Broad non-destructive lifecycle passed on live hash `95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de`.
- Final `.198` state after validation: `archipelago.service` active; `archipelago-doctor.timer` inactive; `archipelago-reconcile.timer` inactive; `/` at `65%` used with about `9.6G` free; `/var/lib/archipelago` at `10%` used with about `370G` free.
-
-### Deployed Podman Store-Risk Cleanup
-
- Reviewed release-relevant Podman store/image call sites without running broad Podman store/image commands on `.198`.
- Bounded stack installer image pulls and manual package update image pulls with `kill_on_drop` and 600s timeouts.
- Deployed backend hash `a52a87474c9a788e058ee1da1edd6091ab305594a53e7a153889f77041598ff4` to `.198` with the previous backend backed up under `/usr/local/bin/archipelago.backup-20260608-store-risk-*`.
- Validation passed:
-  - `python3 scripts/check-app-catalog-drift.py --release --strict`.
-  - `cargo fmt` from `core/`.
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - Focused post-deploy lifecycle: `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint,immich,indeedhub,photoprism ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-  - Broad post-deploy non-destructive lifecycle: `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
- Final `.198` state after validation: `archipelago.service` active; `archipelago-doctor.timer` inactive; `archipelago-reconcile.timer` inactive; `/` at `65%` used with about `9.8G` free; `/var/lib/archipelago` at `10%` used with about `370G` free.
-
-### Release Candidate Backend Restart Validation
-
- Built and deployed backend hash `e28affdf4c1d3cecbe4c14b0439b53d977ed20873c966c288116601d49dac732` to `.198`.
- Bounded additional Podman store/control probes so image and stack health checks fail fast instead of hanging under `.198` Podman store/socket load.
- Fixed Fedimint health reporting: if Podman health remains `starting` but the app endpoint is reachable, `container-health` can use the reachable cached app fallback.
- Fixed package start/restart fallback for runtime web apps by using `systemd-run --user --scope` for `podman start`, then falling back to direct bounded `podman start`.
- Recovered live Immich without data loss:
-  - `immich_server` had exited because `/usr/src/app/upload/encoded-video/.immich` could not be written.
-  - Correct live ownership is still `podman unshare chown -R 0:0 /var/lib/archipelago/immich`, which maps to host UID/GID `1000:1000` and container root ownership.
-  - A temporary `1000:1000` in-container ownership experiment was reverted because Immich's storage check writes as container root.
- Validation passed on latest hash:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `python3 scripts/check-app-catalog-drift.py --release --strict`.
-  - `npm run build` from `neode-ui`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint ARCHY_STABILITY_SECONDS=10 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=immich ARCHY_STABILITY_SECONDS=10 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-  - Backend restart validation followed by focused `fedimint,immich,indeedhub,photoprism` lifecycle passed.
-  - Post-restart broad non-destructive lifecycle passed.
- Remaining gate before calling this a release: host reboot validation, if approved.
-
-### IndeedHub and Immich Lifecycle Recovery
-
- Built and deployed backend hash `89dfc3d4e801b35564dc8dc7f4a513028eb7e2027b586e8aad7a0f374e20d6a9` to `.198`.
- IndeedHub focused audit is green after sequencing network alias repair immediately before frontend startup, after dependencies are running.
- Fedimint and NetBird focused audits are green; they were not current blockers after rerun.
- Immich was the broad-audit blocker and is now green:
-  - dependency readiness accepts healthy Podman health state for `immich_postgres` and `immich_redis` before falling back to slower exec probes;
-  - `immich_server` startup repairs `/var/lib/archipelago/immich` ownership with `podman unshare chown -R 0:0`, preserving upload data while matching the current rootless container user mapping;
-  - this fixed the observed `EACCES` on `/usr/src/app/upload/encoded-video/.immich`.
- Validation passed on latest hash:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=indeedhub ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=netbird ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=immich ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
- Residual risk remains: `.198` still intermittently logs `podman ps -a --format json timed out after 30s` and transient Bitcoin RPC timeouts under load. Continue avoiding store-wide Podman commands.
-
-### Release Refactor Cleanup
-
- Built and deployed backend hash `14d360a206d1e58f287c5722d709dace0284b0dea56b66aa4bce0f57c631631b` to `.198`.
- Legacy package runtime host-port cleanup/repair now derives host ports from manifests when available.
- Hardcoded ports remain only as fallback for legacy/non-manifest apps and extra stale-port cleanup compatibility.
- Removed the duplicate Gitea-specific stale port cleanup helper.
- Validation passed on latest hash:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
- Added focused runtime-host-port tests, but local `cargo test --manifest-path ../../core/Cargo.toml -p archipelago runtime_host_ports` did not finish within 5 minutes during compilation.
-
-### Catalog Metadata Generation
-
- Added `scripts/generate-app-catalog.py` to sync manifest-owned metadata into `app-catalog/catalog.json` and `neode-ui/public/catalog.json`.
- The generator updates fields that manifests already own: `title`, `version`, `description`, `dockerImage`, `category`, `tier`, `icon`, and `repoUrl`.
- The catalog still preserves catalog-only fields such as `author`, `requires`, `featured`, and rich `containerConfig` notes.
- Corrected stale manifest metadata for BotFights, IndeeHub, Gitea, LND, ElectrumX, Fedimint, and Mempool before generation.
- Release catalog drift is now zero:
-  - `python3 scripts/check-app-catalog-drift.py --release --strict` reports `metadata_drift=0`, `missing_catalog=0`, `missing_manifests=0`.
- Validation passed:
-  - `jq empty app-catalog/catalog.json neode-ui/public/catalog.json`.
-  - canonical and UI public catalogs match byte-for-byte.
-  - `cargo test --manifest-path core/Cargo.toml -p archipelago-container`.
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago`.
-  - `npm run build` from `neode-ui`.
-
-### Podman Store-Risk Hardening
-
- Built and deployed backend hash `eaa83c30467acd42ad864a8e0ea0d5fd88b94b775a06bfcdc460c4b0cd8e75b2` to `.198`.
- Fresh local-build installs now treat `podman image exists <local-build-tag>` failure/timeout as "unknown/missing" and rebuild the local image instead of failing the lifecycle operation.
- This keeps local image store checks from being release-blocking while preserving bounded runtime timeouts and matching the existing drift-restart behavior.
- Validation passed on the latest hash:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
- Added focused unit test coverage for the image-exists failure behavior, but local `cargo test --manifest-path core/Cargo.toml -p archipelago install_fresh_builds_when_image_exists_check_fails` did not complete within 15 minutes during compilation.
-
-### Container Health Fallback and Broad Lifecycle Green
-
- Built and deployed backend hash `be95ea91339a7fb0a3b20d0ae5d816dca220d5e5ca86838cc0ba50b609ad7b36` to `.198`.
- Fixed `container-health` broad lifecycle timeout behavior:
-  - `cached_reachable_health()` now parses ports from URLs with trailing slashes correctly, such as `http://localhost:2342/`.
-  - The local TCP fallback now covers the lifecycle web app ports, including PhotoPrism, BTCPay, LND UI, Mempool, Electrum, Fedimint, Gitea, IndeedHub, Ollama, Vaultwarden, Tailscale, and others.
-  - Cached-running apps with reachable local TCP listeners can report `healthy` without depending on flaky Podman health/inspect calls.
- Validation passed on the latest hash:
-  - `cargo check --manifest-path core/Cargo.toml -p archipelago`.
-  - `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh`.
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`.
-
-### Generic Host-Port Health Checkpoint
-
- Built and deployed backend hash `3912b900c376b6c28bf5453640cae82135f67d7e0f984b8adcc78064b924143b` to `.198`.
- Confirmed objective remains: app behavior should be manifest/platform-primitive owned, not OS-image or per-app backend hack owned.
- Broad lifecycle on `d21202cd...` failed only on Uptime Kuma briefly showing `stopping` during listener repair; it recovered afterward.
- Fixed stale transitional merge: `Stopping -> Running` recovers when no user-stop marker exists; user-initiated stops still keep `Stopping`.
- Health monitor now derives required host TCP ports from Podman JSON `Ports` and marks running containers unhealthy when declared host listeners are missing.
- This is generic host-port health, not an app-specific mapping.
- After deploying `3912b900...`, Uptime Kuma recovered `3002` and returned HTTP `302` after backend restart.
- Jellyfin still needs follow-up: Podman reports `jellyfin Up ... (healthy)` with `0.0.0.0:8096->8096/tcp`, but `ss` shows no `8096` listener and `curl http://192.168.1.198:8096/` fails.
- Follow-up on `be95ea...` resolved the broad lifecycle timeout by hardening `container-health` fallback behavior.
-
-### Stale State and Jellyfin Pasta Listener Hardening
-
- Built and deployed backend hash `d21202cd79794e3bfc882d37134afd7a41dac766bae386a675714e5fa030e94e` to `.198`.
- `container-list` now overlays cached `exited` entries with targeted live state so scanner backoff does not leave lifecycle/UI reads stuck on stale `exited` after recovery.
- `container-health` now has a bounded cached-running plus local TCP reachability fallback for web apps, reducing dependency on slow/hung Podman inspect paths for health reads.
- Jellyfin was added to legacy runtime host-port repair for pasta listener `8096`.
- `package.restart jellyfin` still exposed a real Podman socket/runtime blocker after stopping the container: `Cannot connect to Podman socket at /run/user/1000/podman/podman.sock: Permission denied`.
- `package.start jellyfin` recovered the app afterward; `jellyfin` became `Up ... (healthy)`, `8096` had a `pasta.avx2` listener, and `http://192.168.1.198:8096/` returned HTTP `302`.
- Focused lifecycle passed on the latest hash:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Release catalog drift check remains: `missing_catalog=0`, `missing_manifests=0`, `metadata_drift=35`.
-
-### Expanded Cleanup and Store-Safe Uninstall
-
- Built and deployed backend hash `7f90345b75148b7ed748e1a417f31d1273e1646a9b742891858df11c5397051b` to `.198`.
- Expanded `system.disk-cleanup` to remove old rollback artifacts while keeping newest rollback points:
-  - `/usr/local/bin/archipelago.backup-*` newest 3.
-  - legacy `/usr/local/bin/archipelago.bak*` newest 3.
-  - `/usr/local/bin/archipelago.before-*` newest 3 as part of legacy backend cleanup.
-  - `/opt/archipelago/web-ui.bak*` newest 3.
-  - `/opt/archipelago/web-ui.old` included as web UI rollback cleanup.
- Live `system.disk-cleanup` reclaimed `10.3 GB`:
-  - `Removed old backend backups: 41.6 MB freed`.
-  - `Removed old legacy backend backups: 3.6 GB freed`.
-  - `Removed old web UI backups: 6.6 GB freed`.
-  - `Skipped Podman image/volume prune: Podman store commands can block app health on busy nodes`.
- `/usr/local/bin` dropped to about `336M`.
- `/opt/archipelago` dropped to about `1.1G`.
- Removed global `podman volume prune -f` from uninstall. Uninstall now logs a skip and still removes explicit app data when `preserve_data=false`.
-
-### Startup Scan and Uptime Kuma Fixes
-
- Startup `adopt_existing()` is bounded with a 35s timeout.
- Initial container scan seeds the same 300s Podman scan backoff used by periodic scans.
- Legacy pasta restart paths use scoped `podman restart` instead of stop+start.
- Uptime Kuma was repaired:
-  - Before: container internally healthy on `127.0.0.1:3001`, but host `3002` had no pasta listener.
-  - After: `package.restart uptime-kuma` returns `{"status":"restarted"}` and `http://192.168.1.198:3002/` returns HTTP `302`.
-
-### Cleanup and Catalog Work Already Done
-
- `system.disk-cleanup` intentionally skips Podman image/volume prune.
- `nostr-rs-relay` was added to both catalog surfaces.
- `scripts/check-app-catalog-drift.py --release --strict` reports zero missing catalog/manifest entries and zero metadata drift after catalog generation.
- Meshtastic `app.files` live behavior was validated: deleting `/var/lib/archipelago/meshtastic/config.yaml` and restarting recreated it from the manifest.
-
---
-
-## Verification Already Run
-
- `cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container` passed for the currently deployed release-candidate line.
- `cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release` passed for the currently deployed release-candidate line.
- Broad lifecycle on current hash `14d360a206d1e58f287c5722d709dace0284b0dea56b66aa4bce0f57c631631b` passed:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Targeted PhotoPrism audit on current hash passed:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh`
- Focused lifecycle on current hash `d21202cd79794e3bfc882d37134afd7a41dac766bae386a675714e5fa030e94e` passed:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Live cleanup RPC passed and reclaimed `10.3 GB`.
- Focused lifecycle after expanded cleanup passed:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Before the expanded cleanup pass, broad lifecycle also passed on hash `2b72e83ff368e4a696ad701f8985b0a8e1e889d9f4844056dc063455df973b28`:
-  - `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh`
- Direct app checks after latest cleanup passed:
-  - `http://192.168.1.198:3002/` -> HTTP `302`.
- `http://192.168.1.198:8096/` -> HTTP `302` after Jellyfin recovery/start.
-  - `http://192.168.1.198:8083/` -> HTTP `404` on `/`, which is expected for Filebrowser root probe behavior used here.
-
-### Test Caveat
-
- Earlier local focused test commands timed out during first-time test binary compilation, but after compilation completed the full backend test target passed: `cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago` (`688 passed`).
- Remaining workspace packages also pass checks/tests: `archipelago-container`, `archipelago-performance`, and `archipelago-security`.
-
---
-
-## Critical Constraints
-
- Preserve app data.
- `.198` is the active validation node.
- Current live backend hash on `.198`: `7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412`.
- Keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive unless explicitly testing them.
- Do not run destructive git commands.
- Do not run Podman store-wide cleanup or broad image/store commands on `.198` without a mitigation plan:
-  - Avoid `podman system df`.
-  - Avoid `podman image list` / `podman image ls`.
-  - Avoid broad `podman image exists` loops.
-  - Avoid `podman image prune` and `podman volume prune`.
- Podman store commands can hang and block app health under current `.198` load.
- Latest local mitigation: Rust release image-existence probes now use bounded targeted `podman image inspect` instead of `podman image exists` or `podman images -q`.
-
---
-
-## Current Remaining Blockers
-
-1. Podman socket/store health remains unresolved.
-    - Need quarantine/mitigation strategy rather than store-wide commands in release paths.
-    - Current release paths avoid prune and broad image-list/existence commands; orchestrator, companion, and legacy install image checks now use bounded `podman image inspect`.
-    - Latest concrete failure remains historical: `package.restart jellyfin` stopped the container but failed to complete because Podman reported socket permission/runtime failure. `package.start jellyfin` recovered afterward.
-    - Latest deployed hash still logged one initial `podman ps -a --format json` scan timeout/backoff, but focused and broad non-destructive lifecycle validation passed.
-
-2. Release code-review/refactor gate is still open.
-   - Reduce remaining app-specific Rust/OS branches where possible.
-   - Review scanner, health, reconcile, and install/update paths for performance and store-risk.
-   - Clean up dead transitional paths.
-
-3. Clean release branch hygiene is not done.
-   - Worktree is very dirty with many modified and untracked files.
-   - Do not commit unless explicitly asked.
-
-4. Full production validation still needed.
-   - Broad non-destructive lifecycle is green on live hash `7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412`.
-   - Backend restart validation has passed.
-   - Run host reboot validation if approved.
-   - Run selected full lifecycle tests for critical apps if time allows.
-
---
-
-## Files Changed In Latest Pass
-
- `core/container/src/runtime.rs`
-  - Changed Podman runtime `image_exists()` from `podman image exists` to a bounded targeted `podman image inspect` local-storage probe.
-
- `core/archipelago/src/api/rpc/package/install.rs`
-  - Replaced legacy `podman images -q` local fallback and post-pull verification checks with bounded targeted `podman image inspect`.
-
- `core/archipelago/src/container/companion.rs`
-  - Changed companion image existence checks from `podman image exists` to `podman image inspect`.
-
- `core/archipelago/src/container/prod_orchestrator.rs`
-  - Updated image-existence failure test fixture wording for the new `image inspect` probe.
-
- Validation for latest local mitigation:
-  - `cargo fmt --all --check` passed.
-  - `cargo check -p archipelago-container` passed.
-  - `cargo check -p archipelago` passed.
-  - `CARGO_INCREMENTAL=0 cargo check -p archipelago --tests` passed.
-  - `cargo test -p archipelago-container` passed (`43` tests).
-  - `git diff --check -- <changed files>` passed.
-  - Filtered `cargo test -p archipelago install_fresh_build` did not complete: one run hit a `rust-lld` undefined hidden symbol artifact/link failure after concurrent Cargo jobs; the sequential `CARGO_INCREMENTAL=0` rerun exceeded 10 minutes during compile, but test-target compilation passed afterward.
-
- `core/archipelago/src/api/rpc/system/handlers.rs`
-  - Calls expanded rollback cleanup helpers and reports reclaimed bytes.
-
- `core/archipelago/src/api/rpc/system/mod.rs`
-  - Added cleanup helpers for legacy backend backups and web UI rollback backups.
-  - Uses size accounting for directories before removal.
-  - Keeps newest rollback artifacts instead of deleting all.
-
- `core/archipelago/src/api/rpc/package/runtime.rs`
-   - Skips global `podman volume prune -f` during uninstall.
-   - Adds Jellyfin `8096` to runtime host-port/pasta cleanup repair.
-   - Derives legacy runtime host-port cleanup/repair ports from manifests.
-   - Keeps compatibility fallback ports for legacy/non-manifest apps and removes duplicate Gitea stale-port cleanup code.
-
- `core/archipelago/src/api/rpc/container.rs`
-   - Adds stale cached `exited` refresh for `container-list`.
-   - Adds cached-running plus local TCP reachability fallback for `container-health`.
-   - Fixes fallback URL port parsing and expands lifecycle web app port coverage.
-
- `core/archipelago/src/container/prod_orchestrator.rs`
-  - Rebuilds local-build images when `image_exists` fails/times out instead of failing fresh install.
-  - Adds focused unit test coverage for that behavior.
-
- `scripts/generate-app-catalog.py`
-  - Generates/syncs public catalog metadata from manifest-owned fields.
-
- `app-catalog/catalog.json` and `neode-ui/public/catalog.json`
-  - Generated from current manifests; files match byte-for-byte.
-
- `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
-  - Added latest deployment, cleanup, validation, and residual-risk checkpoint.
-
- `docs/MIGRATION_STATUS_REPORT.md`
-  - Updated current hash, root disk state, and remaining blockers.
-
- `docs/RESUME.md`
-  - This file, replacing stale April migration resume content.
-
---
-
-## Suggested Next Steps
-
-1. Re-read the three docs:
-   - `docs/RESUME.md`
-   - `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
-   - `docs/MIGRATION_STATUS_REPORT.md`
-
-2. Verify latest `.198` state:
-   - `ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 'df -h / /var/lib/archipelago; systemctl is-active archipelago.service; systemctl is-active archipelago-doctor.timer 2>/dev/null || true; systemctl is-active archipelago-reconcile.timer 2>/dev/null || true; sha256sum /usr/local/bin/archipelago'`
-
-3. Start Podman-store-risk review:
-   - Search for image/store operations: `image_exists`, `podman image`, `podman system`, `podman prune`, `volume prune`.
-   - Prefer targeted container status/API calls with timeouts.
-   - Avoid new broad store commands.
-
-4. Continue release code-review/refactor cleanup.
-
-5. If approved, run backend-restart validation and then host-reboot validation.
-
---
-
-## Current Release Readiness Estimate
-
- Credible release candidate: closer now, roughly `87-91%`.
- Production-quality release developers will love: still closer to `73-79%`.
-
-The biggest improvement in the latest pass is that broad lifecycle is green again on the latest backend. The biggest remaining technical risk is Podman store/socket health.
--- a/docs/SESSION-2026-03-18.md
+++ b/docs/SESSION-2026-03-18.md
@ -1,56 +0,0 @@
-# Session 2026-03-18 — Resume Guide
-
-## What Was Done
-
-### Rootless Podman Migration (TASK-11 DONE)
- .228: 30 containers running rootless with full security hardening
- All `sudo podman` removed from Rust backend (9 files) + deploy script
- UID mapping: container UID N → host UID (100000 + N - 1)
- Deploy script auto-fixes ownership + sysctl + linger on every deploy
-
-### .198 Migration (IN PROGRESS)
- Root containers stopped, UID ownership fixed, IndeedHub images migrated
- `/etc/hosts` fixed to 644 (rootless podman needs read access)
- **Only 2 containers running — needs full container recreation**
- Next: run container setup (Bitcoin, LND, ElectrumX, all apps)
- The `--both` deploy only copies binary+frontend, doesn't create containers
-
-### Security Hardening (TASK-8 — 9/12 pentest findings fixed)
- C1: /lnd-connect-info requires session auth
- C3: DEV_MODE removed from production service
- H1: node-message verifies ed25519 signatures
- M1: content.add rejects `..` path traversal
- M2: NIP-07 postMessage uses specific origin
- M3: AIUI nginx checks session_id cookie
- L2: Strict v3 onion validation
- **Still open**: H2/H3 (federation signature verification), H4 (bind ports to 127.0.0.1)
-
-### UI/UX Fixes
- Mesh serial: auto-detect, backoff, udev rule, Connect button
- External iframes: CSP https: added
- Container startup: "Checking..." shimmer, marketplace sort
- Port mapping: all nginx+frontend+backend synced
- ElectrumX: shows index size during indexing
- Fedimintd → "Fedimint Guardian"
- IndeedHub Studio version
- On-Chain first in receive modals
- Tab-launch icons, iframe error screen, CPU alert threshold
- Mesh mobile: header hidden, overflow fixed
- Federation/Cloud: DID on hover
-
-### Git Tags
- v1.2.0-alpha.1 through v1.2.0-alpha.8 (current)
-
-## Resume Checklist
-1. **Finish .198 containers** — create Bitcoin, LND, ElectrumX, MariaDB, Mempool, BTCPay, Grafana, etc.
-2. **H2/H3** — federation peer-joined/address-changed signature verification
-3. **H4** — bind service ports to 127.0.0.1
-4. **BUG-1** — CSRF mismatch (P0 critical)
-5. **Many /task items** in MASTER_PLAN.md from testing session
-6. **Tailscale migration** for other nodes (preserve auth state)
-
-## Key Facts
- Rootless subnet: 10.89.0.0/16
- Bitcoin RPC: rpcallowip=0.0.0.0/0, password in /var/lib/archipelago/secrets/
- .198 /etc/hosts must be 644
- Deploy --both only copies, --live creates containers
--- a/docs/SESSION-RESUME-2026-04-24.md
+++ b/docs/SESSION-RESUME-2026-04-24.md
@ -1,653 +0,0 @@
-> gitea app icon is still missing.
-
-> and we have a container called “bold_lichterman” which I have no idea what it is
-
-> great, let's finish it off
-
-# Session Resume - 2026-04-24
-
-## Latest user directives (must be followed first)
-
-> please continue, please state my last comment in the resume doc and first before making this plan to adhere to
-
-> And we need to get every container working on .116 and tested before we release
-
-> we have no time requirements so the best path is the way
-
-> Continue, leave release gate as a reminder later it won’t happen for a while
-
-> we only work via fuse thinkpad
-
-> all code has to be local changes to .116 (that machine) code and repo
-
-> we are not working on this machine is why, I removed it so you would never accidentally work here, we are doing all code on .116 Projects/archy repo
-
-> we're using paths instead of port which seems to be causing issues again, launch and tab should use port no? Please confirm this is correct as paths have never worked.
-
-> A lot of the apps aren't loading properly, did you screw all the apps up with this wrong approach?
-
-Adherence for current session:
- Before proposing or executing a plan, record the latest directive in this `SESSION-RESUME` doc first.
- Release gate is now explicit: `.116` required containers must be working and tested before release.
- No time constraint: choose the most correct long-term architecture/stability path even if it takes significantly longer.
- Release gate remains required, but treat it as a later checkpoint reminder while long-running sync/migration work continues.
- Runtime stabilization on `.116` is immediate priority; keep migration work aligned with this gate.
- Work context is strictly the `.116` repo via FUSE thinkpad mount; do not make/code against any non-`.116` local workspace.
-
-## Goal in progress
-Move package lifecycle to orchestrator-first behavior with automated proof gates, while keeping safe legacy fallback during migration.
-
-## Work completed in this session
-
-### Step 8b.1 wiring progress (orchestrator runtime parity)
- Implemented orchestrator-side resolution for new manifest fields in `core/archipelago/src/container/prod_orchestrator.rs`:
-  - resolve `container.derived_env` from detected host facts (`HOST_IP`, `HOST_MDNS`, `DISK_GB`) before create
-  - resolve `container.secret_env` from `/var/lib/archipelago/secrets/<name>` before create
-  - apply `container.data_uid` with pre-create recursive `chown -R UID:GID` on bind-mounted volume sources
- Added unit coverage in `prod_orchestrator.rs` for:
-  - derived+secret env resolution reaching `create_container`
-  - data_uid ownership path executing prior to create/start
- Extended Podman create payload mapping in `core/container/src/podman_client.rs` to honor:
-  - `container.network` (with legacy `security.network_policy` fallback)
-  - `container.entrypoint`
-  - `container.custom_args` as command args
-  - `volumes.type=tmpfs` with `tmpfs_options`
-
-### Step 8b.2 first backend manifest port started (fedimint)
- Ported `apps/fedimint/manifest.yml` from legacy `container-specs.sh` behavior:
-  - image corrected to `git.tx1138.com/lfg2025/fedimintd:v0.10.0`
-  - network set to `archy-net`
-  - bitcoin RPC target corrected to `bitcoin-knots:8332`
-  - `FM_BIND_P2P` / `FM_BIND_API` / `FM_BIND_UI` aligned with spec
-  - `FM_P2P_URL` / `FM_API_URL` migrated to `derived_env` with `HOST_MDNS`
-  - `FM_BITCOIND_PASSWORD` migrated to `secret_env` from `bitcoin-rpc-password`
-  - data dir ownership mapping set with `data_uid: "100000:100000"`
-
-### Step 8b.2 continued (fedimint-gateway manifest added)
- Added `apps/fedimint-gateway/manifest.yml` with a shell entrypoint wrapper matching legacy two-path behavior:
-  - if LND cert+macaroon are present, starts `gatewayd ... lnd --lnd-rpc-host lnd:10009 ...`
-  - otherwise starts `gatewayd ... ldk --ldk-lightning-port 9737 ...`
- Manifest uses new schema fields now wired in orchestrator runtime:
-  - `network: archy-net`
-  - `entrypoint` + `custom_args` (dynamic runtime command)
-  - `secret_env` for `FM_BITCOIND_PASSWORD` and `FEDI_HASH`
-  - `data_uid: "100000:100000"`
- Note: unlike legacy script, this manifest declares both `8176` and `9737` host ports statically; runtime branch still selects LND-vs-LDK execution at startup.
-
-### Step 8b.3 started (filebrowser baseline service)
- Added `apps/filebrowser/manifest.yml` to port baseline filebrowser from legacy specs/first-boot behavior:
-  - image: `git.tx1138.com/lfg2025/filebrowser:v2.27.0`
-  - `network: archy-net`
-  - `custom_args: ["--config", "/data/.filebrowser.json"]`
-  - `data_uid: "100000:100000"`
-  - capabilities include `NET_BIND_SERVICE` + legacy rootless write caps
-  - binds `/var/lib/archipelago/filebrowser` → `/srv` and `/var/lib/archipelago/filebrowser-data` → `/data`
- Added orchestrator pre-start hook for `filebrowser` in `core/archipelago/src/container/filebrowser.rs` and wired in `prod_orchestrator`:
-  - ensures root directories exist (`Documents`, `Photos`, `Music`, `Downloads`, `Builds`)
-  - writes `/var/lib/archipelago/filebrowser-data/.filebrowser.json` if missing (atomic tmp+rename)
-  - keeps behavior idempotent (no rewrite if config already exists)
-
-### Step 8b.3 continued (electrumx manifest added)
- Added `apps/electrumx/manifest.yml` with spec-faithful baseline:
-  - image `git.tx1138.com/lfg2025/electrumx:v1.18.0`
-  - network `archy-net`
-  - bind mount `/var/lib/archipelago/electrumx:/data`
-  - electrum TCP port `50001:50001`
-  - `secret_env` for Bitcoin RPC password
-  - shell entrypoint wrapper that exports `DAEMON_URL` with secret at runtime before launching `electrumx_server`
-  - keeps `COIN`, `DB_DIRECTORY`, `SERVICES` env aligned with legacy behavior
-
-### Step 8b.3 continued (bitcoin-knots + lnd manifest reconciliation)
- Reconciled `apps/bitcoin-core/manifest.yml` toward production `bitcoin-knots` behavior while keeping app id stable:
-  - added `container_name: bitcoin-knots` to preserve adoption of existing container name
-  - switched image to `git.tx1138.com/lfg2025/bitcoin-knots:latest`
-  - set `network: archy-net`
-  - added dynamic startup command (prune-vs-full-node) using `custom_args` and `DISK_GB` from `derived_env`
-  - added `secret_env` for Bitcoin RPC password and `data_uid: "100101:100101"`
- Reconciled `apps/lnd/manifest.yml` to legacy/runtime expectations:
-  - image updated to `git.tx1138.com/lfg2025/lnd:v0.18.4-beta`
-  - network set to `archy-net`
-  - capabilities aligned with spec (`CHOWN`, `FOWNER`, `SETUID`, `SETGID`, `DAC_OVERRIDE`, `NET_RAW`)
-  - bitcoin backend host corrected to `bitcoin-knots`
-  - RPC password moved to `secret_env` from `bitcoin-rpc-password`
-  - data ownership mapping set via `data_uid: "100000:100000"`
-
-### Step 8b.3 continued (mempool + btcpay companion manifests)
- Added new manifests for stack companions previously only defined in `container-specs.sh`:
-  - `apps/archy-mempool-db/manifest.yml`
-  - `apps/mempool-api/manifest.yml`
-  - `apps/archy-mempool-web/manifest.yml` (with `container_name: mempool` to preserve existing frontend container adoption)
-  - `apps/archy-btcpay-db/manifest.yml`
-  - `apps/archy-nbxplorer/manifest.yml`
- Reconciled `apps/btcpay-server/manifest.yml` toward runtime stack parity (image/tag/network/ports/env/deps aligned to legacy stack installer).
-
-### Step 8b.5 progress (update path: orchestrator-first recreate)
- Updated `core/archipelago/src/api/rpc/package/update.rs` recreate path to avoid hard dependency on `reconcile-containers.sh`:
-  - after stop/pull/rm, each container recreate now tries orchestrator `install(app_id)` first using container-name alias candidates
-  - includes alias mapping for known name/app-id mismatches (`bitcoin-knots` ↔ `bitcoin-core`, `archy-*` aliases, `mempool` ↔ `archy-mempool-web`)
-  - on orchestrator miss/error, falls back to legacy reconcile script path (safe migration fallback retained)
-  - rollback path now reuses the same orchestrator-first recreate helper instead of invoking reconcile directly
- Added unit test coverage for alias candidate generation in update module tests.
-
-### .116 release-gate automation scaffold started
- Added read-only required-stack lifecycle suite for `.116` in `tests/lifecycle/bats/required-stack.bats`:
-  - asserts required containers are present + running
-  - probes core endpoints (bitcoin RPC, electrumx TCP, lnd getinfo, mempool API/frontend, bitcoin-ui, lnd-ui)
- Updated `tests/lifecycle/run.sh` so no-auth read-only suites can run with `ARCHY_ALLOW_NOAUTH=1` (password still required for RPC-auth suites).
-
-### Stack install path migration progress (orchestrator-first)
- Updated `core/archipelago/src/api/rpc/package/stacks.rs`:
-  - added orchestrator-first stack installer helper (`install_stack_via_orchestrator`) with legacy stack fallback
-  - wired helper into `install_btcpay_stack` and `install_mempool_stack`
-  - fixed mempool legacy fallback drift:
-    - adopt checks now include current frontend container name `mempool`
-    - root DB secret name corrected to `mysql-root-db-password`
-    - backend host env aligned to `electrumx` and `bitcoin-knots` on `archy-net`
- Expanded orchestrator install allowlist in `core/archipelago/src/api/rpc/package/install.rs` to include newly ported backend/companion apps.
-
-### Legacy config drift cleanup (package config helpers)
- Updated legacy `get_app_config` paths in `core/archipelago/src/api/rpc/package/config.rs` to match current `.116` runtime topology and secrets:
-  - moved host-based RPC/electrum endpoints to in-network service names (`bitcoin-knots`, `electrumx`, `mempool-api`, `archy-nbxplorer`)
-  - corrected mempool mysql root secret fallback name to `mysql-root-db-password`
-  - aligned btcpay and fedimint bitcoin RPC URLs to `bitcoin-knots` service target
-  - removed LND host-based ZMQ defaults in legacy args path and aligned bitcoind RPC host to `bitcoin-knots:8332`
-
-### Step 8b migration tightening (install/update/stack policy)
- `core/archipelago/src/api/rpc/package/update.rs`
-  - moved `btcpay-server` and `mempool` out of forced legacy-update list (now orchestrator-first update candidates)
-  - kept safe legacy-update routing for still-unported stack families (`immich`, `penpot`, `indeedhub`, `fedimint`)
- `core/archipelago/src/api/rpc/package/stacks.rs`
-  - extracted canonical stack app-id sets for BTCPay and mempool and added unit test coverage to prevent drift
- `core/archipelago/src/api/rpc/package/install.rs`
-  - tests updated to assert expanded orchestrator-install allowlist for newly ported backend/companion apps
-
-### Continued migration + test gate expansion
- `core/archipelago/src/api/rpc/package/update.rs`
-  - moved `fedimint` out of forced legacy-update list (now orchestrator-first update candidate with fallback)
- `core/archipelago/src/api/rpc/package/config.rs`
-  - removed obsolete mempool data-dir cleanup target (`/var/lib/archipelago/mempool-electrs`) to match current stack shape
- Added destructive required-stack lifecycle suite:
-  - `tests/lifecycle/bats/required-stack-destructive.bats`
-  - gated by `ARCHY_ALLOW_DESTRUCTIVE=1`; restarts required service containers and verifies endpoint recovery
-  - keeps destructive checks explicit and opt-in during migration work
-  - added restart retry and HTTP readiness polling to absorb transient podman/pasta port-bind races during rapid restart cycles on `.116`
-
-### Validation run notes (latest)
- `.116`: `cargo test -p archipelago api::rpc::package::update::tests` -> PASS (4/4)
- `.116`: `cargo test -p archipelago api::rpc::package::config::tests` -> no direct tests matched filter (0 run, no failures)
- `.116`: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1 tests/lifecycle/run.sh required-stack-destructive` -> PASS (3/3) after restart retry/readiness hardening
-
-### Added next lifecycle gate (in progress)
- Added `tests/lifecycle/bats/package-update-smoke.bats`:
-  - destructive RPC-authenticated update smoke for `package.update` on `bitcoin-ui`
-  - optional stack smoke for `mempool` behind `ARCHY_ALLOW_STACK_UPDATE=1`
- Updated `tests/lifecycle/run.sh` usage examples with `package-update-smoke` target
- First `.116` run attempt blocked by missing `ARCHY_PASSWORD` environment variable (expected for auth-required suite)
-
-### Newly observed UI routing issue (user report)
- Report: launching **Grafana** opens **Gitea** instead of Grafana.
- Likely collision/drift area to validate and fix:
-  - `core/archipelago/src/api/rpc/package/config.rs` currently maps both apps into the 3000/3001 neighborhood (`grafana` host `3000`, `gitea` host `3001` + historical nginx iframe comments).
-  - `neode-ui/src/stores/appLauncher.ts` resolves app sessions by URL port (`3000 -> grafana`), so stale/misrouted backend launch URLs or proxy rules can misdirect launches.
- Add regression checks after fix:
-  - container-list launch URL for grafana resolves to grafana service endpoint
-  - launching grafana from UI does not route to gitea content
-
-### Grafana->Gitea misroute remediation (current)
- Root cause confirmed: legacy `gitea-iframe.conf` bound host port `3000`, colliding with Grafana launch expectations.
- Fixes applied:
-  - `core/archipelago/src/api/rpc/package/install.rs`
-    - stop deploying gitea dedicated nginx server on `3000`
-    - remove stale `/etc/nginx/conf.d/gitea-iframe.conf` during gitea install path
-    - set Gitea `ROOT_URL` to `http://<host>/app/gitea/`
-  - `image-recipe/configs/nginx-archipelago.conf`
-    - `/app/gitea/` proxy now targets `127.0.0.1:3001` (not `3000`)
-  - `image-recipe/configs/snippets/archipelago-https-app-proxies.conf` and `scripts/nginx-https-app-proxies.conf`
-    - added explicit `/app/gitea/ -> 127.0.0.1:3001`
-  - `neode-ui/src/views/appSession/appSessionConfig.ts`
-    - moved gitea away from direct port `3000`; route via proxy path mapping
-  - `neode-ui/src/stores/appLauncher.ts`
-    - `resolveAppIdFromUrl()` now recognizes `/app/{id}/` path-based URLs before port mapping
-  - `neode-ui/src/stores/__tests__/appLauncher.test.ts`
-    - added regression test for `/app/gitea/` routing
- Validation:
-  - `.116` vitest launcher suite passes (`12/12`) with gitea path regression test.
-  - removed live `/etc/nginx/conf.d/gitea-iframe.conf` on `.116` and reloaded nginx.
- Current runtime note:
-  - `gitea` container running on `3001`; `grafana` container not currently running on `.116`, so direct `/app/grafana/` proxy check returns 502 until Grafana is started.
-
-### User directive (latest)
- Root cause to address later in planned sequence: **Grafana and Gitea must not share/clash ports**.
- Treat this as a dedicated root-fix item when we reach that phase; continue broader Step 8b migration/testing work in the meantime.
-
-### Workflow note
- Todo list maintenance explicitly requested; keep statuses current as work advances to avoid stale execution state.
-
-### Validation run notes (latest continuation)
- `.116`: `tests/lifecycle/run.sh required-stack-destructive` with `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1` -> PASS (3/3)
- `.116`: `cargo test -p archipelago api::rpc::package::update::tests` -> PASS (4/4)
- `.116`: `cargo test -p archipelago api::rpc::package::stacks::tests` -> PASS (1/1)
- `.116`: `cargo test -p archipelago api::rpc::package::install::tests` -> PASS (3/3)
-
-### Validation run notes (latest continuation 2)
- `.116`: `tests/lifecycle/run.sh package-update-smoke` with `ARCHY_PASSWORD=archipelago ARCHY_ALLOW_DESTRUCTIVE=1` -> PASS (`bitcoin-ui` smoke passed; `mempool` optional test skipped without `ARCHY_ALLOW_STACK_UPDATE=1`)
- `.116`: `tests/lifecycle/run.sh required-stack` with `ARCHY_ALLOW_NOAUTH=1` -> PASS (9/9)
- `.116`: `tests/lifecycle/run.sh required-stack-destructive` with `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1` -> PASS (3/3)
- `.116`: `cargo test -p archipelago api::rpc::package::install::tests` -> PASS (4/4) after alias mapping additions
- `.116`: `cargo test -p archipelago api::rpc::package::update::tests` -> PASS (5/5) after alias mapping additions
- `.116`: `cargo test -p archipelago api::rpc::package::stacks::tests` -> PASS (1/1)
-
-### Step 8b alias parity improvements
- `core/archipelago/src/api/rpc/package/install.rs`
-  - added orchestrator install app-id normalization (`bitcoin-knots -> bitcoin-core`, `electrs/mempool-electrs -> electrumx`)
-  - expanded orchestrator install allowlist to include alias IDs for parity with scanner/runtime naming
-  - added unit test: `install_aliases_map_to_manifest_app_ids`
- `core/archipelago/src/api/rpc/package/update.rs`
-  - added orchestrator update app-id normalization for same alias set
-  - orchestrator upgrade/health now uses normalized app-id while preserving package-level progress/state semantics
-  - added unit test: `update_aliases_map_to_manifest_app_ids`
-
-### Lifecycle hardening + full-suite pass
- `tests/lifecycle/lib/rpc.bash`
-  - `wait_for_container_status` now uses `container-list` state first and uses `container-status` with `app_id` fallback (instead of stale `name` param)
- `tests/lifecycle/bats/bitcoin-knots.bats`
-  - made `container-status` assertion resilient to alias-migration drift by accepting either valid `container-status` result or valid `container-list` state for `bitcoin-knots`
- `.116`: full lifecycle suite pass
-  - `ARCHY_PASSWORD=archipelago ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1 tests/lifecycle/run.sh`
-  - result: `1..25`, all passing (with expected optional skips)
-
-### Release-gate runtime status (latest)
- `.116` Bitcoin Knots chain sync remains in early IBD:
-  - `blocks=0`, `headers=342297`, `verificationprogress=7.28959974719862e-10`, `initialblockdownload=true`
- Several non-required containers remain unhealthy/exited and are not part of current required-stack release gate:
-  - examples: `homeassistant`, `immich_server`, `uptime-kuma`, `jellyfin`, `photoprism`, `vaultwarden`, `nextcloud`, `searxng`
-
-### Runtime diagnostics note (non-blocking to Step 8b lane)
- Grafana container on `.116` required mapped UID ownership (`100472:100472`) on `/var/lib/archipelago/grafana` to run under rootless user-namespace mapping.
- Active nginx on `.116` still had `/app/gitea/` upstream pointing to `127.0.0.1:3000` prior to full config rollout; corrected live config to `3001` and reloaded.
- Per user directive, the root architectural fix for Grafana/Gitea port separation remains a planned dedicated step (not closed yet).
-
-### Current `.116` proof status (latest run)
- Rust tests on `.116` all green for migration slices:
-  - `api::rpc::package::install::tests`
-  - `api::rpc::package::update::tests`
-  - `api::rpc::package::stacks::tests`
-  - `container::prod_orchestrator::tests`
-  - `archipelago-container manifest::tests::parse_every_real_manifest`
- `.116` required-stack lifecycle suite (`tests/lifecycle/bats/required-stack.bats`) re-run and passing (9/9).
-
-### Automated `.116` gate execution now running in-loop
- Re-ran `tests/lifecycle/bats/required-stack.bats` on `.116` (read-only gate suite): all checks passing.
- Re-ran Rust migration tests on `.116` after code updates:
-  - `api::rpc::package::install::tests`
-  - `api::rpc::package::update::tests`
-  - `container::prod_orchestrator::tests`
-  - `archipelago-container manifest::tests::parse_every_real_manifest`
-  - all passing.
-
-### Runtime stabilization update on `.116` (release-gate work)
- User directive recorded: all required containers on `.116` must be working and tested before release; no time constraint, choose best path.
- Best-path decision applied: move Bitcoin node to full mode (`txindex=1`, non-pruned) and rebuild chain state/indexes for durable ElectrumX/mempool compatibility.
-
-Actions taken:
- Wrote `/var/lib/archipelago/bitcoin/bitcoin_rw.conf` with full-mode settings:
-  - `server=1`
-  - `txindex=1`
-  - `rpcbind=0.0.0.0:8332`
-  - `rpcallowip=0.0.0.0/0`
-  - `listen=1`
-  - `bind=0.0.0.0:8333`
- Recreated `bitcoin-knots` with proper caps and `-reindex` startup.
- Confirmed node is running non-pruned and syncing from genesis; sample check showed `blocks=5954`, `headers=946415`, `pruned=false`, `txindex thread` active.
- Recreated `electrumx` on `archy-net` with a real `/var/lib/archipelago/electrumx` data mount.
- Corrected mempool MariaDB data ownership mapping mismatch (`/var/lib/archipelago/mysql-mempool` to `100998:100998`) so tables are readable by the container's mysql user.
- Restarted dependent containers (`lnd`, `electrumx`, `mempool-api`) after Bitcoin mode switch.
-
-Current status snapshot:
- `bitcoin-knots`: running, healthy, full reindex in progress.
- `electrumx`: running, initial sync catch-up in progress.
- `lnd`: running; health status noisy due to startup/wallet/macaroon checks while chain backend is syncing.
- `mempool-api`: running but endpoint still timing out during early-chain synchronization and repeated difficulty-update retries.
-
-Important note:
- Because the node has been reset to a full reindex from genesis, downstream service health is expected to remain transitional until sufficient chain progress is reached. Release gate is still open (not yet met).
-
-### 1) Orchestrator-first update path (partial migration)
- File: `core/archipelago/src/api/rpc/package/update.rs`
- Change:
-  - `handle_package_update` now attempts `orchestrator.upgrade(package_id)` first when eligible.
-  - Falls back to legacy update flow for stack/legacy packages.
-  - Handles `unknown app_id` from orchestrator as a non-fatal fallback case.
-
-### 2) Orchestrator-first install path (initial allowlist)
- File: `core/archipelago/src/api/rpc/package/install.rs`
- Change:
-  - `handle_package_install` now attempts `orchestrator.install(package_id)` first for allowlisted apps:
-    - `bitcoin-ui`
-    - `electrs-ui`
-    - `lnd-ui`
-  - Other apps remain on legacy install path for now.
-  - Handles `unknown app_id` fallback to legacy installer.
-
-### 3) Added unit tests
- `core/archipelago/src/api/rpc/package/update.rs`
-  - path-selection tests for orchestrator vs legacy.
- `core/archipelago/src/api/rpc/package/install.rs`
-  - allowlist tests for orchestrator-first install.
-
-### 4) Test commands run and status
- Ran:
-  - `cargo test -p archipelago api::rpc::package::install::tests`
-  - `cargo test -p archipelago api::rpc::package::update::tests`
- Result: passing.
-
-## Validation commands for target hosts
-
-### Local host
-```bash
-ssh localhost 'sudo systemctl restart archipelago && sleep 2 && systemctl --no-pager --full status archipelago | sed -n "1,60p"'
-```
-
-### Remote host (.228)
-```bash
-ssh archipelago@192.168.1.228 'sudo systemctl restart archipelago && sleep 2 && systemctl --no-pager --full status archipelago | sed -n "1,60p"'
-```
-
-### Check orchestrator-path logs
-```bash
-ssh archipelago@192.168.1.228 'journalctl -u archipelago -n 300 --no-pager | egrep "INSTALL ORCH|UPDATE ORCH|unknown app_id|legacy flow"'
-```
-
-### Check container states
-```bash
-ssh archipelago@192.168.1.228 'podman ps -a --format "{{.Names}}\t{{.Status}}\t{{.Image}}"'
-```
-
-## Recommended next steps
-1. Expand orchestrator-install allowlist beyond UI apps to additional single-container manifest-backed apps.
-2. Migrate stack updates (`mempool`, `btcpay`, `immich`, `indeedhub`) to orchestrator-driven stack plans.
-3. Unify graceful stop timeout behavior in orchestrator runtime path for stateful apps.
-4. Add SSH-driven integration tests (local + `.228`) as a release gate.
-
-## 2026-04-24 15:10 UTC — continuity checkpoint (auto-memory)
-
- User requested: keep working continuously and always update resume memory before any stop.
- Persisted code changes deployed to `/usr/local/bin/archipelago` on `.116`:
-  - `core/archipelago/src/api/rpc/package/config.rs`
-    - `immich` stack uses public `docker.io/valkey/valkey:7-alpine`.
-    - Healthcheck defaults hardened:
-      - `searxng` uses `wget` probe (image lacks curl).
-      - `botfights` uses node-based fetch probe for `/api/health`.
-      - `nextcloud` uses reachability probe (`curl -s -o /dev/null .../status.php`).
-      - `portainer` healthcheck disabled by default (`return vec![]`) to avoid false unhealthy flap.
-    - Portainer socket mount path updated to rootless user socket:
-      - `/run/user/1000/podman/podman.sock:/var/run/docker.sock`.
-  - `core/archipelago/src/api/rpc/package/install.rs`
-    - `create_data_dirs()` fallback chown flow guarded for UID mapping (no underflow path when host UID is root-mapped 1000).
- Validation run on `.116`:
-  - `cargo fmt --all`
-  - `cargo test -p archipelago api::rpc::package::stacks::tests`
-  - `cargo test -p archipelago api::rpc::package::install::tests`
-  - All passing (warnings only).
- Runtime state after redeploy + reinstall checks:
-  - Healthy: `botfights`, `searxng`, `nextcloud`, `immich_postgres`, `immich_redis`; `immich_server` running and ping OK.
-  - `portainer` running with no healthcheck (`health=none`) per persisted default.
-  - Required Bitcoin stack remains up (`bitcoin-knots`, `lnd`, `mempool-api`, `mempool`, `electrumx`, UIs).
-  - Intentional unresolved blocker: `uptime-kuma` stays `Created` due planned root fix (`gitea` occupies host `3001`).
- Note: `nextcloud` private-registry pull failed; public literal install path works (`docker.io/library/nextcloud:28`) and is now healthy.
-
-## 2026-04-24 15:20 UTC — continuation checkpoint
-
- Continued per request; no stop.
- Lifecycle regression fixed and verified:
-  - `tests/lifecycle/lib/rpc.bash` `wait_for_container_status()` fallback now maps aliases:
-    - `bitcoin-knots` -> `bitcoin-core`
-    - `electrs` / `mempool-electrs` -> `electrumx`
-  - This resolved flaky failure in `bats/bitcoin-knots.bats` stop/start wait path.
- Full lifecycle suite rerun:
-  - `ARCHY_PASSWORD=archipelago ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1 tests/lifecycle/run.sh`
-  - Result: `1..25` all passing (same optional skips as before).
- Runtime parity snapshot remains:
-  - Healthy/running: required Bitcoin stack, `immich_*`, `botfights`, `searxng`, `nextcloud`.
-  - `portainer` running with no healthcheck (`health=none`) by persisted default.
-  - Intentional remaining blocker unchanged: `uptime-kuma` `Created` due `gitea`/`3001` root conflict (deferred to root fix lane).
-
-## 2026-04-25 09:35 UTC — continuation checkpoint
-
- Re-ran full lifecycle with stack update smoke enabled:
-  - `ARCHY_PASSWORD=archipelago ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1 ARCHY_ALLOW_STACK_UPDATE=1 tests/lifecycle/run.sh`
-  - Result: `1..25` all passing (including optional test 13).
- Container/endpoint parity check post-suite:
-  - Required Bitcoin stack remains up; HTTP endpoints for mempool API/web + bitcoin/lnd UI respond.
-  - Immich still healthy (`/api/server/ping` -> `pong`).
-  - Non-required app states stable from previous hardening (`botfights`, `searxng`, `nextcloud` healthy; `portainer` running with no healthcheck).
-  - Planned unresolved conflict unchanged: `uptime-kuma` still `Created` due `gitea` occupying host `3001`.
- Bitcoin sync status snapshot (for release-gate context):
-  - `blocks=0`, `headers=392976`, `initialblockdownload=true`, `verificationprogress~7.29e-10`, `pruned=false`.
-
-## 2026-04-25 13:55 UTC — continuation checkpoint
-
- Continued stabilization after all lifecycle passes.
- Added noise-reduction tweak in `core/archipelago/src/electrs_status.rs`:
-  - Bitcoin RPC failures in ElectrumX status cache are now classified with `is_transient_error(...)`.
-  - Transient connection-style failures log at `debug` instead of `warn`.
-  - Non-transient failures still log as `warn`.
- Built + deployed updated backend binary and restarted `archipelago` service (`active`).
- Post-deploy runtime snapshot unchanged/stable:
-  - Healthy: required Bitcoin stack, `immich_postgres`, `immich_redis`, `botfights`, `searxng`, `nextcloud`.
-  - Running: `immich_server`.
- Known deferred blocker unchanged: `uptime-kuma` remains `Created` due `gitea` on host port `3001`.
-
-## 2026-04-25 14:20 UTC — continuation checkpoint
-
- User directive recorded first for this continuation:
-  - "it’s on the thinkpad in projects/archy via fuse drive or ssh"
-  - "whatever the best access method is"
- Switched active workspace to the `.116` repo via FUSE mount:
-  - `/Users/dorian/mnt/archy-thinkpad`
- Root cause confirmed for current `package.update bitcoin-ui` blocker:
-  - Service is running with `ARCHIPELAGO_DEV_MODE=true`, so orchestrator `upgrade()` resolves through `DevContainerOrchestrator::load_manifest_for()`.
-  - Dev manifest loader only searched legacy path `<data_dir>/apps/<app_id>/manifest.yml` (`/var/lib/archipelago/apps/...`), which is missing on `.116`.
-  - Production manifests are under `/opt/archipelago/apps` (and repo-local `/home/archipelago/Projects/archy/apps` on dev nodes), causing orchestrator update to fail with missing manifest.
- Fix applied:
-  - `core/archipelago/src/container/dev_orchestrator.rs`
-    - `load_manifest_for()` now searches manifest locations in this order:
-      1. `$ARCHIPELAGO_APPS_DIR`
-      2. `/opt/archipelago/apps`
-      3. `/home/archipelago/Projects/archy/apps`
-      4. `<data_dir>/apps` (legacy fallback)
-    - Added helper `candidate_manifest_paths(...)` with de-dup logic.
-    - Added unit test coverage for fallback path inclusion.
- Validation attempt:
-  - Ran `cargo fmt --all && cargo test -p archipelago container::dev_orchestrator::tests` from `core/`.
-  - Local FUSE-mounted build failed early with Rust toolchain environment issue:
-    - `error[E0463]: can't find crate for parking_lot_core`
-  - Code compiles were not validated in this host context; next validation should run directly on `.116` shell (ssh) where the existing build toolchain is known-good.
-
-## 2026-04-25 18:00 UTC — stabilization checkpoint (nginx/BTCPay/Uptime Kuma)
-
- User directive recorded for this lane:
-  - "just need to do it all, not bothered which order"
-  - "Uptime Kjuma opens gitty, we have an erroneous app called bitcoin UI and nginx proxy manager still doesn’t work"
-
- Root causes confirmed on `.116`:
-  1. **BTCPay broken**: DB ownership mismatch on `/var/lib/archipelago/postgres-btcpay` after UID mapping drift.
-     - Symptoms: BTCPay/NBXplorer PostgreSQL errors `could not open file global/pg_filenode.map: Permission denied`.
-  2. **Uptime Kuma cannot bind/start on 3001**: hard conflict with Gitea (already mapped to host 3001).
-  3. **Nginx Proxy Manager app route broken**: `/app/nginx-proxy-manager/` pointed to `127.0.0.1:8181`, but live NPM is on `81`.
-  4. **Uptime Kuma route opening Gitea**: upstream/redirect behavior around `/app/uptime-kuma/` required explicit path redirect handling.
-
- Code fixes applied in repo (ThinkPad FUSE `.116` source):
-  - `core/archipelago/src/container/dev_orchestrator.rs`
-    - manifest lookup fallback order for dev-mode orchestrator upgrade/install:
-      `$ARCHIPELAGO_APPS_DIR` -> `/opt/archipelago/apps` -> `/home/archipelago/Projects/archy/apps` -> `<data_dir>/apps`.
-  - `core/archipelago/src/api/rpc/package/config.rs`
-    - `uptime-kuma` host mapping changed `3001:3001` -> `3002:3001`.
-  - `core/archipelago/src/api/rpc/package/install.rs`
-    - BTCPay Postgres UID map corrected to container uid 999 (`host 100998`) for `archy-btcpay-db`.
-    - `uptime-kuma` install path now forces `--entrypoint=/usr/bin/dumb-init` (bypass failing `setpriv --clear-groups` startup path under rootless/cap-drop).
-  - `core/archipelago/src/port_allocator.rs`
-    - reserve `3002` to avoid accidental reallocation conflicts.
-  - `core/container/src/podman_client.rs`
-    - `lan_address_for("uptime-kuma")` updated to `http://localhost:3002`.
-  - nginx templates:
-    - `image-recipe/configs/nginx-archipelago.conf`
-    - `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`
-    - `scripts/nginx-https-app-proxies.conf`
-    - Changes:
-      - `/app/uptime-kuma/` upstream -> `127.0.0.1:3002`
-      - exact `location = /app/uptime-kuma/` now redirects to `/app/uptime-kuma/dashboard`
-      - `/app/nginx-proxy-manager/` upstream -> `127.0.0.1:81`
-  - UI filtering:
-    - `neode-ui/src/views/apps/appsConfig.ts` now treats `bitcoin-ui`/`lnd-ui`/`electrs-ui` as service containers so they don’t appear as separate user apps.
-
- Live `.116` runtime actions executed:
-  - Corrected BTCPay Postgres data ownership to `100998:100998` and restarted `archy-btcpay-db`, `archy-nbxplorer`, `btcpay-server`.
-  - Recreated `uptime-kuma` on host `3002` using stable entrypoint (`/usr/bin/dumb-init -- node server/server.js`).
-  - Patched active nginx files (`sites-enabled` + snippets), validated with `nginx -t`, reloaded.
-  - Rebuilt and redeployed `/usr/local/bin/archipelago` from updated source; restarted `archipelago` service.
-
- Validation status after fixes:
-  - Rust tests on `.116`:
-    - `cargo test -p archipelago container::dev_orchestrator::tests` -> PASS
-    - `cargo test -p archipelago api::rpc::package::update::tests` -> PASS
-    - `cargo test -p archipelago api::rpc::package::install::tests` -> PASS
-  - Lifecycle gate:
-    - `tests/lifecycle/run.sh required-stack package-update-smoke` -> PASS (`1..11`, optional stack-update skipped unless enabled)
-  - Runtime smoke:
-    - `btcpay-server` login endpoint returns `200`.
-    - `uptime-kuma` container running healthy on `3002`; `/app/uptime-kuma/dashboard` returns `200` with Uptime Kuma HTML.
-    - `/app/nginx-proxy-manager/` returns `200` (no longer 502).
-    - `/app/gitea/` remains on `3001` and returns `200`.
-
- Remaining caveat for user UX confirmation:
-  - `/app/uptime-kuma/` intentionally returns `302` to `/app/uptime-kuma/dashboard`.
-  - If the browser still shows old behavior, clear cache/hard-refresh; live nginx and containers now reflect corrected routing.
-
-### Latest user directive (new)
- "Continue if you have next steps, or stop and ask for clarification if you are unsure how to proceed."
-
-### Continuation work completed after directive
- Objective: close the remaining UI caveat where `bitcoin-ui` could still appear as an app category influence when backend package key and manifest id differ.
- Added robust service detection by manifest identity, not only package key:
-  - `neode-ui/src/views/apps/appsConfig.ts`
-    - new helper `isServicePackage(id, pkg)` combines key-based and `manifest.id`-based service checks.
-    - `useCategoriesWithApps(...)` now filters using `isServicePackage(...)`.
-  - `neode-ui/src/views/Apps.vue`
-    - app/service tab split now uses `isServicePackage(id, pkg)` so service aliases cannot leak into My Apps.
- Added regression tests:
-  - `neode-ui/src/views/apps/__tests__/appsConfig.test.ts`
-    - verifies `bitcoin-ui` / `lnd-ui` / `electrs-ui` are always treated as services.
-    - verifies alias key case (`core-lnd-ui` with `manifest.id=bitcoin-ui`) is still classified as service.
-    - verifies service-only `money` category is removed when only real app is `filebrowser`.
-
-### Validation attempt + blocker
- Tried running targeted frontend tests, but local dependency toolchain on this FUSE workspace is currently broken:
-  - initial error: missing optional module `@rollup/rollup-darwin-arm64`
-  - `pnpm install` failed with filesystem permissions error: `EPERM ... node_modules/.ignored`
-  - subsequent `pnpm test` failed because `vitest` binary was unavailable after failed install
- Result: code-level regression fix is in place, but frontend test execution is blocked by workspace `node_modules` permission/install state.
-
-### Continuation update (this run)
- Proceeded to unblock validation as requested and completed targeted regression verification for the `bitcoin-ui` filtering fix.
-
- Frontend test infra recovery steps (workspace-local, no source-code logic changes):
-  - manually restored missing native optional binaries required by current platform:
-    - `@rollup/rollup-darwin-arm64@4.59.0`
-    - `@esbuild/darwin-arm64@0.27.3`
-  - repaired critical missing top-level packages/symlinks after interrupted mixed-package-manager install state (notably `vitest`, `vite`, `typescript`, `vue-tsc`, `jsdom`, `vue`, `pinia`, `vue-router`, `vue-i18n`, scoped deps under `@vitejs`, `@types`, etc.).
-
- Test execution status:
-  - default `vitest.config.ts` run remains blocked by `@vitejs/plugin-vue` resolving through `.ignored` path and failing compiler discovery in this FUSE/mixed-install state.
-  - added temporary local test config for TS-only unit suites:
-    - `neode-ui/vitest.novue.config.ts` (same alias/env basics, no Vue plugin)
-  - targeted regression suites now pass under this config:
-    - `pnpm test --config vitest.novue.config.ts src/views/apps/__tests__/appsConfig.test.ts src/stores/__tests__/appLauncher.test.ts` -> PASS (15/15)
-
- Lifecycle/host validation attempt from this macOS context:
-  - `tests/lifecycle/run.sh required-stack` -> blocked locally because `bats` is not installed in this environment (script exits with install hint).
-  - direct SSH to `.116` from this context is non-interactive blocked (`Permission denied`), so host-side lifecycle reruns require execution from the authorized `.116` session context.
-
-### Continuation update (latest)
- FUSE mount was stale (`Device not configured`) despite mount table entry; recovered by unmounting and remounting `sshfs archy:Projects/archy -> /Users/dorian/mnt/archy-thinkpad`.
-
- Lifecycle validation re-run on `.116` (via SSH):
-  - `ARCHY_ALLOW_NOAUTH=1 tests/lifecycle/run.sh required-stack`
-    - first run had a transient fail on "required containers are running" while mempool family was still in startup window after prior restarts.
-    - immediate rerun passed fully (`1..9` all `ok`).
-  - `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_NOAUTH=1 tests/lifecycle/run.sh required-stack-destructive` passed (`1..3` all `ok`).
-
- Frontend validation on `.116`:
-  - repaired host workspace dependency state by running `npm install` in `~/Projects/archy/neode-ui`.
-  - default Vitest config now works again.
-  - `npm run test -- src/views/apps/__tests__/appsConfig.test.ts src/stores/__tests__/appLauncher.test.ts` -> PASS (15/15).
-  - `npm run test -- src/stores/__tests__/app.test.ts src/stores/__tests__/container.test.ts` -> PASS (40/40).
-  - `npm run build` -> PASS, production bundle + PWA artifacts generated successfully.
-
- Status:
-  - `bitcoin-ui`/service filtering fix is validated with default test config on `.116`.
-  - required-stack + destructive required-stack gates both green on `.116` after transient startup window cleared.
-
- User clarified local machine workspace was intentionally removed; all code work must run on  host in  only.
-
- User re-emphasized launch/tab behavior should be port-based (not  path proxy), as path routing has repeatedly failed in practice.
- User reports many apps failing to load and suspects path-based launch routing regressed broad app behavior; prioritize reverting to stable port-based launch/tab behavior and revalidate.
-
- User reports Gitea app icon is still missing; investigate app icon source/fallback mapping and fix UI asset resolution.
-
- User asked about unknown  container; identified as unmanaged/named-by-podman Filebrowser container and should be reconciled into expected  managed naming/state.
-
- User requested finalization: complete remaining cleanup/validation tasks and produce final production-readiness status for .
-
-### Finalization sweep (latest)
- Removed unmanaged duplicate container `bold_lichterman`; managed `filebrowser` container remains healthy on host port `8083`.
- Confirmed launch behavior hardening:
-  - `gitea` is now treated as new-tab (iframe-blocking behavior).
-  - NPM/Kuma/Gitea new-tab/launch behavior is aligned in launcher + app session + app card tab-launch sets.
-  - App icon fallback now retries `.svg` when a `.png` icon path fails.
- UI validation:
-  - `neode-ui` targeted suites pass: `appLauncher` + `appsConfig` (23/23).
-  - Fresh production build completed and deployed to `/opt/archipelago/web-ui`.
-  - Served bundle verified from nginx: `/assets/index-ptu--7k0.js`.
- Runtime/container validation on `.116`:
-  - `podman ps` shows all expected containers running after cleanup.
-  - Host-port probe matrix executed; user-facing HTTP apps return `200` (gitea, kuma, npm, portainer, filebrowser, grafana, nextcloud, homeassistant, mempool, immich, etc.).
-  - Non-HTTP service ports (SSH/LN/RPC/TLS-only) are explicitly skipped or expected to not return HTTP.
- Lifecycle gates:
-  - `required-stack.bats`: PASS (`1..9`, all ok).
-  - `required-stack-destructive.bats` with `ARCHY_ALLOW_DESTRUCTIVE=1`: PASS (`1..3`, all ok).
-
-Current readiness status:
- Container runtime + required stack gates: green.
- Launcher/icon regressions reported by user: addressed and redeployed.
- Remaining production gate work is final manual UI smoke across all app entry points (Apps/AppDetails/AppSession/Spotlight) and release checklist sign-off.
-
-> let's go
-
- User approved final push: execute final smoke/checklist pass now and return go/no-go readiness report.
-
-### Final gate rerun (go/no-go check)
- Re-ran  and  for release-gate confirmation.
- Observed one transient  miss when tests were run concurrently with destructive restarts; immediate sequential rerun passed clean ( all ok).
- Destructive suite passed with gate enabled:  ( all ok).
- UI regression suite remains green: launcher + appsConfig ().
-
-Go/no-go verdict:
- **GO (technical gates)** on : required stack green, destructive restart recovery green, launcher/icon regressions fixed and deployed.
- Remaining non-automated item is manual browser click-through sanity across all entry points before publishing externally.
-
-> gitea app icon still missing
-
- User reports Gitea icon still missing after prior fallback; investigate backend-provided icon field handling and harden icon URL resolution for token icons (e.g., ).
-
-> Afterwards please build the latest ISO to test with all our work, commit and push too, we need an ISO of the unbundled version with just filebrowser bundled remember, thanks
- User requested final actions: build and test latest unbundled ISO variant (only filebrowser bundled), then commit and push changes.
-
-> Where is the ISO?
- User asked where ISO is; current archived unbundled builder run is failing before artifact generation and must be repaired.
-
-> please do not miss AIUI in the release build or remove it from the nodes whatever you do
- Critical release constraint: AIUI must remain bundled in release artifacts and must never be removed from existing nodes during update/deploy.
-
-> please check the resume files for our latest plan and resume the work.
- Current directive: read the resume/plan files, resume the latest active work, and continue from the recorded release/ISO lane while preserving the AIUI release constraint above.
--- a/docs/STATUS.md
+++ b/docs/STATUS.md
@ -1,667 +0,0 @@
-# RESUME HERE — Rust orchestrator migration
-
-Updated: 2026-04-23 (Install UX polish: phase-based progress bar, post-install scanner kick for instant Launch button, .23 VPS retired with auto-purge migration, frontend/backend deployed to .228 as v1.7.43-alpha.)
-
-**To resume this work, SSH into the ThinkPad and run `opencode` from `~/Projects/archy/`. Or work from the laptop via the SSHFS mount at `~/mnt/archy-thinkpad/`.**
-
---
-
-## ✅ INSTALL UX POLISH + .23 RETIREMENT — SHIPPED (v1.7.43-alpha)
-
-**Rounds 3–5 + config migration + changelog (2026-04-23)** — 5 commits on `main` (unpushed per user mirror protocol):
-
- `8cc84ebc` `feat(install): phase-based progress bar replaces unparseable pull bytes` — `podman pull` emits zero parseable progress when stderr is piped (no TTY), so the legacy byte-counting regex never matched. Replaced with 7 phase-based levels: Preparing (5%) → PullingImage (20%) → CreatingContainer (70%) → StartingContainer (80%) → WaitingHealthy (88%) → PostInstall (95%) → Done (100%). UI maps phases to fixed % and only advances forward (`Math.max`). Final phase label renamed from "Running post-install…" to "Finalizing…" after user feedback that it read like a regression to the install step.
- `f86d86c3` `fix(install): kick scanner post-install so Launch button appears immediately` — scan runs every 60s; post-install the state flipped to Running but the skeletal install-time manifest (`interfaces: None`) persisted until next scan, so `canLaunch(pkg)` returned false for up to a minute. Added `scan_kick: Arc<Notify>` + `scan_tick: Arc<watch::Sender<u64>>` on `RpcHandler`. Scan loop uses `tokio::select!` between the 60s interval and the notify. New `kick_scanner_and_wait` helper (2s timeout) called in install/update success paths BEFORE writing Running, so a fresh manifest lands first. Merge during Installing/Updating uses `merge_preserving_transitional` (keeps state, takes fresh manifest).
- `22052325` `chore: retire .23 VPS mirror, promote .168 OVH to primary` — dropped `DEFAULT_TERTIARY_MIRROR_URL`, promoted `.168` to `DEFAULT_SECONDARY_MIRROR_URL` as "Server 1 (OVH)". 2-entry default registry (.168 priority 0, tx1138 priority 10). Trusted-registry allowlist, catalog fallback, installer ISO registries, `marketplaceData.ts` REGISTRY, `image-versions.sh` all updated. Tests updated for new default counts (registry 3→2, mirror 3→2). URL-parser fixture tests in `update.rs` retain `.23` strings intentionally — they exercise string-parsing logic, not policy.
- `0ee16820` `fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs` — `load_mirrors`/`load_registries` normally only ADD missing defaults (explicit removals stick, by design). Existing nodes have `.23` baked into their saved `update-mirrors.json` + `config/registries.json` and would pay timeouts forever against a dead host. Added targeted one-time migration in both loaders: `.retain(|m| !m.url.contains("23.182.128.160"))` before the defaults-merge step. Narrow-scope exception to the stickiness rule, documented in-code. Triggers lazily on next load (install RPC, update RPC, Settings UI open).
- `008da477` `docs(changelog): add v1.7.43-alpha entry covering async lifecycle + .23 retirement` — 4 release-note bullets in `AccountInfoSection.vue` describing async-spawn, phase progress, scanner kick, and .23 retirement from the operator's perspective. Historical "Server 3 (OVH)" entries in older changelog blocks left intact — they describe what shipped at the time.
-
-**Deployed to .228**:
- Backend binary md5 `d2b619949f19815faaeab10429e36ba0` at `/usr/local/bin/archipelago`.
- Frontend at `/opt/archipelago/web-ui/` (includes marketplaceData.ts .168 update + v1.7.43-alpha changelog entry). Deployed bundle verified: `.168` present in `Settings-*.js` + `Marketplace-*.js`, `.23` absent from all assets.
- `/var/lib/archipelago/update-mirrors.json` + `config/registries.json` were manually deleted + regenerated with new defaults during Round 5 verification; migration code will handle any other node on first load.
- Rollback targets from Round 2 still valid: `/usr/local/bin/archipelago.bak-pre-async-install` + `/opt/archipelago/web-ui.bak-pre-async-install/`.
-
-**Git remotes cleaned on .116** (working-copy change only, not in any commit):
- `git remote remove gitea-vps` (dropped the .23 Gitea remote).
- `git remote set-url --delete --push origin http://.../23.182.128.160:3000/...` (dropped .23 from origin multi-push alias).
- Remaining push targets: `tx1138` (canonical), `gitea-local` (localhost Gitea), `gitea-vps2` (.168 OVH).
-
-**Rollback Rounds 3–5** (same command as Round 2 — backups predate all of this):
-```
-ssh archy228 'sudo cp -a /usr/local/bin/archipelago.bak-pre-async-install /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-install/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx'
-```
-
---
-
-## ✅ ASYNC-SPAWN LIFECYCLE FIX — SHIPPED (Stop/Start/Restart + Install/Uninstall/Update)
-
-**Round 2 (2026-04-23, install/uninstall/update)** — 3 commits on `main`:
-
- `2d5b859e` `feat(rpc): async-spawn install/uninstall/update lifecycle` — new `api/rpc/package/async_lifecycle.rs` with `spawn_package_install`, `spawn_package_uninstall`, `spawn_package_update`. Dispatcher + handler thread `self: Arc<Self>` so spawned tasks own their Arc. Install/update Ok arms explicitly set `Running` because `merge_preserving_transitional` refuses to let the scanner overwrite `Installing`/`Updating`. Removed redundant inner "already updating" guard in `update.rs`. Transient install entry uses empty icon (see commit 3 rationale).
- `0733ac40` `fix(ui): shorten install/uninstall/update timeouts for async RPCs` — drop 11m/45m timeouts to 15s across `rpc-client.ts`, `stores/server.ts`, and the 5 direct call sites in `Marketplace.vue`, `Discover.vue`, `MarketplaceAppDetails.vue`. Return types updated to `{ status, package_id }`.
- `e471ef75` `fix(rpc): empty icon in transient install entry to avoid broken-image flicker` — `progress.rs::create_installing_entry` no longer hardcodes `/assets/img/app-icons/<id>.png`. About half of bundled apps use `.svg`/`.webp` icons; the frontend's fallback chain (`backend_icon || curated.icon || placeholder`) now lands on the correct curated extension.
-
-**Deployed to .228** (binary md5 `f66857b3b8b3640c8cac8bd25fe508ec` at `/usr/local/bin/archipelago`, backup at `/usr/local/bin/archipelago.bak-pre-async-install`; frontend at `/opt/archipelago/web-ui/`, backup at `/opt/archipelago/web-ui.bak-pre-async-install/`). User confirmed: uninstall fast and responsive, install of LND + SearXNG clean, icon flicker fixed.
-
-**Known out-of-scope issue**: Vaultwarden container itself exits immediately on start with an internal error. The async wrapper correctly detects this via post-start exit verification and removes the state entry. Needs separate vaultwarden container-config investigation.
-
-**Rollback Round 2 (if ever needed)**:
-```
-ssh archy228 'sudo cp -a /usr/local/bin/archipelago.bak-pre-async-install /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-install/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx'
-```
-
---
-
-**Round 1 (Stop/Start/Restart)** — 4 commits on `main` (unpushed per user mirror protocol):
-
- `44cd5eef` `feat(rpc): spawn_transitional helper for async lifecycle ops` — new `api/rpc/transitional.rs` with `Op::{Stop,Start,Restart}` and `RpcHandler::spawn_transitional` / `flip_to_transitional` / `set_state` helpers. `install_log` re-exported so sibling modules can use it.
- `19a99ca9` `fix(rpc): async container stop/start/restart; widen state mapping` — `container.rs` start/stop rewritten + restart added; `container-list` now emits all transitional variants instead of falling back to `"unknown"`. `dispatcher.rs` registers `container-restart`. `package/runtime.rs` mirrored with `do_package_*` helpers inside `tokio::spawn` and revert-on-error.
- `6712810b` `fix(state): preserve transitional state across container scans` — `server.rs` scan merge now keeps transitional states while taking fresh observability fields; 1200s stuck-timeout escape hatch via `transitional_since: HashMap<String, Instant>`. Three passing `server::merge_tests`.
- `9ce28f08` `fix(ui): single-button lifecycle control with transitional labels` — `ContainerApps.vue` and `ContainerAppDetails.vue` use a single primary button driven by `getAppVisualState()`. **Dashboard now routes through `container-start`/`container-stop`** (the async RPCs) instead of the legacy synchronous `bundled-app-*` path. `ContainerStatus.vue` widened to render all new variants.
-
-**Deployed to .228** (ThinkPad demo device):
- Binary at `/usr/local/bin/archipelago` (md5 `de86b63f74c7e6fe6e555ffe30b86b4f`), backup at `/usr/local/bin/archipelago.bak-pre-async-stop`.
- Frontend at `/opt/archipelago/web-ui/`, backup at `/opt/archipelago/web-ui.bak-pre-async-stop/`.
- Release build took 3m56s on .116. Deploy via scp + atomic `install -m 755` + `systemctl restart archipelago`. `nginx -t` + `systemctl reload nginx` for frontend.
-
-**Manual verification**: user clicked Stop on LND in the dashboard. Button flipped to `Stopping…` instantly, held for the full graceful-stop window, transitioned to `Start` when `podman stop` completed. No mid-flight revert to Running. User sign-off: _"absolutely beautiful"_.
-
-**Rollback (if ever needed)**:
-```
-ssh archy228 'sudo cp /usr/local/bin/archipelago.bak-pre-async-stop /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-stop/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx'
-```
-
-### Follow-ups to consider
-
-1. **Chaos matrix / Step 11** — the original next-step gated behind this fix. Now unblocked.
-2. **bundled-app-start / bundled-app-stop** — still synchronous in the backend. Dashboard no longer calls them, but the RPC methods remain for any external caller. Decide: deprecate, or mirror the async-spawn treatment for parity.
-3. **`transitional_since` persistence** — currently in-memory only, so a backend restart mid-stop loses the timeout anchor. Acceptable for now (scan loop re-observes live podman state and reconciles), but worth revisiting if crash-recovery stories tighten.
-4. **Test regressions inventory** — the full `cargo test -p archipelago` run on .116 shows 22 pre-existing failures in unrelated modules (mesh/wallet/credentials/avatar/session/transport/update-mirrors/fips/identity_manager/image_versions). Unrelated to this work but tech debt. Log at `/tmp/cargo-test-all.log` on .116.
-5. **Amend STATUS.md's older "NEXT SESSION — START HERE" section** (below) — it is now stale. Left in place for historical reference of how the fix was designed; delete on the next pass if it gets confusing.
-
---
-
-## ⚡ NEXT SESSION — START HERE (historical — fix above is now shipped)
-
-**Goal**: implement async-spawn lifecycle fix so the dashboard never shows a frozen spinner again. User mandate: _"best server containers in the world"_. Do not ship the chaos matrix (Step 11) until this lands and manual LND stop verifies instant RPC + live `Stopping…` label.
-
-### How to work on this repo (SSH + SSHFS setup)
-
-You are likely running on the **laptop** (macOS). The repo lives on the **ThinkPad** (.116). There are two access paths, use both in parallel:
-
-1. **SSHFS mount at `~/mnt/archy-thinkpad/`** — for all file ops (`read`/`edit`/`write`/`glob`/`grep`).
-2. **Direct SSH** — for everything that isn't file ops: `git`, `cargo`, `npm`, `systemctl`, running the server, tailing logs.
-
-See the "FUSE / SSHFS development loop" section below for the full mount lifecycle — that's _the_ thing that makes this dev setup work, and it will break periodically.
-
-### FUSE / SSHFS development loop
-
-**Why this exists**: editing the repo directly on the ThinkPad over raw SSH means no IDE, no tool-native file reads, no glob/grep speed. SSHFS mounts the remote filesystem as a local directory so OpenCode's file tools work transparently. But SSHFS is a leaky abstraction — know the gotchas or you'll waste hours.
-
-**Stack** (macOS laptop):
- **macFUSE** — kernel extension providing FUSE on macOS. Install via `brew install --cask macfuse` (requires reboot + security approval in System Settings the first time).
- **sshfs** — userspace mount tool. Install via `brew install gromgit/fuse/sshfs-mac` (the homebrew core `sshfs` was removed; use this tap).
- Verify: `which sshfs` → `/opt/homebrew/bin/sshfs`, `sshfs --version` → `SSHFS version 2.10 / FUSE library version 2.9.9`.
-
-**Actual mount command currently running** (verified from `ps`):
-```
-sshfs archy:Projects/archy /Users/dorian/mnt/archy-thinkpad \
-  -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,volname=archy-thinkpad
-```
-
-Breakdown:
- `archy:Projects/archy` — remote path via the `archy` SSH alias (uses `~/.ssh/archy_opencode`, no password prompt).
- `~/mnt/archy-thinkpad` — local mount point. Create once: `mkdir -p ~/mnt/archy-thinkpad`.
- `reconnect` — sshfs auto-reconnects if the TCP session drops (WiFi flap, laptop sleep). Without this, the mount turns into a zombie immediately.
- `ServerAliveInterval=15` — sends a keepalive every 15s.
- `ServerAliveCountMax=3` — disconnect after 3 missed keepalives (45s). Tune up if your network is flaky.
- `volname=archy-thinkpad` — Finder display name.
-
-**Check mount health**:
-```
-mount | grep archy-thinkpad
-# should print: archy:Projects/archy on /Users/dorian/mnt/archy-thinkpad (macfuse, nodev, nosuid, synchronous, mounted by dorian)
-
-ls ~/mnt/archy-thinkpad/ | head
-# should list repo contents fast (<1s). If it hangs, mount is stale.
-```
-
-**Recovery when the mount hangs / goes stale** (this WILL happen — laptop sleeps, WiFi drops, ThinkPad reboots):
-```
-# 1. Force-unmount (macOS — `umount` alone often fails on a hung FUSE mount)
-sudo diskutil unmount force ~/mnt/archy-thinkpad
-# fallback if diskutil can't see it:
-sudo umount -f ~/mnt/archy-thinkpad
-
-# 2. Kill any zombie sshfs process
-pkill -f "sshfs archy:Projects/archy"
-
-# 3. Remount
-sshfs archy:Projects/archy ~/mnt/archy-thinkpad \
-  -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,volname=archy-thinkpad
-
-# 4. Verify
-ls ~/mnt/archy-thinkpad/ | head
-```
-
-If the mount point itself got wedged (`ls: /Users/dorian/mnt/archy-thinkpad: Device not configured`), the sequence above still works — macFUSE garbage-collects the inode after the force-unmount.
-
-**When to use which path** (rules, not suggestions):
-| Operation | Use | Why |
-|---|---|---|
-| `read` / `edit` / `write` | SSHFS mount | OpenCode tools want local paths |
-| `glob` / `grep` | SSHFS mount | Local FS traversal is fine; remote would need rg over SSH |
-| Reading many files | SSHFS mount | Each read is a round-trip but parallelizable |
-| `git status` / `git diff` / `git log` | SSH | Git over FUSE is painfully slow (lots of stat calls) |
-| `git add` / `git commit` | SSH | Same — commit times grow linearly with tree size on FUSE |
-| `cargo check` / `cargo test` / `cargo build` | SSH | Compiling over FUSE would take hours; cargo's incremental stat pattern destroys FUSE performance |
-| `npm install` / `npm run build` | SSH | Same reason — massive file churn |
-| Running the server / tailing journal | SSH | Service lives on .116 |
-| Deploying to .228 | SSH from .116 | SCP from ThinkPad; laptop isn't in the critical path |
-
-**Don't do this** (will bite you):
- `cargo build` from the mount — will try to write target/ over FUSE, gets orders of magnitude slower, may hang.
- `rsync` without `--exclude="._*"` — macOS writes AppleDouble metadata files, they leak to the remote as `._*` siblings of every real file. `.gitignore` already excludes them (commit `13858842`), but they clutter the tree.
- Writing big binary files via the mount — use `scp` over SSH instead.
- Relying on file-change-watcher tools (watchman, chokidar) — they get confused by FUSE event semantics.
-
-**Editing workflow in a typical session**:
-1. Laptop: OpenCode `read`s a file via `/Users/dorian/mnt/archy-thinkpad/...`. FUSE fetches it over SSH, caches briefly.
-2. Laptop: OpenCode `edit`s the file — FUSE writes the new bytes back to .116 immediately (synchronous mount).
-3. Laptop: `ssh archy "cd ~/Projects/archy && ~/.cargo/bin/cargo check -p archipelago"` — runs on the real filesystem on .116, sees the edit.
-4. Laptop: `ssh archy "cd ~/Projects/archy && git diff path/to/file"` — confirms the edit landed.
-5. Laptop: `ssh archy "cd ~/Projects/archy && git add path/to/file && git commit -m '...'"` — commit from .116.
-
-The SSHFS mount and the SSH shell are pointing at **the same inodes** — edits via the mount are instantly visible to `cargo`/`git` over SSH. There's no "sync" step.
-
-**Cache caveat**: macFUSE caches attributes briefly (default ~1s). If you write via SSH and read via the mount within that window, you may see stale metadata. The mount's `synchronous` flag (visible in `mount` output) minimizes but doesn't eliminate this. If you get a weird diff between what SSH and the mount report, re-read after a second, or `stat --file-system ~/mnt/archy-thinkpad/<file>` to force a refresh.
-
-**Direct SSH** access (use when FUSE isn't the right tool):
-   - `ssh archy` → `archipelago@192.168.1.116` using `~/.ssh/archy_opencode`
-   - `ssh archy228` → `archipelago@192.168.1.228` using `~/.ssh/archy_opencode`
-   - Full host form also works: `ssh archipelago@192.168.1.116` / `ssh archipelago@192.168.1.228` (same key resolves via IdentitiesOnly).
-
-### SSH keys — what's where
-
-**Laptop `~/.ssh/` (macOS, user `dorian`)**:
-| File | Purpose |
-|---|---|
-| `archy_opencode` / `.pub` | **Primary key for this project.** Unlocks both `archy` (.116) and `archy228` (.228). Created 2026-04-22 specifically for OpenCode work. |
-| `archipelago-deploy` / `.pub` | Older archipelago deploy key. Not needed for current work. |
-| `id_ed25519` / `.pub` | Personal default key. Not used by archy/archy228 configs (`IdentitiesOnly yes` forces `archy_opencode`). |
-| `id_ed25519_angor` / `.pub` | Angor project. Unrelated. |
-| `id_ed25519_start9` / `.pub` | Start9 project. Unrelated. |
-| `vps-ci-setup` / `.pub` | VPS CI. Unrelated. |
-| `config` | Host aliases (shown above) |
-
-**.116 `/home/archipelago/.ssh/`**:
-| File | Purpose |
-|---|---|
-| `authorized_keys` | Accepts: laptop's `archy_opencode.pub` + 3 other keys (4 lines total). |
-| `id_ed25519` / `.pub` | .116's OWN identity key. This is what lets `.116 → .228` work passwordless. |
-| `archipelago-deploy` | Symlink → `id_ed25519` (legacy alias). |
-| `id_ed25519_vps168` / `.pub` | For SSH to `146.59.87.168` (VPS). Unrelated to this work. |
-| `config` | Host entry for the VPS only. |
-
-**.228 `/home/archipelago/.ssh/`**:
-| File | Purpose |
-|---|---|
-| `authorized_keys` | Accepts: laptop's `archy_opencode.pub` + .116's `id_ed25519.pub` + 2 others (4 lines total). |
-| _(no `id_ed25519`)_ | .228 has no outbound key — it's a terminal node. Don't try to `ssh` _from_ .228 _to_ anywhere. |
-
-**Connectivity matrix (all verified 2026-04-23)**:
-| From → To | Works passwordless | Via |
-|---|---|---|
-| Laptop → .116 | ✅ | `archy_opencode` |
-| Laptop → .228 | ✅ | `archy_opencode` |
-| .116 → .228 | ✅ | .116's `id_ed25519` |
-| .228 → anywhere | ❌ | no outbound key (by design) |
-
-### Sudo — verified state
-
-**.116** (dev ThinkPad):
- User `archipelago` is in `sudo` group.
- Sudo password required: **`ThisIsWeb54321@`**
- Sudoers drop-ins present: `/etc/sudoers.d/archipelago-ci`, `/etc/sudoers.d/archipelago-wg` (scope-limited NOPASSWD for specific CI/wg commands — not full NOPASSWD).
- For most dev work you don't need sudo on .116.
-
-**.228** (prod kiosk):
- User `archipelago` has **full passwordless sudo** via `/etc/sudoers.d/archipelago` containing `archipelago ALL=(ALL) NOPASSWD:ALL`.
- User is also in `sudo` group.
- Sudo password (if ever prompted, shouldn't be): **`archipelago`**
- Dashboard password: **`password123`**
-
-### Cargo / npm / paths
-
- **Cargo PATH gotcha**: non-interactive SSH login has no cargo in PATH. Always use `~/.cargo/bin/cargo` over SSH.
-  - Example: `ssh archy '~/.cargo/bin/cargo check -p archipelago' --workdir ~/Projects/archy/core`
-  - Or cd first: `ssh archy 'cd ~/Projects/archy && ~/.cargo/bin/cargo check -p archipelago'`
- **Long cargo builds** (>2 min Bash tool timeout): launch detached and poll the log:
-  ```
-  ssh archy 'cd ~/Projects/archy && nohup ~/.cargo/bin/cargo build --release -p archipelago > /tmp/cargo-build.log 2>&1 < /dev/null & disown'
-  ssh archy 'tail -30 /tmp/cargo-build.log'
-  ssh archy 'pgrep -a cargo'   # to check if still running
-  ```
- **npm / frontend** lives at `~/Projects/archy/neode-ui/` on .116 (also accessible via laptop mount at `~/mnt/archy-thinkpad/neode-ui/`). Node is on interactive PATH; for scripted SSH, `source ~/.nvm/nvm.sh && nvm use` or call the absolute path if nvm is used.
- Repo on .116: `~/Projects/archy/` (Cargo workspace at `core/Cargo.toml`).
- Web root on .228: check `/etc/nginx/sites-enabled/` for the live path; historically `/var/lib/archipelago/web-ui/` or `/opt/archipelago/web-ui/`.
-
-### Deploying new server binary to .228
-
-```
-# 1. Build on .116 (detached — takes ~3-5 min for release)
-ssh archy 'cd ~/Projects/archy && nohup ~/.cargo/bin/cargo build --release -p archipelago > /tmp/cargo-build.log 2>&1 < /dev/null & disown'
-# wait / tail log until "Finished `release` profile"
-
-# 2. SCP .116 → .228 (uses .116's id_ed25519 → .228's authorized_keys, passwordless)
-ssh archy 'scp ~/Projects/archy/core/target/release/archipelago archipelago@192.168.1.228:/tmp/archipelago.new'
-
-# 3. Atomic swap on .228 with backup
-ssh archy228 'sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.bak-pre-async-stop && sudo mv /tmp/archipelago.new /usr/local/bin/archipelago && sudo chmod +x /usr/local/bin/archipelago && sudo systemctl restart archipelago'
-
-# 4. Verify
-ssh archy228 'systemctl status archipelago --no-pager | head -20 && sudo journalctl -u archipelago -n 50 --no-pager'
-```
-
-### Git workflow
-
- Branch: `main` on .116, currently **22 commits ahead of `tx1138/main`**.
- Remote `tx1138` exists but **do NOT push** — user mirrors to 4 Gitea remotes personally after reviewing.
- Atomic commits, one logical change per commit. Conventional Commits format (`feat:`, `fix:`, `docs:`, `refactor:`, `chore:`, `test:`, `perf:`).
- Never `--amend` unless the commit you're amending was created in this session AND has not been pushed. Safer: new commit.
- Never `--force` push. Never modify git config.
- If pre-commit hooks fail, create a NEW commit with the fix — don't `--amend` after a failed commit.
-
-### Other
-
- Full destructive latitude on both nodes. Announce multi-hour ops (OTA, full rebuild, apt upgrade). Don't ask for routine stop/start/rebuild permission.
- No ship pressure. Do it properly.
- Use `question` tool for ambiguous decisions (don't guess user intent on design choices).
- Keep `docs/STATUS.md` fresh between sessions — it IS the session handoff.
-
-### Hosts reference (quick)
-
-| Host | IP | SSH alias | Role | Dashboard | Sudo |
-|---|---|---|---|---|---|
-| `archy` (ThinkPad X250) | 192.168.1.116 | `ssh archy` | dev host, Debian 13 | `archipelago` | `ThisIsWeb54321@` |
-| `archy228` (HP ProDesk) | 192.168.1.228 | `ssh archy228` | prod kiosk, Rust orchestrator | `password123` | NOPASSWD (fallback `archipelago`) |
-
-### Bug being fixed
-
-Dashboard sequence when user clicks **Stop LND**:
-1. UI collapses Start/Stop buttons to single spinner-button ("Stopping…") via `loadingApps.add('lnd')`.
-2. Frontend calls `container-stop` RPC. Server runs `podman stop -t 330 lnd` **synchronously inside the RPC handler** (via `orchestrator.stop()`). RPC blocks up to **5.5 min** for LND (330s timeout + overhead).
-3. Meanwhile the 30-second package-scan loop in `server.rs:scan_and_update_packages` keeps running. It rebuilds `PackageDataEntry` from podman inspect — podman still reports `running` (stop hasn't completed) — and **blindly overwrites** the store entry at `server.rs:854`.
-4. `container-list` RPC reads `state_manager` snapshot → returns `state = "running"`.
-5. Frontend polling sees `running` → `getAppState()` returns `'running'` → the two-button (Start | Stop) block re-renders → the transitional button disappears → **UI looks like the stop silently failed**.
-6. Eventually `podman stop` finishes → next scan → state flips to `Stopped` → buttons change _again_.
-
-Net visible bug: button spins briefly, reverts to Running, then several minutes later suddenly shows Stopped. User rightly calls this "out of sync and confusing".
-
-### Decisions already locked in (do not re-ask)
-
- **Full scope fix** (not minimal hotfix). User chose "Go full scope, do it right".
- **Async-spawn lives in the RPC layer**, not in the `ContainerOrchestrator` trait. Trait stays synchronous so the reconciler, boot flow, unit tests, and the chaos harness retain deterministic behaviour.
- **`PackageState` already has `Stopping`/`Starting`/`Restarting`/`Installing`/`Updating`/`Removing`** variants — enum at `core/archipelago/src/data_model.rs:107-124`. No schema change needed.
- **UI collapses to one full-width button** with spinner during every transitional state. Labels: Start / Stop / Starting… / Stopping… / Restarting… / Installing… / Updating… / Removing… / Install (when `not-installed`).
- **Helper API shape**: `RpcHandler::spawn_transitional(op: Op, app_id: String)` where `Op` is an enum `{Stop, Start, Restart}`. Helper dispatches to `orchestrator.stop/start/restart` internally, knows each op's transitional+final states, handles error → revert + `install_log()`.
- **`mark_user_stopped` must run BEFORE the spawn** (preserves ordering the crash recovery layer depends on — see `runtime.rs:145-148`).
-
-### Implementation order (4 commits, local only)
-
-**Commit 1 — `feat(rpc): spawn_transitional helper for async lifecycle ops`**
- New file: `core/archipelago/src/api/rpc/transitional.rs` (or extend `container.rs`; prefer new file for cohesion with future stacks/package variants)
- `enum Op { Stop, Start, Restart }` with `transitional_state()`, `final_state_on_success()`, `log_prefix()`, and async `dispatch(&orch, &app_id)` method
- `impl RpcHandler { pub(super) async fn spawn_transitional(&self, op: Op, app_id: String) -> Result<()> }`
-  - Capture `Arc<dyn ContainerOrchestrator>` + `Arc<StateManager>` clones
-  - Set transitional state via `state_manager.update_data()` (if entry exists; skip if not — Start on never-installed shouldn't create an entry)
-  - `tokio::spawn(async move { ... })`
-  - Inside spawn: `install_log("{LOG_PREFIX}: {app_id}")`, `op.dispatch(&orch, &app_id).await`, on success set final state, on error log + `install_log("{LOG_PREFIX} FAIL: …")` + revert state to previous (cache pre-transition state in a local)
-  - Return `Ok(())` immediately after spawn
-
-**Commit 2 — `fix(rpc): async container stop/start/restart; widen state mapping`**
- `api/rpc/container.rs:85-107` — rewrite `handle_container_stop` body: `validate_app_id`, `mark_user_stopped`, `spawn_transitional(Op::Stop, app_id.to_string()).await?`, return `Ok(json!({ "status": "stopping" }))`
- `api/rpc/container.rs:61-83` — rewrite `handle_container_start`: `clear_user_stopped`, `spawn_transitional(Op::Start, …)`, return `{ "status": "starting" }`
- **Add** `handle_container_restart` (currently missing in `container.rs` — only exists as `package.restart` at `runtime.rs:176-242`). Register RPC route name `container-restart`. Add matching frontend client method in `container-client.ts`.
- `api/rpc/container.rs:148-154` — widen the `container-list` state mapping: add arms for `Stopping → "stopping"`, `Starting → "starting"`, `Restarting → "restarting"`, `Installing → "installing"`, `Updating → "updating"`, `Removing → "removing"`, `Installed → "installed"`, `CreatingBackup`/`RestoringBackup`/`BackingUp` → their kebab-case strings. No more `"unknown"` fallback unless the variant is genuinely unknown.
- Mirror same spawn treatment in `api/rpc/package/runtime.rs`: `handle_package_start` (L28-119), `handle_package_stop` (L122-173), `handle_package_restart` (L176-242). Keep the existing verification loops (post-start exit-check at L82-117; restart stop+start fallback at L215-235) _inside_ the spawned future, not in the RPC body.
-
-**Commit 3 — `fix(state): preserve transitional state across container scans`**
- `server.rs:847-857` — in the merge loop, before the `merged.insert(id.clone(), pkg.clone())` overwrite, check `merged.get(id).state` and skip overwrite if it's transitional: `matches!(existing.state, Installing | Stopping | Starting | Restarting | Updating | Removing | CreatingBackup | RestoringBackup | BackingUp)`
- Still allow _non-state_ fields (lan_address, health, ports) to update. Simplest: when existing is transitional, keep `existing.state` but merge updated fields from `pkg`. Write a tiny helper `merge_preserving_transitional(existing, fresh) -> PackageDataEntry`.
- Unit test: construct `existing.state = Stopping`, `fresh.state = Running`, assert merged.state stays `Stopping`.
- **Also check**: Is there a timeout escape hatch? If `Stopping` is set and podman actually finishes but the spawn died before writing the final state (process crash, panic), the entry will be stuck `Stopping` forever. Mitigation: track a `transitional_since: Instant` in the entry (not persisted, just in-memory side table on StateManager), and if > 2× the stop timeout has elapsed, allow podman scan state to override. Scope for this commit or follow-up — lean toward: include it, because fleet reliability matters.
-
-**Commit 4 — `fix(ui): single-button lifecycle control with transitional labels`**
- `neode-ui/src/api/container-client.ts` — extend `ContainerStatus.state` union to: `'created' | 'running' | 'stopped' | 'exited' | 'paused' | 'unknown' | 'stopping' | 'starting' | 'restarting' | 'installing' | 'updating' | 'removing' | 'installed'`. Add `restartContainer(appId)` method calling `container-restart`.
- `neode-ui/src/stores/container.ts` — add computed `getAppVisualState(appId)` that returns one of: `'not-installed' | 'running' | 'stopped' | 'starting' | 'stopping' | 'restarting' | 'installing' | 'updating' | 'removing'`. Maps `exited`→`stopped`, `created`→`stopped`, `paused`→`stopped`, `installed`→`stopped`. Add `restartContainer(appId)` action (sets `loadingApps` for request dedup, calls client, does NOT `fetchContainers` immediately because server will broadcast state; a final `fetchContainers` after a short delay can backstop if WebSocket push is absent).
- `neode-ui/src/views/ContainerApps.vue:85-136` — replace the two-button conditional with a single full-width button bound to `getAppVisualState(app.id)`. Table:
-  | visual state    | click action   | label          | spinner | disabled |
-  |-----------------|----------------|----------------|---------|----------|
-  | `not-installed` | installApp     | Install        | no      | no       |
-  | `running`       | stopContainer  | Stop           | no      | no       |
-  | `stopped`       | startContainer | Start          | no      | no       |
-  | `starting`      | —              | Starting…      | yes     | yes      |
-  | `stopping`      | —              | Stopping…      | yes     | yes      |
-  | `restarting`    | —              | Restarting…    | yes     | yes      |
-  | `installing`    | —              | Installing…    | yes     | yes      |
-  | `updating`      | —              | Updating…      | yes     | yes      |
-  | `removing`      | —              | Removing…      | yes     | yes      |
-  - Add a separate Restart button next to the primary one when state is `running`, calling new `restartContainer` action. Restart button hides while transitional.
- `neode-ui/src/views/ContainerAppDetails.vue:83` (and full stop/start button blocks around L220, L232) — mirror the same single-button pattern.
- Also audit line 239 of `ContainerApps.vue` (`some((app) => store.getAppState(app.id) === 'created')`) and the logic around lines 276, 295, 309, 312 — make sure they use `getAppVisualState` where appropriate.
-
-### Verification gates (do not skip)
-
-1. `~/.cargo/bin/cargo check -p archipelago` on .116 via SSH
-2. `~/.cargo/bin/cargo test -p archipelago` on .116 via SSH — at least the new merge helper test must pass
-3. Build release binary on .116: `nohup ~/.cargo/bin/cargo build --release -p archipelago > /tmp/cargo-build.log 2>&1 < /dev/null & disown`. Poll until done.
-4. SCP binary to .228 `/usr/local/bin/archipelago`, back up prior to `/usr/local/bin/archipelago.bak-pre-async-stop`. `sudo systemctl restart archipelago` on .228.
-5. **Manual LND stop test on .228**:
-   - Open dashboard, confirm LND is Running (first: `ssh archipelago@192.168.1.228 'podman start lnd'` — LND is currently Exited(0) from the demo)
-   - Click Stop
-   - Expected: button _immediately_ becomes "Stopping…" with spinner (RPC returns <1s)
-   - Dashboard should stay on "Stopping…" for ~5 min
-   - Then flip to "Start" button with label "Start"
-   - At no point should it revert to "Running" mid-stop
-6. Same test with Bitcoin Core stop (longest timeout, 600s)
-7. Frontend build: `cd ~/Projects/archy/neode-ui && npm run type-check && npm run build`. Rsync `dist/` to `archipelago@192.168.1.228:/var/lib/archipelago/web-ui/` (or wherever the active web root is — check `/etc/nginx` on .228 first).
-8. Then and only then: resume chaos matrix. First recover LND/ElectrumX via UI (great end-to-end test of the new async Start path), then run smoke → full 32-case matrix.
-
-### Key files (exact lines of interest)
-
- `core/archipelago/src/api/rpc/container.rs:85-107` — `handle_container_stop` (blocking — target of fix)
- `core/archipelago/src/api/rpc/container.rs:61-83` — `handle_container_start`
- `core/archipelago/src/api/rpc/container.rs:148-154` — narrow state mapping (drops transitional → "unknown")
- `core/archipelago/src/api/rpc/package/runtime.rs:11-24` — `stop_timeout_secs` table (reference, unchanged)
- `core/archipelago/src/api/rpc/package/runtime.rs:122-173` — `handle_package_stop` (also blocking, mirror treatment)
- `core/archipelago/src/api/rpc/package/runtime.rs:28-119` — `handle_package_start`
- `core/archipelago/src/api/rpc/package/runtime.rs:176-242` — `handle_package_restart`
- `core/archipelago/src/api/rpc/package/progress.rs` — existing broadcast pattern to mirror (`set_install_progress`, `set_uninstall_stage`)
- `core/archipelago/src/api/rpc/mod.rs:62-100` — `RpcHandler` struct (already holds `Arc<dyn ContainerOrchestrator>` + state_manager)
- `core/archipelago/src/server.rs:812-857` — `scan_and_update_packages` (merge loop at L850-857 is where transitional-state clobber happens)
- `core/archipelago/src/container/docker_packages.rs:636-663` — `convert_state` + `package_state_str` (read-only reference, no change)
- `core/archipelago/src/container/traits.rs` — `ContainerOrchestrator` trait (stays synchronous, do not change)
- `core/archipelago/src/crash_recovery.rs` — `mark_user_stopped` / `clear_user_stopped` (call order preserved)
- `core/archipelago/src/data_model.rs:107-124` — `PackageState` enum (no change — all variants exist)
- `neode-ui/src/api/container-client.ts` — `ContainerStatus` type + RPC methods (extend)
- `neode-ui/src/stores/container.ts:93-312` — Pinia store (add `getAppVisualState`, add `restartContainer` action)
- `neode-ui/src/views/ContainerApps.vue:85-136, 239, 276, 295, 309-312, 383` — two-button block + state reads
- `neode-ui/src/views/ContainerAppDetails.vue:83, 220, 232` — details page Stop/Start
-
-### Chaos harness (not in repo — lives on .116)
-
- `archipelago@192.168.1.116:~/ui-chaos/` — deployed, playwright + deps installed, smoke test for bitcoin-core passes (2.1 min). LND/ElectrumX/bitcoin-ui smoke tests not yet run (blocked on the async-stop fix landing; LND currently Exited on .228 from the demo).
- `/tmp/chaos/` on laptop — canonical source for rsync to .116.
- Run: `cd ~/ui-chaos && npx playwright test tests/<spec>`
- Target: 32 cases = 4 core containers × 8 scenarios (install-fresh, graceful-stop, sigkill, rm-container, oom-kill, rm-image, restart-service, network-partition).
- Uses SSH+Playwright hybrid per design; includes the `bash -lc '<escaped>'` single-quote fix for ssh argv flattening and JSON-parsed `podman inspect` instead of Go templates.
-
-### Pre-existing bugs still deferred (do not fix until Stop UX lands)
-
-1. `archipelago --version` spawns server (should be a pure CLI query)
-2. RPC unknown-method returns generic error (should return method-not-found with the bad method name)
-3. `docker_packages.rs` filters out UI containers (`archy-lnd-ui`, `archy-electrs-ui`) — some views need them visible
-4. `lnd.lan_address` stale on .228
-5. first-boot silent failure on some hardware
-6. `web-ui.failed.*` scar on .228 (benign systemd unit state)
-7. `test_parse_image_versions` pre-existing broken assertion — fix or `#[ignore]` when touching that area
-
---
-
-## Where we are
-
-Working through the 11-step plan in [`rust-orchestrator-migration.md`](./rust-orchestrator-migration.md).
-
- [x] **Step 1** — `3767c267` ContainerConfig schema with `build:`, `ResolvedSource` enum, `resolve()`, 10 tests
- [x] **Step 2** — `34af4d9d` ContainerRuntime trait gained `image_exists` + `build_image`, 4 argv tests, 25/25 pass
- [x] **Step 3** — `b6a04d31` ProdContainerOrchestrator (999 LOC), 16 tests all pass, not yet wired to main.rs
- [x] **Step 4** — `e8a59c93` ContainerOrchestrator trait, RpcHandler uses it in prod (+ `13858842` chore gitignore ._*)
- [x] **Step 5** — `fc39b04b` BootReconciler with Arc<Notify> shutdown, 4 paused-time tests pass
- [x] **Step 6** — `48f08aa3` main.rs wire-up (orchestrator construction + adopt_existing + BootReconciler spawn + shutdown Notify)
- [x] **Step 7** — `069bc4a5` bitcoin-ui pre-start hook + embedded nginx.conf template (8 unit tests + 1 integration test), 39/39 container:: tests pass
- [x] **Step 8a** — `a0707f4d` retire archipelago-reconcile.{service,timer} + ISO builder touchpoints, keep scripts for update.rs
- [x] **Step 9** — **Hot-swap on .228 verified.** All three UIs (bitcoin-ui/lnd-ui/electrs-ui) installing + serving HTTP 200.
- [x] **.228 dashboard bugs** — ExtraHost `192.168.1.254` bug (`3ee192ba`) + LND macaroon permission bug (`be960023`). See "Post-Step 9 bug hunt" below.
- [ ] **Step 8b** — Port remaining ~25 container creations from `first-boot-containers.sh` into `apps/<id>/manifest.yml`, then port `update.rs` to orchestrator (deferred, multi-day work)
- [ ] **Step 8c** — Rename `first-boot-containers.sh` → `first-boot-setup.sh`, strip container ops, keep setup. Delete `reconcile-containers.sh` + `container-specs.sh`. Add ISO lines to copy `apps/` (final one-way door, requires 8b complete)
- [ ] **Step 10** — Hot-swap + verify on .116 (adoption-heavy test — .116 already has all containers running)
- [ ] **Step 11** — Chaos matrix on both nodes (all 8 scenarios × all containers incl. bitcoin-core)
-
-## Post-Step 9 bug hunt (.228, 2026-04-23)
-
-User reported three visible dashboard bugs after Step 9 verification:
-1. LND — "no connect details or QR"
-2. ElectrumX — stuck at "Building index (2 KB / ~130 GB)" for days
-3. bitcoin-core — in scope for chaos testing
-
-**Root cause #1 (ExtraHost, commit `3ee192ba`)**: `scripts/first-boot-containers.sh` computed `HOST_GATEWAY` from `ip route show default`, which returns the **LAN router** (e.g. 192.168.1.254), not the gateway to the host. Every container configured with `--add-host=host.containers.internal:$HOST_GATEWAY` was dialing the WiFi router instead of the host. LND crash-looped with `dial tcp 192.168.1.254:8332: connection refused`; ElectrumX's DAEMON_URL hit the same dead end; any `archy-net` bridge consumer of bitcoin-core's RPC was broken. Fixed by replacing the computed value with podman's magic `host-gateway` literal (supported since 4.4; we ship 5.4.2). Live-recreated bitcoin-core/electrumx/lnd on .228 with the corrected `--add-host`; LND reached chain backend; ElectrumX resumed indexing (went from 2 KB → 164.9 MB in under an hour).
-
-**Root cause #2 (macaroon permissions, commit `be960023`)**: LND's `admin.macaroon` lives at `/var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon`, owned by rootless-podman subordinate UID 100000, mode 640. The archipelago server runs as host UID 1000 and literally cannot read the file. Every LND RPC (`getinfo`, `connect-info`, `export-channel-backup`) plus the shared `lnd_client()` helper failed with "Failed to read LND admin macaroon". **Confirmed pre-existing on .116 too** (long-standing bug unrelated to Step 9). Fix: centralised the path as `LND_ADMIN_MACAROON_PATH`, added a `read_lnd_admin_macaroon()` helper in `api/rpc/lnd/mod.rs` that tries direct read first then falls back to `sudo -n cat` (mirrors the pattern already used for Tor onion hostnames). Four call sites routed through the helper. Verified on .228 — `curl -k https://<host>/lnd-connect-info` now returns 200 with cert + macaroon + tor_onion; dashboard QR unblocked.
-
-## Step 9 evidence (.228, 2026-04-23)
-
- Binary: Step 9 build with `732df1b8` + `ba83f9bc`, scp'd to .228 as `/usr/local/bin/archipelago`. Old binary backed up at `/usr/local/bin/archipelago.bak-pre-step9`. Later replaced with macaroon-fix build (`be960023`); previous backed up at `/usr/local/bin/archipelago.bak-pre-macaroon`.
- DEV_MODE override disabled (`override.conf` → `override.conf.disabled-pre-step9`).
- `/opt/archipelago/apps/{bitcoin-ui,electrs-ui,lnd-ui}/manifest.yml` populated.
- `/opt/archipelago/docker/bitcoin-ui/Dockerfile` replaced with the Step 7 version (no `COPY nginx.conf`). Old dir backed up as `bitcoin-ui.bak-pre-step9`.
- Post-start snapshot:
-  - `🔗 Adopted 1 existing container(s): ["electrs-ui"]` — adoption of 13h-running container worked without recreation
-  - `🔄 Boot reconciler started (interval: 30s)` — every 30s, all three app_ids reach `NoOp` after the initial install pass
-  - `bitcoin-ui nginx.conf rendered path=/var/lib/archipelago/bitcoin-ui/nginx.conf auth_hash=97af1c18` — pre-start hook fires in `install_fresh`
-  - `curl localhost:8334` → HTTP 200 (bitcoin-ui), `:8081` → 200 (lnd-ui), `:50002` → 200 (electrs-ui)
-  - OCI memory limits correctly applied: bitcoin-ui=128Mi, electrs-ui=128Mi, lnd-ui=64Mi (was emitted as 0 pre-fix)
-
-## Bugs fixed this session
-
-1. **`parse_memory_limit` truncation bug** (`732df1b8`): lowercased "128Mi" → "128mi" → `trim_end_matches('m')` → "128i" → f64 parse fails → `None.unwrap_or(0)` → OCI `memory.limit:0` → systemd rejects MemoryMax=0. 6 regression tests; `create_container` now omits instead of emitting 0.
-2. **`archipelago.service` cgroup delegation missing** (`ba83f9bc`): belt-and-braces `Delegate=memory pids cpu io`.
-3. **ExtraHost `192.168.1.254`** (`3ee192ba`): see Post-Step 9 bug hunt above.
-4. **LND admin.macaroon unreadable** (`be960023`): see Post-Step 9 bug hunt above.
-
-## Commits made this session
-
-```
-3ee192ba fix(first-boot): use podman host-gateway magic for host.containers.internal
-be960023 fix(lnd): read admin macaroon via sudo fallback
-4b8ef0a0 docs: STATUS.md through Step 9 (.228 hot-swap verified)
-ba83f9bc feat(systemd): delegate cgroup controllers to archipelago.service
-732df1b8 fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes
-a0707f4d refactor: retire archipelago-reconcile.{service,timer}  (Step 8a)
-1c81a739 docs: split Step 8 into 8a/8b/8c
-6e46932f docs: STATUS.md through Step 7
-069bc4a5 feat: bitcoin-ui pre-start hook (Step 7)
-```
-
-Branch is **19 commits ahead of tx1138/main** (local only — user pushes to mirrors personally).
-
-## Uncommitted state
-
-Clean. Only untracked: `tests/` (bats harness from prior session, not in scope), `tmp-dump-spec.py` (scratch).
-
-## Answered design questions (no need to re-ask)
-
-1. UI container naming → `archy-<app_id>` for UIs only; existing bitcoin-knots/lnd/electrumx keep bare names
-2. BITCOIN_RPC_AUTH injection → runtime bind-mount of nginx.conf (no build-args, no envsubst)
-3. Reconciler interval → 30 seconds
-4. Concurrency → per-app `Mutex<()>` in a `DashMap`
-5. Bash scripts → split into 8a/8b/8c; 8a done, 8b/8c deferred
-6. Step 4 extension → `ContainerOrchestrator` trait includes `install(app_id)`; the `manifest_path`-based install RPC stays dev-only
-7. Step 7 bitcoin-ui template → embed via `include_str!`, render on install + every reconcile, atomic tmp+rename to `/var/lib/archipelago/bitcoin-ui/nginx.conf`, bind-mount into container. RPC user hardcoded `archipelago`, password from `/var/lib/archipelago/secrets/bitcoin-rpc-password`.
-
-## Context: which host is what
-
-| Host | IP | Role | Dashboard pw | Sudo pw |
-|---|---|---|---|---|
-| `archy` | 192.168.1.116 | **Dev ThinkPad** (Lenovo X250, Debian 13). Currently running v1.7.42-alpha (DEV_MODE). Step 10 target. | archipelago | ThisIsWeb54321@ |
-| `archy228` | 192.168.1.228 | Kiosk HP ProDesk. **Step 9 landing zone** — now running Rust-orchestrator binary in prod mode. | password123 | archipelago |
-
-Both are development alpha nodes — **full destructive latitude**, no need to ask before stop/start/rebuild.
-
-## Next action
-
-**Step 10 — Hot-swap on .116.**
-
-Unlike .228 (which tested the INSTALL path for net-new UI containers), .116 tests the ADOPTION path: it already has all three UIs and all backend containers running from prior v1.7.42-alpha runs. We want to verify the new prod orchestrator adopts every existing container without recreating or restarting them.
-
-Steps:
-1. Disable DEV_MODE on .116 (check if override.conf exists — `/etc/systemd/system/archipelago.service.d/`)
-2. Stage the already-built binary at `~/Projects/archy/core/target/release/archipelago` → `/usr/local/bin/archipelago.new`
-3. Ensure `/opt/archipelago/apps/{bitcoin-ui,electrs-ui,lnd-ui}/manifest.yml` present (copy from repo)
-4. Ensure `/opt/archipelago/docker/bitcoin-ui/` matches the Step-7 layout (no baked nginx.conf)
-5. Snapshot: `podman ps -a --format "{{.Names}}\t{{.Status}}\t{{.CreatedAt}}"` → save to `/tmp/pre-step10-containers.txt`
-6. `systemctl stop archipelago` → install binary → `systemctl start archipelago`
-7. Verify in journal: every running container appears in "Adopted N existing container(s)"; no container was recreated; all HTTP smokes still 200; BootReconciler reaches NoOp on every app_id after one pass.
-8. If broken → restore `.bak` binary, re-enable DEV_MODE override.
-9. Commit STATUS.md update.
-
-**Risk on .116:** If adoption fails mid-flight, we'd lose the running v1.7.42 backend that I'm currently typing at. Keep a second SSH session open to the ThinkPad for emergency revert. The backup plan is `install /usr/local/bin/archipelago.bak /usr/local/bin/archipelago && systemctl restart archipelago`.
-
-**After Step 10 we are blocked on Step 8b** (multi-day manifest ports) before Step 11 (chaos matrix).
-
---
-
-### Why Step 8 got split (discovered 2026-04-23)
-
-Original plan was one commit "delete bash + edit ISO builder". But on investigation:
- `first-boot-containers.sh` creates **30+ containers** with per-container logic (wallets, DB init, rpcauth derivations, post-create health waits). The repo only has manifests for 3 (bitcoin-ui, electrs-ui, lnd-ui from Step 7). Deleting bash now = brick first-boot on fresh installs.
- Script also does non-container setup: secret generation (RPC pw, DB pw, FileBrowser admin pw), UID-mapping chowns for rootless podman subuid, Tor hostnames dir, WireGuard, firewall rules, nostr-relay dir. None of this lives in the Rust orchestrator.
- `update.rs` (OTA update RPC) invokes `reconcile-containers.sh` at two sites. Deleting the script breaks package updates. Porting those call sites to the orchestrator needs all containers to have manifests.
- Design doc §505 updated to split 8 → 8a/8b/8c. Only 8a (delete the reconcile systemd unit + timer, BootReconciler covers) is safe to execute before we port manifests.
-
---
-
-# Archipelago — Current State, Plan, and Releases
-
-Updated: 2026-04-22
-
-This is the "pick this up tomorrow" page. One-stop summary of where we are, what the plan is, and what's shipped. Detailed plan lives in [`bulletproof-containers.md`](./bulletproof-containers.md).
-
---
-
-## Current state
-
-### Fleet status
-
-All four Gitea mirrors are synced to v1.7.40-alpha:
-
-| Mirror | Host | Status |
-|---|---|---|
-| tx1138 | https://git.tx1138.com | ✅ v1.7.40-alpha live |
-| gitea-local | http://localhost:3000 | ✅ v1.7.40-alpha live |
-| .160 | http://23.182.128.160:3000 | ✅ v1.7.40-alpha live (Gitea recovered via `podman system renumber` — see below) |
-| .168 | http://146.59.87.168:3000 | ✅ v1.7.40-alpha live |
-
-Fleet test nodes:
-
-| Node | Version | State |
-|---|---|---|
-| .103 (dev) | 1.7.40 | running, being developed against |
-| .116 (this box) | 1.7.40 | healed manually via `systemd-run chmod 755 /opt/archipelago/web-ui` after v1.7.38/39 bug |
-| .198 | 1.7.39 → 1.7.40-alpha | healed manually |
-| .228 (primary test) | 1.7.40-alpha | healed manually; bitcoin-core + lnd + electrumx running; UI companions currently missing; bitcoin.conf rpcauth patched live |
-| .249 (ISO test) | unreachable today | |
-| .253 | 1.7.39 → 1.7.40-alpha | healed manually |
-
-### Known open issues (drives the plan below)
-
-1. **UI companion containers disappear** on .228 after daemon restarts — no auto-recreate (fixed by v1.7.45 Quadlet migration)
-2. **bitcoin.conf rpcauth drifts** from canonical secret → ElectrumX "Daemon connection problem" (fixed by v1.7.43 reconcile::derived)
-3. **`host.containers.internal`** resolves to LAN gateway inside containers on some versions (fixed by v1.7.42 containers.conf)
-4. **Podman state DB loss** requires manual recovery (fixed by v1.7.44 startup self-heal)
-5. **LND "Connect Wallet" info** vanishing after crashes — symptom of the same drift class as #2
-6. **ElectrumX not syncing** on .228 — downstream of #2; will resolve when bitcoin.conf is reconciled
-
-### Recent field incident (2026-04-22)
-
- Shipped v1.7.38 + v1.7.39, both broke nginx fleet-wide because the frontend tarball's root dir was `drwx------` (700). Every node that OTA'd got 500 errors on every page.
- Root-cause fix shipped in v1.7.40 (`create-release-manifest.sh` chmod + pre-ship assertion that `tar tvzf | head -1` shows `drwxr-xr-x`).
- .160 Gitea was down all day (502) because its rootless podman's `libpod/bolt_state.db` had vanished. Recovered via clearing `/run/user/$UID/{containers,libpod,podman}` + `podman system renumber`.
- Full failure-mode audit is in [`bulletproof-containers.md`](./bulletproof-containers.md).
-
---
-
-## Plan
-
-We're shipping a level-triggered **reconciler + Quadlet** architecture over six incremental releases. Each release closes one failure mode. See [`bulletproof-containers.md`](./bulletproof-containers.md) for the full design, code layout, test harness, chaos matrix, sources.
-
-### Release roadmap
-
-| Release | Closes | What lands | Status |
-|---|---|---|---|
-| **v1.7.41** | FM5 (bad OTA nginx 500) | Post-OTA auto-rollback. New binary probes `https://127.0.0.1/` on boot; if non-200 within 90s, restores `web-ui.bak` + calls `rollback_update()` + restarts | **in flight — deploying to .228 for test** |
-| **v1.7.42** | FM4 (`host.containers.internal` wrong) | `/etc/containers/containers.conf` w/ `host_containers_internal_ip = 10.89.0.1`; every container gets `--add-host=host.archipelago:10.89.0.1` | pending |
-| **v1.7.43** | FM2 (config drift) | `reconcile::derived::render_bitcoin_conf` — pure fn over canonical secret, rewrites on drift. Same for `lnd.conf` | pending |
-| **v1.7.44** | FM6 (podman state loss) | Startup probe detects broken podman state, auto-recovers via `/run/user/$UID/*` clear + `system renumber` | pending |
-| **v1.7.45** | FM1 + FM3 (companion orphans) | `archy-bitcoin-ui` → Quadlet `.container` unit in `/etc/containers/systemd/`. systemd (not archipelago) owns it | pending |
-| **v1.7.46** | — | `archy-lnd-ui` → Quadlet | pending |
-| **v1.7.47** | — | `archy-electrs-ui` → Quadlet | pending |
-| **v1.7.48+** | all (full daemon refactor) | `core/archipelago/src/reconcile/` module replaces imperative `install.rs` container management. Main app containers become Quadlet too | pending |
-
-Test harness (bats + Goss + Chaos Toolkit + vmtest) lands scaffold in v1.7.41, first lifecycle tests blocking v1.7.45, full matrix blocking beta tag.
-
---
-
-## Release history
-
-### [v1.7.41-alpha](/releases/v1.7.41-alpha/) — IN FLIGHT — 2026-04-22
-**Post-OTA auto-rollback.** After an update lands, the node probes its own web UI through nginx — if the frontend isn't answering cleanly within 90 seconds, the node automatically rolls back to the previous version and restarts. A bad release can no longer leave the fleet stranded on an unreachable node.
-
-Changes:
- `core/archipelago/src/update.rs`: `PendingVerification` struct, write marker before service restart, `verify_pending_update()` on new binary boot — probes `https://127.0.0.1/`, on fail restores `web-ui.bak` + calls `rollback_update()` + `systemctl restart archipelago`
- `core/archipelago/src/main.rs`: startup task invokes verifier concurrently with server
-
-### [v1.7.40-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.40-alpha/) — 2026-04-22
-**Proper fix for the 500 error.** Fixed the v1.7.38/39 tarball-perms bug at its source — staging dir is now explicitly `chmod 755` before tar; `--mode=u=rwX,go=rX` normalizes archive perms; pre-ship assertion aborts release if `tar tvzf | head -1` isn't `drwxr-xr-x`.
-
-Changes:
- `scripts/create-release-manifest.sh`: pre-tar chmod + tar --mode flag + post-tar verify
- Everything from .38 + .39 still in place (onboarding auto-heal, silent logins, app purge, AIUI in tarball)
-
-### [v1.7.39-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.39-alpha/) — 2026-04-22
-**Hotfix attempt** for v1.7.38's nginx 500 (didn't fully work — still shipped broken tarball perms). Added startup self-heal chmod in `main.rs` and post-extract chmod in `update.rs` OTA applier.
-
-### [v1.7.38-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.38-alpha/) — 2026-04-22
-**Onboarding auto-heal + silent logins + App Store trim.**
-
-Changes:
- `auth.rs`: `is_onboarding_complete()` auto-heals from `setup_complete` + `password_hash` (prevents clear-cache → onboarding wizard bug)
- `useOnboarding`: tri-state — backend-unreachable no longer defaults to `/onboarding/intro`
- Login sounds gated by `isFirstInstallPhase()` — silent after onboarding, typing sounds unaffected
- Removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog + Rust + docker + icons
- Deleted 15 image versions from tx1138, .168, gitea-local registries
- AIUI baked into release tarball via `demo/aiui/`
- `prebuild` hook syncs `app-catalog/catalog.json` → `public/catalog.json`
-
-(Shipped with tarball-perms bug; fleet had to be healed before v1.7.40.)
-
-### [v1.7.37-alpha](https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.37-alpha/) — 2026-04-22
-**Bitcoin Core install fixes + dynamic node UI + full-archive default.**
-
- Bitcoin Core passes explicit `-rpcbind/-rpcallowip/etc.` CLI args so vanilla image exposes RPC
- Split `bitcoin-core` from `bitcoin-knots` in backend `AppMetadata`
- bitcoin-ui auto-detects Core vs. Knots from subversion, swaps branding at runtime
- Storage (Full Archive · X GB / Pruned) indicator on dashboard
- Node Settings modal shows real values (network, storage, txindex, ZMQ, RPC port)
- Pull fallback to `docker.io` when no mirror carries the image
- Removed `prune=550` hardcode — full archive default
-
---
-
-## Key docs
-
- [`bulletproof-containers.md`](./bulletproof-containers.md) — full reconcile architecture, code layout, test matrix, chaos scenarios, sources
- [`BETA-RELEASE-CHECKLIST.md`](./BETA-RELEASE-CHECKLIST.md) — existing beta checklist
- [`BETA-ISSUES-20260328.md`](./BETA-ISSUES-20260328.md) — prior beta-blocker tracking
- [`hotfix-process.md`](./hotfix-process.md) — release workflow
- [`architecture.md`](./architecture.md) — system architecture overview
-
---
-
-## How to resume
-
-1. Check fleet mirrors are all live: `curl -sS https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/manifest.json | jq .version`
-2. Read [`bulletproof-containers.md`](./bulletproof-containers.md) for the current plan
-3. Check task list (`/list` or via Claude Code) for the in-flight release
-4. Latest in-flight work: v1.7.41 deploying to .228 for test; will ship to all 4 mirrors once verified
--- a/docs/STEP-8B-PORT-AUDIT.md
+++ b/docs/STEP-8B-PORT-AUDIT.md
@ -1,179 +0,0 @@
-# Step 8b Port Audit — container-specs.sh → apps/*/manifest.yml
-
-Last updated: 2026-04-23
-
-This audit is the scope-lock for Step 8b of `docs/rust-orchestrator-migration.md`. Every container currently declared in `scripts/container-specs.sh:ALL_CONTAINER_SPECS` must be port-faithful to `apps/<id>/manifest.yml` before Step 8c can delete the bash scripts.
-
-Findings in short:
-
- `scripts/container-specs.sh` lists **30 containers** across 5 tiers.
- `apps/*/manifest.yml` exists for **27 app ids**, but the overlap is partial and most of the overlapping manifests are **aspirational stubs written in the original design phase, never reconciled against production behavior**. The image references, container names, network topology, env, and health checks disagree with what actually runs on `.116` and `.228`.
- Only the three UI apps (`bitcoin-ui`, `electrs-ui`, `lnd-ui`) plus `aiui` are truly ported (Step 7 scope).
- The Rust schema (`core/container/src/manifest.rs::AppManifest`) is **missing** several fields needed for a faithful port: `archy-net` network selection, `custom_args`, `entrypoint` override, derived host env (e.g. `HOST_MDNS`), secret-file env injection, and data-dir UID/GID mapping.
-
---
-
-## Table — every spec, mapped
-
-Legend for **Status**:
-
- ✅ PORTED — manifest exists and matches reality (Step 7 done).
- ⚠ STUB — `apps/<id>/manifest.yml` exists but disagrees with `container-specs.sh` (image, name, network, env, or health wrong).
- ❌ MISSING — no manifest file on disk.
- — N/A — intentionally out of Step 8b (optional app with no spec, or already managed by a different system).
-
-| Tier | Spec name (container-specs.sh)   | Actual container name | Image source                        | apps/<id>/ matches? | Status | Notes |
-|-----:|----------------------------------|-----------------------|-------------------------------------|---------------------|--------|-------|
-| 0    | archy-mempool-db                 | archy-mempool-db      | `$MARIADB_IMAGE`                    | mempool/            | ⚠      | Existing manifest (if any) targets mempool combined stack, not the DB sidecar. Likely a companion of `apps/mempool`. |
-| 0    | archy-btcpay-db                  | archy-btcpay-db       | `$BTCPAY_POSTGRES_IMAGE`            | btcpay-server/      | ⚠      | Existing manifest describes only the app container. DB is a silent companion in the current model. |
-| 0    | immich_postgres                  | immich_postgres       | `$IMMICH_POSTGRES_IMAGE`            | (none)              | ❌      | Optional. No `apps/immich/` dir. |
-| 0    | immich_redis                     | immich_redis          | `$VALKEY_IMAGE`                     | (none)              | ❌      | Optional. No `apps/immich/` dir. |
-| 1    | bitcoin-knots                    | bitcoin-knots         | `$BITCOIN_KNOTS_IMAGE`              | bitcoin-core/       | ⚠      | `apps/bitcoin-core/manifest.yml` references `bitcoin/bitcoin:28.4`; production runs Bitcoin **Knots** at `$ARCHY_REGISTRY/bitcoin-knots:latest`. App id mismatch: spec is `bitcoin-knots`, manifest is `bitcoin-core`. Decide: rename spec or rename app id. |
-| 1    | electrumx                        | electrumx             | `$ELECTRUMX_IMAGE`                  | (none)              | ❌      | Separate from `electrs-ui`. No `apps/electrumx/` dir. |
-| 2    | lnd                              | lnd                   | `$LND_IMAGE`                        | lnd/                | ⚠      | Manifest exists; needs verification against current env/ports/caps. |
-| 2    | mempool-api                      | mempool-api           | `$MEMPOOL_BACKEND_IMAGE`            | mempool/            | ⚠      | Companion of `apps/mempool`. May need dedicated manifest or stack-form. |
-| 2    | archy-mempool-web                | archy-mempool-web     | `$MEMPOOL_WEB_IMAGE`                | mempool/            | ⚠      | Companion. |
-| 2    | archy-nbxplorer                  | archy-nbxplorer       | `$NBXPLORER_IMAGE`                  | btcpay-server/      | ⚠      | Companion of BTCPay. |
-| 2    | btcpay-server                    | btcpay-server         | `$BTCPAY_IMAGE`                     | btcpay-server/      | ⚠      | Stub; env, ports, deps need reconciliation. |
-| 2    | fedimint                         | fedimint              | `$FEDIMINT_IMAGE`                   | fedimint/           | ⚠      | **This is the bug from yesterday.** Stub references wrong image (`fedimint/fedimintd:v0.10.0` instead of `$ARCHY_REGISTRY/fedimintd:v0.10.0`), wrong RPC target (`bitcoin-core:8332` instead of `bitcoin-knots:8332`), missing `HOST_MDNS` env, missing `archy-net`, missing `FM_BIND_P2P`/`FM_BIND_API`, missing gateway ports etc. |
-| 2    | fedimint-gateway                 | fedimint-gateway      | `$FEDIMINT_GATEWAY_IMAGE`           | (none)              | ❌      | No manifest. Has complex LND-aware entrypoint in `container-specs.sh:load_spec_fedimint-gateway`. |
-| 2    | immich_server                    | immich_server         | `$IMMICH_SERVER_IMAGE`              | (none)              | ❌      | Optional. |
-| 3    | homeassistant                    | homeassistant         | `$HOMEASSISTANT_IMAGE`              | home-assistant/     | ⚠      | id mismatch: `homeassistant` vs `home-assistant`. |
-| 3    | grafana                          | grafana               | `$GRAFANA_IMAGE`                    | grafana/            | ⚠      | Stub. |
-| 3    | uptime-kuma                      | uptime-kuma           | `$UPTIME_KUMA_IMAGE`                | (none)              | ❌      | Optional. |
-| 3    | jellyfin                         | jellyfin              | `$JELLYFIN_IMAGE`                   | (none)              | ❌      | Optional. |
-| 3    | photoprism                       | photoprism            | `$PHOTOPRISM_IMAGE`                 | (none)              | ❌      | Optional. |
-| 3    | vaultwarden                      | vaultwarden           | `$VAULTWARDEN_IMAGE`                | (none)              | ❌      | Optional. Known-bad container on `.228` (see STATUS.md). |
-| 3    | nextcloud                        | nextcloud             | `$NEXTCLOUD_IMAGE`                  | (none)              | ❌      | Optional. |
-| 3    | searxng                          | searxng               | `$SEARXNG_IMAGE`                    | searxng/            | ⚠      | Stub. |
-| 3    | onlyoffice                       | onlyoffice            | `$ONLYOFFICE_IMAGE`                 | onlyoffice/         | ⚠      | Stub. |
-| 3    | filebrowser                      | filebrowser           | `$FILEBROWSER_IMAGE`                | (none)              | ❌      | **Critical** — this is Archipelago baseline (bootstrapped by first-boot), not an optional app. Lost `.filebrowser.json` yesterday. Must have a manifest. |
-| 3    | nginx-proxy-manager              | nginx-proxy-manager   | `$NPM_IMAGE`                        | (none)              | ❌      | Optional. |
-| 3    | portainer                        | portainer             | `$PORTAINER_IMAGE`                  | (none)              | ❌      | Optional. |
-| 3    | ollama                           | ollama                | `$OLLAMA_IMAGE`                     | ollama/             | ⚠      | Stub. |
-| 4    | archy-bitcoin-ui                 | archy-bitcoin-ui      | `localhost/bitcoin-ui:local`        | bitcoin-ui/         | ✅     | Step 7 done. |
-| 4    | archy-lnd-ui                     | archy-lnd-ui          | `localhost/lnd-ui:local`            | lnd-ui/             | ✅     | Step 7 done. |
-| 4    | archy-electrs-ui                 | archy-electrs-ui      | `localhost/electrs-ui:local`        | electrs-ui/         | ✅     | Step 7 done. |
-
-### Non-spec apps that already have manifests (outside `container-specs.sh`)
-
-These are managed entirely by the install RPC today and already have adoption paths in the Rust orchestrator. They are **not** in 8b scope:
-
- `aiui`, `botfights`, `core-lightning`, `did-wallet`, `endurain`, `gitea`, `indeedhub`, `lightning-stack` (stack), `meshtastic`, `morphos-server`, `nostr-rs-relay`, `router`, `strfry`, `web5-dwn`.
-
---
-
-## Schema gaps blocking faithful ports
-
-`core/container/src/manifest.rs::AppManifest` currently supports:
-
- `container.image` OR `container.build` (mutually exclusive, validated).
- `dependencies: Vec<Dependency>`, `resources: {cpu_limit, memory_limit, disk_limit}`.
- `security: { capabilities, readonly_root, network_policy: string, apparmor_profile }`.
- `ports: Vec<{host, container, protocol}>`, `volumes: Vec<{type, source, target, options}>`.
- `environment: Vec<String>` (each `"KEY=VALUE"`).
- `health_check: {type, endpoint, path, interval, timeout, retries}`.
- `devices: Vec<String>`, `extensions: HashMap<String, Value>` (flatten).
-
-What `container-specs.sh` uses that the schema **does not** express first-class:
-
-| Need | Example from bash | Proposed schema addition |
-|---|---|---|
-| Join the named `archy-net` bridge | `SPEC_NETWORK="archy-net"` | `container.network: Option<String>` (Some("archy-net"), or None for `isolated`, or "host"). Existing `security.network_policy` left as-is for policy knobs (e.g. firewall isolation layer); this new field is literally the podman `--network` value. |
-| Extra args / custom flags | `SPEC_CUSTOM_ARGS="-server=1 -prune=550 ..."` | `container.custom_args: Vec<String>`. |
-| Entrypoint override | `SPEC_ENTRYPOINT="gatewayd --data-dir /data ... lnd --lnd-rpc-host lnd:10009"` | `container.entrypoint: Option<Vec<String>>`. |
-| Host-derived env (mDNS hostname, host IP) | `FM_P2P_URL=fedimint://$HOST_MDNS:8173` | `container.derived_env: Vec<{key, template}>` with a small allow-list of `{{HOST_MDNS}}`, `{{HOST_IP}}`, `{{DISK_GB}}` substitutions resolved at apply time. |
-| Secret-file env (read from `/var/lib/archipelago/secrets/<name>`) | `FM_BITCOIND_PASSWORD=$BITCOIN_RPC_PASS` (from secret file in bash) | `container.secret_env: Vec<{key, secret_file}>`, secret_file relative to `$SECRETS_DIR`. Never logged. |
-| Data dir UID/GID (for rootless mapped chown) | `SPEC_DATA_UID="100070:100070"` | `container.data_uid: Option<String>` (e.g. `"100070:100070"`). Applied as `chown -R` before container create. |
-| Exec health check | `SPEC_HEALTH_CMD="bitcoin-cli ..."` | Extend `HealthCheck` so `type: exec` + `command: Vec<String>` works end-to-end; confirm the runtime honors it. |
-| Optional/skip-when-not-installed semantics | `SPEC_OPTIONAL="true"` | Already covered: `BootReconciler` only installs if an `AppManifest` is registered. For baseline-on-first-boot containers (filebrowser), we use the same install path. No schema change. |
-| Local-image flag (don't pull) | `SPEC_LOCAL_IMAGE="true"` | Already covered: `container.build` vs `container.image`. |
-
-Everything else (tier ordering, dependency tree, readonly_root, tmpfs mounts) is either already in the schema or folded into `custom_args` cleanly.
-
-### tmpfs
-
-`SPEC_TMPFS="/tmp:rw,noexec,nosuid,size=256m ..."` used by `grafana`, `searxng`, `ollama`. Currently no first-class field. Proposed: `volumes[].type: tmpfs` with a new `tmpfs_options` field on `Volume`, or a dedicated `container.tmpfs: Vec<{target, options}>`. Either works; the `Volume`-variant keeps all mount declarations in one place.
-
---
-
-## Proposed commit sequence
-
-Each item is a separate commit. None recreates a container on the fleet.
-
-**8b.0 — schema extensions, no manifest changes, no orchestrator changes**
-
-1. `feat(container/manifest): add network, custom_args, entrypoint, derived_env, secret_env, data_uid, tmpfs fields` — add fields to `ContainerConfig`/`SecurityPolicy`/`Volume`, update `validate()`, add unit tests per new field. Backwards-compat: every existing `apps/*/manifest.yml` must still parse (verify with a `parse_every_real_manifest` test that walks `apps/*/manifest.yml` in the repo).
-
-2. `feat(container/manifest): resolve derived_env against host facts` — add `HostFacts { host_ip, host_mdns, disk_gb }` struct and `resolve_env(facts) -> Vec<String>` method; unit test with a fixed `HostFacts`.
-
-3. `feat(container/manifest): resolve secret_env against a SecretsProvider` — add trait `SecretsProvider { fn read(&self, name: &str) -> Result<String>; }`, stub `FileSecretsProvider` rooted at `/var/lib/archipelago/secrets`, unit test with a tmpdir provider.
-
-**8b.1 — orchestrator honors the new fields**
-
-4. `feat(prod_orchestrator): honor network/custom_args/entrypoint on create` — thread the new `ResolvedContainerConfig` into the runtime's create call. Mock-runtime unit tests for each field.
-5. `feat(prod_orchestrator): chown data dir to data_uid before create` — called from `install_fresh`. Unit test with a tmpdir.
-6. `feat(prod_orchestrator): resolve derived_env + secret_env before create` — wire in `HostFacts` + `SecretsProvider`. Unit test.
-
-**8b.2 — first real backend port: fedimint**
-
-7. `feat(apps/fedimint): port manifest from container-specs.sh with mDNS URLs + archy-net` — rewrites `apps/fedimint/manifest.yml` using the new schema. Includes `container_name: fedimint` (no prefix), `network: archy-net`, `derived_env: [FM_P2P_URL, FM_API_URL]`, `secret_env: [FM_BITCOIND_PASSWORD, ...]`.
-8. `feat(apps/fedimint-gateway): new manifest with LND-aware entrypoint` — creates `apps/fedimint-gateway/manifest.yml`. Dynamic entrypoint is a 2-case template resolved by a derived field `{{LND_AVAILABLE}}` (presence of `/var/lib/archipelago/lnd/tls.cert`). May require a second commit to add that derived fact — scope-judge at write time.
-9. `test(lifecycle): fedimint adoption + fresh-install` — bats scaffold per `docs/bulletproof-containers.md§Test harness`.
-
-**8b.3 — remaining critical backends (one per commit)**
-
-10. `feat(apps/filebrowser): new manifest — baseline Archipelago service` (fixes yesterday's `.filebrowser.json` loss by regenerating via `custom_args: ["--config", "/data/.filebrowser.json"]` + `caps: [..., NET_BIND_SERVICE]`).
-11. `feat(apps/electrumx): new manifest`.
-12. `feat(apps/bitcoin-knots): rename-or-merge with apps/bitcoin-core/manifest.yml` — decide naming once, update everywhere. Recommend: keep `apps/bitcoin-core/` dir (it's the user-visible app name) and use `extensions.container_name: bitcoin-knots` to preserve adoption.
-13. `feat(apps/lnd): reconcile stub against spec`.
-14. `feat(apps/btcpay-server + companions): multi-container stack` — reuse the existing stack path in `api/rpc/package/stacks.rs` OR decide to add `container.companions: Vec<ContainerConfig>`. Defer decision until 10–13 land.
-
-**8b.4 — mempool stack, optional apps**
-
-Continue one-at-a-time until every ⚠ or ❌ row above is ✅.
-
-**8b.5 — port `core/archipelago/src/api/rpc/package/update.rs`**
-
-Replace `reconcile-containers.sh` calls with `ContainerOrchestrator::upgrade(app_id)`. Unblocks 8c.
-
-**8c — delete bash scripts** (per `docs/rust-orchestrator-migration.md`).
-
---
-
-## Runtime-only drift on `.116` — write it into manifests, not scripts
-
-Per `docs/RESUME.md§Runtime-only fixes on .116`, yesterday's patches are:
-
-1. `~archipelago/.config/containers/containers.conf` (`image_copy_tmp_dir = "storage"`) → lands in `first-boot-setup.sh` (renamed in Step 8c) OR in a Rust startup-side prereq hook. Not a per-manifest concern.
-2. Secrets ownership `archipelago:archipelago` → Rust orchestrator's `ensure_secrets` path (already exists; verify it chowns).
-3. `/var/lib/archipelago/filebrowser-data/.filebrowser.json` → handled by filebrowser's `custom_args: ["--config", "/data/.filebrowser.json"]` plus a pre-start hook (mirrors `bitcoin_ui` precedent) that writes the file if absent. Details in 8b.3 commit 10.
-4. Fedimint data dir chown → handled by `container.data_uid: "100000:100000"` in the fedimint manifest.
-
-All runtime-only fixes end up expressed as manifest fields or Rust-side hooks. None survives as bash.
-
---
-
-## Open decisions (lock before writing code)
-
-1. **`bitcoin-knots` vs `bitcoin-core` naming.** Recommend: app id stays `bitcoin-core` (user-facing), container name becomes `bitcoin-knots` via `extensions.container_name`, image is Knots. Or rename both to `bitcoin-knots` for honesty. Pick one and apply everywhere.
-2. **`archy-` prefix rule.** Currently `UI_APP_IDS` in `prod_orchestrator.rs` hardcodes `["bitcoin-ui", "electrs-ui", "lnd-ui"]` → `archy-`. Several backends use `archy-` too (`archy-mempool-db`, `archy-mempool-web`, `archy-nbxplorer`, `archy-btcpay-db`). Recommend: drop the hardcoded list, rely on `extensions.container_name` everywhere, audit all existing manifests to set it explicitly so adoption doesn't orphan.
-3. **Companions (mempool-api + mempool-web + mempool-db, btcpay-server + nbxplorer + btcpay-db).** Two options: (a) one manifest per container with explicit deps and an "app group" id; (b) extend `ContainerConfig` with `companions: Vec<…>`. `apps/lightning-stack/manifest.yml` already shipped probably has a precedent — check its shape before deciding.
-4. **Keep `container-specs.sh` as the source of truth until 8b is fully ported?** Yes. `BootReconciler` only acts on what's in `apps/*/manifest.yml`; anything not ported stays on the bash path until its commit lands. Zero-downtime migration.
-
---
-
-## Where to resume
-
-After user approves this plan: commit 1 in 8b.0 (schema extensions + tests, no orchestrator or manifest changes). Smallest possible diff, highest leverage, and unblocks every subsequent port.
-
-## Validation Snapshot - 2026-04-28
-
- Runtime cleanup: removed orphan `bold_lichterman` duplicate; retained managed `filebrowser`.
- Launch policy alignment: local app launches are port-based; iframe-blocked apps (including `gitea`) are forced to new-tab.
- App icon reliability: image fallback now retries `.svg` when `.png` does not exist.
- Required stack verification on `.116`:
-  - `tests/lifecycle/bats/required-stack.bats` -> PASS
-  - `ARCHY_ALLOW_DESTRUCTIVE=1 tests/lifecycle/bats/required-stack-destructive.bats` -> PASS
- Broad host-port probe confirms HTTP 200 responses for user-facing app UIs on mapped ports; non-HTTP ports intentionally excluded from HTTP pass/fail semantics.
-
--- a/docs/WEEKLY_RELEASE_TRACKER.md
+++ b/docs/WEEKLY_RELEASE_TRACKER.md
@ -1,288 +0,0 @@
-# Weekly Release Tracker
-
-Last updated: 2026-06-14 (session on node .116 / archi-thinkpad)
-
---
-
-# ▶ IN PROGRESS — LND wallet auto-unlock fix (2026-06-14)
-
-## RESUME PROMPT (paste into a fresh session, on .116 / archi-thinkpad, tree at /home/archipelago/Projects/archy)
-
-> Resume the LND wallet-password fix. Read memory `project_lnd_wallet_password.md` FIRST (full
-> root-cause + design + validated facts). Work is on branch `lnd-wallet-password-fix` (pushed to
-> gitea-vps2, commit 91adc281, NOT merged to main, NOT shipped). Bug: hardcoded
-> `WALLET_PASSWORD="hellohello"` left LND wallets LOCKED fleet-wide after OTA → Bitcoin-receive
-> shows "wallet is locked" on every updated node. DONE + cargo-checked: per-node random secret
-> (secrets/lnd-wallet-password), both init paths unified, candidate-unlock with fail-fast,
-> login-time candidate-migration (ChangePassword). DETECTION GATE already shipped on main
-> (commit 8c8e4d7a). DECISION: alpha, NO funds on nodes → destructive wipe+recreate is OK and
-> wanted UNATTENDED for ALL nodes in the next update. A wallet locked with an unknown password is
-> already inaccessible, so wiping loses nothing reachable.
-
-## EXACT NEXT STEPS — LND fix (in order)
-1. **Finish seed/fresh recovery** (REMAINING piece): in `container/lnd.rs ensure_wallet_initialized`,
-   when wallet.db exists but ALL unlock candidates fail → wipe wallet.db (+ macaroons + graph/chain
-   mainnet state, as root via host_sudo) and re-init fresh (random genseed + per-node secret) so the
-   node self-heals unattended at boot. (Login-time candidate-migration already handles nodes whose
-   pw matches.) Validate the wipe→reinit mechanic on the scratch LND first (see below).
-2. **Scratch validation** (was in progress, .249 unreachable from .116's subnet → use a throwaway
-   `lnd-scratch` podman container on .116, regtest/neutrino, REST :18099 — already proven for
-   init/unlock/ChangePassword). Test: init(passA) → restart→LOCKED → delete wallet.db while locked →
-   confirm /v1/state→NON_EXISTING (may need container restart) → genseed+initwallet fresh → unlock.
-   NOTE: scratch wallet.db lives at the container's LND data dir (regtest), `podman exec lnd-scratch
-   find / -name wallet.db`. CLEAN UP: `podman rm -f lnd-scratch` when done.
-3. `cargo check -p archipelago` (on .116 ~15-30s incremental; full test compile ~9min).
-4. **End-to-end on .228** (reachable 192.168.1.x, SSH pw `archipelago`, UI pw unknown, NO funds —
-   has a locked unknown-pw wallet = perfect auto-recreate test): build binary
-   (`ARCHIPELAGO_TARGET=archipelago@192.168.1.228 scripts/deploy-to-target.sh` or per
-   reference_deploy_to_nodes), deploy, restart, confirm wallet auto-recreates+unlocks, lncli state
-   RPC_ACTIVE, lnd.newaddress returns an address. Run os-audit against .228 → lnd check PASS.
-5. Merge `lnd-wallet-password-fix` → main, then **cut + publish v1.7.93-alpha** (carries the LND
-   fix). Ship ritual: create-release.sh 1.7.93-alpha → add CHANGELOG (≥3 layman bullets) → run
-   sync-whats-new.py (the new What's-New gate will require it) → publish-release-assets.sh gitea-vps2
-   → push origin/gitea-vps2 + tags → verify live manifest==1.7.93-alpha. Heads-up: create-release
-   leaves core/Cargo.lock version-bump uncommitted (commit it as a chore, both .91 and .92 hit this).
-
-## Context: how we got here (this session, all on node .116)
- Shipped **v1.7.91-alpha** (bitcoinReceive TS2538 build fix) and **v1.7.92-alpha** (ElectrumX
-  overlay-during-sync fix; L3 reboot os-audit gate; What's-New sync gate + 8-version backfill) —
-  both LIVE on vps2. Restored .116-local nginx `/lnd-connect-info` route (was dropped 2026-06-10).
- Triaged user symptoms: ElectrumX "can't connect" = electrs syncing / Bitcoin verifying (not a
-  regression); .228 "5/14 apps after reboot" = normal ~5min staggered startup (all 14 came up).
- LND lock bug found + detection gate shipped + forward fix & migration implemented (this section).
-
---
-
-# ✔ DONE PASS — v1.7.91-alpha + v1.7.92-alpha (2026-06-14)
-
-## Outcome (both releases PUBLISHED + LIVE on vps2)
-
- **v1.7.91-alpha** — bitcoinReceive.ts TS2538 build-blocker fixed; cut, published, verified
-  live (`manifest.version==1.7.91-alpha`), tag `v1.7.91-alpha` on vps2. The fleet OTA'd to it
-  (confirmed on .116 + .198).
- **v1.7.92-alpha** — cut, published, verified live (`manifest.version==1.7.92-alpha`), tag on
-  vps2, main@d462e444. Carries:
-  - `fix(ui)` ElectrumX **overlay-during-sync** bug — the "App not reachable / retry" overlay
-    no longer paints over the ElectrumX sync screen (AppSessionFrame.vue gated on `!electrsSync`).
-  - `test(resilience)` **L3 per-boot health gate** — `batch_host_reboot` now runs os-audit.sh
-    after reboot (RPC/OTA/all-apps/FM-guards), not just container-set equality. os-audit validated
-    11/0/0 green on .116.
-  - `feat(release)` **What's New sync gate** — `scripts/sync-whats-new.py` + `whats-new-sync`
-    stage in tests/release/run.sh. Backfilled the 8 missing modal blocks (v1.7.85→.92); the gate
-    fails any release whose CHANGELOG version isn't in the Settings modal.
- **.116 node fix (not shipped — local config)**: restored the `/lnd-connect-info` nginx proxy
-  route that a 2026-06-10 "before-116-routing" change had dropped (fell through to SPA). Backup at
-  `/etc/nginx/conf.d/rpc.tx1138.com.conf.bak-lndconnect-*`. Shipped template already has the route.
- **User symptoms triaged (none were .91/.92 regressions)**: receive-generate "unchanged" = .91's
-  receive change was a behavior-preserving build guard; ElectrumX "can't connect" on .198 = Bitcoin
-  node mid-"Verifying blocks…" (-28) so electrs was "waiting for Bitcoin node"; on .116 electrs was
-  ~59% mid-sync. The overlay UX bug is fixed regardless.
-
-## Known follow-ups (not blockers)
- **gitea-local mirror push fails** (`localhost:3000` → redirect to `/login`, token auth). vps2 is
-  the OTA source and is fine; gitea-local secondary mirror is stale. Diagnose the local Gitea token.
- `sync-whats-new.py` only **inserts missing** versions; it does not rewrite a block when CHANGELOG
-  bullets for an already-present version change (had to delete+resync the .92 block by hand to pick
-  up its 3rd bullet). Fine for the forward case; enhance to idempotently re-render if needed.
-
-## What happened this session
-
- `scripts/create-release.sh 1.7.91-alpha` was running; its release gate PASSED all 7 checks,
-  backend built clean (7m22s), then it **FAILED at step [4/8] frontend build** with:
-  `src/utils/bitcoinReceive.ts(23,24): error TS2538: Type 'undefined' cannot be used as an index type.`
-  Cause: `noUncheckedIndexedAccess` — `codeMatch[1]` is `string | undefined` and was used directly
-  to index `RECEIVE_CODE_MESSAGES`. **FIXED** → `const code = message.match(/\[([A-Z_]+)\]/)?.[1]`
-  then `if (code && RECEIVE_CODE_MESSAGES[code])`. `npx vue-tsc --noEmit` is now clean (exit 0).
-  The failed run aborted BEFORE bumping the manifest (still 1.7.90) or tagging (no v1.7.91 tag),
-  but it HAD already partial-bumped Cargo.toml/package.json/locks to 1.7.91 — those partial bumps
-  are reverted (create-release.sh re-owns the bump); only the genuine TS fix + harness are committed.
- Built a new OS-wide health harness `tests/lifecycle/os-audit.sh` (non-destructive, one scorecard):
-  Section A backend/RPC health, Section B all-apps lifecycle audit (delegates to remote-lifecycle.sh),
-  Section C FM-guards (port-drift + secret-completeness bats, orphan-container sweep). Section A
-  validated all-PASS on .116. Fixed a jq bug in the FM12 OTA-wedge check: `//` treats a legit
-  `false` as empty and fell through to "unknown" — now uses `has()`. Section B is slow (~3 min) and
-  opaque while running because output is captured (`out=$(...)`) not streamed — minor wart, TODO.
-
-## EXACT NEXT STEPS — v1.7.91 (in order)
-
-1. Confirm clean tree + on main (`git status`; create-release.sh requires `git diff --quiet HEAD`).
-   The TS fix + os-audit.sh are committed & pushed; version-bump artifacts reverted to 1.7.90.
-2. Re-run the release: `scripts/create-release.sh 1.7.91-alpha`. Backend is cached (only a .ts
-   changed) so it's fast; the frontend build now passes. It bumps versions, builds, writes
-   releases/manifest.json (→1.7.91-alpha), commits, and tags v1.7.91-alpha.
-   - Memory guards: grep the staged frontend tarball for "1.7.91-alpha" before shipping (silent
-     vue-tsc failures); tarball must be flat (`tar -C web/dist/neode-ui .`).
-3. Publish: `scripts/publish-release-assets.sh 1.7.91-alpha gitea-vps2`, then
-   `git push origin main && git push origin --tags` (origin pushes to BOTH gitea-local + vps2).
-4. Verify manifest LIVE (this is "published"):
-   `curl -fsS http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json | jq .version`
-   must show `1.7.91-alpha`. **Then notify the user — they asked to be told when 1.7.91 publishes.**
-5. os-audit harness: run a full green pass on .116
-   (`ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http ARCHY_PASSWORD='ThisIsWeb54321@' tests/lifecycle/os-audit.sh`),
-   confirm Section A FM12 now reads `update_in_progress=false` (PASS not WARN), review B + C findings,
-   then wire os-audit.sh into the reboot-survival (L3) loop as the per-boot gate.
-
---
-
-# ─ HISTORY — v1.7.89-alpha pass (2026-06-12), superseded ─
-
-Last updated: 2026-06-12 ~17:45 EDT (session on node .116)
-
-## RESUME PROMPT (paste into a fresh session)
-
-> Continue the v1.7.89-alpha release pass from /home/archipelago/Projects/archy on node .116.
-> Read docs/WEEKLY_RELEASE_TRACKER.md fully first — it has root causes, fixes already made,
-> and exact next steps. Do NOT redo: AIUI revert (done, validated), updater fixes in
-> core/archipelago/src/update.rs (done, uncommitted), .116 OTA unwedge (done). Resume at
-> "EXACT NEXT STEPS" below.
-
-## EXACT NEXT STEPS (in order)
-
-1. Backend focused tests were running in background:
-   `cd core && timeout 1500 cargo test -p archipelago -- update:: lnd container::image_versions scanner`
-   (log: /tmp/claude-.../tasks/bds4jk19e.output — if lost, just rerun the command; first
-   attempt died at 400s timeout during test compile, 1500s is the right budget).
-   Need: all green.
-2. RESOLVED before session end: vitest recheck passed clean — EXIT=0, 79 files / 645 tests,
-   even while cargo test was compiling. The earlier harness ui-unit-tests FAIL was load/flake
-   (machine saturated by the parallel cargo test compile), not a real failure. On resume just
-   rerun `tests/release/run.sh --quick` WITHOUT a parallel cargo build to confirm green;
-   if it ever fails again, the failing test name is in the stage output (drop `--silent`).
-3. Run full harness: `tests/release/run.sh` (static+frontend+backend). Then commit ALL
-   working-tree changes (one commit, e.g. "fix: harden OTA updates, AIUI desktop gap, LND
-   no-proxy" — CHANGELOG v1.7.89 section is already curated).
-4. Cut release: `scripts/create-release.sh 1.7.89-alpha` (needs clean tree, on main,
-   validates CHANGELOG section exists — it does). Then
-   `tests/release/run.sh --manifest` should pass, and grep the staged frontend tarball
-   for 1.7.89-alpha (memory: silent build failures).
-5. Publish: `scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2`, then
-   `git push origin main && git push origin --tags` and push gitea-local + tags too.
-   Verify manifest live on http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json
-6. Verify OTA on THIS node (.116): schedule is auto_apply; either wait for the scheduler
-   or trigger via UI. Confirm /var/lib/archipelago/update_state.json current_version
-   becomes 1.7.89-alpha, `update_in_progress` returns to false, web-ui + binary versions
-   MATCH (this node currently has web-ui 1.7.84 / binary 1.7.85 mismatch — the OTA heals it),
-   and journalctl shows "Post-OTA verification succeeded" (the new probe falls back to
-   http://127.0.0.1/ which is what .116 serves).
-7. Update this tracker + docs/PROGRESS_MEMORY.md, mark tasks done.
-Purpose: live tracker for this pass — test everything shipped this week (v1.7.83→v1.7.89),
-build the release test harness, fix OTA updates on .116, make updates bulletproof, cut v1.7.89-alpha.
-If the session is cut off, resume from here.
-
-## Task status
-
-| # | Task | Status |
-|---|------|--------|
-| 1 | AIUI revert (mobile back/close gone, desktop gap fixed) | DONE — validated |
-| 2 | Dev server on :8100 with embedded AIUI | DONE — see below |
-| 3 | Inventory this week's release-log items | DONE — see checklist |
-| 4 | Test harness covering this week + seed of system-wide harness | IN PROGRESS |
-| 5 | Fix OTA updates on .116 + bulletproof updates | IN PROGRESS — diagnosis below |
-| 6 | Cut v1.7.89-alpha release | PENDING (gates: 4, 5) |
-
-## State of the working tree
-
- HEAD = 495b9078 (v1.7.89 changelog + AIUI mobile restore committed).
- Uncommitted, intended for v1.7.89-alpha:
-  - `neode-ui/src/views/Dashboard.vue` — chat route back to plain `h-full` (desktop bottom-gap fix). Validated.
-  - `core/.../rpc/lnd/*` + `container/lnd.rs` — LND REST no-proxy + wallet readiness/unlock fixes.
-  - Version bumps to 1.7.89-alpha (Cargo.toml, package.json, locks), CHANGELOG entry.
-  - `neode-ui/vite.config.ts` — added `/aiui` dev proxy (keep; dev-only convenience).
-
-## AIUI validation (task 1) — DONE
-
- HEAD already removed the mobile back button and restored `hideClose=true` (495b9078).
- Working-tree Dashboard.vue removes `dashboard-scroll-panel mobile-scroll-pad` from the chat
-  route (that padding caused the desktop bottom gap); mesh keeps its styling.
- Chat CSS verified byte-identical to last-good 34c4e87d (May 20).
- Playwright check (desktop 1440x900, mobile 390x844): chat fills full viewport, no bottom gap,
-  no mobile back/close. `npm run type-check` + focused route tests + full vitest (645/645) pass.
-
-## Dev server on :8100 (task 2) — DONE
-
- Running: `BACKEND_URL=http://127.0.0.1:5678 VITE_AIUI_URL=/aiui/ npx vite --host 0.0.0.0 --port 8100`
-  from `neode-ui/` (real local backend on 5678).
- AIUI now embeds in /dashboard/chat via new vite proxy `/aiui` → `http://127.0.0.1:80`
-  (the node's deployed AIUI), same-origin like production.
- Secondary throwaway instance for automated checks: :8101 against mock backend
-  (`node mock-backend.js` on 5959, password `password123`).
-
-## This week's shipped items (v1.7.83 → v1.7.89) — test checklist
-
-### Frontend (vitest/type-check/build cover most; full suite 645/645 green 2026-06-12)
- [x] AIUI fast launch, no availability probe (v1.7.88) — covered by visual check + Chat.vue tests
- [x] AIUI mobile layout restore (v1.7.89) — playwright visual check
- [x] App-session launch metadata from manifests / typed interfaces (v1.7.83) — appSessionConfig tests
- [x] OnlyOffice + Saleor removal (v1.7.83) — catalog tests
- [ ] Bitcoin receive UI flow end-to-end (v1.7.87/88) — needs live LND node check
- [ ] Fleet tab keeps node list/alerts during refresh, names not hashes (v1.7.85/86) — store tests?
- [ ] Credential interstitial full-screen overlay (v1.7.87) — visual
- [ ] Mobile federation/system-update buttons full width (v1.7.86) — visual
-
-### Backend (cargo)
- [ ] LND REST no-proxy client + GET newaddress p2wkh (v1.7.88/89) — unit tests + live check
- [ ] LND wallet readiness/unlock after restart (v1.7.89) — unit + live
- [ ] Bitcoin trusted-node relay rpcauth/txrelay (v1.7.84) — unit tests exist? check
- [ ] Container scanner RAII in-flight guard (v1.7.84) — cargo test
- [ ] ElectrumX health-check startup window + cache tuning (v1.7.85/86)
- [ ] Portainer pin 2.19.4 / bitcoin-ui image pin (v1.7.84/85) — image-versions tests
- [ ] Fleet telemetry name/hostname/URL fields (v1.7.85)
- [ ] Federation no self-import (v1.7.85)
- [ ] Kiosk safe-area + self-update refreshes kiosk files (v1.7.84)
- [ ] Wi-Fi scan error/retry/escaped SSID/open networks (v1.7.84)
-
-### OTA / updates (task 5)
- [ ] .116 stuck: current 1.7.85-alpha, `update_in_progress: true` since 1.7.88 attempt — diagnose+fix
- [ ] Updater hardening: stuck-in-progress recovery, resumable/atomic apply, verify post-restart version
-
-## OTA diagnosis on .116 — ROOT CAUSES FOUND + FIXED (code staged for v1.7.89)
-
-Four bugs, all reproduced from the journal (Jun 12 03:45–04:33):
-
-1. Post-OTA probe only tries `https://127.0.0.1/`; .116's nginx binds only :80 (443 is
-   tailscale's) → connection refused × 18 → a GOOD 1.7.85 update was "rolled back".
-   FIX: probe falls back to `http://127.0.0.1/` on connect error (update.rs probe_frontend_once).
-2. That rollback's binary restore did `host_sudo cp` onto the RUNNING binary → ETXTBSY exit 1
-   → binary stayed 1.7.85 while web-ui rolled back to 1.7.84 (mismatch confirmed live).
-   FIX: rollback now cp→tmp→atomic mv, same pattern as apply (update.rs rollback_update).
-3. The rollback chown'd `update-backup/archipelago` root:root IN PLACE → next apply's
-   fs::copy (as service user) hit EACCES → "Failed to backup current binary" × 3 → 1.7.86/88
-   never applied. FIX: apply unlinks stale backup first; rollback chowns only its temp copy.
-4. Failed apply left `update_in_progress: true` wedged (staging still populated so the
-   stale-flag guard never fires). Unwedged operationally; fixed structurally by 1–3.
-
-Operational cleanup DONE on .116 (2026-06-12 17:15): removed root-owned
-`update-backup/archipelago`, stale `update-staging/` (1.7.86), and the stale
-`update-pending-verify.json`. Next state load clears `update_in_progress`.
-NOTE: live web-ui is 1.7.84 / binary 1.7.85 (mismatch from bug 2). Not hand-patched —
-the v1.7.89 OTA will resync both. Good 1.7.85 frontend is quarantined at
-`/opt/archipelago/web-ui.failed.1781250438247`.
-Verification plan: after v1.7.89 release, watch .116 auto-apply (schedule auto_apply),
-confirm `update_state.json.current_version == 1.7.89-alpha` and web-ui version matches.
-
-## Test harness (task 4) — CREATED at tests/release/run.sh
-
- Stages: static (git diff --check, cargo fmt, catalog drift, optional --manifest),
-  frontend (type-check, full vitest), optional --with-build (build + grep dist for version),
-  backend (cargo check + focused cargo test: update:: lnd container::image_versions scanner,
-  all wrapped in `timeout`), optional --live URL smoke (/, /aiui/, /rpc/v1).
- Results so far (2026-06-12): type-check PASS, full vitest 645/645 PASS, cargo fmt PASS,
-  cargo check PASS, catalog drift PASS (3 pre-existing MISSING_CATALOG warnings, exit 0,
-  identical on HEAD). Focused backend cargo tests running (first run hit the known slow
-  test-compile on .116 at 400s timeout; rerunning with 1500s).
- AIUI embed verified end-to-end via playwright on :8101 (mock backend): iframe loads,
-  `ready` handshake clears the loading overlay, hideClose honored.
- Release flow confirmed: commit all → `scripts/create-release.sh 1.7.89-alpha` (validates
-  curated CHANGELOG section, builds, manifests, commits, tags) →
-  `scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2` → push origin main + tags.
-  Tarball layout/perms safety is already inside create-release-manifest.sh.
- CHANGELOG v1.7.89 section rewritten layman-readable (updater fixes added).
-
-## Release gates for v1.7.89-alpha (task 6)
-
-1. All harness stages green locally.
-2. OTA fix for stuck `update_in_progress` included + .116 updates successfully to the new release.
-3. Frontend build: grep packaged tarball for "1.7.89-alpha" before shipping (memory: silent vue-tsc failures).
-4. Flat tarball layout (`tar -C web/dist/neode-ui .`).
-5. Commit, tag `v1.7.89-alpha`, push origin + gitea-local + tags, publish release assets, verify
-   manifest + node OTA picks it up.
--- a/docs/app-registry-status-2026-06-21.md
+++ b/docs/app-registry-status-2026-06-21.md
@ -0,0 +1,153 @@
+# Archipelago App Registry — Status Survey
+
+**Generated:** 2026-06-21 · **Survey node:** .228 (archi resilience node, 14-app) · **Binary:** v1.7.99-alpha
+
+This document inventories every app in the registry and reports, per app:
+manifest-based or not · installed on .228 · migration status (Quadlet/legacy) ·
+automated test coverage / release-gate status.
+
+---
+
+## 1. Architecture context — "manifest-based or not"
+
+**Every registry app is manifest-based.** That is the core architecture
+(Pillar 4, *data-driven apps*): install/uninstall needs only the app's
+`manifest.yml` + catalog entry — no host OS changes, no archipelago binary code
+per app. The live registry on .228 is **40 loaded manifests**
+(`Loaded 40 app manifest(s) from disk`).
+
+The **only** non-manifest runtime units are:
+
+- **4 companions** — `archy-bitcoin-ui`, `archy-lnd-ui`, `archy-electrs-ui`,
+  `archy-fedimint-ui`. Built from `docker/<name>` contexts via
+  `core/archipelago/src/container/companion.rs`, *not* the manifest registry.
+- **Stack sub-containers** — `immich_*`, `indeedhub-*`, `netbird-*`. Spawned by
+  their parent manifest app.
+
+---
+
+## 2. Migration status (Quadlet-everywhere — Pillar 1)
+
+"Migrated" = runs as a **Quadlet unit under `user.slice`**, so it survives an
+`archipelago.service` restart (legacy in-cgroup containers get SIGKILLed on
+restart and reconciled back).
+
+On .228 migration is **effectively complete** — every installed app is
+`QUADLET:running` **except one**:
+
+| Status | Apps |
+|---|---|
+| ✅ Migrated (Quadlet / user.slice) | bitcoin-knots, electrumx, lnd, fedimint, fedimint-clientd, fedimint-gateway, btcpay-server (+archy-btcpay-db, archy-nbxplorer), mempool, mempool-api, archy-mempool-db, indeedhub (+7 sub-containers), netbird (+server, +dashboard), vaultwarden, jellyfin, filebrowser, portainer, botfights, nostr-rs-relay, homeassistant, + 4 companions |
+| ⚠️ NOT migrated (legacy, service cgroup) | **immich_server** — still in `/system.slice/archipelago.service`. The only legacy holdout. (`immich_postgres`/`immich_redis` are pod members.) |
+
+---
+
+## 3. Exhaustive per-app registry table
+
+| App (registry id) | Manifest | Installed on .228 | Migration | Test coverage |
+|---|---|---|---|---|
+| bitcoin-knots | yes | ✅ | QUADLET | **L1 RPC ●**, L2 UI ● |
+| bitcoin-core | yes | ✗ (shares knots) | — | ◐ regression-gate |
+| lnd | yes | ✅ | QUADLET | **L1 RPC ●**, L2 ● |
+| electrumx | yes | ✅ | QUADLET | **L1 RPC ●**, L2 ● |
+| btcpay-server | yes | ✅ | QUADLET | **L1 RPC ●**, L2 ● |
+| mempool | yes | ✅ | QUADLET | **L1 RPC ●**, L2 ● |
+| mempool-api | yes | ✅ | QUADLET | via mempool stack |
+| archy-mempool-db | yes | ✅ | QUADLET | via mempool stack |
+| archy-mempool-web | yes | ✗ | — | via mempool stack |
+| archy-btcpay-db | yes | ✅ | QUADLET | via btcpay stack |
+| archy-nbxplorer | yes | ✅ | QUADLET | via btcpay stack |
+| fedimint (Guardian) | yes | ✅ | QUADLET | L1 ◐ container-only, L2 ● |
+| fedimint-clientd | yes | ✅ | QUADLET | none |
+| fedimint-gateway | yes | ✅ (this session) | QUADLET | none |
+| filebrowser | yes | ✅ | QUADLET | L2 probe-only |
+| indeedhub | yes | ✅ | QUADLET | none |
+| jellyfin | yes | ✅ | QUADLET | none |
+| vaultwarden | yes | ✅ | QUADLET | none |
+| portainer | yes | ✅ | QUADLET | none |
+| botfights | yes | ✅ | QUADLET | none |
+| nostr-rs-relay | yes | ✅ | QUADLET | none |
+| home-assistant | yes | ✅ (container `homeassistant`) | QUADLET | none |
+| netbird | yes | ✅ (+server, +dashboard) | QUADLET | none |
+| immich | yes | ✅ | ⚠️ **LEGACY** | none |
+| grafana | yes | ✗ (unit *activating*, no container) | staged | none |
+| strfry | yes | ✗ (unit *activating*) | staged | none |
+| ~~onlyoffice~~ | — | removed 2026-06-21 | — | — |
+| aiui | yes | ✗ | — | none |
+| core-lightning | yes | ✗ | — | none |
+| did-wallet | yes | ✗ | — | none |
+| gitea | yes | ✗ | — | none |
+| lightning-stack | yes | ✗ | — | none |
+| meshtastic | yes | ✗ | — | none |
+| morphos-server | yes | ✗ | — | none |
+| nextcloud | yes | ✗ | — | none |
+| photoprism | yes | ✗ | — | none |
+| router | yes | ✗ | — | none |
+| searxng | yes | ✗ | — | none |
+| uptime-kuma | yes | ✗ | — | none |
+| bitcoin-ui | yes | runs as companion `archy-bitcoin-ui` | QUADLET (companion) | L3 companions ● |
+| lnd-ui | yes | runs as companion `archy-lnd-ui` | QUADLET (companion) | L3 companions ● |
+| electrs-ui | yes | runs as companion `archy-electrs-ui` | QUADLET (companion) | L3 companions ● |
+| fips-ui | yes | ✗ | — | none |
+
+Notes:
+- `home-assistant` (registry id) runs as container **`homeassistant`** — the
+  app-id ≠ container-name. A duplicate `home-assistant.service` quadlet unit
+  sits in *activating*; the live container is `homeassistant` (Up 6 days, healthy).
+- `grafana` / `strfry` have Quadlet `.container` units but the units are stuck
+  *activating* with **no running container** — staged, not live. Worth a
+  separate investigation.
+- `onlyoffice` was **removed from the registry on 2026-06-21**.
+
+---
+
+## 4. Test-gate reality
+
+**No app has passed the formal release gate.** The gate is `run-gate.sh` green
+across the full lifecycle matrix (install / UI reachable / stop / start /
+restart / reinstall / reboot-survive / archipelago-restart-survive / uninstall),
+**5× on .228 AND .198**. All 8 release-gate checkboxes in
+`tests/lifecycle/TESTING.md` are **unchecked (☐)**.
+
+What exists today:
+
+| Layer | Status |
+|---|---|
+| L0 unit | 631 tests ● green |
+| L1 RPC | ● for **6 core apps only**: bitcoin-knots, lnd, electrumx, btcpay, mempool, fedimint |
+| L2 UI | ● dashboard + 7 proxy paths + bitcoin-ui:8334 |
+| L3 lifecycle survival | companions ● ; backends ◐ (regression-gate only — fails until Phase-3 Quadlet flag flips by default) |
+| Per-app L1+L2 matrix | **50 of 110 cells** |
+| L4 browser / L5 chaos / L6 perf | ○ 0 — not started |
+
+Regression suites added after v1.7.90-alpha (run read-only, abort releases on
+failure): `bitcoin-receive.bats`, `port-drift.bats`, `secret-completeness.bats`.
+
+**The other ~30 registry apps have zero automated coverage.**
+
+---
+
+## 5. Key gaps
+
+1. **immich** is the last legacy (in-cgroup) app — migrate to Quadlet to finish Pillar 1.
+2. **grafana / strfry** Quadlet units stuck *activating* with no container — investigate. (onlyoffice removed 2026-06-21.)
+3. **fedimint-gateway / fedimint-clientd** (this session) now run but have no lifecycle test coverage.
+4. The formal **5× release gate has never been green** — it is the blocker for the v1.7.52 tag.
+
+---
+
+## 6. This session's changes (2026-06-21)
+
+- **Generated-secrets system** deployed to .228 (binary + manifests). Self-healing:
+  the root-owned `fedimint-gateway-hash` was regenerated archipelago-owned/readable
+  → **fedimint-gateway now starts** (gatewayd webserver up on :8176). `fmcd-password`
+  generated for fedimint-clientd.
+- **Guardian-UI CSS fix** applied on .228: rebuilt the stale `localhost/fedimint-ui:latest`
+  companion image (built 2026-06-12, pre-fix) from the corrected context
+  (`@guardian_assets` proxy fallback to :8177). Guardian's own CSS
+  (`/assets/bootstrap.min.css`, `/assets/style.css`) **404 → 200 text/css**.
+  Root cause: `companion.rs::ensure_image_present` skips rebuild when the
+  `:latest` image already exists, so the context fix never re-baked.
+
+*Survey method: live `podman` cgroup inspection on .228 + `/opt/archipelago/apps`
+manifest enumeration + `tests/lifecycle/TESTING.md`.*
--- a/docs/bitcoin-multi-version-design.md
+++ b/docs/bitcoin-multi-version-design.md
@ -0,0 +1,300 @@
+# Bitcoin Multi-Version Support — Design
+
+<!-- ════════════════════════════════════════════════════════════════════
+     PROGRESS TRACKER / RESUME POINT  (keep this current — update each session)
+     ════════════════════════════════════════════════════════════════════
+**Branch/worktree:** `bitcoin-multi-version` @ `/home/archipelago/Projects/archy-btcver`
+(isolated — never touch `main` or the other agent's branch). All work UNCOMMITTED on
+that branch as of last update.
+
+**Last updated:** 2026-06-28 (session 2 — software end-to-end implemented)
+
+**Motivation refresh:** BIP-110 signalling makes per-node version *choice* a real
+requirement — runners must be able to pick / pin / switch Core & Knots versions.
+
+**User direction this session:** finish the SOFTWARE end-to-end (Phase 1–3 + UI),
+DEFER the Phase 0 image build pipeline. Downgrade policy = **warn + confirm + allow**.
+
+### Status by phase
+- [x] **Phase 1 — catalog schema** (`app_catalog.rs`): `CatalogVersion` struct +
+  `versions[]` + `catalog_versions()` / `catalog_default_version()` /
+  `catalog_image_for_version()` (same-repo guard) DONE. Pin suppresses update badge
+  in `available_update_for_app()` DONE. `versions[]` now EMITTED by
+  `scripts/generate-app-catalog.sh` (curated `VERSIONS` map) → `releases/app-catalog.json`
+  regenerated; bitcoin-core carries its one built version (28.4.0, default). **Knots
+  versions[] intentionally empty** (only floating `:latest` exists; design forbids
+  advertising floating). More versions light up automatically once Phase 0 builds
+  tagged images and they're appended to the `VERSIONS` map.
+- [x] **Phase 2 — install-time selection**: `version_config.rs` (pin/auto-update
+  persistence + `is_downgrade()` + `auto_update_apps()`, unit-tested) DONE;
+  `install.rs` `persist_install_version_selection()` DONE; `prod_orchestrator.rs`
+  pinned-wins resolution DONE. **UI:** `MarketplaceAppDetails.vue` install panel shows
+  a version `<select>` (latest pre-selected) when the app offers ≥2 versions — passes
+  the choice to `package.install`. (Hidden today since only 1 version exists.)
+- [x] **Phase 3 — in-app switch + auto-update toggle**:
+  - `package.versions` RPC (read) + `package.set-config` RPC (write, downgrade-gated)
+    → new `api/rpc/package/set_config.rs`, wired in `mod.rs` + `dispatcher.rs`.
+  - Auto-update tick: `run_update_scheduler` now takes the orchestrator + calls
+    `apply_per_app_auto_updates()` hourly (opt-in, pin-respecting, catalog-driven).
+  - UI: "Version & Updates" card in `appDetails/AppSidebar.vue` (version switch +
+    auto-update toggle + downgrade warn/confirm); `rpc-client.ts` + types added.
+- [x] **Phase 0 — image build pipeline**: `scripts/build-bitcoin-image.sh` —
+  downloads the OFFICIAL upstream tarball + SHA256SUMS(.asc), verifies SHA-256 **and**
+  the OpenPGP signature (fail-closed; pinned release-key fingerprints), builds a
+  minimal **rootless** image (debian-slim + verified `bitcoind`/`bitcoin-cli`),
+  smoke-tests `--version`, tags + pushes `:<version>`. Validated on Core 31.0
+  (pinned-GPG pass, smoke `v31.0.0`). **Published curated set** (registry
+  `lfg2025`): Core **31.0, 30.2, 29.3, 27.2, 26.2, 25.2** (28.4 already present —
+  kept, not overwritten) + Knots **29.3.knots20260508**. `VERSIONS` map in
+  `generate-app-catalog.sh` lists them; catalog regenerated. Adding a future release
+  = run the script for it, then prepend it to the map + regenerate.
+
+### Verification status
+- `cargo check -p archipelago` GREEN (backend). Frontend `npm run build` GREEN
+  (vue-tsc typecheck passes; new RPC strings confirmed in `web/dist`).
+- Unit tests: `version_config` had a pre-existing parallel-test race (shared
+  process-global `ARCHIPELAGO_DATA_DIR`) — FIXED with an `ENV_LOCK` mutex + unique
+  per-test dirs. `set_config` `image_tag` test added.
+- **Phase 0 images verified end-to-end**: SHA-256 + pinned-maintainer OpenPGP
+  signature (deterministic VALIDSIG check), built rootless, smoke-tested, **pushed
+  to the live registry** — confirmed remotely: `bitcoin` tags
+  {25.2,26.2,27.2,28.4,29.3,30.2,31.0} + `bitcoin-knots:29.3.knots20260508`.
+- **NOT yet verified on `.228`** (CLAUDE.md invariant — do before any tag): install
+  bitcoin-core, open its page, switch/pin a version, confirm recreate. All code
+  UNCOMMITTED on the branch.
+
+### Gotchas captured (for resume)
+- `gpg --verify` exit code is unreliable on multi-sig `SHA256SUMS` — must parse
+  `--status-fd` VALIDSIG and require a pinned maintainer fpr (script does this).
+- `podman push` needs the sandbox disabled (`/var/tmp` is RO under the harness
+  sandbox) and `--tls-verify=false` (registry serves HTTP). Persistent keyring
+  (`BITCOIN_KEYRING_DIR`) avoids flaky per-build keyserver fetches.
+
+### Next action when resuming
+1. Re-verify: `cd archy-btcver/core && CARGO_INCREMENTAL=0 cargo check -p archipelago`
+   and `cargo test -p archipelago -- version_config set_config`; `cd neode-ui && npm run build`.
+2. Live-verify on `.228`: install bitcoin-core, open its detail page → "Version &
+   Updates" card; exercise `package.versions` / `package.set-config` via RPC.
+3. Commit on the branch (checkpoint).
+4. **Phase 0** when greenlit: build+push tagged Core/Knots images, then extend the
+   `VERSIONS` map in `scripts/generate-app-catalog.sh` and regenerate the catalog.
+
+### Decisions still needed from user (see §6 open questions)
+Curated version set + storage budget (defaulted to current+~3 majors); when to do
+Phase 0 image pipeline; pruned-node downgrade policy refinement (currently warn+confirm
+for all). Auto-update default = OFF (opt-in), as recommended.
+════════════════════════════════════════════════════════════════════ -->
+
+**Status:** design (2026-06-22)
+**Goal:** let a user choose *which* version of Bitcoin Core / Bitcoin Knots to
+install (latest pre-selected, older versions in a dropdown), and later switch
+versions or opt into auto-update — all manifest/catalog-driven, all served from
+**our signed registry**, rootless, with **zero data loss** across version
+changes.
+
+See also: [`docs/registry-manifest-design.md`](registry-manifest-design.md)
+(catalog distribution + signing this builds on),
+[`docs/PRODUCTION-MASTER-PLAN.md`](PRODUCTION-MASTER-PLAN.md) (gate that must be
+green first), `MEMORY → project_decoupled_app_updates`,
+`MEMORY → project_manifest_driven_north_star`.
+
+> **Scheduling:** this is net-new scope. It lands **after** the production test
+> gate (`tests/lifecycle/run-20x.sh`) is green on `.228` + `.198`. The data-
+> preservation invariant (downgrade vs. chainstate) is the highest risk here.
+
+---
+
+## 1. Where we are today
+
+### Image source / build
+| Thing | Today |
+|-------|-------|
+| `apps/bitcoin-core/Dockerfile` | `FROM bitcoin/bitcoin:24.0` — a **community** image, **stale** (manifest says 28.4), no project-official Docker image exists |
+| `apps/bitcoin-knots/` | **no Dockerfile** — `:latest` is built/pushed by hand |
+| Registry | `scripts/image-versions.sh` → `ARCHY_REGISTRY="146.59.87.168:3000/lfg2025"`; only `BITCOIN_KNOTS_IMAGE=…/bitcoin-knots:latest` pinned, no Core pin |
+| Tags in registry | **one tag per image**. No historical versions. |
+
+### Version pinning
+- `apps/bitcoin-core/manifest.yml` → `…/bitcoin:28.4` (pinned).
+- `apps/bitcoin-knots/manifest.yml` → `…/bitcoin-knots:latest` (**floating** — a
+  liability for reproducibility and for "switch back to the version I had").
+- `core/archipelago/src/container/app_catalog.rs` + `app-catalog/catalog.json`:
+  signed, hourly-fetched, carries `version` (badge text) + `image`.
+  `catalog_image_override()` overrides the manifest image **only if same-repo**.
+  `available_update_for_app()` already ignores floating tags for update
+  detection.
+
+### Install path
+- `prod_orchestrator.rs::install_fresh()` resolves the image as
+  **manifest image → catalog override → pull**. There is **no per-install
+  version parameter** — `orchestrator.install(app_id)` takes only the id.
+- RPC `package.install` (`api/rpc/package/install.rs`) *accepts* `dockerImage` /
+  `version` params but for orchestrator-managed apps (bitcoin-core / bitcoin-knots
+  are allowlisted) it **ignores them** and lets the orchestrator resolve.
+- **Conflict guard** (`prod_orchestrator.rs` ~1306–1325): core and knots may not
+  run simultaneously. Must be preserved by everything below.
+
+### UI
+- Install is **one-click, no modal** (`MarketplaceAppDetails.vue::installApp()`).
+- Update badge + "Update to X" already exist (`appDetails/AppHeroSection.vue`,
+  RPC `package.update`).
+- **No** Bitcoin-specific settings panel; all apps share `AppSidebar.vue`.
+- Per-app config persisted **only at install time** as `containerConfig` →
+  `/var/lib/archipelago/app-configs/<id>.json`. **No post-install set-config RPC.**
+
+---
+
+## 2. Source-of-truth decision: official upstream → our registry
+
+We use the **official releases** as upstream provenance, but nodes only ever pull
+from our registry. Nodes do **not** fetch bitcoin.org / GitHub at install time —
+that would break rootless/offline installs and the signed-registry trust model,
+and neither project publishes an official Docker image anyway.
+
+**Official sources (verified):**
+
+| Impl | Index | Per-version asset pattern |
+|------|-------|---------------------------|
+| Bitcoin Core | [bitcoincore.org/en/releases](https://bitcoincore.org/en/releases/) · [github bitcoin/bitcoin](https://github.com/bitcoin/bitcoin/releases) | `https://bitcoincore.org/bin/bitcoin-core-<ver>/bitcoin-<ver>-x86_64-linux-gnu.tar.gz` + `SHA256SUMS` + `SHA256SUMS.asc` |
+| Bitcoin Knots | [github bitcoinknots/bitcoin](https://github.com/bitcoinknots/bitcoin/releases) · [bitcoinknots.org/files](https://bitcoinknots.org/) | `https://bitcoinknots.org/files/<maj>.x/<ver>/bitcoin-<ver>-x86_64-linux-gnu.tar.gz` (`<ver>` e.g. `29.3.knots20260508`) |
+
+Both ship **signed binary tarballs** with multi-builder Guix attestations
+(`SHA256SUMS.asc`). The build pipeline verifies these **once, at build**; our DHT
+Phase 0 registry signature then carries provenance to the fleet.
+
+> Knots version strings embed a build date (`29.3.knots20260508`). Treat the full
+> string as the tag; surface a friendly `29.3` + date in the UI.
+
+---
+
+## 3. Design
+
+### Phase 0 — Reproducible, verified image pipeline *(prerequisite)*
+
+New `scripts/build-bitcoin-image.sh <impl> <version>` that, per version:
+
+1. Downloads the official tarball + `SHA256SUMS(.asc)` (GitHub release assets are
+   an identical mirror → fallback).
+2. Verifies SHA256 **and** the Guix/builder GPG signatures. **Fail closed.**
+3. Builds a minimal **rootless** image: pin a small base, unpack
+   `bitcoind`/`bitcoin-cli`. Keep the existing entrypoint probe
+   (`command -v bitcoind || find /opt -path '*/bin/bitcoind'`) so per-version
+   layout differences don't break startup.
+4. Tags + pushes `:<version>` **and** updates the default pin (`:latest` /
+   `:28.4`-style) to the registry.
+
+**Curate, don't mirror everything.** Publish a bounded set (proposal: current +
+last ~3 majors), e.g. Core `31.0, 30.0, 29.3, 28.4, 27.2` and Knots
+`29.3.knots…, 28.1.knots…, 27.1.knots…`. **`log` / document dropped versions** —
+silent truncation reads as "all versions supported" when it isn't.
+
+Also fixes existing debt: replaces the stale community `FROM bitcoin/bitcoin:24.0`
+and gives Knots a real Dockerfile + non-floating tags.
+
+### Phase 1 — Version catalog (signed, registry-distributed)
+
+Extend `AppCatalogEntry` (forward-compatible — no `deny_unknown_fields`, old nodes
+ignore it):
+
+```jsonc
+"bitcoin-core": {
+  "version": "31.0",                 // default / latest (existing field)
+  "image": "…/bitcoin:31.0",         // existing
+  "versions": [                      // NEW
+    { "version": "31.0", "image": "…/bitcoin:31.0", "default": true },
+    { "version": "30.0", "image": "…/bitcoin:30.0" },
+    { "version": "28.4", "image": "…/bitcoin:28.4", "deprecated": true, "eol": "2026-...." }
+  ]
+}
+```
+
+Published to `releases/app-catalog.json`, signed by the existing release-root
+mechanism. This is the **single source of truth** the UI reads for "what can I
+install / switch to," and third-party-registry apps inherit the capability for
+free. `version`/`image` stay as the default for back-compat.
+
+### Phase 2 — Install-time version selection
+
+- **Orchestrator:** add `install_with_image(app_id, Option<image_tag>)` (or an
+  optional arg on `install`). When a tag is supplied, **validate same-repo**
+  against the manifest (reuse `image_without_registry_or_tag()`), then override in
+  `install_fresh()`. Default path unchanged. Preserve the core/knots conflict
+  guard.
+- **RPC:** thread the selected version/image from `package.install` into the
+  orchestrator for the allowlisted apps (the param is already received — just not
+  forwarded).
+- **UI:** the first **install modal** in the app — latest pre-selected, dropdown
+  of `versions[]`, deprecated/EOL badges on old entries. On confirm, pass the
+  chosen version to `package.install`.
+
+### Phase 3 — In-app version switch + auto-update toggle
+
+- **UI:** a Bitcoin **"Version & Updates"** card (conditional in `AppSidebar.vue`
+  for `bitcoin-core` / `bitcoin-knots`): current version, a switch dropdown, and
+  an **auto-update-to-latest** toggle.
+- **Switch = controlled re-pull/recreate** reusing the `package.update`
+  machinery but targeting an arbitrary (incl. older) tag → effectively
+  `package.set-version`.
+- **Persistence:** new `package.set-config` RPC writing the existing
+  `app-configs/<id>.json` (`{ pinnedVersion, autoUpdate }`).
+- **Auto-update:** the existing hourly catalog check, when `autoUpdate:true`,
+  triggers `package.update` to the catalog default. A pinned version **suppresses
+  the update badge**.
+
+---
+
+## 4. Invariants & safety rails
+
+- **Rootless only.** Pipeline images and run path stay rootless; no Docker-socket,
+  no privileged.
+- **No data loss across version change.** Preserve `/var/lib/archipelago/bitcoin`,
+  secrets (`bitcoin-rpc-password`, `…-rpcauth`), ports, and the adoption container
+  name on every install / switch / update.
+- **⚠️ Downgrade vs. chainstate (highest risk).** Bitcoin Core refuses to start on
+  a chainstate written by a *newer* version unless reindexed (expensive, or data
+  loss on a pruned node). The UI **must** warn loudly on downgrade; the
+  orchestrator should gate/confirm it and never silently wipe. Pruned nodes can't
+  simply `-reindex`.
+- **Core ⇄ Knots switch** stays governed by the existing conflict guard; treat an
+  impl switch as distinct from a version switch.
+- **Floating tags** (`latest`) are never advertised as a selectable "version" and
+  never counted as an available update (already handled by
+  `available_update_for_app`).
+- **Verify on a real node** (`.228` then `.198`) and pass `run-20x` before any
+  tag.
+
+---
+
+## 5. Files / seams (no code yet)
+
+| Concern | File |
+|---------|------|
+| Image build/push | new `scripts/build-bitcoin-image.sh`; `apps/bitcoin-core/Dockerfile`; new `apps/bitcoin-knots/Dockerfile`; `scripts/image-versions.sh` |
+| Catalog schema | `core/archipelago/src/container/app_catalog.rs`; `releases/app-catalog.json` (+ `app-catalog/catalog.json`) |
+| Install override | `core/archipelago/src/container/prod_orchestrator.rs` (`install` / `install_fresh`); `api/rpc/package/install.rs`; `api/rpc/dispatcher.rs` |
+| Switch / set-config RPC | `api/rpc/package/update.rs`; new `package.set-config` handler; `app-configs/<id>.json` |
+| Install modal | `neode-ui/src/views/MarketplaceAppDetails.vue`; new `…/marketplace/AppInstallModal.vue` |
+| Version & Updates card | `neode-ui/src/views/appDetails/AppSidebar.vue`; `neode-ui/src/api/rpc-client.ts`; `neode-ui/src/types/api.ts` |
+
+---
+
+## 6. Open questions
+
+1. **Curated version set** — how many majors back do we host, and storage budget
+   on the registry?
+2. **Multi-arch** — fleet is x86_64 today; do any nodes need arm64 images?
+3. **Pruned-node downgrade policy** — block outright, or allow with an explicit
+   "this will require re-sync / may lose pruned data" confirmation?
+4. **Auto-update default** — off (opt-in) for a consensus-critical app like
+   Bitcoin? (Recommended: **off**, explicit opt-in.)
+5. **Knots date-suffix UX** — how to display `29.3.knots20260508` cleanly.
+
+---
+
+## Sources
+
+- [Bitcoin Core releases](https://bitcoincore.org/en/releases/)
+- [bitcoin/bitcoin releases](https://github.com/bitcoin/bitcoin/releases)
+- [bitcoinknots/bitcoin releases](https://github.com/bitcoinknots/bitcoin/releases)
+- [Bitcoin Knots](https://bitcoinknots.org/)
+- [bitcoin.org version history](https://bitcoin.org/en/version-history)
--- a/docs/ci-cd-plan.md
+++ b/docs/ci-cd-plan.md
@ -1,37 +0,0 @@
-# CI/CD Pipeline Plan
-
-## CI Workflow (on push to main + PRs)
-
-### Jobs
-1. **Rust checks**
-   - `cargo clippy --all-targets --all-features` (zero warnings)
-   - `cargo fmt --all -- --check`
-   - `cargo test --all-features`
-
-2. **Frontend checks**
-   - `npm run type-check` (vue-tsc)
-   - `npm run lint` (eslint)
-   - `npm test` (vitest)
-
-3. **Script validation**
-   - `bash -n` on all .sh files
-   - `shellcheck` on critical scripts
-
-### Merge policy
-All checks must pass before merge.
-
-## Release Workflow (on tag push v*)
-
-### Jobs
-1. Build Linux binary (cross-compile x86_64 + ARM64)
-2. Build frontend (`npm run build`)
-3. ISO build via SSH to build server
-4. QEMU smoke test of ISO
-
-## Pre-requisites
- GitHub Actions runners with Rust toolchain
- SSH key for build server access
- Branch protection on main
- Image digest manifest from `scripts/image-versions.sh`
-
-## Estimated implementation: 2 weeks
--- a/docs/current-state.md
+++ b/docs/current-state.md
@ -1,5 +0,0 @@
-# Current State
-
-> This document has been consolidated into [`architecture.md`](architecture.md).
->
-> See that file for the current system architecture, active nodes, codebase stats, and feature status.
--- a/Show More
+++ b/Show More