# Handover — fresh-ISO feedback bug-bash (2026-07-02) **For: the agent building the next ISO + fleet deploy.** All fixes below are **merged and pushed: gitea-ai main = `f5d24796`** (merge of `c375ecc4`, 65 files; branch `iso-feedback-fixes-2026-07-02` also pushed). Source feedback: user's fresh ISO install on a Framework (11th-gen Tiger Lake) machine, node `192.168.1.81` (SSH `archipelago` / `archipelago`). Diagnostic bundle: `/home/archipelago/incoming-logs/node-logs-192.168.1.81/`. **⚠️ Known-red tests on main (NOT from this work):** `trust::anchor:: unset_constant_is_none` + 2 `trust::signed_doc` tests fail because a prior commit pinned `RELEASE_ROOT_PUBKEY_HEX` without updating them. The signing/ audit agent's uncommitted changes in the shared tree fix exactly these — coordinate with them; don't "fix" it independently or you'll collide. This bug-bash branch alone was 898/898 green; merged with main it's 894/898 with only those three. ## ⚠️ Outstanding user request for the deploy - **Change .81's web-UI password to `ThisIsWeb54321@`** — the user forgot the current one. Node was unreachable from .116 during this session (flaky WiFi AP, IP flapped .68↔.81). Do this during deploy (SSH works from the user's machine; `archipelago`/`archipelago`). ## What changed (by file) ### Backend (core/archipelago/src) — builds clean, targeted tests pass - `api/handler/websocket.rs` — **subscribe BEFORE initial snapshot** (the "everything needs ctrl-r" root cause: broadcasts in the snapshot→subscribe gap were silently lost; a stale client never learned containers-scanned). - `main.rs` — crash check now runs BEFORE writing the PID marker (**crash recovery had never run on any node** — it always saw its own PID and skipped); tracing default demoted debug→info (journal volume). - `crash_recovery.rs` — PID-reuse guard (`process_is_archipelago`); new **pending-boot-starts registry** (names queued for recovery/reconcile) with writers in `recover_containers` + stack recovery. - `server.rs` — scanner overlays Stopped/Exited → **Restarting** for pending-boot-start ids (user ask: "status should be restarting if they are being restarted"); `SCANNER_RESTARTING` ownership set so scanner-authored Restarting resolves immediately instead of wedging in the 20-min transitional-preserve. - `container/prod_orchestrator.rs` — reconcile pass + `adopt_existing` register/deregister pending boot-starts; LND pre-start hook passes detected `bitcoin_host()` (Knots vs Core) into `lnd::ensure_config`; new `fedimint-clientd` pre-start hook (mkdir + chown 1000:1000 of `/var/lib/archipelago/fmcd` — self-heals the crash-loop). - `container/lnd.rs` — `ensure_config(paths, rpc_pass, bitcoin_host)`; bitcoind.rpchost no longer hardcoded `bitcoin-knots`; drift check rewrites host changes; +unit test `ensure_config_repairs_bitcoin_host_drift`. - `api/rpc/package/dependencies.rs` — bounded **dependency wait** (`wait_for_install_deps`, 36×5s): installed-but-starting deps wait with "Waiting for Bitcoin to start…" on the card; not-installed deps fail fast with `DependencyGateError` marker; +5 unit tests. - `api/rpc/package/install.rs`, `stacks.rs` — call sites wired to `gate_install_deps` (lnd/electrumx/mempool/btcpay). - `api/rpc/package/async_lifecycle.rs` — `DependencyGateError` removes the optimistic entry (**no more phantom "Stopped" LND tile**) + pushes an Error notification with the reason. - `api/rpc/package/progress.rs` — `set_install_message` helper. - `api/rpc/seed_rpc.rs` — `save_pending_seed_encrypted`; seed.restore also stashes the mnemonic; `auth.rs` — **auth.setup persists the encrypted seed backup** (recovery-phrase reveal previously failed on EVERY node because nothing ever wrote `master_seed.enc`). - `api/rpc/middleware.rs` — sanitizer allowlist extended (seed/2FA/auth errors reach the user instead of "Check server logs"); +2 tests. - `bitcoin_status.rs` — friendly status for "connection reset" (bitcoind starting); raw URL/os-error chains no longer shown; +3 tests. - `bootstrap.rs` — journald drop-in self-heal (OTA nodes get log caps); bitcoin.conf printtoconsole heal. (Log-spam agent's work; verified.) - `api/rpc/package/config.rs` — bitcoin args `-printtoconsole=0`. ### Manifests / scripts / configs - `apps/lnd/manifest.yml` — BITCOIND_HOST now `derived_env {{BITCOIN_HOST}}`. - `apps/bitcoin-knots/manifest.yml`, `apps/bitcoin-core/manifest.yml` — `-printtoconsole=0` (90.6% of the journal was IBD UpdateTip spam; debug.log in the datadir keeps full logs). - `scripts/first-boot-containers.sh` — chown 1000:1000 of `/var/lib/archipelago/fmcd` in BOTH fmcd blocks (root-owned dir was the fedimint-clientd "Permission denied os error 13" crash-loop); printtoconsole=0. - `scripts/container-doctor.sh`, `scripts/reconcile-containers.sh` — printtoconsole=0. - `image-recipe/configs/journald-archipelago.conf` (NEW) — SystemMaxUse=500M, rate limits; baked by ISO builder + bootstrap self-heal. - `image-recipe/configs/nginx-archipelago.conf` — `/assets/` 404s no longer cacheable (the `always` immutable header could pin a missing background for a YEAR); HTTPS block gained the missing `/assets/` location (was silently serving index.html as images). - `image-recipe/configs/archipelago-kiosk.service` — MemoryMax 1500→2800M, MemoryHigh 1200→2200M (kiosk was riding reclaim-throttle = the lag). - `image-recipe/_archived/build-auto-installer-iso.sh` — kiosk launcher/service now spliced from `image-recipe/configs/` at build time (was a stale inline heredoc that force-disabled GPU); **+ `firmware-intel-graphics` + `firmware-amd-graphics`** (Debian trixie split the i915 DMC blobs out of firmware-misc-nonfree; the .81 kernel logged tgl_dmc missing). ### Frontend (neode-ui) — vue-tsc clean, vitest green - `views/Login.vue` — Enter in field 1 → focus confirm; Enter in confirm → submit; submit button always clickable (shows inline mismatch/length error instead of being silently disabled); errors clear on input; **Restart Onboarding needs a confirming second click** (5s window) — this button is the likely cause of the "onboarding restarted after mismatch" report. +`login.restartConfirm` key in en/es locales. - `stores/sync.ts` — 30s staleness reconciliation (server.get-state) while connected; already-connected fast path now refetches too. - `composables/useContainersScanTimeout.ts` (NEW, +tests) — 20s escape hatch; wired into `Apps.vue` / `Discover.vue` / `Marketplace.vue`; fresh empty node reaches the real "no apps yet" empty state; "Checking…" can never persist. - Backgrounds: 10 heaviest bg JPEGs → **WebP q90** (9.4MB→6.6MB; refs updated in OnboardingWrapper/Dashboard/useRouteTransitions); 7 remaining images stayed JPEG (WebP came out LARGER on those — noisy sources; deliberate). - `public/assets/video/video-intro.mp4` — re-encoded CRF20 (SSIM 0.988) with **+faststart** (moov was at EOF → browser had to download all 15MB before playing = the intro lag). 12.7MB now, streams immediately. - LND icon: stale dist artifact; any fresh `npm run build` ships `app-icons/lnd.png` correctly. ## Verification done here - `cargo build -p archipelago` + `cargo check` clean; targeted tests (bitcoin_status, middleware sanitize, dep_wait, lnd, crash_recovery, boot_reconciler, bitcoin_host, prod_orchestrator lnd hooks): **52 passed, 0 failed**. Full suite: **898 passed, 0 failed, 1 ignored** (22s). - `npm run build` green; dist verified: 10 bg-*.webp present, `lnd.png` icon present, `restartConfirm` string in bundle, optimized faststart video (12,740,782 bytes) in place. Note: main had a latent build breaker (unused template ref in `Web5ConnectedNodes.vue` from commit 8256fde1, vue-tsc TS6133) — fixed here by removing the dead ref/binding; without this fix `npm run build` fails on current main. - vitest: new composable tests + related suites pass. - `bash -n` clean on all touched scripts; nginx conf live-verified by agent (200/404/cache headers on both HTTP+HTTPS blocks). - ISO kiosk splice byte-verified against configs/ by agent simulation. ## NOT done / left for you 1. **Full test-suite run + gate**: run the complete `cargo test` and (after deploy) `tests/lifecycle/run-gate.sh` ON .228 per CLAUDE.md before any tag. 2. **Frontend bundle grep before shipping** (per memory/feedback): verify new strings (e.g. `restartConfirm`, `bg-home.webp`) in the built tarball. 3. **Diagnostics collector** (`data-dir-listing.txt` = 15MB of podman overlay internals; dmidecode empty) — collector script wasn't found in this repo (likely lives on-node or in the user's collection script); fix when found. 4. **podman healthcheck cgroup EPERM spam** (1,250 journal errors, healthchecks unreliable fleet-wide) — real open bug, Quadlet-phase territory, NOT fixed. 5. **DP link-training failures on .81** (display corruption) — likely cable/dock/port hardware; firmware fix may help; tell user to try another cable/port if corruption recurs. 6. **LoRa/RNode onboarding surface** — never scoped; user may want it as a feature (mesh device-found modal exists only on Mesh page post-login). 7. The concurrent audit agent's files (`docs/1.8.0-RELEASE-HARDENING-PLAN.md`, `core/.../trust/*`, parts of `bootstrap.rs`) are ALSO uncommitted here — coordinate before committing; don't mix attribution.