archipelago 987a961f4a fix(nginx): self-heal fedimint asset rewrite on deployed nodes — HTTP + HTTPS (B13)
The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their
old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the
Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")).
Those resolve against the SPA root: bg-network.jpg exists there by luck, but
app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the
visibly-broken icon.

bootstrap.rs::patch_nginx_conf now heals both paths on startup:
- Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail
  for the full reroot set; byte-matches the shipped template.
- Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no
  sub_filter and a per-node-varying trailing directive, so anchor on the unique
  :8175 proxy_pass and insert the reroot set after it (nginx ignores directive
  order). Snippet added to the bootstrap nginx loop (skipped on HTTP-only nodes).

missing_* flags are now gated on their splice anchors so the included snippet
neither attempts the main-conf-only patches nor logs warn-skips every boot.
Idempotent via the 'href="/' 'href="/app/fedimint/' marker.

Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t
OK; containers survived restart (Quadlet); idempotent steady state, no warn spam.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:03:04 -04:00


▶▶ SESSION SAVE / RESUME (2026-06-15)

State: v1.7.96-alpha SHIPPED. v1.7.97-alpha NOT cut yet — 10 fixes committed on vps2 main (git remote: gitea-vps2), nothing on the fleet yet. Validate on .116/.198 + UI-confirm BEFORE cutting .97.

Resume command (run elsewhere):

cd ~/Projects/archy && git fetch gitea-vps2 && git checkout main && git reset --hard gitea-vps2/main && cat tests/production-quality/TRACKER.md

Then continue from "IN PROGRESS" below.

Committed & ready for .97 (vps2 main): B5 (LND CORS, verified .116/.198/.103), B1, B2, B4, B14, B21, B3 (incl. /api/peer-content nginx via bootstrap), B15, B7, B13 (fedimint CSS self-heal — main conf + HTTPS snippet, verified .198 both paths app-icon 404→200). B6 pruned-gate already live.

IN PROGRESS — pick up at B12. B13 DONE (committed this session; bootstrap.rs self-heals both the main conf and the HTTPS app-proxy snippet — see B13 entry below for full verification). REMAINING:

  1. B12 (mempool host detect — stacks.rs:1278 hardcodes CORE_RPC_HOST=bitcoin-knots; fails on bitcoin-core nodes → dynamic host detect; backend, medium risk, test .116).
  2. Then B16 (bitcoin status retain — UI-test), B6 no-node-present half, B14b (FIPS reachability depth), B22/B23 (peer download + group chat — need live repro), B9/B10/B11/B17/B18/B19, B8 (low), B20 (mesh-headers feature).

Note: .198 is currently running a sideloaded .97-dev binary (md5 4c83803d, built from this B13 commit) — NOT an official release. Reflashing/OTA will replace it.

Ship .97 when ready: ./scripts/create-release.sh 1.7.97-alpha (curate CHANGELOG ≥3 layman bullets first + run scripts/sync-whats-new.py; SKIP_RELEASE_TESTS=1 only for the 2 known-flaky vitest timing tests) → scripts/publish-release-assets.sh 1.7.97-alpha gitea-vps2 → git push gitea-vps2 main + tag. (gitea-local push fails: token rejected — non-blocking.)


Production-Quality Bug Tracker

Living tracker for the post-v1.7.96 "no new features until production quality" push. Updated continuously as we investigate → fix → test → pass. Kept in-repo so progress survives a session cutoff.

Rules (from user, 2026-06-15)

  • No new features until the OS is production / no-bugs quality.
  • Test-harness-first: build/extend a harness for each bug before fixing.
  • Validate every fix on .116 + .198 (both 192.168.1.x, pw ThisIsWeb54321@) + the harness BEFORE it goes into any release. (.198 still carries the LND CORS nginx duplicate → good for fix-(a) validation; .116 does not.)
  • Priority order: cloud/federated-nodes + mesh FIRST, then app-specific, then low-pri.

Status legend

TODO · INVESTIGATING · ROOT-CAUSED · FIXING · TESTING (on .116+harness) · PASSED · SHIPPED

Release status

  • v1.7.96-alpha — SHIPPED (2026-06-15). Live on vps2 (primary OTA): manifest v1.7.96-alpha, assets HTTP 200, main@8c3c7954 + tag present. Contents: kiosk grid removal + FIPS TCP/UDP anchor selector. NOTE: gitea-local (localhost) mirror push failed (token rejected → /login); non-blocking, needs refreshed token.
  • v1.7.97-alpha — IN PROGRESS (this push). Will bundle the verified fixes below.

🔴🔴 TOP PRIORITY

B5 — LND "connect your wallet" details/QR broken fleet-wide — PASSED (root-caused, both fixes verified on .116/.198/.103; ships in .97)

Origin: user escalation. Symptom: LND connect screen (served on app port :18083) can't load details/QR. Two distinct root causes (confirmed live):

  • (a) Duplicate ACAO on /lnd-connect-info (seen on .103): backend sets Access-Control-Allow-Origin (proxy.rs:108) AND nginx add_header adds a second → browser rejects "multiple values". nginx config drift. Fix: bootstrap.rs nginx patch must strip the redundant add_header from the /lnd-connect-info location (backend owns CORS).
  • (b) No ACAO on /proxy/lnd/v1/* 401 (fleet-wide): the unauth/auth-layer 401 is produced before the CORS-adding proxy handler (proxy.rs:135 handle_lnd_proxy). Browser → "No 'Access-Control-Allow-Origin' header". Fix: ensure auth-layer/early-return responses for /proxy/lnd + /lnd-connect-info carry CORS headers.
  • .116 /lnd-connect-info returns a single correct ACAO → symptom varies by node's nginx state.
  • Backend CORS helper: handler/mod.rs app_cors_origin() (:270) — reflects Origin when its host == request host.
  • Backend change → ships in .97. Status: PASSED — verified on .116, .198, .103 (harness 4/4 each). Ready to bundle into .97.
  • Caveat: bootstrap's nginx dup-strip runs a few seconds AFTER /health goes green (async patch+reload) — converges within ~1 min of restart; not instant. Acceptable.
  • CODE CHANGES MADE (uncommitted):
    • core/archipelago/src/bootstrap.rs: added NGINX_LND_DUP_CORS const + strip in patch_nginx_conf() (removes the duplicate nginx add_header ACAO from /lnd-connect-info so the backend's single header wins). Idempotent; runs on startup nginx bootstrap. → fixes (a)
    • core/archipelago/src/api/handler/mod.rs: new unauthorized_cors(origin) helper (:~205) + /proxy/lnd/ route (:~505) computes origin first and returns unauthorized_cors so the 401 carries ACAO. → fixes (b)
    • Test on .116 for (b); test on .103 for (a) [.116 has no dup to strip].
    • 2026-06-15 RESULT — .116 (fix b): harness 4/4 PASS (sideloaded built binary, restarted). /proxy/lnd/v1/* now returns CORS on the 401.
    • (Correction: an earlier "LND container MISSING" reading was a FALSE alarm — docker isn't in the non-interactive PATH; runtime is podman. Verified lnd Up 9h — containers SURVIVED the restart cleanly.)
    • Next: deploy to .103 + run harness to confirm fix (a) (nginx dup strip).
  • Harness: tests/production-quality/lnd-cors-test.sh <node> — asserts single correct ACAO on /lnd-connect-info + ACAO present on /proxy/lnd/v1/{getinfo,channels}. Baseline (2026-06-15): .116 = 2 pass/2 fail (proxy missing ACAO); .103 = 1 pass/3 fail (connect-info dup + proxy missing).
  • FIX PLAN (precise):
    1. (b) handler/mod.rs:504-508 /proxy/lnd/ returns Self::unauthorized() (401, NO CORS) when session check fails → browser CORS wall. Add CORS (app_cors_origin) to that 401. Same pattern for any other app-origin early-return.
    2. (a) nginx /lnd-connect-info location double-adds ACAO (backend + nginx add_header). Strip the nginx add_header Access-Control-Allow-Origin there; backend owns CORS. Update bootstrap.rs nginx patch to remove it on existing nodes (idempotent).
    • Verify: rebuild backend, deploy to .116, run harness → expect every assertion (4/4) to PASS on .116 AND .103.
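
A minimal sketch of fix (b), assuming the rule stated above for app_cors_origin() (reflect the Origin only when its host equals the request host). The names reflect_origin, hostname and unauthorized_headers are hypothetical and are not the functions in handler/mod.rs; ignoring the port in the comparison is also an assumption, made because the LND connect screen is served on its own app port.

```rust
/// Return the Origin to echo in Access-Control-Allow-Origin, or None.
/// Assumption: "same host" compares hostnames only and ignores the port.
fn reflect_origin(origin: Option<&str>, host_header: &str) -> Option<String> {
    let origin = origin?;
    // Strip the scheme: "http://node.local:18083" -> "node.local:18083".
    let authority = origin
        .strip_prefix("https://")
        .or_else(|| origin.strip_prefix("http://"))?;
    let origin_host = hostname(authority);
    if !origin_host.is_empty() && origin_host == hostname(host_header) {
        Some(origin.to_string())
    } else {
        None
    }
}

/// Drop a trailing ":port" (IPv6 literals are out of scope for this sketch).
fn hostname(authority: &str) -> &str {
    authority.split(':').next().unwrap_or(authority)
}

/// Headers for an early-return 401. The ACAO header is attached before the
/// response leaves the auth layer, so the browser can read the 401 instead
/// of reporting a missing-CORS error.
fn unauthorized_headers(origin: Option<&str>, host_header: &str) -> Vec<(String, String)> {
    let mut headers = vec![("Content-Type".to_string(), "application/json".to_string())];
    if let Some(o) = reflect_origin(origin, host_header) {
        headers.push(("Access-Control-Allow-Origin".to_string(), o));
        headers.push(("Vary".to_string(), "Origin".to_string()));
    }
    headers
}

fn main() {
    let headers = unauthorized_headers(Some("http://192.168.1.116:18083"), "192.168.1.116");
    for (name, value) in &headers {
        println!("{name}: {value}");
    }
}
```

Emitting exactly one ACAO value from the backend is also what makes fix (a) necessary: any extra add_header in nginx would produce the "multiple values" rejection again.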

🔴 PRIORITY — cloud / federation / mesh

B1 — Trusted-node list divergence across nodes — FIXED (committed ed493106; awaiting .97)

Dupes, erroneous names, and non-convergent group membership across nodes. Expected: trusted nodes form a transitive group (every node connects to any newly-added trusted node; all nodes show the same set). .103 has a long/dirty list.

B2 — Duplicate chat contact for one federated peer — FIXED (committed ed493106; awaiting .97)

Federated peer "sapien" shows TWO chats: one "sapien" WITHOUT archy logo (looks non-federated) + one named by raw DID did:key:z6MkoSbN5CM7fBaQg2nWbCymEkFXsHnuXvec9Mjo5RtJf9dQ. Same node keyed by both federated identity and raw DID → merge to one. Code: core/archipelago/src/mesh + mesh/typed_messages.rs (note :233 — meshcore adverts don't carry archy pubkey).

B3 — Cloud peer media won't preview/play — FIXING (code done: /api/peer-content streaming proxy + playMedia streams free content)

Music/video preview files on peer nodes' cloud don't play (streaming/range/content-type over mesh+Tor peer fetch).

B4 — Cloud my-folders <!doctype / 502 error — FIXED (committed ed493106; awaiting .97)

Cloud my-folders fails with Unexpected token '<', "<!doctype" when FileBrowser is absent (/app/filebrowser/api/resources → SPA index.html), and with 502 when FileBrowser is down (seen on .103). filebrowser-client.ts:102/:106. Fix: detect FileBrowser unavailable, friendly prompt; consider nginx returning JSON 404/502 for missing /app/<app>/ instead of SPA shell. Handle BOTH absent + down.

B14 — cloud browse transport not recorded — FIXED (record_peer_transport in 4 content handlers; build OK). NOTE: live data shows FIPS reaches only ~4/15 peers, 6 fall back to Tor genuinely → see B14b.

Browsing trusted/peer nodes in the Cloud tab connects over Tor instead of FIPS (should prefer FIPS like the rest of mesh; same for peer browsing). cf project_fips_integration, project_tor_node_to_node_works (last_transport should be fips/mesh).


🟠 APP-SPECIFIC

B6 — ElectrumX install gate — PARTIAL (pruned-node gate already works; "no node present" half DEFERRED: false-positive risk without UI test, needs package-presence check)

Show the yellow requirement badge when no full node / only a pruned node is present (reuse existing yellow badge pattern).

B7 — ElectrumX UI stuck loader on top — FIXED (overlay hides + iframe shows when status stale; type-check green). UI-confirm.

UI renders but a loader sits on top; possibly stale pre-sync screen not clearing.

B9 — IndeedHub keeps stopping on nodes — TODO

Container won't stay running (crash-loop / reconcile stop). Check logs + restart policy + health.

B10 — Immich still crashes — TODO

Recurring crash ("still" → prior attempts). Check container logs + resource limits + DB/ML deps.

B11 — Companion app: "open in external browser" apps don't work — TODO

Apps meant to open in a new/external browser don't launch from the companion app; need the phone-default-browser request-modal pattern mobile apps use. Relates to v1.7.90 "open in new tab from companion app".

B12 — Mempool not connecting — ROOT-CAUSED (stacks.rs:1278 hardcodes CORE_RPC_HOST=bitcoin-knots; fails on bitcoin-core nodes. Fix=dynamic host detect. Backend, medium risk, test .116)

Mempool can't reach the Bitcoin backend on some nodes. Investigate on .116. Check mempool→electrs→bitcoind wiring + deps.

B13 — Fedimint UI not applying CSS — FIXED + VERIFIED on .198 (both HTTP + HTTPS)

Root cause confirmed: the Fedimint Guardian page (served by :8175) is a server-rendered status page with ~7.8KB INLINE CSS plus image assets referenced root-rooted (src="/assets/img/app-icons/fedimint.jpg", url("/assets/img/bg-network.jpg")). Without an asset rewrite those /assets/... URLs resolve against the archipelago SPA root: bg-network.jpg happens to exist there (shared design asset → loaded by luck) but app-icons/fedimint.jpg does NOT → 404 (the broken/visibly-missing icon). The location /assets/ block uses try_files $uri =404, so missing fedimint assets 404 rather than fall through.

Fix = nginx sub_filter set that reroots every root-rooted asset URL (href="/, src="/, url("/, and single-quote variants) under /app/fedimint/, plus proxy_set_header Accept-Encoding "" so the upstream doesn't gzip (sub_filter can't rewrite gzipped bodies). Shipped two ways:

  • Fresh ISOs (committed a50b6df2): templates image-recipe/configs/nginx-archipelago.conf (HTTP) + image-recipe/configs/snippets/archipelago-https-app-proxies.conf (HTTPS).
  • Already-deployed nodes (bootstrap self-heal, this commit): core/archipelago/src/bootstrap.rs::patch_nginx_conf now heals BOTH the main conf (Style A — swaps the old single nostr-provider sub_filter tail for the full reroot set, byte-matches the shipped template) AND the HTTPS app-proxy snippet (Style B — anchors on the unique :8175 proxy_pass and inserts the reroot set; robust to the snippet's varying trailing directive). missing_* flags now gated on their splice anchors so the healed snippet early-returns cleanly (no per-boot warn-skips). Idempotent via the 'href="/' 'href="/app/fedimint/' marker.
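
A minimal sketch of the Style B heal described above, assuming the snippet is patched as plain text: find the proxy_pass line that targets :8175, insert the reroot set after it, and skip the file when the marker is already present. heal_snippet and the REROOT list are hypothetical; the real directive set lives in the shipped templates and also covers the single-quote variants, which are omitted here.

```rust
/// Marker that makes the patch idempotent (taken from the tracker entry).
const MARKER: &str = r#"'href="/' 'href="/app/fedimint/'"#;
/// Anchor: the port that only the fedimint proxy_pass uses.
const ANCHOR: &str = ":8175";

/// Assumed reroot set, for illustration only.
const REROOT: &[&str] = &[
    r#"proxy_set_header Accept-Encoding "";"#,
    r#"sub_filter 'href="/' 'href="/app/fedimint/';"#,
    r#"sub_filter 'src="/' 'src="/app/fedimint/';"#,
    r#"sub_filter 'url("/' 'url("/app/fedimint/';"#,
    "sub_filter_once off;",
];

/// Returns the healed config, or None when nothing should be written
/// (already healed, or the anchor is not in this file).
fn heal_snippet(conf: &str) -> Option<String> {
    if conf.contains(MARKER) {
        return None; // idempotent steady state
    }
    let mut out = String::with_capacity(conf.len() + 256);
    let mut patched = false;
    for line in conf.lines() {
        out.push_str(line);
        out.push('\n');
        let trimmed = line.trim_start();
        if !patched && trimmed.starts_with("proxy_pass") && trimmed.contains(ANCHOR) {
            // Reuse the anchor line's indentation for the inserted directives.
            let indent = &line[..line.len() - trimmed.len()];
            for directive in REROOT {
                out.push_str(indent);
                out.push_str(directive);
                out.push('\n');
            }
            patched = true;
        }
    }
    if patched { Some(out) } else { None }
}

fn main() {
    let conf = "location /app/fedimint/ {\n    proxy_pass http://127.0.0.1:8175/;\n}\n";
    match heal_snippet(conf) {
        Some(healed) => print!("{healed}"),
        None => println!("nothing to do"),
    }
}
```

Returning None when the anchor is absent mirrors the gating idea above: a file without the splice anchor is left alone silently rather than logged as a skipped patch on every boot.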

VERIFIED on .198 (sideloaded built binary, restart, async self-heal converged ~15s):

  • HTTP /app/fedimint/: live conf healed byte-identical to template; app-icon 404→200 image/jpeg (41944b).
  • HTTPS /app/fedimint/ (snippet): healed; same app-icon 404→200; bg-network 200; root /assets/img/app-icons/fedimint.jpg returns 200 text/html (SPA shell) — proving the reroot is necessary.
  • nginx -t OK both times; containers survived restart (Quadlet); both files carry the marker exactly once (idempotent steady state); no warn spam in logs. NOTE: self-healed snippet is functionally correct but NOT byte-identical to the fresh-ISO snippet template (insert-after-proxy_pass vs full block) — acceptable; nginx ignores directive order/whitespace.

B15 — Bitcoin UI sync progress lags — FIXED (Home.vue poll 30s→10s). UI-confirm.

Bitcoin UI doesn't update its sync progress fast enough even though the console clearly already has the block-height data. Likely a polling-interval / reactive-update gap between the status source and the UI.

B16 — Bitcoin sync status vanishes — DEFERRED (homeStatus.ts already partly retains last value; safe fix needs UI test to avoid showing stale-as-live; plan in findings)

The bitcoin sync status in the Home > System container disappears when it should persist/cache and show an "updating" state. Related to B15 (Bitcoin UI sync lag). Likely the status component clears on empty/transitional poll instead of retaining last-known + showing updating.

B17 — archipelago.service flaps on boot before starting — TODO

On some boots, [FAILED] Failed to start archipelago.service - Archipelago Backend prints ~20 times over ~5 min before it finally starts properly. Likely a startup dependency/timing race (DB lock, port bind, crash-recovery, or a dependency not ready) causing a systemd restart loop until a precondition is met. Check the service's Restart=/RestartSec settings, ExecStartPre gates, and what the early failures log. May tie to B16/crash-recovery.

B18 — Apps stop right after install (or become unstartable) — TODO

Many apps install but immediately stop, requiring a manual Start — or become unstartable entirely. Likely the install→start handoff / reconciler doesn't bring them up (or starts then they exit). Related to B9 (IndeedHub stopping), B10 (Immich). Possibly linked to the cgroup-SIGKILL-on-archipelago.service-restart issue (feedback_no_systemctl_deploy_until_quadlet) — but NOTE: on .116 (Quadlet) containers survived a service restart cleanly, so the reconciler may be fine there; reproduce on the affected nodes. Check post-install start sequencing + boot_reconciler + container restart policy + cgroup placement.

B19 — Failed download-update lands on Install button (should be Download) — TODO

When an update download fails, the UI sometimes shows the Install button instead of returning to the Download button — a big UX issue (user can't retry the download cleanly). Check the SystemUpdate state machine's error/failure transition.

B20 — Surface bitcoin-headers-over-mesh broadcast (send/receive toggles) — TODO (feature-adjacent, surfacing existing work)

We previously broadcast bitcoin block headers over mesh to archipelago nodes but never fully surfaced it. Want two switches: "send headers" (you broadcast) and "receive headers" (you accept). NOTE: this is feature-adjacent — surfacing existing functionality; the user added it during the no-new-features push, so treat as low-priority polish until the bug list is clear. Code: mesh block-headers (mesh.block-headers RPC seen in logs; core/archipelago/src/mesh).

B14b — FIPS reachability: many peers fall back to Tor — INVESTIGATED (needs FIPS-network depth)

Live (2026-06-15) federation sync last_transport on .116/.198: ~4 peers fips, ~6 tor, ~5 none. So beyond the recording fix (B14), FIPS genuinely doesn't reach many federated peers (they use Tor). Investigate WHY: is fips_npub known for those peers? are they FIPS-online? is the shared anchor connecting them? (cf project_fips_integration, project_tor_node_to_node_works). This is the real "Tor not FIPS" depth.

FINDINGS (.198, 2026-06-15): archipelago-fips ACTIVE; ALL 13 peers HAVE fips_npub; last_transport = 5 fips / 5 tor / 3 none. So it's NOT a missing-npub or service-down bug — FIPS genuinely reaches some peers and not others = DIAL-TIME reachability: the 'tor' peers aren't FIPS-reachable at dial time (offline, NAT, their FIPS not registered with the shared anchor), and 'none' = fully offline (X250 roam/beta/cellular).

NEXT (deeper, needs FIPS-network debugging): verify a known-online peer (e.g. .228/.116) is reachable over FIPS from .198 right now; if an online FIPS peer still falls back to Tor → real anchor/registration bug; check fips daemon peer table + anchor connectivity. Likely partly peer-availability (not fully fixable in code).

B21 — Show Tor/FIPS transport pill on cloud browse — FIXED (build+type-check green; deploy+UI-confirm on .116/.198)

Tag whether the peer connection is Tor or FIPS and surface it as a small pill on the cloud browse screens / connection loader. Data source: federation node last_transport (now recorded by B14) exposed via federation.list-nodes; frontend renders a pill (FIPS=fast/green, Tor=slower) on PeerFiles.vue / Cloud peer view + the connection loader. Frontend-only-ish.

FINDINGS: PeerFiles.vue:46 loader HARDCODES 'Connecting via Tor...' even when FIPS used (bug). Frontend types already have last_transport ('fips'|'tor'|'mesh'|'lan') federation/types.ts:31; NodeList.vue:167 already renders a transport indicator.

PLAN: have content.browse-peer RETURN the transport used (B14 already computes it) → frontend shows a pill (FIPS green / Tor amber) on PeerFiles header + fix the loader text to reflect actual/attempted transport. Small backend (add transport to browse response) + frontend pill.

B22 — Peer cloud download/audio errors (.228→.198) — TODO (pairs with B3)

Observed 2026-06-15 browsing .228's cloud from .198:

  • (a) Downloading a peer cloud file → "Operation failed. Check server logs for details."
  • (b) Playing a peer AUDIO file → "Could not play audio. File Browser may not be running." (misleading: it's a peer file, not File Browser; that's the OLD base64/blob path B3 replaces).

ACTION: (a) check the content.download-peer backend error in the .198 logs while downloading (likely the same Range/transport/timeout path as B3, or a peer-side 4xx); (b) verify B3 streaming fixes peer audio once deployed, and fix the misleading audioPlayer error string. Get server logs: ssh .198, then journalctl -u archipelago | grep -iE 'content|peer|download' (-E is required; without it grep treats | as a literal character and matches nothing).

B23 — Archipelago group chat (all nodes) broken/slow over Tor — TODO (PRIORITY, mesh)

The all-nodes "Archipelago group" chat (over Tor) doesn't seem to work. Facets:

  • (a) Group delivery unreliable / "doesn't work" over Tor.
  • (b) Messages may just be VERY SLOW (latency — likely Tor-only path; should use FIPS+Tor per the new transport method like B14, preferring FIPS).
  • (c) Add the SENDER CONTACT NAME to each message so you can differentiate who sent what (group messages lack attribution).
  • (d) Messages sometimes DUPLICATED (dedup by message id / sender_seq — cf mesh.ts:73 cross-transport identity (sender_pubkey, sender_seq); duplicate likely from receiving same msg over both transports or re-broadcast). Code: core/archipelago/src/mesh (typed_messages, listener), frontend Mesh.vue/stores/mesh.ts. Relates to B2 (identity), B14/B14b (transport). Test on .116/.198 (+ a Tor-only peer like .228).
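
A minimal sketch of facet (d), assuming duplicates arrive because the same message is received over both transports and that (sender_pubkey, sender_seq) identifies a message across transports, as the mesh.ts note suggests. The Dedup type is hypothetical and keeps an unbounded set; a real listener would presumably cap or expire entries.

```rust
use std::collections::HashSet;

/// Tracks which (sender_pubkey, sender_seq) pairs were already delivered.
struct Dedup {
    seen: HashSet<(String, u64)>,
}

impl Dedup {
    fn new() -> Self {
        Dedup { seen: HashSet::new() }
    }

    /// Returns true the first time a pair is seen and false for repeats,
    /// regardless of which transport delivered the copy.
    fn accept(&mut self, sender_pubkey: &str, sender_seq: u64) -> bool {
        self.seen.insert((sender_pubkey.to_string(), sender_seq))
    }
}

fn main() {
    let mut dedup = Dedup::new();
    // Same message arriving twice (e.g. once per transport), then a new one.
    let arrivals = [("npub1abc", 7u64), ("npub1abc", 7), ("npub1abc", 8)];
    for (pubkey, seq) in arrivals {
        let verdict = if dedup.accept(pubkey, seq) { "deliver" } else { "drop duplicate" };
        println!("{pubkey} seq={seq}: {verdict}");
    }
}
```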

B8 — netbird app doesn't work — TODO (LOW / much later)

(RETRACTED: CryptPad placeholder-icon — user says cryptpad is fine.)


📋 vps2 Gitea issues (lfg2025/archy) — imported 2026-06-15

  • G#1 [Bug] Strange peer request behaviour — TODO (likely related to B1/federation)
  • G#2 [Bug] Fix flashing USB from kiosk — TODO
  • G#3 [Feature] VPN Configuration — DEFERRED (feature; no new features until production quality)
  • G#4 [Bug] Bitcoind is slow — TODO
  • G#5 [Feature] OpenWRT and TollGate integration — DEFERRED (feature)
  • G#6 [Feature] Move dashboard/monitoring link to home screen — DEFERRED (feature)
  • G#7 [Bug] Scrolling with Companion app — TODO

Gitea issue mapping (vps2 lfg2025/archy)

All backlog bugs now mirrored as Gitea issues: B1→#8, B2→#9, B3→#10, B4→#11, B5→#12, B6→#13, B7→#14, B8→#15, B9→#16, B10→#17, B11→#18, B12→#19, B13→#20, B14→#21, B15→#22, B16→#23, B17→#24, B18→#25, B19→#26. (Pre-existing G#1–7 remain; some overlap, e.g. G#1 strange-peer ≈ B1.) Close the Gitea issue when a bug is verified+shipped.

INVESTIGATION FINDINGS 2026-06-15 (B1/B2/B3/B4/B14) — cutoff insurance

B1 trusted-node divergence — ROOT-CAUSED. federation/sync.rs merge_transitive_peers() (~:140) dedupes ONLY by DID; the SAME physical node appears under multiple DIDs (same onion + fips_npub) → duplicate entries ("Arch Dev" ×2, "Sapien" ×2). No background convergence → lists diverge (.103=16 nodes, .116/.198=15). Model: federation/types.rs:24 FederatedNode (PK=did); storage federation/storage.rs nodes.json; add_node dedupes by DID only (:125). FIX: in merge_transitive_peers add a SECOND match arm — if no DID match, match by normalized onion (trim .onion); if found, treat as same node (merge fips_npub/name, don't add). Same dedup on add_node. Plus a one-time cleanup of existing dup DIDs (remove-node the stale one). TEST: after sync, all 3 nodes have identical node set, no two entries share an onion.
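
A minimal sketch of the B1 fix plan, assuming a node record with the fields named above. The Node struct, merge_peer and fill_missing are hypothetical stand-ins for FederatedNode and merge_transitive_peers; lower-casing the onion during normalization is an extra assumption beyond "trim .onion".

```rust
#[derive(Debug, Clone, PartialEq)]
struct Node {
    did: String,
    onion: String,
    fips_npub: Option<String>,
    name: String,
}

/// Normalize an onion address so "abc.onion" and "abc" compare equal.
fn normalize_onion(onion: &str) -> String {
    onion.trim().trim_end_matches(".onion").to_ascii_lowercase()
}

/// Copy over fields the existing entry is missing; never overwrite.
fn fill_missing(existing: &mut Node, incoming: &Node) {
    if existing.fips_npub.is_none() {
        existing.fips_npub = incoming.fips_npub.clone();
    }
    if existing.name.is_empty() {
        existing.name = incoming.name.clone();
    }
}

/// First match by DID; if that fails, match by normalized onion and treat
/// the incoming record as the same physical node instead of adding it.
fn merge_peer(known: &mut Vec<Node>, incoming: Node) {
    if let Some(existing) = known.iter_mut().find(|n| n.did == incoming.did) {
        fill_missing(existing, &incoming);
        return;
    }
    let key = normalize_onion(&incoming.onion);
    if !key.is_empty() {
        if let Some(existing) = known.iter_mut().find(|n| normalize_onion(&n.onion) == key) {
            fill_missing(existing, &incoming);
            return;
        }
    }
    known.push(incoming);
}

fn main() {
    let mut known = vec![Node {
        did: "did:key:A".into(),
        onion: "abc.onion".into(),
        fips_npub: None,
        name: "Sapien".into(),
    }];
    // Same onion under a second DID: merged, not added.
    merge_peer(&mut known, Node {
        did: "did:key:B".into(),
        onion: "abc".into(),
        fips_npub: Some("npub1example".into()),
        name: "Sapien".into(),
    });
    println!("{} node(s): {:?}", known.len(), known);
}
```

The same onion key could back the HashSet check proposed for B2, since both bugs come from one physical node appearing under two DIDs.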

B2 duplicate chat contact — ROOT-CAUSED (same root as B1). Two federation DIDs (same onion/fips_npub, e.g. "Sapien" dids z6MkoSbN… + z6MkeYMU…) get seeded as TWO mesh contacts: mesh/mod.rs seed_federation_peers_into_mesh() (:94) upserts per-pubkey contact_id; frontend Mesh.vue mergeKeyForPeer() (:492) keys by DID so two DIDs = two rows. FIX: (backend) in seed, skip a node whose onion was already seeded (HashSet of onions); (frontend) Mesh.vue merge by onion when DIDs differ but onion matches. Fixing B1's onion-dedup largely resolves this too. TEST: one "Sapien" row; mesh.peers has one contact for the shared onion.

B3 peer media won't play — ROOT-CAUSED. PeerFiles.vue playMedia()/loadPreview() (~:358,:508) fetch the WHOLE file via RPC content.preview-peer/content.download-peer (api/rpc/content.rs :393,:213) which base64-encodes the entire file; frontend makes a Blob URL → browser can't Range-seek → video/large-audio won't play (+ 30/120s timeouts truncate big files). The peer's HTTP /content/<id> handler (api/handler/content.rs :49) ALREADY supports Range/206 + Accept-Ranges. FIX (bigger): add a local streaming proxy endpoint /api/peer-content/{onion}/{id} in api/handler/mod.rs that forwards the browser's Range header to the peer's /content/<id> (via fips::dial PeerRequest) and streams back 206 + Content-Range + Content-Type; frontend sets <video>/<audio> src to that URL (not a blob). TEST: curl Range on the new endpoint → 206 + Content-Range; video seeks/plays.
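
A minimal sketch of the range arithmetic behind the B3 TEST step, assuming single-range requests only. In the plan above the proxy simply forwards the browser's Range header and relays the peer's 206, so this is not the proxy itself; parse_range and content_range are hypothetical helpers showing what a correct 206 reply must contain for a given request.

```rust
/// Parse a single "bytes=start-end" range against a known file length.
/// Returns the inclusive (start, end) byte positions, or None when the
/// range is malformed, multi-part, or unsatisfiable.
fn parse_range(header: &str, len: u64) -> Option<(u64, u64)> {
    let spec = header.trim().strip_prefix("bytes=")?;
    if spec.contains(',') || len == 0 {
        return None;
    }
    let (start, end) = spec.split_once('-')?;
    match (start.trim(), end.trim()) {
        ("", "") => None,
        // Suffix form "bytes=-N": the last N bytes.
        ("", suffix) => {
            let n: u64 = suffix.parse().ok()?;
            if n == 0 {
                return None;
            }
            Some((len.saturating_sub(n), len - 1))
        }
        // Open-ended form "bytes=S-": from S to the end.
        (s, "") => {
            let s: u64 = s.parse().ok()?;
            (s < len).then(|| (s, len - 1))
        }
        // Closed form "bytes=S-E", with E clamped to the last byte.
        (s, e) => {
            let s: u64 = s.parse().ok()?;
            let e: u64 = e.parse().ok()?;
            (s <= e && s < len).then(|| (s, e.min(len - 1)))
        }
    }
}

/// Value of the Content-Range header for a 206 response.
fn content_range(start: u64, end: u64, len: u64) -> String {
    format!("bytes {start}-{end}/{len}")
}

fn main() {
    let len = 1000;
    if let Some((start, end)) = parse_range("bytes=500-", len) {
        println!("206 Partial Content, Content-Range: {}", content_range(start, end, len));
    }
}
```

Seeking in a video element depends on exactly this reply shape, which a Blob URL built from a base64 payload cannot provide.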

B4 cloud my-folders <!doctype/502 — ROOT-CAUSED. filebrowser-client.ts listDirectory() (:99) does res.json() (:106) after only an res.ok check; when FileBrowser is ABSENT nginx serves SPA index.html (200, '<!doctype') → JSON crash; when DOWN → 502. FIX (frontend, low-risk): guard res content-type !== application/json → throw typed "FileBrowser unavailable" handled by Cloud.vue/CloudFolder.vue empty-state; same guard in login() (:71) + getUsage() (:215). OPTIONAL nginx: add error_page 502 503 = @filebrowser_unavailable returning JSON in the /app/filebrowser/ block (image-recipe/configs/nginx-archipelago.conf ~:411). TEST: stop filebrowser on .116/.198 → Cloud shows friendly state, no doctype crash.

B14 cloud browse Tor-not-FIPS — ROOT-CAUSED (nuance). FIPS-first logic WORKS (fips/dial.rs send_get :331 tries FIPS, falls back to Tor on 404/5xx; v1.7.94 fix). BUT the 4 content handlers in api/rpc/content.rs (browse :297, download :237, download_paid :356, preview :421) capture _transport and NEVER call record_peer_transport() → UI badge shows Tor/null even when FIPS used. FIX: add record_peer_transport(data_dir, None, Some(onion), &transport.to_string()) after each successful send_get (storage.rs:84 has the fn). ⚠️ VERIFY on nodes whether FIPS is ACTUALLY used or genuinely falling back to Tor (if genuinely Tor, deeper FIPS-reachability issue beyond recording). TEST: after browse, last_transport = fips (when peer FIPS-reachable).

INVESTIGATION FINDINGS 2026-06-15 (B6/B7/B12/B13/B15/B16) — cutoff insurance

B13 Fedimint CSS — app HTML (docker/fedimint-ui/index.html) uses absolute /assets/* paths; under /app/fedimint/ the browser requests /assets/* which hit the main SPA, not :8175 → unstyled. FIX: nginx sub_filter rewrite (same proven pattern as indeedhub/botfights blocks) in image-recipe/configs/nginx-archipelago.conf (/app/fedimint/ :641) + snippets/archipelago-https-app-proxies.conf (:164) + bootstrap patch for existing nodes. Rewrites href/src/url '/' → '/app/fedimint/'. TEST: curl .../app/fedimint/assets/...css → 200 real CSS.

B6 ElectrumX archival gate — electrs needs a NON-pruned full node; install card doesn't warn at a glance. /bitcoin-status returns blockchain_info.pruned. Yellow badge pattern exists (MarketplaceAppCard.vue). FIX (frontend, simple): show a yellow "Requires a full archive Bitcoin node (not pruned)" note on the electrumx card (MarketplaceAppCard.vue ~:53). catalog.json electrumx already has requires.

B7 ElectrumX stuck loader — sync overlay gated by electrsSync (useElectrsSync.ts syncing = status!=='synced'); if status never flips to 'synced' (stale/crash) the overlay blocks the UI forever. AppSessionFrame.vue:44 iframe gate !electrsSync. FIX (frontend): fail-open — allow iframe when electrsSync?.stale (and add a timeout in useElectrsSync.ts so a slow/stale status stops blocking after ~5min).

B15 bitcoin sync UI lag — Home.vue:485 polls every 30s. FIX: faster bitcoin refresh (~5-10s) (separate interval for bitcoin vs system stats).

B16 bitcoin status vanishes — homeStatus.ts refreshBitcoin clears/leaves bitcoinAvailable null on a failed/transitional poll → HomeSystemCard.vue:60 v-if hides the card. FIX: retain last-known bitcoinAvailable on transient failure + show an "Updating…" badge instead of disappearing.

B12 mempool not connecting — stacks.rs:1278 + apps/mempool-api/manifest.yml:50 hardcode CORE_RPC_HOST=bitcoin-knots; on nodes running bitcoin-core (not knots) mempool-api gets getaddrinfo ENOTFOUND bitcoin-knots. Also ELECTRUM_HOST=electrumx absent on pruned nodes (docs/CONTAINER_LIFECYCLE_HANDOFF.md:654). FIX: detect which bitcoin container runs (knots vs core) + set CORE_RPC_HOST dynamically; qualify the mempool stack so it doesn't half-start without electrumx. Backend (stacks.rs) — medium risk, test on .116.
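
A minimal sketch of the B12 fix plan, assuming the backend can obtain the names of the running containers. The container name bitcoin-core, the preference order, and both function names are assumptions for illustration; only bitcoin-knots, electrumx, CORE_RPC_HOST and ELECTRUM_HOST come from the findings above.

```rust
/// Pick the Bitcoin RPC host from the containers that are actually running.
/// Assumption: the container name doubles as the hostname on the app network.
fn detect_core_rpc_host(running: &[&str]) -> Option<&'static str> {
    const CANDIDATES: [&str; 2] = ["bitcoin-knots", "bitcoin-core"];
    CANDIDATES
        .iter()
        .copied()
        .find(|candidate| running.iter().any(|name| name == candidate))
}

/// Build the mempool environment, refusing to start a half-wired stack
/// when either the Bitcoin node or electrumx is missing.
fn mempool_env(running: &[&str]) -> Result<Vec<(String, String)>, String> {
    let host = detect_core_rpc_host(running)
        .ok_or_else(|| "no bitcoin container running".to_string())?;
    if !running.contains(&"electrumx") {
        return Err("electrumx not running (e.g. pruned node)".to_string());
    }
    Ok(vec![
        ("CORE_RPC_HOST".to_string(), host.to_string()),
        ("ELECTRUM_HOST".to_string(), "electrumx".to_string()),
    ])
}

fn main() {
    let running = ["lnd", "bitcoin-core", "electrumx"];
    match mempool_env(&running) {
        Ok(env) => {
            for (key, value) in env {
                println!("{key}={value}");
            }
        }
        Err(reason) => println!("mempool stack not started: {reason}"),
    }
}
```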

  • 2026-06-15 (cont. 2): B15 (poll 30s→10s) + B7 (ElectrumX loader fail-open on stale) — committed c0d41cf8, type-check green. B6 PARTIAL (pruned gate already works; no-node-present half deferred). Fanned out investigations for B6/B7/B12/B13/B15/B16 — all root-caused with fix plans in FINDINGS above.
  • DEFERRED with ready plans (need a backend build + careful patch, or UI test, or live repro): B13 (fedimint CSS — nginx sub_filter asset rewrite; bootstrap exact-match patch is fragile, do carefully), B12 (mempool host — dynamic bitcoin-knots/core detect in stacks.rs), B16 (bitcoin status retain — UI-test to avoid stale-as-live), B6 no-node-present half, B14b (FIPS net depth), B22/B23 (need live repro).
  • NEXT options: (a) continue backend batch B13+B12 (one build); (b) do UI confirms on .116/.198 + cut v1.7.97-alpha with the ~10 committed fixes (LND incident + cloud/federation/mesh).
  • Committed fixes awaiting .97: B5, B1, B2, B4, B14, B21, B3, B15, B7 (+ B6 pruned-gate already live). All on vps2 main; NOT on fleet yet.

Progress log

  • 2026-06-15: tracker created. v1.7.96-alpha shipped. All 19 bugs filed as Gitea issues #8–#26. vps2 feature issues (G#3/5/6) deferred (no new features).
  • 2026-06-15: B5 (LND CORS) DONE — root-caused, both fixes implemented, verified on .116/.198/.103 (harness 4/4 each), committed 1db720af, pushed to vps2 main. Will bundle into .97 (Gitea #12 to close on .97 ship).
  • Validation nodes: .116 + .198 (pw ThisIsWeb54321@). Runtime is podman (docker not in non-interactive PATH). Sideload binary → /usr/local/bin/archipelago + restart (containers survive on these nodes).
  • 2026-06-15 (cont.): B1,B2,B4 dedup+guard — committed ed493106, unit-tested 2/2, live .198 healthy. B14 transport recording — committed 1c6dc153 (after build-repair: used private crate::federation::storage:: path → E0603; fixed to re-exported crate::federation::). B21 Tor/FIPS pill — committed 0801dd66. All pushed to vps2 main; builds verified EXIT 0.
  • Discovered B14b (FIPS reaches only ~4/15 peers; rest genuinely Tor) and B21 (pill) during the block.
  • ⚠️ LESSON: a backgrounded build "completed" notification does NOT mean success — grep the EXIT code before committing (a broken commit reached main once; repaired by 1c6dc153; no release cut from it → fleet unaffected).
  • NEXT: B3 (peer media streaming — big), then B14b (FIPS reachability), then app-specific (B6, B7, B9–B13, B15–B19). None deployed to fleet yet — all on vps2 main awaiting the .97 release after full .116/.198 + UI verification.