The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their
old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the
Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")).
Those resolve against the SPA root: bg-network.jpg exists there by luck, but
app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the
visibly-broken icon.
bootstrap.rs::patch_nginx_conf now heals both paths on startup:
- Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail
for the full reroot set; byte-matches the shipped template.
- Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no
sub_filter and a per-node-varying trailing directive, so anchor on the unique
:8175 proxy_pass and insert the reroot set after it (nginx ignores directive
order). Snippet added to the bootstrap nginx loop (skipped on HTTP-only nodes).
missing_* flags are now gated on their splice anchors so the included snippet
neither attempts the main-conf-only patches nor logs warn-skips every boot.
Idempotent via the 'href="/' 'href="/app/fedimint/' marker.
Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t
OK; containers survived restart (Quadlet); idempotent steady state, no warn spam.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
▶▶ SESSION SAVE / RESUME (2026-06-15)
State: v1.7.96-alpha SHIPPED. v1.7.97-alpha NOT cut yet — 10 fixes committed on vps2 main (git remote: gitea-vps2), nothing on the fleet yet. Validate on .116/.198 + UI-confirm BEFORE cutting .97.
Resume command (run elsewhere):
cd ~/Projects/archy && git fetch gitea-vps2 && git checkout main && git reset --hard gitea-vps2/main && cat tests/production-quality/TRACKER.md
Then continue from "IN PROGRESS" below.
Committed & ready for .97 (vps2 main): B5 (LND CORS, verified .116/.198/.103), B1, B2, B4, B14, B21, B3 (incl. /api/peer-content nginx via bootstrap), B15, B7, B13 (fedimint CSS self-heal — main conf + HTTPS snippet, verified .198 both paths app-icon 404→200). B6 pruned-gate already live.
IN PROGRESS — pick up at B12. B13 DONE (committed this session; bootstrap.rs self-heals both the main conf and the HTTPS app-proxy snippet — see B13 entry below for full verification). REMAINING:
- B12 (mempool host detect — stacks.rs:1278 hardcodes CORE_RPC_HOST=bitcoin-knots; fails on bitcoin-core nodes → dynamic host detect; backend, medium risk, test .116).
- Then B16 (bitcoin status retain — UI-test), B6 no-node-present half, B14b (FIPS reachability depth), B22/B23 (peer download + group chat — need live repro), B9/B10/B11/B17/B18/B19, B8 (low), B20 (mesh-headers feature).
Note: .198 is currently running a sideloaded .97-dev binary (md5 4c83803d, built from this B13 commit) — NOT an official release. Reflashing/OTA will replace it.
Ship .97 when ready: ./scripts/create-release.sh 1.7.97-alpha (curate CHANGELOG ≥3 layman bullets first + run scripts/sync-whats-new.py; SKIP_RELEASE_TESTS=1 only for the 2 known-flaky vitest timing tests) → scripts/publish-release-assets.sh 1.7.97-alpha gitea-vps2 → git push gitea-vps2 main + tag. (gitea-local push fails: token rejected — non-blocking.)
Production-Quality Bug Tracker
Living tracker for the post-v1.7.96 "no new features until production quality" push. Updated continuously as we investigate → fix → test → pass. Kept in-repo so progress survives a session cutoff.
Rules (from user, 2026-06-15)
- No new features until the OS is production / no-bugs quality.
- Test-harness-first: build/extend a harness for each bug before fixing.
- Validate every fix on .116 + .198 (both 192.168.1.x, pw ThisIsWeb54321@) + the harness BEFORE it goes into any release. (.198 still carries the LND CORS nginx duplicate → good for fix-(a) validation; .116 does not.)
- Priority order: cloud/federated-nodes + mesh FIRST, then app-specific, then low-pri.
Status legend
TODO · INVESTIGATING · ROOT-CAUSED · FIXING · TESTING (on .116+harness) · PASSED · SHIPPED
Release status
- v1.7.96-alpha — SHIPPED (2026-06-15). Live on vps2 (primary OTA): manifest v1.7.96-alpha, assets HTTP 200, main@8c3c7954 + tag present. Contents: kiosk grid removal + FIPS TCP/UDP anchor selector. NOTE: gitea-local (localhost) mirror push failed (token rejected → /login); non-blocking, needs refreshed token.
- v1.7.97-alpha — IN PROGRESS (this push). Will bundle the verified fixes below.
🔴🔴 TOP PRIORITY
B5 — LND "connect your wallet" details/QR broken fleet-wide — PASSED (verified .116/.198/.103; committed, awaiting .97)
Origin: user escalation. Symptom: LND connect screen (served on app port :18083) can't load details/QR. Two distinct root causes (confirmed live):
- (a) Duplicate ACAO on /lnd-connect-info (seen on .103): backend sets Access-Control-Allow-Origin (proxy.rs:108) AND nginx add_header adds a second → browser rejects "multiple values". nginx config drift. Fix: bootstrap.rs nginx patch must strip the redundant add_header from the /lnd-connect-info location (backend owns CORS).
- (b) No ACAO on /proxy/lnd/v1/* 401 (fleet-wide): the unauth/auth-layer 401 is produced before the CORS-adding proxy handler (proxy.rs:135 handle_lnd_proxy). Browser → "No 'Access-Control-Allow-Origin' header". Fix: ensure auth-layer/early-return responses for /proxy/lnd + /lnd-connect-info carry CORS headers.
- .116 /lnd-connect-info returns a single correct ACAO → symptom varies by node's nginx state.
- Backend CORS helper: handler/mod.rs app_cors_origin() (:270) — reflects Origin when its host == request host.
- Backend change → ships in .97. Status: ✅ PASSED — verified on .116, .198, .103 (harness 4/4 each). Ready to bundle into .97.
- Caveat: bootstrap's nginx dup-strip runs a few seconds AFTER /health goes green (async patch+reload) — converges within ~1 min of restart; not instant. Acceptable.
- CODE CHANGES MADE (uncommitted):
  - core/archipelago/src/bootstrap.rs: added NGINX_LND_DUP_CORS const + strip in patch_nginx_conf() (removes the duplicate nginx add_header ACAO from /lnd-connect-info so the backend's single header wins). Idempotent; runs on startup nginx bootstrap. → fixes (a)
  - core/archipelago/src/api/handler/mod.rs: new unauthorized_cors(origin) helper (:~205) + /proxy/lnd/ route (:~505) computes origin first and returns unauthorized_cors so the 401 carries ACAO. → fixes (b)
  - Test on .116 for (b); test on .103 for (a) [.116 has no dup to strip].
- 2026-06-15 RESULT — .116 (fix b): harness 4/4 PASS (sideloaded built binary, restarted). /proxy/lnd/v1/* now returns CORS on the 401. ✅
- (Correction: an earlier "LND container MISSING" reading was a FALSE alarm — docker isn't in the non-interactive PATH; runtime is podman. Verified lnd Up 9h — containers SURVIVED the restart cleanly.)
- Next: deploy to .103 + run harness to confirm fix (a) (nginx dup strip).
- Harness: tests/production-quality/lnd-cors-test.sh <node> — asserts single correct ACAO on /lnd-connect-info + ACAO present on /proxy/lnd/v1/{getinfo,channels}. Baseline (2026-06-15): .116 = 2 pass/2 fail (proxy missing ACAO); .103 = 1 pass/3 fail (connect-info dup + proxy missing).
- FIX PLAN (precise):
  - (b) handler/mod.rs:504-508 /proxy/lnd/ returns Self::unauthorized() (401, NO CORS) when session check fails → browser CORS wall. Add CORS (app_cors_origin) to that 401. Same pattern for any other app-origin early-return.
  - (a) nginx /lnd-connect-info location double-adds ACAO (backend + nginx add_header). Strip the nginx add_header Access-Control-Allow-Origin there; backend owns CORS. Update bootstrap.rs nginx patch to remove it on existing nodes (idempotent).
  - Verify: rebuild backend, deploy to .116, run harness → expect 3/3 (or 4 assertions) PASS on .116 AND .103.
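Fix (b) above can be sketched roughly as follows. This is not the code in handler/mod.rs: the function names, signatures, and the tuple return type are hypothetical, and only the rule itself (reflect the Origin when its host equals the request host, and attach it to the 401) comes from the notes above. IPv6 literals are deliberately not handled.

```rust
/// Drop a trailing ":port" if present (assumption: no IPv6 literals).
fn strip_port(authority: &str) -> &str {
    authority.rsplit_once(':').map_or(authority, |(host, _)| host)
}

/// Extract the host from an Origin such as "http://192.168.1.116:18083".
fn origin_host(origin: &str) -> Option<&str> {
    let (_, rest) = origin.split_once("://")?;
    let authority = rest.split('/').next()?;
    let host = strip_port(authority);
    if host.is_empty() { None } else { Some(host) }
}

/// Reflect the Origin only when its host matches the request's Host header.
fn cors_origin_for<'a>(origin: Option<&'a str>, request_host: &str) -> Option<&'a str> {
    let origin = origin?;
    let host = origin_host(origin)?;
    host.eq_ignore_ascii_case(strip_port(request_host))
        .then_some(origin)
}

/// Build the status and headers for an early-return 401 that still carries ACAO.
/// Hypothetical stand-in for the unauthorized_cors(origin) helper.
fn unauthorized_cors_headers(
    origin: Option<&str>,
    request_host: &str,
) -> (u16, Vec<(&'static str, String)>) {
    let mut headers = Vec::new();
    if let Some(o) = cors_origin_for(origin, request_host) {
        headers.push(("Access-Control-Allow-Origin", o.to_string()));
        // The header value depends on the request Origin, so caches must vary on it.
        headers.push(("Vary", "Origin".to_string()));
    }
    (401, headers)
}
```

The key point is ordering: the origin is computed before the session check, so the rejection path and the success path emit the same single ACAO header.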
🔴 PRIORITY — cloud / federation / mesh
B1 — Trusted-node list not clean — PASSED (onion-dedup; unit test 2/2; live .198 15→13 distinct, healthy). UI visual-confirm recommended.
Dupes, erroneous names, and non-convergent group membership across nodes. Expected: trusted nodes form a transitive group (every node connects to any newly-added trusted node; all nodes show the same set). .103 has a long/dirty list.
B2 — Duplicate chat contact for one node — PASSED (resolved by load-dedup feeding mesh seed; unit-tested). UI visual-confirm recommended.
Federated peer "sapien" shows TWO chats: one "sapien" WITHOUT archy logo (looks non-federated) + one named by raw DID did:key:z6MkoSbN5CM7fBaQg2nWbCymEkFXsHnuXvec9Mjo5RtJf9dQ. Same node keyed by both federated identity and raw DID → merge to one. Code: core/archipelago/src/mesh + mesh/typed_messages.rs (note :233 — meshcore adverts don't carry archy pubkey).
B3 — Cloud peer media won't preview/play — FIXING (code done: /api/peer-content streaming proxy + playMedia streams free content)
Music/video preview files on peer nodes' cloud don't play (streaming/range/content-type over mesh+Tor peer fetch).
B4 — Cloud "my folders" fails (JSON parse / 502) — PASSED (content-type guard; built, guard in bundle, deployed .198). UI visual-confirm recommended.
Unexpected token '<', "<!doctype" when FileBrowser absent (/app/filebrowser/api/resources → SPA index.html), and 502 when FileBrowser is down (seen on .103). filebrowser-client.ts:102/:106. Fix: detect FileBrowser unavailable, friendly prompt; consider nginx returning JSON 404/502 for missing /app/<app>/ instead of SPA shell. Handle BOTH absent + down.
B14 — cloud browse transport not recorded — FIXED (record_peer_transport in 4 content handlers; build OK). NOTE: live data shows FIPS reaches only ~4/15 peers, 6 fall back to Tor genuinely → see B14b.
Browsing trusted/peer nodes in the Cloud tab connects over Tor instead of FIPS (should prefer FIPS like the rest of mesh; same for peer browsing). cf project_fips_integration, project_tor_node_to_node_works (last_transport should be fips/mesh).
🟠 APP-SPECIFIC
B6 — ElectrumX install gate — PARTIAL (pruned-node gate already works; "no node present" half DEFERRED: false-positive risk without UI test, needs package-presence check)
Show the yellow requirement badge when no full node / only a pruned node is present (reuse existing yellow badge pattern).
B7 — ElectrumX UI stuck loader on top — FIXED (overlay hides + iframe shows when status stale; type-check green). UI-confirm.
UI renders but a loader sits on top; possibly stale pre-sync screen not clearing.
B9 — IndeedHub keeps stopping on nodes — TODO
Container won't stay running (crash-loop / reconcile stop). Check logs + restart policy + health.
B10 — Immich still crashes — TODO
Recurring crash ("still" → prior attempts). Check container logs + resource limits + DB/ML deps.
B11 — Companion app: "open in external browser" apps don't work — TODO
Apps meant to open in a new/external browser don't launch from the companion app; need the phone-default-browser request-modal pattern mobile apps use. Relates to v1.7.90 "open in new tab from companion app".
B12 — Mempool not connecting — ROOT-CAUSED (stacks.rs:1278 hardcodes CORE_RPC_HOST=bitcoin-knots; fails on bitcoin-core nodes. Fix=dynamic host detect. Backend, medium risk, test .116)
mempool can't reach the Bitcoin backend on some nodes. Investigate on .116. Check mempool→electrs→bitcoind wiring + deps.
B13 — Fedimint UI not applying CSS — FIXED + VERIFIED on .198 (both HTTP + HTTPS)
Root cause confirmed: the Fedimint Guardian page (served by :8175) is a server-rendered status page with ~7.8KB INLINE CSS plus image assets referenced root-rooted (src="/assets/img/app-icons/fedimint.jpg", url("/assets/img/bg-network.jpg")). Without an asset rewrite those /assets/... URLs resolve against the archipelago SPA root: bg-network.jpg happens to exist there (shared design asset → loaded by luck) but app-icons/fedimint.jpg does NOT → 404 (the broken/visibly-missing icon). The location /assets/ block uses try_files $uri =404, so missing fedimint assets 404 rather than fall through.
Fix = nginx sub_filter set that reroots every root-rooted asset URL (href="/, src="/, url("/, and single-quote variants) under /app/fedimint/, plus proxy_set_header Accept-Encoding "" so the upstream doesn't gzip (sub_filter can't rewrite gzipped bodies). Shipped two ways:
- Fresh ISOs (committed a50b6df2): templates image-recipe/configs/nginx-archipelago.conf (HTTP) + image-recipe/configs/snippets/archipelago-https-app-proxies.conf (HTTPS).
- Already-deployed nodes (bootstrap self-heal, this commit): core/archipelago/src/bootstrap.rs::patch_nginx_conf now heals BOTH the main conf (Style A — swaps the old single nostr-provider sub_filter tail for the full reroot set, byte-matches the shipped template) AND the HTTPS app-proxy snippet (Style B — anchors on the unique :8175 proxy_pass and inserts the reroot set; robust to the snippet's varying trailing directive). missing_* flags now gated on their splice anchors so the healed snippet early-returns cleanly (no per-boot warn-skips). Idempotent via the 'href="/' 'href="/app/fedimint/' marker.
VERIFIED on .198 (sideloaded built binary, restart, async self-heal converged ~15s):
- HTTP /app/fedimint/: live conf healed byte-identical to template; app-icon 404→200 image/jpeg (41944b).
- HTTPS /app/fedimint/ (snippet): healed; same app-icon 404→200; bg-network 200; root /assets/img/app-icons/fedimint.jpg returns 200 text/html (SPA shell) — proving the reroot is necessary.
- nginx -t OK both times; containers survived restart (Quadlet); both files carry the marker exactly once (idempotent steady state); no warn spam in logs. NOTE: self-healed snippet is functionally correct but NOT byte-identical to the fresh-ISO snippet template (insert-after-proxy_pass vs full block) — acceptable; nginx ignores directive order/whitespace.
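The Style B heal (anchor on the :8175 proxy_pass, insert after it, gate on the marker) can be illustrated with a minimal sketch. The function name and signature are hypothetical and this is not the body of patch_nginx_conf; the real reroot directive set lives in the shipped template and is passed in here as an opaque string.

```rust
/// Insert `insert` on the line after the first line containing `anchor`.
/// Returns None when nothing should change: either the conf already carries
/// `marker` (already healed) or the anchor is absent (not a file this patch
/// applies to, so the caller can skip quietly instead of warning every boot).
fn heal_after_anchor(conf: &str, anchor: &str, marker: &str, insert: &str) -> Option<String> {
    if conf.contains(marker) {
        return None; // idempotent steady state
    }
    let anchor_pos = conf.find(anchor)?;
    // End of the anchor's line, including the newline when there is one.
    let line_end = conf[anchor_pos..]
        .find('\n')
        .map_or(conf.len(), |off| anchor_pos + off + 1);

    let mut out = String::with_capacity(conf.len() + insert.len() + 2);
    out.push_str(&conf[..line_end]);
    if !out.ends_with('\n') {
        out.push('\n');
    }
    out.push_str(insert);
    if !insert.ends_with('\n') {
        out.push('\n');
    }
    out.push_str(&conf[line_end..]);
    Some(out)
}
```

Because the splice keys on the anchor line rather than on the whole block, it tolerates whatever trailing directive a given node's snippet carries, at the cost of not being byte-identical to the fresh-ISO template.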
B15 — Bitcoin UI sync progress lags — FIXED (Home.vue poll 30s→10s). UI-confirm.
Bitcoin UI doesn't update its sync progress fast enough even though the console clearly already has the block-height data. Likely a polling-interval / reactive-update gap between the status source and the UI.
B16 — Bitcoin sync status vanishes — DEFERRED (homeStatus.ts already partly retains last value; safe fix needs UI test to avoid showing stale-as-live; plan in findings)
The bitcoin sync status in the Home > System container disappears when it should persist/cache and show an "updating" state. Related to B15 (Bitcoin UI sync lag). Likely the status component clears on empty/transitional poll instead of retaining last-known + showing updating.
B17 — archipelago.service flaps on boot before starting — TODO
On some boots, [FAILED] Failed to start archipelago.service - Archipelago Backend prints ~20 times over ~5 min before it finally starts properly. Likely a startup dependency/timing race (DB lock, port bind, crash-recovery, or a dependency not ready) causing systemd restart loop until a precondition is met. Check service Restart=/RestartSec, ExecStartPre gates, and what the early failures log. May tie to B16/crash-recovery.
B18 — Apps stop right after install (or become unstartable) — TODO
Many apps install but immediately stop, requiring a manual Start — or become unstartable entirely. Likely the install→start handoff / reconciler doesn't bring them up (or starts then they exit). Related to B9 (IndeedHub stopping), B10 (Immich). Possibly linked to the cgroup-SIGKILL-on-archipelago.service-restart issue (feedback_no_systemctl_deploy_until_quadlet) — but NOTE: on .116 (Quadlet) containers survived a service restart cleanly, so the reconciler may be fine there; reproduce on the affected nodes. Check post-install start sequencing + boot_reconciler + container restart policy + cgroup placement.
B19 — Failed download-update lands on Install button (should be Download) — TODO
When an update download fails, the UI sometimes shows the Install button instead of returning to the Download button — a big UX issue (user can't retry the download cleanly). Check the SystemUpdate state machine's error/failure transition.
B20 — Surface bitcoin-headers-over-mesh broadcast (send/receive toggles) — TODO (feature-adjacent, surfacing existing work)
We previously broadcast bitcoin block headers over mesh to archipelago nodes but never fully surfaced it. Want two switches: "send headers" (you broadcast) and "receive headers" (you accept). NOTE: this is feature-adjacent — surfacing existing functionality; the user added it during the no-new-features push, so treat as low-priority polish until the bug list is clear. Code: mesh block-headers (mesh.block-headers RPC seen in logs; core/archipelago/src/mesh).
B14b — FIPS reachability: many peers fall back to Tor — INVESTIGATED (needs FIPS-network depth)
Live (2026-06-15) federation sync last_transport on .116/.198: ~4 peers fips, ~6 tor, ~5 none. So beyond the recording fix (B14), FIPS genuinely doesn't reach many federated peers (they use Tor). Investigate WHY: is fips_npub known for those peers? are they FIPS-online? is the shared anchor connecting them? (cf project_fips_integration, project_tor_node_to_node_works). This is the real "Tor not FIPS" depth. FINDINGS (.198, 2026-06-15): archipelago-fips ACTIVE; ALL 13 peers HAVE fips_npub; last_transport = 5 fips / 5 tor / 3 none. So it's NOT a missing-npub or service-down bug — FIPS genuinely reaches some peers and not others = DIAL-TIME reachability: the 'tor' peers aren't FIPS-reachable at dial time (offline, NAT, their FIPS not registered with the shared anchor), and 'none' = fully offline (X250 roam/beta/cellular). NEXT (deeper, needs FIPS-network debugging): verify a known-online peer (e.g. .228/.116) is reachable over FIPS from .198 right now; if an online FIPS peer still falls back to Tor → real anchor/registration bug; check fips daemon peer table + anchor connectivity. Likely partly peer-availability (not fully fixable in code).
B21 — Show Tor/FIPS transport pill on cloud browse — FIXED (build+type-check green; deploy+UI-confirm on .116/.198)
Tag whether the peer connection is Tor or FIPS and surface it as a small pill on the cloud browse screens / connection loader. Data source: federation node last_transport (now recorded by B14) exposed via federation.list-nodes; frontend renders a pill (FIPS=fast/green, Tor=slower) on PeerFiles.vue / Cloud peer view + the connection loader. Frontend-only-ish. FINDINGS: PeerFiles.vue:46 loader HARDCODES 'Connecting via Tor...' even when FIPS used (bug). Frontend types already have last_transport ('fips'|'tor'|'mesh'|'lan') federation/types.ts:31; NodeList.vue:167 already renders a transport indicator. PLAN: have content.browse-peer RETURN the transport used (B14 already computes it) → frontend shows a pill (FIPS green / Tor amber) on PeerFiles header + fix the loader text to reflect actual/attempted transport. Small backend (add transport to browse response) + frontend pill.
B22 — Peer cloud download/audio errors (.228→.198) — TODO (pairs with B3)
Observed 2026-06-15 browsing .228's cloud from .198: (a) downloading a peer cloud file → "Operation failed. Check server logs for details." (b) playing a peer AUDIO file → "Could not play audio. File Browser may not be running." (misleading — it's a peer file, not File Browser; that's the OLD base64/blob path B3 replaces). ACTION: (a) check content.download-peer backend error on .198 logs while downloading (likely the same Range/transport/timeout path as B3, or a peer-side 4xx); (b) verify B3 streaming fixes peer audio once deployed, and fix the misleading audioPlayer error string. Get server logs: ssh .198, journalctl -u archipelago | grep -iE 'content|peer|download' (-E is needed so that | is treated as alternation).
B23 — Archipelago group chat (all nodes) broken/slow over Tor — TODO (PRIORITY, mesh)
The all-nodes "Archipelago group" chat (over Tor) doesn't seem to work. Facets:
- (a) Group delivery unreliable / "doesn't work" over Tor.
- (b) Messages may just be VERY SLOW (latency — likely Tor-only path; should use FIPS+Tor per the new transport method like B14, preferring FIPS).
- (c) Add the SENDER CONTACT NAME to each message so you can differentiate who sent what (group messages lack attribution).
- (d) Messages sometimes DUPLICATED (dedup by message id / sender_seq — cf mesh.ts:73 cross-transport identity (sender_pubkey, sender_seq); duplicate likely from receiving same msg over both transports or re-broadcast). Code: core/archipelago/src/mesh (typed_messages, listener), frontend Mesh.vue/stores/mesh.ts. Relates to B2 (identity), B14/B14b (transport). Test on .116/.198 (+ a Tor-only peer like .228).
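For facet (d), dedup on the cross-transport identity (sender_pubkey, sender_seq) could look roughly like the sketch below. The struct and method names are hypothetical, sender_seq is assumed to fit a u64, and the set grows without bound, so a real listener would need eviction (for example a per-sender high-water mark).

```rust
use std::collections::HashSet;

/// Tracks which (sender_pubkey, sender_seq) pairs have already been delivered.
#[derive(Default)]
struct SeenMessages {
    seen: HashSet<(String, u64)>,
}

impl SeenMessages {
    /// Returns true the first time a pair is observed and false for any repeat,
    /// regardless of which transport (FIPS, Tor, mesh) carried the copy.
    fn first_delivery(&mut self, sender_pubkey: &str, sender_seq: u64) -> bool {
        self.seen.insert((sender_pubkey.to_owned(), sender_seq))
    }
}
```

A receive path would call first_delivery before appending to the chat and drop the message when it returns false.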
B8 — netbird app doesn't work — TODO (LOW / much later)
(RETRACTED: CryptPad placeholder-icon — user says cryptpad is fine.)
📋 vps2 Gitea issues (lfg2025/archy) — imported 2026-06-15
- G#1 [Bug] Strange peer request behaviour — TODO (likely related to B1/federation)
- G#2 [Bug] Fix flashing USB from kiosk — TODO
- G#3 [Feature] VPN Configuration — DEFERRED (feature; no new features until production quality)
- G#4 [Bug] Bitcoind is slow — TODO
- G#5 [Feature] OpenWRT and TollGate integration — DEFERRED (feature)
- G#6 [Feature] Move dashboard/monitoring link to home screen — DEFERRED (feature)
- G#7 [Bug] Scrolling with Companion app — TODO
Gitea issue mapping (vps2 lfg2025/archy)
All backlog bugs now mirrored as Gitea issues: B1→#8, B2→#9, B3→#10, B4→#11, B5→#12, B6→#13, B7→#14, B8→#15, B9→#16, B10→#17, B11→#18, B12→#19, B13→#20, B14→#21, B15→#22, B16→#23, B17→#24, B18→#25, B19→#26. (Pre-existing G#1–7 remain; some overlap, e.g. G#1 strange-peer ≈ B1.) Close the Gitea issue when a bug is verified+shipped.
INVESTIGATION FINDINGS 2026-06-15 (B1/B2/B3/B4/B14) — cutoff insurance
B1 trusted-node divergence — ROOT-CAUSED. federation/sync.rs merge_transitive_peers() (~:140) dedupes ONLY by DID; the SAME physical node appears under multiple DIDs (same onion + fips_npub) → duplicate entries ("Arch Dev" ×2, "Sapien" ×2). No background convergence → lists diverge (.103=16 nodes, .116/.198=15). Model: federation/types.rs:24 FederatedNode (PK=did); storage federation/storage.rs nodes.json; add_node dedupes by DID only (:125). FIX: in merge_transitive_peers add a SECOND match arm — if no DID match, match by normalized onion (trim .onion); if found, treat as same node (merge fips_npub/name, don't add). Same dedup on add_node. Plus a one-time cleanup of existing dup DIDs (remove-node the stale one). TEST: after sync, all 3 nodes have identical node set, no two entries share an onion.
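The second match arm described for B1 can be sketched as below. The Node struct is a simplified, hypothetical stand-in for FederatedNode, and merge_peers is not the real merge_transitive_peers(); lowercasing the onion during normalization is an extra assumption beyond the "trim .onion" in the plan.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Node {
    did: String,
    onion: String,
    name: String,
    fips_npub: Option<String>,
}

/// Normalize an onion address so "ABC.onion" and "abc" compare equal.
fn normalize_onion(onion: &str) -> String {
    let trimmed = onion.trim().to_ascii_lowercase();
    trimmed
        .strip_suffix(".onion")
        .unwrap_or(trimmed.as_str())
        .to_string()
}

/// Merge incoming peers: match by DID first, then by normalized onion.
/// A match is treated as the same physical node and is merged, not added.
fn merge_peers(known: &mut Vec<Node>, incoming: Vec<Node>) {
    for peer in incoming {
        let key = normalize_onion(&peer.onion);
        let pos = known.iter().position(|n| {
            n.did == peer.did || (!key.is_empty() && normalize_onion(&n.onion) == key)
        });
        match pos {
            Some(i) => {
                // Keep the existing entry; fill in only what it is missing.
                if known[i].fips_npub.is_none() {
                    known[i].fips_npub = peer.fips_npub;
                }
                if known[i].name.is_empty() {
                    known[i].name = peer.name;
                }
            }
            None => known.push(peer),
        }
    }
}
```

After a merge, no two entries share a normalized onion, which is the property the TEST line above asks for.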
B2 duplicate chat contact — ROOT-CAUSED (same root as B1). Two federation DIDs (same onion/fips_npub, e.g. "Sapien" dids z6MkoSbN… + z6MkeYMU…) get seeded as TWO mesh contacts: mesh/mod.rs seed_federation_peers_into_mesh() (:94) upserts per-pubkey contact_id; frontend Mesh.vue mergeKeyForPeer() (:492) keys by DID so two DIDs = two rows. FIX: (backend) in seed, skip a node whose onion was already seeded (HashSet of onions); (frontend) Mesh.vue merge by onion when DIDs differ but onion matches. Fixing B1's onion-dedup largely resolves this too. TEST: one "Sapien" row; mesh.peers has one contact for the shared onion.
B3 peer media won't play — ROOT-CAUSED. PeerFiles.vue playMedia()/loadPreview() (~:358,:508) fetch the WHOLE file via RPC content.preview-peer/content.download-peer (api/rpc/content.rs :393,:213) which base64-encodes the entire file; frontend makes a Blob URL → browser can't Range-seek → video/large-audio won't play (+ 30/120s timeouts truncate big files). The peer's HTTP /content/<id> handler (api/handler/content.rs :49) ALREADY supports Range/206 + Accept-Ranges. FIX (bigger): add a local streaming proxy endpoint /api/peer-content/{onion}/{id} in api/handler/mod.rs that forwards the browser's Range header to the peer's /content/<id> (via fips::dial PeerRequest) and streams back 206 + Content-Range + Content-Type; frontend sets <video>/<audio> src to that URL (not a blob). TEST: curl Range on the new endpoint → 206 + Content-Range; video seeks/plays.
B4 cloud my-folders <!doctype/502 — ROOT-CAUSED. filebrowser-client.ts listDirectory() (:99) does res.json() (:106) after only an res.ok check; when FileBrowser is ABSENT nginx serves SPA index.html (200, '<!doctype') → JSON crash; when DOWN → 502. FIX (frontend, low-risk): guard res content-type !== application/json → throw typed "FileBrowser unavailable" handled by Cloud.vue/CloudFolder.vue empty-state; same guard in login() (:71) + getUsage() (:215). OPTIONAL nginx: add error_page 502 503 = @filebrowser_unavailable returning JSON in the /app/filebrowser/ block (image-recipe/configs/nginx-archipelago.conf ~:411). TEST: stop filebrowser on .116/.198 → Cloud shows friendly state, no doctype crash.
B14 cloud browse Tor-not-FIPS — ROOT-CAUSED (nuance). FIPS-first logic WORKS (fips/dial.rs send_get :331 tries FIPS, falls back to Tor on 404/5xx; v1.7.94 fix). BUT the 4 content handlers in api/rpc/content.rs (browse :297, download :237, download_paid :356, preview :421) capture _transport and NEVER call record_peer_transport() → UI badge shows Tor/null even when FIPS used. FIX: add record_peer_transport(data_dir, None, Some(onion), &transport.to_string()) after each successful send_get (storage.rs:84 has the fn). ⚠️ VERIFY on nodes whether FIPS is ACTUALLY used or genuinely falling back to Tor (if genuinely Tor, deeper FIPS-reachability issue beyond recording). TEST: after browse, last_transport = fips (when peer FIPS-reachable).
INVESTIGATION FINDINGS 2026-06-15 (B6/B7/B12/B13/B15/B16) — cutoff insurance
B13 Fedimint CSS — app HTML (docker/fedimint-ui/index.html) uses absolute /assets/* paths; under /app/fedimint/ the browser requests /assets/* which hit the main SPA, not :8175 → unstyled. FIX: nginx sub_filter rewrite (same proven pattern as indeedhub/botfights blocks) in image-recipe/configs/nginx-archipelago.conf (/app/fedimint/ :641) + snippets/archipelago-https-app-proxies.conf (:164) + bootstrap patch for existing nodes. Rewrites href/src/url '/' → '/app/fedimint/'. TEST: curl .../app/fedimint/assets/...css → 200 real CSS.
B6 ElectrumX archival gate — electrs needs a NON-pruned full node; install card doesn't warn at a glance. /bitcoin-status returns blockchain_info.pruned. Yellow badge pattern exists (MarketplaceAppCard.vue). FIX (frontend, simple): show a yellow "Requires a full archive Bitcoin node (not pruned)" note on the electrumx card (MarketplaceAppCard.vue ~:53). catalog.json electrumx already has requires.
B7 ElectrumX stuck loader — sync overlay gated by electrsSync (useElectrsSync.ts syncing = status!=='synced'); if status never flips to 'synced' (stale/crash) the overlay blocks the UI forever. AppSessionFrame.vue:44 iframe gate !electrsSync. FIX (frontend): fail-open — allow iframe when electrsSync?.stale (and add a timeout in useElectrsSync.ts so a slow/stale status stops blocking after ~5min).
B15 bitcoin sync UI lag — Home.vue:485 polls every 30s. FIX: faster bitcoin refresh (~5-10s) (separate interval for bitcoin vs system stats).
B16 bitcoin status vanishes — homeStatus.ts refreshBitcoin clears/leaves bitcoinAvailable null on a failed/transitional poll → HomeSystemCard.vue:60 v-if hides the card. FIX: retain last-known bitcoinAvailable on transient failure + show an "Updating…" badge instead of disappearing.
B12 mempool not connecting — stacks.rs:1278 + apps/mempool-api/manifest.yml:50 hardcode CORE_RPC_HOST=bitcoin-knots; on nodes running bitcoin-core (not knots) mempool-api gets getaddrinfo ENOTFOUND bitcoin-knots. Also ELECTRUM_HOST=electrumx absent on pruned nodes (docs/CONTAINER_LIFECYCLE_HANDOFF.md:654). FIX: detect which bitcoin container runs (knots vs core) + set CORE_RPC_HOST dynamically; qualify the mempool stack so it doesn't half-start without electrumx. Backend (stacks.rs) — medium risk, test on .116.
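A minimal sketch of the B12 plan, assuming the caller already has the list of running container names (for example from podman). The function names are hypothetical, and "bitcoin-core" as a container name is an assumption: the notes only confirm bitcoin-knots and electrumx as hostnames.

```rust
/// Pick the RPC host from whichever bitcoin container is actually running.
/// Candidate order is an assumption (knots first, matching the old hardcode).
fn detect_core_rpc_host(running: &[&str]) -> Option<&'static str> {
    const CANDIDATES: [&str; 2] = ["bitcoin-knots", "bitcoin-core"];
    CANDIDATES.iter().copied().find(|c| running.contains(c))
}

/// Build the mempool env, refusing to start the stack half-wired:
/// both a bitcoin backend and electrumx must be present.
fn mempool_env(running: &[&str]) -> Result<Vec<(&'static str, &'static str)>, &'static str> {
    let host = detect_core_rpc_host(running).ok_or("no bitcoin container running")?;
    if !running.contains(&"electrumx") {
        return Err("electrumx not running");
    }
    Ok(vec![("CORE_RPC_HOST", host), ("ELECTRUM_HOST", "electrumx")])
}
```

Returning an error instead of a partial env lets the stack qualifier skip mempool cleanly on pruned or electrumx-less nodes rather than surfacing getaddrinfo ENOTFOUND at runtime.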
- 2026-06-15 (cont. 2): B15 ✅ (poll 30s→10s) + B7 ✅ (ElectrumX loader fail-open on stale) — committed c0d41cf8, type-check green. B6 PARTIAL (pruned gate already works; no-node-present half deferred). Fanned out investigations for B6/B7/B12/B13/B15/B16 — all root-caused with fix plans in FINDINGS above.
- DEFERRED with ready plans (need a backend build + careful patch, or UI test, or live repro): B13 (fedimint CSS — nginx sub_filter asset rewrite; bootstrap exact-match patch is fragile, do carefully), B12 (mempool host — dynamic bitcoin-knots/core detect in stacks.rs), B16 (bitcoin status retain — UI-test to avoid stale-as-live), B6 no-node-present half, B14b (FIPS net depth), B22/B23 (need live repro).
- NEXT options: (a) continue backend batch B13+B12 (one build); (b) do UI confirms on .116/.198 + cut v1.7.97-alpha with the ~10 committed fixes (LND incident + cloud/federation/mesh).
- Committed fixes awaiting .97: B5, B1, B2, B4, B14, B21, B3, B15, B7 (+ B6 pruned-gate already live). All on vps2 main; NOT on fleet yet.
Progress log
- 2026-06-15: tracker created. v1.7.96-alpha shipped. All 19 bugs filed as Gitea issues #8–#26. vps2 feature issues (G#3/5/6) deferred (no new features).
- 2026-06-15: B5 (LND CORS) ✅ DONE — root-caused, both fixes implemented, verified on .116/.198/.103 (harness 4/4 each), committed 1db720af, pushed to vps2 main. Will bundle into .97 (Gitea #12 to close on .97 ship).
- Validation nodes: .116 + .198 (pw ThisIsWeb54321@). Runtime is podman (docker not in non-interactive PATH). Sideload binary → /usr/local/bin/archipelago + restart (containers survive on these nodes).
- 2026-06-15 (cont.): B1,B2,B4 ✅ dedup+guard — committed ed493106, unit-tested 2/2, live .198 healthy. B14 ✅ transport recording — committed 1c6dc153 (after build-repair: used private crate::federation::storage:: path → E0603; fixed to re-exported crate::federation::). B21 ✅ Tor/FIPS pill — committed 0801dd66. All pushed to vps2 main; builds verified EXIT 0.
- Discovered B14b (FIPS reaches only ~4/15 peers; rest genuinely Tor) and B21 (pill) during the block.
- ⚠️ LESSON: a backgrounded build "completed" notification does NOT mean success — grep the EXIT code before committing (a broken commit reached main once; repaired by 1c6dc153; no release cut from it → fleet unaffected).
- NEXT: B3 (peer media streaming — big), then B14b (FIPS reachability), then app-specific (B6,B7,B9–B13,B15–B19). None deployed to fleet yet — all on vps2 main awaiting the .97 release after full .116/.198 + UI verification.