Peer media (music/video) wouldn't play: the frontend downloaded the whole file via RPC as base64 and made a non-seekable Blob URL, so <video>/large <audio> stalled and big files hit the RPC timeout. Add GET /api/peer-content/<onion>/<id> — a same-origin, session-gated proxy that forwards the browser's Range header to the peer's /content/<id> (which already returns 206 Partial Content) and passes status + Content-Range + Content-Type back. PeerFiles.playMedia() now points <video>/<audio> at this streaming URL for free content instead of buffering a base64 blob, so the player can seek and start immediately. Onion/id validated to prevent SSRF/path traversal. (Paid preview keeps its existing flow.) Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Production-Quality Bug Tracker
Living tracker for the post-v1.7.96 "no new features until production quality" push. Updated continuously as we investigate → fix → test → pass. Kept in-repo so progress survives a session cutoff.
Rules (from user, 2026-06-15)
- No new features until the OS is production / no-bugs quality.
- Test-harness-first: build/extend a harness for each bug before fixing.
- Validate every fix on .116 + .198 (both 192.168.1.x, pw ThisIsWeb54321@) + the harness BEFORE it goes into any release. (.198 still carries the LND CORS nginx duplicate → good for fix-(a) validation; .116 does not.)
- Priority order: cloud/federated-nodes + mesh FIRST, then app-specific, then low-pri.
Status legend
TODO · INVESTIGATING · ROOT-CAUSED · FIXING · TESTING (on .116+harness) · PASSED · SHIPPED
Release status
- v1.7.96-alpha — SHIPPED (2026-06-15). Live on vps2 (primary OTA): manifest v1.7.96-alpha, assets HTTP 200, main@8c3c7954 + tag present. Contents: kiosk grid removal + FIPS TCP/UDP anchor selector. NOTE: gitea-local (localhost) mirror push failed (token rejected → /login); non-blocking, needs refreshed token.
- v1.7.97-alpha — IN PROGRESS (this push). Will bundle the verified fixes below.
🔴🔴 TOP PRIORITY
B5 — LND "connect your wallet" details/QR broken fleet-wide — ROOT-CAUSED
Origin: user escalation. Symptom: LND connect screen (served on app port :18083) can't load details/QR. Two distinct root causes (confirmed live):
- (a) Duplicate ACAO on /lnd-connect-info (seen on .103): backend sets Access-Control-Allow-Origin (proxy.rs:108) AND nginx add_header adds a second → browser rejects "multiple values". nginx config drift. Fix: bootstrap.rs nginx patch must strip the redundant add_header from the /lnd-connect-info location (backend owns CORS).
- (b) No ACAO on /proxy/lnd/v1/* 401 (fleet-wide): the unauth/auth-layer 401 is produced before the CORS-adding proxy handler (proxy.rs:135 handle_lnd_proxy). Browser → "No 'Access-Control-Allow-Origin' header". Fix: ensure auth-layer/early-return responses for /proxy/lnd + /lnd-connect-info carry CORS headers. .116 /lnd-connect-info returns a single correct ACAO → symptom varies by node's nginx state.
- Backend CORS helper: handler/mod.rs app_cors_origin() (:270) — reflects Origin when its host == request host.
- Backend change → ships in .97. Status: ✅ PASSED — verified on .116, .198, .103 (harness 4/4 each). Ready to bundle into .97.
- Caveat: bootstrap's nginx dup-strip runs a few seconds AFTER /health goes green (async patch+reload) — converges within ~1 min of restart; not instant. Acceptable.
- CODE CHANGES MADE (uncommitted):
  - core/archipelago/src/bootstrap.rs: added NGINX_LND_DUP_CORS const + strip in patch_nginx_conf() (removes the duplicate nginx add_header ACAO from /lnd-connect-info so the backend's single header wins). Idempotent; runs on startup nginx bootstrap. → fixes (a)
  - core/archipelago/src/api/handler/mod.rs: new unauthorized_cors(origin) helper (:~205) + /proxy/lnd/ route (:~505) computes origin first and returns unauthorized_cors so the 401 carries ACAO. → fixes (b)
- Test on .116 for (b); test on .103 for (a) [.116 has no dup to strip].
- 2026-06-15 RESULT — .116 (fix b): harness 4/4 PASS (sideloaded built binary, restarted). /proxy/lnd/v1/* now returns CORS on the 401. ✅
- (Correction: an earlier "LND container MISSING" reading was a FALSE alarm — docker isn't in the non-interactive PATH; runtime is podman. Verified lnd Up 9h — containers SURVIVED the restart cleanly.)
- Next: deploy to .103 + run harness to confirm fix (a) (nginx dup strip).
- Harness: tests/production-quality/lnd-cors-test.sh <node> — asserts single correct ACAO on /lnd-connect-info + ACAO present on /proxy/lnd/v1/{getinfo,channels}. Baseline (2026-06-15): .116 = 2 pass/2 fail (proxy missing ACAO); .103 = 1 pass/3 fail (connect-info dup + proxy missing).
- FIX PLAN (precise):
  - (b) handler/mod.rs:504-508 /proxy/lnd/ returns Self::unauthorized() (401, NO CORS) when session check fails → browser CORS wall. Add CORS (app_cors_origin) to that 401. Same pattern for any other app-origin early-return.
  - (a) nginx /lnd-connect-info location double-adds ACAO (backend + nginx add_header). Strip the nginx add_header Access-Control-Allow-Origin there; backend owns CORS. Update bootstrap.rs nginx patch to remove it on existing nodes (idempotent).
  - Verify: rebuild backend, deploy to .116, run harness → expect 3/3 (or 4 assertions) PASS on .116 AND .103.
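Sketch of fix (b) for reference (Rust, std only). This is NOT the code in handler/mod.rs: the function bodies, the port-stripping host comparison, the Vary header, and the plain (status, headers) return type are all assumptions made for illustration. It only shows the idea of computing the reflected origin first so that the 401 early return still carries a single ACAO.

```rust
/// Extract the host from an Origin value such as "http://192.168.1.116:18083".
/// Assumption: plain host or IPv4 only; bracketed IPv6 literals are not handled.
fn origin_host(origin: &str) -> Option<&str> {
    let rest = origin
        .strip_prefix("http://")
        .or_else(|| origin.strip_prefix("https://"))?;
    let host = rest.split(':').next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Reflect the Origin only when its host equals the request's Host (ports ignored).
fn app_cors_origin(origin: Option<&str>, host_header: &str) -> Option<String> {
    let origin = origin?;
    let request_host = host_header.split(':').next().unwrap_or(host_header);
    if origin_host(origin)?.eq_ignore_ascii_case(request_host) {
        Some(origin.to_string())
    } else {
        None
    }
}

/// Build the 401 early return: status plus headers, with at most one ACAO.
fn unauthorized_cors(origin: Option<String>) -> (u16, Vec<(String, String)>) {
    let mut headers = Vec::new();
    if let Some(o) = origin {
        headers.push(("Access-Control-Allow-Origin".to_string(), o));
        // The response now depends on the request's Origin header.
        headers.push(("Vary".to_string(), "Origin".to_string()));
    }
    (401, headers)
}
```

Usage idea: the /proxy/lnd/ route would call app_cors_origin() before the session check and hand the result to unauthorized_cors() on failure, so a foreign origin still gets a bare 401 with no ACAO.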
🔴 PRIORITY — cloud / federation / mesh
B1 — Trusted-node list not clean — PASSED (onion-dedup; unit test 2/2; live .198 15→13 distinct, healthy). UI visual-confirm recommended.
Dupes, erroneous names, and non-convergent group membership across nodes. Expected: trusted nodes form a transitive group (every node connects to any newly-added trusted node; all nodes show the same set). .103 has a long/dirty list.
B2 — Duplicate chat contact for one node — PASSED (resolved by load-dedup feeding mesh seed; unit-tested). UI visual-confirm recommended.
Federated peer "sapien" shows TWO chats: one "sapien" WITHOUT archy logo (looks non-federated) + one named by raw DID did:key:z6MkoSbN5CM7fBaQg2nWbCymEkFXsHnuXvec9Mjo5RtJf9dQ. Same node keyed by both federated identity and raw DID → merge to one. Code: core/archipelago/src/mesh + mesh/typed_messages.rs (note :233 — meshcore adverts don't carry archy pubkey).
B3 — Cloud peer media won't preview/play — FIXING (code done: /api/peer-content streaming proxy + playMedia streams free content)
Music/video preview files on peer nodes' cloud don't play (streaming/range/content-type over mesh+Tor peer fetch).
B4 — Cloud "my folders" fails (JSON parse / 502) — PASSED (content-type guard; built, guard in bundle, deployed .198). UI visual-confirm recommended.
Unexpected token '<', "<!doctype" when FileBrowser absent (/app/filebrowser/api/resources → SPA index.html), and 502 when FileBrowser is down (seen on .103). filebrowser-client.ts:102/:106. Fix: detect FileBrowser unavailable, friendly prompt; consider nginx returning JSON 404/502 for missing /app/<app>/ instead of SPA shell. Handle BOTH absent + down.
B14 — cloud browse transport not recorded — FIXED (record_peer_transport in 4 content handlers; build OK). NOTE: live data shows FIPS reaches only ~4/15 peers, 6 fall back to Tor genuinely → see B14b.
Browsing trusted/peer nodes in the Cloud tab connects over Tor instead of FIPS (should prefer FIPS like the rest of mesh; same for peer browsing). cf project_fips_integration, project_tor_node_to_node_works (last_transport should be fips/mesh).
🟠 APP-SPECIFIC
B6 — ElectrumX install button missing "Requires Archival Node" gate — TODO
Show the yellow requirement badge when no full node / only a pruned node is present (reuse existing yellow badge pattern).
B7 — ElectrumX UI stuck loader on top — TODO
UI renders but a loader sits on top; possibly stale pre-sync screen not clearing.
B9 — IndeedHub keeps stopping on nodes — TODO
Container won't stay running (crash-loop / reconcile stop). Check logs + restart policy + health.
B10 — Immich still crashes — TODO
Recurring crash ("still" → prior attempts). Check container logs + resource limits + DB/ML deps.
B11 — Companion app: "open in external browser" apps don't work — TODO
Apps meant to open in a new/external browser don't launch from the companion app; need the phone-default-browser request-modal pattern mobile apps use. Relates to v1.7.90 "open in new tab from companion app".
B12 — Mempool not connecting to Bitcoin on some nodes — TODO
mempool can't reach the Bitcoin backend on some nodes. Investigate on .116. Check mempool→electrs→bitcoind wiring + deps.
B13 — Fedimint UI not applying CSS — TODO
Actual Fedimint UI (not pre-sync) renders unstyled. Likely asset path / proxy base-href (assets rooted at / vs /app/fedimint/).
B15 — Bitcoin UI sync progress lags — TODO
Bitcoin UI doesn't update its sync progress fast enough even though the console clearly already has the block-height data. Likely a polling-interval / reactive-update gap between the status source and the UI.
B16 — Bitcoin sync status on Home > System container vanishes — TODO
The bitcoin sync status in the Home > System container disappears when it should persist/cache and show an "updating" state. Related to B15 (Bitcoin UI sync lag). Likely the status component clears on empty/transitional poll instead of retaining last-known + showing updating.
B17 — archipelago.service flaps on boot before starting — TODO
On some boots, [FAILED] Failed to start archipelago.service - Archipelago Backend prints ~20 times over ~5 min before it finally starts properly. Likely a startup dependency/timing race (DB lock, port bind, crash-recovery, or a dependency not ready) causing systemd restart loop until a precondition is met. Check service Restart=/RestartSec, ExecStartPre gates, and what the early failures log. May tie to B16/crash-recovery.
B18 — Apps stop right after install (or become unstartable) — TODO
Many apps install but immediately stop, requiring a manual Start — or become unstartable entirely. Likely the install→start handoff / reconciler doesn't bring them up (or starts then they exit). Related to B9 (IndeedHub stopping), B10 (Immich). Possibly linked to the cgroup-SIGKILL-on-archipelago.service-restart issue (feedback_no_systemctl_deploy_until_quadlet) — but NOTE: on .116 (Quadlet) containers survived a service restart cleanly, so the reconciler may be fine there; reproduce on the affected nodes. Check post-install start sequencing + boot_reconciler + container restart policy + cgroup placement.
B19 — Failed download-update lands on Install button (should be Download) — TODO
When an update download fails, the UI sometimes shows the Install button instead of returning to the Download button — a big UX issue (user can't retry the download cleanly). Check the SystemUpdate state machine's error/failure transition.
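Sketch of the intended transition (Rust, std only). The state and event names below are hypothetical and do not come from the SystemUpdate code, which has not been inspected yet; the only point is that a failed download should land back on the state whose primary button is Download.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateState {
    Available,   // update known, nothing downloaded yet
    Downloading,
    Downloaded,  // payload on disk, ready to install
    Installing,
}

#[derive(Debug, Clone, Copy)]
enum UpdateEvent {
    StartDownload,
    DownloadOk,
    DownloadFailed,
    StartInstall,
}

/// Pure transition function; unknown (state, event) pairs leave the state unchanged.
fn transition(state: UpdateState, event: UpdateEvent) -> UpdateState {
    use UpdateEvent::*;
    use UpdateState::*;
    match (state, event) {
        (Available, StartDownload) => Downloading,
        (Downloading, DownloadOk) => Downloaded,
        // The B19 case: a failure must return to Available, never to Downloaded.
        (Downloading, DownloadFailed) => Available,
        (Downloaded, StartInstall) => Installing,
        (s, _) => s,
    }
}

/// Label of the primary button for each state.
fn primary_button(state: UpdateState) -> &'static str {
    match state {
        UpdateState::Available => "Download",
        UpdateState::Downloading => "Downloading",
        UpdateState::Downloaded => "Install",
        UpdateState::Installing => "Installing",
    }
}
```

Deriving the button label from the state alone (rather than from a separate flag) makes it impossible for a failed download to show Install.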
B20 — Surface bitcoin-headers-over-mesh broadcast (send/receive toggles) — TODO (feature-adjacent, surfacing existing work)
We previously broadcast bitcoin block headers over mesh to archipelago nodes but never fully surfaced it. Want two switches: "send headers" (you broadcast) and "receive headers" (you accept). NOTE: this is feature-adjacent — surfacing existing functionality; the user added it during the no-new-features push, so treat as low-priority polish until the bug list is clear. Code: mesh block-headers (mesh.block-headers RPC seen in logs; core/archipelago/src/mesh).
B14b — FIPS reachability: many peers fall back to Tor — TODO (priority, deeper)
Live (2026-06-15) federation sync last_transport on .116/.198: ~4 peers fips, ~6 tor, ~5 none. So beyond the recording fix (B14), FIPS genuinely doesn't reach many federated peers (they use Tor). Investigate WHY: is fips_npub known for those peers? are they FIPS-online? is the shared anchor connecting them? (cf project_fips_integration, project_tor_node_to_node_works). This is the real "Tor not FIPS" depth.
B21 — Show Tor/FIPS transport pill on cloud browse — FIXED (build+type-check green; deploy+UI-confirm on .116/.198)
Tag whether the peer connection is Tor or FIPS and surface it as a small pill on the cloud browse screens / connection loader. Data source: federation node last_transport (now recorded by B14) exposed via federation.list-nodes; frontend renders a pill (FIPS=fast/green, Tor=slower) on PeerFiles.vue / Cloud peer view + the connection loader. Frontend-only-ish. FINDINGS: PeerFiles.vue:46 loader HARDCODES 'Connecting via Tor...' even when FIPS used (bug). Frontend types already have last_transport ('fips'|'tor'|'mesh'|'lan') federation/types.ts:31; NodeList.vue:167 already renders a transport indicator. PLAN: have content.browse-peer RETURN the transport used (B14 already computes it) → frontend shows a pill (FIPS green / Tor amber) on PeerFiles header + fix the loader text to reflect actual/attempted transport. Small backend (add transport to browse response) + frontend pill.
B8 — netbird app doesn't work — TODO (LOW / much later)
(RETRACTED: CryptPad placeholder-icon — user says cryptpad is fine.)
📋 vps2 Gitea issues (lfg2025/archy) — imported 2026-06-15
- G#1 [Bug] Strange peer request behaviour — TODO (likely related to B1/federation)
- G#2 [Bug] Fix flashing USB from kiosk — TODO
- G#3 [Feature] VPN Configuration — DEFERRED (feature; no new features until production quality)
- G#4 [Bug] Bitcoind is slow — TODO
- G#5 [Feature] OpenWRT and TollGate integration — DEFERRED (feature)
- G#6 [Feature] Move dashboard/monitoring link to home screen — DEFERRED (feature)
- G#7 [Bug] Scrolling with Companion app — TODO
Gitea issue mapping (vps2 lfg2025/archy)
All backlog bugs now mirrored as Gitea issues: B1→#8, B2→#9, B3→#10, B4→#11, B5→#12, B6→#13, B7→#14, B8→#15, B9→#16, B10→#17, B11→#18, B12→#19, B13→#20, B14→#21, B15→#22, B16→#23, B17→#24, B18→#25, B19→#26. (Pre-existing G#1–7 remain; some overlap, e.g. G#1 strange-peer ≈ B1.) Close the Gitea issue when a bug is verified+shipped.
INVESTIGATION FINDINGS 2026-06-15 (B1/B2/B3/B4/B14) — cutoff insurance
B1 trusted-node divergence — ROOT-CAUSED. federation/sync.rs merge_transitive_peers() (~:140) dedupes ONLY by DID; the SAME physical node appears under multiple DIDs (same onion + fips_npub) → duplicate entries ("Arch Dev" ×2, "Sapien" ×2). No background convergence → lists diverge (.103=16 nodes, .116/.198=15). Model: federation/types.rs:24 FederatedNode (PK=did); storage federation/storage.rs nodes.json; add_node dedupes by DID only (:125). FIX: in merge_transitive_peers add a SECOND match arm — if no DID match, match by normalized onion (trim .onion); if found, treat as same node (merge fips_npub/name, don't add). Same dedup on add_node. Plus a one-time cleanup of existing dup DIDs (remove-node the stale one). TEST: after sync, all 3 nodes have identical node set, no two entries share an onion.
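Sketch of the second match arm (Rust, std only). The struct is a cut-down, hypothetical stand-in for FederatedNode, and normalize_onion() plus the "keep existing name, fill missing fips_npub" merge policy are assumptions; the real merge_transitive_peers() in federation/sync.rs may differ.

```rust
#[derive(Debug, Clone, PartialEq)]
struct FederatedNode {
    did: String,
    onion: String,
    fips_npub: Option<String>,
    name: String,
}

/// Trim whitespace, lower-case, and drop a trailing ".onion".
fn normalize_onion(onion: &str) -> String {
    let o = onion.trim().to_ascii_lowercase();
    o.strip_suffix(".onion").unwrap_or(o.as_str()).to_string()
}

/// Merge incoming peers into `known`: match by DID first, then by normalized onion.
fn merge_transitive_peers(known: &mut Vec<FederatedNode>, incoming: Vec<FederatedNode>) {
    for peer in incoming {
        let key = normalize_onion(&peer.onion);
        let idx = known
            .iter()
            .position(|n| n.did == peer.did)
            // Second arm: no DID match, so look for the same physical node by onion.
            .or_else(|| {
                known
                    .iter()
                    .position(|n| !key.is_empty() && normalize_onion(&n.onion) == key)
            });
        match idx {
            Some(i) => {
                // Same node: fill in missing fields, do not add a new entry.
                let node = &mut known[i];
                if node.fips_npub.is_none() {
                    node.fips_npub = peer.fips_npub;
                }
                if node.name.is_empty() {
                    node.name = peer.name;
                }
            }
            None => known.push(peer),
        }
    }
}
```

The TEST condition above ("no two entries share an onion") corresponds to checking that normalize_onion() is unique across the merged list.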
B2 duplicate chat contact — ROOT-CAUSED (same root as B1). Two federation DIDs (same onion/fips_npub, e.g. "Sapien" dids z6MkoSbN… + z6MkeYMU…) get seeded as TWO mesh contacts: mesh/mod.rs seed_federation_peers_into_mesh() (:94) upserts per-pubkey contact_id; frontend Mesh.vue mergeKeyForPeer() (:492) keys by DID so two DIDs = two rows. FIX: (backend) in seed, skip a node whose onion was already seeded (HashSet of onions); (frontend) Mesh.vue merge by onion when DIDs differ but onion matches. Fixing B1's onion-dedup largely resolves this too. TEST: one "Sapien" row; mesh.peers has one contact for the shared onion.
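Sketch of the backend half of the fix (Rust, std only). PeerRef and dids_to_seed() are hypothetical names, and "first DID seen for an onion wins" is an assumed policy; the real seed_federation_peers_into_mesh() works on the federation node records, not on this reduced type.

```rust
use std::collections::HashSet;

/// Minimal view of a federation node for seeding purposes (hypothetical type).
struct PeerRef<'a> {
    did: &'a str,
    onion: &'a str,
}

/// Return the DIDs that should be seeded as mesh contacts.
/// A node whose onion was already seeded is skipped; nodes without an
/// onion cannot be deduplicated this way and are always kept.
fn dids_to_seed<'a>(peers: &[PeerRef<'a>]) -> Vec<&'a str> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for p in peers {
        let lower = p.onion.trim().to_ascii_lowercase();
        let key = lower.trim_end_matches(".onion").to_string();
        // HashSet::insert returns false when the key was already present.
        if key.is_empty() || seen.insert(key) {
            out.push(p.did);
        }
    }
    out
}
```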
B3 peer media won't play — ROOT-CAUSED. PeerFiles.vue playMedia()/loadPreview() (~:358,:508) fetch the WHOLE file via RPC content.preview-peer/content.download-peer (api/rpc/content.rs :393,:213) which base64-encodes the entire file; frontend makes a Blob URL → browser can't Range-seek → video/large-audio won't play (+ 30/120s timeouts truncate big files). The peer's HTTP /content/<id> handler (api/handler/content.rs :49) ALREADY supports Range/206 + Accept-Ranges. FIX (bigger): add a local streaming proxy endpoint /api/peer-content/{onion}/{id} in api/handler/mod.rs that forwards the browser's Range header to the peer's /content/<id> (via fips::dial PeerRequest) and streams back 206 + Content-Range + Content-Type; frontend sets <video>/<audio> src to that URL (not a blob). TEST: curl Range on the new endpoint → 206 + Content-Range; video seeks/plays.
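Sketch of the request-side checks for the proxy route (Rust, std only). Everything here is illustrative: the function names are hypothetical, the content-id alphabet and 128-byte limit are assumptions, and the onion check assumes v3 addresses (56 base32 characters). The actual handler also has to gate on the session and dial the peer, which this does not show.

```rust
/// Response headers passed back from the peer, per the fix description above.
/// (A real proxy may need more, e.g. Content-Length; that is an assumption.)
const PASSTHROUGH: [&str; 2] = ["content-range", "content-type"];

/// v3 onion host: 56 characters from a-z / 2-7, optional ".onion" suffix.
fn is_valid_onion(onion: &str) -> bool {
    let host = onion.strip_suffix(".onion").unwrap_or(onion);
    host.len() == 56 && host.bytes().all(|b| matches!(b, b'a'..=b'z' | b'2'..=b'7'))
}

/// Assumed id alphabet: ASCII letters, digits, '-' and '_' (no dots or slashes).
fn is_valid_content_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 128
        && id.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}

/// Split "/api/peer-content/<onion>/<id>" and reject anything that could be
/// used for SSRF (non-onion hosts) or path traversal ("..", extra slashes).
fn parse_peer_content_path(path: &str) -> Option<(&str, &str)> {
    let rest = path.strip_prefix("/api/peer-content/")?;
    let (onion, id) = rest.split_once('/')?;
    if is_valid_onion(onion) && is_valid_content_id(id) {
        Some((onion, id))
    } else {
        None
    }
}

/// Keep only the allow-listed upstream headers (case-insensitive match).
fn passthrough_headers(upstream: &[(String, String)]) -> Vec<(String, String)> {
    upstream
        .iter()
        .filter(|(name, _)| PASSTHROUGH.iter().any(|h| name.eq_ignore_ascii_case(h)))
        .cloned()
        .collect()
}
```

The browser's Range header would be forwarded unchanged to the peer's /content/<id>, and the peer's status (206 for a partial response) returned as-is alongside the filtered headers.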
B4 cloud my-folders <!doctype/502 — ROOT-CAUSED. filebrowser-client.ts listDirectory() (:99) does res.json() (:106) after only an res.ok check; when FileBrowser is ABSENT nginx serves SPA index.html (200, '<!doctype') → JSON crash; when DOWN → 502. FIX (frontend, low-risk): guard res content-type !== application/json → throw typed "FileBrowser unavailable" handled by Cloud.vue/CloudFolder.vue empty-state; same guard in login() (:71) + getUsage() (:215). OPTIONAL nginx: add error_page 502 503 = @filebrowser_unavailable returning JSON in the /app/filebrowser/ block (image-recipe/configs/nginx-archipelago.conf ~:411). TEST: stop filebrowser on .116/.198 → Cloud shows friendly state, no doctype crash.
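Sketch of the guard's decision logic. The real guard lives in the TypeScript client (filebrowser-client.ts); it is written here in Rust only to keep one example language in this tracker, and the type and function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum FileBrowserError {
    /// FileBrowser is absent (SPA shell served) or down (502 etc.).
    Unavailable { status: u16 },
}

/// Accept a response only if it is 2xx AND declares a JSON body.
/// A 200 with text/html is the "nginx served index.html" case.
fn check_filebrowser_response(
    status: u16,
    content_type: Option<&str>,
) -> Result<(), FileBrowserError> {
    let ok = (200..300).contains(&status);
    let is_json = content_type
        .map(|ct| {
            // Ignore parameters such as "; charset=utf-8".
            ct.split(';')
                .next()
                .unwrap_or("")
                .trim()
                .eq_ignore_ascii_case("application/json")
        })
        .unwrap_or(false);
    if ok && is_json {
        Ok(())
    } else {
        Err(FileBrowserError::Unavailable { status })
    }
}
```

Both failure modes (absent and down) collapse into one typed error, which is what lets the Cloud views show a single friendly empty state.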
B14 cloud browse Tor-not-FIPS — ROOT-CAUSED (nuance). FIPS-first logic WORKS (fips/dial.rs send_get :331 tries FIPS, falls back to Tor on 404/5xx; v1.7.94 fix). BUT the 4 content handlers in api/rpc/content.rs (browse :297, download :237, download_paid :356, preview :421) capture _transport and NEVER call record_peer_transport() → UI badge shows Tor/null even when FIPS used. FIX: add record_peer_transport(data_dir, None, Some(onion), &transport.to_string()) after each successful send_get (storage.rs:84 has the fn). ⚠️ VERIFY on nodes whether FIPS is ACTUALLY used or genuinely falling back to Tor (if genuinely Tor, deeper FIPS-reachability issue beyond recording). TEST: after browse, last_transport = fips (when peer FIPS-reachable).
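Sketch of "FIPS first, then record what was actually used" (Rust, std only). The closures stand in for the real FIPS and Tor requests, record_last_transport() is an in-memory stand-in with a different signature from the real record_peer_transport(), and falling back on a transport error (not just 404/5xx) is an assumption.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Transport {
    Fips,
    Tor,
}

impl fmt::Display for Transport {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Transport::Fips => "fips",
            Transport::Tor => "tor",
        })
    }
}

/// Try FIPS; fall back to Tor on an error, a 404, or any 5xx status.
fn send_get<F, T>(fips: F, tor: T) -> Result<(Transport, u16), String>
where
    F: FnOnce() -> Result<u16, String>,
    T: FnOnce() -> Result<u16, String>,
{
    match fips() {
        Ok(status) if status != 404 && status < 500 => Ok((Transport::Fips, status)),
        _ => tor().map(|status| (Transport::Tor, status)),
    }
}

/// In-memory stand-in: remember the last transport used per onion.
fn record_last_transport(store: &mut HashMap<String, String>, onion: &str, transport: &str) {
    store.insert(onion.to_string(), transport.to_string());
}

/// What each content handler should do: use the transport value instead of
/// discarding it, and record it after a successful request.
fn browse_peer<F, T>(
    store: &mut HashMap<String, String>,
    onion: &str,
    fips: F,
    tor: T,
) -> Result<u16, String>
where
    F: FnOnce() -> Result<u16, String>,
    T: FnOnce() -> Result<u16, String>,
{
    let (transport, status) = send_get(fips, tor)?;
    record_last_transport(store, onion, &transport.to_string());
    Ok(status)
}
```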
Progress log
- 2026-06-15: tracker created. v1.7.96-alpha shipped. All 19 bugs filed as Gitea issues #8–#26. vps2 feature issues (G#3/5/6) deferred (no new features).
- 2026-06-15: B5 (LND CORS) ✅ DONE — root-caused, both fixes implemented, verified on .116/.198/.103 (harness 4/4 each), committed 1db720af, pushed to vps2 main. Will bundle into .97 (Gitea #12 to close on .97 ship).
- Validation nodes: .116 + .198 (pw ThisIsWeb54321@). Runtime is podman (docker not in non-interactive PATH). Sideload binary → /usr/local/bin/archipelago + restart (containers survive on these nodes).
- 2026-06-15 (cont.): B1,B2,B4 ✅ dedup+guard — committed ed493106, unit-tested 2/2, live .198 healthy. B14 ✅ transport recording — committed 1c6dc153 (after build-repair: used private crate::federation::storage:: path → E0603; fixed to re-exported crate::federation::). B21 ✅ Tor/FIPS pill — committed 0801dd66. All pushed to vps2 main; builds verified EXIT 0.
- Discovered B14b (FIPS reaches only ~4/15 peers; rest genuinely Tor) and B21 (pill) during the block.
- ⚠️ LESSON: a backgrounded build "completed" notification does NOT mean success — grep the EXIT code before committing (a broken commit reached main once; repaired by 1c6dc153; no release cut from it → fleet unaffected).
- NEXT: B3 (peer media streaming — big), then B14b (FIPS reachability), then app-specific (B6,B7,B9–B13,B15–B19). None deployed to fleet yet — all on vps2 main awaiting the .97 release after full .116/.198 + UI verification.