
▶▶ SESSION SAVE / RESUME (2026-06-16) — v1.7.97-alpha CUT, mid-rollout

v1.7.97-alpha is BUILT + TAGGED LOCALLY but NOT yet published to the fleet.

  • Release commit 47c16971 ("chore: release v1.7.97-alpha") + tag v1.7.97-alpha exist on LOCAL main only. NOT pushed to gitea-vps2. Fleet still sees 1.7.96-alpha.
  • Contents (14 fixes + image-opt): B5,B1,B2,B4,B14,B21,B3,B15,B7,B13,B12,B16,B17, B6-pruned-gate + lossless background-image optimization (bg-mesh PNG→JPEG).
  • Release artifacts staged: releases/v1.7.97-alpha/{archipelago, archipelago-frontend-1.7.97-alpha.tar.gz} + /tmp/archipelago-frontend-1.7.97-alpha.tar.gz (177MB, flat layout verified, optimized images baked in, no APK).
  • Deployed (sideload, NOT fleet OTA): .116 = on 1.7.97-alpha, healthy, B17 self-heal CONFIRMED (unit now has RequiresMountsFor, 36 containers survived restart). .198 = deploying (sideload binary+frontend).
  • Backup binaries for rollback: /usr/local/bin/archipelago.1.7.96-alpha.bak on .116 and .198.

REMAINING (this session, user wants to do WITH them):

  1. Finish .198 sideload; then UI-confirm fixes together on .116/.198 + close passing Gitea issues (#8,#9,#10,#11,#12,#14,#19(code-only),#20,#21,#22,#23,#24,#29). Issue map below.
  2. Publish to fleet: scripts/publish-release-assets.sh 1.7.97-alpha gitea-vps2 + git push gitea-vps2 main + tag (AFTER joint confirm — user's call).
  3. Cut a fresh ISO (bakes B13 nginx + B17 unit + all frontend). ISO builds run on a server (deploy-to-target / .228). Then test the ISO together.

⚠️ LESSON: never run the release binary to "check --version" — it has no such flag and BOOTS A FULL NODE (adopts containers, grabs mesh radio). Use strings <bin> | grep version. (Did this on .116; the instance exited on the :5678 port conflict, no harm.)
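A safe alternative to executing the binary is to scan its bytes for printable runs, which is all strings(1) does. Below is a minimal Rust sketch that assumes a hypothetical helper, find_strings, which does not exist in the repo. The value 4 used below mirrors the default minimum run length of GNU strings.

```rust
/// Collect printable-ASCII runs of at least `min_len` bytes that contain `needle`,
/// like `strings <bin> | grep <needle>`, without ever executing the binary.
/// Hypothetical helper for illustration; not part of the archipelago codebase.
pub fn find_strings(bytes: &[u8], needle: &str, min_len: usize) -> Vec<String> {
    let mut out = Vec::new();
    let mut run: Vec<u8> = Vec::new();
    // A trailing 0 byte acts as a sentinel that flushes the final run.
    for &b in bytes.iter().chain(std::iter::once(&0u8)) {
        if b.is_ascii_graphic() || b == b' ' {
            run.push(b);
            continue;
        }
        if run.len() >= min_len {
            // Only ASCII bytes were pushed, so this conversion cannot fail.
            let s = String::from_utf8(std::mem::take(&mut run)).expect("ASCII run");
            if s.contains(needle) {
                out.push(s);
            }
        }
        run.clear();
    }
    out
}
```

To use it, pass the bytes from std::fs::read of the release binary along with a needle such as "1.7.". Nothing is spawned, so the binary never boots a node.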


▶▶ SESSION SAVE / RESUME (2026-06-15)

State: v1.7.96-alpha SHIPPED. v1.7.97-alpha NOT cut yet — 10 fixes committed on vps2 main (git remote: gitea-vps2), nothing on the fleet yet. Validate on .116/.198 + UI-confirm BEFORE cutting .97.

Resume command (run elsewhere):

cd ~/Projects/archy && git fetch gitea-vps2 && git checkout main && git reset --hard gitea-vps2/main && cat tests/production-quality/TRACKER.md

Then continue from "IN PROGRESS" below.

Committed & ready for .97 (vps2 main): B5 (LND CORS, verified .116/.198/.103), B1, B2, B4, B14, B21, B3 (incl. /api/peer-content nginx via bootstrap), B15, B7, B13 (fedimint CSS self-heal — main conf + HTTPS snippet, verified .198 both paths app-icon 404→200), B12 (mempool bitcoin-host detect across 3 render paths — unit-tested; live bitcoin-core validation pending), B16 (bitcoin sync tile retain/Updating… — unit-tested 6/6, commit 83dbd25c). B6 pruned-gate already live. = 13 fixes. PLUS image-optimization (commit 386d4bfc — all bg images losslessly optimized, bg-mesh PNG→JPEG; user asked to include it in the .97 release).

IN PROGRESS: B13, B12 and B16 are DONE (committed; B16 = commit 83dbd25c; see entries below). Pick up at the B6 no-node-present half. REMAINING:

  1. B6 no-node-present half, B12b (sibling bitcoin-host hardcodes: LND/BTCPay/electrumx/fedimint + mempool dep declaration — reuse {{BITCOIN_HOST}}; needs validation, esp. LND/fedimint), B14b (FIPS reachability depth), B22/B23 (peer download + group chat — need live repro), B9/B10/B11/B17/B18/B19, B8 (low), B20 (mesh-headers feature).
  2. Loose end: 4 pre-existing prod_orchestrator test failures (generated-files/data_uid fixtures use disallowed tempdir volume sources) — see B12 NOTE; separate small fix.

Note: .198 is running a sideloaded B13-era .97-dev binary (md5 4c83803d). The B12 binary was built (core/target/release/archipelago) but NOT sideloaded (mempool isn't on .198; .198 is Knots so B12 is a no-op there). Reflashing/OTA replaces the dev binary.

Ship .97 when ready: ./scripts/create-release.sh 1.7.97-alpha (curate CHANGELOG ≥3 layman bullets first + run scripts/sync-whats-new.py; SKIP_RELEASE_TESTS=1 only for the 2 known-flaky vitest timing tests) → scripts/publish-release-assets.sh 1.7.97-alpha gitea-vps2 → git push gitea-vps2 main + tag. (gitea-local push fails: token rejected — non-blocking.)


Production-Quality Bug Tracker

Living tracker for the post-v1.7.96 "no new features until production quality" push. Updated continuously as we investigate → fix → test → pass. Kept in-repo so progress survives a session cutoff.

Rules (from user, 2026-06-15)

  • No new features until the OS is production / no-bugs quality.
  • Test-harness-first: build/extend a harness for each bug before fixing.
  • Validate every fix on .116 + .198 (both 192.168.1.x, pw ThisIsWeb54321@) + the harness BEFORE it goes into any release. (.198 still carries the LND CORS nginx duplicate → good for fix-(a) validation; .116 does not.)
  • Priority order: cloud/federated-nodes + mesh FIRST, then app-specific, then low-pri.

Status legend

TODO · INVESTIGATING · ROOT-CAUSED · FIXING · TESTING (on .116+harness) · PASSED · SHIPPED

Release status

  • v1.7.96-alpha — SHIPPED (2026-06-15). Live on vps2 (primary OTA): manifest v1.7.96-alpha, assets HTTP 200, main@8c3c7954 + tag present. Contents: kiosk grid removal + FIPS TCP/UDP anchor selector. NOTE: gitea-local (localhost) mirror push failed (token rejected → /login); non-blocking, needs refreshed token.
  • v1.7.97-alpha — IN PROGRESS (this push). Will bundle the verified fixes below.

🔴🔴 TOP PRIORITY

B5 — LND "connect your wallet" details/QR broken fleet-wide — PASSED (verified on .116/.198/.103; in v1.7.97-alpha)

Origin: user escalation. Symptom: LND connect screen (served on app port :18083) can't load details/QR. Two distinct root causes (confirmed live):

  • (a) Duplicate ACAO on /lnd-connect-info (seen on .103): backend sets Access-Control-Allow-Origin (proxy.rs:108) AND nginx add_header adds a second → browser rejects "multiple values". nginx config drift. Fix: bootstrap.rs nginx patch must strip the redundant add_header from the /lnd-connect-info location (backend owns CORS).
  • (b) No ACAO on /proxy/lnd/v1/* 401 (fleet-wide): the unauth/auth-layer 401 is produced before the CORS-adding proxy handler (proxy.rs:135 handle_lnd_proxy). Browser → "No 'Access-Control-Allow-Origin' header". Fix: ensure auth-layer/early-return responses for /proxy/lnd + /lnd-connect-info carry CORS headers.
  • .116 /lnd-connect-info returns a single correct ACAO → symptom varies by node's nginx state.
  • Backend CORS helper: handler/mod.rs app_cors_origin() (:270) — reflects Origin when its host == request host.
  • Backend change → ships in .97. Status: PASSED — verified on .116, .198, .103 (harness 4/4 each). Ready to bundle into .97.
  • Caveat: bootstrap's nginx dup-strip runs a few seconds AFTER /health goes green (async patch+reload) — converges within ~1 min of restart; not instant. Acceptable.
  • CODE CHANGES MADE (uncommitted):
    • core/archipelago/src/bootstrap.rs: added NGINX_LND_DUP_CORS const + strip in patch_nginx_conf() (removes the duplicate nginx add_header ACAO from /lnd-connect-info so the backend's single header wins). Idempotent; runs on startup nginx bootstrap. → fixes (a)
    • core/archipelago/src/api/handler/mod.rs: new unauthorized_cors(origin) helper (:~205) + /proxy/lnd/ route (:~505) computes origin first and returns unauthorized_cors so the 401 carries ACAO. → fixes (b)
    • Test on .116 for (b); test on .103 for (a) [.116 has no dup to strip].
    • 2026-06-15 RESULT — .116 (fix b): harness 4/4 PASS (sideloaded built binary, restarted). /proxy/lnd/v1/* now returns CORS on the 401.
    • (Correction: an earlier "LND container MISSING" reading was a FALSE alarm — docker isn't in the non-interactive PATH; runtime is podman. Verified lnd Up 9h — containers SURVIVED the restart cleanly.)
    • Next: deploy to .103 + run harness to confirm fix (a) (nginx dup strip).
  • Harness: tests/production-quality/lnd-cors-test.sh <node> — asserts single correct ACAO on /lnd-connect-info + ACAO present on /proxy/lnd/v1/{getinfo,channels}. Baseline (2026-06-15): .116 = 2 pass/2 fail (proxy missing ACAO); .103 = 1 pass/3 fail (connect-info dup + proxy missing).
  • FIX PLAN (precise):
    1. (b) handler/mod.rs:504-508 /proxy/lnd/ returns Self::unauthorized() (401, NO CORS) when session check fails → browser CORS wall. Add CORS (app_cors_origin) to that 401. Same pattern for any other app-origin early-return.
    2. (a) nginx /lnd-connect-info location double-adds ACAO (backend + nginx add_header). Strip the nginx add_header Access-Control-Allow-Origin there; backend owns CORS. Update bootstrap.rs nginx patch to remove it on existing nodes (idempotent).
    • Verify: rebuild backend, deploy to .116, run harness → expect all 4 assertions to PASS on .116 AND .103.
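Two backend rules are at work above: reflect Origin only when its host matches the request host, and keep CORS headers on the early 401. The Rust sketch below illustrates both using a simplified response type. SimpleResponse and strip_port are hypothetical, and the real app_cors_origin()/unauthorized_cors() signatures in handler/mod.rs may differ. The sketch also adds Vary: Origin, which is standard practice when reflecting Origin; the tracker does not say the fix sets it.

```rust
/// Simplified response type for the sketch (the real handler uses its own type).
#[derive(Debug)]
pub struct SimpleResponse {
    pub status: u16,
    pub headers: Vec<(String, String)>,
}

/// Drop an optional ":port" suffix; bracketed IPv6 literals keep their brackets.
fn strip_port(hostport: &str) -> &str {
    if let Some(end) = hostport.find(']') {
        &hostport[..=end]
    } else {
        hostport.rsplit_once(':').map_or(hostport, |(host, _)| host)
    }
}

/// Reflect the Origin only when its host equals the request's Host (ports ignored),
/// e.g. the LND connect page on :18083 calling the node it was served from.
pub fn app_cors_origin(origin: Option<&str>, request_host: &str) -> Option<String> {
    let origin = origin?;
    let (_scheme, rest) = origin.split_once("://")?; // "null" origins fail here
    let origin_host = strip_port(rest);
    if !origin_host.is_empty() && origin_host.eq_ignore_ascii_case(strip_port(request_host)) {
        Some(origin.to_string())
    } else {
        None
    }
}

/// 401 that still carries CORS, so the browser surfaces the 401 to the page
/// instead of "No 'Access-Control-Allow-Origin' header".
pub fn unauthorized_cors(origin: Option<String>) -> SimpleResponse {
    let mut headers = Vec::new();
    if let Some(o) = origin {
        headers.push(("Access-Control-Allow-Origin".to_string(), o));
        // Assumption: not stated in the tracker, added because the value is reflected.
        headers.push(("Vary".to_string(), "Origin".to_string()));
    }
    SimpleResponse { status: 401, headers }
}
```

With exactly one ACAO header on the 401, the page can read the status and prompt for re-auth, and the harness's single-header assertion holds.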

🔴 PRIORITY — cloud / federation / mesh

Dupes, erroneous names, and non-convergent group membership across nodes. Expected: trusted nodes form a transitive group (every node connects to any newly-added trusted node; all nodes show the same set). .103 has a long/dirty list.

Federated peer "sapien" shows TWO chats: one "sapien" WITHOUT archy logo (looks non-federated) + one named by raw DID did:key:z6MkoSbN5CM7fBaQg2nWbCymEkFXsHnuXvec9Mjo5RtJf9dQ. Same node keyed by both federated identity and raw DID → merge to one. Code: core/archipelago/src/mesh + mesh/typed_messages.rs (note :233 — meshcore adverts don't carry archy pubkey).

B3 — Cloud peer media won't preview/play — FIXING (code done: /api/peer-content streaming proxy + playMedia streams free content)

Music/video preview files on peer nodes' cloud don't play (streaming/range/content-type over mesh+Tor peer fetch).

Unexpected token '<', "<!doctype" when FileBrowser absent (/app/filebrowser/api/resources → SPA index.html), and 502 when FileBrowser is down (seen on .103). filebrowser-client.ts:102/:106. Fix: detect FileBrowser unavailable, friendly prompt; consider nginx returning JSON 404/502 for missing /app/<app>/ instead of SPA shell. Handle BOTH absent + down.

B14 — cloud browse transport not recorded — FIXED (record_peer_transport in 4 content handlers; build OK). NOTE: live data shows FIPS reaches only ~4/15 peers, 6 fall back to Tor genuinely → see B14b.

Browsing trusted/peer nodes in the Cloud tab connects over Tor instead of FIPS (should prefer FIPS like the rest of mesh; same for peer browsing). cf project_fips_integration, project_tor_node_to_node_works (last_transport should be fips/mesh).


🟠 APP-SPECIFIC

B6 — ElectrumX install gate — PARTIAL (pruned-node gate already works; "no node present" half DEFERRED: false-positive risk without UI test, needs package-presence check)

Show the yellow requirement badge when no full node / only a pruned node is present (reuse existing yellow badge pattern).

B7 — ElectrumX UI stuck loader on top — FIXED (overlay hides + iframe shows when status stale; type-check green). UI-confirm.

UI renders but a loader sits on top; possibly stale pre-sync screen not clearing.

B9 — IndeedHub keeps stopping on nodes — TODO

Container won't stay running (crash-loop / reconcile stop). Check logs + restart policy + health.

B10 — Immich still crashes — TODO

Recurring crash ("still" → prior attempts). Check container logs + resource limits + DB/ML deps.

B11 — Companion app: "open in external browser" apps don't work — TODO

Apps meant to open in a new/external browser don't launch from the companion app; need the phone-default-browser request-modal pattern mobile apps use. Relates to v1.7.90 "open in new tab from companion app".

B12 — Mempool not connecting — FIXED (mempool host detect, 3 paths; unit-tested). Live bitcoin-core validation PENDING (no core node available).

Bigger than the original "stacks.rs:1278" framing. CORE_RPC_HOST=bitcoin-knots was hardcoded in THREE env-render paths; on a bitcoin-core node the container is named bitcoin-core, so mempool-api can't resolve RPC. Both Knots and Core are reachable on archy-net by container name — only the name differs.

  • Path 1 — legacy direct-podman (stacks.rs::install_mempool_stack, used when no orchestrator): now format!("CORE_RPC_HOST={}", detect_bitcoin_rpc_host()). FIXED.
  • Path 2 — config.rs::get_app_config (install.rs legacy path): same. FIXED.
  • Path 3 — Quadlet/manifest (THE MODERN FLEET PATH, e.g. .198): prod_orchestrator renders env from apps/mempool-api/manifest.yml static YAML. FIXED via a new {{BITCOIN_HOST}} derived-env placeholder: HostFacts.bitcoin_host (container/manifest.rs) + resolve_derived_env renders it; prod_orchestrator::bitcoin_host() detects Knots/Core via podman ps (test-injectable set_bitcoin_host_for_test); resolved on-demand only for manifests using the placeholder (perf). mempool-api manifest moved CORE_RPC_HOST from static env → derived_env: {{BITCOIN_HOST}}.
  • New helper dependencies::detect_bitcoin_rpc_host() + pure pick_bitcoin_host().
  • TESTS (all green): pick_bitcoin_host 5 cases (knots/core/plain/none/substring-safety); container-crate resolve_derived_env renders {{BITCOIN_HOST}}; orchestrator mempool_core_rpc_host_follows_bitcoin_node (core→bitcoin-core, knots→bitcoin-knots). No-regression verified: picker returns bitcoin-knots live on .198 (so Knots nodes unchanged; existing mempool installs see no env drift).
  • VALIDATION GAP: cannot exercise on a live bitcoin-core node (none available; .198 is Knots where the fix is a no-op). Need a Core node to confirm end-to-end.
  • FOLLOW-UP (B12b, NOT done): same hardcode exists for siblings on bitcoin-core nodes — config.rs lnd(:724)/btcpay(:739)/electrumx(:782), and prod_orchestrator::resolve_dynamic_env fedimint FM_BITCOIND_URL=...bitcoin-knots (~:2425). Plus mempool-api manifest dependencies: bitcoin-knots (line 18) is Knots-specific bookkeeping (install-time check already accepts Core via BITCOIN_NAMES, so non-blocking). All can reuse {{BITCOIN_HOST}}. Deferred per user (mempool-only scope) — each needs its own validation, esp. LND/fedimint.
  • NOTE (unrelated pre-existing failures): 4 prod_orchestrator tests fail on clean HEAD too (install_applies_data_uid_chown_before_create, install_writes_manifest_generated_files_before_create, manifest_generated_files_{do_not_overwrite_by_default,can_overwrite_when_declared}): their fixtures pass tempdir volume sources that validate_bind_source rejects (only /var/lib/archipelago/* + 2 sockets allowed). NOT caused by B12; worth a separate fix.

Original report: mempool can't reach the Bitcoin backend on some nodes. Investigate on .116; check mempool→electrs→bitcoind wiring + deps.
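The Rust sketch below illustrates the detect-then-render idea behind pick_bitcoin_host() and {{BITCOIN_HOST}}. It rests on three assumptions: the input is the list of running container names (e.g. from podman ps), Knots wins if both are present, and if nothing is detected the value falls back to bitcoin-knots (the old hardcoded value). render_bitcoin_host is hypothetical. The real picker covers more cases (its 5 unit tests include a "plain" variant not modelled here).

```rust
/// Choose the RPC host from running container names. Exact-name matching gives
/// "substring safety": e.g. "bitcoin-core-exporter" is NOT treated as bitcoin-core.
/// Preference order (knots before core) is an assumption of this sketch.
pub fn pick_bitcoin_host(running: &[&str]) -> Option<&'static str> {
    if running.iter().any(|n| *n == "bitcoin-knots") {
        Some("bitcoin-knots")
    } else if running.iter().any(|n| *n == "bitcoin-core") {
        Some("bitcoin-core")
    } else {
        None
    }
}

/// Render the {{BITCOIN_HOST}} placeholder in a derived-env value (hypothetical helper).
/// Falling back to "bitcoin-knots" keeps existing Knots installs free of env drift.
pub fn render_bitcoin_host(template: &str, running: &[&str]) -> String {
    let host = pick_bitcoin_host(running).unwrap_or("bitcoin-knots");
    template.replace("{{BITCOIN_HOST}}", host)
}
```

With this, a Core node renders CORE_RPC_HOST=bitcoin-core, while a Knots node renders the same value it always had.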

B13 — Fedimint UI not applying CSS — FIXED + VERIFIED on .198 (both HTTP + HTTPS)

Root cause confirmed: the Fedimint Guardian page (served by :8175) is a server-rendered status page with ~7.8KB INLINE CSS plus image assets referenced by root-relative URLs (src="/assets/img/app-icons/fedimint.jpg", url("/assets/img/bg-network.jpg")). Without an asset rewrite, those /assets/... URLs resolve against the archipelago SPA root: bg-network.jpg happens to exist there (shared design asset → loaded by luck) but app-icons/fedimint.jpg does NOT → 404 (the visibly missing icon). The location /assets/ block uses try_files $uri =404, so missing fedimint assets 404 rather than fall through.

Fix = nginx sub_filter set that re-roots every root-relative asset URL (href="/, src="/, url("/, and single-quote variants) under /app/fedimint/, plus proxy_set_header Accept-Encoding "" so the upstream doesn't gzip (sub_filter can't rewrite gzipped bodies). Shipped two ways:

  • Fresh ISOs (committed a50b6df2): templates image-recipe/configs/nginx-archipelago.conf (HTTP) + image-recipe/configs/snippets/archipelago-https-app-proxies.conf (HTTPS).
  • Already-deployed nodes (bootstrap self-heal, this commit): core/archipelago/src/bootstrap.rs::patch_nginx_conf now heals BOTH the main conf (Style A — swaps the old single nostr-provider sub_filter tail for the full reroot set, byte-matches the shipped template) AND the HTTPS app-proxy snippet (Style B — anchors on the unique :8175 proxy_pass and inserts the reroot set; robust to the snippet's varying trailing directive). missing_* flags now gated on their splice anchors so the healed snippet early-returns cleanly (no per-boot warn-skips). Idempotent via the 'href="/' 'href="/app/fedimint/' marker.
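The Rust sketch below shows the Style B self-heal: anchor on the single :8175 proxy_pass, insert the reroot set once, and use the marker for idempotency. heal_fedimint_snippet and reroot_block are hypothetical names, and the real patch_nginx_conf() may differ. The exact directive list is also an assumption. In particular it adds sub_filter_once off, because nginx otherwise rewrites only the first match.

```rust
/// Idempotency marker: the first sub_filter line of the reroot set.
pub const MARKER: &str = r#"sub_filter 'href="/' 'href="/app/fedimint/';"#;

/// The reroot set, indented like the anchor line. Directive list is an assumption.
fn reroot_block(indent: &str) -> String {
    let lines = [
        r#"proxy_set_header Accept-Encoding "";"#, // upstream must not gzip
        MARKER,
        r#"sub_filter 'src="/' 'src="/app/fedimint/';"#,
        r#"sub_filter 'url("/' 'url("/app/fedimint/';"#,
        r#"sub_filter "href='/" "href='/app/fedimint/";"#,
        r#"sub_filter "src='/" "src='/app/fedimint/";"#,
        r#"sub_filter "url('/" "url('/app/fedimint/";"#,
        "sub_filter_once off;", // rewrite every occurrence, not just the first
    ];
    lines.iter().map(|l| format!("{indent}{l}\n")).collect()
}

/// Returns Some(patched) when a heal was applied, None when already healed
/// (steady state) or when the anchor is missing/ambiguous (leave the file alone).
pub fn heal_fedimint_snippet(conf: &str) -> Option<String> {
    if conf.contains(MARKER) {
        return None;
    }
    let is_anchor = |l: &str| {
        let t = l.trim_start();
        t.starts_with("proxy_pass") && t.contains(":8175")
    };
    if conf.lines().filter(|&l| is_anchor(l)).count() != 1 {
        return None;
    }
    let mut out = String::with_capacity(conf.len() + 512);
    for line in conf.lines() {
        out.push_str(line);
        out.push('\n');
        if is_anchor(line) {
            let indent: String = line.chars().take_while(|c| c.is_whitespace()).collect();
            out.push_str(&reroot_block(&indent));
        }
    }
    Some(out)
}
```

Inserting after the proxy_pass line rather than replacing the whole location block is why the healed snippet is functionally equivalent to, but not byte-identical with, the fresh-ISO template.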

VERIFIED on .198 (sideloaded built binary, restart, async self-heal converged ~15s):

  • HTTP /app/fedimint/: live conf healed byte-identical to template; app-icon 404→200 image/jpeg (41944b).
  • HTTPS /app/fedimint/ (snippet): healed; same app-icon 404→200; bg-network 200; root /assets/img/app-icons/fedimint.jpg returns 200 text/html (SPA shell) — proving the reroot is necessary.
  • nginx -t OK both times; containers survived restart (Quadlet); both files carry the marker exactly once (idempotent steady state); no warn spam in logs. NOTE: self-healed snippet is functionally correct but NOT byte-identical to the fresh-ISO snippet template (insert-after-proxy_pass vs full block) — acceptable; nginx ignores directive order/whitespace.

B15 — Bitcoin UI sync progress lags — FIXED (Home.vue poll 30s→10s). UI-confirm.

Bitcoin UI doesn't update its sync progress fast enough even though the console clearly already has the block-height data. Likely a polling-interval / reactive-update gap between the status source and the UI.

B16 — Bitcoin sync status vanishes — FIXED + UNIT-TESTED (commit 83dbd25c). UI-confirm.

The bitcoin sync status in the Home > System container disappears when it should persist/cache and show an "updating" state. Related to B15 (Bitcoin UI sync lag). Root cause: the tile is gated v-if="stats.bitcoinAvailable===true" (HomeSystemCard.vue:60); a transient bitcoin.getinfo failure (RPC busy during heavy IBD, or a route-change/scan where the packages map is momentarily empty) could blank it. FIX (commit 83dbd25c): added a bitcoinStale flag to homeStatus.ts —

  • getinfo fails while the bitcoin container is Running, OR package data is momentarily absent → retain last-known value + bitcoinStale=true (tile stays, renders "Updating…" instead of a frozen figure shown as live).
  • container authoritatively Stopped/Exited → bitcoinAvailable=false, stale=false (no stale-as-live; a genuinely down node is reflected).
  • first-ever poll times out but container Running (syncing node) → show the tile as updating rather than staying hidden.

Wired bitcoinStale through Home.vue systemStats → HomeSystemCard prop; the card shows "Updating…" (dimmed) when stale. Harness: neode-ui/src/stores/__tests__/homeStatus.test.ts (6 cases): RED before fix (5/6 fail), GREEN after (6/6). vue-tsc --noEmit exit 0. Full vitest suite: only the pre-existing AppIconGrid cross-test teardown flake (passes 7/7 standalone; unrelated to this change). UI-confirm on .116/.198 still recommended (a transient failure is hard to trigger on demand, so the unit test is the authoritative harness here).
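The three B16 rules can be written as a pure transition function. The real fix is TypeScript in homeStatus.ts. This Rust sketch only illustrates the decision table, and every name in it (BitcoinTile, next_tile, ContainerState) is hypothetical. A None for getinfo means the call failed or timed out. A None for the container means package data is momentarily absent.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ContainerState {
    Running,
    Stopped,
    Exited,
}

#[derive(Debug, Clone, PartialEq)]
pub struct BitcoinTile {
    pub available: Option<bool>, // None = never known, tile hidden
    pub height: Option<u64>,     // last-known block height
    pub stale: bool,             // true = render "Updating…" instead of a live figure
}

/// One poll step of the hypothetical tile state machine.
pub fn next_tile(prev: &BitcoinTile, getinfo: Option<u64>, container: Option<ContainerState>) -> BitcoinTile {
    match (getinfo, container) {
        // Fresh data always wins.
        (Some(h), _) => BitcoinTile { available: Some(true), height: Some(h), stale: false },
        // Authoritatively down: reflect it, never show stale data as live.
        (None, Some(ContainerState::Stopped | ContainerState::Exited)) => {
            BitcoinTile { available: Some(false), height: prev.height, stale: false }
        }
        // Running but getinfo failed (busy during IBD, or first poll timed out):
        // keep or show the tile, marked as updating.
        (None, Some(ContainerState::Running)) => {
            BitcoinTile { available: Some(true), height: prev.height, stale: true }
        }
        // Package data momentarily absent: keep last-known state; only a
        // previously visible tile is marked stale.
        (None, None) => BitcoinTile { stale: prev.available == Some(true), ..prev.clone() },
    }
}
```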

B17 — archipelago.service flaps on boot before starting — FIXED + VERIFIED on .198 (commit 34b1fdc1)

On some boots, [FAILED] Failed to start archipelago.service printed ~20× over ~5 min before starting. ROOT CAUSE (proven live on .198): on production nodes /var/lib/archipelago is a separate /dev/mapper/archipelago-data ext4 volume (systemd unit var-lib-archipelago.mount), and podman's graphroot=/var/lib/archipelago/containers/storage lives on it too. The unit ordered only After=network-online.target — NO mount dependency — so on cold boots the service (and its ExecStartPre) could start BEFORE the volume mounted, write to the bare mountpoint on rootfs, fail every podman call, exit, and be restarted every 5s (Restart=on-failure RestartSec=5) until the mount appeared. Smoking gun in .198's journal: var-lib-archipelago.mount: Directory /var/lib/archipelago to mount over is not empty, mounting anyway — the service had written there pre-mount. Dev laptop .116 has the data dir on rootfs → never flaps (explains "on some boots"). Diagnostic: every node showed banners == "Server listening" (process always succeeds once it runs) ⇒ failure is systemd-level, not a Rust crash. FIX (commit 34b1fdc1): RequiresMountsFor=/var/lib/archipelago (adds Requires= + After= on the mount unit).

  • image-recipe/configs/archipelago.service: ships the directive on fresh ISOs.
  • bootstrap::ensure_archipelago_mount_ordering(): self-heals already-deployed nodes' installed /etc/systemd/system/archipelago.service + daemon-reload (boot-ordering only — effective next reboot; never restarts the running service). Idempotent; harmless on rootfs installs. VERIFIED on .198: applied directive → systemctl show -p After includes var-lib-archipelago.mount, systemd-analyze verify clean → rebooted: mount@07:35:22, archipelago banner@07:35:35 (13s AFTER mount), banners=1 listening=1 failed_to_start=0 (zero flap), directive persisted. cargo check EXIT 0. NOTE: self-heal CODE (auto-patch on deployed nodes) still to be exercised with the built binary on .228 (directive was applied manually on .198); residual rootfs shadow files under the mountpoint are benign.
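The Rust sketch below shows the idempotent unit-file edit behind ensure_archipelago_mount_ordering(). add_mount_ordering is a hypothetical name, and the sketch only rewrites text. The caller would still write the file back and run systemctl daemon-reload; it would not restart the service.

```rust
const DIRECTIVE: &str = "RequiresMountsFor=/var/lib/archipelago";

/// Returns Some(patched) if the directive had to be added, or None if it is already
/// present or the file has no [Unit] section (unknown layouts are left alone).
pub fn add_mount_ordering(unit: &str) -> Option<String> {
    if unit.lines().any(|l| l.trim() == DIRECTIVE) {
        return None; // already healed
    }
    let mut out = String::with_capacity(unit.len() + DIRECTIVE.len() + 1);
    let mut inserted = false;
    for line in unit.lines() {
        out.push_str(line);
        out.push('\n');
        // Put the directive at the top of [Unit]; systemd derives Requires= + After=
        // on var-lib-archipelago.mount from it.
        if !inserted && line.trim() == "[Unit]" {
            out.push_str(DIRECTIVE);
            out.push('\n');
            inserted = true;
        }
    }
    inserted.then_some(out)
}
```

On rootfs installs the directive is harmless. No separate mount unit exists there, so /var/lib/archipelago is satisfied by the root filesystem.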

B18 — Apps stop right after install (or become unstartable) — TODO

Many apps install but immediately stop, requiring a manual Start — or become unstartable entirely. Likely the install→start handoff / reconciler doesn't bring them up (or starts then they exit). Related to B9 (IndeedHub stopping), B10 (Immich). Possibly linked to the cgroup-SIGKILL-on-archipelago.service-restart issue (feedback_no_systemctl_deploy_until_quadlet) — but NOTE: on .116 (Quadlet) containers survived a service restart cleanly, so the reconciler may be fine there; reproduce on the affected nodes. Check post-install start sequencing + boot_reconciler + container restart policy + cgroup placement.

B19 — Failed download-update lands on Install button (should be Download) — TODO

When an update download fails, the UI sometimes shows the Install button instead of returning to the Download button — a big UX issue (user can't retry the download cleanly). Check the SystemUpdate state machine's error/failure transition.

B20 — Surface bitcoin-headers-over-mesh broadcast (send/receive toggles) — TODO (feature-adjacent, surfacing existing work)

We previously broadcast bitcoin block headers over mesh to archipelago nodes but never fully surfaced it. Want two switches: "send headers" (you broadcast) and "receive headers" (you accept). NOTE: this is feature-adjacent — surfacing existing functionality; the user added it during the no-new-features push, so treat as low-priority polish until the bug list is clear. Code: mesh block-headers (mesh.block-headers RPC seen in logs; core/archipelago/src/mesh).

B14b — FIPS reachability: many peers fall back to Tor — INVESTIGATED (needs FIPS-network depth)

Live (2026-06-15) federation sync last_transport on .116/.198: ~4 peers fips, ~6 tor, ~5 none. So beyond the recording fix (B14), FIPS genuinely doesn't reach many federated peers (they use Tor). Investigate WHY: is fips_npub known for those peers? are they FIPS-online? is the shared anchor connecting them? (cf project_fips_integration, project_tor_node_to_node_works). This is the real "Tor not FIPS" depth. FINDINGS (.198, 2026-06-15): archipelago-fips ACTIVE; ALL 13 peers HAVE fips_npub; last_transport = 5 fips / 5 tor / 3 none. So it's NOT a missing-npub or service-down bug — FIPS genuinely reaches some peers and not others = DIAL-TIME reachability: the 'tor' peers aren't FIPS-reachable at dial time (offline, NAT, their FIPS not registered with the shared anchor), and 'none' = fully offline (X250 roam/beta/cellular). NEXT (deeper, needs FIPS-network debugging): verify a known-online peer (e.g. .228/.116) is reachable over FIPS from .198 right now; if an online FIPS peer still falls back to Tor → real anchor/registration bug; check fips daemon peer table + anchor connectivity. Likely partly peer-availability (not fully fixable in code).

B21 — Show Tor/FIPS transport pill on cloud browse — FIXED (build+type-check green; deploy+UI-confirm on .116/.198)

Tag whether the peer connection is Tor or FIPS and surface it as a small pill on the cloud browse screens / connection loader. Data source: federation node last_transport (now recorded by B14) exposed via federation.list-nodes; frontend renders a pill (FIPS=fast/green, Tor=slower) on PeerFiles.vue / Cloud peer view + the connection loader. Frontend-only-ish. FINDINGS: PeerFiles.vue:46 loader HARDCODES 'Connecting via Tor...' even when FIPS used (bug). Frontend types already have last_transport ('fips'|'tor'|'mesh'|'lan') federation/types.ts:31; NodeList.vue:167 already renders a transport indicator. PLAN: have content.browse-peer RETURN the transport used (B14 already computes it) → frontend shows a pill (FIPS green / Tor amber) on PeerFiles header + fix the loader text to reflect actual/attempted transport. Small backend (add transport to browse response) + frontend pill.

B22 — Peer cloud download/audio errors (.228→.198) — TODO (pairs with B3)

Observed 2026-06-15 browsing .228's cloud from .198: (a) downloading a peer cloud file → "Operation failed. Check server logs for details." (b) playing a peer AUDIO file → "Could not play audio. File Browser may not be running." (misleading: it's a peer file, not File Browser; that's the OLD base64/blob path B3 replaces). ACTION: (a) check content.download-peer backend error on .198 logs while downloading (likely the same Range/transport/timeout path as B3, or a peer-side 4xx); (b) verify B3 streaming fixes peer audio once deployed, and fix the misleading audioPlayer error string. Get server logs: ssh .198, journalctl -u archipelago | grep -iE 'content|peer|download' (without -E, grep treats | literally).

B23 — Archipelago group chat (all nodes) broken/slow over Tor — TODO (PRIORITY, mesh)

The all-nodes "Archipelago group" chat (over Tor) doesn't seem to work. Facets:

  • (a) Group delivery unreliable / "doesn't work" over Tor.
  • (b) Messages may just be VERY SLOW (latency — likely Tor-only path; should use FIPS+Tor per the new transport method like B14, preferring FIPS).
  • (c) Add the SENDER CONTACT NAME to each message so you can differentiate who sent what (group messages lack attribution).
  • (d) Messages sometimes DUPLICATED (dedup by message id / sender_seq; cf mesh.ts:73 cross-transport identity (sender_pubkey, sender_seq); the duplicate likely comes from receiving the same msg over both transports, or from a re-broadcast).

Code: core/archipelago/src/mesh (typed_messages, listener), frontend Mesh.vue/stores/mesh.ts. Relates to B2 (identity), B14/B14b (transport). Test on .116/.198 (+ a Tor-only peer like .228).
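For facet (d), one way to drop the copy that arrives over the second transport or via re-broadcast is a bounded seen-set keyed on (sender_pubkey, sender_seq). This is a Rust sketch only: SeenMessages and its capacity are assumptions, not existing code in mesh.ts or the Rust mesh module.

```rust
use std::collections::{HashSet, VecDeque};

/// Bounded record of (sender_pubkey, sender_seq) pairs already delivered to the UI.
/// Oldest entries are evicted first so memory stays flat on long-running nodes.
pub struct SeenMessages {
    seen: HashSet<(String, u64)>,
    order: VecDeque<(String, u64)>,
    cap: usize,
}

impl SeenMessages {
    pub fn new(cap: usize) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), cap }
    }

    /// True the first time a pair is observed, regardless of which transport
    /// (FIPS or Tor) delivered it; false for every later copy still in the window.
    pub fn first_sighting(&mut self, sender_pubkey: &str, sender_seq: u64) -> bool {
        let key = (sender_pubkey.to_string(), sender_seq);
        if !self.seen.insert(key.clone()) {
            return false;
        }
        self.order.push_back(key);
        if self.order.len() > self.cap {
            if let Some(old) = self.order.pop_front() {
                self.seen.remove(&old);
            }
        }
        true
    }
}
```

The capacity needs to exceed the number of messages that can arrive between the FIPS copy and the slower Tor copy. Otherwise a late duplicate slips through after eviction.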

B8 — netbird app doesn't work — TODO (LOW / much later)

(RETRACTED: CryptPad placeholder-icon — user says cryptpad is fine.)


📋 vps2 Gitea issues (lfg2025/archy) — imported 2026-06-15

  • G#1 [Bug] Strange peer request behaviour — TODO (likely related to B1/federation)
  • G#2 [Bug] Fix flashing USB from kiosk — TODO
  • G#3 [Feature] VPN Configuration — DEFERRED (feature; no new features until production quality)
  • G#4 [Bug] Bitcoind is slow — TODO
  • G#5 [Feature] OpenWRT and TollGate integration — DEFERRED (feature)
  • G#6 [Feature] Move dashboard/monitoring link to home screen — DEFERRED (feature)
  • G#7 [Bug] Scrolling with Companion app — TODO

Gitea issue mapping (vps2 lfg2025/archy)

Backlog bugs B1 to B19 are mirrored as Gitea issues: B1→#8, B2→#9, B3→#10, B4→#11, B5→#12, B6→#13, B7→#14, B8→#15, B9→#16, B10→#17, B11→#18, B12→#19, B13→#20, B14→#21, B15→#22, B16→#23, B17→#24, B18→#25, B19→#26. (Pre-existing G#1 to G#7 remain; some overlap, e.g. G#1 strange-peer ≈ B1.) Close the Gitea issue when a bug is verified+shipped.

INVESTIGATION FINDINGS 2026-06-15 (B1/B2/B3/B4/B14) — cutoff insurance

B1 trusted-node divergence — ROOT-CAUSED. federation/sync.rs merge_transitive_peers() (~:140) dedupes ONLY by DID; the SAME physical node appears under multiple DIDs (same onion + fips_npub) → duplicate entries ("Arch Dev" ×2, "Sapien" ×2). No background convergence → lists diverge (.103=16 nodes, .116/.198=15). Model: federation/types.rs:24 FederatedNode (PK=did); storage federation/storage.rs nodes.json; add_node dedupes by DID only (:125). FIX: in merge_transitive_peers add a SECOND match arm — if no DID match, match by normalized onion (trim .onion); if found, treat as same node (merge fips_npub/name, don't add). Same dedup on add_node. Plus a one-time cleanup of existing dup DIDs (remove-node the stale one). TEST: after sync, all 3 nodes have identical node set, no two entries share an onion.
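The Rust sketch below illustrates the proposed two-arm match: DID first, then normalized onion. The field set of FederatedNode here is a simplified assumption, and normalize_onion is hypothetical. The merge policy follows the FIX text: fill in a missing fips_npub or name, and never add a second entry. Empty onions are never used as a match key.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct FederatedNode {
    pub did: String,
    pub name: String,
    pub onion: String,
    pub fips_npub: Option<String>,
}

/// Lower-case, trim, and drop a trailing ".onion" so "ABC.onion" and "abc" compare equal.
fn normalize_onion(onion: &str) -> String {
    let o = onion.trim().to_ascii_lowercase();
    o.strip_suffix(".onion").unwrap_or(o.as_str()).to_string()
}

/// Merge incoming transitive peers into `local` (sketch of the proposed dedup).
pub fn merge_transitive_peers(local: &mut Vec<FederatedNode>, incoming: &[FederatedNode]) {
    for peer in incoming {
        let key = normalize_onion(&peer.onion);
        // Arm 1: same DID. Arm 2: different DID but same physical node (same onion).
        let idx = local.iter().position(|n| n.did == peer.did).or_else(|| {
            if key.is_empty() {
                None
            } else {
                local.iter().position(|n| normalize_onion(&n.onion) == key)
            }
        });
        match idx {
            Some(i) => {
                // Same node: fill in missing details, never add a duplicate entry.
                let n = &mut local[i];
                if n.fips_npub.is_none() {
                    n.fips_npub = peer.fips_npub.clone();
                }
                if n.name.is_empty() {
                    n.name = peer.name.clone();
                }
            }
            None => local.push(peer.clone()),
        }
    }
}
```

The same two-arm lookup would back add_node. The existing duplicate DIDs still need the one-time cleanup, since the merge only prevents new duplicates.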

B2 duplicate chat contact — ROOT-CAUSED (same root as B1). Two federation DIDs (same onion/fips_npub, e.g. "Sapien" dids z6MkoSbN… + z6MkeYMU…) get seeded as TWO mesh contacts: mesh/mod.rs seed_federation_peers_into_mesh() (:94) upserts per-pubkey contact_id; frontend Mesh.vue mergeKeyForPeer() (:492) keys by DID so two DIDs = two rows. FIX: (backend) in seed, skip a node whose onion was already seeded (HashSet of onions); (frontend) Mesh.vue merge by onion when DIDs differ but onion matches. Fixing B1's onion-dedup largely resolves this too. TEST: one "Sapien" row; mesh.peers has one contact for the shared onion.

B3 peer media won't play — ROOT-CAUSED. PeerFiles.vue playMedia()/loadPreview() (~:358,:508) fetch the WHOLE file via RPC content.preview-peer/content.download-peer (api/rpc/content.rs :393,:213) which base64-encodes the entire file; frontend makes a Blob URL → browser can't Range-seek → video/large-audio won't play (+ 30/120s timeouts truncate big files). The peer's HTTP /content/<id> handler (api/handler/content.rs :49) ALREADY supports Range/206 + Accept-Ranges. FIX (bigger): add a local streaming proxy endpoint /api/peer-content/{onion}/{id} in api/handler/mod.rs that forwards the browser's Range header to the peer's /content/<id> (via fips::dial PeerRequest) and streams back 206 + Content-Range + Content-Type; frontend sets <video>/<audio> src to that URL (not a blob). TEST: curl Range on the new endpoint → 206 + Content-Range; video seeks/plays.
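In the plan, the proxy forwards the browser's Range header untouched and the peer's /content/<id> handler does the slicing. The 206 semantics that must survive end to end are sketched below in Rust. parse_range and content_range are hypothetical helpers showing what a correct single-range answer looks like; multi-range requests are not handled. A None result means the server should send either the full body (200) or a 416, and this sketch does not distinguish the two.

```rust
/// Parse a single-range `Range: bytes=...` header against a known body length.
/// Returns an inclusive (start, end) pair, or None when no usable range exists.
pub fn parse_range(header: &str, len: u64) -> Option<(u64, u64)> {
    let spec = header.trim().strip_prefix("bytes=")?;
    if spec.contains(',') || len == 0 {
        return None; // multi-range or empty body: not handled in this sketch
    }
    let (a, b) = spec.split_once('-')?;
    let (start, end) = if a.is_empty() {
        // Suffix form "bytes=-N": the last N bytes.
        let n: u64 = b.parse().ok()?;
        if n == 0 {
            return None;
        }
        (len.saturating_sub(n), len - 1)
    } else {
        let start: u64 = a.parse().ok()?;
        // Open-ended "bytes=N-" runs to the end; an oversized end is clamped.
        let end = if b.is_empty() { len - 1 } else { b.parse::<u64>().ok()?.min(len - 1) };
        (start, end)
    };
    if start > end {
        return None; // includes start >= len (unsatisfiable)
    }
    Some((start, end))
}

/// Value for the Content-Range header of a 206 response.
pub fn content_range(start: u64, end: u64, len: u64) -> String {
    format!("bytes {start}-{end}/{len}")
}
```

The TEST above (curl Range → 206 + Content-Range) checks exactly these values. A video element seeks by issuing open-ended requests like bytes=N-.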

B4 cloud my-folders <!doctype/502 — ROOT-CAUSED. filebrowser-client.ts listDirectory() (:99) does res.json() (:106) after only an res.ok check; when FileBrowser is ABSENT nginx serves SPA index.html (200, '<!doctype') → JSON crash; when DOWN → 502. FIX (frontend, low-risk): guard res content-type !== application/json → throw typed "FileBrowser unavailable" handled by Cloud.vue/CloudFolder.vue empty-state; same guard in login() (:71) + getUsage() (:215). OPTIONAL nginx: add error_page 502 503 = @filebrowser_unavailable returning JSON in the /app/filebrowser/ block (image-recipe/configs/nginx-archipelago.conf ~:411). TEST: stop filebrowser on .116/.198 → Cloud shows friendly state, no doctype crash.

B14 cloud browse Tor-not-FIPS — ROOT-CAUSED (nuance). FIPS-first logic WORKS (fips/dial.rs send_get :331 tries FIPS, falls back to Tor on 404/5xx; v1.7.94 fix). BUT the 4 content handlers in api/rpc/content.rs (browse :297, download :237, download_paid :356, preview :421) capture _transport and NEVER call record_peer_transport() → UI badge shows Tor/null even when FIPS used. FIX: add record_peer_transport(data_dir, None, Some(onion), &transport.to_string()) after each successful send_get (storage.rs:84 has the fn). ⚠️ VERIFY on nodes whether FIPS is ACTUALLY used or genuinely falling back to Tor (if genuinely Tor, deeper FIPS-reachability issue beyond recording). TEST: after browse, last_transport = fips (when peer FIPS-reachable).
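The missing step in the 4 handlers is recording which transport served the request. The Rust sketch below shows FIPS-first fetching that returns the transport actually used, so the caller can record it. The closures stand in for the FIPS and Tor dials, and record stands in for record_peer_transport(...). Treating a dial error, a 404 or a 5xx as the fallback trigger is an assumption based on the send_get behaviour described above.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Transport {
    Fips,
    Tor,
}

impl fmt::Display for Transport {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            Transport::Fips => "fips",
            Transport::Tor => "tor",
        })
    }
}

/// (HTTP status, body) from one dial attempt, or a dial error.
pub type Attempt = Result<(u16, Vec<u8>), String>;

/// Try FIPS first; on a dial error, 404 or 5xx, retry over Tor.
/// Returns the final status and body plus the transport that produced them.
pub fn get_with_fallback<F, G>(fips: F, tor: G) -> Result<(u16, Vec<u8>, Transport), String>
where
    F: FnOnce() -> Attempt,
    G: FnOnce() -> Attempt,
{
    match fips() {
        Ok((status, body)) if status != 404 && status < 500 => Ok((status, body, Transport::Fips)),
        _ => tor().map(|(status, body)| (status, body, Transport::Tor)),
    }
}

/// Caller shape for the B14 fix: record the transport only after a 2xx.
pub fn fetch_and_record<F, G>(fips: F, tor: G, record: &mut dyn FnMut(String)) -> Result<Vec<u8>, String>
where
    F: FnOnce() -> Attempt,
    G: FnOnce() -> Attempt,
{
    let (status, body, transport) = get_with_fallback(fips, tor)?;
    if !(200..300).contains(&status) {
        return Err(format!("peer returned HTTP {status}"));
    }
    record(transport.to_string()); // e.g. "fips", the value last_transport should show
    Ok(body)
}
```

Returning the transport from the fetch, rather than capturing it as an unused `_transport`, makes a forgotten recording call easier to spot in review.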

INVESTIGATION FINDINGS 2026-06-15 (B6/B7/B12/B13/B15/B16) — cutoff insurance

B13 Fedimint CSS — app HTML (docker/fedimint-ui/index.html) uses absolute /assets/* paths; under /app/fedimint/ the browser requests /assets/* which hit the main SPA, not :8175 → unstyled. FIX: nginx sub_filter rewrite (same proven pattern as indeedhub/botfights blocks) in image-recipe/configs/nginx-archipelago.conf (/app/fedimint/ :641) + snippets/archipelago-https-app-proxies.conf (:164) + bootstrap patch for existing nodes. Rewrites href/src/url '/' → '/app/fedimint/'. TEST: curl .../app/fedimint/assets/...css → 200 real CSS.

B6 ElectrumX archival gate — electrs needs a NON-pruned full node; install card doesn't warn at a glance. /bitcoin-status returns blockchain_info.pruned. Yellow badge pattern exists (MarketplaceAppCard.vue). FIX (frontend, simple): show a yellow "Requires a full archive Bitcoin node (not pruned)" note on the electrumx card (MarketplaceAppCard.vue ~:53). catalog.json electrumx already has requires.

B7 ElectrumX stuck loader — sync overlay gated by electrsSync (useElectrsSync.ts syncing = status!=='synced'); if status never flips to 'synced' (stale/crash) the overlay blocks the UI forever. AppSessionFrame.vue:44 iframe gate !electrsSync. FIX (frontend): fail-open — allow iframe when electrsSync?.stale (and add a timeout in useElectrsSync.ts so a slow/stale status stops blocking after ~5min).
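The B7 fail-open rule as a pure function, sketched under the assumption that the sync status carries a stale flag (as the plan implies) and that the caller records when the overlay first appeared. shouldShowSyncOverlay, ElectrsSync and MAX_OVERLAY_MS are hypothetical names; the ~5 min limit comes from the fix plan.

```typescript
// Shape assumed from the tracker: useElectrsSync.ts derives
// syncing = status !== 'synced', and the status can carry a stale flag.
interface ElectrsSync {
  status: string;
  stale?: boolean;
}

// ~5 minutes, per the fix plan; tune as needed.
const MAX_OVERLAY_MS = 5 * 60 * 1000;

// Decide whether the sync overlay should still cover the ElectrumX iframe.
// overlaySinceMs is when the overlay first appeared (null if not recorded yet).
function shouldShowSyncOverlay(
  sync: ElectrsSync | null,
  overlaySinceMs: number | null,
  nowMs: number,
): boolean {
  if (sync === null || sync.status === "synced") return false; // nothing to wait for
  if (sync.stale) return false; // fail open: a stale status must not block forever
  if (overlaySinceMs !== null && nowMs - overlaySinceMs >= MAX_OVERLAY_MS) {
    return false; // fail open after the timeout
  }
  return true;
}
```

AppSessionFrame.vue would then gate the iframe on !shouldShowSyncOverlay(...) rather than on !electrsSync alone.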

B15 bitcoin sync UI lag: Home.vue:485 polls every 30s. FIX: refresh bitcoin faster (~5-10s) on its own interval, separate from system stats.
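One way to split the single Home.vue poll into two timers, as a sketch. startHomePolling and the interval constants are hypothetical names; 10s for bitcoin matches the value later committed for B15, and keeping system stats at the old 30s is an assumption.

```typescript
// Intervals are assumptions: bitcoin at 10s (the value committed for B15),
// system stats kept at the previous 30s.
const BITCOIN_POLL_MS = 10 * 1000;
const SYSTEM_POLL_MS = 30 * 1000;

interface PollIntervals {
  bitcoinMs: number;
  systemMs: number;
}

// Start two independent pollers so bitcoin sync progress refreshes faster
// than the heavier system stats. Each refresher runs once immediately.
// Returns a stop function (e.g. to call from Vue's onUnmounted).
function startHomePolling(
  refreshBitcoin: () => void,
  refreshSystem: () => void,
  intervals: PollIntervals = { bitcoinMs: BITCOIN_POLL_MS, systemMs: SYSTEM_POLL_MS },
): () => void {
  refreshBitcoin();
  refreshSystem();
  const bitcoinTimer = setInterval(refreshBitcoin, intervals.bitcoinMs);
  const systemTimer = setInterval(refreshSystem, intervals.systemMs);
  return () => {
    clearInterval(bitcoinTimer);
    clearInterval(systemTimer);
  };
}
```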

B16 bitcoin status vanishes — homeStatus.ts refreshBitcoin clears/leaves bitcoinAvailable null on a failed/transitional poll → HomeSystemCard.vue:60 v-if hides the card. FIX: retain last-known bitcoinAvailable on transient failure + show an "Updating…" badge instead of disappearing.
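A sketch of the B16 "retain last-known" reducer, assuming refreshBitcoin can report each poll as success or failure. applyBitcoinPoll, BitcoinCardState and MAX_STALE_MS are hypothetical; the 2-minute cutoff is an invented placeholder to address the "stale-as-live" concern and should be set during the UI test.

```typescript
interface BitcoinCardState {
  available: boolean | null; // null = unknown, card hidden by the v-if
  updating: boolean; // drives an "Updating…" badge
  lastOkAtMs: number | null;
}

type BitcoinPoll = { ok: true; available: boolean } | { ok: false };

// Assumption: after this long without a good poll, stop showing the old value
// so a stale status is not presented as live.
const MAX_STALE_MS = 2 * 60 * 1000;

// A failed or transitional poll keeps the last good value and flags it as
// updating, instead of resetting bitcoinAvailable to null.
function applyBitcoinPoll(prev: BitcoinCardState, poll: BitcoinPoll, nowMs: number): BitcoinCardState {
  if (poll.ok) {
    return { available: poll.available, updating: false, lastOkAtMs: nowMs };
  }
  const tooStale = prev.lastOkAtMs === null || nowMs - prev.lastOkAtMs > MAX_STALE_MS;
  return {
    available: tooStale ? null : prev.available,
    updating: true,
    lastOkAtMs: prev.lastOkAtMs,
  };
}
```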

B12 mempool not connecting: stacks.rs:1278 + apps/mempool-api/manifest.yml:50 hardcode CORE_RPC_HOST=bitcoin-knots, so on nodes running bitcoin-core (not knots) mempool-api fails with getaddrinfo ENOTFOUND bitcoin-knots. Also, ELECTRUM_HOST=electrumx points at a container that is absent on pruned nodes (docs/CONTAINER_LIFECYCLE_HANDOFF.md:654). FIX: detect which bitcoin container is running (knots vs core) and set CORE_RPC_HOST dynamically; gate the mempool stack on electrumx so it doesn't half-start without it. Backend (stacks.rs): medium risk, test on .116.
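The B12 decision logic, sketched in TypeScript only to keep one language in these notes; the real change belongs in stacks.rs (Rust). planMempoolStack is a hypothetical name, and the container names "bitcoin-core" and "electrumx" are assumed to match the hostnames on the node network ("bitcoin-knots" is the currently hardcoded value).

```typescript
// Container names are assumptions: "bitcoin-knots" comes from the hardcoded
// CORE_RPC_HOST; "bitcoin-core" and "electrumx" are assumed to match the
// container/host names used on nodes running those apps.
const BITCOIN_CANDIDATES = ["bitcoin-knots", "bitcoin-core"] as const;

type MempoolPlan =
  | { start: true; env: Record<string, string> }
  | { start: false; reason: string };

// Pick CORE_RPC_HOST from whichever bitcoin container is actually running,
// and refuse to start the mempool stack at all if electrumx is missing,
// rather than letting mempool-api half-start.
function planMempoolStack(runningContainers: readonly string[]): MempoolPlan {
  let rpcHost: string | undefined;
  for (const candidate of BITCOIN_CANDIDATES) {
    if (runningContainers.includes(candidate)) {
      rpcHost = candidate;
      break;
    }
  }
  if (rpcHost === undefined) {
    return { start: false, reason: "no bitcoin-knots or bitcoin-core container running" };
  }
  if (!runningContainers.includes("electrumx")) {
    return { start: false, reason: "electrumx not running (pruned node?)" };
  }
  return { start: true, env: { CORE_RPC_HOST: rpcHost, ELECTRUM_HOST: "electrumx" } };
}
```

Preferring knots when both happen to run mirrors today's default; that ordering is a choice, not something the notes specify.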

  • 2026-06-15 (cont. 2): B15 (poll 30s→10s) + B7 (ElectrumX loader fail-open on stale) — committed c0d41cf8, type-check green. B6 PARTIAL (pruned gate already works; no-node-present half deferred). Fanned out investigations for B6/B7/B12/B13/B15/B16 — all root-caused with fix plans in FINDINGS above.
  • DEFERRED with ready plans (need a backend build + careful patch, or UI test, or live repro): B13 (fedimint CSS — nginx sub_filter asset rewrite; bootstrap exact-match patch is fragile, do carefully), B12 (mempool host — dynamic bitcoin-knots/core detect in stacks.rs), B16 (bitcoin status retain — UI-test to avoid stale-as-live), B6 no-node-present half, B14b (FIPS net depth), B22/B23 (need live repro).
  • NEXT options: (a) continue backend batch B13+B12 (one build); (b) do UI confirms on .116/.198 + cut v1.7.97-alpha with the ~10 committed fixes (LND incident + cloud/federation/mesh).
  • Committed fixes awaiting .97: B5, B1, B2, B4, B14, B21, B3, B15, B7 (+ B6 pruned-gate already live). All on vps2 main; NOT on fleet yet.

Progress log

  • 2026-06-15: tracker created. v1.7.96-alpha shipped. All 19 bugs filed as Gitea issues #8-#26. vps2 feature issues (G#3/5/6) deferred (no new features).
  • 2026-06-15: B5 (LND CORS) DONE — root-caused, both fixes implemented, verified on .116/.198/.103 (harness 4/4 each), committed 1db720af, pushed to vps2 main. Will bundle into .97 (Gitea #12 to close on .97 ship).
  • Validation nodes: .116 + .198 (pw ThisIsWeb54321@). Runtime is podman (docker not in non-interactive PATH). Sideload binary → /usr/local/bin/archipelago + restart (containers survive on these nodes).
  • 2026-06-15 (cont.): B1,B2,B4 dedup+guard — committed ed493106, unit-tested 2/2, live .198 healthy. B14 transport recording — committed 1c6dc153 (after build-repair: used private crate::federation::storage:: path → E0603; fixed to re-exported crate::federation::). B21 Tor/FIPS pill — committed 0801dd66. All pushed to vps2 main; builds verified EXIT 0.
  • Discovered B14b (FIPS reaches only ~4/15 peers; rest genuinely Tor) and B21 (pill) during the block.
  • ⚠️ LESSON: a backgrounded build "completed" notification does NOT mean success — grep the EXIT code before committing (a broken commit reached main once; repaired by 1c6dc153; no release cut from it → fleet unaffected).
  • NEXT: B3 (peer media streaming, big), then B14b (FIPS reachability), then app-specific (B6, B7, B9-B13, B15-B19). None deployed to fleet yet: all on vps2 main awaiting the .97 release after full .116/.198 + UI verification.

New backlog issues filed 2026-06-16 (this session)

  • #32 Tor chat: message stuck on spinner though peers received it (task #8)
  • #33 Message toast: click-to-open chat + close icon (task #9)
  • #34 Local UI images never rebuild on source change — orchestrator gap (task #7); blocks OTA of bitcoin-ui relay + fedimint CSS to existing fleet
  • #35 Paid 10% video previews unplayable: truncated MP4 (task #6)

NOTE: bitcoin RPC relay UI + fedimint guardian CSS now LIVE on .116 (image rebuilds); .198 deploy in progress. The Bitcoin app launches its host-net UI at :8334 (not via the /app/bitcoin-ui/ proxy).