The backend already sends did in federation peer lists, but the Peer
type omitted it and federationNodeToPeer() dropped it when mapping. Add
did?: string to Peer and pass node.did through, so trusted/observer
node rows route to Federation/Mesh by their real DID (falling back to
pubkey/onion) instead of failing the build on a missing property.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The #26 fix makes has_staged_update require the .download-complete
marker, so the state self-heal treats a marker-less staging dir as a
partial download and clears update_in_progress. The roundtrip test
staged a binary file but not the marker, so it began failing. Write
the marker to simulate a *complete* staged update.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kiosk chromium pinned ~92% of a core (software-compositing spin from
--enable-gpu-rasterization on a GPU-less/headless node), saturating the machine
and starving the backend + container builds — it caused the .198 receive timeout
and the deploy storms.
- archipelago-kiosk.service: CPUQuota=75% + MemoryMax/High + Delegate, so a
runaway kiosk can never take the whole node down.
- archipelago-kiosk-launcher.sh: detect /dev/dri — use GPU rasterization only
when a GPU exists, else --disable-gpu (avoids the headless spin).
- bootstrap::ensure_kiosk_hardened: OTA self-heal that installs the updated
unit+launcher on already-deployed nodes, daemon-reloads, and only try-restarts
a *running* kiosk (never re-enables an operator-disabled one).
cargo check clean; launcher bash -n clean; unit syntax valid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
immich_server/redis/postgres + indeedhub-* are multi-container stack members
whose sub-container app_ids are NOT in package_data, so the health monitor skips
them as "orphans" and never restarts them when they exit — Immich/IndeedHub stay
down until the next reboot (the boot-only start_stopped_stack_containers was the
only recovery). Spawn a 120s supervisor that reuses that same recovery at
runtime. It cheaply skips already-running containers and honours the user-stopped
list (set on every container by package.stop), so it only revives genuinely
crashed members and never fights a user stop.
cargo check clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- All four tabs (trusted/observers/messages/requests) capped at max-h-72 with
internal scroll, so the screen stays short instead of growing very long.
- Clicking a node row navigates to that node in the Federation screen
(?node=did); the Message button (stop-propagation) deep-links to that peer\047s
mesh chat (?peer=), using the Mesh.vue ?peer handler.
type-check clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A resumable-but-failed download leaves partial component files in update-staging.
has_staged_update() treated ANY staged file as "install-ready", so the state
self-heal kept update_in_progress=true and the UI showed Install instead of
Download (no clean retry).
- update.rs: write a .download-complete marker only after EVERY component
downloads+verifies; has_staged_update() now checks that marker. Partial/failed
downloads (no marker) correctly read as not-staged → self-heal clears
update_in_progress → UI shows Download. Resume still works (partial files kept).
- SystemUpdate.vue: on a genuine download failure, reset downloaded/in_progress
and re-sync, so the user lands back on Download immediately.
cargo check + vue-tsc clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
sendArchMessage looped over every federation node sequentially (await
sendMessageToPeer per node), so the spinner stayed up until the slowest/offline
node's Tor request finished — long after online peers had received the message.
Send to all peers concurrently (Promise.allSettled); the spinner now clears
after the slowest single delivery, not the sum.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- DID: the Identity card read the DID only from localStorage('neode_did'), so
nodes/browsers that never cached it (e.g. .116/.228) showed no DID. Fall back
to the node.did RPC and cache it — the DID now shows everywhere.
- npub: add the node's seed-derived Nostr public key (npub) to the Identity card
next to the DID + onion, fetched from node.nostr-pubkey, with a copy button.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make each peer file card a flex column filling its grid cell (flex flex-col
h-full) and pin the body row (filename + Play/Download) with mt-auto, so cards
with a media preview and cards without line their footers up across the row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User: chat history (messages + mesh/Tor contacts) must persist and be
secure/encrypted per best practice. Root cause of the .198 loss was the B17
mount race writing empty stores over real data (B17 already fixes the trigger);
this hardens storage so it can never silently lose or expose data:
- storage_crypto: shared at-rest envelope mirroring credentials::store — key =
SHA-256(domain ‖ node identity key) (seed-derived, per-store domain
separation), ChaCha20-Poly1305 AEAD with a random 96-bit nonce, tamper-evident.
Transparent migration of legacy plaintext files. Unit-tested (round-trip,
wrong-key/tamper rejection, plaintext detection).
- messages.json: encrypted at rest + ATOMIC write (temp+rename) so a crash/
reboot mid-write cannot corrupt history; decrypt-with-migration on load; a
failed decrypt never overwrites the on-disk data.
- mesh contacts (alias/notes/pinned/blocked): were ONLY in memory and lost on
every restart — now persisted to mesh-contacts.json (encrypted, atomic),
loaded on MeshState startup, saved after contacts-save/contacts-block.
Explicit clear (mesh.clear-all) still wipes everything, as intended.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User priority: FIPS is the main transport but it was unreliable and needed a
manual "Activate" button. Improvements (all in the FIPS dial/supervisor):
- Auto-activate: ensure_activated() installs the daemon config + starts the
service on its own once seed onboarding has materialised the key — no Activate
button needed. Idempotent; runs from the supervisor every 45s so a node that
onboards after boot still comes up automatically.
- Dial retry: try_fips_get/post now retry ONCE on a connect/timeout error. The
first dial to a peer triggers NAT hole-punching and often times out before the
path is up; the retry lands on the now-warm path — the main reason calls were
dropping to Tor despite the peer being FIPS-reachable.
- More patient connect_timeout (5s→8s) so a reachable-but-cold peer isn't
abandoned to Tor while hole-punching completes.
- Path warmer: spawn_fips_supervisor() keeps hole-punched paths to known
federation peers warm (every 45s, concurrent), so on-demand dials are fast and
land on FIPS.
- Confirmed the daemon config already enables BOTH udp + tcp transports
(render_config_yaml), so FIPS already uses TCP where UDP is blocked; the Tor
fallback was path-establishment, addressed above.
cargo check + fmt clean. Backend — needs a binary rebuild+deploy to validate on
.116/.198 (watch last_transport flip fips, and FIPS coming up with no button).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Add a close (X) button to the message toast (closeToast, @click.stop) like the
system notifications.
- Carry the sender pubkey on the toast; clicking now deep-links to that
conversation (/dashboard/mesh?peer=<pubkey>) instead of the generic mesh page.
- Mesh.vue reads ?peer= on mount and opens the matching peer (by pubkey_hex/did),
gracefully falling back to the mesh list when no match (B1/B2 identity).
type-check clean; useMessageToast tests 11/11.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Streaming a peer file connects over mesh/Tor before the first frame, so the
player sat blank. Add a loading state:
- PeerFiles video modal: spinner overlay ("Connecting to peer…") until the
<video> fires playing/canplay; an error overlay on failure instead of a
silent black box.
- useAudioPlayer: loading flag driven by loadstart/waiting vs canplay/playing;
GlobalAudioPlayer shows a spinner in the transport button while connecting.
- Fix the misleading audio error "Could not play audio. File Browser may not be
running." (wrong for peer content) → "Could not play this audio file. The peer
may be offline…" (B22).
type-check clean; useAudioPlayer tests 10/10.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- docker/fedimint-ui/nginx.conf: the local /assets/ handler 404'd the real
fedimint guardian UI's own bundled CSS (bootstrap.min.css, style.css) →
unstyled app. B13 fixed our local icon; this adds a @guardian_assets proxy
fallback to :8177 so the guardian's own /assets/* resolve. Verified live on
.116: /app/fedimint/assets/bootstrap.min.css 404→200 text/css. (needs
archy-fedimint-ui image rebuild to persist on nodes.)
- Home.vue: Quick Start Goals card regained lg:col-span-2 so it fills its row
on desktop instead of sitting at half width.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B4 fix made listDirectory require a JSON content-type (to detect the
SPA-fallback HTML / 502 cases) and changed the non-OK error string, but its
tests still mocked headerless responses + the old message, so they failed —
which also polluted the run and tripped AppIconGrid's teardown. Give the JSON
mock a content-type, update the non-OK expectation, and add a test for the
guard's friendly-error path. Full suite now 667/667 green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On production nodes /var/lib/archipelago (the app data dir AND podman's
graphroot=/var/lib/archipelago/containers/storage) is a separate
device-mapper volume. archipelago.service ordered only After=network-online
.target, so on cold boots it (and its ExecStartPre) could start BEFORE
var-lib-archipelago.mount, write to the bare mountpoint on rootfs, fail every
podman call, exit, and be restarted every 5s until the volume mounted — the
"~20x [FAILED] Failed to start over ~5min" boot flap. Proven live on .198:
"var-lib-archipelago.mount: Directory /var/lib/archipelago to mount over is
not empty, mounting anyway" — the service had written there pre-mount.
Fix: RequiresMountsFor=/var/lib/archipelago (adds Requires= + After= on the
mount unit).
- image-recipe/configs/archipelago.service: ships the directive on fresh ISOs.
- bootstrap::ensure_archipelago_mount_ordering(): self-heals already-deployed
nodes' installed unit + daemon-reload (boot-ordering only, effective next
reboot; never restarts the running service). Idempotent; harmless on rootfs
installs (maps to the always-mounted root).
Verified on .198: after applying, systemctl shows After=var-lib-archipelago
.mount and systemd-analyze verify is clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Home > System bitcoin tile is gated on bitcoinAvailable===true, so any
transient bitcoin.getinfo failure (RPC busy during heavy IBD, route-change
scan) could blank it even though the node is fine. Add a bitcoinStale flag:
- getinfo fails while the container is Running, or package data is momentarily
absent → retain the last-known value and mark it stale (tile stays, shows
"Updating…" instead of a frozen figure presented as live).
- container authoritatively Stopped/Exited → flip to not-available as before
(no stale-as-live).
- first-ever poll times out but container Running → show the tile as updating
rather than staying hidden on a syncing node.
Harness: src/stores/__tests__/homeStatus.test.ts (6 cases) — red before, green
after. type-check clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CORE_RPC_HOST was hardcoded to bitcoin-knots in three env-render paths, so on a
bitcoin-core node (container named bitcoin-core) mempool-api could not reach
Bitcoin RPC. Both node variants are reachable on archy-net by container name —
only the name differs.
- Legacy direct-podman (stacks.rs) and config.rs::get_app_config now use a new
dependencies::detect_bitcoin_rpc_host() (pure, unit-tested pick_bitcoin_host).
- Quadlet/manifest path (the modern fleet default): add a {{BITCOIN_HOST}}
derived-env placeholder — HostFacts.bitcoin_host + resolve_derived_env render
it; prod_orchestrator detects Knots/Core via podman ps, resolved on demand
only for manifests that use the placeholder. mempool-api manifest moves
CORE_RPC_HOST from static env to derived_env: {{BITCOIN_HOST}}.
Tests: pick_bitcoin_host (5 cases incl. substring safety), container-crate
resolve_derived_env, and orchestrator mempool_core_rpc_host_follows_bitcoin_node
(core->bitcoin-core, knots->bitcoin-knots). No-regression confirmed: picker
returns bitcoin-knots live on .198. Live bitcoin-core validation pending (no
core node available). Sibling hardcodes (lnd/btcpay/electrumx/fedimint) tracked
as B12b.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their
old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the
Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")).
Those resolve against the SPA root: bg-network.jpg exists there by luck, but
app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the
visibly-broken icon.
bootstrap.rs::patch_nginx_conf now heals both paths on startup:
- Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail
for the full reroot set; byte-matches the shipped template.
- Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no
sub_filter and a per-node-varying trailing directive, so anchor on the unique
:8175 proxy_pass and insert the reroot set after it (nginx ignores directive
order). Snippet added to the bootstrap nginx loop (skipped on HTTP-only nodes).
missing_* flags are now gated on their splice anchors so the included snippet
neither attempts the main-conf-only patches nor logs warn-skips every boot.
Idempotent via the 'href="/' 'href="/app/fedimint/' marker.
Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t
OK; containers survived restart (Quadlet); idempotent steady state, no warn spam.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fedimint UI HTML/CSS reference absolute /assets/* paths; under /app/fedimint/
those hit the main SPA, not the fedimint container, so the UI renders
unstyled. Add the proven sub_filter asset-rewrite pattern (as indeedhub/
botfights use) to the /app/fedimint/ block in the nginx template + https
snippet (also rewrites url(...) for the CSS background image). Bootstrap
self-heal for already-deployed nodes is the documented resume point.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
B15: Home system stats (incl. bitcoin sync %) polled every 30s — too slow;
now 10s so sync progress tracks the actual block height more closely.
B7: the ElectrumX sync overlay was gated only on status!=='synced', so if
the status never flips to 'synced' (ElectrumX stale/disconnected) the loader
stuck on top forever. Now the overlay hides and the app iframe loads when
the sync status is stale (fail-open), while still showing during active
indexing. type-check EXIT 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B3 streaming proxy endpoint existed in the backend but nginx had no
location for /api/peer-content/*, so the browser's requests fell through to
the SPA (200 text/html) and media still wouldn't play. Add an
NGINX_PEER_CONTENT_BLOCK that bootstrap patches into every server block
(forwards Cookie for session auth + Range, proxy_buffering off). Idempotent;
covers fresh-ISO nodes too since bootstrap runs on every startup.
Verified on .198: after restart the async nginx patch lands and
/api/peer-content/<onion>/<id> returns 401 (reaches backend, auth-gated)
instead of the SPA; nginx block present in both server blocks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Peer media (music/video) wouldn't play: the frontend downloaded the whole
file via RPC as base64 and made a non-seekable Blob URL, so <video>/large
<audio> stalled and big files hit the RPC timeout.
Add GET /api/peer-content/<onion>/<id> — a same-origin, session-gated proxy
that forwards the browser's Range header to the peer's /content/<id> (which
already returns 206 Partial Content) and passes status + Content-Range +
Content-Type back. PeerFiles.playMedia() now points <video>/<audio> at this
streaming URL for free content instead of buffering a base64 blob, so the
player can seek and start immediately. Onion/id validated to prevent
SSRF/path traversal. (Paid preview keeps its existing flow.)
Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
content.browse-peer now returns the transport that actually reached the
peer (fips/tor/mesh/lan). PeerFiles shows it as a small coloured pill next
to the peer name (FIPS/Mesh green, LAN blue, Tor amber) and the loading
text no longer hardcodes "Connecting via Tor" (it was misleading when FIPS
was used). Pairs with B14 (transport recording).
Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B14 commit referenced crate::federation::storage::record_peer_transport
but `storage` is a private module — record_peer_transport is re-exported at
crate::federation::. E0603 broke the build. Use the re-exported path (as
load_nodes/fips_npub_for_onion already do). Verified: cargo build --release
EXIT 0. Also logs B21 (Tor/FIPS pill) plan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 4 content peer handlers (browse, download, download_paid, preview)
captured the transport returned by PeerRequest::send_get() but discarded
it, so the federation node's last_transport was never updated for cloud
activity — the UI showed Tor/none even when FIPS was used. Call
record_peer_transport() after each successful fetch (same as sync does).
Note: live data shows FIPS still reaches only some peers (many genuinely
fall back to Tor) — tracked separately as B14b (FIPS reachability).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
B1/B2: the same physical node can linger in the federation list under two
dids (e.g. after a did/key change). An onion is a node's unique stable
identity, so two entries with the same onion are one node. This showed the
node twice in the trusted-node list (B1) and as two mesh chat contacts —
one by name+logo, one by raw did (B2).
- storage::load_nodes now collapses same-onion entries (keep first, merge
fips_npub/name/last_state) so every consumer (list + chat seed + sync)
sees one entry per node.
- federation::sync merge_transitive_peers also matches by onion (not just
did) so new transitive hints don't re-add a known node under a new did.
- mesh::seed_federation_peers_into_mesh skips already-seeded onions (belt
and suspenders).
- Unit tests for dedup_nodes_by_onion (collapse + onion-suffix handling).
B4: filebrowser-client.listDirectory only checked res.ok before res.json(),
so when File Browser is absent (nginx serves the SPA index.html, 200) or
down (502) the JSON parse threw the opaque "Unexpected token '<'". Now it
checks the content-type and throws a friendly "File Browser is not
available" the Cloud view already renders as an empty state.
Verified: dedup unit tests 2/2; live .198 (15 entries→13 distinct onions)
restarted healthy on new binary; B4 guard present in built bundle + deployed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>