Large peer downloads (~178MB) failed with a generic 'Operation failed', and
the download path had three stacked problems:
- The FIPS reqwest client used a hard-coded 20s total timeout regardless of the
caller's .timeout(), so a big transfer over the mesh aborted at 20s before
the Tor fallback could help. Honor the per-request timeout (client_with_timeout).
- The peer-content proxy buffered the whole file into node memory via
resp.bytes() before sending a byte, and capped the transfer at 60s. Stream
the body through with hyper::Body::wrap_stream (constant memory) and raise the
timeout to 900s; bump the nginx peer-content read timeout to match.
- Free downloads pulled the file as base64 over RPC, doubling it in node memory
and the browser — fatal for large files. Download free files by streaming
from /api/peer-content straight to disk, after a 1-byte Range probe that
surfaces the real reason (peer offline on mesh and Tor) instead of a generic
failure. Paid downloads now return the real error through the {error} channel
the UI already displays.
Adds the reqwest 'stream' feature for bytes_stream().
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Apps could fail install when a stack member exited on its first start
because a dependency (db/redis/the bitcoin node) was not ready yet — a
transient crash, not a broken install. wait_for_stack_containers now
restarts each exited/dead container up to 3 times before declaring the
install failed; the runtime supervisor keeps it alive afterwards.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make each peer file card a flex column filling its grid cell (flex flex-col
h-full) and pin the body row (filename + Play/Download) with mt-auto, so cards
with a media preview and cards without line their footers up across the row.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User: chat history (messages + mesh/Tor contacts) must persist and be
secure/encrypted per best practice. Root cause of the .198 loss was the B17
mount race writing empty stores over real data (B17 already fixes the trigger);
this hardens storage so it can never silently lose or expose data:
- storage_crypto: shared at-rest envelope mirroring credentials::store — key =
SHA-256(domain ‖ node identity key) (seed-derived, per-store domain
separation), ChaCha20-Poly1305 AEAD with a random 96-bit nonce, tamper-evident.
Transparent migration of legacy plaintext files. Unit-tested (round-trip,
wrong-key/tamper rejection, plaintext detection).
- messages.json: encrypted at rest + ATOMIC write (temp+rename) so a crash/
reboot mid-write cannot corrupt history; decrypt-with-migration on load; a
failed decrypt never overwrites the on-disk data.
- mesh contacts (alias/notes/pinned/blocked): were ONLY in memory and lost on
every restart — now persisted to mesh-contacts.json (encrypted, atomic),
loaded on MeshState startup, saved after contacts-save/contacts-block.
Explicit clear (mesh.clear-all) still wipes everything, as intended.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CORE_RPC_HOST was hardcoded to bitcoin-knots in three env-render paths, so on a
bitcoin-core node (container named bitcoin-core) mempool-api could not reach
Bitcoin RPC. Both node variants are reachable on archy-net by container name —
only the name differs.
- Legacy direct-podman (stacks.rs) and config.rs::get_app_config now use a new
dependencies::detect_bitcoin_rpc_host() (pure, unit-tested pick_bitcoin_host).
- Quadlet/manifest path (the modern fleet default): add a {{BITCOIN_HOST}}
derived-env placeholder — HostFacts.bitcoin_host + resolve_derived_env render
it; prod_orchestrator detects Knots/Core via podman ps, resolved on demand
only for manifests that use the placeholder. mempool-api manifest moves
CORE_RPC_HOST from static env to derived_env: {{BITCOIN_HOST}}.
Tests: pick_bitcoin_host (5 cases incl. substring safety), container-crate
resolve_derived_env, and orchestrator mempool_core_rpc_host_follows_bitcoin_node
(core->bitcoin-core, knots->bitcoin-knots). No-regression confirmed: picker
returns bitcoin-knots live on .198. Live bitcoin-core validation pending (no
core node available). Sibling hardcodes (lnd/btcpay/electrumx/fedimint) tracked
as B12b.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Peer media (music/video) wouldn't play: the frontend downloaded the whole
file via RPC as base64 and made a non-seekable Blob URL, so <video>/large
<audio> stalled and big files hit the RPC timeout.
Add GET /api/peer-content/<onion>/<id> — a same-origin, session-gated proxy
that forwards the browser's Range header to the peer's /content/<id> (which
already returns 206 Partial Content) and passes status + Content-Range +
Content-Type back. PeerFiles.playMedia() now points <video>/<audio> at this
streaming URL for free content instead of buffering a base64 blob, so the
player can seek and start immediately. Onion/id validated to prevent
SSRF/path traversal. (Paid preview keeps its existing flow.)
Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
content.browse-peer now returns the transport that actually reached the
peer (fips/tor/mesh/lan). PeerFiles shows it as a small coloured pill next
to the peer name (FIPS/Mesh green, LAN blue, Tor amber) and the loading
text no longer hardcodes "Connecting via Tor" (it was misleading when FIPS
was used). Pairs with B14 (transport recording).
Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B14 commit referenced crate::federation::storage::record_peer_transport
but `storage` is a private module — record_peer_transport is re-exported at
crate::federation::. E0603 broke the build. Use the re-exported path (as
load_nodes/fips_npub_for_onion already do). Verified: cargo build --release
EXIT 0. Also logs B21 (Tor/FIPS pill) plan.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 4 content peer handlers (browse, download, download_paid, preview)
captured the transport returned by PeerRequest::send_get() but discarded
it, so the federation node's last_transport was never updated for cloud
activity — the UI showed Tor/none even when FIPS was used. Call
record_peer_transport() after each successful fetch (same as sync does).
Note: live data shows FIPS still reaches only some peers (many genuinely
fall back to Tor) — tracked separately as B14b (FIPS reachability).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The LND wallet UI (served on its own app port) fetches /lnd-connect-info
and /proxy/lnd/* cross-origin, so both need correct CORS headers.
(a) Older nginx configs add their own Access-Control-Allow-Origin in the
/lnd-connect-info location on top of the one the backend sets, yielding
a DUPLICATE header that browsers reject ("multiple values"). bootstrap
now strips that redundant nginx add_header (backend owns CORS).
(b) /proxy/lnd/* returned a 401 with no CORS headers when the session
check failed, so the browser saw an opaque CORS error instead of a
readable 401. Add unauthorized_cors() and use it on that path.
Adds tests/production-quality/ (bug tracker + lnd-cors-test.sh harness).
Verified: harness 4/4 on .116, .198, .103.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FIPS peer content browse over the mesh was failing with "Peer returned
error: 404 Not Found" and never falling back to Tor. `is_peer_allowed_path`
only allowed `/content/<id>` (item fetches) — the catalog endpoint is
exactly `/content` (no trailing slash), so it 404'd over the FIPS peer
listener. A FIPS 404 was also treated as a successful response, so the dial
never retried Tor. Fixes: allow `/content` over the mesh; add
`fips_should_fall_back()` so a FIPS 404/5xx in Auto mode falls back to Tor
(handles version-skew peers reaching a different route). Also correct the
reconnect hint text — the public anchor is TCP/8443, not UDP/8668.
Federation: deleted nodes reappeared because transitive discovery
(`merge` of a peer's advertised trusted peers) re-added any unknown DID.
Add a tombstone store (`removed-nodes.json`): remove_node tombstones the
DID, transitive merge skips tombstoned DIDs, and a remote-triggered
peer-joined is ignored for a removed DID. Explicit local re-add (add_node)
clears the tombstone.
UI: the app credentials modal panel stretched edge-to-edge (height:100%,
max-width:none, items-stretch overlay). Constrain it to a centered card
(max-width 34rem, rounded, dimmed full-screen backdrop) matching the
AppIconGrid / wallet-receive modal.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The whole fleet was silently never reaching the FIPS mesh: the default
public anchor was configured as fips.v0l.io:8668/udp, but the anchor only
answers on TCP/8443. Fix the default to 185.18.221.160:8443/tcp (IPv4
literal — the hostname resolves IPv6-first and the daemon binds v4-only,
which fails the handshake with EAFNOSUPPORT), and auto-seed it in
anchors::load() so every node dials it without operator action (removal
still persists). Proven live on .116: cold start → anchor_connected in
~400ms, anchor became mesh parent.
Wire fips::update::apply() against upstream GitHub releases (stable
channel only): resolve /releases/latest → SHA256-verify the .deb against
checksums-linux.txt → install → restart. dpkg runs via `systemd-run` to
escape archipelago's ProtectSystem=strict sandbox (else /var/lib/dpkg is
read-only), with --force-confold (archipelago manages /etc/fips conffiles)
and --force-downgrade (dev builds sort newer than the stable tag).
Validated live: .116 upgraded 0.3.0-dev -> stable v0.3.0.
Also: standalone fips-ui dashboard app (apps/fips-ui + docker/fips-ui,
static nginx proxying /rpc/v1 same-origin, copiable own-anchor address);
reserve UI port 8336; register fips/fips-ui as platform-managed. Includes
the Lightning wallet cross-origin (CORS) + LND proxy auth + nginx
self-healer fix so the wallet screen connects instead of "failed to fetch".
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the fleet-wide hardcoded WALLET_PASSWORD='hellohello' that left wallets
LOCKED after OTA/reboot (auto-unlock used the wrong password fleet-wide).
Forward fix (both init paths unified, validated cargo check + LND REST mechanics
on a scratch wallet):
- Per-node random 256-bit secret in secrets/lnd-wallet-password (0600), mirroring
secrets/bitcoin-rpc-password. read_wallet_password (no-gen) vs
ensure_wallet_password (gen at init only).
- container/lnd.rs init AND api/rpc/lnd/wallet.rs seed-derived init both use the
per-node secret (wallet.rs keeps recoverable derived entropy; password unified).
- Unlock tries [per-node secret, legacy 'hellohello']; single-attempt primitive
distinguishes invalid-passphrase (fail fast, try next) from not-ready (retry),
so a wrong password no longer hangs the boot path ~60s.
Migration (candidate-unlock + rotate, best-effort at login):
- change_wallet_password (WalletUnlocker.ChangePassword) + migrate_locked_wallet:
if LOCKED, try candidates as current pw and ChangePassword onto the per-node
secret so future boots auto-unlock. Hooked into auth.login (non-blocking) with
the just-verified password as the candidate.
NOT YET: seed-recovery fallback for wallets where no candidate matches (e.g.
.116/.228) — destructive, needs entropy-source/funds-safety handling; next pass.
NOT shipped: pending end-to-end validation on a real node.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha
(all nodes were already on the latest build — these were live bugs, not stale
builds), plus the missing ElectrumX tile, and adds automated coverage so each
can't regress silently.
Receive address (".116 receive fails", ".228 false 'wallet is locked'"):
- LND publishes its REST API on a host port that can drift from the manifest
(a container created when the mapping was 8080 kept publishing 8080 after the
manifest moved to 18080). The in-process client connects to the manifest port,
gets connection-refused, and wallet init fails forever while the container
looks "Up". Add published-port drift detection to the reconciler
(container_ports_drifted / host_port_bindings_drifted) that recreates a
drifted backend even for restart-sensitive apps — a drifted container is
already broken, so leaving it "untouched" only perpetuates the failure.
- Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED,
WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they
survive the RPC error sanitizer instead of collapsing to the generic
"Operation failed". The UI maps the code instead of guessing wallet state from
substrings — so an unreachable REST endpoint is no longer mislabelled "locked".
Bitcoin install (".198 bitcoin gone / reinstall just stops"):
- bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only
generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so
secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate
it idempotently before bitcoin starts (ensure_app_secrets, reusing
ensure_txrelay_credentials), and name the missing secret in the error so a
genuine gap is actionable instead of a bare "IO error".
ElectrumX app tile missing on every node with it installed:
- The catalog generator dropped electrumx because the manifest had no
interfaces.main block, so the tile had no launch URL and was hidden. Declare
the companion UI port (50002) in the manifest, regenerate the catalog, and let
an app with a known launch URL stay launchable while its backend is still
"starting" (ElectrumX indexes for 10m+).
Test harness:
- New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness
(validated live; port-drift catches the real .116 drift).
- Rust unit tests for drift detection, the receive reason-code classifier, and
the named-missing-secret error; vitest for the UI code mapping.
- create-release.sh now runs tests/release/run.sh and aborts the release on
failure — previously it ran no tests at all.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- LND wallet: request correct address type so receive-address generation
no longer 400s
- AIUI/app session: on-screen pointer can click + type into app content
(incl. app store search); "open in new tab" opens the phone browser;
mobile credential modal centered instead of full-height
(remote-relay.ts, AppSession.vue, AppSessionFrame.vue, AppIconGrid.vue,
openExternal.ts, WebViewScreen.kt) + remote-relay tests
- health_monitor: electrs auto-recovers from a corrupt index and shows a
percent/block-height progress screen while reindexing (useElectrsSync.ts)
- update.rs: drop retired tx1138 secondary mirror (one-time migration);
longer download timeout for slow connections
- CHANGELOG: v1.7.90-alpha notes
- tests/release/run.sh: harness tweaks
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cargo check was showing five real warnings, all genuinely dead:
* container/mod.rs — re-exports compute_container_name, AdoptionReport,
ReconcileAction, ReconcileReport were unused outside
prod_orchestrator. Drop from the pub use line.
* prod_orchestrator — with_runtime + insert_manifest_for_test only exist
for the test module in the same file. Mark them
#[cfg(test)] so they don't appear in release builds.
* async_lifecycle — remove_package_entry has no callers; doc claims
"used for install-failure cleanup" but nothing
cleans up. Delete (10 lines).
* registry.rs — `use tracing::{debug, info};` had no consumers.
* fips.rs — unused-assignment chain on last_status. The poll
loop always sets it on every break path, so the
initial `None` and the unwrap_or_else fallback
were both dead. Refactored to `let after = loop
{ ...; break s; };`.
cargo check is now clean. cargo test --workspace --bins: 614 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If bitcoin-core was installed but never started (e.g. port 8332 already
bound by bitcoin-knots), the container sticks in `created` state forever.
The old conflict check refused EVERY future bitcoin install — including
re-install of the running variant — leaving no UI path to recovery.
Now the check distinguishes states:
- missing → no conflict, continue
- running → real conflict, refuse install
- created/exited/configured/... → stuck; auto-remove and continue
Volumes are untouched; only the dead container record goes away.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bitcoin containers were exiting in ms after start because the orchestrator
install path skipped the credential-materialisation step the legacy path
did. resolve_secret_env then failed to read
/var/lib/archipelago/secrets/bitcoin-rpc-password, the container started
with no password, and bitcoind crashed before logs were useful.
Two changes:
1. install.rs — call bitcoin_rpc_credentials() for bitcoin/bitcoin-core/
bitcoin-knots before any install branch runs. The function generates +
persists on first call (OnceCell-cached), so this is idempotent.
2. manifest.rs::resolve_secret_env — return ManifestError::Invalid when a
resolved secret trims to empty, instead of silently producing
`KEY=` env vars that crash auth.
Adds a unit test for the empty-secret rejection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3a of the install path consolidation. Two coupled changes:
1. install.rs handle_package_install: gate the legacy "container exists →
adopt + return" probe on !orchestrator_managed. Apps the orchestrator
knows about (bitcoin-knots, bitcoin-core, lnd, electrumx, fedimint,
filebrowser, btcpay-server stack apps, mempool stack apps, plus the
companion UIs that just moved to Quadlet) skip the legacy probe and
fall straight into the orchestrator branch.
The legacy adopt block was returning success on a bare `podman start`
exit-0 — even when the process inside the container crashed seconds
later. That's the .228 "running but unreachable" failure mode. The
orchestrator's ensure_running honors the manifest's health check and
pre-start hooks (e.g. re-renders bitcoin-ui's nginx.conf if the RPC
password rotated), so this is a behavioral upgrade, not just a
refactor.
2. ProdContainerOrchestrator::install: make idempotent. Previously it
blindly called install_fresh which would fail on `podman create` if
the container name already existed. Now it delegates to ensure_running:
- Container Running + healthy → no-op (refresh hooks, restart if
config rewritten)
- Container Stopped/Exited → start (with hook refresh)
- Container missing → install_fresh
- Container in wedged state (Created/Paused/Unknown) → force-recreate
Without this, change #1 would regress every "container already exists"
case for the 18 orchestrator-managed app IDs. With it, install becomes
the single source of truth for "make app X be in the desired state."
Tests: 654 passed across the workspace (614 unit + 37 orchestration + 3
rpc), 0 failures. The 20 prod_orchestrator tests cover the install /
ensure_running / reconcile paths the new install delegates through.
Net delta: install.rs grows by ~30 lines (gating wrapper + comments),
prod_orchestrator.rs grows by ~30 lines (idempotent install body). Both
are temporary — the larger deletions (~1700 lines) come once every app
has been verified through the orchestrator path in subsequent phases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>