lfg2025/archy

Author	SHA1	Message	Date
archipelago	8893055810	test(gate): retry lnd getinfo for RPC readiness (wallet-unlock lags 'running') lnd's RPC isn't ready until its wallet auto-unlocks on (re)start, which lags the container 'running' state — single-shot lncli getinfo raced that window and false-failed (gate tests 60 + 85). Retry up to ~90s like a health probe. lnd is functional (getinfo returns cleanly once ready). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 14:45:36 -04:00
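The retry described in commit 8893055810 lives in the gate's shell tests. The same "poll until ready or give up at a deadline" pattern can be sketched in Rust as below. This is an illustrative sketch only: `retry_until` and its parameters are hypothetical names, not code from the repository.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `probe` until it succeeds or `timeout` elapses.
/// The last error is returned if the deadline passes without a success.
fn retry_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let deadline = Instant::now() + timeout;
    loop {
        match probe() {
            // Ready: stop polling immediately.
            Ok(value) => return Ok(value),
            // Still failing and out of time: surface the last error.
            Err(err) if Instant::now() >= deadline => return Err(err),
            // Still failing but inside the window: wait and try again.
            Err(_) => sleep(interval),
        }
    }
}
```

For the case in the commit, the probe would wrap a single `lncli getinfo` call and the timeout would be roughly 90 seconds.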
archipelago	53b8e47f1d	test(gate): fix two false-failing lifecycle tests (not product bugs) - immich restart: bump wait 120s->240s. Restart = ordered stop+start of the 3- container stack (postgres->redis->server w/ DB migrations), so it needs at least as long as the start test (180s) — the old 120s was inconsistent and false-failed on loaded nodes. immich does return to running. - fedimint orphan check: the unanchored 'total' regex (^fedimint) counts the legitimate fedimint-clientd (dual-ecash bridge) but the anchored 'known' regex omitted it -> total>known false orphan on every node running fedimint-clientd. Add fedimint-clientd to known. Both run as LOCAL podman/systemctl on the gate runner, so they test the runner node (.116), not the RPC target — surfaced while driving the .228 gate green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 14:11:35 -04:00
archipelago	84031e6209	docs: temporarily reduce release lifecycle gate from 20x to 5x Per user direction: the production test gate is 5x (ARCHY_ITERATIONS=5) on .228 AND .198 for now, down from 20x. Restore to 20x before the final ship. Updated CLAUDE.md, PRODUCTION-MASTER-PLAN.md, and tests/lifecycle/TESTING.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 17:11:00 -04:00
archipelago	b0b54a96fa	test(lifecycle): immich suite — package-level checks, wait-based destructive tier container-list reports stack apps package-level (.name="immich"), so the suite checks the "immich" package (presence, valid state, :2283 lan-address) rather than individual container names. Destructive tier fires async stop/start/restart and asserts on the end state via wait_for_container_status. KNOWN: the destructive tier is flaky for slow multi-container stacks — bats runs ops back-to-back with no settling while immich's async stack ops take 30s+, and stopped reports as "exited" not "stopped". The immich migration itself is verified working (manual stop/start/restart succeed; all 3 containers healthy). Hardening the harness for stack apps (inter-op settling + stopped\|exited acceptance) is a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 09:52:33 -04:00
archipelago	b1f175b927	test(lifecycle): add immich stack lifecycle suite RPC-based (host-agnostic) lifecycle coverage for the manifest-driven immich stack (immich + immich-postgres + immich-redis): presence + valid state of all 3 members, a guard that no legacy underscore containers exist (catches botched migration / legacy-installer fallback), destructive stop/start/restart of the server with postgres+redis staying up, and cascade uninstall/reinstall (preserve_data). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 09:01:19 -04:00
archipelago	03a4ee1b30	feat(container): manifest-declared generated secrets + companion/quadlet hardening Generated-secrets system: apps declare `generated_secrets` in their manifest (kinds hex16/hex32/bcrypt); `container::secrets::ensure_generated_secrets` materialises them 0600/rootless in resolve_dynamic_env — idempotent and self-healing (recovers wrongly root-owned secrets with no privilege). Replaces per-app Rust (deletes ensure_fmcd_password). fedimint-clientd/gateway manifests now declare fmcd-password / fedimint-gateway-hash. companion.rs: rebuild the auto-built :latest image when its build context changes (staleness check) so baked-in fixes (e.g. guardian-UI CSS) actually reach nodes. quadlet.rs: skip PublishPort under Network=host (podman rejects the combo, exit 125) + regression tests. UI: "Fedimint Guardian" rename, fedimint-clientd/nostr-rs-relay/meshtastic tagged as Services (headless backends), gateway icon fallback. Deployed + verified on .228 (generated-secrets fixed fedimint-gateway start; grafana/strfry orphan crash-loop units removed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 05:11:07 -04:00
archipelago	298595069d	fix(mesh): native Meshtastic unicast DMs + driver-level E2E status Meshtastic DMs were falling back to a channel broadcast, so every node on the LoRa channel saw a "direct" message. Send a directed MeshPacket (to = node num, decoded from the synthetic pubkey's node-id bytes) instead — the Meshtastic analog of the meshcore CMD_SEND_TXT_MSG fix. DMs now reach only the recipient; firmware auto-PKC-encrypts them end-to-end once NodeInfo keys are exchanged. Capture E2E status at the driver level (no shared-type/UI change): - learn each peer's real Curve25519 key from User.public_key (field 8) and inbound MeshPacket.public_key (16), kept in a side-map separate from the synthetic routing key so unicast routing is untouched - detect inbound MeshPacket.pki_encrypted (17) to tell a true E2E DM from a channel-PSK fallback - peer_is_pkc_capable() seam for a future mesh-tab E2E badge Hot-swap preserved: no dispatched MeshRadioDevice signature or the shared ParsedContact changed, so meshcore and meshtastic stay interchangeable behind the listener. Adds tests/multinode/meshtastic.sh, a two/three-radio on-air parity harness (detect, discover, DM round-trip, DM privacy, channel broadcast, typed envelope, reachability). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 09:52:26 -04:00
archipelago	4576964be4	docs(tracker): file new backlog as gitea #32-#35; relay UI + fedimint CSS live on .116	2026-06-16 06:41:22 -04:00
archipelago	82659e9f4e	docs(tracker): v1.7.97-alpha cut + mid-rollout state (116 deployed, 198 deploying, fleet pending)	2026-06-16 04:31:18 -04:00
archipelago	8a62ae008c	docs(tracker): B17 root-caused + fixed (data-volume mount ordering), verified .198	2026-06-16 03:38:58 -04:00
archipelago	dd0fac0e15	docs(tracker): B16 done (bitcoin tile retain/Updating…, unit-tested); image-opt staged for .97	2026-06-16 02:59:33 -04:00
archipelago	bf24bbc15a	fix(mempool): resolve CORE_RPC_HOST to the actual bitcoin node (Knots/Core) (B12) CORE_RPC_HOST was hardcoded to bitcoin-knots in three env-render paths, so on a bitcoin-core node (container named bitcoin-core) mempool-api could not reach Bitcoin RPC. Both node variants are reachable on archy-net by container name — only the name differs. - Legacy direct-podman (stacks.rs) and config.rs::get_app_config now use a new dependencies::detect_bitcoin_rpc_host() (pure, unit-tested pick_bitcoin_host). - Quadlet/manifest path (the modern fleet default): add a {{BITCOIN_HOST}} derived-env placeholder — HostFacts.bitcoin_host + resolve_derived_env render it; prod_orchestrator detects Knots/Core via podman ps, resolved on demand only for manifests that use the placeholder. mempool-api manifest moves CORE_RPC_HOST from static env to derived_env: {{BITCOIN_HOST}}. Tests: pick_bitcoin_host (5 cases incl. substring safety), container-crate resolve_derived_env, and orchestrator mempool_core_rpc_host_follows_bitcoin_node (core->bitcoin-core, knots->bitcoin-knots). No-regression confirmed: picker returns bitcoin-knots live on .198. Live bitcoin-core validation pending (no core node available). Sibling hardcodes (lnd/btcpay/electrumx/fedimint) tracked as B12b. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-16 02:07:39 -04:00
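Commit bf24bbc15a names a pure `pick_bitcoin_host` picker but does not show it. A minimal sketch of what such a picker might look like follows. The signature, the Knots-first priority, and the fallback to `bitcoin-knots` when neither container is found are all assumptions made for illustration.

```rust
/// Sketch of a pure picker: given the names of running containers, return the
/// container name that Bitcoin RPC clients should use as their host.
/// Assumption: names are matched exactly, so a container such as
/// "bitcoin-core-exporter" is never mistaken for the node itself.
fn pick_bitcoin_host(running: &[&str]) -> &'static str {
    if running.iter().any(|name| *name == "bitcoin-knots") {
        "bitcoin-knots"
    } else if running.iter().any(|name| *name == "bitcoin-core") {
        "bitcoin-core"
    } else {
        // Assumption: fall back to the previously hardcoded value.
        "bitcoin-knots"
    }
}
```

Keeping the picker free of I/O means the `podman ps` call stays in the caller and the decision logic can be unit-tested with plain string slices.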
archipelago	987a961f4a	fix(nginx): self-heal fedimint asset rewrite on deployed nodes — HTTP + HTTPS (B13) The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")). Those resolve against the SPA root: bg-network.jpg exists there by luck, but app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the visibly-broken icon. bootstrap.rs::patch_nginx_conf now heals both paths on startup: - Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail for the full reroot set; byte-matches the shipped template. - Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no sub_filter and a per-node-varying trailing directive, so anchor on the unique :8175 proxy_pass and insert the reroot set after it (nginx ignores directive order). Snippet added to the bootstrap nginx loop (skipped on HTTP-only nodes). missing_* flags are now gated on their splice anchors so the included snippet neither attempts the main-conf-only patches nor logs warn-skips every boot. Idempotent via the 'href="/' 'href="/app/fedimint/' marker. Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t OK; containers survived restart (Quadlet); idempotent steady state, no warn spam. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 18:03:04 -04:00
archipelago	a50b6df21b	fix(nginx): rewrite fedimint UI asset paths so CSS applies (B13, fresh-ISO) Fedimint UI HTML/CSS reference absolute /assets/* paths; under /app/fedimint/ those hit the main SPA, not the fedimint container, so the UI renders unstyled. Add the proven sub_filter asset-rewrite pattern (as indeedhub/ botfights use) to the /app/fedimint/ block in the nginx template + https snippet (also rewrites url(...) for the CSS background image). Bootstrap self-heal for already-deployed nodes is the documented resume point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 16:52:30 -04:00
archipelago	8427e219ea	docs(tracker): round-2 status (B15/B7 done, B13/B12/B16 deferred w/ plans) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 16:31:24 -04:00
archipelago	c0d41cf8cf	fix(ui): faster bitcoin sync refresh + unstick ElectrumX loader (B15,B7) B15: Home system stats (incl. bitcoin sync %) polled every 30s — too slow; now 10s so sync progress tracks the actual block height more closely. B7: the ElectrumX sync overlay was gated only on status!=='synced', so if the status never flips to 'synced' (ElectrumX stale/disconnected) the loader stuck on top forever. Now the overlay hides and the app iframe loads when the sync status is stale (fail-open), while still showing during active indexing. type-check EXIT 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 16:29:44 -04:00
archipelago	eb55c88e1a	docs(tracker): B6/B7/B12/B13/B15/B16 root causes + fix plans Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 14:43:01 -04:00
archipelago	31fe91b99a	docs(tracker): B13 fedimint CSS investigation progress Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 14:13:28 -04:00
archipelago	b9cc4bd780	docs(tracker): B14b FIPS reachability findings (dial-time, not npub/service) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 14:11:47 -04:00
archipelago	6c92eacba0	docs(tracker): add B22 (peer download/audio errors), B23 (group chat), B3 PASSED-http Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 14:09:31 -04:00
archipelago	602b9cd3df	fix(nginx): route /api/peer-content/* to the backend for B3 streaming The B3 streaming proxy endpoint existed in the backend but nginx had no location for /api/peer-content/*, so the browser's requests fell through to the SPA (200 text/html) and media still wouldn't play. Add an NGINX_PEER_CONTENT_BLOCK that bootstrap patches into every server block (forwards Cookie for session auth + Range, proxy_buffering off). Idempotent; covers fresh-ISO nodes too since bootstrap runs on every startup. Verified on .198: after restart the async nginx patch lands and /api/peer-content/<onion>/<id> returns 401 (reaches backend, auth-gated) instead of the SPA; nginx block present in both server blocks. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 14:07:39 -04:00
archipelago	5c8707432b	fix(cloud): Range-streaming proxy for peer media so it plays/seeks (B3) Peer media (music/video) wouldn't play: the frontend downloaded the whole file via RPC as base64 and made a non-seekable Blob URL, so <video>/large <audio> stalled and big files hit the RPC timeout. Add GET /api/peer-content/<onion>/<id> — a same-origin, session-gated proxy that forwards the browser's Range header to the peer's /content/<id> (which already returns 206 Partial Content) and passes status + Content-Range + Content-Type back. PeerFiles.playMedia() now points <video>/<audio> at this streaming URL for free content instead of buffering a base64 blob, so the player can seek and start immediately. Onion/id validated to prevent SSRF/path traversal. (Paid preview keeps its existing flow.) Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:46:51 -04:00
archipelago	4cac6bc835	docs(tracker): record B1/B2/B4/B14/B21 done + B14b; next B3 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:27:51 -04:00
archipelago	0801dd6632	feat(cloud): show Tor/FIPS transport pill on peer browse (B21) content.browse-peer now returns the transport that actually reached the peer (fips/tor/mesh/lan). PeerFiles shows it as a small coloured pill next to the peer name (FIPS/Mesh green, LAN blue, Tor amber) and the loading text no longer hardcodes "Connecting via Tor" (it was misleading when FIPS was used). Pairs with B14 (transport recording). Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:25:39 -04:00
archipelago	1c6dc153ce	fix(content): use re-exported federation::record_peer_transport path (repair build) The B14 commit referenced crate::federation::storage::record_peer_transport but `storage` is a private module — record_peer_transport is re-exported at crate::federation::. E0603 broke the build. Use the re-exported path (as load_nodes/fips_npub_for_onion already do). Verified: cargo build --release EXIT 0. Also logs B21 (Tor/FIPS pill) plan. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:15:01 -04:00
archipelago	f2e3710c28	fix(content): record peer transport on cloud browse/download/preview (B14) The 4 content peer handlers (browse, download, download_paid, preview) captured the transport returned by PeerRequest::send_get() but discarded it, so the federation node's last_transport was never updated for cloud activity — the UI showed Tor/none even when FIPS was used. Call record_peer_transport() after each successful fetch (same as sync does). Note: live data shows FIPS still reaches only some peers (many genuinely fall back to Tor) — tracked separately as B14b (FIPS reachability). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 13:02:13 -04:00
archipelago	ed4931064b	fix(federation,cloud): dedup trusted nodes + chat contacts by onion; guard cloud my-folders (B1,B2,B4) B1/B2: the same physical node can linger in the federation list under two dids (e.g. after a did/key change). An onion is a node's unique stable identity, so two entries with the same onion are one node. This showed the node twice in the trusted-node list (B1) and as two mesh chat contacts — one by name+logo, one by raw did (B2). - storage::load_nodes now collapses same-onion entries (keep first, merge fips_npub/name/last_state) so every consumer (list + chat seed + sync) sees one entry per node. - federation::sync merge_transitive_peers also matches by onion (not just did) so new transitive hints don't re-add a known node under a new did. - mesh::seed_federation_peers_into_mesh skips already-seeded onions (belt and suspenders). - Unit tests for dedup_nodes_by_onion (collapse + onion-suffix handling). B4: filebrowser-client.listDirectory only checked res.ok before res.json(), so when File Browser is absent (nginx serves the SPA index.html, 200) or down (502) the JSON parse threw the opaque "Unexpected token '<'". Now it checks the content-type and throws a friendly "File Browser is not available" the Cloud view already renders as an empty state. Verified: dedup unit tests 2/2; live .198 (15 entries→13 distinct onions) restarted healthy on new binary; B4 guard present in built bundle + deployed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 12:29:12 -04:00
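The same-onion collapse from commit ed4931064b (keep the first entry, merge missing fields from later ones) can be sketched as below. The `Node` struct is a reduced, hypothetical stand-in for the real federation type, `last_state` is left out for brevity, and the suffix normalisation in `onion_key` is an assumption about what "onion-suffix handling" means.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical node record (the real type has more fields).
#[derive(Debug, Clone, PartialEq)]
struct Node {
    did: String,
    onion: String,
    name: Option<String>,
    fips_npub: Option<String>,
}

/// Assumption: an onion compares equal with or without a trailing ".onion".
fn onion_key(onion: &str) -> String {
    let lowered = onion.trim().to_ascii_lowercase();
    lowered
        .strip_suffix(".onion")
        .unwrap_or(lowered.as_str())
        .to_string()
}

/// Collapses entries that share an onion: the first entry wins, and any field
/// it lacks is filled in from later duplicates.
fn dedup_nodes_by_onion(nodes: Vec<Node>) -> Vec<Node> {
    let mut out: Vec<Node> = Vec::new();
    let mut index: HashMap<String, usize> = HashMap::new();
    for node in nodes {
        let key = onion_key(&node.onion);
        let existing = index.get(&key).copied();
        match existing {
            Some(i) => {
                let kept = &mut out[i];
                if kept.name.is_none() {
                    kept.name = node.name;
                }
                if kept.fips_npub.is_none() {
                    kept.fips_npub = node.fips_npub;
                }
            }
            None => {
                index.insert(key, out.len());
                out.push(node);
            }
        }
    }
    out
}
```

Input order is preserved, so the first entry for each onion keeps its position in the list.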
archipelago	1db720af13	fix(lnd): repair fleet-wide CORS on LND connect-wallet endpoints (B5) The LND wallet UI (served on its own app port) fetches /lnd-connect-info and /proxy/lnd/* cross-origin, so both need correct CORS headers. (a) Older nginx configs add their own Access-Control-Allow-Origin in the /lnd-connect-info location on top of the one the backend sets, yielding a DUPLICATE header that browsers reject ("multiple values"). bootstrap now strips that redundant nginx add_header (backend owns CORS). (b) /proxy/lnd/* returned a 401 with no CORS headers when the session check failed, so the browser saw an opaque CORS error instead of a readable 401. Add unauthorized_cors() and use it on that path. Adds tests/production-quality/ (bug tracker + lnd-cors-test.sh harness). Verified: harness 4/4 on .116, .198, .103. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 11:31:14 -04:00
archipelago	0c8991b519	test(multinode): assertion-based two-node E2E smoke suite Adds tests/multinode/smoke.sh on the existing multinode.bash lib: an assertion suite (pass/fail + non-zero exit) driving two real nodes through login, onion + FIPS identity, FIPS anchor-connected, federation pairing both directions, peer content browse over the mesh, and the removed-node tombstone (with an optional 3rd node C for the transitive-reappear case). Guards the v1.7.94/v1.7.95 fixes. Content-browse + tombstone checks skip-with-note against peers older than v1.7.95. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-15 09:03:58 -04:00
archipelago	8c8e4d7a29	test: gate that LND wallet is unlocked after restart (catches fleet-wide lock) A wrong/locked LND wallet password leaves the wallet LOCKED after every restart/OTA, breaking all Bitcoin-receive + Lightning ops fleet-wide — and the harness was blind to it: live-lnd-address-type treats 'wallet locked' as PASS, os-audit treated lnd-unreachable as WARN, and the archipelago lnd.getinfo RPC masks a locked wallet (returns all-zero success). - tests/release/run.sh: new 'live-lnd-unlocked' stage polls LND's unauth /v1/state and FAILs if still LOCKED after a 60s grace window. - tests/lifecycle/os-audit.sh: probe lnd.newaddress (the real receive path, which surfaces LND_WALLET_LOCKED) instead of lnd.getinfo; locked = hard FAIL, not-installed = WARN. Proven on .116 (genuinely locked): os-audit now reports '[FAIL] lnd wallet unlocked (lnd.newaddress) wallet LOCKED'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 10:36:12 -04:00
archipelago	2fac63e58c	feat(release): gate that Settings 'What's New' modal stays in sync with CHANGELOG The What's New modal (AccountInfoSection.vue) hardcodes one block per release and had silently drifted: it sat at v1.7.84 while the fleet shipped through v1.7.92, so eight releases of notes never reached users in Settings. - scripts/sync-whats-new.py: renders a modal block from each CHANGELOG version that's missing one (curated bullets, dev-process 'Validation…' lines dropped), inserts newest-first; never touches older hand-written pre-CHANGELOG history. --check mode lists anything missing and exits non-zero. - tests/release/run.sh: new 'whats-new-sync' static gate runs --check, so a release with an un-surfaced CHANGELOG entry fails before shipping. - Backfilled the eight missing blocks (v1.7.85 … v1.7.92) into the modal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 08:31:43 -04:00
archipelago	4232424b23	fix(ui): suppress app-unreachable overlay while ElectrumX sync screen shows When ElectrumX is still building its index (or waiting on the Bitcoin node), AppSessionFrame shows a sync 'pre UI'. The iframe-blocked fallback ('App not reachable / retrying') was not gated on electrsSync, so it painted over the sync screen and read as a hard connection error. Gate it on !electrsSync, mirroring the iframe's own guard. Also harden the lifecycle health probe: container_health used jq '// "unknown"', which only catches null/false — an empty-string health (a brief window under load) rendered as a blank 'bad health: X is '. Map empty to 'unknown' so the retry loop keeps waiting instead of failing on a transient. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 07:58:24 -04:00
archipelago	329e7811eb	test(lifecycle): add os-audit OS-wide health gate; docs: v1.7.91 resume notes os-audit.sh: one non-destructive scorecard tying backend/RPC health, the all-apps lifecycle audit (delegates to remote-lifecycle.sh), and the FM-guards (port-drift, secret-completeness, orphan-container sweep, OTA-wedge). The per-boot building block for the reboot-survival loop. FM12 check uses jq has() not // (// treats a legit false as empty). Section A validated all-PASS on .116. docs: v1.7.91 release-pass resume notes + the bitcoinReceive blocker writeup. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 04:36:06 -04:00
archipelago	0ed892a412	fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha (all nodes were already on the latest build — these were live bugs, not stale builds), plus the missing ElectrumX tile, and adds automated coverage so each can't regress silently. Receive address (".116 receive fails", ".228 false 'wallet is locked'"): - LND publishes its REST API on a host port that can drift from the manifest (a container created when the mapping was 8080 kept publishing 8080 after the manifest moved to 18080). The in-process client connects to the manifest port, gets connection-refused, and wallet init fails forever while the container looks "Up". Add published-port drift detection to the reconciler (container_ports_drifted / host_port_bindings_drifted) that recreates a drifted backend even for restart-sensitive apps — a drifted container is already broken, so leaving it "untouched" only perpetuates the failure. - Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED, WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they survive the RPC error sanitizer instead of collapsing to the generic "Operation failed". The UI maps the code instead of guessing wallet state from substrings — so an unreachable REST endpoint is no longer mislabelled "locked". Bitcoin install (".198 bitcoin gone / reinstall just stops"): - bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate it idempotently before bitcoin starts (ensure_app_secrets, reusing ensure_txrelay_credentials), and name the missing secret in the error so a genuine gap is actionable instead of a bare "IO error". ElectrumX app tile missing on every node with it installed: - The catalog generator dropped electrumx because the manifest had no interfaces.main block, so the tile had no launch URL and was hidden. Declare the companion UI port (50002) in the manifest, regenerate the catalog, and let an app with a known launch URL stay launchable while its backend is still "starting" (ElectrumX indexes for 10m+). Test harness: - New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness (validated live; port-drift catches the real .116 drift). - Rust unit tests for drift detection, the receive reason-code classifier, and the named-missing-secret error; vitest for the UI code mapping. - create-release.sh now runs tests/release/run.sh and aborts the release on failure — previously it ran no tests at all. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:12:56 -04:00
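The stable `[CODE]` token scheme from commit 0ed892a412 can be illustrated with a small round-trip sketch. The four code names and the "Bitcoin address" prefix come from the commit message. The enum, the function names, and the exact message layout are hypothetical.

```rust
/// The four reason codes listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReceiveCode {
    RestUnreachable,
    WalletLocked,
    WalletUninitialized,
    Syncing,
}

impl ReceiveCode {
    fn token(self) -> &'static str {
        match self {
            ReceiveCode::RestUnreachable => "REST_UNREACHABLE",
            ReceiveCode::WalletLocked => "WALLET_LOCKED",
            ReceiveCode::WalletUninitialized => "WALLET_UNINITIALIZED",
            ReceiveCode::Syncing => "SYNCING",
        }
    }

    fn from_token(token: &str) -> Option<Self> {
        match token {
            "REST_UNREACHABLE" => Some(ReceiveCode::RestUnreachable),
            "WALLET_LOCKED" => Some(ReceiveCode::WalletLocked),
            "WALLET_UNINITIALIZED" => Some(ReceiveCode::WalletUninitialized),
            "SYNCING" => Some(ReceiveCode::Syncing),
            _ => None,
        }
    }
}

/// Assumed layout: fixed "Bitcoin address" prefix, then the bracketed token.
fn format_receive_error(code: ReceiveCode, detail: &str) -> String {
    format!("Bitcoin address unavailable [{}]: {}", code.token(), detail)
}

/// Consumer side: read the first bracketed token rather than guessing the
/// wallet state from substrings of the human-readable detail.
fn extract_code(message: &str) -> Option<ReceiveCode> {
    let start = message.find('[')? + 1;
    let end = start + message[start..].find(']')?;
    ReceiveCode::from_token(&message[start..end])
}
```

A message without a recognised token yields `None`, so the consumer can fall back to a generic error instead of mislabelling the state.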
archipelago	c800293f1f	fix: bitcoin receive, AIUI pointer input, electrs self-heal, OTA timeout - LND wallet: request correct address type so receive-address generation no longer 400s - AIUI/app session: on-screen pointer can click + type into app content (incl. app store search); "open in new tab" opens the phone browser; mobile credential modal centered instead of full-height (remote-relay.ts, AppSession.vue, AppSessionFrame.vue, AppIconGrid.vue, openExternal.ts, WebViewScreen.kt) + remote-relay tests - health_monitor: electrs auto-recovers from a corrupt index and shows a percent/block-height progress screen while reindexing (useElectrsSync.ts) - update.rs: drop retired tx1138 secondary mirror (one-time migration); longer download timeout for slow connections - CHANGELOG: v1.7.90-alpha notes - tests/release/run.sh: harness tweaks Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 04:49:32 -04:00
archipelago	c49e8fcacd	fix: harden OTA updates, AIUI desktop gap, LND no-proxy - update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect error (nginx binds :80, not :443) so good updates are no longer rolled back; recover stuck update_in_progress; avoid ETXTBSY on running binary - LND: REST client bypasses proxy, GET newaddress p2wkh, wallet readiness/unlock after restart - Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix) - vite.config.ts: dev-only /aiui proxy - tests/release/run.sh: release gate harness (static+frontend+backend) - CHANGELOG: v1.7.89-alpha notes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 01:23:32 -04:00
archipelago	d6f108d818	chore: snapshot release workspace	2026-06-12 03:00:15 -04:00
archipelago	c393b96da3	backend: harden rootless app lifecycle orchestration	2026-06-11 00:24:32 -04:00
archipelago	d736364ad7	fix(apps): stabilize btcpay and public proxy launch flows	2026-05-19 09:26:43 -04:00
archipelago	7804223152	chore: release v1.7.57-alpha	2026-05-17 17:30:04 -04:00
Dorian	835c525218	chore(release): stage v1.7.55-alpha	2026-05-13 15:09:22 -04:00
archipelago	745cb1c626	chore(release): stage v1.7.52-alpha	2026-05-05 11:29:18 -04:00
archipelago	10fbb8f87c	docs(testing): track Phase 3.4 race fix + drift-sync hook * L0 unit count: 630 → 631 (translate_health_check_http_does_not_double_prefix_scheme) * Phase 3 row: add TimeoutStartSec=600 race fix (44f275ed) + drift-sync hook (0889367d) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:53:18 -04:00
archipelago	bd96c0475d	feat(config): ARCHIPELAGO_USE_QUADLET_BACKENDS env override Adds an env-var lever for Phase 3.2's use_quadlet_backends flag so the 20× harness can flip the path on per-node without a config.json edit (which would require an archipelago.service restart — and that triggers FM3 cgroup cascade until Phase 3.5 ships, so we can't ask anyone to reconfigure live nodes that way today). Truthy parsing centralised in `parse_truthy_env` (1, true, yes, on — case-insensitive, whitespace-trimmed). Anything else is false. The helper is unit-tested so future env-var flags can reuse the same shape. Also adds a default-off regression test for use_quadlet_backends so flipping the default ahead of the 20× verification fires immediately. TESTING.md documents the Environment= snippet for the systemd drop-in so the next operator can flip the flag on a debug node without re-deriving the recipe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:44:09 -04:00
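The truthy-parsing rule stated in commit bd96c0475d (1, true, yes, on; case-insensitive and whitespace-trimmed; anything else is false) is small enough to sketch directly. The `&str -> bool` signature and the wrapper function are assumptions. Only the helper name and the accepted spellings come from the commit message.

```rust
/// Returns true only for the spellings listed in the commit message:
/// 1, true, yes, on (case-insensitive, surrounding whitespace ignored).
fn parse_truthy_env(raw: &str) -> bool {
    matches!(
        raw.trim().to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Hypothetical wrapper: reads the override and treats "unset" as false.
fn use_quadlet_backends_from_env() -> bool {
    std::env::var("ARCHIPELAGO_USE_QUADLET_BACKENDS")
        .map(|value| parse_truthy_env(&value))
        .unwrap_or(false)
}
```

Separating the string parsing from the environment read keeps the helper pure, which is what makes it easy to unit-test and reuse for other flags.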
archipelago	9a89a000d4	test(lifecycle): post-condition gate for use_quadlet_backends path A six-test bats suite that validates what install_via_quadlet (Phase 3.2) is supposed to leave behind: * `.container` unit on disk in $XDG_CONFIG_HOME/containers/systemd/ with [Container] / [Service] / [Install] sections, Image= present, and Restart=on-failure (the backend invariant — companions use Always) * Phase 3.4 cross-check: any unit with HealthCmd= must also emit Notify=healthy, otherwise systemctl start won't gate on health * `systemctl --user is-active` returns 0 for the .service * podman shows the container running * the container's cgroup is under user.slice/, NOT under archipelago.service — the kernel-level proof that FM3 cgroup cascade SIGKILL is structurally fixed for this container Auto-skips on every test when no backend Quadlet units exist (today's default state, use_quadlet_backends=false) — so the suite is a no-op on current fleet boxes and turns into a hard regression gate the moment anyone flips the flag and reinstalls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:34:47 -04:00
archipelago	97ce23d773	feat(quadlet): Phase 3.4 — health-gated startup via Notify=healthy QuadletUnit gains an optional HealthSpec; from_manifest translates the manifest's health_check (tcp/http/cmd) into a HealthCmd= directive and emits Notify=healthy alongside it. systemctl start <unit>.service then blocks until the container's first green probe — eliminating the "container up but RPC not ready" race the orchestrator currently papers over with post-start polling. Translation policy: * tcp, endpoint "host:port" -> nc -z host port * http, endpoint "host:port", path -> curl -fsS -m 5 http://endpoint<path> * cmd, endpoint "<shell command>" -> verbatim * unknown type / malformed endpoint -> None (skip Notify=healthy rather than emit a HealthCmd that hangs the unit start forever) Companion units leave health: None and remain byte-identical to before this PR — the renderer only emits the Health* / Notify= block when set. +4 quadlet unit tests (19 total). Dropped a never-used test setter that was generating a dead_code warning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:21:57 -04:00
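The translation policy listed in commit 97ce23d773 maps each health-check type to a `HealthCmd=` string, or to nothing when the input is unusable. A minimal sketch under stated assumptions: the `HealthCheck` enum and both function names are hypothetical, and stripping an existing `http://` prefix is a guess at how double-prefixing is avoided.

```rust
/// Hypothetical model of the manifest's health_check variants.
enum HealthCheck<'a> {
    Tcp { endpoint: &'a str },
    Http { endpoint: &'a str, path: &'a str },
    Cmd { endpoint: &'a str },
    Unknown,
}

/// Splits "host:port", rejecting an empty host or a non-numeric port.
fn split_host_port(endpoint: &str) -> Option<(&str, u16)> {
    let (host, port) = endpoint.rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    Some((host, port.parse().ok()?))
}

/// Returns the HealthCmd string, or None when no safe command can be built.
/// The caller would then also skip Notify=healthy.
fn translate_health_check(check: HealthCheck<'_>) -> Option<String> {
    match check {
        HealthCheck::Tcp { endpoint } => {
            let (host, port) = split_host_port(endpoint)?;
            Some(format!("nc -z {host} {port}"))
        }
        HealthCheck::Http { endpoint, path } => {
            // Assumption: tolerate an endpoint that already carries a scheme.
            let bare = endpoint.strip_prefix("http://").unwrap_or(endpoint);
            split_host_port(bare)?;
            Some(format!("curl -fsS -m 5 http://{bare}{path}"))
        }
        HealthCheck::Cmd { endpoint } => {
            // Passed through verbatim unless the command is blank.
            if endpoint.trim().is_empty() {
                None
            } else {
                Some(endpoint.to_string())
            }
        }
        HealthCheck::Unknown => None,
    }
}
```

Returning `None` for malformed input matches the rationale in the commit: skipping the health gate is safer than emitting a probe that blocks the unit start forever.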
archipelago	65576bd755	feat(orchestrator): Phase 3.3 — in-place migration to Quadlet When use_quadlet_backends flips from off → on, existing fleet boxes have backend containers parented under archipelago.service's cgroup (the bad shape that triggers FM3 cascade SIGKILL on every archipelago restart). ensure_running now notices and corrects this: * If there's already a `<name>.container` unit on disk → no-op (subsequent reconcile ticks take this fast path). * Else if a podman container with that name exists → it's a pre-3.3 artifact. Stop+remove it (volumes survive — bind mounts are not touched by `podman rm`), then write the Quadlet unit, daemon-reload, and start the new managed service. * Else → fall through to install_fresh, which already routes through install_via_quadlet when the flag is on. The migration is idempotent and self-healing: if a fleet box is half-migrated (unit on disk but no service active, or service active but stale unit), the next reconcile tick converges. Bitcoin chain data, lnd wallet state, and electrumx index all live on host bind mounts and are unaffected by the container-record swap. Volume safety audited per backend in `uses_orchestrator_install_flow` allowlist — every entry mounts its data dir as a host bind mount. Default still off. To migrate a node: /etc/archipelago/config.toml: use_quadlet_backends = true followed by `systemctl restart archipelago` — the next reconcile tick walks every managed app and migrates each in turn. Tests: 624 passing, 0 cargo warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:27:59 -04:00
archipelago	5b2e02bd43	feat(orchestrator): Phase 3.2 — wire Quadlet path behind feature flag prod_orchestrator::install_fresh now branches on the new Config::use_quadlet_backends flag (default false): * off (today's production behavior) — unchanged: runtime.create_container + start_container, container parented under archipelago.service's cgroup, FM3 cascade SIGKILL on every archipelago restart. * on — install_via_quadlet renders the manifest as a Quadlet unit via QuadletUnit::from_manifest, writes it atomically into ~/.config/containers/systemd/, calls daemon-reload, and starts the generated <name>.service. Container ends up under user.slice — no more cgroup parented under archipelago, so archipelago restarts don't touch the container's lifetime. Default off so this commit is structurally safe to ship: nothing changes at runtime until an operator opts in. Flip the default once tests/lifecycle/run-20x.sh has gone green against the new path on .228 + .198 (the v1.7.52 release gate). Plumbing: * config.rs — `use_quadlet_backends: bool` w/ Default false * prod_orchestrator.rs — flag stored on the struct, threaded through new(), with set_use_quadlet_backends(bool) test setter * prod_orchestrator.rs — install_via_quadlet helper * dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest / parse_memory_mib / RestartPolicy::OnFailure now that the call path exists; if a future revert removes the wiring, the warnings come back. Tests: 624 passing, cargo check clean (0 warnings). Existing companion behavior unaffected — render_skips_backend_directives_when_default still passes byte-equal to before quadlet.rs grew the new fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:22:10 -04:00
archipelago	9becafafd3	feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52) The QuadletUnit struct now covers everything a backend manifest needs (ports, environment, devices, add_hosts, entrypoint+command, read-only root, no_new_privileges, cpu_quota, restart policy choice). Adds QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB forms. The renderer skips empty/false directives so existing companion units render byte-identically — no behavior change for shipping companions; the backend renderer is dead code until Phase 3.2 wires it into the orchestrator. Eight new unit tests cover: * parse_memory_mib forms (1024, 512m, 2g, garbage) * shell_join quoting (whitespace, embedded quotes) * RestartPolicy → systemd string mapping * render emits backend directives when set * render skips them when defaulted (companion regression gate) * from_manifest happy path on a bitcoin-knots-shaped manifest * from_manifest read-only volume detection * from_manifest tmpfs filtering * end-to-end manifest → render bytes assertion Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was implicitly covered before but is now explicit). Cargo warnings: 0. `from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are marked allow(dead_code) with explicit references to Phase 3.2 — if 3.2 doesn't wire them, the dead-code warning resurfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:09:50 -04:00
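For the "1g"/"512m"/raw-MiB forms that 9becafafd3 mentions, a parser along the following lines would cover the listed test cases (1024, 512m, 2g, garbage). The signature and the handling of case and whitespace are assumptions; the actual parse_memory_mib in quadlet.rs is not shown in this log.

```rust
/// Hypothetical sketch: parse a memory limit into MiB.
/// Accepts "<n>g" (GiB), "<n>m" (MiB), or a bare number taken as MiB.
/// Returns None for anything else, including overflow.
pub fn parse_memory_mib(input: &str) -> Option<u64> {
    // Assumption: suffixes are case-insensitive and surrounding whitespace is ignored.
    let s = input.trim().to_ascii_lowercase();

    let (digits, multiplier): (&str, u64) = if let Some(n) = s.strip_suffix('g') {
        (n, 1024)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 1)
    } else {
        (s.as_str(), 1)
    };

    // An empty or non-numeric digit string fails to parse and yields None.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```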
archipelago	5074572373	test(lifecycle): add btcpay + fedimint + mempool suites Brings L1 (RPC API) + L3 (lifecycle survival) parity coverage to the three multi-app stacks that were previously only touched by required-stack.bats. Combined with bitcoin-knots / lnd / electrumx already shipping, the six core apps now have dedicated bats files. Each suite is shaped like the existing single-container suites (bitcoin-knots / lnd / electrumx) and gates every assertion on the backing container actually being present, so a node without the stack installed gets clean skip messages instead of false fails. * btcpay.bats — 9 tests, including stack-wide presence and a "supporting containers don't cascade-restart" guard * fedimint.bats — 8 tests, single container * mempool.bats — 9 tests, mixed legacy + orchestrator-managed stack; reuses the :8999 mempool-api probe from required-stack for parity Total bats now: 88 (was 53 → +35). TESTING.md matrix advances 23 → 50 of 110 cells. UI URL coverage for these three apps already lives in ui-coverage.bats, so this PR doesn't duplicate proxy-path probes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 16:55:31 -04:00

58 Commits