lfg2025/archy

Author	SHA1	Message	Date
archipelago	3214d6aff3	fix(lnd): self-heal unrecoverable locked wallet via wipe+recreate When an existing LND wallet is locked and none of the candidate passwords (per-node secret, legacy constant) open it, the node can never auto-unlock unattended. unlock_existing_wallet now returns Ok(false) for "all candidates actively rejected" (vs Err for transient "LND not ready"), and ensure_wallet_initialized responds by recreating the wallet: - mark the lnd container user-stopped so the health monitor won't re-launch it (and re-open the wallet) mid-wipe, - stop lnd, delete its wallet/chain/graph state as root, - start lnd, wait for NON_EXISTING, re-init a fresh wallet on the per-node secret, then clear the user-stopped flag. LND runs as a plain bridge-network podman container (not a Quadlet unit), so it is restarted via `systemd-run --user --scope podman`, matching the orchestrator/health-monitor path. Alpha nodes hold no funds and a wallet locked with an unknown password is already inaccessible, so the wipe loses nothing reachable. Completes the forward fix from 91adc281 for nodes whose wallet pre-dates the per-node secret and whose password is unrecorded (e.g. .116/.228). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 14:08:33 -04:00
archipelago	91adc281ca	fix(lnd): per-node wallet password + locked-wallet self-heal on login Replaces the fleet-wide hardcoded WALLET_PASSWORD='hellohello' that left wallets LOCKED after OTA/reboot (auto-unlock used the wrong password fleet-wide). Forward fix (both init paths unified, validated cargo check + LND REST mechanics on a scratch wallet): - Per-node random 256-bit secret in secrets/lnd-wallet-password (0600), mirroring secrets/bitcoin-rpc-password. read_wallet_password (no-gen) vs ensure_wallet_password (gen at init only). - container/lnd.rs init AND api/rpc/lnd/wallet.rs seed-derived init both use the per-node secret (wallet.rs keeps recoverable derived entropy; password unified). - Unlock tries [per-node secret, legacy 'hellohello']; single-attempt primitive distinguishes invalid-passphrase (fail fast, try next) from not-ready (retry), so a wrong password no longer hangs the boot path ~60s. Migration (candidate-unlock + rotate, best-effort at login): - change_wallet_password (WalletUnlocker.ChangePassword) + migrate_locked_wallet: if LOCKED, try candidates as current pw and ChangePassword onto the per-node secret so future boots auto-unlock. Hooked into auth.login (non-blocking) with the just-verified password as the candidate. NOT YET: seed-recovery fallback for wallets where no candidate matches (e.g. .116/.228) — destructive, needs entropy-source/funds-safety handling; next pass. NOT shipped: pending end-to-end validation on a real node. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 11:19:56 -04:00
archipelago	a483fe4baa	fix: derive launch port from URL authority, not naive rsplit reachable_lan_address() parsed the launch port with url.rsplit(':') which yields "8096/" for manifest interfaces.main URLs that carry a path (http://localhost:8096/). That fails to parse and silently drops a perfectly reachable launch URL, so apps like jellyfin, btcpay-server, fedimint, gitea, nextcloud and portainer showed running with no launch link in the UI. New launch_url_port() reads digits after the final colon (mirroring port_from_url in the RPC layer) and tolerates a trailing path. Adds regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:35:19 -04:00
archipelago	0ed892a412	fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha (all nodes were already on the latest build — these were live bugs, not stale builds), plus the missing ElectrumX tile, and adds automated coverage so each can't regress silently. Receive address (".116 receive fails", ".228 false 'wallet is locked'"): - LND publishes its REST API on a host port that can drift from the manifest (a container created when the mapping was 8080 kept publishing 8080 after the manifest moved to 18080). The in-process client connects to the manifest port, gets connection-refused, and wallet init fails forever while the container looks "Up". Add published-port drift detection to the reconciler (container_ports_drifted / host_port_bindings_drifted) that recreates a drifted backend even for restart-sensitive apps — a drifted container is already broken, so leaving it "untouched" only perpetuates the failure. - Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED, WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they survive the RPC error sanitizer instead of collapsing to the generic "Operation failed". The UI maps the code instead of guessing wallet state from substrings — so an unreachable REST endpoint is no longer mislabelled "locked". Bitcoin install (".198 bitcoin gone / reinstall just stops"): - bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate it idempotently before bitcoin starts (ensure_app_secrets, reusing ensure_txrelay_credentials), and name the missing secret in the error so a genuine gap is actionable instead of a bare "IO error". ElectrumX app tile missing on every node with it installed: - The catalog generator dropped electrumx because the manifest had no interfaces.main block, so the tile had no launch URL and was hidden. Declare the companion UI port (50002) in the manifest, regenerate the catalog, and let an app with a known launch URL stay launchable while its backend is still "starting" (ElectrumX indexes for 10m+). Test harness: - New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness (validated live; port-drift catches the real .116 drift). - Rust unit tests for drift detection, the receive reason-code classifier, and the named-missing-secret error; vitest for the UI code mapping. - create-release.sh now runs tests/release/run.sh and aborts the release on failure — previously it ran no tests at all. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:12:56 -04:00
archipelago	c800293f1f	fix: bitcoin receive, AIUI pointer input, electrs self-heal, OTA timeout - LND wallet: request correct address type so receive-address generation no longer 400s - AIUI/app session: on-screen pointer can click + type into app content (incl. app store search); "open in new tab" opens the phone browser; mobile credential modal centered instead of full-height (remote-relay.ts, AppSession.vue, AppSessionFrame.vue, AppIconGrid.vue, openExternal.ts, WebViewScreen.kt) + remote-relay tests - health_monitor: electrs auto-recovers from a corrupt index and shows a percent/block-height progress screen while reindexing (useElectrsSync.ts) - update.rs: drop retired tx1138 secondary mirror (one-time migration); longer download timeout for slow connections - CHANGELOG: v1.7.90-alpha notes - tests/release/run.sh: harness tweaks Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 04:49:32 -04:00
archipelago	c49e8fcacd	fix: harden OTA updates, AIUI desktop gap, LND no-proxy - update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect error (nginx binds :80, not :443) so good updates are no longer rolled back; recover stuck update_in_progress; avoid ETXTBSY on running binary - LND: REST client bypasses proxy, GET newaddress p2wkh, wallet readiness/unlock after restart - Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix) - vite.config.ts: dev-only /aiui proxy - tests/release/run.sh: release gate harness (static+frontend+backend) - CHANGELOG: v1.7.89-alpha notes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 01:23:32 -04:00
archipelago	b8ac68d844	fix: restore aiui and bitcoin receive before release	2026-06-12 05:10:03 -04:00
archipelago	8d4b309753	fix: patch bitcoin receive and full-screen launch overlays	2026-06-12 04:42:23 -04:00
archipelago	d6f108d818	chore: snapshot release workspace	2026-06-12 03:00:15 -04:00
archipelago	6a30ff11bd	chore: release v1.7.84-alpha	2026-06-11 04:44:58 -04:00
archipelago	f818f1dcc1	app-platform: remove unsupported saleor release surface	2026-06-11 01:16:21 -04:00
archipelago	c393b96da3	backend: harden rootless app lifecycle orchestration	2026-06-11 00:24:32 -04:00
archipelago	626a89bdbc	fix(apps): proxy saleor storefront media	2026-05-22 17:08:03 -04:00
archipelago	a578834462	fix(apps): repair saleor storefront startup	2026-05-21 21:33:51 -04:00
archipelago	8eb03d106e	fix(apps): repair saleor storefront graphql origin	2026-05-21 00:30:22 -04:00
archipelago	34c4e87d14	feat(apps): add saleor storefront	2026-05-20 23:02:57 -04:00
archipelago	cc1f8fba72	fix(apps): stabilize saleor and netbird release paths	2026-05-20 20:38:52 -04:00
archipelago	f4368785f0	fix(apps): unblock saleor and netbird first-use flows	2026-05-20 00:28:30 -04:00
archipelago	92c58141af	fix(apps): stabilize saleor and netbird launch	2026-05-19 21:45:17 -04:00
archipelago	522c046525	feat(apps): add saleor and harden netbird repair	2026-05-19 20:11:22 -04:00
archipelago	bd69ef41d5	fix(apps): repair netbird login and iframe focus	2026-05-19 19:21:43 -04:00
archipelago	1836b035b4	fix(mobile): improve app store search and launches	2026-05-19 18:29:04 -04:00
archipelago	f0bd49d03d	fix(apps): repair netbird install and app icons	2026-05-19 17:20:32 -04:00
archipelago	ab96c97cb9	fix(apps): self-host netbird and stabilize app sessions	2026-05-19 16:02:35 -04:00
archipelago	87be717f40	fix(apps): keep slow installs visible	2026-05-19 14:29:20 -04:00
archipelago	d736364ad7	fix(apps): stabilize btcpay and public proxy launch flows	2026-05-19 09:26:43 -04:00
archipelago	19dbf60f03	fix(apps): detect stale npm created containers	2026-05-18 10:04:22 -04:00
archipelago	7104ba0cbf	fix(apps): repair orchestrator starts before launch	2026-05-18 09:20:12 -04:00
archipelago	b701e125b4	fix(update): relax apply rate limit	2026-05-17 23:15:07 -04:00
archipelago	19f2125a4d	fix(apps): repair stale nginx proxy manager ports	2026-05-17 22:38:04 -04:00
archipelago	a992abcd06	chore: release v1.7.61-alpha	2026-05-17 22:13:21 -04:00
archipelago	4d6b4f76af	chore: release v1.7.60-alpha	2026-05-17 20:45:56 -04:00
archipelago	0a94c0097f	chore: release v1.7.59-alpha	2026-05-17 19:44:54 -04:00
archipelago	413d50116e	fix(apps): restore mobile and website launching	2026-05-17 19:22:18 -04:00
archipelago	e05e356d64	chore: release v1.7.58-alpha	2026-05-17 18:40:50 -04:00
archipelago	cfb304a001	feat(mesh): add meshtastic serial radio support	2026-05-17 18:07:40 -04:00
archipelago	7804223152	chore: release v1.7.57-alpha	2026-05-17 17:30:04 -04:00
archipelago	01ec0565a6	fix: restore wifi setup and ssh password updates	2026-05-15 18:15:06 -04:00
Dorian	b8053c00ca	fix: clear stale health notifications	2026-05-14 08:57:54 -04:00
Dorian	f95e9a1cd0	fix: quote quadlet environment values	2026-05-14 01:15:22 -04:00
Dorian	be50dc3235	fix: avoid bootstrap bitcoin restarts	2026-05-14 00:03:16 -04:00
Dorian	2ff47f88a7	fix: harden container reconcile and launch behavior	2026-05-13 22:59:55 -04:00
Dorian	835c525218	chore(release): stage v1.7.55-alpha	2026-05-13 15:09:22 -04:00
archipelago	c0751e2551	chore(release): stage v1.7.54-alpha	2026-05-06 09:23:57 -04:00
archipelago	1a0d8a432c	chore(release): stage v1.7.53-alpha	2026-05-05 13:59:50 -04:00
archipelago	745cb1c626	chore(release): stage v1.7.52-alpha	2026-05-05 11:29:18 -04:00
archipelago	aad0ba5234	feat(orchestrator): drift-sync existing Quadlet units on each reconcile When a Quadlet unit file already exists for an orchestrator-managed backend, sync its on-disk bytes against what the current renderer produces. write_if_changed makes this idempotent — when bytes match, no IO; when they differ (post-deploy of a renderer change), the file is rewritten and systemctl --user daemon-reload runs once. We deliberately do NOT restart the .service when the file changes: running containers keep their current config until the operator restarts them. That's the right tradeoff — file updates are cheap and non-destructive; service restarts are the SIGKILL cascade we're trying to eliminate. Why this matters: pre-this-commit, every renderer change required a fresh package.install RPC per app to take effect. Observed live on .228 2026-05-02 — the TimeoutStartSec=600 fix shipped in code but existing units stayed on the old format because nothing triggered a re-render. Combined with state.json being empty (so the reconciler's auto-install path didn't fire either), the fix was invisible until manual unit deletion. Companions (UI_APP_IDS) are skipped — companion.rs renders those units with a different shape; syncing here would clobber them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:43:18 -04:00
archipelago	281e65e697	fix(quadlet): TimeoutStartSec=600 when Notify=healthy is set Bug surfaced live on .228 2026-05-02 — every backend Quadlet unit (lnd, electrumx, fedimint, btcpay-server, mempool-api, bitcoin-knots) hit systemd's default 90s start timeout because Notify=healthy makes systemctl wait for the first green health probe, but HealthInterval=30s × HealthRetries=3 = 90s minimum even on a healthy service. Race: timeout fires the moment the third probe MIGHT succeed. Result was three different post-states (inactive+running, failed+missing, inactive+stopped) depending on whether systemd's ExecStopPost ran podman rm before the orchestrator's adoption logic re-grabbed the container. Fix: when health is set, render TimeoutStartSec=600 (10 minutes) into [Service]. Long enough for slow-starting backends (electrumx index replay, lnd wallet unlock) without being so long that a truly stuck unit hangs forever. Companions stay unchanged (no health → no override, default 90s applies). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:14:48 -04:00
archipelago	384f12de7a	fix(quadlet): http:// double-prefix + companion migration race Two bugs surfaced by the first real-node validation of Phase 3.2-3.4 on .228 (2026-05-02), both caught before flipping the default. Bug 1 — translate_health_check double-prefixed http://. Manifests in the wild carry the scheme inside the endpoint string ("http://localhost:8175"), and we were prepending another http:// unconditionally. Result on .228: every backend HealthCmd read `curl -fsS -m 5 http://http://localhost...`, every probe failed, fedimint hit a 14-restart loop. Now we accept either form and skip appending hc.path when the endpoint already carries one. Regression test asserts no double-prefix and that an in-endpoint path is honoured. Bug 2 — Phase 3.3 migration ran for UI companions (bitcoin-ui / electrs-ui / lnd-ui) that have shipped via Quadlet since v1.7.41. Migration tore down the running companion + raced companion.rs render, producing "Phase 3.3: re-install archy-bitcoin-ui via Quadlet" reconcile errors and leaving archy-bitcoin-ui down. Companions now short-circuit out of migrate_to_quadlet_if_needed before any IO. Also: when try_exists returns Err for an unrelated reason (permissions, EIO), we now skip migration instead of treating "I can't tell" as "go ahead and migrate" — migrating on top of a possibly-existing unit is destructive. What this does not fix yet: * the orchestrator's reconciler iterating every manifest in /opt/archipelago/apps/, not just installed apps. Pre-existing behavior (also affects the legacy path) — separate scope. * fedimint /data UID mismatch surfaced when Quadlet started fedimint fresh. Likely orthogonal — defer. * no rollback when install_via_quadlet fails after a remove_container. Tracked as Phase 3.3.1 — defer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 06:37:37 -04:00
archipelago	bd96c0475d	feat(config): ARCHIPELAGO_USE_QUADLET_BACKENDS env override Adds an env-var lever for Phase 3.2's use_quadlet_backends flag so the 20× harness can flip the path on per-node without a config.json edit (which would require an archipelago.service restart — and that triggers FM3 cgroup cascade until Phase 3.5 ships, so we can't ask anyone to reconfigure live nodes that way today). Truthy parsing centralised in `parse_truthy_env` (1, true, yes, on — case-insensitive, whitespace-trimmed). Anything else is false. The helper is unit-tested so future env-var flags can reuse the same shape. Also adds a default-off regression test for use_quadlet_backends so flipping the default ahead of the 20× verification fires immediately. TESTING.md documents the Environment= snippet for the systemd drop-in so the next operator can flip the flag on a debug node without re-deriving the recipe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:44:09 -04:00
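
The truthy-parsing rule described in bd96c0475d (1, true, yes, on; case-insensitive, whitespace-trimmed; anything else is false) can be sketched roughly as below. This is an illustration under assumptions, not the repository's code: the commit names a helper `parse_truthy_env` but does not show its signature, so `parse_truthy` and `env_flag` here are hypothetical names.

```rust
use std::env;

/// Hypothetical helper: returns true only for the spellings listed in the
/// commit message (1, true, yes, on), ignoring case and surrounding whitespace.
fn parse_truthy(raw: &str) -> bool {
    matches!(
        raw.trim().to_ascii_lowercase().as_str(),
        "1" | "true" | "yes" | "on"
    )
}

/// Hypothetical wrapper: reads an environment variable and treats an unset
/// or non-UTF-8 value as false, so the flag stays off by default.
fn env_flag(name: &str) -> bool {
    env::var(name).map(|v| parse_truthy(&v)).unwrap_or(false)
}
```

With this shape, a call such as `env_flag("ARCHIPELAGO_USE_QUADLET_BACKENDS")` would stay false unless the variable is explicitly set to one of the four accepted values, which matches the default-off behaviour the commit describes.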

452 Commits