lfg2025/archy

Author	SHA1	Message	Date
archipelago	3214d6aff3	fix(lnd): self-heal unrecoverable locked wallet via wipe+recreate When an existing LND wallet is locked and none of the candidate passwords (per-node secret, legacy constant) open it, the node can never auto-unlock unattended. unlock_existing_wallet now returns Ok(false) for "all candidates actively rejected" (vs Err for transient "LND not ready"), and ensure_wallet_initialized responds by recreating the wallet: - mark the lnd container user-stopped so the health monitor won't re-launch it (and re-open the wallet) mid-wipe, - stop lnd, delete its wallet/chain/graph state as root, - start lnd, wait for NON_EXISTING, re-init a fresh wallet on the per-node secret, then clear the user-stopped flag. LND runs as a plain bridge-network podman container (not a Quadlet unit), so it is restarted via `systemd-run --user --scope podman`, matching the orchestrator/health-monitor path. Alpha nodes hold no funds and a wallet locked with an unknown password is already inaccessible, so the wipe loses nothing reachable. Completes the forward fix from 91adc281 for nodes whose wallet pre-dates the per-node secret and whose password is unrecorded (e.g. .116/.228). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 14:08:33 -04:00
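The three-way outcome described in 3214d6aff3 (unlocked, every candidate actively rejected, transient "not ready") can be sketched roughly as below. This is an illustration only: the `Attempt` enum, the closure parameter, and the `String` error type are assumptions made for the sketch, not the repository's actual signatures.

```rust
/// Hypothetical result of a single unlock attempt against LND.
#[derive(Debug, PartialEq)]
enum Attempt {
    Unlocked,
    InvalidPassphrase,
    NotReady,
}

/// Sketch of the contract from the commit message:
/// - Ok(true)  -> one candidate opened the wallet
/// - Ok(false) -> every candidate was actively rejected (caller may wipe + recreate)
/// - Err(_)    -> transient condition, caller should retry later
fn unlock_existing_wallet<F>(candidates: &[&str], mut try_unlock: F) -> Result<bool, String>
where
    F: FnMut(&str) -> Attempt,
{
    for &password in candidates {
        match try_unlock(password) {
            Attempt::Unlocked => return Ok(true),
            // Wrong password: fail fast and move on to the next candidate.
            Attempt::InvalidPassphrase => continue,
            // Not a verdict on the password, so do not treat it as a rejection.
            Attempt::NotReady => return Err("LND not ready".to_string()),
        }
    }
    Ok(false)
}
```

Keeping "rejected" and "not ready" apart is what would let a caller reserve the destructive wipe for the `Ok(false)` case only.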
archipelago	91adc281ca	fix(lnd): per-node wallet password + locked-wallet self-heal on login Replaces the fleet-wide hardcoded WALLET_PASSWORD='hellohello' that left wallets LOCKED after OTA/reboot (auto-unlock used the wrong password fleet-wide). Forward fix (both init paths unified, validated cargo check + LND REST mechanics on a scratch wallet): - Per-node random 256-bit secret in secrets/lnd-wallet-password (0600), mirroring secrets/bitcoin-rpc-password. read_wallet_password (no-gen) vs ensure_wallet_password (gen at init only). - container/lnd.rs init AND api/rpc/lnd/wallet.rs seed-derived init both use the per-node secret (wallet.rs keeps recoverable derived entropy; password unified). - Unlock tries [per-node secret, legacy 'hellohello']; single-attempt primitive distinguishes invalid-passphrase (fail fast, try next) from not-ready (retry), so a wrong password no longer hangs the boot path ~60s. Migration (candidate-unlock + rotate, best-effort at login): - change_wallet_password (WalletUnlocker.ChangePassword) + migrate_locked_wallet: if LOCKED, try candidates as current pw and ChangePassword onto the per-node secret so future boots auto-unlock. Hooked into auth.login (non-blocking) with the just-verified password as the candidate. NOT YET: seed-recovery fallback for wallets where no candidate matches (e.g. .116/.228) — destructive, needs entropy-source/funds-safety handling; next pass. NOT shipped: pending end-to-end validation on a real node. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 11:19:56 -04:00
archipelago	a483fe4baa	fix: derive launch port from URL authority, not naive rsplit reachable_lan_address() parsed the launch port with url.rsplit(':') which yields "8096/" for manifest interfaces.main URLs that carry a path (http://localhost:8096/). That fails to parse and silently drops a perfectly reachable launch URL, so apps like jellyfin, btcpay-server, fedimint, gitea, nextcloud and portainer showed running with no launch link in the UI. New launch_url_port() reads digits after the final colon (mirroring port_from_url in the RPC layer) and tolerates a trailing path. Adds regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:35:19 -04:00
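A minimal sketch of the parsing difference described in a483fe4baa, assuming a signature of `&str -> Option<u16>` (the real `launch_url_port` may differ). `naive_port` is a hypothetical stand-in for the old `rsplit(':')` behaviour.

```rust
/// Old behaviour (illustrative): take everything after the last ':' and parse it.
/// For "http://localhost:8096/" that is "8096/", which is not a valid u16.
fn naive_port(url: &str) -> Option<u16> {
    url.rsplit(':').next()?.parse().ok()
}

/// Sketch of the fix: read only the digits that follow the final colon,
/// so a trailing path such as "/" is tolerated.
fn launch_url_port(url: &str) -> Option<u16> {
    let (_, tail) = url.rsplit_once(':')?;
    let digits: String = tail.chars().take_while(|c| c.is_ascii_digit()).collect();
    if digits.is_empty() {
        // e.g. "http://localhost/" where the last colon belongs to the scheme.
        return None;
    }
    digits.parse().ok()
}
```

Like the approach the message describes, this sketch keys off the final colon, so a URL whose path itself contains a colon would need a real authority parser.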
archipelago	0ed892a412	fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha (all nodes were already on the latest build — these were live bugs, not stale builds), plus the missing ElectrumX tile, and adds automated coverage so each can't regress silently. Receive address (".116 receive fails", ".228 false 'wallet is locked'"): - LND publishes its REST API on a host port that can drift from the manifest (a container created when the mapping was 8080 kept publishing 8080 after the manifest moved to 18080). The in-process client connects to the manifest port, gets connection-refused, and wallet init fails forever while the container looks "Up". Add published-port drift detection to the reconciler (container_ports_drifted / host_port_bindings_drifted) that recreates a drifted backend even for restart-sensitive apps — a drifted container is already broken, so leaving it "untouched" only perpetuates the failure. - Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED, WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they survive the RPC error sanitizer instead of collapsing to the generic "Operation failed". The UI maps the code instead of guessing wallet state from substrings — so an unreachable REST endpoint is no longer mislabelled "locked". Bitcoin install (".198 bitcoin gone / reinstall just stops"): - bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate it idempotently before bitcoin starts (ensure_app_secrets, reusing ensure_txrelay_credentials), and name the missing secret in the error so a genuine gap is actionable instead of a bare "IO error". 
ElectrumX app tile missing on every node with it installed: - The catalog generator dropped electrumx because the manifest had no interfaces.main block, so the tile had no launch URL and was hidden. Declare the companion UI port (50002) in the manifest, regenerate the catalog, and let an app with a known launch URL stay launchable while its backend is still "starting" (ElectrumX indexes for 10m+). Test harness: - New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness (validated live; port-drift catches the real .116 drift). - Rust unit tests for drift detection, the receive reason-code classifier, and the named-missing-secret error; vitest for the UI code mapping. - create-release.sh now runs tests/release/run.sh and aborts the release on failure — previously it ran no tests at all. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-14 03:12:56 -04:00
archipelago	c49e8fcacd	fix: harden OTA updates, AIUI desktop gap, LND no-proxy - update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect error (nginx binds :80, not :443) so good updates are no longer rolled back; recover stuck update_in_progress; avoid ETXTBSY on running binary - LND: REST client bypasses proxy, GET newaddress p2wkh, wallet readiness/unlock after restart - Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix) - vite.config.ts: dev-only /aiui proxy - tests/release/run.sh: release gate harness (static+frontend+backend) - CHANGELOG: v1.7.89-alpha notes Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-13 01:23:32 -04:00
archipelago	d6f108d818	chore: snapshot release workspace	2026-06-12 03:00:15 -04:00
archipelago	6a30ff11bd	chore: release v1.7.84-alpha	2026-06-11 04:44:58 -04:00
archipelago	f818f1dcc1	app-platform: remove unsupported saleor release surface	2026-06-11 01:16:21 -04:00
archipelago	c393b96da3	backend: harden rootless app lifecycle orchestration	2026-06-11 00:24:32 -04:00
archipelago	34c4e87d14	feat(apps): add saleor storefront	2026-05-20 23:02:57 -04:00
archipelago	f4368785f0	fix(apps): unblock saleor and netbird first-use flows	2026-05-20 00:28:30 -04:00
archipelago	92c58141af	fix(apps): stabilize saleor and netbird launch	2026-05-19 21:45:17 -04:00
archipelago	522c046525	feat(apps): add saleor and harden netbird repair	2026-05-19 20:11:22 -04:00
archipelago	bd69ef41d5	fix(apps): repair netbird login and iframe focus	2026-05-19 19:21:43 -04:00
archipelago	f0bd49d03d	fix(apps): repair netbird install and app icons	2026-05-19 17:20:32 -04:00
archipelago	ab96c97cb9	fix(apps): self-host netbird and stabilize app sessions	2026-05-19 16:02:35 -04:00
archipelago	87be717f40	fix(apps): keep slow installs visible	2026-05-19 14:29:20 -04:00
archipelago	413d50116e	fix(apps): restore mobile and website launching	2026-05-17 19:22:18 -04:00
archipelago	7804223152	chore: release v1.7.57-alpha	2026-05-17 17:30:04 -04:00
Dorian	b8053c00ca	fix: clear stale health notifications	2026-05-14 08:57:54 -04:00
Dorian	f95e9a1cd0	fix: quote quadlet environment values	2026-05-14 01:15:22 -04:00
Dorian	2ff47f88a7	fix: harden container reconcile and launch behavior	2026-05-13 22:59:55 -04:00
Dorian	835c525218	chore(release): stage v1.7.55-alpha	2026-05-13 15:09:22 -04:00
archipelago	c0751e2551	chore(release): stage v1.7.54-alpha	2026-05-06 09:23:57 -04:00
archipelago	745cb1c626	chore(release): stage v1.7.52-alpha	2026-05-05 11:29:18 -04:00
archipelago	aad0ba5234	feat(orchestrator): drift-sync existing Quadlet units on each reconcile When a Quadlet unit file already exists for an orchestrator-managed backend, sync its on-disk bytes against what the current renderer produces. write_if_changed makes this idempotent — when bytes match, no IO; when they differ (post-deploy of a renderer change), the file is rewritten and systemctl --user daemon-reload runs once. We deliberately do NOT restart the .service when the file changes: running containers keep their current config until the operator restarts them. That's the right tradeoff — file updates are cheap and non-destructive; service restarts are the SIGKILL cascade we're trying to eliminate. Why this matters: pre-this-commit, every renderer change required a fresh package.install RPC per app to take effect. Observed live on .228 2026-05-02 — the TimeoutStartSec=600 fix shipped in code but existing units stayed on the old format because nothing triggered a re-render. Combined with state.json being empty (so the reconciler's auto-install path didn't fire either), the fix was invisible until manual unit deletion. Companions (UI_APP_IDS) are skipped — companion.rs renders those units with a different shape; syncing here would clobber them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 11:43:18 -04:00
archipelago	281e65e697	fix(quadlet): TimeoutStartSec=600 when Notify=healthy is set Bug surfaced live on .228 2026-05-02 — every backend Quadlet unit (lnd, electrumx, fedimint, btcpay-server, mempool-api, bitcoin-knots) hit systemd's default 90s start timeout because Notify=healthy makes systemctl wait for the first green health probe, but HealthInterval=30s × HealthRetries=3 = 90s minimum even on a healthy service. Race: timeout fires the moment the third probe MIGHT succeed. Result was three different post-states (inactive+running, failed+missing, inactive+stopped) depending on whether systemd's ExecStopPost ran podman rm before the orchestrator's adoption logic re-grabbed the container. Fix: when health is set, render TimeoutStartSec=600 (10 minutes) into [Service]. Long enough for slow-starting backends (electrumx index replay, lnd wallet unlock) without being so long that a truly stuck unit hangs forever. Companions stay unchanged (no health → no override, default 90s applies). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 07:14:48 -04:00
archipelago	384f12de7a	fix(quadlet): http:// double-prefix + companion migration race Two bugs surfaced by the first real-node validation of Phase 3.2-3.4 on .228 (2026-05-02), both caught before flipping the default. Bug 1 — translate_health_check double-prefixed http://. Manifests in the wild carry the scheme inside the endpoint string ("http://localhost:8175"), and we were prepending another http:// unconditionally. Result on .228: every backend HealthCmd read `curl -fsS -m 5 http://http://localhost...`, every probe failed, fedimint hit a 14-restart loop. Now we accept either form and skip appending hc.path when the endpoint already carries one. Regression test asserts no double-prefix and that an in-endpoint path is honoured. Bug 2 — Phase 3.3 migration ran for UI companions (bitcoin-ui / electrs-ui / lnd-ui) that have shipped via Quadlet since v1.7.41. Migration tore down the running companion + raced companion.rs render, producing "Phase 3.3: re-install archy-bitcoin-ui via Quadlet" reconcile errors and leaving archy-bitcoin-ui down. Companions now short-circuit out of migrate_to_quadlet_if_needed before any IO. Also: when try_exists returns Err for an unrelated reason (permissions, EIO), we now skip migration instead of treating "I can't tell" as "go ahead and migrate" — migrating on top of a possibly-existing unit is destructive. What this does not fix yet: * the orchestrator's reconciler iterating every manifest in /opt/archipelago/apps/, not just installed apps. Pre-existing behavior (also affects the legacy path) — separate scope. * fedimint /data UID mismatch surfaced when Quadlet started fedimint fresh. Likely orthogonal — defer. * no rollback when install_via_quadlet fails after a remove_container. Tracked as Phase 3.3.1 — defer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 06:37:37 -04:00
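Bug 1 in 384f12de7a can be illustrated with a small URL builder that accepts either endpoint form. The function name `health_url` and its parameters are assumptions for this sketch; the real `translate_health_check` works on manifest types not shown here.

```rust
/// Build the probe URL from a manifest endpoint that may or may not carry a scheme.
/// Sketch only: prepends "http://" when no scheme is present and skips appending
/// `path` when the endpoint already contains one.
fn health_url(endpoint: &str, path: &str) -> String {
    let has_scheme = endpoint.starts_with("http://") || endpoint.starts_with("https://");
    let url = if has_scheme {
        endpoint.to_string()
    } else {
        format!("http://{}", endpoint)
    };
    // Anything after the authority counts as an in-endpoint path.
    let endpoint_has_path = match url.split_once("://") {
        Some((_, rest)) => rest.contains('/'),
        None => false,
    };
    if endpoint_has_path {
        url
    } else {
        format!("{}{}", url, path)
    }
}
```

In this sketch a bare trailing slash on the endpoint also counts as a path, which is a simplification and may not match the shipped behaviour.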
archipelago	97ce23d773	feat(quadlet): Phase 3.4 — health-gated startup via Notify=healthy QuadletUnit gains an optional HealthSpec; from_manifest translates the manifest's health_check (tcp/http/cmd) into a HealthCmd= directive and emits Notify=healthy alongside it. systemctl start <unit>.service then blocks until the container's first green probe — eliminating the "container up but RPC not ready" race the orchestrator currently papers over with post-start polling. Translation policy: * tcp, endpoint "host:port" -> nc -z host port * http, endpoint "host:port", path -> curl -fsS -m 5 http://endpoint<path> * cmd, endpoint "<shell command>" -> verbatim * unknown type / malformed endpoint -> None (skip Notify=healthy rather than emit a HealthCmd that hangs the unit start forever) Companion units leave health: None and remain byte-identical to before this PR — the renderer only emits the Health* / Notify= block when set. +4 quadlet unit tests (19 total). Dropped a never-used test setter that was generating a dead_code warning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-02 05:21:57 -04:00
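The translation policy listed in 97ce23d773 could look roughly like this. It is a sketch under stated assumptions: the string-typed `kind` parameter and the flat argument list are hypothetical, and the real code operates on `HealthSpec` and manifest structs.

```rust
/// Translate a manifest health check into a HealthCmd string.
/// Returns None for unknown types or malformed endpoints, so the caller can
/// skip Notify=healthy rather than emit a probe that never succeeds.
fn translate_health_check(kind: &str, endpoint: &str, path: &str) -> Option<String> {
    match kind {
        "tcp" => {
            // Expect "host:port".
            let (host, port) = endpoint.rsplit_once(':')?;
            let port: u16 = port.parse().ok()?;
            if host.is_empty() {
                return None;
            }
            Some(format!("nc -z {} {}", host, port))
        }
        "http" => {
            if endpoint.is_empty() {
                return None;
            }
            // Original Phase 3.4 policy: endpoint is "host:port" without a scheme.
            Some(format!("curl -fsS -m 5 http://{}{}", endpoint, path))
        }
        "cmd" => {
            if endpoint.trim().is_empty() {
                None
            } else {
                // Shell command is passed through verbatim.
                Some(endpoint.to_string())
            }
        }
        _ => None,
    }
}
```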
archipelago	65576bd755	feat(orchestrator): Phase 3.3 — in-place migration to Quadlet When use_quadlet_backends flips from off → on, existing fleet boxes have backend containers parented under archipelago.service's cgroup (the bad shape that triggers FM3 cascade SIGKILL on every archipelago restart). ensure_running now notices and corrects this: * If there's already a `<name>.container` unit on disk → no-op (subsequent reconcile ticks take this fast path). * Else if a podman container with that name exists → it's a pre-3.3 artifact. Stop+remove it (volumes survive — bind mounts are not touched by `podman rm`), then write the Quadlet unit, daemon-reload, and start the new managed service. * Else → fall through to install_fresh, which already routes through install_via_quadlet when the flag is on. The migration is idempotent and self-healing: if a fleet box is half-migrated (unit on disk but no service active, or service active but stale unit), the next reconcile tick converges. Bitcoin chain data, lnd wallet state, and electrumx index all live on host bind mounts and are unaffected by the container-record swap. Volume safety audited per backend in `uses_orchestrator_install_flow` allowlist — every entry mounts its data dir as a host bind mount. Default still off. To migrate a node: /etc/archipelago/config.toml: use_quadlet_backends = true followed by `systemctl restart archipelago` — the next reconcile tick walks every managed app and migrates each in turn. Tests: 624 passing, 0 cargo warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:27:59 -04:00
archipelago	5b2e02bd43	feat(orchestrator): Phase 3.2 — wire Quadlet path behind feature flag prod_orchestrator::install_fresh now branches on the new Config::use_quadlet_backends flag (default false): * off (today's production behavior) — unchanged: runtime.create_container + start_container, container parented under archipelago.service's cgroup, FM3 cascade SIGKILL on every archipelago restart. * on — install_via_quadlet renders the manifest as a Quadlet unit via QuadletUnit::from_manifest, writes it atomically into ~/.config/containers/systemd/, calls daemon-reload, and starts the generated <name>.service. Container ends up under user.slice — no more cgroup parented under archipelago, so archipelago restarts don't touch the container's lifetime. Default off so this commit is structurally safe to ship: nothing changes at runtime until an operator opts in. Flip the default once tests/lifecycle/run-20x.sh has gone green against the new path on .228 + .198 (the v1.7.52 release gate). Plumbing: * config.rs — `use_quadlet_backends: bool` w/ Default false * prod_orchestrator.rs — flag stored on the struct, threaded through new(), with set_use_quadlet_backends(bool) test setter * prod_orchestrator.rs — install_via_quadlet helper * dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest / parse_memory_mib / RestartPolicy::OnFailure now that the call path exists; if a future revert removes the wiring, the warnings come back. Tests: 624 passing, cargo check clean (0 warnings). Existing companion behavior unaffected — render_skips_backend_directives_when_default still passes byte-equal to before quadlet.rs grew the new fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:22:10 -04:00
archipelago	9becafafd3	feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52) The QuadletUnit struct now covers everything a backend manifest needs (ports, environment, devices, add_hosts, entrypoint+command, read-only root, no_new_privileges, cpu_quota, restart policy choice). Adds QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB forms. The renderer skips empty/false directives so existing companion units render byte-identically — no behavior change for shipping companions; the backend renderer is dead code until Phase 3.2 wires it into the orchestrator. Eight new unit tests cover: * parse_memory_mib forms (1024, 512m, 2g, garbage) * shell_join quoting (whitespace, embedded quotes) * RestartPolicy → systemd string mapping * render emits backend directives when set * render skips them when defaulted (companion regression gate) * from_manifest happy path on a bitcoin-knots-shaped manifest * from_manifest read-only volume detection * from_manifest tmpfs filtering * end-to-end manifest → render bytes assertion Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was implicitly covered before but is now explicit). Cargo warnings: 0. `from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are marked allow(dead_code) with explicit references to Phase 3.2 — if 3.2 doesn't wire them, the dead-code warning resurfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 17:09:50 -04:00
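The memory forms mentioned in 9becafafd3 ("1g", "512m", raw MiB) can be parsed along these lines. The return type `Option<u64>` and the case-insensitive handling are assumptions of this sketch, not confirmed details of the real `parse_memory_mib`.

```rust
/// Parse a memory limit into MiB.
/// Accepts "<n>g" (GiB), "<n>m" (MiB), or a bare number already in MiB.
/// Returns None for anything else.
fn parse_memory_mib(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_lowercase();
    if let Some(n) = s.strip_suffix('g') {
        // 1 GiB = 1024 MiB; checked_mul guards against overflow.
        return n.parse::<u64>().ok()?.checked_mul(1024);
    }
    if let Some(n) = s.strip_suffix('m') {
        return n.parse::<u64>().ok();
    }
    s.parse::<u64>().ok()
}
```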
archipelago	6bbe1b96cf	refactor: drop dead code surfaced by cargo cargo check was showing five real warnings, all genuinely dead: * container/mod.rs — re-exports compute_container_name, AdoptionReport, ReconcileAction, ReconcileReport were unused outside prod_orchestrator. Drop from the pub use line. * prod_orchestrator — with_runtime + insert_manifest_for_test only exist for the test module in the same file. Mark them #[cfg(test)] so they don't appear in release builds. * async_lifecycle — remove_package_entry has no callers; doc claims "used for install-failure cleanup" but nothing cleans up. Delete (10 lines). * registry.rs — `use tracing::{debug, info};` had no consumers. * fips.rs — unused-assignment chain on last_status. The poll loop always sets it on every break path, so the initial `None` and the unwrap_or_else fallback were both dead. Refactored to `let after = loop { ...; break s; };`. cargo check is now clean. cargo test --workspace --bins: 614 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 15:34:02 -04:00
archipelago	f9e34fd0c6	refactor(install): route orchestrator-managed apps through orchestrator first Phase 3a of the install path consolidation. Two coupled changes: 1. install.rs handle_package_install: gate the legacy "container exists → adopt + return" probe on !orchestrator_managed. Apps the orchestrator knows about (bitcoin-knots, bitcoin-core, lnd, electrumx, fedimint, filebrowser, btcpay-server stack apps, mempool stack apps, plus the companion UIs that just moved to Quadlet) skip the legacy probe and fall straight into the orchestrator branch. The legacy adopt block was returning success on a bare `podman start` exit-0 — even when the process inside the container crashed seconds later. That's the .228 "running but unreachable" failure mode. The orchestrator's ensure_running honors the manifest's health check and pre-start hooks (e.g. re-renders bitcoin-ui's nginx.conf if the RPC password rotated), so this is a behavioral upgrade, not just a refactor. 2. ProdContainerOrchestrator::install: make idempotent. Previously it blindly called install_fresh which would fail on `podman create` if the container name already existed. Now it delegates to ensure_running: - Container Running + healthy → no-op (refresh hooks, restart if config rewritten) - Container Stopped/Exited → start (with hook refresh) - Container missing → install_fresh - Container in wedged state (Created/Paused/Unknown) → force-recreate Without this, change #1 would regress every "container already exists" case for the 18 orchestrator-managed app IDs. With it, install becomes the single source of truth for "make app X be in the desired state." Tests: 654 passed across the workspace (614 unit + 37 orchestration + 3 rpc), 0 failures. The 20 prod_orchestrator tests cover the install / ensure_running / reconcile paths the new install delegates through. Net delta: install.rs grows by ~30 lines (gating wrapper + comments), prod_orchestrator.rs grows by ~30 lines (idempotent install body). 
Both are temporary — the larger deletions (~1700 lines) come once every app has been verified through the orchestrator path in subsequent phases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 11:12:52 -04:00
archipelago	23c4e7441f	refactor(container): move companion UIs to systemd via Quadlet Companion UI containers (archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn blocks from install.rs. If archipelago crashed mid-spawn or the container's cgroup was reaped, companions vanished from podman ps -a and only a manual rm/run could bring them back (the .228 incident). Now each companion is rendered as a Quadlet .container unit under ~/.config/containers/systemd/, daemon-reloaded, and started via systemctl --user. systemd owns supervision from that point on: - archipelago can crash, restart, or be uninstalled without touching any companion. - Quadlet's Restart=always + RestartSec=10 handles container exits. - A 30s reconcile tick in boot_reconciler enumerates expected companion units and re-installs any whose unit file or service vanished — defense-in-depth against external tampering. New module layout: - container/quadlet.rs: pure unit renderer + atomic write_if_changed + systemctl helpers (daemon_reload_user / enable_now / disable_remove / is_active). 6 unit tests, no I/O in the renderer. - container/companion.rs: per-app companion specs, install/remove/ reconcile, image presence (build local first, fall back to insecure registry only via image_uses_insecure_registry whitelist). 2 tests. install.rs handle_package_install now ends with a single call to companion::install_for(package_id), replacing 287 lines of spawn-and- hope shellouts plus a ~120-line nginx auth-injector helper that worked around per-node RPC password baking. The helper is gone too — the pre-start hook renders the per-node nginx.conf to /var/lib/archipelago/ bitcoin-ui/nginx.conf and the Quadlet unit bind-mounts it read-only. runtime.rs handle_package_uninstall now disables companions before the container rm loop. Otherwise systemd's Restart=always would respawn each companion within ~10s of removal. 
Tests: 53 container tests pass, including 6 quadlet renderer tests (host network, bridge network, capability set, atomic write idempotence) and 2 companion specs (per-app companion lookup, build_unit shape). boot_reconciler tests gain a #[cfg(test)] without_companion_stage() flag so the paused-clock fixtures don't race the real systemctl I/O. A bats regression test (companion-survives-archipelago-restart.bats, gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode cannot recur: every installed companion has a unit file, services stay active across systemctl --user restart archipelago, and a deleted unit file is recreated within one reconcile tick. Net delta: +941 / -363, but the +941 is mostly tests (~440 lines) and the new declarative layer; the imperative tokio::spawn block and its nginx-auth helper are gone, removing two failure classes (orphan companions on archipelago crash, and post-start exec races under tightly-confined cgroups) that previously needed manual SSH recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:45:07 -04:00
archipelago	0684491072	chore: baseline codex hardening before lifecycle refactor Snapshots the in-flight hardening work so subsequent reconcile/Quadlet phases land on a clean before/after diff. Changes: - core/container/src/podman_client.rs: image_uses_insecure_registry() whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner (23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts custom networks into the Networks map so containers can join them. - core/archipelago/src/container/prod_orchestrator.rs: ensure_container_network() creates per-manifest networks on demand; apply_data_uid() now goes through host_sudo for mkdir -p + chown so bind-mount roots get created and chowned without password prompts. - core/archipelago/src/api/rpc/package/{install,update,stacks}.rs: podman pull adds --tls-verify=false only for whitelisted registries. - core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd override on startup (live nodes carried it from old installers). - core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod binaries — it had been silently rerouting volumes to /tmp. - apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime so image-layout differences don't break entrypoint. - scripts/app-catalog-image-smoke-test.py: production catalog/image smoke test that probes a target node before users click Install. - .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak. Removes filebrowser.rs.bak and two stale catalog.json.bak files (verified identical to live counterparts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 08:52:29 -04:00
archipelago	05e6c2e738	fix: release v1.7.51-alpha install hardening	2026-05-01 05:02:39 -04:00
archipelago	7ab788d178	chore: release v1.7.49-alpha	2026-04-30 16:37:54 -04:00
archipelago	8f83b37d51	feat(orchestrator): complete container migration and release hardening	2026-04-28 15:00:58 -04:00
archipelago	2843cc1e84	fix(container/image_versions): reject entries that are not image references The parser retained any key ending in _IMAGE, so a harmless-looking variable like NOT_AN_IMAGE="something" would be treated as a pinned container image. Add a value-shape check: the value must contain both a registry separator (/) and a tag separator (:) to qualify.	2026-04-23 13:02:15 -04:00
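The value-shape check from 2843cc1e84 amounts to a two-character test on top of the existing key filter. The helper names below are hypothetical, and the example image reference is made up for illustration.

```rust
/// A value qualifies as an image reference only if it has both a registry
/// separator ('/') and a tag separator (':').
fn looks_like_image_ref(value: &str) -> bool {
    value.contains('/') && value.contains(':')
}

/// Sketch of the combined filter: key must end in _IMAGE and the value
/// must be shaped like an image reference.
fn is_pinned_image(key: &str, value: &str) -> bool {
    key.ends_with("_IMAGE") && looks_like_image_ref(value)
}
```

With this filter `NOT_AN_IMAGE="something"` no longer counts as a pinned image, while a value such as `registry.example/bitcoin:1.0` still does.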
archipelago	12f93cc15e	fix(image-versions): locate image-versions.sh at its actual deployed path The Rust search path listed /opt/archipelago/image-versions.sh and scripts/image-versions.sh (repo-relative for dev), but the image recipe deploys the file to /opt/archipelago/scripts/image-versions.sh. Production nodes therefore silently failed every lookup: find_file returned None, load_image_versions returned an empty HashMap, and both pinned_image_for_app and pinned_images_for_stack returned no matches. Symptom on deployed nodes: every container scan emitted "image-versions.sh not found in any search path" at DEBUG level, and the version-comparison logic in docker_packages.rs plus the update-check logic in api/rpc/package/update.rs silently degraded to no-op — users would not see update-available badges and upgrade RPCs could not resolve pinned targets. Fix: put the canonical deployed path first in PATHS, keep the older /opt/archipelago/image-versions.sh as a fallback for not-yet-updated nodes, and retain scripts/image-versions.sh as the dev-repo-relative fallback. Verified on .228: backend now logs "Parsed 57 image versions from /opt/archipelago/scripts/image-versions.sh" on scan. Pre-existing test_parse_image_versions failure in this module is unrelated (the NOT_AN_IMAGE assertion was broken before this change because the parser's _IMAGE-suffix retain keeps it). Leaving that for the general cargo-test cleanup pass.	2026-04-23 09:29:15 -04:00
archipelago	28e38a36a9	fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs load_registries + load_mirrors normally only ADD missing defaults to the persisted JSON — explicit removals stick. After retiring the .23 Hetzner VPS we need the opposite: existing nodes have .23 baked into their saved configs and would spend seconds per install/update timing out against a dead host until the operator manually removes it via the Settings UI. Add a targeted one-time migration in both loaders: if any saved entry has 23.182.128.160 in its URL, drop it on load and rewrite the file. This is an exception to the usual "explicit removals stick" rule — the user never chose to add this mirror, it was a default. Narrow-scope migration (one hardcoded IP match, no schema version) because the cost/benefit of a general migration system isn't worth it for a single decommissioned host. Future retirements can follow the same pattern.	2026-04-23 08:51:26 -04:00
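The narrow one-time migration in 28e38a36a9 can be sketched as a filter over saved URLs that also reports whether the file needs rewriting. Representing entries as plain `String` URLs is an assumption here; the real loaders work on registry and mirror config structs.

```rust
/// The decommissioned host named in the commit message.
const DECOMMISSIONED_HOST: &str = "23.182.128.160";

/// Drop every saved entry that points at the decommissioned host.
/// Returns true when something was removed, i.e. the caller should
/// rewrite the persisted file.
fn purge_decommissioned(urls: &mut Vec<String>) -> bool {
    let before = urls.len();
    urls.retain(|u| !u.contains(DECOMMISSIONED_HOST));
    urls.len() != before
}
```

The boolean return keeps the migration idempotent: a second load finds nothing to remove and leaves the file untouched.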
archipelago	d9d5fa65e5	chore: retire .23 VPS mirror, promote .168 OVH to primary The Hetzner VPS at 23.182.128.160 was decommissioned. Replace it everywhere with the OVH VPS at 146.59.87.168, which was previously the tertiary mirror. - update.rs: drop DEFAULT_TERTIARY_MIRROR_URL, promote .168 into the secondary slot as "Server 1 (OVH)"; tx1138 becomes Server 2. Default mirror list shrinks from 3 to 2. - container/registry.rs: default RegistryConfig drops .23, promotes .168 to Server 1 / priority 0, tx1138 stays Server 2 / priority 10. - api/rpc/package/config.rs: trusted-registry allowlist swaps .23 for .168. - api/handler/mod.rs: app-catalog fallback URL uses .168. - neode-ui/views/marketplace/marketplaceData.ts: REGISTRY uses .168. - scripts/image-versions.sh: ARCHY_REGISTRY_FALLBACK uses .168. - image-recipe/build-auto-installer-iso.sh: installer ISO registries use .168 (both podman registries.conf and backend registries.json). Tests updated to assert on the new 2-entry default lists (registry + mirror). URL-parser fixture tests in update.rs retain .23 strings — they exercise string-parsing logic, not mirror policy. Git remotes: dropped `gitea-vps` and the .23 push URL on the `origin` multi-push alias (not part of this commit — pure working-copy change).	2026-04-23 08:22:32 -04:00
archipelago	3e9c192b48	feat(container): bitcoin-ui pre-start hook renders nginx.conf from embedded template Replaces the first-boot-containers.sh sed/envsubst approach with a Rust-native render step bound into the ContainerOrchestrator lifecycle. - New container::bitcoin_ui module: embeds the nginx.conf template via include_str!, reads the plaintext RPC password from /var/lib/archipelago/secrets/bitcoin-rpc-password, substitutes {{BITCOIN_RPC_AUTH}} with base64(archipelago:<password>), and atomic- writes (tmp + rename) to /var/lib/archipelago/bitcoin-ui/nginx.conf. Idempotent: byte-compares before writing so unchanged input is a no-op (no inode churn, no restart cascade). - ProdContainerOrchestrator gains run_pre_start_hooks(app_id) returning HookOutcome::{Rewritten, Unchanged}. Fires in install_fresh before create_container, and in ensure_running: on Running + Rewritten triggers a restart; on Stopped re-renders then starts. - bitcoin-ui Dockerfile no longer COPYs a default.conf; the file now arrives via runtime bind-mount of the rendered config. If the bind- mount is ever missing, nginx starts with no site configured and returns 404 everywhere — safe failure vs. serving upstream RPC with a stale Authorization header. - apps/{bitcoin,electrs,lnd}-ui/manifest.yml land as first-class manifests. bitcoin-ui declares the bind-mount target and a dependency on bitcoin-core; electrs-ui and lnd-ui declare their own deps and health checks. - 8 new unit tests on the render fn (idempotency, rotation, trimming, missing/empty secret, template invariants) plus an integration test asserting install(bitcoin-ui) actually lands a substituted nginx.conf on disk via the hook. 39/39 container:: tests pass (test_parse_image_versions pre-existing failure unchanged, out of scope).	2026-04-23 02:19:52 -04:00
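The idempotent atomic write described in 3e9c192b48 (byte-compare, then tmp + rename) can be sketched with the standard library alone. The function name mirrors `write_if_changed` from later commits, but the signature and the `.tmp` sibling naming are assumptions of this sketch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to `path` only when the bytes differ.
/// Returns Ok(true) if the file was (re)written, Ok(false) if it was already
/// up to date. The write goes to a sibling temp file first and is renamed into
/// place, so readers never observe a half-written file.
fn write_if_changed(path: &Path, contents: &[u8]) -> io::Result<bool> {
    if let Ok(existing) = fs::read(path) {
        if existing == contents {
            // Unchanged input: no write, no inode churn.
            return Ok(false);
        }
    }
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

A caller could use the returned flag the way the commit describes `HookOutcome::{Rewritten, Unchanged}`, restarting the container only when the rendered config actually changed.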
archipelago	81c1613040	feat(container): BootReconciler — periodic reconcile loop for prod orchestrator Step 5 of the rust-orchestrator migration. New file boot_reconciler.rs holds a small Tokio task that calls ProdContainerOrchestrator::reconcile_all() on a 30-second cadence (answered design Q3). * BootReconciler::new(orch, interval, shutdown) — shutdown is an Arc<Notify> so callers can trigger a graceful exit without pulling in tokio-util. * run_forever(self) — does one reconcile immediately, then loops on tokio::select! { sleep_until | shutdown.notified() }. Shutdown interrupts the sleep but never an in-flight reconcile_all call. * Per-pass outcomes are logged at debug/warn; failures never propagate out because reconcile_all already absorbs per-app errors into ReconcileReport. Four tokio::test(start_paused = true) tests verify the loop cadence against a CountingRuntime test double: * initial_pass_fires_immediately — first reconcile runs with no delay * second_pass_fires_after_interval — second pass fires after exactly interval elapses in paused-clock time * shutdown_terminates_loop — notify_one() lets run_forever return * failure_in_one_pass_does_not_stop_loop — the loop keeps ticking even when the first pass had to install a missing container Not wired into main.rs yet — that is Step 6. Re-exported from container::mod as BootReconciler + RECONCILER_DEFAULT_INTERVAL for the wire-up step.	2026-04-22 19:04:34 -04:00
archipelago	40a6eaca72	feat(container): ContainerOrchestrator trait, RpcHandler uses it in prod Step 4 of the rust-orchestrator migration. Unifies the container lifecycle surface behind a single trait so the RPC layer stops caring whether it is talking to the dev or prod orchestrator. * New trait core/archipelago/src/container/traits.rs: ContainerOrchestrator with install / start / stop / restart / remove / upgrade / status / list / logs / health, all keyed by app_id. Every method is async_trait-based. * ProdContainerOrchestrator: the lifecycle methods are moved from inherent impl into the trait impl (avoids name-shadowing recursion). Adoption and reconcile remain inherent since only main.rs / BootReconciler call them. * DevContainerOrchestrator: new trait impl that forwards to the existing Dev-named methods, applying the dev container-name + port-offset rules internally. New load_manifest_for() helper resolves app_id to <data_dir>/apps/<app_id>/manifest.yml so trait-level install(app_id) works in dev too. install_container(manifest, path) stays inherent for the manifest-path RPC shape. * RpcHandler now holds Option<Arc<dyn ContainerOrchestrator>> and, when in dev mode, a separate Option<Arc<DevContainerOrchestrator>> for the manifest_path install RPC. In prod mode RpcHandler::new() constructs a ProdContainerOrchestrator and calls load_manifests() at startup. * All seven container-* RPC guards no longer say dev mode required. container-install still requires dev mode because its manifest_path argument has no prod meaning; every other container RPC now works in both modes via the trait. BOOT STILL DOES NOT USE THIS. main.rs wire-up (Step 6) and BootReconciler (Step 5) come next. Until then the prod orchestrator is constructed but nothing populates /opt/archipelago/apps so it has zero manifests to manage, matching the pre-Step-4 behaviour. Verification: cargo build -p archipelago clean (11 expected unused method warnings for methods not yet wired from main.rs). 
cargo test -p archipelago: all 21 container::* tests pass (16 prod_orchestrator + 5 others). 24 other test failures are pre-existing and unrelated (identity_manager / session / wallet / mesh / credentials — all independently flaky on file-backed state).	2026-04-22 18:56:52 -04:00
archipelago	e103925a4e	feat(container): ProdContainerOrchestrator with build-or-pull, adoption, reconcile Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC) implements the full public surface that will replace scripts/first-boot-containers.sh: * install / start / stop / restart / remove / upgrade / status / list / logs / health * adopt_existing: read-only scan that claims containers matching our manifests by name, without recreating — preserves the v1.7.42 fixture on .116. * reconcile_all: level-triggered, per-app failures collected rather than aborting. * install_fresh: build-or-pull (Step 2 trait methods), relative build contexts resolved against the manifest directory. Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the archy- prefix; backends keep their bare ID. An explicit extensions.container_name always wins. Codified in compute_container_name() with unit tests for all three tiers. Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily, protecting every mutating op against the reconciler loop. Acquiring the per-app lock only needs a read lock on the map, so independent apps do not serialize. 16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull, build-absent, build-present, relative-context), reconcile (noop/exited/missing/ mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health state mapping, and unknown-app-id rejection. All pass. Not wired into main.rs yet — that is Step 6. Crate builds clean with expected unused warnings for the new re-exports.	2026-04-22 18:32:31 -04:00
archipelago	919055f3f1	feat(container): add build source to manifest schema ContainerConfig.image is now Option<String>, mutually exclusive with a new optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image or build must be present, enforced in AppManifest::validate. Adds ResolvedSource enum (Pull | Build) and ContainerConfig::resolve + ::image_ref helpers so the orchestrator can treat pull and build uniformly. All 26 existing pull-only manifests continue to parse unchanged (covered by existing_pull_only_manifests_still_parse test). Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator. Dev orchestrator errors out cleanly on Build sources until Step 2 lands build_image support on the runtime trait. Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass. Also includes: docs/rust-orchestrator-migration.md (design spec) and docs/STATUS.md resume section for the next session.	2026-04-22 17:46:36 -04:00
Dorian	36a6101026	release(v1.7.38-alpha): onboarding auto-heal + silent returning logins + app-store trim - auth.rs now infers onboarding-complete from setup_complete + password_hash so nodes stop bouncing users through the intro wizard after browser clear / update / reboot; the flag self-heals to disk on next check - frontend: "backend uncertain" no longer defaults to /onboarding/intro — useOnboarding returns null + callers poll / retry instead of flashing the wizard - login sounds (synthwave, welcome voice, pop, whoosh, oomph) gated by isFirstInstallPhase(); typing sounds unaffected - removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog, frontend config, Rust AppMetadata + install dispatch + install_penpot_stack; docker/fips-ui + docker/nostr-vpn-ui + apps/penpot dirs and 5 icons deleted; 15 image versions deleted from tx1138, .168, gitea-local registries (.160 Gitea was 502 at release time — follow-up) - AIUI baked into frontend release tarball via demo/aiui/; deploy-to-target falls back to demo/aiui/ when the AIUI sibling checkout is missing - prebuild hook syncs app-catalog/catalog.json → public/catalog.json so the two copies can no longer drift (was the source of the "apps still visible" bug — public/ had stale data) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:02:24 -04:00
Dorian	cfc98c600e	release(v1.7.37-alpha): bitcoin-core install fixes + dynamic node UI + full-archive default Install flow - api/rpc/package/install.rs: always append the literal image URL as a last-resort pull candidate in do_pull_image, so images not carried by any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install instead of masquerading as a generic pull failure across every mirror. - api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat error, not just "file exists". Once bitcoin-knots' first-boot chowns /var/lib/archipelago/bitcoin into the container's user namespace (700 perms, UID 100100/100101), the archipelago daemon can't even traverse in — try_exists returns Err which unwrap_or(false) treated as "not present" and drove a doomed write. Now errors out of the directory traversal are treated as "conf already owned by container user" and the write is skipped. Mirrors the lnd.conf pattern. - api/rpc/package/install.rs: drop the hardcoded `prune=550` from the conf default. Operators with multi-TB drives shouldn't be silently pruned; users who want a pruned node can set it in bitcoin.conf themselves. Full archive is the only honest default. - api/rpc/package/config.rs: bitcoin-core now passes explicit -server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and reads conf + argv only; without these the RPC listens on 127.0.0.1 inside the container and rootlessport can't reach it, so the bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call. Bitcoin Knots keeps its own entrypoint-driven defaults. - container/docker_packages.rs: split bitcoin-core out of the shared AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with bitcoin-core.svg and a Reference-implementation description; the bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home card showing "Bitcoin Knots" for a Core install. 
Bitcoin node UI (docker/bitcoin-ui) - index.html: impl name/tagline/logo now dynamic. applyImplBranding() reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both get their own icon and subtitle. Settings modal replaced its hardcoded Regtest/txindex=1/port-18443 placeholders with live values from getblockchaininfo + getindexinfo + getzmqnotifications. - index.html: new Storage info card (Full Archive · X GB / Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on the main dashboard, same level as Network. Settings modal mirrors it with the prune height when applicable. - Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the bg-network.jpg used by the dashboard are now COPY'd into the image under /usr/share/nginx/html/assets. Previously the <img src> pointed at paths that 404'd into the SPA fallback and the onerror handler hid the broken logo silently. Frontend - appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334), HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core). Without these the AppSessionFrame showed "No URL found for bitcoin-core" and the home/app-list title fell through to the raw id. - settings/AccountInfoSection.vue: backfill What's New entries for v1.7.31 through v1.7.37 that had been missed in earlier cuts. Release plumbing - releases/v1.7.37-alpha/: binary + frontend tarball. - releases/manifest.json: v1.7.37-alpha, sha256/size refreshed. - Cargo.toml / package.json: version bumps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:03:47 -04:00
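
Commit 3e9c192b48 describes an idempotent render step: substitute {{BITCOIN_RPC_AUTH}} in the template, byte-compare against the file on disk, and write via tmp + rename only when the bytes differ. The following is a minimal sketch of that pattern using only the standard library, not the repository's actual code. The HookOutcome variant names come from the commit message; the function names `render_template` and `write_if_changed`, the `.tmp` suffix, and the signatures are assumptions made for illustration. The base64 encoding of `archipelago:<password>` is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Variant names follow the commit message; the rest of this sketch is hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum HookOutcome {
    Rewritten,
    Unchanged,
}

/// Substitute the placeholder with an already-encoded auth value.
/// (Assumption: the caller has produced the base64 string beforehand.)
pub fn render_template(template: &str, auth_value: &str) -> String {
    template.replace("{{BITCOIN_RPC_AUTH}}", auth_value)
}

/// Write `rendered` to `target` only when the bytes differ, via tmp + rename.
pub fn write_if_changed(target: &Path, rendered: &[u8]) -> io::Result<HookOutcome> {
    // Byte-compare first, so unchanged input never touches the file.
    match fs::read(target) {
        Ok(existing) if existing == rendered => return Ok(HookOutcome::Unchanged),
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }

    // Build a sibling temp path (hypothetical ".tmp" suffix) in the same directory,
    // so the rename stays on one filesystem.
    let name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;
    let mut tmp_name = name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = target.with_file_name(tmp_name);

    // Write the full content to the temp file, then rename it over the target.
    fs::write(&tmp, rendered)?;
    fs::rename(&tmp, target)?;
    Ok(HookOutcome::Rewritten)
}
```

A caller in the position of the pre-start hook would restart a running container only on `Rewritten` and do nothing on `Unchanged`, which is what keeps repeated reconcile passes from causing a restart cascade.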
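
Commit e103925a4e states a three-tier container naming rule: an explicit extensions.container_name always wins, the UI app IDs (bitcoin-ui, electrs-ui, lnd-ui) get the archy- prefix, and backends keep their bare ID. The sketch below illustrates that rule. The function name compute_container_name appears in the commit message, but the signature, the `UI_APP_IDS` constant, and the use of `Option<&str>` for the explicit name are assumptions and may not match the repository.

```rust
/// UI app IDs listed in the commit message. The constant itself is hypothetical.
const UI_APP_IDS: [&str; 3] = ["bitcoin-ui", "electrs-ui", "lnd-ui"];

/// Illustrative version of the naming rule; the real signature is not shown in the log.
pub fn compute_container_name(app_id: &str, explicit_name: Option<&str>) -> String {
    // Tier 1: an explicit extensions.container_name always wins.
    if let Some(name) = explicit_name {
        return name.to_string();
    }
    // Tier 2: UI apps get the archy- prefix.
    if UI_APP_IDS.contains(&app_id) {
        return format!("archy-{app_id}");
    }
    // Tier 3: backends keep their bare ID.
    app_id.to_string()
}
```

Checking the explicit override first means a manifest can opt out of the prefix rule entirely, which is how adoption by name can keep matching containers that were created under an older naming scheme.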


104 Commits