lfg2025/archy

Author	SHA1	Message	Date
archipelago	23c4e7441f	refactor(container): move companion UIs to systemd via Quadlet Companion UI containers (archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn blocks from install.rs. If archipelago crashed mid-spawn or the container's cgroup was reaped, companions vanished from podman ps -a and only a manual rm/run could bring them back (the .228 incident). Now each companion is rendered as a Quadlet .container unit under ~/.config/containers/systemd/, daemon-reloaded, and started via systemctl --user. systemd owns supervision from that point on: - archipelago can crash, restart, or be uninstalled without touching any companion. - Quadlet's Restart=always + RestartSec=10 handles container exits. - A 30s reconcile tick in boot_reconciler enumerates expected companion units and re-installs any whose unit file or service vanished (defense-in-depth against external tampering). New module layout: - container/quadlet.rs: pure unit renderer + atomic write_if_changed + systemctl helpers (daemon_reload_user / enable_now / disable_remove / is_active). 6 unit tests, no I/O in the renderer. - container/companion.rs: per-app companion specs, install/remove/reconcile, image presence (build local first, fall back to insecure registry only via image_uses_insecure_registry whitelist). 2 tests. install.rs handle_package_install now ends with a single call to companion::install_for(package_id), replacing 287 lines of spawn-and-hope shellouts plus a ~120-line nginx auth-injector helper that worked around per-node RPC password baking. The helper is gone too: the pre-start hook renders the per-node nginx.conf to /var/lib/archipelago/bitcoin-ui/nginx.conf and the Quadlet unit bind-mounts it read-only. runtime.rs handle_package_uninstall now disables companions before the container rm loop. Otherwise systemd's Restart=always would respawn each companion within ~10s of removal. Tests: 53 container tests pass, including 6 quadlet renderer tests (host network, bridge network, capability set, atomic write idempotence) and 2 companion specs (per-app companion lookup, build_unit shape). boot_reconciler tests gain a #[cfg(test)] without_companion_stage() flag so the paused-clock fixtures don't race the real systemctl I/O. A bats regression test (companion-survives-archipelago-restart.bats, gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode cannot recur: every installed companion has a unit file, services stay active across systemctl --user restart archipelago, and a deleted unit file is recreated within one reconcile tick. Net delta: +941 / -363, but the +941 is mostly tests (~440 lines) and the new declarative layer; the imperative tokio::spawn block and its nginx-auth helper are gone, removing two failure classes (orphan companions on archipelago crash, and post-start exec races under tightly-confined cgroups) that previously needed manual SSH recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 10:45:07 -04:00
archipelago	2bf8181110	refactor(security): tighten capability + TLS-bypass surface Three small, focused tightenings: - core/container/src/podman_client.rs: drop the legacy Hetzner 23.182.128.160:3000 mirror from image_uses_insecure_registry(). It was decommissioned in v1.7.x and is stripped from active registry config at load time; leaving it in the bypass list let a stale config still skip TLS. Replace the inline match with a named INSECURE_REGISTRY_HOSTS slice so future entries are one line. Test now also pins the spoofing-immune semantics ("evil.example/146.59.87.168:3000/x" must NOT match). - core/archipelago/src/api/rpc/package/config.rs: split bitcoin from lnd in get_app_capabilities(). bitcoind never opens raw sockets — drop CAP_NET_RAW from bitcoin/bitcoin-core/bitcoin-knots. lnd/fedimint/fedimint-gateway keep it because they enumerate network interfaces during cert generation. - core/archipelago/src/bootstrap.rs: tighten_secrets_dir() enforces 0700 on /var/lib/archipelago/secrets and 0600 on every file inside on each startup. The dir-mode is the load-bearing isolation boundary against rootless container escapes (their UID maps to >=100000, can't traverse uid=1000/0700). The per-file sweep is defense-in-depth against any installer that wrote 0644. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 08:59:11 -04:00
archipelago	0684491072	chore: baseline codex hardening before lifecycle refactor Snapshots the in-flight hardening work so subsequent reconcile/Quadlet phases land on a clean before/after diff. Changes: - core/container/src/podman_client.rs: image_uses_insecure_registry() whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner (23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts custom networks into the Networks map so containers can join them. - core/archipelago/src/container/prod_orchestrator.rs: ensure_container_network() creates per-manifest networks on demand; apply_data_uid() now goes through host_sudo for mkdir -p + chown so bind-mount roots get created and chowned without password prompts. - core/archipelago/src/api/rpc/package/{install,update,stacks}.rs: podman pull adds --tls-verify=false only for whitelisted registries. - core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd override on startup (live nodes carried it from old installers). - core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod binaries — it had been silently rerouting volumes to /tmp. - apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime so image-layout differences don't break entrypoint. - scripts/app-catalog-image-smoke-test.py: production catalog/image smoke test that probes a target node before users click Install. - .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak. Removes filebrowser.rs.bak and two stale catalog.json.bak files (verified identical to live counterparts). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 08:52:29 -04:00
archipelago	05e6c2e738	fix: release v1.7.51-alpha install hardening v1.7.51-alpha	2026-05-01 05:02:39 -04:00
archipelago	be9f9528c3	fix: release v1.7.50-alpha OTA runtime repair v1.7.50-alpha	2026-05-01 03:14:07 -04:00
archipelago	7ab788d178	chore: release v1.7.49-alpha v1.7.49-alpha	2026-04-30 16:37:54 -04:00
archipelago	f507b847ef	chore: release v1.7.48-alpha Hotfix: archipelago.service ExecStartPre now mkdirs /run/containers and /var/lib/containers before the unit's mount-namespace setup tries to bind them. Without this, fresh nodes that don't have /run/containers (e.g. nodes provisioned without a prior podman session) fail at the namespace step with: Failed to set up mount namespacing: /run/containers: No such file or directory Failed at step NAMESPACE spawning /bin/bash: No such file or directory Existing nodes don't pick up systemd unit changes via OTA — they need a one-time `systemctl edit archipelago` adding the same mkdir. ISO installs from this version forward have the fix baked in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v1.7.48-alpha	2026-04-29 16:27:22 -04:00
archipelago	8a2899ab4a	chore: release v1.7.47-alpha Sync-perf tuning for bitcoin/bitcoin-core/bitcoin-knots/electrumx. - Drop the --cpus=2 cap on bitcoin/electrumx variants. Script verification is parallelizable; the cap halved IBD speed on 4-8 core machines. - Bump bitcoin --memory 4g→8g so dbcache=4096 has headroom for mempool + connection buffers + I/O. 4g was OOM-prone during heavy IBD. - Bump electrumx --memory 1g→2g + add CACHE_MB=2048 + MAX_SEND=10MB. - bitcoin-core CLI args gain -dbcache=4096 -par=0 -maxconnections=125. - bitcoin-knots manifest matched (1024MB pruned / 4096MB full + par=0). Future v2: host-RAM-aware dbcache scaling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v1.7.47-alpha	2026-04-29 15:47:51 -04:00
archipelago	992b673b20	chore: release v1.7.46-alpha Follow-up to v1.7.45-alpha closing the remaining tasks identified by the resilience sweeps + the new bitcoin orphan / install-fail-vanish bugs. User-visible: - Health monitor: stop paging on orphaned containers from variant switches - Install fail: card stays visible (was vanishing) with error message - Stack pull progress: interpolate 20→70% (was stuck at 20%) - docker.io → lfg2025 mirror: bitcoin/gitea/nextcloud/valkey Internal: - Resilience harness: install-wait uses expected_containers_for, ui+auth probes retry with 60s backoff, dep-snapshot fix - InstallProgress gains optional `message` field (frontend renders it when phase is None) binary $(stat -c %s releases/v1.7.46-alpha/archipelago) sha256:$(sha256sum releases/v1.7.46-alpha/archipelago | awk '{print $1}') tarball $(stat -c %s releases/v1.7.46-alpha/archipelago-frontend-1.7.46-alpha.tar.gz) sha256:$(sha256sum releases/v1.7.46-alpha/archipelago-frontend-1.7.46-alpha.tar.gz | awk '{print $1}') Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v1.7.46-alpha	2026-04-29 14:50:33 -04:00
archipelago	4ec6ca98c1	chore: release v1.7.45-alpha Resilience-validated release. Three full sweeps of the new resilience harness against .228 confirm no shipstoppers. Big user-visible: - Bitcoin RPC auth durably correct via host-rendered nginx.conf bind-mount, replaces fragile post-start exec that failed under restricted-cap rootless podman ("crun: write cgroup.procs: Permission denied") - Multi-container stack installs (indeedhub, immich, btcpay, mempool) now emit phase events at every boundary so the progress bar advances - Apps no longer vanish from the dashboard mid-install (absent-scanner skips packages in transitional states) - Indeedhub fresh installs work end-to-end (was 8500+ restart loop): five missing env vars (DATABASE_PORT, QUEUE_HOST, QUEUE_PORT, S3_PRIVATE_BUCKET_NAME, AES_MASTER_SECRET) added to install code - Tailscale install fixed: --entrypoint string was being passed as a single shell-line arg; switched to custom_args array - Catalog cleaned of broken entries (dwn, endurain, ollama removed; nextcloud restored on docker.io) - Bitcoin Core update path uses correct image (was looking for nonexistent lfg2025/bitcoin:28.4) - ISO installs now allocate swap on the encrypted data partition Infra: - New resilience harness (scripts/resilience/) — black-box state-machine tester, every app × every transition. Run before each release. Sweep #3 final: PASS 107 / FAIL 12 / SKIP 14. The 12 fails are 1 cosmetic (homeassistant trusted_hosts), 8 harness/timing false-positives, and 3 non-shipstopper tracked items. Down from 23 in baseline sweep #1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> v1.7.45-alpha	2026-04-29 12:31:45 -04:00
archipelago	dffa7e99bb	chore: release v1.7.44-alpha v1.7.44-alpha	2026-04-28 15:03:04 -04:00
archipelago	8f83b37d51	feat(orchestrator): complete container migration and release hardening	2026-04-28 15:00:58 -04:00
archipelago	4d05705315	feat(self-update): sync and rebuild UI containers on OTA self-update.sh previously rebuilt only the backend binary and Vue frontend. The custom UI containers (archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui) were left untouched forever. That meant any change to docker/<ui>/{Dockerfile, nginx.conf, index.html, ...} never reached a running node through OTA; it required a manual SSH + rebuild. This is exactly why the lnd-ui port fix didn't reach .228 in v1.7.43-alpha. Add a sync-and-rebuild stage: 1. Hash each docker/<ui>/ tree (content-only, path-stable via `cd && find` so src and dst compare equal when identical). 2. rsync changed trees to /opt/archipelago/docker/<ui>/. 3. For each changed UI: rebuild image as the archipelago user (rootless podman), then stop+remove+recreate the container using the canonical spec from scripts/container-specs.sh. Port mappings, caps, memory, and security opts all come from the spec, so the runtime can't drift from the tree. Also install first-boot-containers.sh into /opt/archipelago/scripts/ so a later reconciler run or reboot picks up current orchestration logic. Idempotent: if no UI tree changed since the last update, the whole stage is a no-op beyond the hash compare. Verified end-to-end on .228 with a synthetic change to lnd-ui: detection, sync, build, recreate, and HTTP 200 on both the direct container port and the host-nginx /app/lnd/ proxy.	2026-04-23 15:48:53 -04:00
archipelago	05b41f8946	fix(lnd-ui): align container port across all specs The LND UI container was unreachable on .228 after the v1.7.43-alpha deploy because three sources of truth disagreed on which port nginx listens on inside the container: - docker/lnd-ui/nginx.conf listen 8081 - docker/lnd-ui/Dockerfile EXPOSE 8080 - apps/lnd-ui/manifest.yml host networking, ports: [] - scripts/first-boot-containers.sh -p 8081:8080 - scripts/deploy-to-target.sh -p 8081:80 (de-facto) - scripts/deploy-tailscale.sh -p 8081:80 - scripts/container-specs.sh SPEC_PORTS=8081:80 Result: podman published host 8081 to container port 80, but no one was listening on 80 inside, so connections were reset. Canonicalize on container:80 with host:8081 publish, matching the three deploy paths already in agreement. Changes: - docker/lnd-ui/nginx.conf: listen 8081 -> listen 80 - docker/lnd-ui/Dockerfile: EXPOSE 8080 -> EXPOSE 80 - apps/lnd-ui/manifest.yml: replace host-network (never true) with bridge networking and explicit 8081:80 port mapping, correcting a documentation-vs-reality mismatch - scripts/first-boot-containers.sh: -p 8081:8080 -> -p 8081:80, and fix the internal-port comment Verified on .228 after rebuild: curl http://127.0.0.1:8081/ returns HTTP 200 and the /app/lnd/ host-nginx proxy resolves cleanly.	2026-04-23 15:42:49 -04:00
archipelago	ed73e4709b	chore(release): archive ISO build recipes, tarball-only releases Releases no longer ship as bootable ISOs. Archipelago updates are distributed as the backend binary plus a frontend tarball referenced by releases/manifest.json. Nodes OTA-update via scripts/self-update.sh. Filebrowser and AIUI remain bundled inside the frontend tarball and deployed atomically, verified present in v1.7.43-alpha release artifact (189 AIUI files, filebrowser-client bundle). Archived under image-recipe/_archived/ (resurrectable if ISO distribution is reintroduced): - build-auto-installer-iso.sh - build-unbundled-iso.sh - test-iso-qemu.sh - scripts/convert-iso-to-disk.sh - BUILD-ISO-STATUS.md, ISO-BUILD-CHECKLIST.md - branding/isohdpfx.bin - .gitea/workflows/build-iso-dev.yml Updated release process docs to drop ISO references: - scripts/create-release.sh (next-steps text) - docs/BETA-RELEASE-CHECKLIST.md - docs/hotfix-process.md - README.md	2026-04-23 15:36:00 -04:00
archipelago	0bd4e49a8c	docs(release-notes): v1.7.43-alpha bullet for AIUI preservation fix v1.7.43-alpha	2026-04-23 13:22:28 -04:00
archipelago	310c709aba	chore(release): bump version to 1.7.43-alpha	2026-04-23 13:21:58 -04:00
archipelago	dbf755e908	fix(aiui): bundle demo/aiui in self-update and ISO builds so updates never wipe it Every OTA self-update and every ISO capture was implicitly relying on /opt/archipelago/web-ui/aiui/ already being present on disk. Any node that had its web-ui directory atomically swapped (for example by a manual deployment shipping only neode-ui dist output) lost aiui entirely and the AI Assistant tab fell through to the "needs to be enabled" placeholder. self-update.sh: drop the rsync --exclude aiui preservation trick and instead stage demo/aiui into the freshly-built dist tree before rsync. demo/aiui in the repo is now the source of truth; every update overwrites the on-disk copy with a matching version rather than carrying forward whatever stale bundle happened to survive. build-auto-installer-iso.sh: prepend demo/aiui to the AIUI search list so ISO builds from a fresh repo clone pick it up automatically, without requiring a side-checkout of the AIUI project or a live dev server. This matches create-release-manifest.sh which already bakes demo/aiui into the release tarball (lines 86-89).	2026-04-23 13:21:49 -04:00
archipelago	2572688468	docs(release-notes): v1.7.43-alpha bullets for chunking, avatar, outbox, parser Four production-code fixes merit user-visible mention: the transport chunking data-corruption fix (real user-affecting bug for multi-chunk mesh payloads), the avatar u16 overflow panic (backend crash on certain seeds), the outbox TTL boundary, and the image-versions parser hardening.	2026-04-23 13:03:49 -04:00
archipelago	4bf35f95e6	test: repair stale test fixtures across identity, mesh, update, wallet, fips Several tests had drifted from the current production behavior: - identity_manager: create() already auto-provisions a Nostr key, so the explicit create_nostr_key() call failed with "already exists". Rewrite the test to assert on record.nostr_npub from create() directly. - mesh/protocol: test_build_app_start read the app name from frame[4..] but the v2 layout is [0:marker][1-2:len][3:cmd][4:version][5..:name]. test_identity_broadcast_roundtrip expected input DID = output DID but the v2 decoder derives DID from the ed25519 pubkey, so the roundtrip compares against did_key_from_pubkey_hex(&pub) now. - mesh/bitcoin_relay: test_build_block_header_announcement asserted sig.is_some(), but the builder intentionally emits an unsigned envelope to fit the 160-byte LoRa limit; assert sig.is_none(). Also widen placeholder hashes to the required 64 hex chars (32 bytes). - update: load_mirrors() now merges default mirrors post-migration, so the roundtrip test must assert the custom mirror survives alongside the defaults rather than strict equality. - wallet/cashu: test_proof_c_as_pubkey used hex that is not on the curve; replace with the secp256k1 generator point G so parsing succeeds. - fips: test_status_reports_no_key_pre_onboarding asserted npub.is_none(), which fails on dev boxes where the fips daemon is already running. Keep the !key_present assertion and drop the npub one.	2026-04-23 13:02:45 -04:00
archipelago	4edc420459	test(credentials): seed identity/node_key in test helper so encrypt/decrypt works Credentials tests created a fresh tempdir and immediately invoked encrypt/decrypt, but load_encryption_key reads <dir>/identity/node_key which did not exist, so every test failed with "node key not found". Add a test_dir_with_node_key() helper that writes a deterministic 32-byte key and switch all 8 call sites to it.	2026-04-23 13:02:28 -04:00
archipelago	7af048cc1a	fix(session): add test-only constructor so tests do not read real sessions SessionStore::new() reads /var/lib/archipelago/sessions.json, which on any node with an active dashboard contains live sessions that pollute test state and cause intermittent failures. Introduce a cfg(test) only new_for_tests(PathBuf) constructor and switch the test suite to it so tests always start from a clean tempdir.	2026-04-23 13:02:22 -04:00
archipelago	2843cc1e84	fix(container/image_versions): reject entries that are not image references The parser retained any key ending in _IMAGE, so a harmless-looking variable like NOT_AN_IMAGE="something" would be treated as a pinned container image. Add a value-shape check: the value must contain both a registry separator (/) and a tag separator (:) to qualify.	2026-04-23 13:02:15 -04:00
archipelago	c5ea41d0cb	fix(mesh/outbox): expire messages with zero TTL immediately is_expired used age > ttl_secs, so a message with ttl_secs=0 whose age rounded to 0 seconds was considered live forever. Switch to >= so the zero-TTL boundary expires on the first check, matching the intuitive meaning of TTL and the behavior the tests assert.	2026-04-23 13:02:07 -04:00
archipelago	9d42645aa3	fix(avatar): prevent u16 overflow panic when seed byte is large hue_color and accent_color computed (seed as u16) * 360, which overflows u16 when seed >= 183 (182 * 360 = 65520 still fits in 65535); debug builds panicked, release wrapped silently. Widen to u32 before the multiplication. This also unblocks several identity_manager tests that constructed avatars through master_node_svg and were aborting on the panic.	2026-04-23 13:02:01 -04:00
archipelago	f6efe2f356	fix(transport/chunking): stop overwriting first 4 bytes of user data encode_chunked() split the payload into shards first, then overwrote the first 4 bytes of shard 0 with a u32 length header, then re-ran Reed-Solomon to regenerate parity over the now-corrupted shards. The decoder correctly read the length header and trimmed `[4..4+len]` from the reconstructed buffer, but those first 4 bytes had already been destroyed on the encode side, so every chunked mesh payload lost its first 4 bytes. Restructure: reserve 4 bytes for the length header up front, build a single contiguous [len][data][pad] buffer, then split into shards. Parity is computed over the correct shards on the first pass, no double-encode needed. Update test_chunk_roundtrip_medium: 500 bytes + 4-byte header = 504 bytes, which is 5 data shards (ceil(504/124)), not 4. The old test assertion was wrong all along and masked the corruption bug because it only checked the roundtripped bytes, which is exactly what we need to verify. New assertion is correct. Verified: all 7 transport::chunking tests pass.	2026-04-23 12:29:10 -04:00
archipelago	c4efb30382	docs(release-notes): v1.7.43-alpha bullet for install-log fix; prune stale RESUME note	2026-04-23 12:04:20 -04:00
archipelago	cd6f8bad70	fix(install-log): pre-create /var/log/archipelago/ so non-root backend can write The backend runs as `archipelago` and calls `install_log()` to append audit lines to the install log on every install / update / remove / start / stop / restart. Target path was /var/log/archipelago-container-installs.log, which does not exist and cannot be created by the service because /var/log/ is root-owned. OpenOptions errors were silently swallowed, so the log was never written on any node. Ship a tmpfiles.d rule that pre-creates /var/log/archipelago/ and container-installs.log with archipelago:archipelago ownership. Move the const path to match, keeping logs inside the directory logrotate already rotates (image-recipe/configs/logrotate.conf). Install the rule from both the ISO build and self-update, and apply it immediately on self-update so existing nodes get a working log without needing a reboot. Verified on .228: file created, backend user can write, backend binary rebuilt with new const.	2026-04-23 12:02:46 -04:00
archipelago	9f3d66e24e	docs(release-notes): v1.7.43-alpha bullet for self-update script refresh Document that OTA updates now refresh the reconcile helper scripts, closing the deploy gap that kept fixes to those scripts from reaching existing nodes.	2026-04-23 11:51:04 -04:00
archipelago	a272a79706	fix(self-update): install reconcile scripts on OTA updates The OTA self-update path only refreshed image-versions.sh, leaving reconcile-containers.sh and container-specs.sh frozen at whatever version was baked into the ISO that originally provisioned the node. Any fix to those scripts (notably the --create-missing flag and the DISK_GB detection fix shipped this round) never reached existing nodes, and on .228 both scripts were outright missing because the node predated their inclusion in the ISO recipe. Install all three helper scripts to /opt/archipelago/scripts/ on every self-update run. Also preserve the legacy copy of image-versions.sh at /opt/archipelago/image-versions.sh for any older backend binaries still looking there first.	2026-04-23 10:07:53 -04:00
archipelago	694e5b0a9d	fix(update): pass --create-missing when rollback recreates a destroyed container The update flow removes the old container before starting the new one. If the update fails after removal, the rollback path tries `podman start <name>` first, then falls back to reconcile. But reconcile without --create-missing treats the now-absent container as an optional one that the install flow will (re)create later, and skips it. Result: container stays destroyed until someone notices and runs reconcile manually. Add --create-missing to the rollback reconcile invocation so the fallback actually rebuilds the container from its canonical spec. Fixes the failure mode observed on .228 where a bitcoin-knots update left the node with no bitcoin-knots container at all.	2026-04-23 10:06:55 -04:00
archipelago	0f1ad47aec	docs(release-notes): v1.7.43-alpha bullets for disk-detection and rollback recovery Add two user-facing release notes for fixes shipped this round: - Full-archive Bitcoin nodes no longer silently get pruned on reconcile because the disk-size check was reading the OS partition. - Failed updates can now recover via reconcile --create-missing instead of leaving a destroyed container behind.	2026-04-23 10:02:32 -04:00
archipelago	06dcdafda4	fix(specs): measure DISK_GB at /var/lib/archipelago, not / The reconcile spec for bitcoin-knots auto-enables prune=550 when DISK_GB < 1000. DISK_GB was measured via `df /`, which on every archy install reports the ~30 GB OS partition because user data lives on a separate encrypted /var/lib/archipelago volume. Result: every archy node with a 2 TB data drive was silently being configured as a pruned node, and any bitcoin-knots container recreated by reconcile would delete its historical blocks down to the 550 MB prune window on next start. Observed on .228 (2 TB box): blocks dir went from 384 GB to 926 MB after a reconcile-triggered restart. Historical archive unrecoverable without full re-IBD from genesis. Fix: check /var/lib/archipelago first (where bitcoin data actually lives). Fall back to / only on first-boot before the data partition is mounted.	2026-04-23 09:54:16 -04:00
archipelago	92612ddc70	feat(reconcile): add --create-missing flag for recovering from failed-update rollbacks Context: when package update fails after remove-old-container but before reconcile-recreate, the rollback path in update.rs tries to restart the old container by name. If the container is already gone (removed in step 3 of the update), rollback fails silently and the node is left with no live container for that app but on-disk data still intact. This is exactly the state .228 ended up in after the reconcile-script-missing bug killed bitcoin-knots and lnd. Reconcile was designed to only repair existing containers for optional apps (SPEC_OPTIONAL=true): it skips "not installed" entries on the assumption that the install RPC creates them. That safety check is correct for normal operation but blocks recovery when an optional-marked container has been destroyed by a failed update. Fix: add --create-missing flag that overrides the SPEC_OPTIONAL skip. When set, reconcile treats absent containers exactly the same as broken containers — it creates them from the canonical spec using the existing on-disk data directory. Narrow-scope override; the default behaviour is unchanged. Updated --help to document all four flags. Verified on .228: after the failed bitcoin-core update took out both bitcoin-knots and lnd, running reconcile --container=bitcoin-knots --create-missing --force (as the archipelago user, not root — podman is rootless) brought bitcoin-knots back using the pruned chainstate at /var/lib/archipelago/bitcoin. Repeated for lnd. All containers now running; electrumx reconnecting; UIs recovering. Does NOT fix the underlying update-flow rollback hole (rollback should be able to re-create a container from spec, not just restart by name). That is a separate commit — this flag is the manual recovery tool plus the primitive the improved rollback will call.	2026-04-23 09:42:19 -04:00
archipelago	353825b66c	docs: release-note image-versions fix, add marketplace QA tracker, update RESUME - AccountInfoSection.vue: append 5th bullet to v1.7.43-alpha entry explaining that update-available badges and version comparisons work again now that the pinned-image catalog is found at the correct deployed path. - docs/MARKETPLACE-QA.md: new tracker for the upcoming app-by-app install walk on .228. Documents the per-app fix workflow, the four layers we might need to fix at (app recipe, registry image, backend orchestrator, frontend), status-key table for tracking each catalog entry, and the release-notes policy for the walk. - docs/RESUME.md: refresh with a9908597 commit, updated binary md5 on .228, and split Immediate Next Step into Phase 1 (browser verification) and Phase 2 (marketplace walk) with a pointer to the new tracker.	2026-04-23 09:32:41 -04:00
archipelago	12f93cc15e	fix(image-versions): locate image-versions.sh at its actual deployed path The Rust search path listed /opt/archipelago/image-versions.sh and scripts/image-versions.sh (repo-relative for dev), but the image recipe deploys the file to /opt/archipelago/scripts/image-versions.sh. Production nodes therefore silently failed every lookup: find_file returned None, load_image_versions returned an empty HashMap, and both pinned_image_for_app and pinned_images_for_stack returned no matches. Symptom on deployed nodes: every container scan emitted "image-versions.sh not found in any search path" at DEBUG level, and the version-comparison logic in docker_packages.rs plus the update-check logic in api/rpc/package/update.rs silently degraded to no-op — users would not see update-available badges and upgrade RPCs could not resolve pinned targets. Fix: put the canonical deployed path first in PATHS, keep the older /opt/archipelago/image-versions.sh as a fallback for not-yet-updated nodes, and retain scripts/image-versions.sh as the dev-repo-relative fallback. Verified on .228: backend now logs "Parsed 57 image versions from /opt/archipelago/scripts/image-versions.sh" on scan. Pre-existing test_parse_image_versions failure in this module is unrelated (the NOT_AN_IMAGE assertion was broken before this change because the parser's _IMAGE-suffix retain keeps it). Leaving that for the general cargo-test cleanup pass.	2026-04-23 09:29:15 -04:00
archipelago	4faac9cb74	docs(resume): add RESUME.md for context-restart recovery Consolidated single-file snapshot of plan + progress for a fresh OpenCode session to pick up the install UX polish work: - Where we are: v1.7.43-alpha shipped, 5 commits on main, deployed to .228, browser verification in progress. - Immediate next step: await user's verification results from https://192.168.1.228/ browser checklist. - Working layout: SSHFS mount, ssh archy / archy228, deploy recipes. - Architecture patterns: async-spawn lifecycle, phase-based install progress, scanner kick, .23 auto-purge migration. - Backlog: Vaultwarden exit-on-start, install log perms, 22 stale cargo test failures, historical changelog entries left intact. - User preferences: "best long-term first", one-by-one, no push, Bitcoin-only, conventional commits. Complements STATUS.md (which remains the engineering log) with a tighter resume-the-work narrative focused on the current round.	2026-04-23 09:14:36 -04:00
archipelago	b62b731db0	docs(status): record rounds 3-5 + config migration + changelog as shipped Adds a new top section to STATUS.md covering v1.7.43-alpha: - Round 3: phase-based install progress bar - Round 4: post-install scanner kick for instant Launch button - Round 5: .23 VPS retirement, .168 promoted to Server 1 - Config migration: auto-purge .23 from saved registry/mirror JSONs - Changelog: new v1.7.43-alpha entry in AccountInfoSection All 5 commits, deployment md5, verification notes, and git remote cleanup captured. Round 2 rollback command still valid for the full stack since backups predate every round in this session.	2026-04-23 09:09:02 -04:00
archipelago	6c8cb50679	docs(changelog): add v1.7.43-alpha entry covering async lifecycle + .23 retirement Four release-note bullets describing the user-visible changes shipped in this round: - async-spawn install/update/uninstall (UI no longer freezes) - phase-based install progress bar (Preparing through Finalizing) - scanner kick post-install (Launch button appears immediately) - .23 Hetzner VPS retired, .168 OVH promoted to Server 1 with auto-purge migration for existing nodes Matches the tone of existing changelog entries: what changed from the operator's perspective, not internal implementation detail.	2026-04-23 09:07:29 -04:00
archipelago	28e38a36a9	fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs load_registries + load_mirrors normally only ADD missing defaults to the persisted JSON — explicit removals stick. After retiring the .23 Hetzner VPS we need the opposite: existing nodes have .23 baked into their saved configs and would spend seconds per install/update timing out against a dead host until the operator manually removes it via the Settings UI. Add a targeted one-time migration in both loaders: if any saved entry has 23.182.128.160 in its URL, drop it on load and rewrite the file. This is an exception to the usual "explicit removals stick" rule — the user never chose to add this mirror, it was a default. Narrow-scope migration (one hardcoded IP match, no schema version) because the cost/benefit of a general migration system isn't worth it for a single decommissioned host. Future retirements can follow the same pattern.	2026-04-23 08:51:26 -04:00
archipelago	d9d5fa65e5	chore: retire .23 VPS mirror, promote .168 OVH to primary The Hetzner VPS at 23.182.128.160 was decommissioned. Replace it everywhere with the OVH VPS at 146.59.87.168, which was previously the tertiary mirror. - update.rs: drop DEFAULT_TERTIARY_MIRROR_URL, promote .168 into the secondary slot as "Server 1 (OVH)"; tx1138 becomes Server 2. Default mirror list shrinks from 3 to 2. - container/registry.rs: default RegistryConfig drops .23, promotes .168 to Server 1 / priority 0, tx1138 stays Server 2 / priority 10. - api/rpc/package/config.rs: trusted-registry allowlist swaps .23 for .168. - api/handler/mod.rs: app-catalog fallback URL uses .168. - neode-ui/views/marketplace/marketplaceData.ts: REGISTRY uses .168. - scripts/image-versions.sh: ARCHY_REGISTRY_FALLBACK uses .168. - image-recipe/build-auto-installer-iso.sh: installer ISO registries use .168 (both podman registries.conf and backend registries.json). Tests updated to assert on the new 2-entry default lists (registry + mirror). URL-parser fixture tests in update.rs retain .23 strings — they exercise string-parsing logic, not mirror policy. Git remotes: dropped `gitea-vps` and the .23 push URL on the `origin` multi-push alias (not part of this commit — pure working-copy change).	2026-04-23 08:22:32 -04:00
archipelago	980c1b25f4	fix(install): kick scanner post-install so Launch button appears immediately After install completes, the async-spawn wrapper wrote state=Running but the skeletal install-time manifest (interfaces: None) persisted until the next scheduled 60s scan. The frontend saw state=running but hasUI=false and hid the Launch button for up to a full minute. Add a shared Notify/watch pair between RpcHandler and the scan loop: - scan_kick (Notify): scan loop selects! between the 60s interval and this notify, running immediately on either. - scan_tick (watch<u64>): scan loop bumps the counter after each completed scan so callers can await completion. Install and update success paths now call kick_scanner_and_wait before flipping to Running. The scan merges via merge_preserving_transitional (state stays Installing/Updating, manifest refreshed from live podman with interfaces.main.ui populated from real port bindings). 2s timeout falls back to pre-fix behavior on slow podman — no regression.	2026-04-23 07:59:03 -04:00
archipelago	7e62ea07f7	feat(install): phase-based progress bar replaces unparseable pull bytes Podman emits zero parseable progress when stderr is piped (no TTY), so the old byte-counter regex never matched in real installs. Users saw 0% for the whole pull, then a jump to 95%, then silence through create-container, health-check, and post-install hooks. Replace with 7 explicit lifecycle phases wired through install.rs and update.rs: Preparing (5%), PullingImage (20%), CreatingContainer (70%), StartingContainer (80%), WaitingHealthy (88%), PostInstall (95%), Done (100%). Each maps to a fixed UI progress and status message. Frontend PHASE_INFO mapper in stores/server.ts prioritizes phase when present, falls back to byte-counter for legacy. A Math.max forward-only guard ensures the bar never regresses. Deleted the duplicate watcher in Discover.vue that was fighting the store's watcher with stale byte logic. Added shimmer CSS on the fill (with prefers-reduced-motion opt-out) so the bar looks alive during long phases.	2026-04-23 07:58:43 -04:00
archipelago	576ff1a6de	docs(status): mark install/uninstall/update async-spawn as shipped	2026-04-23 06:58:45 -04:00
archipelago	49b98e0271	fix(rpc): empty icon in transient install entry to avoid broken-image flicker create_installing_entry hardcoded /assets/img/app-icons/<id>.png for every new install. About half the app icons ship as .svg or .webp (lnd.svg, vaultwarden.webp, bitcoin-knots.webp, mempool.webp), so the browser 404s on the wrong extension and renders the default broken-image glyph for the 10-30s window before the scanner refreshes with real manifest data. Send empty icon. The frontend's icon computed in AppCard.vue falls through to curatedMap which has correct extensions for bundled apps, and handleImageError still guards any remaining misses with a placeholder SVG.	2026-04-23 06:58:12 -04:00
archipelago	702b5d64d3	fix(ui): shorten install/uninstall/update timeouts for async RPCs With the backend flipped to async-spawn, install/uninstall/update return immediately with a { status, package_id } envelope. Client timeouts of 45m/11m were a leftover from synchronous handlers and masked real RPC failures. Drop all install/uninstall/update RPC timeouts to 15s. Progress and terminal state still arrive through the live state stream — the RPC only needs to confirm the spawn was accepted. Return-type annotations updated in rpc-client.ts and stores/server.ts. Five direct rpcClient.call sites across Marketplace.vue, Discover.vue, and MarketplaceAppDetails.vue updated with the shorter timeout.	2026-04-23 06:58:02 -04:00
archipelago	1ad889608f	feat(rpc): async-spawn install/uninstall/update lifecycle Extend the async-spawn treatment previously shipped for Stop/Start/Restart to the three remaining long-running lifecycle RPCs. Each wrapper validates params, rejects duplicate in-flight ops, flips state to the transitional variant (Installing/Removing/Updating), then spawns the existing inner handler on tokio. RPC returns immediately with { status, package_id }; the spawn task owns the terminal state write. Install and update success arms explicitly set state=Running. The scan loop merge (merge_preserving_transitional) refuses to overwrite transitional states, so the spawn task must write the terminal state. Uninstall's inner handler removes the entry entirely, so no explicit terminal write is needed there. Dispatcher and handler now thread self as Arc<Self> / &Arc<Self> so spawned tasks can hold their own Arc without extra field cloning. Transient install entry uses empty icon string. Hardcoding /assets/img/app-icons/<id>.png 404s for apps that ship .svg or .webp assets, which produces a broken-image flicker until the scanner refreshes with manifest data. Empty string causes the frontend's icon computed to fall through to the curated map, which has correct extensions. Removed the inner "already updating" guard in update.rs — the wrapper now owns duplicate-op detection for all three operations.	2026-04-23 06:57:50 -04:00
archipelago	0ea4f96de9	docs(status): mark async-spawn lifecycle fix as shipped Records the four landed commits, the .228 deploy (binary + frontend paths, backups, md5), the manual LND Stop verification, and the rollback incantation. Leaves the older "NEXT SESSION" design block in place as historical reference with a note that it's stale. Adds a follow-ups list: chaos matrix is now unblocked, bundled-app RPCs are still sync (deprecate or mirror-async?), transitional_since is in-memory only, and there are 22 pre-existing test failures in unrelated modules that should get their own cleanup pass.	2026-04-23 05:30:45 -04:00
archipelago	a8158b1ef5	fix(ui): single-button lifecycle control with transitional labels The app card and details view previously used a pair of Start/Stop buttons whose labels were driven off isAppLoading(), a client-side "I just clicked the button" flag. When the backend's graceful stop took longer than the RPC round-trip (up to 600s on bitcoin-core), the flag cleared while the container was still shutting down, the UI flipped back to "Running" as soon as the next 10s scan saw the still-alive container, and the user had no indication the stop was still in flight. Now that the backend flips PackageState to Stopping / Starting / Restarting / Installing / Updating / Removing for the duration of each lifecycle operation and the scan loop preserves those states, the UI can drive its label off the container state itself. A single full-width primary button replaces the Start/Stop pair. Its label, color, and disabled state come from getAppVisualState(), which collapses resting states (exited/created/paused/installed) into "stopped" and passes transitional states through untouched. Changes: - container-client.ts: widen ContainerStatus.state union to include the six transitional variants plus "installed". Add restartContainer() calling the new container-restart RPC. - stores/container.ts: add getAppVisualState() computed and the restartContainer() action. - ContainerApps.vue: single primary button (Start / Stop / Starting / Stopping / Restarting etc.) plus a separate circular Restart button visible only when running. Critically, handleStartApp and handleStopApp now route through store.startContainer and stopContainer (which call container-start / container-stop, the async RPCs) instead of the legacy synchronous bundled-app-start / bundled-app-stop path. Transitional-state polling widened from just "created" to the full set of transitional variants. 
- ContainerAppDetails.vue: same single-button pattern, Restart button now calls container-restart instead of the old stop-sleep-start sequence, added 2s polling interval for transitional states. - components/ContainerStatus.vue: widen state prop to match the shared union, render transitional labels with a trailing ellipsis and a yellow dot. No new tests — this is presentation logic. Manual verification on .228 will confirm the end-to-end async path: click Stop on LND, button becomes "Stopping" in under a second, stays that way for roughly 5 minutes, then flips to "Start" with a grey dot. The UI must never revert to "Running" mid-stop.	2026-04-23 05:20:15 -04:00
archipelago	cd69c3b2f6	fix(state): preserve transitional state across container scans The 30s package scan loop used to blindly overwrite every package entry from podman inspect. While a user-initiated Stop / Start / Restart was in flight, the RPC spawn task would flip the state to Stopping / Starting / Restarting, the next scan would see podman still reporting "running" (for the duration of the graceful stop, up to 600s for bitcoin-core), and clobber the transitional state back to Running. The dashboard would then flip Running -> Stopping -> Running -> Stopped, making it look like the stop had silently failed until it eventually completed. The merge loop now treats transitional variants (Stopping, Starting, Restarting, Installing, Updating, Removing, and the three backup variants) as owned by the RPC spawn task. For those variants, merge_preserving_transitional keeps the existing state while still taking live observability fields (health, exit_code, installed, lan_address, manifest, static_files, available_update) from the fresh scan so the UI continues to see live health readings. Adds an escape hatch via a per-scan transitional_since side table: if a package has been in a transitional state for more than 1200s (2x the longest graceful stop at 600s on bitcoin-core), the scan loop assumes the spawn task died without cleanup and overrides with podman's live state. Prevents a crashed background task from wedging a package in Stopping forever. Three unit tests cover the merge rule, the observability passthrough, and the transitional-variant classifier.	2026-04-23 05:15:13 -04:00
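
The merge rule described in cd69c3b2f6 can be sketched in plain Rust. This is a minimal illustration, not the repository's code: the name merge_preserving_transitional and the 1200s limit come from the commit message, while the PackageEntry fields, the function signature, the is_transitional helper, and the TRANSITIONAL_TIMEOUT constant are assumptions made for the example. The three backup variants and the remaining observability fields are left out for brevity.

```rust
use std::time::Duration;

// Hypothetical subset of the states named in the commit message
// (the three backup variants are omitted here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PackageState {
    Running,
    Stopped,
    Stopping,
    Starting,
    Restarting,
    Installing,
    Updating,
    Removing,
}

impl PackageState {
    // Transitional states are owned by the RPC spawn task, not by the scan loop.
    fn is_transitional(self) -> bool {
        !matches!(self, PackageState::Running | PackageState::Stopped)
    }
}

// Hypothetical entry shape: one state plus two of the live observability fields.
#[derive(Debug, Clone, PartialEq)]
struct PackageEntry {
    state: PackageState,
    health: Option<String>,
    exit_code: Option<i32>,
}

// 1200s, i.e. 2x the longest graceful stop (600s) mentioned in the commit.
const TRANSITIONAL_TIMEOUT: Duration = Duration::from_secs(1200);

// Assumed signature: `in_transition_for` stands in for a lookup in the
// transitional_since side table.
fn merge_preserving_transitional(
    existing: &PackageEntry,
    fresh: &PackageEntry,
    in_transition_for: Duration,
) -> PackageEntry {
    // Keep the transitional state unless it has been held for more than the
    // timeout, in which case the spawn task is presumed dead.
    let keep_state =
        existing.state.is_transitional() && in_transition_for <= TRANSITIONAL_TIMEOUT;
    PackageEntry {
        state: if keep_state { existing.state } else { fresh.state },
        // Observability fields always come from the fresh scan.
        health: fresh.health.clone(),
        exit_code: fresh.exit_code,
    }
}
```

Under these assumptions, an entry in Stopping merged with a fresh scan reporting Running keeps Stopping for the first 1200s while still picking up the new health reading. Past that limit the scan's state wins, so a crashed background task cannot leave the package wedged.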

1002 Commits