archy/docs/CONTAINER_LIFECYCLE_HANDOFF.md
2026-06-11 00:24:54 -04:00

Container Lifecycle Handoff

Last updated: 2026-06-08

2026-06-08 1.8-alpha Release Gate Update

  • Target release is now 1.8-alpha, including a cut and smoke-tested ISO after validation is green.
  • Current release readiness estimate is about 82%.
  • Host reboot validation is not clean yet. The user reported that after a reboot test IndeeHub was left stopped, many containers were killed by SIGKILL during reboot/shutdown, one crashed, and a couple were stopped.
  • Treat post-reboot recovery as the active release blocker.
  • IndeeHub is not considered recovered unless:
    • the stack containers recover after boot;
    • http://192.168.1.198:7778/ is reachable;
    • the HTML includes /nostr-provider.js;
    • http://192.168.1.198:7778/nostr-provider.js is served and looks like the Nostr signer bridge.
  • Local follow-up in progress:
    • core/archipelago/src/container/prod_orchestrator.rs now hardens IndeeHub stack reconcile by starting existing backend containers through a user scope when possible, waiting for backend/API dependency readiness, restarting the frontend when it does not remain running/reachable, and checking host port 7778;
    • tests/lifecycle/remote-lifecycle.sh now validates the IndeeHub Nostr provider during launch probes;
    • core/container/src/manifest.rs now has stricter package safety validation while preserving all current real manifests.
  • Validation passed locally for this follow-up:
    • cargo fmt --manifest-path core/Cargo.toml --all;
    • cargo test --manifest-path core/Cargo.toml -p archipelago-container (45 passed);
    • cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container;
    • a filtered run, cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago indeedhub, compiled and ran one matching existing test;
    • bash -n tests/lifecycle/remote-lifecycle.sh;
    • git diff --check.
  • Passing criterion after deploy:
    • minimum: 3 consecutive clean post-fix reboots, broad non-destructive lifecycle green after each;
    • preferred before release: 5 consecutive clean post-fix reboots, broad lifecycle green after each;
    • SIGKILL during shutdown is not automatically disqualifying if all managed apps recover and pass health/launch after boot, but any stopped/crashed/unreachable managed app after boot fails that iteration.
  • Final release gate after reboot validation: cut the 1.8-alpha ISO and smoke-test boot/install/backend/UI/catalog/focused app lifecycle.
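
The passing criterion above can be tracked mechanically. The following is a minimal sketch, assuming each reboot iteration is summarised as a mapping of app name to post-boot state plus a lifecycle result; the state names and function names are hypothetical and not taken from the test harness.

```python
from typing import Dict, List

MINIMUM_CLEAN_REBOOTS = 3    # minimum from the passing criterion
PREFERRED_CLEAN_REBOOTS = 5  # preferred before release

# Hypothetical post-boot states that fail an iteration.
FAILING_STATES = {"stopped", "crashed", "unreachable"}


def iteration_is_clean(app_states: Dict[str, str], lifecycle_green: bool) -> bool:
    """An iteration is clean when the broad lifecycle run is green and no
    managed app ends in a failing state. SIGKILL during shutdown is
    deliberately not an input, since it is not disqualifying on its own."""
    if not lifecycle_green:
        return False
    return not any(state in FAILING_STATES for state in app_states.values())


def trailing_clean_streak(results: List[bool]) -> int:
    """Count consecutive clean iterations at the end of the history."""
    streak = 0
    for clean in reversed(results):
        if not clean:
            break
        streak += 1
    return streak


def gate_status(results: List[bool]) -> str:
    """Map the current streak onto the minimum/preferred thresholds."""
    streak = trailing_clean_streak(results)
    if streak >= PREFERRED_CLEAN_REBOOTS:
        return "preferred"
    if streak >= MINIMUM_CLEAN_REBOOTS:
        return "minimum"
    return "blocked"
```

Only the trailing streak counts, so a single failed iteration resets progress toward both thresholds.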

2026-06-08 Focused Blocker Validation After 06420c...

  • Deployed backend 4108ca146b482c028ae8d7c4bec314b71ef3412f15efd2e61846a2c345b36aba, then backend 06420c0377fff650a2bf3211f13c1e0754bf8df81345b8485f4c9a30cb552439 to .198.
  • Both deploys restarted only archipelago.service; archipelago-doctor.timer and archipelago-reconcile.timer stayed inactive. No reboot and no broad Podman store/image commands were run.
  • Local fixes included:
    • targeted Podman remove fallback for stuck removing/stopping records;
    • rootless Podman socket liveness check by Unix connection, not path existence;
    • IndeeHub readiness fallback to platform network aliases when getent inside the API image cannot prove DNS;
    • Tailscale launch harness now requires login/auth UI content;
    • stricter manifest validation while preserving all real manifests.
  • Validation passed locally:
    • cargo fmt --manifest-path core/Cargo.toml --all;
    • cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container;
    • cargo test --manifest-path core/Cargo.toml -p archipelago-container (45 passed);
    • bash -n tests/lifecycle/remote-lifecycle.sh;
    • git diff --check.
  • .198 is still not release-ready after 06420c...:
    • indeedhub: stuck stopping, launch 7778 returns 000;
    • immich: starting, launch 2283 returns 000;
    • tailscale: running, launch 8240 returns 000; logs show NeedsLogin/WantRunning=false, and launch must present the Tailscale login/auth UI;
    • vaultwarden: absent/not listed after start attempt, launch 8082 returns 000;
    • portainer: running, launch 9000 returns 000; user confirmed Portainer environment wizard cannot connect to unix:///var/run/docker.sock;
    • btcpay-server: not a current blocker; direct launch 23000 returned HTTP 200, and the user confirmed the earlier report came from checking the wrong server and from slowness.
  • Do not continue to reboot validation or ISO cutting until rootless Podman control-plane/socket health, stuck container-state cleanup, and app-screen launch contracts are fixed.
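
The socket liveness fix listed above (check by Unix connection, not path existence) can be illustrated as follows. This is a minimal Python sketch of the idea, not the Rust implementation; the function name and the default timeout are assumptions.

```python
import socket


def podman_socket_is_live(path: str, timeout: float = 2.0) -> bool:
    """Return True only if something accepts a connection on the Unix socket.

    A stale socket file left behind by a dead service passes a
    path-existence check but is refused here.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        # Covers a missing path, refused connection, permission denied,
        # and timeout.
        return False
    finally:
        sock.close()
```

A caller would pass the rootless socket path, for example /run/user/1000/podman/podman.sock as seen in the logs elsewhere in this document.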

2026-06-08 .198 Release Candidate State Check

  • Deployed backend hash 7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412 to .198 after the targeted image-probe mitigation.

  • Previous live backend hash before deploy was 95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de.

  • Deployment notes:

    • local release build passed: cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release;
    • after a timestamped backup was created, the initial direct cp over /usr/local/bin/archipelago failed with Text file busy;
    • recovered by installing to /usr/local/bin/archipelago.new, atomically renaming it over /usr/local/bin/archipelago, and restarting only archipelago.service;
    • no host reboot and no broad Podman store/image commands were run.
  • Latest mitigation now live on .198:

    • core/container/src/runtime.rs uses bounded targeted podman image inspect for ContainerRuntime::image_exists();
    • core/archipelago/src/api/rpc/package/install.rs uses bounded targeted podman image inspect for local fallback and post-pull verification;
    • core/archipelago/src/container/companion.rs uses podman image inspect for companion image checks.
  • Validation passed on live hash 7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412:

    • focused non-destructive lifecycle: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism,fedimint,indeedhub ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh;
    • broad non-destructive lifecycle: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh;
    • python3 scripts/check-app-catalog-drift.py --release --strict reports metadata_drift=0, missing_catalog=0, missing_manifests=0.
  • Final .198 state:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 7e82532137292e91111f63819d1be7fa69f994ce20d6b5e0194915f194f20412.
    • /: 66% used, about 9.6G free.
    • /var/lib/archipelago: 8% used, about 375G free.
  • Startup logs still showed one known scan timeout (podman ps -a --format json timed out after 30s) followed by scan backoff; lifecycle validation passed anyway. Treat Podman socket/store health as a residual release risk, but release image probes are now quarantined from the known fragile image-existence/list commands.

  • Remaining release gate: host reboot validation, only if explicitly approved.

  • Verified .198 without running broad Podman store/image commands.

  • Current local release binary and live /usr/local/bin/archipelago match hash 670a3e789540082437c7521cc5ad7a4c260f56ee8e0a9cf770160fa25b4e4644.

  • archipelago.service is active.

  • archipelago-doctor.timer is inactive.

  • archipelago-reconcile.timer is inactive.

  • / is at 65% used with about 9.9G free.

  • /var/lib/archipelago is at 10% used with about 370G free.

  • Backend-restart validation was already recorded as passed in the release-candidate checkpoint. The remaining live validation gate is host reboot validation, only if explicitly approved.

  • Continue avoiding podman image list, podman system df, broad podman image exists, podman image prune, and podman volume prune on .198 while the store/socket health risk is unresolved.
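
The Text file busy recovery in the deployment notes for this section (install to a .new path, then rename over the live binary) follows a general pattern. The sketch below shows that pattern in Python under the assumption that the staged file sits in the same directory as the target; the function name is hypothetical and the actual deploy used shell commands.

```python
import os
import shutil


def install_atomically(new_binary: str, target: str) -> None:
    """Stage the new file beside the target, then rename it into place.

    Writing over a running executable can fail with "Text file busy"
    (ETXTBSY). A rename swaps the directory entry instead, so the running
    process keeps its old inode until it is restarted.
    """
    staged = target + ".new"
    shutil.copy2(new_binary, staged)  # same directory, so same filesystem
    os.chmod(staged, 0o755)
    os.replace(staged, target)        # atomic rename on POSIX
```

The running service still has to be restarted afterward to pick up the new binary, which matches the restart of only archipelago.service noted above.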

2026-06-08 Local Release Gate Completion

  • No .198 host actions were performed in this pass: no reboot, no timer changes, no deploy, no Podman store-wide commands.
  • Fixed scanner skip/backoff wakeups so skipped scans still advance the scan-completion watch counter for install/update waiters.
  • Fixed local full-test blockers:
    • crash-recovery unit tests now pass the include_stack_members flag and cover generic-vs-stack recovery behavior;
    • runtime manifest-port lookup checks the workspace apps/ directory via CARGO_MANIFEST_DIR, so new public manifests are visible from test/runtime working directories;
    • journal disk usage parsing accepts compact journalctl output such as 463.9M;
    • boot-reconciler cadence tests bypass the global crash-recovery wait gate when using the existing test-only without_companion_stage() helper.
  • Local validation passed:
    • cargo fmt --manifest-path core/Cargo.toml --all.
    • cargo test --manifest-path core/Cargo.toml -p archipelago --bin archipelago (688 passed).
    • cargo test --manifest-path core/Cargo.toml -p archipelago-container (43 passed).
    • cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container.
    • cargo check --manifest-path core/Cargo.toml -p archipelago-performance -p archipelago-security.
    • cargo test --manifest-path core/Cargo.toml -p archipelago-performance -p archipelago-security (12 security tests passed; performance has no tests).
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release.
    • python3 scripts/generate-app-catalog.py (updated 0 fields).
    • python3 scripts/check-app-catalog-drift.py --release --strict.
    • python3 -m py_compile scripts/generate-app-catalog.py scripts/check-app-catalog-drift.py scripts/app-catalog-image-smoke-test.py.
    • git diff --check.
    • cmp -s app-catalog/catalog.json neode-ui/public/catalog.json.
  • Remaining live gate is unchanged: host reboot validation on .198, only if explicitly approved.
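
The journal disk usage fix above concerns compact sizes such as 463.9M. A minimal sketch of such a parser follows, assuming binary (1024-based) units and that the first size-like token in the text is the one wanted; the function name and regular expression are illustrative, not the Rust code.

```python
import re
from typing import Optional

# Assumption: sizes use 1024-based units.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([BKMGT])(?:i?B)?\b", re.IGNORECASE)


def parse_journal_usage_bytes(text: str) -> Optional[int]:
    """Extract the first size such as '463.9M' or '1.2 GiB' as bytes."""
    match = _SIZE_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).upper()
    return int(value * _UNITS[unit])
```

Returning None for unparseable input lets the caller decide whether to skip the check or report an error.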

2026-06-08 Frontend Release Gate Completion

  • No .198 host actions were performed in this pass: no reboot, no timer changes, no deploy, no Podman store-wide commands.
  • Fixed mobile app-launch behavior in neode-ui/src/stores/appLauncher.ts:
    • desktop still opens X-Frame-Options/new-tab apps directly in a new tab;
    • mobile now routes those same apps through app-session so app icons keep users inside Archipelago;
    • router return-path handling is defensive when currentRoute is unavailable.
  • Updated frontend tests for current launch behavior and fixed async/Pinia fixture setup.
  • Local validation passed:
    • npm run type-check.
    • npm test (548 passed).
    • npm run build.
    • python3 scripts/generate-app-catalog.py (updated 0 fields).
    • python3 scripts/check-app-catalog-drift.py --release --strict.
    • python3 -m py_compile scripts/generate-app-catalog.py scripts/check-app-catalog-drift.py scripts/app-catalog-image-smoke-test.py.
    • cmp -s app-catalog/catalog.json neode-ui/public/catalog.json.
    • git diff --check.
  • Local caveat: npm ci failed before checks because existing neode-ui/node_modules/@alloc entries are root:root; do not mutate ownership or remove the tree without explicit approval.

2026-06-08 Local Podman Store-Risk Cleanup

  • Reviewed release-relevant Podman store/image call sites without running broad Podman store/image commands on .198.
  • Bounded stack installer image pulls in core/archipelago/src/api/rpc/package/stacks.rs with kill_on_drop and a 600s timeout.
  • Bounded manual package update image pulls in core/archipelago/src/api/rpc/package/update.rs with kill_on_drop and a 600s timeout while preserving stderr progress parsing.
  • Validation passed locally:
    • python3 scripts/check-app-catalog-drift.py --release --strict.
    • cargo fmt from core/.
    • cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release.
  • Local release binary hash after this cleanup is a52a87474c9a788e058ee1da1edd6091ab305594a53e7a153889f77041598ff4.
  • This local build has not been deployed to .198; live .198 remains on 670a3e789540082437c7521cc5ad7a4c260f56ee8e0a9cf770160fa25b4e4644 unless a later checkpoint says otherwise.
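
The bounded image pulls described above pair a hard deadline with process cleanup. The sketch below shows the same idea in Python, where subprocess.run kills and reaps the child when its timeout expires, which plays the role kill_on_drop plays in the Rust code. The function name is hypothetical, and unlike the real update path this sketch only returns stderr at the end instead of parsing progress as it streams.

```python
import subprocess
from typing import List, Tuple

PULL_TIMEOUT_SECONDS = 600  # matches the 600s bound described above


def run_bounded(argv: List[str], timeout: float = PULL_TIMEOUT_SECONDS) -> Tuple[bool, str]:
    """Run a command with a hard deadline and report (success, detail)."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return False, f"timed out after {timeout}s"
    except OSError as err:
        return False, f"failed to start: {err}"
    return done.returncode == 0, done.stderr
```

A pull would then look like run_bounded(["podman", "pull", image]), with the image reference supplied by the caller.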

2026-06-08 .198 Podman Pull Hardening Deploy

  • Deployed backend hash a52a87474c9a788e058ee1da1edd6091ab305594a53e7a153889f77041598ff4 to .198.
  • Previous backend was backed up under /usr/local/bin/archipelago.backup-20260608-store-risk-* before replacement.
  • Restarted only archipelago.service; no host reboot was performed.
  • No broad Podman store/image commands were run.
  • The initial systemctl restart exceeded the local 120s wrapper timeout while startup was still in progress, but the backend reached Server listening and systemd then settled to active/running.
  • Final .198 state:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: a52a87474c9a788e058ee1da1edd6091ab305594a53e7a153889f77041598ff4.
    • /: 65% used, about 9.8G free.
    • /var/lib/archipelago: 10% used, about 370G free.
  • Validation passed:
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint,immich,indeedhub,photoprism ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • python3 scripts/check-app-catalog-drift.py --release --strict.
  • Remaining release gate: host reboot validation, only if explicitly approved.

2026-06-08 .198 App Health and Port Recovery

  • Deployed backend hash 95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de to .198.
  • Fedimint Guardian and File Browser were reachable but UI package-data reported health=starting; backend scanner now normalizes reachable running apps to healthy and restores the launch URL when the direct port is reachable.
  • Nostr relay had been using host port 8081, which conflicted with Nginx Proxy Manager admin launch. Updated apps/nostr-rs-relay/manifest.yml to use host port 18081.
  • Recovered live Nostr/NPM state:
    • Nginx Proxy Manager admin UI responds on http://127.0.0.1:8081/.
    • Nostr relay responds on http://127.0.0.1:18081/ with the expected Nostr-client message.
  • Hardened legacy install runtime for scoped web apps: use podman create followed by systemd-run --user --scope podman start so containers are not coupled to archipelago.service, while install RPCs do not hang on scoped podman run -d.
  • Recovered IndeedHub after broad validation found it stopped:
    • indeedhub-minio had stopped, causing the frontend nginx container to exit with host not found in upstream "minio".
    • Restarted existing indeedhub-minio with preserved volume data and restarted the frontend.
    • http://127.0.0.1:7778/ returned HTTP 200 afterward.
  • Validation passed:
    • cargo check --manifest-path core/Cargo.toml -p archipelago -p archipelago-container.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release.
    • python3 scripts/check-app-catalog-drift.py --release --strict.
    • Focused lifecycle for indeedhub,nginx-proxy-manager,nostr-rs-relay,fedimint,filebrowser.
    • Broad non-destructive lifecycle: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 95dfd8530ae9621b2f16da05d2229fe40bed7e5f6e2097cf4c87000fe97b92de.
    • /: 65% used, about 9.6G free.
    • /var/lib/archipelago: 10% used, about 370G free.
  • Remaining release gate: host reboot validation, only if explicitly approved.
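
The 8081 conflict between the Nostr relay and Nginx Proxy Manager described above is the kind of problem a simple cross-manifest check can catch before deploy. This is a minimal sketch, assuming host ports have already been extracted from each manifest into a mapping; the function name and the input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_host_port_conflicts(app_ports: Dict[str, Iterable[int]]) -> Dict[int, List[str]]:
    """Return each host port claimed by more than one app, with its claimants."""
    owners = defaultdict(set)
    for app, ports in app_ports.items():
        for port in ports:
            owners[port].add(app)
    return {port: sorted(apps) for port, apps in owners.items() if len(apps) > 1}
```

With the relay on 8081 the check reports both apps for that port; after the move to 18081 it returns an empty mapping.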

2026-06-04 .198 IndeedHub and Immich Lifecycle Recovery

  • Deployed backend hash 89dfc3d4e801b35564dc8dc7f4a513028eb7e2027b586e8aad7a0f374e20d6a9 to .198.
  • Fixed IndeedHub frontend startup sequencing so network alias repair is only applied immediately before the frontend starts, after indeedhub-minio, indeedhub-redis, and indeedhub-api are running.
  • Fixed Immich lifecycle recovery on .198:
    • dependency readiness now accepts healthy Podman health state for immich_postgres and immich_redis before falling back to slower podman exec probes;
    • immich_server startup now repairs /var/lib/archipelago/immich ownership through podman unshare chown -R 0:0, preserving existing upload data while matching the current rootless container user mapping;
    • this resolved the observed EACCES failure writing /usr/src/app/upload/encoded-video/.immich.
  • Diagnosis notes:
    • Broad audit initially failed only on Immich (state=exited); focused Fedimint and NetBird audits passed.
    • The patched dependency wait moved the lifecycle past the dependency stage to Starting container: immich_server.
    • Upload ownership repair allowed Immich API and microservices to remain running; direct http://127.0.0.1:2283/ returned HTTP 200.
  • Verification on this hash:
    • cargo check --manifest-path core/Cargo.toml -p archipelago passed.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release passed.
    • Focused IndeedHub audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=indeedhub ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Focused Fedimint audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
    • Focused NetBird audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=netbird ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
    • Focused Immich audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=immich ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state after validation:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 89dfc3d4e801b35564dc8dc7f4a513028eb7e2027b586e8aad7a0f374e20d6a9.
  • Residual risk:
    • .198 still shows intermittent podman ps -a --format json timed out after 30s and transient Bitcoin RPC timeouts under load; keep avoiding store-wide Podman commands and treat Podman socket/store health as a separate release hardening item.
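
The Immich dependency readiness change above (accept a healthy Podman health state first, fall back to slower exec probes only when needed) can be sketched as below. The two callables stand in for the real health lookup and podman exec probe; their names and signatures are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Optional


def dependency_ready(
    name: str,
    health_state: Callable[[str], Optional[str]],
    exec_probe: Callable[[str], bool],
) -> bool:
    """Fast path on reported health, slow path only when that is inconclusive."""
    if health_state(name) == "healthy":
        return True
    return exec_probe(name)


def unready_dependencies(
    deps: Iterable[str],
    health_state: Callable[[str], Optional[str]],
    exec_probe: Callable[[str], bool],
) -> List[str]:
    """List dependencies that are not yet ready, preserving input order."""
    return [d for d in deps if not dependency_ready(d, health_state, exec_probe)]
```

The dependent container, such as immich_server, would be started only once unready_dependencies returns an empty list.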

2026-06-03 .198 Generic Host-Port Health Checkpoint

  • Latest local Podman store-risk mitigation, pending deploy to .198:

    • core/container/src/runtime.rs now implements ContainerRuntime::image_exists() with bounded targeted podman image inspect instead of podman image exists.
    • core/archipelago/src/api/rpc/package/install.rs now verifies local fallback images and post-pull images with bounded targeted podman image inspect instead of podman images -q.
    • core/archipelago/src/container/companion.rs now uses podman image inspect instead of podman image exists.
    • A grep across core/**/*.rs finds no live Rust call sites for podman image exists or podman images -q; only an explanatory comment remains.
    • Validation passed: cargo fmt --all --check, cargo check -p archipelago-container, cargo check -p archipelago, CARGO_INCREMENTAL=0 cargo check -p archipelago --tests, cargo test -p archipelago-container, and whitespace check for the changed files.
    • A filtered cargo test -p archipelago install_fresh_build did not reach execution due to local compile/link slowness/artifact failure; --tests compilation passed afterward.
  • Deployed backend hash 14d360a206d1e58f287c5722d709dace0284b0dea56b66aa4bce0f57c631631b to .198 after release code-review/refactor cleanup of legacy runtime host-port repair.

  • Reduced duplicated app-specific port repair logic in core/archipelago/src/api/rpc/package/runtime.rs:

    • legacy package start/restart repair now derives host ports from apps/*/manifest.yml when available;
    • hardcoded ports remain only as fallback for legacy/non-manifest apps and for extra legacy cleanup ports such as Gitea 3000 and Nginx Proxy Manager 8084/8444;
    • the old duplicate Gitea cleanup helper was removed;
    • focused unit coverage was added for manifest-derived runtime ports and legacy extra ports.
  • Verification on this hash:

    • cargo check --manifest-path core/Cargo.toml -p archipelago passed.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release passed.
    • Focused runtime_host_ports test was added but local cargo test ... runtime_host_ports did not complete within 5 minutes during compilation, consistent with known local test/linker slowness.
    • Targeted PhotoPrism audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state after validation:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 14d360a206d1e58f287c5722d709dace0284b0dea56b66aa4bce0f57c631631b.
  • Catalog metadata generation is now implemented:

    • Added scripts/generate-app-catalog.py to sync manifest-owned fields into both app-catalog/catalog.json and neode-ui/public/catalog.json while preserving catalog-only presentation/runtime fields.
    • Corrected stale manifest metadata for public catalog apps where the manifest was behind production catalog/image values: BotFights, IndeeHub, Gitea icon/repo, LND title/image, ElectrumX image, Fedimint image, and Mempool title/version/image.
    • Ran generator; canonical and UI catalogs now match byte-for-byte.
    • Release drift gate is green: python3 scripts/check-app-catalog-drift.py --release --strict reports metadata_drift=0, missing_catalog=0, missing_manifests=0.
    • Validation passed: jq empty app-catalog/catalog.json neode-ui/public/catalog.json, cargo test --manifest-path core/Cargo.toml -p archipelago-container, cargo check --manifest-path core/Cargo.toml -p archipelago, and npm run build from neode-ui.
  • Deployed backend hash eaa83c30467acd42ad864a8e0ea0d5fd88b94b775a06bfcdc460c4b0cd8e75b2 to .198 after a narrow Podman store-risk hardening pass.

  • Hardened fresh local-build installs so podman image exists <local-build-tag> failures/timeouts no longer fail the lifecycle operation outright:

    • existing timeout remains bounded in the runtime;
    • install_fresh() now logs the check failure and rebuilds the local image instead;
    • this matches the existing drift-restart path and keeps local image store checks from becoming release-blocking.
  • Verification on this hash:

    • cargo check --manifest-path core/Cargo.toml -p archipelago passed.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release passed.
    • Focused unit test install_fresh_builds_when_image_exists_check_fails was added but local cargo test ... did not complete within 15 minutes during compilation, consistent with known local test/linker slowness.
    • Targeted PhotoPrism audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state after validation:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: eaa83c30467acd42ad864a8e0ea0d5fd88b94b775a06bfcdc460c4b0cd8e75b2.
  • Deployed backend hash be95ea91339a7fb0a3b20d0ae5d816dca220d5e5ca86838cc0ba50b609ad7b36 to .198 after hardening container-health fallback behavior.

  • Fixed the broad lifecycle timeout path where container-health could return Failed to get container health even though the app endpoint was reachable:

    • cached_reachable_health() now parses URL ports correctly when launch URLs include a trailing slash, such as http://localhost:2342/.
    • The fallback port map now covers the lifecycle launch apps, including PhotoPrism 2342, BTCPay 23000, LND UI 18083, Mempool 4080, Electrum 50002, Fedimint 8175, Gitea 3001, IndeedHub 7778, Ollama 11434, Vaultwarden 8082, Tailscale 8240, and others.
    • Reachable cached-running apps can now return healthy without depending on flaky Podman health/inspect paths.
  • Verification on this hash:

    • cargo check --manifest-path core/Cargo.toml -p archipelago passed.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release passed.
    • Targeted PhotoPrism audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_STABILITY_SECONDS=1 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state after validation:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: be95ea91339a7fb0a3b20d0ae5d816dca220d5e5ca86838cc0ba50b609ad7b36.
    • /: 62% used, about 11G free.
    • /var/lib/archipelago: 9% used, about 370G free.
  • Remaining blockers:

    • Podman socket/store health is still a release risk; continue avoiding broad store/image commands on .198.
    • Backend-restart and host-reboot validation are still pending and should be run only when approved.
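
The trailing-slash fix in cached_reachable_health() noted in this section comes down to parsing the launch URL properly instead of splitting on the last colon. A minimal Python sketch of robust port extraction follows; the function name and the default-port fallback are assumptions, not the Rust behavior.

```python
from typing import Optional
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def launch_url_port(url: str) -> Optional[int]:
    """Return the TCP port of a launch URL, with or without a trailing slash."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for non-numeric or out-of-range ports
    except ValueError:
        return None
    if port is not None:
        return port
    # Assumption: fall back to the scheme default when no port is given.
    return _DEFAULT_PORTS.get(parts.scheme)
```

Both http://localhost:2342/ and http://localhost:2342 yield 2342 with this approach.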

2026-06-03 .198 Generic Host-Port Health Checkpoint In Progress

  • Deployed backend hash 3912b900c376b6c28bf5453640cae82135f67d7e0f984b8adcc78064b924143b to .198.
  • This pass is explicitly aligned with the migration objective: use generic platform primitives from manifest/container-declared ports instead of adding more OS-level or app-specific package edits.
  • Broad lifecycle on previous hash d21202cd... failed only because Uptime Kuma briefly appeared as stopping during listener repair; it recovered immediately afterward with 3002 listening and HTTP 302.
  • Implemented generic health-monitor host-port awareness:
    • Health monitor now parses Podman JSON Ports host TCP bindings for each container.
    • A running container with declared host TCP ports is not considered healthy if those host listeners are missing.
    • This avoids a hardcoded app-to-port list and makes missing pasta/rootless listeners a generic recovery concern.
  • Also fixed scanner merge semantics:
    • Stopping -> Running now recovers immediately when there is no user-stopped marker.
    • User-initiated stops still preserve Stopping over live Running while the stop is in progress.
  • Verification so far:
    • cargo check --manifest-path core/Cargo.toml -p archipelago passed.
    • cargo build --manifest-path core/Cargo.toml -p archipelago --bin archipelago --release passed.
    • Live service state after deploy: archipelago.service active; doctor/reconcile timers inactive.
    • After backend restart, Uptime Kuma recovered its 3002 listener and returned HTTP 302.
  • Still in progress:
    • Jellyfin is still running/healthy according to Podman but missing the 8096 host listener after backend restart.
    • Next fix should keep the same generic direction: missing host listener repair should use the manifest/orchestrator-aware restart path for apps with declared ports, not another Jellyfin-specific OS edit.
    • Broad lifecycle has not yet passed on 3912b900....
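
The generic host-port awareness described in this section can be sketched as follows. The field names (State, Ports, host_port, range, protocol) are an assumption about the shape of podman ps --format json output and should be checked against the Podman version in use; the function names are hypothetical.

```python
import json
from typing import List, Set


def containers_from_ps_json(raw: str) -> List[dict]:
    """Decode the JSON array emitted by the container listing."""
    return json.loads(raw)


def declared_host_tcp_ports(container: dict) -> Set[int]:
    """Collect host TCP ports a container declares, expanding port ranges."""
    ports: Set[int] = set()
    for binding in container.get("Ports") or []:
        if binding.get("protocol", "tcp") != "tcp":
            continue
        host_port = binding.get("host_port")
        if not host_port:
            continue
        for offset in range(binding.get("range") or 1):
            ports.add(host_port + offset)
    return ports


def effective_health(container: dict, listening: Set[int]) -> str:
    """A running container is healthy only if all declared listeners exist."""
    if container.get("State") != "running":
        return "not-running"
    missing = declared_host_tcp_ports(container) - listening
    return "unhealthy" if missing else "healthy"
```

Here listening is the set of host TCP ports currently bound, gathered separately (for example from ss output). A Jellyfin container that declares 8096 but has no 8096 listener is reported unhealthy without any app-specific table.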

2026-06-03 .198 Stale State and Jellyfin Pasta Listener Repair

  • Deployed backend hash d21202cd79794e3bfc882d37134afd7a41dac766bae386a675714e5fa030e94e to .198.
  • Fixed a focused lifecycle false-negative where container-list could report stale cached exited state while Podman scan backoff was active and the container had already recovered:
    • Cached exited entries now get a targeted live refresh before being returned by container-list.
    • This avoids broad podman ps scans and preserves the UI/package-data consistency model.
  • Added a bounded container-health fallback for cached running web apps:
    • If the cached app state is Running and its known local launch port accepts TCP, the RPC can return healthy without waiting on Podman inspect/list paths.
    • This quarantines health reads from intermittent Podman socket/store stalls.
  • Added Jellyfin to the legacy runtime host-port repair path:
    • runtime_required_host_port("jellyfin") now maps to 8096.
    • stale pasta cleanup now includes 8096 for Jellyfin start conflicts.
  • Validation notes:
    • package.restart jellyfin exposed a remaining Podman socket/runtime failure after stopping the container: Cannot connect to Podman socket at /run/user/1000/podman/podman.sock: Permission denied.
    • package.start jellyfin recovered the app afterward; jellyfin returned Up ... (healthy), ss showed a pasta.avx2 listener on 8096, and http://192.168.1.198:8096/ returned HTTP 302.
    • Focused lifecycle passed on the current hash: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Endpoint checks after focused lifecycle: Uptime Kuma 3002 returned 302; Jellyfin 8096 returned 302; Filebrowser 8083 returned 404 at /, which is expected for this probe.
    • scripts/check-app-catalog-drift.py --release still reports zero missing entries and 35 metadata drift items.
  • Final .198 state:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: d21202cd79794e3bfc882d37134afd7a41dac766bae386a675714e5fa030e94e.
    • /: 62% used, about 11G free.
    • /var/lib/archipelago: 9% used, about 371G free.
  • Remaining blocker:
    • Broad lifecycle has not yet been rerun on d21202cd....
    • Podman socket/store health is still a release risk; avoid broad image/store commands and treat socket permission/runtime failures separately from app health.
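
The bounded container-health fallback in this section (cached state is Running and the known local launch port accepts TCP) can be illustrated with a short sketch. Function names, the one-second timeout, and the use of None to mean "fall back to Podman" are assumptions.

```python
import socket
from typing import Optional


def port_accepts_tcp(host: str, port: int, timeout: float = 1.0) -> bool:
    """Bounded TCP connect; never blocks longer than the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cached_reachable_health(
    cached_state: str, port: Optional[int], host: str = "127.0.0.1"
) -> Optional[str]:
    """Return 'healthy' from cache plus reachability, else None.

    None tells the caller to use the slower Podman inspect/list path.
    """
    if cached_state == "running" and port is not None and port_accepts_tcp(host, port):
        return "healthy"
    return None
```

Because the fallback only ever answers healthy or defers, it cannot mask a stopped app; it just avoids waiting on Podman when the endpoint is demonstrably up.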

2026-06-03 .198 Expanded Rollback Cleanup and Store-Safe Uninstall

  • Deployed backend hash 7f90345b75148b7ed748e1a417f31d1273e1646a9b742891858df11c5397051b to .198.
  • Expanded system.disk-cleanup retention beyond archipelago.backup-* to cover alpha-era rollback artifacts:
    • legacy /usr/local/bin/archipelago.bak* and archipelago.before-* files;
    • old /opt/archipelago/web-ui.bak* and web-ui.old directories.
  • Live cleanup reclaimed 10.3 GB without touching Podman image/volume prune:
    • Removed old backend backups: 41.6 MB freed.
    • Removed old legacy backend backups: 3.6 GB freed.
    • Removed old web UI backups: 6.6 GB freed.
    • Skipped Podman image/volume prune: Podman store commands can block app health on busy nodes.
  • Root filesystem pressure is no longer a release blocker on .198:
    • Before expanded cleanup: / was 99% used with about 478-545M free.
    • After expanded cleanup: / is 61% used with about 11G free.
    • /usr/local/bin dropped to about 336M; /opt/archipelago dropped to about 1.1G.
  • Uninstall no longer runs global podman volume prune -f; app data removal remains explicit when preserve_data=false.
  • Verification:
    • cargo build -p archipelago --bin archipelago --release passed.
    • Local cargo test -p archipelago system::tests did not complete within 10 minutes in this environment; release build succeeded and live cleanup validation passed.
    • Focused post-cleanup lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 7f90345b75148b7ed748e1a417f31d1273e1646a9b742891858df11c5397051b.
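
The retention rule for backend rollback artifacts described above could be sketched roughly as follows. This is an illustration only, not the actual system.disk-cleanup code: the function names are hypothetical, pooling all three filename patterns under one keep count is an assumption, and ordering by modification time is an assumption as well.

```rust
/// Hypothetical helper: true when `name` looks like one of the backend
/// rollback artifacts listed in this section (filename prefixes only).
fn is_backend_rollback_artifact(name: &str) -> bool {
    name.starts_with("archipelago.backup-")
        || name.starts_with("archipelago.bak")
        || name.starts_with("archipelago.before-")
}

/// Given (file name, modification time in seconds) pairs, return the names
/// that would be deleted so only the `keep` newest matching artifacts remain.
/// Assumption: all patterns share one retention pool.
fn artifacts_to_prune(entries: &[(String, u64)], keep: usize) -> Vec<String> {
    let mut matching: Vec<&(String, u64)> = entries
        .iter()
        .filter(|(name, _)| is_backend_rollback_artifact(name))
        .collect();
    // Newest first; ties are broken by name so the result is deterministic.
    matching.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    matching
        .into_iter()
        .skip(keep)
        .map(|(name, _)| name.clone())
        .collect()
}
```

The live binary name `archipelago` matches none of the prefixes, so it is never a prune candidate in this sketch.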

2026-06-03 .198 Startup Scan Backoff and Uptime Kuma Pasta Repair

  • Deployed backend hash 2b72e83ff368e4a696ad701f8985b0a8e1e889d9f4844056dc063455df973b28 to .198.
  • Startup adoption is now bounded with a 35s timeout so a stuck podman ps -a --format json cannot stall backend startup indefinitely.
  • The initial container scan now seeds the same 300s Podman scan backoff used by periodic scans, preventing an immediate second podman ps after a startup timeout.
  • Legacy pasta restart paths now use scoped podman restart instead of stop+start. This repairs cases where a running pasta container loses its host listener but podman start would be a no-op.
  • Uptime Kuma validation:
    • Before repair, the container was running and internally healthy on 127.0.0.1:3001, but host port 3002 had no pasta listener and LAN launch failed.
    • package.restart for uptime-kuma now returns {"status":"restarted"} instead of hanging.
    • Post-restart http://192.168.1.198:3002/ returned HTTP 302 and the scanner restored launch metadata.
  • Release validation passed:
    • Focused audit: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Broad audit: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • Final .198 state:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /: still tight at 99% used, about 395M free.
    • /var/lib/archipelago: about 10% used.
  • Residual risk:
    • .198 Podman store health remains fragile under broad store commands; avoid prune/image-list/system-df release operations until the store issue is handled separately.
    • Logs during broad validation still showed unrelated IndeedHub/conmon cgroup permission noise, but focused and broad lifecycle audits passed.
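
The scan backoff behavior noted in this checkpoint (the startup scan seeding the same 300s backoff as periodic scans) can be pictured with a small gate like the one below. It is a minimal sketch under assumptions: `ScanBackoff` and its methods are hypothetical names, and the real backend may track this state differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical gate deciding whether another `podman ps` scan may run.
struct ScanBackoff {
    interval: Duration,
    last_attempt: Option<Instant>,
}

impl ScanBackoff {
    fn new(interval: Duration) -> Self {
        Self { interval, last_attempt: None }
    }

    /// Record an attempt at `now`, whether it succeeded or timed out.
    /// Seeding this from the startup scan is what prevents an immediate
    /// second scan after a startup timeout.
    fn record_attempt(&mut self, now: Instant) {
        self.last_attempt = Some(now);
    }

    /// True when no attempt has been recorded or the interval has elapsed.
    fn should_scan(&self, now: Instant) -> bool {
        match self.last_attempt {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.interval,
        }
    }
}
```

With a 300s interval, an attempt recorded at startup blocks a scan 35s later but allows one once 300s have passed.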

2026-06-02 .198 Registry/Catalog and Lifecycle Checkpoint

  • Follow-up on Podman prune/catalog generation:

    • Diagnosed the podman image prune -f failure and found it is broader than prune: podman system df, podman image list, podman image exists, and sometimes broad podman ps/inspect can hang on .198 under current store/node load.
    • Stopped only the diagnostic Podman commands started during this follow-up.
    • Changed system.disk-cleanup to skip Podman image/volume prune entirely for the release path. Cleanup still handles logs, journal retention, temp files, and backend backup retention, and returns an explicit action: Skipped Podman image/volume prune: Podman store commands can block app health on busy nodes.
    • Deployed backend hash c9695dc3db10ff6e593cdbcfbbdc94b2e98b6008aa62655bba51b9879b549e8c to .198.
    • Live cleanup validation passed: endpoint returned quickly, pruned old backend backups, did not spawn new Podman prune/list work, and / stayed around 98% with about 647-670M free.
    • During diagnosis, Uptime Kuma's port returned empty responses. Restarted only uptime-kuma through package.restart; data preserved; launch returned HTTP 302 afterward.
    • Focused post-repair audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Broad post-repair audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Final raw Podman bad-state sweep was clean.
    • Catalog metadata generation is not implemented yet. The release-safe step in this pass is the new scripts/check-app-catalog-drift.py --release mode, which reports zero missing catalog/manifest entries while still surfacing metadata-only drift.
  • Release-work continuation after cleanup/catalog/review gate:

    • Deployed backend hash e285d421cef497beb6b4b929f36fb4296d6db1f4a4c786157b6751eec51619ca to .198.
    • system.disk-cleanup is now bounded so a slow podman image prune -f cannot wedge the cleanup RPC indefinitely; the prune failure is reported as an action while cleanup continues.
    • system.disk-cleanup now vacuums systemd journals to a bounded size and prunes timestamped /usr/local/bin/archipelago.backup-* files to the newest three using the existing host_sudo path.
    • Live cleanup validation passed: endpoint returned, journals were reduced to about 200M, old backend backups were pruned to three, and / improved from about 99%/490M free to 98%/about 730M free.
    • Added nostr-rs-relay to both catalog surfaces. Release-focused catalog drift now has zero missing catalog/manifest entries; remaining drift is metadata-only and belongs to the catalog-generation follow-up.
    • Focused post-cleanup audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser,nostr-rs-relay,portainer ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Broad post-cleanup audit passed with extended harness timeout: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Final raw Podman sweep showed no unhealthy, stopping, removing, exited, created, or initialized containers.
    • Final service state: archipelago.service active; archipelago-doctor.timer inactive; archipelago-reconcile.timer inactive.
  • Follow-up validation after the previous cutoff:

    • .198 is already running the current local release build hash 579b823cf4a4b8c50bb3d0c3d49449c58101b016eb6ebc8049975dce98e34265; no backend replacement was performed in this pass.
    • Local release binary smoke-started successfully on an alternate bind/data dir before live checks.
    • Meshtastic manifest-owned file rendering is now proven live: /var/lib/archipelago/meshtastic/config.yaml was backed up, removed, and recreated by package.restart from apps/meshtastic/manifest.yml.
    • Focused Meshtastic audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
    • Focused regression audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Final raw Podman sweep showed no unhealthy, stopping, removing, exited, created, or initialized containers.
    • Service state remains deterministic-test safe: archipelago.service active; archipelago-doctor.timer inactive; archipelago-reconcile.timer inactive.
    • / remains tight at 99% used with about 490M free.
  • Live .198 state after this pass:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256 is now 579b823cf4a4b8c50bb3d0c3d49449c58101b016eb6ebc8049975dce98e34265; no backend replacement was performed in this follow-up pass.
    • /: still tight at 99% used, about 490M free.
  • Registry state:

    • Live /var/lib/archipelago/config/registries.json is already correct: 146.59.87.168:3000/lfg2025 is primary with tls_verify: false; git.tx1138.com/lfg2025 is enabled as secondary with tls_verify: true.
    • Added meshtastic and portainer to both app-catalog/catalog.json and neode-ui/public/catalog.json so migrated manifest-owned apps are present in the registry/catalog surface.
  • Live recovery performed:

    • Raw Podman sweep found nextcloud stuck in Removing.
    • Removed only the wedged container record with podman rm -f nextcloud; bind-mounted data was preserved.
  • Local verification passed:

    • jq empty app-catalog/catalog.json neode-ui/public/catalog.json.
    • cargo test -p archipelago-container generated_files_must_live_under_bind_mounts.
    • cargo test -p archipelago manifest_generated_files.
    • cargo test -p archipelago reconcile_force_recreates_stopping_container.
    • cargo test -p archipelago health_maps_states_to_strings.
    • cargo test -p archipelago test_rewrite_image.
    • cargo test -p archipelago test_load_default.
    • cargo check -p archipelago --bin archipelago.
    • cargo build -p archipelago --bin archipelago --release, hash 13786fd7bc5afb36fb7873ad9aee1a54a696e75b0a92c2fcd90cc8100038a54c.
  • Live validation passed:

    • Focused audit: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive audit: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Final raw Podman sweep showed no unhealthy, stopping, removing, exited, created, or initialized containers.
  • Remaining before release:

    • The prior release-binary segfault is no longer reproducing with the current artifact; .198 is active on hash 579b823cf4a4b8c50bb3d0c3d49449c58101b016eb6ebc8049975dce98e34265. Continue watching logs after restarts, but do not treat app.files deployment as blocked.
    • Add disk cleanup/backup retention policy; root filesystem pressure still makes deploys and image operations fragile.
    • Resolve broader app catalog/manifest drift reported by scripts/check-app-catalog-drift.py; this pass only added the migrated Meshtastic and Portainer catalog entries.

2026-05-28 .198 Meshtastic File-Rendering Recovery Checkpoint

  • Current .198 service state after recovery:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256 restored to 2ec1952dcc5f6101d236dd3ea7a85a40a6387a3f1afb8a5681345cad90306853 after a failed deploy attempt.
    • /: still tight at 99% used, about 546M free.
  • Local generated-file support status:
    • Manifest schema supports app.files.
    • Production orchestrator writes declared manifest files before create/start/restart and does not overwrite existing files unless overwrite: true is declared.
    • Meshtastic manifest declares /var/lib/archipelago/meshtastic/config.yaml under its bind-mounted data directory.
  • Local verification passed:
    • cargo test -p archipelago-container generated_files_must_live_under_bind_mounts.
    • cargo test -p archipelago manifest_generated_files.
    • cargo check -p archipelago --bin archipelago.
    • cargo build -p archipelago --bin archipelago --release produced local hash 13786fd7bc5afb36fb7873ad9aee1a54a696e75b0a92c2fcd90cc8100038a54c.
  • Live deploy caveat:
    • Deploying the local release binary to .198 caused immediate SIGSEGV on archipelago.service startup.
    • The previous live binary was restored from /usr/local/bin/archipelago.backup-20260528-container-files-2ec1952dcc5f6101d236dd3ea7a85a40a6387a3f1afb8a5681345cad90306853; backend returned active.
    • Do not redeploy that local release artifact blindly; diagnose the startup segfault/build mismatch first.
  • Live Meshtastic recovery:
    • Before recovery, .198 had Meshtastic manifests with files: but no /var/lib/archipelago/meshtastic/config.yaml; container logs showed No 'config.yaml' found and Blank MAC Address not allowed.
    • Wrote the same config currently declared by the manifest to /var/lib/archipelago/meshtastic/config.yaml as an operational recovery, then restarted meshtastic.service.
    • Meshtastic returned Up ... (healthy).
  • Live validation passed:
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=meshtastic,jellyfin,filebrowser ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Raw Podman sweep showed Meshtastic, Jellyfin, File Browser, BTCPay, Grafana, SearXNG, Gitea, Nostr relay, Botfights, Portainer, Nginx Proxy Manager, and other active managed containers without unhealthy/stopping/removing/exited states.
  • Next required work:
    • Diagnose why the local release backend segfaults immediately on .198 before deploying the generic manifest file renderer as the durable fix.
    • After a safe backend deploy, remove reliance on the manually recovered Meshtastic config by proving the manifest-owned renderer recreates it on start/restart.
    • Keep deterministic-test timers inactive unless intentionally running non-deterministic recovery testing.
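
The generated-file rules above (declared files must live under a bind mount, and existing files are kept unless overwrite: true is declared) might look roughly like this. All names here are hypothetical and this is not the production orchestrator code; the path check is purely component-based and does not resolve symlinks or `..` segments.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of rendering one declared file (hypothetical type).
#[derive(Debug, PartialEq)]
enum RenderOutcome {
    Written,
    SkippedExisting,
}

/// True when `target` sits inside one of the bind-mounted host directories.
fn is_under_bind_mount(target: &Path, bind_sources: &[&Path]) -> bool {
    bind_sources.iter().any(|src| target.starts_with(src))
}

/// Write `contents` to `target` unless the file exists and `overwrite` is false.
fn render_declared_file(
    target: &Path,
    bind_sources: &[&Path],
    contents: &str,
    overwrite: bool,
) -> io::Result<RenderOutcome> {
    if !is_under_bind_mount(target, bind_sources) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "declared file is outside every bind mount",
        ));
    }
    if target.exists() && !overwrite {
        return Ok(RenderOutcome::SkippedExisting);
    }
    if let Some(parent) = target.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(target, contents)?;
    Ok(RenderOutcome::Written)
}
```

Under this sketch, a manually recovered config.yaml would be left untouched on restart, while a missing one would be recreated from the manifest contents.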

2026-05-27 .198 Manifest-Orchestrator Migration Checkpoint

  • Current .198 live backend:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 31ae1b346fd36d715c9fe7f0686dcb31a70d2fea44996abf122743d048fb7b2f.
  • Migration goal confirmed and advanced: apps should not require hardcoded OS/Rust edits to work. App differences belong in manifests; Rust/OS should provide generic primitives for lifecycle, Quadlet rendering, readiness/health, port repair, bind-mount prep, data ownership, and image availability.
  • New generic backend fixes deployed:
    • Quadlet health drift detection now compares HealthCmd, HealthInterval, HealthTimeout, and HealthRetries.
    • HTTP health command rendering now derives wget -T / curl -m from manifest health_check.timeout; timeout: 30s now produces helper-level 30s probes instead of an outer 30s Podman timeout wrapped around an inner 5s helper command.
    • Existing Quadlet unit drift that requires restart now verifies the manifest image exists locally and pulls/builds if missing before restarting.
    • Existing Quadlet service start for a missing container now also verifies/pulls/builds the manifest image before systemctl --user start.
    • Reconcile now treats manifest-declared dependencies of active apps as required even if stale user-stopped.json entries exist, and parent app reconcile drift-syncs existing dependency Quadlet units from their own manifests.
    • Portainer host prep moved out of a hardcoded Rust install hook; generic bind-mount socket prep now handles manifest sources ending in /podman.sock.
  • Manifest updates deployed to both /opt/archipelago/apps and /opt/archipelago/web-ui/archipelago-runtime/apps:
    • portainer: declarative manifest with data dirs, Podman socket mount, capabilities, data_uid, 9000:9000, and no Podman healthcheck.
    • btcpay-server, grafana, nostr-rs-relay, searxng: HTTP health timeouts/retries loosened to timeout: 30s, retries: 5 to avoid false negatives under .198 load.
    • archy-nbxplorer manifest has timeout: 30s, retries: 5; live unit now matches with helper-level wget -T 30 / curl -m 30.
  • Local verification passed:
    • cargo fmt.
    • cargo test -p archipelago translate_health_check -- --nocapture passed.
    • cargo check -p archipelago --bin archipelago passed after each backend fix.
    • cargo build -p archipelago --bin archipelago --release passed; final deployed binary hash is 31ae1b346fd36d715c9fe7f0686dcb31a70d2fea44996abf122743d048fb7b2f.
  • Live .198 validation:
    • Portainer full lifecycle passed earlier: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=portainer ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • BTCPay focused lifecycle passed after the missing-image start guard: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=btcpay-server ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Focused migration audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=btcpay-server,grafana,nostr-rs-relay,searxng,portainer,gitea ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Broad non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=600 tests/lifecycle/remote-lifecycle.sh.
    • Targeted unit/container sweep showed btcpay-server, grafana, nostr-rs-relay, searxng, and portainer services active.
    • Post-focused and post-broad raw Podman sweeps found no unhealthy, stopping, removing, exited, created, or initialized containers.
    • Raw states: btcpay-server Up ... (healthy), grafana Up ... (healthy), nostr-rs-relay Up ... (healthy), searxng Up ... (healthy), portainer Up ....
    • Generated units for btcpay-server, grafana, nostr-rs-relay, and searxng now show helper-level wget -T 30 / curl -m 30, HealthTimeout=30s, and HealthRetries=5.
    • Generated unit for archy-nbxplorer now also shows helper-level wget -T 30 / curl -m 30, HealthTimeout=30s, and HealthRetries=5; BTCPay stack remained healthy.
    • Filebrowser full lifecycle passed under the manifest/orchestrator path: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=filebrowser ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Filebrowser post-test live verification: filebrowser.service active; bind mounts /srv and /data rendered; Exec=--config /data/.filebrowser.json; generated .filebrowser.json points database to /data/filebrowser.db and root to /srv; container is Up ... (healthy).
  • Operational caveat found:
    • .198 root filesystem remains tight: about 556M free on / (99% used). There are many old backend backup binaries under /usr/local/bin; deploys and Podman image operations are fragile until backup/image cleanup policy is added.
  • Remaining before release:
    • Done: Meshtastic full lifecycle passed on .198 after routing it through the orchestrator path and fixing its manifest image, device, volume target, health check, launch metadata handling, and TCP port declaration.
    • Replace the temporary/manual Meshtastic host config.yaml dependency with the generic manifest-owned file rendering path:
      • Added local schema support for app.files.
      • Added local production-orchestrator rendering for declared files before container start.
      • Added Meshtastic files: declaration for /var/lib/archipelago/meshtastic/config.yaml.
      • Local manifest parser tests passed; backend orchestrator tests are still running before deployment.
    • Latest post-Meshtastic raw .198 sweep:
      • archipelago.service: active.
      • archipelago-doctor.timer: inactive.
      • archipelago-reconcile.timer: inactive.
      • /: 99% used, about 532M free.
      • jellyfin and filebrowser reported unhealthy; investigate before final release qualification.
    • Add the release code-review/refactor/performance gate: remove dead transitional code, reduce remaining app-specific Rust/OS paths, review scan/health/reconcile performance, then rerun lifecycle and launch tests after cleanup.
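
The health-timeout change in this checkpoint (deriving wget -T / curl -m from the manifest health_check.timeout) can be illustrated with a short sketch. The function names and the exact command string are assumptions for illustration; only the -T and -m flags and the timeout: 30s value come from the notes above. It also assumes the manifest value is always a whole number of seconds with an `s` suffix.

```rust
/// Parse a manifest-style timeout such as "30s" into whole seconds.
/// Assumption: only a plain seconds suffix is supported here.
fn parse_timeout_secs(raw: &str) -> Option<u64> {
    let digits = raw.trim().strip_suffix('s')?;
    digits.parse::<u64>().ok().filter(|secs| *secs > 0)
}

/// Render an HTTP probe whose helper-level timeout matches the manifest,
/// so the inner command is not cut short by a smaller hardcoded value.
fn render_http_health_cmd(url: &str, timeout_secs: u64) -> String {
    format!(
        "wget -q -T {t} -O /dev/null {url} || curl -fsS -m {t} -o /dev/null {url}",
        t = timeout_secs,
        url = url
    )
}
```

For a manifest with timeout: 30s, this yields a command containing `wget -T 30` and `curl -m 30`, which is the shape the generated units are reported to show.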

2026-05-26 Migration Release Notes

  • Active doctrine: app-specific host mutations should move out of generic Rust/OS install paths wherever possible. Apps should be described by manifests and lifecycle hooks; the Rust backend should provide generic primitives for validation, container lifecycle, health/readiness, port repair, secrets, data ownership, and recovery.
  • Current .198 work remains focused on lifecycle migration hardening first. Do not call the migration finished until focused full lifecycle and broad audits pass on the manifest/orchestrator-owned path.
  • .198 Gitea migration checkpoint:
    • Backend deployed: /usr/local/bin/archipelago sha256 3780e54eec4821a61fbc024259bd854ec376228eb981fa169ec6f8aeafc5a9dd.
    • Gitea manifest deployed to both /opt/archipelago/apps/gitea/manifest.yml and /opt/archipelago/web-ui/archipelago-runtime/apps/gitea/manifest.yml, latest sha256 8df263fcca9581a4e0a2872d21d26eed35b007c7bd7475071bedfd005f514e68.
    • The Gitea fix is manifest-owned: security.no_new_privileges is now honored by the generic Podman/Quadlet renderers, and Gitea declares its required capabilities (CHOWN, FOWNER, SETUID, SETGID, DAC_OVERRIDE, NET_BIND_SERVICE) plus no_new_privileges: false.
    • Focused full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=gitea ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
  • .198 generic host-listener repair checkpoint:
    • Backend deployed: /usr/local/bin/archipelago sha256 be06756763283535d2b3ee911cc91c7d401fb51b4dd88a3ebe86d79a05183e84.
    • Running-container reconcile now probes manifest-declared host ports and repairs missing listeners generically; observed repair restored Grafana port 3000 without a Grafana-specific OS edit.
    • Uptime Kuma repair uses a longer readiness window so the generic repair path does not restart it before its slow HTTP startup completes.
    • Gitea healthcheck timeout/retries were loosened in manifest metadata (timeout: 30s, retries: 5) after raw Podman health showed timeout-only false negatives while HTTP launch returned 200.
    • Focused audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=gitea,grafana,uptime-kuma ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
  • Release follow-ups to keep in scope after the current Gitea/Uptime/Nextcloud migration pass:
    • Portainer fixes discussed on 2026-05-26 must be carried into the new declarative approach, not left as a hardcoded OS prerequisite path. Completed for the current .198 pass:
      • Added apps/portainer/manifest.yml with manifest-declared data dirs, Podman socket mount, port 9000, capabilities, data_uid, and no Podman healthcheck.
      • Removed the hardcoded ensure_portainer_host() OS/Rust install hook.
      • Added generic manifest-driven Podman socket preparation for any app that bind-mounts podman.sock.
      • Backend deployed: /usr/local/bin/archipelago sha256 d440e2cba52c6e1b60d8f0716386b0f4e3ce56b5370cedafabc6dbd30d230909.
      • Portainer manifest deployed to both /opt/archipelago/apps/portainer/manifest.yml and /opt/archipelago/web-ui/archipelago-runtime/apps/portainer/manifest.yml, latest sha256 5e2ab96f2ba91ad2539a7dc6b73c92c6cece676109550d7d4c2f556aa578ba9c.
      • Focused full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=portainer ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=900 tests/lifecycle/remote-lifecycle.sh.
    • Re-test the Filebrowser fixes under the manifest/orchestrator path.
    • Re-test the Meshtastic fixes before final release qualification.
    • Add an app packaging documentation gate: update docs/APP-PACKAGING-MIGRATION-PLAN.md and docs/app-developer-guide.md so third-party developers can package apps against the current manifest/runtime contract without relying on one-off OS-level changes.
    • Add a required release code-review/refactor gate before cutting 1.8-alpha: remove dead transitional code, replace remaining app-specific Rust/OS paths with manifest-owned metadata or generic lifecycle primitives, review scan/health/reconcile performance, then rerun lifecycle and launch tests after the cleanup.
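
The generic host-listener repair described in this section starts from a probe of manifest-declared host ports. A minimal sketch of such a probe is shown below; the function names are hypothetical, probing the loopback address is an assumption (the real check may target the LAN address), and the repair step itself is out of scope here.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// True when something accepts TCP connections on 127.0.0.1:`port`
/// within `timeout`.
fn host_port_has_listener(port: u16, timeout: Duration) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Return the declared host ports that currently have no listener and
/// would therefore be candidates for the generic repair path.
fn ports_needing_repair(declared_ports: &[u16], timeout: Duration) -> Vec<u16> {
    declared_ports
        .iter()
        .copied()
        .filter(|port| !host_port_has_listener(*port, timeout))
        .collect()
}
```

A slow-starting app such as Uptime Kuma would need a longer readiness window before this result is acted on, as noted above.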

2026-05-13 .198 Stopping-State Repair Checkpoint

  • User directive confirmed: testing target is .198 until all containers work and the container layer is bulletproof/perfected.
  • .198 service state after this pass:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 5d3777d928ae6ee7627e9401faf932442806020ab7ad7a439eb7384d8eb7b8e6.
  • Live blocker found and repaired:
    • nostr-rs-relay was stuck in raw Podman state Stopping (healthy); focused lifecycle audit failed with bad state: nostr-rs-relay is stopping.
    • Removed only the wedged container record with podman rm -f nostr-rs-relay; bind-mounted relay data under /var/lib/archipelago/nostr-relay was preserved.
    • Archipelago/runtime recreated the relay and it returned Up ... (healthy).
  • Durable local fix added and deployed:
    • core/archipelago/src/container/prod_orchestrator.rs now treats ContainerState::Stopping as a wedged container record during reconcile and force-recreates it from the manifest instead of trying a normal start.
    • Added a unit test, reconcile_force_recreates_stopping_container, to cover this path; it was not executed in this pass because the test binary build timed out.
    • cargo check -p archipelago --bin archipelago passed locally.
    • cargo build -p archipelago --bin archipelago --release passed locally and was deployed to .198.
    • Rust test binary build for the targeted unit test timed out during compilation in this environment before emitting compiler errors; use cargo check plus live .198 audit as the validated gate for this pass.
  • Post-deploy validation on .198:
    • Focused audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool,searxng,nginx-proxy-manager,nostr-rs-relay,grafana,btcpay-server ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=180 tests/lifecycle/remote-lifecycle.sh.
    • Broad audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=240 tests/lifecycle/remote-lifecycle.sh.
    • Raw Podman sweep found no unhealthy, Stopping, Removing, Exited, or Created containers after post-restart startup settled.
    • Direct HTTP probes returned healthy responses (200 or expected 302) for dashboard, bitcoin-ui, lnd-ui, btcpay, indeedhub, botfights, gitea, filebrowser, vaultwarden, searxng, fedimint, jellyfin, immich, homeassistant, grafana, tailscale, uptime-kuma, nextcloud, nginx-proxy-manager, and nostr-rs-relay.
  • Current .198 broad audit state:
    • Running: bitcoin-knots, lnd, btcpay-server, indeedhub, botfights, gitea, filebrowser, vaultwarden, searxng, fedimint, jellyfin, immich, homeassistant, grafana, tailscale, uptime-kuma, nextcloud.
    • Absent/expected in this audit: bitcoin-core, mempool, electrumx, photoprism.
  • Important observation:
    • Immediately after backend restart, bitcoin-knots briefly appeared Exited (137) during startup/recovery, then self-recovered and was confirmed running on inspection. Final broad audit and raw sweep were clean.
  • Next recommended gate:
    • Run destructive/full lifecycle on .198 only when ready to intentionally cycle app containers; non-destructive broad audit and raw health are green after the stopping-state fix.
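
The stopping-state fix in this checkpoint amounts to a reconcile decision along these lines. Only the Stopping case (force-recreate from the manifest rather than a normal start) is taken from the notes above; the type names and the handling of the other states are assumptions added to make the sketch complete.

```rust
/// Hypothetical view of the raw Podman state seen during reconcile.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ObservedState {
    Running,
    Stopping,
    Exited,
    Created,
    Missing,
}

/// Hypothetical reconcile decision for one container.
#[derive(Debug, PartialEq)]
enum ReconcileAction {
    Nothing,
    Start,
    ForceRecreate,
    CreateFromManifest,
}

fn plan_reconcile(state: ObservedState) -> ReconcileAction {
    match state {
        ObservedState::Running => ReconcileAction::Nothing,
        // A container stuck in Stopping is treated as a wedged record:
        // a normal start would not help, so it is recreated from the manifest.
        ObservedState::Stopping => ReconcileAction::ForceRecreate,
        // Assumed handling for the remaining states.
        ObservedState::Exited | ObservedState::Created => ReconcileAction::Start,
        ObservedState::Missing => ReconcileAction::CreateFromManifest,
    }
}
```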

2026-05-13 Resume Correction

  • User directive: "we're testing on .198 right, until all containers are working and we achieve our goal of bulletproof containers".
  • Active target remains .198; do not drift back to older .116/.228 release threads except for cross-node context.
  • Continue lifecycle hardening until every intended .198 container/app is working, recoverable, and aligned with the bulletproof-container goal.

Resume Prompt

Resume Archipelago lifecycle hardening from /home/archipelago/Projects/archy. Read docs/CONTAINER_LIFECYCLE_HANDOFF.md first. Active mission is node `192.168.1.198`, not the older `.116/.228/.67` release thread. SSH key is `/home/archipelago/.ssh/id_ed25519`; lifecycle password is `password123`. Preserve data unless explicitly told otherwise. Keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive during deterministic testing. Do not revert unrelated dirty worktree changes because another agent/user may be working too.

Mission: make every Archipelago app/container on `.198` lifecycle-safe and power-loss/reboot resilient. Containers should not randomly go down; app state must recover through daemon restarts, reboots, stale Podman/Quadlet state, missing host listeners, stuck installs, stopped/exited state drift, and stale stack/container records. Release is blocked until strict lifecycle plus app-specific reachability/launch probes agree with raw Podman health and actual app behavior.

Latest live `.198` status from 2026-05-11: `archipelago.service` active; `archipelago-doctor.timer` inactive; `archipelago-reconcile.timer` inactive; deployed `/usr/local/bin/archipelago` sha256 `ed4df8e4c3c0a12a481ea41f8246da4b5f9e9ad931d0f3f58084b0057c330af0`. Broad audit passed with `ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=180 tests/lifecycle/remote-lifecycle.sh`, but this is not enough for release because raw Podman still reports health/state mismatches.

Current suspected release blocker: reconcile the broad-audit pass with raw Podman health. On `.198`, `mempool-api` is `Up ... (unhealthy)`, `searxng` is `Up ... (unhealthy)`, `botfights` is `Up ... (unhealthy)`, and `nostr-rs-relay` is `Stopping (unhealthy)`. RPC/package state reports the installed audit set as running, so next work is to diagnose these health/state mismatches, decide whether each is a false-negative healthcheck or real app failure, fix the manifest/runtime/reconcile behavior, then rerun focused full lifecycle and browser/direct launch probes for affected apps.

Known `.198` package state from latest broad audit: running `lnd`, `mempool`, `indeedhub`, `botfights`, `gitea`, `filebrowser`, `vaultwarden`, `searxng`, `fedimint`, `jellyfin`, `immich`, `homeassistant`, `tailscale`, `uptime-kuma`, `nextcloud`; absent `bitcoin-knots`, `bitcoin-core`, `btcpay-server`, `electrumx`, `grafana`, `photoprism`. Some absences are expected/blockers from earlier qualification, but `btcpay-server` and `grafana` had previously passed focused checks, so verify whether their absence is intentional before release.

Regenerated release artifacts:
- `releases/v1.7.54-alpha/archipelago`: `77e3a236a6196a5ab9ec2411b150490e78ffc95ea6ab8eb34ab29b3df53cd632`
- `releases/v1.7.54-alpha/archipelago-frontend-1.7.54-alpha.tar.gz`: `a010ac43a2dd02f528202cb2f7b99b61ceab80adc6827877594e41df4ea951fb`
- `releases/manifest.json` and `release-manifest.json`: `0fb73c808ef87c1535c5e5f560ea331bacaded86c8c81abd5cdd2893a0415b6f`
- Unbundled ISO: `image-recipe/results/archipelago-installer-1.7.54-alpha-unbundled-x86_64.iso`, sha256 `9828b244e6ffdd5f1b1d5184c1b22bef7474b32078b1ceb4ec3584d9bdb6775b`, size `2.3G`.

2026-05-11 .198 Active Mission Checkpoint

2026-05-11 Resume Session Update

  • Latest user directive: "please resume our work".
  • Reconfirmed active mission is .198 lifecycle hardening, not the older .116/.228/.67 thread.
  • Live .198 state at resume:
    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: 494cd64f77cbecb95c08552237cb8fd3c11c2b2b76d5d39854e6cf92b5900b68.
  • Raw Podman still showed release blockers:
    • mempool-api: Up ... (unhealthy).
    • nginx-proxy-manager: Up ... (unhealthy).
    • nostr-rs-relay: Stopping (healthy).
    • searxng was healthy by the time of recheck and served http://127.0.0.1:8888/ with HTTP 200.
  • Diagnosed mempool-api as a real app failure, not a false-negative healthcheck: logs repeatedly show getaddrinfo ENOTFOUND electrumx, and .198 has no electrumx container present. mempool-api is configured with ELECTRUM_HOST=electrumx, so the broad audit was masking a broken stack member.
  • Found and fixed a local backend masking bug: ProdContainerOrchestrator::health returned healthy for every running container and ignored Podman's actual health status. It now returns Podman's health value for running containers and maps Stopping to unhealthy; ContainerState now parses Podman's stopping state explicitly.
  • Local verification:
    • cargo fmt passed.
    • cargo test -p archipelago-container parse_podman_ps_json_handles_cli_output passed.
    • cargo check -p archipelago --bin archipelago passed.
    • cargo test -p archipelago health_maps_states_to_strings did not finish within 3 minutes during crate compilation; no compiler error was emitted before timeout.
  • Focused live audit command attempted: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool,searxng,nginx-proxy-manager ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=60 tests/lifecycle/remote-lifecycle.sh. It timed out because the deployed .198 backend still has the old health behavior and Podman operations on the node are intermittently hanging.
  • Next continuation point:
    • Decide whether to deploy a freshly built backend to .198. Do not deploy the current dirty worktree blindly unless the existing unrelated changes are intended for this release, because the workspace contains many modified files from prior work.
    • After deploy, rerun focused audit for mempool,searxng,nginx-proxy-manager and verify container-health reports mempool or stack health as unhealthy while mempool-api cannot resolve electrumx.
    • Fix the mempool stack qualification: on a pruned/under-disk node, mempool must not install/start into a half-running state that leaves mempool-api unhealthy because electrumx is absent.
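
The health mapping fix described in this update can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the actual ProdContainerOrchestrator::health code: the `Health` enum, the `reported_health` function, the lower-cased string inputs, and the choice to treat a running container without a healthcheck as healthy are all hypothetical.

```rust
/// Health value reported for one container (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Unhealthy,
    Starting,
    Unknown,
}

/// Maps a raw Podman state plus an optional healthcheck status to a reported health.
/// Inputs are compared case-insensitively, so "Stopping" and "stopping" behave the same.
pub fn reported_health(state: &str, health: Option<&str>) -> Health {
    let state = state.to_ascii_lowercase();
    let health = health.map(|h| h.to_ascii_lowercase());
    match (state.as_str(), health.as_deref()) {
        // Running containers report whatever Podman's healthcheck says.
        ("running", Some("healthy")) => Health::Healthy,
        ("running", Some("unhealthy")) => Health::Unhealthy,
        ("running", Some("starting")) => Health::Starting,
        // Assumption: no healthcheck configured, so "running" is the best signal available.
        ("running", None) | ("running", Some("")) => Health::Healthy,
        // A container stuck in Stopping is never reported as healthy.
        ("stopping", _) => Health::Unhealthy,
        _ => Health::Unknown,
    }
}
```

With this shape, the nostr-rs-relay case above (state Stopping, healthcheck still healthy) would be reported as unhealthy, and mempool-api (running, unhealthy) would no longer be masked.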

2026-05-12 Lifecycle Hardening Completion Checkpoint

  • User directive: continue until the work is done.

  • Deployed fixed backend to .198; final /usr/local/bin/archipelago sha256: 616e50ba8a83654e4a7656f931e5c9d1340a92cfa0ba22906edc0d374560df02.

  • archipelago.service active; archipelago-doctor.timer inactive; archipelago-reconcile.timer inactive.

  • Local durable fixes made:

    • ProdContainerOrchestrator::health now respects Podman's health status instead of mapping all running containers to healthy.
    • Podman stopping state is parsed explicitly and maps to unhealthy/stopping instead of unknown/running.
    • container-health aggregates stack health for multi-container apps, so stack apps cannot hide unhealthy members like mempool-api.
    • Health fallback now uses bounded exact-container Podman checks to avoid broad podman ps hangs poisoning unrelated app health.
    • mempool install now runs dependency and archival-Bitcoin checks before dispatching to the stack installer, preventing half-running mempool stacks on pruned/under-disk nodes.
    • Nginx Proxy Manager healthcheck now probes http://localhost:81/; /api/ returns 502 on the deployed image while the UI is healthy.
    • Runtime start repair now covers Vaultwarden and Nextcloud missing host listeners.
    • Nextcloud runtime repair fixes bind-mounted data ownership before start/restart.
    • Stale transitional state timeout lowered from 20 minutes to 2 minutes so dead lifecycle tasks clear promptly.
  • Live .198 repairs performed with data preserved:

    • Removed broken mempool stack via package.uninstall preserve_data=true; mempool is now absent and full lifecycle correctly reports archival-blocked install.
    • Recreated Nginx Proxy Manager container after stale Podman Removing state; data under /var/lib/archipelago/nginx-proxy-manager preserved.
    • Recreated Vaultwarden container after stale conmon/host-listener failure; /var/lib/archipelago/vaultwarden preserved.
    • Recreated Home Assistant and Nextcloud container records after stale conmon/host-listener failures; data directories preserved.
    • Repaired Nextcloud ownership (/var/lib/archipelago/nextcloud) so Apache/PHP can write config.php and data/nextcloud.log.
  • Verification passed:

    • cargo fmt.
    • cargo check -p archipelago --bin archipelago.
    • cargo build -p archipelago --bin archipelago --release.
    • cargo test -p archipelago-container parse_podman_ps_json_handles_cli_output passed earlier in this session.
    • Known exception: cargo test -p archipelago health_maps_states_to_strings still fails during local test binary linking with rust-lld undefined hidden symbols; cargo check and the release build pass.
    • Focused audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool,searxng,nginx-proxy-manager ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=120 tests/lifecycle/remote-lifecycle.sh.
    • Broad audit passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=180 tests/lifecycle/remote-lifecycle.sh.
    • Final raw Podman sweep found no unhealthy, Stopping, or Removing containers.
    • Final direct probes returned HTTP 200 for LND UI, IndeedHub, Botfights, Gitea, File Browser, Vaultwarden, SearXNG, Fedimint, Jellyfin, Immich, Home Assistant, Tailscale, Uptime Kuma, Nextcloud, Nginx Proxy Manager, and Nostr Relay.
  • Final broad audit state:

    • Running: lnd, indeedhub, botfights, gitea, filebrowser, vaultwarden, searxng, fedimint, jellyfin, immich, homeassistant, tailscale, uptime-kuma, nextcloud.
    • Absent/expected for this node or archival-gated: bitcoin-knots, bitcoin-core, btcpay-server, mempool, electrumx, grafana, photoprism.
  • Remaining release consideration: .198 is green for the non-destructive broad audit and raw Podman health. Destructive/full lifecycle should still be run only when you are ready to intentionally cycle app containers.

  • User corrected the active mission after disconnect: continue .198 container lifecycle hardening, not the older .116/.228/.67 thread.

  • Mission: build "perfect containers" that do not go down unexpectedly and recover through daemon restarts, server reboots, power loss, stale Podman/Quadlet state, missing rootless host listeners, stuck installs, stopped/exited state drift, and stale stack/container records.

  • Preserve app data unless explicitly told otherwise.

  • Keep deterministic-test timers paused: archipelago-doctor.timer and archipelago-reconcile.timer should remain inactive.

  • Latest verified .198 service state:

    • archipelago.service: active.
    • archipelago-doctor.timer: inactive.
    • archipelago-reconcile.timer: inactive.
    • /usr/local/bin/archipelago sha256: ed4df8e4c3c0a12a481ea41f8246da4b5f9e9ad931d0f3f58084b0057c330af0.
  • Latest broad audit command passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=180 tests/lifecycle/remote-lifecycle.sh.
  • Latest broad audit states:
    • Running: lnd, mempool, indeedhub, botfights, gitea, filebrowser, vaultwarden, searxng, fedimint, jellyfin, immich, homeassistant, tailscale, uptime-kuma, nextcloud.
    • Absent: bitcoin-knots, bitcoin-core, btcpay-server, electrumx, grafana, photoprism.
  • Do not treat the broad audit pass as release-ready yet. Raw Podman still showed these concerning health/state mismatches:
    • mempool-api: Up ... (unhealthy).
    • searxng: Up ... (unhealthy).
    • botfights: Up ... (unhealthy).
    • nostr-rs-relay: Stopping (unhealthy).
  • Current suspected release blocker: Archipelago package state and broad audit say apps are running, but raw Podman health/state still reports unhealthy/stopping containers. Next agent should diagnose whether each mismatch is a false-negative healthcheck, stale Podman state, or a real app failure; then fix manifest/runtime/reconcile behavior and rerun focused full lifecycle plus browser/direct launch probes for affected apps.
  • Also verify whether btcpay-server and grafana being absent is intentional, because both had previously passed focused lifecycle checks on .198.
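
The stack health aggregation mentioned in the 2026-05-12 checkpoint (a stack app must not hide an unhealthy member such as mempool-api) could look roughly like this. It is a sketch only: `MemberHealth`, `stack_health`, the worst-of ordering, and the member name mempool-web used in the usage note are assumptions for illustration, not the real container-health code.

```rust
/// Per-member health, ordered from best to worst so that `max` picks the worst member.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum MemberHealth {
    Healthy,
    Starting,
    Unknown,
    Unhealthy,
}

/// Returns the worst health across all stack members plus the names of
/// every member that is not healthy. An empty stack is reported as Unknown.
pub fn stack_health<'a>(members: &[(&'a str, MemberHealth)]) -> (MemberHealth, Vec<&'a str>) {
    let worst = members
        .iter()
        .map(|(_, health)| *health)
        .max()
        .unwrap_or(MemberHealth::Unknown);
    let offenders = members
        .iter()
        .filter(|(_, health)| *health != MemberHealth::Healthy)
        .map(|(name, _)| *name)
        .collect();
    (worst, offenders)
}
```

For example, a stack with a healthy mempool-web (hypothetical name) and an unhealthy mempool-api aggregates to unhealthy and names mempool-api as the offender, which is the mismatch the raw Podman sweep exposed.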

2026-05-06 Resume Checkpoint

  • Goal: make container lifecycle and health recovery durable for every install and existing Archipelago server, while preserving app data.
  • .228 state:
    • SSH key auth still fails, but password SSH works with password archipelago.
    • Quarantined stale Quadlet blocker ~/.config/containers/systemd/bitcoin-core.container.disabled-20260506.
    • Started companion Bitcoin/LND UI services; external ports 8334 and 18083 return HTTP 200.
    • Recreated stale bitcoin-knots container record only, preserving /var/lib/archipelago/bitcoin and BITCOIN_RPC_PASS; authenticated local RPC works.
    • Diagnosed Immich reset loop as immich_postgres memory cap 512MiB; raised live cap to 2g/4g swap and made it persistent in code.
    • Final external checks passed: dashboard 200, Bitcoin UI 200, LND UI 200, Immich 200, Bitcoin RPC unauthenticated 405 expected.
  • .116 state:
    • Removed stale update override /etc/systemd/system/archipelago.service.d/update-url.conf.
    • Valid RPC/password auth is archipelago; password123 failed.
    • Recreated stale bitcoin-knots preserving data and RPC password; direct authenticated RPC works.
    • Fixed Grafana with podman unshare chown -R 472:472 /var/lib/archipelago/grafana; Grafana health returns 200.
    • Deployed locally built fixed backend to /usr/local/bin/archipelago; previous binary was backed up and service restarted.
    • Backend deploy checksum now c6c7830f14dc80b0e22d803997ad3df31c9ab3d4b08829b3bddc1b03ce77bd0a.
    • Repaired active nginx config and canonical config so curl http://127.0.0.1/bitcoin-status returns JSON instead of SPA HTML.
    • Repaired LND UI companion drift: generated quadlet was using stale localhost/lnd-ui:latest, whose nginx listened on container port 8081 while the unit mapped 18083:80. Updated the live unit to use localhost/lnd-ui:local; http://127.0.0.1:18083/ returns HTTP 200 and survives systemctl --user restart archy-lnd-ui.service.
    • Focused non-destructive lifecycle audit passed: ARCHY_HOST=192.168.1.116 ARCHY_SCHEME=http ARCHY_PASSWORD=archipelago ARCHY_APPS=bitcoin-knots,lnd,btcpay-server,mempool,grafana ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
    • Deployed newest local backend and script fixes live to .116, restarted Archipelago twice, and re-ran the focused non-destructive audit successfully. Important release/OTA note: startup promoted stale /opt/archipelago/web-ui/archipelago-runtime/scripts over /opt/archipelago/scripts once; after refreshing the runtime payload scripts too, restart preserved 18083 everywhere.
    • Recent Bitcoin/ElectrumX status warnings appear transient during Bitcoin IBD/UTXO flushes. Live /bitcoin-status is ok=true, stale=false; ElectrumX reports waiting because it is indexed beyond the local Bitcoin node and is waiting for Bitcoin catch-up.
  • .67 state:
    • User confirmed credentials archipelago/archipelago.
    • This workspace cannot reach it: SSH reports No route to host, HTTP probes return 000, ping shows 100% packet loss, and the neighbor entry is incomplete/failed.
    • IndeedHub reboot/Nostr signing fix still needs live verification from a host that can reach .67.
  • Local durable fixes in progress/done:
    • Bitcoin/Grafana/Immich/IndeedHub backend fixes are implemented locally.
    • UI loading/launch readiness fixes are implemented locally.
    • Nginx canonical config now includes /bitcoin-status proxy next to /electrs-status.
    • Startup bootstrap now patches older nginx configs that are missing /bitcoin-status and still patches /api/app-catalog when needed. It handles both sites-available/archipelago and copied sites-enabled/archipelago layouts.
    • LND UI companion/spec drift is fixed locally: first-boot/container specs now use host 18083, and companion reconcile now rewrites stale quadlet units/images instead of only checking active state.
    • Release packaging now includes image-recipe/configs/nginx-archipelago.conf in the OTA runtime payload and strips __pycache__, .pyc, .bak, .bak-*, and logs from runtime assets.
    • Regenerated v1.7.54-alpha frontend tarball was explicitly verified to contain LND UI 18083, LND UI container nginx listen 80, and /bitcoin-status nginx blocks; no pycache/pyc/bak junk remains.
    • ISO builder now configures both 146.59.87.168:3000 and git.tx1138.com as insecure for Podman and passes --tls-verify=false for primary HTTP registry pulls. The unbundled ISO now successfully pulls and saves filebrowser.tar instead of warning that Cloud/File Browser will be missing.
    • ISO output filenames now include the release version and alpha suffix, e.g. archipelago-installer-1.7.54-alpha-unbundled-x86_64.iso.
  • Verification already passed before latest nginx change:
    • cargo fmt
    • cargo check -p archipelago --bin archipelago
    • cargo build -p archipelago --bin archipelago --release
    • bash -n scripts/first-boot-containers.sh
    • bash -n image-recipe/build-debian-iso.sh image-recipe/archipelago-scripts/install-to-disk.sh image-recipe/write-usb-dd.sh image-recipe/create-fat32-usb.sh image-recipe/_archived/build-auto-installer-iso.sh scripts/create-release-manifest.sh scripts/container-specs.sh scripts/first-boot-containers.sh scripts/self-update.sh
    • cd neode-ui && npm run build
    • cd neode-ui && npm run type-check
    • cd neode-ui && npm test -- appsConfig.test.ts appLauncher.test.ts --run
    • scripts/check-release-manifest.sh
    • sudo -n env UNBUNDLED=1 BUILD_FROM_SOURCE=1 bash build-debian-iso.sh from image-recipe/ passed and produced the v1.7.54-alpha unbundled ISO.
  • Next steps:
    • Re-check .116 Archipelago logs for Bitcoin status: RPC failure: getblockchaininfo after Bitcoin IBD/UTXO flushing calms down.
    • Deploy the fixed backend to .228 if desired so durable repairs run there too.
    • Optional next gate: run a full bundled/core-image ISO build if you need offline app images. The prior File Browser HTTP registry blocker is fixed for the builder path.
    • Verify IndeedHub on .67 only from a reachable network path.
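
The LND UI companion drift described above (a generated quadlet still pointing at localhost/lnd-ui:latest while the unit maps 18083:80) suggests a reconcile check that compares unit contents rather than only active state. A minimal sketch follows, assuming the unit is a Quadlet `.container` file with `Image=` and `PublishPort=` keys; the function names and the simple first-match line parser are hypothetical and not the actual reconcile code.

```rust
/// Returns the value of the first `key=value` line in a Quadlet unit, if present.
/// Section headers such as `[Container]` contain no '=' and are skipped.
fn unit_value<'a>(unit: &'a str, key: &str) -> Option<&'a str> {
    unit.lines()
        .map(str::trim)
        .filter_map(|line| line.split_once('='))
        .find(|(k, _)| k.trim() == key)
        .map(|(_, v)| v.trim())
}

/// True when the unit's image or published port differs from what the
/// current spec expects, meaning the unit should be rewritten.
pub fn companion_unit_drifted(unit: &str, expected_image: &str, expected_publish: &str) -> bool {
    unit_value(unit, "Image") != Some(expected_image)
        || unit_value(unit, "PublishPort") != Some(expected_publish)
}
```

A unit containing `Image=localhost/lnd-ui:latest` would be flagged as drifted against an expected `localhost/lnd-ui:local`, even if the service is currently active.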

2026-05-05 Botfights, Gitea, Icons

2026-05-06 Multi-Node Non-Destructive Audit

2026-05-06 .228 Live Repair

  • Access notes:

    • SSH key auth to .228 still fails, but password SSH works with password archipelago.
    • Dashboard/RPC health reports version=1.7.53-alpha.
  • Companion UI repair:

    • Root cause: a stale rootless Quadlet unit at ~/.config/containers/systemd/bitcoin-core.container blocked user Quadlet generation, so archy-bitcoin-ui.service and archy-lnd-ui.service were missing even though their .container files existed.
    • Quarantined only the stale blocker: ~/.config/containers/systemd/bitcoin-core.container.disabled-20260506.
    • Ran user daemon reload and started generated companion services.
    • Final verification: archy-bitcoin-ui.service and archy-lnd-ui.service are active; external http://192.168.1.228:8334/ and http://192.168.1.228:18083/ both return HTTP 200.
  • Bitcoin Knots repair:

    • Root cause: existing bitcoin-knots container record was stale and still launched exec bitcoind; current image only provides /opt/bitcoin-29.3.knots20260210/bin/bitcoind on PATH/fallback.
    • Removed and recreated only the bitcoin-knots container record, preserving /var/lib/archipelago/bitcoin and the existing BITCOIN_RPC_PASS.
    • New command matches the deployed manifest fallback: resolve command -v bitcoind, then search /opt -path '*/bin/bitcoind'.
    • Final verification: container is running, ports 8332/8333 are listening, authenticated local RPC getblockchaininfo works, and the node is in initial block/header sync.
  • Immich repair:

    • Root cause: immich_postgres was capped at 512MiB; during Immich v2.7.4 reverse-geocoding geodata import, Postgres child processes were SIGKILLed while bulk inserting into geodata_places, forcing DB recovery and causing immich_server to reset connections on 2283.
    • Raised only the Postgres container memory limit with podman update --memory=2g --memory-swap=4g immich_postgres, then restarted immich_postgres and immich_server; preserved /var/lib/archipelago/immich-db and /var/lib/archipelago/immich.
    • Final logs showed Successfully imported 224210 geodata records, Initialized local reverse geocoder, and both Immich API/microservices successfully started.
    • Final external verification: http://192.168.1.228:2283/ returns HTTP 200.
  • Final .228 external status after repair:

    • Dashboard http://192.168.1.228/: HTTP 200.
    • Bitcoin UI http://192.168.1.228:8334/: HTTP 200.
    • LND UI http://192.168.1.228:18083/: HTTP 200.
    • Immich http://192.168.1.228:2283/: HTTP 200.
    • Bitcoin RPC no-auth probe http://192.168.1.228:8332/: HTTP 405, expected for reachable RPC without credentials.
  • Still outstanding from this audit:

    • .116 has the same stale Bitcoin Knots container-command symptom but RPC password password123 fails; do not repair until valid auth/SSH access is confirmed.
    • .67 remains unreachable from this machine even with confirmed credentials archipelago/archipelago: SSH reports No route to host, HTTP probes return 000, local route is via wlp3s0 from 192.168.1.116, and ping has 100% packet loss. IndeedHub reboot behavior still needs diagnosis from a host that can reach .67.
    • The .228 ad-hoc Immich Postgres memory repair was made persistent locally after the live fix: install_immich_stack now creates immich_postgres with --memory=2g, and get_memory_limit("immich_postgres") returns 2g. Verification passed with cargo fmt and cargo check -p archipelago --bin archipelago.
  • IndeedHub reboot/Nostr signing root cause and local fix:

    • User confirmed IndeedHub works after a manual restart, but after server boot it fails to come back correctly and forgets the Nostr signing/provider behavior.
    • Root cause in code: ProdContainerOrchestrator::ensure_running_with_mode returned stack-managed immediately for indeedhub, so the boot reconciler never started/repaired the installed stack and never reapplied the imperative frontend nginx/Nostr-provider mutation.
    • Additional gap: package start/restart repaired IndeedHub network aliases but did not reapply nostr-provider.js / nginx patch after the frontend container was started.
    • Local fix: boot reconcile now handles an existing IndeedHub stack without fresh-installing the single manifest: starts backend containers, starts frontend if stopped/exited/created, repairs network aliases, reapplies the Nostr provider/nginx patch, and restarts the frontend if host port 7778 is not listening.
    • Local fix: package start/restart now reapplies the IndeedHub Nostr provider patch whenever indeedhub is in the started/restarted set.
    • Verification passed locally with cargo fmt and cargo check -p archipelago --bin archipelago.
    • Not live-verified on .67 because this workspace still cannot reach .67; deploy the backend build to a reachable test node or run from a host that can reach .67, then reboot and confirm http://<node>:7778/ plus Nostr signing in the iframe.
  • Bitcoin/Grafana permanent repair notes:

    • .116 showed Unable to connect to Bitcoin node because bitcoin-knots had the same stale container command as .228: existing container record still executed bare bitcoind, but the current image only has /opt/bitcoin-29.3.knots20260210/bin/bitcoind discoverable via PATH/fallback.
    • Local permanent fix: ProdContainerOrchestrator::container_env_drifted now also checks entrypoint/cmd drift against the current manifest. Existing stale containers whose command no longer matches the deployed manifest are removed/recreated by boot reconcile/start/install flows, preserving bind-mounted data.
    • .116 Grafana served /api/health but logs showed GF_PATHS_DATA='/var/lib/grafana' is not writable and repeated attempt to write a readonly database; live data ownership had mixed rootless mapped owners.
    • Local permanent fix: apps/grafana/manifest.yml now declares data_uid: "472:472", and Grafana start/reconcile paths repair /var/lib/archipelago/grafana ownership before start/restart. This makes fresh installs and already-installed nodes self-heal instead of relying on manual chown.
    • Verification passed with cargo fmt and cargo check -p archipelago --bin archipelago.
  • Current local branch state during audit:

    • main is 31 commits ahead of tx1138/main.
    • Tracked worktree is clean.
    • Untracked docs: docs/CONTAINER_LIFECYCLE_HANDOFF.md and docs/CHAT_TRANSCRIPT_2026-05-02.md.
  • Connectivity and service health:

    • .198: SSH reachable with /home/archipelago/.ssh/id_ed25519; archipelago.service active; local health returns status=ok, version=1.7.53-alpha.
    • .116: SSH reachable with /home/archipelago/.ssh/id_ed25519; archipelago.service active; local health returns status=ok, version=1.7.51-alpha.
    • .228: SSH still blocked with Permission denied (publickey,password); dashboard/RPC is reachable over HTTP/HTTPS.
  • Broad non-destructive lifecycle audit results:

    • .198 passed cleanly: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=180 tests/lifecycle/remote-lifecycle.sh.
    • .228 failed two checks with RPC-only audit: Bitcoin Knots UI direct port http://192.168.1.228:8334/ returned status=000, and LND UI direct port http://192.168.1.228:18083/ returned status=000. Dashboard itself returns HTTP 200. SSH-level diagnosis is blocked until credentials/key access are fixed.
    • .116 audit did not complete within 15 minutes and showed degraded state: container-health returned unknown for bitcoin-knots, btcpay-server, and lnd; LND direct port http://192.168.1.116:18083/ returned status=000. Direct probes showed dashboard HTTP 200, Bitcoin UI http://192.168.1.116:8334/ HTTP 200, old LND UI http://192.168.1.116:8081/ HTTP 200, BTCPay http://192.168.1.116:23000/ HTTP 302, and Mempool http://192.168.1.116:4080/ HTTP 200.
  • .116 live diagnostics:

    • Deployed backend checksum: f761e659d661f0a83cd3a67a086bb2279398bc05e50ee3c52e769e52d11e476c.
    • Service has ARCHIPELAGO_DEV_MODE=true override and ARCHIPELAGO_UPDATE_URL=http://192.168.1.116:3000/lfg2025/archy/raw/branch/main/releases/manifest.json.
    • archy-lnd-ui is still mapped to 0.0.0.0:8081->80/tcp, while the current lifecycle harness expects LND UI on 18083; treat .116 as stale relative to the current LND port migration.
    • lnd is Up ... (unhealthy) on 8080, 9735, and 10009.
    • btcpay-server is Up ... (unhealthy) on 23000.
    • bitcoin-knots is Up ... (reset) and backend logs show repeated Bitcoin RPC failures for getblockchaininfo.
    • Backend logs show ElectrumX status also failing Bitcoin RPC.
  • .198 live diagnostics:

    • Deployed backend checksum observed during this audit: 86cf408ed84c7a7a72d1b5529aa97561dd02db38aab57c523999d1f5e7bf48b7.
  • Local smoke verification passed:

    • cargo check -p archipelago --bin archipelago from core/.
    • npm run type-check from neode-ui/.
    • npm test -- appsConfig.test.ts appLauncher.test.ts --run from neode-ui/ (27 passed).
  • Next focused actions:

    • Fix .228 SSH access first if deeper runtime diagnosis is required; RPC-only audit already identifies closed/unreachable direct app ports 8334 and 18083.
    • Bring .116 forward to the current deployed release/runtime expectations before treating lifecycle failures as fresh regressions. It is on 1.7.51-alpha, has dev-mode/update-url overrides, and still launches LND UI on legacy port 8081.
    • After .116 is updated, rerun focused non-destructive checks for bitcoin-knots, lnd, btcpay-server, mempool, and ElectrumX/Bitcoin RPC status before a full broad audit.
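
The entrypoint/cmd drift check added to ProdContainerOrchestrator::container_env_drifted (see the Bitcoin/Grafana permanent repair notes above) can be illustrated with a small comparison. This is a sketch under assumptions: `LaunchSpec`, `strs`, and `drifted_fields` are hypothetical names, and the manifest command in the usage note is a placeholder rather than the real bitcoin-knots command.

```rust
/// Launch configuration of a container, as recorded on an existing container
/// or as declared by the current manifest (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct LaunchSpec {
    pub entrypoint: Vec<String>,
    pub cmd: Vec<String>,
}

/// Convenience helper to build an owned argument vector from string literals.
pub fn strs(items: &[&str]) -> Vec<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// Lists which launch fields differ between the existing container record
/// and the manifest. A non-empty result means the record is stale and the
/// container should be removed and recreated, preserving bind-mounted data.
pub fn drifted_fields(existing: &LaunchSpec, manifest: &LaunchSpec) -> Vec<&'static str> {
    let mut fields = Vec::new();
    if existing.entrypoint != manifest.entrypoint {
        fields.push("entrypoint");
    }
    if existing.cmd != manifest.cmd {
        fields.push("cmd");
    }
    fields
}
```

A stale record whose cmd is the bare `bitcoind` compared against a manifest cmd such as `sh -c <placeholder>` would report `cmd` as drifted, which is the condition that leads to recreation instead of another failed start.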

2026-05-05 Tailscale And Grafana Recheck

2026-05-05 Release v1.7.52-alpha Staging

  • Release target corrected to 1.7.52-alpha.
  • Version bumped locally in:
    • core/archipelago/Cargo.toml
    • core/Cargo.lock
    • neode-ui/package.json
    • neode-ui/package-lock.json
  • .52 release notes added to CHANGELOG.md.
  • Debian 13/Trixie security mitigation added for rebuilt media:
    • image-recipe/_archived/build-auto-installer-iso.sh now runs apt-get -y full-upgrade after enabling Debian/Trixie security repositories during rootfs, Tailscale, FIPS, and installer environment creation.
    • image-recipe/archipelago-scripts/install-to-disk.sh now runs apt-get -y full-upgrade after writing trixie-security sources and before installing kernel/bootloader/packages.
    • This does not retroactively patch already-built ISOs; .52 media must be rebuilt.
  • Active ISO command restored:
    • Added image-recipe/build-debian-iso.sh wrapper around the archived builder so documented ISO commands no longer point at a missing script.
    • USB helper scripts now default to results/archipelago-installer-x86_64.iso, fall back to the unbundled ISO, and accept an ARCHIPELAGO_ISO=/path/to.iso override.
  • .52 release artifacts staged:
    • releases/v1.7.52-alpha/archipelago
    • releases/v1.7.52-alpha/archipelago-frontend-1.7.52-alpha.tar.gz
    • releases/manifest.json
    • release-manifest.json
  • Manifest validation passed: scripts/check-release-manifest.sh.
  • Frontend dependency audit:
    • Ran npm audit fix, removing the critical protobufjs advisory and high advisories.
    • Remaining audit finding is moderate uuid <14 via dockerode; npm audit fix --force would upgrade to breaking dockerode@5.0.0, so this was not forced during release staging.
  • Final verification passed:
    • cargo build -p archipelago --bin archipelago --release with existing reconcile_all dead-code warning.
    • cargo check -p archipelago --bin archipelago with same warning.
    • cd neode-ui && npm run build.
    • cd neode-ui && npm run type-check && npm test -- appsConfig.test.ts appLauncher.test.ts --run.
    • bash -n image-recipe/build-debian-iso.sh image-recipe/archipelago-scripts/install-to-disk.sh image-recipe/write-usb-dd.sh image-recipe/create-fat32-usb.sh image-recipe/_archived/build-auto-installer-iso.sh.
    • npm audit --audit-level=high reports only the remaining moderate dockerode/uuid finding, which is below the high threshold.
  • Not yet done in this pass:
    • Full bundled ISO build was not run; unbundled ISO build passed.
    • .52 release artifacts were staged locally but not committed, tagged, or pushed.
    • No git commit was created.
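
The ISO selection order used by the USB helper scripts (explicit ARCHIPELAGO_ISO, then the default ISO, then the unbundled fallback) can be sketched as below. The helpers themselves are shell scripts; this Rust version only illustrates the order. The `select_iso` function, the injected `exists` check, the unbundled filename, and the choice to return an explicit override without checking that it exists are assumptions.

```rust
use std::path::{Path, PathBuf};

/// Picks the ISO to write: an explicit override wins, otherwise the first
/// candidate in `results_dir` for which `exists` returns true.
pub fn select_iso(
    override_path: Option<&str>,
    results_dir: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // Assumption: an explicit, non-empty override is trusted as given.
    if let Some(path) = override_path.filter(|p| !p.is_empty()) {
        return Some(PathBuf::from(path));
    }
    // Default ISO first, then the unbundled fallback (filename assumed).
    [
        "archipelago-installer-x86_64.iso",
        "archipelago-installer-unbundled-x86_64.iso",
    ]
    .iter()
    .map(|name| results_dir.join(name))
    .find(|candidate| exists(candidate.as_path()))
}
```

A caller could pass `std::env::var("ARCHIPELAGO_ISO").ok().as_deref()` as the override and `|p| p.exists()` as the existence check; injecting the check keeps the selection logic testable without touching the filesystem.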

2026-05-05 Warning Fix And ISO Build

  • Removed the reconcile_all dead-code warning by making the install-missing reconcile helper test-only with #[cfg(test)]; production uses reconcile_existing.

  • Verification now passes without Rust warnings:

    • cargo check -p archipelago --bin archipelago
    • cargo build -p archipelago --bin archipelago --release
  • Refreshed .52 backend artifact and manifests after the warning fix:

    • scripts/check-release-manifest.sh passes.
    • Backend sha256: fc47c3bc42f67472252cb854bb03e200a92929ab38aeac519422704486af18d4.
    • Frontend tarball sha256: 329e57a0491e91966afcd5a82f5c00920657695b01ecc6c9e99c6814b44abf29.
  • Built unbundled .52 Debian ISO:

    • Command: sudo -n env UNBUNDLED=1 BUILD_FROM_SOURCE=1 bash build-debian-iso.sh from image-recipe/.
    • Output: image-recipe/results/archipelago-installer-unbundled-x86_64.iso.
    • Size: 2.3G.
    • sha256: 547ba5dcd0ad61aeaa52ce0beaff4f447e2ab2c59bf6b1fa127529606fe0209d.
  • ISO build note:

    • The unbundled ISO completed successfully.
    • Optional File Browser core image pull failed during Step 3b because 146.59.87.168:3000 answered HTTP while Podman tried HTTPS: server gave HTTP response to HTTPS client.
    • This was non-fatal for unbundled media; Cloud/File Browser may need post-install Marketplace download unless registry TLS/insecure registry config is corrected before a bundled/core-image ISO.
  • Backend build deployed to .198: eb539aaa11b32776888be1b23b90c9c0c78b46d8a86dc55ccce7f5b15bbda16e.

  • Tailscale is now qualified:

    • Root cause: container command started tailscale web before tailscaled, so the web UI exited because /var/run/tailscale/tailscaled.sock did not exist yet.
    • Fixed backend config and first-boot script to start tailscaled --tun=userspace-networking first, then bind tailscale web --listen 0.0.0.0:8240.
    • Removed only the stale tailscale container on .198; preserved /var/lib/archipelago/tailscale.
    • Full preserve-data lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=tailscale ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Frontend launch now opens local app port http://<host>:8240/ instead of the external Tailscale admin site.
    • Browser launch passed: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=tailscale ARCHY_APP_TITLE=Tailscale ARCHY_APP_CARD_TITLE=Tailscale ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:8240/ ARCHY_EXPECTED_LAUNCH_MODE=popup ARCHY_EXPECTED_BODY_PATTERN='Tailscale|Connect|Login|Sign|Authorize|Machines|Admin|Tailnet|VPN' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
  • Grafana regression was found during broad audit:

    • RPC/container state was running, but direct launch failed on http://192.168.1.198:3000/ with status=000; Podman reported a port mapping while ss had no host listener.
    • Extended existing host-port listener repair to include Grafana port 3000 on install/adoption/start/restart paths.
    • Full Grafana lifecycle passed after repair, then focused Grafana audit passed.
  • Broad .198 audit passed after Tailscale and Grafana repairs:

    • Command: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 ARCHY_TIMEOUT=300 tests/lifecycle/remote-lifecycle.sh.
    • Running apps included tailscale, grafana, and the previously qualified app set.
    • Absent and tolerated: ollama, photoprism, electrumx, dwn.
  • Local verification passed:

    • cargo fmt
    • cargo build -p archipelago --bin archipelago --release with existing reconcile_all dead-code warning.
    • cargo check -p archipelago --bin archipelago with same warning.
    • bash -n scripts/first-boot-containers.sh
    • cd neode-ui && npm run build
    • cd neode-ui && npm run type-check
    • cd neode-ui && npm test -- appsConfig.test.ts appLauncher.test.ts --run
  • Backend build deployed to .198: 4b92ecea7d0a988c4ebe814b47f49f00277867d5f1eb0dca2cb1cd906b536fe6.

  • Gitea regression re-tested and repaired after later launch failure:

    • Failure reproduced during full lifecycle after restart: launch failed: gitea http://192.168.1.198:3001/ status=000 bytes=0.
    • Live diagnosis: Gitea was healthy internally on container port 3000 and ROOT_URL was correct, but Podman's rootless pasta host listener on :3001 accepted no traffic.
    • Changed Gitea install networking in core/archipelago/src/api/rpc/package/install.rs to --network=slirp4netns:allow_host_loopback=true, matching the Uptime Kuma rootless listener repair path.
    • Backend build deployed to .198: 9db6c192c2e633c4648fafc0372ea0f3cb0749aacc5396bb12f7710c8bac4aa7.
    • Full preserve-data lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=gitea ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Direct check passed: http://192.168.1.198:3001/ returned HTTP 200; final container inspect showed network=slirp4netns and rootlessport listening on :3001.
  • Botfights is qualified:

    • Initial failure was a stale pasta.avx2 listener on host port 9100; no Botfights container owned it.
    • Killed stale pid 211879 and reran full lifecycle.
    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=botfights ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
  • Gitea is qualified:

    • User-visible launch error was a broken asset root: Gitea generated /app/gitea/assets/... URLs while the UI/lifecycle launched direct port http://192.168.1.198:3001/.
    • Fixed backend post-install hook in core/archipelago/src/api/rpc/package/install.rs to set ROOT_URL = http://<host>:3001/ instead of /app/gitea/.
    • Added install/start/restart stale listener cleanup and host-port verification for Gitea host ports 3001, 2222, and legacy stale 3000.
    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=gitea ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
  • Icons updated locally:

    • Replacement files found at /home/archipelago/immich.png, /home/archipelago/electrumx.png, and /home/archipelago/grafana.png.
    • Replaced neode-ui/public/assets/img/app-icons/immich.png, neode-ui/public/assets/img/app-icons/grafana.png, and neode-ui/public/assets/img/grafana.png.
    • Added neode-ui/public/assets/img/app-icons/electrumx.png and updated catalog/curated/marketplace references from .webp to .png.
    • Installed Gitea icon now falls back to existing /assets/img/app-icons/gitea.svg instead of nonexistent /assets/img/app-icons/gitea.png.
    • AppHeroSection.vue now uses resolveAppIcon() so app details uses the same fallback behavior.
    • Verification passed: npm test -- appsConfig.test.ts --run.
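The Gitea and Botfights entries above both come down to the gap between the port mapping Podman reports and what is actually bound on the host. A minimal sketch of that check, assuming `ss -ltn` style output on stdin; the function name `port_has_listener` is hypothetical and is not part of tests/lifecycle/remote-lifecycle.sh:

```shell
# Succeeds when the `ss -ltn` output on stdin contains a LISTEN socket
# whose local address ends in the given port.
# port_has_listener is a hypothetical helper, not part of the harness.
port_has_listener() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # handles 0.0.0.0:3001, *:3001 and [::]:3001
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on the node (not executed here):
#   ss -ltn | port_has_listener 3001 || echo "no host listener on :3001"
```

Reading `ss` output from stdin keeps the check independent of Podman, so it can be compared directly against what `podman port` claims.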

2026-05-05 Nextcloud, Uptime Kuma, ElectrumX Warning

  • Backend build deployed to .198: 1796cccd44e7d8f34b495b2dc04bc933d85a32c8c77cee31800653cc5f7b05d0.
  • Nextcloud live 403 Forbidden was caused by unreadable Apache/PHP entry files inside the container:
    • .htaccess, index.php, and status.php were 0600 root:root.
    • Added targeted Nextcloud permission repair in core/archipelago/src/api/rpc/package/install.rs instead of broad recursive ownership/mode changes.
    • Manually repaired live container file modes and restarted Nextcloud.
    • Retested http://192.168.1.198:8085/status.php and http://192.168.1.198:8085/; both returned HTTP/1.1 200 OK.
  • Uptime Kuma root cause was rootless host port listener instability:
    • The app was healthy internally on 127.0.0.1:3001 and returned 302 /dashboard, while the host 3002 listener was missing despite Podman showing a mapping.
    • Changed Uptime Kuma install networking in core/archipelago/src/api/rpc/package/install.rs to --network=slirp4netns:allow_host_loopback=true.
    • Ran cargo fmt, cargo check -p archipelago --bin archipelago, and cargo build -p archipelago --bin archipelago --release successfully before deploy.
    • Recreated Uptime Kuma through local backend RPC on .198 with preserve-data uninstall/reinstall; preserved /var/lib/archipelago/uptime-kuma.
    • Retested http://192.168.1.198:3002/; final response was HTTP/1.1 302 Found with Location: /dashboard.
  • ElectrumX archival-node UI warning implemented in neode-ui:
    • Marketplace.vue, MarketplaceAppDetails.vue, and Discover.vue fetch /bitcoin-status and only block ElectrumX/electrs/mempool-electrs installs when blockchain_info.pruned === true.
    • Failed or unavailable prune-status fetches fail open: they do not block install attempts.
    • Warning text shown via toast/error paths: You need a full archival bitcoin node before downloading ElectrumX.
    • In MarketplaceAppCard.vue the blocked-state warning button stays clickable, so clicking it shows the warning toast instead of the button being silently disabled.
    • Frontend verification passed: npm run type-check from neode-ui.
  • Icon replacement remains blocked:
    • Searched likely upload locations and repo icon paths; no replacement icon files were found.
    • Existing icon directory is neode-ui/public/assets/img/app-icons/.
    • Continue once the actual replacement files/path are provided.
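The ElectrumX prune gate described above is a small decision rule: block only the archival-dependent apps, and only when the node is known to be pruned. The sketch below restates that rule in shell for consistency with the commands in this document; the real logic lives in the Vue components, and the function name `should_block_install` is an assumption:

```shell
# Decide whether an install should be blocked by the prune check.
# $1 = app id
# $2 = value of blockchain_info.pruned ("true", "false", or empty when
#      the /bitcoin-status fetch failed or was unavailable)
# should_block_install is a hypothetical name for illustration only.
should_block_install() {
    case "$1" in
        electrumx|electrs|mempool-electrs) ;;   # apps that need an archival node
        *) return 1 ;;                          # other apps are never blocked here
    esac
    [ "$2" = "true" ]                           # unknown or false: fail open
}

# Example: should_block_install electrumx true && echo "show archival-node warning"
```

An empty second argument models the failed-fetch case, which deliberately does not block the install.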

2026-05-04 Testing Continuation

  • SearXNG rootless listener fix deployed and qualified after reconnection:

    • Backend build deployed to .198: 0773e8719cfd1099ffeae27d9f046749353ebb7fa795c36097b674bd54c28820.
    • Root cause: the new-container install path repaired a missing rootless pasta host listener on port 8888, but the legacy "container already exists, adopt it" path could return success without the same repair. This left Podman reporting 0.0.0.0:8888->8080/tcp while ss showed no listener and launch probes returned 000.
    • Code fix: core/archipelago/src/api/rpc/package/install.rs now calls ensure_host_port_listener(package_id, package_id) before returning success from the existing-container adoption path.
    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=searxng ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=180 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser launch passed in panel mode: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=searxng ARCHY_APP_TITLE=SearXNG ARCHY_APP_CARD_TITLE=SearXNG ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:8888/ ARCHY_EXPECTED_LAUNCH_MODE=panel ARCHY_EXPECTED_BODY_PATTERN='SearXNG|Search' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
  • Jellyfin is qualified:

    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=jellyfin ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser launch passed in panel mode: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=jellyfin ARCHY_APP_TITLE=Jellyfin ARCHY_APP_CARD_TITLE=Jellyfin ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:8096/ ARCHY_EXPECTED_LAUNCH_MODE=panel ARCHY_EXPECTED_BODY_PATTERN='Jellyfin|jellyfin' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
  • ElectrumX is blocked on .198:

    • Reproduced failure: electrumx stayed absent after install when running ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=300 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Backend log shows install was rejected before container creation: electrumx requires an unpruned Bitcoin node while indexing. Current Bitcoin is pruned.
    • Direct Bitcoin RPC confirmed pruned: true, prune_target_size: 576716800, IBD blocks=472928, headers=947914.
    • Disk check showed /var/lib/archipelago has about 384G free, likely not enough for unpruned mainnet plus ElectrumX index. User selected Mark blocked; do not reconfigure Bitcoin on .198 unless explicitly requested.
  • PhotoPrism is pending/blocked on image pull speed:

    • PhotoPrism stayed in installing during ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh because the container image was still pulling.
    • No photoprism container was created yet; no port 2342 listener.
    • Backend logs show 146.59.87.168:3000/lfg2025/photoprism:240915 timed out after 600s, then git.tx1138.com/lfg2025/photoprism:240915 timed out after 600s, then retry attempt 1/3 restarted the primary registry pull.
    • Treat as image/registry-pull pending rather than app runtime failure unless a later pull completes and the container fails to start.
  • Stuck-installing backend fix deployed after PhotoPrism exposed long pull retries:

    • Backend build deployed to .198: 1f0dd8b9fe801d289557ac050f68011c395374f2b0d5c4677b884d6081612de0.
    • Single-container image pulls now try the configured registry list once with a 300s per-URL timeout instead of repeating the whole list three times with 600s per URL. This turns missing/stalled image pulls into visible failed installs instead of leaving cards in installing for close to an hour.
    • Scanner now removes stale absent transitional entries after TRANSITIONAL_STUCK_TIMEOUT; previously an Installing entry with no container could survive indefinitely after a backend restart or killed pull task.
    • Verified PhotoPrism state recovered to absent with ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=photoprism ARCHY_TIMEOUT=60 ARCHY_STABILITY_SECONDS=1 tests/lifecycle/remote-lifecycle.sh.
  • Nginx Proxy Manager is qualified:

    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=nginx-proxy-manager ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser launch passed as a new-tab app: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=nginx-proxy-manager ARCHY_APP_TITLE='Nginx Proxy Manager' ARCHY_APP_CARD_TITLE='Nginx Proxy Manager' ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:81/ ARCHY_EXPECTED_LAUNCH_MODE=popup ARCHY_EXPECTED_BODY_PATTERN='Nginx|Proxy|Manager|Sign in|Email' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
  • Portainer is qualified:

    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=portainer ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser launch passed as a new-tab app: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=portainer ARCHY_APP_TITLE=Portainer ARCHY_APP_CARD_TITLE=Portainer ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:9000/ ARCHY_EXPECTED_LAUNCH_MODE=popup ARCHY_EXPECTED_BODY_PATTERN='Portainer|Username|Password|Create administrator' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
  • Uptime Kuma is blocked on .198:

    • Initial failure was a recipe bug: code overrode the image entrypoint to /usr/bin/dumb-init but did not pass a program, causing repeated dumb-init usage exits.
    • Fixed recipe by passing -- node server/server.js; deployed backend 540aefb2e1d19aa64b7a5da316bf12c1933145d7ea536afedffb6068371a476f.
    • Added install/start/restart listener repair for host port 3002; latest deployed backend is bbcba3f32fab8e11349962f8bb5227ec0374cf36200a768a716c00485dcd121b.
    • Remaining blocker: Uptime Kuma container stays healthy and listens internally on 3001, Podman reports 0.0.0.0:3002->3001/tcp, but ss loses the actual host listener and direct curl returns 000.
    • Manual podman restart uptime-kuma makes 127.0.0.1:3002 return status 302 with a 32-byte body for about 105 seconds, then the listener disappears while the container remains healthy. Treat as an unstable rootless pasta listener, not an app process crash.
  • Immich is qualified:

    • Backend build deployed to .198: 22c8129b8f4e93b58cce9baef8f9e1d071cb243faf85bee1b56457d48f46bbfc.
    • Root cause of lifecycle failure: container-health was called with app id immich, but the fallback health/status aliases only inspected immich and archy-immich; the stack's real service container is immich_server. The scanner already reports the stack as immich, so state was running while health returned unknown.
    • Code fix: core/archipelago/src/api/rpc/container.rs now includes immich_server in health/status app-id and container-name candidates for immich.
    • Full lifecycle passed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=immich ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=1800 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser launch passed in panel mode from neode-ui: ARCHY_BASE_URL=http://192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APP_ID=immich ARCHY_APP_TITLE=Immich ARCHY_APP_CARD_TITLE=Immich ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:2283/ ARCHY_EXPECTED_LAUNCH_MODE=panel ARCHY_EXPECTED_BODY_PATTERN='Immich|Login|Admin|Photos' npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line.
    • Note: an earlier /tmp/archipelago.new transfer was truncated/mismatched and crashed with SIGSEGV; restored bbcba3f32fab8e11349962f8bb5227ec0374cf36200a768a716c00485dcd121b, recopied verified local release to /tmp/archipelago.local-release, then deployed it successfully.
  • DWN is blocked on missing/unpullable image:

    • Full lifecycle failed: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=dwn ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Failure: dwn did not reach running within 900s (last=absent).
    • Backend journal shows both pull attempts failed before container creation: 146.59.87.168:3000/lfg2025/dwn-server:main and git.tx1138.com/lfg2025/dwn-server:main, ending with Image pull failed from all 2 configured registries.
    • No dwn container or image exists on .198; treat as image/catalog publishing blocker unless a local fallback image is built or registry image is restored.
  • Botfights handoff point:

    • Lifecycle command was started but user interrupted during install while switching computers: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=botfights ARCHY_FULL_LIFECYCLE=1 ARCHY_TIMEOUT=900 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Last visible output before abort: == botfights: install ==.
    • On resume, inspect current botfights state/container/image before rerunning because the backend install task may have continued after the local harness was aborted.
  • Broad .198 audit passed:

    • Command: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • Running/healthy enough for audit: bitcoin-knots, btcpay-server, lnd, mempool, homeassistant, grafana, searxng, nextcloud, vaultwarden, filebrowser, fedimint, indeedhub.
    • Absent and tolerated by audit at the time: ollama, jellyfin, photoprism, immich, nginx-proxy-manager, portainer, uptime-kuma, electrumx, dwn, botfights, gitea.
  • Focused full preserve-data lifecycle passed in this continuation:

    • btcpay-server: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=btcpay-server ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • nextcloud: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=nextcloud ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • mempool: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • homeassistant: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=homeassistant ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • grafana: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=grafana ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • vaultwarden: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=vaultwarden ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • filebrowser: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=filebrowser ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  • Focused full preserve-data lifecycle still known-passing from prior handoff: lnd, bitcoin-knots, fedimint, indeedhub.

  • SearXNG regression reproduced:

    • Command failed at install launch probe: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=searxng ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
    • Failure: launch failed: searxng http://192.168.1.198:8888/ status=000 bytes=0.
    • Post-failure state: container searxng is Up ... (healthy) and podman port searxng reports 8080/tcp -> 0.0.0.0:8888, but ss -ltn has no *:8888 listener and both curl http://127.0.0.1:8888/ and curl http://192.168.1.198:8888/ return status 000 with 0 bytes.
    • A package.restart temporarily recreated the listener and direct curl returned status 200 with 6316 bytes, but the next full lifecycle reinstall reproduced the missing listener.
  • Remaining focused full-lifecycle candidates after this continuation:

    • Blocked on .198: electrumx, uptime-kuma.
    • Pending on image pull: photoprism.
    • Absent apps not yet qualified in this pass: botfights, gitea.
    • Botfights lifecycle attempt was interrupted during install; inspect state first on resume.
    • Blocked on missing image: dwn.
    • Skip ollama until image/manifest/catalog entry is restored.
    • electrumx is absent but was mentioned as a possible follow-up in earlier handoff; run only if it remains in scope.
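The stuck-installing fix earlier in this section replaces "three passes at 600s per URL" with "one pass at 300s per URL". A rough shell sketch of the one-pass behavior follows, assuming coreutils `timeout` is available. `PULL_TIMEOUT`, `PULL_CMD`, and `pull_first_available` are illustrative names, not backend settings, and the backend implements this in Rust rather than shell:

```shell
# Try each registry URL once; stop at the first successful pull.
# PULL_TIMEOUT and PULL_CMD are illustrative knobs, not backend settings.
PULL_TIMEOUT="${PULL_TIMEOUT:-300}"
PULL_CMD="${PULL_CMD:-podman pull}"

pull_first_available() {
    for image_url in "$@"; do
        # coreutils timeout ends a stalled pull after PULL_TIMEOUT seconds.
        # PULL_CMD is intentionally unquoted so "podman pull" splits into words.
        if timeout "$PULL_TIMEOUT" $PULL_CMD "$image_url"; then
            echo "pulled $image_url"
            return 0
        fi
        echo "pull failed or timed out: $image_url" >&2
    done
    echo "image pull failed from all $# configured registries" >&2
    return 1
}

# Usage on the node (not executed here), with the PhotoPrism URLs from the log above:
#   pull_first_available 146.59.87.168:3000/lfg2025/photoprism:240915 \
#                        git.tx1138.com/lfg2025/photoprism:240915
```

With two registries the worst case is about 600 seconds before the install is reported as failed, instead of a card that sits in installing for close to an hour.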

2026-05-04 IndeedHub And LND Update

  • Latest deployed backend hash observed on .198: 83ad80ec793095f2b19746ad8c3d76ab2e7b57b132e4182a28ea9ff86067908b.
  • Frontend bundle redeployed to /opt/archipelago/web-ui; dashboard Last-Modified: Mon, 04 May 2026 10:15:11 GMT.
  • LND was intentionally switched back to panel/iframe launch per user request:
    • Removed lnd from NEW_TAB_APPS, TAB_LAUNCH_APPS, and NEW_TAB_APP_IDS.
    • Browser panel launch qualification passed against http://192.168.1.198:18083/.
  • IndeedHub is now qualified:
    • Full backend/container lifecycle passed.
    • Browser Launch qualification passed in panel/iframe mode.
    • /nostr-provider.js is served by IndeedHub and contains the NIP-07/NIP-98 bridge markers.

IndeedHub Issues Fixed

  • Stack restart failed because restarted backend containers lost network aliases (minio, postgres, redis, relay, api).
  • Added alias repair for IndeedHub stack restart/start paths:
    • core/archipelago/src/api/rpc/package/stacks.rs
    • core/archipelago/src/api/rpc/package/runtime.rs
    • core/archipelago/src/container/prod_orchestrator.rs
  • The frontend nginx container failed under read-only root with:
    • open() "/run/nginx.pid" failed (30: Read-only file system)
  • Added writable tmpfs mounts for stack-created IndeedHub frontend:
    • /run
    • /var/cache/nginx
  • The boot reconciler raced the async stack installer by recreating the single-container manifest indeedhub:latest while package.install indeedhub was still pulling stack images. This stole the indeedhub container name and caused stack frontend creation to fail.
  • Fixed by marking IndeedHub as stack-managed in ProdContainerOrchestrator::ensure_running_with_mode, so generic manifest reconciliation no longer installs/recreates it.
  • Lifecycle harness now waits for async install transition states to settle before checking running, avoiding stale-container false positives.

Passing Commands

  • IndeedHub full lifecycle:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=indeedhub ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
  • IndeedHub browser launch (panel mode):
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=indeedhub \
ARCHY_APP_TITLE=IndeedHub \
ARCHY_APP_CARD_TITLE=IndeedHub \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:7778/ \
ARCHY_EXPECTED_LAUNCH_MODE=panel \
ARCHY_EXPECTED_BODY_PATTERN='Indee|Indeed|Bitcoin|documentary|nostr' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • LND browser launch (panel mode):
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=lnd \
ARCHY_APP_TITLE=LND \
ARCHY_APP_CARD_TITLE=LND \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:18083/ \
ARCHY_EXPECTED_LAUNCH_MODE=panel \
ARCHY_EXPECTED_BODY_PATTERN='Connect Your Wallet|lndconnect|REST|gRPC|Copy lndconnect URI' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Grafana is now qualified:
    • Full backend/container lifecycle passed.
    • Browser Launch qualification passed against http://192.168.1.198:3000/ and its /login page.
  • Home Assistant is now qualified:
    • Full backend/container lifecycle passed.
    • Browser Launch qualification passed; first-run redirect to /onboarding.html is accepted.
  • SearXNG is now qualified:
    • Full backend/container lifecycle passed.
    • Browser Launch qualification passed in panel/iframe mode against http://192.168.1.198:8888/.
    • Fixed stale rootless pasta listener recovery for port 8888 before install/retry.
    • Fixed manifest image drift by aligning apps/searxng/manifest.yml with package install image 146.59.87.168:3000/lfg2025/searxng:latest; backend restart was required on .198 to reload the deployed manifest.
  • SearXNG recheck after user reported UI not loading:
    • RPC/container state showed running and Podman reported 0.0.0.0:8888->8080/tcp, but ss showed no actual listener and direct curl http://192.168.1.198:8888/ failed.
    • Restarted SearXNG through package.restart, which recreated the rootless port listener on *:8888.
    • Re-ran audit: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=searxng ARCHY_TIMEOUT=180 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh passed.
    • Re-ran browser launch qualification for SearXNG in panel mode; Playwright passed.
  • Ollama is currently blocked/unqualified:
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=ollama ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh failed after install because container-list stayed absent for 900s.
    • No apps/ollama/manifest.yml exists and ollama is absent from app-catalog/catalog.json / neode-ui/public/catalog.json.
    • Confirmed configured image is missing: podman manifest inspect --tls-verify=false 146.59.87.168:3000/lfg2025/ollama:latest returns manifest unknown.
    • This matches the CHANGELOG.md v1.7.45 note that Ollama was removed because installs hung when no source image was available in the registries.
  • Nextcloud is now qualified:
    • Full backend/container lifecycle passed with preserve-data uninstall/reinstall: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=nextcloud ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser Launch qualification passed as a new-tab app against http://192.168.1.198:8085/.
    • Note: Nextcloud sends X-Frame-Options: SAMEORIGIN, so a panel/iframe launch from the dashboard origin leaves the iframe body empty; qualify it with ARCHY_EXPECTED_LAUNCH_MODE=popup.
  • Vaultwarden is now qualified:
    • Initial audit found vaultwarden absent by RPC but a stale rootless pasta listener still bound to *:8082; cleared with pkill -f "pasta.*8082" before install.
    • Full backend/container lifecycle passed with preserve-data uninstall/reinstall: ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=vaultwarden ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh.
    • Browser Launch qualification passed as a new-tab app against http://192.168.1.198:8082/.
  • Continue one-by-one lifecycle/browser qualification with jellyfin, photoprism, immich, nginx-proxy-manager, portainer, uptime-kuma, dwn, botfights, and gitea. Skip Ollama until an image/manifest/catalog entry is restored.
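The Nextcloud note above suggests a quick pre-check before choosing ARCHY_EXPECTED_LAUNCH_MODE for the remaining apps. The dashboard origin differs from each app origin by port, so any X-Frame-Options value of DENY or SAMEORIGIN rules out panel mode. A minimal sketch follows; `suggest_launch_mode` is a hypothetical helper, and it does not inspect a Content-Security-Policy frame-ancestors directive, which can also block framing:

```shell
# Read HTTP response headers on stdin and suggest a launch mode.
# X-Frame-Options: DENY or SAMEORIGIN prevents framing from the dashboard
# origin, because the app is served from a different port (a different origin).
# suggest_launch_mode is a hypothetical helper name.
suggest_launch_mode() {
    if grep -Eqi '^x-frame-options:[[:space:]]*(deny|sameorigin)'; then
        echo popup
    else
        echo panel
    fi
}

# Usage on the node (not executed here):
#   curl -sI http://192.168.1.198:8085/ | suggest_launch_mode
```

The result is only a hint; the Playwright qualification remains the deciding check.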

2026-05-04 Fedimint Update

  • Latest deployed backend hash observed on .198: cb464ede6625c00f4fa9e8940d933d7a69d29b0537cfabd8da783f0116a0c587.
  • Fedimint Guardian is now qualified under the current release standard:
    • Full backend/container lifecycle passed with preserve-data uninstall/reinstall.
    • Browser Launch qualification passed in panel/iframe mode against http://192.168.1.198:8175/.
  • Root-cause fix: Fedimint image runs as uid 0 inside the rootless container, so its bind-mounted data directory must be host-owned by 1000:1000, not subuid 100000:100000.
  • Implemented ownership repair in core/archipelago/src/container/prod_orchestrator.rs via the Fedimint pre-start/data-dir hook.
  • Passing lifecycle command:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
  • Passing browser launch command:
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=fedimint \
ARCHY_APP_TITLE='Fedimint Guardian' \
ARCHY_APP_CARD_TITLE='Fedimint Guardian' \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:8175/ \
ARCHY_EXPECTED_LAUNCH_MODE=panel \
ARCHY_EXPECTED_BODY_PATTERN='Fedimint|Guardian|Federation|Mint|Bitcoin' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Result: 1 passed (11.7s).
  • Note: backend scanner currently reports Fedimint lan_address from the first exposed port (8173), but the frontend app-session mapping correctly launches the UI on 8175.
  • Continue with IndeedHub full lifecycle and browser Launch qualification.
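The Fedimint ownership fix follows from how rootless user namespaces map ids. A small worked example, assuming the default rootless mapping, a host user uid of 1000, and a subuid range starting at 100000 (the values implied by the note above); `host_uid_for` is a hypothetical helper:

```shell
# Map a container uid to the host uid under a default rootless mapping:
#   container uid 0     -> the invoking user's own uid
#   container uid N>=1  -> subuid_start + N - 1
# Arguments: container_uid host_user_uid subuid_start (assumed values).
host_uid_for() {
    if [ "$1" -eq 0 ]; then
        echo "$2"
    else
        echo $(( $3 + $1 - 1 ))
    fi
}

# host_uid_for 0 1000 100000   -> the process runs as host uid 1000
# host_uid_for 1 1000 100000   -> the process runs as host uid 100000
```

Because the Fedimint process is uid 0 inside the container, it reaches the bind mount as host uid 1000, which is why a directory owned by 100000:100000 is not writable for it.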

2026-05-04 Mempool Update

  • Latest deployed backend hash on .198: 02d79360df86d653c9e7b06a05bdf039a0454b81a65220dbe16fa57cafeed236.
  • Mempool is now qualified:
    • Full backend/container lifecycle passed.
    • Browser Launch qualification passed in panel/iframe mode.

Mempool Issues Fixed

  • Initial Mempool lifecycle failed after install with bad health: mempool is unknown.
  • Root cause: package id mempool maps to manifest/app id archy-mempool-web with container name mempool; container-health called orchestrator.health("mempool") directly and bypassed alias candidates.
  • Added alias handling in core/archipelago/src/api/rpc/container.rs:
    • mempool / mempool-web status candidates include archy-mempool-web.
    • specific container-health { app_id: "mempool" } now tries alias candidates and direct Podman container-name fallback.
  • After deploy, short audit passed:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool ARCHY_TIMEOUT=60 ARCHY_STABILITY_SECONDS=0 tests/lifecycle/remote-lifecycle.sh
  • Mempool full lifecycle passed:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  • Result: all checks passed.
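The alias handling above (and the similar Immich fix recorded elsewhere in this handoff) amounts to trying an ordered list of candidates instead of the bare package id. A sketch of that lookup in shell; the candidate names are taken from this handoff, while the function names and the default branch are assumptions. The real implementation is in core/archipelago/src/api/rpc/container.rs:

```shell
# Print the health/status candidates (app ids and container names) to try
# for a package id, in order. Function name and default branch are assumptions.
health_candidates() {
    case "$1" in
        mempool|mempool-web) echo "mempool archy-mempool-web" ;;
        immich)              echo "immich archy-immich immich_server" ;;
        *)                   echo "$1" ;;
    esac
}

# Print the first candidate found in the newline-separated list of existing
# container names on stdin; fail if none match.
first_existing_candidate() {
    existing="$(cat)"
    for name in $(health_candidates "$1"); do
        if printf '%s\n' "$existing" | grep -qx -- "$name"; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Usage on the node (not executed here):
#   podman ps --format '{{.Names}}' | first_existing_candidate immich
```

A failed lookup corresponds to the unknown health result that the original direct call produced.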

Mempool Browser Launch

  • Mempool is an in-panel/iframe app, not a new-tab app.
  • Initial browser test failed because the generic spec expected a popup.
  • Updated neode-ui/e2e/app-launch.spec.ts:
    • ARCHY_EXPECTED_LAUNCH_MODE=panel verifies an app session iframe instead of popup.
    • Card selection now matches a card heading exactly via APP_CARD_TITLE/APP_TITLE, avoiding false matches from description text (ElectrumX description mentions Mempool).
    • Panel iframe selector tolerates source URLs without a trailing slash.
  • Passing command:
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=mempool \
ARCHY_APP_TITLE=Mempool \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:4080/ \
ARCHY_EXPECTED_LAUNCH_MODE=panel \
ARCHY_EXPECTED_BODY_PATTERN='Mempool|Bitcoin|Block|Transaction' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Result: 1 passed (15.8s).
  • Continue installed app qualification with electrumx or filebrowser.
  • ElectrumX already had prior focused work but should get the current browser launch standard if not already rerun after these Playwright spec changes.
  • Suggested ElectrumX backend lifecycle:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  • Suggested ElectrumX browser launch:
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=electrumx \
ARCHY_APP_TITLE=ElectrumX \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:50002/ \
ARCHY_EXPECTED_LAUNCH_MODE=panel \
ARCHY_EXPECTED_BODY_PATTERN='ElectrumX|Connect Your Wallet|50001' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
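The trailing-slash tolerance added to the panel iframe selector can be pictured as a normalized comparison. The spec itself is Playwright TypeScript; the shell function below only illustrates the comparison, and `same_launch_url` is a hypothetical name:

```shell
# Compare two launch URLs, ignoring a single trailing slash on either side.
same_launch_url() {
    [ "${1%/}" = "${2%/}" ]
}

# Example: same_launch_url http://192.168.1.198:4080/ http://192.168.1.198:4080 && echo match
```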

2026-05-04 Resume Snapshot

  • Another agent changed the worktree before this session; do not revert unrelated dirty files.
  • .198 service is active, archipelago-doctor.timer inactive, archipelago-reconcile.timer inactive.
  • Latest deployed backend hash on .198: 02d79360df86d653c9e7b06a05bdf039a0454b81a65220dbe16fa57cafeed236.
  • LND remains qualified from prior session: full backend lifecycle passed and browser Launch opens http://192.168.1.198:18083/ with wallet-connect content.
  • BTCPay is now qualified:
    • Full backend/container lifecycle passed after stop-state normalization fix.
    • Browser Launch qualification passed against .198; first-run redirect to /register is accepted.

2026-05-04 Work Completed

  • Rechecked local/remote state after separate-agent work.
  • Ran BTCPay full lifecycle:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=btcpay-server ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  • Initial BTCPay run failed at the stop step: the BTCPay containers were explicitly stopped, but Podman reports stopped containers as exited, so the scanner overwrote the package state from Stopped to Exited while the harness kept waiting for stopped.
  • Fixed scanner merge path in core/archipelago/src/server.rs: scanned Exited package entries are normalized to Stopped when the app id is present in user-stopped.json, which resolves to /var/lib/archipelago/user-stopped.json through the configured data_dir.
  • Rebuilt and deployed backend to .198; new hash 6bd9db024ab37017cadd684cb3296c6adbcf290ac27e1238a6bf1e7c0f883e3e.
  • Verified BTCPay then reports state=stopped after explicit stop.
  • Reran BTCPay full lifecycle; result: all checks passed.
  • Updated neode-ui/e2e/app-launch.spec.ts to support app-specific URL/body regexes:
    • ARCHY_EXPECTED_LAUNCH_URL_PATTERN
    • ARCHY_EXPECTED_BODY_PATTERN
  • Ran BTCPay browser launch qualification:
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=btcpay-server \
ARCHY_APP_TITLE=BTCPay \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:23000/ \
ARCHY_EXPECTED_LAUNCH_URL_PATTERN='^http://192\.168\.1\.198:23000/(register)?$' \
ARCHY_EXPECTED_BODY_PATTERN='BTCPay|Create.*account|Register|Store' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Result: 1 passed (10.3s).
  • Mempool is now complete. Continue app-by-app qualification with ElectrumX or File Browser.
  • Prior suggested Mempool command, now passing:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=mempool ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
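  • Earlier suggested Mempool browser launch command, kept for reference. It omits ARCHY_EXPECTED_LAUNCH_MODE=panel, so the spec expects a popup; use the panel-mode command under Mempool Browser Launch instead: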
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=mempool \
ARCHY_APP_TITLE=Mempool \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:4080/ \
ARCHY_EXPECTED_BODY_PATTERN='Mempool|Bitcoin|Block|Transaction' \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
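The scanner fix recorded in this section is a one-rule normalization: an exited container counts as stopped only when the user asked for the stop. A shell sketch of that rule follows. In the backend the user-stopped ids come from user-stopped.json; reading the file is left out because its format is not shown in this handoff, and `normalize_state` is a hypothetical name:

```shell
# Normalize a scanned package state.
# $1 = scanned state, $2 = app id, remaining args = app ids the user
# explicitly stopped (the backend reads these from user-stopped.json).
normalize_state() {
    state="$1"; app_id="$2"; shift 2
    if [ "$state" = "exited" ]; then
        for stopped_id in "$@"; do
            if [ "$stopped_id" = "$app_id" ]; then
                echo stopped        # user-requested stop, not a crash
                return 0
            fi
        done
    fi
    echo "$state"                   # everything else passes through unchanged
}

# normalize_state exited btcpay-server btcpay-server lnd   -> stopped
# normalize_state exited mempool btcpay-server lnd         -> exited
```

Keeping unlisted exited containers as exited preserves the distinction between a deliberate stop and an unexpected exit.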

Updated Resume Prompt

Resume Archipelago container lifecycle hardening from /home/archipelago/Projects/archy. Read docs/CONTAINER_LIFECYCLE_HANDOFF.md first. Remote node is 192.168.1.198, SSH key /home/archipelago/.ssh/id_ed25519, ARCHY_PASSWORD=password123. Preserve data unless explicitly told otherwise. Keep archipelago-doctor.timer and archipelago-reconcile.timer paused. Do not revert unrelated dirty worktree changes because another agent has been working too. LND, BTCPay, and Mempool now have full backend lifecycle plus browser Launch qualification passing. Latest deployed backend hash on .198 is 02d79360df86d653c9e7b06a05bdf039a0454b81a65220dbe16fa57cafeed236. Continue with the next installed app, likely ElectrumX or File Browser, using full lifecycle and then Playwright browser launch qualification.

2026-05-03 Resume Snapshot

  • Remote node under test: 192.168.1.198.
  • SSH key: /home/archipelago/.ssh/id_ed25519.
  • Lifecycle password: ARCHY_PASSWORD=password123.
  • Current qualification target: BTCPay full lifecycle. LND user-facing launch flow is now qualified.
  • Do not proceed to broad release/audit until app launch qualification includes a real browser click/open-tab check, not just backend/direct-port curl.
  • Preserve data during lifecycle testing unless explicitly told otherwise.
  • Legacy timers should remain paused during deterministic qualification: archipelago-doctor.timer and archipelago-reconcile.timer inactive/disabled.

Latest Deployed State On .198

  • Backend deployed to /usr/local/bin/archipelago; service observed active.
  • Latest backend hash observed on .198: abbd9fa4e6beace75f590c1988a1904b9de62b4b21fade1291926ac039c4747b.
  • Frontend bundle was rebuilt with LND new-tab config and deployed to /opt/archipelago/web-ui.
  • Dashboard entrypoint at http://192.168.1.198/ returns 200 and fresh Last-Modified: Sun, 03 May 2026 20:09:08 GMT.
  • Dashboard CSP allows direct app ports via connect-src ... http://192.168.1.198:* and frame-src ... http://192.168.1.198:*.
  • LND direct UI still works from the test environment:
curl -fsS -D - http://192.168.1.198:18083/ -o /tmp/opencode/lnd-ui.html

Expected: HTTP/1.1 200 OK, wallet-connect page content including Connect Your Wallet, lndQrBox, rest-tor, grpc-tor, and Copy lndconnect URI.
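
The content check above can be scripted against the saved HTML. A minimal sketch, assuming the page was saved by the curl command above; `check_lnd_ui_markers` is a hypothetical helper name, not part of the harness, and the marker strings are the ones listed in the Expected line.

```bash
# Sketch only: check_lnd_ui_markers is a hypothetical helper.
# Prints each expected marker that is missing from the saved HTML file and
# returns non-zero if any marker is absent.
check_lnd_ui_markers() {
  local html_file=$1 marker missing=0
  local markers=(
    "Connect Your Wallet"
    "lndQrBox"
    "rest-tor"
    "grpc-tor"
    "Copy lndconnect URI"
  )
  for marker in "${markers[@]}"; do
    # -F: fixed string, -q: quiet, --: marker is never parsed as an option
    if ! grep -qF -- "$marker" "$html_file"; then
      echo "missing: $marker"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage after the curl: `check_lnd_ui_markers /tmp/opencode/lnd-ui.html`. This only checks that the strings are present; it does not prove the connection info behind them is real.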

LND Status

  • Backend/container lifecycle for LND passed after the latest backend changes:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  • Result: all checks passed through install, stop, start, restart, preserve-data uninstall, reinstall.
  • Direct LND UI is reachable at http://192.168.1.198:18083/.
  • Product/UI launch is now qualified by Playwright against .198. User previously saw browser launch failures (refused to connect / This site can't be reached), but the deployed frontend/backend now open the direct LND UI URL successfully.
  • Frontend changes intended to fix this:
    • neode-ui/src/views/appSession/appSessionConfig.ts: lnd added to NEW_TAB_APPS.
    • neode-ui/src/views/apps/appsConfig.ts: lnd added to TAB_LAUNCH_APPS.
    • neode-ui/src/stores/appLauncher.ts: lnd added to NEW_TAB_APP_IDS.

Browser-Level Launch Check Added

  • Added neode-ui/e2e/app-launch.spec.ts as a reusable Playwright qualification test.
  • Intended run command (no ARCHY_BASE_URL, so baseURL falls back to the default in playwright.config.ts):
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=lnd \
ARCHY_APP_TITLE=LND \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:18083/ \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Current result: passing against .198.
  • Passing command:
cd /home/archipelago/Projects/archy/neode-ui
ARCHY_BASE_URL=http://192.168.1.198 \
ARCHY_PASSWORD=password123 \
ARCHY_APP_ID=lnd \
ARCHY_APP_TITLE=LND \
ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:18083/ \
npx playwright test e2e/app-launch.spec.ts --config=playwright.config.ts --project=chromium --reporter=line
  • Result: 1 passed (12.3s).
  • The test clicks the real My Apps Launch button, waits for the popup, verifies URL http://192.168.1.198:18083/, and checks wallet-connect text in the popup body.

New Root-Cause Findings To Continue

  • AppDetails can render App Not Found before package data has arrived. The route still does not wait for the WebSocket initial package snapshot; the launch qualification now uses My Apps card launch, which matches user behavior.
  • server.get-state frontend call was broken against the deployed backend:
RPC method: server.get-state
RPC error on server.get-state: Unknown method: server.get-state
  • Fixed by adding server.get-state dispatch support in core/archipelago/src/api/rpc/dispatcher.rs and deploying the new backend to .198.
  • Verified browser-authenticated server.get-state returns hasLnd=true, status=200, error=null.
  • WebSocket initial data still works; logs showed WebSocket /ws/db connected and initial state dumps.
  • Earlier browser-test failures were due to wrong Playwright baseURL defaulting to .228 and/or empty package state on that node, not LND direct UI reachability.
  • Direct unauthenticated container-list is allowed by auth rules, but authenticated browser calls without CSRF fail with 403; the Playwright test should not rely on raw RPC calls without CSRF unless using exempt read-only methods.

Immediate Resume Steps

  1. Proceed to BTCPay full lifecycle:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=btcpay-server ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  2. If BTCPay passes backend lifecycle, add/run browser-level launch qualification for BTCPay using the same Playwright spec with ARCHY_APP_ID=btcpay-server, ARCHY_APP_TITLE=BTCPay, and ARCHY_EXPECTED_LAUNCH_URL=http://192.168.1.198:23000/.

  3. Fix stale boot_reconciler unit tests for existing-only production behavior if running the full backend test suite.

Verification Commands Before Resuming

ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 'systemctl is-active archipelago.service; systemctl is-active archipelago-doctor.timer 2>/dev/null || true; systemctl is-active archipelago-reconcile.timer 2>/dev/null || true; podman ps -a --format "{{.Names}} {{.Status}} {{.Ports}}" | grep -E "lnd|btcpay|nbxplorer|bitcoin|electrs" || true'
curl -fsS -D - http://192.168.1.198:18083/ -o /tmp/opencode/lnd-ui.html
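
The timer lines printed by the ssh command above can be turned into a pass/fail check. A minimal sketch: `timer_is_paused` is a hypothetical helper, and treating `unknown` as paused is an assumption that it means the unit does not exist on the node.

```bash
# Sketch only: timer_is_paused is a hypothetical helper.
# $1 = one line of `systemctl is-active <timer>` output.
# Returns 0 when the timer counts as paused, 1 otherwise.
timer_is_paused() {
  case "$1" in
    inactive|unknown) return 0 ;;  # "unknown" as paused is an assumption
    *) return 1 ;;                 # active, activating, failed, ...
  esac
}
```

Feed it the state reported for archipelago-doctor.timer and archipelago-reconcile.timer; both should count as paused before deterministic qualification continues.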

Files Touched In This Latest Session

  • neode-ui/e2e/app-launch.spec.ts: new parameterized Playwright launch qualification spec.
  • neode-ui/playwright.config.ts: baseURL can now be overridden with ARCHY_BASE_URL.
  • core/archipelago/src/api/rpc/dispatcher.rs: added server.get-state dispatch handler.
  • neode-ui/src/views/appSession/appSessionConfig.ts: LND forced new-tab session behavior.
  • neode-ui/src/views/apps/appsConfig.ts: LND marked as tab-launch app.
  • neode-ui/src/stores/appLauncher.ts: LND forced new-tab from legacy/open URL path.
  • docs/CONTAINER_LIFECYCLE_HANDOFF.md: this handoff update.

Still Dirty / Important

  • Worktree is dirty with many lifecycle/backend/frontend changes and untracked files. Do not revert other changes.
  • git status --short currently includes untracked tests/lifecycle/remote-lifecycle.sh, core/archipelago/src/container/lnd.rs, neode-ui/e2e/app-launch.spec.ts, and this handoff doc.
  • No commit was created.

Resume Prompt

Use this prompt in a fresh remote session:

Resume Archipelago lifecycle hardening from /home/archipelago/Projects/archy. Read docs/CONTAINER_LIFECYCLE_HANDOFF.md first. Current remote node is 192.168.1.198, SSH key /home/archipelago/.ssh/id_ed25519, ARCHY_PASSWORD=password123. LND backend lifecycle and browser launch qualification are now passing; latest deployed backend hash on .198 is abbd9fa4e6beace75f590c1988a1904b9de62b4b21fade1291926ac039c4747b. Continue with BTCPay full lifecycle, then add/run the same browser launch qualification for BTCPay. Preserve data unless explicitly told otherwise, keep doctor/reconcile timers paused, and do not revert unrelated dirty worktree changes.

Operator Snapshot

  • Plan: harden app/container lifecycle before release using strict lifecycle tests and app-specific probes.
  • Current target: run broad .198 audit after focused fixes for LND, Bitcoin Knots, Fedimint, and IndeedHub.
  • LND status on .198: strict audit and full preserve-data lifecycle passed on 2026-05-02.
  • Bitcoin Knots status on .198: full preserve-data lifecycle passed on 2026-05-02.
  • Fedimint status on .198: full preserve-data lifecycle passed on 2026-05-02.
  • IndeedHub status on .198: full preserve-data lifecycle passed on 2026-05-02.
  • Last known local status: focused lifecycle/orchestrator/container unit tests pass and release build succeeds.
  • Do not release until broad audit and app-specific UI probes pass.

Goal

Harden and verify Archipelago app/container lifecycle before release. Required coverage is install, launch, stop, start, restart, uninstall with preserve_data=true, reinstall, and launch again. UI checks must validate app-specific functionality, not only HTTP 200.
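
The required coverage above is an ordered sequence that should stop at the first failure. A minimal sketch of that ordering: `run_lifecycle`, `LIFECYCLE_STEPS`, and the step-runner callback are hypothetical names and do not describe how tests/lifecycle/remote-lifecycle.sh is actually structured.

```bash
# Sketch only: names are hypothetical. Step labels mirror the required
# coverage listed above, in order.
LIFECYCLE_STEPS=(install launch stop start restart uninstall-preserve-data reinstall launch-again)

# $1 = command invoked as: <command> <step>. Stops at the first failing step
# and reports which step it was.
run_lifecycle() {
  local step_runner=$1 step
  for step in "${LIFECYCLE_STEPS[@]}"; do
    if ! "$step_runner" "$step"; then
      echo "failed: $step"
      return 1
    fi
  done
  echo "all steps passed"
}
```

Stopping at the first failure keeps later steps from masking the step that actually broke, which matters when uninstall and reinstall follow a failed restart.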

Current Focus

Run broad lifecycle audit on node 192.168.1.198, then continue app-by-app for any installed package that is non-running or unhealthy. LND, Bitcoin Knots, Fedimint, and IndeedHub have each passed focused strict lifecycle validation.

Strict LND criteria:

  • lnd container reaches running.
  • archy-lnd-ui companion serves /app/lnd/.
  • LND wallet is initialized or unlocked non-interactively.
  • /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon exists.
  • /lnd-connect-info returns certificate, macaroon, REST/gRPC ports, and Tor onion.
  • LND UI contains all connection modes: REST local, REST Tor, gRPC local, gRPC Tor.
  • QR/connect controls are present and backed by real connection info.

Important Nodes

  • .198: SSH works with /home/archipelago/.ssh/id_ed25519.
  • .228: RPC works, SSH still blocked with Permission denied (publickey,password).

Test Harness

Primary remote harness:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh

Harness changes made:

  • Normalizes package states with ascii_downcase because API can return Running.
  • Audit mode tolerates absent packages but fails any installed package that is not running.
  • Full lifecycle uses preserve-data uninstall.
  • LND probe checks DOM, all four connection modes, /lnd-connect-info, macaroon/cert lengths, REST/gRPC ports, and Tor onion.
  • Electrum probe now checks local and Tor QR containers/fields, qrcode.js, and /electrs-status Tor onion.
  • Added ARCHY_STABILITY_SECONDS observation window, default 15, so a single running snapshot is not enough.
  • Audit/full lifecycle now call container-health after install/start/restart/reinstall and fail anything other than healthy.
  • Focused validation passed for LND, Bitcoin Knots, Fedimint, and IndeedHub.
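
The state normalization and audit rule from the list above can be sketched in plain shell. Assumptions: the harness itself normalizes with jq's ascii_downcase, the function names here are hypothetical, and the literal string `absent` for a package that is not installed is a guess at the representation.

```bash
# Sketch only: function names are hypothetical.
# Lowercases a package state so "Running" and "running" compare equal.
normalize_state() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Audit rule: an absent package is acceptable, an installed package must be
# running. "absent" as the literal state string is an assumption.
audit_verdict() {
  local state
  state=$(normalize_state "$1")
  case "$state" in
    absent|running) echo "pass" ;;
    *) echo "fail" ;;
  esac
}
```

With this rule, `audit_verdict Running` passes even though the API returned a capitalized state, while `Stopped` or a stale `Removing` fails the audit.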

Implemented Backend Changes

Lifecycle/Reconcile

  • core/archipelago/src/server.rs
    • Scanner merge now recovers stale Removing -> Running if the container is actually live.
    • Added stale-removing recovery test.
  • core/archipelago/src/main.rs
    • Crash recovery now runs synchronously before BootReconciler.
  • core/archipelago/src/bootstrap.rs
    • Removed automatic deletion of /run/user/1000/{containers,libpod} when podman info fails.
  • core/archipelago/src/crash_recovery.rs
    • Generic boot recovery narrowed to safe containers only.
  • core/archipelago/src/container/prod_orchestrator.rs
    • Uninstall disables manifests rather than deleting manifest availability.
    • Explicit reinstall re-enables disabled manifests.
    • LND pre-start writes/repairs config.
    • LND post-start initializes/unlocks wallet in production.
    • Post-start hook is skipped in cfg(test) so unit tests do not mutate host LND state.
    • stop disables desired-state reconcile until explicit start.
    • Reconciler respects /var/lib/archipelago/user-stopped.json across daemon restarts.
    • Start path recreates containers when stale rootless Podman runtime state prevents startup.
  • core/archipelago/src/api/rpc/package/install.rs
    • Install reconciles companion UIs synchronously.
  • core/archipelago/src/api/rpc/package/runtime.rs
    • Start/restart reconcile companions.
    • Missing known companion containers are tolerated during stop/restart.
  • core/archipelago/src/health_monitor.rs
    • Added Bitcoin variant conflict guard for auto-restart: bitcoin-core and bitcoin-knots can both be installed, but the monitor must not auto-start one into default 8332/8333 while the other is already running.
    • Added unit tests for the conflict guard.
  • core/archipelago/src/api/rpc/package/install.rs
    • Removed install-time hard block between bitcoin-core and bitcoin-knots; users may install both. Runtime still needs alternate ports or one inactive variant to run both simultaneously.
  • core/archipelago/src/api/rpc/package/config.rs
    • Bitcoin variant container resolution is precise, so package operations for one variant do not target the other.
  • core/container/src/podman_client.rs
    • Custom network containers now receive container-name DNS aliases.
    • Containers get host.archipelago:10.89.0.1 for host RPC access from rootless networks.
  • apps/fedimint/manifest.yml and apps/fedimint-gateway/manifest.yml
    • Fedimint data owner fixed to 1000:1000.
    • Bitcoin RPC host changed to http://host.archipelago:8332.
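
The Bitcoin variant conflict guard from the health_monitor.rs entry above reduces to one rule. A shell restatement of that rule, not the Rust implementation: `may_auto_start` is a hypothetical name, and the sketch assumes both variants use the default 8332/8333 ports.

```bash
# Sketch only: the real guard lives in health_monitor.rs; this function just
# restates the rule. Assumes both variants would bind the default ports.
# $1 = app the monitor wants to auto-start; remaining args = running apps.
may_auto_start() {
  local candidate=$1 rival="" running
  shift
  case "$candidate" in
    bitcoin-core)  rival="bitcoin-knots" ;;
    bitcoin-knots) rival="bitcoin-core" ;;
  esac
  for running in "$@"; do
    # Refuse only when the other Bitcoin variant is already running.
    if [ -n "$rival" ] && [ "$running" = "$rival" ]; then
      return 1
    fi
  done
  return 0
}
```

Example: `may_auto_start bitcoin-core bitcoin-knots lnd` returns non-zero, while `may_auto_start lnd bitcoin-knots` returns zero because the guard applies only to the two Bitcoin variants.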

Companions

  • core/archipelago/src/container/companion.rs
    • LND UI uses bridge networking, not host networking.
    • LND UI moved from host 8081 to host 18083 to avoid nostr-rs-relay conflict.
    • Test updated to expect 18083:80.
  • Routing/metadata moved LND UI to 18083:
    • apps/lnd-ui/manifest.yml
    • core/archipelago/src/container/docker_packages.rs
    • core/container/src/podman_client.rs
    • core/archipelago/src/port_allocator.rs
    • neode-ui/src/views/appSession/appSessionConfig.ts
    • neode-ui/src/stores/container.ts
    • neode-ui/src/stores/appLauncher.ts
    • neode-ui/src/views/appDetails/appDetailsData.ts
    • nginx snippets/configs for /app/lnd/ now proxy to 127.0.0.1:18083.

LND

  • New/expanded core/archipelago/src/container/lnd.rs.
  • ensure_config() writes required Bitcoin backend flags:
    • bitcoin.active=true
    • bitcoin.mainnet=true
    • bitcoin.node=bitcoind
    • bitcoind.rpchost=bitcoin-knots:8332
  • Falls back to a sudo write when writing lnd.conf fails with permission denied.
  • ensure_wallet_initialized() now:
    • Checks wallet/macaroons via sudo-aware helpers because LND data is container-owned 0700.
    • Uses REST unlocker GET /v1/genseed and POST /v1/initwallet for new wallets.
    • Falls back to lncli unlock --stdin if wallet already exists.
    • Uses sudo-aware read for macaroon when checking /v1/getinfo readiness.
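
The required Bitcoin backend flags listed above can be verified with a read-only check. A minimal sketch: `missing_lnd_flags` is a hypothetical helper that does not reproduce ensure_config(); it matches whole lines exactly and ignores section headers, so a flag written with extra whitespace would be reported as missing.

```bash
# Sketch only: missing_lnd_flags is a hypothetical read-only check.
# $1 = path to lnd.conf. Prints each required flag that is not present as an
# exact line and returns non-zero if any is missing.
missing_lnd_flags() {
  local conf=$1 flag missing=0
  local required=(
    "bitcoin.active=true"
    "bitcoin.mainnet=true"
    "bitcoin.node=bitcoind"
    "bitcoind.rpchost=bitcoin-knots:8332"
  )
  for flag in "${required[@]}"; do
    # -x: whole-line match, -F: fixed string
    if ! grep -qxF -- "$flag" "$conf"; then
      echo "missing: $flag"
      missing=1
    fi
  done
  return "$missing"
}
```

On the node the file may need elevated read access, in line with the sudo handling noted above.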

Verified Locally

Recent focused test passes:

cd /home/archipelago/Projects/archy/core
cargo test -p archipelago --bin archipelago health_monitor
cargo test -p archipelago --bin archipelago prod_orchestrator
cargo test -p archipelago --bin archipelago bitcoin_variant_container_names_are_precise
cargo test -p archipelago-container podman_network_settings_uses_networks_map_for_custom_networks
bash -n ../tests/lifecycle/remote-lifecycle.sh

Release build succeeds:

cd /home/archipelago/Projects/archy/core
cargo build -p archipelago --bin archipelago --release

.198 Current State

Recent deployment:

  • Built release binary with sudo-aware LND wallet checks and LND UI port 18083.
  • Deployed to /usr/local/bin/archipelago on .198 with backup.
  • Restarted archipelago.service; it returned active.
  • nginx on .198 was already updated so /app/lnd/ proxies to 127.0.0.1:18083.

Known .198 observations:

  • LND wallet artifacts exist after previous bootstrap:
    • /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon
    • /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/wallet.db
  • nostr-rs-relay occupies 8081; LND UI must stay on 18083.
  • LND strict audit passed on 2026-05-02:
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh
  • LND full preserve-data lifecycle passed on 2026-05-02:
    • ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
  • Final observed state after LND lifecycle:
    • archipelago.service active.
    • nginx active.
    • lnd running on 8080, 9735, and 10009.
    • archy-lnd-ui running on 18083.
    • archy-electrs-ui running and 50002 listening.
  • Active default Bitcoin backend is currently bitcoin-knots; bitcoin-core is installed but user-stopped.
  • /var/lib/archipelago/user-stopped.json should include bitcoin-core so daemon restart does not resurrect it into a default-port conflict.
  • Fedimint fixed issues:
    • stale rootless Podman runtime storage was handled by recreate-on-start-failure path.
    • data ownership fixed for gateway and federation DB lock files.
    • Bitcoin RPC DNS fixed via host.archipelago host alias.
  • IndeedHub full lifecycle passed after forcing the dedicated stack installer path, which removes stale stack containers and recreates network aliases and volumes.

Focused Remote Passes

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=bitcoin-knots ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=fedimint ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=indeedhub ARCHY_FULL_LIFECYCLE=1 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh

Result for each focused run: all checks passed.
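
The ARCHY_STABILITY_SECONDS window used in these runs means one good snapshot is not enough. A minimal sketch of such an observation window: `stable_for` is a hypothetical name, and the one-second probe interval is an assumption rather than the harness's actual polling rate.

```bash
# Sketch only: stable_for is hypothetical and not the harness implementation.
# $1 = window in seconds, remaining args = probe command run once per second.
# Fails as soon as one probe fails during the window.
stable_for() {
  local window=$1 elapsed=0
  shift
  while [ "$elapsed" -lt "$window" ]; do
    if ! "$@"; then
      echo "unstable after ${elapsed}s"
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "stable for ${window}s"
}
```

The probe is any command that exits zero while the app is healthy. A container that starts, reports running, and crashes two seconds later fails the window even though its first snapshot looked fine.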

Immediate Next Steps

  1. Run broad audit:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_STABILITY_SECONDS=5 tests/lifecycle/remote-lifecycle.sh
  2. Continue app-by-app for any installed package that broad audit reports as non-running or unhealthy.

  3. Resume Electrum full lifecycle with strict Tor/QR checks if Electrum remains in scope. Previous run was user-aborted during electrumx: install:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
  4. If Electrum fails, capture current service and port state:
ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 'systemctl is-active archipelago.service; systemctl is-active nginx; ss -ltn | grep -E ":(50001|50002|18083|8081|8080|10009|9735)" || true; podman ps -a --format "{{.Names}} {{.Status}} {{.Ports}}" | grep -E "electrs|electrum|lnd|nostr" || true'
  5. LND commands that passed and can be rerun as a regression check:
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
  6. If /app/lnd/ regresses to 502, inspect companion unit and logs:
ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 'systemctl --user status archy-lnd-ui.service --no-pager -l 2>&1 | sed -n "1,160p"; test -f ~/.config/containers/systemd/archy-lnd-ui.container && sed -n "1,160p" ~/.config/containers/systemd/archy-lnd-ui.container || true; journalctl --user -u archy-lnd-ui.service -n 160 --no-pager 2>&1 | sed -n "1,160p"'
  7. If package.stop lnd regresses and does not stop the container, inspect runtime stop path in:
  • core/archipelago/src/api/rpc/package/runtime.rs
  • core/archipelago/src/container/prod_orchestrator.rs

Likely issue: the state scanner/reconciler or companion handling restarts LND during stop/uninstall, or the stop path waits on package state while the container is being reconciled.
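
For the port-state capture in the steps above, the `ss -ltn` output can be reduced to a list of expected ports that have no listener. A minimal sketch: `missing_listeners` is a hypothetical helper, and it assumes the usual `ss -ltn` column order with the local address in the fourth column.

```bash
# Sketch only: missing_listeners is a hypothetical helper. It reads
# `ss -ltn` output on stdin and prints every expected port (passed as
# arguments) that has no LISTEN socket.
missing_listeners() {
  local expected=("$@") seen=" " state recvq sendq local_addr rest port missing=0
  while read -r state recvq sendq local_addr rest; do
    [ "$state" = "LISTEN" ] || continue   # skips the header line
    seen+="${local_addr##*:} "            # keep only the port after the last ':'
  done
  for port in "${expected[@]}"; do
    case "$seen" in
      *" $port "*) ;;                     # space-delimited, so 8083 != 18083
      *) echo "not listening: $port"; missing=1 ;;
    esac
  done
  return "$missing"
}
```

Pipe the remote `ss -ltn` output into it with the ports of interest, for example 50001, 50002, and 18083. A port reported here while its container shows as running in podman points at the rootlessport or companion layer rather than the app itself.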

Previously Fixed Live Issues On .198

  • stale fedimint=removing recovered.
  • orphaned filebrowser rootlessport on 8083 cleared.
  • orphaned bitcoin-core rootlessport on 8332/8333 cleared.
  • LND missing bitcoin.active/backend config fixed.
  • LND config permission denied fixed via sudo write.
  • Companion start/restart race mostly fixed by synchronous companion reconciliation.
  • Bitcoin Core/Knots install-time conflict removed while preserving runtime default-port safety.
  • Bitcoin Core unintended resurrection after daemon restart fixed through persistent user-stopped state.
  • Fedimint DB lock permission errors fixed through 1000:1000 data ownership.
  • Fedimint Bitcoin RPC DNS errors fixed through host.archipelago.
  • IndeedHub stale stopped stack fixed by reinstalling through the dedicated stack installer.

Do Not Forget

  • Do not release until strict lifecycle and app-specific UI probes pass.
  • Preserve data during destructive lifecycle testing unless explicitly instructed otherwise.
  • Do not revert user/other-agent worktree changes.
  • .228 still needs SSH fixed or must be tested RPC/UI-only.