- [x] **C1-DEPLOY — Deploy current codebase to .228**: Run `./scripts/deploy-to-target.sh --target 192.168.1.228` from macOS. If deploy script fails, read the error, fix the script, retry. After deploy succeeds, SSH to .228 and verify backend is alive: `sudo systemctl status archipelago` and `curl -s http://127.0.0.1:5678/health`. If backend is not running, check `journalctl -u archipelago --no-pager -n 100` and fix whatever is wrong. Do not mark done until: deploy succeeds AND backend returns health JSON.
- [x] **C1-CONTAINERS — Check every single container**: SSH to .228. Run `podman ps -a --format 'table {{.Names}}\t{{.State}}\t{{.Status}}\t{{.Ports}}'` to see ALL containers. For EVERY container that is not `running`: run `podman logs <name> --tail 100` and record the error. For every container showing `(unhealthy)`: run `podman logs <name> --tail 100` and record why. For containers that don't exist yet but should (bitcoin-knots, lnd, electrumx, archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui): note them as missing. Write a summary of ALL issues found as a comment at the bottom of this plan file under `## Issue Log`. Do not fix anything yet — just diagnose. Mark done when you have a complete picture of every container's state.
- [x] **C1-APPS — Pull and start every Bitcoin stack app**: SSH to .228. For each app in the Bitcoin stack, ensure it exists and is running. Check: (1) `podman ps -a --filter name=bitcoin-knots` — if missing or stopped, check if the image exists (`podman images | grep bitcoin-knots`), if not pull it. Start or create the container using the spec from `scripts/container-specs.sh`. (2) Same for `lnd`. (3) Same for `electrumx`. (4) Same for `archy-bitcoin-ui`, `archy-lnd-ui`, `archy-electrs-ui`. After starting each container, immediately read its logs: `podman logs <name> --tail 50`. Record every error. If a container won't start, record the exact error. If it starts but crashes within 30 seconds, record the crash log. Do not mark done until you have attempted to start ALL 6 containers and recorded the outcome of each.
- [x] **C1-HEALTH — Deep health check of every running container**: SSH to .228. For each running Bitcoin stack container: (1) **bitcoin-knots**: `podman exec bitcoin-knots bitcoin-cli getblockchaininfo 2>&1` — record if RPC works or fails. Check `podman logs bitcoin-knots --tail 50` for any warnings/errors. (2) **lnd**: Check if it connects to Bitcoin backend — `podman logs lnd --tail 50 | grep -i 'error\|fail\|disconnect\|unable'`. (3) **electrumx**: Check if it connects to Bitcoin — `podman logs electrumx --tail 50 | grep -i 'error\|fail\|disconnect\|unable'`. (4) **archy-bitcoin-ui**: `curl -sf http://localhost:8334/ > /dev/null && echo OK || echo FAIL`. (5) **archy-lnd-ui**: `curl -sf http://localhost:8081/ > /dev/null && echo OK || echo FAIL`. (6) **archy-electrs-ui**: Find its port (`podman port archy-electrs-ui 2>/dev/null || echo 'not running'`) and curl it. Record EVERY failure. Do not mark done until every container has been health-checked and all results recorded in the Issue Log below.
- [x] **C2-FIX — Fix every issue from Cycle 1**: Read the Issue Log at the bottom of this file. For EACH issue listed: (1) Read the relevant source code. (2) Understand the root cause. (3) Write a proper, production-quality fix — clean code, proper error handling, no hacks. (4) Commit with `fix: description`. Address ALL issues — do not cherry-pick. If a fix requires changing Rust code, make the change locally (it will be compiled on .228 during deploy). If a fix requires changing container specs, update `scripts/container-specs.sh`. If a fix requires changing a Dockerfile, update the relevant `docker/*/Dockerfile`. If a fix requires changing image versions, update `scripts/image-versions.sh`. If a fix requires changing nginx configs, update the relevant config file. Do not mark done until every issue from the log has a fix committed.
- [x] **C2-DEPLOY — Redeploy with all fixes**: Run `./scripts/deploy-to-target.sh --target 192.168.1.228`. If deploy fails, fix the deploy error and retry. After deploy, SSH to .228 and rebuild any UI containers that changed: `cd ~/archy/docker/bitcoin-ui && podman build -t bitcoin-ui:local . && podman stop archy-bitcoin-ui 2>/dev/null; podman rm archy-bitcoin-ui 2>/dev/null` — then recreate from spec. Same for lnd-ui and electrs-ui if their Dockerfiles changed. Do not mark done until deploy succeeds and backend health check passes.
- [x] **C2-RETEST — Test everything again**: SSH to .228. Run the EXACT same checks as C1-CONTAINERS, C1-APPS, and C1-HEALTH. For EVERY container: `podman ps -a --format 'table {{.Names}}\t{{.State}}\t{{.Status}}'`. For every running container, read logs: `podman logs <name> --tail 50 | grep -i 'error\|fail\|panic\|crash\|unable\|refused\|timeout'`. Curl every UI. Check every RPC endpoint. **If ANY new issues are found**: fix them right here — edit code, commit, redeploy to .228, and retest. Keep looping (fix → deploy → test) within this single task until ALL containers are running, ALL health checks pass, ALL UIs respond, ALL logs are clean. Do not mark done until: `podman ps -a --format '{{.Names}} {{.State}}' | grep -v running` returns ZERO non-running containers in the Bitcoin stack, and every curl returns 200, and every log tail has no errors.
- [x] **C3-RESTART-BITCOIN — Kill Bitcoin Knots, verify auto-restart**: SSH to .228. Run `podman stop bitcoin-knots`. Wait 15 seconds. Check `podman ps --filter name=bitcoin-knots --format '{{.Names}} {{.State}}'`. It MUST be `running` (restarted by restart policy). If not running: (1) Check `podman inspect bitcoin-knots --format '{{.HostConfig.RestartPolicy.Name}}'` — must be `unless-stopped` or `always`. (2) If restart policy is wrong, fix `scripts/container-specs.sh`, recreate the container with correct policy. (3) Retest until bitcoin-knots auto-restarts after stop. After it restarts, verify RPC works: `podman exec bitcoin-knots bitcoin-cli getblockchaininfo`. Check logs for crash messages. **Loop fix → recreate → kill → verify until it works.** Do not mark done until bitcoin-knots survives a stop and auto-restarts within 30 seconds.
- [x] **C3-RESTART-LND — Kill LND, verify auto-restart**: Same process as C3-RESTART-BITCOIN. Run `podman stop lnd`. Wait 15 seconds. Verify it auto-restarts. Verify it reconnects to bitcoin-knots (check logs: `podman logs lnd --tail 20`). If it doesn't restart or can't reconnect: fix, recreate, retest. Loop until it works. Do not mark done until lnd auto-restarts and reconnects to Bitcoin.
- [x] **C3-RESTART-UIS — Kill all UI containers, verify auto-restart**: `podman stop archy-bitcoin-ui archy-lnd-ui archy-electrs-ui`. Wait 15 seconds. Run `podman ps --format '{{.Names}} {{.State}}' | grep -E 'bitcoin-ui|lnd-ui|electrs-ui'` — all three must be `running`. Curl each UI endpoint — all must return 200. If any doesn't restart: fix restart policy, recreate, retest. Loop until all three survive kill and auto-restart.
- [x] **C3-CASCADE — Kill Bitcoin, watch everything, restart, verify full recovery**: This is the critical test. `podman stop bitcoin-knots`. Wait 60 seconds. Check LND and ElectrumX: they should either stay running (waiting for Bitcoin) or enter unhealthy/restarting state — NOT crash permanently. Run `podman ps -a --format '{{.Names}} {{.State}} {{.Status}}' | grep -E 'bitcoin|lnd|electrumx'`. Now start Bitcoin: `podman start bitcoin-knots`. Wait 120 seconds for Bitcoin RPC to come up. Check ALL containers: `podman ps --format '{{.Names}} {{.State}} {{.Status}}' | grep -E 'bitcoin|lnd|electrumx'`. ALL must be `running`. Read logs of each: `podman logs lnd --tail 30` and `podman logs electrumx --tail 30` — should show reconnection, not permanent failure. If ANY container is stuck in a crash loop or permanently dead: read logs, diagnose root cause, fix the code/config, redeploy, retest the entire cascade. **Loop until the full cascade works**: stop Bitcoin → dependents survive → restart Bitcoin → everything recovers. Do not mark done until this passes cleanly.
- [x] **C3-BACKEND-CRASH — Kill Archipelago backend, verify containers survive**: `sudo systemctl kill -s SIGKILL archipelago`. Wait 10 seconds. (1) Check backend restarted: `sudo systemctl status archipelago` — must be `active`. (2) Check containers: `podman ps --format '{{.Names}} {{.State}}' | grep -E 'bitcoin|lnd|electrumx'` — ALL must still be `running` (containers are independent of backend). (3) Check crash recovery: `journalctl -u archipelago --no-pager -n 50 | grep -i crash` — should show crash detected. (4) Check health endpoint: `curl -s http://127.0.0.1:5678/health` — should return JSON. If any of these fail: read full journal logs, find the error, fix the backend code, redeploy, retest. Loop until backend crash recovery works cleanly.
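
The Cycle 3 restart tasks wait a fixed number of seconds before checking state. A polling helper with an explicit deadline gives the same answer and also reports how long recovery took. The sketch below is an assumption, not part of the repo: `wait_for_running` is a hypothetical name, and it compares the container name exactly because a `--filter name=lnd` match may also pick up `archy-lnd-ui`.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for_running is a hypothetical helper, not a repo script.
# Polls `podman ps` until the named container reports `running` or the
# deadline passes. Arguments: <name> [timeout_seconds] [poll_interval_seconds]
wait_for_running() {
  local name="$1" timeout="${2:-30}" interval="${3:-1}" waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    # Exact-match on the first column so `lnd` does not match `archy-lnd-ui`.
    state="$(podman ps -a --format '{{.Names}} {{.State}}' \
      | awk -v n="$name" '$1 == n { print $2 }')"
    if [ "$state" = "running" ]; then
      echo "$name running after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$name not running after ${timeout}s (last state: ${state:-missing})" >&2
  return 1
}
```

For example, `podman stop bitcoin-knots && wait_for_running bitcoin-knots 30` expresses the "auto-restarts within 30 seconds" criterion from C3-RESTART-BITCOIN as a single exit code.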
## Cycle 4: Full Retest — Deploy Clean, Test Everything, Zero Failures
- [x] **C4-CLEAN-DEPLOY — Fresh deploy with all accumulated fixes**: Run `./scripts/deploy-to-target.sh --target 192.168.1.228`. Rebuild UI containers on .228 if any Dockerfiles changed. Restart backend: `sudo systemctl restart archipelago`. Wait 30 seconds. This is the "clean slate" deploy with everything fixed from previous cycles.
- [x] **C4-FULL-TEST — Complete test suite, fix anything that fails, loop until perfect**: SSH to .228. Run EVERY check below. If ANY fails, fix → redeploy → rerun ALL checks. Repeat until every single line passes:
**Container state** (all must show `running`):
```
podman ps -a --format '{{.Names}} {{.State}}' | grep -E 'bitcoin-knots|lnd|electrumx|bitcoin-ui|lnd-ui|electrs-ui'
```
**Container health** (none should show `unhealthy`):
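
The plan gives no command for this check. A minimal sketch, assuming (as in C1-CONTAINERS) that an unhealthy container shows `(unhealthy)` in the `Status` column; `list_unhealthy` is a hypothetical helper name, and empty output means the check passes:

```shell
# Sketch only: list_unhealthy is a hypothetical helper, not a repo script.
# Prints one line per Bitcoin stack container whose Status mentions
# "unhealthy"; prints nothing (and still exits 0) when all are fine.
list_unhealthy() {
  podman ps -a --format '{{.Names}} {{.Status}}' \
    | grep -E '^(bitcoin-knots|lnd|electrumx|archy-bitcoin-ui|archy-lnd-ui|archy-electrs-ui) ' \
    | grep -i 'unhealthy' || true
}
```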
**IF ANY CHECK FAILS**: Read the logs, find the root cause, fix the code properly (clean, well-structured, typed, following CLAUDE.md), commit with `fix:` prefix, redeploy to .228, and run ALL checks again from the top. Keep looping. Do not mark done until EVERY SINGLE CHECK above passes in a single clean run with zero failures.
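
The "single clean run with zero failures" rule is easier to enforce when every check goes through one runner that records the result and keeps a count, instead of stopping at the first failure. The following is a sketch under assumptions: `check` and `failures` are hypothetical names, and the commented usage lines reuse the endpoints and commands from C1-DEPLOY and C1-HEALTH.

```shell
#!/usr/bin/env bash
# Sketch only: `check` and `failures` are hypothetical names.
# Runs a command, prints PASS/FAIL with a label, and counts failures.
failures=0
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $label"
  else
    echo "FAIL $label"
    failures=$((failures + 1))
  fi
}

# Example usage on .228 (commented out so sourcing this file runs nothing):
# check "backend health" curl -sf http://127.0.0.1:5678/health
# check "bitcoin-ui"     curl -sf http://localhost:8334/
# check "lnd-ui"         curl -sf http://localhost:8081/
# check "bitcoin rpc"    podman exec bitcoin-knots bitcoin-cli getblockchaininfo
# if [ "$failures" -eq 0 ]; then echo "ALL CHECKS PASSED"; else echo "$failures check(s) failed"; fi
```

Call `check` directly rather than inside `$(...)`: a command substitution runs in a subshell, so the `failures` counter would not be updated in the calling shell.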
- [x] **C5-SOAK — Wait 5 minutes, recheck everything**: SSH to .228. Wait 5 minutes (`sleep 300`). Then rerun every check from C4-FULL-TEST. Containers that pass immediately but fail after 5 minutes have stability issues (memory leaks, connection timeouts, health check flaps). If ANYTHING changed state or went unhealthy during the 5-minute window: read logs (`podman logs <name> --since 5m`), find the issue, fix it, redeploy, wait 5 minutes again, recheck. Loop until everything stays healthy for a full 5-minute soak. Do not mark done until a clean 5-minute soak passes with zero state changes.
- [x] **C5-FINAL — Record final state**: SSH to .228. Run and paste output of: (1) `podman ps -a --format 'table {{.Names}}\t{{.State}}\t{{.Status}}'` (2) `curl -s http://127.0.0.1:5678/health` (3) `for c in bitcoin-knots lnd electrumx; do echo "=== $c ==="; podman logs $c --tail 5 2>&1; done`. Record this as the final passing state in the Issue Log at the bottom of this file. Mark the overall result: **PASS** or note any accepted limitations. Do not mark done until the final state is recorded.
- [x] **C6-QUALITY — Verify all code changes meet production standards**: Review every commit made during this overnight run. For each changed file: (1) Rust files: `grep -n 'unwrap()\|expect(' <file> | grep -v test | grep -v 'unwrap_or\|unwrap_err'` — zero results. `grep -n 'TODO\|FIXME\|HACK' <file>` — zero results. (2) TypeScript/Vue files: `cd neode-ui && npx vue-tsc -b --noEmit` — zero errors. (3) Shell scripts: `bash -n <file>` — syntax OK for every changed script. (4) No hardcoded credentials, no `:latest` tags, no `sudo podman`. If ANY quality issue is found: fix it properly, commit, redeploy, and rerun the relevant tests from C4-FULL-TEST to confirm the quality fix didn't break anything. Do not mark done until all code is production-quality AND all tests still pass.
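
The Rust checks in C6-QUALITY can be wrapped in one function so they run the same way on every changed file. This is a sketch, not a repo script: `quality_hits` is a hypothetical name, and it uses `grep -E` with escaped parentheses in place of the `\|` form above, which is intended to match the same lines.

```shell
#!/usr/bin/env bash
# Sketch only: quality_hits is a hypothetical helper, not a repo script.
# Prints every line of a Rust file that violates the C6-QUALITY rules;
# empty output means the file passes.
quality_hits() {
  local file="$1"
  # unwrap()/expect( outside tests, ignoring unwrap_or* and unwrap_err.
  grep -nE 'unwrap\(\)|expect\(' "$file" \
    | grep -v test \
    | grep -vE 'unwrap_or|unwrap_err'
  # Leftover work markers.
  grep -nE 'TODO|FIXME|HACK' "$file"
  return 0
}
```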
**Known limitation:** With rootless Podman, the `unless-stopped` restart policy does not auto-restart a container after `podman stop`. Recovery relies on the backend health monitor plus `reconcile-containers.sh`, which runs on boot and periodically.
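
To illustrate the kind of recovery pass this limitation implies, here is a hypothetical sketch. It is NOT the contents of `reconcile-containers.sh`, which this plan does not show; `reconcile_stack` is an invented name, and the sketch only starts stopped containers and reports missing ones.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the repo's reconcile-containers.sh.
# For each expected container name: leave it alone if running, start it if
# it exists in another state, and report it if it does not exist at all.
reconcile_stack() {
  local name state
  for name in "$@"; do
    state="$(podman ps -a --format '{{.Names}} {{.State}}' \
      | awk -v n="$name" '$1 == n { print $2 }')"
    case "$state" in
      running) ;;  # already up, nothing to do
      "")      echo "missing: $name (create from spec)" ;;
      *)       echo "starting: $name (was $state)"
               podman start "$name" >/dev/null ;;
    esac
  done
}

# Example usage (commented out so sourcing this file runs nothing):
# reconcile_stack bitcoin-knots lnd electrumx archy-bitcoin-ui archy-lnd-ui archy-electrs-ui
```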