docs: exact cutoff-proof resume in master-plan SS8b (resume from any device)

Captures: .228 1x-GREEN (110/110); hardened 5x DETACHED on .228 (/tmp/gate-5x2.log,
nohup — survives terminal close) with the exact check-from-any-machine command; all
shipped code fixes (commits) + deploy state (.228 + .198); node-state fixes NOT in
repo (lnd nginx proxy 8081->18083, home-assistant orphan unit removed, electrumx
re-registered); the run-ON-the-node lesson; and remaining work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
archipelago 2026-06-22 17:22:29 -04:00
parent 98f4fa44a8
commit 8355453a7e

View File

@ -5,7 +5,7 @@
> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove > supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
> the priority banner and demote this doc. > the priority banner and demote this doc.
> >
> Last updated: 2026-06-22 · Binary: v1.7.99-alpha · See §8b for the live resume. > Last updated: 2026-06-22 (evening) · .228 gate 1×-GREEN; hardened 5× running on .228 (see §8b CURRENT STATE — resume from any device).
--- ---
@ -164,7 +164,55 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
phases 26 (`dual-ecash-design.md`). phases 26 (`dual-ecash-design.md`).
## 8b. SESSION STATE + RESUME (updated 2026-06-22) — READ THIS FIRST ON RESUME ## 8b. SESSION STATE + RESUME (updated 2026-06-22 evening) — READ §8b "CURRENT STATE + RESUME" FIRST
### ▶ CURRENT STATE + RESUME (2026-06-22 evening) — RESUME FROM HERE (works from any device)
**Headline:** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
(110/110)**; a **hardened 5× run is IN PROGRESS on `.228`** (the single-node exit criterion). The
gate is now single-node (.228); multinode is split out (`docs/multinode-testing-plan.md`).
**THE 5× RUN IS DETACHED ON .228 — survives terminal/session close. Check it from any machine:**
```
sshpass -p archipelago ssh archipelago@192.168.1.228 \
'grep -E "iteration [0-9]+: (PASS|FAIL)|RESULTS|passed:|failed:" /tmp/gate-5x2.log; \
echo "running pid: $(pgrep -f run-20x.sh$ || echo DONE)"; grep "^not ok" /tmp/gate-5x2.log | sort -u'
```
- Log: `/tmp/gate-5x2.log` on .228 · launched `nohup` (pid was 4042141) · `ARCHY_ITERATIONS=5
ARCHY_ALLOW_DESTRUCTIVE=1`, run **ON the node** from `/tmp/lifecycle-run/tests/lifecycle`
(ARCHY_HOST=127.0.0.1). `bats` 1.11.1 + static `jq` 1.7.1 are installed on .228 for this.
- **If all 5 iterations PASS → .228 has met the single-node criterion → demote the banner.**
- If it flakes again: it'll be readiness-under-churn (lnd/mempool); the hardening (commit `98f4fa44`:
inter-iteration `settle_stack()` + 180240s readiness windows) targets exactly that. Re-copy the
repo `tests/lifecycle` to /tmp/lifecycle-run and re-launch.
**Code fixes shipped this session (all on `main`, built + DEPLOYED to .228 AND .198):**
- `2dad64b2` stop honours per-app grace (was `-t 30` deadline racing SIGKILL).
- `760a32bc` reconciler stops resurrecting user-stopped apps (dep-override + host-port watchdog).
- `6e49ce6f` container-list reports user-stopped apps as `stopped` despite a live UI companion.
- `452f05d8` companion self-heal on its own ~30s loop (was gated behind the slow per-app pass).
- Test-harness hardening: `88930558` `53b8e47f` `892ff083` `98f4fa44` (readiness retries, immich/
fedimint/NPM/lnd windows, inter-iteration settle). Binary built on .116
`core/target/release/archipelago` (4-fix); deploy = stop archipelago, cp to /usr/local/bin, start.
**NODE-STATE fixes on .228 NOT in the repo (re-apply if .228 is reset/reimaged):**
- nginx `/app/lnd/` proxy target was stale `8081` → fixed to `18083` (sed in
/etc/nginx/sites-{available,enabled}/archipelago + snippets, then `nginx -s reload`). Repo code is
correct (18083); old node config was stale.
- Removed a stale orphan `~/.config/containers/systemd/home-assistant.container` (ContainerName
`home-assistant` ≠ the real `homeassistant` container; it was stuck "activating"). Real app fine.
- electrumx was re-installed (`package.install` w/ image `146.59.87.168:3000/lfg2025/electrumx:v1.18.0`)
to re-register it as a tracked manifest app (it had become adopted plain-podman).
**KEY LESSON:** run the lifecycle gate **ON the node**, not via RPC from .116 — its bitcoin/companion/
orphan/endpoint tests use local `podman`/`systemctl`/`bitcoin-cli`/`curl`, so a remote run silently
tests the *runner* (this is why earlier runs from .116 falsely showed "bitcoin in IBD" etc.).
**Remaining (after 5× green):** netbird migration (#20 ph4 — the one real migration left) + btcpay/
mempool stack polish; Phase-3 `use_quadlet_backends`; B flip-on (EMBED_MANIFESTS+sign); per-app test
coverage (~30 apps unwritten); the mobile app-launch UX (§8 Roadmap P1). Multinode → its own plan.
---
### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified ### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified