docs(master-plan): §8b resume — gate green + 6-node deploy + APK fix + workstream F

Comprehensive resume for the session restart: single-node gate green
(5/5 .228), latest backend + UX + one-tap companion APK deployed to 6
nodes (table w/ creds + pending 100.64.83.15 cred), workstream-F bugs
from manual testing, agreed next order (netbird → Phase-3 → F →
multinode), and loose ends (untracked AppLoadingScreen.vue, broken
gitea-local mirror, don't-delete-bitcoin-data directive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
archipelago 2026-06-23 06:56:54 -04:00
parent ccb594fb85
commit c4cd5fdc90

View File

@ -247,24 +247,62 @@ phases 26 (`dual-ecash-design.md`).
### ▶ CURRENT STATE + RESUME (2026-06-23) — RESUME FROM HERE (works from any device)
**✅ HEADLINE (2026-06-23): the single-node production gate is GREEN — `run-gate.sh` 5/5 on .228,
0 not-ok** (`gate-5x5.log`: iters 698/756/1030/485/481s). The exit criterion (§5) is met. Getting
there took fixing **two real orchestrator bugs** the gate surfaced (package.stop per-app grace,
2026-06-22; package.restart phantom stack-member injection, 2026-06-23 — `order_present_containers`,
commit 92d7f52d) plus hardening **two single-shot-read probes** that flaked under churn (bitcoin-knots
state; immich lan_address). **.228 runs the fixed binary** (release, sha `5472c575…`, swapped into
`/usr/local/bin/archipelago` — safe because containers live in the `user@1000.service` slice, NOT the
`archipelago.service` cgroup). Commits this push (local `main`, **unpushed**): `92d7f52d` (orchestrator
fix + bitcoin-knots probe), `65117545` (docs), immich-probe + this doc update.
**✅ HEADLINE (2026-06-23): single-node gate GREEN (`run-gate.sh` 5/5 on .228, 0 not-ok) +
multinode test deploy DONE to 6 nodes.** The exit criterion (§5) is met. Green took fixing **two real
orchestrator bugs** (package.stop per-app grace, 2026-06-22; package.restart phantom stack-member
injection, 2026-06-23 — `order_present_containers`, commit 92d7f52d) plus hardening two single-shot
probes (bitcoin-knots state, immich lan_address). All work is **committed + PUSHED to `gitea-vps2`
(146) `main` @ `ccb594fb`** — the local-only state is resolved. Binary = release sha `5472c575…`.
**NEXT (post-gate, none blocking):**
1. **Bundled testing deploy** — per [[feedback_deploy_targets_and_ux_bundle]], the next testing deploy
must hit **.116 + .198** (not just .228) AND ship a real **neode-ui frontend build** bundling the
other agent's mobile app-launch UX changes ([[project_mobile_applaunch_ux]]). Blocked only on that
UX work being committed/final (was uncommitted + active `vite` at gate-green time).
2. **Multinode pass**`docs/multinode-testing-plan.md` (.198 + fleet), the next exit criterion.
3. **Workstreams** — netbird #20 ph4 (last real migration); Phase-3 `use_quadlet_backends`; B flip-on
(`EMBED_MANIFESTS` + sign) to distribute manifests via the registry; C marketplace tooling.
**▶ DEPLOY STATE (latest backend `5472c575` + UX frontend + one-tap companion APK) — 2026-06-23:**
| Node | Pw | Done | Notes |
|------|----|----|-------|
| .116 (local, http:80) | `ThisIsWeb54321@` | ✅ | dev node: bitcoin mid-IBD + http-only |
| .198 | `archipelago` | ✅ | resilience; user manual-testing here |
| .228 | `archipelago` | ✅ | canonical gate node (5×-green) |
| 100.82.34.38 (archipelago-1) | `archipelago` | ✅ | |
| 100.89.209.89 (archy-x250-pa) | `ThisIsWeb54321@` | ✅ | |
| 100.70.96.88 (archipelago node) | `ThisIsWeb54321!` | ✅ | note the `!` |
| 100.64.83.15 (archy-dev-pa) | ? | ⏳ | UP (tailscale ping ok) but `ThisIsWeb54321@` REJECTED — **need correct pw** |
| 100.66.157.120 (archy-x250-exp) | `ThisIsWeb54321@` | ⏭️ | DOWN — user said leave it |
Deploy scripts saved in scratchpad: `deploy-node.sh` (full binary+FE, sha+health verify) and
`fe-only.sh` (FE-only, no archipelago restart). Reusable: `bash deploy-node.sh <host> <pw> <scheme> 127.0.0.1`.
**▶ COMPANION APK fixed (other agent's commit `5c43e127` + my reconcile):** QR + download were a
zip-wrapped `.apk.zip` (forced unzip). Now serve raw `archipelago-companion.apk` (one-tap) from the
146 raw URL; `CompanionIntroOverlay.vue` + ship/publish scripts repointed; old `.zip` dropped. The
OLD `.apk.zip` URL now 404s, so EVERY node was FE-refreshed to the new build (all 6 verified
`/ : 200` + bundle references `archipelago-companion.apk`).
**▶ MANUAL-TEST BUGS FOUND on .198 → workstream F (§4/§6c).** The green gate is DESTRUCTIVE-tier /
~8 core apps; it SKIPS uninstall/reinstall and has no progress-UI / all-apps coverage. Real bugs:
immich+grafana **uninstall hangs at a solid full-red bar + leaves a ghost in My Apps** (doesn't
actually remove); grafana **reinstall stops**; fedimint guardian shows "waiting for bitcoin sync"
(verify legit vs stuck). These motivate **workstream F** (cascade + progress + all-apps gate).
Also added **§10**: investigate TanStack-Query/push-based state mgmt for neode-ui (the state-drift
root cause behind the stuck bar + ghosts).
**▶ NEXT — agreed task order (do IN ORDER, see §6b):**
1. **netbird #20 ph4** — last real manifest migration.
2. **Phase-3 `use_quadlet_backends`** — orchestrator backends → Quadlet units.
3. **§6c workstream F** — cascade/uninstall + progress-UI + ALL-apps gate; fix the immich/grafana
uninstall + ghost-My-Apps + reinstall-stops bugs to a 5×-green; then §10 state-mgmt investigation.
4. **Multinode pass**`docs/multinode-testing-plan.md` (the 6 deployed nodes are ready for manual
testing now).
**▶ LOOSE ENDS / gotchas for the resuming session:**
- **`neode-ui/src/components/AppLoadingScreen.vue` is UNTRACKED** on .116 — the other agent created it
but NO committed code imports it (orphan, not in `e825bbed`). Left in place; decide whether to wire
it in or delete. Not deployed (committed UX doesn't reference it).
- **gitea-local mirror (`localhost:3000`) push is BROKEN** (token redirects to `/login`); push to
`gitea-vps2` works and is primary. Reconcile the local mirror token if you need it.
- **Don't delete bitcoin/electrum data** (user directive) — run only the DESTRUCTIVE gate
(`run-gate.sh` default; never set `ARCHY_ALLOW_CASCADE_DESTRUCTIVE` on real nodes with synced chains).
- **.198 gate not run this session** (user was manual-testing there + restarting). .116 gate ran but
failed 12 tests — ALL environmental (.116 is http-only → ui-coverage hardcodes `https://`; + bitcoin
mid-IBD → bitcoin/lnd preconditions). NOT product regressions. `gate-116.log` on .116.
**(historical resume notes for the 5× chase below — superseded by the green result above)**