docs(master-plan): WS-F#3 first destructive run — 3 reinstall bugs found

Full all-apps-lifecycle pass on .228: lifecycle 11/11, teardown 8/11.
Surfaced (1) fresh-install bind-dir ownership root:root → reinstall
EACCES (jellyfin/netbird; Fix B misses the install path), (2) netbird
reinstall adopts leftover containers → skips manifest cert/file render,
(3) portainer image pin lfg2025/portainer:2.19.4 unpublished (manifest
unknown), pin overrides RPC dockerImage. .228 restored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
archipelago 2026-06-26 07:46:11 -04:00
parent 07b9b5a3aa
commit fc64b422e7

View File

@ -201,8 +201,21 @@ no-regression; the original hang was load/timing-induced and not separately repr
safety, override via `ARCHY_MATRIX_PROTECT`). Validated on .228 (discovery + 1-app lifecycle safety, override via `ARCHY_MATRIX_PROTECT`). Validated on .228 (discovery + 1-app lifecycle
green). HEAVY/destructive → a supervised pass on LAN nodes (.116/.198/.228), NOT folded into green). HEAVY/destructive → a supervised pass on LAN nodes (.116/.198/.228), NOT folded into
run-gate. Invoke: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1 ARCHY_PASSWORD=… run-gate. Invoke: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1 ARCHY_PASSWORD=…
ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`. STILL TODO: run the full destructive pass ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`.)*
on a LAN node + fix whatever reinstall failures it surfaces; add reboot-survive + UI-reach per app.)* **✅ FIRST FULL DESTRUCTIVE RUN on .228 (2026-06-26):** lifecycle **11/11 clean**; teardown
**8/11** (immich 3-container stack incl.) — and it surfaced **3 real reinstall bugs** (the payoff):
1. **fresh-install bind-dir ownership = root:root** → EACCES on reinstall (jellyfin `/config`
denied exit 139; netbird-server can't open its SQLite store). Fix B's chown-to-parent only
runs on the reconcile path, **not** `package.install`. The important orchestrator fix.
2. **netbird reinstall adopts leftover containers → skips the manifest cert/file render**
(tls.crt/key/nginx.conf never written → proxy can't start → app reads absent). Only a fully
clean reinstall renders them.
3. **portainer image pin `lfg2025/portainer:2.19.4` is `manifest unknown`** (never pushed to the
registry) and the pin OVERRIDES the RPC dockerImage → portainer is un(re)installable
fleet-wide. Registry/catalog data bug (push the image or change the pin).
.228 restored (jellyfin+netbird via manual chown / clean reinstall; all installed apps running,
28 ctrs; portainer left uninstalled — uninstallable until #3 fixed). TODO: fix #1 (extend chown
to install path) + #2 + #3; add reboot-survive + UI-reach per app to the matrix.
4. **Guardian/IBD-dependent states:** assert that "waiting for bitcoin sync"-style states are a 4. **Guardian/IBD-dependent states:** assert that "waiting for bitcoin sync"-style states are a
legitimate, surfaced wait (with a path to ready) and never a permanent stuck state. legitimate, surfaced wait (with a path to ready) and never a permanent stuck state.