diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 1df42f61..a2ac39f4 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -201,8 +201,21 @@ no-regression; the original hang was load/timing-induced and not separately repr
    safety, override via `ARCHY_MATRIX_PROTECT`). Validated on .228 (discovery + 1-app lifecycle green).
    HEAVY/destructive → a supervised pass on LAN nodes (.116/.198/.228), NOT folded into run-gate.
    Invoke: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1 ARCHY_PASSWORD=…
-   ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`. STILL TODO: run the full destructive pass
-   on a LAN node + fix whatever reinstall failures it surfaces; add reboot-survive + UI-reach per app.)*
+   ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`.)*
+   **✅ FIRST FULL DESTRUCTIVE RUN on .228 (2026-06-26):** lifecycle **11/11 clean**; teardown
+   **8/11** (immich 3-container stack incl.) — and it surfaced **3 real reinstall bugs** (the payoff):
+   1. **fresh-install bind-dir ownership = root:root** → EACCES on reinstall (jellyfin `/config`
+   denied exit 139; netbird-server can't open its SQLite store). Fix B's chown-to-parent only
+   runs on the reconcile path, **not** `package.install`. The important orchestrator fix.
+   2. **netbird reinstall adopts leftover containers → skips the manifest cert/file render**
+   (tls.crt/key/nginx.conf never written → proxy can't start → app reads absent). Only a fully
+   clean reinstall renders them.
+   3. **portainer image pin `lfg2025/portainer:2.19.4` is `manifest unknown`** (never pushed to the
+   registry) and the pin OVERRIDES the RPC dockerImage → portainer is un(re)installable
+   fleet-wide. Registry/catalog data bug (push the image or change the pin).
+   .228 restored (jellyfin+netbird via manual chown / clean reinstall; all installed apps running,
+   28 ctrs; portainer left uninstalled — uninstallable until #3 fixed). TODO: fix #1 (extend chown
+   to install path) + #2 + #3; add reboot-survive + UI-reach per app to the matrix.
 4. **Guardian/IBD-dependent states:** assert that "waiting for bitcoin sync"-style
    states are a legitimate, surfaced wait (with a path to ready) and never a permanent
    stuck state.
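
Review note on bug #1: the sketch below illustrates what "extend chown to install path" could look like. It is not the orchestrator's actual code. The diff does not show how Fix B is implemented, so `chown_to_parent` and `install_bind_dir` are hypothetical names, and the assumption is that "chown-to-parent" means copying the owner and group of the parent directory onto the freshly created bind dir.

```python
import os


def chown_to_parent(path: str) -> bool:
    """Give `path` the same owner and group as its parent directory.

    Returns True if ownership was changed, False if it already matched.
    Skipping the no-op case keeps the call idempotent and avoids a
    needless chown when the process lacks the privilege to change owners.
    """
    parent = os.path.dirname(os.path.abspath(path))
    want = os.stat(parent)
    have = os.stat(path)
    if (have.st_uid, have.st_gid) == (want.st_uid, want.st_gid):
        return False
    os.chown(path, want.st_uid, want.st_gid)
    return True


def install_bind_dir(path: str) -> None:
    """Hypothetical install-path step: create the bind dir, then fix ownership.

    Assumption: running the same ownership fix on install as on reconcile
    would keep a fresh bind dir from being left as root:root.
    """
    os.makedirs(path, exist_ok=True)
    chown_to_parent(path)
```

If the real fix follows this shape, calling the same helper from both the reconcile path and `package.install` would keep the two paths from drifting apart again.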