diff --git a/CLAUDE.md b/CLAUDE.md index 7fc60f8f..3566fb2a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -42,5 +42,6 @@ Detailed sub-plans (all linked from the master): ## Production test gate (definition of done) `tests/lifecycle/run-20x.sh` green across install / UI / stop / start / restart / -reinstall / reboot-survive / archipelago-restart-survive / uninstall — **20× on -.228 AND .198**. Until green, the master plan is the priority. +reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on +.228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from 20× — +restore to 20× before the final ship). Until green, the master plan is the priority. diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md index 6a481d4f..021f7892 100644 --- a/docs/PRODUCTION-MASTER-PLAN.md +++ b/docs/PRODUCTION-MASTER-PLAN.md @@ -56,7 +56,7 @@ real nodes. Until then, this plan is the priority. - **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`, `-fedimint-ui`) build from `docker/` contexts via `companion.rs`, not the manifest registry — a later phase folds them in. -- **No app has passed the formal 20× production gate.** That is the blocker. +- **No app has passed the formal production gate (5× for now, was 20×).** That is the blocker. ## 4. Workstreams (each links its authoritative detail doc) @@ -66,7 +66,7 @@ real nodes. Until then, this plan is the priority. | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet | | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending | | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) | -| E | **Production test gate** — 20× lifecycle on .228 + .198, per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** | +| E | **Production test gate** — 5× lifecycle on .228 + .198 (for now; was 20×), per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** | **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md` (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption @@ -78,7 +78,8 @@ modes FM1–FM6 + the desired-state-first reconciler that fixes them). An app is **production-ready** only when `tests/lifecycle/run-20x.sh` is green across the full matrix — install / UI-reachable / stop / start / restart / reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall — -**20× on .228 AND .198**. All 8 gate checkboxes in `tests/lifecycle/TESTING.md` +**5× on .228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from +20× — restore to 20× before the final ship). All 8 gate checkboxes in `tests/lifecycle/TESTING.md` are currently unchecked. Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps, L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated coverage. @@ -97,7 +98,7 @@ L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated cov 4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide) for the podman-`--restart` path. *(f160e0c4)* 5. ◻ **Verify on .198** (immich migration validated on .228 only so far). -6. ◻ **E** — run the 20× gate; fix until green. +6. ◻ **E** — run the 5× gate (`ARCHY_ITERATIONS=5`, was 20×); fix until green. 7. ◻ Demote this banner. **Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the diff --git a/tests/lifecycle/TESTING.md b/tests/lifecycle/TESTING.md index f4cf0258..47e68c5a 100644 --- a/tests/lifecycle/TESTING.md +++ b/tests/lifecycle/TESTING.md @@ -26,7 +26,8 @@ The migration's aim, restated as **five pillars** (every app must satisfy all fi desired→current from manifests + secrets. Self-healing, not edge-triggered. 3. **Lifecycle bulletproof** — every app passes the full matrix (install / UI reachable / stop / start / restart / reinstall / reboot-survive - / archipelago-restart-survive / uninstall) **20× green on .228 AND .198** + / archipelago-restart-survive / uninstall) **5× green on .228 AND .198 for now** + (`ARCHY_ITERATIONS=5`; temporarily reduced from 20×, restore before final ship) before any release. 4. **Data-driven apps** — install/uninstall needs only the app's manifest + catalog entry. **No host OS changes** (no apt, no /etc, no host units) and @@ -38,8 +39,8 @@ The migration's aim, restated as **five pillars** (every app must satisfy all fi drop-all-caps + add-back only what a manifest declares. Secrets are `0600`, owned by the service user. Security is king. -**Per-app definition of done:** all five pillars hold → lifecycle matrix 20× -green on .228 then .198 → catalog/registry updated (`app-catalog/catalog.json` +**Per-app definition of done:** all five pillars hold → lifecycle matrix 5× +(for now; was 20×) green on .228 then .198 → catalog/registry updated (`app-catalog/catalog.json` + `releases/app-catalog.json`, rebuilt image pushed to the mirror) → tracker cell ticked. Only then move to the next app. @@ -192,8 +193,8 @@ ARCHY_PASSWORD=password123 tests/lifecycle/run.sh # Full + destructive (for the verification fleet): ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 tests/lifecycle/run.sh -# 20× release-gate run (the actual v1.7.52 ship gate): -ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 \ +# 5× release-gate run (for now; was 20× — restore before final ship): +ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 \ tests/lifecycle/run-20x.sh ``` @@ -247,8 +248,8 @@ We don't have a performance harness yet. Add as L6 lands: v1.7.52 ships only when ALL of: 1. ☐ Bitcoin-stops fix verified live on a fresh node (tests/lifecycle/bats/bitcoin-knots.bats fully ● after a cold install) -2. ☐ `tests/lifecycle/run-20x.sh` returns 0 against .228 (full suite, ARCHY_ALLOW_DESTRUCTIVE=1) -3. ☐ `tests/lifecycle/run-20x.sh` returns 0 against .198 (same) +2. ☐ `ARCHY_ITERATIONS=5 tests/lifecycle/run-20x.sh` returns 0 against .228 (5× for now; full suite, ARCHY_ALLOW_DESTRUCTIVE=1) +3. ☐ `ARCHY_ITERATIONS=5 tests/lifecycle/run-20x.sh` returns 0 against .198 (same) 4. ☐ The L3 `backend-survives-archipelago-restart` suite passes (= Phase 3 Quadlet shipped for backends) 5. ☐ Cargo: 0 warnings, 0 unused, all tests green (sustained ✓ since 1c0df95f) 6. ☐ LoC: at least one of {Phase 3 Quadlet, dev_mode resolution} merged