docs: temporarily reduce release lifecycle gate from 20x to 5x

Per user direction: the production test gate is 5x (ARCHY_ITERATIONS=5) on
.228 AND .198 for now, down from 20x. Restore to 20x before the final ship.
Updated CLAUDE.md, PRODUCTION-MASTER-PLAN.md, and tests/lifecycle/TESTING.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
archipelago 2026-06-21 17:11:00 -04:00
parent 9c45f718a2
commit 84031e6209
3 changed files with 16 additions and 13 deletions


@@ -42,5 +42,6 @@ Detailed sub-plans (all linked from the master):
 ## Production test gate (definition of done)
 `tests/lifecycle/run-20x.sh` green across install / UI / stop / start / restart /
-reinstall / reboot-survive / archipelago-restart-survive / uninstall — **20× on
-.228 AND .198**. Until green, the master plan is the priority.
+reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on
+.228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from 20× —
+restore to 20× before the final ship). Until green, the master plan is the priority.


@@ -56,7 +56,7 @@ real nodes. Until then, this plan is the priority.
 - **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`,
 `-fedimint-ui`) build from `docker/<name>` contexts via `companion.rs`, not the
 manifest registry — a later phase folds them in.
-- **No app has passed the formal 20× production gate.** That is the blocker.
+- **No app has passed the formal production gate (5× for now, was 20×).** That is the blocker.
 ## 4. Workstreams (each links its authoritative detail doc)
@@ -66,7 +66,7 @@ real nodes. Until then, this plan is the priority.
 | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
 | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
 | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
-| E | **Production test gate** — 20× lifecycle on .228 + .198, per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** |
+| E | **Production test gate** — 5× lifecycle on .228 + .198 (for now; was 20×), per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** |
 **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
 (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
@@ -78,7 +78,8 @@ modes FM1–FM6 + the desired-state-first reconciler that fixes them).
 An app is **production-ready** only when `tests/lifecycle/run-20x.sh` is green
 across the full matrix — install / UI-reachable / stop / start / restart /
 reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall —
-**20× on .228 AND .198**. All 8 gate checkboxes in `tests/lifecycle/TESTING.md`
+**5× on .228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from
+20× — restore to 20× before the final ship). All 8 gate checkboxes in `tests/lifecycle/TESTING.md`
 are currently unchecked. Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps,
 L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated coverage.
@@ -97,7 +98,7 @@ L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated cov
 4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide)
 for the podman-`--restart` path. *(f160e0c4)*
 5. ◻ **Verify on .198** (immich migration validated on .228 only so far).
-6. ◻ **E** — run the 20× gate; fix until green.
+6. ◻ **E** — run the 5× gate (`ARCHY_ITERATIONS=5`, was 20×); fix until green.
 7. ◻ Demote this banner.
 **Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the


@@ -26,7 +26,8 @@ The migration's aim, restated as **five pillars** (every app must satisfy all fi
 desired→current from manifests + secrets. Self-healing, not edge-triggered.
 3. **Lifecycle bulletproof** — every app passes the full matrix
 (install / UI reachable / stop / start / restart / reinstall / reboot-survive
-/ archipelago-restart-survive / uninstall) **20× green on .228 AND .198**
+/ archipelago-restart-survive / uninstall) **5× green on .228 AND .198 for now**
+(`ARCHY_ITERATIONS=5`; temporarily reduced from 20×, restore before final ship)
 before any release.
 4. **Data-driven apps** — install/uninstall needs only the app's manifest +
 catalog entry. **No host OS changes** (no apt, no /etc, no host units) and
@@ -38,8 +39,8 @@ The migration's aim, restated as **five pillars** (every app must satisfy all fi
 drop-all-caps + add-back only what a manifest declares. Secrets are `0600`,
 owned by the service user. Security is king.
-**Per-app definition of done:** all five pillars hold → lifecycle matrix 20×
-green on .228 then .198 → catalog/registry updated (`app-catalog/catalog.json`
+**Per-app definition of done:** all five pillars hold → lifecycle matrix 5×
+(for now; was 20×) green on .228 then .198 → catalog/registry updated (`app-catalog/catalog.json`
 + `releases/app-catalog.json`, rebuilt image pushed to the mirror) → tracker
 cell ticked. Only then move to the next app.
@@ -192,8 +193,8 @@ ARCHY_PASSWORD=password123 tests/lifecycle/run.sh
 # Full + destructive (for the verification fleet):
 ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 tests/lifecycle/run.sh
-# 20× release-gate run (the actual v1.7.52 ship gate):
-ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 \
+# 5× release-gate run (for now; was 20× — restore before final ship):
+ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 \
 tests/lifecycle/run-20x.sh
 ```
@@ -247,8 +248,8 @@ We don't have a performance harness yet. Add as L6 lands:
 v1.7.52 ships only when ALL of:
 1. ☐ Bitcoin-stops fix verified live on a fresh node (tests/lifecycle/bats/bitcoin-knots.bats fully ● after a cold install)
-2. ☐ `tests/lifecycle/run-20x.sh` returns 0 against .228 (full suite, ARCHY_ALLOW_DESTRUCTIVE=1)
-3. ☐ `tests/lifecycle/run-20x.sh` returns 0 against .198 (same)
+2. ☐ `ARCHY_ITERATIONS=5 tests/lifecycle/run-20x.sh` returns 0 against .228 (5× for now; full suite, ARCHY_ALLOW_DESTRUCTIVE=1)
+3. ☐ `ARCHY_ITERATIONS=5 tests/lifecycle/run-20x.sh` returns 0 against .198 (same)
 4. ☐ The L3 `backend-survives-archipelago-restart` suite passes (= Phase 3 Quadlet shipped for backends)
 5. ☐ Cargo: 0 warnings, 0 unused, all tests green (sustained ✓ since 1c0df95f)
 6. ☐ LoC: at least one of {Phase 3 Quadlet, dev_mode resolution} merged