diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 1943db5d..da30f4b4 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -1,11 +1,13 @@
-# 🚩 PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
+# PRODUCTION MASTER PLAN — Archipelago App Platform & Registry

-> **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until
-> the production test gate (§5) is green.** It overrides ad-hoc direction and
-> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
-> the priority banner and demote this doc.
+> **✅ SINGLE-NODE PRODUCTION GATE IS GREEN (2026-06-23): `run-gate.sh` 5/5 on .228, 0 failures.**
+> This remains the authoritative plan for the broader north star (manifest-driven
+> platform, registry-distributed manifests, external marketplace), but it is no
+> longer a hard priority banner blocking all other work. Remaining workstreams are
+> in §6 / §8b. Next exit-criteria: multinode (`docs/multinode-testing-plan.md`) +
+> workstreams B/C/D.
 >
-> Last updated: 2026-06-22 (evening) · .228 gate 1×-GREEN; hardened 5× running on .228 (see §8b CURRENT STATE — resume from any device).
+> Last updated: 2026-06-23 · **.228 gate 5×-GREEN (110/110 ×5, 0 not-ok)** — exit criterion met (see §8b).

 ---

@@ -67,7 +69,7 @@ real nodes. Until then, this plan is the priority.
 | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
 | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
 | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
-| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **.228 GREEN (110/110); 5× in progress** |
+| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **✅ .228 5×-GREEN (110/110 ×5, 0 not-ok, 2026-06-23)** — single-node criterion met |

 **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
 (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
@@ -100,9 +102,12 @@ proxies; L3 survival โ—; ~30 apps have zero automated coverage.
 data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
 4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide) for the
 podman-`--restart` path. *(f160e0c4)*
-5. ◧ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`). .228 is GREEN
- 1× (110/110); the 5× run is in progress. This is now the SINGLE-NODE criterion.
-6. ◻ Demote this banner once the 5× is green.
+5. ✅ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`) is **GREEN: 5/5, 0 not-ok**
+ (2026-06-23). Two real orchestrator bugs were found + fixed en route (package.stop
+ per-app grace; package.restart phantom stack-member injection → `order_present_containers`,
+ commit 92d7f52d) plus two single-shot-read probes hardened (bitcoin-knots state, immich
+ lan_address). The single-node criterion is met.
+6. ✅ Banner demoted (this doc, 2026-06-23). Next: multinode pass + workstreams B/C/D.

 **Multinode / fleet verification (.198 and the rest) is split into its own plan:**
 `docs/multinode-testing-plan.md`. Do it AFTER the .228 single-node gate is green.
@@ -180,11 +185,32 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
 Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash phases
 2–6 (`dual-ecash-design.md`).

-## 8b. SESSION STATE + RESUME (updated 2026-06-22 evening) — READ §8b "CURRENT STATE + RESUME" FIRST
+## 8b. SESSION STATE + RESUME (updated 2026-06-23) — READ §8b "CURRENT STATE + RESUME" FIRST

-### ▶ CURRENT STATE + RESUME (2026-06-22 evening) — RESUME FROM HERE (works from any device)
+### ▶ CURRENT STATE + RESUME (2026-06-23) — RESUME FROM HERE (works from any device)

-**Headline:** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
+**✅ HEADLINE (2026-06-23): the single-node production gate is GREEN — `run-gate.sh` 5/5 on .228,
+0 not-ok** (`gate-5x5.log`: iters 698/756/1030/485/481s). The exit criterion (§5) is met. Getting
+there took fixing **two real orchestrator bugs** the gate surfaced (package.stop per-app grace,
+2026-06-22; package.restart phantom stack-member injection, 2026-06-23 — `order_present_containers`,
+commit 92d7f52d) plus hardening **two single-shot-read probes** that flaked under churn (bitcoin-knots
+state; immich lan_address). **.228 runs the fixed binary** (release, sha `5472c575…`, swapped into
+`/usr/local/bin/archipelago` — safe because containers live in the `user@1000.service` slice, NOT the
+`archipelago.service` cgroup). Commits this push (local `main`, **unpushed**): `92d7f52d` (orchestrator
+fix + bitcoin-knots probe), `65117545` (docs), immich-probe + this doc update.
+
+**NEXT (post-gate, none blocking):**
+1. **Bundled testing deploy** — per [[feedback_deploy_targets_and_ux_bundle]], the next testing deploy
+ must hit **.116 + .198** (not just .228) AND ship a real **neode-ui frontend build** bundling the
+ other agent's mobile app-launch UX changes ([[project_mobile_applaunch_ux]]). Blocked only on that
+ UX work being committed/final (was uncommitted + active `vite` at gate-green time).
+2. **Multinode pass** — `docs/multinode-testing-plan.md` (.198 + fleet), the next exit criterion.
+3. **Workstreams** — netbird #20 ph4 (last real migration); Phase-3 `use_quadlet_backends`; B flip-on
+ (`EMBED_MANIFESTS` + sign) to distribute manifests via the registry; C marketplace tooling.
+
+**(historical resume notes for the 5× chase below — superseded by the green result above)**
+
+**Headline (2026-06-22):** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
 (110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a real
 mempool bug found + fixed (below). The gate is now single-node (.228); multinode is split out
 (`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`