From ae47897601d992467a84818cccf3dc1e049bb515 Mon Sep 17 00:00:00 2001
From: archipelago
Date: Tue, 23 Jun 2026 04:27:36 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20single-node=20production=20gate=20GREEN?=
 =?UTF-8?q?=20(5/5=20on=20.228)=20=E2=80=94=20demote=20banner?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/PRODUCTION-MASTER-PLAN.md | 52 +++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 13 deletions(-)

diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 1943db5d..da30f4b4 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -1,11 +1,13 @@
-# 🚩 PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
+# PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
 
-> **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until
-> the production test gate (§5) is green.** It overrides ad-hoc direction and
-> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
-> the priority banner and demote this doc.
+> **✅ SINGLE-NODE PRODUCTION GATE IS GREEN (2026-06-23): `run-gate.sh` 5/5 on .228, 0 failures.**
+> This remains the authoritative plan for the broader north star (manifest-driven
+> platform, registry-distributed manifests, external marketplace), but it is no
+> longer a hard priority banner blocking all other work. Remaining workstreams are
+> in §6 / §8b. Next exit-criteria: multinode (`docs/multinode-testing-plan.md`) +
+> workstreams B/C/D.
 >
-> Last updated: 2026-06-22 (evening) · .228 gate 1×-GREEN; hardened 5× running on .228 (see §8b CURRENT STATE — resume from any device).
+> Last updated: 2026-06-23 · **.228 gate 5×-GREEN (110/110 ×5, 0 not-ok)** — exit criterion met (see §8b).
 
 ---
 
@@ -67,7 +69,7 @@ real nodes. Until then, this plan is the priority.
 | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
 | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
 | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
-| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **.228 GREEN (110/110); 5× in progress** |
+| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **✅ .228 5×-GREEN (110/110 ×5, 0 not-ok, 2026-06-23)** — single-node criterion met |
 
 **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
 (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
@@ -100,9 +102,12 @@ proxies; L3 survival ◐; ~30 apps have zero automated coverage.
    data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
 4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide) for the
    podman-`--restart` path. *(f160e0c4)*
-5. ◧ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`). .228 is GREEN
-   1× (110/110); the 5× run is in progress. This is now the SINGLE-NODE criterion.
-6. ◻ Demote this banner once the 5× is green.
+5. ✅ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`) is **GREEN: 5/5, 0 not-ok**
+   (2026-06-23). Two real orchestrator bugs were found + fixed en route (package.stop
+   per-app grace; package.restart phantom stack-member injection → `order_present_containers`,
+   commit 92d7f52d) plus two single-shot-read probes hardened (bitcoin-knots state, immich
+   lan_address). The single-node criterion is met.
+6. ✅ Banner demoted (this doc, 2026-06-23). Next: multinode pass + workstreams B/C/D.
 
 **Multinode / fleet verification (.198 and the rest) is split into its own plan:**
 `docs/multinode-testing-plan.md`. Do it AFTER the .228 single-node gate is green.
@@ -180,11 +185,32 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
 Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash phases 2–6
 (`dual-ecash-design.md`).
 
-## 8b. SESSION STATE + RESUME (updated 2026-06-22 evening) — READ §8b "CURRENT STATE + RESUME" FIRST
+## 8b. SESSION STATE + RESUME (updated 2026-06-23) — READ §8b "CURRENT STATE + RESUME" FIRST
 
-### ▶ CURRENT STATE + RESUME (2026-06-22 evening) — RESUME FROM HERE (works from any device)
+### ▶ CURRENT STATE + RESUME (2026-06-23) — RESUME FROM HERE (works from any device)
 
-**Headline:** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
+**✅ HEADLINE (2026-06-23): the single-node production gate is GREEN — `run-gate.sh` 5/5 on .228,
+0 not-ok** (`gate-5x5.log`: iters 698/756/1030/485/481s). The exit criterion (§5) is met. Getting
+there took fixing **two real orchestrator bugs** the gate surfaced (package.stop per-app grace,
+2026-06-22; package.restart phantom stack-member injection, 2026-06-23 — `order_present_containers`,
+commit 92d7f52d) plus hardening **two single-shot-read probes** that flaked under churn (bitcoin-knots
+state; immich lan_address). **.228 runs the fixed binary** (release, sha `5472c575…`, swapped into
+`/usr/local/bin/archipelago` — safe because containers live in the `user@1000.service` slice, NOT the
+`archipelago.service` cgroup). Commits this push (local `main`, **unpushed**): `92d7f52d` (orchestrator
+fix + bitcoin-knots probe), `65117545` (docs), immich-probe + this doc update.
+
+**NEXT (post-gate, none blocking):**
+1. **Bundled testing deploy** — per [[feedback_deploy_targets_and_ux_bundle]], the next testing deploy
+   must hit **.116 + .198** (not just .228) AND ship a real **neode-ui frontend build** bundling the
+   other agent's mobile app-launch UX changes ([[project_mobile_applaunch_ux]]). Blocked only on that
+   UX work being committed/final (was uncommitted + active `vite` at gate-green time).
+2. **Multinode pass** — `docs/multinode-testing-plan.md` (.198 + fleet), the next exit criterion.
+3. **Workstreams** — netbird #20 ph4 (last real migration); Phase-3 `use_quadlet_backends`; B flip-on
+   (`EMBED_MANIFESTS` + sign) to distribute manifests via the registry; C marketplace tooling.
+
+**(historical resume notes for the 5× chase below — superseded by the green result above)**
+
+**Headline (2026-06-22):** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
 (110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a real
 mempool bug found + fixed (below). The gate is now single-node (.228); multinode is split out
 (`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`