docs: single-node production gate GREEN (5/5 on .228) — demote banner

run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
archipelago 2026-06-23 04:27:36 -04:00
parent 256d354048
commit ae47897601


@@ -1,11 +1,13 @@
-# 🚩 PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
+# PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
 
-> **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until
-> the production test gate (§5) is green.** It overrides ad-hoc direction and
-> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
-> the priority banner and demote this doc.
+> **✅ SINGLE-NODE PRODUCTION GATE IS GREEN (2026-06-23): `run-gate.sh` 5/5 on .228, 0 failures.**
+> This remains the authoritative plan for the broader north star (manifest-driven
+> platform, registry-distributed manifests, external marketplace), but it is no
+> longer a hard priority banner blocking all other work. Remaining workstreams are
+> in §6 / §8b. Next exit-criteria: multinode (`docs/multinode-testing-plan.md`) +
+> workstreams B/C/D.
 >
-> Last updated: 2026-06-22 (evening) · .228 gate 1×-GREEN; hardened 5× running on .228 (see §8b CURRENT STATE — resume from any device).
+> Last updated: 2026-06-23 · **.228 gate 5×-GREEN (110/110 ×5, 0 not-ok)** — exit criterion met (see §8b).
 
 ---
 
@@ -67,7 +69,7 @@ real nodes. Until then, this plan is the priority.
 | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
 | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
 | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
-| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **.228 GREEN (110/110); 5× in progress** |
+| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **✅ .228 5×-GREEN (110/110 ×5, 0 not-ok, 2026-06-23)** — single-node criterion met |
 
 **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
 (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
@@ -100,9 +102,12 @@ proxies; L3 survival ◐; ~30 apps have zero automated coverage.
 data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
 4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide)
 for the podman-`--restart` path. *(f160e0c4)*
-5. ◧ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`). .228 is GREEN
-1× (110/110); the 5× run is in progress. This is now the SINGLE-NODE criterion.
-6. ◻ Demote this banner once the 5× is green.
+5. ✅ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`) is **GREEN: 5/5, 0 not-ok**
+(2026-06-23). Two real orchestrator bugs were found + fixed en route (package.stop
+per-app grace; package.restart phantom stack-member injection → `order_present_containers`,
+commit 92d7f52d) plus two single-shot-read probes hardened (bitcoin-knots state, immich
+lan_address). The single-node criterion is met.
+6. ✅ Banner demoted (this doc, 2026-06-23). Next: multinode pass + workstreams B/C/D.
 
 **Multinode / fleet verification (.198 and the rest) is split into its own plan:**
 `docs/multinode-testing-plan.md`. Do it AFTER the .228 single-node gate is green.
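The exit criterion recorded in item 5 (five gate iterations, zero not-ok) can be sketched as a small check. This is a minimal illustration under stated assumptions, not the logic of `run-gate.sh`: it assumes the gate log carries TAP-style `ok` / `not ok` result lines, which the diff does not confirm, and the names `IterationResult`, `tally`, and `gate_is_green` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IterationResult:
    """Pass/fail counts for one gate iteration (hypothetical structure)."""
    passed: int
    failed: int


def tally(lines: Iterable[str]) -> IterationResult:
    """Count TAP-style result lines; the log format is an assumption."""
    passed = failed = 0
    for line in lines:
        stripped = line.strip()
        # "not ok" must be tested first; other lines (comments, plans) are ignored.
        if stripped.startswith("not ok"):
            failed += 1
        elif stripped.startswith("ok"):
            passed += 1
    return IterationResult(passed, failed)


def gate_is_green(results: List[IterationResult], required_iterations: int = 5) -> bool:
    """Green only if enough iterations ran and every one had zero failures."""
    if len(results) < required_iterations:
        return False
    return all(r.failed == 0 and r.passed > 0 for r in results)
```

Under these assumptions, a run such as "110/110 ×5, 0 not-ok" would correspond to five `IterationResult(110, 0)` values, and a single failed probe in any iteration would keep the gate from being reported green.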
@@ -180,11 +185,32 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
 Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
 phases 2–6 (`dual-ecash-design.md`).
 
-## 8b. SESSION STATE + RESUME (updated 2026-06-22 evening) — READ §8b "CURRENT STATE + RESUME" FIRST
+## 8b. SESSION STATE + RESUME (updated 2026-06-23) — READ §8b "CURRENT STATE + RESUME" FIRST
 
-### ▶ CURRENT STATE + RESUME (2026-06-22 evening) — RESUME FROM HERE (works from any device)
+### ▶ CURRENT STATE + RESUME (2026-06-23) — RESUME FROM HERE (works from any device)
 
-**Headline:** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
+**✅ HEADLINE (2026-06-23): the single-node production gate is GREEN — `run-gate.sh` 5/5 on .228,
+0 not-ok** (`gate-5x5.log`: iters 698/756/1030/485/481s). The exit criterion (§5) is met. Getting
+there took fixing **two real orchestrator bugs** the gate surfaced (package.stop per-app grace,
+2026-06-22; package.restart phantom stack-member injection, 2026-06-23 — `order_present_containers`,
+commit 92d7f52d) plus hardening **two single-shot-read probes** that flaked under churn (bitcoin-knots
+state; immich lan_address). **.228 runs the fixed binary** (release, sha `5472c575…`, swapped into
+`/usr/local/bin/archipelago` — safe because containers live in the `user@1000.service` slice, NOT the
+`archipelago.service` cgroup). Commits this push (local `main`, **unpushed**): `92d7f52d` (orchestrator
+fix + bitcoin-knots probe), `65117545` (docs), immich-probe + this doc update.
+
+**NEXT (post-gate, none blocking):**
+1. **Bundled testing deploy** — per [[feedback_deploy_targets_and_ux_bundle]], the next testing deploy
+must hit **.116 + .198** (not just .228) AND ship a real **neode-ui frontend build** bundling the
+other agent's mobile app-launch UX changes ([[project_mobile_applaunch_ux]]). Blocked only on that
+UX work being committed/final (was uncommitted + active `vite` at gate-green time).
+2. **Multinode pass**: `docs/multinode-testing-plan.md` (.198 + fleet), the next exit criterion.
+3. **Workstreams** — netbird #20 ph4 (last real migration); Phase-3 `use_quadlet_backends`; B flip-on
+(`EMBED_MANIFESTS` + sign) to distribute manifests via the registry; C marketplace tooling.
+
+**(historical resume notes for the 5× chase below — superseded by the green result above)**
+
+**Headline (2026-06-22):** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
 (110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a
 real mempool bug found + fixed (below). The gate is now single-node (.228); multinode is split out
 (`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`