From 76b23adcc04ec86ea31fd55640ecffa97a7d997d Mon Sep 17 00:00:00 2001
From: archipelago
Date: Mon, 22 Jun 2026 11:34:55 -0400
Subject: [PATCH] docs(gate): test 31 root-caused = .228 contamination (not a product bug)

companion::reconcile only recreates a deleted companion unit when its parent
backend is in manifest_ids. On contaminated .228, electrumx ran as plain podman
and was NOT a tracked manifest install (manifest on disk but unloaded), so the
reconciler never iterated it -> archy-electrs-ui companion orphaned. Proven:
package.install electrumx re-registered it + restored the companion. Self-heal
logic is sound; test 31 clears on re-quadletize. electrumx on .228
de-contaminated.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/PRODUCTION-MASTER-PLAN.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 40f05ef2..b5d38ee8 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -290,7 +290,14 @@ recreate (both nodes — likely the 90s window vs reconcile tick + image step; i
 **To reach a literally-green 5× gate (now infra/node-prep, not code):**
 1. Let bitcoin finish IBD on a test node (or point the gate at an archival-synced bitcoin).
 2. Re-quadletize .228 (reinstall its backends so `.container` units regenerate, matching .198).
-3. Investigate test 31 (companion recreate) — confirm code-bug vs load-timeout; clear test-44 orphans.
+3. ✅ **test 31 ROOT-CAUSED = contamination, NOT a product bug.** `companion::reconcile` only
+   recreates a deleted companion unit (e.g. `archy-electrs-ui`) when its PARENT backend (electrumx)
+   is in `manifest_ids`. On contaminated .228 electrumx ran as plain podman and was NOT a tracked
+   manifest install (its `/opt/.../electrumx/manifest.yml` exists on disk but wasn't loaded), so the
+   reconciler never iterated it → companion orphaned. **Proven fix:** `package.install electrumx`
+   re-registered it (now `reconcile action app_id=electrumx` fires) AND restored the companion (unit
+   present, service active). The companion self-heal logic is correct. ⇒ test 31 clears once .228 is
+   re-quadletized (step 2). electrumx on .228 is now de-contaminated. Still: clear test-44 orphans.
 4. Then run `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1` on the synced+quadlet node, then the other.
 
 **Quadlet context (still true, but SEPARATE from the bug above):** quadlet IS the intended backend