From 47a51488657d5cb3e3205326911f7f301e37279b Mon Sep 17 00:00:00 2001
From: archipelago
Date: Mon, 22 Jun 2026 11:09:12 -0400
Subject: [PATCH] =?UTF-8?q?docs(gate):=20two-node=20result=20=E2=80=94=20s?=
 =?UTF-8?q?top=20blocker=20FIXED;=20residual=20red=20is=20bitcoin-IBD=20+?=
 =?UTF-8?q?=20node=20prep?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

.228 104/110, .198 94/110 with the 3-fix binary. Every package.stop test
passes on healthy apps. .198's 14/16 failures trace to bitcoin in IBD
(test 83: ~137k blocks behind) cascading to lnd/btcpay/electrumx/mempool.
2 node-independent: companion recreate (31, both nodes), fedimint orphan
pollution (44). Path to green 5x gate is now infra (sync bitcoin,
re-quadletize .228) + minor (test 31), not lifecycle bugs.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 docs/PRODUCTION-MASTER-PLAN.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 45cc1b28..40f05ef2 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -269,10 +269,29 @@ were the host-port watchdog firing while I rapid-cycled stop/start (fixed by #2)
 key mismatch" was actually the live-UI-companion launch-port issue (#3). "Grace vs
 gate-timeout" (electrumx 300s) was moot — a healthy electrumx honours SIGQUIT and stops in <1s.
 
-**Status: validating breadth.** electrumx suite GREEN on .228 (the previously-failing repro). Full
-single-iteration gate (all suites, DESTRUCTIVE) running on .228 to confirm the other apps; then .198,
-then the 5× canonical gate. `.228` is still contamination-flavored (plain podman) but the fixes are
-runtime-agnostic and electrumx passed there regardless. Re-quadletizing .228 + the 5× runs remain.
+**TWO-NODE GATE RESULT (1×, DESTRUCTIVE, both with the 3-fix binary):**
+- **.228: 104/110.** All previously-failing `package.stop` tests now PASS (bitcoin/btcpay/electrumx/
+  fedimint/immich). Remaining 6: test 31 (companion recreate), 44 (fedimint orphan — probe
+  pollution), 55 (immich restart timing), 83 (bitcoin not archival-synced), 94/99 (endpoint/lnd-proxy
+  cascade from 83).
+- **.198: 94/110.** **14 of 16 failures are one root cause: bitcoin is in IBD** (test 83 says
+  `blocks=817652 headers=954850` — ~137k behind). Everything chained to bitcoin cascades: lnd
+  (16,85), btcpay (22,23,103), electrumx (37), mempool stack (71,72,73,101), endpoints (94),
+  bitcoin.getinfo (7,12). The other 2 are node-independent: **31** (companion recreate) and **44**
+  (fedimint orphan pollution).
+
+**CONCLUSION: the lifecycle-stop blocker is FIXED and validated on both nodes.** The residual red is
+NOT lifecycle bugs — it is (a) **bitcoin still syncing (IBD)** on the test nodes [test 83 is an
+explicit precondition; nothing electrumx/lnd/btcpay/mempool can pass until it finishes], (b) **.228
+plain-podman contamination** (my cascade-gate), and (c) two minor items: **test 31** companion-unit
+recreate (both nodes — likely the 90s window vs reconcile tick + image step; investigate) and **test
+44** orphan fedimint container left by my probing.
+
+**To reach a literally-green 5× gate (now infra/node-prep, not code):**
+1. Let bitcoin finish IBD on a test node (or point the gate at an archival-synced bitcoin).
+2. Re-quadletize .228 (reinstall its backends so `.container` units regenerate, matching .198).
+3. Investigate test 31 (companion recreate) — confirm code-bug vs load-timeout; clear test-44 orphans.
+4. Then run `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1` on the synced+quadlet node, then the other.
 
**Quadlet context (still true, but SEPARATE from the bug above):** quadlet IS the intended backend runtime — .198 has the backend `.container` files (bitcoin-knots/btcpay-server/fedimint/filebrowser/