diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index bbc561ec..03b0f00d 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -169,22 +169,58 @@ phases 2–6 (`dual-ecash-design.md`).
 ### ▶ CURRENT STATE + RESUME (2026-06-22 evening) — RESUME FROM HERE (works from any device)
 
 **Headline:** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
-(110/110)**; a **hardened 5× run is IN PROGRESS on `.228`** (the single-node exit criterion). The
-gate is now single-node (.228); multinode is split out (`docs/multinode-testing-plan.md`).
+(110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a
+real mempool bug was found + fixed (below). The gate is now single-node (.228); multinode is split out
+(`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`
+naming/script was removed 2026-06-22, commit `57a013bc`).
+
+**2026-06-22 (late) — mempool stale-IP bug FOUND + FIXED (real production bug, not a flake):**
+The 1st 5× attempt failed iteration 1 on `#74 mempool api backend remains queryable`. Root cause was
+NOT timing: the frontend nginx resolved mempool-api's IP once at startup (no `resolver`), so after the
+gate restarts mempool-api (new podman IP), nginx returns 502 and the UI shows "offline". Fixed in
+`mempool-frontend:v3.0.1` (resolver+variable proxy_pass; see `[[project_mempool_nginx_stale_ip_fix]]`
+/ `docker/mempool-frontend/`), pushed to vps2, manifests bumped 3.0.0→3.0.1, deployed + resilience-
+verified live on .228 (backend restart now auto-recovers). Also fixed the test itself (`mempool.bats`
+#74: 180s→300s + real `fail` helper). Commits: `0f05f73a` (fix), `57a013bc` (gate rename).
 
 **THE 5× RUN IS DETACHED ON .228 — survives terminal/session close. Check it from any machine:**
 ```
 sshpass -p archipelago ssh archipelago@192.168.1.228 \
-  'grep -E "iteration [0-9]+: (PASS|FAIL)|RESULTS|passed:|failed:" /tmp/gate-5x2.log; \
-   echo "running pid: $(pgrep -f run-gate.sh$ || echo DONE)"; grep "^not ok" /tmp/gate-5x2.log | sort -u'
+  'grep -E "iteration [0-9]+: (PASS|FAIL)|RESULTS|passed:|failed:" /tmp/gate-5x3.log; \
+   echo "running pid: $(pgrep -f run-gate.sh$ || echo DONE)"; grep "^not ok" /tmp/gate-5x3.log | sort -u'
 ```
-- Log: `/tmp/gate-5x2.log` on .228 · launched `nohup` (pid was 4042141) · `ARCHY_ITERATIONS=5
-  ARCHY_ALLOW_DESTRUCTIVE=1`, run **ON the node** from `/tmp/lifecycle-run/tests/lifecycle`
-  (ARCHY_HOST=127.0.0.1). `bats` 1.11.1 + static `jq` 1.7.1 are installed on .228 for this.
+- Log: `/tmp/gate-5x3.log` on .228 · launched `nohup` · `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`,
+  run **ON the node** from `/tmp/lifecycle-run/tests/lifecycle` via `./run-gate.sh` (ARCHY_HOST=127.0.0.1).
+  `bats` 1.11.1 + static `jq` 1.7.1 are installed on .228.
 - **If all 5 iterations PASS → .228 has met the single-node criterion → demote the banner.**
-- If it flakes again: it'll be readiness-under-churn (lnd/mempool); the hardening (commit `98f4fa44`:
-  inter-iteration `settle_stack()` + 180–240s readiness windows) targets exactly that. Re-copy the
-  repo `tests/lifecycle` to /tmp/lifecycle-run and re-launch.
+- If it flakes again: suspect readiness-under-churn (lnd/mempool), targeted by the hardening in `98f4fa44`
+  (inter-iteration `settle_stack()` + readiness windows). Re-copy repo `tests/lifecycle` to /tmp/lifecycle-run, relaunch.
+
+**▶ 2026-06-23 (morning) — 5× FINISHED 2/5; both mempool fails ROOT-CAUSED to ONE real
+orchestrator bug (NOT flakes) + FIXED:** the overnight run finished `passed: 2 / failed: 3` on
+`gate-5x3.log`: three *distinct* test failures, one per failed iteration, no test failing twice:
+- iter1 `#5 container-list valid state for bitcoin-knots` — pre-launch churn (as predicted); didn't
+  repeat. **Hardened anyway:** the probe was a single-shot read; now polls ≤30s for a settled valid
+  state so a momentary `restarting`/transient can't flake a 20-min iteration (`bitcoin-knots.bats`).
+- iter2 `#74 mempool api queryable` + iter5 `#73 mempool stack running` — **SAME root cause.**
+  `package.restart mempool` resolves its container list via `ordered_containers_for_start`, which was
+  **injecting phantom stack-member names** (`mysql-mempool`, `archy-mempool-api`, `archy-mempool-web`
+  — variant names from the union `startup_order` list that aren't live on this node). The phantom
+  `mysql-mempool` is 2nd in the start order; `do_orchestrator_package_start` hits its unknown-app-id
+  fallback → `do_package_start` inspect fails "no such object" → the `?` **aborts the whole start
+  sequence**, so `mempool-api` (pos 5) + `mempool` frontend (pos 8) never start. They then stayed down
+  ~6 min until the health monitor independently recovered them → #73 (frontend not running in 180s)
+  and #74 (api not queryable in 300s) both fail. Journal proof on .228: `package.restart mempool
+  failed: Start failed: mysql-mempool: ... no such object`, 23:27:32.
+  **Fix:** `ordered_containers_for_start` now orders only the *actually-present* containers and never
+  injects phantom order entries (new pure helper `order_present_containers` + 3 unit tests,
+  `dependencies.rs`). This is the SAME class as the mempool nginx bug — a hardcoded-name/reality
+  mismatch — and is exactly the manifest-driven-lifecycle anti-pattern the master plan targets.
+- **Deploy + relaunch:** built release binary on .116, swapped `/usr/local/bin/archipelago` on .228
+  (containers live under `user@1000.service`, NOT the `archipelago.service` cgroup, so a service
+  restart does NOT kill them — verified via conmon cgroup paths). Manually verified mempool restart
+  keeps the stack up, then relaunched a clean 5× → see `gate-5x4.log` (check cmd above, swap the
+  filename). Expectation: all three fixed → 5/5 green → demote the banner.
 
 **Code fixes shipped this session (all on `main`, built + DEPLOYED to .228 AND .198):**
 - `2dad64b2` stop honours per-app grace (was `-t 30` deadline racing SIGKILL).