From 5269d50039392ef0292fdd8c1df334b4509fc705 Mon Sep 17 00:00:00 2001
From: archipelago <archipelago@localhost>
Date: Wed, 1 Jul 2026 09:20:15 -0400
Subject: [PATCH] docs: record .198 cleanup outcome + .228 fedimint-guardian
 clarification

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
---
 docs/PRODUCTION-MASTER-PLAN.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index c3363499..7319561a 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -1161,6 +1161,37 @@ never knows it's talking to podman — it just sees the standard Docker socket p
 Docker Engine API, which podman's socket implements compatibly. Not a bug: pick "Docker" (local) in
 the wizard.
 
+## 12b. `.198` disk-I/O relief — apps uninstalled, immich uninstall-mapping bug found+fixed (2026-07-01)
+
+User approved uninstalling immich, botfights, grafana, searxng on `.198` to relieve the disk-I/O
+contention from §12 (bitcoin-knots' slow IBD). All 4 were uninstalled via RPC. **Found another
+instance of the §11 uninstall-durability bug class, this time in the uninstall app_id MAPPING rather
+than the durability mechanism**: `orchestrator_uninstall_app_ids("immich")` had no case (it fell to
+the generic `_ => vec![package_id]`), so uninstalling "immich" only disabled the "immich" app_id
+itself. "immich-postgres" and "immich-redis" (separate orchestrator-tracked manifests, same shape as
+mempool-api/archy-mempool-db) stayed enabled, and the boot reconciler kept restarting their leftover
+*stopped* containers every ~30s. Confirmed live via `journalctl`: `reconcile action
+app_id=immich-redis action=Started` well after uninstall. **Fixed** (mirrors the existing
+mempool/btcpay/electrum mappings) + new test `immich_uninstall_covers_every_sibling_orchestrator_app_id`.
+Cleaned up live on `.198` by fully removing (not just stopping) the orphaned containers. A fully
+*absent* optional container is already correctly left alone even by the old deployed binary, so the
+cleanup held without needing a redeploy. **Committed + pushed** `09d42cbb`.
+
+**Outcome**: right after the uninstalls the disk still showed 90-100% `%util` and
+`getblockchaininfo` still timed out (65s). The likely cause is that bitcoin-knots' own IBD
+validation (492GB+ cumulative block I/O already) is the dominant consumer, not the other apps:
+removing 4 relatively light/idle apps reduces concurrent contention somewhat but doesn't fix a
+fundamentally disk-bound full-chain validation in progress. Data volumes for the uninstalled apps
+were left in place (uninstall doesn't wipe `/var/lib/archipelago/<app>` by default), so disk
+*space* usage (72%) is unchanged; only the *active* I/O from those containers stopped.
+
+**`.228` "fedimint guardian" — clarified, not a bug**: user separately flagged ".228 has the fedimint
+guardian stop issue." Checked: `.228` has NO `fedimint` (guardian) container installed at all — only
+`fedimint-clientd` (a client joining *external* federations) and its UI, both healthy (`Up 2-5 days`).
+Only `.198` runs an actual guardian (`fedimint`), and that's the one already covered by §12's
+disk-I/O root cause. Likely a node mix-up in the report — flag if something else specific to `.228`
+was meant.
+
 ## 13. Peer-federated content 404s over FIPS (2026-07-01) — DATA LOSS, not a code bug in the transport
 
 User report: `.116 → .228` streaming/downloading peer-federated content over FIPS failed with