From 5269d50039392ef0292fdd8c1df334b4509fc705 Mon Sep 17 00:00:00 2001
From: archipelago
Date: Wed, 1 Jul 2026 09:20:15 -0400
Subject: [PATCH] docs: record .198 cleanup outcome + .228 fedimint-guardian clarification

Co-Authored-By: Claude Sonnet 5
---
 docs/PRODUCTION-MASTER-PLAN.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index c3363499..7319561a 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -1161,6 +1161,37 @@ never knows it's talking to podman — it just sees the standard Docker socket p
 Docker Engine API, which podman's socket implements compatibly. Not a bug: pick "Docker" (local) in
 the wizard.
 
+## 12b. `.198` disk-I/O relief — apps uninstalled, immich uninstall-mapping bug found+fixed (2026-07-01)
+
+User approved uninstalling immich, botfights, grafana, searxng on `.198` to relieve the disk-I/O
+contention from §12 (bitcoin-knots' slow IBD). All 4 uninstalled via RPC. **Found another instance
+of the exact §11 uninstall-durability bug class, this time in the uninstall app_id MAPPING rather
+than the durability mechanism**: `orchestrator_uninstall_app_ids("immich")` had no case (fell to the
+generic `_ => vec![package_id]`), so uninstalling "immich" only disabled the "immich" app_id itself
+— "immich-postgres" and "immich-redis" (separate orchestrator-tracked manifests, same shape as
+mempool-api/archy-mempool-db) stayed enabled, and the boot reconciler kept restarting their leftover
+*stopped* containers every ~30s. Confirmed live via `journalctl`: `reconcile action
+app_id=immich-redis action=Started` well after uninstall. **Fixed** (mirrors the existing
+mempool/btcpay/electrum mappings) + new test `immich_uninstall_covers_every_sibling_orchestrator_app_id`.
+Cleaned up live on `.198` by fully removing (not just stopping) the orphaned containers — a fully
+*absent* optional container is already correctly left alone even by the old deployed binary, so this
+stuck without needing a redeploy. **Committed + pushed** `09d42cbb`.
+
+**Outcome**: disk still showed 90-100% `%util` and `getblockchaininfo` still timed out (65s) right
+after the uninstalls — likely because bitcoin-knots' own IBD validation (492GB+ cumulative block I/O
+already) is the dominant consumer, not the other apps; removing 4 relatively light/idle apps gives
+some relief (less concurrent contention) but doesn't fix a fundamentally disk-bound full-chain
+validation in progress. Data volumes for the uninstalled apps were left in place (uninstall doesn't
+wipe `/var/lib/archipelago/` by default) — disk *space* usage (72%) is unchanged, only the
+*active* I/O from those containers stopped.
+
+**`.228` "fedimint guardian" — clarified, not a bug**: user separately flagged ".228 has the fedimint
+guardian stop issue." Checked: `.228` has NO `fedimint` (guardian) container installed at all — only
+`fedimint-clientd` (a client joining *external* federations) and its UI, both healthy (`Up 2-5 days`).
+Only `.198` runs an actual guardian (`fedimint`), and that's the one already covered by §12's
+disk-I/O root cause. Likely a node mix-up in the report — flag if something else specific to `.228`
+was meant.
+
 ## 13. Peer-federated content 404s over FIPS (2026-07-01) — DATA LOSS, not a code bug in the transport
 
 User report: `.116 → .228` streaming/downloading peer-federated content over FIPS failed with