diff --git a/docs/UNIFIED-TASK-TRACKER.md b/docs/UNIFIED-TASK-TRACKER.md index f45731d5..d1c21dd3 100644 --- a/docs/UNIFIED-TASK-TRACKER.md +++ b/docs/UNIFIED-TASK-TRACKER.md @@ -113,6 +113,25 @@ those are marked ✅ below with the commit that did it, so we stop re-litigating manifest-driven apps, never stacks; fedimint/fedimint-gateway/fedimint-clientd are 3 separate single-container apps with manifest dependency edges, not a coordinated stack. Workstream A's stack-migration tail is fully closed. +- [ ] **Container thrashing/flapping + reconciler churn** (added 2026-07-04 — was + implicit across other tracks, now an explicit pre-tag concern). The root cause + of restart-storm flapping is pre-Quadlet architecture: restarting + `archipelago.service` SIGKILLs every container in its cgroup, then the + reconciler rebuilds the world over several minutes (the post-OTA health check + deliberately skips per-app container assertions because of exactly this). + Consolidated lever list, in order of impact: + - **Phase-3 Quadlet default-flip** (tracked above) — removes the SIGKILL-the-world + behavior entirely; the single biggest fix. + - **Workstream F lifecycle items** — immich/grafana uninstall hangs + ghost + containers, grafana reinstall stops, fedimint guardian sync + (`docs/PRODUCTION-MASTER-PLAN.md` workstream F). + - **Reconciler churn observability** — no metric/log today distinguishes "settling + after restart" from "flapping"; add a per-app restart counter + log line when an + app restarts >N times in M minutes so thrash is visible instead of anecdotal. + - Already landed, don't re-do: boot-reconciler circuit breaker (2026-07-01), + indeedhub crashloop fix (2026-07-01), async blocking-Command pass (`4c75bb3d`, + removes executor stalls that made the API janky under reconcile load). + - Perf polish riding along: 93 MB frontend dist shrink (hardening plan §D 🟡). - [ ] **Developer tooling CLI suite** (validate/render/local-install/lifecycle-test) — APP-PACKAGING-MIGRATION-PLAN.md step 5, needed before external devs can publish. - [x] ~~**Consolidated deploy 2026-07-01**: merged PR #67 (reticulum daemon