--- name: Container Orchestration Hardening description: Container orchestration overhaul — stop grace periods, pull retry, persistent restart tracking, scheduled remediation, failsafe install, boot reconciliation type: project --- Container orchestration hardening implemented on dev-iso branch (2026-03-28). **Why:** Gitea issue requesting true orchestration. Containers were unreliable — 10s stop timeout risked Bitcoin Core UTXO corruption, image pulls failed silently, restart counters reset on process restart enabling infinite loops, doctor/reconcile scripts only ran manually. **What was done (7 changes):** 1. Per-container stop grace periods (600s bitcoin, 330s lnd, 300s electrs, 120s databases, 60s btcpay, 30s default) + systemd TimeoutStopSec=660 2. Image pull retry with exponential backoff (3 attempts: 5s/15s/45s) + post-pull verification + stacks.rs error propagation instead of silent swallow 3. Resolved container/health_monitor.rs TODO (documented as orchestrator-level responsibility) 4. Persistent restart tracking to restart-tracker.json (survives process restarts, seeded on startup) 5. Scheduled systemd timers: container-doctor every 30min, reconcile-containers every 6h 6. Failsafe install: post-pull image verify, rollback on start failure, 30s post-start health check with crash diagnosis 7. Boot reconciliation: runs reconcile-containers.sh after crash recovery completes **How to apply:** These changes affect beta reliability. The other programmer is working on custom base ISO on the same branch — coordinate on build-auto-installer-iso.sh changes.