archy/feedback_deploy_patterns.md at 4295476291bb979d9862aa7673fb0dd87dda5980

lfg2025/archy

Dorian 1e283daf13 fix: overhaul container lifecycle — recovery, health, uninstall, UI state

Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-31 07:03:57 +01:00

1.1 KiB



name	description	type
Deploy container patterns	Hard-won deploy patterns — rootless port 80, credential sync, health checks, image export	feedback

Container deploy patterns learned from fleet-wide deploy sessions.

Rootless port 80: Containers binding port 80 MUST use --user 0:0. The NET_BIND_SERVICE capability doesn't work in rootless Podman.

Why: Across multiple containers (FileBrowser, Nextcloud, Vaultwarden, Jellyfin), --cap-add NET_BIND_SERVICE was silently ignored in rootless mode. Only --user 0:0 worked.
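A minimal sketch of how a deploy script could enforce this rule when assembling `podman run` arguments. The function name, the container name, and the image reference are hypothetical; only the `--user`, `-p`, `-d`, and `--name` flags come from Podman itself.

```python
# Sketch only: helper and image names are illustrative, not from the archy repo.
PRIVILEGED_PORT_LIMIT = 1024  # ports below this need root inside the container


def podman_run_args(name, image, host_port, container_port):
    """Build a `podman run` argument list, forcing --user 0:0 for low ports."""
    args = ["podman", "run", "-d", "--name", name]
    if container_port < PRIVILEGED_PORT_LIMIT:
        # Per the note above: --cap-add NET_BIND_SERVICE was not enough in
        # rootless mode, so run as root inside the container instead.
        args += ["--user", "0:0"]
    args += ["-p", f"{host_port}:{container_port}", image]
    return args


if __name__ == "__main__":
    print(" ".join(podman_run_args("filebrowser", "example/filebrowser:latest", 8080, 80)))
```

Building the argument list in one place keeps the rule from being forgotten when a new container is added.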

Credential sync: MariaDB and Postgres read password env vars only on FIRST init. If a deploy generates new random passwords in secrets/ but the DB data dir already exists, the DB keeps the OLD password. Fix: either wipe the data dir and reinitialize, or run ALTER USER to set the new password.
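The decision described above can be sketched as follows: if the data dir is missing or empty, the first init will pick up the env vars and nothing else is needed; otherwise an ALTER USER statement is required. All function names are hypothetical, the `'%'` host in the MariaDB statement is an assumption, and the quoting assumes a simple user name and a password without backslashes.

```python
from pathlib import Path


def data_dir_initialized(data_dir: Path) -> bool:
    """True if the DB data dir exists and already contains files."""
    return data_dir.is_dir() and any(data_dir.iterdir())


def alter_user_sql(engine: str, user: str, password: str) -> str:
    """Return an ALTER USER statement that sets the password for `user`."""
    quoted = password.replace("'", "''")  # double single quotes for SQL literals
    if engine == "postgres":
        return f"ALTER USER \"{user}\" WITH PASSWORD '{quoted}';"
    if engine == "mariadb":
        # Assumption: the account was created for host '%'.
        return f"ALTER USER '{user}'@'%' IDENTIFIED BY '{quoted}';"
    raise ValueError(f"unsupported engine: {engine}")


def credential_action(engine: str, data_dir: Path, user: str, password: str):
    """Return None when first init will apply the password, else the SQL to run."""
    if not data_dir_initialized(data_dir):
        return None
    return alter_user_sql(engine, user, password)
```

The returned statement still has to be executed inside the running database with the old credentials; this sketch only decides whether a sync is needed.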

Image export: Always export custom images as INDIVIDUAL tarballs, one image per file (podman save -o name.tar IMAGE). Combined tarballs corrupt image IDs.
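As an illustration, a deploy script might generate one `podman save` invocation per image. The helper names, the file-naming scheme, and the image references are assumptions made for this sketch.

```python
import re
from pathlib import Path


def tarball_name(image: str) -> str:
    """Derive a file name from an image reference by replacing '/' and ':'."""
    return re.sub(r"[/:]", "_", image) + ".tar"


def save_commands(images, out_dir):
    """Return one `podman save -o <file> <image>` argument list per image."""
    out = Path(out_dir)
    return [
        ["podman", "save", "-o", str(out / tarball_name(image)), image]
        for image in images
    ]


if __name__ == "__main__":
    for cmd in save_commands(["localhost/archy/ui:1.2", "localhost/archy/api:1.2"], "export"):
        print(" ".join(cmd))
```

Each command names exactly one image, so no tarball ever holds more than one.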

Health checks: Every container should have --health-cmd. Currently 25+ containers have one.
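One way to check this during review is to scan the `podman run` argument lists a deploy script produces and report containers with no `--health-cmd`. This is a sketch under the assumption that the commands are available as lists keyed by container name; the function names and sample data are hypothetical.

```python
def has_health_cmd(args):
    """True if the argument list sets --health-cmd in either accepted form."""
    return any(a == "--health-cmd" or a.startswith("--health-cmd=") for a in args)


def containers_missing_health_cmd(commands):
    """Return sorted names of containers whose run args lack --health-cmd."""
    return sorted(name for name, args in commands.items() if not has_health_cmd(args))


if __name__ == "__main__":
    sample = {
        "web": ["podman", "run", "--health-cmd", "curl -f http://localhost/", "example/web"],
        "worker": ["podman", "run", "example/worker"],
    }
    print(containers_missing_health_cmd(sample))
```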

How to apply: Check these patterns in any deploy script changes or new container additions.
