Common Podman Failure Patterns
Rootless Podman Specific Failures
| Error | Cause | Fix |
|---|---|---|
| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` |
| `Error: unshare: operation not permitted` | systemd `RestrictNamespaces=` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` |
| `Error: could not get runtime: creating runtime` | `XDG_RUNTIME_DIR` not set or `/run/user/1000` missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in the service and run `loginctl enable-linger archipelago` |
| `permission denied` on volume mount | Wrong UID ownership; volumes must be owned by the mapped UIDs | `sudo chown -R 100000:100000 /var/lib/archipelago/APP` (see the UID mapping table) |
| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless use | Run `podman system migrate` and check `/etc/subuid` |
| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (removes all containers, pods, images, networks, and volumes; last resort) |
| `Error: stat /tmp/podman-run-1000/...: no such file` | `PrivateTmp=yes` in systemd isolates `/tmp` | Set `PrivateTmp=no` in `archipelago.service` |
| Container ports unreachable from LAN | UFW `DEFAULT_FORWARD_POLICY="DROP"` | Change it to `"ACCEPT"` in `/etc/default/ufw`, then run `sudo ufw reload` |
| `Error: error creating network namespace` | systemd `SystemCallFilter=` blocks `clone`/`unshare` | Remove `SystemCallFilter=` from `archipelago.service` |
| Containers lose network after service restart | Podman runtime dir in `/tmp` was cleaned | Ensure `PrivateTmp=no` so `/tmp/podman-run-1000/` persists |
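Several rows above come from a missing or undersized subordinate ID range. The sketch below checks the text of `/etc/subuid` (or `/etc/subgid`) for one user's entry. It assumes the `name:start:count` line format and treats 65536 IDs as the minimum, matching the `archipelago:100000:65536` fix. It does not match entries keyed by numeric UID instead of user name. The function names are illustrative and not part of any existing tool.

```python
from typing import Optional, Tuple


def find_subid_range(text: str, user: str) -> Optional[Tuple[int, int]]:
    """Return (start, count) from the first "name:start:count" line for `user`."""
    for raw in text.splitlines():
        parts = raw.strip().split(":")
        if len(parts) != 3:
            continue  # blank or malformed line
        name, start, count = parts
        if name == user and start.isdigit() and count.isdigit():
            return int(start), int(count)
    return None


def has_enough_ids(text: str, user: str, needed: int = 65536) -> bool:
    """True if `user` owns a subordinate range of at least `needed` IDs."""
    found = find_subid_range(text, user)
    return found is not None and found[1] >= needed
```

On a correctly configured host, `has_enough_ids(open("/etc/subuid").read(), "archipelago")` should be `True`. The same call should also pass against `/etc/subgid`.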
Container Won't Start
| Error | Cause | Fix |
|---|---|---|
| `exec format error` | Binary built for the wrong architecture | Rebuild on the Linux server |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find the offender |
| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership against the mapped UID, add a tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME`, then recreate |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |
| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` |
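`ss -tlnp` shows which process owns a port. When you only need a yes/no answer, for example before publishing a port, a bind probe is enough. This is a minimal sketch that assumes IPv4 and TCP. `port_in_use` is a hypothetical helper: it reports only `EADDRINUSE` as "in use" and re-raises other errors, such as `EACCES` for ports below 1024.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails because something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES for privileged ports as an unprivileged user
    return False
```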
Container Starts But App Unreachable
| Symptom | Check Layer | Fix |
|---|---|---|
| Direct port works, `/app/` doesn't | Nginx config | Add an `/app/{id}/` location block |
| Neither works | Podman ports | `podman port NAME`: verify the mapping exists |
| Port mapped but connection refused | Container logs | App is crashing internally; check `podman logs NAME` |
| Works intermittently | Resources | Check for OOM kills, CPU, and disk space |
| 502 Bad Gateway | Nginx→Container | Wrong port in `proxy_pass`, or the container restarted |
| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` |
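The first and fifth rows both come down to the nginx location block. The helper below renders one for a given app. The app ID, the port, the two headers, and the `127.0.0.1` upstream are assumptions for illustration, not this project's actual nginx config. The trailing slash in `proxy_pass` tells nginx to replace the matched `/app/<id>/` prefix with `/`, so the app receives requests at its own root.

```python
import re


def app_location_block(app_id: str, port: int, upstream: str = "127.0.0.1") -> str:
    """Render an nginx location block proxying /app/<app_id>/ to a published port."""
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", app_id):
        raise ValueError(f"unexpected app id: {app_id!r}")
    return (
        f"location /app/{app_id}/ {{\n"
        # Trailing slash: nginx swaps the matched /app/<id>/ prefix for "/".
        f"    proxy_pass http://{upstream}:{port}/;\n"
        "    proxy_set_header Host $host;\n"
        "    proxy_set_header X-Forwarded-Proto $scheme;\n"
        "}\n"
    )
```

Some apps emit absolute links, such as `/static/...`. These may still need their own base-path setting, because nginx does not rewrite links inside response bodies by default.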
Container Keeps Dying
| Pattern | Cause | Fix |
|---|---|---|
| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
| Dies after minutes | OOM killed | Increase the `--memory` limit |
| Dies when a dependency restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Underlying fault persists across restarts | Fix the root cause; restarting alone won't help |
| Exit code 127 | Missing binary in container (command not found) | Wrong image tag or corrupted image; re-pull |
| Exit code 137 | SIGKILL (128 + 9), from the OOM killer or an explicit kill | Check `dmesg` for an OOM kill and `podman inspect NAME` for `OOMKilled` |
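Reading the exit code together with the `OOMKilled` flag separates the rows above. This sketch classifies the `State` object from `podman inspect NAME`, which prints a JSON array. It relies on the `Status`, `Running`, `ExitCode`, and `OOMKilled` fields. The labels it returns are made up for illustration.

```python
import json
from typing import Any, Dict


def classify_exit(state: Dict[str, Any]) -> str:
    """Label a container from the State object in `podman inspect` output."""
    if state.get("Status") == "created":
        return "created"  # never started, so ExitCode carries no meaning yet
    if state.get("Running"):
        return "running"
    if state.get("OOMKilled"):
        return "oom-killed"
    code = state.get("ExitCode", 0)
    if code == 0:
        return "stopped"  # clean exit, not a crash
    if code == 137:
        return "killed (SIGKILL)"  # 128 + 9, but not flagged as OOM
    if code == 127:
        return "missing binary"  # command not found in the image
    return f"crashed (exit {code})"


def classify_inspect_output(raw: str) -> str:
    """`podman inspect NAME` prints a JSON array with one object per container."""
    return classify_exit(json.loads(raw)[0]["State"])
```

A 137 without `OOMKilled` usually points to an explicit kill. For example, `podman stop` sends SIGKILL once its stop timeout expires.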
Network Issues
| Problem | Cause | Fix |
|---|---|---|
| Can't resolve container names | Not on `archy-net` | Recreate with `--network=archy-net` |
| Can't reach the internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on the same network |
| Bitcoin RPC refused from container | `rpcallowip` set to the wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe because the port is mapped, not exposed) |
| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on the new `archy-net` (rootless uses `10.89.x.x`) |
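Bitcoin matches `rpcallowip` against the client's source address. A rule written for an old subnet therefore rejects containers that now sit on `10.89.x.x`. The sketch uses Python's `ipaddress` module to test whether a container IP falls inside an `rpcallowip` value. It accepts single-address, CIDR, and netmask forms. `allowed_by_rpcallowip` is a hypothetical helper, and the `172.18.0.0/16` subnet in the checks is an invented example.

```python
import ipaddress


def allowed_by_rpcallowip(rpcallowip: str, client_ip: str) -> bool:
    """True if client_ip falls inside an rpcallowip value (single IP, CIDR, or netmask)."""
    # strict=False tolerates host bits being set, e.g. "10.89.0.1/16".
    network = ipaddress.ip_network(rpcallowip, strict=False)
    return ipaddress.ip_address(client_ip) in network
```

For example, `allowed_by_rpcallowip("172.18.0.0/16", "10.89.0.7")` is `False`. That mismatch is the "RPC refused" symptom after the network moved.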
Volume Permission Patterns (Rootless UID Mapping)
Formula: host_uid = 100000 + container_uid
| Container UID | Host UID | Apps | Data Directory |
|---|---|---|---|
| 0 (root) | 100000 | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | /var/lib/archipelago/{app} |
| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | /var/lib/archipelago/postgres-* |
| 101 | 100101 | bitcoin-knots | /var/lib/archipelago/bitcoin |
| 472 | 100472 | grafana | /var/lib/archipelago/grafana |
| 999 | 100999 | MariaDB (mysql-mempool) | /var/lib/archipelago/mysql-mempool |
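The formula above assumes the container's user namespace maps UID 0 to host UID 100000 in a single range. Rootless Podman's default map may differ. You can view it with `podman unshare cat /proc/self/uid_map`. It typically maps container UID 0 to the invoking user's own UID and container UID 1 onward to the subordinate range, so container UID n (n ≥ 1) lands on `subuid_start + n - 1`. A `chown -R` with the wrong number is tedious to undo, so the sketch below translates a container UID through an actual `uid_map`. Each line of that file is `inside-start outside-start length`. The function names are illustrative.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int, int]  # (inside_start, outside_start, length)


def parse_uid_map(text: str) -> List[Range]:
    """Parse /proc/<pid>/uid_map content: one 'inside outside length' triple per line."""
    ranges: List[Range] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            ranges.append((inside, outside, length))
    return ranges


def host_uid(container_uid: int, ranges: List[Range]) -> Optional[int]:
    """Translate a container UID to its host UID, or None if it is unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None
```

Alternatively, `podman unshare chown -R 70:70 DIR` runs the chown inside that namespace, so the translation happens automatically.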
Capability Reference
| Capability | Apps That Need It | Failure Mode |
|---|---|---|
| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes |
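Podman's default capability set already includes several of these, such as `CHOWN`, `SETUID`, and `NET_BIND_SERVICE`. The table is therefore most meaningful as a list of capabilities to add back after dropping the rest. Assuming containers are started with `--cap-drop=ALL`, the sketch below turns the table into `podman run` flags. `APP_CAPS` is transcribed from the table, and `cap_args` is a hypothetical helper. An app missing from the table gets no capabilities at all here, which may be too strict.

```python
from typing import Dict, List, Set

# Transcribed from the capability table above; only the apps it lists.
APP_CAPS: Dict[str, Set[str]] = {
    "nextcloud": {"CHOWN", "SETUID", "SETGID", "DAC_OVERRIDE"},
    "homeassistant": {"CHOWN", "SETUID", "SETGID", "DAC_OVERRIDE"},
    "btcpay": {"CHOWN", "SETUID", "SETGID", "DAC_OVERRIDE"},
    "jellyfin": {"CHOWN", "SETUID", "SETGID"},
    "portainer": {"CHOWN"},
    "bitcoin-knots": {"FOWNER"},
    "lnd": {"FOWNER"},
    "fedimint": {"FOWNER"},
    "nginx-proxy-manager": {"NET_BIND_SERVICE"},
    "vaultwarden": {"NET_BIND_SERVICE"},
    "tailscale": {"NET_ADMIN", "NET_RAW"},
}


def cap_args(app: str) -> List[str]:
    """Podman flags: drop every capability, then add back only what the app needs."""
    args = ["--cap-drop=ALL"]
    args += [f"--cap-add={cap}" for cap in sorted(APP_CAPS.get(app, set()))]
    return args
```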
Read-Only Safe Apps
Only these apps can run with `--read-only` plus a tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub
All others need a writable root filesystem; under `--read-only` they can fail without an obvious error message.
Systemd Sandbox Requirements for Rootless Podman
These systemd service settings MUST be configured for rootless Podman to work:
| Setting | Required Value | Why |
|---|---|---|
| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` |
| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` |
| `RestrictNamespaces=` | Not set | Rootless Podman creates user namespaces |
| `SystemCallFilter=` | Not set | Rootless Podman needs the `clone`/`unshare` syscalls |
| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data and Podman runtime paths |
| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location |
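These requirements are easy to break when someone hardens the unit later. The sketch below parses unit text and reports violations of the table. It expects the combined output of `systemctl cat archipelago.service` (unit plus drop-ins) as input. It follows systemd's rule that an empty assignment resets a list setting, and it does not handle backslash line continuations. The function names are illustrative.

```python
from typing import Dict, List

REQUIRED_RW_PATHS = ["/var/lib/archipelago", "/run/user", "/tmp",
                     "/etc/containers", "/var/lib/containers", "/run/containers"]
OFF_VALUES = {"", "no", "false", "off", "0"}  # empty assignment = default (off)


def service_settings(unit_text: str) -> Dict[str, List[str]]:
    """Collect every assignment in [Service] sections; keys may repeat."""
    settings: Dict[str, List[str]] = {}
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # systemd treats lines starting with # or ; as comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and "=" in line:
            key, _, value = line.partition("=")
            settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def effective(values: List[str]) -> List[str]:
    """List-type settings: an empty assignment clears everything before it."""
    result: List[str] = []
    for value in values:
        result = [] if value == "" else result + [value]
    return result


def rootless_problems(unit_text: str) -> List[str]:
    """Return human-readable violations of the sandbox table above."""
    s = service_settings(unit_text)
    problems: List[str] = []
    for key in ("ProtectHome", "PrivateTmp"):
        last = s.get(key, [""])[-1].lower()  # unset means the default, which is off
        if last not in OFF_VALUES:
            problems.append(f"{key}= should be no, found {last!r}")
    for key in ("RestrictNamespaces", "SystemCallFilter"):
        if effective(s.get(key, [])):
            problems.append(f"{key}= should not be set")
    env = " ".join(effective(s.get("Environment", [])))
    if "XDG_RUNTIME_DIR=/run/user/1000" not in env:
        problems.append("Environment= lacks XDG_RUNTIME_DIR=/run/user/1000")
    # Strip systemd's optional "-" / "+" path prefixes before comparing.
    rw = {p.lstrip("-+") for v in effective(s.get("ReadWritePaths", [])) for p in v.split()}
    for path in REQUIRED_RW_PATHS:
        if path not in rw:
            problems.append(f"ReadWritePaths= lacks {path}")
    return problems
```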