# Common Podman Failure Patterns

## Rootless Podman-Specific Failures

| Error | Cause | Fix |
|-------|-------|-----|
| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` |
| `Error: unshare: operation not permitted` | Systemd `RestrictNamespaces` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` |
| `Error: could not get runtime: creating runtime` | `XDG_RUNTIME_DIR` not set or `/run/user/1000` missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in the service and run `loginctl enable-linger archipelago` |
| `permission denied` on volume mount | Wrong UID ownership; host paths must be owned by the mapped UID | `sudo chown -R 100000:100000 /var/lib/archipelago/APP` (see the UID mapping table) |
| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless | Run `podman system migrate`, check `/etc/subuid` |
| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (destroys all containers and images; last resort) |
| `Error: stat /tmp/podman-run-1000/...: no such file` | `PrivateTmp=yes` in systemd isolates `/tmp` | Set `PrivateTmp=no` in `archipelago.service` |
| Container ports unreachable from LAN | UFW `DEFAULT_FORWARD_POLICY="DROP"` | Change to `"ACCEPT"` in `/etc/default/ufw`, then `sudo ufw reload` |
| `Error: error creating network namespace` | Systemd `SystemCallFilter` blocks clone/unshare | Remove `SystemCallFilter=` from `archipelago.service` |
| Containers lose network after service restart | Podman runtime dir in `/tmp` was cleaned | Ensure `PrivateTmp=no` so `/tmp/podman-run-1000/` persists |
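The subuid/subgid fix in the first row can be checked before restarting anything. The following is a minimal sketch, assuming the service user is `archipelago` as in the table; the helper names `has_subid_range` and `check_subids` are hypothetical and not part of Podman.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if FILE contains a "user:start:count" entry
# for USER, e.g. "archipelago:100000:65536".
has_subid_range() {
    local user="$1" file="$2"
    awk -F: -v u="$user" \
        '$1 == u && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { found = 1 }
         END { exit !found }' "$file"
}

# Report which of the two subordinate ID files still lack an entry.
# The suggested range is the one from the table above.
check_subids() {
    local user="$1" f
    for f in /etc/subuid /etc/subgid; do
        if has_subid_range "$user" "$f"; then
            echo "ok: $f"
        else
            echo "missing: add '$user:100000:65536' to $f"
        fi
    done
}
```

Running `check_subids archipelago` prints one line per file. After editing either file, `podman system migrate` (also listed in the table) is the usual follow-up.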
## Container Won't Start

| Error | Cause | Fix |
|-------|-------|-----|
| `exec format error` | Binary built on wrong arch | Rebuild on the Linux server |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership with mapped UID, add tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME`, then recreate the container |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |
| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` |

## Container Starts But App Unreachable

| Symptom | Check Layer | Fix |
|---------|------------|-----|
| Direct port works, /app/ doesn't | Nginx config | Add `/app/{id}/` location block |
| Neither works | Podman ports | Run `podman port NAME` to verify the mapping exists |
| Port mapped but refused | Container logs | App is crashing internally; check `podman logs NAME` |
| Works sometimes | Resources | Check OOM kills, CPU, disk space |
| 502 Bad Gateway | Nginx→Container | Check for a wrong port in `proxy_pass` or a restarted container |
| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` |

## Container Keeps Dying

| Pattern | Cause | Fix |
|---------|-------|-----|
| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
| Dies after minutes | OOM killed | Increase `--memory` limit |
| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Repeated crash | Fix root cause, don't just restart |
| Exit code 127 | Missing binary in container | Wrong image tag or corrupted image; re-pull |
| Exit code 137 | Killed by SIGKILL (OOM killer or manual kill) | Check `dmesg` for an OOM kill, check `podman inspect` for `OOMKilled` |
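The exit-code rows above follow the usual shell convention that codes above 128 mean "terminated by signal (code - 128)", so 137 is SIGKILL. A small sketch of that triage is below; the function name `classify_exit_code` and its labels are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map a container exit code to the pattern it suggests.
classify_exit_code() {
    local code="$1"
    if [ "$code" -eq 0 ]; then
        echo "stopped cleanly"
    elif [ "$code" -eq 127 ]; then
        echo "command not found in image"
    elif [ "$code" -eq 137 ]; then
        # 128 + 9 (SIGKILL): the OOM killer is the common cause, not the only one.
        echo "killed by SIGKILL (possible OOM)"
    elif [ "$code" -gt 128 ]; then
        echo "terminated by signal $((code - 128))"
    else
        echo "application error"
    fi
}
```

One way to feed it is `classify_exit_code "$(podman inspect --format '{{.State.ExitCode}}' NAME)"`. To tell an OOM kill apart from a manual `kill -9`, `podman inspect --format '{{.State.OOMKilled}}' NAME` reports the flag mentioned in the table.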
## Network Issues

| Problem | Cause | Fix |
|---------|-------|-----|
| Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on same network |
| Bitcoin RPC refused from container | `rpcallowip` set to wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe as long as the RPC port is only mapped, not exposed externally) |
| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on the new archy-net (rootless uses 10.89.x.x) |

## Volume Permission Patterns (Rootless UID Mapping)

Formula: **host_uid = 100000 + container_uid**

| Container UID | Host UID | Apps | Data Directory |
|---|---|---|---|
| 0 (root) | 100000 | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | `/var/lib/archipelago/{app}` |
| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `/var/lib/archipelago/postgres-*` |
| 101 | 100101 | bitcoin-knots | `/var/lib/archipelago/bitcoin` |
| 472 | 100472 | grafana | `/var/lib/archipelago/grafana` |
| 999 | 100999 | MariaDB (mysql-mempool) | `/var/lib/archipelago/mysql-mempool` |
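The formula above is simple enough to script when fixing ownership by hand. This sketch assumes the offset of 100000 stated in this document; the names `SUBUID_START` and `host_uid` are hypothetical.

```bash
#!/usr/bin/env bash

# Assumption: the mapping offset is 100000, per the formula in this document.
SUBUID_START=100000

# Hypothetical helper: translate a container UID into the host UID that
# must own the volume directory.
host_uid() {
    echo $(( SUBUID_START + $1 ))
}
```

For example, the bitcoin-knots row could be applied with `sudo chown -R "$(host_uid 101):$(host_uid 101)" /var/lib/archipelago/bitcoin`. Using the same number for the group is an assumption here, mirroring the `100000:100000` example earlier in the document.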
## Capability Reference

| Capability | Apps That Need It | Failure Mode |
|-----------|------------------|-------------|
| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes |
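Capabilities from the table are passed to Podman with `--cap-add`. The sketch below shows one way to build those flags from a list of names; the helper `cap_add_flags` is hypothetical, and which capabilities a given app needs should still be taken from the table.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print one --cap-add flag per capability name,
# one per line so the result can be read into an array safely.
cap_add_flags() {
    local cap
    for cap in "$@"; do
        printf '%s\n' "--cap-add=$cap"
    done
}
```

In bash the output can be collected with `mapfile -t flags < <(cap_add_flags NET_ADMIN NET_RAW)` and then expanded as `podman run "${flags[@]}" ...`, which avoids relying on word splitting.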
## Read-Only Safe Apps

Only these apps can run with `--read-only` + tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub.

All others need a writable root filesystem or will fail silently.
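Because the failure is silent, it can help to guard the `--read-only` flag behind an allowlist check. This is a minimal sketch using the app list above; the names `READ_ONLY_SAFE` and `is_read_only_safe` are hypothetical.

```bash
#!/usr/bin/env bash

# App names copied from the list above.
READ_ONLY_SAFE="searxng grafana filebrowser electrumx mempool-electrs electrs nostr-rs-relay ollama indeedhub"

# Hypothetical helper: succeeds only if APP is on the read-only-safe list.
# Comparison is exact, so "electrs" and "mempool-electrs" are distinct entries.
is_read_only_safe() {
    local app
    for app in $READ_ONLY_SAFE; do
        if [ "$app" = "$1" ]; then
            return 0
        fi
    done
    return 1
}
```

A launcher could then add `--read-only --tmpfs /tmp` only when `is_read_only_safe "$APP"` succeeds. The choice of `/tmp` as the tmpfs mount point is an assumption; each app may need additional writable paths.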
## Systemd Sandbox Requirements for Rootless Podman

These systemd service settings MUST be configured for rootless Podman to work:

| Setting | Required Value | Why |
|---------|---------------|-----|
| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` |
| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` |
| `RestrictNamespaces=` | NOT SET | Rootless podman creates user namespaces |
| `SystemCallFilter=` | NOT SET | Rootless podman needs clone/unshare syscalls |
| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data + podman runtime paths |
| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location |
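These requirements can be checked mechanically against a unit file. The sketch below is a hypothetical lint, not an existing tool: `check_unit_sandbox` reads a single unit file, ignores drop-in overrides, and does not validate `ReadWritePaths=`.

```bash
#!/usr/bin/env bash

# Hypothetical lint: print every setting in UNIT_FILE that conflicts with
# the table above. Returns 0 if nothing was flagged, 1 otherwise.
check_unit_sandbox() {
    local unit="$1" status=0 key value

    # Settings that must not appear at all.
    for key in RestrictNamespaces SystemCallFilter; do
        if grep -Eq "^[[:space:]]*${key}=" "$unit"; then
            echo "remove: ${key}="
            status=1
        fi
    done

    # Settings that must be off. An absent key is accepted because
    # systemd defaults both to "no".
    for key in ProtectHome PrivateTmp; do
        value=$(awk -F= -v k="$key" \
            '{ gsub(/[ \t]/, "") } $1 == k { v = $2 } END { print v }' "$unit")
        case "$value" in
            ""|no|false|off|0) ;;
            *) echo "set: ${key}=no (found '${value}')"; status=1 ;;
        esac
    done

    # The runtime directory must be exported to the service.
    if ! grep -Eq '^[[:space:]]*Environment=.*XDG_RUNTIME_DIR=' "$unit"; then
        echo "add: Environment=XDG_RUNTIME_DIR=/run/user/1000"
        status=1
    fi

    return "$status"
}
```

A typical call would be `check_unit_sandbox /etc/systemd/system/archipelago.service`; that path is an assumption, so substitute wherever the unit is actually installed.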