Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

90 lines
3.7 KiB
Markdown
---
name: podman
description: Rootless Podman container management (diagnose, fix, and harden uptime). Use for container issues, port problems, UID mapping, health checks, or uptime hardening.
disable-model-invocation: true
allowed-tools: Bash, Read, Edit, Write, Glob, Grep
argument-hint: "[diagnose|fix|uptime] [container-name]"
---

# Podman: Container Management

Archipelago runs rootless Podman as the `archipelago` user (UID 1000), so all `podman` commands run without sudo. UID mapping: container UID N → host UID (100000 + N).

**SSH**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

## Diagnose

```bash
# Container status
podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"

# Restart policies (must be "unless-stopped")
for c in $(podman ps -a --format "{{.Names}}"); do
    echo -n "$c: "; podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done

# Health checks (containers without a health check are skipped)
for c in $(podman ps --format "{{.Names}}"); do
    health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
    [ -n "$health" ] && [ "$health" != "<no value>" ] && echo "$c: $health"
done

# Resource usage + recent deaths
podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# --stream=false makes `podman events` exit after the past events instead of following forever
podman events --stream=false --filter event=died --since 24h 2>/dev/null | tail -10

# Rootless prerequisites
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"          # must be /run/user/1000
grep archipelago /etc/subuid                     # must show archipelago:100000:65536
ls /var/lib/systemd/linger/ | grep archipelago   # must exist
grep DEFAULT_FORWARD_POLICY /etc/default/ufw     # must be ACCEPT
```

Cross-check four layers for port consistency: backend config (`package.rs`) → Podman ports → Nginx proxy → frontend `appLauncher.ts`. See `references/port-map.md`.

## Fix

**Restart policy missing**: `podman update --restart unless-stopped CONTAINER_NAME`

**UID mapping (permission denied)**: `sudo chown -R HOST_UID:HOST_UID /var/lib/archipelago/APP`. Formula: host_uid = 100000 + container_uid. See `references/uid-mapping.md`.

**Port conflict**: run `ss -tlnp | grep :PORT` to find the offending process. Ports cannot be added to a running container; it must be recreated.

**Network missing**: `podman network connect archy-net CONTAINER_NAME`

**UFW blocking LAN**: `sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw && sudo ufw reload`

**Stale processes**: `pgrep -c -f "podman ps"`. If the count is above 10, kill the stuck processes.

See `references/common-failures.md` for the full error→cause→fix lookup table.

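
The UID mapping fix above depends on knowing which host UID a container UID really maps to. As a sketch of how to confirm that on the host instead of relying on the formula alone, the hypothetical helper `host_uid_for` reads user-namespace map rows on stdin (the three-column format of `/proc/self/uid_map`: container start, host start, length) and prints the matching host UID. The function name is an assumption for illustration and is not part of Archipelago.

```bash
# host_uid_for CONTAINER_UID
# Reads uid_map rows ("container_start host_start length") on stdin and
# prints the host UID that CONTAINER_UID maps to. Returns 1 if it is unmapped.
host_uid_for() {
    awk -v uid="$1" '
        BEGIN { uid += 0 }   # force numeric comparison
        # A row covers container UIDs in the range [$1, $1 + $3)
        $1 <= uid && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { exit !found }
    '
}
```

On the host this would be fed from the rootless namespace, for example `podman unshare cat /proc/self/uid_map | host_uid_for 999`. If the map has the common two-row shape (`0 1000 1` and `1 100000 65536`), container UID 999 resolves to 100998, so it is worth comparing the result with the formula before running `chown`.
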
## Uptime Hardening

### Layer 1: Restart policies
```bash
for c in $(podman ps -a --format "{{.Names}}"); do
    policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
    # Set a policy only where none is configured ("no" or empty)
    if [ "$policy" = "no" ] || [ -z "$policy" ]; then
        podman update --restart unless-stopped "$c"
    fi
done
```

### Layer 2: Watchdog timer
Create `/usr/local/bin/archipelago-container-watchdog.sh`, which restarts stopped or unhealthy containers, and run it every 2 minutes from a systemd timer. The script runs as the `archipelago` user with `XDG_RUNTIME_DIR=/run/user/1000`.

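
A minimal sketch of what such a watchdog script could look like is shown here. The function names, the log wording, and the `--run` guard are assumptions made for illustration; only the script path, the user, and `XDG_RUNTIME_DIR` come from the text above. It relies on the `status` and `health` filters of `podman ps`.

```bash
#!/usr/bin/env bash
# Sketch of /usr/local/bin/archipelago-container-watchdog.sh (assumed contents).
export XDG_RUNTIME_DIR=/run/user/1000

# Print the names of containers that are exited, created, or running but unhealthy.
restart_candidates() {
    {
        podman ps -a --filter status=exited --filter status=created --format '{{.Names}}'
        podman ps --filter health=unhealthy --format '{{.Names}}'
    } | sort -u
}

# One watchdog pass: `podman restart` starts stopped containers and restarts running ones.
watchdog_pass() {
    local name
    restart_candidates | while IFS= read -r name; do
        [ -n "$name" ] || continue
        echo "watchdog: restarting $name"
        podman restart "$name" >/dev/null </dev/null ||
            echo "watchdog: failed to restart $name" >&2
    done
}

# Act only when invoked with --run, so sourcing the file has no side effects.
if [ "${1:-}" = "--run" ]; then
    watchdog_pass
fi
```

A systemd timer using `OnUnitActiveSec=2min`, paired with a service that runs the script with `--run` as the `archipelago` user, would give the 2-minute cadence. Note that this sketch also restarts containers that were stopped on purpose, which may or may not be wanted.
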
### Layer 3: Ordered startup
The Bitcoin stack has a dependency chain: bitcoin-knots → electrumx + lnd → mempool + btcpay + fedimint → UI containers. Create `/usr/local/bin/archipelago-ordered-start.sh` with wait-for-container logic between tiers.

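
The wait-for-container logic between tiers could be sketched as follows. The function names, the 120-second default timeout, the 2-second poll interval, and the `--run` guard are assumptions, and the container names are assumed to match the service names in the dependency chain.

```bash
#!/usr/bin/env bash
# Sketch of /usr/local/bin/archipelago-ordered-start.sh (assumed contents).
export XDG_RUNTIME_DIR=/run/user/1000

# Poll until the container reports state "running" or TIMEOUT seconds have passed.
wait_for_running() {
    local name="$1" timeout="${2:-120}" waited=0 state
    while [ "$waited" -lt "$timeout" ]; do
        state=$(podman inspect "$name" --format '{{.State.Status}}' 2>/dev/null)
        [ "$state" = "running" ] && return 0
        sleep 2
        waited=$((waited + 2))
    done
    return 1
}

# Start every container in a tier, then wait for all of them before returning.
start_tier() {
    local c
    for c in "$@"; do
        podman start "$c" >/dev/null || echo "ordered-start: could not start $c" >&2
    done
    for c in "$@"; do
        wait_for_running "$c" || { echo "ordered-start: $c not running" >&2; return 1; }
    done
}

# Tiers follow the dependency chain; a failed tier stops the sequence.
if [ "${1:-}" = "--run" ]; then
    start_tier bitcoin-knots &&
    start_tier electrumx lnd &&
    start_tier mempool btcpay fedimint
    # UI containers would form a final tier; their names are not listed here.
fi
```

A container in state "running" is not necessarily ready to serve its dependents, so a stricter version might wait on the health status instead where a health check is defined.
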
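### Verification
```bash
podman ps --format "{{.Names}}" | sort > ~/containers-before.txt   # snapshot before rebooting
sudo reboot   # then SSH back after 3 min
podman ps --format "{{.Names}}" | sort | diff ~/containers-before.txt -   # no output = matches pre-reboot list
```
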
## Systemd Requirements

The `archipelago.service` unit needs these settings for rootless Podman:
- `ProtectHome=no` (Podman stores its data in `~/.local/share/containers/`)
- `PrivateTmp=no` (the runtime directory is `/tmp/podman-run-1000/`)
- Do not set `RestrictNamespaces=` or `SystemCallFilter=`
- `Environment=XDG_RUNTIME_DIR=/run/user/1000`
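
One way to check a live unit against the list above is to pipe `systemctl show` output through a small filter. The helper name `check_unit_props` and its message wording are hypothetical. It checks only `ProtectHome`, `PrivateTmp`, and the `XDG_RUNTIME_DIR` entry, so `RestrictNamespaces=` and `SystemCallFilter=` still need a manual look.

```bash
# check_unit_props
# Reads "Property=value" lines on stdin, reports settings that differ from the
# rootless Podman requirements, and returns non-zero if any problem is found.
check_unit_props() {
    local line problems=0 has_xdg=0
    while IFS= read -r line; do
        case "$line" in
            ProtectHome=no|PrivateTmp=no) ;;   # required values
            ProtectHome=*|PrivateTmp=*)
                echo "unexpected: $line"
                problems=$((problems + 1)) ;;
            Environment=*XDG_RUNTIME_DIR=/run/user/1000*) has_xdg=1 ;;
        esac
    done
    if [ "$has_xdg" -eq 0 ]; then
        echo "missing: Environment=XDG_RUNTIME_DIR=/run/user/1000"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}
```

Typical use would be `systemctl show archipelago.service -p ProtectHome -p PrivateTmp -p Environment | check_unit_props`, which prints nothing and returns 0 when the three checked settings match.
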