- Correct off-by-one in UID mapping: container UID N → host UID (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected from secrets at deploy time
- container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| name | description | allowed-tools |
|---|---|---|
| podman-fix | Fix Podman container issues on Archipelago — restart failed containers, repair port bindings, fix network connectivity, add missing restart policies, fix rootless UID mapping, and resolve config drift. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). Use when asked to "fix container", "restart app", "fix port mapping", "container not working", "app won't start", "fix podman", "repair container", "container down", "permission denied", or after /podman-doctor identifies issues to fix. | Bash Read Edit Write Glob Grep |
Podman Fix — Container Remediation
Targeted fix workflow for rootless Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
SSH command: ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228
ROOTLESS PODMAN: All podman commands run as the archipelago user, with NO sudo. Only use sudo for: chown on volume directories, UFW changes, systemd service edits, nginx reload. Container UIDs are mapped via subuid: container UID 0 → host UID 1000 (the archipelago user); container UID N ≥ 1 → host UID (100000 + N - 1).
If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.
Fix Procedures
Fix 1: Container Not Running
# Check why it stopped
podman logs --tail 50 CONTAINER_NAME
podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
# If clean exit or crash — just restart
podman start CONTAINER_NAME
# If corrupt state — remove and recreate
podman rm -f CONTAINER_NAME
# Then recreate using the install flow (trigger from UI or re-run creation command)
If container keeps crashing, check logs for the actual error. Common causes:
- Missing config file → check if volume mount has the config
- Wrong permissions → fix UID mapping (see Fix 8 below)
- Dependency not ready → start dependency first, wait, then start this container
- Exit code 127 → missing binary in container image, re-pull the image
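As a rough triage aid for the causes listed above, the exit code from podman inspect can be mapped to a likely cause. This is a minimal sketch: explain_exit_code is a hypothetical helper, and the codes beyond 127 follow the usual shell convention (126 = not executable, 128 + signal number), which an individual app may not respect.

```shell
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "clean exit: a plain restart is usually enough" ;;
    126) echo "command found but not executable: check entrypoint permissions" ;;
    127) echo "command not found: binary missing from image, re-pull the image" ;;
    137) echo "killed by SIGKILL (128+9): possibly out of memory" ;;
    143) echo "terminated by SIGTERM (128+15): stopped from outside" ;;
    *)   echo "application error: read the container logs" ;;
  esac
}
```

Usage: explain_exit_code "$(podman inspect CONTAINER_NAME --format '{{.State.ExitCode}}')"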
Fix 2: Missing Restart Policy
The most common uptime killer. Fix for ALL containers at once:
# Fix a single container
podman update --restart unless-stopped CONTAINER_NAME
# Fix ALL containers that have no restart policy
for c in $(podman ps -a --format "{{.Names}}"); do
    policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
    if [ "$policy" = "no" ] || [ -z "$policy" ]; then
        echo "Fixing restart policy for: $c"
        podman update --restart unless-stopped "$c"
    fi
done
Also update the Rust source so new installs get it right:
- Check core/archipelago/src/api/rpc/package.rs get_app_config() for the app
- Ensure the --restart flag is in the podman run args
Fix 3: Port Mapping Issues
Port conflict (address already in use)
# Find what's using the port
ss -tlnp | grep :PORT_NUMBER
# If it's another container, either change one's port or stop the conflicting one
podman stop CONFLICTING_CONTAINER
# If it's a host process (e.g., system tor vs container tor)
sudo systemctl stop tor # Stop system service if container needs the port
sudo systemctl disable tor
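One caveat with grepping for :PORT_NUMBER is that :80 also matches :8080. The sketch below filters ss output for an exact port match instead. port_listeners is a hypothetical helper; it assumes the usual ss -tlnp column layout (local address in column 4, process info last), and without sudo the process column may be empty for sockets owned by other users.

```shell
# Hypothetical helper: read `ss -tlnp` output on stdin and print only the
# listeners bound to exactly PORT (so port 80 does not match 8080).
port_listeners() {
  local port="$1"
  awk -v port="$port" '
    $1 == "LISTEN" {
      # Column 4 is the local address; the port is its last ":"-separated part.
      n = split($4, parts, ":")
      if (parts[n] == port) print $4, $NF
    }'
}
```

Usage: ss -tlnp | port_listeners 8080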
Port not mapped (container running but port unreachable)
# Check current port mappings
podman port CONTAINER_NAME
# Can't add ports to running container — must recreate
podman stop CONTAINER_NAME
podman rm CONTAINER_NAME
# Recreate with correct -p flags (use the Rust install flow or manual podman run)
Nginx proxy missing or wrong
Read and fix the nginx config:
- HTTP: image-recipe/configs/nginx-archipelago.conf
- HTTPS: image-recipe/configs/snippets/archipelago-https-app-proxies.conf
Add a location block:
location /app/APP_ID/ {
    proxy_pass http://127.0.0.1:HOST_PORT/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    # Hide X-Frame-Options so it works in our iframe
    proxy_hide_header X-Frame-Options;
    proxy_hide_header Content-Security-Policy;
}
After editing nginx config, deploy and reload:
# On server
sudo nginx -t && sudo systemctl reload nginx
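To avoid typos when adding several apps, the location block above can be generated from the app ID and host port. This is a sketch only: nginx_location_block is a hypothetical helper, and it assumes the $connection_upgrade map is already defined elsewhere in the nginx config.

```shell
# Hypothetical helper: print a location block for one app.
# Nginx variables are escaped (\$) so the shell leaves them untouched.
nginx_location_block() {
  local app_id="$1" host_port="$2"
  case "$host_port" in
    ''|*[!0-9]*) echo "host port must be numeric: $host_port" >&2; return 1 ;;
  esac
  cat <<EOF
location /app/${app_id}/ {
    proxy_pass http://127.0.0.1:${host_port}/;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto \$scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade \$http_upgrade;
    proxy_set_header Connection \$connection_upgrade;
    proxy_hide_header X-Frame-Options;
    proxy_hide_header Content-Security-Policy;
}
EOF
}
```

Usage: nginx_location_block APP_ID HOST_PORT prints the block to stdout; paste it into the config, then run the nginx -t check before reloading.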
Frontend routing missing
Edit neode-ui/src/stores/appLauncher.ts:
- Add entry to the PORT_TO_APP_ID map
- If app blocks iframes, add port to the new-tab list in resolveAppIdFromUrl()
Fix 4: Network Issues
Container not on archy-net (can't resolve other containers)
# Connect to archy-net without recreating
podman network connect archy-net CONTAINER_NAME
# Verify
podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
archy-net doesn't exist
podman network create archy-net
# Then reconnect all containers that need it
DNS not working inside container
# Test DNS from inside container
podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
podman exec CONTAINER_NAME ping -c1 bitcoin-knots
# If DNS fails, check the container's resolv.conf
podman exec CONTAINER_NAME cat /etc/resolv.conf
# If DNS fails, recreate container with explicit DNS
# Add --dns 1.1.1.1 to the podman run command
Container subnet changed (rootful → rootless migration)
# Old rootful subnet: 10.88.0.0/16
# New rootless subnet: 10.89.0.0/16
# Bitcoin RPC rpcallowip must be updated if using subnet-specific allowlist
# Check current archy-net subnet
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}"
# If Bitcoin RPC refuses connections from containers:
# Update bitcoin.conf rpcallowip to 0.0.0.0/0 (safe: only accessible via port mapping)
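When a subnet-specific rpcallowip is kept, it helps to confirm that a container's address falls inside the allowed range. The following is a minimal IPv4-only sketch in bash; ip_to_int and ip_in_cidr are hypothetical helpers, not part of Podman or Bitcoin.

```shell
# Hypothetical helpers (bash, IPv4 only).
# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  # 10# forces base 10 so octets with leading zeros are not read as octal.
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Succeed when IP lies inside CIDR (for example 10.89.0.0/16).
ip_in_cidr() {
  local ip="$1" cidr="$2"
  local net="${cidr%/*}" bits="${cidr#*/}"
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

Usage: ip_in_cidr CONTAINER_IP 10.89.0.0/16 || echo "outside the allowed subnet"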
Fix 5: Health Check Issues
Add missing health check to running container
Can't add to running container — must recreate with health check flags:
# Example for a web app
podman run ... \
--health-cmd "curl -f http://localhost:PORT/health || exit 1" \
--health-interval 30s \
--health-timeout 5s \
--health-retries 3 \
--health-start-period 60s \
IMAGE
Fix unhealthy container
# See what the health check is actually running
podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
# Run the health check manually to see the error
podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
# Common fixes:
# - curl not installed in container → use wget or nc instead
# - Wrong port in health check → fix the check command
# - App takes too long to start → increase --health-start-period
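For the "curl not installed" case, a health command can be picked based on which tool the image actually ships. This is a sketch under assumptions: pick_health_tool and health_cmd_for are hypothetical helpers, the container is assumed to have sh, the /health path is only the example path used above, and the nc variant checks just that the port accepts connections.

```shell
# Hypothetical helper: print the first probe tool found inside the container.
pick_health_tool() {
  local c="$1" tool
  for tool in curl wget nc; do
    if podman exec "$c" sh -c "command -v $tool" >/dev/null 2>&1; then
      echo "$tool"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: build a --health-cmd string for the chosen tool.
health_cmd_for() {
  local tool="$1" port="$2" path="${3:-/health}"
  case "$tool" in
    curl) echo "curl -f http://localhost:${port}${path} || exit 1" ;;
    wget) echo "wget -q --spider http://localhost:${port}${path} || exit 1" ;;
    nc)   echo "nc -z localhost ${port} || exit 1" ;;
    *)    echo "unsupported tool: $tool" >&2; return 1 ;;
  esac
}
```

Usage: health_cmd_for "$(pick_health_tool CONTAINER_NAME)" PORT gives the string to pass to --health-cmd when recreating the container.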
Fix 6: Permission/Capability Issues
# Check what capabilities container has
podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
# If missing required caps, must recreate with correct --cap-add flags
# Refer to the capability reference in /podman-doctor references
Fix 7: Full Config Consistency Fix
When port map is inconsistent across layers, fix ALL layers:
- Decide the correct port (usually what's in package.rs)
- Fix Podman: recreate container with correct -p flags
- Fix Nginx: update location block's proxy_pass port
- Fix Frontend: update PORT_TO_APP_ID in appLauncher.ts
- Deploy: ./scripts/deploy-to-target.sh --live
- Verify: curl -I http://192.168.1.228/app/APP_ID/
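Before recreating anything, the port each layer currently uses can be compared. The sketch below covers the nginx side and the comparison itself. nginx_port_for_app and ports_consistent are hypothetical helpers, and the parser assumes the location block is laid out like the one in Fix 3 (one directive per line, proxy_pass pointing at 127.0.0.1:PORT).

```shell
# Hypothetical helper: read an nginx config on stdin and print the
# proxy_pass port configured for /app/APP_ID/.
nginx_port_for_app() {
  local app_id="$1"
  awk -v loc="/app/$app_id/" '
    $1 == "location" && $2 == loc { inside = 1; next }
    inside && $1 == "proxy_pass" {
      port = $2
      sub(/^.*:/, "", port)        # drop everything up to the last colon
      sub(/[^0-9].*$/, "", port)   # drop the trailing "/;"
      print port
      exit
    }
    inside && $1 == "}" { inside = 0 }
  '
}

# Hypothetical helper: succeed only when all three layers agree on the port.
ports_consistent() {
  local podman_port="$1" nginx_port="$2" frontend_port="$3"
  [ "$podman_port" = "$nginx_port" ] && [ "$nginx_port" = "$frontend_port" ]
}
```

Usage: nginx_port_for_app APP_ID < image-recipe/configs/nginx-archipelago.conf, then compare the result with the host port shown by podman port CONTAINER_NAME and the entry in PORT_TO_APP_ID.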
Fix 8: Rootless UID Mapping (Permission Denied on Volumes)
This is the #1 rootless-specific issue. Container UIDs are remapped by user namespaces.
Formula: container UID 0 → host UID 1000 (the archipelago user itself); container UID N ≥ 1 → host UID 100000 + N - 1 (the subuid range starts at container UID 1)
# Fix UID 0 containers (most apps: root inside the container is the archipelago user, UID 1000, on the host)
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME
# Fix Bitcoin (container UID 101 → host UID 100100)
sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin
# Fix PostgreSQL (container UID 70 → host UID 100069)
sudo chown -R 100069:100069 /var/lib/archipelago/postgres-APP_NAME
# Fix Grafana (container UID 472 → host UID 100471)
sudo chown -R 100471:100471 /var/lib/archipelago/grafana
# Fix MariaDB (container UID 999 → host UID 100998)
sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool
How to find the right UID for a new container:
# Check what user the container image runs as
podman inspect IMAGE_NAME --format "{{.Config.User}}"
# If empty = root (UID 0) → host UID 1000 (the archipelago user)
# If number N ≥ 1 → host UID = 100000 + N - 1
# If username → run: podman run --rm IMAGE_NAME id
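The mapping arithmetic is easy to get wrong by one, so it can be wrapped in a small function. This sketch assumes the default rootless mapping with no --userns or --uidmap overrides: container root maps to the invoking user (UID 1000 here), and container UID 1 maps to the first subuid (100000 here). host_uid_for is a hypothetical helper.

```shell
# Hypothetical helper: host UID for a container UID under the default
# rootless mapping. Args: CONTAINER_UID [USER_UID] [SUBUID_START]
host_uid_for() {
  local container_uid="$1"
  local user_uid="${2:-1000}" subuid_start="${3:-100000}"
  if [ "$container_uid" -eq 0 ]; then
    # Container root is the rootless user itself.
    echo "$user_uid"
  else
    # Container UID 1 is the first subuid, hence the -1.
    echo $(( subuid_start + container_uid - 1 ))
  fi
}
```

Usage: host_uid_for 101 prints 100100. As an alternative that avoids the arithmetic, podman unshare chown -R 101:101 /var/lib/archipelago/bitcoin runs chown inside the user namespace, so container-side IDs can be used directly and no sudo is needed, provided the archipelago user can already reach the files.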
After fixing ownership, restart the container:
podman restart CONTAINER_NAME
Fix 9: UFW Forward Policy (LAN Access Broken)
If containers work locally but not from other machines on the network:
# Check current policy
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
# Fix: change DROP to ACCEPT
sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
sudo ufw reload
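To keep the fix idempotent, the current policy can be read first and the sed edit applied only when it is actually DROP. A minimal sketch; forward_policy is a hypothetical helper that only reads the file.

```shell
# Hypothetical helper: print the current forward policy (DROP, ACCEPT, REJECT).
# Arg: path to the ufw defaults file (defaults to /etc/default/ufw).
forward_policy() {
  local file="${1:-/etc/default/ufw}"
  sed -n 's/^DEFAULT_FORWARD_POLICY="\(.*\)"$/\1/p' "$file"
}
```

Usage: [ "$(forward_policy)" = "DROP" ] && echo "forward policy needs fixing"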
Fix 10: Systemd Sandbox Too Restrictive
If the Rust backend can't scan/manage containers after a systemd update:
# Check what's blocked
sudo journalctl -u archipelago --since "10 min ago" | grep -i "denied\|permission\|namespace\|syscall"
# The archipelago.service MUST have these for rootless podman:
# ProtectHome=no
# PrivateTmp=no (or disabled)
# RestrictNamespaces= (NOT SET — don't restrict)
# SystemCallFilter= (NOT SET — don't filter)
# ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
# Environment=XDG_RUNTIME_DIR=/run/user/1000
Edit the service file:
sudo systemctl edit archipelago.service
# Add overrides, then:
sudo systemctl daemon-reload
sudo systemctl restart archipelago
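For reference, the settings listed above could be written as a drop-in like the one below. Treat it as a sketch: write_archipelago_override is a hypothetical helper, the empty RestrictNamespaces= and SystemCallFilter= assignments are assumed to reset any restriction set in the main unit, and the values should be checked against the real archipelago.service before use.

```shell
# Hypothetical helper: write the override into DIR as override.conf.
# systemctl edit stores its drop-in at
# /etc/systemd/system/archipelago.service.d/override.conf.
write_archipelago_override() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/override.conf" <<'EOF'
[Service]
ProtectHome=no
PrivateTmp=no
RestrictNamespaces=
SystemCallFilter=
ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
Environment=XDG_RUNTIME_DIR=/run/user/1000
EOF
}
```

Usage: write_archipelago_override /tmp/archipelago-override to preview the file; copy it into place with sudo only after review, then run the daemon-reload and restart.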
Fix 11: Stale Podman Processes
If podman ps hangs or is very slow:
# Kill stuck podman processes (>10 of them = something is wrong)
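# pgrep/pkill take an extended regex, so alternation is (a|b), not a\|b.
# pgrep -c already prints 0 when nothing matches, so no "|| echo 0" fallback.
stuck=$(pgrep -c -f "podman (ps|stats)" 2>/dev/null)
stuck=${stuck:-0}
if [ "$stuck" -gt 10 ]; then
    pkill -f "podman (ps|stats)"
    echo "Killed $stuck stuck podman processes"
fi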
# Kill orphaned conmon processes holding ports
for pid in $(pgrep conmon); do
    container=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | grep -oP '(?<=--cid )\S+')
    if [ -n "$container" ] && ! podman ps -a --format "{{.ID}}" | grep -q "${container:0:12}"; then
        kill "$pid" 2>/dev/null && echo "Killed orphan conmon $pid"
    fi
done
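Running podman ps directly to test for a hang can itself block the session. One option, sketched here, is to wrap the probe in coreutils timeout. responds_within is a hypothetical helper, and the 10-second limit in the usage line is an arbitrary choice.

```shell
# Hypothetical helper: run a command with a time limit and report only
# whether it finished successfully in time.
# Args: SECONDS COMMAND [ARGS...]
responds_within() {
  local secs="$1"
  shift
  # timeout exits with status 124 when the limit is hit.
  timeout "$secs" "$@" >/dev/null 2>&1
}
```

Usage: responds_within 10 podman ps || echo "podman ps is hanging or failing"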
After Fixing
Always verify the fix:
# Container running?
podman ps --filter name=CONTAINER_NAME
# Port reachable?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
# Via nginx proxy?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/
# Health check passing?
podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
# Volume permissions correct? (rootless check)
podman exec CONTAINER_NAME ls -la /data/ 2>/dev/null || echo "Check container data path"
Run /podman-doctor again to confirm all issues are resolved.