lfg2025/archy

Dorian 5008cb6d1f fix: rootless UID mapping corrections + credential injection

- Correct off-by-one in UID mapping: container UID N → host UID
  (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected
  from secrets at deploy time
- container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-18 15:57:16 +00:00

11 KiB



name	description	allowed-tools
podman-fix	Fix Podman container issues on Archipelago — restart failed containers, repair port bindings, fix network connectivity, add missing restart policies, fix rootless UID mapping, and resolve config drift. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). Use when asked to "fix container", "restart app", "fix port mapping", "container not working", "app won't start", "fix podman", "repair container", "container down", "permission denied", or after /podman-doctor identifies issues to fix.	Bash Read Edit Write Glob Grep

Podman Fix — Container Remediation

Targeted fix workflow for rootless Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.

SSH command: ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228

ROOTLESS PODMAN: All podman commands run as the archipelago user, with NO sudo. Only use sudo for: chown on volume directories, UFW changes, systemd service edits, nginx reload. Container UIDs are mapped via subuid: container UID 0 → host UID 1000 (the archipelago user itself); container UID N (N ≥ 1) → host UID (100000 + N - 1).

If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.

Fix Procedures

Fix 1: Container Not Running

# Check why it stopped
podman logs --tail 50 CONTAINER_NAME
podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"

# If clean exit or crash — just restart
podman start CONTAINER_NAME

# If corrupt state — remove and recreate
podman rm -f CONTAINER_NAME
# Then recreate using the install flow (trigger from UI or re-run creation command)

If container keeps crashing, check logs for the actual error. Common causes:

Missing config file → check if volume mount has the config
Wrong permissions → fix UID mapping (see Fix 8 below)
Dependency not ready → start dependency first, wait, then start this container
Exit code 127 → missing binary in container image, re-pull the image
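
The triage above can be sketched as a small lookup helper. Only the 127 entry comes from this guide; the other entries are general shell and signal conventions (126 for a non-executable command, 128 + signal number for a kill), and the function name is illustrative rather than part of any Archipelago tooling.

```shell
# Map a container exit code to a likely cause and a next step.
# Hypothetical helper: only the 127 case is taken from this guide.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit: restart with podman start" ;;
    126) echo "command found but not executable: check entrypoint permissions" ;;
    127) echo "missing binary in image: re-pull the image" ;;
    137) echo "killed by SIGKILL (128+9): possible OOM kill or forced stop" ;;
    143) echo "terminated by SIGTERM (128+15): normal stop, safe to restart" ;;
    *)   echo "application error: read podman logs --tail 50" ;;
  esac
}

# Usage on the server (CONTAINER_NAME is a placeholder):
#   explain_exit_code "$(podman inspect CONTAINER_NAME --format '{{.State.ExitCode}}')"
explain_exit_code 127
```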

Fix 2: Missing Restart Policy

A missing restart policy is the most common uptime killer. Fix a single container, or ALL containers at once:

# Fix a single container
podman update --restart unless-stopped CONTAINER_NAME

# Fix ALL containers that have no restart policy
for c in $(podman ps -a --format "{{.Names}}"); do
  policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
  if [ "$policy" = "no" ] || [ -z "$policy" ]; then
    echo "Fixing restart policy for: $c"
    podman update --restart unless-stopped "$c"
  fi
done

Also update the Rust source so new installs get it right:

Check core/archipelago/src/api/rpc/package.rs get_app_config() for the app
Ensure --restart flag is in the podman run args

Fix 3: Port Mapping Issues

Port conflict (address already in use)

# Find what's using the port
ss -tlnp | grep :PORT_NUMBER

# If it's another container, either change one's port or stop the conflicting one
podman stop CONFLICTING_CONTAINER

# If it's a host process (e.g., system tor vs container tor)
sudo systemctl stop tor  # Stop system service if container needs the port
sudo systemctl disable tor
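
One caveat with a plain grep is that searching for :80 also matches :8080. A minimal sketch of an exact-port filter follows, assuming the usual `ss -tlnp` column layout (state, queues, local address, peer address, process); the function name and the sample lines with their PIDs are made up for illustration.

```shell
# listener_on_port PORT: read `ss -tlnp` output on stdin and print the
# process column for sockets listening on exactly PORT (no prefix matches).
listener_on_port() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")        # column 4 is the local address; the port is the last ":" field
      if (parts[n] == port) print $NF  # last column is users:(("name",pid=...,fd=...)) when -p can see it
    }'
}

# Usage on the server: ss -tlnp | listener_on_port 8080
# Demo with canned sample lines (hypothetical PIDs):
printf '%s\n' \
  'LISTEN 0 4096 0.0.0.0:8080 0.0.0.0:* users:(("conmon",pid=1234,fd=5))' \
  'LISTEN 0 4096 0.0.0.0:80 0.0.0.0:* users:(("nginx",pid=900,fd=6))' |
  listener_on_port 80
```

Without sudo, `ss -p` may omit the process column for sockets owned by other users, in which case the last field printed is the peer address instead.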

Port not mapped (container running but port unreachable)

# Check current port mappings
podman port CONTAINER_NAME

# Can't add ports to running container — must recreate
podman stop CONTAINER_NAME
podman rm CONTAINER_NAME
# Recreate with correct -p flags (use the Rust install flow or manual podman run)

Nginx proxy missing or wrong

Read and fix the nginx config:

HTTP: image-recipe/configs/nginx-archipelago.conf
HTTPS: image-recipe/configs/snippets/archipelago-https-app-proxies.conf

Add a location block:

location /app/APP_ID/ {
    proxy_pass http://127.0.0.1:HOST_PORT/;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    # Hide X-Frame-Options so it works in our iframe
    proxy_hide_header X-Frame-Options;
    proxy_hide_header Content-Security-Policy;
}
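
To avoid typos when adding several apps, the block above can be generated from the app ID and host port. This is only a sketch: the function name is hypothetical, and the example port 4080 is not taken from the repository. Note that `$connection_upgrade` is not a built-in nginx variable, so this assumes the main config already defines it with a `map $http_upgrade $connection_upgrade` block.

```shell
# nginx_app_location APP_ID HOST_PORT: print a location block in the shape
# shown above. "\$" keeps nginx variables literal inside the unquoted heredoc.
nginx_app_location() {
  app_id=$1
  host_port=$2
  cat <<EOF
location /app/${app_id}/ {
    proxy_pass http://127.0.0.1:${host_port}/;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto \$scheme;
    proxy_http_version 1.1;
    proxy_set_header Upgrade \$http_upgrade;
    proxy_set_header Connection \$connection_upgrade;
    # Hide X-Frame-Options so it works in our iframe
    proxy_hide_header X-Frame-Options;
    proxy_hide_header Content-Security-Policy;
}
EOF
}

# Example with a hypothetical app ID and port:
nginx_app_location mempool 4080
```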

After editing nginx config, deploy and reload:

# On server
sudo nginx -t && sudo systemctl reload nginx

Frontend routing missing

Edit neode-ui/src/stores/appLauncher.ts:

Add entry to PORT_TO_APP_ID map
If app blocks iframes, add port to the new-tab list in resolveAppIdFromUrl()

Fix 4: Network Issues

Container not on archy-net (can't resolve other containers)

# Connect to archy-net without recreating
podman network connect archy-net CONTAINER_NAME

# Verify
podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"

archy-net doesn't exist

podman network create archy-net
# Then reconnect all containers that need it

DNS not working inside container

# Test DNS from inside container
podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
podman exec CONTAINER_NAME ping -c1 bitcoin-knots

# If DNS fails, check the container's resolv.conf
podman exec CONTAINER_NAME cat /etc/resolv.conf

# If resolv.conf has no usable nameserver, recreate the container with explicit DNS
# Add --dns 1.1.1.1 to the podman run command

Container subnet changed (rootful → rootless migration)

# Old rootful subnet: 10.88.0.0/16
# New rootless subnet: 10.89.0.0/16
# Bitcoin RPC rpcallowip must be updated if using subnet-specific allowlist

# Check current archy-net subnet
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}"

# If Bitcoin RPC refuses connections from containers:
# Update bitcoin.conf rpcallowip to the subnet printed above, or to 0.0.0.0/0
# (acceptable only while the RPC port is reachable solely through the port mapping)

Fix 5: Health Check Issues

Add missing health check to running container

Can't add to running container — must recreate with health check flags:

# Example for a web app
podman run ... \
  --health-cmd "curl -f http://localhost:PORT/health || exit 1" \
  --health-interval 30s \
  --health-timeout 5s \
  --health-retries 3 \
  --health-start-period 60s \
  IMAGE

Fix unhealthy container

# See what the health check is actually running
podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"

# Run the health check manually to see the error
podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND

# Common fixes:
# - curl not installed in container → use wget or nc instead
# - Wrong port in health check → fix the check command
# - App takes too long to start → increase --health-start-period
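
For the "curl not installed" case, a sketch of equivalent check commands per tool may help. The /health path mirrors the example above; the function name is hypothetical, and whether `wget --spider` or `nc -z` is available depends on the image (BusyBox builds vary), so treat the alternatives as assumptions to verify inside the container.

```shell
# health_cmd TOOL PORT: print a --health-cmd string for the given tool.
# Hypothetical helper; confirm the tool exists in the image before using it.
health_cmd() {
  tool=$1
  port=$2
  case "$tool" in
    curl) echo "curl -f http://localhost:${port}/health || exit 1" ;;
    wget) echo "wget -q --spider http://localhost:${port}/health || exit 1" ;;
    nc)   echo "nc -z localhost ${port} || exit 1" ;;   # TCP connect only, no HTTP check
    *)    echo "unknown tool: $tool" >&2; return 1 ;;
  esac
}

# Check which tool the image has, then build the flag value, e.g.:
#   podman exec CONTAINER_NAME sh -c 'command -v curl wget nc'
health_cmd wget 3000
```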

Fix 6: Permission/Capability Issues

# Check what capabilities container has
podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"

# If missing required caps, must recreate with correct --cap-add flags
# Refer to the capability reference in /podman-doctor references

Fix 7: Full Config Consistency Fix

When port map is inconsistent across layers, fix ALL layers:

Decide the correct port (usually what's in package.rs)
Fix Podman: recreate container with correct -p flags
Fix Nginx: update location block's proxy_pass port
Fix Frontend: update PORT_TO_APP_ID in appLauncher.ts
Deploy: ./scripts/deploy-to-target.sh --live
Verify: curl -I http://192.168.1.228/app/APP_ID/
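
Before redeploying, the per-layer comparison can be sketched as a helper that takes the expected port and the port found in each layer. How each value is extracted (from `podman port`, the nginx config, and appLauncher.ts) is left to the operator; the function name and argument order are illustrative.

```shell
# check_port_layers EXPECTED PODMAN_PORT NGINX_PORT FRONTEND_PORT
# Prints one line per layer that disagrees with EXPECTED; returns 1 on any mismatch.
check_port_layers() {
  expected=$1
  podman_port=$2
  nginx_port=$3
  frontend_port=$4
  status=0
  for layer in podman nginx frontend; do
    case "$layer" in
      podman)   actual=$podman_port ;;
      nginx)    actual=$nginx_port ;;
      frontend) actual=$frontend_port ;;
    esac
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH $layer: has $actual, expected $expected"
      status=1
    fi
  done
  return "$status"
}

# Example with made-up ports: nginx still points at the old port.
check_port_layers 8080 8080 8081 8080 || echo "fix the layers listed above"
```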

Fix 8: Rootless UID Mapping (Permission Denied on Volumes)

This is the #1 rootless-specific issue. Container UIDs are remapped by user namespaces.

Formula: host_uid = 100000 + container_uid - 1 (for container UID ≥ 1). Container UID 0 maps to the archipelago user itself (host UID 1000).

# Fix UID 0 containers (most apps: root inside the container is the archipelago user, UID 1000, on the host)
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME

# Fix Bitcoin (container UID 101 → host UID 100100)
sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin

# Fix PostgreSQL (container UID 70 → host UID 100069)
sudo chown -R 100069:100069 /var/lib/archipelago/postgres-APP_NAME

# Fix Grafana (container UID 472 → host UID 100471)
sudo chown -R 100471:100471 /var/lib/archipelago/grafana

# Fix MariaDB (container UID 999 → host UID 100998)
sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool

How to find the right UID for a new container:

# Check what user the container image runs as
podman inspect IMAGE_NAME --format "{{.Config.User}}"
# If empty = root (UID 0) → host UID 1000 (the archipelago user)
# If number → host UID = 100000 + that number - 1
# If username → run: podman run --rm IMAGE_NAME id
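
The arithmetic can be sketched as a small helper. It assumes the default rootless layout described in this skill (user archipelago at UID 1000, subuid range starting at 100000) and follows the corrected rule from the commit message: container UID N ≥ 1 lands on host UID 100000 + N - 1, while container root is the archipelago user itself. The function and variable names are illustrative.

```shell
SUBUID_START=100000   # first host UID of the subuid range 100000:65536
HOST_USER_UID=1000    # the archipelago user that runs podman

# host_uid CONTAINER_UID: print the host UID that owns the container's files.
host_uid() {
  if [ "$1" -eq 0 ]; then
    echo "$HOST_USER_UID"                # container root = the invoking user
  else
    echo $((SUBUID_START + $1 - 1))      # container UID 1 is the first subuid
  fi
}

host_uid 0     # prints 1000
host_uid 101   # prints 100100 (Bitcoin)
host_uid 999   # prints 100998 (MariaDB)

# To confirm the live mapping on the server, inspect the user namespace:
#   podman unshare cat /proc/self/uid_map
```

If the output of `podman unshare cat /proc/self/uid_map` shows a different layout (for example with a custom --uidmap or --userns setting), trust that output over the formula.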

After fixing ownership, restart the container:

podman restart CONTAINER_NAME

Fix 9: UFW Forward Policy (LAN Access Broken)

If containers work locally but not from other machines on the network:

# Check current policy
grep DEFAULT_FORWARD_POLICY /etc/default/ufw

# Fix: change DROP to ACCEPT
sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
sudo ufw reload
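
A small sketch for reading the current policy as a value, so the sed fix only runs when it is actually needed. The function name is hypothetical, and the demo uses a temporary file rather than the real /etc/default/ufw.

```shell
# forward_policy [FILE]: print the value of DEFAULT_FORWARD_POLICY
# (defaults to /etc/default/ufw when no file is given).
forward_policy() {
  sed -n 's/^DEFAULT_FORWARD_POLICY="\(.*\)"$/\1/p' "${1:-/etc/default/ufw}"
}

# Demo against a throwaway copy of the setting:
tmp=$(mktemp)
printf 'DEFAULT_FORWARD_POLICY="DROP"\n' > "$tmp"
if [ "$(forward_policy "$tmp")" = "DROP" ]; then
  echo "forwarding is blocked: apply the sed fix, then run ufw reload"
fi
rm -f "$tmp"
```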

Fix 10: Systemd Sandbox Too Restrictive

If the Rust backend can't scan/manage containers after a systemd update:

# Check what's blocked
sudo journalctl -u archipelago --since "10 min ago" | grep -i "denied\|permission\|namespace\|syscall"

# The archipelago.service MUST have these for rootless podman:
# ProtectHome=no
# PrivateTmp=no (or disabled)
# RestrictNamespaces= (NOT SET — don't restrict)
# SystemCallFilter= (NOT SET — don't filter)
# ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
# Environment=XDG_RUNTIME_DIR=/run/user/1000

Edit the service file:

sudo systemctl edit archipelago.service
# Add overrides, then:
sudo systemctl daemon-reload
sudo systemctl restart archipelago
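
As a sketch of what the override could contain, the helper below writes the settings listed above into a drop-in file. It assumes the standard drop-in location used by `systemctl edit` (/etc/systemd/system/archipelago.service.d/override.conf) and relies on systemd's rule that an empty assignment resets RestrictNamespaces= and SystemCallFilter=. The function name is hypothetical, and the demo writes to a temporary directory instead of /etc.

```shell
# write_podman_override DIR: write override.conf with the rootless-podman
# settings from this guide into DIR.
write_podman_override() {
  dir=$1
  mkdir -p "$dir"
  cat > "$dir/override.conf" <<'EOF'
[Service]
ProtectHome=no
PrivateTmp=no
RestrictNamespaces=
SystemCallFilter=
ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
Environment=XDG_RUNTIME_DIR=/run/user/1000
EOF
}

# Demo: write to a scratch directory and show the result.
scratch=$(mktemp -d)
write_podman_override "$scratch"
cat "$scratch/override.conf"
rm -rf "$scratch"
```

On the server the same content would go into the drop-in directory with sudo, followed by the daemon-reload and restart shown above.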

Fix 11: Stale Podman Processes

If podman ps hangs or is very slow:

# Kill stuck podman processes (>10 of them = something is wrong)
# pgrep/pkill take extended regexes, so alternation is (a|b); pgrep -c prints 0 itself when nothing matches
stuck=$(pgrep -c -f "podman (ps|stats)" 2>/dev/null || true)
if [ "${stuck:-0}" -gt 10 ]; then
  pkill -f "podman (ps|stats)"
  echo "Killed $stuck stuck podman processes"
fi

# Kill orphaned conmon processes holding ports
for pid in $(pgrep conmon); do
  container=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | grep -oP '(?<=--cid )\S+')
  if [ -n "$container" ] && ! podman ps -a --format "{{.ID}}" | grep -q "${container:0:12}"; then
    kill "$pid" 2>/dev/null && echo "Killed orphan conmon $pid"
  fi
done

After Fixing

Always verify the fix:

# Container running?
podman ps --filter name=CONTAINER_NAME

# Port reachable?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/

# Via nginx proxy?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/

# Health check passing?
podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"

# Volume permissions correct? (rootless check)
podman exec CONTAINER_NAME ls -la /data/ 2>/dev/null || echo "Check container data path"
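
The status codes printed by the two curl checks can be read with a rough guide like the one below. The interpretations are general HTTP and nginx conventions rather than anything specific to Archipelago (curl prints 000 when no connection could be made), and the function name is illustrative.

```shell
# classify_http_code CODE: suggest where to look based on curl's %{http_code}.
classify_http_code() {
  case "$1" in
    000)         echo "connection failed: nothing listening on that address" ;;
    2??|3??)     echo "ok" ;;
    404)         echo "no matching route: check the nginx location block" ;;
    502|503|504) echo "proxy up, upstream unreachable: check container and port mapping" ;;
    *)           echo "unexpected status $1: check container logs" ;;
  esac
}

# Usage on the server (APP_ID is a placeholder):
#   classify_http_code "$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1/app/APP_ID/)"
classify_http_code 502
```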

Run /podman-doctor again to confirm all issues are resolved.
