fix: rootless UID mapping corrections + credential injection

- Correct off-by-one in UID mapping: container UID N → host UID
  (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected
  from secrets at deploy time
- container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dorian 2026-03-18 15:57:16 +00:00
parent bf0cd342ca
commit 5008cb6d1f
8 changed files with 2167 additions and 129 deletions


@@ -5,15 +5,46 @@ globs:
 - "**/*podman*"
 - "**/Containerfile"
 - "**/Dockerfile"
+- "**/first-boot*"
+- "**/container-doctor*"
 ---
-# Container Security Rules (Archipelago)
-- `readonly_root: true` always — containers must not write to their root filesystem
+# Container Security Rules (Archipelago — Rootless Podman)
+## Rootless Podman Architecture
+- Podman runs as `archipelago` user (UID 1000), NOT root — never use `sudo podman`
+- UID namespace mapping via subuid: container UID N → host UID (100000 + N)
+- Container images stored in `~/.local/share/containers/storage/` (NOT /var/lib/containers)
+- Container subnet: `10.89.0.0/16` (rootless), not `10.88.0.0/16` (rootful)
+- XDG_RUNTIME_DIR must be `/run/user/1000` — required for podman socket
+- `loginctl enable-linger archipelago` required for containers to survive logout
+## Container Security (Non-Negotiable)
 - Drop ALL capabilities, add only what's required (`--cap-drop=ALL --cap-add=...`)
-- Run as non-root user (UID > 1000): `--user 1001:1001`
-- Set `--security-opt=no-new-privileges:true`
-- Pin image versions by SHA256 digest, never use `:latest` tag
+- Set `--security-opt=no-new-privileges:true` on all containers
+- Use `--read-only` + tmpfs where possible (safe apps: searxng, grafana, filebrowser, electrumx, nostr-rs-relay, ollama, indeedhub)
+- Pin image versions never use `:latest` tag
 - Mount secrets as read-only files, never pass as environment variables when possible
 - Set memory and CPU limits on all containers
-- Use `--network=none` unless network access is required
+- All containers must have `--restart unless-stopped`
+## Volume Ownership (Critical for Rootless)
+- Volume directories must be owned by the MAPPED UID, not the container UID
+- Formula: `host_uid = 100000 + container_uid`
+- UID 0 (most apps) → `sudo chown -R 100000:100000 /var/lib/archipelago/{app}`
+- UID 101 (bitcoin) → `sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin`
+- UID 70 (postgres) → `sudo chown -R 100070:100070 /var/lib/archipelago/postgres-*`
+- UID 472 (grafana) → `sudo chown -R 100472:100472 /var/lib/archipelago/grafana`
+- UID 999 (mariadb) → `sudo chown -R 100999:100999 /var/lib/archipelago/mysql-*`
+## Systemd Service Requirements
+- `ProtectHome=no` — podman needs `~/.local/share/containers/`
+- `PrivateTmp=no` — podman runtime uses `/tmp/podman-run-1000/`
+- `RestrictNamespaces=` must NOT be set — rootless podman creates user namespaces
+- `SystemCallFilter=` must NOT be set — rootless podman needs clone/unshare
+- UFW `DEFAULT_FORWARD_POLICY="ACCEPT"` — required for LAN access to container ports
+## Network Rules
+- Apps needing inter-container DNS: use `--network=archy-net` (bitcoin, lnd, electrumx, mempool, btcpay, fedimint)
+- Standalone apps: default bridge network
+- Tailscale only: `--network=host` + `NET_ADMIN` + `NET_RAW` + `/dev/net/tun`


@@ -4,6 +4,7 @@ description: >
 Comprehensive Podman container diagnostic for Archipelago. Audits all running containers,
 port mappings, network connectivity, health status, restart policies, and config consistency
 across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing).
+Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
 Use when asked to "diagnose containers", "check podman", "why is app not working",
 "container health check", "port not reachable", "audit containers", "podman status",
 or when any container/app is misbehaving.
@@ -12,46 +13,123 @@ allowed-tools: Bash Read Glob Grep
 # Podman Doctor — Container Infrastructure Diagnostics
-Systematic diagnostic for Archipelago's Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, and config drift across all layers.
+Systematic diagnostic for Archipelago's **rootless Podman** container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, UID mapping issues, and config drift across all layers.
 **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
+> **ROOTLESS PODMAN**: Archipelago runs Podman as the `archipelago` user (UID 1000), NOT root.
+> Never use `sudo podman` — use plain `podman` after SSH'ing in as the `archipelago` user.
+> Container UIDs are mapped via subuid: container UID N → host UID (100000 + N).
 If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise run full audit.
 ## Workflow
 ### Step 1: Gather Runtime State
-Run these on the server:
+Run these on the server (as `archipelago` user — NO sudo):
 ```bash
 # All containers with status, ports, networks
-sudo podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"
+podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"
 # Check for port conflicts on known ports
-sudo ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
+ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
 ```
-### Step 2: Check Restart Policies
+### Step 2: Rootless Podman Health Check
+Rootless Podman has specific requirements that must be verified:
+```bash
+# Verify running as archipelago user (NOT root)
+whoami # Must be "archipelago"
+id # Must show uid=1000(archipelago)
+# Check XDG_RUNTIME_DIR is set (required for rootless podman socket)
+echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR" # Must be /run/user/1000
+# Verify subuid/subgid mapping exists
+grep archipelago /etc/subuid # Must show: archipelago:100000:65536
+grep archipelago /etc/subgid # Must show: archipelago:100000:65536
+# Verify user lingering is enabled (keeps user services after logout)
+ls /var/lib/systemd/linger/ | grep archipelago # Must exist
+# Check podman storage is accessible
+podman info --format "{{.Store.GraphRoot}}" # ~/.local/share/containers/storage
+ls -la ~/.local/share/containers/storage/ 2>/dev/null || echo "ERROR: Storage not accessible"
+# Check podman socket
+ls -la /run/user/1000/podman/ 2>/dev/null || echo "WARNING: No podman socket directory"
+```
+### Step 3: Check Restart Policies
 Every container MUST have `--restart unless-stopped`. This is the #1 cause of downtime after reboots.
 ```bash
-for c in $(sudo podman ps -a --format "{{.Names}}"); do
+for c in $(podman ps -a --format "{{.Names}}"); do
   echo -n "$c: "
-  sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
+  podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
 done
 ```
 **Red flag**: `no` or empty = container won't survive reboot.
-### Step 3: Verify Port Mapping Consistency
+### Step 4: Volume Ownership Audit (Rootless UID Mapping)
+Rootless Podman maps container UIDs via subuid. Volume directories must be owned by the MAPPED UID, not the container UID. Formula: `host_uid = 100000 + container_uid`
+```bash
+echo "=== Volume Ownership Check ==="
+# Default containers (run as root inside = UID 0 → host UID 100000)
+for dir in lnd fedimint homeassistant jellyfin vaultwarden photoprism ollama filebrowser electrumx btcpay immich; do
+  if [ -d "/var/lib/archipelago/$dir" ]; then
+    owner=$(stat -c '%u:%g' "/var/lib/archipelago/$dir" 2>/dev/null)
+    if [ "$owner" != "100000:100000" ]; then
+      echo "WRONG: /var/lib/archipelago/$dir owned by $owner (should be 100000:100000)"
+    else
+      echo " OK: $dir → $owner"
+    fi
+  fi
+done
+# Bitcoin Knots (container UID 101 → host UID 100101)
+if [ -d "/var/lib/archipelago/bitcoin" ]; then
+  owner=$(stat -c '%u:%g' "/var/lib/archipelago/bitcoin")
+  [ "$owner" != "100101:100101" ] && echo "WRONG: bitcoin owned by $owner (should be 100101:100101)" || echo " OK: bitcoin → $owner"
+fi
+# PostgreSQL (container UID 70 → host UID 100070)
+for dir in /var/lib/archipelago/*-db /var/lib/archipelago/postgres-*; do
+  if [ -d "$dir" ]; then
+    owner=$(stat -c '%u:%g' "$dir")
+    [ "$owner" != "100070:100070" ] && echo "WRONG: $dir owned by $owner (should be 100070:100070)" || echo " OK: $(basename $dir) → $owner"
+  fi
+done
+# Grafana (container UID 472 → host UID 100472)
+if [ -d "/var/lib/archipelago/grafana" ]; then
+  owner=$(stat -c '%u:%g' "/var/lib/archipelago/grafana")
+  [ "$owner" != "100472:100472" ] && echo "WRONG: grafana owned by $owner (should be 100472:100472)" || echo " OK: grafana → $owner"
+fi
+# MariaDB/MySQL (container UID 999 → host UID 100999)
+if [ -d "/var/lib/archipelago/mysql-mempool" ]; then
+  owner=$(stat -c '%u:%g' "/var/lib/archipelago/mysql-mempool")
+  [ "$owner" != "100999:100999" ] && echo "WRONG: mysql-mempool owned by $owner (should be 100999:100999)" || echo " OK: mysql-mempool → $owner"
+fi
+```
+### Step 5: Verify Port Mapping Consistency
 Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs:
 **Layer 1 — Backend Config (Rust)**: Read `core/archipelago/src/api/rpc/package.rs`, look at `get_app_config()` port mappings.
-**Layer 2 — Podman Runtime**: `sudo podman ps --format "{{.Names}}: {{.Ports}}"`
+**Layer 2 — Podman Runtime**: `podman ps --format "{{.Names}}: {{.Ports}}"`
 **Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks:
 - `image-recipe/configs/nginx-archipelago.conf` (HTTP)
@@ -66,77 +144,114 @@ Cross-reference these 4 layers — mismatches between ANY two cause "app not loa
 | Works on port but not /app/ path | Missing nginx location block |
 | Frontend can't find app | PORT_TO_APP_ID missing in appLauncher.ts |
-### Step 4: Network Connectivity Audit
+### Step 6: Network Connectivity Audit
 ```bash
 # Networks and their containers
-sudo podman network ls
-sudo podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"
+podman network ls
+podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"
+# Check container subnet (rootless uses 10.89.x.x, NOT 10.88.x.x)
+podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" 2>/dev/null
 ```
-**Must be on archy-net**: bitcoin-knots, lnd, electrs, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui
+**Must be on archy-net**: bitcoin-knots, lnd, electrs/electrumx, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui
 **Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
-### Step 5: Health Check Status
+### Step 7: UFW Forward Policy Check
+Rootless Podman requires `DEFAULT_FORWARD_POLICY="ACCEPT"` in UFW, otherwise container ports are unreachable from LAN.
+```bash
+grep DEFAULT_FORWARD_POLICY /etc/default/ufw
+# Must be "ACCEPT", NOT "DROP"
+# If DROP: containers work locally but NOT from other machines on the network
+```
+### Step 8: Systemd Service Sandbox Check
+The `archipelago.service` must have specific settings relaxed for rootless Podman:
+```bash
+# Check critical settings
+systemctl cat archipelago.service | grep -E "ProtectHome|PrivateTmp|RestrictNamespaces|ReadWritePaths|XDG_RUNTIME_DIR"
+```
+**Required settings for rootless Podman**:
+- `ProtectHome=no` — podman stores images in `~/.local/share/containers/`
+- `PrivateTmp=no` or disabled — podman runtime uses `/tmp/podman-run-1000/`
+- `RestrictNamespaces=` must NOT be set — rootless podman needs user namespaces
+- `ReadWritePaths=` must include `/var/lib/archipelago /run/user /tmp`
+- `Environment=XDG_RUNTIME_DIR=/run/user/1000`
+### Step 9: Health Check Status
 ```bash
 # Containers with health checks — are they passing?
-for c in $(sudo podman ps --format "{{.Names}}"); do
-  health=$(sudo podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
+for c in $(podman ps --format "{{.Names}}"); do
+  health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
   if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
     echo "$c: $health"
   fi
 done
 # Containers WITHOUT health checks (gap in monitoring)
-for c in $(sudo podman ps --format "{{.Names}}"); do
-  hc=$(sudo podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
+for c in $(podman ps --format "{{.Names}}"); do
+  hc=$(podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
   if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
     echo "NO HEALTHCHECK: $c"
   fi
 done
 ```
-### Step 6: Resource & Failure Analysis
+### Step 10: Resource & Failure Analysis
 ```bash
 # Resource usage
-sudo podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
+podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"
 # Recent deaths (last 24h)
-sudo podman events --filter event=died --since 24h 2>/dev/null | tail -20
+podman events --filter event=died --since 24h 2>/dev/null | tail -20
 # OOM kills
-sudo podman ps -a --format "{{.Names}}" | while read c; do
-  oom=$(sudo podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
+podman ps -a --format "{{.Names}}" | while read c; do
+  oom=$(podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
   [ "$oom" = "true" ] && echo "OOM KILLED: $c"
 done
 # Non-zero exits
-sudo podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
+podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
 ```
-### Step 7: Systemd Integration
+### Step 11: Systemd Integration
 ```bash
 systemctl is-active archipelago nginx
-systemctl list-units --type=service | grep -i podman
+systemctl --user list-units --type=service 2>/dev/null | grep -i podman
 systemctl list-timers --all | grep -i -E "podman|container|archipelago"
 ```
-### Step 8: Generate Report
+### Step 12: Generate Report
 Produce a structured report:
 ```
 ## Container Diagnostic Report
+### Rootless Podman Status
+- User: archipelago (UID 1000)
+- Subuid mapping: [OK/MISSING]
+- XDG_RUNTIME_DIR: [OK/MISSING]
+- User linger: [enabled/disabled]
+- UFW forward policy: [ACCEPT/DROP]
 ### Summary
 - Total containers: X running, Y stopped, Z unhealthy
 - Port conflicts: [list or "none"]
 - Missing restart policies: [list or "none"]
 - Network issues: [list or "none"]
+- UID mapping issues: [list or "none"]
 - Health check gaps: [list]
 ### Critical Issues (fix immediately)
@@ -154,3 +269,7 @@ After diagnosis, suggest running `/podman-fix` for any issues found.
 ## Port Reference
 See `references/port-map.md` for the canonical port assignment table across all 4 layers.
+## UID Mapping Reference
+See `references/uid-mapping.md` for the complete rootless UID mapping table.


@@ -1,15 +1,31 @@
 # Common Podman Failure Patterns
+## Rootless Podman Specific Failures
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` |
+| `Error: unshare: operation not permitted` | Systemd `RestrictNamespaces` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` |
+| `Error: could not get runtime: creating runtime` | XDG_RUNTIME_DIR not set or /run/user/1000 missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in service, ensure `loginctl enable-linger archipelago` |
+| `permission denied` on volume mount | Wrong UID ownership — must use mapped UIDs | `sudo chown -R 100000:100000 /var/lib/archipelago/APP` (see UID mapping table) |
+| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless | Run `podman system migrate`, check `/etc/subuid` |
+| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (destroys all containers — last resort) |
+| `Error: stat /tmp/podman-run-1000/...: no such file` | PrivateTmp=yes in systemd isolates /tmp | Set `PrivateTmp=no` in `archipelago.service` |
+| Container ports unreachable from LAN | UFW DEFAULT_FORWARD_POLICY="DROP" | Change to "ACCEPT" in `/etc/default/ufw`, then `sudo ufw reload` |
+| `Error: error creating network namespace` | Systemd `SystemCallFilter` blocks clone/unshare | Remove `SystemCallFilter=` from `archipelago.service` |
+| Containers lose network after service restart | podman runtime dir in /tmp cleaned | Ensure `PrivateTmp=no` so /tmp/podman-run-1000/ persists |
 ## Container Won't Start
 | Error | Cause | Fix |
 |-------|-------|-----|
 | `exec format error` | Binary built on wrong arch | Rebuild on the Linux server |
 | `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
-| `permission denied` | Missing capability or read-only root | Check `get_app_capabilities()`, add tmpfs |
+| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership with mapped UID, add tmpfs |
 | `OCI runtime error` | Corrupt container state | `podman rm -f NAME && recreate` |
 | `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
 | `no such network` | Network missing | `podman network create archy-net` |
+| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` |
 ## Container Starts But App Unreachable
@@ -20,6 +36,7 @@
 | Port mapped but refused | Container logs | App crashing internally — check logs |
 | Works sometimes | Resources | Check OOM kills, CPU, disk space |
 | 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted |
+| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` |
 ## Container Keeps Dying
@@ -29,6 +46,8 @@
 | Dies after minutes | OOM killed | Increase `--memory` limit |
 | Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
 | Crash loop | Repeated crash | Fix root cause, don't just restart |
+| Exit code 127 | Missing binary in container | Wrong image tag or corrupted image — re-pull |
+| Exit code 137 | Killed by OOM or signal | Check `dmesg` for OOM kill, check `podman inspect` for OOMKilled |
 ## Network Issues
@@ -37,6 +56,20 @@
 | Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
 | Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
 | Container-to-container timeout | Different networks | Put both on same network |
+| Bitcoin RPC refused from container | rpcallowip wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe: port mapped, not exposed) |
+| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on new archy-net (rootless uses 10.89.x.x) |
+## Volume Permission Patterns (Rootless UID Mapping)
+Formula: **host_uid = 100000 + container_uid**
+| Container UID | Host UID | Apps | Data Directory |
+|---|---|---|---|
+| 0 (root) | 100000 | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | `/var/lib/archipelago/{app}` |
+| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `/var/lib/archipelago/postgres-*` |
+| 101 | 100101 | bitcoin-knots | `/var/lib/archipelago/bitcoin` |
+| 472 | 100472 | grafana | `/var/lib/archipelago/grafana` |
+| 999 | 100999 | MariaDB (mysql-mempool) | `/var/lib/archipelago/mysql-mempool` |
 ## Capability Reference
@@ -47,9 +80,23 @@
 | DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
 | FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
 | NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
+| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes |
 ## Read-Only Safe Apps
-Only these 8 apps can run with `--read-only`: searxng, grafana, filebrowser, electrs, nostr-rs-relay, ollama, indeedhub
+Only these apps can run with `--read-only` + tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub
 All others need writable root or will fail silently.
+## Systemd Sandbox Requirements for Rootless Podman
+These systemd service settings MUST be configured for rootless Podman to work:
+| Setting | Required Value | Why |
+|---------|---------------|-----|
+| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` |
+| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` |
+| `RestrictNamespaces=` | NOT SET | Rootless podman creates user namespaces |
+| `SystemCallFilter=` | NOT SET | Rootless podman needs clone/unshare syscalls |
+| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data + podman runtime paths |
+| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location |


@@ -0,0 +1,93 @@
# Rootless Podman UID Mapping Reference
## How Rootless UID Mapping Works
When Podman runs as the `archipelago` user (UID 1000), container processes don't run as their "apparent" UID on the host. Instead, Linux user namespaces remap UIDs.
**Mapping formula**: `host_uid = 100000 + container_uid`
This is configured in `/etc/subuid` and `/etc/subgid`:
```
archipelago:100000:65536
```
This means:
- Container UID 0 (root inside container) → Host UID 100000 (unprivileged on host)
- Container UID 70 (postgres) → Host UID 100070
- Container UID 101 (bitcoin) → Host UID 100101
- etc.
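The mapping can also be read from the live user namespace instead of being computed by hand. The sketch below is a minimal helper, assuming the kernel's three-column `uid_map` format (inside start, outside start, length) arrives on stdin, for example from `podman unshare cat /proc/self/uid_map`. The function name `map_container_uid` is hypothetical. Because the commit message gives the offset as `100000 + N - 1` while this file uses `100000 + N`, checking the live map is a way to confirm which one applies on a given host.

```bash
#!/usr/bin/env bash
# Translate a container UID to a host UID using uid_map-style input on stdin.
# Each input line is: <inside_start> <outside_start> <length>
# map_container_uid is a hypothetical helper name, not a podman command.
map_container_uid() {
  awk -v want="$1" '
    NF >= 3 && want + 0 >= $1 + 0 && want + 0 < $1 + $3 {
      print $2 + (want - $1)   # offset within the matching range
      found = 1
      exit
    }
    END { exit (found ? 0 : 1) }   # non-zero when no range covers the UID
  '
}

# Usage (on the server, as the archipelago user):
#   podman unshare cat /proc/self/uid_map | map_container_uid 101
```

The function returns a non-zero status when the UID falls outside every mapped range, which usually means the subuid allocation is too small for that image.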
## Why This Matters
Volume directories (bind mounts) on the host must be owned by the **mapped** UID, not the container UID. If Bitcoin runs as UID 101 inside its container, the host directory must be owned by UID 100101.
If ownership is wrong, the container gets `permission denied` when trying to read/write its data.
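A recursive check can surface this before a container hits it, since a single mis-owned file deep in a volume is enough to cause the error. The sketch below is a minimal helper, assuming GNU `find` (for `-uid` and `-gid`); `find_misowned` is a hypothetical name, and the expected UID and GID are whatever the mapping yields for that container.

```bash
#!/usr/bin/env bash
# List every entry under a volume directory whose owner or group differs
# from the expected mapped UID/GID. Prints nothing when ownership is correct.
# find_misowned is a hypothetical helper name; -uid/-gid assume GNU find.
find_misowned() {
  dir="$1"; uid="$2"; gid="$3"
  find "$dir" \( ! -uid "$uid" -o ! -gid "$gid" \) -print
}

# Usage (expected owner taken from the mapping table in this file):
#   find_misowned /var/lib/archipelago/bitcoin 100101 100101 | head
```

Reading directories owned by a mapped UID may itself need `sudo`, depending on their permissions.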
## Complete UID Mapping Table
| Container UID | Host UID | Containers | Fix Command |
|---|---|---|---|
| 0 (root) | 100000 | lnd, fedimint, fedimint-gateway, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay-server, nbxplorer, immich, nostr-rs-relay, strfry, nextcloud, searxng, onlyoffice, tailscale, uptime-kuma | `sudo chown -R 100000:100000 /var/lib/archipelago/{app}` |
| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `sudo chown -R 100070:100070 /var/lib/archipelago/postgres-*` |
| 101 | 100101 | bitcoin-knots, bitcoin-core | `sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin` |
| 472 | 100472 | grafana | `sudo chown -R 100472:100472 /var/lib/archipelago/grafana` |
| 999 | 100999 | MariaDB (mysql-mempool) | `sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool` |
## How to Find a Container's UID
If you encounter a new container with permission issues:
```bash
# Check what user the container runs as
podman inspect CONTAINER_NAME --format "{{.Config.User}}"
# If empty, it runs as root (UID 0) → host UID 100000
# If it shows a username, find the UID inside the image
podman run --rm IMAGE_NAME id
# Then calculate: host_uid = 100000 + container_uid
```
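The `.Config.User` field can be empty, numeric, `uid:gid`, or a name, so a small normaliser helps when scripting the lookup above. This is a sketch under those assumptions; `container_uid_from_config_user` is a hypothetical name, and the empty case follows the rule stated in the comments above (empty means root).

```bash
#!/usr/bin/env bash
# Normalise a `.Config.User` value to a numeric container UID when possible.
# Prints the UID and returns 0, or returns 1 when the value is a username
# that has to be resolved inside the image (for example with `id`).
# container_uid_from_config_user is a hypothetical helper name.
container_uid_from_config_user() {
  user="${1%%:*}"            # drop an optional ":group" suffix
  case "$user" in
    '')       echo 0 ;;      # empty: treated as root, per the note above
    *[!0-9]*) return 1 ;;    # contains a non-digit: a name, not a UID
    *)        echo "$user" ;;
  esac
}

# Usage:
#   container_uid_from_config_user "$(podman inspect CONTAINER_NAME --format '{{.Config.User}}')"
```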
## Fix Script
Run this after any fresh install, migration, or when containers have permission errors:
```bash
#!/bin/bash
# Fix all rootless podman volume ownership
# UID 0 → 100000 (most containers)
for dir in lnd fedimint fedimint-gateway homeassistant jellyfin vaultwarden photoprism \
ollama filebrowser electrumx btcpay nbxplorer immich nostr-rs-relay nextcloud \
searxng onlyoffice uptime-kuma; do
[ -d "/var/lib/archipelago/$dir" ] && sudo chown -R 100000:100000 "/var/lib/archipelago/$dir"
done
# UID 101 → 100101 (Bitcoin)
[ -d "/var/lib/archipelago/bitcoin" ] && sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin
# UID 70 → 100070 (PostgreSQL)
for dir in /var/lib/archipelago/postgres-* /var/lib/archipelago/btcpay-db /var/lib/archipelago/immich-db; do
[ -d "$dir" ] && sudo chown -R 100070:100070 "$dir"
done
# UID 999 → 100999 (MariaDB)
[ -d "/var/lib/archipelago/mysql-mempool" ] && sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool
# UID 472 → 100472 (Grafana)
[ -d "/var/lib/archipelago/grafana" ] && sudo chown -R 100472:100472 /var/lib/archipelago/grafana
```
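Because the script above runs `chown -R` unconditionally, a dry run may be worth doing first. The sketch below only prints the commands it would run, and only for directories whose current owner differs. `plan_chown` is a hypothetical name, the input format (one `directory expected_uid` pair per line) is an assumption made for this example, and it relies on GNU `stat -c`.

```bash
#!/usr/bin/env bash
# Print (but do not run) the chown commands needed for a list of
# "directory expected_uid" pairs read from stdin. Missing directories are
# skipped. Paths must not contain whitespace in this simple form.
# plan_chown is a hypothetical helper name.
plan_chown() {
  while read -r dir uid; do
    [ -d "$dir" ] || continue
    owner=$(stat -c '%u:%g' "$dir")
    if [ "$owner" != "$uid:$uid" ]; then
      echo "sudo chown -R $uid:$uid $dir"
    fi
  done
}

# Usage (expected owners copied from the mapping table in this file):
#   printf '%s\n' \
#     "/var/lib/archipelago/bitcoin 100101" \
#     "/var/lib/archipelago/grafana 100472" | plan_chown
```

Reviewing the printed commands before piping them to a shell keeps a wrong expected UID from being applied recursively.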
## Rootful vs Rootless Comparison
| Aspect | Rootful (old) | Rootless (current) |
|--------|---------------|-------------------|
| Podman command | `sudo podman` | `podman` (as archipelago user) |
| Container storage | `/var/lib/containers/storage` | `~/.local/share/containers/storage` |
| Container subnet | `10.88.0.0/16` | `10.89.0.0/16` |
| Volume ownership | Container UID directly | Mapped UID (100000 + container_uid) |
| Requires root? | Yes | No (except fixing volume ownership) |
| XDG_RUNTIME_DIR | Not needed | Required: `/run/user/1000` |
| User lingering | Not needed | Required: `loginctl enable-linger` |
| Systemd restrictions | All can be enabled | Must disable: RestrictNamespaces, SystemCallFilter |
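The systemd row above can be checked mechanically. The sketch below reads unit-file text on stdin (for example from `systemctl cat archipelago.service`) and reports the settings this commit describes as incompatible with rootless Podman. `check_unit_sandbox` is a hypothetical name, and treating an empty value or `no`/`false`/`0`/`off` as harmless is an assumption of this example.

```bash
#!/usr/bin/env bash
# Read unit-file text on stdin and print any sandbox setting that is
# enabled although this commit says it must be off for rootless Podman.
# Exit status is 1 when at least one conflict is found, 0 otherwise.
# check_unit_sandbox is a hypothetical helper name.
check_unit_sandbox() {
  awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      eq = index(line, "=")
      if (eq == 0) next                 # section headers, blank lines
      key = substr(line, 1, eq - 1)
      val = substr(line, eq + 1)
      sub(/[ \t]+$/, "", val)
      off = (val == "" || val == "no" || val == "false" || val == "0" || val == "off")
      if ((key == "RestrictNamespaces" || key == "SystemCallFilter" ||
           key == "PrivateTmp" || key == "ProtectHome") && !off) {
        print "CONFLICT: " key "=" val
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage:
#   systemctl cat archipelago.service | check_unit_sandbox
```

Drop-in overrides are included in `systemctl cat` output, so a later line can reset an earlier one; this sketch reports each enabled line rather than resolving the final value.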


@@ -2,19 +2,24 @@
 name: podman-fix
 description: >
 Fix Podman container issues on Archipelago — restart failed containers, repair port bindings,
-fix network connectivity, add missing restart policies, and resolve config drift.
+fix network connectivity, add missing restart policies, fix rootless UID mapping, and resolve
+config drift. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
 Use when asked to "fix container", "restart app", "fix port mapping", "container not working",
-"app won't start", "fix podman", "repair container", "container down", or after /podman-doctor
-identifies issues to fix.
+"app won't start", "fix podman", "repair container", "container down", "permission denied",
+or after /podman-doctor identifies issues to fix.
 allowed-tools: Bash Read Edit Write Glob Grep
 ---
 # Podman Fix — Container Remediation
-Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
+Targeted fix workflow for **rootless Podman** container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
 **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
+> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo.
+> Only use `sudo` for: chown on volume directories, UFW changes, systemd service edits, nginx reload.
+> Container UIDs are mapped via subuid: container UID N → host UID (100000 + N).
 If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.
 ## Fix Procedures
@@ -23,21 +28,22 @@ If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs
 ```bash
 # Check why it stopped
-sudo podman logs --tail 50 CONTAINER_NAME
-sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
+podman logs --tail 50 CONTAINER_NAME
+podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
 # If clean exit or crash — just restart
-sudo podman start CONTAINER_NAME
+podman start CONTAINER_NAME
 # If corrupt state — remove and recreate
-sudo podman rm -f CONTAINER_NAME
+podman rm -f CONTAINER_NAME
 # Then recreate using the install flow (trigger from UI or re-run creation command)
 ```
-**If container keeps crashing**: check logs for the actual error. Common causes:
+**If container keeps crashing**, check logs for the actual error. Common causes:
 - Missing config file → check if volume mount has the config
-- Wrong permissions → `chown -R` the data directory
+- Wrong permissions → fix UID mapping (see Fix 8 below)
 - Dependency not ready → start dependency first, wait, then start this container
+- Exit code 127 → missing binary in container image, re-pull the image
 ### Fix 2: Missing Restart Policy
@ -45,14 +51,14 @@ The most common uptime killer. Fix for ALL containers at once:
```bash ```bash
# Fix a single container # Fix a single container
sudo podman update --restart unless-stopped CONTAINER_NAME podman update --restart unless-stopped CONTAINER_NAME
# Fix ALL containers that have no restart policy # Fix ALL containers that have no restart policy
for c in $(sudo podman ps -a --format "{{.Names}}"); do for c in $(podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
if [ "$policy" = "no" ] || [ -z "$policy" ]; then if [ "$policy" = "no" ] || [ -z "$policy" ]; then
echo "Fixing restart policy for: $c" echo "Fixing restart policy for: $c"
sudo podman update --restart unless-stopped "$c" podman update --restart unless-stopped "$c"
fi fi
done done
``` ```
@ -66,23 +72,24 @@ done
#### Port conflict (address already in use) #### Port conflict (address already in use)
```bash ```bash
# Find what's using the port # Find what's using the port
sudo ss -tlnp | grep :PORT_NUMBER ss -tlnp | grep :PORT_NUMBER
# If it's another container, either change one's port or stop the conflicting one # If it's another container, either change one's port or stop the conflicting one
sudo podman stop CONFLICTING_CONTAINER podman stop CONFLICTING_CONTAINER
# If it's a host process # If it's a host process (e.g., system tor vs container tor)
sudo kill PID # or stop the service sudo systemctl stop tor # Stop system service if container needs the port
sudo systemctl disable tor
``` ```
#### Port not mapped (container running but port unreachable) #### Port not mapped (container running but port unreachable)
```bash ```bash
# Check current port mappings # Check current port mappings
sudo podman port CONTAINER_NAME podman port CONTAINER_NAME
# Can't add ports to running container — must recreate # Can't add ports to running container — must recreate
sudo podman stop CONTAINER_NAME podman stop CONTAINER_NAME
sudo podman rm CONTAINER_NAME podman rm CONTAINER_NAME
# Recreate with correct -p flags (use the Rust install flow or manual podman run) # Recreate with correct -p flags (use the Rust install flow or manual podman run)
``` ```
@ -124,35 +131,51 @@ Edit `neode-ui/src/stores/appLauncher.ts`:
#### Container not on archy-net (can't resolve other containers) #### Container not on archy-net (can't resolve other containers)
```bash ```bash
# Connect to archy-net without recreating # Connect to archy-net without recreating
sudo podman network connect archy-net CONTAINER_NAME podman network connect archy-net CONTAINER_NAME
# Verify # Verify
sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}" podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
``` ```
#### archy-net doesn't exist #### archy-net doesn't exist
```bash ```bash
sudo podman network create archy-net podman network create archy-net
# Then reconnect all containers that need it # Then reconnect all containers that need it
``` ```
#### DNS not working inside container #### DNS not working inside container
```bash ```bash
# Test DNS from inside container # Test DNS from inside container
sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \ podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots podman exec CONTAINER_NAME ping -c1 bitcoin-knots
# If DNS fails, check the container's resolv.conf
podman exec CONTAINER_NAME cat /etc/resolv.conf
# If DNS fails, recreate container with explicit DNS # If DNS fails, recreate container with explicit DNS
# Add --dns 1.1.1.1 to the podman run command # Add --dns 1.1.1.1 to the podman run command
``` ```
#### Container subnet changed (rootful → rootless migration)
```bash
# Old rootful subnet: 10.88.0.0/16
# New rootless subnet: 10.89.0.0/16
# Bitcoin RPC rpcallowip must be updated if using subnet-specific allowlist
# Check current archy-net subnet
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}"
# If Bitcoin RPC refuses connections from containers:
# Update bitcoin.conf rpcallowip to 0.0.0.0/0 (safe: only accessible via port mapping)
```
### Fix 5: Health Check Issues ### Fix 5: Health Check Issues
#### Add missing health check to running container #### Add missing health check to running container
Can't add to running container — must recreate with health check flags: Can't add to running container — must recreate with health check flags:
```bash ```bash
# Example for a web app # Example for a web app
sudo podman run ... \ podman run ... \
--health-cmd "curl -f http://localhost:PORT/health || exit 1" \ --health-cmd "curl -f http://localhost:PORT/health || exit 1" \
--health-interval 30s \ --health-interval 30s \
--health-timeout 5s \ --health-timeout 5s \
@ -164,10 +187,10 @@ sudo podman run ... \
#### Fix unhealthy container #### Fix unhealthy container
```bash ```bash
# See what the health check is actually running # See what the health check is actually running
sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}" podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
# Run the health check manually to see the error # Run the health check manually to see the error
sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
# Common fixes: # Common fixes:
# - curl not installed in container → use wget or nc instead # - curl not installed in container → use wget or nc instead
@ -179,13 +202,10 @@ sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
```bash ```bash
# Check what capabilities container has # Check what capabilities container has
sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}" podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
# If missing required caps, must recreate with correct --cap-add flags # If missing required caps, must recreate with correct --cap-add flags
# Refer to the capability reference in /podman-doctor references # Refer to the capability reference in /podman-doctor references
# Fix data directory permissions
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/
``` ```
### Fix 7: Full Config Consistency Fix ### Fix 7: Full Config Consistency Fix
@ -199,12 +219,108 @@ When port map is inconsistent across layers, fix ALL layers:
5. **Deploy**: `./scripts/deploy-to-target.sh --live` 5. **Deploy**: `./scripts/deploy-to-target.sh --live`
6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/` 6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/`
### Fix 8: Rootless UID Mapping (Permission Denied on Volumes)
This is the #1 rootless-specific issue. Container UIDs are remapped by user namespaces.
**Formula**: `host_uid = 100000 + container_uid - 1` for container UIDs ≥ 1. Container root (UID 0) maps to the `archipelago` user itself (host UID 1000). Both follow from the default rootless map: `0 → 1000` (one ID), then `1..65536 → 100000..165535` from `/etc/subuid`.
```bash
# Fix UID 0 containers (most apps run as root inside, which is the archipelago user on the host)
sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME
# Fix Bitcoin (container UID 101 → host UID 100100)
sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin
# Fix PostgreSQL (container UID 70 → host UID 100069)
sudo chown -R 100069:100069 /var/lib/archipelago/postgres-APP_NAME
# Fix Grafana (container UID 472 → host UID 100471)
sudo chown -R 100471:100471 /var/lib/archipelago/grafana
# Fix MariaDB (container UID 999 → host UID 100998)
sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool
```
**How to find the right UID for a new container:**
```bash
# Check what user the container image runs as
podman inspect IMAGE_NAME --format "{{.Config.User}}"
# If empty = root (UID 0) → host UID 1000 (the archipelago user)
# If number → host UID = 100000 + that number - 1
# If username → run: podman run --rm IMAGE_NAME id
```
After fixing ownership, restart the container:
```bash
podman restart CONTAINER_NAME
```
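Rather than computing host UIDs by hand, the mapping can be read from the user namespace itself. The sketch below is a hypothetical helper (`map_uid` is not part of Archipelago or Podman) that translates a container UID using a map in the `/proc/self/uid_map` format, where each line is `<first UID inside> <first UID outside> <count>`. The sample map is an assumption for a user with UID 1000 and a subuid range of `100000:65536`; on the real host the map would come from `podman unshare cat /proc/self/uid_map`.

```bash
# map_uid CONTAINER_UID: reads a uid_map on stdin, prints the matching host UID.
# Exits non-zero if the container UID is not covered by any map line.
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    '
}

# Real use on the target host:
#   podman unshare cat /proc/self/uid_map | map_uid 101
# Illustration with the ASSUMED default map (UID 1000, subuid 100000:65536):
printf '0 1000 1\n1 100000 65536\n' | map_uid 101   # → 100100
```

Reading the live map avoids off-by-one mistakes and still works if the subuid range on a given node differs from the assumed one.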
### Fix 9: UFW Forward Policy (LAN Access Broken)
If containers work locally but not from other machines on the network:
```bash
# Check current policy
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
# Fix: change DROP to ACCEPT
sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
sudo ufw reload
```
### Fix 10: Systemd Sandbox Too Restrictive
If the Rust backend can't scan/manage containers after a systemd update:
```bash
# Check what's blocked
sudo journalctl -u archipelago --since "10 min ago" | grep -i "denied\|permission\|namespace\|syscall"
# The archipelago.service MUST have these for rootless podman:
# ProtectHome=no
# PrivateTmp=no (or disabled)
# RestrictNamespaces= (NOT SET — don't restrict)
# SystemCallFilter= (NOT SET — don't filter)
# ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
# Environment=XDG_RUNTIME_DIR=/run/user/1000
```
Edit the service file:
```bash
sudo systemctl edit archipelago.service
# Add overrides, then:
sudo systemctl daemon-reload
sudo systemctl restart archipelago
```
### Fix 11: Stale Podman Processes
If `podman ps` hangs or is very slow:
```bash
# Kill stuck podman processes (>10 of them = something is wrong)
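# pgrep/pkill take extended regexes, so alternation is a bare "|".
# pgrep -c already prints 0 when nothing matches, so no "|| echo 0" fallback.
stuck=$(pgrep -c -f "podman (ps|stats)" 2>/dev/null)
stuck=${stuck:-0}
if [ "$stuck" -gt 10 ]; then
  pkill -f "podman (ps|stats)"
  echo "Killed $stuck stuck podman processes"
fi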
# Kill orphaned conmon processes holding ports
for pid in $(pgrep conmon); do
container=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | grep -oP '(?<=--cid )\S+')
if [ -n "$container" ] && ! podman ps -a --format "{{.ID}}" | grep -q "${container:0:12}"; then
kill "$pid" 2>/dev/null && echo "Killed orphan conmon $pid"
fi
done
```
## After Fixing ## After Fixing
Always verify the fix: Always verify the fix:
```bash ```bash
# Container running? # Container running?
sudo podman ps --filter name=CONTAINER_NAME podman ps --filter name=CONTAINER_NAME
# Port reachable? # Port reachable?
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
@ -213,7 +329,10 @@ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/
# Health check passing? # Health check passing?
sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}" podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
# Volume permissions correct? (rootless check)
podman exec CONTAINER_NAME ls -la /data/ 2>/dev/null || echo "Check container data path"
``` ```
Run `/podman-doctor` again to confirm all issues are resolved. Run `/podman-doctor` again to confirm all issues are resolved.
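
A freshly restarted container may need a few seconds before its port answers, so a single `curl` right after the fix can report a false failure. The following is a minimal sketch of a retry wrapper. `retry` is a hypothetical helper name, and the attempt count and delay in the usage comment are illustrative values, not project settings.

```bash
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted; sleeps DELAY seconds between tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Only sleep if another attempt follows
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (values are assumptions): poll the app port for up to ~30 seconds
#   retry 10 3 curl -sf -o /dev/null http://127.0.0.1:PORT/ || echo "still unreachable"
```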


@ -3,7 +3,8 @@ name: podman-uptime
description: > description: >
Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies
restart policies, creates health check monitors, and configures auto-recovery for all restart policies, creates health check monitors, and configures auto-recovery for all
containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart", containers. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
"watchdog", "container monitoring", "uptime guarantee", "keep containers running", "watchdog", "container monitoring", "uptime guarantee", "keep containers running",
"survive reboot", or to harden container reliability. "survive reboot", or to harden container reliability.
allowed-tools: Bash Read Edit Write Glob Grep allowed-tools: Bash Read Edit Write Glob Grep
@ -15,6 +16,31 @@ Ensures every Archipelago container survives reboots, recovers from crashes, and
**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228` **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo.
> Only use `sudo` for: systemd unit files, chown on volumes, UFW changes.
> The archipelago user runs containers directly via user namespaces.
## Prerequisites for Rootless Uptime
Before setting up uptime infrastructure, verify rootless Podman basics are working:
```bash
# Must be the archipelago user
whoami # archipelago
# User lingering must be enabled (keeps user services running after logout)
ls /var/lib/systemd/linger/ | grep archipelago || sudo loginctl enable-linger archipelago
# XDG_RUNTIME_DIR must be set
echo $XDG_RUNTIME_DIR # /run/user/1000
# Subuid/subgid must be configured
grep archipelago /etc/subuid # archipelago:100000:65536
# UFW forward policy must be ACCEPT (for LAN access to containers)
grep DEFAULT_FORWARD_POLICY /etc/default/ufw # Must be "ACCEPT"
```
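
The subuid check above can be made scriptable so that a missing entry fails loudly instead of printing nothing. This is a small sketch under stated assumptions: `subuid_range` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a file other than `/etc/subuid`.

```bash
# subuid_range USER [FILE]
# Prints "<start> <count>" for USER's first subuid entry; exits non-zero if there is none.
subuid_range() {
    awk -F: -v user="$1" '
        $1 == user { print $2, $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "${2:-/etc/subuid}"
}

# Example:
#   subuid_range archipelago || echo "CRITICAL: no subuid range for archipelago"
```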
## Layer 1: Restart Policies (Survive Reboots) ## Layer 1: Restart Policies (Survive Reboots)
Every container MUST have `--restart unless-stopped`. This is non-negotiable. Every container MUST have `--restart unless-stopped`. This is non-negotiable.
@ -23,28 +49,31 @@ Every container MUST have `--restart unless-stopped`. This is non-negotiable.
```bash ```bash
# Audit # Audit
for c in $(sudo podman ps -a --format "{{.Names}}"); do for c in $(podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
echo "$c: $policy" echo "$c: $policy"
done done
# Fix any with "no" or empty policy # Fix any with "no" or empty policy
for c in $(sudo podman ps -a --format "{{.Names}}"); do for c in $(podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
if [ "$policy" = "no" ] || [ -z "$policy" ]; then if [ "$policy" = "no" ] || [ -z "$policy" ]; then
echo "Fixing: $c" echo "Fixing: $c"
sudo podman update --restart unless-stopped "$c" podman update --restart unless-stopped "$c"
fi fi
done done
``` ```
### Ensure podman auto-starts containers on boot ### Ensure podman auto-starts containers on boot
```bash For rootless Podman, containers with restart policies are auto-started by `podman-restart` as a **user** service:
# Enable podman-restart service (restarts containers with restart policy on boot)
sudo systemctl enable podman-restart.service 2>/dev/null || true
# If podman-restart doesn't exist, create it ```bash
# Enable the rootless podman-restart user service
systemctl --user enable podman-restart.service 2>/dev/null
# If the user service doesn't exist, create a system-level one
# (runs as archipelago user via User= directive)
cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service
[Unit] [Unit]
Description=Podman Start All Containers With Restart Policy Description=Podman Start All Containers With Restart Policy
@ -53,8 +82,12 @@ Wants=network-online.target
[Service] [Service]
Type=oneshot Type=oneshot
User=archipelago
Group=archipelago
Environment=XDG_RUNTIME_DIR=/run/user/1000
ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped
RemainAfterExit=yes RemainAfterExit=yes
TimeoutStartSec=300
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
@ -73,27 +106,31 @@ Create a systemd timer that checks container health every 2 minutes and restarts
```bash ```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh
#!/bin/bash #!/bin/bash
# Archipelago Container Watchdog # Archipelago Container Watchdog (Rootless Podman)
# Checks all containers and restarts any that are stopped or unhealthy # Runs as archipelago user — NO sudo for podman commands
LOG_TAG="container-watchdog" LOG_TAG="container-watchdog"
# Run podman as the archipelago user with correct XDG path
export XDG_RUNTIME_DIR=/run/user/1000
PODMAN="/usr/bin/podman"
# Restart any stopped containers that should be running (have restart policy) # Restart any stopped containers that should be running (have restart policy)
for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do for c in $($PODMAN ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}" 2>/dev/null); do
logger -t "$LOG_TAG" "Restarting stopped container: $c" logger -t "$LOG_TAG" "Restarting stopped container: $c"
sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG" $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG"
done done
# Restart unhealthy containers # Restart unhealthy containers
for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do for c in $($PODMAN ps --filter health=unhealthy --format "{{.Names}}" 2>/dev/null); do
logger -t "$LOG_TAG" "Restarting unhealthy container: $c" logger -t "$LOG_TAG" "Restarting unhealthy container: $c"
sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG" $PODMAN restart "$c" 2>&1 | logger -t "$LOG_TAG"
done done
# Check for containers in "created" state (never started) # Check for containers in "created" state (never started)
for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do for c in $($PODMAN ps -a --filter status=created --format "{{.Names}}" 2>/dev/null); do
logger -t "$LOG_TAG" "Starting created container: $c" logger -t "$LOG_TAG" "Starting created container: $c"
sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG" $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG"
done done
SCRIPT SCRIPT
@ -103,7 +140,7 @@ sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh
### Create the systemd timer ### Create the systemd timer
```bash ```bash
# Service unit # Service unit — runs as archipelago user for rootless podman
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service
[Unit] [Unit]
Description=Archipelago Container Watchdog Description=Archipelago Container Watchdog
@ -111,6 +148,9 @@ After=podman-restart.service
[Service] [Service]
Type=oneshot Type=oneshot
User=archipelago
Group=archipelago
Environment=XDG_RUNTIME_DIR=/run/user/1000
ExecStart=/usr/local/bin/archipelago-container-watchdog.sh ExecStart=/usr/local/bin/archipelago-container-watchdog.sh
EOF EOF
@ -150,17 +190,20 @@ Some containers depend on others. The watchdog handles restarts, but initial boo
```bash ```bash
cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh
#!/bin/bash #!/bin/bash
# Ordered container startup for Archipelago # Ordered container startup for Archipelago (Rootless Podman)
# Runs as archipelago user — NO sudo for podman commands
# Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay # Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay
LOG_TAG="ordered-start" LOG_TAG="ordered-start"
export XDG_RUNTIME_DIR=/run/user/1000
PODMAN="/usr/bin/podman"
wait_for_container() { wait_for_container() {
local name=$1 local name=$1
local max_wait=${2:-60} local max_wait=${2:-60}
local waited=0 local waited=0
while [ $waited -lt $max_wait ]; do while [ $waited -lt $max_wait ]; do
status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null) status=$($PODMAN inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
if [ "$status" = "true" ]; then if [ "$status" = "true" ]; then
logger -t "$LOG_TAG" "$name is running" logger -t "$LOG_TAG" "$name is running"
return 0 return 0
@ -174,38 +217,45 @@ wait_for_container() {
# Tier 0: Infrastructure # Tier 0: Infrastructure
logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure" logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure"
sudo podman start tailscale 2>/dev/null $PODMAN start tailscale 2>/dev/null
# Tier 1: Bitcoin (foundation) # Tier 1: Databases (must start before services that depend on them)
logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin" logger -t "$LOG_TAG" "Starting Tier 1: Databases"
sudo podman start bitcoin-knots 2>/dev/null $PODMAN start mempool-db 2>/dev/null
$PODMAN start btcpay-postgres 2>/dev/null
$PODMAN start immich_postgres 2>/dev/null
sleep 5
# Tier 2: Bitcoin (foundation for Lightning and explorers)
logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin"
$PODMAN start bitcoin-knots 2>/dev/null
wait_for_container bitcoin-knots 120 wait_for_container bitcoin-knots 120
# Tier 2: Bitcoin-dependent services # Tier 3: Bitcoin-dependent services
logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent" logger -t "$LOG_TAG" "Starting Tier 3: Bitcoin-dependent"
sudo podman start electrs 2>/dev/null $PODMAN start electrumx 2>/dev/null
sudo podman start lnd 2>/dev/null $PODMAN start lnd 2>/dev/null
wait_for_container electrs 90 wait_for_container electrumx 90
wait_for_container lnd 90 wait_for_container lnd 90
# Tier 3: Services depending on Tier 2 # Tier 4: Services depending on Tier 3
logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies" logger -t "$LOG_TAG" "Starting Tier 4: Second-order dependencies"
sudo podman start mempool-db 2>/dev/null $PODMAN start mempool 2>/dev/null
sleep 5 $PODMAN start nbxplorer 2>/dev/null
sudo podman start mempool 2>/dev/null
sudo podman start nbxplorer 2>/dev/null
sleep 10 sleep 10
sudo podman start btcpay-server 2>/dev/null $PODMAN start btcpay-server 2>/dev/null
sudo podman start btcpay-postgres 2>/dev/null $PODMAN start fedimint 2>/dev/null
$PODMAN start fedimint-gateway 2>/dev/null
# Tier 4: Independent apps (start all remaining) # Tier 5: Independent apps (start all remaining)
logger -t "$LOG_TAG" "Starting Tier 4: Independent apps" logger -t "$LOG_TAG" "Starting Tier 5: Independent apps"
sudo podman start --all 2>/dev/null $PODMAN start --all 2>/dev/null
# Tier 5: UI containers (need parent apps running first) # Tier 6: UI containers (need parent apps running first)
logger -t "$LOG_TAG" "Starting Tier 5: UI containers" logger -t "$LOG_TAG" "Starting Tier 6: UI containers"
sudo podman start bitcoin-ui 2>/dev/null $PODMAN start bitcoin-ui 2>/dev/null
sudo podman start lnd-ui 2>/dev/null $PODMAN start lnd-ui 2>/dev/null
$PODMAN start electrs-ui 2>/dev/null
logger -t "$LOG_TAG" "Startup sequence complete" logger -t "$LOG_TAG" "Startup sequence complete"
SCRIPT SCRIPT
@ -216,18 +266,22 @@ sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh
### Wire into boot sequence ### Wire into boot sequence
```bash ```bash
# Runs as archipelago user for rootless podman
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
[Unit] [Unit]
Description=Archipelago Ordered Container Startup Description=Archipelago Ordered Container Startup
After=network-online.target podman.service After=network-online.target
Wants=network-online.target Wants=network-online.target
Before=archipelago.service Before=archipelago.service
[Service] [Service]
Type=oneshot Type=oneshot
User=archipelago
Group=archipelago
Environment=XDG_RUNTIME_DIR=/run/user/1000
ExecStart=/usr/local/bin/archipelago-ordered-start.sh ExecStart=/usr/local/bin/archipelago-ordered-start.sh
RemainAfterExit=yes RemainAfterExit=yes
TimeoutStartSec=300 TimeoutStartSec=600
[Install] [Install]
WantedBy=multi-user.target WantedBy=multi-user.target
@ -237,14 +291,45 @@ sudo systemctl daemon-reload
sudo systemctl enable archipelago-containers.service sudo systemctl enable archipelago-containers.service
``` ```
## Rootless-Specific Uptime Considerations
### Volume ownership survives reboots
Volume ownership doesn't change on reboot, but if a container image is updated (re-pulled), the new container may run as a different UID. Always verify after image updates:
```bash
# Quick ownership audit after image pull
podman inspect CONTAINER_NAME --format "{{.Config.User}}"
# Then verify: sudo stat -c '%u:%g' /var/lib/archipelago/APP_NAME
# Formula: host_uid = 100000 + container_uid - 1 (container UID 0 → host UID 1000)
```
### XDG_RUNTIME_DIR on boot
Rootless Podman requires `/run/user/1000` to exist. This is created by `pam_systemd` when the user logs in, or by `loginctl enable-linger`. If it's missing after boot, containers won't start.
```bash
# Verify it exists
ls -la /run/user/1000/ || echo "CRITICAL: /run/user/1000 missing — run: sudo loginctl enable-linger archipelago"
```
### Systemd sandbox must not block podman
If the archipelago.service sandbox blocks namespace/syscall operations, the Rust backend can't scan containers. See Fix 10 in /podman-fix.
## Verification Checklist ## Verification Checklist
After setting up all 3 layers, verify: After setting up all 3 layers, verify:
```bash ```bash
echo "=== Rootless Podman Prerequisites ==="
echo "User: $(whoami)"
echo "XDG_RUNTIME_DIR: $XDG_RUNTIME_DIR"
grep archipelago /etc/subuid | head -1
ls /var/lib/systemd/linger/ | grep archipelago && echo "Linger: enabled" || echo "Linger: DISABLED"
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
echo ""
echo "=== Layer 1: Restart Policies ===" echo "=== Layer 1: Restart Policies ==="
for c in $(sudo podman ps -a --format "{{.Names}}"); do for c in $(podman ps -a --format "{{.Names}}"); do
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
echo " $c: $policy" echo " $c: $policy"
done done
@ -261,11 +346,19 @@ sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchd
echo "" echo ""
echo "=== Container Health Summary ===" echo "=== Container Health Summary ==="
total=$(sudo podman ps -a --format "{{.Names}}" | wc -l) total=$(podman ps -a --format "{{.Names}}" | wc -l)
running=$(sudo podman ps --format "{{.Names}}" | wc -l) running=$(podman ps --format "{{.Names}}" | wc -l)
stopped=$((total - running)) stopped=$((total - running))
unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l) unhealthy=$(podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy" echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
echo ""
echo "=== Volume Ownership Spot Check ==="
for dir in bitcoin lnd grafana; do
if [ -d "/var/lib/archipelago/$dir" ]; then
echo " $dir: $(stat -c '%u:%g' /var/lib/archipelago/$dir)"
fi
done
``` ```
## Reboot Test ## Reboot Test
@ -274,17 +367,20 @@ The ultimate uptime test — reboot the server and verify everything comes back:
```bash ```bash
# Before reboot: record running containers # Before reboot: record running containers
sudo podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt
# Reboot # Reboot
sudo reboot sudo reboot
# After reboot (wait ~3 minutes, then SSH back in): # After reboot (wait ~3 minutes, then SSH back in):
sudo podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt
# Compare # Compare
diff /tmp/before-reboot.txt /tmp/after-reboot.txt diff /tmp/before-reboot.txt /tmp/after-reboot.txt
# Should show no differences # Should show no differences
# Also verify XDG_RUNTIME_DIR survived reboot
ls /run/user/1000/ || echo "CRITICAL: lingering not working"
``` ```
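
`diff` shows every difference, but the question after a reboot is usually narrower: which containers were running before and are not running now. A possible sketch using `comm`, which relies on both files being sorted (they are, since they were written through `sort`). `missing_after_reboot` is a hypothetical helper name.

```bash
# missing_after_reboot BEFORE_FILE AFTER_FILE
# Prints names that appear in BEFORE_FILE but not in AFTER_FILE (both must be sorted).
missing_after_reboot() {
    comm -23 "$1" "$2"
}

# Example:
#   missing_after_reboot /tmp/before-reboot.txt /tmp/after-reboot.txt
```

An empty result means every container that was running before the reboot came back.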
## Monitoring ## Monitoring
@ -292,18 +388,23 @@ diff /tmp/before-reboot.txt /tmp/after-reboot.txt
Check uptime status anytime: Check uptime status anytime:
```bash ```bash
# Quick status # Quick status
sudo podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort
# Watchdog activity # Watchdog activity
sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager
# Container events (starts, stops, deaths) # Container events (starts, stops, deaths)
sudo podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30 podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
# Check for permission denied errors (rootless UID mapping issue)
podman ps -a --filter status=exited --format "{{.Names}}" | while read -r c; do
podman logs --tail 5 "$c" 2>&1 | grep -i "permission denied" && echo " ^ UID mapping issue in: $c"
done
``` ```
## Integration ## Integration
- Run `/podman-doctor` first to identify issues - Run `/podman-doctor` first to identify issues (includes rootless health checks)
- Run `/podman-fix` for specific container repairs - Run `/podman-fix` for specific container repairs (includes UID mapping fixes)
- Run `/podman-uptime` to set up permanent reliability infrastructure - Run `/podman-uptime` to set up permanent reliability infrastructure
- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot - Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot

File diff suppressed because it is too large


@ -82,7 +82,7 @@ define(['./workbox-21a80088'], (function (workbox) { 'use strict';
"revision": "3ca0b8505b4bec776b69afdba2768812" "revision": "3ca0b8505b4bec776b69afdba2768812"
}, { }, {
"url": "index.html", "url": "index.html",
"revision": "0.a4nevj6csc4" "revision": "0.2lte02eatlc"
}], {}); }], {});
workbox.cleanupOutdatedCaches(); workbox.cleanupOutdatedCaches();
workbox.registerRoute(new workbox.NavigationRoute(workbox.createHandlerBoundToURL("index.html"), { workbox.registerRoute(new workbox.NavigationRoute(workbox.createHandlerBoundToURL("index.html"), {