fix: rootless UID mapping corrections + credential injection

- Correct off-by-one in UID mapping: container UID N → host UID (100000 + N - 1), not (100000 + N)
- Deploy script auto-fixes UID ownership on every deploy
- Bitcoin UI nginx uses __BITCOIN_RPC_AUTH__ placeholder injected from secrets at deploy time
- Container rules updated for rootless podman architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

parent bf0cd342ca
commit 5008cb6d1f
@@ -5,15 +5,46 @@ globs:
- "**/*podman*"
- "**/Containerfile"
- "**/Dockerfile"
- "**/first-boot*"
- "**/container-doctor*"
---

# Container Security Rules (Archipelago — Rootless Podman)

## Rootless Podman Architecture
- Podman runs as `archipelago` user (UID 1000), NOT root — never use `sudo podman`
- UID namespace mapping via subuid: container UID 0 → host UID 1000 (`archipelago`); container UID N ≥ 1 → host UID (100000 + N - 1)
- Container images stored in `~/.local/share/containers/storage/` (NOT /var/lib/containers)
- Container subnet: `10.89.0.0/16` (rootless), not `10.88.0.0/16` (rootful)
- XDG_RUNTIME_DIR must be `/run/user/1000` — required for podman socket
- `loginctl enable-linger archipelago` required for containers to survive logout

## Container Security (Non-Negotiable)
- Drop ALL capabilities, add only what's required (`--cap-drop=ALL --cap-add=...`)
- Set `--security-opt=no-new-privileges:true` on all containers
- Use `--read-only` + tmpfs where possible (safe apps: searxng, grafana, filebrowser, electrumx, nostr-rs-relay, ollama, indeedhub)
- Pin image versions — never use `:latest` tag
- Mount secrets as read-only files; avoid passing them as environment variables whenever possible
- Set memory and CPU limits on all containers
- All containers must have `--restart unless-stopped`

## Volume Ownership (Critical for Rootless)
- Volume directories must be owned by the MAPPED UID, not the container UID
- Formula: `host_uid = 100000 + container_uid - 1` for container UID ≥ 1; container UID 0 maps to the `archipelago` user (host UID 1000)
- UID 0 (most apps) → `sudo chown -R 1000:1000 /var/lib/archipelago/{app}`
- UID 101 (bitcoin) → `sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin`
- UID 70 (postgres) → `sudo chown -R 100069:100069 /var/lib/archipelago/postgres-*`
- UID 472 (grafana) → `sudo chown -R 100471:100471 /var/lib/archipelago/grafana`
- UID 999 (mariadb) → `sudo chown -R 100998:100998 /var/lib/archipelago/mysql-*`
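
Under Podman's default rootless mapping, container root is the podman user itself and container UID N ≥ 1 lands at the start of the subordinate range plus N - 1 (the off-by-one this commit corrects). A minimal sketch that encodes that rule as a shell function, assuming podman user UID 1000, the `/etc/subuid` entry `archipelago:100000:65536`, and no custom `--uidmap`/`--userns` options; `host_uid` is a hypothetical helper name, not part of Podman.

```bash
# Sketch: map a container UID to its host UID under Podman's default rootless layout.
# Assumed values (adjust if /etc/subuid differs): podman user UID 1000 and the
# subordinate range archipelago:100000:65536. Custom --uidmap/--userns is not handled.
PODMAN_USER_UID=1000
SUBUID_START=100000
SUBUID_COUNT=65536

# host_uid CONTAINER_UID -> prints the host UID; returns 1 if the UID is not mapped
host_uid() {
  if [ "$1" -eq 0 ]; then
    # container root is the podman user itself, not the first subordinate UID
    echo "$PODMAN_USER_UID"
  elif [ "$1" -ge 1 ] && [ "$1" -le "$SUBUID_COUNT" ]; then
    # container UID 1 lands on SUBUID_START, hence the "- 1"
    echo $((SUBUID_START + $1 - 1))
  else
    echo "container UID $1 is outside the mapped range" >&2
    return 1
  fi
}

# Owners for the container UIDs used in these rules
for uid in 0 70 101 472 999; do
  printf '%-4s -> %s\n' "$uid" "$(host_uid "$uid")"
done
```

The loop prints `1000`, `100069`, `100100`, `100471` and `100998` for container UIDs 0, 70, 101, 472 and 999. The same number works for the group in `chown`, assuming `/etc/subgid` uses the same range and `archipelago`'s primary GID is also 1000.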

## Systemd Service Requirements
- `ProtectHome=no` — podman needs `~/.local/share/containers/`
- `PrivateTmp=no` — podman runtime uses `/tmp/podman-run-1000/`
- `RestrictNamespaces=` must NOT be set — rootless podman creates user namespaces
- `SystemCallFilter=` must NOT be set — rootless podman needs clone/unshare
- UFW `DEFAULT_FORWARD_POLICY="ACCEPT"` — required for LAN access to container ports

## Network Rules
- Apps needing inter-container DNS: use `--network=archy-net` (bitcoin, lnd, electrumx, mempool, btcpay, fedimint)
- Standalone apps: default bridge network
- Tailscale only: `--network=host` + `NET_ADMIN` + `NET_RAW` + `/dev/net/tun`

@@ -4,6 +4,7 @@ description: >
Comprehensive Podman container diagnostic for Archipelago. Audits all running containers,
port mappings, network connectivity, health status, restart policies, and config consistency
across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing).
Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
Use when asked to "diagnose containers", "check podman", "why is app not working",
"container health check", "port not reachable", "audit containers", "podman status",
or when any container/app is misbehaving.
@@ -12,46 +13,123 @@ allowed-tools: Bash Read Glob Grep

# Podman Doctor — Container Infrastructure Diagnostics

Systematic diagnostic for Archipelago's **rootless Podman** container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, UID mapping issues, and config drift across all layers.

**SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`

> **ROOTLESS PODMAN**: Archipelago runs Podman as the `archipelago` user (UID 1000), NOT root.
> Never use `sudo podman` — use plain `podman` after SSH'ing in as the `archipelago` user.
> Container UIDs are mapped via subuid: container UID 0 → host UID 1000 (`archipelago`); container UID N ≥ 1 → host UID (100000 + N - 1).

If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise, run the full audit.

## Workflow

### Step 1: Gather Runtime State

Run these on the server (as `archipelago` user — NO sudo):

```bash
# All containers with status, ports, networks
podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"

# Check for port conflicts on known ports
# (without sudo, -p only names processes owned by archipelago; the listening ports are still all listed)
ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b"
```

### Step 2: Rootless Podman Health Check

Rootless Podman has specific requirements that must be verified:

```bash
# Verify running as archipelago user (NOT root)
whoami   # Must be "archipelago"
id       # Must show uid=1000(archipelago)

# Check XDG_RUNTIME_DIR is set (required for rootless podman socket)
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"   # Must be /run/user/1000

# Verify subuid/subgid mapping exists
grep '^archipelago:' /etc/subuid   # Must show: archipelago:100000:65536
grep '^archipelago:' /etc/subgid   # Must show: archipelago:100000:65536

# Verify user lingering is enabled (keeps user services running after logout)
ls /var/lib/systemd/linger/ | grep archipelago   # Must exist

# Check podman storage is accessible
podman info --format "{{.Store.GraphRoot}}"   # ~/.local/share/containers/storage
ls -la ~/.local/share/containers/storage/ 2>/dev/null || echo "ERROR: Storage not accessible"

# Check podman socket
ls -la /run/user/1000/podman/ 2>/dev/null || echo "WARNING: No podman socket directory"
```
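
The two `grep` lines only display the entries, so a wrong range is easy to read past. A minimal sketch that validates an entry against the expected `USER:START:COUNT` values, assuming the standard one-entry-per-line layout of `/etc/subuid` and `/etc/subgid`; `check_subid` is a hypothetical helper name.

```bash
# check_subid FILE USER START COUNT
# Sketch: verify that FILE (/etc/subuid or /etc/subgid) holds the expected
# USER:START:COUNT entry. Prints OK/WRONG/MISSING; returns 1 unless OK.
check_subid() {
  file=$1 user=$2 start=$3 count=$4
  # first entry for this user; each line has the form user:start:count
  entry=$(grep "^${user}:" "$file" 2>/dev/null | head -n 1)
  if [ -z "$entry" ]; then
    echo "MISSING: no $user entry in $file"
    return 1
  fi
  if [ "$entry" = "$user:$start:$count" ]; then
    echo "OK: $entry"
  else
    echo "WRONG: $entry in $file (expected $user:$start:$count)"
    return 1
  fi
}
```

On the server, run `check_subid /etc/subuid archipelago 100000 65536` and the same call against `/etc/subgid`.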

### Step 3: Check Restart Policies

Every container MUST have `--restart unless-stopped`; a missing restart policy is the #1 cause of downtime after reboots.

```bash
for c in $(podman ps -a --format "{{.Names}}"); do
  echo -n "$c: "
  podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done
```

**Red flag**: `no` or empty = container won't survive reboot.

### Step 4: Volume Ownership Audit (Rootless UID Mapping)

Rootless Podman maps container UIDs via subuid. Volume directories must be owned by the MAPPED UID, not the container UID. Container UID 0 maps to the `archipelago` user (host UID 1000); any other container UID maps to `host_uid = 100000 + container_uid - 1`.

```bash
echo "=== Volume Ownership Check ==="

# Default containers (run as root inside = UID 0 → host UID 1000, the archipelago user)
for dir in lnd fedimint homeassistant jellyfin vaultwarden photoprism ollama filebrowser electrumx btcpay immich; do
  if [ -d "/var/lib/archipelago/$dir" ]; then
    owner=$(stat -c '%u:%g' "/var/lib/archipelago/$dir" 2>/dev/null)
    if [ "$owner" != "1000:1000" ]; then
      echo "WRONG: /var/lib/archipelago/$dir owned by $owner (should be 1000:1000)"
    else
      echo "  OK: $dir → $owner"
    fi
  fi
done

# Bitcoin Knots (container UID 101 → host UID 100100)
if [ -d "/var/lib/archipelago/bitcoin" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/bitcoin")
  [ "$owner" != "100100:100100" ] && echo "WRONG: bitcoin owned by $owner (should be 100100:100100)" || echo "  OK: bitcoin → $owner"
fi

# PostgreSQL (container UID 70 → host UID 100069)
for dir in /var/lib/archipelago/*-db /var/lib/archipelago/postgres-*; do
  if [ -d "$dir" ]; then
    owner=$(stat -c '%u:%g' "$dir")
    [ "$owner" != "100069:100069" ] && echo "WRONG: $dir owned by $owner (should be 100069:100069)" || echo "  OK: $(basename "$dir") → $owner"
  fi
done

# Grafana (container UID 472 → host UID 100471)
if [ -d "/var/lib/archipelago/grafana" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/grafana")
  [ "$owner" != "100471:100471" ] && echo "WRONG: grafana owned by $owner (should be 100471:100471)" || echo "  OK: grafana → $owner"
fi

# MariaDB/MySQL (container UID 999 → host UID 100998)
if [ -d "/var/lib/archipelago/mysql-mempool" ]; then
  owner=$(stat -c '%u:%g' "/var/lib/archipelago/mysql-mempool")
  [ "$owner" != "100998:100998" ] && echo "WRONG: mysql-mempool owned by $owner (should be 100998:100998)" || echo "  OK: mysql-mempool → $owner"
fi
```
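
Each block in this step repeats the same `stat` comparison. A table-driven sketch of the same audit, where each row pairs a data directory with its expected host owner. The rows are an illustrative subset, the owners assume Podman's default rootless mapping (container root → UID 1000, container UID N ≥ 1 → 100000 + N - 1), and `check_owner`, `audit_volumes` and `ARCHY_DATA` are hypothetical names.

```bash
# Base directory for app data (hypothetical override hook; default matches this doc)
ARCHY_DATA=${ARCHY_DATA:-/var/lib/archipelago}

# check_owner DIR EXPECTED_UID:GID -> prints OK/WRONG; apps that are not installed are skipped
check_owner() {
  [ -d "$1" ] || return 0
  actual=$(stat -c '%u:%g' "$1")
  if [ "$actual" = "$2" ]; then
    echo "  OK: $1 -> $actual"
  else
    echo "WRONG: $1 owned by $actual (should be $2)"
    return 1
  fi
}

# One row per data directory: name, expected host owner.
# Owners assume the default rootless mapping (0 -> 1000, N -> 100000 + N - 1).
audit_volumes() {
  status=0
  while read -r name owner; do
    check_owner "$ARCHY_DATA/$name" "$owner" || status=1
  done <<'EOF'
lnd            1000:1000
bitcoin        100100:100100
grafana        100471:100471
mysql-mempool  100998:100998
EOF
  return "$status"
}
```

Run `audit_volumes`; it exits non-zero if any listed directory has the wrong owner, so it can gate a deploy script. Adding an app is one more table row rather than another `if` block.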

### Step 5: Verify Port Mapping Consistency

Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs:

**Layer 1 — Backend Config (Rust)**: Read `core/archipelago/src/api/rpc/package.rs`, look at `get_app_config()` port mappings.

**Layer 2 — Podman Runtime**: `podman ps --format "{{.Names}}: {{.Ports}}"`

**Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks:
- `image-recipe/configs/nginx-archipelago.conf` (HTTP)
@@ -66,77 +144,114 @@ Cross-reference these 4 layers — mismatches between ANY two cause "app not loa
| Works on port but not /app/ path | Missing nginx location block |
| Frontend can't find app | PORT_TO_APP_ID missing in appLauncher.ts |

### Step 6: Network Connectivity Audit

```bash
# Networks and their containers
podman network ls
podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!"

# Check container subnet (rootless uses 10.89.x.x, NOT 10.88.x.x)
podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" 2>/dev/null
```

**Must be on archy-net**: bitcoin-knots, lnd, electrs/electrumx, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui

**Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network)
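
The two lists can be checked mechanically. A minimal sketch that compares one container's networks against them; the names are copied from the lists, `in_list` and `net_verdict` are hypothetical helpers, and the caller is assumed to supply the container's networks as a space-separated string.

```bash
# Expected placement, copied from the two lists in this step
MUST_ON="bitcoin-knots lnd electrs electrumx mempool btcpay-server nbxplorer fedimint fedimint-gateway nostr-rs-relay indeedhub ollama open-webui"
MUST_NOT="grafana nextcloud filebrowser vaultwarden bitcoin-ui lnd-ui tailscale"

# in_list WORD LIST -> success if WORD is one of the space-separated words in LIST
in_list() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# net_verdict CONTAINER "NET1 NET2 ..." -> prints OK, MISSING or UNEXPECTED
net_verdict() {
  if in_list "$1" "$MUST_ON" && ! in_list archy-net "$2"; then
    echo "MISSING: $1 is not on archy-net"
  elif in_list "$1" "$MUST_NOT" && in_list archy-net "$2"; then
    echo "UNEXPECTED: $1 is on archy-net"
  else
    echo "OK: $1"
  fi
}
```

One way to feed it on the server is `net_verdict "$c" "$(podman inspect "$c" --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}')"`. Treat that inspect template as an assumption to confirm against the installed Podman version.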

### Step 7: UFW Forward Policy Check

Rootless Podman requires `DEFAULT_FORWARD_POLICY="ACCEPT"` in UFW; otherwise, container ports are unreachable from the LAN.

```bash
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
# Must be "ACCEPT", NOT "DROP"
# If DROP: containers work locally but NOT from other machines on the network
```
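
A sketch that turns this check into a pass/fail result. It assumes the `KEY="VALUE"` line format used in `/etc/default/ufw` and takes the file path as an argument so it can be tested against a copy; `forward_policy` is a hypothetical helper name.

```bash
# forward_policy FILE -> prints the DEFAULT_FORWARD_POLICY value with quotes stripped
forward_policy() {
  # commented-out lines start with '#' and are skipped; take the last assignment
  sed -n 's/^[[:space:]]*DEFAULT_FORWARD_POLICY=//p' "$1" | tail -n 1 | tr -d '"'
}

policy=$(forward_policy /etc/default/ufw 2>/dev/null)
case "$policy" in
  ACCEPT) echo "OK: DEFAULT_FORWARD_POLICY=ACCEPT" ;;
  "")     echo "UNKNOWN: DEFAULT_FORWARD_POLICY not found" ;;
  *)      echo "WRONG: DEFAULT_FORWARD_POLICY=$policy (LAN clients cannot reach container ports)" ;;
esac
```

After changing the value to `ACCEPT`, `sudo ufw reload` applies it.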

### Step 8: Systemd Service Sandbox Check

The `archipelago.service` must have specific settings relaxed for rootless Podman:

```bash
# Check critical settings
systemctl cat archipelago.service | grep -E "ProtectHome|PrivateTmp|RestrictNamespaces|SystemCallFilter|ReadWritePaths|XDG_RUNTIME_DIR"
```

**Required settings for rootless Podman**:
- `ProtectHome=no` — podman stores images in `~/.local/share/containers/`
- `PrivateTmp=no` — podman runtime uses `/tmp/podman-run-1000/`
- `RestrictNamespaces=` must NOT be set — rootless podman needs user namespaces
- `SystemCallFilter=` must NOT be set — rootless podman needs clone/unshare
- `ReadWritePaths=` must include `/var/lib/archipelago /run/user /tmp`
- `Environment=XDG_RUNTIME_DIR=/run/user/1000`
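
The `grep` in this step lists what is present, so a missing `Environment=` line is easy to overlook. A minimal sketch that checks the required settings against unit text read from stdin. It assumes the exact spellings used in this list, treats any non-empty `RestrictNamespaces=`/`SystemCallFilter=` as a violation, and does plain text matching without resolving drop-in override order; `has`, `fail` and `audit_unit` are hypothetical names.

```bash
# Helpers: $unit holds the unit text without comment lines
has()  { printf '%s\n' "$unit" | grep -q -- "$1"; }
fail() { echo "FAIL: $1"; rc=1; }

# audit_unit < unit-text -> one FAIL line per violated rule; returns 1 if any rule fails
audit_unit() {
  unit=$(grep -v '^[[:space:]]*[#;]')   # systemd comments start with '#' or ';'
  rc=0
  has '^ProtectHome=no$'         || fail "ProtectHome=no missing"
  has '^PrivateTmp=no$'          || fail "PrivateTmp=no missing"
  has '^RestrictNamespaces=.'    && fail "RestrictNamespaces= is set"
  has '^SystemCallFilter=.'      && fail "SystemCallFilter= is set"
  has 'XDG_RUNTIME_DIR=/run/user/1000' || fail "XDG_RUNTIME_DIR=/run/user/1000 missing"
  for p in /var/lib/archipelago /run/user /tmp; do
    # substring match on the ReadWritePaths= line(s)
    printf '%s\n' "$unit" | grep '^ReadWritePaths=' | grep -qF -- "$p" \
      || fail "ReadWritePaths= lacks $p"
  done
  return "$rc"
}
```

Run it as `systemctl cat archipelago.service | audit_unit`; no output and exit status 0 means every rule above is satisfied.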

### Step 9: Health Check Status

```bash
# Containers with health checks — are they passing?
for c in $(podman ps --format "{{.Names}}"); do
  health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
  if [ -n "$health" ] && [ "$health" != "<no value>" ]; then
    echo "$c: $health"
  fi
done

# Containers WITHOUT health checks (gap in monitoring)
for c in $(podman ps --format "{{.Names}}"); do
  hc=$(podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null)
  if [ "$hc" = "<nil>" ] || [ -z "$hc" ]; then
    echo "NO HEALTHCHECK: $c"
  fi
done
```

### Step 10: Resource & Failure Analysis

```bash
# Resource usage
podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}"

# Recent deaths (last 24h); --stream=false makes podman events exit instead of waiting for new events
podman events --stream=false --filter event=died --since 24h 2>/dev/null | tail -20

# OOM kills
podman ps -a --format "{{.Names}}" | while read -r c; do
  oom=$(podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null)
  [ "$oom" = "true" ] && echo "OOM KILLED: $c"
done

# Exited containers (the Status column shows the exit code, e.g. "Exited (1)")
podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}"
```
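
For the "X running, Y stopped, Z unhealthy" line of the report, a sketch that tallies container states from one status line per container. It assumes the `podman ps` Status strings look like `Up 2 hours (healthy)` or `Exited (1) 3 minutes ago`, which is worth confirming on the installed Podman version; `summarize_states` is a hypothetical helper.

```bash
# summarize_states < status-lines -> "Total containers: X running, Y stopped, Z unhealthy"
# Expects one `podman ps -a --format "{{.Status}}"` line per container (format assumed).
summarize_states() {
  awk '
    /^Up/           { running++ }
    !/^Up/          { stopped++ }     # Exited, Created, Paused, ... all count as not running
    /\(unhealthy\)/ { unhealthy++ }
    END { printf "Total containers: %d running, %d stopped, %d unhealthy\n", running, stopped, unhealthy }
  '
}
```

Usage: `podman ps -a --format "{{.Status}}" | summarize_states`.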

### Step 11: Systemd Integration

```bash
systemctl is-active archipelago nginx
systemctl --user list-units --type=service 2>/dev/null | grep -i podman
systemctl list-timers --all | grep -i -E "podman|container|archipelago"
```

### Step 12: Generate Report

Produce a structured report:

```
## Container Diagnostic Report

### Rootless Podman Status
- User: archipelago (UID 1000)
- Subuid mapping: [OK/MISSING]
- XDG_RUNTIME_DIR: [OK/MISSING]
- User linger: [enabled/disabled]
- UFW forward policy: [ACCEPT/DROP]

### Summary
- Total containers: X running, Y stopped, Z unhealthy
- Port conflicts: [list or "none"]
- Missing restart policies: [list or "none"]
- Network issues: [list or "none"]
- UID mapping issues: [list or "none"]
- Health check gaps: [list]

### Critical Issues (fix immediately)
@@ -154,3 +269,7 @@ After diagnosis, suggest running `/podman-fix` for any issues found.

## Port Reference

See `references/port-map.md` for the canonical port assignment table across all 4 layers.

## UID Mapping Reference

See `references/uid-mapping.md` for the complete rootless UID mapping table.

@@ -1,15 +1,31 @@
# Common Podman Failure Patterns

## Rootless Podman Specific Failures

| Error | Cause | Fix |
|-------|-------|-----|
| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` |
| `Error: unshare: operation not permitted` | Systemd `RestrictNamespaces` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` |
| `Error: could not get runtime: creating runtime` | XDG_RUNTIME_DIR not set or /run/user/1000 missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in service, ensure `loginctl enable-linger archipelago` |
| `permission denied` on volume mount | Wrong UID ownership — must use mapped UIDs | `sudo chown -R 1000:1000 /var/lib/archipelago/APP` for apps running as root in the container; other UIDs per the UID mapping table |
| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless | Run `podman system migrate`, check `/etc/subuid` |
| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (destroys all containers — last resort) |
| `Error: stat /tmp/podman-run-1000/...: no such file` | PrivateTmp=yes in systemd isolates /tmp | Set `PrivateTmp=no` in `archipelago.service` |
| Container ports unreachable from LAN | UFW DEFAULT_FORWARD_POLICY="DROP" | Change to "ACCEPT" in `/etc/default/ufw`, then `sudo ufw reload` |
| `Error: error creating network namespace` | Systemd `SystemCallFilter` blocks clone/unshare | Remove `SystemCallFilter=` from `archipelago.service` |
| Containers lose network after service restart | podman runtime dir in /tmp cleaned | Ensure `PrivateTmp=no` so /tmp/podman-run-1000/ persists |

## Container Won't Start

| Error | Cause | Fix |
|-------|-------|-----|
| `exec format error` | Binary built on wrong arch | Rebuild on the Linux server |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership with mapped UID, add tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME && recreate` |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |
| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` |

## Container Starts But App Unreachable

@@ -20,6 +36,7 @@
| Port mapped but refused | Container logs | App crashing internally — check logs |
| Works sometimes | Resources | Check OOM kills, CPU, disk space |
| 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted |
| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` |

## Container Keeps Dying

@@ -29,6 +46,8 @@
| Dies after minutes | OOM killed | Increase `--memory` limit |
| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Repeated crash | Fix root cause, don't just restart |
| Exit code 127 | Missing binary in container | Wrong image tag or corrupted image — re-pull |
| Exit code 137 | Killed by SIGKILL (128 + 9): OOM killer or an explicit kill | Check `dmesg` for OOM kill, check `podman inspect` for OOMKilled |
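
Exit codes above 128 follow the shell convention of 128 plus the signal number, which is why 137 means SIGKILL (9). A small sketch for decoding the code shown as `Exited (N)` in `podman ps -a`; `explain_exit` is a hypothetical name and only a few common signals are named explicitly.

```bash
# explain_exit CODE -> one-line explanation of a container exit code
explain_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))            # 128 + signal number convention
    case $sig in
      9)  name=SIGKILL ;;          # OOM killer or `podman kill` (default signal KILL)
      15) name=SIGTERM ;;          # normal stop request
      6)  name=SIGABRT ;;
      11) name=SIGSEGV ;;
      *)  name="signal $sig" ;;
    esac
    echo "$code: killed by $name"
  elif [ "$code" -eq 127 ]; then
    echo "127: command not found (missing binary in the image)"
  elif [ "$code" -eq 126 ]; then
    echo "126: command found but not executable"
  elif [ "$code" -eq 0 ]; then
    echo "0: clean exit"
  else
    echo "$code: application error (see podman logs)"
  fi
}
```

For example, `explain_exit 137` prints `137: killed by SIGKILL`; pair it with the `OOMKilled` check to tell the OOM killer apart from a manual kill.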

## Network Issues

@@ -37,6 +56,20 @@
| Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on same network |
| Bitcoin RPC refused from container | rpcallowip wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe: port mapped, not exposed) |
| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on new archy-net (rootless uses 10.89.x.x) |

## Volume Permission Patterns (Rootless UID Mapping)

Formula: **host_uid = 100000 + container_uid - 1** for container UID ≥ 1; container UID 0 (root) maps to the `archipelago` user, host UID 1000.

| Container UID | Host UID | Apps | Data Directory |
|---|---|---|---|
| 0 (root) | 1000 (`archipelago`) | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | `/var/lib/archipelago/{app}` |
| 70 | 100069 | postgres (btcpay-db, immich-db, penpot-postgres) | `/var/lib/archipelago/postgres-*`, `btcpay-db`, `immich-db` |
| 101 | 100100 | bitcoin-knots | `/var/lib/archipelago/bitcoin` |
| 472 | 100471 | grafana | `/var/lib/archipelago/grafana` |
| 999 | 100998 | MariaDB (mysql-mempool) | `/var/lib/archipelago/mysql-mempool` |

## Capability Reference

@@ -47,9 +80,23 @@
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes |

## Read-Only Safe Apps

Only these apps can run with `--read-only` + tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub.

All others need writable root or will fail silently.
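
When building `podman run` arguments, this allow-list can be applied with a small lookup. A sketch that prints the extra flags for a given app; the names come from the list above, while mounting the tmpfs at `/tmp` is an assumption (individual apps may need other writable paths), and `READONLY_SAFE`/`readonly_flags` are hypothetical names.

```bash
# Apps allowed to run with --read-only, copied from the list above
READONLY_SAFE="searxng grafana filebrowser electrumx mempool-electrs electrs nostr-rs-relay ollama indeedhub"

# readonly_flags APP -> prints read-only flags for safe apps, nothing for the rest
readonly_flags() {
  case " $READONLY_SAFE " in
    *" $1 "*) echo "--read-only --tmpfs /tmp" ;;   # /tmp is an assumed scratch path
    *) ;;                                          # app needs a writable root filesystem
  esac
}
```

The output is meant to be word-split on purpose, as in `podman run $(readonly_flags grafana) ...`; apps not on the list get no extra flags.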

## Systemd Sandbox Requirements for Rootless Podman

These systemd service settings MUST be configured for rootless Podman to work:

| Setting | Required Value | Why |
|---------|---------------|-----|
| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` |
| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` |
| `RestrictNamespaces=` | NOT SET | Rootless podman creates user namespaces |
| `SystemCallFilter=` | NOT SET | Rootless podman needs clone/unshare syscalls |
| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data + podman runtime paths |
| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location |

.claude/skills/podman-doctor/references/uid-mapping.md (new file, 93 lines)
@@ -0,0 +1,93 @@

# Rootless Podman UID Mapping Reference

## How Rootless UID Mapping Works

When Podman runs as the `archipelago` user (UID 1000), container processes don't run as their "apparent" UID on the host. Instead, Linux user namespaces remap UIDs.

**Mapping formula**: container UID 0 → host UID 1000 (the `archipelago` user itself); container UID N ≥ 1 → `host_uid = 100000 + container_uid - 1`

The subordinate range used for container UIDs 1 to 65536 is configured in `/etc/subuid` and `/etc/subgid`; container root always maps to the user running Podman:
```
archipelago:100000:65536
```

This means:
- Container UID 0 (root inside container) → Host UID 1000 (`archipelago`, unprivileged on host)
- Container UID 1 → Host UID 100000 (first subordinate UID)
- Container UID 70 (postgres) → Host UID 100069
- Container UID 101 (bitcoin) → Host UID 100100
- etc.
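
Instead of relying on the formula, the live mapping can be read from the kernel: `podman unshare cat /proc/self/uid_map` prints one `INSIDE OUTSIDE COUNT` line per range, expected here to be `0 1000 1` and `1 100000 65536`. A sketch that resolves a container UID against that output; `map_uid` is a hypothetical helper name.

```bash
# map_uid CONTAINER_UID < uid_map -> prints the host UID; exits 1 if the UID is unmapped.
# Input lines follow the /proc/<pid>/uid_map layout: <first inside> <first outside> <count>
map_uid() {
  awk -v c="$1" '
    BEGIN { c += 0 }                                    # force numeric comparison
    c >= $1 && c < $1 + $3 { print $2 + (c - $1); found = 1; exit }
    END { if (!found) exit 1 }
  '
}
```

On the server, `podman unshare cat /proc/self/uid_map | map_uid 101` should print `100100`. Because it reads the real map, it stays correct if `/etc/subuid` ever changes.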
|
||||||
|
|
||||||
|
## Why This Matters
|
||||||
|
|
||||||
|
Volume directories (bind mounts) on the host must be owned by the **mapped** UID, not the container UID. If Bitcoin runs as UID 101 inside its container, the host directory must be owned by UID 100101.
|
||||||
|
|
||||||
|
If ownership is wrong, the container gets `permission denied` when trying to read/write its data.

## Complete UID Mapping Table

| Container UID | Host UID | Containers | Fix Command |
|---|---|---|---|
| 0 (root) | 1000 | lnd, fedimint, fedimint-gateway, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay-server, nbxplorer, immich, nostr-rs-relay, strfry, nextcloud, searxng, onlyoffice, tailscale, uptime-kuma | `sudo chown -R 1000:1000 /var/lib/archipelago/{app}` |
| 70 | 100069 | postgres (btcpay-db, immich-db, penpot-postgres) | `sudo chown -R 100069:100069 /var/lib/archipelago/postgres-*` |
| 101 | 100100 | bitcoin-knots, bitcoin-core | `sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin` |
| 472 | 100471 | grafana | `sudo chown -R 100471:100471 /var/lib/archipelago/grafana` |
| 999 | 100998 | MariaDB (mysql-mempool) | `sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool` |

## How to Find a Container's UID

If you encounter a new container with permission issues:

```bash
# Check what user the container runs as
podman inspect CONTAINER_NAME --format "{{.Config.User}}"

# If empty, the entrypoint starts as root (UID 0) → host UID 1000.
# Some entrypoints (e.g. the official postgres image) then drop to an unprivileged UID.

# If it shows a username, find the UID inside the image
podman run --rm IMAGE_NAME id

# Then calculate: host_uid = 100000 + container_uid - 1 (for container_uid ≥ 1)

# Cross-check a running container: container user vs. host user of each process
podman top CONTAINER_NAME user huser
```
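
The decision steps in the comments above can be folded into one helper, which is handy when auditing many containers in a loop. A minimal sketch; `resolve_config_user` is a hypothetical name, and the input is assumed to be the raw `{{.Config.User}}` string, which may carry a `:group` suffix.

```bash
#!/usr/bin/env bash
# resolve_config_user VALUE   (hypothetical helper)
# VALUE is the output of: podman inspect NAME --format "{{.Config.User}}"
# Accepts "", "101", "101:101", "bitcoin", "bitcoin:bitcoin".
# Prints the container UID when VALUE alone determines it, or "lookup"
# when VALUE is a user name that must be resolved inside the image.
resolve_config_user() {
  local user=${1%%:*}              # drop an optional ":group" suffix
  if [ -z "$user" ]; then
    echo 0                         # no User set: the entrypoint starts as root
  elif [[ $user =~ ^[0-9]+$ ]]; then
    echo "$user"                   # already numeric
  else
    echo lookup                    # a name such as "bitcoin"
  fi
}
```

A numeric result goes straight into the mapping formula; `lookup` means running the `podman run --rm IMAGE_NAME id` step for that image.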

## Fix Script

Run this after any fresh install, migration, or when containers have permission errors:

```bash
#!/bin/bash
# Fix all rootless podman volume ownership
# Default rootless mapping: container UID 0 → host 1000, container UID N ≥ 1 → host 100000 + N - 1

# UID 0 → 1000 (most containers)
for dir in lnd fedimint fedimint-gateway homeassistant jellyfin vaultwarden photoprism \
    ollama filebrowser electrumx btcpay nbxplorer immich nostr-rs-relay nextcloud \
    searxng onlyoffice uptime-kuma; do
  [ -d "/var/lib/archipelago/$dir" ] && sudo chown -R 1000:1000 "/var/lib/archipelago/$dir"
done

# UID 101 → 100100 (Bitcoin)
[ -d "/var/lib/archipelago/bitcoin" ] && sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin

# UID 70 → 100069 (PostgreSQL)
for dir in /var/lib/archipelago/postgres-* /var/lib/archipelago/btcpay-db /var/lib/archipelago/immich-db; do
  [ -d "$dir" ] && sudo chown -R 100069:100069 "$dir"
done

# UID 999 → 100998 (MariaDB)
[ -d "/var/lib/archipelago/mysql-mempool" ] && sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool

# UID 472 → 100471 (Grafana)
[ -d "/var/lib/archipelago/grafana" ] && sudo chown -R 100471:100471 /var/lib/archipelago/grafana
```
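
The script above repeats the UID arithmetic by hand for each group, which makes it easy to get one group wrong. A table-driven variant keeps only container UIDs in one place and derives host IDs from the default rootless mapping. A minimal sketch, not the project's deploy script: the directory list is an abbreviated subset of the mapping table, `BASE` and `DRY_RUN` are hypothetical knobs, and by default (`DRY_RUN=1`) it only prints the `chown` commands it would run.

```bash
#!/usr/bin/env bash
# Table-driven ownership fix (sketch). Needs bash 4+ for associative arrays.
BASE=${BASE:-/var/lib/archipelago}
DRY_RUN=${DRY_RUN:-1}

# Directory name -> UID the process uses *inside* the container
declare -A CONTAINER_UID=(
  [lnd]=0 [vaultwarden]=0 [nextcloud]=0
  [bitcoin]=101 [grafana]=472 [mysql-mempool]=999
  [btcpay-db]=70 [immich-db]=70
)

# Default rootless mapping: 0 -> 1000 (archipelago), N >= 1 -> 100000 + N - 1
host_id() {
  if [ "$1" -eq 0 ]; then echo 1000; else echo $((100000 + $1 - 1)); fi
}

fix_ownership() {
  local dir id
  for dir in "${!CONTAINER_UID[@]}"; do
    [ -d "$BASE/$dir" ] || continue        # skip apps that are not installed
    id=$(host_id "${CONTAINER_UID[$dir]}")
    if [ "$DRY_RUN" = 1 ]; then
      echo "sudo chown -R $id:$id $BASE/$dir"
    else
      sudo chown -R "$id:$id" "$BASE/$dir"
    fi
  done
}
```

Calling `fix_ownership` prints the plan; `DRY_RUN=0 fix_ownership` applies it. Adding a new app then means adding one `[dir]=container_uid` entry instead of working out the host UID by hand.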

## Rootful vs Rootless Comparison

| Aspect | Rootful (old) | Rootless (current) |
|--------|---------------|-------------------|
| Podman command | `sudo podman` | `podman` (as archipelago user) |
| Container storage | `/var/lib/containers/storage` | `~/.local/share/containers/storage` |
| Container subnet | `10.88.0.0/16` | `10.89.0.0/16` |
| Volume ownership | Container UID directly | Mapped UID (root → 1000; UID N ≥ 1 → 100000 + N - 1) |
| Requires root? | Yes | No (except fixing volume ownership) |
| XDG_RUNTIME_DIR | Not needed | Required: `/run/user/1000` |
| User lingering | Not needed | Required: `loginctl enable-linger` |
| Systemd restrictions | All can be enabled | Must disable: RestrictNamespaces, SystemCallFilter |
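
Several rows of this table can be checked mechanically on a running host. A minimal sketch, assuming the `podman info` template field `.Host.Security.Rootless` (present in recent Podman releases; confirm on the installed version) and systemd's linger marker files under `/var/lib/systemd/linger/`. `check_rootless_host`, `PODMAN` and `LINGER_DIR` are hypothetical names; the last two are overridable so the checks can be exercised without a live host.

```bash
#!/usr/bin/env bash
# check_rootless_host [USER] [UID]   (hypothetical helper)
PODMAN=${PODMAN:-podman}
LINGER_DIR=${LINGER_DIR:-/var/lib/systemd/linger}

check_rootless_host() {
  local user=${1:-archipelago} uid=${2:-1000} fails=0
  # 1. Podman itself reports rootless mode
  if [ "$($PODMAN info --format '{{.Host.Security.Rootless}}' 2>/dev/null)" != "true" ]; then
    echo "FAIL: podman is not running rootless"; fails=$((fails + 1))
  fi
  # 2. XDG_RUNTIME_DIR points at the user's runtime directory
  if [ "${XDG_RUNTIME_DIR:-}" != "/run/user/$uid" ]; then
    echo "FAIL: XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR:-<unset>}, want /run/user/$uid"
    fails=$((fails + 1))
  fi
  # 3. Lingering is enabled, so user services survive logout
  if [ ! -e "$LINGER_DIR/$user" ]; then
    echo "FAIL: linger not enabled (sudo loginctl enable-linger $user)"
    fails=$((fails + 1))
  fi
  if [ "$fails" -eq 0 ]; then
    echo "OK: rootless prerequisites met"
    return 0
  fi
  return 1
}
```

Run as the archipelago user, `check_rootless_host` prints one line per failed check and returns non-zero if any check fails.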

````diff
@@ -2,19 +2,24 @@
 name: podman-fix
 description: >
   Fix Podman container issues on Archipelago — restart failed containers, repair port bindings,
-  fix network connectivity, add missing restart policies, and resolve config drift.
+  fix network connectivity, add missing restart policies, fix rootless UID mapping, and resolve
+  config drift. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
   Use when asked to "fix container", "restart app", "fix port mapping", "container not working",
-  "app won't start", "fix podman", "repair container", "container down", or after /podman-doctor
-  identifies issues to fix.
+  "app won't start", "fix podman", "repair container", "container down", "permission denied",
+  or after /podman-doctor identifies issues to fix.
 allowed-tools: Bash Read Edit Write Glob Grep
 ---
 
 # Podman Fix — Container Remediation
 
-Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
+Targeted fix workflow for **rootless Podman** container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it.
 
 **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
 
+> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo.
+> Only use `sudo` for: chown on volume directories, UFW changes, systemd service edits, nginx reload.
+> Container UIDs are mapped via subuid: container UID 0 → host UID 1000; container UID N ≥ 1 → host UID (100000 + N - 1).
+
 If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing.
 
 ## Fix Procedures
````

````diff
@@ -23,21 +28,22 @@ If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs
 
 ```bash
 # Check why it stopped
-sudo podman logs --tail 50 CONTAINER_NAME
-sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
+podman logs --tail 50 CONTAINER_NAME
+podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}"
 
 # If clean exit or crash — just restart
-sudo podman start CONTAINER_NAME
+podman start CONTAINER_NAME
 
 # If corrupt state — remove and recreate
-sudo podman rm -f CONTAINER_NAME
+podman rm -f CONTAINER_NAME
 # Then recreate using the install flow (trigger from UI or re-run creation command)
 ```
 
-**If container keeps crashing**: check logs for the actual error. Common causes:
+**If container keeps crashing**, check logs for the actual error. Common causes:
 - Missing config file → check if volume mount has the config
-- Wrong permissions → `chown -R` the data directory
+- Wrong permissions → fix UID mapping (see Fix 8 below)
 - Dependency not ready → start dependency first, wait, then start this container
+- Exit code 127 → missing binary in container image, re-pull the image
 
 ### Fix 2: Missing Restart Policy
 
````

````diff
@@ -45,14 +51,14 @@ The most common uptime killer. Fix for ALL containers at once:
 
 ```bash
 # Fix a single container
-sudo podman update --restart unless-stopped CONTAINER_NAME
+podman update --restart unless-stopped CONTAINER_NAME
 
 # Fix ALL containers that have no restart policy
-for c in $(sudo podman ps -a --format "{{.Names}}"); do
-  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+for c in $(podman ps -a --format "{{.Names}}"); do
+  policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
   if [ "$policy" = "no" ] || [ -z "$policy" ]; then
     echo "Fixing restart policy for: $c"
-    sudo podman update --restart unless-stopped "$c"
+    podman update --restart unless-stopped "$c"
   fi
 done
 ```
````

````diff
@@ -66,23 +72,24 @@ done
 #### Port conflict (address already in use)
 ```bash
 # Find what's using the port
-sudo ss -tlnp | grep :PORT_NUMBER
+ss -tlnp | grep :PORT_NUMBER  # process names/PIDs for other users' sockets (e.g. system tor) need sudo
 
 # If it's another container, either change one's port or stop the conflicting one
-sudo podman stop CONFLICTING_CONTAINER
+podman stop CONFLICTING_CONTAINER
 
-# If it's a host process
-sudo kill PID # or stop the service
+# If it's a host process (e.g., system tor vs container tor)
+sudo systemctl stop tor # Stop system service if container needs the port
+sudo systemctl disable tor
 ```
 
 #### Port not mapped (container running but port unreachable)
 ```bash
 # Check current port mappings
-sudo podman port CONTAINER_NAME
+podman port CONTAINER_NAME
 
 # Can't add ports to running container — must recreate
-sudo podman stop CONTAINER_NAME
-sudo podman rm CONTAINER_NAME
+podman stop CONTAINER_NAME
+podman rm CONTAINER_NAME
 # Recreate with correct -p flags (use the Rust install flow or manual podman run)
 ```
 
````

````diff
@@ -124,35 +131,51 @@ Edit `neode-ui/src/stores/appLauncher.ts`:
 #### Container not on archy-net (can't resolve other containers)
 ```bash
 # Connect to archy-net without recreating
-sudo podman network connect archy-net CONTAINER_NAME
+podman network connect archy-net CONTAINER_NAME
 
 # Verify
-sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
+podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}"
 ```
 
 #### archy-net doesn't exist
 ```bash
-sudo podman network create archy-net
+podman network create archy-net
 # Then reconnect all containers that need it
 ```
 
 #### DNS not working inside container
 ```bash
 # Test DNS from inside container
-sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
-  sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots
+podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \
+  podman exec CONTAINER_NAME ping -c1 bitcoin-knots
 
+# If DNS fails, check the container's resolv.conf
+podman exec CONTAINER_NAME cat /etc/resolv.conf
+
 # If DNS fails, recreate container with explicit DNS
 # Add --dns 1.1.1.1 to the podman run command
 ```
 
+#### Container subnet changed (rootful → rootless migration)
+```bash
+# Old rootful subnet: 10.88.0.0/16
+# New rootless subnet: 10.89.0.0/16
+# Bitcoin RPC rpcallowip must be updated if using subnet-specific allowlist
+
+# Check current archy-net subnet
+podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}"
+
+# If Bitcoin RPC refuses connections from containers:
+# Update bitcoin.conf rpcallowip to the archy-net subnet shown above (0.0.0.0/0 also works, but accepts RPC from any address that can reach the port)
+```
+
 ### Fix 5: Health Check Issues
 
 #### Add missing health check to running container
 Can't add to running container — must recreate with health check flags:
 ```bash
 # Example for a web app
-sudo podman run ... \
+podman run ... \
   --health-cmd "curl -f http://localhost:PORT/health || exit 1" \
   --health-interval 30s \
   --health-timeout 5s \
````

````diff
@@ -164,10 +187,10 @@ sudo podman run ... \
 #### Fix unhealthy container
 ```bash
 # See what the health check is actually running
-sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
+podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}"
 
 # Run the health check manually to see the error
-sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
+podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
 
 # Common fixes:
 # - curl not installed in container → use wget or nc instead
````

````diff
@@ -179,13 +202,10 @@ sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND
 
 ```bash
 # Check what capabilities container has
-sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
+podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}"
 
 # If missing required caps, must recreate with correct --cap-add flags
 # Refer to the capability reference in /podman-doctor references
-
-# Fix data directory permissions
-sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/
 ```
 
 ### Fix 7: Full Config Consistency Fix
````

````diff
@@ -199,12 +219,108 @@ When port map is inconsistent across layers, fix ALL layers:
 5. **Deploy**: `./scripts/deploy-to-target.sh --live`
 6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/`
 
+### Fix 8: Rootless UID Mapping (Permission Denied on Volumes)
+
+This is the #1 rootless-specific issue. Container UIDs are remapped by user namespaces.
+
+**Formula**: container UID 0 → host UID 1000 (archipelago); container UID N ≥ 1 → `host_uid = 100000 + N - 1`
+
+```bash
+# Fix UID 0 containers (most apps: root inside maps to the archipelago user, UID 1000, on host)
+sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME
+
+# Fix Bitcoin (container UID 101 → host UID 100100)
+sudo chown -R 100100:100100 /var/lib/archipelago/bitcoin
+
+# Fix PostgreSQL (container UID 70 → host UID 100069)
+sudo chown -R 100069:100069 /var/lib/archipelago/postgres-APP_NAME
+
+# Fix Grafana (container UID 472 → host UID 100471)
+sudo chown -R 100471:100471 /var/lib/archipelago/grafana
+
+# Fix MariaDB (container UID 999 → host UID 100998)
+sudo chown -R 100998:100998 /var/lib/archipelago/mysql-mempool
+```
+
+**How to find the right UID for a new container:**
+```bash
+# Check what user the container image runs as
+podman inspect IMAGE_NAME --format "{{.Config.User}}"
+# If empty = root (UID 0) → host UID 1000
+# If number N → host UID = 100000 + N - 1
+# If username → run: podman run --rm IMAGE_NAME id
+```
+
+After fixing ownership, restart the container:
+```bash
+podman restart CONTAINER_NAME
+```
+
+### Fix 9: UFW Forward Policy (LAN Access Broken)
+
+If containers work locally but not from other machines on the network:
+
+```bash
+# Check current policy
+grep DEFAULT_FORWARD_POLICY /etc/default/ufw
+
+# Fix: change DROP to ACCEPT
+sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw
+sudo ufw reload
+```
+
+### Fix 10: Systemd Sandbox Too Restrictive
+
+If the Rust backend can't scan/manage containers after a systemd update:
+
+```bash
+# Check what's blocked
+sudo journalctl -u archipelago --since "10 min ago" | grep -i "denied\|permission\|namespace\|syscall"
+
+# The archipelago.service MUST have these for rootless podman:
+# ProtectHome=no
+# PrivateTmp=no (or disabled)
+# RestrictNamespaces= (NOT SET — don't restrict)
+# SystemCallFilter= (NOT SET — don't filter)
+# ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers /run/containers /run/user /tmp
+# Environment=XDG_RUNTIME_DIR=/run/user/1000
+```
+
+Edit the service file:
+```bash
+sudo systemctl edit archipelago.service
+# Add overrides, then:
+sudo systemctl daemon-reload
+sudo systemctl restart archipelago
+```
+
+### Fix 11: Stale Podman Processes
+
+If `podman ps` hangs or is very slow:
+
+```bash
+# Kill stuck podman processes (>10 of them = something is wrong)
+stuck=$(pgrep -c -f 'podman (ps|stats)' 2>/dev/null); stuck=${stuck:-0}  # pgrep patterns are extended regexes: use (a|b), not \|
+if [ "$stuck" -gt 10 ]; then
+  pkill -f 'podman (ps|stats)'
+  echo "Killed $stuck stuck podman processes"
+fi
+
+# Kill orphaned conmon processes holding ports
+for pid in $(pgrep conmon); do
+  container=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | grep -oP '(?<=--cid )\S+')
+  if [ -n "$container" ] && ! podman ps -a --format "{{.ID}}" | grep -q "${container:0:12}"; then
+    kill "$pid" 2>/dev/null && echo "Killed orphan conmon $pid"
+  fi
+done
+```
+
 ## After Fixing
 
 Always verify the fix:
 ```bash
 # Container running?
-sudo podman ps --filter name=CONTAINER_NAME
+podman ps --filter name=CONTAINER_NAME
 
 # Port reachable?
 curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
````

````diff
@@ -213,7 +329,10 @@ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/
 curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/
 
 # Health check passing?
-sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
+podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}"
+
+# Volume permissions correct? (rootless check; files owned by unmapped host UIDs show up as nobody/65534)
+podman exec CONTAINER_NAME ls -la /data/ 2>/dev/null || echo "Check container data path"
 ```
 
 Run `/podman-doctor` again to confirm all issues are resolved.
````

````diff
@@ -3,7 +3,8 @@ name: podman-uptime
 description: >
   Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies
   restart policies, creates health check monitors, and configures auto-recovery for all
-  containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
+  containers. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536).
+  Use when asked to "ensure uptime", "containers keep dying", "auto-restart",
   "watchdog", "container monitoring", "uptime guarantee", "keep containers running",
   "survive reboot", or to harden container reliability.
 allowed-tools: Bash Read Edit Write Glob Grep
````

````diff
@@ -15,6 +16,31 @@ Ensures every Archipelago container survives reboots, recovers from crashes, and
 
 **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
 
+> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo.
+> Only use `sudo` for: systemd unit files, chown on volumes, UFW changes.
+> The archipelago user runs containers directly via user namespaces.
+
+## Prerequisites for Rootless Uptime
+
+Before setting up uptime infrastructure, verify rootless Podman basics are working:
+
+```bash
+# Must be the archipelago user
+whoami # archipelago
+
+# User lingering must be enabled (keeps user services running after logout)
+ls /var/lib/systemd/linger/ | grep archipelago || sudo loginctl enable-linger archipelago
+
+# XDG_RUNTIME_DIR must be set
+echo $XDG_RUNTIME_DIR # /run/user/1000
+
+# Subuid/subgid must be configured
+grep archipelago /etc/subuid /etc/subgid # both: archipelago:100000:65536
+
+# UFW forward policy must be ACCEPT (for LAN access to containers)
+grep DEFAULT_FORWARD_POLICY /etc/default/ufw # Must be "ACCEPT"
+```
+
 ## Layer 1: Restart Policies (Survive Reboots)
 
 Every container MUST have `--restart unless-stopped`. This is non-negotiable.
````

````diff
@@ -23,28 +49,31 @@ Every container MUST have `--restart unless-stopped`. This is non-negotiable.
 
 ```bash
 # Audit
-for c in $(sudo podman ps -a --format "{{.Names}}"); do
-  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+for c in $(podman ps -a --format "{{.Names}}"); do
+  policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
   echo "$c: $policy"
 done
 
 # Fix any with "no" or empty policy
-for c in $(sudo podman ps -a --format "{{.Names}}"); do
-  policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
+for c in $(podman ps -a --format "{{.Names}}"); do
+  policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
   if [ "$policy" = "no" ] || [ -z "$policy" ]; then
     echo "Fixing: $c"
-    sudo podman update --restart unless-stopped "$c"
+    podman update --restart unless-stopped "$c"
   fi
 done
 ```
 
 ### Ensure podman auto-starts containers on boot
 
-```bash
-# Enable podman-restart service (restarts containers with restart policy on boot)
-sudo systemctl enable podman-restart.service 2>/dev/null || true
+For rootless Podman, containers with restart policies are auto-started by `podman-restart` as a **user** service:
+
+```bash
+# Enable the rootless podman-restart user service
+systemctl --user enable podman-restart.service 2>/dev/null
 
-# If podman-restart doesn't exist, create it
+# If the user service doesn't exist, create a system-level one
+# (runs as archipelago user via User= directive)
 cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service
 [Unit]
 Description=Podman Start All Containers With Restart Policy
````

````diff
@@ -53,8 +82,12 @@ Wants=network-online.target
 
 [Service]
 Type=oneshot
+User=archipelago
+Group=archipelago
+Environment=XDG_RUNTIME_DIR=/run/user/1000
 ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped
 RemainAfterExit=yes
+TimeoutStartSec=300
 
 [Install]
 WantedBy=multi-user.target
````

````diff
@@ -73,27 +106,31 @@ Create a systemd timer that checks container health every 2 minutes and restarts
 ```bash
 cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh
 #!/bin/bash
-# Archipelago Container Watchdog
-# Checks all containers and restarts any that are stopped or unhealthy
+# Archipelago Container Watchdog (Rootless Podman)
+# Runs as archipelago user — NO sudo for podman commands
 
 LOG_TAG="container-watchdog"
 
+# Expects to run as the archipelago user (User= in the unit); point podman at its runtime dir
+export XDG_RUNTIME_DIR=/run/user/1000
+PODMAN="/usr/bin/podman"
+
 # Restart any stopped containers that should be running (have restart policy)
-for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do
+for c in $($PODMAN ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}" 2>/dev/null); do
   logger -t "$LOG_TAG" "Restarting stopped container: $c"
-  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
+  $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG"
 done
 
 # Restart unhealthy containers
-for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do
+for c in $($PODMAN ps --filter health=unhealthy --format "{{.Names}}" 2>/dev/null); do
   logger -t "$LOG_TAG" "Restarting unhealthy container: $c"
-  sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG"
+  $PODMAN restart "$c" 2>&1 | logger -t "$LOG_TAG"
 done
 
 # Check for containers in "created" state (never started)
-for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do
+for c in $($PODMAN ps -a --filter status=created --format "{{.Names}}" 2>/dev/null); do
   logger -t "$LOG_TAG" "Starting created container: $c"
-  sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG"
+  $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG"
 done
 SCRIPT
 
````

````diff
@@ -103,7 +140,7 @@ sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh
 ### Create the systemd timer
 
 ```bash
-# Service unit
+# Service unit — runs as archipelago user for rootless podman
 cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service
 [Unit]
 Description=Archipelago Container Watchdog
````

````diff
@@ -111,6 +148,9 @@ After=podman-restart.service
 
 [Service]
 Type=oneshot
+User=archipelago
+Group=archipelago
+Environment=XDG_RUNTIME_DIR=/run/user/1000
 ExecStart=/usr/local/bin/archipelago-container-watchdog.sh
 EOF
 
````

````diff
@@ -150,17 +190,20 @@ Some containers depend on others. The watchdog handles restarts, but initial boo
 ```bash
 cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh
 #!/bin/bash
-# Ordered container startup for Archipelago
+# Ordered container startup for Archipelago (Rootless Podman)
+# Runs as archipelago user — NO sudo for podman commands
 # Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay
 
 LOG_TAG="ordered-start"
+export XDG_RUNTIME_DIR=/run/user/1000
+PODMAN="/usr/bin/podman"
 
 wait_for_container() {
   local name=$1
   local max_wait=${2:-60}
   local waited=0
   while [ $waited -lt $max_wait ]; do
-    status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
+    status=$($PODMAN inspect "$name" --format "{{.State.Running}}" 2>/dev/null)
     if [ "$status" = "true" ]; then
       logger -t "$LOG_TAG" "$name is running"
       return 0
````

````diff
@@ -174,38 +217,45 @@ wait_for_container() {
 
 # Tier 0: Infrastructure
 logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure"
-sudo podman start tailscale 2>/dev/null
+$PODMAN start tailscale 2>/dev/null
 
-# Tier 1: Bitcoin (foundation)
-logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin"
-sudo podman start bitcoin-knots 2>/dev/null
+# Tier 1: Databases (must start before services that depend on them)
+logger -t "$LOG_TAG" "Starting Tier 1: Databases"
+$PODMAN start mempool-db 2>/dev/null
+$PODMAN start btcpay-postgres 2>/dev/null
+$PODMAN start immich_postgres 2>/dev/null
+sleep 5
+
+# Tier 2: Bitcoin (foundation for Lightning and explorers)
+logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin"
+$PODMAN start bitcoin-knots 2>/dev/null
 wait_for_container bitcoin-knots 120
 
-# Tier 2: Bitcoin-dependent services
-logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent"
-sudo podman start electrs 2>/dev/null
-sudo podman start lnd 2>/dev/null
-wait_for_container electrs 90
+# Tier 3: Bitcoin-dependent services
+logger -t "$LOG_TAG" "Starting Tier 3: Bitcoin-dependent"
+$PODMAN start electrumx 2>/dev/null
+$PODMAN start lnd 2>/dev/null
+wait_for_container electrumx 90
 wait_for_container lnd 90
 
-# Tier 3: Services depending on Tier 2
-logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies"
-sudo podman start mempool-db 2>/dev/null
-sleep 5
-sudo podman start mempool 2>/dev/null
-sudo podman start nbxplorer 2>/dev/null
+# Tier 4: Services depending on Tier 3
+logger -t "$LOG_TAG" "Starting Tier 4: Second-order dependencies"
+$PODMAN start mempool 2>/dev/null
+$PODMAN start nbxplorer 2>/dev/null
 sleep 10
-sudo podman start btcpay-server 2>/dev/null
-sudo podman start btcpay-postgres 2>/dev/null
+$PODMAN start btcpay-server 2>/dev/null
+$PODMAN start fedimint 2>/dev/null
+$PODMAN start fedimint-gateway 2>/dev/null
 
-# Tier 4: Independent apps (start all remaining)
-logger -t "$LOG_TAG" "Starting Tier 4: Independent apps"
-sudo podman start --all 2>/dev/null
+# Tier 5: Independent apps (start all remaining)
+logger -t "$LOG_TAG" "Starting Tier 5: Independent apps"
+$PODMAN start --all 2>/dev/null
````
|
||||||
# Tier 5: UI containers (need parent apps running first)
|
# Tier 6: UI containers (need parent apps running first)
|
||||||
logger -t "$LOG_TAG" "Starting Tier 5: UI containers"
|
logger -t "$LOG_TAG" "Starting Tier 6: UI containers"
|
||||||
sudo podman start bitcoin-ui 2>/dev/null
|
$PODMAN start bitcoin-ui 2>/dev/null
|
||||||
sudo podman start lnd-ui 2>/dev/null
|
$PODMAN start lnd-ui 2>/dev/null
|
||||||
|
$PODMAN start electrs-ui 2>/dev/null
|
||||||
|
|
||||||
logger -t "$LOG_TAG" "Startup sequence complete"
|
logger -t "$LOG_TAG" "Startup sequence complete"
|
||||||
SCRIPT
|
SCRIPT
|
||||||
@ -216,18 +266,22 @@ sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh
|
|||||||
### Wire into boot sequence
|
### Wire into boot sequence
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
# Runs as archipelago user for rootless podman
|
||||||
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
|
cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=Archipelago Ordered Container Startup
|
Description=Archipelago Ordered Container Startup
|
||||||
After=network-online.target podman.service
|
After=network-online.target
|
||||||
Wants=network-online.target
|
Wants=network-online.target
|
||||||
Before=archipelago.service
|
Before=archipelago.service
|
||||||
|
|
||||||
[Service]
|
[Service]
|
||||||
Type=oneshot
|
Type=oneshot
|
||||||
|
User=archipelago
|
||||||
|
Group=archipelago
|
||||||
|
Environment=XDG_RUNTIME_DIR=/run/user/1000
|
||||||
ExecStart=/usr/local/bin/archipelago-ordered-start.sh
|
ExecStart=/usr/local/bin/archipelago-ordered-start.sh
|
||||||
RemainAfterExit=yes
|
RemainAfterExit=yes
|
||||||
TimeoutStartSec=300
|
TimeoutStartSec=600
|
||||||
|
|
||||||
[Install]
|
[Install]
|
||||||
WantedBy=multi-user.target
|
WantedBy=multi-user.target
|
||||||
@ -237,14 +291,45 @@ sudo systemctl daemon-reload
|
|||||||
sudo systemctl enable archipelago-containers.service
|
sudo systemctl enable archipelago-containers.service
|
||||||
```
|
```
|
||||||
|
|
||||||
|
## Rootless-Specific Uptime Considerations
|
||||||
|
|
||||||
|
### Volume ownership survives reboots
|
||||||
|
Volume ownership persists across reboots, but when a container image is updated (re-pulled), the new image may run its process as a different UID. Existing volumes then stay owned by the old mapped UID. Verify ownership after every image update:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Quick ownership audit after image pull
|
||||||
|
podman inspect CONTAINER_NAME --format "{{.Config.User}}"
|
||||||
|
# Then verify: sudo stat -c '%u:%g' /var/lib/archipelago/APP_NAME
|
||||||
|
# Formula: host_uid = 100000 + container_uid - 1 (container UID 0 maps to host UID 1000)
|
||||||
|
```
|
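A minimal sketch of both directions of the corrected mapping. It assumes a single `/etc/subuid` range for the user (for example `archipelago:100000:65536`) and that container root maps to the owning user (UID 1000). The helper names `subuid_start`, `container_to_host`, and `host_to_container` are hypothetical. On a live system, `podman unshare cat /proc/self/uid_map` shows the mapping Podman actually uses, and that output should take precedence over this arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for rootless UID translation.
# Assumption: one subuid range per user; container UID 0 maps to the user's own UID.

# subuid_start USER [FILE] -> prints the first subordinate UID for USER
subuid_start() {
  local user=$1 file=${2:-/etc/subuid}
  awk -F: -v u="$user" '$1 == u { print $2; exit }' "$file"
}

# container_to_host CONTAINER_UID SUBUID_START [OWNER_UID]
container_to_host() {
  local cuid=$1 start=$2 owner=${3:-1000}
  if [ "$cuid" -eq 0 ]; then
    echo "$owner"                 # container root is the rootless user itself
  else
    echo $(( start + cuid - 1 ))  # UID 1 lands on the first subordinate UID
  fi
}

# host_to_container HOST_UID SUBUID_START [OWNER_UID]
host_to_container() {
  local huid=$1 start=$2 owner=${3:-1000}
  if [ "$huid" -eq "$owner" ]; then
    echo 0
  elif [ "$huid" -ge "$start" ]; then
    echo $(( huid - start + 1 ))
  else
    echo "unmapped host UID: $huid" >&2
    return 1
  fi
}
```

For example, a volume for a container running as UID 1001 should be owned on the host by `container_to_host 1001 "$(subuid_start archipelago)"`, i.e. 101000 when the range starts at 100000.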
||||||
|
|
||||||
|
### XDG_RUNTIME_DIR on boot
|
||||||
|
Rootless Podman requires `/run/user/1000` to exist. It is created by `pam_systemd` when the user logs in or, once `loginctl enable-linger archipelago` has been run, by systemd at boot. If it is missing after boot, containers won't start.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Verify it exists
|
||||||
|
ls -la /run/user/1000/ || echo "CRITICAL: /run/user/1000 missing — run: sudo loginctl enable-linger archipelago"
|
||||||
|
```
|
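Because the ordered-start script runs early in boot, it may be worth waiting briefly for the runtime directory instead of failing on the first check. A minimal sketch; the helper name `wait_for_runtime_dir` and the 30-second default are illustrative assumptions, not part of the existing script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block until the rootless runtime dir exists or time out.
wait_for_runtime_dir() {
  local dir=${1:-/run/user/1000} max_wait=${2:-30} waited=0
  while [ ! -d "$dir" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "CRITICAL: $dir missing after ${max_wait}s (is linger enabled?)" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

Used near the top of the startup script, e.g. `wait_for_runtime_dir /run/user/1000 60 || exit 1`, a missing linger setup shows up as one clear log line rather than a cascade of failed `podman start` calls.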
||||||
|
|
||||||
|
### Systemd sandbox must not block podman
|
||||||
|
If the `archipelago.service` systemd sandbox blocks the user-namespace or syscall operations that rootless Podman relies on, the Rust backend can't scan containers. See Fix 10 in `/podman-fix`.
|
||||||
|
|
||||||
## Verification Checklist
|
## Verification Checklist
|
||||||
|
|
||||||
After setting up all 3 layers, verify:
|
After setting up all 3 layers, verify:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
echo "=== Rootless Podman Prerequisites ==="
|
||||||
|
echo "User: $(whoami)"
|
||||||
|
echo "XDG_RUNTIME_DIR: $XDG_RUNTIME_DIR"
|
||||||
|
grep archipelago /etc/subuid | head -1
|
||||||
|
[ -f /var/lib/systemd/linger/archipelago ] && echo "Linger: enabled" || echo "Linger: DISABLED"
|
||||||
|
grep DEFAULT_FORWARD_POLICY /etc/default/ufw
|
||||||
|
|
||||||
|
echo ""
|
||||||
echo "=== Layer 1: Restart Policies ==="
|
echo "=== Layer 1: Restart Policies ==="
|
||||||
for c in $(sudo podman ps -a --format "{{.Names}}"); do
|
for c in $(podman ps -a --format "{{.Names}}"); do
|
||||||
policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
|
policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
|
||||||
echo " $c: $policy"
|
echo " $c: $policy"
|
||||||
done
|
done
|
||||||
|
|
||||||
@ -261,11 +346,19 @@ sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchd
|
|||||||
|
|
||||||
echo ""
|
echo ""
|
||||||
echo "=== Container Health Summary ==="
|
echo "=== Container Health Summary ==="
|
||||||
total=$(sudo podman ps -a --format "{{.Names}}" | wc -l)
|
total=$(podman ps -a --format "{{.Names}}" | wc -l)
|
||||||
running=$(sudo podman ps --format "{{.Names}}" | wc -l)
|
running=$(podman ps --format "{{.Names}}" | wc -l)
|
||||||
stopped=$((total - running))
|
stopped=$((total - running))
|
||||||
unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
|
unhealthy=$(podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l)
|
||||||
echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
|
echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy"
|
||||||
|
|
||||||
|
echo ""
|
||||||
|
echo "=== Volume Ownership Spot Check ==="
|
||||||
|
for dir in bitcoin lnd grafana; do
|
||||||
|
if [ -d "/var/lib/archipelago/$dir" ]; then
|
||||||
|
echo " $dir: $(stat -c '%u:%g' /var/lib/archipelago/$dir)"
|
||||||
|
fi
|
||||||
|
done
|
||||||
```
|
```
|
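The spot check above only prints owners; it does not say whether they are correct. A minimal sketch that compares a directory's numeric owner against an expected host UID. `check_owner` is a hypothetical helper, and the expected value must be computed with the rootless formula (`100000 + container_uid - 1` for container UIDs of 1 and above).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a volume's numeric owner with the expected host UID.
# Returns 0 on match, 1 on mismatch, 2 if the directory does not exist.
check_owner() {
  local dir=$1 expected=$2 actual
  if [ ! -d "$dir" ]; then
    echo "  $dir: missing" >&2
    return 2
  fi
  actual=$(stat -c '%u' "$dir")   # GNU stat: numeric owner UID
  if [ "$actual" = "$expected" ]; then
    echo "  $dir: OK ($actual)"
    return 0
  fi
  echo "  $dir: owner $actual, expected $expected" >&2
  return 1
}
```

For instance, for a container that runs as UID 1000, `check_owner /var/lib/archipelago/APP_NAME 100999` flags any directory left owned by a different mapped UID.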
||||||
|
|
||||||
## Reboot Test
|
## Reboot Test
|
||||||
@ -274,17 +367,20 @@ The ultimate uptime test — reboot the server and verify everything comes back:
|
|||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Before reboot: record running containers
|
# Before reboot: record running containers
|
||||||
sudo podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt
|
podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt
|
||||||
|
|
||||||
# Reboot
|
# Reboot
|
||||||
sudo reboot
|
sudo reboot
|
||||||
|
|
||||||
# After reboot (wait ~3 minutes, then SSH back in):
|
# After reboot (wait ~3 minutes, then SSH back in):
|
||||||
sudo podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt
|
podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt
|
||||||
|
|
||||||
# Compare
|
# Compare
|
||||||
diff /tmp/before-reboot.txt /tmp/after-reboot.txt
|
diff /tmp/before-reboot.txt /tmp/after-reboot.txt
|
||||||
# Should show no differences
|
# Should show no differences
|
||||||
|
|
||||||
|
# Also verify XDG_RUNTIME_DIR survived reboot
|
||||||
|
ls /run/user/1000/ || echo "CRITICAL: lingering not working"
|
||||||
```
|
```
|
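`diff` reports differences in both directions. A minimal sketch that lists only the containers that were running before the reboot and are not running after it; containers that appear only after the reboot are ignored. `missing_after_reboot` is a hypothetical helper, and both inputs must be sorted, as the `sort` calls above ensure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report containers missing after a reboot.
# Inputs: sorted name lists such as /tmp/before-reboot.txt and /tmp/after-reboot.txt.
missing_after_reboot() {
  local before=$1 after=$2 missing
  # comm -23 keeps only lines unique to the first (sorted) file
  missing=$(comm -23 "$before" "$after")
  if [ -z "$missing" ]; then
    echo "All containers came back"
    return 0
  fi
  echo "Not running after reboot:"
  sed 's/^/  /' <<<"$missing"
  return 1
}
```

Usage: `missing_after_reboot /tmp/before-reboot.txt /tmp/after-reboot.txt`. A non-zero exit status makes it easy to call from a post-reboot check.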
||||||
|
|
||||||
## Monitoring
|
## Monitoring
|
||||||
@ -292,18 +388,23 @@ diff /tmp/before-reboot.txt /tmp/after-reboot.txt
|
|||||||
Check uptime status anytime:
|
Check uptime status anytime:
|
||||||
```bash
|
```bash
|
||||||
# Quick status
|
# Quick status
|
||||||
sudo podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort
|
podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort
|
||||||
|
|
||||||
# Watchdog activity
|
# Watchdog activity
|
||||||
sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager
|
sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager
|
||||||
|
|
||||||
# Container events (starts, stops, deaths)
|
# Container events (starts, stops, deaths)
|
||||||
sudo podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
|
podman events --stream=false --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30
|
||||||
|
|
||||||
|
# Check for permission denied errors (rootless UID mapping issue)
|
||||||
|
podman ps -a --filter status=exited --format "{{.Names}}" | while read -r c; do
|
||||||
|
podman logs --tail 5 "$c" 2>&1 | grep -i "permission denied" && echo " ^ UID mapping issue in: $c"
|
||||||
|
done
|
||||||
```
|
```
|
||||||
|
|
||||||
## Integration
|
## Integration
|
||||||
|
|
||||||
- Run `/podman-doctor` first to identify issues
|
- Run `/podman-doctor` first to identify issues (includes rootless health checks)
|
||||||
- Run `/podman-fix` for specific container repairs
|
- Run `/podman-fix` for specific container repairs (includes UID mapping fixes)
|
||||||
- Run `/podman-uptime` to set up permanent reliability infrastructure
|
- Run `/podman-uptime` to set up permanent reliability infrastructure
|
||||||
- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot
|
- Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot
|
||||||
|
|||||||
1528
docs/architecture-review.html
Normal file
File diff suppressed because it is too large
@ -82,7 +82,7 @@ define(['./workbox-21a80088'], (function (workbox) { 'use strict';
|
|||||||
"revision": "3ca0b8505b4bec776b69afdba2768812"
|
"revision": "3ca0b8505b4bec776b69afdba2768812"
|
||||||
}, {
|
}, {
|
||||||
"url": "index.html",
|
"url": "index.html",
|
||||||
"revision": "0.a4nevj6csc4"
|
"revision": "0.2lte02eatlc"
|
||||||
}], {});
|
}], {});
|
||||||
workbox.cleanupOutdatedCaches();
|
workbox.cleanupOutdatedCaches();
|
||||||
workbox.registerRoute(new workbox.NavigationRoute(workbox.createHandlerBoundToURL("index.html"), {
|
workbox.registerRoute(new workbox.NavigationRoute(workbox.createHandlerBoundToURL("index.html"), {
|
||||||
|
|||||||