diff --git a/.claude/rules/containers.md b/.claude/rules/containers.md index b756b8aa..2e0e6339 100644 --- a/.claude/rules/containers.md +++ b/.claude/rules/containers.md @@ -5,15 +5,46 @@ globs: - "**/*podman*" - "**/Containerfile" - "**/Dockerfile" + - "**/first-boot*" + - "**/container-doctor*" --- -# Container Security Rules (Archipelago) +# Container Security Rules (Archipelago — Rootless Podman) -- `readonly_root: true` always — containers must not write to their root filesystem +## Rootless Podman Architecture +- Podman runs as `archipelago` user (UID 1000), NOT root — never use `sudo podman` +- UID namespace mapping via subuid: container UID N → host UID (100000 + N) +- Container images stored in `~/.local/share/containers/storage/` (NOT /var/lib/containers) +- Container subnet: `10.89.0.0/16` (rootless), not `10.88.0.0/16` (rootful) +- XDG_RUNTIME_DIR must be `/run/user/1000` — required for podman socket +- `loginctl enable-linger archipelago` required for containers to survive logout + +## Container Security (Non-Negotiable) - Drop ALL capabilities, add only what's required (`--cap-drop=ALL --cap-add=...`) -- Run as non-root user (UID > 1000): `--user 1001:1001` -- Set `--security-opt=no-new-privileges:true` -- Pin image versions by SHA256 digest, never use `:latest` tag +- Set `--security-opt=no-new-privileges:true` on all containers +- Use `--read-only` + tmpfs where possible (safe apps: searxng, grafana, filebrowser, electrumx, nostr-rs-relay, ollama, indeedhub) +- Pin image versions — never use `:latest` tag - Mount secrets as read-only files, never pass as environment variables when possible - Set memory and CPU limits on all containers -- Use `--network=none` unless network access is required +- All containers must have `--restart unless-stopped` + +## Volume Ownership (Critical for Rootless) +- Volume directories must be owned by the MAPPED UID, not the container UID +- Formula: `host_uid = 100000 + container_uid` +- UID 0 (most apps) → `sudo chown 
-R 100000:100000 /var/lib/archipelago/{app}` +- UID 101 (bitcoin) → `sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin` +- UID 70 (postgres) → `sudo chown -R 100070:100070 /var/lib/archipelago/postgres-*` +- UID 472 (grafana) → `sudo chown -R 100472:100472 /var/lib/archipelago/grafana` +- UID 999 (mariadb) → `sudo chown -R 100999:100999 /var/lib/archipelago/mysql-*` + +## Systemd Service Requirements +- `ProtectHome=no` — podman needs `~/.local/share/containers/` +- `PrivateTmp=no` — podman runtime uses `/tmp/podman-run-1000/` +- `RestrictNamespaces=` must NOT be set — rootless podman creates user namespaces +- `SystemCallFilter=` must NOT be set — rootless podman needs clone/unshare +- UFW `DEFAULT_FORWARD_POLICY="ACCEPT"` — required for LAN access to container ports + +## Network Rules +- Apps needing inter-container DNS: use `--network=archy-net` (bitcoin, lnd, electrumx, mempool, btcpay, fedimint) +- Standalone apps: default bridge network +- Tailscale only: `--network=host` + `NET_ADMIN` + `NET_RAW` + `/dev/net/tun` diff --git a/.claude/skills/podman-doctor/SKILL.md b/.claude/skills/podman-doctor/SKILL.md index 44d13bc0..38564b70 100644 --- a/.claude/skills/podman-doctor/SKILL.md +++ b/.claude/skills/podman-doctor/SKILL.md @@ -4,6 +4,7 @@ description: > Comprehensive Podman container diagnostic for Archipelago. Audits all running containers, port mappings, network connectivity, health status, restart policies, and config consistency across all 4 layers (backend Rust, Podman runtime, Nginx proxy, frontend routing). + Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). Use when asked to "diagnose containers", "check podman", "why is app not working", "container health check", "port not reachable", "audit containers", "podman status", or when any container/app is misbehaving. 
@@ -12,46 +13,123 @@ allowed-tools: Bash Read Glob Grep # Podman Doctor — Container Infrastructure Diagnostics -Systematic diagnostic for Archipelago's Podman container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, and config drift across all layers. +Systematic diagnostic for Archipelago's **rootless Podman** container stack. Catches port conflicts, network misconfigurations, health failures, missing restart policies, UID mapping issues, and config drift across all layers. **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228` +> **ROOTLESS PODMAN**: Archipelago runs Podman as the `archipelago` user (UID 1000), NOT root. +> Never use `sudo podman` — use plain `podman` after SSH'ing in as the `archipelago` user. +> Container UIDs are mapped via subuid: container UID N → host UID (100000 + N). + If $ARGUMENTS is provided, focus diagnosis on that specific app/container. Otherwise run full audit. ## Workflow ### Step 1: Gather Runtime State -Run these on the server: +Run these on the server (as `archipelago` user — NO sudo): ```bash # All containers with status, ports, networks -sudo podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}" +podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}" # Check for port conflicts on known ports -sudo ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b" +ss -tlnp | grep -E ":(80|443|3000|4080|5678|8080|8081|8082|8083|8085|8096|8123|8173|8174|8175|8240|8332|8333|8334|8888|9735|10009|11434|23000|50001)\b" ``` -### Step 2: Check Restart Policies +### Step 2: Rootless Podman Health Check + +Rootless Podman has specific requirements that must be verified: + +```bash +# Verify running as archipelago user (NOT root) +whoami # Must be "archipelago" +id # Must show uid=1000(archipelago) + +# Check XDG_RUNTIME_DIR 
is set (required for rootless podman socket) +echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR" # Must be /run/user/1000 + +# Verify subuid/subgid mapping exists +grep archipelago /etc/subuid # Must show: archipelago:100000:65536 +grep archipelago /etc/subgid # Must show: archipelago:100000:65536 + +# Verify user lingering is enabled (keeps user services after logout) +ls /var/lib/systemd/linger/ | grep archipelago # Must exist + +# Check podman storage is accessible +podman info --format "{{.Store.GraphRoot}}" # ~/.local/share/containers/storage +ls -la ~/.local/share/containers/storage/ 2>/dev/null || echo "ERROR: Storage not accessible" + +# Check podman socket +ls -la /run/user/1000/podman/ 2>/dev/null || echo "WARNING: No podman socket directory" +``` + +### Step 3: Check Restart Policies Every container MUST have `--restart unless-stopped`. This is the #1 cause of downtime after reboots. ```bash -for c in $(sudo podman ps -a --format "{{.Names}}"); do +for c in $(podman ps -a --format "{{.Names}}"); do echo -n "$c: " - sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}" + podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}" done ``` **Red flag**: `no` or empty = container won't survive reboot. -### Step 3: Verify Port Mapping Consistency +### Step 4: Volume Ownership Audit (Rootless UID Mapping) + +Rootless Podman maps container UIDs via subuid. Volume directories must be owned by the MAPPED UID, not the container UID. 
Formula: `host_uid = 100000 + container_uid` + +```bash +echo "=== Volume Ownership Check ===" + +# Default containers (run as root inside = UID 0 → host UID 100000) +for dir in lnd fedimint homeassistant jellyfin vaultwarden photoprism ollama filebrowser electrumx btcpay immich; do + if [ -d "/var/lib/archipelago/$dir" ]; then + owner=$(stat -c '%u:%g' "/var/lib/archipelago/$dir" 2>/dev/null) + if [ "$owner" != "100000:100000" ]; then + echo "WRONG: /var/lib/archipelago/$dir owned by $owner (should be 100000:100000)" + else + echo " OK: $dir → $owner" + fi + fi +done + +# Bitcoin Knots (container UID 101 → host UID 100101) +if [ -d "/var/lib/archipelago/bitcoin" ]; then + owner=$(stat -c '%u:%g' "/var/lib/archipelago/bitcoin") + [ "$owner" != "100101:100101" ] && echo "WRONG: bitcoin owned by $owner (should be 100101:100101)" || echo " OK: bitcoin → $owner" +fi + +# PostgreSQL (container UID 70 → host UID 100070) +for dir in /var/lib/archipelago/*-db /var/lib/archipelago/postgres-*; do + if [ -d "$dir" ]; then + owner=$(stat -c '%u:%g' "$dir") + [ "$owner" != "100070:100070" ] && echo "WRONG: $dir owned by $owner (should be 100070:100070)" || echo " OK: $(basename "$dir") → $owner" + fi +done + +# Grafana (container UID 472 → host UID 100472) +if [ -d "/var/lib/archipelago/grafana" ]; then + owner=$(stat -c '%u:%g' "/var/lib/archipelago/grafana") + [ "$owner" != "100472:100472" ] && echo "WRONG: grafana owned by $owner (should be 100472:100472)" || echo " OK: grafana → $owner" +fi + +# MariaDB/MySQL (container UID 999 → host UID 100999) +if [ -d "/var/lib/archipelago/mysql-mempool" ]; then + owner=$(stat -c '%u:%g' "/var/lib/archipelago/mysql-mempool") + [ "$owner" != "100999:100999" ] && echo "WRONG: mysql-mempool owned by $owner (should be 100999:100999)" || echo " OK: mysql-mempool → $owner" +fi +``` + +### Step 5: Verify Port Mapping Consistency Cross-reference these 4 layers — mismatches between ANY two cause "app not loading" bugs: **Layer 1 — Backend Config 
(Rust)**: Read `core/archipelago/src/api/rpc/package.rs`, look at `get_app_config()` port mappings. -**Layer 2 — Podman Runtime**: `sudo podman ps --format "{{.Names}}: {{.Ports}}"` +**Layer 2 — Podman Runtime**: `podman ps --format "{{.Names}}: {{.Ports}}"` **Layer 3 — Nginx Proxy**: Read these for `/app/{id}/` location blocks: - `image-recipe/configs/nginx-archipelago.conf` (HTTP) @@ -66,77 +144,114 @@ Cross-reference these 4 layers — mismatches between ANY two cause "app not loa | Works on port but not /app/ path | Missing nginx location block | | Frontend can't find app | PORT_TO_APP_ID missing in appLauncher.ts | -### Step 4: Network Connectivity Audit +### Step 6: Network Connectivity Audit ```bash # Networks and their containers -sudo podman network ls -sudo podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!" +podman network ls +podman network inspect archy-net 2>/dev/null || echo "WARNING: archy-net missing!" + +# Check container subnet (rootless uses 10.89.x.x, NOT 10.88.x.x) +podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" 2>/dev/null ``` -**Must be on archy-net**: bitcoin-knots, lnd, electrs, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui +**Must be on archy-net**: bitcoin-knots, lnd, electrs/electrumx, mempool, btcpay-server, nbxplorer, fedimint, fedimint-gateway, nostr-rs-relay, indeedhub, ollama, open-webui **Must NOT be on archy-net**: grafana, nextcloud, filebrowser, vaultwarden, bitcoin-ui, lnd-ui, tailscale (host network) -### Step 5: Health Check Status +### Step 7: UFW Forward Policy Check + +Rootless Podman requires `DEFAULT_FORWARD_POLICY="ACCEPT"` in UFW, otherwise container ports are unreachable from LAN. 
+ +```bash +grep DEFAULT_FORWARD_POLICY /etc/default/ufw +# Must be "ACCEPT", NOT "DROP" +# If DROP: containers work locally but NOT from other machines on the network +``` + +### Step 8: Systemd Service Sandbox Check + +The `archipelago.service` must have specific settings relaxed for rootless Podman: + +```bash +# Check critical settings +systemctl cat archipelago.service | grep -E "ProtectHome|PrivateTmp|RestrictNamespaces|ReadWritePaths|XDG_RUNTIME_DIR" +``` + +**Required settings for rootless Podman**: +- `ProtectHome=no` — podman stores images in `~/.local/share/containers/` +- `PrivateTmp=no` or disabled — podman runtime uses `/tmp/podman-run-1000/` +- `RestrictNamespaces=` must NOT be set — rootless podman needs user namespaces +- `ReadWritePaths=` must include `/var/lib/archipelago /run/user /tmp` +- `Environment=XDG_RUNTIME_DIR=/run/user/1000` + +### Step 9: Health Check Status ```bash # Containers with health checks — are they passing? -for c in $(sudo podman ps --format "{{.Names}}"); do - health=$(sudo podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null) +for c in $(podman ps --format "{{.Names}}"); do + health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null) if [ -n "$health" ] && [ "$health" != "" ]; then echo "$c: $health" fi done # Containers WITHOUT health checks (gap in monitoring) -for c in $(sudo podman ps --format "{{.Names}}"); do - hc=$(sudo podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null) +for c in $(podman ps --format "{{.Names}}"); do + hc=$(podman inspect "$c" --format "{{.Config.Healthcheck}}" 2>/dev/null) if [ "$hc" = "" ] || [ -z "$hc" ]; then echo "NO HEALTHCHECK: $c" fi done ``` -### Step 6: Resource & Failure Analysis +### Step 10: Resource & Failure Analysis ```bash # Resource usage -sudo podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}" +podman stats --no-stream --format "table 
{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}" # Recent deaths (last 24h) -sudo podman events --filter event=died --since 24h 2>/dev/null | tail -20 +podman events --filter event=died --since 24h 2>/dev/null | tail -20 # OOM kills -sudo podman ps -a --format "{{.Names}}" | while read c; do - oom=$(sudo podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null) +podman ps -a --format "{{.Names}}" | while read c; do + oom=$(podman inspect "$c" --format "{{.State.OOMKilled}}" 2>/dev/null) [ "$oom" = "true" ] && echo "OOM KILLED: $c" done # Non-zero exits -sudo podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}" +podman ps -a --filter status=exited --format "{{.Names}}\t{{.Status}}" ``` -### Step 7: Systemd Integration +### Step 11: Systemd Integration ```bash systemctl is-active archipelago nginx -systemctl list-units --type=service | grep -i podman +systemctl --user list-units --type=service 2>/dev/null | grep -i podman systemctl list-timers --all | grep -i -E "podman|container|archipelago" ``` -### Step 8: Generate Report +### Step 12: Generate Report Produce a structured report: ``` ## Container Diagnostic Report +### Rootless Podman Status +- User: archipelago (UID 1000) +- Subuid mapping: [OK/MISSING] +- XDG_RUNTIME_DIR: [OK/MISSING] +- User linger: [enabled/disabled] +- UFW forward policy: [ACCEPT/DROP] + ### Summary - Total containers: X running, Y stopped, Z unhealthy - Port conflicts: [list or "none"] - Missing restart policies: [list or "none"] - Network issues: [list or "none"] +- UID mapping issues: [list or "none"] - Health check gaps: [list] ### Critical Issues (fix immediately) @@ -154,3 +269,7 @@ After diagnosis, suggest running `/podman-fix` for any issues found. ## Port Reference See `references/port-map.md` for the canonical port assignment table across all 4 layers. + +## UID Mapping Reference + +See `references/uid-mapping.md` for the complete rootless UID mapping table. 
diff --git a/.claude/skills/podman-doctor/references/common-failures.md b/.claude/skills/podman-doctor/references/common-failures.md index 75f2afb0..983a58fc 100644 --- a/.claude/skills/podman-doctor/references/common-failures.md +++ b/.claude/skills/podman-doctor/references/common-failures.md @@ -1,15 +1,31 @@ # Common Podman Failure Patterns +## Rootless Podman Specific Failures + +| Error | Cause | Fix | +|-------|-------|-----| +| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` | +| `Error: unshare: operation not permitted` | Systemd `RestrictNamespaces` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` | +| `Error: could not get runtime: creating runtime` | XDG_RUNTIME_DIR not set or /run/user/1000 missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in service, ensure `loginctl enable-linger archipelago` | +| `permission denied` on volume mount | Wrong UID ownership — must use mapped UIDs | `sudo chown -R 100000:100000 /var/lib/archipelago/APP` (see UID mapping table) | +| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless | Run `podman system migrate`, check `/etc/subuid` | +| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (destroys all containers — last resort) | +| `Error: stat /tmp/podman-run-1000/...: no such file` | PrivateTmp=yes in systemd isolates /tmp | Set `PrivateTmp=no` in `archipelago.service` | +| Container ports unreachable from LAN | UFW DEFAULT_FORWARD_POLICY="DROP" | Change to "ACCEPT" in `/etc/default/ufw`, then `sudo ufw reload` | +| `Error: error creating network namespace` | Systemd `SystemCallFilter` blocks clone/unshare | Remove `SystemCallFilter=` from `archipelago.service` | +| Containers lose network after service restart | podman runtime dir in /tmp cleaned | Ensure `PrivateTmp=no` so /tmp/podman-run-1000/ 
persists | + ## Container Won't Start | Error | Cause | Fix | |-------|-------|-----| | `exec format error` | Binary built on wrong arch | Rebuild on the Linux server | | `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender | -| `permission denied` | Missing capability or read-only root | Check `get_app_capabilities()`, add tmpfs | +| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership with mapped UID, add tmpfs | | `OCI runtime error` | Corrupt container state | `podman rm -f NAME && recreate` | | `image not known` | Image not pulled | `podman pull IMAGE:TAG` | | `no such network` | Network missing | `podman network create archy-net` | +| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` | ## Container Starts But App Unreachable @@ -20,6 +36,7 @@ | Port mapped but refused | Container logs | App crashing internally — check logs | | Works sometimes | Resources | Check OOM kills, CPU, disk space | | 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted | +| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` | ## Container Keeps Dying @@ -29,6 +46,8 @@ | Dies after minutes | OOM killed | Increase `--memory` limit | | Dies when dep restarts | No restart policy | Add `--restart unless-stopped` | | Crash loop | Repeated crash | Fix root cause, don't just restart | +| Exit code 127 | Missing binary in container | Wrong image tag or corrupted image — re-pull | +| Exit code 137 | Killed by OOM or signal | Check `dmesg` for OOM kill, check `podman inspect` for OOMKilled | ## Network Issues @@ -37,6 +56,20 @@ | Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` | | Can't reach internet | DNS missing | Add `--dns 1.1.1.1` | | Container-to-container timeout | Different 
networks | Put both on same network | +| Bitcoin RPC refused from container | rpcallowip wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe: port mapped, not exposed) | +| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on new archy-net (rootless uses 10.89.x.x) | + +## Volume Permission Patterns (Rootless UID Mapping) + +Formula: **host_uid = 100000 + container_uid** + +| Container UID | Host UID | Apps | Data Directory | +|---|---|---|---| +| 0 (root) | 100000 | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | `/var/lib/archipelago/{app}` | +| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `/var/lib/archipelago/postgres-*` | +| 101 | 100101 | bitcoin-knots | `/var/lib/archipelago/bitcoin` | +| 472 | 100472 | grafana | `/var/lib/archipelago/grafana` | +| 999 | 100999 | MariaDB (mysql-mempool) | `/var/lib/archipelago/mysql-mempool` | ## Capability Reference @@ -47,9 +80,23 @@ | DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files | | FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms | | NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 | +| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes | ## Read-Only Safe Apps -Only these 8 apps can run with `--read-only`: searxng, grafana, filebrowser, electrs, nostr-rs-relay, ollama, indeedhub +Only these apps can run with `--read-only` + tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub All others need writable root or will fail silently. 
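
A minimal sketch of what `--read-only` plus tmpfs might look like for one of the apps listed above. The `build_readonly_args` helper, the image reference, and the `/tmp` tmpfs path are assumptions made for illustration; the actual flags for each app come from the install flow. The script only prints the command so it can be reviewed, and nothing is executed.

```bash
#!/bin/bash
# Sketch only: assemble a read-only podman invocation and print it (dry run).
# build_readonly_args, the image reference, and the tmpfs path are hypothetical.
build_readonly_args() {
  local name=$1 image=$2
  # One argument per line so the list is easy to inspect or filter.
  printf '%s\n' run -d \
    --name "$name" \
    --read-only \
    --tmpfs /tmp \
    --cap-drop=ALL \
    --security-opt=no-new-privileges:true \
    --restart unless-stopped \
    "$image"
}

# Join the arguments onto one line for review; the command is not run.
echo "podman $(build_readonly_args grafana registry.example/grafana:TAG | tr '\n' ' ')"
```

If an app fails under these flags, it may need additional tmpfs mounts for the paths it writes to, which is presumably why the list above is limited to apps known to be safe.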
+ +## Systemd Sandbox Requirements for Rootless Podman + +These systemd service settings MUST be configured for rootless Podman to work: + +| Setting | Required Value | Why | +|---------|---------------|-----| +| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` | +| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` | +| `RestrictNamespaces=` | NOT SET | Rootless podman creates user namespaces | +| `SystemCallFilter=` | NOT SET | Rootless podman needs clone/unshare syscalls | +| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data + podman runtime paths | +| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location | diff --git a/.claude/skills/podman-doctor/references/uid-mapping.md b/.claude/skills/podman-doctor/references/uid-mapping.md new file mode 100644 index 00000000..a8338720 --- /dev/null +++ b/.claude/skills/podman-doctor/references/uid-mapping.md @@ -0,0 +1,93 @@ +# Rootless Podman UID Mapping Reference + +## How Rootless UID Mapping Works + +When Podman runs as the `archipelago` user (UID 1000), container processes don't run as their "apparent" UID on the host. Instead, Linux user namespaces remap UIDs. + +**Mapping formula**: `host_uid = 100000 + container_uid` + +This is configured in `/etc/subuid` and `/etc/subgid`: +``` +archipelago:100000:65536 +``` + +This means: +- Container UID 0 (root inside container) → Host UID 100000 (unprivileged on host) +- Container UID 70 (postgres) → Host UID 100070 +- Container UID 101 (bitcoin) → Host UID 100101 +- etc. + +## Why This Matters + +Volume directories (bind mounts) on the host must be owned by the **mapped** UID, not the container UID. If Bitcoin runs as UID 101 inside its container, the host directory must be owned by UID 100101. + +If ownership is wrong, the container gets `permission denied` when trying to read/write its data. 
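
As a rough illustration of the formula above, the sketch below computes the host UID expected for a few container UIDs. The `host_uid` helper and the `SUBUID_BASE` variable are hypothetical names introduced here, and the base of 100000 is taken from the `/etc/subuid` entry shown earlier. It only restates the document's formula; on a live system, `podman unshare cat /proc/self/uid_map` shows the mapping actually in effect and is worth checking before any `chown`.

```bash
#!/bin/bash
# Sketch only: applies the mapping formula stated in this document.
# SUBUID_BASE and host_uid are illustrative names, not part of Podman.
SUBUID_BASE=100000   # assumed start of the archipelago range in /etc/subuid

# Print the host UID that a given container UID is expected to map to.
host_uid() {
  local container_uid=$1
  echo $(( SUBUID_BASE + container_uid ))
}

# Walk through the container UIDs mentioned in this reference.
for uid in 0 70 101 472 999; do
  echo "container UID $uid -> host UID $(host_uid "$uid")"
done
```

The result for a given container UID is the value to pass as both owner and group to `chown` on that container's data directory.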
+ +## Complete UID Mapping Table + +| Container UID | Host UID | Containers | Fix Command | +|---|---|---|---| +| 0 (root) | 100000 | lnd, fedimint, fedimint-gateway, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay-server, nbxplorer, immich, nostr-rs-relay, strfry, nextcloud, searxng, onlyoffice, tailscale, uptime-kuma | `sudo chown -R 100000:100000 /var/lib/archipelago/{app}` | +| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `sudo chown -R 100070:100070 /var/lib/archipelago/postgres-*` | +| 101 | 100101 | bitcoin-knots, bitcoin-core | `sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin` | +| 472 | 100472 | grafana | `sudo chown -R 100472:100472 /var/lib/archipelago/grafana` | +| 999 | 100999 | MariaDB (mysql-mempool) | `sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool` | + +## How to Find a Container's UID + +If you encounter a new container with permission issues: + +```bash +# Check what user the container runs as +podman inspect CONTAINER_NAME --format "{{.Config.User}}" + +# If empty, it runs as root (UID 0) → host UID 100000 + +# If it shows a username, find the UID inside the image +podman run --rm IMAGE_NAME id + +# Then calculate: host_uid = 100000 + container_uid +``` + +## Fix Script + +Run this after any fresh install, migration, or when containers have permission errors: + +```bash +#!/bin/bash +# Fix all rootless podman volume ownership + +# UID 0 → 100000 (most containers) +for dir in lnd fedimint fedimint-gateway homeassistant jellyfin vaultwarden photoprism \ + ollama filebrowser electrumx btcpay nbxplorer immich nostr-rs-relay nextcloud \ + searxng onlyoffice uptime-kuma; do + [ -d "/var/lib/archipelago/$dir" ] && sudo chown -R 100000:100000 "/var/lib/archipelago/$dir" +done + +# UID 101 → 100101 (Bitcoin) +[ -d "/var/lib/archipelago/bitcoin" ] && sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin + +# UID 70 → 100070 (PostgreSQL) +for dir in 
/var/lib/archipelago/postgres-* /var/lib/archipelago/btcpay-db /var/lib/archipelago/immich-db; do + [ -d "$dir" ] && sudo chown -R 100070:100070 "$dir" +done + +# UID 999 → 100999 (MariaDB) +[ -d "/var/lib/archipelago/mysql-mempool" ] && sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool + +# UID 472 → 100472 (Grafana) +[ -d "/var/lib/archipelago/grafana" ] && sudo chown -R 100472:100472 /var/lib/archipelago/grafana +``` + +## Rootful vs Rootless Comparison + +| Aspect | Rootful (old) | Rootless (current) | +|--------|---------------|-------------------| +| Podman command | `sudo podman` | `podman` (as archipelago user) | +| Container storage | `/var/lib/containers/storage` | `~/.local/share/containers/storage` | +| Container subnet | `10.88.0.0/16` | `10.89.0.0/16` | +| Volume ownership | Container UID directly | Mapped UID (100000 + container_uid) | +| Requires root? | Yes | No (except fixing volume ownership) | +| XDG_RUNTIME_DIR | Not needed | Required: `/run/user/1000` | +| User lingering | Not needed | Required: `loginctl enable-linger` | +| Systemd restrictions | All can be enabled | Must disable: RestrictNamespaces, SystemCallFilter | diff --git a/.claude/skills/podman-fix/SKILL.md b/.claude/skills/podman-fix/SKILL.md index e3b3e413..15a4a789 100644 --- a/.claude/skills/podman-fix/SKILL.md +++ b/.claude/skills/podman-fix/SKILL.md @@ -2,19 +2,24 @@ name: podman-fix description: > Fix Podman container issues on Archipelago — restart failed containers, repair port bindings, - fix network connectivity, add missing restart policies, and resolve config drift. + fix network connectivity, add missing restart policies, fix rootless UID mapping, and resolve + config drift. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). 
Use when asked to "fix container", "restart app", "fix port mapping", "container not working", - "app won't start", "fix podman", "repair container", "container down", or after /podman-doctor - identifies issues to fix. + "app won't start", "fix podman", "repair container", "container down", "permission denied", + or after /podman-doctor identifies issues to fix. allowed-tools: Bash Read Edit Write Glob Grep --- # Podman Fix — Container Remediation -Targeted fix workflow for Podman container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it. +Targeted fix workflow for **rootless Podman** container issues on Archipelago. Given a specific problem (from /podman-doctor or user report), diagnose the root cause and fix it. **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228` +> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo. +> Only use `sudo` for: chown on volume directories, UFW changes, systemd service edits, nginx reload. +> Container UIDs are mapped via subuid: container UID N → host UID (100000 + N). + If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs fixing. ## Fix Procedures @@ -23,21 +28,22 @@ If $ARGUMENTS is provided, fix that specific app/issue. Otherwise ask what needs ```bash # Check why it stopped -sudo podman logs --tail 50 CONTAINER_NAME -sudo podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}" +podman logs --tail 50 CONTAINER_NAME +podman inspect CONTAINER_NAME --format "{{.State.ExitCode}} {{.State.Error}}" # If clean exit or crash — just restart -sudo podman start CONTAINER_NAME +podman start CONTAINER_NAME # If corrupt state — remove and recreate -sudo podman rm -f CONTAINER_NAME +podman rm -f CONTAINER_NAME # Then recreate using the install flow (trigger from UI or re-run creation command) ``` -**If container keeps crashing**: check logs for the actual error. 
Common causes: +**If container keeps crashing**, check logs for the actual error. Common causes: - Missing config file → check if volume mount has the config -- Wrong permissions → `chown -R` the data directory +- Wrong permissions → fix UID mapping (see Fix 8 below) - Dependency not ready → start dependency first, wait, then start this container +- Exit code 127 → missing binary in container image, re-pull the image ### Fix 2: Missing Restart Policy @@ -45,14 +51,14 @@ The most common uptime killer. Fix for ALL containers at once: ```bash # Fix a single container -sudo podman update --restart unless-stopped CONTAINER_NAME +podman update --restart unless-stopped CONTAINER_NAME # Fix ALL containers that have no restart policy -for c in $(sudo podman ps -a --format "{{.Names}}"); do - policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") +for c in $(podman ps -a --format "{{.Names}}"); do + policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") if [ "$policy" = "no" ] || [ -z "$policy" ]; then echo "Fixing restart policy for: $c" - sudo podman update --restart unless-stopped "$c" + podman update --restart unless-stopped "$c" fi done ``` @@ -66,23 +72,24 @@ done #### Port conflict (address already in use) ```bash # Find what's using the port -sudo ss -tlnp | grep :PORT_NUMBER +ss -tlnp | grep :PORT_NUMBER # If it's another container, either change one's port or stop the conflicting one -sudo podman stop CONFLICTING_CONTAINER +podman stop CONFLICTING_CONTAINER -# If it's a host process -sudo kill PID # or stop the service +# If it's a host process (e.g., system tor vs container tor) +sudo systemctl stop tor # Stop system service if container needs the port +sudo systemctl disable tor ``` #### Port not mapped (container running but port unreachable) ```bash # Check current port mappings -sudo podman port CONTAINER_NAME +podman port CONTAINER_NAME # Can't add ports to running container — must recreate -sudo podman stop 
CONTAINER_NAME -sudo podman rm CONTAINER_NAME +podman stop CONTAINER_NAME +podman rm CONTAINER_NAME # Recreate with correct -p flags (use the Rust install flow or manual podman run) ``` @@ -124,35 +131,51 @@ Edit `neode-ui/src/stores/appLauncher.ts`: #### Container not on archy-net (can't resolve other containers) ```bash # Connect to archy-net without recreating -sudo podman network connect archy-net CONTAINER_NAME +podman network connect archy-net CONTAINER_NAME # Verify -sudo podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}" +podman inspect CONTAINER_NAME --format "{{.NetworkSettings.Networks}}" ``` #### archy-net doesn't exist ```bash -sudo podman network create archy-net +podman network create archy-net # Then reconnect all containers that need it ``` #### DNS not working inside container ```bash # Test DNS from inside container -sudo podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \ -sudo podman exec CONTAINER_NAME ping -c1 bitcoin-knots +podman exec CONTAINER_NAME nslookup bitcoin-knots 2>/dev/null || \ +podman exec CONTAINER_NAME ping -c1 bitcoin-knots + +# If DNS fails, check the container's resolv.conf +podman exec CONTAINER_NAME cat /etc/resolv.conf # If DNS fails, recreate container with explicit DNS # Add --dns 1.1.1.1 to the podman run command ``` +#### Container subnet changed (rootful → rootless migration) +```bash +# Old rootful subnet: 10.88.0.0/16 +# New rootless subnet: 10.89.0.0/16 +# Bitcoin RPC rpcallowip must be updated if using subnet-specific allowlist + +# Check current archy-net subnet +podman network inspect archy-net --format "{{range .Subnets}}{{.Subnet}}{{end}}" + +# If Bitcoin RPC refuses connections from containers: +# Update bitcoin.conf rpcallowip to 0.0.0.0/0 (safe: only accessible via port mapping) +``` + ### Fix 5: Health Check Issues #### Add missing health check to running container Can't add to running container — must recreate with health check flags: ```bash # Example for a web app 
-sudo podman run ... \ +podman run ... \ --health-cmd "curl -f http://localhost:PORT/health || exit 1" \ --health-interval 30s \ --health-timeout 5s \ @@ -164,10 +187,10 @@ sudo podman run ... \ #### Fix unhealthy container ```bash # See what the health check is actually running -sudo podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}" +podman inspect CONTAINER_NAME --format "{{.Config.Healthcheck.Test}}" # Run the health check manually to see the error -sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND +podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND # Common fixes: # - curl not installed in container → use wget or nc instead @@ -179,13 +202,10 @@ sudo podman exec CONTAINER_NAME HEALTH_CHECK_COMMAND ```bash # Check what capabilities container has -sudo podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}" +podman inspect CONTAINER_NAME --format "{{.HostConfig.CapAdd}}" # If missing required caps, must recreate with correct --cap-add flags # Refer to the capability reference in /podman-doctor references - -# Fix data directory permissions -sudo chown -R 1000:1000 /var/lib/archipelago/APP_NAME/ ``` ### Fix 7: Full Config Consistency Fix @@ -199,12 +219,108 @@ When port map is inconsistent across layers, fix ALL layers: 5. **Deploy**: `./scripts/deploy-to-target.sh --live` 6. **Verify**: `curl -I http://192.168.1.228/app/APP_ID/` +### Fix 8: Rootless UID Mapping (Permission Denied on Volumes) + +This is the #1 rootless-specific issue. Container UIDs are remapped by user namespaces. 
+ +**Formula**: `host_uid = 100000 + container_uid` + +```bash +# Fix UID 0 containers (most apps — run as root inside, mapped to 100000 on host) +sudo chown -R 100000:100000 /var/lib/archipelago/APP_NAME + +# Fix Bitcoin (container UID 101 → host UID 100101) +sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin + +# Fix PostgreSQL (container UID 70 → host UID 100070) +sudo chown -R 100070:100070 /var/lib/archipelago/postgres-APP_NAME + +# Fix Grafana (container UID 472 → host UID 100472) +sudo chown -R 100472:100472 /var/lib/archipelago/grafana + +# Fix MariaDB (container UID 999 → host UID 100999) +sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool +``` + +**How to find the right UID for a new container:** +```bash +# Check what user the container image runs as +podman inspect IMAGE_NAME --format "{{.Config.User}}" +# If empty = root (UID 0) → host UID 100000 +# If number → host UID = 100000 + that number +# If username → run: podman run --rm IMAGE_NAME id +``` + +After fixing ownership, restart the container: +```bash +podman restart CONTAINER_NAME +``` + +### Fix 9: UFW Forward Policy (LAN Access Broken) + +If containers work locally but not from other machines on the network: + +```bash +# Check current policy +grep DEFAULT_FORWARD_POLICY /etc/default/ufw + +# Fix: change DROP to ACCEPT +sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw +sudo ufw reload +``` + +### Fix 10: Systemd Sandbox Too Restrictive + +If the Rust backend can't scan/manage containers after a systemd update: + +```bash +# Check what's blocked +sudo journalctl -u archipelago --since "10 min ago" | grep -i "denied\|permission\|namespace\|syscall" + +# The archipelago.service MUST have these for rootless podman: +# ProtectHome=no +# PrivateTmp=no (or disabled) +# RestrictNamespaces= (NOT SET — don't restrict) +# SystemCallFilter= (NOT SET — don't filter) +# ReadWritePaths=/var/lib/archipelago /etc/containers /var/lib/containers 
/run/containers /run/user /tmp +# Environment=XDG_RUNTIME_DIR=/run/user/1000 +``` + +Edit the service file: +```bash +sudo systemctl edit archipelago.service +# Add overrides, then: +sudo systemctl daemon-reload +sudo systemctl restart archipelago +``` + +### Fix 11: Stale Podman Processes + +If `podman ps` hangs or is very slow: + +```bash +# Kill stuck podman processes (>10 of them = something is wrong) +# pgrep takes an extended regex and prints 0 itself when nothing matches +stuck=$(pgrep -c -f "podman (ps|stats)" 2>/dev/null) +if [ "${stuck:-0}" -gt 10 ]; then + pkill -f "podman (ps|stats)" + echo "Killed $stuck stuck podman processes" +fi + +# Kill orphaned conmon processes holding ports +for pid in $(pgrep conmon); do + container=$(cat /proc/$pid/cmdline 2>/dev/null | tr '\0' ' ' | grep -oP '(?<=--cid )\S+') + if [ -n "$container" ] && ! podman ps -a --format "{{.ID}}" | grep -q "${container:0:12}"; then + kill "$pid" 2>/dev/null && echo "Killed orphan conmon $pid" + fi +done +``` + ## After Fixing Always verify the fix: ```bash # Container running? -sudo podman ps --filter name=CONTAINER_NAME +podman ps --filter name=CONTAINER_NAME # Port reachable? curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/ @@ -213,7 +329,10 @@ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:PORT/ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1/app/APP_ID/ # Health check passing? -sudo podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}" +podman inspect CONTAINER_NAME --format "{{.State.Health.Status}}" + +# Volume permissions correct? (rootless check) +podman exec CONTAINER_NAME ls -la /data/ 2>/dev/null || echo "Check container data path" ``` Run `/podman-doctor` again to confirm all issues are resolved. 
diff --git a/.claude/skills/podman-uptime/SKILL.md b/.claude/skills/podman-uptime/SKILL.md index 5b080e56..7142f7d2 100644 --- a/.claude/skills/podman-uptime/SKILL.md +++ b/.claude/skills/podman-uptime/SKILL.md @@ -3,7 +3,8 @@ name: podman-uptime description: > Ensure 100% container uptime on Archipelago. Sets up systemd watchdog timers, verifies restart policies, creates health check monitors, and configures auto-recovery for all - containers. Use when asked to "ensure uptime", "containers keep dying", "auto-restart", + containers. Handles rootless Podman (user: archipelago, UID 1000, subuid 100000:65536). + Use when asked to "ensure uptime", "containers keep dying", "auto-restart", "watchdog", "container monitoring", "uptime guarantee", "keep containers running", "survive reboot", or to harden container reliability. allowed-tools: Bash Read Edit Write Glob Grep @@ -15,6 +16,31 @@ Ensures every Archipelago container survives reboots, recovers from crashes, and **SSH command**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228` +> **ROOTLESS PODMAN**: All `podman` commands run as the `archipelago` user — NO sudo. +> Only use `sudo` for: systemd unit files, chown on volumes, UFW changes. +> The archipelago user runs containers directly via user namespaces. 
+ +## Prerequisites for Rootless Uptime + +Before setting up uptime infrastructure, verify rootless Podman basics are working: + +```bash +# Must be the archipelago user +whoami # archipelago + +# User lingering must be enabled (keeps user services running after logout) +ls /var/lib/systemd/linger/ | grep archipelago || sudo loginctl enable-linger archipelago + +# XDG_RUNTIME_DIR must be set +echo $XDG_RUNTIME_DIR # /run/user/1000 + +# Subuid/subgid must be configured +grep archipelago /etc/subuid # archipelago:100000:65536 + +# UFW forward policy must be ACCEPT (for LAN access to containers) +grep DEFAULT_FORWARD_POLICY /etc/default/ufw # Must be "ACCEPT" +``` + ## Layer 1: Restart Policies (Survive Reboots) Every container MUST have `--restart unless-stopped`. This is non-negotiable. @@ -23,28 +49,31 @@ Every container MUST have `--restart unless-stopped`. This is non-negotiable. ```bash # Audit -for c in $(sudo podman ps -a --format "{{.Names}}"); do - policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") +for c in $(podman ps -a --format "{{.Names}}"); do + policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") echo "$c: $policy" done # Fix any with "no" or empty policy -for c in $(sudo podman ps -a --format "{{.Names}}"); do - policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") +for c in $(podman ps -a --format "{{.Names}}"); do + policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") if [ "$policy" = "no" ] || [ -z "$policy" ]; then echo "Fixing: $c" - sudo podman update --restart unless-stopped "$c" + podman update --restart unless-stopped "$c" fi done ``` ### Ensure podman auto-starts containers on boot -```bash -# Enable podman-restart service (restarts containers with restart policy on boot) -sudo systemctl enable podman-restart.service 2>/dev/null || true +For rootless Podman, containers with restart policies are auto-started by `podman-restart` as a 
**user** service: -# If podman-restart doesn't exist, create it +```bash +# Enable the rootless podman-restart user service +systemctl --user enable podman-restart.service 2>/dev/null + +# If the user service doesn't exist, create a system-level one +# (runs as archipelago user via User= directive) cat <<'EOF' | sudo tee /etc/systemd/system/podman-restart.service [Unit] Description=Podman Start All Containers With Restart Policy @@ -53,8 +82,12 @@ Wants=network-online.target [Service] Type=oneshot +User=archipelago +Group=archipelago +Environment=XDG_RUNTIME_DIR=/run/user/1000 ExecStart=/usr/bin/podman start --all --filter restart-policy=unless-stopped RemainAfterExit=yes +TimeoutStartSec=300 [Install] WantedBy=multi-user.target @@ -73,27 +106,31 @@ Create a systemd timer that checks container health every 2 minutes and restarts ```bash cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-container-watchdog.sh #!/bin/bash -# Archipelago Container Watchdog -# Checks all containers and restarts any that are stopped or unhealthy +# Archipelago Container Watchdog (Rootless Podman) +# Runs as archipelago user — NO sudo for podman commands LOG_TAG="container-watchdog" +# Run podman as the archipelago user with correct XDG path +export XDG_RUNTIME_DIR=/run/user/1000 +PODMAN="/usr/bin/podman" + # Restart any stopped containers that should be running (have restart policy) -for c in $(sudo podman ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}"); do +for c in $($PODMAN ps -a --filter status=exited --filter restart-policy=unless-stopped --format "{{.Names}}" 2>/dev/null); do logger -t "$LOG_TAG" "Restarting stopped container: $c" - sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG" + $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG" done # Restart unhealthy containers -for c in $(sudo podman ps --filter health=unhealthy --format "{{.Names}}"); do +for c in $($PODMAN ps --filter health=unhealthy --format "{{.Names}}" 2>/dev/null); do 
logger -t "$LOG_TAG" "Restarting unhealthy container: $c" - sudo podman restart "$c" 2>&1 | logger -t "$LOG_TAG" + $PODMAN restart "$c" 2>&1 | logger -t "$LOG_TAG" done # Check for containers in "created" state (never started) -for c in $(sudo podman ps -a --filter status=created --format "{{.Names}}"); do +for c in $($PODMAN ps -a --filter status=created --format "{{.Names}}" 2>/dev/null); do logger -t "$LOG_TAG" "Starting created container: $c" - sudo podman start "$c" 2>&1 | logger -t "$LOG_TAG" + $PODMAN start "$c" 2>&1 | logger -t "$LOG_TAG" done SCRIPT @@ -103,7 +140,7 @@ sudo chmod +x /usr/local/bin/archipelago-container-watchdog.sh ### Create the systemd timer ```bash -# Service unit +# Service unit — runs as archipelago user for rootless podman cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-watchdog.service [Unit] Description=Archipelago Container Watchdog @@ -111,6 +148,9 @@ After=podman-restart.service [Service] Type=oneshot +User=archipelago +Group=archipelago +Environment=XDG_RUNTIME_DIR=/run/user/1000 ExecStart=/usr/local/bin/archipelago-container-watchdog.sh EOF @@ -150,17 +190,20 @@ Some containers depend on others. 
The watchdog handles restarts, but initial boo ```bash cat <<'SCRIPT' | sudo tee /usr/local/bin/archipelago-ordered-start.sh #!/bin/bash -# Ordered container startup for Archipelago +# Ordered container startup for Archipelago (Rootless Podman) +# Runs as archipelago user — NO sudo for podman commands # Respects dependency chain: bitcoin → electrs/lnd → mempool/btcpay LOG_TAG="ordered-start" +export XDG_RUNTIME_DIR=/run/user/1000 +PODMAN="/usr/bin/podman" wait_for_container() { local name=$1 local max_wait=${2:-60} local waited=0 while [ $waited -lt $max_wait ]; do - status=$(sudo podman inspect "$name" --format "{{.State.Running}}" 2>/dev/null) + status=$($PODMAN inspect "$name" --format "{{.State.Running}}" 2>/dev/null) if [ "$status" = "true" ]; then logger -t "$LOG_TAG" "$name is running" return 0 @@ -174,38 +217,45 @@ wait_for_container() { # Tier 0: Infrastructure logger -t "$LOG_TAG" "Starting Tier 0: Infrastructure" -sudo podman start tailscale 2>/dev/null +$PODMAN start tailscale 2>/dev/null -# Tier 1: Bitcoin (foundation) -logger -t "$LOG_TAG" "Starting Tier 1: Bitcoin" -sudo podman start bitcoin-knots 2>/dev/null +# Tier 1: Databases (must start before services that depend on them) +logger -t "$LOG_TAG" "Starting Tier 1: Databases" +$PODMAN start mempool-db 2>/dev/null +$PODMAN start btcpay-postgres 2>/dev/null +$PODMAN start immich_postgres 2>/dev/null +sleep 5 + +# Tier 2: Bitcoin (foundation for Lightning and explorers) +logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin" +$PODMAN start bitcoin-knots 2>/dev/null wait_for_container bitcoin-knots 120 -# Tier 2: Bitcoin-dependent services -logger -t "$LOG_TAG" "Starting Tier 2: Bitcoin-dependent" -sudo podman start electrs 2>/dev/null -sudo podman start lnd 2>/dev/null -wait_for_container electrs 90 +# Tier 3: Bitcoin-dependent services +logger -t "$LOG_TAG" "Starting Tier 3: Bitcoin-dependent" +$PODMAN start electrumx 2>/dev/null +$PODMAN start lnd 2>/dev/null +wait_for_container electrumx 90 
wait_for_container lnd 90 -# Tier 3: Services depending on Tier 2 -logger -t "$LOG_TAG" "Starting Tier 3: Second-order dependencies" -sudo podman start mempool-db 2>/dev/null -sleep 5 -sudo podman start mempool 2>/dev/null -sudo podman start nbxplorer 2>/dev/null +# Tier 4: Services depending on Tier 3 +logger -t "$LOG_TAG" "Starting Tier 4: Second-order dependencies" +$PODMAN start mempool 2>/dev/null +$PODMAN start nbxplorer 2>/dev/null sleep 10 -sudo podman start btcpay-server 2>/dev/null -sudo podman start btcpay-postgres 2>/dev/null +$PODMAN start btcpay-server 2>/dev/null +$PODMAN start fedimint 2>/dev/null +$PODMAN start fedimint-gateway 2>/dev/null -# Tier 4: Independent apps (start all remaining) -logger -t "$LOG_TAG" "Starting Tier 4: Independent apps" -sudo podman start --all 2>/dev/null +# Tier 5: Independent apps (start all remaining) +logger -t "$LOG_TAG" "Starting Tier 5: Independent apps" +$PODMAN start --all 2>/dev/null -# Tier 5: UI containers (need parent apps running first) -logger -t "$LOG_TAG" "Starting Tier 5: UI containers" -sudo podman start bitcoin-ui 2>/dev/null -sudo podman start lnd-ui 2>/dev/null +# Tier 6: UI containers (need parent apps running first) +logger -t "$LOG_TAG" "Starting Tier 6: UI containers" +$PODMAN start bitcoin-ui 2>/dev/null +$PODMAN start lnd-ui 2>/dev/null +$PODMAN start electrs-ui 2>/dev/null logger -t "$LOG_TAG" "Startup sequence complete" SCRIPT @@ -216,18 +266,22 @@ sudo chmod +x /usr/local/bin/archipelago-ordered-start.sh ### Wire into boot sequence ```bash +# Runs as archipelago user for rootless podman cat <<'EOF' | sudo tee /etc/systemd/system/archipelago-containers.service [Unit] Description=Archipelago Ordered Container Startup -After=network-online.target podman.service +After=network-online.target Wants=network-online.target Before=archipelago.service [Service] Type=oneshot +User=archipelago +Group=archipelago +Environment=XDG_RUNTIME_DIR=/run/user/1000 
ExecStart=/usr/local/bin/archipelago-ordered-start.sh RemainAfterExit=yes -TimeoutStartSec=300 +TimeoutStartSec=600 [Install] WantedBy=multi-user.target @@ -237,14 +291,45 @@ sudo systemctl daemon-reload sudo systemctl enable archipelago-containers.service ``` +## Rootless-Specific Uptime Considerations + +### Volume ownership survives reboots +Volume ownership doesn't change on reboot, but if a container image is updated (re-pulled), the new container may run as a different UID. Always verify after image updates: + +```bash +# Quick ownership audit after image pull +podman inspect CONTAINER_NAME --format "{{.Config.User}}" +# Then verify: sudo stat -c '%u:%g' /var/lib/archipelago/APP_NAME +# Formula: host_uid = 100000 + container_uid +``` + +### XDG_RUNTIME_DIR on boot +Rootless Podman requires `/run/user/1000` to exist. This is created by `pam_systemd` when the user logs in, or by `loginctl enable-linger`. If it's missing after boot, containers won't start. + +```bash +# Verify it exists +ls -la /run/user/1000/ || echo "CRITICAL: /run/user/1000 missing — run: sudo loginctl enable-linger archipelago" +``` + +### Systemd sandbox must not block podman +If the archipelago.service sandbox blocks namespace/syscall operations, the Rust backend can't scan containers. See Fix 10 in /podman-fix. 
+ ## Verification Checklist After setting up all 3 layers, verify: ```bash +echo "=== Rootless Podman Prerequisites ===" +echo "User: $(whoami)" +echo "XDG_RUNTIME_DIR: $XDG_RUNTIME_DIR" +grep archipelago /etc/subuid | head -1 +ls /var/lib/systemd/linger/ | grep archipelago && echo "Linger: enabled" || echo "Linger: DISABLED" +grep DEFAULT_FORWARD_POLICY /etc/default/ufw + +echo "" echo "=== Layer 1: Restart Policies ===" -for c in $(sudo podman ps -a --format "{{.Names}}"); do - policy=$(sudo podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") +for c in $(podman ps -a --format "{{.Names}}"); do + policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}") echo " $c: $policy" done @@ -261,11 +346,19 @@ sudo systemctl is-enabled archipelago-watchdog.timer 2>/dev/null || echo "watchd echo "" echo "=== Container Health Summary ===" -total=$(sudo podman ps -a --format "{{.Names}}" | wc -l) -running=$(sudo podman ps --format "{{.Names}}" | wc -l) +total=$(podman ps -a --format "{{.Names}}" | wc -l) +running=$(podman ps --format "{{.Names}}" | wc -l) stopped=$((total - running)) -unhealthy=$(sudo podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l) +unhealthy=$(podman ps --filter health=unhealthy --format "{{.Names}}" | wc -l) echo " Total: $total | Running: $running | Stopped: $stopped | Unhealthy: $unhealthy" + +echo "" +echo "=== Volume Ownership Spot Check ===" +for dir in bitcoin lnd grafana; do + if [ -d "/var/lib/archipelago/$dir" ]; then + echo " $dir: $(stat -c '%u:%g' /var/lib/archipelago/$dir)" + fi +done ``` ## Reboot Test @@ -274,17 +367,20 @@ The ultimate uptime test — reboot the server and verify everything comes back: ```bash # Before reboot: record running containers -sudo podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt +podman ps --format "{{.Names}}" | sort > /tmp/before-reboot.txt # Reboot sudo reboot # After reboot (wait ~3 minutes, then SSH back in): -sudo podman ps --format "{{.Names}}" | 
sort > /tmp/after-reboot.txt +podman ps --format "{{.Names}}" | sort > /tmp/after-reboot.txt # Compare diff /tmp/before-reboot.txt /tmp/after-reboot.txt # Should show no differences + +# Also verify XDG_RUNTIME_DIR survived reboot +ls /run/user/1000/ || echo "CRITICAL: lingering not working" ``` ## Monitoring @@ -292,18 +388,23 @@ diff /tmp/before-reboot.txt /tmp/after-reboot.txt Check uptime status anytime: ```bash # Quick status -sudo podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort +podman ps -a --format "table {{.Names}}\t{{.Status}}" | sort # Watchdog activity sudo journalctl -t container-watchdog --since "24 hours ago" --no-pager # Container events (starts, stops, deaths) -sudo podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30 +podman events --since 24h --filter event=start --filter event=stop --filter event=died 2>/dev/null | tail -30 + +# Check for permission denied errors (rootless UID mapping issue) +podman ps -a --filter status=exited --format "{{.Names}}" | while read c; do + podman logs --tail 5 "$c" 2>&1 | grep -i "permission denied" && echo " ^ UID mapping issue in: $c" +done ``` ## Integration -- Run `/podman-doctor` first to identify issues -- Run `/podman-fix` for specific container repairs +- Run `/podman-doctor` first to identify issues (includes rootless health checks) +- Run `/podman-fix` for specific container repairs (includes UID mapping fixes) - Run `/podman-uptime` to set up permanent reliability infrastructure - Add to ISO build: copy watchdog scripts to `image-recipe/configs/` and enable in first-boot diff --git a/docs/architecture-review.html b/docs/architecture-review.html new file mode 100644 index 00000000..2c23e1b9 --- /dev/null +++ b/docs/architecture-review.html @@ -0,0 +1,1528 @@ + + + + + +Archipelago — Architecture Review & Learning Guide + + + + + + + + +
+ + +
+

Archipelago

+

A complete architecture review and learning guide for the Bitcoin Node OS — explained so anyone can understand it.

+
+ Rust + Vue 3 + Podman + ~46,000 lines of Rust + ~12,000 lines of TypeScript + ~100 shell scripts + v0.1.0-alpha +
+
+ + +

What Is Archipelago?

+ +

Archipelago (nicknamed "Archy") is a personal server operating system focused on Bitcoin. You download an ISO file, flash it to a USB drive, install it on any computer, and it gives you:

+ +
    +
  • A full Bitcoin node — you verify your own transactions, no trust in anyone else
  • +
  • A Lightning Network node — fast, cheap Bitcoin payments
  • +
  • A web dashboard — manage everything from your phone or laptop browser
  • +
  • An app marketplace — install apps like Nextcloud, Jellyfin, Vaultwarden with one click
  • +
  • Privacy by default — Tor routing, encrypted secrets, no telemetry
  • +
+ +
+

Think of it like an iPhone for servers. Apple gives you a phone with an App Store where you install apps. Archipelago gives you a server with a Marketplace where you install self-hosted apps. The difference? You own and control everything — your data never leaves your machine.

+
+ +

Similar projects exist (Umbrel, Start9, RaspiBlitz), but Archipelago is built from scratch with production-grade security and a custom Rust backend instead of Node.js.

+ + +

The Big Picture

+ +

Before diving into code, understand the four layers of the system and how they stack:

+ +
+┌──────────────────────────────────────────────────────┐ +│ YOUR BROWSER │ +│ (Vue.js Single Page Application) │ +└──────────────────────┬───────────────────────────────┘ + │ HTTP requests (fetch API) +┌──────────────────────┴───────────────────────────────┐ +│ NGINX │ +│ Reverse proxy — routes traffic to the right place │ +│ /rpc/v1 → backend /app/bitcoin/ → container │ +└──────────────────────┬───────────────────────────────┘ + │ Internal HTTP (port 5678) +┌──────────────────────┴───────────────────────────────┐ +│ RUST BACKEND │ +│ The brain — handles auth, app installs, Bitcoin │ +│ RPC, mesh networking, federation, health checks │ +└──────────────────────┬───────────────────────────────┘ + │ Podman commands (CLI) +┌──────────────────────┴───────────────────────────────┐ +│ PODMAN CONTAINERS │ +│ Bitcoin Core, LND, Mempool, Nextcloud, etc. │ +│ Each app runs isolated in its own container │ +└──────────────────────────────────────────────────────┘ + +┌──────────────────────────────────────────────────────┐ +│ DEBIAN 12 (Linux OS) │ +│ The foundation — systemd, firewall, filesystem │ +└──────────────────────────────────────────────────────┘ +
+ +
+ Key Concept: Separation of Concerns + Each layer has ONE job. The browser shows things. Nginx routes traffic. Rust makes decisions. Podman runs apps. This makes the system easier to understand, test, and fix — if the UI breaks, you know the problem is in the Vue code, not the Rust code. +
+ + +

How It Runs on a Machine

+ +

When you install Archipelago on a computer and power it on, here's what happens in order:

+ +
1

Linux boots — Debian 12 starts up, loads drivers, mounts disks

+
2

systemd starts services — A program called systemd reads archipelago.service and launches the Rust backend

+
3

Rust backend initializes — Loads config, creates/loads encryption keys, starts the HTTP server on port 5678

+
4

Health monitor starts — Checks which containers are running, restarts crashed ones, reports readiness

+
5

Nginx starts — Listens on port 80 (HTTP) and routes all incoming traffic

+
6

Containers start — Bitcoin, LND, and other apps start in priority order (Bitcoin first, then things that depend on it)

+
7

Ready! — You open a browser, go to your server's IP address, and see the dashboard

+ +
+

It's like starting a restaurant. First the building opens (Linux). Then the manager arrives (Rust backend). They check if all kitchen stations are ready (health monitor). The front door opens (Nginx). The cooks start preparing (containers). Customers can now order (you open the web UI).

+
+ + +

The Four Layers — Detailed

+ +

Layer 1: The Rust Backend (The Brain)

+ +

This is the most important piece. It's written in Rust — a programming language known for speed and safety. The backend is the "brain" that controls everything.

+ +
+ Why Rust? + Rust prevents entire categories of bugs (use-after-free, buffer overflows, data races) at compile time. For a server that manages Bitcoin wallets and runs 24/7, this matters. A crash could mean lost money. Rust cannot rule out every crash, but it eliminates the memory-safety failures behind many of them. +
+ +

How the code is organized

+

The Rust code lives in core/ and is split into 5 separate packages (called "crates"):

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
CrateWhat It DoesSizeAnalogy
archipelagoThe main program. Contains all the API endpoints, authentication, identity management, federation, mesh networking~12,000 linesThe restaurant manager — coordinates everything
containerTalks to Podman to create, start, stop, and monitor containers~2,000 linesThe kitchen manager — controls the cook stations
securityEncrypts secrets, generates security profiles, verifies container images~500 linesThe security guard — locks doors, checks IDs
performanceMonitors CPU, memory, and disk usage~300 linesThe meter reader — watches resource gauges
parmanodeCompatibility layer for migrating from an older project~600 linesA translation book — speaks the old language
+ +

Key files you should know

+ + + + + + + + + + + +
FileWhat It DoesLines
main.rsThe entry point — starts the server, registers signal handlers~200
server.rsWires everything together — creates the HTTP server, connects components~500
api/rpc/mod.rsThe traffic cop — receives API calls and sends them to the right handler~1,000
api/rpc/package.rsThe app installer — handles installing, starting, stopping containers~1,770
session.rsLogin management — creates sessions, validates tokens, persists to disk~790
health_monitor.rsWatches containers, restarts crashed ones, reports system health~710
federation.rsMulti-node communication — syncs state between trusted Archipelago nodes~810
credentials.rsVerifiable credentials — W3C standard digital identity proofs~800
+ +

How the backend handles a request

+ +
+Browser sends: POST /rpc/v1 +Body: { "method": "package.install", "params": { "id": "bitcoin-knots" } } + +Step 1: Nginx receives it on port 80, forwards to port 5678 +Step 2: Rust HTTP server (Hyper) receives the raw bytes +Step 3: mod.rs parses the JSON, extracts the method name +Step 4: mod.rs checks the CSRF token (security check) +Step 5: mod.rs checks the session cookie (are you logged in?) +Step 6: mod.rs routes to package.rs based on method name +Step 7: package.rs validates the app ID, checks dependencies +Step 8: package.rs tells Podman to pull the container image +Step 9: package.rs creates and starts the container +Step 10: Response sent back: { "result": { "state": "installing" } } +
+ +
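The steps above boil down to "parse the body, check the session, look up a handler by method name". The following is a minimal TypeScript sketch of that dispatch pattern, for illustration only. The real router is the Rust code in `api/rpc/mod.rs`; the names `RpcRequest`, `handlers`, and `dispatch`, the error strings, and the stand-in installer body are all hypothetical, and the CSRF check from Step 4 is left out to keep the sketch short.

```typescript
// Hypothetical request/response shapes, modeled on the example request above.
interface RpcRequest {
  method: string;
  params?: Record<string, unknown>;
}

type RpcResponse = { result: unknown } | { error: string };
type Handler = (params: Record<string, unknown>) => unknown;

// Step 6: one handler per method name. Only "package.install" appears in the
// text; this body is a stand-in, not the real installer.
const handlers = new Map<string, Handler>([
  [
    "package.install",
    (params) => {
      if (typeof params.id !== "string") {
        throw new Error("missing app id");
      }
      return { state: "installing" };
    },
  ],
]);

// Steps 3, 5, 6 and 10: parse, check the session, route, respond.
function dispatch(rawBody: string, loggedIn: boolean): RpcResponse {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { error: "invalid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { error: "invalid request" };
  }
  const request = parsed as Partial<RpcRequest>;
  if (typeof request.method !== "string") {
    return { error: "missing method" };
  }
  if (!loggedIn) {
    return { error: "not authenticated" };
  }
  const handler = handlers.get(request.method);
  if (handler === undefined) {
    return { error: `unknown method: ${request.method}` };
  }
  try {
    return { result: handler(request.params ?? {}) };
  } catch (err) {
    return { error: err instanceof Error ? err.message : String(err) };
  }
}

const exampleBody = JSON.stringify({
  method: "package.install",
  params: { id: "bitcoin-knots" },
});
console.log(JSON.stringify(dispatch(exampleBody, true)));
// prints {"result":{"state":"installing"}}
```

Keeping the handlers in a lookup table means adding a new API method is one new entry, and every method passes through the same parsing and session checks before its handler runs.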
+ +

Layer 2: The Vue.js Frontend (The Face)

+ +

The frontend is what you see in the browser. It's built with Vue 3 — a JavaScript framework for building interactive web pages — and TypeScript — JavaScript with type safety.

+ +
+ What is a Single Page Application (SPA)? + Instead of loading a new HTML page every time you click something (like old websites), an SPA loads once and then dynamically updates the page content. When you click "Marketplace" in Archipelago, it doesn't load a new page — it swaps out the content area. This makes it feel fast and smooth, like a native app. +
+ +

Frontend file structure

+
neode-ui/src/
+├── api/              ← How the frontend talks to the backend
+│   ├── rpc-client.ts    ← Makes API calls (fetch + retry + auth)
+│   ├── container-client.ts ← Container-specific API helpers
+│   └── websocket.ts     ← Real-time updates (push, not poll)
+├── views/            ← Full pages (one per route)
+│   ├── Dashboard.vue    ← Main dashboard with sidebar
+│   ├── Marketplace.vue  ← App store for installing containers
+│   ├── Settings.vue     ← System settings
+│   ├── Web5.vue         ← Decentralized identity management
+│   ├── Mesh.vue         ← LoRa mesh radio interface
+│   └── Login.vue        ← Login page
+├── components/       ← Reusable UI pieces
+│   ├── BootScreen.vue   ← Startup loading animation
+│   ├── SplashScreen.vue ← Welcome/intro screen
+│   └── SpotlightSearch.vue ← Command palette (Cmd+K)
+├── stores/           ← State management (Pinia)
+│   ├── app.ts           ← Core app state (auth, server data)
+│   ├── container.ts     ← Container states & lifecycle
+│   ├── mesh.ts          ← Mesh networking state
+│   └── appLauncher.ts   ← App launching & iframe management
+├── composables/      ← Reusable logic (like React hooks)
+│   ├── useToast.ts      ← Notification popups
+│   └── useAudioPlayer.ts ← Sound effects
+├── types/            ← TypeScript type definitions
+│   └── api.ts           ← Shapes of data from the backend
+├── router/           ← URL → page mapping
+└── style.css            ← All global styles (glassmorphism theme)
+ +

How a Vue component works

+

Every .vue file has three sections:

+ +
<!-- 1. THE LOGIC (TypeScript) -->
+<script setup lang="ts">
+import { ref, onMounted } from 'vue'
+import { rpcClient } from '@/api/rpc-client'
+
+// "ref" is a reactive variable — when it changes, the UI updates automatically
+const apps = ref&lt;{ name: string }[]&gt;([])
+const loading = ref(true)
+
+// "onMounted" runs when the component first appears on screen
+onMounted(async () =&gt; {
+  try {
+    apps.value = await rpcClient.getMarketplace()
+  } finally {
+    // Stop showing "Loading..." even if the request fails
+    loading.value = false
+  }
+})
+</script>
+
+<!-- 2. THE TEMPLATE (HTML with Vue directives) -->
+<template>
+  <div v-if="loading">Loading...</div>
+  &lt;div v-else v-for="app in apps" :key="app.name" class="glass-card"&gt;
+    {{ app.name }}
+  </div>
+</template>
+
+<!-- 3. THE STYLES (CSS, scoped to this component) -->
+<style scoped>
+  /* Styles here only affect THIS component */
+</style>
+ +
+

A Vue component is like a LEGO brick. Each brick (component) has its own shape (template), color (styles), and moving parts (script). You snap them together to build the full UI. The <Dashboard> component contains <Sidebar>, which contains <NavItem> components — just like nesting LEGO bricks.

+
+ +
+ +

Layer 3: The Container System (The Apps)

+ +

Containers are how Archipelago runs apps like Bitcoin Core, Lightning, Nextcloud, etc. Each app runs in its own isolated "box" called a container.

+ +
+ What is a Container? + A container is like a lightweight virtual machine. It has its own filesystem, its own network, and its own processes — but it shares the host's Linux kernel, so it's much faster than a full VM. Think of it as an apartment in a building — each apartment has its own walls and locks, but they all share the same building infrastructure. +
+ +

Archipelago uses Podman instead of Docker. Their command-line interfaces are nearly identical, but Podman can run containers without root privileges (more secure) and doesn't need a background daemon.

+ +

Container security rules

+

Every container in Archipelago follows strict security rules:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Rule | What It Means | Why |
|---|---|---|
| --cap-drop=ALL | Remove all Linux capabilities (super-powers) | A hacked container can't do anything dangerous |
| --cap-add=CHOWN | Give back only the specific powers needed | Minimum privilege: only what's necessary |
| readonly_root: true | Container can't modify its own program files | Prevents malware from modifying the app |
| --user 1001:1001 | Run as non-root user | Even if exploited, can't access system files |
| no-new-privileges | Can't escalate to higher permissions | Prevents privilege escalation attacks |
+ +
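As a rough illustration of how the rules in the table combine, the sketch below assembles the corresponding `podman run` flags. The helper name `buildSecurityFlags` and its options type are hypothetical and not taken from the Archipelago codebase; only the flag values come from the table.

```typescript
// Hypothetical helper: names are illustrative, not Archipelago's actual API.
interface SecurityOptions {
  addCapabilities: string[]; // capabilities to give back, e.g. ["CHOWN"]
  user: string;              // "uid:gid" of a non-root user
  readOnlyRoot: boolean;     // mount the container's root filesystem read-only
}

function buildSecurityFlags(opts: SecurityOptions): string[] {
  // Start from zero capabilities, then add back only what was requested.
  const flags: string[] = ["--cap-drop=ALL"];
  for (const cap of opts.addCapabilities) {
    flags.push(`--cap-add=${cap}`);
  }
  flags.push(`--user=${opts.user}`);
  flags.push("--security-opt=no-new-privileges:true");
  if (opts.readOnlyRoot) {
    flags.push("--read-only");
  }
  return flags;
}

const exampleFlags = buildSecurityFlags({
  addCapabilities: ["CHOWN"],
  user: "1001:1001",
  readOnlyRoot: true,
});
console.log(["podman", "run", ...exampleFlags, "IMAGE"].join(" "));
```

Building the flags in one place means a new app cannot accidentally skip a rule; it can only opt in to extra capabilities.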

Container startup order (tiers)

+
+Tier 1: Foundation (start first, other apps depend on these)
+  ├── Bitcoin Core/Knots    ← The blockchain
+  ├── MySQL/PostgreSQL      ← Databases
+  └── Redis                 ← Cache
+
+Tier 2: Core Services (need Tier 1 to be running)
+  ├── LND (Lightning)       ← Needs Bitcoin
+  ├── ElectrumX             ← Needs Bitcoin
+  ├── Mempool               ← Needs Bitcoin + ElectrumX
+  └── BTCPay Server         ← Needs Bitcoin + LND
+
+Tier 3: Applications (independent or need Tier 2)
+  ├── Nextcloud, Jellyfin   ← File storage, media
+  ├── Vaultwarden           ← Password manager
+  ├── Home Assistant        ← Smart home
+  └── Grafana               ← Monitoring dashboards
+
+ +
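The tier diagram is essentially a dependency graph. One way to derive a safe start order from such a graph is a depth-first topological sort, sketched below. The `dependsOn` map is an assumption built from the arrows in the diagram; Archipelago itself may hard-code its tiers rather than compute them.

```typescript
// Dependencies read off the tier diagram; the data structure is an assumption.
const dependsOn: Record<string, string[]> = {
  bitcoin: [],
  lnd: ["bitcoin"],
  electrumx: ["bitcoin"],
  mempool: ["bitcoin", "electrumx"],
  btcpay: ["bitcoin", "lnd"],
};

// Returns the apps ordered so that every app appears after its dependencies.
function startOrder(deps: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (app: string): void => {
    if (state.get(app) === "done") return;
    if (state.get(app) === "visiting") {
      throw new Error(`dependency cycle at ${app}`);
    }
    state.set(app, "visiting");
    for (const dep of deps[app] ?? []) {
      visit(dep); // start dependencies first
    }
    state.set(app, "done");
    order.push(app);
  };

  for (const app of Object.keys(deps)) {
    visit(app);
  }
  return order;
}

console.log(startOrder(dependsOn).join(" -> "));
```

A cycle check matters here: if two apps ever declared each other as dependencies, a naive loop would never terminate, whereas this version fails fast with a clear error.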
+ +

Layer 4: Nginx (The Traffic Cop)

+ +

Nginx (pronounced "engine-X") is a web server that sits between the internet and everything else. Every single request goes through it first.

+ +
+

Nginx is like the receptionist at a hospital. You walk in and say what you need. "I need the API" — they send you to the Rust backend. "I need the Bitcoin app" — they send you to the Bitcoin container. "I need the website" — they hand you the static files. Without the receptionist, you'd be wandering the hallways lost.

+
+ +

How Nginx routes traffic

+ + + + + + + + + +
| URL Pattern | Goes To | Why |
|---|---|---|
| /rpc/v1 | Rust backend (:5678) | All API calls |
| /health | Rust backend (:5678) | Health checks (no auth needed) |
| /app/bitcoin-ui/ | Bitcoin container (:8334) | Bitcoin web interface |
| /app/mempool/ | Mempool container (:4080) | Mempool explorer |
| /app/filebrowser/ | FileBrowser container (:8083) | File manager |
| /aiui/ | Static files on disk | AI chat interface |
| / (everything else) | Vue.js SPA files on disk | The main dashboard |
+ +

Nginx also handles rate limiting (blocking too many requests), security headers (preventing attacks), and WebSocket upgrades (for real-time updates).
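To make the routing table concrete, here is a small model of longest-prefix matching, which is roughly how Nginx picks between plain prefix `location` blocks. This is a simplification for illustration only: the target labels are made up, and the real configuration lives in the Nginx config file, not in TypeScript.

```typescript
// Prefixes and ports copied from the routing table; target labels are hypothetical.
const routes: Array<{ prefix: string; target: string }> = [
  { prefix: "/rpc/v1", target: "rust-backend:5678" },
  { prefix: "/health", target: "rust-backend:5678" },
  { prefix: "/app/bitcoin-ui/", target: "bitcoin-container:8334" },
  { prefix: "/app/mempool/", target: "mempool-container:4080" },
  { prefix: "/app/filebrowser/", target: "filebrowser-container:8083" },
  { prefix: "/aiui/", target: "static:aiui" },
];

// Pick the route with the longest matching prefix; fall back to the Vue SPA.
function resolveRoute(path: string): string {
  let bestPrefix = "";
  let bestTarget = "static:spa";
  for (const route of routes) {
    if (path.startsWith(route.prefix) && route.prefix.length > bestPrefix.length) {
      bestPrefix = route.prefix;
      bestTarget = route.target;
    }
  }
  return bestTarget;
}

console.log(resolveRoute("/app/mempool/tx/abc"));
console.log(resolveRoute("/dashboard/settings"));
```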

+ + +

How Data Flows Through the System

+ +

Let's trace what happens when you click "Install Bitcoin" in the UI:

+ +
+
1. You click the Install button in Marketplace.vue. Vue calls the Pinia store action installPackage('bitcoin-knots').
2. The store calls the RPC client: rpcClient.installPackage('bitcoin-knots', 'docker.io/bitcoin/knots:28').
3. RPC client sends HTTP POST to /rpc/v1 with a session cookie and CSRF token for security.
4. Nginx receives the request on port 80, checks rate limits, and forwards it to the Rust backend on port 5678.
5. Rust backend validates: checks your session is valid, CSRF token matches, app ID is safe (no shell injection characters).
6. Rust checks dependencies: if you're installing LND, it checks Bitcoin is already running.
7. Rust tells Podman to pull the image: podman pull docker.io/bitcoin/knots:28 (downloads the app).
8. Rust creates and starts the container with all security flags (cap-drop, readonly root, etc.).
9. Backend sends a WebSocket update: the frontend receives a "state changed" event in real time.
10. Vue reactively updates the UI: the Marketplace card changes from "Install" to "Running" with no page reload.

+
+ + +

RPC: How Frontend Talks to Backend

+ +

RPC stands for Remote Procedure Call. It's a way for the frontend to tell the backend "do something" — like calling a function on a remote computer.

+ +
+ RPC vs REST + Most web APIs use REST (different URLs for different things: GET /users, POST /users, DELETE /users/5). Archipelago uses RPC instead — every request goes to the same URL (/rpc/v1) and the method name says what to do. It's like having one phone number for a building, and you say who you want to talk to. +
+ +

The frontend has a class called RPCClient (in rpc-client.ts) with ~70 methods. Each method maps to a backend function:

+ + + + + + + + +
| Frontend Method | Backend Handler | What It Does |
|---|---|---|
| rpcClient.login(password) | auth.login | Log in with password |
| rpcClient.getServerInfo() | system.info | Get server name, version, uptime |
| rpcClient.installPackage(id, image) | package.install | Install a container app |
| rpcClient.getBitcoinInfo() | bitcoin.info | Get blockchain sync %, block height |
| rpcClient.sendMeshMessage(text) | mesh.send | Send a message over LoRa radio |
+ +
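The single-endpoint idea can be sketched as a function that turns a method name and parameters into one POST to /rpc/v1. The envelope shape (`method` plus `params`) and the `X-CSRF-Token` header name are assumptions made for this sketch; the source only states that every call goes to /rpc/v1 with a session cookie and a CSRF token.

```typescript
// Assumed wire format: { method, params }. The real envelope may differ.
interface RpcRequest {
  method: string;
  params: Record<string, unknown>;
}

function buildRpcRequest(
  method: string,
  params: Record<string, unknown>,
  csrfToken: string,
) {
  const body: RpcRequest = { method, params };
  return {
    url: "/rpc/v1", // every call uses the same URL
    init: {
      method: "POST",
      credentials: "include" as const, // send the session cookie
      headers: {
        "Content-Type": "application/json",
        "X-CSRF-Token": csrfToken, // header name is an assumption
      },
      body: JSON.stringify(body),
    },
  };
}

const installRequest = buildRpcRequest(
  "package.install",
  { id: "bitcoin-knots", image: "docker.io/bitcoin/knots:28" },
  "example-csrf-token",
);
console.log(installRequest.url, installRequest.init.body);
```

In a browser, the result could be passed straight to `fetch(installRequest.url, installRequest.init)`.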

Built-in resilience

+

The RPC client has built-in protections:

+
  • Auto-retry: if a request fails (502/503), it waits and tries again (up to 3 times)
  • Timeout: if the backend doesn't respond in 30 seconds, the request fails instead of hanging forever
  • Session expiry: if you get a 401 (unauthorized), it redirects to the login page
  • CSRF protection: every request includes a security token to prevent cross-site attacks
+ + +
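The retry and session-expiry rules above boil down to a small decision table. The sketch below expresses it as a pure function; the names `decide` and `Decision` are hypothetical, and only the status codes and the limit of 3 retries come from the text.

```typescript
const MAX_RETRIES = 3; // from the text: retried up to 3 times

type Decision = "ok" | "retry" | "login" | "fail";

// `retriesSoFar` is how many times this request has already been retried.
function decide(status: number, retriesSoFar: number): Decision {
  if (status === 401) {
    return "login"; // session expired: send the user to the login page
  }
  if (status === 502 || status === 503) {
    // Backend temporarily unavailable: retry until the budget is used up.
    return retriesSoFar < MAX_RETRIES ? "retry" : "fail";
  }
  return status >= 200 && status < 300 ? "ok" : "fail";
}

console.log(decide(503, 0), decide(503, 3), decide(401, 0), decide(200, 0));
```

Keeping the policy separate from the network call makes it easy to unit-test, which is relevant given the lack of automated tests noted later in this review.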

State Management

+ +

State is the data your app is currently working with: is the user logged in? What apps are installed? Is Bitcoin synced? This data needs to be shared between components.

+ +
+ What is Pinia? + Pinia is Vue's state management library. Instead of each component keeping its own data (which leads to chaos), you put shared data in a "store" — a central place that any component can read from and write to. When the store changes, every component that uses it updates automatically. +
+ +

Archipelago has 15 Pinia stores. Six of them are shown here, each with a rough quality rating:

+ +
+
+

  • app.ts (god store): Auth, WebSocket, server data, package management. Does too much (see refactoring section)
  • container.ts (good): Container lifecycle, covering running, stopped, and installing states
  • mesh.ts (okay): LoRa radio state, covering device, peers, messages, and channels
  • appLauncher.ts (okay): App iframe management, Nostr consent, port mapping
  • spotlight.ts (good): Command palette (Cmd+K) with search and help modal
  • goals.ts (good): Gamified goal/quest tracking state machine

+
+
+ +

WebSocket: real-time updates

+

Instead of the frontend asking "has anything changed?" every second (polling), the backend pushes updates to the frontend through a WebSocket — a persistent, two-way connection.

+ +
+Traditional polling (slow, wasteful):
+  Frontend: "Anything new?" → Backend: "No"   (every 1 second)
+  Frontend: "Anything new?" → Backend: "No"
+  Frontend: "Anything new?" → Backend: "Yes! Bitcoin synced!"
+
+WebSocket (fast, efficient):
+  Frontend ←→ Backend: persistent connection
+  Backend: "Bitcoin synced!" → Frontend instantly updates
+  Backend: "New container started!" → Frontend instantly updates
+
+ + +

Authentication & Sessions

+ +

When you log in, the backend creates a session — a temporary "you're allowed in" token. Here's how it works:

+ +
1. You enter your password on the login page.
2. Backend verifies it with bcrypt, a deliberately slow one-way hash that is computationally infeasible to reverse.
3. Backend re-hashes the submitted password with the salt embedded in the stored hash and compares the results. The raw password is never stored or compared.
4. Backend creates a session: it generates a random 256-bit token using a cryptographically secure random number generator.
5. Session ID sent as a cookie: the browser stores it and sends it with every request.
6. CSRF token also sent: a second token that prevents cross-site request forgery attacks.

+ +
+ Why two tokens? + The session cookie proves you're logged in. The CSRF token proves the request came from YOUR browser tab, not a malicious website that tricked your browser into sending a request. Both must be present and valid for any authenticated request to succeed. +
+ + +
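The two-token check can be sketched in a few lines. The real backend is written in Rust, so this Node.js version is only an illustration of the idea: the function names are hypothetical, and the constant-time comparison is an assumption about good practice rather than something the source confirms.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// 32 bytes = 256 bits from the OS CSPRNG, hex-encoded (64 characters).
function newToken(): string {
  return randomBytes(32).toString("hex");
}

// Constant-time comparison, so timing does not reveal how much of a token matched.
function tokensMatch(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "utf8");
  const bufB = Buffer.from(b, "utf8");
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

interface Session {
  sessionId: string; // sent to the browser as a cookie
  csrfToken: string; // sent separately and echoed back with each request
}

function createSession(): Session {
  return { sessionId: newToken(), csrfToken: newToken() };
}

// A request is accepted only if both tokens match the stored session.
function isAuthorized(session: Session, cookieId: string, csrfValue: string): boolean {
  return tokensMatch(session.sessionId, cookieId) && tokensMatch(session.csrfToken, csrfValue);
}

const demoSession = createSession();
console.log(isAuthorized(demoSession, demoSession.sessionId, demoSession.csrfToken));
console.log(isAuthorized(demoSession, demoSession.sessionId, "forged"));
```

A malicious site can make the browser attach the cookie automatically, but it cannot read the CSRF token, which is why the second check fails for forged requests.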

Security Model

+ +

Archipelago is a defense-in-depth system — multiple layers of security so that if one fails, others still protect you.

+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Layer | Protection | Against What |
|---|---|---|
| OS | UFW firewall, AppArmor profiles | Network attacks, process escape |
| Nginx | Rate limiting, security headers, HSTS | DDoS, XSS, clickjacking |
| Backend | CSRF validation, session auth, input sanitization | CSRF, injection, unauthorized access |
| Containers | Capability dropping, readonly root, non-root user | Container escape, privilege escalation |
| Crypto | ChaCha20-Poly1305 encryption, Argon2 key derivation, ed25519 signatures | Data theft, key compromise, impersonation |
| Network | Tor routing, onion services | Traffic analysis, IP exposure |
+
+ + +

Bitcoin Integration

+ +

Bitcoin is the heart of Archipelago. The backend communicates with Bitcoin Core/Knots using JSON-RPC — the same protocol Bitcoin has used since 2009.

+ +
+ Critical Rule: Never Use Floating Point for Bitcoin + Bitcoin amounts are always in satoshis (1 BTC = 100,000,000 sats) as integers. Using floating point (decimals) causes rounding errors. 0.1 + 0.2 ≠ 0.3 in floating point. When you're dealing with money, that's unacceptable. Archipelago uses u64 in Rust and BigInt in TypeScript for all Bitcoin amounts. +
+ +
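The integer-only rule is easy to demonstrate. The sketch below converts between decimal BTC strings and satoshis using BigInt, never passing through a floating-point number. The function names are hypothetical, and negative amounts are deliberately not handled.

```typescript
const SATS_PER_BTC = BigInt(100000000); // 1 BTC = 100,000,000 sats

// Parse a decimal BTC string such as "0.1" into satoshis without using a float.
function btcToSats(btc: string): bigint {
  const match = /^(\d+)(?:\.(\d{1,8}))?$/.exec(btc);
  if (!match) {
    throw new Error(`invalid BTC amount: ${btc}`);
  }
  const [, wholePart = "0", fractionPart = ""] = match;
  // Right-pad the fraction to 8 digits: "1" -> "10000000" sats.
  const fractionSats = BigInt(fractionPart.padEnd(8, "0"));
  return BigInt(wholePart) * SATS_PER_BTC + fractionSats;
}

// Format satoshis as a BTC string with exactly 8 decimal places.
function satsToBtc(sats: bigint): string {
  const whole = sats / SATS_PER_BTC;
  const fraction = (sats % SATS_PER_BTC).toString().padStart(8, "0");
  return `${whole}.${fraction}`;
}

console.log(0.1 + 0.2 === 0.3); // floating point: false
console.log(btcToSats("0.1") + btcToSats("0.2") === btcToSats("0.3")); // integers: true
console.log(satsToBtc(btcToSats("0.1") + btcToSats("0.2")));
```

Amounts stay exact because every value is a whole number of satoshis; decimal strings appear only at the edges, for display and input.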

Bitcoin RPC examples

+
// The backend calls Bitcoin Core like this:
+bitcoin_rpc("getblockchaininfo")   → sync progress, block height
+bitcoin_rpc("getnetworkinfo")      → peer count, version
+bitcoin_rpc("getmempoolinfo")      → unconfirmed transaction count
+bitcoin_rpc("estimatesmartfee", 6) → fee estimate for 6-block confirmation
+ + +
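For readers who want to see what travels over the wire, the sketch below builds the JSON-RPC request body that Bitcoin Core expects for calls like the ones above. The backend does this in Rust; the helper name `bitcoinRpcBody` and the default `id` value are made up for illustration.

```typescript
// Shape of a Bitcoin Core JSON-RPC request.
interface BitcoinRpcRequest {
  jsonrpc: "1.0";
  id: string;      // echoed back in the response so callers can match replies
  method: string;  // e.g. "getblockchaininfo"
  params: unknown[];
}

// Hypothetical helper: serializes one call into a request body.
function bitcoinRpcBody(method: string, params: unknown[] = [], id = "archipelago"): string {
  const request: BitcoinRpcRequest = { jsonrpc: "1.0", id, method, params };
  return JSON.stringify(request);
}

console.log(bitcoinRpcBody("getblockchaininfo"));
console.log(bitcoinRpcBody("estimatesmartfee", [6]));
```

The body is sent as an HTTP POST to the node's RPC port with HTTP basic authentication, using the RPC credentials generated at first boot.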

Federation & Multi-Node

+ +

Multiple Archipelago nodes can form a federation — a trusted network of servers that sync data, share state, and communicate privately.

+ +
+Your Node (.228) ←── Tor ──→ Friend's Node
+       │                          │
+       └──── Tor ──→ Office Node ←── Tor ──┘
+
+Each node has:
+  • Ed25519 identity key (cryptographic identity)
+  • DID (Decentralized Identifier, like a username that can't be taken away)
+  • Onion address (Tor hidden service, no IP address exposed)
+  • DWN (Decentralized Web Node, stores and syncs data)
+
+ +

Nodes discover each other through Nostr relays (publish presence, but never onion addresses — those are exchanged privately via encrypted DMs).

+ + +

Mesh Networking

+ +

Archipelago can communicate over LoRa radio, with no internet needed. A small radio device plugs into the server's USB port and can send messages over distances of 10 km or more using the Meshtastic or Meshcore protocol.

+ +
+

Imagine walkie-talkies that can send text messages. Each radio can relay messages for others, so even if two radios can't reach each other directly, they can communicate through intermediate radios. That's mesh networking — no cell towers, no ISPs, no internet required.

+
+ + +

Deploy System

+ +

The deploy script (scripts/deploy-to-target.sh) is how code gets from your development laptop to the live server. It's a 1,570-line shell script that automates everything:

+ +
1. Pre-flight checks: verifies SSH connectivity, checks git state, warns about uncommitted changes.
2. Frontend build: runs npm run build to compile Vue/TypeScript into static files.
3. Upload frontend: rsyncs built files to /opt/archipelago/web-ui/ on the server.
4. Upload Rust source: rsyncs core/ to the server (builds ON the server, not macOS).
5. Build on server: runs cargo build --release on the Linux server.
6. Sync configs: copies nginx config, systemd service from image-recipe/configs/.
7. Restart services: reloads nginx, restarts the Rust backend via systemd.
8. Health check: pings /health endpoint to verify everything came back up.
9. Deploy manifest: writes a JSON file recording the commit, timestamp, and deploy status.

+ +
+ Why build on the server? + Rust compiles to machine code for one specific target, meaning a CPU architecture and an operating system. A binary built on macOS (ARM or x86) uses a different executable format from Linux, so copying it to a Linux server fails with an "Exec format error". The deploy script avoids this by sending the source code and compiling on the target machine. +
+ + +

ISO Build Process

+ +

The ISO build creates the installer that users flash to USB. It's a 1,775-line script that:

+ +
  1. Downloads a Debian 12 Live ISO as the base
  2. Creates a Docker container to build a custom root filesystem
  3. Installs Podman, Nginx, and all system dependencies
  4. Captures running container images from the live dev server
  5. Bundles the frontend files, backend binary, and configs
  6. Writes a first-boot script that sets everything up on install
  7. Packages everything into a bootable ISO file
+ + +

First Boot Sequence

+ +

When someone installs the ISO and boots for the first time, first-boot-containers.sh runs automatically and:

+ +
  1. Generates unique credentials for this installation (Bitcoin RPC password, database passwords)
  2. Sets up swap space based on available RAM
  3. Creates the archy-net container network for inter-container communication
  4. Starts 30+ containers in tiered order (foundation services such as databases and Bitcoin first, then the services that depend on them, then everything else)
  5. Runs health checks on critical containers
  6. Configures Tor hidden services
+ + + + + +

Quality Scores

+ +

After reviewing ~46,000 lines of Rust, ~12,000 lines of TypeScript, and ~100 shell scripts, here are the quality scores:

+ +
+
+
| Area | Grade | Notes |
|---|---|---|
| Rust Error Handling | A | Zero unwrap/panic in prod code |
| TypeScript Safety | A | Strict mode, zero any types |
| Security | A- | Defense in depth, minor gaps |
| Frontend Architecture | A- | Well-organized, 1 god store |
| Backend Modularity | B+ | Good separation, large files |
| Container Security | A | Cap-drop, readonly, non-root |
| Script Modularity | C+ | Monolithic, no shared library |
| Test Coverage | D | No automated tests |
| CI/CD | D | Build only, no test gating |
| Documentation | B | Good docs, gaps in API ref |
| Dependency Hygiene | B- | Floating crypto versions |
| Deploy Safety | A- | Rollback, manifests, health checks |

+
+
+ + +

What's Done Well

+ +
+

Rust: Exceptional Error Discipline

+

Zero unwrap() or panic!() in production code (only 2 expect() in startup code). Every fallible operation uses the ? operator to propagate errors gracefully. This is rare even in professional Rust codebases.

+ +

Input Validation is Thorough

+

App IDs validated against a strict character whitelist. Docker image names checked for shell injection characters. All external input sanitized at the boundary.
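A whitelist check of this kind might look like the sketch below. The exact character set and length limit Archipelago uses are not stated in this review, so both regular expressions are assumptions chosen for illustration, and the real checks live in the Rust backend.

```typescript
// Assumed whitelist: lowercase letters, digits, and hyphens; 1 to 64 characters;
// must start with a letter or digit.
const APP_ID_PATTERN = /^[a-z0-9][a-z0-9-]{0,63}$/;

function isValidAppId(id: string): boolean {
  return APP_ID_PATTERN.test(id);
}

// Characters a shell would treat specially. Rejecting them is a second line of
// defence; passing arguments as an array instead of a shell string is the first.
const SHELL_METACHARACTERS = /[;&|`$(){}<>\\'"\s]/;

function isSafeImageRef(image: string): boolean {
  return image.length > 0 && !SHELL_METACHARACTERS.test(image);
}

console.log(isValidAppId("bitcoin-knots"));
console.log(isValidAppId("bitcoin; rm -rf /"));
console.log(isSafeImageRef("docker.io/bitcoin/knots:28"));
console.log(isSafeImageRef("img$(whoami)"));
```

Whitelisting what is allowed is generally safer than blacklisting what is dangerous, because an unexpected character is rejected by default.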

+ +

TypeScript Strict Mode Actually Used

+

All 5 strictest compiler flags enabled. Zero any types across 12,000+ lines. Every function has proper types. This prevents entire categories of bugs.

+ +

Container Security is Production-Grade

+

Every container drops all capabilities and adds back only what's needed. Read-only root filesystems. Non-root users. No-new-privileges. This is considerably stricter than the default settings most container setups run with.

+ +

WebSocket Resilience

+

Auto-reconnection with exponential backoff, visibility change detection (handles tab switching), network online/offline detection. The real-time connection is very robust.
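Exponential backoff means the wait doubles after each failed reconnection attempt, up to a ceiling. A minimal sketch, with a base delay and cap that are assumed values rather than Archipelago's actual settings:

```typescript
// Delay before reconnection attempt number `attempt` (0-based).
// baseMs and maxMs are illustrative defaults, not the project's real values.
function reconnectDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

for (let attempt = 0; attempt < 7; attempt++) {
  console.log(`attempt ${attempt}: wait ${reconnectDelayMs(attempt)} ms`);
}
```

The cap keeps a long outage from pushing the delay to minutes, while the doubling avoids hammering a backend that is still restarting.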

+ +

Composables Well-Factored

+

11 Vue composables, each focused on one concern (toasts, audio, keyboard, onboarding). Clean, reusable, properly scoped.

+ +

Deploy Safety Features

+

Rollback backups before deployment, deploy manifests tracking what was deployed, health checks after deployment, progress bars with ETAs.

+
+ + +

What Needs Fixing

+ +

Critical Issues (fix now)

+ +
+

1. package.rs is 1,770 lines — a "god file"

+

What: core/archipelago/src/api/rpc/package.rs handles ALL container operations: install, start, stop, configure ports, configure volumes, configure environment variables, dependency checking, image validation, progress streaming.

+

Why it's bad: You can't change one thing without risking breaking something else. It's impossible to test in isolation. Any new app requires modifying this massive file.

+

Fix: Split into app_config.rs (port/volume/env definitions), app_lifecycle.rs (install/start/stop), app_validation.rs (input checks, dependency verification).

+
+ +
+

2. Web5.vue is 3,901 lines — a "god component"

+

What: One Vue file contains 17 different sections: DID management, wallet, Nostr relays, credentials, voting, P2P peers, storage, profiles, marketplace, goals, data explorer, and more.

+

Why it's bad: Loading one massive component is slow. Changes to the voting section could break the wallet section. Impossible to reuse any section independently.

+

Fix: Split into 5+ sub-views under /dashboard/web5/ with their own routes.

+
+ +
+

3. No automated tests

+

What: Zero unit tests in the Rust backend. No integration tests. No end-to-end tests that run automatically. The only "test" is deploying and checking manually.

+

Why it's bad: Every change could break something, and you won't know until a user reports it. As the codebase grows, confidence in changes decreases.

+

Fix: Start with tests for the most critical paths: session validation, input sanitization, container lifecycle. Add CI that runs tests on every push.

+
+ +
+

4. useAppStore is a "god store" with 8+ responsibilities

+

What: One Pinia store handles: auth state, WebSocket connection, server data, package install/uninstall, server restart/shutdown, marketplace data, metrics, loading states.

+

Why it's bad: Every component that imports this store gets ALL of its complexity. Hard to track where state changes come from. Testing any one concern requires mocking everything else.

+

Fix: Split into auth.ts, server.ts, realtime.ts, keep app.ts as a thin data store only.

+
+ +

High Priority (fix soon)

+ +
+

5. Cryptographic dependency versions not pinned exactly

+

What: zeroize = "1.7", chacha20poly1305 = "0.10", ed25519-dalek = "2.1" are caret requirements, so Cargo may resolve them to any semver-compatible newer release.

+

Why it's bad: A minor version bump in a crypto library could introduce a vulnerability or behavioral change. The project's own rules require exact pinning for crypto deps.

+

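Fix: Pin to exact versions with the = operator: "=1.7.0", "=0.10.1", "=2.1.1". A bare "1.7.0" is still a caret requirement in Cargo and would not pin anything.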

+
+ +
+

6. No frontend-backend type synchronization

+

What: TypeScript types in types/api.ts are manually maintained copies of Rust structs. If the backend changes a field name, the frontend doesn't know until runtime.

+

Why it's bad: Types can drift apart silently. A backend developer renames sync_progress to syncProgress and the frontend breaks in production.

+

Fix: Generate TypeScript types from Rust structs (using ts-rs or a JSON Schema).

+
+ +
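Until types are generated from the Rust structs, one stopgap is to validate responses at the boundary so that drift fails loudly instead of silently. The sketch below is an illustration only: `sync_progress` comes from the example above, while `block_height` and the function name are hypothetical.

```typescript
// Hand-written mirror of a backend struct; field names other than
// sync_progress are assumptions for this sketch.
interface BitcoinInfo {
  sync_progress: number;
  block_height: number;
}

// Checks the shape at runtime and throws as soon as the backend and
// frontend disagree, rather than letting `undefined` leak into the UI.
function parseBitcoinInfo(raw: unknown): BitcoinInfo {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("bitcoin.info: expected an object");
  }
  const { sync_progress, block_height } = raw as Record<string, unknown>;
  if (typeof sync_progress !== "number" || typeof block_height !== "number") {
    throw new Error("bitcoin.info: missing or mistyped field");
  }
  return { sync_progress, block_height };
}

console.log(parseBitcoinInfo({ sync_progress: 0.97, block_height: 840000 }));
```

If the backend renamed the field to `syncProgress`, this check would throw on the first response, which is much easier to diagnose than a dashboard that quietly shows nothing.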
+

7. Container metadata duplicated in 3 places

+

What: App configuration (ports, volumes, env vars) exists in: package.rs (RPC handler), docker_packages.rs (metadata reader), health_monitor.rs (startup tiers).

+

Why it's bad: Adding a new app means updating 3 files. If you forget one, the app partially works but something is wrong.

+

Fix: Single app config source (manifest YAML or a shared Rust module) that all three consumers read from.

+
+ +
+

8. Deploy and ISO build scripts are 1,500+ lines each

+

What: Two monolithic shell scripts handle dozens of responsibilities each, with duplicated utility functions across 15+ scripts.

+

Why it's bad: Hard to review, hard to debug, hard to modify. One wrong change can break the entire deploy pipeline. No shared library means the same health-check loop is copy-pasted in 8 places.

+

Fix: Extract shared functions into scripts/lib/common.sh. Split deploy into modules: deploy-frontend.sh, deploy-backend.sh, sync-configs.sh, health-checks.sh.

+
+ +

Medium Priority (improve over time)

+ +
+

9. App integration requires updates in 6+ locations

+

What: Adding a new app to Archipelago requires manual changes in: manifest YAML, package.rs (backend config), docker_packages.rs (metadata), nginx config (routing), Marketplace.vue (frontend listing), appLauncher.ts (port mapping), first-boot-containers.sh (first boot), build-auto-installer-iso.sh (ISO capture).

+

Fix: Move toward a single manifest file per app that drives all of these automatically.

+
+ +
+

10. CI/CD pipeline is build-only

+

What: One GitHub Action builds a macOS binary. No tests run. No linting. No deploy automation.

+

Fix: Add CI that runs cargo clippy, cargo test, npm run type-check, and npm run lint on every push.

+
+ +
+

11. Session persistence uses blocking I/O

+

What: On startup, session.rs reads sessions.json using synchronous (blocking) file I/O in an async context.

+

Fix: Use tokio::fs::read_to_string for non-blocking I/O at startup.

+
+ +
+

12. Inconsistent loading state patterns in frontend

+

What: Some components use loading, others isLoading, others loadingApps. No shared composable.

+

Fix: Create a useAsyncState composable that standardizes loading/error/data patterns.

+
+ + +

Refactoring Priorities

+ +

Ordered by priority, which weighs impact against effort, so the first rows give the biggest improvement soonest:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| # | Task | Impact | Effort | Priority |
|---|---|---|---|---|
| 1 | Split package.rs into 3-4 focused files | high | 2-3 days | P0 |
| 2 | Split useAppStore into auth/server/realtime | high | 2 days | P0 |
| 3 | Add CI pipeline (clippy + type-check + basic tests) | high | 1 day | P0 |
| 4 | Split Web5.vue into sub-views | medium | 3 days | P1 |
| 5 | Pin all crypto dependency versions exactly | medium | 1 hour | P1 |
| 6 | Extract shared shell library (lib/common.sh) | medium | 1 day | P1 |
| 7 | Consolidate container metadata to single source | medium | 2 days | P1 |
| 8 | Generate TypeScript types from Rust structs | medium | 1 day | P2 |
| 9 | Split deploy script into modules | low | 2 days | P2 |
| 10 | Add unit tests for critical paths (session, validation) | high | 3 days | P2 |
| 11 | Create useAsyncState composable for frontend | low | 4 hours | P3 |
| 12 | Split large Vue components (SplashScreen, Mesh, Settings) | low | 2 days | P3 |
+ + +

Technical Debt Map

+ +

A visual summary of where debt lives in the codebase:

+ +
+BACKEND (Rust)
+  ██████████    package.rs (1,770 lines, god file)
+  ████████      rpc/mod.rs (999 lines, giant match dispatcher)
+  ████████      lnd.rs (996 lines, could split)
+  ██████        mesh.rs, identity.rs, federation.rs (800+ lines each)
+  ████          session.rs, health_monitor.rs (700+ lines, acceptable)
+  ██            container crate (2,000 lines, well-scoped)
+                security, performance crates (clean)
+
+FRONTEND (Vue + TS)
+  ████████████  Web5.vue (3,901 lines, god component)
+  ██████████    Dashboard.vue (1,803 lines)
+  ████████      Mesh.vue, Settings.vue (1,500+ lines each)
+  ██████        useAppStore (317 lines, god store)
+  ████          rpc-client.ts (708 lines, well-designed)
+  ██            Composables (clean, focused)
+                Type safety (excellent)
+
+SCRIPTS (Shell)
+  ████████████  deploy-to-target.sh (1,570 lines)
+  ████████████  build-auto-installer-iso.sh (1,775 lines)
+  ██████        first-boot-containers.sh (739 lines)
+  ████          No shared library (8+ duplicated functions)
+  ██            Test scripts (well-organized)
+
+ARCHITECTURE
+  ████████      No automated tests (0% coverage)
+  ██████        No CI/CD test gating
+  ████          Manual type sync (Rust ↔ TypeScript)
+  ████          App integration requires 6+ file changes
+  ██            Security model (strong defense-in-depth)
+  ██            Deploy safety (rollback, manifests)
+
+Legend: ██ Critical  ██ Needs attention  ██ Good
+
+ + +

Recommended Learning Path

+ +

If you want to understand this codebase deeply and become proficient in all the technologies, study in this order:

+ +
+

Phase 1: Foundations (Weeks 1-4)

+
  1. Linux basics: commands, file permissions, processes, systemd
  2. Git: branches, commits, diffs, rebasing
  3. HTML/CSS/JavaScript: the building blocks of web UIs
  4. TypeScript: JavaScript with type safety (read the official handbook)
+
+ +
+

Phase 2: Frontend (Weeks 5-8)

+
  1. Vue 3 Composition API: ref, computed, watch, onMounted
  2. Pinia: state management (read stores/container.ts as a good example)
  3. Vue Router: URL-to-component mapping
  4. Tailwind CSS: utility-first CSS framework
  5. Vite: the build tool that bundles everything
+
+ +
+

Phase 3: Backend (Weeks 9-14)

+
  1. Rust basics: ownership, borrowing, lifetimes, pattern matching (read "The Rust Book")
  2. Async Rust with Tokio: async/await, futures, tokio::spawn
  3. Hyper: the HTTP server library (read server.rs)
  4. Serde: JSON serialization/deserialization
  5. Error handling: anyhow, thiserror, the ? operator
+
+ +
+

Phase 4: Infrastructure (Weeks 15-18)

+
  1. Containers: Docker/Podman concepts (images, containers, volumes, networks)
  2. Nginx: reverse proxy, location blocks, upstream servers
  3. Shell scripting: bash/zsh, set -e, functions, trap
  4. systemd: service management, unit files, journalctl
  5. Networking: TCP/IP, DNS, ports, firewalls (UFW)
+
+ +
+

Phase 5: Bitcoin & Crypto (Weeks 19-24)

+
  1. Bitcoin protocol: blocks, transactions, UTXOs, mining (read "Mastering Bitcoin")
  2. Lightning Network: payment channels, routing, invoices
  3. Cryptography: hashing, symmetric/asymmetric encryption, digital signatures
  4. Tor: onion routing, hidden services, SOCKS5 proxy
  5. Nostr: decentralized messaging protocol, NIPs
  6. DIDs: Decentralized Identifiers, Verifiable Credentials
+
+ +
+ Recommended first files to read +
  1. neode-ui/src/stores/container.ts: Clean, well-structured Pinia store (312 lines)
  2. neode-ui/src/api/rpc-client.ts: Well-designed API client with retry logic
  3. core/archipelago/src/session.rs: Auth flow in Rust with crypto
  4. core/container/src/podman_client.rs: How Rust talks to Podman
  5. image-recipe/configs/nginx-archipelago.conf: The full routing map
+
+ + +

Glossary

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Term | What It Means |
|---|---|
| API | Application Programming Interface: a defined way for two programs to talk to each other |
| Async/Await | A way to write code that waits for slow things (network, disk) without blocking other work |
| Backend | The server-side code that runs on the machine (not visible to users) |
| Container | An isolated environment for running an app, like a lightweight virtual machine |
| Composable | A reusable piece of logic in Vue (similar to React hooks) |
| CSRF | Cross-Site Request Forgery: an attack where a malicious site tricks your browser into sending requests |
| Crate | A Rust package (like npm package for JavaScript) |
| DID | Decentralized Identifier: a self-owned digital identity (no central authority controls it) |
| DWN | Decentralized Web Node: personal data storage that syncs across your devices |
| Frontend | The browser-side code that users see and interact with |
| ISO | A disk image file, like a digital copy of an installation CD |
| JWT | JSON Web Token: a compact way to pass verified identity between systems |
| LoRa | Long Range radio: low-power wireless communication over several kilometers |
| Nginx | A web server that also works as a reverse proxy (routes traffic to the right service) |
| Nostr | A decentralized messaging protocol using public/private key pairs |
| Onion Service | A Tor hidden service: a server accessible only through the Tor network (no IP address) |
| Pinia | Vue's official state management library (successor to Vuex) |
| Podman | A container runtime like Docker, but rootless (more secure) |
| RPC | Remote Procedure Call: calling a function on another computer over the network |
| Reactive | Data that automatically updates the UI when it changes (core Vue concept) |
| Reverse Proxy | A server that sits between clients and backend servers, forwarding requests |
| Rust | A systems programming language focused on safety and performance |
| SPA | Single Page Application: a web app that loads once and dynamically updates content |
| Satoshi (sat) | The smallest unit of Bitcoin. 1 BTC = 100,000,000 sats |
| systemd | Linux's service manager: starts, stops, and monitors background services |
| Tokio | Rust's async runtime: handles thousands of concurrent operations efficiently |
| Tor | The Onion Router: anonymizes internet traffic by routing through multiple relays |
| TypeScript | JavaScript with static types: catches bugs at compile time instead of runtime |
| Vue 3 | A JavaScript framework for building reactive user interfaces |
| WebSocket | A persistent, two-way connection between browser and server for real-time data |
+ +
+

+ Architecture Review — Archipelago v0.1.0-alpha — Generated 2026-03-18
+ ~46,000 lines Rust · ~12,000 lines TypeScript · ~100 shell scripts +

+ +
+ + + +
diff --git a/neode-ui/dev-dist/sw.js b/neode-ui/dev-dist/sw.js
index b3b53078..713ca6bd 100644
--- a/neode-ui/dev-dist/sw.js
+++ b/neode-ui/dev-dist/sw.js
@@ -82,7 +82,7 @@ define(['./workbox-21a80088'], (function (workbox) { 'use strict';
     "revision": "3ca0b8505b4bec776b69afdba2768812"
   }, {
     "url": "index.html",
-    "revision": "0.a4nevj6csc4"
+    "revision": "0.2lte02eatlc"
   }], {});
   workbox.cleanupOutdatedCaches();
   workbox.registerRoute(new workbox.NavigationRoute(workbox.createHandlerBoundToURL("index.html"), {