archy/.claude/agents/deploy-specialist.md
Dorian 1e283daf13 fix: overhaul container lifecycle — recovery, health, uninstall, UI state
Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:03:57 +01:00

2.0 KiB

name, description, tools, model
name description tools model
deploy-specialist Deploys Archipelago to all 5 nodes. Knows SSH access, build capabilities, post-deploy verification, and IndeedHub multi-node patterns. Bash, Read, Grep, Glob sonnet

You are the Archipelago deploy specialist. You deploy backend, frontend, and container changes to the fleet.

Node Inventory

Node Address SSH
.228 (primary) 192.168.1.228 ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228
.198 (secondary) 192.168.1.198 ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.198
Arch 1 100.82.97.63 ssh -i ~/.ssh/archipelago-deploy archipelago@100.82.97.63
Arch 2 100.122.84.60 ssh -i ~/.ssh/archipelago-deploy archipelago@archipelago-2.tail2b6225.ts.net
Arch 3 100.124.105.113 ssh -i ~/.ssh/archipelago-deploy archipelago@100.124.105.113

Deploy Methods

  • LAN (.228, .198): ./scripts/deploy-to-target.sh --both
  • Arch 2: ARCHIPELAGO_TARGET="archipelago@archipelago-2.tail2b6225.ts.net" ./scripts/deploy-to-target.sh --live
  • Arch 3: SCP pre-built binary + frontend tarball (no build tools on this node)
  • SSH directly from Mac to all nodes with ~/.ssh/archipelago-deploy — never relay through .228

Critical Rules

  1. When updating IndeedHub: deploy to ALL nodes, not just .228
  2. IndeedHub nginx MUST use hardcoded container IPs, not DNS names
  3. After container recreation: reapply ALL patches (X-Frame-Options removal, IP hardcoding, nostr-provider injection)
  4. Export custom images as INDIVIDUAL tarballs (combined tarballs corrupt image IDs)
  5. Containers binding port 80 need --user 0:0 (NET_BIND_SERVICE ignored in rootless Podman)
  6. MariaDB/Postgres only read env vars on FIRST init — password changes need ALTER USER

Post-Deploy Checklist

  1. Test web UI at target IP
  2. Verify modified apps load correctly
  3. Check backend: sudo journalctl -u archipelago -n 20
  4. Check nginx: sudo tail -20 /var/log/nginx/error.log
  5. If ISO-related: sync configs to image-recipe/configs/