archy/.claude/memory/project_deploy_session_2026_03_22.md
Dorian d37165ca52 fix: deploy credential sync, health checks, rootless port binding
- LND config always synced with secrets/bitcoin-rpc-password before
  starting (both deploy scripts) — fixes 401 auth errors on all nodes
- Replace eval "$DB_PASSWORDS" with safe individual SSH reads in
  deploy-tailscale.sh (eliminates command injection risk)
- Add MariaDB password sync step after container start (ALTER USER)
- Add --health-cmd to all 25 containers in deploy-tailscale.sh
- FileBrowser uses --user 0:0 for rootless port 80 binding (both scripts)
- Fedimint env var fixed: FM_REL_NOTES_ACK=0_4_xyz

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 14:16:11 +00:00


name: Deploy session 2026-03-22 findings
description: Comprehensive deploy/build fixes made overnight (container issues, image tags, script improvements, remaining work)
type: project

Session Summary (2026-03-22 overnight)

Massive deploy infrastructure overhaul across all 5 nodes (.228, .198, Arch 1/2/3).

Fixed in deploy-tailscale.sh

  • Image tags: Bitcoin Knots 28.1 (not v28.1), BTCPay 1.13.7 (not 1.14.5), SearXNG 2026.3.20-6c7e9c197
  • Removed Immich (3 containers) and Penpot (5 containers) from deploy + build
  • Fedimint: FM_REL_NOTES_ACK=0_4_xyz env var (NOT FM_SKIP_REL_NOTES_ACK or FM_REQ_RELEASE_NOTES_ACK_V0_4)
  • Fedimint-gateway: --password instead of --bcrypt-password-hash (v0.5.1 CLI change)
  • FileBrowser: added --cap-add NET_BIND_SERVICE for port 80 binding (replaced by --user 0:0 in Session 2, because the capability has no effect in rootless mode)
  • SearXNG: added /var/lib/archipelago/searxng:/etc/searxng volume mount + caps
  • Postgres: pinned to postgres:15 (data initialized with 15, incompatible with 16)
  • Migration: one-time flag file /var/lib/archipelago/.rootless-migrated
  • Recreate-if-broken pattern: containers that exist but are stopped get deleted and recreated
  • Arch 2 hostname: replaced the hardcoded hostname with $TAILSCALE_ARCH2
  • Custom UI images: graceful skip if not available, source extracted to repo (docker/bitcoin-ui/, docker/electrs-ui/)
  • AIUI tar xattr: silenced with --no-xattrs (only in deploy-tailscale.sh, NOT deploy-to-target.sh yet)
  • Nginx MIME warning: removed text/html from sub_filter_types
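
The recreate-if-broken pattern from the list above could look roughly like the sketch below. This is not the code in deploy-tailscale.sh: `container_state`, `action_for_state`, and `ensure_container` are hypothetical helper names, and treating every non-running state as broken is an assumption.

```shell
# Sketch only: helper names are hypothetical, not taken from deploy-tailscale.sh.

# Print the container's state ("running", "exited", ...) or nothing if absent.
container_state() {
  if podman container exists "$1"; then
    podman container inspect --format '{{.State.Status}}' "$1"
  fi
}

# Map a state string to an action: create | keep | recreate.
action_for_state() {
  case "$1" in
    "")      echo create ;;    # container does not exist yet
    running) echo keep ;;      # leave a running container alone
    *)       echo recreate ;;  # exists but is not running: replace it
  esac
}

# ensure_container NAME -- RUN_ARGS...
ensure_container() {
  local name="$1"; shift
  [ "${1:-}" = "--" ] && shift
  case "$(action_for_state "$(container_state "$name")")" in
    keep)     return 0 ;;
    recreate) podman rm -f "$name" >/dev/null ;;
  esac
  podman run -d --name "$name" "$@"
}
```

Splitting the decision (`action_for_state`) from the podman calls keeps the policy easy to check without a container runtime.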

Added

  • --fleet flag in deploy-to-target.sh: deploys .228 → .198 → Arch 1/2/3
  • --both lock fix: releases lock before recursive --live call
  • Container verification step (Step 26b): restarts exited containers, fixes permissions, checks Tor
  • IndeedHub backend stack rebuilt on .228 (7 containers)
  • IndeedHub nginx patched with direct IPs (podman DNS doesn't work with nginx resolver)
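
The --both lock fix can be illustrated with a small sketch. The real lock mechanism in deploy-to-target.sh is not shown in these notes, so the mkdir-based lock, the `LOCK_DIR` path, and the `deploy` function (standing in for the script re-invoking itself with --live) are all assumptions.

```shell
# Sketch only: a mkdir-based lock stands in for whatever the real script uses.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/deploy-sketch.lock}"

acquire_lock() { mkdir "$LOCK_DIR" 2>/dev/null; }
release_lock() { rmdir "$LOCK_DIR" 2>/dev/null || true; }

# deploy MODE, where MODE is "both" or a single target name such as "live".
deploy() {
  local mode="$1"
  if ! acquire_lock; then
    echo "another deploy holds $LOCK_DIR" >&2
    return 1
  fi
  if [ "$mode" = "both" ]; then
    echo "deploying: first target"
    # Release before recursing; otherwise the nested call would be
    # blocked by the lock this invocation still holds.
    release_lock
    deploy live
    return
  fi
  echo "deploying: $mode"
  release_lock
}
```

Without the `release_lock` call before the recursive step, the nested `deploy live` would fail on `acquire_lock`, which is the failure mode the fix addresses.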

Frontend changes

  • Replaced Immich with FileBrowser on Setup homescreen (goals.ts, EasyHome.vue)
  • MEMPOOL_API_IMAGE renamed to MEMPOOL_BACKEND_IMAGE in image-versions.sh
  • Nextcloud downgraded from 30 to 29 (Nextcloud supports upgrading only one major version at a time)

Session 2 fixes (same day)

Critical pattern found: Container credential mismatches

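  • Deploy generates random passwords and stores them in secrets/. MariaDB and Postgres read their password env vars only on FIRST init of the data directory; later starts ignore them. Recreating a container with a new password therefore leaves the database on the old one, which causes auth failures and crash loops.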
  • 50,000+ cumulative container restarts across fleet from this single root cause.
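
Because the database keeps the password from its first init, the stored secret has to be pushed into the running database. A minimal sketch for the Postgres case follows, assuming a `postgres` superuser reachable through `psql` inside the container. `pg_alter_user_sql` and `sync_pg_password` are hypothetical names, and MariaDB needs its own ALTER USER syntax.

```shell
# Sketch only: function names and the "postgres" superuser are assumptions.

# Print an ALTER USER statement; single quotes in the password are doubled
# so the value stays inside the SQL string literal.
pg_alter_user_sql() {
  local user="$1" password="$2" escaped
  escaped=$(printf '%s' "$password" | sed "s/'/''/g")
  printf "ALTER USER \"%s\" WITH PASSWORD '%s';\n" "$user" "$escaped"
}

# sync_pg_password CONTAINER DB_USER SECRET_FILE
sync_pg_password() {
  local container="$1" user="$2" secret_file="$3" password
  password=$(cat "$secret_file") || return 1
  # Feed the statement on stdin so the password never appears in argv.
  pg_alter_user_sql "$user" "$password" |
    podman exec -i "$container" psql -U postgres -v ON_ERROR_STOP=1
}
```

Running this after every container start makes the secrets file the single source of truth, regardless of what the data directory was initialized with.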

Fixes applied to all nodes:

  1. LND: lnd.conf rpcpass synced from secrets/bitcoin-rpc-password (was hardcoded archipelago123)
  2. MariaDB mempool: data dirs wiped + reinitialized (password mismatch unrecoverable)
  3. BTCPay Postgres: ALTER USER to sync password with secrets
  4. FileBrowser: --user 0:0 instead of --cap-add NET_BIND_SERVICE (rootless port 80 fix)
  5. Nextcloud: same --user 0:0 fix
  6. Tailscale container on .228: removed (2,685 restarts — unauthenticated, host already has TS)
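
Fix 1 above (syncing the LND RPC password) could be sketched as below. The notes only say "lnd.conf rpcpass"; the sketch assumes the key is written as `bitcoind.rpcpass=` and that the secret file holds the password on a single line. `sync_lnd_rpcpass` is a hypothetical name.

```shell
# Sketch only: assumes the lnd.conf key is "bitcoind.rpcpass".

# sync_lnd_rpcpass LND_CONF SECRET_FILE
sync_lnd_rpcpass() {
  local conf="$1" secret_file="$2" password tmp
  password=$(cat "$secret_file") || return 1
  tmp=$(mktemp) || return 1
  # Pass the password through the environment so awk does not
  # interpret backslashes or other special characters in it.
  PASSWORD="$password" awk '
    /^[ \t]*bitcoind\.rpcpass[ \t]*=/ {
      print "bitcoind.rpcpass=" ENVIRON["PASSWORD"]; next
    }
    { print }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Rewrite in place only when something changed; writing through cat
  # keeps the original file's owner and mode.
  if ! cmp -s "$conf" "$tmp"; then
    cat "$tmp" > "$conf" || { rm -f "$tmp"; return 1; }
  fi
  rm -f "$tmp"
}
```

The sketch leaves the file untouched if the key is absent; a real script would probably want to append the key or fail loudly in that case.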

Deploy script fixes:

  • deploy-tailscale.sh: LND config always synced before start, eval "$DB_PASSWORDS" → safe individual reads, MariaDB password sync step, filebrowser --user 0:0
  • deploy-to-target.sh: LND stale config check now compares passwords (not just cookie/localhost), filebrowser --user 0:0
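
The move away from eval can be illustrated as follows. This is a sketch under assumptions, not the code in deploy-tailscale.sh: `read_secret`, `SECRETS_DIR`, and the `SECRET_READER` hook (plain `cat` locally, something like `ssh HOST cat` remotely) are hypothetical names.

```shell
# Sketch only: names below are illustrative, not from the deploy scripts.
SECRETS_DIR="${SECRETS_DIR:-secrets}"

# read_secret NAME: print the contents of $SECRETS_DIR/NAME.
read_secret() {
  local name="$1" value
  # Reject anything that is not a plain file name (no slashes, no spaces).
  case "$name" in
    ""|*[!A-Za-z0-9._-]*) echo "invalid secret name: $name" >&2; return 1 ;;
  esac
  # SECRET_READER is split on whitespace on purpose, e.g. "ssh host cat".
  value=$(${SECRET_READER:-cat} "$SECRETS_DIR/$name") || return 1
  [ -n "$value" ] || { echo "empty secret: $name" >&2; return 1; }
  printf '%s\n' "$value"
}

# Usage (one read per secret; the value is assigned, never evaluated):
#   BITCOIN_RPC_PASSWORD=$(read_secret bitcoin-rpc-password) || exit 1
```

With eval, a secret containing `$(...)` or backticks would be executed as shell code. Assigning the output of a command substitution stores the same bytes as data.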

Rootless port 80 rule: containers binding port 80 MUST use --user 0:0. The NET_BIND_SERVICE capability does not work in rootless mode (UID 0 → host 100000, unprivileged).
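
One way to keep this rule from regressing is a small check over the arguments passed to `podman run`. The sketch below is not part of either deploy script. `check_port80_rule` is a hypothetical helper, and using a publish flag whose container-side port is 80 as the signal for "binds port 80" is an assumption.

```shell
# Sketch only: a hypothetical lint for `podman run` argument lists.
# Returns 1 if a publish flag targets container port 80 without --user 0:0.
check_port80_rule() {
  local binds80=0 root_user=0 prev="" arg
  for arg in "$@"; do
    # Flags given as two words: "-p 8083:80", "--user 0:0".
    case "$prev" in
      -p|--publish)
        case "${arg%/tcp}" in 80|*:80) binds80=1 ;; esac ;;
      -u|--user)
        [ "$arg" = "0:0" ] && root_user=1 ;;
    esac
    # Flags given as one word: "--publish=8083:80", "--user=0:0".
    case "$arg" in
      --publish=*) case "${arg%/tcp}" in *=80|*:80) binds80=1 ;; esac ;;
      --user=0:0)  root_user=1 ;;
    esac
    prev="$arg"
  done
  [ "$binds80" -eq 0 ] || [ "$root_user" -eq 1 ]
}
```

A deploy script could call this with the same argument list it is about to hand to `podman run` and abort when it returns non-zero.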

Remaining issues for next session

  • Vaultwarden exit 101 on Arch 2 — likely corrupted SQLite DB
  • PhotoPrism storage permission on Arch 1 — file creation fails despite correct ownership
  • Arch 3 resource contention: 7.3 GB RAM, load 14, 28 containers. May need to reduce container count.
  • Health checks missing on most containers (only filebrowser/jellyfin have them)
  • Tar xattr spam in deploy-to-target.sh (fixed in deploy-tailscale.sh only)
  • IndeedHub nginx IPs are ephemeral — need re-patch after container restart