archy/.claude/memory/project_deploy_session_2026_03_22.md
Dorian 4d1df4a319 docs: update deploy session memory with session 3 fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 18:06:57 +00:00


name: Deploy session 2026-03-22 findings
description: Comprehensive deploy/build fixes made overnight: container issues, image tags, script improvements, remaining work
type: project

Session Summary (2026-03-22 overnight)

Massive deploy infrastructure overhaul across all 5 nodes (.228, .198, Arch 1/2/3).

Fixed in deploy-tailscale.sh

  • Image tags: Bitcoin Knots 28.1 (not v28.1), BTCPay 1.13.7 (not 1.14.5), SearXNG 2026.3.20-6c7e9c197
  • Removed Immich (3 containers) and Penpot (5 containers) from deploy + build
  • Fedimint: FM_REL_NOTES_ACK=0_4_xyz env var (NOT FM_SKIP_REL_NOTES_ACK or FM_REQ_RELEASE_NOTES_ACK_V0_4)
  • Fedimint-gateway: --password instead of --bcrypt-password-hash (v0.5.1 CLI change)
  • FileBrowser: added --cap-add NET_BIND_SERVICE for port 80 binding (superseded in session 2 by --user 0:0)
  • SearXNG: added /var/lib/archipelago/searxng:/etc/searxng volume mount + caps
  • Postgres: pinned to postgres:15 (data initialized with 15, incompatible with 16)
  • Migration: one-time flag file /var/lib/archipelago/.rootless-migrated
  • Recreate-if-broken pattern: containers that exist but are stopped get deleted and recreated
  • Arch 2 hostname: replaced hardcoded hostname with $TAILSCALE_ARCH2
  • Custom UI images: graceful skip if not available, source extracted to repo (docker/bitcoin-ui/, docker/electrs-ui/)
  • AIUI tar xattr warnings: silenced with --no-xattrs (only in deploy-tailscale.sh, NOT deploy-to-target.sh yet)
  • Nginx MIME warning: removed text/html from sub_filter_types (nginx always includes text/html, so listing it explicitly triggers a duplicate MIME type warning)
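The recreate-if-broken pattern and the one-time migration flag can be sketched as two small bash functions. This is a minimal sketch that assumes podman on the target and a caller-supplied create function. The names ensure_container, run_once and migrate_to_rootless are hypothetical and are not taken from deploy-tailscale.sh.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating two deploy patterns from the list above.

# ensure_container NAME CREATE_FN
# Missing container -> create it. Running -> keep it.
# Exists but not running (exited/created/stopped) -> delete and recreate,
# so a container stuck with stale config or credentials gets rebuilt.
ensure_container() {
  local name=$1 create_fn=$2
  if ! podman container exists "$name"; then
    "$create_fn" >/dev/null
    echo "created"
    return 0
  fi
  if [ "$(podman inspect --format '{{.State.Running}}' "$name")" = "true" ]; then
    echo "kept"
  else
    podman rm -f "$name" >/dev/null
    "$create_fn" >/dev/null
    echo "recreated"
  fi
}

# run_once FLAG_FILE CMD...
# Runs CMD only if FLAG_FILE is absent and touches the flag only if CMD succeeds,
# e.g. run_once /var/lib/archipelago/.rootless-migrated migrate_to_rootless
# (migrate_to_rootless is a hypothetical function name).
run_once() {
  local flag=$1
  shift
  [ -e "$flag" ] && return 0
  "$@" && touch "$flag"
}
```

Because a failed migration never creates the flag, the next deploy retries it.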

Added

  • --fleet flag in deploy-to-target.sh: deploys .228 → .198 → Arch 1/2/3
  • --both lock fix: releases lock before recursive --live call
  • Container verification step (Step 26b): restarts exited containers, fixes permissions, checks Tor
  • IndeedHub backend stack rebuilt on .228 (7 containers)
  • IndeedHub nginx patched with direct IPs (podman DNS doesn't work with nginx resolver)
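The --both lock fix matters because a re-invoked child cannot take a lock that its parent still holds. This is a minimal sketch that assumes the script serializes runs with flock(1) on a lock file. The lock path, fd 9, and the function names are hypothetical and are not the actual deploy-to-target.sh code.

```bash
# Hypothetical lock path; the real script's locking mechanism may differ.
LOCK_FILE=${LOCK_FILE:-/tmp/deploy-to-target.lock}

acquire_lock() {
  exec 9>"$LOCK_FILE"   # open (or create) the lock file on fd 9
  flock -n 9            # fail immediately if another run holds it
}

release_lock() {
  flock -u 9            # drop the lock
  exec 9>&-             # close fd 9 so it is not inherited by children
}

# --both: run the first phase, then re-invoke this script with --live.
deploy_both() {
  acquire_lock || { echo "another deploy is running" >&2; return 1; }
  # ... first-phase deploy steps would run here ...
  release_lock          # without this, the child's acquire_lock fails
  "$0" --live
}
```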

Frontend changes

  • Replaced Immich with FileBrowser on Setup homescreen (goals.ts, EasyHome.vue)
  • MEMPOOL_API_IMAGE renamed to MEMPOOL_BACKEND_IMAGE in image-versions.sh
  • Nextcloud downgraded from 30 to 29 (Nextcloud only supports upgrading one major version at a time)

Session 2 fixes (same day)

Critical pattern found: Container credential mismatches

  • Deploy generates random passwords and stores them in secrets/. MariaDB/Postgres only read password env vars on FIRST init (empty data dir); later starts ignore them. Recreating a container with a new password → auth failures → crash loops.
  • 50,000+ cumulative container restarts across fleet from this single root cause.

Fixes applied to all nodes:

  1. LND: lnd.conf rpcpass synced from secrets/bitcoin-rpc-password (was hardcoded archipelago123)
  2. MariaDB mempool: data dirs wiped + reinitialized (password mismatch unrecoverable)
  3. BTCPay Postgres: ALTER USER to sync password with secrets
  4. FileBrowser: --user 0:0 instead of --cap-add NET_BIND_SERVICE (rootless port 80 fix)
  5. Nextcloud: same --user 0:0 fix
  6. Tailscale container on .228: removed (2,685 restarts — unauthenticated, host already has TS)
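The image only reads its password env var on first init, so the BTCPay fix (item 3) changes the password inside the running database instead. This is a minimal sketch that assumes a container with psql and a postgres superuser. The function name is hypothetical. The values are passed as psql variables (:"role" and :'pw'), so psql does the quoting and a password containing quotes cannot break the SQL.

```bash
# sync_pg_password CONTAINER ROLE SECRET_FILE
# Sets ROLE's password inside a running Postgres container to the value
# stored in SECRET_FILE. psql's :"var" / :'var' interpolation quotes the
# identifier and the literal, so no manual escaping is needed.
sync_pg_password() {
  local container=$1 role=$2 pw
  pw=$(<"$3")
  podman exec -i "$container" \
    psql -U postgres -v ON_ERROR_STOP=1 -v role="$role" -v pw="$pw" <<'SQL'
ALTER USER :"role" WITH PASSWORD :'pw';
SQL
}
```

Usage would look like sync_pg_password btcpay-postgres btcpay secrets/btcpay-db-password. Those container, role and file names are placeholders.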

Deploy script fixes:

  • deploy-tailscale.sh: LND config always synced before start, eval "$DB_PASSWORDS" → safe individual reads, MariaDB password sync step, filebrowser --user 0:0
  • deploy-to-target.sh: LND stale config check now compares passwords (not just cookie/localhost), filebrowser --user 0:0
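The LND fixes (sync before start, and treat the config as stale when the password differs) can be sketched as below. Reading each secret with $(<file) is one way to do the "safe individual reads" mentioned above: unlike eval, the value is never parsed as shell code. This is a minimal sketch that assumes lnd.conf uses the bitcoind.rpcpass key and that the secret file holds only the password. The function names are hypothetical.

```bash
# lnd_rpcpass_current CONF SECRET_FILE
# Succeeds only if CONF already contains exactly the password in SECRET_FILE.
lnd_rpcpass_current() {
  local secret
  secret=$(<"$2")
  grep -qxF "bitcoind.rpcpass=$secret" "$1"
}

# sync_lnd_rpcpass CONF SECRET_FILE
# Rewrites the existing bitcoind.rpcpass line in place. The password goes
# through the environment (ENVIRON) rather than awk -v, because -v would
# interpret backslash escapes in the value.
sync_lnd_rpcpass() {
  local conf=$1 secret_file=$2 tmp
  lnd_rpcpass_current "$conf" "$secret_file" && return 0
  tmp=$(mktemp)
  if ! PW=$(<"$secret_file") awk '
      /^bitcoind\.rpcpass=/ { print "bitcoind.rpcpass=" ENVIRON["PW"]; found = 1; next }
      { print }
      END { if (!found) exit 1 }
    ' "$conf" > "$tmp"; then
    rm -f "$tmp"
    echo "no bitcoind.rpcpass line in $conf" >&2
    return 1
  fi
  cat "$tmp" > "$conf"   # overwrite in place, keeping the file's owner/mode
  rm -f "$tmp"
}
```

The line is replaced where it already sits rather than appended, so it stays inside whatever section the config already puts it in.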

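Rootless port 80 rule: Containers binding port 80 MUST use --user 0:0. Adding NET_BIND_SERVICE via --cap-add didn't work under rootless podman. Running as container root is still unprivileged on the host: rootless podman maps container UID 0 to the invoking host user (not host root), and other container UIDs into the subuid range (typically 100000+).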
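For illustration, here is a FileBrowser container that follows the rule (FileBrowser binds port 80 inside the container, per the notes above). This is a sketch, not the deploy script's actual invocation. The helper name, host port 8080, volume path, and image tag are all assumptions.

```bash
# Hypothetical helper following the rootless port 80 rule.
# --user 0:0  : run as container root; under rootless podman that is the
#               unprivileged host user, so nothing gains host root.
# -p 8080:80  : host port 8080 is a placeholder; FileBrowser binds 80 inside.
# The volume path and image tag are placeholders too.
run_filebrowser() {
  podman run -d --name filebrowser \
    --user 0:0 \
    -p 8080:80 \
    -v /var/lib/archipelago/filebrowser:/srv \
    docker.io/filebrowser/filebrowser:latest
}
```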

Session 3 fixes (2026-03-22 to 2026-03-24)

Additional container fixes applied live:

  • PhotoPrism: recreated with proper /photoprism/storage, /photoprism/originals, /photoprism/import volume mounts (all 3 nodes)
  • Vaultwarden/Jellyfin: recreated with --user 0:0 + health checks (Arch 1/2)
  • Nextcloud: downgraded image to v29 (data initialized with v28, can't skip to v30)
  • Fedimint: upgraded v0.5.1 → v0.10.0 on all Tailscale nodes
  • Fedimint-gateway: bcrypt hash passed via file mount (shell escaping workaround)
  • SearXNG: recreated with proper caps on Arch 2
  • Arch 3 right-sized: stopped Immich (3 containers), Jellyfin, Vaultwarden, nbxplorer to fit the node's 7.3GB RAM
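This shows why the Fedimint-gateway bcrypt hash had to travel as a file. A hash like $2b$12$... is re-parsed by every extra shell layer (SSH runs the command string through the remote shell), and each $ starts a parameter expansion. The sketch below simulates that extra layer with bash -c and shows that writing the value to a file keeps it intact. The hash is a dummy value and both function names are hypothetical. How the gateway reads the mounted file is not covered here.

```bash
# Dummy value shaped like a bcrypt hash; not a real credential.
HASH='$2b$12$abcdefghijklmnopqrstuv'

# Simulates interpolating the hash into a command string that another shell
# re-parses (as with: ssh host "cmd $HASH"). The inner shell expands $2,
# $1 and $abcdefghijklmnopqrstuv, none of which are set.
reparsed_inline() {
  bash -c "printf '%s' $1"
}

# Workaround: store the hash as file contents (never parsed by a shell),
# then bind-mount that file into the container read-only.
write_hash_file() {
  ( umask 077; printf '%s' "$1" > "$2" )
}
```

Here reparsed_inline "$HASH" prints only b2, while the file written by write_hash_file holds the hash byte for byte.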

Deploy script improvements (6 commits pushed):

  1. d37165ca — Credential sync, health checks, rootless port binding
  2. f5714a5b — Fleet deploy falls back to Tailscale when LAN unreachable, --all alias
  3. 028248df — Suppress tar xattr spam in AIUI deploy (--no-xattrs)
  4. f5802f9e — Fix LND config SSH escaping, Tailscale fallback for BUILD_SOURCE
  5. 06d85e1d — Fix health check escaping for SSH heredoc (--health-cmd 'cmd' not "cmd")
  6. a7920de8 — Correct health check endpoints (fedimint→8175, nextcloud→/, filebrowser→/)
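The fleet fallback from f5714a5b (use Tailscale when the LAN address is unreachable) can be sketched as a reachability probe per node. This is a minimal sketch that assumes key-based SSH. The names pick_target and deploy_fleet, the 5-second timeout, and the "LAN,Tailscale" address pairs are hypothetical. The per-node deploy step is left as a comment.

```bash
# pick_target LAN_ADDR TAILSCALE_ADDR
# Prints LAN_ADDR if SSH answers within 5s, otherwise TAILSCALE_ADDR.
# BatchMode stops ssh from hanging on a password prompt.
pick_target() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
    echo "$1"
  else
    echo "$2"
  fi
}

# deploy_fleet "LAN,TAILSCALE" ...
# Visits nodes in the order given (e.g. .228, then .198, then Arch 1/2/3).
deploy_fleet() {
  local pair target
  for pair in "$@"; do
    target=$(pick_target "${pair%%,*}" "${pair#*,}")
    echo "deploying to $target"
    # deploy_one "$target" || return 1   # hypothetical per-node step
  done
}
```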

Health checks added to deploy-tailscale.sh:

  • 25 containers now have --health-cmd in deploy-tailscale.sh (was zero)
  • Key corrections: fedimint checks port 8175 (UI) not 8174 (websocket), nextcloud/filebrowser check / not custom endpoints
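One way to keep those endpoint corrections in one place is to build the health-check flags as a bash array, so the health command stays a single argument. This is only a sketch. The names health_args_for and HEALTH_ARGS are hypothetical, the 30s interval and 3 retries are assumptions, and it assumes curl exists inside each image. --health-cmd, --health-interval and --health-retries are standard podman run flags.

```bash
# health_args_for NAME
# Fills the global array HEALTH_ARGS with podman health-check flags.
# Endpoints follow the corrections above; timings are placeholders.
health_args_for() {
  local url
  case $1 in
    fedimint)              url=http://localhost:8175/ ;;  # UI port, not 8174 (websocket)
    nextcloud|filebrowser) url=http://localhost/ ;;       # plain /, no custom endpoint
    *) return 1 ;;
  esac
  HEALTH_ARGS=(
    --health-cmd "curl -fsS $url || exit 1"
    --health-interval 30s
    --health-retries 3
  )
}
```

A caller would run health_args_for fedimint, then podman run -d --name fedimint "${HEALTH_ARGS[@]}" IMAGE. When this runs inside an SSH heredoc, the quoting caveat from 06d85e1d still applies.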

Fleet status at end of session:

| Node   | Status                         | Notes                                           |
|--------|--------------------------------|-------------------------------------------------|
| .228   | 36/36, 0 unhealthy, load 1.0   | Fully stable                                    |
| Arch 1 | 25/25, 0 unhealthy, load 0.5   | Fully stable                                    |
| Arch 2 | 25/25, 0 unhealthy, load 0.2   | Fully stable                                    |
| Arch 3 | 24/28, 0 unhealthy, load 7.7   | Right-sized for 7.3GB RAM, Bitcoin IBD at 97.8% |
| .198   | Bitcoin chain data empty (4KB) | Needs full IBD (will take days). Not pruned.    |

Remaining for next session

  • .198: Bitcoin doing full IBD from scratch (chain data was lost/empty). No prune flag set. Will take days.
  • Arch 3: Bitcoin IBD was at 97.8% — check if complete, then start LND/nbxplorer
  • Tor config Python syntax errors in deploy-to-target.sh step 33 (cosmetic, falls back to system Tor)
  • deploy-to-target.sh still missing health checks (only deploy-tailscale.sh has them)
  • first-boot-containers.sh needs same rootless fixes (filebrowser --user 0:0, credential sync)
  • Fedimint guardian setup not done on any node — all in "Setup UI" mode
  • User needs to run git pull && ./scripts/deploy-to-target.sh --all to deploy the latest fixes to the Tailscale nodes
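For the Arch 3 item (start LND/nbxplorer only once IBD is complete), a guard could look like this. It relies on the initialblockdownload field returned by bitcoind's getblockchaininfo RPC. The container names (bitcoind, lnd, nbxplorer) and function names are hypothetical. It also assumes bitcoin-cli inside the container can authenticate, e.g. via the cookie file.

```bash
# ibd_complete: succeeds when bitcoind reports it has left initial block download.
# If podman exec fails, grep sees no input and the check fails too.
ibd_complete() {
  podman exec bitcoind bitcoin-cli getblockchaininfo \
    | grep -Eq '"initialblockdownload": *false'
}

# Start the dependents only after IBD; otherwise report and do nothing.
start_after_ibd() {
  if ibd_complete; then
    podman start lnd nbxplorer
  else
    echo "bitcoind still in IBD; not starting lnd/nbxplorer" >&2
    return 1
  fi
}
```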