Remove the Bitcoin RPC 60-second gate that blocked 13+ dependent containers
(mempool, electrumx, btcpay, lnd, fedimint) from being created on first boot.
Containers now always get created and auto-restart via health monitor once
Bitcoin becomes responsive — the designed recovery path.
Additional hardening:
- Validate archy-net creation with retry (silent failure broke DNS)
- Verify critical images are loaded, re-load from tarballs if missing
- Create SearXNG settings.yml before container start (was missing)
- Run reconciler automatically after first-boot failures
- Add load-images as explicit systemd dependency with 900s timeout
- Propagate config write errors in install.rs (bitcoin.conf, lnd.conf)
- FileBrowser password change: retry loop (6 attempts) + 0o600 perms
- Post-start verification: detect containers that exit immediately
- Add 2s dependency waits between container starts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Install, start, and failure events logged to
/var/log/archipelago-container-installs.log with timestamps
- Enables post-mortem debugging of container lifecycle issues
- UI container hooks: try registry pull before local build fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- session.rs: use OnceCell for remember_secret to prevent concurrent
requests on first boot from generating different HMAC secrets, which
caused CSRF token mismatch on every state-changing RPC call (app
install, start, stop all failed with "CSRF token missing or invalid")
- install.rs: write lnd.conf with Bitcoin RPC credentials before LND
container starts (prevents "bitcoin.mainnet must be specified" crash);
inject Bitcoin RPC auth into bitcoin-ui nginx.conf; add proper error
logging to UI container build/run steps; fix UI containers to use
--network=host (they proxy to localhost backend/bitcoin RPC)
- Tor: remove After=tor.service from archipelago-tor-helper.path to
break systemd ordering cycle that prevented Tor from starting on boot
- Seed screen: compact grid layout (2 cols mobile, 4 cols sm+) with
tighter padding to fit kiosk displays without scrolling
- Dockerfiles: remove nonexistent assets/ COPY from bitcoin-ui, fix
electrs-ui to COPY qrcode.js and EXPOSE 50002 (matches nginx.conf)
- image-versions.sh: add UI container image variables for registry
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend reads onion addresses from /var/lib/archipelago/tor-hostnames/.
This dir was never created on fresh installs, breaking connect wallet
and tor address display. Now copies from system Tor hidden service dirs.
Also fixed log() function ordering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The bitcoin-ui nginx proxy needs Basic Auth to talk to Bitcoin Core RPC.
The __BITCOIN_RPC_AUTH__ placeholder was not being replaced, causing a
browser login prompt. Now injects creds from secrets dir before build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tor was configured in torrc but never started by first-boot-containers.sh.
Connect wallet and .onion services were broken on fresh installs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Chromium kiosk: add --disable-gpu-compositing, --disable-gpu-rasterization,
--disable-software-rasterizer, --renderer-process-limit=1
drops GPU process from 64% to 12% CPU
- Container healthchecks: 30s to 120s interval in first-boot and reconcile
- AppCard: min-height on description so cards dont shift
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical bug: the doctor runs as root but containers are rootless
under the archipelago user. When checking if a conmon process has an
associated container, the root podman database was queried (empty),
causing ALL conmon processes to be identified as orphaned and killed.
This terminated running containers every 30 minutes.
Fix: use sudo -u archipelago to query the rootless podman database.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nginx needs CHOWN, SETUID, SETGID to chown cache directories and drop
privileges on startup. LND UI additionally needs NET_BIND_SERVICE to
bind port 80 inside the container. Without these, cap-drop ALL causes
nginx to crash with "Operation not permitted" on chown or bind.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The doctor runs as root (for tor permissions, process cleanup) but
containers are rootless under the archipelago user. Use sudo -u to
switch user context for podman commands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rootless Podman 4.x restart policies don't auto-restart containers
after crashes. The doctor (which runs on a timer) now checks for
exited core containers (tiers 0-2) and restarts them.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The health check command goes through multiple shell layers
(assignment → variable expansion → eval → podman → sh -c). Inner
double quotes need \\\" escaping to survive as literal " in Python.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The electrumx container image doesn't include curl. Replace the HTTP
health check with a Python socket connection test to the RPC port.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- electrumx: add DAC_OVERRIDE to SPEC_CAPS — rootless podman maps container
UID 0 to host UID 1000, but volumes are owned by host UID 100000; without
DAC_OVERRIDE the container can't write to its own data directory
- lnd: replace curl-based health check with lncli using readonly macaroon —
the REST API requires macaroon auth, so unauthenticated curl always fails
- grafana: add DAC_OVERRIDE to SPEC_CAPS for the same rootless volume issue
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add orchestration_tests.rs + mock_podman.rs (container unit tests)
- Add container-tests.yml CI workflow
- Add dev-container-test.sh for local testing
- MASTER_PLAN.md: add TASK-49 (P0) with 6-phase plan
- Login.vue: minor fixes from user testing
- AppCard.vue: enter key handler fix
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- image-versions.sh: fix 15+ tag mismatches against actual registry
(bitcoin-knots:28.1→latest, lnd:v0.18.5→v0.18.4, grafana:11.4→10.2,
vaultwarden:1.32.5→1.30.0-alpine, nextcloud:29→28, etc.)
- .bashrc: add /sbin:/usr/sbin to PATH so reboot/shutdown work
- Tailscale: add Arch Atob node (100.113.33.31)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Option 0 in dev-start.sh launches the branding development workflow:
- Finds latest ISO on Desktop or results/
- Patches branding files into the ISO
- Boots in QEMU for immediate visual feedback
- Lists editable files if no ISO is available
Edit background.png, theme.txt, or Plymouth files, re-run option 0,
see changes in ~10 seconds without a full CI build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FileBrowser crash fix:
- Add --cap-add=NET_BIND_SERVICE (port 80 needs it with --cap-drop=ALL)
- Add --cap-add=DAC_OVERRIDE for rootless volume access
- Both in first-boot script and backend config.rs
Test script fixes:
- Extract csrf_token cookie and send as X-CSRF-Token header on RPC calls
- Add --phase1-only flag for safe install-only checks (no side effects)
- Auto-test service uses --phase1-only so it doesn't steal onboarding
Install fixes:
- Pre-create ~/.local/share/containers (ReadWritePaths mount namespace error)
- Fix console-setup.service: add After=tmp.mount + ExecStartPre mkdir /tmp
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All container image references now pull from 80.71.235.15:3000/archipelago/
instead of Docker Hub and ghcr.io. image-versions.sh is the single source
of truth; all scripts use $*_IMAGE variables instead of hardcoded refs.
Files updated:
- scripts/image-versions.sh: central ARCHY_REGISTRY variable
- core/*/config.rs: registry whitelist includes app registry
- core/*/stacks.rs: Immich + Penpot stack images
- scripts/{first-boot,deploy-to-target,container-specs}.sh: use variables
- docker/*/Dockerfile: nginx base image from registry
- image-recipe/: ISO build, podman config, menu script
- scripts/{container-doctor,deploy-bitcoin-knots,fix-indeedhub,validate-app-manifest}.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Dashboard.vue: move DashboardMobileNav outside <main> so position:fixed
isn't broken by will-change:transform on the perspective container
- Add container-specs.sh and reconcile-containers.sh utility scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove || exit 1 from health-cmd (redundant, breaks SSH heredoc)
- Use --health-cmd 'cmd' format (space, not equals) for proper quoting
- Simplify bitcoin health check to bitcoin-cli getnetworkinfo (no creds needed)
- Fix MariaDB health check nested quote issue
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix shell escaping in LND config sync block (single-quoted SSH context
doesn't need backslash-escaped dollars)
- deploy-tailscale.sh BUILD_SOURCE auto-detects Tailscale IP when LAN
unreachable (fixes "No binary on .228" error)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add --all as alias for --fleet
- Fleet deploy auto-detects Tailscale IP when LAN SSH fails
- Skip .198 gracefully when unreachable instead of failing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create scripts/smoke-test.sh for live server verification (7 checks)
- Document planned GitHub Actions CI/CD pipeline in docs/ci-cd-plan.md
- Integration tests deferred to future task (require test harness setup)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- S10: Add warnings to silent health check failures in deploy scripts
- S11: Add trap cleanup for temp dirs in deploy and tailscale scripts
- S12: Quote 20+ critical unquoted variables across deploy scripts
- S13: Extract hardcoded IPs to deploy-config-defaults.sh
- S15: Add --memory=256m to UI container runs
- F16: Remove in-memory JWT, use cookie-only auth in filebrowser client
- F17: Add meta tag fallback for CSRF token in RPC client
- F19: Track and clear setTimeout in AppSession on unmount
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- I2: Add MemoryMax=4G, LimitNOFILE=65535, TasksMax=2048 to systemd service
- I3: Tor rotation keeps old service for 1h transition before cleanup
- R14: Replace .parse().unwrap() with .unwrap_or(localhost) in rate limiter
- R15: Replace 7 unwrap/expect in mesh protocol with proper error propagation
- R27: Add 10s timeouts to mesh Bitcoin RPC calls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- F4: Fetch fresh server state after WebSocket reconnect
- F5: Guard message polling timer with auth check, stop on logout
- F6: Remove NIP-07 listener in appLauncher close()
- F7: Initialize audio player once to prevent listener stacking
- S3: Pin all container images to specific versions, create image-versions.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- F1: Guard connectWebSocket against concurrent calls with isWsConnecting flag
- F2: Serialize mesh send operations with sendQueue to prevent fetchMessages races
- F3: Add global Vue error handler with toast notification
- S1: Replace sudo podman with podman across all scripts (rootless Podman)
- S2: Add health-cmd to all 40 container run commands in first-boot-containers.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major changes:
- Full Tor hidden service management via systemd path unit pattern
(tor-helper.sh + archipelago-tor-helper.path/service) — respects
NoNewPrivileges=yes, no sudo needed from backend
- Container doctor: prefer system Tor over container, remove archy-tor
- Deploy script: fix torrc generation (read correct services.json path),
web apps map port 80→local port, enable both tor and tor@default
- Federation: server rename pushes name to peers via background sync
- Server name: fix root-owned file, optimistic store update
- Mesh: local echo for sent messages, sendingArch loading state
- Web5: Message button → Mesh redirect, node name lookup in messages
- PeerFiles: show DID not onion in header
- Connected Nodes: flex-1 instead of fixed max-h
- Toast notifications route to Mesh
- Deploy script: fix single-quote syntax in SSH block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Part 1 — DID Persistence:
- Deploy script creates /var/lib/archipelago/identity/ directory
- First-boot script creates identity dir with proper ownership
- Identity load now logs pubkey to confirm persistence across restarts
Part 2 — Node Names:
- NodeStateSnapshot includes node_name field
- build_local_state() passes server name to sync responses
- update_node_state() stores peer's announced name on the FederatedNode
- Names propagate automatically during federation.sync-state
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures LND Connect works through every deployment path:
- Nginx: CORS $http_origin on /lnd-connect-info (both HTTP+HTTPS)
- Nginx: no cookie gate (backend is 127.0.0.1-only)
- LND UI source: fetch with credentials: 'include'
- Deploy: rebuilds LND UI with --no-cache every deploy
- First-boot: --restart unless-stopped + memory limits on UI containers
- Backend: bound to 127.0.0.1:5678 in systemd service
Root cause was CORS: LND UI on :8081 fetching :80 is cross-origin.
Browser blocked reading the 200 response without CORS headers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New Discover.vue (app store redesign)
- Fleet.vue dashboard for .228
- MeshMap.vue component
- Fixed Discover.vue type errors (unused var, type predicate)
- Various UI updates (Apps, Dashboard, Marketplace, Mesh, Web5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>