139 Commits

Author SHA1 Message Date
Dorian
6cad154028 fix: add NET_RAW capability to LND container for TLS cert generation
LND crashes with "netlinkrib: address family not supported by protocol"
in rootless podman because it needs NET_RAW to enumerate network
interfaces during TLS certificate generation. Added to capabilities
in config.rs, first-boot-containers.sh, and container-specs.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:19:09 +01:00
Dorian
22609abd64 fix: LND crash in rootless podman, improve container status labels
LND v0.18+ crashes with "netlinkrib: address family not supported"
because rootless podman blocks netlink access for TLS cert SAN
enumeration. Fix: add tlsextraip=0.0.0.0 and tlsextradomain=lnd
to lnd.conf so LND skips interface enumeration.

Also: fix status label to show "crashed" for both exited and
stopped containers with non-zero exit codes (previously only
caught "exited" state, but podman reports "stopped" for
restart-looping containers).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:01:51 +01:00
Dorian
8e8020833d feat: integrate demo TUI animations into real installer
Add install-tui.sh library with boot scan, logo decrypt reveal,
bouncing Bitcoin symbol progress bar, and celebration strobe.
The installer sources it if available, falls back to plain text
if missing (easy revert: just remove the source line).

Animations: CRT power-on scan, BIOS memory check simulation,
3D ASCII logo with character-by-character decrypt reveal,
progress bar with ₿ bouncing DVD-screensaver style during
long operations, logo color party on completion, flashing
"REMOVE THE USB DRIVE NOW" warning.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 21:23:29 +01:00
Dorian
31e87d98c1 fix: bulletproof first-boot container creation and install reliability
Remove the Bitcoin RPC 60-second gate that blocked 13+ dependent containers
(mempool, electrumx, btcpay, lnd, fedimint) from being created on first boot.
Containers now always get created and auto-restart via health monitor once
Bitcoin becomes responsive — the designed recovery path.

Additional hardening:
- Validate archy-net creation with retry (silent failure broke DNS)
- Verify critical images are loaded, re-load from tarballs if missing
- Create SearXNG settings.yml before container start (was missing)
- Run reconciler automatically after first-boot failures
- Add load-images as explicit systemd dependency with 900s timeout
- Propagate config write errors in install.rs (bitcoin.conf, lnd.conf)
- FileBrowser password change: retry loop (6 attempts) + 0o600 perms
- Post-start verification: detect containers that exit immediately
- Add 2s dependency waits between container starts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:31:00 +01:00
Dorian
843d778f90 fix: container security hardening, onboarding viewport scaling, boot screen cleanup
Container security:
- Add --cap-drop ALL + --security-opt no-new-privileges:true to 12 containers
  missing hardening in first-boot-containers.sh (mempool-db, electrumx,
  mempool-api, mempool-web, electrs-ui, btcpay-db, nbxplorer, nostr-rs-relay,
  strfry, tailscale, bitcoin-ui, lnd-ui)
- Mirror same hardening in deploy-to-target.sh for consistency
- Add --read-only + tmpfs to nostr-rs-relay
- Fix filebrowser deploy to include security flags
- Remove duplicate UI image definitions in image-versions.sh
- Separate Jellyfin capabilities (needs FOWNER, exec tmpfs for CoreCLR JIT)
- Harden archy-net creation with existence check and error handling

UI fixes:
- Fix onboarding viewport scaling: all 7 screens now use h-full + max-h-full
  pattern so containers never overflow viewport regardless of padding
- Remove path-option-card wrappers from seed verify inputs, left-justify labels
- Remove batteries/barbarian icons from boot screen (keep bitcoin, cloud, github, save)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:35:34 +01:00
Dorian
c7884919d2 fix: add persistent container install/start logging
- Install, start, and failure events logged to
  /var/log/archipelago-container-installs.log with timestamps
- Enables post-mortem debugging of container lifecycle issues
- UI container hooks: try registry pull before local build fallback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:09:49 +01:00
Dorian
4b0e1cfbe3 fix: CSRF race condition, UI containers, Tor ordering, seed layout
- session.rs: use OnceCell for remember_secret to prevent concurrent
  requests on first boot from generating different HMAC secrets, which
  caused CSRF token mismatch on every state-changing RPC call (app
  install, start, stop all failed with "CSRF token missing or invalid")

- install.rs: write lnd.conf with Bitcoin RPC credentials before LND
  container starts (prevents "bitcoin.mainnet must be specified" crash);
  inject Bitcoin RPC auth into bitcoin-ui nginx.conf; add proper error
  logging to UI container build/run steps; fix UI containers to use
  --network=host (they proxy to localhost backend/bitcoin RPC)

- Tor: remove After=tor.service from archipelago-tor-helper.path to
  break systemd ordering cycle that prevented Tor from starting on boot

- Seed screen: compact grid layout (2 cols mobile, 4 cols sm+) with
  tighter padding to fit kiosk displays without scrolling

- Dockerfiles: remove nonexistent assets/ COPY from bitcoin-ui, fix
  electrs-ui to COPY qrcode.js and EXPOSE 50002 (matches nginx.conf)

- image-versions.sh: add UI container image variables for registry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 11:06:19 +01:00
Dorian
64b57dca7d fix: overhaul container lifecycle — recovery, health, uninstall, UI state
Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:03:57 +01:00
Dorian
08330e13f7 fix: populate tor-hostnames dir in first-boot for backend onion discovery
Backend reads onion addresses from /var/lib/archipelago/tor-hostnames/.
This dir was never created on fresh installs, breaking connect wallet
and tor address display. Now copies from system Tor hidden service dirs.
Also fixed log() function ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 04:33:55 +01:00
Dorian
2f1515b9c6 fix: inject Bitcoin RPC auth into bitcoin-ui before build in first-boot
The bitcoin-ui nginx proxy needs Basic Auth to talk to Bitcoin Core RPC.
The __BITCOIN_RPC_AUTH__ placeholder was not being replaced, causing a
browser login prompt. Now injects creds from secrets dir before build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 03:48:22 +01:00
Dorian
6cced5d042 fix: start Tor in first-boot, ensure hidden services on fresh installs
Tor was configured in torrc but never started by first-boot-containers.sh.
Connect wallet and .onion services were broken on fresh installs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 03:43:01 +01:00
Dorian
92a429535a perf: reduce CPU — Chromium GPU flags, healthcheck 30s to 120s, app card fixed height
- Chromium kiosk: add --disable-gpu-compositing, --disable-gpu-rasterization,
  --disable-software-rasterizer, --renderer-process-limit=1
  drops GPU process from 64% to 12% CPU
- Container healthchecks: 30s to 120s interval in first-boot and reconcile
- AppCard: min-height on description so cards dont shift

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 02:42:44 +01:00
Dorian
251447b17a fix: use rootless podman to check conmon ownership in doctor
Critical bug: the doctor runs as root but containers are rootless
under the archipelago user. When checking if a conmon process has an
associated container, the root podman database was queried (empty),
causing ALL conmon processes to be identified as orphaned and killed.
This terminated running containers every 30 minutes.

Fix: use sudo -u archipelago to query the rootless podman database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 23:22:28 +01:00
Dorian
768ca26e90 fix: add required capabilities to UI container specs for nginx startup
Nginx needs CHOWN, SETUID, SETGID to chown cache directories and drop
privileges on startup. LND UI additionally needs NET_BIND_SERVICE to
bind port 80 inside the container. Without these, cap-drop ALL causes
nginx to crash with "Operation not permitted" on chown or bind.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 23:03:27 +01:00
Dorian
4dd3d29dc4 fix: run rootless podman commands as archipelago user in doctor
The doctor runs as root (for tor permissions, process cleanup) but
containers are rootless under the archipelago user. Use sudo -u to
switch user context for podman commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:49:36 +01:00
Dorian
d67c636988 fix: add stopped core container restart to doctor
Rootless Podman 4.x restart policies don't auto-restart containers
after crashes. The doctor (which runs on a timer) now checks for
exited core containers (tiers 0-2) and restarts them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:46:06 +01:00
Dorian
e8c363e4f5 fix: escape quotes in electrumx health check for eval pass-through
The health check command goes through multiple shell layers
(assignment → variable expansion → eval → podman → sh -c). Inner
double quotes need \\\" escaping to survive as literal " in Python.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:29:28 +01:00
Dorian
dd0c3982b0 fix: use python3 socket health check for electrumx (no curl in image)
The electrumx container image doesn't include curl. Replace the HTTP
health check with a Python socket connection test to the RPC port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:19:46 +01:00
Dorian
bc6b4e0bec fix: add DAC_OVERRIDE cap for rootless volume access, fix LND health check
- electrumx: add DAC_OVERRIDE to SPEC_CAPS — rootless podman maps container
  UID 0 to host UID 1000, but volumes are owned by host UID 100000; without
  DAC_OVERRIDE the container can't write to its own data directory
- lnd: replace curl-based health check with lncli using readonly macaroon —
  the REST API requires macaroon auth, so unauthenticated curl always fails
- grafana: add DAC_OVERRIDE to SPEC_CAPS for the same rootless volume issue

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:14:01 +01:00
Dorian
e8735b39ec feat: TASK-49 container reliability — tests, orchestration, MASTER_PLAN
- Add orchestration_tests.rs + mock_podman.rs (container unit tests)
- Add container-tests.yml CI workflow
- Add dev-container-test.sh for local testing
- MASTER_PLAN.md: add TASK-49 (P0) with 6-phase plan
- Login.vue: minor fixes from user testing
- AppCard.vue: enter key handler fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:15:56 +01:00
Dorian
9b49ab6d99 feat: TUI updates — ASCII block logo, install demo script
- archipelago-menu.sh: replace box-drawing banner with ASCII block
  letter logo (ARCHIPELAGO in chunky block chars)
- scripts/install-tui-demo.sh: standalone TUI demo with all animations
  (boot scan, decrypt reveal, progress bars, bouncing BTC symbol,
  CRT transitions, celebration effects)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 16:08:41 +01:00
Dorian
d25969e2e5 fix: align image-versions.sh with registry, PATH for reboot
- image-versions.sh: fix 15+ tag mismatches against actual registry
  (bitcoin-knots:28.1→latest, lnd:v0.18.5→v0.18.4, grafana:11.4→10.2,
  vaultwarden:1.32.5→1.30.0-alpine, nextcloud:29→28, etc.)
- .bashrc: add /sbin:/usr/sbin to PATH so reboot/shutdown work
- Tailscale: add Arch Atob node (100.113.33.31)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 14:25:13 +01:00
Dorian
7f03e39f58 feat: onboarding polish, splash screen, controller nav, dev script
Onboarding flow:
- Intro: improved layout and transitions
- DID: better card styling and responsiveness
- Path: added visual enhancements
- Backup/Identity/Verify: streamlined markup
- SplashScreen component added

UI:
- Controller navigation improvements (useControllerNav)
- Style.css refinements

Backend:
- Runtime package fix

Dev:
- dev-start.sh improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:41:52 +00:00
Dorian
013b724e02 feat: add boot branding dev option (0) to dev-start.sh
Option 0 in dev-start.sh launches the branding development workflow:
- Finds latest ISO on Desktop or results/
- Patches branding files into the ISO
- Boots in QEMU for immediate visual feedback
- Lists editable files if no ISO is available

Edit background.png, theme.txt, or Plymouth files, re-run option 0,
see changes in ~10 seconds without a full CI build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:34:29 +00:00
Dorian
f940b4562a fix: filebrowser port bind, CSRF in tests, console-setup, auto-test scope
FileBrowser crash fix:
- Add --cap-add=NET_BIND_SERVICE (port 80 needs it with --cap-drop=ALL)
- Add --cap-add=DAC_OVERRIDE for rootless volume access
- Both in first-boot script and backend config.rs

Test script fixes:
- Extract csrf_token cookie and send as X-CSRF-Token header on RPC calls
- Add --phase1-only flag for safe install-only checks (no side effects)
- Auto-test service uses --phase1-only so it doesn't steal onboarding

Install fixes:
- Pre-create ~/.local/share/containers (ReadWritePaths mount namespace error)
- Fix console-setup.service: add After=tmp.mount + ExecStartPre mkdir /tmp

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 17:17:18 +00:00
Dorian
127a36c5c8 fix: production onboarding, CI tests, container security, keyboard nav
Install & Onboarding:
- Remove DEV_MODE=true from production ISO service file (auto-created
  users, skipped password setup)
- Auto-install no longer overwrites rootfs service file with bad template
- Login.vue always checks auth.isSetup — shows password creation form
  on fresh install without requiring dev build flag
- Deploy image-versions.sh to /opt/archipelago/scripts/ on installed nodes
- First-boot-containers sources image-versions.sh, runs podman as
  archipelago user (rootless), enables linger + podman.socket
- Correct volume ownership (100000:100000 for rootless UID mapping)

Container Security:
- FileBrowser: add --cap-add=DAC_OVERRIDE for rootless podman volume access
- FileBrowser: add --read-only, /data volume for database, proper cmd args
- First-boot script matches backend config (security hardening + health check)

CI Pipeline:
- Add vue-tsc type check + vitest run to build-iso.yml (runs every push)
- Add post-install-tests.yml workflow (workflow_dispatch, SSH to target)
- Build report: set +eo pipefail, fix rootfs path, add || true guards
- Bundle run-post-install-tests.sh into ISO

E2E Test Suite (scripts/run-post-install-tests.sh):
- Phase 1: Install verification (files, services, podman, linger, DEV_MODE check)
- Phase 2: Onboarding flow (auth.isSetup, auth.setup, login, DID, complete)
- Phase 3: Container lifecycle (install 3 apps via package.install RPC,
  verify running, stop, verify stopped, restart, verify running, health)
- Phase 4: Log verification (first-boot log, diagnostics, journal errors)
- Correct package.install params: {"id", "dockerImage"}

Frontend:
- Fix backdrop-filter tab-switch bug (keep animations paused during rebuild)
- Dashboard glitch animations paused during tab-hidden
- Gamepad nav: auto-focus first container on route change
- Tab roving: Left/Right on role="tab" cycles and activates sibling tabs
- ContainerApps: data-controller-launch on running app cards
- 515 tests passing (fixed 30 broken, added 19 new keyboard nav tests)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:16:57 +00:00
Dorian
320c9f5b19 fix: container install flow, filebrowser auth, AppCard enrichment
- Fix .198-style fresh installs: systemd service ExecStartPre creates
  /run/user/1000, enable podman.socket, chmod 644 /etc/hosts
- Filebrowser: add /data volume for database (fixes read-only crash),
  secure auth with random password via backend RPC (no more admin/admin)
- AppCard: enrich installing state with marketplace metadata (icon,
  title, description, tier badge, author, version)
- Registry: btcpayserver 1.13.5 → 1.13.7, images mirrored
- ReadWritePaths: add home container paths for rootless podman

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:32:54 +00:00
Dorian
ae13c0dad2 feat: migrate all container images to Archipelago app registry
All container image references now pull from 80.71.235.15:3000/archipelago/
instead of Docker Hub and ghcr.io. image-versions.sh is the single source
of truth; all scripts use $*_IMAGE variables instead of hardcoded refs.

Files updated:
- scripts/image-versions.sh: central ARCHY_REGISTRY variable
- core/*/config.rs: registry whitelist includes app registry
- core/*/stacks.rs: Immich + Penpot stack images
- scripts/{first-boot,deploy-to-target,container-specs}.sh: use variables
- docker/*/Dockerfile: nginx base image from registry
- image-recipe/: ISO build, podman config, menu script
- scripts/{container-doctor,deploy-bitcoin-knots,fix-indeedhub,validate-app-manifest}.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:06:21 +00:00
Dorian
35f1aa2e13 fix: move mobile nav outside main for fixed positioning, add container scripts
- Dashboard.vue: move DashboardMobileNav outside <main> so position:fixed
  isn't broken by will-change:transform on the perspective container
- Add container-specs.sh and reconcile-containers.sh utility scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 18:13:22 +00:00
Dorian
0e0c97c203 feat: architecture review fixes, self-update system, CI pipeline, supply chain hardening
Architecture review (all P0+P1 issues now fixed):
- Add 10s timeout to 6 bare Nostr client.connect() calls
- Pin all 12 crypto deps to exact versions from Cargo.lock
- Pin all 15 floating container image tags to exact patch versions
- Add CI pipeline (cargo fmt + clippy + tests, frontend type-check + build)

Self-update system (git.tx1138.com):
- scripts/self-update.sh: pull, build, install, restart with rollback
- systemd timer checks daily at 3 AM
- update.check RPC does git-based checks when repo is present
- update.git-apply RPC triggers self-update from UI
- Default update URL changed from GitHub to git.tx1138.com
- Git added to ISO package list for fresh installs

Documentation:
- CHANGELOG v1.3.1 with all changes
- README updated (version, update system section)
- BETA-PROGRESS session #6 logged
- architecture-review.html: 4 issues marked FIXED, 8/12 refactoring done

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 15:52:26 +00:00
Dorian
a7920de824 fix: correct health check endpoints for fedimint, nextcloud, filebrowser
- Fedimint: check port 8175 (UI) not 8174 (websocket API)
- Nextcloud: check / not /status.php (returns 302 during setup)
- FileBrowser: check / not /health (endpoint doesn't exist)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 00:47:49 +00:00
Dorian
06d85e1d6f fix: health check escaping for SSH heredoc context
- Remove || exit 1 from health-cmd (redundant, breaks SSH heredoc)
- Use --health-cmd 'cmd' format (space, not equals) for proper quoting
- Simplify bitcoin health check to bitcoin-cli getnetworkinfo (no creds needed)
- Fix MariaDB health check nested quote issue

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:45:32 +00:00
Dorian
f5802f9ed0 fix: LND config escaping in SSH heredoc, Tailscale fallback for build source
- Fix shell escaping in LND config sync block (single-quoted SSH context
  doesn't need backslash-escaped dollars)
- deploy-tailscale.sh BUILD_SOURCE auto-detects Tailscale IP when LAN
  unreachable (fixes "No binary on .228" error)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 17:01:02 +00:00
Dorian
028248dfd7 fix: suppress tar xattr spam in AIUI deploy
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 16:54:54 +00:00
Dorian
f5714a5b2e fix: fleet deploy falls back to Tailscale when LAN unreachable
- Add --all as alias for --fleet
- Fleet deploy auto-detects Tailscale IP when LAN SSH fails
- Skip .198 gracefully when unreachable instead of failing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 16:51:49 +00:00
Dorian
d37165ca52 fix: deploy credential sync, health checks, rootless port binding
- LND config always synced with secrets/bitcoin-rpc-password before
  starting (both deploy scripts) — fixes 401 auth errors on all nodes
- Replace eval "$DB_PASSWORDS" with safe individual SSH reads in
  deploy-tailscale.sh (eliminates command injection risk)
- Add MariaDB password sync step after container start (ALTER USER)
- Add --health-cmd to all 25 containers in deploy-tailscale.sh
- FileBrowser uses --user 0:0 for rootless port 80 binding (both scripts)
- Fedimint env var fixed: FM_REL_NOTES_ACK=0_4_xyz

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 14:16:11 +00:00
Dorian
13e4a738be bug fixing and deploy and build diagnostics 2026-03-22 03:30:21 +00:00
Dorian
24f86632d0 feat: add E2E smoke test script and CI/CD pipeline plan
- Create scripts/smoke-test.sh for live server verification (7 checks)
- Document planned GitHub Actions CI/CD pipeline in docs/ci-cd-plan.md
- Integration tests deferred to future task (require test harness setup)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:08:00 +00:00
Dorian
5099f6f763 refactor: create shared script library, fix ISO image pinning, document planned splits
- S21: Create scripts/lib/common.sh with shared logging, SSH, health check, mem_limit functions
- S18: Source common.sh from deploy-to-target.sh, deploy-tailscale.sh, first-boot-containers.sh
- S16: Fix 2 hardcoded images in ISO build, add missing image variables
- S19: Document planned 7-module split of build-auto-installer-iso.sh
- S20: Document planned 8-module split of first-boot-containers.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:06:29 +00:00
Dorian
8e4d352393 fix: deploy error visibility, trap cleanup, variable quoting, frontend resilience
- S10: Add warnings to silent health check failures in deploy scripts
- S11: Add trap cleanup for temp dirs in deploy and tailscale scripts
- S12: Quote 20+ critical unquoted variables across deploy scripts
- S13: Extract hardcoded IPs to deploy-config-defaults.sh
- S15: Add --memory=256m to UI container runs
- F16: Remove in-memory JWT, use cookie-only auth in filebrowser client
- F17: Add meta tag fallback for CSRF token in RPC client
- F19: Track and clear setTimeout in AppSession on unmount

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:06:08 +00:00
Dorian
5c3a3ffa8e fix: systemd resource limits, Tor rotation transition, unwrap elimination, RPC timeouts
- I2: Add MemoryMax=4G, LimitNOFILE=65535, TasksMax=2048 to systemd service
- I3: Tor rotation keeps old service for 1h transition before cleanup
- R14: Replace .parse().unwrap() with .unwrap_or(localhost) in rate limiter
- R15: Replace 7 unwrap/expect in mesh protocol with proper error propagation
- R27: Add 10s timeouts to mesh Bitcoin RPC calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:46:40 +00:00
Dorian
2f60ef44ea fix: deploy locking, safe eval replacement, first-boot error handling, script hardening
- S4: Add Bitcoin readiness gate and container tracking with final summary
- S5: Replace eval "$DB_PASSWORDS" with safe case-based variable parsing
- S6: Add deploy locking with stale lock detection (30min timeout)
- S7: Deploy rollback already implemented — verified existing mechanism
- S8: Switch trust-archipelago-cert.sh to SSH key auth, sshpass as fallback
- S9: Pipe MariaDB SQL via stdin to avoid password in ps output
- S17: Add disk space pre-flight check (abort if >85% full)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:39:22 +00:00
Dorian
3b7d541224 fix: WebSocket reconnect state refresh, listener leak fixes, pin container images
- F4: Fetch fresh server state after WebSocket reconnect
- F5: Guard message polling timer with auth check, stop on logout
- F6: Remove NIP-07 listener in appLauncher close()
- F7: Initialize audio player once to prevent listener stacking
- S3: Pin all container images to specific versions, create image-versions.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:32:28 +00:00
Dorian
38dc845f57 fix: WebSocket race conditions, Vue error handler, remove sudo podman, add container health checks
- F1: Guard connectWebSocket against concurrent calls with isWsConnecting flag
- F2: Serialize mesh send operations with sendQueue to prevent fetchMessages races
- F3: Add global Vue error handler with toast notification
- S1: Replace sudo podman with podman across all scripts (rootless Podman)
- S2: Add health-cmd to all 40 container run commands in first-boot-containers.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:11:05 +00:00
Dorian
b31148a8b7 fix: rpcauth credentials, reboot survival, system Tor for all containers
- Bitcoin RPC: switch to rpcauth (salted hash in bitcoin.conf, no plaintext
  in config or CLI). Password stable across reboots/restarts/deploys.
- Remove daily-reboot-test.sh cron on both servers
- Enable podman-restart.service for container auto-start after reboot
- System Tor: SocksPort 0.0.0.0:9050 with SocksPolicy for container access
- LND: tor.socks=host.containers.internal:9050 (system Tor, not container)
- Bitcoin: -proxy=host.containers.internal:9050 for Tor outbound
- bitcoin_rpc.rs: reads from secrets file, cached, stable credentials
- package.rs: dynamic rpc_user/rpc_pass, rpcauth hash generation
- network.rs: fix missing send_to_peer args (mesh encryption update)
- first-boot-containers.sh: rpcauth generation, system Tor config
- deploy-to-target.sh: rpcauth credentials, LND config migration
- Mesh: encrypted channel message support (ChaCha20-Poly1305 updates)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 11:56:20 +00:00
Dorian
b4d204d1d6 feat: reboot button in Settings with password confirmation
- system.reboot RPC endpoint requires password re-verification
- Uses systemd path unit pattern (tor-helper.sh) for privilege escalation
- 2-second delay before reboot to allow RPC response to reach client
- Clean UI: password input modal, loading state, error feedback

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 10:48:06 +00:00
Dorian
fc1120338d fix: Tor management system, bug fixes, federation name sync
Major changes:
- Full Tor hidden service management via systemd path unit pattern
  (tor-helper.sh + archipelago-tor-helper.path/service) — respects
  NoNewPrivileges=yes, no sudo needed from backend
- Container doctor: prefer system Tor over container, remove archy-tor
- Deploy script: fix torrc generation (read correct services.json path),
  web apps map port 80→local port, enable both tor and tor@default
- Federation: server rename pushes name to peers via background sync
- Server name: fix root-owned file, optimistic store update
- Mesh: local echo for sent messages, sendingArch loading state
- Web5: Message button → Mesh redirect, node name lookup in messages
- PeerFiles: show DID not onion in header
- Connected Nodes: flex-1 instead of fixed max-h
- Toast notifications route to Mesh
- Deploy script: fix single-quote syntax in SSH block

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 02:59:29 +00:00
Dorian
f8794791f3 feat: DID persistence + federation node names in sync
Part 1 — DID Persistence:
- Deploy script creates /var/lib/archipelago/identity/ directory
- First-boot script creates identity dir with proper ownership
- Identity load now logs pubkey to confirm persistence across restarts

Part 2 — Node Names:
- NodeStateSnapshot includes node_name field
- build_local_state() passes server name to sync responses
- update_node_state() stores peer's announced name on the FederatedNode
- Names propagate automatically during federation.sync-state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 19:19:13 +00:00
Dorian
86df0bcaf2 fix: LND Connect bulletproof — CORS, credentials, memory limits, restart policy
Ensures LND Connect works through every deployment path:
- Nginx: CORS $http_origin on /lnd-connect-info (both HTTP+HTTPS)
- Nginx: no cookie gate (backend is 127.0.0.1-only)
- LND UI source: fetch with credentials: 'include'
- Deploy: rebuilds LND UI with --no-cache every deploy
- First-boot: --restart unless-stopped + memory limits on UI containers
- Backend: bound to 127.0.0.1:5678 in systemd service

Root cause was CORS: LND UI on :8081 fetching :80 is cross-origin.
Browser blocked reading the 200 response without CORS headers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 17:17:14 +00:00
Dorian
f20f0650cf feat: Discover view, Fleet dashboard, MeshMap, type fixes
- New Discover.vue (app store redesign)
- Fleet.vue dashboard for .228
- MeshMap.vue component
- Fixed Discover.vue type errors (unused var, type predicate)
- Various UI updates (Apps, Dashboard, Marketplace, Mesh, Web5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:12:01 +00:00