13 Commits

Author SHA1 Message Date
archipelago
2a26576dbd fix(specs): measure DISK_GB at /var/lib/archipelago, not /
The reconcile spec for bitcoin-knots auto-enables prune=550 when
DISK_GB < 1000. DISK_GB was measured via `df /`, which on every
archy install reports the ~30 GB OS partition because user data
lives on a separate encrypted /var/lib/archipelago volume.

Result: every archy node with a 2 TB data drive was silently being
configured as a pruned node, and any bitcoin-knots container
recreated by reconcile would delete its historical blocks down to
the 550 MB prune window on next start.

Observed on .228 (2 TB box): blocks dir went from 384 GB to 926 MB
after a reconcile-triggered restart. Historical archive unrecoverable
without full re-IBD from genesis.

Fix: check /var/lib/archipelago first (where bitcoin data actually
lives). Fall back to / only on first-boot before the data partition
is mounted.
2026-04-23 09:54:16 -04:00
Dorian
56d4875b35 fix(vpn,reconcile): restore WG peers on boot + filebrowser spec drift
Follow-up to 8b7cb002 (no version bump — same v1.7.0-alpha manifest):

* WireGuard peer persistence. Kernel peer state is ephemeral; the add-peer
  RPC wrote each peer to data_dir/nostr-vpn/peers/*.json but nothing
  re-pushed them on reboot. Result on .198: wg0 came up listening with zero
  peers after last night's reboot. Added vpn::restore_wg_peers() — reads
  the peers dir, waits up to 30s for wg0 to exist, then replays each via
  `archipelago-wg add-peer`. Spawned from main.rs alongside the other
  startup tasks.
* Reconcile + filebrowser drift. scripts/container-specs.sh load_spec_
  filebrowser now declares SPEC_NETWORK="archy-net" (to match what
  first-boot-containers.sh creates) and pins the filebrowser-data volume
  + wget-style healthcheck so the reconciler stops reporting network
  drift. Without this, reconcile would kill the healthy first-boot
  filebrowser container and recreate it on bridge, breaking the archy-net
  DNS name the backend proxies to.

Manifest binary sha/size refreshed:
  6c178a76…3582cc, 40361912 bytes.
Rebuilt ISO at image-recipe/results/archipelago-installer-unbundled-x86_64.iso
(Apr 20 07:10) carries both fixes baked in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 07:10:49 -04:00
Dorian
8b7cb0029f release(v1.7.0-alpha): bump + fix git-method update + reconciler creates
Two fixes bundled into the OTA:

1. update.download hard-fail on git-path nodes. handle_update_check's git
   branch reported update_available=true + update_method="git" but never
   populated state.available_update, so update.download returned "No update
   available to download" even though the UI showed one. SystemUpdate.vue
   now routes update_method=="git" through update.git-apply (pull+rebuild+
   restart via self-update.sh); manifest-path nodes keep the download→apply
   flow. i18n strings + confirm modal added for the git path.

2. Reconciler creating containers behind the user's back. On fresh
   unbundled installs (.198, .253) archy-mempool-db and archy-btcpay-db
   materialised ~10 min after first boot because reconcile-containers.sh
   walked container-specs.sh's canonical tier list and created any
   "missing" container. reset_spec() now defaults SPEC_OPTIONAL="true",
   so reconcile is strictly a repair tool — baseline comes from
   first-boot-containers.sh (filebrowser on unbundled), everything else
   from the install RPC.

Also forces OTA trigger for nodes on 1.6.0-alpha that otherwise saw
"I'm at manifest.version, nothing to do" and skipped the refreshed 1.6
artifacts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 06:22:29 -04:00
Dorian
ff5ef2951f feat: dynamic app catalog, Gitea app polish, registry sync
App catalog served from Gitea repos (app-catalog) with 35 apps.
Nodes fetch catalog dynamically — new apps appear without frontend
rebuild. Test app added and removed to verify pipeline.

Gitea manifest updated with internal_port/nginx_proxy for iframe.
Updated catalog.json, nginx configs, app session configs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 08:20:18 -04:00
Dorian
f353c91e61 feat: botfights, discover, mobile gamepad, content handler, package config updates
Miscellaneous improvements: botfights manifest, discover page curated
apps, mobile gamepad enhancements, content HTTP handler, package
install config updates, health monitor tweaks, shared content UI,
container specs and image version updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 23:11:41 -04:00
Dorian
6cad154028 fix: add NET_RAW capability to LND container for TLS cert generation
LND crashes with "netlinkrib: address family not supported by protocol"
in rootless podman because it needs NET_RAW to enumerate network
interfaces during TLS certificate generation. Added to capabilities
in config.rs, first-boot-containers.sh, and container-specs.sh.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 23:19:09 +01:00
Dorian
31e87d98c1 fix: bulletproof first-boot container creation and install reliability
Remove the Bitcoin RPC 60-second gate that blocked 13+ dependent containers
(mempool, electrumx, btcpay, lnd, fedimint) from being created on first boot.
Containers now always get created and auto-restart via health monitor once
Bitcoin becomes responsive — the designed recovery path.

Additional hardening:
- Validate archy-net creation with retry (silent failure broke DNS)
- Verify critical images are loaded, re-load from tarballs if missing
- Create SearXNG settings.yml before container start (was missing)
- Run reconciler automatically after first-boot failures
- Add load-images as explicit systemd dependency with 900s timeout
- Propagate config write errors in install.rs (bitcoin.conf, lnd.conf)
- FileBrowser password change: retry loop (6 attempts) + 0o600 perms
- Post-start verification: detect containers that exit immediately
- Add 2s dependency waits between container starts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 18:31:00 +01:00
Dorian
768ca26e90 fix: add required capabilities to UI container specs for nginx startup
Nginx needs CHOWN, SETUID, SETGID to chown cache directories and drop
privileges on startup. LND UI additionally needs NET_BIND_SERVICE to
bind port 80 inside the container. Without these, cap-drop ALL causes
nginx to crash with "Operation not permitted" on chown or bind.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 23:03:27 +01:00
Dorian
e8c363e4f5 fix: escape quotes in electrumx health check for eval pass-through
The health check command goes through multiple shell layers
(assignment → variable expansion → eval → podman → sh -c). Inner
double quotes need \\\" escaping to survive as literal " in Python.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:29:28 +01:00
Dorian
dd0c3982b0 fix: use python3 socket health check for electrumx (no curl in image)
The electrumx container image doesn't include curl. Replace the HTTP
health check with a Python socket connection test to the RPC port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:19:46 +01:00
Dorian
bc6b4e0bec fix: add DAC_OVERRIDE cap for rootless volume access, fix LND health check
- electrumx: add DAC_OVERRIDE to SPEC_CAPS — rootless podman maps container
  UID 0 to host UID 1000, but volumes are owned by host UID 100000; without
  DAC_OVERRIDE the container can't write to its own data directory
- lnd: replace curl-based health check with lncli using readonly macaroon —
  the REST API requires macaroon auth, so unauthenticated curl always fails
- grafana: add DAC_OVERRIDE to SPEC_CAPS for the same rootless volume issue

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 22:14:01 +01:00
Dorian
ae13c0dad2 feat: migrate all container images to Archipelago app registry
All container image references now pull from 80.71.235.15:3000/archipelago/
instead of Docker Hub and ghcr.io. image-versions.sh is the single source
of truth; all scripts use $*_IMAGE variables instead of hardcoded refs.

Files updated:
- scripts/image-versions.sh: central ARCHY_REGISTRY variable
- core/*/config.rs: registry whitelist includes app registry
- core/*/stacks.rs: Immich + Penpot stack images
- scripts/{first-boot,deploy-to-target,container-specs}.sh: use variables
- docker/*/Dockerfile: nginx base image from registry
- image-recipe/: ISO build, podman config, menu script
- scripts/{container-doctor,deploy-bitcoin-knots,fix-indeedhub,validate-app-manifest}.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 14:06:21 +00:00
Dorian
35f1aa2e13 fix: move mobile nav outside main for fixed positioning, add container scripts
- Dashboard.vue: move DashboardMobileNav outside <main> so position:fixed
  isn't broken by will-change:transform on the perspective container
- Add container-specs.sh and reconcile-containers.sh utility scripts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 18:13:22 +00:00