23 Commits

Author SHA1 Message Date
archipelago
45ac9be965 fix(kiosk): cap chromium resources + drop GPU rasterization when headless (#36)
The kiosk chromium pinned ~92% of a core (software-compositing spin from
--enable-gpu-rasterization on a GPU-less/headless node), saturating the machine
and starving the backend + container builds — it caused the .198 receive timeout
and the deploy storms.

- archipelago-kiosk.service: CPUQuota=75% + MemoryMax/High + Delegate, so a
  runaway kiosk can never take the whole node down.
- archipelago-kiosk-launcher.sh: detect /dev/dri — use GPU rasterization only
  when a GPU exists, else --disable-gpu (avoids the headless spin).
- bootstrap::ensure_kiosk_hardened: OTA self-heal that installs the updated
  unit+launcher on already-deployed nodes, daemon-reloads, and only try-restarts
  a *running* kiosk (never re-enables an operator-disabled one).

cargo check clean; launcher bash -n clean; unit syntax valid.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 11:10:26 -04:00
archipelago
34b1fdc1a3 fix(boot): order archipelago.service after the data volume mount (B17)
On production nodes /var/lib/archipelago (the app data dir AND podman's
graphroot=/var/lib/archipelago/containers/storage) is a separate
device-mapper volume. archipelago.service ordered only After=network-online
.target, so on cold boots it (and its ExecStartPre) could start BEFORE
var-lib-archipelago.mount, write to the bare mountpoint on rootfs, fail every
podman call, exit, and be restarted every 5s until the volume mounted — the
"~20x [FAILED] Failed to start over ~5min" boot flap. Proven live on .198:
"var-lib-archipelago.mount: Directory /var/lib/archipelago to mount over is
not empty, mounting anyway" — the service had written there pre-mount.

Fix: RequiresMountsFor=/var/lib/archipelago (adds Requires= + After= on the
mount unit).
- image-recipe/configs/archipelago.service: ships the directive on fresh ISOs.
- bootstrap::ensure_archipelago_mount_ordering(): self-heals already-deployed
  nodes' installed unit + daemon-reload (boot-ordering only, effective next
  reboot; never restarts the running service). Idempotent; harmless on rootfs
  installs (maps to the always-mounted root).

Verified on .198: after applying, systemctl shows After=var-lib-archipelago
.mount and systemd-analyze verify is clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 03:33:29 -04:00
archipelago
2943fd0c5e style(core): cargo fmt (B1/B3/B13 follow-up — satisfy release fmt gate) 2026-06-16 03:09:18 -04:00
archipelago
987a961f4a fix(nginx): self-heal fedimint asset rewrite on deployed nodes — HTTP + HTTPS (B13)
The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their
old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the
Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")).
Those resolve against the SPA root: bg-network.jpg exists there by luck, but
app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the
visibly-broken icon.

bootstrap.rs::patch_nginx_conf now heals both paths on startup:
- Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail
  for the full reroot set; byte-matches the shipped template.
- Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no
  sub_filter and a per-node-varying trailing directive, so anchor on the unique
  :8175 proxy_pass and insert the reroot set after it (nginx ignores directive
  order). Snippet added to the bootstrap nginx loop (skipped on HTTP-only nodes).

missing_* flags are now gated on their splice anchors so the included snippet
neither attempts the main-conf-only patches nor logs warn-skips every boot.
Idempotent via the 'href="/' 'href="/app/fedimint/' marker.

Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t
OK; containers survived restart (Quadlet); idempotent steady state, no warn spam.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 18:03:04 -04:00
archipelago
602b9cd3df fix(nginx): route /api/peer-content/* to the backend for B3 streaming
The B3 streaming proxy endpoint existed in the backend but nginx had no
location for /api/peer-content/*, so the browser's requests fell through to
the SPA (200 text/html) and media still wouldn't play. Add an
NGINX_PEER_CONTENT_BLOCK that bootstrap patches into every server block
(forwards Cookie for session auth + Range, proxy_buffering off). Idempotent;
covers fresh-ISO nodes too since bootstrap runs on every startup.

Verified on .198: after restart the async nginx patch lands and
/api/peer-content/<onion>/<id> returns 401 (reaches backend, auth-gated)
instead of the SPA; nginx block present in both server blocks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:07:39 -04:00
archipelago
1db720af13 fix(lnd): repair fleet-wide CORS on LND connect-wallet endpoints (B5)
The LND wallet UI (served on its own app port) fetches /lnd-connect-info
and /proxy/lnd/* cross-origin, so both need correct CORS headers.

(a) Older nginx configs add their own Access-Control-Allow-Origin in the
    /lnd-connect-info location on top of the one the backend sets, yielding
    a DUPLICATE header that browsers reject ("multiple values"). bootstrap
    now strips that redundant nginx add_header (backend owns CORS).
(b) /proxy/lnd/* returned a 401 with no CORS headers when the session
    check failed, so the browser saw an opaque CORS error instead of a
    readable 401. Add unauthorized_cors() and use it on that path.

Adds tests/production-quality/ (bug tracker + lnd-cors-test.sh harness).
Verified: harness 4/4 on .116, .198, .103.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:31:14 -04:00
archipelago
95f9a805b1 feat(fips): connect to public mesh anchor over TCP + wire daemon updates
The whole fleet was silently never reaching the FIPS mesh: the default
public anchor was configured as fips.v0l.io:8668/udp, but the anchor only
answers on TCP/8443. Fix the default to 185.18.221.160:8443/tcp (IPv4
literal — the hostname resolves IPv6-first and the daemon binds v4-only,
which fails the handshake with EAFNOSUPPORT), and auto-seed it in
anchors::load() so every node dials it without operator action (removal
still persists). Proven live on .116: cold start → anchor_connected in
~400ms, anchor became mesh parent.

Wire fips::update::apply() against upstream GitHub releases (stable
channel only): resolve /releases/latest → SHA256-verify the .deb against
checksums-linux.txt → install → restart. dpkg runs via `systemd-run` to
escape archipelago's ProtectSystem=strict sandbox (else /var/lib/dpkg is
read-only), with --force-confold (archipelago manages /etc/fips conffiles)
and --force-downgrade (dev builds sort newer than the stable tag).
Validated live: .116 upgraded 0.3.0-dev -> stable v0.3.0.

Also: standalone fips-ui dashboard app (apps/fips-ui + docker/fips-ui,
static nginx proxying /rpc/v1 same-origin, copiable own-anchor address);
reserve UI port 8336; register fips/fips-ui as platform-managed. Includes
the Lightning wallet cross-origin (CORS) + LND proxy auth + nginx
self-healer fix so the wallet screen connects instead of "failed to fetch".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:41:48 -04:00
archipelago
e05e356d64 chore: release v1.7.58-alpha 2026-05-17 18:40:50 -04:00
Dorian
b8053c00ca fix: clear stale health notifications 2026-05-14 08:57:54 -04:00
Dorian
be50dc3235 fix: avoid bootstrap bitcoin restarts 2026-05-14 00:03:16 -04:00
archipelago
c0751e2551 chore(release): stage v1.7.54-alpha 2026-05-06 09:23:57 -04:00
archipelago
1a0d8a432c chore(release): stage v1.7.53-alpha 2026-05-05 13:59:50 -04:00
archipelago
745cb1c626 chore(release): stage v1.7.52-alpha 2026-05-05 11:29:18 -04:00
archipelago
c55a4f4e86 test(bootstrap): regression gate for the heal_podman_state socket bug
Extracted the heal_podman_state cleanup list as a module-level
HEAL_RUNTIME_SUBDIRS const so a unit test can structurally enforce
the invariant: the list must contain "containers" + "libpod" but
must NOT contain "podman" (which holds systemd's podman.sock
listener and was the bug fixed in commit bb421803).

If anyone re-adds "podman" — accidentally, by reverting, or by
copy-paste from old plan memory — this test fires before we ship,
not on the next deploy when it nukes the orchestrator's HTTP path.

Total tests: 614 → 615.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:32:59 -04:00
archipelago
5be2febe13 fix(bootstrap): don't nuke podman socket dir during runtime self-heal
Observed live on .198: heal_podman_state was removing
$XDG_RUNTIME_DIR/podman/ alongside containers/ and libpod/. That dir
holds the systemd-bound podman.sock — the listener systemd creates for
socket-activated podman.service. Removing it broke every libpod HTTP
call from the orchestrator until `systemctl --user restart
podman.socket` ran. Far worse than any wedge it was trying to repair.

Drop podman/ from the cleanup list. The runtime state we actually want
to clean for FM6 (bolt_state.db drift) lives in containers/ and
libpod/ only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:57:15 -04:00
archipelago
8f13298805 fix(bootstrap): self-heal wedged podman runtime state at startup
Closes FM6 (podman bolt_state.db / runtime drift) — observed live on
.198 today: bitcoind was running for several minutes, but podman's
state DB reported the container as Exited. The reconciler then tried
to "restart" it, racing the still-bound port 8332 and failing in a
loop.

heal_podman_state() runs as the last bootstrap stage, BEFORE the
orchestrator's reconcile loop ticks. It probes `podman info` with a
5s timeout; on failure it removes the runtime-state dirs under
$XDG_RUNTIME_DIR and re-probes. Persistent storage under
~/.local/share/containers/storage/ is never touched, so containers
re-discover from manifests on next call.

Cleanup never includes `podman system reset` or `system renumber` —
those are destructive and must stay operator-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:23:36 -04:00
archipelago
2bf8181110 refactor(security): tighten capability + TLS-bypass surface
Three small, focused tightenings:

- core/container/src/podman_client.rs: drop the legacy Hetzner
  23.182.128.160:3000 mirror from image_uses_insecure_registry().
  It was decommissioned in v1.7.x and is stripped from active
  registry config at load time; leaving it in the bypass list let
  a stale config still skip TLS. Replace the inline match with a
  named INSECURE_REGISTRY_HOSTS slice so future entries are one
  line. Test now also pins the spoofing-immune semantics
  ("evil.example/146.59.87.168:3000/x" must NOT match).

- core/archipelago/src/api/rpc/package/config.rs: split bitcoin
  from lnd in get_app_capabilities(). bitcoind never opens raw
  sockets — drop CAP_NET_RAW from bitcoin/bitcoin-core/bitcoin-knots.
  lnd/fedimint/fedimint-gateway keep it because they enumerate
  network interfaces during cert generation.

- core/archipelago/src/bootstrap.rs: tighten_secrets_dir()
  enforces 0700 on /var/lib/archipelago/secrets and 0600 on every
  file inside on each startup. The dir-mode is the load-bearing
  isolation boundary against rootless container escapes (their UID
  maps to >=100000, can't traverse uid=1000/0700). The per-file
  sweep is defense-in-depth against any installer that wrote 0644.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:59:11 -04:00
archipelago
0684491072 chore: baseline codex hardening before lifecycle refactor
Snapshots the in-flight hardening work so subsequent reconcile/Quadlet
phases land on a clean before/after diff.

Changes:
- core/container/src/podman_client.rs: image_uses_insecure_registry()
  whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner
  (23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts
  custom networks into the Networks map so containers can join them.
- core/archipelago/src/container/prod_orchestrator.rs:
  ensure_container_network() creates per-manifest networks on demand;
  apply_data_uid() now goes through host_sudo for mkdir -p + chown so
  bind-mount roots get created and chowned without password prompts.
- core/archipelago/src/api/rpc/package/{install,update,stacks}.rs:
  podman pull adds --tls-verify=false only for whitelisted registries.
- core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd
  override on startup (live nodes carried it from old installers).
- core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod
  binaries — it had been silently rerouting volumes to /tmp.
- apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime
  so image-layout differences don't break entrypoint.
- scripts/app-catalog-image-smoke-test.py: production catalog/image
  smoke test that probes a target node before users click Install.
- .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak.

Removes filebrowser.rs.bak and two stale catalog.json.bak files
(verified identical to live counterparts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:52:29 -04:00
archipelago
05e6c2e738 fix: release v1.7.51-alpha install hardening 2026-05-01 05:02:39 -04:00
archipelago
be9f9528c3 fix: release v1.7.50-alpha OTA runtime repair 2026-05-01 03:14:07 -04:00
archipelago
8f83b37d51 feat(orchestrator): complete container migration and release hardening 2026-04-28 15:00:58 -04:00
Dorian
cfc98c600e release(v1.7.37-alpha): bitcoin-core install fixes + dynamic node UI + full-archive default
Install flow
- api/rpc/package/install.rs: always append the literal image URL as a
  last-resort pull candidate in do_pull_image, so images not carried by
  any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install
  instead of masquerading as a generic pull failure across every mirror.
- api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat
  error, not just "file exists". Once bitcoin-knots' first-boot chowns
  /var/lib/archipelago/bitcoin into the container's user namespace (700
  perms, UID 100100/100101), the archipelago daemon can't even traverse
  in — try_exists returns Err which unwrap_or(false) treated as "not
  present" and drove a doomed write. Now errors out of the directory
  traversal are treated as "conf already owned by container user" and
  the write is skipped. Mirrors the lnd.conf pattern.
- api/rpc/package/install.rs: drop the hardcoded `prune=550` from the
  conf default. Operators with multi-TB drives shouldn't be silently
  pruned; users who want a pruned node can set it in bitcoin.conf
  themselves. Full archive is the only honest default.
- api/rpc/package/config.rs: bitcoin-core now passes explicit
  -server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI
  args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and
  reads conf + argv only; without these the RPC listens on 127.0.0.1
  inside the container and rootlessport can't reach it, so the
  bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call.
  Bitcoin Knots keeps its own entrypoint-driven defaults.
- container/docker_packages.rs: split bitcoin-core out of the shared
  AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with
  bitcoin-core.svg and a Reference-implementation description; the
  bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home
  card showing "Bitcoin Knots" for a Core install.

Bitcoin node UI (docker/bitcoin-ui)
- index.html: impl name/tagline/logo now dynamic. applyImplBranding()
  reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves
  to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both
  get their own icon and subtitle. Settings modal replaced its
  hardcoded Regtest/txindex=1/port-18443 placeholders with live values
  from getblockchaininfo + getindexinfo + getzmqnotifications.
- index.html: new Storage info card (Full Archive · X GB /
  Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on
  the main dashboard, same level as Network. Settings modal mirrors it
  with the prune height when applicable.
- Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the
  bg-network.jpg used by the dashboard are now COPY'd into the image
  under /usr/share/nginx/html/assets. Previously the <img src> pointed
  at paths that 404'd into the SPA fallback and the onerror handler
  hid the broken logo silently.

Frontend
- appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334),
  HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core).
  Without these the AppSessionFrame showed "No URL found for
  bitcoin-core" and the home/app-list title fell through to the raw id.
- settings/AccountInfoSection.vue: backfill What's New entries for
  v1.7.31 through v1.7.37 that had been missed in earlier cuts.

Release plumbing
- releases/v1.7.37-alpha/: binary + frontend tarball.
- releases/manifest.json: v1.7.37-alpha, sha256/size refreshed.
- Cargo.toml / package.json: version bumps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:03:47 -04:00
Dorian
a7048f6d8e release(v1.7.35-alpha): rootless-netns self-heal + app update button + bitcoin-core 28.4 + Node DID unification
- core/archipelago/src/bootstrap.rs (NEW): embed scripts/container-doctor.sh
  and image-recipe/configs/archipelago-doctor.{service,timer} via
  include_str! and sync to disk + enable the timer on every archipelago
  startup. Idempotent (content-hash compare), dev-box symlink guard keeps
  the git checkout untouched, best-effort (warn-only on failure) so
  bootstrap never blocks server readiness. Wired in main.rs as a
  background tokio task.
- scripts/container-doctor.sh: add fix_rootless_netns_egress(). Detects
  when the rootless-netns has lost its pasta tap (container-to-container
  still works but outbound DNS/TCP fails) via an nsenter probe into
  aardvark-dns; with a two-probe 10s debounce to rule out transients and
  a host-precheck that bails out if the host itself is offline. When the
  rootless-netns is truly broken, does a graceful podman stop --all /
  start --all so pasta + aardvark-dns rebuild the netns from scratch.
  Bitcoin-knots and every other outbound container recover in one cycle.
- core/archipelago/src/update.rs: host_sudo → pub(crate) so bootstrap.rs
  can reuse the existing systemd-run escape hatch.
- apps/bitcoin-core/manifest.yml: bump app version 24.0.0 → 28.4.0 and
  image bitcoin/bitcoin:24.0 → bitcoin/bitcoin:28.4. Resources aligned
  with the real container-specs.sh large-disk tune (4 GiB memory cap,
  cpu_limit: 0 so bitcoind can run -par=auto across every core).
- neode-ui/src/views/apps/AppCard.vue + Apps.vue: add an Update button
  + Updating spinner to every app card that has available-update set.
  Wires through serverStore.updatePackage(id) — the same RPC the detail
  view already calls. common.update / common.updating i18n keys added in
  en.json and es.json.
- core/archipelago/src/identity_manager.rs: add create_from_signing_key()
  that mirrors an existing Ed25519 key as a manager-level identity with
  a deterministic id (`node-<pubkey16>`). Idempotent across restarts,
  gets the hex-SVG master avatar.
- core/archipelago/src/server.rs: the auto-create path on first boot now
  mirrors the node's own signing_key (seed-derived on onboarded installs)
  as a "Node" identity instead of generating a random "Default" keypair.
  Once this ships, the DID on the Web5 DID Status card (via node.did
  RPC), the Node entry on the Identities page (via identity.list), and
  the DID used for peer-to-peer connects (via server_info.pubkey) all
  resolve to the same seed-derived pubkey.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 08:29:56 -04:00