The kiosk Chromium pinned ~92% of a core (a software-compositing spin caused by
--enable-gpu-rasterization on a GPU-less/headless node), saturating the machine
and starving the backend and container builds. This caused the .198 receive
timeout and the deploy storms.
- archipelago-kiosk.service: CPUQuota=75% + MemoryMax/High + Delegate, so a
runaway kiosk can never take the whole node down.
- archipelago-kiosk-launcher.sh: detect /dev/dri — use GPU rasterization only
when a GPU exists, else --disable-gpu (avoids the headless spin).
- bootstrap::ensure_kiosk_hardened: OTA self-heal that installs the updated
unit+launcher on already-deployed nodes, daemon-reloads, and only try-restarts
a *running* kiosk (never re-enables an operator-disabled one).
cargo check clean; launcher bash -n clean; unit syntax valid.
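
The launcher's GPU decision can be sketched as follows. The real launcher is a
shell script; this Rust version only illustrates the branch, and the function
name kiosk_gpu_flags is hypothetical. It assumes that the presence of the
/dev/dri directory is the whole test.

```rust
use std::path::Path;

/// Picks Chromium GPU flags for the kiosk (hypothetical helper).
/// GPU rasterization is requested only when the DRI device directory exists;
/// otherwise the GPU is disabled to avoid the software-compositing spin.
pub fn kiosk_gpu_flags(dri_dir: &Path) -> &'static [&'static str] {
    if dri_dir.is_dir() {
        &["--enable-gpu-rasterization"]
    } else {
        &["--disable-gpu"]
    }
}
```

A caller would pass Path::new("/dev/dri") and append the returned flags to the
Chromium command line.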
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On production nodes /var/lib/archipelago (the app data dir AND podman's
graphroot=/var/lib/archipelago/containers/storage) is a separate
device-mapper volume. archipelago.service was ordered only with
After=network-online.target, so on cold boots it (and its ExecStartPre) could
start BEFORE var-lib-archipelago.mount, write to the bare mountpoint on rootfs,
fail every podman call, exit, and be restarted every 5s until the volume
mounted. That is the "~20x [FAILED] Failed to start over ~5min" boot flap.
Proven live on .198:
"var-lib-archipelago.mount: Directory /var/lib/archipelago to mount over is
not empty, mounting anyway" — the service had written there pre-mount.
Fix: RequiresMountsFor=/var/lib/archipelago (adds Requires= + After= on the
mount unit).
- image-recipe/configs/archipelago.service: ships the directive on fresh ISOs.
- bootstrap::ensure_archipelago_mount_ordering(): self-heals already-deployed
nodes' installed unit + daemon-reload (boot-ordering only, effective next
reboot; never restarts the running service). Idempotent; harmless on rootfs
installs (maps to the always-mounted root).
Verified on .198: after applying, systemctl shows
After=var-lib-archipelago.mount and systemd-analyze verify is clean.
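
A minimal sketch of the idempotent unit patch, assuming the self-heal works on
the unit file's text and places the directive directly under the [Unit]
header. The function name add_mount_ordering is hypothetical; the real
ensure_archipelago_mount_ordering() may differ.

```rust
const DIRECTIVE: &str = "RequiresMountsFor=/var/lib/archipelago";

/// Returns the unit text with DIRECTIVE added right after the `[Unit]` header.
/// Returns `None` when the directive is already present (nothing to do) or
/// when no `[Unit]` section exists (nothing safe to patch).
pub fn add_mount_ordering(unit: &str) -> Option<String> {
    if unit.lines().any(|l| l.trim() == DIRECTIVE) {
        return None; // already healed
    }
    let mut out = String::with_capacity(unit.len() + DIRECTIVE.len() + 1);
    let mut inserted = false;
    for line in unit.lines() {
        out.push_str(line);
        out.push('\n');
        if !inserted && line.trim() == "[Unit]" {
            out.push_str(DIRECTIVE);
            out.push('\n');
            inserted = true;
        }
    }
    inserted.then_some(out)
}
```

Returning None in the already-patched case lets the caller skip both the write
and the daemon-reload on steady-state boots.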
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B13 template fix only fixed fresh ISOs. Already-deployed nodes keep their
old nginx config, where /app/fedimint/ proxies to :8175 without rewriting the
Guardian UI's root-rooted asset URLs (src="/assets/...", url("/assets/...")).
Those resolve against the SPA root: bg-network.jpg exists there by luck, but
app-icons/fedimint.jpg 404s (location /assets/ uses try_files =404) — the
visibly-broken icon.
bootstrap.rs::patch_nginx_conf now heals both paths on startup:
- Style A (main conf, HTTP): swaps the old single nostr-provider sub_filter tail
for the full reroot set; byte-matches the shipped template.
- Style B (HTTPS app-proxy snippet): the snippet's fedimint block has no
sub_filter and a per-node-varying trailing directive, so anchor on the unique
:8175 proxy_pass and insert the reroot set after it (directive order inside the
location does not matter to nginx). Snippet added to the bootstrap nginx loop
(skipped on HTTP-only nodes).
missing_* flags are now gated on their splice anchors so the included snippet
neither attempts the main-conf-only patches nor logs warn-skips every boot.
Idempotent: an existing 'href="/' 'href="/app/fedimint/' rewrite marks the conf
as already patched.
Verified on .198 (both paths): fedimint app-icon 404 -> 200 image/jpeg; nginx -t
OK; containers survived restart (Quadlet); idempotent steady state, no warn spam.
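
The Style B splice can be sketched like this. It is an illustration only: the
helper name insert_after_anchor is hypothetical, and the single sub_filter line
in the usage note is an assumed member of the reroot set, inferred from the
marker above.

```rust
/// Inserts `lines` after the first line containing `anchor`, reusing that
/// line's indentation. Returns `None` if `marker` is already present
/// (already patched) or if the anchor is missing (nothing to patch).
pub fn insert_after_anchor(
    conf: &str,
    anchor: &str,
    marker: &str,
    lines: &[&str],
) -> Option<String> {
    if conf.contains(marker) {
        return None;
    }
    let mut out = String::new();
    let mut done = false;
    for line in conf.lines() {
        out.push_str(line);
        out.push('\n');
        if !done && line.contains(anchor) {
            let indent: String = line.chars().take_while(|c| c.is_whitespace()).collect();
            for extra in lines {
                out.push_str(&indent);
                out.push_str(extra);
                out.push('\n');
            }
            done = true;
        }
    }
    done.then_some(out)
}
```

With anchor ":8175", marker 'href="/' 'href="/app/fedimint/' and a lines slice
holding sub_filter 'href="/' 'href="/app/fedimint/'; the first call patches
the snippet and every later call returns None, which matches the "no warn
spam" steady state.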
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The B3 streaming proxy endpoint existed in the backend but nginx had no
location for /api/peer-content/*, so the browser's requests fell through to
the SPA (200 text/html) and media still wouldn't play. Add an
NGINX_PEER_CONTENT_BLOCK that bootstrap patches into every server block
(forwards Cookie for session auth + Range, proxy_buffering off). Idempotent;
covers fresh-ISO nodes too since bootstrap runs on every startup.
Verified on .198: after restart the async nginx patch lands and
/api/peer-content/<onion>/<id> returns 401 (reaches backend, auth-gated)
instead of the SPA; nginx block present in both server blocks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The LND wallet UI (served on its own app port) fetches /lnd-connect-info
and /proxy/lnd/* cross-origin, so both need correct CORS headers.
(a) Older nginx configs add their own Access-Control-Allow-Origin in the
/lnd-connect-info location on top of the one the backend sets, yielding
a DUPLICATE header that browsers reject ("multiple values"). bootstrap
now strips that redundant nginx add_header (backend owns CORS).
(b) /proxy/lnd/* returned a 401 with no CORS headers when the session
check failed, so the browser saw an opaque CORS error instead of a
readable 401. Add unauthorized_cors() and use it on that path.
Adds tests/production-quality/ (bug tracker + lnd-cors-test.sh harness).
Verified: harness 4/4 on .116, .198, .103.
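
A rough sketch of what a CORS-aware 401 carries. SimpleResponse is a
hypothetical stand-in for the backend's real response type, and echoing the
request origin with credentials enabled is an assumption about how the session
cookie is sent; the actual unauthorized_cors() signature is not shown here.

```rust
/// Hypothetical stand-in for a framework response: status plus headers.
pub struct SimpleResponse {
    pub status: u16,
    pub headers: Vec<(&'static str, String)>,
}

/// Builds a 401 that the browser can read cross-origin.
/// Exactly one Access-Control-Allow-Origin value is emitted, because browsers
/// reject responses that carry the header more than once.
pub fn unauthorized_cors(origin: &str) -> SimpleResponse {
    SimpleResponse {
        status: 401,
        headers: vec![
            ("Access-Control-Allow-Origin", origin.to_string()),
            // Assumption: the wallet UI sends the session cookie, which
            // requires a concrete origin above rather than "*".
            ("Access-Control-Allow-Credentials", "true".to_string()),
            ("Vary", "Origin".to_string()),
        ],
    }
}
```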
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The whole fleet was silently never reaching the FIPS mesh: the default
public anchor was configured as fips.v0l.io:8668/udp, but the anchor only
answers on TCP/8443. Fix the default to 185.18.221.160:8443/tcp (IPv4
literal — the hostname resolves IPv6-first and the daemon binds v4-only,
which fails the handshake with EAFNOSUPPORT), and auto-seed it in
anchors::load() so every node dials it without operator action (removal
still persists). Proven live on .116: cold start → anchor_connected in
~400ms, anchor became mesh parent.
Wire fips::update::apply() against upstream GitHub releases (stable
channel only): resolve /releases/latest → SHA256-verify the .deb against
checksums-linux.txt → install → restart. dpkg runs via `systemd-run` to
escape archipelago's ProtectSystem=strict sandbox (else /var/lib/dpkg is
read-only), with --force-confold (archipelago manages /etc/fips conffiles)
and --force-downgrade (dev builds sort newer than the stable tag).
Validated live: .116 upgraded 0.3.0-dev -> stable v0.3.0.
Also: standalone fips-ui dashboard app (apps/fips-ui + docker/fips-ui,
static nginx proxying /rpc/v1 same-origin, copiable own-anchor address);
reserve UI port 8336; register fips/fips-ui as platform-managed. Includes
the Lightning wallet cross-origin (CORS) + LND proxy auth + nginx
self-healer fix so the wallet screen connects instead of "failed to fetch".
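
The checksum lookup step of the update flow might look like the sketch below.
It assumes checksums-linux.txt uses the sha256sum layout (hex digest,
whitespace, file name); the function names are hypothetical, and computing the
digest of the downloaded .deb is left to whatever SHA-256 implementation the
project already uses.

```rust
/// Looks up the expected SHA-256 hex digest for `file_name` in
/// sha256sum-style text: `<hex digest>  <file name>` per line, where a
/// leading `*` on the name marks binary mode. Malformed digests are ignored.
pub fn expected_sha256<'a>(checksums: &'a str, file_name: &str) -> Option<&'a str> {
    checksums.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        let digest = parts.next()?;
        let name = parts.next()?.trim_start_matches('*');
        let well_formed = digest.len() == 64 && digest.bytes().all(|b| b.is_ascii_hexdigit());
        (well_formed && name == file_name).then_some(digest)
    })
}

/// Compares a computed digest with the expected one, ignoring hex case.
pub fn digest_matches(expected: &str, actual: &str) -> bool {
    expected.eq_ignore_ascii_case(actual)
}
```

Install should proceed only when expected_sha256 returns Some and
digest_matches is true; a missing entry is treated the same as a mismatch.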
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extracted the heal_podman_state cleanup list as a module-level
HEAL_RUNTIME_SUBDIRS const so a unit test can structurally enforce
the invariant: the list must contain "containers" + "libpod" but
must NOT contain "podman" (which holds systemd's podman.sock
listener and was the bug fixed in commit bb421803).
If anyone re-adds "podman" — accidentally, by reverting, or by
copy-paste from old plan memory — this test fires before we ship,
not on the next deploy when it nukes the orchestrator's HTTP path.
Total tests: 614 → 615.
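
The invariant described above could be pinned roughly like this. The const's
exact contents are assumed to be the two directories named in this message,
and the test name is hypothetical.

```rust
/// Subdirectories of $XDG_RUNTIME_DIR that the heal step may remove.
/// "podman" is deliberately absent: it holds the systemd-created podman.sock.
pub const HEAL_RUNTIME_SUBDIRS: &[&str] = &["containers", "libpod"];

#[test]
fn heal_list_excludes_podman_socket_dir() {
    assert!(HEAL_RUNTIME_SUBDIRS.contains(&"containers"));
    assert!(HEAL_RUNTIME_SUBDIRS.contains(&"libpod"));
    assert!(
        !HEAL_RUNTIME_SUBDIRS.contains(&"podman"),
        "removing podman/ deletes the socket-activated podman.sock listener"
    );
}
```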
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observed live on .198: heal_podman_state was removing
$XDG_RUNTIME_DIR/podman/ alongside containers/ and libpod/. That dir
holds the systemd-bound podman.sock — the listener systemd creates for
socket-activated podman.service. Removing it broke every libpod HTTP
call from the orchestrator until `systemctl --user restart
podman.socket` ran. Far worse than any wedge it was trying to repair.
Drop podman/ from the cleanup list. The runtime state we actually want
to clean for FM6 (bolt_state.db drift) lives in containers/ and
libpod/ only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes FM6 (podman bolt_state.db / runtime drift) — observed live on
.198 today: bitcoind was running for several minutes, but podman's
state DB reported the container as Exited. The reconciler then tried
to "restart" it, racing the still-bound port 8332 and failing in a
loop.
heal_podman_state() runs as the last bootstrap stage, BEFORE the
orchestrator's reconcile loop ticks. It probes `podman info` with a
5s timeout; on failure it removes the runtime-state dirs under
$XDG_RUNTIME_DIR and re-probes. Persistent storage under
~/.local/share/containers/storage/ is never touched, so containers
re-discover from manifests on next call.
Cleanup never includes `podman system reset` or `system renumber` —
those are destructive and must stay operator-only.
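
The probe, cleanup, re-probe flow can be sketched with closures standing in
for the real work. In this sketch the probe would wrap `podman info` with its
5s timeout and the cleanup would remove the runtime-state dirs; HealOutcome and
heal_if_unhealthy are hypothetical names, not the actual API.

```rust
/// Result of one heal pass.
#[derive(Debug, PartialEq, Eq)]
pub enum HealOutcome {
    /// First probe succeeded; nothing was touched.
    Healthy,
    /// First probe failed, cleanup ran, second probe succeeded.
    Healed,
    /// Cleanup ran but the second probe still failed.
    StillBroken,
}

/// Runs `probe`; on failure runs `cleanup` once and probes again.
/// Cleanup never runs when the first probe is healthy.
pub fn heal_if_unhealthy(
    mut probe: impl FnMut() -> bool,
    cleanup: impl FnOnce(),
) -> HealOutcome {
    if probe() {
        return HealOutcome::Healthy;
    }
    cleanup();
    if probe() {
        HealOutcome::Healed
    } else {
        HealOutcome::StillBroken
    }
}
```

Keeping the destructive step behind a failed probe is what makes it safe to
run this on every startup.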
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small, focused tightenings:
- core/container/src/podman_client.rs: drop the legacy Hetzner
23.182.128.160:3000 mirror from image_uses_insecure_registry().
It was decommissioned in v1.7.x and is stripped from active
registry config at load time; leaving it in the bypass list let
a stale config still skip TLS. Replace the inline match with a
named INSECURE_REGISTRY_HOSTS slice so future entries are one
line. Test now also pins the spoofing-immune semantics
("evil.example/146.59.87.168:3000/x" must NOT match).
- core/archipelago/src/api/rpc/package/config.rs: split bitcoin
from lnd in get_app_capabilities(). bitcoind never opens raw
sockets — drop CAP_NET_RAW from bitcoin/bitcoin-core/bitcoin-knots.
lnd/fedimint/fedimint-gateway keep it because they enumerate
network interfaces during cert generation.
- core/archipelago/src/bootstrap.rs: tighten_secrets_dir()
enforces 0700 on /var/lib/archipelago/secrets and 0600 on every
file inside on each startup. The dir-mode is the load-bearing
isolation boundary against rootless container escapes (their UID
maps to >=100000, can't traverse uid=1000/0700). The per-file
sweep is defense-in-depth against any installer that wrote 0644.
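
A sketch of the spoofing-immune registry check from the first bullet. It
assumes the registry host is always the first '/'-separated component of the
image reference and that only the OVH mirror remains in the list; the real
function body may differ.

```rust
/// Registries that are allowed to skip TLS verification.
/// Adding a host is a one-line change.
pub const INSECURE_REGISTRY_HOSTS: &[&str] = &["146.59.87.168:3000"];

/// True only when the image's registry host (the first path component)
/// is in the allowlist. A matching string later in the path does not count,
/// so "evil.example/146.59.87.168:3000/x" is rejected.
pub fn image_uses_insecure_registry(image: &str) -> bool {
    let host = image.split('/').next().unwrap_or("");
    INSECURE_REGISTRY_HOSTS.contains(&host)
}
```

Comparing the whole first component, rather than searching for a substring, is
what keeps a hostile registry from smuggling the mirror address into its path.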
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshots the in-flight hardening work so subsequent reconcile/Quadlet
phases land on a clean before/after diff.
Changes:
- core/container/src/podman_client.rs: image_uses_insecure_registry()
whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner
(23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts
custom networks into the Networks map so containers can join them.
- core/archipelago/src/container/prod_orchestrator.rs:
ensure_container_network() creates per-manifest networks on demand;
apply_data_uid() now goes through host_sudo for mkdir -p + chown so
bind-mount roots get created and chowned without password prompts.
- core/archipelago/src/api/rpc/package/{install,update,stacks}.rs:
podman pull adds --tls-verify=false only for whitelisted registries.
- core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd
override on startup (live nodes carried it from old installers).
- core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod
binaries — it had been silently rerouting volumes to /tmp.
- apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime
so image-layout differences don't break entrypoint.
- scripts/app-catalog-image-smoke-test.py: production catalog/image
smoke test that probes a target node before users click Install.
- .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak.
Removes filebrowser.rs.bak and two stale catalog.json.bak files
(verified identical to live counterparts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Install flow
- api/rpc/package/install.rs: always append the literal image URL as a
last-resort pull candidate in do_pull_image, so images not carried by
any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install
instead of masquerading as a generic pull failure across every mirror.
- api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat
error, not just "file exists". Once bitcoin-knots' first-boot chowns
/var/lib/archipelago/bitcoin into the container's user namespace (700
perms, UID 100100/100101), the archipelago daemon can't even traverse
in — try_exists returns Err which unwrap_or(false) treated as "not
present" and drove a doomed write. Errors from the directory traversal are now
treated as "conf already owned by container user" and the write is skipped.
Mirrors the lnd.conf pattern.
- api/rpc/package/install.rs: drop the hardcoded `prune=550` from the
conf default. Operators with multi-TB drives shouldn't be silently
pruned; users who want a pruned node can set it in bitcoin.conf
themselves. Full archive is the only honest default.
- api/rpc/package/config.rs: bitcoin-core now passes explicit
-server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI
args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and
reads conf + argv only; without these the RPC listens on 127.0.0.1
inside the container and rootlessport can't reach it, so the
bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call.
Bitcoin Knots keeps its own entrypoint-driven defaults.
- container/docker_packages.rs: split bitcoin-core out of the shared
AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with
bitcoin-core.svg and a Reference-implementation description; the
bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home
card showing "Bitcoin Knots" for a Core install.
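
The write_bitcoin_conf guard can be sketched as below. The helper names are
hypothetical; the point is only that Ok(false) is the single result that
permits a write, where unwrap_or(false) used to fold every Err into "not
present".

```rust
use std::io;
use std::path::Path;

/// Decides whether the default conf should be written, given the result of
/// an existence probe. Only a definite "does not exist" allows the write;
/// an error (for example, permission denied while traversing a directory
/// now owned by the container user) means skip.
pub fn should_write_conf(probe: io::Result<bool>) -> bool {
    matches!(probe, Ok(false))
}

/// Convenience wrapper that probes the path with `Path::try_exists`.
pub fn conf_needs_writing(conf_path: &Path) -> bool {
    should_write_conf(conf_path.try_exists())
}
```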
Bitcoin node UI (docker/bitcoin-ui)
- index.html: impl name/tagline/logo now dynamic. applyImplBranding()
reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves
to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both
get their own icon and subtitle. Settings modal replaced its
hardcoded Regtest/txindex=1/port-18443 placeholders with live values
from getblockchaininfo + getindexinfo + getzmqnotifications.
- index.html: new Storage info card (Full Archive · X GB /
Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on
the main dashboard, same level as Network. Settings modal mirrors it
with the prune height when applicable.
- Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the
bg-network.jpg used by the dashboard are now COPY'd into the image
under /usr/share/nginx/html/assets. Previously the <img src> pointed
at paths that 404'd into the SPA fallback and the onerror handler
hid the broken logo silently.
Frontend
- appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334),
HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core).
Without these the AppSessionFrame showed "No URL found for
bitcoin-core" and the home/app-list title fell through to the raw id.
- settings/AccountInfoSection.vue: backfill What's New entries for
v1.7.31 through v1.7.37 that had been missed in earlier cuts.
Release plumbing
- releases/v1.7.37-alpha/: binary + frontend tarball.
- releases/manifest.json: v1.7.37-alpha, sha256/size refreshed.
- Cargo.toml / package.json: version bumps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- core/archipelago/src/bootstrap.rs (NEW): embed scripts/container-doctor.sh
and image-recipe/configs/archipelago-doctor.{service,timer} via
include_str! and sync to disk + enable the timer on every archipelago
startup. Idempotent (content-hash compare), dev-box symlink guard keeps
the git checkout untouched, best-effort (warn-only on failure) so
bootstrap never blocks server readiness. Wired in main.rs as a
background tokio task.
- scripts/container-doctor.sh: add fix_rootless_netns_egress(). Detects
when the rootless-netns has lost its pasta tap (container-to-container
still works but outbound DNS/TCP fails) via an nsenter probe into
aardvark-dns, with a two-probe 10s debounce to rule out transients and
a host-precheck that bails out if the host itself is offline. When the
rootless-netns is truly broken, does a graceful podman stop --all /
start --all so pasta + aardvark-dns rebuild the netns from scratch.
Bitcoin-knots and every other outbound container recover in one cycle.
- core/archipelago/src/update.rs: host_sudo → pub(crate) so bootstrap.rs
can reuse the existing systemd-run escape hatch.
- apps/bitcoin-core/manifest.yml: bump app version 24.0.0 → 28.4.0 and
image bitcoin/bitcoin:24.0 → bitcoin/bitcoin:28.4. Resources aligned
with the real container-specs.sh large-disk tune (4 GiB memory cap,
cpu_limit: 0 so bitcoind can run -par=auto across every core).
- neode-ui/src/views/apps/AppCard.vue + Apps.vue: add an Update button
+ Updating spinner to every app card that has available-update set.
Wires through serverStore.updatePackage(id) — the same RPC the detail
view already calls. common.update / common.updating i18n keys added in
en.json and es.json.
- core/archipelago/src/identity_manager.rs: add create_from_signing_key()
that mirrors an existing Ed25519 key as a manager-level identity with
a deterministic id (`node-<pubkey16>`). Idempotent across restarts,
gets the hex-SVG master avatar.
- core/archipelago/src/server.rs: the auto-create path on first boot now
mirrors the node's own signing_key (seed-derived on onboarded installs)
as a "Node" identity instead of generating a random "Default" keypair.
Once this ships, the DID on the Web5 DID Status card (via node.did
RPC), the Node entry on the Identities page (via identity.list), and
the DID used for peer-to-peer connects (via server_info.pubkey) all
resolve to the same seed-derived pubkey.
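
The idempotent sync from the bootstrap.rs bullet might look roughly like this.
It is a sketch under assumptions: it compares raw bytes instead of content
hashes, it reads the "dev-box symlink guard" as "never write through a
symlink", and sync_embedded is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `embedded` to `dest` only when the on-disk bytes differ.
/// Returns Ok(true) if the file was written, Ok(false) if it was left alone.
pub fn sync_embedded(dest: &Path, embedded: &str) -> io::Result<bool> {
    // Assumed guard: a symlink (e.g. into a git checkout) is left untouched.
    let is_symlink = fs::symlink_metadata(dest)
        .map(|m| m.file_type().is_symlink())
        .unwrap_or(false);
    if is_symlink {
        return Ok(false);
    }
    match fs::read(dest) {
        // Same content: nothing to do.
        Ok(current) if current == embedded.as_bytes() => Ok(false),
        // Different content, or file missing: (re)write it.
        Ok(_) => fs::write(dest, embedded).map(|_| true),
        Err(e) if e.kind() == io::ErrorKind::NotFound => fs::write(dest, embedded).map(|_| true),
        Err(e) => Err(e),
    }
}
```

A best-effort caller would log the Err case as a warning and continue, so that
a failed sync never blocks server readiness.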
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>