9 Commits

Author SHA1 Message Date
archipelago
2c8c99fd28 fix(security): bind seq into mesh signatures (v2 preimage), guard DID slice, cfg-gate dev password
- mesh: verify_signature accepts a v2 preimage (t,v,ts,seq) alongside
  legacy v1 (t,v,ts); signed_with_seq() is the v2 sender path, not yet
  wired — senders stay v1 until the fleet verifies v2 (receivers
  hard-drop bad sigs, so flipping send-side first would break
  mixed-fleet alerts). Tests: v2 verify, v2 seq-tamper rejection,
  v1 sign-then-set-seq compat.
- mesh listener: malformed radio-supplied DID shorter than the
  'did:key:' prefix can no longer panic advert_name (slice -> .get()).
- auth: the pre-setup password123 dev login and the constant itself are
  now #[cfg(debug_assertions)] — no release binary carries the bypass,
  whatever its runtime config says.
- orchestrator: canned host-facts under #[cfg(test)] — awaiting real
  subprocesses under tokio's paused test clock deadlocks against
  auto-advanced timers (the old blocking detection only worked by never
  yielding).
- drop two now-unused std::process::Command imports left by 4c75bb3d.
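
A minimal sketch of the first two items, under stated assumptions: the
Alert struct, its byte layout, and the helper names are hypothetical (only
the field names t, v, ts, seq and the slice -> .get() change come from this
message), and `check` stands in for the real signature verification.

```rust
/// Hypothetical alert fields; only the names t, v, ts, seq come from the commit.
pub struct Alert {
    pub t: String,
    pub v: u8,
    pub ts: u64,
    pub seq: u64,
}

/// Legacy v1 preimage: covers (t, v, ts). The byte layout is an assumption.
pub fn preimage_v1(m: &Alert) -> Vec<u8> {
    let mut buf = Vec::new();
    // Length-prefix the string so field boundaries are unambiguous.
    buf.extend_from_slice(&(m.t.len() as u32).to_be_bytes());
    buf.extend_from_slice(m.t.as_bytes());
    buf.push(m.v);
    buf.extend_from_slice(&m.ts.to_be_bytes());
    buf
}

/// v2 preimage: v1 plus seq, so changing seq invalidates a v2 signature.
pub fn preimage_v2(m: &Alert) -> Vec<u8> {
    let mut buf = preimage_v1(m);
    buf.extend_from_slice(&m.seq.to_be_bytes());
    buf
}

/// Accept a signature over either preimage. `check` returns true when the
/// message's signature is valid for the given bytes.
pub fn verify_either<F: Fn(&[u8]) -> bool>(m: &Alert, check: F) -> bool {
    check(preimage_v2(m).as_slice()) || check(preimage_v1(m).as_slice())
}

/// Assumed literal prefix of a key-method DID.
const DID_KEY_PREFIX: &str = "did:key:";

/// Non-panicking suffix extraction: `.get()` yields None for an input shorter
/// than the prefix, where `&did[8..]` would panic.
pub fn did_suffix(did: &str) -> Option<&str> {
    did.get(DID_KEY_PREFIX.len()..)
}
```

Because a v1 signature never covers seq, a sender can sign first and set seq
afterwards without breaking verification, which is the compat case the new
test exercises.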

Tests: mesh 110/110 (incl. 2 new), api 68/68, container 159/159,
archipelago-container check clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 17:49:52 -04:00
archipelago
9020b8526c fix(security): stop trusting client-supplied forwarded headers in rate limiting
extract_client_ip took X-Real-IP/X-Forwarded-For from any request, so
a client talking to the backend directly (the FIPS peer listener, or
any non-proxy path) could rotate a fake IP per request and never trip
the login rate limiter. The accept loop now records the TCP peer
address in request extensions, and forwarded headers are honored only
when the connection itself is from loopback — where nginx overwrites
X-Real-IP with the real client address. Direct connections bucket
under their socket IP.
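
The rule above can be sketched as follows. The function name and signature
are hypothetical (the real extract_client_ip reads the peer address from
request extensions); header values are passed in as plain strings to keep the
sketch free of HTTP-framework types.

```rust
use std::net::{IpAddr, SocketAddr};

/// Pick the address to rate-limit on. Forwarded headers are trusted only when
/// the TCP peer is loopback (the local reverse proxy); everyone else is
/// bucketed under the socket address, whatever headers they send.
pub fn client_ip(
    peer: SocketAddr,
    x_real_ip: Option<&str>,
    x_forwarded_for: Option<&str>,
) -> IpAddr {
    if !peer.ip().is_loopback() {
        return peer.ip();
    }
    // Prefer X-Real-IP; otherwise take the first X-Forwarded-For entry.
    let forwarded = x_real_ip.or_else(|| x_forwarded_for.and_then(|v| v.split(',').next()));
    forwarded
        .and_then(|s| s.trim().parse::<IpAddr>().ok())
        // Missing or unparseable header: fall back to the socket address.
        .unwrap_or(peer.ip())
}
```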

§C of the 1.8.0 hardening plan; 3 new unit tests cover the
loopback/direct/no-header matrix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 15:48:07 -04:00
archipelago
bd7edb4376 feat(update): deepen post-OTA verification beyond a frontend 200
verify_pending_update previously cleared the rollback marker on any
2xx/3xx from GET / — a release with a dead RPC API or broken podman
access passed and never rolled back. Verification now requires, in the
same attempt: the frontend via nginx, backend RPC liveness (an
unauthenticated POST /rpc/v1 — 401 proves the stack is up, 5xx/404/
refused fails it), and rootless podman reachability. A pre-loop check
also asserts the running binary's version matches what the marker says
was applied, catching a silent or half swap deterministically.
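
A rough sketch of the combined pass/fail decision, with hypothetical names
throughout. The message specifies only that 401 passes and 5xx/404/refused
fail for the RPC probe; treating every other status as alive is this sketch's
assumption.

```rust
/// Outcome of one HTTP probe; Refused covers connection-level failures.
pub enum Probe {
    Status(u16),
    Refused,
}

/// Frontend check: any 2xx/3xx counts, as before.
fn frontend_ok(p: &Probe) -> bool {
    matches!(p, Probe::Status(200..=399))
}

/// RPC liveness: an unauthenticated call answering 401 proves the stack is up.
fn rpc_alive(p: &Probe) -> bool {
    match p {
        Probe::Refused | Probe::Status(404) => false,
        Probe::Status(code) if *code >= 500 => false,
        Probe::Status(_) => true, // assumption: other statuses count as alive
    }
}

/// All checks must hold in the same attempt; the version comparison mirrors
/// the pre-loop check and fails fast on a silent or half swap.
pub fn update_verified(
    running_version: &str,
    marker_version: &str,
    frontend: &Probe,
    rpc: &Probe,
    podman_ok: bool,
) -> bool {
    if running_version != marker_version {
        return false;
    }
    frontend_ok(frontend) && rpc_alive(rpc) && podman_ok
}
```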

Per-app container assertions are deliberately excluded: the
pre-Quadlet service restart legitimately takes containers down and the
boot reconciler can need minutes for heavy apps — that would
false-rollback healthy updates. Revisit after the Phase-3 flip.

§B of the 1.8.0 hardening plan; update suite 38/38 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 13:50:00 -04:00
archipelago
4b4a1f88fb feat(security): enforce trusted-registry image policy at the orchestrator pull sites
Catalog- and manifest-supplied image refs reached pull_image without
ever passing the RPC boundary's validator — a malicious catalog entry
or manifest could pull from an arbitrary registry. The allowlist now
lives in container::image_policy (the RPC check delegates to it) and
both orchestrator pull sites (install_fresh and
ensure_resolved_source_available) refuse refs that fail it.

The shared policy accepts trusted-registry refs and registry-less
Docker Hub shorthand (grafana/grafana etc., used by 8 shipped
manifests — a registry-less ref cannot name an attacker host), and
rejects explicit non-allowlisted hosts, shell metacharacters, and
malformed refs. §A of the 1.8.0 hardening plan.
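
A simplified sketch of such a policy check. The allowlist contents and the
function names are hypothetical, not the contents of container::image_policy;
the host test follows the usual Docker convention that a first path component
containing '.' or ':' (or equal to "localhost") names a registry.

```rust
/// Hypothetical allowlist; the real list is not shown in this message.
const TRUSTED_REGISTRIES: &[&str] = &["ghcr.io", "docker.io"];

/// A first component with '.' or ':' (or "localhost") is a registry host.
fn looks_like_host(component: &str) -> bool {
    component.contains('.') || component.contains(':') || component == "localhost"
}

pub fn image_ref_allowed(image: &str) -> bool {
    // Reject shell metacharacters, whitespace, and anything else outside the
    // characters an image reference needs.
    let charset_ok = !image.is_empty()
        && image
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || "._-/:@".contains(c));
    // Reject malformed refs with empty path components ("a//b", "/a").
    if !charset_ok || image.split('/').any(|part| part.is_empty()) {
        return false;
    }
    match image.split_once('/') {
        // Explicit registry host: must be on the allowlist.
        Some((first, _)) if looks_like_host(first) => TRUSTED_REGISTRIES.contains(&first),
        // Registry-less shorthand (e.g. grafana/grafana): no host to attack.
        _ => true,
    }
}
```

Keeping the check in one function that both the RPC validator and the pull
sites call means a ref cannot reach a pull without passing the same rules.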

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 09:52:31 -04:00
archipelago
4c75bb3d38 perf(async): remove blocking std::process::Command from async paths
Every production process spawn reachable from a tokio worker now uses
tokio::process: the install path's podman-port probe, the dependencies
disk check, factory-reset restart, config host-IP detection, the
orchestrator's host-facts helpers (resolve_dynamic_env and its call
sites made async to carry it through), and AutoRuntime's podman/docker
probes.

The FIPS transport probe is the special case: is_available() is a sync
trait method called from async route(), so instead of blocking ~50ms
on systemctl per stale-cache hit it now serves the cached value and
refreshes on a background thread (stale-while-revalidate) — bounded
staleness, zero stalled workers.
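
A minimal stale-while-revalidate sketch using only std. CachedProbe and its
fields are hypothetical names, and the probe closure stands in for the
systemctl call; the actual transport code may be structured differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

pub struct CachedProbe {
    state: Arc<Mutex<(bool, Instant)>>, // (last value, when it was measured)
    refreshing: Arc<AtomicBool>,        // true while a refresh thread is running
    ttl: Duration,
    probe: Arc<dyn Fn() -> bool + Send + Sync>,
}

impl CachedProbe {
    pub fn new(initial: bool, ttl: Duration, probe: impl Fn() -> bool + Send + Sync + 'static) -> Self {
        CachedProbe {
            state: Arc::new(Mutex::new((initial, Instant::now()))),
            refreshing: Arc::new(AtomicBool::new(false)),
            ttl,
            probe: Arc::new(probe),
        }
    }

    /// Never blocks on the probe: returns the cached value immediately and,
    /// if it is stale, starts at most one background refresh.
    pub fn is_available(&self) -> bool {
        let (value, checked_at) = *self.state.lock().unwrap();
        if checked_at.elapsed() >= self.ttl && !self.refreshing.swap(true, Ordering::AcqRel) {
            let state = Arc::clone(&self.state);
            let refreshing = Arc::clone(&self.refreshing);
            let probe = Arc::clone(&self.probe);
            thread::spawn(move || {
                let fresh = (*probe)(); // the slow call happens off the caller's thread
                *state.lock().unwrap() = (fresh, Instant::now());
                refreshing.store(false, Ordering::Release);
            });
        }
        value
    }
}
```

The caller sees a value at most one probe duration older than the TTL. One
caveat of this sketch: a panicking probe would leave `refreshing` set and
stop further refreshes.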

§C of the 1.8.0 hardening plan; container/transport/config/package
suites green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 09:00:50 -04:00
archipelago
01cbec27ed fix(robustness): surface swallowed persistence-write failures + federation tombstone durability
§C of the 1.8.0 hardening plan: persistence writes whose Results were
silently dropped now log a warn/error with context (mesh contact
blocklist, scheduler state, content catalog, container registry,
update state, bitcoin relay, package install markers, server shutdown
state). §I: federation tombstones are now flushed durably in
storage/sync so cleared peers can't resurrect after a crash.
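
One common shape for both fixes, sketched with std only. The function names
are hypothetical, eprintln! stands in for the project's warn/error logging,
and the write-fsync-rename sequence is an assumed durability strategy, not
necessarily what storage/sync does.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// On Unix, fsync the parent directory so the rename itself is durable.
#[cfg(unix)]
fn sync_parent_dir(path: &Path) -> io::Result<()> {
    match path.parent().filter(|d| !d.as_os_str().is_empty()) {
        Some(dir) => File::open(dir)?.sync_all(),
        None => Ok(()),
    }
}

#[cfg(not(unix))]
fn sync_parent_dir(_path: &Path) -> io::Result<()> {
    Ok(())
}

/// Write to a temp file, fsync it, then rename over the target.
pub fn write_durable(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?;
    fs::rename(&tmp, path)?;
    sync_parent_dir(path)
}

/// Instead of `let _ = write(...)`, report the failure with context.
/// Returns whether the write succeeded so callers can still react.
pub fn persist_or_warn(what: &str, path: &Path, bytes: &[u8]) -> bool {
    match write_durable(path, bytes) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("warn: failed to persist {what} to {}: {e}", path.display());
            false
        }
    }
}
```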

Tracker updated with shas in docs/1.8.0-RELEASE-HARDENING-PLAN.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 21:02:54 -04:00
archipelago
206d5fe8cf fix(security): origin-check the NIP-07 bridge + share-to-mesh, gate all identity methods behind consent
The nostr bridge derived the caller from the launcher's own URL and
never checked event.origin, so any co-resident iframe could pull the
node's nostr pubkey or use nip04/nip44 decrypt as an oracle while an
app was open. The bridge now rejects senders whose real origin doesn't
match the open app's origin, and every identity-sensitive method
(getPublicKey, signEvent, encrypt/decrypt) requires user consent or a
remembered per-origin approval — previously only signEvent did.

share-to-mesh in App.vue likewise accepted messages from any sender
and force-navigated to /mesh with an attacker-staged CID; it now
requires same-origin, matching Chat.vue's existing handler.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 12:53:41 -04:00
archipelago
51647b21cd feat(trust): verify release-root signature on the OTA manifest
check_for_updates now fetches the manifest as raw JSON and runs
trust::verify_detached before parsing: a tampered or wrong-signer
signature rejects the mirror outright, and unsigned manifests are
offered for MANUAL apply only — the 3 AM auto-apply scheduler refuses
them, closing the unattended remote-root hole (§A of the 1.8.0
hardening plan). UpdateState gains manifest_signed so the UI can
surface authenticity.
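
The three outcomes can be sketched as a small decision table. The enum and
function names are hypothetical; only manifest_signed and the behaviors
(reject, manual-only, auto-apply) come from this message, and the signature
check itself (trust::verify_detached) is represented by its result.

```rust
/// Result of checking the detached signature over the raw manifest bytes.
#[derive(Debug, PartialEq)]
pub enum SigCheck {
    Valid,   // signed by the pinned release root
    Invalid, // tampered, or signed by someone else
    Missing, // no signature present
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Drop this mirror entirely; do not parse or offer the manifest.
    RejectMirror,
    /// Offer the update; manifest_signed is what the UI would surface.
    Offer { manifest_signed: bool },
}

pub fn classify(sig: SigCheck) -> Decision {
    match sig {
        SigCheck::Valid => Decision::Offer { manifest_signed: true },
        SigCheck::Missing => Decision::Offer { manifest_signed: false },
        SigCheck::Invalid => Decision::RejectMirror,
    }
}

/// The unattended scheduler applies only signed manifests; unsigned ones
/// remain available for a manual apply.
pub fn auto_apply_allowed(decision: &Decision) -> bool {
    matches!(decision, Decision::Offer { manifest_signed: true })
}
```

Verifying the raw bytes before parsing keeps untrusted JSON away from the
parser and ensures the bytes checked are exactly the bytes that were signed.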

Publisher side: create-release.sh signs the manifest during the
release (ceremony, mnemonic via TTY/env only), publish-release-assets
hard-refuses to ship an unsigned manifest (grep + new 'ceremony
verify' cryptographic gate), and scripts/sign-manifest.sh covers
re-signing outside a release run.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 12:33:01 -04:00
archipelago
1977bdefb5 feat(trust): pin release-root anchor + ship signed app-catalog
Pin RELEASE_ROOT_PUBKEY_HEX from the 2026-07-02 release-root signing ceremony
(signer did:key:z6MkkidEnEpo6qHMCNSZoNKWtvQvxq3whnaME9wGgEFhq7ur) so nodes verify
the publisher identity of the app-catalog. Sign releases/app-catalog.json in place.

Fix two floats that made the catalog unsignable: archy-btcpay-db manifest version
-> string, fedimint-clientd cpu_limit 0.25 -> 1 (u32). Add scripts/sign-catalog.sh
helper, the 1.8.0 release-hardening plan/tracker, and the commit-and-push project
rule in CLAUDE.md.

Backward-compatible: old binaries still accept the signed catalog; the pinned-anchor
binary ships in the next build/OTA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 09:15:43 -04:00