189 Commits

Author SHA1 Message Date
archipelago
9020b8526c fix(security): stop trusting client-supplied forwarded headers in rate limiting
extract_client_ip took X-Real-IP/X-Forwarded-For from any request, so
a client talking to the backend directly (the FIPS peer listener, or
any non-proxy path) could rotate a fake IP per request and never trip
the login rate limiter. The accept loop now records the TCP peer
address in request extensions, and forwarded headers are honored only
when the connection itself is from loopback — where nginx overwrites
X-Real-IP with the real client address. Direct connections bucket
under their socket IP.

§C of the 1.8.0 hardening plan; 3 new unit tests cover the
loopback/direct/no-header matrix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 15:48:07 -04:00
archipelago
bd7edb4376 feat(update): deepen post-OTA verification beyond a frontend 200
verify_pending_update previously cleared the rollback marker on any
2xx/3xx from GET / — a release with a dead RPC API or broken podman
access passed and never rolled back. Verification now requires, in the
same attempt: the frontend via nginx, backend RPC liveness (an
unauthenticated POST /rpc/v1 — 401 proves the stack is up, 5xx/404/
refused fails it), and rootless podman reachability. A pre-loop check
also asserts the running binary's version matches what the marker says
was applied, catching a silent or half swap deterministically.

Per-app container assertions are deliberately excluded: the
pre-Quadlet service restart legitimately takes containers down and the
boot reconciler can need minutes for heavy apps — that would
false-rollback healthy updates. Revisit after the Phase-3 flip.

§B of the 1.8.0 hardening plan; update suite 38/38 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 13:50:00 -04:00
archipelago
4b4a1f88fb feat(security): enforce trusted-registry image policy at the orchestrator pull sites
Catalog- and manifest-supplied image refs reached pull_image without
ever passing the RPC boundary's validator — a malicious catalog entry
or manifest could pull from an arbitrary registry. The allowlist now
lives in container::image_policy (the RPC check delegates to it) and
both orchestrator pull sites (install_fresh and
ensure_resolved_source_available) refuse refs that fail it.

The shared policy accepts trusted-registry refs and registry-less
Docker Hub shorthand (grafana/grafana etc., used by 8 shipped
manifests — a registry-less ref cannot name an attacker host), and
rejects explicit non-allowlisted hosts, shell metacharacters, and
malformed refs. §A of the 1.8.0 hardening plan.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 09:52:31 -04:00
archipelago
4c75bb3d38 perf(async): remove blocking std::process::Command from async paths
Every production process spawn reachable from a tokio worker now uses
tokio::process: the install path's podman-port probe, the dependencies
disk check, factory-reset restart, config host-IP detection, the
orchestrator's host-facts helpers (resolve_dynamic_env and its call
sites made async to carry it through), and AutoRuntime's podman/docker
probes.

The FIPS transport probe is the special case: is_available() is a sync
trait method called from async route(), so instead of blocking ~50ms
on systemctl per stale-cache hit it now serves the cached value and
refreshes on a background thread (stale-while-revalidate) — bounded
staleness, zero stalled workers.

§C of the 1.8.0 hardening plan; container/transport/config/package
suites green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 09:00:50 -04:00
archipelago
01cbec27ed fix(robustness): surface swallowed persistence-write failures + federation tombstone durability
§C of the 1.8.0 hardening plan: persistence writes whose Results were
silently dropped now log a warn/error with context (mesh contact
blocklist, scheduler state, content catalog, container registry,
update state, bitcoin relay, package install markers, server shutdown
state). §I: federation tombstones are now flushed durably in
storage/sync so cleared peers can't resurrect after a crash.

Tracker updated with shas in docs/1.8.0-RELEASE-HARDENING-PLAN.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 21:02:54 -04:00
archipelago
206d5fe8cf fix(security): origin-check the NIP-07 bridge + share-to-mesh, gate all identity methods behind consent
The nostr bridge derived the caller from the launcher's own URL and
never checked event.origin, so any co-resident iframe could pull the
node's nostr pubkey or use nip04/nip44 decrypt as an oracle while an
app was open. The bridge now rejects senders whose real origin doesn't
match the open app's origin, and every identity-sensitive method
(getPublicKey, signEvent, encrypt/decrypt) requires user consent or a
remembered per-origin approval — previously only signEvent did.

share-to-mesh in App.vue likewise accepted messages from any sender
and force-navigated to /mesh with an attacker-staged CID; it now
requires same-origin, matching Chat.vue's existing handler.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 12:53:41 -04:00
archipelago
51647b21cd feat(trust): verify release-root signature on the OTA manifest
check_for_updates now fetches the manifest as raw JSON and runs
trust::verify_detached before parsing: a tampered or wrong-signer
signature rejects the mirror outright, and unsigned manifests are
offered for MANUAL apply only — the 3 AM auto-apply scheduler refuses
them, closing the unattended remote-root hole (§A of the 1.8.0
hardening plan). UpdateState gains manifest_signed so the UI can
surface authenticity.

Publisher side: create-release.sh signs the manifest during the
release (ceremony, mnemonic via TTY/env only), publish-release-assets
hard-refuses to ship an unsigned manifest (grep + new 'ceremony
verify' cryptographic gate), and scripts/sign-manifest.sh covers
re-signing outside a release run.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 12:33:01 -04:00
archipelago
1977bdefb5 feat(trust): pin release-root anchor + ship signed app-catalog
Pin RELEASE_ROOT_PUBKEY_HEX from the 2026-07-02 release-root signing ceremony
(signer did🔑z6MkkidEnEpo6qHMCNSZoNKWtvQvxq3whnaME9wGgEFhq7ur) so nodes verify
the publisher identity of the app-catalog. Sign releases/app-catalog.json in place.

Fix two floats that made the catalog unsignable: archy-btcpay-db manifest version
-> string, fedimint-clientd cpu_limit 0.25 -> 1 (u32). Add scripts/sign-catalog.sh
helper, the 1.8.0 release-hardening plan/tracker, and the commit-and-push project
rule in CLAUDE.md.

Backward-compatible: old binaries still accept the signed catalog; the pinned-anchor
binary ships in the next build/OTA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 09:15:43 -04:00
archipelago
8b6485078a docs(handover): pushed-to-main state + pre-existing trust test failures caveat
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 08:36:12 -04:00
archipelago
f5d2479605 Merge branch 'iso-feedback-fixes-2026-07-02' into merge-iso-feedback
# Conflicts:
#	core/archipelago/src/api/rpc/middleware.rs
2026-07-02 08:03:25 -04:00
archipelago
c375ecc441 fix: fresh-ISO feedback bug-bash — onboarding, status truthfulness, recovery, kiosk, logs
Fixes from real fresh-install feedback (Framework node .81) + its log bundle:

Backend:
- websocket: subscribe before initial snapshot — broadcasts in the gap were
  silently lost, stranding clients on stale state until a hard refresh
  (the "everything needs ctrl-r" bug: My Apps stuck Loading, App Store
  stuck Checking, containers-scanned never arriving)
- crash recovery: check the crash marker BEFORE writing our own PID —
  recovery had never run on any node (always saw its own PID and skipped);
  PID-reuse guard via /proc cmdline
- boot status: pending-boot-starts registry (recovery, stack recovery,
  reconciler, adoption) — scanner overlays queued-but-down apps as
  Restarting instead of Stopped after a reboot; scanner-authored
  Restarting resolves immediately on a settled scan (no transitional wedge)
- install deps: bounded wait (36x5s) when a dependency is installed but
  still starting ("Waiting for Bitcoin to start…") instead of instant
  rejection; dependency-gate rejections remove the optimistic entry (no
  phantom Stopped tile) and surface as a notification
- seed backup: auth.setup persists the onboarding mnemonic as the
  encrypted seed backup (reveal previously failed on EVERY node — nothing
  ever wrote master_seed.enc); seed.restore stashes too; error sanitizer
  lets seed/2FA errors through instead of "Check server logs"
- lnd: bitcoind.rpchost resolved from the running Bitcoin variant
  (hardcoded bitcoin-knots broke Core nodes); manifest uses derived_env
- bitcoin status: clean human message for connection-reset/startup; raw
  URLs + os-error chains no longer reach the app card
- fedimint-clientd: chown /var/lib/archipelago/fmcd to 1000:1000 (root-
  created dir crash-looped the rootless container, EACCES) — first-boot
  script + pre-start self-heal
- log volume (>1GB/day on a day-old node): journald caps drop-in (ISO +
  bootstrap self-heal), bitcoind -printtoconsole=0 everywhere (90% of the
  journal was IBD UpdateTip spam), tracing default debug→info

Frontend:
- Login: Enter advances to confirm field then submits; submit always
  clickable with inline errors (was silently disabled on mismatch);
  Restart Onboarding needs a confirming second click (the mismatch →
  "onboarding restarted" trap)
- sync store: 30s state reconciliation + refetch on re-entrant connect;
  20s containers-scanned escape hatch so Checking can never show forever;
  fresh empty node reaches the real "no apps yet" state
- intro video: CRF20 re-encode (SSIM 0.988) + faststart — moov was at EOF
  so playback needed the full 15MB first (the intro lag)
- backgrounds: 10 heaviest JPEGs → WebP q90 (9.4MB→6.6MB); 7 stayed JPEG
  (WebP larger on noisy sources)
- Web5ConnectedNodes: drop unused template ref that failed vue-tsc -b

ISO/kiosk:
- nginx: /assets/ 404s no longer cached immutable for a year; HTTPS block
  gained the missing /assets/ location (served index.html as images)
- kiosk: launcher/service spliced from configs/ at ISO build (stale
  heredoc force-disabled GPU); MemoryHigh/Max 1200/1500→2200/2800M (kiosk
  rode the reclaim throttle = the lag); firmware-intel-graphics +
  firmware-amd-graphics (trixie split DMC blobs out of misc-nonfree)

Verified: cargo test 898/898 green, npm run build green with dist
contents confirmed (webp refs, lnd.png, faststart video, new strings).
Handover for ISO build + deploy: docs/HANDOVER-2026-07-02-iso-feedback.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 08:00:39 -04:00
archipelago
b9e4fbe9f7 docs: PR#67 + back-button fix merged/pushed but NOT deployed — resume note
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-02 03:33:52 -04:00
archipelago
61bfde3200 docs: consolidated deploy done — all 5 fleet nodes verified + unbundled ISO built
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 19:54:07 -04:00
archipelago
2c1d2a2572 docs: multinode gate finished + boot-reconciler self-heal bug found+fixed
.5's 5x gate done: 5/5 iterations, all technically FAIL per run-gate.sh's
tally but only from .5's permanent pruned-bitcoin ceiling (accepted going
in); down to 2 failures/iteration by the end. Found + fixed a real hang
(lnd cached a dead bitcoin-knots IP after a restart) live mid-run.

Separately found a real boot-reconciler bug via indeedhub going stuck on
.116: any genuinely-installed-but-fully-absent app was left stuck forever
unless it was one of 8 hardcoded "baseline" apps. Fix tracked, code change
in the shared working tree pending test confirmation.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 17:24:42 -04:00
archipelago
81444ab4a8 docs: multinode-pass parallel work — 3 items closed, 1 real regression found
While the .5 gate ran: confirmed no legacy multi-container stacks remain
(workstream A tail fully closed), reframed the "30 apps zero coverage"
claim as stale (all apps get generic baseline coverage via
all-apps-lifecycle/matrix, real gap is 34 apps lacking app-specific
assertions), and discovered tests/multinode/smoke.sh already exists and
ran it live against .116<->.228: federation pairing/FIPS/content-browse
all confirmed working, but found + root-caused a real tombstone bug
(federation.remove-node silently swallows tombstone-write failures,
letting removed peers get re-added by background sync). Not fixed yet —
federation/trust code, needs a careful fix, not a blind one.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 15:23:52 -04:00
archipelago
6b7af884ab docs: multinode pass swapped to .5, 5x gate launched
.198 IBD/pruned blocker → user chose swap over wait/hardware. .116 ruled
out (no bitcoin container), .120 ruled out (reserved for another dev). .5
(archy-x250-beta) is fully synced despite also being sub-1TB/pruned;
bootstrapped bats+jq and launched the 5x destructive gate there.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 13:04:04 -04:00
archipelago
9cc288521d docs: multinode pass — cleared .198 preconditions, hit a real hardware blocker
Reset-failed 2 stale dead-unit records on .198, confirmed nginx lnd proxy
target is correct. Hit a genuine blocker needing a user decision: .198's
448GB disk is below the 1TB archival threshold so it runs pruned bitcoin,
currently only 21% through IBD — the multinode plan's precondition requires
pruned:false + fully synced.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:45:41 -04:00
archipelago
0323310c91 docs: close out Tier 1 tracker items — all 3 turned out non-issues
immich is already fully Quadlet-migrated (verified live on .228, same
install_stack_via_orchestrator primitive as netbird/btcpay). TanStack Query
spike recommends not adopting — no cache/staleness bugs, WS push already
covers hot data. Netbird reinstall adoption-skips-cert-render is correct by
design (adoption only fires when no manifest exists to render from anyway).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:37:23 -04:00
archipelago
79bbcca964 docs: consolidate OTA 1.8.0 + master-plan open items into one priority-ordered tracker
docs/UNIFIED-TASK-TRACKER.md replaces hunting across SESSION-1.8.0-OTA-PROGRESS.md
and PRODUCTION-MASTER-PLAN.md for "what's left" — fastest/simplest tasks first.
Verified against live code/nodes rather than trusting doc text: several previously
"open" items (bind-dir chown, netbird legacy installer, launch-port fallback,
archival-bitcoin manifest field, progress-UI monotonicity, all-apps coverage,
fedimint test coverage, changelog backfill, portainer image pin, grafana quadlet
activation) turned out already shipped or non-issues, and are closed out here.
TESTING.md's release-gate checklist updated to match reality (cargo warnings,
5x gate, changelog already green; multinode/backend-default-flip/tag genuinely open).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:29:26 -04:00
archipelago
e3baaa5de3 docs: record fleet-deploy ENOSPC bug + fix + cleanup outcome
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 11:01:27 -04:00
archipelago
5269d50039 docs: record .198 cleanup outcome + .228 fedimint-guardian clarification
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 09:20:15 -04:00
archipelago
d0710e7491 fix(orchestrator,content): bound repair-recreate loops; self-heal stale content catalog entries
- prod_orchestrator.rs: the boot reconciler's zombie-guard and start-failed
  recreate paths (Created/Stopped/Exited states) had no attempt cap, unlike
  health_monitor's independent restart tracker. A container whose entrypoint
  fatally crashes right after `podman start` succeeds got stop+remove+
  install_fresh'd every ~30s reconcile tick forever (portainer on .198,
  2026-07-01: a DB schema newer than the pinned binary could read -- no
  amount of recreating fixes that). Added a 5-attempts/30-minute circuit
  breaker; once exhausted the container is left alone with an error! log
  instead of looping, and an explicit install/start clears the counter.
- content_server.rs: serve_content now prunes a catalog entry whose backing
  file is missing on disk, instead of leaving it advertised to every peer
  forever with no way to distinguish "gone" from "transient failure."

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 08:19:54 -04:00
archipelago
5b7cd5d5d0 fix(orchestrator): durable uninstall marker for baseline apps + archival-bitcoin/version-report gaps
- mempool-api now declares dependencies:[bitcoin:archival] directly, closing a
  gap where installing it standalone (a legitimate direct orchestrator-install
  target) bypassed the mempool umbrella's pruning gate entirely.
- New durable user-uninstalled marker (crash_recovery.rs, mirrors user_stopped)
  fixes required-baseline-app self-heal (bitcoin-knots/electrumx/lnd/mempool/
  etc.) resurrecting itself after an explicit uninstall survives a restart or
  reboot, since the in-memory disabled set is wiped by every load_manifests().
- installed_version() (set_config.rs) no longer trusts a floating image tag
  ("latest") as the reported running version -- a stale local :latest cache
  reported "latest" forever regardless of what latest had moved on to. Now
  falls back to asking the Bitcoin backend directly via `bitcoind --version`
  when the tag is floating.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 06:29:11 -04:00
archipelago
0eb5c258f5 fix(mesh): Meshtastic 3ccc pkc_capable pill + Sideband image interop + critical CBOR wire-bloat fix
Merges in the meshtastic agent's now-finished work alongside this session's
continuation: stock-peer (3ccc) PKI-capability is now stamped through
get_contacts -> refresh_contacts -> MeshPeer.pkc_capable, so a directed DM to/from
a PKC-capable stock Meshtastic peer correctly shows the E2E pill on the Sent row,
not just received messages. Confirmed live: .198 sees "Meshtastic 3ccc" with
pkc_capable=true.

Also fixes two real interop/correctness bugs found while live-testing the
Reticulum <-> Sideband link:
  - Receive: the daemon only ever read LXMF's plain-text content, silently
    dropping native FIELD_IMAGE/FIELD_FILE_ATTACHMENTS fields — a stock
    Sideband/NomadNet photo vanished into a blank-space message. Now decoded
    into the same ContentInline typed envelope our own attachments use.
  - Send: images to a non-archy (stock) peer now use native LXMF FIELD_IMAGE
    instead of our own opaque CBOR wire format, which Sideband can't decode.
  - Root cause of a garbled MC-chunk-fragment bug: TypedEnvelope.v/.sig (the
    OUTER wrapper every message type uses) serialized raw bytes as a CBOR
    array-of-integers instead of a native byte string, bloating every
    message on the wire ~2-3.5x — enough to push even a tiny ReadReceipt
    over the 140-byte single-frame chunking threshold. Root-caused by
    reading ciborium's deserializer source directly (deserialize_bytes only
    works within its internal scratch buffer; deserialize_byte_buf streams
    unbounded).

Frontend: consolidated the attach/record buttons into a single animated "+"
menu (was overflowing the compose row).

857/857 tests pass. Verified live across all 5 deploy-roster nodes.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 22:07:45 -04:00
archipelago
f54c853128 feat(mesh): Reticulum LoRa hardware gates pass + RNS Resource transfer + image/voice attachments
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.

Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 19:57:01 -04:00
archipelago
12e7990b10 fix(mesh): route Meshtastic public-channel text to the channel thread, not DMs
Inbound Meshtastic text addressed to BROADCAST_NUM (the default public
LongFast channel, or any channel slot) was filed into a per-sender 1:1 DM
thread, so public-channel messages polluted individual people's DM chats
and appeared as if sent directly to the user.

packet_to_inbound_frame now detects `to == BROADCAST_NUM` and emits a new
synthetic RESP_MESHTASTIC_CHANNEL_TEXT frame
([channel_idx][sender_prefix(6)][text]) that the listener files under the
channel thread (contact_id = u32::MAX - idx) while still attributing the
message to its real sender. Directed text (to == our node) still routes to
the DM thread — a regression test locks that split in.

send_channel_text now sets MeshPacket.channel (field 3) so archy actually
transmits on channel 0 (public) instead of ignoring the slot. Mesh.vue keeps
the synthetic "Meshtastic !xxxx" sender id when that is the best identity
available for a stock public-channel device.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 14:33:30 -04:00
archipelago
fbfeeeb0f5 fix(mesh): native E2E DM for archy↔archy text + software radio-reboot
- send_message now sends archy↔archy plain text as a native TEXT_MESSAGE_APP
  DM (firmware PKC-encrypts E2E), not wrapped in the binary typed envelope
  that silently broke archy↔archy LoRa delivery. Archy peers' Sent rows are
  marked encrypted so the E2E pill shows; rich typed msgs still use the
  typed-wire path.
- Add a software radio-reboot to recover a wedged/RX-deaf radio without
  physical access (and for the Device-tab settings panel): driver reboot()
  via AdminMessage reboot_seconds=97 (verified vs meshtastic/protobufs),
  MeshCommand::RebootRadio, MeshService::reboot_radio, RPC mesh.reboot-radio.
- Handoff doc: docs/SESSION-1.8.0-OTA-PROGRESS.md "RESUME HERE" — RF link is
  the proven blocker (radios not hearing each other); modem_preset mismatch
  is the prime suspect; on-device Meshtastic-app check + fix plan documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:39:34 -04:00
archipelago
c2c4b5af7d merge: demo build updates
# Conflicts:
#	neode-ui/src/stores/appLauncher.ts
#	neode-ui/src/views/AppSession.vue
2026-06-30 05:22:42 -04:00
archipelago
df9d3a55be integration: preserve deployed 1.8.0 OTA work 2026-06-30 05:08:17 -04:00
archipelago
f4f45c1a09 docs: mark .228 reindex finish/verify as other-agent owned
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:04:01 -04:00
archipelago
ed1352d3a3 docs+catalog: bitcoin multi-version rollout handoff + reproducible generator
- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
  (29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core
  (adds 29.2 + a `latest` entry → newest); generator forces top-level
  `version` == the default entry's version (the 169ff2e2 invariant) so
  regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes,
  fixes, current .228 state, the coordinated fleet-rollout steps (incl.
  :latest repoint sequencing / fleet-safety), reindex finish procedure, and
  the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:02:24 -04:00
archipelago
6aa74c7386 feat(bitcoin): multi-version support for Core & Knots (install/switch/pin/auto-update)
Lets a node runner choose which Bitcoin Core / Knots version to install
(latest pre-selected), then switch, pin, or opt into auto-update from the
app's interface — all manifest/catalog-driven, rootless, signed-registry,
zero-data-loss. Motivated by upcoming BIP-110 signalling: runners need a
real choice of software version.

Backend:
- version_config.rs: per-app pin + auto-update persistence (atomic, merge-
  preserving), downgrade detection, auto-update enumeration (+ unit tests).
- app_catalog.rs: CatalogVersion / versions[] schema, catalog_versions(),
  catalog_image_for_version() (same-repo guard); a pin suppresses the update
  badge.
- prod_orchestrator.rs: pinned version wins over the catalog default on every
  install/recreate.
- install.rs: install-time `version` param persisted (default = unpinned).
- set_config.rs: package.versions (read) + package.set-config (write) RPCs;
  downgrade is gated behind explicit confirm (warn + confirm + allow).
- update.rs/main.rs: hourly per-app auto-update tick via the orchestrator
  (opt-in, pin-respecting); fix handle_package_update to be non-fatal for
  orchestrator-managed apps lacking a catalog primary image (bitcoin-core).

UI:
- MarketplaceAppDetails.vue: install-time version selector (shown when an app
  offers >=2 versions).
- appDetails/AppSidebar.vue: "Version & Updates" card (switch / pin / auto-
  update toggle / downgrade warning), per app.
- rpc-client.ts + en.json: RPC methods, types, strings.

Phase 0 image pipeline:
- scripts/build-bitcoin-image.sh: download official tarball + SHA256SUMS(.asc),
  verify SHA-256 + pinned-maintainer OpenPGP signature (fail-closed), build a
  minimal rootless image, smoke-test, tag + push.
- apps/bitcoin-core/Dockerfile rewritten (drops stale community base);
  apps/bitcoin-knots/Dockerfile added.
- generate-app-catalog.sh: emit curated versions[]; published + catalog now
  offers Core 25.2/26.2/27.2/28.4/29.3/30.2/31.0 + Knots 29.3.knots20260508.

docs/bitcoin-multi-version-design.md: live progress tracker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:46:17 -04:00
archipelago
a38c9d5f29 docs(master-plan): §10d Meshtastic MeshCore-parity status (one open received-msg bug)
Region (EU_868) + shared channel "archipelago" auto-provisioning shipped in
8fdb45e8 and riding the rolled #9 fleet binary (0060dcd6). Discovery, RF, and
sending verified on .116+.228; the one open blocker is the running driver not
surfacing received messages. Slotted after WS-F #9–11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:53:06 -04:00
archipelago
fc64b422e7 docs(master-plan): WS-F#3 first destructive run — 3 reinstall bugs found
Full all-apps-lifecycle pass on .228: lifecycle 11/11, teardown 8/11.
Surfaced (1) fresh-install bind-dir ownership root:root → reinstall
EACCES (jellyfin/netbird; Fix B misses the install path), (2) netbird
reinstall adopts leftover containers → skips manifest cert/file render,
(3) portainer image pin lfg2025/portainer:2.19.4 unpublished (manifest
unknown), pin overrides RPC dockerImage. .228 restored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:47:24 -04:00
archipelago
a3e09eab57 docs(master-plan): WS-F#3 — destructive all-apps lifecycle matrix landed (43934eef)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:51 -04:00
archipelago
80146f4476 docs(master-plan): WS-F#2 — uninstall progress bar made truthful (9f17ba68)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:15:11 -04:00
archipelago
67426c0d41 docs(master-plan): cascade tier wired into the gate (b7d92107)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:24:07 -04:00
archipelago
292a2650df docs(master-plan): WS-F — uninstall-hang root cause fixed + cascade validated
Workstream F now in-progress: the immich/grafana uninstall hang →
ghost/stuck-bar/reinstall-block is root-caused (unbounded systemctl/
podman in quadlet::disable_remove) and fixed (71cc9ac4); cascade-
uninstall.bats 7/7 on .228. Records the remaining F items + the pending
gate-wiring decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:18:39 -04:00
archipelago
2ebcd8f9a8 docs(master-plan): backlog — smart launch-port selection + manifest-driven archival-node blocker
§10b: replace per-app static launch-port map with a manifest-first +
non-HTTP-port-skipping heuristic (the gitea :2222 class).
§10c: generalize the un-pruned/archival Bitcoin install blocker from a
hardcoded requires_unpruned_bitcoin() match to a manifest-declared
dependency, with a clear pre-install UX.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:47:25 -04:00
archipelago
3515344800 docs(master-plan): session h — zombie guard + gitea launch-port fix
Banner + §8b: zombie-container guard (0a8db904, live-proven on .228) and
gitea launch-port fix (670ebb06) shipped in binary 040df5ce, rolled to
the fleet. Logs the mempool env-drift recreate-loop and nostr-rs-relay
follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:41:59 -04:00
archipelago
15f65428b8 docs(master-plan): §8b — uninstall fix deployed+live-verifying, #15 guardian resolved
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 18:07:41 -04:00
archipelago
36015a19fe docs(master-plan): §8b session-b state — connection-lost+netbird+UX-merge shipped to .228, uninstall ghost fix, workstream F in progress
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 15:26:17 -04:00
archipelago
c4cd5fdc90 docs(master-plan): §8b resume — gate green + 6-node deploy + APK fix + workstream F
Comprehensive resume for the session restart: single-node gate green
(5/5 .228), latest backend + UX + one-tap companion APK deployed to 6
nodes (table w/ creds + pending 100.64.83.15 cred), workstream-F bugs
from manual testing, agreed next order (netbird → Phase-3 → F →
multinode), and loose ends (untracked AppLoadingScreen.vue, broken
gitea-local mirror, don't-delete-bitcoin-data directive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:56:54 -04:00
archipelago
deff380191 docs(master-plan): workstream F (lifecycle perfection) + §10 state-mgmt backlog
The 2026-06-23 5×-green gate is DESTRUCTIVE-tier / ~8 core apps only —
it skips uninstall/reinstall (cascade) and has no progress-UI or
all-apps coverage. Manual multinode testing found real bugs it never
ran (immich+grafana uninstall hangs at full-red bar + ghost in My Apps;
grafana reinstall stops; fedimint guardian "waiting for bitcoin sync").
Adds §4 row F, §6b post-deploy order (netbird→Phase-3→F), §6c scope +
observed bugs + definition-of-done, a §5 warning, and §10 backlog to
investigate TanStack-Query/push-based state management for neode-ui.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:28:19 -04:00
archipelago
ae47897601 docs: single-node production gate GREEN (5/5 on .228) — demote banner
run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:27:36 -04:00
archipelago
256d354048 docs(master-plan): tick off §8 P1 mobile app-launch UX (code-complete)
Mobile launch UX is code-complete on branch `companion-mobile-ux` (store-driven
panel, no interstitial, in-app WebView footer + loader, mesh 100dvh, ElectrumX
icon, companion v0.4.7 + shared debug keystore). Marked code-complete pending
on-device/mobile-web verification and merge to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:11:25 -04:00
archipelago
6511754545 docs: master-plan §8b — 5× triage, mempool restart bug fixed
Record the overnight 5× outcome (2/5) and the triage: all three
fails were distinct one-offs. iter1 #5 bitcoin-knots = pre-launch
churn (hardened anyway); iter2 #74 + iter5 #73 = one real
orchestrator bug (phantom stack-member injection in
ordered_containers_for_start), now fixed + live-verified on .228.
Update the resume check command to gate-5x4.log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 02:23:07 -04:00
archipelago
57a013bc66 test(gate): make 5× the canonical gate, drop 20x naming
Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub
20× references across CLAUDE.md, the master plan, TESTING.md, app-registry
status, the orchestrator/config doc-comments, and the bats suites. Also add
a minimal fail() helper to mempool.bats so guard failures report cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:12:41 -04:00
archipelago
c8acc84506 docs: §2 invariant single-node (.228); multinode → separate plan 2026-06-22 17:23:19 -04:00
archipelago
8355453a7e docs: exact cutoff-proof resume in master-plan SS8b (resume from any device)
Captures: .228 1x-GREEN (110/110); hardened 5x DETACHED on .228 (/tmp/gate-5x2.log,
nohup — survives terminal close) with the exact check-from-any-machine command; all
shipped code fixes (commits) + deploy state (.228 + .198); node-state fixes NOT in
repo (lnd nginx proxy 8081->18083, home-assistant orphan unit removed, electrumx
re-registered); the run-ON-the-node lesson; and remaining work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:22:29 -04:00