Compare commits

176 Commits

Author SHA1 Message Date
archipelago
1977bdefb5 feat(trust): pin release-root anchor + ship signed app-catalog
Pin RELEASE_ROOT_PUBKEY_HEX from the 2026-07-02 release-root signing ceremony
(signer did:key:z6MkkidEnEpo6qHMCNSZoNKWtvQvxq3whnaME9wGgEFhq7ur) so nodes verify
the publisher identity of the app-catalog. Sign releases/app-catalog.json in place.

Fix two floats that made the catalog unsignable: archy-btcpay-db manifest version
-> string, fedimint-clientd cpu_limit 0.25 -> 1 (u32). Add scripts/sign-catalog.sh
helper, the 1.8.0 release-hardening plan/tracker, and the commit-and-push project
rule in CLAUDE.md.

Backward-compatible: old binaries still accept the signed catalog; the pinned-anchor
binary ships in the next build/OTA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 09:15:43 -04:00
archipelago
8b6485078a docs(handover): pushed-to-main state + pre-existing trust test failures caveat
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 08:36:12 -04:00
archipelago
f5d2479605 Merge branch 'iso-feedback-fixes-2026-07-02' into merge-iso-feedback
# Conflicts:
#	core/archipelago/src/api/rpc/middleware.rs
2026-07-02 08:03:25 -04:00
archipelago
c375ecc441 fix: fresh-ISO feedback bug-bash — onboarding, status truthfulness, recovery, kiosk, logs
Fixes from real fresh-install feedback (Framework node .81) + its log bundle:

Backend:
- websocket: subscribe before initial snapshot — broadcasts in the gap were
  silently lost, stranding clients on stale state until a hard refresh
  (the "everything needs ctrl-r" bug: My Apps stuck Loading, App Store
  stuck Checking, containers-scanned never arriving)
- crash recovery: check the crash marker BEFORE writing our own PID —
  recovery had never run on any node (always saw its own PID and skipped);
  PID-reuse guard via /proc cmdline
- boot status: pending-boot-starts registry (recovery, stack recovery,
  reconciler, adoption) — scanner overlays queued-but-down apps as
  Restarting instead of Stopped after a reboot; scanner-authored
  Restarting resolves immediately on a settled scan (no transitional wedge)
- install deps: bounded wait (36x5s) when a dependency is installed but
  still starting ("Waiting for Bitcoin to start…") instead of instant
  rejection; dependency-gate rejections remove the optimistic entry (no
  phantom Stopped tile) and surface as a notification
- seed backup: auth.setup persists the onboarding mnemonic as the
  encrypted seed backup (reveal previously failed on EVERY node — nothing
  ever wrote master_seed.enc); seed.restore stashes too; error sanitizer
  lets seed/2FA errors through instead of "Check server logs"
- lnd: bitcoind.rpchost resolved from the running Bitcoin variant
  (hardcoded bitcoin-knots broke Core nodes); manifest uses derived_env
- bitcoin status: clean human message for connection-reset/startup; raw
  URLs + os-error chains no longer reach the app card
- fedimint-clientd: chown /var/lib/archipelago/fmcd to 1000:1000 (root-
  created dir crash-looped the rootless container, EACCES) — first-boot
  script + pre-start self-heal
- log volume (>1GB/day on a day-old node): journald caps drop-in (ISO +
  bootstrap self-heal), bitcoind -printtoconsole=0 everywhere (90% of the
  journal was IBD UpdateTip spam), tracing default debug→info
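
The crash-recovery ordering fix in the Backend list can be sketched as below. This is a minimal illustration, not the project's code: `check_then_claim`, `StartupAction`, and `cmdline_matches` are hypothetical names, and the marker is assumed to be a plain-text file holding one PID.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Outcome of inspecting the marker left behind by the previous run.
#[derive(Debug, PartialEq)]
pub enum StartupAction {
    /// No usable marker: first boot, or a clean shutdown removed it.
    CleanStart,
    /// The marker names a process that is not a live instance of this program.
    RunRecovery { crashed_pid: u32 },
    /// The marker names a different, live process that really is this program.
    AlreadyRunning { pid: u32 },
}

/// Read the old marker FIRST, decide, and only then write our own PID.
/// Writing first (the buggy order) means the check always sees `my_pid`.
pub fn check_then_claim(
    marker: &Path,
    my_pid: u32,
    is_live_instance: impl Fn(u32) -> bool,
) -> io::Result<StartupAction> {
    let previous = match fs::read_to_string(marker) {
        // Assumption: an unparseable marker is treated like a missing one.
        Ok(text) => text.trim().parse::<u32>().ok(),
        Err(e) if e.kind() == io::ErrorKind::NotFound => None,
        Err(e) => return Err(e),
    };
    let action = match previous {
        None => StartupAction::CleanStart,
        Some(pid) if pid != my_pid && is_live_instance(pid) => {
            StartupAction::AlreadyRunning { pid }
        }
        // Dead PID, or our own PID number handed out again after a reboot.
        Some(pid) => StartupAction::RunRecovery { crashed_pid: pid },
    };
    if !matches!(action, StartupAction::AlreadyRunning { .. }) {
        fs::write(marker, my_pid.to_string())?;
    }
    Ok(action)
}

/// PID-reuse guard (Linux only): a PID counts as "us" only when argv[0] in
/// /proc/<pid>/cmdline ends with the expected binary name.
pub fn cmdline_matches(pid: u32, binary_name: &str) -> bool {
    let raw = match fs::read(format!("/proc/{}/cmdline", pid)) {
        Ok(raw) => raw,
        Err(_) => return false,
    };
    let argv0 = raw.split(|b| *b == 0).next().unwrap_or_default();
    let name = String::from_utf8_lossy(argv0);
    name.ends_with(binary_name)
}
```

A caller would pass something like `|pid| cmdline_matches(pid, "archipelago")` as the liveness check; the binary name here is also an assumption.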

Frontend:
- Login: Enter advances to confirm field then submits; submit always
  clickable with inline errors (was silently disabled on mismatch);
  Restart Onboarding needs a confirming second click (the mismatch →
  "onboarding restarted" trap)
- sync store: 30s state reconciliation + refetch on re-entrant connect;
  20s containers-scanned escape hatch so Checking can never show forever;
  fresh empty node reaches the real "no apps yet" state
- intro video: CRF20 re-encode (SSIM 0.988) + faststart — moov was at EOF
  so playback needed the full 15MB first (the intro lag)
- backgrounds: 10 heaviest JPEGs → WebP q90 (9.4MB→6.6MB); 7 stayed JPEG
  (WebP larger on noisy sources)
- Web5ConnectedNodes: drop unused template ref that failed vue-tsc -b

ISO/kiosk:
- nginx: /assets/ 404s no longer cached immutable for a year; HTTPS block
  gained the missing /assets/ location (served index.html as images)
- kiosk: launcher/service spliced from configs/ at ISO build (stale
  heredoc force-disabled GPU); MemoryHigh/Max 1200/1500→2200/2800M (kiosk
  rode the reclaim throttle = the lag); firmware-intel-graphics +
  firmware-amd-graphics (trixie split DMC blobs out of misc-nonfree)

Verified: cargo test 898/898 green, npm run build green with dist
contents confirmed (webp refs, lnd.png, faststart video, new strings).
Handover for ISO build + deploy: docs/HANDOVER-2026-07-02-iso-feedback.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 08:00:39 -04:00
archipelago
b9e4fbe9f7 docs: PR#67 + back-button fix merged/pushed but NOT deployed — resume note
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-02 03:33:52 -04:00
archipelago
7d7ba5734a fix(ui): wire up OpenWrtGateway's back button
BackButton is presentational-only (emits click, parent wires navigation)
per its own doc comment, but OpenWrtGateway.vue rendered it with no
@click handler at all -- clicking it did nothing. Added useRouter +
goBack() (-> the 'server' route, matching the page's location under
views/server/), same pattern as PeerFiles.vue/CloudFolder.vue.

Router-detection (openwrt.scan) spot-checked live: RPC plumbing works
end-to-end and returns a valid response, but no physical OpenWrt device
was on hand to confirm a true-positive detection. Also noted:
detect::scan_subnet does blocking TCP/SSH calls inside an async fn with
no .await points -- not proven to cause a real issue yet, but worth
hardening (spawn_blocking or async I/O) before a large subnet scan is
exercised for real.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-02 03:24:37 -04:00
archipelago
7a7fec21d4 Merge remote-tracking branch 'gitea-ai/fix/reticulum-daemon-process-group'
2026-07-02 02:56:40 -04:00
archipelago
61bfde3200 docs: consolidated deploy done — all 5 fleet nodes verified + unbundled ISO built
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 19:54:07 -04:00
archipelago
9f52e81471 fix(ui): remove vestigial ref, fix stale MeshMap test mock
Web5ConnectedNodes.vue declared nodesContainerRef but never consumed it
(the controller-nav system scans [data-controller-container] globally,
no other view uses a per-component ref for it) — broke the vue-tsc build.
MeshMap.test.ts's mocked mesh store predated federatedPositions (added
earlier this session for the Mesh Map federated-node feature) and crashed
on mount. Found live while merging PR#67 (reticulum) + UI/UX work +
archy-openwrt into main for a combined fleet deploy.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 18:20:04 -04:00
archipelago
27093e682f Merge remote-tracking branch 'gitea-ai/archy-openwrt'
2026-07-01 18:09:41 -04:00
archipelago
0da73a8ce1 Merge remote-tracking branch 'gitea-ai/fix/reticulum-daemon-pdeathsig'
2026-07-01 18:09:31 -04:00
archipelago
8256fde1a6 fix(ui): mesh/web5/apps layout, modal, and search UX fixes
- Mesh: fix 920-1280px bottom margin (phantom mobile-nav reservation
  leaking into the desktop-sidebar range), let the mesh view scale to
  full width on wide screens instead of capping at 1600px, and make the
  Device panel collapsible on desktop (previously mobile-only)
- Search/controller-nav: a global gamepad/keyboard-nav feature was
  auto-clicking "the next button in the DOM" on Enter in any text input,
  which cleared the mesh peer search and popped the sideload modal from
  the App Store/My Apps search boxes. Opt out via data-controller-no-submit
  on all filter inputs; bump the mesh clear button's touch target
- Modals: several (sideload, credential, Lightning channel open, identity
  create) used ad-hoc blue buttons and non-fullscreen backdrops that only
  covered the main content area, not the sidebar. Teleport them to body,
  unify backdrop/button theming to the dark+orange convention, fix the
  sideload modal's square bottom corners on desktop, and standardize
  close buttons to the ghost-icon style
- Web5: remove the redundant/dead "Messages" tab from Connected Nodes
  (its deep-link was unreachable dead code); fix the "view message" toast
  to actually open the Archipelago channel instead of silently failing to
  match a LoRa peer; make identity rows responsive via a container query
  (viewport-based breakpoints don't work in the page's 2-column grid) and
  right-justify their action icons; collapse DID/DHT/Wallet/Nostr/Connected
  Nodes by default on mobile
- Apps/App Store: match the search bar and sideload button's height,
  padding, and background to the mode-switcher tabs beside them
- Mesh chat: keep the compose input focused after sending

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 18:04:31 -04:00
archipelago
936b4cca29 fix(orchestrator): self-heal ANY installed app, not just baseline ones
The boot reconciler only self-healed a fully-absent container for one of
8 hardcoded "required baseline" apps (bitcoin-knots, electrumx, lnd,
mempool*, filebrowser, fedimint-clientd) — every other genuinely-installed
app whose container went missing (crash, lost record, wedged teardown)
was left as Left("absent") forever, with no path back short of an
explicit manual reinstall.

Surfaced live: indeedhub's backend containers (minio/postgres/relay) went
absent on .116 and never recovered despite indeedhub still being
installed. By the time this code path runs, the app is already confirmed
NOT user-stopped and NOT user-uninstalled (both checked earlier in the
same function, backed by durable markers correctly cleared on
reinstall/start) — so gating self-heal further behind a hardcoded app-id
list was an unnecessary restriction, not a safety measure. An app the
user installed and never removed should come back on its own, same as
baseline services always have.
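
The gating change described above reduces to a small decision function. The sketch below is illustrative only: `AppState`, `AbsentAction`, and `action_for_absent_container` are hypothetical names, and the three booleans stand in for the installed record and the two durable markers the message mentions.

```rust
/// What the boot reconciler does with an app whose container is fully absent.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum AbsentAction {
    Reinstall,
    LeaveAbsent,
}

/// Hypothetical view of the state the reconciler has already checked.
#[derive(Debug, Clone, Copy)]
pub struct AppState {
    pub installed: bool,
    pub user_stopped: bool,
    pub user_uninstalled: bool,
}

/// Self-heal any installed app the user has not stopped or uninstalled.
/// Note there is no app-id allowlist: the old baseline-only check is gone.
pub fn action_for_absent_container(app: AppState) -> AbsentAction {
    if app.installed && !app.user_stopped && !app.user_uninstalled {
        AbsentAction::Reinstall
    } else {
        AbsentAction::LeaveAbsent
    }
}
```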

Deleted the now-dead is_required_baseline_app(); updated the test that
had locked in the old (wrong) behavior.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 17:27:16 -04:00
archipelago
2c1d2a2572 docs: multinode gate finished + boot-reconciler self-heal bug found+fixed
.5's 5x gate done: 5/5 iterations, all technically FAIL per run-gate.sh's
tally but only from .5's permanent pruned-bitcoin ceiling (accepted going
in); down to 2 failures/iteration by the end. Found + fixed a real hang
(lnd cached a dead bitcoin-knots IP after a restart) live mid-run.

Separately found a real boot-reconciler bug via indeedhub going stuck on
.116: any genuinely-installed-but-fully-absent app was left stuck forever
unless it was one of 8 hardcoded "baseline" apps. Fix tracked, code change
in the shared working tree pending test confirmation.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 17:24:42 -04:00
archipelago
27e6747c2a feat(security): pin the release-root trust anchor (Workstream B)
Pins RELEASE_ROOT_PUBKEY_HEX from the signing ceremony
(did:key:z6MkkidEnEpo6qHMCNSZoNKWtvQvxq3whnaME9wGgEFhq7ur). The
corresponding mnemonic is held offline by the publisher, never committed
or stored on any node/build host. Nodes built with this binary now verify
the app catalog's signature against this anchor instead of accepting any
signer; unsigned catalogs are still accepted during the migration window
per docs/workstream-b-signing-runbook.md.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 16:59:48 -04:00
Dorian
be50c886bb fix(mesh/reticulum): kill the whole daemon process group on drop
The reticulum daemon is a PyInstaller one-file binary: a bootloader parent
that forks the real Python process. `kill_on_drop`/`start_kill()` only SIGKILL
the bootloader, orphaning the forked child — which keeps holding the RNode
serial port. Across the listener's 30-min RX-stall reconnects this piled up
(observed 9 concurrent instances on a live node) all clutching /dev/ttyUSB0,
garbling the RNode so it stopped transmitting entirely.

Spawn the daemon as its own process-group leader (`process_group(0)`) and, on
drop, signal the whole group (SIGTERM for a clean RNode/socket release, then
SIGKILL as a hard backstop) so the forked child can never be orphaned.
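
A rough sketch of the spawn-as-group-leader and signal-the-group pattern, under stated assumptions: `GroupChild`, `signal_group`, and the 500 ms grace period are hypothetical, and signals are sent by shelling out to `kill` only to keep the sketch free of external crates. Real code would more likely call kill(2) through a crate such as `libc` or `nix`. `process_group(0)` is the standard-library `CommandExt` method named in the message.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// Assumed grace period between SIGTERM and the SIGKILL backstop.
const GRACE: Duration = Duration::from_millis(500);

/// A child that leads its own process group and takes the group down on drop.
pub struct GroupChild {
    child: Child,
}

impl GroupChild {
    pub fn spawn(mut cmd: Command) -> io::Result<Self> {
        // pgid 0 = new group whose id equals the child's PID, so anything
        // the child forks later inherits that group.
        cmd.process_group(0);
        Ok(GroupChild { child: cmd.spawn()? })
    }

    pub fn pid(&self) -> u32 {
        self.child.id()
    }
}

/// Signal every process in the group; a negative PID addresses the group.
fn signal_group(pgid: u32, signal: &str) {
    let _ = Command::new("sh")
        .arg("-c")
        .arg(format!("kill -s {} -- -{}", signal, pgid))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status();
}

impl Drop for GroupChild {
    fn drop(&mut self) {
        let pgid = self.child.id();
        signal_group(pgid, "TERM"); // lets the daemon release its port cleanly
        sleep(GRACE);
        signal_group(pgid, "KILL"); // hard backstop for anything still alive
        // Reap the leader last, so its PID (and group id) cannot be reused
        // while we are still signalling.
        let _ = self.child.wait();
    }
}
```

The fixed sleep blocks the dropping thread for the whole grace period, which is a simplification of this sketch; an async supervisor would want to wait without blocking.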

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 21:29:54 +01:00
archipelago
469b0203b7 fix(reticulum-daemon): die with parent to stop RNode-jamming pile-ups
The daemon ships as a PyInstaller one-file binary; its direct parent is the
bootloader, which the Rust supervisor (mesh/reticulum.rs Drop) stops via
start_kill() == SIGKILL. SIGKILL can't be forwarded, so the Python child was
orphaned on every link recreation and kept holding the RNode serial port.
These stale daemons piled up (9 seen on one node), all clutching /dev/ttyUSB0
and garbling the RNode so it silently stopped transmitting (txb frozen,
interface status False).

Set PR_SET_PDEATHSIG(SIGTERM) at daemon startup so the kernel signals us when
the parent exits; our existing SIGTERM handler then shuts down cleanly and
frees the port. Linux-only, best-effort, no-op elsewhere.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 16:14:55 -04:00
archipelago
81444ab4a8 docs: multinode-pass parallel work — 3 items closed, 1 real regression found
While the .5 gate ran: confirmed no legacy multi-container stacks remain
(workstream A tail fully closed), reframed the "30 apps zero coverage"
claim as stale (all apps get generic baseline coverage via
all-apps-lifecycle/matrix, real gap is 34 apps lacking app-specific
assertions), and discovered tests/multinode/smoke.sh already exists and
ran it live against .116<->.228: federation pairing/FIPS/content-browse
all confirmed working, but found + root-caused a real tombstone bug
(federation.remove-node silently swallows tombstone-write failures,
letting removed peers get re-added by background sync). Not fixed yet —
federation/trust code, needs a careful fix, not a blind one.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 15:23:52 -04:00
archipelago
2f1a577109 fix(tests): installed_required_containers must not fail under set -e
The prior fix's loop `container_installed "$c" && echo "$c"` makes the
function's own exit status the exit status of its LAST array entry. If
that entry isn't installed on this node (e.g. required-stack-destructive's
array ends with mempool-api, absent on .5), the whole function reports
failure even though earlier entries matched fine — and under bats' set -e,
`targets="$(installed_required_containers)"` then aborts the test outright.
required-stack.bats got lucky (its array happens to end with an installed
container) but has the identical latent bug. Caught live on .5's iteration
3 of the multinode-pass gate run. Add explicit `return 0`.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 15:11:07 -04:00
archipelago
4c3aa8cc8e fix(icons): remove remaining electrs icon references, use electrumx.png
GoalDetail.vue, EasyHome.vue, and the backend's docker_packages.rs
metadata still pointed electrs-family app ids at the old electrs
icon (svg). Point them at electrumx.png like every other reference,
and delete the now-unused electrs.svg asset.
2026-07-01 14:48:14 -04:00
archipelago
ed95d54ffe chore(assets): replace lnd icon svg with png
lnd.svg no longer exists; every reference now points at lnd.png.
2026-07-01 14:41:15 -04:00
archipelago
7d2ac1f842 chore(assets): update searxng app icon
2026-07-01 14:36:12 -04:00
archipelago
daa8fb4891 fix(tests): make required-stack-destructive.bats portable across app rosters
Same class of bug as required-stack.bats: hardcoded required_containers
included mempool/mempool-api unconditionally, so a node without the
mempool stack (e.g. .5) hard-fails restarting a container that was never
installed, and waits out full 180-240s timeouts probing endpoints that
will never come up. Likely explains .5's abnormally long (2216s) iteration
1 runtime during the current multinode-pass run. Same skip-if-absent fix
as the prior commit.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 13:59:06 -04:00
archipelago
f1055164d2 fix(tests): make required-stack.bats portable across nodes with different app rosters
Found live during the .5 multinode-pass run: this suite was hardcoded to
.116's exact app bundle (including the mempool stack), so any node missing
an app hard-failed instead of skipping — and a missing local fail() helper
(present in 3 sibling bats files, absent here) masked the real error as
"command not found" (exit 127). Add the same skip-if-absent idiom already
used in mempool.bats per-app, and define fail() locally like the others.
Verified: skips cleanly on .116 (no bitcoin-knots here), still exercises
real checks for apps that are installed.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 13:56:24 -04:00
archipelago
6b7af884ab docs: multinode pass swapped to .5, 5x gate launched
.198 IBD/pruned blocker → user chose swap over wait/hardware. .116 ruled
out (no bitcoin container), .120 ruled out (reserved for another dev). .5
(archy-x250-beta) is fully synced despite also being sub-1TB/pruned;
bootstrapped bats+jq and launched the 5x destructive gate there.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 13:04:04 -04:00
archipelago
9cc288521d docs: multinode pass — cleared .198 preconditions, hit a real hardware blocker
Reset-failed 2 stale dead-unit records on .198, confirmed nginx lnd proxy
target is correct. Hit a genuine blocker needing a user decision: .198's
448GB disk is below the 1TB archival threshold so it runs pruned bitcoin,
currently only 21% through IBD — the multinode plan's precondition requires
pruned:false + fully synced.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:45:41 -04:00
archipelago
0323310c91 docs: close out Tier 1 tracker items — all 3 turned out non-issues
immich is already fully Quadlet-migrated (verified live on .228, same
install_stack_via_orchestrator primitive as netbird/btcpay). TanStack Query
spike recommends not adopting — no cache/staleness bugs, WS push already
covers hot data. Netbird reinstall adoption-skips-cert-render is correct by
design (adoption only fires when no manifest exists to render from anyway).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:37:23 -04:00
archipelago
79bbcca964 docs: consolidate OTA 1.8.0 + master-plan open items into one priority-ordered tracker
docs/UNIFIED-TASK-TRACKER.md replaces hunting across SESSION-1.8.0-OTA-PROGRESS.md
and PRODUCTION-MASTER-PLAN.md for "what's left" — fastest/simplest tasks first.
Verified against live code/nodes rather than trusting doc text: several previously
"open" items (bind-dir chown, netbird legacy installer, launch-port fallback,
archival-bitcoin manifest field, progress-UI monotonicity, all-apps coverage,
fedimint test coverage, changelog backfill, portainer image pin, grafana quadlet
activation) turned out already shipped or non-issues, and are closed out here.
TESTING.md's release-gate checklist updated to match reality (cargo warnings,
5x gate, changelog already green; multinode/backend-default-flip/tag genuinely open).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:29:26 -04:00
archipelago
177b8a4338 feat(mesh): show federated Archipelago nodes on the Mesh Map
Peers that opt in via a new "Share Location" toggle in Settings
(server.set-location RPC) get plotted on other trusted peers' Mesh Map
with a distinct Archy-logo marker, separate from raw LoRa radio peers.
Location is persisted locally, carried in NodeStateSnapshot, and
propagated through federation sync/delta like other node state.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 12:04:31 -04:00
archipelago
e3baaa5de3 docs: record fleet-deploy ENOSPC bug + fix + cleanup outcome
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 11:01:27 -04:00
archipelago
84d35b3b68 fix(deploy): also exclude .venv from the rsync payload
reticulum-daemon/.venv (a local Python virtualenv bundling PyInstaller +
esptool + Qt hooks, several hundred MB) was also being synced to deploy
targets uncached -- same class of bug as the releases/ exclude just added.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 10:54:34 -04:00
archipelago
aa849849e8 fix(deploy): exclude releases/ from the rsync payload
releases/ (the local repo's own historical build artifacts -- dozens of
versioned binaries + frontend tarballs, 7-10GB) was never excluded, so every
deploy synced it to the target's root disk. Filled .198 (29GB disk) to 100%
mid-deploy and .228 to 100% right after a "successful" deploy -- the target
node never needs its own copy of the release archive, only the built
binary+frontend actually get installed into system paths.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 10:44:07 -04:00
archipelago
bebf3bae10 fix(mesh): Reticulum garbage-text + reconnect churn + signal bars + node naming/HTTPS
- reticulum.rs: send_text_msg was lossy-UTF8-mangling binary CBOR control
  envelopes (ReadReceipt etc.) before sending as LXMF text; base64-encode
  with a marker instead, decoded losslessly on receive.
- typed_messages.rs: mesh.send-read-receipt fired automatically on every
  chat view with no is_archy_peer gate, so viewing a message from a stock
  (non-archy) LXMF peer auto-sent it an undecodable control envelope,
  surfacing as garbage text right after whatever it just sent. Now a no-op
  for non-archy peers.
- mesh/listener/mod.rs: RX_STALL_TIMEOUT was 300s and forced a full
  auto-detect reconnect on any otherwise-healthy but quiet mesh link
  (visible as "Connecting..." flapping); this also wiped Reticulum's
  in-memory peer-address table every cycle, breaking messaging with peers
  who hadn't re-announced in the window. Bumped to 1800s.
- reticulum.rs: persist the peer prefix/dest-hash/display-name table to
  disk so a restart doesn't force every peer back to "Anonymous Peer"
  until they re-announce.
- decode.rs/frames.rs: Meshcore was discarding the SNR its wire format
  carries; wire it onto the peer record. Mesh.vue's signalBars() now falls
  back to SNR-based bars when RSSI is unavailable (always true for
  Meshcore); Reticulum has neither and correctly stays at 0/"no data".
- system/handlers.rs, dispatcher.rs: new system.get-hostname RPC + cert
  regeneration (with a proper SAN) whenever server.set-name changes the
  hostname, so HTTPS doesn't add a mismatch warning on top of the
  self-signed one after a rename.
- AccountInfoSection.vue: surface the mDNS hostname + http/https links in
  Settings (HTTPS needed for mic/camera secure-context features) — never
  forced, both keep working.
- build-auto-installer-iso.sh: ship avahi-daemon so .local names actually
  resolve on the LAN, and give the self-signed cert a real SAN instead of
  a bare CN, both at image-build and install-time-fallback.
- Mesh.vue/MediaLightbox.vue/mesh-styles.css: mic/attach-stack no longer
  closes on a plain hover-past; mesh images open in the shared lightbox
  and have a real download button; lightbox close button moves to
  bottom-center on mobile instead of under the status bar; mesh device
  panel gets the same height/padding as its sibling tabs.
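
The first item in the list (binary control envelopes mangled by lossy UTF-8 conversion) can be illustrated with a marker-prefixed base64 wrapper. This is a sketch under assumptions: the `b64:` marker and the function names are hypothetical (the commit message does not give the real marker), and base64 is hand-rolled here only so the example has no dependencies.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical marker that flags "the rest is base64-encoded binary".
const MARKER: &str = "b64:";

fn b64_encode(data: &[u8]) -> String {
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | (b2 as u32);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

fn b64_decode(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc = 0u32;
    let mut bits = 0u32;
    for c in text.bytes().filter(|c| *c != b'=') {
        // Any character outside the alphabet makes the whole decode fail.
        let value = ALPHABET.iter().position(|a| *a == c)? as u32;
        acc = acc << 6 | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
        }
    }
    Some(out)
}

/// Sender side: wrap binary bytes so they survive a text-only transport.
pub fn encode_for_text(payload: &[u8]) -> String {
    format!("{}{}", MARKER, b64_encode(payload))
}

/// Receiver side: unwrap marked payloads, pass ordinary text through as-is.
pub fn decode_from_text(text: &str) -> Vec<u8> {
    match text.strip_prefix(MARKER).and_then(b64_decode) {
        Some(bytes) => bytes,
        None => text.as_bytes().to_vec(),
    }
}
```

One caveat of this sketch: an ordinary chat message that happens to start with the marker and contain only base64 characters would be decoded too, so a real marker should be something users cannot type by accident.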

Verified: 108/108 mesh unit tests, deployed + confirmed healthy on
.116/.198/.228 (matching binary hash across all three), live Reticulum
messaging confirmed working end-to-end post-deploy.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 10:42:20 -04:00
2a6e624189 toggle for wifi and switch-router on openwrt page
2026-07-01 14:06:02 +00:00
archipelago
99cd82ab0a fix(ui): catch useAudioPlayer's play() rejection instead of leaving it unhandled
play() on the underlying <audio> element rejects independently of its
'error' event (e.g. NotSupportedError when a peer-content request 404s and
there's no decodable source) — the 'error' listener already sets a friendly
message, but the unawaited play() promise still surfaced as a raw unhandled
rejection in the console. Follow-up from the .116->.228 peer-content
investigation (2026-07-01).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 09:53:50 -04:00
e497f8fed1 feat(home): surface TollGate status on the Network tile
Add a TollGate row (Enabled/Disabled/Not installed) to the Home
dashboard's Network tile, polling the existing openwrt.get-status RPC
on the same cadence as the other network rows. Only rendered once an
OpenWrt router is actually configured, so nodes without one aren't
cluttered with an always-"Not configured" row.

Also fixes the underlying reason this could never have worked: nothing
in the OpenWrt Gateway flow ever persisted the router's host/credentials
server-side — the "connect" form only kept them in local component
state, so any no-args openwrt.get-status call (this new tile, and even
the Gateway page's own reload) always failed with "No router
configured" despite a fully working, provisioned router. Now
handle_openwrt_get_status saves the connection to router_config.json
whenever a host is explicitly passed in and the connection succeeds.
2026-07-01 13:25:43 +00:00
archipelago
5269d50039 docs: record .198 cleanup outcome + .228 fedimint-guardian clarification
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 09:20:15 -04:00
archipelago
09d42cbbf7 fix(orchestrator): immich uninstall must disable its sibling app_ids too
orchestrator_uninstall_app_ids("immich") only disabled the "immich" app_id
itself; "immich-postgres" and "immich-redis" (separate orchestrator-tracked
manifests, same pattern as mempool-api/archy-mempool-db) stayed enabled, so
the boot reconciler kept restarting their leftover stopped containers
forever after the generic uninstall path stopped them (.198, 2026-07-01 --
found while uninstalling immich to relieve disk I/O pressure competing with
a slow Bitcoin IBD).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 09:12:13 -04:00
archipelago
d0710e7491 fix(orchestrator,content): bound repair-recreate loops; self-heal stale content catalog entries
- prod_orchestrator.rs: the boot reconciler's zombie-guard and start-failed
  recreate paths (Created/Stopped/Exited states) had no attempt cap, unlike
  health_monitor's independent restart tracker. A container whose entrypoint
  fatally crashes right after `podman start` succeeds got stop+remove+
  install_fresh'd every ~30s reconcile tick forever (portainer on .198,
  2026-07-01: a DB schema newer than the pinned binary could read -- no
  amount of recreating fixes that). Added a 5-attempts/30-minute circuit
  breaker; once exhausted the container is left alone with an error! log
  instead of looping, and an explicit install/start clears the counter.
- content_server.rs: serve_content now prunes a catalog entry whose backing
  file is missing on disk, instead of leaving it advertised to every peer
  forever with no way to distinguish "gone" from "transient failure."
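
The 5-attempts/30-minute circuit breaker from the first item might look roughly like this. It is a sketch, not the code in prod_orchestrator.rs: `RecreateBreaker` and its methods are hypothetical names, and a sliding window is assumed because the message does not say how the 30 minutes are measured.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Caps how often the reconciler may recreate one container.
pub struct RecreateBreaker {
    max_attempts: usize,
    window: Duration,
    attempts: HashMap<String, Vec<Instant>>,
}

impl RecreateBreaker {
    pub fn new(max_attempts: usize, window: Duration) -> Self {
        RecreateBreaker { max_attempts, window, attempts: HashMap::new() }
    }

    /// Returns true and records the attempt if a recreate is still allowed.
    /// Returns false once the budget inside the window is used up; the
    /// caller then logs an error and leaves the container alone.
    pub fn try_recreate(&mut self, container: &str, now: Instant) -> bool {
        let (max_attempts, window) = (self.max_attempts, self.window);
        let recent = self.attempts.entry(container.to_string()).or_default();
        // Forget attempts that have aged out of the window.
        recent.retain(|at| now.duration_since(*at) < window);
        if recent.len() >= max_attempts {
            return false;
        }
        recent.push(now);
        true
    }

    /// An explicit install/start by the user clears the counter.
    pub fn reset(&mut self, container: &str) {
        self.attempts.remove(container);
    }
}
```

With `RecreateBreaker::new(5, Duration::from_secs(30 * 60))` and a ~30s reconcile tick, a crash-looping container is recreated five times in about two minutes and then left alone until the oldest attempt ages out or the user intervenes.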

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 08:19:54 -04:00
d6c1feca97 fix(openwrt): fix TollGate provisioning pipeline, add reconfigure UI
Several compounding bugs were blocking end-to-end TollGate provisioning
on OpenWrt 25.x (apk-native) routers:

- install_ipk's non-ar fallback assumed a flat tarball, but some .ipks are
  a gzip tar of the three classic ipk members one level deep; it was
  dumping debian-binary/data.tar.gz/control.tar.gz straight into / instead
  of unpacking the real payload.
- Manually-extracted packages never ran their pending /etc/uci-defaults/*
  scripts (that only happens through opkg/apk's own postinst bookkeeping),
  so nothing ever created /etc/config/tollgate.
- uci_apply() never ensured the target config file existed first — `uci
  set` fails outright on a config namespace nothing has created yet, which
  is true for a package-defined one like "tollgate" (unlike wireless/
  network/dhcp, which ship by default).
- The installed-check and restart_services looked for a binary/init script
  named after the opkg package ("tollgate-module-basic-go"/"tollgate"),
  but the real on-disk names are tollgate-wrt — so status always reported
  "not installed" and service restarts silently no-op'd.
- provision_ssid used `uci add`, creating a new wifi-iface section (and
  therefore a new duplicate broadcast SSID) on every provision call instead
  of updating one in place.

Also adds a TollGateConfig.enabled field so the enable/disable state is
actually applied to the running service and the SSID's own broadcast
(stop + disable at boot, or start + enable), not just written to UCI.

On the frontend, the OpenWrt Gateway page's TollGate panel was read-only
once installed — add an edit form (price, step size, min steps, mint URL,
enabled toggle) that reuses the same idempotent provision-tollgate call.
2026-07-01 11:59:43 +00:00
1866c40edf fix(openwrt): detect radios and scan networks on vendor MediaTek drivers
Routers running MediaTek's proprietary mt_wifi SDK driver (e.g. GL.iNet)
never register with cfg80211/mac80211, so they have no `iw dev` entry and
no /sys/class/ieee80211 phy even though the radio is real and working —
find_wireless_iface was bailing with "No wireless radio found" on these.
Fall back to iwinfo's device listing, which abstracts over vendor backends
too, and to the vendor's iwpriv site-survey ioctl for scanning when iwinfo
itself can't trigger a scan on the interface.
2026-07-01 11:59:28 +00:00
6299e91544 fix(kiosk): stop HDMI mode detection from perpetuating a bad clone state
configure_display picked whichever mode was already "active" on the HDMI
output, so if X ever booted cloned to the laptop panel's resolution it
would keep re-confirming that wrong mode forever instead of self-healing
to the display's native mode.
2026-07-01 11:59:21 +00:00
archipelago
d414ae3daa fix(orchestrator,ui): stop crash-looping orphan stack members; dedupe Electrum launch overlay
- crash_recovery.rs: stack boot/runtime recovery (immich/indeedhub/netbird) now
  requires the stack's core dependency container to exist before touching any
  sibling, instead of firing on any leftover container. Fixes an infinite
  120s-interval crash loop where orphan debris from a partial/failed install
  (indeedhub-api with no indeedhub-postgres ever created) was repeatedly
  force-restarted against a dependency that doesn't exist, which also blocked
  a real reinstall via container name conflicts.
- AppSessionFrame.vue: the generic app-loading overlay and the ElectrumX
  sync-in-progress overlay could render simultaneously (same z-index) during
  launch. The sync screen is strictly more informative, so it now takes
  precedence instead of the two stacking on top of each other.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 07:02:01 -04:00
archipelago
5b7cd5d5d0 fix(orchestrator): durable uninstall marker for baseline apps + archival-bitcoin/version-report gaps
- mempool-api now declares dependencies:[bitcoin:archival] directly, closing a
  gap where installing it standalone (a legitimate direct orchestrator-install
  target) bypassed the mempool umbrella's pruning gate entirely.
- New durable user-uninstalled marker (crash_recovery.rs, mirrors user_stopped)
  stops required-baseline-app self-heal (bitcoin-knots/electrumx/lnd/mempool/
  etc.) from resurrecting an app after an explicit uninstall. The marker
  survives a restart or reboot, unlike the in-memory disabled set, which is
  wiped by every load_manifests().
- installed_version() (set_config.rs) no longer trusts a floating image tag
  ("latest") as the reported running version -- a stale local :latest cache
  reported "latest" forever regardless of what latest had moved on to. Now
  falls back to asking the Bitcoin backend directly via `bitcoind --version`
  when the tag is floating.
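
A minimal sketch of the floating-tag fallback, assuming only "latest" counts as floating (the one tag the commit names). `image_tag` and `reported_version` are hypothetical helpers, and the closure stands in for the `bitcoind --version` probe.

```rust
/// Extracts the tag from an image reference, ignoring any digest and any
/// registry port (a ':' that appears before the last '/').
fn image_tag(image: &str) -> Option<&str> {
    let name = image.split('@').next().unwrap_or(image);
    let last_segment = name.rsplit('/').next().unwrap_or(name);
    last_segment.rsplit_once(':').map(|(_, tag)| tag)
}

/// Trusts a pinned tag as the version; asks the backend when the tag is
/// floating or missing.
fn reported_version(
    image: &str,
    probe_backend: impl FnOnce() -> Option<String>,
) -> Option<String> {
    match image_tag(image) {
        Some(tag) if tag != "latest" => Some(tag.to_string()),
        _ => probe_backend(),
    }
}
```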

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 06:29:11 -04:00
archipelago
de8b2bb812 fix(iso): raise /tmp tmpfs cap above systemd's 50%-of-RAM default
PyInstaller-based daemons (e.g. the Reticulum mesh daemon) self-extract
to /tmp on every launch and leak their extraction dir on abnormal exit;
combined with release-build staging dirs this exhausted the default cap
on .116 and silently broke Reticulum ("no space left on device" during
self-extraction, masquerading as a connect failure). Ship a tmp.mount.d
override (75% of RAM) in the installer image so fresh installs don't
inherit the same ceiling.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 04:53:52 -04:00
archipelago
306b6356ee fix(orchestrator): generalize launch-port fallback + archival-bitcoin dependency gating
Master-plan backlog §10b/§10c: replace two per-app-hardcoded lookups with
generic, manifest-driven behavior so future apps are covered automatically
instead of needing a code edit.

- extract_lan_address (docker_packages.rs) now skips container-side ports
  that are known non-HTTP (SSH, FTP, common DB ports) instead of blindly
  taking podman's first-listed port. Fixes the whole class of bug the gitea
  SSH-before-web static override was a one-off patch for.
- requires_unpruned_bitcoin (dependencies.rs) now checks the app's own
  manifest for a `bitcoin:archival` dependency declaration first, falling
  back to the old hardcoded id list. electrumx and mempool manifests now
  declare it explicitly as the proof case.
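
The port-skipping rule in the first bullet might look like the sketch below. The port list and the `pick_launch_port` name are assumptions for illustration; the real list lives in docker_packages.rs and may differ.

```rust
/// Container-side ports assumed to be non-HTTP (SSH, FTP, common databases).
/// Illustrative list only.
const NON_HTTP_PORTS: &[u16] = &[21, 22, 3306, 5432, 6379, 27017];

/// `mappings` holds (host_port, container_port) pairs in the order the
/// container runtime lists them. Returns the host port of the first
/// mapping whose container side is not a known non-HTTP port.
fn pick_launch_port(mappings: &[(u16, u16)]) -> Option<u16> {
    mappings
        .iter()
        .find(|(_, container)| !NON_HTTP_PORTS.contains(container))
        .map(|(host, _)| *host)
}
```

For a gitea-style mapping where SSH is listed first, `[(2222, 22), (3000, 3000)]`, this picks 3000 without any per-app override.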

869/869 Rust tests green, catalog drift clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-07-01 03:59:00 -04:00
archipelago
46dae75a0f feat(mesh): device onboarding modal (backlog #6)
Guided prompt that pops up when a mesh radio is detected but not yet
connected -- wraps the existing Connect action (mesh.configure with
device_path) rather than building a new setup engine. Dismissible per
device path (won't re-prompt for the same undismissed-but-ignored device on
every poll tick). Not the whole-app identity/seed onboarding system
(useOnboarding.ts) -- confirmed unrelated, this is mesh-specific only.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 23:24:12 -04:00
archipelago
712df2278f feat(mesh): Meshtastic provisioning robustness (backlog #12)
Three fixes:
1. Modem-preset authoritative: parse_config_lora_region now also decodes
   modem_preset (field 2) alongside region, tracked as current_modem_preset.
   ensure_lora_region's "region already set, don't touch it" branch (correct,
   unchanged) now ALSO re-asserts LONG_FAST when a real observed preset has
   drifted -- previously modem_preset only ever got written when region was
   UNSET, so a radio with the right region but wrong preset was never fixed.
   Only acts on an actually-observed wrong value (never speculative), so it
   can't reboot-loop.
2. RX-stall watchdog: run_mesh_session now bails (triggering the existing
   auto-reconnect path) if no frame has been successfully received in 5
   minutes -- the existing consecutive_write_failures counter is blind to a
   receive-only stall (writes can keep succeeding while inbound streaming is
   wedged).
3. Hot-swap detection: spawn_mesh_listener now compares self_node_id across
   session restarts and logs clearly when the physical radio itself changed
   (not just an ordinary reconnect of the same board). Per-session device
   state (contacts, current_region, etc.) was already naturally isolated
   per-session (fresh struct each reconnect) -- nothing else needed clearing.
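
The RX-stall watchdog from item 2 can be sketched as a small timer that is reset on every received frame. `RxWatchdog` is a hypothetical type; only the 5-minute threshold comes from the commit.

```rust
use std::time::{Duration, Instant};

/// No inbound frame for this long counts as a stall.
const RX_STALL_TIMEOUT: Duration = Duration::from_secs(5 * 60);

struct RxWatchdog {
    last_rx: Instant,
}

impl RxWatchdog {
    fn new(now: Instant) -> Self {
        Self { last_rx: now }
    }

    /// Call whenever a frame has been successfully received.
    fn on_frame_received(&mut self, now: Instant) {
        self.last_rx = now;
    }

    /// True when the session should bail and let auto-reconnect take over.
    fn stalled(&self, now: Instant) -> bool {
        now.duration_since(self.last_rx) >= RX_STALL_TIMEOUT
    }
}
```

Passing `now` in explicitly keeps the check independent of write activity, which is the point: successful writes never reset it.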

107/107 mesh tests pass (2 new: modem_preset decode + the
absent-field-defaults-to-LONG_FAST case).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 23:21:29 -04:00
archipelago
494f272815 feat(mesh): Device settings tab (backlog #8)
New MeshDevicePanel.vue, added as a 4th/5th tab entry to activeTab/toolsTab/
mobileTab following the exact existing pattern (chat/bitcoin/deadman/
assistant/map). Shows firmware version, node ID, advert name, LoRa region,
channel, and device type -- firmware_version/self_node_id were already
server-side but never rendered; region is new (composed into MeshStatus from
MeshConfig.lora_region at read time, not part of the live session state).
Reboot button wired to the already-working mesh.reboot-radio RPC.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 23:03:09 -04:00
archipelago
4a309a3ee4 feat(mesh): RSSI/SNR dBm tooltip on the existing signal-bars indicator
The bars UI (signalBars/.mesh-signal-bars) was already built and wired to
mp.primary_rssi -- it just needed real backend data, which the previous
commit provides. Adds primary_snr alongside primary_rssi in MergedPeer and a
hover tooltip showing exact dBm/SNR values.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 22:54:51 -04:00
archipelago
02b6b52a8c feat(mesh): Meshtastic RSSI/SNR + peer-location map wiring (backlog #14/#15, part 1)
Backend: parse_mesh_packet now decodes MeshPacket.rx_snr (field 8, float) and
rx_rssi (field 12, int32), and a new POSITION_APP branch decodes Position.
latitude_i/longitude_i (fields 1/2, sfixed32) -- all field numbers confirmed
against the canonical meshtastic/protobufs mesh.proto, not guessed. Threaded
through ParsedContact -> refresh_contacts -> MeshPeer (mirroring how
pkc_capable was wired for #17), so mesh.peers now surfaces real rssi/snr/lat/
lon instead of always-null. Fixed a real bug found along the way:
update_node_info's unconditional contact replace would have silently wiped
any already-tracked signal/position data on the next NodeInfo packet -- now
preserves it.
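
For readers unfamiliar with the protobuf wire format, here is a hand-rolled sketch of pulling rx_snr (field 8, 32-bit float) and rx_rssi (field 12, int32 varint) out of a serialized message. It is not the project's parse_mesh_packet; `Signal`, `read_varint`, and `parse_signal` are hypothetical, and all other fields are simply skipped.

```rust
#[derive(Debug, Default, PartialEq)]
struct Signal {
    rx_snr: Option<f32>,
    rx_rssi: Option<i32>,
}

/// Reads a base-128 varint (at most 10 bytes) and advances `pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
    }
    None
}

fn parse_signal(buf: &[u8]) -> Option<Signal> {
    let mut out = Signal::default();
    let mut pos = 0;
    while pos < buf.len() {
        let key = read_varint(buf, &mut pos)?;
        let (field, wire_type) = (key >> 3, key & 7);
        match wire_type {
            0 => {
                let v = read_varint(buf, &mut pos)?;
                if field == 12 {
                    // Negative int32 values arrive sign-extended to 64 bits.
                    out.rx_rssi = Some(v as i32);
                }
            }
            1 => pos = pos.checked_add(8)?, // 64-bit fixed, skipped
            2 => {
                let len = read_varint(buf, &mut pos)? as usize;
                pos = pos.checked_add(len)?; // length-delimited, skipped
            }
            5 => {
                let b = buf.get(pos..pos.checked_add(4)?)?;
                if field == 8 {
                    out.rx_snr = Some(f32::from_le_bytes([b[0], b[1], b[2], b[3]]));
                }
                pos += 4;
            }
            _ => return None,
        }
        if pos > buf.len() {
            return None; // a skipped field ran past the end of the buffer
        }
    }
    Some(out)
}
```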

Frontend: mesh.ts's updateNodePositionsFromPeers() feeds real position data
into the SAME nodePositions map MeshMap.vue already renders from (parallel to
the existing Coordinate/Alert-message path) -- MeshMap.vue itself needed zero
changes, it was already built for this.

105/105 mesh tests pass (4 new: rx_snr/rx_rssi decode, position decode +
incomplete-field handling, full packet_to_inbound_frame integration).

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 22:52:42 -04:00
archipelago
dfca007949 wip(mesh): parse MeshPacket rx_snr/rx_rssi fields (Meshtastic backlog #14, part 1/many)
Field numbers confirmed against the canonical meshtastic/protobufs mesh.proto
(rx_snr=8 float, rx_rssi=12 int32), not guessed. Not yet threaded through to
ParsedContact/MeshPeer/mesh.peers — that's the next step. Part of the
Meshtastic 1.8.0 backlog plan (RSSI/SNR indicator, peer-location map, Device
tab, provisioning robustness, onboarding modal) — see
.claude/plans/floofy-riding-seahorse.md for the full plan.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 22:27:54 -04:00
archipelago
0eb5c258f5 fix(mesh): Meshtastic 3ccc pkc_capable pill + Sideband image interop + critical CBOR wire-bloat fix
Merges in the meshtastic agent's now-finished work alongside this session's
continuation: stock-peer (3ccc) PKI-capability is now stamped through
get_contacts -> refresh_contacts -> MeshPeer.pkc_capable, so a directed DM to/from
a PKC-capable stock Meshtastic peer correctly shows the E2E pill on the Sent row,
not just received messages. Confirmed live: .198 sees "Meshtastic 3ccc" with
pkc_capable=true.

Also fixes two real interop/correctness bugs found while live-testing the
Reticulum <-> Sideband link:
  - Receive: the daemon only ever read LXMF's plain-text content, silently
    dropping native FIELD_IMAGE/FIELD_FILE_ATTACHMENTS fields — a stock
    Sideband/NomadNet photo vanished into a blank-space message. Now decoded
    into the same ContentInline typed envelope our own attachments use.
  - Send: images to a non-archy (stock) peer now use native LXMF FIELD_IMAGE
    instead of our own opaque CBOR wire format, which Sideband can't decode.
  - Root cause of a garbled MC-chunk-fragment bug: TypedEnvelope.v/.sig (the
    OUTER wrapper every message type uses) serialized raw bytes as a CBOR
    array-of-integers instead of a native byte string, bloating every
    message on the wire ~2-3.5x — enough to push even a tiny ReadReceipt
    over the 140-byte single-frame chunking threshold. Root-caused by
    reading ciborium's deserializer source directly (deserialize_bytes only
    works within its internal scratch buffer; deserialize_byte_buf streams
    unbounded).
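
To see why the array-of-integers form bloats the wire, the sketch below computes both encoded sizes by hand from the CBOR head rules (RFC 8949). It does not use ciborium and does not reproduce the project's measured ratios; the function names are hypothetical.

```rust
/// Length of a CBOR head (major type plus argument) for a given argument.
fn head_len(arg: u64) -> usize {
    match arg {
        0..=23 => 1,
        24..=0xff => 2,
        0x100..=0xffff => 3,
        0x1_0000..=0xffff_ffff => 5,
        _ => 9,
    }
}

/// Size when the bytes are written as one byte string (major type 2).
fn size_as_byte_string(data: &[u8]) -> usize {
    head_len(data.len() as u64) + data.len()
}

/// Size when each byte is written as its own unsigned integer inside an
/// array (major type 4 holding major type 0 items).
fn size_as_int_array(data: &[u8]) -> usize {
    let items: usize = data.iter().map(|&b| head_len(u64::from(b))).sum();
    head_len(data.len() as u64) + items
}
```

Any byte value of 24 or more costs two bytes as an integer, so a 64-byte signature of mostly high values nearly doubles in size before any outer wrapping is counted.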

Frontend: consolidated the attach/record buttons into a single animated "+"
menu (was overflowing the compose row).

857/857 tests pass. Verified live across all 5 deploy-roster nodes.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 22:07:45 -04:00
archipelago
f54c853128 feat(mesh): Reticulum LoRa hardware gates pass + RNS Resource transfer + image/voice attachments
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.

Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 19:57:01 -04:00
f3cbeb2834 first commit of openwrt-tollgate integration 2026-06-30 20:30:26 +00:00
archipelago
12e7990b10 fix(mesh): route Meshtastic public-channel text to the channel thread, not DMs
Inbound Meshtastic text addressed to BROADCAST_NUM (the default public
LongFast channel, or any channel slot) was filed into a per-sender 1:1 DM
thread, so public-channel messages polluted individual people's DM chats
and appeared as if sent directly to the user.

packet_to_inbound_frame now detects `to == BROADCAST_NUM` and emits a new
synthetic RESP_MESHTASTIC_CHANNEL_TEXT frame
([channel_idx][sender_prefix(6)][text]) that the listener files under the
channel thread (contact_id = u32::MAX - idx) while still attributing the
message to its real sender. Directed text (to == our node) still routes to
the DM thread — a regression test locks that split in.
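
A rough sketch of the broadcast/direct split and the synthetic frame payload. It assumes BROADCAST_NUM is 0xFFFFFFFF (my understanding of the Meshtastic broadcast address, not stated in the commit); `Thread`, `thread_for`, and `encode_channel_text` are hypothetical names, and any response-code byte in front of the payload is omitted.

```rust
/// Assumed value of the Meshtastic broadcast address.
const BROADCAST_NUM: u32 = u32::MAX;
const SENDER_PREFIX_LEN: usize = 6;

#[derive(Debug, PartialEq)]
enum Thread {
    Channel { contact_id: u32 },
    Direct { peer: u32 },
}

/// Broadcast text goes to the channel thread; directed text stays a DM.
fn thread_for(to: u32, from: u32, channel_idx: u8) -> Thread {
    if to == BROADCAST_NUM {
        Thread::Channel { contact_id: u32::MAX - u32::from(channel_idx) }
    } else {
        Thread::Direct { peer: from }
    }
}

/// Payload layout: [channel_idx][sender_prefix(6)][text].
fn encode_channel_text(
    channel_idx: u8,
    sender_prefix: [u8; SENDER_PREFIX_LEN],
    text: &str,
) -> Vec<u8> {
    let mut frame = Vec::with_capacity(1 + SENDER_PREFIX_LEN + text.len());
    frame.push(channel_idx);
    frame.extend_from_slice(&sender_prefix);
    frame.extend_from_slice(text.as_bytes());
    frame
}
```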

send_channel_text now sets MeshPacket.channel (field 3) so archy actually
transmits on channel 0 (public) instead of ignoring the slot. Mesh.vue keeps
the synthetic "Meshtastic !xxxx" sender id when that is the best identity
available for a stock public-channel device.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 14:33:30 -04:00
edbad30501 fix(openwrt): TollGate apk-native install for OpenWrt 25.x
- WISP wizard: step-by-step flow for WiFi, DHCP, masquerade config
- WAN status: expose lan_ip, dhcp_start/limit, masq, sta_state, wifi_log
- wifi_scan: detect CCMP as WPA2 (psk2) so association succeeds
- opkg: PkgManager enum — detect apk-native mode when opkg not in repos
- tollgate: apk-native install path using manual ipk extraction
- arch detection: read DISTRIB_ARCH from /etc/openwrt_release; normalise
  bare mipsel/mips from uname -m to mipsel_24kc/mips_24kc
- install_ipk: install binutils via apk when ar not in BusyBox
- install_ipk: wget --no-check-certificate for routers without CA bundle
- install_ipk: ar fallback to tar -xzf for non-standard ipk formats
- install_ipk: 5MB overlay space check with clear user-facing error
- middleware: allow "Not enough flash/space" errors through sanitizer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
a862877189 feat(openwrt): add WAN diagnostics to get-status and UI
get_wan_status now returns: radio0_disabled, sta_iface (from iw dev),
sta_state (operstate), assoc_ssid (actually associated SSID vs
configured), and recent wifi_log lines from logread. The WAN panel
shows a diagnostic grid when configured but not connected so the user
can see exactly what's wrong without digging into server logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
33b96f4acf fix(openwrt): enable radio0 when configuring WISP
configure_wisp was setting up wireless.wwan but leaving
radio0.disabled=1, so wifi reload did nothing and the sta
interface never appeared. Explicitly set radio0.disabled=0
before committing the wireless UCI config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
5ab569f150 fix(openwrt): use iw phy interface add for scan when no UCI wifi-iface exists
wifi up does nothing without a wifi-iface section in UCI (common on fresh
flash). Instead, create a temporary managed interface directly on phy0
via nl80211 (iw phy phy0 interface add scan0 type managed), scan on it,
then delete it. No netifd/UCI involvement needed for scanning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
9dc2343b60 fix(openwrt): enable radio0 and run wifi up before scanning
On a freshly-flashed OpenWrt router, radio0 is disabled by default so
iw dev returns empty. Detect the PHY via /sys/class/ieee80211/, enable
radio0, run `wifi up`, then poll up to 8s for netifd to create the
virtual interface before handing it to iwinfo scan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
ddc839400a fix(openwrt): use iw dev for wireless interface detection
wlan0 doesn't exist on OpenWrt 25.x with mt76 drivers (Cudy TR1200);
interfaces are named phy0-ap0 etc. `iw dev` handles all mac80211
naming styles. The old while-read loop also exited with code 1 when
no match was found, causing run_ok to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
9a782fb551 feat(openwrt): WAN/WISP setup from the UI with WiFi network scan
New RPC methods:
- openwrt.scan-wifi: triggers iwinfo scan on the router radio,
  returns networks sorted by signal strength
- openwrt.configure-wan: creates UCI wireless.wwan (sta mode) +
  network.wwan (DHCP) + adds wwan to firewall WAN zone, then
  calls `wifi reload`

get-status now includes a `wan` object with configured/ssid/ip/
internet fields so the UI can show current uplink state.

Frontend WAN panel: scan → pick SSID (signal bars) → enter password
→ apply. Shows "Configure WAN first" hint above TollGate install
button when internet is not available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
dd3a3dfbac fix(openwrt): capture apk stderr and run apk update before apk add opkg
apk errors were being silently dropped (stdout only). Run apk update
first and fail with a clear "router may have no internet" message if
it fails, rather than a cryptic exit-1 from apk add.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
5d82e6ff8d fix(openwrt): bootstrap opkg via apk on OpenWrt 25.x routers
OpenWrt 25.x switched from opkg to apk as the default package manager,
so devices like the Cudy TR1200 on 25.12.4 don't have /usr/bin/opkg.
When opkg is missing but apk is present, install opkg through apk first
so the rest of the provisioning flow can proceed unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
58266dea66 fix(openwrt): allow opkg-not-found error through RPC sanitizer
"opkg not found at /usr/bin/opkg" was being swallowed by the error
sanitizer and shown as generic "Operation failed". Also fix bare
`opkg list-installed` call in get-status handler to use full path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
bc1ec9aa3e fix(openwrt): use full opkg path and pre-check availability
`channel.exec()` doesn't source the shell profile, so PATH may not
include /usr/bin on some routers. Using /usr/bin/opkg explicitly
avoids exit-127 surprises. Added opkg_check() to give a clear error
("firmware may not support package management") before attempting
opkg_update, rather than a confusing "command not found" exit code.
Also split the BusyBox-hostile `grep -v 'all\|noarch'` into two
separate greps for the arch-detection fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
4c56e1bb96 fix(openwrt): detect opkg silent failure and show disk space error
BusyBox opkg exits 0 even when it prints 'Cannot install' due to insufficient
space, causing the fallback to silently report success. Now captures stderr and
checks for the failure string explicitly.
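
The check amounts to not trusting the exit code alone. A minimal sketch, with `check_opkg_install` as a hypothetical helper and 'Cannot install' as the failure string quoted above:

```rust
/// Treats the install as failed when the exit code is non-zero OR the
/// output contains the known failure string, since opkg may exit 0 anyway.
fn check_opkg_install(exit_code: i32, stdout: &str, stderr: &str) -> Result<(), String> {
    if exit_code != 0 {
        return Err(format!("opkg exited with code {exit_code}: {}", stderr.trim()));
    }
    if stdout.contains("Cannot install") || stderr.contains("Cannot install") {
        return Err(format!("opkg reported failure despite exit 0: {}", stderr.trim()));
    }
    Ok(())
}
```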

Adds user-visible error for the common case where the router flash is too
small for the TollGate package (~19 MB needed vs ~9 MB available on typical
budget routers). Adds error prefixes to the RPC sanitizer allowlist so the
message reaches the UI instead of showing 'Check server logs'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
f69fac627a style(openwrt): adopt glass-card design system for contrast
Replace bg-white/5 card containers with glass-card (rgba(0,0,0,0.65) +
backdrop-blur), match input styling to Login.vue, and use glass-button
variants for actions. Fixes low contrast against the background image.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
f054766a58 feat(openwrt): add TollGate provision button and direct-download fallback
- OpenWrtGateway.vue: add "Install TollGate" button when not installed;
  tracks connected credentials for reuse in the provision call
- install.rs: fall back to wget download from GitHub releases when the
  package is not in any opkg feed (mips_24kc and other arches supported)
- openwrt.rs: provision-tollgate now falls back to saved router_config
  for credentials, matching the behaviour of get-status

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
6c534715ec fix(openwrt): allow No router/OpenWrt errors through RPC sanitizer
Without these prefixes in the allowlist, sanitize_error_message swallowed
the "No router configured" error and returned a generic "Operation failed",
so the frontend could never detect the unconfigured state and show the
connect form.
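
A sketch of a prefix allowlist of the kind described. The body and the exact prefix strings are assumptions based on the commit title; only the function name sanitize_error_message and the generic "Operation failed" fallback come from the message above.

```rust
/// Error prefixes allowed to reach the frontend verbatim (illustrative).
const ALLOWED_PREFIXES: &[&str] = &["No router", "OpenWrt"];

/// Passes allowlisted errors through; everything else is replaced by a
/// generic message so internal details do not leak to the UI.
fn sanitize_error_message(raw: &str) -> String {
    if ALLOWED_PREFIXES.iter().any(|p| raw.starts_with(*p)) {
        raw.to_string()
    } else {
        "Operation failed".to_string()
    }
}
```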

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
d71f36370d feat(openwrt): add OpenWrt gateway status view and get-status RPC
Backend: new `openwrt.get-status` RPC endpoint SSHes into the saved (or
provided) OpenWrt router and returns system info, TollGate config, and WiFi
AP interfaces via UCI.

Frontend: new OpenWrtGateway.vue view at /dashboard/server/openwrt shows
system hostname, OpenWrt version, uptime, TollGate install/enable state with
pricing and mint URL, and all AP-mode WiFi interfaces. Linked from the Local
Network section of the Server view.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
e0cc00be0f feat(openwrt): add archipelago-openwrt crate with TollGate provisioning
New `archipelago-openwrt` workspace crate provides SSH/UCI-based management
of OpenWrt routers, including automated TollGate installation and configuration
of a pay-as-you-go "archipelago" SSID backed by the local Cashu mint.

Exposes two RPC endpoints:
- `openwrt.scan` — discover OpenWrt routers on the LAN
- `openwrt.provision-tollgate` — install tollgate-module-basic-go, write UCI
  config (TIP-01/TIP-02), and create isolated WiFi SSID + firewall zone

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-30 17:12:57 +00:00
archipelago
f392670e2a feat(mesh): show sender identity on received channel messages
Received messages snapshot peer_name at receive time, so a Meshtastic
text that arrived before its sender's NodeInfo was stuck showing the
synthetic "Meshtastic !xxxx" id forever, and channel/group bubbles
showed no sender at all. Add a per-bubble sender label for received
messages in multi-sender views (mesh + Archipelago channels), resolved
LIVE from the peer table so it always shows the current archy identity
(e.g. "Arch Optiplex") the moment NodeInfo is learned. Falls back to
"Unknown sender" rather than echoing a Channel/synthetic placeholder.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:04:41 -04:00
archipelago
a57ae388ec fix(mesh): restore Meshtastic inbound stream after radio reboot
archy went deaf to inbound LoRa packets after every config write.
A config write (region/channel/owner) reboots the radio, which resets
the firmware PhoneAPI to STATE_SEND_NOTHING; it won't stream received
packets again until the client re-sends want_config. archy ignored
FromRadio.rebooted (field 8) so never resubscribed — which is why old
messages only arrived after a full restart (restart = fresh want_config).

- meshtastic.rs: handle FROM_RADIO_REBOOTED -> set pending_reinit;
  try_recv_frame re-sends want_config to resubscribe the packet stream.
  Add send_keepalive (bare heartbeat) and pin modem_preset=LONG_FAST in
  set_lora_region so all radios share frequency.
- listener/session.rs: MeshRadioDevice::send_keepalive; 10s sync_timer
  sends a keepalive each tick (insurance vs 15-min idle serial close).
- mod.rs send_message: device-aware send — Meshtastic archy peers get a
  plain TEXT_MESSAGE_APP DM (firmware PKC E2E); Meshcore archy peers keep
  the typed envelope (no meshcore regression).
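
The resubscribe logic in the first bullet reduces to a one-shot flag. The sketch below uses hypothetical `FromRadio`, `ToRadio`, and `Session` types rather than the real driver structs.

```rust
enum FromRadio {
    Rebooted,
    Packet(Vec<u8>),
}

#[derive(Debug, PartialEq)]
enum ToRadio {
    WantConfig,
}

#[derive(Default)]
struct Session {
    pending_reinit: bool,
}

impl Session {
    /// A reboot notice means the firmware stopped streaming packets.
    fn on_from_radio(&mut self, msg: &FromRadio) {
        if matches!(msg, FromRadio::Rebooted) {
            self.pending_reinit = true;
        }
    }

    /// Called from the receive loop: re-sends want_config exactly once
    /// per observed reboot, then clears the flag.
    fn next_outbound(&mut self) -> Option<ToRadio> {
        if std::mem::take(&mut self.pending_reinit) {
            Some(ToRadio::WantConfig)
        } else {
            None
        }
    }
}
```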

Verified: .198->.228 directed DM arrives as RECEIVED enc=True
peer="Arch Optiplex"; all 3 nodes (.116/.198/.228) + 3ccc hear each
other. Binary 737b16c3 deployed+active on all three.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:44:31 -04:00
archipelago
fbfeeeb0f5 fix(mesh): native E2E DM for archy↔archy text + software radio-reboot
- send_message now sends archy↔archy plain text as a native TEXT_MESSAGE_APP
  DM (firmware PKC-encrypts E2E), not wrapped in the binary typed envelope
  that silently broke archy↔archy LoRa delivery. Archy peers' Sent rows are
  marked encrypted so the E2E pill shows; rich typed msgs still use the
  typed-wire path.
- Add a software radio-reboot to recover a wedged/RX-deaf radio without
  physical access (and for the Device-tab settings panel): driver reboot()
  via AdminMessage reboot_seconds=97 (verified vs meshtastic/protobufs),
  MeshCommand::RebootRadio, MeshService::reboot_radio, RPC mesh.reboot-radio.
- Handoff doc: docs/SESSION-1.8.0-OTA-PROGRESS.md "RESUME HERE" — RF link is
  the proven blocker (radios not hearing each other); modem_preset mismatch
  is the prime suspect; on-device Meshtastic-app check + fix plan documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:39:34 -04:00
archipelago
b4531bb4fc fix(mesh): enforce LoRa-only off-grid labels 2026-06-30 06:22:45 -04:00
archipelago
2ac0711f8e fix(ui): refresh mesh transport labels after send 2026-06-30 06:05:41 -04:00
archipelago
a91814641e fix(mesh): set Meshtastic hop limit and show LoRa pill 2026-06-30 05:59:53 -04:00
archipelago
c2c4b5af7d merge: demo build updates
# Conflicts:
#	neode-ui/src/stores/appLauncher.ts
#	neode-ui/src/views/AppSession.vue
2026-06-30 05:22:42 -04:00
archipelago
daf750688d merge: mesh multiversion and transport pills
# Conflicts:
#	core/archipelago/src/mesh/listener/decode.rs
#	core/archipelago/src/mesh/meshtastic.rs
2026-06-30 05:19:58 -04:00
archipelago
4b7cbf2b5e merge: bitcoin version bulletproof and OTA work 2026-06-30 05:08:27 -04:00
archipelago
df9d3a55be integration: preserve deployed 1.8.0 OTA work 2026-06-30 05:08:17 -04:00
archipelago
7b0748c868 fix(mesh): respect the radio's flashed LoRa region (don't force ours)
ensure_lora_region previously force-overrode the device's region with the
mesh-config region (EU_868) whenever they differed — which would shove a US/ANZ
user's radio onto EU_868: an illegal band that also cuts it off from its local
mesh. Off-the-shelf interop must respect whatever region the user flashed.

Now: a radio that already reports a REAL region (US, EU_868, ANZ, …) is left
untouched. We only set a region when the device reports UNSET (a fresh radio is
RF-silent and can't mesh at all), using the operator-configured region as the
fallback. Unknown/None (never reported) is also left alone. Pairs with the
default-channel change so a meshtastic archy node behaves like a stock device.
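
The three cases above (real region, UNSET, never reported) map onto a small decision function. `Region`, `Reported`, and `region_to_write` are hypothetical names used only for this sketch.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Region {
    Us,
    Eu868,
    Anz,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Reported {
    /// The device has not told us its region yet.
    NeverReported,
    /// The device explicitly reports UNSET (fresh, RF-silent radio).
    Unset,
    /// The device reports a real, user-flashed region.
    Real(Region),
}

/// Returns the region to write, or None to leave the radio untouched.
fn region_to_write(reported: Reported, configured: Region) -> Option<Region> {
    match reported {
        Reported::Unset => Some(configured),
        Reported::Real(_) | Reported::NeverReported => None,
    }
}
```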

cargo check green (built into the same binary as the channel fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 08:36:04 -04:00
archipelago
810127fd3e feat(mesh): meshtastic off-the-shelf interop — default channel + private archipelago
Make a meshtastic-equipped archy node work like a stock Meshtastic device AND
keep the private archy group, instead of being isolated on a custom primary:
- slot 0 (PRIMARY)  = the DEFAULT public channel (empty name + default key) →
  interoperates with every off-the-shelf device on LongFast and picks up
  default-channel users; our NodeInfo broadcasts ride here like normal.
- slot 1 (SECONDARY) = "archipelago" (deterministic psk) → private archy↔archy.

Previously the driver set "archipelago" as the PRIMARY, isolating archy from the
public mesh. Now ensure_channel writes at most one channel per call (default
primary first, then archipelago secondary), reusing the existing reboot→
reconnect→re-check loop so it converges in ≤2 cycles without reboot-looping;
primary_is_default() accepts the default key in 1-byte or expanded form so a
stock radio is never needlessly rewritten. set_channel generalized to
(index, name, psk, role); want_config parse tracks both slots.
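
The one-write-per-call convergence can be sketched as below. `Slots`, `ChannelWrite`, `next_write`, and `apply` are hypothetical; the real ensure_channel inspects actual channel names and keys rather than booleans.

```rust
#[derive(Debug, PartialEq)]
enum ChannelWrite {
    DefaultPrimary,
    ArchipelagoSecondary,
}

#[derive(Clone, Copy)]
struct Slots {
    primary_is_default: bool,
    secondary_is_archipelago: bool,
}

/// At most one write per call: fix the primary first, then the secondary.
fn next_write(s: Slots) -> Option<ChannelWrite> {
    if !s.primary_is_default {
        Some(ChannelWrite::DefaultPrimary)
    } else if !s.secondary_is_archipelago {
        Some(ChannelWrite::ArchipelagoSecondary)
    } else {
        None
    }
}

/// Models the radio state after a write and its reboot/reconnect cycle.
fn apply(s: &mut Slots, w: &ChannelWrite) {
    match w {
        ChannelWrite::DefaultPrimary => s.primary_is_default = true,
        ChannelWrite::ArchipelagoSecondary => s.secondary_is_archipelago = true,
    }
}
```

Starting from a radio with neither slot correct, two calls are enough, and a third returns None, which is why the loop cannot reboot-cycle indefinitely.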

MeshCore needs no change — it never overrides channels (ensure_channel is a
no-op) and already rides MeshCore's default Public channel off the shelf.

cargo check green. NEEDS radio verify on .116/.198 (default-channel RX + archy
group on the secondary). Channel provision cap (3) covers the 2-write migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 07:40:10 -04:00
archipelago
067002b04b Merge branch 'bitcoin-version-bulletproof' into mesh-multiversion-integration 2026-06-29 06:45:50 -04:00
archipelago
20f762cb2c feat(fips): auto-peer LAN-discovered federation nodes directly over FIPS
Mesh/federation messages between co-located nodes were always falling back to
Tor because the FIPS overlay had no direct peering — every node depended on the
global anchor's spanning tree, and when that anchor link flaps a node is
isolated and all FIPS dials time out. (Diagnosed live on .116/.198: pure-FIPS
direct peering over UDP 8668 fixes it — 2.5ms vs timeout.)

Generalize the manual fix: in the existing 5-min FIPS seed-anchor apply loop,
also auto-connect every federation peer the PeerRegistry knows both a LAN
address AND a FIPS npub for, dialing its FIPS UDP transport (port 8668) at its
LAN IP via the same idempotent `fipsctl connect` path (new
anchors::lan_fips_anchors). This is FIPS's own transport over the LAN — NOT
Tailscale, NOT the HTTP/LAN messaging port. Transient (recomputed each tick from
live mDNS discovery, never persisted) so changing IPs self-correct. Remote peers
with no LAN address are untouched (still routed via the anchor).
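
The peer selection rule can be sketched as a filter over the registry. The `Peer` struct and this signature for lan_fips_anchors are assumptions; only the port (8668) and the both-fields-known condition come from the commit.

```rust
use std::net::{IpAddr, SocketAddr};

/// FIPS UDP transport port.
const FIPS_UDP_PORT: u16 = 8668;

/// Hypothetical view of a registry entry.
struct Peer {
    npub: Option<String>,
    lan_ip: Option<IpAddr>,
}

/// Keeps only peers with BOTH a FIPS npub and a LAN address, and pairs
/// each npub with the UDP endpoint to dial. Recomputed on every tick, so
/// nothing here is persisted.
fn lan_fips_anchors(peers: &[Peer]) -> Vec<(String, SocketAddr)> {
    peers
        .iter()
        .filter_map(|p| {
            let npub = p.npub.clone()?;
            let ip = p.lan_ip?;
            Some((npub, SocketAddr::new(ip, FIPS_UDP_PORT)))
        })
        .collect()
}
```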

Registry Arc hoisted out of the transport-init block so the loop can read
all_peers(). cargo check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:42:18 -04:00
archipelago
11155055aa feat(mesh): meshtastic PKI E2E pill — surface pki_encrypted on received DMs
The synthetic meshcore-style frame the meshtastic driver builds can't carry the
radio's PKI-encryption status, so received meshtastic DMs never lit the E2E pill.
Thread it out-of-band: the device records `last_rx_encrypted` (= packet
pki_encrypted) when it yields a text frame; the session loop reads it via
`take_rx_encrypted()` right after dispatch and stamps the just-stored received
message E2E (dispatch::stamp_received_encrypted, monotonic-id keyed). Meshcore
returns false here (its E2E is derived in the frames decrypt path). Pure
out-of-band signal — no change to the shared meshcore wire format.

Built + deployed live in binary d937814e on .116/.198. cargo check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:25:01 -04:00
archipelago
f4f45c1a09 docs: mark .228 reindex finish/verify as other-agent owned
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:04:01 -04:00
archipelago
ed1352d3a3 docs+catalog: bitcoin multi-version rollout handoff + reproducible generator
- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
  (29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core
  (adds 29.2 + a `latest` entry → newest); generator forces top-level
  `version` == the default entry's version (the 169ff2e2 invariant) so
  regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes,
  fixes, current .228 state, the coordinated fleet-rollout steps (incl.
  :latest repoint sequencing / fleet-safety), reindex finish procedure, and
  the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:02:24 -04:00
archipelago
095a76cd20 fix(bitcoin): bulletproof multi-version switching (Knots & Core)
Three stacked bugs made "switch version" silently fail / crash-loop, and
the data-access mismatch corrupted a node's index during recovery attempts.

Backend renderer:
- sync_quadlet_unit ignored the per-app pinned version and re-rendered the
  quadlet with the manifest's :latest every reconcile tick, reverting any
  switch. Factor the install-time catalog/pin resolution into a shared
  resolve_catalog_image() and call it in BOTH install_fresh and
  sync_quadlet_unit.
- The renderer folded manifest `entrypoint: ["sh","-lc"]` into Exec=, which
  only worked when the image entrypoint was a passthrough shell wrapper. The
  versioned images use ENTRYPOINT ["bitcoind"], so Exec=sh -lc ... became
  `bitcoind sh -lc ...` and crash-looped. Emit a real Entrypoint= override;
  exec_changed now also compares Entrypoint=.
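
The crash loop follows from how a container's final command line is assembled: the entrypoint (image default unless overridden) followed by the command. The sketch below models that rule only; `effective_argv` is a hypothetical helper and the `bitcoind -server` command is an illustrative placeholder.

```rust
/// Final argv = entrypoint (override if given, else the image's) + command.
fn effective_argv(
    image_entrypoint: &[&str],
    entrypoint_override: Option<&[&str]>,
    command: &[&str],
) -> Vec<String> {
    entrypoint_override
        .unwrap_or(image_entrypoint)
        .iter()
        .chain(command)
        .map(|s| s.to_string())
        .collect()
}
```

Folding `sh -lc` into the command against an image whose entrypoint is `bitcoind` yields `bitcoind sh -lc ...`, while passing it as an entrypoint override yields `sh -lc ...` as intended.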

Images:
- Build all bitcoin images (Core + Knots, every version) as container-root
  (USER removed) like the legacy :latest image. Chain data is owned by the
  data_uid (container uid 102); root reads it via CAP_DAC_OVERRIDE (granted in
  the manifest). A non-root USER (the previous uid 1000) can't read existing
  chain data → "Error initializing block database". Still fully rootless:
  container-root maps to the unprivileged host service user.

Catalog:
- bitcoin-knots versions[]: 29.3.knots20260508/20260507/20260210 +
  29.2.knots20251110, "latest" tracking newest.
- bitcoin-core versions[]: add 29.2 + a "latest" entry. All images rebuilt
  root and published to the mirror.

Frontend:
- AppSidebar version dropdown: rename the latest option to "Always use the
  latest version" (no v prefix), fix right padding, and guarantee the current
  selection matches a real option (was rendering blank).
- New InstallVersionModal: full-screen version chooser shown from the App
  Store / Discover install button for multi-version apps (Bitcoin Knots/Core),
  app icon + "Install <name>", latest pre-selected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 05:46:04 -04:00
archipelago
3c7c04a662 fix(mesh): meshtastic receive — drain frame batch per poll + rx diagnostics
Addresses the open Meshtastic parity bug (project_meshtastic_parity): the
running driver received nothing (`mesh.messages` stayed []) though the radio
got the packets and sends worked.

Root-cause candidate: `try_recv_frame` decoded ONE serial frame per poll and
returned Ok(None) for every non-text FromRadio frame, so the session loop slept
50ms between frames. Under Meshtastic's frequent NodeInfo/telemetry stream a
received text packet queued behind them, and read_from_radio's 64KB buffer cap
could drain (drop) it before it was ever decoded — reception silently dead while
sends kept working.

- try_recv_frame now drains a bounded batch (64) per poll, processing each
  frame's side effects and returning the first inbound text frame, so a text
  packet is decoded the same poll it arrives and the buffer never grows enough
  to hit the lossy cap. Bounded so a continuous flood still yields to select!.
- packet_to_inbound_frame logs every decoded packet (from/portnum/payload_len)
  and a "did not parse (dropped)" case, so one live radio pass is conclusive.
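
The bounded drain can be sketched as follows. This is a minimal illustration, not the driver's code: the `Frame` enum, the in-memory queue, and the `handled` counter are hypothetical stand-ins for the serial decode path, and stopping at the first text frame (leaving later frames queued for the next poll) is an assumption about how "returning the first inbound text frame" behaves.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a decoded FromRadio frame.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    NodeInfo,
    Telemetry,
    Text(String),
}

/// Upper bound on frames handled per poll (the commit message uses 64).
const MAX_FRAMES_PER_POLL: usize = 64;

/// Drain at most MAX_FRAMES_PER_POLL frames from `queue`.
/// Non-text frames only bump `handled` (a stand-in for their side effects);
/// the first text frame is returned immediately, so anything behind it
/// stays queued for the next poll.
fn try_recv_batch(queue: &mut VecDeque<Frame>, handled: &mut usize) -> Option<String> {
    for _ in 0..MAX_FRAMES_PER_POLL {
        match queue.pop_front() {
            None => return None,                     // queue empty: nothing to deliver
            Some(Frame::Text(body)) => return Some(body),
            Some(_) => *handled += 1,                // NodeInfo / telemetry side effect
        }
    }
    None // bound reached: yield to the caller even under a continuous flood
}
```

With one frame per poll and a 50ms sleep, a text packet behind two telemetry frames waits three polls; with the batch it is returned on the first.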

The rest of the decode path was verified correct by inspection (FROM_RADIO_PACKET
=2, wire-type-5 handled, parse_mesh_packet sound, 60s heartbeat present) — not a
parse bug. cargo check green. NEEDS a live radio pass on a rig that isn't .228
(off-limits: bitcoin testing) to confirm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 05:04:09 -04:00
archipelago
11038cdcc9 feat(mesh,ui): per-message transport pill (Mesh/FIPS/Tor) + fix E2E pill
Adds a per-message transport badge to archy↔archy mesh chats and fixes the
long-broken E2E badge — both meshcore and meshtastic, styled like the existing
E2E pill.

Transport pill:
- New `MeshMessage.transport` ("lora"/"fips"/"tor"), surfaced in the UI beside
  the E2E badge (Mesh.vue transportLabel() → Mesh/FIPS/Tor, mesh-styles.css).
- Sent LoRa → "lora"; sent federation → finalized to the real leg ("fips"/"tor")
  once the background send resolves (req.send_json transport), via an id-keyed
  store update.
- Received: a post-dispatch stamp on handle_typed_envelope_direct's output
  (monotonic ids) tags both transports without threading through all 20 typed-
  dispatch sites — radio wrapper stamps "lora", federation injector stamps the
  peer's last_transport ("fips"/"tor", default tor; the inbound HTTP carries no
  FIPS-vs-Tor signal).
- Plain native/channel LoRa frames → "lora"; channel broadcasts stay non-E2E.

E2E pill fix:
- `encrypted` was hardcoded false at every MeshMessage construction site, so the
  UI badge (Mesh.vue `v-if="msg.encrypted"`) never showed. Now: federation
  envelopes are E2E (identity-signed over an encrypted transport); the meshcore
  native-DM receive path already had a real `encrypted` flag (now also tagged
  with transport). meshtastic-PKI radio E2E flag threading is a noted follow-up.

Backend cargo check + frontend vue-tsc build both green. Needs a live radio +
multi-transport pass on .116/.228 to confirm end-to-end (see
project_transport_pill / project_meshtastic_parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:29:25 -04:00
archipelago
169ff2e2cd fix(bitcoin): knots catalog default must equal top-level version
The knots versions[] marked 29.3.knots20260508 as default while the
top-level catalog version is the floating 'latest' tag — violating the
generator's own invariant (default:true MUST equal the top-level version
so selecting it un-pins / tracks latest). Live effect via package.versions:
catalog_default_version='latest' so the UI-highlighted default actually
PINS+recreates (opposite of un-pin) and 'latest' was unreachable from the
Version & Updates card.

Add a 'latest' default entry (== the manifest's floating tag) and keep
29.3.knots20260508 as a pinnable option. Verified on .228: package.versions
now returns default=latest with 2 selectable versions.
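
The invariant can be expressed as a small check. A sketch only: `VersionEntry`, its fields, and `check_default_invariant` are hypothetical names (the real schema lives in the catalog generator), and requiring exactly one default entry is an added assumption beyond what the commit states.

```rust
/// Hypothetical, simplified view of one versions[] entry.
#[derive(Debug)]
struct VersionEntry {
    version: &'static str,
    is_default: bool,
}

/// The entry flagged as default must carry the same version string as the
/// catalog's top-level version, so that selecting it un-pins the app.
fn check_default_invariant(top_level: &str, versions: &[VersionEntry]) -> Result<(), String> {
    let mut defaults = versions.iter().filter(|v| v.is_default);
    let first = defaults.next();
    let second = defaults.next();
    match (first, second) {
        (None, _) => Err("no default entry".to_string()),
        (Some(_), Some(_)) => Err("more than one default entry".to_string()),
        (Some(d), None) if d.version == top_level => Ok(()),
        (Some(d), None) => Err(format!(
            "default entry {:?} does not match top-level version {:?}",
            d.version, top_level
        )),
    }
}
```

Under this check the pre-fix knots catalog (default 29.3.knots20260508, top-level 'latest') is rejected, and the fixed one (a 'latest' default plus a pinnable 29.3.knots20260508) passes.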

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 19:56:49 -04:00
archipelago
da20f67462 Merge bitcoin-multi-version: multi-version support for Core & Knots
Integrate the bitcoin-multi-version feature (commit 6aa74c73): per-node
choice/pin/switch of Bitcoin Core & Knots versions with auto-update toggle —
catalog versions[] schema, install-time selection, package.versions +
package.set-config RPCs, hourly per-app auto-update tick, build-bitcoin-image.sh
(GPG+SHA verified rootless image builder), and UI (version select + Version &
Updates card). Catalog regenerated; preserves the mempool 127.0.0.1 health fix.

Not yet live-verified on .228 — gate any tagged release on that per CLAUDE.md.
2026-06-28 18:48:38 -04:00
archipelago
6aa74c7386 feat(bitcoin): multi-version support for Core & Knots (install/switch/pin/auto-update)
Lets a node runner choose which Bitcoin Core / Knots version to install
(latest pre-selected), then switch, pin, or opt into auto-update from the
app's interface — all manifest/catalog-driven, rootless, signed-registry,
zero-data-loss. Motivated by upcoming BIP-110 signalling: runners need a
real choice of software version.

Backend:
- version_config.rs: per-app pin + auto-update persistence (atomic, merge-
  preserving), downgrade detection, auto-update enumeration (+ unit tests).
- app_catalog.rs: CatalogVersion / versions[] schema, catalog_versions(),
  catalog_image_for_version() (same-repo guard); a pin suppresses the update
  badge.
- prod_orchestrator.rs: pinned version wins over the catalog default on every
  install/recreate.
- install.rs: install-time `version` param persisted (default = unpinned).
- set_config.rs: package.versions (read) + package.set-config (write) RPCs;
  downgrade is gated behind explicit confirm (warn + confirm + allow).
- update.rs/main.rs: hourly per-app auto-update tick via the orchestrator
  (opt-in, pin-respecting); fix handle_package_update to be non-fatal for
  orchestrator-managed apps lacking a catalog primary image (bitcoin-core).

UI:
- MarketplaceAppDetails.vue: install-time version selector (shown when an app
  offers >=2 versions).
- appDetails/AppSidebar.vue: "Version & Updates" card (switch / pin / auto-
  update toggle / downgrade warning), per app.
- rpc-client.ts + en.json: RPC methods, types, strings.

Phase 0 image pipeline:
- scripts/build-bitcoin-image.sh: download official tarball + SHA256SUMS(.asc),
  verify SHA-256 + pinned-maintainer OpenPGP signature (fail-closed), build a
  minimal rootless image, smoke-test, tag + push.
- apps/bitcoin-core/Dockerfile rewritten (drops stale community base);
  apps/bitcoin-knots/Dockerfile added.
- generate-app-catalog.sh: emit curated versions[]; published + catalog now
  offers Core 25.2/26.2/27.2/28.4/29.3/30.2/31.0 + Knots 29.3.knots20260508.

docs/bitcoin-multi-version-design.md: live progress tracker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:46:17 -04:00
archipelago
3cea7dd6c5 test(phase3): fix Phase-3 quadlet gates — define fail(), drop stale Notify=healthy assert
Two Phase-3 bats suites used `fail` (a bats-assert helper) but bats-assert
isn't installed on the alpha fleet (only bats-core), so every tripped
assertion crashed with `fail: command not found` (status 127) instead of
reporting a real pass/fail. Define the same minimal `fail() { echo ...;
return 1; }` the other suites already use (see mempool.bats). Without this
the gates were silently non-functional.

Also rewrite the obsolete "HealthCmd= implies Notify=healthy" assertion in
use-quadlet-backends-install.bats. Phase 3.4's Notify=healthy was
deliberately reverted: gating `systemctl start` on health hung boot
reconciliation for dependency-waiting apps (fedimint idles until Bitcoin
IBD; lnd until macaroon unlock), leaving units stuck "deactivating". The
renderer now emits HealthCmd= for Podman's health state but TimeoutStartSec=0
and NO Notify=healthy (quadlet.rs render() + contains_stale_health_gate()).
The test now asserts the current invariant: no backend unit gates start on
health.

Verified on the .228 canary node (ARCHIPELAGO_USE_QUADLET_BACKENDS=1):
use-quadlet-backends-install 6/6, backend-survives-archipelago-restart 3/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 16:09:05 -04:00
archipelago
d7c6f8c348 fix(mempool): health-check 127.0.0.1 not localhost (stops false-unhealthy loop)
The archy-mempool-web health_check endpoint used http://localhost:8080.
Inside the frontend image, wget resolves `localhost` to ::1 (IPv6) first,
but nginx binds 0.0.0.0:8080 (IPv4) only -> the baked HealthCmd gets
"connection refused" every probe -> container is perpetually unhealthy ->
the reconciler recreates it forever (observed on .228: mempool container
re-Started every ~3 min, Health=unhealthy). Proven live: in-container
`wget http://localhost:8080/` = refused, `wget http://127.0.0.1:8080/` = OK.

Pin the probe to 127.0.0.1 so it matches nginx's IPv4 bind. Updated both
the source manifest and the embedded copy in releases/app-catalog.json
(the catalog overlay wins over the disk manifest on fleet nodes, so the
catalog copy is the one that actually reaches .228).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:09:34 -04:00
archipelago
83344b9f3a fix(orchestrator): drop legacy mempool umbrella manifest on catalog-driven nodes
The split-mempool-stack guard that skips the legacy monolithic `mempool`
manifest (whose container collides with its split-stack frontend member
`archy-mempool-web`) only ran over DISK manifests. On catalog-driven nodes
(no disk manifests — e.g. the Phase-3/registry-manifest path), the legacy
`mempool` manifest arrives via the registry-catalog overlay AFTER that
guard, so both `mempool` and `archy-mempool-web` end up owning container
`mempool` and rewrite+restart each other forever ("port binding drift" /
"network alias drift" loop observed on .228, leaving mempool down).

Enforce the guard once more over the merged (disk + catalog) manifest set:
drop the `mempool` umbrella whenever all three split members are present.
Installing `mempool` assembles the split stack, so `archy-mempool-web`
owns the frontend container either way.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:04:41 -04:00
archipelago
05c22b6085 fix(mempool): correct frontend container port 4080->8080 (stops restart loop)
The mempool manifest + embedded catalog declared the frontend container
port as 4080, but mempool-frontend nginx listens on 8080 (the stack
creates it as -p 4080:8080 with FRONTEND_HTTP_PORT=8080, see
api/rpc/package/stacks.rs). So every reconcile rendered the quadlet as
PublishPort=4080:4080, disagreed with the working 4080:8080 container,
and restarted it ("port binding drift" -> "host port 4080 did not become
reachable within 5s" -> "host listener disappeared; restarting") in a
perpetual loop on .228. Correcting the manifest container port to 8080
makes the rendered quadlet match reality so the drift/restart loop stops.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 13:49:54 -04:00
archipelago
6734947c3e fix(fmcd): cap CPU + watchdog-restart the iroh relay hot-loop
On NAT'd nodes that can reach the iroh federation neither directly nor
via iroh's public relays, fmcd's embedded iroh networking enters a
relay/hole-punch reconnect hot-loop that pegs its entire CPU allotment
indefinitely (observed ~1 core sustained for 4 days on a Tailscale node,
while LAN nodes that reach the guardian directly stay <3%). fmcd 0.8.0
exposes no iroh/relay knobs, so:

- fmcd-run now samples fmcd's own CPU and restarts it when it stays near
  its allotment for ~15 min (a restart demonstrably clears the stuck iroh
  state; real work is bursty and never flat-pegs a core for minutes).
- Lower cpu_limit 1 -> 0.25 core so a stuck instance can't starve the
  node (steady-state is <3% of a core; joins are brief).

Ships as fmcd:0.8.1 (launcher-only rebuild, same fmcd binary). Bumped the
image pin + cpu_limit in the manifest, image-versions.sh, the embedded
catalog manifest (releases/app-catalog.json), and the UI catalogs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 12:19:27 -04:00
archipelago
4519dbf04f fix(orchestrator): render manifest certs on the adopted-running reconcile path
WS-F #10: a netbird reinstall that adopts a leftover running container
skipped ensure_manifest_certs, so when its data dir was wiped the self-
signed tls.crt/key were never regenerated; the next nginx.conf rewrite +
restart then died on the missing cert (proxy 502, login broken). The
Running branch of ensure_running_with_mode now calls ensure_manifest_certs
before ensure_manifest_files, mirroring prepare_for_start's certs-before-
files ordering. Idempotent: a no-op when crt+key already exist.

Live-validated on .228: deleted netbird tls.crt/key under a Running
container; reconciler regenerated a fresh CN=<host_ip> self-signed cert
(1000:1000), https :8087 = 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 17:49:50 -04:00
archipelago
a38c9d5f29 docs(master-plan): §10d Meshtastic MeshCore-parity status (one open received-msg bug)
Region (EU_868) + shared channel "archipelago" auto-provisioning shipped in
8fdb45e8 and riding the rolled #9 fleet binary (0060dcd6). Discovery, RF, and
sending verified on .116+.228; the one open blocker is the running driver not
surfacing received messages. Slotted after WS-F #9–11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:53:06 -04:00
archipelago
f9a6ae3f32 feat(mesh): Meshtastic region + shared-channel auto-provisioning (MeshCore parity)
Fresh Meshtastic radios ship region-UNSET (RF-silent) and on mismatched
channels, so nodes only ever saw themselves. Bring them to MeshCore parity
using the official Meshtastic admin API:

- Auto-provision LoRa region (set_config, AdminMessage field 34) from a new
  mesh-config `lora_region` (e.g. EU_868) when the radio's region differs.
- Auto-provision a shared primary channel (set_channel, field 33) with a
  PSK derived deterministically from channel_name, so every node converges on
  one mesh — the parity equivalent of MeshCore's named "archipelago" channel.
- Read current region/channel from want_config; only write when different
  (no reboot loop); cap attempts so a radio that won't persist can't loop.
- Active NodeInfo advert scaffolding + aggressive serial drain.

Verified on .116+.228: region+channel persist, discovery works (both see each
other as named reachable contacts), bidirectional RF + sending confirmed.
Receiving in the running driver is still under diagnosis (instrumentation added).

Also removes the unwanted `meshtastic` daemon app from the registry (it was
never meant to be a container — native driver provides system-level support):
deletes apps/meshtastic + catalog entries (app-catalog, neode-ui, releases) +
test refs. Meshtastic stays native, like MeshCore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
archipelago
fd3a4ee4ef fix(orchestrator): chown the whole fresh bind subtree, not just the leaf
ensure_bind_mount_dirs chowned a freshly-created no-data_uid bind dir
with --reference={immediate_parent}. For a NESTED bind source like
jellyfin's /var/lib/archipelago/jellyfin/config (or netbird's .../netbird/
data), `mkdir -p` creates the intermediate <app> dir root:root too, so
referencing the immediate parent just copied ROOT — leaving the dir
unwritable and the app EACCES-crash-looping on reinstall (found by the
all-apps-lifecycle pass: jellyfin "/config/log denied" exit 139;
netbird-server "unable to open database file"). It only ever worked for
direct children of the data root (immich).

Fix: anchor to the nearest PRE-EXISTING ancestor (the rootless data root,
owned by the service user) and chown -R the entire newly-created subtree
to it. Extracted the walk into fresh_subtree_anchor() with a unit test
covering nested / direct / second-volume cases.
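
The ancestor walk might look roughly like this. It is a sketch under assumptions, not the repo's fresh_subtree_anchor(): the name `subtree_anchor_sketch`, the return shape, and the injected `exists` predicate (used so the walk can be tested without touching the filesystem) are all hypothetical. It has to run before `mkdir -p`, or against a pre-mkdir view of the tree.

```rust
use std::path::{Path, PathBuf};

/// Walk up from `target` to the nearest ancestor that already exists.
/// Returns (anchor, top_missing): `anchor` is the pre-existing ancestor whose
/// ownership should be copied, `top_missing` is the highest directory that
/// mkdir -p is about to create (the root of the subtree to chown -R).
/// Returns None when `target` already exists or no ancestor exists.
fn subtree_anchor_sketch(
    target: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<(PathBuf, PathBuf)> {
    if exists(target) {
        return None; // not a fresh directory: leave ownership alone
    }
    let mut top_missing = target.to_path_buf();
    let mut cursor = target.parent();
    while let Some(dir) = cursor {
        if exists(dir) {
            return Some((dir.to_path_buf(), top_missing));
        }
        top_missing = dir.to_path_buf();
        cursor = dir.parent();
    }
    None
}
```

For jellyfin's nested source the anchor is the data root and the subtree root is the intermediate jellyfin dir, so chown -R covers both levels; for a direct child such as immich the result matches the old immediate-parent behavior.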

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
Dorian
38d2bbf570 chore(android): update companion APK download [skip ci]
2026-06-26 13:08:37 +01:00
Dorian
a90fea80ed feat(android): edit server entries from in-app settings menu (NESMenu); bump to 0.4.12 (vc16)
The 0.4.11 edit affordance only lived on ServerConnectScreen, which a
connected user never sees. Add edit to NESMenu — the settings modal
reached via two-finger hold while connected: a ✎ pencil on each saved
server opens the form pre-populated (Edit Server header + Cancel),
persists via ServerPreferences.updateSavedServer(), and reconnects when
the edited server is the live one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:08:18 +01:00
Dorian
389e602097 chore(android): update companion APK download [skip ci]
2026-06-26 12:54:52 +01:00
Dorian
5677f9cca1 feat(android): edit saved server entries; bump companion to 0.4.11 (vc15)
Add an edit affordance to each saved server in ServerConnectScreen: a
pencil button loads the entry into the form (Edit Server mode) with
Save Changes / Cancel actions. Persisted via a new
ServerPreferences.updateSavedServer() that replaces by connection
identity (address/port/scheme) and keeps the active record in sync when
the edited server is the active one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:54:07 +01:00
archipelago
fc64b422e7 docs(master-plan): WS-F#3 first destructive run — 3 reinstall bugs found
Full all-apps-lifecycle pass on .228: lifecycle 11/11, teardown 8/11.
Surfaced (1) fresh-install bind-dir ownership root:root → reinstall
EACCES (jellyfin/netbird; Fix B misses the install path), (2) netbird
reinstall adopts leftover containers → skips manifest cert/file render,
(3) portainer image pin lfg2025/portainer:2.19.4 unpublished (manifest
unknown), pin overrides RPC dockerImage. .228 restored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:47:24 -04:00
Dorian
07b9b5a3aa docs(android): companion release + App-Not-Installed runbook
Capture the 2026-06-26 lessons durably: ship via the hardened publish
script only, v1+v2+v3 signing is enforced by apksigner (AGP ignores
enableV1Signing at minSdk>=24), diagnose install failures with adb
install FIRST, signature-key changes force a one-time uninstall, and
keep all phone/adb work scoped to com.archipelago.app.debug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:21:48 +01:00
Dorian
ac59771560 fix(android): force v1+v2+v3 signing & clean-build guards in companion publish
The published companion APK was v2-only (AGP silently ignores
enableV1Signing for minSdk>=24) and clean builds broke on stray
space-named resource dirs. Harden scripts/publish-companion-apk.sh:
clean build, remove/reject space-named res dirs, force v1+v2+v3 via
zipalign+apksigner, and abort unless all three schemes verify. Wire
ship-companion.sh to the shared script. Re-sign the served 0.4.10 APK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:53:25 +01:00
Dorian
d1f9e9ce88 chore(android): update companion apk download
2026-06-26 11:32:00 +01:00
Dorian
58847fc3d7 chore(android): bump companion to 0.4.10 (versionCode 14)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:31:36 +01:00
archipelago
a3e09eab57 docs(master-plan): WS-F#3 — destructive all-apps lifecycle matrix landed (43934eef)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:51 -04:00
archipelago
43934eefa5 test(gate): destructive all-apps lifecycle matrix (WS-F#3)
Active counterpart to the read-only all-apps-matrix.bats: drives
stop/start/restart for every installed app and, under
ARCHY_ALLOW_CASCADE_DESTRUCTIVE, a FULL teardown (uninstall →
no-ghost → reinstall) — the broad coverage F needs beyond the ~8 core
suites. App set is discovered from My Apps ∩ the node catalog; reinstall
spec comes from catalog.json {dockerImage, containerConfig}.

PROTECTED by default (never cycled or torn down): bitcoin*/electrum*
(expensive resync) AND lnd/btcpay*/fedimint* (teardown = irreversible
wallet/channel/guardian loss). The user asked to protect only
bitcoin+electrum; the wallet apps are added for safety and can be
removed via ARCHY_MATRIX_PROTECT. Heavy + destructive → a supervised
pass, not folded into run-gate. Validated on .228: discovery excludes
the 6 protected installed apps; lifecycle tier cycles a single app
(botfights) stop/start/restart green; teardown gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:22 -04:00
archipelago
80146f4476 docs(master-plan): WS-F#2 — uninstall progress bar made truthful (9f17ba68)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:15:11 -04:00
archipelago
9f17ba6867 fix(ui): truthful uninstall progress bar (was a solid full-red block)
AppCard's uninstall bar was hardcoded `w-full bg-red-400/60 animate-pulse`
— a solid, full-width, red, fake-pulsing block that never moved and read
as an error, no matter the actual teardown progress (the install bar, by
contrast, renders a real percentage). Derive a truthful percentage from
the backend's existing `uninstall-stage` label — "Stopping containers
(X/N)" → 10–50%, "Cleaning up volumes" → 70%, "Removing app data" → 90%
— and render it exactly like install: neutral fill, real width + percent,
shimmer (not a fake pulse) carrying motion when a stage has no number.
Frontend-only; the backend already broadcasts these stages.
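
The stage-to-percentage mapping, sketched in Rust for illustration only (the actual change is in the Vue frontend). The function name is hypothetical, exact-match on the label text is an assumption, and so is the linear interpolation across the 10–50% band.

```rust
/// Map a backend `uninstall-stage` label to a progress percentage.
/// Returns None for labels this sketch does not recognize, in which case
/// the UI would rely on the shimmer alone.
fn uninstall_percent(stage: &str) -> Option<u8> {
    match stage {
        "Cleaning up volumes" => return Some(70),
        "Removing app data" => return Some(90),
        _ => {}
    }
    // "Stopping containers (X/N)" spreads linearly over 10..=50.
    let counts = stage
        .strip_prefix("Stopping containers (")?
        .strip_suffix(')')?;
    let (done, total) = counts.split_once('/')?;
    let done: u32 = done.trim().parse().ok()?;
    let total: u32 = total.trim().parse().ok()?;
    if total == 0 {
        return None; // avoid dividing by zero on a malformed label
    }
    let done = u64::from(done.min(total)); // clamp X to N
    Some((10 + 40 * done / u64::from(total)) as u8)
}
```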

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:04:48 -04:00
archipelago
67426c0d41 docs(master-plan): cascade tier wired into the gate (b7d92107)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:24:07 -04:00
archipelago
b7d9210784 test(gate): optional ARCHY_GATE_CASCADE pass — wire the cascade tier in
run-gate.sh ran only the DESTRUCTIVE tier; the cascade-uninstall suite
(uninstall→no-ghost→reinstall, the #13/#14/uninstall-hang regression
guard) existed but was never enabled by the gate. Add an opt-in single
cascade pass after the 5× loop (ARCHY_GATE_CASCADE=1, requires
ARCHY_ALLOW_DESTRUCTIVE=1), counted into the pass/fail tally. Kept out
of the 5× loop deliberately — uninstall/reinstall every iteration would
balloon runtime and re-pull images; one pass guards the class. Default
gate behavior unchanged. Validated: cascade-uninstall.bats 7/7 on .228.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:22:45 -04:00
archipelago
292a2650df docs(master-plan): WS-F — uninstall-hang root cause fixed + cascade validated
Workstream F now in-progress: the immich/grafana uninstall hang →
ghost/stuck-bar/reinstall-block is root-caused (unbounded systemctl/
podman in quadlet::disable_remove) and fixed (71cc9ac4); cascade-
uninstall.bats 7/7 on .228. Records the remaining F items + the pending
gate-wiring decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:18:39 -04:00
archipelago
71cc9ac46a fix(uninstall): bound systemctl/podman teardown so uninstall can't hang
Uninstalling immich/grafana could hang with a frozen full-red progress
bar, leave a ghost entry stuck in My Apps, and then refuse reinstall.
Single root cause: quadlet::disable_remove() — called first in the
uninstall task (via companion + orchestrator teardown) — ran
`systemctl --user stop`, daemon-reload, and `podman rm -f` with NO
timeout. On rootless podman a generated unit can wedge in "deactivating"
while podman hangs underneath, so `systemctl stop` blocks forever. The
spawned uninstall task then never returns Ok or Err, so:
  - set_uninstall_stage() (after the stop) never fires → progress frozen;
  - remove_package_state_entry() never runs → entry stranded in
    `Removing` → ghost in My Apps;
  - the install guard rejects reinstall with "already Removing".

The spawn wrapper already reverts state on Err and removes the entry on
Ok — the only failure mode was a hang that returns neither. Bound the
teardown so it always terminates:
  - systemctl stop → QUADLET_STOP_TIMEOUT, escalate to kill+reset-failed
    on timeout (reuses the existing helpers);
  - daemon_reload_user() → bounded systemctl_user_status (30s);
  - defensive `podman rm -f` → wrapped in tokio timeout.
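
The "always terminates" property can be illustrated with a std-only stand-in. This is not the fix itself: the commit bounds async calls with a tokio timeout, whereas this sketch uses a worker thread and `recv_timeout`, and `run_bounded` / `Bounded` are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Outcome of a bounded step.
#[derive(Debug, PartialEq)]
enum Bounded<T> {
    Finished(T),
    TimedOut,
    Panicked,
}

/// Run `step` on a worker thread and wait at most `limit` for its result.
/// On timeout the worker is abandoned, not cancelled; a real teardown would
/// escalate at this point (kill + reset-failed in the commit's terms).
fn run_bounded<T, F>(limit: Duration, step: F) -> Bounded<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; ignore the send error.
        let _ = tx.send(step());
    });
    match rx.recv_timeout(limit) {
        Ok(value) => Bounded::Finished(value),
        Err(RecvTimeoutError::Timeout) => Bounded::TimedOut,
        Err(RecvTimeoutError::Disconnected) => Bounded::Panicked,
    }
}
```

The caller always gets one of three outcomes within `limit`, so the surrounding task can return Ok or Err and the state entry is never stranded.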

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 04:27:02 -04:00
archipelago
2ebcd8f9a8 docs(master-plan): backlog — smart launch-port selection + manifest-driven archival-node blocker
§10b: replace per-app static launch-port map with a manifest-first +
non-HTTP-port-skipping heuristic (the gitea :2222 class).
§10c: generalize the un-pruned/archival Bitcoin install blocker from a
hardcoded requires_unpruned_bitcoin() match to a manifest-declared
dependency, with a clear pre-install UX.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:47:25 -04:00
archipelago
3515344800 docs(master-plan): session h — zombie guard + gitea launch-port fix
Banner + §8b: zombie-container guard (0a8db904, live-proven on .228) and
gitea launch-port fix (670ebb06) shipped in binary 040df5ce, rolled to
the fleet. Logs the mempool env-drift recreate-loop and nostr-rs-relay
follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:41:59 -04:00
archipelago
670ebb0666 fix(launcher): pin Gitea launch URL to web port 3001 (not SSH 2222)
Gitea publishes two host ports — SSH on 2222 and the web UI on 3001.
The launch URL comes from manifest_lan_address_for() (the manifest's
interfaces.main → 3001), but Gitea had no entry in the static
lan_address_for() fallback map. On a node where the gitea manifest is
absent or stale (no interfaces block), the lookup returns None and the
code falls through to extract_lan_address(), which returns whichever
port podman lists first — frequently the SSH port. Result: the app
launched at :2222 instead of :3001 (observed on tailscale node
100.82.34.38).

Add the canonical "gitea" => http://localhost:3001 entry to the static
map, matching every other core app, so the web UI is pinned regardless
of manifest presence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:16:41 -04:00
archipelago
0a8db9044f fix(orchestrator): recreate zombie "Up" containers whose process is dead
podman trusts its own state DB: when a container's conmon dies without
podman observing it (cgroup-cascade SIGKILL on archipelago.service
restart, a crash), `podman ps` keeps reporting it "Up" long after the
process is gone. The reconciler NoOp'd such a zombie forever, so a dead
dependency with no published host port never recovered.

Observed live on .228 (2026-06-25): netbird-dashboard reported "Up" with
a dead State.Pid → its nginx proxy 502'd → NetBird login broke
("Unauthenticated"). The dashboard publishes no host port, so the
Running branch had nothing to probe and never recreated it.

Add a zombie guard to the Running branch: verify the recorded State.Pid
is alive (its /proc entry exists) before trusting "running"; on a
concrete dead PID, stop+remove+install_fresh from the manifest.
Conservative by design — any uncertainty (inspect failed, PID
unparseable) assumes alive, so a transient podman hiccup never destroys
a healthy container. Unit test covers live/dead/out-of-range PIDs.
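
A minimal sketch of the conservative PID check, assuming the recorded State.Pid arrives as an optional string. The function name and the injectable `proc_root` (so the check can be exercised against a scratch directory instead of /proc) are hypothetical, and treating PID 0 as "uncertain" is an assumption.

```rust
use std::path::Path;

/// Returns true only when there is a concrete PID and no entry for it under
/// `proc_root` (normally /proc). Every uncertain case (missing, unparseable,
/// out-of-range, or zero PID) is reported as "not a zombie", so a transient
/// inspect failure never triggers a recreate.
fn is_zombie(state_pid: Option<&str>, proc_root: &Path) -> bool {
    let pid = match state_pid
        .map(str::trim)
        .and_then(|raw| raw.parse::<u32>().ok())
    {
        Some(pid) if pid > 0 => pid,
        _ => return false, // uncertainty: assume alive
    };
    !proc_root.join(pid.to_string()).exists()
}
```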

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 02:25:52 -04:00
archipelago
43e700498b fix(android): trust self-signed certs for the user's own node in WebView
Node apps (e.g. NetBird on :8087) terminate TLS with a self-signed cert
so the dashboard gets a secure context (OIDC / window.crypto.subtle, #15).
The WebView's default onReceivedSslError CANCELs untrusted certs, so those
apps rendered blank in the companion — exactly the netbird "won't load in
the webview" report. Override onReceivedSslError in both WebViewClients
(kiosk + in-app browser) to proceed() only when the failing cert's host
matches the connected node; reject everything else (no blanket trust).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:13:52 -04:00
archipelago
89d397bb74 refactor(netbird): delete legacy Rust installer — #20 ph4 (manifest-driven only)
netbird is fully manifest-driven (apps/netbird-*/manifest.yml via the signed
catalog): install_stack_via_orchestrator renders the 3-member stack with
generated_certs (self-signed TLS for the #15 OIDC secure context), base64
generated_secrets, and templated config — and adopts the running stack by live
container name. The hardcoded `podman run` fallback was therefore dead code on
any node with the embedded catalog (verified live: .228 https:8087 -> 200).

Removes the per-app Rust installer anti-pattern the master plan calls out:
- install_netbird_stack: orchestrator -> adopt -> bail! (no in-Rust installer)
- deletes 6 now-dead helpers (write_netbird_config_files, ensure_netbird_tls_cert,
  read_or_generate_b64_secret, netbird_net_resolver_ip, detect_netbird_public_host_ip,
  wait_for_netbird_oidc_ready), 3 NETBIRD_*_IMAGE consts, unused base64::Engine import
- ~485 lines removed; prod_orchestrator doc-comments updated

Behavioural parity: the manifest path already executed on the fleet, so this
changes no live behavior. The legacy #10 OIDC-readiness wait was already bypassed
by the manifest path; if that race resurfaces, add an OIDC-ready gate to the
manifest rather than resurrecting the Rust fn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:04:01 -04:00
archipelago
41e7f500f8 test(lifecycle): tolerate slow-but-healthy heavy-app recovery under 5x churn
The 5x destructive gate on heavy nodes false-failed on transient windows
during stack recovery, not real regressions:

- immich.bats: lan_address port-publish probe 30s -> 90s. The postgres->redis
  ->server (DB migrations on boot) stack can take >30s to republish :2283 after
  a churn-induced recreate; destructive-tier immich tests already allow 180-240s.
- mempool.bats: orphan-container check now polls to steady state (<=30s) instead
  of a single-shot count, which caught a recreated member briefly visible
  alongside its replacement mid-reconcile.
- run-gate.sh: settle cap 180s -> 300s and also gate on immich's :2283 when
  installed, so the next iteration's read-only probe doesn't race a still-
  recovering stack. Settle returns the instant every probe is green.

A genuinely unexposed/orphaned/unhealthy app still fails these checks; they only
absorb the transient recreate window under sustained churn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:18:34 -04:00
archipelago
a721532f55 feat(orchestrator): desired-state recovery + recreate volume-ownership [UNVALIDATED WIP]
NOT yet validated on a node or fleet-deployed — cargo check passes, release build
+ .228 canary validation pending. Committed as a checkpoint so the work survives.

Two fixes the immich .198 incident exposed:

Fix A (reconcile_all_with_mode): a previously-running app whose container vanished
(e.g. a wedged podman teardown cleared by a reboot) was left absent on boot. Now,
when boot reconcile would leave an app 'absent' but it was running at the last
running-containers snapshot, recreate it (install_fresh). New
crash_recovery::load_last_running_names() reads the snapshot without the PID/crash
gate (+2 unit tests). Match is exact on compute_container_name (incl. stack
members); user-stopped + uninstalled apps are already excluded, so no false
positives.

Fix B (ensure_bind_mount_dirs): a freshly-created bind dir was left root:root, so a
no-data_uid app running as container-root (→ host rootless user) hit EACCES and
crash-looped (the exact immich upload-dir failure). Now a newly-created bind dir
for a no-data_uid app is chowned via --reference=<parent> to match the rootless
data root — no host-uid guessing, only fresh dirs (no regression for existing
installs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 09:28:40 -04:00
archipelago
80f49cac1c fix(ui): backoff remote-relay reconnects + stop cryptpad icon 404
Two console-noise fixes from a live error dump:
- remote-relay.ts reconnected on a FIXED 5s interval with no backoff, so when
  the backend is briefly down it floods the console/network with failed-WS
  attempts for the whole outage. It's a secondary feature (companion input), so
  add exponential backoff 1s->30s (mirrors websocket.ts), reset on open/start.
- cryptpad's catalog/marketplace entries pointed at a non-existent
  /assets/img/app-icons/cryptpad.webp -> a 404 on every marketplace render.
  Point it at the existing default icon (handleImageError swapped to it anyway).
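
The reconnect backoff described in the first item can be sketched as follows. It is written in Rust for consistency with the other examples here, whereas remote-relay.ts is TypeScript; the `Backoff` type and its methods are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff: base, 2x base, 4x base, ... capped at `max`.
struct Backoff {
    base: Duration,
    max: Duration,
    attempt: u32,
}

impl Backoff {
    fn new(base: Duration, max: Duration) -> Self {
        Backoff { base, max, attempt: 0 }
    }

    /// Delay to wait before the next reconnect attempt.
    fn next_delay(&mut self) -> Duration {
        let factor = 2u32.saturating_pow(self.attempt);
        let delay = self.base.saturating_mul(factor).min(self.max);
        self.attempt = self.attempt.saturating_add(1);
        delay
    }

    /// Call on a successful open (or an explicit start) to begin again at `base`.
    fn reset(&mut self) {
        self.attempt = 0;
    }
}
```

With a 1s base and a 30s cap, the delay doubles on each failed attempt until it reaches the cap, so an outage produces a few attempts per minute rather than one every 5s.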

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 08:41:04 -04:00
archipelago
2d8ade629b fix(ui): log global errors silently instead of popping a toast + overlay
The global error handler (Vue errorHandler + window error + unhandledrejection)
fired a red 'Something went wrong: <raw msg>' toast AND an auto on-device overlay
on every caught error — deliberately loud for bug-bash, but it surfaces benign,
non-actionable noise (e.g. a transient RPC rejection during a ws reconnect, or
the service worker failing to register over a self-signed cert) right in the
user's face.

Demote the catch-all to SILENT capture: keep console.error + the
window.__archyErrors ring buffer, and expose the screenshot-able overlay
on-demand via window.__archyShowErrors() — but never auto-pop. Components that
need to report a specific, actionable failure still call toast.error() directly.

Also filter known-benign environmental noise (PWA service-worker registration
failing over a self-signed cert — needs a trusted cert, #56) so it doesn't even
occupy a ring-buffer slot and push out real errors.
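
The capture-silently-and-filter behaviour can be illustrated with a small bounded buffer. This is a sketch in Rust rather than the UI's TypeScript; `ErrorRing`, its capacity, and the substring-based benign filter are assumptions, not the actual window.__archyErrors implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical ring buffer of recent error messages.
struct ErrorRing {
    cap: usize,
    entries: VecDeque<String>,
    benign: Vec<&'static str>, // substrings marking known environmental noise
}

impl ErrorRing {
    fn new(cap: usize, benign: Vec<&'static str>) -> Self {
        ErrorRing { cap, entries: VecDeque::new(), benign }
    }

    /// Store `msg` unless it is benign. Returns whether it was stored.
    fn record(&mut self, msg: &str) -> bool {
        if self.cap == 0 || self.benign.iter().any(|pat| msg.contains(*pat)) {
            return false; // benign noise never takes a slot
        }
        if self.entries.len() == self.cap {
            self.entries.pop_front(); // evict the oldest entry
        }
        self.entries.push_back(msg.to_string());
        true
    }

    /// Oldest-to-newest copy, e.g. for an on-demand overlay.
    fn snapshot(&self) -> Vec<String> {
        self.entries.iter().cloned().collect()
    }
}
```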

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:55:49 -04:00
archipelago
0406af522c test(lifecycle): add manifest-driven all-apps health matrix
The per-app suites cover ~8 core apps in depth; nothing covered the ~30 others
(jellyfin, vaultwarden, penpot, nextcloud, grafana, …). all-apps-matrix.bats
derives the app set from server.get-state package-data (no hardcoded list) and
asserts baseline health across EVERY installed app:
  - settles to a non-transitional state within a window (the #13/#14 stuck-ghost
    class, generalized fleet-wide — installing/removing that never settles)
  - not in error/failed
  - reports a recognized (non-garbage) state
  - every running UI app (manifest ui=="true") exposes a non-null lan-address
    (the immich/port-drift unreachable-UI failure, generalized to all UI apps)

Read-only, so it joins run.sh/run-gate.sh on every node and grows coverage as
nodes install more apps. Verified 5/5 on .228 (17 apps) and .116 (20 apps).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:27:10 -04:00
archipelago
57a69257c4 test(lifecycle): add CASCADE uninstall/reinstall tier (guards #13 ghost, #14 reinstall)
The 5x gate is DESTRUCTIVE-only and never exercised uninstall/reinstall — where
the worst field bugs lived (#13 app ghosting in My Apps after uninstall, #14
reinstall stalling on stale state). New cascade-uninstall.bats drives the full
teardown path on a throwaway app (default grafana, precondition-skips if already
installed so it can't destroy real data) and asserts:
  - fresh install reaches running via a truthful, non-silent progression
  - uninstall makes the entry DISAPPEAR from server.get-state package-data
    (the literal My Apps map) — no ghost, no stuck uninstall stage
  - container + (on-node) data dir are gone
  - reinstall returns to running
  - node left as found

Opt-in via ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1; not yet folded into the canonical
gate. Verified 7/7 against .228.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:13:53 -04:00
archipelago
d1cd42c821 fix(orchestrator): stop retrying unrepairable volume chowns every reconcile
ensure_running_container_ownership re-probed and re-attempted the in-container
chown on every reconcile pass. For a mount that can't be re-owned from inside the
userns (observed: mempool-api /data -> 'Operation not permitted'), this burned
CPU and logged a WARN on every pass, forever (~6x/30min on .228/.116).

Remember hard chown failures in a process-lifetime set keyed by (container-id,
dest) and skip the probe+chown for known-unrepairable mounts. Keyed by Id (not
name) so a recreated container gets a fresh repair attempt. Verified on .116:
one recorded failure at startup, then silent across subsequent reconciles.
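
A minimal sketch of the skip set, assuming failures are remembered in memory only. `ChownFailures` and its methods are hypothetical names; the real code sits inside ensure_running_container_ownership.

```rust
use std::collections::HashSet;

/// Hypothetical process-lifetime record of mounts whose chown hard-failed.
#[derive(Default)]
struct ChownFailures {
    seen: HashSet<(String, String)>, // (container id, mount destination)
}

impl ChownFailures {
    /// True when the probe+chown should still be attempted for this mount.
    fn should_attempt(&self, container_id: &str, dest: &str) -> bool {
        !self.seen.contains(&(container_id.to_string(), dest.to_string()))
    }

    /// Remember a hard failure so later reconcile passes skip this mount.
    fn record_failure(&mut self, container_id: &str, dest: &str) {
        self.seen.insert((container_id.to_string(), dest.to_string()));
    }
}
```

Because the key uses the container id rather than the name, a recreated container has a new id and gets a fresh repair attempt.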

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 04:58:57 -04:00
archipelago
3e3016f2bd fix(ui): debounce connection-lost banner so transient ws blips don't flash
The reconnect banner showed 'Connection lost'/'Reconnecting' instantly on every
socket close, even ones that recover in 100ms-2s (load spikes, Tailscale/relay
TCP resets). On a healthy node the drops are brief and self-healing, but each one
flashed a jarring banner, reading as constant instability.

Debounce the transient banner by 2.5s: only surface after the connection issue
persists past the grace window; hide immediately on recovery. Deliberate server
lifecycle transitions (restart/shutdown) bypass the debounce and still show at
once. A genuine persistent outage keeps isOffline true and surfaces after 2.5s.
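
The banner rule reduces to a small decision function. The sketch below is in Rust rather than the UI's TypeScript, and the `Disconnect` enum and `banner_visible` are hypothetical names; only the 2.5s grace window and the lifecycle bypass come from the change itself.

```rust
use std::time::Duration;

const GRACE: Duration = Duration::from_millis(2500);

#[derive(Clone, Copy, PartialEq, Debug)]
enum Disconnect {
    None,            // connected
    Transient,       // socket closed, cause unknown
    ServerLifecycle, // deliberate restart/shutdown
}

/// Whether the banner should be shown after being down for `down_for`.
fn banner_visible(kind: Disconnect, down_for: Duration) -> bool {
    match kind {
        Disconnect::None => false,               // hide immediately on recovery
        Disconnect::ServerLifecycle => true,     // bypasses the debounce
        Disconnect::Transient => down_for >= GRACE,
    }
}
```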

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 04:58:54 -04:00
archipelago
7d89b4d8b2 chore(registry): publish embedded app-catalog.json (52 manifests) for fleet fetch
Force-add the gitignored releases/app-catalog.json so nodes resolve
146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/app-catalog.json
(currently HTTP 404 → disk-manifest fallback). Embedded-manifest delivery
is default-on; origin-wins overlay with disk as fallback. Unsigned (migration
window accepts unsigned). Includes netbird x3 manifests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 23:45:31 -04:00
archipelago
15f65428b8 docs(master-plan): §8b — uninstall fix deployed+live-verifying, #15 guardian resolved
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 18:07:41 -04:00
archipelago
36015a19fe docs(master-plan): §8b session-b state — connection-lost+netbird+UX-merge shipped to .228, uninstall ghost fix, workstream F in progress
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 15:26:17 -04:00
archipelago
e57514b690 fix(uninstall): never ghost a removed app in My Apps on cleanup residue
handle_package_uninstall lumped every teardown failure into one `errors` vec
and returned Err on any of them BEFORE removing the package state entry — so a
non-fatal cleanup hiccup (a slow/failed `sudo rm -rf` of a large data dir, a
volume/network removal) left the app's containers gone but its entry in
package_data → a ghost in My Apps, and the spawned task reverted it to Installed.

Split the failures: container removal that even force-rm can't complete (app
genuinely still present) keeps the entry + returns Err; everything after the
containers are gone is best-effort. Remove the state entry as soon as the
containers are gone — BEFORE the slow volume/data teardown — so My Apps updates
immediately and residue can never ghost the app. set_uninstall_stage is a no-op
once the entry is gone (if-let guard), so the later stages don't re-create it.
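
The fatal/best-effort split can be sketched as below. This is an illustration only: `finish_uninstall`, `UninstallOutcome`, and the `HashMap` standing in for package_data are hypothetical, and the cleanup results are passed in precomputed to keep the example small (in the real flow those steps run after the entry is removed).

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum UninstallOutcome {
    /// Containers are gone and the entry is removed; residue is reported only.
    Removed { residue: Vec<String> },
    /// Containers could not be removed; the entry is kept.
    StillPresent(String),
}

fn finish_uninstall(
    state: &mut HashMap<String, String>,
    app: &str,
    container_removal: Result<(), String>,
    cleanup_steps: Vec<Result<(), String>>,
) -> UninstallOutcome {
    // Fatal: the app is genuinely still present, so keep its entry.
    if let Err(e) = container_removal {
        return UninstallOutcome::StillPresent(e);
    }
    // Containers are gone: drop the entry before looking at slow cleanup.
    state.remove(app);
    // Best-effort: collect failures as residue instead of returning Err.
    let residue = cleanup_steps.into_iter().filter_map(Result::err).collect();
    UninstallOutcome::Removed { residue }
}
```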

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 15:23:16 -04:00
archipelago
4346007d37 fix(orchestrator): only TCP host ports get reachability-probed
wait_for_manifest_host_ports TCP-connect-probed every published port, including
UDP/SCTP. netbird's 3478/udp STUN can never answer a TCP connect, so the probe
failed forever and drove an endless host-port repair/reconcile loop on .228
(netbird-server restarting ~every 60s). Filter to tcp (empty protocol = tcp).
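
A sketch of the protocol filter, assuming each published port carries a free-form protocol string. `PublishedPort` and `tcp_probe_targets` are hypothetical names, not the types used by wait_for_manifest_host_ports.

```rust
#[derive(Debug, Clone, PartialEq)]
struct PublishedPort {
    host_port: u16,
    protocol: String, // "" means the default, tcp
}

/// Host ports that a TCP connect probe can meaningfully test.
fn tcp_probe_targets(ports: &[PublishedPort]) -> Vec<u16> {
    ports
        .iter()
        .filter(|p| p.protocol.is_empty() || p.protocol.eq_ignore_ascii_case("tcp"))
        .map(|p| p.host_port)
        .collect()
}
```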

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 14:40:48 -04:00
archipelago
44f7af2017 merge: companion-mobile-ux UX (loader/store-driven launch/icons + android webview) into main
# Conflicts:
#	Android/app/build.gradle.kts
#	Android/app/src/main/java/com/archipelago/app/ui/screens/WebViewScreen.kt
#	neode-ui/src/views/apps/appsConfig.ts
2026-06-23 14:07:44 -04:00
archipelago
9670af62b6 feat(registry): deliver app manifests via the signed catalog (embed by default)
Turn on registry-distributed manifests for all apps: generate-app-catalog.sh now
embeds each apps/<id>/manifest.yml by default (EMBED_MANIFESTS opt-out), so nodes
install from the signed catalog (origin-wins overlay, disk = fallback) with no
OTA-shipped disk manifest. main.rs awaits a bounded (25s) refresh_catalog before
load_manifests so a fresh boot overlays the latest embedded catalog instead of a
restart later; offline/ISO boot falls through to disk and never hangs.
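
The bounded-refresh-then-fallback idea can be sketched with the standard library alone. main.rs awaits the refresh, so the real code is asynchronous; the thread-and-channel approach and the name `refresh_or_fallback` are assumptions made to keep the example dependency-free.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Wait at most `limit` for `fetch`; otherwise use the on-disk manifests.
fn refresh_or_fallback<F>(fetch: F, limit: Duration, disk: Vec<String>) -> Vec<String>
where
    F: FnOnce() -> Option<Vec<String>> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore a failed send.
        let _ = tx.send(fetch());
    });
    match rx.recv_timeout(limit) {
        Ok(Some(catalog)) => catalog, // fresh catalog arrived in time
        _ => disk,                    // timeout, fetch failure, or offline boot
    }
}
```

The worker thread is left to finish on its own after a timeout, so boot never blocks on it.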

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:54 -04:00
archipelago
a8b9b0f5e8 feat(netbird): manifest-driven migration via reusable orchestrator primitives
Migrate the netbird stack (server/dashboard/proxy) off ~500 lines of per-app Rust
to 3 declarative manifests, adding 4 reusable primitives:
- SecretGenKind::Base64 (netbird relay authSecret + sqlite store encryptionKey)
- GeneratedCert schema + ensure_manifest_certs (self-signed TLS so the dashboard
  gets a secure context for OIDC PKCE — issue #15; https proxy on 8087 preserved)
- templated GeneratedFile render: {{HOST_IP}}/{{HOST_MDNS}}/{{NETWORK_GATEWAY}}
  (aardvark resolver for the #15 stale-IP fix) /{{secret:NAME}} (never logged)
- legacy create_container now honours port.protocol (3478/udp STUN)
install_netbird_stack routes via the orchestrator first (legacy kept as fallback,
mirroring indeedhub); launch URL derives https://{host_ip}:8087 from host facts.
Legacy Rust deletion deferred to post-live-verify.
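
The templated GeneratedFile render can be pictured as plain placeholder substitution. The sketch below assumes host facts and secrets are available as string maps; `render` is a hypothetical function, and the real renderer may resolve placeholders differently.

```rust
use std::collections::HashMap;

/// Replace `{{NAME}}` with host facts and `{{secret:NAME}}` with secret values.
/// Unknown placeholders are left untouched. Secret values are only written
/// into the returned string, never printed.
fn render(
    template: &str,
    facts: &HashMap<&str, &str>,
    secrets: &HashMap<&str, &str>,
) -> String {
    let mut out = template.to_string();
    for (key, value) in facts {
        out = out.replace(&format!("{{{{{}}}}}", key), value);
    }
    for (name, value) in secrets {
        out = out.replace(&format!("{{{{secret:{}}}}}", name), value);
    }
    out
}
```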

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:53 -04:00
archipelago
3c36cf1c40 fix(companion): stop image_exists journal flood that drops the UI websocket
image_exists ran `podman image inspect <image>` via .status() (inherits the
service stdout) with no --format, so every hit dumped the image's full ~249-line
manifest JSON into the journal — once per companion image, every reconcile pass
(.228: 21.6k journal lines / 10 min, 4131 inspect dumps). The service never
crashed (NRestarts=0); the sustained journald/IO flood starved the async runtime
and dropped the UI /ws/db websocket -> constant "connection lost"/reconnect.
Discard the child's stdout/stderr; only the exit status is used.
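
A minimal sketch of the fix, assuming the check needs only the exit status. `exits_ok` is a hypothetical helper, and this `image_exists` body is an illustration rather than the project's actual function.

```rust
use std::process::{Command, Stdio};

/// Run a command with its output discarded and report whether it exited 0.
/// A command that cannot be spawned at all counts as "not ok".
fn exits_ok(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdout(Stdio::null()) // do not inherit the service's stdout
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Existence check via `podman image inspect`, keeping the JSON out of the journal.
fn image_exists(image: &str) -> bool {
    exits_ok("podman", &["image", "inspect", image])
}
```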

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:19 -04:00
archipelago
c4cd5fdc90 docs(master-plan): §8b resume — gate green + 6-node deploy + APK fix + workstream F
Comprehensive resume for the session restart: single-node gate green
(5/5 .228), latest backend + UX + one-tap companion APK deployed to 6
nodes (table w/ creds + pending 100.64.83.15 cred), workstream-F bugs
from manual testing, agreed next order (netbird → Phase-3 → F →
multinode), and loose ends (untracked AppLoadingScreen.vue, broken
gitea-local mirror, don't-delete-bitcoin-data directive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:56:54 -04:00
archipelago
ccb594fb85 test(gate): fix bitcoin-knots getinfo-after-restart helper + IBD note
It called bats-assert's `fail` (not loaded in this file) → "fail:
command not found"/127, masking the real reason. Emit+return instead,
bump the cold-restart RPC window 60s→120s (block-index reload), and
note a node mid-IBD legitimately can't serve getinfo (environmental
precondition, not a product regression).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:28:20 -04:00
archipelago
deff380191 docs(master-plan): workstream F (lifecycle perfection) + §10 state-mgmt backlog
The 2026-06-23 5×-green gate is DESTRUCTIVE-tier / ~8 core apps only —
it skips uninstall/reinstall (cascade) and has no progress-UI or
all-apps coverage. Manual multinode testing found real bugs it never
ran (immich+grafana uninstall hangs at full-red bar + ghost in My Apps;
grafana reinstall stops; fedimint guardian "waiting for bitcoin sync").
Adds §4 row F, §6b post-deploy order (netbird→Phase-3→F), §6c scope +
observed bugs + definition-of-done, a §5 warning, and §10 backlog to
investigate TanStack-Query/push-based state management for neode-ui.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:28:19 -04:00
Dorian
5c43e12782 chore(android): publish companion as raw APK instead of zip
Serve the companion download as a plain .apk so a phone installs it
straight from the link/QR with no unzip step. Repoint the in-app
download URL, the ship + publish scripts, and the pre-push hook at
archipelago-companion.apk, and drop the legacy .apk.zip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 09:41:10 +01:00
Dorian
e825bbed73 feat(android): file upload/download + in-app tab redesign
Companion WebView now supports file inputs and downloads, and apps
opened in the in-app tab get a proper loading splash and a footer
control bar matching the web app-session bar.

- onShowFileChooser wired to an ActivityResultLauncher so <input
  type=file> opens the system file browser (kiosk + in-app tab)
- DownloadListener: http(s) via DownloadManager (forwarding session
  cookies), blob: via JS->base64->MediaStore, data: decoded inline
- in-app tab: app-icon + progress loading splash (eager favicon
  fetch, upgraded via onReceivedIcon)
- footer controls (back/forward/refresh/open/close) matched to the
  web AppSession mobile bar, with the same SVG glyphs as drawables
- bump to 0.4.8 (versionCode 12)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 09:41:10 +01:00
archipelago
0dd19f0721 docs(CLAUDE.md): single-node gate GREEN — demote priority banner
run-gate.sh 5/5 on .228. Reframe the TOP PRIORITY banner as
gate-green; keep the master plan as north-star source of truth; mark
the gate definition-of-done green and point at multinode as the next
exit criterion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:35:50 -04:00
archipelago
ae47897601 docs: single-node production gate GREEN (5/5 on .228) — demote banner
run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:27:36 -04:00
archipelago
256d354048 docs(master-plan): tick off §8 P1 mobile app-launch UX (code-complete)
Mobile launch UX is code-complete on branch `companion-mobile-ux` (store-driven
panel, no interstitial, in-app WebView footer + loader, mesh 100dvh, ElectrumX
icon, companion v0.4.7 + shared debug keystore). Marked code-complete pending
on-device/mobile-web verification and merge to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:11:25 -04:00
archipelago
2a249b8a48 feat(android): companion in-app WebView footer controls + loader; shared debug key; v0.4.7
- InAppBrowser now has a bottom control bar (back/forward/reload/open-in-browser/
  close) mirroring the web mobile footer, plus a centered loading screen
  (app favicon + progress bar) instead of a bare top bar over black.
- Commit a repo-dedicated debug keystore and pin signingConfigs.debug to it so
  every machine — and the published companion download — signs debug builds with
  the SAME key (fixes "App not installed" signature-mismatch on update). Force v1+v2.
- Bump versionCode 10→11, versionName 0.4.6→0.4.7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:48:58 -04:00
archipelago
a7c7c44843 feat(neode-ui): mobile app-launch UX — store-driven panel, loader, ElectrumX icon
- Mobile launches use the store-driven panel (no route push) so the background
  tab no longer changes and closing returns to where you launched from.
- Tab-only apps open directly (in-app WebView on companion / new tab on PWA) —
  no "this app opens in a tab" interstitial.
- Shared AppLoadingScreen (app icon + progress bar) on the app session and the
  legacy iframe overlay instead of a black screen.
- Pin the dashboard to 100dvh on mobile so the mesh chat/tools panes stop sliding
  under the bottom tab bar in mobile browsers (no-op in the companion WebView).
- ElectrumX/electrs/electrs-ui ids now resolve to the real ElectrumX icon in My Apps.
- isMobile made reactive so overlay/footer/teleport decisions track the viewport.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:48:57 -04:00
archipelago
2afd18c6de test(gate): poll immich lan_address to absorb mid-recreate churn
5× run #4 flaked iter4 on "immich exposes its web UI lan-address
(port 2283)": container-list returned lan_address=null because
immich_server was momentarily mid-recreate when the read-only tier
queried it (passed the other 4 iterations; immich_server does publish
0.0.0.0:2283->2283). Same single-shot-read class as the bitcoin-knots
state probe — poll <=30s for the exposed port instead of one read. A
genuinely unexposed immich never publishes 2283, so real port drift
is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:20:18 -04:00
archipelago
6511754545 docs: master-plan §8b — 5× triage, mempool restart bug fixed
Record the overnight 5× outcome (2/5) and the triage: all three
fails were distinct one-offs. iter1 #5 bitcoin-knots = pre-launch
churn (hardened anyway); iter2 #74 + iter5 #73 = one real
orchestrator bug (phantom stack-member injection in
ordered_containers_for_start), now fixed + live-verified on .228.
Update the resume check command to gate-5x4.log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 02:23:07 -04:00
archipelago
92d7f52dd6 fix(orchestrator): order only live containers on package start/restart
package.restart resolved its container list via
ordered_containers_for_start, which injected every name from the
union startup_order list that wasn't already present — including
variant names not live on a given node (mysql-mempool,
archy-mempool-api, archy-mempool-web). The phantom mysql-mempool is
2nd in the mempool start order, so do_orchestrator_package_start hit
its unknown-app-id fallback, do_package_start failed the inspect
("no such object"), and the `?` aborted the whole start sequence —
leaving mempool-api + the frontend down until the health monitor
recovered them minutes later. That was the source of the 5× gate
flakes #73 (frontend not running in 180s) and #74 (api not queryable
in 300s); root-caused from the .228 journal
("Start failed: mysql-mempool").

Replace the inject-then-sort logic with a pure helper
order_present_containers that orders only the actually-present
containers and never adds phantom entries. startup_order remains a
union of name variants across install generations — it's now used
purely to order what's live, not to inject what isn't. +3 unit tests.
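
The order-only-what-is-live rule can be sketched as below. The function name matches the helper named in this commit, but the signature and body are assumptions, not the committed code.

```rust
/// Order the containers that actually exist by their position in
/// `startup_order`. Names missing from `startup_order` keep their relative
/// order and sort last; names only in `startup_order` are never added.
fn order_present_containers(present: &[String], startup_order: &[&str]) -> Vec<String> {
    let rank = |name: &str| {
        startup_order
            .iter()
            .position(|candidate| *candidate == name)
            .unwrap_or(usize::MAX)
    };
    let mut out = present.to_vec();
    out.sort_by_key(|name| rank(name.as_str())); // stable sort
    out
}
```

Since the output is a reordering of `present`, a variant name such as mysql-mempool that is not live on the node cannot appear in the start sequence.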

Also harden bitcoin-knots.bats "valid state" probe: poll ≤30s for a
settled state instead of a single-shot read, so a container caught
mid-reconcile (transient restarting/configured) can't flake a 20-min
iteration. A genuinely-stuck container never settles, so real
breakage is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 02:22:50 -04:00
archipelago
57a013bc66 test(gate): make 5× the canonical gate, drop 20x naming
Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub
20× references across CLAUDE.md, the master plan, TESTING.md, app-registry
status, the orchestrator/config doc-comments, and the bats suites. Also add
a minimal fail() helper to mempool.bats so guard failures report cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:12:41 -04:00
archipelago
0f05f73a23 fix(mempool): self-healing nginx backend proxy (v3.0.1) + gate timeout
The frontend nginx used a literal proxy_pass host with no resolver, so it
pinned mempool-api's IP at worker startup. When the backend restarts (gate,
OTA, crash, reboot re-IPAM) podman reassigns its IP and nginx keeps proxying
to the dead one -> /api hangs, websocket 502s, UI shows 'offline' until a
manual nginx reload. Same stale-upstream-IP class as the netbird 502.

Fix: mempool-frontend:v3.0.1 rewrites the generated nginx-mempool.conf to
re-resolve the backend per-request via 'resolver' + a variable proxy_pass.
Resolver address is read from /etc/resolv.conf (podman aardvark-dns answers
on the network gateway, not Docker's 127.0.0.11). Per-location path mapping
preserved (ws -> '/', /api/v1 identity via no-URI, /api/ -> /api/v1/ rewrite).
Proven on .228: backend IP change now auto-recovers with no reload; the
literal-host control still 502s. Migrated the manifest off the retired
tx1138 registry to vps2.

Also: mempool.bats #74 waited only 180s post-restart (the slow path) and
called an undefined 'fail' helper (status 127). Bumped to 300s to match the
passing parity probes and emit a real failure instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:07:07 -04:00
archipelago
c8acc84506 docs: §2 invariant single-node (.228); multinode → separate plan
2026-06-22 17:23:19 -04:00
archipelago
8355453a7e docs: exact cutoff-proof resume in master-plan §8b (resume from any device)
Captures: .228 1x-GREEN (110/110); hardened 5x DETACHED on .228 (/tmp/gate-5x2.log,
nohup — survives terminal close) with the exact check-from-any-machine command; all
shipped code fixes (commits) + deploy state (.228 + .198); node-state fixes NOT in
repo (lnd nginx proxy 8081->18083, home-assistant orphan unit removed, electrumx
re-registered); the run-ON-the-node lesson; and remaining work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:22:29 -04:00
archipelago
98f4fa44a8 test(gate): harden readiness for sustained 5x churn + inter-iteration settle
The 1x gate is green; the 5x failed iters 1-2 on readiness-under-churn (apps DO
recover — lnd synced, mempool just mid-restart when probed — but slower than the
windows when restarted back-to-back). Hardening:
- run-20x.sh: best-effort settle_stack() before each iteration (wait for
  mempool-api/frontend + lnd RPC healthy, 180s, on-node, never fails the run).
- required containers present/running (80/81): wait-loops (180s) not single-shot.
- mempool api/frontend (87/88): retry ~180s not single-shot.
- mempool queryable (74): 60s->180s. lnd restart-running (64): 120s->240s.
  lnd getinfo (60): 90s->240s retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:11:15 -04:00
archipelago
22b05de6d9 docs(roadmap): P1 mobile app-launch UX — drop 'opens in a tab' interstitial
Companion app: open every app in the in-app WebView (not just non-iframeable),
carrying the mobile-iframe footer controls into the WebView. Mobile web (PWA):
open tab-apps directly in a new tab. No interstitial on either surface. Touch
points + prior commits (b5a9deb8, d1fbcd9b) noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 16:57:44 -04:00
archipelago
27299ea687 docs: make the production test gate a SINGLE-NODE (.228) criterion; split out multinode
Per direction: the gate is now 5x green ON .228 only (run on the node, not via RPC).
Fleet/multinode verification (.198 + others) moved to a new docs/multinode-testing-plan.md
with the bootstrap recipe, per-node preconditions (synced archival bitcoin, no stale
nginx proxy targets, no orphan quadlet units), node roster, and cross-node suites.
Updated CLAUDE.md, master-plan §5/§6/§8b/WS-E, and TESTING.md release gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 16:47:34 -04:00
archipelago
892ff083c4 test(gate): fix the last 4 readiness/config false-fails (none are product bugs)
On a proper on-node .228 run (synced bitcoin, 4-fix binary) the lifecycle matrix is
green; these 4 were test-harness issues:
- lnd 'recovers after restart' (65): bump retry window 90s->240s. lnd cold-restart
  recovery (wallet unlock + bitcoind reconnect + graph sync) exceeds 90s on a loaded
  node but DOES complete (synced_to_chain:true).
- bitcoin ui responds (89): retry ~120s instead of single-shot (companion nginx may
  have just been recreated by the companion-survives test).
- probe_app_url (99 lnd proxy + all ui-coverage proxy probes): retry up to 90s for
  post-restart proxy/UI readiness instead of single-shot.
- required endpoints after restart (94): :8081 is nginx-proxy-manager, an OPTIONAL
  app (not in required_containers) — only assert it when NPM is installed; and make
  the trailing lncli getinfo a retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 15:43:51 -04:00
archipelago
8893055810 test(gate): retry lnd getinfo for RPC readiness (wallet-unlock lags 'running')
lnd's RPC isn't ready until its wallet auto-unlocks on (re)start, which lags the
container 'running' state — single-shot lncli getinfo raced that window and
false-failed (gate tests 60 + 85). Retry up to ~90s like a health probe. lnd is
functional (getinfo returns cleanly once ready).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 14:45:36 -04:00
archipelago
53b8e47f1d test(gate): fix two false-failing lifecycle tests (not product bugs)
- immich restart: bump wait 120s->240s. Restart = ordered stop+start of the 3-
  container stack (postgres->redis->server w/ DB migrations), so it needs at least
  as long as the start test (180s) — the old 120s was inconsistent and false-failed
  on loaded nodes. immich does return to running.
- fedimint orphan check: the unanchored 'total' regex (^fedimint) counts the
  legitimate fedimint-clientd (dual-ecash bridge) but the anchored 'known' regex
  omitted it -> total>known false orphan on every node running fedimint-clientd.
  Add fedimint-clientd to known.

Both run as LOCAL podman/systemctl on the gate runner, so they test the runner node
(.116), not the RPC target — surfaced while driving the .228 gate green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 14:11:35 -04:00
archipelago
f4727bfdb3 docs(gate): companion self-heal fix validated (10s) + test-31 harness caveat
Independent companion loop (452f05d8) validated on .228: deleted archy-electrs-ui
recreates in ~10s (was stuck 100s+). Also: companion-survives bats does LOCAL
rm/systemctl --user, so running it from .116 via RPC tests .116's companions with
.116's binary, NOT the remote target — must run ON the target node. Explains the
'failed on both nodes' runs (both silently tested .116).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 13:44:57 -04:00
archipelago
452f05d849 fix(reconciler): decouple companion self-heal onto its own cadence
The companion-unit repair stage ran at the END of each boot-reconciler tick, after
reconcile_existing(). On a heavily loaded node that per-app pass takes >60-90s, so a
deleted/lost companion unit (electrs-ui, bitcoin-ui, …) wasn't repaired within any
reasonable window (gate test 31 'deleted unit recreated within one reconcile tick'
timed out at 90s on the 45-app .228 node). Detecting + rewriting a companion unit is
cheap, so spawn it as its own ~interval(30s) loop, independent of the slow app pass.
Handle is aborted when the main loop exits (shutdown uses notify_one, so a second
waiter would steal the wake permit). tick() is now app-reconcile only.

All 4 boot_reconciler cadence tests still green (companion_stage=false in tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 13:04:28 -04:00
archipelago
de7d3d83dc docs(gate): final read — every failure fixed/explained, no lifecycle bugs remain
Last 2 .228 stragglers confirmed load/timing, not bugs: test 31 (companion recreate)
= contamination + ~108s reconcile cadence > 90s window; test 55 (immich restart) =
heavy stack restarts >120s under load but DOES return. Path to literally-green gate
is infra (bitcoin sync, re-quadletize .228) + minor test-window tuning. Optional
product improvement noted: independent ~30s companion-reconcile cadence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 12:36:03 -04:00
archipelago
76b23adcc0 docs(gate): test 31 root-caused = .228 contamination (not a product bug)
companion::reconcile only recreates a deleted companion unit when its parent
backend is in manifest_ids. On contaminated .228, electrumx ran as plain podman
and was NOT a tracked manifest install (manifest on disk but unloaded), so the
reconciler never iterated it -> archy-electrs-ui companion orphaned. Proven:
package.install electrumx re-registered it + restored the companion. Self-heal
logic is sound; test 31 clears on re-quadletize. electrumx on .228 de-contaminated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:34:55 -04:00
archipelago
47a5148865 docs(gate): two-node result — stop blocker FIXED; residual red is bitcoin-IBD + node prep
.228 104/110, .198 94/110 with the 3-fix binary. Every package.stop test passes on
healthy apps. .198's 14/16 failures trace to bitcoin in IBD (test 83: ~137k blocks
behind) cascading to lnd/btcpay/electrumx/mempool. 2 node-independent: companion
recreate (31, both nodes), fedimint orphan pollution (44). Path to green 5x gate is
now infra (sync bitcoin, re-quadletize .228) + minor (test 31), not lifecycle bugs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:09:12 -04:00
archipelago
b090235b04 docs(gate): 3 stop bugs FIXED, electrumx suite GREEN on .228
Stop failure was 3 real product bugs (grace / reconcile-resurrection /
container-list user-stopped state), all fixed (2dad64b2, 760a32bc, 6e49ce6f) +
deployed. electrumx lifecycle suite 10/10 green (66s). fedimint 'crash loop' was
probe-induced churn (stable when left alone). Validating breadth next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:49:45 -04:00
archipelago
6e49ce6f88 fix(container-list): report user-stopped apps as stopped despite live UI companion
A user-stopped backend (electrumx, bitcoin, lnd, fedimint) kept reading 'running'
in container-list because its UI companion (electrs-ui, …) still serves the launch
port, and the state-refresh upgrades any reachable launch port to 'running'. The
gate's wait_for_container_status <app> stopped therefore never saw 'stopped'.

Fix: load the user_stopped marker in handle_container_list and force 'stopped' for
those apps before the launch-port refresh. The reconcile guard keeps the backend
down, so the marker is authoritative. package.start clears it first, so a started
app reports 'running' normally.
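
The precedence this fix establishes can be shown as a small pure function. `reported_state` and its parameters are hypothetical, and the on-disk marker is modelled as a set of app names.

```rust
use std::collections::HashSet;

/// State to report for `app` in the container list.
fn reported_state(
    app: &str,
    user_stopped: &HashSet<String>,
    launch_port_reachable: bool,
    raw_state: &str,
) -> String {
    if user_stopped.contains(app) {
        // The marker wins, even if a UI companion still serves the launch port.
        return "stopped".to_string();
    }
    if launch_port_reachable {
        return "running".to_string(); // the existing launch-port upgrade
    }
    raw_state.to_string()
}
```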

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:26:30 -04:00
archipelago
760a32bccf fix(reconcile): keep user-stopped apps stopped (reconciler was resurrecting them)
package.stop a dependency (e.g. electrumx, a mempool dep) and the reconciler
restarts it within ~8s: the reconcile filter's dependency_required override
re-includes a user-stopped app that an active app depends on, and the in-memory
disabled set is wiped on manifest reload — so ensure_running runs, the stopped
app's unreachable ports look like a fault, the host-port repair restarts it, and
package.stop never sticks (gate 'transitions to stopped' times out).

Fix: guard ensure_running_with_mode on the on-disk user_stopped marker (the single
choke point every reconcile flows through) → Left('user-stopped'). Explicit
install/start clear the marker first (added clear_user_stopped to orchestrator
install/start, symmetric with disabled.remove; start/restart RPC already cleared
it) so user actions are unaffected. The container itself already stopped correctly
— this stops the resurrection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:04:02 -04:00
280 changed files with 24343 additions and 2146 deletions


@@ -2,7 +2,7 @@
 # Keep the served companion APK in sync with main on every push.
 #
 # When a push to main includes Android changes, rebuild the APK, refresh
-# neode-ui/public/packages/archipelago-companion.apk.zip, commit it, and ask
+# neode-ui/public/packages/archipelago-companion.apk, commit it, and ask
 # you to push again (so the refreshed APK rides along in the same push).
 #
 # Enable once per clone: git config core.hooksPath .githooks
@@ -40,7 +40,7 @@ fi
 bash scripts/publish-companion-apk.sh || exit 0
-DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
+DEST="neode-ui/public/packages/archipelago-companion.apk"
 if git diff --cached --quiet -- "$DEST"; then
 exit 0 # APK unchanged — nothing to do
 fi

Android/.gitignore

@@ -14,3 +14,8 @@ local.properties
 *.aab
 *.jks
 *.keystore
+# Exception: the repo-dedicated *debug* keystore is committed on purpose so every
+# machine (and the published companion download) signs debug builds identically —
+# updates then install over the top without an uninstall. Debug keys are not
+# secret (well-known password "android"); never commit a real release keystore.
+!/app/debug.keystore


@ -0,0 +1,94 @@
# Companion App — Build, Ship & "App Not Installed" Runbook
Canonical procedure for releasing the Archipelago Companion Android app and for
debugging install failures. Read this before touching the companion release flow.
Hard lessons from 2026-06-26 are baked in below — don't relearn them.
## Ship the companion (the only sanctioned way)
```bash
./Android/ship-companion.sh
```
This calls `scripts/publish-companion-apk.sh` (the single source of truth, also
used by the `.githooks/pre-push` hook), which:
1. **Removes/rejects resource dirs whose names contain spaces.** Empty stray
`mipmap-* NNN` dirs (left by icon-export tools) break a *clean* build with
`Invalid resource directory name`. Incremental builds hide them — clean builds
don't.
2. **Always does a CLEAN build** (`:app:clean :app:assembleDebug`).
3. **Forces v1 + v2 + v3 signing** via `zipalign` + `apksigner`.
4. **Verifies all three schemes** (`apksigner verify --min-sdk-version 21`) and
**aborts** if any is missing.
5. Stages the signed APK at `neode-ui/public/packages/archipelago-companion.apk`,
commits, and pushes with `SHIP_COMPANION=1` (the sanctioned pre-push bypass).
**Never** hand-roll `gradlew assembleDebug` + `cp` to the served path. That path
skips the clean build and the signature enforcement and is exactly how a broken
APK shipped.
### Bump the version first
Edit `Android/app/build.gradle.kts` → `versionCode` (must strictly increase) and
`versionName`. The committed value can drift AHEAD of what's actually built into
the served APK, so verify the served APK's real version after shipping:
`aapt2 dump badging neode-ui/public/packages/archipelago-companion.apk | grep version`.
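
The drift check above could be scripted roughly as follows. This is a minimal sketch, not part of the publish script: the function names are hypothetical, and it assumes `versionCode` is a literal integer in the Gradle file and that `aapt2 dump badging` prints its usual `package: ... versionCode='N'` line.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): detect drift between the committed versionCode
# and the versionCode baked into the served APK.

# Read Gradle Kotlin DSL text on stdin, print the first literal versionCode.
gradle_version_code() {
  sed -n 's/^[[:space:]]*versionCode[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read `aapt2 dump badging` output on stdin, print the package versionCode.
apk_version_code() {
  sed -n "s/^package:.*versionCode='\([0-9][0-9]*\)'.*/\1/p" | head -n 1
}

# Succeed only when both values are non-empty and identical.
versions_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example wiring (left commented out because it needs aapt2 and the repo layout):
# committed=$(gradle_version_code < Android/app/build.gradle.kts)
# served=$(aapt2 dump badging neode-ui/public/packages/archipelago-companion.apk | apk_version_code)
# versions_match "$committed" "$served" || echo "drift: committed=$committed served=$served" >&2
```

Keeping the parsers as stdin filters means they can be exercised with canned text, without a built APK.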
## Signing facts (important)
- Debug builds are signed with the **committed** `Android/app/debug.keystore`
(store/key pass `android`, alias `androiddebugkey`) so every machine and the
served download share ONE signing key. Cert SHA-256: `D6:22:E0:7E:…:66:4D`.
- **AGP silently ignores `enableV1Signing = true` for `minSdk ≥ 24`**, so a plain
gradle build produces a **v2-only** APK. The `apksigner` step in the publish
script is what actually guarantees v1+v2+v3 — do not remove it.
- **Changing the signing key forces every existing install to be uninstalled
once.** Android blocks in-place upgrades across different signatures. Treat the
keystore as permanent; never regenerate it casually.
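
As an illustration of the "verify all three schemes" gate, here is a hedged sketch of a checker. The function name `has_all_schemes` is hypothetical, and it assumes `apksigner verify -v` prints lines of the form `Verified using vN scheme (...): true`; the real publish script may do this differently.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read `apksigner verify -v` output on stdin and
# succeed only if the v1, v2 and v3 schemes all report "true".
has_all_schemes() {
  local out scheme
  out=$(cat)
  for scheme in v1 v2 v3; do
    if ! printf '%s\n' "$out" | grep -Eq "^Verified using ${scheme} scheme .*: true$"; then
      echo "missing ${scheme} signature" >&2
      return 1
    fi
  done
}

# Example wiring (commented out because it needs apksigner and a built APK):
# apksigner verify -v --min-sdk-version 21 neode-ui/public/packages/archipelago-companion.apk \
#   | has_all_schemes || exit 1
```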
## Debugging "App Not Installed" — DIAGNOSE FIRST
Do **not** theorize about signing schemes / OEM quirks. Get the real reason:
```bash
adb install ~/Desktop/archipelago-companion-<ver>.apk
# -> Failure [INSTALL_FAILED_<REASON>: ...]
```
Map the reason:
| `INSTALL_FAILED_*` | Cause | Fix |
|---|---|---|
| `UPDATE_INCOMPATIBLE … signatures do not match` | Old install signed with a **different key** (e.g. pre-shared-keystore per-machine key `58:31:12…`). | Uninstall the old package, then install. **One-time** per device after a key change. |
| `INVALID_APK` / parse error | Corrupt/incomplete download or bad signing. | Re-download; re-run the publish script. |
| `INSUFFICIENT_STORAGE` | Storage. | Free space. |
| `OLDER_SDK` | Device below `minSdk` (26 = Android 8.0). | Unsupported device. |
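
The table above can be sketched as a small lookup that turns an `adb install` failure line into the suggested fix. The function name is hypothetical, and matching `INSTALL_PARSE_FAILED` for the "parse error" row is an assumption; the messages paraphrase the table.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): map an adb failure line to the fix from the table.
explain_install_failure() {
  case "$1" in
    *INSTALL_FAILED_UPDATE_INCOMPATIBLE*)
      echo "Different signing key: uninstall the old package once, then install." ;;
    *INSTALL_FAILED_INVALID_APK*|*INSTALL_PARSE_FAILED*)
      echo "Corrupt download or bad signing: re-download, then re-run the publish script." ;;
    *INSTALL_FAILED_INSUFFICIENT_STORAGE*)
      echo "Free space on the device." ;;
    *INSTALL_FAILED_OLDER_SDK*)
      echo "Device is below minSdk 26 (Android 8.0): unsupported." ;;
    *)
      echo "Unmapped reason: read the full adb output."
      return 1 ;;
  esac
}

# Example (commented out because it needs a connected device):
# explain_install_failure "$(adb install ~/Desktop/archipelago-companion-<ver>.apk 2>&1 | tail -n 1)"
```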
> A manual uninstall on the phone may NOT clear `UPDATE_INCOMPATIBLE` if the
> package is registered under another user/profile — `pm path <pkg>` under user 0
> can show nothing while the conflict persists. `adb uninstall <pkg>` clears it
> across all users.
## Phone / adb safety (non-negotiable)
When acting on the user's physical phone, be surgical — the user once had all
home-screen app layouts wiped by an over-broad action.
- Default to **read-only** adb (`devices`, `getprop`, `pm path/list`, `dumpsys`).
- Mutations (`adb install`, `adb uninstall com.archipelago.app.debug`) only with
explicit go-ahead and **scoped to our exact package** — echo it first.
- **Never** run launcher/system resets: no `pm clear` on launchers, no
`reset-permissions`, no factory wipe, no uninstalling apps you didn't build.
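
One way to make these rules mechanical is to classify an adb invocation before running it. The sketch below is an assumption-laden illustration: `classify_adb` and its three labels are hypothetical, and it cannot check which package an APK passed to `install` contains, so that case still needs a human go-ahead.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): classify adb arguments under the safety rules.
# Prints "read-only", "needs-go-ahead" or "refused"; returns non-zero on refusal.
ARCHY_PKG="com.archipelago.app.debug"

classify_adb() {
  case "$1" in
    devices)
      echo "read-only" ;;
    shell)
      case "$2" in
        getprop|dumpsys)
          echo "read-only" ;;
        pm)
          case "$3" in
            path|list) echo "read-only" ;;
            *) echo "refused"; return 1 ;;
          esac ;;
        *)
          echo "refused"; return 1 ;;
      esac ;;
    install)
      # The APK's package name is not verified here; confirm it before approving.
      echo "needs-go-ahead" ;;
    uninstall)
      if [ "$#" -eq 2 ] && [ "$2" = "$ARCHY_PKG" ]; then
        echo "needs-go-ahead"
      else
        echo "refused"; return 1
      fi ;;
    *)
      echo "refused"; return 1 ;;
  esac
}
```

A wrapper could echo the full command, call `classify_adb "$@"`, and only invoke `adb` for `read-only` results or after an explicit confirmation.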
## Verify the published download after shipping
The download served to nodes is Gitea raw-on-main. Confirm the live bytes match
what you built and signed:
```bash
SERVED=neode-ui/public/packages/archipelago-companion.apk
URL=http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/$SERVED
curl -sS -o /tmp/live.apk "$URL"
shasum -a 256 "$SERVED" /tmp/live.apk # must match
apksigner verify -v --min-sdk-version 21 /tmp/live.apk | grep -i "scheme" # v1/v2/v3 = true
```


@@ -11,20 +11,40 @@ android {
 applicationId = "com.archipelago.app"
 minSdk = 26
 targetSdk = 35
-versionCode = 10
-versionName = "0.4.6"
+versionCode = 16
+versionName = "0.4.12"
 vectorDrawables {
 useSupportLibrary = true
 }
 }
+signingConfigs {
+// Repo-dedicated debug keystore (committed at app/debug.keystore) so every
+// machine — and the published companion download — signs debug builds with
+// the SAME key. Without this, Gradle falls back to each machine's
+// ~/.android/debug.keystore, so a build from a different machine has a
+// different signature and the phone rejects the update ("App not installed").
+getByName("debug") {
+storeFile = file("debug.keystore")
+storePassword = "android"
+keyAlias = "androiddebugkey"
+keyPassword = "android"
+// Force both legacy JAR (v1) and APK Signature Scheme v2. AGP drops v1
+// for minSdk>=24, but some OEM package installers (e.g. Samsung) reject
+// a v2-only sideload with "App not installed" — keep v1 for max compat.
+enableV1Signing = true
+enableV2Signing = true
+}
+}
 buildTypes {
 debug {
 // Separate app ID so a debug/test build installs alongside the
 // release app instead of colliding on signature.
 applicationIdSuffix = ".debug"
 versionNameSuffix = "-debug"
+signingConfig = signingConfigs.getByName("debug")
 }
 release {
 isMinifyEnabled = true

BIN
Android/app/debug.keystore Normal file

Binary file not shown.


@@ -112,6 +112,37 @@ class ServerPreferences(private val context: Context) {
 }
 }
+/**
+ * Replace a saved server in place. Matches the existing entry by connection
+ * identity (address/port/scheme) so edits that change the name or password
+ * or that touch a legacy 4-field entry still update the right record. If the
+ * edited server is also the active one, the active record is kept in sync.
+ */
+suspend fun updateSavedServer(original: ServerEntry, updated: ServerEntry) {
+context.dataStore.edit { prefs ->
+val current = prefs[savedServersKey] ?: emptySet()
+val filtered = current.filterNot { raw ->
+val e = ServerEntry.deserialize(raw)
+e != null &&
+e.address == original.address &&
+e.port == original.port &&
+e.useHttps == original.useHttps
+}.toSet()
+prefs[savedServersKey] = filtered + updated.serialize()
+val isActive = prefs[activeAddressKey] == original.address &&
+(prefs[activePortKey] ?: "") == original.port &&
+(prefs[activeHttpsKey] ?: false) == original.useHttps
+if (isActive) {
+prefs[activeAddressKey] = updated.address
+prefs[activeHttpsKey] = updated.useHttps
+prefs[activePortKey] = updated.port
+prefs[activePasswordKey] = updated.password
+prefs[activeNameKey] = updated.name
+}
+}
+}
 suspend fun removeSavedServer(server: ServerEntry) {
 context.dataStore.edit { prefs ->
 val current = prefs[savedServersKey] ?: emptySet()


@@ -75,6 +75,7 @@ fun NESMenu(
 onDismiss: () -> Unit,
 onSelectServer: (ServerEntry) -> Unit,
 onAddServer: (ServerEntry) -> Unit,
+onEditServer: (ServerEntry, ServerEntry) -> Unit,
 onRemoveServer: (ServerEntry) -> Unit,
 onToggleMode: () -> Unit,
 onToggleStyle: () -> Unit,
@@ -87,7 +88,7 @@ fun NESMenu(
 contentAlignment = Alignment.Center,
 ) {
 AnimatedVisibility(visible = visible, enter = fadeIn() + scaleIn(initialScale = 0.95f), exit = fadeOut() + scaleOut(targetScale = 0.95f)) {
-MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
+MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onEditServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
 }
 }
 }
@@ -102,21 +103,39 @@ private fun MenuPanel(
 onDismiss: () -> Unit,
 onSelectServer: (ServerEntry) -> Unit,
 onAddServer: (ServerEntry) -> Unit,
+onEditServer: (ServerEntry, ServerEntry) -> Unit,
 onRemoveServer: (ServerEntry) -> Unit,
 onToggleMode: () -> Unit,
 onToggleStyle: () -> Unit,
 onBackToWebView: (() -> Unit)?,
 ) {
 var showAdd by remember { mutableStateOf(false) }
+// The saved server being edited, or null when adding a new one.
+var editing by remember { mutableStateOf<ServerEntry?>(null) }
 var nm by remember { mutableStateOf("") }
 var addr by remember { mutableStateOf("") }
 var pwd by remember { mutableStateOf("") }
+fun resetForm() {
+nm = ""; addr = ""; pwd = ""; showAdd = false; editing = null
+}
+fun startEdit(server: ServerEntry) {
+editing = server
+nm = server.name; addr = server.address; pwd = server.password
+showAdd = false
+}
 fun submit() {
-if (addr.isNotBlank()) {
+if (addr.isBlank()) return
+val orig = editing
+if (orig != null) {
+// Preserve fields the compact form doesn't expose (scheme, port).
+onEditServer(orig, orig.copy(address = addr, password = pwd, name = nm))
+} else {
 onAddServer(ServerEntry(addr, false, password = pwd, name = nm))
-nm = ""; addr = ""; pwd = ""; showAdd = false
 }
+resetForm()
 }
 Column(
@@ -149,6 +168,7 @@ private fun MenuPanel(
 label = server.displayName(),
 selected = active,
 onClick = { onSelectServer(server) },
+onEdit = { startEdit(server) },
 onRemove = { onRemoveServer(server) },
 )
 }
@@ -157,8 +177,8 @@ private fun MenuPanel(
 Text("No servers", color = TextMuted, fontSize = 14.sp, modifier = Modifier.padding(vertical = 4.dp))
 }
-// Add server
-if (showAdd) {
+// Add / edit server
+if (showAdd || editing != null) {
 Column(
 Modifier
 .fillMaxWidth()
@@ -168,6 +188,25 @@ private fun MenuPanel(
 .padding(12.dp),
 verticalArrangement = Arrangement.spacedBy(8.dp),
 ) {
+Row(
+Modifier.fillMaxWidth(),
+verticalAlignment = Alignment.CenterVertically,
+horizontalArrangement = Arrangement.SpaceBetween,
+) {
+Text(
+if (editing != null) "Edit Server" else "Add Server",
+color = TextMuted,
+fontSize = 13.sp,
+letterSpacing = 1.sp,
+fontWeight = FontWeight.Medium,
+)
+Text(
+"Cancel",
+color = TextMuted,
+fontSize = 13.sp,
+modifier = Modifier.clickable { resetForm() }.padding(start = 8.dp),
+)
+}
 GlassField(
 value = nm, onValueChange = { nm = it },
 placeholder = "Name (optional)",
@@ -228,6 +267,7 @@ private fun MenuItem(
 selected: Boolean = false,
 labelColor: Color = TextPrimary,
 onClick: () -> Unit,
+onEdit: (() -> Unit)? = null,
 onRemove: (() -> Unit)? = null,
 ) {
 Row(
@@ -247,7 +287,16 @@ private fun MenuItem(
 color = if (selected) BitcoinOrange else labelColor,
 fontSize = 16.sp,
 fontWeight = FontWeight.Medium,
+modifier = Modifier.weight(1f),
 )
+if (onEdit != null) {
+Text(
+"",
+color = TextMuted,
+fontSize = 16.sp,
+modifier = Modifier.clickable { onEdit() }.padding(horizontal = 8.dp),
+)
+}
 if (onRemove != null) {
 Text(
 "",


@@ -216,6 +216,17 @@ fun RemoteInputScreen(onBack: () -> Unit) {
 onAddServer = { server ->
 scope.launch { prefs.addSavedServer(server); if (activeServer == null) prefs.setActiveServer(server) }
 },
+onEditServer = { original, updated ->
+scope.launch {
+prefs.updateSavedServer(original, updated)
+// If the edited server is the live one, reconnect with the new
+// address/credentials so the change takes effect immediately.
+if (original.serialize() == activeServer?.serialize()) {
+ws.disconnect()
+prefs.setActiveServer(updated)
+}
+}
+},
 onRemoveServer = { server ->
 scope.launch {
 prefs.removeSavedServer(server)


@@ -30,6 +30,7 @@ import androidx.compose.material.icons.filled.VisibilityOff
 import androidx.compose.foundation.verticalScroll
 import androidx.compose.material.icons.Icons
 import androidx.compose.material.icons.filled.Close
+import androidx.compose.material.icons.filled.Edit
 import androidx.compose.material.icons.filled.Lock
 import androidx.compose.material.icons.filled.LockOpen
 import androidx.compose.material3.CircularProgressIndicator
@@ -106,9 +107,50 @@ fun ServerConnectScreen(
 var useHttps by remember { mutableStateOf(false) }
 var isConnecting by remember { mutableStateOf(false) }
 var errorMessage by remember { mutableStateOf<String?>(null) }
+// The saved server currently being edited, or null when adding/connecting.
+var editingServer by remember { mutableStateOf<ServerEntry?>(null) }
 val savedServers by prefs.savedServers.collectAsState(initial = emptyList())
+fun clearForm() {
+name = ""
+address = ""
+port = ""
+password = ""
+useHttps = false
+passwordVisible = false
+errorMessage = null
+}
+fun startEdit(server: ServerEntry) {
+editingServer = server
+name = server.name
+address = server.address
+port = server.port
+password = server.password
+useHttps = server.useHttps
+passwordVisible = false
+errorMessage = null
+}
+fun cancelEdit() {
+editingServer = null
+clearForm()
+}
+fun saveEdit() {
+val original = editingServer ?: return
+if (address.isBlank()) {
+errorMessage = "Enter a server address"
+return
+}
+val updated = ServerEntry(address, useHttps, port, password, name)
+scope.launch {
+prefs.updateSavedServer(original, updated)
+cancelEdit()
+}
+}
 fun connect(server: ServerEntry) {
 if (isConnecting) return
 if (server.address.isBlank()) {
@@ -178,7 +220,7 @@ fun ServerConnectScreen(
 Spacer(modifier = Modifier.height(4.dp))
 Text(
-text = "Connect to Server",
+text = if (editingServer != null) stringResource(R.string.edit_server_title) else "Connect to Server",
 style = MaterialTheme.typography.headlineMedium,
 color = TextPrimary,
 textAlign = TextAlign.Center,
@@ -324,7 +366,11 @@ fun ServerConnectScreen(
 keyboardActions = KeyboardActions(
 onGo = {
 keyboard?.hide()
-connect(ServerEntry(address, useHttps, port, password, name))
+if (editingServer != null) {
+saveEdit()
+} else {
+connect(ServerEntry(address, useHttps, port, password, name))
+}
 },
 ),
 colors = OutlinedTextFieldDefaults.colors(
@@ -389,15 +435,40 @@ fun ServerConnectScreen(
 }
 }
-// Connect button — glass style
-GlassButton(
-text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
-onClick = {
-keyboard?.hide()
-connect(ServerEntry(address, useHttps, port, password, name))
-},
-modifier = Modifier.fillMaxWidth().height(56.dp),
-)
+if (editingServer != null) {
+// Save / Cancel while editing an existing saved server
+Row(
+modifier = Modifier.fillMaxWidth(),
+horizontalArrangement = Arrangement.spacedBy(12.dp),
+) {
+GlassButton(
+text = stringResource(R.string.cancel),
+onClick = {
+keyboard?.hide()
+cancelEdit()
+},
+modifier = Modifier.weight(1f).height(56.dp),
+)
+GlassButton(
+text = stringResource(R.string.save_changes),
+onClick = {
+keyboard?.hide()
+saveEdit()
+},
+modifier = Modifier.weight(1f).height(56.dp),
+)
+}
+} else {
+// Connect button — glass style
+GlassButton(
+text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
+onClick = {
+keyboard?.hide()
+connect(ServerEntry(address, useHttps, port, password, name))
+},
+modifier = Modifier.fillMaxWidth().height(56.dp),
+)
+}
 if (isConnecting) {
 CircularProgressIndicator(
@@ -407,8 +478,8 @@ fun ServerConnectScreen(
 )
 }
-// Saved servers
-if (savedServers.isNotEmpty()) {
+// Saved servers (hidden while editing one to keep focus on the form)
+if (editingServer == null && savedServers.isNotEmpty()) {
 Spacer(modifier = Modifier.height(8.dp))
 Text(
 text = stringResource(R.string.saved_servers),
@@ -422,6 +493,7 @@ fun ServerConnectScreen(
 SavedServerItem(
 server = server,
 onConnect = { connect(it) },
+onEdit = { startEdit(it) },
 onRemove = { scope.launch { prefs.removeSavedServer(it) } },
 )
 }
@@ -434,6 +506,7 @@ fun ServerConnectScreen(
 private fun SavedServerItem(
 server: ServerEntry,
 onConnect: (ServerEntry) -> Unit,
+onEdit: (ServerEntry) -> Unit,
 onRemove: (ServerEntry) -> Unit,
 ) {
 Row(
@@ -476,6 +549,9 @@ private fun SavedServerItem(
 }
 }
 }
+IconButton(onClick = { onEdit(server) }) {
+Icon(imageVector = Icons.Default.Edit, contentDescription = stringResource(R.string.edit_server), modifier = Modifier.size(18.dp), tint = TextMuted)
+}
 IconButton(onClick = { onRemove(server) }) {
 Icon(imageVector = Icons.Default.Close, contentDescription = stringResource(R.string.remove_server), modifier = Modifier.size(18.dp), tint = TextMuted)
 }


@@ -2,6 +2,7 @@ package com.archipelago.app.ui.screens
 import android.annotation.SuppressLint
 import android.graphics.Bitmap
+import android.graphics.BitmapFactory
 import android.view.ViewGroup
 import android.webkit.CookieManager
 import android.webkit.WebChromeClient
@@ -14,6 +15,7 @@ import androidx.activity.compose.BackHandler
 import androidx.compose.animation.AnimatedVisibility
 import androidx.compose.animation.fadeIn
 import androidx.compose.animation.fadeOut
+import androidx.compose.foundation.Image
 import androidx.compose.foundation.background
 import androidx.compose.foundation.layout.Arrangement
 import androidx.compose.foundation.layout.Box
@@ -27,17 +29,24 @@ import androidx.compose.foundation.layout.height
 import androidx.compose.foundation.layout.padding
 import androidx.compose.foundation.layout.safeDrawing
 import androidx.compose.foundation.layout.size
+import androidx.compose.foundation.layout.width
 import androidx.compose.foundation.layout.windowInsetsPadding
+import androidx.compose.foundation.shape.RoundedCornerShape
 import androidx.compose.material.icons.Icons
+import androidx.compose.material.icons.automirrored.filled.ArrowBack
+import androidx.compose.material.icons.automirrored.filled.ArrowForward
 import androidx.compose.material.icons.filled.Close
 import androidx.compose.material.icons.filled.CloudOff
 import androidx.compose.material.icons.filled.OpenInBrowser
+import androidx.compose.material.icons.filled.Refresh
+import androidx.compose.material3.CircularProgressIndicator
 import androidx.compose.material3.Icon
 import androidx.compose.material3.IconButton
 import androidx.compose.material3.LinearProgressIndicator
 import androidx.compose.material3.MaterialTheme
 import androidx.compose.material3.Text
 import androidx.compose.runtime.Composable
+import androidx.compose.runtime.LaunchedEffect
 import androidx.compose.runtime.getValue
 import androidx.compose.runtime.mutableIntStateOf
 import androidx.compose.runtime.mutableStateOf
@@ -45,6 +54,8 @@ import androidx.compose.runtime.remember
 import androidx.compose.runtime.setValue
 import androidx.compose.ui.Alignment
 import androidx.compose.ui.Modifier
+import androidx.compose.ui.draw.clip
+import androidx.compose.ui.graphics.asImageBitmap
 import androidx.compose.ui.platform.LocalContext
 import androidx.compose.ui.res.stringResource
 import androidx.compose.ui.text.style.TextAlign
@@ -56,6 +67,8 @@ import com.archipelago.app.ui.theme.BitcoinOrange
 import com.archipelago.app.ui.theme.SurfaceBlack
 import com.archipelago.app.ui.theme.TextMuted
 import com.archipelago.app.ui.theme.TextPrimary
+import kotlinx.coroutines.Dispatchers
+import kotlinx.coroutines.withContext
 /** Open a URL in the phone's default browser (genuinely external links). */
 private fun openExternalUrl(context: android.content.Context, url: String) {
@@ -310,6 +323,26 @@ fun WebViewScreen(
 }
 }
+// Node apps (e.g. NetBird) terminate TLS with a
+// self-signed cert — the dashboard needs a secure
+// context for OIDC/window.crypto.subtle (#15). The
+// WebView default is to CANCEL untrusted certs, so
+// those apps render blank. The user explicitly trusts
+// their own node, so proceed for same-host certs only;
+// reject anything else (don't blanket-trust the web).
+override fun onReceivedSslError(
+view: WebView?,
+handler: android.webkit.SslErrorHandler?,
+error: android.net.http.SslError?,
+) {
+val u = error?.url
+if (u != null && isSameHost(u, serverUrl)) {
+handler?.proceed()
+} else {
+handler?.cancel()
+}
+}
 override fun shouldOverrideUrlLoading(
 view: WebView?,
 request: WebResourceRequest?,
@@ -428,11 +461,34 @@ fun WebViewScreen(
 }
 }
+/** Best-effort fetch of the origin's /favicon.ico, so the launched app's icon
+ * can be shown on the loading screen before the WebView reports onReceivedIcon
+ * (which only fires once the page's <head> has parsed). Blocking call on IO. */
+private fun fetchFavicon(pageUrl: String): Bitmap? {
+return try {
+val u = android.net.Uri.parse(pageUrl)
+val scheme = u.scheme ?: return null
+val host = u.host ?: return null
+val portPart = if (u.port > 0) ":${u.port}" else ""
+val conn = (java.net.URL("$scheme://$host$portPart/favicon.ico").openConnection()
+as java.net.HttpURLConnection).apply {
+connectTimeout = 4000
+readTimeout = 4000
+instanceFollowRedirects = true
+}
+conn.inputStream.use { BitmapFactory.decodeStream(it) }
+} catch (_: Exception) {
+null
+}
+}
 /**
  * Lightweight in-app browser used when the kiosk hands off an app that can't be
- * shown in an iframe. Loads the app in a local WebView with a minimal top bar
- * (close + title + escalate-to-real-browser). Same-host navigation stays here;
- * any genuinely external link escapes to the phone's browser.
+ * shown in an iframe. Loads the app in a local WebView with a centered loading
+ * screen (app favicon + progress bar) and a BOTTOM control bar mirroring the
+ * web mobile-iframe footer (back / forward / reload / open-in-browser / close).
+ * Same-host navigation stays here; any genuinely external link escapes to the
+ * phone's browser.
  */
 @SuppressLint("SetJavaScriptEnabled")
 @Composable
@@ -444,8 +500,20 @@ private fun InAppBrowser(
 val context = LocalContext.current
 var browser by remember { mutableStateOf<WebView?>(null) }
 var title by remember { mutableStateOf(android.net.Uri.parse(url).host ?: url) }
+var favicon by remember { mutableStateOf<Bitmap?>(null) }
 var progress by remember { mutableIntStateOf(0) }
 var loading by remember { mutableStateOf(true) }
+var canGoBack by remember { mutableStateOf(false) }
+var canGoForward by remember { mutableStateOf(false) }
+// Seed the loading-screen icon immediately from a best-effort favicon
+// pre-fetch (main's app-icon work), then onReceivedIcon upgrades it — so the
+// loader shows an icon right away instead of staying blank until the page
+// parses its <head> (which is what made the loader look stuck).
+LaunchedEffect(url) {
+val fetched = withContext(Dispatchers.IO) { fetchFavicon(url) }
+if (fetched != null && favicon == null) favicon = fetched
+}
 // Back: walk the in-app history first, then close the overlay.
 BackHandler {
@@ -459,13 +527,169 @@ private fun InAppBrowser(
 .background(SurfaceBlack)
 .windowInsetsPadding(WindowInsets.safeDrawing),
 ) {
+// WebView + loading overlay fill the area above the bottom control bar.
+Box(modifier = Modifier.weight(1f).fillMaxWidth()) {
+AndroidView(
+modifier = Modifier.fillMaxSize(),
+factory = { ctx ->
+WebView(ctx).apply {
+layoutParams = ViewGroup.LayoutParams(
+ViewGroup.LayoutParams.MATCH_PARENT,
+ViewGroup.LayoutParams.MATCH_PARENT,
+)
+isVerticalScrollBarEnabled = false
+isHorizontalScrollBarEnabled = false
+CookieManager.getInstance().setAcceptThirdPartyCookies(this, true)
+applyArchipelagoSettings()
+webChromeClient = object : WebChromeClient() {
+override fun onProgressChanged(view: WebView?, newProgress: Int) {
+progress = newProgress
+}
+override fun onReceivedTitle(view: WebView?, t: String?) {
+if (!t.isNullOrBlank()) title = t
+}
+override fun onReceivedIcon(view: WebView?, icon: Bitmap?) {
+if (icon != null) favicon = icon
+}
+}
+webViewClient = object : WebViewClient() {
+override fun onPageStarted(view: WebView?, u: String?, favicon: Bitmap?) {
+loading = true
+}
+override fun onPageFinished(view: WebView?, u: String?) {
+loading = false
+canGoBack = view?.canGoBack() == true
+canGoForward = view?.canGoForward() == true
+}
+override fun doUpdateVisitedHistory(view: WebView?, u: String?, isReload: Boolean) {
+canGoBack = view?.canGoBack() == true
+canGoForward = view?.canGoForward() == true
+}
+// Self-signed TLS on the node's apps (e.g. NetBird on
+// :8087) would otherwise be cancelled by the WebView
+// and render blank. Proceed for the user's own node
+// (same host); reject any other untrusted cert.
+override fun onReceivedSslError(
+view: WebView?,
+handler: android.webkit.SslErrorHandler?,
+error: android.net.http.SslError?,
+) {
+val u = error?.url
+if (u != null && isSameHost(u, serverUrl)) {
+handler?.proceed()
+} else {
+handler?.cancel()
+}
+}
+override fun shouldOverrideUrlLoading(
+view: WebView?,
+request: WebResourceRequest?,
+): Boolean {
+val u = request?.url?.toString() ?: return false
+// Stay in the overlay for same-node navigation;
+// hand genuinely external links to the real browser.
+if (isSameHost(u, serverUrl)) return false
+openExternalUrl(ctx, u)
+return true
+}
+}
+browser = this
+loadUrl(url)
+}
+},
+)
+// Centered loading screen — app favicon (or spinner) + title + bar.
+if (loading) {
+Column(
+modifier = Modifier
+.fillMaxSize()
+.background(SurfaceBlack),
+horizontalAlignment = Alignment.CenterHorizontally,
+verticalArrangement = Arrangement.Center,
+) {
+Box(
+modifier = Modifier.size(84.dp).clip(RoundedCornerShape(20.dp)),
+contentAlignment = Alignment.Center,
+) {
+val fav = favicon
+if (fav != null) {
+Image(
+bitmap = fav.asImageBitmap(),
+contentDescription = title,
+modifier = Modifier.fillMaxSize(),
+)
+} else {
+CircularProgressIndicator(color = BitcoinOrange)
+}
+}
+Spacer(modifier = Modifier.height(18.dp))
+Text(
+text = title,
+style = MaterialTheme.typography.bodyLarge,
+color = TextPrimary,
+maxLines = 1,
+overflow = TextOverflow.Ellipsis,
+)
+Spacer(modifier = Modifier.height(16.dp))
+LinearProgressIndicator(
+progress = { progress / 100f },
+modifier = Modifier.width(220.dp),
+color = BitcoinOrange,
+trackColor = TextMuted.copy(alpha = 0.2f),
)
}
}
}
// Bottom control bar — mirrors the web mobile-iframe footer.
Row( Row(
modifier = Modifier modifier = Modifier
.fillMaxWidth() .fillMaxWidth()
.height(48.dp) .height(56.dp)
.padding(horizontal = 4.dp), .background(SurfaceBlack)
.padding(horizontal = 8.dp),
horizontalArrangement = Arrangement.SpaceAround,
verticalAlignment = Alignment.CenterVertically, verticalAlignment = Alignment.CenterVertically,
) { ) {
IconButton(onClick = { browser?.goBack() }, enabled = canGoBack) {
Icon(
imageVector = Icons.AutoMirrored.Filled.ArrowBack,
contentDescription = stringResource(R.string.back),
tint = if (canGoBack) TextPrimary else TextMuted.copy(alpha = 0.4f),
)
}
IconButton(onClick = { browser?.goForward() }, enabled = canGoForward) {
Icon(
imageVector = Icons.AutoMirrored.Filled.ArrowForward,
contentDescription = stringResource(R.string.forward),
tint = if (canGoForward) TextPrimary else TextMuted.copy(alpha = 0.4f),
)
}
IconButton(onClick = { browser?.reload() }) {
Icon(
imageVector = Icons.Default.Refresh,
contentDescription = stringResource(R.string.refresh),
tint = TextPrimary,
)
}
IconButton(onClick = { openExternalUrl(context, browser?.url ?: url) }) {
Icon(
imageVector = Icons.Default.OpenInBrowser,
contentDescription = stringResource(R.string.open_in_browser),
tint = TextPrimary,
)
}
IconButton(onClick = onClose) { IconButton(onClick = onClose) {
Icon( Icon(
imageVector = Icons.Default.Close, imageVector = Icons.Default.Close,
@ -473,82 +697,6 @@ private fun InAppBrowser(
tint = TextPrimary, tint = TextPrimary,
) )
} }
Text(
text = title,
style = MaterialTheme.typography.bodyMedium,
color = TextPrimary,
maxLines = 1,
overflow = TextOverflow.Ellipsis,
modifier = Modifier.weight(1f),
)
IconButton(onClick = { openExternalUrl(context, browser?.url ?: url) }) {
Icon(
imageVector = Icons.Default.OpenInBrowser,
contentDescription = stringResource(R.string.open_in_browser),
tint = TextMuted,
)
}
} }
AnimatedVisibility(visible = loading, enter = fadeIn(), exit = fadeOut()) {
LinearProgressIndicator(
progress = { progress / 100f },
modifier = Modifier.fillMaxWidth(),
color = BitcoinOrange,
trackColor = SurfaceBlack,
)
}
AndroidView(
modifier = Modifier.fillMaxSize(),
factory = { ctx ->
WebView(ctx).apply {
layoutParams = ViewGroup.LayoutParams(
ViewGroup.LayoutParams.MATCH_PARENT,
ViewGroup.LayoutParams.MATCH_PARENT,
)
isVerticalScrollBarEnabled = false
isHorizontalScrollBarEnabled = false
CookieManager.getInstance().setAcceptThirdPartyCookies(this, true)
applyArchipelagoSettings()
webChromeClient = object : WebChromeClient() {
override fun onProgressChanged(view: WebView?, newProgress: Int) {
progress = newProgress
}
override fun onReceivedTitle(view: WebView?, t: String?) {
if (!t.isNullOrBlank()) title = t
}
}
webViewClient = object : WebViewClient() {
override fun onPageStarted(view: WebView?, u: String?, favicon: Bitmap?) {
loading = true
}
override fun onPageFinished(view: WebView?, u: String?) {
loading = false
}
override fun shouldOverrideUrlLoading(
view: WebView?,
request: WebResourceRequest?,
): Boolean {
val u = request?.url?.toString() ?: return false
// Stay in the overlay for same-node navigation;
// hand genuinely external links to the real browser.
if (isSameHost(u, serverUrl)) return false
openExternalUrl(ctx, u)
return true
}
}
browser = this
loadUrl(url)
}
},
)
} }
} }

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M15,19l-7,-7 7,-7"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M6,18L18,6M6,6l12,12"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M9,5l7,7 -7,7"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M10,6H6a2,2 0,0 0,-2 2v10a2,2 0,0 0,2 2h10a2,2 0,0 0,2 -2v-4M14,4h6m0,0v6m0,-6L10,14"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M4,4v6h6M20,20v-6h-6M5.64,15.36A8,8 0,0 0,18.36 18M18.36,8.64A8,8 0,0 0,5.64 6"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -23,6 +23,13 @@
<string name="remote_input_hint">Use your phone as a keyboard and mouse for the kiosk</string> <string name="remote_input_hint">Use your phone as a keyboard and mouse for the kiosk</string>
<string name="close">Close</string> <string name="close">Close</string>
<string name="open_in_browser">Open in browser</string> <string name="open_in_browser">Open in browser</string>
<string name="back">Back</string>
<string name="forward">Forward</string>
<string name="refresh">Refresh</string>
<string name="server_name_label">Server Name (optional)</string> <string name="server_name_label">Server Name (optional)</string>
<string name="server_name_placeholder">My Archipelago</string> <string name="server_name_placeholder">My Archipelago</string>
<string name="edit_server">Edit</string>
<string name="edit_server_title">Edit Server</string>
<string name="save_changes">Save Changes</string>
<string name="cancel">Cancel</string>
</resources> </resources>

@ -1,13 +1,18 @@
#!/usr/bin/env bash #!/usr/bin/env bash
# #
# Build the Android companion app and publish it as the served download # Build the Android companion app and publish it as the served download
# (neode-ui/public/packages/archipelago-companion.apk.zip), then commit + push. # (neode-ui/public/packages/archipelago-companion.apk — a plain APK a phone can
# install straight from the link), then commit + push.
# #
# Use this INSTEAD of `git push` when shipping the companion app, so the # Use this INSTEAD of `git push` when shipping the companion app, so the
# downloadable APK on the node always matches what's on main. # downloadable APK on the node always matches what's on main.
# #
# ./Android/ship-companion.sh # ./Android/ship-companion.sh
# #
# The actual build/sign/verify/stage is done by scripts/publish-companion-apk.sh
# (single source of truth, shared with the pre-push hook). It does a CLEAN build,
# forces v1+v2+v3 signing, and ABORTS if any signature scheme is missing — so a
# broken or v2-only APK can never be shipped.
set -euo pipefail set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
@ -16,21 +21,15 @@ cd "$ROOT"
export JAVA_HOME="${JAVA_HOME:-/opt/homebrew/opt/openjdk@17}" export JAVA_HOME="${JAVA_HOME:-/opt/homebrew/opt/openjdk@17}"
export ANDROID_HOME="${ANDROID_HOME:-$HOME/Library/Android/sdk}" export ANDROID_HOME="${ANDROID_HOME:-$HOME/Library/Android/sdk}"
APK="Android/app/build/outputs/apk/debug/app-debug.apk" DEST="neode-ui/public/packages/archipelago-companion.apk"
DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
echo "==> Building debug APK" echo "==> Building + signing + verifying companion APK"
( cd Android && ./gradlew :app:assembleDebug --console=plain -q ) bash scripts/publish-companion-apk.sh
[ -f "$APK" ] || { echo "ERROR: APK not found at $APK" >&2; exit 1; }
echo "==> Publishing -> $DEST" [ -f "$DEST" ] || { echo "ERROR: served APK not found at $DEST" >&2; exit 1; }
mkdir -p "$(dirname "$DEST")"
rm -f "$DEST"
( cd "$(dirname "$APK")" && zip -j -q "$ROOT/$DEST" "$(basename "$APK")" )
git add "$DEST" if git diff --cached --quiet -- "$DEST"; then
if git diff --cached --quiet; then echo "==> Nothing to commit (APK unchanged)"
echo "==> Nothing to commit (working tree + APK unchanged)"
else else
git commit -q -m "chore(android): update companion apk download" git commit -q -m "chore(android): update companion apk download"
echo "==> Committed" echo "==> Committed"

@ -1,13 +1,23 @@
# Archipelago — agent guide # Archipelago — agent guide
## 🚩 TOP PRIORITY (until production testing passes) ## ✅ Single-node production gate is GREEN (2026-06-23)
**Read `docs/PRODUCTION-MASTER-PLAN.md` first.** It is the authoritative plan and `tests/lifecycle/run-gate.sh` is **5/5 on .228, 0 failures** — the single-node exit
overrides ad-hoc direction until the production test gate is green. Goal: a criterion is met and the priority banner is demoted. Next exit-criteria: the
world-class, **developer-ready app platform** where every app is manifest-driven, **multinode pass** (`docs/multinode-testing-plan.md`) and workstreams B/C/D.
manifests ship via the **signed registry** (not OTA disk files), and **third-party
developers publish apps via an external/decentralized registry** — all rootless, **For day-to-day work, use `docs/UNIFIED-TASK-TRACKER.md`** — the consolidated,
secure, robust, and 100%-uptime-capable. priority-ordered "what's left" list across the 1.8.0 OTA and master-plan docs
(fastest/simplest tasks first). It supersedes hunting through the two source docs
below for open items; those remain the narrative/history.
**Read `docs/PRODUCTION-MASTER-PLAN.md` first** — it is still the authoritative plan
for the north star: a world-class, **developer-ready app platform** where every app
is manifest-driven, manifests ship via the **signed registry** (not OTA disk files),
and **third-party developers publish apps via an external/decentralized registry**,
all rootless, secure, robust, and 100%-uptime-capable. It no longer overrides all
ad-hoc direction now that the gate is green, but it remains the source of truth for
sequencing the remaining workstreams.
Detailed sub-plans (all linked from the master): Detailed sub-plans (all linked from the master):
- App platform / packaging phases + security model → `docs/APP-PACKAGING-MIGRATION-PLAN.md` - App platform / packaging phases + security model → `docs/APP-PACKAGING-MIGRATION-PLAN.md`
@ -16,6 +26,28 @@ Detailed sub-plans (all linked from the master):
- Current per-app state → `docs/app-registry-status-2026-06-21.md` - Current per-app state → `docs/app-registry-status-2026-06-21.md`
- Production test gate (exit criterion) → `tests/lifecycle/TESTING.md` - Production test gate (exit criterion) → `tests/lifecycle/TESTING.md`
## Commit & push every unit of work (never violate)
**The #1 process rule: work is not "done" until it is committed AND pushed.** This
exists because finished work has been lost/clobbered by sitting uncommitted in the
shared tree across agents and sessions. To prevent that:
- **Commit each feature/fix the moment it works** — one focused, self-contained
commit per logical change (it compiles and its targeted tests pass). Do not let
unrelated changes accumulate uncommitted.
- **Push immediately after committing** so nothing lives only on one machine. `main`
is protected → push via `git push gitea-ai main` (account `ai`, see the memory
note); feature branches push to their own remote.
- **Never leave a stack of finished work uncommitted** overnight or when handing off
between agents — if you must pause mid-change, commit a clearly-labelled WIP
checkpoint rather than leaving it dirty.
- **Stage explicitly by path** (`git add <paths>`) when another agent's uncommitted
work shares the tree — never `git add -A` / `git commit -a`, which clobbers or
entangles their changes.
- **Never commit or push secrets** (mnemonics, private keys, API tokens). Signing is
done offline; artifacts (catalog/manifest) are signed, not the keys.
- Commit messages end with the `Co-Authored-By: Claude …` trailer.
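
The explicit-staging rule above can be sketched as follows. This is a minimal illustration in a throwaway repository, not a project script: the helper name `stage_only` and the file names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: stage only the paths passed in, never the whole tree.
stage_only() {
  # Refuse to run with no paths so a bare call cannot mean "everything".
  [ "$#" -gt 0 ] || { echo "usage: stage_only <path>..." >&2; return 2; }
  git add -- "$@"
}

# Demo in a scratch repo: two dirty files, only one of them is ours.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .
echo "my change" > mine.txt
echo "another agent's work" > theirs.txt

stage_only mine.txt
staged="$(git diff --cached --name-only)"
echo "staged: $staged"
```

Only `mine.txt` ends up in the index, so a following `git commit` leaves the other agent's file untouched in the working tree.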
## Invariants (never violate) ## Invariants (never violate)
- **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged - **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged
@ -27,7 +59,8 @@ Detailed sub-plans (all linked from the master):
`container::secrets`, 0600/rootless) — never hardcoded, per-app, or logged. `container::secrets`, 0600/rootless) — never hardcoded, per-app, or logged.
- **Migrations never destroy data** — preserve `/var/lib/archipelago/<app>`, - **Migrations never destroy data** — preserve `/var/lib/archipelago/<app>`,
secrets, credentials, ports, and adoption container names; keep a rollback path. secrets, credentials, ports, and adoption container names; keep a rollback path.
- **Verify on a real node (.228, then .198) before any tag.** - **Verify on the real node .228 before any tag.** (Fleet-wide multinode
verification is a separate plan: `docs/multinode-testing-plan.md`.)
## Build / verify ## Build / verify
@ -41,7 +74,11 @@ Detailed sub-plans (all linked from the master):
## Production test gate (definition of done) ## Production test gate (definition of done)
`tests/lifecycle/run-20x.sh` green across install / UI / stop / start / restart / `tests/lifecycle/run-gate.sh` green across install / UI / stop / start / restart /
reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on
.228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from 20× .228** (`ARCHY_ITERATIONS=5`). **Run the gate ON the node** (it uses local podman/systemctl/bitcoin
restore to 20× before the final ship). Until green, the master plan is the priority. probes), not via RPC from another host. **✅ GREEN 2026-06-23 (5/5, 0 not-ok)** — keep it
green (re-run after orchestrator/lifecycle changes); regressions are top priority again.
**Multinode testing (.198 + the rest of the fleet) is a SEPARATE plan** —
`docs/multinode-testing-plan.md` — not part of this single-node gate criterion, and is
the next exit criterion now that single-node is green.

@ -73,7 +73,7 @@
"author": "Mempool", "author": "Mempool",
"category": "money", "category": "money",
"tier": "core", "tier": "core",
"dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0", "dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1",
"repoUrl": "https://github.com/mempool/mempool", "repoUrl": "https://github.com/mempool/mempool",
"requires": [ "requires": [
"bitcoin-knots", "bitcoin-knots",
@ -214,31 +214,6 @@
] ]
} }
}, },
{
"id": "meshtastic",
"title": "Meshtastic",
"version": "2-daily-alpine",
"description": "Open-source mesh networking for LoRa radios. Create decentralized communication networks.",
"icon": "/assets/img/app-icons/meshcore.svg",
"author": "Meshtastic",
"category": "networking",
"tier": "recommended",
"dockerImage": "docker.io/meshtastic/meshtasticd:daily-alpine",
"repoUrl": "https://github.com/meshtastic/firmware",
"containerConfig": {
"ports": [
"4403:4403"
],
"volumes": [
"/var/lib/archipelago/meshtastic:/var/lib/meshtasticd"
],
"env": [
"MESHTASTIC_PORT=/dev/ttyUSB0",
"MESHTASTIC_SERIAL=true"
],
"notes": "Requires a LoRa radio device at /dev/ttyUSB0. The config file is rendered from the app manifest before container start."
}
},
{ {
"id": "vaultwarden", "id": "vaultwarden",
"title": "Vaultwarden", "title": "Vaultwarden",
@ -294,12 +269,12 @@
"id": "fedimint-clientd", "id": "fedimint-clientd",
"title": "Fedimint Client", "title": "Fedimint Client",
"version": "0.8.0", "version": "0.8.0",
"description": "Fedimint ecash client daemon (fmcd). Lets your node hold Fedimint ecash and join federations; the wallet talks to it over a local REST API.", "description": "Fedimint ecash client daemon (fmcd). Lets the node hold Fedimint ecash and join federations; the wallet talks to it over a local REST API.",
"icon": "/assets/img/app-icons/fedimint.png", "icon": "/assets/img/app-icons/fedimint.png",
"author": "Fedimint", "author": "Fedimint",
"category": "money", "category": "money",
"tier": "core", "tier": "core",
"dockerImage": "146.59.87.168:3000/lfg2025/fmcd:0.8.0", "dockerImage": "146.59.87.168:3000/lfg2025/fmcd:0.8.1",
"repoUrl": "https://github.com/minmoto/fmcd" "repoUrl": "https://github.com/minmoto/fmcd"
}, },
{ {
@ -346,8 +321,8 @@
{ {
"id": "immich", "id": "immich",
"title": "Immich", "title": "Immich",
"version": "1.90.0", "version": "2.7.4",
"description": "High-performance photo and video backup with ML.", "description": "Self-hosted photo and video backup with mobile apps and search.",
"icon": "/assets/img/app-icons/immich.png", "icon": "/assets/img/app-icons/immich.png",
"author": "Immich", "author": "Immich",
"category": "data", "category": "data",
@ -453,13 +428,13 @@
{ {
"id": "netbird", "id": "netbird",
"title": "NetBird", "title": "NetBird",
"version": "0.71.2", "version": "2.38.0",
"description": "Self-hosted WireGuard mesh VPN control plane with dashboard, embedded identity provider, management API, signal, relay, and STUN service.", "description": "Self-hosted WireGuard mesh VPN control plane with dashboard, embedded identity provider, management API, signal, relay, and STUN. The user-facing entry point — a TLS proxy in front of the dashboard + server.",
"icon": "/assets/img/app-icons/netbird.svg", "icon": "/assets/img/app-icons/netbird.svg",
"author": "NetBird", "author": "NetBird",
"category": "networking", "category": "networking",
"tier": "recommended", "tier": "recommended",
"dockerImage": "docker.io/netbirdio/dashboard:v2.38.0", "dockerImage": "docker.io/library/nginx:1.27-alpine",
"repoUrl": "https://github.com/netbirdio/netbird", "repoUrl": "https://github.com/netbirdio/netbird",
"containerConfig": { "containerConfig": {
"ports": [ "ports": [

@ -1,7 +1,7 @@
app: app:
id: archy-btcpay-db id: archy-btcpay-db
name: BTCPay Postgres name: BTCPay Postgres
version: 15.17 version: "15.17"
description: Postgres backend for BTCPay and NBXplorer. description: Postgres backend for BTCPay and NBXplorer.
container: container:

@ -1,12 +1,12 @@
app: app:
id: archy-mempool-web id: archy-mempool-web
name: Mempool Web name: Mempool Web
version: 3.0.0 version: 3.0.1
description: Frontend web UI for mempool explorer. description: Frontend web UI for mempool explorer.
container_name: mempool container_name: mempool
container: container:
image: git.tx1138.com/lfg2025/mempool-frontend:v3.0.0 image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
pull_policy: if-not-present pull_policy: if-not-present
network: archy-net network: archy-net
@ -33,7 +33,10 @@ app:
health_check: health_check:
type: http type: http
endpoint: http://localhost:8080 # 127.0.0.1 not localhost: the image's wget resolves localhost to ::1 (IPv6)
# first, but nginx binds 0.0.0.0:8080 (IPv4) only -> localhost probe gets
# "connection refused" -> perpetual unhealthy -> health_monitor restart loop.
endpoint: http://127.0.0.1:8080
path: / path: /
interval: 30s interval: 30s
timeout: 5s timeout: 5s
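
The health-check comment above explains why the probe targets `127.0.0.1` rather than `localhost`. A minimal sketch of a lint-style helper that applies the same rewrite to an endpoint string is shown below. The function name `pin_ipv4_loopback` is hypothetical and is not part of the orchestrator.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pin a health-check URL to IPv4 loopback.
# Only the host part is rewritten; scheme, port, and path are kept.
pin_ipv4_loopback() {
  local url="$1"
  case "$url" in
    http://localhost|http://localhost:*|http://localhost/*)
      printf '%s\n' "http://127.0.0.1${url#http://localhost}" ;;
    https://localhost|https://localhost:*|https://localhost/*)
      printf '%s\n' "https://127.0.0.1${url#https://localhost}" ;;
    *)
      # Any other host (including container names) is left untouched.
      printf '%s\n' "$url" ;;
  esac
}

pin_ipv4_loopback "http://localhost:8080"
pin_ipv4_loopback "http://mempool:8080"
```

The explicit patterns avoid rewriting hosts that merely start with `localhost`, which a plain prefix substitution would do.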

@ -1,5 +1,34 @@
# Bitcoin Core - uses official image # Bitcoin Core — minimal rootless image built from the OFFICIAL upstream release.
FROM bitcoin/bitcoin:24.0 #
# The CANONICAL, verified build path is scripts/build-bitcoin-image.sh, which
# Default user is already 'bitcoin' # downloads the upstream tarball, verifies SHA-256 + the OpenPGP signature
# No additional setup needed # (fail-closed), and tags/pushes <registry>/bitcoin:<version>. This Dockerfile
# mirrors that image for a manual/local build and replaces the old stale
# community base (`FROM bitcoin/bitcoin:24.0`).
#
# Build (binaries must be pre-fetched + verified into ./bin — see the script):
# scripts/build-bitcoin-image.sh core 31.0
FROM debian:bookworm-slim
ARG BITCOIN_VERSION=31.0
RUN set -eux; \
apt-get update; \
apt-get install -y --no-install-recommends ca-certificates; \
rm -rf /var/lib/apt/lists/*; \
useradd -m -u 1000 -s /bin/bash bitcoin; \
mkdir -p /home/bitcoin/.bitcoin; \
chown -R bitcoin:bitcoin /home/bitcoin
# bin/ holds the SHA-256 + GPG-verified bitcoind / bitcoin-cli (Guix-built,
# x86_64-linux-gnu) extracted from the official release tarball.
COPY bin/bitcoind /usr/local/bin/bitcoind
COPY bin/bitcoin-cli /usr/local/bin/bitcoin-cli
RUN chmod 0755 /usr/local/bin/bitcoind /usr/local/bin/bitcoin-cli
# Run as (container) root, like the legacy hand-built :latest image. Rootless
# Podman maps container-root to the unprivileged host service user; the manifest
# grants CAP_DAC_OVERRIDE so bitcoind can read its data dir, which the
# orchestrator chowns to the data_uid (host 100101 / container uid 102), not to
# this image's `bitcoin` user. A non-root USER can't read existing chain data and
# bitcoind crash-loops with "Error initializing block database".
WORKDIR /home/bitcoin
VOLUME ["/home/bitcoin/.bitcoin"]
EXPOSE 8332 8333
ENTRYPOINT ["bitcoind"]
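
The Dockerfile header above describes a fail-closed SHA-256 check on the upstream tarball. The sketch below shows only that idea, using a scratch file as a stand-in for a release tarball. It is not the contents of `scripts/build-bitcoin-image.sh`, the helper name `verify_sha256` is hypothetical, and the OpenPGP signature step is not shown.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-256).
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  # An empty or different hash is a failure, so errors never pass silently.
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Demo with a scratch file standing in for a release tarball.
tarball="$(mktemp)"
printf 'pretend release bytes\n' > "$tarball"
good="$(sha256sum "$tarball" | cut -d' ' -f1)"

verify_sha256 "$tarball" "$good" && echo "checksum ok"

# A tampered file must fail; a real build script would abort here.
printf 'tampered\n' >> "$tarball"
if verify_sha256 "$tarball" "$good"; then
  echo "unexpected: tampered file passed" >&2
  exit 1
else
  echo "checksum mismatch, refusing to build"
fi
```

Comparing the computed hash against a pinned value keeps the check independent of any checksum file downloaded alongside the tarball.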

@ -17,6 +17,13 @@ app:
# the IBD sweet spot - 4GB on full nodes, 1GB on pruned. Container # the IBD sweet spot - 4GB on full nodes, 1GB on pruned. Container
# --memory=8g (config.rs::get_memory_limit) leaves headroom for # --memory=8g (config.rs::get_memory_limit) leaves headroom for
# mempool + connections. # mempool + connections.
#
# -printtoconsole=0: foreground bitcoind defaults console logging ON,
# which pushed every IBD "UpdateTip" line through conmon into journald
# (>1 GB/day on a fresh node). bitcoind still writes debug.log in the
# datadir (/var/lib/archipelago/bitcoin/debug.log, self-shrunk on
# restart) — use that for deep debugging; podman logs only carries
# entrypoint/startup errors.
- >- - >-
BITCOIND="$(command -v bitcoind || true)"; BITCOIND="$(command -v bitcoind || true)";
if [ -z "$BITCOIND" ]; then if [ -z "$BITCOIND" ]; then
@ -36,9 +43,9 @@ app:
RPC_TXRELAY_FLAGS="$RPC_TXRELAY_FLAGS -rpcauth=$RPC_TXRELAY_AUTH -rpcwhitelist=txrelay:sendrawtransaction,submitpackage,testmempoolaccept,getmempoolinfo,getrawmempool,getmempoolentry,getnetworkinfo,getblockchaininfo,getblockcount,getblockhash,getblock,getblockheader,getrawtransaction,gettxout,gettxspendingprevout,decoderawtransaction,decodescript,estimatesmartfee,uptime,ping,getconnectioncount,getpeerinfo,getindexinfo,getdeploymentinfo,getchaintips"; RPC_TXRELAY_FLAGS="$RPC_TXRELAY_FLAGS -rpcauth=$RPC_TXRELAY_AUTH -rpcwhitelist=txrelay:sendrawtransaction,submitpackage,testmempoolaccept,getmempoolinfo,getrawmempool,getmempoolentry,getnetworkinfo,getblockchaininfo,getblockcount,getblockhash,getblock,getblockheader,getrawtransaction,gettxout,gettxspendingprevout,decoderawtransaction,decodescript,estimatesmartfee,uptime,ping,getconnectioncount,getpeerinfo,getindexinfo,getdeploymentinfo,getchaintips";
fi; fi;
if [ "${DISK_GB_VALUE:-0}" -lt 1000 ]; then if [ "${DISK_GB_VALUE:-0}" -lt 1000 ]; then
exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -server=1 -prune=550 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=1024 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS"; exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -printtoconsole=0 -server=1 -prune=550 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=1024 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS";
else else
exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -server=1 -txindex=1 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=4096 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS"; exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -printtoconsole=0 -server=1 -txindex=1 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=4096 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS";
fi fi
derived_env: derived_env:
- key: DISK_GB - key: DISK_GB
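
The launch command above branches on disk size: below 1000 GB it prunes, otherwise it keeps the full chain with a transaction index. A minimal sketch of that decision in isolation is shown below. The function name `storage_flags` is hypothetical, and only the flags that differ between the two branches of this manifest are reproduced.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the manifest's branch on disk size (in GB).
# A missing value defaults to 0, which selects the pruned profile.
storage_flags() {
  local disk_gb="${1:-0}"
  if [ "$disk_gb" -lt 1000 ]; then
    echo "-prune=550 -dbcache=1024"
  else
    echo "-txindex=1 -dbcache=4096"
  fi
}

storage_flags 500
storage_flags 2000
```

Defaulting an unknown disk size to the pruned profile is the conservative choice, since a pruned node cannot fill a small disk with block data.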

@ -0,0 +1,35 @@
# Bitcoin Knots — minimal rootless image built from the OFFICIAL upstream release.
#
# Knots previously had NO Dockerfile (the :latest tag was built/pushed by hand).
# The CANONICAL, verified build path is scripts/build-bitcoin-image.sh, which
# downloads the upstream tarball, verifies SHA-256 + the OpenPGP signature
# (fail-closed, Luke-Jr release key), and tags/pushes
# <registry>/bitcoin-knots:<version>. Knots version strings embed a build date,
# e.g. 29.3.knots20260508 — the full string is the tag.
#
# Build (binaries must be pre-fetched + verified into ./bin — see the script):
# scripts/build-bitcoin-image.sh knots 29.3.knots20260508
FROM debian:bookworm-slim
ARG KNOTS_VERSION=29.3.knots20260508
RUN set -eux; \
apt-get update; \
apt-get install -y --no-install-recommends ca-certificates; \
rm -rf /var/lib/apt/lists/*; \
useradd -m -u 1000 -s /bin/bash bitcoin; \
mkdir -p /home/bitcoin/.bitcoin; \
chown -R bitcoin:bitcoin /home/bitcoin
# bin/ holds the SHA-256 + GPG-verified bitcoind / bitcoin-cli (Knots, Guix-built,
# x86_64-linux-gnu) extracted from the official release tarball.
COPY bin/bitcoind /usr/local/bin/bitcoind
COPY bin/bitcoin-cli /usr/local/bin/bitcoin-cli
RUN chmod 0755 /usr/local/bin/bitcoind /usr/local/bin/bitcoin-cli
# Run as (container) root, like the legacy hand-built :latest image. Rootless
# Podman maps container-root to the unprivileged host service user; the manifest
# grants CAP_DAC_OVERRIDE so bitcoind can read its data dir, which the
# orchestrator chowns to the data_uid (host 100101 / container uid 102), not to
# this image's `bitcoin` user. A non-root USER can't read existing chain data and
# bitcoind crash-loops with "Error initializing block database".
WORKDIR /home/bitcoin
VOLUME ["/home/bitcoin/.bitcoin"]
EXPOSE 8332 8333
ENTRYPOINT ["bitcoind"]

@ -17,6 +17,13 @@ app:
# the IBD sweet spot - 4GB on full nodes, 1GB on pruned. Container # the IBD sweet spot - 4GB on full nodes, 1GB on pruned. Container
# --memory=8g (config.rs::get_memory_limit) leaves headroom for # --memory=8g (config.rs::get_memory_limit) leaves headroom for
# mempool + connections. # mempool + connections.
#
# -printtoconsole=0: foreground bitcoind defaults console logging ON,
# which pushed every IBD "UpdateTip" line through conmon into journald
# (>1 GB/day on a fresh node). bitcoind still writes debug.log in the
# datadir (/var/lib/archipelago/bitcoin/debug.log, self-shrunk on
# restart) — use that for deep debugging; podman logs only carries
# entrypoint/startup errors.
- >- - >-
BITCOIND="$(command -v bitcoind || true)"; BITCOIND="$(command -v bitcoind || true)";
if [ -z "$BITCOIND" ]; then if [ -z "$BITCOIND" ]; then
@ -36,9 +43,9 @@ app:
RPC_TXRELAY_FLAGS="$RPC_TXRELAY_FLAGS -rpcauth=$RPC_TXRELAY_AUTH -rpcwhitelist=txrelay:sendrawtransaction,submitpackage,testmempoolaccept,getmempoolinfo,getrawmempool,getmempoolentry,getnetworkinfo,getblockchaininfo,getblockcount,getblockhash,getblock,getblockheader,getrawtransaction,gettxout,gettxspendingprevout,decoderawtransaction,decodescript,estimatesmartfee,uptime,ping,getconnectioncount,getpeerinfo,getindexinfo,getdeploymentinfo,getchaintips"; RPC_TXRELAY_FLAGS="$RPC_TXRELAY_FLAGS -rpcauth=$RPC_TXRELAY_AUTH -rpcwhitelist=txrelay:sendrawtransaction,submitpackage,testmempoolaccept,getmempoolinfo,getrawmempool,getmempoolentry,getnetworkinfo,getblockchaininfo,getblockcount,getblockhash,getblock,getblockheader,getrawtransaction,gettxout,gettxspendingprevout,decoderawtransaction,decodescript,estimatesmartfee,uptime,ping,getconnectioncount,getpeerinfo,getindexinfo,getdeploymentinfo,getchaintips";
fi; fi;
if [ "${DISK_GB_VALUE:-0}" -lt 1000 ]; then if [ "${DISK_GB_VALUE:-0}" -lt 1000 ]; then
exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -server=1 -prune=550 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=2048 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS"; exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -printtoconsole=0 -server=1 -prune=550 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=2048 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS";
else else
exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -server=1 -txindex=1 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=4096 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS"; exec "$BITCOIND" -datadir=/home/bitcoin/.bitcoin -noconf -printtoconsole=0 -server=1 -txindex=1 -rpcallowip=0.0.0.0/0 -rpcbind=0.0.0.0:8332 -listen=1 -bind=0.0.0.0:8333 -dbcache=4096 -par=0 -maxconnections=125 $RPC_HEADROOM $RPC_TXRELAY_FLAGS -rpcuser="$RPC_USER" -rpcpassword="$RPC_PASS";
fi fi
derived_env: derived_env:
- key: DISK_GB - key: DISK_GB

@ -22,6 +22,7 @@ app:
- app_id: bitcoin-knots - app_id: bitcoin-knots
version: ">=26.0" version: ">=26.0"
- storage: 50Gi - storage: 50Gi
- bitcoin:archival
resources: resources:
cpu_limit: 0 cpu_limit: 0

@ -9,7 +9,7 @@ app:
# 0.8.2 — iroh-capable). No usable upstream image exists, so we build + push # 0.8.2 — iroh-capable). No usable upstream image exists, so we build + push
# this to the node registry. Pin the tag to match the REST shapes coded in # this to the node registry. Pin the tag to match the REST shapes coded in
# core/archipelago/src/wallet/fedimint_client.rs (validated against 0.8.2). # core/archipelago/src/wallet/fedimint_client.rs (validated against 0.8.2).
-image: 146.59.87.168:3000/lfg2025/fmcd:0.8.0
+image: 146.59.87.168:3000/lfg2025/fmcd:0.8.1
pull_policy: if-not-present pull_policy: if-not-present
network: archy-net network: archy-net
# No entrypoint override: the image's resilient `fmcd-run` launcher loops # No entrypoint override: the image's resilient `fmcd-run` launcher loops
@ -33,6 +33,11 @@ app:
- storage: 2Gi - storage: 2Gi
resources: resources:
# fmcd's embedded iroh networking can hot-loop on relay/hole-punch retries
# on NAT'd nodes that reach the federation neither directly nor via iroh's
# public relays, pegging its whole allotment. Cap it low so a stuck instance
# can't starve the node (steady-state is <3% of a core; joins are brief);
# the fmcd-run watchdog additionally restarts a sustained-hot process.
cpu_limit: 1 cpu_limit: 1
memory_limit: 1Gi memory_limit: 1Gi
disk_limit: 2Gi disk_limit: 2Gi


@ -8,6 +8,13 @@ app:
image: 146.59.87.168:3000/lfg2025/lnd:v0.18.4-beta image: 146.59.87.168:3000/lfg2025/lnd:v0.18.4-beta
pull_policy: if-not-present pull_policy: if-not-present
network: archy-net network: archy-net
# BITCOIND_HOST must follow the node's actual Bitcoin container — Knots or
# Core — resolved at apply time from host facts. Hardcoding either breaks
# LND's chain backend connection on the other (lnd.conf is likewise
# resolved in lnd::ensure_config).
derived_env:
- key: BITCOIND_HOST
template: "{{BITCOIN_HOST}}"
secret_env: secret_env:
- key: BITCOIND_RPCPASS - key: BITCOIND_RPCPASS
secret_file: bitcoin-rpc-password secret_file: bitcoin-rpc-password
@ -45,7 +52,6 @@ app:
options: [rw] options: [rw]
environment: environment:
- BITCOIND_HOST=bitcoin-knots
- BITCOIND_RPCUSER=archipelago - BITCOIND_RPCUSER=archipelago
- NETWORK=mainnet - NETWORK=mainnet


@ -27,6 +27,7 @@ app:
version: ">=1.18.0" version: ">=1.18.0"
- app_id: archy-mempool-db - app_id: archy-mempool-db
version: ">=11.4.10" version: ">=11.4.10"
- bitcoin:archival
resources: resources:
memory_limit: 2Gi memory_limit: 2Gi


@ -5,7 +5,7 @@ app:
description: Bitcoin mempool and blockchain explorer. Real-time transaction and block visualization. description: Bitcoin mempool and blockchain explorer. Real-time transaction and block visualization.
container: container:
-image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0
+image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
image_signature: cosign://... image_signature: cosign://...
pull_policy: if-not-present pull_policy: if-not-present
@ -13,6 +13,7 @@ app:
- app_id: bitcoin-core - app_id: bitcoin-core
version: ">=24.0" version: ">=24.0"
- storage: 20Gi - storage: 20Gi
- bitcoin:archival
resources: resources:
cpu_limit: 2 cpu_limit: 2
@ -30,7 +31,7 @@ app:
ports: ports:
- host: 4080 - host: 4080
-container: 4080
+container: 8080 # mempool-frontend nginx listens on 8080 (FRONTEND_HTTP_PORT=8080)
protocol: tcp # Web UI protocol: tcp # Web UI
volumes: volumes:


@ -1,5 +0,0 @@
# Meshtastic - uses official image
FROM meshtastic/meshtastic:latest
# Default configuration is in the image
# No additional setup needed


@ -1,69 +0,0 @@
app:
id: meshtastic
name: Meshtastic
version: 2-daily-alpine
description: Open-source mesh networking for LoRa radios. Create decentralized communication networks.
container:
image: docker.io/meshtastic/meshtasticd:daily-alpine
pull_policy: if-not-present
dependencies:
- storage: 1Gi
resources:
cpu_limit: 1
memory_limit: 512Mi
disk_limit: 1Gi
security:
capabilities: [NET_ADMIN, SYS_ADMIN] # Required for LoRa radio access
readonly_root: false # Needs write access for device management
no_new_privileges: true
user: 1000
seccomp_profile: default
network_policy: host # Requires host network for radio access
apparmor_profile: meshtastic
ports:
- host: 4403
container: 4403
protocol: tcp # Meshtastic TCP API
devices:
- /dev/ttyUSB0 # LoRa radio device (if connected)
volumes:
- type: bind
source: /var/lib/archipelago/meshtastic
target: /var/lib/meshtasticd
options: [rw]
files:
- path: /var/lib/archipelago/meshtastic/config.yaml
content: |
General:
MACAddress: AA:BB:CC:DD:EE:01
Webserver:
Port: 4403
environment:
- MESHTASTIC_PORT=/dev/ttyUSB0
- MESHTASTIC_SERIAL=true
health_check:
type: cmd
endpoint: test -f /var/lib/meshtasticd/config.yaml
interval: 30s
timeout: 30s
retries: 5
networking:
mesh_enabled: true
local_network_access: true
metadata:
icon: /assets/img/app-icons/meshcore.svg
category: networking
tier: recommended
repo: https://github.com/meshtastic/firmware


@ -0,0 +1,77 @@
app:
id: netbird-dashboard
name: NetBird Dashboard
version: "2.38.0"
description: NetBird management dashboard (SPA). Internal stack member served through the netbird proxy.
category: networking
# Hyphen name matches runtime references + the live container (adoption).
# Alias `netbird-dashboard` is the short hostname the proxy's nginx proxies to.
container_name: netbird-dashboard
container:
image: docker.io/netbirdio/dashboard:v2.38.0
pull_policy: if-not-present
network: netbird-net
network_aliases: [netbird-dashboard]
# The dashboard SPA bakes its API/OIDC base URL from these at container
# start. They must point at the proxy's public HTTPS origin (8087) so the
# browser uses a secure context (window.crypto.subtle / OIDC PKCE, #15).
# {{HOST_IP}} is the node's primary host IP, resolved at apply time.
derived_env:
- key: NETBIRD_MGMT_API_ENDPOINT
template: "https://{{HOST_IP}}:8087"
- key: NETBIRD_MGMT_GRPC_API_ENDPOINT
template: "https://{{HOST_IP}}:8087"
- key: AUTH_AUTHORITY
template: "https://{{HOST_IP}}:8087/oauth2"
dependencies:
- app_id: netbird-server
resources:
memory_limit: 256Mi
security:
# cap-drop=ALL is applied by the orchestrator. The dashboard image runs
# nginx (master as root, drops workers) binding :80 — needs the worker-drop
# caps + NET_BIND_SERVICE for the privileged port.
capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
# Internal only — reached container-to-container by the proxy via netbird-net.
ports: []
volumes: []
environment:
- AUTH_AUDIENCE=netbird-dashboard
- AUTH_CLIENT_ID=netbird-dashboard
- AUTH_CLIENT_SECRET=
- USE_AUTH0=false
- AUTH_SUPPORTED_SCOPES=openid profile email groups
- AUTH_REDIRECT_URI=/nb-auth
- AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
- NETBIRD_TOKEN_SOURCE=idToken
- NGINX_SSL_PORT=443
- LETSENCRYPT_DOMAIN=none
health_check:
type: tcp
endpoint: localhost:80
interval: 30s
timeout: 5s
retries: 5
start_period: 20s
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/dashboard
license: BSD-3-Clause
tags:
- networking
- vpn
- dashboard


@ -0,0 +1,122 @@
app:
id: netbird-server
name: NetBird Server
version: "0.71.2"
description: NetBird combined management / signal / relay server with an embedded identity provider and STUN. Backend for the self-hosted NetBird mesh VPN.
category: networking
# Hyphen name matches the runtime references (crash_recovery / dependencies /
# config startup order) + the live container, so on an existing node the
# orchestrator ADOPTS the running server rather than recreating it (data +
# the sqlite store under /var/lib/netbird preserved). Alias `netbird-server`
# is the short hostname the proxy's nginx proxies/grpc-passes to.
container_name: netbird-server
container:
image: docker.io/netbirdio/netbird-server:0.71.2
pull_policy: if-not-present
network: netbird-net
network_aliases: [netbird-server]
# The relay authSecret and the sqlite store encryptionKey are base64 keys
# (the server base64-decodes them to recover raw bytes — hex would decode to
# the wrong value). Generated once and reused: ensure_generated_secrets
# no-ops when the file already exists, so a re-render of config.yaml on an
# adopted node keeps the same keys (regenerating would orphan the store).
generated_secrets:
- name: netbird-relay-auth-secret
kind: base64
- name: netbird-store-encryption-key
kind: base64
# Pass the rendered config explicitly, mirroring the legacy `--config` arg.
custom_args: ["--config", "/etc/netbird/config.yaml"]
dependencies:
- storage: 1Gi
resources:
memory_limit: 1Gi
security:
# cap-drop=ALL is applied by the orchestrator. The server binds :80
# (management/signal/relay HTTP + gRPC) inside the container — a privileged
# port — so it needs NET_BIND_SERVICE. STUN is 3478/udp (unprivileged).
capabilities: [NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
ports:
- host: 8086
container: 80
protocol: tcp # management API + embedded OIDC issuer (/oauth2)
- host: 3478
container: 3478
protocol: udp # STUN — must be UDP; tcp here breaks relay discovery
volumes:
- type: bind
source: /var/lib/archipelago/netbird/data
target: /var/lib/netbird
options: [rw]
# The rendered config.yaml, read-only. Re-rendered on every reconcile from
# host facts + the base64 secrets; idempotent (stable bytes → no restart).
- type: bind
source: /var/lib/archipelago/netbird/config.yaml
target: /etc/netbird/config.yaml
options: [ro]
environment: []
# The server's config. {{HOST_IP}} is the node's primary host IP (the proxy's
# public origin is https on 8087 — the dashboard needs a secure context for
# OIDC PKCE, issue #15). {{secret:...}} are read 0600 from the secrets dir.
files:
- path: /var/lib/archipelago/netbird/config.yaml
overwrite: true
content: |
server:
listenAddress: ":80"
exposedAddress: "https://{{HOST_IP}}:8087"
stunPorts:
- 3478
metricsPort: 9090
healthcheckAddress: ":9000"
logLevel: "info"
logFile: "console"
authSecret: "{{secret:netbird-relay-auth-secret}}"
dataDir: "/var/lib/netbird"
auth:
issuer: "https://{{HOST_IP}}:8087/oauth2"
localAuthDisabled: false
signKeyRefreshEnabled: false
dashboardRedirectURIs:
- "https://{{HOST_IP}}:8087/nb-auth"
- "https://{{HOST_IP}}:8087/nb-silent-auth"
dashboardPostLogoutRedirectURIs:
- "https://{{HOST_IP}}:8087/"
cliRedirectURIs:
- "http://localhost:53000/"
store:
engine: "sqlite"
encryptionKey: "{{secret:netbird-store-encryption-key}}"
# TCP liveness on the management port. Binds at startup, stays green; an http
# check of /oauth2 would false-fail while the issuer warms up.
health_check:
type: tcp
endpoint: localhost:80
interval: 30s
timeout: 5s
retries: 10
start_period: 30s
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/netbird
license: BSD-3-Clause
tags:
- networking
- vpn
- wireguard
- mesh

apps/netbird/manifest.yml (new file, 182 lines)

@ -0,0 +1,182 @@
app:
id: netbird
name: NetBird
version: "2.38.0"
description: Self-hosted WireGuard mesh VPN control plane with dashboard, embedded identity provider, management API, signal, relay, and STUN. The user-facing entry point — a TLS proxy in front of the dashboard + server.
category: networking
# The user-facing launcher (app_id + container both "netbird", matching the
# runtime references + the live container so the orchestrator adopts it). This
# is the nginx that terminates TLS on 8087 and fans out to the dashboard +
# server by their short aliases on netbird-net.
container_name: netbird
container:
image: docker.io/library/nginx:1.27-alpine
pull_policy: if-not-present
network: netbird-net
# Self-signed TLS cert materialised before create — the dashboard needs a
# secure context (window.crypto.subtle / OIDC PKCE, issue #15), so the proxy
# serves HTTPS. Idempotent: kept as-is when crt+key already exist (a user
# accepts it once). SAN defaults to the host IP + 127.0.0.1 + localhost.
generated_certs:
- crt: /var/lib/archipelago/netbird/tls.crt
key: /var/lib/archipelago/netbird/tls.key
dependencies:
- app_id: netbird-server
- app_id: netbird-dashboard
- storage: 1Gi
resources:
memory_limit: 256Mi
security:
# cap-drop=ALL is applied by the orchestrator. nginx (master as root, drops
# workers) binds :443 — needs the worker-drop caps + NET_BIND_SERVICE.
capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
ports:
# 8087 publishes the TLS listener (container :443). HTTPS is required for the
# dashboard's secure context (issue #15).
- host: 8087
container: 443
protocol: tcp
volumes:
- type: bind
source: /var/lib/archipelago/netbird/nginx.conf
target: /etc/nginx/conf.d/default.conf
options: [ro]
- type: bind
source: /var/lib/archipelago/netbird/tls.crt
target: /etc/nginx/tls.crt
options: [ro]
- type: bind
source: /var/lib/archipelago/netbird/tls.key
target: /etc/nginx/tls.key
options: [ro]
environment: []
# The proxy config. {{NETWORK_GATEWAY}} is the netbird-net bridge gateway =
# Podman's aardvark DNS. nginx uses it as an explicit `resolver` with VARIABLE
# upstreams so it re-resolves container names per request — without it nginx
# pins a container IP at startup and 502s forever once that IP moves on a
# restart/reboot (issue #15, observed live on .198). Every #15 fix below
# (CORS $http_origin reflect, grpc pass, nb-auth/nb-silent-auth rewrite to
# index.html, /relay websocket) is preserved verbatim from the legacy config.
files:
- path: /var/lib/archipelago/netbird/nginx.conf
overwrite: true
content: |
server {
listen 443 ssl;
server_name _;
# netbird's dashboard needs a secure context (window.crypto.subtle for
# OIDC PKCE), so the proxy terminates TLS with a self-signed cert (#15).
ssl_certificate /etc/nginx/tls.crt;
ssl_certificate_key /etc/nginx/tls.key;
# Rootless Podman can hand a container a new IP across restarts/reboots.
# nginx resolves a literal upstream name ONCE at startup and caches it,
# so after the IP moves every request 502s with "host unreachable"
# (issue #15, observed live on .198: nginx pinned to a dead
# netbird-dashboard IP). Fix: point `resolver` at the netbird-net
# gateway (Podman's aardvark DNS) and use VARIABLE upstreams, which
# forces nginx to re-resolve the container names at request time.
resolver {{NETWORK_GATEWAY}} valid=10s ipv6=off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
location ~ ^/(relay|ws-proxy/) {
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 1d;
}
location ~ ^/(api|oauth2)(/|$) {
# The dashboard is a SPA whose API/OIDC base URL is baked at build
# time to one host:port. A single box is reached via several
# addresses, so those fetches are cross-origin and the browser
# blocks them with no Access-Control-Allow-Origin (#15, live on
# .198). Reflect the caller's Origin and answer the CORS preflight.
if ($request_method = OPTIONS) {
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
add_header Access-Control-Max-Age 86400 always;
add_header Content-Length 0;
return 204;
}
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
}
location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {
set $nb_server netbird-server;
grpc_pass grpc://$nb_server:80;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
}
# OIDC callback routes are client-side SPA routes with NO prebuilt page
# in the dashboard bundle, so proxying them straight through 404s —
# which crashes the dashboard's auth init and shows "Unauthenticated"
# with dead buttons (#15, live on .198: /nb-auth + /nb-silent-auth
# returned 404). Serve index.html at these paths (URL unchanged) so
# react-oidc boots and completes the login / silent-SSO.
location ~ ^/(nb-auth|nb-silent-auth) {
set $nb_dashboard netbird-dashboard;
rewrite ^.*$ /index.html break;
proxy_pass http://$nb_dashboard:80;
}
location / {
set $nb_dashboard netbird-dashboard;
proxy_pass http://$nb_dashboard:80;
}
}
health_check:
type: tcp
endpoint: localhost:443
interval: 30s
timeout: 5s
retries: 5
start_period: 20s
interfaces:
main:
name: Dashboard
description: Manage your self-hosted NetBird mesh VPN
type: ui
port: 8087
protocol: https
path: /
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/netbird
license: BSD-3-Clause
tags:
- networking
- vpn
- wireguard
- mesh

core/Cargo.lock (generated, 80 lines changed)

@ -99,6 +99,7 @@ version = "1.7.99-alpha"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"archipelago-container", "archipelago-container",
"archipelago-openwrt",
"archipelago-performance", "archipelago-performance",
"archipelago-security", "archipelago-security",
"argon2", "argon2",
@ -128,6 +129,7 @@ dependencies = [
"hyper-ws-listener", "hyper-ws-listener",
"iroh", "iroh",
"iroh-blobs", "iroh-blobs",
"libc",
"mainline", "mainline",
"mdns-sd", "mdns-sd",
"nostr-sdk", "nostr-sdk",
@ -180,6 +182,22 @@ dependencies = [
"uuid", "uuid",
] ]
[[package]]
name = "archipelago-openwrt"
version = "0.1.0"
dependencies = [
"anyhow",
"async-trait",
"reqwest 0.11.27",
"serde",
"serde_json",
"ssh2",
"thiserror 1.0.69",
"tokio",
"tokio-test",
"tracing",
]
[[package]] [[package]]
name = "archipelago-performance" name = "archipelago-performance"
version = "0.1.0" version = "0.1.0"
@ -2839,6 +2857,32 @@ dependencies = [
"redox_syscall 0.7.3", "redox_syscall 0.7.3",
] ]
[[package]]
name = "libssh2-sys"
version = "0.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "220e4f05ad4a218192533b300327f5150e809b54c4ec83b5a1d91833601811b9"
dependencies = [
"cc",
"libc",
"libz-sys",
"openssl-sys",
"pkg-config",
"vcpkg",
]
[[package]]
name = "libz-sys"
version = "1.1.29"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "85bc9657773828b90eeb625adff10eeac83cc21bbfd8e23a03eaa8a33c9e28d9"
dependencies = [
"cc",
"libc",
"pkg-config",
"vcpkg",
]
[[package]] [[package]]
name = "linux-raw-sys" name = "linux-raw-sys"
version = "0.11.0" version = "0.11.0"
@ -3580,6 +3624,18 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe" checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe"
[[package]]
name = "openssl-sys"
version = "0.9.117"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b47e7e6bb2c38cd930d25a23b40fa52e068c10e85f3e03a7f5ba5aaca5713695"
dependencies = [
"cc",
"libc",
"pkg-config",
"vcpkg",
]
[[package]] [[package]]
name = "papaya" name = "papaya"
version = "0.2.4" version = "0.2.4"
@ -3758,6 +3814,12 @@ dependencies = [
"spki 0.8.0", "spki 0.8.0",
] ]
[[package]]
name = "pkg-config"
version = "0.3.33"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "19f132c84eca552bf34cab8ec81f1c1dcc229b811638f9d283dceabe58c5569e"
[[package]] [[package]]
name = "plain" name = "plain"
version = "0.2.3" version = "0.2.3"
@ -4988,6 +5050,18 @@ dependencies = [
"der 0.8.0", "der 0.8.0",
] ]
[[package]]
name = "ssh2"
version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2f84d13b3b8a0d4e91a2629911e951db1bb8671512f5c09d7d4ba34500ba68c8"
dependencies = [
"bitflags 2.13.0",
"libc",
"libssh2-sys",
"parking_lot 0.12.5",
]
[[package]] [[package]]
name = "stable_deref_trait" name = "stable_deref_trait"
version = "1.2.1" version = "1.2.1"
@ -5775,6 +5849,12 @@ version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba73ea9cf16a25df0c8caa16c51acb937d5712a8429db78a3ee29d5dcacd3a65" checksum = "ba73ea9cf16a25df0c8caa16c51acb937d5712a8429db78a3ee29d5dcacd3a65"
[[package]]
name = "vcpkg"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "accd4ea62f7bb7a82fe23066fb0957d48ef677f6eeb8215f372f52e48bb32426"
[[package]] [[package]]
name = "vergen" name = "vergen"
version = "9.1.0" version = "9.1.0"


@ -4,6 +4,7 @@ resolver = "2"
members = [ members = [
"archipelago", "archipelago",
"container", "container",
"openwrt",
"performance", "performance",
"security", "security",
] ]


@ -22,6 +22,7 @@ iroh-swarm = ["dep:iroh", "dep:iroh-blobs"]
[dependencies] [dependencies]
# Core dependencies # Core dependencies
tokio = { version = "1", features = ["full"] } tokio = { version = "1", features = ["full"] }
libc = "0.2" # process-group signalling for the supervised reticulum daemon
serde = { version = "1.0", features = ["derive"] } serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0" serde_json = "1.0"
anyhow = "1.0" anyhow = "1.0"
@ -42,6 +43,7 @@ futures-util = "0.3"
# Our modules # Our modules
archipelago-container = { path = "../container" } archipelago-container = { path = "../container" }
archipelago-openwrt = { path = "../openwrt" }
archipelago-security = { path = "../security" } archipelago-security = { path = "../security" }
archipelago-performance = { path = "../performance" } archipelago-performance = { path = "../performance" }


@ -48,6 +48,17 @@ impl ApiHandler {
.get("x-blob-filename") .get("x-blob-filename")
.and_then(|v| v.to_str().ok()) .and_then(|v| v.to_str().ok())
.map(|s| s.to_string()); .map(|s| s.to_string());
// Optional caller-supplied thumbnail (small, base64) — e.g. the mesh
// chat's image-quality picker generates a tiny client-side preview so
// a ContentRef receiver can render something before fetching the full
// blob. Best-effort: a malformed header is just ignored, not fatal.
let thumb_bytes = headers
.get("x-blob-thumb")
.and_then(|v| v.to_str().ok())
.and_then(|b64| {
use base64::{engine::general_purpose::STANDARD, Engine as _};
STANDARD.decode(b64).ok()
});
let bytes = body.to_vec(); let bytes = body.to_vec();
// Uploads through /api/blob come from the node owner's session and // Uploads through /api/blob come from the node owner's session and
@ -55,7 +66,7 @@ impl ApiHandler {
// pictures, banners). Store them public so `/blob/<cid>` serves // pictures, banners). Store them public so `/blob/<cid>` serves
// without a capability check — external Nostr clients fetching a // without a capability check — external Nostr clients fetching a
// kind-0 `picture` URL have no cap and can't get one. // kind-0 `picture` URL have no cap and can't get one.
-match store.put(&bytes, &mime, filename, None, true).await {
+match store.put(&bytes, &mime, filename, thumb_bytes, true).await {
Ok(meta) => { Ok(meta) => {
let exp = let exp =
(chrono::Utc::now().timestamp() as u64) + crate::blobs::DEFAULT_CAP_TTL_SECS; (chrono::Utc::now().timestamp() as u64) + crate::blobs::DEFAULT_CAP_TTL_SECS;


@ -39,6 +39,17 @@ impl ApiHandler {
let (mut tx, mut rx) = ws_stream.split(); let (mut tx, mut rx) = ws_stream.split();
// Subscribe BEFORE taking the initial snapshot. Messages are full
// data dumps keyed by a monotonic revision, so a broadcast that
// races the snapshot is at worst a harmless duplicate/newer dump
// delivered right after — but subscribing after the snapshot send
// (the old order) let any update in that window vanish forever,
// since a tokio broadcast channel never delivers sends that
// predate subscribe(). That silently stuck clients (e.g. a fresh
// install's post-boot container scan) on a stale initial snapshot
// until a full page reload opened a new connection past the race.
let mut state_rx = state_manager.subscribe();
let initial_msg = state_manager.get_initial_message().await; let initial_msg = state_manager.get_initial_message().await;
if let Ok(json_msg) = serde_json::to_string(&initial_msg) { if let Ok(json_msg) = serde_json::to_string(&initial_msg) {
if let Err(e) = tx.send(Message::Text(json_msg)).await { if let Err(e) = tx.send(Message::Text(json_msg)).await {
@ -47,8 +58,6 @@ impl ApiHandler {
} }
debug!("Sent initial data dump at revision {}", initial_msg.rev); debug!("Sent initial data dump at revision {}", initial_msg.rev);
} }
let mut state_rx = state_manager.subscribe();
let ping_interval = tokio::time::interval(tokio::time::Duration::from_secs(30)); let ping_interval = tokio::time::interval(tokio::time::Duration::from_secs(30));
tokio::pin!(ping_interval); tokio::pin!(ping_interval);
let mut last_client_activity = Instant::now(); let mut last_client_activity = Instant::now();
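
The ordering argument in the comment above can be checked with a toy model. The sketch below does not use tokio; `Bus` is a hypothetical stand-in built on `std::sync::mpsc` that keeps only the one property the comment relies on: a send reaches only receivers that already exist. `client_view` is likewise an illustrative name, and revisions are plain integers.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy broadcast bus: a send is delivered only to current subscribers.
struct Bus {
    subscribers: Vec<Sender<u64>>,
}

impl Bus {
    fn new() -> Self {
        Bus { subscribers: Vec::new() }
    }

    fn subscribe(&mut self) -> Receiver<u64> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    fn send(&mut self, rev: u64) {
        // Deliver to live subscribers and drop the ones that have gone away.
        self.subscribers.retain(|tx| tx.send(rev).is_ok());
    }
}

/// Returns the revisions a client observes: the snapshot, then any updates.
fn client_view(subscribe_first: bool) -> Vec<u64> {
    let mut bus = Bus::new();
    let mut current_rev = 1;

    // New order: subscribe before reading the snapshot.
    let early_rx = if subscribe_first { Some(bus.subscribe()) } else { None };

    let snapshot = current_rev;

    // An update lands between the snapshot and the (late) subscribe.
    current_rev += 1;
    bus.send(current_rev);

    // Old order: subscribe only after the snapshot has been taken.
    let rx = match early_rx {
        Some(rx) => rx,
        None => bus.subscribe(),
    };

    let mut seen = vec![snapshot];
    seen.extend(rx.try_iter());
    seen
}
```

In this model `client_view(true)` observes revisions 1 and 2, while `client_view(false)` observes only revision 1 and never learns about the update, which is the stale-snapshot symptom the hunk describes.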


@ -141,6 +141,19 @@ impl RpcHandler {
self.auth_manager.setup_user(password).await?; self.auth_manager.setup_user(password).await?;
tracing::info!("[onboarding] user setup complete"); tracing::info!("[onboarding] user setup complete");
// Persist the pending onboarding seed as the encrypted backup now that
// a passphrase (the login password) finally exists — otherwise "Reveal
// recovery phrase" has nothing to decrypt on this node, ever.
// Best-effort: a failure here must not break password setup.
match super::seed_rpc::save_pending_seed_encrypted(&self.config.data_dir, password).await {
Ok(true) => tracing::info!("[onboarding] encrypted seed backup saved"),
Ok(false) => tracing::info!(
"[onboarding] no pending mnemonic to back up (restored earlier or legacy node)"
),
Err(e) => tracing::warn!("[onboarding] encrypted seed backup failed: {e:#}"),
}
Ok(serde_json::json!(true)) Ok(serde_json::json!(true))
} }


@ -171,6 +171,12 @@ impl RpcHandler {
// than the WebSocket-delivered package_data, which caused apps to flicker // than the WebSocket-delivered package_data, which caused apps to flicker
// between "installed" and "not-installed" in the UI. // between "installed" and "not-installed" in the UI.
let (data, _) = self.state_manager.get_snapshot().await; let (data, _) = self.state_manager.get_snapshot().await;
// Apps the user explicitly stopped must read as "stopped" even though a
// UI companion (electrs-ui, bitcoin-ui, …) keeps serving the launch port:
// launch_port_reachable() below would otherwise upgrade an exited backend
// back to "running". The reconcile guard keeps these backends down, so the
// marker is authoritative here.
let user_stopped = crate::crash_recovery::load_user_stopped(&self.config.data_dir).await;
if data.server_info.status_info.containers_scanned && !data.package_data.is_empty() { if data.server_info.status_info.containers_scanned && !data.package_data.is_empty() {
let mut containers = Vec::with_capacity(data.package_data.len()); let mut containers = Vec::with_capacity(data.package_data.len());
for (id, pkg) in &data.package_data { for (id, pkg) in &data.package_data {
@ -202,7 +208,11 @@ impl RpcHandler {
// Scanner backoff preserves cached package_data. Refresh stable // Scanner backoff preserves cached package_data. Refresh stable
// states so callers do not see stale `running`/`exited` after // states so callers do not see stale `running`/`exited` after
// health-monitor recovery or Quadlet --rm container removal. // health-monitor recovery or Quadlet --rm container removal.
-if state == "running" && requires_launch_port_for_health(id) {
+if user_stopped.contains(id) {
+// User stopped it → authoritative "stopped". Do NOT let a
+// still-running UI companion's launch port mark it running.
+state = "stopped".to_string();
+} else if state == "running" && requires_launch_port_for_health(id) {
if !self.cached_reachable_health(id).await?.is_some() { if !self.cached_reachable_health(id).await?.is_some() {
state = live_state_for_app(id) state = live_state_for_app(id)
.await .await
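
The precedence this hunk introduces can be summarised in a few lines. The function below is a hypothetical reduction, not the handler's real logic: `effective_state` and its parameters are illustrative names, and the "reachable launch port upgrades an exited backend" rule is taken from the comment in the hunk rather than from code shown here.

```rust
use std::collections::HashSet;

/// Hypothetical reduction of the status refresh: the user-stopped marker
/// is checked first, so a companion UI's open port cannot override it.
fn effective_state(
    id: &str,
    scanned: &str,
    user_stopped: &HashSet<String>,
    launch_port_reachable: bool,
) -> String {
    if user_stopped.contains(id) {
        // Authoritative: the user asked for this app to be down.
        "stopped".to_string()
    } else if scanned == "exited" && launch_port_reachable {
        // Assumed upgrade path described in the hunk's comment.
        "running".to_string()
    } else {
        scanned.to_string()
    }
}
```

Checking the marker before any reachability probe keeps the answer stable even while a UI companion container continues to serve the launch port.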


@ -429,11 +429,15 @@ impl RpcHandler {
}, },
Some("fedimint") => match mint_fedimint().await { Some("fedimint") => match mint_fedimint().await {
Ok((notes, fed)) => { Ok((notes, fed)) => {
-tracing::info!("paid download: spending {price_sats} sats Fedimint notes from {fed}");
+tracing::info!(
+"paid download: spending {price_sats} sats Fedimint notes from {fed}"
+);
(notes, "fedimint") (notes, "fedimint")
} }
Err(e) => { Err(e) => {
-tracing::warn!("paid download: fedimint spend failed for {price_sats} sats: {e:#}");
+tracing::warn!(
+"paid download: fedimint spend failed for {price_sats} sats: {e:#}"
+);
return Ok(serde_json::json!({ "error": format!( return Ok(serde_json::json!({ "error": format!(
"Couldn't pay {price_sats} sats from your Fedimint wallet: {e}. \ "Couldn't pay {price_sats} sats from your Fedimint wallet: {e}. \
Fund it, or choose Cashu." Fund it, or choose Cashu."
@ -457,7 +461,9 @@ impl RpcHandler {
}, },
}, },
}; };
-tracing::info!("paid download: paying {price_sats} sats to {onion} via {used_backend} ecash");
+tracing::info!(
+"paid download: paying {price_sats} sats to {onion} via {used_backend} ecash"
+);
let (data, _) = self.state_manager.get_snapshot().await; let (data, _) = self.state_manager.get_snapshot().await;
let local_did = crate::identity::did_key_from_pubkey_hex(&data.server_info.pubkey)?; let local_did = crate::identity::did_key_from_pubkey_hex(&data.server_info.pubkey)?;


@ -57,6 +57,8 @@ impl RpcHandler {
"package.uninstall" => self.clone().spawn_package_uninstall(params).await, "package.uninstall" => self.clone().spawn_package_uninstall(params).await,
"package.update" => self.clone().spawn_package_update(params).await, "package.update" => self.clone().spawn_package_update(params).await,
"package.check-updates" => self.handle_package_check_updates(params).await, "package.check-updates" => self.handle_package_check_updates(params).await,
"package.versions" => self.handle_package_versions(params).await,
"package.set-config" => self.clone().handle_package_set_config(params).await,
"package.credentials" => self.handle_package_credentials(params).await, "package.credentials" => self.handle_package_credentials(params).await,
"app.filebrowser-token" => self.handle_filebrowser_token().await, "app.filebrowser-token" => self.handle_filebrowser_token().await,
@ -221,6 +223,7 @@ impl RpcHandler {
"network.list-interfaces" => self.handle_network_list_interfaces().await, "network.list-interfaces" => self.handle_network_list_interfaces().await,
"network.scan-wifi" => self.handle_network_scan_wifi().await, "network.scan-wifi" => self.handle_network_scan_wifi().await,
"network.configure-wifi" => self.handle_network_configure_wifi(params).await, "network.configure-wifi" => self.handle_network_configure_wifi(params).await,
"network.set-wifi-radio" => self.handle_network_set_wifi_radio(params).await,
"network.configure-ethernet" => self.handle_network_configure_ethernet(params).await, "network.configure-ethernet" => self.handle_network_configure_ethernet(params).await,
"network.dns-status" => self.handle_network_dns_status().await, "network.dns-status" => self.handle_network_dns_status().await,
"network.configure-dns" => self.handle_network_configure_dns(params).await, "network.configure-dns" => self.handle_network_configure_dns(params).await,
@ -228,6 +231,13 @@ impl RpcHandler {
"router.info" => self.handle_router_info().await,
"router.configure" => self.handle_router_configure(params).await,
// OpenWrt / TollGate
"openwrt.scan" => self.handle_openwrt_scan(params).await,
"openwrt.get-status" => self.handle_openwrt_get_status(params).await,
"openwrt.provision-tollgate" => self.handle_openwrt_provision_tollgate(params).await,
"openwrt.scan-wifi" => self.handle_openwrt_scan_wifi(params).await,
"openwrt.configure-wan" => self.handle_openwrt_configure_wan(params).await,
// Ecash wallet
"wallet.ecash-balance" => self.handle_wallet_ecash_balance().await,
"wallet.ecash-mint" => self.handle_wallet_ecash_mint(params).await,
@ -364,6 +374,7 @@ impl RpcHandler {
"mesh.send" => self.handle_mesh_send(params).await,
"mesh.send-channel" => self.handle_mesh_send_channel(params).await,
"mesh.broadcast" => self.handle_mesh_broadcast().await,
"mesh.reboot-radio" => self.handle_mesh_reboot_radio(params).await,
"mesh.configure" => self.handle_mesh_configure(params).await,
"mesh.send-invoice" => self.handle_mesh_send_invoice(params).await,
"mesh.send-coordinate" => self.handle_mesh_send_coordinate(params).await,
@ -416,8 +427,10 @@ impl RpcHandler {
// Server settings
"server.set-name" => self.handle_server_set_name(params).await,
"server.set-location" => self.handle_server_set_location(params).await,
// System monitoring
"system.get-hostname" => self.handle_system_get_hostname().await,
"system.stats" => self.handle_system_stats().await,
"system.processes" => self.handle_system_processes().await,
"system.temperature" => self.handle_system_temperature().await,

@ -454,6 +454,12 @@ impl RpcHandler {
.flatten(),
};
let shared_location = if data.server_info.share_location {
data.server_info.lat.zip(data.server_info.lon)
} else {
None
};
let state = federation::build_local_state(
apps,
0.0,
@ -467,6 +473,7 @@ impl RpcHandler {
nostr_npub,
own_fips_npub,
&federated_peers,
shared_location,
);
Ok(serde_json::to_value(&state)?)
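The `shared_location` gating in the hunk above can be illustrated in isolation. This is a minimal sketch, assuming `lat` and `lon` are `Option<f64>`; the `ServerInfo` struct here is a hypothetical stand-in, not the project's real type.

```rust
/// Hypothetical stand-in for the relevant `server_info` fields.
/// The field types are an assumption made for this sketch.
struct ServerInfo {
    share_location: bool,
    lat: Option<f64>,
    lon: Option<f64>,
}

/// Returns coordinates only when sharing is enabled AND both values exist.
/// `Option::zip` yields `Some((lat, lon))` only if both sides are `Some`.
fn shared_location(info: &ServerInfo) -> Option<(f64, f64)> {
    if info.share_location {
        info.lat.zip(info.lon)
    } else {
        None
    }
}
```

The effect is that a half-configured location (only one coordinate set) is treated the same as no location, so a peer never sees a partial pair.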

@ -18,6 +18,24 @@ impl RpcHandler {
Ok(serde_json::json!({ "networks": networks }))
}
/// network.set-wifi-radio — turn the wifi adapter fully on or off (not just
/// disconnect from a network). Params: `{ "enabled": bool }`.
pub(super) async fn handle_network_set_wifi_radio(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
let enabled = params
.get("enabled")
.and_then(|v| v.as_bool())
.ok_or_else(|| anyhow::anyhow!("Missing required parameter: enabled"))?;
tracing::info!(enabled, "Setting wifi radio state");
set_wifi_radio(enabled).await?;
Ok(serde_json::json!({ "ok": true, "enabled": enabled }))
}
/// network.configure-wifi — connect to a WiFi network.
pub(super) async fn handle_network_configure_wifi(
&self,
@ -327,6 +345,27 @@ fn split_nmcli_escaped(line: &str, limit: usize) -> Vec<String> {
fields
}
/// Turn the wifi radio fully on or off using nmcli (an rfkill-level toggle, not
/// just disconnecting from the current network — the adapter stops scanning/
/// associating entirely until switched back on).
async fn set_wifi_radio(enabled: bool) -> Result<()> {
let state = if enabled { "on" } else { "off" };
let output = tokio::process::Command::new("nmcli")
.args(["radio", "wifi", state])
.output()
.await
.context("Failed to run nmcli radio wifi")?;
if !output.status.success() {
anyhow::bail!(
"nmcli radio wifi {} failed: {}",
state,
String::from_utf8_lossy(&output.stderr)
);
}
Ok(())
}
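The mapping from the `enabled` flag to the `nmcli radio wifi on|off` invocation can be sketched on its own. This is a hedged illustration: the helper names `nmcli_radio_args` and `build_radio_command` are hypothetical, and it uses `std::process::Command` rather than the `tokio::process::Command` used in the diff. The command is only built here, not executed.

```rust
use std::process::Command;

/// Hypothetical helper: the argument vector for `nmcli radio wifi <state>`.
fn nmcli_radio_args(enabled: bool) -> [&'static str; 3] {
    let state = if enabled { "on" } else { "off" };
    ["radio", "wifi", state]
}

/// Hypothetical helper: builds (but does not run) the command.
/// A caller would invoke `.output()` and check `status.success()`.
fn build_radio_command(enabled: bool) -> Command {
    let mut cmd = Command::new("nmcli");
    cmd.args(nmcli_radio_args(enabled));
    cmd
}
```

Keeping the argument construction in a pure function makes the on/off mapping testable without NetworkManager being present.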
/// Connect to a WiFi network using nmcli.
async fn connect_wifi(ssid: &str, password: &str) -> Result<()> {
let conn_name = format!("archipelago-wifi-{ssid}");

@ -19,7 +19,10 @@ impl RpcHandler {
let svc = service
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running"))?;
(svc.assistant_config().await, svc.assistant_denied_askers().await) (
svc.assistant_config().await,
svc.assistant_denied_askers().await,
)
};
let (ollama_detected, models) = detect_ollama().await;

@ -86,6 +86,29 @@ impl RpcHandler {
Ok(serde_json::json!({ "broadcast": true }))
}
/// mesh.reboot-radio — Reboot the locally-connected radio firmware to
/// recover a wedged / RX-deaf radio. Optional `seconds` delay (default 2).
pub(in crate::api::rpc) async fn handle_mesh_reboot_radio(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let seconds = params
.as_ref()
.and_then(|p| p.get("seconds"))
.and_then(|v| v.as_i64())
.unwrap_or(2);
let service = self.mesh_service.read().await;
let svc = service
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running. Enable mesh first."))?;
svc.reboot_radio(seconds).await?;
info!(seconds, "Mesh radio reboot requested via RPC");
Ok(serde_json::json!({ "reboot": true, "seconds": seconds }))
}
/// mesh.configure — Enable/disable mesh and set device path.
pub(in crate::api::rpc) async fn handle_mesh_configure(
&self,

@ -5,6 +5,7 @@ use crate::mesh::message_types::{
Coordinate, DeletePayload, EditPayload, ForwardPayload, InvoicePayload, MeshMessageType,
MessageKey, PsbtHashPayload, ReactionPayload, ReadReceiptPayload, ReplyPayload, TypedEnvelope,
};
use crate::mesh::types::radio_transport_label;
use anyhow::Result;
use tracing::info;
@ -391,9 +392,24 @@ impl RpcHandler {
// Hard ceiling matching the chunked-send capacity (~20 chunks * 152
// b64 chars after MCIIXXTT framing). Anything larger must go via
// ContentRef over Tor. // ContentRef over Tor — UNLESS the active device is Reticulum, which
// can carry up to RETICULUM_RESOURCE_MAX directly over LoRa via a
// native RNS Resource transfer (keep this ceiling in sync with
// `mesh.transport-advice`'s `"resource-mesh"` tier, the source of
// truth the frontend consults before ever reaching this size).
const INLINE_HARD_MAX: usize = 2300;
if bytes.len() > INLINE_HARD_MAX { const RETICULUM_RESOURCE_MAX: usize = 2 * 1024 * 1024;
let service = self.mesh_service.read().await;
let svc = service
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running"))?;
let device_type = svc.shared_state().status.read().await.device_type;
let use_resource_transfer = bytes.len() > INLINE_HARD_MAX
&& device_type == crate::mesh::types::DeviceType::Reticulum
&& bytes.len() <= RETICULUM_RESOURCE_MAX;
if bytes.len() > INLINE_HARD_MAX && !use_resource_transfer {
anyhow::bail!(
"Payload {} bytes exceeds inline max {} — use mesh.send-content (ContentRef) instead",
bytes.len(),
@ -414,22 +430,6 @@ impl RpcHandler {
.put(&bytes, &mime, filename.clone(), None, false)
.await?;
let service = self.mesh_service.read().await;
let svc = service
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running"))?;
let content = ContentInlinePayload {
mime: mime.clone(),
filename: filename.clone(),
caption: caption.clone(),
bytes,
};
let seq = svc.next_send_seq(contact_id).await;
let payload = message_types::encode_payload(&content)?;
let envelope = TypedEnvelope::new(MeshMessageType::ContentInline, payload).with_seq(seq);
let wire = envelope.to_wire()?;
let display = match (&filename, &caption) {
(Some(f), Some(c)) => format!("📎 {}{}", f, c),
(Some(f), None) => format!("📎 {}", f),
@ -437,7 +437,8 @@ impl RpcHandler {
(None, None) => format!("📎 {} ({} bytes)", mime, meta.size),
};
// Render as a content_ref card on the sender side (UI already knows // Render as a content_ref card on the sender side (UI already knows
// how to draw it from cid + mime + filename + size). // how to draw it from cid + mime + filename + size) regardless of
// which wire format actually goes out — this is a local-only mirror.
let typed_json = serde_json::json!({
"cid": meta.cid,
"size": meta.size,
@ -446,22 +447,67 @@ impl RpcHandler {
"caption": caption,
"inline": true,
});
let seq = svc.next_send_seq(contact_id).await;
let msg = svc // A stock (non-archy) peer can't decode our typed-envelope wire
.send_typed_wire( // format — send images to them via LXMF's native FIELD_IMAGE
// instead, so they actually see the photo (Sideband/NomadNet).
let is_archy = svc.is_archy_peer(contact_id).await;
let native_image = !is_archy
&& device_type == crate::mesh::types::DeviceType::Reticulum
&& mime.starts_with("image/");
let msg = if native_image {
svc.send_native_image(contact_id, &mime, bytes, caption.clone())
.await?;
svc.record_sent_typed(
contact_id, contact_id,
wire,
"content_ref", "content_ref",
&display, &display,
Some(typed_json), Some(typed_json),
seq, seq,
Some(radio_transport_label(device_type).to_string()),
true, // Reticulum/LXMF is unconditionally E2E on every send
) )
.await?; .await
} else {
let content = ContentInlinePayload {
mime: mime.clone(),
filename: filename.clone(),
caption: caption.clone(),
bytes,
};
let payload = message_types::encode_payload(&content)?;
let envelope = TypedEnvelope::new(MeshMessageType::ContentInline, payload).with_seq(seq);
let wire = envelope.to_wire()?;
if use_resource_transfer {
svc.send_content_resource(
contact_id,
wire,
"content_ref",
&display,
Some(typed_json),
seq,
)
.await?
} else {
svc.send_typed_wire(
contact_id,
wire,
"content_ref",
&display,
Some(typed_json),
seq,
)
.await?
}
};
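The routing decision in this hunk (reject, native LXMF image, RNS Resource transfer, or the ordinary typed wire) can be summarised as a pure function. This is a sketch under assumptions: the `SendPath` enum and `choose_send_path` are hypothetical names, and the real handler performs the sends rather than returning a tag. The constants and the order of the checks follow the diff.

```rust
const INLINE_HARD_MAX: usize = 2300;
const RETICULUM_RESOURCE_MAX: usize = 2 * 1024 * 1024;

/// Hypothetical tag for the branch the handler would take.
#[derive(Debug, PartialEq)]
enum SendPath {
    NativeImage,      // LXMF FIELD_IMAGE for stock (non-archy) peers
    ResourceTransfer, // native RNS Resource transfer over LoRa
    TypedWire,        // ordinary typed-envelope send
}

fn choose_send_path(
    len: usize,
    is_reticulum: bool,
    peer_is_archy: bool,
    mime: &str,
) -> Result<SendPath, String> {
    // Oversized payloads are only allowed on Reticulum, up to the resource cap.
    let use_resource =
        len > INLINE_HARD_MAX && is_reticulum && len <= RETICULUM_RESOURCE_MAX;
    if len > INLINE_HARD_MAX && !use_resource {
        return Err(format!(
            "Payload {} bytes exceeds inline max {}",
            len, INLINE_HARD_MAX
        ));
    }
    // Stock peers cannot decode the typed envelope, so images go natively.
    if !peer_is_archy && is_reticulum && mime.starts_with("image/") {
        return Ok(SendPath::NativeImage);
    }
    if use_resource {
        Ok(SendPath::ResourceTransfer)
    } else {
        Ok(SendPath::TypedWire)
    }
}
```

Note that the native-image check runs after the size check, so a stock peer on Reticulum still cannot receive more than `RETICULUM_RESOURCE_MAX` bytes.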
info!(
contact_id,
size = meta.size,
cid = %meta.cid,
via_resource = use_resource_transfer,
"Sent content_inline over mesh"
);
Ok(serde_json::json!({
@ -492,8 +538,19 @@ impl RpcHandler {
// Knobs — keep in sync with the frontend modal copy.
const MESH_AUTO_MAX: u64 = 1024;
const MESH_HARD_MAX: u64 = 2300;
// Reticulum-only: above the small inline-chunk cap, a real RNS Resource
// transfer can still carry the payload directly over LoRa (native
// chunked transfer with retries) instead of falling back to Tor. Capped
// well under TOR_LARGE_WARN to keep worst-case LoRa transfer time
// bounded — comfortably covers the HIGH image preset (512KB target).
const RETICULUM_RESOURCE_MAX: u64 = 2 * 1024 * 1024;
const TOR_LARGE_WARN: u64 = 5 * 1024 * 1024;
const LORA_BYTES_PER_SEC: u64 = 50; // Meshcore/Meshtastic effective LoRa throughput after retries/FEC is much
// lower than the raw radio bitrate. Reticulum's RNodeInterface reports its
// real bitrate (e.g. ~3125 bps ≈ 390 B/s observed live), so estimates for it
// would be wildly pessimistic at the generic 50 B/s figure.
const LORA_BYTES_PER_SEC_DEFAULT: u64 = 50;
const LORA_BYTES_PER_SEC_RETICULUM: u64 = 390;
// Resolve peer Tor reachability via federation node list.
let service = self.mesh_service.read().await;
@ -501,6 +558,12 @@ impl RpcHandler {
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running"))?;
let state = svc.shared_state();
let device_type = state.status.read().await.device_type;
let lora_bytes_per_sec = if device_type == crate::mesh::types::DeviceType::Reticulum {
LORA_BYTES_PER_SEC_RETICULUM
} else {
LORA_BYTES_PER_SEC_DEFAULT
};
let (peer_pubkey_hex, peer_did) = {
let peers = state.peers.read().await;
match peers.get(&contact_id) {
@ -520,8 +583,10 @@ impl RpcHandler {
.map(|d| nodes.iter().any(|n| &n.did == d))
.unwrap_or(false);
let est_seconds = (size.saturating_add(LORA_BYTES_PER_SEC - 1) / LORA_BYTES_PER_SEC).max(1); let est_seconds =
(size.saturating_add(lora_bytes_per_sec - 1) / lora_bytes_per_sec).max(1);
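The transfer-time estimate is a ceiling division with a one-second floor. A minimal sketch, assuming the two per-device rates from this hunk; `estimate_seconds` is a hypothetical helper name, not a function in the codebase.

```rust
const LORA_BYTES_PER_SEC_DEFAULT: u64 = 50;
const LORA_BYTES_PER_SEC_RETICULUM: u64 = 390;

/// Hypothetical helper mirroring the `est_seconds` expression.
/// ceil(size / rate) is computed as (size + rate - 1) / rate;
/// `saturating_add` avoids overflow on absurd sizes, and `.max(1)`
/// keeps a zero-byte payload from reporting a 0 s estimate.
fn estimate_seconds(size: u64, is_reticulum: bool) -> u64 {
    let rate = if is_reticulum {
        LORA_BYTES_PER_SEC_RETICULUM
    } else {
        LORA_BYTES_PER_SEC_DEFAULT
    };
    (size.saturating_add(rate - 1) / rate).max(1)
}
```

For a 512 KB image (524288 bytes) this gives 1345 s at the Reticulum rate versus 10486 s at the generic rate, which is the gap the comment in the diff calls "wildly pessimistic".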
let is_reticulum = device_type == crate::mesh::types::DeviceType::Reticulum;
let (tier, reason) = if size <= MESH_AUTO_MAX {
("auto-mesh", "Small enough to send inline over mesh")
} else if size <= MESH_HARD_MAX {
@ -530,6 +595,8 @@ impl RpcHandler {
} else {
("auto-mesh", "No Tor path — sending inline over mesh")
}
} else if is_reticulum && size <= RETICULUM_RESOURCE_MAX {
("resource-mesh", "Sending directly over LoRa via a Reticulum resource transfer")
} else if size <= TOR_LARGE_WARN {
if has_tor {
("tor-only", "Too large for mesh — Tor only")
@ -674,18 +741,6 @@ impl RpcHandler {
.as_str()
.ok_or_else(|| anyhow::anyhow!("Missing cid"))?
.to_string();
let sender_onion = params["sender_onion"]
.as_str()
.ok_or_else(|| anyhow::anyhow!("Missing sender_onion"))?
.trim_end_matches('/')
.to_string();
let cap_token = params["cap_token"]
.as_str()
.ok_or_else(|| anyhow::anyhow!("Missing cap_token"))?
.to_string();
let cap_exp = params["cap_exp"]
.as_u64()
.ok_or_else(|| anyhow::anyhow!("Missing cap_exp"))?;
let mime_hint = params["mime"]
.as_str()
.unwrap_or("application/octet-stream")
@ -709,7 +764,12 @@ impl RpcHandler {
}; };
// Short-circuit if we already hold the blob — still issue a fresh // Short-circuit if we already hold the blob — still issue a fresh
// self-cap so the UI gets a displayable local URL. // self-cap so the UI gets a displayable local URL. Checked BEFORE the
// sender_onion/cap_token/cap_exp params are required below: an inline
// ContentInline attachment (mesh.send-content-inline) is written to
// our own BlobStore the moment it's received/sent (dispatch.rs), so
// its typed_payload never carries those fields at all — only a
// ContentRef fetched from a remote peer needs them.
if blob_store.has(&cid).await {
let local_exp = (chrono::Utc::now().timestamp() as u64) + DEFAULT_CAP_TTL_SECS;
let local_cap = blob_store.issue_capability(&cid, &self_pubkey_hex, local_exp);
@ -725,6 +785,19 @@ impl RpcHandler {
}));
}
let sender_onion = params["sender_onion"]
.as_str()
.ok_or_else(|| anyhow::anyhow!("Missing sender_onion"))?
.trim_end_matches('/')
.to_string();
let cap_token = params["cap_token"]
.as_str()
.ok_or_else(|| anyhow::anyhow!("Missing cap_token"))?
.to_string();
let cap_exp = params["cap_exp"]
.as_u64()
.ok_or_else(|| anyhow::anyhow!("Missing cap_exp"))?;
// Reach the sender: FIPS preferred when the sender is federated
// and has advertised a FIPS npub, Tor fallback otherwise.
// Cap/exp/peer in the query string match what the sender signed in
@ -860,6 +933,15 @@ impl RpcHandler {
let svc = service
.as_ref()
.ok_or_else(|| anyhow::anyhow!("Mesh service not running"))?;
// Unlike every other typed send here, read receipts fire automatically
// when a chat is viewed, with no explicit user action. A stock
// (non-archy) peer that cannot decode a TypedEnvelope (e.g. a phone
// running plain Sideband) would otherwise receive a raw control envelope
// the moment its message is viewed, which shows up as garbage text right
// after whatever it just sent. Skip the send for such peers.
if !svc.is_archy_peer(contact_id).await {
return Ok(serde_json::json!({ "sent": false, "reason": "not an archy peer" }));
}
let seq = svc.next_send_seq(contact_id).await;
let payload = message_types::encode_payload(&receipt)?;
let envelope = TypedEnvelope::new(MeshMessageType::ReadReceipt, payload).with_seq(seq);

@ -64,6 +64,32 @@ pub(super) fn sanitize_error_message(msg: &str) -> String {
"Container",
"Image",
"Bitcoin address",
"No router",
"No OpenWrt",
"No space left",
"Not enough flash",
"Not enough space",
"TollGate installation failed",
"No pre-built TollGate",
"opkg not found",
"apk update failed",
"No wireless interface",
"No wireless radio",
"WiFi radio enabled but",
"Missing required field",
// seed.reveal / auth flows — user-actionable, no internals to leak.
// Without these, the sanitizer collapsed every reveal failure into the
// generic "Operation failed. Check server logs for details." message,
// even though nothing had actually crashed.
"Incorrect",
"This node has no encrypted seed",
"A 2FA code is required",
"2FA is enabled but",
"Could not decrypt the saved seed",
"Could not unlock 2FA",
"No mnemonic available",
"No pending seed generation",
"Submitted words",
"Already set up",
];
for prefix in &user_facing_prefixes {
if msg.starts_with(prefix) {
@ -83,6 +109,43 @@ pub(super) fn sanitize_error_message(msg: &str) -> String {
"Operation failed. Check server logs for details.".to_string()
}
#[cfg(test)]
mod sanitize_tests {
use super::sanitize_error_message;
#[test]
fn seed_reveal_errors_pass_through() {
// Every user-actionable seed.reveal failure must reach the user —
// masking them as "Check server logs" sent a real user hunting a
// crash that never happened.
for msg in [
"Incorrect password",
"This node has no encrypted seed backup, so the recovery phrase cannot be shown. It was only displayed once during setup.",
"A 2FA code is required to reveal the recovery phrase",
"2FA is enabled but no TOTP data found",
"Could not decrypt the saved seed. If you set a separate backup passphrase during setup, enter that passphrase.",
"Could not unlock 2FA with this password",
"No mnemonic available. Generate or restore a seed first.",
"Submitted words do not match generated seed",
"Already set up. Use auth.changePassword to change.",
] {
assert_ne!(
sanitize_error_message(msg),
"Operation failed. Check server logs for details.",
"masked: {msg}"
);
}
}
#[test]
fn internal_errors_stay_generic() {
assert_eq!(
sanitize_error_message("thread panicked at src/foo.rs:42"),
"Operation failed. Check server logs for details."
);
}
}
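The prefix allowlist that these tests exercise can be reduced to a few lines. This is a simplified sketch, not the real `sanitize_error_message`: the visible hunks show only the prefix loop and the generic fallback, so anything the real function does in between is omitted, and the three prefixes below are just a sample of the list in the diff.

```rust
const GENERIC_ERROR: &str = "Operation failed. Check server logs for details.";

/// A small sample of the user-facing prefixes added in the diff.
const USER_FACING_PREFIXES: &[&str] = &["Incorrect", "No router", "Missing required field"];

/// Simplified sketch: pass a message through only when it starts with an
/// allowlisted prefix; everything else collapses to the generic text so
/// internal details (paths, panics) never reach the client.
fn sanitize(msg: &str) -> String {
    if USER_FACING_PREFIXES.iter().any(|p| msg.starts_with(*p)) {
        msg.to_string()
    } else {
        GENERIC_ERROR.to_string()
    }
}
```

Because matching is by prefix, an allowlisted message can carry a variable tail (for example a field name) without needing its own entry.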
/// Derive a CSRF token from the session token via HMAC.
/// Deterministic: same session token always produces the same CSRF token.
/// Survives backend restarts because it depends only on the session token

@ -23,6 +23,7 @@ mod names;
mod network;
mod node;
mod nostr;
mod openwrt;
mod package;
mod peers;
mod response;

@ -0,0 +1,353 @@
use super::RpcHandler;
use anyhow::Result;
use archipelago_openwrt::{
detect,
router::Router,
tollgate::{self, TollGateConfig},
wan,
wifi_scan,
};
use crate::network::router as net_router;
/// Default port for the local Cashu mint (nutshell / cashu-mint app).
const LOCAL_MINT_PORT: u16 = 3338;
impl RpcHandler {
/// Scan the local subnet for OpenWrt routers.
///
/// Params: `{ "subnet": "192.168.1.0", "prefix": 24,
/// "ssh_user": "root", "ssh_password": "" }`
pub(super) async fn handle_openwrt_scan(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let p = params.unwrap_or_default();
let subnet: [u8; 4] = parse_ipv4(
p.get("subnet").and_then(|v| v.as_str()).unwrap_or("192.168.1.0"),
)?;
let prefix = p.get("prefix").and_then(|v| v.as_u64()).unwrap_or(24) as u8;
let ssh_user = p
.get("ssh_user")
.and_then(|v| v.as_str())
.unwrap_or("root")
.to_string();
let ssh_password = p
.get("ssh_password")
.and_then(|v| v.as_str())
.unwrap_or("")
.to_string();
let routers = detect::scan_subnet(subnet, prefix, &ssh_user, &ssh_password).await;
let ips: Vec<String> = routers.iter().map(|ip| ip.to_string()).collect();
Ok(serde_json::json!({ "routers": ips }))
}
/// Read current settings from a saved or ad-hoc OpenWrt router via SSH/UCI.
///
/// Params (all optional): `{ "host": "...", "ssh_user": "root", "ssh_password": "" }`
/// If params are omitted the saved `router_config.json` credentials are used.
pub(super) async fn handle_openwrt_get_status(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let saved = net_router::load_router_config(&self.config.data_dir).await?;
let p = params.unwrap_or_default();
let host_from_params = p.get("host").and_then(|v| v.as_str()).is_some();
let host = p
.get("host")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| if saved.configured { Some(saved.address.clone()) } else { None })
.ok_or_else(|| anyhow::anyhow!("No router configured — provide host or call router.configure first"))?;
let ssh_user = p
.get("ssh_user")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| saved.username.clone())
.unwrap_or_else(|| "root".to_string());
let ssh_password = p
.get("ssh_password")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| saved.password.clone())
.unwrap_or_default();
let router = Router::connect_password(&host, 22, &ssh_user, &ssh_password)?;
router.verify_openwrt()?;
// Persist the connection so other views (e.g. the Home dashboard's
// Network tile) can poll `openwrt.get-status` with no params instead
// of every caller needing to carry host/credentials around. Only do
// this when the host actually came from params — otherwise every
// no-args poll would re-save the same thing it just read.
if host_from_params {
let _ = net_router::configure_router(
&self.config.data_dir,
net_router::RouterType::OpenWrt,
&host,
None,
Some(&ssh_user),
Some(&ssh_password),
).await;
}
// System info
let release = router.run_ok("cat /etc/openwrt_release").unwrap_or_default();
let hostname = router
.uci_get("system.@system[0].hostname")
.unwrap_or_else(|_| "unknown".into());
let uptime_secs: u64 = router
.run_ok("cat /proc/uptime")
.unwrap_or_default()
.split_whitespace()
.next()
.and_then(|s| s.split('.').next())
.and_then(|s| s.parse().ok())
.unwrap_or(0);
// TollGate — check via opkg (≤24.x) or binary presence (25.x apk-native).
// The service binary is /usr/bin/tollgate-wrt (per its init.d script),
// not /usr/bin/tollgate-module-basic-go — that's only the opkg/apk
// *package* name, never an on-disk filename.
let tollgate_installed = router
.run("/usr/bin/opkg list-installed 2>/dev/null | grep -q '^tollgate-module-basic-go ' || \
test -f /usr/bin/tollgate-wrt 2>/dev/null")
.map(|(_, code)| code == 0)
.unwrap_or(false);
let tollgate = if tollgate_installed {
serde_json::json!({
"installed": true,
"enabled": router.uci_get("tollgate.main.enabled").map(|v| v == "1").unwrap_or(false),
"metric": router.uci_get("tollgate.main.metric").unwrap_or_default(),
"step_size_ms": router.uci_get("tollgate.main.step_size").ok().and_then(|v| v.parse::<u64>().ok()).unwrap_or(0),
"price_per_step":router.uci_get("tollgate.main.price_per_step").ok().and_then(|v| v.parse::<u64>().ok()).unwrap_or(0),
"min_steps": router.uci_get("tollgate.main.min_steps").ok().and_then(|v| v.parse::<u32>().ok()).unwrap_or(1),
"currency": router.uci_get("tollgate.main.currency").unwrap_or_default(),
"mint_url": router.uci_get("tollgate.main.mint_url").unwrap_or_default(),
})
} else {
serde_json::json!({ "installed": false })
};
// WiFi interfaces
let wifi_raw = router.run_ok("uci show wireless").unwrap_or_default();
let wifi_interfaces = parse_wifi_interfaces(&wifi_raw);
let wan_status = wan::get_wan_status(&router);
Ok(serde_json::json!({
"host": host,
"hostname": hostname,
"uptime_secs": uptime_secs,
"release": parse_release(&release),
"tollgate": tollgate,
"wifi_interfaces": wifi_interfaces,
"wan": wan_status,
}))
}
/// Provision TollGate on an OpenWrt router and create the "archipelago" SSID.
///
/// Params: `{ "host": "192.168.1.1", "ssh_user": "root", "ssh_password": "",
/// "price_sats": 10, "step_size_ms": 60000, "min_steps": 1,
/// "mint_url": "<optional override>" }`
///
/// `mint_url` defaults to `http://<this node's IP>:3338` — the local Cashu
/// mint that must be running as an Archy app before calling this endpoint.
pub(super) async fn handle_openwrt_provision_tollgate(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let saved = net_router::load_router_config(&self.config.data_dir).await?;
let p = params.unwrap_or_default();
let host = p
.get("host")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| if saved.configured { Some(saved.address.clone()) } else { None })
.ok_or_else(|| anyhow::anyhow!("No router configured — provide host or call router.configure first"))?;
let ssh_user = p
.get("ssh_user")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| saved.username.clone())
.unwrap_or_else(|| "root".to_string());
let ssh_password = p
.get("ssh_password")
.and_then(|v| v.as_str())
.map(|s| s.to_string())
.or_else(|| saved.password.clone())
.unwrap_or_default();
let default_mint_url = format!("http://{}:{}", self.config.host_ip, LOCAL_MINT_PORT);
let mint_url = p
.get("mint_url")
.and_then(|v| v.as_str())
.unwrap_or(&default_mint_url)
.to_string();
let config = TollGateConfig {
ssid: "archipelago".to_string(),
mint_url,
price_sats: p.get("price_sats").and_then(|v| v.as_u64()).unwrap_or(10),
step_size_ms: p
.get("step_size_ms")
.and_then(|v| v.as_u64())
.unwrap_or(60_000),
min_steps: p
.get("min_steps")
.and_then(|v| v.as_u64())
.unwrap_or(1) as u32,
enabled: p.get("enabled").and_then(|v| v.as_bool()).unwrap_or(true),
};
let router = Router::connect_password(&host, 22, &ssh_user, &ssh_password)?;
router.verify_openwrt()?;
tollgate::provision(&router, &config).await?;
Ok(serde_json::json!({
"ok": true,
"host": host,
"ssid": config.ssid,
"mint_url": config.mint_url,
}))
}
/// Scan for visible WiFi networks from the router's radio.
///
/// Params: same host/credentials as other openwrt methods.
pub(super) async fn handle_openwrt_scan_wifi(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let saved = net_router::load_router_config(&self.config.data_dir).await?;
let p = params.unwrap_or_default();
let host = p.get("host").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| if saved.configured { Some(saved.address.clone()) } else { None })
.ok_or_else(|| anyhow::anyhow!("No router configured — provide host or call router.configure first"))?;
let ssh_user = p.get("ssh_user").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| saved.username.clone()).unwrap_or_else(|| "root".to_string());
let ssh_password = p.get("ssh_password").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| saved.password.clone()).unwrap_or_default();
let router = Router::connect_password(&host, 22, &ssh_user, &ssh_password)?;
router.verify_openwrt()?;
let networks = wifi_scan::scan_networks(&router)?;
let result: Vec<serde_json::Value> = networks
.iter()
.map(|n| serde_json::json!({
"ssid": n.ssid,
"bssid": n.bssid,
"signal": n.signal,
"channel": n.channel,
"encryption": n.encryption,
}))
.collect();
Ok(serde_json::json!({ "networks": result }))
}
/// Configure WAN/WISP — connect the router to an upstream WiFi network.
///
/// Params: host/credentials + `{ "ssid": "...", "password": "...", "encryption": "psk2" }`
pub(super) async fn handle_openwrt_configure_wan(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let saved = net_router::load_router_config(&self.config.data_dir).await?;
let p = params.unwrap_or_default();
let host = p.get("host").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| if saved.configured { Some(saved.address.clone()) } else { None })
.ok_or_else(|| anyhow::anyhow!("No router configured — provide host or call router.configure first"))?;
let ssh_user = p.get("ssh_user").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| saved.username.clone()).unwrap_or_else(|| "root".to_string());
let ssh_password = p.get("ssh_password").and_then(|v| v.as_str()).map(|s| s.to_string())
.or_else(|| saved.password.clone()).unwrap_or_default();
let ssid = p.get("ssid").and_then(|v| v.as_str())
.ok_or_else(|| anyhow::anyhow!("Missing required field: ssid"))?.to_string();
let password = p.get("password").and_then(|v| v.as_str()).unwrap_or("").to_string();
let encryption = p.get("encryption").and_then(|v| v.as_str()).unwrap_or("psk2").to_string();
let dhcp_start = p.get("dhcp_start").and_then(|v| v.as_u64()).unwrap_or(100) as u32;
let dhcp_limit = p.get("dhcp_limit").and_then(|v| v.as_u64()).unwrap_or(150) as u32;
let masq = p.get("masq").and_then(|v| v.as_bool()).unwrap_or(true);
let router = Router::connect_password(&host, 22, &ssh_user, &ssh_password)?;
router.verify_openwrt()?;
let config = wan::WispConfig { ssid: ssid.clone(), password, encryption, dhcp_start, dhcp_limit, masq };
wan::configure_wisp(&router, &config)?;
Ok(serde_json::json!({ "ok": true, "host": host, "ssid": ssid }))
}
}
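Every handler above resolves host and credentials with the same precedence: explicit params first, then the saved router config, then a default (or an error for the host). A minimal sketch of that chain, using hypothetical helper names and plain `Option<&str>` inputs in place of the JSON params and the saved config struct:

```rust
/// Hypothetical helper: host comes from params, else from the saved config
/// (only if it is marked configured), else the call fails.
fn resolve_host(
    param: Option<&str>,
    saved_configured: bool,
    saved_address: &str,
) -> Result<String, String> {
    param
        .map(|s| s.to_string())
        .or_else(|| {
            if saved_configured {
                Some(saved_address.to_string())
            } else {
                None
            }
        })
        .ok_or_else(|| "No router configured".to_string())
}

/// Hypothetical helper: a credential comes from params, else the saved
/// value, else a default ("root" for the user, "" for the password).
fn resolve_credential(param: Option<&str>, saved: Option<&str>, default: &str) -> String {
    param.or(saved).unwrap_or(default).to_string()
}
```

Factoring the chain out like this would also remove the five near-identical copies in the handlers, though that refactor is a suggestion rather than something the diff does.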
/// Parse /etc/openwrt_release key=value pairs into a JSON object.
fn parse_release(raw: &str) -> serde_json::Value {
let mut m = serde_json::Map::new();
for line in raw.lines() {
if let Some((k, v)) = line.split_once('=') {
m.insert(
k.to_lowercase(),
serde_json::Value::String(v.trim_matches('"').to_string()),
);
}
}
serde_json::Value::Object(m)
}
/// Extract AP wifi-iface sections from `uci show wireless` output.
fn parse_wifi_interfaces(raw: &str) -> Vec<serde_json::Value> {
use std::collections::HashMap;
let mut sections: HashMap<String, HashMap<String, String>> = HashMap::new();
for line in raw.lines() {
if let Some((lhs, rhs)) = line.trim().split_once('=') {
let parts: Vec<&str> = lhs.splitn(3, '.').collect();
if parts.len() == 3 && parts[0] == "wireless" {
sections
.entry(parts[1].to_string())
.or_default()
.insert(parts[2].to_string(), rhs.trim_matches('\'').to_string());
}
}
}
let mut ifaces: Vec<serde_json::Value> = sections
.into_iter()
.filter(|(_, f)| f.get("mode").map(|m| m == "ap").unwrap_or(false))
.map(|(name, f)| serde_json::json!({
"section": name,
"ssid": f.get("ssid").cloned().unwrap_or_default(),
"device": f.get("device").cloned().unwrap_or_default(),
"encryption": f.get("encryption").cloned().unwrap_or_else(|| "none".into()),
"network": f.get("network").cloned().unwrap_or_default(),
"disabled": f.get("disabled").map(|v| v == "1").unwrap_or(false),
}))
.collect();
ifaces.sort_by_key(|v| v["section"].as_str().unwrap_or("").to_string());
ifaces
}
fn parse_ipv4(s: &str) -> Result<[u8; 4]> {
let parts: Vec<&str> = s.split('.').collect();
if parts.len() != 4 {
anyhow::bail!("Invalid IPv4: {}", s);
}
Ok([
parts[0].parse()?,
parts[1].parse()?,
parts[2].parse()?,
parts[3].parse()?,
])
}
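
The `uci show wireless` parsing above can be sketched without `serde_json`. This is a minimal, std-only illustration, not the code in this change: `group_uci_options` and `ap_sections` are hypothetical names, and the sample input used to exercise it is an assumed shape of `uci show` output.

```rust
use std::collections::BTreeMap;

/// Group `wireless.<section>.<option>='value'` lines by section name.
/// Lines with fewer than three dotted parts (section type declarations)
/// and lines from other UCI packages are skipped.
fn group_uci_options(raw: &str) -> BTreeMap<String, BTreeMap<String, String>> {
    let mut sections: BTreeMap<String, BTreeMap<String, String>> = BTreeMap::new();
    for line in raw.lines() {
        let Some((lhs, rhs)) = line.trim().split_once('=') else {
            continue;
        };
        let parts: Vec<&str> = lhs.splitn(3, '.').collect();
        if parts.len() == 3 && parts[0] == "wireless" {
            sections
                .entry(parts[1].to_string())
                .or_default()
                .insert(parts[2].to_string(), rhs.trim_matches('\'').to_string());
        }
    }
    sections
}

/// Names of the sections whose `mode` option is `ap`, in sorted order
/// (BTreeMap iteration is ordered by key).
fn ap_sections(raw: &str) -> Vec<String> {
    group_uci_options(raw)
        .into_iter()
        .filter(|(_, opts)| opts.get("mode").map(|m| m == "ap").unwrap_or(false))
        .map(|(name, _)| name)
        .collect()
}
```

Using a `BTreeMap` here makes the explicit sort step unnecessary; the diff's `HashMap` plus `sort_by_key` gives the same ordering by section name.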

@ -114,6 +114,31 @@ impl RpcHandler {
Err(e) => {
error!("package.install {} failed: {:#}", package_id_spawn, e);
install_log(&format!("INSTALL FAIL: {}{:#}", package_id_spawn, e)).await;
// Dependency-gate rejections happen BEFORE any resource
// (container/image/data dir) exists for this package, so
// keeping the optimistic entry would leave a phantom
// "Stopped" tile whose Start fails with `no such object`
// (the log-confirmed LND fresh-install failure). Remove
// the entry so the card reverts to installable, and
// surface the reason as a notification instead.
if let Some(gate) = e.downcast_ref::<super::dependencies::DependencyGateError>()
{
let (mut data, _) = handler.state_manager.get_snapshot().await;
data.package_data.remove(&package_id_spawn);
data.notifications.push(crate::data_model::Notification {
id: format!("install-deps-{package_id_spawn}"),
level: crate::data_model::NotificationLevel::Error,
title: format!("Could not install {package_id_spawn}"),
message: gate.to_string(),
timestamp: chrono::Utc::now().to_rfc3339(),
app_id: Some(package_id_spawn.clone()),
});
while data.notifications.len() > 20 {
data.notifications.remove(0);
}
handler.state_manager.update_data(data).await;
return;
}
// Don't remove the entry — that's what made the card
// vanish from My Apps mid-install / between retry-loop
// attempts (e.g. tailscale's entrypoint failure). Leave
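
The marker-error pattern in the hunk above (downcast to decide whether the optimistic entry is removed, then cap the notification list) can be sketched with the standard library alone. This is an illustration under assumptions: `GateError`, `FailureAction`, `classify_failure`, and `push_capped` are hypothetical names, and `std::error::Error::downcast_ref` stands in for the `anyhow` downcast the diff uses.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical marker error: the install was rejected before any
/// resource was created for the package.
#[derive(Debug)]
struct GateError(String);

impl fmt::Display for GateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

impl Error for GateError {}

/// What the install wrapper does with the optimistic entry after a failure.
#[derive(Debug, PartialEq)]
enum FailureAction {
    /// Nothing exists on disk yet: drop the entry and notify the user.
    RemoveEntry { reason: String },
    /// Resources may exist: keep the card so a retry can reuse them.
    KeepEntry,
}

/// Classify a failure by downcasting to the marker type.
fn classify_failure(err: &(dyn Error + 'static)) -> FailureAction {
    match err.downcast_ref::<GateError>() {
        Some(gate) => FailureAction::RemoveEntry { reason: gate.to_string() },
        None => FailureAction::KeepEntry,
    }
}

/// Append `item` and drop the oldest entries so at most `cap` remain.
fn push_capped(list: &mut Vec<String>, item: String, cap: usize) {
    list.push(item);
    if list.len() > cap {
        let excess = list.len() - cap;
        list.drain(..excess);
    }
}
```

`push_capped` trims with a single `drain` rather than the repeated `remove(0)` loop in the diff; for a cap of 20 the difference is negligible, so this is a stylistic alternative only.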

@ -707,12 +707,17 @@ pub(super) async fn get_app_config(
// effectively pinned at 2 by --cpus=2 (now removed).
// -maxconnections=125 — default but explicit, so ops can
// tune downward on bandwidth-constrained nodes.
// Log volume: -printtoconsole=0 — bitcoind already writes
// debug.log in the datadir (self-shrunk on restart); echoing it
// to stdout too pushed every IBD "UpdateTip" line through
// conmon into journald (>1 GB/day on a fresh node). Deep
// debugging uses /var/lib/archipelago/bitcoin/debug.log.
Some(vec![
"-server=1".to_string(),
"-rpcbind=0.0.0.0".to_string(),
"-rpcallowip=0.0.0.0/0".to_string(),
"-rpcport=8332".to_string(),
"-printtoconsole=1".to_string(), "-printtoconsole=0".to_string(),
"-datadir=/home/bitcoin/.bitcoin".to_string(), "-datadir=/home/bitcoin/.bitcoin".to_string(),
format!("-dbcache={}", bitcoin_dbcache_mb()), format!("-dbcache={}", bitcoin_dbcache_mb()),
"-par=0".to_string(), "-par=0".to_string(),

@ -1,6 +1,8 @@
use super::config::get_containers_for_app;
use super::runtime::manifest_apps_dirs;
use crate::data_model::{PackageDataEntry, PackageState};
use anyhow::{Context, Result};
use archipelago_container::{AppManifest, Dependency};
use std::collections::HashMap;
use tracing::info;
@ -11,7 +13,38 @@ const BITCOIN_NAMES: &[&str] = &["bitcoin-knots", "bitcoin-core", "bitcoin"];
const ELECTRUM_NAMES: &[&str] = &["electrumx", "mempool-electrs", "electrs"];
const ARCHIVAL_BITCOIN_DISK_GB: u64 = 1000;
/// The manifest string dependency that declares "needs an archival
/// (unpruned + txindex) Bitcoin node" — see `manifest_declares_archival_bitcoin`.
const ARCHIVAL_BITCOIN_DEPENDENCY: &str = "bitcoin:archival";
/// Whether `package_id`'s own on-disk manifest declares
/// `dependencies: [bitcoin:archival]`. Manifest-driven alternative to the
/// hardcoded id list below — a new app just declares the dependency instead
/// of needing a code change here.
fn manifest_declares_archival_bitcoin(package_id: &str) -> bool {
for apps_dir in manifest_apps_dirs() {
let path = apps_dir.join(package_id).join("manifest.yml");
let Ok(contents) = std::fs::read_to_string(&path) else {
continue;
};
let Ok(manifest) = AppManifest::parse(&contents) else {
continue;
};
return dependency_list_declares_archival_bitcoin(&manifest.app.dependencies);
}
false
}
fn dependency_list_declares_archival_bitcoin(deps: &[Dependency]) -> bool {
deps.iter()
.any(|dep| matches!(dep, Dependency::Simple(s) if s == ARCHIVAL_BITCOIN_DEPENDENCY))
}
fn requires_unpruned_bitcoin(package_id: &str) -> bool {
if manifest_declares_archival_bitcoin(package_id) {
return true;
}
// Fallback for apps not yet migrated to the manifest declaration above.
matches!(
package_id,
"electrumx" | "mempool-electrs" | "electrs" | "mempool" | "mempool-web"
@ -25,6 +58,7 @@ fn archival_bitcoin_required_message(package_id: &str) -> String {
}
/// Snapshot of which dependency services are currently running.
#[derive(Debug)]
pub(super) struct RunningDeps {
pub has_bitcoin: bool,
pub has_electrumx: bool,
@ -194,6 +228,190 @@ pub(super) fn check_install_deps(package_id: &str, deps: &RunningDeps) -> Result
}
}
// ---------------------------------------------------------------------------
// Bounded dependency wait (install race fix)
// ---------------------------------------------------------------------------
//
// Confirmed race on fresh nodes: the user clicks "Install LND" while
// bitcoin-knots is itself still installing/starting. `check_install_deps`
// rejected instantly ("LND requires a running Bitcoin node…") even though
// Bitcoin came up 55s later. The fix: when the dependency is INSTALLED
// (container exists in `podman ps -a`, or the package state knows about it)
// but not Running yet, poll for up to DEP_WAIT_MAX_ATTEMPTS × DEP_WAIT_INTERVAL
// (~3 minutes) before failing, surfacing "Waiting for X to start…" via the
// install-progress message. If the dependency is not installed at all, fail
// fast with the canonical `check_install_deps` message — waiting can't help.
/// Poll interval while waiting for an installed dependency to start.
pub(super) const DEP_WAIT_INTERVAL: std::time::Duration = std::time::Duration::from_secs(5);
/// 36 × 5s = 3 minutes of bounded waiting.
pub(super) const DEP_WAIT_MAX_ATTEMPTS: u32 = 36;
/// Marker error: the install was rejected by the dependency gate BEFORE any
/// resource (container, image, data dir) was created for the package. The
/// async install wrapper (`async_lifecycle.rs`) downcasts to this to remove
/// the optimistic `Installing` state entry instead of leaving a phantom
/// "Stopped" tile whose Start fails with `no such object`.
#[derive(Debug)]
pub(in crate::api::rpc) struct DependencyGateError(pub String);
impl std::fmt::Display for DependencyGateError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_str(&self.0)
}
}
impl std::error::Error for DependencyGateError {}
/// One unsatisfied install dependency: a user-facing label plus the container
/// name variants that would satisfy it.
struct MissingDep {
label: &'static str,
containers: &'static [&'static str],
}
/// Which dependencies `check_install_deps` would reject `package_id` over.
/// Must stay in lockstep with the match arms in `check_install_deps` (the
/// wait loop re-runs `check_install_deps` for the canonical error message).
fn missing_install_deps(package_id: &str, deps: &RunningDeps) -> Vec<MissingDep> {
const BITCOIN: MissingDep = MissingDep {
label: "Bitcoin",
containers: BITCOIN_NAMES,
};
const ELECTRUM: MissingDep = MissingDep {
label: "ElectrumX",
containers: ELECTRUM_NAMES,
};
let mut missing = Vec::new();
match package_id {
"electrumx" | "mempool-electrs" | "electrs" | "lnd" | "btcpay-server" | "btcpayserver" => {
if !deps.has_bitcoin {
missing.push(BITCOIN);
}
}
"mempool" | "mempool-web" => {
if !deps.has_bitcoin {
missing.push(BITCOIN);
}
if !deps.has_electrumx {
missing.push(ELECTRUM);
}
}
// fedimint deliberately absent: check_install_deps allows it without
// a local Bitcoin node (remote RPC configured in guardian setup).
_ => {}
}
missing
}
fn join_dep_labels(missing: &[MissingDep]) -> String {
missing
.iter()
.map(|d| d.label)
.collect::<Vec<_>>()
.join(" and ")
}
/// One snapshot of the dependency world, fed to [`wait_for_install_deps`].
pub(super) struct DepProbe {
/// Which dependency services are currently Running.
pub running: RunningDeps,
/// Container/package names that EXIST in any state — installed, but
/// possibly not running yet (`podman ps -a` output or package-state entries).
pub existing: Vec<String>,
}
/// All container names known to podman in any state (`podman ps -a`).
/// Conservative on probe failure: returns an empty list, which makes the
/// wait loop fall back to the pre-fix fail-fast behavior.
pub(super) async fn detect_existing_containers() -> Vec<String> {
let out = tokio::time::timeout(
std::time::Duration::from_secs(30),
tokio::process::Command::new("podman")
.args(["ps", "-a", "--format", "{{.Names}}"])
.output(),
)
.await;
match out {
Ok(Ok(o)) if o.status.success() => String::from_utf8_lossy(&o.stdout)
.lines()
.map(|l| l.trim().to_string())
.filter(|l| !l.is_empty())
.collect(),
_ => Vec::new(),
}
}
/// Bounded dependency gate. Returns the (satisfied) `RunningDeps` snapshot,
/// or a [`DependencyGateError`]:
/// - immediately, when a missing dependency is not installed at all
/// (canonical `check_install_deps` message), or
/// - after `max_attempts × interval`, when an installed dependency never
/// reached Running.
///
/// `probe` and `on_waiting` are injected so unit tests can drive the loop
/// without a podman runtime; production wires them to
/// `RpcHandler::dep_probe_for_install` / `set_install_message`.
pub(super) async fn wait_for_install_deps<P, PF, L, LF>(
package_id: &str,
mut probe: P,
mut on_waiting: L,
max_attempts: u32,
interval: std::time::Duration,
) -> Result<RunningDeps>
where
P: FnMut() -> PF,
PF: std::future::Future<Output = Result<DepProbe>>,
L: FnMut(String) -> LF,
LF: std::future::Future<Output = ()>,
{
let mut waited_attempts = 0u32;
loop {
let DepProbe { running, existing } = probe().await?;
let missing = missing_install_deps(package_id, &running);
if missing.is_empty() {
// Keep behavior in lockstep with the canonical gate (covers any
// future arm added there but not mirrored in missing_install_deps).
check_install_deps(package_id, &running)?;
return Ok(running);
}
// Fail fast if any missing dependency has no installed container
// under any name variant — waiting cannot satisfy it.
let some_dep_not_installed = missing
.iter()
.any(|dep| !dep.containers.iter().any(|c| existing.iter().any(|e| e == c)));
if some_dep_not_installed {
let msg = match check_install_deps(package_id, &running) {
Err(e) => e.to_string(),
Ok(()) => format!("{package_id} dependencies are not running"),
};
return Err(anyhow::Error::new(DependencyGateError(msg)));
}
if waited_attempts >= max_attempts {
let labels = join_dep_labels(&missing);
return Err(anyhow::Error::new(DependencyGateError(format!(
"{labels} is installed but did not reach the running state within \
{} seconds. Start {labels}, then install {package_id} again.",
u64::from(max_attempts) * interval.as_secs()
))));
}
waited_attempts += 1;
let labels = join_dep_labels(&missing);
if waited_attempts == 1 {
info!(
"Install {package_id}: dependency {labels} installed but not running yet — \
waiting up to {}s for it to start",
u64::from(max_attempts) * interval.as_secs()
);
}
on_waiting(format!("Waiting for {labels} to start…")).await;
tokio::time::sleep(interval).await;
}
}
/// ElectrumX and Mempool's Electrum backend need historical blocks from an
/// unpruned node while building their indexes. A pruned Bitcoin node can be
/// running and RPC-reachable but still leave them stuck with closed ports.
@ -376,16 +594,31 @@ pub(super) fn startup_order(package_id: &str) -> &'static [&'static str] {
/// order for the given app. Unknown containers sort to the end.
pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec<String>> {
let containers = get_containers_for_app(package_id).await?;
Ok(order_present_containers(package_id, containers))
}
/// Order the *actually-present* containers of an app by its dependency-aware
/// startup order. Containers whose name is unknown to the order list sort to
/// the end, preserving their relative input order.
///
/// This deliberately does NOT inject order entries that aren't live
/// containers. `startup_order` is a union of container-name variants across
/// install generations (e.g. `mysql-mempool` vs `archy-mempool-db`), so any
/// single install only ever has a subset of those names. Injecting a phantom
/// name makes the start path fail on a "no such object" inspect — and because
/// `do_orchestrator_package_start` propagates the unknown-app-id fallback
/// error via `?`, every later member (the api + frontend) is then skipped,
/// leaving the stack down until the health monitor recovers it minutes later.
/// That was the source of mempool gate flakes #73 (frontend) / #74 (api).
fn order_present_containers(package_id: &str, containers: Vec<String>) -> Vec<String> {
if containers.is_empty() {
// Nothing is live under any known name. Fall back to the package id so
// a single-container app whose container matches its id still gets one
// start attempt; multi-container stacks with no live members are
// surfaced as "no containers" by the caller's emptiness check.
return vec![package_id.to_string()];
}
let order = startup_order(package_id);
if order.is_empty() && containers.is_empty() {
return Ok(vec![package_id.to_string()]);
}
let mut sorted = containers;
for required in order {
if !sorted.iter().any(|name| name == required) {
sorted.push((*required).to_string());
}
}
// If no special order is defined, fall back to mempool order for legacy
// multi-container names that may still be returned by config lookups.
let effective_order: &[&str] = if order.is_empty() {
@ -393,8 +626,14 @@ pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec
} else {
order
};
sorted.sort_by_key(|c| effective_order.iter().position(|o| *o == c).unwrap_or(99)); let mut sorted = containers;
Ok(sorted) sorted.sort_by_key(|c| {
effective_order
.iter()
.position(|o| *o == c)
.unwrap_or(usize::MAX)
});
sorted
}
/// Configure Fedimint Gateway to use LND instead of LDK.
@ -452,7 +691,52 @@ pub(super) fn configure_fedimint_lnd(
#[cfg(test)]
mod tests {
use super::{requires_unpruned_bitcoin, startup_order}; use super::{
dependency_list_declares_archival_bitcoin, manifest_declares_archival_bitcoin,
order_present_containers, requires_unpruned_bitcoin, startup_order,
};
use archipelago_container::Dependency;
#[test]
fn order_present_containers_never_injects_phantom_stack_members() {
// The live mempool stack on a node: db + api + frontend. These are the
// only real container names; the startup_order list also contains
// variant/legacy names (mysql-mempool, archy-mempool-api, ...) that are
// NOT live here and must never appear in the result — a phantom name in
// the start list aborts the orchestrator start mid-sequence (gate
// #73/#74).
let present = vec![
"mempool".to_string(),
"mempool-api".to_string(),
"archy-mempool-db".to_string(),
];
let ordered = order_present_containers("mempool", present);
// Dependency order: db -> api -> frontend.
assert_eq!(ordered, vec!["archy-mempool-db", "mempool-api", "mempool"]);
// No phantom variants leaked in.
for phantom in ["mysql-mempool", "archy-mempool-api", "archy-mempool-web"] {
assert!(
!ordered.iter().any(|c| c == phantom),
"phantom {phantom} must not be injected"
);
}
}
#[test]
fn order_present_containers_orders_known_before_unknown() {
let present = vec!["mempool".to_string(), "some-sidecar".to_string()];
let ordered = order_present_containers("mempool", present);
// The known frontend sorts ahead of an unknown sidecar.
assert_eq!(ordered, vec!["mempool", "some-sidecar"]);
}
#[test]
fn order_present_containers_empty_falls_back_to_package_id() {
assert_eq!(
order_present_containers("mempool", vec![]),
vec!["mempool".to_string()]
);
}
#[test]
fn btcpay_start_order_includes_required_stack_members() {
@ -485,4 +769,272 @@ mod tests {
assert!(!requires_unpruned_bitcoin(package_id), "{package_id}");
}
}
#[test]
fn dependency_matcher_finds_the_archival_marker_among_other_deps() {
let deps = vec![
Dependency::App {
app_id: "bitcoin-knots".to_string(),
version: Some(">=26.0".to_string()),
},
Dependency::Storage {
storage: "50Gi".to_string(),
},
Dependency::Simple("bitcoin:archival".to_string()),
];
assert!(dependency_list_declares_archival_bitcoin(&deps));
}
#[test]
fn dependency_matcher_false_when_marker_absent() {
let deps = vec![Dependency::App {
app_id: "bitcoin-knots".to_string(),
version: Some(">=26.0".to_string()),
}];
assert!(!dependency_list_declares_archival_bitcoin(&deps));
assert!(!dependency_list_declares_archival_bitcoin(&[]));
}
#[test]
fn manifest_declared_archival_bitcoin_covers_a_new_app_without_a_code_change() {
// electrumx and mempool declare `dependencies: [..., bitcoin:archival]`
// on disk (apps/electrumx/manifest.yml, apps/mempool/manifest.yml) —
// this is the manifest-driven path working end-to-end, not the
// hardcoded id list. A future app only needs this manifest line, no
// edit to `requires_unpruned_bitcoin`.
assert!(manifest_declares_archival_bitcoin("electrumx"));
assert!(manifest_declares_archival_bitcoin("mempool"));
// An app whose manifest exists but never declares the marker.
assert!(!manifest_declares_archival_bitcoin("bitcoin-knots"));
// An id with no manifest on disk at all.
assert!(!manifest_declares_archival_bitcoin("does-not-exist"));
}
mod dep_wait {
use super::super::{wait_for_install_deps, DepProbe, DependencyGateError, RunningDeps};
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Duration;
fn deps(has_bitcoin: bool, has_electrumx: bool) -> RunningDeps {
RunningDeps {
has_bitcoin,
has_electrumx,
has_lnd: false,
}
}
fn probe(has_bitcoin: bool, has_electrumx: bool, existing: &[&str]) -> DepProbe {
DepProbe {
running: deps(has_bitcoin, has_electrumx),
existing: existing.iter().map(|s| s.to_string()).collect(),
}
}
/// Collects "Waiting for X to start…" labels emitted during the wait.
fn label_sink() -> (Arc<Mutex<Vec<String>>>, impl FnMut(String) -> std::future::Ready<()>)
{
let labels = Arc::new(Mutex::new(Vec::new()));
let sink = {
let labels = Arc::clone(&labels);
move |msg: String| {
labels.lock().unwrap().push(msg);
std::future::ready(())
}
};
(labels, sink)
}
#[tokio::test]
async fn passes_immediately_when_dependency_is_running() {
let (labels, sink) = label_sink();
let result = wait_for_install_deps(
"lnd",
|| async { Ok(probe(true, false, &["bitcoin-knots"])) },
sink,
3,
Duration::ZERO,
)
.await;
assert!(result.is_ok());
assert!(labels.lock().unwrap().is_empty(), "no waiting expected");
}
#[tokio::test]
async fn fails_fast_when_dependency_not_installed_at_all() {
let calls = AtomicU32::new(0);
let (labels, sink) = label_sink();
let err = wait_for_install_deps(
"lnd",
|| {
calls.fetch_add(1, Ordering::SeqCst);
async { Ok(probe(false, false, &["uptime-kuma"])) }
},
sink,
36,
Duration::ZERO,
)
.await
.unwrap_err();
// Single probe — no polling when waiting cannot help.
assert_eq!(calls.load(Ordering::SeqCst), 1);
assert!(labels.lock().unwrap().is_empty());
// Canonical check_install_deps message, wrapped in the gate marker
// so async_lifecycle removes the optimistic Installing entry.
assert!(err.downcast_ref::<DependencyGateError>().is_some());
assert!(
err.to_string().contains("LND requires a running Bitcoin node"),
"unexpected message: {err}"
);
}
#[tokio::test]
async fn waits_while_installed_dependency_starts_then_passes() {
// Bitcoin container exists (installing/starting) but only reports
// Running from the 3rd probe onward — the log-confirmed LND race.
let calls = Arc::new(AtomicU32::new(0));
let (labels, sink) = label_sink();
let probe_calls = Arc::clone(&calls);
let result = wait_for_install_deps(
"lnd",
move || {
let n = probe_calls.fetch_add(1, Ordering::SeqCst);
async move { Ok(probe(n >= 2, false, &["bitcoin-knots"])) }
},
sink,
36,
Duration::ZERO,
)
.await;
assert!(result.is_ok(), "{result:?}");
assert_eq!(calls.load(Ordering::SeqCst), 3);
let labels = labels.lock().unwrap();
assert_eq!(labels.len(), 2, "one waiting label per polling attempt");
assert!(labels.iter().all(|l| l == "Waiting for Bitcoin to start…"));
}
#[tokio::test]
async fn times_out_when_installed_dependency_never_runs() {
let (labels, sink) = label_sink();
let err = wait_for_install_deps(
"lnd",
|| async { Ok(probe(false, false, &["bitcoin-knots"])) },
sink,
4,
Duration::ZERO,
)
.await
.unwrap_err();
assert!(err.downcast_ref::<DependencyGateError>().is_some());
assert!(
err.to_string()
.contains("did not reach the running state within 0 seconds"),
"unexpected message: {err}"
);
assert_eq!(labels.lock().unwrap().len(), 4);
}
#[tokio::test]
async fn mempool_waits_on_both_bitcoin_and_electrumx() {
let calls = Arc::new(AtomicU32::new(0));
let (labels, sink) = label_sink();
let probe_calls = Arc::clone(&calls);
let result = wait_for_install_deps(
"mempool",
move || {
let n = probe_calls.fetch_add(1, Ordering::SeqCst);
// Bitcoin comes up on probe 2, electrumx on probe 3.
async move { Ok(probe(n >= 1, n >= 2, &["bitcoin-knots", "electrumx"])) }
},
sink,
36,
Duration::ZERO,
)
.await;
assert!(result.is_ok(), "{result:?}");
let labels = labels.lock().unwrap();
assert_eq!(
labels.as_slice(),
&[
"Waiting for Bitcoin and ElectrumX to start…".to_string(),
"Waiting for ElectrumX to start…".to_string(),
]
);
}
#[tokio::test]
async fn mempool_fails_fast_when_one_dep_is_not_installed() {
// Bitcoin is installed (waiting could help) but ElectrumX is not
// installed at all — waiting can never satisfy the gate, so fail
// fast with the canonical message.
let (labels, sink) = label_sink();
let err = wait_for_install_deps(
"mempool",
|| async { Ok(probe(false, false, &["bitcoin-knots"])) },
sink,
36,
Duration::ZERO,
)
.await
.unwrap_err();
assert!(err.downcast_ref::<DependencyGateError>().is_some());
assert!(labels.lock().unwrap().is_empty());
assert!(
err.to_string().contains("Mempool requires"),
"unexpected message: {err}"
);
}
#[tokio::test]
async fn variant_container_names_count_as_installed() {
// bitcoin-core (not just bitcoin-knots) satisfies the "installed"
// check for the wait path.
let calls = Arc::new(AtomicU32::new(0));
let (_labels, sink) = label_sink();
let probe_calls = Arc::clone(&calls);
let result = wait_for_install_deps(
"electrumx",
move || {
let n = probe_calls.fetch_add(1, Ordering::SeqCst);
async move { Ok(probe(n >= 1, false, &["bitcoin-core"])) }
},
sink,
36,
Duration::ZERO,
)
.await;
assert!(result.is_ok(), "{result:?}");
}
#[tokio::test]
async fn apps_without_dependency_gate_pass_untouched() {
let (labels, sink) = label_sink();
let result = wait_for_install_deps(
"uptime-kuma",
|| async { Ok(probe(false, false, &[])) },
sink,
36,
Duration::ZERO,
)
.await;
assert!(result.is_ok());
assert!(labels.lock().unwrap().is_empty());
}
}
#[test]
fn mempool_api_is_directly_installable_and_covered_by_the_archival_gate() {
// `mempool-api` is a legitimate direct `package.install` target
// (`uses_orchestrator_install_flow` in install.rs), reachable without
// going through the `mempool`/`mempool-web` umbrella id that the old
// hardcoded fallback list only recognized. It was missing from that
// list, so installing/repairing it directly skipped the archival
// Bitcoin gate entirely. Its manifest now declares `bitcoin:archival`
// directly, closing the gap the manifest-driven path exists for.
assert!(requires_unpruned_bitcoin("mempool-api"));
assert!(manifest_declares_archival_bitcoin("mempool-api"));
// `archy-mempool-web` has no direct Bitcoin RPC access
// (bitcoin_integration.rpc_access: none) and correctly stays excluded.
assert!(!requires_unpruned_bitcoin("archy-mempool-web"));
}
}
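
The ordering rule that `order_present_containers` applies (sort only the containers that are actually present, never add names from the order list) can be shown in isolation. A minimal sketch, assuming a made-up order list: `ORDER` and `order_present` are hypothetical, and the real list comes from `startup_order`.

```rust
/// Hypothetical start order for one stack, including a legacy variant
/// name (`mysql-mempool`) that a given install may not have.
const ORDER: &[&str] = &["archy-mempool-db", "mysql-mempool", "mempool-api", "mempool"];

/// Sort the present containers by their position in `order`.
/// Names missing from `order` get `usize::MAX` and therefore sort last;
/// `sort_by_key` is stable, so they keep their relative input order.
/// Nothing is ever added to the list.
fn order_present(order: &[&str], mut present: Vec<String>) -> Vec<String> {
    present.sort_by_key(|name| {
        order
            .iter()
            .position(|o| *o == name.as_str())
            .unwrap_or(usize::MAX)
    });
    present
}
```

Called with `mempool`, `extra-sidecar`, `mempool-api`, `archy-mempool-db`, the database sorts first and the unknown sidecar last, while `mysql-mempool` never appears because it was not in the input.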

@ -3,9 +3,10 @@ use super::config::{
is_readonly_compatible, is_valid_docker_image,
};
use super::dependencies::{
check_bitcoin_pruning_compatibility, check_install_deps, configure_fedimint_lnd, check_bitcoin_pruning_compatibility, configure_fedimint_lnd, detect_existing_containers,
detect_running_deps, detect_running_deps_from_package_data, log_optional_dep_info,
needs_archy_net, RunningDeps, needs_archy_net, wait_for_install_deps, DepProbe, RunningDeps, DEP_WAIT_INTERVAL,
DEP_WAIT_MAX_ATTEMPTS,
};
use super::progress::parse_pull_progress;
use super::validation::validate_app_id;
@ -243,6 +244,17 @@ impl RpcHandler {
}
}
// Multi-version support: honor an install-time version selection for the
// orchestrator-managed Bitcoin apps. Selecting the catalog default (or
// omitting `version`) leaves the app unpinned (tracks latest); selecting
// an older version pins it so install_fresh resolves that image and the
// update badge stays suppressed. See docs/bitcoin-multi-version-design.md.
if matches!(package_id, "bitcoin-core" | "bitcoin-knots") {
if let Some(version) = params.get("version").and_then(|v| v.as_str()) {
persist_install_version_selection(package_id, version).await;
}
}
// Phase: Preparing — emit BEFORE the stack dispatch so multi-container
// stacks also flip state to Installing immediately. Without this, the
// backend's package state for stack apps stayed empty until the first
@ -254,8 +266,7 @@ impl RpcHandler {
.await;
if matches!(package_id, "mempool" | "mempool-web") {
let deps = self.running_deps_for_install(package_id).await?; self.gate_install_deps(package_id).await?;
check_install_deps(package_id, &deps)?;
check_bitcoin_pruning_compatibility(package_id).await?;
}
@ -278,9 +289,11 @@ impl RpcHandler {
// Dependency checks. Prefer the scanner's cached package state so a
// congested Podman API does not turn an already-running dependency into
// a false install failure. Fall back to a bounded direct Podman probe
// only when the cache does not show the dependency. // only when the cache does not show the dependency. When the dependency
let deps = self.running_deps_for_install(package_id).await?; // is installed but not Running yet (the "clicked Install LND 55s before
check_install_deps(package_id, &deps)?; // Bitcoin was up" race), wait up to ~3 minutes for it instead of
// failing instantly.
let deps = self.gate_install_deps(package_id).await?;
check_bitcoin_pruning_compatibility(package_id).await?;
log_optional_dep_info(package_id, &deps);
let repaired_bitcoin_conf =
@ -934,6 +947,27 @@ impl RpcHandler {
}
}
/// Bounded dependency gate for installs: passes immediately when deps are
/// running, fails fast (with the phantom-tile marker) when a dependency
/// isn't installed at all, and otherwise waits up to
/// `DEP_WAIT_MAX_ATTEMPTS × DEP_WAIT_INTERVAL` for an installed-but-
/// starting dependency, surfacing "Waiting for X to start…" on the card.
pub(super) async fn gate_install_deps(&self, package_id: &str) -> Result<RunningDeps> {
wait_for_install_deps(
package_id,
|| async {
Ok(DepProbe {
running: self.running_deps_for_install(package_id).await?,
existing: detect_existing_containers().await,
})
},
|msg| async move { self.set_install_message(package_id, &msg).await },
DEP_WAIT_MAX_ATTEMPTS,
DEP_WAIT_INTERVAL,
)
.await
}
// -- Private helpers for install --
/// Pull the image from a registry or verify a local image exists.
@ -1284,6 +1318,11 @@ impl RpcHandler {
// Default to full archive — operators with 2TB+ drives shouldn't be
// silently pruned down to 550 MB. Users who want a pruned node can
// set `prune=N` in bitcoin.conf themselves after install.
//
// printtoconsole=0: bitcoind already writes debug.log in the datadir
// (self-shrunk on restart); duplicating it to stdout pushed every IBD
// "UpdateTip" line through conmon into journald (>1 GB/day). Deep
// debugging uses /var/lib/archipelago/bitcoin/debug.log.
let bitcoin_conf = format!(
"\
# rpcauth: salted hash only - no plaintext password in config or CLI\n\
@ -1293,7 +1332,7 @@ rpcallowip=0.0.0.0/0\n\
listen=1\n\
rpcthreads=16\n\
rpcworkqueue=256\n\
printtoconsole=1\n", printtoconsole=0\n",
rpcauth_line rpcauth_line
); );
tokio::fs::create_dir_all(bitcoin_dir) tokio::fs::create_dir_all(bitcoin_dir)
@ -2427,6 +2466,36 @@ exit 2
} }
} }
/// Persist an install-time version selection for a multi-version app. Selecting
/// the catalog default (or a version equal to it) un-pins so the app tracks
/// latest; selecting any other version pins it. Best-effort: a write failure
/// just means the app installs at the catalog default.
async fn persist_install_version_selection(app_id: &str, version: &str) {
use crate::container::version_config::{read, write, AppVersionConfig};
let is_default = crate::container::app_catalog::catalog_default_version(app_id)
.map(|d| d == version)
.unwrap_or(false);
let existing = read(app_id);
let cfg = AppVersionConfig {
pinned_version: if is_default {
None
} else {
Some(version.to_string())
},
auto_update: existing.auto_update,
};
if let Err(e) = write(app_id, &cfg) {
tracing::warn!(app_id, version, error = %e, "failed to persist install-time version selection");
} else {
tracing::info!(
app_id,
version,
pinned = !is_default,
"persisted install-time version selection"
);
}
}
fn should_try_orchestrator_install(package_id: &str, orchestrator_available: bool) -> bool {
orchestrator_available && uses_orchestrator_install_flow(package_id)
}
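The bounded dependency gate added in this hunk can be illustrated with a minimal, synchronous sketch. It is not the repository's `wait_for_install_deps`: the names `DepState`, `GateError`, and `wait_for_dep` are hypothetical, and the real code is async and probes Podman. The sketch only shows the three outcomes the doc comment describes: pass at once when the dependency is running, fail fast when it is not installed, and otherwise poll a bounded number of times while reporting a status message.

```rust
use std::time::Duration;

/// Hypothetical probe result for one dependency (not the repo's `DepProbe`).
#[derive(Debug, Clone, PartialEq)]
pub enum DepState {
    Running,
    Starting,
    Missing,
}

/// Hypothetical error type for the gate.
#[derive(Debug, PartialEq)]
pub enum GateError {
    /// The dependency is not installed at all, so waiting cannot help.
    NotInstalled,
    /// The dependency stayed in `Starting` for every attempt.
    TimedOut { attempts: u32 },
}

/// Poll `probe` up to `max_attempts` times, sleeping `interval` between
/// attempts. Returns the 1-based attempt on which the dependency was running.
pub fn wait_for_dep<P, M>(
    mut probe: P,
    mut on_wait: M,
    max_attempts: u32,
    interval: Duration,
) -> Result<u32, GateError>
where
    P: FnMut() -> DepState,
    M: FnMut(&str),
{
    for attempt in 1..=max_attempts {
        match probe() {
            DepState::Running => return Ok(attempt),
            DepState::Missing => return Err(GateError::NotInstalled),
            DepState::Starting => {
                // Surface progress to the caller (the UI card in the real code).
                on_wait("Waiting for dependency to start…");
                // No point sleeping after the final attempt.
                if attempt < max_attempts {
                    std::thread::sleep(interval);
                }
            }
        }
    }
    Err(GateError::TimedOut { attempts: max_attempts })
}
```

The worst-case wait is roughly `max_attempts × interval`, which matches the bound stated in the doc comment for `gate_install_deps`.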


@@ -5,6 +5,7 @@ mod install;
mod lifecycle;
mod progress;
mod runtime;
+mod set_config;
mod stacks;
mod update;
mod validation;


@@ -61,6 +61,31 @@ impl RpcHandler {
self.state_manager.update_data(data).await;
}
/// Set a user-facing install status message (e.g. "Waiting for Bitcoin
/// to start…") without disturbing the current phase/byte counters.
pub(super) async fn set_install_message(&self, package_id: &str, message: &str) {
let (mut data, _rev) = self.state_manager.get_snapshot().await;
let entry = data
.package_data
.entry(package_id.to_string())
.or_insert_with(|| create_installing_entry(package_id));
if entry.state != PackageState::Updating {
entry.state = PackageState::Installing;
}
let (size, downloaded, phase) = entry
.install_progress
.as_ref()
.map(|p| (p.size, p.downloaded, p.phase))
.unwrap_or((0, 0, None));
entry.install_progress = Some(InstallProgress {
size,
downloaded,
phase,
message: Some(message.to_string()),
});
self.state_manager.update_data(data).await;
}
/// Clear install progress after pull completes or fails.
pub(super) async fn clear_install_progress(&self, package_id: &str) {
let (mut data, _rev) = self.state_manager.get_snapshot().await;


@@ -312,7 +312,16 @@ impl RpcHandler {
let mut stopped = 0u32;
let mut removed = 0u32;
-let mut errors = Vec::new();
+// Two distinct failure classes, kept separate so they don't get
+// conflated (the old single `errors` vec did, which caused the "ghost in
+// My Apps" bug): `container_errors` means a container could NOT be
+// removed (force-rm failed too) — the app is genuinely still present, so
+// we keep its state entry and surface a hard error. `cleanup_errors`
+// means volume/network/data-dir teardown left residue — the containers
+// are already gone, so the app IS uninstalled and MUST disappear from My
+// Apps; the residue is logged but never ghosts the app.
+let mut container_errors: Vec<String> = Vec::new();
+let mut cleanup_errors: Vec<String> = Vec::new();
self.set_uninstall_stage(
package_id,
@@ -370,7 +379,7 @@ impl RpcHandler {
let msg =
format!("Failed to remove {}: {}; {}", name, stderr.trim(), e);
tracing::error!("Uninstall {}: {}", package_id, msg);
-errors.push(msg);
+container_errors.push(msg);
}
}
}
@@ -379,12 +388,35 @@ impl RpcHandler {
Err(force_err) => {
let msg = format!("Failed to remove {}: {}; {}", name, e, force_err);
tracing::error!("Uninstall {}: {}", package_id, msg);
-errors.push(msg);
+container_errors.push(msg);
}
},
}
}
// A container that survived even force-remove means the app is NOT
// actually uninstalled — keep its state entry and fail so the spawned
// task reverts it to its prior state (and the user can retry), rather
// than orphaning a live container that's missing from My Apps.
if !container_errors.is_empty() {
tracing::error!(
"Uninstall {}: containers could not be removed: {:?}",
package_id,
container_errors
);
return Err(anyhow::anyhow!(
"Uninstall {} failed: {}",
package_id,
container_errors.join("; ")
));
}
// Containers are gone → the app is uninstalled. Remove its state entry
// NOW, before the (possibly slow, possibly fallible) volume/data
// teardown below, so My Apps updates immediately and a residue failure
// can never leave a ghost. Reinstall/scan no longer see a stale entry.
self.remove_package_state_entry(package_id).await;
self.set_uninstall_stage(package_id, "Cleaning up volumes")
.await;
// Avoid global Podman volume prune on production nodes: store-wide
@@ -432,70 +464,73 @@ impl RpcHandler {
let stderr = String::from_utf8_lossy(&o.stderr);
let msg = format!("Failed to remove data {}: {}", dir, stderr.trim());
tracing::error!("Uninstall {}: {}", package_id, msg);
-errors.push(msg);
+cleanup_errors.push(msg);
}
Err(e) => {
let msg = format!("Failed to remove data {}: {}", dir, e);
tracing::error!("Uninstall {}: {}", package_id, msg);
-errors.push(msg);
+cleanup_errors.push(msg);
}
_ => {}
}
}
}
-if !errors.is_empty() {
+// The app is already gone from My Apps (entry removed above). Residual
+// volume/data cleanup failures are logged but NEVER ghost the app — a
+// reinstall and the next uninstall both tolerate leftover dirs.
+if !cleanup_errors.is_empty() {
tracing::error!(
-"Uninstall {} completed with errors: {:?}",
+"Uninstall {} removed but left cleanup residue: {:?}",
package_id,
-errors
+cleanup_errors
);
return Err(anyhow::anyhow!(
"Uninstall {} partially failed: {}",
package_id,
errors.join("; ")
));
} }
tracing::info!(
-"Uninstall {} complete: stopped={}, removed={}",
+"Uninstall {} complete: stopped={}, removed={}, cleanup_errors={}",
package_id,
stopped,
-removed
+removed,
+cleanup_errors.len()
);
// Immediately remove from in-memory state so the UI updates without
// waiting for the scanner's absence threshold (3 scans × 60s each).
{
let (mut data, _rev) = self.state_manager.get_snapshot().await;
let before = data.package_data.len();
data.package_data.remove(package_id);
// Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin")
let aliases: Vec<String> = data
.package_data
.keys()
.filter(|k| {
super::config::all_container_names(package_id)
.iter()
.any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
})
.cloned()
.collect();
for alias in &aliases {
data.package_data.remove(alias);
}
if data.package_data.len() < before {
self.state_manager.update_data(data).await;
}
}
Ok(serde_json::json!({
"status": "uninstalled",
"stopped": stopped,
"removed": removed,
+"cleanup_warnings": cleanup_errors,
}))
}
/// Remove a package's entry (and any alias keys) from persisted state so it
/// disappears from My Apps immediately, without waiting for the scanner's
/// absence threshold (3 scans × 60s). Called as soon as an uninstall has
/// removed the app's containers — before the slower volume/data teardown —
/// so a residue failure can never leave a ghost entry behind.
async fn remove_package_state_entry(&self, package_id: &str) {
let (mut data, _rev) = self.state_manager.get_snapshot().await;
let before = data.package_data.len();
data.package_data.remove(package_id);
// Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin").
let aliases: Vec<String> = data
.package_data
.keys()
.filter(|k| {
super::config::all_container_names(package_id)
.iter()
.any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
})
.cloned()
.collect();
for alias in &aliases {
data.package_data.remove(alias);
}
if data.package_data.len() < before {
self.state_manager.update_data(data).await;
}
}
/// Start a bundled app (create container from pre-loaded image if needed).
pub(in crate::api::rpc) async fn handle_bundled_app_start(
&self,
@@ -1568,7 +1603,7 @@ fn manifest_host_ports(container_name: &str) -> Vec<u16> {
Vec::new()
}
-fn manifest_apps_dirs() -> Vec<std::path::PathBuf> {
+pub(super) fn manifest_apps_dirs() -> Vec<std::path::PathBuf> {
let mut dirs = Vec::new();
if let Ok(manifest_dir) = std::env::var("CARGO_MANIFEST_DIR") {
dirs.push(Path::new(&manifest_dir).join("../../apps"));
@@ -1912,6 +1947,17 @@ pub(super) fn orchestrator_uninstall_app_ids(package_id: &str) -> Vec<String> {
"archy-btcpay-db".into(),
],
"fedimint" => vec!["fedimint".into(), "fedimint-gateway".into()],
// Immich: multi-container stack, mirrors `immich_stack_app_ids` in
// stacks.rs. Without this, uninstalling "immich" only disabled the
// orchestrator-tracked "immich" app_id — "immich-postgres" and
// "immich-redis" stayed enabled, so the boot reconciler kept
// restarting their leftover stopped containers forever after the
// generic uninstall path stopped them (`.198`, 2026-07-01).
"immich" => vec![
"immich-postgres".into(),
"immich-redis".into(),
"immich".into(),
],
_ => vec![package_id.to_string()],
}
}
@@ -1931,4 +1977,19 @@ mod tests {
fn runtime_host_ports_preserve_legacy_extra_ports() {
assert_eq!(runtime_host_ports("gitea"), vec![3001, 2222, 3000]);
}
#[test]
fn immich_uninstall_covers_every_sibling_orchestrator_app_id() {
// Regression: uninstalling "immich" used to only disable the
// "immich" app_id itself, leaving immich-postgres/immich-redis
// enabled — the boot reconciler kept restarting their leftover
// stopped containers forever (.198, 2026-07-01).
let ids = orchestrator_uninstall_app_ids("immich");
for expected in ["immich-postgres", "immich-redis", "immich"] {
assert!(
ids.iter().any(|id| id == expected),
"missing {expected} in {ids:?}"
);
}
}
}
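The split between `container_errors` and `cleanup_errors` in the uninstall path can be summarised with a small sketch. The `UninstallOutcome` enum and `classify_uninstall` function are hypothetical names used for illustration only; the actual handler returns `anyhow::Result<serde_json::Value>` and removes the state entry as a side effect. The sketch captures just the decision rule: a surviving container means the app is still installed, while leftover volumes or data directories only produce warnings.

```rust
/// Hypothetical outcome of an uninstall attempt (illustrative only).
#[derive(Debug, PartialEq)]
pub enum UninstallOutcome {
    /// All containers are gone; the app must disappear from the app list.
    /// `cleanup_warnings` records leftover volume/data residue, if any.
    Uninstalled { cleanup_warnings: Vec<String> },
    /// At least one container survived even a forced removal, so the app is
    /// still present and its state entry must be kept.
    StillInstalled { reason: String },
}

/// Decide the outcome from the two error classes.
pub fn classify_uninstall(
    container_errors: Vec<String>,
    cleanup_errors: Vec<String>,
) -> UninstallOutcome {
    // Container failures are checked first: cleanup residue is irrelevant
    // while the app is still running or removable only by retry.
    if !container_errors.is_empty() {
        return UninstallOutcome::StillInstalled {
            reason: container_errors.join("; "),
        };
    }
    UninstallOutcome::Uninstalled {
        cleanup_warnings: cleanup_errors,
    }
}
```

Under this rule a residue failure can never keep an app listed, which is the "ghost in My Apps" behaviour the hunk above sets out to remove.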


@@ -0,0 +1,352 @@
//! Multi-version support — version listing + in-app version switch / pin /
//! auto-update toggle (`docs/bitcoin-multi-version-design.md` §3 Phase 3).
//!
//! Two RPCs:
//! - `package.versions` — read the selectable versions for an app plus the
//! runner's current pin / auto-update preference and (best-effort) the
//! version actually running. Drives the install modal + "Version & Updates"
//! card.
//! - `package.set-config` — persist a version pin (or un-pin to track latest)
//! and/or the auto-update toggle, then recreate the app at the chosen image
//! when the version actually changed. A DOWNGRADE (older release over a
//! newer chainstate — the highest-risk operation, design §4) is refused
//! unless the caller passes `confirm: true`, so the UI can warn first.
use super::config::get_containers_for_app;
use super::install::install_log;
use super::validation::validate_app_id;
use crate::api::rpc::RpcHandler;
use crate::container::{app_catalog, version_config};
use anyhow::Result;
use std::sync::Arc;
use tracing::{info, warn};
/// Apps that participate in multi-version selection today. Kept narrow on
/// purpose: version switching recreates the container, which is only safe for
/// the single-container, orchestrator-managed Bitcoin backends whose data and
/// downgrade semantics we understand. Any app the catalog gives a `versions[]`
/// list also qualifies (third-party registry apps inherit the capability).
fn supports_versions(app_id: &str) -> bool {
matches!(app_id, "bitcoin-core" | "bitcoin-knots")
|| !app_catalog::catalog_versions(app_id).is_empty()
}
/// Extract the tag from a full image reference, leaving a `registry:port/repo`
/// host-port colon intact (only a colon AFTER the last `/` is a tag).
fn image_tag(image: &str) -> Option<String> {
let after_slash = image.rsplit_once('/').map(|(_, r)| r).unwrap_or(image);
after_slash
.rsplit_once(':')
.map(|(_, tag)| tag.to_string())
.filter(|t| !t.is_empty())
}
/// Best-effort: the version tag of the backend container actually running for
/// `app_id`, by inspecting its image. `None` when not installed or unreadable.
async fn installed_version(app_id: &str) -> Option<String> {
let containers = get_containers_for_app(app_id).await.ok()?;
// Prefer the backend container (exact id / `archy-<id>`) over UI companions.
let name = containers
.iter()
.find(|n| n.as_str() == app_id || n.as_str() == format!("archy-{app_id}"))
.or_else(|| containers.first())?;
let out = tokio::process::Command::new("podman")
.args(["inspect", name, "--format", "{{.ImageName}}"])
.output()
.await
.ok()?;
if !out.status.success() {
return None;
}
let image = String::from_utf8_lossy(&out.stdout).trim().to_string();
let tag = image_tag(&image)?;
// A floating tag (latest/stable/...) names the reference used to CREATE the
// container, not what's actually running — podman never re-resolves it once
// cached, so a stale local `:latest` reports "latest" even when the real
// `latest` moved on months ago (.228, 2026-07-01: ran a 4-month-old cached
// image while a newer one already sat locally, unused). Ask the Bitcoin
// backends directly instead of trusting the tag literal in that case.
if is_floating_tag(&tag) {
if let Some(real) = bitcoind_reported_version(app_id, name).await {
return Some(real);
}
}
Some(tag)
}
fn is_floating_tag(tag: &str) -> bool {
matches!(tag, "latest" | "stable" | "release" | "main")
}
/// Best-effort: ask the running bitcoind binary for its own version, trimmed to
/// the catalog's version-tag format (e.g. `29.3.knots20260210`, `29.2`). `None`
/// for apps other than the Bitcoin backends (no generic way to introspect a
/// third-party image's content version this way) or if the exec fails.
async fn bitcoind_reported_version(app_id: &str, container_name: &str) -> Option<String> {
if !matches!(app_id, "bitcoin-core" | "bitcoin-knots") {
return None;
}
let out = tokio::process::Command::new("podman")
.args(["exec", container_name, "bitcoind", "--version"])
.output()
.await
.ok()?;
if !out.status.success() {
return None;
}
parse_bitcoind_version_output(&String::from_utf8_lossy(&out.stdout))
}
/// Parses e.g. "Bitcoin Knots daemon version v29.3.knots20260210\n..." or
/// "Bitcoin Core version v29.2.0\n..." down to the version tag after `version v`.
fn parse_bitcoind_version_output(output: &str) -> Option<String> {
let first_line = output.lines().next()?;
let (_, version) = first_line.rsplit_once("version v")?;
let version = version.trim();
if version.is_empty() {
return None;
}
Some(version.to_string())
}
impl RpcHandler {
/// `package.versions` — what a runner can install / switch to for this app,
/// plus their current preference and the running version.
pub(in crate::api::rpc) async fn handle_package_versions(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
let app_id = params
.get("id")
.and_then(|v| v.as_str())
.ok_or_else(|| anyhow::anyhow!("Missing package id"))?;
validate_app_id(app_id)?;
let versions = app_catalog::catalog_versions(app_id);
let default = app_catalog::catalog_default_version(app_id);
let cfg = version_config::read(app_id);
let installed = installed_version(app_id).await;
Ok(serde_json::json!({
"id": app_id,
"supportsVersions": supports_versions(app_id),
"default": default,
"installedVersion": installed,
"pinnedVersion": cfg.pinned_version,
"autoUpdate": cfg.auto_update,
"versions": versions.iter().map(|v| serde_json::json!({
"version": v.version,
"default": v.default,
"deprecated": v.deprecated,
"eol": v.eol,
})).collect::<Vec<_>>(),
}))
}
/// `package.set-config` — persist version pin + auto-update preference and
/// recreate on an actual version change. Downgrades require `confirm:true`.
pub(in crate::api::rpc) async fn handle_package_set_config(
self: Arc<Self>,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
let app_id = params
.get("id")
.and_then(|v| v.as_str())
.ok_or_else(|| anyhow::anyhow!("Missing package id"))?
.to_string();
validate_app_id(&app_id)?;
if !supports_versions(&app_id) {
return Err(anyhow::anyhow!(
"{} has no selectable versions in the catalog",
app_id
));
}
let confirm = params
.get("confirm")
.and_then(|v| v.as_bool())
.unwrap_or(false);
let existing = version_config::read(&app_id);
let default = app_catalog::catalog_default_version(&app_id);
// ---- Resolve the requested pin (if a version was supplied) ----------
// Absent `version` => leave the pin unchanged (an auto-update-only edit).
// `version == default` => un-pin (track latest). Any other version must
// exist in the catalog and resolve to a same-repo image, else reject.
let version_param = params
.get("version")
.and_then(|v| v.as_str())
.map(str::to_string);
let mut new_pin = existing.pinned_version.clone();
let mut version_changed = false;
if let Some(req) = version_param.as_deref() {
let resolved_pin = if default.as_deref() == Some(req) {
None // selecting the default un-pins
} else {
// Validate the version is real + same-repo before pinning.
if !app_catalog::catalog_versions(&app_id)
.iter()
.any(|v| v.version == req)
{
return Err(anyhow::anyhow!(
"version {} is not offered for {}",
req,
app_id
));
}
Some(req.to_string())
};
version_changed = resolved_pin != existing.pinned_version;
new_pin = resolved_pin;
}
let new_auto_update = params
.get("autoUpdate")
.and_then(|v| v.as_bool())
.unwrap_or(existing.auto_update);
// ---- Downgrade gate (design §4: warn + confirm + allow) -------------
// "Current" = what wrote the on-disk chainstate: the running version if
// we can read it, else the existing pin, else the catalog default.
if version_changed {
let target = version_param.as_deref().unwrap_or_default();
let current = installed_version(&app_id)
.await
.or_else(|| existing.pinned_version.clone())
.or_else(|| default.clone());
if let Some(current) = current {
if version_config::is_downgrade(&current, target) && !confirm {
warn!(
"set-config {}: refusing un-confirmed downgrade {} -> {}",
app_id, current, target
);
return Ok(serde_json::json!({
"status": "confirm_required",
"kind": "downgrade",
"id": app_id,
"currentVersion": current,
"targetVersion": target,
"warning": format!(
"Switching {app_id} from {current} down to {target} is a \
downgrade. Bitcoin may refuse to start on a chainstate \
written by the newer version without a full reindex, and \
a pruned node can lose block data. Re-confirm to proceed."
),
}));
}
}
}
// ---- Persist preference --------------------------------------------
version_config::write(
&app_id,
&version_config::AppVersionConfig {
pinned_version: new_pin.clone(),
auto_update: new_auto_update,
},
)?;
install_log(&format!(
"SET-CONFIG {}: pinned={:?} autoUpdate={} (version_changed={})",
app_id, new_pin, new_auto_update, version_changed
))
.await;
info!(
app_id = %app_id,
pinned = ?new_pin,
auto_update = new_auto_update,
version_changed,
"package.set-config applied"
);
// ---- Recreate when the version actually changed + app is installed --
// The orchestrator's install/recreate path reads the pin we just wrote
// (prod_orchestrator image resolution), so reusing the update machinery
// pulls + recreates at the chosen image. An auto-update-only edit, or a
// change to a not-installed app, just persists the preference.
let mut recreating = false;
if version_changed {
let installed = get_containers_for_app(&app_id)
.await
.map(|c| !c.is_empty())
.unwrap_or(false);
if installed {
recreating = true;
// Fire the existing async update flow; it flips state to
// Updating and recreates honoring the new pin. The UI polls.
self.clone()
.spawn_package_update(Some(serde_json::json!({ "id": app_id })))
.await?;
}
}
Ok(serde_json::json!({
"status": "ok",
"id": app_id,
"pinnedVersion": new_pin,
"autoUpdate": new_auto_update,
"versionChanged": version_changed,
"recreating": recreating,
}))
}
}
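The downgrade gate above relies on `version_config::is_downgrade`, whose implementation is not part of this diff. As a rough illustration of what such a comparison might look like, the sketch below compares only the leading numeric components of two version tags. Both `numeric_prefix` and `is_downgrade_sketch` are hypothetical names, and the behaviour is an assumption: in particular, this sketch ignores the Knots date suffix, so two tags that differ only in `knotsYYYYMMDD` compare as equal.

```rust
/// Parse the leading numeric dotted components of a version tag, e.g.
/// "29.3.knots20260210" -> [29, 3] and "v29.2.0" -> [29, 2, 0].
/// Parsing stops at the first component that is not a plain integer.
fn numeric_prefix(version: &str) -> Vec<u64> {
    version
        .trim_start_matches('v')
        .split('.')
        .map_while(|part| part.parse::<u64>().ok())
        .collect()
}

/// Hypothetical downgrade check: true when `target` is numerically older
/// than `current`. Missing trailing components are treated as zero, so
/// "29.2" and "29.2.0" are considered the same version.
pub fn is_downgrade_sketch(current: &str, target: &str) -> bool {
    let cur = numeric_prefix(current);
    let tgt = numeric_prefix(target);
    // A tag without any numeric part (for example "latest") cannot be
    // ordered, so this sketch does not treat it as a downgrade.
    if cur.is_empty() || tgt.is_empty() {
        return false;
    }
    for i in 0..cur.len().max(tgt.len()) {
        let c = cur.get(i).copied().unwrap_or(0);
        let t = tgt.get(i).copied().unwrap_or(0);
        if t != c {
            return t < c;
        }
    }
    false
}
```

Returning `false` for unorderable tags is one possible design choice; a stricter gate could instead require confirmation whenever the ordering cannot be established.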
#[cfg(test)]
mod tests {
use super::{image_tag, is_floating_tag, parse_bitcoind_version_output};
#[test]
fn floating_tag_detects_generic_channel_names() {
for tag in ["latest", "stable", "release", "main"] {
assert!(is_floating_tag(tag), "{tag}");
}
for tag in ["29.3.knots20260508", "28.4", "v29.2.0"] {
assert!(!is_floating_tag(tag), "{tag}");
}
}
#[test]
fn parses_knots_version_line() {
assert_eq!(
parse_bitcoind_version_output(
"Bitcoin Knots daemon version v29.3.knots20260210\nCopyright...\n"
)
.as_deref(),
Some("29.3.knots20260210")
);
}
#[test]
fn parses_core_version_line() {
assert_eq!(
parse_bitcoind_version_output("Bitcoin Core version v29.2.0\n").as_deref(),
Some("29.2.0")
);
}
#[test]
fn parse_returns_none_when_output_has_no_version_marker() {
assert_eq!(parse_bitcoind_version_output("garbage output\n"), None);
assert_eq!(parse_bitcoind_version_output(""), None);
}
#[test]
fn image_tag_keeps_registry_port_colon() {
assert_eq!(
image_tag("146.59.87.168:3000/lfg2025/bitcoin:28.4").as_deref(),
Some("28.4")
);
assert_eq!(
image_tag("146.59.87.168:3000/lfg2025/bitcoin-knots:29.3.knots20260508").as_deref(),
Some("29.3.knots20260508")
);
// No tag => None (don't mistake the registry port for a tag).
assert_eq!(image_tag("146.59.87.168:3000/lfg2025/bitcoin"), None);
assert_eq!(
image_tag("docker.io/library/redis:7"),
Some("7".to_string())
);
}
}
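One case the `image_tag` tests above do not cover is a digest-pinned reference such as `repo@sha256:…`. With the logic shown in this file, the colon inside the digest would be read as a tag separator. Whether `podman inspect` ever reports such a reference for these containers is not established by the diff, so the following is only a sketch of one way to guard against it. The function name `image_tag_no_digest` is hypothetical.

```rust
/// Extract the tag from an image reference, ignoring any `@digest` suffix
/// and leaving a `registry:port` colon intact. Hypothetical variant of the
/// `image_tag` helper in this file.
pub fn image_tag_no_digest(image: &str) -> Option<String> {
    // Drop an `@sha256:...` digest first so its colon is not mistaken for
    // the tag separator.
    let without_digest = image
        .split_once('@')
        .map(|(name, _)| name)
        .unwrap_or(image);
    // Only a colon after the last `/` can introduce a tag; anything earlier
    // belongs to the registry host and port.
    let last_segment = without_digest
        .rsplit_once('/')
        .map(|(_, rest)| rest)
        .unwrap_or(without_digest);
    last_segment
        .rsplit_once(':')
        .map(|(_, tag)| tag.to_string())
        .filter(|tag| !tag.is_empty())
}
```

For references without a digest this behaves like the original helper, so the existing test cases would still hold.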


@@ -6,7 +6,6 @@
use crate::api::rpc::RpcHandler;
use crate::data_model::InstallPhase;
use anyhow::{Context, Result};
-use base64::Engine;
use std::process::Output;
use std::time::Duration;
use tracing::info;
@@ -696,6 +695,16 @@ fn immich_stack_app_ids() -> &'static [&'static str] {
&["immich-postgres", "immich-redis", "immich"]
}
fn netbird_stack_app_ids() -> &'static [&'static str] {
// Dependency/startup order: the combined management/signal/relay server
// first (it owns the base64 relay/store secrets + the sqlite store, and is
// the OIDC issuer the others point at), then the dashboard SPA, then the
// user-facing TLS proxy ("netbird", which carries the self-signed cert +
// the templated nginx.conf and is the launcher). Mirrors the netbird
// startup_order in dependencies.rs.
&["netbird-server", "netbird-dashboard", "netbird"]
}
fn indeedhub_stack_app_ids() -> &'static [&'static str] {
// Dependency order: backends + their generated secrets first, then the api
// (owns indeedhub-jwt; reads the db/minio secrets the backends materialised),
@@ -715,10 +724,6 @@ fn indeedhub_stack_app_ids() -> &'static [&'static str] {
const REGISTRY: &str = "146.59.87.168:3000/lfg2025";
-const NETBIRD_DASHBOARD_IMAGE: &str = "docker.io/netbirdio/dashboard:v2.38.0";
-const NETBIRD_SERVER_IMAGE: &str = "docker.io/netbirdio/netbird-server:0.71.2";
-const NETBIRD_PROXY_IMAGE: &str = "docker.io/library/nginx:1.27-alpine";
/// Pull an image with retry and exponential backoff (3 attempts).
async fn pull_image_with_retry(image: &str) -> Result<()> {
let exists = podman_stack_status(&["image", "exists", image], PODMAN_STACK_PROBE_TIMEOUT).await;
@@ -1004,9 +1009,9 @@ impl RpcHandler {
return Ok(adopted);
}
-// Dependency check: Bitcoin must be running
-let deps = super::dependencies::detect_running_deps().await?;
-super::dependencies::check_install_deps("btcpay-server", &deps)?;
+// Dependency check: Bitcoin must be running. Bounded wait covers the
+// "installed but still starting" race instead of failing instantly.
+self.gate_install_deps("btcpay-server").await?;
install_log("INSTALL START: btcpay-server (stack: postgres + nbxplorer + btcpay)").await;
@@ -1828,6 +1833,27 @@ impl RpcHandler {
/// Install self-hosted NetBird (dashboard + combined management/signal/relay server).
pub(super) async fn install_netbird_stack(&self) -> Result<serde_json::Value> {
// Manifest-driven path (#20 phase 4): render the 3-member stack from
// apps/netbird-*/manifest.yml via the orchestrator — dedicated
// netbird-net + network_aliases, base64 generated_secrets, a self-signed
// TLS cert (generated_certs) so the dashboard gets a secure context for
// OIDC PKCE (#15), and templated config.yaml/nginx.conf rendered from
// host facts + the netbird-net gateway. The manifests use the exact live
// container names, so on an existing node this ADOPTS the running stack
// rather than recreating it (the sqlite store + base64 keys are
// preserved — ensure_generated_secrets no-ops on existing files).
//
// #20 ph4: the legacy hardcoded `podman run` installer was DELETED — the
// signed catalog always ships apps/netbird-*/manifest.yml, so there is no
// in-Rust fallback. If the orchestrator doesn't know these app_ids and no
// running stack exists to adopt, install errors rather than silently
// diverging from the manifest contract.
if let Some(orchestrated) =
install_stack_via_orchestrator(self, "netbird", netbird_stack_app_ids()).await?
{
return Ok(orchestrated);
}
if let Some(adopted) = adopt_stack_if_exists(
"netbird",
"netbird",
@@ -1838,491 +1864,12 @@ impl RpcHandler {
return Ok(adopted);
}
-install_log("INSTALL START: netbird stack (dashboard + server)").await;
-info!("Installing self-hosted NetBird stack");
+anyhow::bail!(
+"netbird manifests not available on this node — the signed catalog must provide apps/netbird-*/manifest.yml (legacy hardcoded installer removed in #20 ph4)"
self.set_install_phase("netbird", InstallPhase::PullingImage)
.await;
for (i, image) in [
NETBIRD_DASHBOARD_IMAGE,
NETBIRD_SERVER_IMAGE,
NETBIRD_PROXY_IMAGE,
]
.iter()
.enumerate()
{
self.set_install_progress("netbird", i as u64, 3).await;
pull_image_with_retry(image)
.await
.with_context(|| format!("Failed to pull NetBird image: {}", image))?;
}
self.set_install_progress("netbird", 3, 3).await;
for name in ["netbird", "netbird-dashboard", "netbird-server"] {
let _ = podman_stack_status(&["rm", "-f", name], PODMAN_STACK_PROBE_TIMEOUT).await;
}
let _ = podman_stack_status(
&["network", "rm", "-f", "netbird-net"],
PODMAN_STACK_PROBE_TIMEOUT,
) )
.await;
self.set_install_phase("netbird", InstallPhase::CreatingContainer)
.await;
tokio::fs::create_dir_all("/var/lib/archipelago/netbird/data")
.await
.context("Failed to create NetBird data directory")?;
let host_ip = detect_netbird_public_host_ip()
.await
.unwrap_or_else(|| self.config.host_ip.clone());
// Create the network FIRST so we can read back the gateway it was
// assigned — that gateway is Podman's aardvark DNS, which the proxy's
// nginx needs as an explicit `resolver` to re-resolve container names
// (issue #15: without it nginx caches a container IP and 502s forever
// once that IP changes on restart/reboot).
let _ = podman_stack_status(
&["network", "create", "netbird-net"],
PODMAN_STACK_PROBE_TIMEOUT,
)
.await;
let resolver_ip = netbird_net_resolver_ip().await;
write_netbird_config_files(&host_ip, &self.config.host_ip, &resolver_ip).await?;
ensure_netbird_tls_cert(&host_ip).await?;
let mut server_cmd = tokio::process::Command::new("podman");
server_cmd.args([
"run",
"-d",
"--name",
"netbird-server",
"--network",
"netbird-net",
"--network-alias",
"netbird-server",
"--restart=unless-stopped",
"-p",
"8086:80",
"-p",
"3478:3478/udp",
"-v",
"/var/lib/archipelago/netbird/data:/var/lib/netbird",
"-v",
"/var/lib/archipelago/netbird/config.yaml:/etc/netbird/config.yaml:ro",
NETBIRD_SERVER_IMAGE,
"--config",
"/etc/netbird/config.yaml",
]);
run_required_stack_command("netbird", "create server", &mut server_cmd).await?;
self.set_install_phase("netbird", InstallPhase::StartingContainer)
.await;
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
let mut dashboard_cmd = tokio::process::Command::new("podman");
dashboard_cmd.args([
"run",
"-d",
"--name",
"netbird-dashboard",
"--network",
"netbird-net",
// Explicit alias so the proxy can always resolve `netbird-dashboard`
// via Podman DNS — don't rely on implicit container-name aliasing.
"--network-alias",
"netbird-dashboard",
"--restart=unless-stopped",
"--env-file",
"/var/lib/archipelago/netbird/dashboard.env",
NETBIRD_DASHBOARD_IMAGE,
]);
run_required_stack_command("netbird", "create dashboard", &mut dashboard_cmd).await?;
let mut proxy_cmd = tokio::process::Command::new("podman");
proxy_cmd.args([
"run",
"-d",
"--name",
"netbird",
"--network",
"netbird-net",
"--restart=unless-stopped",
// 8087 publishes the TLS listener — netbird's dashboard requires a
// secure context (window.crypto.subtle / OIDC PKCE), issue #15.
"-p",
"8087:443",
"-v",
"/var/lib/archipelago/netbird/nginx.conf:/etc/nginx/conf.d/default.conf:ro",
"-v",
"/var/lib/archipelago/netbird/tls.crt:/etc/nginx/tls.crt:ro",
"-v",
"/var/lib/archipelago/netbird/tls.key:/etc/nginx/tls.key:ro",
NETBIRD_PROXY_IMAGE,
]);
run_required_stack_command("netbird", "create unified proxy", &mut proxy_cmd).await?;
wait_for_stack_containers(
"netbird",
&["netbird-server", "netbird-dashboard", "netbird"],
60,
)
.await?;
self.set_install_phase("netbird", InstallPhase::WaitingHealthy)
.await;
// Containers being "running" is NOT the same as the embedded OIDC
// provider being ready (#10). The dashboard SPA opens right after install
// and, if it loads before /oauth2/.well-known is served, caches a bad
// auth state — the user appears logged-in but can't log out until it
// self-corrects. Wait (best-effort) for OIDC discovery to answer before
// we report Done, so the first dashboard load sees a ready provider.
wait_for_netbird_oidc_ready(Duration::from_secs(60)).await;
self.set_install_phase("netbird", InstallPhase::PostInstall)
.await;
self.set_install_phase("netbird", InstallPhase::Done).await;
self.clear_install_progress("netbird").await;
install_log("INSTALL OK: netbird stack").await;
info!("NetBird stack installed");
Ok(serde_json::json!({
"success": true,
"package_id": "netbird",
"message": "NetBird self-hosted stack installed",
}))
}
}
/// Best-effort wait for NetBird's embedded OIDC provider to start serving its
/// discovery document. The management server publishes 8086:80 on the host and
/// is the issuer at `/oauth2`, so its `.well-known/openid-configuration` is the
/// signal that the dashboard's login/logout flow will work. Polls until a 2xx
/// or the timeout — NEVER fails the install (the stack is already running; this
/// only narrows the post-install race window in #10).
async fn wait_for_netbird_oidc_ready(timeout: Duration) {
let url = "http://127.0.0.1:8086/oauth2/.well-known/openid-configuration";
let client = match reqwest::Client::builder()
.timeout(Duration::from_secs(5))
.build()
{
Ok(c) => c,
Err(_) => return,
};
let deadline = tokio::time::Instant::now() + timeout;
loop {
if let Ok(resp) = client.get(url).send().await {
if resp.status().is_success() {
info!("NetBird OIDC discovery is ready");
return;
}
}
if tokio::time::Instant::now() >= deadline {
info!("NetBird OIDC discovery not ready within timeout — proceeding anyway");
return;
}
tokio::time::sleep(Duration::from_secs(2)).await;
}
}
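The best-effort wait above can be reduced to a small, synchronous sketch that needs neither tokio nor reqwest. `poll_until` and its `probe` closure are hypothetical names used only for illustration; the real function probes the OIDC discovery URL over HTTP.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: calls `probe` until it reports success or `timeout`
/// elapses. Returns whether the probe ever succeeded; it never errors, which
/// mirrors the "never fails the install" contract described above.
fn poll_until(timeout: Duration, interval: Duration, mut probe: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        // Check the deadline after probing so at least one attempt is made.
        if Instant::now() >= deadline {
            return false;
        }
        std::thread::sleep(interval);
    }
}
```

Because the probe runs before the deadline check, even a zero timeout still makes one attempt, which matches the loop order in `wait_for_netbird_oidc_ready`.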
async fn read_or_generate_b64_secret(name: &str) -> String {
let path = format!("/var/lib/archipelago/secrets/{}", name);
if let Ok(val) = tokio::fs::read_to_string(&path).await {
let trimmed = val.trim().to_string();
if !trimmed.is_empty() {
return trimmed;
}
}
let mut buf = [0u8; 32];
rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut buf);
let secret = base64::engine::general_purpose::STANDARD.encode(buf);
let _ = tokio::fs::create_dir_all("/var/lib/archipelago/secrets").await;
let _ = tokio::fs::write(&path, &secret).await;
secret
}
/// Read the gateway of the `netbird-net` bridge. Podman runs its aardvark DNS
/// resolver on this address, so nginx can use it as an explicit `resolver` to
/// re-resolve container names at request time. Falls back to Podman's usual
/// first-pool gateway if the inspect fails (best effort — config is rewritten
/// on every (re)install).
async fn netbird_net_resolver_ip() -> String {
let out = tokio::process::Command::new("podman")
.args([
"network",
"inspect",
"netbird-net",
"--format",
"{{range .Subnets}}{{.Gateway}}{{end}}",
])
.output()
.await;
if let Ok(o) = out {
let gw = String::from_utf8_lossy(&o.stdout).trim().to_string();
if !gw.is_empty() && gw.parse::<std::net::IpAddr>().is_ok() {
return gw;
}
}
"10.89.0.1".to_string()
}
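The parse-or-fallback step in `netbird_net_resolver_ip` can be isolated as a pure function, which makes the edge cases easy to see. This is a sketch: `gateway_or_fallback` is a hypothetical name, and the input stands in for the stdout of `podman network inspect`.

```rust
use std::net::IpAddr;

/// Same default the source falls back to when the inspect fails.
const FALLBACK_GATEWAY: &str = "10.89.0.1";

/// Hypothetical helper: accepts the inspect output only if, after trimming,
/// it is exactly one valid IP address; otherwise returns the fallback.
fn gateway_or_fallback(inspect_stdout: &str) -> String {
    match inspect_stdout.trim().parse::<IpAddr>() {
        Ok(ip) => ip.to_string(),
        Err(_) => FALLBACK_GATEWAY.to_string(),
    }
}
```

Note that the `{{range .Subnets}}{{.Gateway}}{{end}}` template emits gateways with no separator, so a network with more than one subnet would yield a concatenated string that fails to parse and lands on the fallback.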
/// Generate a self-signed TLS cert for the netbird proxy if absent. The
/// dashboard needs a secure context (window.crypto.subtle / OIDC PKCE), so the
/// proxy serves HTTPS; a self-signed cert is sufficient (the user accepts it
/// once when opening netbird in a tab). SAN covers the LAN IP plus
/// localhost/127.0.0.1 so it's valid however the box is reached locally.
async fn ensure_netbird_tls_cert(host_ip: &str) -> Result<()> {
let dir = "/var/lib/archipelago/netbird";
let crt = format!("{dir}/tls.crt");
let key = format!("{dir}/tls.key");
if tokio::fs::metadata(&crt).await.is_ok() && tokio::fs::metadata(&key).await.is_ok() {
return Ok(());
}
let _ = tokio::fs::create_dir_all(dir).await;
let san = format!("subjectAltName=IP:{host_ip},IP:127.0.0.1,DNS:localhost");
let status = tokio::process::Command::new("openssl")
.args([
"req",
"-x509",
"-newkey",
"rsa:2048",
"-nodes",
"-keyout",
&key,
"-out",
&crt,
"-days",
"3650",
"-subj",
&format!("/CN={host_ip}"),
"-addext",
&san,
])
.status()
.await
.context("failed to run openssl for netbird TLS cert")?;
if !status.success() {
anyhow::bail!("openssl failed to generate netbird TLS cert");
}
Ok(())
}
async fn write_netbird_config_files(host_ip: &str, lan_ip: &str, resolver_ip: &str) -> Result<()> {
// netbird's dashboard uses window.crypto.subtle (OIDC PKCE), which browsers
// only expose in a SECURE context — so the proxy serves HTTPS and every
// origin here is https (issue #15: over plain http the dashboard threw
// "window.crypto.subtle is unavailable" and never reached login).
let public_origin = format!("https://{}:8087", host_ip);
let server_origin = format!("http://{}:8086", host_ip);
// A single box is reached via several addresses. Allow the OIDC login flow
// to redirect back to whichever origin the user actually used, otherwise
// post-login lands on the wrong host and the dashboard shows
// "Unauthenticated" (issue #15). The browser-side CORS is handled in the
// nginx proxy; this covers the redirect-URI allow-list.
let lan_origin = format!("https://{}:8087", lan_ip);
let mut redirect_origins = vec![public_origin.clone()];
if lan_origin != public_origin {
redirect_origins.push(lan_origin);
}
let dashboard_redirect_uris = redirect_origins
.iter()
.flat_map(|o| {
[
format!(" - \"{o}/nb-auth\""),
format!(" - \"{o}/nb-silent-auth\""),
]
})
.collect::<Vec<_>>()
.join("\n");
let dashboard_logout_uris = redirect_origins
.iter()
.map(|o| format!(" - \"{o}/\""))
.collect::<Vec<_>>()
.join("\n");
let relay_secret = read_or_generate_b64_secret("netbird-relay-auth-secret").await;
let encryption_key = read_or_generate_b64_secret("netbird-store-encryption-key").await;
let config = format!(
r#"server:
listenAddress: ":80"
exposedAddress: "{public_origin}"
stunPorts:
- 3478
metricsPort: 9090
healthcheckAddress: ":9000"
logLevel: "info"
logFile: "console"
authSecret: "{relay_secret}"
dataDir: "/var/lib/netbird"
auth:
issuer: "{public_origin}/oauth2"
localAuthDisabled: false
signKeyRefreshEnabled: false
dashboardRedirectURIs:
{dashboard_redirect_uris}
dashboardPostLogoutRedirectURIs:
{dashboard_logout_uris}
cliRedirectURIs:
- "http://localhost:53000/"
store:
engine: "sqlite"
encryptionKey: "{encryption_key}"
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/config.yaml", config)
.await
.context("Failed to write NetBird config.yaml")?;
let dashboard_env = format!(
r#"NETBIRD_MGMT_API_ENDPOINT={public_origin}
NETBIRD_MGMT_GRPC_API_ENDPOINT={public_origin}
AUTH_AUDIENCE=netbird-dashboard
AUTH_CLIENT_ID=netbird-dashboard
AUTH_CLIENT_SECRET=
AUTH_AUTHORITY={public_origin}/oauth2
USE_AUTH0=false
AUTH_SUPPORTED_SCOPES=openid profile email groups
AUTH_REDIRECT_URI=/nb-auth
AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
NETBIRD_TOKEN_SOURCE=idToken
NGINX_SSL_PORT=443
LETSENCRYPT_DOMAIN=none
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/dashboard.env", dashboard_env)
.await
.context("Failed to write NetBird dashboard.env")?;
let nginx_conf = format!(
r#"server {{
listen 443 ssl;
server_name _;
# netbird's dashboard needs a secure context (window.crypto.subtle for OIDC
# PKCE), so the proxy terminates TLS with a self-signed cert (issue #15).
ssl_certificate /etc/nginx/tls.crt;
ssl_certificate_key /etc/nginx/tls.key;
# Rootless Podman can hand a container a new IP across restarts/reboots.
# nginx resolves a literal upstream name ONCE at startup and caches it, so
# after the IP moves every request 502s with "host unreachable" (issue #15,
# observed live on .198: nginx pinned to a dead netbird-dashboard IP). Fix:
# point `resolver` at the netbird-net gateway (Podman's aardvark DNS) and
# use VARIABLE upstreams, which forces nginx to re-resolve the container
# names at request time. Everything is reached container-to-container by
# name so nothing depends on host-published ports either.
resolver {resolver_ip} valid=10s ipv6=off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
location ~ ^/(relay|ws-proxy/) {{
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 1d;
}}
location ~ ^/(api|oauth2)(/|$) {{
# The dashboard is a SPA whose API/OIDC base URL is baked at build time
# to one host:port. A single box is reached via several addresses (LAN
# IP, Tailscale 100.x, hostname), so those fetches are cross-origin and
# the browser blocks them with no Access-Control-Allow-Origin (issue
# #15, observed live on .198). Reflect the caller's Origin so the
# self-hosted management/OIDC API is reachable from any of them, and
# answer the CORS preflight here.
if ($request_method = OPTIONS) {{
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
add_header Access-Control-Max-Age 86400 always;
add_header Content-Length 0;
return 204;
}}
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
}}
location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {{
set $nb_server netbird-server;
grpc_pass grpc://$nb_server:80;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
}}
# OIDC callback routes are client-side SPA routes with NO prebuilt page in
# the dashboard bundle, so proxying them straight through 404s which
# crashes the dashboard's auth init and shows "Unauthenticated" with dead
# buttons (issue #15, confirmed live on .198: /nb-auth + /nb-silent-auth
# returned 404). Serve the dashboard's index.html at these paths (URL
# unchanged) so react-oidc boots and completes the login / silent-SSO.
location ~ ^/(nb-auth|nb-silent-auth) {{
set $nb_dashboard netbird-dashboard;
rewrite ^.*$ /index.html break;
proxy_pass http://$nb_dashboard:80;
}}
location / {{
set $nb_dashboard netbird-dashboard;
proxy_pass http://$nb_dashboard:80;
}}
}}
# Direct server remains available for diagnostics at {server_origin}.
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/nginx.conf", nginx_conf)
.await
.context("Failed to write NetBird nginx.conf")?;
Ok(())
}
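The redirect-URI allow-list built at the top of `write_netbird_config_files` can be sketched as a pure function. `redirect_uri_yaml` and its `indent` parameter are assumptions made for illustration; the source builds the same list inline and embeds it in `config.yaml`.

```rust
/// Hypothetical helper: renders the YAML list items for the OIDC redirect
/// allow-list. The LAN origin is added only when it differs from the public
/// origin, so a single-address box gets two entries rather than four.
fn redirect_uri_yaml(public_origin: &str, lan_origin: &str, indent: &str) -> String {
    let mut origins = vec![public_origin];
    if lan_origin != public_origin {
        origins.push(lan_origin);
    }
    origins
        .iter()
        .flat_map(|o| {
            [
                format!("{indent}- \"{o}/nb-auth\""),
                format!("{indent}- \"{o}/nb-silent-auth\""),
            ]
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping this step pure makes it straightforward to assert that every reachable origin gets both the `/nb-auth` and `/nb-silent-auth` callbacks.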
async fn detect_netbird_public_host_ip() -> Option<String> {
let output = tokio::process::Command::new("hostname")
.args(["-I"])
.output()
.await
.ok()?;
let stdout = String::from_utf8_lossy(&output.stdout);
let ips: Vec<&str> = stdout
.split_whitespace()
.filter(|s| s.contains('.'))
.collect();
// Prefer the LAN address as the canonical origin — that's what users browse
// to on the local network. Baking the Tailscale 100.x address here broke
// LAN access with cross-origin/redirect mismatches (issue #15). Tailscale
// (100.64.0.0/10 CGNAT) is only a fallback for nodes with no LAN IP.
let is_private_lan = |ip: &str| {
ip.starts_with("192.168.")
|| ip.starts_with("10.")
|| (ip.starts_with("172.")
&& ip
.split('.')
.nth(1)
.and_then(|o| o.parse::<u8>().ok())
.map(|o| (16..=31).contains(&o))
.unwrap_or(false))
};
if let Some(lan) = ips.iter().find(|ip| is_private_lan(ip)) {
return Some(lan.to_string());
}
ips.iter()
.find(|ip| ip.starts_with("100."))
.map(|s| s.to_string())
}
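The address preference above matches on string prefixes, so the `"100."` fallback also accepts addresses outside 100.64.0.0/10. A possible alternative, sketched here with parsed addresses, is to classify with `std::net::Ipv4Addr`. The function names are hypothetical and the input stands in for the output of `hostname -I`.

```rust
use std::net::Ipv4Addr;

/// RFC 1918 ranges: 10/8, 172.16/12, 192.168/16.
fn is_private_lan(ip: Ipv4Addr) -> bool {
    ip.is_private()
}

/// 100.64.0.0/10 (CGNAT, used by Tailscale): first octet 100, and the top
/// two bits of the second octet equal to 01, i.e. 64..=127.
fn is_cgnat(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    o[0] == 100 && (o[1] & 0b1100_0000) == 0b0100_0000
}

/// Hypothetical helper: prefers a LAN address, falls back to a CGNAT one.
/// Tokens that are not IPv4 addresses (for example IPv6) are ignored.
fn pick_canonical_ip(hostname_i_output: &str) -> Option<Ipv4Addr> {
    let ips: Vec<Ipv4Addr> = hostname_i_output
        .split_whitespace()
        .filter_map(|s| s.parse().ok())
        .collect();
    ips.iter()
        .copied()
        .find(|ip| is_private_lan(*ip))
        .or_else(|| ips.iter().copied().find(|ip| is_cgnat(*ip)))
}
```

With this variant an address such as `100.1.2.3`, which is not in the CGNAT range, is not mistaken for a Tailscale address.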
#[cfg(test)]
mod tests {
use super::{btcpay_stack_app_ids, mempool_stack_app_ids};


@ -32,19 +32,27 @@ impl RpcHandler {
.ok_or_else(|| anyhow::anyhow!("Missing package id"))?;
validate_app_id(package_id)?;
-// Verify an update is actually available. Prefer the remote app catalog
-// (decoupled from the binary OTA), falling back to the image-versions.sh
-// pin when the catalog is absent or doesn't cover this app.
+// Resolve the target image. Prefer the remote app catalog (decoupled
+// from the binary OTA), falling back to the image-versions.sh pin. This
+// is OPTIONAL for orchestrator-managed apps: the orchestrator resolves
+// the image itself (manifest + catalog + version_config pin) in its
+// upgrade path, so an app the catalog doesn't carry a primary image for
+// (e.g. bitcoin-core, image lives in the embedded manifest + versions[])
+// still upgrades. Only the legacy/stack path below hard-requires it.
let pinned = crate::container::app_catalog::catalog_primary_image(package_id)
-.or_else(|| image_versions::pinned_image_for_app(package_id))
-.ok_or_else(|| anyhow::anyhow!("No pinned image found for {}", package_id))?;
+.or_else(|| image_versions::pinned_image_for_app(package_id));
// Note: the `already updating` guard lives in `spawn_package_update`
// (the async wrapper that dispatch actually routes to). By the time
// this inner function runs, the wrapper has already flipped state to
// `Updating`, so duplicating the check here would be a false positive.
-install_log(&format!("UPDATE: {}{}", package_id, pinned)).await;
+install_log(&format!(
+"UPDATE: {} → {}",
+package_id,
+pinned.as_deref().unwrap_or("(orchestrator-resolved)")
+))
+.await;
// Set state to Updating
{
@ -114,6 +122,16 @@ impl RpcHandler {
}
}
+// Legacy/stack path hard-requires a concrete primary image (the
+// orchestrator path above already returned for apps it manages).
+let pinned = match pinned {
+Some(p) => p,
+None => {
+self.clear_update_state(package_id).await;
+return Err(anyhow::anyhow!("No pinned image found for {}", package_id));
+}
+};
// Resolve images to pull — either a stack or single container
let images_to_pull = self.resolve_images_to_pull(package_id, &pinned);


@ -26,6 +26,36 @@ impl Drop for OnboardingMnemonicState {
const MNEMONIC_TTL: std::time::Duration = std::time::Duration::from_secs(600); // 10 minutes
/// Persist the pending onboarding mnemonic as `identity/master_seed.enc`,
/// encrypted with `passphrase`. Called from `auth.setup` — the first moment a
/// user password exists — so "Reveal recovery phrase" works after onboarding
/// without the frontend having to remember a separate save step (it never
/// did, which left every onboarded node with no encrypted seed backup).
///
/// Deliberately ignores MNEMONIC_TTL: the mnemonic stays in memory until
/// overwritten regardless, so using it here widens nothing, and onboarding
/// legitimately takes longer than 10 minutes when the user carefully writes
/// down 24 words. Clears the in-memory copy on success — password setup is
/// the end of onboarding, so the plaintext no longer needs to linger.
///
/// Returns Ok(true) if a seed was saved, Ok(false) if none was pending.
pub(in crate::api::rpc) async fn save_pending_seed_encrypted(
data_dir: &std::path::Path,
passphrase: &str,
) -> Result<bool> {
let mut state = ONBOARDING_MNEMONIC.lock().await;
let Some(pending) = state.as_ref() else {
return Ok(false);
};
let mnemonic: bip39::Mnemonic = pending
.words
.parse()
.context("Invalid mnemonic in memory")?;
crate::seed::save_seed_encrypted(data_dir, &mnemonic, passphrase).await?;
*state = None;
Ok(true)
}
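The clear-on-success behaviour of `save_pending_seed_encrypted` can be sketched with a plain `std::sync::Mutex`. `persist_pending` and the `save` closure are hypothetical; the real code holds a tokio mutex and calls `crate::seed::save_seed_encrypted`.

```rust
use std::sync::Mutex;

/// Hypothetical helper: runs `save` on the pending value while holding the
/// lock and clears the value only if `save` succeeded. Returns Ok(false)
/// when nothing was pending, so callers can tell the two cases apart.
fn persist_pending<E>(
    state: &Mutex<Option<String>>,
    save: impl FnOnce(&str) -> Result<(), E>,
) -> Result<bool, E> {
    // A poisoned lock panics here; acceptable for a sketch.
    let mut guard = state.lock().unwrap();
    let Some(words) = guard.as_deref() else {
        return Ok(false);
    };
    save(words)?; // on error the pending value stays in place for a retry
    *guard = None;
    Ok(true)
}
```

Holding the lock across the save means a concurrent overwrite cannot slip in between persisting the words and clearing them.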
/// Best-effort: install fips.yaml + start archipelago-fips.service after the
/// seed onboarding has written the fips_key to disk. Runs in a detached task
/// so the user-facing RPC returns immediately — the systemctl calls can take
@ -208,6 +238,17 @@ impl RpcHandler {
let phrase = words.join(" ");
let (_mnemonic, seed) = crate::seed::MasterSeed::from_mnemonic_words(&phrase)?;
// Stash the restored words like seed.generate does, so auth.setup can
// persist the encrypted backup once the user's password exists and
// "Reveal recovery phrase" works on restored nodes too.
{
let mut state = ONBOARDING_MNEMONIC.lock().await;
*state = Some(OnboardingMnemonicState {
words: phrase.clone(),
created_at: std::time::Instant::now(),
});
}
// Derive and write node Ed25519 key.
let identity_dir = self.config.data_dir.join("identity");
crate::identity::NodeIdentity::from_seed(&identity_dir, &seed).await?;


@ -47,6 +47,17 @@ impl RpcHandler {
}
};
// Keep the self-signed HTTPS cert's SAN in sync with the new hostname —
// best-effort, never blocks the rename itself. Without this the cert
// stays pinned to whatever name was set at install time, so browsers
// hit a hostname-mismatch warning on top of the usual self-signed one
// the moment a node is renamed.
if hostname_updated {
if let Err(e) = regenerate_tls_cert(&hostname).await {
warn!(hostname = %hostname, "TLS cert regen after rename failed: {}", e);
}
}
info!("Server name updated to: {}", name);
// Push the new name to federation peers in background
@ -66,6 +77,70 @@ impl RpcHandler {
}))
}
/// server.set-location — Set this node's own lat/lon + whether to share
/// it with trusted federation peers (for the Mesh Map). `lat`/`lon` are
/// optional so a caller can flip `share` off without clearing the saved
/// position, or clear the position by passing nulls.
pub(in crate::api::rpc) async fn handle_server_set_location(
&self,
params: Option<serde_json::Value>,
) -> Result<serde_json::Value> {
let params = params.ok_or_else(|| anyhow::anyhow!("Missing params"))?;
let lat = params.get("lat").and_then(|v| v.as_f64());
let lon = params.get("lon").and_then(|v| v.as_f64());
let share_location = params
.get("share")
.and_then(|v| v.as_bool())
.ok_or_else(|| anyhow::anyhow!("Missing required parameter: share"))?;
if let (Some(lat), Some(lon)) = (lat, lon) {
if !(-90.0..=90.0).contains(&lat) || !(-180.0..=180.0).contains(&lon) {
anyhow::bail!("Invalid lat/lon");
}
}
let location_file = self.config.data_dir.join("server-location.json");
let payload = serde_json::json!({ "lat": lat, "lon": lon, "share_location": share_location });
tokio::fs::write(&location_file, serde_json::to_vec(&payload)?)
.await
.context("Failed to write server location")?;
let (mut data, _) = self.state_manager.get_snapshot().await;
data.server_info.lat = lat;
data.server_info.lon = lon;
data.server_info.share_location = share_location;
self.state_manager.update_data(data).await;
info!(share_location, "Server location updated");
// Push the new location to federation peers in background, same as
// a rename — trusted peers' next state sync picks it up.
let data_dir = self.config.data_dir.clone();
let state_manager = self.state_manager.clone();
tokio::spawn(async move {
if let Err(e) = push_name_to_peers(&data_dir, &state_manager).await {
debug!("Federation location push (non-fatal): {}", e);
}
});
Ok(serde_json::json!({ "lat": lat, "lon": lon, "share_location": share_location }))
}
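The range check in `handle_server_set_location` only runs when both coordinates are present, so a request carrying just one of them is stored as given. A stricter variant is sketched below; the both-or-neither rule and the name `validate_position` are assumptions for illustration, not behaviour of the handler above.

```rust
/// Hypothetical helper: accepts either no position (clear it) or a complete,
/// in-range pair. A half-specified position is rejected rather than stored.
fn validate_position(lat: Option<f64>, lon: Option<f64>) -> Result<Option<(f64, f64)>, String> {
    match (lat, lon) {
        (None, None) => Ok(None),
        (Some(lat), Some(lon)) => {
            if (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon) {
                Ok(Some((lat, lon)))
            } else {
                Err("Invalid lat/lon".to_string())
            }
        }
        // Assumption: one coordinate without the other is a caller error.
        _ => Err("lat and lon must be provided together".to_string()),
    }
}
```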
/// system.get-hostname — Current OS hostname + the mDNS `.local` name it
/// resolves to on the LAN (avahi-daemon advertises `<hostname>.local`).
/// Lets Settings show users where to reach this node over HTTPS for
/// features (mic/camera access) that require a secure context.
pub(in crate::api::rpc) async fn handle_system_get_hostname(&self) -> Result<serde_json::Value> {
let hostname = tokio::fs::read_to_string("/etc/hostname")
.await
.map(|s| s.trim().to_string())
.unwrap_or_else(|_| "archipelago".to_string());
Ok(serde_json::json!({
"hostname": hostname,
"mdns_hostname": format!("{hostname}.local"),
}))
}
/// system.stats — CPU usage, RAM used/total, disk used/total, uptime, load average
pub(in crate::api::rpc) async fn handle_system_stats(&self) -> Result<serde_json::Value> {
debug!("Getting system stats");
@ -319,6 +394,63 @@ async fn set_system_hostname(hostname: &str) -> Result<()> {
Ok(())
}
/// Regenerate the self-signed HTTPS cert (`/etc/archipelago/ssl/archipelago.{crt,key}`)
/// with a SAN covering `hostname`, `hostname.local`, `localhost`, and 127.0.0.1, then
/// reload nginx so it picks up the new cert. Still self-signed (browsers will warn
/// on first visit regardless), but avoids stacking a hostname-mismatch warning on
/// top once a node has been renamed away from the install-time default.
async fn regenerate_tls_cert(hostname: &str) -> Result<()> {
let subj = format!("/C=XX/ST=Bitcoin/L=Node/O=Archipelago/CN={hostname}");
let san = format!("subjectAltName=DNS:{hostname},DNS:{hostname}.local,DNS:localhost,IP:127.0.0.1");
let output = tokio::process::Command::new("/usr/bin/sudo")
.args([
"-n",
"/usr/bin/openssl",
"req",
"-x509",
"-nodes",
"-days",
"3650",
"-newkey",
"rsa:2048",
"-keyout",
"/etc/archipelago/ssl/archipelago.key",
"-out",
"/etc/archipelago/ssl/archipelago.crt",
"-subj",
&subj,
"-addext",
&san,
])
.output()
.await
.context("Failed to run openssl")?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr).trim().to_string();
anyhow::bail!(
"{}",
if stderr.is_empty() {
"openssl cert regen failed".to_string()
} else {
stderr
}
);
}
let reload = tokio::process::Command::new("/usr/bin/sudo")
.args(["-n", "/usr/bin/systemctl", "reload", "nginx"])
.output()
.await
.context("Failed to reload nginx")?;
if !reload.status.success() {
let stderr = String::from_utf8_lossy(&reload.stderr).trim().to_string();
anyhow::bail!("nginx reload failed: {}", stderr);
}
Ok(())
}
impl RpcHandler {
/// system.factory-reset — Wipe all user data, remove containers, and restart.
/// Only preserves the data_dir itself (recreated empty on restart).


@ -16,12 +16,11 @@ impl RpcHandler {
// Spendable Fedimint balance too, so callers (e.g. the pay-for-file
// pre-check) see funds available across BOTH backends (#3). Best-effort:
// if fmcd isn't installed/joined this is just 0, never an error.
-let fedimint_sats = match fedimint_client::FedimintClient::from_node(&self.config.data_dir)
-.await
-{
-Ok(client) => client.total_balance_sats().await.unwrap_or(0),
-Err(_) => 0,
-};
+let fedimint_sats =
+match fedimint_client::FedimintClient::from_node(&self.config.data_dir).await {
+Ok(client) => client.total_balance_sats().await.unwrap_or(0),
+Err(_) => 0,
+};
Ok(serde_json::json!({
// `balance_sats` stays Cashu-only for back-compat; `total_sats` is the
// spendable amount across Cashu + Fedimint.


@ -101,19 +101,45 @@ fn friendly_transient_error(has_cached_state: bool, err_msg: &str) -> String {
.trim_end_matches('.');
let lower = detail.to_lowercase();
let state = if lower.contains("verifying blocks") {
-"verifying blocks after restart"
+Some("verifying blocks after restart")
+} else if lower.contains("connection reset") {
+Some("starting up and not yet accepting RPC connections")
} else if lower.contains("connection refused") || lower.contains("tcp connect error") {
-"waiting for the Bitcoin RPC listener"
+Some("waiting for the Bitcoin RPC listener")
} else if lower.contains("timed out") || lower.contains("timeout") {
-"busy and not answering RPC before the timeout"
+Some("busy and not answering RPC before the timeout")
} else {
-"starting or busy syncing"
+None
};
-if has_cached_state {
-format!("Bitcoin node is {state}; showing last known state and retrying. Detail: {detail}")
+// Recognized transient causes get a clean human sentence only — the raw
+// transport error (URLs, repeated "os error 104" chains) is operator
+// noise that was ending up verbatim on the app card. Unrecognized errors
+// keep a bounded detail so a genuinely new failure stays diagnosable.
+let (state, detail) = match state {
+Some(state) => (state, None),
+None => (
+"starting or busy syncing",
+Some(if detail.len() > 120 {
+let mut cut = 120;
+while !detail.is_char_boundary(cut) {
+cut -= 1;
+}
+format!("{}", &detail[..cut])
+} else {
+detail.to_string()
+}),
+),
+};
+let base = if has_cached_state {
+format!("Bitcoin node is {state}; showing last known state and retrying.")
} else {
-format!("Bitcoin node is {state}; retrying automatically. Detail: {detail}")
+format!("Bitcoin node is {state}; retrying automatically.")
+};
+match detail {
+Some(detail) => format!("{base} Detail: {detail}"),
+None => base,
}
}
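The bounded-detail branch above backs off to a UTF-8 character boundary before slicing, because slicing a `str` in the middle of a multi-byte character panics. The same step as a standalone sketch, where `truncate_at_char_boundary` is a hypothetical name:

```rust
/// Hypothetical helper: returns the longest prefix of `s` that is at most
/// `max_bytes` long and ends on a character boundary.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut cut = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    &s[..cut]
}
```

For example, cutting `"héllo"` at two bytes falls inside the two-byte `é`, so the helper backs off by one byte and returns `"h"`.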
@ -278,4 +304,39 @@ mod tests {
assert!(msg.contains("busy and not answering RPC before the timeout"));
}
#[test]
fn connection_reset_gets_clean_message_without_raw_detail() {
// The exact string a fresh install showed on the app card: the raw
// reqwest chain (URL + repeated "os error 104") must not surface.
let msg = friendly_transient_error(
false,
"getblockchaininfo: Bitcoin RPC request failed: error sending request for url (http://127.0.0.1:8332/): connection error: Connection reset by peer (os error 104): connection error: Connection reset by peer (os error 104): Connection reset by peer (os error 104)",
);
assert!(msg.contains("starting up and not yet accepting RPC connections"));
assert!(!msg.contains("os error"));
assert!(!msg.contains("127.0.0.1"));
assert!(!msg.contains("Detail:"));
}
#[test]
fn recognized_causes_omit_detail_entirely() {
for raw in [
"x: Connection refused (os error 111)",
"x: operation timed out",
r#"x: {"error":{"code":-28,"message":"Verifying blocks..."}}"#,
] {
let msg = friendly_transient_error(false, raw);
assert!(!msg.contains("Detail:"), "leaked detail for: {raw}");
}
}
#[test]
fn unknown_errors_keep_bounded_detail() {
let long = format!("weird new failure {}", "x".repeat(300));
let msg = friendly_transient_error(false, &long);
assert!(msg.contains("Detail: weird new failure"));
assert!(msg.len() < 260);
}
}


@ -39,6 +39,16 @@ const KIOSK_LAUNCHER: &str =
const KIOSK_SERVICE_PATH: &str = "/etc/systemd/system/archipelago-kiosk.service";
const KIOSK_LAUNCHER_PATH: &str = "/usr/local/bin/archipelago-kiosk-launcher";
// Journald log-volume policy (size cap + per-service rate limit). Fresh ISOs
// write the identical file at build time (image-recipe/_archived/
// build-auto-installer-iso.sh); this heals already-deployed nodes via OTA.
// A fresh node produced >1 GB/day of journal (bitcoind IBD console spam plus
// debug-level backend logging) — the cap bounds disk use and the rate limit
// keeps one chatty service from drowning everything else.
const JOURNALD_DROPIN: &str =
include_str!("../../../image-recipe/configs/journald-archipelago.conf");
const JOURNALD_DROPIN_PATH: &str = "/etc/systemd/journald.conf.d/10-archipelago-persistent.conf";
const NGINX_CONF_PATH: &str = "/etc/nginx/sites-available/archipelago";
const NGINX_ENABLED_CONF_PATH: &str = "/etc/nginx/sites-enabled/archipelago";
/// Per-app proxy snippet included by the HTTPS (:443) server block. Carries its
@ -120,6 +130,11 @@ pub async fn ensure_doctor_installed() {
Ok(false) => debug!("Bitcoin RPC bind settings already usable"),
Err(e) => warn!("Bitcoin RPC repair failed (non-fatal): {:#}", e),
}
match run_journald_dropin().await {
Ok(true) => info!("Installed journald log-volume policy drop-in"),
Ok(false) => debug!("journald log-volume policy already in place"),
Err(e) => warn!("journald drop-in bootstrap failed (non-fatal): {:#}", e),
}
match tighten_secrets_dir().await {
Ok(n) if n > 0 => info!(tightened = n, "Tightened mode on secret files"),
Ok(_) => debug!("Secrets directory already at expected mode"),
@ -408,6 +423,14 @@ ensure_line() {
ensure_line server=1
ensure_line rpcallowip=0.0.0.0/0
ensure_line listen=1
# Log-volume fix: printtoconsole=1 duplicated every log line (incl. per-block
# IBD "UpdateTip" spam) into journald via conmon on top of the datadir
# debug.log bitcoind already writes. Console off; debug.log stays (bitcoind
# self-shrinks it on restart).
if grep -q '^printtoconsole=1$' "$conf"; then
sed -i 's/^printtoconsole=1$/printtoconsole=0/' "$conf"
changed=1
fi
[ "$changed" -eq 0 ] && exit 0
exit 2
"#;
@ -428,6 +451,44 @@ exit 2
}
}
/// Install the journald log-volume policy drop-in (JOURNALD_DROPIN) so nodes
/// deployed before the ISO shipped it get the size cap + rate limit via OTA.
/// Idempotent; restarts journald only when the file actually changed (safe:
/// the sockets are held by pid1, so at most a few messages queue briefly).
async fn run_journald_dropin() -> Result<bool> {
// Same dev-box guards as the doctor bootstrap: never touch /etc on
// contributors' laptops (symlinked or absent /home/archipelago/archy).
let home_archy = Path::new("/home/archipelago/archy");
if fs::symlink_metadata(home_archy)
.await
.map(|m| m.file_type().is_symlink())
.unwrap_or(false)
{
debug!("/home/archipelago/archy is a symlink — skipping journald bootstrap (dev box)");
return Ok(false);
}
if fs::metadata(home_archy).await.is_err() {
debug!("/home/archipelago/archy missing — skipping journald bootstrap");
return Ok(false);
}
let dropin_dir = "/etc/systemd/journald.conf.d";
let status = host_sudo(&["mkdir", "-p", dropin_dir])
.await
.with_context(|| format!("mkdir {}", dropin_dir))?;
if !status.success() {
anyhow::bail!("mkdir {} exited with {}", dropin_dir, status);
}
let changed = write_root_if_needed(JOURNALD_DROPIN_PATH, JOURNALD_DROPIN).await?;
if changed {
if let Err(e) = host_sudo(&["systemctl", "restart", "systemd-journald"]).await {
warn!("journald restart after drop-in update failed: {:#}", e);
}
}
Ok(changed)
}
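
The idempotent "write only if needed, restart only if changed" pattern used by `run_journald_dropin` can be sketched with the standard library alone. `write_if_changed` below is a hypothetical stand-in for the diff's `write_root_if_needed`, whose body is not shown here and which writes with root privileges; this sketch writes as the current user and is an illustration under that assumption, not the project's implementation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to `path` only when the file is missing or differs.
/// Returns Ok(true) when a write happened, so the caller can decide whether
/// a service restart is warranted. Hypothetical helper, not the project's API.
fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Already up to date: report "unchanged" and touch nothing.
        Ok(existing) if existing == contents => return Ok(false),
        // Present but different: fall through and rewrite.
        Ok(_) => {}
        // Absent: fall through and create.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        // Anything else (permissions, invalid UTF-8, ...) is a real error.
        Err(e) => return Err(e),
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::write(path, contents)?;
    Ok(true)
}
```

Returning the `changed` flag is what lets the caller skip the journald restart on every run after the first.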
async fn run() -> Result<bool> { async fn run() -> Result<bool> {
// Dev-box guard: on contributors' laptops `/home/archipelago/archy` is // Dev-box guard: on contributors' laptops `/home/archipelago/archy` is
// typically a symlink into the git checkout, and writing through it // typically a symlink into the git checkout, and writing through it


@ -66,7 +66,7 @@ pub struct Config {
/// through Quadlet (`.container` units in ~/.config/containers/systemd /// through Quadlet (`.container` units in ~/.config/containers/systemd
/// + systemctl --user start) instead of `podman create + start`. Default /// + systemctl --user start) instead of `podman create + start`. Default
/// off so the legacy path stays the production path until the harness /// off so the legacy path stays the production path until the harness
/// at tests/lifecycle/run-20x.sh has gone green against the new path /// at tests/lifecycle/run-gate.sh has gone green against the new path
/// on .228 + .198. See `project_v1_7_52_phase3_quadlet_design`. /// on .228 + .198. See `project_v1_7_52_phase3_quadlet_design`.
#[serde(default)] #[serde(default)]
pub use_quadlet_backends: bool, pub use_quadlet_backends: bool,
@ -487,7 +487,7 @@ mod tests {
#[test] #[test]
fn test_config_use_quadlet_backends_defaults_off() { fn test_config_use_quadlet_backends_defaults_off() {
// Phase 3.2 of v1.7.52 — the new path stays gated until the 20× // Phase 3.2 of v1.7.52 — the new path stays gated until the 5×
// harness goes green on .228 and .198. Flipping this default // harness goes green on .228 and .198. Flipping this default
// ahead of that would route every backend install through code // ahead of that would route every backend install through code
// we haven't fleet-validated yet. // we haven't fleet-validated yet.


@ -86,6 +86,12 @@ pub struct AppCatalogEntry {
/// Optional human-readable changelog lines for this version. /// Optional human-readable changelog lines for this version.
#[serde(default, skip_serializing_if = "Vec::is_empty")] #[serde(default, skip_serializing_if = "Vec::is_empty")]
pub changelog: Vec<String>, pub changelog: Vec<String>,
/// Multi-version support (`docs/bitcoin-multi-version-design.md`): the bounded
/// set of versions a user may install or switch to for this app. Empty for
/// single-version apps; `version`/`image` above remain the default/latest for
/// back-compat. Old nodes ignore this field (no `deny_unknown_fields`).
#[serde(default, skip_serializing_if = "Vec::is_empty")]
pub versions: Vec<CatalogVersion>,
/// Full app manifest, embedded so the app installs from the registry alone — /// Full app manifest, embedded so the app installs from the registry alone —
/// no OTA-shipped `apps/<id>/manifest.yml`. Carried as the raw value the /// no OTA-shipped `apps/<id>/manifest.yml`. Carried as the raw value the
/// publisher signed (so it stays part of the verified preimage) and /// publisher signed (so it stays part of the verified preimage) and
@ -97,6 +103,29 @@ pub struct AppCatalogEntry {
pub manifest: Option<serde_json::Value>, pub manifest: Option<serde_json::Value>,
} }
/// One selectable version in an app's `versions[]` list. The catalog carries a
/// curated, bounded set (current + a few majors back); see
/// `docs/bitcoin-multi-version-design.md` §3 Phase 1.
#[derive(Debug, Clone, Serialize, Deserialize, Default, PartialEq, Eq)]
pub struct CatalogVersion {
/// User-facing + tag-matching version string (e.g. `31.0`,
/// `29.3.knots20260508`). Treated as the image tag.
pub version: String,
/// Concrete image reference for this version. When omitted the orchestrator
/// falls back to composing `<default-repo>:<version>` from the entry image.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub image: Option<String>,
/// Marks the default / latest version pre-selected in the install modal.
#[serde(default, skip_serializing_if = "std::ops::Not::not")]
pub default: bool,
/// Deprecated versions are still installable but badged in the UI.
#[serde(default, skip_serializing_if = "std::ops::Not::not")]
pub deprecated: bool,
/// Optional end-of-life date (YYYY-MM-DD), surfaced in the UI.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub eol: Option<String>,
}
/// Read-side cache file search order. Mirrors `image_versions.rs`: the running /// Read-side cache file search order. Mirrors `image_versions.rs`: the running
/// daemon's data dir first (via env for dev), then the canonical runtime path. /// daemon's data dir first (via env for dev), then the canonical runtime path.
fn cache_paths() -> Vec<PathBuf> { fn cache_paths() -> Vec<PathBuf> {
@ -187,6 +216,66 @@ pub fn catalog_manifest_values() -> Vec<(String, serde_json::Value)> {
.collect() .collect()
} }
/// The catalog's default/latest version string for an app (the top-level
/// `version` field), if covered. Used to decide whether an install-time
/// selection should pin (older) or track-latest (default).
pub fn catalog_default_version(app_id: &str) -> Option<String> {
entry_for(app_id)
.map(|e| e.version)
.filter(|v| !v.is_empty())
}
/// Curated, selectable versions for an app per the remote catalog. Empty when
/// the catalog is absent or the app is single-version. The default entry (if
/// any) sorts first so callers can pre-select it.
pub fn catalog_versions(app_id: &str) -> Vec<CatalogVersion> {
let mut versions = entry_for(app_id).map(|e| e.versions).unwrap_or_default();
versions.sort_by_key(|v| !v.default); // default first, stable otherwise
versions
}
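
The "default first, stable otherwise" ordering in `catalog_versions` relies on two standard-library facts: `false < true` for `bool`, and `sort_by_key` being a stable sort. A minimal sketch, using a trimmed-down hypothetical `Version` struct in place of `CatalogVersion`:

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct Version {
    version: String,
    default: bool,
}

/// Move the default entry (if any) to the front without disturbing the
/// relative order of the remaining entries.
fn default_first(mut versions: Vec<Version>) -> Vec<Version> {
    // `!default` is `false` for the default entry, and `false` sorts before
    // `true`. `sort_by_key` is stable, so ties keep their catalog order.
    versions.sort_by_key(|v| !v.default);
    versions
}
```

With an unstable sort (`sort_unstable_by_key`) the non-default entries could be reordered, which would change the order the UI lists them in.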
/// Resolve the image for a specific selectable `version` of `app_id`, validated
/// same-repo against `manifest_image` (the same guard `catalog_image_override`
/// applies). The version's explicit `image` is used when present; otherwise the
/// repo of `manifest_image` is retagged with `version`. Returns `None` when the
/// version is unknown or would point at a different repository — the caller then
/// keeps the default resolution and the switch is refused upstream.
pub fn catalog_image_for_version(
app_id: &str,
version: &str,
manifest_image: &str,
) -> Option<String> {
let entry = catalog_versions(app_id)
.into_iter()
.find(|v| v.version == version)?;
let manifest_repo =
crate::container::image_versions::image_without_registry_or_tag(manifest_image);
let candidate = match entry.image {
Some(img) => img,
None => {
// Retag the manifest's full registry/repo with the requested version.
let repo = manifest_image
.rsplit_once(':')
// keep registry:port colons intact: only strip a tag after the last '/'
.filter(|(left, _)| left.contains('/'))
.map(|(left, _)| left)
.unwrap_or(manifest_image);
format!("{repo}:{version}")
}
};
let same_repo = crate::container::image_versions::image_without_registry_or_tag(&candidate)
== manifest_repo;
if same_repo {
Some(candidate)
} else {
warn!(
"app-catalog: ignoring version {} for {} — repo mismatch (candidate={}, manifest={})",
version, app_id, candidate, manifest_image
);
None
}
}
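
The retagging fallback in `catalog_image_for_version` has to tell a tag colon apart from a registry-port colon. The sketch below is a variant, not the code in the diff: it splits on the last `/` first, so it also handles unqualified names such as `bitcoin:29.0`, which the `left.contains('/')` filter above appears to leave untagged-stripped. The function name `retag` is hypothetical, and digest references (`name@sha256:...`) are deliberately out of scope.

```rust
/// Replace (or add) the tag on `image` with `version`.
/// Only the final path segment is searched for a tag, so a colon in
/// `registry:port` is never mistaken for one. Digest references are not
/// handled in this sketch.
fn retag(image: &str, version: &str) -> String {
    // Separate "registry[:port]/org" from the final "name[:tag]" segment.
    let (prefix, last) = match image.rsplit_once('/') {
        Some((prefix, last)) => (Some(prefix), last),
        None => (None, image),
    };
    // Drop an existing tag from the final segment, if there is one.
    let name = last.split_once(':').map(|(name, _)| name).unwrap_or(last);
    match prefix {
        Some(prefix) => format!("{prefix}/{name}:{version}"),
        None => format!("{name}:{version}"),
    }
}
```

The same-repo comparison that follows in the diff still applies to the result; retagging alone does not validate where the image comes from.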
/// Image override for the orchestrator's install/upgrade path. Returns the /// Image override for the orchestrator's install/upgrade path. Returns the
/// catalog's primary image for `app_id` ONLY when it refers to the same /// catalog's primary image for `app_id` ONLY when it refers to the same
/// repository as the manifest's current image — a guard so a catalog typo can /// repository as the manifest's current image — a guard so a catalog typo can
@ -214,6 +303,12 @@ pub fn catalog_image_override(app_id: &str, manifest_image: &str) -> Option<Stri
/// newer catalog, nor vice-versa). Falls back to the deployed pin only when the /// newer catalog, nor vice-versa). Falls back to the deployed pin only when the
/// catalog is missing or doesn't cover the app. /// catalog is missing or doesn't cover the app.
pub fn available_update_for_app(app_id: &str, running_image: &str) -> Option<String> { pub fn available_update_for_app(app_id: &str, running_image: &str) -> Option<String> {
// A runner-pinned version is an explicit "stay here" choice — never advertise
// an update over it (design §3 Phase 3). Auto-update, when enabled, ignores
// the pin and is driven by the catalog tick, not this badge.
if crate::container::version_config::pinned_version(app_id).is_some() {
return None;
}
if let Some(catalog_image) = catalog_primary_image(app_id) { if let Some(catalog_image) = catalog_primary_image(app_id) {
// Catalog covers this app with a concrete image -> authoritative. // Catalog covers this app with a concrete image -> authoritative.
return crate::container::image_versions::available_update_for_images( return crate::container::image_versions::available_update_for_images(


@ -96,6 +96,35 @@ impl BootReconciler {
} }
} }
// Companion self-heal runs on its OWN cadence, decoupled from the
// per-app reconcile pass. On a heavily loaded node `reconcile_existing`
// over dozens of apps can take well over a minute, which would delay a
// companion-unit repair (deleted/lost unit file) past any reasonable
// safety window. Detecting + rewriting a companion unit is cheap, so it
// gets a dedicated `interval` loop. The handle is aborted when the main
// loop exits (shutdown uses `notify_one`, so we must NOT add a second
// waiter on `self.shutdown` — it would steal the single wake permit).
let companion_handle = if self.companion_stage {
let orchestrator = self.orchestrator.clone();
let interval = self.interval;
Some(tokio::spawn(async move {
loop {
let installed = orchestrator.manifest_ids().await;
for (companion, err) in crate::container::companion::reconcile(&installed).await
{
tracing::warn!(
companion = %companion,
error = %err,
"companion reconcile failed"
);
}
time::sleep(interval).await;
}
}))
} else {
None
};
// Initial pass: no delay. // Initial pass: no delay.
self.tick().await; self.tick().await;
@ -111,23 +140,15 @@ impl BootReconciler {
} }
} }
} }
if let Some(handle) = companion_handle {
handle.abort();
}
} }
async fn tick(&self) { async fn tick(&self) {
let report = self.orchestrator.reconcile_existing().await; let report = self.orchestrator.reconcile_existing().await;
Self::log_report(&report); Self::log_report(&report);
if !self.companion_stage {
return;
}
let installed = self.orchestrator.manifest_ids().await;
for (companion, err) in crate::container::companion::reconcile(&installed).await {
tracing::warn!(
companion = %companion,
error = %err,
"companion reconcile failed"
);
}
} }
fn log_report(report: &ReconcileReport) { fn log_report(report: &ReconcileReport) {
@ -273,7 +294,7 @@ mod tests {
} }
async fn wait_for_status_calls(rt: &CountingRuntime, expected: u32) -> u32 { async fn wait_for_status_calls(rt: &CountingRuntime, expected: u32) -> u32 {
for _ in 0..100 { for _ in 0..1000 {
let count = rt.status_call_count(); let count = rt.status_call_count();
if count >= expected { if count >= expected {
return count; return count;
@ -320,11 +341,10 @@ mod tests {
assert_eq!(wait_for_status_calls(&rt, 1).await, 1); assert_eq!(wait_for_status_calls(&rt, 1).await, 1);
tokio::time::sleep(Duration::from_millis(20)).await; tokio::time::sleep(Duration::from_millis(20)).await;
wait_for_status_calls(&rt, 2).await; let count = wait_for_status_calls(&rt, 2).await;
assert_eq!( assert!(
rt.status_call_count(), count >= 2,
2,
"a second reconcile pass should fire after one interval" "a second reconcile pass should fire after one interval"
); );
@ -382,9 +402,7 @@ mod tests {
assert!(first >= 1, "initial pass should have touched the runtime"); assert!(first >= 1, "initial pass should have touched the runtime");
tokio::time::sleep(Duration::from_millis(20)).await; tokio::time::sleep(Duration::from_millis(20)).await;
tokio::task::yield_now().await; let second = wait_for_status_calls(&rt, first + 1).await;
tokio::task::yield_now().await;
let second = rt.status_call_count();
assert!( assert!(
second > first, second > first,
"loop should have fired a second pass after the interval" "loop should have fired a second pass after the interval"


@ -285,7 +285,15 @@ async fn ensure_image_present(spec: &CompanionSpec) -> Result<String> {
async fn image_exists(image: &str) -> bool { async fn image_exists(image: &str) -> bool {
let mut cmd = Command::new("podman"); let mut cmd = Command::new("podman");
cmd.args(["image", "inspect", image]); // Only the exit status matters. WITHOUT a `--format`, `podman image inspect`
// prints the image's full multi-KB manifest JSON; `.status()` inherits the
// service's stdout, so on a hit that whole blob lands in the journal — once
// per companion image, every reconcile pass. That flood spikes journald +
// IO and starves the async runtime (UI websocket then drops → "connection
// lost"/reconnect). Discard the child's stdout/stderr; we read neither.
cmd.args(["image", "inspect", image])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null());
match tokio::time::timeout(COMPANION_IMAGE_CHECK_TIMEOUT, cmd.status()).await { match tokio::time::timeout(COMPANION_IMAGE_CHECK_TIMEOUT, cmd.status()).await {
Ok(Ok(status)) => status.success(), Ok(Ok(status)) => status.success(),
Ok(Err(err)) => { Ok(Err(err)) => {
@ -328,7 +336,10 @@ async fn image_created_unix(image: &str) -> Option<i64> {
if !out.status.success() { if !out.status.success() {
return None; return None;
} }
String::from_utf8_lossy(&out.stdout).trim().parse::<i64>().ok() String::from_utf8_lossy(&out.stdout)
.trim()
.parse::<i64>()
.ok()
} }
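
The `image_exists` fix above comes down to one rule: when only the exit status matters, do not let the child inherit the parent's stdout or stderr. A minimal synchronous sketch with `std::process` (the diff uses the async `tokio` equivalent wrapped in a timeout, which this sketch omits); `succeeds_quietly` is a hypothetical name:

```rust
use std::process::{Command, Stdio};

/// Run `program args...` and report only whether it exited successfully.
/// All three standard streams go to the null device, so nothing the child
/// prints can reach the parent's own output (and from there the journal).
fn succeeds_quietly(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn failure (for example, the binary is missing) counts as "no".
        .unwrap_or(false)
}
```

A call such as `succeeds_quietly("podman", &["image", "inspect", "some-image"])` would mirror the check in the diff, assuming `podman` is on the `PATH`.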
/// Newest modification time (Unix seconds) across all files under `dir`, /// Newest modification time (Unix seconds) across all files under `dir`,


@ -382,7 +382,7 @@ fn get_app_metadata(app_id: &str) -> AppMetadata {
"lnd" | "lightning-stack" => AppMetadata { "lnd" | "lightning-stack" => AppMetadata {
title: "LND".to_string(), title: "LND".to_string(),
description: "Lightning Network Daemon".to_string(), description: "Lightning Network Daemon".to_string(),
icon: "/assets/img/app-icons/lnd.svg".to_string(), icon: "/assets/img/app-icons/lnd.png".to_string(),
repo: "https://github.com/lightningnetwork/lnd".to_string(), repo: "https://github.com/lightningnetwork/lnd".to_string(),
tier: "", tier: "",
}, },
@ -396,7 +396,7 @@ fn get_app_metadata(app_id: &str) -> AppMetadata {
"electrumx" | "mempool-electrs" | "electrs" => AppMetadata { "electrumx" | "mempool-electrs" | "electrs" => AppMetadata {
title: "ElectrumX".to_string(), title: "ElectrumX".to_string(),
description: "ElectrumX server — full Electrum protocol indexer for Bitcoin. Powers Mempool and Electrum wallets.".to_string(), description: "ElectrumX server — full Electrum protocol indexer for Bitcoin. Powers Mempool and Electrum wallets.".to_string(),
icon: "/assets/img/app-icons/electrs.svg".to_string(), icon: "/assets/img/app-icons/electrumx.png".to_string(),
repo: "https://github.com/spesmilo/electrumx".to_string(), repo: "https://github.com/spesmilo/electrumx".to_string(),
tier: "", tier: "",
}, },
@ -677,30 +677,76 @@ pub async fn read_tor_address(app_id: &str) -> Option<String> {
.filter(|s| s.ends_with(".onion") && !s.is_empty()) .filter(|s| s.ends_with(".onion") && !s.is_empty())
} }
/// Container-side ports that are essentially never a web UI, even when
/// published alongside one — e.g. gitea publishes SSH (`2222->22`) before its
/// web port (`3001->3000`), and podman's port list order isn't guaranteed to
/// put the UI port first. Skipping these lets launch-URL guessing work for
/// any future multi-port app without a per-app static override.
const NON_HTTP_CONTAINER_PORTS: &[&str] = &["22", "21", "3306", "5432", "6379", "27017"];
fn extract_lan_address(ports: &[String]) -> Option<String> { fn extract_lan_address(ports: &[String]) -> Option<String> {
let mut first_candidate = None;
for port_str in ports { for port_str in ports {
// Parse port strings like "0.0.0.0:18443->18443/tcp" or "0.0.0.0:18443-18444->18443-18444/tcp" // Parse port strings like "0.0.0.0:18443->18443/tcp" or "0.0.0.0:18443-18444->18443-18444/tcp"
if let Some(public_part) = port_str.split("->").next() { let Some(public_part) = port_str.split("->").next() else {
if let Some(port_part) = public_part.split(':').nth(1) { continue;
// Extract just the first port if it's a range (e.g., "18443-18444" -> "18443") };
let single_port = port_part.split('-').next().unwrap_or(port_part); let Some(port_part) = public_part.split(':').nth(1) else {
return Some(format!("http://localhost:{}", single_port)); continue;
} };
// Extract just the first port if it's a range (e.g., "18443-18444" -> "18443")
let host_port = port_part.split('-').next().unwrap_or(port_part);
let candidate = format!("http://localhost:{}", host_port);
if first_candidate.is_none() {
first_candidate = Some(candidate.clone());
} }
let container_port = port_str
.split("->")
.nth(1)
.and_then(|s| s.split('/').next())
.map(|s| s.split('-').next().unwrap_or(s));
if container_port.is_some_and(|p| NON_HTTP_CONTAINER_PORTS.contains(&p)) {
continue;
}
return Some(candidate);
} }
None // Nothing looked HTTP-like — fall back to whatever was published first
// rather than reporting no launch URL at all.
first_candidate
} }
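
The port strings handled by `extract_lan_address` have the shape `host:hostport[-range]->containerport[-range]/proto`. The sketch below isolates just the parsing step into a hypothetical `parse_mapping` helper. It is a variant of the diff's logic: it takes the host port after the last `:` rather than the second `:`-separated field, an assumption made here so that bracketed IPv6 bind addresses also parse.

```rust
/// Parse one published-port string into (host_port, container_port).
/// For ranges, only the first port of each side is returned.
fn parse_mapping(port_str: &str) -> Option<(u16, u16)> {
    let (public, container) = port_str.split_once("->")?;
    // Host side: everything after the last ':' (or the whole side if the
    // bind address was omitted), then the first port of a possible range.
    let host = public.rsplit_once(':').map(|(_, p)| p).unwrap_or(public);
    let host = host.split('-').next()?;
    // Container side: strip "/tcp" or "/udp", then take the first port.
    let cont = container.split('/').next()?.split('-').next()?;
    Some((host.parse().ok()?, cont.parse().ok()?))
}
```

With the container port available as a number, the non-HTTP filter becomes a lookup against a list such as `[22, 21, 3306, 5432, 6379, 27017]`.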
/// netbird's dashboard launch URL: HTTPS on 8087 (the proxy terminates TLS —
/// the dashboard needs a secure context for OIDC PKCE, issue #15) at the node's
/// primary host IP so it's reachable from the LAN. Manifest-driven netbird no
/// longer writes `dashboard.env`, so this is derived from host facts (the same
/// `{{HOST_IP}}` the orchestrator bakes into the cert/config); it falls back to
/// the static localhost mapping when the host IP can't be read. URL shape is
/// identical to the legacy installer's, so the existing https reachability
/// wrapper still applies.
async fn netbird_configured_launch_url() -> Option<String> { async fn netbird_configured_launch_url() -> Option<String> {
let env = tokio::fs::read_to_string("/var/lib/archipelago/netbird/dashboard.env") if let Some(ip) = first_host_ip().await {
return Some(format!("https://{ip}:8087"));
}
PodmanClient::lan_address_for("netbird")
}
/// First address from `hostname -I` — the node's primary host IP. Mirrors the
/// orchestrator's `detect_host_ip` so launch URLs match the cert/config the
/// orchestrator renders for `{{HOST_IP}}`.
async fn first_host_ip() -> Option<String> {
let out = tokio::process::Command::new("hostname")
.arg("-I")
.output()
.await .await
.ok()?; .ok()?;
env.lines() if !out.status.success() {
.find_map(|line| line.strip_prefix("NETBIRD_MGMT_API_ENDPOINT=")) return None;
.map(str::trim) }
.filter(|s| !s.is_empty()) String::from_utf8_lossy(&out.stdout)
.split_whitespace()
.next()
.map(ToOwned::to_owned) .map(ToOwned::to_owned)
.or_else(|| PodmanClient::lan_address_for("netbird"))
} }
async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Option<String> { async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Option<String> {
@ -837,3 +883,54 @@ mod launch_url_port_tests {
assert_eq!(launch_url_port("http://localhost/"), None); assert_eq!(launch_url_port("http://localhost/"), None);
} }
} }
#[cfg(test)]
mod extract_lan_address_tests {
use super::extract_lan_address;
#[test]
fn skips_ssh_port_when_web_port_is_published() {
// gitea: SSH published before the web port, in podman's list order.
let ports = vec![
"0.0.0.0:2222->22/tcp".to_string(),
"0.0.0.0:3001->3000/tcp".to_string(),
];
assert_eq!(
extract_lan_address(&ports).as_deref(),
Some("http://localhost:3001")
);
}
#[test]
fn falls_back_to_first_port_when_nothing_looks_like_http() {
let ports = vec!["0.0.0.0:2222->22/tcp".to_string()];
assert_eq!(
extract_lan_address(&ports).as_deref(),
Some("http://localhost:2222")
);
}
#[test]
fn single_http_port_still_resolves() {
let ports = vec!["0.0.0.0:8096->8096/tcp".to_string()];
assert_eq!(
extract_lan_address(&ports).as_deref(),
Some("http://localhost:8096")
);
}
#[test]
fn handles_port_ranges() {
let ports = vec!["0.0.0.0:18443-18444->18443-18444/tcp".to_string()];
assert_eq!(
extract_lan_address(&ports).as_deref(),
Some("http://localhost:18443")
);
}
#[test]
fn no_ports_returns_none() {
let ports: Vec<String> = vec![];
assert_eq!(extract_lan_address(&ports), None);
}
}


@ -85,12 +85,7 @@ pub async fn run_post_install(manifest: &AppManifest, container_name: &str, data
} }
} }
async fn run_step( async fn run_step(step: &HookStep, container: &str, app_id: &str, data_dir: &Path) -> Result<()> {
step: &HookStep,
container: &str,
app_id: &str,
data_dir: &Path,
) -> Result<()> {
match step { match step {
HookStep::Exec { exec } => { HookStep::Exec { exec } => {
let mut args: Vec<&str> = Vec::with_capacity(exec.len() + 2); let mut args: Vec<&str> = Vec::with_capacity(exec.len() + 2);


@ -43,7 +43,11 @@ pub enum EnsureOutcome {
Unchanged, Unchanged,
} }
pub async fn ensure_config(paths: &EnsurePaths, rpc_pass: &str) -> Result<EnsureOutcome> { pub async fn ensure_config(
paths: &EnsurePaths,
rpc_pass: &str,
bitcoin_host: &str,
) -> Result<EnsureOutcome> {
fs::create_dir_all(&paths.data_dir) fs::create_dir_all(&paths.data_dir)
.await .await
.with_context(|| format!("creating {}", paths.data_dir.display()))?; .with_context(|| format!("creating {}", paths.data_dir.display()))?;
@ -52,7 +56,7 @@ pub async fn ensure_config(paths: &EnsurePaths, rpc_pass: &str) -> Result<Ensure
let existing = fs::read_to_string(&paths.conf_path) let existing = fs::read_to_string(&paths.conf_path)
.await .await
.with_context(|| format!("reading {}", paths.conf_path.display()))?; .with_context(|| format!("reading {}", paths.conf_path.display()))?;
if has_required_lnd_flags(&existing, rpc_pass) { if has_required_lnd_flags(&existing, rpc_pass, bitcoin_host) {
return Ok(EnsureOutcome::Unchanged); return Ok(EnsureOutcome::Unchanged);
} }
} }
@ -68,12 +72,11 @@ restlisten=0.0.0.0:8080\n\
bitcoin.active=true\n\ bitcoin.active=true\n\
bitcoin.mainnet=true\n\ bitcoin.mainnet=true\n\
bitcoin.node=bitcoind\n\ bitcoin.node=bitcoind\n\
bitcoind.rpchost=bitcoin-knots:8332\n\ bitcoind.rpchost={bitcoin_host}:8332\n\
bitcoind.rpcuser=archipelago\n\ bitcoind.rpcuser=archipelago\n\
bitcoind.rpcpass={}\n\ bitcoind.rpcpass={rpc_pass}\n\
bitcoind.rpcpolling=true\n\ bitcoind.rpcpolling=true\n\
bitcoind.estimatemode=ECONOMICAL\n", bitcoind.estimatemode=ECONOMICAL\n"
rpc_pass
); );
write_config_atomically(paths, &conf).await?; write_config_atomically(paths, &conf).await?;
@ -653,13 +656,14 @@ fn shell_quote(s: &str) -> String {
s.replace('\'', "'\\''") s.replace('\'', "'\\''")
} }
fn has_required_lnd_flags(conf: &str, rpc_pass: &str) -> bool { fn has_required_lnd_flags(conf: &str, rpc_pass: &str, bitcoin_host: &str) -> bool {
let rpc_pass_line = format!("bitcoind.rpcpass={rpc_pass}"); let rpc_pass_line = format!("bitcoind.rpcpass={rpc_pass}");
let rpc_host_line = format!("bitcoind.rpchost={bitcoin_host}:8332");
[ [
"bitcoin.active=true", "bitcoin.active=true",
"bitcoin.mainnet=true", "bitcoin.mainnet=true",
"bitcoin.node=bitcoind", "bitcoin.node=bitcoind",
"bitcoind.rpchost=bitcoin-knots:8332", rpc_host_line.as_str(),
rpc_pass_line.as_str(), rpc_pass_line.as_str(),
] ]
.iter() .iter()
@ -678,7 +682,7 @@ mod tests {
conf_path: tmp.path().join("lnd/lnd.conf"), conf_path: tmp.path().join("lnd/lnd.conf"),
}; };
let out = ensure_config(&paths, "secret").await.unwrap(); let out = ensure_config(&paths, "secret", "bitcoin-knots").await.unwrap();
assert_eq!(out, EnsureOutcome::Written); assert_eq!(out, EnsureOutcome::Written);
let conf = fs::read_to_string(&paths.conf_path).await.unwrap(); let conf = fs::read_to_string(&paths.conf_path).await.unwrap();
assert!(conf.contains("bitcoin.active=true")); assert!(conf.contains("bitcoin.active=true"));
@ -697,17 +701,46 @@ mod tests {
}; };
assert_eq!( assert_eq!(
ensure_config(&paths, "first").await.unwrap(), ensure_config(&paths, "first", "bitcoin-knots").await.unwrap(),
EnsureOutcome::Written EnsureOutcome::Written
); );
assert_eq!( assert_eq!(
ensure_config(&paths, "second").await.unwrap(), ensure_config(&paths, "second", "bitcoin-knots").await.unwrap(),
EnsureOutcome::Written EnsureOutcome::Written
); );
let conf = fs::read_to_string(&paths.conf_path).await.unwrap(); let conf = fs::read_to_string(&paths.conf_path).await.unwrap();
assert!(conf.contains("bitcoind.rpcpass=second")); assert!(conf.contains("bitcoind.rpcpass=second"));
} }
#[tokio::test]
async fn ensure_config_repairs_bitcoin_host_drift() {
// A conf written against bitcoin-knots must be rewritten when the
// node's Bitcoin variant is bitcoin-core, or LND dials a hostname
// that doesn't exist on archy-net and dies on startup.
let tmp = tempfile::TempDir::new().unwrap();
let paths = EnsurePaths {
data_dir: tmp.path().join("lnd"),
conf_path: tmp.path().join("lnd/lnd.conf"),
};
assert_eq!(
ensure_config(&paths, "pw", "bitcoin-knots").await.unwrap(),
EnsureOutcome::Written
);
assert_eq!(
ensure_config(&paths, "pw", "bitcoin-core").await.unwrap(),
EnsureOutcome::Written
);
let conf = fs::read_to_string(&paths.conf_path).await.unwrap();
assert!(conf.contains("bitcoind.rpchost=bitcoin-core:8332"));
assert!(!conf.contains("bitcoind.rpchost=bitcoin-knots:8332"));
assert_eq!(
ensure_config(&paths, "pw", "bitcoin-core").await.unwrap(),
EnsureOutcome::Unchanged
);
}
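
The drift test above depends on the "is this config current?" check including the expected Bitcoin host. The matching predicate of `has_required_lnd_flags` is cut off in this diff after `.iter()`, so the sketch below assumes whole-line matching; both function names are hypothetical and only three of the required lines are checked for brevity.

```rust
/// True when every required `key=value` entry appears as a whole line.
/// Whole-line comparison avoids a prefix false positive, e.g. a password
/// `pw` matching a stored `bitcoind.rpcpass=pw2`.
fn has_required_lines(conf: &str, required: &[&str]) -> bool {
    required
        .iter()
        .all(|needle| conf.lines().any(|line| line.trim() == *needle))
}

/// Build the host- and password-dependent lines, then check them together
/// with the static ones.
fn lnd_conf_is_current(conf: &str, rpc_pass: &str, bitcoin_host: &str) -> bool {
    let host_line = format!("bitcoind.rpchost={bitcoin_host}:8332");
    let pass_line = format!("bitcoind.rpcpass={rpc_pass}");
    has_required_lines(
        conf,
        &["bitcoin.node=bitcoind", host_line.as_str(), pass_line.as_str()],
    )
}
```

When this returns `false`, the caller rewrites the file, which is how a conf written for `bitcoin-knots` gets repaired on a `bitcoin-core` node.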
#[tokio::test] #[tokio::test]
async fn ensure_config_repairs_incomplete_existing_config() { async fn ensure_config_repairs_incomplete_existing_config() {
let tmp = tempfile::TempDir::new().unwrap(); let tmp = tempfile::TempDir::new().unwrap();
@ -721,7 +754,7 @@ mod tests {
.unwrap(); .unwrap();
assert_eq!( assert_eq!(
ensure_config(&paths, "repaired").await.unwrap(), ensure_config(&paths, "repaired", "bitcoin-knots").await.unwrap(),
EnsureOutcome::Written EnsureOutcome::Written
); );
let conf = fs::read_to_string(&paths.conf_path).await.unwrap(); let conf = fs::read_to_string(&paths.conf_path).await.unwrap();


@ -14,6 +14,7 @@ pub mod quadlet;
pub mod registry; pub mod registry;
pub mod secrets; pub mod secrets;
pub mod traits; pub mod traits;
pub mod version_config;
pub use boot_reconciler::{BootReconciler, DEFAULT_INTERVAL as RECONCILER_DEFAULT_INTERVAL}; pub use boot_reconciler::{BootReconciler, DEFAULT_INTERVAL as RECONCILER_DEFAULT_INTERVAL};
pub use dev_orchestrator::DevContainerOrchestrator; pub use dev_orchestrator::DevContainerOrchestrator;

File diff suppressed because it is too large


@ -268,14 +268,21 @@ impl QuadletUnit {
let _ = writeln!(s, "HealthTimeout={}", h.timeout); let _ = writeln!(s, "HealthTimeout={}", h.timeout);
let _ = writeln!(s, "HealthRetries={}", h.retries); let _ = writeln!(s, "HealthRetries={}", h.retries);
} }
if let Some(ep) = &self.entrypoint { if let Some((first, rest)) = self.entrypoint.as_deref().and_then(<[String]>::split_first) {
// Quadlet's Exec= replaces the image entrypoint+cmd. When // Quadlet's Exec= sets only the command (the args passed to the
// the manifest provides both entrypoint and command we // image's ENTRYPOINT) — it does NOT replace the entrypoint. So a
// concatenate; if only command is set we'll emit that on // manifest entrypoint like `sh -lc` must be emitted as a real
// its own below. // Entrypoint= override; otherwise it gets appended to whatever
let mut parts: Vec<String> = ep.clone(); // ENTRYPOINT the image baked in (e.g. the versioned bitcoind
// images use `ENTRYPOINT ["bitcoind"]`, which turned the wrapper
// into `bitcoind sh -lc ...` and crash-looped). Emitting
// Entrypoint= makes the unit independent of the image's entrypoint.
let _ = writeln!(s, "Entrypoint={first}");
let mut parts: Vec<String> = rest.to_vec();
parts.extend(self.command.iter().cloned()); parts.extend(self.command.iter().cloned());
let _ = writeln!(s, "Exec={}", shell_join(&parts)); if !parts.is_empty() {
let _ = writeln!(s, "Exec={}", shell_join(&parts));
}
} else if !self.command.is_empty() { } else if !self.command.is_empty() {
let _ = writeln!(s, "Exec={}", shell_join(&self.command)); let _ = writeln!(s, "Exec={}", shell_join(&self.command));
} }
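
The entrypoint split above can be shown in isolation. This sketch renders only the `Entrypoint=` and `Exec=` lines; `render_exec` is a hypothetical free function, and a plain space join stands in for the project's `shell_join`, whose quoting rules are not shown in this diff.

```rust
/// Render the Entrypoint=/Exec= lines for a unit.
/// The first entrypoint element becomes `Entrypoint=`; any remaining
/// entrypoint elements plus the command become `Exec=`.
fn render_exec(entrypoint: Option<&[String]>, command: &[String]) -> String {
    let mut out = String::new();
    match entrypoint.and_then(|ep| ep.split_first()) {
        Some((first, rest)) => {
            out.push_str(&format!("Entrypoint={first}\n"));
            let args: Vec<&str> = rest
                .iter()
                .chain(command.iter())
                .map(String::as_str)
                .collect();
            // Avoid emitting an empty `Exec=` line.
            if !args.is_empty() {
                out.push_str(&format!("Exec={}\n", args.join(" ")));
            }
        }
        // No entrypoint (or an empty one): the command alone goes to Exec=.
        None if !command.is_empty() => {
            out.push_str(&format!("Exec={}\n", command.join(" ")));
        }
        None => {}
    }
    out
}
```

For an entrypoint of `["sh", "-lc"]` and a one-element command, this yields `Entrypoint=sh` followed by `Exec=-lc <command>`, the same split the updated unit test asserts for `bitcoind`.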
@ -581,11 +588,12 @@ pub async fn write_if_changed(unit: &QuadletUnit, dir: &Path) -> Result<bool> {
/// Reload the user systemd manager. Required after any quadlet write /// Reload the user systemd manager. Required after any quadlet write
/// or removal so systemd picks up the generated `.service` translation. /// or removal so systemd picks up the generated `.service` translation.
pub async fn daemon_reload_user() -> Result<()> { pub async fn daemon_reload_user() -> Result<()> {
let status = Command::new("systemctl") // Bounded: a wedged user manager (e.g. a unit stuck "deactivating" while
.args(["--user", "daemon-reload"]) // podman hangs) could otherwise block daemon-reload indefinitely and freeze
.status() // any caller — notably uninstall teardown.
let status = systemctl_user_status(&["daemon-reload"], Duration::from_secs(30))
.await .await
.context("spawn systemctl --user daemon-reload")?; .context("systemctl --user daemon-reload")?;
if !status.success() { if !status.success() {
return Err(anyhow!("systemctl --user daemon-reload exited {status}")); return Err(anyhow!("systemctl --user daemon-reload exited {status}"));
} }
@ -768,9 +776,11 @@ pub fn network_aliases_changed(old_body: &str, new_body: &str) -> bool {
} }
pub fn exec_changed(old_body: &str, new_body: &str) -> bool { pub fn exec_changed(old_body: &str, new_body: &str) -> bool {
let old_exec = directive_values(old_body, "Exec="); // Entrypoint= and Exec= together define what the container runs, so a drift
let new_exec = directive_values(new_body, "Exec="); // in either must recreate the container (e.g. when this renderer first
old_exec != new_exec // splits a folded `Exec=sh -lc ...` into `Entrypoint=sh` + `Exec=-lc ...`).
directive_values(old_body, "Exec=") != directive_values(new_body, "Exec=")
|| directive_values(old_body, "Entrypoint=") != directive_values(new_body, "Entrypoint=")
} }
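
Why `exec_changed` must compare both directives is easiest to see on the migration case itself: a unit rendered by the old code carries everything in `Exec=`, while the new rendering splits it. The sketch below reuses the diff's function names for readability, but the body of `directive_values` is an assumption, since the diff does not show it.

```rust
/// Collect the values of every line that starts with `prefix`
/// (for example "Exec="), in file order. Assumed implementation.
fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> {
    unit_body
        .lines()
        .filter_map(|line| line.trim().strip_prefix(prefix))
        .map(|value| value.trim().to_string())
        .collect()
}

/// A drift in either directive changes what the container runs.
fn exec_changed(old_body: &str, new_body: &str) -> bool {
    directive_values(old_body, "Exec=") != directive_values(new_body, "Exec=")
        || directive_values(old_body, "Entrypoint=")
            != directive_values(new_body, "Entrypoint=")
}
```

Comparing only `Exec=` would miss a unit where the entrypoint alone changed, for example `Entrypoint=sh` becoming `Entrypoint=bash` with an identical `Exec=` line.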
fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> { fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> {
@ -787,11 +797,19 @@ fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> {
/// that systemd no longer knows about. /// that systemd no longer knows about.
pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> { pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
let svc = format!("{unit_name}.service"); let svc = format!("{unit_name}.service");
// Stop first; ignore failure (unit may already be down). // Stop first; ignore failure (unit may already be down). BOUNDED — on
let _ = Command::new("systemctl") // rootless podman a generated unit can wedge in "deactivating" while
.args(["--user", "stop", &svc]) // `podman rm -f` hangs underneath it, and an unbounded `systemctl stop`
.status() // would block the entire uninstall forever: the progress bar freezes and
.await; // the package entry is stranded in `Removing` (a ghost in My Apps that also
// blocks reinstall). If the graceful stop times out, escalate to
// SIGKILL + reset-failed so teardown always proceeds.
if systemctl_user_status(&["stop", &svc], QUADLET_STOP_TIMEOUT)
.await
.is_err()
{
let _ = kill_and_reset_service(&svc).await;
}
let path = dir.join(format!("{unit_name}.container")); let path = dir.join(format!("{unit_name}.container"));
if fs::try_exists(&path).await.unwrap_or(false) { if fs::try_exists(&path).await.unwrap_or(false) {
match fs::remove_file(&path).await { match fs::remove_file(&path).await {
@ -802,10 +820,15 @@ pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
 }
 daemon_reload_user().await.ok();
 // Defensive: kill the actual container too, in case quadlet left it.
-let _ = Command::new("podman")
-.args(["rm", "-f", unit_name])
-.status()
-.await;
+// Bounded so a hung podman store can't re-introduce the stall this function
+// exists to avoid.
+let _ = tokio::time::timeout(
+QUADLET_STOP_TIMEOUT,
+Command::new("podman")
+.args(["rm", "-f", unit_name])
+.status(),
+)
+.await;
 Ok(())
 }
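The bounded-stop pattern in `disable_remove` (try the graceful path for a limited time, then escalate) can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: `stop_bounded`, the `Stop` enum, and the closures are hypothetical stand-ins for `systemctl_user_status` and `kill_and_reset_service`, and a worker thread plus `recv_timeout` replaces the async timeout.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// How the stop attempt ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Stop {
    Graceful,
    Escalated,
}

/// Runs `graceful` on a worker thread and waits at most `limit` for it.
/// If it reports failure, panics, or does not finish in time, `escalate`
/// runs so the caller always makes progress.
fn stop_bounded<G, E>(graceful: G, limit: Duration, escalate: E) -> Stop
where
    G: FnOnce() -> bool + Send + 'static,
    E: FnOnce(),
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; a failed send is harmless.
        let _ = tx.send(graceful());
    });
    match rx.recv_timeout(limit) {
        Ok(true) => Stop::Graceful,
        // Failure, timeout, or a dead worker: take the forceful path.
        _ => {
            escalate();
            Stop::Escalated
        }
    }
}

fn main() {
    let quick = stop_bounded(|| true, Duration::from_secs(2), || {});
    println!("{quick:?}");

    let hung = stop_bounded(
        || {
            thread::sleep(Duration::from_millis(400));
            true
        },
        Duration::from_millis(20),
        || println!("escalating"),
    );
    println!("{hung:?}");
}
```

One thing the sketch makes visible: a timeout stops the waiting, not the work. The worker thread above keeps running after the deadline. If `Command` in the diff is `tokio::process::Command` (an assumption, the import is not shown), the spawned `podman rm -f` likewise keeps running after `tokio::time::timeout` drops the future unless `kill_on_drop(true)` is set, which may or may not be the intended behavior.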
@ -1049,7 +1072,10 @@ mod tests {
 assert!(s.contains("ReadOnly=true"));
 assert!(s.contains("NoNewPrivileges=true"));
 assert!(s.contains("PodmanArgs=--cpus=2"));
-assert!(s.contains("Exec=/usr/local/bin/bitcoind -server=1 -rpcbind=0.0.0.0"));
+// Manifest entrypoint becomes a real Entrypoint= override (not folded
+// into Exec=), so the unit doesn't depend on the image's own ENTRYPOINT.
+assert!(s.contains("Entrypoint=/usr/local/bin/bitcoind"));
+assert!(s.contains("Exec=-server=1 -rpcbind=0.0.0.0"));
 assert!(s.contains("Restart=on-failure"));
 assert!(s.contains("Network=archy-net"));
 }
@ -1274,7 +1300,10 @@ app:
 let u = QuadletUnit::from_manifest(&m, "x");
 // tmpfs entry is dropped from bind_mounts; bind entry survives.
 assert_eq!(u.bind_mounts.len(), 1);
-assert_eq!(u.bind_mounts[0].host, PathBuf::from("/var/lib/archipelago/x"));
+assert_eq!(
+u.bind_mounts[0].host,
+PathBuf::from("/var/lib/archipelago/x")
+);
 }
 #[test]


@ -66,6 +66,7 @@ fn ensure_one(dir: &Path, gs: &GeneratedSecret) -> Result<()> {
match gs.kind { match gs.kind {
SecretGenKind::Hex16 => write_secret(&dir.join(&gs.name), &random_hex(16))?, SecretGenKind::Hex16 => write_secret(&dir.join(&gs.name), &random_hex(16))?,
SecretGenKind::Hex32 => write_secret(&dir.join(&gs.name), &random_hex(32))?, SecretGenKind::Hex32 => write_secret(&dir.join(&gs.name), &random_hex(32))?,
SecretGenKind::Base64 => write_secret(&dir.join(&gs.name), &random_base64(32))?,
SecretGenKind::Bcrypt => { SecretGenKind::Bcrypt => {
let password = random_hex(BCRYPT_PASSWORD_BYTES); let password = random_hex(BCRYPT_PASSWORD_BYTES);
let hash = bcrypt::hash(&password, bcrypt::DEFAULT_COST) let hash = bcrypt::hash(&password, bcrypt::DEFAULT_COST)
@ -92,6 +93,15 @@ fn random_hex(bytes: usize) -> String {
hex::encode(buf) hex::encode(buf)
} }
/// `bytes` of entropy, standard base64 (with padding). For keys that a service
/// base64-decodes to recover the raw bytes (e.g. netbird's store encryptionKey).
fn random_base64(bytes: usize) -> String {
use base64::Engine as _;
let mut buf = vec![0u8; bytes];
rand::thread_rng().fill_bytes(&mut buf);
base64::engine::general_purpose::STANDARD.encode(buf)
}
/// Atomically write a `0600` secret: a temp file in the same dir (so the rename /// Atomically write a `0600` secret: a temp file in the same dir (so the rename
/// is atomic), fsynced, then renamed over the target. /// is atomic), fsynced, then renamed over the target.
fn write_secret(path: &Path, value: &str) -> Result<()> { fn write_secret(path: &Path, value: &str) -> Result<()> {
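As a quick sanity check on the new `Base64` secret kind, the arithmetic below shows how long the generated strings are compared with the hex kinds. It is a sketch with hypothetical helper names (`base64_len`, `base64_padding`, `hex_len`); it computes sizes only and does not call the `base64` crate.

```rust
/// Length of standard, padded base64 for `n` raw bytes:
/// every 3-byte group (rounded up) becomes 4 characters.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Number of trailing `=` characters in that encoding.
fn base64_padding(n: usize) -> usize {
    (3 - n % 3) % 3
}

/// Length of the hex encoding of `n` raw bytes: 2 characters per byte.
fn hex_len(n: usize) -> usize {
    n * 2
}

fn main() {
    // 32 bytes of entropy, as passed to `random_base64(32)` in the diff.
    println!("base64(32 bytes): {} chars", base64_len(32)); // 44
    println!("padding chars:    {}", base64_padding(32)); // 1
    println!("hex(32 bytes):    {} chars", hex_len(32)); // 64
}
```

A service that base64-decodes the 44-character value gets back exactly 32 raw bytes, whereas a 64-character hex string read as raw key material would be 64 bytes, which is presumably why a dedicated kind was needed.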
@ -159,7 +169,10 @@ mod tests {
 let hash = std::fs::read_to_string(dir.path().join("admin")).unwrap();
 let pw = std::fs::read_to_string(dir.path().join("admin.pw")).unwrap();
 assert!(hash.starts_with("$2"), "bcrypt hash shape");
-assert!(bcrypt::verify(pw.trim(), hash.trim()).unwrap(), "pw matches hash");
+assert!(
+bcrypt::verify(pw.trim(), hash.trim()).unwrap(),
+"pw matches hash"
+);
 for f in ["tok", "admin", "admin.pw"] {
 let mode = std::fs::metadata(dir.path().join(f))
@ -179,7 +192,10 @@ mod tests {
 let first = std::fs::read_to_string(dir.path().join("tok")).unwrap();
 ensure_generated_secrets(dir.path(), &m).unwrap();
 let second = std::fs::read_to_string(dir.path().join("tok")).unwrap();
-assert_eq!(first, second, "a present readable secret is never rewritten");
+assert_eq!(
+first, second,
+"a present readable secret is never rewritten"
+);
 }
 #[test]


@ -0,0 +1,272 @@
//! Per-app version preferences — the persistence layer for multi-version support.
//!
//! Multi-version support (`docs/bitcoin-multi-version-design.md`) lets a node
//! runner pin Bitcoin Core / Knots to a specific version and opt into
//! auto-update-to-latest. Both choices live in the existing per-app config file
//! at `/var/lib/archipelago/app-configs/<id>.json` as two keys:
//!
//! ```jsonc
//! { "pinnedVersion": "29.3.knots20260508", "autoUpdate": false }
//! ```
//!
//! This is the single source of truth the orchestrator's install path reads to
//! resolve the image, and that the auto-update tick + "available update" badge
//! consult. Reads/writes are merge-preserving so they never clobber any
//! `containerConfig` (ports/volumes/env) a generic app may also store here.
//!
//! Platform-managed apps (bitcoin-core/knots/…) never use the
//! `containerConfig`-style keys (see `config.rs::dynamic_app_config`, which
//! returns early for them), so adding these keys to their file is collision-free.
use serde_json::{Map, Value};
use std::path::PathBuf;
/// Resolved version preferences for one app. Defaults: no pin, auto-update off
/// (consensus-critical apps opt in explicitly — design open-question #4).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct AppVersionConfig {
/// The version string the runner pinned, if any. Suppresses the update badge
/// and overrides the catalog default at install/recreate time.
pub pinned_version: Option<String>,
/// When true, the hourly catalog tick updates this app to the catalog
/// default automatically. Ignored while a version is pinned.
pub auto_update: bool,
}
fn config_dir() -> PathBuf {
let base = std::env::var("ARCHIPELAGO_DATA_DIR")
.unwrap_or_else(|_| "/var/lib/archipelago".to_string());
PathBuf::from(base).join("app-configs")
}
fn config_path(app_id: &str) -> PathBuf {
config_dir().join(format!("{app_id}.json"))
}
/// App ids that have opted into auto-update-to-latest AND are not pinned (a pin
/// is an explicit "stay here"). Drives the hourly per-app auto-update tick. The
/// app id is the config file stem. Returns empty when the dir is absent.
pub fn auto_update_apps() -> Vec<String> {
let mut out = Vec::new();
let Ok(entries) = std::fs::read_dir(config_dir()) else {
return out;
};
for entry in entries.flatten() {
let path = entry.path();
if path.extension().and_then(|e| e.to_str()) != Some("json") {
continue;
}
let Some(app_id) = path.file_stem().and_then(|s| s.to_str()) else {
continue;
};
let cfg = read(app_id);
if cfg.auto_update && cfg.pinned_version.is_none() {
out.push(app_id.to_string());
}
}
out
}
fn read_raw(app_id: &str) -> Map<String, Value> {
let path = config_path(app_id);
match std::fs::read_to_string(&path) {
Ok(s) => serde_json::from_str::<Value>(&s)
.ok()
.and_then(|v| v.as_object().cloned())
.unwrap_or_default(),
Err(_) => Map::new(),
}
}
/// Read the version preferences for `app_id`. Returns defaults when the file is
/// absent or the keys are unset.
pub fn read(app_id: &str) -> AppVersionConfig {
let obj = read_raw(app_id);
AppVersionConfig {
pinned_version: obj
.get("pinnedVersion")
.and_then(Value::as_str)
.filter(|s| !s.is_empty())
.map(String::from),
auto_update: obj
.get("autoUpdate")
.and_then(Value::as_bool)
.unwrap_or(false),
}
}
/// The pinned version for `app_id`, if set. Convenience for the hot path.
pub fn pinned_version(app_id: &str) -> Option<String> {
read(app_id).pinned_version
}
/// Parse the leading numeric `major.minor.patch` of a version string into a
/// comparable tuple. Stops at the first non-numeric component, so Bitcoin Core
/// (`31.0`, `28.4`) and the Knots date-suffixed form (`29.3.knots20260508` →
/// `(29, 3, 0)`) both compare on their consensus-relevant major/minor. The
/// Knots build-date suffix is intentionally ignored — a same-major.minor Knots
/// rebuild is not a chainstate downgrade.
fn version_key(version: &str) -> (u64, u64, u64) {
let mut it = version.split('.').map(|c| {
// Take the leading digit run of each dotted component (`knots20260508`
// yields no leading digits → 0; `3` → 3).
c.chars()
.take_while(|ch| ch.is_ascii_digit())
.collect::<String>()
.parse::<u64>()
.unwrap_or(0)
});
(
it.next().unwrap_or(0),
it.next().unwrap_or(0),
it.next().unwrap_or(0),
)
}
/// True when installing `candidate` over `current` is a DOWNGRADE — an older
/// Bitcoin release over a chainstate written by a newer one. This is the
/// highest-risk operation (Core refuses to start on a newer chainstate without
/// an expensive reindex; pruned nodes can lose data), so the UI must warn and
/// the switch must be explicitly confirmed (design §4). Equal or newer → false.
pub fn is_downgrade(current: &str, candidate: &str) -> bool {
version_key(candidate) < version_key(current)
}
/// Merge `cfg` into the on-disk config, preserving every other key. A
/// `pinned_version` of `None` removes the `pinnedVersion` key (un-pins / "track
/// latest"). Creates the directory and file on first write.
pub fn write(app_id: &str, cfg: &AppVersionConfig) -> std::io::Result<()> {
let path = config_path(app_id);
let mut obj = read_raw(app_id);
match &cfg.pinned_version {
Some(v) => {
obj.insert("pinnedVersion".to_string(), Value::String(v.clone()));
}
None => {
obj.remove("pinnedVersion");
}
}
obj.insert("autoUpdate".to_string(), Value::Bool(cfg.auto_update));
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)?;
}
let serialized = serde_json::to_string_pretty(&Value::Object(obj))
.map_err(|e| std::io::Error::new(std::io::ErrorKind::InvalidData, e))?;
// Atomic-ish write: temp + rename so a crash mid-write can't truncate config.
let tmp = path.with_extension("json.tmp");
std::fs::write(&tmp, serialized.as_bytes())?;
std::fs::rename(&tmp, &path)
}
#[cfg(test)]
mod tests {
use super::*;
// `ARCHIPELAGO_DATA_DIR` is process-global, so the write/read tests must not
// run concurrently — serialize them and give each a unique dir. Without this
// lock, parallel `cargo test` races on the env var (poisoning is fine: a
// panicking test still releases a usable guard).
static ENV_LOCK: std::sync::Mutex<u64> = std::sync::Mutex::new(0);
fn with_tmp_data_dir<F: FnOnce()>(f: F) {
let mut counter = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
*counter += 1;
let dir =
std::env::temp_dir().join(format!("archy-vc-test-{}-{}", std::process::id(), *counter));
let _ = std::fs::remove_dir_all(&dir);
std::fs::create_dir_all(&dir).unwrap();
std::env::set_var("ARCHIPELAGO_DATA_DIR", &dir);
f();
std::env::remove_var("ARCHIPELAGO_DATA_DIR");
let _ = std::fs::remove_dir_all(&dir);
// `counter` guard drops here, releasing the lock for the next test.
}
#[test]
fn defaults_when_absent() {
with_tmp_data_dir(|| {
let cfg = read("bitcoin-core");
assert_eq!(cfg.pinned_version, None);
assert!(!cfg.auto_update);
});
}
#[test]
fn write_then_read_roundtrips() {
with_tmp_data_dir(|| {
write(
"bitcoin-knots",
&AppVersionConfig {
pinned_version: Some("29.3.knots20260508".into()),
auto_update: false,
},
)
.unwrap();
let cfg = read("bitcoin-knots");
assert_eq!(cfg.pinned_version.as_deref(), Some("29.3.knots20260508"));
assert!(!cfg.auto_update);
});
}
#[test]
fn write_preserves_existing_keys() {
with_tmp_data_dir(|| {
// Simulate a generic app's containerConfig already on disk.
let path = config_path("someapp");
std::fs::create_dir_all(path.parent().unwrap()).unwrap();
std::fs::write(&path, r#"{"ports":["80:80"],"autoUpdate":false}"#).unwrap();
write(
"someapp",
&AppVersionConfig {
pinned_version: Some("1.2.3".into()),
auto_update: true,
},
)
.unwrap();
let raw = read_raw("someapp");
assert!(raw.contains_key("ports"), "ports key must survive");
assert_eq!(raw.get("pinnedVersion").unwrap(), "1.2.3");
assert_eq!(raw.get("autoUpdate").unwrap(), &Value::Bool(true));
});
}
#[test]
fn downgrade_detection() {
// Older over newer = downgrade.
assert!(is_downgrade("31.0", "30.0"));
assert!(is_downgrade("28.4", "27.2"));
// Same or newer = not a downgrade.
assert!(!is_downgrade("30.0", "31.0"));
assert!(!is_downgrade("28.4", "28.4"));
// Knots date-suffixed strings compare on major.minor only.
assert!(is_downgrade("29.3.knots20260508", "28.1.knots20251010"));
assert!(!is_downgrade("29.3.knots20260101", "29.3.knots20260508"));
}
#[test]
fn unpin_removes_key() {
with_tmp_data_dir(|| {
write(
"bitcoin-core",
&AppVersionConfig {
pinned_version: Some("31.0".into()),
auto_update: true,
},
)
.unwrap();
write(
"bitcoin-core",
&AppVersionConfig {
pinned_version: None,
auto_update: true,
},
)
.unwrap();
let raw = read_raw("bitcoin-core");
assert!(!raw.contains_key("pinnedVersion"));
assert_eq!(read("bitcoin-core").pinned_version, None);
assert!(read("bitcoin-core").auto_update);
});
}
}
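To make the comparison rules of `version_key` and `is_downgrade` concrete, here is a standalone copy of the two functions with a few worked inputs. The functions mirror the file above; the sample version strings other than those in its tests (`28.4.1`, `v31.0`) are illustrative assumptions, not catalog values.

```rust
/// Leading numeric `major.minor.patch`, taking the leading digit run of each
/// dotted component and treating anything unparsable as 0.
fn version_key(version: &str) -> (u64, u64, u64) {
    let mut it = version.split('.').map(|c| {
        c.chars()
            .take_while(|ch| ch.is_ascii_digit())
            .collect::<String>()
            .parse::<u64>()
            .unwrap_or(0)
    });
    (
        it.next().unwrap_or(0),
        it.next().unwrap_or(0),
        it.next().unwrap_or(0),
    )
}

/// True when `candidate` sorts strictly below `current`.
fn is_downgrade(current: &str, candidate: &str) -> bool {
    version_key(candidate) < version_key(current)
}

fn main() {
    // Knots suffix: third component has no leading digits, so it becomes 0.
    println!("{:?}", version_key("29.3.knots20260508"));
    // A numeric third component does take part in the comparison.
    println!("{:?}", version_key("28.4.1"));
    println!("{}", is_downgrade("28.4.1", "28.4.0"));
    // A `v` prefix defeats the leading-digit parse of the first component.
    println!("{:?}", version_key("v31.0"));
}
```

Two consequences follow from the parsing rule. A numeric patch component is compared, so `28.4.1` over `28.4.0` is not flagged but the reverse is. And a string with a non-digit prefix such as `v31.0` collapses to `(0, 0, 0)`, so if such strings ever reached this function a real downgrade could go unflagged; whether the catalog can contain them is not stated in the source.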


@ -153,7 +153,9 @@ pub async fn read_owned(
 onion: &str,
 content_id: &str,
 ) -> Option<(String, Vec<u8>)> {
-let bytes = fs::read(bytes_path(data_dir, onion, content_id)).await.ok()?;
+let bytes = fs::read(bytes_path(data_dir, onion, content_id))
+.await
+.ok()?;
 let mime = load_index(data_dir)
 .await
 .items


@ -7,7 +7,7 @@ use anyhow::{Context, Result};
 use serde::{Deserialize, Serialize};
 use std::path::{Path, PathBuf};
 use tokio::fs;
-use tracing::debug;
+use tracing::{debug, warn};
 const CATALOG_FILE: &str = "content/catalog.json";
 const CONTENT_DIR: &str = "content/files";
@ -86,6 +86,22 @@ pub async fn save_catalog(data_dir: &Path, catalog: &ContentCatalog) -> Result<(
Ok(()) Ok(())
} }
/// Removes `id` from the on-disk catalog. Best-effort: a failure here just
/// means the entry gets pruned again next time it's requested, so errors are
/// logged rather than propagated.
async fn prune_missing_content_entry(data_dir: &Path, id: &str) {
let Ok(mut catalog) = load_catalog(data_dir).await else {
return;
};
let before = catalog.items.len();
catalog.items.retain(|i| i.id != id);
if catalog.items.len() != before {
if let Err(e) = save_catalog(data_dir, &catalog).await {
warn!(error = %e, content_id = %id, "failed to save catalog after pruning missing content entry");
}
}
}
/// Get the full filesystem path for a content item. /// Get the full filesystem path for a content item.
/// Checks the dedicated content/files/ directory first, then falls back to the /// Checks the dedicated content/files/ directory first, then falls back to the
/// FileBrowser data directory (where users manage files via the web UI). /// FileBrowser data directory (where users manage files via the web UI).
@ -268,6 +284,19 @@ pub async fn serve_content(
let file_path = content_file_path(data_dir, item); let file_path = content_file_path(data_dir, item);
if !file_path.exists() { if !file_path.exists() {
// The catalog entry survived (it's a separate JSON file) but its
// backing file is gone — most likely lost in an unrelated data-dir
// reset (a shared filebrowser file, 2026-07-01: two catalog entries
// outlived a filebrowser reinstall that wiped the files themselves).
// Leaving the entry in place would keep advertising it as available
// to every peer forever, each hitting the exact same dead end this
// one just did. Prune it so it stops being offered.
warn!(
content_id = %id,
filename = %item.filename,
"content catalog entry's file is missing on disk — pruning the stale entry"
);
prune_missing_content_entry(data_dir, id).await;
return Ok(ServeResult::NotFound); return Ok(ServeResult::NotFound);
} }
@ -555,3 +584,95 @@ mod faststart_tests {
assert_eq!(mp4_is_faststart(&p).await, Some(false)); assert_eq!(mp4_is_faststart(&p).await, Some(false));
} }
} }
#[cfg(test)]
mod prune_missing_content_tests {
use super::*;
#[tokio::test]
async fn serve_content_prunes_catalog_entry_whose_file_is_missing() {
// Simulates a catalog entry that outlived its backing file (a shared
// filebrowser file lost in an unrelated data-dir reset, 2026-07-01) —
// every peer request for it would otherwise 404 forever with no way
// to tell it apart from a transient failure.
let dir = tempfile::tempdir().unwrap();
let data_dir = dir.path();
let item = ContentItem {
id: "missing-item".to_string(),
filename: "gone.mp4".to_string(),
mime_type: "video/mp4".to_string(),
size_bytes: 123,
description: String::new(),
access: AccessControl::Free,
availability: Availability::AllPeers,
added_at: "2026-01-01T00:00:00Z".to_string(),
};
save_catalog(
data_dir,
&ContentCatalog {
items: vec![item],
},
)
.await
.unwrap();
// File was never written to disk under content/files/ or filebrowser/.
let result = serve_content(data_dir, "missing-item", None, None, None, None)
.await
.unwrap();
assert!(matches!(result, ServeResult::NotFound));
let reloaded = load_catalog(data_dir).await.unwrap();
assert!(
reloaded.items.is_empty(),
"stale entry should have been pruned after the 404"
);
}
#[tokio::test]
async fn serve_content_leaves_other_entries_untouched_when_pruning() {
let dir = tempfile::tempdir().unwrap();
let data_dir = dir.path();
let missing = ContentItem {
id: "missing-item".to_string(),
filename: "gone.mp4".to_string(),
mime_type: "video/mp4".to_string(),
size_bytes: 123,
description: String::new(),
access: AccessControl::Free,
availability: Availability::AllPeers,
added_at: "2026-01-01T00:00:00Z".to_string(),
};
let present = ContentItem {
id: "present-item".to_string(),
filename: "here.mp4".to_string(),
mime_type: "video/mp4".to_string(),
size_bytes: 4,
description: String::new(),
access: AccessControl::Free,
availability: Availability::AllPeers,
added_at: "2026-01-01T00:00:00Z".to_string(),
};
save_catalog(
data_dir,
&ContentCatalog {
items: vec![missing, present],
},
)
.await
.unwrap();
let content_dir = data_dir.join("content").join("files");
tokio::fs::create_dir_all(&content_dir).await.unwrap();
tokio::fs::write(content_dir.join("here.mp4"), b"data")
.await
.unwrap();
let _ = serve_content(data_dir, "missing-item", None, None, None, None)
.await
.unwrap();
let reloaded = load_catalog(data_dir).await.unwrap();
assert_eq!(reloaded.items.len(), 1);
assert_eq!(reloaded.items[0].id, "present-item");
}
}
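The core of `prune_missing_content_entry` is a retain-then-save-only-if-changed step. The sketch below isolates that step on an in-memory list, assuming a pared-down `Item` type (hypothetical; the real `ContentItem` has more fields) and leaving persistence to the caller.

```rust
/// Pared-down stand-in for a catalog entry (hypothetical for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Item {
    id: String,
    filename: String,
}

/// Removes the entry with `id`. Returns true only when something was
/// removed, so the caller can skip rewriting the catalog when nothing changed.
fn prune_entry(items: &mut Vec<Item>, id: &str) -> bool {
    let before = items.len();
    items.retain(|i| i.id != id);
    items.len() != before
}

fn main() {
    let mut items = vec![
        Item { id: "missing-item".into(), filename: "gone.mp4".into() },
        Item { id: "present-item".into(), filename: "here.mp4".into() },
    ];

    let changed = prune_entry(&mut items, "missing-item");
    println!("changed={changed}, remaining={}", items.len());

    // Pruning an id that is no longer present is a no-op, so repeated
    // requests for the same dead entry do not trigger repeated saves.
    let changed_again = prune_entry(&mut items, "missing-item");
    println!("changed_again={changed_again}");
}
```

Returning the changed flag keeps the operation idempotent from the caller's point of view, which matches the two tests above: the first asserts the stale entry is gone, the second that unrelated entries survive.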


@ -20,6 +20,7 @@ use tracing::{info, warn};
const PID_FILE: &str = "archipelago.pid"; const PID_FILE: &str = "archipelago.pid";
const CONTAINER_STATE_FILE: &str = "running-containers.json"; const CONTAINER_STATE_FILE: &str = "running-containers.json";
const USER_STOPPED_FILE: &str = "user-stopped.json"; const USER_STOPPED_FILE: &str = "user-stopped.json";
const USER_UNINSTALLED_FILE: &str = "user-uninstalled.json";
/// Shared flag: true once boot recovery is complete. Health monitor should wait for this. /// Shared flag: true once boot recovery is complete. Health monitor should wait for this.
pub static RECOVERY_COMPLETE: AtomicBool = AtomicBool::new(false); pub static RECOVERY_COMPLETE: AtomicBool = AtomicBool::new(false);
@ -48,6 +49,46 @@ pub fn is_recovery_complete() -> bool {
RECOVERY_COMPLETE.load(Ordering::SeqCst) RECOVERY_COMPLETE.load(Ordering::SeqCst)
} }
// ── Pending boot-start tracking ─────────────────────────────────────────
// Containers that boot recovery / the reconciler is about to start (or is
// starting right now). The package scanner overlays these as `Restarting`
// instead of the raw podman `Stopped`/`Exited`, so a freshly rebooted node
// doesn't tell the user their apps are "Stopped" while the sequential
// recovery pass (3s stagger + up to minutes for heavyweights like bitcoin)
// is still working through the queue. Writers register names when a pass
// begins and remove each name once its start attempt finishes, whatever
// the outcome — a container that truly failed goes back to showing its
// real state on the next scan.
static PENDING_BOOT_STARTS: std::sync::LazyLock<std::sync::RwLock<std::collections::HashSet<String>>> =
std::sync::LazyLock::new(|| std::sync::RwLock::new(std::collections::HashSet::new()));
/// Register container/app names an active recovery or reconcile pass
/// intends to start.
pub fn pending_boot_starts_add<I: IntoIterator<Item = String>>(names: I) {
if let Ok(mut set) = PENDING_BOOT_STARTS.write() {
set.extend(names);
}
}
/// A start attempt for `name` finished (success or failure) — stop
/// overlaying it.
pub fn pending_boot_start_done(name: &str) {
if let Ok(mut set) = PENDING_BOOT_STARTS.write() {
set.remove(name);
}
}
/// Whether `name` (a container name or scanner app id) is queued for a
/// boot/reconcile start. Container names may carry an `archy-` prefix the
/// scanner strips when deriving app ids, so check both forms.
pub fn is_pending_boot_start(name: &str) -> bool {
let Ok(set) = PENDING_BOOT_STARTS.read() else {
return false;
};
set.contains(name) || set.contains(&format!("archy-{name}"))
}
// ── User-stopped tracking ─────────────────────────────────────────────── // ── User-stopped tracking ───────────────────────────────────────────────
// When a user explicitly stops a container via the UI, we record it here // When a user explicitly stops a container via the UI, we record it here
// so crash recovery and health monitor don't auto-restart it. // so crash recovery and health monitor don't auto-restart it.
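How the pending boot-start registry is meant to interact with the scanner can be sketched as follows. This is an assumption-laden illustration: `PendingStarts` wraps the set in a struct instead of a global `LazyLock`, and `displayed_state` with its lowercase state strings is a hypothetical stand-in for the scanner overlay, which is not part of this diff.

```rust
use std::collections::HashSet;
use std::sync::RwLock;

/// Names queued for a boot or reconcile start (struct form of the global set).
struct PendingStarts {
    names: RwLock<HashSet<String>>,
}

impl PendingStarts {
    fn new() -> Self {
        Self { names: RwLock::new(HashSet::new()) }
    }

    fn add<I: IntoIterator<Item = String>>(&self, names: I) {
        if let Ok(mut set) = self.names.write() {
            set.extend(names);
        }
    }

    fn done(&self, name: &str) {
        if let Ok(mut set) = self.names.write() {
            set.remove(name);
        }
    }

    /// Matches either the bare app id or the `archy-` prefixed container name.
    fn is_pending(&self, name: &str) -> bool {
        let Ok(set) = self.names.read() else {
            return false;
        };
        set.contains(name) || set.contains(&format!("archy-{name}"))
    }
}

/// Hypothetical overlay: show "restarting" for a queued app that is not
/// running yet, otherwise pass the raw state through unchanged.
fn displayed_state<'a>(reg: &PendingStarts, app_id: &str, raw: &'a str) -> &'a str {
    if raw != "running" && reg.is_pending(app_id) {
        "restarting"
    } else {
        raw
    }
}

fn main() {
    let reg = PendingStarts::new();
    reg.add(vec!["archy-mempool".to_string()]);
    println!("{}", displayed_state(&reg, "mempool", "exited"));
    reg.done("archy-mempool");
    println!("{}", displayed_state(&reg, "mempool", "exited"));
}
```

Because every writer removes the name once its start attempt finishes, a container that genuinely failed reverts to its real state on the next scan instead of staying in the overlay.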
@ -61,6 +102,22 @@ pub async fn load_user_stopped(data_dir: &Path) -> std::collections::HashSet<Str
} }
} }
/// Names of the containers that were running at the last periodic snapshot
/// (`running-containers.json`, saved every ~120s by `save_container_snapshot`).
/// Unlike `check_for_crash`, this reads the snapshot unconditionally (no PID/crash
/// gate) — it's the durable "what was running" signal the boot reconciler uses to
/// recreate a previously-running app whose container vanished. Empty if absent.
pub async fn load_last_running_names(data_dir: &Path) -> std::collections::HashSet<String> {
let path = data_dir.join(CONTAINER_STATE_FILE);
match fs::read_to_string(&path).await {
Ok(content) => match serde_json::from_str::<ContainerSnapshot>(&content) {
Ok(snapshot) => snapshot.containers.into_iter().map(|c| c.name).collect(),
Err(_) => std::collections::HashSet::new(),
},
Err(_) => std::collections::HashSet::new(),
}
}
/// Save the set of user-stopped containers to disk. /// Save the set of user-stopped containers to disk.
pub async fn save_user_stopped(data_dir: &Path, stopped: &std::collections::HashSet<String>) { pub async fn save_user_stopped(data_dir: &Path, stopped: &std::collections::HashSet<String>) {
let path = data_dir.join(USER_STOPPED_FILE); let path = data_dir.join(USER_STOPPED_FILE);
@ -84,6 +141,51 @@ pub async fn clear_user_stopped(data_dir: &Path, name: &str) {
} }
} }
// ── User-uninstalled tracking ───────────────────────────────────────────
// Baseline apps (bitcoin-knots, electrumx, lnd, mempool, ...) self-heal when
// their container is missing — see `is_required_baseline_app` in
// prod_orchestrator.rs — because they're expected to exist from first boot.
// That self-heal has no way to distinguish "container vanished after a
// crash" from "user explicitly uninstalled this," and the in-memory
// `disabled` set the orchestrator otherwise uses is wiped by every
// `load_manifests()` call (once per archipelago startup). Without a durable
// marker, uninstalling a baseline app only "sticks" until the next reboot or
// archipelago restart, at which point the boot reconciler resurrects it.
// This mirrors `user_stopped` exactly, just for uninstall instead of stop.
/// Load the set of explicitly user-uninstalled app/container names from disk.
pub async fn load_user_uninstalled(data_dir: &Path) -> std::collections::HashSet<String> {
let path = data_dir.join(USER_UNINSTALLED_FILE);
match fs::read_to_string(&path).await {
Ok(content) => serde_json::from_str(&content).unwrap_or_default(),
Err(_) => std::collections::HashSet::new(),
}
}
/// Save the set of user-uninstalled app/container names to disk.
pub async fn save_user_uninstalled(data_dir: &Path, uninstalled: &std::collections::HashSet<String>) {
let path = data_dir.join(USER_UNINSTALLED_FILE);
if let Ok(json) = serde_json::to_string_pretty(uninstalled) {
let _ = fs::write(&path, json).await;
}
}
/// Mark a name as user-uninstalled (won't be self-healed by the baseline-app
/// reconciler across restarts/reboots).
pub async fn mark_user_uninstalled(data_dir: &Path, name: &str) {
let mut uninstalled = load_user_uninstalled(data_dir).await;
uninstalled.insert(name.to_string());
save_user_uninstalled(data_dir, &uninstalled).await;
}
/// Clear the user-uninstalled flag (app was explicitly (re)installed/started).
pub async fn clear_user_uninstalled(data_dir: &Path, name: &str) {
let mut uninstalled = load_user_uninstalled(data_dir).await;
if uninstalled.remove(name) {
save_user_uninstalled(data_dir, &uninstalled).await;
}
}
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RunningContainerRecord { pub struct RunningContainerRecord {
pub name: String, pub name: String,
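The user-uninstalled marker only helps if it actually reaches disk. The sketch below shows one way to persist such a set with a temp-file-plus-rename write and with errors propagated to the caller. It deliberately differs from the diff in several labeled ways: it is synchronous, it stores one name per line instead of JSON to stay dependency-free, and all function names (`load_names`, `save_names`, `mark`, `clear`) are hypothetical.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Reads the marker file; a missing or unreadable file means "no markers".
fn load_names(path: &Path) -> BTreeSet<String> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .lines()
            .filter(|l| !l.is_empty())
            .map(String::from)
            .collect(),
        Err(_) => BTreeSet::new(),
    }
}

/// Writes to a sibling temp file, syncs it, then renames it over the target
/// so a crash mid-write cannot leave a truncated marker file behind.
fn save_names(path: &Path, names: &BTreeSet<String>) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    for name in names {
        writeln!(file, "{name}")?;
    }
    file.sync_all()?;
    fs::rename(&tmp, path)
}

fn mark(path: &Path, name: &str) -> io::Result<()> {
    let mut names = load_names(path);
    names.insert(name.to_string());
    save_names(path, &names)
}

fn clear(path: &Path, name: &str) -> io::Result<()> {
    let mut names = load_names(path);
    if names.remove(name) {
        save_names(path, &names)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("marker-sketch-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let path = dir.join("user-uninstalled.txt");

    mark(&path, "lnd")?;
    println!("{:?}", load_names(&path));
    clear(&path, "lnd")?;
    println!("{:?}", load_names(&path));

    fs::remove_dir_all(&dir)
}
```

The design point is the return type: `save_user_uninstalled` in the diff discards the write result, so a failed write would silently let the reconciler resurrect the app on the next start. Whether that trade-off is acceptable there is a judgment call for the authors.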
@ -116,10 +218,17 @@ pub async fn check_for_crash(data_dir: &Path) -> Result<Option<Vec<RunningContai
 old_pid
 );
-// Check if that PID is actually still running (zombie/stuck process)
+// Check if that PID is actually still running (zombie/stuck process).
+// Guard against PID reuse: after a reboot the old PID often belongs to an
+// unrelated process (or, before the main.rs ordering fix, to OURSELVES), so
+// only treat it as "previous instance still alive" if it's a live process
+// that is not us and whose cmdline looks like the archipelago binary.
 if !old_pid.is_empty() {
 if let Ok(pid) = old_pid.parse::<u32>() {
-if is_process_running(pid) {
+if pid != std::process::id()
+&& is_process_running(pid)
+&& process_is_archipelago(pid)
+{
warn!( warn!(
"Previous process (PID {}) is still running — not a crash, skipping recovery", "Previous process (PID {}) is still running — not a crash, skipping recovery",
pid pid
@ -249,6 +358,8 @@ pub async fn recover_containers(containers: &[RunningContainerRecord]) -> Recove
failed: Vec::new(), failed: Vec::new(),
}; };
pending_boot_starts_add(containers.iter().map(|r| r.name.clone()));
for (i, record) in containers.iter().enumerate() { for (i, record) in containers.iter().enumerate() {
info!( info!(
"Recovering container: {} (image: {})", "Recovering container: {} (image: {})",
@ -311,6 +422,7 @@ pub async fn recover_containers(containers: &[RunningContainerRecord]) -> Recove
if !started { if !started {
report.failed.push(record.name.clone()); report.failed.push(record.name.clone());
} }
pending_boot_start_done(&record.name);
} }
report report
@ -329,6 +441,16 @@ fn is_process_running(pid: u32) -> bool {
std::path::Path::new(&format!("/proc/{}", pid)).exists() std::path::Path::new(&format!("/proc/{}", pid)).exists()
} }
/// Whether the process at `pid` looks like an archipelago instance. Used to
/// tell "the previous instance is genuinely still alive" apart from PID
/// reuse by an unrelated process after a reboot.
fn process_is_archipelago(pid: u32) -> bool {
match std::fs::read(format!("/proc/{pid}/cmdline")) {
Ok(cmdline) => String::from_utf8_lossy(&cmdline).contains("archipelago"),
Err(_) => false,
}
}
/// Start all stopped containers that were previously installed. /// Start all stopped containers that were previously installed.
/// Runs on every startup to ensure containers come back after clean reboots. /// Runs on every startup to ensure containers come back after clean reboots.
/// The crash recovery (PID-based) handles dirty shutdowns; this handles clean ones. /// The crash recovery (PID-based) handles dirty shutdowns; this handles clean ones.
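`process_is_archipelago` matches the substring `archipelago` anywhere in `/proc/<pid>/cmdline`. Since that file holds the NUL-separated argv, an unrelated process whose arguments merely mention an archipelago path would also match. A stricter variant, sketched below, compares only the basename of `argv[0]`. The function names are hypothetical and the binary name `archipelago` is an assumption taken from the comment in the diff, not a confirmed install path.

```rust
/// Extracts the basename of argv[0] from a raw `/proc/<pid>/cmdline` buffer,
/// where arguments are separated (and terminated) by NUL bytes.
fn argv0_basename(cmdline: &[u8]) -> Option<String> {
    let first = cmdline.split(|b| *b == 0).next()?;
    if first.is_empty() {
        return None;
    }
    let arg0 = String::from_utf8_lossy(first);
    let base = arg0.rsplit('/').next()?.to_string();
    Some(base)
}

/// True when the process was launched as `binary` (by basename).
fn launched_as(cmdline: &[u8], binary: &str) -> bool {
    argv0_basename(cmdline).as_deref() == Some(binary)
}

fn main() {
    let ours = b"/usr/local/bin/archipelago\0--serve\0";
    let other = b"/usr/bin/tail\0-f\0/var/log/archipelago.log\0";

    println!("{}", launched_as(ours, "archipelago"));
    // A plain substring test would accept this one too.
    println!("{}", launched_as(other, "archipelago"));
    println!("{}", String::from_utf8_lossy(other).contains("archipelago"));
}
```

The stricter check has its own failure mode: it would reject a legitimately running instance started under a different name, for example through a wrapper script. In this recovery path a false "still alive" skips recovery, while a false "not alive" runs recovery next to a live instance, so which error is cheaper depends on how the binary is deployed.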
@ -353,7 +475,7 @@ async fn start_stopped_app_stacks(data_dir: &Path) -> RecoveryReport {
 };
 for stack in stack_recovery_specs() {
-if !stack_has_any_container(stack).await {
+if !stack_anchor_container_exists(stack).await {
 continue;
 }
@ -363,16 +485,34 @@ async fn start_stopped_app_stacks(data_dir: &Path) -> RecoveryReport {
 );
 repair_stack_network_aliases(stack).await;
+// Register the whole stack up front: the per-member dependency waits
+// below can take minutes, and the UI should say "Restarting", not
+// "Stopped", for members still queued behind them.
+pending_boot_starts_add(
+stack
+.containers
+.iter()
+.filter(|c| !user_stopped.contains(**c))
+.map(|c| (*c).to_string()),
+);
 for container in stack.containers {
 if user_stopped.contains(*container) {
 info!("Skipping user-stopped container: {}", container);
 continue;
 }
-match container_state(container).await {
-Some(state) if state == "running" => continue,
+let state = container_state(container).await;
+match state {
+Some(state) if state == "running" => {
+pending_boot_start_done(container);
+continue;
+}
 Some(_) => {}
-None => continue,
+None => {
+pending_boot_start_done(container);
+continue;
+}
 }
 repair_stack_network_aliases(stack).await;
@ -384,6 +524,7 @@ async fn start_stopped_app_stacks(data_dir: &Path) -> RecoveryReport {
} else { } else {
report.failed.push((*container).to_string()); report.failed.push((*container).to_string());
} }
pending_boot_start_done(container);
} }
} }
@ -557,6 +698,11 @@ struct StackRecoverySpec {
network: &'static str, network: &'static str,
aliases: &'static [(&'static str, &'static str)], aliases: &'static [(&'static str, &'static str)],
containers: &'static [&'static str], containers: &'static [&'static str],
/// The stack's core dependency (its DB / server container) — every other
/// member depends on this being present. Used to distinguish "a genuinely
/// installed stack has a crashed member" from "orphan debris from a
/// partial/failed install" (see `stack_anchor_container_exists`).
anchor: &'static str,
} }
fn stack_recovery_specs() -> &'static [StackRecoverySpec] { fn stack_recovery_specs() -> &'static [StackRecoverySpec] {
@ -570,6 +716,7 @@ fn stack_recovery_specs() -> &'static [StackRecoverySpec] {
("immich_server", "immich_server"), ("immich_server", "immich_server"),
], ],
containers: &["immich_postgres", "immich_redis", "immich_server"], containers: &["immich_postgres", "immich_redis", "immich_server"],
anchor: "immich_postgres",
}, },
StackRecoverySpec { StackRecoverySpec {
name: "indeedhub", name: "indeedhub",
@ -591,6 +738,7 @@ fn stack_recovery_specs() -> &'static [StackRecoverySpec] {
"indeedhub-ffmpeg", "indeedhub-ffmpeg",
"indeedhub", "indeedhub",
], ],
anchor: "indeedhub-postgres",
}, },
StackRecoverySpec { StackRecoverySpec {
name: "netbird", name: "netbird",
@ -601,17 +749,20 @@ fn stack_recovery_specs() -> &'static [StackRecoverySpec] {
("netbird", "netbird"), ("netbird", "netbird"),
], ],
 containers: &["netbird-server", "netbird-dashboard", "netbird"],
+anchor: "netbird-server",
 },
 ]
 }
-async fn stack_has_any_container(stack: &StackRecoverySpec) -> bool {
-for container in stack.containers {
-if container_state(container).await.is_some() {
-return true;
-}
-}
-false
+/// Whether the stack's core dependency container exists at all (running or
+/// not; existence, not health, is what matters here). `false` means any
+/// other stack member still lying around is orphan debris from a partial or
+/// already-uninstalled install, not a legitimately-installed-but-crashed
+/// stack. Blindly restarting those siblings just crash-loops them forever
+/// against a dependency that was never created (indeedhub-api on `.116`,
+/// 2026-07-01: retried every 120s against a nonexistent indeedhub-postgres).
+async fn stack_anchor_container_exists(stack: &StackRecoverySpec) -> bool {
+container_state(stack.anchor).await.is_some()
 }
 async fn repair_stack_network_aliases(stack: &StackRecoverySpec) {
@ -898,6 +1049,43 @@ mod tests {
assert_eq!(containers[1].name, "archy-mempool-web"); assert_eq!(containers[1].name, "archy-mempool-web");
} }
#[tokio::test]
async fn test_load_last_running_names_reads_snapshot_without_pid_gate() {
let tmp = TempDir::new().unwrap();
// No PID file written — load_last_running_names must NOT require a crash.
let snapshot = ContainerSnapshot {
timestamp: 1000,
containers: vec![
RunningContainerRecord {
name: "immich_server".to_string(),
image: "immich:2.7".to_string(),
},
RunningContainerRecord {
name: "immich_postgres".to_string(),
image: "postgres:16".to_string(),
},
],
};
fs::write(
tmp.path().join(CONTAINER_STATE_FILE),
serde_json::to_string(&snapshot).unwrap(),
)
.await
.unwrap();
let names = load_last_running_names(tmp.path()).await;
assert_eq!(names.len(), 2);
assert!(names.contains("immich_server"));
assert!(names.contains("immich_postgres"));
assert!(!names.contains("immich_redis"));
}
#[tokio::test]
async fn test_load_last_running_names_empty_when_absent() {
let tmp = TempDir::new().unwrap();
assert!(load_last_running_names(tmp.path()).await.is_empty());
}
#[tokio::test] #[tokio::test]
async fn test_write_and_remove_pid_marker() { async fn test_write_and_remove_pid_marker() {
let tmp = TempDir::new().unwrap(); let tmp = TempDir::new().unwrap();
@ -960,4 +1148,27 @@ mod tests {
true true
)); ));
} }
#[test]
fn stack_recovery_anchor_is_the_stacks_own_core_dependency() {
// Every stack's anchor must be one of its own containers (typically
// the DB/server the rest depend on) — a typo here would silently
// disable orphan-debris protection for that stack.
for stack in stack_recovery_specs() {
assert!(
stack.containers.contains(&stack.anchor),
"{}: anchor {} not among its own containers",
stack.name,
stack.anchor
);
}
assert_eq!(
stack_recovery_specs()
.iter()
.find(|s| s.name == "indeedhub")
.unwrap()
.anchor,
"indeedhub-postgres"
);
}
} }
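The anchor rule described in the doc comment above can be sketched in isolation. This is a minimal illustration, not the project's recovery code: `StackSpec`, `StackVerdict`, and `classify_stack` are hypothetical names, and the set of existing container names stands in for whatever `container_state` would report.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for the diff's `StackRecoverySpec`.
pub struct StackSpec {
    pub name: &'static str,
    pub containers: &'static [&'static str],
    pub anchor: &'static str,
}

/// Hypothetical verdict type; the real code only exposes a boolean check.
#[derive(Debug, PartialEq)]
pub enum StackVerdict {
    /// The anchor exists, so the stack counts as installed and may be recovered.
    Recoverable,
    /// The anchor is missing but other members exist: leftovers, do not restart.
    OrphanDebris,
    /// No member of the stack exists at all.
    NotInstalled,
}

/// Classify a stack from the names of containers that currently exist
/// (running or stopped; existence is what matters, not health).
pub fn classify_stack(stack: &StackSpec, existing: &HashSet<&str>) -> StackVerdict {
    if existing.contains(stack.anchor) {
        StackVerdict::Recoverable
    } else if stack.containers.iter().any(|c| existing.contains(c)) {
        StackVerdict::OrphanDebris
    } else {
        StackVerdict::NotInstalled
    }
}
```

Under this sketch, an `indeedhub-api` container without `indeedhub-postgres` is classified as debris instead of being restarted in a loop.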

@ -61,6 +61,18 @@ pub struct ServerInfo {
/// True if this node's keys are derived from a BIP-39 seed. /// True if this node's keys are derived from a BIP-39 seed.
#[serde(rename = "seed-backed", default)] #[serde(rename = "seed-backed", default)]
pub seed_backed: bool, pub seed_backed: bool,
/// This node's own physical location, for the Mesh Map — opt-in only
/// (see `share_location`), set via `server.set-location`. `None` until
/// the user sets one, regardless of `share_location`.
#[serde(default)]
pub lat: Option<f64>,
#[serde(default)]
pub lon: Option<f64>,
/// Whether `lat`/`lon` should be included in the state snapshot we send
/// to trusted federation peers (so they can plot us on their Mesh Map).
/// Defaults to false — never shared unless explicitly turned on.
#[serde(rename = "share-location", default)]
pub share_location: bool,
} }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)] #[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
@ -347,6 +359,9 @@ impl DataModel {
wifi_ssids: vec![], wifi_ssids: vec![],
zram_enabled: false, zram_enabled: false,
seed_backed: false, seed_backed: false,
lat: None,
lon: None,
share_location: false,
}, },
package_data: HashMap::new(), package_data: HashMap::new(),
peer_health: HashMap::new(), peer_health: HashMap::new(),

@ -296,7 +296,9 @@ pub(crate) async fn notify_join(
status = %resp.status(), status = %resp.status(),
"peer-joined notification rejected; will retry" "peer-joined notification rejected; will retry"
), ),
- Err(e) => tracing::warn!(attempt, error = %e, "peer-joined notification failed; will retry"),
+ Err(e) => {
+ tracing::warn!(attempt, error = %e, "peer-joined notification failed; will retry")
+ }
} }
tokio::time::sleep(std::time::Duration::from_secs(10 * attempt as u64)).await; tokio::time::sleep(std::time::Duration::from_secs(10 * attempt as u64)).await;
} }

@ -506,6 +506,8 @@ mod tests {
nostr_npub: None, nostr_npub: None,
own_fips_npub: None, own_fips_npub: None,
federated_peers: Vec::new(), federated_peers: Vec::new(),
lat: None,
lon: None,
}; };
update_node_state(dir.path(), "did:key:z1", state) update_node_state(dir.path(), "did:key:z1", state)

@ -208,6 +208,7 @@ async fn merge_transitive_peers(
/// and route directly over FIPS from now on). Only peers we trust are /// and route directly over FIPS from now on). Only peers we trust are
/// shared — an Untrusted/Observer node should not be re-exported /// shared — an Untrusted/Observer node should not be re-exported
/// through us to the network. /// through us to the network.
#[allow(clippy::too_many_arguments)]
pub fn build_local_state( pub fn build_local_state(
apps: Vec<AppStatus>, apps: Vec<AppStatus>,
cpu: f64, cpu: f64,
@ -221,6 +222,9 @@ pub fn build_local_state(
nostr_npub: Option<String>, nostr_npub: Option<String>,
own_fips_npub: Option<String>, own_fips_npub: Option<String>,
federated_peers: &[FederatedNode], federated_peers: &[FederatedNode],
// Only Some when the node has opted in via server.set-location's
// `share` flag — see NodeStateSnapshot::lat/lon's doc comment.
shared_location: Option<(f64, f64)>,
) -> NodeStateSnapshot { ) -> NodeStateSnapshot {
let hints = federated_peers let hints = federated_peers
.iter() .iter()
@ -248,6 +252,8 @@ pub fn build_local_state(
nostr_npub, nostr_npub,
own_fips_npub, own_fips_npub,
federated_peers: hints, federated_peers: hints,
lat: shared_location.map(|(lat, _)| lat),
lon: shared_location.map(|(_, lon)| lon),
} }
} }
@ -341,12 +347,14 @@ mod tests {
None, None,
None, None,
&[], &[],
None,
); );
assert_eq!(state.apps.len(), 1); assert_eq!(state.apps.len(), 1);
assert_eq!(state.cpu_usage_percent, Some(25.5)); assert_eq!(state.cpu_usage_percent, Some(25.5));
assert_eq!(state.tor_active, Some(true)); assert_eq!(state.tor_active, Some(true));
assert_eq!(state.node_name, Some("Test Node".to_string())); assert_eq!(state.node_name, Some("Test Node".to_string()));
assert!(state.federated_peers.is_empty()); assert!(state.federated_peers.is_empty());
assert_eq!(state.lat, None);
} }
#[test] #[test]
@ -392,7 +400,7 @@ mod tests {
last_transport_at: None, last_transport_at: None,
}, },
]; ];
- let state = build_local_state(vec![], 0.0, 0, 0, 0, 0, 0, true, None, None, None, &peers);
+ let state = build_local_state(vec![], 0.0, 0, 0, 0, 0, 0, true, None, None, None, &peers, None);
assert_eq!(state.federated_peers.len(), 1); assert_eq!(state.federated_peers.len(), 1);
assert_eq!(state.federated_peers[0].did, "did:key:zTrusted"); assert_eq!(state.federated_peers[0].did, "did:key:zTrusted");
assert_eq!( assert_eq!(

@ -93,6 +93,14 @@ pub struct NodeStateSnapshot {
/// re-export them in her own state snapshots). /// re-export them in her own state snapshots).
#[serde(default)] #[serde(default)]
pub federated_peers: Vec<FederationPeerHint>, pub federated_peers: Vec<FederationPeerHint>,
/// This node's own location, for the Mesh Map — only present when the
/// sender has opted in via `server.set-location`'s `share` flag. Absent
/// (not just null) for nodes that haven't opted in, so older receivers
/// and the map's "no location shared" state both fall out naturally.
#[serde(default)]
pub lat: Option<f64>,
#[serde(default)]
pub lon: Option<f64>,
} }
/// Minimal peer summary shared via `NodeStateSnapshot.federated_peers`. /// Minimal peer summary shared via `NodeStateSnapshot.federated_peers`.
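The opt-in flow for location sharing spans three files in this diff: `ServerInfo` stores `lat`, `lon`, and `share_location`; `build_local_state` receives an `Option<(f64, f64)>`; and `NodeStateSnapshot` carries the split fields. A minimal sketch of the gate between them follows. `LocationSettings`, `shared_location`, and `snapshot_fields` are hypothetical names, and the call site that actually computes the pair is not shown in the diff.

```rust
/// Hypothetical subset of the diff's `ServerInfo` location fields.
pub struct LocationSettings {
    pub lat: Option<f64>,
    pub lon: Option<f64>,
    pub share_location: bool,
}

/// Coordinates to export to peers: present only when sharing is switched on
/// AND both coordinates have been set.
pub fn shared_location(s: &LocationSettings) -> Option<(f64, f64)> {
    if !s.share_location {
        return None;
    }
    Some((s.lat?, s.lon?))
}

/// Mirrors how the diff's `build_local_state` splits the pair back into
/// the snapshot's separate `lat` / `lon` fields.
pub fn snapshot_fields(shared: Option<(f64, f64)>) -> (Option<f64>, Option<f64>) {
    (shared.map(|(lat, _)| lat), shared.map(|(_, lon)| lon))
}
```

Passing a single optional pair keeps the two coordinates from drifting apart: either both reach the snapshot or neither does.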

@ -216,6 +216,44 @@ pub struct ApplyResult {
pub message: String, pub message: String,
} }
/// FIPS UDP transport port (matches `transports.udp.bind_addr` in the generated
/// `fips.yaml`). Direct peer links dial this, NOT the HTTP/LAN messaging port.
const FIPS_UDP_PORT: u16 = 8668;
/// Build transient seed-anchor entries that dial LAN-discovered federation peers
/// directly over their FIPS UDP transport. For each peer the registry knows both
/// a LAN socket address AND a FIPS npub for, point a `udp` anchor at
/// `<lan-ip>:8668`. This lets co-located federation nodes form a DIRECT FIPS link
/// instead of depending on the global anchor's spanning tree to route between
/// them (the cause of every dial falling back to Tor when the anchor link flaps).
///
/// This is FIPS's own UDP transport over the LAN — not Tailscale, not the LAN
/// HTTP messaging port. NOT persisted to `seed-anchors.json`: recomputed each
/// apply tick from live LAN discovery, so a peer's changing IP self-corrects and
/// stale entries never accumulate. `fipsctl connect` is idempotent, so
/// re-applying just keeps the link warm.
pub fn lan_fips_anchors(peers: &[crate::transport::PeerRecord]) -> Vec<SeedAnchor> {
let mut out = Vec::new();
for p in peers {
let (Some(lan), Some(npub)) = (p.lan_address.as_deref(), p.fips_npub.as_deref()) else {
continue;
};
// lan_address is the peer's HTTP/LAN socket ("ip:port"); reuse only its IP
// and target the FIPS UDP port. SocketAddr::new(...).to_string() formats
// IPv6 with brackets correctly.
let Ok(sa) = lan.parse::<std::net::SocketAddr>() else {
continue;
};
out.push(SeedAnchor {
npub: npub.to_string(),
address: std::net::SocketAddr::new(sa.ip(), FIPS_UDP_PORT).to_string(),
transport: "udp".to_string(),
label: "LAN federation peer (direct FIPS)".to_string(),
});
}
out
}
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use super::*; use super::*;
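The address rewrite inside `lan_fips_anchors` (keep the peer's IP, swap in the FIPS UDP port) can be exercised on its own with the standard library. The helper name `fips_udp_address` is an assumption made for this sketch; the port constant is taken from the diff.

```rust
use std::net::SocketAddr;

/// FIPS UDP transport port, as declared in the diff.
const FIPS_UDP_PORT: u16 = 8668;

/// Hypothetical helper: take a peer's "ip:port" LAN socket string, keep only
/// the IP, and return the address of its FIPS UDP transport. Returns `None`
/// when the input is not a valid socket address.
pub fn fips_udp_address(lan_address: &str) -> Option<String> {
    let sa: SocketAddr = lan_address.parse().ok()?;
    // SocketAddr's Display impl adds the brackets IPv6 needs.
    Some(SocketAddr::new(sa.ip(), FIPS_UDP_PORT).to_string())
}
```

Going through `SocketAddr` instead of string splitting avoids mishandling IPv6 literals, whose colons would otherwise be confused with the port separator.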

@ -1358,6 +1358,14 @@ mod tests {
host_port_ready: None, host_port_ready: None,
healthy: true, healthy: true,
}, },
ContainerHealth {
name: "indeedhub-minio".into(),
app_id: "indeedhub-minio".into(),
state: "running".into(),
podman_health: None,
host_port_ready: None,
healthy: true,
},
ContainerHealth { ContainerHealth {
name: "indeedhub-api".into(), name: "indeedhub-api".into(),
app_id: "indeedhub-api".into(), app_id: "indeedhub-api".into(),

@ -98,11 +98,15 @@ async fn main() -> Result<()> {
let startup_start = std::time::Instant::now(); let startup_start = std::time::Instant::now();
crash_recovery::init_start_time(); crash_recovery::init_start_time();
- // Initialize tracing
+ // Initialize tracing. Default to `info`: production units don't set
// RUST_LOG, and the old `archipelago=debug` default flooded journald
// with per-request debug lines ("RPC method: …", cookie-flag notes) —
// part of a >1 GB/day journal on a fresh node. Set RUST_LOG (e.g.
// RUST_LOG=archipelago=debug) to get debug logs back when debugging.
tracing_subscriber::fmt() tracing_subscriber::fmt()
.with_env_filter( .with_env_filter(
tracing_subscriber::EnvFilter::try_from_default_env() tracing_subscriber::EnvFilter::try_from_default_env()
- .unwrap_or_else(|_| "archipelago=debug,info".into()),
+ .unwrap_or_else(|_| "info".into()),
) )
.init(); .init();
@ -149,13 +153,18 @@ async fn main() -> Result<()> {
); );
} }
- // Write PID marker early so we can detect crashes on next startup
+ // Check for a crash marker BEFORE writing our own. The old order wrote
// the marker first, so the check always read the CURRENT process's PID,
// found it alive, and skipped recovery — on every boot, forever.
let crash_containers = crash_recovery::check_for_crash(&config.data_dir).await;
// Now mark this instance as running so the next startup can detect a crash.
crash_recovery::write_pid_marker(&config.data_dir).await?; crash_recovery::write_pid_marker(&config.data_dir).await?;
// Run crash recovery before starting the manifest reconciler. Both paths // Run crash recovery before starting the manifest reconciler. Both paths
// mutate Podman; running them concurrently can corrupt transient runtime // mutate Podman; running them concurrently can corrupt transient runtime
// state and leave netavark/conmon unable to start containers. // state and leave netavark/conmon unable to start containers.
- match crash_recovery::check_for_crash(&config.data_dir).await {
+ match crash_containers {
Ok(Some(containers)) => { Ok(Some(containers)) => {
info!( info!(
"🔧 Recovering {} containers from previous crash...", "🔧 Recovering {} containers from previous crash...",
@ -198,6 +207,24 @@ async fn main() -> Result<()> {
(Some(trait_obj), Some(dev)) (Some(trait_obj), Some(dev))
} else { } else {
let prod = Arc::new(ProdContainerOrchestrator::new(config.clone()).await?); let prod = Arc::new(ProdContainerOrchestrator::new(config.clone()).await?);
// Pull the freshest signed app-catalog BEFORE loading manifests, so any
// registry-embedded manifest (the origin-wins overlay in load_manifests)
// is in place on THIS boot — not a restart later. Without this the boot
// would overlay the previous run's cached catalog and a newly-published
// app (e.g. a registry-only install) wouldn't appear until the next
// restart. Bounded + best-effort: on timeout/unreachable origin the
// last-cached catalog (or the disk manifests) still load — registry is
// an overlay on top of disk, never a hard dependency.
match tokio::time::timeout(
std::time::Duration::from_secs(25),
crate::container::app_catalog::refresh_catalog(&config.data_dir),
)
.await
{
Ok(Ok(n)) => info!("🛰️ app-catalog refreshed before manifest load ({n} apps)"),
Ok(Err(e)) => tracing::debug!("app-catalog pre-load refresh failed (using cache): {e}"),
Err(_) => tracing::debug!("app-catalog pre-load refresh timed out (using cache)"),
}
// Best-effort manifest load; a missing /opt/archipelago/apps is // Best-effort manifest load; a missing /opt/archipelago/apps is
// logged inside load_manifests and not fatal. // logged inside load_manifests and not fatal.
match prod.load_manifests().await { match prod.load_manifests().await {
@ -270,7 +297,9 @@ async fn main() -> Result<()> {
// via auth.setup RPC. The Login page detects is_setup=false and shows // via auth.setup RPC. The Login page detects is_setup=false and shows
// "Create Password" form instead of login form. // "Create Password" form instead of login form.
- // Create server
+ // Create server. Keep a clone of the orchestrator handle for the background
// update scheduler (per-app auto-update applies via the orchestrator).
let update_orchestrator = orchestrator.clone();
let server = Server::new(config.clone(), orchestrator, dev_orchestrator).await?; let server = Server::new(config.clone(), orchestrator, dev_orchestrator).await?;
// Start server // Start server
@ -295,10 +324,12 @@ async fn main() -> Result<()> {
}); });
} }
- // Spawn background update scheduler
+ // Spawn background update scheduler. Pass the orchestrator so the scheduler
// can apply per-app auto-update-to-latest (multi-version support) via the
// safe orchestrator upgrade path; None in dev mode disables it.
let update_data_dir = config.data_dir.clone(); let update_data_dir = config.data_dir.clone();
tokio::spawn(async move { tokio::spawn(async move {
- update::run_update_scheduler(update_data_dir).await;
+ update::run_update_scheduler(update_data_dir, update_orchestrator).await;
}); });
// Synchronize host-side doctor artifacts (script + systemd units) with // Synchronize host-side doctor artifacts (script + systemd units) with
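The ordering fix in this file (read the previous instance's marker before writing our own) is easy to get wrong, so a small synchronous sketch may help. It makes several assumptions: the marker file name is invented, the functions are hypothetical stand-ins for `check_for_crash` and `write_pid_marker`, and any leftover marker is treated as evidence of an unclean shutdown. The real code additionally checks whether the recorded PID is still alive.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical marker file name; the real one is not shown in the diff.
const MARKER: &str = "archipelago.pid";

/// PID left behind by a previous instance, if a marker file exists.
pub fn read_previous_marker(dir: &Path) -> Option<u32> {
    fs::read_to_string(dir.join(MARKER)).ok()?.trim().parse().ok()
}

/// Record this instance's PID so the next startup can detect a crash.
pub fn write_marker(dir: &Path, pid: u32) -> io::Result<()> {
    fs::write(dir.join(MARKER), pid.to_string())
}

/// Correct order: inspect the old marker FIRST, then overwrite it.
/// Writing first would make the check read back our own PID every time.
pub fn startup(dir: &Path, my_pid: u32) -> io::Result<Option<u32>> {
    let previous = read_previous_marker(dir);
    write_marker(dir, my_pid)?;
    Ok(previous)
}
```

With the two calls swapped, `startup` would always return `Some(my_pid)`, which is the failure mode the diff's comment describes.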

@ -181,7 +181,10 @@ async fn is_sender_allowed(
match peers.get(&sender_contact_id) { match peers.get(&sender_contact_id) {
// Match identity on the bound archipelago key (stable, advert/ // Match identity on the bound archipelago key (stable, advert/
// federation-verified), not the firmware routing key. // federation-verified), not the firmware routing key.
- Some(p) => (p.identity_pubkey_hex().map(|s| s.to_string()), p.did.clone()),
+ Some(p) => (
+ p.identity_pubkey_hex().map(|s| s.to_string()),
+ p.did.clone(),
+ ),
None => (None, None), None => (None, None),
} }
}; };

@ -314,17 +314,82 @@ pub(super) async fn try_chunk_reassemble(
/// Look up a peer by pubkey hex prefix. Returns (contact_id, display_name). /// Look up a peer by pubkey hex prefix. Returns (contact_id, display_name).
pub(super) async fn resolve_peer(state: &Arc<MeshState>, sender_prefix: &str) -> (u32, String) { pub(super) async fn resolve_peer(state: &Arc<MeshState>, sender_prefix: &str) -> (u32, String) {
- let peers = state.peers.read().await;
- peers
- .values()
- .find(|p| {
+ {
+ let peers = state.peers.read().await;
+ if let Some(peer) = peers.values().find(|p| {
p.pubkey_hex
.as_ref()
.map(|k| k.starts_with(sender_prefix))
.unwrap_or(false)
- })
- .map(|p| (p.contact_id, p.advert_name.clone()))
- .unwrap_or((0, sender_prefix.to_string()))
+ }) {
+ return (peer.contact_id, peer.advert_name.clone());
+ }
+ }
if let Some((node_num, pubkey_hex, name)) = meshtastic_peer_from_prefix(sender_prefix) {
let peer = MeshPeer {
contact_id: node_num,
advert_name: name.clone(),
did: None,
pubkey_hex: Some(pubkey_hex),
arch_pubkey_hex: None,
x25519_pubkey: None,
rssi: None,
snr: None,
last_heard: chrono::Utc::now().to_rfc3339(),
hops: 0xff,
last_advert: 0,
reachable: true,
// Stamped fresh from `peer_pubkeys` in `get_contacts` once a real
// contact refresh runs; unknown at synthesis time here.
pkc_capable: false,
lat: None,
lon: None,
};
let is_new = {
let mut peers = state.peers.write().await;
peers.insert(node_num, peer.clone()).is_none()
};
state.update_peer_count().await;
let _ = state.event_tx.send(if is_new {
MeshEvent::PeerDiscovered(peer)
} else {
MeshEvent::PeerUpdated(peer)
});
return (node_num, name);
}
(0, sender_prefix.to_string())
}
fn meshtastic_peer_from_prefix(sender_prefix: &str) -> Option<(u32, String, String)> {
if sender_prefix.len() < 12 {
return None;
}
let bytes = hex::decode(&sender_prefix[..12]).ok()?;
if bytes.len() != 6 || bytes[4] != b'm' || bytes[5] != b'e' {
return None;
}
let node_num = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
if node_num == 0 || node_num == u32::MAX {
return None;
}
let mut full_key = [0u8; 32];
full_key[..4].copy_from_slice(&node_num.to_le_bytes());
full_key[4..15].copy_from_slice(b"meshtastic:");
let name = format!("Meshtastic !{:08x}", node_num);
Some((node_num, hex::encode(full_key), name))
}
/// Stamp the SNR carried in a Meshcore v3 contact-message frame onto the
/// sender's peer record so the signal-bars indicator has real data (Meshcore
/// has no per-packet RSSI like Meshtastic, only this 1-byte SNR — see
/// `protocol::parse_contact_msg_v3_raw`).
pub(super) async fn update_peer_snr(state: &Arc<MeshState>, contact_id: u32, snr: f32) {
let mut peers = state.peers.write().await;
if let Some(peer) = peers.get_mut(&contact_id) {
peer.snr = Some(snr);
}
} }
/// Store a plain-text (non-typed) message and emit an event. /// Store a plain-text (non-typed) message and emit an event.
@ -333,8 +398,19 @@ pub(super) async fn store_plain_message(
contact_id: u32, contact_id: u32,
peer_name: &str, peer_name: &str,
text: &str, text: &str,
) {
store_plain_message_with_encryption(state, contact_id, peer_name, text, false).await;
}
pub(super) async fn store_plain_message_with_encryption(
state: &Arc<MeshState>,
contact_id: u32,
peer_name: &str,
text: &str,
encrypted: bool,
) { ) {
let msg_id = state.next_id().await; let msg_id = state.next_id().await;
let radio_transport = radio_transport_label(state.status.read().await.device_type);
let msg = MeshMessage { let msg = MeshMessage {
id: msg_id, id: msg_id,
direction: MessageDirection::Received, direction: MessageDirection::Received,
@ -343,7 +419,8 @@ pub(super) async fn store_plain_message(
plaintext: text.to_string(), plaintext: text.to_string(),
timestamp: chrono::Utc::now().to_rfc3339(), timestamp: chrono::Utc::now().to_rfc3339(),
delivered: true, delivered: true,
- encrypted: false,
+ encrypted,
transport: Some(radio_transport.to_string()),
message_type: "text".to_string(), message_type: "text".to_string(),
typed_payload: None, typed_payload: None,
sender_pubkey: None, sender_pubkey: None,
@ -501,6 +578,11 @@ pub(super) async fn handle_identity_received(
last_advert: 0, last_advert: 0,
// We just heard this peer's identity advert, so it's reachable. // We just heard this peer's identity advert, so it's reachable.
reachable: true, reachable: true,
// PKC capability is tracked by the radio driver's get_contacts(), not
// known at identity-advert time.
pkc_capable: false,
lat: None,
lon: None,
}; };
let is_new = { let is_new = {
@ -567,6 +649,7 @@ pub(super) async fn handle_received_message(
.map(|p| p.advert_name.clone()); .map(|p| p.advert_name.clone());
let msg_id = state.next_id().await; let msg_id = state.next_id().await;
let radio_transport = radio_transport_label(state.status.read().await.device_type);
let msg = MeshMessage { let msg = MeshMessage {
id: msg_id, id: msg_id,
direction: MessageDirection::Received, direction: MessageDirection::Received,
@ -576,6 +659,7 @@ pub(super) async fn handle_received_message(
timestamp: chrono::Utc::now().to_rfc3339(), timestamp: chrono::Utc::now().to_rfc3339(),
delivered: true, delivered: true,
encrypted, encrypted,
transport: Some(radio_transport.to_string()),
message_type: "text".to_string(), message_type: "text".to_string(),
typed_payload: None, typed_payload: None,
sender_pubkey: None, sender_pubkey: None,
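The synthetic key layout that `meshtastic_peer_from_prefix` relies on (4 little-endian node-number bytes followed by the ASCII tag `meshtastic:`) can be checked with a std-only sketch. `decode_hex`, `meshtastic_node_num`, and `synthetic_key` are hypothetical helpers written for this illustration; the original uses the `hex` crate.

```rust
/// Std-only stand-in for `hex::decode`: even-length, hex digits only.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Recover the Meshtastic node number from a 6-byte (12 hex char) prefix:
/// bytes 0..4 are the node number (little endian), bytes 4..6 must be "me".
pub fn meshtastic_node_num(sender_prefix: &str) -> Option<u32> {
    let bytes = decode_hex(sender_prefix.get(..12)?)?;
    if bytes[4] != b'm' || bytes[5] != b'e' {
        return None;
    }
    let node_num = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    // 0 and u32::MAX are rejected, as in the diff.
    if node_num == 0 || node_num == u32::MAX {
        return None;
    }
    Some(node_num)
}

/// Rebuild the 32-byte synthetic key the prefix was taken from.
pub fn synthetic_key(node_num: u32) -> [u8; 32] {
    let mut key = [0u8; 32];
    key[..4].copy_from_slice(&node_num.to_le_bytes());
    key[4..15].copy_from_slice(b"meshtastic:");
    key
}
```

Unlike the slice in the original, `str::get(..12)` returns `None` instead of panicking if byte 12 is not a character boundary, which is a deliberate difference in this sketch.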

@ -34,7 +34,10 @@ async fn store_typed_message(
plaintext: display_text.to_string(), plaintext: display_text.to_string(),
timestamp: chrono::Utc::now().to_rfc3339(), timestamp: chrono::Utc::now().to_rfc3339(),
delivered: true, delivered: true,
// transport + E2E are stamped post-dispatch by
// handle_typed_envelope_direct, which alone knows the receive transport.
encrypted: false, encrypted: false,
transport: None,
message_type: type_label.to_string(), message_type: type_label.to_string(),
typed_payload, typed_payload,
sender_pubkey, sender_pubkey,
@ -70,7 +73,69 @@ pub(super) async fn handle_typed_message(
return; return;
} }
}; };
// Radio-delivered → the active device's transport label ("lora" or
// "reticulum"). Stamp after dispatch (see stamp helper).
let before = max_message_id(state).await;
handle_typed_envelope_direct(state, sender_contact_id, sender_name, envelope).await; handle_typed_envelope_direct(state, sender_contact_id, sender_name, envelope).await;
let radio_transport = radio_transport_label(state.status.read().await.device_type);
stamp_received_transport(state, sender_contact_id, before, radio_transport, false).await;
}
/// Highest stored message id right now. Paired with `stamp_received_transport`
/// to identify messages a dispatch call just stored (ids are monotonic).
pub(crate) async fn max_message_id(state: &Arc<MeshState>) -> u64 {
state
.messages
.read()
.await
.iter()
.map(|m| m.id)
.max()
.unwrap_or(0)
}
/// Stamp the per-message transport pill (and E2E flag) onto every RECEIVED
/// message from `sender_contact_id` stored since `after_id` — i.e. the ones the
/// just-completed `handle_typed_envelope_direct` produced. This is how both the
/// radio path ("lora") and the federation path ("fips"/"tor") tag inbound
/// messages without threading transport through all 20 typed-dispatch sites.
/// `encrypted` only ever sets the flag true (a federation envelope is E2E),
/// never clears a true set elsewhere.
pub(crate) async fn stamp_received_transport(
state: &Arc<MeshState>,
sender_contact_id: u32,
after_id: u64,
transport: &str,
encrypted: bool,
) {
let mut messages = state.messages.write().await;
for m in messages.iter_mut() {
if m.id > after_id
&& matches!(m.direction, MessageDirection::Received)
&& m.peer_contact_id == sender_contact_id
{
if m.transport.is_none() {
m.transport = Some(transport.to_string());
}
if encrypted {
m.encrypted = true;
}
}
}
}
/// Mark every RECEIVED message stored since `after_id` as end-to-end encrypted.
/// Used by the session loop to stamp the E2E pill on a meshtastic frame the radio
/// reported PKI-encrypted (the synthetic frame can't carry that flag, and the
/// typed-dispatch store path defaults `encrypted` to false). One inbound frame
/// yields at most one received message, so no sender filter is needed.
pub(crate) async fn stamp_received_encrypted(state: &Arc<MeshState>, after_id: u64) {
let mut messages = state.messages.write().await;
for m in messages.iter_mut() {
if m.id > after_id && matches!(m.direction, MessageDirection::Received) {
m.encrypted = true;
}
}
} }
/// Dispatch a pre-decoded TypedEnvelope. Shared between the radio receive /// Dispatch a pre-decoded TypedEnvelope. Shared between the radio receive
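The watermark technique used by `max_message_id` and `stamp_received_transport` (record the highest id, dispatch, then tag everything stored after it) can be shown without the async state. The `Msg` and `Direction` types below are simplified, hypothetical stand-ins for `MeshMessage` and `MessageDirection`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Direction {
    Sent,
    Received,
}

/// Hypothetical, trimmed-down message record.
#[derive(Debug, Clone)]
pub struct Msg {
    pub id: u64,
    pub direction: Direction,
    pub peer: u32,
    pub transport: Option<String>,
    pub encrypted: bool,
}

/// Highest stored id, or 0 when the store is empty (ids are monotonic).
pub fn max_id(msgs: &[Msg]) -> u64 {
    msgs.iter().map(|m| m.id).max().unwrap_or(0)
}

/// Tag every received message from `sender` stored after `after_id`.
/// An existing transport label is kept, and `encrypted` can only be
/// switched on, never cleared.
pub fn stamp(msgs: &mut [Msg], sender: u32, after_id: u64, transport: &str, encrypted: bool) {
    for m in msgs.iter_mut() {
        if m.id > after_id && m.direction == Direction::Received && m.peer == sender {
            if m.transport.is_none() {
                m.transport = Some(transport.to_string());
            }
            if encrypted {
                m.encrypted = true;
            }
        }
    }
}
```

The design relies on ids being monotonic; messages stored before the watermark, or from another peer, are left untouched.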

@ -4,7 +4,8 @@ use super::super::message_types::TypedEnvelope;
use super::super::protocol; use super::super::protocol;
use super::decode::{ use super::decode::{
handle_identity_received, is_mc_chunk_frame, resolve_peer, store_plain_message, handle_identity_received, is_mc_chunk_frame, resolve_peer, store_plain_message,
- try_base64_typed, try_chunk_reassemble, try_decrypt_base64, try_decrypt_ratchet_base64,
+ store_plain_message_with_encryption, try_base64_typed, try_chunk_reassemble,
+ try_decrypt_base64, try_decrypt_ratchet_base64, update_peer_snr,
}; };
use super::dispatch::handle_typed_message; use super::dispatch::handle_typed_message;
use super::MeshState; use super::MeshState;
@ -62,12 +63,14 @@ pub(super) async fn handle_frame(
return true; // Signal caller to sync immediately return true; // Signal caller to sync immediately
} }
- protocol::RESP_CONTACT_MSG_V3 => {
+ protocol::RESP_CONTACT_MSG_V3 | protocol::RESP_CONTACT_MSG_V3_E2E => {
// Direct message received (v3 format) — check for typed envelope first // Direct message received (v3 format) — check for typed envelope first
match protocol::parse_contact_msg_v3_raw(&frame.data) { match protocol::parse_contact_msg_v3_raw(&frame.data) {
- Ok((sender_prefix, payload, _snr)) => {
+ Ok((sender_prefix, payload, snr)) => {
if !payload.is_empty() {
+ let encrypted = frame.code == protocol::RESP_CONTACT_MSG_V3_E2E;
let (contact_id, name) = resolve_peer(state, &sender_prefix).await;
+ update_peer_snr(state, contact_id, snr as f32).await;
if TypedEnvelope::is_typed(&payload) { if TypedEnvelope::is_typed(&payload) {
handle_typed_message(&payload, contact_id, &name, state).await; handle_typed_message(&payload, contact_id, &name, state).await;
} else if let Some(decoded) = try_base64_typed(&payload) { } else if let Some(decoded) = try_base64_typed(&payload) {
@ -86,7 +89,10 @@ pub(super) async fn handle_frame(
handle_typed_message(&decoded, contact_id, &name, state).await; handle_typed_message(&decoded, contact_id, &name, state).await;
} else if !payload.starts_with(b"MC") { } else if !payload.starts_with(b"MC") {
let text = String::from_utf8_lossy(&payload).to_string(); let text = String::from_utf8_lossy(&payload).to_string();
- store_plain_message(state, contact_id, &name, &text).await;
+ store_plain_message_with_encryption(
+ state, contact_id, &name, &text, encrypted,
+ )
+ .await;
info!(from = %sender_prefix, "Received mesh DM (v3)"); info!(from = %sender_prefix, "Received mesh DM (v3)");
} }
} }
@ -133,8 +139,14 @@ pub(super) async fn handle_frame(
match protocol::parse_channel_msg_v3_raw(&frame.data) { match protocol::parse_channel_msg_v3_raw(&frame.data) {
Ok((channel_idx, payload)) => { Ok((channel_idx, payload)) => {
if !payload.is_empty() { if !payload.is_empty() {
- handle_channel_payload(state, channel_idx, &payload, our_x25519_secret)
- .await;
+ handle_channel_payload(
+ state,
+ channel_idx,
+ &payload,
+ our_x25519_secret,
+ None,
+ )
+ .await;
} }
} }
Err(e) => warn!("Failed to parse v3 channel message: {}", e), Err(e) => warn!("Failed to parse v3 channel message: {}", e),
@ -146,14 +158,44 @@ pub(super) async fn handle_frame(
match protocol::parse_channel_msg_v1_raw(&frame.data) { match protocol::parse_channel_msg_v1_raw(&frame.data) {
Ok((channel_idx, payload)) => { Ok((channel_idx, payload)) => {
if !payload.is_empty() { if !payload.is_empty() {
- handle_channel_payload(state, channel_idx, &payload, our_x25519_secret)
- .await;
+ handle_channel_payload(
+ state,
+ channel_idx,
+ &payload,
+ our_x25519_secret,
+ None,
+ )
+ .await;
} }
} }
Err(e) => warn!("Failed to parse channel message: {}", e), Err(e) => warn!("Failed to parse channel message: {}", e),
} }
} }
// Synthetic Meshtastic channel broadcast that carries its sender:
// `[channel_idx: u8][sender_pubkey_prefix: 6 bytes][text…]`. Resolve the
// sender to a friendly name, then file the message under the channel
// thread attributed to them — this is what makes the default public
// LongFast channel actually show inbound traffic (and who sent it).
protocol::RESP_MESHTASTIC_CHANNEL_TEXT => {
if frame.data.len() > 7 {
let channel_idx = frame.data[0];
let sender_prefix_hex = hex::encode(&frame.data[1..7]);
let payload = frame.data[7..].to_vec();
if !payload.is_empty() {
let (_cid, name) = resolve_peer(state, &sender_prefix_hex).await;
handle_channel_payload(
state,
channel_idx,
&payload,
our_x25519_secret,
Some(name),
)
.await;
}
}
}
protocol::PUSH_LOG_DATA | protocol::PUSH_PATH_UPDATE | protocol::PUSH_RAW_DATA => { protocol::PUSH_LOG_DATA | protocol::PUSH_PATH_UPDATE | protocol::PUSH_RAW_DATA => {
// Internal device logging/path data — safe to ignore // Internal device logging/path data — safe to ignore
} }
@ -177,6 +219,12 @@ async fn handle_channel_payload(
channel_idx: u8, channel_idx: u8,
payload: &[u8], payload: &[u8],
our_x25519_secret: &[u8; 32], our_x25519_secret: &[u8; 32],
// When the transport knows who sent this channel broadcast (Meshtastic
// packets carry the originating node), the plain-text/typed message is filed
// under the channel thread but attributed to this sender name. Meshcore
// channel frames carry no sender, so they pass `None` and fall back to a
// generic "Channel N" label.
sender_name: Option<String>,
) { ) {
// DM-via-channel wrapper (text form): the channel text carries an // DM-via-channel wrapper (text form): the channel text carries an
// ASCII "@DM:<base64>" token somewhere in the body. We locate the // ASCII "@DM:<base64>" token somewhere in the body. We locate the
@ -385,15 +433,18 @@ async fn handle_channel_payload(
} }
} }
- // Regular channel broadcast (not DM-wrapped)
+ // Regular channel broadcast (not DM-wrapped). File it under the channel
// thread (contact_id = u32::MAX - idx) but label it with the real sender
// when the transport gave us one (Meshtastic), so the channel view shows who
// said what. Meshcore frames have no sender → generic "Channel N".
let chan_contact_id = u32::MAX - (channel_idx as u32); let chan_contact_id = u32::MAX - (channel_idx as u32);
- let chan_name = format!("Channel {}", channel_idx);
+ let chan_name = sender_name.unwrap_or_else(|| format!("Channel {}", channel_idx));
if TypedEnvelope::is_typed(payload) { if TypedEnvelope::is_typed(payload) {
handle_typed_message(payload, chan_contact_id, &chan_name, state).await; handle_typed_message(payload, chan_contact_id, &chan_name, state).await;
} else { } else {
let text = String::from_utf8_lossy(payload).to_string(); let text = String::from_utf8_lossy(payload).to_string();
store_plain_message(state, chan_contact_id, &chan_name, &text).await; store_plain_message(state, chan_contact_id, &chan_name, &text).await;
- info!(channel = channel_idx, "Received mesh channel message");
+ info!(channel = channel_idx, sender = %chan_name, "Received mesh channel message");
} }
} }
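The synthetic Meshtastic channel frame handled above has the layout `[channel_idx: u8][sender_pubkey_prefix: 6 bytes][text]`, and channel threads are filed under `u32::MAX - idx`. A minimal parser for that layout follows; `ChannelText`, `parse_channel_text`, and `channel_contact_id` are hypothetical names introduced for this sketch.

```rust
/// Hypothetical parsed view of a synthetic channel-text frame.
pub struct ChannelText<'a> {
    pub channel_idx: u8,
    /// Lowercase hex of the 6-byte sender prefix (12 characters).
    pub sender_prefix_hex: String,
    pub payload: &'a [u8],
}

/// Parse `[channel_idx][6-byte sender prefix][text...]`.
/// Frames with no text after the 7-byte header are rejected, matching the
/// `len() > 7` guard in the diff.
pub fn parse_channel_text(data: &[u8]) -> Option<ChannelText<'_>> {
    if data.len() <= 7 {
        return None;
    }
    let sender_prefix_hex = data[1..7].iter().map(|b| format!("{:02x}", b)).collect();
    Some(ChannelText {
        channel_idx: data[0],
        sender_prefix_hex,
        payload: &data[7..],
    })
}

/// Contact id under which a channel thread is filed, as in the diff.
pub fn channel_contact_id(channel_idx: u8) -> u32 {
    u32::MAX - channel_idx as u32
}
```

Counting channel ids down from `u32::MAX` keeps them out of the range that ordinary peer contact ids occupy.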

@ -28,6 +28,26 @@ const ADVERT_INTERVAL: Duration = Duration::from_secs(60);
/// How often to poll for queued messages when no push notifications. /// How often to poll for queued messages when no push notifications.
const SYNC_INTERVAL: Duration = Duration::from_secs(10); const SYNC_INTERVAL: Duration = Duration::from_secs(10);
/// Backlog #12 (provisioning robustness): if we haven't successfully received
/// ANY frame in this long, treat the serial link as stalled and force a
/// reconnect — the write-side `consecutive_write_failures` counter is blind
/// to a receive-only stall (writes can keep succeeding while the radio's
/// stopped streaming inbound, e.g. the FROM_RADIO_REBOOTED-without-recovery
/// case meshtastic.rs already has a targeted, immediate fix for — this
/// watchdog is just the backstop for a device that goes silent WITHOUT
/// emitting that notification).
///
/// 5 minutes was originally chosen on the (wrong) assumption that the 60s
/// advert / 10s sync cadence implies *received* traffic — those are our own
/// OUTBOUND cadences and say nothing about what peers send us. A quiet mesh
/// (no peer transmitting, or Reticulum/LXMF's point-to-point store-and-
/// forward model with no broadcast echo) can be legitimately RX-silent for
/// long stretches with the link perfectly healthy; at 300s this forced a
/// full auto-detect reconnect (visible in the UI as "Connecting…") every
/// ~5 minutes on otherwise-idle nodes. 30 minutes still catches a wedged
/// device in reasonable time without false-triggering on normal mesh quiet.
const RX_STALL_TIMEOUT: Duration = Duration::from_secs(1800);
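
A receive-side watchdog of the kind described above might look roughly like the following. This is a sketch under assumptions: `RxWatchdog` and its methods are hypothetical names, and the session loop that would call them is not shown in this diff.

```rust
use std::time::{Duration, Instant};

/// Hypothetical receive-stall watchdog: tracks when the last inbound frame
/// arrived and reports a stall once `timeout` has elapsed without one.
struct RxWatchdog {
    last_rx: Instant,
    timeout: Duration,
}

impl RxWatchdog {
    fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_rx: now, timeout }
    }

    /// Call whenever any frame is successfully received.
    fn note_frame(&mut self, now: Instant) {
        self.last_rx = now;
    }

    /// True when no frame has been seen for at least `timeout`.
    /// Outbound writes never touch `last_rx`, so a link that still accepts
    /// writes but has gone RX-silent is detected here.
    fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_rx) >= self.timeout
    }
}
```

Passing `now` in explicitly keeps the check deterministic under test; with the 1800 s value above, a node that is merely quiet for five minutes is not treated as stalled.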
/// Maximum stored messages (circular buffer). /// Maximum stored messages (circular buffer).
const MAX_MESSAGES: usize = 100; const MAX_MESSAGES: usize = 100;
@ -63,6 +83,25 @@ pub enum MeshCommand {
dest_pubkey_prefix: [u8; 6], dest_pubkey_prefix: [u8; 6],
payload: Vec<u8>, payload: Vec<u8>,
}, },
/// Send pre-encoded binary over a dedicated Reticulum RNS Resource
/// transfer instead of the small inline-chunk path — Reticulum-only, see
/// `MeshRadioDevice::send_resource`. Used for large attachments
/// (compressed photos, voice messages) that exceed the small-message cap
/// but fit a sane LoRa-Resource budget; routing decision is made by the
/// RPC layer (`mesh.transport-advice`'s `"resource-mesh"` tier).
SendResource {
dest_pubkey_prefix: [u8; 6],
payload: Vec<u8>,
},
/// Native LXMF `FIELD_IMAGE` send — Reticulum-only, for a stock
/// (non-archy) peer that can't decode our typed envelope. See
/// `MeshRadioDevice::send_native_image`.
SendNativeImage {
dest_pubkey_prefix: [u8; 6],
mime: String,
bytes: Vec<u8>,
caption: Option<String>,
},
/// Send PLAIN text as one or more native meshcore DMs to a stock client /// Send PLAIN text as one or more native meshcore DMs to a stock client
/// (e.g. a phone). Long text is split into multiple readable plain messages /// (e.g. a phone). Long text is split into multiple readable plain messages
/// — never MC-chunked — because stock clients can't reassemble archy's /// — never MC-chunked — because stock clients can't reassemble archy's
@ -77,6 +116,11 @@ pub enum MeshCommand {
payload: Vec<u8>, payload: Vec<u8>,
}, },
SendAdvert, SendAdvert,
/// Reboot the locally-connected radio firmware to recover a wedged /
/// RX-deaf radio. Meshtastic-only; meshcore ignores it.
RebootRadio {
seconds: i64,
},
/// Re-fetch contact list from the radio device. /// Re-fetch contact list from the radio device.
RefreshContacts, RefreshContacts,
/// Delete a contact from the firmware table (clear-all / unreachable wipe). /// Delete a contact from the firmware table (clear-all / unreachable wipe).
@ -251,6 +295,7 @@ impl MeshState {
channel_name: channel_name.to_string(), channel_name: channel_name.to_string(),
messages_sent: 0, messages_sent: 0,
messages_received: 0, messages_received: 0,
region: None,
}), }),
event_tx: tx, event_tx: tx,
next_message_id: RwLock::new(1), next_message_id: RwLock::new(1),
@ -367,12 +412,16 @@ impl MeshState {
/// 4. Reconnect on disconnect /// 4. Reconnect on disconnect
pub fn spawn_mesh_listener( pub fn spawn_mesh_listener(
state: Arc<MeshState>, state: Arc<MeshState>,
data_dir: std::path::PathBuf,
device_path: Option<String>, device_path: Option<String>,
our_did: String, our_did: String,
our_ed_pubkey_hex: String, our_ed_pubkey_hex: String,
our_x25519_secret: [u8; 32], our_x25519_secret: [u8; 32],
our_x25519_pubkey_hex: String, our_x25519_pubkey_hex: String,
server_name: Option<String>, server_name: Option<String>,
lora_region: Option<String>,
channel_name: Option<String>,
device_kind: Option<super::types::DeviceType>,
shutdown: tokio::sync::watch::Receiver<bool>, shutdown: tokio::sync::watch::Receiver<bool>,
cmd_rx: mpsc::Receiver<MeshCommand>, cmd_rx: mpsc::Receiver<MeshCommand>,
) -> tokio::task::JoinHandle<()> { ) -> tokio::task::JoinHandle<()> {
@ -380,6 +429,15 @@ pub fn spawn_mesh_listener(
let mut shutdown = shutdown; let mut shutdown = shutdown;
let mut cmd_rx = cmd_rx; let mut cmd_rx = cmd_rx;
let mut reconnect_delay = RECONNECT_DELAY_INIT; let mut reconnect_delay = RECONNECT_DELAY_INIT;
// Backlog #12 hot-swap re-binding: each run_mesh_session call already
// builds a fresh device struct (contacts/current_region/etc. all
// start empty), so per-device session state is naturally isolated
// across reconnects — there's no stale in-memory state to clear here.
// What's worth doing is detecting when the *physical radio itself*
// changed (a genuine hot-swap, not just the same radio reconnecting)
// so it's visible in logs rather than silently treated the same as
// an ordinary reconnect.
let mut last_self_node_id: Option<u32> = None;
loop { loop {
if *shutdown.borrow() { if *shutdown.borrow() {
info!("Mesh listener shutting down"); info!("Mesh listener shutting down");
@ -388,12 +446,16 @@ pub fn spawn_mesh_listener(
match session::run_mesh_session( match session::run_mesh_session(
&state, &state,
&data_dir,
device_path.as_deref(), device_path.as_deref(),
&our_did, &our_did,
&our_ed_pubkey_hex, &our_ed_pubkey_hex,
&our_x25519_secret, &our_x25519_secret,
&our_x25519_pubkey_hex, &our_x25519_pubkey_hex,
server_name.as_deref(), server_name.as_deref(),
lora_region.as_deref(),
channel_name.as_deref(),
device_kind,
&mut shutdown, &mut shutdown,
&mut cmd_rx, &mut cmd_rx,
) )
@ -414,6 +476,25 @@ pub fn spawn_mesh_listener(
} }
} }
// Hot-swap detection: compare this session's self_node_id against
// the last one we saw. A change means the physical radio itself
// was swapped (not just a reconnect of the same board).
{
let current_self_node_id = state.status.read().await.self_node_id;
if let (Some(prev), Some(cur)) = (last_self_node_id, current_self_node_id) {
if prev != cur {
info!(
previous_node_id = prev,
new_node_id = cur,
"Local mesh radio identity changed — treating as a hot-swapped device"
);
}
}
if current_self_node_id.is_some() {
last_self_node_id = current_self_node_id;
}
}
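
The hot-swap comparison above can be expressed as a small pure state machine, which makes the "ignore sessions that never reported an id" rule easy to test. `RadioIdentityTracker` is a hypothetical name used only for this sketch.

```rust
/// Hypothetical tracker for the local radio's node id across sessions.
#[derive(Default)]
struct RadioIdentityTracker {
    last: Option<u32>,
}

impl RadioIdentityTracker {
    /// Record this session's node id. Returns `Some((previous, current))`
    /// only when both ids are known and differ, i.e. a likely hot-swap.
    /// A session that never reported an id leaves the remembered id intact.
    fn observe(&mut self, current: Option<u32>) -> Option<(u32, u32)> {
        let change = match (self.last, current) {
            (Some(prev), Some(cur)) if prev != cur => Some((prev, cur)),
            _ => None,
        };
        if current.is_some() {
            self.last = current;
        }
        change
    }
}
```

Keeping the previous id when a session yields `None` means a failed reconnect in between does not hide a swap from the next successful session.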
// Update status to disconnected // Update status to disconnected
{ {
let mut status = state.status.write().await; let mut status = state.status.write().await;


@ -1,20 +1,24 @@
//! Mesh session lifecycle: connect, initialize, main loop. //! Mesh session lifecycle: connect, initialize, main loop.
use super::super::meshtastic::MeshtasticDevice; use super::super::meshtastic::MeshtasticDevice;
use super::super::reticulum::ReticulumLink;
use super::super::serial::MeshcoreDevice; use super::super::serial::MeshcoreDevice;
use super::super::types::*; use super::super::types::*;
use super::{ use super::{
frames, MeshCommand, MeshState, ADVERT_INTERVAL, MAX_CONSECUTIVE_WRITE_FAILURES, SYNC_INTERVAL, dispatch, frames, MeshCommand, MeshState, ADVERT_INTERVAL, MAX_CONSECUTIVE_WRITE_FAILURES,
RX_STALL_TIMEOUT, SYNC_INTERVAL,
}; };
use anyhow::{Context, Result}; use anyhow::{Context, Result};
use std::path::Path;
use std::sync::Arc; use std::sync::Arc;
use std::time::Duration; use std::time::{Duration, Instant};
use tokio::sync::mpsc; use tokio::sync::mpsc;
use tracing::{debug, error, info, warn}; use tracing::{debug, error, info, warn};
enum MeshRadioDevice { enum MeshRadioDevice {
Meshcore(MeshcoreDevice), Meshcore(MeshcoreDevice),
Meshtastic(MeshtasticDevice), Meshtastic(MeshtasticDevice),
Reticulum(ReticulumLink),
} }
impl MeshRadioDevice { impl MeshRadioDevice {
@ -22,6 +26,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(_) => DeviceType::Meshcore, Self::Meshcore(_) => DeviceType::Meshcore,
Self::Meshtastic(_) => DeviceType::Meshtastic, Self::Meshtastic(_) => DeviceType::Meshtastic,
Self::Reticulum(_) => DeviceType::Reticulum,
} }
} }
@ -29,6 +34,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.advert_name.clone(), Self::Meshcore(device) => device.advert_name.clone(),
Self::Meshtastic(device) => device.advert_name(), Self::Meshtastic(device) => device.advert_name(),
Self::Reticulum(device) => device.advert_name(),
} }
} }
@ -36,6 +42,37 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.set_advert_name(name).await, Self::Meshcore(device) => device.set_advert_name(name).await,
Self::Meshtastic(device) => device.set_advert_name(name).await, Self::Meshtastic(device) => device.set_advert_name(name).await,
Self::Reticulum(device) => device.set_advert_name(name).await,
}
}
/// Provision the operator-configured LoRa region. Meshcore radios manage
/// their own band on the device, so this is a no-op for them; Meshtastic
/// radios ship region-UNSET (RF-silent) and must be set or they never mesh.
/// Returns `Ok(true)` when a region was written (the device reboots to
/// apply, so the caller should restart the session). No-op for Reticulum:
/// the daemon's RNodeInterface config carries its own LoRa profile, not
/// driven through this firmware-admin path.
async fn ensure_lora_region(&mut self, region: Option<&str>) -> Result<bool> {
match self {
Self::Meshcore(_) => Ok(false),
Self::Meshtastic(device) => device.ensure_lora_region(region).await,
Self::Reticulum(_) => Ok(false),
}
}
/// Provision the shared archy primary channel so all nodes can decode each
/// other. No-op for meshcore (it joins its channel by name on the device);
/// Meshtastic radios can sit on mismatched channels otherwise and silently
/// drop every packet as undecryptable. Returns `Ok(true)` when a channel was
/// written (device reboots; caller should restart the session). No-op for
/// Reticulum: RNS has no shared-PSK channel concept (see
/// `ReticulumLink::send_channel_text`).
async fn ensure_channel(&mut self, channel_name: Option<&str>) -> Result<bool> {
match self {
Self::Meshcore(_) => Ok(false),
Self::Meshtastic(device) => device.ensure_channel(channel_name).await,
Self::Reticulum(_) => Ok(false),
} }
} }
@ -43,6 +80,33 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.send_self_advert().await, Self::Meshcore(device) => device.send_self_advert().await,
Self::Meshtastic(device) => device.send_self_advert().await, Self::Meshtastic(device) => device.send_self_advert().await,
Self::Reticulum(device) => device.send_self_advert().await,
}
}
/// Lightweight serial keepalive (Meshtastic only). Keeps the firmware
/// streaming RECEIVED packets to our serial client — without it the radio
/// can mark a quiet client gone and deliver only our own queue-status.
/// Meshcore/Reticulum need no such ping (Reticulum's "serial" traffic is
/// the daemon's own RNS link, not a firmware queue we poll).
async fn send_keepalive(&mut self) -> Result<()> {
match self {
Self::Meshcore(_) => Ok(()),
Self::Meshtastic(device) => device.send_keepalive().await,
Self::Reticulum(_) => Ok(()),
}
}
/// Actively advertise our identity over the air. Meshcore already does this
/// inside `send_self_advert` (CMD_SEND_SELF_ADVERT), so this is a no-op for
/// it; Meshtastic needs an explicit NodeInfo broadcast or peers never learn
/// about an already-running node. No-op for Reticulum: its `announce` (via
/// `send_self_advert`) already covers discovery.
async fn send_nodeinfo_advert(&mut self, want_response: bool) -> Result<()> {
match self {
Self::Meshcore(_) => Ok(()),
Self::Meshtastic(device) => device.send_nodeinfo_broadcast(want_response).await,
Self::Reticulum(_) => Ok(()),
} }
} }
@ -50,6 +114,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.send_channel_text(channel, payload).await, Self::Meshcore(device) => device.send_channel_text(channel, payload).await,
Self::Meshtastic(device) => device.send_channel_text(channel, payload).await, Self::Meshtastic(device) => device.send_channel_text(channel, payload).await,
Self::Reticulum(device) => device.send_channel_text(channel, payload).await,
} }
} }
@ -57,6 +122,54 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.send_text_msg(dest_pubkey_prefix, payload).await, Self::Meshcore(device) => device.send_text_msg(dest_pubkey_prefix, payload).await,
Self::Meshtastic(device) => device.send_text_msg(dest_pubkey_prefix, payload).await, Self::Meshtastic(device) => device.send_text_msg(dest_pubkey_prefix, payload).await,
Self::Reticulum(device) => device.send_text_msg(dest_pubkey_prefix, payload).await,
}
}
/// Send an image via native LXMF `FIELD_IMAGE` — Reticulum-only, for a
/// stock (non-archy) peer that can't decode our typed envelope. See
/// `ReticulumLink::send_native_image`.
async fn send_native_image(
&mut self,
dest_pubkey_prefix: &[u8; 6],
mime: &str,
bytes: &[u8],
caption: Option<&str>,
) -> Result<()> {
match self {
Self::Meshcore(_) | Self::Meshtastic(_) => {
anyhow::bail!("Native image send is Reticulum-only")
}
Self::Reticulum(device) => {
device.send_native_image(dest_pubkey_prefix, mime, bytes, caption).await
}
}
}
/// Send `data` over a dedicated RNS Resource transfer instead of the
/// small-payload "content" path — only Reticulum has anything resembling
/// this (a native large-binary transfer protocol over a `RNS.Link`).
/// Meshcore/Meshtastic have no equivalent in our driver; callers must
/// check `device_type() == DeviceType::Reticulum` before reaching for
/// this (see `mesh.transport-advice`'s `"resource-mesh"` tier, which is
/// Reticulum-only), so an Err here means the caller's gating is wrong,
/// not a legitimate no-op.
async fn send_resource(&mut self, dest_pubkey_prefix: &[u8; 6], data: &[u8]) -> Result<()> {
match self {
Self::Meshcore(_) | Self::Meshtastic(_) => {
anyhow::bail!("Resource transfer is Reticulum-only")
}
Self::Reticulum(device) => device.send_resource(dest_pubkey_prefix, data).await,
}
}
async fn reboot(&mut self, seconds: i64) -> Result<()> {
match self {
// Meshcore/Reticulum have no equivalent local-admin reboot in our
// driver; the RX-deaf recovery this targets is Meshtastic-specific.
Self::Meshcore(_) => Ok(()),
Self::Meshtastic(device) => device.reboot(seconds).await,
Self::Reticulum(_) => Ok(()),
} }
} }
@ -64,6 +177,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.remove_contact(pubkey).await, Self::Meshcore(device) => device.remove_contact(pubkey).await,
Self::Meshtastic(device) => device.remove_contact(pubkey).await, Self::Meshtastic(device) => device.remove_contact(pubkey).await,
Self::Reticulum(device) => device.remove_contact(pubkey).await,
} }
} }
@ -87,6 +201,11 @@ impl MeshRadioDevice {
.add_contact(pubkey, contact_type, flags, out_path_len, name, last_advert) .add_contact(pubkey, contact_type, flags, out_path_len, name, last_advert)
.await .await
} }
Self::Reticulum(device) => {
device
.add_contact(pubkey, contact_type, flags, out_path_len, name, last_advert)
.await
}
} }
} }
@ -94,6 +213,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.get_contacts().await, Self::Meshcore(device) => device.get_contacts().await,
Self::Meshtastic(device) => device.get_contacts().await, Self::Meshtastic(device) => device.get_contacts().await,
Self::Reticulum(device) => device.get_contacts().await,
} }
} }
@ -101,6 +221,8 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.reset_contact_path(pubkey).await, Self::Meshcore(device) => device.reset_contact_path(pubkey).await,
Self::Meshtastic(device) => device.reset_contact_path(pubkey).await, Self::Meshtastic(device) => device.reset_contact_path(pubkey).await,
// RNS does its own pathfinding — no firmware path table to reset.
Self::Reticulum(_) => Ok(()),
} }
} }
@ -108,6 +230,7 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.sync_messages().await, Self::Meshcore(device) => device.sync_messages().await,
Self::Meshtastic(device) => device.sync_messages().await, Self::Meshtastic(device) => device.sync_messages().await,
Self::Reticulum(device) => device.sync_messages().await,
} }
} }
@ -115,37 +238,89 @@ impl MeshRadioDevice {
match self { match self {
Self::Meshcore(device) => device.try_recv_frame().await, Self::Meshcore(device) => device.try_recv_frame().await,
Self::Meshtastic(device) => device.try_recv_frame().await, Self::Meshtastic(device) => device.try_recv_frame().await,
Self::Reticulum(device) => device.try_recv_frame().await,
}
}
/// PKI-E2E status of the last inbound frame (meshtastic only; meshcore's
/// per-message E2E is derived in the frames decrypt path). Reticulum/LXMF
/// is unconditionally E2E (no plaintext mode), so it always reports true.
/// Take-and-clear.
fn take_rx_encrypted(&mut self) -> bool {
match self {
Self::Meshcore(_) => false,
Self::Meshtastic(device) => device.take_rx_encrypted(),
Self::Reticulum(device) => device.take_rx_encrypted(),
} }
} }
} }
/// Scan all candidate serial ports and open the first supported mesh device found. /// Scan all candidate serial ports and open the first supported mesh device found.
async fn auto_detect_and_open() -> Result<(String, MeshRadioDevice, DeviceInfo)> { ///
/// `device_kind`, when set, pins the expected firmware (operator-confirmed via
/// `MeshConfig.device_kind` — see the plan's §2c reflashable-board note): only
/// that one device's probe runs, so a non-matching firmware's init bytes are
/// never injected into the port. `None` keeps the strict
/// Meshcore→Meshtastic→Reticulum probe order.
async fn auto_detect_and_open(
data_dir: &Path,
our_ed_pubkey_hex: &str,
our_x25519_pubkey_hex: &str,
device_kind: Option<DeviceType>,
) -> Result<(String, MeshRadioDevice, DeviceInfo)> {
let paths = super::super::serial::detect_serial_devices().await; let paths = super::super::serial::detect_serial_devices().await;
if paths.is_empty() { if paths.is_empty() {
anyhow::bail!("No serial devices found in /dev"); anyhow::bail!("No serial devices found in /dev");
} }
for path in &paths { for path in &paths {
debug!(path = %path, "Probing for mesh radio device"); debug!(path = %path, "Probing for mesh radio device");
match MeshcoreDevice::open(path).await { if device_kind.is_none_or(|k| k == DeviceType::Meshcore) {
Ok(mut dev) => match dev.initialize().await { match MeshcoreDevice::open(path).await {
Ok(info) => { Ok(mut dev) => match dev.initialize().await {
info!(path = %path, firmware = %info.firmware_version, "Found Meshcore device via auto-detect"); Ok(info) => {
return Ok((path.clone(), MeshRadioDevice::Meshcore(dev), info)); info!(path = %path, firmware = %info.firmware_version, "Found Meshcore device via auto-detect");
} return Ok((path.clone(), MeshRadioDevice::Meshcore(dev), info));
Err(e) => debug!(path = %path, error = %e, "Not a Meshcore device"), }
}, Err(e) => debug!(path = %path, error = %e, "Not a Meshcore device"),
Err(e) => debug!(path = %path, error = %e, "Could not open serial port"), },
Err(e) => debug!(path = %path, error = %e, "Could not open serial port"),
}
} }
match MeshtasticDevice::open(path).await { if device_kind.is_none_or(|k| k == DeviceType::Meshtastic) {
Ok(mut dev) => match dev.initialize().await { match MeshtasticDevice::open(path).await {
Ok(info) => { Ok(mut dev) => match dev.initialize().await {
info!(path = %path, firmware = %info.firmware_version, "Found Meshtastic device via auto-detect"); Ok(info) => {
return Ok((path.clone(), MeshRadioDevice::Meshtastic(dev), info)); info!(path = %path, firmware = %info.firmware_version, "Found Meshtastic device via auto-detect");
} return Ok((path.clone(), MeshRadioDevice::Meshtastic(dev), info));
Err(e) => debug!(path = %path, error = %e, "Not a Meshtastic device"), }
}, Err(e) => debug!(path = %path, error = %e, "Not a Meshtastic device"),
Err(e) => debug!(path = %path, error = %e, "Could not open serial port for Meshtastic"), },
Err(e) => debug!(path = %path, error = %e, "Could not open serial port for Meshtastic"),
}
}
// Tried LAST: the same reflashable board (e.g. Heltec V3) can run
// Meshcore, Meshtastic, or RNode firmware, so each probe must fail
// strictly before the next is attempted. The RNode KISS-detect probe
// is the most expensive (spawns the supervised daemon on a match), so
// it goes after the two cheap firmware-specific handshakes above.
if device_kind.is_none_or(|k| k == DeviceType::Reticulum) {
match ReticulumLink::open(
path,
data_dir,
Some(our_ed_pubkey_hex),
Some(our_x25519_pubkey_hex),
)
.await
{
Ok(mut dev) => match dev.initialize().await {
Ok(info) => {
info!(path = %path, "Found Reticulum (RNode) device via auto-detect");
return Ok((path.clone(), MeshRadioDevice::Reticulum(dev), info));
}
Err(e) => debug!(path = %path, error = %e, "Reticulum daemon failed to initialize"),
},
Err(e) => debug!(path = %path, error = %e, "Not a Reticulum RNode"),
}
} }
} }
anyhow::bail!( anyhow::bail!(
@ -155,7 +330,57 @@ async fn auto_detect_and_open() -> Result<(String, MeshRadioDevice, DeviceInfo)>
) )
} }
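
The effect of pinning `device_kind` on the probe order can be sketched as a plan function. The `Firmware` enum and `probe_plan` are hypothetical stand-ins for `DeviceType` and the inline `is_none_or` checks; `map_or(true, ..)` is used here only so the sketch also compiles on toolchains that predate `Option::is_none_or`.

```rust
/// Hypothetical stand-in for the firmware kinds probed during auto-detect.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Firmware {
    Meshcore,
    Meshtastic,
    Reticulum,
}

/// Which probes run on a port, in order. With no pin, all three run in the
/// strict Meshcore, Meshtastic, Reticulum order; with a pin, only the
/// matching probe runs, so no other firmware's init bytes reach the port.
fn probe_plan(pinned: Option<Firmware>) -> Vec<Firmware> {
    [Firmware::Meshcore, Firmware::Meshtastic, Firmware::Reticulum]
        .iter()
        .copied()
        .filter(|fw| pinned.map_or(true, |k| k == *fw))
        .collect()
}
```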
async fn open_preferred_path(path: &str) -> Result<(MeshRadioDevice, DeviceInfo)> { async fn open_preferred_path(
path: &str,
data_dir: &Path,
our_ed_pubkey_hex: &str,
our_x25519_pubkey_hex: &str,
device_kind: Option<DeviceType>,
) -> Result<(MeshRadioDevice, DeviceInfo)> {
// Pinned: try only the configured firmware and surface its own error —
// never fall through to (and inject probe bytes into) another firmware's
// handshake on this port.
if let Some(kind) = device_kind {
return match kind {
DeviceType::Meshcore => {
let mut dev = MeshcoreDevice::open(path)
.await
.context("Could not open preferred path as Meshcore")?;
let info = dev
.initialize()
.await
.context("Preferred path is not a working Meshcore device")?;
Ok((MeshRadioDevice::Meshcore(dev), info))
}
DeviceType::Meshtastic => {
let mut dev = MeshtasticDevice::open(path)
.await
.context("Could not open preferred path as Meshtastic")?;
let info = dev
.initialize()
.await
.context("Preferred path is not a working Meshtastic device")?;
Ok((MeshRadioDevice::Meshtastic(dev), info))
}
DeviceType::Reticulum => {
let mut dev = ReticulumLink::open(
path,
data_dir,
Some(our_ed_pubkey_hex),
Some(our_x25519_pubkey_hex),
)
.await
.context("Could not open preferred path as Reticulum")?;
let info = dev
.initialize()
.await
.context("Preferred path is not a working Reticulum RNode")?;
Ok((MeshRadioDevice::Reticulum(dev), info))
}
DeviceType::Unknown => anyhow::bail!("device_kind cannot be Unknown"),
};
}
match MeshcoreDevice::open(path).await { match MeshcoreDevice::open(path).await {
Ok(mut dev) => match dev.initialize().await { Ok(mut dev) => match dev.initialize().await {
Ok(info) => return Ok((MeshRadioDevice::Meshcore(dev), info)), Ok(info) => return Ok((MeshRadioDevice::Meshcore(dev), info)),
@ -165,10 +390,24 @@ async fn open_preferred_path(path: &str) -> Result<(MeshRadioDevice, DeviceInfo)
} }
match MeshtasticDevice::open(path).await { match MeshtasticDevice::open(path).await {
Ok(mut dev) => match dev.initialize().await { Ok(mut dev) => match dev.initialize().await {
Ok(info) => Ok((MeshRadioDevice::Meshtastic(dev), info)), Ok(info) => return Ok((MeshRadioDevice::Meshtastic(dev), info)),
Err(e) => Err(e).context("Preferred path is not Meshtastic"), Err(e) => debug!(path = %path, error = %e, "Preferred path is not Meshtastic"),
}, },
Err(e) => Err(e).context("Could not open preferred path as Meshtastic"), Err(e) => debug!(path = %path, error = %e, "Could not open preferred path as Meshtastic"),
}
match ReticulumLink::open(
path,
data_dir,
Some(our_ed_pubkey_hex),
Some(our_x25519_pubkey_hex),
)
.await
{
Ok(mut dev) => match dev.initialize().await {
Ok(info) => Ok((MeshRadioDevice::Reticulum(dev), info)),
Err(e) => Err(e).context("Preferred path is not a working Reticulum RNode"),
},
Err(e) => Err(e).context("Could not open preferred path as Reticulum"),
} }
} }
@ -372,8 +611,16 @@ async fn refresh_contacts(device: &mut MeshRadioDevice, state: &Arc<MeshState>)
// user-controlled feature; until then every firmware contact is // user-controlled feature; until then every firmware contact is
// surfaced. `radio_contact_blocklist` is retained but unused. // surfaced. `radio_contact_blocklist` is retained but unused.
let mut peers = state.peers.write().await; let mut peers = state.peers.write().await;
let is_meshtastic = matches!(device.device_type(), DeviceType::Meshtastic);
let is_reticulum = matches!(device.device_type(), DeviceType::Reticulum);
for (idx, contact) in contacts.iter().enumerate() { for (idx, contact) in contacts.iter().enumerate() {
let contact_id = idx as u32; let contact_id = if is_meshtastic {
meshtastic_contact_id(&contact.public_key_hex).unwrap_or(idx as u32)
} else if is_reticulum {
reticulum_contact_id(&contact.public_key_hex).unwrap_or(idx as u32)
} else {
idx as u32
};
let existing = peers.get(&contact_id); let existing = peers.get(&contact_id);
let peer = super::super::types::MeshPeer { let peer = super::super::types::MeshPeer {
contact_id, contact_id,
@ -386,14 +633,29 @@ async fn refresh_contacts(device: &mut MeshRadioDevice, state: &Arc<MeshState>)
// fail authentication after the next contact refresh. // fail authentication after the next contact refresh.
arch_pubkey_hex: existing.and_then(|p| p.arch_pubkey_hex.clone()), arch_pubkey_hex: existing.and_then(|p| p.arch_pubkey_hex.clone()),
x25519_pubkey: existing.and_then(|p| p.x25519_pubkey), x25519_pubkey: existing.and_then(|p| p.x25519_pubkey),
rssi: None, // Meshtastic-only today (see ParsedContact) — falls back to
snr: None, // whatever was already known if this refresh's contact
// snapshot doesn't carry a fresher reading (it always does
// for Meshtastic, since packet_to_inbound_frame updates the
// live contacts map on every heard packet; this fallback
// just avoids flapping to None on a transitional refresh).
rssi: contact.rssi.or_else(|| existing.and_then(|p| p.rssi)),
snr: contact.snr.or_else(|| existing.and_then(|p| p.snr)),
last_heard: chrono::Utc::now().to_rfc3339(), last_heard: chrono::Utc::now().to_rfc3339(),
hops: 0, hops: 0,
last_advert: contact.last_advert, last_advert: contact.last_advert,
// A non-zero path_len means the firmware has a route (direct // A non-zero path_len means the firmware has a route (direct
// or flood) to this contact — i.e. we can deliver to it. // or flood) to this contact — i.e. we can deliver to it.
reachable: contact.path_len != 0, reachable: contact.path_len != 0,
// E2E capability only grows (once the radio learns a peer's
// PKI key it stays known), so OR with any prior value rather
// than letting a transient contact refresh clear the pill.
pkc_capable: contact.pkc_capable
|| existing.map(|p| p.pkc_capable).unwrap_or(false),
// Position only ever improves to a fresher fix; never clear
// it just because a refresh's snapshot didn't carry one.
lat: contact.lat.or_else(|| existing.and_then(|p| p.lat)),
lon: contact.lon.or_else(|| existing.and_then(|p| p.lon)),
}; };
peers.insert(contact_id, peer); peers.insert(contact_id, peer);
} }
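
The merge rules used for the refreshed peer fields (keep the fresher reading when present, otherwise keep what was known; capability flags only ever turn on) can be shown on a reduced record. `PeerReading` and `merge_reading` are hypothetical names, and the field types are assumptions, since the real `MeshPeer` definition is not part of this diff.

```rust
/// Hypothetical, reduced view of the fields merged on a contact refresh.
#[derive(Clone, Debug, PartialEq)]
struct PeerReading {
    rssi: Option<i16>,
    pkc_capable: bool,
}

/// Merge a fresh contact snapshot with the previously known peer state.
fn merge_reading(fresh: &PeerReading, existing: Option<&PeerReading>) -> PeerReading {
    PeerReading {
        // Prefer the fresh value; fall back to the old one instead of None.
        rssi: fresh.rssi.or_else(|| existing.and_then(|p| p.rssi)),
        // Sticky flag: once true it is never cleared by a later refresh.
        pkc_capable: fresh.pkc_capable || existing.map(|p| p.pkc_capable).unwrap_or(false),
    }
}
```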
@ -447,6 +709,30 @@ async fn refresh_contacts(device: &mut MeshRadioDevice, state: &Arc<MeshState>)
} }
} }
fn meshtastic_contact_id(public_key_hex: &str) -> Option<u32> {
let bytes = hex::decode(public_key_hex).ok()?;
if bytes.len() < 15 || &bytes[4..15] != b"meshtastic:" {
return None;
}
let node_num = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
if node_num == 0 || node_num == u32::MAX {
None
} else {
Some(node_num)
}
}
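
The byte layout that `meshtastic_contact_id` expects after hex-decoding can be illustrated without the `hex` crate: a 4-byte little-endian node number followed by the ASCII tag `meshtastic:`. `node_num_from_key_bytes` is a hypothetical name for this sketch, which works on already-decoded bytes.

```rust
/// Hypothetical byte-level version of the contact-id derivation:
/// bytes[0..4] hold the node number (little-endian) and bytes[4..15]
/// must equal the 11-byte ASCII tag "meshtastic:".
fn node_num_from_key_bytes(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 15 || &bytes[4..15] != b"meshtastic:" {
        return None;
    }
    let node_num = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    // 0 and u32::MAX are rejected, matching the function above.
    if node_num == 0 || node_num == u32::MAX {
        None
    } else {
        Some(node_num)
    }
}
```

Rejecting `u32::MAX` also keeps a peer id from colliding with the channel-0 thread, which is filed at `u32::MAX - 0`.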
/// Stable `u32` contact id derived from a Reticulum contact's `public_key_hex`
/// (hex of the 16-byte RNS destination hash). Delegates to the canonical
/// derivation in `reticulum.rs` so there is exactly one masking rule (must
/// stay below `FEDERATION_CONTACT_ID_BASE`, mod.rs:53) shared with
/// `ReticulumLink::initialize()`'s reported `node_id`.
fn reticulum_contact_id(public_key_hex: &str) -> Option<u32> {
let bytes = hex::decode(public_key_hex).ok()?;
let hash: [u8; 16] = bytes.try_into().ok()?;
Some(super::super::reticulum::reticulum_contact_id_from_hash(&hash))
}
/// Drain any queued messages from the device. /// Drain any queued messages from the device.
/// Returns `true` if a write/communication error occurred (for failure tracking). /// Returns `true` if a write/communication error occurred (for failure tracking).
async fn sync_queued_messages( async fn sync_queued_messages(
@ -471,32 +757,62 @@ async fn sync_queued_messages(
} }
} }
/// How many times we will try to write the LoRa region across reconnects before
/// giving up. A healthy radio accepts it on the first try (the reboot-and-verify
/// resolves on the next session). A radio that silently refuses to persist
/// config — corrupt/full flash, managed mode, etc. — would otherwise reboot-loop
/// forever; after this many attempts we stop, log, and run without it.
const MAX_REGION_PROVISION_ATTEMPTS: u32 = 3;
/// Process-global count of LoRa-region writes attempted (one radio per process).
/// Reset to 0 whenever the radio reports the desired region, so genuine later
/// drift re-provisions but a broken radio doesn't loop.
static REGION_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
std::sync::atomic::AtomicU32::new(0);
/// Same retry-cap idea as the region, for the shared-channel write.
static CHANNEL_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
std::sync::atomic::AtomicU32::new(0);
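
The retry cap described by these counters can be sketched as one decision function. Everything here is illustrative: `ProvisionStep`, `provision_step`, and the closure standing in for the device write are hypothetical, and the error type is simplified to `String`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const MAX_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
enum ProvisionStep {
    /// A value was written; the radio reboots and the session restarts.
    Restart,
    /// Nothing needed writing (or the write errored); carry on.
    Continue,
    /// The cap was reached; stop trying and run without the setting.
    GaveUp,
}

/// One provisioning pass. `write` returns Ok(true) when it wrote a value,
/// Ok(false) when the radio already reports the desired one.
fn provision_step(
    attempts: &AtomicU32,
    write: impl FnOnce() -> Result<bool, String>,
) -> ProvisionStep {
    if attempts.load(Ordering::Relaxed) >= MAX_ATTEMPTS {
        return ProvisionStep::GaveUp;
    }
    match write() {
        Ok(true) => {
            attempts.fetch_add(1, Ordering::Relaxed);
            ProvisionStep::Restart
        }
        Ok(false) => {
            // Radio matches: reset so later genuine drift can re-provision.
            attempts.store(0, Ordering::Relaxed);
            ProvisionStep::Continue
        }
        Err(_) => ProvisionStep::Continue,
    }
}
```

A radio that accepts the write but never persists it returns `Ok(true)` on every session, so the counter reaches the cap and the reboot loop ends after three attempts.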
/// Run a single mesh session (connect, initialize, main loop). /// Run a single mesh session (connect, initialize, main loop).
pub(super) async fn run_mesh_session( pub(super) async fn run_mesh_session(
state: &Arc<MeshState>, state: &Arc<MeshState>,
data_dir: &Path,
preferred_path: Option<&str>, preferred_path: Option<&str>,
our_did: &str, our_did: &str,
our_ed_pubkey_hex: &str, our_ed_pubkey_hex: &str,
our_x25519_secret: &[u8; 32], our_x25519_secret: &[u8; 32],
our_x25519_pubkey_hex: &str, our_x25519_pubkey_hex: &str,
server_name: Option<&str>, server_name: Option<&str>,
lora_region: Option<&str>,
channel_name: Option<&str>,
device_kind: Option<DeviceType>,
shutdown: &mut tokio::sync::watch::Receiver<bool>, shutdown: &mut tokio::sync::watch::Receiver<bool>,
cmd_rx: &mut mpsc::Receiver<MeshCommand>, cmd_rx: &mut mpsc::Receiver<MeshCommand>,
) -> Result<()> { ) -> Result<()> {
// Detect device — try preferred path first, fall back to auto-detect // Detect device — try preferred path first, fall back to auto-detect
let (device_path, mut device, device_info) = if let Some(path) = preferred_path { let (device_path, mut device, device_info) = if let Some(path) = preferred_path {
match open_preferred_path(path).await { match open_preferred_path(
path,
data_dir,
our_ed_pubkey_hex,
our_x25519_pubkey_hex,
device_kind,
)
.await
{
Ok((dev, info)) => (path.to_string(), dev, info), Ok((dev, info)) => (path.to_string(), dev, info),
Err(e) => { Err(e) => {
warn!( warn!(
"Preferred path {} probe failed: {} — trying auto-detect", "Preferred path {} probe failed: {} — trying auto-detect",
path, e path, e
); );
auto_detect_and_open().await? auto_detect_and_open(data_dir, our_ed_pubkey_hex, our_x25519_pubkey_hex, device_kind)
.await?
} }
} }
} else { } else {
auto_detect_and_open().await? auto_detect_and_open(data_dir, our_ed_pubkey_hex, our_x25519_pubkey_hex, device_kind).await?
}; };
// Update status // Update status
@ -512,6 +828,73 @@ pub(super) async fn run_mesh_session(
let _ = state.event_tx.send(MeshEvent::DeviceConnected(device_info)); let _ = state.event_tx.send(MeshEvent::DeviceConnected(device_info));
// Provision the LoRa region before anything else. A fresh Meshtastic radio
// is region-UNSET and therefore RF-silent — it can neither hear nor be
// heard, so contact discovery and DMs would all silently fail. If we write
// a new region the firmware reboots to apply it; restart the session so we
// re-handshake the freshly-rebooted radio (and then set its name on the
// reconnect, where the region already matches and no reboot occurs).
use std::sync::atomic::Ordering;
let region_attempts = REGION_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
if region_attempts < MAX_REGION_PROVISION_ATTEMPTS {
match device.ensure_lora_region(lora_region).await {
Ok(true) => {
REGION_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
info!(
region = lora_region.unwrap_or(""),
attempt = region_attempts + 1,
max = MAX_REGION_PROVISION_ATTEMPTS,
"Provisioned LoRa region — radio rebooting, restarting mesh session"
);
// Give the radio time to reboot before the reconnect re-opens it.
tokio::time::sleep(Duration::from_secs(10)).await;
return Ok(());
}
// Radio reports the desired region (or none configured): clear the
// attempt counter so a future genuine drift re-provisions cleanly.
Ok(false) => REGION_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
Err(e) => warn!("Failed to provision LoRa region: {}", e),
}
} else if lora_region.is_some() {
warn!(
region = lora_region.unwrap_or(""),
attempts = MAX_REGION_PROVISION_ATTEMPTS,
"Radio did not persist the configured LoRa region after repeated \
attempts; continuing without it. The radio likely needs a manual \
factory reset / reflash; mesh discovery stays offline until its \
region is set."
);
}
// Provision the shared primary channel (after the region, since both reboot
// the radio). Without a matching channel two same-region radios still can't
// decode each other's traffic. Same retry-cap + restart-on-change pattern.
let channel_attempts = CHANNEL_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
if channel_attempts < MAX_REGION_PROVISION_ATTEMPTS {
match device.ensure_channel(channel_name).await {
Ok(true) => {
CHANNEL_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
info!(
channel = channel_name.unwrap_or(""),
attempt = channel_attempts + 1,
max = MAX_REGION_PROVISION_ATTEMPTS,
"Provisioned shared mesh channel — radio rebooting, restarting mesh session"
);
tokio::time::sleep(Duration::from_secs(10)).await;
return Ok(());
}
Ok(false) => CHANNEL_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
Err(e) => warn!("Failed to provision mesh channel: {}", e),
}
} else if channel_name.is_some() {
warn!(
channel = channel_name.unwrap_or(""),
attempts = MAX_REGION_PROVISION_ATTEMPTS,
"Radio did not persist the shared mesh channel after repeated \
attempts; continuing without it. The radio may need a manual reset."
);
}
// Set advert name to the server's human-readable name (e.g. "ThinkPad"), // Set advert name to the server's human-readable name (e.g. "ThinkPad"),
// falling back to the DID fragment if no name is configured. // falling back to the DID fragment if no name is configured.
let advert_name = if let Some(name) = server_name { let advert_name = if let Some(name) = server_name {
@ -536,6 +919,13 @@ pub(super) async fn run_mesh_session(
if let Err(e) = device.send_self_advert().await { if let Err(e) = device.send_self_advert().await {
warn!("Failed to send initial advert: {}", e); warn!("Failed to send initial advert: {}", e);
} }
// Actively announce our identity over the air with want_response, so any
// already-running neighbour both learns about us and replies with its own
// NodeInfo — immediate two-way discovery instead of waiting for the radio's
// multi-hour NodeInfo cycle. (No-op for meshcore.)
if let Err(e) = device.send_nodeinfo_advert(true).await {
warn!("Failed to send initial NodeInfo advert: {}", e);
}
// NOTE: Archipelago identity adverts (`ARCHY:2:{ed}:{x25519}`) are intentionally // NOTE: Archipelago identity adverts (`ARCHY:2:{ed}:{x25519}`) are intentionally
// NOT broadcast on the shared public channel (channel 0). Doing so spams every // NOT broadcast on the shared public channel (channel 0). Doing so spams every
@ -560,6 +950,11 @@ pub(super) async fn run_mesh_session(
advert_timer.tick().await; // skip first immediate tick advert_timer.tick().await; // skip first immediate tick
sync_timer.tick().await; sync_timer.tick().await;
let mut consecutive_write_failures: u32 = 0; let mut consecutive_write_failures: u32 = 0;
// Backlog #12 RX-stall watchdog — see RX_STALL_TIMEOUT's doc comment.
// The clock starts at session start rather than at the first successful read,
// so a session that never receives anything still gets a full timeout
// window from startup rather than an immediately-stale clock.
let mut last_rx_at = Instant::now();
loop { loop {
// If too many consecutive writes have failed, the serial port is dead — // If too many consecutive writes have failed, the serial port is dead —
@ -574,19 +969,39 @@ pub(super) async fn run_mesh_session(
consecutive_write_failures consecutive_write_failures
); );
} }
if last_rx_at.elapsed() >= RX_STALL_TIMEOUT {
error!(
stalled_for_secs = last_rx_at.elapsed().as_secs(),
"No mesh frames received for too long — triggering reconnection"
);
anyhow::bail!(
"RX stalled for over {}s — forcing reconnect",
RX_STALL_TIMEOUT.as_secs()
);
}
tokio::select! { tokio::select! {
// Check for incoming frames // Check for incoming frames
frame_result = device.try_recv_frame() => { frame_result = device.try_recv_frame() => {
match frame_result { match frame_result {
Ok(Some(frame)) => { Ok(Some(frame)) => {
- // Successful read resets the failure counter
+ // Successful read resets the failure counter and the
+ // RX-stall watchdog.
consecutive_write_failures = 0; consecutive_write_failures = 0;
last_rx_at = Instant::now();
// For meshtastic, the PKI-E2E status of this frame can't
// ride the synthetic meshcore frame — snapshot the message
// id high-water mark, dispatch, then stamp the E2E pill on
// whatever received message this frame produced.
let before_id = dispatch::max_message_id(state).await;
let should_action = frames::handle_frame( let should_action = frames::handle_frame(
&frame, &frame,
state, state,
our_x25519_secret, our_x25519_secret,
).await; ).await;
if device.take_rx_encrypted() {
dispatch::stamp_received_encrypted(state, before_id).await;
}
if should_action { if should_action {
// Contact discovery or messages waiting — sync both // Contact discovery or messages waiting — sync both
refresh_contacts(&mut device, state).await; refresh_contacts(&mut device, state).await;
@ -615,6 +1030,13 @@ pub(super) async fn run_mesh_session(
} else { } else {
consecutive_write_failures = 0; consecutive_write_failures = 0;
} }
// Periodic over-air identity beacon (no want_response, to avoid
// reply storms) so peers that come online later still discover
// us between the radio's own infrequent NodeInfo broadcasts.
// No-op for meshcore (its self-advert above already goes out).
if let Err(e) = device.send_nodeinfo_advert(false).await {
debug!("Periodic NodeInfo advert failed: {}", e);
}
// (Identity re-broadcast on the public channel intentionally // (Identity re-broadcast on the public channel intentionally
// removed — see the note at session startup. It spammed the // removed — see the note at session startup. It spammed the
// shared channel every advert tick.) // shared channel every advert tick.)
@ -626,8 +1048,14 @@ pub(super) async fn run_mesh_session(
handle_send_command(cmd, &mut device, state, &mut consecutive_write_failures).await; handle_send_command(cmd, &mut device, state, &mut consecutive_write_failures).await;
} }
- // Periodic message sync
+ // Periodic message sync + serial keepalive
_ = sync_timer.tick() => { _ = sync_timer.tick() => {
// Keep the radio streaming inbound packets to our serial client
// (best-effort — a failed keepalive shouldn't trip the reconnect
// counter on its own; a truly dead port is caught by real writes).
if let Err(e) = device.send_keepalive().await {
debug!("Mesh keepalive failed: {}", e);
}
if sync_queued_messages(&mut device, state, our_x25519_secret).await { if sync_queued_messages(&mut device, state, our_x25519_secret).await {
consecutive_write_failures += 1; consecutive_write_failures += 1;
debug!(failures = consecutive_write_failures, "Message sync failed"); debug!(failures = consecutive_write_failures, "Message sync failed");
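
The RX-stall watchdog added in the hunks above reduces to a timestamp plus a comparison. The following is a minimal stand-alone sketch of that idea, not the session code itself: `RxWatchdog` and its method names are hypothetical, and the timeout value used in any call is an assumption because `RX_STALL_TIMEOUT`'s value is not shown in this diff. Time is passed in explicitly so the logic can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-alone watchdog; name and shape are illustrative only.
struct RxWatchdog {
    last_rx_at: Instant,
    timeout: Duration,
}

impl RxWatchdog {
    /// Start the clock at session start so an idle session still gets a
    /// full timeout window before it is declared stalled.
    fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_rx_at: now, timeout }
    }

    /// Call whenever a frame is successfully received.
    fn note_rx(&mut self, now: Instant) {
        self.last_rx_at = now;
    }

    /// True once no frame has arrived for at least `timeout`.
    fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_rx_at) >= self.timeout
    }
}
```

In the session loop the equivalent check runs once per iteration, before the `select!`, and a stall ends the session with an error so the caller reconnects.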
@ -707,6 +1135,53 @@ async fn handle_send_command(
) )
.await; .await;
} }
MeshCommand::SendResource {
dest_pubkey_prefix,
payload,
} => {
// No MC-chunk framing here — RNS Resources do their own native
// chunked transfer at the link layer, so the payload goes through
// as-is (the receiving daemon hands back the complete blob in one
// `resource_recv` event).
if let Err(e) = device.send_resource(&dest_pubkey_prefix, &payload).await {
*consecutive_write_failures += 1;
warn!(
failures = *consecutive_write_failures,
"Failed to send Reticulum resource: {}", e
);
} else {
*consecutive_write_failures = 0;
info!(
dest = %hex::encode(dest_pubkey_prefix),
len = payload.len(),
"Sent Reticulum resource transfer"
);
}
}
MeshCommand::SendNativeImage {
dest_pubkey_prefix,
mime,
bytes,
caption,
} => {
if let Err(e) = device
.send_native_image(&dest_pubkey_prefix, &mime, &bytes, caption.as_deref())
.await
{
*consecutive_write_failures += 1;
warn!(
failures = *consecutive_write_failures,
"Failed to send native image: {}", e
);
} else {
*consecutive_write_failures = 0;
info!(
dest = %hex::encode(dest_pubkey_prefix),
len = bytes.len(),
"Sent native LXMF image"
);
}
}
MeshCommand::BroadcastChannel { channel, payload } => { MeshCommand::BroadcastChannel { channel, payload } => {
if let Err(e) = device.send_channel_text(channel, &payload).await { if let Err(e) = device.send_channel_text(channel, &payload).await {
*consecutive_write_failures += 1; *consecutive_write_failures += 1;
@ -730,6 +1205,13 @@ async fn handle_send_command(
*consecutive_write_failures = 0; *consecutive_write_failures = 0;
} }
} }
MeshCommand::RebootRadio { seconds } => {
if let Err(e) = device.reboot(seconds).await {
warn!("Failed to reboot radio: {}", e);
} else {
info!(seconds, "Radio reboot command sent to device");
}
}
MeshCommand::RefreshContacts => { MeshCommand::RefreshContacts => {
refresh_contacts(device, state).await; refresh_contacts(device, state).await;
} }
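
The region and channel provisioning blocks earlier in this file share one pattern: try to apply a setting, restart the session if the radio had to reboot, and stop retrying after a fixed number of attempts. A minimal sketch of that pattern follows, under stated assumptions: `provision_step`, `Next`, and `MAX_ATTEMPTS` are hypothetical names, the cap of 3 is illustrative (the real `MAX_REGION_PROVISION_ATTEMPTS` value is not shown here), and the `ensure` closure stands in for `ensure_lora_region` / `ensure_channel`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Illustrative cap; the real constant's value is not shown in the diff.
const MAX_ATTEMPTS: u32 = 3;

/// What the session should do after one provisioning pass.
#[derive(Debug, PartialEq)]
enum Next {
    /// A setting was written; the radio reboots, so restart the session.
    RestartSession,
    /// Nothing changed, the write failed, or the cap was hit: carry on.
    Continue,
}

/// `ensure` returns Ok(true) if it wrote a new value and Ok(false) if the
/// radio already matched the desired configuration.
fn provision_step<E>(
    attempts: &AtomicU32,
    ensure: impl FnOnce() -> Result<bool, E>,
) -> Next {
    if attempts.load(Ordering::Relaxed) >= MAX_ATTEMPTS {
        // Give up: the radio is not persisting the value across reboots.
        return Next::Continue;
    }
    match ensure() {
        Ok(true) => {
            attempts.fetch_add(1, Ordering::Relaxed);
            Next::RestartSession
        }
        Ok(false) => {
            // Radio matches: clear the counter so later drift can retry.
            attempts.store(0, Ordering::Relaxed);
            Next::Continue
        }
        Err(_) => Next::Continue,
    }
}
```

The counter has to outlive a single session (the diff uses a static atomic) because each successful write ends the session; a per-session counter would reset on every reconnect and never reach the cap.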

File diff suppressed because it is too large


@ -192,16 +192,28 @@ pub struct MessageKey {
// ─── Wire Envelope ────────────────────────────────────────────────────── // ─── Wire Envelope ──────────────────────────────────────────────────────
/// CBOR wire envelope wrapping any typed message. /// CBOR wire envelope wrapping any typed message.
///
/// `v`/`sig` MUST use `compact_bytes`/`compact_bytes_opt` — this is the
/// envelope EVERY message type wraps its payload in, so plain derived
/// `Vec<u8>` encoding here (one CBOR integer per byte instead of a native
/// byte string) bloats every single message on the wire, not just
/// attachments. Root-caused live: a small ReadReceipt (tiny inner payload)
/// crossed the 140-byte single-frame threshold purely from this envelope's
/// own array-of-ints tax on `v`, triggering MC-chunked send to a Reticulum
/// peer whose chunks then failed to reassemble — surfaced as raw
/// `MC000...` fragments in the chat instead of a receipt. Fix this here,
/// not just on individual payload structs like `ContentInlinePayload`.
#[derive(Debug, Clone, Serialize, Deserialize)] #[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TypedEnvelope { pub struct TypedEnvelope {
/// Message type. /// Message type.
pub t: u8, pub t: u8,
/// Payload bytes (type-specific CBOR or raw data). /// Payload bytes (type-specific CBOR or raw data).
#[serde(with = "compact_bytes")]
pub v: Vec<u8>, pub v: Vec<u8>,
/// Unix timestamp (seconds since epoch). /// Unix timestamp (seconds since epoch).
pub ts: u32, pub ts: u32,
/// Optional Ed25519 signature of (t || v || ts_bytes) — for signed messages. /// Optional Ed25519 signature of (t || v || ts_bytes) — for signed messages.
- #[serde(default, skip_serializing_if = "Option::is_none")]
+ #[serde(default, skip_serializing_if = "Option::is_none", with = "compact_bytes_opt")]
pub sig: Option<Vec<u8>>, pub sig: Option<Vec<u8>>,
/// Message sequence number (per-sender, monotonically increasing). /// Message sequence number (per-sender, monotonically increasing).
#[serde(default)] #[serde(default)]
@ -481,6 +493,29 @@ pub struct ReactionPayload {
pub emoji: String, pub emoji: String,
} }
/// `Option<Vec<u8>>` <-> base64 string, for fields that need to survive a JSON
/// round-trip to the frontend readably (plain serde would emit/expect a JSON
/// array of numbers for `Vec<u8>`, which isn't what `data:` URLs want). CBOR
/// wire encoding pays a small (~33%) size tax for this on `thumb_bytes`
/// specifically — negligible given thumbnails are capped at ~60 bytes.
mod base64_opt_bytes {
use base64::{engine::general_purpose::STANDARD, Engine as _};
use serde::{Deserialize, Deserializer, Serializer};
pub fn serialize<S: Serializer>(v: &Option<Vec<u8>>, s: S) -> Result<S::Ok, S::Error> {
match v {
Some(bytes) => s.serialize_str(&STANDARD.encode(bytes)),
None => s.serialize_none(),
}
}
pub fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<Option<Vec<u8>>, D::Error> {
let opt: Option<String> = Option::deserialize(d)?;
opt.map(|s| STANDARD.decode(&s).map_err(serde::de::Error::custom))
.transpose()
}
}
/// Content/attachment reference: points at a blob held by the sender that /// Content/attachment reference: points at a blob held by the sender that
/// recipients fetch out-of-band via `GET {sender_onion}/blob/{cid}?cap=..&exp=..&peer=..`. /// recipients fetch out-of-band via `GET {sender_onion}/blob/{cid}?cap=..&exp=..&peer=..`.
/// Thumb bytes (≤60B) may be inlined for immediate display; full blob is lazy. /// Thumb bytes (≤60B) may be inlined for immediate display; full blob is lazy.
@ -491,7 +526,7 @@ pub struct ContentRefPayload {
pub mime: String, pub mime: String,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub filename: Option<String>, pub filename: Option<String>,
- #[serde(default, skip_serializing_if = "Option::is_none")]
+ #[serde(default, skip_serializing_if = "Option::is_none", with = "base64_opt_bytes")]
pub thumb_bytes: Option<Vec<u8>>, pub thumb_bytes: Option<Vec<u8>>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub caption: Option<String>, pub caption: Option<String>,
@ -503,6 +538,86 @@ pub struct ContentRefPayload {
pub cap_exp: u64, pub cap_exp: u64,
} }
/// Serde's blanket `Serialize`/`Deserialize` for `Vec<u8>` goes through
/// `serialize_seq`/one CBOR integer per byte, NOT CBOR's native byte-string
/// type — measured ~3.5x wire bloat on a real attachment send (4746 raw
/// bytes -> 16638-byte CBOR envelope) before this fix. `serialize_bytes`
/// maps to CBOR major type 2 (compact byte string) instead. Only apply this
/// to fields that never need JSON round-tripping to the frontend (this one
/// is CBOR-wire-only — the frontend gets `cid`/`size`/`mime` metadata built
/// by hand, never the raw bytes, see typed_messages.rs's `typed_json`).
mod compact_bytes {
use serde::{Deserializer, Serializer};
use std::fmt;
pub fn serialize<S: Serializer>(v: &[u8], s: S) -> Result<S::Ok, S::Error> {
s.serialize_bytes(v)
}
struct BytesVisitor;
impl<'de> serde::de::Visitor<'de> for BytesVisitor {
type Value = Vec<u8>;
fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.write_str("a byte string")
}
fn visit_bytes<E: serde::de::Error>(self, v: &[u8]) -> Result<Vec<u8>, E> {
Ok(v.to_vec())
}
fn visit_borrowed_bytes<E: serde::de::Error>(self, v: &'de [u8]) -> Result<Vec<u8>, E> {
Ok(v.to_vec())
}
fn visit_byte_buf<E: serde::de::Error>(self, v: Vec<u8>) -> Result<Vec<u8>, E> {
Ok(v)
}
// ciborium's non-self-describing byte-string decode path visits a
// seq of u8 in some configurations rather than calling visit_bytes
// directly — accept that too so this is robust to the reader mode.
fn visit_seq<A: serde::de::SeqAccess<'de>>(self, mut seq: A) -> Result<Vec<u8>, A::Error> {
let mut out = Vec::with_capacity(seq.size_hint().unwrap_or(0));
while let Some(byte) = seq.next_element::<u8>()? {
out.push(byte);
}
Ok(out)
}
}
pub fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<Vec<u8>, D::Error> {
// NOT deserialize_bytes: ciborium's deserialize_bytes only succeeds
// when the byte string fits its small internal scratch buffer —
// anything bigger (any real attachment) falls through to an
// "invalid type: bytes, expected bytes" error despite the CBOR
// header being genuinely Bytes. deserialize_byte_buf streams
// segments into an unbounded Vec instead (confirmed against
// ciborium 0.2.2's de/mod.rs — deserialize_bytes's `Header::Bytes(Some(len))
// if len <= self.scratch.len()` guard vs deserialize_byte_buf's
// unconditional `Header::Bytes(len)` streaming path).
d.deserialize_byte_buf(BytesVisitor)
}
}
/// `Option<Vec<u8>>` variant of `compact_bytes` — for wire-only optional byte
/// fields (e.g. `TypedEnvelope.sig`) that never need JSON round-tripping.
/// Not the same as `base64_opt_bytes` above, which exists specifically
/// because `ContentRefPayload.thumb_bytes` DOES need a JSON-friendly (string)
/// form for the frontend's `data:` URL — this one stays fully binary.
mod compact_bytes_opt {
use serde::{Deserialize, Deserializer, Serializer};
pub fn serialize<S: Serializer>(v: &Option<Vec<u8>>, s: S) -> Result<S::Ok, S::Error> {
match v {
Some(bytes) => s.serialize_bytes(bytes),
None => s.serialize_none(),
}
}
pub fn deserialize<'de, D: Deserializer<'de>>(d: D) -> Result<Option<Vec<u8>>, D::Error> {
#[derive(Deserialize)]
struct Wrapper(#[serde(with = "super::compact_bytes")] Vec<u8>);
let opt: Option<Wrapper> = Option::deserialize(d)?;
Ok(opt.map(|w| w.0))
}
}
/// Inline attachment payload — file bytes carried directly in the envelope. /// Inline attachment payload — file bytes carried directly in the envelope.
/// Used when the file is small enough to chunk over LoRa and the peer has no /// Used when the file is small enough to chunk over LoRa and the peer has no
/// Tor path. Receiver writes `bytes` to its local BlobStore on reassembly /// Tor path. Receiver writes `bytes` to its local BlobStore on reassembly
@ -514,6 +629,7 @@ pub struct ContentInlinePayload {
pub filename: Option<String>, pub filename: Option<String>,
#[serde(default, skip_serializing_if = "Option::is_none")] #[serde(default, skip_serializing_if = "Option::is_none")]
pub caption: Option<String>, pub caption: Option<String>,
#[serde(with = "compact_bytes")]
pub bytes: Vec<u8>, pub bytes: Vec<u8>,
} }
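
The size difference that `compact_bytes` targets can be worked out from CBOR's definite-length encoding rules (RFC 8949) without any serializer. The sketch below is a hand-rolled size calculator, not ciborium's API; the function names are hypothetical. It compares encoding a buffer as a native byte string (major type 2) against encoding it as an array of unsigned integers, which is what the derived `Vec<u8>` path produces.

```rust
/// Bytes needed for a CBOR head (major type plus argument) carrying `arg`.
fn head_len(arg: u64) -> usize {
    match arg {
        0..=23 => 1,
        24..=0xff => 2,
        0x100..=0xffff => 3,
        0x1_0000..=0xffff_ffff => 5,
        _ => 9,
    }
}

/// Size as a native byte string: one head, then the raw bytes.
fn byte_string_len(data: &[u8]) -> usize {
    head_len(data.len() as u64) + data.len()
}

/// Size as an array of unsigned ints: one head for the array, then one
/// head per element (values above 23 need a second byte).
fn int_array_len(data: &[u8]) -> usize {
    head_len(data.len() as u64)
        + data.iter().map(|&b| head_len(b as u64)).sum::<usize>()
}
```

For the 4746-byte buffer of `0xAB` used in the regression test, this gives 4749 bytes as a byte string and 9495 bytes as an integer array. That is a single level of encoding only; the larger figure quoted in the comment was measured on a real send and is not reproduced by this calculation.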
@ -607,6 +723,59 @@ pub fn decode_payload<T: for<'a> Deserialize<'a>>(data: &[u8]) -> Result<T> {
mod tests { mod tests {
use super::*; use super::*;
#[test]
fn typed_envelope_of_a_small_payload_stays_under_single_frame_budget() {
// Regression test: a ReadReceipt (tiny inner payload — one MessageKey)
// wrapped in TypedEnvelope crossed the 140-byte single-LoRa-frame
// threshold purely from the OUTER envelope's own `v: Vec<u8>` field
// using array-of-ints CBOR encoding, live-observed forcing an
// unnecessary MC-chunked send whose chunks then failed to reassemble
// over Reticulum (surfaced as raw `MC000...` garbage in the chat).
let receipt = ReadReceiptPayload {
up_to: MessageKey {
sender_pubkey: "b550de818bb907047aad60d368668b3815ce2fcb9fc35d8040bb21c5c6217ccc"
.to_string(),
sender_seq: 42,
},
};
let payload = encode_payload(&receipt).unwrap();
let envelope = TypedEnvelope::new(MeshMessageType::ReadReceipt, payload).with_seq(1);
let wire = envelope.to_wire().unwrap();
assert!(
wire.len() < 140,
"a ReadReceipt envelope should fit one LoRa frame (<140B), got {} bytes — \
TypedEnvelope.v is bloating again",
wire.len()
);
let decoded = TypedEnvelope::from_wire(&wire).unwrap();
let decoded_receipt: ReadReceiptPayload = decode_payload(&decoded.v).unwrap();
assert_eq!(decoded_receipt.up_to, receipt.up_to);
}
#[test]
fn content_inline_bytes_use_compact_cbor_encoding() {
// Regression test: Vec<u8> without #[serde(with = "compact_bytes")]
// serializes as one CBOR integer per byte (~3.5x bloat, measured on
// a real send: 4746 raw bytes -> 16638-byte wire envelope). Compact
// encoding should stay close to the raw size, not balloon with it.
let raw = vec![0xABu8; 4746];
let payload = ContentInlinePayload {
mime: "image/jpeg".to_string(),
filename: None,
caption: None,
bytes: raw.clone(),
};
let encoded = encode_payload(&payload).unwrap();
assert!(
encoded.len() < raw.len() + 200,
"expected compact encoding close to {} raw bytes, got {} wire bytes",
raw.len(),
encoded.len()
);
let decoded: ContentInlinePayload = decode_payload(&encoded).unwrap();
assert_eq!(decoded.bytes, raw);
}
#[test] #[test]
fn test_typed_envelope_wire_roundtrip() { fn test_typed_envelope_wire_roundtrip() {
let envelope = TypedEnvelope::new(MeshMessageType::Text, b"hello mesh".to_vec()); let envelope = TypedEnvelope::new(MeshMessageType::Text, b"hello mesh".to_vec());


@ -14,6 +14,7 @@ pub mod message_types;
pub mod outbox; pub mod outbox;
pub mod protocol; pub mod protocol;
pub mod ratchet; pub mod ratchet;
pub mod reticulum;
pub mod scheduler; pub mod scheduler;
pub mod serial; pub mod serial;
pub mod session; pub mod session;
@ -245,6 +246,11 @@ pub(crate) async fn upsert_federation_peer(
last_advert: existing.as_ref().map(|p| p.last_advert).unwrap_or(0), last_advert: existing.as_ref().map(|p| p.last_advert).unwrap_or(0),
// Federation peers are reachable off-radio (Tor/FIPS), so always true. // Federation peers are reachable off-radio (Tor/FIPS), so always true.
reachable: true, reachable: true,
// Off-radio E2E (federation) is handled by the archy-peer path; preserve
// any radio PKI capability learned for a twinned contact.
pkc_capable: existing.as_ref().map(|p| p.pkc_capable).unwrap_or(false),
lat: existing.as_ref().and_then(|p| p.lat),
lon: existing.as_ref().and_then(|p| p.lon),
}; };
peers.insert(contact_id, peer); peers.insert(contact_id, peer);
// A radio twin of this node (same advert_name, no arch identity yet) can now // A radio twin of this node (same advert_name, no arch identity yet) can now
@ -326,6 +332,14 @@ pub struct MeshConfig {
/// Channel name for broadcasts. /// Channel name for broadcasts.
#[serde(default)] #[serde(default)]
pub channel_name: Option<String>, pub channel_name: Option<String>,
/// Meshtastic LoRa region (e.g. "EU_868", "US", "ANZ"). Fresh-flashed
/// Meshtastic radios ship region-UNSET and are RF-silent until a region is
/// set, so archy provisions this region on connect to bring every node onto
/// the same band automatically (the parity equivalent of a meshcore radio
/// coming up on its configured band). Ignored for meshcore devices and when
/// unset/None.
#[serde(default)]
pub lora_region: Option<String>,
/// Whether to periodically broadcast our identity. /// Whether to periodically broadcast our identity.
#[serde(default)] #[serde(default)]
pub broadcast_identity: bool, pub broadcast_identity: bool,
@ -369,6 +383,15 @@ pub struct MeshConfig {
/// when `assistant_trusted_only` is on and they aren't federation-Trusted. /// when `assistant_trusted_only` is on and they aren't federation-Trusted.
#[serde(default)] #[serde(default)]
pub assistant_allowed_contacts: Vec<String>, pub assistant_allowed_contacts: Vec<String>,
/// Pin the expected firmware on `device_path`/auto-detected ports. A
/// reflashable board (e.g. Heltec V3) can run Meshcore, Meshtastic, or
/// RNode firmware, so probe order alone is best-effort — set this when an
/// operator knows which one is plugged in. When `Some`, only that
/// device's probe runs (no other firmware's init bytes are ever injected
/// into the port); `None` keeps today's Meshcore→Meshtastic→Reticulum
/// strict-probe auto-detect.
#[serde(default)]
pub device_kind: Option<types::DeviceType>,
} }
fn default_assistant_backend() -> String { fn default_assistant_backend() -> String {
@ -385,6 +408,7 @@ impl Default for MeshConfig {
enabled: false, enabled: false,
device_path: None, device_path: None,
channel_name: Some("archipelago".to_string()), channel_name: Some("archipelago".to_string()),
lora_region: None,
broadcast_identity: true, broadcast_identity: true,
advert_name: None, advert_name: None,
mesh_only_mode: None, mesh_only_mode: None,
@ -397,6 +421,7 @@ impl Default for MeshConfig {
assistant_trusted_only: true, assistant_trusted_only: true,
assistant_backend: default_assistant_backend(), assistant_backend: default_assistant_backend(),
assistant_allowed_contacts: Vec::new(), assistant_allowed_contacts: Vec::new(),
device_kind: None,
} }
} }
} }
@ -669,12 +694,16 @@ impl MeshService {
let handle = listener::spawn_mesh_listener( let handle = listener::spawn_mesh_listener(
Arc::clone(&self.state), Arc::clone(&self.state),
self.data_dir.clone(),
self.config.device_path.clone(), self.config.device_path.clone(),
self.our_did.clone(), self.our_did.clone(),
self.our_ed_pubkey_hex.clone(), self.our_ed_pubkey_hex.clone(),
self.our_x25519_secret, self.our_x25519_secret,
self.our_x25519_pubkey_hex.clone(), self.our_x25519_pubkey_hex.clone(),
self.server_name.clone(), self.server_name.clone(),
self.config.lora_region.clone(),
self.config.channel_name.clone(),
self.config.device_kind,
shutdown_rx, shutdown_rx,
cmd_rx, cmd_rx,
); );
@ -910,7 +939,13 @@ impl MeshService {
/// Get current mesh status. /// Get current mesh status.
pub async fn status(&self) -> MeshStatus { pub async fn status(&self) -> MeshStatus {
- self.state.status.read().await.clone()
+ let mut status = self.state.status.read().await.clone();
// The operator-configured LoRa region isn't part of the live session
// state (it's config, read once at session start) — compose it in
// here rather than threading it through the session's shared status
// writes, for the Device tab (#8) to display.
status.region = self.config.lora_region.clone();
status
} }
/// Get a reference to the shared mesh state. /// Get a reference to the shared mesh state.
@ -1098,16 +1133,21 @@ impl MeshService {
// (FIPS→Tor) instead of handing it to a radio that physically cannot // (FIPS→Tor) instead of handing it to a radio that physically cannot
// deliver it. Reachable radio peers stay on the mesh; oversized // deliver it. Reachable radio peers stay on the mesh; oversized
// envelopes (file shares etc.) always take the federation path. // envelopes (file shares etc.) always take the federation path.
- let radio_federated_unreachable = !is_federation_synthetic
- && !exceeds_lora
- && {
- let peers = self.state.peers.read().await;
- peers
- .get(&contact_id)
- .map(|p| !p.reachable && p.arch_pubkey_hex.is_some())
- .unwrap_or(false)
- };
- if is_federation_synthetic || exceeds_lora || radio_federated_unreachable {
+ let radio_federated_unreachable = !is_federation_synthetic && !exceeds_lora && {
+ let peers = self.state.peers.read().await;
+ peers
+ .get(&contact_id)
+ .map(|p| !p.reachable && p.arch_pubkey_hex.is_some())
+ .unwrap_or(false)
+ };
+ let mesh_only_mode = load_config(&self.data_dir)
+ .await
+ .ok()
+ .and_then(|cfg| cfg.mesh_only_mode)
+ .unwrap_or(false);
+ if !mesh_only_mode
+ && (is_federation_synthetic || exceeds_lora || radio_federated_unreachable)
+ {
// Resolve the peer's pubkey/did. Prefer the live mesh peer table, // Resolve the peer's pubkey/did. Prefer the live mesh peer table,
// but fall back to federation storage for federation-synthetic ids // but fall back to federation storage for federation-synthetic ids
// that were never seeded into `state.peers` — e.g. a radio-less // that were never seeded into `state.peers` — e.g. a radio-less
@ -1176,8 +1216,21 @@ impl MeshService {
// (`send_dm_via_channel` in listener/session.rs) handles both // (`send_dm_via_channel` in listener/session.rs) handles both
// single-frame and chunked transmission internally; we must NOT // single-frame and chunked transmission internally; we must NOT
// pre-chunk here as well or the receiver sees garbage. // pre-chunk here as well or the receiver sees garbage.
} else if mesh_only_mode
&& (is_federation_synthetic || exceeds_lora || radio_federated_unreachable)
{
tracing::info!(
contact_id,
bytes = wire.len(),
is_federation_synthetic,
exceeds_lora,
radio_federated_unreachable,
"Off-grid mode active; forcing mesh message over LoRa only"
);
} }
self.send_raw_payload(contact_id, wire).await?; self.send_raw_payload(contact_id, wire).await?;
let device_type = self.state.status.read().await.device_type;
let radio_transport = radio_transport_label(device_type);
Ok(self Ok(self
.record_sent_typed( .record_sent_typed(
contact_id, contact_id,
@ -1185,6 +1238,98 @@ impl MeshService {
display_text, display_text,
typed_payload, typed_payload,
sender_seq, sender_seq,
Some(radio_transport.to_string()),
// Archy↔archy typed envelopes over LoRa are identity-signed; the
// radio E2E flag (meshtastic PKI / meshcore session) isn't
// threaded to the send side yet, so don't over-claim E2E here —
// except Reticulum/LXMF, which is unconditionally E2E on every
// send regardless of peer/session state (see send_message).
device_type == DeviceType::Reticulum,
)
.await)
}
/// Send an image via native LXMF `FIELD_IMAGE` instead of our own typed
/// envelope — for a stock (non-archy) peer that can't decode our CBOR
/// wire format. Caller (the RPC layer) gates this on
/// `!is_archy_peer(contact_id)`. This is a low-level "just send the bytes"
/// call mirroring `send_raw_payload`: it does NOT record a Sent MeshMessage
/// itself. Callers use `record_sent_typed`, the same as the typed-envelope
/// paths, so the Sent card renders identically regardless of which wire
/// format actually went out.
pub async fn send_native_image(
&self,
contact_id: u32,
mime: &str,
bytes: Vec<u8>,
caption: Option<String>,
) -> Result<()> {
let status = self.state.status.read().await;
if !status.device_connected {
anyhow::bail!("No mesh device connected");
}
drop(status);
let dest_prefix = self.peer_dest_prefix(contact_id).await?;
self.state
.send_cmd(listener::MeshCommand::SendNativeImage {
dest_pubkey_prefix: dest_prefix,
mime: mime.to_string(),
bytes,
caption,
})
.await
.map_err(|_| anyhow::anyhow!("Mesh listener not running"))?;
Ok(())
}
/// Send a typed envelope over a dedicated Reticulum RNS Resource transfer
/// (`MeshCommand::SendResource`) instead of the small inline-chunk path
/// `send_typed_wire`/`send_raw_payload` uses. Callers (the `mesh.send-content-inline`
/// RPC handler) are responsible for only reaching this when the active
/// device is actually Reticulum and the payload fits the
/// `RETICULUM_RESOURCE_MAX` budget — see `mesh.transport-advice`'s
/// `"resource-mesh"` tier, the single source of truth for that decision.
/// Mirrors `send_typed_wire`'s signature/return shape so RPC call sites
/// can switch between the two paths without restructuring.
pub async fn send_content_resource(
&self,
contact_id: u32,
wire: Vec<u8>,
type_label: &str,
display_text: &str,
typed_payload: Option<serde_json::Value>,
sender_seq: u64,
) -> Result<MeshMessage> {
let status = self.state.status.read().await;
if !status.device_connected {
anyhow::bail!("No mesh device connected");
}
drop(status);
let dest_prefix = self.peer_dest_prefix(contact_id).await?;
self.state
.send_cmd(listener::MeshCommand::SendResource {
dest_pubkey_prefix: dest_prefix,
payload: wire,
})
.await
.map_err(|_| anyhow::anyhow!("Mesh listener not running"))?;
let device_type = self.state.status.read().await.device_type;
let radio_transport = radio_transport_label(device_type);
Ok(self
.record_sent_typed(
contact_id,
type_label,
display_text,
typed_payload,
sender_seq,
Some(radio_transport.to_string()),
// Reticulum/LXMF is unconditionally E2E on every send — same
// reasoning as send_message's native-text path. This method
// is Reticulum-only by construction (callers gate on
// device_type before reaching it), so this is never wrong.
true,
) )
.await) .await)
} }
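
The transport choice in the typed-send path above (federation when the peer is federation-synthetic, the envelope exceeds the LoRa budget, or a federated radio peer is unreachable, unless off-grid mode forces LoRa) can be read as a pure function of four booleans. This is a sketch under assumptions: `Route`, `SendFacts`, and `choose_route` are hypothetical names introduced for illustration, and the real code interleaves this decision with peer lookups and logging.

```rust
#[derive(Debug, PartialEq)]
enum Route {
    /// Hand the envelope to the federation path (FIPS, then Tor).
    Federation,
    /// Send over the radio, even if delivery is unlikely.
    LoRa,
}

/// The four facts the diff computes before picking a transport.
struct SendFacts {
    mesh_only_mode: bool,
    is_federation_synthetic: bool,
    exceeds_lora: bool,
    radio_federated_unreachable: bool,
}

fn choose_route(f: &SendFacts) -> Route {
    let wants_federation =
        f.is_federation_synthetic || f.exceeds_lora || f.radio_federated_unreachable;
    // Off-grid (mesh-only) mode overrides every reason to leave the radio.
    if wants_federation && !f.mesh_only_mode {
        Route::Federation
    } else {
        Route::LoRa
    }
}
```

Writing the decision this way makes the override explicit: with `mesh_only_mode` set, every combination of the other three flags resolves to `Route::LoRa`, which matches the "forcing mesh message over LoRa only" log branch.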
@@ -1240,6 +1385,11 @@ impl MeshService {
                 display_text,
                 typed_payload,
                 sender_seq,
+                // Transport is finalized below once the background send resolves
+                // FIPS vs Tor; mark E2E now — a federation envelope is
+                // identity-signed and rides an encrypted transport.
+                None,
+                true,
             )
             .await;
@@ -1249,6 +1399,10 @@ impl MeshService {
         // MeshMessage and the UI's delivery indicator tracks the receipt.
         let peer_onion_owned = peer_onion.to_string();
         let data_dir_owned = self.data_dir.clone();
+        // Finalize the Sent record's transport pill once we know which leg
+        // (FIPS/Tor) actually delivered it.
+        let state_for_transport = self.state.clone();
+        let sent_msg_id = msg.id;
         tokio::spawn(async move {
             let fips_npub =
                 crate::federation::fips_npub_for_onion(&data_dir_owned, &peer_onion_owned).await;
@@ -1269,6 +1423,12 @@ impl MeshService {
             match req.send_json(&body).await {
                 Ok((resp, transport)) if resp.status().is_success() => {
                     tracing::debug!(contact_id, transport = %transport, "Federation envelope delivered");
+                    // Tag the Sent bubble with the leg that delivered it (the
+                    // transport pill: "fips" / "tor").
+                    let mut messages = state_for_transport.messages.write().await;
+                    if let Some(m) = messages.iter_mut().find(|m| m.id == sent_msg_id) {
+                        m.transport = Some(transport.to_string());
+                    }
                 }
                 Ok((resp, transport)) => warn!(
                     contact_id,
@@ -1333,6 +1493,22 @@ impl MeshService {
                 Some(&display_name),
             )
             .await;
+        // The inbound HTTP gives no FIPS-vs-Tor signal, so label the message
+        // with the leg most recently used with this peer (federation storage's
+        // `last_transport`), defaulting to Tor. Federation envelopes are E2E
+        // (identity-signed over an encrypted transport).
+        let transport_label = {
+            let nodes = crate::federation::load_nodes(&self.data_dir)
+                .await
+                .unwrap_or_default();
+            nodes
+                .iter()
+                .find(|n| n.pubkey == from_pubkey_hex)
+                .and_then(|n| n.last_transport.clone())
+                .filter(|t| t == "fips" || t == "tor")
+                .unwrap_or_else(|| "tor".to_string())
+        };
+        let before = listener::dispatch::max_message_id(&self.state).await;
         listener::dispatch::handle_typed_envelope_direct(
             &self.state,
             contact_id,
@@ -1340,6 +1516,14 @@ impl MeshService {
             envelope,
         )
         .await;
+        listener::dispatch::stamp_received_transport(
+            &self.state,
+            contact_id,
+            before,
+            &transport_label,
+            true,
+        )
+        .await;
         Ok(())
     }
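
The inbound labelling rule in the two hunks above (use the peer's last recorded leg, accept only "fips" or "tor", fall back to "tor") can be isolated as a pure function. This is a minimal sketch, not the project's code: `FederationNode` is a hypothetical, trimmed-down stand-in for whatever `crate::federation::load_nodes` returns, and `inbound_transport_label` is an illustrative name.

```rust
/// Hypothetical, trimmed-down stand-in for a stored federation node record.
/// Only the two fields the diff reads are modelled here.
pub struct FederationNode {
    pub pubkey: String,
    pub last_transport: Option<String>,
}

/// Pick the transport label for an inbound federation message.
/// Unknown peers, missing values, and unexpected values all fall back to "tor".
pub fn inbound_transport_label(nodes: &[FederationNode], from_pubkey_hex: &str) -> String {
    nodes
        .iter()
        .find(|n| n.pubkey == from_pubkey_hex)
        .and_then(|n| n.last_transport.clone())
        // Only the two known legs are trusted as labels.
        .filter(|t| t == "fips" || t == "tor")
        .unwrap_or_else(|| "tor".to_string())
}
```

Filtering before the fallback means a stale or malformed `last_transport` value can never reach the UI's transport pill; it degrades to the conservative default instead.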
@@ -1441,6 +1625,7 @@ impl MeshService {
         let chan_contact_id = u32::MAX - (channel as u32);
         let chan_name = format!("Channel {}", channel);
         let msg_id = self.state.next_id().await;
+        let radio_transport = radio_transport_label(self.state.status.read().await.device_type);
         let msg = MeshMessage {
             id: msg_id,
             direction: MessageDirection::Sent,
@@ -1449,7 +1634,10 @@ impl MeshService {
             plaintext: display_text.to_string(),
             timestamp: chrono::Utc::now().to_rfc3339(),
             delivered: false,
+            // Channel broadcasts use the shared channel PSK, not per-identity
+            // E2E — so not an E2E message, but it does travel over the radio.
             encrypted: false,
+            transport: Some(radio_transport.to_string()),
             message_type: type_label.to_string(),
             typed_payload,
             sender_pubkey: Some(self.our_ed_pubkey_hex.clone()),
@@ -1470,39 +1658,78 @@ impl MeshService {
     pub async fn send_message(&self, contact_id: u32, text: &str) -> Result<MeshMessage> {
         use crate::mesh::message_types::{MeshMessageType, TypedEnvelope};
         let seq = self.state.next_send_seq(contact_id).await;
-        // Stock (non-archipelago) radio contacts — e.g. a phone running the
-        // MeshCore app — can't decode our typed envelope and would render it as
-        // garbled bytes. Send them the raw text as a plain native DM instead.
-        // Archipelago peers still get the typed envelope (seq/reply/reaction
-        // addressing + encryption).
-        if !self.is_archy_peer(contact_id).await {
-            let dest_prefix = self.peer_dest_prefix(contact_id).await?;
-            self.state
-                .send_cmd(listener::MeshCommand::SendNativeText {
-                    dest_pubkey_prefix: dest_prefix,
-                    payload: text.as_bytes().to_vec(),
-                })
-                .await
-                .map_err(|_| anyhow::anyhow!("Mesh listener not running"))?;
-            return Ok(self
-                .record_sent_typed(contact_id, "text", text, None, seq)
-                .await);
+        let device_type = self.state.status.read().await.device_type;
+        let archy = self.is_archy_peer(contact_id).await;
+
+        // Transport choice is DEVICE-AWARE so we fix Meshtastic without regressing
+        // Meshcore:
+        // • Meshtastic (any peer) → plain text native DM on TEXT_MESSAGE_APP. The
+        //   firmware end-to-end (PKC/Curve25519) encrypts a directed DM to any
+        //   peer whose public key it knows (archy peers exchange them via
+        //   NodeInfo), so it's delivered E2E and shows as chat on every client.
+        //   Meshtastic firmware 2.7.x will NOT deliver our opaque binary typed
+        //   envelope as a message (PRIVATE_APP is opaque app-data; a base64
+        //   envelope overflows one LoRa frame and chunk-fails) — wrapping text
+        //   is exactly what silently broke archy↔archy Meshtastic LoRa.
+        // • Meshcore/Reticulum archy peer → keep the rich signed typed envelope.
+        //   Meshcore frames are binary-safe (no UTF-8 mangling) and Reticulum/LXMF
+        //   is binary-safe and high-capacity too; both carry their own transport
+        //   E2E plus our signature for `!ai` auth / seq reply addressing, so the
+        //   envelope works there and we must not drop it.
+        // • Meshcore stock client → plain text (can't decode our envelope).
+        // Rich typed messages (invoice/coordinate/reaction/…) always use the
+        // typed-wire path via `send_typed_wire`; only plain Text is routed here.
+        let use_typed_envelope =
+            archy && matches!(device_type, DeviceType::Meshcore | DeviceType::Reticulum);
+        if use_typed_envelope {
+            // Sign with our archipelago identity so the receiver can authenticate
+            // us over LoRa (verifies against our bound `arch_pubkey_hex`). `with_seq`
+            // is applied after signing — seq is not covered by the signature.
+            let envelope = TypedEnvelope::new_signed(
+                MeshMessageType::Text,
+                text.as_bytes().to_vec(),
+                &self.signing_key,
+            )
+            .with_seq(seq);
+            let wire = envelope.to_wire()?;
+            return self
+                .send_typed_wire(contact_id, wire, "text", text, None, seq)
+                .await;
         }
-        // Sign the envelope with our archipelago identity key so the receiver
-        // can authenticate us over LoRa (it verifies against our bound
-        // `arch_pubkey_hex`). This is what lets a `!ai` typed in chat to a
-        // trusted node pass the receiver's `trusted_only` gate over the radio —
-        // an unsigned radio packet can never authenticate. The signature is
-        // optional on the wire and ignored by peers that don't know our key, so
-        // it stays backward compatible. (Federation/Tor sends already sign in
-        // `send_typed_wire_via_federation`.) `with_seq` is applied after signing
-        // — seq is not covered by the signature.
-        let envelope =
-            TypedEnvelope::new_signed(MeshMessageType::Text, text.as_bytes().to_vec(), &self.signing_key)
-                .with_seq(seq);
-        let wire = envelope.to_wire()?;
-        self.send_typed_wire(contact_id, wire, "text", text, None, seq)
+
+        let dest_prefix = self.peer_dest_prefix(contact_id).await?;
+        self.state
+            .send_cmd(listener::MeshCommand::SendNativeText {
+                dest_pubkey_prefix: dest_prefix,
+                payload: text.as_bytes().to_vec(),
+            })
             .await
+            .map_err(|_| anyhow::anyhow!("Mesh listener not running"))?;
+        // The firmware PKI-encrypts a directed DM to any peer whose key it knows;
+        // archy peers always exchange keys, so mark those Sent rows E2E so the
+        // pill shows immediately. A non-archy stock peer (e.g. 3ccc) can also be
+        // PKC-capable once we've learned its NodeInfo public key — OR that in too
+        // so the pill isn't archy-only. (The receiver independently stamps E2E
+        // from the radio's `pki_encrypted` flag, so an inbound row is accurate
+        // regardless.)
+        //
+        // Reticulum/LXMF has no such conditional: every send is encrypted to the
+        // destination's identity key by the LXMF router itself, archy peer or
+        // not — so it's unconditionally E2E rather than gated on `archy`/`pkc_capable`
+        // (which is a Meshtastic-only concept; Reticulum contacts never set it).
+        let pkc_capable = self.peer_pkc_capable(contact_id).await;
+        let encrypted = device_type == DeviceType::Reticulum || archy || pkc_capable;
+        Ok(self
+            .record_sent_typed(
+                contact_id,
+                "text",
+                text,
+                None,
+                seq,
+                Some(radio_transport_label(device_type).to_string()),
+                encrypted,
+            )
+            .await)
     }

     /// Whether `contact_id` is an archipelago peer (vs a stock meshcore client).
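
The routing rules in the `send_message` hunk above reduce to two boolean decisions: whether to send the signed typed envelope, and whether a plain-text send is recorded as E2E. The following is a minimal sketch of just those predicates. The `DeviceType` enum here is a stand-in with only the three variants named in the diff, and the two free functions are hypothetical; in the diff the logic is inline in `MeshService::send_message`.

```rust
/// Stand-in for the crate's `DeviceType`; only the variants named in the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeviceType {
    Meshtastic,
    Meshcore,
    Reticulum,
}

/// The signed typed envelope is used only for archipelago peers on radios
/// whose frames are binary-safe (Meshcore, Reticulum). Meshtastic always
/// gets a plain native text DM, as do stock (non-archipelago) peers.
pub fn use_typed_envelope(device: DeviceType, archy: bool) -> bool {
    archy && matches!(device, DeviceType::Meshcore | DeviceType::Reticulum)
}

/// E2E flag recorded for the plain-text path: Reticulum is always E2E,
/// otherwise the peer must be an archipelago peer or have a known PKC key.
pub fn native_text_encrypted(device: DeviceType, archy: bool, pkc_capable: bool) -> bool {
    device == DeviceType::Reticulum || archy || pkc_capable
}
```

For example, `use_typed_envelope(DeviceType::Meshtastic, true)` is `false`, which is the case the commit is fixing: an archipelago peer on Meshtastic now receives plain text that the firmware encrypts itself.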
@@ -1510,7 +1737,7 @@ impl MeshService {
     /// only once we've learned their archipelago identity (DID or x25519 key,
     /// from federation seeding or an identity exchange). Stock clients have
     /// neither, so we send them plain text rather than typed envelopes.
-    async fn is_archy_peer(&self, contact_id: u32) -> bool {
+    pub(crate) async fn is_archy_peer(&self, contact_id: u32) -> bool {
         if contact_id & 0x8000_0000 != 0 {
             return true;
         }
@@ -1521,6 +1748,21 @@ impl MeshService {
             .unwrap_or(false)
     }

+    /// Whether `contact_id`'s real radio PKI (Curve25519) key is known, so the
+    /// firmware delivers a directed DM to it end-to-end encrypted even though
+    /// it's not an archipelago peer (e.g. stock Meshtastic peer 3ccc). Stamped
+    /// onto `MeshPeer::pkc_capable` by `refresh_contacts` from the driver's
+    /// `get_contacts()`.
+    async fn peer_pkc_capable(&self, contact_id: u32) -> bool {
+        self.state
+            .peers
+            .read()
+            .await
+            .get(&contact_id)
+            .map(|p| p.pkc_capable)
+            .unwrap_or(false)
+    }
+
     /// Record a Sent MeshMessage for a typed envelope that has already been
     /// transmitted by the caller. Used by the RPC layer after sending
     /// invoice/coordinate/alert/etc. so the UI gets a proper rich Sent card
@@ -1532,6 +1774,8 @@ impl MeshService {
         display_text: &str,
         typed_payload: Option<serde_json::Value>,
         sender_seq: u64,
+        transport: Option<String>,
+        encrypted: bool,
     ) -> MeshMessage {
         let msg_id = self.state.next_id().await;
         let peer_name = self
@@ -1549,7 +1793,8 @@ impl MeshService {
             plaintext: display_text.to_string(),
             timestamp: chrono::Utc::now().to_rfc3339(),
             delivered: false,
-            encrypted: false,
+            encrypted,
+            transport,
             message_type: type_label.to_string(),
             typed_payload,
             sender_pubkey: Some(self.our_ed_pubkey_hex.clone()),
@@ -1591,6 +1836,7 @@ impl MeshService {
         let chan_contact_id = u32::MAX - (channel as u32);
         let chan_name = format!("Channel {}", channel);
         let msg_id = self.state.next_id().await;
+        let radio_transport = radio_transport_label(self.state.status.read().await.device_type);
         let msg = MeshMessage {
             id: msg_id,
@@ -1600,7 +1846,9 @@ impl MeshService {
             plaintext: text.to_string(),
             timestamp: chrono::Utc::now().to_rfc3339(),
             delivered: false,
+            // Plain channel broadcast over the radio (shared PSK, not E2E).
             encrypted: false,
+            transport: Some(radio_transport.to_string()),
             message_type: "text".to_string(),
             typed_payload: None,
             sender_pubkey: None,
@@ -1634,6 +1882,26 @@ impl MeshService {
         Ok(())
     }

+    /// Reboot the locally-connected radio firmware to recover a wedged /
+    /// RX-deaf radio (one that has stopped hearing the mesh while still able to
+    /// transmit). The device reconnects via the listener's reboot→reconnect
+    /// loop. `seconds` is the firmware reboot delay.
+    pub async fn reboot_radio(&self, seconds: i64) -> Result<()> {
+        let status = self.state.status.read().await;
+        if !status.device_connected {
+            anyhow::bail!("No mesh device connected. Check USB connection.");
+        }
+        drop(status);
+        self.state
+            .send_cmd(listener::MeshCommand::RebootRadio { seconds })
+            .await
+            .map_err(|_| anyhow::anyhow!("Mesh listener not running"))?;
+        info!(seconds, "Mesh radio reboot triggered");
+        Ok(())
+    }
+
     /// Current mesh-AI assistant settings (issue #50).
     pub async fn assistant_config(&self) -> listener::AssistantConfig {
         self.state.assistant.read().await.clone()
@@ -1642,7 +1910,13 @@ impl MeshService {
     /// Recently-denied `!ai` askers (newest first) so the UI can offer to allow
     /// them. Cleared implicitly as new denials rotate older ones out.
     pub async fn assistant_denied_askers(&self) -> Vec<listener::DeniedAsker> {
-        self.state.assist_denied.read().await.iter().cloned().collect()
+        self.state
+            .assist_denied
+            .read()
+            .await
+            .iter()
+            .cloned()
+            .collect()
     }

     /// Update the mesh-AI assistant settings live (no listener restart) and
@@ -1859,6 +2133,9 @@ mod tests {
             hops: 0,
             last_advert: 0,
             reachable,
+            pkc_capable: false,
+            lat: None,
+            lon: None,
         }
     }


@@ -64,6 +64,20 @@ pub const RESP_CONTACT_MSG_V3: u8 = 0x10;
 pub const RESP_CHANNEL_MSG_V3: u8 = 0x11;
 pub const RESP_CHANNEL_INFO: u8 = 0x12;
 pub const RESP_STATS: u8 = 0x18;
+/// Archipelago-internal synthetic response code used by the Meshtastic adapter
+/// for text DMs that the firmware reports as PKI-encrypted. Meshcore firmware
+/// never emits this code; it lets the shared listener persist the E2E badge
+/// without changing the on-wire Meshcore frame format.
+pub const RESP_CONTACT_MSG_V3_E2E: u8 = 0x13;
+/// Archipelago-internal synthetic response code used by the Meshtastic adapter
+/// for CHANNEL broadcast text (e.g. the default public LongFast channel). Unlike
+/// the Meshcore `RESP_CHANNEL_MSG_V3` — which carries no sender — a Meshtastic
+/// MeshPacket gives us the originating node, so the listener can both file the
+/// message under the channel thread AND attribute it to its sender. Frame
+/// layout: `[channel_idx: u8][sender_pubkey_prefix: 6 bytes][text…]`. Kept below
+/// 0x80 so it is not mistaken for a device push notification; Meshcore never
+/// emits it.
+pub const RESP_MESHTASTIC_CHANNEL_TEXT: u8 = 0x70;

 // --- Push notification codes (device -> host, async, >= 0x80) ---
 pub const PUSH_CONTACT_ADVERT: u8 = 0x80;
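
The frame layout documented for `RESP_MESHTASTIC_CHANNEL_TEXT` (`[channel_idx: u8][sender_pubkey_prefix: 6 bytes][text…]`) could be decoded along the following lines. This is a sketch under stated assumptions: the `ChannelText` struct and `parse_channel_text` function are hypothetical names, the response-code byte is assumed to have been stripped already, and lossy UTF-8 decoding is an illustrative choice rather than something the diff specifies.

```rust
/// Synthetic response code, value taken from the diff above.
pub const RESP_MESHTASTIC_CHANNEL_TEXT: u8 = 0x70;

/// Hypothetical parsed form of a Meshtastic channel-text frame.
#[derive(Debug, PartialEq, Eq)]
pub struct ChannelText {
    pub channel_idx: u8,
    pub sender_prefix: [u8; 6],
    pub text: String,
}

/// Parse the payload that follows the response-code byte.
/// Returns `None` when the payload is too short to hold the 7-byte header.
pub fn parse_channel_text(payload: &[u8]) -> Option<ChannelText> {
    if payload.len() < 7 {
        return None;
    }
    let channel_idx = payload[0];
    let mut sender_prefix = [0u8; 6];
    sender_prefix.copy_from_slice(&payload[1..7]);
    // Assumption: tolerate invalid UTF-8 instead of rejecting the frame.
    let text = String::from_utf8_lossy(&payload[7..]).into_owned();
    Some(ChannelText { channel_idx, sender_prefix, text })
}
```

Carrying the sender prefix in the frame is what lets the listener attribute a channel message to its originating node, which the Meshcore `RESP_CHANNEL_MSG_V3` frame cannot do.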
@@ -377,6 +391,21 @@ pub struct ParsedContact {
     pub contact_type: u8,
     pub path_len: u8,
     pub flags: u8,
+    /// Whether this contact is end-to-end (PKI / Curve25519) capable. Only the
+    /// Meshtastic adapter sets this (true once we've learned the peer's real
+    /// NodeInfo public key, so the firmware delivers DMs PKC-encrypted). Meshcore
+    /// contacts leave it `false` — their E2E status is tracked per-message.
+    pub pkc_capable: bool,
+    /// Signal strength (dBm) / signal-to-noise ratio (dB) of the most recently
+    /// heard packet from this contact. Meshtastic-only today (from
+    /// `MeshPacket.rx_rssi`/`.rx_snr`); other transports leave these `None`.
+    pub rssi: Option<i16>,
+    pub snr: Option<f32>,
+    /// Last known position, from a Meshtastic `POSITION_APP` broadcast
+    /// (`Position.latitude_i`/`.longitude_i`, degrees). `None` until the
+    /// contact has shared one.
+    pub lat: Option<f64>,
+    pub lon: Option<f64>,
 }

 /// Parse RESP_CONTACT (0x03) response.
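
The `lat`/`lon` fields above hold degrees as `f64`, while the Meshtastic `Position` protobuf carries `latitude_i`/`longitude_i` as integers in units of 1e-7 degrees, so the adapter has to scale them somewhere. A minimal sketch of that conversion follows; both function names are hypothetical, and the diff does not show where the adapter actually performs it.

```rust
/// Convert a Meshtastic integer coordinate (units of 1e-7 degrees) to degrees.
pub fn position_i_to_degrees(value_i: i32) -> f64 {
    f64::from(value_i) * 1e-7
}

/// Map optional integer coordinates to the optional degree pair a contact
/// record would store; absent inputs stay `None`.
pub fn contact_position(
    latitude_i: Option<i32>,
    longitude_i: Option<i32>,
) -> (Option<f64>, Option<f64>) {
    (
        latitude_i.map(position_i_to_degrees),
        longitude_i.map(position_i_to_degrees),
    )
}
```

Keeping the fields as `Option<f64>` preserves the distinction between "no position shared yet" and a real fix at 0°, 0°.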
@@ -419,6 +448,15 @@ pub fn parse_contact(data: &[u8]) -> Result<ParsedContact> {
         contact_type,
         path_len,
         flags,
+        // Meshcore tracks E2E per message, not per contact.
+        pkc_capable: false,
+        // Meshcore's own contact format does carry lat/lon at a fixed offset
+        // (see the format comment above) but wiring that up is out of scope
+        // for this Meshtastic-specific backlog item.
+        rssi: None,
+        snr: None,
+        lat: None,
+        lon: None,
     })
 }
