Compare commits

74 Commits

Author SHA1 Message Date
archipelago
a38c9d5f29 docs(master-plan): §10d Meshtastic MeshCore-parity status (one open received-msg bug)
Region (EU_868) + shared channel "archipelago" auto-provisioning shipped in
8fdb45e8 and riding the rolled #9 fleet binary (0060dcd6). Discovery, RF, and
sending verified on .116+.228; the one open blocker is the running driver not
surfacing received messages. Slotted after WS-F #9–11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:53:06 -04:00
archipelago
f9a6ae3f32 feat(mesh): Meshtastic region + shared-channel auto-provisioning (MeshCore parity)
Fresh Meshtastic radios ship region-UNSET (RF-silent) and on mismatched
channels, so nodes only ever saw themselves. Bring them to MeshCore parity
using the official Meshtastic admin API:

- Auto-provision LoRa region (set_config, AdminMessage field 34) from a new
  mesh-config `lora_region` (e.g. EU_868) when the radio's region differs.
- Auto-provision a shared primary channel (set_channel, field 33) with a
  PSK derived deterministically from channel_name, so every node converges on
  one mesh — the parity equivalent of MeshCore's named "archipelago" channel.
- Read current region/channel from want_config; only write when different
  (no reboot loop); cap attempts so a radio that won't persist can't loop.
- Active NodeInfo advert scaffolding + aggressive serial drain.

Verified on .116+.228: region+channel persist, discovery works (both see each
other as named reachable contacts), bidirectional RF + sending confirmed.
Receiving in the running driver is still under diagnosis (instrumentation added).

Also removes the unwanted `meshtastic` daemon app from the registry (it was
never meant to be a container — native driver provides system-level support):
deletes apps/meshtastic + catalog entries (app-catalog, neode-ui, releases) +
test refs. Meshtastic stays native, like MeshCore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
archipelago
fd3a4ee4ef fix(orchestrator): chown the whole fresh bind subtree, not just the leaf
ensure_bind_mount_dirs chowned a freshly-created no-data_uid bind dir
with --reference={immediate_parent}. For a NESTED bind source like
jellyfin's /var/lib/archipelago/jellyfin/config (or netbird's .../netbird/
data), `mkdir -p` creates the intermediate <app> dir root:root too, so
referencing the immediate parent just copied ROOT — leaving the dir
unwritable and the app EACCES-crash-looping on reinstall (found by the
all-apps-lifecycle pass: jellyfin "/config/log denied" exit 139;
netbird-server "unable to open database file"). It only ever worked for
direct children of the data root (immich).

Fix: anchor to the nearest PRE-EXISTING ancestor (the rootless data root,
owned by the service user) and chown -R the entire newly-created subtree
to it. Extracted the walk into fresh_subtree_anchor() with a unit test
covering nested / direct / second-volume cases.
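
The ancestor walk described above can be sketched roughly as follows. The message names fresh_subtree_anchor() but gives no signature, so the signature, the injected `exists` predicate, and the returned pair are assumptions made for illustration, not the committed code.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: for a bind source that does not exist yet, return
/// `(anchor, top)` where `anchor` is the nearest ancestor that already exists
/// and `top` is the highest directory `mkdir -p` would have to create.
/// A caller would then chown -R `top` to match `anchor`.
/// `exists` is injected so the walk can be tested without touching the disk.
fn fresh_subtree_anchor<F>(target: &Path, exists: F) -> Option<(PathBuf, PathBuf)>
where
    F: Fn(&Path) -> bool,
{
    if exists(target) {
        return None; // nothing fresh to re-own
    }
    let mut top = target.to_path_buf();
    loop {
        // Running out of parents means there is no pre-existing ancestor.
        let parent = top.parent()?.to_path_buf();
        if exists(&parent) {
            return Some((parent, top));
        }
        top = parent;
    }
}
```

For the nested jellyfin case this yields the data root as the anchor and the intermediate <app> dir as the top of the subtree, which is the directory the immediate-parent approach left root:root.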

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
Dorian
38d2bbf570 chore(android): update companion APK download [skip ci]
2026-06-26 13:08:37 +01:00
Dorian
a90fea80ed feat(android): edit server entries from in-app settings menu (NESMenu); bump to 0.4.12 (vc16)
The 0.4.11 edit affordance only lived on ServerConnectScreen, which a
connected user never sees. Add edit to NESMenu — the settings modal
reached via two-finger hold while connected: a ✎ pencil on each saved
server opens the form pre-populated (Edit Server header + Cancel),
persists via ServerPreferences.updateSavedServer(), and reconnects when
the edited server is the live one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:08:18 +01:00
Dorian
389e602097 chore(android): update companion APK download [skip ci]
2026-06-26 12:54:52 +01:00
Dorian
5677f9cca1 feat(android): edit saved server entries; bump companion to 0.4.11 (vc15)
Add an edit affordance to each saved server in ServerConnectScreen: a
pencil button loads the entry into the form (Edit Server mode) with
Save Changes / Cancel actions. Persisted via a new
ServerPreferences.updateSavedServer() that replaces by connection
identity (address/port/scheme) and keeps the active record in sync when
the edited server is the active one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:54:07 +01:00
archipelago
fc64b422e7 docs(master-plan): WS-F#3 first destructive run — 3 reinstall bugs found
Full all-apps-lifecycle pass on .228: lifecycle 11/11, teardown 8/11.
Surfaced (1) fresh-install bind-dir ownership root:root → reinstall
EACCES (jellyfin/netbird; Fix B misses the install path), (2) netbird
reinstall adopts leftover containers → skips manifest cert/file render,
(3) portainer image pin lfg2025/portainer:2.19.4 unpublished (manifest
unknown), pin overrides RPC dockerImage. .228 restored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:47:24 -04:00
Dorian
07b9b5a3aa docs(android): companion release + App-Not-Installed runbook
Capture the 2026-06-26 lessons durably: ship via the hardened publish
script only, v1+v2+v3 signing is enforced by apksigner (AGP ignores
enableV1Signing at minSdk>=24), diagnose install failures with adb
install FIRST, signature-key changes force a one-time uninstall, and
keep all phone/adb work scoped to com.archipelago.app.debug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:21:48 +01:00
Dorian
ac59771560 fix(android): force v1+v2+v3 signing & clean-build guards in companion publish
The published companion APK was v2-only (AGP silently ignores
enableV1Signing for minSdk>=24) and clean builds broke on stray
space-named resource dirs. Harden scripts/publish-companion-apk.sh:
clean build, remove/reject space-named res dirs, force v1+v2+v3 via
zipalign+apksigner, and abort unless all three schemes verify. Wire
ship-companion.sh to the shared script. Re-sign the served 0.4.10 APK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:53:25 +01:00
Dorian
d1f9e9ce88 chore(android): update companion apk download
2026-06-26 11:32:00 +01:00
Dorian
58847fc3d7 chore(android): bump companion to 0.4.10 (versionCode 14)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:31:36 +01:00
archipelago
a3e09eab57 docs(master-plan): WS-F#3 — destructive all-apps lifecycle matrix landed (43934eef)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:51 -04:00
archipelago
43934eefa5 test(gate): destructive all-apps lifecycle matrix (WS-F#3)
Active counterpart to the read-only all-apps-matrix.bats: drives
stop/start/restart for every installed app and, under
ARCHY_ALLOW_CASCADE_DESTRUCTIVE, a FULL teardown (uninstall →
no-ghost → reinstall) — the broad coverage F needs beyond the ~8 core
suites. App set is discovered from My Apps ∩ the node catalog; reinstall
spec comes from catalog.json {dockerImage, containerConfig}.

PROTECTED by default (never cycled or torn down): bitcoin*/electrum*
(expensive resync) AND lnd/btcpay*/fedimint* (teardown = irreversible
wallet/channel/guardian loss). The user asked to protect only
bitcoin+electrum; the wallet apps are added for safety and can be
removed via ARCHY_MATRIX_PROTECT. Heavy + destructive → a supervised
pass, not folded into run-gate. Validated on .228: discovery excludes
the 6 protected installed apps; lifecycle tier cycles a single app
(botfights) stop/start/restart green; teardown gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:22 -04:00
archipelago
80146f4476 docs(master-plan): WS-F#2 — uninstall progress bar made truthful (9f17ba68)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:15:11 -04:00
archipelago
9f17ba6867 fix(ui): truthful uninstall progress bar (was a solid full-red block)
AppCard's uninstall bar was hardcoded `w-full bg-red-400/60 animate-pulse`
— a solid, full-width, red, fake-pulsing block that never moved and read
as an error, no matter the actual teardown progress (the install bar, by
contrast, renders a real percentage). Derive a truthful percentage from
the backend's existing `uninstall-stage` label — "Stopping containers
(X/N)" → 10–50%, "Cleaning up volumes" → 70%, "Removing app data" → 90%
— and render it exactly like install: neutral fill, real width + percent,
shimmer (not a fake pulse) carrying motion when a stage has no number.
Frontend-only; the backend already broadcasts these stages.
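
A minimal sketch of the stage-to-percentage mapping, written in Rust for consistency with the backend examples even though the real change lives in the frontend. The function name is hypothetical, and the linear interpolation across the 10–50% band is an assumption; the message only gives the band's endpoints.

```rust
/// Hypothetical helper: map an `uninstall-stage` label to a percentage.
/// Returns None when the label carries no usable number, which is where
/// the shimmer would carry the motion instead.
fn uninstall_percent(stage: &str) -> Option<u8> {
    if let Some(rest) = stage.strip_prefix("Stopping containers") {
        // Expect a trailing "(X/N)" counter.
        let inner = rest.trim().strip_prefix('(')?.strip_suffix(')')?;
        let (done, total) = inner.split_once('/')?;
        let done: u32 = done.trim().parse().ok()?;
        let total: u32 = total.trim().parse().ok()?;
        if total == 0 {
            return None;
        }
        // Assumed: linear interpolation from 10% (0/N) to 50% (N/N).
        let done = done.min(total);
        return Some((10 + 40 * done / total) as u8);
    }
    match stage {
        "Cleaning up volumes" => Some(70),
        "Removing app data" => Some(90),
        _ => None,
    }
}
```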

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:04:48 -04:00
archipelago
67426c0d41 docs(master-plan): cascade tier wired into the gate (b7d92107)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:24:07 -04:00
archipelago
b7d9210784 test(gate): optional ARCHY_GATE_CASCADE pass — wire the cascade tier in
run-gate.sh ran only the DESTRUCTIVE tier; the cascade-uninstall suite
(uninstall→no-ghost→reinstall, the #13/#14/uninstall-hang regression
guard) existed but was never enabled by the gate. Add an opt-in single
cascade pass after the 5× loop (ARCHY_GATE_CASCADE=1, requires
ARCHY_ALLOW_DESTRUCTIVE=1), counted into the pass/fail tally. Kept out
of the 5× loop deliberately — uninstall/reinstall every iteration would
balloon runtime and re-pull images; one pass guards the class. Default
gate behavior unchanged. Validated: cascade-uninstall.bats 7/7 on .228.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:22:45 -04:00
archipelago
292a2650df docs(master-plan): WS-F — uninstall-hang root cause fixed + cascade validated
Workstream F now in-progress: the immich/grafana uninstall hang →
ghost/stuck-bar/reinstall-block is root-caused (unbounded systemctl/
podman in quadlet::disable_remove) and fixed (71cc9ac4); cascade-
uninstall.bats 7/7 on .228. Records the remaining F items + the pending
gate-wiring decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:18:39 -04:00
archipelago
71cc9ac46a fix(uninstall): bound systemctl/podman teardown so uninstall can't hang
Uninstalling immich/grafana could hang with a frozen full-red progress
bar, leave a ghost entry stuck in My Apps, and then refuse reinstall.
Single root cause: quadlet::disable_remove() — called first in the
uninstall task (via companion + orchestrator teardown) — ran
`systemctl --user stop`, daemon-reload, and `podman rm -f` with NO
timeout. On rootless podman a generated unit can wedge in "deactivating"
while podman hangs underneath, so `systemctl stop` blocks forever. The
spawned uninstall task then never returns Ok or Err, so:
  - set_uninstall_stage() (after the stop) never fires → progress frozen;
  - remove_package_state_entry() never runs → entry stranded in
    `Removing` → ghost in My Apps;
  - the install guard rejects reinstall with "already Removing".

The spawn wrapper already reverts state on Err and removes the entry on
Ok — the only failure mode was a hang that returns neither. Bound the
teardown so it always terminates:
  - systemctl stop → QUADLET_STOP_TIMEOUT, escalate to kill+reset-failed
    on timeout (reuses the existing helpers);
  - daemon_reload_user() → bounded systemctl_user_status (30s);
  - defensive `podman rm -f` → wrapped in tokio timeout.
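
The bounding idea can be illustrated with a std-only sketch. It swaps the tokio timeout named above for a plain polling loop so the example needs no crates; `run_bounded` and `Bounded` are hypothetical names, not the helpers in the codebase.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Outcome of a bounded command: it exited on its own, or it was killed
/// because it was still running at the deadline.
#[derive(Debug)]
enum Bounded {
    Exited(ExitStatus),
    TimedOut,
}

/// Hypothetical helper: run `cmd`, but never wait longer than `limit`.
fn run_bounded(mut cmd: Command, limit: Duration) -> io::Result<Bounded> {
    // Discard child output; only the exit status is of interest here.
    let mut child = cmd.stdout(Stdio::null()).stderr(Stdio::null()).spawn()?;
    let deadline = Instant::now() + limit;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Bounded::Exited(status));
        }
        if Instant::now() >= deadline {
            // Escalate: kill the wedged child and reap it so the caller
            // always gets an answer instead of blocking forever.
            let _ = child.kill();
            let _ = child.wait();
            return Ok(Bounded::TimedOut);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

With a wrapper of this shape the uninstall task always returns, so the existing Ok/Err handling in the spawn wrapper gets a chance to run.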

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 04:27:02 -04:00
archipelago
2ebcd8f9a8 docs(master-plan): backlog — smart launch-port selection + manifest-driven archival-node blocker
§10b: replace per-app static launch-port map with a manifest-first +
non-HTTP-port-skipping heuristic (the gitea :2222 class).
§10c: generalize the un-pruned/archival Bitcoin install blocker from a
hardcoded requires_unpruned_bitcoin() match to a manifest-declared
dependency, with a clear pre-install UX.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:47:25 -04:00
archipelago
3515344800 docs(master-plan): session h — zombie guard + gitea launch-port fix
Banner + §8b: zombie-container guard (0a8db904, live-proven on .228) and
gitea launch-port fix (670ebb06) shipped in binary 040df5ce, rolled to
the fleet. Logs the mempool env-drift recreate-loop and nostr-rs-relay
follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:41:59 -04:00
archipelago
670ebb0666 fix(launcher): pin Gitea launch URL to web port 3001 (not SSH 2222)
Gitea publishes two host ports — SSH on 2222 and the web UI on 3001.
The launch URL comes from manifest_lan_address_for() (the manifest's
interfaces.main → 3001), but Gitea had no entry in the static
lan_address_for() fallback map. On a node where the gitea manifest is
absent or stale (no interfaces block), the lookup returns None and the
code falls through to extract_lan_address(), which returns whichever
port podman lists first — frequently the SSH port. Result: the app
launched at :2222 instead of :3001 (observed on tailscale node
100.82.34.38).

Add the canonical "gitea" => http://localhost:3001 entry to the static
map, matching every other core app, so the web UI is pinned regardless
of manifest presence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:16:41 -04:00
archipelago
0a8db9044f fix(orchestrator): recreate zombie "Up" containers whose process is dead
podman trusts its own state DB: when a container's conmon dies without
podman observing it (cgroup-cascade SIGKILL on archipelago.service
restart, a crash), `podman ps` keeps reporting it "Up" long after the
process is gone. The reconciler NoOp'd such a zombie forever, so a dead
dependency with no published host port never recovered.

Observed live on .228 (2026-06-25): netbird-dashboard reported "Up" with
a dead State.Pid → its nginx proxy 502'd → NetBird login broke
("Unauthenticated"). The dashboard publishes no host port, so the
Running branch had nothing to probe and never recreated it.

Add a zombie guard to the Running branch: verify the recorded State.Pid
is alive (its /proc entry exists) before trusting "running"; on a
concrete dead PID, stop+remove+install_fresh from the manifest.
Conservative by design — any uncertainty (inspect failed, PID
unparseable) assumes alive, so a transient podman hiccup never destroys
a healthy container. Unit test covers live/dead/out-of-range PIDs.
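
A rough sketch of the conservative PID check. The names are hypothetical, the proc root is passed in so the logic can be exercised against a scratch directory, and treating a zero or negative PID as "uncertain" is an assumption rather than something the message specifies.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum PidVerdict {
    /// Trust podman's "running" (also the answer under any uncertainty).
    Alive,
    /// A concrete PID was recorded and it has no /proc entry.
    Dead,
}

/// Hypothetical helper: classify the State.Pid string from an inspect call.
fn classify_state_pid(raw: &str, proc_root: &Path) -> PidVerdict {
    // Unparseable or non-positive PID: assume alive (assumption, see above).
    let pid: u32 = match raw.trim().parse() {
        Ok(p) if p > 0 => p,
        _ => return PidVerdict::Alive,
    };
    // If the proc root itself is unavailable, nothing can be concluded.
    if !proc_root.is_dir() {
        return PidVerdict::Alive;
    }
    if proc_root.join(pid.to_string()).exists() {
        PidVerdict::Alive
    } else {
        PidVerdict::Dead
    }
}
```

Only the `Dead` verdict would lead to stop+remove+install_fresh; every other path leaves the container untouched.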

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 02:25:52 -04:00
archipelago
43e700498b fix(android): trust self-signed certs for the user's own node in WebView
Node apps (e.g. NetBird on :8087) terminate TLS with a self-signed cert
so the dashboard gets a secure context (OIDC / window.crypto.subtle, #15).
The WebView's default onReceivedSslError CANCELs untrusted certs, so those
apps rendered blank in the companion — exactly the netbird "won't load in
the webview" report. Override onReceivedSslError in both WebViewClients
(kiosk + in-app browser) to proceed() only when the failing cert's host
matches the connected node; reject everything else (no blanket trust).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 18:13:52 -04:00
archipelago
89d397bb74 refactor(netbird): delete legacy Rust installer — #20 ph4 (manifest-driven only)
netbird is fully manifest-driven (apps/netbird-*/manifest.yml via the signed
catalog): install_stack_via_orchestrator renders the 3-member stack with
generated_certs (self-signed TLS for the #15 OIDC secure context), base64
generated_secrets, and templated config — and adopts the running stack by live
container name. The hardcoded `podman run` fallback was therefore dead code on
any node with the embedded catalog (verified live: .228 https:8087 -> 200).

Removes the per-app Rust installer anti-pattern the master plan calls out:
- install_netbird_stack: orchestrator -> adopt -> bail! (no in-Rust installer)
- deletes 6 now-dead helpers (write_netbird_config_files, ensure_netbird_tls_cert,
  read_or_generate_b64_secret, netbird_net_resolver_ip, detect_netbird_public_host_ip,
  wait_for_netbird_oidc_ready), 3 NETBIRD_*_IMAGE consts, unused base64::Engine import
- ~485 lines removed; prod_orchestrator doc-comments updated

Behavioural parity: the manifest path already executed on the fleet, so this
changes no live behavior. The legacy #10 OIDC-readiness wait was already bypassed
by the manifest path; if that race resurfaces, add an OIDC-ready gate to the
manifest rather than resurrecting the Rust fn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:04:01 -04:00
archipelago
41e7f500f8 test(lifecycle): tolerate slow-but-healthy heavy-app recovery under 5x churn
The 5x destructive gate on heavy nodes false-failed on transient windows
during stack recovery, not real regressions:

- immich.bats: lan_address port-publish probe 30s -> 90s. The postgres->redis
  ->server (DB migrations on boot) stack can take >30s to republish :2283 after
  a churn-induced recreate; destructive-tier immich tests already allow 180-240s.
- mempool.bats: orphan-container check now polls to steady state (<=30s) instead
  of a single-shot count, which caught a recreated member briefly visible
  alongside its replacement mid-reconcile.
- run-gate.sh: settle cap 180s -> 300s and also gate on immich's :2283 when
  installed, so the next iteration's read-only probe doesn't race a still-
  recovering stack. Settle returns the instant every probe is green.

A genuinely unexposed/orphaned/unhealthy app still fails these checks; they only
absorb the transient recreate window under sustained churn.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:18:34 -04:00
archipelago
a721532f55 feat(orchestrator): desired-state recovery + recreate volume-ownership [UNVALIDATED WIP]
NOT yet validated on a node or fleet-deployed — cargo check passes, release build
+ .228 canary validation pending. Committed as a checkpoint so the work survives.

Two fixes the immich .198 incident exposed:

Fix A (reconcile_all_with_mode): a previously-running app whose container vanished
(e.g. a wedged podman teardown cleared by a reboot) was left absent on boot. Now,
when boot reconcile would leave an app 'absent' but it was running at the last
running-containers snapshot, recreate it (install_fresh). New
crash_recovery::load_last_running_names() reads the snapshot without the PID/crash
gate (+2 unit tests). Match is exact on compute_container_name (incl stack
members); user-stopped + uninstalled apps are already excluded, so no false
positives.

Fix B (ensure_bind_mount_dirs): a freshly-created bind dir was left root:root, so a
no-data_uid app running as container-root (→ host rootless user) hit EACCES and
crash-looped (the exact immich upload-dir failure). Now a newly-created bind dir
for a no-data_uid app is chowned via --reference=<parent> to match the rootless
data root — no host-uid guessing, only fresh dirs (no regression for existing
installs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 09:28:40 -04:00
archipelago
80f49cac1c fix(ui): backoff remote-relay reconnects + stop cryptpad icon 404
Two console-noise fixes from a live error dump:
- remote-relay.ts reconnected on a FIXED 5s interval with no backoff, so when
  the backend is briefly down it floods the console/network with failed-WS
  attempts for the whole outage. It's a secondary feature (companion input), so
  add exponential backoff 1s->30s (mirrors websocket.ts), reset on open/start.
- cryptpad's catalog/marketplace entries pointed at a non-existent
  /assets/img/app-icons/cryptpad.webp -> a 404 on every marketplace render.
  Point it at the existing default icon (handleImageError swapped to it anyway).
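
The backoff schedule from the first item can be sketched as a pure function. It is written in Rust for consistency with the other examples, although remote-relay.ts is TypeScript; the function name is hypothetical and the doubling factor is an assumption, since the message states only the 1s floor and 30s cap.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before reconnect attempt number `attempt`
/// (0-based), doubling from 1s and capped at 30s. Resetting on open/start
/// corresponds to calling this with attempt 0 again.
fn reconnect_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 1_000;
    const CAP_MS: u64 = 30_000;
    // checked_shl avoids overflow for very large attempt counts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_MS.saturating_mul(factor).min(CAP_MS))
}
```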

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 08:41:04 -04:00
archipelago
2d8ade629b fix(ui): log global errors silently instead of popping a toast + overlay
The global error handler (Vue errorHandler + window error + unhandledrejection)
fired a red 'Something went wrong: <raw msg>' toast AND an auto on-device overlay
on every caught error — deliberately loud for bug-bash, but it surfaces benign,
non-actionable noise (e.g. a transient RPC rejection during a ws reconnect, or
the service worker failing to register over a self-signed cert) right in the
user's face.

Demote the catch-all to SILENT capture: keep console.error + the
window.__archyErrors ring buffer, and expose the screenshot-able overlay
on-demand via window.__archyShowErrors() — but never auto-pop. Components that
need to report a specific, actionable failure still call toast.error() directly.

Also filter known-benign environmental noise (PWA service-worker registration
failing over a self-signed cert — needs a trusted cert, #56) so it doesn't even
occupy a ring-buffer slot and push out real errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:55:49 -04:00
archipelago
0406af522c test(lifecycle): add manifest-driven all-apps health matrix
The per-app suites cover ~8 core apps in depth; nothing covered the ~30 others
(jellyfin, vaultwarden, penpot, nextcloud, grafana, …). all-apps-matrix.bats
derives the app set from server.get-state package-data (no hardcoded list) and
asserts baseline health across EVERY installed app:
  - settles to a non-transitional state within a window (the #13/#14 stuck-ghost
    class, generalized fleet-wide — installing/removing that never settles)
  - not in error/failed
  - reports a recognized (non-garbage) state
  - every running UI app (manifest ui=="true") exposes a non-null lan-address
    (the immich/port-drift unreachable-UI failure, generalized to all UI apps)

Read-only, so it joins run.sh/run-gate.sh on every node and grows coverage as
nodes install more apps. Verified 5/5 on .228 (17 apps) and .116 (20 apps).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:27:10 -04:00
archipelago
57a69257c4 test(lifecycle): add CASCADE uninstall/reinstall tier (guards #13 ghost, #14 reinstall)
The 5x gate is DESTRUCTIVE-only and never exercised uninstall/reinstall — where
the worst field bugs lived (#13 app ghosting in My Apps after uninstall, #14
reinstall stalling on stale state). New cascade-uninstall.bats drives the full
teardown path on a throwaway app (default grafana, precondition-skips if already
installed so it can't destroy real data) and asserts:
  - fresh install reaches running via a truthful, non-silent progression
  - uninstall makes the entry DISAPPEAR from server.get-state package-data
    (the literal My Apps map) — no ghost, no stuck uninstall stage
  - container + (on-node) data dir are gone
  - reinstall returns to running
  - node left as found

Opt-in via ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1; not yet folded into the canonical
gate. Verified 7/7 against .228.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 05:13:53 -04:00
archipelago
d1cd42c821 fix(orchestrator): stop retrying unrepairable volume chowns every reconcile
ensure_running_container_ownership re-probed and re-attempted the in-container
chown on every reconcile pass. For a mount that can't be re-owned from inside the
userns (observed: mempool-api /data -> 'Operation not permitted'), this burned
CPU and logged a WARN on every pass, forever (~6x/30min on .228/.116).

Remember hard chown failures in a process-lifetime set keyed by (container-id,
dest) and skip the probe+chown for known-unrepairable mounts. Keyed by Id (not
name) so a recreated container gets a fresh repair attempt. Verified on .116:
one recorded failure at startup, then silent across subsequent reconciles.
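
One way to picture the remembered-failure set, as a sketch only: the type and method names are hypothetical, and the real code may hold the set behind a lock or a static rather than in a plain struct.

```rust
use std::collections::HashSet;

/// Hypothetical process-lifetime memo of mounts whose chown failed hard,
/// keyed by (container id, mount destination).
#[derive(Default)]
struct ChownFailures {
    seen: HashSet<(String, String)>,
}

impl ChownFailures {
    /// True when the probe+chown should still be attempted for this mount.
    fn should_attempt(&self, container_id: &str, dest: &str) -> bool {
        !self
            .seen
            .contains(&(container_id.to_string(), dest.to_string()))
    }

    /// Remember a hard failure so later reconcile passes skip this mount.
    fn record_failure(&mut self, container_id: &str, dest: &str) {
        self.seen
            .insert((container_id.to_string(), dest.to_string()));
    }
}
```

Because the key uses the container Id, a recreated container (new Id, same name and mount) is not in the set and gets a fresh attempt.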

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 04:58:57 -04:00
archipelago
3e3016f2bd fix(ui): debounce connection-lost banner so transient ws blips don't flash
The reconnect banner showed 'Connection lost'/'Reconnecting' instantly on every
socket close, even ones that recover in 100ms-2s (load spikes, Tailscale/relay
TCP resets). On a healthy node the drops are brief and self-healing, but each one
flashed a jarring banner, reading as constant instability.

Debounce the transient banner by 2.5s: only surface after the connection issue
persists past the grace window; hide immediately on recovery. Deliberate server
lifecycle transitions (restart/shutdown) bypass the debounce and still show at
once. A genuine persistent outage keeps isOffline true and surfaces after 2.5s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 04:58:54 -04:00
archipelago
7d89b4d8b2 chore(registry): publish embedded app-catalog.json (52 manifests) for fleet fetch
Force-add the gitignored releases/app-catalog.json so nodes resolve
146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/app-catalog.json
(currently HTTP 404 → disk-manifest fallback). Embedded-manifest delivery
is default-on; origin-wins overlay with disk as fallback. Unsigned (migration
window accepts unsigned). Includes netbird x3 manifests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 23:45:31 -04:00
archipelago
15f65428b8 docs(master-plan): §8b — uninstall fix deployed+live-verifying, #15 guardian resolved
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 18:07:41 -04:00
archipelago
36015a19fe docs(master-plan): §8b session-b state — connection-lost+netbird+UX-merge shipped to .228, uninstall ghost fix, workstream F in progress
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 15:26:17 -04:00
archipelago
e57514b690 fix(uninstall): never ghost a removed app in My Apps on cleanup residue
handle_package_uninstall lumped every teardown failure into one `errors` vec
and returned Err on any of them BEFORE removing the package state entry — so a
non-fatal cleanup hiccup (a slow/failed `sudo rm -rf` of a large data dir, a
volume/network removal) left the app's containers gone but its entry in
package_data → a ghost in My Apps, and the spawned task reverted it to Installed.

Split the failures: container removal that even force-rm can't complete (app
genuinely still present) keeps the entry + returns Err; everything after the
containers are gone is best-effort. Remove the state entry as soon as the
containers are gone — BEFORE the slow volume/data teardown — so My Apps updates
immediately and residue can never ghost the app. set_uninstall_stage is a no-op
once the entry is gone (if-let guard), so the later stages don't re-create it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 15:23:16 -04:00
archipelago
4346007d37 fix(orchestrator): only TCP host ports get reachability-probed
wait_for_manifest_host_ports TCP-connect-probed every published port, including
UDP/SCTP. netbird's 3478/udp STUN can never answer a TCP connect, so the probe
failed forever and drove an endless host-port repair/reconcile loop on .228
(netbird-server restarting ~every 60s). Filter to tcp (empty protocol = tcp).
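
The filter amounts to something like the sketch below. `PublishedPort` and `tcp_probe_targets` are hypothetical stand-ins for the manifest port type and the filtering step; the case-insensitive comparison is an assumption.

```rust
/// Hypothetical stand-in for a published port entry in a manifest.
struct PublishedPort {
    host: u16,
    /// "tcp", "udp", "sctp", or empty (which means tcp).
    protocol: String,
}

/// Keep only the host ports that a TCP connect probe can meaningfully test.
fn tcp_probe_targets(ports: &[PublishedPort]) -> Vec<u16> {
    ports
        .iter()
        .filter(|p| p.protocol.is_empty() || p.protocol.eq_ignore_ascii_case("tcp"))
        .map(|p| p.host)
        .collect()
}
```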

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 14:40:48 -04:00
archipelago
44f7af2017 merge: companion-mobile-ux UX (loader/store-driven launch/icons + android webview) into main
# Conflicts:
#	Android/app/build.gradle.kts
#	Android/app/src/main/java/com/archipelago/app/ui/screens/WebViewScreen.kt
#	neode-ui/src/views/apps/appsConfig.ts
2026-06-23 14:07:44 -04:00
archipelago
9670af62b6 feat(registry): deliver app manifests via the signed catalog (embed by default)
Turn on registry-distributed manifests for all apps: generate-app-catalog.sh now
embeds each apps/<id>/manifest.yml by default (EMBED_MANIFESTS opt-out), so nodes
install from the signed catalog (origin-wins overlay, disk = fallback) with no
OTA-shipped disk manifest. main.rs awaits a bounded (25s) refresh_catalog before
load_manifests so a fresh boot overlays the latest embedded catalog instead of a
restart later; offline/ISO boot falls through to disk and never hangs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:54 -04:00
archipelago
a8b9b0f5e8 feat(netbird): manifest-driven migration via reusable orchestrator primitives
Migrate the netbird stack (server/dashboard/proxy) off ~500 lines of per-app Rust
to 3 declarative manifests, adding 4 reusable primitives:
- SecretGenKind::Base64 (netbird relay authSecret + sqlite store encryptionKey)
- GeneratedCert schema + ensure_manifest_certs (self-signed TLS so the dashboard
  gets a secure context for OIDC PKCE — issue #15; https proxy on 8087 preserved)
- templated GeneratedFile render: {{HOST_IP}}/{{HOST_MDNS}}/{{NETWORK_GATEWAY}}
  (aardvark resolver for the #15 stale-IP fix) /{{secret:NAME}} (never logged)
- legacy create_container now honours port.protocol (3478/udp STUN)
install_netbird_stack routes via the orchestrator first (legacy kept as fallback,
mirroring indeedhub); launch URL derives https://{host_ip}:8087 from host facts.
Legacy Rust deletion deferred to post-live-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:53 -04:00
archipelago
3c36cf1c40 fix(companion): stop image_exists journal flood that drops the UI websocket
image_exists ran `podman image inspect <image>` via .status() (inherits the
service stdout) with no --format, so every hit dumped the image's full ~249-line
manifest JSON into the journal — once per companion image, every reconcile pass
(.228: 21.6k journal lines / 10 min, 4131 inspect dumps). The service never
crashed (NRestarts=0); the sustained journald/IO flood starved the async runtime
and dropped the UI /ws/db websocket -> constant "connection lost"/reconnect.
Discard the child's stdout/stderr; only the exit status is used.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:39:19 -04:00
archipelago
c4cd5fdc90 docs(master-plan): §8b resume — gate green + 6-node deploy + APK fix + workstream F
Comprehensive resume for the session restart: single-node gate green
(5/5 .228), latest backend + UX + one-tap companion APK deployed to 6
nodes (table w/ creds + pending 100.64.83.15 cred), workstream-F bugs
from manual testing, agreed next order (netbird → Phase-3 → F →
multinode), and loose ends (untracked AppLoadingScreen.vue, broken
gitea-local mirror, don't-delete-bitcoin-data directive).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:56:54 -04:00
archipelago
ccb594fb85 test(gate): fix bitcoin-knots getinfo-after-restart helper + IBD note
It called bats-assert's `fail` (not loaded in this file) → "fail:
command not found"/127, masking the real reason. Emit+return instead,
bump the cold-restart RPC window 60s→120s (block-index reload), and
note a node mid-IBD legitimately can't serve getinfo (environmental
precondition, not a product regression).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:28:20 -04:00
archipelago
deff380191 docs(master-plan): workstream F (lifecycle perfection) + §10 state-mgmt backlog
The 2026-06-23 5×-green gate is DESTRUCTIVE-tier / ~8 core apps only —
it skips uninstall/reinstall (cascade) and has no progress-UI or
all-apps coverage. Manual multinode testing found real bugs it never
ran (immich+grafana uninstall hangs at full-red bar + ghost in My Apps;
grafana reinstall stops; fedimint guardian "waiting for bitcoin sync").
Adds §4 row F, §6b post-deploy order (netbird→Phase-3→F), §6c scope +
observed bugs + definition-of-done, a §5 warning, and §10 backlog to
investigate TanStack-Query/push-based state management for neode-ui.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 06:28:19 -04:00
Dorian
5c43e12782 chore(android): publish companion as raw APK instead of zip
Serve the companion download as a plain .apk so a phone installs it
straight from the link/QR with no unzip step. Repoint the in-app
download URL, the ship + publish scripts, and the pre-push hook at
archipelago-companion.apk, and drop the legacy .apk.zip.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 09:41:10 +01:00
Dorian
e825bbed73 feat(android): file upload/download + in-app tab redesign
Companion WebView now supports file inputs and downloads, and apps
opened in the in-app tab get a proper loading splash and a footer
control bar matching the web app-session bar.

- onShowFileChooser wired to an ActivityResultLauncher so <input
  type=file> opens the system file browser (kiosk + in-app tab)
- DownloadListener: http(s) via DownloadManager (forwarding session
  cookies), blob: via JS->base64->MediaStore, data: decoded inline
- in-app tab: app-icon + progress loading splash (eager favicon
  fetch, upgraded via onReceivedIcon)
- footer controls (back/forward/refresh/open/close) matched to the
  web AppSession mobile bar, with the same SVG glyphs as drawables
- bump to 0.4.8 (versionCode 12)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 09:41:10 +01:00
archipelago
0dd19f0721 docs(CLAUDE.md): single-node gate GREEN — demote priority banner
run-gate.sh 5/5 on .228. Reframe the TOP PRIORITY banner as
gate-green; keep the master plan as north-star source of truth; mark
the gate definition-of-done green and point at multinode as the next
exit criterion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:35:50 -04:00
archipelago
ae47897601 docs: single-node production gate GREEN (5/5 on .228) — demote banner
run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:27:36 -04:00
archipelago
256d354048 docs(master-plan): tick off §8 P1 mobile app-launch UX (code-complete)
Mobile launch UX is code-complete on branch `companion-mobile-ux` (store-driven
panel, no interstitial, in-app WebView footer + loader, mesh 100dvh, ElectrumX
icon, companion v0.4.7 + shared debug keystore). Marked code-complete pending
on-device/mobile-web verification and merge to main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 04:11:25 -04:00
archipelago
2a249b8a48 feat(android): companion in-app WebView footer controls + loader; shared debug key; v0.4.7
- InAppBrowser now has a bottom control bar (back/forward/reload/open-in-browser/
  close) mirroring the web mobile footer, plus a centered loading screen
  (app favicon + progress bar) instead of a bare top bar over black.
- Commit a repo-dedicated debug keystore and pin signingConfigs.debug to it so
  every machine — and the published companion download — signs debug builds with
  the SAME key (fixes "App not installed" signature-mismatch on update). Force v1+v2.
- Bump versionCode 10→11, versionName 0.4.6→0.4.7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:48:58 -04:00
archipelago
a7c7c44843 feat(neode-ui): mobile app-launch UX — store-driven panel, loader, ElectrumX icon
- Mobile launches use the store-driven panel (no route push) so the background
  tab no longer changes and closing returns to where you launched from.
- Tab-only apps open directly (in-app WebView on companion / new tab on PWA) —
  no "this app opens in a tab" interstitial.
- Shared AppLoadingScreen (app icon + progress bar) on the app session and the
  legacy iframe overlay instead of a black screen.
- Pin the dashboard to 100dvh on mobile so the mesh chat/tools panes stop sliding
  under the bottom tab bar in mobile browsers (no-op in the companion WebView).
- ElectrumX/electrs/electrs-ui ids now resolve to the real ElectrumX icon in My Apps.
- isMobile made reactive so overlay/footer/teleport decisions track the viewport.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:48:57 -04:00
archipelago
2afd18c6de test(gate): poll immich lan_address to absorb mid-recreate churn
5× run #4 flaked iter4 on "immich exposes its web UI lan-address
(port 2283)": container-list returned lan_address=null because
immich_server was momentarily mid-recreate when the read-only tier
queried it (passed the other 4 iterations; immich_server does publish
0.0.0.0:2283->2283). Same single-shot-read class as the bitcoin-knots
state probe — poll <=30s for the exposed port instead of one read. A
genuinely unexposed immich never publishes 2283, so real port drift
is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 03:20:18 -04:00
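
The "poll instead of one read" change described above follows a general pattern that might look like the sketch below. `poll_until` is a hypothetical name, not a helper confirmed to exist in the gate suite; the real bats code may be structured differently.

```bash
# poll_until TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Re-runs CMD until it succeeds or TIMEOUT_SECS of sleep time has passed.
# INTERVAL_SECS should be >= 1; time spent inside CMD is not counted.
poll_until() {
  local timeout interval elapsed
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "poll_until: gave up after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A probe that is only transiently false (a container mid-recreate) passes once it settles, while a condition that never becomes true still fails at the deadline, so real drift is still caught.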
archipelago
6511754545 docs: master-plan §8b — 5× triage, mempool restart bug fixed
Record the overnight 5× outcome (2/5) and the triage: all three
fails were distinct one-offs. iter1 #5 bitcoin-knots = pre-launch
churn (hardened anyway); iter2 #74 + iter5 #73 = one real
orchestrator bug (phantom stack-member injection in
ordered_containers_for_start), now fixed + live-verified on .228.
Update the resume check command to gate-5x4.log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 02:23:07 -04:00
archipelago
92d7f52dd6 fix(orchestrator): order only live containers on package start/restart
package.restart resolved its container list via
ordered_containers_for_start, which injected every name from the
union startup_order list that wasn't already present — including
variant names not live on a given node (mysql-mempool,
archy-mempool-api, archy-mempool-web). The phantom mysql-mempool is
2nd in the mempool start order, so do_orchestrator_package_start hit
its unknown-app-id fallback, do_package_start failed the inspect
("no such object"), and the `?` aborted the whole start sequence —
leaving mempool-api + the frontend down until the health monitor
recovered them minutes later. That was the source of the 5× gate
flakes #73 (frontend not running in 180s) and #74 (api not queryable
in 300s); root-caused from the .228 journal
("Start failed: mysql-mempool").

Replace the inject-then-sort logic with a pure helper
order_present_containers that orders only the actually-present
containers and never adds phantom entries. startup_order remains a
union of name variants across install generations — it's now used
purely to order what's live, not to inject what isn't. +3 unit tests.

Also harden bitcoin-knots.bats "valid state" probe: poll ≤30s for a
settled state instead of a single-shot read, so a container caught
mid-reconcile (transient restarting/configured) can't flake a 20-min
iteration. A genuinely-stuck container never settles, so real
breakage is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 02:22:50 -04:00
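
The ordering rule described in the commit above (order what is live, never inject what is not) can be illustrated with a small shell sketch. The real `order_present_containers` helper lives in the orchestrator and is not shown in this diff; the version below is an assumption-laden illustration. In particular, appending present containers that the order list does not mention is a guess, and `mempool-db` is a made-up name.

```bash
# order_present_containers "STARTUP_ORDER" "PRESENT"
# Both arguments are space-separated container names.
# Prints the present containers sorted by startup order; names that appear
# only in the startup order (phantoms) are never emitted.
order_present_containers() {
  local order present name out
  order=$1
  present=$2
  out=""
  # Pass 1: walk the startup order, keeping only names that are present.
  for name in $order; do
    case " $present " in
      *" $name "*) out="$out $name" ;;
    esac
  done
  # Pass 2 (assumption): keep present names the order list does not know.
  for name in $present; do
    case " $order " in
      *" $name "*) ;;
      *) out="$out $name" ;;
    esac
  done
  echo "${out# }"
}

# Illustrative union order containing variants that are not live on this node.
order_present_containers \
  "mempool-db mysql-mempool mempool-api archy-mempool-api mempool-frontend archy-mempool-web" \
  "mempool-frontend mempool-api mempool-db"
```

With this shape, a variant name such as `mysql-mempool` can sit in the union order without ever reaching the start sequence on a node where it is not installed.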
archipelago
57a013bc66 test(gate): make 5× the canonical gate, drop 20x naming
Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub
20× references across CLAUDE.md, the master plan, TESTING.md, app-registry
status, the orchestrator/config doc-comments, and the bats suites. Also add
a minimal fail() helper to mempool.bats so guard failures report cleanly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:12:41 -04:00
archipelago
0f05f73a23 fix(mempool): self-healing nginx backend proxy (v3.0.1) + gate timeout
The frontend nginx used a literal proxy_pass host with no resolver, so it
pinned mempool-api's IP at worker startup. When the backend restarts (gate,
OTA, crash, reboot re-IPAM) podman reassigns its IP and nginx keeps proxying
to the dead one -> /api hangs, websocket 502s, UI shows 'offline' until a
manual nginx reload. Same stale-upstream-IP class as the netbird 502.

Fix: mempool-frontend:v3.0.1 rewrites the generated nginx-mempool.conf to
re-resolve the backend per-request via 'resolver' + a variable proxy_pass.
Resolver address is read from /etc/resolv.conf (podman aardvark-dns answers
on the network gateway, not Docker's 127.0.0.11). Per-location path mapping
preserved (ws -> '/', /api/v1 identity via no-URI, /api/ -> /api/v1/ rewrite).
Proven on .228: backend IP change now auto-recovers with no reload; the
literal-host control still 502s. Migrated the manifest off the retired
tx1138 registry to vps2.

Also: mempool.bats #74 waited only 180s post-restart (the slow path) and
called an undefined 'fail' helper (status 127). Bumped to 300s to match the
passing parity probes and emit a real failure instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 18:07:07 -04:00
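
The per-request re-resolution described above relies on two nginx directives, `resolver` and a `proxy_pass` whose host is a variable. The sketch below shows how a script might read the resolver address from `/etc/resolv.conf` and render one such location block. The function names, the `valid=10s` TTL, and port `8999` are assumptions for illustration; the actual v3.0.1 rewrite is not shown in this diff.

```bash
# Print the first nameserver from a resolv.conf-style file.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

# Render a location block that re-resolves the backend on each request.
# With a variable in proxy_pass and no URI part, nginx forwards the
# original request URI unchanged.
render_api_location() {
  local ns
  ns=$(first_nameserver "$1") || return 1
  if [ -z "$ns" ]; then
    echo "render_api_location: no nameserver found" >&2
    return 1
  fi
  cat <<EOF
location /api/v1 {
    resolver ${ns} valid=10s;
    set \$backend "mempool-api";
    proxy_pass http://\$backend:8999;
}
EOF
}

render_api_location /etc/resolv.conf
```

An IPv6 nameserver would need to be wrapped in square brackets before being passed to `resolver`; the sketch does not handle that case.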
archipelago
c8acc84506 docs: §2 invariant single-node (.228); multinode → separate plan
2026-06-22 17:23:19 -04:00
archipelago
8355453a7e docs: exact cutoff-proof resume in master-plan §8b (resume from any device)
Captures: .228 1x-GREEN (110/110); hardened 5x DETACHED on .228 (/tmp/gate-5x2.log,
nohup — survives terminal close) with the exact check-from-any-machine command; all
shipped code fixes (commits) + deploy state (.228 + .198); node-state fixes NOT in
repo (lnd nginx proxy 8081->18083, home-assistant orphan unit removed, electrumx
re-registered); the run-ON-the-node lesson; and remaining work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:22:29 -04:00
archipelago
98f4fa44a8 test(gate): harden readiness for sustained 5x churn + inter-iteration settle
The 1x gate is green; the 5x failed iters 1-2 on readiness-under-churn (apps DO
recover — lnd synced, mempool just mid-restart when probed — but slower than the
windows when restarted back-to-back). Hardening:
- run-20x.sh: best-effort settle_stack() before each iteration (wait for
  mempool-api/frontend + lnd RPC healthy, 180s, on-node, never fails the run).
- required containers present/running (80/81): wait-loops (180s) not single-shot.
- mempool api/frontend (87/88): retry ~180s not single-shot.
- mempool queryable (74): 60s->180s. lnd restart-running (64): 120s->240s.
  lnd getinfo (60): 90s->240s retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 17:11:15 -04:00
archipelago
22b05de6d9 docs(roadmap): P1 mobile app-launch UX — drop 'opens in a tab' interstitial
Companion app: open every app in the in-app WebView (not just non-iframeable),
carrying the mobile-iframe footer controls into the WebView. Mobile web (PWA):
open tab-apps directly in a new tab. No interstitial on either surface. Touch
points + prior commits (b5a9deb8, d1fbcd9b) noted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 16:57:44 -04:00
archipelago
27299ea687 docs: make the production test gate a SINGLE-NODE (.228) criterion; split out multinode
Per direction: the gate is now 5x green ON .228 only (run on the node, not via RPC).
Fleet/multinode verification (.198 + others) moved to a new docs/multinode-testing-plan.md
with the bootstrap recipe, per-node preconditions (synced archival bitcoin, no stale
nginx proxy targets, no orphan quadlet units), node roster, and cross-node suites.
Updated CLAUDE.md, master-plan §5/§6/§8b/WS-E, and TESTING.md release gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 16:47:34 -04:00
archipelago
892ff083c4 test(gate): fix the last 4 readiness/config false-fails (none are product bugs)
On a proper on-node .228 run (synced bitcoin, 4-fix binary) the lifecycle matrix is
green; these 4 were test-harness issues:
- lnd 'recovers after restart' (65): bump retry window 90s->240s. lnd cold-restart
  recovery (wallet unlock + bitcoind reconnect + graph sync) exceeds 90s on a loaded
  node but DOES complete (synced_to_chain:true).
- bitcoin ui responds (89): retry ~120s instead of single-shot (companion nginx may
  have just been recreated by the companion-survives test).
- probe_app_url (99 lnd proxy + all ui-coverage proxy probes): retry up to 90s for
  post-restart proxy/UI readiness instead of single-shot.
- required endpoints after restart (94): :8081 is nginx-proxy-manager, an OPTIONAL
  app (not in required_containers) — only assert it when NPM is installed; and make
  the trailing lncli getinfo a retry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 15:43:51 -04:00
archipelago
8893055810 test(gate): retry lnd getinfo for RPC readiness (wallet-unlock lags 'running')
lnd's RPC isn't ready until its wallet auto-unlocks on (re)start, which lags the
container 'running' state — single-shot lncli getinfo raced that window and
false-failed (gate tests 60 + 85). Retry up to ~90s like a health probe. lnd is
functional (getinfo returns cleanly once ready).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 14:45:36 -04:00
archipelago
53b8e47f1d test(gate): fix two false-failing lifecycle tests (not product bugs)
- immich restart: bump wait 120s->240s. Restart = ordered stop+start of the 3-
  container stack (postgres->redis->server w/ DB migrations), so it needs at least
  as long as the start test (180s) — the old 120s was inconsistent and false-failed
  on loaded nodes. immich does return to running.
- fedimint orphan check: the unanchored 'total' regex (^fedimint) counts the
  legitimate fedimint-clientd (dual-ecash bridge) but the anchored 'known' regex
  omitted it -> total>known false orphan on every node running fedimint-clientd.
  Add fedimint-clientd to known.

Both run as LOCAL podman/systemctl on the gate runner, so they test the runner node
(.116), not the RPC target — surfaced while driving the .228 gate green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 14:11:35 -04:00
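
The fedimint false orphan above comes from comparing an unanchored count with an anchored one. A minimal reproduction, assuming simplified patterns (the suite's real regexes are not shown in this diff) and only the two container names the commit mentions:

```bash
# Count lines of stdin matching an extended regex.
count_matching() {
  grep -Ec "$1"
}

names='fedimint
fedimint-clientd'

# Unanchored prefix match: counts every name starting with "fedimint".
total=$(printf '%s\n' "$names" | count_matching '^fedimint')
# Anchored allow-list without fedimint-clientd (the old behaviour).
known_old=$(printf '%s\n' "$names" | count_matching '^(fedimint)$')
# Anchored allow-list with fedimint-clientd added (the fix).
known_new=$(printf '%s\n' "$names" | count_matching '^(fedimint|fedimint-clientd)$')

echo "total=$total known_old=$known_old known_new=$known_new"
```

With the old allow-list, `total` exceeds `known` and the check reports an orphan; once `fedimint-clientd` is listed, the two counts agree.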
archipelago
f4727bfdb3 docs(gate): companion self-heal fix validated (10s) + test-31 harness caveat
Independent companion loop (452f05d8) validated on .228: deleted archy-electrs-ui
recreates in ~10s (was stuck 100s+). Also: companion-survives bats does LOCAL
rm/systemctl --user, so running it from .116 via RPC tests .116's companions with
.116's binary, NOT the remote target — must run ON the target node. Explains the
'failed on both nodes' runs (both silently tested .116).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 13:44:57 -04:00
archipelago
452f05d849 fix(reconciler): decouple companion self-heal onto its own cadence
The companion-unit repair stage ran at the END of each boot-reconciler tick, after
reconcile_existing(). On a heavily loaded node that per-app pass takes >60-90s, so a
deleted/lost companion unit (electrs-ui, bitcoin-ui, …) wasn't repaired within any
reasonable window (gate test 31 'deleted unit recreated within one reconcile tick'
timed out at 90s on the 45-app .228 node). Detecting + rewriting a companion unit is
cheap, so spawn it as its own ~interval(30s) loop, independent of the slow app pass.
Handle is aborted when the main loop exits (shutdown uses notify_one, so a second
waiter would steal the wake permit). tick() is now app-reconcile only.

All 4 boot_reconciler cadence tests still green (companion_stage=false in tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 13:04:28 -04:00
archipelago
de7d3d83dc docs(gate): final read — every failure fixed/explained, no lifecycle bugs remain
Last 2 .228 stragglers confirmed load/timing, not bugs: test 31 (companion recreate)
= contamination + ~108s reconcile cadence > 90s window; test 55 (immich restart) =
heavy stack restarts >120s under load but DOES return. Path to literally-green gate
is infra (bitcoin sync, re-quadletize .228) + minor test-window tuning. Optional
product improvement noted: independent ~30s companion-reconcile cadence.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 12:36:03 -04:00
archipelago
76b23adcc0 docs(gate): test 31 root-caused = .228 contamination (not a product bug)
companion::reconcile only recreates a deleted companion unit when its parent
backend is in manifest_ids. On contaminated .228, electrumx ran as plain podman
and was NOT a tracked manifest install (manifest on disk but unloaded), so the
reconciler never iterated it -> archy-electrs-ui companion orphaned. Proven:
package.install electrumx re-registered it + restored the companion. Self-heal
logic is sound; test 31 clears on re-quadletize. electrumx on .228 de-contaminated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:34:55 -04:00
archipelago
47a5148865 docs(gate): two-node result — stop blocker FIXED; residual red is bitcoin-IBD + node prep
.228 104/110, .198 94/110 with the 3-fix binary. Every package.stop test passes on
healthy apps. .198's 14/16 failures trace to bitcoin in IBD (test 83: ~137k blocks
behind) cascading to lnd/btcpay/electrumx/mempool. 2 node-independent: companion
recreate (31, both nodes), fedimint orphan pollution (44). Path to green 5x gate is
now infra (sync bitcoin, re-quadletize .228) + minor (test 31), not lifecycle bugs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:09:12 -04:00
archipelago
b090235b04 docs(gate): 3 stop bugs FIXED, electrumx suite GREEN on .228
Stop failure was 3 real product bugs (grace / reconcile-resurrection /
container-list user-stopped state), all fixed (2dad64b2, 760a32bc, 6e49ce6f) +
deployed. electrumx lifecycle suite 10/10 green (66s). fedimint 'crash loop' was
probe-induced churn (stable when left alone). Validating breadth next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:49:45 -04:00
archipelago
6e49ce6f88 fix(container-list): report user-stopped apps as stopped despite live UI companion
A user-stopped backend (electrumx, bitcoin, lnd, fedimint) kept reading 'running'
in container-list because its UI companion (electrs-ui, …) still serves the launch
port, and the state-refresh upgrades any reachable launch port to 'running'. The
gate's wait_for_container_status <app> stopped therefore never saw 'stopped'.

Fix: load the user_stopped marker in handle_container_list and force 'stopped' for
those apps before the launch-port refresh. The reconcile guard keeps the backend
down, so the marker is authoritative. package.start clears it first, so a started
app reports 'running' normally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:26:30 -04:00
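
The state override described above can be sketched as a marker lookup that runs before any port-based upgrade. Everything here is illustrative: the marker directory, the one-file-per-app layout, and the function names are assumptions, not the backend's real storage format.

```bash
# Hypothetical marker location: one empty file per user-stopped app.
MARKER_DIR=${MARKER_DIR:-${TMPDIR:-/tmp}/user-stopped-demo}

is_user_stopped() {
  [ -e "$MARKER_DIR/$1" ]
}

# reported_state APP PROBED_STATE
# PROBED_STATE is what the launch-port refresh would otherwise report.
reported_state() {
  if is_user_stopped "$1"; then
    # The marker is authoritative: a live UI companion must not
    # upgrade a user-stopped backend to "running".
    echo stopped
  else
    echo "$2"
  fi
}
```

For example, `reported_state electrumx running` prints `stopped` while the marker file exists, and `running` again once an explicit start has removed it.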
archipelago
760a32bccf fix(reconcile): keep user-stopped apps stopped (reconciler was resurrecting them)
package.stop a dependency (e.g. electrumx, a mempool dep) and the reconciler
restarts it within ~8s: the reconcile filter's dependency_required override
re-includes a user-stopped app that an active app depends on, and the in-memory
disabled set is wiped on manifest reload — so ensure_running runs, the stopped
app's unreachable ports look like a fault, the host-port repair restarts it, and
package.stop never sticks (gate 'transitions to stopped' times out).

Fix: guard ensure_running_with_mode on the on-disk user_stopped marker (the single
choke point every reconcile flows through) → Left('user-stopped'). Explicit
install/start clear the marker first (added clear_user_stopped to orchestrator
install/start, symmetric with disabled.remove; start/restart RPC already cleared
it) so user actions are unaffected. The container itself already stopped correctly
— this stops the resurrection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:04:02 -04:00
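
The guard-at-the-choke-point idea above can be shown with a toy model in shell. It is a sketch under stated assumptions: the marker directory and the `reconcile_ensure_running` and `explicit_start` names are invented for the example, and the real logic lives in the orchestrator's `ensure_running_with_mode`.

```bash
# Hypothetical on-disk marker directory, one empty file per stopped app.
STOP_DIR=${STOP_DIR:-${TMPDIR:-/tmp}/user-stopped-guard-demo}
mkdir -p "$STOP_DIR"

user_stop() { : > "$STOP_DIR/$1"; }

# Every reconcile path funnels through here, so one check is enough.
reconcile_ensure_running() {
  if [ -e "$STOP_DIR/$1" ]; then
    echo "left: user-stopped"
    return 0
  fi
  echo "started: $1"
}

# Explicit user actions clear the marker first, so they are unaffected.
explicit_start() {
  rm -f "$STOP_DIR/$1"
  reconcile_ensure_running "$1"
}
```

Because the marker is read from disk on every call, it survives the in-memory state being rebuilt, which is the failure mode the commit describes for the in-memory disabled set.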
121 changed files with 9504 additions and 2777 deletions

@ -7,14 +7,6 @@
# Allow demo assets (AIUI pre-built dist)
!demo/
# Allow the Bitcoin UI + ElectrumX UI mock shells (served from /docker/*)
!docker/
docker/*
!docker/bitcoin-ui/
!docker/electrs-ui/
!docker/lnd-ui/
!docker/fedimint-ui/
# Allow backend source for ISO source builds
!core/
!scripts/

@ -2,7 +2,7 @@
# Keep the served companion APK in sync with main on every push.
#
# When a push to main includes Android changes, rebuild the APK, refresh
-# neode-ui/public/packages/archipelago-companion.apk.zip, commit it, and ask
+# neode-ui/public/packages/archipelago-companion.apk, commit it, and ask
# you to push again (so the refreshed APK rides along in the same push).
#
# Enable once per clone: git config core.hooksPath .githooks
@ -40,7 +40,7 @@ fi
bash scripts/publish-companion-apk.sh || exit 0
-DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
+DEST="neode-ui/public/packages/archipelago-companion.apk"
if git diff --cached --quiet -- "$DEST"; then
exit 0 # APK unchanged — nothing to do
fi

@ -1,67 +0,0 @@
name: Demo images
# Builds and pushes the public-demo images on every change to the UI / mock
# backend, so the separated `archy-demo` Portainer stack auto-tracks the real
# code (see demo-deploy/ and docs/demo-deployment-design.md).
#
# Required repo configuration:
# vars.DEMO_REGISTRY e.g. 146.59.87.168:3000/lfg2025
# secrets.DEMO_REGISTRY_USER
# secrets.DEMO_REGISTRY_TOKEN
# Optional:
# secrets.PORTAINER_WEBHOOK redeploy hook called after a successful push
on:
push:
branches: [main]
paths:
- 'neode-ui/**'
- 'docker-compose.demo.yml'
- '.github/workflows/demo-images.yml'
workflow_dispatch:
jobs:
build:
name: Build & push demo images
runs-on: ubuntu-latest
# Skip cleanly on forks / before registry config is set.
if: ${{ vars.DEMO_REGISTRY != '' }}
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
- name: Log in to registry
uses: docker/login-action@v3
with:
registry: ${{ vars.DEMO_REGISTRY_HOST || vars.DEMO_REGISTRY }}
username: ${{ secrets.DEMO_REGISTRY_USER }}
password: ${{ secrets.DEMO_REGISTRY_TOKEN }}
- name: Build & push backend
uses: docker/build-push-action@v6
with:
context: .
file: neode-ui/Dockerfile.backend
push: true
tags: |
${{ vars.DEMO_REGISTRY }}/archy-demo-backend:demo
${{ vars.DEMO_REGISTRY }}/archy-demo-backend:${{ github.sha }}
- name: Build & push web
uses: docker/build-push-action@v6
with:
context: .
file: neode-ui/Dockerfile.web
push: true
build-args: |
VITE_DEMO=1
tags: |
${{ vars.DEMO_REGISTRY }}/archy-demo-web:demo
${{ vars.DEMO_REGISTRY }}/archy-demo-web:${{ github.sha }}
- name: Trigger Portainer redeploy
if: ${{ success() && secrets.PORTAINER_WEBHOOK != '' }}
run: curl -fsS -X POST "${{ secrets.PORTAINER_WEBHOOK }}"

Android/.gitignore (vendored, 5 lines changed)

@ -14,3 +14,8 @@ local.properties
*.aab
*.jks
*.keystore
# Exception: the repo-dedicated *debug* keystore is committed on purpose so every
# machine (and the published companion download) signs debug builds identically —
# updates then install over the top without an uninstall. Debug keys are not
# secret (well-known password "android"); never commit a real release keystore.
!/app/debug.keystore

@ -0,0 +1,94 @@
# Companion App — Build, Ship & "App Not Installed" Runbook
Canonical procedure for releasing the Archipelago Companion Android app and for
debugging install failures. Read this before touching the companion release flow.
Hard lessons from 2026-06-26 are baked in below — don't relearn them.
## Ship the companion (the only sanctioned way)
```bash
./Android/ship-companion.sh
```
This calls `scripts/publish-companion-apk.sh` (the single source of truth, also
used by the `.githooks/pre-push` hook), which:
1. **Removes/rejects resource dirs whose names contain spaces.** Empty stray
`mipmap-* NNN` dirs (left by icon-export tools) break a *clean* build with
`Invalid resource directory name`. Incremental builds hide them — clean builds
don't.
2. **Always does a CLEAN build** (`:app:clean :app:assembleDebug`).
3. **Forces v1 + v2 + v3 signing** via `zipalign` + `apksigner`.
4. **Verifies all three schemes** (`apksigner verify --min-sdk-version 21`) and
**aborts** if any is missing.
5. Stages the signed APK at `neode-ui/public/packages/archipelago-companion.apk`,
commits, and pushes with `SHIP_COMPANION=1` (the sanctioned pre-push bypass).
**Never** hand-roll `gradlew assembleDebug` + `cp` to the served path. That path
skips the clean build and the signature enforcement and is exactly how a broken
APK shipped.
### Bump the version first
Edit `Android/app/build.gradle.kts`: `versionCode` (must strictly increase) and
`versionName`. The committed value can drift AHEAD of what's actually built into
the served APK, so verify the served APK's real version after shipping:
`aapt2 dump badging neode-ui/public/packages/archipelago-companion.apk | grep version`.
## Signing facts (important)
- Debug builds are signed with the **committed** `Android/app/debug.keystore`
(store/key pass `android`, alias `androiddebugkey`) so every machine and the
served download share ONE signing key. Cert SHA-256: `D6:22:E0:7E:…:66:4D`.
- **AGP silently ignores `enableV1Signing = true` for `minSdk ≥ 24`**, so a plain
gradle build produces a **v2-only** APK. The `apksigner` step in the publish
script is what actually guarantees v1+v2+v3 — do not remove it.
- **Changing the signing key forces every existing install to be uninstalled
once.** Android blocks in-place upgrades across different signatures. Treat the
keystore as permanent; never regenerate it casually.
## Debugging "App Not Installed" — DIAGNOSE FIRST
Do **not** theorize about signing schemes / OEM quirks. Get the real reason:
```bash
adb install ~/Desktop/archipelago-companion-<ver>.apk
# -> Failure [INSTALL_FAILED_<REASON>: ...]
```
Map the reason:
| `INSTALL_FAILED_*` | Cause | Fix |
|---|---|---|
| `UPDATE_INCOMPATIBLE … signatures do not match` | Old install signed with a **different key** (e.g. pre-shared-keystore per-machine key `58:31:12…`). | Uninstall the old package, then install. **One-time** per device after a key change. |
| `INVALID_APK` / parse error | Corrupt/incomplete download or bad signing. | Re-download; re-run the publish script. |
| `INSUFFICIENT_STORAGE` | Storage. | Free space. |
| `OLDER_SDK` | Device below `minSdk` (26 = Android 8.0). | Unsupported device. |
> A manual uninstall on the phone may NOT clear `UPDATE_INCOMPATIBLE` if the
> package is registered under another user/profile — `pm path <pkg>` under user 0
> can show nothing while the conflict persists. `adb uninstall <pkg>` clears it
> across all users.
## Phone / adb safety (non-negotiable)
When acting on the user's physical phone, be surgical — the user once had all
home-screen app layouts wiped by an over-broad action.
- Default to **read-only** adb (`devices`, `getprop`, `pm path/list`, `dumpsys`).
- Mutations (`adb install`, `adb uninstall com.archipelago.app.debug`) only with
explicit go-ahead and **scoped to our exact package** — echo it first.
- **Never** run launcher/system resets: no `pm clear` on launchers, no
`reset-permissions`, no factory wipe, no uninstalling apps you didn't build.
## Verify the published download after shipping
The download served to nodes is Gitea raw-on-main. Confirm the live bytes match
what you built and signed:
```bash
SERVED=neode-ui/public/packages/archipelago-companion.apk
URL=http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/$SERVED
curl -sS -o /tmp/live.apk "$URL"
shasum -a 256 "$SERVED" /tmp/live.apk # must match
apksigner verify -v --min-sdk-version 21 /tmp/live.apk | grep -i "scheme" # v1/v2/v3 = true
```

@ -11,20 +11,40 @@ android {
applicationId = "com.archipelago.app"
minSdk = 26
targetSdk = 35
-versionCode = 10
-versionName = "0.4.6"
+versionCode = 16
+versionName = "0.4.12"
vectorDrawables {
useSupportLibrary = true
}
}
signingConfigs {
// Repo-dedicated debug keystore (committed at app/debug.keystore) so every
// machine — and the published companion download — signs debug builds with
// the SAME key. Without this, Gradle falls back to each machine's
// ~/.android/debug.keystore, so a build from a different machine has a
// different signature and the phone rejects the update ("App not installed").
getByName("debug") {
storeFile = file("debug.keystore")
storePassword = "android"
keyAlias = "androiddebugkey"
keyPassword = "android"
// Force both legacy JAR (v1) and APK Signature Scheme v2. AGP drops v1
// for minSdk>=24, but some OEM package installers (e.g. Samsung) reject
// a v2-only sideload with "App not installed" — keep v1 for max compat.
enableV1Signing = true
enableV2Signing = true
}
}
buildTypes {
debug {
// Separate app ID so a debug/test build installs alongside the
// release app instead of colliding on signature.
applicationIdSuffix = ".debug"
versionNameSuffix = "-debug"
signingConfig = signingConfigs.getByName("debug")
}
release {
isMinifyEnabled = true

BIN
Android/app/debug.keystore Normal file

Binary file not shown.

@ -112,6 +112,37 @@ class ServerPreferences(private val context: Context) {
}
}
/**
* Replace a saved server in place. Matches the existing entry by connection
* identity (address/port/scheme) so edits that change the name or password
* or that touch a legacy 4-field entry still update the right record. If the
* edited server is also the active one, the active record is kept in sync.
*/
suspend fun updateSavedServer(original: ServerEntry, updated: ServerEntry) {
context.dataStore.edit { prefs ->
val current = prefs[savedServersKey] ?: emptySet()
val filtered = current.filterNot { raw ->
val e = ServerEntry.deserialize(raw)
e != null &&
e.address == original.address &&
e.port == original.port &&
e.useHttps == original.useHttps
}.toSet()
prefs[savedServersKey] = filtered + updated.serialize()
val isActive = prefs[activeAddressKey] == original.address &&
(prefs[activePortKey] ?: "") == original.port &&
(prefs[activeHttpsKey] ?: false) == original.useHttps
if (isActive) {
prefs[activeAddressKey] = updated.address
prefs[activeHttpsKey] = updated.useHttps
prefs[activePortKey] = updated.port
prefs[activePasswordKey] = updated.password
prefs[activeNameKey] = updated.name
}
}
}
suspend fun removeSavedServer(server: ServerEntry) {
context.dataStore.edit { prefs ->
val current = prefs[savedServersKey] ?: emptySet()

@ -75,6 +75,7 @@ fun NESMenu(
onDismiss: () -> Unit,
onSelectServer: (ServerEntry) -> Unit,
onAddServer: (ServerEntry) -> Unit,
onEditServer: (ServerEntry, ServerEntry) -> Unit,
onRemoveServer: (ServerEntry) -> Unit,
onToggleMode: () -> Unit,
onToggleStyle: () -> Unit,
@ -87,7 +88,7 @@ fun NESMenu(
contentAlignment = Alignment.Center,
) {
AnimatedVisibility(visible = visible, enter = fadeIn() + scaleIn(initialScale = 0.95f), exit = fadeOut() + scaleOut(targetScale = 0.95f)) {
-MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
+MenuPanel(servers, activeServer, isGamepadMode, controllerStyle, onDismiss, onSelectServer, onAddServer, onEditServer, onRemoveServer, onToggleMode, onToggleStyle, onBackToWebView)
}
}
}
@ -102,21 +103,39 @@ private fun MenuPanel(
onDismiss: () -> Unit,
onSelectServer: (ServerEntry) -> Unit,
onAddServer: (ServerEntry) -> Unit,
onEditServer: (ServerEntry, ServerEntry) -> Unit,
onRemoveServer: (ServerEntry) -> Unit,
onToggleMode: () -> Unit,
onToggleStyle: () -> Unit,
onBackToWebView: (() -> Unit)?,
) {
var showAdd by remember { mutableStateOf(false) }
// The saved server being edited, or null when adding a new one.
var editing by remember { mutableStateOf<ServerEntry?>(null) }
var nm by remember { mutableStateOf("") }
var addr by remember { mutableStateOf("") }
var pwd by remember { mutableStateOf("") }
fun resetForm() {
nm = ""; addr = ""; pwd = ""; showAdd = false; editing = null
}
fun startEdit(server: ServerEntry) {
editing = server
nm = server.name; addr = server.address; pwd = server.password
showAdd = false
}
fun submit() {
-if (addr.isNotBlank()) {
+if (addr.isBlank()) return
+val orig = editing
+if (orig != null) {
+// Preserve fields the compact form doesn't expose (scheme, port).
+onEditServer(orig, orig.copy(address = addr, password = pwd, name = nm))
+} else {
onAddServer(ServerEntry(addr, false, password = pwd, name = nm))
-nm = ""; addr = ""; pwd = ""; showAdd = false
}
+resetForm()
}
Column(
@ -149,6 +168,7 @@ private fun MenuPanel(
label = server.displayName(),
selected = active,
onClick = { onSelectServer(server) },
onEdit = { startEdit(server) },
onRemove = { onRemoveServer(server) },
)
}
@ -157,8 +177,8 @@ private fun MenuPanel(
Text("No servers", color = TextMuted, fontSize = 14.sp, modifier = Modifier.padding(vertical = 4.dp))
}
// Add server
if (showAdd) {
// Add / edit server
if (showAdd || editing != null) {
Column(
Modifier
.fillMaxWidth()
@ -168,6 +188,25 @@ private fun MenuPanel(
.padding(12.dp),
verticalArrangement = Arrangement.spacedBy(8.dp),
) {
Row(
Modifier.fillMaxWidth(),
verticalAlignment = Alignment.CenterVertically,
horizontalArrangement = Arrangement.SpaceBetween,
) {
Text(
if (editing != null) "Edit Server" else "Add Server",
color = TextMuted,
fontSize = 13.sp,
letterSpacing = 1.sp,
fontWeight = FontWeight.Medium,
)
Text(
"Cancel",
color = TextMuted,
fontSize = 13.sp,
modifier = Modifier.clickable { resetForm() }.padding(start = 8.dp),
)
}
GlassField(
value = nm, onValueChange = { nm = it },
placeholder = "Name (optional)",
@ -228,6 +267,7 @@ private fun MenuItem(
selected: Boolean = false,
labelColor: Color = TextPrimary,
onClick: () -> Unit,
onEdit: (() -> Unit)? = null,
onRemove: (() -> Unit)? = null,
) {
Row(
@ -247,7 +287,16 @@ private fun MenuItem(
color = if (selected) BitcoinOrange else labelColor,
fontSize = 16.sp,
fontWeight = FontWeight.Medium,
modifier = Modifier.weight(1f),
)
if (onEdit != null) {
Text(
"",
color = TextMuted,
fontSize = 16.sp,
modifier = Modifier.clickable { onEdit() }.padding(horizontal = 8.dp),
)
}
if (onRemove != null) {
Text(
"",

@ -216,6 +216,17 @@ fun RemoteInputScreen(onBack: () -> Unit) {
onAddServer = { server ->
scope.launch { prefs.addSavedServer(server); if (activeServer == null) prefs.setActiveServer(server) }
},
onEditServer = { original, updated ->
scope.launch {
prefs.updateSavedServer(original, updated)
// If the edited server is the live one, reconnect with the new
// address/credentials so the change takes effect immediately.
if (original.serialize() == activeServer?.serialize()) {
ws.disconnect()
prefs.setActiveServer(updated)
}
}
},
onRemoveServer = { server ->
scope.launch {
prefs.removeSavedServer(server)
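
The edit path above replaces one saved entry and reconnects only when the edited entry is the active one. A minimal Kotlin sketch of that logic as pure functions follows. The `ServerEntry` stand-in, its `serialize()` format, and the helper names `replaceSavedServer` and `shouldReconnect` are assumptions for illustration, not the app's actual `updateSavedServer` implementation.

```kotlin
// Hypothetical stand-in for the app's ServerEntry. The parameter order mirrors
// the constructor calls in the diff; the serialize() format is assumed.
data class ServerEntry(
    val address: String,
    val useHttps: Boolean = false,
    val port: String = "",
    val password: String = "",
    val name: String = "",
) {
    fun serialize(): String =
        listOf(address, useHttps, port, password, name).joinToString("|")
}

// Swap the serialized form of `original` for `updated`, keeping the position
// of the entry. Returns the set unchanged if `original` is not stored.
fun replaceSavedServer(
    current: Set<String>,
    original: ServerEntry,
    updated: ServerEntry,
): Set<String> {
    val key = original.serialize()
    if (key !in current) return current
    return current.map { if (it == key) updated.serialize() else it }.toSet()
}

// True when the edited entry is the one currently in use, so the caller
// should drop the connection and switch to the updated entry.
fun shouldReconnect(original: ServerEntry, active: ServerEntry?): Boolean =
    original.serialize() == active?.serialize()
```

Comparing serialized forms rather than object identity matches the check in `onEditServer`, which compares `original.serialize()` with `activeServer?.serialize()`.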

@ -30,6 +30,7 @@ import androidx.compose.material.icons.filled.VisibilityOff
import androidx.compose.foundation.verticalScroll
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.filled.Close
import androidx.compose.material.icons.filled.Edit
import androidx.compose.material.icons.filled.Lock
import androidx.compose.material.icons.filled.LockOpen
import androidx.compose.material3.CircularProgressIndicator
@ -106,9 +107,50 @@ fun ServerConnectScreen(
var useHttps by remember { mutableStateOf(false) }
var isConnecting by remember { mutableStateOf(false) }
var errorMessage by remember { mutableStateOf<String?>(null) }
// The saved server currently being edited, or null when adding/connecting.
var editingServer by remember { mutableStateOf<ServerEntry?>(null) }
val savedServers by prefs.savedServers.collectAsState(initial = emptyList())
fun clearForm() {
name = ""
address = ""
port = ""
password = ""
useHttps = false
passwordVisible = false
errorMessage = null
}
fun startEdit(server: ServerEntry) {
editingServer = server
name = server.name
address = server.address
port = server.port
password = server.password
useHttps = server.useHttps
passwordVisible = false
errorMessage = null
}
fun cancelEdit() {
editingServer = null
clearForm()
}
fun saveEdit() {
val original = editingServer ?: return
if (address.isBlank()) {
errorMessage = "Enter a server address"
return
}
val updated = ServerEntry(address, useHttps, port, password, name)
scope.launch {
prefs.updateSavedServer(original, updated)
cancelEdit()
}
}
fun connect(server: ServerEntry) {
if (isConnecting) return
if (server.address.isBlank()) {
@ -178,7 +220,7 @@ fun ServerConnectScreen(
Spacer(modifier = Modifier.height(4.dp))
Text(
text = "Connect to Server",
text = if (editingServer != null) stringResource(R.string.edit_server_title) else "Connect to Server",
style = MaterialTheme.typography.headlineMedium,
color = TextPrimary,
textAlign = TextAlign.Center,
@ -324,7 +366,11 @@ fun ServerConnectScreen(
keyboardActions = KeyboardActions(
onGo = {
keyboard?.hide()
connect(ServerEntry(address, useHttps, port, password, name))
if (editingServer != null) {
saveEdit()
} else {
connect(ServerEntry(address, useHttps, port, password, name))
}
},
),
colors = OutlinedTextFieldDefaults.colors(
@ -389,15 +435,40 @@ fun ServerConnectScreen(
}
}
// Connect button — glass style
GlassButton(
text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
onClick = {
keyboard?.hide()
connect(ServerEntry(address, useHttps, port, password, name))
},
modifier = Modifier.fillMaxWidth().height(56.dp),
)
if (editingServer != null) {
// Save / Cancel while editing an existing saved server
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.spacedBy(12.dp),
) {
GlassButton(
text = stringResource(R.string.cancel),
onClick = {
keyboard?.hide()
cancelEdit()
},
modifier = Modifier.weight(1f).height(56.dp),
)
GlassButton(
text = stringResource(R.string.save_changes),
onClick = {
keyboard?.hide()
saveEdit()
},
modifier = Modifier.weight(1f).height(56.dp),
)
}
} else {
// Connect button — glass style
GlassButton(
text = if (isConnecting) stringResource(R.string.connecting) else stringResource(R.string.connect),
onClick = {
keyboard?.hide()
connect(ServerEntry(address, useHttps, port, password, name))
},
modifier = Modifier.fillMaxWidth().height(56.dp),
)
}
if (isConnecting) {
CircularProgressIndicator(
@ -407,8 +478,8 @@ fun ServerConnectScreen(
)
}
// Saved servers
if (savedServers.isNotEmpty()) {
// Saved servers (hidden while editing one to keep focus on the form)
if (editingServer == null && savedServers.isNotEmpty()) {
Spacer(modifier = Modifier.height(8.dp))
Text(
text = stringResource(R.string.saved_servers),
@ -422,6 +493,7 @@ fun ServerConnectScreen(
SavedServerItem(
server = server,
onConnect = { connect(it) },
onEdit = { startEdit(it) },
onRemove = { scope.launch { prefs.removeSavedServer(it) } },
)
}
@ -434,6 +506,7 @@ fun ServerConnectScreen(
private fun SavedServerItem(
server: ServerEntry,
onConnect: (ServerEntry) -> Unit,
onEdit: (ServerEntry) -> Unit,
onRemove: (ServerEntry) -> Unit,
) {
Row(
@ -476,6 +549,9 @@ private fun SavedServerItem(
}
}
}
IconButton(onClick = { onEdit(server) }) {
Icon(imageVector = Icons.Default.Edit, contentDescription = stringResource(R.string.edit_server), modifier = Modifier.size(18.dp), tint = TextMuted)
}
IconButton(onClick = { onRemove(server) }) {
Icon(imageVector = Icons.Default.Close, contentDescription = stringResource(R.string.remove_server), modifier = Modifier.size(18.dp), tint = TextMuted)
}

@ -2,6 +2,7 @@ package com.archipelago.app.ui.screens
import android.annotation.SuppressLint
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.view.ViewGroup
import android.webkit.CookieManager
import android.webkit.WebChromeClient
@ -14,6 +15,7 @@ import androidx.activity.compose.BackHandler
import androidx.compose.animation.AnimatedVisibility
import androidx.compose.animation.fadeIn
import androidx.compose.animation.fadeOut
import androidx.compose.foundation.Image
import androidx.compose.foundation.background
import androidx.compose.foundation.layout.Arrangement
import androidx.compose.foundation.layout.Box
@ -27,17 +29,24 @@ import androidx.compose.foundation.layout.height
import androidx.compose.foundation.layout.padding
import androidx.compose.foundation.layout.safeDrawing
import androidx.compose.foundation.layout.size
import androidx.compose.foundation.layout.width
import androidx.compose.foundation.layout.windowInsetsPadding
import androidx.compose.foundation.shape.RoundedCornerShape
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.automirrored.filled.ArrowBack
import androidx.compose.material.icons.automirrored.filled.ArrowForward
import androidx.compose.material.icons.filled.Close
import androidx.compose.material.icons.filled.CloudOff
import androidx.compose.material.icons.filled.OpenInBrowser
import androidx.compose.material.icons.filled.Refresh
import androidx.compose.material3.CircularProgressIndicator
import androidx.compose.material3.Icon
import androidx.compose.material3.IconButton
import androidx.compose.material3.LinearProgressIndicator
import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.LaunchedEffect
import androidx.compose.runtime.getValue
import androidx.compose.runtime.mutableIntStateOf
import androidx.compose.runtime.mutableStateOf
@ -45,6 +54,8 @@ import androidx.compose.runtime.remember
import androidx.compose.runtime.setValue
import androidx.compose.ui.Alignment
import androidx.compose.ui.Modifier
import androidx.compose.ui.draw.clip
import androidx.compose.ui.graphics.asImageBitmap
import androidx.compose.ui.platform.LocalContext
import androidx.compose.ui.res.stringResource
import androidx.compose.ui.text.style.TextAlign
@ -56,6 +67,8 @@ import com.archipelago.app.ui.theme.BitcoinOrange
import com.archipelago.app.ui.theme.SurfaceBlack
import com.archipelago.app.ui.theme.TextMuted
import com.archipelago.app.ui.theme.TextPrimary
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
/** Open a URL in the phone's default browser (genuinely external links). */
private fun openExternalUrl(context: android.content.Context, url: String) {
@ -310,6 +323,26 @@ fun WebViewScreen(
}
}
// Node apps (e.g. NetBird) terminate TLS with a
// self-signed cert — the dashboard needs a secure
// context for OIDC/window.crypto.subtle (#15). The
// WebView default is to CANCEL untrusted certs, so
// those apps render blank. The user explicitly trusts
// their own node, so proceed for same-host certs only;
// reject anything else (don't blanket-trust the web).
override fun onReceivedSslError(
view: WebView?,
handler: android.webkit.SslErrorHandler?,
error: android.net.http.SslError?,
) {
val u = error?.url
if (u != null && isSameHost(u, serverUrl)) {
handler?.proceed()
} else {
handler?.cancel()
}
}
override fun shouldOverrideUrlLoading(
view: WebView?,
request: WebResourceRequest?,
@ -428,11 +461,34 @@ fun WebViewScreen(
}
}
/** Best-effort fetch of the origin's /favicon.ico, so the launched app's icon
* can be shown on the loading screen before the WebView reports onReceivedIcon
* (which only fires once the page's <head> has parsed). Blocking call on IO. */
private fun fetchFavicon(pageUrl: String): Bitmap? {
return try {
val u = android.net.Uri.parse(pageUrl)
val scheme = u.scheme ?: return null
val host = u.host ?: return null
val portPart = if (u.port > 0) ":${u.port}" else ""
val conn = (java.net.URL("$scheme://$host$portPart/favicon.ico").openConnection()
as java.net.HttpURLConnection).apply {
connectTimeout = 4000
readTimeout = 4000
instanceFollowRedirects = true
}
conn.inputStream.use { BitmapFactory.decodeStream(it) }
} catch (_: Exception) {
null
}
}
/**
* Lightweight in-app browser used when the kiosk hands off an app that can't be
* shown in an iframe. Loads the app in a local WebView with a minimal top bar
* (close + title + escalate-to-real-browser). Same-host navigation stays here;
* any genuinely external link escapes to the phone's browser.
* shown in an iframe. Loads the app in a local WebView with a centered loading
* screen (app favicon + progress bar) and a BOTTOM control bar mirroring the
* web mobile-iframe footer (back / forward / reload / open-in-browser / close).
* Same-host navigation stays here; any genuinely external link escapes to the
* phone's browser.
*/
@SuppressLint("SetJavaScriptEnabled")
@Composable
@ -444,8 +500,20 @@ private fun InAppBrowser(
val context = LocalContext.current
var browser by remember { mutableStateOf<WebView?>(null) }
var title by remember { mutableStateOf(android.net.Uri.parse(url).host ?: url) }
var favicon by remember { mutableStateOf<Bitmap?>(null) }
var progress by remember { mutableIntStateOf(0) }
var loading by remember { mutableStateOf(true) }
var canGoBack by remember { mutableStateOf(false) }
var canGoForward by remember { mutableStateOf(false) }
// Seed the loading-screen icon immediately from a best-effort favicon
// pre-fetch (main's app-icon work), then onReceivedIcon upgrades it — so the
// loader shows an icon right away instead of staying blank until the page
// parses its <head> (which is what made the loader look stuck).
LaunchedEffect(url) {
val fetched = withContext(Dispatchers.IO) { fetchFavicon(url) }
if (fetched != null && favicon == null) favicon = fetched
}
// Back: walk the in-app history first, then close the overlay.
BackHandler {
@ -459,13 +527,169 @@ private fun InAppBrowser(
.background(SurfaceBlack)
.windowInsetsPadding(WindowInsets.safeDrawing),
) {
// WebView + loading overlay fill the area above the bottom control bar.
Box(modifier = Modifier.weight(1f).fillMaxWidth()) {
AndroidView(
modifier = Modifier.fillMaxSize(),
factory = { ctx ->
WebView(ctx).apply {
layoutParams = ViewGroup.LayoutParams(
ViewGroup.LayoutParams.MATCH_PARENT,
ViewGroup.LayoutParams.MATCH_PARENT,
)
isVerticalScrollBarEnabled = false
isHorizontalScrollBarEnabled = false
CookieManager.getInstance().setAcceptThirdPartyCookies(this, true)
applyArchipelagoSettings()
webChromeClient = object : WebChromeClient() {
override fun onProgressChanged(view: WebView?, newProgress: Int) {
progress = newProgress
}
override fun onReceivedTitle(view: WebView?, t: String?) {
if (!t.isNullOrBlank()) title = t
}
override fun onReceivedIcon(view: WebView?, icon: Bitmap?) {
if (icon != null) favicon = icon
}
}
webViewClient = object : WebViewClient() {
override fun onPageStarted(view: WebView?, u: String?, favicon: Bitmap?) {
loading = true
}
override fun onPageFinished(view: WebView?, u: String?) {
loading = false
canGoBack = view?.canGoBack() == true
canGoForward = view?.canGoForward() == true
}
override fun doUpdateVisitedHistory(view: WebView?, u: String?, isReload: Boolean) {
canGoBack = view?.canGoBack() == true
canGoForward = view?.canGoForward() == true
}
// Self-signed TLS on the node's apps (e.g. NetBird on
// :8087) would otherwise be cancelled by the WebView
// and render blank. Proceed for the user's own node
// (same host); reject any other untrusted cert.
override fun onReceivedSslError(
view: WebView?,
handler: android.webkit.SslErrorHandler?,
error: android.net.http.SslError?,
) {
val u = error?.url
if (u != null && isSameHost(u, serverUrl)) {
handler?.proceed()
} else {
handler?.cancel()
}
}
override fun shouldOverrideUrlLoading(
view: WebView?,
request: WebResourceRequest?,
): Boolean {
val u = request?.url?.toString() ?: return false
// Stay in the overlay for same-node navigation;
// hand genuinely external links to the real browser.
if (isSameHost(u, serverUrl)) return false
openExternalUrl(ctx, u)
return true
}
}
browser = this
loadUrl(url)
}
},
)
// Centered loading screen — app favicon (or spinner) + title + bar.
if (loading) {
Column(
modifier = Modifier
.fillMaxSize()
.background(SurfaceBlack),
horizontalAlignment = Alignment.CenterHorizontally,
verticalArrangement = Arrangement.Center,
) {
Box(
modifier = Modifier.size(84.dp).clip(RoundedCornerShape(20.dp)),
contentAlignment = Alignment.Center,
) {
val fav = favicon
if (fav != null) {
Image(
bitmap = fav.asImageBitmap(),
contentDescription = title,
modifier = Modifier.fillMaxSize(),
)
} else {
CircularProgressIndicator(color = BitcoinOrange)
}
}
Spacer(modifier = Modifier.height(18.dp))
Text(
text = title,
style = MaterialTheme.typography.bodyLarge,
color = TextPrimary,
maxLines = 1,
overflow = TextOverflow.Ellipsis,
)
Spacer(modifier = Modifier.height(16.dp))
LinearProgressIndicator(
progress = { progress / 100f },
modifier = Modifier.width(220.dp),
color = BitcoinOrange,
trackColor = TextMuted.copy(alpha = 0.2f),
)
}
}
}
// Bottom control bar — mirrors the web mobile-iframe footer.
Row(
modifier = Modifier
.fillMaxWidth()
.height(48.dp)
.padding(horizontal = 4.dp),
.height(56.dp)
.background(SurfaceBlack)
.padding(horizontal = 8.dp),
horizontalArrangement = Arrangement.SpaceAround,
verticalAlignment = Alignment.CenterVertically,
) {
IconButton(onClick = { browser?.goBack() }, enabled = canGoBack) {
Icon(
imageVector = Icons.AutoMirrored.Filled.ArrowBack,
contentDescription = "Back",
tint = if (canGoBack) TextPrimary else TextMuted.copy(alpha = 0.4f),
)
}
IconButton(onClick = { browser?.goForward() }, enabled = canGoForward) {
Icon(
imageVector = Icons.AutoMirrored.Filled.ArrowForward,
contentDescription = "Forward",
tint = if (canGoForward) TextPrimary else TextMuted.copy(alpha = 0.4f),
)
}
IconButton(onClick = { browser?.reload() }) {
Icon(
imageVector = Icons.Default.Refresh,
contentDescription = "Reload",
tint = TextPrimary,
)
}
IconButton(onClick = { openExternalUrl(context, browser?.url ?: url) }) {
Icon(
imageVector = Icons.Default.OpenInBrowser,
contentDescription = stringResource(R.string.open_in_browser),
tint = TextPrimary,
)
}
IconButton(onClick = onClose) {
Icon(
imageVector = Icons.Default.Close,
@ -473,82 +697,6 @@ private fun InAppBrowser(
tint = TextPrimary,
)
}
Text(
text = title,
style = MaterialTheme.typography.bodyMedium,
color = TextPrimary,
maxLines = 1,
overflow = TextOverflow.Ellipsis,
modifier = Modifier.weight(1f),
)
IconButton(onClick = { openExternalUrl(context, browser?.url ?: url) }) {
Icon(
imageVector = Icons.Default.OpenInBrowser,
contentDescription = stringResource(R.string.open_in_browser),
tint = TextMuted,
)
}
}
AnimatedVisibility(visible = loading, enter = fadeIn(), exit = fadeOut()) {
LinearProgressIndicator(
progress = { progress / 100f },
modifier = Modifier.fillMaxWidth(),
color = BitcoinOrange,
trackColor = SurfaceBlack,
)
}
AndroidView(
modifier = Modifier.fillMaxSize(),
factory = { ctx ->
WebView(ctx).apply {
layoutParams = ViewGroup.LayoutParams(
ViewGroup.LayoutParams.MATCH_PARENT,
ViewGroup.LayoutParams.MATCH_PARENT,
)
isVerticalScrollBarEnabled = false
isHorizontalScrollBarEnabled = false
CookieManager.getInstance().setAcceptThirdPartyCookies(this, true)
applyArchipelagoSettings()
webChromeClient = object : WebChromeClient() {
override fun onProgressChanged(view: WebView?, newProgress: Int) {
progress = newProgress
}
override fun onReceivedTitle(view: WebView?, t: String?) {
if (!t.isNullOrBlank()) title = t
}
}
webViewClient = object : WebViewClient() {
override fun onPageStarted(view: WebView?, u: String?, favicon: Bitmap?) {
loading = true
}
override fun onPageFinished(view: WebView?, u: String?) {
loading = false
}
override fun shouldOverrideUrlLoading(
view: WebView?,
request: WebResourceRequest?,
): Boolean {
val u = request?.url?.toString() ?: return false
// Stay in the overlay for same-node navigation;
// hand genuinely external links to the real browser.
if (isSameHost(u, serverUrl)) return false
openExternalUrl(ctx, u)
return true
}
}
browser = this
loadUrl(url)
}
},
)
}
}
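
Both `onReceivedSslError` overrides above proceed only when the failing URL is on the same host as the configured node and cancel otherwise. The body of `isSameHost` is not part of this diff, so the following is only a sketch of one way that decision could be expressed. `hostOf`, `isSameHostSketch`, `SslDecision`, and `decideSslError` are hypothetical names, and the comparison here is by host only (scheme and port are ignored), which is an assumption.

```kotlin
import java.net.URI

// Extract the lower-cased host of a URL, or null if it cannot be parsed.
fun hostOf(url: String): String? =
    try {
        URI(url).host?.lowercase()
    } catch (_: Exception) {
        null
    }

// Assumed semantics: two URLs are "same host" when their hosts are equal,
// regardless of scheme or port (e.g. http on :80 vs https on :8087).
fun isSameHostSketch(candidate: String, serverUrl: String): Boolean {
    val a = hostOf(candidate) ?: return false
    val b = hostOf(serverUrl) ?: return false
    return a == b
}

enum class SslDecision { PROCEED, CANCEL }

// Mirror of the policy in the diff: trust the user's own node, reject
// every other untrusted certificate (including an unknown URL).
fun decideSslError(errorUrl: String?, serverUrl: String): SslDecision =
    if (errorUrl != null && isSameHostSketch(errorUrl, serverUrl)) {
        SslDecision.PROCEED
    } else {
        SslDecision.CANCEL
    }
```

Keeping the decision in a plain function like this would let the trust rule be unit-tested without a `WebView`. The real handler still has to call `proceed()` or `cancel()` on the `SslErrorHandler`.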

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M15,19l-7,-7 7,-7"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M6,18L18,6M6,6l12,12"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M9,5l7,7 -7,7"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M10,6H6a2,2 0,0 0,-2 2v10a2,2 0,0 0,2 2h10a2,2 0,0 0,2 -2v-4M14,4h6m0,0v6m0,-6L10,14"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -0,0 +1,12 @@
<vector xmlns:android="http://schemas.android.com/apk/res/android"
android:width="24dp"
android:height="24dp"
android:viewportWidth="24"
android:viewportHeight="24">
<path
android:pathData="M4,4v6h6M20,20v-6h-6M5.64,15.36A8,8 0,0 0,18.36 18M18.36,8.64A8,8 0,0 0,5.64 6"
android:strokeColor="#FFFFFF"
android:strokeWidth="2"
android:strokeLineCap="round"
android:strokeLineJoin="round" />
</vector>

@ -23,6 +23,13 @@
<string name="remote_input_hint">Use your phone as a keyboard and mouse for the kiosk</string>
<string name="close">Close</string>
<string name="open_in_browser">Open in browser</string>
<string name="back">Back</string>
<string name="forward">Forward</string>
<string name="refresh">Refresh</string>
<string name="server_name_label">Server Name (optional)</string>
<string name="server_name_placeholder">My Archipelago</string>
<string name="edit_server">Edit</string>
<string name="edit_server_title">Edit Server</string>
<string name="save_changes">Save Changes</string>
<string name="cancel">Cancel</string>
</resources>

@ -1,13 +1,18 @@
#!/usr/bin/env bash
#
# Build the Android companion app and publish it as the served download
# (neode-ui/public/packages/archipelago-companion.apk.zip), then commit + push.
# (neode-ui/public/packages/archipelago-companion.apk — a plain APK a phone can
# install straight from the link), then commit + push.
#
# Use this INSTEAD of `git push` when shipping the companion app, so the
# downloadable APK on the node always matches what's on main.
#
# ./Android/ship-companion.sh
#
# The actual build/sign/verify/stage is done by scripts/publish-companion-apk.sh
# (single source of truth, shared with the pre-push hook). It does a CLEAN build,
# forces v1+v2+v3 signing, and ABORTS if any signature scheme is missing — so a
# broken or v2-only APK can never be shipped.
set -euo pipefail
ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
@ -16,21 +21,15 @@ cd "$ROOT"
export JAVA_HOME="${JAVA_HOME:-/opt/homebrew/opt/openjdk@17}"
export ANDROID_HOME="${ANDROID_HOME:-$HOME/Library/Android/sdk}"
APK="Android/app/build/outputs/apk/debug/app-debug.apk"
DEST="neode-ui/public/packages/archipelago-companion.apk.zip"
DEST="neode-ui/public/packages/archipelago-companion.apk"
echo "==> Building debug APK"
( cd Android && ./gradlew :app:assembleDebug --console=plain -q )
[ -f "$APK" ] || { echo "ERROR: APK not found at $APK" >&2; exit 1; }
echo "==> Building + signing + verifying companion APK"
bash scripts/publish-companion-apk.sh
echo "==> Publishing -> $DEST"
mkdir -p "$(dirname "$DEST")"
rm -f "$DEST"
( cd "$(dirname "$APK")" && zip -j -q "$ROOT/$DEST" "$(basename "$APK")" )
[ -f "$DEST" ] || { echo "ERROR: served APK not found at $DEST" >&2; exit 1; }
git add "$DEST"
if git diff --cached --quiet; then
echo "==> Nothing to commit (working tree + APK unchanged)"
if git diff --cached --quiet -- "$DEST"; then
echo "==> Nothing to commit (APK unchanged)"
else
git commit -q -m "chore(android): update companion apk download"
echo "==> Committed"

@ -1,13 +1,18 @@
# Archipelago — agent guide
## 🚩 TOP PRIORITY (until production testing passes)
## ✅ Single-node production gate is GREEN (2026-06-23)
**Read `docs/PRODUCTION-MASTER-PLAN.md` first.** It is the authoritative plan and
overrides ad-hoc direction until the production test gate is green. Goal: a
world-class, **developer-ready app platform** where every app is manifest-driven,
manifests ship via the **signed registry** (not OTA disk files), and **third-party
developers publish apps via an external/decentralized registry** — all rootless,
secure, robust, and 100%-uptime-capable.
`tests/lifecycle/run-gate.sh` is **5/5 on .228, 0 failures** — the single-node exit
criterion is met and the priority banner is demoted. Next exit-criteria: the
**multinode pass** (`docs/multinode-testing-plan.md`) and workstreams B/C/D.
**Read `docs/PRODUCTION-MASTER-PLAN.md` first** — it is still the authoritative plan
for the north star: a world-class, **developer-ready app platform** where every app
is manifest-driven, manifests ship via the **signed registry** (not OTA disk files),
and **third-party developers publish apps via an external/decentralized registry**,
all rootless, secure, robust, and 100%-uptime-capable. It no longer overrides all
ad-hoc direction now that the gate is green, but it remains the source of truth for
sequencing the remaining workstreams.
Detailed sub-plans (all linked from the master):
- App platform / packaging phases + security model → `docs/APP-PACKAGING-MIGRATION-PLAN.md`
@ -27,7 +32,8 @@ Detailed sub-plans (all linked from the master):
`container::secrets`, 0600/rootless) — never hardcoded, per-app, or logged.
- **Migrations never destroy data** — preserve `/var/lib/archipelago/<app>`,
secrets, credentials, ports, and adoption container names; keep a rollback path.
- **Verify on a real node (.228, then .198) before any tag.**
- **Verify on the real node .228 before any tag.** (Fleet-wide multinode
verification is a separate plan: `docs/multinode-testing-plan.md`.)
## Build / verify
@ -41,7 +47,11 @@ Detailed sub-plans (all linked from the master):
## Production test gate (definition of done)
`tests/lifecycle/run-20x.sh` green across install / UI / stop / start / restart /
`tests/lifecycle/run-gate.sh` green across install / UI / stop / start / restart /
reinstall / reboot-survive / archipelago-restart-survive / uninstall — **5× on
.228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from 20×;
restore to 20× before the final ship). Until green, the master plan is the priority.
.228** (`ARCHY_ITERATIONS=5`). **Run the gate ON the node** (it uses local podman/systemctl/bitcoin
probes), not via RPC from another host. **✅ GREEN 2026-06-23 (5/5, 0 not-ok)** — keep it
green (re-run after orchestrator/lifecycle changes); regressions are top priority again.
**Multinode testing (.198 + the rest of the fleet) is a SEPARATE plan** —
`docs/multinode-testing-plan.md` — not part of this single-node gate criterion, and is
the next exit criterion now that single-node is green.

@ -73,7 +73,7 @@
"author": "Mempool",
"category": "money",
"tier": "core",
"dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0",
"dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1",
"repoUrl": "https://github.com/mempool/mempool",
"requires": [
"bitcoin-knots",
@ -195,7 +195,7 @@
"title": "Nostr Relay (Rust)",
"version": "0.8.0",
"description": "High-performance Nostr relay written in Rust. Host your own decentralized social media relay and earn networking profits.",
"icon": "/assets/img/app-icons/nostrudel.svg",
"icon": "/assets/img/app-icons/nostr.svg",
"author": "Nostr RS Relay",
"category": "community",
"tier": "recommended",
@ -214,31 +214,6 @@
]
}
},
{
"id": "meshtastic",
"title": "Meshtastic",
"version": "2-daily-alpine",
"description": "Open-source mesh networking for LoRa radios. Create decentralized communication networks.",
"icon": "/assets/img/app-icons/meshcore.svg",
"author": "Meshtastic",
"category": "networking",
"tier": "recommended",
"dockerImage": "docker.io/meshtastic/meshtasticd:daily-alpine",
"repoUrl": "https://github.com/meshtastic/firmware",
"containerConfig": {
"ports": [
"4403:4403"
],
"volumes": [
"/var/lib/archipelago/meshtastic:/var/lib/meshtasticd"
],
"env": [
"MESHTASTIC_PORT=/dev/ttyUSB0",
"MESHTASTIC_SERIAL=true"
],
"notes": "Requires a LoRa radio device at /dev/ttyUSB0. The config file is rendered from the app manifest before container start."
}
},
{
"id": "vaultwarden",
"title": "Vaultwarden",

@ -1,12 +1,12 @@
app:
id: archy-mempool-web
name: Mempool Web
version: 3.0.0
version: 3.0.1
description: Frontend web UI for mempool explorer.
container_name: mempool
container:
image: git.tx1138.com/lfg2025/mempool-frontend:v3.0.0
image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
pull_policy: if-not-present
network: archy-net

@ -5,7 +5,7 @@ app:
description: Bitcoin mempool and blockchain explorer. Real-time transaction and block visualization.
container:
image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0
image: 146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1
image_signature: cosign://...
pull_policy: if-not-present

@ -1,5 +0,0 @@
# Meshtastic - uses official image
FROM meshtastic/meshtastic:latest
# Default configuration is in the image
# No additional setup needed

@ -1,69 +0,0 @@
app:
id: meshtastic
name: Meshtastic
version: 2-daily-alpine
description: Open-source mesh networking for LoRa radios. Create decentralized communication networks.
container:
image: docker.io/meshtastic/meshtasticd:daily-alpine
pull_policy: if-not-present
dependencies:
- storage: 1Gi
resources:
cpu_limit: 1
memory_limit: 512Mi
disk_limit: 1Gi
security:
capabilities: [NET_ADMIN, SYS_ADMIN] # Required for LoRa radio access
readonly_root: false # Needs write access for device management
no_new_privileges: true
user: 1000
seccomp_profile: default
network_policy: host # Requires host network for radio access
apparmor_profile: meshtastic
ports:
- host: 4403
container: 4403
protocol: tcp # Meshtastic TCP API
devices:
- /dev/ttyUSB0 # LoRa radio device (if connected)
volumes:
- type: bind
source: /var/lib/archipelago/meshtastic
target: /var/lib/meshtasticd
options: [rw]
files:
- path: /var/lib/archipelago/meshtastic/config.yaml
content: |
General:
MACAddress: AA:BB:CC:DD:EE:01
Webserver:
Port: 4403
environment:
- MESHTASTIC_PORT=/dev/ttyUSB0
- MESHTASTIC_SERIAL=true
health_check:
type: cmd
endpoint: test -f /var/lib/meshtasticd/config.yaml
interval: 30s
timeout: 30s
retries: 5
networking:
mesh_enabled: true
local_network_access: true
metadata:
icon: /assets/img/app-icons/meshcore.svg
category: networking
tier: recommended
repo: https://github.com/meshtastic/firmware

@ -0,0 +1,77 @@
app:
id: netbird-dashboard
name: NetBird Dashboard
version: "2.38.0"
description: NetBird management dashboard (SPA). Internal stack member served through the netbird proxy.
category: networking
# Hyphen name matches runtime references + the live container (adoption).
# Alias `netbird-dashboard` is the short hostname the proxy's nginx proxies to.
container_name: netbird-dashboard
container:
image: docker.io/netbirdio/dashboard:v2.38.0
pull_policy: if-not-present
network: netbird-net
network_aliases: [netbird-dashboard]
# The dashboard SPA bakes its API/OIDC base URL from these at container
# start. They must point at the proxy's public HTTPS origin (8087) so the
# browser uses a secure context (window.crypto.subtle / OIDC PKCE, #15).
# {{HOST_IP}} is the node's primary host IP, resolved at apply time.
derived_env:
- key: NETBIRD_MGMT_API_ENDPOINT
template: "https://{{HOST_IP}}:8087"
- key: NETBIRD_MGMT_GRPC_API_ENDPOINT
template: "https://{{HOST_IP}}:8087"
- key: AUTH_AUTHORITY
template: "https://{{HOST_IP}}:8087/oauth2"
dependencies:
- app_id: netbird-server
resources:
memory_limit: 256Mi
security:
# cap-drop=ALL is applied by the orchestrator. The dashboard image runs nginx,
# whose root master process drops privileges for its workers and binds :80, so
# it needs the privilege-drop caps plus NET_BIND_SERVICE for the privileged port.
capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
# Internal only — reached container-to-container by the proxy via netbird-net.
ports: []
volumes: []
environment:
- AUTH_AUDIENCE=netbird-dashboard
- AUTH_CLIENT_ID=netbird-dashboard
- AUTH_CLIENT_SECRET=
- USE_AUTH0=false
- AUTH_SUPPORTED_SCOPES=openid profile email groups
- AUTH_REDIRECT_URI=/nb-auth
- AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
- NETBIRD_TOKEN_SOURCE=idToken
- NGINX_SSL_PORT=443
- LETSENCRYPT_DOMAIN=none
health_check:
type: tcp
endpoint: localhost:80
interval: 30s
timeout: 5s
retries: 5
start_period: 20s
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/dashboard
license: BSD-3-Clause
tags:
- networking
- vpn
- dashboard
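
The `derived_env` entries in this manifest are templates whose `{{HOST_IP}}` placeholder is resolved at apply time. A minimal sketch of that substitution follows. `render_template` and the `facts` map are hypothetical names used for illustration; the orchestrator's actual renderer is not shown in this diff.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not the orchestrator's real renderer): replace each
/// `{{KEY}}` placeholder in `template` with the matching value from `facts`.
/// Placeholders with no matching fact are left untouched so that a missing
/// host fact stays visible in the rendered output.
fn render_template(template: &str, facts: &HashMap<&str, String>) -> String {
    let mut out = template.to_string();
    for (key, value) in facts {
        // `{{{{` and `}}}}` are escaped braces, so this builds `{{KEY}}`.
        let placeholder = format!("{{{{{}}}}}", key);
        out = out.replace(&placeholder, value);
    }
    out
}
```

With `HOST_IP` set to a node address, the `AUTH_AUTHORITY` template `https://{{HOST_IP}}:8087/oauth2` would render to that address on port 8087, which is the proxy's HTTPS origin that the comments above call for.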


@ -0,0 +1,122 @@
app:
id: netbird-server
name: NetBird Server
version: "0.71.2"
description: NetBird combined management / signal / relay server with an embedded identity provider and STUN. Backend for the self-hosted NetBird mesh VPN.
category: networking
# Hyphen name matches the runtime references (crash_recovery / dependencies /
# config startup order) + the live container, so on an existing node the
# orchestrator ADOPTS the running server rather than recreating it (data +
# the sqlite store under /var/lib/netbird preserved). Alias `netbird-server`
# is the short hostname the proxy's nginx proxies/grpc-passes to.
container_name: netbird-server
container:
image: docker.io/netbirdio/netbird-server:0.71.2
pull_policy: if-not-present
network: netbird-net
network_aliases: [netbird-server]
# The relay authSecret and the sqlite store encryptionKey are base64 keys
# (the server base64-decodes them to recover raw bytes — hex would decode to
# the wrong value). Generated once and reused: ensure_generated_secrets
# no-ops when the file already exists, so a re-render of config.yaml on an
# adopted node keeps the same keys (regenerating would orphan the store).
generated_secrets:
- name: netbird-relay-auth-secret
kind: base64
- name: netbird-store-encryption-key
kind: base64
# Pass the rendered config explicitly, mirroring the legacy `--config` arg.
custom_args: ["--config", "/etc/netbird/config.yaml"]
dependencies:
- storage: 1Gi
resources:
memory_limit: 1Gi
security:
# cap-drop=ALL is applied by the orchestrator. The server binds :80
# (management/signal/relay HTTP + gRPC) inside the container — a privileged
# port — so it needs NET_BIND_SERVICE. STUN is 3478/udp (unprivileged).
capabilities: [NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
ports:
- host: 8086
container: 80
protocol: tcp # management API + embedded OIDC issuer (/oauth2)
- host: 3478
container: 3478
protocol: udp # STUN — must be UDP; tcp here breaks relay discovery
volumes:
- type: bind
source: /var/lib/archipelago/netbird/data
target: /var/lib/netbird
options: [rw]
# The rendered config.yaml, read-only. Re-rendered on every reconcile from
# host facts + the base64 secrets; idempotent (stable bytes → no restart).
- type: bind
source: /var/lib/archipelago/netbird/config.yaml
target: /etc/netbird/config.yaml
options: [ro]
environment: []
# The server's config. {{HOST_IP}} is the node's primary host IP (the proxy's
# public origin is https on 8087 — the dashboard needs a secure context for
# OIDC PKCE, issue #15). {{secret:...}} are read 0600 from the secrets dir.
files:
- path: /var/lib/archipelago/netbird/config.yaml
overwrite: true
content: |
server:
listenAddress: ":80"
exposedAddress: "https://{{HOST_IP}}:8087"
stunPorts:
- 3478
metricsPort: 9090
healthcheckAddress: ":9000"
logLevel: "info"
logFile: "console"
authSecret: "{{secret:netbird-relay-auth-secret}}"
dataDir: "/var/lib/netbird"
auth:
issuer: "https://{{HOST_IP}}:8087/oauth2"
localAuthDisabled: false
signKeyRefreshEnabled: false
dashboardRedirectURIs:
- "https://{{HOST_IP}}:8087/nb-auth"
- "https://{{HOST_IP}}:8087/nb-silent-auth"
dashboardPostLogoutRedirectURIs:
- "https://{{HOST_IP}}:8087/"
cliRedirectURIs:
- "http://localhost:53000/"
store:
engine: "sqlite"
encryptionKey: "{{secret:netbird-store-encryption-key}}"
# TCP liveness check on the management port. The port binds at startup and stays
# open; an HTTP check of /oauth2 would fail spuriously while the issuer warms up.
health_check:
type: tcp
endpoint: localhost:80
interval: 30s
timeout: 5s
retries: 10
start_period: 30s
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/netbird
license: BSD-3-Clause
tags:
- networking
- vpn
- wireguard
- mesh
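
The `generated_secrets` comment in this manifest relies on generate-once semantics: an existing secret file is reused, so re-rendering `config.yaml` never rotates the keys. A minimal sketch of that behaviour, assuming a hypothetical `ensure_secret` helper (the real `ensure_generated_secrets` is not part of this diff). The generator is passed in as a closure so the example needs no external crates; a real one would emit base64-encoded random bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical sketch of generate-once semantics: reuse a non-empty secret
/// file if one exists, otherwise create the parent directory, write the
/// freshly generated value, and return it.
fn ensure_secret(path: &Path, generate: impl FnOnce() -> String) -> io::Result<String> {
    if let Ok(existing) = fs::read_to_string(path) {
        let trimmed = existing.trim();
        if !trimmed.is_empty() {
            // Existing secret wins: regenerating would orphan the store.
            return Ok(trimmed.to_string());
        }
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    let secret = generate();
    fs::write(path, &secret)?;
    Ok(secret)
}
```

Calling it twice with different generators returns the first value both times, which is the property that keeps an adopted node's sqlite store readable.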

182
apps/netbird/manifest.yml Normal file

@ -0,0 +1,182 @@
app:
id: netbird
name: NetBird
version: "2.38.0"
description: Self-hosted WireGuard mesh VPN control plane with dashboard, embedded identity provider, management API, signal, relay, and STUN. The user-facing entry point — a TLS proxy in front of the dashboard + server.
category: networking
# The user-facing launcher (app_id + container both "netbird", matching the
# runtime references + the live container so the orchestrator adopts it). This
# is the nginx that terminates TLS on 8087 and fans out to the dashboard +
# server by their short aliases on netbird-net.
container_name: netbird
container:
image: docker.io/library/nginx:1.27-alpine
pull_policy: if-not-present
network: netbird-net
# Self-signed TLS cert materialised before create — the dashboard needs a
# secure context (window.crypto.subtle / OIDC PKCE, issue #15), so the proxy
# serves HTTPS. Idempotent: kept as-is when crt+key already exist (a user
# accepts it once). SAN defaults to the host IP + 127.0.0.1 + localhost.
generated_certs:
- crt: /var/lib/archipelago/netbird/tls.crt
key: /var/lib/archipelago/netbird/tls.key
dependencies:
- app_id: netbird-server
- app_id: netbird-dashboard
- storage: 1Gi
resources:
memory_limit: 256Mi
security:
# cap-drop=ALL is applied by the orchestrator. nginx's root master process drops
# privileges for its workers and binds :443, so it needs the privilege-drop caps
# plus NET_BIND_SERVICE.
capabilities: [CHOWN, DAC_OVERRIDE, SETGID, SETUID, NET_BIND_SERVICE]
readonly_root: false
network_policy: isolated
ports:
# 8087 publishes the TLS listener (container :443). HTTPS is required for the
# dashboard's secure context (issue #15).
- host: 8087
container: 443
protocol: tcp
volumes:
- type: bind
source: /var/lib/archipelago/netbird/nginx.conf
target: /etc/nginx/conf.d/default.conf
options: [ro]
- type: bind
source: /var/lib/archipelago/netbird/tls.crt
target: /etc/nginx/tls.crt
options: [ro]
- type: bind
source: /var/lib/archipelago/netbird/tls.key
target: /etc/nginx/tls.key
options: [ro]
environment: []
# The proxy config. {{NETWORK_GATEWAY}} is the netbird-net bridge gateway =
# Podman's aardvark DNS. nginx uses it as an explicit `resolver` with VARIABLE
# upstreams so it re-resolves container names per request — without it nginx
# pins a container IP at startup and 502s forever once that IP moves on a
# restart/reboot (issue #15, observed live on .198). Every #15 fix below
# (CORS $http_origin reflect, grpc pass, nb-auth/nb-silent-auth rewrite to
# index.html, /relay websocket) is preserved verbatim from the legacy config.
files:
- path: /var/lib/archipelago/netbird/nginx.conf
overwrite: true
content: |
server {
listen 443 ssl;
server_name _;
# netbird's dashboard needs a secure context (window.crypto.subtle for
# OIDC PKCE), so the proxy terminates TLS with a self-signed cert (#15).
ssl_certificate /etc/nginx/tls.crt;
ssl_certificate_key /etc/nginx/tls.key;
# Rootless Podman can hand a container a new IP across restarts/reboots.
# nginx resolves a literal upstream name ONCE at startup and caches it,
# so after the IP moves every request 502s with "host unreachable"
# (issue #15, observed live on .198: nginx pinned to a dead
# netbird-dashboard IP). Fix: point `resolver` at the netbird-net
# gateway (Podman's aardvark DNS) and use VARIABLE upstreams, which
# forces nginx to re-resolve the container names at request time.
resolver {{NETWORK_GATEWAY}} valid=10s ipv6=off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
location ~ ^/(relay|ws-proxy/) {
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 1d;
}
location ~ ^/(api|oauth2)(/|$) {
# The dashboard is a SPA whose API/OIDC base URL is fixed to one
# host:port when the container starts. A single box is reached via
# several addresses, so those fetches are cross-origin and the browser
# blocks them unless the response carries Access-Control-Allow-Origin
# (#15, live on .198). Reflect the caller's Origin and answer the
# CORS preflight.
if ($request_method = OPTIONS) {
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
add_header Access-Control-Max-Age 86400 always;
add_header Content-Length 0;
return 204;
}
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
}
location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {
set $nb_server netbird-server;
grpc_pass grpc://$nb_server:80;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
}
# OIDC callback routes are client-side SPA routes with NO prebuilt page
# in the dashboard bundle, so proxying them straight through 404s —
# which crashes the dashboard's auth init and shows "Unauthenticated"
# with dead buttons (#15, live on .198: /nb-auth + /nb-silent-auth
# returned 404). Serve index.html at these paths (URL unchanged) so
# react-oidc boots and completes the login / silent-SSO.
location ~ ^/(nb-auth|nb-silent-auth) {
set $nb_dashboard netbird-dashboard;
rewrite ^.*$ /index.html break;
proxy_pass http://$nb_dashboard:80;
}
location / {
set $nb_dashboard netbird-dashboard;
proxy_pass http://$nb_dashboard:80;
}
}
health_check:
type: tcp
endpoint: localhost:443
interval: 30s
timeout: 5s
retries: 5
start_period: 20s
interfaces:
main:
name: Dashboard
description: Manage your self-hosted NetBird mesh VPN
type: ui
port: 8087
protocol: https
path: /
metadata:
author: NetBird
icon: /assets/img/app-icons/netbird.svg
website: https://netbird.io
repo: https://github.com/netbirdio/netbird
license: BSD-3-Clause
tags:
- networking
- vpn
- wireguard
- mesh
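
The proxy's `nginx.conf` above fans requests out by path. The sketch below restates that routing table as a plain function so the precedence is easy to check. `Upstream` and `route` are hypothetical names; the prefix checks are a hand translation of the `location` regexes and assume nginx's first-match order for regex locations.

```rust
/// Where the proxy sends a request (hypothetical names for illustration).
#[derive(Debug, PartialEq)]
enum Upstream {
    /// proxy_pass to netbird-server:80 (API, OIDC issuer, relay websocket)
    ServerHttp,
    /// grpc_pass to netbird-server:80
    ServerGrpc,
    /// rewritten to /index.html on netbird-dashboard:80 (OIDC callbacks)
    DashboardIndex,
    /// proxy_pass to netbird-dashboard:80
    Dashboard,
}

/// Map a request path to its upstream, checking rules in config order.
fn route(path: &str) -> Upstream {
    const GRPC_SERVICES: [&str; 3] = [
        "/signalexchange.SignalExchange/",
        "/management.ManagementService/",
        "/management.ProxyService/",
    ];
    // Mirrors `^/(api|oauth2)(/|$)`: the bare prefix or anything below it.
    let exact_or_child =
        |prefix: &str| path == prefix || path.starts_with(&format!("{prefix}/"));

    if path.starts_with("/relay")
        || path.starts_with("/ws-proxy/")
        || exact_or_child("/api")
        || exact_or_child("/oauth2")
    {
        Upstream::ServerHttp
    } else if GRPC_SERVICES.iter().any(|svc| path.starts_with(svc)) {
        Upstream::ServerGrpc
    } else if path.starts_with("/nb-auth") || path.starts_with("/nb-silent-auth") {
        Upstream::DashboardIndex
    } else {
        Upstream::Dashboard
    }
}
```

Note that `/api` only matches as a whole path segment, so a dashboard asset such as `/apiary.js` would still reach the dashboard.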


@ -171,6 +171,13 @@ impl RpcHandler {
// than the WebSocket-delivered package_data, which caused apps to flicker
// between "installed" and "not-installed" in the UI.
let (data, _) = self.state_manager.get_snapshot().await;
// Apps the user explicitly stopped must read as "stopped" even though a
// UI companion (electrs-ui, bitcoin-ui, …) keeps serving the launch port:
// launch_port_reachable() below would otherwise upgrade an exited backend
// back to "running". The reconcile guard keeps these backends down, so the
// marker is authoritative here.
let user_stopped =
crate::crash_recovery::load_user_stopped(&self.config.data_dir).await;
if data.server_info.status_info.containers_scanned && !data.package_data.is_empty() {
let mut containers = Vec::with_capacity(data.package_data.len());
for (id, pkg) in &data.package_data {
@ -202,7 +209,11 @@ impl RpcHandler {
// Scanner backoff preserves cached package_data. Refresh stable
// states so callers do not see stale `running`/`exited` after
// health-monitor recovery or Quadlet --rm container removal.
if state == "running" && requires_launch_port_for_health(id) {
if user_stopped.contains(id) {
// User stopped it → authoritative "stopped". Do NOT let a
// still-running UI companion's launch port mark it running.
state = "stopped".to_string();
} else if state == "running" && requires_launch_port_for_health(id) {
if !self.cached_reachable_health(id).await?.is_some() {
state = live_state_for_app(id)
.await
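
The hunk above makes the user-stopped marker outrank the launch-port probe. A simplified sketch of that precedence, assuming a hypothetical `resolve_state` function; the real handler also consults a health cache and `live_state_for_app`, which are reduced here to a single boolean.

```rust
use std::collections::HashSet;

/// Hypothetical reduction of the status logic: the user-stopped marker is
/// authoritative, then a reachable launch port, then the scanner's state.
fn resolve_state(
    id: &str,
    scanned_state: &str,
    user_stopped: &HashSet<String>,
    launch_port_reachable: bool,
) -> String {
    if user_stopped.contains(id) {
        // A UI companion may still answer on the launch port; ignore it.
        "stopped".to_string()
    } else if launch_port_reachable {
        "running".to_string()
    } else {
        scanned_state.to_string()
    }
}
```

The ordering of the branches is the whole point: checking the marker first means a still-running companion can no longer flip an intentionally stopped backend back to "running".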


@ -376,16 +376,31 @@ pub(super) fn startup_order(package_id: &str) -> &'static [&'static str] {
/// order for the given app. Unknown containers sort to the end.
pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec<String>> {
let containers = get_containers_for_app(package_id).await?;
Ok(order_present_containers(package_id, containers))
}
/// Order the *actually-present* containers of an app by its dependency-aware
/// startup order. Containers whose name is unknown to the order list sort to
/// the end, preserving their relative input order.
///
/// This deliberately does NOT inject order entries that aren't live
/// containers. `startup_order` is a union of container-name variants across
/// install generations (e.g. `mysql-mempool` vs `archy-mempool-db`), so any
/// single install only ever has a subset of those names. Injecting a phantom
/// name makes the start path fail on a "no such object" inspect — and because
/// `do_orchestrator_package_start` propagates the unknown-app-id fallback
/// error via `?`, every later member (the api + frontend) is then skipped,
/// leaving the stack down until the health monitor recovers it minutes later.
/// That was the source of mempool gate flakes #73 (frontend) / #74 (api).
fn order_present_containers(package_id: &str, containers: Vec<String>) -> Vec<String> {
if containers.is_empty() {
// Nothing is live under any known name. Fall back to the package id so
// a single-container app whose container matches its id still gets one
// start attempt; multi-container stacks with no live members are
// surfaced as "no containers" by the caller's emptiness check.
return vec![package_id.to_string()];
}
let order = startup_order(package_id);
if order.is_empty() && containers.is_empty() {
return Ok(vec![package_id.to_string()]);
}
let mut sorted = containers;
for required in order {
if !sorted.iter().any(|name| name == required) {
sorted.push((*required).to_string());
}
}
// If no special order is defined, fall back to mempool order for legacy
// multi-container names that may still be returned by config lookups.
let effective_order: &[&str] = if order.is_empty() {
@ -393,8 +408,14 @@ pub(super) async fn ordered_containers_for_start(package_id: &str) -> Result<Vec
} else {
order
};
sorted.sort_by_key(|c| effective_order.iter().position(|o| *o == c).unwrap_or(99));
Ok(sorted)
let mut sorted = containers;
sorted.sort_by_key(|c| {
effective_order
.iter()
.position(|o| *o == c)
.unwrap_or(usize::MAX)
});
sorted
}
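
The ordering rule in `order_present_containers` can be exercised in isolation. The sketch below is a standalone reduction, assuming a hypothetical `order_present` function that takes the order list as a parameter instead of calling `startup_order`; it omits the empty-input fallback and the legacy mempool order.

```rust
/// Hypothetical standalone reduction: sort only the containers that are
/// actually present by their position in `order`. Names missing from
/// `order` get `usize::MAX` and, because `sort_by_key` is stable, keep
/// their relative input order at the end. Nothing is ever injected.
fn order_present(order: &[&str], present: Vec<String>) -> Vec<String> {
    let mut sorted = present;
    sorted.sort_by_key(|name| {
        order
            .iter()
            .position(|known| *known == name.as_str())
            .unwrap_or(usize::MAX)
    });
    sorted
}
```

Using `usize::MAX` instead of the earlier magic `99` removes any chance of an unknown name tying with a real position in a long order list.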
/// Configure Fedimint Gateway to use LND instead of LDK.
@ -452,7 +473,48 @@ pub(super) fn configure_fedimint_lnd(
#[cfg(test)]
mod tests {
use super::{requires_unpruned_bitcoin, startup_order};
use super::{order_present_containers, requires_unpruned_bitcoin, startup_order};
#[test]
fn order_present_containers_never_injects_phantom_stack_members() {
// The live mempool stack on a node: db + api + frontend. These are the
// only real container names; the startup_order list also contains
// variant/legacy names (mysql-mempool, archy-mempool-api, ...) that are
// NOT live here and must never appear in the result — a phantom name in
// the start list aborts the orchestrator start mid-sequence (gate
// #73/#74).
let present = vec![
"mempool".to_string(),
"mempool-api".to_string(),
"archy-mempool-db".to_string(),
];
let ordered = order_present_containers("mempool", present);
// Dependency order: db -> api -> frontend.
assert_eq!(ordered, vec!["archy-mempool-db", "mempool-api", "mempool"]);
// No phantom variants leaked in.
for phantom in ["mysql-mempool", "archy-mempool-api", "archy-mempool-web"] {
assert!(
!ordered.iter().any(|c| c == phantom),
"phantom {phantom} must not be injected"
);
}
}
#[test]
fn order_present_containers_orders_known_before_unknown() {
let present = vec!["mempool".to_string(), "some-sidecar".to_string()];
let ordered = order_present_containers("mempool", present);
// The known frontend sorts ahead of an unknown sidecar.
assert_eq!(ordered, vec!["mempool", "some-sidecar"]);
}
#[test]
fn order_present_containers_empty_falls_back_to_package_id() {
assert_eq!(
order_present_containers("mempool", vec![]),
vec!["mempool".to_string()]
);
}
#[test]
fn btcpay_start_order_includes_required_stack_members() {


@ -312,7 +312,16 @@ impl RpcHandler {
let mut stopped = 0u32;
let mut removed = 0u32;
let mut errors = Vec::new();
// Two distinct failure classes, kept separate so they don't get
// conflated (the old single `errors` vec did, which caused the "ghost in
// My Apps" bug): `container_errors` means a container could NOT be
// removed (force-rm failed too) — the app is genuinely still present, so
// we keep its state entry and surface a hard error. `cleanup_errors`
// means volume/network/data-dir teardown left residue — the containers
// are already gone, so the app IS uninstalled and MUST disappear from My
// Apps; the residue is logged but never ghosts the app.
let mut container_errors: Vec<String> = Vec::new();
let mut cleanup_errors: Vec<String> = Vec::new();
self.set_uninstall_stage(
package_id,
@ -370,7 +379,7 @@ impl RpcHandler {
let msg =
format!("Failed to remove {}: {}; {}", name, stderr.trim(), e);
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
container_errors.push(msg);
}
}
}
@ -379,12 +388,35 @@ impl RpcHandler {
Err(force_err) => {
let msg = format!("Failed to remove {}: {}; {}", name, e, force_err);
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
container_errors.push(msg);
}
},
}
}
// A container that survived even force-remove means the app is NOT
// actually uninstalled — keep its state entry and fail so the spawned
// task reverts it to its prior state (and the user can retry), rather
// than orphaning a live container that's missing from My Apps.
if !container_errors.is_empty() {
tracing::error!(
"Uninstall {}: containers could not be removed: {:?}",
package_id,
container_errors
);
return Err(anyhow::anyhow!(
"Uninstall {} failed: {}",
package_id,
container_errors.join("; ")
));
}
// Containers are gone → the app is uninstalled. Remove its state entry
// NOW, before the (possibly slow, possibly fallible) volume/data
// teardown below, so My Apps updates immediately and a residue failure
// can never leave a ghost. Reinstall/scan no longer see a stale entry.
self.remove_package_state_entry(package_id).await;
self.set_uninstall_stage(package_id, "Cleaning up volumes")
.await;
// Avoid global Podman volume prune on production nodes: store-wide
@ -432,70 +464,73 @@ impl RpcHandler {
let stderr = String::from_utf8_lossy(&o.stderr);
let msg = format!("Failed to remove data {}: {}", dir, stderr.trim());
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
cleanup_errors.push(msg);
}
Err(e) => {
let msg = format!("Failed to remove data {}: {}", dir, e);
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
cleanup_errors.push(msg);
}
_ => {}
}
}
}
if !errors.is_empty() {
// The app is already gone from My Apps (entry removed above). Residual
// volume/data cleanup failures are logged but NEVER ghost the app — a
// reinstall and the next uninstall both tolerate leftover dirs.
if !cleanup_errors.is_empty() {
tracing::error!(
"Uninstall {} completed with errors: {:?}",
"Uninstall {} removed but left cleanup residue: {:?}",
package_id,
errors
cleanup_errors
);
return Err(anyhow::anyhow!(
"Uninstall {} partially failed: {}",
package_id,
errors.join("; ")
));
}
tracing::info!(
"Uninstall {} complete: stopped={}, removed={}",
"Uninstall {} complete: stopped={}, removed={}, cleanup_errors={}",
package_id,
stopped,
removed
removed,
cleanup_errors.len()
);
// Immediately remove from in-memory state so the UI updates without
// waiting for the scanner's absence threshold (3 scans × 60s each).
{
let (mut data, _rev) = self.state_manager.get_snapshot().await;
let before = data.package_data.len();
data.package_data.remove(package_id);
// Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin")
let aliases: Vec<String> = data
.package_data
.keys()
.filter(|k| {
super::config::all_container_names(package_id)
.iter()
.any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
})
.cloned()
.collect();
for alias in &aliases {
data.package_data.remove(alias);
}
if data.package_data.len() < before {
self.state_manager.update_data(data).await;
}
}
Ok(serde_json::json!({
"status": "uninstalled",
"stopped": stopped,
"removed": removed,
"cleanup_warnings": cleanup_errors,
}))
}
/// Remove a package's entry (and any alias keys) from persisted state so it
/// disappears from My Apps immediately, without waiting for the scanner's
/// absence threshold (3 scans × 60s). Called as soon as an uninstall has
/// removed the app's containers — before the slower volume/data teardown —
/// so a residue failure can never leave a ghost entry behind.
async fn remove_package_state_entry(&self, package_id: &str) {
let (mut data, _rev) = self.state_manager.get_snapshot().await;
let before = data.package_data.len();
data.package_data.remove(package_id);
// Also remove any alias keys (e.g. "bitcoin-knots" vs "bitcoin").
let aliases: Vec<String> = data
.package_data
.keys()
.filter(|k| {
super::config::all_container_names(package_id)
.iter()
.any(|c| c.strip_prefix("archy-").unwrap_or(c) == k.as_str())
})
.cloned()
.collect();
for alias in &aliases {
data.package_data.remove(alias);
}
if data.package_data.len() < before {
self.state_manager.update_data(data).await;
}
}
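
The uninstall change separates two failure classes with different consequences. A minimal sketch of that decision, assuming a hypothetical `finish_uninstall` function and a `BTreeSet` standing in for the My Apps state; both error lists are passed in pre-collected, whereas the real handler gathers cleanup errors after the state entry is already removed.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the two failure classes:
/// - container errors: the app is still present, so keep its entry and fail;
/// - cleanup errors: the containers are gone, so drop the entry and return
///   the residue as warnings instead of an error.
fn finish_uninstall(
    installed: &mut BTreeSet<String>,
    package_id: &str,
    container_errors: Vec<String>,
    cleanup_errors: Vec<String>,
) -> Result<Vec<String>, String> {
    if !container_errors.is_empty() {
        return Err(format!(
            "Uninstall {} failed: {}",
            package_id,
            container_errors.join("; ")
        ));
    }
    // Containers are gone: the app must disappear regardless of residue.
    installed.remove(package_id);
    Ok(cleanup_errors)
}
```

Returning the residue in the `Ok` value corresponds to the `cleanup_warnings` field in the handler's JSON response.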
/// Start a bundled app (create container from pre-loaded image if needed).
pub(in crate::api::rpc) async fn handle_bundled_app_start(
&self,


@ -6,7 +6,6 @@
use crate::api::rpc::RpcHandler;
use crate::data_model::InstallPhase;
use anyhow::{Context, Result};
use base64::Engine;
use std::process::Output;
use std::time::Duration;
use tracing::info;
@ -696,6 +695,16 @@ fn immich_stack_app_ids() -> &'static [&'static str] {
&["immich-postgres", "immich-redis", "immich"]
}
fn netbird_stack_app_ids() -> &'static [&'static str] {
// Dependency/startup order: the combined management/signal/relay server
// first (it owns the base64 relay/store secrets + the sqlite store, and is
// the OIDC issuer the others point at), then the dashboard SPA, then the
// user-facing TLS proxy ("netbird", which carries the self-signed cert +
// the templated nginx.conf and is the launcher). Mirrors the netbird
// startup_order in dependencies.rs.
&["netbird-server", "netbird-dashboard", "netbird"]
}
fn indeedhub_stack_app_ids() -> &'static [&'static str] {
// Dependency order: backends + their generated secrets first, then the api
// (owns indeedhub-jwt; reads the db/minio secrets the backends materialised),
@ -715,10 +724,6 @@ fn indeedhub_stack_app_ids() -> &'static [&'static str] {
const REGISTRY: &str = "146.59.87.168:3000/lfg2025";
const NETBIRD_DASHBOARD_IMAGE: &str = "docker.io/netbirdio/dashboard:v2.38.0";
const NETBIRD_SERVER_IMAGE: &str = "docker.io/netbirdio/netbird-server:0.71.2";
const NETBIRD_PROXY_IMAGE: &str = "docker.io/library/nginx:1.27-alpine";
/// Pull an image with retry and exponential backoff (3 attempts).
async fn pull_image_with_retry(image: &str) -> Result<()> {
let exists = podman_stack_status(&["image", "exists", image], PODMAN_STACK_PROBE_TIMEOUT).await;
@ -1828,6 +1833,27 @@ impl RpcHandler {
/// Install self-hosted NetBird (combined management/signal/relay server, dashboard, and TLS proxy).
pub(super) async fn install_netbird_stack(&self) -> Result<serde_json::Value> {
// Manifest-driven path (#20 phase 4): render the 3-member stack from
// apps/netbird-*/manifest.yml via the orchestrator — dedicated
// netbird-net + network_aliases, base64 generated_secrets, a self-signed
// TLS cert (generated_certs) so the dashboard gets a secure context for
// OIDC PKCE (#15), and templated config.yaml/nginx.conf rendered from
// host facts + the netbird-net gateway. The manifests use the exact live
// container names, so on an existing node this ADOPTS the running stack
// rather than recreating it (the sqlite store + base64 keys are
// preserved — ensure_generated_secrets no-ops on existing files).
//
// #20 ph4: the legacy hardcoded `podman run` installer was DELETED — the
// signed catalog always ships apps/netbird-*/manifest.yml, so there is no
// in-Rust fallback. If the orchestrator doesn't know these app_ids and no
// running stack exists to adopt, install errors rather than silently
// diverging from the manifest contract.
if let Some(orchestrated) =
install_stack_via_orchestrator(self, "netbird", netbird_stack_app_ids()).await?
{
return Ok(orchestrated);
}
if let Some(adopted) = adopt_stack_if_exists(
"netbird",
"netbird",
@ -1838,491 +1864,12 @@ impl RpcHandler {
return Ok(adopted);
}
install_log("INSTALL START: netbird stack (dashboard + server)").await;
info!("Installing self-hosted NetBird stack");
self.set_install_phase("netbird", InstallPhase::PullingImage)
.await;
for (i, image) in [
NETBIRD_DASHBOARD_IMAGE,
NETBIRD_SERVER_IMAGE,
NETBIRD_PROXY_IMAGE,
]
.iter()
.enumerate()
{
self.set_install_progress("netbird", i as u64, 3).await;
pull_image_with_retry(image)
.await
.with_context(|| format!("Failed to pull NetBird image: {}", image))?;
}
self.set_install_progress("netbird", 3, 3).await;
for name in ["netbird", "netbird-dashboard", "netbird-server"] {
let _ = podman_stack_status(&["rm", "-f", name], PODMAN_STACK_PROBE_TIMEOUT).await;
}
let _ = podman_stack_status(
&["network", "rm", "-f", "netbird-net"],
PODMAN_STACK_PROBE_TIMEOUT,
anyhow::bail!(
"netbird manifests not available on this node — the signed catalog must provide apps/netbird-*/manifest.yml (legacy hardcoded installer removed in #20 ph4)"
)
.await;
self.set_install_phase("netbird", InstallPhase::CreatingContainer)
.await;
tokio::fs::create_dir_all("/var/lib/archipelago/netbird/data")
.await
.context("Failed to create NetBird data directory")?;
let host_ip = detect_netbird_public_host_ip()
.await
.unwrap_or_else(|| self.config.host_ip.clone());
// Create the network FIRST so we can read back the gateway it was
// assigned — that gateway is Podman's aardvark DNS, which the proxy's
// nginx needs as an explicit `resolver` to re-resolve container names
// (issue #15: without it nginx caches a container IP and 502s forever
// once that IP changes on restart/reboot).
let _ = podman_stack_status(
&["network", "create", "netbird-net"],
PODMAN_STACK_PROBE_TIMEOUT,
)
.await;
let resolver_ip = netbird_net_resolver_ip().await;
write_netbird_config_files(&host_ip, &self.config.host_ip, &resolver_ip).await?;
ensure_netbird_tls_cert(&host_ip).await?;
let mut server_cmd = tokio::process::Command::new("podman");
server_cmd.args([
"run",
"-d",
"--name",
"netbird-server",
"--network",
"netbird-net",
"--network-alias",
"netbird-server",
"--restart=unless-stopped",
"-p",
"8086:80",
"-p",
"3478:3478/udp",
"-v",
"/var/lib/archipelago/netbird/data:/var/lib/netbird",
"-v",
"/var/lib/archipelago/netbird/config.yaml:/etc/netbird/config.yaml:ro",
NETBIRD_SERVER_IMAGE,
"--config",
"/etc/netbird/config.yaml",
]);
run_required_stack_command("netbird", "create server", &mut server_cmd).await?;
self.set_install_phase("netbird", InstallPhase::StartingContainer)
.await;
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
let mut dashboard_cmd = tokio::process::Command::new("podman");
dashboard_cmd.args([
"run",
"-d",
"--name",
"netbird-dashboard",
"--network",
"netbird-net",
// Explicit alias so the proxy can always resolve `netbird-dashboard`
// via Podman DNS — don't rely on implicit container-name aliasing.
"--network-alias",
"netbird-dashboard",
"--restart=unless-stopped",
"--env-file",
"/var/lib/archipelago/netbird/dashboard.env",
NETBIRD_DASHBOARD_IMAGE,
]);
run_required_stack_command("netbird", "create dashboard", &mut dashboard_cmd).await?;
let mut proxy_cmd = tokio::process::Command::new("podman");
proxy_cmd.args([
"run",
"-d",
"--name",
"netbird",
"--network",
"netbird-net",
"--restart=unless-stopped",
// 8087 publishes the TLS listener — netbird's dashboard requires a
// secure context (window.crypto.subtle / OIDC PKCE), issue #15.
"-p",
"8087:443",
"-v",
"/var/lib/archipelago/netbird/nginx.conf:/etc/nginx/conf.d/default.conf:ro",
"-v",
"/var/lib/archipelago/netbird/tls.crt:/etc/nginx/tls.crt:ro",
"-v",
"/var/lib/archipelago/netbird/tls.key:/etc/nginx/tls.key:ro",
NETBIRD_PROXY_IMAGE,
]);
run_required_stack_command("netbird", "create unified proxy", &mut proxy_cmd).await?;
wait_for_stack_containers(
"netbird",
&["netbird-server", "netbird-dashboard", "netbird"],
60,
)
.await?;
self.set_install_phase("netbird", InstallPhase::WaitingHealthy)
.await;
// Containers being "running" is NOT the same as the embedded OIDC
// provider being ready (#10). The dashboard SPA opens right after install
// and, if it loads before /oauth2/.well-known is served, caches a bad
// auth state — the user appears logged-in but can't log out until it
// self-corrects. Wait (best-effort) for OIDC discovery to answer before
// we report Done, so the first dashboard load sees a ready provider.
wait_for_netbird_oidc_ready(Duration::from_secs(60)).await;
self.set_install_phase("netbird", InstallPhase::PostInstall)
.await;
self.set_install_phase("netbird", InstallPhase::Done).await;
self.clear_install_progress("netbird").await;
install_log("INSTALL OK: netbird stack").await;
info!("NetBird stack installed");
Ok(serde_json::json!({
"success": true,
"package_id": "netbird",
"message": "NetBird self-hosted stack installed",
}))
}
}
/// Best-effort wait for NetBird's embedded OIDC provider to start serving its
/// discovery document. The management server publishes 8086:80 on the host and
/// is the issuer at `/oauth2`, so its `.well-known/openid-configuration` is the
/// signal that the dashboard's login/logout flow will work. Polls until a 2xx
/// or the timeout — NEVER fails the install (the stack is already running; this
/// only narrows the post-install race window in #10).
async fn wait_for_netbird_oidc_ready(timeout: Duration) {
let url = "http://127.0.0.1:8086/oauth2/.well-known/openid-configuration";
let client = match reqwest::Client::builder()
.timeout(Duration::from_secs(5))
.build()
{
Ok(c) => c,
Err(_) => return,
};
let deadline = tokio::time::Instant::now() + timeout;
loop {
if let Ok(resp) = client.get(url).send().await {
if resp.status().is_success() {
info!("NetBird OIDC discovery is ready");
return;
}
}
if tokio::time::Instant::now() >= deadline {
info!("NetBird OIDC discovery not ready within timeout — proceeding anyway");
return;
}
tokio::time::sleep(Duration::from_secs(2)).await;
}
}
async fn read_or_generate_b64_secret(name: &str) -> String {
let path = format!("/var/lib/archipelago/secrets/{}", name);
if let Ok(val) = tokio::fs::read_to_string(&path).await {
let trimmed = val.trim().to_string();
if !trimmed.is_empty() {
return trimmed;
}
}
let mut buf = [0u8; 32];
rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut buf);
let secret = base64::engine::general_purpose::STANDARD.encode(buf);
let _ = tokio::fs::create_dir_all("/var/lib/archipelago/secrets").await;
let _ = tokio::fs::write(&path, &secret).await;
secret
}
/// Read the gateway of the `netbird-net` bridge. Podman runs its aardvark DNS
/// resolver on this address, so nginx can use it as an explicit `resolver` to
/// re-resolve container names at request time. Falls back to Podman's usual
/// first-pool gateway if the inspect fails (best effort — config is rewritten
/// on every (re)install).
async fn netbird_net_resolver_ip() -> String {
let out = tokio::process::Command::new("podman")
.args([
"network",
"inspect",
"netbird-net",
"--format",
"{{range .Subnets}}{{.Gateway}}{{end}}",
])
.output()
.await;
if let Ok(o) = out {
let gw = String::from_utf8_lossy(&o.stdout).trim().to_string();
if !gw.is_empty() && gw.parse::<std::net::IpAddr>().is_ok() {
return gw;
}
}
"10.89.0.1".to_string()
}
/// Generate a self-signed TLS cert for the netbird proxy if absent. The
/// dashboard needs a secure context (window.crypto.subtle / OIDC PKCE), so the
/// proxy serves HTTPS; a self-signed cert is sufficient (the user accepts it
/// once when opening netbird in a tab). SAN covers the LAN IP plus
/// localhost/127.0.0.1 so it's valid however the box is reached locally.
async fn ensure_netbird_tls_cert(host_ip: &str) -> Result<()> {
let dir = "/var/lib/archipelago/netbird";
let crt = format!("{dir}/tls.crt");
let key = format!("{dir}/tls.key");
if tokio::fs::metadata(&crt).await.is_ok() && tokio::fs::metadata(&key).await.is_ok() {
return Ok(());
}
let _ = tokio::fs::create_dir_all(dir).await;
let san = format!("subjectAltName=IP:{host_ip},IP:127.0.0.1,DNS:localhost");
let status = tokio::process::Command::new("openssl")
.args([
"req",
"-x509",
"-newkey",
"rsa:2048",
"-nodes",
"-keyout",
&key,
"-out",
&crt,
"-days",
"3650",
"-subj",
&format!("/CN={host_ip}"),
"-addext",
&san,
])
.status()
.await
.context("failed to run openssl for netbird TLS cert")?;
if !status.success() {
anyhow::bail!("openssl failed to generate netbird TLS cert");
}
Ok(())
}
async fn write_netbird_config_files(host_ip: &str, lan_ip: &str, resolver_ip: &str) -> Result<()> {
// netbird's dashboard uses window.crypto.subtle (OIDC PKCE), which browsers
// only expose in a SECURE context — so the proxy serves HTTPS and every
// origin here is https (issue #15: over plain http the dashboard threw
// "window.crypto.subtle is unavailable" and never reached login).
let public_origin = format!("https://{}:8087", host_ip);
let server_origin = format!("http://{}:8086", host_ip);
// A single box is reached via several addresses. Allow the OIDC login flow
// to redirect back to whichever origin the user actually used, otherwise
// post-login lands on the wrong host and the dashboard shows
// "Unauthenticated" (issue #15). The browser-side CORS is handled in the
// nginx proxy; this covers the redirect-URI allow-list.
let lan_origin = format!("https://{}:8087", lan_ip);
let mut redirect_origins = vec![public_origin.clone()];
if lan_origin != public_origin {
redirect_origins.push(lan_origin);
}
let dashboard_redirect_uris = redirect_origins
.iter()
.flat_map(|o| {
[
format!(" - \"{o}/nb-auth\""),
format!(" - \"{o}/nb-silent-auth\""),
]
})
.collect::<Vec<_>>()
.join("\n");
let dashboard_logout_uris = redirect_origins
.iter()
.map(|o| format!(" - \"{o}/\""))
.collect::<Vec<_>>()
.join("\n");
let relay_secret = read_or_generate_b64_secret("netbird-relay-auth-secret").await;
let encryption_key = read_or_generate_b64_secret("netbird-store-encryption-key").await;
let config = format!(
r#"server:
listenAddress: ":80"
exposedAddress: "{public_origin}"
stunPorts:
- 3478
metricsPort: 9090
healthcheckAddress: ":9000"
logLevel: "info"
logFile: "console"
authSecret: "{relay_secret}"
dataDir: "/var/lib/netbird"
auth:
issuer: "{public_origin}/oauth2"
localAuthDisabled: false
signKeyRefreshEnabled: false
dashboardRedirectURIs:
{dashboard_redirect_uris}
dashboardPostLogoutRedirectURIs:
{dashboard_logout_uris}
cliRedirectURIs:
- "http://localhost:53000/"
store:
engine: "sqlite"
encryptionKey: "{encryption_key}"
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/config.yaml", config)
.await
.context("Failed to write NetBird config.yaml")?;
let dashboard_env = format!(
r#"NETBIRD_MGMT_API_ENDPOINT={public_origin}
NETBIRD_MGMT_GRPC_API_ENDPOINT={public_origin}
AUTH_AUDIENCE=netbird-dashboard
AUTH_CLIENT_ID=netbird-dashboard
AUTH_CLIENT_SECRET=
AUTH_AUTHORITY={public_origin}/oauth2
USE_AUTH0=false
AUTH_SUPPORTED_SCOPES=openid profile email groups
AUTH_REDIRECT_URI=/nb-auth
AUTH_SILENT_REDIRECT_URI=/nb-silent-auth
NETBIRD_TOKEN_SOURCE=idToken
NGINX_SSL_PORT=443
LETSENCRYPT_DOMAIN=none
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/dashboard.env", dashboard_env)
.await
.context("Failed to write NetBird dashboard.env")?;
let nginx_conf = format!(
r#"server {{
listen 443 ssl;
server_name _;
# netbird's dashboard needs a secure context (window.crypto.subtle for OIDC
# PKCE), so the proxy terminates TLS with a self-signed cert (issue #15).
ssl_certificate /etc/nginx/tls.crt;
ssl_certificate_key /etc/nginx/tls.key;
# Rootless Podman can hand a container a new IP across restarts/reboots.
# nginx resolves a literal upstream name ONCE at startup and caches it, so
# after the IP moves every request 502s with "host unreachable" (issue #15,
# observed live on .198: nginx pinned to a dead netbird-dashboard IP). Fix:
# point `resolver` at the netbird-net gateway (Podman's aardvark DNS) and
# use VARIABLE upstreams, which forces nginx to re-resolve the container
# names at request time. Everything is reached container-to-container by
# name so nothing depends on host-published ports either.
resolver {resolver_ip} valid=10s ipv6=off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_http_version 1.1;
location ~ ^/(relay|ws-proxy/) {{
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 1d;
}}
location ~ ^/(api|oauth2)(/|$) {{
# The dashboard is a SPA whose API/OIDC base URL is baked at build time
# to one host:port. A single box is reached via several addresses (LAN
# IP, Tailscale 100.x, hostname), so those fetches are cross-origin and
# the browser blocks them with no Access-Control-Allow-Origin (issue
# #15, observed live on .198). Reflect the caller's Origin so the
# self-hosted management/OIDC API is reachable from any of them, and
# answer the CORS preflight here.
if ($request_method = OPTIONS) {{
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
add_header Access-Control-Max-Age 86400 always;
add_header Content-Length 0;
return 204;
}}
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Credentials true always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, PATCH, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Accept" always;
set $nb_server netbird-server;
proxy_pass http://$nb_server:80;
}}
location ~ ^/(signalexchange\.SignalExchange|management\.ManagementService|management\.ProxyService)/ {{
set $nb_server netbird-server;
grpc_pass grpc://$nb_server:80;
grpc_read_timeout 1d;
grpc_send_timeout 1d;
}}
# OIDC callback routes are client-side SPA routes with NO prebuilt page in
# the dashboard bundle, so proxying them straight through 404s which
# crashes the dashboard's auth init and shows "Unauthenticated" with dead
# buttons (issue #15, confirmed live on .198: /nb-auth + /nb-silent-auth
# returned 404). Serve the dashboard's index.html at these paths (URL
# unchanged) so react-oidc boots and completes the login / silent-SSO.
location ~ ^/(nb-auth|nb-silent-auth) {{
set $nb_dashboard netbird-dashboard;
rewrite ^.*$ /index.html break;
proxy_pass http://$nb_dashboard:80;
}}
location / {{
set $nb_dashboard netbird-dashboard;
proxy_pass http://$nb_dashboard:80;
}}
}}
# Direct server remains available for diagnostics at {server_origin}.
"#
);
tokio::fs::write("/var/lib/archipelago/netbird/nginx.conf", nginx_conf)
.await
.context("Failed to write NetBird nginx.conf")?;
Ok(())
}
async fn detect_netbird_public_host_ip() -> Option<String> {
let output = tokio::process::Command::new("hostname")
.args(["-I"])
.output()
.await
.ok()?;
let stdout = String::from_utf8_lossy(&output.stdout);
let ips: Vec<&str> = stdout
.split_whitespace()
.filter(|s| s.contains('.'))
.collect();
// Prefer the LAN address as the canonical origin — that's what users browse
// to on the local network. Baking the Tailscale 100.x address here broke
// LAN access with cross-origin/redirect mismatches (issue #15). Tailscale
// (100.64.0.0/10 CGNAT) is only a fallback for nodes with no LAN IP.
let is_private_lan = |ip: &str| {
ip.starts_with("192.168.")
|| ip.starts_with("10.")
|| (ip.starts_with("172.")
&& ip
.split('.')
.nth(1)
.and_then(|o| o.parse::<u8>().ok())
.map(|o| (16..=31).contains(&o))
.unwrap_or(false))
};
if let Some(lan) = ips.iter().find(|ip| is_private_lan(ip)) {
return Some(lan.to_string());
}
ips.iter()
.find(|ip| ip.starts_with("100."))
.map(|s| s.to_string())
}
#[cfg(test)]
mod tests {
use super::{btcpay_stack_app_ids, mempool_stack_app_ids};
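The address preference in `detect_netbird_public_host_ip` (an RFC 1918 LAN address first, a Tailscale CGNAT address only as a fallback) can also be written against typed addresses. This is a minimal sketch, not part of the change: `is_cgnat` and `pick_canonical_ip` are hypothetical names. Unlike the `starts_with("100.")` check above, it accepts only 100.64.0.0/10.

```rust
use std::net::Ipv4Addr;

/// True for 100.64.0.0/10, the CGNAT range named in the comment above.
fn is_cgnat(ip: Ipv4Addr) -> bool {
    let [a, b, _, _] = ip.octets();
    a == 100 && (64..=127).contains(&b)
}

/// Hypothetical helper: choose the canonical IPv4 address from the
/// whitespace-separated output of `hostname -I`.
/// Non-IPv4 tokens (for example IPv6 addresses) are ignored.
fn pick_canonical_ip(hostname_i_output: &str) -> Option<Ipv4Addr> {
    let ips: Vec<Ipv4Addr> = hostname_i_output
        .split_whitespace()
        .filter_map(|s| s.parse::<Ipv4Addr>().ok())
        .collect();
    // Prefer a private LAN address (10/8, 172.16/12, 192.168/16).
    ips.iter()
        .copied()
        .find(|ip| ip.is_private())
        // Otherwise fall back to a CGNAT address, if any.
        .or_else(|| ips.iter().copied().find(|ip| is_cgnat(*ip)))
}
```

Parsing into `Ipv4Addr` first means malformed tokens are dropped before any range check runs, which the string-prefix version cannot guarantee.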


@ -66,7 +66,7 @@ pub struct Config {
/// through Quadlet (`.container` units in ~/.config/containers/systemd
/// + systemctl --user start) instead of `podman create + start`. Default
/// off so the legacy path stays the production path until the harness
- /// at tests/lifecycle/run-20x.sh has gone green against the new path
/// at tests/lifecycle/run-gate.sh has gone green against the new path
/// on .228 + .198. See `project_v1_7_52_phase3_quadlet_design`.
#[serde(default)]
pub use_quadlet_backends: bool,
@ -487,7 +487,7 @@ mod tests {
#[test]
fn test_config_use_quadlet_backends_defaults_off() {
- // Phase 3.2 of v1.7.52 — the new path stays gated until the 20×
// Phase 3.2 of v1.7.52 — the new path stays gated until the 5×
// harness goes green on .228 and .198. Flipping this default
// ahead of that would route every backend install through code
// we haven't fleet-validated yet.


@ -96,6 +96,35 @@ impl BootReconciler {
}
}
// Companion self-heal runs on its OWN cadence, decoupled from the
// per-app reconcile pass. On a heavily loaded node `reconcile_existing`
// over dozens of apps can take well over a minute, which would delay a
// companion-unit repair (deleted/lost unit file) past any reasonable
// safety window. Detecting + rewriting a companion unit is cheap, so it
// gets a dedicated `interval` loop. The handle is aborted when the main
// loop exits (shutdown uses `notify_one`, so we must NOT add a second
// waiter on `self.shutdown` — it would steal the single wake permit).
let companion_handle = if self.companion_stage {
let orchestrator = self.orchestrator.clone();
let interval = self.interval;
Some(tokio::spawn(async move {
loop {
let installed = orchestrator.manifest_ids().await;
for (companion, err) in crate::container::companion::reconcile(&installed).await
{
tracing::warn!(
companion = %companion,
error = %err,
"companion reconcile failed"
);
}
time::sleep(interval).await;
}
}))
} else {
None
};
// Initial pass: no delay.
self.tick().await;
@ -111,23 +140,15 @@ impl BootReconciler {
}
}
}
if let Some(handle) = companion_handle {
handle.abort();
}
}
async fn tick(&self) {
let report = self.orchestrator.reconcile_existing().await;
Self::log_report(&report);
- if !self.companion_stage {
- return;
- }
- let installed = self.orchestrator.manifest_ids().await;
- for (companion, err) in crate::container::companion::reconcile(&installed).await {
- tracing::warn!(
- companion = %companion,
- error = %err,
- "companion reconcile failed"
- );
- }
}
fn log_report(report: &ReconcileReport) {
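The change above moves companion repair onto its own loop so that a slow main pass cannot delay it. The same shape is sketched below with std threads and a stop flag. This is an assumption-laden stand-in: the real code uses `tokio::spawn` and `handle.abort()`, and `SideLoop` and `spawn_side_loop` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Handle for a background loop that runs on its own cadence.
struct SideLoop {
    stop: Arc<AtomicBool>,
    handle: thread::JoinHandle<()>,
}

/// Run `work` once immediately, then again after every `interval`,
/// independently of whatever the caller's main loop is doing.
fn spawn_side_loop<F>(interval: Duration, mut work: F) -> SideLoop
where
    F: FnMut() + Send + 'static,
{
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let handle = thread::spawn(move || loop {
        work();
        thread::sleep(interval);
        // The flag is private to this loop, so no other waiter's wake-up is consumed.
        if flag.load(Ordering::SeqCst) {
            break;
        }
    });
    SideLoop { stop, handle }
}

impl SideLoop {
    /// Ask the loop to stop and wait for it (at most about one `interval`).
    fn shutdown(self) {
        self.stop.store(true, Ordering::SeqCst);
        let _ = self.handle.join();
    }
}
```

A thread cannot be aborted mid-sleep the way a tokio task can, so shutdown here may wait up to one interval. That is the main trade-off of this std-only variant.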


@ -285,7 +285,15 @@ async fn ensure_image_present(spec: &CompanionSpec) -> Result<String> {
async fn image_exists(image: &str) -> bool {
let mut cmd = Command::new("podman");
- cmd.args(["image", "inspect", image]);
// Only the exit status matters. WITHOUT a `--format`, `podman image inspect`
// prints the image's full multi-KB manifest JSON; `.status()` inherits the
// service's stdout, so on a hit that whole blob lands in the journal — once
// per companion image, every reconcile pass. That flood spikes journald +
// IO and starves the async runtime (UI websocket then drops → "connection
// lost"/reconnect). Discard the child's stdout/stderr; we read neither.
cmd.args(["image", "inspect", image])
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null());
match tokio::time::timeout(COMPANION_IMAGE_CHECK_TIMEOUT, cmd.status()).await {
Ok(Ok(status)) => status.success(),
Ok(Err(err)) => {
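The fix above depends on one property: when only the exit status is needed, the child's stdout and stderr should go to the null device rather than being inherited. A minimal synchronous sketch with `std::process` follows. The real code uses tokio's `Command` with a timeout, and `succeeds_quietly` is a hypothetical name.

```rust
use std::process::{Command, Stdio};

/// Run `program` with `args` and report only whether it exited successfully.
/// Both output streams are discarded, so nothing the child prints can reach
/// the parent's stdout, stderr, or journal.
fn succeeds_quietly(program: &str, args: &[&str]) -> bool {
    Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        // A spawn failure (binary missing, etc.) counts as "not successful".
        .unwrap_or(false)
}
```

With this shape, a call such as `succeeds_quietly("podman", &["image", "inspect", image])` would answer the existence question without printing the manifest JSON.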


@ -691,16 +691,37 @@ fn extract_lan_address(ports: &[String]) -> Option<String> {
None
}
/// netbird's dashboard launch URL: HTTPS on 8087 (the proxy terminates TLS —
/// the dashboard needs a secure context for OIDC PKCE, issue #15) at the node's
/// primary host IP so it's reachable from the LAN. Manifest-driven netbird no
/// longer writes `dashboard.env`, so this is derived from host facts (the same
/// `{{HOST_IP}}` the orchestrator bakes into the cert/config); it falls back to
/// the static localhost mapping when the host IP can't be read. URL shape is
/// identical to the legacy installer's, so the existing https reachability
/// wrapper still applies.
async fn netbird_configured_launch_url() -> Option<String> {
- let env = tokio::fs::read_to_string("/var/lib/archipelago/netbird/dashboard.env")
if let Some(ip) = first_host_ip().await {
return Some(format!("https://{ip}:8087"));
}
PodmanClient::lan_address_for("netbird")
}
/// First address from `hostname -I` — the node's primary host IP. Mirrors the
/// orchestrator's `detect_host_ip` so launch URLs match the cert/config the
/// orchestrator renders for `{{HOST_IP}}`.
async fn first_host_ip() -> Option<String> {
let out = tokio::process::Command::new("hostname")
.arg("-I")
.output()
.await
.ok()?;
- env.lines()
- .find_map(|line| line.strip_prefix("NETBIRD_MGMT_API_ENDPOINT="))
- .map(str::trim)
- .filter(|s| !s.is_empty())
if !out.status.success() {
return None;
}
String::from_utf8_lossy(&out.stdout)
.split_whitespace()
.next()
.map(ToOwned::to_owned)
- .or_else(|| PodmanClient::lan_address_for("netbird"))
}
async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Option<String> {


@ -26,7 +26,7 @@
use anyhow::{Context, Result};
use archipelago_container::{
AppManifest, ContainerRuntime as ContainerRuntimeTrait, ContainerState, ContainerStatus,
- Dependency, GeneratedFile, HostFacts, ManifestError, ResolvedSource, SecretsProvider,
Dependency, HostFacts, ManifestError, ResolvedSource, SecretsProvider,
};
use async_trait::async_trait;
use std::collections::{HashMap, HashSet};
@ -294,6 +294,20 @@ async fn chown_for_rootless_container(uid_gid: &str, path: &str) -> Result<()> {
))
}
/// `(container-id, mount-dest)` pairs whose in-container chown returned a hard,
/// permanent failure (e.g. "Operation not permitted" on a mount that can't be
/// re-owned from inside the userns). Remembered for the life of the process so
/// the per-reconcile repair stops re-attempting them — otherwise a single
/// unrepairable mount (observed: mempool-api `/data`) burns CPU + floods the
/// journal on every pass. Keyed by Id so a recreated container retries afresh.
fn unrepairable_ownership() -> &'static std::sync::Mutex<std::collections::HashSet<(String, String)>>
{
static SET: std::sync::OnceLock<
std::sync::Mutex<std::collections::HashSet<(String, String)>>,
> = std::sync::OnceLock::new();
SET.get_or_init(|| std::sync::Mutex::new(std::collections::HashSet::new()))
}
/// App-agnostic, userns-mapping-proof volume-ownership repair for a RUNNING
/// container.
///
@ -332,6 +346,13 @@ async fn ensure_running_container_ownership(name: &str) -> bool {
.filter(|g| !g.is_empty())
.unwrap_or_else(|| uid.clone());
// Stable identity of THIS container instance — used to remember mounts whose
// chown is hard-unrepairable so we stop hammering them every reconcile. Keyed
// by Id (not name) so a recreated container gets a fresh repair attempt.
let cid = podman_stdout(&["inspect", name, "--format", "{{.Id}}"])
.await
.unwrap_or_default();
// Writable bind-mount destinations only.
let dests = match podman_stdout(&[
"inspect",
@ -359,6 +380,19 @@ async fn ensure_running_container_ownership(name: &str) -> bool {
continue;
}
// Known hard-unrepairable for this container instance (a previous chown
// returned a permanent error like "Operation not permitted"). Skip the
// probe+chown entirely — retrying every reconcile only burns CPU and
// floods the journal; it will never succeed for this instance.
if !cid.is_empty()
&& unrepairable_ownership()
.lock()
.map(|s| s.contains(&(cid.clone(), dest.to_string())))
.unwrap_or(false)
{
continue;
}
// Drift check: can the service user write here already?
let probe = format!(
"t=\"{dest}/.archy-wtest.$$\"; touch \"$t\" 2>/dev/null && rm -f \"$t\" 2>/dev/null"
@ -395,11 +429,21 @@ async fn ensure_running_container_ownership(name: &str) -> bool {
"repaired unwritable volume ownership (in-container chown)"
);
}
- Ok(o) => tracing::warn!(
- container = %name, dest,
- "volume ownership repair failed: {}",
- String::from_utf8_lossy(&o.stderr).trim()
- ),
Ok(o) => {
// Permanent failure (e.g. "Operation not permitted" on a mount
// that simply can't be re-owned from inside the userns). Record
// it so we don't re-attempt every reconcile — log once, loudly.
if !cid.is_empty() {
if let Ok(mut s) = unrepairable_ownership().lock() {
s.insert((cid.clone(), dest.to_string()));
}
}
tracing::warn!(
container = %name, dest,
"volume ownership repair failed (won't retry for this container instance): {}",
String::from_utf8_lossy(&o.stderr).trim()
)
}
Err(e) => {
tracing::warn!(container = %name, dest, "volume ownership repair errored: {e}")
}
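The hunks above add a process-lifetime record of permanently failed repairs, keyed by container id and mount destination. The sketch below isolates that pattern. It assumes the repair is any closure returning success or failure; `repair_once` is a hypothetical wrapper, not a function in this change.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Process-wide set of `(container_id, mount_dest)` pairs known to be unrepairable.
fn unrepairable() -> &'static Mutex<HashSet<(String, String)>> {
    static SET: OnceLock<Mutex<HashSet<(String, String)>>> = OnceLock::new();
    SET.get_or_init(|| Mutex::new(HashSet::new()))
}

/// Hypothetical wrapper: run `repair` unless this pair already failed permanently.
/// Returns `None` when the attempt was skipped, otherwise the repair's result.
fn repair_once<F>(cid: &str, dest: &str, repair: F) -> Option<bool>
where
    F: FnOnce() -> bool,
{
    let key = (cid.to_string(), dest.to_string());
    // An empty id means the inspect failed; never skip or cache on it.
    if !cid.is_empty()
        && unrepairable()
            .lock()
            .map(|set| set.contains(&key))
            .unwrap_or(false)
    {
        return None;
    }
    let ok = repair();
    if !ok && !cid.is_empty() {
        if let Ok(mut set) = unrepairable().lock() {
            set.insert(key);
        }
    }
    Some(ok)
}
```

Because the key includes the container id, a recreated container (new id) gets a fresh attempt, which matches the intent stated in the doc comment.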
@ -469,7 +513,18 @@ async fn http_host_port_ready(port: u16, path: &str) -> bool {
}
async fn wait_for_manifest_host_ports(manifest: &AppManifest, timeout_secs: u64) -> Result<()> {
- for port in manifest.app.ports.iter().map(|p| p.host) {
// Only TCP host ports are reachability-probed: the probe is a TCP connect,
// which a UDP/SCTP listener (e.g. netbird's 3478/udp STUN) can never answer,
// so probing it would always "fail" and drive an endless host-port repair
// loop (observed on .228 after netbird's manifest deploy). Default protocol
// (empty) is tcp.
for port in manifest
.app
.ports
.iter()
.filter(|p| matches!(p.protocol.to_ascii_lowercase().as_str(), "" | "tcp"))
.map(|p| p.host)
{
let ready = match manifest.app.id.as_str() {
"uptime-kuma" => wait_for_http_host_port(port, "/", timeout_secs).await,
_ => wait_for_host_port(port, timeout_secs).await,
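The protocol filter above can be exercised in isolation. In this sketch `PortSpec` is a simplified stand-in for the manifest's port entry; only the `host` and `protocol` fields seen in the diff are assumed. `tcp_probe_ports` is a hypothetical name.

```rust
/// Simplified stand-in for a manifest port entry.
struct PortSpec {
    host: u16,
    protocol: String,
}

/// Host ports that a TCP connect probe can meaningfully check.
/// An empty protocol is treated as the default, tcp; matching is case-insensitive.
fn tcp_probe_ports(ports: &[PortSpec]) -> Vec<u16> {
    ports
        .iter()
        .filter(|p| matches!(p.protocol.to_ascii_lowercase().as_str(), "" | "tcp"))
        .map(|p| p.host)
        .collect()
}
```

A UDP entry such as a STUN port is dropped before probing, so it can no longer register as a permanently failing port.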
@ -646,6 +701,70 @@ async fn remove_stale_podman_socket_path(socket_path: &str) {
}
}
/// For a bind-mount source we're about to `mkdir -p` (as root), return the
/// nearest pre-existing ancestor (whose ownership we copy) and the TOPMOST dir
/// that doesn't yet exist on the path to it (the root of the subtree mkdir will
/// create). Chowning that subtree to the anchor fixes nested bind sources
/// (`<dataroot>/<app>/<subdir>`) where `mkdir -p` would otherwise leave the
/// intermediate `<app>` dir root-owned. See `ensure_bind_mount_dirs`.
fn fresh_subtree_anchor(source: &Path) -> (Option<PathBuf>, PathBuf) {
let mut top = source.to_path_buf();
let mut cur = top.parent().map(Path::to_path_buf);
let mut anchor = None;
while let Some(p) = cur {
if p.exists() {
anchor = Some(p);
break;
}
cur = p.parent().map(Path::to_path_buf);
top = p;
}
(anchor, top)
}
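To make the return value of `fresh_subtree_anchor` concrete, the sketch below repeats the function and runs it against a throwaway directory. `demo_anchor` and the `anchor-demo-<pid>` directory name are illustrative assumptions, not part of the change.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Same walk as above: the nearest existing ancestor, plus the topmost
/// directory that `mkdir -p` would have to create.
fn fresh_subtree_anchor(source: &Path) -> (Option<PathBuf>, PathBuf) {
    let mut top = source.to_path_buf();
    let mut cur = top.parent().map(Path::to_path_buf);
    let mut anchor = None;
    while let Some(p) = cur {
        if p.exists() {
            anchor = Some(p);
            break;
        }
        cur = p.parent().map(Path::to_path_buf);
        top = p;
    }
    (anchor, top)
}

/// Create `<tmp>/anchor-demo-<pid>` and query `<root>/app/config` while
/// neither `app` nor `config` exists. Returns (root, anchor, top).
fn demo_anchor() -> (PathBuf, Option<PathBuf>, PathBuf) {
    let root = std::env::temp_dir().join(format!("anchor-demo-{}", std::process::id()));
    fs::create_dir_all(&root).expect("create demo root");
    let (anchor, top) = fresh_subtree_anchor(&root.join("app").join("config"));
    let _ = fs::remove_dir_all(&root);
    (root, anchor, top)
}
```

For the nested source, the anchor is the existing root and the top of the new subtree is `<root>/app`, so a recursive chown of `top` covers both `app` and `config`.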
/// True when `pid` names a live process (its `/proc/<pid>` entry exists).
/// `pid <= 0` is never alive. (Best-effort: a reused PID can read as alive, but
/// that only delays zombie detection a cycle — it never recreates a healthy one.)
fn pid_is_alive(pid: i32) -> bool {
pid > 0 && Path::new(&format!("/proc/{pid}")).exists()
}
/// Whether the process backing a podman **"running"** container is actually alive.
///
/// Podman trusts its own state DB: if a container's conmon dies without podman
/// observing it (a cgroup-cascade SIGKILL when `archipelago.service` restarts, a
/// crash), `podman ps` keeps reporting the container **"Up"** long after the
/// process is gone — a ZOMBIE. It serves nothing (its port is dead), yet the
/// reconciler NoOps it forever because the state says Running. Verify the
/// recorded main PID is alive so the caller can recreate a zombie rather than
/// trust the stale "running".
///
/// Conservative by design: any uncertainty (inspect failed, PID unparseable)
/// returns `true` (assume alive) so a transient podman hiccup never destroys a
/// healthy container. Only a concrete, dead PID returns `false`.
///
/// Observed live on .228 (2026-06-25): `netbird-dashboard` reported "Up" with
/// `State.Pid` 1394766 already gone → its nginx proxy 502'd → NetBird login
/// broke ("Unauthenticated"). The reconciler never recovered it because the
/// dashboard publishes no host port, so the Running branch had nothing to probe.
async fn container_running_process_alive(name: &str) -> bool {
let out = match tokio::process::Command::new("podman")
.args(["inspect", "--format", "{{.State.Pid}}", name])
.output()
.await
{
Ok(o) if o.status.success() => o,
_ => return true, // can't determine — don't destabilize a healthy app
};
match String::from_utf8_lossy(&out.stdout).trim().parse::<i32>() {
// A genuinely running container always has a supervised PID > 0 whose
// /proc entry exists. A dead PID (or PID <= 0 alongside state "running")
// is the anomaly we're catching.
Ok(pid) => pid_is_alive(pid),
Err(_) => true, // unparseable (older podman / odd output) — assume alive
}
}
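The conservative rule in `container_running_process_alive` (only a concrete dead PID counts as dead) is easier to test once the parsing is separated from the `podman` call. This is a sketch under that assumption; `alive_from_inspect_output` is a hypothetical helper and the `/proc` check is Linux-specific.

```rust
use std::path::Path;

/// True when `/proc/<pid>` exists; `pid <= 0` is never alive.
fn pid_is_alive(pid: i32) -> bool {
    pid > 0 && Path::new(&format!("/proc/{pid}")).exists()
}

/// Hypothetical pure helper: decide liveness from the raw text that
/// `podman inspect --format '{{.State.Pid}}'` printed.
/// Unparseable output is treated as alive so uncertainty never triggers a recreate.
fn alive_from_inspect_output(stdout: &str) -> bool {
    match stdout.trim().parse::<i32>() {
        Ok(pid) => pid_is_alive(pid),
        Err(_) => true,
    }
}
```

Keeping the decision pure means the edge cases (PID 0, garbage output) can be covered without a container runtime.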
async fn wait_for_container_stable_running(
runtime: &dyn ContainerRuntimeTrait,
name: &str,
@ -894,7 +1013,7 @@ pub struct ProdContainerOrchestrator {
/// Quadlet `.container` unit and starts it via systemctl --user
/// instead of shelling out to `podman create + start`. Default
/// false so the legacy path remains the production path until the
- /// 20× lifecycle harness goes green against the new path.
/// 5× lifecycle harness goes green against the new path.
use_quadlet_backends: bool,
#[cfg(test)]
test_disk_gb: Option<u64>,
@ -1207,6 +1326,11 @@ impl ProdContainerOrchestrator {
async fn reconcile_all_with_mode(&self, mode: ReconcileMode) -> ReconcileReport {
let user_stopped = crate::crash_recovery::load_user_stopped(&self.data_dir).await;
// Durable desired-state signal: the container names that were running at
// the last periodic snapshot. Used below to recreate a previously-running
// app whose container vanished (e.g. a wedged teardown cleared by a
// reboot) instead of leaving it down. See the immich .198 incident.
let was_running = crate::crash_recovery::load_last_running_names(&self.data_dir).await;
let manifests: Vec<LoadedManifest> = {
let state = self.state.read().await;
let dependency_required = dependency_manifests_required_by_active_apps(
@ -1240,6 +1364,34 @@ impl ProdContainerOrchestrator {
continue;
}
match self.ensure_running_with_mode(&lm, mode).await {
// Desired-state recovery: the app has no container and was left
// "absent" by boot reconcile, BUT it was running at the last
// snapshot — so its container vanished unexpectedly (a wedged
// teardown cleared by a reboot, a lost container record after a
// crash). It isn't user-stopped (those are filtered out of
// `manifests` above) and it's still installed (manifest present),
// so recreate it rather than leave a previously-running app down.
// Match is exact: compute_container_name == the snapshot's podman
// name (incl. each stack member), so no false positives. The only
// "absent" Left reason is the optional-missing case, so this never
// fires for paused/unknown states.
Ok(ReconcileAction::Left(reason))
if mode == ReconcileMode::ExistingOnly
&& reason == "absent"
&& was_running.contains(&compute_container_name(&lm.manifest)) =>
{
tracing::warn!(
app_id = %app_id,
"previously-running app has no container after boot — recreating (desired-state recovery)"
);
match self.install_fresh(&lm).await {
Ok(()) => report.record(&app_id, ReconcileAction::Installed),
Err(e) => {
tracing::error!(app_id = %app_id, error = %e, "desired-state recovery (recreate) failed");
report.failures.push((app_id, e.to_string()));
}
}
}
Ok(action) => report.record(&app_id, action),
Err(e) => {
tracing::error!(app_id = %app_id, error = %e, "reconcile failed");
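The match guard for desired-state recovery combines three conditions. Written as a standalone predicate it looks roughly like this. `Mode` is a stand-in for `ReconcileMode`, its `Other` variant is an assumption (the diff only shows `ExistingOnly`), and `should_recreate` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Stand-in for `ReconcileMode`; `Other` is a placeholder for any non-boot mode.
#[derive(Clone, Copy, PartialEq, Eq)]
enum Mode {
    ExistingOnly,
    Other,
}

/// Recreate only when all three hold: this is the existing-only pass, the
/// container was left as "absent", and its name was in the last running snapshot.
fn should_recreate(
    mode: Mode,
    left_reason: &str,
    was_running: &HashSet<String>,
    container_name: &str,
) -> bool {
    mode == Mode::ExistingOnly
        && left_reason == "absent"
        && was_running.contains(container_name)
}
```

Any other `Left` reason, such as the `"user-stopped"` value returned further down, falls through to the normal `Ok(action)` arm.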
@ -1326,6 +1478,27 @@ impl ProdContainerOrchestrator {
self.resolve_dynamic_env(&mut resolved_manifest)?;
let name = compute_container_name(&lm.manifest);
// An explicitly user-stopped app MUST stay stopped. The reconcile filter
// already drops user-stopped apps, but its `dependency_required` override
// re-includes a stopped app that an *active* app depends on (e.g. mempool
// keeps electrumx in the list), and the in-memory `disabled` set is wiped
// on manifest reload — so reconcile would resurrect it: its now-unreachable
// ports look like a fault, the host-port "repair" restarts it, and
// package.stop never sticks. Honour the on-disk marker here, the single
// choke point every reconcile flows through. Explicit install/start/restart
// clear the marker BEFORE calling this, so they are unaffected.
{
let user_stopped = crate::crash_recovery::load_user_stopped(&self.data_dir).await;
if user_stopped.contains(&app_id) || user_stopped.contains(&name) {
tracing::debug!(
app_id = %app_id,
container = %name,
"reconcile skipped — app is user-stopped (must stay stopped)"
);
return Ok(ReconcileAction::Left("user-stopped".into()));
}
}
match self.runtime.get_container_status(&name).await {
Ok(status) => {
// Phase 3.3: migrate pre-Phase-3 containers in place, but only
@ -1341,6 +1514,26 @@ impl ProdContainerOrchestrator {
}
match status.state {
ContainerState::Running => {
// Zombie guard: podman can report a container "running"
// after its process has died (conmon SIGKILLed in a
// cgroup cascade on archipelago restart, etc.). Such a
// container serves nothing yet would be NoOp'd forever.
// Recreate it from the manifest. This is the ONLY path
// that recovers a dead dependency with no published host
// port (netbird-dashboard on .228, 2026-06-25 — stale
// "Up" → proxy 502 → NetBird login broke). Conservative:
// only fires on a concrete dead PID, never on uncertainty.
if !container_running_process_alive(&name).await {
tracing::warn!(
app_id = %app_id,
container = %name,
"container reported running but its process is dead (zombie) — recreating"
);
let _ = self.runtime.stop_container(&name).await;
let _ = self.runtime.remove_container(&name).await;
self.install_fresh(lm).await?;
return Ok(ReconcileAction::Installed);
}
// App-specific hooks get a chance to refresh bind-mounted
// config. bitcoin-ui: re-render nginx.conf if the RPC
// password rotated (or template changed via OTA). If
@ -1717,7 +1910,7 @@ impl ProdContainerOrchestrator {
} else {
self.remove_quadlet_unit_if_present(&name).await?;
ensure_user_podman_socket().await?;
- // Legacy path. Production until tests/lifecycle/run-20x.sh
// Legacy path. Production until tests/lifecycle/run-gate.sh
// goes green against the Quadlet path.
self.runtime
.create_container(&resolved_manifest, &name, 0)
@ -1788,6 +1981,9 @@ impl ProdContainerOrchestrator {
self.run_pre_start_hooks(&manifest.app.id).await?;
self.ensure_bind_mount_sockets(manifest).await?;
self.ensure_bind_mount_dirs(manifest).await?;
// Certs before files: a templated file may not need the cert, but the
// container's bind-mounts expect both present before create_container.
self.ensure_manifest_certs(manifest).await?;
self.ensure_manifest_files(manifest).await?;
self.apply_data_uid(manifest).await?;
self.run_post_data_uid_hooks(&manifest.app.id).await?;
@ -2695,6 +2891,21 @@ impl ProdContainerOrchestrator {
continue;
}
// Whether the bind source already existed BEFORE we (root) create it,
// so the ownership fix-up below only touches a dir we just made.
let source_existed = Path::new(&volume.source).exists();
// Capture — BEFORE the root mkdir — the nearest pre-existing ancestor
// (the "anchor" whose ownership we copy) and the TOPMOST dir `mkdir -p`
// will newly create. For a NESTED bind source like
// `<dataroot>/<app>/<subdir>` (jellyfin /config + /cache, netbird
// /data), `mkdir -p` creates the intermediate `<app>` dir root:root
// too, so referencing the *immediate* parent copied ROOT — leaving the
// dir unwritable and the app EACCES-crash-looping on reinstall. Anchor
// instead to the nearest dir that already existed (the rootless data
// root, owned by the service user) and chown the whole new subtree.
let (anchor, top_created) = fresh_subtree_anchor(Path::new(&volume.source));
let mkdir_status = host_sudo(&["mkdir", "-p", &volume.source])
.await
.with_context(|| format!("mkdir {}", volume.source))?;
@ -2705,6 +2916,39 @@ impl ProdContainerOrchestrator {
mkdir_status.code()
));
}
// A bind dir we JUST created is owned root:root (mkdir ran via sudo).
// An app that declares no `data_uid` runs as its own root inside the
// container, which rootless Podman maps to the host user running
// archipelago — so a root:root dir is UNWRITABLE from inside and the
// app EACCES-crash-loops the moment it tries to create a subdir. The
// in-container ownership self-heal only runs on RUNNING containers, so
// it never fires for an app that crashes on startup. Match the new
// subtree to the anchor's owner via `--reference` (no host-uid
// guessing). Only on fresh creation, and only when apply_data_uid
// won't already chown it.
if !source_existed && manifest.app.container.data_uid.is_none() {
if let Some(anchor) = anchor {
match host_sudo(&[
"chown",
"-R",
&format!("--reference={}", anchor.display()),
&top_created.display().to_string(),
])
.await
{
Ok(s) if s.success() => {}
Ok(s) => tracing::warn!(
app_id = %manifest.app.id, dir = %volume.source,
"bind-dir ownership match exited {:?} (app may EACCES)", s.code()
),
Err(e) => tracing::warn!(
app_id = %manifest.app.id, dir = %volume.source,
"bind-dir ownership match failed (non-fatal): {e}"
),
}
}
}
}
Ok(())
}
@ -2729,7 +2973,14 @@ impl ProdContainerOrchestrator {
async fn ensure_manifest_files(&self, manifest: &AppManifest) -> Result<HookOutcome> {
let mut outcome = HookOutcome::Unchanged;
for file in &manifest.app.files {
- if ensure_generated_file(file)
// Render templated placeholders before comparing/writing so the
// idempotency check is against the FINAL bytes (not the template),
// otherwise a rendered file would be rewritten every reconcile.
let rendered = self
.render_file_placeholders(manifest, &file.content)
.await
.with_context(|| format!("rendering manifest file {}", file.path))?;
if ensure_rendered_file(&file.path, &rendered, file.overwrite)
.await
.with_context(|| format!("ensure manifest file {}", file.path))?
== HookOutcome::Rewritten
@ -2739,23 +2990,186 @@ impl ProdContainerOrchestrator {
}
Ok(outcome)
}
/// Substitute the allow-listed placeholders a manifest `GeneratedFile` may
/// carry. Keeps runtime-derived config (netbird's `config.yaml`/`nginx.conf`)
/// declarative instead of generated by per-app Rust:
/// - `{{HOST_IP}}` / `{{HOST_MDNS}}` — host facts (`hostname -I` / `.local`).
/// - `{{NETWORK_GATEWAY}}` — the gateway of the app's podman network, i.e.
/// aardvark's DNS address. nginx uses it as an explicit `resolver` so it
/// re-resolves container names per request instead of pinning a stale IP
/// and 502-ing after a restart/reboot (issue #15). The network is ensured
/// to exist first so the gateway is readable on a fresh install (this runs
/// before `install_fresh`'s own `ensure_container_network`; both idempotent).
/// - `{{secret:NAME}}` — a `0600` secret read from the service-owned secrets
/// dir (e.g. netbird's base64 relay/store keys). NEVER logged.
async fn render_file_placeholders(
&self,
manifest: &AppManifest,
content: &str,
) -> Result<String> {
let mut out = content.to_string();
if out.contains("{{HOST_IP}}") || out.contains("{{HOST_MDNS}}") {
let facts = self.detect_host_facts();
out = out
.replace("{{HOST_IP}}", &facts.host_ip)
.replace("{{HOST_MDNS}}", &facts.host_mdns);
}
if out.contains("{{NETWORK_GATEWAY}}") {
self.ensure_container_network(manifest).await?;
let gw = self.network_gateway(manifest).await?;
out = out.replace("{{NETWORK_GATEWAY}}", &gw);
}
out = self.render_secret_placeholders(&out).await?;
Ok(out)
}
/// Replace every `{{secret:NAME}}` with the trimmed contents of
/// `<secrets_dir>/NAME`. `NAME` must be a bare filename (the same safety bar
/// as `secret_env`). The secret value is never placed in an error or log.
async fn render_secret_placeholders(&self, content: &str) -> Result<String> {
const OPEN: &str = "{{secret:";
let mut out = String::with_capacity(content.len());
let mut rest = content;
while let Some(start) = rest.find(OPEN) {
out.push_str(&rest[..start]);
let after = &rest[start + OPEN.len()..];
let end = after
.find("}}")
.ok_or_else(|| anyhow::anyhow!("unterminated {{{{secret:...}}}} placeholder"))?;
let name = &after[..end];
if name.is_empty() || name.contains('/') || name.contains("..") {
anyhow::bail!("invalid secret placeholder name '{name}' (must be a bare filename)");
}
let value = tokio::fs::read_to_string(self.secrets_dir.join(name))
.await
.map_err(|_| {
// Do not surface the path-with-value or io detail beyond the name.
anyhow::anyhow!("secret '{name}' referenced by a manifest file is missing")
})?;
out.push_str(value.trim());
rest = &after[end + 2..];
}
out.push_str(rest);
Ok(out)
}
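The scan above can be tried in isolation. The following is a minimal std-only sketch, not the orchestrator's code: `render_secret_placeholders_sketch` and the in-memory `secrets` map are hypothetical stand-ins for the async function and the files under the secrets dir, and errors are plain `String`s instead of `anyhow` errors.

```rust
use std::collections::HashMap;

/// Std-only sketch of the placeholder scan. `secrets` is a hypothetical
/// in-memory stand-in for the files under the secrets dir.
fn render_secret_placeholders_sketch(
    content: &str,
    secrets: &HashMap<String, String>,
) -> Result<String, String> {
    const OPEN: &str = "{{secret:";
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    while let Some(start) = rest.find(OPEN) {
        // Copy everything before the placeholder verbatim.
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        let end = after
            .find("}}")
            .ok_or_else(|| "unterminated secret placeholder".to_string())?;
        let name = &after[..end];
        // Same bar as the original: bare filename only, no traversal.
        if name.is_empty() || name.contains('/') || name.contains("..") {
            return Err(format!("invalid secret placeholder name '{name}'"));
        }
        // The error names the secret but never carries its value.
        let value = secrets
            .get(name)
            .ok_or_else(|| format!("secret '{name}' is missing"))?;
        out.push_str(value.trim());
        rest = &after[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

A trailing newline in the stored secret is trimmed, so a line such as `key: {{secret:relay_key}};` renders without a stray line break.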
/// The gateway IP of the app's podman network — aardvark's DNS resolver
/// address. (Generalised from the old per-app netbird resolver helper,
/// deleted in #20 ph4.) Falls back to
/// podman's usual first-pool gateway if the inspect can't be parsed (the
/// network was just ensured to exist, so this is a belt-and-braces default).
async fn network_gateway(&self, manifest: &AppManifest) -> Result<String> {
let network = manifest
.app
.container
.network
.as_deref()
.filter(|n| !n.is_empty() && !is_builtin_network_mode(n))
.ok_or_else(|| {
anyhow::anyhow!("{{{{NETWORK_GATEWAY}}}} used but app has no dedicated network")
})?;
let out = tokio::process::Command::new("podman")
.args([
"network",
"inspect",
network,
"--format",
"{{range .Subnets}}{{.Gateway}}{{end}}",
])
.output()
.await
.with_context(|| format!("inspecting podman network {network} for gateway"))?;
let gw = String::from_utf8_lossy(&out.stdout).trim().to_string();
if !gw.is_empty() && gw.parse::<std::net::IpAddr>().is_ok() {
return Ok(gw);
}
tracing::warn!(
network,
"could not read network gateway; falling back to 10.89.0.1"
);
Ok("10.89.0.1".to_string())
}
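The parse-or-fallback step at the end of `network_gateway` can be exercised without podman. This is a sketch under assumptions: `gateway_or_fallback` is a hypothetical helper name, and the input strings are made-up examples of what the inspect template might print.

```rust
use std::net::IpAddr;

/// Fallback used when the inspect output is not a single parseable IP.
const FALLBACK_GATEWAY: &str = "10.89.0.1";

/// Hypothetical helper: accept the trimmed stdout only if it parses as
/// exactly one IP address, otherwise return the fallback.
fn gateway_or_fallback(inspect_stdout: &str) -> String {
    let gw = inspect_stdout.trim();
    if !gw.is_empty() && gw.parse::<IpAddr>().is_ok() {
        gw.to_string()
    } else {
        FALLBACK_GATEWAY.to_string()
    }
}
```

One consequence worth noting: because the template concatenates every subnet's gateway with no separator, a network with more than one subnet would presumably print something like `10.89.3.1fd00::1`, which fails the parse and lands on the fallback.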
/// Materialise manifest-declared self-signed TLS certs before the container
/// is created (so a bind-mounted cert path resolves to a real file). Skips an
/// entry whose crt+key already exist (idempotent / data-preserving). CN and
/// SAN templates are rendered against host facts; when omitted they default
/// to the node's host IP plus `127.0.0.1`/`localhost` so the cert is valid
/// however the box is reached locally. (Generalised from the old per-app
/// netbird TLS helper, deleted in #20 ph4: rsa:2048, 10-year, no per-app Rust.)
async fn ensure_manifest_certs(&self, manifest: &AppManifest) -> Result<()> {
let facts = self.detect_host_facts();
let render = |s: &str| {
s.replace("{{HOST_IP}}", &facts.host_ip)
.replace("{{HOST_MDNS}}", &facts.host_mdns)
};
for cert in &manifest.app.container.generated_certs {
if tokio::fs::metadata(&cert.crt).await.is_ok()
&& tokio::fs::metadata(&cert.key).await.is_ok()
{
continue;
}
if let Some(parent) = Path::new(&cert.crt).parent() {
create_dir_all_or_sudo(parent).await?;
}
if let Some(parent) = Path::new(&cert.key).parent() {
create_dir_all_or_sudo(parent).await?;
}
let cn = render(cert.common_name.as_deref().unwrap_or("{{HOST_IP}}"));
let san = if cert.sans.is_empty() {
format!("IP:{},IP:127.0.0.1,DNS:localhost", facts.host_ip)
} else {
cert.sans
.iter()
.map(|s| render(s))
.collect::<Vec<_>>()
.join(",")
};
let status = tokio::process::Command::new("openssl")
.args([
"req",
"-x509",
"-newkey",
"rsa:2048",
"-nodes",
"-keyout",
&cert.key,
"-out",
&cert.crt,
"-days",
"3650",
"-subj",
&format!("/CN={cn}"),
"-addext",
&format!("subjectAltName={san}"),
])
.status()
.await
.with_context(|| format!("running openssl for manifest cert {}", cert.crt))?;
if !status.success() {
anyhow::bail!("openssl failed to generate manifest cert {}", cert.crt);
}
}
Ok(())
}
}
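How the SAN string handed to `openssl -addext` is assembled can be sketched as a pure function. The helper names `render_host_facts` and `subject_alt_name` are hypothetical, and the host IP and mDNS name in the usage are invented sample values.

```rust
/// Hypothetical helper mirroring the `render` closure: substitute host facts.
fn render_host_facts(template: &str, host_ip: &str, host_mdns: &str) -> String {
    template
        .replace("{{HOST_IP}}", host_ip)
        .replace("{{HOST_MDNS}}", host_mdns)
}

/// Build the subjectAltName value: manifest-declared SANs when present,
/// otherwise the host IP plus loopback and `localhost`.
fn subject_alt_name(sans: &[&str], host_ip: &str, host_mdns: &str) -> String {
    if sans.is_empty() {
        format!("IP:{host_ip},IP:127.0.0.1,DNS:localhost")
    } else {
        sans.iter()
            .map(|s| render_host_facts(s, host_ip, host_mdns))
            .collect::<Vec<_>>()
            .join(",")
    }
}
```

With no SANs declared and a host IP of `192.168.1.20`, the result is `IP:192.168.1.20,IP:127.0.0.1,DNS:localhost`.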
async fn ensure_generated_file(file: &GeneratedFile) -> Result<HookOutcome> {
let path = Path::new(&file.path);
if let Ok(existing) = tokio::fs::read_to_string(path).await {
if existing == file.content || !file.overwrite {
async fn ensure_rendered_file(path: &str, content: &str, overwrite: bool) -> Result<HookOutcome> {
let p = Path::new(path);
if let Ok(existing) = tokio::fs::read_to_string(p).await {
if existing == content || !overwrite {
return Ok(HookOutcome::Unchanged);
}
} else if path.exists() && !file.overwrite {
} else if p.exists() && !overwrite {
return Ok(HookOutcome::Unchanged);
}
let parent = path
let parent = p
.parent()
.ok_or_else(|| anyhow::anyhow!("generated file path has no parent: {}", file.path))?;
.ok_or_else(|| anyhow::anyhow!("generated file path has no parent: {}", path))?;
create_dir_all_or_sudo(parent).await?;
write_generated_file_atomically(path, &file.content).await?;
write_generated_file_atomically(p, content).await?;
Ok(HookOutcome::Rewritten)
}
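The idempotency rule in `ensure_rendered_file` reduces to a small decision table. The sketch below models it as a pure function; `OnDisk`, `WriteDecision`, and `decide_write` are hypothetical names introduced for illustration, not types from the codebase.

```rust
/// What was found at the target path (hypothetical model of the fs probes).
enum OnDisk<'a> {
    Missing,
    /// The path exists but could not be read as UTF-8 text.
    Unreadable,
    Text(&'a str),
}

#[derive(Debug, PartialEq)]
enum WriteDecision {
    Unchanged,
    Rewrite,
}

/// Compare against the FINAL rendered bytes, and never clobber an existing
/// file unless `overwrite` is set.
fn decide_write(on_disk: OnDisk<'_>, rendered: &str, overwrite: bool) -> WriteDecision {
    match on_disk {
        OnDisk::Text(current) if current == rendered || !overwrite => WriteDecision::Unchanged,
        OnDisk::Unreadable if !overwrite => WriteDecision::Unchanged,
        _ => WriteDecision::Rewrite,
    }
}
```

This matches the behaviour the tests further down rely on: an operator-edited file survives when `overwrite` is false, and is replaced only when the manifest opts in.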
@ -2839,6 +3253,11 @@ impl ContainerOrchestrator for ProdContainerOrchestrator {
let mut state = self.state.write().await;
state.disabled.remove(app_id);
}
// Installing is an explicit "I want this running" action — clear the
// user-stopped marker so the new reconcile guard in
// `ensure_running_with_mode` doesn't skip the very container we're
// installing. (start/restart RPC handlers clear it on their side too.)
crate::crash_recovery::clear_user_stopped(&self.data_dir, app_id).await;
// Idempotent: if the container is already up and healthy, just
// refresh hooks and return. If it's stopped, start it. If it's
// missing or in a wedged state, install fresh.
@ -2882,6 +3301,10 @@ impl ContainerOrchestrator for ProdContainerOrchestrator {
let mut state = self.state.write().await;
state.disabled.remove(app_id);
}
// Explicit start clears the user-stopped marker so the reconcile guard in
// `ensure_running_with_mode` doesn't skip this container (symmetric with
// install; the start/restart RPC handlers also clear it).
crate::crash_recovery::clear_user_stopped(&self.data_dir, app_id).await;
let lm = self.loaded(app_id).await?;
let action = self.ensure_running(&lm).await?;
match action {
@ -3924,15 +4347,15 @@ app:
let data_dir = tempfile::tempdir().unwrap();
orch.insert_manifest_for_test(
pull_manifest_with_generated_file(
"meshtastic",
"docker.io/meshtastic/meshtasticd:daily-alpine",
"exampleapp",
"docker.io/example/exampleapp:latest",
data_dir.path().to_string_lossy().as_ref(),
),
PathBuf::from("/tmp/meshtastic"),
PathBuf::from("/tmp/exampleapp"),
)
.await;
orch.install("meshtastic").await.unwrap();
orch.install("exampleapp").await.unwrap();
let config_path = data_dir.path().join("config.yaml");
let config = std::fs::read_to_string(config_path).unwrap();
@ -3940,7 +4363,7 @@ app:
let calls = rt.calls();
assert!(calls
.iter()
.any(|c| c == "create_container:meshtastic:offset=0"));
.any(|c| c == "create_container:exampleapp:offset=0"));
}
#[tokio::test]
@ -3954,15 +4377,15 @@ app:
orch.insert_manifest_for_test(
pull_manifest_with_generated_file(
"meshtastic",
"docker.io/meshtastic/meshtasticd:daily-alpine",
"exampleapp",
"docker.io/example/exampleapp:latest",
data_dir.path().to_string_lossy().as_ref(),
),
PathBuf::from("/tmp/meshtastic"),
PathBuf::from("/tmp/exampleapp"),
)
.await;
orch.install("meshtastic").await.unwrap();
orch.install("exampleapp").await.unwrap();
let config = std::fs::read_to_string(config_path).unwrap();
assert_eq!(config, "key: operator\n");
@ -3979,15 +4402,15 @@ app:
orch.insert_manifest_for_test(
pull_manifest_with_generated_file_overwrite(
"meshtastic",
"docker.io/meshtastic/meshtasticd:daily-alpine",
"exampleapp",
"docker.io/example/exampleapp:latest",
data_dir.path().to_string_lossy().as_ref(),
),
PathBuf::from("/tmp/meshtastic"),
PathBuf::from("/tmp/exampleapp"),
)
.await;
orch.install("meshtastic").await.unwrap();
orch.install("exampleapp").await.unwrap();
let config = std::fs::read_to_string(config_path).unwrap();
assert_eq!(config, "key: new\n");
@ -4497,4 +4920,47 @@ app:
)
);
}
#[test]
fn fresh_subtree_anchor_handles_nested_and_direct() {
let tmp = tempfile::tempdir().unwrap();
let root = tmp.path(); // the pre-existing "data root"
// Direct child (immich-style): anchor is the data root, subtree top is
// the child itself.
let direct = root.join("immich");
let (anchor, top) = fresh_subtree_anchor(&direct);
assert_eq!(anchor.as_deref(), Some(root));
assert_eq!(top, direct);
// Nested (jellyfin-style /config): the intermediate `jellyfin` dir does
// NOT exist yet, so the anchor must skip past it to the data root and the
// subtree top is `jellyfin` — chowning that -R fixes both levels. The old
// code referenced the immediate parent (`jellyfin`), which mkdir -p makes
// root-owned → the EACCES bug.
let nested = root.join("jellyfin").join("config");
let (anchor, top) = fresh_subtree_anchor(&nested);
assert_eq!(anchor.as_deref(), Some(root));
assert_eq!(top, root.join("jellyfin"));
// Second volume of the same app: now `jellyfin` exists (created for the
// first volume), so the anchor is `jellyfin` and only `cache` is new.
std::fs::create_dir(root.join("jellyfin")).unwrap();
let (anchor, top) = fresh_subtree_anchor(&root.join("jellyfin").join("cache"));
assert_eq!(anchor.as_deref(), Some(root.join("jellyfin").as_path()));
assert_eq!(top, root.join("jellyfin").join("cache"));
}
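The behaviour this test pins down can be sketched as a walk up the path until an existing ancestor is found. The real `fresh_subtree_anchor` is not shown in this diff, so the function below is a hypothetical reconstruction from the assertions above, not the project's implementation.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical reconstruction: returns (nearest existing ancestor, topmost
/// directory that would be newly created underneath it).
fn fresh_subtree_anchor_sketch(path: &Path) -> (Option<PathBuf>, PathBuf) {
    let mut top = path.to_path_buf();
    loop {
        // An empty parent means a relative path ran out of components.
        let parent = top
            .parent()
            .filter(|p| !p.as_os_str().is_empty())
            .map(Path::to_path_buf);
        match parent {
            // First ancestor that already exists: that is the anchor.
            Some(parent) if parent.exists() => return (Some(parent), top),
            // Parent is also new: it becomes the candidate subtree top.
            Some(parent) => top = parent,
            None => return (None, top),
        }
    }
}
```

For `<root>/jellyfin/config` with only `<root>` present, the sketch returns `<root>` as the anchor and `<root>/jellyfin` as the subtree top, so a recursive chown of the top covers both new levels.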
#[test]
fn pid_is_alive_detects_live_and_dead_pids() {
// Our own process is alive.
assert!(pid_is_alive(std::process::id() as i32));
// Non-positive PIDs are never alive (a "running" container with PID 0 is
// exactly the zombie case).
assert!(!pid_is_alive(0));
assert!(!pid_is_alive(-1));
// A PID far above the kernel's pid_max can't name a live process, so the
// zombie guard reports it dead → the reconciler recreates.
assert!(!pid_is_alive(2_000_000_000));
}
}


@ -581,11 +581,12 @@ pub async fn write_if_changed(unit: &QuadletUnit, dir: &Path) -> Result<bool> {
/// Reload the user systemd manager. Required after any quadlet write
/// or removal so systemd picks up the generated `.service` translation.
pub async fn daemon_reload_user() -> Result<()> {
let status = Command::new("systemctl")
.args(["--user", "daemon-reload"])
.status()
// Bounded: a wedged user manager (e.g. a unit stuck "deactivating" while
// podman hangs) could otherwise block daemon-reload indefinitely and freeze
// any caller — notably uninstall teardown.
let status = systemctl_user_status(&["daemon-reload"], Duration::from_secs(30))
.await
.context("spawn systemctl --user daemon-reload")?;
.context("systemctl --user daemon-reload")?;
if !status.success() {
return Err(anyhow!("systemctl --user daemon-reload exited {status}"));
}
@ -787,11 +788,19 @@ fn directive_values(unit_body: &str, prefix: &str) -> Vec<String> {
/// that systemd no longer knows about.
pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
let svc = format!("{unit_name}.service");
// Stop first; ignore failure (unit may already be down).
let _ = Command::new("systemctl")
.args(["--user", "stop", &svc])
.status()
.await;
// Stop first; ignore failure (unit may already be down). BOUNDED — on
// rootless podman a generated unit can wedge in "deactivating" while
// `podman rm -f` hangs underneath it, and an unbounded `systemctl stop`
// would block the entire uninstall forever: the progress bar freezes and
// the package entry is stranded in `Removing` (a ghost in My Apps that also
// blocks reinstall). If the graceful stop times out, escalate to
// SIGKILL + reset-failed so teardown always proceeds.
if systemctl_user_status(&["stop", &svc], QUADLET_STOP_TIMEOUT)
.await
.is_err()
{
let _ = kill_and_reset_service(&svc).await;
}
let path = dir.join(format!("{unit_name}.container"));
if fs::try_exists(&path).await.unwrap_or(false) {
match fs::remove_file(&path).await {
@ -802,10 +811,15 @@ pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
}
daemon_reload_user().await.ok();
// Defensive: kill the actual container too, in case quadlet left it.
let _ = Command::new("podman")
.args(["rm", "-f", unit_name])
.status()
.await;
// Bounded so a hung podman store can't re-introduce the stall this function
// exists to avoid.
let _ = tokio::time::timeout(
QUADLET_STOP_TIMEOUT,
Command::new("podman")
.args(["rm", "-f", unit_name])
.status(),
)
.await;
Ok(())
}
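The "bounded stop, then escalate" pattern used in `disable_remove` can be illustrated without tokio or systemd. The sketch below uses a worker thread and `recv_timeout` as a stand-in for `tokio::time::timeout`; `run_bounded` and `stop_with_escalation` are hypothetical names, and the closures stand in for the `systemctl stop` call and the kill/reset step.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run `work` on a helper thread and wait at most `limit` for its result.
/// Returns `None` on timeout. Unlike dropping a tokio future, the helper
/// thread keeps running in the background after a timeout.
fn run_bounded<T, F>(work: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may be gone after a timeout; ignore the send error.
        let _ = tx.send(work());
    });
    rx.recv_timeout(limit).ok()
}

/// Try a graceful stop first; if it does not finish in time, escalate so
/// teardown always proceeds.
fn stop_with_escalation<S, K>(graceful_stop: S, force_kill: K, limit: Duration) -> &'static str
where
    S: FnOnce() -> bool + Send + 'static,
    K: FnOnce(),
{
    match run_bounded(graceful_stop, limit) {
        Some(_) => "stopped",
        None => {
            force_kill();
            "killed"
        }
    }
}
```

The design point is that the caller's progress never depends on the slow operation returning; only the escalation path is required to be fast.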


@ -66,6 +66,7 @@ fn ensure_one(dir: &Path, gs: &GeneratedSecret) -> Result<()> {
match gs.kind {
SecretGenKind::Hex16 => write_secret(&dir.join(&gs.name), &random_hex(16))?,
SecretGenKind::Hex32 => write_secret(&dir.join(&gs.name), &random_hex(32))?,
SecretGenKind::Base64 => write_secret(&dir.join(&gs.name), &random_base64(32))?,
SecretGenKind::Bcrypt => {
let password = random_hex(BCRYPT_PASSWORD_BYTES);
let hash = bcrypt::hash(&password, bcrypt::DEFAULT_COST)
@ -92,6 +93,15 @@ fn random_hex(bytes: usize) -> String {
hex::encode(buf)
}
/// `bytes` of entropy, standard base64 (with padding). For keys that a service
/// base64-decodes to recover the raw bytes (e.g. netbird's store encryptionKey).
fn random_base64(bytes: usize) -> String {
use base64::Engine as _;
let mut buf = vec![0u8; bytes];
rand::thread_rng().fill_bytes(&mut buf);
base64::engine::general_purpose::STANDARD.encode(buf)
}
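To make the output format of `random_base64` concrete, here is a dependency-free sketch of standard base64 with padding. It is an illustration only; the code above uses the `base64` crate's `STANDARD` engine, and `base64_standard` is a hypothetical name.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with `=` padding: every 3 input bytes become 4 characters.
fn base64_standard(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Missing input bytes are represented by padding characters.
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}
```

A 32-byte key therefore always encodes to 44 characters ending in a single `=`, which is the shape a consumer that base64-decodes the secret should expect.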
/// Atomically write a `0600` secret: a temp file in the same dir (so the rename
/// is atomic), fsynced, then renamed over the target.
fn write_secret(path: &Path, value: &str) -> Result<()> {


@ -61,6 +61,22 @@ pub async fn load_user_stopped(data_dir: &Path) -> std::collections::HashSet<Str
}
}
/// Names of the containers that were running at the last periodic snapshot
/// (`running-containers.json`, saved every ~120s by `save_container_snapshot`).
/// Unlike `check_for_crash`, this reads the snapshot unconditionally (no PID/crash
/// gate) — it's the durable "what was running" signal the boot reconciler uses to
/// recreate a previously-running app whose container vanished. Empty if absent.
pub async fn load_last_running_names(data_dir: &Path) -> std::collections::HashSet<String> {
let path = data_dir.join(CONTAINER_STATE_FILE);
match fs::read_to_string(&path).await {
Ok(content) => match serde_json::from_str::<ContainerSnapshot>(&content) {
Ok(snapshot) => snapshot.containers.into_iter().map(|c| c.name).collect(),
Err(_) => std::collections::HashSet::new(),
},
Err(_) => std::collections::HashSet::new(),
}
}
/// Save the set of user-stopped containers to disk.
pub async fn save_user_stopped(data_dir: &Path, stopped: &std::collections::HashSet<String>) {
let path = data_dir.join(USER_STOPPED_FILE);
@ -898,6 +914,43 @@ mod tests {
assert_eq!(containers[1].name, "archy-mempool-web");
}
#[tokio::test]
async fn test_load_last_running_names_reads_snapshot_without_pid_gate() {
let tmp = TempDir::new().unwrap();
// No PID file written — load_last_running_names must NOT require a crash.
let snapshot = ContainerSnapshot {
timestamp: 1000,
containers: vec![
RunningContainerRecord {
name: "immich_server".to_string(),
image: "immich:2.7".to_string(),
},
RunningContainerRecord {
name: "immich_postgres".to_string(),
image: "postgres:16".to_string(),
},
],
};
fs::write(
tmp.path().join(CONTAINER_STATE_FILE),
serde_json::to_string(&snapshot).unwrap(),
)
.await
.unwrap();
let names = load_last_running_names(tmp.path()).await;
assert_eq!(names.len(), 2);
assert!(names.contains("immich_server"));
assert!(names.contains("immich_postgres"));
assert!(!names.contains("immich_redis"));
}
#[tokio::test]
async fn test_load_last_running_names_empty_when_absent() {
let tmp = TempDir::new().unwrap();
assert!(load_last_running_names(tmp.path()).await.is_empty());
}
#[tokio::test]
async fn test_write_and_remove_pid_marker() {
let tmp = TempDir::new().unwrap();


@ -198,6 +198,24 @@ async fn main() -> Result<()> {
(Some(trait_obj), Some(dev))
} else {
let prod = Arc::new(ProdContainerOrchestrator::new(config.clone()).await?);
// Pull the freshest signed app-catalog BEFORE loading manifests, so any
// registry-embedded manifest (the origin-wins overlay in load_manifests)
// is in place on THIS boot — not a restart later. Without this the boot
// would overlay the previous run's cached catalog and a newly-published
// app (e.g. a registry-only install) wouldn't appear until the next
// restart. Bounded + best-effort: on timeout/unreachable origin the
// last-cached catalog (or the disk manifests) still load — registry is
// an overlay on top of disk, never a hard dependency.
match tokio::time::timeout(
std::time::Duration::from_secs(25),
crate::container::app_catalog::refresh_catalog(&config.data_dir),
)
.await
{
Ok(Ok(n)) => info!("🛰️ app-catalog refreshed before manifest load ({n} apps)"),
Ok(Err(e)) => tracing::debug!("app-catalog pre-load refresh failed (using cache): {e}"),
Err(_) => tracing::debug!("app-catalog pre-load refresh timed out (using cache)"),
}
// Best-effort manifest load; a missing /opt/archipelago/apps is
// logged inside load_manifests and not fatal.
match prod.load_manifests().await {


@ -373,6 +373,8 @@ pub fn spawn_mesh_listener(
our_x25519_secret: [u8; 32],
our_x25519_pubkey_hex: String,
server_name: Option<String>,
lora_region: Option<String>,
channel_name: Option<String>,
shutdown: tokio::sync::watch::Receiver<bool>,
cmd_rx: mpsc::Receiver<MeshCommand>,
) -> tokio::task::JoinHandle<()> {
@ -394,6 +396,8 @@ pub fn spawn_mesh_listener(
&our_x25519_secret,
&our_x25519_pubkey_hex,
server_name.as_deref(),
lora_region.as_deref(),
channel_name.as_deref(),
&mut shutdown,
&mut cmd_rx,
)


@ -39,6 +39,30 @@ impl MeshRadioDevice {
}
}
/// Provision the operator-configured LoRa region. Meshcore radios manage
/// their own band on the device, so this is a no-op for them; Meshtastic
/// radios ship region-UNSET (RF-silent) and must be set or they never mesh.
/// Returns `Ok(true)` when a region was written (the device reboots to
/// apply, so the caller should restart the session).
async fn ensure_lora_region(&mut self, region: Option<&str>) -> Result<bool> {
match self {
Self::Meshcore(_) => Ok(false),
Self::Meshtastic(device) => device.ensure_lora_region(region).await,
}
}
/// Provision the shared archy primary channel so all nodes can decode each
/// other. No-op for meshcore (it joins its channel by name on the device);
/// Meshtastic radios can sit on mismatched channels otherwise and silently
/// drop every packet as undecryptable. Returns `Ok(true)` when a channel was
/// written (device reboots; caller should restart the session).
async fn ensure_channel(&mut self, channel_name: Option<&str>) -> Result<bool> {
match self {
Self::Meshcore(_) => Ok(false),
Self::Meshtastic(device) => device.ensure_channel(channel_name).await,
}
}
async fn send_self_advert(&mut self) -> Result<()> {
match self {
Self::Meshcore(device) => device.send_self_advert().await,
@ -46,6 +70,17 @@ impl MeshRadioDevice {
}
}
/// Actively advertise our identity over the air. Meshcore already does this
/// inside `send_self_advert` (CMD_SEND_SELF_ADVERT), so this is a no-op for
/// it; Meshtastic needs an explicit NodeInfo broadcast or peers never learn
/// about an already-running node.
async fn send_nodeinfo_advert(&mut self, want_response: bool) -> Result<()> {
match self {
Self::Meshcore(_) => Ok(()),
Self::Meshtastic(device) => device.send_nodeinfo_broadcast(want_response).await,
}
}
async fn send_channel_text(&mut self, channel: u8, payload: &[u8]) -> Result<()> {
match self {
Self::Meshcore(device) => device.send_channel_text(channel, payload).await,
@ -471,6 +506,23 @@ async fn sync_queued_messages(
}
}
/// How many times we will try to write the LoRa region across reconnects before
/// giving up. A healthy radio accepts it on the first try (the reboot-and-verify
/// resolves on the next session). A radio that silently refuses to persist
/// config — corrupt/full flash, managed mode, etc. — would otherwise reboot-loop
/// forever; after this many attempts we stop, log, and run without it.
const MAX_REGION_PROVISION_ATTEMPTS: u32 = 3;
/// Process-global count of LoRa-region writes attempted (one radio per process).
/// Reset to 0 whenever the radio reports the desired region, so genuine later
/// drift re-provisions but a broken radio doesn't loop.
static REGION_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
std::sync::atomic::AtomicU32::new(0);
/// Same retry-cap idea as the region, for the shared-channel write.
static CHANNEL_PROVISION_ATTEMPTS: std::sync::atomic::AtomicU32 =
std::sync::atomic::AtomicU32::new(0);
/// Run a single mesh session (connect, initialize, main loop).
pub(super) async fn run_mesh_session(
state: &Arc<MeshState>,
@ -480,6 +532,8 @@ pub(super) async fn run_mesh_session(
our_x25519_secret: &[u8; 32],
our_x25519_pubkey_hex: &str,
server_name: Option<&str>,
lora_region: Option<&str>,
channel_name: Option<&str>,
shutdown: &mut tokio::sync::watch::Receiver<bool>,
cmd_rx: &mut mpsc::Receiver<MeshCommand>,
) -> Result<()> {
@ -512,6 +566,73 @@ pub(super) async fn run_mesh_session(
let _ = state.event_tx.send(MeshEvent::DeviceConnected(device_info));
// Provision the LoRa region before anything else. A fresh Meshtastic radio
// is region-UNSET and therefore RF-silent — it can neither hear nor be
// heard, so contact discovery and DMs would all silently fail. If we write
// a new region the firmware reboots to apply it; restart the session so we
// re-handshake the freshly-rebooted radio (and then set its name on the
// reconnect, where the region already matches and no reboot occurs).
use std::sync::atomic::Ordering;
let region_attempts = REGION_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
if region_attempts < MAX_REGION_PROVISION_ATTEMPTS {
match device.ensure_lora_region(lora_region).await {
Ok(true) => {
REGION_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
info!(
region = lora_region.unwrap_or(""),
attempt = region_attempts + 1,
max = MAX_REGION_PROVISION_ATTEMPTS,
"Provisioned LoRa region — radio rebooting, restarting mesh session"
);
// Give the radio time to reboot before the reconnect re-opens it.
tokio::time::sleep(Duration::from_secs(10)).await;
return Ok(());
}
// Radio reports the desired region (or none configured): clear the
// attempt counter so a future genuine drift re-provisions cleanly.
Ok(false) => REGION_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
Err(e) => warn!("Failed to provision LoRa region: {}", e),
}
} else if lora_region.is_some() {
warn!(
region = lora_region.unwrap_or(""),
attempts = MAX_REGION_PROVISION_ATTEMPTS,
"Radio did not persist the configured LoRa region after repeated \
attempts; continuing without it. The radio likely needs a manual \
factory reset / reflash; mesh discovery stays offline until its \
region is set."
);
}
// Provision the shared primary channel (after the region, since both reboot
// the radio). Without a matching channel two same-region radios still can't
// decode each other's traffic. Same retry-cap + restart-on-change pattern.
let channel_attempts = CHANNEL_PROVISION_ATTEMPTS.load(Ordering::Relaxed);
if channel_attempts < MAX_REGION_PROVISION_ATTEMPTS {
match device.ensure_channel(channel_name).await {
Ok(true) => {
CHANNEL_PROVISION_ATTEMPTS.fetch_add(1, Ordering::Relaxed);
info!(
channel = channel_name.unwrap_or(""),
attempt = channel_attempts + 1,
max = MAX_REGION_PROVISION_ATTEMPTS,
"Provisioned shared mesh channel — radio rebooting, restarting mesh session"
);
tokio::time::sleep(Duration::from_secs(10)).await;
return Ok(());
}
Ok(false) => CHANNEL_PROVISION_ATTEMPTS.store(0, Ordering::Relaxed),
Err(e) => warn!("Failed to provision mesh channel: {}", e),
}
} else if channel_name.is_some() {
warn!(
channel = channel_name.unwrap_or(""),
attempts = MAX_REGION_PROVISION_ATTEMPTS,
"Radio did not persist the shared mesh channel after repeated \
attempts; continuing without it. The radio may need a manual reset."
);
}
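The retry cap shared by the region and channel blocks above can be modelled as a small state machine. This is a simplified sketch: `provision_step` and `Step` are hypothetical names, the `ensure` closure stands in for the `Ok(true)` / `Ok(false)` result of an `ensure_*` call, and the error branch is omitted.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const MAX_ATTEMPTS: u32 = 3;

#[derive(Debug, PartialEq)]
enum Step {
    /// A write was issued; the radio reboots and the session restarts.
    RestartSession,
    /// Nothing to write; carry on with the session.
    Continue,
    /// The cap was reached; stop writing and run without the setting.
    GaveUp,
}

/// `ensure` returns true when it wrote new config, false when none was needed.
fn provision_step(attempts: &AtomicU32, ensure: impl FnOnce() -> bool) -> Step {
    if attempts.load(Ordering::Relaxed) >= MAX_ATTEMPTS {
        return Step::GaveUp;
    }
    if ensure() {
        attempts.fetch_add(1, Ordering::Relaxed);
        Step::RestartSession
    } else {
        // The radio already matches: clear the counter so later drift
        // can be re-provisioned.
        attempts.store(0, Ordering::Relaxed);
        Step::Continue
    }
}
```

In this model, as in the blocks above, the counter is only cleared while it is still under the cap; once the cap is reached the ensure call is skipped, so the counter appears to stay put until the process restarts.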
// Set advert name to the server's human-readable name (e.g. "ThinkPad"),
// falling back to the DID fragment if no name is configured.
let advert_name = if let Some(name) = server_name {
@ -536,6 +657,13 @@ pub(super) async fn run_mesh_session(
if let Err(e) = device.send_self_advert().await {
warn!("Failed to send initial advert: {}", e);
}
// Actively announce our identity over the air with want_response, so any
// already-running neighbour both learns about us and replies with its own
// NodeInfo — immediate two-way discovery instead of waiting for the radio's
// multi-hour NodeInfo cycle. (No-op for meshcore.)
if let Err(e) = device.send_nodeinfo_advert(true).await {
warn!("Failed to send initial NodeInfo advert: {}", e);
}
// NOTE: Archipelago identity adverts (`ARCHY:2:{ed}:{x25519}`) are intentionally
// NOT broadcast on the shared public channel (channel 0). Doing so spams every
@ -615,6 +743,13 @@ pub(super) async fn run_mesh_session(
} else {
consecutive_write_failures = 0;
}
// Periodic over-air identity beacon (no want_response, to avoid
// reply storms) so peers that come online later still discover
// us between the radio's own infrequent NodeInfo broadcasts.
// No-op for meshcore (its self-advert above already goes out).
if let Err(e) = device.send_nodeinfo_advert(false).await {
debug!("Periodic NodeInfo advert failed: {}", e);
}
// (Identity re-broadcast on the public channel intentionally
// removed — see the note at session startup. It spammed the
// shared channel every advert tick.)


@ -22,6 +22,10 @@ const START2: u8 = 0xc3;
const TO_RADIO_MAX: usize = 512;
const BROADCAST_NUM: u32 = 0xffff_ffff;
const TEXT_MESSAGE_APP: u32 = 1;
/// Meshtastic PortNum for NodeInfo (identity) packets — used to actively
/// advertise ourselves over the air so neighbours discover us, the parity
/// equivalent of meshcore's self-advert.
const NODEINFO_APP: u32 = 4;
/// Meshtastic PortNum for admin (config) packets.
const ADMIN_APP: u32 = 6;
/// AdminMessage.set_owner oneof field number (carries a `User`).
@ -37,9 +41,31 @@ const TO_RADIO_HEARTBEAT: u64 = 7;
const FROM_RADIO_PACKET: u64 = 2;
const FROM_RADIO_MY_INFO: u64 = 3;
const FROM_RADIO_NODE_INFO: u64 = 4;
/// FromRadio.config (field 5): a `Config` block streamed during want_config.
const FROM_RADIO_CONFIG: u64 = 5;
const FROM_RADIO_CONFIG_COMPLETE_ID: u64 = 7;
const FROM_RADIO_REBOOTED: u64 = 8;
/// AdminMessage.set_config oneof field number (carries a `Config`). NB: 33 is
/// `set_channel` — `set_config` is 34 (verified against meshtastic/protobufs).
const ADMIN_SET_CONFIG_FIELD: u64 = 34;
/// AdminMessage.set_channel oneof field number (carries a `Channel`).
const ADMIN_SET_CHANNEL_FIELD: u64 = 33;
/// FromRadio.channel (field 10): a `Channel` streamed during want_config.
const FROM_RADIO_CHANNEL: u64 = 10;
/// Channel.role value for the PRIMARY channel (broadcasts ride here).
const CHANNEL_ROLE_PRIMARY: u64 = 1;
/// Config.lora oneof field number (carries a `LoRaConfig`).
const CONFIG_LORA_FIELD: u64 = 6;
/// LoRaConfig field numbers we set when provisioning the radio's region.
const LORA_USE_PRESET_FIELD: u64 = 1;
const LORA_REGION_FIELD: u64 = 7;
const LORA_HOP_LIMIT_FIELD: u64 = 8;
const LORA_TX_ENABLED_FIELD: u64 = 9;
/// RegionCode::UNSET — a radio in this state refuses to transmit or receive on
/// LoRa, so it can never mesh. Fresh-flashed radios ship UNSET.
const REGION_UNSET: u32 = 0;
/// Async Meshtastic device handle.
pub struct MeshtasticDevice {
port: serial2_tokio::SerialPort,
@ -57,6 +83,19 @@ pub struct MeshtasticDevice {
/// records which peers are PKC-capable, so we can tell a true end-to-end
/// (PKI) DM from a channel-PSK fallback.
peer_pubkeys: HashMap<u32, Vec<u8>>,
/// The radio's currently-configured LoRa region code, learned from the
/// `Config.lora` block during `initialize`. `None` until that frame is
/// seen; `Some(REGION_UNSET)` for a fresh radio that has never had a region
/// set (which means it is RF-silent). Used to decide whether we need to
/// provision the operator-configured region — and to avoid a reboot loop by
/// only writing when it actually differs.
current_region: Option<u32>,
/// The radio's current PRIMARY channel as `(name, psk)`, learned from the
/// `Channel` blocks during `initialize`. Two radios only decode each other
/// when their primary channel (name + psk → channel hash) matches, so archy
/// provisions a shared channel here the same way it provisions the region.
/// `None` until a primary `Channel` frame is seen.
current_primary_channel: Option<(String, Vec<u8>)>,
device_path: String,
}
@ -84,6 +123,8 @@ impl MeshtasticDevice {
short_name: None,
contacts: HashMap::new(),
peer_pubkeys: HashMap::new(),
current_region: None,
current_primary_channel: None,
device_path: path.to_string(),
})
}
@ -203,10 +244,207 @@ impl MeshtasticDevice {
Ok(())
}
/// Ensure the radio is provisioned for the operator-configured LoRa region.
/// A freshly-flashed Meshtastic radio ships with `region = UNSET`, which
/// makes the firmware refuse to transmit or receive anything — so two such
/// radios can never see each other and the mesh appears empty. This is the
/// Meshtastic analog of how a meshcore radio comes up on its configured
/// band: archy brings every node onto the same region automatically.
///
/// Returns `Ok(true)` when it actually wrote a new region (the device then
/// reboots to apply it, so the caller should restart the session). Returns
/// `Ok(false)` when no change was needed (already correct, no region
/// configured, or an unrecognised region string) — never reboot-loops.
pub async fn ensure_lora_region(&mut self, region: Option<&str>) -> Result<bool> {
let Some(region_str) = region else {
return Ok(false);
};
let Some(code) = region_name_to_code(region_str) else {
warn!(
region = region_str,
"Unknown LoRa region in mesh-config — leaving radio region unchanged"
);
return Ok(false);
};
if code == REGION_UNSET {
// Operator explicitly asked for UNSET (or blank) — don't fight it.
return Ok(false);
}
match self.current_region {
Some(cur) if cur == code => Ok(false),
_ => {
self.set_lora_region(code).await?;
Ok(true)
}
}
}
/// Write a LoRa region to the locally-connected radio via an
/// `AdminMessage { set_config: Config { lora: LoRaConfig { … } } }` on the
/// ADMIN_APP port — the same local-admin path `set_advert_name` uses (no
/// session passkey needed over serial). We send a minimal, valid preset
/// config: `use_preset` + `LONG_FAST` (the default modem preset), the
/// chosen `region`, a sane `hop_limit`, and `tx_enabled`. The firmware
/// reboots to apply the change.
pub async fn set_lora_region(&mut self, region_code: u32) -> Result<()> {
let Some(node_num) = self.node_num else {
anyhow::bail!("Meshtastic set_lora_region: node_num unknown");
};
// LoRaConfig { use_preset(1)=true, region(7)=code, hop_limit(8)=3,
// tx_enabled(9)=true }. modem_preset defaults to LONG_FAST (0) and
// tx_power defaults to max, which is what we want for a stock mesh.
let mut lora = Vec::new();
encode_varint_field_into(LORA_USE_PRESET_FIELD, 1, &mut lora);
encode_varint_field_into(LORA_REGION_FIELD, region_code as u64, &mut lora);
encode_varint_field_into(LORA_HOP_LIMIT_FIELD, 3, &mut lora);
encode_varint_field_into(LORA_TX_ENABLED_FIELD, 1, &mut lora);
// Config { lora(6): LoRaConfig }
let mut config = Vec::new();
encode_len_field(CONFIG_LORA_FIELD, &lora, &mut config);
// AdminMessage { set_config(34): Config }
let mut admin = Vec::new();
encode_len_field(ADMIN_SET_CONFIG_FIELD, &config, &mut admin);
let packet = encode_mesh_packet(node_num, ADMIN_APP, &admin);
self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
.await
.context("Failed to send Meshtastic set_config(LoRa region) admin packet")?;
info!(
node_num,
region_code, "Set Meshtastic LoRa region (device will reboot to apply)"
);
self.current_region = Some(region_code);
Ok(())
}
/// Ensure the radio's PRIMARY channel matches the shared archy channel so
/// all nodes can decode each other. Region gets two radios onto the same
/// band; a matching channel (name + psk → channel hash) gets them decoding
/// each other's traffic — without it they hear each other but drop every
/// packet as undecryptable. The psk is derived deterministically from the
/// channel name, so every archy node with the same `channel_name` converges
/// on the same channel (the parity equivalent of meshcore's named channel).
///
/// Returns `Ok(true)` when it wrote a new channel (the device reboots to
/// apply, so the caller should restart the session); `Ok(false)` when no
/// change was needed — never reboot-loops.
pub async fn ensure_channel(&mut self, channel_name: Option<&str>) -> Result<bool> {
let Some(channel_name) = channel_name else {
return Ok(false);
};
if channel_name.is_empty() {
return Ok(false);
}
let desired_psk = derive_channel_psk(channel_name);
let already = matches!(
&self.current_primary_channel,
Some((name, psk)) if name == channel_name && psk == &desired_psk
);
if already {
Ok(false)
} else {
self.set_channel(channel_name, &desired_psk).await?;
Ok(true)
}
}
/// Write the PRIMARY channel via `AdminMessage { set_channel: Channel { … } }`
/// (the same local-admin path as `set_advert_name`). The firmware reboots to
/// apply it.
pub async fn set_channel(&mut self, name: &str, psk: &[u8]) -> Result<()> {
let Some(node_num) = self.node_num else {
anyhow::bail!("Meshtastic set_channel: node_num unknown");
};
// ChannelSettings { psk(2), name(3) }
let mut settings = Vec::new();
encode_len_field(2, psk, &mut settings);
encode_len_field(3, name.as_bytes(), &mut settings);
// Channel { index(1)=0, settings(2), role(3)=PRIMARY }
let mut channel = Vec::new();
encode_varint_field_into(1, 0, &mut channel);
encode_len_field(2, &settings, &mut channel);
encode_varint_field_into(3, CHANNEL_ROLE_PRIMARY, &mut channel);
// AdminMessage { set_channel(33): Channel }
let mut admin = Vec::new();
encode_len_field(ADMIN_SET_CHANNEL_FIELD, &channel, &mut admin);
let packet = encode_mesh_packet(node_num, ADMIN_APP, &admin);
self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
.await
.context("Failed to send Meshtastic set_channel admin packet")?;
info!(node_num, channel = %name, "Set Meshtastic primary channel (device will reboot to apply)");
self.current_primary_channel = Some((name.to_string(), psk.to_vec()));
Ok(())
}
pub async fn send_self_advert(&mut self) -> Result<()> {
self.send_to_radio(&encode_heartbeat()).await
}
/// Build our own `User` protobuf (id/long_name/short_name) for a NodeInfo
/// advert. Returns `None` until the handshake has learned our identity.
fn build_self_user(&self) -> Option<Vec<u8>> {
let mut user = Vec::new();
if let Some(id) = &self.user_id {
encode_len_field(1, id.as_bytes(), &mut user);
}
if let Some(long_name) = &self.long_name {
encode_len_field(2, long_name.as_bytes(), &mut user);
}
if let Some(short_name) = &self.short_name {
encode_len_field(3, short_name.as_bytes(), &mut user);
}
if user.is_empty() {
None
} else {
Some(user)
}
}
/// Actively advertise our identity over the air by broadcasting a NodeInfo
/// packet (our `User`) on the primary channel. Meshtastic radios otherwise
/// only emit NodeInfo on boot and every few hours, so without this two
/// already-running nodes can sit forever without discovering each other.
/// This is the Meshtastic analog of meshcore's periodic self-advert.
///
/// `want_response` solicits each neighbour to reply with its own NodeInfo —
/// use it on connect for immediate two-way discovery; leave it off for the
/// periodic beacon so a busy mesh doesn't trigger reply storms.
pub async fn send_nodeinfo_broadcast(&mut self, want_response: bool) -> Result<()> {
let Some(user) = self.build_self_user() else {
debug!("Meshtastic NodeInfo advert skipped — local identity not known yet");
return Ok(());
};
// Data { portnum(1)=NODEINFO_APP, payload(2)=User, want_response(3)? }
let mut data = Vec::new();
encode_varint_field_into(1, NODEINFO_APP as u64, &mut data);
encode_len_field(2, &user, &mut data);
if want_response {
encode_varint_field_into(3, 1, &mut data);
}
// MeshPacket { to(2)=BROADCAST (fixed32), decoded(4)=Data }. The firmware
// fills in `from` = our node-num when it transmits.
let mut packet = Vec::new();
encode_fixed32_field(2, BROADCAST_NUM, &mut packet);
encode_len_field(4, &data, &mut packet);
self.send_to_radio(&encode_to_radio_variant(TO_RADIO_PACKET, &packet))
.await
.context("Failed to send Meshtastic NodeInfo broadcast")?;
debug!(want_response, "Broadcast Meshtastic NodeInfo advert");
Ok(())
}
pub async fn send_channel_text(&mut self, _channel: u8, msg: &[u8]) -> Result<()> {
let text = String::from_utf8_lossy(msg);
let packet = encode_mesh_packet(BROADCAST_NUM, TEXT_MESSAGE_APP, text.as_bytes());
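The admin writes in this hunk nest three protobuf messages by hand (LoRaConfig inside Config inside AdminMessage). The following is a minimal, stdlib-only sketch of that nesting. The helper names mirror the driver's `encode_varint_field_into` / `encode_len_field`, but their bodies here are assumptions, since the real helpers are not shown in this diff. The field numbers are the ones quoted in the comments and the commit message (LoRaConfig 1/7/8/9, Config.lora = 6, AdminMessage.set_config = 34).

```rust
// Sketch only: hypothetical stand-ins for the driver's encoding helpers.

/// Append `v` as a protobuf base-128 varint.
fn encode_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Wire type 0: key = (field << 3) | 0, followed by the varint value.
fn encode_varint_field(field: u64, value: u64, out: &mut Vec<u8>) {
    encode_varint(field << 3, out);
    encode_varint(value, out);
}

/// Wire type 2: key = (field << 3) | 2, followed by length, then the bytes.
fn encode_len_field(field: u64, bytes: &[u8], out: &mut Vec<u8>) {
    encode_varint((field << 3) | 2, out);
    encode_varint(bytes.len() as u64, out);
    out.extend_from_slice(bytes);
}

/// Build AdminMessage { set_config: Config { lora: LoRaConfig { ... } } }.
fn build_set_region_admin(region_code: u32) -> Vec<u8> {
    let mut lora = Vec::new();
    encode_varint_field(1, 1, &mut lora); // use_preset = true
    encode_varint_field(7, u64::from(region_code), &mut lora); // region
    encode_varint_field(8, 3, &mut lora); // hop_limit
    encode_varint_field(9, 1, &mut lora); // tx_enabled = true

    let mut config = Vec::new();
    encode_len_field(6, &lora, &mut config); // Config.lora

    let mut admin = Vec::new();
    encode_len_field(34, &config, &mut admin); // AdminMessage.set_config
    admin
}

fn main() {
    // EU_868 is region code 3 in the table further down this diff.
    println!("{:02x?}", build_set_region_admin(3));
}
```

Field 34 needs a two-byte key (`0x92 0x02`), because `(34 << 3) | 2 = 274` no longer fits in seven bits.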
@ -339,12 +577,36 @@ impl MeshtasticDevice {
return Ok(Some(frame));
}
// Drain aggressively. Meshtastic firmware interleaves verbose debug-log
// text with protobuf frames on the same serial line, so a single small
// read per poll can fall behind the byte stream, overflow the OS serial
// buffer, and corrupt/drop inbound frames — which silently kills message
// reception while leaving sends working. Pull up to a bounded burst of
// bytes per call, decoding as soon as a complete frame appears.
let mut tmp = [0u8; READ_BUF_SIZE];
match tokio::time::timeout(Duration::from_millis(50), self.port.read(&mut tmp)).await {
Ok(Ok(0)) => anyhow::bail!("Meshtastic serial port closed"),
Ok(Ok(n)) => self.read_buf.extend_from_slice(&tmp[..n]),
Ok(Err(e)) => return Err(e).context("Meshtastic serial read error"),
Err(_) => return Ok(None),
for _ in 0..32 {
match tokio::time::timeout(Duration::from_millis(30), self.port.read(&mut tmp)).await {
Ok(Ok(0)) => anyhow::bail!("Meshtastic serial port closed"),
Ok(Ok(n)) => {
self.read_buf.extend_from_slice(&tmp[..n]);
if let Some(frame) = decode_serial_frame(&mut self.read_buf) {
return Ok(Some(frame));
}
// Bound memory if it's a pure-debug flood with no frames:
// keep only from the last possible frame-start marker.
if self.read_buf.len() > 64 * 1024 {
if let Some(pos) =
self.read_buf.windows(2).rposition(|w| w == [START1, START2])
{
self.read_buf.drain(..pos);
} else {
self.read_buf.clear();
}
}
}
Ok(Err(e)) => return Err(e).context("Meshtastic serial read error"),
Err(_) => break, // no more bytes available right now
}
}
Ok(decode_serial_frame(&mut self.read_buf))
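The drain loop above relies on `decode_serial_frame` to pull one frame out of a buffer that also contains debug-log text. That function is not part of this diff, so the following is only a sketch of how such a resynchronising decoder could look. It assumes the Meshtastic stream framing as I understand it: two marker bytes (`0x94`, `0xC3`), a big-endian 16-bit length, then the protobuf payload, with lengths above 512 treated as corruption. The constant values and the name `pop_frame_sketch` are assumptions, not taken from the driver.

```rust
// Assumed framing constants; the driver's own START1/START2 are not shown here.
const START1: u8 = 0x94;
const START2: u8 = 0xc3;
const MAX_PAYLOAD: usize = 512;

/// Pop one complete frame payload from the front of `buf`, discarding any
/// leading non-frame bytes. Returns `None` when no complete frame is buffered.
fn pop_frame_sketch(buf: &mut Vec<u8>) -> Option<Vec<u8>> {
    loop {
        // Locate the next START1/START2 marker pair.
        let start = match buf.windows(2).position(|w| w == [START1, START2]) {
            Some(pos) => pos,
            None => {
                // No marker: drop the noise, but keep a trailing START1 in
                // case START2 arrives with the next read.
                let keep = usize::from(buf.last() == Some(&START1));
                let drop_to = buf.len() - keep;
                buf.drain(..drop_to);
                return None;
            }
        };
        buf.drain(..start);
        if buf.len() < 4 {
            return None; // header not complete yet
        }
        let len = usize::from(u16::from_be_bytes([buf[2], buf[3]]));
        if len > MAX_PAYLOAD {
            // Implausible length: skip this marker and look for the next one.
            buf.drain(..2);
            continue;
        }
        if buf.len() < 4 + len {
            return None; // payload not complete yet
        }
        let payload = buf[4..4 + len].to_vec();
        buf.drain(..4 + len);
        return Some(payload);
    }
}

fn main() {
    let mut buf = b"DEBUG log\n".to_vec();
    buf.extend_from_slice(&[START1, START2, 0x00, 0x02, 0xaa, 0xbb]);
    println!("{:02x?}", pop_frame_sketch(&mut buf));
}
```

Because a decoder of this shape discards non-frame bytes as it goes, the 64 KiB trim in the hunk above would only matter for the case where a marker is present but its frame never completes.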
@ -352,8 +614,14 @@ impl MeshtasticDevice {
fn handle_from_radio(&mut self, frame: &[u8]) -> Option<InboundFrame> {
let Some((field, value)) = decode_top_level_variant(frame) else {
debug!(
len = frame.len(),
head = %hex::encode(&frame[..frame.len().min(8)]),
"Meshtastic FromRadio frame did not decode to a known top-level field"
);
return None;
};
debug!(field, value_len = value.len(), "Meshtastic FromRadio field");
match field {
FROM_RADIO_MY_INFO => {
if let Some((node_num, user_id)) = parse_my_info(value) {
@ -369,6 +637,22 @@ impl MeshtasticDevice {
None
}
FROM_RADIO_PACKET => self.packet_to_inbound_frame(value),
FROM_RADIO_CONFIG => {
// Only the LoRa sub-config carries a region; other Config
// variants (device/position/…) return None and are ignored.
if let Some(region) = parse_config_lora_region(value) {
self.current_region = Some(region);
debug!(region, "Meshtastic LoRa region from device config");
}
None
}
FROM_RADIO_CHANNEL => {
if let Some((name, psk)) = parse_primary_channel(value) {
debug!(name = %name, psk_len = psk.len(), "Meshtastic primary channel from device");
self.current_primary_channel = Some((name, psk));
}
None
}
FROM_RADIO_CONFIG_COMPLETE_ID | FROM_RADIO_REBOOTED => None,
other => {
debug!(
@ -424,6 +708,12 @@ impl MeshtasticDevice {
if Some(from) == self.node_num {
return None;
}
info!(
from = format!("!{:08x}", from),
len = packet.payload.len(),
pki = packet.pki_encrypted,
"Meshtastic received text packet over the air"
);
// Record E2E status: a `pki_encrypted` packet (or one carrying the
// sender's `public_key`) proves this DM arrived end-to-end encrypted via
// the PKI, not the shared channel PSK. We learn the sender's key here too
@ -504,6 +794,116 @@ fn encode_heartbeat() -> Vec<u8> {
encode_to_radio_variant(TO_RADIO_HEARTBEAT, &[])
}
/// Extract `LoRaConfig.region` from a `Config` message, returning the region
/// code. Returns `Some(REGION_UNSET)` when the LoRa block is present but has no
/// region field (a fresh radio), and `None` when this Config carries a
/// non-LoRa variant (device/position/…) so the caller keeps the prior value.
fn parse_config_lora_region(data: &[u8]) -> Option<u32> {
let mut idx = 0;
while idx < data.len() {
let (field, value, next) = next_field(data, idx)?;
idx = next;
if field == CONFIG_LORA_FIELD {
if let FieldValue::Bytes(b) = value {
let mut j = 0;
let mut region = REGION_UNSET;
while j < b.len() {
let (lf, lv, ln) = next_field(b, j)?;
j = ln;
if lf == LORA_REGION_FIELD {
if let FieldValue::Varint(v) = lv {
region = v as u32;
}
}
}
return Some(region);
}
}
}
None
}
/// Extract `(name, psk)` from a `Channel` message, but only for the PRIMARY
/// channel (role == 1) — that's the one broadcasts ride on and whose hash must
/// match for two radios to decode each other. Returns `None` for secondary /
/// disabled channels so the caller keeps the primary it already learned.
fn parse_primary_channel(data: &[u8]) -> Option<(String, Vec<u8>)> {
let mut role = 0u64;
let mut name = String::new();
let mut psk = Vec::new();
let mut idx = 0;
while idx < data.len() {
let (field, value, next) = next_field(data, idx)?;
idx = next;
match (field, value) {
(3, FieldValue::Varint(v)) => role = v,
(2, FieldValue::Bytes(b)) => {
let mut j = 0;
while j < b.len() {
let (sf, sv, sn) = next_field(b, j)?;
j = sn;
match (sf, sv) {
(2, FieldValue::Bytes(p)) => psk = p.to_vec(),
(3, FieldValue::Bytes(n)) => {
name = String::from_utf8_lossy(n).to_string()
}
_ => {}
}
}
}
_ => {}
}
}
if role == CHANNEL_ROLE_PRIMARY {
Some((name, psk))
} else {
None
}
}
/// Derive the 32-byte channel PSK deterministically from the channel name, so
/// every archy node configured with the same `channel_name` converges on the
/// exact same primary channel (identical hash) and meshes automatically.
fn derive_channel_psk(channel_name: &str) -> Vec<u8> {
use sha2::{Digest, Sha256};
let mut hasher = Sha256::new();
hasher.update(b"archipelago-mesh:");
hasher.update(channel_name.as_bytes());
hasher.finalize().to_vec()
}
/// Map a Meshtastic `RegionCode` name (as set in `mesh-config.json`, e.g.
/// "EU_868", "US", "ANZ") to its protobuf enum value. Case-insensitive.
/// Returns `None` for an unrecognised name so we never write a bogus region.
fn region_name_to_code(name: &str) -> Option<u32> {
Some(match name.trim().to_uppercase().as_str() {
"UNSET" => 0,
"US" => 1,
"EU_433" => 2,
"EU_868" | "EU868" => 3,
"CN" => 4,
"JP" => 5,
"ANZ" => 6,
"KR" => 7,
"TW" => 8,
"RU" => 9,
"IN" => 10,
"NZ_865" => 11,
"TH" => 12,
"LORA_24" => 13,
"UA_433" => 14,
"UA_868" => 15,
"MY_433" => 16,
"MY_919" => 17,
"SG_923" => 18,
"PH_433" => 19,
"PH_868" => 20,
"PH_915" => 21,
"ANZ_433" => 22,
_ => return None,
})
}
fn encode_to_radio_variant(field: u64, bytes: &[u8]) -> Vec<u8> {
let mut out = Vec::new();
encode_len_field(field, bytes, &mut out);
@ -544,7 +944,11 @@ fn decode_top_level_variant(buf: &[u8]) -> Option<(u64, &[u8])> {
}
if matches!(
field,
FROM_RADIO_PACKET | FROM_RADIO_MY_INFO | FROM_RADIO_NODE_INFO
FROM_RADIO_PACKET
| FROM_RADIO_MY_INFO
| FROM_RADIO_NODE_INFO
| FROM_RADIO_CONFIG
| FROM_RADIO_CHANNEL
) {
return Some((field, &buf[idx..end]));
}
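`parse_config_lora_region` and `parse_primary_channel` both walk protobuf fields through a `next_field` helper that this diff does not show. The sketch below is one plausible shape for that helper plus the region lookup, using only the standard library. The `FieldValue` variants `Varint` and `Bytes` match the names used in the diff; the `Skipped` variant, the function bodies, and the handling of fixed-width wire types are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum FieldValue<'a> {
    Varint(u64),
    Bytes(&'a [u8]),
    Skipped, // fixed32 / fixed64 payloads, not inspected in this sketch
}

/// Read a base-128 varint starting at `idx`; returns (value, next index).
fn read_varint(data: &[u8], mut idx: usize) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let byte = *data.get(idx)?;
        idx += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some((value, idx));
        }
    }
    None // more than 10 bytes: malformed
}

/// Decode one field at `idx`; returns (field number, value, next index).
fn next_field(data: &[u8], idx: usize) -> Option<(u64, FieldValue<'_>, usize)> {
    let (key, idx) = read_varint(data, idx)?;
    let field = key >> 3;
    match key & 0x7 {
        0 => {
            let (v, next) = read_varint(data, idx)?;
            Some((field, FieldValue::Varint(v), next))
        }
        2 => {
            let (len, start) = read_varint(data, idx)?;
            let end = start.checked_add(usize::try_from(len).ok()?)?;
            Some((field, FieldValue::Bytes(data.get(start..end)?), end))
        }
        wire @ (1 | 5) => {
            let width = if wire == 1 { 8 } else { 4 };
            let end = idx.checked_add(width)?;
            data.get(idx..end)?; // bounds check only
            Some((field, FieldValue::Skipped, end))
        }
        _ => None, // groups and unknown wire types are rejected
    }
}

/// Config.lora is field 6; LoRaConfig.region is field 7 (0 = UNSET).
fn lora_region(config: &[u8]) -> Option<u32> {
    let mut idx = 0;
    while idx < config.len() {
        let (field, value, next) = next_field(config, idx)?;
        idx = next;
        if let (6, FieldValue::Bytes(lora)) = (field, value) {
            let mut region = 0;
            let mut j = 0;
            while j < lora.len() {
                let (lf, lv, ln) = next_field(lora, j)?;
                j = ln;
                if let (7, FieldValue::Varint(v)) = (lf, lv) {
                    region = v as u32;
                }
            }
            return Some(region);
        }
    }
    None
}

fn main() {
    let config = [0x32, 0x08, 0x08, 0x01, 0x38, 0x03, 0x40, 0x03, 0x48, 0x01];
    println!("{:?}", lora_region(&config));
}
```

An empty LoRa block yields `Some(0)` (UNSET) and a non-LoRa Config yields `None`, which is the distinction the doc comment on `parse_config_lora_region` describes.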

View File

@ -326,6 +326,14 @@ pub struct MeshConfig {
/// Channel name for broadcasts.
#[serde(default)]
pub channel_name: Option<String>,
/// Meshtastic LoRa region (e.g. "EU_868", "US", "ANZ"). Fresh-flashed
/// Meshtastic radios ship region-UNSET and are RF-silent until a region is
/// set, so archy provisions this region on connect to bring every node onto
/// the same band automatically (the parity equivalent of a meshcore radio
/// coming up on its configured band). Ignored for meshcore devices and when
/// unset/None.
#[serde(default)]
pub lora_region: Option<String>,
/// Whether to periodically broadcast our identity.
#[serde(default)]
pub broadcast_identity: bool,
@ -385,6 +393,7 @@ impl Default for MeshConfig {
enabled: false,
device_path: None,
channel_name: Some("archipelago".to_string()),
lora_region: None,
broadcast_identity: true,
advert_name: None,
mesh_only_mode: None,
@ -675,6 +684,8 @@ impl MeshService {
self.our_x25519_secret,
self.our_x25519_pubkey_hex.clone(),
self.server_name.clone(),
self.config.lora_region.clone(),
self.config.channel_name.clone(),
shutdown_rx,
cmd_rx,
);

View File

@ -8,8 +8,9 @@ pub mod runtime;
pub use bitcoin_simulator::{BitcoinSimulationMode, BitcoinSimulator};
pub use health_monitor::HealthMonitor;
pub use manifest::{
AppInterface, AppManifest, BuildConfig, ContainerConfig, Dependency, DerivedEnv, GeneratedFile,
GeneratedSecret, HealthCheck, HookStep, HostCopy, HostFacts, LifecycleHooks, ManifestError,
AppInterface, AppManifest, BuildConfig, ContainerConfig, Dependency, DerivedEnv, GeneratedCert,
GeneratedFile, GeneratedSecret, HealthCheck, HookStep, HostCopy, HostFacts, LifecycleHooks,
ManifestError,
ResolvedSource, ResourceLimits, SecretEnv, SecretGenKind, SecretsProvider, SecurityPolicy,
Volume,
};

View File

@ -223,6 +223,19 @@ pub struct ContainerConfig {
#[serde(default)]
pub generated_secrets: Vec<GeneratedSecret>,
/// Self-signed TLS certificates the orchestrator materialises before the
/// container is created (so a bind-mounted cert path resolves to a real
/// file, not a stale/missing path). Like `generated_secrets`, this keeps an
/// app data-driven: a service that needs a secure context (e.g. netbird's
/// dashboard — OIDC PKCE / `window.crypto.subtle` only works over HTTPS,
/// issue #15) declares the cert here instead of relying on per-app Rust.
/// Idempotent: an entry whose `crt` and `key` already exist is left
/// untouched. SAN/CN templates are rendered against host facts at apply time.
///
/// Example: `- { crt: /var/lib/archipelago/netbird/tls.crt, key: /var/lib/archipelago/netbird/tls.key }`
#[serde(default)]
pub generated_certs: Vec<GeneratedCert>,
/// Rootless-mapped UID:GID applied to the container's data directory
/// (the `bind`-mounted host path with `target` inside the container's
/// data root) before creation. Mirrors `SPEC_DATA_UID`.
@ -261,6 +274,11 @@ pub enum SecretGenKind {
Hex16,
/// 32 random bytes, lowercase hex (64 chars). Longer keys/cookies.
Hex32,
/// 32 random bytes, standard base64 (44 chars incl. padding). For services
/// that require a base64-encoded key rather than hex — e.g. netbird's relay
/// `authSecret` and the SQLite store `encryptionKey`, which base64-decode
/// their configured value (hex would decode to the wrong bytes).
Base64,
/// A random password and its bcrypt hash. `<name>` holds the bcrypt hash
/// (what a server is configured with); the plaintext is stored alongside as
/// `<name>.pw` for any client that must authenticate. `secret_env` injects
@ -282,12 +300,31 @@ impl GeneratedSecret {
/// (primary first). A consumer references one of these via `secret_env`.
pub fn target_files(&self) -> Vec<String> {
match self.kind {
SecretGenKind::Hex16 | SecretGenKind::Hex32 => vec![self.name.clone()],
SecretGenKind::Hex16 | SecretGenKind::Hex32 | SecretGenKind::Base64 => {
vec![self.name.clone()]
}
SecretGenKind::Bcrypt => vec![self.name.clone(), format!("{}.pw", self.name)],
}
}
}
/// A self-signed TLS certificate materialised by the orchestrator. See
/// [`ContainerConfig::generated_certs`]. `crt`/`key` are absolute host paths
/// (typically under `/var/lib/archipelago/<app>/`) that the container
/// bind-mounts read-only. `common_name` and `sans` are rendered against host
/// facts (`{{HOST_IP}}`) at apply time; when omitted they default to the
/// node's host IP plus `IP:127.0.0.1,DNS:localhost` so the cert is valid for
/// however the box is reached locally.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct GeneratedCert {
pub crt: String,
pub key: String,
#[serde(default)]
pub common_name: Option<String>,
#[serde(default)]
pub sans: Vec<String>,
}
fn default_pull_policy() -> String {
"if-not-present".to_string()
}
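The `GeneratedCert` doc comment says `sans` is rendered against host facts and falls back to the host IP plus `IP:127.0.0.1,DNS:localhost` when omitted. The rendering code is not in this diff, so this is a hypothetical sketch of that rule. The function name `resolve_sans`, the `IP:<addr>` formatting of the host entry, and plain string substitution for `{{HOST_IP}}` are all assumptions.

```rust
/// Hypothetical helper: apply the documented defaults, or substitute the
/// `{{HOST_IP}}` placeholder in each declared SAN.
fn resolve_sans(sans: &[String], host_ip: &str) -> Vec<String> {
    if sans.is_empty() {
        // Documented default: host IP plus loopback and localhost.
        return vec![
            format!("IP:{host_ip}"),
            "IP:127.0.0.1".to_string(),
            "DNS:localhost".to_string(),
        ];
    }
    sans.iter()
        .map(|s| s.replace("{{HOST_IP}}", host_ip))
        .collect()
}

fn main() {
    let declared = vec!["IP:{{HOST_IP}}".to_string(), "DNS:node.local".to_string()];
    println!("{:?}", resolve_sans(&declared, "192.168.1.116"));
    println!("{:?}", resolve_sans(&[], "192.168.1.116"));
}
```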
@ -665,6 +702,18 @@ impl AppManifest {
}
}
// generated_certs: crt/key must be non-empty absolute paths with no
// traversal (they become bind-mount sources, same safety bar as files).
for (i, c) in self.app.container.generated_certs.iter().enumerate() {
for (field, val) in [("crt", &c.crt), ("key", &c.key)] {
if val.is_empty() || !val.starts_with('/') || val.contains("..") {
return Err(ManifestError::Invalid(format!(
"container.generated_certs[{i}].{field} must be an absolute path with no '..', got '{val}'"
)));
}
}
}
// data_uid: if set, must look like "NNNNN:NNNNN".
if let Some(u) = &self.app.container.data_uid {
let parts: Vec<&str> = u.split(':').collect();
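The `generated_certs` check in this hunk rejects any path containing the substring `..`, which is safe but also refuses harmless names such as `a..b`. The sketch below places that substring rule next to a component-based variant built on `std::path`. The component-based function is a swapped-in alternative offered for comparison, not what the diff implements, and both function names are hypothetical. It assumes a Unix host, where an absolute path starts with `/`.

```rust
use std::path::{Component, Path};

/// The rule as written in the diff: non-empty, leading '/', no ".." substring.
fn passes_substring_rule(val: &str) -> bool {
    !val.is_empty() && val.starts_with('/') && !val.contains("..")
}

/// Alternative (assumption): reject only real parent-directory components.
fn passes_component_rule(val: &str) -> bool {
    let path = Path::new(val);
    path.is_absolute()
        && !path
            .components()
            .any(|c| matches!(c, Component::ParentDir))
}

fn main() {
    for candidate in [
        "/var/lib/archipelago/netbird/tls.crt",
        "/var/lib/../etc/passwd",
        "/var/lib/a..b/tls.crt",
        "relative/tls.crt",
    ] {
        println!(
            "{candidate}: substring={} component={}",
            passes_substring_rule(candidate),
            passes_component_rule(candidate)
        );
    }
}
```

Both rules agree on traversal attempts and relative paths; they differ only on file names that happen to contain two consecutive dots.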
@ -1711,6 +1760,7 @@ app:
],
secret_env: vec![],
generated_secrets: vec![],
generated_certs: vec![],
data_uid: None,
};
let facts = HostFacts {
@ -1762,6 +1812,7 @@ app:
},
],
generated_secrets: vec![],
generated_certs: vec![],
data_uid: None,
};
let p = MapSecretsProvider {
@ -1799,6 +1850,7 @@ app:
secret_file: "bitcoin-rpc-password".to_string(),
}],
generated_secrets: vec![],
generated_certs: vec![],
data_uid: None,
};
let p = MapSecretsProvider {

View File

@ -121,10 +121,16 @@ impl PodmanClient {
"cryptpad" => "http://localhost:3003",
"penpot" => "http://localhost:9001",
"immich_server" | "immich" => "http://localhost:2283",
// Gitea publishes SSH (2222) and web (3001). Without a manifest on
// disk, extract_lan_address() returns whichever podman lists first —
// which can be the SSH port, breaking the launch. Pin the web UI.
"gitea" => "http://localhost:3001",
"nginx-proxy-manager" => "http://localhost:8081",
"fedimint-gateway" => "http://localhost:8176",
"endurain" => "http://localhost:8080",
"netbird" => "http://localhost:8087",
// HTTPS: netbird's dashboard needs a secure context for OIDC PKCE
// (window.crypto.subtle), so the proxy serves TLS on 8087 (issue #15).
"netbird" => "https://localhost:8087",
"electrs" | "archy-electrs-ui" => "http://localhost:50002",
_ => return None,
};
@ -275,10 +281,18 @@ impl PodmanClient {
// Build the container spec for the API
let mut port_mappings = Vec::new();
for port in &manifest.app.ports {
// Honour the manifest's protocol (default tcp). netbird's STUN port
// is 3478/udp; forcing tcp here would publish the wrong protocol and
// silently break relay discovery.
let protocol = match port.protocol.to_ascii_lowercase().as_str() {
"udp" => "udp",
"sctp" => "sctp",
_ => "tcp",
};
port_mappings.push(serde_json::json!({
"container_port": port.container,
"host_port": port.host,
"protocol": "tcp",
"protocol": protocol,
}));
}
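The protocol mapping above can be exercised in isolation. This sketch lifts the `match` into a free function (the name `normalize_protocol` is hypothetical) to show the behaviour: matching is case-insensitive, and anything other than `udp` or `sctp`, including an empty string, falls back to `tcp`.

```rust
/// Map a manifest protocol string to the value published to podman.
/// Unknown or empty values fall back to "tcp", as in the diff.
fn normalize_protocol(raw: &str) -> &'static str {
    match raw.to_ascii_lowercase().as_str() {
        "udp" => "udp",
        "sctp" => "sctp",
        _ => "tcp",
    }
}

fn main() {
    // netbird's STUN port from the comment above: 3478/udp.
    println!("3478/{}", normalize_protocol("UDP"));
    println!("8087/{}", normalize_protocol(""));
}
```

One consequence of the fallback is that a typo such as `upd` is published as TCP without any warning, so a manifest validator may want to reject unknown values earlier.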

View File

@ -1,18 +0,0 @@
# Copy to .env and adjust. Used by demo-deploy/docker-compose.yml.
# Registry host + namespace that holds the prebuilt demo images.
REGISTRY=146.59.87.168:3000/lfg2025
# Image tag to deploy (CI publishes :demo and :<git-sha>).
IMAGE_TAG=demo
# Host port for the demo UI.
DEMO_WEB_PORT=2100
# Optional — enables the in-app AI chat panel. Leave blank to disable.
ANTHROPIC_API_KEY=
# Optional sandbox tuning (defaults shown).
DEMO_SESSION_TTL_MS=2700000 # 45 min idle before a visitor session is reaped
DEMO_MAX_SESSIONS=500 # concurrent visitor cap
DEMO_FILE_QUOTA_BYTES=52428800 # 50 MB uploads per visitor

View File

@ -1,33 +0,0 @@
# Archipelago — Public Demo deploy
A click-to-play demo of the Archipelago UI, backed entirely by a mock backend.
Every visitor gets an **isolated, ephemeral sandbox** (own apps, wallet, files),
real container runtimes are never touched, and Bitcoin runs on **signet** test
coins. **Login password: `entertoexit`** (shown on the login screen).
This directory is the full contents of the public `archy-demo` repo. It holds no
source — only this compose file that pulls prebuilt `:demo` images.
## Deploy in Portainer
1. **Stacks → Add stack → Repository** (or paste `docker-compose.yml` into the web editor).
2. Set environment variables (see `.env.example`) — at minimum `REGISTRY`, and
`ANTHROPIC_API_KEY` if you want the AI chat panel.
3. Deploy. The UI is served on `:2100` (override with `DEMO_WEB_PORT`).
To pick up a new build, redeploy the stack (or wire the CI Portainer webhook).
## How it stays current
The images are built from the Archipelago monorepo by
`.github/workflows/demo-images.yml` on every change to `neode-ui/`, tagged `:demo`
and `:<git-sha>`, and pushed to `REGISTRY`. Editing the real UI → CI rebuilds →
redeploy here. No source lives in this repo.
## What's mocked
- **Per-visitor isolation** — state keyed by a `demo_sid` cookie, idle-reaped.
- **Apps** — install/uninstall/start/stop are simulated (no real Docker).
- **Wallet/Bitcoin** — signet-flavored; use the in-UI faucet for test sats.
- **Files** — real per-session upload/rename/delete, 50 MB quota, wiped on reap.
- **Intro** — replays once per calendar day per browser.

View File

@ -1,49 +0,0 @@
# Archipelago Public Demo — thin deploy stack
#
# This is the ENTIRE contents intended for the public `archy-demo` repo. It holds
# NO source — it pulls prebuilt `:demo` images that CI builds from the monorepo on
# every neode-ui change (see .github/workflows/demo-images.yml). Deploy this in
# Portainer ("deploy from repository" or paste into the web editor).
#
# Demo login password: entertoexit
# Access on http://<host>:2100
#
# Configure via a .env file (see .env.example):
# REGISTRY registry host/namespace holding the demo images
# IMAGE_TAG image tag to pull (default: demo)
# ANTHROPIC_API_KEY optional — enables the AI chat panel
# DEMO_WEB_PORT host port for the UI (default 2100)
services:
neode-backend:
image: ${REGISTRY:-146.59.87.168:3000/lfg2025}/archy-demo-backend:${IMAGE_TAG:-demo}
container_name: archy-demo-backend
environment:
DEMO: "1"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
NODE_OPTIONS: "--dns-result-order=ipv4first"
DEMO_SESSION_TTL_MS: ${DEMO_SESSION_TTL_MS:-2700000}
DEMO_MAX_SESSIONS: ${DEMO_MAX_SESSIONS:-500}
DEMO_FILE_QUOTA_BYTES: ${DEMO_FILE_QUOTA_BYTES:-52428800}
expose:
- "5959"
dns:
- 8.8.8.8
- 1.1.1.1
restart: unless-stopped
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:5959/health"]
interval: 30s
timeout: 10s
retries: 3
neode-web:
image: ${REGISTRY:-146.59.87.168:3000/lfg2025}/archy-demo-web:${IMAGE_TAG:-demo}
container_name: archy-demo-web
ports:
- "${DEMO_WEB_PORT:-2100}:80"
environment:
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
depends_on:
- neode-backend
restart: unless-stopped

View File

@ -1,22 +0,0 @@
# Curated demo files
Drop real files into `demo/files/` to make them the cloud's content for **every**
demo visitor (read-only — visitors can browse, download, and "buy" them, but only
maintainers add them). This is the "private login": the only way to add files is
to commit them here, which requires repo access.
```
demo/files/
Documents/whitepaper.pdf
Photos/rig.jpg
Music/track.mp3
```
- Folder structure becomes the cloud's folders.
- Text files (`.md .txt .json .csv …`, < 1 MB) are inlined; everything else is
streamed from disk on download.
- If `demo/files/` is empty, the demo falls back to the built-in seeded set
(Documents/Photos/Music/Videos with sample content).
After adding files, commit and push — CI rebuilds the `:demo` image and Portainer
redeploys. Keep the total modest (these load into the demo image).

View File

@ -14,31 +14,6 @@
<link rel="icon" href="/aiui/favicon.svg" type="image/svg+xml" />
<link rel="apple-touch-icon" href="/aiui/apple-touch-icon-180x180.png" />
<title>AIUI</title>
<!-- Demo (?seed): pre-load the example "Content Showcase" conversation into
AIUI's IndexedDB so the chat history isn't empty (live chat is disabled
in the demo and points users to these previous chats). Mirrors the app's
own /seed exactly by calling its seedPromptsToConversation(). -->
<script type="module">
(async () => {
try {
if (!new URLSearchParams(location.search).has('seed')) return;
const db = await new Promise((res, rej) => {
const r = indexedDB.open('aiui-store', 1);
r.onupgradeneeded = (e) => { const d = e.target.result; if (!d.objectStoreNames.contains('conversations')) d.createObjectStore('conversations', { keyPath: 'id' }); };
r.onsuccess = () => res(r.result); r.onerror = () => rej(r.error);
});
const exists = await new Promise((res) => {
try { const q = db.transaction('conversations', 'readonly').objectStore('conversations').getKey('seed-all'); q.onsuccess = () => res(!!q.result); q.onerror = () => res(false); }
catch { res(false); }
});
if (exists) return;
const { seedPromptsToConversation } = await import('/aiui/assets/seedPrompts-CLWaUv28.js');
const conv = seedPromptsToConversation();
await new Promise((res, rej) => { const t = db.transaction('conversations', 'readwrite'); t.objectStore('conversations').put(conv); t.oncomplete = () => res(); t.onerror = () => rej(t.error); });
try { localStorage.setItem('aiui-active-conversation', conv.id); } catch {}
} catch (e) { console.warn('[demo] AIUI seed bootstrap failed', e); }
})();
</script>
<script type="module" crossorigin src="/aiui/assets/index-Lh5NfTCq.js"></script>
<link rel="stylesheet" crossorigin href="/aiui/assets/index-CHQ7uqBj.css">
<link rel="manifest" href="/aiui/manifest.webmanifest"><script id="vite-plugin-pwa:register-sw" src="/aiui/registerSW.js"></script></head>

View File

@ -1,13 +1,6 @@
# Archipelago Public Demo Stack - Mock backend + Vue UI + AIUI Chat
# Deploy via Portainer: Web editor -> paste this, or deploy from repo (build).
# Access at http://localhost:2100
#
# This builds the demo images from source. For the separated, auto-updating
# deploy that pulls prebuilt :demo images, see demo-deploy/docker-compose.yml.
#
# DEMO=1 turns on the public multi-visitor sandbox: each visitor gets an
# isolated, ephemeral copy of all state; real container runtimes are never
# touched; the shared login password is "entertoexit".
# Archipelago Demo Stack - Mock backend + Vue UI + AIUI Chat
# Deploy via Portainer: Web editor -> paste this, or deploy from repo
# Access at http://localhost:4848
#
# Required: Set ANTHROPIC_API_KEY in environment or .env file for chat to work
# IndeedHub is deployed as a separate Portainer stack (indee-demo repo)
@ -19,13 +12,9 @@ services:
dockerfile: neode-ui/Dockerfile.backend
container_name: archy-demo-backend
environment:
DEMO: "1"
VITE_DEV_MODE: "existing"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
NODE_OPTIONS: "--dns-result-order=ipv4first"
# Optional tuning (defaults shown):
# DEMO_SESSION_TTL_MS: "2700000" # 45 min idle before a session is reaped
# DEMO_MAX_SESSIONS: "500" # concurrent visitor cap
# DEMO_FILE_QUOTA_BYTES: "52428800" # 50 MB uploads per visitor
expose:
- "5959"
dns:
@ -42,11 +31,9 @@ services:
build:
context: .
dockerfile: neode-ui/Dockerfile.web
args:
VITE_DEMO: "1"
container_name: archy-demo-web
ports:
- "2100:80"
- "4848:80"
depends_on:
- neode-backend
restart: unless-stopped

View File

@ -0,0 +1,14 @@
# Archipelago mempool frontend — adds a resilient nginx backend proxy.
#
# The only delta vs the upstream image is /patch/entrypoint.sh, which rewrites
# the generated nginx-mempool.conf to use `resolver` + a variable proxy_pass so
# the frontend re-resolves the backend (mempool-api) via DNS on every request.
# Without this, nginx pins the backend IP at startup and serves 502 / "offline"
# after any backend restart (podman reassigns the IP). See the script header.
ARG BASE=146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0
FROM ${BASE}
# --chmod keeps the exec bit (build runs as USER 1000, plain COPY lands root:0644
# → "not executable"). Base USER/ENTRYPOINT/CMD (1000 / /patch/entrypoint.sh /
# nginx -g "daemon off;") are inherited unchanged.
COPY --chmod=0755 entrypoint.sh /patch/entrypoint.sh

View File

@ -0,0 +1,137 @@
#!/bin/sh
__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__=${BACKEND_MAINNET_HTTP_HOST:=127.0.0.1}
__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__=${BACKEND_MAINNET_HTTP_PORT:=8999}
__MEMPOOL_FRONTEND_HTTP_PORT__=${FRONTEND_HTTP_PORT:=8080}
CONF=/etc/nginx/conf.d/nginx-mempool.conf
# ─── archipelago patch ────────────────────────────────────────────────────
# The stock frontend writes `proxy_pass http://<backend>:8999` with a literal
# hostname and NO resolver, so nginx resolves the backend IP ONCE at worker
# start and caches it for the process lifetime. Podman reassigns the backend
# container's IP whenever it is restarted/recreated (gate, OTA, crash, reboot
# re-IPAM), after which nginx keeps proxying to the dead IP → /api hangs, the
# websocket 502s, and the mempool UI shows "offline" until nginx is reloaded.
#
# Fix: force per-request DNS re-resolution via `resolver` + a variable in
# proxy_pass. Because a variable in proxy_pass disables nginx's automatic
# location→URI rewriting, each block is rewritten to preserve its original
# path mapping exactly:
# /api/v1/ws, /ws → "/" (var + "/" replaces the whole URI)
# /api/v1 → identity (no-URI proxy_pass passes $uri unchanged)
# /api/ → /api/v1/$1 (explicit rewrite, then no-URI proxy_pass)
# Operates on the __PLACEHOLDER__ tokens so the host/port sed below fills in
# the concrete values (incl. the `set $mp_backend` line). Idempotent.
# Resolver address: podman's aardvark-dns answers on the network gateway
# (e.g. 10.89.0.1), NOT Docker's 127.0.0.11. Read it from resolv.conf so this
# works on any podman network/subnet (and still falls back for Docker).
ARCHY_RESOLVER=$(awk '/^nameserver/ { print $2; exit }' /etc/resolv.conf 2>/dev/null)
ARCHY_RESOLVER=${ARCHY_RESOLVER:-127.0.0.11}
if ! grep -q 'set \$mp_backend' "$CONF"; then
awk -v res_addr="$ARCHY_RESOLVER" '
BEGIN { res = 0 }
/^[[:space:]]*location / && res == 0 {
print "\tresolver " res_addr " valid=10s ipv6=off;"
res = 1
}
/proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/;/ {
print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__/;"
next
}
/proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/api\/v1\/;/ {
print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
print "\t\trewrite ^/api/(.*)$ /api/v1/$1 break;"
print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__;"
next
}
/proxy_pass http:\/\/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__\/api\/v1;/ {
print "\t\tset $mp_backend __MEMPOOL_BACKEND_MAINNET_HTTP_HOST__;"
print "\t\tproxy_pass http://$mp_backend:__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__;"
next
}
{ print }
' "$CONF" > "$CONF.archy" && mv "$CONF.archy" "$CONF"
fi
# ─── end archipelago patch ────────────────────────────────────────────────
sed -i "s/__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__/${__MEMPOOL_BACKEND_MAINNET_HTTP_HOST__}/g" /etc/nginx/conf.d/nginx-mempool.conf
sed -i "s/__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__/${__MEMPOOL_BACKEND_MAINNET_HTTP_PORT__}/g" /etc/nginx/conf.d/nginx-mempool.conf
cp /etc/nginx/nginx.conf /patch/nginx.conf
sed -i "s/__MEMPOOL_FRONTEND_HTTP_PORT__/${__MEMPOOL_FRONTEND_HTTP_PORT__}/g" /patch/nginx.conf
cat /patch/nginx.conf > /etc/nginx/nginx.conf
if [ "${LIGHTNING_DETECTED_PORT}" != "" ];then
export LIGHTNING=true
fi
# Runtime overrides - read env vars defined in docker compose
__MAINNET_ENABLED__=${MAINNET_ENABLED:=true}
__TESTNET_ENABLED__=${TESTNET_ENABLED:=false}
__TESTNET4_ENABLED__=${TESTNET4_ENABLED:=false}
__SIGNET_ENABLED__=${SIGNET_ENABLED:=false}
__LIQUID_ENABLED__=${LIQUID_ENABLED:=false}
__LIQUID_TESTNET_ENABLED__=${LIQUID_TESTNET_ENABLED:=false}
__ITEMS_PER_PAGE__=${ITEMS_PER_PAGE:=10}
__KEEP_BLOCKS_AMOUNT__=${KEEP_BLOCKS_AMOUNT:=8}
__NGINX_PROTOCOL__=${NGINX_PROTOCOL:=http}
__NGINX_HOSTNAME__=${NGINX_HOSTNAME:=localhost}
__NGINX_PORT__=${NGINX_PORT:=8999}
__BLOCK_WEIGHT_UNITS__=${BLOCK_WEIGHT_UNITS:=4000000}
__MEMPOOL_BLOCKS_AMOUNT__=${MEMPOOL_BLOCKS_AMOUNT:=8}
__BASE_MODULE__=${BASE_MODULE:=mempool}
__ROOT_NETWORK__=${ROOT_NETWORK:=}
__MEMPOOL_WEBSITE_URL__=${MEMPOOL_WEBSITE_URL:=https://mempool.space}
__LIQUID_WEBSITE_URL__=${LIQUID_WEBSITE_URL:=https://liquid.network}
__MINING_DASHBOARD__=${MINING_DASHBOARD:=true}
__LIGHTNING__=${LIGHTNING:=false}
__AUDIT__=${AUDIT:=false}
__MAINNET_BLOCK_AUDIT_START_HEIGHT__=${MAINNET_BLOCK_AUDIT_START_HEIGHT:=0}
__TESTNET_BLOCK_AUDIT_START_HEIGHT__=${TESTNET_BLOCK_AUDIT_START_HEIGHT:=0}
__SIGNET_BLOCK_AUDIT_START_HEIGHT__=${SIGNET_BLOCK_AUDIT_START_HEIGHT:=0}
__ACCELERATOR__=${ACCELERATOR:=false}
__ACCELERATOR_BUTTON__=${ACCELERATOR_BUTTON:=true}
__SERVICES_API__=${SERVICES_API:=https://mempool.space/api/v1/services}
__PUBLIC_ACCELERATIONS__=${PUBLIC_ACCELERATIONS:=false}
__HISTORICAL_PRICE__=${HISTORICAL_PRICE:=true}
__ADDITIONAL_CURRENCIES__=${ADDITIONAL_CURRENCIES:=false}
# Export as environment variables to be used by envsubst
export __MAINNET_ENABLED__
export __TESTNET_ENABLED__
export __TESTNET4_ENABLED__
export __SIGNET_ENABLED__
export __LIQUID_ENABLED__
export __LIQUID_TESTNET_ENABLED__
export __ITEMS_PER_PAGE__
export __KEEP_BLOCKS_AMOUNT__
export __NGINX_PROTOCOL__
export __NGINX_HOSTNAME__
export __NGINX_PORT__
export __BLOCK_WEIGHT_UNITS__
export __MEMPOOL_BLOCKS_AMOUNT__
export __BASE_MODULE__
export __ROOT_NETWORK__
export __MEMPOOL_WEBSITE_URL__
export __LIQUID_WEBSITE_URL__
export __MINING_DASHBOARD__
export __LIGHTNING__
export __AUDIT__
export __MAINNET_BLOCK_AUDIT_START_HEIGHT__
export __TESTNET_BLOCK_AUDIT_START_HEIGHT__
export __SIGNET_BLOCK_AUDIT_START_HEIGHT__
export __ACCELERATOR__
export __ACCELERATOR_BUTTON__
export __SERVICES_API__
export __PUBLIC_ACCELERATIONS__
export __HISTORICAL_PRICE__
export __ADDITIONAL_CURRENCIES__
folder=$(find /var/www/mempool -name "config.js" | xargs dirname)
echo ${folder}
envsubst < ${folder}/config.template.js > ${folder}/config.js
exec "$@"


@ -1,11 +1,13 @@
# 🚩 PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
# PRODUCTION MASTER PLAN — Archipelago App Platform & Registry
> **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until
> the production test gate (§5) is green.** It overrides ad-hoc direction and
> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
> the priority banner and demote this doc.
> **✅ SINGLE-NODE PRODUCTION GATE IS GREEN (2026-06-23): `run-gate.sh` 5/5 on .228, 0 failures.**
> This remains the authoritative plan for the broader north star (manifest-driven
> platform, registry-distributed manifests, external marketplace), but it is no
> longer a hard priority banner blocking all other work. Remaining workstreams are
> in §6 / §8b. Next exit-criteria: multinode (`docs/multinode-testing-plan.md`) +
> workstreams B/C/D.
>
> Last updated: 2026-06-22 · Binary: v1.7.99-alpha · See §8b for the live resume.
> Last updated: 2026-06-26 · zombie-container guard + gitea launch-port fix shipped, binary `040df5ce` rolled to the fleet (see §8b SESSION h). Prior: orchestrator Fix A+B (`a721532f`/`e0343137`) deployed + proven.
---
@ -40,7 +42,8 @@ real nodes. Until then, this plan is the priority.
- **Migrations never destroy data.** Preserve `/var/lib/archipelago/<app>`,
generated secrets, displayed credentials, public ports, and adoption container
names. Always provide a rollback path. Stop/recreate only when necessary.
- **Verify on a real node (.228, then .198) before any tag.**
- **Verify on the real node .228 before any tag.** (Fleet/multinode verification is
a separate pass → `docs/multinode-testing-plan.md`.)
## 3. Current state (2026-06-21)
@ -56,7 +59,7 @@ real nodes. Until then, this plan is the priority.
- **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`,
`-fedimint-ui`) build from `docker/<name>` contexts via `companion.rs`, not the
manifest registry — a later phase folds them in.
- **No app has passed the formal production gate (5× for now, was 20×).** That is the blocker.
- **No app has passed the formal production gate.** That is the blocker.
## 4. Workstreams (each links its authoritative detail doc)
@ -66,7 +69,8 @@ real nodes. Until then, this plan is the priority.
| B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
| C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
| D | **Distribution backbone**: signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0-2 code-complete (worktree) |
| E | **Production test gate** — 5× lifecycle on .228 + .198 (for now; was 20×), per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** |
| E | **Production test gate** — 5× lifecycle on **.228**, per-app L1/L2 matrix; multinode is split out → `multinode-testing-plan.md` | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **✅ .228 5×-GREEN (110/110 ×5, 0 not-ok, 2026-06-23)** — but this is DESTRUCTIVE-tier / ~8 core apps only; see §6c for the coverage gaps |
| F | **Lifecycle perfection — cascade + progress + ALL apps** — extend the gate to uninstall/reinstall (cascade), real install/uninstall progress UI, and EVERY installed app (not just the 8 core). The "insanely-perfect OS/container environment" bar. | §6c (below), `tests/lifecycle/TESTING.md` | **IN PROGRESS (2026-06-26)** — root bug FIXED: uninstall could hang → ghost/stuck-bar/reinstall-block (`71cc9ac4`, unbounded systemctl/podman in `quadlet::disable_remove`); `cascade-uninstall.bats` **7/7 green on .228** w/ binary `ae349a75`. Remaining: wire CASCADE into the canonical gate run, progress-UI truthfulness, all-apps matrix, guardian/IBD state. |
**Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
(ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
@ -75,13 +79,23 @@ modes FM1-FM6 + the desired-state-first reconciler that fixes them).
## 5. Production test gate (exit criterion)
An app is **production-ready** only when `tests/lifecycle/run-20x.sh` is green
An app is **production-ready** only when `tests/lifecycle/run-gate.sh` is green
across the full matrix — install / UI-reachable / stop / start / restart /
reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall —
**5× on .228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from
20× — restore to 20× before the final ship). All 8 gate checkboxes in `tests/lifecycle/TESTING.md`
are currently unchecked. Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps,
L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated coverage.
**5× on .228** (`ARCHY_ITERATIONS=5`). **The gate runs ON the node** (it uses local
podman/systemctl/bitcoin probes; running it via RPC from another host silently
tests the runner). **Multinode / fleet verification (.198 + others) is a SEPARATE
plan — `docs/multinode-testing-plan.md` — NOT part of this single-node criterion.**
Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps, L2 UI ● dashboard +
proxies; L3 survival ◐; ~30 apps have zero automated coverage.
> ⚠️ **The 2026-06-23 5×-green is NOT the full bar.** `run-gate.sh` runs only the
> **DESTRUCTIVE tier** (stop/start/restart/survive) over ~8 core apps; it **skips
> uninstall/reinstall** (CASCADE is gated behind `ARCHY_ALLOW_CASCADE_DESTRUCTIVE`,
> never set by the gate) and tests no install/uninstall **progress UI**. Real
> uninstall/reinstall/progress bugs (immich + grafana) were found in manual testing
> right after — see **§6c (workstream F)** for the gap and the expanded-gate plan.
> The true "every app, fully" criterion is F's definition-of-done, not this run.
## 6. Immediate sequence (live workstream)
@ -97,14 +111,118 @@ L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated cov
data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide)
for the podman-`--restart` path. *(f160e0c4)*
5. ◻ **Verify on .198** (immich migration validated on .228 only so far).
6. ◻ **E** — run the 5× gate (`ARCHY_ITERATIONS=5`, was 20×); fix until green.
7. ◻ Demote this banner.
5. ✅ **E** — 5× gate on **.228** (`ARCHY_ITERATIONS=5`) is **GREEN: 5/5, 0 not-ok**
(2026-06-23). Two real orchestrator bugs were found + fixed en route (package.stop
per-app grace; package.restart phantom stack-member injection → `order_present_containers`,
commit 92d7f52d) plus two single-shot-read probes hardened (bitcoin-knots state, immich
lan_address). The single-node criterion is met.
6. ✅ Banner demoted (this doc, 2026-06-23). Next: multinode pass + workstreams B/C/D.
**Multinode / fleet verification (.198 and the rest) is split into its own plan:**
`docs/multinode-testing-plan.md`. Do it AFTER the .228 single-node gate is green.
**Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the
published catalog (then sign) to actually distribute manifests via the registry;
Phase-3 `use_quadlet_backends` rollout so orchestrator backends are Quadlet (not
just podman-`--restart`); immich on .198.
just podman-`--restart`).
## 6b. Post-deploy task order (agreed 2026-06-23)
After the 2026-06-23 multinode test deploy (latest backend + UX frontend to .116/.198/.228
+ Tailscale testers), do these IN ORDER:
1. **netbird #20 ph4** — the last real manifest migration (workstream A).
2. **Phase-3 `use_quadlet_backends`** — orchestrator backends become Quadlet units.
3. **§6c Lifecycle perfection** (workstream F) — the comprehensive uninstall/reinstall +
progress-UI + all-apps gate expansion below.
## 6c. Lifecycle perfection — what "green" MISSED (workstream F, the perfection bar)
**Why this exists:** the 2026-06-23 single-node gate went 5×-green but is **NOT** the
"every app fully lifecycle-tested" guarantee a user reasonably assumes. The canonical gate
(`run-gate.sh`) only runs the **DESTRUCTIVE tier** (stop / start / restart / survive) over
**~8 core apps** (bitcoin-knots, btcpay, electrumx, lnd, mempool, immich, fedimint,
filebrowser). It explicitly **SKIPS uninstall/reinstall** (the CASCADE tier is gated behind
`ARCHY_ALLOW_CASCADE_DESTRUCTIVE`, which `run-gate.sh` never sets) and has **zero coverage**
for the other ~30 apps (grafana, jellyfin, vaultwarden, penpot, nextcloud, photoprism,
uptime-kuma, homeassistant, … — see `app-registry-status-2026-06-21.md`). So uninstall,
reinstall, install-progress UI, and most apps were never under test.
**Real bugs found in manual multinode testing on .198 (2026-06-23) — the motivating evidence:**
- **Uninstall is broken for immich + grafana:** takes very long, the progress bar sits at a
**solid full-red with no real progression**, and the app **does not actually uninstall**:
it still appears in **My Apps** afterward (ghost entry / state not cleared).
- **grafana reinstall just stops** partway (no completion, no clear error).
- **fedimint guardian** suddenly showed **"starting up — Guardian opens a wait page until
Bitcoin finishes initial sync" / "starting"** on that node — verify this is correct
wait-for-IBD behavior vs a stuck/false state (it's a backend that depends on bitcoin sync).
**✅ 2026-06-26 — root cause of the immich/grafana uninstall trio FOUND + FIXED (`71cc9ac4`).**
Single cause: `quadlet::disable_remove()` (first op in uninstall teardown, via companion +
orchestrator) ran `systemctl --user stop` / `daemon-reload` / `podman rm -f` with **no timeout**.
On rootless podman a generated unit can wedge "deactivating" while podman hangs → `systemctl stop`
blocks forever → the spawned uninstall task returns neither Ok nor Err, so (a) `set_uninstall_stage`
never fires → **frozen full-red bar**, (b) `remove_package_state_entry` never runs → **ghost stuck in
`Removing`**, (c) the install guard rejects reinstall (`already Removing`). The spawn wrapper already
reverts state on Err/removes on Ok — only a *hang* stranded it. Fix bounds all three calls
(stop→`QUADLET_STOP_TIMEOUT` + SIGKILL/reset-failed escalation; daemon-reload→30s; podman rm→timeout).
**Validated live: `cascade-uninstall.bats` 7/7 on .228** (binary `ae349a75`) — grafana install →
uninstall (no ghost, data dir gone) → reinstall → running → cleanup. NOTE: proves the happy path +
no-regression; the original hang was load/timing-induced and not separately reproduced.
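A minimal shell sketch of the bounding pattern described above, assuming GNU `timeout` is available. The real fix lives in the Rust `quadlet::disable_remove()`; the function names, the 20-second default, the use of `QUADLET_STOP_TIMEOUT` as an environment variable, and the 10s/30s limits on the escalation and `podman rm` steps are illustrative assumptions, not the shipped code.

```sh
#!/bin/sh
# Sketch only: bound every teardown call so a wedged unit cannot hang uninstall.
# Assumes GNU coreutils `timeout`; names and default limits are hypothetical.

# run_bounded SECS CMD...: run CMD, send TERM after SECS seconds and KILL
# 2 seconds later. Exit status is 124 on a TERM timeout (137 if KILL was needed).
run_bounded() {
    secs=$1
    shift
    timeout --kill-after=2 "$secs" "$@"
}

# stop_unit_bounded UNIT: stop a user unit, escalating if the stop fails
# or times out.
stop_unit_bounded() {
    unit=$1
    limit=${QUADLET_STOP_TIMEOUT:-20}
    if run_bounded "$limit" systemctl --user stop "$unit"; then
        return 0
    fi
    # Escalation: force-kill whatever is left, then clear the failed state.
    run_bounded 10 systemctl --user kill --signal=SIGKILL "$unit"
    run_bounded 10 systemctl --user reset-failed "$unit"
}

# teardown UNIT CONTAINER: the three bounded steps, in the order named above.
teardown() {
    stop_unit_bounded "$1"
    run_bounded 30 systemctl --user daemon-reload
    run_bounded 30 podman rm -f "$2"
}
```

Because every call returns within a fixed bound, the caller always sees a success or a failure and can advance or revert the package state instead of stranding it in `Removing`.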
**Workstream F scope — the gate must grow to (in priority order):**
1. **CASCADE tier in the canonical gate:** uninstall → verify the app is GONE from My Apps /
`container-list` / package state (no ghost), data preserved per policy, then reinstall →
verify it returns healthy. Catch the immich/grafana ghost + reinstall-stops bugs.
*(✅ DONE `b7d92107`: `run-gate.sh` now runs ONE cascade pass after the 5× loop when
`ARCHY_GATE_CASCADE=1` (+`ARCHY_ALLOW_DESTRUCTIVE=1`), counted into the tally — opt-in so default
behavior is unchanged, and deliberately NOT folded into all 5 iterations. `cascade-uninstall.bats`
7/7 on .228. Next: extend cascade coverage beyond the single throwaway app to the multi-container
stacks, e.g. an immich/btcpay cascade variant.)*
2. **Progress-UI assertions:** install AND uninstall must report monotonic, truthful progress
(not a stuck full-red bar); a long op must surface a real stage/percentage and a terminal
success/failure — no silent hang. (Likely both a backend progress-event fix AND a UI fix.)
*(✅ 2026-06-26 `9f17ba68`: the "stuck full-red bar" was `AppCard.vue` hardcoding the uninstall
bar to `w-full bg-red-400/60 animate-pulse` — solid, full, red, fake-pulse. Now derives a real
percentage from the backend's existing `uninstall-stage` label ("Stopping containers (X/N)"→10-50%,
"Cleaning up volumes"→70%, "Removing app data"→90%) and renders like install (neutral fill, real
width+%, shimmer). FE built `index-DtZyZomC.js`, rolled to .228/.116/.198/.89 (+.88/.5/.120).
STILL TODO: a bats/UI assertion that the bar is monotonic + lands on a terminal state; possibly a
backend numeric-progress field so the UI doesn't parse stage strings.)*
3. **ALL-apps coverage:** a generic per-app lifecycle matrix (install / UI-reach / stop / start /
restart / uninstall / reinstall / reboot-survive) driven by the manifest set, so grafana and
the ~30 uncovered apps are gated too — not just the 8 core. Manifest-driven, so new apps are
covered automatically.
*(✅ 2026-06-26 `43934eef`: `bats/all-apps-lifecycle.bats` — DESTRUCTIVE counterpart to the
read-only `all-apps-matrix.bats`. Discovers the app set from My Apps ∩ the node `catalog.json`;
drives stop/start/restart for every app and, under `ARCHY_ALLOW_CASCADE_DESTRUCTIVE`, a FULL
teardown (uninstall→no-ghost→reinstall) with the catalog `{dockerImage, containerConfig}` as the
reinstall spec. PROTECTED (never touched): bitcoin*/electrum* (resync cost) + lnd/btcpay*/fedimint*
(irreversible wallet loss — user asked to protect only bitcoin+electrum; wallet apps added for
safety, override via `ARCHY_MATRIX_PROTECT`). Validated on .228 (discovery + 1-app lifecycle
green). HEAVY/destructive → a supervised pass on LAN nodes (.116/.198/.228), NOT folded into
run-gate. Invoke: `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1 ARCHY_PASSWORD=…
ARCHY_SCHEME=https bats bats/all-apps-lifecycle.bats`.)*
**✅ FIRST FULL DESTRUCTIVE RUN on .228 (2026-06-26):** lifecycle **11/11 clean**; teardown
**8/11** (immich 3-container stack incl.) — and it surfaced **3 real reinstall bugs** (the payoff):
1. **fresh-install bind-dir ownership = root:root** → EACCES on reinstall (jellyfin `/config`
denied exit 139; netbird-server can't open its SQLite store). Fix B's chown-to-parent only
runs on the reconcile path, **not** `package.install`. The important orchestrator fix.
2. **netbird reinstall adopts leftover containers → skips the manifest cert/file render**
(tls.crt/key/nginx.conf never written → proxy can't start → app reads absent). Only a fully
clean reinstall renders them.
3. **portainer image pin `lfg2025/portainer:2.19.4` is `manifest unknown`** (never pushed to the
registry) and the pin OVERRIDES the RPC dockerImage → portainer is un(re)installable
fleet-wide. Registry/catalog data bug (push the image or change the pin).
.228 restored (jellyfin+netbird via manual chown / clean reinstall; all installed apps running,
28 ctrs; portainer left uninstalled — uninstallable until #3 fixed). TODO: fix #1 (extend chown
to install path) + #2 + #3; add reboot-survive + UI-reach per app to the matrix.
4. **Guardian/IBD-dependent states:** assert that "waiting for bitcoin sync"-style states are a
legitimate, surfaced wait (with a path to ready) and never a permanent stuck state.
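To make scope item 2 concrete, here is a hedged shell sketch of the stage-to-percentage mapping and of the monotonic check that the outstanding bats assertion could use. The shipped mapping lives in `AppCard.vue`; the linear interpolation inside the 10-50% band, the fallback values, and all function names are assumptions for illustration.

```sh
#!/bin/sh
# Sketch only: function names and fallbacks are hypothetical, not the FE code.

# is_uint S: succeeds when S is a non-empty string of digits.
is_uint() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# stage_percent LABEL: map an uninstall-stage label to a percentage.
stage_percent() {
    label=$1
    case "$label" in
        "Stopping containers ("*/*")"*)
            x=${label#*\(}; x=${x%%/*}       # text between "(" and "/"
            n=${label#*/};  n=${n%%\)*}      # text between "/" and ")"
            if is_uint "$x" && is_uint "$n" && [ "$n" -gt 0 ]; then
                [ "$x" -le "$n" ] || x=$n    # clamp X to N
                echo $(( 10 + 40 * x / n ))  # assumed linear 10..50 band
            else
                echo 10                      # unparseable counter: band start
            fi
            ;;
        "Cleaning up volumes"*) echo 70 ;;
        "Removing app data"*)   echo 90 ;;
        *) echo 0 ;;                         # unknown stage (assumption)
    esac
}

# is_monotonic P1 P2 ...: succeeds when the samples never decrease.
is_monotonic() {
    prev=0
    for p in "$@"; do
        [ "$p" -ge "$prev" ] || return 1
        prev=$p
    done
    return 0
}
```

A harness could sample the label during an uninstall, feed each sample through `stage_percent`, and fail the test if `is_monotonic` rejects the sequence. A numeric progress field from the backend would remove the string parsing entirely.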
**Definition of done for F:** the expanded gate (CASCADE + progress + all-apps) is 5×-green on
.228, then re-verified across the multinode fleet — i.e. an *insanely-perfect* OS/container
environment where every app installs, runs, updates, uninstalls, and reinstalls cleanly with
honest progress, no ghosts, no data loss, reboot-survivable.
## 7. Release blockers & operational gotchas (durable)
@ -141,6 +259,32 @@ Beta Live (public). Hardening priorities feeding the gate:
- **P1** LUKS2 full-partition encryption for `/var/lib/archipelago/`
(AES-256-XTS, Argon2id, key from setup password + hardware salt).
- **P1** Meshtastic plug-and-play parity with MeshCore.
- **P1 ✅ CODE-COMPLETE** (branch `companion-mobile-ux`, 2026-06-23; needs
on-device + mobile-web verification before merge to `main`) — Mobile app-launch
UX — drop the "this app opens in a tab" interstitial.
Two surfaces (both: no interstitial screen, launch the app directly):
- **Companion app (Android):** open **every** app in the **in-app WebView**
(not just non-iframeable ones) — *and* carry the current mobile-iframe footer
controls into the WebView (back/forward/reload/close — good, useful UX).
- **Mobile web browser (PWA):** open tab-apps directly in a **new browser tab**.
Touch points: `neode-ui/src/stores/appLauncher.ts`, `AppLauncherOverlay.vue`,
the Android in-app WebView bridge, and the mesh-mobile iframe footer controls.
(Reference prior work: `b5a9deb8` in-app webview for non-iframeable apps,
`d1fbcd9b` "open in browser" via native bridge.)
- **✅ Done (branch `companion-mobile-ux`):** mobile launches now use the
store-driven panel (no route push) so the background tab no longer changes and
closing returns you where you launched; tab-only apps open directly (in-app
WebView on companion via `openInApp`, new browser tab on PWA) with **no
interstitial**; the Android `InAppBrowser` (`WebViewScreen.kt`) gained a bottom
footer bar (back/forward/reload/open-in-browser/close) + a centered loading
screen (favicon + progress); a shared `AppLoadingScreen` (icon + progress)
replaced the black/spinner loaders on the app session **and** legacy iframe
overlay; the dashboard is pinned to `100dvh` on mobile so the mesh chat/tools
panes stop sliding under the tab bar in mobile browsers (no-op in companion);
ElectrumX shows its real icon in My Apps. Companion APK bumped to **v0.4.7**
(versionCode 11) with a committed shared debug keystore so updates install
without an uninstall. **Not yet:** merge to `main`; publish the 0.4.7 companion
download (deferred until the gate work lands so they ship together).
**Post-beta (deferred — do not start until gate is green):** P2P encrypted
voice/video (WebRTC over federation via Tor); watch-only wallet + mesh BTC
@ -148,14 +292,271 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
phases 2-6 (`dual-ecash-design.md`).
## 8b. SESSION STATE + RESUME (updated 2026-06-22) — READ THIS FIRST ON RESUME
## 8b. SESSION STATE + RESUME (updated 2026-06-26) — READ §8b "CURRENT STATE + RESUME" FIRST
### ▶ SESSION h (2026-06-26) — LATEST, RESUME FROM HERE
**Canonical resume detail: memory `project_session_resume_2026_06_23b` (▶️ top of MEMORY.md).**
Local main = `670ebb06` (3 commits past the previously-pushed `43e70049`: `0a8db904` zombie
guard + `670ebb06` gitea launch-port fix; `43e70049` webview was already pushed). **Combined
release binary `040df5ce2551d17b` rolled to the fleet.** Binary+FE not in git — rebuild on a
fresh machine (`cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`).
**DONE this session:**
1. ✅ **Zombie-container guard** (`0a8db904`) — the reconciler's Running branch now verifies a
container's `State.Pid` is alive (`/proc/<pid>` exists) before trusting podman's "Up"; on a
concrete dead PID it stop+remove+`install_fresh` from the manifest. Conservative: any
uncertainty (inspect fail / unparseable PID) assumes alive, so a transient hiccup never
destroys a healthy container. Fixes the class that broke NetBird login on .228 (dashboard
"Up" w/ dead PID → proxy 502, no host port → reconciler never recovered it). Unit test +
**live-proven on .228**: synthetic zombie on `jellyfin` (killed conmon+PID → podman still
"Up") → guard logged `…process is dead (zombie) — recreating app_id=jellyfin` → recreated →
settled to NoOp. **Zero false-positives across the other 33 healthy containers.**
2. ✅ **Gitea launch-port fix** (`670ebb06`) — gitea launched at **:2222 (SSH)** instead of
**:3001 (web)** on nodes without the gitea manifest on disk (`manifest_lan_address_for`
returns None → fell through to `extract_lan_address`, which returns podman's first-listed
port; podman lists `2222->22` before `3001->3000`). Added `"gitea" => http://localhost:3001`
to the static `lan_address_for` map (`core/container/src/podman_client.rs`) like every other
core app. Reported on tailscale node **100.82.34.38** — that node still needs the new binary
(or a refreshed gitea manifest) to pick it up.
3. ✅ **Rolled `040df5ce`** to .228/.116/.198/.89 (verified sha+active); .88/.5/.120 rolling.
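The zombie-container guard from item 1 boils down to a three-way decision. The sketch below shows that decision in shell, assuming a Linux `/proc`. The function names are hypothetical, and treating PID 0 as "uncertain" is an assumption, since the text above only names inspect failures and unparseable PIDs.

```sh
#!/bin/sh
# Sketch only: mirrors the guard's "concrete dead PID" rule, not the Rust code.

# pid_state PID: print "alive", "dead", or "unknown".
pid_state() {
    case "$1" in
        ''|*[!0-9]*) echo unknown; return 0 ;;  # empty or unparseable
    esac
    if [ "$1" -eq 0 ]; then
        echo unknown                            # assumption: 0 is not a real PID
    elif [ -d "/proc/$1" ]; then
        echo alive
    else
        echo dead
    fi
}

# should_recreate CONTAINER: succeed only on a concrete dead PID.
# An inspect failure yields an empty PID, which maps to "unknown" (assume alive).
should_recreate() {
    pid=$(podman inspect --format '{{.State.Pid}}' "$1" 2>/dev/null) || pid=
    [ "$(pid_state "$pid")" = dead ]
}
```

Only the `dead` outcome triggers a recreate; `unknown` is deliberately grouped with `alive` so that a transient inspect hiccup never destroys a healthy container.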
**OPEN follow-ups (logged, NOT regressions):**
- **mempool env-drift recreate-loop on .228** — reconciler logs `container env drift detected —
recreating app_id=mempool` every ~30-90s, never converges (pre-existing; the known mempool
nginx stale-IP class, [[project_mempool_nginx_stale_ip_fix]]). mempool stays running but churns.
- **nostr-rs-relay** stuck "Stopping" + ~2s create-loop on .228 (from session g).
**NEXT:** finish .88/.5/.120 roll → push main to gitea-vps2 → Phase-3 quadlet / Workstream F /
multinode. SSH/sudo pw `ThisIsWeb54321@` (**.88 = `ThisIsWeb54321!`**); UI/RPC .228/.198 =
`ThisIsWeb54321@`. Reusable tooling in scratchpad: `deploy-bin.sh`/`remote-apply.sh` (EXPECT_SHA
= `040df5ce…`), `rpc.sh`.
---
### ▶ SESSION g (2026-06-25) — earlier, historical
**Canonical resume detail: memory `project_session_resume_2026_06_23b` + `project_netbird_ph4_legacy_deletion_map` + `project_workstream_f_lifecycle_perfection`.**
`gitea-vps2/main = a721532f` (pushed). **Local main = `89d397bb`** (2 new commits this session, NOT pushed/deployed: `41e7f500` harness tolerance + `89d397bb` netbird ph4 legacy delete). Binary+FE are NOT in git — rebuild on a fresh machine.
**TL;DR (SESSION g, 2026-06-25) — everything below DONE this session:**
1. ✅ **Rolled** `e0343137` + fresh FE (`index-a75rd6Hy.js`) to **7 nodes** (.116/.198/.228/.89/.88/.5/.120), all verified. **.15 SKIPPED** (auth rejected — creds don't match).
2. ✅ **Harness tolerance fixes COMMITTED** `41e7f500` (run-gate settle/immich + immich.bats 90s + mempool.bats poll).
3. ✅ **mempool RESOLVED** fleet-wide — see mempool note below.
4. ✅ **netbird #20 ph4 DONE** — legacy Rust installer DELETED, committed `89d397bb` (492 lines gone, manifest-driven only, `cargo check` clean). Release binary BUILDING for the .228 live-verify (build left running — check after).
**NEXT (resume here):** (a) check the release build, deploy the `89d397bb` binary to .228, live-verify netbird adopts via manifest (https:8087→200, no `bail!`); (b) roll `89d397bb` to the rest of the fleet (behavior-neutral — manifest path already executed); (c) **push local main → gitea-vps2** (2 commits ahead); then **Phase-3 `use_quadlet_backends` → Workstream F → multinode**.
**ROLL RESULTS (2026-06-25, binary `e0343137b99bf066` + fresh FE bundled):**
| Node | Result |
|------|--------|
| .228 | ✅ already on `e0343137` (prior session, binary-only) |
| .116 (local) | ✅ binary + fresh FE; 36 containers survived restart; UI 200; `index-a75rd6Hy.js` live |
| .198 (LAN) | ✅ binary + fresh FE; 38 containers up; UI 200 |
| .89 (100.89.209.89) | ✅ binary + fresh FE; service active |
| .88 (100.70.96.88, pw `ThisIsWeb54321!`) | ✅ binary + fresh FE; service active |
| .5 (100.72.136.5) | ⏳ attempted — see resume note (cellular x250) |
| .120 (100.66.157.120) | ⏳ attempted — see resume note (cellular x250) |
| .15 (100.64.83.15, archy-dev-pa) | ❌ SKIPPED — `archipelago@` + `ThisIsWeb54321@` rejected (`Permission denied (publickey,password)`); node creds unknown |
Deploy tooling (reusable): scratchpad `deploy-bin.sh <label> <local\|ssh\|ts> <host> <pw>` + `remote-apply.sh` (mv binary avoids ETXTBSY, atomic FE swap preserving `aiui`/APK/`claude-login.html`, chown 1000:1000, restart, sha+health verify). Frontend tarball = `tar -C web/dist/neode-ui -czf neode-ui.tgz .` (flat). Full sha `e0343137b99bf06642c45da67bb092e9a411190ff59eda8e5177c2a06b6f6e89`.
**Focus: validate the two UNVALIDATED-WIP orchestrator fixes (commit `a721532f`) on the .228 canary, then roll to the 7-node fleet.**
- **Fix A** — desired-state recovery: a was-running app that vanished (e.g. lost through a failed teardown + reboot) auto-recreates on reconcile, via new `crash_recovery::load_last_running_names` (reads `running-containers.json` sans PID gate) + exact container-name match in `reconcile_all_with_mode`. Zero false-positives (uninstalled/user-stopped excluded).
- **Fix B** — recreate volume-ownership: a freshly-created bind dir for a NO-`data_uid` app gets `chown --reference=<parent>` so container-root can write → kills the immich-class recreate EACCES crash-loop. Only fresh dirs (zero regression for existing installs).
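Fix B can be pictured with a short shell sketch, assuming GNU `chown` (for `--reference`) and an existing parent directory. The function name `ensure_bind_dir` and its printed status words are hypothetical; the orchestrator does this in Rust.

```sh
#!/bin/sh
# Sketch only: chown a bind dir to its parent's owner, but only when the
# directory is freshly created, so existing installs are never touched.

# ensure_bind_dir DIR: print "existing" or "created".
ensure_bind_dir() {
    dir=$1
    if [ -d "$dir" ]; then
        echo existing            # pre-existing data: leave ownership alone
        return 0
    fi
    parent=$(dirname -- "$dir")
    mkdir -- "$dir" || return 1
    # Copy owner and group from the parent directory.
    chown --reference="$parent" -- "$dir" || return 1
    echo created
}
```

The early return on an existing directory is what keeps the change regression-free: ownership is only ever set on a directory that this call just created.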
VALIDATION PROGRESS (sessions e→f):
1. ✅ Release binary built — sha16 `e0343137b99bf066` (differs from pre-fix `f2aa2fab` → fixes compiled in).
2. ✅ `cargo test -p archipelago crash_recovery`: **13/13 green**, incl. the two new Fix A tests.
3. ✅ Deployed new binary to **.228 canary** (binary-only; FE unchanged at `435b9f92`). Verified live sha `e0343137`, active, RPC OK. Container cgroup confirmed in `user@1000.service` (NOT archipelago.service) → `systemctl stop` is container-safe on .228.
4. ✅ **Fix A PROVEN**: `podman rm -f jellyfin` (non-baseline, no-data_uid) → periodic ExistingOnly reconciler (30s) recreated it; journal: `previously-running app has no container after boot — recreating (desired-state recovery) app_id=jellyfin`.
5. ✅ **Fix B PROVEN** — fresh `package.install uptime-kuma` (no-data_uid, no prior data dir) → bind dir chowned to parent owner `1000:1000` (NOT root:root), state=running, RestartCount=0, no EACCES, app wrote its own subdirs → clean uninstall (container+data-dir gone). all-apps matrix read-only **5/5 (17 apps)**.
6. 🟡 **5× DESTRUCTIVE gate on .228 — NOT yet 5/5, but failures are HARNESS-TOLERANCE FLAKES, NOT Fix A/B regressions** (proven: Fix A logged **0** desired-state-recovery firings during the failures; immich/lnd `RestartCount: 0`, no crashes). Under sustained 5× churn on this 34-app node a *different* heavy-app recovery probe slips each iteration:
- immich `lan_address` (test 64): 30s probe too tight after archipelago-restart recovery. **FIXED** (settle_stack now waits on immich :2283 when present, cap 180→300s; test 64 deadline 30→90s). Went **ok/ok/ok 3×** after fix.
- mempool orphan count (test 82): single-shot count caught a transient extra container mid-recreate (clears to 3=3). **FIXED locally** (poll for steady-state ≤30s) — fix is in local `tests/lifecycle/bats/mempool.bats`, NOT yet re-gated.
- lnd `getinfo recovers after restart` (test 77): already has a generous 240s deadline; peak concurrent load occasionally beats it. lnd itself **HEALTHY** (wallet unlocked — "wallet already unlocked, WalletUnlocker no longer available", RestartCount 0). Likely needs deadline bump or lnd added to within-iteration tolerance. **NOT yet fixed.**
- NOTE: the 300s settle bump made iterations very long (iter2=1062s) and a diagnostic run wedged in iter3; killed it. Re-think settle (maybe per-app readiness with shorter caps) before the next run.
7. ✅ **DECISION RESOLVED (2026-06-25):** user chose **(B) roll now** AND bundle the fresh UX frontend (per `feedback_deploy_targets_and_ux_bundle`). Gate load-robustness deferred to a separate hardening pass.
8. ✅ **ROLLED** `e0343137` + fresh FE (`index-a75rd6Hy.js`) to .116/.198/.89/.88/.5/.120 (.228 already on it) — all verified `sha=e0343137`, service active. **.15 skipped** (auth reject). See roll table above.
9. ✅ **Harness fixes COMMITTED** `41e7f500` (no longer uncommitted).
10. ✅ **netbird #20 ph4 — legacy installer DELETED**, committed `89d397bb`. `install_netbird_stack` is now orchestrator-manifest → adopt → `bail!` (no in-Rust installer); removed 6 dead helpers + 3 `NETBIRD_*_IMAGE` consts + unused import (~492 lines). `cargo check` clean (0 warnings). Manifest path verified live pre-delete (.228 https:8087→200). **Release binary BUILT: sha `cccb7cfd9c38a651`** (`core/target/release/archipelago`, supersedes `e0343137`) — NOT yet deployed; deploy to .228 + live-verify then roll. Map+rationale: memory `project_netbird_ph4_legacy_deletion_map`. **Pre-existing follow-up (NOT introduced by delete): the manifest path lacks an active #10 OIDC-readiness gate — if that login race resurfaces, add an OIDC-ready gate to the netbird manifest.**
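The harness fixes in item 6 replace single-shot reads with bounded polling. A minimal sketch of such a helper, assuming a POSIX shell; `poll_until` and its arguments are hypothetical names, not the ones used in `mempool.bats` or `immich.bats`.

```sh
#!/bin/sh
# Sketch only: retry a probe until it succeeds or a deadline passes.

# poll_until TIMEOUT_S INTERVAL_S CMD...: run CMD every INTERVAL_S seconds.
# Returns 0 as soon as CMD succeeds, 1 once TIMEOUT_S seconds have elapsed.
poll_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    deadline=$(( $(date +%s) + timeout_s ))
    while :; do
        "$@" && return 0
        [ "$(date +%s)" -ge "$deadline" ] && return 1
        sleep "$interval_s"
    done
}
```

A steady-state check would then read along the lines of `poll_until 30 2 some_probe`, where `some_probe` is whatever single-shot assertion the test used before. Keeping the cap per probe, rather than one large global settle, is one way to avoid the very long iterations noted above.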
**✅ 2026-06-25 — STRAY 13h GATE on .228 found + killed; mempool RESOLVED.** A `setsid` gate run from session-e was still churning .228 ~13h later (pathologically slow — only reached test 71/lnd; the 300s settle bump is the suspect). Killed its process group (note: `pkill -f bats` self-matches the ssh command's own argv → kill by numeric PID/PGID instead). After kill, `crash_recovery` (Fix A) auto-recovered the immich/indeedhub/netbird stacks — **good live exercise of Fix A**. **mempool fallout RESOLVED:** the gate churn left .228's podman **overlay storage corrupt** (mempool frontend crash-looped — container couldn't write `/etc/nginx`, same image serves fine on .116) → **fixed by rebooting .228** (clears overlay corruption; Fix A staggered-recovered all apps; mempool stable 200). **.198 is PRUNED** bitcoin → mempool requires archival (install correctly refused) → **cleanly uninstalled** the orphan mempool-db. All nodes now correct. LESSON: never leave the gate running unsupervised; reconsider the 300s settle before re-running.
Fleet on `e0343137` + FE `index-a75rd6Hy.js` on .116/.198/.228/.89/.88/.5/.120 (.15 still old). **`89d397bb` (netbird-delete) binary NOT yet deployed anywhere — verify on .228 then roll.** SSH/sudo pw UNIFORM `ThisIsWeb54321@` (**.88 = `ThisIsWeb54321!`**); **UI/RPC: .228=`ThisIsWeb54321@`, .198=`ThisIsWeb54321@`.** Reusable tooling in scratchpad: `deploy-bin.sh`/`remote-apply.sh` (binary+FE swap), `rpc.sh <host> <pw> <method> [params]` (auth.login→call). Gate harness at `~/lifecycle/lifecycle` on .228 — **CHECK it isn't already running/wedged before re-launching**.
---
### ▶ SESSION b (2026-06-23 PM) — earlier, historical
**Canonical resume detail: memory `project_session_resume_2026_06_23b` (▶️ top of MEMORY.md).**
`gitea-vps2/main = 4346007d` pushed; local HEAD `e57514b6` (uninstall fix, committed, **not pushed/deployed**).
Shipped + verified live on .228 (all in 4346007d):
- **Connection-lost FULLY fixed** — companion `image_exists` journal-flood (Stdio::null) + netbird UDP-port reconcile churn (`wait_for_manifest_host_ports` tcp-only). .228: flood→0, ws/db→0 disconnects, load 3.95→2.26.
- **netbird → manifest-driven** (#20 ph4) — 3 manifests + 4 orchestrator primitives (base64 secret, GeneratedCert+`ensure_manifest_certs`, templated-file render `{{HOST_IP}}/{{NETWORK_GATEWAY}}/{{secret:}}`, udp port protocol). Live: https 8087→200, OIDC→200, resolver=gateway. Legacy-Rust delete deferred to post-full-verify.
- **registry-manifest flip (code)**: `EMBED_MANIFESTS` default-on, `main.rs` bounded pre-load `refresh_catalog`. Catalog regenerated w/ 52 embedded manifests but **NOT published** (gitignored + never committed; publish = force-add to gitea-vps2 main). Do after fleet binary roll.
- **UX regression root-caused + fixed** — the mobile/desktop UX (loader/AppLoadingScreen, store-driven launch, app icons, android webview footer) was on `companion-mobile-ux` and **never merged to main**, so any main build silently dropped it. **Merged → main**, frontend redeployed to .228. Android 0.4.9/code13 pushed for user to build APK elsewhere.
In progress — **Workstream F lifecycle bugs** (this §, user-picked next):
- **uninstall ghost — FIXED + pushed (e57514b6) + DEPLOYED to .228.** `handle_package_uninstall` returned Err on any cleanup-residue failure *before* removing the package state entry → ghost in My Apps + revert-to-Installed. Now: split container vs cleanup errors; remove state entry as soon as containers gone (before slow data rm). **LIVE-VERIFY IN PROGRESS:** fresh grafana (not previously installed → no data risk) install→uninstall→reinstall on .228; install was mid image-pull at handoff. RPC recipe + caution in memory `project_session_resume_2026_06_23b`.
- **#15 fedimint guardian — RESOLVED, not stuck** (legit `until` IBD-gate → setup wizard now bitcoin synced; no code change).
- #14 grafana reinstall-stops — verify in the same grafana test (likely same root cause as #13).
Next: finish grafana uninstall/reinstall live-verify on .228 → roll the new binary to the rest of the fleet (.116/.198/.5/.120 still on old binary) → publish embedded catalog (#8) → finish Workstream F (gate CASCADE+progress+all-apps expansion) → Phase 3 Quadlet → multinode.
WATCH: main.rs pre-load `refresh_catalog` (≤25s) slows startup — sanity-check startup→RPC-ready isn't egregious on the fleet roll.
---
### ▶ CURRENT STATE + RESUME (2026-06-23) — earlier session-a baseline (historical)
**✅ HEADLINE (2026-06-23): single-node gate GREEN (`run-gate.sh` 5/5 on .228, 0 not-ok) +
multinode test deploy DONE to 6 nodes.** The exit criterion (§5) is met. Green took fixing **two real
orchestrator bugs** (package.stop per-app grace, 2026-06-22; package.restart phantom stack-member
injection, 2026-06-23 — `order_present_containers`, commit 92d7f52d) plus hardening two single-shot
probes (bitcoin-knots state, immich lan_address). All work is **committed + PUSHED to `gitea-vps2`
(146) `main` @ `ccb594fb`** — the local-only state is resolved. Binary = release sha `5472c575…`.
**▶ DEPLOY STATE (latest backend `5472c575` + UX frontend + one-tap companion APK) — 2026-06-23:**
| Node | Pw | Done | Notes |
|------|----|----|-------|
| .116 (local, http:80) | `ThisIsWeb54321@` | ✅ | dev node: bitcoin mid-IBD + http-only |
| .198 | `archipelago` | ✅ | resilience; user manual-testing here |
| .228 | `archipelago` | ✅ | canonical gate node (5×-green) |
| 100.82.34.38 (archipelago-1) | `archipelago` | ✅ | |
| 100.89.209.89 (archy-x250-pa) | `ThisIsWeb54321@` | ✅ | |
| 100.70.96.88 (archipelago node) | `ThisIsWeb54321!` | ✅ | note the `!` |
| 100.64.83.15 (archy-dev-pa) | ? | ⏳ | UP (tailscale ping ok) but `ThisIsWeb54321@` REJECTED — **need correct pw** |
| 100.66.157.120 (archy-x250-exp) | `ThisIsWeb54321@` | ⏭️ | DOWN — user said leave it |
Deploy scripts saved in scratchpad: `deploy-node.sh` (full binary+FE, sha+health verify) and
`fe-only.sh` (FE-only, no archipelago restart). Reusable: `bash deploy-node.sh <host> <pw> <scheme> 127.0.0.1`.
**▶ COMPANION APK fixed (other agent's commit `5c43e127` + my reconcile):** QR + download were a
zip-wrapped `.apk.zip` (forced unzip). Now serve raw `archipelago-companion.apk` (one-tap) from the
146 raw URL; `CompanionIntroOverlay.vue` + ship/publish scripts repointed; old `.zip` dropped. The
OLD `.apk.zip` URL now 404s, so EVERY node was FE-refreshed to the new build (all 6 verified
`/ : 200` + bundle references `archipelago-companion.apk`).
**▶ MANUAL-TEST BUGS FOUND on .198 → workstream F (§4/§6c).** The green gate is DESTRUCTIVE-tier /
~8 core apps; it SKIPS uninstall/reinstall and has no progress-UI / all-apps coverage. Real bugs:
immich+grafana **uninstall hangs at a solid full-red bar + leaves a ghost in My Apps** (doesn't
actually remove); grafana **reinstall stops**; fedimint guardian shows "waiting for bitcoin sync"
(verify legit vs stuck). These motivate **workstream F** (cascade + progress + all-apps gate).
Also added **§10**: investigate TanStack-Query/push-based state mgmt for neode-ui (the state-drift
root cause behind the stuck bar + ghosts).
**▶ NEXT — agreed task order (do IN ORDER, see §6b):**
1. **netbird #20 ph4** — last real manifest migration.
2. **Phase-3 `use_quadlet_backends`** — orchestrator backends → Quadlet units.
3. **§6c workstream F** — cascade/uninstall + progress-UI + ALL-apps gate; fix the immich/grafana
uninstall + ghost-My-Apps + reinstall-stops bugs to a 5×-green; then §10 state-mgmt investigation.
4. **Multinode pass**: `docs/multinode-testing-plan.md` (the 6 deployed nodes are ready for manual
testing now).
**▶ LOOSE ENDS / gotchas for the resuming session:**
- **`neode-ui/src/components/AppLoadingScreen.vue` is UNTRACKED** on .116 — the other agent created it
but NO committed code imports it (orphan, not in `e825bbed`). Left in place; decide whether to wire
it in or delete. Not deployed (committed UX doesn't reference it).
- **gitea-local mirror (`localhost:3000`) push is BROKEN** (token redirects to `/login`); push to
`gitea-vps2` works and is primary. Reconcile the local mirror token if you need it.
- **Don't delete bitcoin/electrum data** (user directive) — run only the DESTRUCTIVE gate
(`run-gate.sh` default; never set `ARCHY_ALLOW_CASCADE_DESTRUCTIVE` on real nodes with synced chains).
- **.198 gate not run this session** (user was manual-testing there + restarting). .116 gate ran but
failed 12 tests — ALL environmental (.116 is http-only → ui-coverage hardcodes `https://`; + bitcoin
mid-IBD → bitcoin/lnd preconditions). NOT product regressions. `gate-116.log` on .116.
**(historical resume notes for the 5× chase below — superseded by the green result above)**
**Headline (2026-06-22):** the production gate's `package.stop` blocker is **FIXED**; **`.228` is 1×-GREEN
(110/110)**; a **fresh 5× run is IN PROGRESS on `.228`** (the single-node exit criterion) after a
real mempool bug found + fixed (below). The gate is now single-node (.228); multinode is split out
(`docs/multinode-testing-plan.md`). The gate is canonically **5×** now — `run-gate.sh` (the `20x`
naming/script was removed 2026-06-22, commit `57a013bc`).
**2026-06-22 (late) — mempool stale-IP bug FOUND + FIXED (real production bug, not a flake):**
The 1st 5× attempt failed iteration 1 on `#74 mempool api backend remains queryable`. Root cause was
NOT timing — the frontend nginx pinned mempool-api's IP at startup (no `resolver`); after the gate
restarts mempool-api (new podman IP) nginx 502s and the UI shows "offline". Fixed in
`mempool-frontend:v3.0.1` (resolver+variable proxy_pass; see `[[project_mempool_nginx_stale_ip_fix]]`
/ `docker/mempool-frontend/`), pushed to vps2, manifests bumped 3.0.0→3.0.1, deployed + resilience-
verified live on .228 (backend restart now auto-recovers). Also fixed the test itself (`mempool.bats`
#74: 180s→300s + real `fail` helper). Commits `0f05f73a` (fix) `57a013bc` (gate rename).
**THE 5× RUN IS DETACHED ON .228 — survives terminal/session close. Check it from any machine:**
```
sshpass -p archipelago ssh archipelago@192.168.1.228 \
'grep -E "iteration [0-9]+: (PASS|FAIL)|RESULTS|passed:|failed:" /tmp/gate-5x3.log; \
echo "running pid: $(pgrep -f run-gate.sh$ || echo DONE)"; grep "^not ok" /tmp/gate-5x3.log | sort -u'
```
- Log: `/tmp/gate-5x3.log` on .228 · launched `nohup` · `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`,
run **ON the node** from `/tmp/lifecycle-run/tests/lifecycle` via `./run-gate.sh` (ARCHY_HOST=127.0.0.1).
`bats` 1.11.1 + static `jq` 1.7.1 are installed on .228.
- **If all 5 iterations PASS → .228 has met the single-node criterion → demote the banner.**
- If it flakes again: readiness-under-churn (lnd/mempool); hardening in `98f4fa44` (inter-iteration
`settle_stack()` + readiness windows). Re-copy repo `tests/lifecycle` to /tmp/lifecycle-run, relaunch.
**▶ 2026-06-23 (morning) — 5× FINISHED 2/5; both mempool fails ROOT-CAUSED to ONE real
orchestrator bug (NOT flakes) + FIXED:** the overnight run finished `passed: 2 / failed: 3` on
`gate-5x3.log`, three *distinct one-off* fails, none repeating:
- iter1 `#5 container-list valid state for bitcoin-knots` — pre-launch churn (as predicted); didn't
repeat. **Hardened anyway:** the probe was a single-shot read; now polls ≤30s for a settled valid
state so a momentary `restarting`/transient can't flake a 20-min iteration (`bitcoin-knots.bats`).
- iter2 `#74 mempool api queryable` + iter5 `#73 mempool stack running` — **SAME root cause.**
`package.restart mempool` resolves its container list via `ordered_containers_for_start`, which was
**injecting phantom stack-member names** (`mysql-mempool`, `archy-mempool-api`, `archy-mempool-web`
— variant names from the union `startup_order` list that aren't live on this node). The phantom
`mysql-mempool` is 2nd in the start order; `do_orchestrator_package_start` hits its unknown-app-id
fallback → `do_package_start` inspect fails "no such object" → the `?` **aborts the whole start
sequence**, so `mempool-api` (pos 5) + `mempool` frontend (pos 8) never start. They then sat down
~6 min until the health monitor independently recovered them → #73 (frontend not running in 180s)
and #74 (api not queryable in 300s) both flake. Journal proof on .228: `package.restart mempool
failed: Start failed: mysql-mempool: ... no such object`, 23:27:32.
**Fix:** `ordered_containers_for_start` now orders only the *actually-present* containers and never
injects phantom order entries (new pure helper `order_present_containers` + 3 unit tests,
`dependencies.rs`). This is the SAME class as the mempool nginx bug — a hardcoded-name/reality
mismatch — and is exactly the manifest-driven-lifecycle anti-pattern the master plan targets.
- **Deploy + relaunch:** built release binary on .116, swapped `/usr/local/bin/archipelago` on .228
(containers live under `user@1000.service`, NOT the `archipelago.service` cgroup, so a service
restart does NOT kill them — verified via conmon cgroup paths). Manually verified mempool restart
keeps the stack up, then relaunched a clean 5× → see `gate-5x4.log` (check cmd above, swap the
filename). Expectation: all three fixed → 5/5 green → demote the banner.
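A minimal sketch of the "order only what is actually present" idea behind the `order_present_containers` fix. This is not the code in `dependencies.rs`: the function name `order_present_sketch`, its signature, and the container names in the usage note are assumptions for illustration. Names from the union order list are emitted only if they are live on the node, so a phantom such as `mysql-mempool` can never reach the start sequence.

```rust
use std::collections::HashSet;

/// Hypothetical helper (illustration only): return the present containers in
/// start order. Names listed in `startup_order` but absent from `present`
/// are skipped, never injected.
fn order_present_sketch(startup_order: &[&str], present: &[&str]) -> Vec<String> {
    let present_set: HashSet<&str> = present.iter().copied().collect();
    let mut seen: HashSet<&str> = HashSet::new();
    let mut ordered: Vec<String> = Vec::new();

    // Pass 1: follow the union order list, keeping only live containers.
    for &name in startup_order {
        if present_set.contains(name) && seen.insert(name) {
            ordered.push(name.to_string());
        }
    }
    // Pass 2: append live containers the order list does not mention,
    // in the order they were reported.
    for &name in present {
        if seen.insert(name) {
            ordered.push(name.to_string());
        }
    }
    ordered
}
```

Called with an order list of `mempool-db`, `mysql-mempool`, `mempool-api`, `mempool` and only `mempool`, `mempool-api`, `mempool-db` present, the sketch yields `mempool-db`, `mempool-api`, `mempool`, so a missing variant cannot abort the rest of the start.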
**Code fixes shipped this session (all on `main`, built + DEPLOYED to .228 AND .198):**
- `2dad64b2` stop honours per-app grace (was `-t 30` deadline racing SIGKILL).
- `760a32bc` reconciler stops resurrecting user-stopped apps (dep-override + host-port watchdog).
- `6e49ce6f` container-list reports user-stopped apps as `stopped` despite a live UI companion.
- `452f05d8` companion self-heal on its own ~30s loop (was gated behind the slow per-app pass).
- Test-harness hardening: `88930558` `53b8e47f` `892ff083` `98f4fa44` (readiness retries, immich/
fedimint/NPM/lnd windows, inter-iteration settle). Binary built on .116
`core/target/release/archipelago` (4-fix); deploy = stop archipelago, cp to /usr/local/bin, start.
**NODE-STATE fixes on .228 NOT in the repo (re-apply if .228 is reset/reimaged):**
- nginx `/app/lnd/` proxy target was stale `8081` → fixed to `18083` (sed in
/etc/nginx/sites-{available,enabled}/archipelago + snippets, then `nginx -s reload`). Repo code is
correct (18083); old node config was stale.
- Removed a stale orphan `~/.config/containers/systemd/home-assistant.container` (ContainerName
`home-assistant` ≠ the real `homeassistant` container; it was stuck "activating"). Real app fine.
- electrumx was re-installed (`package.install` w/ image `146.59.87.168:3000/lfg2025/electrumx:v1.18.0`)
to re-register it as a tracked manifest app (it had become adopted plain-podman).
**KEY LESSON:** run the lifecycle gate **ON the node**, not via RPC from .116 — its bitcoin/companion/
orphan/endpoint tests use local `podman`/`systemctl`/`bitcoin-cli`/`curl`, so a remote run silently
tests the *runner* (this is why earlier runs from .116 falsely showed "bitcoin in IBD" etc.).
**Remaining (after 5× green):** netbird migration (#20 ph4 — the one real migration left) + btcpay/
mempool stack polish; Phase-3 `use_quadlet_backends`; B flip-on (EMBED_MANIFESTS+sign); per-app test
coverage (~30 apps unwritten); the mobile app-launch UX (§8 Roadmap P1). Multinode → its own plan.
---
### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified
Manifest-driven lifecycle hooks + the IndeedHub stack migration are **complete and
live-verified on BOTH .228 and .198** (adoption + fresh-create + post_install hook
exec, stable under load). 15 commits this session: `4c1a4e59`..`e2a012d0`. Working
tree clean. The release lifecycle gate is temporarily **5×** (was 20×; `ARCHY_ITERATIONS=5`).
tree clean. The release lifecycle gate is **5×** (`ARCHY_ITERATIONS=5`).
**Shipped (all on `main`, newest first):**
- `e2a012d0` indeedhub frontend health → `tcp:7777` (was http GET `/`; the http check
@ -247,30 +648,78 @@ regenerate, matching .198) → re-run the canonical gate (DESTRUCTIVE only).
regression suite green (37/37). **Validated:** healthy app `vaultwarden` stops cleanly on .198
(running→exited→removed) — no regression; the deployed binary's stop path works.
**But validation revealed the gate failures are MULTI-CAUSED — the grace bug is only one of ~5:**
1. ✅ FIXED — orchestrator ignored per-app stop grace (`podman stop -t 30` spurious 30s timeout).
2. ⛔ **`fedimint` is crash-looping / unhealthy on BOTH nodes** (`health_monitor: Auto-restarting
unhealthy container: fedimint`, attempt 6/10). An app that won't stay up can't be cleanly
stopped — fedimint was a *confounded* test case. Needs a fedimint-health investigation
(why is its container unhealthy / why does host port 8173 not become reachable).
`health_monitor` DOES respect `user_stopped` (health_monitor.rs:983) so that part is correct.
3. ⛔ **Host-listener repair watchdog** (`prod_orchestrator`: "host listener disappeared after
startup; restarting container app_id=fedimint") restarts containers whose launch port isn't
reachable — fights any stop of a port-unreachable app.
4. ⚠️ **State-model nuance:** `vaultwarden` showed `exited` → `absent`, never `stopped`; the gate waits
for exactly `"stopped"` (`wait_for_container_status … stopped`). The `Exited→Stopped` conversion
(server.rs:1191, needs `user_stopped.contains(id)`) isn't always firing — likely an id-vs-name
key mismatch. The gate may need to accept `exited`/`absent` as terminal, or the conversion fixed.
5. ⚠️ **Grace vs gate-timeout:** `electrumx` grace is 300s; if it ignores SIGQUIT the container
only dies at the 300s SIGKILL — far past the gate's 60s wait. `-t` is a *ceiling*, so a HEALTHY
electrumx that honours SIGQUIT stops fast; an unhealthy/ignoring one blows the gate window.
Decide: trim graces, make the gate's per-app stop-wait ≥ grace, or both.
6. ⚠️ **.228 contamination** (plain podman, no quadlet units) — my cascade-gate; re-quadletize.
**The gate stop-failure was MULTI-CAUSED (3 real product bugs) — all 3 now FIXED + the electrumx
lifecycle suite is GREEN (10/10, 66s) on .228:**
1. ✅ **Stop ignored per-app grace** (`podman stop -t 30` spurious 30s timeout) — commit `2dad64b2`.
Orchestrator now uses manifest `stop_grace_secs` → `stop_grace_secs_for()` table; deadline =
grace + 15s; applied to quadlet stop + API + CLI.
2. ✅ **Reconciler resurrected user-stopped apps** — commit `760a32bc`. The reconcile filter's
`dependency_required` override re-included a user-stopped dependency (electrumx ← active mempool),
the in-memory `disabled` set is wiped on manifest reload, and the host-port "repair" then restarted
the stopped backend within ~8s. Fix: `ensure_running_with_mode` now bails `Left("user-stopped")`
when the on-disk `user_stopped` marker is set (the single choke point all reconcile flows through);
install/start clear the marker first so user actions are unaffected.
3. ✅ **container-list reported user-stopped apps as `running`** — commit `6e49ce6f`. The backend was
Exited but its UI companion (electrs-ui/bitcoin-ui/…) kept serving the launch port, and the
state-refresh upgraded any reachable launch port to `running`. Fix: `handle_container_list` forces
`stopped` for `user_stopped` apps before the launch-port refresh.
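The stop-grace rule from fix 1 can be sketched as follows. This is a hedged illustration, not the orchestrator's code: the helper names, the 30s default, and the table contents are assumptions. Only the 300s electrumx grace and the "grace + 15s" deadline come from the notes above.

```rust
use std::time::Duration;

// Assumed fallback when neither the manifest nor the table has a value.
const DEFAULT_STOP_GRACE_SECS: u64 = 30;
// Margin added on top of the grace to form the orchestrator deadline.
const DEADLINE_MARGIN_SECS: u64 = 15;

/// Hypothetical per-app table standing in for `stop_grace_secs_for()`.
fn table_grace_secs(app_id: &str) -> Option<u64> {
    match app_id {
        "electrumx" => Some(300),
        _ => None,
    }
}

/// The manifest value wins, then the table, then the assumed default.
fn resolve_stop_grace_secs(manifest_grace: Option<u64>, app_id: &str) -> u64 {
    manifest_grace
        .or_else(|| table_grace_secs(app_id))
        .unwrap_or(DEFAULT_STOP_GRACE_SECS)
}

/// The deadline the caller waits for: grace plus a fixed margin, so the
/// wait never expires before the runtime's own SIGKILL at `grace`.
fn stop_deadline(grace_secs: u64) -> Duration {
    Duration::from_secs(grace_secs + DEADLINE_MARGIN_SECS)
}
```

The point of the margin is ordering: the caller's timeout has to outlast the grace passed to `podman stop -t`, otherwise the orchestrator reports a timeout while the container is still shutting down cleanly.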
**Bottom line:** the grace fix is correct and shipped, but **the gate will not go green until #2–#6
are addressed**. These are pre-existing product/health issues the gate is correctly surfacing, not
regressions from this work. They need owner prioritization (esp. fedimint health, the watchdog-vs-
stop interaction, and the gate's terminal-state acceptance).
**Earlier theories now RESOLVED/superseded:** "fedimint crash-looping" was **probe-induced churn**:
left alone, fedimint is stable (Up 48 min, 0 watchdog restarts/30 min); its restarts during testing
were the host-port watchdog firing while I rapid-cycled stop/start (fixed by #2). "Exited→Stopped
key mismatch" was actually the live-UI-companion launch-port issue (#3). "Grace vs gate-timeout"
(electrumx 300s) was moot — a healthy electrumx honours SIGQUIT and stops in <1s.
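The launch-port issue (#3) comes down to precedence when reporting state. A minimal sketch, assuming a simplified three-value state and a hypothetical `reported_state` function (the real `handle_container_list` is not shown in this plan): the `user_stopped` marker is checked before the launch-port upgrade, so a live UI companion can no longer turn a stopped backend back into `running`.

```rust
/// Simplified state set for illustration; the real model has more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReportedState {
    Running,
    Stopped,
    Exited,
}

/// Hypothetical precedence: user intent first, then launch-port
/// reachability, then the raw runtime state.
fn reported_state(
    raw: ReportedState,
    user_stopped: bool,
    launch_port_reachable: bool,
) -> ReportedState {
    if user_stopped {
        // A companion may still serve the launch port; report the stop anyway.
        return ReportedState::Stopped;
    }
    if launch_port_reachable {
        return ReportedState::Running;
    }
    raw
}
```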
**TWO-NODE GATE RESULT (1×, DESTRUCTIVE, both with the 3-fix binary):**
- **.228: 104/110.** All previously-failing `package.stop` tests now PASS (bitcoin/btcpay/electrumx/
fedimint/immich). Remaining 6: test 31 (companion recreate), 44 (fedimint orphan — probe
pollution), 55 (immich restart timing), 83 (bitcoin not archival-synced), 94/99 (endpoint/lnd-proxy
cascade from 83).
- **.198: 94/110.** **14 of 16 failures are one root cause: bitcoin is in IBD** (test 83 says
`blocks=817652 headers=954850` — ~137k behind). Everything chained to bitcoin cascades: lnd
(16,85), btcpay (22,23,103), electrumx (37), mempool stack (71,72,73,101), endpoints (94),
bitcoin.getinfo (7,12). The other 2 are node-independent: **31** (companion recreate) and **44**
(fedimint orphan pollution).
**CONCLUSION: the lifecycle-stop blocker is FIXED and validated on both nodes.** The residual red is
NOT lifecycle bugs — it is (a) **bitcoin still syncing (IBD)** on the test nodes [test 83 is an
explicit precondition; nothing electrumx/lnd/btcpay/mempool can pass until it finishes], (b) **.228
plain-podman contamination** (my cascade-gate), and (c) two minor items: **test 31** companion-unit
recreate (both nodes — likely the 90s window vs reconcile tick + image step; investigate) and **test
44** orphan fedimint container left by my probing.
**EVERY gate failure is now FIXED or explained — NO lifecycle code bugs remain.** Final read:
- ✅ `package.stop` (the blocker): 3 bugs fixed (`2dad64b2`/`760a32bc`/`6e49ce6f`), green both nodes.
- **bitcoin-IBD cascade** (most of .198's red): environmental — bitcoin syncing (test 83 precondition).
- **test 31** companion-recreate: NOT a product bug. Two things: (a) **FIXED** — the companion
reconcile stage was gated behind the slow per-app pass; now it runs on its own ~30s loop
(`452f05d8`). Validated on .228 with the new binary: a deleted `archy-electrs-ui` unit self-heals
in **~10s** (was stuck 100s+), journal: `companion not active, repairing → wrote quadlet unit →
companion started`. (b) **HARNESS CAVEAT** — the companion-survives bats does LOCAL `rm`/`systemctl
--user` (no ssh), so running the gate from .116 against a remote node actually tests **.116's**
companions with **.116's** (old) binary, not the RPC target. ⇒ the companion-survives suite must be
run ON the target node (or with the new binary on .116) to be meaningful. This explains the
"failed on both nodes" runs — both were silently testing .116.
- **test 55** immich restart: NOT a bug — the heavy 3-container stack (postgres+redis+server) restarts
in >120s under load; immich DOES return to running. *Optional:* bump the immich restart wait.
- **test 44** fedimint orphan: my probe pollution; a teardown clears it.
**To reach a literally-green 5× gate (now infra/node-prep + minor test-window tuning, not lifecycle code):**
1. Let bitcoin finish IBD on a test node (or point the gate at an archival-synced bitcoin).
2. Re-quadletize .228 (reinstall its backends so `.container` units regenerate, matching .198).
electrumx done; bitcoin/btcpay/fedimint/immich/etc. remain. (Most backends ARE in manifest_ids
already; this is about regenerating quadlet units + clearing adopted plain-podman state.)
3. Optional: faster companion-reconcile cadence (test 31) + longer immich-restart wait (test 55) +
clear the test-44 orphan — or simply run the gate on a less-loaded, bitcoin-synced node.
4. ✅ **test 31 ROOT-CAUSED = contamination + load (NOT a product bug).** `companion::reconcile` only
recreates a deleted companion unit (e.g. `archy-electrs-ui`) when its PARENT backend (electrumx)
is in `manifest_ids`. On contaminated .228 electrumx ran as plain podman and was NOT a tracked
manifest install (its `/opt/.../electrumx/manifest.yml` exists on disk but wasn't loaded), so the
reconciler never iterated it → companion orphaned. **Proven fix:** `package.install electrumx`
re-registered it (now `reconcile action app_id=electrumx` fires) AND restored the companion (unit
present, service active). The companion self-heal logic is correct. ⇒ test 31 clears once .228 is
re-quadletized (step 2). electrumx on .228 is now de-contaminated. Still: clear test-44 orphans.
4. Then run `ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1` on the synced+quadlet node, then the other.
**Quadlet context (still true, but SEPARATE from the bug above):** quadlet IS the intended backend
runtime — .198 has the backend `.container` files (bitcoin-knots/btcpay-server/fedimint/filebrowser/
@ -287,7 +736,7 @@ bug is purely "container never stops", not "state not reported".
### MY-SESSION ERRATA (own it on resume)
- I ran the gate with `ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1`, which is **NOT** the canonical gate (that
is `ARCHY_ALLOW_DESTRUCTIVE=1` only — stop/start/restart, no uninstall/reinstall; see run-20x.sh
is `ARCHY_ALLOW_DESTRUCTIVE=1` only — stop/start/restart, no uninstall/reinstall; see run-gate.sh
"Suggested release-gate invocation"). Cascade ran uninstall/reinstall on every app and, when I
killed the run mid-iteration, left bitcoin-knots/electrumx/btcpay/fedimint/immich uninstalled or
stranded. **I fully restored .228** (reinstalled bitcoin-knots with the correct image
@ -296,30 +745,22 @@ bug is purely "container never stops", not "state not reported".
- Reinstall gotcha: `package.install` needs a REAL image ref in `dockerImage`; a bare app name →
`Invalid Docker image format`.
### NEXT STEPS (in order)
1. ✅ **DONE** — root-caused the stop-grace bug, fixed it (commit `2dad64b2`), unit-tested,
release-built, **deployed to .198 + .228**, validated no-regression (vaultwarden stops on .198).
2. ⛔ **fedimint health** — why is its container unhealthy on both nodes (health_monitor restart
6/10; host port 8173 unreachable)? A crash-looping app can't pass the lifecycle gate. Likely the
real top blocker now. Same lens for any other unhealthy app surfaced by the gate.
3. ⛔ **Host-listener repair vs user-stop** — the launch-port watchdog
(`prod_orchestrator`: "host listener disappeared after startup; restarting container") must NOT
restart a container the user just stopped. Check it consults `disabled`/`user_stopped`.
4. ⚠️ **Gate terminal-state acceptance** — apps end `exited`/`absent`, not always `stopped`
(Exited→Stopped conversion at server.rs:1191 needs a matching `user_stopped` key). Either fix the
conversion (id-vs-name) or have `wait_for_container_status … stopped` accept exited/absent.
5. ⚠️ **Grace vs gate-timeout** — trim over-long graces (electrumx 300s) and/or make the gate's
per-app stop-wait ≥ the app's grace.
6. **Re-quadletize .228** (backend `.container` files wiped by my cascade-gate; reinstall its apps so
units regenerate, matching .198; verify `.container` + `PODMAN_SYSTEMD_UNIT`).
7. **Run the canonical gate** `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5` (NO cascade; never kill
mid-iteration) on .198 then .228. Green = Step-2-of-plan done.
8. Hardening: `package.start` should regenerate a missing quadlet unit, not fall back to bare podman;
re-survey the status doc's quadlet % from `.container`-file presence.
9. **netbird migration (#20 phase 4)** — same pattern; assess setup steps first (TLS cert gen,
config files, resolver IP — may need host-file-write hooks beyond exec/copy_from_host; legacy is
install_netbird_stack in stacks.rs).
10. Then single-container legacy apps onto the orchestrator install flow; then demote the banner.
### NEXT STEPS (in order) — SINGLE-NODE (.228) criterion
1. ✅ **DONE** — 4 stop/reconcile bugs fixed + deployed (`2dad64b2` grace, `760a32bc`
reconcile-resurrection guard, `6e49ce6f` container-list user-stopped, `452f05d8` companion
cadence). Plus test-harness fixes (lnd/immich/fedimint/NPM readiness + config).
2. ✅ **DONE** — gate run **ON .228** (synced bitcoin): **110/110 GREEN** (1×). Key lesson:
**run the gate on the node**, not via RPC from .116 (local podman/systemctl/bitcoin probes).
3. ◧ **5× run on .228 in progress** (`ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`, on the node).
5 consecutive clean iterations = the single-node gate criterion → demote the banner.
4. **netbird migration (#20 phase 4)** — the one real migration left; assess setup steps first (TLS
cert gen, config files, resolver IP — may need host-file-write hooks beyond exec/copy_from_host;
legacy is install_netbird_stack in stacks.rs). Then btcpay/mempool stack polish.
5. Hardening: `package.start` should regenerate a missing quadlet unit, not fall back to bare podman.
**Multinode / fleet (.198 + the rest) → `docs/multinode-testing-plan.md` (separate, after .228 green).**
Carry-over notes for that plan: .198 bitcoin was mid-IBD; the lnd `/app/lnd/` nginx proxy had a
stale `8081` target on .228 (repo code is correct at 18083 — re-check on other nodes).
### KNOWN ISSUES / WATCH-OUTS
- **.198 is a weak/loaded node** (load avg ~35). The generic reconcile recreates
@ -374,3 +815,92 @@ This master plan is the hub. Authoritative standalone docs (linked above), kept:
All dated handoffs/resumes/transcripts/superseded trackers were consolidated here
and removed (recoverable via git) on 2026-06-21.
## 10. Backlog — investigate frontend state management (2026-06-23)
**Investigate adopting a real client-state/data-fetching layer for `neode-ui`** instead of
the current hand-rolled Pinia stores + ad-hoc fetch/poll patterns. Motivation: lifecycle/UX
bugs like the stuck "full-red" install/uninstall progress bar and ghost **My Apps** entries
(see §6c) are partly a *state-sync* problem — the UI's view of package state drifts from the
backend and isn't reliably invalidated/refetched. A principled query/cache layer (request
dedup, background refetch, cache invalidation on mutation, optimistic updates, retry/stale
handling) would make these classes of bug structurally hard.
**Research → recommend → (maybe) adopt:**
- Evaluate **TanStack Query** (Vue Query) as the leading candidate, plus alternatives
(Pinia Colada, vue-query alternatives, plain Pinia + a disciplined invalidation layer, or
an SSE/WebSocket push model for package-state events instead of polling).
- Criteria: fit with the existing Pinia/RPC architecture, bundle-size cost, offline/PWA
behaviour, how cleanly it models long-running mutations (install/uninstall with progress),
and whether a push channel for package-state changes is the better root-cause fix.
- Deliverable: a short design note + a recommendation, then a scoped migration of the
package-lifecycle surfaces (My Apps / install / uninstall / update progress) as the proof
case — sequence AFTER workstream F (it informs F's progress-UI fix and vice-versa).
## 10b. Backlog — intelligent launch-port selection (2026-06-26)
**Replace the per-app static launch-port map with a smart, manifest-first heuristic.** Gitea
launched at **:2222 (SSH)** instead of **:3001 (web)** on a node missing the gitea manifest on
disk: `manifest_lan_address_for` returned None → the code fell through to `extract_lan_address`,
which returns podman's **first-listed** published port, and podman lists `2222->22` before
`3001->3000`. Patched 2026-06-26 (`670ebb06`) with a static `"gitea" => 3001` entry in
`lan_address_for` (`core/container/src/podman_client.rs`) — but that's a per-app band-aid (the
anti-pattern CLAUDE.md warns against; the map already carries bitcoin/lnd/mempool/immich/… by hand).
**Real fix (do this, then delete the static entries):**
- **Primary** is already correct — derive the launch URL from the manifest's declared
`interfaces.main` port. The failure was only the *fallback*. The north-star cure is
registry-distributed manifests (workstream B) so the manifest is always present and we never
guess.
- **Smart fallback** — make `extract_lan_address` stop returning the blind first port: **skip
container-side ports that are known non-HTTP (22/SSH, etc.) and prefer the published port whose
container side matches the manifest `health_check` endpoint / a known web port.** Fixes the whole
multi-port-app class generically (no per-app hardcoding), and lets us drop the static map.
- ~20-line change to one function + unit tests; rides the next fleet roll. NOT a free-port
remap (that's `port_allocator.rs`, which already resolves host-port *collisions* — a different
problem; gitea's web UI was never in conflict).
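A rough sketch of the smart fallback described above, under stated assumptions: `PortMapping`, `pick_launch_port`, and both port lists are hypothetical names and values, not the current `extract_lan_address` code. It skips container ports known to be non-HTTP, prefers the manifest `health_check` port when one is known, then a known web port, and only then falls back to the first remaining mapping.

```rust
/// One published mapping, e.g. podman's `3001->3000` (host -> container).
#[derive(Debug, Clone, Copy, PartialEq)]
struct PortMapping {
    host: u16,
    container: u16,
}

// Container-side ports that are never a web UI (assumed list; 22 = SSH).
const NON_HTTP_CONTAINER_PORTS: &[u16] = &[22];
// Container-side ports that usually are a web UI (assumed list).
const KNOWN_WEB_CONTAINER_PORTS: &[u16] = &[80, 443, 3000, 8080];

/// Hypothetical replacement for "return the first published port".
fn pick_launch_port(mappings: &[PortMapping], health_check_port: Option<u16>) -> Option<u16> {
    let candidates: Vec<&PortMapping> = mappings
        .iter()
        .filter(|m| !NON_HTTP_CONTAINER_PORTS.contains(&m.container))
        .collect();

    // 1. Prefer the port the manifest health check already targets.
    if let Some(hc) = health_check_port {
        if let Some(m) = candidates.iter().find(|m| m.container == hc) {
            return Some(m.host);
        }
    }
    // 2. Otherwise prefer a container port that looks like a web server.
    if let Some(m) = candidates
        .iter()
        .find(|m| KNOWN_WEB_CONTAINER_PORTS.contains(&m.container))
    {
        return Some(m.host);
    }
    // 3. Last resort: first mapping that is not known non-HTTP.
    candidates.first().map(|m| m.host)
}
```

For the gitea case, mappings `2222->22` and `3001->3000` resolve to host port 3001 with or without a health-check hint. Returning `None` when only non-HTTP ports remain is a design choice in this sketch: no launch URL is better than a link to SSH.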
## 10c. Backlog — generalize the archival/full-node install blocker (2026-06-26)
**Make "this app needs an un-pruned (archival, txindex) Bitcoin node" a manifest-declared
dependency, applied to every app that needs it — using the electrumX/mempool blocker as the
reference behavior.** Today the gate works but is **hardcoded**: `requires_unpruned_bitcoin()` in
`core/archipelago/src/api/rpc/package/dependencies.rs` is a literal `matches!(package_id, "electrumx"
| "electrs" | "mempool-electrs" | "mempool" | "mempool-web")`, and install `bail!`s with
`archival_bitcoin_required_message` when `bitcoin.pruned` is true or disk < `ARCHIVAL_BITCOIN_DISK_GB`
(1 TB). That's the same per-app-hardcoding anti-pattern as the gitea static map (§10b) and the
`install_*_stack` Rust — any new app needing a full node is silently *un*-gated until someone edits
this match.
**Do:**
- **Declare it in the manifest** — e.g. `requires: { bitcoin: archival }` (or a
`dependencies.bitcoin.pruned: false` constraint) so the install pre-flight reads the requirement
from the manifest set instead of a hardcoded list. Covers future apps automatically (manifest-driven
north star).
- **Audit coverage** — confirm EVERY archival-dependent app is gated (electrumX, electrs,
mempool + its electrs, and any BTC-indexer/explorer added later); add a unit test asserting the
manifest constraint ⇒ blocker fires.
- **UX** — the blocker must be a clear, surfaced **pre-install** state in the UI (not just an RPC
`bail!` string): explain *why* (pruned node / insufficient disk), what to do (add ~1 TB, resync
un-pruned with txindex), and keep the app visibly "requires archival node" rather than a confusing
generic failure. Pairs with workstream F's honest-progress/blocker UX.
- Reference: the existing `package-install-prune-check` dependency descriptor (dependencies.rs:208)
is the seam to make data-driven.
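
A minimal sketch of the manifest-driven gate, in Python for brevity (the real seam is the Rust pre-flight in `dependencies.rs`). It assumes the `requires: { bitcoin: archival }` shape proposed above and a 1000 GB threshold standing in for `ARCHIVAL_BITCOIN_DISK_GB`; `needs_archival_bitcoin` and `archival_blocker` are hypothetical names, not existing functions.

```python
from typing import Optional

# Assumption: stands in for ARCHIVAL_BITCOIN_DISK_GB, described in the text as 1 TB.
ARCHIVAL_BITCOIN_DISK_GB = 1000


def needs_archival_bitcoin(manifest: dict) -> bool:
    """True when the manifest declares `requires: {bitcoin: archival}`."""
    requires = manifest.get("requires") or {}
    return requires.get("bitcoin") == "archival"


def archival_blocker(manifest: dict, bitcoin_pruned: bool, disk_gb: int) -> Optional[str]:
    """Return a user-facing blocker message, or None when install may proceed."""
    if not needs_archival_bitcoin(manifest):
        return None
    if bitcoin_pruned:
        return "requires archival node: Bitcoin is pruned; resync un-pruned with txindex"
    if disk_gb < ARCHIVAL_BITCOIN_DISK_GB:
        return f"requires archival node: {disk_gb} GB disk is below {ARCHIVAL_BITCOIN_DISK_GB} GB"
    return None
```

Because the decision is read from the manifest, a new indexer or explorer is gated as soon as its manifest declares the requirement, with no edit to a hardcoded match.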
## 10d. Mesh — Meshtastic MeshCore-parity (in the fleet binary; one open bug) (2026-06-26)
**Status: shipped as commit `8fdb45e8` and now riding in the rolled fleet binary** (built into the
#9 deploy from HEAD, sha `0060dcd6…`). The Meshtastic driver auto-provisions LoRa **region (EU_868)**
and a shared **channel "archipelago"** via the official admin API (`set_config`=field34,
`set_channel`=field33) — discovery, bidirectional RF, and **sending** are all verified on **.116 + .228**.
Detail + history: [[project_meshtastic_parity]].
**Open work (slot after WS-F #9–11, before/with multinode):**
- **RECEIVED-message surfacing bug** — the running driver does **not** surface received messages
(`mesh.messages` stays `[]`) even though the radio physically receives them. An instrumentation
build was in flight to locate where the inbound packet is dropped between the radio serial/BLE read
and the `mesh.messages` store. This is the one blocker to closing MeshCore parity.
- **.198 radio is bad** — won't persist config (needs a reflash) so it's not a usable mesh test node;
use .116/.228 for mesh verification.
- Definition of done: a message sent from a MeshCore/Meshtastic peer on channel "archipelago" appears
in `mesh.messages` on the receiving archipelago node, end-to-end, on ≥2 LAN nodes.

@ -103,10 +103,10 @@ Notes:
## 4. Test-gate reality
**No app has passed the formal release gate.** The gate is `run-20x.sh` green
**No app has passed the formal release gate.** The gate is `run-gate.sh` green
across the full lifecycle matrix (install / UI reachable / stop / start /
restart / reinstall / reboot-survive / archipelago-restart-survive / uninstall),
**20× on .228 AND .198**. All 8 release-gate checkboxes in
**5× on .228 AND .198**. All 8 release-gate checkboxes in
`tests/lifecycle/TESTING.md` are **unchecked (☐)**.
What exists today:
@ -132,7 +132,7 @@ failure): `bitcoin-receive.bats`, `port-drift.bats`, `secret-completeness.bats`.
1. **immich** is the last legacy (in-cgroup) app — migrate to Quadlet to finish Pillar 1.
2. **grafana / strfry** Quadlet units stuck *activating* with no container — investigate. (onlyoffice removed 2026-06-21.)
3. **fedimint-gateway / fedimint-clientd** (this session) now run but have no lifecycle test coverage.
4. The formal **20× release gate has never been green** — it is the blocker for the v1.7.52 tag.
4. The formal **5× release gate has never been green** — it is the blocker for the v1.7.52 tag.
---

@ -0,0 +1,215 @@
# Bitcoin Multi-Version Support — Design
**Status:** design (2026-06-22)
**Goal:** let a user choose *which* version of Bitcoin Core / Bitcoin Knots to
install (latest pre-selected, older versions in a dropdown), and later switch
versions or opt into auto-update — all manifest/catalog-driven, all served from
**our signed registry**, rootless, with **zero data loss** across version
changes.
See also: [`docs/registry-manifest-design.md`](registry-manifest-design.md)
(catalog distribution + signing this builds on),
[`docs/PRODUCTION-MASTER-PLAN.md`](PRODUCTION-MASTER-PLAN.md) (gate that must be
green first), `MEMORY → project_decoupled_app_updates`,
`MEMORY → project_manifest_driven_north_star`.
> **Scheduling:** this is net-new scope. It lands **after** the production test
> gate (`tests/lifecycle/run-20x.sh`) is green on `.228` + `.198`. The data-
> preservation invariant (downgrade vs. chainstate) is the highest risk here.
---
## 1. Where we are today
### Image source / build
| Thing | Today |
|-------|-------|
| `apps/bitcoin-core/Dockerfile` | `FROM bitcoin/bitcoin:24.0` — a **community** image, **stale** (manifest says 28.4), no project-official Docker image exists |
| `apps/bitcoin-knots/` | **no Dockerfile**; `:latest` is built/pushed by hand |
| Registry | `scripts/image-versions.sh` → `ARCHY_REGISTRY="146.59.87.168:3000/lfg2025"`; only `BITCOIN_KNOTS_IMAGE=…/bitcoin-knots:latest` pinned, no Core pin |
| Tags in registry | **one tag per image**. No historical versions. |
### Version pinning
- `apps/bitcoin-core/manifest.yml` → `…/bitcoin:28.4` (pinned).
- `apps/bitcoin-knots/manifest.yml` → `…/bitcoin-knots:latest` (**floating**, a
liability for reproducibility and for "switch back to the version I had").
- `core/archipelago/src/container/app_catalog.rs` + `app-catalog/catalog.json`:
signed, hourly-fetched, carries `version` (badge text) + `image`.
`catalog_image_override()` overrides the manifest image **only if same-repo**.
`available_update_for_app()` already ignores floating tags for update
detection.
### Install path
- `prod_orchestrator.rs::install_fresh()` resolves the image as
**manifest image → catalog override → pull**. There is **no per-install
version parameter** — `orchestrator.install(app_id)` takes only the id.
- RPC `package.install` (`api/rpc/package/install.rs`) *accepts* `dockerImage` /
`version` params but for orchestrator-managed apps (bitcoin-core / bitcoin-knots
are allowlisted) it **ignores them** and lets the orchestrator resolve.
- **Conflict guard** (`prod_orchestrator.rs` ~1306–1325): core and knots may not
run simultaneously. Must be preserved by everything below.
### UI
- Install is **one-click, no modal** (`MarketplaceAppDetails.vue::installApp()`).
- Update badge + "Update to X" already exist (`appDetails/AppHeroSection.vue`,
RPC `package.update`).
- **No** Bitcoin-specific settings panel; all apps share `AppSidebar.vue`.
- Per-app config persisted **only at install time** as `containerConfig` →
`/var/lib/archipelago/app-configs/<id>.json`. **No post-install set-config RPC.**
---
## 2. Source-of-truth decision: official upstream → our registry
We use the **official releases** as upstream provenance, but nodes only ever pull
from our registry. Nodes do **not** fetch bitcoin.org / GitHub at install time —
that would break rootless/offline installs and the signed-registry trust model,
and neither project publishes an official Docker image anyway.
**Official sources (verified):**
| Impl | Index | Per-version asset pattern |
|------|-------|---------------------------|
| Bitcoin Core | [bitcoincore.org/en/releases](https://bitcoincore.org/en/releases/) · [github bitcoin/bitcoin](https://github.com/bitcoin/bitcoin/releases) | `https://bitcoincore.org/bin/bitcoin-core-<ver>/bitcoin-<ver>-x86_64-linux-gnu.tar.gz` + `SHA256SUMS` + `SHA256SUMS.asc` |
| Bitcoin Knots | [github bitcoinknots/bitcoin](https://github.com/bitcoinknots/bitcoin/releases) · [bitcoinknots.org/files](https://bitcoinknots.org/) | `https://bitcoinknots.org/files/<maj>.x/<ver>/bitcoin-<ver>-x86_64-linux-gnu.tar.gz` (`<ver>` e.g. `29.3.knots20260508`) |
Both ship **signed binary tarballs** with multi-builder Guix attestations
(`SHA256SUMS.asc`). The build pipeline verifies these **once, at build**; our DHT
Phase 0 registry signature then carries provenance to the fleet.
> Knots version strings embed a build date (`29.3.knots20260508`). Treat the full
> string as the tag; surface a friendly `29.3` + date in the UI.
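
A small sketch of the two string rules above: splitting a Knots tag into a friendly version plus build date, and expanding the per-version asset patterns from the table. The function names are hypothetical; the URL shapes are copied from the table and not re-verified here.

```python
import re
from typing import Optional, Tuple

# Matches tags such as "29.3.knots20260508": dotted version, then a build date.
_KNOTS = re.compile(r"^(\d+(?:\.\d+)*)\.knots(\d{4})(\d{2})(\d{2})$")


def friendly_version(tag: str) -> Tuple[str, Optional[str]]:
    """Split a release tag into (display version, ISO build date or None)."""
    m = _KNOTS.match(tag)
    if m is None:
        return tag, None  # plain Core-style tag such as "28.4"
    version, year, month, day = m.groups()
    return version, f"{year}-{month}-{day}"


def tarball_url(impl: str, tag: str) -> str:
    """Build the x86_64 tarball URL from the asset patterns in the table above."""
    name = f"bitcoin-{tag}-x86_64-linux-gnu.tar.gz"
    if impl == "bitcoin-core":
        return f"https://bitcoincore.org/bin/bitcoin-core-{tag}/{name}"
    if impl == "bitcoin-knots":
        major = tag.split(".", 1)[0]  # "29.3.knots20260508" -> "29"
        return f"https://bitcoinknots.org/files/{major}.x/{tag}/{name}"
    raise ValueError(f"unknown implementation: {impl}")
```

The full tag stays the registry tag and the lookup key; only the UI label uses the shortened form.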
---
## 3. Design
### Phase 0 — Reproducible, verified image pipeline *(prerequisite)*
New `scripts/build-bitcoin-image.sh <impl> <version>` that, per version:
1. Downloads the official tarball + `SHA256SUMS(.asc)` (GitHub release assets are
an identical mirror → fallback).
2. Verifies SHA256 **and** the Guix/builder GPG signatures. **Fail closed.**
3. Builds a minimal **rootless** image: pin a small base, unpack
`bitcoind`/`bitcoin-cli`. Keep the existing entrypoint probe
(`command -v bitcoind || find /opt -path '*/bin/bitcoind'`) so per-version
layout differences don't break startup.
4. Tags + pushes `:<version>` **and** updates the default pin (`:latest` /
`:28.4`-style) to the registry.
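
Step 2's checksum half can be sketched as below. This assumes the usual `<hex digest>  <filename>` layout of a `SHA256SUMS` file and covers only the hash comparison; verifying the builder GPG signatures on `SHA256SUMS.asc` is a separate step not shown. `expected_digest` and `verify_tarball` are hypothetical helper names.

```python
import hashlib


def expected_digest(sha256sums_text: str, filename: str) -> str:
    """Look up the hex digest listed for `filename` in a SHA256SUMS file."""
    for line in sha256sums_text.splitlines():
        parts = line.split()
        # Some tools prefix the name with "*" to mark binary mode.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    raise LookupError(f"{filename} is not listed in SHA256SUMS")


def verify_tarball(data: bytes, sha256sums_text: str, filename: str) -> None:
    """Raise unless the tarball bytes match the listed digest (fail closed)."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_digest(sha256sums_text, filename):
        raise ValueError(f"SHA256 mismatch for {filename}")
```

Both failure modes raise rather than return a flag, so a build script that forgets to check a result still stops.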
**Curate, don't mirror everything.** Publish a bounded set (proposal: current +
last ~3 majors), e.g. Core `31.0, 30.0, 29.3, 28.4, 27.2` and Knots
`29.3.knots…, 28.1.knots…, 27.1.knots…`. **`log` / document dropped versions** —
silent truncation reads as "all versions supported" when it isn't.
Also fixes existing debt: replaces the stale community `FROM bitcoin/bitcoin:24.0`
and gives Knots a real Dockerfile + non-floating tags.
### Phase 1 — Version catalog (signed, registry-distributed)
Extend `AppCatalogEntry` (forward-compatible — no `deny_unknown_fields`, old nodes
ignore it):
```jsonc
"bitcoin-core": {
"version": "31.0", // default / latest (existing field)
"image": "…/bitcoin:31.0", // existing
"versions": [ // NEW
{ "version": "31.0", "image": "…/bitcoin:31.0", "default": true },
{ "version": "30.0", "image": "…/bitcoin:30.0" },
{ "version": "28.4", "image": "…/bitcoin:28.4", "deprecated": true, "eol": "2026-...." }
]
}
```
Published to `releases/app-catalog.json`, signed by the existing release-root
mechanism. This is the **single source of truth** the UI reads for "what can I
install / switch to," and third-party-registry apps inherit the capability for
free. `version`/`image` stay as the default for back-compat.
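
How a reader of the catalog might resolve the selectable list and the default, as a hedged sketch. It assumes the `versions[]` shape shown above and treats a catalog entry without `versions` as a single-version legacy entry; both helper names are hypothetical.

```python
from typing import List


def installable_versions(entry: dict) -> List[dict]:
    """List selectable versions, falling back to the legacy single-version fields."""
    versions = entry.get("versions")
    if versions:
        return list(versions)
    return [{"version": entry["version"], "image": entry["image"], "default": True}]


def default_version(entry: dict) -> dict:
    """Pick the entry flagged `default`, else the one matching the top-level `version`."""
    versions = installable_versions(entry)
    for candidate in versions:
        if candidate.get("default"):
            return candidate
    for candidate in versions:
        if candidate["version"] == entry.get("version"):
            return candidate
    return versions[0]
```

The fallback keeps old catalogs usable by new nodes, mirroring the back-compat note for old nodes reading new catalogs.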
### Phase 2 — Install-time version selection
- **Orchestrator:** add `install_with_image(app_id, Option<image_tag>)` (or an
optional arg on `install`). When a tag is supplied, **validate same-repo**
against the manifest (reuse `image_without_registry_or_tag()`), then override in
`install_fresh()`. Default path unchanged. Preserve the core/knots conflict
guard.
- **RPC:** thread the selected version/image from `package.install` into the
orchestrator for the allowlisted apps (the param is already received — just not
forwarded).
- **UI:** the first **install modal** in the app — latest pre-selected, dropdown
of `versions[]`, deprecated/EOL badges on old entries. On confirm, pass the
chosen version to `package.install`.
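
The same-repo check could look roughly like this. It is an illustrative Python stand-in, not the behavior of the existing `image_without_registry_or_tag()`, which may differ; `repo_path` and `is_same_repo` are hypothetical names, and the registry-host heuristic (a first path component containing `.` or `:`, or equal to `localhost`) is an assumption borrowed from common image-reference parsing.

```python
def repo_path(image: str) -> str:
    """Strip the registry host, tag, and digest from an image reference."""
    ref = image.split("@", 1)[0]  # drop any @sha256:... digest
    head, _, rest = ref.partition("/")
    # Assumption: a first component with "." or ":" (or "localhost") is a registry host.
    if rest and ("." in head or ":" in head or head == "localhost"):
        ref = rest
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:  # a ":" after the last "/" starts the tag
        ref = ref[:colon]
    return ref


def is_same_repo(manifest_image: str, requested_image: str) -> bool:
    """True when both references name the same repository, ignoring registry and tag."""
    return repo_path(manifest_image) == repo_path(requested_image)
```

With this rule a request for `…/bitcoin:31.0` passes against a manifest pinned to `…/bitcoin:28.4`, while a request that swaps in `…/bitcoin-knots:latest` is rejected before any pull.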
### Phase 3 — In-app version switch + auto-update toggle
- **UI:** a Bitcoin **"Version & Updates"** card (conditional in `AppSidebar.vue`
for `bitcoin-core` / `bitcoin-knots`): current version, a switch dropdown, and
an **auto-update-to-latest** toggle.
- **Switch = controlled re-pull/recreate** reusing the `package.update`
machinery but targeting an arbitrary (incl. older) tag → effectively
`package.set-version`.
- **Persistence:** new `package.set-config` RPC writing the existing
`app-configs/<id>.json` (`{ pinnedVersion, autoUpdate }`).
- **Auto-update:** the existing hourly catalog check, when `autoUpdate:true`,
triggers `package.update` to the catalog default. A pinned version **suppresses
the update badge**.
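
One possible reading of the pin / auto-update interaction, sketched as a pure decision function over the `{ pinnedVersion, autoUpdate }` config. The precedence when both are set is not specified above; this sketch assumes the pin wins. `plan_update` and its return shape are hypothetical.

```python
from typing import Optional


def plan_update(current: str, catalog_default: str, config: dict) -> dict:
    """Decide what the hourly catalog check should do for one app."""
    pinned: Optional[str] = config.get("pinnedVersion")
    auto_update = bool(config.get("autoUpdate"))
    differs_from_default = catalog_default != current

    if pinned is not None:
        # Assumption: an explicit pin wins over autoUpdate and hides the badge.
        return {"update_to": None, "show_badge": False}
    if auto_update and differs_from_default:
        return {"update_to": catalog_default, "show_badge": False}
    return {"update_to": None, "show_badge": differs_from_default}
```

Keeping this free of I/O makes the policy easy to unit-test before it is wired to `package.update`.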
---
## 4. Invariants & safety rails
- **Rootless only.** Pipeline images and run path stay rootless; no Docker-socket,
no privileged.
- **No data loss across version change.** Preserve `/var/lib/archipelago/bitcoin`,
secrets (`bitcoin-rpc-password`, `…-rpcauth`), ports, and the adoption container
name on every install / switch / update.
- **⚠️ Downgrade vs. chainstate (highest risk).** Bitcoin Core refuses to start on
a chainstate written by a *newer* version unless reindexed (expensive, or data
loss on a pruned node). The UI **must** warn loudly on downgrade; the
orchestrator should gate/confirm it and never silently wipe. Pruned nodes can't
simply `-reindex`.
- **Core ⇄ Knots switch** stays governed by the existing conflict guard; treat an
impl switch as distinct from a version switch.
- **Floating tags** (`latest`) are never advertised as a selectable "version" and
never counted as an available update (already handled by
`available_update_for_app`).
- **Verify on a real node** (`.228` then `.198`) and pass `run-20x` before any
tag.
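
A hedged sketch of the downgrade check behind the warning above. It compares only the leading dotted-number part of each tag, so two Knots builds of the same version with different dates compare equal, and it leaves the block-versus-confirm policy for pruned nodes (open question 3) to the caller. `numeric_version` and `switch_risk` are hypothetical names.

```python
import re
from typing import Tuple


def numeric_version(tag: str) -> Tuple[int, ...]:
    """Leading dotted-number part of a tag: "29.3.knots20260508" -> (29, 3)."""
    m = re.match(r"\d+(?:\.\d+)*", tag)
    if m is None:
        raise ValueError(f"not a versioned tag: {tag!r}")  # e.g. "latest"
    return tuple(int(part) for part in m.group(0).split("."))


def switch_risk(current: str, target: str, pruned: bool) -> str:
    """Classify a version switch as "ok", "downgrade", or "downgrade-pruned"."""
    cur, tgt = numeric_version(current), numeric_version(target)
    width = max(len(cur), len(tgt))
    cur += (0,) * (width - len(cur))  # pad so "31" and "31.0" compare equal
    tgt += (0,) * (width - len(tgt))
    if tgt >= cur:
        return "ok"
    return "downgrade-pruned" if pruned else "downgrade"
```

Floating tags raise instead of comparing, which matches the rule that `latest` is never offered as a selectable version.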
---
## 5. Files / seams (no code yet)
| Concern | File |
|---------|------|
| Image build/push | new `scripts/build-bitcoin-image.sh`; `apps/bitcoin-core/Dockerfile`; new `apps/bitcoin-knots/Dockerfile`; `scripts/image-versions.sh` |
| Catalog schema | `core/archipelago/src/container/app_catalog.rs`; `releases/app-catalog.json` (+ `app-catalog/catalog.json`) |
| Install override | `core/archipelago/src/container/prod_orchestrator.rs` (`install` / `install_fresh`); `api/rpc/package/install.rs`; `api/rpc/dispatcher.rs` |
| Switch / set-config RPC | `api/rpc/package/update.rs`; new `package.set-config` handler; `app-configs/<id>.json` |
| Install modal | `neode-ui/src/views/MarketplaceAppDetails.vue`; new `…/marketplace/AppInstallModal.vue` |
| Version & Updates card | `neode-ui/src/views/appDetails/AppSidebar.vue`; `neode-ui/src/api/rpc-client.ts`; `neode-ui/src/types/api.ts` |
---
## 6. Open questions
1. **Curated version set** — how many majors back do we host, and storage budget
on the registry?
2. **Multi-arch** — fleet is x86_64 today; do any nodes need arm64 images?
3. **Pruned-node downgrade policy** — block outright, or allow with an explicit
"this will require re-sync / may lose pruned data" confirmation?
4. **Auto-update default** — off (opt-in) for a consensus-critical app like
Bitcoin? (Recommended: **off**, explicit opt-in.)
5. **Knots date-suffix UX** — how to display `29.3.knots20260508` cleanly.
---
## Sources
- [Bitcoin Core releases](https://bitcoincore.org/en/releases/)
- [bitcoin/bitcoin releases](https://github.com/bitcoin/bitcoin/releases)
- [bitcoinknots/bitcoin releases](https://github.com/bitcoinknots/bitcoin/releases)
- [Bitcoin Knots](https://bitcoinknots.org/)
- [bitcoin.org version history](https://bitcoin.org/en/version-history)

@ -1,109 +0,0 @@
# Archipelago Public Demo — build info & status
**Status:** implemented & deployable (2026-06-22)
**Branch:** `demo-build` (worktree `../archy-demo-build`), pushed to
`gitea-vps2` = `http://146.59.87.168:3000/lfg2025/archy.git`.
**Main/prod is untouched** — all demo work lives only on `demo-build`.
A public, click-to-play demo of the Archipelago UI, 100% mock-data driven,
multi-visitor, deployed via Portainer. See also `docs/demo-deployment-design.md`
(original design) and `demo-deploy/` (thin prebuilt-image stack).
---
## Deploy (Portainer)
Build-from-repo (works today, no registry needed):
| Field | Value |
|-------|-------|
| Repository URL | `http://146.59.87.168:3000/lfg2025/archy.git` |
| Reference | `refs/heads/demo-build` |
| Compose path | `docker-compose.demo.yml` |
| Auth | user `lfg2025`, password = Gitea token |
| UI port | **2100** · Login password: **`entertoexit`** |
Redeploy after each push. `docker-compose.demo.yml` builds two images
(`neode-ui/Dockerfile.backend` = mock server, `neode-ui/Dockerfile.web` = nginx+UI).
The thin `demo-deploy/docker-compose.yml` pulls prebuilt `:demo` images instead
(needs the CI image pipeline / registry wired — `.github/workflows/demo-images.yml`).
### Flags / env
- Backend: `DEMO=1` (compose sets it) → multi-session sandbox, no real runtime.
- Web build: `VITE_DEMO=1` (Dockerfile.web ARG, default 1) → inlined demo UI behaviour.
- Optional: `ANTHROPIC_API_KEY` (NOT needed — AIUI chat is canned in demo),
`DEMO_SESSION_TTL_MS` (45m), `DEMO_MAX_SESSIONS` (500), `DEMO_FILE_QUOTA_BYTES` (50MB).
---
## Architecture
Everything is gated behind `DEMO` (off = classic single-user dev mock, unchanged).
- **`neode-ui/mock-backend.js`** — the entire fake backend (Node/Express, ~95+ RPCs).
- **Per-session isolation:** `AsyncLocalStorage` + Proxy. Globals (`mockData`,
`walletState`, `userState`, `mockState`, `bitcoinRelayMockState`) are Proxies
that resolve to the current request's store, keyed by a `demo_sid` cookie.
Deep-cloned from `SEED_*` on first hit; idle-reaped; per-session WS fan-out.
- **Files:** per-session in-memory store + curated disk files (see below).
- Forces simulation mode in DEMO (`docker=null`).
- **`neode-ui/src/composables/useDemoIntro.ts`** — the frontend demo switch
(`IS_DEMO`), per-day intro gate, `DEMO_PASSWORD`, app demoability + launch URLs.
- **`neode-ui/docker/nginx-demo.conf`** — routes `/rpc`, `/ws`, `/app/*`,
`/electrs-status`, `/proxy/`, `/lnd-connect-info`, the IndeeHub/Mempool
reverse-proxies, and the SPA.
- **`docker/{bitcoin-ui,electrs-ui,lnd-ui,fedimint-ui}/`** — the REAL registry app
UIs, served statically under `/app/<id>/` with mocked data endpoints.
- **`demo/aiui/`** — prebuilt AIUI dist (chat is canned; `?mockArchy&seed`).
- **`demo/files/`** — curated cloud files drop-in (see below).
## Demo features (all implemented)
Per-session sandbox · per-session file upload (Range streaming) · testnet/signet
flavor · per-day intro replay · `entertoexit` login (prefilled + hint) · version
`<real>-demo` · onboarding wizard skipped (intro kept) · "No demo" install gating ·
real app UIs (Bitcoin Core vs Knots by subversion, ElectrumX, LND, Fedimint;
Mempool/IndeeHub iframed) · 12 federation nodes / 5 peers · FIPS active · interactive
buy flow (testnet addresses, bolt11, 2s QR) · real testnet tx links (mempool.space) ·
networking profits 5,231,978 sats + labelled wallet txs · VPN · Nostr relays ·
node-visibility toggle · dummy Cashu mints + Fedimint federations · AIUI canned
reply + `?mockArchy` mock data + `?seed` pre-loaded "Content Showcase" chat.
---
## Curated cloud files (`demo/files/`)
Drop real files into `demo/files/<Folder>/<file>` and commit — they become the
cloud content for every visitor (read-only; git access = the "private login").
Loader **merges per top-level folder**: adding `Music/` swaps only Music and keeps
the sample Documents/Photos/Videos. Empty → built-in seeds. Text inlined; binaries
streamed from disk with HTTP Range (seek). Backend reads `/demo/files`:
**Dockerfile.backend COPYs it; `.dockerignore` must allow it.**
---
## Gotchas (READ before editing)
- **Sibling dirs need both the Dockerfile COPY and a `.dockerignore` allow.**
`docker/bitcoin-ui`, `docker/electrs-ui`, `docker/lnd-ui`, `docker/fedimint-ui`,
`demo/files` are outside `neode-ui/`; they're copied into the backend image and
un-ignored in `.dockerignore` (`* ` + `!docker/` + `docker/*` + `!docker/<ui>/`).
Forgetting either → Portainer build "not found" or runtime 500/404.
- **Real app UIs assume root-serving** — served via `express.static('/app/<id>')`
+ `/app/<id>/assets/*` → `/assets/*` redirect + per-path data endpoints
(`bitcoin-status`, `rpc/v1`, `bitcoin-rpc/`, `/proxy/lnd/*`, `/electrs-status`).
- **Uploaded-via-UI files are ephemeral** (per-session, lost on redeploy/reap).
Only `demo/files/` persists.
- **Mempool iframe is best-effort** (third-party CSP/websockets). **IndeeHub** is
reverse-proxied with header-strip + `sub_filter` asset rewrite; if still black,
it's indee's own `X-Frame-Options` (fix on that server).
- **AIUI `?seed` bootstrap hardcodes the current AIUI bundle hash**
(`/aiui/assets/seedPrompts-CLWaUv28.js`) — re-paste if AIUI is rebuilt. Tiny
first-load IndexedDB race (one refresh shows the chat).
- **Running mock-backend.js locally in the sandbox is flaky:** start backgrounded,
`sleep 5+`, then curl; NEVER `pkill -f mock-backend` (it matches & kills the
shell) — use `pkill -x node`.
- **Delete-405** seen pre-redeploy was nginx/stale; backend DELETE returns 200.
---
## Commit trail (demo-build, newest last)
`2715f2d8` sandbox → … → `7efebb4a` media merge + AIUI seed. ~14 commits, all
`feat(demo)/fix(demo)`.

@ -0,0 +1,169 @@
# Public Demo Deployment — Design
**Status:** design (2026-06-22)
**Goal:** a public, click-to-play demo of the Archipelago UI that **auto-tracks
the real code** yet stays **separated** from the private monorepo and its
secrets/backend. Deployed via **Portainer**, mock-data driven, with working file
storage and a testnet-flavored Bitcoin sandbox so visitors can play freely.
See also: `neode-ui/mock-backend.js` (existing mock), `docker-compose.demo.yml`
(existing demo stack), `MEMORY → reference_neode_ui_dev_testing`,
`MEMORY → reference_ovh_168_mirror` (Portainer/registry host).
---
## 1. What already exists (the 70%)
The demo is mostly built. Inventory:
| Asset | Path | State |
|-------|------|-------|
| Mock backend (Node/Express + ws) | `neode-ui/mock-backend.js` (~3,862 lines) | 95+ JSON-RPC methods: auth, package lifecycle, Bitcoin/LND wallet, mesh, federation, identity, monitoring, mock filebrowser |
| Mock data | `mockData` / `walletState` / `MOCK_FILES` in `mock-backend.js` | rich; 10 pre-installed apps, 30+ marketplace apps, wallet balances, seeded files (Music/Documents/Photos/Videos) |
| Demo compose | `docker-compose.demo.yml` | `neode-backend` (mock, `:5959`) + `neode-web` (nginx, `:4848`); header already says "Deploy via Portainer" |
| Backend image | `neode-ui/Dockerfile.backend` | Node 22 Alpine → `node mock-backend.js` |
| Web image | `neode-ui/Dockerfile.web` | multi-stage `vite build` → nginx |
| Demo nginx | `neode-ui/docker/nginx-demo.conf` | proxies `/rpc/v1`, `/ws`, `/app/*` to the mock backend |
| Precedent | `indee-demo` Portainer stack | separate stack referencing a **pre-built image** — the pattern we extend |
**Gaps for a *public* (not dev) demo:** state is global (visitors collide),
uploads are no-ops, Bitcoin block height is hardcoded, no CI image pipeline, no
separated public deploy repo.
---
## 2. Architecture: source in monorepo, demo ships as images, public repo is thin
The tension — "must update as I update the real code" **and** "sort of
separated" — is resolved by separating at the **deploy layer, not the source
layer**.
```
monorepo (private — single source of truth)
neode-ui/ + mock-backend.js
│ push to main
CI: build archy-demo-web + archy-demo-backend
│ push :demo / :latest
registry (146.59.87.168:3000 / vps2)
│ Portainer webhook / re-pull
archy-demo (public repo — tiny)
docker-compose.yml ──referencing pre-built images──▶ Portainer ▶ demo.<host>
.env.example
```
- **Single source of truth = the monorepo.** `neode-ui/` and `mock-backend.js`
stay where they are, so the demo tracks real code automatically — no fork to
sync, no drift.
- **Separation = the public repo never holds source.** `archy-demo` contains only
a `docker-compose.yml` (image refs) + `.env.example` + README. No Rust backend,
no secrets, no UI source. Safe to make public.
- **Auto-update flow:** edit code → push → CI rebuilds demo images → Portainer
redeploys. The public compose file is touched rarely (only when service shape
changes).
**Why not a true fork / `git subtree split`?** It works but needs a sync job
*and* re-exposes UI source publicly. The image pipeline gives stronger
separation (zero source leak) **and** zero manual sync. (Decided 2026-06-22.)
---
## 3. Work items
### 3.1 CI image pipeline
- On push to `main` (path filter: `neode-ui/**`), build:
- `archy-demo-backend` from `neode-ui/Dockerfile.backend`
- `archy-demo-web` from `neode-ui/Dockerfile.web` (`build:docker`)
- Tag `:demo` + `:<git-sha>`, push to the registry.
- Trigger Portainer redeploy (stack webhook) on success.
### 3.2 Public `archy-demo` repo
- `docker-compose.yml` mirroring `docker-compose.demo.yml` but **`image:`
references instead of `build:`** (pull `:demo`, no build context).
- `.env.example` (`ANTHROPIC_API_KEY`, `VITE_DEV_MODE=existing`, session TTL,
upload quota).
- README: one-paragraph "deploy in Portainer → web editor paste / deploy from
repo," access on `:4848`.
- No source. This is the only public surface.
### 3.3 Multi-user: per-session sandbox (reset on idle) ⟵ *decided*
The biggest code change. Today `mockData` / `walletState` / `MOCK_FILES` are
**global singletons** → visitors corrupt each other's view.
- Issue a `demo-session` cookie on first hit (the mock already sets a session on
login; extend it to anonymous visitors).
- Key state by session id: `sessions[sid] = { mockData, walletState, files }`,
each **deep-cloned from a pristine seed** on creation.
- Reap on idle (e.g. 30 min no activity) + hard cap concurrent sessions; on reap,
free memory + temp dir.
- RPC dispatch + WS patches resolve the per-session state instead of the global.
- Keeps the demo a true playground: install/uninstall/spend freely, reset by
reconnecting.
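
The session bookkeeping described above can be sketched independently of Express. This is a language-neutral illustration in Python (the real change lands in `mock-backend.js`); `SessionStore` is a hypothetical name, and the TTL and cap defaults are placeholders for the numbers still open in section 6.

```python
import copy
import time
from typing import Callable, Dict


class SessionStore:
    """Per-visitor state, deep-cloned from a pristine seed and reaped when idle."""

    def __init__(self, seed: dict, idle_ttl_s: float = 30 * 60,
                 max_sessions: int = 500,
                 clock: Callable[[], float] = time.monotonic):
        self._seed = seed
        self._idle_ttl_s = idle_ttl_s
        self._max_sessions = max_sessions  # placeholder cap, not a decided number
        self._clock = clock
        self._sessions: Dict[str, dict] = {}
        self._last_seen: Dict[str, float] = {}

    def get(self, sid: str) -> dict:
        """Return the state for `sid`, creating it from the seed on first hit."""
        self.reap()
        if sid not in self._sessions:
            if len(self._sessions) >= self._max_sessions:
                raise RuntimeError("demo is at its session cap")
            self._sessions[sid] = copy.deepcopy(self._seed)
        self._last_seen[sid] = self._clock()
        return self._sessions[sid]

    def reap(self) -> None:
        """Drop every session idle for longer than the TTL."""
        now = self._clock()
        idle = [s for s, seen in self._last_seen.items() if now - seen > self._idle_ttl_s]
        for sid in idle:
            del self._sessions[sid]
            del self._last_seen[sid]
```

Injecting the clock keeps the idle-reap path testable without sleeping; a reaped visitor simply gets a fresh clone on the next request.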
### 3.4 File storage: persisted per session ⟵ *decided*
Today filebrowser upload/delete/rename are 200-OK no-ops.
- Back each session with a temp dir (e.g. `/tmp/demo/<sid>/`), seeded from
`MOCK_FILES`.
- Make `POST/DELETE/PATCH /app/filebrowser/api/resources/*` and `GET …/raw/*`
read/write that dir. Enforce a per-session quota (e.g. 50 MB) and reject
oversize/odd MIME.
- Cleaned when the session is reaped — no standing public writable volume, no real
filebrowser container to harden.
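
A minimal sketch of the write path with a per-session quota, assuming one directory per session as described. The path-containment check is an extra guard added for illustration and is not spelled out above; overwrites are counted as new bytes for simplicity. `dir_size` and `store_upload` are hypothetical names.

```python
import os


def dir_size(path: str) -> int:
    """Total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def store_upload(session_dir: str, rel_path: str, data: bytes,
                 quota_bytes: int = 50 * 1024 * 1024) -> str:
    """Write `data` under the session dir, enforcing containment and the quota."""
    base = os.path.realpath(session_dir)
    target = os.path.realpath(os.path.join(base, rel_path))
    # Assumption: reject anything that resolves outside the session directory.
    if target == base or os.path.commonpath([base, target]) != base:
        raise PermissionError("path escapes the session directory")
    if dir_size(base) + len(data) > quota_bytes:
        raise ValueError("session upload quota exceeded")
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as fh:
        fh.write(data)
    return target
```

Deleting the session directory on reap then releases both the files and the quota in one step.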
### 3.5 Bitcoin: testnet-flavored mock ⟵ *decided*
- Relabel wallet/chain as **testnet/signet**: `tb1q…` addresses, "testnet" chain
in `bitcoin.getinfo`, scripted-but-plausible block height + confirmations.
- Keep `dev.faucet` as the in-UI "get test sats" button (instant, free).
- No real `bitcoind` → no sync, no disk, no public RPC attack surface.
- *Future upgrade path:* swap to a real signet node + LND in the stack if we ever
want movable real test sats (out of scope now).
### 3.6 Mock containers / app lifecycle
- The mock already simulates `package.install/uninstall/start/stop/restart`
asynchronously. For the demo, **force simulation mode** (never touch a real
Docker socket — rootless/safe and host-independent). Confirm no path in
`mock-backend.js` reaches for a real runtime when `DEMO=1`.
### 3.7 Mock-data refresh
- Update `mockData` static apps + marketplace to current app set/versions, refresh
wallet figures, seeded mesh messages, and files so the demo feels current. This
is ongoing and rides the same image pipeline.
---
## 4. Invariants / guardrails (public exposure)
- **No real secrets, no real backend, no real Docker socket** in the demo image or
public repo. Mock password stays a known demo credential, clearly labeled.
- **Per-session isolation** is a hard requirement before going public — without it
the demo is unusable for strangers.
- **Resource caps:** session count, per-session memory + upload quota, idle reap;
the box can't be DoS'd into OOM by upload spam or session churn.
- **`ANTHROPIC_API_KEY`** (chat) is injected via Portainer env, never committed;
rate-limit / budget-cap demo chat usage.
- **Read-only registry creds** for the Portainer host to pull `:demo`.
---
## 5. Files / seams
| Concern | Where |
|---------|-------|
| Per-session state, file persistence, testnet labels, sim-mode | `neode-ui/mock-backend.js` |
| Build contexts (reused as-is) | `neode-ui/Dockerfile.backend`, `neode-ui/Dockerfile.web`, `neode-ui/docker/nginx-demo.conf` |
| Demo stack (in-repo, dev) | `docker-compose.demo.yml` (keep `build:`) |
| Public stack (new repo) | `archy-demo/docker-compose.yml` (`image:` refs), `.env.example`, README |
| CI pipeline | new workflow (path filter `neode-ui/**` → build + push `:demo` → Portainer webhook) |
---
## 6. Open questions
1. **Demo host** — which Portainer instance (OVH `.168`? a dedicated VPS)? Public
DNS + TLS for `demo.<domain>`?
2. **Registry for `:demo` images**: `146.59.87.168:3000` vs vps2; public-pull or
creds baked into Portainer?
3. **Session TTL + concurrency cap** — concrete numbers (30 min / N sessions / 50 MB)?
4. **Chat in the demo** — enable Claude chat (needs key + budget cap) or stub it?
5. **Sync cadence** — rebuild `:demo` on every `neode-ui/**` push, or nightly?

@ -0,0 +1,69 @@
# Multinode / Fleet Testing Plan (separate from the single-node gate)
> **Scope split (2026-06-22):** the production test gate (`docs/PRODUCTION-MASTER-PLAN.md` §5,
> `tests/lifecycle/TESTING.md`) is now a **single-node criterion on .228**. Verifying the same
> lifecycle matrix across the rest of the fleet (.198 and the other testers) lives HERE and is run
> **after** the .228 single-node gate is green. This is intentionally NOT a blocker on the .228 gate.
## Why split it out
The lifecycle gate must be **run ON the node under test** — its bitcoin/companion/orphan/endpoint
checks use local `podman`/`systemctl`/`bitcoin-cli`/`curl`, not RPC to a remote host. Running it from
one host against another silently tests the *runner*. So "multinode" isn't "point the harness at N
hosts" — it's "run the on-node gate on each host," plus the genuinely cross-node concerns (federation,
mesh, transport, sync) that a single node can't exercise.
## How to run the gate on another node
Bats + jq usually aren't installed on ISO nodes. Bootstrap (one-time per node):
```
# from a host that has them (e.g. .116):
dpkg -L bats | grep -E '^/usr/(bin|lib|libexec)' | tar czf /tmp/bats.tgz -P -T - $(which jq)
tar czf /tmp/tests.tgz -C <repo> tests/lifecycle
scp /tmp/bats.tgz /tmp/tests.tgz <node>:/tmp/
# on the node:
sudo tar xzf /tmp/bats.tgz -P -C / # bats (jq here is dynamically linked — may need libs)
sudo curl -fsSL -o /usr/local/bin/jq \
https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 && sudo chmod +x /usr/local/bin/jq
mkdir -p /tmp/lifecycle-run && tar xzf /tmp/tests.tgz -C /tmp/lifecycle-run
cd /tmp/lifecycle-run/tests/lifecycle
ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=https ARCHY_PASSWORD=<node pw> \
ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 nohup ./run-gate.sh > /tmp/gate.log 2>&1 &
```
## Per-node preconditions (learned on .228)
- **Bitcoin must be fully synced + archival** (`initialblockdownload:false`, `pruned:false`).
Test 83 reads the *real* `getblockchaininfo`, not the UI's headers-height. A node mid-IBD will
cascade-fail electrumx/lnd/btcpay/mempool even though the apps run.
- **Backends should be proper installs** (in `manifest_ids`), not adopted plain-podman left over
from ad-hoc `package.start`/cascade churn — otherwise companion self-heal and quadlet checks skew.
- **No stale per-app nginx proxy targets.** e.g. `/app/lnd/` must point at the lnd-ui port (18083),
not a stale `8081`. Repo code is correct; old node configs may be stale — re-check + regenerate.
- **No orphan quadlet units** (e.g. a `home-assistant.container` whose ContainerName ≠ the real
`homeassistant` container) — these wedge `systemctl --user` "activating" and fail the quadlet checks.
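
The Bitcoin precondition can be checked before starting a long gate run. A small sketch, assuming the caller has already parsed the JSON from `bitcoin-cli getblockchaininfo` into a dict; `gate_precondition_failures` is a hypothetical helper, not part of the test harness, and missing fields are treated as failures.

```python
from typing import List


def gate_precondition_failures(info: dict) -> List[str]:
    """Check a parsed `getblockchaininfo` result against the gate preconditions."""
    failures = []
    # Missing fields are treated as failing so an unexpected reply cannot pass.
    if info.get("initialblockdownload", True):
        failures.append("bitcoin is still in initial block download")
    if info.get("pruned", True):
        failures.append("bitcoin is pruned, not archival")
    return failures
```

An empty list means the node meets the Bitcoin precondition; the other preconditions (proper installs, proxy targets, orphan units) still need their own checks.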
## Node roster (carry-over)
| Node | Role | Notes |
|------|------|-------|
| .228 | **single-node gate** (primary) | 14-app resilience node; bitcoin synced archival; gate GREEN. |
| .198 | fleet verify | was weak/loaded (load ~35) + **bitcoin mid-IBD** at split time → must finish syncing first; sshd wedges under concurrent SSH (use ONE session; gate uses HTTPS RPC so fine). |
| .5 / .120 | x250 testers (Tailscale) | flaky cellular; SSH via `tailscale nc` ProxyCommand. |
| .116 | dev/validation | local repo; its own bitcoin may be mid-IBD — do NOT treat as a gate target unless synced. |
## Cross-node concerns (only a multinode setup can test)
- Federation sync (Tor/FIPS transports), DID/contact federation, peer file fetch.
- Mesh (Meshtastic/MeshCore) + mesh-AI gating.
- Dual-ecash federation validation + networking-sats routing.
- DHT / iroh swarm distribution (origin-always-wins) once that dep lands.
## Sequence
1. Get the **.228 single-node gate green 5×** (master plan §5/§6) — DONE/in progress.
2. THEN: bring each fleet node to the preconditions above; run the on-node gate 5× per node.
3. THEN: the cross-node suites (federation/mesh/transport), tracked here.
This plan does not gate the v1.7.x single-node criterion; it is the next layer.

@ -14,14 +14,6 @@ RUN npm install
# Copy application code
COPY neode-ui/ ./
# Sibling assets the mock backend reads relative to /app (../docker, ../demo):
# the Bitcoin UI mock shell and any curated cloud files dropped into demo/files.
COPY docker/bitcoin-ui /docker/bitcoin-ui
COPY docker/electrs-ui /docker/electrs-ui
COPY docker/lnd-ui /docker/lnd-ui
COPY docker/fedimint-ui /docker/fedimint-ui
COPY demo/files /demo/files
# Expose port
EXPOSE 5959

@ -20,12 +20,6 @@ RUN find public/assets -name "*backup*" -type f -delete || true && \
ENV DOCKER_BUILD=true
ENV NODE_ENV=production
# Public-demo build flag — inlined into the bundle (import.meta.env.VITE_DEMO).
# Enables the per-day intro replay, the "entertoexit" login hint, and other
# demo-only UI affordances. Override with --build-arg VITE_DEMO=0 for a plain build.
ARG VITE_DEMO=1
ENV VITE_DEMO=$VITE_DEMO
# Use npm script which handles build better
RUN npm run build:docker || (echo "Build failed! Listing files:" && ls -la && echo "Checking vite config:" && cat vite.config.ts && exit 1)


@ -62,28 +62,6 @@ http {
proxy_set_header X-Real-IP $remote_addr;
}
# ElectrumX UI status (polled by the electrs-ui shell)
location /electrs-status {
proxy_pass http://neode-backend:5959;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
# LND UI endpoints (polled by the lnd-ui shell)
location /proxy/ {
proxy_pass http://neode-backend:5959;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
location /lnd-connect-info {
proxy_pass http://neode-backend:5959;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
# Proxy FileBrowser API to mock backend (demo mode)
location /app/filebrowser/ {
client_max_body_size 10G;
@ -94,59 +72,6 @@ http {
proxy_request_buffering off;
}
# IndeeHub: reverse-proxy the real site same-origin, strip framing headers,
# and rewrite its absolute asset paths (/assets, /, src, href) to the
# /app/indeedhub/ prefix so the SPA loads inside the iframe.
location /app/indeedhub/ {
proxy_pass https://indee.tx1138.com/;
proxy_http_version 1.1;
proxy_set_header Host indee.tx1138.com;
proxy_set_header Accept-Encoding "";
proxy_ssl_server_name on;
proxy_hide_header X-Frame-Options;
proxy_hide_header Content-Security-Policy;
proxy_hide_header Content-Security-Policy-Report-Only;
sub_filter_types text/html text/css application/javascript application/json;
sub_filter_once off;
sub_filter 'href="/' 'href="/app/indeedhub/';
sub_filter 'src="/' 'src="/app/indeedhub/';
sub_filter "href='/" "href='/app/indeedhub/";
sub_filter "src='/" "src='/app/indeedhub/";
sub_filter 'from"/' 'from"/app/indeedhub/';
sub_filter 'url(/' 'url(/app/indeedhub/';
}
# Mempool: same approach. NOTE mempool.space is a strict third-party app —
# its data/websocket calls may still be blocked; iframe is best-effort.
location /app/mempool/ {
proxy_pass https://mempool.space/;
proxy_http_version 1.1;
proxy_set_header Host mempool.space;
proxy_set_header Accept-Encoding "";
proxy_ssl_server_name on;
proxy_hide_header X-Frame-Options;
proxy_hide_header Content-Security-Policy;
proxy_hide_header Content-Security-Policy-Report-Only;
sub_filter_types text/html text/css application/javascript application/json;
sub_filter_once off;
sub_filter 'href="/' 'href="/app/mempool/';
sub_filter 'src="/' 'src="/app/mempool/';
sub_filter "href='/" "href='/app/mempool/";
sub_filter "src='/" "src='/app/mempool/";
sub_filter 'from"/' 'from"/app/mempool/';
sub_filter 'url(/' 'url(/app/mempool/';
}
# Proxy every other app UI (/app/<id>/) to the mock backend, which serves
# the per-app mock UIs (bitcoin-ui, electrumx, lnd, fedimint) and the
# generic "Not available in the demo" notice for the rest.
location /app/ {
proxy_pass http://neode-backend:5959;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
# Serve AIUI SPA
location /aiui/ {
alias /usr/share/nginx/html/aiui/;

File diff suppressed because it is too large


@ -73,7 +73,7 @@
"author": "Mempool",
"category": "money",
"tier": "core",
"dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.0",
"dockerImage": "146.59.87.168:3000/lfg2025/mempool-frontend:v3.0.1",
"repoUrl": "https://github.com/mempool/mempool",
"requires": [
"bitcoin-knots",
@ -195,7 +195,7 @@
"title": "Nostr Relay (Rust)",
"version": "0.8.0",
"description": "High-performance Nostr relay written in Rust. Host your own decentralized social media relay and earn networking profits.",
"icon": "/assets/img/app-icons/nostrudel.svg",
"icon": "/assets/img/app-icons/nostr.svg",
"author": "Nostr RS Relay",
"category": "community",
"tier": "recommended",
@ -214,31 +214,6 @@
]
}
},
{
"id": "meshtastic",
"title": "Meshtastic",
"version": "2-daily-alpine",
"description": "Open-source mesh networking for LoRa radios. Create decentralized communication networks.",
"icon": "/assets/img/app-icons/meshcore.svg",
"author": "Meshtastic",
"category": "networking",
"tier": "recommended",
"dockerImage": "docker.io/meshtastic/meshtasticd:daily-alpine",
"repoUrl": "https://github.com/meshtastic/firmware",
"containerConfig": {
"ports": [
"4403:4403"
],
"volumes": [
"/var/lib/archipelago/meshtastic:/var/lib/meshtasticd"
],
"env": [
"MESHTASTIC_PORT=/dev/ttyUSB0",
"MESHTASTIC_SERIAL=true"
],
"notes": "Requires a LoRa radio device at /dev/ttyUSB0. The config file is rendered from the app manifest before container start."
}
},
{
"id": "vaultwarden",
"title": "Vaultwarden",


@ -38,6 +38,13 @@ export const companionInputActive = ref(false)
let ws: WebSocket | null = null
let shouldReconnect = true
let reconnectTimer: ReturnType<typeof setTimeout> | null = null
// Exponential backoff for the relay socket. It's a secondary feature (companion
// input), so when the backend is down it must NOT hammer a fixed-interval
// reconnect — that floods the console/network with failed-WS noise for the whole
// outage. Back off 1s → 30s, reset on a successful open. (Mirrors websocket.ts.)
let relayReconnectAttempts = 0
const RELAY_RECONNECT_BASE_MS = 1000
const RELAY_RECONNECT_MAX_MS = 30_000
let cursorEl: HTMLDivElement | null = null
let companionTimeout: ReturnType<typeof setTimeout> | null = null
let inputFlickerTimeout: ReturnType<typeof setTimeout> | null = null
@ -332,6 +339,7 @@ function doConnect() {
ws.onopen = () => {
relayConnected.value = true
relayReconnectAttempts = 0 // healthy again — reset backoff
if (import.meta.env.DEV) console.log('[RemoteRelay] Connected')
}
@ -343,7 +351,12 @@ function doConnect() {
relayConnected.value = false
ws = null
if (shouldReconnect) {
reconnectTimer = setTimeout(doConnect, 5000)
const delay = Math.min(
RELAY_RECONNECT_BASE_MS * 2 ** relayReconnectAttempts,
RELAY_RECONNECT_MAX_MS,
)
relayReconnectAttempts++
reconnectTimer = setTimeout(doConnect, delay)
}
}
@ -379,6 +392,7 @@ export function requestExternalOpen(url: string): boolean {
/** Start the remote relay listener. Connects to /ws/remote-relay. */
export function startRemoteRelay() {
shouldReconnect = true
relayReconnectAttempts = 0
doConnect()
}
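
The capped exponential backoff introduced in this hunk can be sketched in isolation. This is a minimal sketch, not the module's code: `relayBackoffDelay` is a hypothetical helper name, and only the two constants are taken from the diff.

```typescript
// Constants copied from the diff; the helper below is hypothetical.
const RELAY_RECONNECT_BASE_MS = 1000
const RELAY_RECONNECT_MAX_MS = 30_000

// Delay before reconnect attempt number `attempt` (0-based):
// doubles each time and is capped at the maximum.
function relayBackoffDelay(attempt: number): number {
  return Math.min(RELAY_RECONNECT_BASE_MS * 2 ** attempt, RELAY_RECONNECT_MAX_MS)
}

const delays = [0, 1, 2, 3, 4, 5, 6].map(relayBackoffDelay)
console.log(delays) // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Resetting the attempt counter on a successful open (as `ws.onopen` does in the diff) means a brief outage costs only a one-second wait on the next disconnect.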


@ -69,12 +69,12 @@
<div class="relative flex-1 min-h-0 bg-black/40 overflow-hidden">
<!-- Loading indicator -->
<Transition name="content-fade">
<div v-if="iframeLoading" class="absolute inset-0 z-10 flex items-center justify-center bg-black/40">
<svg class="animate-spin h-8 w-8 text-white/70" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
</div>
<AppLoadingScreen
v-if="iframeLoading"
:icon="overlayIcon"
:title="store.title || 'App'"
:progress="loadProgress"
/>
</Transition>
<iframe
ref="iframeRef"
@ -184,10 +184,12 @@
</template>
<script setup lang="ts">
import { ref, watch, onMounted, onBeforeUnmount } from 'vue'
import { ref, computed, watch, onMounted, onBeforeUnmount } from 'vue'
import { useAppLauncherStore } from '@/stores/appLauncher'
import NostrSignConsent from '@/components/NostrSignConsent.vue'
import NostrIdentityPicker from '@/components/NostrIdentityPicker.vue'
import AppLoadingScreen from '@/components/AppLoadingScreen.vue'
import { DEFAULT_APP_ICON } from '@/views/apps/appsConfig'
import { rpcClient } from '@/api/rpc-client'
interface PaymentRequest {
@ -207,6 +209,39 @@ const isRefreshing = ref(false)
const iframeLoading = ref(true)
const iframeBlocked = ref(false)
// Best-guess icon for the loading screen, resolved from the /app/{id}/ path
// when present; AppLoadingScreen's <img> falls back to the default icon if the
// guessed asset 404s.
const overlayIcon = computed(() => {
const url = store.url
if (!url) return DEFAULT_APP_ICON
try {
const m = new URL(url, window.location.origin).pathname.match(/^\/app\/([a-z0-9._-]+)/i)
if (m?.[1]) return `/assets/img/app-icons/${m[1].toLowerCase()}.png`
} catch { /* not a parseable URL */ }
return DEFAULT_APP_ICON
})
// Faux load progress (cross-origin iframes give no real progress events): ease
// toward ~92% while loading, snap to 100% on load.
const loadProgress = ref(0)
let progressTimer: ReturnType<typeof setInterval> | null = null
function stopProgress() {
if (progressTimer) { clearInterval(progressTimer); progressTimer = null }
}
function startProgress() {
stopProgress()
loadProgress.value = 8
progressTimer = setInterval(() => {
loadProgress.value += Math.max(0.4, (92 - loadProgress.value) * 0.08)
if (loadProgress.value >= 92) { loadProgress.value = 92; stopProgress() }
}, 180)
}
watch(iframeLoading, (loading) => {
if (loading) startProgress()
else { stopProgress(); loadProgress.value = 100 }
}, { immediate: true })
// Nostr identity picker state
const showIdentityPicker = ref(false)
const IDENTITY_STORAGE_KEY = 'archipelago_app_identity_'
@ -573,6 +608,7 @@ onMounted(() => {
onBeforeUnmount(() => {
clearTimers()
stopProgress()
window.removeEventListener('keydown', onKeyDown, true)
window.removeEventListener('message', onMessage)
})
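
The faux progress easing used by the loading screen can be sketched as a pure step function. This is an illustration under assumptions: `nextProgress` is a hypothetical name, and the timer and Vue refs from the diff are left out so that only the arithmetic remains.

```typescript
// One tick of the easing: move 8% of the remaining distance toward the
// ceiling, but never less than 0.4 points, and clamp at the ceiling.
function nextProgress(current: number, ceiling = 92): number {
  const next = current + Math.max(0.4, (ceiling - current) * 0.08)
  return next >= ceiling ? ceiling : next
}

// Starting value 8 matches the diff; the loop ends once the ceiling is hit.
let progress = 8
let ticks = 0
while (progress < 92) {
  progress = nextProgress(progress)
  ticks++
}
console.log(progress, ticks > 0)
```

The minimum step guarantees the bar reaches the ceiling in a finite number of ticks rather than approaching it asymptotically; the jump to 100 happens separately, when the iframe reports it has loaded.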


@ -0,0 +1,81 @@
<template>
<div class="app-loading-screen absolute inset-0 z-10 flex flex-col items-center justify-center">
<div class="app-loading-icon">
<img :src="icon" :alt="title" @error="handleImageError" />
</div>
<p class="app-loading-title">{{ title }}</p>
<div class="app-loading-bar">
<div class="app-loading-fill" :style="{ width: `${clampedProgress}%` }"></div>
</div>
<p class="app-loading-hint">{{ hint }}</p>
</div>
</template>
<script setup lang="ts">
import { computed } from 'vue'
import { handleImageError } from '@/views/apps/appsConfig'
const props = withDefaults(defineProps<{
icon: string
title: string
progress: number
hint?: string
}>(), {
hint: 'Loading…',
})
const clampedProgress = computed(() => Math.min(100, Math.max(0, props.progress)))
</script>
<style scoped>
.app-loading-screen {
gap: 18px;
background: #0b0d12;
}
.app-loading-icon {
width: 84px;
height: 84px;
border-radius: 20px;
overflow: hidden;
display: flex;
align-items: center;
justify-content: center;
background: rgba(255, 255, 255, 0.05);
border: 1px solid rgba(255, 255, 255, 0.08);
box-shadow: 0 12px 32px rgba(0, 0, 0, 0.45);
animation: app-loading-pulse 1.8s ease-in-out infinite;
}
.app-loading-icon img {
width: 100%;
height: 100%;
object-fit: cover;
}
.app-loading-title {
margin: 0;
font-size: 1rem;
font-weight: 600;
color: rgba(255, 255, 255, 0.9);
}
.app-loading-bar {
width: min(240px, 60vw);
height: 4px;
border-radius: 999px;
background: rgba(255, 255, 255, 0.1);
overflow: hidden;
}
.app-loading-fill {
height: 100%;
border-radius: 999px;
background: linear-gradient(90deg, #fb923c, #f59e0b);
transition: width 0.3s ease;
}
.app-loading-hint {
margin: 0;
font-size: 0.75rem;
color: rgba(255, 255, 255, 0.4);
}
@keyframes app-loading-pulse {
0%, 100% { transform: scale(1); opacity: 1; }
50% { transform: scale(1.05); opacity: 0.85; }
}
</style>


@ -82,7 +82,7 @@ const STORAGE_KEY = 'neode_companion_intro_seen'
// Absolute URL so the QR works when scanned by a phone (a relative path has no
// host to resolve). Points at the companion APK hosted on the 146 release server
// (publicly reachable) rather than the local node's /packages copy.
const DEFAULT_DOWNLOAD_URL = 'http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/neode-ui/public/packages/archipelago-companion.apk.zip'
const DEFAULT_DOWNLOAD_URL = 'http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/neode-ui/public/packages/archipelago-companion.apk'
const visible = ref(false)
const qrDataUrl = ref('')


@ -1,96 +0,0 @@
/**
* Public-demo helpers.
*
* The demo build (VITE_DEMO=1) replays the intro/onboarding on each visit, but
* only once per calendar day per browser (tracked in localStorage) so it
* survives the short-lived backend session. Also exposes the shared demo
* credentials shown on the login screen.
*/
export const IS_DEMO =
import.meta.env.VITE_DEMO === '1' || import.meta.env.VITE_DEMO === 'true'
/** Memorable shared password for the public demo (must match the mock backend). */
export const DEMO_PASSWORD = 'entertoexit'
const INTRO_DATE_KEY = 'demo_intro_date'
function todayKey(): string {
// Local calendar day, e.g. "2026-06-22".
const d = new Date()
return `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, '0')}-${String(d.getDate()).padStart(2, '0')}`
}
/** True if this browser already watched the intro earlier today. */
export function demoIntroSeenToday(): boolean {
try {
return localStorage.getItem(INTRO_DATE_KEY) === todayKey()
} catch {
return false
}
}
/** Record that the intro has been seen today, so it won't replay until tomorrow. */
export function markDemoIntroSeen(): void {
try {
localStorage.setItem(INTRO_DATE_KEY, todayKey())
} catch {
/* ignore (private mode / storage disabled) */
}
}
/** Forget today's "seen" marker so the intro plays again (e.g. "Replay Intro"). */
export function clearDemoIntroSeen(): void {
try {
localStorage.removeItem(INTRO_DATE_KEY)
} catch {
/* ignore */
}
}
// ── Demoable apps ───────────────────────────────────────────────────────────
// Only these apps actually do something in the demo (a mock UI or a real
// external site). Everything else shows "No demo" on a disabled install button
// and is not launchable.
const DEMO_EXTERNAL_URLS: Record<string, string> = {}
// Apps loaded in the in-app iframe via a same-origin path. IndeeHub and Mempool
// are reverse-proxied by nginx (X-Frame-Options/CSP stripped + asset paths
// rewritten) so the frame-busting real sites can be embedded.
const DEMO_MOCK_UI: Record<string, string> = {
indeedhub: '/app/indeedhub/',
mempool: '/app/mempool/',
'mempool-web': '/app/mempool/',
'bitcoin-knots': '/app/bitcoin-knots/',
'bitcoin-core': '/app/bitcoin-core/',
bitcoin: '/app/bitcoin-core/',
'bitcoin-ui': '/app/bitcoin-ui/',
electrs: '/app/electrumx/',
electrumx: '/app/electrumx/',
'archy-electrs-ui': '/app/electrumx/',
lnd: '/app/lnd/',
'lnd-ui': '/app/lnd/',
'archy-lnd-ui': '/app/lnd/',
thunderhub: '/app/lnd/',
fedimint: '/app/fedimint/',
fedimintd: '/app/fedimint/',
filebrowser: '/app/filebrowser/',
}
/**
* Whether a demo app opens in a new tab. Nothing does: IndeeHub and Mempool
* both load their real site directly in the in-app iframe.
*/
export function isDemoExternal(_appId: string): boolean {
return false
}
/** Can this app be launched/installed in the demo? */
export function isDemoApp(appId: string): boolean {
return appId in DEMO_EXTERNAL_URLS || appId in DEMO_MOCK_UI
}
/** Resolve the demo launch URL for an app, or null if it isn't demoable. */
export function demoAppUrl(appId: string): string | null {
return DEMO_EXTERNAL_URLS[appId] ?? DEMO_MOCK_UI[appId] ?? null
}


@ -23,8 +23,6 @@ if (!navigator.clipboard) {
},
})
}
import { useToast } from '@/composables/useToast'
const app = createApp(App)
const pinia = createPinia()
@ -97,14 +95,20 @@ function recordError(source: string, err: unknown, info?: string) {
const entry: ArchyErrorEntry = { when: new Date().toISOString(), source, message, info, stack: e?.stack }
errorLog.push(entry)
if (errorLog.length > 25) errorLog.shift()
// Log SILENTLY: a global handler error is almost always something we should
// fix at the source, not interrupt the user for. Keep the full record on the
// console + the window.__archyErrors ring buffer, and make the screenshot-able
// overlay available ON DEMAND (window.__archyShowErrors(), or the debug view)
// — but do NOT auto-pop a red toast / overlay over the UI. Components that
// need to tell the user about a *specific, actionable* failure still call
// toast.error() directly; this catch-all stays out of the way.
console.error(`[${source}]`, err, info ?? '')
// Surface the real message (truncated) instead of a generic toast — this is a
// test/bug-bash build, and "Something went wrong" hides exactly what we need.
const short = message.length > 140 ? `${message.slice(0, 140)}` : message
try {
useToast().error(`Something went wrong: ${short}`)
} catch { /* toast itself failed — the console + ring buffer still have it */ }
// Always show the on-device overlay so the error is visible without a console.
}
// Expose the on-demand error overlay + ring buffer so a crash that only repros
// in a runtime without a console (Android companion WebView) is still
// retrievable: call `window.__archyShowErrors()` to screenshot/Copy them.
;(window as unknown as { __archyShowErrors?: () => void }).__archyShowErrors = () => {
try { showErrorOverlay() } catch { /* overlay is best-effort */ }
}
@ -133,15 +137,28 @@ function reloadOnceForStaleChunk(err: unknown): boolean {
return true
}
// Known-benign environmental noise — expected on some deployments and not
// actionable by the user or us, so it shouldn't even occupy a ring-buffer slot
// (which would push out real errors). The PWA service worker can't register
// over a self-signed cert (it needs a trusted cert or localhost); on those
// nodes the SW/offline cache simply doesn't run, which is fine. Logged at debug
// only. (A trusted cert is the real fix — tracked separately, #56.)
function isBenignEnvironmentError(err: unknown): boolean {
const msg = (err as { message?: string })?.message ?? String(err ?? '')
return /Failed to register a ServiceWorker|ServiceWorker.*(SSL|certificate|SecurityError)|An SSL certificate error occurred when fetching the script/i.test(msg)
}
// Vue's errorHandler only catches errors raised synchronously inside Vue's
// lifecycle/reactivity. Async rejections and plain runtime errors (e.g. a JS
// API missing in an older WebView) slip past it, so catch those too.
window.addEventListener('error', (ev) => {
if (reloadOnceForStaleChunk(ev.error ?? ev.message)) return
if (isBenignEnvironmentError(ev.error ?? ev.message)) { console.debug('[benign]', ev.message); return }
recordError('window.error', ev.error ?? ev.message)
})
window.addEventListener('unhandledrejection', (ev) => {
if (reloadOnceForStaleChunk(ev.reason)) return
if (isBenignEnvironmentError(ev.reason)) { console.debug('[benign]', ev.reason); return }
recordError('unhandledrejection', ev.reason)
})
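
To see which messages the new filter swallows, the predicate can be exercised on its own. This sketch reuses the regular expression from the hunk above; the sample messages are illustrative assumptions, not captured logs.

```typescript
// Same pattern as in the diff; accepts Error objects, strings, or nullish values.
function isBenignEnvironmentError(err: unknown): boolean {
  const msg = (err as { message?: string } | null | undefined)?.message ?? String(err ?? '')
  return /Failed to register a ServiceWorker|ServiceWorker.*(SSL|certificate|SecurityError)|An SSL certificate error occurred when fetching the script/i.test(msg)
}

// Hypothetical inputs for illustration only.
const samples: unknown[] = [
  new Error('Failed to register a ServiceWorker for scope (https://node.local/)'),
  'An SSL certificate error occurred when fetching the script.',
  new Error('x is not a function'),
  null,
]

for (const sample of samples) {
  console.log(isBenignEnvironmentError(sample))
}
```

Only the first two samples match, so a genuine runtime error still reaches `recordError` and keeps its slot in the ring buffer.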


@ -55,7 +55,7 @@ describe('useAppLauncherStore', () => {
expect(mockWindowOpen).not.toHaveBeenCalled()
})
it('uses route-based app sessions on mobile instead of panel mode', () => {
it('uses the store-driven panel on mobile (no route change, no background swap)', () => {
Object.defineProperty(window, 'innerWidth', {
value: 390,
writable: true,
@ -65,8 +65,10 @@ describe('useAppLauncherStore', () => {
store.openSession('indeedhub')
expect(store.panelAppId).toBe(null)
expect(mockPush).toHaveBeenCalledWith({ name: 'app-session', params: { appId: 'indeedhub' }, query: { returnTo: '/dashboard/apps' } })
// Mobile now uses the store-driven panel, like desktop panel mode, so the
// underlying page/tab never changes and closing returns to the origin.
expect(store.panelAppId).toBe('indeedhub')
expect(mockPush).not.toHaveBeenCalled()
})
it('normalizes localhost launch URLs to current host before resolving', () => {
@ -117,7 +119,7 @@ describe('useAppLauncherStore', () => {
)
})
it('routes desktop new-tab apps into app session on mobile', () => {
it('opens tab-only apps directly on mobile (new tab in PWA, no interstitial)', () => {
Object.defineProperty(window, 'innerWidth', {
value: 390,
writable: true,
@ -127,10 +129,17 @@ describe('useAppLauncherStore', () => {
store.open({ url: 'http://192.168.1.228:8081', title: 'Nginx Proxy Manager' })
// Tab-only app on mobile-web: open directly in a new browser tab (the
// companion would use the in-app WebView). No session, no route push, no
// "this app opens in a tab" interstitial.
expect(store.isOpen).toBe(false)
expect(store.panelAppId).toBe(null)
expect(mockWindowOpen).not.toHaveBeenCalled()
expect(mockPush).toHaveBeenCalledWith({ name: 'app-session', params: { appId: 'nginx-proxy-manager' }, query: { returnTo: '/dashboard/apps' } })
expect(mockPush).not.toHaveBeenCalled()
expect(mockWindowOpen).toHaveBeenCalledWith(
'http://192.168.1.228:8081',
'_blank',
'noopener,noreferrer',
)
})
it('opens Nginx Proxy Manager in new tab using title hint when URL is path-only', () => {
@ -264,7 +273,7 @@ describe('useAppLauncherStore', () => {
)
})
it('routes prepackaged websites into app session on mobile', () => {
it('opens prepackaged websites in the store-driven panel on mobile', () => {
Object.defineProperty(window, 'innerWidth', {
value: 390,
writable: true,
@ -274,9 +283,12 @@ describe('useAppLauncherStore', () => {
store.open({ url: 'https://present.l484.com', title: 'Arch Presentation', openInNewTab: true })
// Iframeable prepackaged sites stay in-app via the store panel (no route
// change, no background swap), just like every other mobile launch.
expect(store.isOpen).toBe(false)
expect(store.panelAppId).toBe('arch-presentation')
expect(mockWindowOpen).not.toHaveBeenCalled()
expect(mockPush).toHaveBeenCalledWith({ name: 'app-session', params: { appId: 'arch-presentation' }, query: { returnTo: '/dashboard/apps' } })
expect(mockPush).not.toHaveBeenCalled()
})
it('routes HTTPS same-host apps via session view', () => {


@ -4,7 +4,7 @@ import { rpcClient } from '@/api/rpc-client'
import router from '@/router'
import { recordAppLaunch } from '@/utils/appUsage'
import { requestExternalOpen } from '@/api/remote-relay'
import { IS_DEMO, isDemoExternal, demoAppUrl } from '@/composables/useDemoIntro'
import { openInAppOrNewTab } from '@/utils/openExternal'
/**
* Open a URL in a new browser tab, but if a companion (phone) is currently
@ -223,20 +223,25 @@ export const useAppLauncherStore = defineStore('appLauncher', () => {
function openSession(appId: string) {
recordAppLaunch(appId)
const mobile = isMobileViewport()
// Demo: apps backed by a real external site that blocks iframing (mempool.space)
// open in a new tab; everything else demoable renders in the in-app session.
if (IS_DEMO && isDemoExternal(appId)) {
const ext = demoAppUrl(appId)
if (ext) { openExternal(ext); return }
}
const launchUrl = NEW_TAB_APP_IDS.has(appId) ? directAppUrl(appId) : null
if (launchUrl && !mobile) {
openExternal(launchUrl)
return
// Tab-only apps (set X-Frame-Options, can't be iframed). No interstitial:
// desktop opens a new browser tab; mobile opens the in-app WebView (Android
// companion) or a new browser tab (PWA) — see openInAppOrNewTab.
if (NEW_TAB_APP_IDS.has(appId)) {
const launchUrl = directAppUrl(appId)
if (launchUrl) {
if (mobile) openInAppOrNewTab(launchUrl)
else openExternal(launchUrl)
return
}
}
// Iframeable apps. Mobile and desktop-panel mode both use the store-driven
// panel so the underlying page/tab never changes (no background swap) and
// closing returns the user to wherever they launched from. Only desktop
// overlay/fullscreen modes use a routed session.
const mode = localStorage.getItem(DISPLAY_MODE_KEY) || 'panel'
if (mode === 'panel' && !mobile) {
if (mobile || mode === 'panel') {
panelAppId.value = appId
} else {
panelAppId.value = null
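
The launch decision that the rewritten `openSession` makes can be summarised as a small pure function. This is a sketch under assumptions: `resolveLaunchTarget`, its option names, and the `LaunchTarget` labels are hypothetical and only mirror the branch order shown in the diff.

```typescript
type LaunchTarget = 'new-tab' | 'in-app-or-new-tab' | 'panel' | 'routed-session'

interface LaunchOptions {
  tabOnly: boolean       // app id is in NEW_TAB_APP_IDS
  hasDirectUrl: boolean  // directAppUrl(appId) returned a URL
  mobile: boolean        // isMobileViewport()
  displayMode: string    // stored display mode, defaulting to 'panel'
}

function resolveLaunchTarget(o: LaunchOptions): LaunchTarget {
  // Tab-only apps never get an interstitial: open them directly.
  if (o.tabOnly && o.hasDirectUrl) return o.mobile ? 'in-app-or-new-tab' : 'new-tab'
  // Iframeable apps: mobile always uses the store-driven panel.
  if (o.mobile || o.displayMode === 'panel') return 'panel'
  // Desktop overlay/fullscreen modes keep the routed session.
  return 'routed-session'
}

console.log(resolveLaunchTarget({ tabOnly: false, hasDirectUrl: false, mobile: true, displayMode: 'overlay' })) // "panel"
```

A tab-only app without a resolvable direct URL falls through to the iframeable branch, which matches the nested `if (launchUrl)` in the diff.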


@ -164,6 +164,20 @@ select:focus-visible {
/* Mobile: override with tab bar clearance */
@media (max-width: 767px) {
/* Mobile web browsers report 100vh taller than the visible area (the dynamic
URL/toolbar chrome). The dashboard is the containing block for the fixed,
container-relative panes (the mesh chat/tools panes), so a 100vh-tall
container pushes their `bottom` offset below the visible viewport, so they
slide under the bottom tab bar (which is body-teleported and viewport-fixed,
so it stays put). Pin the dashboard to the *dynamic* viewport so the two
reference frames line up. No-op in the companion WebView (no browser chrome,
so dvh == vh), so its layout is unchanged. Doubled class beats Tailwind's
`.min-h-screen` (100vh) utility on specificity. */
.dashboard-view.dashboard-view {
height: 100dvh;
min-height: 100dvh;
}
.mobile-scroll-pad {
padding-bottom: calc(var(--mobile-tab-bar-height, 88px) + var(--safe-area-bottom, env(safe-area-inset-bottom, 0px)) + var(--audio-player-height, 0px) + 16px);
}


@ -11,15 +11,37 @@
*/
interface ArchipelagoNativeBridge {
openExternal?: (url: string) => void
openInApp?: (url: string) => void
}
function nativeBridge(): ArchipelagoNativeBridge | undefined {
return (window as unknown as { ArchipelagoNative?: ArchipelagoNativeBridge }).ArchipelagoNative
}
export function openExternalUrl(url: string): void {
if (!url) return
const native = (window as unknown as { ArchipelagoNative?: ArchipelagoNativeBridge })
.ArchipelagoNative
const native = nativeBridge()
if (native && typeof native.openExternal === 'function') {
native.openExternal(url)
return
}
window.open(url, '_blank', 'noopener,noreferrer')
}
/**
* Launch an app that can't be embedded in an iframe (X-Frame-Options) from a
* mobile surface with NO "this app opens in a tab" interstitial.
*
* - Android companion: hand it to the in-app WebView (`openInApp`) so it stays
* inside Archipelago with the native back/forward/reload/close controls.
* - Plain mobile browser (PWA): open directly in a new browser tab.
*/
export function openInAppOrNewTab(url: string): void {
if (!url) return
const native = nativeBridge()
if (native && typeof native.openInApp === 'function') {
native.openInApp(url)
return
}
window.open(url, '_blank', 'noopener,noreferrer')
}
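
The bridge-or-fallback pattern in `openInAppOrNewTab` can be illustrated without a browser by injecting both dependencies. This is a sketch under assumptions: `launchTabOnly`, its return labels, and the injected `openTab` callback are hypothetical and stand in for `window.ArchipelagoNative` and `window.open`.

```typescript
interface NativeBridge {
  openInApp?: (url: string) => void
}

type LaunchResult = 'ignored' | 'in-app' | 'new-tab'

// Prefer the native in-app WebView when the bridge offers it; otherwise
// fall back to opening a browser tab. Empty URLs are ignored.
function launchTabOnly(
  url: string,
  bridge: NativeBridge | undefined,
  openTab: (url: string) => void,
): LaunchResult {
  if (!url) return 'ignored'
  if (bridge && typeof bridge.openInApp === 'function') {
    bridge.openInApp(url)
    return 'in-app'
  }
  openTab(url)
  return 'new-tab'
}

const opened: string[] = []
const result = launchTabOnly('http://192.168.1.228:8081', undefined, (u) => opened.push(u))
console.log(result, opened)
```

Feature-detecting the method rather than the bridge object keeps older companion builds, which may expose only `openExternal`, on the new-tab path.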


@ -1,6 +1,6 @@
<template>
<div class="app-session-root">
<Teleport to="body" :disabled="isInlinePanel">
<Teleport to="body" :disabled="isInlinePanel && !isMobile">
<div
:class="backdropClasses"
@click.self="handleBackdropClick"
@ -27,6 +27,7 @@
:app-url="appUrl"
:app-id="appId"
:app-title="appTitle"
:app-icon="appIcon"
:loading="loading"
:iframe-blocked="iframeBlocked"
:must-open-new-tab="mustOpenNewTab"
@ -104,12 +105,11 @@ import {
type DisplayMode, DISPLAY_MODE_KEY, NEW_TAB_APPS, IFRAME_BLOCKED_APPS,
resolveAppUrl, resolveAppTitle,
} from './appSession/appSessionConfig'
import { launchBlockedReason } from './apps/appsConfig'
import { launchBlockedReason, resolveAppIcon } from './apps/appsConfig'
import { useAppIdentity } from './appSession/useAppIdentity'
import { useNostrBridge } from './appSession/useNostrBridge'
import { openExternalUrl } from '@/utils/openExternal'
import { openExternalUrl, openInAppOrNewTab } from '@/utils/openExternal'
import { useElectrsSync } from '@/composables/useElectrsSync'
import { IS_DEMO, isDemoExternal } from '@/composables/useDemoIntro'
const props = defineProps<{
appIdProp?: string
@ -155,14 +155,18 @@ const appId = computed(() => {
const appTitle = computed(() => resolveAppTitle(appId.value))
const packageEntry = computed(() => store.data?.['package-data']?.[appId.value] || null)
const appIcon = computed(() =>
packageEntry.value
? resolveAppIcon(appId.value, packageEntry.value)
: `/assets/img/app-icons/${appId.value}.png`
)
const blockedReason = computed(() => launchBlockedReason(appId.value, packageEntry.value))
const blockedTitle = computed(() => appId.value === 'fedimint' || appId.value === 'fedimintd' ? 'Waiting for Bitcoin sync' : 'App not ready')
const isMobile = typeof window !== 'undefined' && window.innerWidth < 768
// In the demo, apps backed by a real external site that blocks iframing
// (mempool.space) open in a new tab rather than the in-app session frame.
const mustOpenNewTab = computed(() =>
NEW_TAB_APPS.has(appId.value) || (IS_DEMO && isDemoExternal(appId.value))
)
// Reactive so the overlay/teleport/footer/animation decisions track the live
// viewport (and match the CSS `md` breakpoint) instead of a stale one-shot read.
const isMobile = ref(typeof window !== 'undefined' && window.innerWidth < 768)
function updateIsMobile() { isMobile.value = window.innerWidth < 768 }
const mustOpenNewTab = computed(() => NEW_TAB_APPS.has(appId.value))
// ElectrumX shows a sync screen before its real UI (the Electrum server only
// serves clients once its index is built). Poll /electrs-status while this is
@ -246,16 +250,18 @@ function setMode(mode: DisplayMode) {
}
}
// Reactive classes based on display mode
// Reactive classes based on display mode. On mobile the store-driven panel
// renders as a full-screen overlay (teleported to body) so it covers the nav
// and the underlying page never changes; desktop keeps the inline panel.
const backdropClasses = computed(() => {
if (isInlinePanel.value) return 'app-session-backdrop-inline'
if (isInlinePanel.value && !isMobile.value) return 'app-session-backdrop-inline'
return 'app-session-backdrop-overlay'
})
const panelClasses = computed(() => {
const base = 'app-session-panel glass-card'
if (isInlinePanel.value) return `${base} app-session-inline`
if (displayMode.value === 'fullscreen') return `${base} app-session-fullscreen`
if (isInlinePanel.value && !isMobile.value) return `${base} app-session-inline`
if (displayMode.value === 'fullscreen' && !isMobile.value) return `${base} app-session-fullscreen`
return `${base} app-session-overlay`
})
@ -375,10 +381,13 @@ watch(displayMode, (mode) => {
})
onMounted(() => {
// Apps that block iframes open externally on desktop. On mobile, keep the
// session surface visible so launcher taps do not bounce straight out.
if (mustOpenNewTab.value && appUrl.value && !isMobile) {
window.open(appUrl.value, '_blank', 'noopener,noreferrer')
// Apps that block iframes (X-Frame-Options) can't be shown in the session.
// Open them directly instead of showing a "this app opens in a tab"
// interstitial: desktop opens a new browser tab; mobile opens the in-app WebView
// (companion) or a new tab (PWA). Then dismiss the (empty) session surface.
if (mustOpenNewTab.value && appUrl.value) {
if (isMobile.value) openInAppOrNewTab(appUrl.value)
else window.open(appUrl.value, '_blank', 'noopener,noreferrer')
if (isInlinePanel.value) emit('close')
else closeRouteSession()
return
@ -386,8 +395,9 @@ onMounted(() => {
window.addEventListener('keydown', onKeyDown, true)
window.addEventListener('message', onMessage)
window.addEventListener('resize', updateIsMobile)
document.addEventListener('fullscreenchange', onFullscreenChange)
if (IFRAME_BLOCKED_APPS.has(appId.value) || (mustOpenNewTab.value && isMobile)) {
if (IFRAME_BLOCKED_APPS.has(appId.value)) {
loading.value = false
iframeBlocked.value = true
} else {
@ -409,6 +419,7 @@ onBeforeUnmount(() => {
if (iframeCheckId) clearTimeout(iframeCheckId)
window.removeEventListener('keydown', onKeyDown, true)
window.removeEventListener('message', onMessage)
window.removeEventListener('resize', updateIsMobile)
document.removeEventListener('fullscreenchange', onFullscreenChange)
screensaverStore.resume(screensaverReason.value)
if (document.fullscreenElement) document.exitFullscreen().catch(() => {})


@ -62,7 +62,6 @@ import { ref, computed, onMounted, onBeforeUnmount } from 'vue'
import { useRouter } from 'vue-router'
import { useI18n } from 'vue-i18n'
import { ContextBroker } from '@/services/contextBroker'
import { IS_DEMO } from '@/composables/useDemoIntro'
const { t } = useI18n()
@ -72,12 +71,9 @@ const aiuiConnected = ref(false)
let broker: ContextBroker | null = null
const aiuiUrl = computed(() => {
// Demo: ?mockArchy makes AIUI use its built-in mock node data (apps, system,
// network, wallet, bitcoin, files) and &seed pre-loads the example chats.
const demo = IS_DEMO ? '&mockArchy=1&seed=1' : ''
const envUrl = import.meta.env.VITE_AIUI_URL
if (envUrl) return `${envUrl}?embedded=true&hideClose=true${demo}`
if (import.meta.env.PROD || IS_DEMO) return `/aiui/?embedded=true&hideClose=true${demo}`
if (envUrl) return `${envUrl}?embedded=true&hideClose=true`
if (import.meta.env.PROD) return '/aiui/?embedded=true&hideClose=true'
return ''
})


@ -156,11 +156,6 @@
<!-- Normal Login Mode -->
<template v-else>
<!-- Demo credential hint -->
<div v-if="isDemo" class="mb-4 p-3 bg-orange-500/15 border border-orange-400/30 rounded-lg text-orange-100 text-sm text-center">
🎮 Demo mode Password: <span class="font-mono font-semibold">{{ DEMO_PASSWORD }}</span>
</div>
<div class="mb-6">
<label for="login-password" class="block text-sm font-medium text-white/80 mb-2">
{{ t('login.password') }}
@ -208,16 +203,14 @@
>
{{ t('login.replayIntro') }}
</button>
<template v-if="!isDemo">
<span class="text-white/30">|</span>
<button
@click="restartOnboarding"
:disabled="isResettingOnboarding"
class="text-xs text-white/50 hover:text-white/70 transition-colors underline-offset-2 hover:underline disabled:opacity-50 disabled:cursor-not-allowed"
>
{{ isResettingOnboarding ? t('login.resetting') : t('login.onboarding') }}
</button>
</template>
<span class="text-white/30">|</span>
<button
@click="restartOnboarding"
:disabled="isResettingOnboarding"
class="text-xs text-white/50 hover:text-white/70 transition-colors underline-offset-2 hover:underline disabled:opacity-50 disabled:cursor-not-allowed"
>
{{ isResettingOnboarding ? t('login.resetting') : t('login.onboarding') }}
</button>
</div>
</div>
</div>
@ -235,7 +228,6 @@ const { t } = useI18n()
import { useLoginTransitionStore } from '../stores/loginTransition'
import { rpcClient } from '../api/rpc-client'
import { resumeAudioContext, startSynthwave, stopSynthwave, playLoginSuccessWhoosh, playPop } from '@/composables/useLoginSounds'
import { IS_DEMO, DEMO_PASSWORD, clearDemoIntroSeen } from '@/composables/useDemoIntro'
const router = useRouter()
const currentRoute = useRoute()
@ -249,8 +241,7 @@ const loginRedirectTo = computed(() => {
const store = useAppStore()
const loginTransition = useLoginTransitionStore()
const isDemo = IS_DEMO
const password = ref(IS_DEMO ? DEMO_PASSWORD : '')
const password = ref('')
const confirmPassword = ref('')
const loading = ref(false)
const error = ref<string | null>(null)
@ -529,8 +520,6 @@ async function handleTotpVerify() {
function replayIntro() {
// Clear the intro seen flag
localStorage.removeItem('neode_intro_seen')
// Demo: also clear the per-day gate so the intro plays again now.
if (IS_DEMO) clearDemoIntroSeen()
// Navigate to root to trigger splash screen
window.location.href = '/'
}


@ -63,8 +63,8 @@
<button
v-else
@click="installApp"
:disabled="demoNoInstall || installing || (!installBlockedReason && !app.manifestUrl && !app.dockerImage)"
:title="demoNoInstall ? 'Not available in the demo' : (installBlockedReason || undefined)"
:disabled="installing || (!installBlockedReason && !app.manifestUrl && !app.dockerImage)"
:title="installBlockedReason || undefined"
class="glass-button glass-button-sm px-6 py-2.5 rounded-lg text-sm font-semibold flex items-center gap-2 disabled:opacity-50 disabled:cursor-not-allowed"
>
<svg v-if="installing" class="animate-spin h-4 w-4" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
@ -74,7 +74,7 @@
<svg v-else class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-4l-4 4m0 0l-4-4m4 4V4" />
</svg>
{{ demoNoInstall ? 'No demo' : installBlockedReason ? 'Bitcoin Pruned' : installing ? t('common.installing') : t('common.install') }}
{{ installBlockedReason ? 'Bitcoin Pruned' : installing ? t('common.installing') : t('common.install') }}
</button>
</div>
</div>
@ -129,8 +129,8 @@
<button
v-else
@click="installApp"
:disabled="demoNoInstall || installing || (!installBlockedReason && !app.manifestUrl && !app.dockerImage)"
:title="demoNoInstall ? 'Not available in the demo' : (installBlockedReason || undefined)"
:disabled="installing || (!installBlockedReason && !app.manifestUrl && !app.dockerImage)"
:title="installBlockedReason || undefined"
class="glass-button glass-button-sm px-4 py-2.5 rounded-lg text-sm font-semibold flex items-center justify-center gap-2 disabled:opacity-50 disabled:cursor-not-allowed col-span-2"
>
<svg v-if="installing" class="animate-spin h-4 w-4" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
@ -140,7 +140,7 @@
<svg v-else class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M4 16v1a3 3 0 003 3h10a3 3 0 003-3v-1m-4-4l-4 4m0 0l-4-4m4 4V4" />
</svg>
{{ demoNoInstall ? 'No demo' : installBlockedReason ? 'Bitcoin Pruned' : installing ? t('common.installing') : t('common.install') }}
{{ installBlockedReason ? 'Bitcoin Pruned' : installing ? t('common.installing') : t('common.install') }}
</button>
</div>
@ -351,7 +351,6 @@
<script setup lang="ts">
import { ref, computed, onMounted, onBeforeUnmount } from 'vue'
import { IS_DEMO, isDemoApp } from '@/composables/useDemoIntro'
import { useRouter, useRoute } from 'vue-router'
import { useI18n } from 'vue-i18n'
import { useAppStore } from '../stores/app'
@ -487,9 +486,6 @@ const installBlockedReason = computed(() => {
return electrumxArchiveWarning
})
// Demo: only demoable apps can be installed; the rest show "No demo".
const demoNoInstall = computed(() => IS_DEMO && !!app.value?.id && !isDemoApp(app.value.id))
let pendingRedirect: ReturnType<typeof setTimeout> | null = null
onMounted(() => {


@ -22,30 +22,27 @@
@click="goToOptions"
class="glass-button px-6 py-3 sm:px-8 sm:py-4 rounded-lg text-base sm:text-lg font-medium transition-all hover:bg-black/70 hover:border-white/30 onb-cta"
>
{{ isDemo ? 'Enter the demo →' : 'Unlock your sovereignty →' }}
Unlock your sovereignty
</button>
<!-- Onboarding wizard entry points are hidden in the demo (no seed/identity setup) -->
<template v-if="!isDemo">
<a
tabindex="0"
role="button"
class="text-white/50 hover:text-white/80 underline text-sm cursor-pointer mt-4 block text-center onb-cta"
@click="goToRestore"
@keydown.enter="goToRestore"
>
Restore from seed phrase
</a>
<a
tabindex="0"
role="button"
class="text-white/50 hover:text-white/80 underline text-sm cursor-pointer mt-2 block text-center onb-cta"
@click="goToLogin"
@keydown.enter="goToLogin"
>
Already set up? Log in
</a>
</template>
<a
tabindex="0"
role="button"
class="text-white/50 hover:text-white/80 underline text-sm cursor-pointer mt-4 block text-center onb-cta"
@click="goToRestore"
@keydown.enter="goToRestore"
>
Restore from seed phrase
</a>
<a
tabindex="0"
role="button"
class="text-white/50 hover:text-white/80 underline text-sm cursor-pointer mt-2 block text-center onb-cta"
@click="goToLogin"
@keydown.enter="goToLogin"
>
Already set up? Log in
</a>
</div>
</div>
</div>
@ -56,16 +53,11 @@ import { ref, onMounted } from 'vue'
import { useRouter } from 'vue-router'
import AnimatedLogo from '@/components/AnimatedLogo.vue'
import { playNavSound } from '@/composables/useNavSounds'
import { IS_DEMO, markDemoIntroSeen } from '@/composables/useDemoIntro'
const router = useRouter()
const ctaButton = ref<HTMLButtonElement | null>(null)
const isDemo = IS_DEMO
onMounted(() => {
// Demo: once the visitor has seen the intro today, don't auto-replay it again
// until tomorrow (they can still use "Replay Intro" on the login screen).
if (IS_DEMO) markDemoIntroSeen()
// Auto-focus after entry animation completes (1.4s animation delay + 0.6s duration)
setTimeout(() => {
ctaButton.value?.focus({ preventScroll: true })
@ -74,13 +66,6 @@ onMounted(() => {
function goToOptions() {
playNavSound('action')
// Demo: skip the onboarding wizard (seed/identity setup) entirely and go straight
// to login, which is prefilled with the demo password.
if (isDemo) {
localStorage.setItem('neode_onboarding_complete', '1')
router.push('/login').catch(() => {})
return
}
router.push('/onboarding/path').catch(() => {})
}


@ -1304,7 +1304,7 @@ async function payWithLightning() {
function scheduleInvoicePoll() {
if (invoicePollTimer) clearTimeout(invoicePollTimer)
invoicePollTimer = setTimeout(pollInvoice, 1000)
invoicePollTimer = setTimeout(pollInvoice, 3000)
}
async function pollInvoice() {


@ -16,22 +16,11 @@
import { ref, onMounted } from 'vue'
import { useRouter } from 'vue-router'
import { isOnboardingComplete } from '@/composables/useOnboarding'
import { IS_DEMO, demoIntroSeenToday } from '@/composables/useDemoIntro'
import BootScreen from '@/components/BootScreen.vue'
const router = useRouter()
const showBootScreen = ref(false)
/**
* Public demo: replay the intro on every visit, but at most once per calendar
* day per browser. If already seen today, go straight to login; otherwise show the intro.
*/
function demoRoute() {
const dest = demoIntroSeenToday() ? '/login' : '/onboarding/intro'
log('demoRoute', { dest })
router.replace(dest).catch(() => {})
}
function log(msg: string, data?: unknown) {
const ts = new Date().toISOString()
const entry = `[RootRedirect ${ts}] ${msg}` + (data !== undefined ? ` ${JSON.stringify(data)}` : '')
@ -79,10 +68,6 @@ async function checkOnboarded(): Promise<boolean> {
}
async function proceedToApp() {
if (IS_DEMO) {
demoRoute()
return
}
const devMode = import.meta.env.VITE_DEV_MODE
if (devMode === 'setup' || devMode === 'existing') {
log('proceedToApp devMode', { devMode })
@ -136,11 +121,6 @@ onMounted(async () => {
log('production flow', { isUp })
if (isUp) {
// Demo: per-day intro gate instead of server-side onboarding state.
if (IS_DEMO) {
demoRoute()
return
}
const onboarded = await checkOnboarded()
if (onboarded) {
log('server up + onboarded → proceedToApp')


@ -3,8 +3,8 @@ import { beforeEach, describe, expect, it, vi } from 'vitest'
import AppSession from '../AppSession.vue'
const { mockReplace, mockPush, mockWindowOpen, mockSuppress, mockResume } = vi.hoisted(() => ({
mockReplace: vi.fn(),
mockPush: vi.fn(),
mockReplace: vi.fn(() => Promise.resolve()),
mockPush: vi.fn(() => Promise.resolve()),
mockWindowOpen: vi.fn(),
mockSuppress: vi.fn(),
mockResume: vi.fn(),
@ -62,7 +62,7 @@ describe('AppSession mobile new-tab apps', () => {
})
})
it('keeps iframe-blocked apps inside the mobile session instead of auto-opening a tab', async () => {
it('opens tab-only apps directly on mobile instead of showing an interstitial', async () => {
const wrapper = mount(AppSession, {
global: {
stubs: {
@ -75,9 +75,11 @@ describe('AppSession mobile new-tab apps', () => {
})
await flushPromises()
expect(mockWindowOpen).not.toHaveBeenCalled()
expect(mockReplace).not.toHaveBeenCalled()
expect(wrapper.text()).toContain('This app opens in a new tab')
expect(wrapper.text()).toContain('Open in new tab')
// Tab-only app (gitea) on mobile-web: open directly in a new browser tab
// (no native bridge in the test) and dismiss the empty session — no
// "this app opens in a tab" interstitial.
expect(mockWindowOpen).toHaveBeenCalled()
expect(mockReplace).toHaveBeenCalled()
expect(wrapper.text()).not.toContain('This app opens in a new tab')
})
})


@ -1,12 +1,7 @@
<template>
<div class="relative flex-1 min-h-0 bg-black/40 overflow-hidden app-session-frame-safe">
<Transition name="content-fade">
<div v-if="loading" class="absolute inset-0 z-10 flex items-center justify-center bg-black/40">
<svg class="animate-spin h-8 w-8 text-blue-400" viewBox="0 0 24 24" fill="none">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4" />
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z" />
</svg>
</div>
<AppLoadingScreen v-if="loading" :icon="appIcon" :title="appTitle" :progress="loadProgress" />
</Transition>
<!-- ElectrumX sync screen shown before the real UI while the on-chain
@ -116,13 +111,15 @@
</template>
<script setup lang="ts">
import { nextTick, ref, watch } from 'vue'
import { nextTick, onBeforeUnmount, ref, watch } from 'vue'
import type { ElectrsSyncStatus } from '@/composables/useElectrsSync'
import AppLoadingScreen from '@/components/AppLoadingScreen.vue'
const props = defineProps<{
appUrl: string
appId: string
appTitle: string
appIcon: string
loading: boolean
iframeBlocked: boolean
mustOpenNewTab: boolean
@ -144,6 +141,40 @@ const emit = defineEmits<{
const iframeRef = ref<HTMLIFrameElement | null>(null)
// Faux load progress for the loading screen. Cross-origin iframes give no real
// progress events, so ease toward ~92% while loading and snap to 100% on load;
// far better UX than a black screen with a bare spinner.
const loadProgress = ref(0)
let progressTimer: ReturnType<typeof setInterval> | null = null
function stopProgress() {
if (progressTimer) { clearInterval(progressTimer); progressTimer = null }
}
function startProgress() {
stopProgress()
loadProgress.value = 8
progressTimer = setInterval(() => {
// Decelerate as it approaches the cap so it never visually "finishes" early.
const remaining = 92 - loadProgress.value
loadProgress.value += Math.max(0.4, remaining * 0.08)
if (loadProgress.value >= 92) { loadProgress.value = 92; stopProgress() }
}, 180)
}
watch(() => props.loading, (isLoading) => {
if (isLoading) {
startProgress()
} else {
stopProgress()
loadProgress.value = 100
}
}, { immediate: true })
watch(() => props.refreshKey, () => { if (props.loading) startProgress() })
onBeforeUnmount(stopProgress)
function focusIframe() {
iframeRef.value?.focus({ preventScroll: true })
}
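
The easing used by the faux loading bar above can be looked at in isolation. The sketch below is a framework-free restatement of the interval body, not the component's actual code: `nextProgress` and `simulate` are hypothetical helper names introduced here, while the cap of 92, the 0.4 minimum step, and the 0.08 factor are taken from the hunk.

```typescript
// Hypothetical pure helpers mirroring the setInterval body in the hunk above.
// CAP, the 0.4 floor, and the 0.08 factor come from the diff; the function
// names are illustrative only.
const CAP = 92

// One tick: move 8% of the remaining distance, but never less than 0.4,
// and clamp at the cap so the bar never looks finished before the load event.
function nextProgress(current: number): number {
  const remaining = CAP - current
  const next = current + Math.max(0.4, remaining * 0.08)
  return next >= CAP ? CAP : next
}

// Run a fixed number of ticks and return every intermediate value.
function simulate(start: number, ticks: number): number[] {
  const values = [start]
  let value = start
  for (let i = 0; i < ticks; i++) {
    value = nextProgress(value)
    values.push(value)
  }
  return values
}
```

Starting from 8, as `startProgress()` does, the steps shrink geometrically until the 0.4 floor takes over, so the bar decelerates but still reaches the cap in a bounded number of ticks.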


@ -1,7 +1,6 @@
/** Static configuration maps for app session routing and display */
import { GENERATED_APP_PORTS, GENERATED_APP_TITLES, GENERATED_NEW_TAB_APPS } from './generatedAppSessionConfig'
import { IS_DEMO, demoAppUrl } from '@/composables/useDemoIntro'
export type DisplayMode = 'panel' | 'overlay' | 'fullscreen'
@ -77,15 +76,6 @@ export const IFRAME_BLOCKED_APPS = new Set<string>([])
/** Resolve app URL using direct port mapping (source of truth) */
export function resolveAppUrl(id: string, routeQueryPath?: string, runtimeUrl?: string): string {
// Demo: route to the app's mock UI or real external site (mempool.space,
// indee.tx1138.com). Carry through a deep-link path (e.g. /tx/<hash> for
// mempool). Non-demoable apps fall through to a generic notice page.
if (IS_DEMO) {
const base = demoAppUrl(id)
if (base) return routeQueryPath ? base + routeQueryPath : base
return `/app/${id}/`
}
// External HTTPS apps
const ext = EXTERNAL_URLS[id]
if (ext) return ext


@ -102,17 +102,23 @@
</div>
</div>
<!-- Uninstalling progress: live stage label from backend -->
<!-- Uninstalling progress: truthful stage-driven bar (mirrors install) -->
<div v-else-if="isUninstalling" class="mt-4">
<div class="flex items-center gap-1.5">
<svg class="animate-spin h-3 w-3 text-red-400" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
<span class="text-xs text-red-300 truncate">{{ uninstallStageLabel }}</span>
<div class="flex items-center justify-between mb-1.5">
<span class="text-xs text-white/70 flex items-center gap-1.5">
<svg class="animate-spin h-3 w-3" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
{{ uninstallStageLabel }}
</span>
<span v-if="uninstallProgress !== null" class="text-xs text-white/50">{{ uninstallProgress }}%</span>
</div>
<div class="mt-1.5 w-full h-1.5 bg-white/10 rounded-full overflow-hidden">
<div class="h-full bg-red-400/60 rounded-full animate-pulse w-full"></div>
<div class="w-full h-1.5 bg-white/10 rounded-full overflow-hidden">
<div
class="install-progress-fill h-full bg-white/60 rounded-full transition-all duration-500"
:style="{ width: `${Math.max(uninstallProgress ?? 8, 4)}%` }"
></div>
</div>
</div>
@ -282,6 +288,29 @@ const uninstallStageLabel = computed(() => {
return raw ? raw : `${t('common.uninstalling')}`
})
// Map the backend's uninstall-stage label to a truthful percentage so the bar
// progresses through the teardown instead of sitting at a solid full(-red)
// block. Backend stages (set_uninstall_stage):
// "Stopping containers (X/N)" -> 10 to 50% (linear over the stack)
// "Cleaning up volumes" -> 70%
// "Removing app data" -> 90%
// Unknown/between pushes -> null: the bar parks low and the shimmer overlay
// (install-progress-fill) carries the motion, exactly like a fixed install phase.
const uninstallProgress = computed<number | null>(() => {
const raw = props.pkg['uninstall-stage'] || ''
const m = raw.match(/\((\d+)\s*\/\s*(\d+)\)/)
if (m) {
const done = Number(m[1])
const total = Number(m[2])
if (total > 0) {
return Math.round(10 + Math.min(done / total, 1) * 40)
}
}
if (/volume/i.test(raw)) return 70
if (/data/i.test(raw)) return 90
return null
})
const isTransitioning = computed(() => {
const s = props.pkg.state
const h = props.pkg.health
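
The stage-to-percentage mapping in the hunk above can be exercised outside the component. This is a minimal sketch, assuming the backend labels look like the ones quoted in the comment; `uninstallPercent` and `barWidth` are hypothetical names, and the body restates the computed property without Vue.

```typescript
// Hypothetical standalone version of the uninstallProgress computed property.
// The regex and the 10/40/70/90 constants are taken from the diff.
function uninstallPercent(stage: string): number | null {
  // "Stopping containers (X/N)": linear from 10% to 50% over the stack.
  const m = stage.match(/\((\d+)\s*\/\s*(\d+)\)/)
  if (m) {
    const done = Number(m[1])
    const total = Number(m[2])
    if (total > 0) {
      return Math.round(10 + Math.min(done / total, 1) * 40)
    }
  }
  if (/volume/i.test(stage)) return 70
  if (/data/i.test(stage)) return 90
  // Unknown or in-between label: the caller parks the bar low.
  return null
}

// Width used by the template: fall back to 8% when unknown, never below 4%.
function barWidth(stage: string): string {
  return `${Math.max(uninstallPercent(stage) ?? 8, 4)}%`
}
```

A label such as `"Stopping containers (1/4)"` maps to 20, and an unrecognised label yields `null`, which the template renders as an 8% bar.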


@ -239,6 +239,16 @@ const APP_ICON_FALLBACKS: Record<string, string> = {
'archy-bitcoin-ui': '/assets/img/app-icons/bitcoin-knots.webp',
'archy-lnd-ui': '/assets/img/app-icons/lnd.svg',
'archy-electrs-ui': '/assets/img/app-icons/electrumx.png',
// ElectrumX ships under a few historical ids (the backend was renamed
// electrs → electrumx). Without an explicit map, an `electrs`-keyed install
// falls through to the default `/assets/img/app-icons/electrs.png`, which
// doesn't exist → handleImageError swaps .png→.svg and lands on electrs.svg
// (the "Electrs in Rust" logo) instead of the real ElectrumX icon. Pin the
// whole family to the ElectrumX icon so My Apps shows the right logo no
// matter which id the node has it installed under.
'electrs': '/assets/img/app-icons/electrumx.png',
'electrs-ui': '/assets/img/app-icons/electrumx.png',
'electrumx': '/assets/img/app-icons/electrumx.png',
}
// Parent-app icon by prefix, for stack members not listed explicitly above


@ -1,9 +1,12 @@
<template>
<Teleport to="body">
<!-- Offline Banner -->
<!-- Lifecycle / Offline Banner.
Server restart/shutdown is deliberate, so it is shown immediately. A plain
connection blip is debounced (showConnIssue) so transient sub-grace
reconnects don't flash. -->
<Transition name="conn-banner">
<div
v-if="isOffline && !store.isReconnecting && store.isAuthenticated"
v-if="(showLifecycle || showConnectionLost)"
class="conn-banner-overlay"
>
<div class="path-option-card px-6 py-3 border-l-4 border-yellow-500 inline-flex items-center gap-2 text-yellow-200 shadow-2xl">
@ -17,10 +20,10 @@
</div>
</Transition>
<!-- Reconnecting Banner -->
<!-- Reconnecting Banner (debounced) -->
<Transition name="conn-banner">
<div
v-if="store.isReconnecting && store.isAuthenticated"
v-if="showReconnecting"
class="conn-banner-overlay"
>
<div class="path-option-card px-6 py-3 border-l-4 border-blue-500 inline-flex items-center gap-2 text-blue-200 shadow-2xl">
@ -35,7 +38,7 @@
</template>
<script setup lang="ts">
import { computed } from 'vue'
import { computed, ref, watch, onUnmounted } from 'vue'
import { useAppStore } from '@/stores/app'
const store = useAppStore()
@ -43,6 +46,58 @@ const store = useAppStore()
const isOffline = computed(() => store.isOffline)
const isRestarting = computed(() => store.isRestarting)
const isShuttingDown = computed(() => store.isShuttingDown)
// A deliberate server lifecycle transition (restart/shutdown) is real and
// user-initiated: surface it immediately, no debounce.
const isLifecycleTransition = computed(() => isRestarting.value || isShuttingDown.value)
const showLifecycle = computed(() => isLifecycleTransition.value && store.isAuthenticated)
// A plain connection blip (offline or reconnecting, not a lifecycle transition).
// The overwhelming majority recover within a second or two (load spikes,
// Tailscale/relay TCP resets), so showing the banner instantly makes a healthy
// node read as unstable. Debounce: only surface after the issue persists past a
// grace window; hide immediately on recovery.
const hasConnIssue = computed(
() => (store.isReconnecting || isOffline.value) && !isLifecycleTransition.value
)
const SHOW_DELAY_MS = 2500
const showConnIssue = ref(false)
let pendingTimer: ReturnType<typeof setTimeout> | null = null
function clearTimer() {
if (pendingTimer) {
clearTimeout(pendingTimer)
pendingTimer = null
}
}
watch(
hasConnIssue,
(issue) => {
clearTimer()
if (issue) {
pendingTimer = setTimeout(() => {
showConnIssue.value = true
pendingTimer = null
}, SHOW_DELAY_MS)
} else {
// Recovered before the grace window elapsed: hide at once.
showConnIssue.value = false
}
},
{ immediate: true }
)
onUnmounted(clearTimer)
// Debounced visual states the template renders.
const showReconnecting = computed(
() => showConnIssue.value && store.isReconnecting && store.isAuthenticated
)
const showConnectionLost = computed(
() => showConnIssue.value && isOffline.value && !store.isReconnecting && store.isAuthenticated
)
</script>
<style scoped>
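
The show-after-delay, hide-at-once behaviour of the banner watcher above can be sketched without Vue. This is an illustrative reduction, not the component's code: `createDebouncedFlag`, `Schedule`, and `realSchedule` are hypothetical names, and the timer is injected so the logic can be driven by a fake scheduler in a test.

```typescript
// Hypothetical framework-free version of the hasConnIssue watcher.
type Cancel = () => void
type Schedule = (fn: () => void, ms: number) => Cancel

function createDebouncedFlag(delayMs: number, schedule: Schedule) {
  let visible = false
  let cancel: Cancel | null = null
  return {
    // True only after an issue has persisted for delayMs.
    get visible(): boolean {
      return visible
    },
    // Call whenever the underlying "has issue" state changes.
    update(issue: boolean): void {
      // Always drop any pending timer first, as clearTimer() does in the diff.
      if (cancel) {
        cancel()
        cancel = null
      }
      if (issue) {
        cancel = schedule(() => {
          visible = true
          cancel = null
        }, delayMs)
      } else {
        // Recovery hides immediately, with no delay.
        visible = false
      }
    },
  }
}

// Scheduler backed by real timers, for use outside tests.
const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms)
  return () => clearTimeout(id)
}
```

With `createDebouncedFlag(2500, realSchedule)`, a blip that recovers inside the grace window never flips `visible`, which is the point of the `SHOW_DELAY_MS` debounce.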


@ -143,9 +143,10 @@ const mobileTabBar = ref<HTMLElement | null>(null)
const MOBILE_LAYOUT_MAX_WIDTH = 920
const viewportWidth = ref(typeof window === 'undefined' ? 1024 : window.innerWidth)
// App sessions own their mobile controls. Normal mobile launches use the route
// session; keeping this guard also protects any desktop-panel state on resize.
const isAppSessionActive = computed(() => route.name === 'app-session')
// App sessions own their mobile controls, so the nav hides while one is open.
// Mobile launches now use the store-driven panel (no route change) to keep the
// background tab intact, so treat an active panel the same as a routed session.
const isAppSessionActive = computed(() => route.name === 'app-session' || !!appLauncher.panelAppId)
// Show persistent tabs for Apps/Marketplace on mobile
const showAppsTabs = computed(() => {


@ -102,9 +102,9 @@
@click.stop="$emit('launch', app)"
class="px-4 py-2 glass-button glass-button-sm rounded-lg text-sm font-medium"
>Launch</button>
<!-- Scanning (skipped in demo: there are no real containers to scan) -->
<!-- Scanning -->
<span
v-else-if="!IS_DEMO && !containersScanned && (app.source === 'local' || app.dockerImage)"
v-else-if="!containersScanned && (app.source === 'local' || app.dockerImage)"
class="flex-1 px-4 py-2 rounded-lg text-white/50 text-sm font-medium text-center cursor-default relative overflow-hidden"
>
<span class="discover-shimmer-bg"></span>
@ -116,12 +116,6 @@
Checking...
</span>
</span>
<!-- Demo: app not demoable -->
<button
v-else-if="IS_DEMO && !isInstalled(app.id) && !isDemoApp(app.id)"
disabled
class="flex-1 px-4 py-2 bg-white/10 rounded-lg text-white/40 text-sm font-medium cursor-not-allowed"
>No demo</button>
<!-- Install button -->
<button
v-else-if="!isInstalled(app.id) && (app.source === 'local' || app.dockerImage)"
@ -164,7 +158,6 @@
<script setup lang="ts">
import type { MarketplaceApp } from './types'
import { handleImageError } from '@/views/apps/appsConfig'
import { IS_DEMO, isDemoApp } from '@/composables/useDemoIntro'
defineProps<{
filteredApps: MarketplaceApp[]


@ -64,7 +64,7 @@
Starting...
</span>
<button
v-else-if="!IS_DEMO && !containersScanned && app.dockerImage"
v-else-if="!containersScanned && app.dockerImage"
disabled
class="text-white/40 text-sm flex items-center gap-2"
>
@ -74,11 +74,6 @@
</svg>
Checking...
</button>
<button
v-else-if="IS_DEMO && !isInstalled(app.id) && !isDemoApp(app.id)"
disabled
class="glass-button glass-button-sm rounded-lg text-sm font-medium opacity-50 cursor-not-allowed"
>No demo</button>
<button
v-else-if="!isInstalled(app.id) && app.dockerImage"
data-controller-install-btn
@ -104,7 +99,6 @@
<script setup lang="ts">
import type { FeaturedApp, MarketplaceApp } from './types'
import { handleImageError } from '@/views/apps/appsConfig'
import { IS_DEMO, isDemoApp } from '@/composables/useDemoIntro'
defineProps<{
featuredApps: FeaturedApp[]


@ -85,7 +85,7 @@ export function getCuratedAppList(): MarketplaceApp[] {
{ id: 'grafana', title: 'Grafana', version: '10.2.0', description: 'Analytics and monitoring platform. Dashboards for your node metrics and system health.', icon: '/assets/img/app-icons/grafana.png', author: 'Grafana Labs', dockerImage: `${R}/grafana:10.2.0`, repoUrl: 'https://github.com/grafana/grafana' },
{ id: 'searxng', title: 'SearXNG', version: '2024.1.0', description: 'Privacy-respecting metasearch engine. Search the internet without being tracked or profiled.', icon: '/assets/img/app-icons/searxng.png', author: 'SearXNG', dockerImage: `${R}/searxng:latest`, repoUrl: 'https://github.com/searxng/searxng' },
{ id: 'ollama', title: 'Ollama', version: '0.5.4', description: 'Run AI models locally. Llama, Mistral, and more — on your hardware, completely private.', icon: '/assets/img/app-icons/ollama.png', author: 'Ollama', dockerImage: `${R}/ollama:latest`, repoUrl: 'https://github.com/ollama/ollama' },
{ id: 'cryptpad', title: 'CryptPad', version: '2024.12.0', description: 'End-to-end encrypted documents, spreadsheets, and presentations. Zero-knowledge collaboration.', icon: '/assets/img/app-icons/cryptpad.webp', author: 'XWiki SAS', dockerImage: `${R}/cryptpad:2024.12.0`, repoUrl: 'https://github.com/cryptpad/cryptpad' },
{ id: 'cryptpad', title: 'CryptPad', version: '2024.12.0', description: 'End-to-end encrypted documents, spreadsheets, and presentations. Zero-knowledge collaboration.', icon: '/assets/icon/favico-black-v2.svg', author: 'XWiki SAS', dockerImage: `${R}/cryptpad:2024.12.0`, repoUrl: 'https://github.com/cryptpad/cryptpad' },
{ id: 'nextcloud', title: 'Nextcloud', version: '29', description: 'Your own private cloud. File sync, calendars, contacts — all on your hardware.', icon: '/assets/img/app-icons/nextcloud.webp', author: 'Nextcloud', dockerImage: `${R}/nextcloud:29`, repoUrl: 'https://github.com/nextcloud/server' },
{ id: 'vaultwarden', title: 'Vaultwarden', version: '1.30.0', description: 'Self-hosted password vault. Bitwarden-compatible with zero-knowledge encryption.', icon: '/assets/img/app-icons/vaultwarden.webp', author: 'Vaultwarden', dockerImage: `${R}/vaultwarden:1.30.0-alpine`, repoUrl: 'https://github.com/dani-garcia/vaultwarden' },
{ id: 'jellyfin', title: 'Jellyfin', version: '10.8.13', description: 'Free media server. Stream your movies, music, and photos to any device.', icon: '/assets/img/app-icons/jellyfin.webp', author: 'Jellyfin', dockerImage: `${R}/jellyfin:10.8.13`, repoUrl: 'https://github.com/jellyfin/jellyfin' },


@ -234,7 +234,7 @@ export function getCuratedAppList(): MarketplaceApp[] {
title: 'CryptPad',
version: '2024.12.0',
description: 'End-to-end encrypted documents, spreadsheets, and presentations. Zero-knowledge collaboration.',
icon: '/assets/img/app-icons/cryptpad.webp',
icon: '/assets/icon/favico-black-v2.svg',
author: 'XWiki SAS',
dockerImage: `${REGISTRY}/cryptpad:2024.12.0`,
manifestUrl: undefined,


@ -151,16 +151,6 @@ export default defineConfig({
changeOrigin: true,
secure: false,
},
// Demo mock app UIs (electrumx, lnd, fedimint) + generic notice page.
'/app/electrumx': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/app/electrs': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/app/lnd': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/app/fedimint': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/app/bitcoin-core': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/app/bitcoin-knots': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/electrs-status': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/proxy': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
'/lnd-connect-info': { target: process.env.BACKEND_URL || 'http://localhost:5959', changeOrigin: true, secure: false },
// Serve the node's deployed AIUI same-origin like production (set VITE_AIUI_URL=/aiui/)
'/aiui': {
target: process.env.AIUI_PROXY_TARGET || 'http://127.0.0.1:80',

4178
releases/app-catalog.json Normal file

File diff suppressed because it is too large


@ -80,7 +80,7 @@ fi
# runs the release gate harness (cargo fmt/check, catalog drift, vitest, and
# the focused cargo suites — incl. the receive/port-drift/secret regressions).
# Skipped on --dry-run, or set SKIP_RELEASE_TESTS=1 to bypass in an emergency.
# The lifecycle bats harness (tests/lifecycle/run-20x.sh) still runs separately
# The lifecycle bats harness (tests/lifecycle/run-gate.sh) still runs separately
# against live nodes — see tests/lifecycle/TESTING.md.
if ! $DRY_RUN; then
if [ "${SKIP_RELEASE_TESTS:-0}" = "1" ]; then

Some files were not shown because too many files have changed in this diff