Fixes from real fresh-install feedback (Framework node .81) + its log bundle:
Backend:
- websocket: subscribe before initial snapshot — broadcasts in the gap were
silently lost, stranding clients on stale state until a hard refresh
(the "everything needs ctrl-r" bug: My Apps stuck Loading, App Store
stuck Checking, containers-scanned never arriving)
- crash recovery: check the crash marker BEFORE writing our own PID —
recovery had never run on any node (always saw its own PID and skipped);
PID-reuse guard via /proc cmdline
- boot status: pending-boot-starts registry (recovery, stack recovery,
reconciler, adoption) — scanner overlays queued-but-down apps as
Restarting instead of Stopped after a reboot; scanner-authored
Restarting resolves immediately on a settled scan (no transitional wedge)
- install deps: bounded wait (36x5s) when a dependency is installed but
still starting ("Waiting for Bitcoin to start…") instead of instant
rejection; dependency-gate rejections remove the optimistic entry (no
phantom Stopped tile) and surface as a notification
- seed backup: auth.setup persists the onboarding mnemonic as the
encrypted seed backup (reveal previously failed on EVERY node — nothing
ever wrote master_seed.enc); seed.restore stashes too; error sanitizer
lets seed/2FA errors through instead of "Check server logs"
- lnd: bitcoind.rpchost resolved from the running Bitcoin variant
(hardcoded bitcoin-knots broke Core nodes); manifest uses derived_env
- bitcoin status: clean human message for connection-reset/startup; raw
URLs + os-error chains no longer reach the app card
- fedimint-clientd: chown /var/lib/archipelago/fmcd to 1000:1000 (root-
created dir crash-looped the rootless container, EACCES) — first-boot
script + pre-start self-heal
- log volume (>1GB/day on a day-old node): journald caps drop-in (ISO +
bootstrap self-heal), bitcoind -printtoconsole=0 everywhere (90% of the
journal was IBD UpdateTip spam), tracing default debug→info
Frontend:
- Login: Enter advances to confirm field then submits; submit always
clickable with inline errors (was silently disabled on mismatch);
Restart Onboarding needs a confirming second click (the mismatch →
"onboarding restarted" trap)
- sync store: 30s state reconciliation + refetch on re-entrant connect;
20s containers-scanned escape hatch so Checking can never show forever;
fresh empty node reaches the real "no apps yet" state
- intro video: CRF20 re-encode (SSIM 0.988) + faststart — moov was at EOF
so playback needed the full 15MB first (the intro lag)
- backgrounds: 10 heaviest JPEGs → WebP q90 (9.4MB→6.6MB); 7 stayed JPEG
(WebP larger on noisy sources)
- Web5ConnectedNodes: drop unused template ref that failed vue-tsc -b
ISO/kiosk:
- nginx: /assets/ 404s no longer cached immutable for a year; HTTPS block
gained the missing /assets/ location (served index.html as images)
- kiosk: launcher/service spliced from configs/ at ISO build (stale
heredoc force-disabled GPU); MemoryHigh/Max 1200/1500→2200/2800M (kiosk
rode the reclaim throttle = the lag); firmware-intel-graphics +
firmware-amd-graphics (trixie split DMC blobs out of misc-nonfree)
Verified: cargo test 898/898 green, npm run build green with dist
contents confirmed (webp refs, lnd.png, faststart video, new strings).
Handover for ISO build + deploy: docs/HANDOVER-2026-07-02-iso-feedback.md
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Peers that opt in via a new "Share Location" toggle in Settings
(server.set-location RPC) get plotted on other trusted peers' Mesh Map
with a distinct Archy-logo marker, separate from raw LoRa radio peers.
Location is persisted locally, carried in NodeStateSnapshot, and
propagated through federation sync/delta like other node state.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
- reticulum.rs: send_text_msg was lossy-UTF8-mangling binary CBOR control
envelopes (ReadReceipt etc.) before sending as LXMF text; base64-encode
with a marker instead, decoded losslessly on receive.
- typed_messages.rs: mesh.send-read-receipt fired automatically on every
chat view with no is_archy_peer gate, so viewing a message from a stock
(non-archy) LXMF peer auto-sent it an undecodable control envelope,
surfacing as garbage text right after whatever it just sent. Now a no-op
for non-archy peers.
- mesh/listener/mod.rs: RX_STALL_TIMEOUT was 300s and forced a full
auto-detect reconnect on any otherwise-healthy but quiet mesh link
(visible as "Connecting..." flapping); this also wiped Reticulum's
in-memory peer-address table every cycle, breaking messaging with peers
who hadn't re-announced in the window. Bumped to 1800s.
- reticulum.rs: persist the peer prefix/dest-hash/display-name table to
disk so a restart doesn't force every peer back to "Anonymous Peer"
until they re-announce.
- decode.rs/frames.rs: Meshcore was discarding the SNR its wire format
carries; wire it onto the peer record. Mesh.vue's signalBars() now falls
back to SNR-based bars when RSSI is unavailable (always true for
Meshcore); Reticulum has neither and correctly stays at 0/"no data".
- system/handlers.rs, dispatcher.rs: new system.get-hostname RPC + cert
regeneration (with a proper SAN) whenever server.set-name changes the
hostname, so HTTPS doesn't add a mismatch warning on top of the
self-signed one after a rename.
- AccountInfoSection.vue: surface the mDNS hostname + http/https links in
Settings (HTTPS needed for mic/camera secure-context features) — never
forced, both keep working.
- build-auto-installer-iso.sh: ship avahi-daemon so .local names actually
resolve on the LAN, and give the self-signed cert a real SAN instead of
a bare CN, both at image-build and install-time-fallback.
- Mesh.vue/MediaLightbox.vue/mesh-styles.css: mic/attach-stack no longer
closes on a plain hover-past; mesh images open in the shared lightbox
and have a real download button; lightbox close button moves to
bottom-center on mobile instead of under the status bar; mesh device
panel gets the same height/padding as its sibling tabs.
Verified: 108/108 mesh unit tests, deployed + confirmed healthy on
.116/.198/.228 (matching binary hash across all three), live Reticulum
messaging confirmed working end-to-end post-deploy.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Add a TollGate row (Enabled/Disabled/Not installed) to the Home
dashboard's Network tile, polling the existing openwrt.get-status RPC
on the same cadence as the other network rows. Only rendered once an
OpenWrt router is actually configured, so nodes without one aren't
cluttered with an always-"Not configured" row.
Also fixes the underlying reason this could never have worked: nothing
in the OpenWrt Gateway flow ever persisted the router's host/credentials
server-side — the "connect" form only kept them in local component
state, so any no-args openwrt.get-status call (this new tile, and even
the Gateway page's own reload) always failed with "No router
configured" despite a fully working, provisioned router. Now
handle_openwrt_get_status saves the connection to router_config.json
whenever a host is explicitly passed in and the connection succeeds.
orchestrator_uninstall_app_ids("immich") only disabled the "immich" app_id
itself; "immich-postgres" and "immich-redis" (separate orchestrator-tracked
manifests, same pattern as mempool-api/archy-mempool-db) stayed enabled, so
the boot reconciler kept restarting their leftover stopped containers
forever after the generic uninstall path stopped them (.198, 2026-07-01 --
found while uninstalling immich to relieve disk I/O pressure competing with
a slow Bitcoin IBD).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Several compounding bugs were blocking end-to-end TollGate provisioning
on OpenWrt 25.x (apk-native) routers:
- install_ipk's non-ar fallback assumed a flat tarball, but some .ipks are
a gzip tar of the three classic ipk members one level deep; it was
dumping debian-binary/data.tar.gz/control.tar.gz straight into / instead
of unpacking the real payload.
- Manually-extracted packages never ran their pending /etc/uci-defaults/*
scripts (that only happens through opkg/apk's own postinst bookkeeping),
so nothing ever created /etc/config/tollgate.
- uci_apply() never ensured the target config file existed first — `uci
set` fails outright on a config namespace nothing has created yet, which
is true for a package-defined one like "tollgate" (unlike wireless/
network/dhcp, which ship by default).
- The installed-check and restart_services looked for a binary/init script
named after the opkg package ("tollgate-module-basic-go"/"tollgate"),
but the real on-disk names are tollgate-wrt — so status always reported
"not installed" and service restarts silently no-op'd.
- provision_ssid used `uci add`, creating a new wifi-iface section (and
therefore a new duplicate broadcast SSID) on every provision call instead
of updating one in place.
Also adds a TollGateConfig.enabled field so the enable/disable state is
actually applied to the running service and the SSID's own broadcast
(stop + disable at boot, or start + enable), not just written to UCI.
On the frontend, the OpenWrt Gateway page's TollGate panel was read-only
once installed — add an edit form (price, step size, min steps, mint URL,
enabled toggle) that reuses the same idempotent provision-tollgate call.
- mempool-api now declares dependencies:[bitcoin:archival] directly, closing a
gap where installing it standalone (a legitimate direct orchestrator-install
target) bypassed the mempool umbrella's pruning gate entirely.
- New durable user-uninstalled marker (crash_recovery.rs, mirrors user_stopped)
fixes required-baseline-app self-heal (bitcoin-knots/electrumx/lnd/mempool/
etc.) resurrecting itself after an explicit uninstall survives a restart or
reboot, since the in-memory disabled set is wiped by every load_manifests().
- installed_version() (set_config.rs) no longer trusts a floating image tag
("latest") as the reported running version -- a stale local :latest cache
reported "latest" forever regardless of what latest had moved on to. Now
falls back to asking the Bitcoin backend directly via `bitcoind --version`
when the tag is floating.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Master-plan backlog §10b/§10c: replace two per-app-hardcoded lookups with
generic, manifest-driven behavior so future apps are covered automatically
instead of needing a code edit.
- extract_lan_address (docker_packages.rs) now skips container-side ports
that are known non-HTTP (SSH, FTP, common DB ports) instead of blindly
taking podman's first-listed port. Fixes the whole class of bug the gitea
SSH-before-web static override was a one-off patch for.
- requires_unpruned_bitcoin (dependencies.rs) now checks the app's own
manifest for a `bitcoin:archival` dependency declaration first, falling
back to the old hardcoded id list. electrumx and mempool manifests now
declare it explicitly as the proof case.
869/869 Rust tests green, catalog drift clean.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Merges in the meshtastic agent's now-finished work alongside this session's
continuation: stock-peer (3ccc) PKI-capability is now stamped through
get_contacts -> refresh_contacts -> MeshPeer.pkc_capable, so a directed DM to/from
a PKC-capable stock Meshtastic peer correctly shows the E2E pill on the Sent row,
not just received messages. Confirmed live: .198 sees "Meshtastic 3ccc" with
pkc_capable=true.
Also fixes two real interop/correctness bugs found while live-testing the
Reticulum <-> Sideband link:
- Receive: the daemon only ever read LXMF's plain-text content, silently
dropping native FIELD_IMAGE/FIELD_FILE_ATTACHMENTS fields — a stock
Sideband/NomadNet photo vanished into a blank-space message. Now decoded
into the same ContentInline typed envelope our own attachments use.
- Send: images to a non-archy (stock) peer now use native LXMF FIELD_IMAGE
instead of our own opaque CBOR wire format, which Sideband can't decode.
- Root cause of a garbled MC-chunk-fragment bug: TypedEnvelope.v/.sig (the
OUTER wrapper every message type uses) serialized raw bytes as a CBOR
array-of-integers instead of a native byte string, bloating every
message on the wire ~2-3.5x — enough to push even a tiny ReadReceipt
over the 140-byte single-frame chunking threshold. Root-caused by
reading ciborium's deserializer source directly (deserialize_bytes only
works within its internal scratch buffer; deserialize_byte_buf streams
unbounded).
Frontend: consolidated the attach/record buttons into a single animated "+"
menu (was overflowing the compose row).
857/857 tests pass. Verified live across all 5 deploy-roster nodes.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.
Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
- WISP wizard: step-by-step flow for WiFi, DHCP, masquerade config
- WAN status: expose lan_ip, dhcp_start/limit, masq, sta_state, wifi_log
- wifi_scan: detect CCMP as WPA2 (psk2) so association succeeds
- opkg: PkgManager enum — detect apk-native mode when opkg not in repos
- tollgate: apk-native install path using manual ipk extraction
- arch detection: read DISTRIB_ARCH from /etc/openwrt_release; normalise
bare mipsel/mips from uname -m to mipsel_24kc/mips_24kc
- install_ipk: install binutils via apk when ar not in BusyBox
- install_ipk: wget --no-check-certificate for routers without CA bundle
- install_ipk: ar fallback to tar -xzf for non-standard ipk formats
- install_ipk: 5MB overlay space check with clear user-facing error
- middleware: allow "Not enough flash/space" errors through sanitizer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On a freshly-flashed OpenWrt router, radio0 is disabled by default so
iw dev returns empty. Detect the PHY via /sys/class/ieee80211/, enable
radio0, run `wifi up`, then poll up to 8s for netifd to create the
virtual interface before handing it to iwinfo scan.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New RPC methods:
- openwrt.scan-wifi: triggers iwinfo scan on the router radio,
returns networks sorted by signal strength
- openwrt.configure-wan: creates UCI wireless.wwan (sta mode) +
network.wwan (DHCP) + adds wwan to firewall WAN zone, then
calls `wifi reload`
get-status now includes a `wan` object with configured/ssid/ip/
internet fields so the UI can show current uplink state.
Frontend WAN panel: scan → pick SSID (signal bars) → enter password
→ apply. Shows "Configure WAN first" hint above TollGate install
button when internet is not available.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
apk errors were being silently dropped (stdout only). Run apk update
first and fail with a clear "router may have no internet" message if
it fails, rather than a cryptic exit-1 from apk add.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
"opkg not found at /usr/bin/opkg" was being swallowed by the error
sanitizer and shown as generic "Operation failed". Also fix bare
`opkg list-installed` call in get-status handler to use full path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BusyBox opkg exits 0 even when 'Cannot install' due to insufficient space,
causing the fallback to silently report success. Now captures stderr and
checks for the failure string explicitly.
Adds user-visible error for the common case where the router flash is too
small for the TollGate package (~19 MB needed vs ~9 MB available on typical
budget routers). Adds error prefixes to the RPC sanitizer allowlist so the
message reaches the UI instead of showing 'Check server logs'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- OpenWrtGateway.vue: add "Install TollGate" button when not installed;
tracks connected credentials for reuse in the provision call
- install.rs: fall back to wget download from GitHub releases when the
package is not in any opkg feed (mips_24kc and other arches supported)
- openwrt.rs: provision-tollgate now falls back to saved router_config
for credentials, matching the behaviour of get-status
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without these prefixes in the allowlist, sanitize_error_message swallowed
the "No router configured" error and returned a generic "Operation failed",
so the frontend could never detect the unconfigured state and show the
connect form.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backend: new `openwrt.get-status` RPC endpoint SSHes into the saved (or
provided) OpenWrt router and returns system info, TollGate config, and WiFi
AP interfaces via UCI.
Frontend: new OpenWrtGateway.vue view at /dashboard/server/openwrt shows
system hostname, OpenWrt version, uptime, TollGate install/enable state with
pricing and mint URL, and all AP-mode WiFi interfaces. Linked from the Local
Network section of the Server view.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New `archipelago-openwrt` workspace crate provides SSH/UCI-based management
of OpenWrt routers, including automated TollGate installation and configuration
of a pay-as-you-go "archipelago" SSID backed by the local Cashu mint.
Exposes two RPC endpoints:
- `openwrt.scan` — discover OpenWrt routers on the LAN
- `openwrt.provision-tollgate` — install tollgate-module-basic-go, write UCI
config (TIP-01/TIP-02), and create isolated WiFi SSID + firewall zone
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- send_message now sends archy↔archy plain text as a native TEXT_MESSAGE_APP
DM (firmware PKC-encrypts E2E), not wrapped in the binary typed envelope
that silently broke archy↔archy LoRa delivery. Archy peers' Sent rows are
marked encrypted so the E2E pill shows; rich typed msgs still use the
typed-wire path.
- Add a software radio-reboot to recover a wedged/RX-deaf radio without
physical access (and for the Device-tab settings panel): driver reboot()
via AdminMessage reboot_seconds=97 (verified vs meshtastic/protobufs),
MeshCommand::RebootRadio, MeshService::reboot_radio, RPC mesh.reboot-radio.
- Handoff doc: docs/SESSION-1.8.0-OTA-PROGRESS.md "RESUME HERE" — RF link is
the proven blocker (radios not hearing each other); modem_preset mismatch
is the prime suspect; on-device Meshtastic-app check + fix plan documented.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
netbird is fully manifest-driven (apps/netbird-*/manifest.yml via the signed
catalog): install_stack_via_orchestrator renders the 3-member stack with
generated_certs (self-signed TLS for the #15 OIDC secure context), base64
generated_secrets, and templated config — and adopts the running stack by live
container name. The hardcoded `podman run` fallback was therefore dead code on
any node with the embedded catalog (verified live: .228 https:8087 -> 200).
Removes the per-app Rust installer anti-pattern the master plan calls out:
- install_netbird_stack: orchestrator -> adopt -> bail! (no in-Rust installer)
- deletes 6 now-dead helpers (write_netbird_config_files, ensure_netbird_tls_cert,
read_or_generate_b64_secret, netbird_net_resolver_ip, detect_netbird_public_host_ip,
wait_for_netbird_oidc_ready), 3 NETBIRD_*_IMAGE consts, unused base64::Engine import
- ~485 lines removed; prod_orchestrator doc-comments updated
Behavioural parity: the manifest path already executed on the fleet, so this
changes no live behavior. The legacy #10 OIDC-readiness wait was already bypassed
by the manifest path; if that race resurfaces, add an OIDC-ready gate to the
manifest rather than resurrecting the Rust fn.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
handle_package_uninstall lumped every teardown failure into one `errors` vec
and returned Err on any of them BEFORE removing the package state entry — so a
non-fatal cleanup hiccup (a slow/failed `sudo rm -rf` of a large data dir, a
volume/network removal) left the app's containers gone but its entry in
package_data → a ghost in My Apps, and the spawned task reverted it to Installed.
Split the failures: container removal that even force-rm can't complete (app
genuinely still present) keeps the entry + returns Err; everything after the
containers are gone is best-effort. Remove the state entry as soon as the
containers are gone — BEFORE the slow volume/data teardown — so My Apps updates
immediately and residue can never ghost the app. set_uninstall_stage is a no-op
once the entry is gone (if-let guard), so the later stages don't re-create it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
package.restart resolved its container list via
ordered_containers_for_start, which injected every name from the
union startup_order list that wasn't already present — including
variant names not live on a given node (mysql-mempool,
archy-mempool-api, archy-mempool-web). The phantom mysql-mempool is
2nd in the mempool start order, so do_orchestrator_package_start hit
its unknown-app-id fallback, do_package_start failed the inspect
("no such object"), and the `?` aborted the whole start sequence —
leaving mempool-api + the frontend down until the health monitor
recovered them minutes later. That was the source of the 5× gate
flakes #73 (frontend not running in 180s) and #74 (api not queryable
in 300s); root-caused from the .228 journal
("Start failed: mysql-mempool").
Replace the inject-then-sort logic with a pure helper
order_present_containers that orders only the actually-present
containers and never adds phantom entries. startup_order remains a
union of name variants across install generations — it's now used
purely to order what's live, not to inject what isn't. +3 unit tests.
Also harden bitcoin-knots.bats "valid state" probe: poll ≤30s for a
settled state instead of a single-shot read, so a container caught
mid-reconcile (transient restarting/configured) can't flake a 20-min
iteration. A genuinely-stuck container never settles, so real
breakage is still caught.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user-stopped backend (electrumx, bitcoin, lnd, fedimint) kept reading 'running'
in container-list because its UI companion (electrs-ui, …) still serves the launch
port, and the state-refresh upgrades any reachable launch port to 'running'. The
gate's wait_for_container_status <app> stopped therefore never saw 'stopped'.
Fix: load the user_stopped marker in handle_container_list and force 'stopped' for
those apps before the launch-port refresh. The reconcile guard keeps the backend
down, so the marker is authoritative. package.start clears it first, so a started
app reports 'running' normally.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author the IndeedHub stack as 7 manifests (postgres/redis/minio/relay/api/
ffmpeg + frontend) and route install_indeedhub_stack through the
orchestrator first (immich pattern), falling back to the legacy installer
only when the manifests aren't deployed.
Data-preserving by construction — the manifests reproduce the live install
exactly so an existing node ADOPTS rather than recreates:
- container_name = the live hyphenated names the runtime already references
(health_monitor tiers/deps, crash_recovery).
- named volumes indeedhub-{postgres,redis,minio,relay}-data (not bind mounts).
- dedicated indeedhub-net + network_aliases [postgres|redis|minio|relay|api]
so the api/ffmpeg env hostnames and the frontend nginx upstreams resolve
unchanged.
- generated_secrets (indeedhub-db-password/-minio-password owned by their
backends, indeedhub-jwt by the api) reuse the live /var/lib/archipelago/
secrets values (ensure_one no-ops on existing files; postgres pw is fixed
at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept.
The frontend carries the post_install hook (#20) that replaces the hardcoded
patch_indeedhub_nostr_provider: strip X-Frame-Options, refresh
nostr-provider.js from /opt/archipelago/web-ui, inject the <script> if
absent, reload nginx — defensive/idempotent since indeedhub:1.0.0 already
bakes these. Frontend manifest also corrected off its dead Next.js shape
(health check now nginx :7777, tmpfs /run + /var/cache/nginx).
Builds + unit-tested; live adoption/lifecycle verification on .228 next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After the manifest migration the launcher installed as "immich-server" (app_id),
which has no catalog entry → showed the raw id and no icon. Rename the server
manifest app_id immich-server→immich so it matches the catalog/curated "immich"
entry (title "Immich", icon immich.png) and is recognised as a known launcher app
(APP_CATEGORY_MAP) → stays in My Apps. immich_stack_app_ids now installs
[immich-postgres, immich-redis, immich]; orchestrator.install bypasses package
routing so there's no recursion with the "immich"→stack-installer mapping.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the immich migration off the legacy hardcoded install_immich_stack
(podman run + sudo chown) to the registry-manifest + orchestrator path. Validated
live on .228 (clean single set, healthy v2.7.4, data dir ownership correct).
- install_immich_stack now tries install_stack_via_orchestrator(immich_stack_app_ids)
first; legacy remains only as the no-manifests fallback.
- immich-{postgres,redis,server} manifests corrected from live findings:
* named by app_id (dropped container_name override) — using container_name
spawned DUPLICATE containers (app_id-named install vs name-override reconcile)
on the same PGDATA, which corrupted a postgres cluster. Server reaches its
siblings via app_id aliases (DB_HOSTNAME=immich-postgres, REDIS=immich-redis).
* immich-postgres data_uid 100998:100998 (postgres drops to container 999 →
host 100998 under rootless; verified the fresh dir is chowned correctly).
* immich-server version "release"→"2.7.4" (manifest validation requires a digit;
the bad version made the manifest silently skip → partial orchestrator install
→ legacy fallback → the duplicate corruption above).
- HARDEN install_stack_via_orchestrator: only fall back to the legacy installer
when NOTHING was installed yet. An "unknown app_id" AFTER a member is up now
errors instead of double-creating containers on shared data (the corruption
root cause).
- Strict the all-manifests round-trip test: fail (not skip) on any invalid shipped
manifest — this gap let the bad immich-server version through.
Known follow-up (pre-existing, platform-wide): orchestrator-installed backends
(immich, btcpay-db) run as podman --restart, not Quadlet, and podman-restart.service
is disabled on .228 → reboot-survival gap independent of this migration.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Buyer-side paid downloads now persist: purchases are cached on disk
(content_owned.rs) keyed by (seller onion, content_id), the gallery shows
an "Owned" badge unblurred, and items view/play in-app from the local
cache with no re-payment or reliance on a browser download (which
silently failed on the mobile companion). New RPCs content.owned-list /
content.owned-get. Validated e2e .116<-.198 (paid 100 sats via Fedimint,
166KB jpeg returns, survives restart).
fedimint-clientd manifest: restore the standard container capability set
(CHOWN/DAC_OVERRIDE/FOWNER/SETUID/SETGID) so fmcd's startup chown of an
existing-federation /data succeeds instead of dying EPERM (#7). Confirmed
the orchestrator applies these to the running container.
FIPS perf: tighten the supervisor warm-path keepalive 45s -> 25s so peer
paths stay inside the ~30-60s NAT cold window. Dials now reliably land on
FIPS instead of re-punching and falling back to Tor. Measured to the same
peer: cloud browse 18-22s -> 0.4s; full Fedimint paid download 29s -> 11s
(residual is the seller-side guardian reissue round-trip).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- PeerFiles: new confirmation step after "pay from ecash" — shows the amount and
which wallet will be spent (Cashu/Fedimint) with balances, lets the user switch
backends, and a styled Confirm button. The chosen backend is passed to the
payment so it spends exactly what was confirmed.
- content.download-peer-paid: accept `method` (cashu|fedimint) to honor the
confirmed choice; log the backend + outcome; backend-specific rejection errors
("not in the same Fedimint federation" / "doesn't accept your Cashu mint").
- AUTO-REFUND: a minted token whose sale fails (peer unreachable, rejected, or
error) is now reclaimed (fedimint reissue / cashu receive) so the buyer no
longer loses the spent ecash — fixes the stuck-Fedimint-notes report.
- wallet.ecash-balance already reports cashu_sats/fedimint_sats/total_sats which
the confirm screen uses to pick/show the covering wallet.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Right after install the dashboard SPA opens and, if it loads before NetBird's
embedded OIDC provider is serving, caches a bad auth state — the user appears
logged-in but can't log out until it self-corrects. Container "running" != OIDC
ready, so gate the install's Done phase on the management server's
/oauth2/.well-known/openid-configuration answering (best-effort, 60s cap, never
fails the install since the stack is already up).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Paying for a peer file minted a Cashu-only token, so a node whose ecash balance
lived in Fedimint couldn't pay even with funds. Now both backends are tried:
- payer (content.download-peer-paid): mint a Cashu token first; on failure fall
back to spending Fedimint notes. Only error if BOTH backends can't cover it.
- seller (verify_and_receive_payment): accept Fedimint notes as well as Cashu —
anything not starting with "cashu" is redeemed via reissue_into_any.
- new fedimint_client::spend_from_any() — spend from whichever joined federation
has the balance, returning the notes + federation id (mirrors reissue_into_any).
- wallet.ecash-balance now also reports fedimint_sats + combined total_sats; the
pay-for-file pre-check uses the combined total so a Fedimint-funded node isn't
wrongly blocked.
Compiles (cargo check + vue-tsc). Live cross-node federation validation pending
(dual-ecash phase 6) — needs two nodes sharing a federation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A node reachable both over LoRa and federation has two MeshPeer rows (radio
twin: low contact_id + firmware key; federation twin: high contact_id +
archipelago key), and messages key by peer_contact_id split across the two ids
— so opening one twin shows an empty thread (the .120->.89 symptom).
- backend: new group_peer_twins() helper groups peers by arch_pubkey_hex (set on
BOTH twins by bind_federation_twins), keeps the radio id as the mesh-first
send target, and unions messages across all twin ids. Wired into
conversations.list / conversations.messages / mesh.contacts-list. +3 unit tests.
- frontend: the live chat list merges client-side (mergedPeers) and matched twins
by the "Archy-z6Mk..." advert prefix, which the Meshtastic device rename broke
(radio now advertises the server name). Merge by arch_pubkey_hex instead, which
the backend reliably sets on both twins. Expose arch_pubkey_hex on MeshPeer.
- fix unrelated stale test: EcashTransaction test missing the new `kind` field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- reissue_into_any now tries the UNION of the local registry AND fmcd's live
joined set (/v2/admin/info) before failing, so a valid Fedimint token isn't
wrongly rejected when the registry has drifted. On all-fail it returns a
friendly message: notes already redeemed into this wallet (funds safe) vs
didn't match any connected federation.
- Unified transaction history: a local Fedimint tx log (recorded on each
successful redeem) is merged with the Cashu history in wallet.ecash-history,
newest-first, each tagged kind=cashu|fedimint. Previously a Fedimint receive
appeared nowhere.
- fedimint-clientd healthcheck -> type:tcp. It was probing /health, which fmcd
doesn't serve (only /v2/*), pinning the container in (starting) forever; the
TCP probe is skipped by the Quadlet renderer (host-side lifecycle verifies),
so it reports running. Cosmetic for ecash, which worked throughout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
UI (this session):
- Global audio player now scales the whole interface into the space above it
on desktop (sidebar + main) and docks directly above the tab bar on mobile;
it stays visible while navigating.
- Mesh mobile redesign: floating Chat / BTC / Dead Man / AI / Map tab strip
with a single fixed, internally-scrolling pane (page no longer scrolls);
tabs hide while a conversation is open; floating back button; collapsible
Device panel (starts collapsed); keyboard-aware conversation sizing via
VisualViewport so the chat sits just above the keyboard.
- Cloud file grid: uniform 4/3 card heights (folders + images match).
- Swipe left/right switches tabs on the Apps and Web5 screens.
- Map tool fills its pane (no bottom gap); fix skewed Share Location toggle
on mobile (global min-height rule was deforming the switch).
- Trim redundant helper copy from the mesh AI tab.
Also bundles pre-existing in-progress work that was already in the tree:
mesh listener/session + wallet + container + bitcoin-status backend changes,
docker UI updates, and assorted other UI tweaks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fedimint never appeared in Wallet > Settings > Fedimint because the
fmcd (fedimint-clientd) sidecar was never installed: ensure_default_
federation() needs the fmcd password to reach the daemon, found none,
and silently no-oped, leaving the registry empty.
- prod_orchestrator: add fedimint-clientd to the baseline auto-install
set so it self-heals onto every node and auto-joins the default
federation; generate the fmcd-password secret before secret_env
resolves.
- fedimint_client: ensure_fmcd_password (random hex, 0600) shared with
the container's secret_env; from_node reads the same secret (legacy
fmcd/password kept as fallback); reissue_into_any redeems received
notes into the first joined federation that accepts them.
- wallet.ecash-receive: dual-token — cashu* tokens redeem at the mint,
anything else is reissued via fmcd; returns the kind + federation_id.
- UI: receive box advertises "Cashu or Fedimint" and reports which kind.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Cashu default mint was the local Fedimint guardian (:8175), wrongly surfacing
a Fedimint URL in the Cashu mints list. Default is now Minibits
(https://mint.minibits.cash/Bitcoin) — Cashu and Fedimint are distinct
protocols (Fedimint lives under its own tab).
- Peer-file (buy) invoice creation: retry the LND REST call (3× / 400ms) so a
transient LND-REST blip (swap pressure / just-restarted / TLS race) no longer
hard-fails as an opaque 503, and surface the real error chain ({:#}) in the
response + logs instead of a generic "Failed to create invoice".
- Autojoined default federation now shows a friendly name ("Archipelago
Federation") in the Fedimint tab instead of a bare federation id.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sizes bitcoind -dbcache to host RAM (~1/16, floor 300MB, cap 4096) instead of a
fixed 2048/4096. A multi-GB UTXO cache on an 8GB node running the full app stack
pushed memory past physical RAM and triggered system-wide swap thrash: the disk
saturated, bitcoind could not answer its own RPC, and the dashboard backend's
sqlite reads stalled — surfacing as fleet-wide /rpc/v1 502s and a blank Bitcoin
UI. Applied in scripts/container-specs.sh (reconciler path) and the config.rs
bitcoin-core path.
Bitcoin status cache now polls every 5s (was 10/15) with an 8s timeout (was 20s)
and fetches the four RPCs concurrently, so the cached snapshot tracks bitcoind's
responsive windows during IBD and the UI stops dwelling on "reconnecting...".
Unifies the divergent discover AppGrid/FeaturedApps image-error handlers onto the
canonical placeholder fallback so missing app icons render the placeholder.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- DMs now use native meshcore unicast (CMD_SEND_TXT_MSG) instead of @DM2 channel
broadcasts: private (E2E-encrypted to the recipient pubkey by firmware), off the
public channel, and decodable by stock clients. Plain text (split, not MC-chunked)
to non-archipelago contacts; typed envelopes to archy peers.
- !ai replies now DM the asker privately (RadioDm) instead of broadcasting on ch0.
- Auto contact-import: a heard advert (PUSH_CONTACT_ADVERT/0x80, 32-byte pubkey) is
added via CMD_ADD_UPDATE_CONTACT (0x09) so contacts appear without a flood advert.
- clear-all now DELETES firmware contacts via CMD_REMOVE_CONTACT (0x0F) instead of
blocklisting; blocking filter removed entirely. Wiped contacts return when reachable.
- Contact reachability: MeshPeer carries last_advert + reachable (path-based); UI shows
a reachability dot.
- Peers list: contact search box (filter by name/DID/npub/pubkey) with a clear button.
- send_message routes stock contacts as plain native text (fixes garbled envelopes).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- mesh: stop broadcasting ARCHY:2 identity on the public channel (startup + every advert tick); receive path still parses inbound. No more public-channel spam.
- mesh assistant: trigger on !ai/!ask typed in 1:1 chat (was only the dead AssistQuery path + bare channel text); route the reply transport-aware via MeshService::send_message (Tor for federation peers, LoRa for radio) through a new AssistChatReply event consumed at the server layer — fixes replies never reaching federation askers.
- mesh assistant: per-contact !ai allowlist (allowed_contacts) bypassing trusted_only; config + RPC + is_sender_allowed.
- fedimint-clientd manifest: network_policy open -> bridge (invalid value made the loader skip the whole manifest, so fmcd never ran and federations never joined/listed).
- ui: AI panel — Claude model dropdown (Haiku/Sonnet/Opus presets) + allowlist contact picker.
- ui: Settings — App Updates + App Registry moved under Account.
- ui: mesh chat — overscroll-behavior: contain so chat scroll no longer bleeds to the contacts panel.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>