lfg2025/archy

Author	SHA1	Message	Date
archipelago	067002b04b	Merge branch 'bitcoin-version-bulletproof' into mesh-multiversion-integration	2026-06-29 06:45:50 -04:00
archipelago	20f762cb2c	feat(fips): auto-peer LAN-discovered federation nodes directly over FIPS Mesh/federation messages between co-located nodes were always falling back to Tor because the FIPS overlay had no direct peering — every node depended on the global anchor's spanning tree, and when that anchor link flaps a node is isolated and all FIPS dials time out. (Diagnosed live on .116/.198: pure-FIPS direct peering over UDP 8668 fixes it — 2.5ms vs timeout.) Generalize the manual fix: in the existing 5-min FIPS seed-anchor apply loop, also auto-connect every federation peer the PeerRegistry knows both a LAN address AND a FIPS npub for, dialing its FIPS UDP transport (port 8668) at its LAN IP via the same idempotent `fipsctl connect` path (new anchors::lan_fips_anchors). This is FIPS's own transport over the LAN — NOT Tailscale, NOT the HTTP/LAN messaging port. Transient (recomputed each tick from live mDNS discovery, never persisted) so changing IPs self-correct. Remote peers with no LAN address are untouched (still routed via the anchor). Registry Arc hoisted out of the transport-init block so the loop can read all_peers(). cargo check green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 06:42:18 -04:00
archipelago	11155055aa	feat(mesh): meshtastic PKI E2E pill — surface pki_encrypted on received DMs The synthetic meshcore-style frame the meshtastic driver builds can't carry the radio's PKI-encryption status, so received meshtastic DMs never lit the E2E pill. Thread it out-of-band: the device records `last_rx_encrypted` (= packet pki_encrypted) when it yields a text frame; the session loop reads it via `take_rx_encrypted()` right after dispatch and stamps the just-stored received message E2E (dispatch::stamp_received_encrypted, monotonic-id keyed). Meshcore returns false here (its E2E is derived in the frames decrypt path). Pure out-of-band signal — no change to the shared meshcore wire format. Built + deployed live in binary d937814e on .116/.198. cargo check green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 06:25:01 -04:00
archipelago	095a76cd20	fix(bitcoin): bulletproof multi-version switching (Knots & Core) Three stacked bugs made "switch version" silently fail / crash-loop, and the data-access mismatch corrupted a node's index during recovery attempts. Backend renderer: - sync_quadlet_unit ignored the per-app pinned version and re-rendered the quadlet with the manifest's :latest every reconcile tick, reverting any switch. Factor the install-time catalog/pin resolution into a shared resolve_catalog_image() and call it in BOTH install_fresh and sync_quadlet_unit. - The renderer folded manifest `entrypoint: ["sh","-lc"]` into Exec=, which only worked when the image entrypoint was a passthrough shell wrapper. The versioned images use ENTRYPOINT ["bitcoind"], so Exec=sh -lc ... became `bitcoind sh -lc ...` and crash-looped. Emit a real Entrypoint= override; exec_changed now also compares Entrypoint=. Images: - Build all bitcoin images (Core + Knots, every version) as container-root (USER removed) like the legacy :latest image. Chain data is owned by the data_uid (container uid 102); root reads it via CAP_DAC_OVERRIDE (granted in the manifest). A non-root USER (the previous uid 1000) can't read existing chain data → "Error initializing block database". Still fully rootless: container-root maps to the unprivileged host service user. Catalog: - bitcoin-knots versions[]: 29.3.knots20260508/20260507/20260210 + 29.2.knots20251110, "latest" tracking newest. - bitcoin-core versions[]: add 29.2 + a "latest" entry. All images rebuilt root and published to the mirror. Frontend: - AppSidebar version dropdown: rename the latest option to "Always use the latest version" (no v prefix), fix right padding, and guarantee the current selection matches a real option (was rendering blank). - New InstallVersionModal: full-screen version chooser shown from the App Store / Discover install button for multi-version apps (Bitcoin Knots/Core), app icon + "Install <name>", latest pre-selected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 05:46:04 -04:00
archipelago	3c7c04a662	fix(mesh): meshtastic receive — drain frame batch per poll + rx diagnostics Addresses the open Meshtastic parity bug (project_meshtastic_parity): the running driver received nothing (`mesh.messages` stayed []) though the radio got the packets and sends worked. Root-cause candidate: `try_recv_frame` decoded ONE serial frame per poll and returned Ok(None) for every non-text FromRadio frame, so the session loop slept 50ms between frames. Under Meshtastic's frequent NodeInfo/telemetry stream a received text packet queued behind them, and read_from_radio's 64KB buffer cap could drain (drop) it before it was ever decoded — reception silently dead while sends kept working. - try_recv_frame now drains a bounded batch (64) per poll, processing each frame's side effects and returning the first inbound text frame, so a text packet is decoded the same poll it arrives and the buffer never grows enough to hit the lossy cap. Bounded so a continuous flood still yields to select!. - packet_to_inbound_frame logs every decoded packet (from/portnum/payload_len) and a "did not parse (dropped)" case, so one live radio pass is conclusive. The rest of the decode path was verified correct by inspection (FROM_RADIO_PACKET =2, wire-type-5 handled, parse_mesh_packet sound, 60s heartbeat present) — not a parse bug. cargo check green. NEEDS a live radio pass on a rig that isn't .228 (off-limits: bitcoin testing) to confirm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 05:04:09 -04:00
archipelago	11038cdcc9	feat(mesh,ui): per-message transport pill (Mesh/FIPS/Tor) + fix E2E pill Adds a per-message transport badge to archy↔archy mesh chats and fixes the long-broken E2E badge — both meshcore and meshtastic, styled like the existing E2E pill. Transport pill: - New `MeshMessage.transport` ("lora"/"fips"/"tor"), surfaced in the UI beside the E2E badge (Mesh.vue transportLabel() → Mesh/FIPS/Tor, mesh-styles.css). - Sent LoRa → "lora"; sent federation → finalized to the real leg ("fips"/"tor") once the background send resolves (req.send_json transport), via an id-keyed store update. - Received: a post-dispatch stamp on handle_typed_envelope_direct's output (monotonic ids) tags both transports without threading through all 20 typed-dispatch sites — radio wrapper stamps "lora", federation injector stamps the peer's last_transport ("fips"/"tor", default tor; the inbound HTTP carries no FIPS-vs-Tor signal). - Plain native/channel LoRa frames → "lora"; channel broadcasts stay non-E2E. E2E pill fix: - `encrypted` was hardcoded false at every MeshMessage construction site, so the UI badge (Mesh.vue `v-if="msg.encrypted"`) never showed. Now: federation envelopes are E2E (identity-signed over an encrypted transport); the meshcore native-DM receive path already had a real `encrypted` flag (now also tagged with transport). meshtastic-PKI radio E2E flag threading is a noted follow-up. Backend cargo check + frontend vue-tsc build both green. Needs a live radio + multi-transport pass on .116/.228 to confirm end-to-end (see project_transport_pill / project_meshtastic_parity). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 04:29:25 -04:00
archipelago	6aa74c7386	feat(bitcoin): multi-version support for Core & Knots (install/switch/pin/auto-update) Lets a node runner choose which Bitcoin Core / Knots version to install (latest pre-selected), then switch, pin, or opt into auto-update from the app's interface — all manifest/catalog-driven, rootless, signed-registry, zero-data-loss. Motivated by upcoming BIP-110 signalling: runners need a real choice of software version. Backend: - version_config.rs: per-app pin + auto-update persistence (atomic, merge-preserving), downgrade detection, auto-update enumeration (+ unit tests). - app_catalog.rs: CatalogVersion / versions[] schema, catalog_versions(), catalog_image_for_version() (same-repo guard); a pin suppresses the update badge. - prod_orchestrator.rs: pinned version wins over the catalog default on every install/recreate. - install.rs: install-time `version` param persisted (default = unpinned). - set_config.rs: package.versions (read) + package.set-config (write) RPCs; downgrade is gated behind explicit confirm (warn + confirm + allow). - update.rs/main.rs: hourly per-app auto-update tick via the orchestrator (opt-in, pin-respecting); fix handle_package_update to be non-fatal for orchestrator-managed apps lacking a catalog primary image (bitcoin-core). UI: - MarketplaceAppDetails.vue: install-time version selector (shown when an app offers >=2 versions). - appDetails/AppSidebar.vue: "Version & Updates" card (switch / pin / auto-update toggle / downgrade warning), per app. - rpc-client.ts + en.json: RPC methods, types, strings. Phase 0 image pipeline: - scripts/build-bitcoin-image.sh: download official tarball + SHA256SUMS(.asc), verify SHA-256 + pinned-maintainer OpenPGP signature (fail-closed), build a minimal rootless image, smoke-test, tag + push. - apps/bitcoin-core/Dockerfile rewritten (drops stale community base); apps/bitcoin-knots/Dockerfile added. - generate-app-catalog.sh: emit curated versions[]; published + catalog now offers Core 25.2/26.2/27.2/28.4/29.3/30.2/31.0 + Knots 29.3.knots20260508. docs/bitcoin-multi-version-design.md: live progress tracker. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 18:46:17 -04:00
archipelago	83344b9f3a	fix(orchestrator): drop legacy mempool umbrella manifest on catalog-driven nodes The split-mempool-stack guard that skips the legacy monolithic `mempool` manifest (whose container collides with its split-stack frontend member `archy-mempool-web`) only ran over DISK manifests. On catalog-driven nodes (no disk manifests — e.g. the Phase-3/registry-manifest path), the legacy `mempool` manifest arrives via the registry-catalog overlay AFTER that guard, so both `mempool` and `archy-mempool-web` end up owning container `mempool` and rewrite+restart each other forever ("port binding drift" / "network alias drift" loop observed on .228, leaving mempool down). Enforce the guard once more over the merged (disk + catalog) manifest set: drop the `mempool` umbrella whenever all three split members are present. Installing `mempool` assembles the split stack, so `archy-mempool-web` owns the frontend container either way. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-28 14:04:41 -04:00
archipelago	4519dbf04f	fix(orchestrator): render manifest certs on the adopted-running reconcile path WS-F #10: a netbird reinstall that adopts a leftover running container skipped ensure_manifest_certs, so when its data dir was wiped the self-signed tls.crt/key were never regenerated; the next nginx.conf rewrite + restart then died on the missing cert (proxy 502, login broken). The Running branch of ensure_running_with_mode now calls ensure_manifest_certs before ensure_manifest_files, mirroring prepare_for_start's certs-before-files ordering. Idempotent: a no-op when crt+key already exist. Live-validated on .228: deleted netbird tls.crt/key under a Running container; reconciler regenerated a fresh CN=<host_ip> self-signed cert (1000:1000), https :8087 = 200. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 17:49:50 -04:00
archipelago	f9a6ae3f32	feat(mesh): Meshtastic region + shared-channel auto-provisioning (MeshCore parity) Fresh Meshtastic radios ship region-UNSET (RF-silent) and on mismatched channels, so nodes only ever saw themselves. Bring them to MeshCore parity using the official Meshtastic admin API: - Auto-provision LoRa region (set_config, AdminMessage field 34) from a new mesh-config `lora_region` (e.g. EU_868) when the radio's region differs. - Auto-provision a shared primary channel (set_channel, field 33) with a PSK derived deterministically from channel_name, so every node converges on one mesh — the parity equivalent of MeshCore's named "archipelago" channel. - Read current region/channel from want_config; only write when different (no reboot loop); cap attempts so a radio that won't persist can't loop. - Active NodeInfo advert scaffolding + aggressive serial drain. Verified on .116+.228: region+channel persist, discovery works (both see each other as named reachable contacts), bidirectional RF + sending confirmed. Receiving in the running driver is still under diagnosis (instrumentation added). Also removes the unwanted `meshtastic` daemon app from the registry (it was never meant to be a container — native driver provides system-level support): deletes apps/meshtastic + catalog entries (app-catalog, neode-ui, releases) + test refs. Meshtastic stays native, like MeshCore. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 04:46:35 -04:00
archipelago	fd3a4ee4ef	fix(orchestrator): chown the whole fresh bind subtree, not just the leaf ensure_bind_mount_dirs chowned a freshly-created no-data_uid bind dir with --reference={immediate_parent}. For a NESTED bind source like jellyfin's /var/lib/archipelago/jellyfin/config (or netbird's .../netbird/data), `mkdir -p` creates the intermediate <app> dir root:root too, so referencing the immediate parent just copied ROOT — leaving the dir unwritable and the app EACCES-crash-looping on reinstall (found by the all-apps-lifecycle pass: jellyfin "/config/log denied" exit 139; netbird-server "unable to open database file"). It only ever worked for direct children of the data root (immich). Fix: anchor to the nearest PRE-EXISTING ancestor (the rootless data root, owned by the service user) and chown -R the entire newly-created subtree to it. Extracted the walk into fresh_subtree_anchor() with a unit test covering nested / direct / second-volume cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-27 04:46:35 -04:00
archipelago	71cc9ac46a	fix(uninstall): bound systemctl/podman teardown so uninstall can't hang Uninstalling immich/grafana could hang with a frozen full-red progress bar, leave a ghost entry stuck in My Apps, and then refuse reinstall. Single root cause: quadlet::disable_remove() — called first in the uninstall task (via companion + orchestrator teardown) — ran `systemctl --user stop`, daemon-reload, and `podman rm -f` with NO timeout. On rootless podman a generated unit can wedge in "deactivating" while podman hangs underneath, so `systemctl stop` blocks forever. The spawned uninstall task then never returns Ok or Err, so: - set_uninstall_stage() (after the stop) never fires → progress frozen; - remove_package_state_entry() never runs → entry stranded in `Removing` → ghost in My Apps; - the install guard rejects reinstall with "already Removing". The spawn wrapper already reverts state on Err and removes the entry on Ok — the only failure mode was a hang that returns neither. Bound the teardown so it always terminates: - systemctl stop → QUADLET_STOP_TIMEOUT, escalate to kill+reset-failed on timeout (reuses the existing helpers); - daemon_reload_user() → bounded systemctl_user_status (30s); - defensive `podman rm -f` → wrapped in tokio timeout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 04:27:02 -04:00
archipelago	0a8db9044f	fix(orchestrator): recreate zombie "Up" containers whose process is dead podman trusts its own state DB: when a container's conmon dies without podman observing it (cgroup-cascade SIGKILL on archipelago.service restart, a crash), `podman ps` keeps reporting it "Up" long after the process is gone. The reconciler NoOp'd such a zombie forever, so a dead dependency with no published host port never recovered. Observed live on .228 (2026-06-25): netbird-dashboard reported "Up" with a dead State.Pid → its nginx proxy 502'd → NetBird login broke ("Unauthenticated"). The dashboard publishes no host port, so the Running branch had nothing to probe and never recreated it. Add a zombie guard to the Running branch: verify the recorded State.Pid is alive (its /proc entry exists) before trusting "running"; on a concrete dead PID, stop+remove+install_fresh from the manifest. Conservative by design — any uncertainty (inspect failed, PID unparseable) assumes alive, so a transient podman hiccup never destroys a healthy container. Unit test covers live/dead/out-of-range PIDs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-26 02:25:52 -04:00
archipelago	89d397bb74	refactor(netbird): delete legacy Rust installer — #20 ph4 (manifest-driven only) netbird is fully manifest-driven (apps/netbird-/manifest.yml via the signed catalog): install_stack_via_orchestrator renders the 3-member stack with generated_certs (self-signed TLS for the #15 OIDC secure context), base64 generated_secrets, and templated config — and adopts the running stack by live container name. The hardcoded `podman run` fallback was therefore dead code on any node with the embedded catalog (verified live: .228 https:8087 -> 200). Removes the per-app Rust installer anti-pattern the master plan calls out: - install_netbird_stack: orchestrator -> adopt -> bail! (no in-Rust installer) - deletes 6 now-dead helpers (write_netbird_config_files, ensure_netbird_tls_cert, read_or_generate_b64_secret, netbird_net_resolver_ip, detect_netbird_public_host_ip, wait_for_netbird_oidc_ready), 3 NETBIRD__IMAGE consts, unused base64::Engine import - ~485 lines removed; prod_orchestrator doc-comments updated Behavioural parity: the manifest path already executed on the fleet, so this changes no live behavior. The legacy #10 OIDC-readiness wait was already bypassed by the manifest path; if that race resurfaces, add an OIDC-ready gate to the manifest rather than resurrecting the Rust fn. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-25 11:04:01 -04:00
archipelago	a721532f55	feat(orchestrator): desired-state recovery + recreate volume-ownership [UNVALIDATED WIP] NOT yet validated on a node or fleet-deployed — cargo check passes, release build + .228 canary validation pending. Committed as a checkpoint so the work survives. Two fixes the immich .198 incident exposed: Fix A (reconcile_all_with_mode): a previously-running app whose container vanished (e.g. a wedged podman teardown cleared by a reboot) was left absent on boot. Now, when boot reconcile would leave an app 'absent' but it was running at the last running-containers snapshot, recreate it (install_fresh). New crash_recovery::load_last_running_names() reads the snapshot without the PID/crash gate (+2 unit tests). Match is exact on compute_container_name (incl stack members); user-stopped + uninstalled apps are already excluded, so no false positives. Fix B (ensure_bind_mount_dirs): a freshly-created bind dir was left root:root, so a no-data_uid app running as container-root (→ host rootless user) hit EACCES and crash-looped (the exact immich upload-dir failure). Now a newly-created bind dir for a no-data_uid app is chowned via --reference=<parent> to match the rootless data root — no host-uid guessing, only fresh dirs (no regression for existing installs). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 09:28:40 -04:00
archipelago	d1cd42c821	fix(orchestrator): stop retrying unrepairable volume chowns every reconcile ensure_running_container_ownership re-probed and re-attempted the in-container chown on every reconcile pass. For a mount that can't be re-owned from inside the userns (observed: mempool-api /data -> 'Operation not permitted'), this burned CPU and logged a WARN on every pass, forever (~6x/30min on .228/.116). Remember hard chown failures in a process-lifetime set keyed by (container-id, dest) and skip the probe+chown for known-unrepairable mounts. Keyed by Id (not name) so a recreated container gets a fresh repair attempt. Verified on .116: one recorded failure at startup, then silent across subsequent reconciles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 04:58:57 -04:00
archipelago	e57514b690	fix(uninstall): never ghost a removed app in My Apps on cleanup residue handle_package_uninstall lumped every teardown failure into one `errors` vec and returned Err on any of them BEFORE removing the package state entry — so a non-fatal cleanup hiccup (a slow/failed `sudo rm -rf` of a large data dir, a volume/network removal) left the app's containers gone but its entry in package_data → a ghost in My Apps, and the spawned task reverted it to Installed. Split the failures: container removal that even force-rm can't complete (app genuinely still present) keeps the entry + returns Err; everything after the containers are gone is best-effort. Remove the state entry as soon as the containers are gone — BEFORE the slow volume/data teardown — so My Apps updates immediately and residue can never ghost the app. set_uninstall_stage is a no-op once the entry is gone (if-let guard), so the later stages don't re-create it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 15:23:16 -04:00
archipelago	4346007d37	fix(orchestrator): only TCP host ports get reachability-probed wait_for_manifest_host_ports TCP-connect-probed every published port, including UDP/SCTP. netbird's 3478/udp STUN can never answer a TCP connect, so the probe failed forever and drove an endless host-port repair/reconcile loop on .228 (netbird-server restarting ~every 60s). Filter to tcp (empty protocol = tcp). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 14:40:48 -04:00
archipelago	9670af62b6	feat(registry): deliver app manifests via the signed catalog (embed by default) Turn on registry-distributed manifests for all apps: generate-app-catalog.sh now embeds each apps/<id>/manifest.yml by default (EMBED_MANIFESTS opt-out), so nodes install from the signed catalog (origin-wins overlay, disk = fallback) with no OTA-shipped disk manifest. main.rs awaits a bounded (25s) refresh_catalog before load_manifests so a fresh boot overlays the latest embedded catalog instead of a restart later; offline/ISO boot falls through to disk and never hangs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 13:39:54 -04:00
archipelago	a8b9b0f5e8	feat(netbird): manifest-driven migration via reusable orchestrator primitives Migrate the netbird stack (server/dashboard/proxy) off ~500 lines of per-app Rust to 3 declarative manifests, adding 4 reusable primitives: - SecretGenKind::Base64 (netbird relay authSecret + sqlite store encryptionKey) - GeneratedCert schema + ensure_manifest_certs (self-signed TLS so the dashboard gets a secure context for OIDC PKCE — issue #15; https proxy on 8087 preserved) - templated GeneratedFile render: {{HOST_IP}}/{{HOST_MDNS}}/{{NETWORK_GATEWAY}} (aardvark resolver for the #15 stale-IP fix) /{{secret:NAME}} (never logged) - legacy create_container now honours port.protocol (3478/udp STUN) install_netbird_stack routes via the orchestrator first (legacy kept as fallback, mirroring indeedhub); launch URL derives https://{host_ip}:8087 from host facts. Legacy Rust deletion deferred to post-live-verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 13:39:53 -04:00
archipelago	3c36cf1c40	fix(companion): stop image_exists journal flood that drops the UI websocket image_exists ran `podman image inspect <image>` via .status() (inherits the service stdout) with no --format, so every hit dumped the image's full ~249-line manifest JSON into the journal — once per companion image, every reconcile pass (.228: 21.6k journal lines / 10 min, 4131 inspect dumps). The service never crashed (NRestarts=0); the sustained journald/IO flood starved the async runtime and dropped the UI /ws/db websocket -> constant "connection lost"/reconnect. Discard the child's stdout/stderr; only the exit status is used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 13:39:19 -04:00
archipelago	92d7f52dd6	fix(orchestrator): order only live containers on package start/restart package.restart resolved its container list via ordered_containers_for_start, which injected every name from the union startup_order list that wasn't already present — including variant names not live on a given node (mysql-mempool, archy-mempool-api, archy-mempool-web). The phantom mysql-mempool is 2nd in the mempool start order, so do_orchestrator_package_start hit its unknown-app-id fallback, do_package_start failed the inspect ("no such object"), and the `?` aborted the whole start sequence — leaving mempool-api + the frontend down until the health monitor recovered them minutes later. That was the source of the 5× gate flakes #73 (frontend not running in 180s) and #74 (api not queryable in 300s); root-caused from the .228 journal ("Start failed: mysql-mempool"). Replace the inject-then-sort logic with a pure helper order_present_containers that orders only the actually-present containers and never adds phantom entries. startup_order remains a union of name variants across install generations — it's now used purely to order what's live, not to inject what isn't. +3 unit tests. Also harden bitcoin-knots.bats "valid state" probe: poll ≤30s for a settled state instead of a single-shot read, so a container caught mid-reconcile (transient restarting/configured) can't flake a 20-min iteration. A genuinely-stuck container never settles, so real breakage is still caught. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 02:22:50 -04:00
archipelago	57a013bc66	test(gate): make 5× the canonical gate, drop 20x naming Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub 20× references across CLAUDE.md, the master plan, TESTING.md, app-registry status, the orchestrator/config doc-comments, and the bats suites. Also add a minimal fail() helper to mempool.bats so guard failures report cleanly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 18:12:41 -04:00
archipelago	452f05d849	fix(reconciler): decouple companion self-heal onto its own cadence The companion-unit repair stage ran at the END of each boot-reconciler tick, after reconcile_existing(). On a heavily loaded node that per-app pass takes >60-90s, so a deleted/lost companion unit (electrs-ui, bitcoin-ui, …) wasn't repaired within any reasonable window (gate test 31 'deleted unit recreated within one reconcile tick' timed out at 90s on the 45-app .228 node). Detecting + rewriting a companion unit is cheap, so spawn it as its own ~interval(30s) loop, independent of the slow app pass. Handle is aborted when the main loop exits (shutdown uses notify_one, so a second waiter would steal the wake permit). tick() is now app-reconcile only. All 4 boot_reconciler cadence tests still green (companion_stage=false in tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 13:04:28 -04:00
archipelago	6e49ce6f88	fix(container-list): report user-stopped apps as stopped despite live UI companion A user-stopped backend (electrumx, bitcoin, lnd, fedimint) kept reading 'running' in container-list because its UI companion (electrs-ui, …) still serves the launch port, and the state-refresh upgrades any reachable launch port to 'running'. The gate's wait_for_container_status <app> stopped therefore never saw 'stopped'. Fix: load the user_stopped marker in handle_container_list and force 'stopped' for those apps before the launch-port refresh. The reconcile guard keeps the backend down, so the marker is authoritative. package.start clears it first, so a started app reports 'running' normally. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 09:26:30 -04:00
archipelago	760a32bccf	fix(reconcile): keep user-stopped apps stopped (reconciler was resurrecting them) package.stop a dependency (e.g. electrumx, a mempool dep) and the reconciler restarts it within ~8s: the reconcile filter's dependency_required override re-includes a user-stopped app that an active app depends on, and the in-memory disabled set is wiped on manifest reload — so ensure_running runs, the stopped app's unreachable ports look like a fault, the host-port repair restarts it, and package.stop never sticks (gate 'transitions to stopped' times out). Fix: guard ensure_running_with_mode on the on-disk user_stopped marker (the single choke point every reconcile flows through) → Left('user-stopped'). Explicit install/start clear the marker first (added clear_user_stopped to orchestrator install/start, symmetric with disabled.remove; start/restart RPC already cleared it) so user actions are unaffected. The container itself already stopped correctly — this stops the resurrection. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 09:04:02 -04:00
archipelago	2dad64b2ee	fix(stop): honour per-app graceful-stop grace in orchestrator stop path package.stop left slow-to-SIGTERM apps (fedimint/electrumx/bitcoin/btcpay/immich) running: the orchestrator path hardcoded podman API ?t=10 / CLI -t 30 and the CLI wrapper deadline (30s) equalled the -t grace, so the await fired exactly as podman SIGKILLed -> stop reported failed -> state reverted to running. Reproduced live on clean .198 (fedimint). - container/runtime.rs: add ContainerRuntime::stop_container_with_grace (defaulted so mock/dev impls are unchanged); PodmanRuntime honours grace for API + CLI with deadline = grace + 15s buffer; AutoRuntime delegates. New canonical per-app table stop_grace_secs_for() + DEFAULT_STOP_GRACE_SECS / STOP_GRACE_DEADLINE_BUFFER_SECS. - podman_client.rs: stop_container_with_grace uses ?t=<grace> + longer HTTP deadline. - prod_orchestrator::stop: resolve grace = manifest stop_grace_secs (north-star) else the table; pass to quadlet::stop_service_with_timeout AND stop_container_with_grace. - quadlet.rs: stop_service_with_timeout so slow apps aren't SIGKILLed at 45s. - rpc/package/runtime.rs: doc-note its &str stop_timeout_secs mirrors the canonical table. - tests: resolve_stop_grace_secs (manifest field wins / table fallback / default 30). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-22 06:59:40 -04:00
archipelago	ff78b31212	fix(hooks): run post_install `exec` in a transient user scope (fixes cgroup denial) Live on .228 the post_install `exec` steps failed with "crun: write cgroup.procs: Permission denied / OCI permission denied": a `podman exec` launched from archipelago.service can't place its child in the container's cgroup (under the service's own slice). Wrap `exec` in `systemd-run --user --scope --quiet --collect podman exec …` so it gets its own delegated cgroup — same trick as `podman_user_scope` for pasta starts. `copy_from_host` (a host-side `cp`, no in-container process) stays direct. Without this only copy_from_host worked; indeedhub happened to be unaffected (its image pre-bakes the nginx config so the exec steps were no-ops), but the hook capability is only generally useful with exec working. hooks unit tests pass; live verify on .228 next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 17:38:23 -04:00
archipelago	b73084dbb0	refactor(indeedhub): delete orchestrator special-cases; use generic path (#20 phase 3) The fresh-create path was blocked by hardcoded indeedhub orchestrator logic that predated and conflicted with the manifest migration: - ensure_running routed app_id=="indeedhub" → reconcile_indeedhub_stack, which REFUSED to create the frontend from its manifest (returned Left("stack-managed")). - run_pre_start_hooks("indeedhub") → start_indeedhub_backends → wait_for_indeedhub_dependencies_ready(120) — a DNS gate with a chicken-and-egg bug (required the frontend's own alias present before the frontend could be created), which failed install_fresh with "dependencies were not ready within 120s" and left the frontend down (caught live on .228). Delete all of it (−382 lines): reconcile_indeedhub_stack, start_indeedhub_backends, wait_for_indeedhub_dependencies_ready, indeedhub_api_dependency_dns_ready, indeedhub_required_aliases_present, repair_indeedhub_network_aliases, indeedhub_alias_present, patch_indeedhub_nostr_provider, and the INDEEDHUB_* consts. The manifests now carry everything these did: network_aliases (short hostnames), generated_secrets, dependencies, and the post_install nginx hook. So "indeedhub" + every member flows through the generic install_fresh/reconcile path — the frontend fresh-creates normally and runs its hook. (crash_recovery.rs's frontend-after-deps ordering guard is kept — it's beneficial startup ordering, not a blocker.) cargo check + release build green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 17:11:33 -04:00
archipelago	b1eea8c053	feat(indeedhub): manifest-driven 7-member stack, orchestrator-first (#20 phase 3) Author the IndeedHub stack as 7 manifests (postgres/redis/minio/relay/api/ ffmpeg + frontend) and route install_indeedhub_stack through the orchestrator first (immich pattern), falling back to the legacy installer only when the manifests aren't deployed. Data-preserving by construction — the manifests reproduce the live install exactly so an existing node ADOPTS rather than recreates: - container_name = the live hyphenated names the runtime already references (health_monitor tiers/deps, crash_recovery). - named volumes indeedhub-{postgres,redis,minio,relay}-data (not bind mounts). - dedicated indeedhub-net + network_aliases [postgres|redis|minio|relay|api] so the api/ffmpeg env hostnames and the frontend nginx upstreams resolve unchanged. - generated_secrets (indeedhub-db-password/-minio-password owned by their backends, indeedhub-jwt by the api) reuse the live /var/lib/archipelago/ secrets values (ensure_one no-ops on existing files; postgres pw is fixed at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept. The frontend carries the post_install hook (#20) that replaces the hardcoded patch_indeedhub_nostr_provider: strip X-Frame-Options, refresh nostr-provider.js from /opt/archipelago/web-ui, inject the <script> if absent, reload nginx — defensive/idempotent since indeedhub:1.0.0 already bakes these. Frontend manifest also corrected off its dead Next.js shape (health check now nginx :7777, tmpfs /run + /var/cache/nginx). Builds + unit-tested; live adoption/lifecycle verification on .228 next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 15:46:26 -04:00
archipelago	b94b61f640	feat(manifest): network_aliases — extra DNS aliases on a container's network Add `container.network_aliases: Vec<String>` (serde default, DNS-label validated) so a stack member can answer to short hostnames its peers bake in, beyond its own container name. Rendered in both runtime paths: - podman_client: merged (deduped) into the custom-network aliases array. - quadlet from_manifest: appended after the container name; emitted only for Bridge networks (slirp/pasta reject aliases). Needed for the indeedhub migration: its frontend nginx proxies to `api:4000` / `minio:9000` / `relay:8080`, so those members declare `network_aliases: [api|minio|relay]` to keep the short names resolvable on the dedicated indeedhub-net (vs. colliding generic aliases on archy-net). Also fixes 4 pre-existing from_manifest test failures (unrelated to this change, surfaced now that the quadlet suite runs green): test manifests used the long-invalid `network_policy: archy-net` (allowlist is isolated/bridge/host → moved to network_policy: isolated + container.network) and bind sources outside /var/lib/archipelago. Tests: container crate 53 pass; archipelago quadlet+alias 47 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 15:45:11 -04:00
archipelago	955c54b713	feat(hooks): post_install executor + install-path wiring (#20 phase 2) Add container::hooks::run_post_install — runs an app's declarative post_install hooks against its own running container: - Exec -> podman exec <container> <args…> (60s timeout-bounded) - CopyFromHost -> resolve src against allowlist roots (<data_dir>/<app> and /opt/archipelago), canonicalise + prefix-check (defeats symlink escape), then podman cp <abs-src> <container>:<dest> Best-effort + idempotent: a failed step is warned and skipped, never fails the install — matching the legacy patch_indeedhub_nostr_provider behaviour this replaces. Wired into install_fresh after the container is up, so it runs only on a freshly created container (not plain start), and re-applies on recreate-after-drift. 5 unit tests on resolve_copy_src (accept in-data-dir, reject absolute / traversal / missing / symlink-escape). cargo test -p archipelago green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 11:45:28 -04:00
archipelago	f160e0c404	fix(reboot): enable podman-restart.service at startup (--restart reboot-survival) Orchestrator-installed backends (immich, btcpay-db, …) run as plain podman `--restart=unless-stopped` containers until the Phase-3 Quadlet rollout flips use_quadlet_backends on. Nothing in the codebase enabled the user's podman-restart.service, so those containers had NO reboot-survival mechanism. Enable it (idempotent, best-effort) at orchestrator startup so unless-stopped containers come back after a reboot. Already applied manually on .228 (covers 31 containers incl. immich + btcpay); this codifies it fleet-wide. The deeper fix (render Quadlet for all orchestrator installs) remains the gated Phase-3 Quadlet-everywhere rollout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 08:23:19 -04:00
archipelago	d5ef45731a	fix(immich): restore canonical app_id "immich" (title + icon) After the manifest migration the launcher installed as "immich-server" (app_id), which has no catalog entry → showed the raw id and no icon. Rename the server manifest app_id immich-server→immich so it matches the catalog/curated "immich" entry (title "Immich", icon immich.png) and is recognised as a known launcher app (APP_CATEGORY_MAP) → stays in My Apps. immich_stack_app_ids now installs [immich-postgres, immich-redis, immich]; orchestrator.install bypasses package routing so there's no recursion with the "immich"→stack-installer mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 08:07:08 -04:00
archipelago	9e6c5370fc	feat(immich): manifest-driven stack via orchestrator — live-migrated on .228 Completes the immich migration off the legacy hardcoded install_immich_stack (podman run + sudo chown) to the registry-manifest + orchestrator path. Validated live on .228 (clean single set, healthy v2.7.4, data dir ownership correct). - install_immich_stack now tries install_stack_via_orchestrator(immich_stack_app_ids) first; legacy remains only as the no-manifests fallback. - immich-{postgres,redis,server} manifests corrected from live findings: * named by app_id (dropped container_name override) — using container_name spawned DUPLICATE containers (app_id-named install vs name-override reconcile) on the same PGDATA, which corrupted a postgres cluster. Server reaches its siblings via app_id aliases (DB_HOSTNAME=immich-postgres, REDIS=immich-redis). * immich-postgres data_uid 100998:100998 (postgres drops to container 999 → host 100998 under rootless; verified the fresh dir is chowned correctly). * immich-server version "release"→"2.7.4" (manifest validation requires a digit; the bad version made the manifest silently skip → partial orchestrator install → legacy fallback → the duplicate corruption above). - HARDEN install_stack_via_orchestrator: only fall back to the legacy installer when NOTHING was installed yet. An "unknown app_id" AFTER a member is up now errors instead of double-creating containers on shared data (the corruption root cause). - Strict the all-manifests round-trip test: fail (not skip) on any invalid shipped manifest — this gap let the bad immich-server version through. Known follow-up (pre-existing, platform-wide): orchestrator-installed backends (immich, btcpay-db) run as podman --restart, not Quadlet, and podman-restart.service is disabled on .228 → reboot-survival gap independent of this migration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 07:08:45 -04:00
archipelago	7bfbe8fe40	feat(registry-manifest): phase 2 — publisher embeds manifests into signed catalog generate-app-catalog.sh gains opt-in EMBED_MANIFESTS=1: embeds each apps/<id>/manifest.yml into its catalog entry's `manifest` field (whole document, top-level app: preserved — exactly what the Rust side deserializes). Default off so routine catalog regen is unchanged during the migration window; turn on deliberately, then sign via the existing release-root ceremony. Verified: default embeds 0; EMBED_MANIFESTS=1 embeds 40 manifests (generated_secrets preserved). Adds a round-trip guard test: every shipped apps/*/manifest.yml must deserialize + validate through catalog_manifest_to_overlay (image apps accepted, build apps defer to disk) — catches schema drift between disk manifests and the catalog path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 05:46:17 -04:00
archipelago	220666d3a9	feat(registry-manifest): phase 1 — orchestrator consumes manifests from signed catalog Workstream B phase 1 (node-side consume). The signed app-catalog can now carry a full manifest per entry; the orchestrator overlays it over the disk manifest (origin-wins) with disk as the migration fallback. Moves apps toward registry-distributed manifests with no OTA-shipped disk file. - app_catalog: `manifest: Option<Value>` on AppCatalogEntry (forward-compatible, covered by the existing release-root signature over the raw JSON); `catalog_manifest_values()` accessor. - prod_orchestrator: `load_manifests` overlays catalog manifests after the disk walk; `catalog_manifest_to_overlay()` returns None (→ disk fallback) on unparseable value / app-id mismatch / failed validate() / build source (build contexts aren't registry-distributed yet — phase 1 is image-only). - manifest_dir stays PathBuf (build-only field); image-only apps never read it. - 6 unit tests; compiles clean. No-op until a catalog embeds a manifest, so existing nodes are unaffected. See docs/registry-manifest-design.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 05:30:38 -04:00
archipelago	03a4ee1b30	feat(container): manifest-declared generated secrets + companion/quadlet hardening Generated-secrets system: apps declare `generated_secrets` in their manifest (kinds hex16/hex32/bcrypt); `container::secrets::ensure_generated_secrets` materialises them 0600/rootless in resolve_dynamic_env — idempotent and self-healing (recovers wrongly root-owned secrets with no privilege). Replaces per-app Rust (deletes ensure_fmcd_password). fedimint-clientd/gateway manifests now declare fmcd-password / fedimint-gateway-hash. companion.rs: rebuild the auto-built :latest image when its build context changes (staleness check) so baked-in fixes (e.g. guardian-UI CSS) actually reach nodes. quadlet.rs: skip PublishPort under Network=host (podman rejects the combo, exit 125) + regression tests. UI: "Fedimint Guardian" rename, fedimint-clientd/nostr-rs-relay/meshtastic tagged as Services (headless backends), gateway icon fallback. Deployed + verified on .228 (generated-secrets fixed fedimint-gateway start; grafana/strfry orphan crash-loop units removed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-21 05:11:07 -04:00
archipelago	db7d424bff	feat(content): owned-content persistence + Fedimint paid downloads, fmcd caps fix, FIPS warm-path perf Buyer-side paid downloads now persist: purchases are cached on disk (content_owned.rs) keyed by (seller onion, content_id), the gallery shows an "Owned" badge unblurred, and items view/play in-app from the local cache with no re-payment or reliance on a browser download (which silently failed on the mobile companion). New RPCs content.owned-list / content.owned-get. Validated e2e .116<-.198 (paid 100 sats via Fedimint, 166KB jpeg returns, survives restart). fedimint-clientd manifest: restore the standard container capability set (CHOWN/DAC_OVERRIDE/FOWNER/SETUID/SETGID) so fmcd's startup chown of an existing-federation /data succeeds instead of dying EPERM (#7). Confirmed the orchestrator applies these to the running container. FIPS perf: tighten the supervisor warm-path keepalive 45s -> 25s so peer paths stay inside the ~30-60s NAT cold window. Dials now reliably land on FIPS instead of re-punching and falling back to Tor. Measured to the same peer: cloud browse 18-22s -> 0.4s; full Fedimint paid download 29s -> 11s (residual is the seller-side guardian reissue round-trip). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 18:58:52 -04:00
archipelago	63b98599e8	Revert "fix(fedimint): run fmcd with seccomp=unconfined so its DHT can start (#7)" This reverts commit 409543c41e78025354acbdde5ffc6445895d4508.	2026-06-20 14:37:24 -04:00
archipelago	409543c41e	fix(fedimint): run fmcd with seccomp=unconfined so its DHT can start (#7) fmcd crash-looped "Operation not permitted (os error 1)" on .116 (kernel 6.12.74): the default rootless seccomp profile blocks a syscall its Mainline-DHT / iroh transport needs, so the REST API never came up (:8178 → HTTP 000) and federations couldn't be joined. Verified: with seccomp=unconfined fmcd boots and answers /v2/* (HTTP 401 instead of dead). fmcd works on other nodes, so this is kernel/seccomp-specific — but the relaxation is safe for an outbound-networking daemon and harmless where not needed. - new `security.seccomp_unconfined` manifest flag (SecurityPolicy); - libpod backend sets `seccomp_profile_path: "unconfined"` (== --security-opt seccomp=unconfined); quadlet backend emits `SeccompProfile=unconfined`; - enabled in apps/fedimint-clientd/manifest.yml. NOTE: manifests live on-disk at /opt/archipelago/apps/<id>/manifest.yml, so the node needs the updated manifest deployed + the fmcd container recreated to apply. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 13:08:13 -04:00
archipelago	12f54e390d	feat(wallet): ecash pay confirmation screen + auto-refund on failed sale (#3) - PeerFiles: new confirmation step after "pay from ecash" — shows the amount and which wallet will be spent (Cashu/Fedimint) with balances, lets the user switch backends, and a styled Confirm button. The chosen backend is passed to the payment so it spends exactly what was confirmed. - content.download-peer-paid: accept `method` (cashu|fedimint) to honor the confirmed choice; log the backend + outcome; backend-specific rejection errors ("not in the same Fedimint federation" / "doesn't accept your Cashu mint"). - AUTO-REFUND: a minted token whose sale fails (peer unreachable, rejected, or error) is now reclaimed (fedimint reissue / cashu receive) so the buyer no longer loses the spent ecash — fixes the stuck-Fedimint-notes report. - wallet.ecash-balance already reports cashu_sats/fedimint_sats/total_sats which the confirm screen uses to pick/show the covering wallet. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 12:16:02 -04:00
archipelago	a6957a48f7	fix(netbird): wait for OIDC discovery before reporting install done (#10) Right after install the dashboard SPA opens and, if it loads before NetBird's embedded OIDC provider is serving, caches a bad auth state — the user appears logged-in but can't log out until it self-corrects. Container "running" != OIDC ready, so gate the install's Done phase on the management server's /oauth2/.well-known/openid-configuration answering (best-effort, 60s cap, never fails the install since the stack is already up). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 08:57:37 -04:00
archipelago	8f06d88fbf	feat(wallet): pay for peer files from BOTH Cashu and Fedimint ecash (#3) Paying for a peer file minted a Cashu-only token, so a node whose ecash balance lived in Fedimint couldn't pay even with funds. Now both backends are tried: - payer (content.download-peer-paid): mint a Cashu token first; on failure fall back to spending Fedimint notes. Only error if BOTH backends can't cover it. - seller (verify_and_receive_payment): accept Fedimint notes as well as Cashu — anything not starting with "cashu" is redeemed via reissue_into_any. - new fedimint_client::spend_from_any() — spend from whichever joined federation has the balance, returning the notes + federation id (mirrors reissue_into_any). - wallet.ecash-balance now also reports fedimint_sats + combined total_sats; the pay-for-file pre-check uses the combined total so a Fedimint-funded node isn't wrongly blocked. Compiles (cargo check + vue-tsc). Live cross-node federation validation pending (dual-ecash phase 6) — needs two nodes sharing a federation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 08:13:23 -04:00
archipelago	f92e442bfc	fix(mesh): collapse cross-transport twin contacts into one conversation (#12) A node reachable both over LoRa and federation has two MeshPeer rows (radio twin: low contact_id + firmware key; federation twin: high contact_id + archipelago key), and messages key by peer_contact_id split across the two ids — so opening one twin shows an empty thread (the .120->.89 symptom). - backend: new group_peer_twins() helper groups peers by arch_pubkey_hex (set on BOTH twins by bind_federation_twins), keeps the radio id as the mesh-first send target, and unions messages across all twin ids. Wired into conversations.list / conversations.messages / mesh.contacts-list. +3 unit tests. - frontend: the live chat list merges client-side (mergedPeers) and matched twins by the "Archy-z6Mk..." advert prefix, which the Meshtastic device rename broke (radio now advertises the server name). Merge by arch_pubkey_hex instead, which the backend reliably sets on both twins. Expose arch_pubkey_hex on MeshPeer. - fix unrelated stale test: EcashTransaction test missing the new `kind` field. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 08:01:14 -04:00
archipelago	d00d1b20d7	fix(mesh): rename Meshtastic radio to the node's server name Meshtastic device rename was a no-op — set_advert_name only updated an in-memory field and never told the radio, so the device kept its firmware default ('Meshtastic xxxx') and wasn't findable from external Meshtastic apps. MeshCore already renamed correctly (CMD_SET_ADVERT_NAME); this brings Meshtastic to parity. Send an AdminMessage{set_owner=User{long_name,short_name}} to the locally connected node (admin packet to our own node_num on the ADMIN_APP port). Local serial admin needs no session passkey, matching the official client. long_name = server name (<=39 chars); short_name = first 4 alphanumerics, upper-cased. Verified on real hardware: .120 -> 'Archy-X250-EXP', .5 -> 'Archy-X250-Beta' (name read back from the radio after reconnect). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-20 06:04:22 -04:00
archipelago	63611a4453	fix(mesh): honour explicit !ai allowlist for unauthenticated stock clients A stock meshcore client (e.g. a phone) can't sign our typed envelopes, so it is never 'authenticated' — which meant ticking it as an allowed assistant contact had no effect and !ai stayed denied. The explicit per-contact allowlist is a deliberate operator opt-in for a specific key, so match it regardless of authentication, keyed on the asker's resolved identity (bound archipelago key, else firmware routing key — how meshcore addresses the contact). The spoofable federation-trust-list match still requires authentication. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 16:43:30 -04:00
archipelago	7831e68d13	fix(wallet): redeem across all federations, unified ecash history, fmcd healthcheck - reissue_into_any now tries the UNION of the local registry AND fmcd's live joined set (/v2/admin/info) before failing, so a valid Fedimint token isn't wrongly rejected when the registry has drifted. On all-fail it returns a friendly message: notes already redeemed into this wallet (funds safe) vs didn't match any connected federation. - Unified transaction history: a local Fedimint tx log (recorded on each successful redeem) is merged with the Cashu history in wallet.ecash-history, newest-first, each tagged kind=cashu|fedimint. Previously a Fedimint receive appeared nowhere. - fedimint-clientd healthcheck -> type:tcp. It was probing /health, which fmcd doesn't serve (only /v2/*), pinning the container in (starting) forever; the TCP probe is skipped by the Quadlet renderer (host-side lifecycle verifies), so it reports running. Cosmetic for ecash, which worked throughout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 16:43:29 -04:00
archipelago	a0b80dd27d	fix(mesh): authenticate !ai over LoRa via federation-twin binding + signed Text A !ai (or any typed message) from a trusted, federated node was denied when it arrived over the radio. The radio half of a node that is also a federation peer carried no archipelago identity (identity adverts are no longer broadcast on the public channel), so the trusted_only gate and signature verification had no key to check the asker against — and the same node showed up as two contacts (a radio twin + a federation twin). - bind_federation_twins(): correlate a radio contact with its federation twin by exact, case-insensitive advert_name and copy the federation peer's arch_pubkey_hex/did/x25519 onto the radio record. Called from upsert_federation_peer and refresh_contacts. Ambiguous names (held by >1 federation peer) are skipped. This is only a CANDIDATE key — security is unchanged: the inbound envelope signature must still verify against it. - send_message now signs the typed Text envelope (new_signed) so a radio !ai authenticates against the bound key. A meshcore node merely named like a trusted node cannot forge the signature, so it is still denied. Receiver-side verification (handle_typed_envelope_direct) and federation-trust matching (is_sender_allowed) already existed; this supplies the missing key binding and signature. Also resolves the radio/federation duplicate-contact display for same-named nodes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 13:57:50 -04:00
archipelago	75e470bfa4	fix(mesh): mesh-preferred message routing with FIPS/Tor fallback Messages to a federated peer that is out of LoRa range (e.g. on another continent) were dropped into the radio with no fallback, or hung on a dead FIPS path before reaching Tor — so they never arrived. - Route a radio contact over the federation transport (FIPS->Tor) when it is the same node as a federated peer (known archipelago identity -> onion) AND it is not currently reachable over the radio. Reachable radio peers stay on the mesh (preferred); oversized/file envelopes still always take federation. - Resolve the onion via the archipelago identity key (arch_pubkey_hex), not the firmware routing key, so a radio contact maps to its nodes.json onion. - Add .fips_timeout(8s) to the federation message POST so an unreachable FIPS overlay fast-fails to Tor (~3-5s) instead of burning the 120s budget. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 10:09:14 -04:00
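
Note on commit 955c54b713 (post_install executor): the message describes resolving a CopyFromHost source against allowlist roots, canonicalising it, and prefix-checking the result to defeat symlink escape. A minimal sketch of that kind of check follows. It is not the repository's code: the signature, the error kinds, and the `roots` parameter are assumptions made for illustration, and only the function name `resolve_copy_src` comes from the commit message.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical sketch of an allowlist-rooted source resolver.
/// `roots` stands in for the allowlist named in the commit
/// (<data_dir>/<app> and /opt/archipelago); the real signature may differ.
pub fn resolve_copy_src(src: &str, roots: &[PathBuf]) -> io::Result<PathBuf> {
    let rel = Path::new(src);

    // Reject absolute paths and any `..` component before touching the filesystem.
    let escapes = rel.is_absolute()
        || rel.components().any(|c| matches!(c, Component::ParentDir));
    if escapes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "src must be a relative path without `..`",
        ));
    }

    for root in roots {
        // Canonicalise the root as well, so the prefix check compares like with like.
        let Ok(root) = root.canonicalize() else { continue };
        // canonicalize() resolves symlinks and fails if the path does not exist.
        let Ok(candidate) = root.join(rel).canonicalize() else { continue };
        // A symlink that points outside the root no longer has the root as a prefix.
        if candidate.starts_with(&root) {
            return Ok(candidate);
        }
    }

    Err(io::Error::new(
        io::ErrorKind::NotFound,
        "src not found under any allowed root",
    ))
}
```

Under these assumptions, a call such as `resolve_copy_src("nginx.conf", &roots)` returns the canonical path only when the file exists beneath one of the roots. Absolute, traversing, missing, and symlink-escaping sources all return an error, which matches the cases the commit's unit tests are said to cover.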


618 Commits