1495 Commits

Author SHA1 Message Date
archipelago
a57ae388ec fix(mesh): restore Meshtastic inbound stream after radio reboot
archy went deaf to inbound LoRa packets after every config write.
A config write (region/channel/owner) reboots the radio, which resets
the firmware PhoneAPI to STATE_SEND_NOTHING; it won't stream received
packets again until the client re-sends want_config. archy ignored
FromRadio.rebooted (field 8) so never resubscribed — which is why old
messages only arrived after a full restart (restart = fresh want_config).

- meshtastic.rs: handle FROM_RADIO_REBOOTED -> set pending_reinit;
  try_recv_frame re-sends want_config to resubscribe the packet stream.
  Add send_keepalive (bare heartbeat) and pin modem_preset=LONG_FAST in
  set_lora_region so all radios share frequency.
- listener/session.rs: MeshRadioDevice::send_keepalive; 10s sync_timer
  sends a keepalive each tick (insurance vs 15-min idle serial close).
- mod.rs send_message: device-aware send — Meshtastic archy peers get a
  plain TEXT_MESSAGE_APP DM (firmware PKC E2E); Meshcore archy peers keep
  the typed envelope (no meshcore regression).

Verified: .198->.228 directed DM arrives as RECEIVED enc=True
peer="Arch Optiplex"; all 3 nodes (.116/.198/.228) + 3ccc hear each
other. Binary 737b16c3 deployed+active on all three.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:44:31 -04:00
archipelago
fbfeeeb0f5 fix(mesh): native E2E DM for archy↔archy text + software radio-reboot
- send_message now sends archy↔archy plain text as a native TEXT_MESSAGE_APP
  DM (firmware PKC-encrypts E2E), not wrapped in the binary typed envelope
  that silently broke archy↔archy LoRa delivery. Archy peers' Sent rows are
  marked encrypted so the E2E pill shows; rich typed msgs still use the
  typed-wire path.
- Add a software radio-reboot to recover a wedged/RX-deaf radio without
  physical access (and for the Device-tab settings panel): driver reboot()
  via AdminMessage reboot_seconds=97 (verified vs meshtastic/protobufs),
  MeshCommand::RebootRadio, MeshService::reboot_radio, RPC mesh.reboot-radio.
- Handoff doc: docs/SESSION-1.8.0-OTA-PROGRESS.md "RESUME HERE" — RF link is
  the proven blocker (radios not hearing each other); modem_preset mismatch
  is the prime suspect; on-device Meshtastic-app check + fix plan documented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 10:39:34 -04:00
archipelago
b4531bb4fc fix(mesh): enforce LoRa-only off-grid labels 2026-06-30 06:22:45 -04:00
archipelago
2ac0711f8e fix(ui): refresh mesh transport labels after send 2026-06-30 06:05:41 -04:00
archipelago
a91814641e fix(mesh): set Meshtastic hop limit and show LoRa pill 2026-06-30 05:59:53 -04:00
archipelago
c2c4b5af7d merge: demo build updates
# Conflicts:
#	neode-ui/src/stores/appLauncher.ts
#	neode-ui/src/views/AppSession.vue
2026-06-30 05:22:42 -04:00
archipelago
daf750688d merge: mesh multiversion and transport pills
# Conflicts:
#	core/archipelago/src/mesh/listener/decode.rs
#	core/archipelago/src/mesh/meshtastic.rs
2026-06-30 05:19:58 -04:00
archipelago
4b7cbf2b5e merge: bitcoin version bulletproof and OTA work 2026-06-30 05:08:27 -04:00
archipelago
df9d3a55be integration: preserve deployed 1.8.0 OTA work 2026-06-30 05:08:17 -04:00
archipelago
7b0748c868 fix(mesh): respect the radio's flashed LoRa region (don't force ours)
ensure_lora_region previously force-overrode the device's region with the
mesh-config region (EU_868) whenever they differed — which would shove a US/ANZ
user's radio onto EU_868: an illegal band that also cuts it off from its local
mesh. Off-the-shelf interop must respect whatever region the user flashed.

Now: a radio that already reports a REAL region (US, EU_868, ANZ, …) is left
untouched. We only set a region when the device reports UNSET (a fresh radio is
RF-silent and can't mesh at all), using the operator-configured region as the
fallback. Unknown/None (never reported) is also left alone. Pairs with the
default-channel change so a meshtastic archy node behaves like a stock device.

cargo check green (built into the same binary as the channel fix).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 08:36:04 -04:00
archipelago
810127fd3e feat(mesh): meshtastic off-the-shelf interop — default channel + private archipelago
Make a meshtastic-equipped archy node work like a stock Meshtastic device AND
keep the private archy group, instead of being isolated on a custom primary:
- slot 0 (PRIMARY)  = the DEFAULT public channel (empty name + default key) →
  interoperates with every off-the-shelf device on LongFast and picks up
  default-channel users; our NodeInfo broadcasts ride here like normal.
- slot 1 (SECONDARY) = "archipelago" (deterministic psk) → private archy↔archy.

Previously the driver set "archipelago" as the PRIMARY, isolating archy from the
public mesh. Now ensure_channel writes at most one channel per call (default
primary first, then archipelago secondary), reusing the existing reboot→
reconnect→re-check loop so it converges in ≤2 cycles without reboot-looping;
primary_is_default() accepts the default key in 1-byte or expanded form so a
stock radio is never needlessly rewritten. set_channel generalized to
(index, name, psk, role); want_config parse tracks both slots.

MeshCore needs no change — it never overrides channels (ensure_channel is a
no-op) and already rides MeshCore's default Public channel off the shelf.

cargo check green. NEEDS radio verify on .116/.198 (default-channel RX + archy
group on the secondary). Channel provision cap (3) covers the 2-write migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 07:40:10 -04:00
archipelago
067002b04b Merge branch 'bitcoin-version-bulletproof' into mesh-multiversion-integration 2026-06-29 06:45:50 -04:00
archipelago
20f762cb2c feat(fips): auto-peer LAN-discovered federation nodes directly over FIPS
Mesh/federation messages between co-located nodes were always falling back to
Tor because the FIPS overlay had no direct peering — every node depended on the
global anchor's spanning tree, and when that anchor link flaps a node is
isolated and all FIPS dials time out. (Diagnosed live on .116/.198: pure-FIPS
direct peering over UDP 8668 fixes it — 2.5ms vs timeout.)

Generalize the manual fix: in the existing 5-min FIPS seed-anchor apply loop,
also auto-connect every federation peer the PeerRegistry knows both a LAN
address AND a FIPS npub for, dialing its FIPS UDP transport (port 8668) at its
LAN IP via the same idempotent `fipsctl connect` path (new
anchors::lan_fips_anchors). This is FIPS's own transport over the LAN — NOT
Tailscale, NOT the HTTP/LAN messaging port. Transient (recomputed each tick from
live mDNS discovery, never persisted) so changing IPs self-correct. Remote peers
with no LAN address are untouched (still routed via the anchor).

Registry Arc hoisted out of the transport-init block so the loop can read
all_peers(). cargo check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:42:18 -04:00
archipelago
11155055aa feat(mesh): meshtastic PKI E2E pill — surface pki_encrypted on received DMs
The synthetic meshcore-style frame the meshtastic driver builds can't carry the
radio's PKI-encryption status, so received meshtastic DMs never lit the E2E pill.
Thread it out-of-band: the device records `last_rx_encrypted` (= packet
pki_encrypted) when it yields a text frame; the session loop reads it via
`take_rx_encrypted()` right after dispatch and stamps the just-stored received
message E2E (dispatch::stamp_received_encrypted, monotonic-id keyed). Meshcore
returns false here (its E2E is derived in the frames decrypt path). Pure
out-of-band signal — no change to the shared meshcore wire format.

Built + deployed live in binary d937814e on .116/.198. cargo check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:25:01 -04:00
archipelago
f4f45c1a09 docs: mark .228 reindex finish/verify as other-agent owned
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:04:01 -04:00
archipelago
ed1352d3a3 docs+catalog: bitcoin multi-version rollout handoff + reproducible generator
- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
  (29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core
  (adds 29.2 + a `latest` entry → newest); generator forces top-level
  `version` == the default entry's version (the 169ff2e2 invariant) so
  regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes,
  fixes, current .228 state, the coordinated fleet-rollout steps (incl.
  :latest repoint sequencing / fleet-safety), reindex finish procedure, and
  the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 06:02:24 -04:00
archipelago
095a76cd20 fix(bitcoin): bulletproof multi-version switching (Knots & Core)
Three stacked bugs made "switch version" silently fail / crash-loop, and
the data-access mismatch corrupted a node's index during recovery attempts.

Backend renderer:
- sync_quadlet_unit ignored the per-app pinned version and re-rendered the
  quadlet with the manifest's :latest every reconcile tick, reverting any
  switch. Factor the install-time catalog/pin resolution into a shared
  resolve_catalog_image() and call it in BOTH install_fresh and
  sync_quadlet_unit.
- The renderer folded manifest `entrypoint: ["sh","-lc"]` into Exec=, which
  only worked when the image entrypoint was a passthrough shell wrapper. The
  versioned images use ENTRYPOINT ["bitcoind"], so Exec=sh -lc ... became
  `bitcoind sh -lc ...` and crash-looped. Emit a real Entrypoint= override;
  exec_changed now also compares Entrypoint=.

Images:
- Build all bitcoin images (Core + Knots, every version) as container-root
  (USER removed) like the legacy :latest image. Chain data is owned by the
  data_uid (container uid 102); root reads it via CAP_DAC_OVERRIDE (granted in
  the manifest). A non-root USER (the previous uid 1000) can't read existing
  chain data → "Error initializing block database". Still fully rootless:
  container-root maps to the unprivileged host service user.

Catalog:
- bitcoin-knots versions[]: 29.3.knots20260508/20260507/20260210 +
  29.2.knots20251110, "latest" tracking newest.
- bitcoin-core versions[]: add 29.2 + a "latest" entry. All images rebuilt
  root and published to the mirror.

Frontend:
- AppSidebar version dropdown: rename the latest option to "Always use the
  latest version" (no v prefix), fix right padding, and guarantee the current
  selection matches a real option (was rendering blank).
- New InstallVersionModal: full-screen version chooser shown from the App
  Store / Discover install button for multi-version apps (Bitcoin Knots/Core),
  app icon + "Install <name>", latest pre-selected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 05:46:04 -04:00
archipelago
3c7c04a662 fix(mesh): meshtastic receive — drain frame batch per poll + rx diagnostics
Addresses the open Meshtastic parity bug (project_meshtastic_parity): the
running driver received nothing (`mesh.messages` stayed []) though the radio
got the packets and sends worked.

Root-cause candidate: `try_recv_frame` decoded ONE serial frame per poll and
returned Ok(None) for every non-text FromRadio frame, so the session loop slept
50ms between frames. Under Meshtastic's frequent NodeInfo/telemetry stream a
received text packet queued behind them, and read_from_radio's 64KB buffer cap
could drain (drop) it before it was ever decoded — reception silently dead while
sends kept working.

- try_recv_frame now drains a bounded batch (64) per poll, processing each
  frame's side effects and returning the first inbound text frame, so a text
  packet is decoded the same poll it arrives and the buffer never grows enough
  to hit the lossy cap. Bounded so a continuous flood still yields to select!.
- packet_to_inbound_frame logs every decoded packet (from/portnum/payload_len)
  and a "did not parse (dropped)" case, so one live radio pass is conclusive.

The rest of the decode path was verified correct by inspection (FROM_RADIO_PACKET
=2, wire-type-5 handled, parse_mesh_packet sound, 60s heartbeat present) — not a
parse bug. cargo check green. NEEDS a live radio pass on a rig that isn't .228
(off-limits: bitcoin testing) to confirm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 05:04:09 -04:00
archipelago
11038cdcc9 feat(mesh,ui): per-message transport pill (Mesh/FIPS/Tor) + fix E2E pill
Adds a per-message transport badge to archy↔archy mesh chats and fixes the
long-broken E2E badge — both meshcore and meshtastic, styled like the existing
E2E pill.

Transport pill:
- New `MeshMessage.transport` ("lora"/"fips"/"tor"), surfaced in the UI beside
  the E2E badge (Mesh.vue transportLabel() → Mesh/FIPS/Tor, mesh-styles.css).
- Sent LoRa → "lora"; sent federation → finalized to the real leg ("fips"/"tor")
  once the background send resolves (req.send_json transport), via an id-keyed
  store update.
- Received: a post-dispatch stamp on handle_typed_envelope_direct's output
  (monotonic ids) tags both transports without threading through all 20 typed-
  dispatch sites — radio wrapper stamps "lora", federation injector stamps the
  peer's last_transport ("fips"/"tor", default tor; the inbound HTTP carries no
  FIPS-vs-Tor signal).
- Plain native/channel LoRa frames → "lora"; channel broadcasts stay non-E2E.

E2E pill fix:
- `encrypted` was hardcoded false at every MeshMessage construction site, so the
  UI badge (Mesh.vue `v-if="msg.encrypted"`) never showed. Now: federation
  envelopes are E2E (identity-signed over an encrypted transport); the meshcore
  native-DM receive path already had a real `encrypted` flag (now also tagged
  with transport). meshtastic-PKI radio E2E flag threading is a noted follow-up.

Backend cargo check + frontend vue-tsc build both green. Needs a live radio +
multi-transport pass on .116/.228 to confirm end-to-end (see
project_transport_pill / project_meshtastic_parity).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 04:29:25 -04:00
archipelago
169ff2e2cd fix(bitcoin): knots catalog default must equal top-level version
The knots versions[] marked 29.3.knots20260508 as default while the
top-level catalog version is the floating 'latest' tag — violating the
generator's own invariant (default:true MUST equal the top-level version
so selecting it un-pins / tracks latest). Live effect via package.versions:
catalog_default_version='latest' so the UI-highlighted default actually
PINS+recreates (opposite of un-pin) and 'latest' was unreachable from the
Version & Updates card.

Add a 'latest' default entry (== the manifest's floating tag) and keep
29.3.knots20260508 as a pinnable option. Verified on .228: package.versions
now returns default=latest with 2 selectable versions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 19:56:49 -04:00
archipelago
da20f67462 Merge bitcoin-multi-version: multi-version support for Core & Knots
Integrate the bitcoin-multi-version feature (commit 6aa74c73): per-node
choice/pin/switch of Bitcoin Core & Knots versions with auto-update toggle —
catalog versions[] schema, install-time selection, package.versions +
package.set-config RPCs, hourly per-app auto-update tick, build-bitcoin-image.sh
(GPG+SHA verified rootless image builder), and UI (version select + Version &
Updates card). Catalog regenerated; preserves the mempool 127.0.0.1 health fix.

Not yet live-verified on .228 — gate any tagged release on that per CLAUDE.md.
2026-06-28 18:48:38 -04:00
archipelago
6aa74c7386 feat(bitcoin): multi-version support for Core & Knots (install/switch/pin/auto-update)
Lets a node runner choose which Bitcoin Core / Knots version to install
(latest pre-selected), then switch, pin, or opt into auto-update from the
app's interface — all manifest/catalog-driven, rootless, signed-registry,
zero-data-loss. Motivated by upcoming BIP-110 signalling: runners need a
real choice of software version.

Backend:
- version_config.rs: per-app pin + auto-update persistence (atomic, merge-
  preserving), downgrade detection, auto-update enumeration (+ unit tests).
- app_catalog.rs: CatalogVersion / versions[] schema, catalog_versions(),
  catalog_image_for_version() (same-repo guard); a pin suppresses the update
  badge.
- prod_orchestrator.rs: pinned version wins over the catalog default on every
  install/recreate.
- install.rs: install-time `version` param persisted (default = unpinned).
- set_config.rs: package.versions (read) + package.set-config (write) RPCs;
  downgrade is gated behind explicit confirm (warn + confirm + allow).
- update.rs/main.rs: hourly per-app auto-update tick via the orchestrator
  (opt-in, pin-respecting); fix handle_package_update to be non-fatal for
  orchestrator-managed apps lacking a catalog primary image (bitcoin-core).

UI:
- MarketplaceAppDetails.vue: install-time version selector (shown when an app
  offers >=2 versions).
- appDetails/AppSidebar.vue: "Version & Updates" card (switch / pin / auto-
  update toggle / downgrade warning), per app.
- rpc-client.ts + en.json: RPC methods, types, strings.

Phase 0 image pipeline:
- scripts/build-bitcoin-image.sh: download official tarball + SHA256SUMS(.asc),
  verify SHA-256 + pinned-maintainer OpenPGP signature (fail-closed), build a
  minimal rootless image, smoke-test, tag + push.
- apps/bitcoin-core/Dockerfile rewritten (drops stale community base);
  apps/bitcoin-knots/Dockerfile added.
- generate-app-catalog.sh: emit curated versions[]; published + catalog now
  offers Core 25.2/26.2/27.2/28.4/29.3/30.2/31.0 + Knots 29.3.knots20260508.

docs/bitcoin-multi-version-design.md: live progress tracker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 18:46:17 -04:00
archipelago
3cea7dd6c5 test(phase3): fix Phase-3 quadlet gates — define fail(), drop stale Notify=healthy assert
Two Phase-3 bats suites used `fail` (a bats-assert helper) but bats-assert
isn't installed on the alpha fleet (only bats-core), so every tripped
assertion crashed with `fail: command not found` (status 127) instead of
reporting a real pass/fail. Define the same minimal `fail() { echo ...;
return 1; }` the other suites already use (see mempool.bats). Without this
the gates were silently non-functional.

Also rewrite the obsolete "HealthCmd= implies Notify=healthy" assertion in
use-quadlet-backends-install.bats. Phase 3.4's Notify=healthy was
deliberately reverted: gating `systemctl start` on health hung boot
reconciliation for dependency-waiting apps (fedimint idles until Bitcoin
IBD; lnd until macaroon unlock), leaving units stuck "deactivating". The
renderer now emits HealthCmd= for Podman's health state but TimeoutStartSec=0
and NO Notify=healthy (quadlet.rs render() + contains_stale_health_gate()).
The test now asserts the current invariant: no backend unit gates start on
health.

Verified on the .228 canary node (ARCHIPELAGO_USE_QUADLET_BACKENDS=1):
use-quadlet-backends-install 6/6, backend-survives-archipelago-restart 3/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 16:09:05 -04:00
archipelago
d7c6f8c348 fix(mempool): health-check 127.0.0.1 not localhost (stops false-unhealthy loop)
The archy-mempool-web health_check endpoint used http://localhost:8080.
Inside the frontend image, wget resolves `localhost` to ::1 (IPv6) first,
but nginx binds 0.0.0.0:8080 (IPv4) only -> the baked HealthCmd gets
"connection refused" every probe -> container is perpetually unhealthy ->
the reconciler recreates it forever (observed on .228: mempool container
re-Started every ~3 min, Health=unhealthy). Proven live: in-container
`wget http://localhost:8080/` = refused, `wget http://127.0.0.1:8080/` = OK.

Pin the probe to 127.0.0.1 so it matches nginx's IPv4 bind. Updated both
the source manifest and the embedded copy in releases/app-catalog.json
(the catalog overlay wins over the disk manifest on fleet nodes, so the
catalog copy is the one that actually reaches .228).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 15:09:34 -04:00
archipelago
83344b9f3a fix(orchestrator): drop legacy mempool umbrella manifest on catalog-driven nodes
The split-mempool-stack guard that skips the legacy monolithic `mempool`
manifest (whose container collides with its split-stack frontend member
`archy-mempool-web`) only ran over DISK manifests. On catalog-driven nodes
(no disk manifests — e.g. the Phase-3/registry-manifest path), the legacy
`mempool` manifest arrives via the registry-catalog overlay AFTER that
guard, so both `mempool` and `archy-mempool-web` end up owning container
`mempool` and rewrite+restart each other forever ("port binding drift" /
"network alias drift" loop observed on .228, leaving mempool down).

Enforce the guard once more over the merged (disk + catalog) manifest set:
drop the `mempool` umbrella whenever all three split members are present.
Installing `mempool` assembles the split stack, so `archy-mempool-web`
owns the frontend container either way.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 14:04:41 -04:00
archipelago
05c22b6085 fix(mempool): correct frontend container port 4080->8080 (stops restart loop)
The mempool manifest + embedded catalog declared the frontend container
port as 4080, but mempool-frontend nginx listens on 8080 (the stack
creates it as -p 4080:8080 with FRONTEND_HTTP_PORT=8080, see
api/rpc/package/stacks.rs). So every reconcile rendered the quadlet as
PublishPort=4080:4080, disagreed with the working 4080:8080 container,
and restarted it ("port binding drift" -> "host port 4080 did not become
reachable within 5s" -> "host listener disappeared; restarting") in a
perpetual loop on .228. Correcting the manifest container port to 8080
makes the rendered quadlet match reality so the drift/restart loop stops.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 13:49:54 -04:00
archipelago
6734947c3e fix(fmcd): cap CPU + watchdog-restart the iroh relay hot-loop
On NAT'd nodes that can reach the iroh federation neither directly nor
via iroh's public relays, fmcd's embedded iroh networking enters a
relay/hole-punch reconnect hot-loop that pegs its entire CPU allotment
indefinitely (observed ~1 core sustained for 4 days on a Tailscale node,
while LAN nodes that reach the guardian directly stay <3%). fmcd 0.8.0
exposes no iroh/relay knobs, so:

- fmcd-run now samples fmcd's own CPU and restarts it when it stays near
  its allotment for ~15 min (a restart demonstrably clears the stuck iroh
  state; real work is bursty and never flat-pegs a core for minutes).
- Lower cpu_limit 1 -> 0.25 core so a stuck instance can't starve the
  node (steady-state is <3% of a core; joins are brief).

Ships as fmcd:0.8.1 (launcher-only rebuild, same fmcd binary). Bumped the
image pin + cpu_limit in the manifest, image-versions.sh, the embedded
catalog manifest (releases/app-catalog.json), and the UI catalogs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 12:19:27 -04:00
archipelago
4519dbf04f fix(orchestrator): render manifest certs on the adopted-running reconcile path
WS-F #10: a netbird reinstall that adopts a leftover running container
skipped ensure_manifest_certs, so when its data dir was wiped the self-
signed tls.crt/key were never regenerated; the next nginx.conf rewrite +
restart then died on the missing cert (proxy 502, login broken). The
Running branch of ensure_running_with_mode now calls ensure_manifest_certs
before ensure_manifest_files, mirroring prepare_for_start's certs-before-
files ordering. Idempotent: a no-op when crt+key already exist.

Live-validated on .228: deleted netbird tls.crt/key under a Running
container; reconciler regenerated a fresh CN=<host_ip> self-signed cert
(1000:1000), https :8087 = 200.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 17:49:50 -04:00
archipelago
a38c9d5f29 docs(master-plan): §10d Meshtastic MeshCore-parity status (one open received-msg bug)
Region (EU_868) + shared channel "archipelago" auto-provisioning shipped in
8fdb45e8 and riding the rolled #9 fleet binary (0060dcd6). Discovery, RF, and
sending verified on .116+.228; the one open blocker is the running driver not
surfacing received messages. Slotted after WS-F #9–11.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:53:06 -04:00
archipelago
f9a6ae3f32 feat(mesh): Meshtastic region + shared-channel auto-provisioning (MeshCore parity)
Fresh Meshtastic radios ship region-UNSET (RF-silent) and on mismatched
channels, so nodes only ever saw themselves. Bring them to MeshCore parity
using the official Meshtastic admin API:

- Auto-provision LoRa region (set_config, AdminMessage field 34) from a new
  mesh-config `lora_region` (e.g. EU_868) when the radio's region differs.
- Auto-provision a shared primary channel (set_channel, field 33) with a
  PSK derived deterministically from channel_name, so every node converges on
  one mesh — the parity equivalent of MeshCore's named "archipelago" channel.
- Read current region/channel from want_config; only write when different
  (no reboot loop); cap attempts so a radio that won't persist can't loop.
- Active NodeInfo advert scaffolding + aggressive serial drain.

Verified on .116+.228: region+channel persist, discovery works (both see each
other as named reachable contacts), bidirectional RF + sending confirmed.
Receiving in the running driver is still under diagnosis (instrumentation added).

Also removes the unwanted `meshtastic` daemon app from the registry (it was
never meant to be a container — native driver provides system-level support):
deletes apps/meshtastic + catalog entries (app-catalog, neode-ui, releases) +
test refs. Meshtastic stays native, like MeshCore.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
archipelago
fd3a4ee4ef fix(orchestrator): chown the whole fresh bind subtree, not just the leaf
ensure_bind_mount_dirs chowned a freshly-created no-data_uid bind dir
with --reference={immediate_parent}. For a NESTED bind source like
jellyfin's /var/lib/archipelago/jellyfin/config (or netbird's .../netbird/
data), `mkdir -p` creates the intermediate <app> dir root:root too, so
referencing the immediate parent just copied ROOT — leaving the dir
unwritable and the app EACCES-crash-looping on reinstall (found by the
all-apps-lifecycle pass: jellyfin "/config/log denied" exit 139;
netbird-server "unable to open database file"). It only ever worked for
direct children of the data root (immich).

Fix: anchor to the nearest PRE-EXISTING ancestor (the rootless data root,
owned by the service user) and chown -R the entire newly-created subtree
to it. Extracted the walk into fresh_subtree_anchor() with a unit test
covering nested / direct / second-volume cases.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-27 04:46:35 -04:00
Dorian
38d2bbf570 chore(android): update companion APK download [skip ci] 2026-06-26 13:08:37 +01:00
Dorian
a90fea80ed feat(android): edit server entries from in-app settings menu (NESMenu); bump to 0.4.12 (vc16)
The 0.4.11 edit affordance only lived on ServerConnectScreen, which a
connected user never sees. Add edit to NESMenu — the settings modal
reached via two-finger hold while connected: a ✎ pencil on each saved
server opens the form pre-populated (Edit Server header + Cancel),
persists via ServerPreferences.updateSavedServer(), and reconnects when
the edited server is the live one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:08:18 +01:00
Dorian
389e602097 chore(android): update companion APK download [skip ci] 2026-06-26 12:54:52 +01:00
Dorian
5677f9cca1 feat(android): edit saved server entries; bump companion to 0.4.11 (vc15)
Add an edit affordance to each saved server in ServerConnectScreen: a
pencil button loads the entry into the form (Edit Server mode) with
Save Changes / Cancel actions. Persisted via a new
ServerPreferences.updateSavedServer() that replaces by connection
identity (address/port/scheme) and keeps the active record in sync when
the edited server is the active one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:54:07 +01:00
archipelago
fc64b422e7 docs(master-plan): WS-F#3 first destructive run — 3 reinstall bugs found
Full all-apps-lifecycle pass on .228: lifecycle 11/11, teardown 8/11.
Surfaced (1) fresh-install bind-dir ownership root:root → reinstall
EACCES (jellyfin/netbird; Fix B misses the install path), (2) netbird
reinstall adopts leftover containers → skips manifest cert/file render,
(3) portainer image pin lfg2025/portainer:2.19.4 unpublished (manifest
unknown), pin overrides RPC dockerImage. .228 restored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 07:47:24 -04:00
Dorian
07b9b5a3aa docs(android): companion release + App-Not-Installed runbook
Capture the 2026-06-26 lessons durably: ship via the hardened publish
script only, v1+v2+v3 signing is enforced by apksigner (AGP ignores
enableV1Signing at minSdk>=24), diagnose install failures with adb
install FIRST, signature-key changes force a one-time uninstall, and
keep all phone/adb work scoped to com.archipelago.app.debug.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:21:48 +01:00
Dorian
ac59771560 fix(android): force v1+v2+v3 signing & clean-build guards in companion publish
The published companion APK was v2-only (AGP silently ignores
enableV1Signing for minSdk>=24) and clean builds broke on stray
space-named resource dirs. Harden scripts/publish-companion-apk.sh:
clean build, remove/ýreject space-named res dirs, force v1+v2+v3 via
zipalign+apksigner, and abort unless all three schemes verify. Wire
ship-companion.sh to the shared script. Re-sign the served 0.4.10 APK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:53:25 +01:00
Dorian
d1f9e9ce88 chore(android): update companion apk download 2026-06-26 11:32:00 +01:00
Dorian
58847fc3d7 chore(android): bump companion to 0.4.10 (versionCode 14)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:31:36 +01:00
archipelago
a3e09eab57 docs(master-plan): WS-F#3 — destructive all-apps lifecycle matrix landed (43934eef)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:51 -04:00
archipelago
43934eefa5 test(gate): destructive all-apps lifecycle matrix (WS-F#3)
Active counterpart to the read-only all-apps-matrix.bats: drives
stop/start/restart for every installed app and, under
ARCHY_ALLOW_CASCADE_DESTRUCTIVE, a FULL teardown (uninstall →
no-ghost → reinstall) — the broad coverage F needs beyond the ~8 core
suites. App set is discovered from My Apps ∩ the node catalog; reinstall
spec comes from catalog.json {dockerImage, containerConfig}.

PROTECTED by default (never cycled or torn down): bitcoin*/electrum*
(expensive resync) AND lnd/btcpay*/fedimint* (teardown = irreversible
wallet/channel/guardian loss). The user asked to protect only
bitcoin+electrum; the wallet apps are added for safety and can be
removed via ARCHY_MATRIX_PROTECT. Heavy + destructive → a supervised
pass, not folded into run-gate. Validated on .228: discovery excludes
the 6 protected installed apps; lifecycle tier cycles a single app
(botfights) stop/start/restart green; teardown gated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:29:22 -04:00
archipelago
80146f4476 docs(master-plan): WS-F#2 — uninstall progress bar made truthful (9f17ba68)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:15:11 -04:00
archipelago
9f17ba6867 fix(ui): truthful uninstall progress bar (was a solid full-red block)
AppCard's uninstall bar was hardcoded `w-full bg-red-400/60 animate-pulse`
— a solid, full-width, red, fake-pulsing block that never moved and read
as an error, no matter the actual teardown progress (the install bar, by
contrast, renders a real percentage). Derive a truthful percentage from
the backend's existing `uninstall-stage` label — "Stopping containers
(X/N)" → 10–50%, "Cleaning up volumes" → 70%, "Removing app data" → 90%
— and render it exactly like install: neutral fill, real width + percent,
shimmer (not a fake pulse) carrying motion when a stage has no number.
Frontend-only; the backend already broadcasts these stages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 06:04:48 -04:00
archipelago
67426c0d41 docs(master-plan): cascade tier wired into the gate (b7d92107)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:24:07 -04:00
archipelago
b7d9210784 test(gate): optional ARCHY_GATE_CASCADE pass — wire the cascade tier in
run-gate.sh ran only the DESTRUCTIVE tier; the cascade-uninstall suite
(uninstall→no-ghost→reinstall, the #13/#14/uninstall-hang regression
guard) existed but was never enabled by the gate. Add an opt-in single
cascade pass after the 5× loop (ARCHY_GATE_CASCADE=1, requires
ARCHY_ALLOW_DESTRUCTIVE=1), counted into the pass/fail tally. Kept out
of the 5× loop deliberately — uninstall/reinstall every iteration would
balloon runtime and re-pull images; one pass guards the class. Default
gate behavior unchanged. Validated: cascade-uninstall.bats 7/7 on .228.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:22:45 -04:00
archipelago
292a2650df docs(master-plan): WS-F — uninstall-hang root cause fixed + cascade validated
Workstream F now in-progress: the immich/grafana uninstall hang →
ghost/stuck-bar/reinstall-block is root-caused (unbounded systemctl/
podman in quadlet::disable_remove) and fixed (71cc9ac4); cascade-
uninstall.bats 7/7 on .228. Records the remaining F items + the pending
gate-wiring decision.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 05:18:39 -04:00
archipelago
71cc9ac46a fix(uninstall): bound systemctl/podman teardown so uninstall can't hang
Uninstalling immich/grafana could hang with a frozen full-red progress
bar, leave a ghost entry stuck in My Apps, and then refuse reinstall.
Single root cause: quadlet::disable_remove() — called first in the
uninstall task (via companion + orchestrator teardown) — ran
`systemctl --user stop`, daemon-reload, and `podman rm -f` with NO
timeout. On rootless podman a generated unit can wedge in "deactivating"
while podman hangs underneath, so `systemctl stop` blocks forever. The
spawned uninstall task then never returns Ok or Err, so:
  - set_uninstall_stage() (after the stop) never fires → progress frozen;
  - remove_package_state_entry() never runs → entry stranded in
    `Removing` → ghost in My Apps;
  - the install guard rejects reinstall with "already Removing".

The spawn wrapper already reverts state on Err and removes the entry on
Ok — the only failure mode was a hang that returns neither. Bound the
teardown so it always terminates:
  - systemctl stop → QUADLET_STOP_TIMEOUT, escalate to kill+reset-failed
    on timeout (reuses the existing helpers);
  - daemon_reload_user() → bounded systemctl_user_status (30s);
  - defensive `podman rm -f` → wrapped in tokio timeout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 04:27:02 -04:00
archipelago
2ebcd8f9a8 docs(master-plan): backlog — smart launch-port selection + manifest-driven archival-node blocker
§10b: replace per-app static launch-port map with a manifest-first +
non-HTTP-port-skipping heuristic (the gitea :2222 class).
§10c: generalize the un-pruned/archival Bitcoin install blocker from a
hardcoded requires_unpruned_bitcoin() match to a manifest-declared
dependency, with a clear pre-install UX.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:47:25 -04:00
archipelago
3515344800 docs(master-plan): session h — zombie guard + gitea launch-port fix
Banner + §8b: zombie-container guard (0a8db904, live-proven on .228) and
gitea launch-port fix (670ebb06) shipped in binary 040df5ce, rolled to
the fleet. Logs the mempool env-drift recreate-loop and nostr-rs-relay
follow-ups.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 03:41:59 -04:00