archy/docs/RETICULUM-TRANSPORT-PROGRESS.md
archipelago f54c853128 feat(mesh): Reticulum LoRa hardware gates pass + RNS Resource transfer + image/voice attachments
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.

Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 19:57:01 -04:00


Reticulum mesh transport — progress tracker

Living status doc for the Reticulum (RNS+LXMF) third-transport work. Update this after every meaningful step. If a session is cut off mid-work, read this file first, then the plan, then resume at "Next up."

Full plan: .claude/plans/enchanted-strolling-rocket.md. Memory pointer: project_reticulum_transport_plan.md (auto-memory index).

Coordination note (2026-06-30): a separate agent owns concurrent Meshtastic work, scoped to mesh/meshtastic.rs + mesh/protocol.rs (see docs/SESSION-1.8.0-OTA-PROGRESS.md) and explicitly avoiding mesh/listener/session.rs transport plumbing + mesh/mod.rs routing, which this work owns. Stay out of meshtastic.rs/protocol.rs to avoid collisions.

Status at a glance

| Phase | What | Status |
| --- | --- | --- |
| 0 | Gate #1 — deterministic identity from Archy keys | DONE, verified in venv AND in the PyInstaller binary (same dest hash) |
| 0 | Gate #2 — two-node LXMF-over-LoRa on real hardware | PASSED 2026-06-30 — real RF announce + encrypted DM exchanged between .116's Heltec V3 RNode and a phone-flashed second RNode running Sideband |
| 0 | Gate #3 — external Sideband/MeshChat interop | PASSED 2026-06-30 — same session as gate #2; Sideband is the stock external client this gate calls for |
| 1 | reticulum-daemon/ (Python rns+lxmf, Unix-socket RPC) | scaffolded + tested (no radio); signed-identity announce also done (see below) |
| 1 | Packaging — PyInstaller single binary | DONE + verified — reticulum-daemon/build.sh, 16M standalone binary, selftest passes when run from /tmp with no venv on PATH |
| 2 | Rust wiring (DeviceType, MeshRadioDevice, ReticulumLink, stamp sites) | cargo check/cargo test -p archipelago GREEN (99 mesh tests pass) — still untested on real hardware |
| 2c | MeshConfig.device_kind reflashable-board pin | DONE this session (was the one open Phase-2 item) |
| 3 | Frontend (~8 label/CSS spots) | DONE (scoped down — see note below) |
| 4 | Multi-device (run all 3 radios at once) + per-network channels | not started (follow-on, after 0–3) |

Checkpoint 2026-06-30 (late session — read this first if cut off)

This session picked up after Phase 2/3 were already green, and closed out everything that didn't need real RNode hardware:

  1. Corrected two stale tracker entries (both were already done, just not reflected here):
    • The _announce_app_data "TODO" was actually already implemented: reticulum_daemon.py's _announce_app_data() embeds ARCHY:2:{ed}:{x25519} when --archy-ed-pubkey-hex/--archy-x25519-pubkey-hex are passed, and reticulum.rs's daemon_command()/open() already forward our_ed_pubkey_hex/our_x25519_pubkey_hex from session.rs (run_mesh_session → auto_detect_and_open/open_preferred_path → ReticulumLink::open). Confirmed end-to-end by reading the call chain, not just grepping.
    • Phase 3 frontend was already done (see prior entry below) — tracker table above said "not started", now corrected.
  2. Added MeshConfig.device_kind: Option<DeviceType> (plan §2c, the one explicitly-listed open Phase-2 item) — mesh/mod.rs (field + Default + threaded into start()'s spawn_mesh_listener call), listener/mod.rs (spawn_mesh_listener param → run_mesh_session arg), listener/session.rs (run_mesh_session param; auto_detect_and_open skips non-matching probes per-path via device_kind.is_none_or(|k| k == ...); open_preferred_path restructured to a match kind { ... } that tries only the pinned driver and surfaces its real error, instead of silently falling through to another firmware's handshake on the same port). None (default) preserves today's strict Meshcore→Meshtastic→Reticulum auto-detect — fully backward compatible, no config migration needed. cargo check + cargo test -p archipelago both green after (99 mesh tests, 0 failed).
  3. Built and verified the PyInstaller packaging (plan's Phase 1 "Packaging" + the file list's "Ops: release packaging to include the daemon binary" item — previously undone):
    • reticulum-daemon/build.sh (new) — reproducible build, installs requirements-build.txt (new, pyinstaller==6.21.0, build-only/not shipped) into the existing .venv, runs PyInstaller with flags discovered by trial: --collect-submodules RNS --collect-submodules LXMF --collect-data RNS -d noarchive.
    • Non-obvious gotcha, written up in build.sh's comments so it isn't re-discovered: RNS.Interfaces/__init__.py builds its __all__ via glob.glob(os.path.dirname(__file__) + "/*.py") at import time (Reticulum.py does from RNS.Interfaces import *). PyInstaller's default --onefile zips pure-Python modules into an in-binary PYZ archive, so __file__ doesn't point at a real directory and the glob comes back empty → NameError: name 'Interface' is not defined the moment RNS.Reticulum(...) is constructed. -d noarchive (keep modules as loose .pyc files on disk inside the onefile bundle's runtime-extraction dir) fixes it — confirmed by reproducing the failure first, then fixing it.
    • Verified, not just built: ran the resulting dist/archy-reticulum-daemon binary's --check (dest hash matches the venv-derived 06bb31e16f4f8d46a8ae8eac23a4fd21 for the test seed) and --selftest (full RNS+LXMF bring-up, no radio) both from /tmp with the binary copied away from the repo and the .venv not on PATH — confirms it's genuinely self-contained, not accidentally still depending on the dev venv.
    • dist/, build/, and *.spec are already gitignored (reticulum-daemon/.gitignore); only build.sh + requirements-build.txt are new tracked files.
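
The glob gotcha from item 3 can be reproduced without RNS or PyInstaller. The sketch below is not the RNS source: `discover_modules` is a hypothetical stand-in for the `__all__`-building pattern described above. It is pointed first at a real directory and then at a path with no directory behind it, which is roughly what `__file__` looks like when modules live inside a PYZ archive.

```python
import glob
import os
import tempfile


def discover_modules(package_file):
    """Hypothetical stand-in: list module names from the .py files next to package_file."""
    pattern = os.path.dirname(package_file) + "/*.py"
    return sorted(
        os.path.basename(path)[:-3]
        for path in glob.glob(pattern)
        if not os.path.basename(path).startswith("__")
    )


with tempfile.TemporaryDirectory() as pkg_dir:
    # Loose files on disk: the glob finds every interface module.
    for name in ("__init__.py", "Interface.py", "RNodeInterface.py"):
        open(os.path.join(pkg_dir, name), "w").close()
    on_disk = discover_modules(os.path.join(pkg_dir, "__init__.py"))

    # No real directory behind the path: the glob silently returns nothing,
    # so a later star-import would bind no names at all.
    archived = discover_modules(os.path.join(pkg_dir, "missing", "__init__.py"))
```

The point of the second call is that `glob.glob` raises no error for a missing directory, so the failure only surfaces later as a missing name.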

NOT done this session (still genuinely open):

  • Everything hardware-dependent (Phase 0 gates #2/#3, real RNode probe/spawn). The .116 Heltec V3 reflash mentioned in the prior session's memory was not done in this session — no physical hardware access was exercised, only software.
  • /dev/reticulum-radio udev symlink (plan §2c) — deliberately not added: the existing 99-mesh-radio.rules keys on USB vendor/product ID (e.g. CP2102 0x10c4/0xea60), but the whole point of device_kind is that the same chip can run any of the three firmwares — a vendor/product udev rule can't disambiguate them, and a fabricated rule would just be misleading. Real fix needs either a per-device ATTRS{serial}==... rule the operator fills in once they know their specific board's serial (no such board exists in-repo to template from yet), or rely on device_kind alone (already done, works regardless of /dev path naming). Revisit once a real RNode-flashed board's serial is known.
  • PyInstaller binary not yet wired into the release tarball / scripts/deploy-to-target.sh (the daemon binary path is currently resolved via ARCHY_RETICULUM_DAEMON_BIN env or the dev venv fallback in reticulum.rs's daemon_command() — production default /usr/local/bin/archy-reticulum-daemon is a real path convention now that build.sh produces exactly that filename, but nothing copies it there yet). Left undone deliberately — wiring release-tarball plumbing for a binary that's never been run against real RNS network traffic felt premature; do this once Phase 0 gates #2/#3 pass.
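
As a rough model of the device_kind behaviour described in this checkpoint (the real logic lives in the Rust session code and is not reproduced here), the pin can be thought of as a filter over the fixed probe order. `PROBE_ORDER` and `probes_to_try` are illustrative names, not identifiers from the codebase.

```python
# Illustrative model only: the driver names mirror the three firmwares
# named in this document, in the documented auto-detect order.
PROBE_ORDER = ("meshcore", "meshtastic", "reticulum")


def probes_to_try(device_kind=None):
    """Return the drivers to probe, in order, for an optional pin."""
    if device_kind is None:
        # No pin: keep the full auto-detect order unchanged.
        return list(PROBE_ORDER)
    if device_kind not in PROBE_ORDER:
        raise ValueError(f"unknown device kind: {device_kind!r}")
    # Pinned: try only that driver, so its real error is what surfaces.
    return [device_kind]
```

With no pin the result is the full three-step order; with `probes_to_try("reticulum")` only the Reticulum probe runs, which is why a pinned board never falls through to another firmware's handshake.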

Phase 2 — Rust wiring detail (what's done vs left)

Done — cargo check -p archipelago is GREEN:

  • core/archipelago/src/mesh/types.rs — DeviceType::Reticulum (+ Display arm) + a radio_transport_label(DeviceType) -> &'static str helper ("reticulum" vs "lora").
  • core/archipelago/src/mesh/mod.rs — all 4 outbound stamp sites use radio_transport_label(...); use_typed_envelope (~1571) extended to matches!(device_type, Meshcore | Reticulum); data_dir threaded into spawn_mesh_listener(...) call (was: MeshService::start() → spawn_mesh_listener).
  • core/archipelago/src/mesh/listener/mod.rs — spawn_mesh_listener takes data_dir: PathBuf, passes &data_dir into run_mesh_session.
  • core/archipelago/src/mesh/listener/decode.rs:406,639 and dispatch.rs:79 — all 3 inbound stamp sites now use radio_transport_label(state.status.read().await.device_type).
  • core/archipelago/src/mesh/listener/session.rs:
    • MeshRadioDevice enum has Reticulum(ReticulumLink); all 18 method arms wired (no-ops: ensure_lora_region, ensure_channel, send_keepalive, send_nodeinfo_advert, reboot, reset_contact_path; everything else forwards to ReticulumLink).
    • auto_detect_and_open(data_dir: &Path) and open_preferred_path(path, data_dir: &Path) both now try ReticulumLink::open(path, data_dir) last, after Meshcore/Meshtastic — cheap raw-serial KISS-detect probe runs first; the daemon only spawns on a confirmed match.
    • reticulum_contact_id() helper added (delegates to the canonical reticulum::reticulum_contact_id_from_hash, masked & 0x7FFF_FFFF, avoids 0).
    • refresh_contacts() has an is_reticulum branch parallel to is_meshtastic; reachable flows through contact.path_len != 0 unchanged (ReticulumLink::get_contacts() already encodes daemon-reported reachability into path_len).
    • data_dir: &Path threaded through run_mesh_session → both probe functions.
  • core/archipelago/src/mesh/reticulum.rs — created. ReticulumLink: spawns/supervises the daemon as a child process, Unix-socket RPC client (matches the tested daemon contract), prefix_to_hash: HashMap<[u8;6],[u8;16]> (mandatory per the plan), synthetic InboundFrame builder byte-matching meshtastic.rs's layout, Drop impl that kills the daemon + cleans up the socket. Has unit tests (KISS-detect byte matching, contact-id masking, synthetic-frame layout) — passing, see below.
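
The contact-id masking and the prefix_to_hash map can be pictured with the sketch below. Only the `& 0x7FFF_FFFF` mask, the "never 0" rule, and the 6-byte-key to 16-byte-value shape come from this document. Which bytes of the hash feed the id, their byte order, and the value that 0 is remapped to are assumptions made for illustration, and both function names are hypothetical.

```python
def contact_id_from_hash(dest_hash: bytes) -> int:
    """Fold a 16-byte RNS destination hash into a positive 31-bit id."""
    if len(dest_hash) != 16:
        raise ValueError("expected a 16-byte RNS destination hash")
    # Assumption: the id is seeded from the first four bytes, little-endian.
    raw = int.from_bytes(dest_hash[:4], "little")
    masked = raw & 0x7FFF_FFFF
    # Assumption: a masked value of 0 is remapped to 1 so the id is never 0.
    return masked or 1


def remember_peer(prefix_to_hash: dict, dest_hash: bytes) -> bytes:
    """Record the 6-byte prefix -> full 16-byte hash mapping and return the prefix."""
    prefix = dest_hash[:6]
    prefix_to_hash[prefix] = dest_hash
    return prefix
```

The map matters because the synthetic inbound frame carries only a 6-byte sender prefix, while a reply needs the full 16-byte destination hash.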

Concurrent-edit note: a separate in-flight change (not mine) added MeshPeer.pkc_capable and ParsedContact.pkc_capable (Meshtastic PKI-capability tracking) while this work was in progress. Accounted for: reticulum.rs's ParsedContact literal sets pkc_capable: false (Reticulum/LXMF is unconditionally E2E via take_rx_encrypted(), so this field has no analogue); two incomplete MeshPeer literals in decode.rs (lines ~330, ~548) were completed with pkc_capable: false to unblock the build for everyone — not reverted, not worked around.

Self-review fix applied: the RPC Unix socket originally lived in the shared system temp dir; moved to {data_dir}/reticulum/ (0700) instead — archipelago-owned, not shared /tmp, matching the security posture. Re-confirmed cargo check -p archipelago GREEN after the move.

NOT yet done:

  • MeshConfig.device_kind: Option<DeviceType> hint (optional reflashable-board disambiguator, plan §2c) — not added. Auto-detect ordering (Meshcore→Meshtastic→Reticulum, strict probes) is the only disambiguator right now.
  • Phase 3 frontend — DONE, but smaller scope than originally inventoried: only Mesh.vue's transportLabel() (per-message field) + mesh-styles.css .transport-reticulum + the mesh.ts doc comment needed the addition. transport.ts TransportKind, federation/types.ts last_transport, NodeList.vue transportBadge, and PeerFiles.vue transportPill are a COARSER routing-layer category (mesh/lan/fips/tor) where 'mesh' already covers any radio (meshcore/meshtastic/reticulum) — adding a separate 'reticulum' there would be inconsistent with how meshcore/meshtastic are handled. Confirmed via vue-tsc --noEmit (exit 0, zero errors).
  • Everything hardware-dependent: real daemon spawn/probe against an actual RNode (the .116 Heltec V3, once reflashed), two-node LXMF-over-LoRa, the _announce_app_data signed-identity TODO in the daemon (currently carries only the plaintext display name, not a verified Archy DID/pubkey — needed for bind_federation_twins-style auto-binding across protocols).

Verified facts to reuse (don't re-derive)

RNode KISS-detect handshake (confirmed against the canonical Reticulum source, not guessed):

constants: FEND=0xC0 FESC=0xDB TFEND=0xDC TFESC=0xDD CMD_DETECT=0x08 DETECT_REQ=0x73 DETECT_RESP=0x46
probe tx:  C0 08 73 C0 50 00 C0 48 00 C0 49 00 C0   (detect + fw_version + platform + mcu queries)
success:   response contains byte sequence ... C0 08 46 ...  (FEND, CMD_DETECT, DETECT_RESP)

Source: RNS/Interfaces/RNodeInterface.py (Liberated Systems mirror), detect()/readLoop().
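
The byte values above translate directly into a probe builder and a response check. This is a minimal sketch of the matching logic only: it opens no serial port, and `is_rnode_response` is a hypothetical helper name rather than a function from reticulum.rs or RNodeInterface.py.

```python
FEND = 0xC0
CMD_DETECT = 0x08
DETECT_REQ = 0x73
DETECT_RESP = 0x46

# Detect request followed by the fw_version, platform and mcu queries,
# exactly as listed in the "probe tx" line above.
PROBE_TX = bytes.fromhex("c00873c0" "5000c0" "4800c0" "4900c0")

# The three-byte sequence whose presence marks a successful detect.
DETECT_OK = bytes([FEND, CMD_DETECT, DETECT_RESP])


def is_rnode_response(rx: bytes) -> bool:
    """True if the received bytes contain FEND, CMD_DETECT, DETECT_RESP in order."""
    return DETECT_OK in rx
```

Because the check is a plain substring search, it tolerates other KISS frames arriving before or after the detect response.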

Synthetic InboundFrame layout for a 1:1 DM, copied exactly from meshtastic.rs:1031-1047 (ReticulumLink must build the same shape so frames::handle_frame needs zero changes):

data = [snr(1)=0][reserved(2)=00,00][sender_prefix(6)][path(1)=0xff][type(1)=0][rx_time(4 LE)][payload…]
code = RESP_CONTACT_MSG_V3_E2E if encrypted else RESP_CONTACT_MSG_V3   (RNS/LXMF is always E2E, so always _E2E)
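
The data layout above can be written out as a small builder, which makes the field offsets easy to check in a unit test. `build_dm_frame_data` is a hypothetical name; the response codes (RESP_CONTACT_MSG_V3_E2E and friends) are not modelled because their numeric values are not given here.

```python
import struct


def build_dm_frame_data(sender_prefix: bytes, rx_time: int, payload: bytes) -> bytes:
    """Assemble the `data` bytes of a synthetic 1:1 DM frame."""
    if len(sender_prefix) != 6:
        raise ValueError("sender_prefix must be exactly 6 bytes")
    header = bytes([0])                   # snr(1) = 0
    header += b"\x00\x00"                 # reserved(2)
    header += sender_prefix               # sender_prefix(6)
    header += bytes([0xFF])               # path(1) = 0xff
    header += bytes([0])                  # type(1) = 0
    header += struct.pack("<I", rx_time)  # rx_time(4), little-endian
    return header + payload
```

The fixed header is 15 bytes, so the payload always starts at offset 15 and the sender prefix occupies offsets 3 through 8.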

Channel/broadcast equivalent (RESP_MESHTASTIC_CHANNEL_TEXT, meshtastic.rs:1019-1028) — N/A for Reticulum in single-device Phase 2 (LXMF has no shared-channel concept); revisit in Phase 4.

resolve_peer (decode.rs:316) matches inbound sender_prefix against peer.pubkey_hex.starts_with(prefix) — so as long as refresh_contacts/announce-handling populates pubkey_hex = full 16-byte RNS hash hex BEFORE a message arrives (same precondition meshtastic relies on via its peer_pubkeys map), no Reticulum-specific fallback is needed there.

ParsedContact.public_key_hex for Reticulum = hex of the 16-byte RNS dest hash (32 hex chars, NOT 32 bytes) — the hex::decode(...).len()==32 checks elsewhere (e.g. the auto-heal reset_contact_path loop in refresh_contacts) will naturally skip Reticulum contacts since their key decodes to 16 bytes, not 32. That's fine — no special-casing needed, just don't "fix" it to be 32 bytes.
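
The two facts above (prefix matching on the hex string, and the 32-byte length check skipping Reticulum contacts) can be illustrated with a short sketch. Both helper names are hypothetical and the real checks are Rust. Lower-casing before the comparison is an assumption, not something confirmed from decode.rs.

```python
def decodes_to_32_bytes(public_key_hex: str) -> bool:
    """Model of a `hex::decode(...).len() == 32` style check."""
    try:
        return len(bytes.fromhex(public_key_hex)) == 32
    except ValueError:
        return False


def matches_sender(pubkey_hex: str, sender_prefix: bytes) -> bool:
    """Model of `pubkey_hex.starts_with(prefix)` for a 6-byte sender prefix."""
    # Assumption: both sides are compared as lower-case hex.
    return pubkey_hex.lower().startswith(sender_prefix.hex())


# A 16-byte RNS dest hash is 32 hex characters, which is easy to misread as 32 bytes.
rns_hash_hex = "5d146f6e1c9707f89468b5016ed6dfad"
```

Here `decodes_to_32_bytes(rns_hash_hex)` is false even though the string is 32 characters long, which is the behaviour the auto-heal loop relies on to skip Reticulum contacts.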

data_dir.join("identity").join("node_key") is the 32-byte raw Ed25519 seed file — this is exactly what reticulum_daemon.py --identity-key <path> expects (confirmed against identity.rs NODE_KEY_FILE/load_or_create). The daemon reads the file itself — Rust should pass the path, not pipe the raw key bytes through more hops than already exist.

Hardware update (2026-06-30)

.116 has a Heltec V3 available to reflash with RNode firmware. This unblocks Phase 0 gates #2/#3 (previously marked blocked — .198's radio is dead, but .116's Heltec V3 is a real path forward without needing new hardware). Next concrete step once reflashed: run reticulum-daemon/reticulum_daemon.py pointed at the RNode's serial path, confirm --check hash matches --selftest, then bring up two instances (.116 + .228, after .228 also gets an RNode-capable board) for the real two-node LXMF-over-LoRa gate.

Daemon contract (already built + tested — Phase 2 codes against this, no changes needed)

reticulum-daemon/reticulum_daemon.py, RPC over Unix socket (0600), one JSON object per line:

  • in: {"cmd":"send","dest_hash":hex16,"content":...} / {"cmd":"announce"} / {"cmd":"status"} / {"cmd":"shutdown"}
  • out: {"event":"ready",...} / {"event":"recv",...} / {"event":"announce",...} / {"event":"delivered",...} / {"event":"status",...}

Verified: --check (hash only), --selftest (boots real RNS+LXMF, no radio), and a live socket round-trip (ready → status → shutdown, clean exit) — see reticulum-daemon/README.md.
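
The "one JSON object per line" framing can be sketched as a pair of helpers. This models only the framing, not the daemon or the Rust client: `encode_command` and `decode_events` are hypothetical names, and the only field names used are the "cmd" and "event" keys listed above.

```python
import json


def encode_command(cmd: str, **fields) -> bytes:
    """Serialize one command as a single newline-terminated JSON object."""
    message = {"cmd": cmd, **fields}
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_events(buffer: bytes):
    """Split a receive buffer into parsed events plus any trailing partial line."""
    *lines, rest = buffer.split(b"\n")
    events = [json.loads(line) for line in lines if line.strip()]
    return events, rest
```

A reader loop would append each socket read to `rest` and call `decode_events` again, so an event split across two reads is parsed only once its terminating newline arrives.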

Checkpoint 2026-06-30 (hardware session — gates #2/#3 PASSED)

Picked up after a session pipe-break; the live system (archipelago.service + the spawned archy-reticulum-daemon) had kept running uninterrupted the whole time, so nothing was lost.

What happened, in order:

  1. .116's Heltec V3 (CP2102, USB vendor/product 10c4:ea60, serial 0001) was reflashed with RNode firmware and plugged into /dev/mesh-radio (generic udev symlink → ttyUSB0, not a per-serial rule). mesh-config.json has device_path: null — pure auto-detect, no device_kind pin needed.
  2. Auto-detect correctly tried Meshcore → Meshtastic → Reticulum and found it: journal shows Found Reticulum (RNode) device via auto-detect path=/dev/mesh-radio — but only after ~4 min of Failed to spawn reticulum-daemon — is it installed/packaged? retries, because /usr/local/bin/archy-reticulum-daemon hadn't been copied into place yet from reticulum-daemon/dist/ (built via ./build.sh). Once copied (sha256-verified match to the dist/ build), auto-detect succeeded on the very next retry.
  3. mesh.status RPC confirmed live: device_type: "reticulum", device_connected: true, dest_hash: 5d146f6e1c9707f89468b5016ed6dfad. Periodic self-advert (send_self_advert → {"cmd":"announce"} → real RNS Identity.announce()) firing every ~30s — confirmed this is not the send_nodeinfo_advert no-op arm (that one's still legitimately a no-op for Reticulum; the real announce path is send_self_advert, wired correctly).
  4. Second RNode flashed from a phone running Sideband. First attempt showed RF energy (interference_last_dbm climbing) but rxb: 0 — a parameter mismatch, not a frequency problem (energy was detected, just not demodulated). Root cause: Spreading Factor mismatch in Sideband's manual RNode interface config (the frequency display rounds to one decimal, so "869.5" looked correct at first glance — bandwidth/SF/CR are separate fields and SF was wrong). Once SF was corrected to match (freq 869525000, BW 125000, SF 8, CR 5), rxb went non-zero immediately and a real {"event":"announce","dest_hash":"1870744d...", "app_data":"7a617a61"} (hex for "zaza") arrived over the air.
  5. Gate #2 + gate #3 both passed in the same exchange: zaza shows up as a real, reachable mesh.peers contact; an inbound encrypted LXMF message ("Yoooo") arrived and was correctly stamped encrypted: true, transport: "reticulum"; a reply was sent back and round-tripped. Sideband is exactly the stock external client gate #3 calls for, so one real RNode-to-RNode LoRa link covered both gates — no need for a second dedicated archy node.
  6. Two real bugs found from this, both fixed:
    • record_sent_typed's encrypted flag was hardcoded false/archy || pkc_capable on the Reticulum send path (both the native-text path in send_message and the typed-envelope path in send_typed_wire) — correct for Meshcore/Meshtastic (where E2E really is conditional on PKI/session state not yet threaded through), wrong for Reticulum: LXMF encrypts every send to the destination identity key unconditionally, archy peer or not. Fixed: both call sites now OR in device_type == DeviceType::Reticulum.
    • radio_transport_label() collapsed Meshcore and Meshtastic into one generic "lora" string, so the per-message pill couldn't distinguish them. User asked for 3 distinct pill colors (Meshtastic mint, Meshcore orange, Reticulum blue) — extended the label fn to return "meshtastic"/"meshcore"/"reticulum" distinctly, updated Mesh.vue's transportLabel() switch and mesh-styles.css (.transport-meshtastic #3eb489, .transport-meshcore #fb923c, .transport-reticulum #60a5fa; kept .transport-lora #f59e0b as a fallback for any already-stored legacy-labelled messages). cargo check + vue-tsc --noEmit both green after.
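
The parameter mismatch from step 4 is the kind of error a field-by-field comparison catches immediately. The sketch below is a diagnostic aid under stated assumptions, not part of the daemon or of Sideband: `LoRaParams`, `mismatched_fields` and `displayed_mhz` are hypothetical names, and the "wrong" SF value is invented because this document does not record what was first entered.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class LoRaParams:
    frequency_hz: int
    bandwidth_hz: int
    spreading_factor: int
    coding_rate: int


def mismatched_fields(local: LoRaParams, remote: LoRaParams) -> list:
    """Names of the radio parameters that differ between two nodes."""
    return [
        f.name
        for f in fields(local)
        if getattr(local, f.name) != getattr(remote, f.name)
    ]


def displayed_mhz(frequency_hz: int) -> str:
    """Frequency as a UI rounding to one decimal place would show it."""
    return f"{frequency_hz / 1_000_000:.1f}"


# Working values from step 4.
local = LoRaParams(869_525_000, 125_000, 8, 5)
# Hypothetical misconfiguration: same frequency, different spreading factor.
remote = LoRaParams(869_525_000, 125_000, 7, 5)
```

`displayed_mhz` shows why a one-decimal display is a poor check on its own: 869525000 Hz and 869500000 Hz both render as "869.5", so comparing the raw integer fields is the safer habit.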

NOT yet done:

  • The Rust-side fix above (encrypted flag, transport-label split) is built but not yet deployed to .116's running binary — the live daemon/auto-detect verification above was all against the binary already running before this session's edits. Rebuild + redeploy to see the fix live.
  • tests/lifecycle/run-gate.sh not re-run after these mesh changes yet (project convention: run after backend changes land).
  • Multi-device (3 radios at once, Phase 4) and the release-tarball/udev-rule wiring (originally "Next up" #6 below) are both still untouched.

Next up (resume here)

Phase 0 gates #1–#3 are now all passed. What's left:

  1. Rebuild the backend + frontend and redeploy to .116 so the encrypted-flag fix and the 3-way transport-pill color split actually take effect on the live node (currently only checked in with cargo check/vue-tsc, not deployed).
  2. Re-verify on-device after redeploy: send another Sideband↔archy DM, confirm the Sent bubble now shows E2E + a blue "Reticulum" pill, and confirm Meshtastic/Meshcore pills (if any messages exist) render mint/orange instead of the old generic amber "LoRa".
  3. Exercise the rest of the plan's "Verification (definition of done)" items: hot-swap detection (unplug the RNode mid-session, confirm fallback to FIPS/Tor on the same contact; replug, confirm it picks Reticulum back up), and device_kind: Some(Reticulum) pin path (currently only auto-detect has been exercised on real hardware).
  4. Run tests/lifecycle/run-gate.sh to confirm no regression from the mesh changes landing.
  5. Only after the above: wire dist/archy-reticulum-daemon into the release tarball / scripts/deploy-to-target.sh (target path /usr/local/bin/archy-reticulum-daemon, matching reticulum.rs's default) and add a per-serial-number /dev/reticulum-radio udev rule now that a real board's serial number (0001 on the CP2102, .116's board) is known — though a second board will likely report the same 0001 stock serial since CP2102 modules commonly ship with an unprogrammed default, so this may still need a different disambiguator.
  6. Phase 4 (run all 3 radios at once) — still not started, follow-on after the above.