archy/docs/RETICULUM-TRANSPORT-PROGRESS.md
archipelago f54c853128 feat(mesh): Reticulum LoRa hardware gates pass + RNS Resource transfer + image/voice attachments
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.

Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
2026-06-30 19:57:01 -04:00


# Reticulum mesh transport — progress tracker
Living status doc for the Reticulum (RNS+LXMF) third-transport work. **Update this after every
meaningful step.** If a session is cut off mid-work, read this file first, then the plan, then
resume at "Next up."
Full plan: `.claude/plans/enchanted-strolling-rocket.md`. Memory pointer:
`project_reticulum_transport_plan.md` (auto-memory index).
**Coordination note (2026-06-30):** a separate agent owns concurrent Meshtastic work, scoped to
`mesh/meshtastic.rs` + `mesh/protocol.rs` (see `docs/SESSION-1.8.0-OTA-PROGRESS.md`) and explicitly
avoiding `mesh/listener/session.rs` transport plumbing + `mesh/mod.rs` routing, which this work
owns. Stay out of `meshtastic.rs`/`protocol.rs` to avoid collisions.
## Status at a glance
| Phase | What | Status |
|---|---|---|
| 0 | Gate #1 — deterministic identity from Archy keys | ✅ **DONE**, verified in venv AND in the PyInstaller binary (same dest hash) |
| 0 | Gate #2 — two-node LXMF-over-LoRa on real hardware | ✅ **PASSED 2026-06-30** — real RF announce + encrypted DM exchanged between .116's Heltec V3 RNode and a phone-flashed second RNode running Sideband |
| 0 | Gate #3 — external Sideband/MeshChat interop | ✅ **PASSED 2026-06-30** — same session as gate #2; Sideband is the stock external client this gate calls for |
| 1 | `reticulum-daemon/` (Python rns+lxmf, Unix-socket RPC) | ✅ scaffolded + tested (no radio); signed-identity announce **also done** (see below) |
| 1 | Packaging — PyInstaller single binary | ✅ **DONE + verified**: `reticulum-daemon/build.sh`, 16M standalone binary, selftest passes when run from `/tmp` with no venv on PATH |
| 2 | Rust wiring (`DeviceType`, `MeshRadioDevice`, `ReticulumLink`, stamp sites) | ✅ **`cargo check`/`cargo test -p archipelago` GREEN** (99 mesh tests pass) — still untested on real hardware |
| 2c | `MeshConfig.device_kind` reflashable-board pin | ✅ **DONE** this session (was the one open Phase-2 item) |
| 3 | Frontend (~8 label/CSS spots) | ✅ DONE (scoped down — see note below) |
| 4 | Multi-device (run all 3 radios at once) + per-network channels | ⏳ not started (follow-on, after 0–3) |
## Checkpoint 2026-06-30 (late session — read this first if cut off)
This session picked up after Phase 2/3 were already green, and closed out everything that didn't
need real RNode hardware:
1. **Corrected two stale tracker entries** (both were already done, just not reflected here):
- The `_announce_app_data` "TODO" was actually already implemented:
`reticulum_daemon.py`'s `_announce_app_data()` embeds `ARCHY:2:{ed}:{x25519}` when
`--archy-ed-pubkey-hex`/`--archy-x25519-pubkey-hex` are passed, and `reticulum.rs`'s
`daemon_command()`/`open()` already forward `our_ed_pubkey_hex`/`our_x25519_pubkey_hex` from
`session.rs` (`run_mesh_session` → `auto_detect_and_open`/`open_preferred_path` →
`ReticulumLink::open`). Confirmed end-to-end by reading the call chain, not just grepping.
- Phase 3 frontend was already done (see prior entry below) — tracker table above said
"not started", now corrected.
2. **Added `MeshConfig.device_kind: Option<DeviceType>`** (plan §2c, the one explicitly-listed
open Phase-2 item) — `mesh/mod.rs` (field + Default + threaded into `start()`'s
`spawn_mesh_listener` call), `listener/mod.rs` (`spawn_mesh_listener` param → `run_mesh_session`
arg), `listener/session.rs` (`run_mesh_session` param; `auto_detect_and_open` skips
non-matching probes per-path via `device_kind.is_none_or(|k| k == ...)`;
`open_preferred_path` restructured to a `match kind { ... }` that tries **only** the pinned
driver and surfaces its real error, instead of silently falling through to another firmware's
handshake on the same port). `None` (default) preserves today's strict
Meshcore→Meshtastic→Reticulum auto-detect — fully backward compatible, no config migration
needed. `cargo check` + `cargo test -p archipelago` both green after (99 mesh tests, 0 failed).
3. **Built and verified the PyInstaller packaging** (plan's Phase 1 "Packaging" + the file list's
"Ops: release packaging to include the daemon binary" item — previously undone):
- `reticulum-daemon/build.sh` (new) — reproducible build, installs `requirements-build.txt`
(new, `pyinstaller==6.21.0`, build-only/not shipped) into the existing `.venv`, runs
PyInstaller with flags discovered by trial: `--collect-submodules RNS --collect-submodules
LXMF --collect-data RNS -d noarchive`.
- **Non-obvious gotcha, written up in `build.sh`'s comments so it isn't re-discovered:**
`RNS.Interfaces/__init__.py` builds its `__all__` via `glob.glob(os.path.dirname(__file__) +
"/*.py")` at import time (`Reticulum.py` does `from RNS.Interfaces import *`). PyInstaller's
default `--onefile` zips pure-Python modules into an in-binary PYZ archive, so `__file__`
doesn't point at a real directory and the glob comes back empty → `NameError: name
'Interface' is not defined` the moment `RNS.Reticulum(...)` is constructed. `-d noarchive`
(keep modules as loose `.pyc` files on disk inside the onefile bundle's runtime-extraction
dir) fixes it — confirmed by reproducing the failure first, then fixing it.
- **Verified, not just built:** ran the resulting `dist/archy-reticulum-daemon` binary's
`--check` (dest hash matches the venv-derived `06bb31e16f4f8d46a8ae8eac23a4fd21` for the
test seed) and `--selftest` (full RNS+LXMF bring-up, no radio) **both from `/tmp` with the
binary copied away from the repo and the `.venv` not on `PATH`** — confirms it's genuinely
self-contained, not accidentally still depending on the dev venv.
- `dist/`/`build/`/`*.spec` are already gitignored (`reticulum-daemon/.gitignore`); only
`build.sh` + `requirements-build.txt` are new tracked files.
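
The `__all__` gotcha described above comes down to a glob over a directory that does not exist on disk. A minimal sketch of that failure mode, assuming a hypothetical helper `discover_modules` that reuses the glob pattern quoted above (it is not the RNS source, and the `bundle.pyz` path is made up for illustration):

```python
import glob
import os
import tempfile


def discover_modules(package_dir: str) -> list[str]:
    """Return module names found by globbing *.py files in package_dir.

    Hypothetical helper: mirrors the glob pattern quoted above, nothing more.
    """
    paths = glob.glob(package_dir + "/*.py")
    return sorted(
        os.path.basename(p)[:-3]
        for p in paths
        if not os.path.basename(p).startswith("__")
    )


with tempfile.TemporaryDirectory() as root:
    real_dir = os.path.join(root, "Interfaces")
    os.mkdir(real_dir)
    for name in ("__init__.py", "Interface.py", "RNodeInterface.py"):
        open(os.path.join(real_dir, name), "w").close()

    # Loose files on disk (the `-d noarchive` situation): the glob sees them.
    on_disk = discover_modules(real_dir)

    # A directory that only "exists" inside an archive: the glob finds nothing,
    # so a star-import built from this list would export no names at all.
    in_archive = discover_modules(os.path.join(root, "bundle.pyz", "Interfaces"))
```

With an empty list, `from package import *` exports nothing, which is consistent with the `NameError` reported above.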
**NOT done this session (still genuinely open):**
- Everything hardware-dependent (Phase 0 gates #2/#3, real RNode probe/spawn). The .116 Heltec V3
reflash mentioned in the prior session's memory was **not** done in this session — no physical
hardware access was exercised, only software.
- `/dev/reticulum-radio` udev symlink (plan §2c) — **deliberately not added**: the existing
`99-mesh-radio.rules` keys on USB vendor/product ID (e.g. CP2102 0x10c4/0xea60), but the whole
point of `device_kind` is that the *same* chip can run any of the three firmwares — a
vendor/product udev rule can't disambiguate them, and a fabricated rule would just be
misleading. Real fix needs either a per-device `ATTRS{serial}==...` rule the operator fills in
once they know their specific board's serial (no such board exists in-repo to template from
yet), or rely on `device_kind` alone (already done, works regardless of `/dev` path naming).
Revisit once a real RNode-flashed board's serial is known.
- PyInstaller binary not yet wired into the release tarball / `scripts/deploy-to-target.sh` (the
daemon binary path is currently resolved via `ARCHY_RETICULUM_DAEMON_BIN` env or the dev venv
fallback in `reticulum.rs`'s `daemon_command()` — production default
`/usr/local/bin/archy-reticulum-daemon` is a real path convention now that `build.sh` produces
exactly that filename, but nothing copies it there yet). Left undone deliberately — wiring
release-tarball plumbing for a binary that's never been run against real RNS network traffic
felt premature; do this once Phase 0 gates #2/#3 pass.
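
The pinning behaviour this checkpoint describes (full auto-detect order when `device_kind` is unset, exactly one driver when it is pinned) can be sketched as follows. This is an illustrative Python model, not the Rust code in `session.rs`; `drivers_to_try`, `open_device`, and the string driver names are hypothetical stand-ins.

```python
from typing import Callable, Optional

# Auto-detect order as stated in this doc.
PROBE_ORDER = ("meshcore", "meshtastic", "reticulum")


def drivers_to_try(device_kind: Optional[str]) -> list[str]:
    """None keeps the full auto-detect order; a pin narrows it to one driver."""
    return [k for k in PROBE_ORDER if device_kind is None or k == device_kind]


def open_device(device_kind: Optional[str],
                probes: dict[str, Callable[[], bool]]) -> str:
    """Try each allowed driver in order and return the first that matches.

    With a pin, a failed probe is reported for that driver instead of
    falling through to another firmware's handshake.
    """
    for kind in drivers_to_try(device_kind):
        if probes[kind]():
            return kind
    if device_kind is not None:
        raise RuntimeError(f"pinned driver {device_kind!r} failed to open the port")
    raise RuntimeError("no supported radio firmware detected")
```

The point of the pin is the error path: a board pinned to one firmware never gets probed with another firmware's handshake.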
## Phase 2 — Rust wiring detail (what's done vs left)
**Done — `cargo check -p archipelago` is GREEN:**
- `core/archipelago/src/mesh/types.rs`: `DeviceType::Reticulum` (+ `Display` arm) + a
`radio_transport_label(DeviceType) -> &'static str` helper (`"reticulum"` vs `"lora"`).
- `core/archipelago/src/mesh/mod.rs` — all 4 outbound stamp sites use
`radio_transport_label(...)`; `use_typed_envelope` (~1571) extended to
`matches!(device_type, Meshcore | Reticulum)`; `data_dir` threaded into
`spawn_mesh_listener(...)` call (was: `MeshService::start()` → `spawn_mesh_listener`).
- `core/archipelago/src/mesh/listener/mod.rs`: `spawn_mesh_listener` takes `data_dir:
PathBuf`, passes `&data_dir` into `run_mesh_session`.
- `core/archipelago/src/mesh/listener/decode.rs:406,639` and `dispatch.rs:79` — all 3 inbound
stamp sites now use `radio_transport_label(state.status.read().await.device_type)`.
- `core/archipelago/src/mesh/listener/session.rs`:
- `MeshRadioDevice` enum has `Reticulum(ReticulumLink)`; all 18 method arms wired (no-ops:
`ensure_lora_region`, `ensure_channel`, `send_keepalive`, `send_nodeinfo_advert`, `reboot`,
`reset_contact_path`; everything else forwards to `ReticulumLink`).
- `auto_detect_and_open(data_dir: &Path)` and `open_preferred_path(path, data_dir: &Path)`
both now try `ReticulumLink::open(path, data_dir)` **last**, after Meshcore/Meshtastic —
cheap raw-serial KISS-detect probe runs first; the daemon only spawns on a confirmed match.
- `reticulum_contact_id()` helper added (delegates to the canonical
`reticulum::reticulum_contact_id_from_hash`, masked `& 0x7FFF_FFFF`, avoids 0).
- `refresh_contacts()` has an `is_reticulum` branch parallel to `is_meshtastic`; `reachable`
flows through `contact.path_len != 0` unchanged (`ReticulumLink::get_contacts()` already
encodes daemon-reported reachability into `path_len`).
- `data_dir: &Path` threaded through `run_mesh_session` → both probe functions.
- `core/archipelago/src/mesh/reticulum.rs` — **created**. `ReticulumLink`: spawns/supervises the
daemon as a child process, Unix-socket RPC client (matches the tested daemon contract),
`prefix_to_hash: HashMap<[u8;6],[u8;16]>` (mandatory per the plan), synthetic
`InboundFrame` builder byte-matching `meshtastic.rs`'s layout, `Drop` impl that kills the
daemon + cleans up the socket. Has unit tests (KISS-detect byte matching, contact-id masking,
synthetic-frame layout) — **passing, see below**.
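
The contact-id masking mentioned above (`& 0x7FFF_FFFF`, never 0) can be illustrated with a short sketch. Only the mask and the non-zero rule come from this doc. Which bytes of the hash feed the id, their byte order, and what a zero result is remapped to are assumptions made for the example, and `contact_id_from_hash` is a hypothetical Python name, not the Rust helper.

```python
def contact_id_from_hash(dest_hash: bytes) -> int:
    """Fold a 16-byte RNS destination hash into a positive, non-zero 31-bit id.

    ASSUMPTION: uses the first four bytes, big-endian, and remaps 0 to 1.
    Only the 0x7FFF_FFFF mask and the "avoid 0" rule are taken from the doc.
    """
    if len(dest_hash) != 16:
        raise ValueError("expected a 16-byte RNS destination hash")
    value = int.from_bytes(dest_hash[:4], "big") & 0x7FFF_FFFF
    return value or 1
```

Masking off the top bit keeps the id non-negative if it is ever stored in a signed 32-bit field.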
**Concurrent-edit note:** a separate in-flight change (not mine) added `MeshPeer.pkc_capable`
and `ParsedContact.pkc_capable` (Meshtastic PKI-capability tracking) while this work was in
progress. Accounted for: `reticulum.rs`'s `ParsedContact` literal sets `pkc_capable: false`
(Reticulum/LXMF is unconditionally E2E via `take_rx_encrypted()`, this field has no analogue);
two incomplete `MeshPeer` literals in `decode.rs` (lines ~330, ~548) were completed with
`pkc_capable: false` to unblock the build for everyone — not reverted, not worked around.
**Self-review fix applied:** the RPC Unix socket originally lived in the shared system temp
dir; moved to `{data_dir}/reticulum/` (0700) instead — archipelago-owned, not shared `/tmp`,
matching the security posture. Re-confirmed `cargo check -p archipelago` GREEN after the move.
**NOT yet done:**
- `MeshConfig.device_kind: Option<DeviceType>` hint (optional reflashable-board disambiguator,
plan §2c): since added, see the late-session checkpoint above. With the default `None`,
auto-detect ordering (Meshcore→Meshtastic→Reticulum, strict probes) is still the only disambiguator.
- Phase 3 frontend — **DONE**, but **smaller scope than originally inventoried**: only
`Mesh.vue`'s `transportLabel()` (per-message field) + `mesh-styles.css` `.transport-reticulum`
+ the `mesh.ts` doc comment needed the addition. `transport.ts` `TransportKind`,
`federation/types.ts` `last_transport`, `NodeList.vue` `transportBadge`, and `PeerFiles.vue`
`transportPill` are a COARSER routing-layer category (`mesh`/`lan`/`fips`/`tor`) where
`'mesh'` already covers any radio (meshcore/meshtastic/reticulum) — adding a separate
`'reticulum'` there would be inconsistent with how meshcore/meshtastic are handled. Confirmed
via `vue-tsc --noEmit` (exit 0, zero errors).
- Everything hardware-dependent was open when this list was written: real daemon spawn/probe
against an actual RNode (the .116 Heltec V3, once reflashed) and two-node LXMF-over-LoRa. Both
are now covered by the hardware-session checkpoint below. The `_announce_app_data`
signed-identity TODO also listed here was stale: per the late-session checkpoint above, the
daemon already embeds `ARCHY:2:{ed}:{x25519}` when the pubkey flags are passed (needed for
`bind_federation_twins`-style auto-binding across protocols).
## Verified facts to reuse (don't re-derive)
**RNode KISS-detect handshake** (confirmed against the canonical Reticulum source, not guessed):
```
constants: FEND=0xC0 FESC=0xDB TFEND=0xDC TFESC=0xDD CMD_DETECT=0x08 DETECT_REQ=0x73 DETECT_RESP=0x46
probe tx: C0 08 73 C0 50 00 C0 48 00 C0 49 00 C0 (detect + fw_version + platform + mcu queries)
success: response contains byte sequence ... C0 08 46 ... (FEND, CMD_DETECT, DETECT_RESP)
```
Source: `RNS/Interfaces/RNodeInterface.py` (Liberated Systems mirror), `detect()`/`readLoop()`.
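
As a quick illustration of the handshake bytes listed above, here is a minimal Python sketch that builds the probe and checks a reply. The constants and byte sequences are the ones in the block above; `PROBE` and `is_rnode_response` are hypothetical names, and reading from the serial port is left out.

```python
FEND = 0xC0
CMD_DETECT = 0x08
DETECT_REQ = 0x73
DETECT_RESP = 0x46

# Probe bytes exactly as listed above: detect, then the three follow-up queries.
PROBE = bytes.fromhex("c0 08 73 c0 50 00 c0 48 00 c0 49 00 c0")


def is_rnode_response(rx: bytes) -> bool:
    """True if the reply contains FEND, CMD_DETECT, DETECT_RESP in sequence."""
    return bytes([FEND, CMD_DETECT, DETECT_RESP]) in rx
```

A substring match is enough here because the reply may carry other frames before or after the detect response.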
**Synthetic `InboundFrame` layout** for a 1:1 DM, copied exactly from
`meshtastic.rs:1031-1047` (`ReticulumLink` must build the same shape so `frames::handle_frame`
needs zero changes):
```
data = [snr(1)=0][reserved(2)=00,00][sender_prefix(6)][path(1)=0xff][type(1)=0][rx_time(4 LE)][payload…]
code = RESP_CONTACT_MSG_V3_E2E if encrypted else RESP_CONTACT_MSG_V3 (RNS/LXMF is always E2E, so always _E2E)
```
Channel/broadcast equivalent (`RESP_MESHTASTIC_CHANNEL_TEXT`, meshtastic.rs:1019-1028) — N/A for
Reticulum in single-device Phase 2 (LXMF has no shared-channel concept); revisit in Phase 4.
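
The DM `data` layout above packs into a fixed 15-byte header followed by the payload. A minimal sketch with Python's `struct`, assuming `rx_time` is Unix seconds (the doc only fixes it as 4 bytes little-endian); `build_dm_frame_data` is a hypothetical name, not the Rust builder.

```python
import struct


def build_dm_frame_data(sender_prefix: bytes, payload: bytes, rx_time: int) -> bytes:
    """Pack the 1:1 DM `data` bytes in the field order listed above."""
    if len(sender_prefix) != 6:
        raise ValueError("sender_prefix must be exactly 6 bytes")
    header = struct.pack(
        "<B2s6sBBI",    # little-endian, no padding: 1+2+6+1+1+4 = 15 bytes
        0,              # snr, always 0 for the synthetic frame
        b"\x00\x00",    # reserved
        sender_prefix,  # leading 6 bytes of the sender's hash
        0xFF,           # path
        0,              # type
        rx_time,        # ASSUMPTION: Unix seconds
    )
    return header + payload
```

The payload starts at offset 15, which is the property a frame handler shared with the Meshtastic path depends on.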
**`resolve_peer`** (decode.rs:316) matches inbound `sender_prefix` against
`peer.pubkey_hex.starts_with(prefix)` — so as long as `refresh_contacts`/announce-handling
populates `pubkey_hex` = full 16-byte RNS hash hex BEFORE a message arrives (same precondition
meshtastic relies on via its `peer_pubkeys` map), no Reticulum-specific fallback is needed there.
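
The prefix match described above can be modelled in a few lines. This is a sketch of the lookup rule only (hex prefix against `pubkey_hex`), using dicts in place of the real peer structs. The peer names are made up, and the two hashes are the ones quoted elsewhere in this doc.

```python
from typing import Optional


def resolve_peer(peers: list[dict], sender_prefix: bytes) -> Optional[dict]:
    """Return the first peer whose pubkey_hex starts with the prefix, else None."""
    prefix_hex = sender_prefix.hex()  # lowercase hex, like the stored hashes
    for peer in peers:
        if peer["pubkey_hex"].startswith(prefix_hex):
            return peer
    return None


peers = [
    {"name": "node-a", "pubkey_hex": "5d146f6e1c9707f89468b5016ed6dfad"},
    {"name": "node-b", "pubkey_hex": "06bb31e16f4f8d46a8ae8eac23a4fd21"},
]
```

If the announce has not been processed yet, the peer list has no matching entry and the lookup returns `None`, which is the precondition called out above.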
**`ParsedContact.public_key_hex`** for Reticulum = hex of the 16-byte RNS dest hash (32 hex
chars, NOT 32 bytes) — the `hex::decode(...).len()==32` checks elsewhere (e.g. the auto-heal
`reset_contact_path` loop in `refresh_contacts`) will naturally skip Reticulum contacts since
their key decodes to 16 bytes, not 32. That's fine — no special-casing needed, just don't "fix"
it to be 32 bytes.
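
The "32 hex chars, not 32 bytes" distinction is easy to check directly. A small sketch, with `is_32_byte_key` as a hypothetical Python equivalent of the `hex::decode(...).len()==32` style checks mentioned above:

```python
def is_32_byte_key(public_key_hex: str) -> bool:
    """True only when the hex string decodes to exactly 32 bytes."""
    try:
        return len(bytes.fromhex(public_key_hex)) == 32
    except ValueError:
        return False


# .116's dest hash from this doc: 32 hex characters, which decode to 16 bytes.
rns_hash_hex = "5d146f6e1c9707f89468b5016ed6dfad"
```

A Reticulum contact therefore fails the 32-byte check and is skipped by those code paths without any special case.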
**`data_dir.join("identity").join("node_key")`** is the 32-byte raw Ed25519 seed file — this is
exactly what `reticulum_daemon.py --identity-key <path>` expects (confirmed against
`identity.rs` `NODE_KEY_FILE`/`load_or_create`). The daemon reads the file itself — Rust should
pass the **path**, not pipe the raw key bytes through more hops than already exist.
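
To make the "pass the path, not the bytes" rule concrete, here is a sketch of how the daemon command line might be assembled. The three flags, the env var, and the default binary path are the ones named in this doc. `daemon_argv` is a hypothetical Python stand-in for the Rust `daemon_command()`; it omits the dev-venv fallback and any serial-port or socket arguments, which this doc does not spell out.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DAEMON_BIN = "/usr/local/bin/archy-reticulum-daemon"


def daemon_argv(data_dir: Path, ed_pubkey_hex: str, x25519_pubkey_hex: str,
                env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Build the daemon argv; the key file is referenced by path, never read here."""
    env = os.environ if env is None else env
    binary = env.get("ARCHY_RETICULUM_DAEMON_BIN", DEFAULT_DAEMON_BIN)
    key_path = data_dir / "identity" / "node_key"
    return [
        binary,
        "--identity-key", str(key_path),
        "--archy-ed-pubkey-hex", ed_pubkey_hex,
        "--archy-x25519-pubkey-hex", x25519_pubkey_hex,
    ]
```

Keeping the seed out of argv and pipes means it never appears in a process listing or crosses another process boundary.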
## Hardware update (2026-06-30)
**.116 has a Heltec V3 available to reflash with RNode firmware.** This unblocks Phase 0 gates
#2/#3 (previously marked blocked — `.198`'s radio is dead, but .116's Heltec V3 is a real path
forward without needing new hardware). Next concrete step once reflashed: run
`reticulum-daemon/reticulum_daemon.py` pointed at the RNode's serial path, confirm `--check`
hash matches `--selftest`, then bring up two instances (.116 + .228, after .228 also gets an
RNode-capable board) for the real two-node LXMF-over-LoRa gate.
## Daemon contract (already built + tested — Phase 2 codes against this, no changes needed)
`reticulum-daemon/reticulum_daemon.py`, RPC over Unix socket (0600), one JSON object per line:
- in: `{"cmd":"send","dest_hash":hex16,"content":...}` / `{"cmd":"announce"}` /
`{"cmd":"status"}` / `{"cmd":"shutdown"}`
- out: `{"event":"ready",...}` / `{"event":"recv",...}` / `{"event":"announce",...}` /
`{"event":"delivered",...}` / `{"event":"status",...}`
Verified: `--check` (hash only), `--selftest` (boots real RNS+LXMF, no radio), and a live
socket round-trip (`ready`→`status`→`shutdown`, clean exit) — see `reticulum-daemon/README.md`.
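
The one-JSON-object-per-line framing can be handled with a small encoder and a buffering reader. A minimal sketch, assuming UTF-8 and `\n` as the line terminator; `encode_command` and `EventReader` are hypothetical names, and connecting the Unix socket itself is omitted.

```python
import json


def encode_command(cmd: str, **fields) -> bytes:
    """Serialize one command as a single JSON line."""
    return (json.dumps({"cmd": cmd, **fields}) + "\n").encode("utf-8")


class EventReader:
    """Accumulates raw socket bytes and returns complete JSON events."""

    def __init__(self) -> None:
        self._buf = b""

    def feed(self, chunk: bytes) -> list[dict]:
        """Add a chunk; return every event whose terminating newline has arrived."""
        self._buf += chunk
        # Everything before the last newline is complete; the rest stays buffered.
        *lines, self._buf = self._buf.split(b"\n")
        return [json.loads(line) for line in lines if line.strip()]
```

Buffering matters because a stream socket can deliver half an event in one read and the remainder in the next.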
## Checkpoint 2026-06-30 (hardware session — gates #2/#3 PASSED)
Picked up after a session pipe-break; the live system (archipelago.service + the spawned
`archy-reticulum-daemon`) had kept running uninterrupted the whole time, so nothing was lost.
**What happened, in order:**
1. .116's Heltec V3 (CP2102, USB vendor/product `10c4:ea60`, serial `0001`) was reflashed with
RNode firmware and plugged into `/dev/mesh-radio` (generic udev symlink → `ttyUSB0`, not a
per-serial rule). `mesh-config.json` has `device_path: null` — pure auto-detect, no
`device_kind` pin needed.
2. Auto-detect correctly tried Meshcore → Meshtastic → Reticulum and found it: journal shows
`Found Reticulum (RNode) device via auto-detect path=/dev/mesh-radio` — but only **after**
~4 min of `Failed to spawn reticulum-daemon — is it installed/packaged?` retries, because
`/usr/local/bin/archy-reticulum-daemon` hadn't been copied into place yet from
`reticulum-daemon/dist/` (built via `./build.sh`). Once copied (sha256-verified match to the
`dist/` build), auto-detect succeeded on the very next retry.
3. `mesh.status` RPC confirmed live: `device_type: "reticulum"`, `device_connected: true`,
`dest_hash: 5d146f6e1c9707f89468b5016ed6dfad`. Periodic self-advert (`send_self_advert` →
`{"cmd":"announce"}` → real RNS `Identity.announce()`) firing every ~30s — confirmed this is
**not** the `send_nodeinfo_advert` no-op arm (that one's still legitimately a no-op for
Reticulum; the real announce path is `send_self_advert`, wired correctly).
4. Second RNode flashed onto a phone running **Sideband**. First attempt showed RF energy
(`interference_last_dbm` climbing) but `rxb: 0` — a parameter mismatch, **not** a frequency
problem (energy was detected, just not demodulated). Root cause: Spreading Factor mismatch
in Sideband's manual RNode interface config (frequency display rounds to one decimal so
"869.5" silently passed at first glance — bandwidth/SF/CR are separate fields and SF was
wrong). Once SF was corrected to match (freq `869525000`, BW `125000`, **SF `8`**, CR `5`),
`rxb` went non-zero immediately and a real `{"event":"announce","dest_hash":"1870744d...",
"app_data":"7a617a61"}` (hex for "zaza") arrived over the air.
5. **Gate #2 + gate #3 both passed in the same exchange**: `zaza` shows up as a real, reachable
`mesh.peers` contact; an inbound encrypted LXMF message ("Yoooo") arrived and was correctly
stamped `encrypted: true, transport: "reticulum"`; a reply was sent back and round-tripped.
Sideband is exactly the stock external client gate #3 calls for, so one real RNode-to-RNode
LoRa link covered both gates — no need for a second dedicated archy node.
6. **Two real bugs found from this, both fixed:**
- `record_sent_typed`'s `encrypted` flag was hardcoded `false`/`archy || pkc_capable` on the
Reticulum send path (both the native-text path in `send_message` and the typed-envelope
path in `send_typed_wire`) — correct for Meshcore/Meshtastic (where E2E really is
conditional on PKI/session state not yet threaded through), **wrong** for Reticulum: LXMF
encrypts every send to the destination identity key unconditionally, archy peer or not.
Fixed: both call sites now OR in `device_type == DeviceType::Reticulum`.
- `radio_transport_label()` collapsed Meshcore **and** Meshtastic into one generic `"lora"`
string, so the per-message pill couldn't distinguish them. User asked for 3 distinct pill
colors (Meshtastic mint, Meshcore orange, Reticulum blue) — extended the label fn to
return `"meshtastic"`/`"meshcore"`/`"reticulum"` distinctly, updated `Mesh.vue`'s
`transportLabel()` switch and `mesh-styles.css` (`.transport-meshtastic` `#3eb489`,
`.transport-meshcore` `#fb923c`, `.transport-reticulum` `#60a5fa`; kept `.transport-lora`
`#f59e0b` as a fallback for any already-stored legacy-labelled messages). `cargo check` +
`vue-tsc --noEmit` both green after.
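
Both fixes in item 6 reduce to small pure functions. The sketch below models them in Python for illustration. It is not the Rust code: `sent_encrypted` and its `base_flag` parameter are hypothetical, with `base_flag` standing for whatever each call site computed before the fix. The label strings and colours are the ones listed above.

```python
from enum import Enum


class DeviceType(Enum):
    MESHCORE = "meshcore"
    MESHTASTIC = "meshtastic"
    RETICULUM = "reticulum"


def radio_transport_label(device_type: DeviceType) -> str:
    """After the fix: one distinct label per radio transport."""
    return device_type.value


# Pill colours from mesh-styles.css as listed above; "lora" is the legacy fallback.
PILL_COLORS = {
    "meshtastic": "#3eb489",
    "meshcore": "#fb923c",
    "reticulum": "#60a5fa",
    "lora": "#f59e0b",
}


def sent_encrypted(device_type: DeviceType, base_flag: bool) -> bool:
    """OR the Reticulum check into the flag the call site already computed."""
    return base_flag or device_type is DeviceType.RETICULUM
```

Messages stored before the fix still carry the `"lora"` label, which is why that colour stays in the table.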
**NOT yet done:**
- The Rust-side fix above (`encrypted` flag, transport-label split) is built but **not yet
deployed to .116's running binary** — the live daemon/auto-detect verification above was all
against the binary already running before this session's edits. Rebuild + redeploy to see the
fix live.
- `tests/lifecycle/run-gate.sh` not re-run after these mesh changes yet (project convention:
run after backend changes land).
- Multi-device (3 radios at once, Phase 4; "Next up" #6 below) and the release-tarball/udev-rule
wiring ("Next up" #5 below) are both still untouched.
## Next up (resume here)
Phase 0 gates #1–#3 are now **all passed**. What's left:
1. Rebuild the backend + frontend and redeploy to .116 so the `encrypted`-flag fix and the
3-way transport-pill color split actually take effect on the live node (currently only
checked in with `cargo check`/`vue-tsc`, not deployed).
2. Re-verify on-device after redeploy: send another Sideband↔archy DM, confirm the Sent bubble
now shows E2E + a blue "Reticulum" pill, and confirm Meshtastic/Meshcore pills (if any
messages exist) render mint/orange instead of the old generic amber "LoRa".
3. Exercise the rest of the plan's "Verification (definition of done)" items: hot-swap
detection (unplug the RNode mid-session, confirm fallback to FIPS/Tor on the same contact;
replug, confirm it picks Reticulum back up), and `device_kind: Some(Reticulum)` pin path
(currently only auto-detect has been exercised on real hardware).
4. Run `tests/lifecycle/run-gate.sh` to confirm no regression from the mesh changes landing.
5. Only after the above: wire `dist/archy-reticulum-daemon` into the release tarball /
`scripts/deploy-to-target.sh` (target path `/usr/local/bin/archy-reticulum-daemon`, matching
`reticulum.rs`'s default) and add a per-serial-number `/dev/reticulum-radio` udev rule now
that a real board's serial number (`0001` on the CP2102, .116's board) is known — though a
second board will likely report the same `0001` stock serial since CP2102 modules commonly
ship with an unprogrammed default, so this may still need a different disambiguator.
6. Phase 4 (run all 3 radios at once) — still not started, follow-on after the above.