Inbound Meshtastic text addressed to BROADCAST_NUM (the default public LongFast channel, or any channel slot) was filed into a per-sender 1:1 DM thread, so public-channel messages polluted individual people's DM chats and appeared as if sent directly to the user. packet_to_inbound_frame now detects `to == BROADCAST_NUM` and emits a new synthetic RESP_MESHTASTIC_CHANNEL_TEXT frame ([channel_idx][sender_prefix(6)][text]) that the listener files under the channel thread (contact_id = u32::MAX - idx) while still attributing the message to its real sender. Directed text (to == our node) still routes to the DM thread — a regression test locks that split in. send_channel_text now sets MeshPacket.channel (field 3) so archy actually transmits on channel 0 (public) instead of ignoring the slot. Mesh.vue keeps the synthetic "Meshtastic !xxxx" sender id when that is the best identity available for a stock public-channel device. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# 1.8.0 OTA Session Progress

Updated: 2026-06-30

---

## ▶️ RESUME HERE — archy↔archy LoRa (2026-06-30 PM) — READ FIRST

**Goal:** archy↔archy text over Meshtastic LoRa must DELIVER and show the E2E pill,
identical in off-grid and normal mode. Test bed = `.116` / `.198` / `.228` (all EU_868).
Don't touch the federation/FIPS path.

### ✅✅✅ SOLVED 2026-06-30 — archy↔archy LoRa WORKS (delivery + E2E pill + identity)

VERIFIED: `.198→.228` directed DM → `.228` row `RECEIVED enc=True peer="Arch Optiplex"`.
All three nodes (.116/.198/.228) now hear each other + stock peer 3ccc. Deployed binary
**`737b16c3235b`** active on all three. Fix source **COMMITTED as `a57ae388`** on `main`
(not yet pushed to gitea-vps2/origin).

**THE fix (receive stream):** archy ignored `FromRadio.rebooted` (field 8). Every config
write reboots the radio → firmware PhoneAPI resets to `STATE_SEND_NOTHING` and stops
streaming received packets until the client re-sends `want_config`. archy never did →
went deaf to inbound (that's why old messages only arrived after a full restart = fresh
want_config). Fix: handle `FROM_RADIO_REBOOTED` → set `pending_reinit` → re-send
want_config; plus a 10s keepalive heartbeat (insurance vs 15-min idle serial close) and
a pinned `modem_preset=LONG_FAST` so all radios share frequency. Combined with the earlier
E2E send fix (plain TEXT_MESSAGE_APP DM, firmware PKC) this closes archy↔archy LoRa.

**Open follow-ups:** #A surface received msgs under archy identity in all UI views; #6
device-onboarding modal; #8 Device-tab settings panel; #7 re-verify .116 in rotation;
#12 make modem_preset authoritative + hot-swap re-binding + RX-stall watchdog;
#14 signal-strength (RSSI/SNR) indicator per contact (from MeshPacket rx_rssi/rx_snr);
#15 map view plotting peer locations where shared (Meshtastic POSITION_APP portnum=3
lat/lon). See the resume memory `project_session_resume_2026_06_30_lora.md` for the full
task list.
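
The receive-stream fix described in this section can be pictured as a small state machine: a reboot event only raises a flag, and the session loop turns that flag into a fresh `want_config` before it considers the keepalive. The sketch below is a minimal illustration under stated assumptions, not the code in `listener/session.rs`. The names `RadioEvent`, `Action`, `Session`, `on_event`, and `poll` are hypothetical; only `pending_reinit` and the 10s interval come from the notes.

```rust
use std::time::{Duration, Instant};

/// What the serial reader observed (hypothetical event type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RadioEvent {
    /// `FromRadio.rebooted` arrived: the firmware stopped streaming packets.
    Rebooted,
    /// Any other frame from the radio.
    Other,
}

/// What the session loop should write to the radio next (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    SendWantConfig,
    SendHeartbeat,
}

pub struct Session {
    pending_reinit: bool,
    last_tx: Instant,
    heartbeat_every: Duration,
}

impl Session {
    pub fn new(now: Instant) -> Self {
        Session {
            pending_reinit: false,
            last_tx: now,
            // 10s keepalive, as stated in the notes.
            heartbeat_every: Duration::from_secs(10),
        }
    }

    /// Record an event. A reboot only sets the flag; the write happens in `poll`.
    pub fn on_event(&mut self, event: RadioEvent) {
        if event == RadioEvent::Rebooted {
            self.pending_reinit = true;
        }
    }

    /// Decide the next write. Re-subscribing takes priority over the heartbeat.
    pub fn poll(&mut self, now: Instant) -> Option<Action> {
        if self.pending_reinit {
            self.pending_reinit = false;
            self.last_tx = now;
            return Some(Action::SendWantConfig);
        }
        if now.duration_since(self.last_tx) >= self.heartbeat_every {
            self.last_tx = now;
            return Some(Action::SendHeartbeat);
        }
        None
    }
}
```

Keeping the decision in `poll` means the reader task never writes to the serial port itself, which is one plausible way to avoid interleaved writes. Whether the real session loop is structured this way is an assumption.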

### (historical) earlier TL;DR — RF-layer suspicion, now RESOLVED by the reboot-recovery fix

The **archy software is correct and deployed.** The blocker was at the
**radio/RF layer: the three radios are not hearing each other over the air at all.** No
amount of archy code change will fix that until the radios actually RF-link. **Resume by
testing the radios directly at home (Meshtastic phone app over Bluetooth) — see "DO THIS
FIRST AT HOME" below.** ← this turned out to be the want_config resubscribe bug above.

### What is DONE and deployed (commit pending — see below)

- **E2E send fix** (`core/archipelago/src/mesh/mod.rs` `send_message`, ~L1542): archy↔archy
  plain chat text is now sent as a **native `TEXT_MESSAGE_APP` DM** (firmware PKC-encrypts
  it E2E), NOT wrapped in our binary typed envelope. Archy peers' Sent rows are marked
  `encrypted=true` so the pill shows. Rich typed msgs still use `send_typed_wire`. This was
  the original root-cause fix (envelope-wrapped text silently broke archy↔archy LoRa).
- **NEW: software radio-reboot** end-to-end, so a wedged/RX-deaf radio can be rebooted
  without physical access (and for the Device-tab settings panel the user requested):
  - `meshtastic.rs`: `reboot(seconds)` driver method + `ADMIN_REBOOT_SECONDS_FIELD = 97`
    (verified vs meshtastic/protobufs admin.proto — `set_owner=32/set_channel=33/set_config=34`
    matched our existing constants, confirming the proto read).
  - `listener/mod.rs`: `MeshCommand::RebootRadio { seconds }`.
  - `listener/session.rs`: device-enum `reboot()` dispatch (Meshtastic only) + handler arm.
  - `mesh/mod.rs`: `MeshService::reboot_radio(seconds)`.
  - `api/rpc/mesh/messaging.rs`: `handle_mesh_reboot_radio` → RPC **`mesh.reboot-radio`**
    `{seconds?}` (default 2); dispatcher arm in `api/rpc/dispatcher.rs`.
- `cargo check` passes. Built release **sha `ba4aed590027690d`** and DEPLOYED + active on
  `.116/.198/.228`. The RPC works (`{"reboot":true,"seconds":2}`).
- ⚠️ **Caveat:** when called, archy logged "Sent Meshtastic radio reboot" but the radio did
  **not** visibly reboot afterward (no config re-stream). Either field 97 is still off, or
  newer firmware requires an admin session passkey even over local serial, or the USB serial
  stayed open through the 2s reboot so no reconnect was logged. **Needs on-device verification.**
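
To help rule out the "field 97 is still off" branch of the caveat, the expected wire bytes for the reboot field can be computed by hand. The sketch below encodes only the innermost key/value pair, assuming the field is a varint-encoded integer (wire type 0). It does not build the enclosing admin message, `Data`, `MeshPacket`, or `ToRadio` wrappers, and `put_varint` and `encode_reboot_seconds` are hypothetical helper names, not functions from `meshtastic.rs`.

```rust
/// Field number taken from the notes above.
const ADMIN_REBOOT_SECONDS_FIELD: u32 = 97;

/// Append `value` as a protobuf base-128 varint (7 bits per byte, low bits first).
fn put_varint(buf: &mut Vec<u8>, mut value: u64) {
    while value >= 0x80 {
        buf.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    buf.push(value as u8);
}

/// Encode `reboot_seconds = seconds` as a single protobuf key/value pair.
/// Assumption: the field uses wire type 0 (varint).
fn encode_reboot_seconds(seconds: i32) -> Vec<u8> {
    let mut buf = Vec::new();
    // Key = (field_number << 3) | wire_type, with wire_type = 0.
    put_varint(&mut buf, u64::from(ADMIN_REBOOT_SECONDS_FIELD) << 3);
    // Negative int32 values are sign-extended to 64 bits on the wire.
    put_varint(&mut buf, seconds as i64 as u64);
    buf
}
```

For the default of 2 seconds this yields the three bytes `88 06 02`. If a hex dump of the outgoing admin payload contains that sequence, the field number and encoding are probably not the problem, which would point at the passkey or serial-reconnect explanations instead.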

### The hard evidence (why "nothing works")

- Directed DM tests `.198→.228` AND `.116→.228` (neither path reflashed): sender logs
  `Sent plain native DM dest=30d258436d65 part=1 total=1` and RPC returns
  `sent:true, encrypted:true`, but `.228` logs **nothing** — packet never reaches archy from the radio.
- A raw broadcast from `.198` (`mesh.broadcast`) was accepted by its radio but **not heard**
  by `.228`/`.116`.
- In an 8-minute window, **all three nodes received 0 inbound OTA packets from any other node.**
  Each only logs its OWN once-a-minute `Broadcast Meshtastic NodeInfo advert` + local TX
  `field=11` queue-status. `.228 mesh.status` = `messages_received:1` total.
- `.198`'s radio is alive and transmitting NodeInfo every 60s — so it's not dead; it's that
  **reception is broken on the receivers.** A radio cannot drop a broadcast AND a unicast to
  its own node number while config matches, unless it simply isn't on the same airwaves.
- archy provisioning is correct & identical across nodes (read back from device): PRIMARY =
  public LongFast (`name="" psk_len=1`), SECONDARY = `archipelago`, region=3 (EU_868). Admin
  field constants verified. The send path hands the radio a correct unicast MeshPacket
  (`to`=node, want_ack, hop_limit=3, plaintext `decoded` for the firmware to PKC-encrypt).

### PRIME SUSPECT (software-fixable) — modem-preset / frequency mismatch

archy only ever writes `region` + `use_preset` and **never explicitly pins `modem_preset`**
(it parses region but not preset; `set_lora_region` relies on the LongFast default). If ANY
radio has a non-default modem preset / frequency slot persisted (e.g. set via the Meshtastic
app, or a different factory default after the `.198` reflash), the radios are on **different
airwaves despite identical channel name + region**, and archy would never correct it.

### DO THIS FIRST AT HOME (decisive, ~2 min, only the user can do it)

Open the **Meshtastic phone app over Bluetooth** (works alongside archy's USB serial) on each
of `.116/.198/.228` and check:

1. Do the 3 nodes **see each other** in the node list (recent "heard")? → if NO, they're not
   RF-reaching (preset/freq/antenna/range).
2. Do all 3 show the **same** Modem preset (LongFast), Region (EU_868), Frequency slot, and
   the same PRIMARY channel? → any difference = the cause.

This single test separates "archy misconfigures the radios" from "radios physically can't
reach each other."

### THEN — the archy fix to apply (if preset/config differs)

Make archy **authoritatively write the full LoRaConfig** and force re-provision so all radios
converge: in `core/archipelago/src/mesh/meshtastic.rs::set_lora_region` (and its
caller/guard `ensure_lora_region` ~L304), explicitly set `modem_preset = LONG_FAST (0)` as a
field in the LoRaConfig (it's currently omitted/defaulted), and make the startup provision
path rewrite LoRa config when the preset doesn't match, then reboot the radio (use the new
`mesh.reboot-radio`). Also verify that `mesh.reboot-radio` actually reboots the radio
on-device (the caveat above).
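
The convergence rule described above reduces to "compare what the radio reports with what archy wants, and rewrite plus reboot on any difference". The sketch below shows only that decision. `LoraSettings`, `Provision`, and `plan_provision` are hypothetical names, and the numeric values are the ones quoted in these notes (region 3 = EU_868, preset 0 = LONG_FAST); it is not the logic in `ensure_lora_region`.

```rust
/// The subset of LoRa settings archy cares about (hypothetical struct).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LoraSettings {
    region: u32,       // 3 = EU_868, per the notes
    use_preset: bool,
    modem_preset: u32, // 0 = LONG_FAST, per the notes
}

/// The configuration every radio in the test bed should converge on.
const DESIRED: LoraSettings = LoraSettings {
    region: 3,
    use_preset: true,
    modem_preset: 0,
};

/// Outcome of the startup provisioning check (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum Provision {
    UpToDate,
    RewriteAndReboot(LoraSettings),
}

/// Rewrite when the read-back differs in any field, or when nothing could be read.
fn plan_provision(read_back: Option<LoraSettings>) -> Provision {
    match read_back {
        Some(current) if current == DESIRED => Provision::UpToDate,
        _ => Provision::RewriteAndReboot(DESIRED),
    }
}
```

One detail worth checking when implementing the write side: proto3 encoders normally omit zero-valued scalar and enum fields, so "explicitly set `modem_preset = 0`" may require emitting the key/value pair by hand rather than relying on a generated encoder.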

### TEST RECIPE (works on each node)

- RPC helper used this session: a node-side `rpc.sh` that logs in (password
  `ThisIsWeb54321@`), grabs the `csrf_token` cookie, echoes it as `X-CSRF-Token`, and POSTs to
  `http://127.0.0.1:5678/rpc/v1`. Recreate it or run archy's RPC directly. Methods:
  `mesh.peers`, `mesh.status`, `mesh.messages`, `mesh.send {contact_id,message}`,
  `mesh.broadcast`, `mesh.reboot-radio {seconds}`.
- **LoRa contact ids:** `.116=1135977788` (prefix `3ca5b543`), `.198=3677050140` (`db2b551c`),
  `.228=1129894448` (prefix `30d25843`), stock `3ccc=1128152268`.
- **Link health check (run on each node):** look for inbound `from=Some("!...")` lines in
  `journalctl -u archipelago` that are NOT the node's own `Broadcast ... NodeInfo advert`. If
  zero across all nodes → RF link is down (the current state).
- **E2E success criteria:** send `.198→.228`, the marker appears in `.228` `mesh.messages` as
  an inbound row with `encrypted:true` / `transport:"lora"`, AND `.116↔.228` likewise.
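
The link health check in the list above is a two-condition filter over log lines, which is easy to get subtly wrong by eye during a long session. A minimal sketch follows, assuming only the two substrings quoted in the recipe. The helper name `count_foreign_inbound` is hypothetical, and the real log format around those substrings is not known here.

```rust
/// Count log lines that carry an inbound sender (`from=Some("!`) and are not
/// the node's own NodeInfo advert. A result of zero on every node suggests
/// that no over-the-air packets from other nodes are reaching archy.
fn count_foreign_inbound<'a>(lines: impl IntoIterator<Item = &'a str>) -> usize {
    lines
        .into_iter()
        .filter(|line| line.contains("from=Some(\"!"))
        .filter(|line| !line.contains("NodeInfo advert"))
        .count()
}
```

The function takes any iterator of string slices, so it can be fed from a saved `journalctl -u archipelago` dump split on newlines.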

### DEPLOY / BUILD RECIPE

- Build: from `core/`,
  `CARGO_TARGET_DIR=/tmp/archy-hotfix-target CARGO_INCREMENTAL=0 cargo build --release -p archipelago --bin archipelago`.
  (If `rust-lld: undefined hidden symbol`, it's incremental cache — `CARGO_INCREMENTAL=0` fixes it.)
- SSH key `~/.ssh/archipelago-deploy` is authorized on `.116/.198/.228`. SSH/UI/RPC password
  `ThisIsWeb54321@`. Per node: scp the binary, `sudo systemctl stop archipelago` →
  `kill -9 $(pgrep -x archipelago)` → `install -m0755` to `/usr/local/bin/archipelago` →
  `systemctl start archipelago`. Verify by `sha256sum` match + `systemctl is-active`.
- **Current deployed sha on all 3 = `ba4aed590027690d`** (the reboot-enabled build).

### Fleet state (as of 2026-06-30 PM)

- All 3 nodes on binary `ba4aed59`, active. Off-grid mode currently OFF (`mesh_only:false`).
- `.198` radio was reflashed to factory `firmware-heltec-v3-2.7.26` (recovered from corrupt
  NVS); region EU_868 persists. Its archy identity is NOT re-bound on `.228` (`.228` shows
  `.198` as raw radio "Meshtastic 551c", `arch_pubkey_hex` absent) because `.228` hasn't heard
  `.198`'s identity broadcast — a downstream symptom of the dead RF link, not a separate bug.
- The radios are powered & each transmitting; they are simply not hearing each other.

### Deferred UI (after LoRa works)

- Device-tab **settings panel** (gear/desktop) — host the "Reboot radio" button there; calls
  `mesh.reboot-radio`. Scoping done: add to the Mesh.vue actions row (mirrors Broadcast/Off-Grid
  buttons) + a `rebootRadio()` method in `neode-ui/src/stores/mesh.ts`. See `Mesh.vue` ~L1484
  actions row and `mesh.ts` ~L373 `broadcastIdentity()` pattern.
- Device-onboarding modal (detect plugged-in radio).

---

Current scope:
- Preserve existing mesh work: E2E indicators, FIPS/Tor transport indicators, typed-message paths, Meshtastic region/channel provisioning, and dirty Meshtastic receive-attempt changes.
- Take over the `3ccc` stock Meshtastic peer bug: LoRa text from `3ccc` to Archipelago `.116` does not surface in `mesh.messages`.
- Keep release-gate fixes already made in this session.

Local gate status so far:
- `cargo test -p archipelago --bin archipelago`: green, 849/849 after Meshtastic fixes.
- `python3 scripts/check-app-catalog-drift.py --release --strict`: green.
- `npm run type-check`: green.

Key changes made so far:
- Added cascade uninstall progress truthfulness assertion to `tests/lifecycle/bats/cascade-uninstall.bats`.
- Fixed release catalog drift filters and regenerated catalog metadata.
- Fixed invalid `apps/fedimint-clientd/manifest.yml` `cpu_limit` schema value.
- Updated stale/tight Rust tests without changing production behavior.

Remaining non-automatable / operational gates:
- Workstream B signing is blocked on the offline `RELEASE_MASTER_MNEMONIC`; code + runbook exist, but the publisher must pin/sign the release-root catalog.
- Phase-3 Quadlet backend rollout is implemented behind `use_quadlet_backends` and default-off. The gate skip-passes until explicitly enabled on a node; flipping it fleet-wide requires a coordinated flag rollout plus backend reinstall/migration verification.
- `.116` read-only `use-quadlet-backends-install.bats`: 6/6 skip-clean; no backend `.container` units, so Phase-3 is not active on that node.
- Release metadata still says `1.7.99-alpha` in `releases/manifest.json`; changelog top is `v1.8.00-alpha`. Cutting an actual 1.8.0 OTA requires an explicit version/manifest update.

Do not discard:
- `core/archipelago/src/mesh/listener/decode.rs`
- `core/archipelago/src/mesh/listener/session.rs`
- `core/archipelago/src/mesh/meshtastic.rs`

3ccc bug current hypothesis:
- The prior attempted Meshtastic fix added a hard stale-packet filter using `rx_time`.
- Stock Meshtastic radios without GPS/RTC can report tiny nonzero epoch values until time sync.
- That would make live `3ccc` packets look older than 10 minutes and get dropped before `mesh.messages`.
- Current patch treats implausibly early `rx_time` values as unknown rather than stale.
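
The hypothesis above amounts to a three-way classification of `rx_time` instead of a two-way one. The sketch below illustrates the idea under stated assumptions: the 10-minute window comes from the notes, while the plausibility cutoff (2020-01-01 UTC) and the names `RxAge`, `classify_rx_time`, `STALE_AFTER_SECS`, and `MIN_PLAUSIBLE_EPOCH` are hypothetical and may not match the actual patch.

```rust
/// How a packet's receive timestamp should be treated (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum RxAge {
    /// The radio has no usable clock yet; do not filter on age.
    Unknown,
    Fresh,
    Stale,
}

/// Stale window from the notes: 10 minutes.
const STALE_AFTER_SECS: u64 = 10 * 60;

/// Assumed cutoff: 2020-01-01T00:00:00Z. Anything earlier is treated as
/// "clock not synced" rather than as a genuinely old packet.
const MIN_PLAUSIBLE_EPOCH: u64 = 1_577_836_800;

fn classify_rx_time(rx_time: u32, now_epoch: u64) -> RxAge {
    let rx = u64::from(rx_time);
    if rx < MIN_PLAUSIBLE_EPOCH {
        return RxAge::Unknown;
    }
    // saturating_sub tolerates a radio clock slightly ahead of the host.
    if now_epoch.saturating_sub(rx) > STALE_AFTER_SECS {
        RxAge::Stale
    } else {
        RxAge::Fresh
    }
}
```

Only `Stale` would be dropped. `Unknown` packets pass through to `mesh.messages`, which is the behaviour the stock `3ccc` radio would need if it reports small epoch values before time sync.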

.116 live validation after 2026-06-30 hotfix:
- `.116` reachable by SSH; `archipelago` active; `/dev/mesh-radio -> ttyUSB0` attached.
- Current canary deploy is commit `b4531bb4`; backend sha
  `4ab53e539d89679ef664401a9a57996267772fed02327abc2912c3e77543acbf`; frontend bundle
  `index-YOAeJF7w.js` / `Mesh-BSAo88jN.js`.
- `main` pushed to `gitea-vps2`.
- RPC on `.116`:
  - `transport.status` currently reports `mesh_only:false` (off-grid mode is not enabled unless
    the user toggles it).
  - `mesh.status` reports Meshtastic connected: `device_type:"meshtastic"`,
    `self_node_id:1135977788`, `peer_count:13`.
  - Recent `.116` -> `3ccc` sent rows are stored with real 2026 timestamps and `transport:"lora"`.
- UI/backend fixes included in `b4531bb4`:
  - `transportLabel("lora")` displays **LoRa**.
  - mesh sends refetch messages after send so transport pills settle without browser refresh.
  - off-grid mode blocks the mesh-chat FIPS/Tor federation fallback and forces LoRa-only sends;
    banner text is `Tor/FIPS disabled - LoRa only`.
  - empty mesh-chat placeholder opacity reduced.
- Meshtastic diagnostics now identify the remaining blocker:
  - 3ccc NodeInfo is discovered:
    `Meshtastic peer is PKC-capable (NodeInfo public_key) node=1128152268 key_len=32`.
  - Bytes from stock Meshtastic text reach `.116`, but the custom parser rejects the packet:
    `Meshtastic FromRadio.packet did not parse into a decoded MeshPacket len=73 head=0dcc3c3e43153ca5b5432a16df56cbed`.
  - Non-text packets decode and are ignored with port numbers (`portnum=3/4/5`), so the serial
    read path is alive. Resume inside `core/archipelago/src/mesh/meshtastic.rs::parse_mesh_packet`.
- LoRa is therefore **not fully fixed** yet: stock `3ccc` -> `.116` text does not surface in
  `mesh.messages`, and `.116` -> `3ccc` still needs user-visible confirmation in the Meshtastic app.
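
Reading the logged `head=` bytes by hand may narrow down where `parse_mesh_packet` gives up. The sketch below assumes the usual Meshtastic `MeshPacket` numbering (`from` = field 1 and `to` = field 2, both fixed32; `channel` = field 3; `decoded` = field 4; `encrypted` = field 5), which should be checked against `mesh.proto` before relying on it. `hex_to_bytes`, `Head`, and `parse_head` are hypothetical helpers written for this note.

```rust
/// Decode a hex string into bytes; None on odd length or a non-hex digit.
fn hex_to_bytes(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| s.get(i..i + 2).and_then(|h| u8::from_str_radix(h, 16).ok()))
        .collect()
}

/// The first three fields of a packet head (hypothetical struct).
#[derive(Debug, PartialEq, Eq)]
struct Head {
    from: u32,
    to: u32,
    next_field: u32,    // field number of the third key
    next_wire_type: u8, // 2 = length-delimited
    next_len: u8,       // payload length, valid for lengths below 128
}

/// Parse `from`, `to`, and the key that follows them.
fn parse_head(b: &[u8]) -> Option<Head> {
    // 0x0d = field 1 / fixed32, 0x15 = field 2 / fixed32.
    if b.len() < 12 || b[0] != 0x0d || b[5] != 0x15 {
        return None;
    }
    let from = u32::from_le_bytes([b[1], b[2], b[3], b[4]]);
    let to = u32::from_le_bytes([b[6], b[7], b[8], b[9]]);
    let key = b[10];
    // Only single-byte keys and lengths are handled in this sketch.
    if key & 0x80 != 0 || b[11] & 0x80 != 0 {
        return None;
    }
    Some(Head {
        from,
        to,
        next_field: u32::from(key >> 3),
        next_wire_type: key & 0x07,
        next_len: b[11],
    })
}
```

Applied to `0dcc3c3e43153ca5b5432a16df56cbed`, this gives `from = 1128152268` (the stock `3ccc` node) and `to = 1135977788` (`.116`), followed by a length-delimited field 5 of 22 bytes. If the numbering assumption holds, the radio delivered this packet with an `encrypted` payload rather than a `decoded` one, which would explain why a parser that expects `decoded` rejects it while other port numbers still decode.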