fix(mesh): restore Meshtastic inbound stream after radio reboot

archy went deaf to inbound LoRa packets after every config write.
A config write (region/channel/owner) reboots the radio, which resets
the firmware PhoneAPI to STATE_SEND_NOTHING; it won't stream received
packets again until the client re-sends want_config. archy ignored
FromRadio.rebooted (field 8), so it never resubscribed; that is why old
messages only arrived after a full restart (restart = fresh want_config).

- meshtastic.rs: handle FROM_RADIO_REBOOTED -> set pending_reinit;
  try_recv_frame re-sends want_config to resubscribe the packet stream.
  Add send_keepalive (bare heartbeat) and pin modem_preset=LONG_FAST in
  set_lora_region so all radios share frequency.
- listener/session.rs: MeshRadioDevice::send_keepalive; 10s sync_timer
  sends a keepalive each tick (insurance vs 15-min idle serial close).
- mod.rs send_message: device-aware send. Meshtastic archy peers get a
  plain TEXT_MESSAGE_APP DM (firmware PKC E2E); Meshcore archy peers keep
  the typed envelope (no Meshcore regression).

Verified: .198->.228 directed DM arrives as RECEIVED enc=True
peer="Arch Optiplex"; all 3 nodes (.116/.198/.228) + 3ccc hear each
other. Binary 737b16c3 deployed+active on all three.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
archipelago 2026-06-30 12:44:31 -04:00
parent fbfeeeb0f5
commit a57ae388ec
3 changed files with 115 additions and 22 deletions

@@ -71,6 +71,17 @@ impl MeshRadioDevice {
}
}
/// Lightweight serial keepalive (Meshtastic only). Keeps the firmware
/// streaming RECEIVED packets to our serial client — without it the radio
/// can mark a quiet client gone and deliver only our own queue-status.
/// Meshcore needs no such ping.
async fn send_keepalive(&mut self) -> Result<()> {
match self {
Self::Meshcore(_) => Ok(()),
Self::Meshtastic(device) => device.send_keepalive().await,
}
}
/// Actively advertise our identity over the air. Meshcore already does this
/// inside `send_self_advert` (CMD_SEND_SELF_ADVERT), so this is a no-op for
/// it; Meshtastic needs an explicit NodeInfo broadcast or peers never learn
@@ -806,8 +817,14 @@ pub(super) async fn run_mesh_session(
handle_send_command(cmd, &mut device, state, &mut consecutive_write_failures).await;
}
// Periodic message sync
// Periodic message sync + serial keepalive
_ = sync_timer.tick() => {
// Keep the radio streaming inbound packets to our serial client
// (best-effort — a failed keepalive shouldn't trip the reconnect
// counter on its own; a truly dead port is caught by real writes).
if let Err(e) = device.send_keepalive().await {
debug!("Mesh keepalive failed: {}", e);
}
if sync_queued_messages(&mut device, state, our_x25519_secret).await {
consecutive_write_failures += 1;
debug!(failures = consecutive_write_failures, "Message sync failed");
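The tick handler above treats the keepalive as best-effort: a failed keepalive is logged but only a failed message sync bumps the reconnect counter. A minimal synchronous sketch of that policy follows. The `Radio` trait, `FakeRadio`, and `on_sync_tick` are hypothetical names introduced for illustration; the real session loop is async and works on `MeshRadioDevice`.

```rust
/// Hypothetical stand-in for the radio as seen by the session loop.
trait Radio {
    fn send_keepalive(&mut self) -> Result<(), String>;
    fn sync_queued_messages(&mut self) -> Result<(), String>;
}

/// Test double whose keepalive and sync outcomes can be toggled.
struct FakeRadio {
    keepalive_ok: bool,
    sync_ok: bool,
    keepalive_attempts: u32,
}

impl Radio for FakeRadio {
    fn send_keepalive(&mut self) -> Result<(), String> {
        self.keepalive_attempts += 1;
        if self.keepalive_ok { Ok(()) } else { Err("keepalive write failed".into()) }
    }

    fn sync_queued_messages(&mut self) -> Result<(), String> {
        if self.sync_ok { Ok(()) } else { Err("sync write failed".into()) }
    }
}

/// One timer tick: the keepalive error is only logged, while a sync
/// failure increments the consecutive-failure counter.
fn on_sync_tick<R: Radio>(radio: &mut R, consecutive_write_failures: &mut u32) {
    if let Err(e) = radio.send_keepalive() {
        eprintln!("Mesh keepalive failed: {e}");
    }
    if radio.sync_queued_messages().is_err() {
        *consecutive_write_failures += 1;
    }
}
```

With this shape, a flaky keepalive on an otherwise healthy port never forces a reconnect by itself; a dead port still shows up through the sync path.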

@@ -93,6 +93,14 @@ const DEFAULT_PSK_EXPANDED: &[u8] = &[
const CONFIG_LORA_FIELD: u64 = 6;
/// LoRaConfig field numbers we set when provisioning the radio's region.
const LORA_USE_PRESET_FIELD: u64 = 1;
/// LoRaConfig.modem_preset (field 2). Pinned to LONG_FAST (0) so every archy
/// radio computes the SAME over-the-air frequency/bandwidth. Omitting it (relying
/// on the firmware default) lets a radio keep a non-default preset persisted via
/// the phone app or a differing factory default — which puts radios on different
/// airwaves despite identical region + channel, so they silently never hear each
/// other. ModemPreset enum: LONG_FAST = 0.
const LORA_MODEM_PRESET_FIELD: u64 = 2;
const LORA_MODEM_PRESET_LONG_FAST: u64 = 0;
const LORA_REGION_FIELD: u64 = 7;
const LORA_HOP_LIMIT_FIELD: u64 = 8;
const LORA_TX_ENABLED_FIELD: u64 = 9;
@@ -141,6 +149,14 @@ pub struct MeshtasticDevice {
/// the session loop reads it via `take_rx_encrypted()` right after dispatch
/// to stamp the message's E2E pill. Set true only for `pki_encrypted` DMs.
last_rx_encrypted: bool,
/// Set when the radio announces it just rebooted (`FromRadio.rebooted`). A
/// rebooted firmware drops every client's `want_config` session, so it stops
/// streaming RECEIVED packets to us (we keep getting only our own
/// queue-status). We must re-send `want_config` to re-subscribe to the live
/// packet stream — otherwise inbound messages silently never surface after
/// any config write (region/channel/owner all reboot the radio). Consumed in
/// `try_recv_frame`, which re-issues the handshake.
pending_reinit: bool,
}
impl MeshtasticDevice {
@@ -172,6 +188,7 @@ impl MeshtasticDevice {
current_secondary_channel: None,
device_path: path.to_string(),
last_rx_encrypted: false,
pending_reinit: false,
})
}
@@ -357,11 +374,15 @@ impl MeshtasticDevice {
anyhow::bail!("Meshtastic set_lora_region: node_num unknown");
};
// LoRaConfig { use_preset(1)=true, region(7)=code, hop_limit(8)=3,
// tx_enabled(9)=true }. modem_preset defaults to LONG_FAST (0) and
// LoRaConfig { use_preset(1)=true, modem_preset(2)=LONG_FAST, region(7)=code,
// hop_limit(8)=3, tx_enabled(9)=true }. We pin modem_preset explicitly
// (rather than relying on the firmware default) so every archy radio lands
// on the SAME frequency/bandwidth — otherwise a radio carrying a stale
// non-default preset stays on different airwaves and silently never meshes.
// tx_power defaults to max, which is what we want for a stock mesh.
let mut lora = Vec::new();
encode_varint_field_into(LORA_USE_PRESET_FIELD, 1, &mut lora);
encode_varint_field_into(LORA_MODEM_PRESET_FIELD, LORA_MODEM_PRESET_LONG_FAST, &mut lora);
encode_varint_field_into(LORA_REGION_FIELD, region_code as u64, &mut lora);
encode_varint_field_into(LORA_HOP_LIMIT_FIELD, 3, &mut lora);
encode_varint_field_into(LORA_TX_ENABLED_FIELD, 1, &mut lora);
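The body of `encode_varint_field_into` is not part of this diff. As a rough sketch of what such a helper has to produce on the wire, the code below encodes the same five LoRaConfig fields using standard protobuf varint encoding (tag = field number shifted left by 3 with wire type 0, followed by the value). The names `put_varint`, `put_varint_field`, and `encode_lora_config` are hypothetical; the field numbers are the ones listed in the constants above.

```rust
/// Append `value` as a protobuf base-128 varint (7 bits per byte,
/// high bit set on every byte except the last).
fn put_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push(((value & 0x7f) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Append one varint-typed field: tag (field << 3 | wire type 0), then value.
fn put_varint_field(field: u64, value: u64, out: &mut Vec<u8>) {
    put_varint(field << 3, out);
    put_varint(value, out);
}

/// Build a LoRaConfig payload with the preset pinned explicitly.
/// Field numbers: use_preset = 1, modem_preset = 2, region = 7,
/// hop_limit = 8, tx_enabled = 9.
fn encode_lora_config(region_code: u64) -> Vec<u8> {
    let mut lora = Vec::new();
    put_varint_field(1, 1, &mut lora); // use_preset = true
    put_varint_field(2, 0, &mut lora); // modem_preset = LONG_FAST (0), written explicitly
    put_varint_field(7, region_code, &mut lora);
    put_varint_field(8, 3, &mut lora); // hop_limit = 3
    put_varint_field(9, 1, &mut lora); // tx_enabled = true
    lora
}
```

Note that the zero-valued `modem_preset` still occupies two bytes (`0x10 0x00`) in this sketch. A proto3 encoder would normally skip a default-valued field, so writing it by hand is what makes the pin explicit on the wire.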
@@ -509,6 +530,15 @@ impl MeshtasticDevice {
self.send_time_broadcast().await
}
/// Lightweight serial keepalive: a bare `ToRadio.heartbeat`. The firmware's
/// PhoneAPI treats a client that goes quiet as gone and can stop streaming
/// received packets to it; a once-a-minute advert heartbeat is too sparse, so
/// the session loop pings this every few seconds to keep the inbound stream
/// flowing. No NodeInfo/Position side effects, so it's cheap to call often.
pub async fn send_keepalive(&mut self) -> Result<()> {
self.send_to_radio(&encode_heartbeat()).await
}
/// Broadcast a minimal Position payload carrying current epoch time. The
/// Meshtastic protobuf explicitly documents `Position.time` as the path for
/// phone/API clients to set time on mesh devices without GPS/RTC. This keeps
@@ -677,9 +707,22 @@ impl MeshtasticDevice {
// continuous flood still yields back to the session select! loop.
for _ in 0..64 {
let Some(frame) = self.read_from_radio().await? else {
return Ok(None);
break;
};
if let Some(inbound) = self.handle_from_radio(&frame) {
let inbound = self.handle_from_radio(&frame);
// If the radio announced a reboot while draining, re-subscribe to the
// live packet stream BEFORE returning, so we don't go deaf to inbound
// packets for the rest of the session. (A reboot drops our want_config
// session on the firmware side.)
if self.pending_reinit {
self.pending_reinit = false;
if let Err(e) = self.send_to_radio(&encode_want_config()).await {
warn!("Failed to re-request config after radio reboot: {}", e);
} else {
info!("Re-requested Meshtastic config after reboot — packet stream resubscribed");
}
}
if let Some(inbound) = inbound {
return Ok(Some(inbound));
}
}
@@ -809,8 +852,18 @@ impl MeshtasticDevice {
}
None
}
FROM_RADIO_REBOOTED => {
// The radio just rebooted (a config write, or a manual/OTA
// reboot). Its firmware has dropped our `want_config` session,
// so it will no longer stream RECEIVED packets to us — we'd be
// left hearing only our own queue-status and silently miss every
// inbound message. Flag a re-subscribe; `try_recv_frame` re-issues
// `want_config` to resume the live packet stream.
warn!("Meshtastic radio rebooted — will re-request config to resume packet stream");
self.pending_reinit = true;
None
}
FROM_RADIO_CONFIG_COMPLETE_ID
| FROM_RADIO_REBOOTED
| FROM_RADIO_QUEUE_STATUS
| FROM_RADIO_XMODEM_PACKET
| FROM_RADIO_METADATA
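The reboot handling in this file splits into two steps: `handle_from_radio` only sets `pending_reinit`, and the drain loop in `try_recv_frame` consumes the flag and re-sends `want_config`. The synchronous sketch below models that flow with an in-memory queue. `FakeDevice`, `Frame`, and `Sent` are hypothetical types for illustration; the real code reads framed protobufs from a serial port and is async.

```rust
use std::collections::VecDeque;

/// Simplified view of what the radio can send us.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    Rebooted,
    QueueStatus,
    Packet(String),
}

/// Simplified view of what we write back to the radio.
#[derive(Debug, PartialEq)]
enum Sent {
    WantConfig,
}

struct FakeDevice {
    incoming: VecDeque<Frame>,
    sent: Vec<Sent>,
    pending_reinit: bool,
}

impl FakeDevice {
    fn new(frames: Vec<Frame>) -> Self {
        Self { incoming: frames.into(), sent: Vec::new(), pending_reinit: false }
    }

    /// Dispatch one frame; a reboot notice only raises the flag.
    fn handle_from_radio(&mut self, frame: &Frame) -> Option<String> {
        match frame {
            Frame::Rebooted => {
                self.pending_reinit = true;
                None
            }
            Frame::QueueStatus => None,
            Frame::Packet(text) => Some(text.clone()),
        }
    }

    /// Drain up to 64 frames, re-subscribing as soon as a reboot is seen
    /// and before any inbound packet is returned to the caller.
    fn try_recv_frame(&mut self) -> Option<String> {
        for _ in 0..64 {
            let Some(frame) = self.incoming.pop_front() else {
                break;
            };
            let inbound = self.handle_from_radio(&frame);
            if self.pending_reinit {
                self.pending_reinit = false;
                self.sent.push(Sent::WantConfig);
            }
            if inbound.is_some() {
                return inbound;
            }
        }
        None
    }
}
```

Keeping the flag separate from the dispatch function means `handle_from_radio` stays free of I/O, and the re-subscribe happens in the one place that already owns the write path.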

@@ -1542,21 +1542,45 @@ impl MeshService {
/// MeshMessage carries a stable MessageKey — this is what makes replies
/// and reactions addressable against plain text bubbles.
pub async fn send_message(&self, contact_id: u32, text: &str) -> Result<MeshMessage> {
use crate::mesh::message_types::{MeshMessageType, TypedEnvelope};
let seq = self.state.next_send_seq(contact_id).await;
// Plain chat text — to BOTH archy peers and stock devices — is sent as a
// native Meshtastic DM on TEXT_MESSAGE_APP. The firmware end-to-end
// (PKC / Curve25519) encrypts a directed DM whenever it knows the
// destination's public key, which archy peers exchange via NodeInfo, so
// the message is delivered E2E and surfaces as chat on every client.
//
// We deliberately do NOT wrap archy↔archy text in our binary typed
// envelope here. Meshtastic firmware 2.7.x will not deliver an opaque
// directed payload as a message: PRIVATE_APP is treated as opaque app
// data (never shown as chat), and a base64 envelope overflows a single
// LoRa frame and chunk-fails. Wrapping text was exactly what silently
// broke archy↔archy LoRa while archy→stock (plain text) kept working.
// Rich typed messages (invoice/coordinate/reaction/…) still use the
// typed-wire path via `send_typed_wire`; only plain Text goes native.
let device_type = self.state.status.read().await.device_type;
let archy = self.is_archy_peer(contact_id).await;
// Transport choice is DEVICE-AWARE so we fix Meshtastic without regressing
// Meshcore:
// • Meshtastic (any peer) → plain text native DM on TEXT_MESSAGE_APP. The
// firmware end-to-end (PKC/Curve25519) encrypts a directed DM to any
// peer whose public key it knows (archy peers exchange them via
// NodeInfo), so it's delivered E2E and shows as chat on every client.
// Meshtastic firmware 2.7.x will NOT deliver our opaque binary typed
// envelope as a message (PRIVATE_APP is opaque app-data; a base64
// envelope overflows one LoRa frame and chunk-fails) — wrapping text
// is exactly what silently broke archy↔archy Meshtastic LoRa.
// • Meshcore archy peer → keep the rich signed typed envelope. Meshcore
// frames are binary-safe (no UTF-8 mangling) and it carries its own
// session E2E + our signature for `!ai` auth / seq reply addressing,
// so the envelope works there and we must not drop it.
// • Meshcore stock client → plain text (can't decode our envelope).
// Rich typed messages (invoice/coordinate/reaction/…) always use the
// typed-wire path via `send_typed_wire`; only plain Text is routed here.
let use_typed_envelope = archy && device_type == DeviceType::Meshcore;
if use_typed_envelope {
// Sign with our archipelago identity so the receiver can authenticate
// us over LoRa (verifies against our bound `arch_pubkey_hex`). `with_seq`
// is applied after signing — seq is not covered by the signature.
let envelope = TypedEnvelope::new_signed(
MeshMessageType::Text,
text.as_bytes().to_vec(),
&self.signing_key,
)
.with_seq(seq);
let wire = envelope.to_wire()?;
return self
.send_typed_wire(contact_id, wire, "text", text, None, seq)
.await;
}
let dest_prefix = self.peer_dest_prefix(contact_id).await?;
self.state
.send_cmd(listener::MeshCommand::SendNativeText {
@@ -1569,7 +1593,6 @@ impl MeshService {
// archy peers always exchange keys, so mark those Sent rows E2E so the
// pill shows immediately. (The receiver independently stamps E2E from the
// radio's `pki_encrypted` flag, so an inbound row is accurate regardless.)
let e2e = self.is_archy_peer(contact_id).await;
Ok(self
.record_sent_typed(
contact_id,
@@ -1578,7 +1601,7 @@ impl MeshService {
None,
seq,
Some("lora".to_string()),
e2e,
archy,
)
.await)
}
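The routing rule in `send_message` reduces to a single predicate: the typed envelope is used only when the peer is an archy node and the local radio is Meshcore. The sketch below isolates that decision as a pure function. `TextTransport` and `choose_text_transport` are hypothetical names, and the `DeviceType` enum here is a two-variant stand-in for the project's own type.

```rust
/// Stand-in for the project's device type (only the two variants named
/// in the commit are modelled here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DeviceType {
    Meshtastic,
    Meshcore,
}

/// How a plain chat text leaves the node.
#[derive(Debug, PartialEq, Eq)]
enum TextTransport {
    /// Plain text sent as a native DM.
    NativeText,
    /// Signed typed envelope, sent over the typed-wire path.
    TypedEnvelope,
}

/// Mirror of `use_typed_envelope = archy && device_type == Meshcore`.
fn choose_text_transport(device_type: DeviceType, archy_peer: bool) -> TextTransport {
    if archy_peer && device_type == DeviceType::Meshcore {
        TextTransport::TypedEnvelope
    } else {
        TextTransport::NativeText
    }
}
```

Three of the four combinations fall through to native text, which matches the comment block above: every Meshtastic peer and every stock Meshcore client gets plain text, and only Meshcore archy peers keep the envelope.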