fix(mesh): mesh-preferred message routing with FIPS/Tor fallback

Messages to a federated peer that is out of LoRa range (e.g. on another
continent) were either handed to the radio with no fallback or stalled on a
dead FIPS path before reaching Tor, so they never arrived.

- Route a radio contact over the federation transport (FIPS->Tor) when it is
  the same node as a federated peer (known archipelago identity -> onion) AND
  it is not currently reachable over the radio. Reachable radio peers stay on
  the mesh (preferred); oversized/file envelopes still always take federation.
- Resolve the onion via the archipelago identity key (arch_pubkey_hex), not
  the firmware routing key, so a radio contact maps to its nodes.json onion.
- Add .fips_timeout(8s) to the federation message POST so an unreachable FIPS
  overlay fast-fails to Tor (~3-5s) instead of burning the 120s budget.
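
The routing rule in the bullets above can be sketched as a pure function. This
is a minimal illustration, not the code in MeshService: `Route`, `Peer`,
`choose_route` and `lookup_key` are hypothetical names, the `u32` contact id
and the `max_len` parameter are assumptions, and the real peer table is read
under an async lock.

```rust
/// Transport chosen for one outgoing message (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Mesh,       // LoRa radio, preferred when the peer is reachable
    Federation, // FIPS first, then Tor
}

/// Trimmed-down peer record; field names follow the diff, the struct is assumed.
struct Peer {
    reachable: bool,
    pubkey_hex: Option<String>,      // firmware routing key
    arch_pubkey_hex: Option<String>, // archipelago identity key
}

/// High bit marks a federation-synthetic contact id, as in the diff.
const SYNTHETIC_BIT: u32 = 0x8000_0000;

fn choose_route(contact_id: u32, wire_len: usize, max_len: usize, peer: Option<&Peer>) -> Route {
    let synthetic = contact_id & SYNTHETIC_BIT != 0;
    let oversized = wire_len > max_len;
    // A radio contact falls back to federation only when it is out of radio
    // range AND we know its identity key (so an onion lookup is possible).
    let unreachable_federated = peer
        .map(|p| !p.reachable && p.arch_pubkey_hex.is_some())
        .unwrap_or(false);
    if synthetic || oversized || unreachable_federated {
        Route::Federation
    } else {
        Route::Mesh
    }
}

/// Key used for the onion lookup: identity key first, routing key as fallback.
fn lookup_key(peer: &Peer) -> Option<String> {
    peer.arch_pubkey_hex.clone().or_else(|| peer.pubkey_hex.clone())
}
```

In this sketch a radio contact with no known identity key stays on the mesh
even when it is unreachable, because there is no onion to fall back to.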

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
archipelago 2026-06-19 10:09:14 -04:00
parent 0ac67f5092
commit 75e470bfa4

@@ -933,7 +933,25 @@ impl MeshService {
 // over Tor; otherwise the send falls through to LoRa.
 let is_federation_synthetic = contact_id & 0x8000_0000 != 0;
 let exceeds_lora = wire.len() > protocol::MAX_MESSAGE_LEN;
-if is_federation_synthetic || exceeds_lora {
+// Mesh-preferred routing with a federation fallback. A normal radio
+// contact is delivered over LoRa (preferred — free, local, no internet).
+// But if that contact is the same node as a federated peer — we know its
+// archipelago identity (`arch_pubkey_hex`) → onion — AND it is NOT
+// currently reachable over the radio (out of LoRa range, e.g. a peer on
+// another continent), route the message over the federation transport
+// (FIPS→Tor) instead of handing it to a radio that physically cannot
+// deliver it. Reachable radio peers stay on the mesh; oversized
+// envelopes (file shares etc.) always take the federation path.
+let radio_federated_unreachable = !is_federation_synthetic
+&& !exceeds_lora
+&& {
+let peers = self.state.peers.read().await;
+peers
+.get(&contact_id)
+.map(|p| !p.reachable && p.arch_pubkey_hex.is_some())
+.unwrap_or(false)
+};
+if is_federation_synthetic || exceeds_lora || radio_federated_unreachable {
 // Resolve the peer's pubkey/did. Prefer the live mesh peer table,
 // but fall back to federation storage for federation-synthetic ids
 // that were never seeded into `state.peers` — e.g. a radio-less
@@ -942,9 +960,15 @@ impl MeshService {
 // even though we know its onion from nodes.json.
 let from_table = {
 let peers = self.state.peers.read().await;
-peers
-.get(&contact_id)
-.map(|p| (p.pubkey_hex.clone(), p.did.clone()))
+peers.get(&contact_id).map(|p| {
+// Resolve via the archipelago IDENTITY key (not the firmware
+// routing key) — that's what matches the peer's onion entry
+// in nodes.json for the federation lookup below.
+(
+p.arch_pubkey_hex.clone().or_else(|| p.pubkey_hex.clone()),
+p.did.clone(),
+)
+})
 };
 let (peer_pubkey, peer_did) = match from_table {
 Some(v) => v,
@@ -1078,7 +1102,14 @@ impl MeshService {
 "/archipelago/mesh-typed",
 )
 .service(crate::settings::transport::PeerService::Messaging)
-.timeout(std::time::Duration::from_secs(120));
+.timeout(std::time::Duration::from_secs(120))
+// Fast-fail a FIPS path the peer isn't reachable on (the common case
+// for remote/Tailscale peers that share no FIPS overlay with us) so
+// the Tor fallback delivers the message in ~3-5s instead of the send
+// hanging on FIPS. FIPS-reachable peers connect in <1s and still use
+// it; only an unreachable FIPS path is short-circuited. Matches the
+// federation-sync fix. 8s ≈ the FIPS connect_timeout headroom.
+.fips_timeout(std::time::Duration::from_secs(8));
 match req.send_json(&body).await {
 Ok((resp, transport)) if resp.status().is_success() => {
 tracing::debug!(contact_id, transport = %transport, "Federation envelope delivered");