Federated nodes failed to converge to a full mesh across the
LAN<->Tailscale boundary: some nodes were invisible to their peers, sync
'took ages' or timed out, and node names only updated after a manual
sync. Onion services were healthy in both directions (~3-5s), so the
failures were in the application layer.
- B: federation dials now fast-fail a dead FIPS path via
.fips_timeout(6s) in sync_with_peer and notify_join, so when LAN and
remote peers share no FIPS path, the Tor fallback no longer waits out
the full 30s FIPS budget.
- A: notify_join (peer-joined) now runs as a spawned task with retries
and backoff instead of a single awaited best-effort POST, so the join
RPC returns immediately (no more 'Request timeout') and the inviter
reliably learns about the joiner (previously membership could end up
one-sided).
- C: add a periodic federation auto-sync every 90s (there was none
before), so renamed nodes and roster changes propagate without a manual
Sync click.
- self-heal: each auto-sync re-asserts membership to any peer that does
not list us back, converging the fleet to a full mesh and healing
pre-existing asymmetry without manual re-joins.
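A minimal sketch of the fast-fail-then-fallback pattern in B, using std threads and a channel timeout rather than whatever async runtime the project actually uses. `dial_with_fallback`, `Route`, and the closure-based transports are hypothetical names for illustration: the FIPS attempt gets a short budget, and if it errors or misses the deadline the Tor closure runs right away instead of after the full FIPS budget.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Which transport ultimately answered (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum Route {
    Fips,
    Tor,
}

/// Runs `attempt` on a worker thread and waits at most `budget` for it.
/// Returns None on error, timeout, or worker panic. The worker is detached,
/// not cancelled; a late result is simply dropped.
fn with_timeout<F>(attempt: F, budget: Duration) -> Option<String>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver is gone if we already timed out; ignore send errors.
        let _ = tx.send(attempt());
    });
    match rx.recv_timeout(budget) {
        Ok(Ok(reply)) => Some(reply),
        _ => None,
    }
}

/// Hypothetical dial helper: try FIPS under a short budget, then fall back
/// to Tor as soon as FIPS fails or times out.
pub fn dial_with_fallback<P, T>(
    fips: P,
    tor: T,
    fips_budget: Duration,
) -> Result<(Route, String), String>
where
    P: FnOnce() -> Result<String, String> + Send + 'static,
    T: FnOnce() -> Result<String, String>,
{
    if let Some(reply) = with_timeout(fips, fips_budget) {
        return Ok((Route::Fips, reply));
    }
    tor().map(|reply| (Route::Tor, reply))
}
```

With an async runtime the same idea is usually a timeout wrapper around the dial future, which drops (cancels) the pending attempt instead of detaching a thread.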
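The change in A can be pictured as a fire-and-forget sender: the caller gets a handle back at once while a background thread retries with capped exponential backoff. This is a hedged illustration using std threads; `Backoff`, `delay_for`, `spawn_with_retries`, and any attempt counts or delays are assumptions, since the commit does not state the actual retry policy.

```rust
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical retry policy; real values are not given in the commit.
pub struct Backoff {
    pub attempts: u32,
    pub initial: Duration,
    pub max: Duration,
}

/// Delay before retry number `n` (0-based): initial * 2^n, capped at max.
pub fn delay_for(policy: &Backoff, n: u32) -> Duration {
    let factor = 1u32.checked_shl(n).unwrap_or(u32::MAX);
    policy.initial.saturating_mul(factor).min(policy.max)
}

/// Spawns `send` on a background thread and retries it with exponential
/// backoff. The caller does not wait for delivery; the thread yields the
/// 1-based attempt that succeeded, or None if every attempt failed.
pub fn spawn_with_retries<F>(policy: Backoff, mut send: F) -> JoinHandle<Option<u32>>
where
    F: FnMut() -> Result<(), String> + Send + 'static,
{
    thread::spawn(move || {
        for n in 0..policy.attempts {
            match send() {
                Ok(()) => return Some(n + 1),
                Err(e) => {
                    eprintln!("attempt {} failed: {}", n + 1, e);
                    // Only sleep if another attempt remains.
                    if n + 1 < policy.attempts {
                        thread::sleep(delay_for(&policy, n));
                    }
                }
            }
        }
        None
    })
}
```

In a join handler the handle would typically be dropped (or kept only for logging) rather than joined, which is what lets the RPC return immediately while delivery continues in the background.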
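C and the self-heal step together amount to one periodic loop. The sketch assumes a roster keyed by node id with display names as values, and uses closures in place of the real federation RPCs; `Roster`, `auto_sync_round`, `spawn_auto_sync`, and the rule that a peer is authoritative for its own name are all hypothetical. Only the 90s interval comes from the commit.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Duration;

/// Node id -> display name (hypothetical roster shape).
pub type Roster = HashMap<String, String>;

/// Interval taken from the commit message.
pub const AUTO_SYNC_INTERVAL: Duration = Duration::from_secs(90);

/// One auto-sync pass: merge each reachable peer's roster into ours and
/// re-assert membership to any peer whose roster does not list `me`.
/// `fetch_roster` and `reassert` stand in for hypothetical federation RPCs.
/// Returns how many peers we re-asserted to.
pub fn auto_sync_round<F, R>(me: &str, roster: &mut Roster, mut fetch_roster: F, mut reassert: R) -> usize
where
    F: FnMut(&str) -> Option<Roster>,
    R: FnMut(&str),
{
    // Snapshot peers so nodes learned this round are visited next round.
    let peers: Vec<String> = roster.keys().filter(|id| id.as_str() != me).cloned().collect();
    let mut reasserted = 0;
    for peer in &peers {
        let theirs = match fetch_roster(peer.as_str()) {
            Some(r) => r,
            None => continue, // unreachable now; retried next round
        };
        if !theirs.contains_key(me) {
            reassert(peer.as_str());
            reasserted += 1;
        }
        for (id, name) in theirs {
            if id == *peer {
                // Assumption: a peer is authoritative for its own name,
                // so renames propagate on the next round.
                roster.insert(id, name);
            } else {
                roster.entry(id).or_insert(name);
            }
        }
    }
    reasserted
}

/// Drives `round` every AUTO_SYNC_INTERVAL on a background thread.
pub fn spawn_auto_sync<F>(mut round: F) -> thread::JoinHandle<()>
where
    F: FnMut() -> usize + Send + 'static,
{
    thread::spawn(move || loop {
        thread::sleep(AUTO_SYNC_INTERVAL);
        let reasserted = round();
        if reasserted > 0 {
            eprintln!("auto-sync round re-asserted to {} peer(s)", reasserted);
        }
    })
}
```

Because the reassertion check runs every round, a one-sided link that predates the change is repaired on the next tick without a manual re-join.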
Validated live across 7 nodes: a node that was previously invisible to
the rest of the fleet became fully meshed automatically (logs:
'auto-sync ... reasserted=1', 'peer-joined ... delivered').
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>