docs: handoff — mesh rename done, .120->.89 dup-contact diagnosis, netbird TODO
Resume notes for the 1.8.0 bug-bash mesh work: Meshtastic rename shipped + verified; .120->.89 'non-delivery' diagnosed to a duplicate-contact surfacing bug (messages inject fine, split across federation/radio twin contact_ids); design for the dedup fix (#12) and the netbird logout-race map (#10).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent d00d1b20d7
commit 5f7e8dca80
docs/HANDOFF-2026-06-20-mesh-netbird.md (Normal file, 82 lines)
@@ -0,0 +1,82 @@
# Handoff: Mesh device rename, mesh routing, duplicate contacts, netbird logout (2026-06-20)

This session is a **test-build iteration toward the 1.8.0 bug-bash release**: patched binaries are
sideloaded to test nodes, with NO version bump and NO OTA release (manifest stays `1.7.99-alpha`).
Because the version string never changes, **verify a deploy by sha256-matching the deployed binary**,
not by `current_version`.

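The sha256 check can be scripted against the truncated digests recorded in this doc. A minimal sketch, assuming the deployed digest is obtained as one line of `sha256sum` output (`<hex>  <path>`); both function names are hypothetical and not part of the repo:

```rust
/// Extract the hex digest from one line of `sha256sum` output ("<hex>  <path>").
/// Returns None if the first token is not purely hexadecimal.
fn digest_from_sha256sum_line(line: &str) -> Option<&str> {
    let digest = line.split_whitespace().next()?;
    if digest.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(digest)
    } else {
        None
    }
}

/// Compare a deployed digest against one recorded in the handoff notes.
/// Recorded digests may be truncated with a trailing ellipsis ("b5183dfc…"),
/// so this is a case-insensitive prefix match, not full equality.
fn matches_recorded(recorded: &str, actual_hex: &str) -> bool {
    let prefix = recorded.trim_end_matches('…').trim_end_matches("...");
    !prefix.is_empty()
        && actual_hex
            .to_ascii_lowercase()
            .starts_with(&prefix.to_ascii_lowercase())
}
```

A prefix match is only as strong as the recorded prefix is long; recording the full digest at deploy time would make the check exact.
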
## Test node roster (creds in the operator's local notes / agent memory, NOT in this repo)
- `.116` 192.168.1.116 (archi-thinkpad): this build host, dev/validation.
- `.198` 192.168.1.198, `.228` 192.168.1.228: LAN resilience nodes.
- `.5` Tailscale 100.72.136.5 (archy-x250-beta): **Meshtastic radio**.
- `.120` Tailscale 100.66.157.120 (archy-x250-exp): **Meshtastic radio**.
- `.89` Tailscale 100.89.209.89 (archy-x250-pa): **dual radio**. ttyACM0 is Meshtastic (probe FAILS),
  ttyUSB0 is MeshCore (active). Configured device_path = ttyACM0. Runs netbird (v2.38.0).

Deploy driver used this session: `/tmp/archy-deploy/deploy-node.sh <user@host> <pw> <label>`
(scp binary + stream `web/dist/neode-ui` + sudo swap `/usr/local/bin/archipelago`, preserve aiui +
claude-login.html, chown 1000:1000, restart, verify sha256+health). Recreate it from this doc if /tmp is gone.

## Deploy state (binary sha) at handoff
- `b5183dfc…` (HEAD d00d1b20, includes Meshtastic rename) → on **.5 and .120** (verified).
- `f702b4f1…` (the 3 wallet/mesh/ui fixes, pre-rename) → on **.116, .198, .228**.
- `7c17a96…` (OLD, pre-f702b4f1) → **.89 is STALE**; update it before re-testing .120→.89.

## DONE
1. **Meshtastic device rename → server name**: committed `d00d1b20` (pushed to gitea-vps2/main).
   `meshtastic.rs set_advert_name` was a no-op (in-memory only). It now sends
   `AdminMessage{set_owner=User{long_name,short_name}}` to the local node on the ADMIN_APP port (6),
   set_owner field = 32. long_name = server name (≤39), short_name = first 4 alphanumerics upper-cased.
   **Hardware-verified**: the .120 radio now reads back `Archy-X250-EXP`, .5 reads back `Archy-X250-Beta`.
   MeshCore already renamed (CMD_SET_ADVERT_NAME, serial.rs:147); it is unchanged, so both are now at parity.
2. **Routing priority confirmed = Mesh → FIPS → Tor**. `send_typed_wire` (mesh/mod.rs:1007): reachable
   radio peer → LoRa; federation-synthetic OR (`!reachable && arch_pubkey_hex.is_some()`) → federation.
   `send_typed_wire_via_federation` (mod.rs:1124): FIPS first w/ `.fips_timeout(8s)`, Tor fallback.
3. **`.120`→`.89` "non-delivery" diagnosed: it is NOT a delivery failure.** `.120` sends to .89's
   federation contact_id `3027572739` and logs `Federation envelope delivered transport=tor` (gated on
   HTTP 2xx, mod.rs:1185). The receiver returns 2xx ONLY after ed25519-verify + successful
   `inject_typed_from_federation` (node_message.rs:217-263). Identity matches (.89 pubkey 031875b4…).
   `.89`→`.120` works. So .120's messages ARE injected into .89's state under contact_id
   `2679725907` = federation_peer_contact_id(.120 pubkey 535fb91f…), name "Archy-X250-EXP".
   It's a **duplicate-contact SURFACING** problem (the user confirmed seeing doubles).

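The naming rule from item 1 can be sketched as two small helpers. This is an illustration under stated assumptions, not the code in `meshtastic.rs`: the "≤39" limit is read as 39 bytes of UTF-8, "alphanumerics" is read as ASCII alphanumerics, and both function names are hypothetical.

```rust
/// Upper bound on long_name, per the "≤39" note (assumed here to mean bytes).
const MAX_LONG_NAME_BYTES: usize = 39;

/// Truncate the server name to at most 39 bytes without splitting a UTF-8 character.
fn long_name(server_name: &str) -> String {
    let mut end = server_name.len().min(MAX_LONG_NAME_BYTES);
    // Back off to the nearest character boundary; index 0 is always a boundary.
    while !server_name.is_char_boundary(end) {
        end -= 1;
    }
    server_name[..end].to_string()
}

/// First four ASCII alphanumerics of the server name, upper-cased.
fn short_name(server_name: &str) -> String {
    server_name
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .take(4)
        .map(|c| c.to_ascii_uppercase())
        .collect()
}
```

Under these assumptions a server named `Archy-X250-EXP` keeps its full long_name and gets the short_name `ARCH`.
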
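The routing order confirmed in item 2 can be written down as a pure decision function. A minimal sketch, not the body of `send_typed_wire`: it assumes "federation-synthetic" means a contact_id at or above `0x8000_0000`, and the `Unroutable` case (unreachable radio peer with no bound arch key) is an assumption, since the notes do not say what happens there. All type and function names are hypothetical.

```rust
/// Assumed boundary: federation-synthetic contact ids sit at or above this value.
const FEDERATION_ID_BASE: u32 = 0x8000_0000;

#[derive(Debug, PartialEq)]
enum Route {
    LoRa,
    Federation,
    /// Assumption: no radio path and no arch key to federate with.
    Unroutable,
}

#[derive(Debug, PartialEq)]
enum Transport {
    Fips,
    Tor,
}

struct Peer {
    contact_id: u32,
    reachable: bool,
    arch_pubkey_hex: Option<String>,
}

/// Mesh first: a reachable radio peer goes over LoRa; a federation-synthetic peer,
/// or an unreachable radio peer with a bound arch key, goes over federation.
fn choose_route(peer: &Peer) -> Route {
    let synthetic = peer.contact_id >= FEDERATION_ID_BASE;
    if !synthetic && peer.reachable {
        Route::LoRa
    } else if synthetic || peer.arch_pubkey_hex.is_some() {
        Route::Federation
    } else {
        Route::Unroutable
    }
}

/// Federation delivery order: FIPS first, Tor only if FIPS fails.
/// Each closure is expected to enforce its own timeout (8s for FIPS in the notes).
fn deliver_via_federation(
    try_fips: impl FnOnce() -> bool,
    try_tor: impl FnOnce() -> bool,
) -> Option<Transport> {
    if try_fips() {
        Some(Transport::Fips)
    } else if try_tor() {
        Some(Transport::Tor)
    } else {
        None
    }
}
```

Keeping the decision separate from the I/O makes the mesh-first property testable without radios or a Tor circuit.
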
## TODO (resume here)
### #12 Fix duplicate mesh contacts ← user chose this NEXT
Root cause: `handle_mesh_contacts_list` (api/rpc/mesh/typed_messages.rs:1126) and
`handle_conversations_list` (api/rpc/mesh/status.rs:89) emit **one row per `state.peers` entry** with
**no cross-transport dedup**. A node can have TWO peers: a radio peer (low contact_id, firmware key)
and a federation peer (high contact_id ≥ 0x8000_0000, archipelago key). `bind_federation_twins`
(mesh/mod.rs:85) correlates them by exact advert_name and copies `arch_pubkey_hex` onto the radio
twin, but LEAVES BOTH ROWS. Messages are keyed by `peer_contact_id` (split across the two ids), so
the federation-injected messages sit on the federation row while the user may open the radio row and find it empty.

**Design constraint (important):** the two twins have DIFFERENT routing. Collapsing must NOT break
"mesh-first": the canonical SEND contact_id should be the RADIO twin when one exists (so send_typed_wire
routes LoRa-if-reachable, else federation via the bound arch key), else the federation id. The merged
THREAD must union messages from ALL twin contact_ids (group by `arch_pubkey_hex`). Apply the dedup in:
- `handle_conversations_list` (status.rs:89): one conversation per identity group; last msg = newest across twins.
- `handle_mesh_contacts_list` (typed_messages.rs:1126).
- `handle_conversations_messages` (status.rs ~146): when asked for a contact_id, resolve its group's
  twin ids and filter messages by ANY of them.

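The thread-side half of the list above (union across twins, newest message as the preview) might look like the following. A sketch only: `Message` is a stand-in type, its `timestamp` field is an assumption about how messages are ordered, and `msg`, `merged_thread` and `last_message` are hypothetical names.

```rust
/// Stand-in for a stored message; `timestamp` is an assumed ordering field.
struct Message {
    peer_contact_id: u32,
    timestamp: u64,
    body: String,
}

/// Small constructor to keep examples short.
fn msg(peer_contact_id: u32, timestamp: u64, body: &str) -> Message {
    Message { peer_contact_id, timestamp, body: body.to_string() }
}

/// All messages keyed under ANY of the twin ids, oldest first.
fn merged_thread<'a>(messages: &'a [Message], twin_ids: &[u32]) -> Vec<&'a Message> {
    let mut thread: Vec<&Message> = messages
        .iter()
        .filter(|m| twin_ids.contains(&m.peer_contact_id))
        .collect();
    thread.sort_by_key(|m| m.timestamp);
    thread
}

/// Newest message across all twins, for the conversation-list preview.
fn last_message<'a>(messages: &'a [Message], twin_ids: &[u32]) -> Option<&'a Message> {
    messages
        .iter()
        .filter(|m| twin_ids.contains(&m.peer_contact_id))
        .max_by_key(|m| m.timestamp)
}
```

With twin ids such as a radio id and `2679725907`, opening either row would then show the same merged thread instead of an empty one.
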
Add a shared helper (e.g. group peers by `arch_pubkey_hex` when Some, else singleton by contact_id).
Do NOT merge/re-key at `bind_federation_twins` time; that would force federation routing and break mesh-first.
MeshPeer struct: mesh/types.rs:28 (fields: contact_id, advert_name, did, pubkey_hex, arch_pubkey_hex, reachable…).

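One possible shape for the shared helper, shown as a sketch rather than the intended implementation. The `MeshPeer` here carries only the three fields the grouping needs, `ContactGroup`, `IdentityKey`, `peer` and `group_peers` are hypothetical names, and "radio twin" is detected by a contact_id below the assumed `0x8000_0000` federation boundary.

```rust
use std::collections::BTreeMap;

/// Assumed boundary between radio ids and federation-synthetic ids.
const FEDERATION_ID_BASE: u32 = 0x8000_0000;

/// Reduced stand-in for the real MeshPeer (mesh/types.rs).
struct MeshPeer {
    contact_id: u32,
    advert_name: String,
    arch_pubkey_hex: Option<String>,
}

/// Small constructor to keep examples short.
fn peer(contact_id: u32, advert_name: &str, arch: Option<&str>) -> MeshPeer {
    MeshPeer {
        contact_id,
        advert_name: advert_name.to_string(),
        arch_pubkey_hex: arch.map(str::to_string),
    }
}

/// Grouping key: the arch key when bound, else the peer stands alone.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
enum IdentityKey {
    Arch(String),
    Solo(u32),
}

/// One row for the contact/conversation lists.
struct ContactGroup {
    send_contact_id: u32, // radio twin when one exists, else the federation id
    twin_ids: Vec<u32>,   // every contact_id whose messages belong to this thread
    name: String,
}

fn group_peers(peers: &[MeshPeer]) -> Vec<ContactGroup> {
    let mut buckets: BTreeMap<IdentityKey, Vec<&MeshPeer>> = BTreeMap::new();
    for p in peers {
        let key = match &p.arch_pubkey_hex {
            Some(hex) => IdentityKey::Arch(hex.to_ascii_lowercase()),
            None => IdentityKey::Solo(p.contact_id),
        };
        buckets.entry(key).or_default().push(p);
    }
    buckets
        .into_values()
        .map(|twins| {
            // Mesh-first: prefer the radio twin as the canonical send target.
            let canonical = twins
                .iter()
                .find(|p| p.contact_id < FEDERATION_ID_BASE)
                .unwrap_or(&twins[0]);
            ContactGroup {
                send_contact_id: canonical.contact_id,
                twin_ids: twins.iter().map(|p| p.contact_id).collect(),
                name: canonical.advert_name.clone(),
            }
        })
        .collect()
}
```

Because the grouping happens at list time and leaves `state.peers` untouched, `send_typed_wire` still sees the radio twin and keeps its LoRa-if-reachable behaviour.
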
**Before testing #12:** update `.89` to the current build (it's on stale 7c17a96), then re-check whether
.120 ("Archy-X250-EXP") shows once with its messages. NB: .89 had 0 journal mentions of "Archy-X250-EXP"
and no radio contact for .120, so its specific double may be a stale-binary artifact; confirm on a fresh build.

### #10 Netbird logout race
Symptom: right after install, netbird shows logged-in but can't log out; it self-corrects after a while.
Map: install is `stacks.rs install_netbird_stack` (~1760-1918), which brings up 3 containers (netbird-server :8086, dashboard,
nginx proxy :8087→443 self-signed TLS). `wait_for_stack_containers` waits for "running", NOT OIDC-ready.
The dashboard is netbird's own SPA, opened in a NEW TAB (appLauncher.ts ~52-60, secure-context/crypto.subtle).
Hypothesis: startup race. The dashboard loads before netbird-server's OIDC provider is ready and caches a bad auth
state; the logout endpoint is not ready either. Likely fix: gate install completion / launch on netbird-server OIDC
readiness (poll an endpoint) rather than container "running". Repro on `.89` (has netbird running).
Prior note: the AccountInfoSection.vue ~602 release note claims a previous unified-origin fix for the 404
logout/login loop; the initial-state race remains.

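The proposed readiness gate reduces to a bounded polling loop. A minimal synchronous sketch, assuming the caller supplies a `probe` closure that performs the actual check; which netbird-server endpoint signals OIDC readiness is not established in these notes, so the probe is left abstract and `wait_until_ready` is a hypothetical name.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `probe` until it reports ready or `timeout` elapses.
/// Returns true if the probe succeeded, false on timeout.
/// The probe always runs at least once, even with a zero timeout.
fn wait_until_ready<F>(mut probe: F, timeout: Duration, interval: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

Install completion (or the dashboard launch) would then wait on this result in addition to the existing container "running" check. If the surrounding install code is async, the same loop would need an async sleep instead of `std::thread::sleep`.
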
## Mesh parity directive
MeshCore "works great"; Meshtastic must reach the SAME parity (rename done; duplicate-contact + routing
fallback shared across both). Meshtastic↔MeshCore are INCOMPATIBLE over-the-air, so cross-protocol
federated peers (.120↔.89) rely entirely on the FIPS/Tor fallback.