# Handoff: Mesh device rename, mesh routing, duplicate contacts, netbird logout (2026-06-20)

Session is a **test-build iteration toward the 1.8.0 bug-bash release**: sideload patched binaries
to test nodes, NO version bump / NO OTA release (manifest stays `1.7.99-alpha`). Because the version
string never changes, **verify a deploy by sha256-matching the deployed binary**, not by `current_version`.

## Test node roster (creds in the operator's local notes / agent memory, NOT in this repo)
- `.116` 192.168.1.116: this build host (archi-thinkpad), dev/validation.
- `.198` 192.168.1.198, `.228` 192.168.1.228: LAN resilience nodes.
- `.5` Tailscale 100.72.136.5 (archy-x250-beta): **Meshtastic radio**.
- `.120` Tailscale 100.66.157.120 (archy-x250-exp): **Meshtastic radio**.
- `.89` Tailscale 100.89.209.89 (archy-x250-pa): **dual radio** with ttyACM0 Meshtastic (probe FAILS)
  and ttyUSB0 MeshCore (active). Configured device_path = ttyACM0. Runs netbird (v2.38.0).

Deploy driver used this session: `/tmp/archy-deploy/deploy-node.sh <user@host> <pw> <label>`
(scp binary + stream `web/dist/neode-ui` + sudo swap `/usr/local/bin/archipelago`, preserve aiui +
claude-login.html, chown 1000:1000, restart, verify sha256+health). Recreate from this doc if /tmp is gone.

## Deploy state (binary sha) at handoff
- `b5183dfc…` (HEAD d00d1b20, includes Meshtastic rename) → on **.5 and .120** (verified).
- `f702b4f1…` (the 3 wallet/mesh/ui fixes, pre-rename) → on **.116, .198, .228**.
- `7c17a96…` (OLD, pre-f702b4f1) → **.89 is STALE**: update before re-testing .120→.89.

## DONE
1. **Meshtastic device rename → server name**: committed `d00d1b20` (pushed to gitea-vps2/main).
   `meshtastic.rs set_advert_name` was a no-op (in-memory only). Now sends
   `AdminMessage{set_owner=User{long_name,short_name}}` to the local node on ADMIN_APP port (6),
   set_owner field = 32. long_name = server name (≤39), short_name = first 4 alphanumerics upper-cased.
   **Hardware-verified**: .120 radio now reads back `Archy-X250-EXP`, .5 reads back `Archy-X250-Beta`.
   MeshCore already renamed (CMD_SET_ADVERT_NAME, serial.rs:147); unchanged, now at parity.
2. **Routing priority confirmed = Mesh → FIPS → Tor**. `send_typed_wire` (mesh/mod.rs:1007): reachable
   radio peer → LoRa; federation-synthetic OR (`!reachable && arch_pubkey_hex.is_some()`) → federation.
   `send_typed_wire_via_federation` (mod.rs:1124): FIPS first w/ `.fips_timeout(8s)`, Tor fallback.
3. **`.120`→`.89` "non-delivery" diagnosed: it is NOT a delivery failure.** `.120` sends to .89's
   federation contact_id `3027572739`, logs `Federation envelope delivered transport=tor` (gated on
   HTTP 2xx, mod.rs:1185). The receiver returns 2xx ONLY after ed25519-verify + successful
   `inject_typed_from_federation` (node_message.rs:217-263). Identity matches (.89 pubkey 031875b4…).
   `.89`→`.120` works. So .120's messages ARE injected into .89's state under contact_id
   `2679725907` = federation_peer_contact_id(.120 pubkey 535fb91f…), name "Archy-X250-EXP".
   It's a **duplicate-contact SURFACING** problem (user confirmed doubles).
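
The name derivation from item 1 can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in `meshtastic.rs`: the function name `owner_names` is hypothetical, the "≤39" limit is assumed to mean bytes, and "alphanumerics" is assumed to mean ASCII alphanumerics.

```rust
/// Hypothetical helper: derive (long_name, short_name) from a server name.
/// Assumptions: the 39 limit is in bytes; "alphanumerics" means ASCII only.
pub fn owner_names(server_name: &str) -> (String, String) {
    // long_name: at most 39 bytes, cut back to a UTF-8 char boundary.
    let mut end = server_name.len().min(39);
    while !server_name.is_char_boundary(end) {
        end -= 1; // index 0 is always a boundary, so this terminates
    }
    let long_name = server_name[..end].to_string();

    // short_name: first 4 ASCII alphanumerics, upper-cased.
    let short_name: String = server_name
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .take(4)
        .map(|c| c.to_ascii_uppercase())
        .collect();

    (long_name, short_name)
}
```

Under these assumptions a server named `Archy-X250-EXP` keeps its full long name and gets the short name `ARCH`.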

## SESSION 2 PROGRESS (2026-06-20, code-complete, NOT yet deployed; user held deploy)
All committed to local `main`; NOT pushed to gitea-vps2/origin yet, NOT sideloaded.
- **#12 dup contacts DONE** (`f92e442b`, +3 unit tests pass). Backend `group_peer_twins()`
  helper (mesh/mod.rs) dedups by `arch_pubkey_hex`, radio twin = canonical send id, unions
  messages; wired into conversations.list/messages + mesh.contacts-list. **KEY FINDING:**
  conversations.list/messages have NO frontend consumer; the live chat list renders the
  *frontend* merge `mergedPeers` (Mesh.vue), which matched twins by the `Archy-z6Mk…` advert
  prefix that the device RENAME broke. Real fix = merge by `arch_pubkey_hex` (now exposed on the
  MeshPeer TS type). Should also clear `.120→.89` and likely **#5** (Arch Mobile on .116, same bug).
- **Companion crash diagnostic SHIPPED** (`b3633ec5`): main.ts global handler now shows the REAL
  error + keeps a 25-entry `window.__archyErrors` ring buffer + catches async/unhandledrejection.
  Still need to deploy + repro on the optiplex node (read `window.__archyErrors` via chrome://inspect)
  to get the actual throw. User says LAN/mobile-browser fine → Tailscale-WebView-specific.
- **#3 dual-ecash pay-for-file DONE** (`8f06d88f`, compiles): payer tries Cashu→Fedimint, seller
  accepts both (`verify_and_receive_payment`: non-"cashu" = `reissue_into_any`), new
  `fedimint_client::spend_from_any()`, `wallet.ecash-balance` reports `total_sats`. LIVE federation
  validation pending (two nodes sharing a federation).
- **#2 mobile scroll cutoff DONE** (`a8c668ee`): DashboardMobileNav wrote `--mobile-tab-bar-height:0px`
  when the bar was hidden/unlaid-out, defeating the `,88px` fallback → bar covered last row. Now never
  writes 0 (removes var → fallback), re-measures on rAF + post-WebView-injection. Backup hypothesis if
  it persists: `.dashboard-view` is `min-h-screen` (100vh) → mobile-browser toolbar overlap, switch to dvh.
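
The payer-side order from the #3 bullet (Cashu first, then Fedimint) can be sketched as below. Everything named here is hypothetical: `pay_with_fallback`, `EcashKind`, and the closure-based spenders stand in for the real wallet calls, and `total_sats` is assumed to be the plain sum of both balances. None of it mirrors the actual signatures in `8f06d88f`.

```rust
/// Which ecash system produced the token (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EcashKind {
    Cashu,
    Fedimint,
}

/// Try Cashu first; fall back to Fedimint only if the Cashu spend fails.
/// Each spender takes an amount in sats and returns a serialized token.
pub fn pay_with_fallback<C, F>(
    amount_sats: u64,
    spend_cashu: C,
    spend_fedimint: F,
) -> Result<(EcashKind, String), String>
where
    C: FnOnce(u64) -> Result<String, String>,
    F: FnOnce(u64) -> Result<String, String>,
{
    match spend_cashu(amount_sats) {
        Ok(token) => Ok((EcashKind::Cashu, token)),
        // Keep both error messages so a double failure is diagnosable.
        Err(cashu_err) => spend_fedimint(amount_sats)
            .map(|token| (EcashKind::Fedimint, token))
            .map_err(|fedi_err| format!("cashu: {cashu_err}; fedimint: {fedi_err}")),
    }
}

/// Assumed meaning of `total_sats`: the sum of both balances.
pub fn total_sats(cashu_sats: u64, fedimint_sats: u64) -> u64 {
    cashu_sats.saturating_add(fedimint_sats)
}
```

Returning the `EcashKind` alongside the token lets the seller side branch on the token type, which matches the "non-cashu goes to reissue" rule described in the bullet.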

DEPLOYED 2026-06-20 to ALL 6 nodes. Binary sha `4a8f2198…` (release build of commit a6957a48 +
this handoff), FE rebuilt, all sha-verified + service active: .116(local) .198 .228 .89 .5 .120.
.5/.120 needed a 30-min timeout (slow DERP). #10 netbird OIDC gate also shipped in this build.

REMAINING VERIFICATION (on real hardware, user-side):
- #12/#5: open mesh chat on .116 (and .89/.120); confirm a federated node shows ONCE with its
  messages (no radio/federation double), and that "Arch Mobile" messages now surface.
- #1 companion crash: open the companion app to the optiplex node over Tailscale, reproduce the
  crash, then read the REAL error from `window.__archyErrors` (chrome://inspect the WebView) or the
  now-detailed toast. That error is what's needed to write the actual fix. Confirm which node = optiplex.
- #3: pay for a peer file when the buyer's balance is only in Fedimint (needs two nodes in a federation).
- #2: check Cloud/files bottom rows clear the tab bar on mobile browser.

Commits are LOCAL on main (f92e442b/b3633ec5/8f06d88f/a8c668ee/a6957a48 + docs), NOT pushed to
gitea-vps2/origin (no version bump; bug-bash sideload only).

## TODO (original resume; #12 now DONE above)

### #12 Fix duplicate mesh contacts ← DONE this session (see SESSION 2 PROGRESS)
Root cause: `handle_mesh_contacts_list` (api/rpc/mesh/typed_messages.rs:1126) and
`handle_conversations_list` (api/rpc/mesh/status.rs:89) emit **one row per `state.peers` entry** with
**no cross-transport dedup**. A node can have TWO peers: a radio peer (low contact_id, firmware key)
and a federation peer (high contact_id ≥ 0x8000_0000, archipelago key). `bind_federation_twins`
(mesh/mod.rs:85) correlates them by exact advert_name and copies `arch_pubkey_hex` onto the radio
twin, but LEAVES BOTH ROWS. Messages are keyed by `peer_contact_id` (split across the two ids), so
the federation-injected messages sit on the federation row while the user may open the radio row → empty.

**Design constraint (important):** the two twins have DIFFERENT routing. Collapsing must NOT break
"mesh-first": the canonical SEND contact_id should be the RADIO twin when one exists (so send_typed_wire
routes LoRa-if-reachable, else federation via the bound arch key), else the federation id. The merged
THREAD must union messages from ALL twin contact_ids (group by `arch_pubkey_hex`). Apply the dedup in:
- `handle_conversations_list` (status.rs:89): one conversation per identity group; last msg = newest across twins.
- `handle_mesh_contacts_list` (typed_messages.rs:1126).
- `handle_conversations_messages` (status.rs ~146): when asked for a contact_id, resolve its group's
  twin ids and filter messages by ANY of them.

Add a shared helper (e.g. group peers by `arch_pubkey_hex` when Some, else singleton by contact_id).
Do NOT merge/re-key at `bind_federation_twins` time; that would force federation routing and break mesh-first.
MeshPeer struct: mesh/types.rs:28 (fields: contact_id, advert_name, did, pubkey_hex, arch_pubkey_hex, reachable…).
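
A minimal sketch of the shared helper described above, under assumptions: `PeerRow`, `Msg`, `TwinGroup`, `group_twins`, and `merged_thread` are hypothetical stand-ins (the real `MeshPeer` has more fields, and this is not the `group_peer_twins()` that was committed). It groups by `arch_pubkey_hex` when present, falls back to a singleton per contact_id, prefers a radio id as the send id, and unions messages across twin ids.

```rust
use std::collections::BTreeMap;

/// Federation-synthetic contact ids start here (threshold taken from the notes).
pub const FEDERATION_ID_BASE: u32 = 0x8000_0000;

/// Reduced stand-in for `MeshPeer`: only the fields the grouping needs.
#[derive(Debug, Clone)]
pub struct PeerRow {
    pub contact_id: u32,
    pub arch_pubkey_hex: Option<String>,
}

/// Reduced stand-in for a stored message.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Msg {
    pub peer_contact_id: u32,
    pub timestamp: u64,
    pub text: String,
}

/// One identity: which id to send to, and every id whose messages belong to it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TwinGroup {
    pub send_contact_id: u32,
    pub twin_ids: Vec<u32>,
}

/// Group peers by `arch_pubkey_hex` when Some, else one group per contact_id.
pub fn group_twins(peers: &[PeerRow]) -> Vec<TwinGroup> {
    let mut by_identity: BTreeMap<String, Vec<u32>> = BTreeMap::new();
    for peer in peers {
        let key = match &peer.arch_pubkey_hex {
            Some(hex) => format!("arch:{hex}"),
            None => format!("id:{}", peer.contact_id),
        };
        by_identity.entry(key).or_default().push(peer.contact_id);
    }
    by_identity
        .into_values()
        .map(|mut ids| {
            ids.sort_unstable();
            ids.dedup();
            // Mesh-first: send via a radio twin if one exists, else the federation id.
            let send = ids
                .iter()
                .copied()
                .find(|id| *id < FEDERATION_ID_BASE)
                .unwrap_or(ids[0]);
            TwinGroup { send_contact_id: send, twin_ids: ids }
        })
        .collect()
}

/// Union of messages across all twin ids, oldest first.
pub fn merged_thread<'a>(group: &TwinGroup, messages: &'a [Msg]) -> Vec<&'a Msg> {
    let mut thread: Vec<&Msg> = messages
        .iter()
        .filter(|m| group.twin_ids.contains(&m.peer_contact_id))
        .collect();
    thread.sort_by_key(|m| m.timestamp);
    thread
}
```

Because grouping happens at read time, the stored peers and their contact ids stay untouched, so routing still sees the radio twin and the federation twin as separate entries.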

**Before testing #12:** update `.89` to the current build (it's on stale 7c17a96), then re-check whether
.120 ("Archy-X250-EXP") shows once with its messages. NB: .89 had 0 journal mentions of "Archy-X250-EXP"
and no radio contact for .120, so its specific double may be a stale-binary artifact; confirm on fresh build.

### #10 Netbird logout race
Symptom: right after install netbird shows logged-in but can't log out; self-corrects after a while.
Map: install `stacks.rs install_netbird_stack` (~1760-1918): 3 containers (netbird-server :8086, dashboard,
nginx proxy :8087→443 self-signed TLS). `wait_for_stack_containers` waits for "running", NOT OIDC-ready.
Dashboard is netbird's own SPA, opened in a NEW TAB (appLauncher.ts ~52-60, secure-context/crypto.subtle).
Hypothesis: startup race. The dashboard loads before netbird-server's OIDC provider is ready and caches a bad
auth state; the logout endpoint is not ready either. Likely fix: gate install completion / launch on
netbird-server OIDC readiness (poll an endpoint) rather than container "running". Repro on `.89` (has netbird running).
Prior note: AccountInfoSection.vue ~602 release note claims a previous unified-origin fix for the 404
logout/login loop; the initial-state race remains.
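
The "poll an endpoint" gate can be sketched as a generic wait loop. This is a sketch under assumptions: `wait_until_ready` is a hypothetical name, and the probe is left as a closure because these notes do not establish which netbird-server endpoint signals OIDC readiness.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Call `probe` until it reports ready or `timeout` elapses.
/// Returns true if the service became ready in time.
/// `probe` is where an HTTP readiness check would go (endpoint is an open question).
pub fn wait_until_ready<P>(mut probe: P, timeout: Duration, interval: Duration) -> bool
where
    P: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if probe() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

The probe always runs at least once, so a service that is already up returns immediately, and install completion would only be reported after this returns true.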

## Mesh parity directive
MeshCore "works great"; Meshtastic must reach the SAME parity (rename done; duplicate-contact + routing
fallback shared across both). Meshtastic↔MeshCore are INCOMPATIBLE over-the-air, so cross-protocol
federated peers (.120↔.89) rely entirely on the FIPS/Tor fallback.
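
The Mesh → FIPS → Tor order confirmed earlier can be sketched as two small functions. Assumptions are loud here: `RoutePeer`, `Route`, `Transport`, `choose_route`, and `send_via_federation` are hypothetical names, not the signatures of `send_typed_wire`, and the `Unroutable` case (unreachable radio peer with no bound arch key) is a guess at behavior the notes do not describe.

```rust
use std::time::Duration;

/// Federation-synthetic contact ids start here (threshold taken from the notes).
pub const FEDERATION_ID_BASE: u32 = 0x8000_0000;
/// FIPS attempt timeout mentioned in the notes.
pub const FIPS_TIMEOUT: Duration = Duration::from_secs(8);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    LoRa,
    Federation,
    Unroutable, // assumption: the notes do not say what happens in this case
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Fips,
    Tor,
}

/// Reduced stand-in for the peer fields the routing decision reads.
pub struct RoutePeer {
    pub contact_id: u32,
    pub reachable: bool,
    pub arch_pubkey_hex: Option<String>,
}

/// Mesh-first: a reachable radio peer goes over LoRa; a federation-synthetic
/// peer, or an unreachable peer with a bound arch key, goes over federation.
pub fn choose_route(peer: &RoutePeer) -> Route {
    let synthetic = peer.contact_id >= FEDERATION_ID_BASE;
    if !synthetic && peer.reachable {
        Route::LoRa
    } else if synthetic || peer.arch_pubkey_hex.is_some() {
        Route::Federation
    } else {
        Route::Unroutable
    }
}

/// Federation leg: FIPS first with an 8 s timeout, Tor as fallback.
pub fn send_via_federation<F, T>(try_fips: F, try_tor: T) -> Result<Transport, String>
where
    F: FnOnce(Duration) -> Result<(), String>,
    T: FnOnce() -> Result<(), String>,
{
    if try_fips(FIPS_TIMEOUT).is_ok() {
        return Ok(Transport::Fips);
    }
    try_tor().map(|()| Transport::Tor)
}
```

For a cross-protocol pair such as .120↔.89 there is no reachable radio twin, so under this sketch every send resolves to `Route::Federation`.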