[Bug] B8: netbird app doesn't work (LOW PRIORITY) #15
netbird app still doesn't work. Low priority / much later.
Tracked in repo: tests/production-quality/TRACKER.md
Code-side review done; full diagnosis still needs `podman logs` from a node (LOW PRIORITY, kept open).
NetBird installs as a 3-container stack (`stacks.rs`): `netbird-server` (management/signal/relay), `netbird-dashboard`, and an nginx proxy on :8087. Config + secrets are generated, `exposedAddress` / `AUTH_AUTHORITY` are set to the detected host IP, and the nginx proxy routes `/api`, `/oauth2`, `/signalexchange`, `/management` correctly.
Findings without a live node:
- `NET_ADMIN` / `/dev/net/tun`, but that's expected; those are needed by the NetBird client agent (not installed here), not the management/dashboard, so that's likely not the failure.
- `AUTH_AUTHORITY=.../oauth2` depends on the management service's embedded IdP being reachable/seeded; if that's not coming up, login fails and the app "doesn't work".
Next diagnostic step (needs a reachable node): `podman logs` for `netbird-server`, `netbird-dashboard`, and the nginx proxy after install, and confirm all three are actually running. Holding off on speculative code changes that can't be verified live.
Diagnosed live on .198 and fixed in code: netbird now works on the LAN IP.
Three distinct problems were stacked here:
1. nginx cached a container IP → permanent 502. The proxy used `proxy_pass http://netbird-dashboard:80` (literal name), so nginx resolved it once at startup and kept hitting a dead IP after the container's IP changed on restart/reboot. Fix: point the nginx `resolver` at the netbird-net gateway (Podman's aardvark DNS) and use variable upstreams so it re-resolves at request time. (Note: `resolver local=on` and the `${NGINX_LOCAL_RESOLVERS}` template var both proved unsupported on this image; the explicit gateway is what works, and the install now reads it back from `podman network inspect`.)
2. Dashboard/server hit `conmon died without writing exit file`: podman reported them "Up 29h" while they were actually dead and refusing connections. A clean recreate fixed it. (This was largely caused by my manual `podman` commands over SSH interfering with the archipelago-managed stack; linger is enabled, so archipelago's own containers are stable. There is still a general lifecycle gap, tracked separately: a non-serving container should be detected/recreated by the reconciler.)
3. Auth failed with CORS / "Unauthenticated". `detect_netbird_public_host_ip()` baked the Tailscale 100.x address as the dashboard's API/OIDC origin, so loading the UI via the LAN IP made every `/api` + `/oauth2` fetch cross-origin → browser-blocked. Fix: prefer the LAN IP as the canonical origin; add CORS (reflect Origin + preflight) on `/api` + `/oauth2`; and list the LAN origin in the OIDC redirect URIs too, so any access IP works.
Validated live on .198 (config regenerated with the LAN origin + containers recreated): dashboard `200`, `/api/instance` `200`, OIDC discovery `200`, and the dashboard JS is now baked to `http://192.168.1.198:8087`. `cargo check` + `vue-tsc` pass.
Rollout: these are in `stacks.rs` config generation, so a binary update + netbird reinstall regenerates the correct config on each node. The `conmon`-died lifecycle hardening is broader and is filed as a follow-up.
Follow-up: final root cause + end-to-end fix confirmed.
After the resolver/CORS/LAN-origin fixes, login still failed with "Unauthenticated". Deeper diagnosis on .198 showed `/oauth2/.well-known/openid-configuration` returning 502 with nginx logging `netbird-server could not be resolved (3: Host not found)`: the server container was `Running=true` but had no IP on netbird-net (`serverIP=` empty), so Podman's aardvark DNS had no record. netbird-server runs an embedded Dex IdP at `{origin}/oauth2`; with the server unresolvable, the dashboard couldn't fetch the OIDC discovery → "Unauthenticated".
Cause: corrupted netavark/aardvark network state (from repeated manual container churn while debugging). Fix = a clean network + ordered container creation, which is exactly what a fresh `package.install` does. Validated by tearing down and recreating cleanly on .198: all three containers got IPs (10.89.3.2/3/4) and dashboard / api / oauth2 all return 200, stable across repeated polls.
Net for all nodes: a fresh install via the fixed binary (LAN-IP origin + gateway-derived nginx `resolver` + variable upstreams + CORS + multi-origin redirect URIs + explicit dashboard network-alias) creates the network cleanly and brings netbird up working. Existing/manually-churned nodes need a netbird reinstall (clean network) to recover. `cargo check` passes.
After updating a node, hard-refresh the browser: the dashboard bakes its API origin into JS at container start, so a cached bundle pointing at the old origin must be reloaded.
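
For reference, the gateway-derived `resolver` plus variable-upstream approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual `stacks.rs` code: the `Route` struct, the `render_proxy_conf` function, the `listen 80` line, and the `valid=10s` TTL are all assumptions made for the example. Only the `http://netbird-dashboard:80` upstream and the idea of reading the gateway back from `podman network inspect` come from this thread.

```rust
/// One proxied route: URL path prefix, nginx variable name, upstream URL.
/// (Hypothetical type, introduced only for this sketch.)
pub struct Route<'a> {
    pub path: &'a str,
    pub var: &'a str,
    pub upstream: &'a str,
}

/// Render an nginx server block whose upstreams are re-resolved per request.
///
/// `gateway` is the netbird-net gateway address (Podman's aardvark DNS), as
/// read back from `podman network inspect`. The listen port and the
/// `valid=10s` cache TTL are assumptions for illustration.
pub fn render_proxy_conf(gateway: &str, routes: &[Route<'_>]) -> String {
    let mut conf = String::new();
    conf.push_str("server {\n");
    conf.push_str("    listen 80;\n");
    // An explicit resolver lets nginx look names up at request time.
    conf.push_str(&format!("    resolver {} valid=10s;\n", gateway));
    for r in routes {
        conf.push_str(&format!("    location {} {{\n", r.path));
        // Putting the upstream in a variable stops nginx from resolving the
        // container name once at startup and caching a stale IP.
        conf.push_str(&format!("        set ${} {};\n", r.var, r.upstream));
        conf.push_str(&format!("        proxy_pass ${};\n", r.var));
        conf.push_str("    }\n");
    }
    conf.push_str("}\n");
    conf
}
```

Calling `render_proxy_conf("10.89.3.1", &routes)` with a single `/` route for `http://netbird-dashboard:80` would emit a `resolver 10.89.3.1 valid=10s;` line and a `proxy_pass $dashboard;` line instead of a literal hostname. The `10.89.3.1` gateway is a guess based on the 10.89.3.x container addresses above; the real value should come from the network inspection.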
Real final root cause of "Unauthenticated": a stale netbird store.
After the resolver/CORS/LAN-origin fixes the infra was fully healthy (all endpoints 200, OIDC discovery valid), yet the dashboard still showed "Unauthenticated" instantly with no login page. Server logs showed it only ever hit `/api/instance` → `setup_required: false`, while `single account mode … accounts number 0`. Contradiction: the store said "already set up" but had zero accounts, so the dashboard tried to auth, found no session, and errored without ever redirecting to the embedded Dex login.
Cause: the netbird data dir on .198 held a `store.db` dated May 20, initialized by an earlier install under the old (Tailscale) issuer. Reconfiguring to the LAN origin while reusing that store left it in a half-initialized state.
Fix: wiped the store (kept the GeoLite DBs) and let netbird re-init. `/api/instance` now correctly returns `{"setup_required": true}` → the dashboard shows the create-admin/getting-started page. netbird is working on .198.
Rollout implication: a fresh `package.install` creates a fresh data dir, so new nodes are fine. Existing nodes that already had netbird installed need a clean reinstall (uninstall WITHOUT preserving data, then install) so the store re-initializes under the LAN-IP issuer; otherwise the stale store keeps them in this broken state. Worth considering whether netbird uninstall should always wipe its data dir to make this automatic.
THE root cause: netbird's dashboard requires a secure context (HTTPS).
Browser console on the failing login showed `Uncaught Error: window.crypto.subtle is unavailable`. `window.crypto.subtle` (which react-oidc uses for OIDC PKCE) is only exposed in a secure context: HTTPS or localhost. Over plain `http://<LAN-IP>:8087` it's `undefined`, so the dashboard's auth init threw before it ever redirected to login, which is why we saw "Unauthenticated" with dead buttons and no `/oauth2/auth` request. All the earlier fixes (nginx resolver, LAN-origin, `/nb-auth` SPA fallback, conmon-died recreate, fresh store) were real and necessary, but HTTPS was the missing foundation.
Shipped (option A: code complete, compiles, validated live on .198):
- `stacks.rs`: proxy now terminates TLS (self-signed cert generated at install via openssl, SAN = LAN IP + 127.0.0.1 + localhost), `listen 443 ssl`, published `8087:443`; all origins (exposedAddress / issuer / dashboard endpoints / redirect URIs) are now `https://`.
- `appLauncher.ts`: netbird added to `NEW_TAB_APP_IDS` and served via `https://` (a self-signed-HTTPS iframe is blocked, since you can't accept a cert warning inside a frame), so it opens in a real tab where the user accepts the cert once.
Validated on .198: `https://192.168.1.198:8087` loads, registration + login work.
Caveats / follow-ups: