[Bug] Reconciler trusts podman 'Up' for conmon-died containers (netbird up-but-not-serving) #53

Closed
opened 2026-06-17 20:34:02 +00:00 by lfg2025 · 0 comments
Owner

Observed live on .198 (#15): both `netbird-server` and `netbird-dashboard` were in `State.Error = "conmon died without writing exit file"`, yet `podman ps` reported them **"Up 29 hours"** while they were actually dead and refusing all connections. The proxy then returned 502s and the dashboard showed "Unauthenticated", with no automatic recovery.

The reconciler trusts the `podman ps` "Up" status and does not detect a container that is up-but-not-serving.

**Ask:** treat a `conmon died` state or a connection-refused upstream as unhealthy and recreate the container (a plain `restart` did not recover it on .198; `rm -f` + recreate did). Relates to the container-lifecycle failure modes already tracked.
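
A minimal sketch of the check being asked for, in Python for illustration only (the reconciler's actual language and structure are not shown in this issue). The function names `inspect_state`, `port_accepts`, and `needs_recreate` are hypothetical, and the host/port to probe would have to come from the reconciler's own config. It assumes `podman inspect <name>` returns a JSON array whose first element has a `State` object with `Running` and `Error` fields.

```python
import json
import socket
import subprocess

CONMON_DIED = "conmon died"  # substring of the State.Error seen on .198


def inspect_state(name: str) -> dict:
    """Return the State object from `podman inspect` for one container."""
    out = subprocess.run(
        ["podman", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)[0]["State"]


def port_accepts(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


def needs_recreate(state: dict, serving: bool) -> bool:
    """Decide whether an apparently-up container should be recreated.

    Hypothetical policy: recreate when State.Error mentions a conmon death,
    or when podman says the container is running but its port refuses
    connections. A container that is simply not running is left to the
    reconciler's existing handling.
    """
    error = (state.get("Error") or "").lower()
    if CONMON_DIED in error:
        return True
    return bool(state.get("Running")) and not serving
```

A caller would combine these as `needs_recreate(inspect_state(name), port_accepts(host, port))` and, on `True`, go through `podman rm -f` followed by the reconciler's normal create path rather than `restart`, since a restart alone did not recover the containers on .198.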
Reference: lfg2025/archy#53