archy/scripts/archipelago-wg

release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet

Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 +
v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500)
with no recovery path short of SSH. This release adds a self-check guardrail
to the update flow.

What changed:
- apply_update() writes a pending-verify marker with old+new version and a
  150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is present
  and within its freshness window, the new binary waits 15s for nginx +
  backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s
  (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and nothing
  else happens.
- On window-exhaust, the new binary:
  1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts>
     (quarantined, not deleted, so we can post-mortem).
  2. Restores web-ui.bak on top of web-ui.
  3. Calls rollback_update() to restore the previous binary.
  4. Updates state.current_version to reflect the rollback.
  5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without
  probing, so a crashed-during-startup marker from weeks ago cannot
  spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of
  tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount
  namespace. Without this, the rollback silently failed with EROFS on
  /usr/local/bin and orphaned the rollback - the exact opposite of what
  auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip,
absent-marker noop, no-panic on verify_pending_update with nothing to verify,
and an invariant assert that the 90s probe window stays below the 600s stale
threshold. All passing.

Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE
from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a
single awk NR==1 that doesn't short-circuit the upstream pipe, so the
release-build flow is idempotent again.
2026-04-22 16:14:35 -04:00
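
The side fix in the commit message above can be reproduced in isolation. The following is a minimal sketch, not the contents of scripts/create-release-manifest.sh: `list_entries` is a hypothetical stand-in for the `tar tvzf` listing, and the function names are illustrative only. It compares the status of a `head`-based pipeline with a single `awk NR==1` under `pipefail`.

```shell
#!/bin/bash
set -uo pipefail   # -e is left off here so the failing status can be inspected

# Hypothetical stand-in for `tar tvzf <archive>`: emits far more output
# than a pipe buffer can hold.
list_entries() { seq 1 200000; }

# Pattern described as failing: head exits after one line, the producer's
# next write hits a closed pipe, and pipefail reports a non-zero status.
first_via_head() { list_entries | head -n 1 | awk '{print $1}'; }

# Pattern described as the fix: awk reads all of its input, so the producer
# finishes normally and the pipeline status is 0.
first_via_awk() { list_entries | awk 'NR==1 {print $1}'; }

head_out=$(first_via_head); head_status=$?
awk_out=$(first_via_awk);   awk_status=$?

echo "head: out=$head_out status=$head_status"
echo "awk:  out=$awk_out status=$awk_status"
```

Both variants print the same first line. Only the `head` variant returns a non-zero status, typically 141 (128 + SIGPIPE) when SIGPIPE has its default disposition, and under `set -e` that status would abort the calling script.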
#!/bin/bash
# archipelago-wg — Privileged WireGuard helper for the Archipelago backend.
# Installed to /usr/local/bin/archipelago-wg with a sudoers rule so the
# unprivileged archipelago/debian service user can manage wg0 without
# full root or disabling NoNewPrivileges.
#
# Usage:
#   archipelago-wg setup <privkey-file>            Create the wg0 interface
#   archipelago-wg add-peer <pubkey> <allowed-ip>  Add a peer to wg0
#   archipelago-wg remove-peer <pubkey>            Remove a peer from wg0
set -euo pipefail
case "${1:-}" in
    setup)
        KEY_FILE="${2:?Usage: archipelago-wg setup <privkey-file>}"
        [ -f "$KEY_FILE" ] || { echo "Key file not found: $KEY_FILE" >&2; exit 1; }
        # Ensure the kernel module is loaded (best effort; it may be built in)
        modprobe wireguard 2>/dev/null || true
        # Create the interface only if it does not exist yet, so that a real
        # failure of `ip link add` is reported instead of being swallowed
        ip link show dev wg0 >/dev/null 2>&1 || ip link add dev wg0 type wireguard
        wg set wg0 listen-port 51820 private-key "$KEY_FILE"
        # Assign the server address if not already set. Match the exact address:
        # a bare "10.44.0.1" pattern would also match 10.44.0.10 or 10.44.0.100.
        ip -o -4 address show dev wg0 | grep -qF "inet 10.44.0.1/" ||
            ip address add 10.44.0.1/16 dev wg0
        ip link set up dev wg0
        # NAT masquerade for VPN clients (-C checks first so the rule is not duplicated)
        iptables -t nat -C POSTROUTING -s 10.44.0.0/16 ! -o wg0 -j MASQUERADE 2>/dev/null ||
            iptables -t nat -A POSTROUTING -s 10.44.0.0/16 ! -o wg0 -j MASQUERADE
        # Open the firewall port when ufw is installed and active
        if command -v ufw >/dev/null 2>&1 && ufw status | grep -q "Status: active"; then
            ufw allow 51820/udp >/dev/null 2>&1 || true
        fi
        echo "wg0 configured"
        ;;
    add-peer)
        PUBKEY="${2:?Usage: archipelago-wg add-peer <pubkey> <allowed-ip>}"
        ALLOWED_IP="${3:?Usage: archipelago-wg add-peer <pubkey> <allowed-ip>}"
        wg set wg0 peer "$PUBKEY" allowed-ips "$ALLOWED_IP"
        echo "peer added"
        ;;
    remove-peer)
        PUBKEY="${2:?Usage: archipelago-wg remove-peer <pubkey>}"
        wg set wg0 peer "$PUBKEY" remove
        echo "peer removed"
        ;;
    *)
        echo "Usage: archipelago-wg {setup|add-peer|remove-peer}" >&2
        exit 1
        ;;
esac
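
Because the helper runs with elevated privileges via sudoers, it may be worth checking the add-peer and remove-peer arguments before handing them to `wg set`. The sketch below is an assumption, not part of the script as committed: `is_wg_pubkey` and `is_vpn_peer_ip` are hypothetical helpers, and the rule that each peer gets a single IPv4 address inside 10.44.0.0/16 is inferred from the server address used in `setup`.

```shell
#!/bin/bash
# Hypothetical validation helpers; not part of the original script.

# Shape check only: a 32-byte WireGuard key encodes to 43 base64
# characters followed by one '=' padding character.
is_wg_pubkey() {
    [[ "${1:-}" =~ ^[A-Za-z0-9+/]{43}=$ ]]
}

# Assumption: peers receive one IPv4 address inside 10.44.0.0/16,
# written either bare or with an explicit /32 suffix.
is_vpn_peer_ip() {
    [[ "${1:-}" =~ ^10\.44\.([0-9]{1,3})\.([0-9]{1,3})(/32)?$ ]] || return 1
    # Force base 10 so octets such as "08" are not parsed as octal.
    (( 10#${BASH_REMATCH[1]} <= 255 && 10#${BASH_REMATCH[2]} <= 255 ))
}
```

In the add-peer branch this could be used as `is_wg_pubkey "$PUBKEY" || { echo "Invalid public key" >&2; exit 1; }`, with a matching check on `$ALLOWED_IP`. `wg` still performs its own key parsing; the point of the extra check is to reject malformed or over-broad input (for example `0.0.0.0/0`) at the privilege boundary.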