archy/scripts/tor-helper.sh


#!/bin/bash
# tor-helper.sh — Privileged Tor operations for the Archipelago backend.
# Runs as root via systemd (archipelago-tor-helper.service), triggered by
# a path unit watching /var/lib/archipelago/tor-config/tor-action.
#
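# The backend writes a JSON action file, and the path unit triggers this script.
# This avoids calling sudo from within a NoNewPrivileges=yes service.
#
# The action file is a JSON object with an "action" field, plus "name"
# (delete-service, rename-service) and "timestamp" (rename-service only).
# The outcome is written to tor-result as {"ok":true} or {"ok":false,"error":"..."}.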
set -euo pipefail
ACTION_FILE="/var/lib/archipelago/tor-config/tor-action"
TORRC_STAGED="/var/lib/archipelago/tor-config/torrc.staged"
RESULT_FILE="/var/lib/archipelago/tor-config/tor-result"
HOSTNAMES_DIR="/var/lib/archipelago/tor-hostnames"
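log() { echo "[tor-helper] $*"; }
# Write a JSON result for the backend. The result goes to a temporary file
# first and is renamed into place, so a reader never sees a partial write.
write_result() {
    local tmp="${RESULT_FILE}.tmp"
    printf '%s\n' "$1" > "$tmp"
    chown archipelago:archipelago "$tmp" 2>/dev/null || true
    mv -f "$tmp" "$RESULT_FILE"
}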
sync_hostnames() {
    local base dir svc
    mkdir -p "$HOSTNAMES_DIR"
    # Clear stale copies first
    rm -f "$HOSTNAMES_DIR"/* 2>/dev/null || true
    # Prefer /var/lib/tor (system Tor, authoritative) over /var/lib/archipelago/tor.
    # The secondary location is used only for services not found in the primary.
    for base in /var/lib/tor /var/lib/archipelago/tor; do
        for dir in "$base"/hidden_service_*; do
            [ -d "$dir" ] || continue
            # Strip the path and the hidden_service_ prefix to get the service name
            svc="${dir##*/hidden_service_}"
            # Skip directories archived by rename-service
            case "$svc" in *_old_*) continue ;; esac
            # Skip if already synced from a higher-priority location
            [ -f "${HOSTNAMES_DIR}/${svc}" ] && continue
            if [ -f "$dir/hostname" ]; then
                cp "$dir/hostname" "${HOSTNAMES_DIR}/${svc}"
                log "Synced hostname: $svc ($base)"
            fi
        done
    done
    chown -R archipelago:archipelago "$HOSTNAMES_DIR" 2>/dev/null || true
}
# ─── Main ─────────────────────────────────────────────────────────
if [ ! -f "$ACTION_FILE" ]; then
    log "No action file found"
    exit 0
fi
ACTION=$(cat "$ACTION_FILE")
rm -f "$ACTION_FILE"
ACTION_TYPE=$(echo "$ACTION" | python3 -c "import sys,json; print(json.load(sys.stdin).get('action',''))" 2>/dev/null || echo "")
case "$ACTION_TYPE" in
    write-torrc-and-restart)
        if [ ! -f "$TORRC_STAGED" ]; then
            log "ERROR: No staged torrc at $TORRC_STAGED"
            write_result '{"ok":false,"error":"No staged torrc"}'
            exit 1
        fi
        cp "$TORRC_STAGED" /etc/tor/torrc
        chown debian-tor:debian-tor /etc/tor/torrc 2>/dev/null || true
        log "torrc updated from staged file"
        # Report a failed restart; otherwise set -e would exit without a result
        if ! systemctl restart tor; then
            log "ERROR: Tor restart failed"
            write_result '{"ok":false,"error":"Tor restart failed"}'
            exit 1
        fi
        log "Tor restarted"
        # Wait for the SOCKS port (up to 30 attempts, one second apart)
        for _ in $(seq 1 30); do
            if timeout 1 bash -c 'echo > /dev/tcp/127.0.0.1/9050' 2>/dev/null; then
                break
            fi
            sleep 1
        done
        sync_hostnames
        write_result '{"ok":true}'
        ;;
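    restart)
        # Report a failed restart; otherwise set -e would exit without a result
        if ! systemctl restart tor; then
            log "ERROR: Tor restart failed"
            write_result '{"ok":false,"error":"Tor restart failed"}'
            exit 1
        fi
        log "Tor restarted"
        sleep 3
        sync_hostnames
        write_result '{"ok":true}'
        ;;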
    delete-service)
        NAME=$(echo "$ACTION" | python3 -c "import sys,json; print(json.load(sys.stdin).get('name',''))" 2>/dev/null || echo "")
        if [ -z "$NAME" ]; then
            write_result '{"ok":false,"error":"Missing service name"}'
            exit 1
        fi
        # Match the whole string: grep checks line by line, so a name with an
        # embedded newline could pass if any single line looked valid.
        if ! [[ "$NAME" =~ ^[a-zA-Z0-9_-]+$ ]]; then
            write_result '{"ok":false,"error":"Invalid service name"}'
            exit 1
        fi
        rm -rf "/var/lib/tor/hidden_service_${NAME}" 2>/dev/null || true
        rm -rf "/var/lib/archipelago/tor/hidden_service_${NAME}" 2>/dev/null || true
        rm -f "${HOSTNAMES_DIR}/${NAME}" 2>/dev/null || true
        log "Deleted hidden service: $NAME"
        write_result '{"ok":true}'
        ;;
    rename-service)
        NAME=$(echo "$ACTION" | python3 -c "import sys,json; print(json.load(sys.stdin).get('name',''))" 2>/dev/null || echo "")
        TIMESTAMP=$(echo "$ACTION" | python3 -c "import sys,json; print(json.load(sys.stdin).get('timestamp',''))" 2>/dev/null || echo "")
        if [ -z "$NAME" ] || [ -z "$TIMESTAMP" ]; then
            write_result '{"ok":false,"error":"Missing service name or timestamp"}'
            exit 1
        fi
        # Match the whole string: grep checks line by line, so a value with an
        # embedded newline could pass if any single line looked valid.
        if ! [[ "$NAME" =~ ^[a-zA-Z0-9_-]+$ ]]; then
            write_result '{"ok":false,"error":"Invalid service name"}'
            exit 1
        fi
        if ! [[ "$TIMESTAMP" =~ ^[0-9]+$ ]]; then
            write_result '{"ok":false,"error":"Invalid timestamp"}'
            exit 1
        fi
        OLD_SUFFIX="${NAME}_old_${TIMESTAMP}"
        for base in /var/lib/tor /var/lib/archipelago/tor; do
            SRC="${base}/hidden_service_${NAME}"
            DST="${base}/hidden_service_${OLD_SUFFIX}"
            if [ -d "$SRC" ]; then
                mv "$SRC" "$DST"
                log "Renamed $SRC -> $DST"
            fi
        done
        rm -f "${HOSTNAMES_DIR}/${NAME}" 2>/dev/null || true
        write_result '{"ok":true}'
        ;;
    sync-hostnames)
        sync_hostnames
        write_result '{"ok":true}'
        ;;
    reboot)
        write_result '{"ok":true}'
        log "System reboot initiated"
        sleep 1
        systemctl reboot
        ;;
    *)
        log "Unknown action: $ACTION_TYPE"
        write_result '{"ok":false,"error":"Unknown action"}'
        exit 1
        ;;
esac