archy/scripts/trust-archipelago-cert.sh


release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet

Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 + v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500) with no recovery path short of SSH. This release adds a self-check guardrail to the update flow.

What changed:
- apply_update() writes a pending-verify marker with old+new version and a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is present and within its freshness window, the new binary waits 15s for nginx + backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and nothing else happens.
- On window-exhaust, the new binary:
  1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts> (quarantined, not deleted, so we can post-mortem).
  2. Restores web-ui.bak on top of web-ui.
  3. Calls rollback_update() to restore the previous binary.
  4. Updates state.current_version to reflect the rollback.
  5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without probing, so a crashed-during-startup marker from weeks ago cannot spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount namespace. Without this, the rollback silently failed with EROFS on /usr/local/bin and orphaned the rollback - the exact opposite of what auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip, absent-marker noop, no-panic on verify_pending_update with nothing to verify, and an invariant assert that the 90s probe window stays below the 600s stale threshold. All passing.

Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a single awk NR==1 that doesn't short-circuit the upstream pipe, so the release-build flow is idempotent again.
2026-04-22 16:14:35 -04:00
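The post-OTA self-check that the commit message describes, written here as an added example (not the project's own update module) in the form of a pure decision function so the marker round-trip, the stale cut-off, the window-exhaust rollback and the "probe window < stale threshold" invariant can all be exercised without systemd, nginx or a filesystem. The type and function names, the three-line marker text format and the `written_secs` field are assumptions; the 15 s settle delay and the 150 s deadline field are left out.

```rust
// Added example (not the project's code): the post-OTA self-check from the commit
// message as a pure decision function, testable without systemd, nginx or disk.
// Names, the marker format and the `written_secs` field are assumptions.
use std::time::Duration;
pub const PROBE_WINDOW: Duration = Duration::from_secs(90); // probe budget after settle
pub const STALE_AFTER: Duration = Duration::from_secs(600); // older markers are ignored
/// Marker written by apply_update() right before scheduling the restart.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingVerify {
    pub old_version: String,
    pub new_version: String,
    pub written_secs: u64, // unix seconds at write time
}
#[derive(Debug, PartialEq)]
pub enum Outcome {
    NothingToVerify,         // no marker: ordinary boot
    StaleMarkerCleared,      // marker too old: clear it, never probe
    Verified,                // a probe succeeded: clear marker, keep new binary
    RollBack { to: String }, // window exhausted: restore old web-ui + binary
}
/// Marker on disk is three lines: old version, new version, write time.
pub fn encode_marker(m: &PendingVerify) -> String {
    format!("{}\n{}\n{}\n", m.old_version, m.new_version, m.written_secs)
}
pub fn parse_marker(s: &str) -> Option<PendingVerify> {
    let mut it = s.lines();
    Some(PendingVerify {
        old_version: it.next()?.to_string(),
        new_version: it.next()?.to_string(),
        written_secs: it.next()?.trim().parse().ok()?,
    })
}
/// Decide what startup must do. `probe` stands in for one GET of
/// https://127.0.0.1/ and is polled once per 5 s tick of the window.
pub fn verify_pending_update(
    marker: Option<&PendingVerify>,
    now_secs: u64,
    mut probe: impl FnMut() -> bool,
) -> Outcome {
    let m = match marker { Some(m) => m, None => return Outcome::NothingToVerify };
    // Stale check comes first so an ancient marker can never trigger a rollback.
    if now_secs.saturating_sub(m.written_secs) > STALE_AFTER.as_secs() {
        return Outcome::StaleMarkerCleared;
    }
    if (0..PROBE_WINDOW.as_secs() / 5).any(|_| probe()) {
        return Outcome::Verified;
    }
    Outcome::RollBack { to: m.old_version.clone() }
}
```

The caller that clears the marker, quarantines web-ui and calls rollback_update() is the side-effecting part deliberately kept out of this function.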
#!/bin/bash
#
# Trust the Archipelago server's self-signed certificate on macOS.
# Run this to eliminate "Not secure" when accessing https://192.168.1.228
#
# Usage: ./scripts/trust-archipelago-cert.sh [host]
# Default host: 192.168.1.228
#
# Requires: SSH access to archipelago@host (uses deploy-config.sh password)
#
set -e

HOST="${1:-192.168.1.228}"
CERT_FILE="/tmp/archipelago-${HOST}.crt"
KEYCHAIN="${HOME}/Library/Keychains/login.keychain-db"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(dirname "$SCRIPT_DIR")"

# Try to fetch cert from server via SSH (most reliable).
# Note: StrictHostKeyChecking=no skips SSH host-key verification, so the
# fetched certificate is only as trustworthy as the network path to $HOST.
SSH_KEY="${ARCHIPELAGO_SSH_KEY:-$HOME/.ssh/archipelago-deploy}"
echo "Fetching certificate from server..."
if [ -f "$SSH_KEY" ]; then
    ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "archipelago@${HOST}" \
        'sudo -n cat /etc/archipelago/ssl/archipelago.crt' > "$CERT_FILE" 2>/dev/null || true
elif [ -f "$SCRIPT_DIR/deploy-config.sh" ]; then
    # Last-resort fallback: password auth (leaks credentials to process list)
    . "$SCRIPT_DIR/deploy-config.sh"
    echo "WARNING: SSH key not found at $SSH_KEY — falling back to password auth"
    if command -v sshpass >/dev/null 2>&1; then
        sshpass -p "$ARCHIPELAGO_PASSWORD" ssh -o StrictHostKeyChecking=no "archipelago@${HOST}" \
            'sudo -n cat /etc/archipelago/ssl/archipelago.crt' > "$CERT_FILE" 2>/dev/null || true
    else
        echo "WARNING: No SSH key and sshpass not installed — skipping SSH fetch"
    fi
fi

# Fallback: fetch via openssl (can hang on some systems).
# This is trust-on-first-use: whatever certificate the TLS endpoint presents is
# what gets trusted, so it belongs only on a network you control.
# The trailing "|| true" keeps set -e from aborting before the error message below.
if [ ! -s "$CERT_FILE" ]; then
    echo "Fetching certificate via TLS..."
    (echo "Q"; sleep 1) | openssl s_client -connect "${HOST}:443" -servername "${HOST}" 2>/dev/null | \
        openssl x509 -outform PEM > "$CERT_FILE" || true
fi

if [ ! -s "$CERT_FILE" ]; then
    echo "Failed to fetch certificate. Ensure deploy-config.sh exists and SSH works, or the server is reachable."
    exit 1
fi

echo "Adding to your login keychain..."
# Remove old cert if present (by common name)
security delete-certificate -c "archipelago.local" "$KEYCHAIN" 2>/dev/null || true

# Add to user keychain with trust (no sudo needed).
# The trust setting goes into the user domain; the -d flag would target the
# admin domain, which requires root and would always fall through to the
# manual path below.
if security add-trusted-cert -r trustRoot -k "$KEYCHAIN" "$CERT_FILE" 2>/dev/null; then
    echo " Certificate trusted successfully."
elif security add-trusted-cert -r trustAsRoot -k "$KEYCHAIN" "$CERT_FILE" 2>/dev/null; then
    echo " Certificate trusted successfully."
else
    # Fallback: add cert and open Keychain Access for manual trust
    cp "$CERT_FILE" "$HOME/Desktop/archipelago-${HOST}.crt"
    echo ""
    echo " Could not auto-trust. Certificate saved to Desktop."
    echo " Double-click archipelago-${HOST}.crt to add it, then in Keychain Access"
    echo " find it, double-click, expand Trust → set to 'Always Trust'."
    CERT_FILE="" # Don't delete, we copied to Desktop
fi

rm -f "$CERT_FILE"
echo ""
echo "✅ Done. Restart your browser fully (quit Chrome/Safari) and visit https://${HOST}"