archy/image-recipe/scripts/build-backend.sh


release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet

Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 + v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500) with no recovery path short of SSH. This release adds a self-check guardrail to the update flow.

What changed:
- apply_update() writes a pending-verify marker with old+new version and a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is present and within its freshness window, the new binary waits 15s for nginx + backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and nothing else happens.
- On window-exhaust, the new binary:
  1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts> (quarantined, not deleted, so we can post-mortem).
  2. Restores web-ui.bak on top of web-ui.
  3. Calls rollback_update() to restore the previous binary.
  4. Updates state.current_version to reflect the rollback.
  5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without probing, so a crashed-during-startup marker from weeks ago cannot spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount namespace. Without this, the rollback silently failed with EROFS on /usr/local/bin and orphaned the rollback - the exact opposite of what auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip, absent-marker noop, no-panic on verify_pending_update with nothing to verify, and an invariant assert that the 90s probe window stays below the 600s stale threshold. All passing.

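The probe schedule described above can be sketched in shell. This is only an illustration of the timing logic (stale check, settle delay, fixed-interval probes, rollback decision); the real implementation is the Rust function verify_pending_update(), and every name below (verify_update, probe_once, SLEEP_CMD and the *_SECS variables) is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the post-update health probe schedule.
# Timings are taken from the commit message; all names are illustrative.

SETTLE_SECS=15        # wait for nginx + backend to settle
PROBE_INTERVAL=5      # seconds between probes
PROBE_WINDOW=90       # total probing time before giving up
STALE_SECS=600        # markers older than 10 minutes are ignored
SLEEP_CMD=${SLEEP_CMD:-sleep}   # overridable so the logic can be dry-run

# Succeeds if the local UI answers; -k accepts self-signed certificates.
probe_once() {
    curl -k -s -f -o /dev/null --max-time 5 https://127.0.0.1/
}

# verify_update MARKER_AGE_SECS [PROBE_CMD]
# Prints "stale", "healthy", or "rollback"; returns 1 only for "rollback".
verify_update() {
    local marker_age=$1 probe=${2:-probe_once} elapsed=0

    # A stale marker is cleared without probing.
    if (( marker_age > STALE_SECS )); then
        echo "stale"
        return 0
    fi

    "$SLEEP_CMD" "$SETTLE_SECS"
    while (( elapsed < PROBE_WINDOW )); do
        if "$probe"; then
            echo "healthy"      # any single success clears the marker
            return 0
        fi
        "$SLEEP_CMD" "$PROBE_INTERVAL"
        elapsed=$(( elapsed + PROBE_INTERVAL ))
    done

    echo "rollback"             # window exhausted: caller restores old release
    return 1
}
```

Calling `verify_update 30` would probe the real endpoint with the real delays; passing `true` or `false` as the second argument and setting `SLEEP_CMD=true` exercises the three outcomes without waiting.
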
Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a single awk NR==1 that doesn't short-circuit the upstream pipe, so the release-build flow is idempotent again.
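
The SIGPIPE failure can be reproduced without tar. The sketch below is a minimal illustration, not the code from scripts/create-release-manifest.sh: `producer` is a hypothetical stand-in for `tar tvzf`, and the function names are invented for the example. With `pipefail` enabled, the variant that uses `head` reports a non-zero status (typically 141, i.e. 128 + SIGPIPE) because the producer is killed when `head` exits early, while the awk-only variant reads to end of input and succeeds.

```shell
#!/usr/bin/env bash
# -e is left off here so both exit statuses can be inspected.
set -uo pipefail

# Hypothetical stand-in for `tar tvzf`: emits far more than one pipe buffer.
producer() { seq 1 1000000; }

# Failing pattern: head exits after one line, the producer's next write hits
# a closed pipe, and pipefail propagates the producer's failure.
first_line_with_head() { producer | head -n 1 | awk '{print $1}'; }

# Fixed pattern: awk prints only line 1 but keeps consuming input,
# so the producer never sees a closed pipe.
first_line_with_awk() { producer | awk 'NR==1 {print $1}'; }

status=0
first_line_with_head >/dev/null || status=$?
echo "head pipeline status: $status"

status=0
first_line_with_awk >/dev/null || status=$?
echo "awk-only pipeline status: $status"
```

The awk-only form trades a little extra reading for a deterministic exit status, which is what matters under `set -euo pipefail`.
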
2026-04-22 16:14:35 -04:00
#!/bin/bash
# Build Archipelago backend binary for Debian Linux
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
BACKEND_DIR="$PROJECT_ROOT/core/archipelago"
OUTPUT_DIR="$SCRIPT_DIR/../build/backend"
echo "🔨 Building Archipelago backend..."
echo " Source: $BACKEND_DIR"
echo " Output: $OUTPUT_DIR"
echo ""
# Create output directory
mkdir -p "$OUTPUT_DIR"
# Check if we should use Docker
USE_DOCKER=false
if [[ "$OSTYPE" == "darwin"* ]]; then
    USE_DOCKER=true
    echo "🍎 macOS detected - using Docker for Linux build"
elif ! command -v rustc >/dev/null 2>&1; then
    USE_DOCKER=true
    echo "⚠️ Rust not found - using Docker"
fi
if [ "$USE_DOCKER" = true ]; then
    echo "🐳 Building in Docker container..."
    docker run --rm \
        -v "$PROJECT_ROOT:/workspace" \
        -v "$OUTPUT_DIR:/output" \
        -w /workspace/core/archipelago \
        rust:trixie \
        sh -c '
            # Abort on the first failing step; otherwise a failed build could
            # report a stale binary from an earlier run as a fresh one.
            set -e
            echo "📦 Installing build dependencies..."
            apt-get update && apt-get install -y pkg-config libssl-dev
            echo "🔨 Building Archipelago backend..."
            cargo build --release
            echo "📋 Copying binary to output..."
            cp ../target/release/archipelago /output/
            echo "✅ Build complete!"
            ls -lh /output/archipelago
        '
else
    # Native Linux build
    echo "🐧 Building natively..."
    cd "$BACKEND_DIR"
    cargo build --release
    cp "../target/release/archipelago" "$OUTPUT_DIR/archipelago"
fi
# Strip binary for smaller size
if [ -f "$OUTPUT_DIR/archipelago" ]; then
    if command -v strip >/dev/null 2>&1 && [[ "$OSTYPE" != "darwin"* ]]; then
        strip "$OUTPUT_DIR/archipelago"
    fi
    echo ""
    echo "✅ Backend built: $OUTPUT_DIR/archipelago"
    ls -lh "$OUTPUT_DIR/archipelago"
else
    echo "❌ Build failed - binary not found"
    exit 1
fi