release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet
Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 +
v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500)
with no recovery path short of SSH. This release adds a self-check
guardrail to the update flow.
What changed:
- apply_update() writes a pending-verify marker with old+new version and
a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is
present and within its freshness window, the new binary waits 15s for
nginx + backend to settle, then probes https://127.0.0.1/ every 5s for
up to 90s (self-signed certs accepted).
- If any probe succeeds within the window, the marker is cleared and
nothing else happens.
- If the window is exhausted without a successful probe, the new binary:
1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts>
(quarantined, not deleted, so we can post-mortem).
2. Restores web-ui.bak on top of web-ui.
3. Calls rollback_update() to restore the previous binary.
4. Updates state.current_version to reflect the rollback.
5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes (600s) are treated as stale and cleared
without probing, so a marker left behind by a crash during startup weeks
ago cannot spontaneously roll back a healthy node on a later reboot.
- rollback_update() now copies the binary via host_sudo instead of
tokio::fs::copy, so the copy escapes the service's ProtectSystem=strict
mount namespace. Without this, the rollback failed silently with
EROFS on /usr/local/bin and was left half-finished: the exact
opposite of what auto-rollback is for.
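The verify-or-rollback flow above can be sketched in shell for illustration. This is not the Rust code in verify_pending_update(): the helper names probe_until_healthy and quarantine_and_restore_ui are hypothetical, the curl probe assumes that a non-2xx response (such as the nginx 500) counts as a failure, and web-ui.bak is assumed to be copied rather than moved back into place. Steps 3-5 are left as comments because they live in Rust and systemd.

```shell
#!/bin/bash
# Illustrative sketch only: mirrors the timings above (15s settle, probe
# every 5s, 90s window). Helper names are hypothetical.

# probe_until_healthy SETTLE INTERVAL WINDOW CMD [ARGS...]
# Returns 0 as soon as CMD succeeds, 1 once WINDOW seconds of wall-clock
# time pass without a success, 2 if INTERVAL is not positive.
probe_until_healthy() {
  local settle=$1 interval=$2 window=$3
  shift 3
  (( interval > 0 )) || return 2           # avoid a busy loop
  sleep "$settle"                          # let nginx + backend settle
  local deadline=$(( SECONDS + window ))   # bash's built-in seconds counter
  while (( SECONDS < deadline )); do
    "$@" && return 0                       # healthy: caller clears the marker
    sleep "$interval"
  done
  return 1                                 # window exhausted: caller rolls back
}

# quarantine_and_restore_ui ROOT TIMESTAMP
# Steps 1-2: keep the broken UI for post-mortem, then restore the backup.
# Assumption: the backup is copied, so web-ui.bak survives the rollback.
quarantine_and_restore_ui() {
  local root=$1 ts=$2
  mv "$root/web-ui" "$root/web-ui.failed.$ts" || return 1
  cp -a "$root/web-ui.bak" "$root/web-ui"
}

# Roughly how it would run on a node (not executed here; the 5s per-probe
# curl timeout is an assumption):
#   if ! probe_until_healthy 15 5 90 \
#        curl -fsk --max-time 5 -o /dev/null https://127.0.0.1/; then
#     quarantine_and_restore_ui /opt/archipelago "$(date +%s)"
#     # 3. rollback_update()  4. update state.current_version
#     # 5. systemctl --no-block restart archipelago
#   fi
```

Measuring the window against the SECONDS wall clock rather than counting sleeps keeps the 90s bound close to real time even when an individual probe is slow.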
Tests: 4 new unit tests in update::tests cover the marker round-trip, the
absent-marker no-op, verify_pending_update not panicking when there is
nothing to verify, and an invariant assertion that the 90s probe window
stays below the 600s stale threshold. All passing.
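The stale-marker rule and the invariant checked by the last test can be sketched the same way. The key=value marker format, the helper names write_pending_marker and marker_state, and treating an unparsable timestamp as stale are all assumptions of this sketch; the real marker is written by apply_update() and its on-disk format is not described here. Times are passed in as epoch seconds so the boundaries are easy to exercise.

```shell
#!/bin/bash
# Illustrative sketch only; marker format and helper names are hypothetical.
PROBE_WINDOW=90    # seconds of probing before giving up
DEADLINE=150       # deadline recorded in the marker by apply_update()
STALE_AFTER=600    # markers older than this are cleared without probing

# Same invariant the unit test asserts: a marker must not go stale
# while its own probe window is still running.
(( PROBE_WINDOW < STALE_AFTER )) || { echo "probe window >= stale threshold" >&2; exit 1; }

# write_pending_marker FILE OLD_VERSION NEW_VERSION NOW
write_pending_marker() {
  local file=$1 old=$2 new=$3 now=$4
  printf 'old_version=%s\nnew_version=%s\ncreated=%s\ndeadline=%s\n' \
    "$old" "$new" "$now" "$(( now + DEADLINE ))" > "$file"
}

# marker_state FILE NOW -> prints absent, stale or fresh
marker_state() {
  local file=$1 now=$2 created
  if [[ ! -f $file ]]; then
    echo absent                  # nothing to verify: no-op
    return
  fi
  created=$(awk -F= '$1 == "created" { print $2 }' "$file")
  # Assumption: an unreadable timestamp is treated as stale, never probed.
  if [[ ! $created =~ ^[0-9]+$ ]] || (( now - created > STALE_AFTER )); then
    echo stale                   # caller clears it without probing
  else
    echo fresh                   # caller runs the probe window
  fi
}
```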
Side fix: scripts/create-release-manifest.sh was dying with exit 141
(SIGPIPE from the tar tvzf | head | awk pipeline) under set -euo pipefail.
Replaced it with a single awk NR==1 that reads to EOF instead of
short-circuiting the upstream pipe, so the release-build flow can be
rerun cleanly again.
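A minimal reproduction of the pattern behind the exit-141 fix, as a sketch that prints the first entry of a tarball. The helper name first_entry and the use of tar tzf (names only, rather than the verbose tvzf listing the script parses) are assumptions; the actual awk expression in create-release-manifest.sh is not shown here.

```shell
#!/bin/bash
set -o pipefail   # the release script runs under set -euo pipefail

# Fragile: head exits after one line; if tar is still writing, tar is killed
# by SIGPIPE and, with pipefail, the pipeline's exit status becomes 141.
#   tar tzf "$archive" | head -n 1
#
# Robust: awk keeps reading until EOF, so tar always finishes normally.
# There is deliberately no "exit" in the awk program: leaving early would
# reintroduce the same SIGPIPE.
first_entry() {
  tar tzf "$1" | awk 'NR == 1'
}
```

The cost is reading the whole archive listing, which is a small price for a deterministic exit status in a release-build script.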
2026-04-22 16:14:35 -04:00
#!/bin/bash
# Container image versions: single source of truth
# Source this file from all scripts that create containers
#
# Usage: source /opt/archipelago/image-versions.sh 2>/dev/null || true
#    or: source "$(dirname "$0")/image-versions.sh" 2>/dev/null || true
#
# Tags MUST match what's actually in the registry at git.tx1138.com/lfg2025/
# Run: podman images --format '{{.Repository}}:{{.Tag}}' | grep 'git.tx1138' | sort
# to list the registry images pulled on this host and compare them with the tags below.

# Archipelago app registries (primary + fallback)
ARCHY_REGISTRY="git.tx1138.com/lfg2025"
# 2026-04-23 08:22:32 -04:00
ARCHY_REGISTRY_FALLBACK="146.59.87.168:3000/lfg2025"

# Bitcoin stack
BITCOIN_KNOTS_IMAGE="$ARCHY_REGISTRY/bitcoin-knots:latest"
LND_IMAGE="$ARCHY_REGISTRY/lnd:v0.18.4-beta"
ELECTRUMX_IMAGE="$ARCHY_REGISTRY/electrumx:v1.18.0"

# Mempool stack
MEMPOOL_BACKEND_IMAGE="$ARCHY_REGISTRY/mempool-backend:v3.0.0"
MEMPOOL_WEB_IMAGE="$ARCHY_REGISTRY/mempool-frontend:v3.0.0"
MARIADB_IMAGE="$ARCHY_REGISTRY/mariadb:11.4.10"

# BTCPay
BTCPAY_IMAGE="$ARCHY_REGISTRY/btcpayserver:1.13.7"
NBXPLORER_IMAGE="$ARCHY_REGISTRY/nbxplorer:2.6.0"
POSTGRES_IMAGE="$ARCHY_REGISTRY/postgres:15.17"
BTCPAY_POSTGRES_IMAGE="$ARCHY_REGISTRY/postgres:15.17"

# Apps
HOMEASSISTANT_IMAGE="$ARCHY_REGISTRY/home-assistant:2024.1"
GRAFANA_IMAGE="$ARCHY_REGISTRY/grafana:10.2.0"
UPTIME_KUMA_IMAGE="$ARCHY_REGISTRY/uptime-kuma:1"
JELLYFIN_IMAGE="$ARCHY_REGISTRY/jellyfin:10.8.13"
PHOTOPRISM_IMAGE="$ARCHY_REGISTRY/photoprism:240915"
OLLAMA_IMAGE="$ARCHY_REGISTRY/ollama:latest"
VAULTWARDEN_IMAGE="$ARCHY_REGISTRY/vaultwarden:1.30.0-alpine"
NEXTCLOUD_IMAGE="$ARCHY_REGISTRY/nextcloud:29"
SEARXNG_IMAGE="$ARCHY_REGISTRY/searxng:latest"
# OnlyOffice removed: incompatible with rootless Podman (its internal postgres/rabbitmq fail)
# Replaced by CryptPad (single Node.js process, e2e encrypted)
CRYPTPAD_IMAGE="$ARCHY_REGISTRY/cryptpad:2024.12.0"
FILEBROWSER_IMAGE="$ARCHY_REGISTRY/filebrowser:v2.27.0"
NPM_IMAGE="$ARCHY_REGISTRY/nginx-proxy-manager:latest"
PORTAINER_IMAGE="$ARCHY_REGISTRY/portainer:latest"

# Networking
TAILSCALE_IMAGE="$ARCHY_REGISTRY/tailscale:stable"
ALPINE_TOR_IMAGE="$ARCHY_REGISTRY/alpine-tor:0.4.8.13"
ADGUARDHOME_IMAGE="$ARCHY_REGISTRY/adguardhome:v0.107.55"

# Fedimint
FEDIMINT_IMAGE="$ARCHY_REGISTRY/fedimintd:v0.10.0"
FEDIMINT_GATEWAY_IMAGE="$ARCHY_REGISTRY/gatewayd:v0.10.0"

# Media
REDIS_IMAGE="$ARCHY_REGISTRY/redis:7.4.8"

# Valkey (general purpose)
VALKEY_IMAGE="$ARCHY_REGISTRY/valkey:8.1.6"

# Nostr
NOSTR_RS_RELAY_IMAGE="$ARCHY_REGISTRY/nostr-rs-relay:0.9.0"
STRFRY_IMAGE="$ARCHY_REGISTRY/strfry:1.0.4"
NOSTR_VPN_IMAGE="$ARCHY_REGISTRY/nostr-vpn:v0.3.7"
NOSTR_VPN_UI_IMAGE="$ARCHY_REGISTRY/nostr-vpn-ui:latest"
FIPS_IMAGE="$ARCHY_REGISTRY/fips:v0.1.0"
FIPS_UI_IMAGE="$ARCHY_REGISTRY/fips-ui:latest"

# AI / Routing
ROUTSTR_IMAGE="$ARCHY_REGISTRY/routstr:v0.4.3"

# Community / Gaming
BOTFIGHTS_IMAGE="$ARCHY_REGISTRY/botfights:1.1.0"

# IndeedHub stack
INDEEDHUB_IMAGE="$ARCHY_REGISTRY/indeedhub:1.0.0"
INDEEDHUB_API_IMAGE="$ARCHY_REGISTRY/indeedhub-api:1.0.0"
INDEEDHUB_FFMPEG_IMAGE="$ARCHY_REGISTRY/indeedhub-ffmpeg:1.0.0"
MINIO_IMAGE="$ARCHY_REGISTRY/minio:RELEASE.2024-11-07T00-52-20Z"
INDEEDHUB_POSTGRES_IMAGE="$ARCHY_REGISTRY/postgres:16.13-alpine"
INDEEDHUB_REDIS_IMAGE="$ARCHY_REGISTRY/redis:7.4.8-alpine"

# Gitea (Git + Container Registry)
GITEA_IMAGE="docker.io/gitea/gitea:1.23"

# DWN (Decentralized Web Node)
DWN_SERVER_IMAGE="$ARCHY_REGISTRY/dwn-server:main"

# Immich stack
IMMICH_POSTGRES_IMAGE="$ARCHY_REGISTRY/immich-postgres:14-vectorchord0.4.3-pgvectors0.2.0"
IMMICH_SERVER_IMAGE="$ARCHY_REGISTRY/immich-server:release"

# Penpot stack
PENPOT_POSTGRES_IMAGE="$ARCHY_REGISTRY/postgres:15"
PENPOT_VALKEY_IMAGE="$ARCHY_REGISTRY/valkey:8.1"
PENPOT_BACKEND_IMAGE="$ARCHY_REGISTRY/penpot-backend:2.4"
PENPOT_EXPORTER_IMAGE="$ARCHY_REGISTRY/penpot-exporter:2.4"
PENPOT_FRONTEND_IMAGE="$ARCHY_REGISTRY/penpot-frontend:2.4"

# Custom UI containers (built from docker/ dirs, pushed to registry)
# These use :latest because they're built and pushed internally, which is acceptable for self-hosted images
BITCOIN_UI_IMAGE="$ARCHY_REGISTRY/bitcoin-ui:latest"
LND_UI_IMAGE="$ARCHY_REGISTRY/lnd-ui:latest"
ELECTRS_UI_IMAGE="$ARCHY_REGISTRY/electrs-ui:latest"

# Base images
NGINX_ALPINE_IMAGE="$ARCHY_REGISTRY/nginx:1.27.4-alpine"