release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet
Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 and
v1.7.39 rollouts left every affected node with an unreachable UI (nginx 500)
and no recovery path short of SSH. This release adds a self-check
guardrail to the update flow.
What changed:
- apply_update() writes a pending-verify marker with old+new version and
a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is
present and within its freshness window, the new binary waits 15s for
nginx + backend to settle, then probes https://127.0.0.1/ every 5s for
up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and
nothing else happens.
- If the window is exhausted with no successful probe, the new binary:
  1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts>
     (quarantined, not deleted, so we can post-mortem).
  2. Restores web-ui.bak on top of web-ui.
  3. Calls rollback_update() to restore the previous binary.
  4. Updates state.current_version to reflect the rollback.
  5. Runs systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without
probing, so a crashed-during-startup marker from weeks ago cannot
spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of
tokio::fs::copy, so it escapes the service's ProtectSystem=strict
mount namespace. Without this, the rollback silently failed with
EROFS on /usr/local/bin and orphaned the rollback - the exact
opposite of what auto-rollback is for.
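
The decision flow in the bullets above can be sketched roughly as follows. This is an illustration, not the code in this commit: PendingMarker, Outcome, verify_sketch and the constant names are hypothetical, the probe and sleep are injected as closures so the timing can be checked without a real HTTPS request, and the 150s deadline field is left out because the message does not say how it interacts with the 10-minute stale check.

```rust
use std::time::Duration;

// Timings quoted in the commit message; the constant names are hypothetical.
const SETTLE_DELAY: Duration = Duration::from_secs(15);
const PROBE_INTERVAL: Duration = Duration::from_secs(5);
const PROBE_WINDOW: Duration = Duration::from_secs(90);
const STALE_AFTER: Duration = Duration::from_secs(600);

/// Hypothetical marker shape: both versions plus the write time (Unix seconds).
#[derive(Debug, Clone, PartialEq)]
pub struct PendingMarker {
    pub old_version: String,
    pub new_version: String,
    pub written_at: u64,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// No marker on disk: normal startup.
    NothingToVerify,
    /// Marker older than the stale threshold: clear it without probing.
    StaleCleared,
    /// A probe succeeded inside the window: clear the marker.
    Verified,
    /// Window exhausted: quarantine web-ui, restore backup, roll back.
    RollBack,
}

/// Decide what startup should do. `probe` stands in for the HTTPS check
/// against the local UI and `sleep` for the real delay.
pub fn verify_sketch(
    marker: Option<&PendingMarker>,
    now: u64,
    mut probe: impl FnMut() -> bool,
    mut sleep: impl FnMut(Duration),
) -> Outcome {
    let Some(marker) = marker else {
        return Outcome::NothingToVerify;
    };
    if now.saturating_sub(marker.written_at) > STALE_AFTER.as_secs() {
        return Outcome::StaleCleared;
    }
    // Give nginx and the backend time to settle before the first probe.
    sleep(SETTLE_DELAY);
    let mut elapsed = Duration::ZERO;
    while elapsed < PROBE_WINDOW {
        if probe() {
            return Outcome::Verified;
        }
        sleep(PROBE_INTERVAL);
        elapsed += PROBE_INTERVAL;
    }
    Outcome::RollBack
}
```

Keeping the decision separate from the side effects (file moves, host_sudo copy, systemctl restart) is one way to make the rollback path testable on a developer machine.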
Tests: 4 new unit tests in update::tests covering marker round-trip,
absent-marker noop, no-panic on verify_pending_update with nothing to
verify, and an invariant assert that the 90s probe window stays below
the 600s stale threshold. All passing.
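
For illustration only, the marker round-trip and the window invariant could look something like the sketch below. The Marker type, its line-based encoding and the constant names are assumptions made for this example; the actual on-disk format and test names in update::tests are not shown in this message.

```rust
/// Hypothetical constants mirroring the 90s probe window and 600s stale threshold.
pub const PROBE_WINDOW_SECS: u64 = 90;
pub const STALE_THRESHOLD_SECS: u64 = 600;

// Compile-time form of the invariant: the build fails if the probe window
// ever reaches the stale threshold.
const _: () = assert!(PROBE_WINDOW_SECS < STALE_THRESHOLD_SECS);

/// Hypothetical marker with a simple three-line text encoding.
#[derive(Debug, Clone, PartialEq)]
pub struct Marker {
    pub old_version: String,
    pub new_version: String,
    pub deadline_unix: u64,
}

impl Marker {
    /// Serialize as one field per line.
    pub fn encode(&self) -> String {
        format!(
            "{}\n{}\n{}\n",
            self.old_version, self.new_version, self.deadline_unix
        )
    }

    /// Parse the three-line form; returns None on any missing or bad field.
    pub fn decode(text: &str) -> Option<Marker> {
        let mut lines = text.lines();
        let old_version = lines.next()?.to_string();
        let new_version = lines.next()?.to_string();
        let deadline_unix = lines.next()?.parse().ok()?;
        Some(Marker {
            old_version,
            new_version,
            deadline_unix,
        })
    }
}
```

A const assertion catches a bad constant change at build time, while a unit test with the same comparison reports it in the test run; either form guards the same invariant.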
Side fix: scripts/create-release-manifest.sh was dying with exit 141
(SIGPIPE from tar tvzf | head | awk) under set -euo pipefail: head exits
after the first line, tar is killed by SIGPIPE on its next write, and
pipefail turns that into a script failure. Replaced with a single awk
NR==1 that doesn't short-circuit the upstream pipe, so the release-build
flow is idempotent again.
2026-04-22 16:14:35 -04:00
import{a as h,P as m,b as d,c as r,e as t,t as s,F as u,g as p,Q as x,h as f}from"./index-Lh5NfTCq.js";const g={class:"rounded-lg bg-white/[0.03] border border-white/5 p-2.5 mb-1"},b={class:"flex items-center gap-1.5 mb-1"},w={class:"w-5 h-5 rounded-full shrink-0 flex items-center justify-center text-xs font-bold bg-purple-500/20 text-purple-400"},y={class:"text-xs font-semibold text-white/70"},v={class:"text-xs ml-auto text-white/20"},k={class:"text-xs text-white/60 leading-relaxed whitespace-pre-wrap"},T=h({__name:"ThreadNode",props:{node:{},depth:{}},emits:["reply"],setup(e){function i(o){return new Date(o*1e3).toLocaleTimeString("en",{hour:"2-digit",minute:"2-digit"})}return(o,n)=>{const l=m("ThreadNode",!0);return d(),r("div",{style:f({paddingLeft:`${Math.min(e.depth,4)*16}px`})},[t("div",g,[t("div",b,[t("div",w,s(e.node.note.authorName?.charAt(0)?.toUpperCase()??"?"),1),t("span",y,s(e.node.note.authorName??"anon"),1),t("span",v,s(i(e.node.note.created_at)),1)]),t("p",k,s(e.node.note.content),1),t("button",{class:"text-xs text-white/25 hover:text-accent/60 mt-1 transition-colors",onClick:n[0]||(n[0]=a=>o.$emit("reply",e.node.note))}," Reply ")]),(d(!0),r(u,null,p(e.node.children,a=>(d(),x(l,{key:a.note.id,node:a,depth:e.depth+1,onReply:n[1]||(n[1]=c=>o.$emit("reply",c))},null,8,["node","depth"]))),128))],4)}}});export{T as default};