2 Commits

Author SHA1 Message Date
archipelago
3cea7dd6c5 test(phase3): fix Phase-3 quadlet gates — define fail(), drop stale Notify=healthy assert
Two Phase-3 bats suites used `fail` (a bats-assert helper) but bats-assert
isn't installed on the alpha fleet (only bats-core), so every tripped
assertion crashed with `fail: command not found` (status 127) instead of
reporting a real pass/fail. Define the same minimal `fail() { echo ...;
return 1; }` the other suites already use (see mempool.bats). Without this
the gates were silently non-functional.

Also rewrite the obsolete "HealthCmd= implies Notify=healthy" assertion in
use-quadlet-backends-install.bats. Phase 3.4's Notify=healthy was
deliberately reverted: gating `systemctl start` on health hung boot
reconciliation for dependency-waiting apps (fedimint idles until Bitcoin
IBD; lnd until macaroon unlock), leaving units stuck "deactivating". The
renderer now emits HealthCmd= for Podman's health state but TimeoutStartSec=0
and NO Notify=healthy (quadlet.rs render() + contains_stale_health_gate()).
The test now asserts the current invariant: no backend unit gates start on
health.

Verified on the .228 canary node (ARCHIPELAGO_USE_QUADLET_BACKENDS=1):
use-quadlet-backends-install 6/6, backend-survives-archipelago-restart 3/3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 16:09:05 -04:00
archipelago
01f416ae5d test(lifecycle): regression gate for FM3 cgroup-cascade SIGKILL
Sister suite to companion-survives-archipelago-restart.bats. That one
tests the same property for UI companions, which already ship via
Quadlet (commit 6e716f68) and so already pass.

This new suite tests the property for backend containers (bitcoin-knots
/ bitcoin-core / lnd / electrumx). Until v1.7.52 Phase 3 ships these
under Quadlet too, the suite is EXPECTED TO FAIL on fleet boxes — it's
the executable definition of "FM3 fixed".

Observed live on .198 on 2026-05-01: `sudo systemctl stop archipelago`
killed every container in archipelago.service's cgroup. The dedicated
"backends survive archipelago restart" test catches exactly that, and
also verifies the SAME container instance survives (compares pre/post
.Id), so an orchestrator that recreates a fresh container after the
SIGKILL doesn't read as pass.

Three @test cases:
* destructive gate (skip-marker for the suite)
* baseline: at least one backend installed + running
* backends survive: same .Id pre + post archipelago restart

Don't gate releases on this passing until Phase 3 lands; before then
treat it as a "expected to fail / shows progress" indicator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 16:17:27 -04:00