Per direction: the gate is now 5x green ON .228 only (run on the node, not via RPC). Fleet/multinode verification (.198 + others) moved to a new docs/multinode-testing-plan.md with the bootstrap recipe, per-node preconditions (synced archival bitcoin, no stale nginx proxy targets, no orphan quadlet units), node roster, and cross-node suites. Updated CLAUDE.md, master-plan SS5/SS6/SS8b/WS-E, and TESTING.md release gates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Multinode / Fleet Testing Plan (separate from the single-node gate)
Scope split (2026-06-22): the production test gate (`docs/PRODUCTION-MASTER-PLAN.md` §5, `tests/lifecycle/TESTING.md`) is now a single-node criterion on .228. Verifying the same lifecycle matrix across the rest of the fleet (.198 and the other testers) lives HERE and is run after the .228 single-node gate is green. This is intentionally NOT a blocker on the .228 gate.
Why split it out
The lifecycle gate must be run ON the node under test — its bitcoin/companion/orphan/endpoint
checks use local podman/systemctl/bitcoin-cli/curl, not RPC to a remote host. Running it from
one host against another silently tests the runner, not the target. So "multinode" isn't "point the harness at N
hosts" — it's "run the on-node gate on each host," plus the genuinely cross-node concerns (federation,
mesh, transport, sync) that a single node can't exercise.
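One way to make the "run it on the node" rule hard to violate is a guard in a wrapper script. The sketch below is an assumption, not part of the harness: `is_local_target` and `require_local_target` are hypothetical helpers, and they only inspect a host string such as the `ARCHY_HOST` value used in the run recipe.

```shell
# Hypothetical guard (not part of the harness): succeed only when the target
# host string names this machine's loopback interface.
is_local_target() {
  case "${1:-}" in
    127.0.0.1|localhost|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Fail loudly before any local podman/systemctl/bitcoin-cli check can be
# mistaken for a check of a remote host.
require_local_target() {
  if ! is_local_target "${1:-}"; then
    echo "refusing to run: target '${1:-}' is not this node" >&2
    return 1
  fi
}
```

A wrapper could call `require_local_target "$ARCHY_HOST" || exit 1` before starting the run, so a remote target fails immediately instead of producing results for the wrong host.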
How to run the gate on another node
Bats + jq usually aren't installed on ISO nodes. Bootstrap (one-time per node):
# from a host that has them (e.g. .116):
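# --no-recursion: dpkg -L also lists directories such as /usr/bin; without it tar would archive them whole
dpkg -L bats | grep -E '^/usr/(bin|lib|libexec)' | tar czf /tmp/bats.tgz -P --no-recursion -T - "$(command -v jq)"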
tar czf /tmp/tests.tgz -C <repo> tests/lifecycle
scp /tmp/bats.tgz /tmp/tests.tgz <node>:/tmp/
# on the node:
sudo tar xzf /tmp/bats.tgz -P -C / # bats (jq here is dynamically linked — may need libs)
sudo curl -fsSL -o /usr/local/bin/jq \
https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 && sudo chmod +x /usr/local/bin/jq
mkdir -p /tmp/lifecycle-run && tar xzf /tmp/tests.tgz -C /tmp/lifecycle-run
cd /tmp/lifecycle-run/tests/lifecycle
ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=https ARCHY_PASSWORD=<node pw> \
ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 nohup ./run-20x.sh > /tmp/gate.log 2>&1 &
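Before committing to a long background run, it may be worth confirming that the bootstrapped tools actually resolve on the node. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the tool list in the usage note is taken from the local checks named earlier in this plan.

```shell
# Hypothetical helper: print every named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_tools() {
  _rc=0
  for _tool in "$@"; do
    if ! command -v "$_tool" >/dev/null 2>&1; then
      printf '%s\n' "$_tool"
      _rc=1
    fi
  done
  return "$_rc"
}
```

For example, `missing_tools bats jq podman systemctl bitcoin-cli curl || echo "fix the above before running the gate"` lists anything the bootstrap left out.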
Per-node preconditions (learned on .228)
- Bitcoin must be fully synced + archival (`initialblockdownload:false`, `pruned:false`). Test 83 reads the real `getblockchaininfo`, not the UI's headers-height. A node mid-IBD will cascade-fail electrumx/lnd/btcpay/mempool even though the apps run.
- Backends should be proper installs (in `manifest_ids`), not adopted plain-podman left over from ad-hoc `package.start`/cascade churn; otherwise companion self-heal and quadlet checks skew.
- No stale per-app nginx proxy targets, e.g. `/app/lnd/` must point at the lnd-ui port (`18083`), not a stale `8081`. Repo code is correct; old node configs may be stale, so re-check + regenerate.
- No orphan quadlet units (e.g. a `home-assistant.container` whose ContainerName ≠ the real `homeassistant` container); these wedge `systemctl --user` "activating" and fail the quadlet checks.
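The bitcoin precondition can be checked mechanically before a run. The sketch below assumes `jq` is installed (see the bootstrap step) and that `bitcoin-cli getblockchaininfo` can be invoked on the node; how bitcoin-cli is reached there (directly or inside a container) is not specified in this plan. `bitcoin_ready` is a hypothetical name.

```shell
# Reads `getblockchaininfo` JSON on stdin. Succeeds only when the node is out
# of initial block download AND not pruned. Missing fields compare as null,
# so malformed or partial input fails closed.
bitcoin_ready() {
  jq -e '.initialblockdownload == false and .pruned == false' >/dev/null
}
```

Usage might look like `bitcoin-cli getblockchaininfo | bitcoin_ready || echo "bitcoin not synced/archival; gate would cascade-fail"`.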
Node roster (carry-over)
| Node | Role | Notes |
|---|---|---|
| .228 | single-node gate (primary) | 14-app resilience node; bitcoin synced archival; gate GREEN. |
| .198 | fleet verify | was weak/loaded (load ~3–5) + bitcoin mid-IBD at split time → must finish syncing first; sshd wedges under concurrent SSH (use ONE session; gate uses HTTPS RPC so fine). |
| .5 / .120 | x250 testers (Tailscale) | flaky cellular; SSH via tailscale nc ProxyCommand. |
| .116 | dev/validation | local repo; its own bitcoin may be mid-IBD — do NOT treat as a gate target unless synced. |
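The two SSH constraints in the roster can be captured as small wrappers. This is a sketch under assumptions: the function names and the `SSH_BIN` override are hypothetical, and connection multiplexing only helps on .198 if the wedge is caused by concurrent connections rather than concurrent sessions, which this plan does not establish. The OpenSSH options (`ProxyCommand`, `ControlMaster`, `ControlPath`, `ControlPersist`) and `tailscale nc` are real.

```shell
# Reuse one multiplexed SSH connection per node, so repeated calls do not
# open additional concurrent connections.
ssh_node() {
  _node="$1"; shift
  ${SSH_BIN:-ssh} \
    -o ControlMaster=auto \
    -o ControlPath="$HOME/.ssh/cm-%C" \
    -o ControlPersist=10m \
    "$_node" "$@"
}

# Tailscale-only testers: tunnel the connection through `tailscale nc`.
ssh_tailscale_node() {
  _node="$1"; shift
  ${SSH_BIN:-ssh} -o ProxyCommand='tailscale nc %h %p' "$_node" "$@"
}
```

Setting `SSH_BIN=echo` prints the command line instead of connecting, which is a cheap way to inspect the options before touching a fragile node.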
Cross-node concerns (only a multinode setup can test)
- Federation sync (Tor/FIPS transports), DID/contact federation, peer file fetch.
- Mesh (Meshtastic/MeshCore) + mesh-AI gating.
- Dual-ecash federation validation + networking-sats routing.
- DHT / iroh swarm distribution (origin-always-wins) once that dep lands.
Sequence
- Get the .228 single-node gate green 5× (master plan §5/§6) — DONE/in progress.
- THEN: bring each fleet node to the preconditions above; run the on-node gate 5× per node.
- THEN: the cross-node suites (federation/mesh/transport), tracked here.
This plan does not gate the v1.7.x single-node criterion; it is the next layer.