2026-06-22 16:47:34 -04:00
# Multinode / Fleet Testing Plan (separate from the single-node gate)
> **Scope split (2026-06-22):** the production test gate (`docs/PRODUCTION-MASTER-PLAN.md` §5,
> `tests/lifecycle/TESTING.md`) is now a **single-node criterion on .228**. Verifying the same
> lifecycle matrix across the rest of the fleet (.198 and the other testers) lives HERE and is run
> **after** the .228 single-node gate is green. This is intentionally NOT a blocker on the .228 gate.
## Why split it out
The lifecycle gate must be **run ON the node under test**: its bitcoin/companion/orphan/endpoint
checks call local `podman`/`systemctl`/`bitcoin-cli`/`curl` rather than making RPC calls to a remote host.
Running it from one host against another silently tests the *runner* (its local containers, units, and
bitcoind) instead of the target. So "multinode" isn't "point the harness at N hosts"; it's "run the
on-node gate on each host," plus the genuinely cross-node concerns (federation, mesh, transport, sync)
that a single node can't exercise.
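One way to enforce that rule is a small pre-flight guard run before `run-gate.sh`. This is a minimal sketch that assumes the convention from the run instructions (`ARCHY_HOST=127.0.0.1`) and refuses to proceed when `ARCHY_HOST` is not a loopback address. The function names `is_loopback_target` and `require_on_node` are hypothetical and not part of the lifecycle harness.

```sh
# Hypothetical guard (not part of the lifecycle harness): the gate's local
# podman/systemctl/bitcoin-cli/curl checks only describe the target node if
# ARCHY_HOST points back at the machine the gate is running on.

# Succeeds if the argument is a loopback address or name.
is_loopback_target() {
  case $1 in
    127.*|localhost|::1|'[::1]') return 0 ;;
    *) return 1 ;;
  esac
}

# Fails (with a message on stderr) unless ARCHY_HOST is loopback.
require_on_node() {
  if is_loopback_target "${ARCHY_HOST:-}"; then
    return 0
  fi
  echo "refusing to run: ARCHY_HOST=${ARCHY_HOST:-<unset>} is not this node" >&2
  return 1
}
```

Putting `require_on_node || exit 1` at the top of a wrapper script turns the silent mis-test into a hard failure. The guard deliberately does not try to recognise the node's own LAN addresses, so it also refuses a run pointed at the node's external IP. That is stricter than necessary, but it keeps the check simple.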
## How to run the gate on another node
Bats + jq usually aren't installed on ISO nodes. Bootstrap (one-time per node):
```
# from a host that has them (e.g. .116):
# --no-recursion: `dpkg -L` also lists directories such as /usr/bin and /usr/lib; without it
# tar would archive those entire directories instead of just the package's own files.
dpkg -L bats | grep -E '^/usr/(bin|lib|libexec)' | tar czf /tmp/bats.tgz -P --no-recursion -T - "$(command -v jq)"
tar czf /tmp/tests.tgz -C <repo> tests/lifecycle
scp /tmp/bats.tgz /tmp/tests.tgz <node>:/tmp/
# on the node:
sudo tar xzf /tmp/bats.tgz -P -C /   # bats; the bundled jq is dynamically linked and may be missing libs here,
# so install the static jq release binary instead:
sudo curl -fsSL -o /usr/local/bin/jq \
  https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 && sudo chmod +x /usr/local/bin/jq
mkdir -p /tmp/lifecycle-run && tar xzf /tmp/tests.tgz -C /tmp/lifecycle-run
cd /tmp/lifecycle-run/tests/lifecycle
ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=https ARCHY_PASSWORD=<node pw> \
  ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 nohup ./run-gate.sh > /tmp/gate.log 2>&1 &
```
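A copied jq can be present on disk yet fail to load its shared libraries, so checking only that the binaries exist is not enough. This sketch confirms the bootstrap actually worked by executing each tool with `--version`, which both bats-core and jq accept. The helper name `missing_tools` is hypothetical and not part of the repo.

```sh
# Hypothetical bootstrap check: print each named tool that is missing OR is
# present but fails to run (e.g. a dynamically linked jq without its libs).
# Exit status is non-zero if anything was reported.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1 || ! "$tool" --version >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

On the node, `missing_tools bats jq || echo "bootstrap incomplete" >&2` before launching the gate avoids a `nohup` run that dies immediately with only a terse line in `/tmp/gate.log`.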
## Per-node preconditions (learned on .228)
- **Bitcoin must be fully synced + archival** (`initialblockdownload:false`, `pruned:false`).
  Test 83 reads the *real* `getblockchaininfo`, not the UI's headers-height. A node mid-IBD will
  cascade-fail electrumx/lnd/btcpay/mempool even though the apps run.
- **Backends should be proper installs** (in `manifest_ids`), not adopted plain-podman containers left over
  from ad-hoc `package.start`/cascade churn; otherwise the companion self-heal and quadlet checks skew.
- **No stale per-app nginx proxy targets.** E.g. `/app/lnd/` must point at the lnd-ui port (18083),
  not a stale `8081`. Repo code is correct, but old node configs may be stale: re-check and regenerate.
- **No orphan quadlet units** (e.g. a `home-assistant.container` whose `ContainerName` ≠ the real
  `homeassistant` container). These wedge `systemctl --user` in "activating" and fail the quadlet checks.
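These preconditions are easy to miss on a node that has been churned by hand, so a preflight pass before `run-gate.sh` may save a wasted 5× run. The sketch below covers three of them. The `manifest_ids` check is left out because its storage format isn't described here. All function names are hypothetical. Each helper reads its input on stdin, so it can be fed from however the real service is reached on that node (directly, or via `podman exec`; which applies is an assumption per node). The bitcoin check is deliberately jq-free so it also works before the bootstrap step.

```sh
# Hypothetical per-node preflight helpers (not part of the lifecycle harness).

# 1. Bitcoin synced + archival. Input: `getblockchaininfo` JSON on stdin.
btc_ready() {
  compact=$(tr -d '[:space:]')   # drop whitespace so key/value spacing doesn't matter
  case $compact in *'"initialblockdownload":false'*) ;; *) return 1 ;; esac
  case $compact in *'"pruned":false'*) return 0 ;; *) return 1 ;; esac
}

# 2. nginx proxy target. Input: nginx config text on stdin (e.g. `nginx -T`).
#    Prints the port of the first proxy_pass inside `location <path> {`.
#    Sketch only: assumes `location <path> {` spacing and no nested blocks.
proxy_port_for() {
  awk -v loc="$1" '
    $1 == "location" && $2 == loc { inloc = 1; next }
    inloc && $1 == "proxy_pass" && match($2, /:[0-9]+/) {
      print substr($2, RSTART + 1, RLENGTH - 1); exit
    }
    inloc && $1 == "}" { inloc = 0 }
  '
}

# 3. Orphan quadlets. Input: existing container names on stdin, one per line.
#    Argument: quadlet directory. Prints each .container file whose
#    container name matches no existing container.
orphan_quadlets() {
  names=$(cat)
  for f in "$1"/*.container; do
    [ -e "$f" ] || continue   # glob matched nothing
    want=$(sed -n 's/^[[:space:]]*ContainerName[[:space:]]*=[[:space:]]*//p' "$f" | tail -n 1)
    # Quadlet's default container name when ContainerName= is absent.
    [ -n "$want" ] || want="systemd-$(basename "$f" .container)"
    printf '%s\n' "$names" | grep -Fxq -- "$want" || printf '%s\n' "$f"
  done
}
```

Typical feeds would be `bitcoin-cli getblockchaininfo | btc_ready`, `sudo nginx -T 2>/dev/null | proxy_port_for /app/lnd/` (expect `18083`), and `podman ps -a --format '{{.Names}}' | orphan_quadlets ~/.config/containers/systemd` for rootless `systemctl --user` quadlets. An empty result from the last command is the pass condition.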
## Node roster (carry-over)
| Node | Role | Notes |
|------|------|-------|
| .228 | **single-node gate** (primary) | 14-app resilience node; bitcoin synced archival; gate GREEN. |
| .198 | fleet verify | was weak/loaded (load ~3–5) + **bitcoin mid-IBD** at split time → must finish syncing first; sshd wedges under concurrent SSH (use ONE session; gate uses HTTPS RPC so fine). |
| .5 / .120 | x250 testers (Tailscale) | flaky cellular; SSH via `tailscale nc` ProxyCommand. |
| .116 | dev/validation | local repo; its own bitcoin may be mid-IBD; do NOT treat as a gate target unless synced. |
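The `tailscale nc` ProxyCommand for the x250 testers can live in `~/.ssh/config`. The helper below is a hypothetical convenience that prints such a stanza. Its `ControlMaster`/`ControlPersist` lines are an added assumption aimed at .198's "use ONE session" note: every ssh/scp to that host shares one authenticated connection. This has not been verified against .198's sshd.

```sh
# Hypothetical helper: print an ssh_config stanza for one fleet node.
#   $1 = Host alias, $2 = address, $3 = "tailscale" to tunnel via `tailscale nc`.
fleet_ssh_stanza() {
  printf 'Host %s\n  HostName %s\n' "$1" "$2"
  # Multiplex every ssh/scp to this host over one authenticated connection
  # (assumption: this eases .198's "use ONE session" constraint).
  printf '  ControlMaster auto\n  ControlPath ~/.ssh/cm-%%r@%%h:%%p\n  ControlPersist 10m\n'
  if [ "${3:-}" = tailscale ]; then
    # ssh expands %h/%p to the target HostName and port when running the command.
    printf '  ProxyCommand tailscale nc %%h %%p\n'
  fi
}
```

For example, `fleet_ssh_stanza x250-a <tailscale address> tailscale >> ~/.ssh/config` followed by `ssh x250-a`. The alias is illustrative.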
## Cross-node concerns (only a multinode setup can test)
- Federation sync (Tor/FIPS transports), DID/contact federation, peer file fetch.
- Mesh (Meshtastic/MeshCore) + mesh-AI gating.
- Dual-ecash federation validation + networking-sats routing.
- DHT / iroh swarm distribution (origin-always-wins) once that dep lands.
## Sequence
1. Get the **.228 single-node gate green 5×** (master plan §5/§6). Status: DONE/in progress.
2. THEN: bring each fleet node to the preconditions above; run the on-node gate 5× per node.
3. THEN: the cross-node suites (federation/mesh/transport), tracked here.

This plan does not gate the v1.7.x single-node criterion; it is the next layer.