The 5x destructive gate on heavy nodes false-failed on transient windows
during stack recovery, not real regressions:
- immich.bats: lan_address port-publish probe 30s -> 90s. The postgres->redis
->server (DB migrations on boot) stack can take >30s to republish :2283 after
a churn-induced recreate; destructive-tier immich tests already allow 180-240s.
- mempool.bats: orphan-container check now polls to steady state (<=30s) instead
of a single-shot count, which caught a recreated member briefly visible
alongside its replacement mid-reconcile.
- run-gate.sh: settle cap 180s -> 300s and also gate on immich's :2283 when
installed, so the next iteration's read-only probe doesn't race a still-
recovering stack. Settle returns the instant every probe is green.
A genuinely unexposed/orphaned/unhealthy app still fails these checks; they only
absorb the transient recreate window under sustained churn.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub
20× references across CLAUDE.md, the master plan, TESTING.md, app-registry
status, the orchestrator/config doc-comments, and the bats suites. Also add
a minimal fail() helper to mempool.bats so guard failures report cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>