archy/tests/lifecycle/bats/bitcoin-receive.bats
archipelago 0ed892a412 fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile
Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha
(all nodes were already on the latest build — these were live bugs, not stale
builds), plus the missing ElectrumX tile, and adds automated coverage so each
can't regress silently.

Receive address (".116 receive fails", ".228 false 'wallet is locked'"):
- LND publishes its REST API on a host port that can drift from the manifest
  (a container created when the mapping was 8080 kept publishing 8080 after the
  manifest moved to 18080). The in-process client connects to the manifest port,
  gets connection-refused, and wallet init fails forever while the container
  looks "Up". Add published-port drift detection to the reconciler
  (container_ports_drifted / host_port_bindings_drifted) that recreates a
  drifted backend even for restart-sensitive apps — a drifted container is
  already broken, so leaving it "untouched" only perpetuates the failure.
- Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED,
  WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they
  survive the RPC error sanitizer instead of collapsing to the generic
  "Operation failed". The UI maps the code instead of guessing wallet state from
  substrings — so an unreachable REST endpoint is no longer mislabelled "locked".
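
For illustration, the drift check described above can be sketched in shell. This is not the reconciler's Rust code (container_ports_drifted / host_port_bindings_drifted). It is a hypothetical helper, `host_port_drifted`, that reads the output of `podman port <container> <port>/tcp` on stdin and reports drift when no binding uses the expected host port. The one-binding-per-line `HOST_IP:HOST_PORT` format it parses is an assumption about podman's output.

```shell
# Hypothetical helper, not the reconciler's Rust implementation.
# Reads published bindings for ONE container port on stdin, one per line, as
# HOST_IP:HOST_PORT (assumed output of e.g. `podman port lnd 8080/tcp`).
# Exit status 0 means "drifted": no binding uses the expected host port,
# which includes the case where nothing is published at all.
host_port_drifted() {
  local expected=$1 line
  # `|| [[ -n $line ]]` keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    [[ -z "$line" ]] && continue
    # Everything after the last ':' is the host port; this also handles
    # IPv6 bindings such as [::]:18080.
    if [[ "${line##*:}" == "$expected" ]]; then
      return 1  # expected binding found: not drifted
    fi
  done
  return 0
}
```

Usage, assuming the manifest port is 18080: `podman port lnd 8080/tcp | host_port_drifted 18080 && echo "lnd REST published on the wrong host port"`.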

Bitcoin install (".198 bitcoin gone / reinstall just stops"):
- bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only
  generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so
  secret resolution hard-failed and the failure cascaded through the whole
  Bitcoin stack. Generate it idempotently before bitcoin starts
  (ensure_app_secrets, reusing ensure_txrelay_credentials), and name the
  missing secret in the error so a genuine gap is actionable instead of a bare
  "IO error".

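The idempotent-generation pattern and the named-secret error can be sketched in shell. This assumes a hypothetical file-per-secret directory and hypothetical helpers `ensure_secret` and `resolve_secret`; the real ensure_app_secrets and ensure_txrelay_credentials are not shown in this commit, and a real rpcauth secret has the form `user:salt$hmac`, whereas this sketch only writes 32 random bytes as hex to show the shape of the logic.

```shell
# Hypothetical sketch: secrets stored one file per name under a directory.

# Create the secret only if it is missing or empty, so repeated calls
# (e.g. on every bitcoin start) never rotate an existing credential.
ensure_secret() {
  local dir=$1 name=$2
  local path="$dir/$name"
  [[ -s "$path" ]] && return 0
  mkdir -p "$dir"
  # Placeholder value: 32 random bytes as hex, written with mode 600.
  # A real rpcauth secret would be user:salt$hmac instead.
  (umask 077 && od -An -tx1 -N32 /dev/urandom | tr -d ' \n' > "$path")
}

# Print a secret, or fail with an error that names the missing secret
# rather than a bare I/O error.
resolve_secret() {
  local dir=$1 name=$2
  if [[ ! -s "$dir/$name" ]]; then
    echo "missing required secret: $name" >&2
    return 1
  fi
  cat "$dir/$name"
}
```

Calling `ensure_secret` unconditionally before start is safe precisely because it is a no-op when the secret already exists.
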
ElectrumX app tile missing on every node with it installed:
- The catalog generator dropped electrumx because the manifest had no
  interfaces.main block, so the tile had no launch URL and was hidden. Declare
  the companion UI port (50002) in the manifest, regenerate the catalog, and let
  an app with a known launch URL stay launchable while its backend is still
  "starting" (ElectrumX can take 10+ minutes to index).

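The tile rule above amounts to a small predicate. The UI code is not part of this file, so `app_launchable` and the state names `running` and `starting` are illustrative assumptions: an app is launchable when it has a launch URL and its backend is either running or still starting.

```shell
# Hypothetical predicate mirroring the rule described above; not the UI's code.
# Usage: app_launchable STATE LAUNCH_URL   (exit 0 = show a launchable tile)
app_launchable() {
  local state=$1 launch_url=$2
  # No launch URL (e.g. no interfaces.main in the manifest): nothing to open.
  [[ -n "$launch_url" ]] || return 1
  case "$state" in
    running | starting) return 0 ;;  # "starting" covers a long index build
    *) return 1 ;;
  esac
}
```
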
Test harness:
- New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness
  (validated live; port-drift catches the real .116 drift).
- Rust unit tests for drift detection, the receive reason-code classifier, and
  the named-missing-secret error; vitest for the UI code mapping.
- create-release.sh now runs tests/release/run.sh and aborts the release on
  failure — previously it ran no tests at all.
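
A release gate of this kind can be very small. The sketch below is hypothetical: `release_gate` is an illustrative name, and it takes the test runner as an argument (tests/release/run.sh in this commit) so it can be exercised without the real suite. The actual create-release.sh is not shown here.

```shell
# Hypothetical gate: run the release test suite and stop before any release
# step if it fails. Usage: release_gate tests/release/run.sh || exit 1
release_gate() {
  local runner=$1
  shift
  if ! "$runner" "$@"; then
    echo "release aborted: $runner failed" >&2
    return 1
  fi
  echo "release tests passed"
}
```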

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 03:12:56 -04:00

#!/usr/bin/env bats
# tests/lifecycle/bats/bitcoin-receive.bats
#
# Regression coverage for the Bitcoin "Receive" flow. Receive addresses come
# from LND's hot wallet via the `lnd.newaddress` RPC, so this exercises the
# exact path that broke on the fleet:
# - .116: LND REST published on the wrong host port (8080 vs the manifest's
#   18080) -> connection refused -> receive failed with the generic
#   "Operation failed. Check server logs." message.
# - .228: the same failure family surfaced in the UI as a *false*
#   "wallet is locked".
#
# These tests run on the archy host (they shell into podman / curl localhost).
#
# Tiers: read-only tier only; generating a receive address is non-destructive.

load '../lib/rpc.bash'

setup_file() {
  : "${ARCHY_PASSWORD:?Set ARCHY_PASSWORD env var to the UI password}"
  export ARCHY_FORCE_LOGIN=1
  rpc_login
  unset ARCHY_FORCE_LOGIN
}

teardown_file() {
  rpc_logout_local
}

# Resolve the LND REST host port from the manifest (single source of truth) so
# this test follows the manifest rather than hard-coding 18080. The first
# readable manifest wins; nothing is printed if it has no mapping to 8080.
_lnd_rest_host_port() {
  local mf
  for mf in \
    "${ARCHIPELAGO_APPS_DIR:-/opt/archipelago/apps}/lnd/manifest.yml" \
    "${ARCHIPELAGO_APPS_DIR:-/opt/archipelago/apps}/lnd/manifest.yaml" \
    "$BATS_TEST_DIRNAME/../../../apps/lnd/manifest.yml"; do
    [[ -r "$mf" ]] || continue
    # The REST mapping is the `- host: <N>` whose following `container:` is 8080.
    awk '
      /- host:/     { host=$3 }
      /container:/  { if ($2 == 8080 && host != "") { print host; exit } }
    ' "$mf"
    return 0
  done
}

_lnd_running() {
  rpc_result container-list 2>/dev/null \
    | jq -e '.[] | select(.name == "lnd" and .state == "running")' >/dev/null 2>&1
}

# ────────────────────────────────────────────────────────────────────
# Read-only tier
# ────────────────────────────────────────────────────────────────────

@test "LND REST is reachable on the manifest host port (catches port drift)" {
  _lnd_running || skip "lnd not running"
  local port
  port=$(_lnd_rest_host_port)
  [[ -n "$port" ]] || skip "could not resolve LND REST host port from manifest"
  # A TCP connect is enough: drift (container published on a different host
  # port) shows up as connection-refused here, exactly as on .116. Without -f,
  # curl exits 0 on any HTTP response, including the error LND returns when
  # no macaroon is sent, so only connection-level failures fail this test.
  run curl -sk -o /dev/null --max-time 8 "https://127.0.0.1:${port}/v1/getinfo"
  if [ "$status" -ne 0 ]; then
    echo "LND REST not reachable on host port ${port} (curl exit $status) — likely published-port drift" >&2
    return 1
  fi
}

@test "lnd.newaddress returns a bech32 address when lnd is running" {
  _lnd_running || skip "lnd not running"
  run rpc_call lnd.newaddress
  [ "$status" -eq 0 ]
  local err addr
  err=$(echo "$output" | jq -r '.error.message // .error // empty')
  addr=$(echo "$output" | jq -r '.result.address // empty')
  # The whole point of the fix: a running lnd must hand back a real address.
  if [[ -n "$err" ]]; then
    echo "lnd.newaddress errored on a running node: $err" >&2
    return 1
  fi
  # bc1 is the mainnet bech32/bech32m prefix (testnet uses tb1, regtest bcrt1).
  if [[ "$addr" != bc1* ]]; then
    echo "expected a bech32 (bc1…) address, got: '$addr'" >&2
    return 1
  fi
}

@test "receive errors are specific, never the generic catch-all" {
  # Even when receive legitimately can't produce an address, the message must be
  # actionable (start with 'Bitcoin address' and/or carry a [CODE] token) — the
  # generic 'Operation failed' is what hid the real cause on .116.
  run rpc_call lnd.newaddress
  [ "$status" -eq 0 ]
  local err
  err=$(echo "$output" | jq -r '.error.message // .error // empty')
  # Prefix match, so a small wording change in the catch-all cannot let this
  # test pass vacuously.
  if [[ "$err" == "Operation failed"* ]]; then
    echo "receive returned the generic catch-all instead of a specific reason" >&2
    return 1
  fi
}
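
The last test only rejects the catch-all. A stricter check, closer to what its comment asks for, could classify the message by its [CODE] token, the same way the UI now maps the code instead of guessing from substrings. This is a shell sketch, not the UI's mapping: `receive_error_code` and `receive_error_is_actionable` are hypothetical helpers, the four codes are the ones listed in the commit message, and the sample message wording in the checks is invented for illustration.

```shell
# Hypothetical helper: print the receive reason code carried by an error
# message, or UNKNOWN when none of the known [CODE] tokens is present.
receive_error_code() {
  local msg=$1
  # \[ is a literal '['; a lone ']' is already literal in ERE.
  local re='\[(REST_UNREACHABLE|WALLET_LOCKED|WALLET_UNINITIALIZED|SYNCING)]'
  if [[ "$msg" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    printf 'UNKNOWN\n'
  fi
}

# Actionable = starts with "Bitcoin address" and/or carries a known code.
receive_error_is_actionable() {
  [[ "$1" == "Bitcoin address"* || "$(receive_error_code "$1")" != UNKNOWN ]]
}
```

With a code in hand, an unreachable REST endpoint (REST_UNREACHABLE) is reported as such rather than as WALLET_LOCKED.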