Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.
What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
to 3 times with 500ms and 1500ms backoffs between attempts. Only
transient transport errors (timeout, connect refused, send/recv IO)
trigger retry. A well-formed bitcoind error response is surfaced
immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
of reqwest's own timeout, so DNS starvation or TLS wedging can't
steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
so a dead bitcoind is fast-failed inside the first attempt instead
of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
down timeouts to ~100ms per attempt. Production defaults live in
RetryConfig::production().
Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
same 10s-timeout pattern. Not migrated to the new wrapper in this
release; scheduled for v1.7.43 alongside the render_bitcoin_conf
work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
different failure profiles - audit first, migrate only the ones
that actually exhibit the bug.
Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
- happy_path_first_attempt: no retry when first attempt succeeds
- retries_on_timeout_then_succeeds: first attempt times out, second
succeeds, returns OK (uses a short-timeout RetryConfig so the test
runs in <1s instead of 15s)
- retries_exhausted_on_persistent_connect_refused: all attempts fail
against a closed port, error bubbles up, elapsed time confirms
backoffs actually ran
- does_not_retry_on_rpc_level_error: bitcoind-returned error body is
surfaced immediately, no retry
- does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
body) is NOT retried - guards against the tempting "retry all
non-2xx" mistake that would mask real bitcoind misconfig
- retry_budget_invariants: asserts total wall-time ceiling stays
under 60s so a bumped constant can't silently hang a UI call
forever
Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
28 lines
2.0 KiB
JSON
28 lines
2.0 KiB
JSON
{
|
|
"version": "1.7.42-alpha",
|
|
"release_date": "2026-04-22",
|
|
"changelog": [
|
|
"Bitcoin dashboard no longer flickers errors during initial chain sync. When bitcoind is busy validating a fresh block, its RPC thread can block for five to ten seconds \u2014 long enough that the old 10-second client timeout was rejecting roughly 30% of UI calls even on a perfectly healthy node. The RPC client now retries transient timeouts transparently (3 attempts, 500ms + 1500ms backoffs between them) and only surfaces errors that bitcoind itself reported. Calls that used to flash red on the dashboard during sync now just take a second or two longer and return correct data.",
|
|
"Connection-refused against bitcoind is still fast-failed \u2014 a genuinely-dead daemon is reported in under a second, so real outages are still surfaced immediately. The retry wrapper only engages on timeouts and transport-level errors, never on well-formed bitcoind error responses, so real RPC bugs are not masked.",
|
|
"Stress-tested against a syncing pruned node on live hardware: 20 out of 20 getblockchaininfo calls succeed under load (previously ~60% success), with worst-case latency under 50 seconds during peak IBD catch-up and sub-100ms once blocks are cached."
|
|
],
|
|
"components": [
|
|
{
|
|
"name": "archipelago",
|
|
"current_version": "1.7.41-alpha",
|
|
"new_version": "1.7.42-alpha",
|
|
"download_url": "https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.42-alpha/archipelago",
|
|
"sha256": "5e0b1006546348c888cbbbd93a1002d7cc7006a0018195762f00d84ed6436c9e",
|
|
"size_bytes": 41222000
|
|
},
|
|
{
|
|
"name": "archipelago-frontend-1.7.42-alpha.tar.gz",
|
|
"current_version": "1.7.41-alpha",
|
|
"new_version": "1.7.42-alpha",
|
|
"download_url": "https://git.tx1138.com/lfg2025/archy/raw/branch/main/releases/v1.7.42-alpha/archipelago-frontend-1.7.42-alpha.tar.gz",
|
|
"sha256": "8eca3ada91f64b6d34b37950a8b9eb570981b95dd9ab54cd88909deb6acf31b6",
|
|
"size_bytes": 162088504
|
|
}
|
|
]
|
|
}
|