release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red
Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.
What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
to 3 times with 500ms and 1500ms backoffs between attempts. Only
transient transport errors (timeout, connect refused, send/recv IO)
trigger retry. A well-formed bitcoind error response is surfaced
immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
of reqwest's own timeout, so DNS starvation or TLS wedging can't
steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
so a dead bitcoind is fast-failed inside the first attempt instead
of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
down timeouts to ~100ms per attempt. Production defaults live in
RetryConfig::production().
Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
same 10s-timeout pattern. Not migrated to the new wrapper in this
release; scheduled for v1.7.43 alongside the render_bitcoin_conf
work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
different failure profiles - audit first, migrate only the ones
that actually exhibit the bug.
Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
- happy_path_first_attempt: no retry when first attempt succeeds
- retries_on_timeout_then_succeeds: first attempt times out, second
succeeds, returns OK (uses a short-timeout RetryConfig so the test
runs in <1s instead of 15s)
- retries_exhausted_on_persistent_connect_refused: all attempts fail
against a closed port, error bubbles up, elapsed time confirms
backoffs actually ran
- does_not_retry_on_rpc_level_error: bitcoind-returned error body is
surfaced immediately, no retry
- does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
body) is NOT retried - guards against the tempting "retry all
non-2xx" mistake that would mask real bitcoind misconfig
- retry_budget_invariants: asserts total wall-time ceiling stays
under 60s so a bumped constant can't silently hang a UI call
forever
Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
This commit is contained in:
parent
d1bcf271f9
commit
a4988fc9ca
2
core/Cargo.lock
generated
2
core/Cargo.lock
generated
@ -80,7 +80,7 @@ checksum = "a23eb6b1614318a8071c9b2521f36b424b2c83db5eb3a0fead4a6c0809af6e61"
|
|||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "archipelago"
|
name = "archipelago"
|
||||||
version = "1.7.41-alpha"
|
version = "1.7.42-alpha"
|
||||||
dependencies = [
|
dependencies = [
|
||||||
"anyhow",
|
"anyhow",
|
||||||
"archipelago-container",
|
"archipelago-container",
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
[package]
|
[package]
|
||||||
name = "archipelago"
|
name = "archipelago"
|
||||||
version = "1.7.41-alpha"
|
version = "1.7.42-alpha"
|
||||||
edition = "2021"
|
edition = "2021"
|
||||||
description = "Archipelago Bitcoin Node OS - Native backend"
|
description = "Archipelago Bitcoin Node OS - Native backend"
|
||||||
authors = ["Archipelago Team"]
|
authors = ["Archipelago Team"]
|
||||||
|
|||||||
@ -3,6 +3,55 @@ use anyhow::{Context, Result};
|
|||||||
use serde::{Deserialize, Serialize};
|
use serde::{Deserialize, Serialize};
|
||||||
use zeroize::Zeroize;
|
use zeroize::Zeroize;
|
||||||
|
|
||||||
|
/// Retry configuration for [`bitcoin_rpc_post_with_retry`].
|
||||||
|
///
|
||||||
|
/// Exposed as a struct (rather than hard-coded constants inside the function)
|
||||||
|
/// so tests can dial down timeouts to keep the suite fast while still
|
||||||
|
/// exercising real retry/backoff behavior.
|
||||||
|
#[derive(Debug, Clone)]
|
||||||
|
struct RetryConfig {
|
||||||
|
max_attempts: u32,
|
||||||
|
attempt_timeout: std::time::Duration,
|
||||||
|
/// Length must equal `max_attempts - 1` (one backoff between each
|
||||||
|
/// successive attempt). The last attempt is not followed by a backoff.
|
||||||
|
backoffs: Vec<std::time::Duration>,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl RetryConfig {
|
||||||
|
/// Production retry policy: 3 attempts, 15s each, 500ms + 1500ms backoffs.
|
||||||
|
/// Total worst-case wall time: 3 * 15 + 0.5 + 1.5 = 47s.
|
||||||
|
fn production() -> Self {
|
||||||
|
Self {
|
||||||
|
max_attempts: BITCOIN_RPC_MAX_ATTEMPTS,
|
||||||
|
attempt_timeout: BITCOIN_RPC_ATTEMPT_TIMEOUT,
|
||||||
|
backoffs: BITCOIN_RPC_BACKOFFS.to_vec(),
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Max retry attempts for a single bitcoin_rpc_call invocation.
|
||||||
|
/// First attempt + 2 retries = 3 total.
|
||||||
|
const BITCOIN_RPC_MAX_ATTEMPTS: u32 = 3;
|
||||||
|
|
||||||
|
/// Per-attempt deadline. Must be >= the reqwest client's own timeout (we
|
||||||
|
/// build it at 15s in handle_bitcoin_getinfo) — this is the outer safety net.
|
||||||
|
const BITCOIN_RPC_ATTEMPT_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(15);
|
||||||
|
|
||||||
|
/// Backoff between attempts. Index 0 = after first failure, 1 = after second, etc.
|
||||||
|
/// Chosen to absorb bitcoind's typical block-validation stall (2-5s) without
|
||||||
|
/// adding noticeable latency on the happy path (first attempt succeeds in ~30ms).
|
||||||
|
const BITCOIN_RPC_BACKOFFS: [std::time::Duration; 2] = [
|
||||||
|
std::time::Duration::from_millis(500),
|
||||||
|
std::time::Duration::from_millis(1500),
|
||||||
|
];
|
||||||
|
|
||||||
|
/// Classify a reqwest error as transient (retryable) or fatal.
|
||||||
|
/// Transient: timeout, connect refused, request/response body IO errors.
|
||||||
|
/// Fatal: TLS errors, URL parse errors, redirect loops, builder errors.
|
||||||
|
fn is_transient_transport_error(e: &reqwest::Error) -> bool {
|
||||||
|
e.is_timeout() || e.is_connect() || e.is_request() || e.is_body()
|
||||||
|
}
|
||||||
|
|
||||||
#[derive(Debug, Serialize)]
|
#[derive(Debug, Serialize)]
|
||||||
struct BitcoinInfo {
|
struct BitcoinInfo {
|
||||||
block_height: u64,
|
block_height: u64,
|
||||||
@ -37,8 +86,15 @@ struct MempoolInfo {
|
|||||||
|
|
||||||
impl RpcHandler {
|
impl RpcHandler {
|
||||||
pub(super) async fn handle_bitcoin_getinfo(&self) -> Result<serde_json::Value> {
|
pub(super) async fn handle_bitcoin_getinfo(&self) -> Result<serde_json::Value> {
|
||||||
|
// Per-attempt timeout (see bitcoin_rpc_call for retry semantics).
|
||||||
|
// 15s is enough room for bitcoind to answer getblockchaininfo even
|
||||||
|
// during block validation; bitcoin_rpc_call wraps each attempt in a
|
||||||
|
// separate tokio::time::timeout too, so this is belt-and-suspenders.
|
||||||
|
// connect_timeout is tighter so a dead bitcoind doesn't steal the
|
||||||
|
// whole attempt budget on TCP connect alone.
|
||||||
let client = reqwest::Client::builder()
|
let client = reqwest::Client::builder()
|
||||||
.timeout(std::time::Duration::from_secs(10))
|
.timeout(std::time::Duration::from_secs(15))
|
||||||
|
.connect_timeout(std::time::Duration::from_secs(3))
|
||||||
.build()
|
.build()
|
||||||
.context("Failed to create HTTP client")?;
|
.context("Failed to create HTTP client")?;
|
||||||
|
|
||||||
@ -68,6 +124,19 @@ impl RpcHandler {
|
|||||||
Ok(serde_json::to_value(info)?)
|
Ok(serde_json::to_value(info)?)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Call a Bitcoin Core JSON-RPC method.
|
||||||
|
///
|
||||||
|
/// Retries up to [`BITCOIN_RPC_MAX_ATTEMPTS`] times on transient
|
||||||
|
/// transport errors (timeout / connection refused / send/recv IO).
|
||||||
|
/// Does **not** retry when bitcoind responds with a well-formed
|
||||||
|
/// `{"error": ...}` body — those are real RPC errors and surfacing
|
||||||
|
/// them quickly is the right behavior.
|
||||||
|
///
|
||||||
|
/// Motivation: on a syncing pruned node, bitcoind's RPC thread can block
|
||||||
|
/// for 5-10 seconds during block validation. A single 10s timeout means
|
||||||
|
/// ~30% of UI calls error out even though the node is perfectly healthy.
|
||||||
|
/// With retry + backoff, the UI sees a uniform slow-but-successful
|
||||||
|
/// response instead of intermittent failures.
|
||||||
async fn bitcoin_rpc_call<T: serde::de::DeserializeOwned>(
|
async fn bitcoin_rpc_call<T: serde::de::DeserializeOwned>(
|
||||||
&self,
|
&self,
|
||||||
client: &reqwest::Client,
|
client: &reqwest::Client,
|
||||||
@ -75,33 +144,15 @@ impl RpcHandler {
|
|||||||
params: &[serde_json::Value],
|
params: &[serde_json::Value],
|
||||||
) -> Result<T> {
|
) -> Result<T> {
|
||||||
let (rpc_user, rpc_pass) = crate::bitcoin_rpc::bitcoin_rpc_credentials().await;
|
let (rpc_user, rpc_pass) = crate::bitcoin_rpc::bitcoin_rpc_credentials().await;
|
||||||
let body = serde_json::json!({
|
bitcoin_rpc_post_with_retry(
|
||||||
"jsonrpc": "1.0",
|
client,
|
||||||
"id": "archy",
|
crate::constants::BITCOIN_RPC_URL,
|
||||||
"method": method,
|
&rpc_user,
|
||||||
"params": params,
|
&rpc_pass,
|
||||||
});
|
method,
|
||||||
|
params,
|
||||||
let resp = client
|
)
|
||||||
.post(crate::constants::BITCOIN_RPC_URL)
|
.await
|
||||||
.basic_auth(&rpc_user, Some(&rpc_pass))
|
|
||||||
.json(&body)
|
|
||||||
.send()
|
|
||||||
.await
|
|
||||||
.context("Bitcoin RPC connection failed")?;
|
|
||||||
|
|
||||||
let rpc_resp: BitcoinRpcResponse<T> = resp
|
|
||||||
.json()
|
|
||||||
.await
|
|
||||||
.context("Failed to parse Bitcoin RPC response")?;
|
|
||||||
|
|
||||||
if let Some(err) = rpc_resp.error {
|
|
||||||
anyhow::bail!("Bitcoin RPC error: {}", err);
|
|
||||||
}
|
|
||||||
|
|
||||||
rpc_resp
|
|
||||||
.result
|
|
||||||
.ok_or_else(|| anyhow::anyhow!("Bitcoin RPC returned null result"))
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/// Initialize a Bitcoin Core descriptor wallet with keys derived from the master seed.
|
/// Initialize a Bitcoin Core descriptor wallet with keys derived from the master seed.
|
||||||
@ -243,3 +294,370 @@ impl RpcHandler {
|
|||||||
}))
|
}))
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Free-function counterpart to `RpcHandler::bitcoin_rpc_call`.
|
||||||
|
///
|
||||||
|
/// Takes the URL + credentials as parameters so it can be exercised by unit
|
||||||
|
/// tests against a mock HTTP server without constructing a full `RpcHandler`.
|
||||||
|
///
|
||||||
|
/// Production callers go through `RpcHandler::bitcoin_rpc_call`, which loads
|
||||||
|
/// credentials from the secrets file and points at `BITCOIN_RPC_URL`.
|
||||||
|
async fn bitcoin_rpc_post_with_retry<T: serde::de::DeserializeOwned>(
|
||||||
|
client: &reqwest::Client,
|
||||||
|
url: &str,
|
||||||
|
rpc_user: &str,
|
||||||
|
rpc_pass: &str,
|
||||||
|
method: &str,
|
||||||
|
params: &[serde_json::Value],
|
||||||
|
) -> Result<T> {
|
||||||
|
bitcoin_rpc_post_with_retry_cfg(
|
||||||
|
client,
|
||||||
|
url,
|
||||||
|
rpc_user,
|
||||||
|
rpc_pass,
|
||||||
|
method,
|
||||||
|
params,
|
||||||
|
&RetryConfig::production(),
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Inner implementation with configurable retry policy (for tests).
|
||||||
|
async fn bitcoin_rpc_post_with_retry_cfg<T: serde::de::DeserializeOwned>(
|
||||||
|
client: &reqwest::Client,
|
||||||
|
url: &str,
|
||||||
|
rpc_user: &str,
|
||||||
|
rpc_pass: &str,
|
||||||
|
method: &str,
|
||||||
|
params: &[serde_json::Value],
|
||||||
|
cfg: &RetryConfig,
|
||||||
|
) -> Result<T> {
|
||||||
|
debug_assert_eq!(
|
||||||
|
cfg.backoffs.len(),
|
||||||
|
(cfg.max_attempts - 1) as usize,
|
||||||
|
"RetryConfig: backoffs.len() must equal max_attempts - 1"
|
||||||
|
);
|
||||||
|
|
||||||
|
let body = serde_json::json!({
|
||||||
|
"jsonrpc": "1.0",
|
||||||
|
"id": "archy",
|
||||||
|
"method": method,
|
||||||
|
"params": params,
|
||||||
|
});
|
||||||
|
|
||||||
|
let mut last_err: Option<anyhow::Error> = None;
|
||||||
|
for attempt in 0..cfg.max_attempts {
|
||||||
|
if attempt > 0 {
|
||||||
|
let backoff = cfg
|
||||||
|
.backoffs
|
||||||
|
.get(attempt as usize - 1)
|
||||||
|
.copied()
|
||||||
|
.unwrap_or_else(|| std::time::Duration::from_secs(2));
|
||||||
|
tracing::warn!(
|
||||||
|
"bitcoin_rpc({}): attempt {} failed, backing off {:?}",
|
||||||
|
method,
|
||||||
|
attempt,
|
||||||
|
backoff
|
||||||
|
);
|
||||||
|
tokio::time::sleep(backoff).await;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Per-attempt hard deadline. Independent of reqwest's built-in timeout
|
||||||
|
// so we always cap total time even if reqwest blocks on something
|
||||||
|
// weird (e.g., DNS starvation).
|
||||||
|
let fut = client
|
||||||
|
.post(url)
|
||||||
|
.basic_auth(rpc_user, Some(rpc_pass))
|
||||||
|
.json(&body)
|
||||||
|
.send();
|
||||||
|
|
||||||
|
let send_result = match tokio::time::timeout(cfg.attempt_timeout, fut).await {
|
||||||
|
Err(_elapsed) => {
|
||||||
|
last_err = Some(anyhow::anyhow!(
|
||||||
|
"Bitcoin RPC send timed out after {:?}",
|
||||||
|
cfg.attempt_timeout
|
||||||
|
));
|
||||||
|
continue; // transient: retry
|
||||||
|
}
|
||||||
|
Ok(r) => r,
|
||||||
|
};
|
||||||
|
|
||||||
|
let resp = match send_result {
|
||||||
|
Ok(r) => r,
|
||||||
|
Err(e) if is_transient_transport_error(&e) => {
|
||||||
|
last_err = Some(anyhow::Error::from(e).context("Bitcoin RPC connection failed"));
|
||||||
|
continue; // transient: retry
|
||||||
|
}
|
||||||
|
Err(e) => {
|
||||||
|
return Err(anyhow::Error::from(e).context("Bitcoin RPC connection failed"));
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
let rpc_resp: BitcoinRpcResponse<T> = resp
|
||||||
|
.json()
|
||||||
|
.await
|
||||||
|
.context("Failed to parse Bitcoin RPC response")?;
|
||||||
|
|
||||||
|
if let Some(err) = rpc_resp.error {
|
||||||
|
// RPC-level error: this is a real bitcoind response, not transient.
|
||||||
|
anyhow::bail!("Bitcoin RPC error: {}", err);
|
||||||
|
}
|
||||||
|
|
||||||
|
return rpc_resp
|
||||||
|
.result
|
||||||
|
.ok_or_else(|| anyhow::anyhow!("Bitcoin RPC returned null result"));
|
||||||
|
}
|
||||||
|
|
||||||
|
Err(last_err.unwrap_or_else(|| {
|
||||||
|
anyhow::anyhow!("Bitcoin RPC exhausted retries with no error captured")
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod tests {
|
||||||
|
use super::*;
|
||||||
|
use hyper::service::{make_service_fn, service_fn};
|
||||||
|
use hyper::{Body, Request, Response, Server, StatusCode};
|
||||||
|
use std::convert::Infallible;
|
||||||
|
use std::net::SocketAddr;
|
||||||
|
use std::sync::atomic::{AtomicU32, Ordering};
|
||||||
|
use std::sync::Arc;
|
||||||
|
|
||||||
|
/// Spin up a mock bitcoind HTTP server that behaves according to `handler`.
|
||||||
|
/// Returns the bound URL and a JoinHandle (dropped = server shutdown via the
|
||||||
|
/// oneshot cancel channel).
|
||||||
|
async fn spawn_mock<F, Fut>(
|
||||||
|
handler: F,
|
||||||
|
) -> (String, tokio::task::JoinHandle<()>, tokio::sync::oneshot::Sender<()>)
|
||||||
|
where
|
||||||
|
F: Fn(Request<Body>) -> Fut + Send + Sync + Clone + 'static,
|
||||||
|
Fut: std::future::Future<Output = Response<Body>> + Send + 'static,
|
||||||
|
{
|
||||||
|
let addr = SocketAddr::from(([127, 0, 0, 1], 0));
|
||||||
|
let make_svc = make_service_fn(move |_| {
|
||||||
|
let handler = handler.clone();
|
||||||
|
async move {
|
||||||
|
Ok::<_, Infallible>(service_fn(move |req| {
|
||||||
|
let handler = handler.clone();
|
||||||
|
async move { Ok::<_, Infallible>(handler(req).await) }
|
||||||
|
}))
|
||||||
|
}
|
||||||
|
});
|
||||||
|
let server = Server::bind(&addr).serve(make_svc);
|
||||||
|
let url = format!("http://{}", server.local_addr());
|
||||||
|
let (tx, rx) = tokio::sync::oneshot::channel::<()>();
|
||||||
|
let handle = tokio::spawn(async move {
|
||||||
|
let graceful = server.with_graceful_shutdown(async { let _ = rx.await; });
|
||||||
|
let _ = graceful.await;
|
||||||
|
});
|
||||||
|
(url, handle, tx)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Reply body bitcoind would send for a successful getblockcount.
|
||||||
|
fn ok_reply() -> Body {
|
||||||
|
Body::from(r#"{"result":42,"error":null,"id":"archy"}"#)
|
||||||
|
}
|
||||||
|
|
||||||
|
fn err_reply() -> Body {
|
||||||
|
Body::from(r#"{"result":null,"error":{"code":-8,"message":"nope"},"id":"archy"}"#)
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Succeeds on first attempt — should not retry.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn happy_path_first_attempt() {
|
||||||
|
let count = Arc::new(AtomicU32::new(0));
|
||||||
|
let c = count.clone();
|
||||||
|
let (url, _h, _tx) = spawn_mock(move |_req| {
|
||||||
|
let c = c.clone();
|
||||||
|
async move {
|
||||||
|
c.fetch_add(1, Ordering::SeqCst);
|
||||||
|
Response::new(ok_reply())
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let client = reqwest::Client::builder().build().unwrap();
|
||||||
|
let v: u64 = bitcoin_rpc_post_with_retry(
|
||||||
|
&client,
|
||||||
|
&url,
|
||||||
|
"user",
|
||||||
|
"pass",
|
||||||
|
"getblockcount",
|
||||||
|
&[],
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("should succeed");
|
||||||
|
assert_eq!(v, 42);
|
||||||
|
assert_eq!(count.load(Ordering::SeqCst), 1, "should not have retried");
|
||||||
|
}
|
||||||
|
|
||||||
|
/// HTTP 503 with non-JSON body: produces a JSON-parse error which is NOT
|
||||||
|
/// classified as transient. Must fail after first attempt.
|
||||||
|
/// This guards against the tempting mistake of blanket-retrying every
|
||||||
|
/// non-2xx response — which would mask real bitcoind misconfig.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn does_not_retry_parse_errors() {
|
||||||
|
let count = Arc::new(AtomicU32::new(0));
|
||||||
|
let c = count.clone();
|
||||||
|
let (url, _h, _tx) = spawn_mock(move |_req| {
|
||||||
|
let c = c.clone();
|
||||||
|
async move {
|
||||||
|
c.fetch_add(1, Ordering::SeqCst);
|
||||||
|
Response::builder()
|
||||||
|
.status(StatusCode::SERVICE_UNAVAILABLE)
|
||||||
|
.body(Body::from("busy"))
|
||||||
|
.unwrap()
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let client = reqwest::Client::builder().build().unwrap();
|
||||||
|
let result: Result<u64> = bitcoin_rpc_post_with_retry(
|
||||||
|
&client,
|
||||||
|
&url,
|
||||||
|
"user",
|
||||||
|
"pass",
|
||||||
|
"getblockcount",
|
||||||
|
&[],
|
||||||
|
)
|
||||||
|
.await;
|
||||||
|
assert!(result.is_err(), "non-JSON response should error out");
|
||||||
|
assert_eq!(
|
||||||
|
count.load(Ordering::SeqCst),
|
||||||
|
1,
|
||||||
|
"parse errors are not retryable"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Connect-refused (port closed) is the canonical transient transport
|
||||||
|
/// error. Must exhaust BITCOIN_RPC_MAX_ATTEMPTS and the total elapsed
|
||||||
|
/// time must include at least the sum of the backoffs.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn retries_exhausted_on_persistent_connect_refused() {
|
||||||
|
// Bind a port then immediately drop the listener so the port is closed.
|
||||||
|
let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
|
||||||
|
let closed_url = format!("http://{}", listener.local_addr().unwrap());
|
||||||
|
drop(listener);
|
||||||
|
|
||||||
|
let client = reqwest::Client::builder()
|
||||||
|
.connect_timeout(std::time::Duration::from_millis(500))
|
||||||
|
.build()
|
||||||
|
.unwrap();
|
||||||
|
let start = std::time::Instant::now();
|
||||||
|
let result: Result<u64> = bitcoin_rpc_post_with_retry(
|
||||||
|
&client,
|
||||||
|
&closed_url,
|
||||||
|
"user",
|
||||||
|
"pass",
|
||||||
|
"getblockcount",
|
||||||
|
&[],
|
||||||
|
)
|
||||||
|
.await;
|
||||||
|
let elapsed = start.elapsed();
|
||||||
|
assert!(result.is_err(), "connect-refused should exhaust retries");
|
||||||
|
let min_backoff: std::time::Duration = BITCOIN_RPC_BACKOFFS.iter().sum();
|
||||||
|
assert!(
|
||||||
|
elapsed >= min_backoff,
|
||||||
|
"should have backed off between retries (elapsed={:?}, expected at least {:?})",
|
||||||
|
elapsed,
|
||||||
|
min_backoff
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// The motivating scenario: first attempt times out (bitcoind busy),
|
||||||
|
/// subsequent attempt succeeds. Uses a short test-only RetryConfig so
|
||||||
|
/// the test runs in <1s instead of 15s.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn retries_on_timeout_then_succeeds() {
|
||||||
|
let count = Arc::new(AtomicU32::new(0));
|
||||||
|
let c = count.clone();
|
||||||
|
// Mock server: first request hangs for 500ms, subsequent requests reply OK.
|
||||||
|
let (url, _h, _tx) = spawn_mock(move |_req| {
|
||||||
|
let c = c.clone();
|
||||||
|
async move {
|
||||||
|
let n = c.fetch_add(1, Ordering::SeqCst);
|
||||||
|
if n == 0 {
|
||||||
|
tokio::time::sleep(std::time::Duration::from_millis(500)).await;
|
||||||
|
}
|
||||||
|
Response::new(ok_reply())
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let client = reqwest::Client::builder().build().unwrap();
|
||||||
|
// Attempt timeout 100ms < server's 500ms sleep => first attempt times out.
|
||||||
|
// Backoff 20ms between attempts.
|
||||||
|
let cfg = RetryConfig {
|
||||||
|
max_attempts: 3,
|
||||||
|
attempt_timeout: std::time::Duration::from_millis(100),
|
||||||
|
backoffs: vec![
|
||||||
|
std::time::Duration::from_millis(20),
|
||||||
|
std::time::Duration::from_millis(20),
|
||||||
|
],
|
||||||
|
};
|
||||||
|
let v: u64 = bitcoin_rpc_post_with_retry_cfg(
|
||||||
|
&client,
|
||||||
|
&url,
|
||||||
|
"user",
|
||||||
|
"pass",
|
||||||
|
"getblockcount",
|
||||||
|
&[],
|
||||||
|
&cfg,
|
||||||
|
)
|
||||||
|
.await
|
||||||
|
.expect("second attempt should succeed");
|
||||||
|
assert_eq!(v, 42);
|
||||||
|
assert!(
|
||||||
|
count.load(Ordering::SeqCst) >= 2,
|
||||||
|
"expected at least 2 attempts (got {})",
|
||||||
|
count.load(Ordering::SeqCst)
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// bitcoind returned a well-formed `{"error": ...}` body. Must NOT retry.
|
||||||
|
#[tokio::test]
|
||||||
|
async fn does_not_retry_on_rpc_level_error() {
|
||||||
|
let count = Arc::new(AtomicU32::new(0));
|
||||||
|
let c = count.clone();
|
||||||
|
let (url, _h, _tx) = spawn_mock(move |_req| {
|
||||||
|
let c = c.clone();
|
||||||
|
async move {
|
||||||
|
c.fetch_add(1, Ordering::SeqCst);
|
||||||
|
Response::new(err_reply())
|
||||||
|
}
|
||||||
|
})
|
||||||
|
.await;
|
||||||
|
|
||||||
|
let client = reqwest::Client::builder().build().unwrap();
|
||||||
|
let result: Result<u64> = bitcoin_rpc_post_with_retry(
|
||||||
|
&client,
|
||||||
|
&url,
|
||||||
|
"user",
|
||||||
|
"pass",
|
||||||
|
"getblockcount",
|
||||||
|
&[],
|
||||||
|
)
|
||||||
|
.await;
|
||||||
|
assert!(result.is_err());
|
||||||
|
assert_eq!(
|
||||||
|
count.load(Ordering::SeqCst),
|
||||||
|
1,
|
||||||
|
"RPC-level errors are not transient"
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Sanity: retry budget invariants. Chosen to catch regressions where
|
||||||
|
/// someone bumps these constants without realizing the total worst-case
|
||||||
|
/// wall time implications.
|
||||||
|
#[test]
|
||||||
|
fn retry_budget_invariants() {
|
||||||
|
assert_eq!(BITCOIN_RPC_MAX_ATTEMPTS, 3);
|
||||||
|
assert_eq!(BITCOIN_RPC_BACKOFFS.len(), (BITCOIN_RPC_MAX_ATTEMPTS - 1) as usize);
|
||||||
|
// Total wall-time ceiling:
|
||||||
|
// 3 attempts * 15s + (0.5s + 1.5s) backoff = 47s
|
||||||
|
let total: std::time::Duration =
|
||||||
|
BITCOIN_RPC_ATTEMPT_TIMEOUT * BITCOIN_RPC_MAX_ATTEMPTS
|
||||||
|
+ BITCOIN_RPC_BACKOFFS.iter().sum::<std::time::Duration>();
|
||||||
|
assert!(total < std::time::Duration::from_secs(60));
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@ -1,7 +1,7 @@
|
|||||||
{
|
{
|
||||||
"name": "neode-ui",
|
"name": "neode-ui",
|
||||||
"private": true,
|
"private": true,
|
||||||
"version": "1.7.41-alpha",
|
"version": "1.7.42-alpha",
|
||||||
"type": "module",
|
"type": "module",
|
||||||
"scripts": {
|
"scripts": {
|
||||||
"start": "./start-dev.sh",
|
"start": "./start-dev.sh",
|
||||||
|
|||||||
@ -180,6 +180,16 @@ init()
|
|||||||
</button>
|
</button>
|
||||||
</div>
|
</div>
|
||||||
<div class="overflow-y-auto flex-1 min-h-0 space-y-6 pr-1">
|
<div class="overflow-y-auto flex-1 min-h-0 space-y-6 pr-1">
|
||||||
|
<!-- v1.7.42-alpha -->
|
||||||
|
<div>
|
||||||
|
<div class="flex items-center gap-2 mb-3">
|
||||||
|
<span class="text-xs font-mono px-2 py-0.5 rounded bg-orange-500/20 text-orange-300">v1.7.42-alpha</span>
|
||||||
|
<span class="text-xs text-white/40">Apr 22, 2026</span>
|
||||||
|
</div>
|
||||||
|
<div class="space-y-3 text-sm text-white/80 pl-3 border-l border-white/10">
|
||||||
|
<p>Bitcoin dashboards no longer flicker errors during initial chain sync. When bitcoind is busy validating a fresh block it can take up to 10 seconds to answer RPC — the old code gave up after exactly 10 seconds, so any call that landed during that window surfaced as a failure even though the node was perfectly healthy. The RPC client now retries transient timeouts transparently (3 attempts, ~500ms + 1500ms backoff between them) and only surfaces errors that bitcoind itself reported. Connection refused is still fast-failed so genuinely-dead bitcoinds are reported in under a second.</p>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
<!-- v1.7.41-alpha -->
|
<!-- v1.7.41-alpha -->
|
||||||
<div>
|
<div>
|
||||||
<div class="flex items-center gap-2 mb-3">
|
<div class="flex items-center gap-2 mb-3">
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user