archy/core/archipelago/Cargo.toml
archipelago 7ecd30bde2 release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red
Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.

What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
  to 3 times with 500ms and 1500ms backoffs between attempts. Only
  transient transport errors (timeout, connect refused, send/recv IO)
  trigger retry. A well-formed bitcoind error response is surfaced
  immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
  of reqwest's own timeout, so DNS starvation or TLS wedging can't
  steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
  so a dead bitcoind is fast-failed inside the first attempt instead
  of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
  down timeouts to ~100ms per attempt. Production defaults live in
  RetryConfig::production().

Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
  same 10s-timeout pattern. Not migrated to the new wrapper in this
  release; scheduled for v1.7.43 alongside the render_bitcoin_conf
  work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
  different failure profiles - audit first, migrate only the ones
  that actually exhibit the bug.

Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
  - happy_path_first_attempt: no retry when first attempt succeeds
  - retries_on_timeout_then_succeeds: first attempt times out, second
    succeeds, returns OK (uses a short-timeout RetryConfig so the test
    runs in <1s instead of 15s)
  - retries_exhausted_on_persistent_connect_refused: all attempts fail
    against a closed port, error bubbles up, elapsed time confirms
    backoffs actually ran
  - does_not_retry_on_rpc_level_error: bitcoind-returned error body is
    surfaced immediately, no retry
  - does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
    body) is NOT retried - guards against the tempting "retry all
    non-2xx" mistake that would mask real bitcoind misconfig
  - retry_budget_invariants: asserts total wall-time ceiling stays
    under 60s so a bumped constant can't silently hang a UI call
    forever

Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
2026-04-22 16:46:28 -04:00

109 lines
2.9 KiB
TOML

[package]
name = "archipelago"
version = "1.7.42-alpha"
edition = "2021"
description = "Archipelago Bitcoin Node OS - Native backend"
authors = ["Archipelago Team"]
[[bin]]
name = "archipelago"
path = "src/main.rs"
[dependencies]
# Core dependencies
tokio = { version = "1", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
thiserror = "1.0"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
# HTTP and WebSocket
hyper = { version = "0.14", features = ["full", "http1"] }
hyper-util = { version = "0.1", features = ["full", "http1"] }
http-body-util = "0.1"
http-body = "1.0"
tower = "0.5"
tower-http = { version = "0.6", features = ["cors", "trace"] }
hyper-ws-listener = "0.3.0"
tokio-tungstenite = "0.20"
futures-util = "0.3"
# Our modules
archipelago-container = { path = "../container" }
archipelago-security = { path = "../security" }
archipelago-performance = { path = "../performance" }
# Database (optional for now - can use SQLite or skip)
# sqlx = { version = "0.7", features = ["sqlite", "runtime-tokio-rustls"] }
# Authentication
bcrypt = "0.15"
sha2 = "0.10.9"
hmac = "0.12.1"
uuid = { version = "1.0", features = ["v4"] }
regex = "1.10"
# Node identity (Ed25519 + X25519 key agreement)
ed25519-dalek = { version = "2.2.0", features = ["rand_core"] }
curve25519-dalek = "4.1.3"
rand = "0.8.5"
hex = "0.4"
bs58 = "0.5"
chrono = "0.4"
# BIP-39 mnemonic seed generation + BIP-32 HD key derivation
bip39 = { version = "=2.1.0", features = ["rand"] }
bitcoin = { version = "=0.32.5", features = ["rand-std"] }
# Configuration
toml = "0.8"
serde_yaml = "0.9"
# HTTP client (for LND REST proxy, Tor SOCKS for peer messaging)
# Uses rustls-tls for cross-compilation (no OpenSSL dependency)
reqwest = { version = "0.11", default-features = false, features = ["json", "socks", "rustls-tls"] }
# Nostr (node discovery + NIP-44 encrypted peer handshake)
nostr-sdk = { version = "0.44", features = ["nip04", "nip44"] }
# Backup encryption (DID identity export) + TOTP 2FA encryption
argon2 = "0.5.3"
chacha20poly1305 = "0.10.1"
base64 = "0.21"
# Full system backup (tar archive + gzip compression)
tar = "0.4"
flate2 = "1.0"
# TOTP 2FA
totp-rs = { version = "5.7", features = ["otpauth", "gen_secret"] }
qrcode = "0.14"
data-encoding = "2.6"
zeroize = { version = "1.8.2", features = ["derive"] }
# Mainline DHT (did:dht — BitTorrent DHT for decentralized identity)
mainline = "2"
zbase32 = "0.1"
bytes = "1"
# Mesh networking (Meshcore serial protocol over USB LoRa radios)
serial2-tokio = "0.1"
# Double Ratchet key derivation (Phase 3: encrypted mesh messaging)
hkdf = "0.12.4"
# Transport abstraction (Phase 2: mesh as federation transport)
ciborium = "0.2.2"
reed-solomon-erasure = "6.0"
mdns-sd = "0.18"
# Systemd watchdog notification
sd-notify = "0.4"
[dev-dependencies]
tokio-test = "0.4"
tempfile = "3.10"