archy/docs/CURRENT_AGENT_HANDOFF.md
2026-06-11 00:24:54 -04:00

# Current Agent Handoff - Bitcoin UI Recovery And `1.8-alpha` Resume
Last updated: 2026-06-10 05:33 EDT
## Read This First
This is a separate handoff from `docs/NEXT_TERMINAL_HANDOFF.md`. That file tracks
an older/broader plan. For the next agent resuming this machine-switch pause,
read this file first, then read:
- `docs/RESUME.md`
- `docs/1.8-alpha-improvements-tracker.md`
- `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
- `docs/MIGRATION_STATUS_REPORT.md`

Do not assume `docs/NEXT_TERMINAL_HANDOFF.md` is the current short-term plan.
## Current Goal
Cut Archipelago `1.8-alpha`, including a ready-to-test ISO image.
The release goal is not just "apps launch once"; the app/container system needs
to be developer-ready and production-release ready:
- manifests and docs must describe the real runtime contract;
- apps must install, start, stop, restart, uninstall, reinstall, survive reboot,
report truthful status, and show useful progress;
- My Apps must preserve last-known truth during Podman/scanner backoff instead
of showing false empty/no-app states;
- Bitcoin-dependent apps must explain sync/wallet readiness instead of looking
broken;
- final validation needs focused lifecycle, broad non-destructive lifecycle,
then repeated reboot checks before ISO cut/smoke test.
## Current Estimate
As of this pause:
- Credible release candidate: roughly `87-91%`.
- Production-quality release developers will love: roughly `73-79%`.
- Calendar estimate if the remaining systemic lifecycle issues are bounded:
`1-2 focused engineering days` for a release candidate, then additional
reboot/ISO smoke time.
- The biggest remaining risk is not catalog wiring; it is rootless Podman
control-plane responsiveness, stale scanner state, lifecycle progress UX, and
reboot validation.
## Validation Host
- Host: `192.168.1.198`
- SSH user: `archipelago`
- Password used in this session: `password123`
- Active Bitcoin app on this host: `bitcoin-knots`, not `bitcoin-core`
- Keep `archipelago-doctor.timer` and `archipelago-reconcile.timer` inactive
for deterministic validation unless intentionally testing them.
- Preserve app data.
- Avoid broad Podman store/image cleanup commands on `.198`.
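
The timer requirement above can be confirmed without changing anything on the host. The helper below is a minimal sketch: the function name `timers_inactive` is hypothetical, and the usage line assumes the two timers are system-level units (add `--user` to `systemctl` if they turn out to be user units).

```bash
# timers_inactive STATE...
# Takes the states printed by `systemctl is-active` (one word per unit)
# and succeeds only if none of them is "active".
timers_inactive() {
  local state
  for state in "$@"; do
    if [ "$state" = "active" ]; then
      return 1
    fi
  done
  return 0
}

# Read-only usage on the validation host (assumes system-level units):
#   timers_inactive $(systemctl is-active archipelago-doctor.timer archipelago-reconcile.timer) \
#     && echo "timers inactive" || echo "a timer is active"
```

`systemctl is-active` only reports state, so this check is safe to repeat before each validation pass.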
## Bitcoin UI Incident Summary
User reported the Bitcoin custom UI showing:
`Bitcoin node is starting or busy syncing; retrying automatically. Detail:
getblockchaininfo: Bitcoin RPC request failed ... operation timed out`
Then after listener repair, the message changed through:
- `Connection refused`
- `Verifying blocks...`
- then the user reported it looked fine again.

What happened:
- The node is a `bitcoin-knots` node.
- During live debugging, the wrong alias, `bitcoin-core`, was started/stopped.
- `bitcoin-core` and `bitcoin-knots` compete for the same Bitcoin RPC/P2P ports.
- That action left the real `bitcoin-knots` service active but without the host
`8332` rootlessport listener for a while.
- Stopping the stray `bitcoin-core.service` and restarting only
`bitcoin-knots.service` recreated listeners on `8332` and `8333`.
- After restart, bitcoind entered the normal `-28 Verifying blocks...` phase.
- The user later reported the Bitcoin UI looked fine again.

Known live state observed during recovery:
- `bitcoin-knots.service`: active
- `bitcoin-core.service`: inactive
- `archy-bitcoin-ui.service`: active
- listeners present after repair:
  - `8332` via `rootlessport`
  - `8333` via `rootlessport`
  - `8334` via nginx/Bitcoin UI
- `bitcoin-knots` logs showed active IBD around height `4137xx` and progress
about `0.09438`.

Do not restart Bitcoin again unless there is a fresh confirmed service/listener
failure. If checking status, prefer read-only probes and avoid starting the
wrong variant.
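
A read-only listener probe along these lines may help. It is a sketch, not an existing repo script: `missing_ports` is a hypothetical helper that parses `ss -ltnH` output and prints each expected port that has no listening socket. Whether the services are system or user units is an assumption to verify before running the `systemctl` line.

```bash
# missing_ports PORT...
# Reads `ss -ltnH` output on stdin and prints every expected port that has
# no listening socket. Prints nothing when all expected ports are present.
missing_ports() {
  awk -v want="$*" '
    # Column 4 is the local address; the port is the text after the last colon.
    { n = split($4, parts, ":"); seen[parts[n]] = 1 }
    END {
      count = split(want, ports, " ")
      for (i = 1; i <= count; i++)
        if (!(ports[i] in seen)) print ports[i]
    }'
}

# Read-only usage on the validation host:
#   ss -ltnH | missing_ports 8332 8333 8334
#   systemctl is-active bitcoin-knots.service bitcoin-core.service archy-bitcoin-ui.service
```

Neither command starts, stops, or restarts anything, so the probe cannot bring up the wrong variant by accident.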
## Source Fixes Made Locally
These local edits were made after live Bitcoin recovered. They are not deployed
yet and were not fully validated before the user paused.
### `core/archipelago/src/bitcoin_status.rs`
Changed Bitcoin status cache behavior and copy:
- refresh interval changed from `5s` to `10s`;
- transient error backoff added at `15s`;
- RPC client timeout increased from `8s` to `20s`;
- error context now uses full anyhow chain with `{e:#}`;
- transient classifications now include common overloaded/backend states;
- user-facing copy now distinguishes:
  - `verifying blocks after restart`;
  - `waiting for the Bitcoin RPC listener`;
  - `busy and not answering RPC before the timeout`;
  - generic `starting or busy syncing`;
- added unit tests for the three user-visible states above.
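
The copy states listed above can be pictured as a mapping from the RPC error detail to a message. The shell sketch below only illustrates that mapping and is not the Rust implementation: the function name `classify_rpc_detail` is hypothetical, and the matched substrings are taken from the messages seen during the incident. The real code may classify on error kinds or RPC codes instead.

```bash
# classify_rpc_detail DETAIL
# Prints the user-facing state for an RPC error detail string.
# Order matters: the most specific matches are checked first.
classify_rpc_detail() {
  case "$1" in
    *"Verifying blocks"*)   echo "verifying blocks after restart" ;;
    *"Connection refused"*) echo "waiting for the Bitcoin RPC listener" ;;
    *"timed out"*)          echo "busy and not answering RPC before the timeout" ;;
    *)                      echo "starting or busy syncing" ;;
  esac
}
```

For example, `classify_rpc_detail "Connection refused"` prints `waiting for the Bitcoin RPC listener`, while an unrecognized detail falls through to the generic copy.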
Intent: stop collapsing distinct backend states into the same stale
"starting or busy syncing" timeout message.
### `core/archipelago/src/api/rpc/package/update.rs`
Narrow Bitcoin alias fix added:
- `orchestrator_update_app_id("bitcoin-knots")` now remains
`"bitcoin-knots"` instead of mapping to `"bitcoin-core"`;
- candidate app IDs for a Bitcoin container now prefer `bitcoin-knots` before
`bitcoin-core`;
- tests updated to lock in this behavior.

Intent: `bitcoin-core` and `bitcoin-knots` can be dependency/status aliases,
but must not be interchangeable lifecycle/update targets on a node that has a
specific installed variant.

Important: this file also already contained other uncommitted update/pull
timeout changes from prior work. Do not assume every diff in this file came
from this interruption.
## Validation Status At Pause
Completed:
- `cargo fmt --manifest-path core/Cargo.toml --all` passed after the local
Bitcoin edits.

Attempted but not completed:
- Targeted Cargo tests were first launched in three separate `/tmp` target dirs
and failed due to `/tmp` filling up (`No space left on device`).
- Those temporary dirs were removed:
  - `/tmp/archy-cargo-bitcoin-status`
  - `/tmp/archy-cargo-update-alias`
  - `/tmp/archy-cargo-container-candidates`
- A second run using `CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix` was still
compiling when the user paused. It was terminated for handoff.
- No successful Rust test result exists yet for the new Bitcoin status/alias
tests.

Recommended validation after resume:
```bash
git diff --check -- core/archipelago/src/bitcoin_status.rs core/archipelago/src/api/rpc/package/update.rs docs/CURRENT_AGENT_HANDOFF.md
CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago bitcoin_status::tests
CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago update_aliases_map_to_manifest_app_ids
CARGO_TARGET_DIR=.codex-tmp/cargo-bitcoin-fix CARGO_BUILD_JOBS=2 cargo test --manifest-path core/Cargo.toml -p archipelago container_name_candidates_cover_common_aliases
```
If Cargo target locking appears stale, check for real `cargo`/`rustc` workers
before deleting anything. Prefer workspace-local target dirs under `.codex-tmp`
over new cold `/tmp` targets.
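
Since the first test run failed when `/tmp` filled up, a free-space check before launching a cold build may save a wasted compile. This is a minimal sketch: `has_free_kib` is a hypothetical helper, and the 10 GiB threshold in the usage comment is an illustrative guess, not a measured requirement.

```bash
# has_free_kib PATH MIN_KIB
# Succeeds if the filesystem holding PATH has at least MIN_KIB KiB available.
has_free_kib() {
  local path=$1 min_kib=$2 avail
  # `df -Pk` prints a header line, then one line whose 4th column is the
  # available space in KiB.
  avail=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  [ -n "$avail" ] && [ "$avail" -ge "$min_kib" ]
}

# Example: check the workspace-local target location before a cold build.
#   mkdir -p .codex-tmp
#   has_free_kib .codex-tmp 10485760 || echo "not enough space for a cold target dir"
```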
## Immediate Next Steps
1. Confirm no lingering Cargo process:
```bash
pgrep -af "cargo|rustc|cargo-bitcoin-fix"
```
2. Validate the local Bitcoin source fixes listed above.
3. If validation passes, build/deploy the backend to `.198` only after
confirming the user still wants deployment.
4. Recheck live Bitcoin non-destructively:
   - `bitcoin-knots.service` active;
   - `bitcoin-core.service` inactive;
   - listeners on `8332`, `8333`, `8334`;
   - Bitcoin UI loads on `8334`;
   - `/bitcoin-status` returns useful copy if backend is busy.
5. Resume release backlog:
   - rootless Podman lifecycle/control-plane responsiveness;
   - My Apps last-known-state truthfulness during scanner backoff;
   - progress UX for install/uninstall/start/stop/restart;
   - remaining tracker rows in `docs/1.8-alpha-improvements-tracker.md`;
   - focused lifecycle matrix on `.198`;
   - broad non-destructive lifecycle;
   - 3 clean reboot validations minimum, 5 preferred;
   - ISO cut and ISO smoke test.
## Cautions For Next Agent
- Do not start `bitcoin-core` on `.198` unless intentionally migrating variants.
- Treat `bitcoin-knots` as the installed Bitcoin variant.
- Do not run broad Podman prune/store cleanup.
- Do not revert unrelated dirty worktree changes.
- `docs/NEXT_TERMINAL_HANDOFF.md` exists but is not the short-term handoff for
this pause.
- Many repo files are dirty from broader release hardening. Read diffs before
attributing changes.
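
The first two cautions can be backed by a small guard in any ad hoc script that touches app services. This is a sketch under stated assumptions: `variant_allowed` is a hypothetical helper, and the installed variant is passed in explicitly rather than detected.

```bash
# variant_allowed REQUESTED INSTALLED
# Succeeds unless REQUESTED is a Bitcoin variant different from INSTALLED.
# Non-Bitcoin app IDs always pass.
variant_allowed() {
  local requested=$1 installed=$2
  case "$requested" in
    bitcoin-core|bitcoin-knots) [ "$requested" = "$installed" ] ;;
    *) return 0 ;;
  esac
}

# Example guard before any manual lifecycle command on `.198`:
#   variant_allowed "$app" bitcoin-knots \
#     || { echo "refusing: $app is not the installed Bitcoin variant" >&2; exit 1; }
```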