diff --git a/docs/PRODUCTION-MASTER-PLAN.md b/docs/PRODUCTION-MASTER-PLAN.md
index 4f81e4e1..6956271a 100644
--- a/docs/PRODUCTION-MASTER-PLAN.md
+++ b/docs/PRODUCTION-MASTER-PLAN.md
@@ -135,6 +135,16 @@ After the 2026-06-23 multinode test deploy (latest backend + UX frontend to .116
 3. **§6c Lifecycle perfection** (workstream F) — the comprehensive uninstall/reinstall +
    progress-UI + all-apps gate expansion below.
 
+## 6b-bis. Bitcoin multi-version bulletproofing (2026-06-29) — READY TO MERGE + DEPLOY
+
+Branch `bitcoin-version-bulletproof` (base `095a76cd`). Fixes the "switch version silently
+fails / crash-loops" class + a data-access mismatch that can corrupt a node's index. All
+code + images + catalog + frontend DONE; **.228** carries it (Knots chainstate mid-reindex
+recovery). The **coordinated fleet rollout** (OTA binary+frontend, mirror catalog publish,
+`:latest` repoint sequencing, full switch-matrix test) is the remaining work — fold it into
+the next release. **Authoritative detail + exact remaining steps + test matrix →
+`docs/bitcoin-version-bulletproof-rollout.md`.** Pairs with `docs/bitcoin-multi-version-design.md`.
+
 ## 6c. Lifecycle perfection — what "green" MISSED (workstream F, the perfection bar)
 
 **Why this exists:** the 2026-06-23 single-node gate went 5×-green but is **NOT** the
diff --git a/docs/bitcoin-version-bulletproof-rollout.md b/docs/bitcoin-version-bulletproof-rollout.md
new file mode 100644
index 00000000..cf0c9aa4
--- /dev/null
+++ b/docs/bitcoin-version-bulletproof-rollout.md
@@ -0,0 +1,131 @@
+# Bitcoin Multi-Version — Bulletproofing & Rollout (handoff)
+
+> **Status 2026-06-29:** code + images + catalog + frontend DONE on branch
+> `bitcoin-version-bulletproof` (base commit `095a76cd`, plus the catalog-generator
+> + handoff follow-ups). **.228 is the test node**: binary + frontend + catalog are
+> live there; its Knots chainstate is mid-**reindex recovery** (see §5). The fleet
+> rollout (OTA binary+frontend, mirror catalog publish, `:latest` repoint) is the
+> **coordinated step the other agent owns** — see §4. Pairs with
+> `docs/bitcoin-multi-version-design.md` (the original design).
+
+## 1. What was broken (root causes)
+
+User report: "switched Knots to `v29.3.knots20260508`, version didn't update in the UI."
+Three **stacked** bugs, plus a data-corruption hazard:
+
+1. **Reconciler reverted the pin.** `prod_orchestrator::sync_quadlet_unit` re-rendered the
+   quadlet every reconcile tick using the manifest's `:latest`, ignoring the per-app
+   pinned version → any switch silently reverted within one tick.
+2. **Entrypoint render bug.** The renderer folded the manifest `entrypoint: ["sh","-lc"]`
+   into `Exec=`. That only works when the image ENTRYPOINT is a passthrough shell wrapper.
+   The versioned images use `ENTRYPOINT ["bitcoind"]`, so `Exec=sh -lc …` became
+   `bitcoind sh -lc …` → `unexpected token 'sh'` → crash loop.
+3. **Image USER divergence.** The versioned images were built `USER bitcoin` (uid 1000);
+   the legacy `:latest` ran as **root**. Chain data is owned by the `data_uid`
+   (host 100101 / container uid 102). Root reads it via `CAP_DAC_OVERRIDE` (granted in the
+   manifest); uid-1000 cannot → `Error initializing block database`.
+4. **Data hazard (already hit on .228).** Repeated failed starts under mixed UIDs left
+   bitcoind's two LevelDBs (`blocks/index/` + `chainstate/`) truncated to KB stubs while
+   the raw `blocks/blk*.dat` (797 GB) stayed intact. Recovery = `bitcoind -reindex` from
+   local blocks (no re-download). The uniform-root image fix (below) removes the mixed-UID
+   cause going forward; the proper switch flow was already data-safe (600s stop grace,
+   clean stop→rm→recreate, conflict-stops the other impl — they share port 8332 + datadir
+   `/var/lib/archipelago/bitcoin`).
+
+## 2. What was fixed (all on the branch)
+
+- **Renderer** (`core/archipelago/src/container/`):
+  - `prod_orchestrator.rs`: factored `resolve_catalog_image()` (catalog/pinned-version →
+    image) and call it in BOTH `install_fresh` and `sync_quadlet_unit` — the pin now
+    survives reconcile.
+  - `quadlet.rs`: emit a real `Entrypoint=` + `Exec=` instead of folding;
+    `exec_changed` now also diffs `Entrypoint=` so the recreate fires. Validated against
+    the live podman 5.4.2 quadlet generator.
+- **Images** (`scripts/build-bitcoin-image.sh`, `apps/bitcoin-{knots,core}/Dockerfile`):
+  removed `USER bitcoin` → run as **container-root** like legacy (still 100% rootless:
+  container-root maps to the unprivileged host service user; `CAP_DAC_OVERRIDE` from the
+  manifest lets bitcoind read the `data_uid`-owned datadir). **All** images rebuilt root +
+  pushed to the mirror (`146.59.87.168:3000/lfg2025`):
+  - Knots: `29.3.knots20260508`, `29.3.knots20260507`, `29.3.knots20260210`, `29.2.knots20251110`
+  - Core: `25.2 26.2 27.2 28.4 29.2 29.3 30.2 31.0` + `latest` (→31.0)
+- **Catalog** (`scripts/generate-app-catalog.sh` VERSIONS map + regenerated
+  `releases/app-catalog.json`): Knots & Core `versions[]` populated; the generator now
+  forces top-level `version` == the `default` entry's version (the `169ff2e2` invariant)
+  regardless of the manifest version. Knots `latest` entry points at the newest **dated**
+  image (`29.3.knots20260508`) so "Always use latest" = newest on fixed-binary nodes.
+- **Frontend** (`neode-ui/`):
+  - `AppSidebar.vue`: rename the latest option to **"Always use the latest version"**
+    (no `v` prefix), fix right padding, and `pickSelection()` guarantees the bound value is
+    a real option (fixes the blank dropdown).
+  - New `components/InstallVersionModal.vue`: full-screen version chooser shown from the
+    App Store / Discover **card** install button for multi-version apps — app icon +
+    "Install ", latest pre-selected. Wired in `Discover.vue handleInstall`.
+  - i18n keys: `appDetails.alwaysUseLatestVersion`, `marketplace.installModalTitle/Hint`.
+
+## 3. Current live state on .228 (test node)
+
+- Binary with both renderer fixes: **deployed** (`/usr/local/bin/archipelago`).
+- New frontend bundle: **deployed** to `/opt/archipelago/web-ui` (hard-refresh to see it).
+- Updated catalog: placed at `/var/lib/archipelago/app-catalog.json` (local override —
+  will refresh from the mirror's OLDER copy at the next hourly fetch until §4 publishes it).
+- Knots: `bitcoin-knots` service held **stopped** (`package.stop`, user_stopped);
+  a detached `bitcoin-knots-reindex` container is rebuilding the index+UTXO (§5).
+
+## 4. Remaining — coordinated fleet rollout (OTHER AGENT)
+
+Do this together with the other workstream's release, AFTER both are ready:
+
+1. **Merge** branch `bitcoin-version-bulletproof` into the release line.
+2. **Build + OTA** the binary + frontend (these carry the renderer fix + UI). The renderer
+   fix is a **hard prerequisite** for the new images everywhere — see fleet-safety below.
+3. **Publish the catalog** to the mirror (push `releases/app-catalog.json` to gitea-vps2
+   `main`, the raw URL nodes fetch hourly). The current catalog is **fleet-safe even before
+   the binary lands**: unpinned/auto-update nodes resolve via the manifest's floating
+   `:latest` (still the legacy image); only explicit version selection (needs the new UI)
+   uses the new root images.
+4. **Only AFTER the binary is fleet-wide:** optionally repoint the `bitcoin-knots:latest`
+   tag → `29.3.knots20260508` (root) and simplify the catalog `latest` entry back to the
+   `:latest` tag. **Do NOT repoint `:latest` before then** — old-binary nodes fold
+   `Exec=sh -lc …` and would crash on an `ENTRYPOINT ["bitcoind"]` image. (Core never
+   worked on old binaries — it always shipped `ENTRYPOINT ["bitcoind"]` — so Core has no
+   such constraint.)
+5. **Verify the full switch matrix** on a healthy node (§6).
+
+## 5. Finishing .228's reindex (the remaining test-node task)
+
+The detached `bitcoin-knots-reindex` container runs the new **root** `29.3.knots20260508`
+image with `-reindex -server=0` against `/var/lib/archipelago/bitcoin`. It holds the datadir
+lock, so the managed service (held stopped) can't collide. When it has connected blocks up
+to ~the prior tip (height ≥ ~955800) it's done; then:
+
+```sh
+# on .228 (SSH/sudo/UI pw all: ThisIsWeb54321@)
+podman stop -t 600 bitcoin-knots-reindex && podman rm bitcoin-knots-reindex
+# start the managed service via RPC (sets desired=running, clears user_stopped):
+# package.start {id: bitcoin-knots} (POST https://127.0.0.1/rpc/v1, CSRF: echo csrf_token cookie as X-CSRF-Token)
+# verify:
+podman exec bitcoin-knots sh -lc '$(command -v bitcoind) --version | head -1' # → v29.3.knots20260508
+# RPC up → the Bitcoin UI populates; it syncs the gap to tip.
+```
+The "Bitcoin RPC connection refused (127.0.0.1:8332)" the UI shows is EXPECTED until this
+swap (reindex runs with RPC off).
+
+## 6. Switch-matrix test plan (what "bulletproof" must prove)
+
+On a healthy node, each step must end with bitcoind running + RPC answering + syncing, with
+NO `Error initializing block database` and NO data loss:
+- Knots: switch `latest` → `29.3.knots20260507` → `29.3.knots20260210` → back to `latest`.
+- Core: install `latest`; switch `31.0` → `28.4.0`.
+- **Knots ↔ Core** (shared datadir/port): Knots→Core upgrade path (Core ≥ data version) and
+  the reverse. **Cross-major DOWNGRADES** (e.g. 29.x data → Core 28.4) legitimately need a
+  reindex — the UI already surfaces a downgrade warning; confirm it does and that confirming
+  reindexes cleanly rather than crash-looping.
+- Reboot survival after each switch.
+
+## 7. Notes / assumptions
+
+- **"29.2"** in the request doesn't exist as a Knots build (404 upstream); added as **Bitcoin
+  Core 29.2** (exists). Revisit if a Knots 29.2 was meant.
+- Reindex is unavoidable ONLY because .228's index was already corrupted by the pre-fix
+  crash loop; a normal switch on the fixed binary does NOT reindex.
+- Creds for .228: SSH/sudo + UI/RPC all `ThisIsWeb54321@`.
diff --git a/releases/app-catalog.json b/releases/app-catalog.json
index 7c0c2da4..4d75b685 100644
--- a/releases/app-catalog.json
+++ b/releases/app-catalog.json
@@ -1,6 +1,6 @@
 {
   "schema": 1,
-  "updated": "2026-06-28",
+  "updated": "2026-06-29",
   "apps": {
     "adguardhome": {
       "version": "v0.107.55",
@@ -343,7 +343,7 @@
       "description": "Reference Bitcoin Core node with dynamic prune/full-mode startup based on host disk.",
       "container_name": "bitcoin-core",
       "container": {
-        "image": "146.59.87.168:3000/lfg2025/bitcoin:latest",
+        "image": "146.59.87.168:3000/lfg2025/bitcoin:28.4",
         "pull_policy": "if-not-present",
         "network": "archy-net",
         "entrypoint": [
diff --git a/scripts/generate-app-catalog.sh b/scripts/generate-app-catalog.sh
index a519b1fe..cfd47f0e 100755
--- a/scripts/generate-app-catalog.sh
+++ b/scripts/generate-app-catalog.sh
@@ -182,10 +182,12 @@ VERSIONS = {
     # "Version & Updates" card. Add the next release by building its image then
     # prepending it here.
     "bitcoin-core": [
+        {"version": "latest", "image": f"{REGISTRY}/bitcoin:latest", "default": True},
         {"version": "31.0", "image": f"{REGISTRY}/bitcoin:31.0"},
         {"version": "30.2", "image": f"{REGISTRY}/bitcoin:30.2"},
         {"version": "29.3", "image": f"{REGISTRY}/bitcoin:29.3"},
-        {"version": "28.4.0", "image": f"{REGISTRY}/bitcoin:28.4", "default": True},
+        {"version": "29.2", "image": f"{REGISTRY}/bitcoin:29.2"},
+        {"version": "28.4.0", "image": f"{REGISTRY}/bitcoin:28.4"},
         {"version": "27.2", "image": f"{REGISTRY}/bitcoin:27.2"},
         {"version": "26.2", "image": f"{REGISTRY}/bitcoin:26.2", "deprecated": True},
         {"version": "25.2", "image": f"{REGISTRY}/bitcoin:25.2", "deprecated": True},
@@ -196,16 +198,35 @@ VERSIONS = {
     # top-level catalog version (L167-168) or the card can't reach "latest" and
     # selecting the highlighted default would instead pin+recreate. Pinning
     # 29.3.knots20260508 moves a runner off the floating tag.
+    # `latest` is the default and points at the NEWEST published dated image
+    # (not the bare :latest tag) so "Always use the latest version" installs the
+    # newest build on fixed-binary nodes, while UNPINNED nodes still resolve via
+    # the manifest's floating :latest tag (kept on the legacy image until the
+    # entrypoint-render fix is fleet-deployed — see
+    # docs/bitcoin-version-bulletproof-rollout.md).
     "bitcoin-knots": [
         {"version": "latest",
-         "image": f"{REGISTRY}/bitcoin-knots:latest", "default": True},
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260508", "default": True},
         {"version": "29.3.knots20260508",
          "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260508"},
+        {"version": "29.3.knots20260507",
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260507"},
+        {"version": "29.3.knots20260210",
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260210"},
+        {"version": "29.2.knots20251110",
+         "image": f"{REGISTRY}/bitcoin-knots:29.2.knots20251110"},
     ],
 }
 for app_id, versions in VERSIONS.items():
     if app_id in apps and versions:
         apps[app_id]["versions"] = versions
+        # The default/latest entry MUST equal the app's top-level catalog
+        # `version` (commit 169ff2e2) so selecting the highlighted default
+        # un-pins / tracks latest instead of pinning+recreating. Enforce it here
+        # rather than relying on the manifest version matching.
+        default_entry = next((v for v in versions if v.get("default")), None)
+        if default_entry:
+            apps[app_id]["version"] = default_entry["version"]
 
 catalog = {
     "schema": 1,