docs+catalog: bitcoin multi-version rollout handoff + reproducible generator

- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
  (29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core (adds 29.2 + a
  `latest` entry → newest); generator forces top-level `version` == the default entry's
  version (the 169ff2e2 invariant) so regeneration is reproducible.
  releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes, fixes, current
  .228 state, the coordinated fleet-rollout steps (incl. :latest repoint sequencing /
  fleet-safety), reindex finish procedure, and the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
parent 095a76cd20
commit ed1352d3a3
@@ -135,6 +135,16 @@ After the 2026-06-23 multinode test deploy (latest backend + UX frontend to .116
3. **§6c Lifecycle perfection** (workstream F) — the comprehensive uninstall/reinstall +
   progress-UI + all-apps gate expansion below.

## 6b-bis. Bitcoin multi-version bulletproofing (2026-06-29) — READY TO MERGE + DEPLOY

Branch `bitcoin-version-bulletproof` (base `095a76cd`). Fixes the "switch version silently
fails / crash-loops" class + a data-access mismatch that can corrupt a node's index. All
code + images + catalog + frontend DONE; **.228** carries it (Knots chainstate mid-reindex
recovery). The **coordinated fleet rollout** (OTA binary+frontend, mirror catalog publish,
`:latest` repoint sequencing, full switch-matrix test) is the remaining work — fold it into
the next release. **Authoritative detail + exact remaining steps + test matrix →
`docs/bitcoin-version-bulletproof-rollout.md`.** Pairs with `docs/bitcoin-multi-version-design.md`.

## 6c. Lifecycle perfection — what "green" MISSED (workstream F, the perfection bar)

**Why this exists:** the 2026-06-23 single-node gate went 5×-green but is **NOT** the

131
docs/bitcoin-version-bulletproof-rollout.md
Normal file
@@ -0,0 +1,131 @@
# Bitcoin Multi-Version — Bulletproofing & Rollout (handoff)

> **Status 2026-06-29:** code + images + catalog + frontend DONE on branch
> `bitcoin-version-bulletproof` (base commit `095a76cd`, plus the catalog-generator
> + handoff follow-ups). **.228 is the test node**: binary + frontend + catalog are
> live there; its Knots chainstate is mid-**reindex recovery** (see §5). The fleet
> rollout (OTA binary+frontend, mirror catalog publish, `:latest` repoint) is the
> **coordinated step the other agent owns** — see §4. Pairs with
> `docs/bitcoin-multi-version-design.md` (the original design).

## 1. What was broken (root causes)

User report: "switched Knots to `v29.3.knots20260508`, version didn't update in the UI."
Three **stacked** bugs, plus a data-corruption hazard:

1. **Reconciler reverted the pin.** `prod_orchestrator::sync_quadlet_unit` re-rendered the
   quadlet every reconcile tick using the manifest's `:latest`, ignoring the per-app
   pinned version → any switch silently reverted within one tick.
2. **Entrypoint render bug.** The renderer folded the manifest `entrypoint: ["sh","-lc"]`
   into `Exec=`. That only works when the image ENTRYPOINT is a passthrough shell wrapper.
   The versioned images use `ENTRYPOINT ["bitcoind"]`, so `Exec=sh -lc …` became
   `bitcoind sh -lc …` → `unexpected token 'sh'` → crash loop.
3. **Image USER divergence.** The versioned images were built `USER bitcoin` (uid 1000);
   the legacy `:latest` ran as **root**. Chain data is owned by the `data_uid`
   (host 100101 / container uid 102). Root reads it via `CAP_DAC_OVERRIDE` (granted in the
   manifest); uid-1000 cannot → `Error initializing block database`.
4. **Data hazard (already hit on .228).** Repeated failed starts under mixed UIDs left
   bitcoind's two LevelDBs (`blocks/index/` + `chainstate/`) truncated to KB stubs while
   the raw `blocks/blk*.dat` (797 GB) stayed intact. Recovery = `bitcoind -reindex` from
   local blocks (no re-download). The uniform-root image fix (below) removes the mixed-UID
   cause going forward; the proper switch flow was already data-safe (600s stop grace,
   clean stop→rm→recreate, conflict-stops the other impl — they share port 8332 + datadir
   `/var/lib/archipelago/bitcoin`).

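The entrypoint render bug (item 2) can be illustrated with a small Python model. This is a sketch under stated assumptions, not the project's Rust renderer: `render_folded`, `render_split`, and `effective_argv` are hypothetical names, the sample command string is invented for illustration, and the model only assumes that an explicit `Entrypoint=` overrides the image ENTRYPOINT while `Exec=` arguments are appended to whichever entrypoint is in effect.

```python
import shlex


def render_folded(entrypoint, cmd):
    """Old behavior (hypothetical model): fold the manifest entrypoint into Exec=."""
    return {"Exec": shlex.join(entrypoint + cmd)}


def render_split(entrypoint, cmd):
    """Fixed behavior (hypothetical model): first element -> Entrypoint=, rest + cmd -> Exec=."""
    return {"Entrypoint": entrypoint[0], "Exec": shlex.join(entrypoint[1:] + cmd)}


def effective_argv(unit, image_entrypoint):
    """argv the container ends up running: Entrypoint= wins over the image ENTRYPOINT."""
    head = [unit["Entrypoint"]] if "Entrypoint" in unit else list(image_entrypoint)
    return head + shlex.split(unit["Exec"])


manifest_entrypoint = ["sh", "-lc"]
cmd = ["exec bitcoind -printtoconsole"]  # illustrative command, not from the manifest
image_entrypoint = ["bitcoind"]          # the versioned images use ENTRYPOINT ["bitcoind"]

print(effective_argv(render_folded(manifest_entrypoint, cmd), image_entrypoint))
print(effective_argv(render_split(manifest_entrypoint, cmd), image_entrypoint))
```

In this model the folded unit yields an argv that starts with `bitcoind sh -lc`, which matches the crash described above, while the split unit starts with `sh -lc` regardless of the image ENTRYPOINT.
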
## 2. What was fixed (all on the branch)

- **Renderer** (`core/archipelago/src/container/`):
  - `prod_orchestrator.rs`: factored `resolve_catalog_image()` (catalog/pinned-version →
    image) and call it in BOTH `install_fresh` and `sync_quadlet_unit` — the pin now
    survives reconcile.
  - `quadlet.rs`: emit a real `Entrypoint=<first>` + `Exec=<rest+cmd>` instead of folding;
    `exec_changed` now also diffs `Entrypoint=` so the recreate fires. Validated against
    the live podman 5.4.2 quadlet generator.
- **Images** (`scripts/build-bitcoin-image.sh`, `apps/bitcoin-{knots,core}/Dockerfile`):
  removed `USER bitcoin` → run as **container-root** like legacy (still 100% rootless:
  container-root maps to the unprivileged host service user; `CAP_DAC_OVERRIDE` from the
  manifest lets bitcoind read the `data_uid`-owned datadir). **All** images rebuilt root +
  pushed to the mirror (`146.59.87.168:3000/lfg2025`):
  - Knots: `29.3.knots20260508`, `29.3.knots20260507`, `29.3.knots20260210`, `29.2.knots20251110`
  - Core: `25.2 26.2 27.2 28.4 29.2 29.3 30.2 31.0` + `latest` (→31.0)
- **Catalog** (`scripts/generate-app-catalog.sh` VERSIONS map + regenerated
  `releases/app-catalog.json`): Knots & Core `versions[]` populated; the generator now
  forces top-level `version` == the `default` entry's version (the `169ff2e2` invariant)
  regardless of the manifest version. Knots `latest` entry points at the newest **dated**
  image (`29.3.knots20260508`) so "Always use latest" = newest on fixed-binary nodes.
- **Frontend** (`neode-ui/`):
  - `AppSidebar.vue`: rename the latest option to **"Always use the latest version"**
    (no `v` prefix), fix right padding, and `pickSelection()` guarantees the bound value is
    a real option (fixes the blank dropdown).
  - New `components/InstallVersionModal.vue`: full-screen version chooser shown from the
    App Store / Discover **card** install button for multi-version apps — app icon +
    "Install <name>", latest pre-selected. Wired in `Discover.vue handleInstall`.
  - i18n keys: `appDetails.alwaysUseLatestVersion`, `marketplace.installModalTitle/Hint`.

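The catalog rules above can be sketched in Python as follows. This is an illustration, not the project's code: `default_matches_top_level` and `resolve_image` are hypothetical names, the `manifest_image` parameter stands in for the manifest's floating `:latest` reference, and falling back to the manifest image when a pin is unknown is an assumption. The real logic lives in the Rust `resolve_catalog_image()` and in `scripts/generate-app-catalog.sh`.

```python
REGISTRY = "146.59.87.168:3000/lfg2025"

# Trimmed catalog entry shaped like the generator's output (two versions only).
knots = {
    "version": "latest",
    "versions": [
        {"version": "latest",
         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260508", "default": True},
        {"version": "29.3.knots20260507",
         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260507"},
    ],
}


def default_matches_top_level(app):
    """Invariant: exactly one default entry, and its version equals the top-level version."""
    defaults = [v for v in app["versions"] if v.get("default")]
    return len(defaults) == 1 and defaults[0]["version"] == app["version"]


def resolve_image(app, manifest_image, pinned_version=None):
    """Pinned version -> catalog image; unpinned (or unknown pin) -> manifest image."""
    if pinned_version is not None:
        for entry in app["versions"]:
            if entry["version"] == pinned_version:
                return entry["image"]
    return manifest_image  # assumption: unknown pins fall back to the manifest image


manifest_image = f"{REGISTRY}/bitcoin-knots:latest"
print(default_matches_top_level(knots))
print(resolve_image(knots, manifest_image, "29.3.knots20260507"))
print(resolve_image(knots, manifest_image))
```

Calling the same resolver from both the install path and the reconcile path is what keeps a pin from being reverted on the next tick; the sketch only shows the lookup itself.
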
## 3. Current live state on .228 (test node)

- Binary with both renderer fixes: **deployed** (`/usr/local/bin/archipelago`).
- New frontend bundle: **deployed** to `/opt/archipelago/web-ui` (hard-refresh to see it).
- Updated catalog: placed at `/var/lib/archipelago/app-catalog.json` (local override —
  will refresh from the mirror's OLDER copy at the next hourly fetch until §4 publishes it).
- Knots: `bitcoin-knots` service held **stopped** (`package.stop`, user_stopped);
  a detached `bitcoin-knots-reindex` container is rebuilding the index+UTXO (§5).

## 4. Remaining — coordinated fleet rollout (OTHER AGENT)

Do this together with the other workstream's release, AFTER both are ready:

1. **Merge** branch `bitcoin-version-bulletproof` into the release line.
2. **Build + OTA** the binary + frontend (these carry the renderer fix + UI). The renderer
   fix is a **hard prerequisite** for the new images everywhere — see fleet-safety below.
3. **Publish the catalog** to the mirror (push `releases/app-catalog.json` to gitea-vps2
   `main`, the raw URL nodes fetch hourly). The current catalog is **fleet-safe even before
   the binary lands**: unpinned/auto-update nodes resolve via the manifest's floating
   `:latest` (still the legacy image); only explicit version selection (needs the new UI)
   uses the new root images.
4. **Only AFTER the binary is fleet-wide:** optionally repoint the `bitcoin-knots:latest`
   tag → `29.3.knots20260508` (root) and simplify the catalog `latest` entry back to the
   `:latest` tag. **Do NOT repoint `:latest` before then** — old-binary nodes fold
   `Exec=sh -lc …` and would crash on an `ENTRYPOINT ["bitcoind"]` image. (Core never
   worked on old binaries — it always shipped `ENTRYPOINT ["bitcoind"]` — so Core has no
   such constraint.)
5. **Verify the full switch matrix** on a healthy node (§6).

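The sequencing rule in step 4 amounts to a gate over the fleet inventory. A minimal sketch, assuming some inventory can report per node whether the deployed binary carries the renderer fix; the `Node` type, its `has_renderer_fix` flag, and the node names are hypothetical and not part of the project:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str               # hypothetical node label
    has_renderer_fix: bool  # hypothetical flag: binary emits Entrypoint= + Exec=


def repoint_blockers(nodes):
    """Nodes that would crash-loop if :latest moved to an ENTRYPOINT ["bitcoind"] image."""
    return [n.name for n in nodes if not n.has_renderer_fix]


def safe_to_repoint_latest(nodes):
    """Step 4 gate: every node must already run the fixed binary."""
    return not repoint_blockers(nodes)


fleet = [Node("node-a", True), Node("node-b", False)]
print(repoint_blockers(fleet))
print(safe_to_repoint_latest(fleet))
```

Publishing the catalog (step 3) needs no such gate, because unpinned nodes keep resolving the legacy `:latest` image until the tag itself is moved.
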
## 5. Finishing .228's reindex (the remaining test-node task)

The detached `bitcoin-knots-reindex` container runs the new **root** `29.3.knots20260508`
image with `-reindex -server=0` against `/var/lib/archipelago/bitcoin`. It holds the datadir
lock, so the managed service (held stopped) can't collide. When it has connected blocks up
to ~the prior tip (height ≥ ~955800) it's done; then:

```sh
# on .228 (SSH/sudo/UI pw all: ThisIsWeb54321@)
podman stop -t 600 bitcoin-knots-reindex && podman rm bitcoin-knots-reindex
# start the managed service via RPC (sets desired=running, clears user_stopped):
#   package.start {id: bitcoin-knots}   (POST https://127.0.0.1/rpc/v1, CSRF: echo csrf_token cookie as X-CSRF-Token)
# verify:
podman exec bitcoin-knots sh -lc '$(command -v bitcoind) --version | head -1'   # → v29.3.knots20260508
# RPC up → the Bitcoin UI populates; it syncs the gap to tip.
```

The "Bitcoin RPC connection refused (127.0.0.1:8332)" the UI shows is EXPECTED until this
swap (reindex runs with RPC off).

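Because the reindex container runs with RPC off, the completion criterion above (height ≥ ~955800) has to be read from bitcoind's log output rather than from RPC. A minimal sketch, assuming the container's output contains bitcoind's usual `UpdateTip: new best=<hash> height=<n> …` lines; how the log text is obtained is left open, the helper names are hypothetical, and the sample lines use placeholder hashes for illustration only:

```python
import re

# Matches the height field of bitcoind's "UpdateTip" log lines.
HEIGHT_RE = re.compile(r"UpdateTip: new best=\S+ height=(\d+)")


def last_connected_height(log_lines):
    """Return the height from the most recent UpdateTip line, or None if there is none."""
    height = None
    for line in log_lines:
        match = HEIGHT_RE.search(line)
        if match:
            height = int(match.group(1))
    return height


def reindex_done(log_lines, target_height=955_800):
    """True once the last connected block reaches the approximate prior tip."""
    height = last_connected_height(log_lines)
    return height is not None and height >= target_height


# Illustrative lines only; the hashes are placeholders, not real block hashes.
sample = [
    "2026-06-29T10:00:00Z Reindexing block file blk00000.dat...",
    "2026-06-29T18:00:00Z UpdateTip: new best=00aa height=955799 version=0x20000000",
    "2026-06-29T18:00:01Z UpdateTip: new best=00bb height=955800 version=0x20000000",
]
print(last_connected_height(sample))
print(reindex_done(sample))
```

The threshold is approximate by the document's own wording, so treat a result near the target as a cue to check manually before stopping the container.
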
## 6. Switch-matrix test plan (what "bulletproof" must prove)

On a healthy node, each step must end with bitcoind running + RPC answering + syncing, with
NO `Error initializing block database` and NO data loss:
- Knots: switch `latest` → `29.3.knots20260507` → `29.3.knots20260210` → back to `latest`.
- Core: install `latest`; switch `31.0` → `28.4.0`.
- **Knots ↔ Core** (shared datadir/port): Knots→Core upgrade path (Core ≥ data version) and
  the reverse. **Cross-major DOWNGRADES** (e.g. 29.x data → Core 28.4) legitimately need a
  reindex — the UI already surfaces a downgrade warning; confirm it does and that confirming
  reindexes cleanly rather than crash-looping.
- Reboot survival after each switch.

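When scripting the matrix, it helps to know in advance which steps are expected to trigger the downgrade warning. A minimal sketch, assuming "cross-major downgrade" means the target's leading version number is lower than that of the version that last wrote the datadir; the function names are hypothetical, and the `latest` alias must be resolved to a concrete version before calling them:

```python
def major(version):
    """Leading numeric component, e.g. '29.3.knots20260508' -> 29, '28.4.0' -> 28."""
    return int(version.split(".", 1)[0])


def is_cross_major_downgrade(data_version, target_version):
    """True when the target's major version is lower than the datadir's."""
    return major(target_version) < major(data_version)


steps = [
    ("29.3.knots20260508", "29.3.knots20260507"),  # Knots, same major
    ("31.0", "28.4.0"),                            # Core downgrade across majors
    ("29.3.knots20260508", "28.4.0"),              # Knots data -> Core 28.4
    ("28.4.0", "31.0"),                            # upgrade
]
for data_version, target_version in steps:
    print(data_version, "->", target_version,
          is_cross_major_downgrade(data_version, target_version))
```

This only predicts where the warning should appear; whether a given same-major switch is actually data-compatible is what the test run itself has to establish.
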
## 7. Notes / assumptions

- **"29.2"** in the request doesn't exist as a Knots build (404 upstream); added as **Bitcoin
  Core 29.2** (exists). Revisit if a Knots 29.2 was meant.
- Reindex is unavoidable ONLY because .228's index was already corrupted by the pre-fix
  crash loop; a normal switch on the fixed binary does NOT reindex.
- Creds for .228: SSH/sudo + UI/RPC all `ThisIsWeb54321@`.
@@ -1,6 +1,6 @@
 {
   "schema": 1,
-  "updated": "2026-06-28",
+  "updated": "2026-06-29",
   "apps": {
     "adguardhome": {
       "version": "v0.107.55",
@@ -343,7 +343,7 @@
       "description": "Reference Bitcoin Core node with dynamic prune/full-mode startup based on host disk.",
       "container_name": "bitcoin-core",
       "container": {
-        "image": "146.59.87.168:3000/lfg2025/bitcoin:latest",
+        "image": "146.59.87.168:3000/lfg2025/bitcoin:28.4",
         "pull_policy": "if-not-present",
         "network": "archy-net",
         "entrypoint": [
@@ -182,10 +182,12 @@ VERSIONS = {
     # "Version & Updates" card. Add the next release by building its image then
     # prepending it here.
     "bitcoin-core": [
+        {"version": "latest", "image": f"{REGISTRY}/bitcoin:latest", "default": True},
         {"version": "31.0", "image": f"{REGISTRY}/bitcoin:31.0"},
         {"version": "30.2", "image": f"{REGISTRY}/bitcoin:30.2"},
         {"version": "29.3", "image": f"{REGISTRY}/bitcoin:29.3"},
-        {"version": "28.4.0", "image": f"{REGISTRY}/bitcoin:28.4", "default": True},
+        {"version": "29.2", "image": f"{REGISTRY}/bitcoin:29.2"},
+        {"version": "28.4.0", "image": f"{REGISTRY}/bitcoin:28.4"},
         {"version": "27.2", "image": f"{REGISTRY}/bitcoin:27.2"},
         {"version": "26.2", "image": f"{REGISTRY}/bitcoin:26.2", "deprecated": True},
         {"version": "25.2", "image": f"{REGISTRY}/bitcoin:25.2", "deprecated": True},
@@ -196,16 +198,35 @@ VERSIONS = {
     # top-level catalog version (L167-168) or the card can't reach "latest" and
     # selecting the highlighted default would instead pin+recreate. Pinning
     # 29.3.knots20260508 moves a runner off the floating tag.
+    # `latest` is the default and points at the NEWEST published dated image
+    # (not the bare :latest tag) so "Always use the latest version" installs the
+    # newest build on fixed-binary nodes, while UNPINNED nodes still resolve via
+    # the manifest's floating :latest tag (kept on the legacy image until the
+    # entrypoint-render fix is fleet-deployed — see
+    # docs/bitcoin-version-bulletproof-rollout.md).
     "bitcoin-knots": [
         {"version": "latest",
-         "image": f"{REGISTRY}/bitcoin-knots:latest", "default": True},
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260508", "default": True},
         {"version": "29.3.knots20260508",
          "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260508"},
+        {"version": "29.3.knots20260507",
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260507"},
+        {"version": "29.3.knots20260210",
+         "image": f"{REGISTRY}/bitcoin-knots:29.3.knots20260210"},
+        {"version": "29.2.knots20251110",
+         "image": f"{REGISTRY}/bitcoin-knots:29.2.knots20251110"},
     ],
 }
 for app_id, versions in VERSIONS.items():
     if app_id in apps and versions:
         apps[app_id]["versions"] = versions
+        # The default/latest entry MUST equal the app's top-level catalog
+        # `version` (commit 169ff2e2) so selecting the highlighted default
+        # un-pins / tracks latest instead of pinning+recreating. Enforce it here
+        # rather than relying on the manifest version matching.
+        default_entry = next((v for v in versions if v.get("default")), None)
+        if default_entry:
+            apps[app_id]["version"] = default_entry["version"]
 
 catalog = {
     "schema": 1,