# Bitcoin Multi-Version: Bulletproofing & Rollout (handoff)

> **Status 2026-06-29:** code + images + catalog + frontend DONE on branch
> `bitcoin-version-bulletproof` (base commit `095a76cd`, plus the catalog-generator
> and handoff follow-ups). **.228 is the test node**: binary + frontend + catalog are
> live there; its Knots chainstate is mid-**reindex recovery** (see §5). The fleet
> rollout (OTA binary+frontend, mirror catalog publish, `:latest` repoint) is the
> **coordinated step the other agent owns** (see §4). Pairs with
> `docs/bitcoin-multi-version-design.md` (the original design).

## 1. What was broken (root causes)

User report: "switched Knots to `v29.3.knots20260508`, version didn't update in the UI."
Three **stacked** bugs, plus a data-corruption hazard:

1. **Reconciler reverted the pin.** `prod_orchestrator::sync_quadlet_unit` re-rendered the
   quadlet on every reconcile tick using the manifest's `:latest`, ignoring the per-app
   pinned version → any switch silently reverted within one tick.
2. **Entrypoint render bug.** The renderer folded the manifest `entrypoint: ["sh","-lc"]`
   into `Exec=`. That only works when the image ENTRYPOINT is a passthrough shell wrapper.
   The versioned images use `ENTRYPOINT ["bitcoind"]`, so `Exec=sh -lc …` became
   `bitcoind sh -lc …` → `unexpected token 'sh'` → crash loop.
3. **Image USER divergence.** The versioned images were built with `USER bitcoin` (uid 1000);
   the legacy `:latest` ran as **root**. Chain data is owned by the `data_uid`
   (host 100101 / container uid 102). Root reads it via `CAP_DAC_OVERRIDE` (granted in the
   manifest); uid 1000 cannot → `Error initializing block database`.
4. **Data hazard (already hit on .228).** Repeated failed starts under mixed UIDs left
   bitcoind's two LevelDBs (`blocks/index/` + `chainstate/`) truncated to KB stubs while
   the raw `blocks/blk*.dat` (797 GB) stayed intact. Recovery = `bitcoind -reindex` from
   local blocks (no re-download). The uniform-root image fix (§2) removes the mixed-UID
   cause going forward. The proper switch flow was already data-safe: 600s stop grace,
   clean stop→rm→recreate, and a conflict-stop of the other implementation, since both
   share port 8332 and the datadir `/var/lib/archipelago/bitcoin`.

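Bug 2 is easiest to see by composing the final argv by hand. The sketch below is illustrative only: `final_argv` is a hypothetical helper that does not exist in the codebase, and `'<cmd>'` is a placeholder for the elided command. It assumes the usual container rule that the runtime command is appended to the image ENTRYPOINT.

```sh
# Hypothetical helper, not part of the codebase. It models the container rule
# that the process argv is the image ENTRYPOINT followed by the Exec= command.
# An empty entrypoint stands in for the legacy passthrough wrapper, which
# simply runs whatever arguments it receives.
final_argv() {
    entrypoint=$1   # image ENTRYPOINT, e.g. "bitcoind"
    exec_line=$2    # command the quadlet passes via Exec=
    if [ -n "$entrypoint" ]; then
        printf '%s %s\n' "$entrypoint" "$exec_line"
    else
        printf '%s\n' "$exec_line"
    fi
}

# Legacy image (passthrough wrapper): the folded Exec= runs as intended.
final_argv "" "sh -lc '<cmd>'"
# Versioned image (ENTRYPOINT ["bitcoind"]): bitcoind receives "sh" as an argument.
final_argv "bitcoind" "sh -lc '<cmd>'"
```

The first call prints `sh -lc '<cmd>'`; the second prints `bitcoind sh -lc '<cmd>'`, which is the argv that produced the crash loop.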
## 2. What was fixed (all on the branch)

- **Renderer** (`core/archipelago/src/container/`):
  - `prod_orchestrator.rs`: factored out `resolve_catalog_image()` (catalog/pinned-version →
    image) and call it in BOTH `install_fresh` and `sync_quadlet_unit`, so the pin now
    survives reconcile.
  - `quadlet.rs`: emit a real `Entrypoint=<first>` + `Exec=<rest+cmd>` instead of folding;
    `exec_changed` now also diffs `Entrypoint=` so the recreate fires. Validated against
    the live podman 5.4.2 quadlet generator.
- **Images** (`scripts/build-bitcoin-image.sh`, `apps/bitcoin-{knots,core}/Dockerfile`):
  removed `USER bitcoin` → run as **container-root** like legacy. This is still 100% rootless:
  container-root maps to the unprivileged host service user, and `CAP_DAC_OVERRIDE` from the
  manifest lets bitcoind read the `data_uid`-owned datadir. **All** images were rebuilt as root
  and pushed to the mirror (`146.59.87.168:3000/lfg2025`):
  - Knots: `29.3.knots20260508`, `29.3.knots20260507`, `29.3.knots20260210`, `29.2.knots20251110`
  - Core: `25.2`, `26.2`, `27.2`, `28.4`, `29.2`, `29.3`, `30.2`, `31.0` + `latest` (→ `31.0`)
- **Catalog** (`scripts/generate-app-catalog.sh` VERSIONS map + regenerated
  `releases/app-catalog.json`): Knots & Core `versions[]` populated; the generator now
  forces top-level `version` == the `default` entry's version (the `169ff2e2` invariant)
  regardless of the manifest version. The Knots `latest` entry points at the newest **dated**
  image (`29.3.knots20260508`) so "Always use latest" = newest on fixed-binary nodes.
- **Frontend** (`neode-ui/`):
  - `AppSidebar.vue`: rename the latest option to **"Always use the latest version"**
    (no `v` prefix), fix right padding, and have `pickSelection()` guarantee the bound value is
    a real option (fixes the blank dropdown).
  - New `components/InstallVersionModal.vue`: full-screen version chooser shown from the
    App Store / Discover **card** install button for multi-version apps, with the app icon,
    "Install <name>", and latest pre-selected. Wired in `Discover.vue handleInstall`.
  - i18n keys: `appDetails.alwaysUseLatestVersion`, `marketplace.installModalTitle/Hint`.

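The `Entrypoint=` / `Exec=` split described for `quadlet.rs` can be sketched in shell. This is an illustration under stated assumptions, not the Rust implementation: `render_exec_lines` is a hypothetical name, `'<cmd>'` is a placeholder for the manifest command, and quoting rules for real unit files are ignored.

```sh
# Illustrative only: the real renderer is Rust code in quadlet.rs (not shown).
# Usage: render_exec_lines <entrypoint element>... -- <command string>
render_exec_lines() {
    first=$1
    shift
    line=""
    # Remaining entrypoint elements move to the front of Exec=.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        line="${line}${line:+ }$1"
        shift
    done
    if [ "$#" -gt 0 ]; then shift; fi   # drop the "--" separator
    # Append the manifest command, if any.
    if [ "$#" -gt 0 ]; then line="${line}${line:+ }$*"; fi
    printf 'Entrypoint=%s\n' "$first"
    printf 'Exec=%s\n' "$line"
}

render_exec_lines sh -lc -- "'<cmd>'"
```

For the manifest `entrypoint: ["sh","-lc"]` this prints `Entrypoint=sh` followed by `Exec=-lc '<cmd>'`, so the image's own `ENTRYPOINT ["bitcoind"]` is overridden rather than prefixed.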
## 3. Current live state on .228 (test node)

- Binary with both renderer fixes: **deployed** (`/usr/local/bin/archipelago`).
- New frontend bundle: **deployed** to `/opt/archipelago/web-ui` (hard-refresh to see it).
- Updated catalog: placed at `/var/lib/archipelago/app-catalog.json` as a local override.
  The next hourly fetch will replace it with the mirror's OLDER copy until §4 publishes the new one.
- Knots: `bitcoin-knots` service held **stopped** (`package.stop`, user_stopped);
  a detached `bitcoin-knots-reindex` container is rebuilding the index+UTXO (§5).

## 4. Remaining: coordinated fleet rollout (OTHER AGENT)

Do this together with the other workstream's release, AFTER both are ready:

1. **Merge** branch `bitcoin-version-bulletproof` into the release line.
2. **Build + OTA** the binary + frontend (these carry the renderer fix + UI). The renderer
   fix is a **hard prerequisite** for the new images everywhere; see the fleet-safety notes
   in steps 3 and 4.
3. **Publish the catalog** to the mirror (push `releases/app-catalog.json` to gitea-vps2
   `main`, the raw URL nodes fetch hourly). The current catalog is **fleet-safe even before
   the binary lands**: unpinned/auto-update nodes resolve via the manifest's floating
   `:latest` (still the legacy image); only explicit version selection (needs the new UI)
   uses the new root images.
4. **Only AFTER the binary is fleet-wide:** optionally repoint the `bitcoin-knots:latest`
   tag → `29.3.knots20260508` (root) and simplify the catalog `latest` entry back to the
   `:latest` tag. **Do NOT repoint `:latest` before then**: old-binary nodes fold
   `Exec=sh -lc …` and would crash on an `ENTRYPOINT ["bitcoind"]` image. (Core never
   worked on old binaries, because it always shipped `ENTRYPOINT ["bitcoind"]`, so Core has
   no such constraint.)
5. **Verify the full switch matrix** on a healthy node (§6).

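The ordering constraint in step 4 could be enforced with a small gate before anyone repoints `:latest`. The sketch below is hypothetical throughout: this document does not say how nodes report their binary version, and `1.2.0` is a placeholder for whichever release first carries the renderer fix. It relies on `sort -V` for version ordering.

```sh
# version_ge A B: succeeds when version A >= version B (uses `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# fleet_ready MIN V...: succeeds only if at least one version is given and
# every reported version is >= MIN. How versions are collected is out of scope.
fleet_ready() {
    min=$1
    shift
    [ "$#" -gt 0 ] || return 1
    for v in "$@"; do
        version_ge "$v" "$min" || return 1
    done
}

# Placeholder versions: 1.2.0 stands for the first release with the renderer fix.
if fleet_ready 1.2.0 1.2.0 1.2.1 1.10.0; then
    echo "ok to repoint bitcoin-knots:latest"
else
    echo "hold: old binaries still present"
fi
```

With the placeholder inputs shown, every node is at or above `1.2.0`, so the gate prints `ok to repoint bitcoin-knots:latest`; a single older node flips it to the hold message.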
## 5. Finishing .228's reindex (the remaining test-node task)

The detached `bitcoin-knots-reindex` container runs the new **root** `29.3.knots20260508`
image with `-reindex -server=0` against `/var/lib/archipelago/bitcoin`. It holds the datadir
lock, so the managed service (held stopped) can't collide. It is done once it has connected
blocks up to roughly the prior tip (height ≥ ~955800). Then:

```sh
# on .228 (SSH/sudo/UI pw all: ThisIsWeb54321@)
# stop the reindex container with the same 600s grace as the managed flow, then remove it:
podman stop -t 600 bitcoin-knots-reindex && podman rm bitcoin-knots-reindex
# start the managed service via RPC (sets desired=running, clears user_stopped):
#   package.start {id: bitcoin-knots}   (POST https://127.0.0.1/rpc/v1, CSRF: echo csrf_token cookie as X-CSRF-Token)
# verify the running version (the output should report v29.3.knots20260508):
podman exec bitcoin-knots sh -lc '$(command -v bitcoind) --version | head -n 1'
# RPC up → the Bitcoin UI populates; it syncs the gap to tip.
```
The "Bitcoin RPC connection refused (127.0.0.1:8332)" message the UI shows is EXPECTED until
this swap (reindex runs with RPC off).

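Because RPC is off during the reindex, progress has to be read from the log. A minimal sketch, assuming bitcoind's `UpdateTip: … height=N` log lines reach the container's stdout (otherwise feed it `debug.log` from the datadir); `latest_height` and `reindex_reached` are hypothetical helper names.

```sh
# Reads bitcoind log text on stdin and prints the height from the most
# recent "UpdateTip" line (prints nothing if there is none).
latest_height() {
    grep -o 'UpdateTip: .* height=[0-9]*' | sed 's/.*height=//' | tail -n 1
}

# reindex_reached TARGET: succeeds when the latest logged height is >= TARGET.
reindex_reached() {
    target=$1
    h=$(latest_height)
    [ -n "$h" ] && [ "$h" -ge "$target" ]
}

# Intended use on .228 (not executed here):
#   podman logs --tail 200 bitcoin-knots-reindex 2>&1 | reindex_reached 955800 \
#       && echo "reindex has reached the prior tip"
```

The target `955800` is the approximate prior tip quoted above; treat a successful check as a signal to look at the log yourself before stopping the container, not as proof the chainstate is fully flushed.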
## 6. Switch-matrix test plan (what "bulletproof" must prove)

On a healthy node, each step must end with bitcoind running + RPC answering + syncing, with
NO `Error initializing block database` and NO data loss:

- Knots: switch `latest` → `29.3.knots20260507` → `29.3.knots20260210` → back to `latest`.
- Core: install `latest`; switch `31.0` → `28.4`.
- **Knots ↔ Core** (shared datadir/port): Knots→Core upgrade path (Core ≥ data version) and
  the reverse. **Cross-major DOWNGRADES** (e.g. 29.x data → Core 28.4) legitimately need a
  reindex. The UI already surfaces a downgrade warning; confirm that it appears and that
  accepting it reindexes cleanly rather than crash-looping.
- Reboot survival after each switch.

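Two of the pass criteria above can be checked mechanically after each switch: the fatal log line from §1 must be absent, and the two LevelDB directories must not have collapsed to KB stubs. The helpers below are a sketch with hypothetical names; the 1024 KB floor is an arbitrary illustrative threshold, not a value from the codebase.

```sh
# Reads log text on stdin; fails if the fatal DB error from §1 appears.
no_db_init_error() {
    ! grep -q 'Error initializing block database'
}

# leveldb_not_stub DIR [FLOOR_KB]: fails if DIR is smaller than FLOOR_KB.
# The default floor of 1024 KB is illustrative only.
leveldb_not_stub() {
    dir=$1
    floor_kb=${2:-1024}
    size_kb=$(du -sk "$dir" | cut -f1)
    [ "$size_kb" -ge "$floor_kb" ]
}

# Intended use after each switch (not executed here; reading the datadir
# may need elevated privileges because it is owned by the data_uid):
#   podman logs --tail 500 bitcoin-knots 2>&1 | no_db_init_error
#   leveldb_not_stub /var/lib/archipelago/bitcoin/chainstate
#   leveldb_not_stub /var/lib/archipelago/bitcoin/blocks/index
```

These checks complement, and do not replace, confirming that RPC answers and the node is syncing.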
## 7. Notes / assumptions

- **"29.2"** in the request doesn't exist as a Knots build (404 upstream); added as **Bitcoin
  Core 29.2** (exists). Revisit if a Knots 29.2 was meant.
- Reindex is unavoidable ONLY because .228's index was already corrupted by the pre-fix
  crash loop; a normal switch on the fixed binary does NOT reindex.
- Creds for .228: SSH/sudo + UI/RPC all `ThisIsWeb54321@`.