- generate-app-catalog.sh: VERSIONS map now lists the full Knots set (29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core (adds 29.2 + a `latest` entry → newest); generator forces top-level `version` == the default entry's version (the 169ff2e2 invariant) so regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff (root causes, fixes, current .228 state, the coordinated fleet-rollout steps incl. :latest repoint sequencing / fleet-safety, reindex finish procedure, and the switch-matrix test plan).
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bitcoin Multi-Version — Bulletproofing & Rollout (handoff)
Status 2026-06-29: code + images + catalog + frontend DONE on branch bitcoin-version-bulletproof (base commit 095a76cd, plus the catalog-generator + handoff follow-ups). .228 is the test node: binary + frontend + catalog are live there; its Knots chainstate is mid-reindex recovery (see §5). The fleet rollout (OTA binary + frontend, mirror catalog publish, :latest repoint) is the coordinated step the other agent owns; see §4.
Pairs with docs/bitcoin-multi-version-design.md (the original design).
1. What was broken (root causes)
User report: "switched Knots to v29.3.knots20260508, version didn't update in the UI."
Three stacked bugs, plus a data-corruption hazard:
- Reconciler reverted the pin. prod_orchestrator::sync_quadlet_unit re-rendered the quadlet every reconcile tick using the manifest's :latest, ignoring the per-app pinned version → any switch silently reverted within one tick.
- Entrypoint render bug. The renderer folded the manifest entrypoint: ["sh","-lc"] into Exec=. That only works when the image ENTRYPOINT is a passthrough shell wrapper. The versioned images use ENTRYPOINT ["bitcoind"], so Exec=sh -lc … became bitcoind sh -lc … → unexpected token 'sh' → crash loop.
- Image USER divergence. The versioned images were built USER bitcoin (uid 1000); the legacy :latest ran as root. Chain data is owned by the data_uid (host 100101 / container uid 102). Root reads it via CAP_DAC_OVERRIDE (granted in the manifest); uid-1000 cannot → Error initializing block database.
- Data hazard (already hit on .228). Repeated failed starts under mixed UIDs left bitcoind's two LevelDBs (blocks/index/ + chainstate/) truncated to KB stubs while the raw blocks/blk*.dat (797 GB) stayed intact. Recovery = bitcoind -reindex from local blocks (no re-download). The uniform-root image fix (below) removes the mixed-UID cause going forward; the proper switch flow was already data-safe (600s stop grace, clean stop→rm→recreate, conflict-stops the other impl, since they share port 8332 + datadir /var/lib/archipelago/bitcoin).
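To make the entrypoint render bug concrete, here is a minimal Python model of how a container's final argv is assembled: an entrypoint override replaces the image ENTRYPOINT, and the command is appended to whichever one wins. This is an illustrative sketch, not the renderer's code. The names final_argv, fold and split are hypothetical, and the bitcoind command line is a placeholder.

```python
from typing import List, Optional, Tuple

# (Entrypoint= override or None, Exec= arguments); hypothetical shape for this sketch
Rendered = Tuple[Optional[List[str]], List[str]]


def final_argv(image_entrypoint: List[str],
               override: Optional[List[str]],
               exec_args: List[str]) -> List[str]:
    """Container argv = (override if given, else image ENTRYPOINT) + command."""
    base = image_entrypoint if override is None else override
    return base + exec_args


def fold(manifest_entrypoint: List[str], manifest_cmd: List[str]) -> Rendered:
    """Old behaviour: entrypoint and cmd are both folded into Exec=."""
    return None, manifest_entrypoint + manifest_cmd


def split(manifest_entrypoint: List[str], manifest_cmd: List[str]) -> Rendered:
    """Fixed behaviour: Entrypoint=<first>, Exec=<rest + cmd>."""
    override = manifest_entrypoint[:1] or None  # no override if the manifest has none
    return override, manifest_entrypoint[1:] + manifest_cmd


manifest_entrypoint = ["sh", "-lc"]
manifest_cmd = ["exec bitcoind -printtoconsole"]  # placeholder command
versioned_image_entrypoint = ["bitcoind"]

broken = final_argv(versioned_image_entrypoint, *fold(manifest_entrypoint, manifest_cmd))
fixed = final_argv(versioned_image_entrypoint, *split(manifest_entrypoint, manifest_cmd))
print(broken)  # ['bitcoind', 'sh', '-lc', 'exec bitcoind -printtoconsole']
print(fixed)   # ['sh', '-lc', 'exec bitcoind -printtoconsole']
```

With a passthrough shell wrapper as the image ENTRYPOINT, the folded form still happens to work, which is why the legacy image masked the bug.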
2. What was fixed (all on the branch)
- Renderer (core/archipelago/src/container/):
  - prod_orchestrator.rs: factored resolve_catalog_image() (catalog/pinned-version → image) and call it in BOTH install_fresh and sync_quadlet_unit, so the pin now survives reconcile.
  - quadlet.rs: emit a real Entrypoint=<first> + Exec=<rest+cmd> instead of folding; exec_changed now also diffs Entrypoint= so the recreate fires. Validated against the live podman 5.4.2 quadlet generator.
- Images (scripts/build-bitcoin-image.sh, apps/bitcoin-{knots,core}/Dockerfile): removed USER bitcoin → run as container-root like legacy (still 100% rootless: container-root maps to the unprivileged host service user; CAP_DAC_OVERRIDE from the manifest lets bitcoind read the data_uid-owned datadir). All images rebuilt root + pushed to the mirror (146.59.87.168:3000/lfg2025):
  - Knots: 29.3.knots20260508, 29.3.knots20260507, 29.3.knots20260210, 29.2.knots20251110
  - Core: 25.2 26.2 27.2 28.4 29.2 29.3 30.2 31.0 + latest (→ 31.0)
- Catalog (scripts/generate-app-catalog.sh VERSIONS map + regenerated releases/app-catalog.json): Knots & Core versions[] populated; the generator now forces top-level version == the default entry's version (the 169ff2e2 invariant) regardless of the manifest version. Knots latest entry points at the newest dated image (29.3.knots20260508) so "Always use latest" = newest on fixed-binary nodes.
- Frontend (neode-ui/):
  - AppSidebar.vue: rename the latest option to "Always use the latest version" (no v prefix), fix right padding, and pickSelection() guarantees the bound value is a real option (fixes the blank dropdown).
  - New components/InstallVersionModal.vue: full-screen version chooser shown from the App Store / Discover card install button for multi-version apps (app icon + "Install ", latest pre-selected). Wired in Discover.vue handleInstall.
  - i18n keys: appDetails.alwaysUseLatestVersion, marketplace.installModalTitle/Hint.
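The catalog invariant and the pin-aware image resolution can be sketched together in Python. This is a sketch under stated assumptions, not the generator's or the orchestrator's code: the real app-catalog.json schema is not shown here, so the id / version / versions[] / image / default fields, the repository path under the mirror, and the function names are all hypothetical. Raising on an unknown pin is likewise a choice made for this sketch.

```python
from typing import Optional


def enforce_default_version(entry: dict) -> dict:
    """Return a copy whose top-level version equals the default entry's version."""
    defaults = [v for v in entry["versions"] if v.get("default")]
    if len(defaults) != 1:
        raise ValueError(
            f"{entry['id']}: expected exactly one default entry, found {len(defaults)}"
        )
    return {**entry, "version": defaults[0]["version"]}


def resolve_image(entry: dict, pinned: Optional[str], floating_image: str) -> str:
    """Pinned version -> catalog image; no pin -> the manifest's floating image.

    The point of the fix is that install and reconcile share one resolver,
    so a reconcile tick cannot quietly fall back to the floating tag.
    """
    if pinned is None:
        return floating_image
    for v in entry["versions"]:
        if v["version"] == pinned:
            return v["image"]
    raise KeyError(f"{entry['id']}: pinned version {pinned!r} is not in the catalog")


REPO = "146.59.87.168:3000/lfg2025/bitcoin-knots"  # assumed repo path under the mirror
knots = {
    "id": "bitcoin-knots",
    "version": "stale-manifest-version",  # deliberately wrong before enforcement
    "versions": [
        {"version": "29.3.knots20260508", "image": f"{REPO}:29.3.knots20260508", "default": True},
        {"version": "29.3.knots20260507", "image": f"{REPO}:29.3.knots20260507"},
    ],
}

knots = enforce_default_version(knots)
print(knots["version"])                                            # 29.3.knots20260508
print(resolve_image(knots, "29.3.knots20260507", f"{REPO}:latest"))  # ...:29.3.knots20260507
print(resolve_image(knots, None, f"{REPO}:latest"))                  # ...:latest
```

Because the top-level version is always derived from the default entry rather than copied from the manifest, running the enforcement twice gives the same result, which is what makes regeneration reproducible.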
3. Current live state on .228 (test node)
- Binary with both renderer fixes: deployed (/usr/local/bin/archipelago).
- New frontend bundle: deployed to /opt/archipelago/web-ui (hard-refresh to see it).
- Updated catalog: placed at /var/lib/archipelago/app-catalog.json (local override; it will be overwritten by the mirror's OLDER copy at the next hourly fetch until §4 publishes the new one).
- Knots: bitcoin-knots service held stopped (package.stop, user_stopped); a detached bitcoin-knots-reindex container is rebuilding the index + UTXO (§5).
4. Remaining — coordinated fleet rollout (OTHER AGENT)
Do this together with the other workstream's release, AFTER both are ready:
- Merge branch bitcoin-version-bulletproof into the release line.
- Build + OTA the binary + frontend (these carry the renderer fix + UI). The renderer fix is a hard prerequisite for the new images everywhere; see fleet-safety below.
- Publish the catalog to the mirror (push releases/app-catalog.json to gitea-vps2 main, the raw URL nodes fetch hourly). The current catalog is fleet-safe even before the binary lands: unpinned/auto-update nodes resolve via the manifest's floating :latest (still the legacy image); only explicit version selection (needs the new UI) uses the new root images.
- Only AFTER the binary is fleet-wide: optionally repoint the bitcoin-knots:latest tag → 29.3.knots20260508 (root) and simplify the catalog latest entry back to the :latest tag. Do NOT repoint :latest before then: old-binary nodes fold Exec=sh -lc … and would crash on an ENTRYPOINT ["bitcoind"] image. (Core never worked on old binaries, because it always shipped ENTRYPOINT ["bitcoind"], so Core has no such constraint.)
- Verify the full switch matrix on a healthy node (§6).
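The sequencing rule for the :latest repoint reduces to a simple gate, sketched below in Python. How fleet state is actually collected is not described in this document, so the Node record and its has_renderer_fix flag are hypothetical. The sketch also simplifies by treating every node as a potential consumer of the floating tag.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    has_renderer_fix: bool  # runs the binary that emits Entrypoint= + Exec=


def would_crash_loop(node: Node, image_entrypoint_is_bitcoind: bool) -> bool:
    """An old binary folds `sh -lc ...` into Exec=, which a bitcoind ENTRYPOINT rejects."""
    return image_entrypoint_is_bitcoind and not node.has_renderer_fix


def repoint_blockers(fleet: List[Node]) -> List[str]:
    """Nodes that must be updated before :latest may point at a root image."""
    return [n.name for n in fleet if would_crash_loop(n, image_entrypoint_is_bitcoind=True)]


fleet = [Node("node-a", True), Node("node-b", False)]  # made-up fleet for illustration
blockers = repoint_blockers(fleet)
print(blockers)                                      # ['node-b']
print("safe to repoint" if not blockers else "hold")  # hold
```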
5. Finishing .228's reindex (the remaining test-node task)
The detached bitcoin-knots-reindex container runs the new root 29.3.knots20260508
image with -reindex -server=0 against /var/lib/archipelago/bitcoin. It holds the datadir
lock, so the managed service (held stopped) can't collide. When it has connected blocks up
to ~the prior tip (height ≥ ~955800) it's done; then:
# on .228 (SSH/sudo/UI pw all: ThisIsWeb54321@)
podman stop -t 600 bitcoin-knots-reindex && podman rm bitcoin-knots-reindex
# start the managed service via RPC (sets desired=running, clears user_stopped):
# package.start {id: bitcoin-knots} (POST https://127.0.0.1/rpc/v1, CSRF: echo csrf_token cookie as X-CSRF-Token)
# verify:
podman exec bitcoin-knots sh -lc '$(command -v bitcoind) --version | head -1' # → v29.3.knots20260508
# RPC up → the Bitcoin UI populates; it syncs the gap to tip.
The "Bitcoin RPC connection refused (127.0.0.1:8332)" the UI shows is EXPECTED until this swap (reindex runs with RPC off).
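Since RPC is off during the reindex, progress has to be read from bitcoind's log output instead. The Python sketch below extracts the last connected height from UpdateTip lines and compares it with the approximate prior tip quoted above. It assumes the log text is available (for example from debug.log in the datadir, or from the container's console output if logging goes there), and the sample log line is made up for illustration.

```python
import re
from typing import Optional

# bitcoind logs one "UpdateTip: new best=<hash> height=<n> ..." line per connected block
UPDATE_TIP = re.compile(r"UpdateTip: new best=\S+ height=(\d+)")
PRIOR_TIP = 955_800  # approximate prior tip height from the text


def last_connected_height(log_text: str) -> Optional[int]:
    """Height of the most recent UpdateTip line, or None if none was logged yet."""
    heights = [int(m.group(1)) for m in UPDATE_TIP.finditer(log_text)]
    return heights[-1] if heights else None


def reindex_caught_up(log_text: str, target: int = PRIOR_TIP) -> bool:
    """True once blocks have been connected up to (at least) the target height."""
    height = last_connected_height(log_text)
    return height is not None and height >= target


sample = (
    "2026-06-29T10:00:00Z Reindexing block file blk00001.dat...\n"
    "2026-06-29T18:00:00Z UpdateTip: new best=00000000abc height=955799 version=0x20000000\n"
    "2026-06-29T18:00:01Z UpdateTip: new best=00000000def height=955800 version=0x20000000\n"
)
print(last_connected_height(sample))  # 955800
print(reindex_caught_up(sample))      # True
```

No UpdateTip lines appear while the block index is still being rebuilt from the blk files, so a None result means "not yet connecting blocks", not "failed".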
6. Switch-matrix test plan (what "bulletproof" must prove)
On a healthy node, each step must end with bitcoind running + RPC answering + syncing, with
NO Error initializing block database and NO data loss:
- Knots: switch latest → 29.3.knots20260507 → 29.3.knots20260210 → back to latest.
- Core: install latest; switch 31.0 → 28.4.0.
- Knots ↔ Core (shared datadir/port): Knots→Core upgrade path (Core ≥ data version) and the reverse. Cross-major DOWNGRADES (e.g. 29.x data → Core 28.4) legitimately need a reindex. The UI already surfaces a downgrade warning; confirm that it does, and that confirming reindexes cleanly rather than crash-looping.
- Reboot survival after each switch.
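When scripting the matrix, it helps to know in advance which steps are expected to trigger the downgrade warning. The Python sketch below applies the rule stated above, that a cross-major downgrade needs a reindex, to version strings in the formats used in this document. The function names are hypothetical. The floating latest must be resolved to a concrete version before it can be compared, so the sketch rejects it rather than guessing.

```python
import re
from typing import List, Tuple


def major(version: str) -> int:
    """Leading major number of e.g. '29.3.knots20260507' or '28.4'."""
    m = re.match(r"(\d+)(?:\.|$)", version)
    if m is None:
        raise ValueError(f"cannot compare {version!r}; resolve it to a concrete version first")
    return int(m.group(1))


def expects_reindex(data_version: str, target_version: str) -> bool:
    """Cross-major downgrade: the target's major is lower than the data's."""
    return major(target_version) < major(data_version)


def steps(sequence: List[str]) -> List[Tuple[str, str]]:
    """Consecutive (from, to) pairs of a switch sequence."""
    return list(zip(sequence, sequence[1:]))


# Knots leg of the matrix, with latest already resolved to the newest dated image
knots_leg = ["29.3.knots20260508", "29.3.knots20260507", "29.3.knots20260210", "29.3.knots20260508"]
for src, dst in steps(knots_leg):
    print(src, "->", dst, "reindex expected:", expects_reindex(src, dst))  # all False

print(expects_reindex("29.3.knots20260508", "28.4"))  # True
print(expects_reindex("31.0", "28.4.0"))              # True
```

By this rule the Core 31.0 → 28.4.0 step is also a cross-major downgrade, so the test should expect the warning there as well.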
7. Notes / assumptions
- "29.2" in the request doesn't exist as a Knots build (404 upstream); added as Bitcoin Core 29.2 (exists). Revisit if a Knots 29.2 was meant.
- Reindex is unavoidable ONLY because .228's index was already corrupted by the pre-fix crash loop; a normal switch on the fixed binary does NOT reindex.
- Creds for .228: SSH/sudo + UI/RPC all ThisIsWeb54321@.