- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
(29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core
(adds 29.2 + a `latest` entry → newest); generator forces top-level
`version` == the default entry's version (the 169ff2e2 invariant) so
regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes,
fixes, current .228 state, the coordinated fleet-rollout steps (incl.
:latest repoint sequencing / fleet-safety), reindex finish procedure, and
the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).
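
The generator invariant described above could be sketched roughly as follows. This is a minimal illustration in Python, not the actual generate-app-catalog.sh logic; the helper name `force_top_level_version` and the `id` field are hypothetical, while `version`, `versions[]` and `default` follow the catalog fields named in these notes.

```python
def force_top_level_version(entry: dict) -> dict:
    """Return a copy of a catalog entry whose top-level `version`
    equals the version of its single `default: true` entry."""
    defaults = [v for v in entry.get("versions", []) if v.get("default")]
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default version, found {len(defaults)}")
    fixed = dict(entry)
    fixed["version"] = defaults[0]["version"]
    return fixed


# Hypothetical entry whose top-level version has drifted from the default.
knots = {
    "id": "bitcoin-knots",
    "version": "29.3.knots20260508",
    "versions": [
        {"version": "latest", "default": True},
        {"version": "29.3.knots20260508"},
        {"version": "29.2.knots20251110"},
    ],
}

fixed = force_top_level_version(knots)
```

Running the helper a second time on its own output changes nothing, which is the property that makes regeneration reproducible.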
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three stacked bugs made "switch version" silently fail or crash-loop, and one
of them (the data-access mismatch fixed under Images below) corrupted a node's
index during recovery attempts.
Backend renderer:
- sync_quadlet_unit ignored the per-app pinned version and re-rendered the
quadlet with the manifest's :latest every reconcile tick, reverting any
switch. Factor the install-time catalog/pin resolution into a shared
resolve_catalog_image() and call it in BOTH install_fresh and
sync_quadlet_unit.
- The renderer folded manifest `entrypoint: ["sh","-lc"]` into Exec=, which
only worked when the image entrypoint was a passthrough shell wrapper. The
versioned images use ENTRYPOINT ["bitcoind"], so Exec=sh -lc ... became
`bitcoind sh -lc ...` and crash-looped. Emit a real Entrypoint= override;
exec_changed now also compares Entrypoint=.
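
As a rough sketch of the entrypoint fix (written in Python for brevity; the real renderer is not shown here, and `render_container_lines`, its parameters and the image name are hypothetical), the manifest entrypoint is emitted as its own `Entrypoint=` key rather than being prepended to `Exec=`. It assumes Quadlet accepts a JSON-array string for `Entrypoint=` and systemd-style quoting for `Exec=`.

```python
import json
import shlex


def render_container_lines(image, entrypoint, command):
    """Render the Image=/Entrypoint=/Exec= lines of a [Container] section."""
    lines = [f"Image={image}"]
    if entrypoint:
        # Override the image ENTRYPOINT explicitly, so an image built with
        # ENTRYPOINT ["bitcoind"] does not end up running `bitcoind sh -lc ...`.
        lines.append("Entrypoint=" + json.dumps(entrypoint))
    if command:
        lines.append("Exec=" + shlex.join(command))
    return lines


def exec_changed(old_lines, new_lines):
    """Compare both Exec= and Entrypoint=, not Exec= alone."""
    keys = ("Exec=", "Entrypoint=")

    def pick(lines):
        return [line for line in lines if line.startswith(keys)]

    return pick(old_lines) != pick(new_lines)


new = render_container_lines("example/bitcoin-core:29.2", ["sh", "-lc"], ["exec bitcoind"])
old = ["Image=example/bitcoin-core:29.2", "Exec='exec bitcoind'"]
```

Here `old` and `new` share the same `Exec=` line and differ only in `Entrypoint=`, so a comparison limited to `Exec=` would miss the change.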
Images:
- Build all bitcoin images (Core + Knots, every version) as container-root
(USER removed) like the legacy :latest image. Chain data is owned by the
data_uid (container uid 102); root reads it via CAP_DAC_OVERRIDE (granted in
the manifest). A non-root USER (the previous uid 1000) can't read existing
chain data → "Error initializing block database". Still fully rootless:
container-root maps to the unprivileged host service user.
Catalog:
- bitcoin-knots versions[]: 29.3.knots20260508/20260507/20260210 +
29.2.knots20251110, "latest" tracking newest.
- bitcoin-core versions[]: add 29.2 + a "latest" entry. All images rebuilt
as container-root and published to the mirror.
Frontend:
- AppSidebar version dropdown: rename the latest option to "Always use the
latest version" (no v prefix), fix right padding, and guarantee the current
selection matches a real option (was rendering blank).
- New InstallVersionModal: a full-screen version chooser opened from the App
Store / Discover install button for multi-version apps (Bitcoin Knots/Core).
It shows the app icon and "Install <name>", with latest pre-selected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The knots versions[] marked 29.3.knots20260508 as default while the
top-level catalog version is the floating 'latest' tag, violating the
generator's own invariant (the default:true entry MUST equal the top-level
version so that selecting it un-pins / tracks latest). Live effect via
package.versions: catalog_default_version='latest', so selecting the
UI-highlighted default actually PINNED and recreated the container (the
opposite of un-pinning), and 'latest' was unreachable from the Version &
Updates card.
Add a 'latest' default entry (== the manifest's floating tag) and keep
29.3.knots20260508 as a pinnable option. Verified on .228: package.versions
now returns default=latest with 2 selectable versions.
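
The pin / un-pin behaviour behind this bug can be illustrated with a small sketch. It assumes the switch logic reduces to a comparison against catalog_default_version; the function name `plan_version_switch` and the returned dictionary shape are hypothetical.

```python
def plan_version_switch(selected: str, catalog_default_version: str) -> dict:
    """Selecting the catalog default un-pins (tracks latest);
    selecting anything else pins that version and recreates the container."""
    if selected == catalog_default_version:
        return {"action": "unpin", "pinned_version": None}
    return {"action": "pin_and_recreate", "pinned_version": selected}


# Before the fix: the UI highlighted 29.3.knots20260508 as default,
# but the top-level catalog version was the floating 'latest' tag.
before = plan_version_switch("29.3.knots20260508", "latest")

# After the fix: the default entry is 'latest', matching the top-level version.
after = plan_version_switch("latest", "latest")
```

With the mismatched default, picking the highlighted entry yields a pin, which is the opposite of what the invariant promises.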
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The archy-mempool-web health_check endpoint used http://localhost:8080.
Inside the frontend image, wget resolves `localhost` to ::1 (IPv6) first,
but nginx binds 0.0.0.0:8080 (IPv4) only -> the baked HealthCmd gets
"connection refused" every probe -> container is perpetually unhealthy ->
the reconciler recreates it forever (observed on .228: mempool container
re-Started every ~3 min, Health=unhealthy). Proven live: in-container
`wget http://localhost:8080/` = refused, `wget http://127.0.0.1:8080/` = OK.
Pin the probe to 127.0.0.1 so it matches nginx's IPv4 bind. Updated both
the source manifest and the embedded copy in releases/app-catalog.json
(the catalog overlay wins over the disk manifest on fleet nodes, so the
catalog copy is the one that actually reaches .228).
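
A minimal sketch of the same rewrite applied mechanically to a probe URL follows. The helper `pin_probe_to_ipv4` is hypothetical (the commit edits the manifest value directly) and it deliberately ignores URLs that carry userinfo.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_probe_to_ipv4(url: str) -> str:
    """Replace a `localhost` host with 127.0.0.1 so the probe cannot
    resolve to ::1 when the server binds IPv4 only."""
    parts = urlsplit(url)
    if parts.hostname != "localhost":
        return url
    netloc = "127.0.0.1"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


probe = pin_probe_to_ipv4("http://localhost:8080")
```

URLs that already name an explicit address are returned unchanged.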
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The mempool manifest + embedded catalog declared the frontend container
port as 4080, but mempool-frontend nginx listens on 8080 (the stack
creates it as -p 4080:8080 with FRONTEND_HTTP_PORT=8080, see
api/rpc/package/stacks.rs). So every reconcile rendered the quadlet as
PublishPort=4080:4080, found that it disagreed with the working 4080:8080
container, and restarted it ("port binding drift" -> "host port 4080 did not
become reachable within 5s" -> "host listener disappeared; restarting") in a
perpetual loop on .228. Correcting the manifest container port to 8080
makes the rendered quadlet match reality so the drift/restart loop stops.
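
The drift comparison can be sketched as below, assuming bindings are plain `host:container` pairs (no bind address or protocol suffix). The function names are hypothetical and this is not the reconciler's actual code.

```python
def render_publish_port(host_port: int, container_port: int) -> str:
    """Render the value of a PublishPort= line from manifest ports."""
    return f"{host_port}:{container_port}"


def parse_publish_port(spec: str) -> tuple:
    """Parse 'host:container' into a pair of integers."""
    host, container = spec.split(":")
    return int(host), int(container)


def has_port_drift(rendered: str, running: str) -> bool:
    """True when the rendered quadlet disagrees with the live container."""
    return parse_publish_port(rendered) != parse_publish_port(running)


running = "4080:8080"                          # what the stack actually created
wrong = render_publish_port(4080, 4080)        # manifest container port 4080
corrected = render_publish_port(4080, 8080)    # manifest container port 8080
```

With the wrong manifest port the check reports drift on every tick; with the corrected port it reports none, so nothing triggers a restart.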
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On NAT'd nodes that can reach the iroh federation neither directly nor
via iroh's public relays, fmcd's embedded iroh networking enters a
relay/hole-punch reconnect hot-loop that pegs its entire CPU allotment
indefinitely (observed ~1 core sustained for 4 days on a Tailscale node,
while LAN nodes that reach the guardian directly stay <3%). fmcd 0.8.0
exposes no iroh/relay knobs, so:
- fmcd-run now samples fmcd's own CPU and restarts it when it stays near
its allotment for ~15 min (a restart demonstrably clears the stuck iroh
state; real work is bursty and never flat-pegs a core for minutes).
- Lower cpu_limit 1 -> 0.25 core so a stuck instance can't starve the
node (steady-state is <3% of a core; joins are brief).
Ships as fmcd:0.8.1 (launcher-only rebuild, same fmcd binary). Bumped the
image pin + cpu_limit in the manifest, image-versions.sh, the embedded
catalog manifest (releases/app-catalog.json), and the UI catalogs.
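
The restart decision could look roughly like the sketch below. It assumes one CPU sample per minute expressed in cores, and a "near the allotment" threshold of 90%; the sample interval, the threshold and the name `should_restart` are assumptions, not values taken from fmcd-run.

```python
def should_restart(samples, cpu_limit, threshold=0.9, window=15):
    """Return True when every one of the last `window` samples sits at or
    above `threshold * cpu_limit`, i.e. the process is flat-pegged.

    samples: CPU usage in cores, oldest first, one per sampling interval.
    """
    if len(samples) < window:
        return False  # not enough history to call it stuck
    recent = samples[-window:]
    return all(s >= threshold * cpu_limit for s in recent)


stuck = [0.24] * 15            # pegged near a 0.25-core limit for 15 samples
bursty = [0.24] * 14 + [0.01]  # real work: one idle sample breaks the streak
```

Requiring every sample in the window to be high is what separates a stuck reconnect loop from bursty legitimate work such as a join.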
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fresh Meshtastic radios ship region-UNSET (RF-silent) and on mismatched
channels, so nodes only ever saw themselves. Bring them to MeshCore parity
using the official Meshtastic admin API:
- Auto-provision LoRa region (set_config, AdminMessage field 34) from a new
mesh-config `lora_region` (e.g. EU_868) when the radio's region differs.
- Auto-provision a shared primary channel (set_channel, field 33) with a
PSK derived deterministically from channel_name, so every node converges on
one mesh — the parity equivalent of MeshCore's named "archipelago" channel.
- Read current region/channel from want_config; only write when different
(no reboot loop); cap attempts so a radio that won't persist can't loop.
- Active NodeInfo advert scaffolding + aggressive serial drain.
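
The convergence and write-guard ideas in the list above can be sketched as follows. The commit does not state how the PSK is derived, so SHA-256 over the channel name is purely an assumption chosen to show determinism; `derive_psk`, `needs_write`, the config dictionary shape and the attempt cap of 3 are likewise hypothetical.

```python
import hashlib


def derive_psk(channel_name: str) -> bytes:
    """Derive a 32-byte key deterministically from the channel name
    (assumed scheme, for illustration only)."""
    return hashlib.sha256(channel_name.encode("utf-8")).digest()


def needs_write(current: dict, desired: dict, attempts: int, max_attempts: int = 3) -> bool:
    """Write only when the radio differs from the desired config,
    and give up once the attempt cap is reached."""
    if attempts >= max_attempts:
        return False
    return current != desired


desired = {"region": "EU_868", "channel": "archipelago", "psk": derive_psk("archipelago")}
fresh_radio = {"region": "UNSET", "channel": "", "psk": b""}
```

Because every node derives the same key from the same name, no key exchange is needed for them to land on one mesh, and the equality check keeps an already-provisioned radio from being rewritten.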
Verified on .116+.228: region+channel persist, discovery works (both see each
other as named reachable contacts), bidirectional RF + sending confirmed.
Receiving in the running driver is still under diagnosis (instrumentation added).
Also removes the unwanted `meshtastic` daemon app from the registry (it was
never meant to be a container — native driver provides system-level support):
deletes apps/meshtastic + catalog entries (app-catalog, neode-ui, releases) +
test refs. Meshtastic stays native, like MeshCore.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Force-add the gitignored releases/app-catalog.json so nodes resolve
146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/app-catalog.json
(currently HTTP 404 → disk-manifest fallback). Embedded-manifest delivery
is default-on: the origin catalog overlay wins, with the disk manifest as
fallback. The catalog is unsigned (the migration window accepts unsigned
catalogs). Includes netbird x3 manifests.
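
The origin-wins overlay with disk fallback can be sketched like this. The function `resolve_manifest` and the dictionary shapes are hypothetical, and an origin catalog of `None` stands in for the HTTP 404 case.

```python
def resolve_manifest(app_id, origin_catalog, disk_manifests):
    """Prefer the manifest embedded in the origin catalog; fall back to
    the on-disk manifest when the catalog is missing or lacks the app."""
    if origin_catalog is not None:
        embedded = origin_catalog.get(app_id)
        if embedded is not None:
            return embedded, "origin"
    if app_id in disk_manifests:
        return disk_manifests[app_id], "disk"
    raise KeyError(app_id)


disk = {"mempool": {"container_port": 4080}}
origin = {"mempool": {"container_port": 8080}}

with_catalog = resolve_manifest("mempool", origin, disk)
without_catalog = resolve_manifest("mempool", None, disk)  # e.g. catalog fetch returned 404
```

This ordering is why a fix has to land in the catalog copy to reach fleet nodes: once the catalog resolves, the disk manifest is no longer consulted for apps it covers.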
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>