check_for_updates now fetches the manifest as raw JSON and runs
trust::verify_detached before parsing: a tampered or wrong-signer
signature rejects the mirror outright, and unsigned manifests are
offered for MANUAL apply only — the 3 AM auto-apply scheduler refuses
them, closing the unattended remote-root hole (§A of the 1.8.0
hardening plan). UpdateState gains manifest_signed so the UI can
surface authenticity.
Publisher side: create-release.sh signs the manifest during the
release (ceremony, mnemonic via TTY/env only), publish-release-assets
hard-refuses to ship an unsigned manifest (grep + new 'ceremony
verify' cryptographic gate), and scripts/sign-manifest.sh covers
re-signing outside a release run.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pin RELEASE_ROOT_PUBKEY_HEX from the 2026-07-02 release-root signing ceremony
(signer did🔑z6MkkidEnEpo6qHMCNSZoNKWtvQvxq3whnaME9wGgEFhq7ur) so nodes verify
the publisher identity of the app-catalog. Sign releases/app-catalog.json in place.
Fix two floats that made the catalog unsignable: archy-btcpay-db manifest version
-> string, fedimint-clientd cpu_limit 0.25 -> 1 (u32). Add scripts/sign-catalog.sh
helper, the 1.8.0 release-hardening plan/tracker, and the commit-and-push project
rule in CLAUDE.md.
Backward-compatible: old binaries still accept the signed catalog; the pinned-anchor
binary ships in the next build/OTA.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes from real fresh-install feedback (Framework node .81) + its log bundle:
Backend:
- websocket: subscribe before initial snapshot — broadcasts in the gap were
silently lost, stranding clients on stale state until a hard refresh
(the "everything needs ctrl-r" bug: My Apps stuck Loading, App Store
stuck Checking, containers-scanned never arriving)
- crash recovery: check the crash marker BEFORE writing our own PID —
recovery had never run on any node (always saw its own PID and skipped);
PID-reuse guard via /proc cmdline
- boot status: pending-boot-starts registry (recovery, stack recovery,
reconciler, adoption) — scanner overlays queued-but-down apps as
Restarting instead of Stopped after a reboot; scanner-authored
Restarting resolves immediately on a settled scan (no transitional wedge)
- install deps: bounded wait (36x5s) when a dependency is installed but
still starting ("Waiting for Bitcoin to start…") instead of instant
rejection; dependency-gate rejections remove the optimistic entry (no
phantom Stopped tile) and surface as a notification
- seed backup: auth.setup persists the onboarding mnemonic as the
encrypted seed backup (reveal previously failed on EVERY node — nothing
ever wrote master_seed.enc); seed.restore stashes too; error sanitizer
lets seed/2FA errors through instead of "Check server logs"
- lnd: bitcoind.rpchost resolved from the running Bitcoin variant
(hardcoded bitcoin-knots broke Core nodes); manifest uses derived_env
- bitcoin status: clean human message for connection-reset/startup; raw
URLs + os-error chains no longer reach the app card
- fedimint-clientd: chown /var/lib/archipelago/fmcd to 1000:1000 (root-
created dir crash-looped the rootless container, EACCES) — first-boot
script + pre-start self-heal
- log volume (>1GB/day on a day-old node): journald caps drop-in (ISO +
bootstrap self-heal), bitcoind -printtoconsole=0 everywhere (90% of the
journal was IBD UpdateTip spam), tracing default debug→info
Frontend:
- Login: Enter advances to confirm field then submits; submit always
clickable with inline errors (was silently disabled on mismatch);
Restart Onboarding needs a confirming second click (the mismatch →
"onboarding restarted" trap)
- sync store: 30s state reconciliation + refetch on re-entrant connect;
20s containers-scanned escape hatch so Checking can never show forever;
fresh empty node reaches the real "no apps yet" state
- intro video: CRF20 re-encode (SSIM 0.988) + faststart — moov was at EOF
so playback needed the full 15MB first (the intro lag)
- backgrounds: 10 heaviest JPEGs → WebP q90 (9.4MB→6.6MB); 7 stayed JPEG
(WebP larger on noisy sources)
- Web5ConnectedNodes: drop unused template ref that failed vue-tsc -b
ISO/kiosk:
- nginx: /assets/ 404s no longer cached immutable for a year; HTTPS block
gained the missing /assets/ location (served index.html as images)
- kiosk: launcher/service spliced from configs/ at ISO build (stale
heredoc force-disabled GPU); MemoryHigh/Max 1200/1500→2200/2800M (kiosk
rode the reclaim throttle = the lag); firmware-intel-graphics +
firmware-amd-graphics (trixie split DMC blobs out of misc-nonfree)
Verified: cargo test 898/898 green, npm run build green with dist
contents confirmed (webp refs, lnd.png, faststart video, new strings).
Handover for ISO build + deploy: docs/HANDOVER-2026-07-02-iso-feedback.md
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
.5's 5x gate done: 5/5 iterations, all technically FAIL per run-gate.sh's
tally but only from .5's permanent pruned-bitcoin ceiling (accepted going
in); down to 2 failures/iteration by the end. Found + fixed a real hang
(lnd cached a dead bitcoin-knots IP after a restart) live mid-run.
Separately found a real boot-reconciler bug via indeedhub going stuck on
.116: any genuinely-installed-but-fully-absent app was left stuck forever
unless it was one of 8 hardcoded "baseline" apps. Fix tracked, code change
in the shared working tree pending test confirmation.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
While the .5 gate ran: confirmed no legacy multi-container stacks remain
(workstream A tail fully closed), reframed the "30 apps zero coverage"
claim as stale (all apps get generic baseline coverage via
all-apps-lifecycle/matrix, real gap is 34 apps lacking app-specific
assertions), and discovered tests/multinode/smoke.sh already exists and
ran it live against .116<->.228: federation pairing/FIPS/content-browse
all confirmed working, but found + root-caused a real tombstone bug
(federation.remove-node silently swallows tombstone-write failures,
letting removed peers get re-added by background sync). Not fixed yet —
federation/trust code, needs a careful fix, not a blind one.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
.198 IBD/pruned blocker → user chose swap over wait/hardware. .116 ruled
out (no bitcoin container), .120 ruled out (reserved for another dev). .5
(archy-x250-beta) is fully synced despite also being sub-1TB/pruned;
bootstrapped bats+jq and launched the 5x destructive gate there.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Reset-failed 2 stale dead-unit records on .198, confirmed nginx lnd proxy
target is correct. Hit a genuine blocker needing a user decision: .198's
448GB disk is below the 1TB archival threshold so it runs pruned bitcoin,
currently only 21% through IBD — the multinode plan's precondition requires
pruned:false + fully synced.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
immich is already fully Quadlet-migrated (verified live on .228, same
install_stack_via_orchestrator primitive as netbird/btcpay). TanStack Query
spike recommends not adopting — no cache/staleness bugs, WS push already
covers hot data. Netbird reinstall adoption-skips-cert-render is correct by
design (adoption only fires when no manifest exists to render from anyway).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
docs/UNIFIED-TASK-TRACKER.md replaces hunting across SESSION-1.8.0-OTA-PROGRESS.md
and PRODUCTION-MASTER-PLAN.md for "what's left" — fastest/simplest tasks first.
Verified against live code/nodes rather than trusting doc text: several previously
"open" items (bind-dir chown, netbird legacy installer, launch-port fallback,
archival-bitcoin manifest field, progress-UI monotonicity, all-apps coverage,
fedimint test coverage, changelog backfill, portainer image pin, grafana quadlet
activation) turned out already shipped or non-issues, and are closed out here.
TESTING.md's release-gate checklist updated to match reality (cargo warnings,
5x gate, changelog already green; multinode/backend-default-flip/tag genuinely open).
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
- prod_orchestrator.rs: the boot reconciler's zombie-guard and start-failed
recreate paths (Created/Stopped/Exited states) had no attempt cap, unlike
health_monitor's independent restart tracker. A container whose entrypoint
fatally crashes right after `podman start` succeeds got stop+remove+
install_fresh'd every ~30s reconcile tick forever (portainer on .198,
2026-07-01: a DB schema newer than the pinned binary could read -- no
amount of recreating fixes that). Added a 5-attempts/30-minute circuit
breaker; once exhausted the container is left alone with an error! log
instead of looping, and an explicit install/start clears the counter.
- content_server.rs: serve_content now prunes a catalog entry whose backing
file is missing on disk, instead of leaving it advertised to every peer
forever with no way to distinguish "gone" from "transient failure."
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
- mempool-api now declares dependencies:[bitcoin:archival] directly, closing a
gap where installing it standalone (a legitimate direct orchestrator-install
target) bypassed the mempool umbrella's pruning gate entirely.
- New durable user-uninstalled marker (crash_recovery.rs, mirrors user_stopped)
fixes required-baseline-app self-heal (bitcoin-knots/electrumx/lnd/mempool/
etc.) resurrecting itself after an explicit uninstall survives a restart or
reboot, since the in-memory disabled set is wiped by every load_manifests().
- installed_version() (set_config.rs) no longer trusts a floating image tag
("latest") as the reported running version -- a stale local :latest cache
reported "latest" forever regardless of what latest had moved on to. Now
falls back to asking the Bitcoin backend directly via `bitcoind --version`
when the tag is floating.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Merges in the meshtastic agent's now-finished work alongside this session's
continuation: stock-peer (3ccc) PKI-capability is now stamped through
get_contacts -> refresh_contacts -> MeshPeer.pkc_capable, so a directed DM to/from
a PKC-capable stock Meshtastic peer correctly shows the E2E pill on the Sent row,
not just received messages. Confirmed live: .198 sees "Meshtastic 3ccc" with
pkc_capable=true.
Also fixes two real interop/correctness bugs found while live-testing the
Reticulum <-> Sideband link:
- Receive: the daemon only ever read LXMF's plain-text content, silently
dropping native FIELD_IMAGE/FIELD_FILE_ATTACHMENTS fields — a stock
Sideband/NomadNet photo vanished into a blank-space message. Now decoded
into the same ContentInline typed envelope our own attachments use.
- Send: images to a non-archy (stock) peer now use native LXMF FIELD_IMAGE
instead of our own opaque CBOR wire format, which Sideband can't decode.
- Root cause of a garbled MC-chunk-fragment bug: TypedEnvelope.v/.sig (the
OUTER wrapper every message type uses) serialized raw bytes as a CBOR
array-of-integers instead of a native byte string, bloating every
message on the wire ~2-3.5x — enough to push even a tiny ReadReceipt
over the 140-byte single-frame chunking threshold. Root-caused by
reading ciborium's deserializer source directly (deserialize_bytes only
works within its internal scratch buffer; deserialize_byte_buf streams
unbounded).
Frontend: consolidated the attach/record buttons into a single animated "+"
menu (was overflowing the compose row).
857/857 tests pass. Verified live across all 5 deploy-roster nodes.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Phase 0 gates #2/#3 (two-node LXMF-over-LoRa, external Sideband interop) passed
on real hardware (.116's flashed Heltec V3 RNode <-> a phone-flashed RNode running
Sideband) — RNS announce, encrypted DM round-trip, and contact binding all verified
live. Fixed two bugs found in the process: the Reticulum send path wasn't stamping
outbound messages as E2E despite LXMF being unconditionally encrypted, and the
per-message transport pill collapsed Meshcore/Meshtastic into one generic "lora"
color instead of distinguishing the three radio transports.
Built on top of that link: a Columba-style image/file send experience —
compression-quality presets with a real transfer-time estimate (mesh.transport-advice,
now device-throughput-aware), receive-side thumbnail previews + auto-render for
already-local attachments, and async voice messages, all reusing the existing
ContentRef/ContentInline attachment pipeline. The headline addition is genuine RNS
Resource transfer support (daemon-side RNS.Link + RNS.Resource, Rust-side
send_resource/resource_recv plumbing, a new "resource-mesh" transport-advice tier)
so compressed photos up to 2MB now actually transfer over LoRa for Reticulum peers
instead of always falling back to Tor past the small inline-chunk cap.
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Inbound Meshtastic text addressed to BROADCAST_NUM (the default public
LongFast channel, or any channel slot) was filed into a per-sender 1:1 DM
thread, so public-channel messages polluted individual people's DM chats
and appeared as if sent directly to the user.
packet_to_inbound_frame now detects `to == BROADCAST_NUM` and emits a new
synthetic RESP_MESHTASTIC_CHANNEL_TEXT frame
([channel_idx][sender_prefix(6)][text]) that the listener files under the
channel thread (contact_id = u32::MAX - idx) while still attributing the
message to its real sender. Directed text (to == our node) still routes to
the DM thread — a regression test locks that split in.
send_channel_text now sets MeshPacket.channel (field 3) so archy actually
transmits on channel 0 (public) instead of ignoring the slot. Mesh.vue keeps
the synthetic "Meshtastic !xxxx" sender id when that is the best identity
available for a stock public-channel device.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- send_message now sends archy↔archy plain text as a native TEXT_MESSAGE_APP
DM (firmware PKC-encrypts E2E), not wrapped in the binary typed envelope
that silently broke archy↔archy LoRa delivery. Archy peers' Sent rows are
marked encrypted so the E2E pill shows; rich typed msgs still use the
typed-wire path.
- Add a software radio-reboot to recover a wedged/RX-deaf radio without
physical access (and for the Device-tab settings panel): driver reboot()
via AdminMessage reboot_seconds=97 (verified vs meshtastic/protobufs),
MeshCommand::RebootRadio, MeshService::reboot_radio, RPC mesh.reboot-radio.
- Handoff doc: docs/SESSION-1.8.0-OTA-PROGRESS.md "RESUME HERE" — RF link is
the proven blocker (radios not hearing each other); modem_preset mismatch
is the prime suspect; on-device Meshtastic-app check + fix plan documented.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- generate-app-catalog.sh: VERSIONS map now lists the full Knots set
(29.3.knots20260508/20260507/20260210 + 29.2.knots20251110) and Core
(adds 29.2 + a `latest` entry → newest); generator forces top-level
`version` == the default entry's version (the 169ff2e2 invariant) so
regeneration is reproducible. releases/app-catalog.json regenerated.
- docs/bitcoin-version-bulletproof-rollout.md: full handoff — root causes,
fixes, current .228 state, the coordinated fleet-rollout steps (incl.
:latest repoint sequencing / fleet-safety), reindex finish procedure, and
the switch-matrix test plan.
- PRODUCTION-MASTER-PLAN.md: link the rollout doc (§6b-bis).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Region (EU_868) + shared channel "archipelago" auto-provisioning shipped in
8fdb45e8 and riding the rolled #9 fleet binary (0060dcd6). Discovery, RF, and
sending verified on .116+.228; the one open blocker is the running driver not
surfacing received messages. Slotted after WS-F #9–11.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Workstream F now in-progress: the immich/grafana uninstall hang →
ghost/stuck-bar/reinstall-block is root-caused (unbounded systemctl/
podman in quadlet::disable_remove) and fixed (71cc9ac4); cascade-
uninstall.bats 7/7 on .228. Records the remaining F items + the pending
gate-wiring decision.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
§10b: replace per-app static launch-port map with a manifest-first +
non-HTTP-port-skipping heuristic (the gitea :2222 class).
§10c: generalize the un-pruned/archival Bitcoin install blocker from a
hardcoded requires_unpruned_bitcoin() match to a manifest-declared
dependency, with a clear pre-install UX.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Banner + §8b: zombie-container guard (0a8db904, live-proven on .228) and
gitea launch-port fix (670ebb06) shipped in binary 040df5ce, rolled to
the fleet. Logs the mempool env-drift recreate-loop and nostr-rs-relay
follow-ups.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 2026-06-23 5×-green gate is DESTRUCTIVE-tier / ~8 core apps only —
it skips uninstall/reinstall (cascade) and has no progress-UI or
all-apps coverage. Manual multinode testing found real bugs it never
ran (immich+grafana uninstall hangs at full-red bar + ghost in My Apps;
grafana reinstall stops; fedimint guardian "waiting for bitcoin sync").
Adds §4 row F, §6b post-deploy order (netbird→Phase-3→F), §6c scope +
observed bugs + definition-of-done, a §5 warning, and §10 backlog to
investigate TanStack-Query/push-based state management for neode-ui.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
run-gate.sh 5×-green on .228, 0 not-ok (gate-5x5.log). Records the
milestone in the header/banner, §4 workstream E, §6 sequence, and §8b;
demotes the priority banner per §6 item 6. Next: bundled testing deploy
(.116/.198 + UX frontend), multinode pass, workstreams B/C/D.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record the overnight 5× outcome (2/5) and the triage: all three
fails were distinct one-offs. iter1 #5 bitcoin-knots = pre-launch
churn (hardened anyway); iter2 #74 + iter5 #73 = one real
orchestrator bug (phantom stack-member injection in
ordered_containers_for_start), now fixed + live-verified on .228.
Update the resume check command to gate-5x4.log.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename run-20x.sh → run-gate.sh, default ARCHY_ITERATIONS 20→5, and scrub
20× references across CLAUDE.md, the master plan, TESTING.md, app-registry
status, the orchestrator/config doc-comments, and the bats suites. Also add
a minimal fail() helper to mempool.bats so guard failures report cleanly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Captures: .228 1x-GREEN (110/110); hardened 5x DETACHED on .228 (/tmp/gate-5x2.log,
nohup — survives terminal close) with the exact check-from-any-machine command; all
shipped code fixes (commits) + deploy state (.228 + .198); node-state fixes NOT in
repo (lnd nginx proxy 8081->18083, home-assistant orphan unit removed, electrumx
re-registered); the run-ON-the-node lesson; and remaining work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Companion app: open every app in the in-app WebView (not just non-iframeable),
carrying the mobile-iframe footer controls into the WebView. Mobile web (PWA):
open tab-apps directly in a new tab. No interstitial on either surface. Touch
points + prior commits (b5a9deb8, d1fbcd9b) noted.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per direction: the gate is now 5x green ON .228 only (run on the node, not via RPC).
Fleet/multinode verification (.198 + others) moved to a new docs/multinode-testing-plan.md
with the bootstrap recipe, per-node preconditions (synced archival bitcoin, no stale
nginx proxy targets, no orphan quadlet units), node roster, and cross-node suites.
Updated CLAUDE.md, master-plan SS5/SS6/SS8b/WS-E, and TESTING.md release gates.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independent companion loop (452f05d8) validated on .228: deleted archy-electrs-ui
recreates in ~10s (was stuck 100s+). Also: companion-survives bats does LOCAL
rm/systemctl --user, so running it from .116 via RPC tests .116's companions with
.116's binary, NOT the remote target — must run ON the target node. Explains the
'failed on both nodes' runs (both silently tested .116).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Last 2 .228 stragglers confirmed load/timing, not bugs: test 31 (companion recreate)
= contamination + ~108s reconcile cadence > 90s window; test 55 (immich restart) =
heavy stack restarts >120s under load but DOES return. Path to literally-green gate
is infra (bitcoin sync, re-quadletize .228) + minor test-window tuning. Optional
product improvement noted: independent ~30s companion-reconcile cadence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
companion::reconcile only recreates a deleted companion unit when its parent
backend is in manifest_ids. On contaminated .228, electrumx ran as plain podman
and was NOT a tracked manifest install (manifest on disk but unloaded), so the
reconciler never iterated it -> archy-electrs-ui companion orphaned. Proven:
package.install electrumx re-registered it + restored the companion. Self-heal
logic is sound; test 31 clears on re-quadletize. electrumx on .228 de-contaminated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>