When a Quadlet unit file already exists for an orchestrator-managed
backend, sync its on-disk bytes against what the current renderer
produces. write_if_changed makes this idempotent — when bytes match,
no IO; when they differ (post-deploy of a renderer change), the file
is rewritten and systemctl --user daemon-reload runs once.
We deliberately do NOT restart the .service when the file changes:
running containers keep their current config until the operator
restarts them. That's the right tradeoff — file updates are cheap and
non-destructive; service restarts are the SIGKILL cascade we're
trying to eliminate.
Why this matters: pre-this-commit, every renderer change required a
fresh package.install RPC per app to take effect. Observed live on
.228 2026-05-02 — the TimeoutStartSec=600 fix shipped in code but
existing units stayed on the old format because nothing triggered a
re-render. Combined with state.json being empty (so the reconciler's
auto-install path didn't fire either), the fix was invisible until
manual unit deletion.
Companions (UI_APP_IDS) are skipped — companion.rs renders those units
with a different shape; syncing here would clobber them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug surfaced live on .228 2026-05-02 — every backend Quadlet unit
(lnd, electrumx, fedimint, btcpay-server, mempool-api, bitcoin-knots)
hit systemd's default 90s start timeout because Notify=healthy makes
systemctl wait for the first green health probe, but
HealthInterval=30s × HealthRetries=3 = 90s minimum even on a healthy
service. Race: timeout fires the moment the third probe MIGHT succeed.
Result was three different post-states (inactive+running, failed+missing,
inactive+stopped) depending on whether systemd's ExecStopPost ran
podman rm before the orchestrator's adoption logic re-grabbed the
container.
Fix: when health is set, render TimeoutStartSec=600 (10 minutes) into
[Service]. Long enough for slow-starting backends (electrumx index
replay, lnd wallet unlock) without being so long that a truly stuck
unit hangs forever. Companions stay unchanged (no health → no override,
default 90s applies).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced by the first real-node validation of Phase 3.2-3.4
on .228 (2026-05-02), both caught before flipping the default.
Bug 1 — translate_health_check double-prefixed http://. Manifests in
the wild carry the scheme inside the endpoint string ("http://localhost:8175"),
and we were prepending another http:// unconditionally. Result on .228:
every backend HealthCmd read `curl -fsS -m 5 http://http://localhost...`,
every probe failed, fedimint hit a 14-restart loop. Now we accept either
form and skip appending hc.path when the endpoint already carries one.
Regression test asserts no double-prefix and that an in-endpoint path
is honoured.
Bug 2 — Phase 3.3 migration ran for UI companions (bitcoin-ui /
electrs-ui / lnd-ui) that have shipped via Quadlet since v1.7.41.
Migration tore down the running companion + raced companion.rs render,
producing "Phase 3.3: re-install archy-bitcoin-ui via Quadlet" reconcile
errors and leaving archy-bitcoin-ui down. Companions now short-circuit
out of migrate_to_quadlet_if_needed before any IO. Also: when try_exists
returns Err for an unrelated reason (permissions, EIO), we now skip
migration instead of treating "I can't tell" as "go ahead and migrate" —
migrating on top of a possibly-existing unit is destructive.
What this does not fix yet:
* the orchestrator's reconciler iterating every manifest in
/opt/archipelago/apps/, not just installed apps. Pre-existing
behavior (also affects the legacy path) — separate scope.
* fedimint /data UID mismatch surfaced when Quadlet started fedimint
fresh. Likely orthogonal — defer.
* no rollback when install_via_quadlet fails after a remove_container.
Tracked as Phase 3.3.1 — defer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QuadletUnit gains an optional HealthSpec; from_manifest translates the
manifest's health_check (tcp/http/cmd) into a HealthCmd= directive and
emits Notify=healthy alongside it. systemctl start <unit>.service then
blocks until the container's first green probe — eliminating the
"container up but RPC not ready" race the orchestrator currently papers
over with post-start polling.
Translation policy:
* tcp, endpoint "host:port" -> nc -z host port
* http, endpoint "host:port", path -> curl -fsS -m 5 http://endpoint<path>
* cmd, endpoint "<shell command>" -> verbatim
* unknown type / malformed endpoint -> None (skip Notify=healthy rather
than emit a HealthCmd that hangs the unit start forever)
Companion units leave health: None and remain byte-identical to before
this PR — the renderer only emits the Health* / Notify= block when set.
+4 quadlet unit tests (19 total). Dropped a never-used test setter that
was generating a dead_code warning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When use_quadlet_backends flips from off → on, existing fleet boxes
have backend containers parented under archipelago.service's cgroup
(the bad shape that triggers FM3 cascade SIGKILL on every archipelago
restart). ensure_running now notices and corrects this:
* If there's already a `<name>.container` unit on disk → no-op
(subsequent reconcile ticks take this fast path).
* Else if a podman container with that name exists → it's a pre-3.3
artifact. Stop+remove it (volumes survive — bind mounts are not
touched by `podman rm`), then write the Quadlet unit, daemon-reload,
and start the new managed service.
* Else → fall through to install_fresh, which already routes through
install_via_quadlet when the flag is on.
The migration is idempotent and self-healing: if a fleet box is
half-migrated (unit on disk but no service active, or service active
but stale unit), the next reconcile tick converges. Bitcoin chain
data, lnd wallet state, and electrumx index all live on host bind
mounts and are unaffected by the container-record swap.
Volume safety audited per backend in `uses_orchestrator_install_flow`
allowlist — every entry mounts its data dir as a host bind mount.
Default still off. To migrate a node:
/etc/archipelago/config.toml: use_quadlet_backends = true
followed by `systemctl restart archipelago` — the next reconcile tick
walks every managed app and migrates each in turn.
Tests: 624 passing, 0 cargo warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prod_orchestrator::install_fresh now branches on the new
Config::use_quadlet_backends flag (default false):
* off (today's production behavior) — unchanged: runtime.create_container
+ start_container, container parented under archipelago.service's
cgroup, FM3 cascade SIGKILL on every archipelago restart.
* on — install_via_quadlet renders the manifest as a Quadlet unit via
QuadletUnit::from_manifest, writes it atomically into
~/.config/containers/systemd/, calls daemon-reload, and starts the
generated <name>.service. Container ends up under user.slice — no
more cgroup parented under archipelago, so archipelago restarts
don't touch the container's lifetime.
Default off so this commit is structurally safe to ship: nothing
changes at runtime until an operator opts in. Flip the default once
tests/lifecycle/run-20x.sh has gone green against the new path on
.228 + .198 (the v1.7.52 release gate).
Plumbing:
* config.rs — `use_quadlet_backends: bool` w/ Default false
* prod_orchestrator.rs — flag stored on the struct, threaded through
new(), with set_use_quadlet_backends(bool) test setter
* prod_orchestrator.rs — install_via_quadlet helper
* dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest /
parse_memory_mib / RestartPolicy::OnFailure now that the call path
exists; if a future revert removes the wiring, the warnings come back.
Tests: 624 passing, cargo check clean (0 warnings). Existing companion
behavior unaffected — render_skips_backend_directives_when_default
still passes byte-equal to before quadlet.rs grew the new fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The QuadletUnit struct now covers everything a backend manifest needs
(ports, environment, devices, add_hosts, entrypoint+command, read-only
root, no_new_privileges, cpu_quota, restart policy choice). Adds
QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed
manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB
forms. The renderer skips empty/false directives so existing companion
units render byte-identically — no behavior change for shipping
companions; the backend renderer is dead code until Phase 3.2 wires it
into the orchestrator.
Eight new unit tests cover:
* parse_memory_mib forms (1024, 512m, 2g, garbage)
* shell_join quoting (whitespace, embedded quotes)
* RestartPolicy → systemd string mapping
* render emits backend directives when set
* render skips them when defaulted (companion regression gate)
* from_manifest happy path on a bitcoin-knots-shaped manifest
* from_manifest read-only volume detection
* from_manifest tmpfs filtering
* end-to-end manifest → render bytes assertion
Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was
implicitly covered before but is now explicit). Cargo warnings: 0.
`from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are
marked allow(dead_code) with explicit references to Phase 3.2 — if
3.2 doesn't wire them, the dead-code warning resurfaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cargo check was showing five real warnings, all genuinely dead:
* container/mod.rs — re-exports compute_container_name, AdoptionReport,
ReconcileAction, ReconcileReport were unused outside
prod_orchestrator. Drop from the pub use line.
* prod_orchestrator — with_runtime + insert_manifest_for_test only exist
for the test module in the same file. Mark them
#[cfg(test)] so they don't appear in release builds.
* async_lifecycle — remove_package_entry has no callers; doc claims
"used for install-failure cleanup" but nothing
cleans up. Delete (10 lines).
* registry.rs — `use tracing::{debug, info};` had no consumers.
* fips.rs — unused-assignment chain on last_status. The poll
loop always sets it on every break path, so the
initial `None` and the unwrap_or_else fallback
were both dead. Refactored to `let after = loop
{ ...; break s; };`.
cargo check is now clean. cargo test --workspace --bins: 614 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3a of the install path consolidation. Two coupled changes:
1. install.rs handle_package_install: gate the legacy "container exists →
adopt + return" probe on !orchestrator_managed. Apps the orchestrator
knows about (bitcoin-knots, bitcoin-core, lnd, electrumx, fedimint,
filebrowser, btcpay-server stack apps, mempool stack apps, plus the
companion UIs that just moved to Quadlet) skip the legacy probe and
fall straight into the orchestrator branch.
The legacy adopt block was returning success on a bare `podman start`
exit-0 — even when the process inside the container crashed seconds
later. That's the .228 "running but unreachable" failure mode. The
orchestrator's ensure_running honors the manifest's health check and
pre-start hooks (e.g. re-renders bitcoin-ui's nginx.conf if the RPC
password rotated), so this is a behavioral upgrade, not just a
refactor.
2. ProdContainerOrchestrator::install: make idempotent. Previously it
blindly called install_fresh which would fail on `podman create` if
the container name already existed. Now it delegates to ensure_running:
- Container Running + healthy → no-op (refresh hooks, restart if
config rewritten)
- Container Stopped/Exited → start (with hook refresh)
- Container missing → install_fresh
- Container in wedged state (Created/Paused/Unknown) → force-recreate
Without this, change #1 would regress every "container already exists"
case for the 18 orchestrator-managed app IDs. With it, install becomes
the single source of truth for "make app X be in the desired state."
Tests: 654 passed across the workspace (614 unit + 37 orchestration + 3
rpc), 0 failures. The 20 prod_orchestrator tests cover the install /
ensure_running / reconcile paths the new install delegates through.
Net delta: install.rs grows by ~30 lines (gating wrapper + comments),
prod_orchestrator.rs grows by ~30 lines (idempotent install body). Both
are temporary — the larger deletions (~1700 lines) come once every app
has been verified through the orchestrator path in subsequent phases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion UI containers (archy-bitcoin-ui, archy-lnd-ui,
archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn
blocks from install.rs. If archipelago crashed mid-spawn or the
container's cgroup was reaped, companions vanished from podman ps -a
and only a manual rm/run could bring them back (the .228 incident).
Now each companion is rendered as a Quadlet .container unit under
~/.config/containers/systemd/, daemon-reloaded, and started via
systemctl --user. systemd owns supervision from that point on:
- archipelago can crash, restart, or be uninstalled without touching
any companion.
- Quadlet's Restart=always + RestartSec=10 handles container exits.
- A 30s reconcile tick in boot_reconciler enumerates expected
companion units and re-installs any whose unit file or service
vanished — defense-in-depth against external tampering.
New module layout:
- container/quadlet.rs: pure unit renderer + atomic write_if_changed
+ systemctl helpers (daemon_reload_user / enable_now / disable_remove
/ is_active). 6 unit tests, no I/O in the renderer.
- container/companion.rs: per-app companion specs, install/remove/
reconcile, image presence (build local first, fall back to insecure
registry only via image_uses_insecure_registry whitelist). 2 tests.
install.rs handle_package_install now ends with a single call to
companion::install_for(package_id), replacing 287 lines of spawn-and-
hope shellouts plus a ~120-line nginx auth-injector helper that worked
around per-node RPC password baking. The helper is gone too — the
pre-start hook renders the per-node nginx.conf to /var/lib/archipelago/
bitcoin-ui/nginx.conf and the Quadlet unit bind-mounts it read-only.
runtime.rs handle_package_uninstall now disables companions before
the container rm loop. Otherwise systemd's Restart=always would
respawn each companion within ~10s of removal.
Tests: 53 container tests pass, including 6 quadlet renderer tests
(host network, bridge network, capability set, atomic write idempotence)
and 2 companion specs (per-app companion lookup, build_unit shape).
boot_reconciler tests gain a #[cfg(test)] without_companion_stage()
flag so the paused-clock fixtures don't race the real systemctl I/O.
A bats regression test (companion-survives-archipelago-restart.bats,
gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode
cannot recur: every installed companion has a unit file, services
stay active across systemctl --user restart archipelago, and a
deleted unit file is recreated within one reconcile tick.
Net delta: +941 / -363, but the +941 is mostly tests (~440 lines)
and the new declarative layer; the imperative tokio::spawn block and
its nginx-auth helper are gone, removing two failure classes
(orphan companions on archipelago crash, and post-start exec races
under tightly-confined cgroups) that previously needed manual SSH
recovery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Snapshots the in-flight hardening work so subsequent reconcile/Quadlet
phases land on a clean before/after diff.
Changes:
- core/container/src/podman_client.rs: image_uses_insecure_registry()
whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner
(23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts
custom networks into the Networks map so containers can join them.
- core/archipelago/src/container/prod_orchestrator.rs:
ensure_container_network() creates per-manifest networks on demand;
apply_data_uid() now goes through host_sudo for mkdir -p + chown so
bind-mount roots get created and chowned without password prompts.
- core/archipelago/src/api/rpc/package/{install,update,stacks}.rs:
podman pull adds --tls-verify=false only for whitelisted registries.
- core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd
override on startup (live nodes carried it from old installers).
- core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod
binaries — it had been silently rerouting volumes to /tmp.
- apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime
so image-layout differences don't break entrypoint.
- scripts/app-catalog-image-smoke-test.py: production catalog/image
smoke test that probes a target node before users click Install.
- .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak.
Removes filebrowser.rs.bak and two stale catalog.json.bak files
(verified identical to live counterparts).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The parser retained any key ending in _IMAGE, so a harmless-looking
variable like NOT_AN_IMAGE="something" would be treated as a pinned
container image. Add a value-shape check: the value must contain both
a registry separator (/) and a tag separator (:) to qualify.
The Rust search path listed /opt/archipelago/image-versions.sh and
scripts/image-versions.sh (repo-relative for dev), but the image
recipe deploys the file to /opt/archipelago/scripts/image-versions.sh.
Production nodes therefore silently failed every lookup: find_file
returned None, load_image_versions returned an empty HashMap, and
both pinned_image_for_app and pinned_images_for_stack returned no
matches.
Symptom on deployed nodes: every container scan emitted
"image-versions.sh not found in any search path" at DEBUG level, and
the version-comparison logic in docker_packages.rs plus the
update-check logic in api/rpc/package/update.rs silently degraded to
no-op — users would not see update-available badges and upgrade RPCs
could not resolve pinned targets.
Fix: put the canonical deployed path first in PATHS, keep the older
/opt/archipelago/image-versions.sh as a fallback for not-yet-updated
nodes, and retain scripts/image-versions.sh as the dev-repo-relative
fallback. Verified on .228: backend now logs "Parsed 57 image
versions from /opt/archipelago/scripts/image-versions.sh" on scan.
Pre-existing test_parse_image_versions failure in this module is
unrelated (the NOT_AN_IMAGE assertion was broken before this change
because the parser's _IMAGE-suffix retain keeps it). Leaving that for
the general cargo-test cleanup pass.
load_registries + load_mirrors normally only ADD missing defaults to
the persisted JSON — explicit removals stick. After retiring the .23
Hetzner VPS we need the opposite: existing nodes have .23 baked into
their saved configs and would spend seconds per install/update timing
out against a dead host until the operator manually removes it via
the Settings UI.
Add a targeted one-time migration in both loaders: if any saved entry
has 23.182.128.160 in its URL, drop it on load and rewrite the file.
This is an exception to the usual "explicit removals stick" rule —
the user never chose to add this mirror, it was a default.
Narrow-scope migration (one hardcoded IP match, no schema version)
because the cost/benefit of a general migration system isn't worth
it for a single decommissioned host. Future retirements can follow
the same pattern.
The Hetzner VPS at 23.182.128.160 was decommissioned. Replace it
everywhere with the OVH VPS at 146.59.87.168, which was previously
the tertiary mirror.
- update.rs: drop DEFAULT_TERTIARY_MIRROR_URL, promote .168 into
the secondary slot as "Server 1 (OVH)"; tx1138 becomes Server 2.
Default mirror list shrinks from 3 to 2.
- container/registry.rs: default RegistryConfig drops .23, promotes
.168 to Server 1 / priority 0, tx1138 stays Server 2 / priority 10.
- api/rpc/package/config.rs: trusted-registry allowlist swaps .23
for .168.
- api/handler/mod.rs: app-catalog fallback URL uses .168.
- neode-ui/views/marketplace/marketplaceData.ts: REGISTRY uses .168.
- scripts/image-versions.sh: ARCHY_REGISTRY_FALLBACK uses .168.
- image-recipe/build-auto-installer-iso.sh: installer ISO registries
use .168 (both podman registries.conf and backend registries.json).
Tests updated to assert on the new 2-entry default lists (registry +
mirror). URL-parser fixture tests in update.rs retain .23 strings —
they exercise string-parsing logic, not mirror policy.
Git remotes: dropped `gitea-vps` and the .23 push URL on the `origin`
multi-push alias (not part of this commit — pure working-copy change).
Replaces the first-boot-containers.sh sed/envsubst approach with a
Rust-native render step bound into the ContainerOrchestrator lifecycle.
- New container::bitcoin_ui module: embeds the nginx.conf template via
include_str!, reads the plaintext RPC password from
/var/lib/archipelago/secrets/bitcoin-rpc-password, substitutes
{{BITCOIN_RPC_AUTH}} with base64(archipelago:<password>), and atomic-
writes (tmp + rename) to /var/lib/archipelago/bitcoin-ui/nginx.conf.
Idempotent: byte-compares before writing so unchanged input is a
no-op (no inode churn, no restart cascade).
- ProdContainerOrchestrator gains run_pre_start_hooks(app_id) returning
HookOutcome::{Rewritten, Unchanged}. Fires in install_fresh before
create_container, and in ensure_running: on Running + Rewritten
triggers a restart; on Stopped re-renders then starts.
- bitcoin-ui Dockerfile no longer COPYs a default.conf; the file now
arrives via runtime bind-mount of the rendered config. If the bind-
mount is ever missing, nginx starts with no site configured and
returns 404 everywhere — safe failure vs. serving upstream RPC with
a stale Authorization header.
- apps/{bitcoin,electrs,lnd}-ui/manifest.yml land as first-class
manifests. bitcoin-ui declares the bind-mount target and a dependency
on bitcoin-core; electrs-ui and lnd-ui declare their own deps and
health checks.
- 8 new unit tests on the render fn (idempotency, rotation, trimming,
missing/empty secret, template invariants) plus an integration test
asserting install(bitcoin-ui) actually lands a substituted nginx.conf
on disk via the hook. 39/39 container:: tests pass
(test_parse_image_versions pre-existing failure unchanged, out of
scope).
Step 5 of the rust-orchestrator migration. New file boot_reconciler.rs holds a
small Tokio task that calls ProdContainerOrchestrator::reconcile_all() on a
30-second cadence (answered design Q3).
* BootReconciler::new(orch, interval, shutdown) — shutdown is an Arc<Notify>
so callers can trigger a graceful exit without pulling in tokio-util.
* run_forever(self) — does one reconcile immediately, then loops on
tokio::select! { sleep_until | shutdown.notified() }. Shutdown interrupts
the sleep but never an in-flight reconcile_all call.
* Per-pass outcomes are logged at debug/warn; failures never propagate out
because reconcile_all already absorbs per-app errors into ReconcileReport.
Four tokio::test(start_paused = true) tests verify the loop cadence against a
CountingRuntime test double:
* initial_pass_fires_immediately — first reconcile runs with no delay
* second_pass_fires_after_interval — second pass fires after exactly
interval elapses in paused-clock time
* shutdown_terminates_loop — notify_one() lets run_forever return
* failure_in_one_pass_does_not_stop_loop — the loop keeps ticking even when
the first pass had to install a missing container
Not wired into main.rs yet — that is Step 6. Re-exported from container::mod
as BootReconciler + RECONCILER_DEFAULT_INTERVAL for the wire-up step.
Step 4 of the rust-orchestrator migration. Unifies the container lifecycle
surface behind a single trait so the RPC layer stops caring whether it is
talking to the dev or prod orchestrator.
* New trait core/archipelago/src/container/traits.rs: ContainerOrchestrator
with install / start / stop / restart / remove / upgrade / status / list /
logs / health, all keyed by app_id. Every method is async_trait-based.
* ProdContainerOrchestrator: the lifecycle methods are moved from inherent
impl into the trait impl (avoids name-shadowing recursion). Adoption and
reconcile remain inherent since only main.rs / BootReconciler call them.
* DevContainerOrchestrator: new trait impl that forwards to the existing
Dev-named methods, applying the dev container-name + port-offset rules
internally. New load_manifest_for() helper resolves app_id to
<data_dir>/apps/<app_id>/manifest.yml so trait-level install(app_id)
works in dev too. install_container(manifest, path) stays inherent for
the manifest-path RPC shape.
* RpcHandler now holds Option<Arc<dyn ContainerOrchestrator>> and, when in
dev mode, a separate Option<Arc<DevContainerOrchestrator>> for the
manifest_path install RPC. In prod mode RpcHandler::new() constructs a
ProdContainerOrchestrator and calls load_manifests() at startup.
* All seven container-* RPC guards no longer say dev mode required.
container-install still requires dev mode because its manifest_path
argument has no prod meaning; every other container RPC now works in both
modes via the trait.
BOOT STILL DOES NOT USE THIS. main.rs wire-up (Step 6) and BootReconciler
(Step 5) come next. Until then the prod orchestrator is constructed but nothing
populates /opt/archipelago/apps so it has zero manifests to manage, matching
the pre-Step-4 behaviour.
Verification: cargo build -p archipelago clean (11 expected unused method
warnings for methods not yet wired from main.rs). cargo test -p archipelago:
all 21 container::* tests pass (16 prod_orchestrator + 5 others). 24 other
test failures are pre-existing and unrelated (identity_manager / session /
wallet / mesh / credentials — all independently flaky on file-backed state).
Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC)
implements the full public surface that will replace scripts/first-boot-containers.sh:
* install / start / stop / restart / remove / upgrade / status / list / logs / health
* adopt_existing: read-only scan that claims containers matching our manifests by
name, without recreating — preserves the v1.7.42 fixture on .116.
* reconcile_all: level-triggered, per-app failures collected rather than aborting.
* install_fresh: build-or-pull (Step 2 trait methods), relative build contexts
resolved against the manifest directory.
Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the
archy- prefix; backends keep their bare ID. An explicit extensions.container_name
always wins. Codified in compute_container_name() with unit tests for all three tiers.
Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily,
protecting every mutating op against the reconciler loop. Acquiring the per-app
lock only needs a read lock on the map, so independent apps do not serialize.
16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull,
build-absent, build-present, relative-context), reconcile (noop/exited/missing/
mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health
state mapping, and unknown-app-id rejection. All pass.
Not wired into main.rs yet — that is Step 6. Crate builds clean with expected
unused warnings for the new re-exports.
ContainerConfig.image is now Option<String>, mutually exclusive with a new
optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image
or build must be present, enforced in AppManifest::validate.
Adds ResolvedSource enum (Pull | Build) and ContainerConfig::resolve +
::image_ref helpers so the orchestrator can treat pull and build uniformly.
All 26 existing pull-only manifests continue to parse unchanged
(covered by existing_pull_only_manifests_still_parse test).
Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator.
Dev orchestrator errors out cleanly on Build sources until Step 2 lands
build_image support on the runtime trait.
Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass.
Also includes: docs/rust-orchestrator-migration.md (design spec) and
docs/STATUS.md resume section for the next session.
- auth.rs now infers onboarding-complete from setup_complete + password_hash so
nodes stop bouncing users through the intro wizard after browser clear / update
/ reboot; the flag self-heals to disk on next check
- frontend: "backend uncertain" no longer defaults to /onboarding/intro —
useOnboarding returns null + callers poll / retry instead of flashing the wizard
- login sounds (synthwave, welcome voice, pop, whoosh, oomph) gated by
isFirstInstallPhase(); typing sounds unaffected
- removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog,
frontend config, Rust AppMetadata + install dispatch + install_penpot_stack;
docker/fips-ui + docker/nostr-vpn-ui + apps/penpot dirs and 5 icons deleted;
15 image versions deleted from tx1138, .168, gitea-local registries (.160
Gitea was 502 at release time — follow-up)
- AIUI baked into frontend release tarball via demo/aiui/; deploy-to-target
falls back to demo/aiui/ when the AIUI sibling checkout is missing
- prebuild hook syncs app-catalog/catalog.json → public/catalog.json so the
two copies can no longer drift (was the source of the "apps still visible"
bug — public/ had stale data)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Install flow
- api/rpc/package/install.rs: always append the literal image URL as a
last-resort pull candidate in do_pull_image, so images not carried by
any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install
instead of masquerading as a generic pull failure across every mirror.
- api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat
error, not just "file exists". Once bitcoin-knots' first-boot chowns
/var/lib/archipelago/bitcoin into the container's user namespace (700
perms, UID 100100/100101), the archipelago daemon can't even traverse
in — try_exists returns Err which unwrap_or(false) treated as "not
present" and drove a doomed write. Now errors out of the directory
traversal are treated as "conf already owned by container user" and
the write is skipped. Mirrors the lnd.conf pattern.
- api/rpc/package/install.rs: drop the hardcoded `prune=550` from the
conf default. Operators with multi-TB drives shouldn't be silently
pruned; users who want a pruned node can set it in bitcoin.conf
themselves. Full archive is the only honest default.
- api/rpc/package/config.rs: bitcoin-core now passes explicit
-server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI
args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and
reads conf + argv only; without these the RPC listens on 127.0.0.1
inside the container and rootlessport can't reach it, so the
bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call.
Bitcoin Knots keeps its own entrypoint-driven defaults.
- container/docker_packages.rs: split bitcoin-core out of the shared
AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with
bitcoin-core.svg and a Reference-implementation description; the
bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home
card showing "Bitcoin Knots" for a Core install.
Bitcoin node UI (docker/bitcoin-ui)
- index.html: impl name/tagline/logo now dynamic. applyImplBranding()
reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves
to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both
get their own icon and subtitle. Settings modal replaced its
hardcoded Regtest/txindex=1/port-18443 placeholders with live values
from getblockchaininfo + getindexinfo + getzmqnotifications.
- index.html: new Storage info card (Full Archive · X GB /
Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on
the main dashboard, same level as Network. Settings modal mirrors it
with the prune height when applicable.
- Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the
bg-network.jpg used by the dashboard are now COPY'd into the image
under /usr/share/nginx/html/assets. Previously the <img src> pointed
at paths that 404'd into the SPA fallback and the onerror handler
hid the broken logo silently.
Frontend
- appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334),
HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core).
Without these the AppSessionFrame showed "No URL found for
bitcoin-core" and the home/app-list title fell through to the raw id.
- settings/AccountInfoSection.vue: backfill What's New entries for
v1.7.31 through v1.7.37 that had been missed in earlier cuts.
Release plumbing
- releases/v1.7.37-alpha/: binary + frontend tarball.
- releases/manifest.json: v1.7.37-alpha, sha256/size refreshed.
- Cargo.toml / package.json: version bumps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Backend: install.rs registry reachability probe now strips the
`host[:port]/namespace` suffix before appending `/v2/` (the Docker
V2 API lives at the host root, not under the namespace) and accepts
HTTP 405 in addition to 200/401 as "registry daemon alive". This
fixes false "unreachable" reports on the Test button for Gitea and
other registries that protect their /v2/ endpoint.
- Backend: stacks.rs install_indeedhub_stack now force-removes any
leftover indeedhub-* containers and indeedhub-net before creating
the stack. A partial install (or the old first-boot stub racing the
installer) used to leave containers around that blocked re-install
with "name already in use". Re-running the App Store install now
self-heals.
- Backend: registry.rs load_registries auto-merges any default
registry URLs missing from the saved config (appended with priority
max+10+i, persisted). Lets new default mirrors (e.g. Server 3 OVH)
roll out to existing nodes without manual config edits. Explicit
removals still stick — URLs absent from disk AND absent from
defaults stay gone.
- Backend: update.rs adds DEFAULT_TERTIARY_MIRROR_URL at
http://146.59.87.168:3000/ (Server 3 OVH) to default_mirrors, with
the same auto-merge-on-load behavior as registries. Test updated
for 3-mirror default (.160, tx1138, .168).
- Scripts: dropped the first-boot IndeedHub stub (~38 lines in
first-boot-containers.sh §8b). It predated the proper stack
installer, raced it, and was the main source of the name-conflict
mess the stacks.rs cleanup above now also guards against.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Backend: unified pull-progress streaming across primary AND fallback
registries. Earlier code only streamed for the primary attempt; if it
failed fast (VPS 404, etc.) the UI froze at 0% until the fallback
finished. The waterfall now uses a single shared helper that streams
podman stderr through update_install_progress for every URL tried.
- Backend: PackageDataEntry gains uninstall_stage, set at each phase of
handle_package_uninstall ("Stopping containers (i/total)",
"Cleaning up volumes", "Removing app data"). State flips to Removing
during the pipeline.
- Frontend: MarketplaceAppCard renders the live progress bar with byte
counts during installs, matching the System Update download bar style.
- Frontend: AppCard renders the live uninstall stage label per app.
Modal closes immediately on confirm so concurrent uninstalls each
show their own progress on their own card.
- Cleanup: removed dead helpers (image_candidates, rewrite_for_primary,
primary_image_url, pull_from_registries_with_skip) made unused by
the install.rs refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New Settings → App registries page (/dashboard/settings/registries)
that mirrors the update-mirrors experience: list of configured
registries, test reachability, set primary, add/remove. New
registry.set-primary RPC; existing registry.{list,add,remove,test}
reused.
- Default RegistryConfig flipped: VPS (23.182.128.160:3000/lfg2025) is
now Server 1 (primary), tx1138 is Server 2 (fallback).
- Install pipeline now rewrites the first pull to the primary registry
URL before attempting it. Before this, installs always hit whichever
registry the image was hardcoded to, so changing the primary didn't
actually affect where images came from. On failure, the existing
fallback walk skips the primary (already tried) and walks the rest.
- App catalog proxy UPSTREAMS order flipped so the catalog follows the
same VPS-first rule.
- Reboot overlay: animated "a" logo now sits in the center of the ring
(matches the screensaver composition). Extracted the logo-wrapper
pattern inline.
7/7 registry tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure formatter output — no semantic changes. Sweeping these into their
own commit so the FIPS integration diff that follows stays scoped to
the actual feature.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>