316 Commits

Author SHA1 Message Date
archipelago
e456c9701b fix(peer-files): stream large cloud downloads + surface real errors (#30, #38)
Large peer downloads (~178MB) failed with a generic 'Operation failed', and
the download path had three stacked problems:

- The FIPS reqwest client used a hard-coded 20s total timeout regardless of the
  caller's .timeout(), so a big transfer over the mesh aborted at 20s before
  the Tor fallback could help. Honor the per-request timeout (client_with_timeout).
- The peer-content proxy buffered the whole file into node memory via
  resp.bytes() before sending a byte, and capped the transfer at 60s. Stream
  the body through with hyper::Body::wrap_stream (constant memory) and raise the
  timeout to 900s; bump the nginx peer-content read timeout to match.
- Free downloads pulled the file as base64 over RPC, doubling it in node memory
  and the browser — fatal for large files. Download free files by streaming
  from /api/peer-content straight to disk, after a 1-byte Range probe that
  surfaces the real reason (peer offline on mesh and Tor) instead of a generic
  failure. Paid downloads now return the real error through the {error} channel
  the UI already displays.

Adds the reqwest 'stream' feature for bytes_stream().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 03:10:21 -04:00
archipelago
1843739e0c fix(install): restart stack containers that crash on first start (#25)
Apps could fail install when a stack member exited on its first start
because a dependency (db/redis/the bitcoin node) was not ready yet — a
transient crash, not a broken install. wait_for_stack_containers now
restarts each exited/dead container up to 3 times before declaring the
install failed; the runtime supervisor keeps it alive afterwards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 15:14:09 -04:00
archipelago
aa9e0f02b7 fix(cloud): pin peer file-card filename + action buttons to the bottom (#11)
Make each peer file card a flex column filling its grid cell (flex flex-col
h-full) and pin the body row (filename + Play/Download) with mt-auto, so cards
with a media preview and cards without line their footers up across the row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 09:27:29 -04:00
archipelago
edd03e542d feat(storage): encrypt chat history + mesh contacts at rest, atomic writes, persist contacts (#12)
User: chat history (messages + mesh/Tor contacts) must persist and be
secure/encrypted per best practice. Root cause of the .198 loss was the B17
mount race writing empty stores over real data (B17 already fixes the trigger);
this hardens storage so it can never silently lose or expose data:

- storage_crypto: shared at-rest envelope mirroring credentials::store — key =
  SHA-256(domain ‖ node identity key) (seed-derived, per-store domain
  separation), ChaCha20-Poly1305 AEAD with a random 96-bit nonce, tamper-evident.
  Transparent migration of legacy plaintext files. Unit-tested (round-trip,
  wrong-key/tamper rejection, plaintext detection).
- messages.json: encrypted at rest + ATOMIC write (temp+rename) so a crash/
  reboot mid-write cannot corrupt history; decrypt-with-migration on load; a
  failed decrypt never overwrites the on-disk data.
- mesh contacts (alias/notes/pinned/blocked): were ONLY in memory and lost on
  every restart — now persisted to mesh-contacts.json (encrypted, atomic),
  loaded on MeshState startup, saved after contacts-save/contacts-block.

Explicit clear (mesh.clear-all) still wipes everything, as intended.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 08:54:37 -04:00
archipelago
2943fd0c5e style(core): cargo fmt (B1/B3/B13 follow-up — satisfy release fmt gate) 2026-06-16 03:09:18 -04:00
archipelago
bf24bbc15a fix(mempool): resolve CORE_RPC_HOST to the actual bitcoin node (Knots/Core) (B12)
CORE_RPC_HOST was hardcoded to bitcoin-knots in three env-render paths, so on a
bitcoin-core node (container named bitcoin-core) mempool-api could not reach
Bitcoin RPC. Both node variants are reachable on archy-net by container name —
only the name differs.

- Legacy direct-podman (stacks.rs) and config.rs::get_app_config now use a new
  dependencies::detect_bitcoin_rpc_host() (pure, unit-tested pick_bitcoin_host).
- Quadlet/manifest path (the modern fleet default): add a {{BITCOIN_HOST}}
  derived-env placeholder — HostFacts.bitcoin_host + resolve_derived_env render
  it; prod_orchestrator detects Knots/Core via podman ps, resolved on demand
  only for manifests that use the placeholder. mempool-api manifest moves
  CORE_RPC_HOST from static env to derived_env: {{BITCOIN_HOST}}.

Tests: pick_bitcoin_host (5 cases incl. substring safety), container-crate
resolve_derived_env, and orchestrator mempool_core_rpc_host_follows_bitcoin_node
(core->bitcoin-core, knots->bitcoin-knots). No-regression confirmed: picker
returns bitcoin-knots live on .198. Live bitcoin-core validation pending (no
core node available). Sibling hardcodes (lnd/btcpay/electrumx/fedimint) tracked
as B12b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 02:07:39 -04:00
archipelago
5c8707432b fix(cloud): Range-streaming proxy for peer media so it plays/seeks (B3)
Peer media (music/video) wouldn't play: the frontend downloaded the whole
file via RPC as base64 and made a non-seekable Blob URL, so <video>/large
<audio> stalled and big files hit the RPC timeout.

Add GET /api/peer-content/<onion>/<id> — a same-origin, session-gated proxy
that forwards the browser's Range header to the peer's /content/<id> (which
already returns 206 Partial Content) and passes status + Content-Range +
Content-Type back. PeerFiles.playMedia() now points <video>/<audio> at this
streaming URL for free content instead of buffering a base64 blob, so the
player can seek and start immediately. Onion/id validated to prevent
SSRF/path traversal. (Paid preview keeps its existing flow.)

Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:46:51 -04:00
archipelago
0801dd6632 feat(cloud): show Tor/FIPS transport pill on peer browse (B21)
content.browse-peer now returns the transport that actually reached the
peer (fips/tor/mesh/lan). PeerFiles shows it as a small coloured pill next
to the peer name (FIPS/Mesh green, LAN blue, Tor amber) and the loading
text no longer hardcodes "Connecting via Tor" (it was misleading when FIPS
was used). Pairs with B14 (transport recording).

Verified: cargo build --release EXIT 0; vue-tsc --noEmit EXIT 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:25:39 -04:00
archipelago
1c6dc153ce fix(content): use re-exported federation::record_peer_transport path (repair build)
The B14 commit referenced crate::federation::storage::record_peer_transport
but `storage` is a private module — record_peer_transport is re-exported at
crate::federation::. E0603 broke the build. Use the re-exported path (as
load_nodes/fips_npub_for_onion already do). Verified: cargo build --release
EXIT 0. Also logs B21 (Tor/FIPS pill) plan.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:15:01 -04:00
archipelago
f2e3710c28 fix(content): record peer transport on cloud browse/download/preview (B14)
The 4 content peer handlers (browse, download, download_paid, preview)
captured the transport returned by PeerRequest::send_get() but discarded
it, so the federation node's last_transport was never updated for cloud
activity — the UI showed Tor/none even when FIPS was used. Call
record_peer_transport() after each successful fetch (same as sync does).

Note: live data shows FIPS still reaches only some peers (many genuinely
fall back to Tor) — tracked separately as B14b (FIPS reachability).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:02:13 -04:00
archipelago
1db720af13 fix(lnd): repair fleet-wide CORS on LND connect-wallet endpoints (B5)
The LND wallet UI (served on its own app port) fetches /lnd-connect-info
and /proxy/lnd/* cross-origin, so both need correct CORS headers.

(a) Older nginx configs add their own Access-Control-Allow-Origin in the
    /lnd-connect-info location on top of the one the backend sets, yielding
    a DUPLICATE header that browsers reject ("multiple values"). bootstrap
    now strips that redundant nginx add_header (backend owns CORS).
(b) /proxy/lnd/* returned a 401 with no CORS headers when the session
    check failed, so the browser saw an opaque CORS error instead of a
    readable 401. Add unauthorized_cors() and use it on that path.

Adds tests/production-quality/ (bug tracker + lnd-cors-test.sh harness).
Verified: harness 4/4 on .116, .198, .103.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 11:31:14 -04:00
archipelago
e056c2477b fix(fips,federation,ui): mesh content browse, removed-node tombstones, modal sizing
FIPS peer content browse over the mesh was failing with "Peer returned
error: 404 Not Found" and never falling back to Tor. `is_peer_allowed_path`
only allowed `/content/<id>` (item fetches) — the catalog endpoint is
exactly `/content` (no trailing slash), so it 404'd over the FIPS peer
listener. A FIPS 404 was also treated as a successful response, so the dial
never retried Tor. Fixes: allow `/content` over the mesh; add
`fips_should_fall_back()` so a FIPS 404/5xx in Auto mode falls back to Tor
(handles version-skew peers reaching a different route). Also correct the
reconnect hint text — the public anchor is TCP/8443, not UDP/8668.

Federation: deleted nodes reappeared because transitive discovery
(`merge` of a peer's advertised trusted peers) re-added any unknown DID.
Add a tombstone store (`removed-nodes.json`): remove_node tombstones the
DID, transitive merge skips tombstoned DIDs, and a remote-triggered
peer-joined is ignored for a removed DID. Explicit local re-add (add_node)
clears the tombstone.

UI: the app credentials modal panel stretched edge-to-edge (height:100%,
max-width:none, items-stretch overlay). Constrain it to a centered card
(max-width 34rem, rounded, dimmed full-screen backdrop) matching the
AppIconGrid / wallet-receive modal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 08:09:26 -04:00
archipelago
95f9a805b1 feat(fips): connect to public mesh anchor over TCP + wire daemon updates
The whole fleet was silently never reaching the FIPS mesh: the default
public anchor was configured as fips.v0l.io:8668/udp, but the anchor only
answers on TCP/8443. Fix the default to 185.18.221.160:8443/tcp (IPv4
literal — the hostname resolves IPv6-first and the daemon binds v4-only,
which fails the handshake with EAFNOSUPPORT), and auto-seed it in
anchors::load() so every node dials it without operator action (removal
still persists). Proven live on .116: cold start → anchor_connected in
~400ms, anchor became mesh parent.

Wire fips::update::apply() against upstream GitHub releases (stable
channel only): resolve /releases/latest → SHA256-verify the .deb against
checksums-linux.txt → install → restart. dpkg runs via `systemd-run` to
escape archipelago's ProtectSystem=strict sandbox (else /var/lib/dpkg is
read-only), with --force-confold (archipelago manages /etc/fips conffiles)
and --force-downgrade (dev builds sort newer than the stable tag).
Validated live: .116 upgraded 0.3.0-dev -> stable v0.3.0.

Also: standalone fips-ui dashboard app (apps/fips-ui + docker/fips-ui,
static nginx proxying /rpc/v1 same-origin, copiable own-anchor address);
reserve UI port 8336; register fips/fips-ui as platform-managed. Includes
the Lightning wallet cross-origin (CORS) + LND proxy auth + nginx
self-healer fix so the wallet screen connects instead of "failed to fetch".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 06:41:48 -04:00
archipelago
91adc281ca fix(lnd): per-node wallet password + locked-wallet self-heal on login
Replaces the fleet-wide hardcoded WALLET_PASSWORD='hellohello' that left wallets
LOCKED after OTA/reboot (auto-unlock used the wrong password fleet-wide).

Forward fix (both init paths unified, validated cargo check + LND REST mechanics
on a scratch wallet):
- Per-node random 256-bit secret in secrets/lnd-wallet-password (0600), mirroring
  secrets/bitcoin-rpc-password. read_wallet_password (no-gen) vs
  ensure_wallet_password (gen at init only).
- container/lnd.rs init AND api/rpc/lnd/wallet.rs seed-derived init both use the
  per-node secret (wallet.rs keeps recoverable derived entropy; password unified).
- Unlock tries [per-node secret, legacy 'hellohello']; single-attempt primitive
  distinguishes invalid-passphrase (fail fast, try next) from not-ready (retry),
  so a wrong password no longer hangs the boot path ~60s.

Migration (candidate-unlock + rotate, best-effort at login):
- change_wallet_password (WalletUnlocker.ChangePassword) + migrate_locked_wallet:
  if LOCKED, try candidates as current pw and ChangePassword onto the per-node
  secret so future boots auto-unlock. Hooked into auth.login (non-blocking) with
  the just-verified password as the candidate.

NOT YET: seed-recovery fallback for wallets where no candidate matches (e.g.
.116/.228) — destructive, needs entropy-source/funds-safety handling; next pass.
NOT shipped: pending end-to-end validation on a real node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:19:56 -04:00
archipelago
0ed892a412 fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile
Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha
(all nodes were already on the latest build — these were live bugs, not stale
builds), plus the missing ElectrumX tile, and adds automated coverage so each
can't regress silently.

Receive address (".116 receive fails", ".228 false 'wallet is locked'"):
- LND publishes its REST API on a host port that can drift from the manifest
  (a container created when the mapping was 8080 kept publishing 8080 after the
  manifest moved to 18080). The in-process client connects to the manifest port,
  gets connection-refused, and wallet init fails forever while the container
  looks "Up". Add published-port drift detection to the reconciler
  (container_ports_drifted / host_port_bindings_drifted) that recreates a
  drifted backend even for restart-sensitive apps — a drifted container is
  already broken, so leaving it "untouched" only perpetuates the failure.
- Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED,
  WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they
  survive the RPC error sanitizer instead of collapsing to the generic
  "Operation failed". The UI maps the code instead of guessing wallet state from
  substrings — so an unreachable REST endpoint is no longer mislabelled "locked".

Bitcoin install (".198 bitcoin gone / reinstall just stops"):
- bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only
  generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so
  secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate
  it idempotently before bitcoin starts (ensure_app_secrets, reusing
  ensure_txrelay_credentials), and name the missing secret in the error so a
  genuine gap is actionable instead of a bare "IO error".

ElectrumX app tile missing on every node with it installed:
- The catalog generator dropped electrumx because the manifest had no
  interfaces.main block, so the tile had no launch URL and was hidden. Declare
  the companion UI port (50002) in the manifest, regenerate the catalog, and let
  an app with a known launch URL stay launchable while its backend is still
  "starting" (ElectrumX indexes for 10m+).

Test harness:
- New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness
  (validated live; port-drift catches the real .116 drift).
- Rust unit tests for drift detection, the receive reason-code classifier, and
  the named-missing-secret error; vitest for the UI code mapping.
- create-release.sh now runs tests/release/run.sh and aborts the release on
  failure — previously it ran no tests at all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 03:12:56 -04:00
archipelago
c800293f1f fix: bitcoin receive, AIUI pointer input, electrs self-heal, OTA timeout
- LND wallet: request correct address type so receive-address generation
  no longer 400s
- AIUI/app session: on-screen pointer can click + type into app content
  (incl. app store search); "open in new tab" opens the phone browser;
  mobile credential modal centered instead of full-height
  (remote-relay.ts, AppSession.vue, AppSessionFrame.vue, AppIconGrid.vue,
  openExternal.ts, WebViewScreen.kt) + remote-relay tests
- health_monitor: electrs auto-recovers from a corrupt index and shows a
  percent/block-height progress screen while reindexing (useElectrsSync.ts)
- update.rs: drop retired tx1138 secondary mirror (one-time migration);
  longer download timeout for slow connections
- CHANGELOG: v1.7.90-alpha notes
- tests/release/run.sh: harness tweaks

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 04:49:32 -04:00
archipelago
c49e8fcacd fix: harden OTA updates, AIUI desktop gap, LND no-proxy
- update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect
  error (nginx binds :80, not :443) so good updates are no longer rolled
  back; recover stuck update_in_progress; avoid ETXTBSY on running binary
- LND: REST client bypasses proxy, GET newaddress p2wkh, wallet
  readiness/unlock after restart
- Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix)
- vite.config.ts: dev-only /aiui proxy
- tests/release/run.sh: release gate harness (static+frontend+backend)
- CHANGELOG: v1.7.89-alpha notes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 01:23:32 -04:00
archipelago
b8ac68d844 fix: restore aiui and bitcoin receive before release 2026-06-12 05:10:03 -04:00
archipelago
8d4b309753 fix: patch bitcoin receive and full-screen launch overlays 2026-06-12 04:42:23 -04:00
archipelago
d6f108d818 chore: snapshot release workspace 2026-06-12 03:00:15 -04:00
archipelago
6a30ff11bd chore: release v1.7.84-alpha 2026-06-11 04:44:58 -04:00
archipelago
c393b96da3 backend: harden rootless app lifecycle orchestration 2026-06-11 00:24:32 -04:00
archipelago
626a89bdbc fix(apps): proxy saleor storefront media 2026-05-22 17:08:03 -04:00
archipelago
a578834462 fix(apps): repair saleor storefront startup 2026-05-21 21:33:51 -04:00
archipelago
8eb03d106e fix(apps): repair saleor storefront graphql origin 2026-05-21 00:30:22 -04:00
archipelago
34c4e87d14 feat(apps): add saleor storefront 2026-05-20 23:02:57 -04:00
archipelago
cc1f8fba72 fix(apps): stabilize saleor and netbird release paths 2026-05-20 20:38:52 -04:00
archipelago
f4368785f0 fix(apps): unblock saleor and netbird first-use flows 2026-05-20 00:28:30 -04:00
archipelago
92c58141af fix(apps): stabilize saleor and netbird launch 2026-05-19 21:45:17 -04:00
archipelago
522c046525 feat(apps): add saleor and harden netbird repair 2026-05-19 20:11:22 -04:00
archipelago
bd69ef41d5 fix(apps): repair netbird login and iframe focus 2026-05-19 19:21:43 -04:00
archipelago
1836b035b4 fix(mobile): improve app store search and launches 2026-05-19 18:29:04 -04:00
archipelago
f0bd49d03d fix(apps): repair netbird install and app icons 2026-05-19 17:20:32 -04:00
archipelago
ab96c97cb9 fix(apps): self-host netbird and stabilize app sessions 2026-05-19 16:02:35 -04:00
archipelago
87be717f40 fix(apps): keep slow installs visible 2026-05-19 14:29:20 -04:00
archipelago
d736364ad7 fix(apps): stabilize btcpay and public proxy launch flows 2026-05-19 09:26:43 -04:00
archipelago
19dbf60f03 fix(apps): detect stale npm created containers 2026-05-18 10:04:22 -04:00
archipelago
7104ba0cbf fix(apps): repair orchestrator starts before launch 2026-05-18 09:20:12 -04:00
archipelago
19f2125a4d fix(apps): repair stale nginx proxy manager ports 2026-05-17 22:38:04 -04:00
archipelago
413d50116e fix(apps): restore mobile and website launching 2026-05-17 19:22:18 -04:00
archipelago
7804223152 chore: release v1.7.57-alpha 2026-05-17 17:30:04 -04:00
archipelago
01ec0565a6 fix: restore wifi setup and ssh password updates 2026-05-15 18:15:06 -04:00
Dorian
835c525218 chore(release): stage v1.7.55-alpha 2026-05-13 15:09:22 -04:00
archipelago
c0751e2551 chore(release): stage v1.7.54-alpha 2026-05-06 09:23:57 -04:00
archipelago
1a0d8a432c chore(release): stage v1.7.53-alpha 2026-05-05 13:59:50 -04:00
archipelago
745cb1c626 chore(release): stage v1.7.52-alpha 2026-05-05 11:29:18 -04:00
archipelago
6bbe1b96cf refactor: drop dead code surfaced by cargo
cargo check was showing five real warnings, all genuinely dead:

* container/mod.rs   — re-exports compute_container_name, AdoptionReport,
                       ReconcileAction, ReconcileReport were unused outside
                       prod_orchestrator. Drop from the pub use line.
* prod_orchestrator  — with_runtime + insert_manifest_for_test only exist
                       for the test module in the same file. Mark them
                       #[cfg(test)] so they don't appear in release builds.
* async_lifecycle    — remove_package_entry has no callers; doc claims
                       "used for install-failure cleanup" but nothing
                       cleans up. Delete (10 lines).
* registry.rs        — `use tracing::{debug, info};` had no consumers.
* fips.rs            — unused-assignment chain on last_status. The poll
                       loop always sets it on every break path, so the
                       initial `None` and the unwrap_or_else fallback
                       were both dead. Refactored to `let after = loop
                       { ...; break s; };`.

cargo check is now clean. cargo test --workspace --bins: 614 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:34:02 -04:00
archipelago
6603227874 fix(install): auto-clean stuck OTHER-variant bitcoin container
If bitcoin-core was installed but never started (e.g. port 8332 already
bound by bitcoin-knots), the container sticks in `created` state forever.
The old conflict check refused EVERY future bitcoin install — including
re-install of the running variant — leaving no UI path to recovery.

Now the check distinguishes states:
  - missing                       → no conflict, continue
  - running                       → real conflict, refuse install
  - created/exited/configured/... → stuck; auto-remove and continue

Volumes are untouched; only the dead container record goes away.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:59:11 -04:00
archipelago
27ff1d5b52 fix(install): generate bitcoin RPC password before orchestrator install
Bitcoin containers were exiting in ms after start because the orchestrator
install path skipped the credential-materialisation step the legacy path
did. resolve_secret_env then failed to read
/var/lib/archipelago/secrets/bitcoin-rpc-password, the container started
with no password, and bitcoind crashed before logs were useful.

Two changes:

1. install.rs — call bitcoin_rpc_credentials() for bitcoin/bitcoin-core/
   bitcoin-knots before any install branch runs. The function generates +
   persists on first call (OnceCell-cached), so this is idempotent.

2. manifest.rs::resolve_secret_env — return ManifestError::Invalid when a
   resolved secret trims to empty, instead of silently producing
   `KEY=` env vars that crash auth.

Adds a unit test for the empty-secret rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:39:56 -04:00
archipelago
f9e34fd0c6 refactor(install): route orchestrator-managed apps through orchestrator first
Phase 3a of the install path consolidation. Two coupled changes:

1. install.rs handle_package_install: gate the legacy "container exists →
   adopt + return" probe on !orchestrator_managed. Apps the orchestrator
   knows about (bitcoin-knots, bitcoin-core, lnd, electrumx, fedimint,
   filebrowser, btcpay-server stack apps, mempool stack apps, plus the
   companion UIs that just moved to Quadlet) skip the legacy probe and
   fall straight into the orchestrator branch.

   The legacy adopt block was returning success on a bare `podman start`
   exit-0 — even when the process inside the container crashed seconds
   later. That's the .228 "running but unreachable" failure mode. The
   orchestrator's ensure_running honors the manifest's health check and
   pre-start hooks (e.g. re-renders bitcoin-ui's nginx.conf if the RPC
   password rotated), so this is a behavioral upgrade, not just a
   refactor.

2. ProdContainerOrchestrator::install: make idempotent. Previously it
   blindly called install_fresh which would fail on `podman create` if
   the container name already existed. Now it delegates to ensure_running:
     - Container Running + healthy → no-op (refresh hooks, restart if
       config rewritten)
     - Container Stopped/Exited → start (with hook refresh)
     - Container missing → install_fresh
     - Container in wedged state (Created/Paused/Unknown) → force-recreate

   Without this, change #1 would regress every "container already exists"
   case for the 18 orchestrator-managed app IDs. With it, install becomes
   the single source of truth for "make app X be in the desired state."

Tests: 654 passed across the workspace (614 unit + 37 orchestration + 3
rpc), 0 failures. The 20 prod_orchestrator tests cover the install /
ensure_running / reconcile paths the new install delegates through.

Net delta: install.rs grows by ~30 lines (gating wrapper + comments),
prod_orchestrator.rs grows by ~30 lines (idempotent install body). Both
are temporary — the larger deletions (~1700 lines) come once every app
has been verified through the orchestrator path in subsequent phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:12:52 -04:00