114 Commits

Author SHA1 Message Date
archipelago
1bce694ebb feat(ui): mobile mesh tabs, AIUI-style audio player, cloud grid + map fixes
UI (this session):
- Global audio player now scales the whole interface into the space above it
  on desktop (sidebar + main) and docks directly above the tab bar on mobile;
  it stays visible while navigating.
- Mesh mobile redesign: floating Chat / BTC / Dead Man / AI / Map tab strip
  with a single fixed, internally-scrolling pane (page no longer scrolls);
  tabs hide while a conversation is open; floating back button; collapsible
  Device panel (starts collapsed); keyboard-aware conversation sizing via
  VisualViewport so the chat sits just above the keyboard.
- Cloud file grid: uniform 4/3 card heights (folders + images match).
- Swipe left/right switches tabs on the Apps and Web5 screens.
- Map tool fills its pane (no bottom gap); fix skewed Share Location toggle
  on mobile (global min-height rule was deforming the switch).
- Trim redundant helper copy from the mesh AI tab.

Also bundles pre-existing in-progress work that was already in the tree:
mesh listener/session + wallet + container + bitcoin-status backend changes,
docker UI updates, and assorted other UI tweaks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:52:26 -04:00
archipelago
c4855526fe feat(wallet): wire fmcd as core app + dual-ecash receive
Fedimint never appeared in Wallet > Settings > Fedimint because the
fmcd (fedimint-clientd) sidecar was never installed:
ensure_default_federation() needs the fmcd password to reach the daemon,
found none, and silently no-oped, leaving the registry empty.

- prod_orchestrator: add fedimint-clientd to the baseline auto-install
  set so it self-heals onto every node and auto-joins the default
  federation; generate the fmcd-password secret before secret_env
  resolves.
- fedimint_client: ensure_fmcd_password (random hex, 0600) shared with
  the container's secret_env; from_node reads the same secret (legacy
  fmcd/password kept as fallback); reissue_into_any redeems received
  notes into the first joined federation that accepts them.
- wallet.ecash-receive: dual-token — cashu* tokens redeem at the mint,
  anything else is reissued via fmcd; returns the kind + federation_id.
- UI: receive box advertises "Cashu or Fedimint" and reports which kind.
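
The dual-token dispatch described above can be sketched as a prefix check.
This is an illustrative sketch only: the names EcashKind and
classify_ecash_token are hypothetical, and the case-sensitive match on the
"cashu" prefix is an assumption rather than the handler's confirmed behavior.

```rust
/// Which ecash backend should redeem a pasted token (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum EcashKind {
    Cashu,
    Fedimint,
}

/// Classify a token by prefix: tokens starting with "cashu" go to the mint,
/// anything else is handed to fmcd for reissue.
/// Assumption: the prefix match is case-sensitive and ignores leading spaces.
pub fn classify_ecash_token(token: &str) -> EcashKind {
    if token.trim_start().starts_with("cashu") {
        EcashKind::Cashu
    } else {
        EcashKind::Fedimint
    }
}
```

Returning the kind as a value lets the caller report which path was taken,
as the receive box does.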

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 09:52:26 -04:00
archipelago
83bb589ea6 style: cargo fmt for v1.7.99-alpha release gate
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 19:50:46 -04:00
archipelago
bd567cd165 feat(wallet,content,seed): Fedimint dual-ecash, paid content streaming, seed ceremony
- Fedimint ecash alongside Cashu: fedimint-clientd (fmcd) HTTP bridge,
  fedimint_client, fedimint RPC, wallet wiring
- Paid peer content: content invoices + streaming content server + content RPCs
- Seed-phrase ceremony/reveal RPCs and CLI ceremony tool
- LND wallet, mesh status/messaging, app-stack (netbird HTTPS), and
  decoupled-update wiring; Fedimint Client core app in catalog

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 19:21:07 -04:00
archipelago
7c458ede8e Merge agent-trust-wip (DHT Phases 0–4) into main
Integrates the DHT/peer-distribution line with the v1.7.98-alpha release
fixes:
- Phase 0 signed-catalog trust + release-root key (KAT-pinned)
- Phase 1 BLAKE3 content addressing alongside SHA-256
- Phase 2 swarm-assist fetch seam (origin always wins) + iroh-blobs
  provider — heavy iroh deps stay behind the off-by-default `iroh-swarm`
  feature, so the default build/deploy is unaffected
- Phase 3 signed Nostr seed-advertisement + discovery glue + paid swarm
  serving + "Networking Profits" Settings page
- Phase 4 paid swarm streaming (cross-mint ecash, Shape-A paid ALPN,
  streaming.prepare-payment), also iroh-swarm-gated

Conflicts resolved: seed.rs (kept release-root KAT tests), update.rs
(comment-only, OTA logic identical), Cargo.lock (regenerated against the
merged Cargo.toml). Default-feature build is clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 07:50:06 -04:00
archipelago
3aea8c5bfa fix(orchestrator): rebuild local UI images when source changes (#34)
The prod orchestrator only checked whether a build-image tag was *present*
before deciding to skip the build. The local UI images (bitcoin-ui, lnd-ui,
electrs-ui) COPY a built neode-ui dist, so a UI update changed the source but
left the old tag in place and the new UI never shipped.

Gate the build on a content fingerprint of the build context (sorted relative
path + length + mtime, SHA-256) recorded in a per-tag stamp under data_dir.
Rebuild whenever the fingerprint differs from the one that produced the
existing image; podman's own COPY-layer cache keeps a no-op rebuild cheap.
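
A rough sketch of the fingerprint gate, under stated assumptions: the
ContextEntry type and both function names are hypothetical, the entries are
passed in rather than read from disk, and std's DefaultHasher stands in for
SHA-256 so the example has no dependencies. A stamp that is persisted across
builds needs a stable digest such as the SHA-256 the commit uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One file in the build context (hypothetical shape for this sketch).
#[derive(Clone, Debug)]
pub struct ContextEntry {
    pub rel_path: String,
    pub len: u64,
    pub mtime_secs: u64,
}

/// Fingerprint the context from sorted relative path + length + mtime.
/// DefaultHasher is a stand-in for SHA-256 here, not the real digest.
pub fn context_fingerprint(entries: &[ContextEntry]) -> u64 {
    let mut sorted: Vec<&ContextEntry> = entries.iter().collect();
    // Sorting makes the result independent of directory walk order.
    sorted.sort_by(|a, b| a.rel_path.cmp(&b.rel_path));
    let mut hasher = DefaultHasher::new();
    for entry in sorted {
        entry.rel_path.hash(&mut hasher);
        entry.len.hash(&mut hasher);
        entry.mtime_secs.hash(&mut hasher);
    }
    hasher.finish()
}

/// Rebuild when there is no stamp yet or the stamp differs from the
/// fingerprint of the current build context.
pub fn needs_rebuild(stamp: Option<u64>, current: u64) -> bool {
    stamp != Some(current)
}
```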

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-17 03:09:56 -04:00
archipelago
27f11bf85a feat(trust): wire Phase 0 signed-catalog verification + pin release-root KAT
Completes the parked trust module and wires it into the live build:
- main.rs: register `mod trust`
- app_catalog::fetch_one: verify the release-root detached signature when
  present (verify against raw JSON so forward-compat fields stay in the
  signed preimage); accept unsigned during the migration window, hard-reject
  a present-but-bad signature so a tampering mirror can't pass altered bytes
- seed: pin release-root Ed25519 known-answer test (priv+pub) for the
  signing ceremony / pinned-anchor / external-verifier cross-check
- signed_doc: drop unused import
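
The accept/reject policy for fetch_one can be sketched as a three-way
decision. The enum, the function name, and the injected verify closure are
hypothetical; the real code verifies an Ed25519 detached signature, which is
abstracted away here.

```rust
/// Outcome of the migration-window trust policy (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CatalogTrust {
    /// Signature present and valid over the raw JSON bytes.
    Verified,
    /// No signature: accepted only during the migration window.
    UnsignedAccepted,
    /// Signature present but invalid: hard reject.
    Rejected,
}

/// Decide whether to trust a fetched catalog. `verify` is an assumed
/// callback that checks `signature` against the raw, unparsed JSON so that
/// unknown forward-compat fields remain part of the signed preimage.
pub fn evaluate_catalog<F>(raw_json: &[u8], signature: Option<&[u8]>, verify: F) -> CatalogTrust
where
    F: Fn(&[u8], &[u8]) -> bool,
{
    match signature {
        None => CatalogTrust::UnsignedAccepted,
        Some(sig) if verify(raw_json, sig) => CatalogTrust::Verified,
        Some(_) => CatalogTrust::Rejected,
    }
}
```

Keeping "absent" and "present but bad" as separate outcomes is what stops a
tampering mirror from downgrading a signed catalog by corrupting the
signature.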

20/20 Phase 0 unit tests pass (trust::canonical/did/signed_doc/anchor,
seed release-root, app_catalog). Crate compiles clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 12:40:57 -04:00
archipelago
aa9e0f02b7 fix(cloud): pin peer file-card filename + action buttons to the bottom (#11)
Make each peer file card a flex column filling its grid cell (flex flex-col
h-full) and pin the body row (filename + Play/Download) with mt-auto, so cards
with a media preview and cards without line their footers up across the row.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 09:27:29 -04:00
archipelago
bf24bbc15a fix(mempool): resolve CORE_RPC_HOST to the actual bitcoin node (Knots/Core) (B12)
CORE_RPC_HOST was hardcoded to bitcoin-knots in three env-render paths, so on a
bitcoin-core node (container named bitcoin-core) mempool-api could not reach
Bitcoin RPC. Both node variants are reachable on archy-net by container name —
only the name differs.

- Legacy direct-podman (stacks.rs) and config.rs::get_app_config now use a new
  dependencies::detect_bitcoin_rpc_host() (pure, unit-tested pick_bitcoin_host).
- Quadlet/manifest path (the modern fleet default): add a {{BITCOIN_HOST}}
  derived-env placeholder — HostFacts.bitcoin_host + resolve_derived_env render
  it; prod_orchestrator detects Knots/Core via podman ps, resolved on demand
  only for manifests that use the placeholder. mempool-api manifest moves
  CORE_RPC_HOST from static env to derived_env: {{BITCOIN_HOST}}.

Tests: pick_bitcoin_host (5 cases incl. substring safety), container-crate
resolve_derived_env, and orchestrator mempool_core_rpc_host_follows_bitcoin_node
(core->bitcoin-core, knots->bitcoin-knots). No-regression confirmed: picker
returns bitcoin-knots live on .198. Live bitcoin-core validation pending (no
core node available). Sibling hardcodes (lnd/btcpay/electrumx/fedimint) tracked
as B12b.
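
A minimal sketch of what a pure host picker with substring safety might look
like. The signature is an assumption (the log does not show it), as is the
choice to prefer Knots when both names are present.

```rust
/// Pick the Bitcoin RPC host from a list of running container names.
/// Hypothetical re-creation: the real pick_bitcoin_host signature is not
/// shown in the log. Names must match exactly, so a container such as
/// "bitcoin-knots-exporter" does not count as a node.
pub fn pick_bitcoin_host(container_names: &[&str]) -> Option<&'static str> {
    // Assumption: Knots is preferred if both containers are running.
    const CANDIDATES: [&str; 2] = ["bitcoin-knots", "bitcoin-core"];
    CANDIDATES
        .iter()
        .copied()
        .find(|candidate| container_names.iter().any(|name| name == candidate))
}
```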

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 02:07:39 -04:00
archipelago
1973d76427 style: rustfmt lnd migrate_locked_wallet matches! call
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 14:41:40 -04:00
archipelago
3214d6aff3 fix(lnd): self-heal unrecoverable locked wallet via wipe+recreate
When an existing LND wallet is locked and none of the candidate passwords
(per-node secret, legacy constant) open it, the node can never auto-unlock
unattended. unlock_existing_wallet now returns Ok(false) for "all candidates
actively rejected" (vs Err for transient "LND not ready"), and
ensure_wallet_initialized responds by recreating the wallet:

  - mark the lnd container user-stopped so the health monitor won't
    re-launch it (and re-open the wallet) mid-wipe,
  - stop lnd, delete its wallet/chain/graph state as root,
  - start lnd, wait for NON_EXISTING, re-init a fresh wallet on the
    per-node secret, then clear the user-stopped flag.
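
The Ok(false) versus Err contract can be sketched as a candidate loop over a
single-attempt primitive. The UnlockAttempt enum, the function name, and the
injected attempt closure are hypothetical stand-ins for the LND REST calls.

```rust
/// Result of one unlock attempt against LND (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum UnlockAttempt {
    Unlocked,
    /// LND actively rejected the password: try the next candidate.
    InvalidPassphrase,
    /// LND is not answering yet: transient, the caller should retry later.
    NotReady,
}

/// Try each candidate password in order.
/// Ok(true)  = a candidate opened the wallet.
/// Ok(false) = every candidate was actively rejected (unrecoverable).
/// Err(..)   = LND was not ready, so nothing can be concluded yet.
pub fn unlock_with_candidates<F>(candidates: &[&str], mut attempt: F) -> Result<bool, String>
where
    F: FnMut(&str) -> UnlockAttempt,
{
    for &password in candidates {
        match attempt(password) {
            UnlockAttempt::Unlocked => return Ok(true),
            UnlockAttempt::InvalidPassphrase => continue,
            UnlockAttempt::NotReady => return Err("LND not ready".to_string()),
        }
    }
    Ok(false)
}
```

Only the Ok(false) outcome would justify the destructive wipe; a transient
Err must never reach that branch.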

LND runs as a plain bridge-network podman container (not a Quadlet unit),
so it is restarted via `systemd-run --user --scope podman`, matching the
orchestrator/health-monitor path.

Alpha nodes hold no funds and a wallet locked with an unknown password is
already inaccessible, so the wipe loses nothing reachable. Completes the
forward fix from 91adc281 for nodes whose wallet pre-dates the per-node
secret and whose password is unrecorded (e.g. .116/.228).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 14:08:33 -04:00
archipelago
91adc281ca fix(lnd): per-node wallet password + locked-wallet self-heal on login
Replaces the fleet-wide hardcoded WALLET_PASSWORD='hellohello' that left wallets
LOCKED after OTA/reboot (auto-unlock used the wrong password fleet-wide).

Forward fix (both init paths unified, validated cargo check + LND REST mechanics
on a scratch wallet):
- Per-node random 256-bit secret in secrets/lnd-wallet-password (0600), mirroring
  secrets/bitcoin-rpc-password. read_wallet_password (no-gen) vs
  ensure_wallet_password (gen at init only).
- container/lnd.rs init AND api/rpc/lnd/wallet.rs seed-derived init both use the
  per-node secret (wallet.rs keeps recoverable derived entropy; password unified).
- Unlock tries [per-node secret, legacy 'hellohello']; single-attempt primitive
  distinguishes invalid-passphrase (fail fast, try next) from not-ready (retry),
  so a wrong password no longer hangs the boot path ~60s.

Migration (candidate-unlock + rotate, best-effort at login):
- change_wallet_password (WalletUnlocker.ChangePassword) + migrate_locked_wallet:
  if LOCKED, try candidates as current pw and ChangePassword onto the per-node
  secret so future boots auto-unlock. Hooked into auth.login (non-blocking) with
  the just-verified password as the candidate.

NOT YET: seed-recovery fallback for wallets where no candidate matches (e.g.
.116/.228) — destructive, needs entropy-source/funds-safety handling; next pass.
NOT shipped: pending end-to-end validation on a real node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 11:19:56 -04:00
archipelago
a483fe4baa fix: derive launch port from URL authority, not naive rsplit
reachable_lan_address() parsed the launch port with url.rsplit(':')
which yields "8096/" for manifest interfaces.main URLs that carry a
path (http://localhost:8096/). That fails to parse and silently drops
a perfectly reachable launch URL, so apps like jellyfin, btcpay-server,
fedimint, gitea, nextcloud and portainer showed running with no launch
link in the UI. New launch_url_port() reads digits after the final
colon (mirroring port_from_url in the RPC layer) and tolerates a
trailing path. Adds regression tests.
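
A sketch of the parsing approach described above: take the text after the
final colon and keep only its leading digits. This is a hypothetical
re-creation, not the committed code, and it assumes the path itself contains
no colon and the host is not a bracketed IPv6 literal.

```rust
/// Extract the port from a launch URL such as "http://localhost:8096/".
/// Reads the digits after the final colon and ignores any trailing path.
/// Returns None when there is no port or it does not fit in a u16.
pub fn launch_url_port(url: &str) -> Option<u16> {
    let (_, after_colon) = url.rsplit_once(':')?;
    let digits: String = after_colon
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}
```

A URL with no explicit port, such as "http://localhost/", yields None here
because the text after the scheme colon starts with "//" rather than digits.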

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 03:35:19 -04:00
archipelago
0ed892a412 fix: wallet receive reliability, bitcoin install self-heal, ElectrumX app tile
Fixes three Bitcoin/wallet failures observed across the fleet on v1.7.90-alpha
(all nodes were already on the latest build — these were live bugs, not stale
builds), plus the missing ElectrumX tile, and adds automated coverage so each
can't regress silently.

Receive address (".116 receive fails", ".228 false 'wallet is locked'"):
- LND publishes its REST API on a host port that can drift from the manifest
  (a container created when the mapping was 8080 kept publishing 8080 after the
  manifest moved to 18080). The in-process client connects to the manifest port,
  gets connection-refused, and wallet init fails forever while the container
  looks "Up". Add published-port drift detection to the reconciler
  (container_ports_drifted / host_port_bindings_drifted) that recreates a
  drifted backend even for restart-sensitive apps — a drifted container is
  already broken, so leaving it "untouched" only perpetuates the failure.
- Receive errors now carry a stable [CODE] token (REST_UNREACHABLE, WALLET_LOCKED,
  WALLET_UNINITIALIZED, SYNCING) and always start with "Bitcoin address" so they
  survive the RPC error sanitizer instead of collapsing to the generic
  "Operation failed". The UI maps the code instead of guessing wallet state from
  substrings — so an unreachable REST endpoint is no longer mislabelled "locked".
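
The code-based mapping can be sketched as a small parser for the bracketed
token. The four codes come from the list above; the enum, the function name,
and the exact message layout are assumptions, and the sketch is in Rust even
though the UI-side mapping lives in the frontend.

```rust
/// Stable reason codes carried in receive errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReceiveReason {
    RestUnreachable,
    WalletLocked,
    WalletUninitialized,
    Syncing,
}

/// Pull the first "[CODE]" token out of an error message and map it.
/// Assumed message layout: "Bitcoin address ... [CODE] ...".
/// Unknown or missing codes return None so the caller can show a
/// generic error instead of guessing from substrings.
pub fn parse_reason_code(message: &str) -> Option<ReceiveReason> {
    let start = message.find('[')?;
    let rest = &message[start + 1..];
    let end = rest.find(']')?;
    match &rest[..end] {
        "REST_UNREACHABLE" => Some(ReceiveReason::RestUnreachable),
        "WALLET_LOCKED" => Some(ReceiveReason::WalletLocked),
        "WALLET_UNINITIALIZED" => Some(ReceiveReason::WalletUninitialized),
        "SYNCING" => Some(ReceiveReason::Syncing),
        _ => None,
    }
}
```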

Bitcoin install (".198 bitcoin gone / reinstall just stops"):
- bitcoin-knots requires the secret bitcoin-rpc-txrelay-rpcauth, which was only
  generated by the tx-relay flow. Nodes that never used tx-relay lacked it, so
  secret resolution hard-failed and the whole Bitcoin stack cascaded. Generate
  it idempotently before bitcoin starts (ensure_app_secrets, reusing
  ensure_txrelay_credentials), and name the missing secret in the error so a
  genuine gap is actionable instead of a bare "IO error".

ElectrumX app tile missing on every node with it installed:
- The catalog generator dropped electrumx because the manifest had no
  interfaces.main block, so the tile had no launch URL and was hidden. Declare
  the companion UI port (50002) in the manifest, regenerate the catalog, and let
  an app with a known launch URL stay launchable while its backend is still
  "starting" (ElectrumX indexes for 10m+).

Test harness:
- New lifecycle bats suites: bitcoin-receive, port-drift, secret-completeness
  (validated live; port-drift catches the real .116 drift).
- Rust unit tests for drift detection, the receive reason-code classifier, and
  the named-missing-secret error; vitest for the UI code mapping.
- create-release.sh now runs tests/release/run.sh and aborts the release on
  failure — previously it ran no tests at all.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 03:12:56 -04:00
archipelago
c49e8fcacd fix: harden OTA updates, AIUI desktop gap, LND no-proxy
- update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect
  error (nginx binds :80, not :443) so good updates are no longer rolled
  back; recover stuck update_in_progress; avoid ETXTBSY on running binary
- LND: REST client bypasses proxy, GET newaddress p2wkh, wallet
  readiness/unlock after restart
- Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix)
- vite.config.ts: dev-only /aiui proxy
- tests/release/run.sh: release gate harness (static+frontend+backend)
- CHANGELOG: v1.7.89-alpha notes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 01:23:32 -04:00
archipelago
d6f108d818 chore: snapshot release workspace 2026-06-12 03:00:15 -04:00
archipelago
6a30ff11bd chore: release v1.7.84-alpha 2026-06-11 04:44:58 -04:00
archipelago
f818f1dcc1 app-platform: remove unsupported saleor release surface 2026-06-11 01:16:21 -04:00
archipelago
c393b96da3 backend: harden rootless app lifecycle orchestration 2026-06-11 00:24:32 -04:00
archipelago
34c4e87d14 feat(apps): add saleor storefront 2026-05-20 23:02:57 -04:00
archipelago
f4368785f0 fix(apps): unblock saleor and netbird first-use flows 2026-05-20 00:28:30 -04:00
archipelago
92c58141af fix(apps): stabilize saleor and netbird launch 2026-05-19 21:45:17 -04:00
archipelago
522c046525 feat(apps): add saleor and harden netbird repair 2026-05-19 20:11:22 -04:00
archipelago
bd69ef41d5 fix(apps): repair netbird login and iframe focus 2026-05-19 19:21:43 -04:00
archipelago
f0bd49d03d fix(apps): repair netbird install and app icons 2026-05-19 17:20:32 -04:00
archipelago
ab96c97cb9 fix(apps): self-host netbird and stabilize app sessions 2026-05-19 16:02:35 -04:00
archipelago
87be717f40 fix(apps): keep slow installs visible 2026-05-19 14:29:20 -04:00
archipelago
413d50116e fix(apps): restore mobile and website launching 2026-05-17 19:22:18 -04:00
archipelago
7804223152 chore: release v1.7.57-alpha 2026-05-17 17:30:04 -04:00
Dorian
b8053c00ca fix: clear stale health notifications 2026-05-14 08:57:54 -04:00
Dorian
f95e9a1cd0 fix: quote quadlet environment values 2026-05-14 01:15:22 -04:00
Dorian
2ff47f88a7 fix: harden container reconcile and launch behavior 2026-05-13 22:59:55 -04:00
Dorian
835c525218 chore(release): stage v1.7.55-alpha 2026-05-13 15:09:22 -04:00
archipelago
c0751e2551 chore(release): stage v1.7.54-alpha 2026-05-06 09:23:57 -04:00
archipelago
745cb1c626 chore(release): stage v1.7.52-alpha 2026-05-05 11:29:18 -04:00
archipelago
aad0ba5234 feat(orchestrator): drift-sync existing Quadlet units on each reconcile
When a Quadlet unit file already exists for an orchestrator-managed
backend, sync its on-disk bytes against what the current renderer
produces. write_if_changed makes this idempotent — when bytes match,
no IO; when they differ (post-deploy of a renderer change), the file
is rewritten and systemctl --user daemon-reload runs once.
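
The idempotence property can be sketched with std alone. This is a
hypothetical re-creation of a write_if_changed helper: the boolean return
value, the temp-file naming, and the treatment of read errors are all
assumptions.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Write `contents` to `path` only if the bytes on disk differ.
/// Returns Ok(true) when the file was (re)written and Ok(false) when it
/// already matched, so the caller can run daemon-reload only on change.
pub fn write_if_changed(path: &Path, contents: &[u8]) -> io::Result<bool> {
    match fs::read(path) {
        Ok(existing) if existing == contents => return Ok(false),
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        // Assumption: any other read error is surfaced, not overwritten.
        Err(e) => return Err(e),
    }
    // Write to a sibling temp file, then rename over the target so a
    // reader never observes a half-written unit file.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

A reconcile loop built on this would call daemon-reload once if any unit
returned true, and do nothing at all on a steady-state tick.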

We deliberately do NOT restart the .service when the file changes:
running containers keep their current config until the operator
restarts them. That's the right tradeoff — file updates are cheap and
non-destructive; service restarts are the SIGKILL cascade we're
trying to eliminate.

Why this matters: pre-this-commit, every renderer change required a
fresh package.install RPC per app to take effect. Observed live on
.228 2026-05-02 — the TimeoutStartSec=600 fix shipped in code but
existing units stayed on the old format because nothing triggered a
re-render. Combined with state.json being empty (so the reconciler's
auto-install path didn't fire either), the fix was invisible until
manual unit deletion.

Companions (UI_APP_IDS) are skipped — companion.rs renders those units
with a different shape; syncing here would clobber them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 11:43:18 -04:00
archipelago
281e65e697 fix(quadlet): TimeoutStartSec=600 when Notify=healthy is set
Bug surfaced live on .228 2026-05-02 — every backend Quadlet unit
(lnd, electrumx, fedimint, btcpay-server, mempool-api, bitcoin-knots)
hit systemd's default 90s start timeout because Notify=healthy makes
systemctl wait for the first green health probe, but
HealthInterval=30s × HealthRetries=3 = 90s minimum even on a healthy
service. Race: timeout fires the moment the third probe MIGHT succeed.

Result was three different post-states (inactive+running, failed+missing,
inactive+stopped) depending on whether systemd's ExecStopPost ran
podman rm before the orchestrator's adoption logic re-grabbed the
container.

Fix: when health is set, render TimeoutStartSec=600 (10 minutes) into
[Service]. Long enough for slow-starting backends (electrumx index
replay, lnd wallet unlock) without being so long that a truly stuck
unit hangs forever. Companions stay unchanged (no health → no override,
default 90s applies).
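
The arithmetic behind the race, and the conditional override, can be written
out as a short sketch. The constant and function names are hypothetical; the
numbers are the ones quoted in this commit message.

```rust
/// systemd's default start timeout, as cited in the commit message.
pub const DEFAULT_START_TIMEOUT_SECS: u64 = 90;
/// The override rendered for health-gated units.
pub const HEALTH_GATED_START_TIMEOUT_SECS: u64 = 600;

/// Probe window implied by the health settings: interval times retries.
pub fn probe_window_secs(interval_secs: u64, retries: u64) -> u64 {
    interval_secs * retries
}

/// Render the [Service] override only when a health check is set, so
/// companion units (no health check) keep the default.
pub fn render_timeout_start_sec(has_health: bool) -> Option<String> {
    has_health.then(|| format!("TimeoutStartSec={}", HEALTH_GATED_START_TIMEOUT_SECS))
}
```

With HealthInterval=30s and HealthRetries=3 the window is 90 seconds, equal
to the default timeout, which is why the start could be cut off at the same
moment the probe might pass.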

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 07:14:48 -04:00
archipelago
384f12de7a fix(quadlet): http:// double-prefix + companion migration race
Two bugs surfaced by the first real-node validation of Phase 3.2-3.4
on .228 (2026-05-02), both caught before flipping the default.

Bug 1 — translate_health_check double-prefixed http://. Manifests in
the wild carry the scheme inside the endpoint string ("http://localhost:8175"),
and we were prepending another http:// unconditionally. Result on .228:
every backend HealthCmd read `curl -fsS -m 5 http://http://localhost...`,
every probe failed, fedimint hit a 14-restart loop. Now we accept either
form and skip appending hc.path when the endpoint already carries one.
Regression test asserts no double-prefix and that an in-endpoint path
is honoured.
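
One way to accept both endpoint forms, as a sketch only: health_url is a
hypothetical helper, and treating any "/" after the authority (including a
bare trailing slash) as "already carries a path" is an assumption.

```rust
/// Build the probe URL from a manifest endpoint and an optional path.
/// Accepts "host:port" as well as "http://host:port[/path]" without
/// producing "http://http://...".
pub fn health_url(endpoint: &str, path: &str) -> String {
    // Split off a scheme if the manifest already carries one.
    let (scheme, rest) = match endpoint.split_once("://") {
        Some((scheme, rest)) => (scheme, rest),
        None => ("http", endpoint),
    };
    // Assumption: any '/' after the authority means the endpoint has its
    // own path, in which case the separate path is not appended.
    if rest.contains('/') || path.is_empty() {
        format!("{scheme}://{rest}")
    } else {
        format!("{scheme}://{rest}{path}")
    }
}
```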

Bug 2 — Phase 3.3 migration ran for UI companions (bitcoin-ui /
electrs-ui / lnd-ui) that have shipped via Quadlet since v1.7.41.
Migration tore down the running companion + raced companion.rs render,
producing "Phase 3.3: re-install archy-bitcoin-ui via Quadlet" reconcile
errors and leaving archy-bitcoin-ui down. Companions now short-circuit
out of migrate_to_quadlet_if_needed before any IO. Also: when try_exists
returns Err for an unrelated reason (permissions, EIO), we now skip
migration instead of treating "I can't tell" as "go ahead and migrate" —
migrating on top of a possibly-existing unit is destructive.

What this does not fix yet:
  * the orchestrator's reconciler iterating every manifest in
    /opt/archipelago/apps/, not just installed apps. Pre-existing
    behavior (also affects the legacy path) — separate scope.
  * fedimint /data UID mismatch surfaced when Quadlet started fedimint
    fresh. Likely orthogonal — defer.
  * no rollback when install_via_quadlet fails after a remove_container.
    Tracked as Phase 3.3.1 — defer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:37:37 -04:00
archipelago
97ce23d773 feat(quadlet): Phase 3.4 — health-gated startup via Notify=healthy
QuadletUnit gains an optional HealthSpec; from_manifest translates the
manifest's health_check (tcp/http/cmd) into a HealthCmd= directive and
emits Notify=healthy alongside it. systemctl start <unit>.service then
blocks until the container's first green probe — eliminating the
"container up but RPC not ready" race the orchestrator currently papers
over with post-start polling.

Translation policy:
* tcp,  endpoint "host:port"        -> nc -z host port
* http, endpoint "host:port", path  -> curl -fsS -m 5 http://endpoint<path>
* cmd,  endpoint "<shell command>"  -> verbatim
* unknown type / malformed endpoint -> None (skip Notify=healthy rather
  than emit a HealthCmd that hangs the unit start forever)
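
The translation policy above can be sketched as a single function returning
an optional command string. health_cmd_for and its string-typed parameters
are hypothetical; the sketch mirrors the policy exactly as listed in this
commit, so it expects a scheme-less "host:port" endpoint for http.

```rust
/// Translate a manifest health check into a HealthCmd string, or None
/// when the check cannot be translated safely (in which case the caller
/// would skip Notify=healthy as well).
pub fn health_cmd_for(kind: &str, endpoint: &str, path: &str) -> Option<String> {
    match kind {
        "tcp" => {
            // Expect "host:port" with a numeric port.
            let (host, port) = endpoint.rsplit_once(':')?;
            if host.is_empty() || port.parse::<u16>().is_err() {
                return None;
            }
            Some(format!("nc -z {host} {port}"))
        }
        "http" => {
            if endpoint.is_empty() {
                return None;
            }
            Some(format!("curl -fsS -m 5 http://{endpoint}{path}"))
        }
        "cmd" => {
            // The endpoint field carries the shell command verbatim.
            if endpoint.trim().is_empty() {
                None
            } else {
                Some(endpoint.to_string())
            }
        }
        // Unknown type: better no gate than one that hangs forever.
        _ => None,
    }
}
```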

Companion units leave health: None and remain byte-identical to before
this PR — the renderer only emits the Health* / Notify= block when set.

+4 quadlet unit tests (19 total). Dropped a never-used test setter that
was generating a dead_code warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:21:57 -04:00
archipelago
65576bd755 feat(orchestrator): Phase 3.3 — in-place migration to Quadlet
When use_quadlet_backends flips from off → on, existing fleet boxes
have backend containers parented under archipelago.service's cgroup
(the bad shape that triggers FM3 cascade SIGKILL on every archipelago
restart). ensure_running now notices and corrects this:

* If there's already a `<name>.container` unit on disk → no-op
  (subsequent reconcile ticks take this fast path).
* Else if a podman container with that name exists → it's a pre-3.3
  artifact. Stop+remove it (volumes survive — bind mounts are not
  touched by `podman rm`), then write the Quadlet unit, daemon-reload,
  and start the new managed service.
* Else → fall through to install_fresh, which already routes through
  install_via_quadlet when the flag is on.
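
The three-way decision above can be sketched as a pure planning function.
The enum and function names are hypothetical, and the two booleans stand in
for the real on-disk and podman probes.

```rust
/// What the reconciler should do for one backend (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum MigrationAction {
    /// A Quadlet unit already exists: fast path, nothing to do.
    NoOp,
    /// A pre-3.3 podman container exists: stop and remove it, then
    /// write the unit, daemon-reload, and start the service.
    MigrateExistingContainer,
    /// Neither exists: fall through to a fresh install.
    InstallFresh,
}

/// Decide the migration step from two observations. The unit check
/// comes first so repeated reconcile ticks stay idempotent.
pub fn plan_migration(unit_on_disk: bool, container_exists: bool) -> MigrationAction {
    if unit_on_disk {
        MigrationAction::NoOp
    } else if container_exists {
        MigrationAction::MigrateExistingContainer
    } else {
        MigrationAction::InstallFresh
    }
}
```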

The migration is idempotent and self-healing: if a fleet box is
half-migrated (unit on disk but no service active, or service active
but stale unit), the next reconcile tick converges. Bitcoin chain
data, lnd wallet state, and electrumx index all live on host bind
mounts and are unaffected by the container-record swap.

Volume safety audited per backend in `uses_orchestrator_install_flow`
allowlist — every entry mounts its data dir as a host bind mount.

Default still off. To migrate a node:
  /etc/archipelago/config.toml: use_quadlet_backends = true
followed by `systemctl restart archipelago` — the next reconcile tick
walks every managed app and migrates each in turn.

Tests: 624 passing, 0 cargo warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:27:59 -04:00
archipelago
5b2e02bd43 feat(orchestrator): Phase 3.2 — wire Quadlet path behind feature flag
prod_orchestrator::install_fresh now branches on the new
Config::use_quadlet_backends flag (default false):

* off (today's production behavior) — unchanged: runtime.create_container
  + start_container, container parented under archipelago.service's
  cgroup, FM3 cascade SIGKILL on every archipelago restart.
* on  — install_via_quadlet renders the manifest as a Quadlet unit via
  QuadletUnit::from_manifest, writes it atomically into
  ~/.config/containers/systemd/, calls daemon-reload, and starts the
  generated <name>.service. Container ends up under user.slice — no
  more cgroup parented under archipelago, so archipelago restarts
  don't touch the container's lifetime.

Default off so this commit is structurally safe to ship: nothing
changes at runtime until an operator opts in. Flip the default once
tests/lifecycle/run-20x.sh has gone green against the new path on
.228 + .198 (the v1.7.52 release gate).

Plumbing:
* config.rs — `use_quadlet_backends: bool` w/ Default false
* prod_orchestrator.rs — flag stored on the struct, threaded through
  new(), with set_use_quadlet_backends(bool) test setter
* prod_orchestrator.rs — install_via_quadlet helper
* dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest /
  parse_memory_mib / RestartPolicy::OnFailure now that the call path
  exists; if a future revert removes the wiring, the warnings come back.

Tests: 624 passing, cargo check clean (0 warnings). Existing companion
behavior unaffected — render_skips_backend_directives_when_default
still passes byte-equal to before quadlet.rs grew the new fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:22:10 -04:00
archipelago
9becafafd3 feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52)
The QuadletUnit struct now covers everything a backend manifest needs
(ports, environment, devices, add_hosts, entrypoint+command, read-only
root, no_new_privileges, cpu_quota, restart policy choice). Adds
QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed
manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB
forms. The renderer skips empty/false directives so existing companion
units render byte-identically — no behavior change for shipping
companions; the backend renderer is dead code until Phase 3.2 wires it
into the orchestrator.
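
For the three memory forms mentioned above, a parser could look roughly like
this. It is a hypothetical re-creation: the Option return type, the
case-insensitive suffix handling, and the overflow check are assumptions.

```rust
/// Parse a manifest memory limit into MiB.
/// Accepts "2g" (GiB), "512m" (MiB), or a raw MiB number such as "1024".
/// Returns None for anything else, including values that would overflow.
pub fn parse_memory_mib(raw: &str) -> Option<u64> {
    let s = raw.trim().to_ascii_lowercase();
    if let Some(gib) = s.strip_suffix('g') {
        gib.parse::<u64>().ok()?.checked_mul(1024)
    } else if let Some(mib) = s.strip_suffix('m') {
        mib.parse::<u64>().ok()
    } else {
        s.parse::<u64>().ok()
    }
}
```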

Eight new unit tests cover:
* parse_memory_mib forms (1024, 512m, 2g, garbage)
* shell_join quoting (whitespace, embedded quotes)
* RestartPolicy → systemd string mapping
* render emits backend directives when set
* render skips them when defaulted (companion regression gate)
* from_manifest happy path on a bitcoin-knots-shaped manifest
* from_manifest read-only volume detection
* from_manifest tmpfs filtering
* end-to-end manifest → render bytes assertion

Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was
implicitly covered before but is now explicit). Cargo warnings: 0.

`from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are
marked allow(dead_code) with explicit references to Phase 3.2 — if
3.2 doesn't wire them, the dead-code warning resurfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:09:50 -04:00
archipelago
6bbe1b96cf refactor: drop dead code surfaced by cargo
cargo check was showing five real warnings, all genuinely dead:

* container/mod.rs   — re-exports compute_container_name, AdoptionReport,
                       ReconcileAction, ReconcileReport were unused outside
                       prod_orchestrator. Drop from the pub use line.
* prod_orchestrator  — with_runtime + insert_manifest_for_test only exist
                       for the test module in the same file. Mark them
                       #[cfg(test)] so they don't appear in release builds.
* async_lifecycle    — remove_package_entry has no callers; doc claims
                       "used for install-failure cleanup" but nothing
                       cleans up. Delete (10 lines).
* registry.rs        — `use tracing::{debug, info};` had no consumers.
* fips.rs            — unused-assignment chain on last_status. The poll
                       loop always sets it on every break path, so the
                       initial `None` and the unwrap_or_else fallback
                       were both dead. Refactored to `let after = loop
                       { ...; break s; };`.

cargo check is now clean. cargo test --workspace --bins: 614 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:34:02 -04:00
archipelago
f9e34fd0c6 refactor(install): route orchestrator-managed apps through orchestrator first
Phase 3a of the install path consolidation. Two coupled changes:

1. install.rs handle_package_install: gate the legacy "container exists →
   adopt + return" probe on !orchestrator_managed. Apps the orchestrator
   knows about (bitcoin-knots, bitcoin-core, lnd, electrumx, fedimint,
   filebrowser, btcpay-server stack apps, mempool stack apps, plus the
   companion UIs that just moved to Quadlet) skip the legacy probe and
   fall straight into the orchestrator branch.

   The legacy adopt block was returning success on a bare `podman start`
   exit-0 — even when the process inside the container crashed seconds
   later. That's the .228 "running but unreachable" failure mode. The
   orchestrator's ensure_running honors the manifest's health check and
   pre-start hooks (e.g. re-renders bitcoin-ui's nginx.conf if the RPC
   password rotated), so this is a behavioral upgrade, not just a
   refactor.

2. ProdContainerOrchestrator::install: make idempotent. Previously it
   blindly called install_fresh which would fail on `podman create` if
   the container name already existed. Now it delegates to ensure_running:
     - Container Running + healthy → no-op (refresh hooks, restart if
       config rewritten)
     - Container Stopped/Exited → start (with hook refresh)
     - Container missing → install_fresh
     - Container in wedged state (Created/Paused/Unknown) → force-recreate

   Without this, change #1 would regress every "container already exists"
   case for the 18 orchestrator-managed app IDs. With it, install becomes
   the single source of truth for "make app X be in the desired state."
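
   The state table above can be sketched as a pure match. The enums and the
   function name are hypothetical, and the sketch leaves out the health
   probe and hook refresh that the real ensure_running performs.

```rust
/// Observed container state (hypothetical subset of podman states).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerState {
    Running,
    Stopped,
    Exited,
    Missing,
    Created,
    Paused,
    Unknown,
}

/// Action that brings the app to its desired state (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum EnsureAction {
    /// Already running: refresh hooks only.
    RefreshHooks,
    /// Present but not running: start it.
    Start,
    /// No container yet: full install.
    InstallFresh,
    /// Wedged state: remove and recreate.
    ForceRecreate,
}

/// Map each observed state to one action so that calling install twice
/// in a row is safe.
pub fn plan_ensure_running(state: ContainerState) -> EnsureAction {
    match state {
        ContainerState::Running => EnsureAction::RefreshHooks,
        ContainerState::Stopped | ContainerState::Exited => EnsureAction::Start,
        ContainerState::Missing => EnsureAction::InstallFresh,
        ContainerState::Created | ContainerState::Paused | ContainerState::Unknown => {
            EnsureAction::ForceRecreate
        }
    }
}
```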

Tests: 654 passed across the workspace (614 unit + 37 orchestration + 3
rpc), 0 failures. The 20 prod_orchestrator tests cover the install /
ensure_running / reconcile paths the new install delegates through.

Net delta: install.rs grows by ~30 lines (gating wrapper + comments),
prod_orchestrator.rs grows by ~30 lines (idempotent install body). Both
are temporary — the larger deletions (~1700 lines) come once every app
has been verified through the orchestrator path in subsequent phases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 11:12:52 -04:00
archipelago
23c4e7441f refactor(container): move companion UIs to systemd via Quadlet
Companion UI containers (archy-bitcoin-ui, archy-lnd-ui,
archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn
blocks from install.rs. If archipelago crashed mid-spawn or the
container's cgroup was reaped, companions vanished from podman ps -a
and only a manual rm/run could bring them back (the .228 incident).

Now each companion is rendered as a Quadlet .container unit under
~/.config/containers/systemd/, daemon-reloaded, and started via
systemctl --user. systemd owns supervision from that point on:

- archipelago can crash, restart, or be uninstalled without touching
  any companion.
- Quadlet's Restart=always + RestartSec=10 handles container exits.
- A 30s reconcile tick in boot_reconciler enumerates expected
  companion units and re-installs any whose unit file or service
  vanished — defense-in-depth against external tampering.

New module layout:
- container/quadlet.rs: pure unit renderer + atomic write_if_changed
  + systemctl helpers (daemon_reload_user / enable_now / disable_remove
  / is_active). 6 unit tests, no I/O in the renderer.
- container/companion.rs: per-app companion specs, install/remove/
  reconcile, and image presence (build locally first; fall back to an
  insecure registry only if image_uses_insecure_registry whitelists
  it). 2 tests.
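
An atomic write_if_changed helper of the kind listed above could be sketched as follows. This assumes a write-to-temp-then-rename strategy and a boolean return value. The actual signature and temp-file naming in quadlet.rs are not shown in this log, so both are assumptions here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `contents` to `path` only when it differs from what is on disk.
/// Returns Ok(true) if the file was written, Ok(false) if it was left alone.
fn write_if_changed(path: &Path, contents: &str) -> io::Result<bool> {
    match fs::read_to_string(path) {
        // Identical content: skip the write.
        Ok(existing) if existing == contents => return Ok(false),
        // Different content: fall through and rewrite.
        Ok(_) => {}
        // No file yet: fall through and create it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    // Write a sibling temp file, then rename it over the target. On POSIX,
    // a rename within one filesystem replaces the target atomically, so
    // readers never see a half-written unit file.
    // Assumption: nothing else uses the sibling ".tmp" name concurrently.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

A caller could use the return value to decide whether a daemon-reload is needed, so an unchanged unit file costs one read and no systemd work.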

install.rs handle_package_install now ends with a single call to
companion::install_for(package_id), replacing 287 lines of
spawn-and-hope shellouts plus a ~120-line nginx auth-injector helper
that worked around per-node RPC password baking. In place of that
helper, the pre-start hook renders the per-node nginx.conf to
/var/lib/archipelago/bitcoin-ui/nginx.conf and the Quadlet unit
bind-mounts it read-only.

runtime.rs handle_package_uninstall now disables companions before
the container rm loop. Otherwise systemd's Restart=always would
respawn each companion within ~10s of removal.

Tests: 53 container tests pass, including 6 quadlet renderer tests
(host network, bridge network, capability set, atomic write idempotence)
and 2 companion tests (per-app companion lookup, build_unit shape).
boot_reconciler tests gain a #[cfg(test)] without_companion_stage()
flag so the paused-clock fixtures don't race the real systemctl I/O.

A bats regression test (companion-survives-archipelago-restart.bats,
gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode
cannot recur: every installed companion has a unit file, services
stay active across systemctl --user restart archipelago, and a
deleted unit file is recreated within one reconcile tick.

Net delta: +941 / -363, but the +941 is mostly tests (~440 lines)
and the new declarative layer; the imperative tokio::spawn block and
its nginx-auth helper are gone, removing two failure classes
(orphan companions on archipelago crash, and post-start exec races
under tightly-confined cgroups) that previously needed manual SSH
recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:45:07 -04:00
archipelago
0684491072 chore: baseline codex hardening before lifecycle refactor
Snapshots the in-flight hardening work so subsequent reconcile/Quadlet
phases land on a clean before/after diff.

Changes:
- core/container/src/podman_client.rs: image_uses_insecure_registry()
  whitelist for the OVH (146.59.87.168:3000) and legacy Hetzner
  (23.182.128.160:3000) HTTP mirrors; podman_network_settings() lifts
  custom networks into the Networks map so containers can join them.
- core/archipelago/src/container/prod_orchestrator.rs:
  ensure_container_network() creates per-manifest networks on demand;
  apply_data_uid() now goes through host_sudo for mkdir -p + chown so
  bind-mount roots get created and chowned without password prompts.
- core/archipelago/src/api/rpc/package/{install,update,stacks}.rs:
  podman pull adds --tls-verify=false only for whitelisted registries.
- core/archipelago/src/bootstrap.rs: removes stale dev-mode systemd
  override on startup (live nodes carried it from old installers).
- core/archipelago/src/config.rs: ignore ARCHIPELAGO_DEV_MODE in prod
  binaries — it had been silently rerouting volumes to /tmp.
- apps/bitcoin-{core,knots}/manifest.yml: locate bitcoind at runtime
  so image-layout differences don't break entrypoint.
- scripts/app-catalog-image-smoke-test.py: production catalog/image
  smoke test that probes a target node before users click Install.
- .gitignore: cover .codex, .pnpm-store, __pycache__, *.bak.
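
The registry whitelist and the conditional --tls-verify=false flag can be sketched as below. The function name image_uses_insecure_registry and the two mirror addresses come from this commit message. The function body, the constant, and the pull_args helper are assumptions made for illustration, not the code in podman_client.rs.

```rust
/// Registries that may be pulled over plain HTTP: the two mirrors named in
/// the commit message. The constant itself is hypothetical.
const INSECURE_REGISTRIES: [&str; 2] = ["146.59.87.168:3000", "23.182.128.160:3000"];

/// True only when the image reference's registry component is whitelisted.
fn image_uses_insecure_registry(image: &str) -> bool {
    // The registry is everything before the first '/'.
    let registry = image.split('/').next().unwrap_or("");
    INSECURE_REGISTRIES.contains(&registry)
}

/// Builds `podman pull` arguments (hypothetical helper). TLS verification is
/// disabled only for whitelisted registries and stays on for all others.
fn pull_args(image: &str) -> Vec<String> {
    let mut args = vec!["pull".to_string()];
    if image_uses_insecure_registry(image) {
        args.push("--tls-verify=false".to_string());
    }
    args.push(image.to_string());
    args
}
```

Matching on the exact registry component, rather than a substring, prevents an image path that merely contains a mirror address from disabling TLS verification.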

Removes filebrowser.rs.bak and two stale catalog.json.bak files
(verified identical to live counterparts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:52:29 -04:00
archipelago
05e6c2e738 fix: release v1.7.51-alpha install hardening
2026-05-01 05:02:39 -04:00
archipelago
7ab788d178 chore: release v1.7.49-alpha
2026-04-30 16:37:54 -04:00
archipelago
8f83b37d51 feat(orchestrator): complete container migration and release hardening
2026-04-28 15:00:58 -04:00
archipelago
2843cc1e84 fix(container/image_versions): reject entries that are not image references
The parser retained any key ending in _IMAGE, so a harmless-looking
variable like NOT_AN_IMAGE="something" would be treated as a pinned
container image. Add a value-shape check: the value must contain both
a registry separator (/) and a tag separator (:) to qualify.
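
A minimal sketch of the value-shape check, assuming the parser reads KEY=VALUE lines with optionally quoted values. Both function names and the input format are hypothetical; only the rule itself (key ends in _IMAGE, value contains both / and :) is taken from the description above.

```rust
/// Value-shape check: an image reference must contain both a registry
/// separator ('/') and a tag separator (':').
fn looks_like_image_ref(value: &str) -> bool {
    value.contains('/') && value.contains(':')
}

/// Hypothetical parser over KEY=VALUE lines. Keeps only keys ending in
/// `_IMAGE` whose values pass the shape check.
fn parse_image_versions(input: &str) -> Vec<(String, String)> {
    input
        .lines()
        // Skip lines without '='.
        .filter_map(|line| line.trim().split_once('='))
        // Strip whitespace and surrounding double quotes from the value.
        .map(|(k, v)| (k.trim(), v.trim().trim_matches('"')))
        .filter(|(k, v)| k.ends_with("_IMAGE") && looks_like_image_ref(v))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With this check, NOT_AN_IMAGE="something" is dropped because its value has neither separator, while a key that does not end in _IMAGE is still ignored regardless of its value.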
2026-04-23 13:02:15 -04:00