1074 lines
41 KiB
Rust

refactor(container): move companion UIs to systemd via Quadlet Companion UI containers (archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui) used to be launched as fire-and-forget tokio::spawn blocks from install.rs. If archipelago crashed mid-spawn or the container's cgroup was reaped, companions vanished from podman ps -a and only a manual rm/run could bring them back (the .228 incident). Now each companion is rendered as a Quadlet .container unit under ~/.config/containers/systemd/, daemon-reloaded, and started via systemctl --user. systemd owns supervision from that point on: - archipelago can crash, restart, or be uninstalled without touching any companion. - Quadlet's Restart=always + RestartSec=10 handles container exits. - A 30s reconcile tick in boot_reconciler enumerates expected companion units and re-installs any whose unit file or service vanished — defense-in-depth against external tampering. New module layout: - container/quadlet.rs: pure unit renderer + atomic write_if_changed + systemctl helpers (daemon_reload_user / enable_now / disable_remove / is_active). 6 unit tests, no I/O in the renderer. - container/companion.rs: per-app companion specs, install/remove/ reconcile, image presence (build local first, fall back to insecure registry only via image_uses_insecure_registry whitelist). 2 tests. install.rs handle_package_install now ends with a single call to companion::install_for(package_id), replacing 287 lines of spawn-and- hope shellouts plus a ~120-line nginx auth-injector helper that worked around per-node RPC password baking. The helper is gone too — the pre-start hook renders the per-node nginx.conf to /var/lib/archipelago/ bitcoin-ui/nginx.conf and the Quadlet unit bind-mounts it read-only. runtime.rs handle_package_uninstall now disables companions before the container rm loop. Otherwise systemd's Restart=always would respawn each companion within ~10s of removal. 
Tests: 53 container tests pass, including 6 quadlet renderer tests (host network, bridge network, capability set, atomic write idempotence) and 2 companion specs (per-app companion lookup, build_unit shape). boot_reconciler tests gain a #[cfg(test)] without_companion_stage() flag so the paused-clock fixtures don't race the real systemctl I/O. A bats regression test (companion-survives-archipelago-restart.bats, gated on ARCHY_ALLOW_DESTRUCTIVE=1) asserts the .228 failure mode cannot recur: every installed companion has a unit file, services stay active across systemctl --user restart archipelago, and a deleted unit file is recreated within one reconcile tick. Net delta: +941 / -363, but the +941 is mostly tests (~440 lines) and the new declarative layer; the imperative tokio::spawn block and its nginx-auth helper are gone, removing two failure classes (orphan companions on archipelago crash, and post-start exec races under tightly-confined cgroups) that previously needed manual SSH recovery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 10:45:07 -04:00
//! Render and manage the lifecycle of Quadlet `.container` units for
//! companion UI containers (archy-bitcoin-ui, archy-lnd-ui, archy-electrs-ui).
//!
//! Why Quadlet: companions used to run as fire-and-forget `tokio::spawn`
//! blocks from `install.rs`. If archipelago crashed mid-spawn or the
//! kernel reaped a parent cgroup, companions vanished from `podman ps`
//! entirely and only a manual `podman run` brought them back. Putting the
//! unit on disk and letting systemd own start/restart removes that whole
//! class of failure: the daemon is now systemd, archipelago is just the
//! provisioner.
//!
//! Design constraints kept this module small on purpose:
//!
//! - **Single responsibility**: render → write → enable → disable. We do
//!   NOT pull images here; the caller is expected to have the image
//!   present locally (companions either build from `/opt/archipelago/docker/`
//!   or are pre-pulled by `install_companion_image`). The quadlet unit
//!   declares `Pull=never` so a missing image surfaces immediately
//!   instead of silently retrying behind systemd's restart loop.
//! - **Atomic writes**: `tempfile + rename` so a partially-written unit
//!   is never visible to systemd. A daemon-reload during a rolling
//!   update can't see half a file.
//! - **Idempotent**: `write_if_changed` compares bytes before touching
//!   the file. No daemon-reload, no service-restart cascade if the
//!   rendered bytes match what's on disk.
//! - **systemctl --user only**: archipelago runs as uid=1000 with
//!   linger enabled. We never touch the system bus from here.
//!
//! See `docs/rust-orchestrator-migration.md` and the failure-mode log in
//! `feedback_container_lifecycle_failure_modes.md` for the incident
//! that motivated the move.
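
// Illustration only: a minimal, synchronous sketch of the "atomic write" and
// "idempotent" constraints described above, using nothing but std. The name
// `write_if_changed_sketch` and the `.tmp` suffix are assumptions made for
// this example; this is not the module's actual `write_if_changed`.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Returns Ok(true) when the file was (re)written and Ok(false) when the
/// bytes on disk already matched, so the caller can skip daemon-reload.
fn write_if_changed_sketch(path: &Path, contents: &[u8]) -> io::Result<bool> {
    // Idempotence: compare bytes first and leave the file alone on a match.
    match fs::read(path) {
        Ok(existing) if existing == contents => return Ok(false),
        Ok(_) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }

    // Write to a sibling temp file so the rename stays on one filesystem.
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(".tmp");
    let tmp = path.with_file_name(tmp_name);
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(contents)?;
        f.sync_all()?; // flush the data before the rename publishes it
    }

    // On POSIX filesystems rename replaces the destination atomically, so a
    // reader sees either the old unit or the new one, never half a file.
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

// A caller would typically trigger a daemon-reload only when this returns
// Ok(true), which is what keeps repeated reconcile ticks cheap.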
use anyhow::{anyhow, Context, Result};
feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52) The QuadletUnit struct now covers everything a backend manifest needs (ports, environment, devices, add_hosts, entrypoint+command, read-only root, no_new_privileges, cpu_quota, restart policy choice). Adds QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB forms. The renderer skips empty/false directives so existing companion units render byte-identically — no behavior change for shipping companions; the backend renderer is dead code until Phase 3.2 wires it into the orchestrator. Eight new unit tests cover: * parse_memory_mib forms (1024, 512m, 2g, garbage) * shell_join quoting (whitespace, embedded quotes) * RestartPolicy → systemd string mapping * render emits backend directives when set * render skips them when defaulted (companion regression gate) * from_manifest happy path on a bitcoin-knots-shaped manifest * from_manifest read-only volume detection * from_manifest tmpfs filtering * end-to-end manifest → render bytes assertion Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was implicitly covered before but is now explicit). Cargo warnings: 0. `from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are marked allow(dead_code) with explicit references to Phase 3.2 — if 3.2 doesn't wire them, the dead-code warning resurfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:09:50 -04:00
use archipelago_container::AppManifest;
use std::fmt::Write as _;
use std::path::{Path, PathBuf};
use tokio::fs;
use tokio::process::Command;
/// Default rootless quadlet directory. Resolved per-user at runtime via
/// `unit_dir()`. Tests pass an explicit dir.
pub const DEFAULT_REL_UNIT_DIR: &str = ".config/containers/systemd";
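
// Illustration only: how a relative unit dir like the constant above could be
// resolved against a home directory. `unit_dir_sketch`, `unit_path_sketch`
// and the `/home/archy` path used in the usage note are hypothetical; the
// module's real `unit_dir()` may resolve the directory differently.

```rust
use std::path::{Path, PathBuf};

// Same value as the constant above, repeated so the sketch compiles alone.
const REL_UNIT_DIR_SKETCH: &str = ".config/containers/systemd";

/// Joins a caller-supplied home directory with the relative unit dir.
fn unit_dir_sketch(home: &Path) -> PathBuf {
    home.join(REL_UNIT_DIR_SKETCH)
}

/// Full path of the `<name>.container` file for one companion.
fn unit_path_sketch(home: &Path, name: &str) -> PathBuf {
    unit_dir_sketch(home).join(format!("{name}.container"))
}
```

// With home = /home/archy and name = archy-bitcoin-ui this yields
// /home/archy/.config/containers/systemd/archy-bitcoin-ui.container.
// Taking the home directory as a parameter is what lets tests pass an
// explicit dir instead of touching the real user config.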
#[derive(Debug, Clone)]
pub struct BindMount {
    pub host: PathBuf,
    pub container: PathBuf,
    pub read_only: bool,
}
#[derive(Debug, Clone, Default)]
#[allow(dead_code)] // Bridge is reserved for Phase 5 per-app network isolation.
pub enum NetworkMode {
    #[default]
    Host,
    /// A user-defined podman network: quadlet creates the container
    /// attached to it. The network must already exist (orchestrator's
    /// `ensure_container_network` handles that on every reconcile tick).
    Bridge(String),
}
/// systemd Restart= policy for the generated `.service` unit. Companions
/// use Always (any exit triggers a restart). Backends use OnFailure
/// (clean exits — e.g. operator-issued `systemctl stop` — stay stopped,
/// only crashes get restarted automatically).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RestartPolicy {
    Always,
feat(orchestrator): Phase 3.2 — wire Quadlet path behind feature flag prod_orchestrator::install_fresh now branches on the new Config::use_quadlet_backends flag (default false): * off (today's production behavior) — unchanged: runtime.create_container + start_container, container parented under archipelago.service's cgroup, FM3 cascade SIGKILL on every archipelago restart. * on — install_via_quadlet renders the manifest as a Quadlet unit via QuadletUnit::from_manifest, writes it atomically into ~/.config/containers/systemd/, calls daemon-reload, and starts the generated <name>.service. Container ends up under user.slice — no more cgroup parented under archipelago, so archipelago restarts don't touch the container's lifetime. Default off so this commit is structurally safe to ship: nothing changes at runtime until an operator opts in. Flip the default once tests/lifecycle/run-20x.sh has gone green against the new path on .228 + .198 (the v1.7.52 release gate). Plumbing: * config.rs — `use_quadlet_backends: bool` w/ Default false * prod_orchestrator.rs — flag stored on the struct, threaded through new(), with set_use_quadlet_backends(bool) test setter * prod_orchestrator.rs — install_via_quadlet helper * dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest / parse_memory_mib / RestartPolicy::OnFailure now that the call path exists; if a future revert removes the wiring, the warnings come back. Tests: 624 passing, cargo check clean (0 warnings). Existing companion behavior unaffected — render_skips_backend_directives_when_default still passes byte-equal to before quadlet.rs grew the new fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:22:10 -04:00
    /// Used by `from_manifest` for backend manifests. Wired through
    /// `install_via_quadlet` (gated by `Config::use_quadlet_backends`).
    OnFailure,
}
impl Default for RestartPolicy {
    fn default() -> Self {
        Self::Always
    }
}
impl RestartPolicy {
    fn as_systemd(self) -> &'static str {
        match self {
            Self::Always => "always",
            Self::OnFailure => "on-failure",
        }
    }
}
/// Container healthcheck wired through to systemd via `Notify=healthy`.
/// When set, `systemctl start <name>.service` blocks until the container's
/// own healthcheck reports green — eliminating the "container up but RPC
/// not ready" race that the orchestrator currently papers over with
/// post-start polling.
///
/// Fields roughly mirror the manifest's HealthCheck struct: `cmd` is the
/// shell form (`/usr/bin/curl -fsS http://localhost:8332/health` etc.),
/// `interval`/`timeout` use systemd time format ("30s", "5m"), `retries`
/// is the consecutive-failures threshold before "unhealthy" trips.
#[derive(Debug, Clone)]
pub struct HealthSpec {
    pub cmd: String,
    pub interval: String,
    pub timeout: String,
    pub retries: u32,
}
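
// Illustration only: what the health directives described above look like
// once rendered for one concrete spec. `HealthSketch` is a stand-in for
// `HealthSpec` so the example compiles on its own, and the interval, timeout
// and retry values in the usage note are made up for the example.

```rust
use std::fmt::Write as _;

/// Stand-in for `HealthSpec`; same four fields.
struct HealthSketch {
    cmd: String,
    interval: String,
    timeout: String,
    retries: u32,
}

/// Renders the [Container] health lines, ending with Notify=healthy.
fn render_health_sketch(h: &HealthSketch) -> String {
    let mut s = String::new();
    // Writing into a String cannot fail, so the Result is discarded.
    let _ = writeln!(s, "HealthCmd={}", h.cmd);
    let _ = writeln!(s, "HealthInterval={}", h.interval);
    let _ = writeln!(s, "HealthTimeout={}", h.timeout);
    let _ = writeln!(s, "HealthRetries={}", h.retries);
    let _ = writeln!(s, "Notify=healthy");
    s
}
```

// For cmd = "/usr/bin/curl -fsS http://localhost:8332/health", interval =
// "30s", timeout = "5s", retries = 3, the result is five lines: HealthCmd=,
// HealthInterval=30s, HealthTimeout=5s, HealthRetries=3, Notify=healthy.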
/// One Quadlet `.container` unit. Field set is deliberately small —
/// add a new field only when a real manifest needs it.
#[derive(Debug, Clone, Default)]
pub struct QuadletUnit {
    pub name: String,
    pub description: String,
    pub image: String,
    pub network: NetworkMode,
    pub user: Option<String>,
    pub memory_mb: Option<u32>,
    pub cap_drop_all: bool,
    pub cap_add: Vec<String>,
    pub bind_mounts: Vec<BindMount>,
    pub extra_podman_args: Vec<String>,
    pub depends_on: Vec<String>,
    /// Phase 3.4: when present the rendered unit emits HealthCmd=,
    /// HealthInterval=, HealthTimeout=, HealthRetries=, AND Notify=healthy
    /// so systemctl start blocks on a green health probe.
    pub health: Option<HealthSpec>,
    // Backend-manifest extensions (Phase 3.1). Companion units leave
    // these defaulted; the renderer skips empty/false directives so a
    // companion's rendered bytes are unchanged from before this PR.
    pub ports: Vec<(u16, u16, String)>,
    pub environment: Vec<String>,
    pub devices: Vec<String>,
    pub add_hosts: Vec<(String, String)>,
    pub entrypoint: Option<Vec<String>>,
    pub command: Vec<String>,
    pub read_only_root: bool,
    pub no_new_privileges: bool,
    pub cpu_quota: Option<u32>,
    pub restart_policy: RestartPolicy,
}
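
// Illustration only: `memory_mb` above is expressed in MiB, and the commit
// notes mention a `parse_memory_mib` helper that accepts "1g", "512m" and
// raw-MiB forms. The sketch below is one hypothetical way to parse those
// forms; the name, the whitespace trimming and the case-insensitivity are
// assumptions, not the module's actual behavior.

```rust
/// Parses a memory size into MiB. A "g" suffix means GiB, an "m" suffix
/// means MiB, and a bare number is taken as MiB. Returns None on anything
/// else or on overflow.
fn parse_memory_mib_sketch(input: &str) -> Option<u32> {
    let s = input.trim().to_ascii_lowercase();
    let (digits, multiplier) = if let Some(n) = s.strip_suffix('g') {
        (n, 1024u32)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 1)
    } else {
        (s.as_str(), 1)
    };
    // An empty or non-numeric prefix fails to parse and yields None.
    digits.parse::<u32>().ok()?.checked_mul(multiplier)
}
```

// Returning Option keeps a malformed manifest value from silently becoming
// a zero or default limit; the caller decides whether to reject or ignore it.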
impl QuadletUnit {
    /// File name on disk: `<name>.container`. Quadlet translates this
    /// into a `<name>.service` unit at daemon-reload time.
    pub fn unit_filename(&self) -> String {
        format!("{}.container", self.name)
    }
    /// systemd service name created by Quadlet for this unit.
    pub fn service_name(&self) -> String {
        format!("{}.service", self.name)
    }
    /// Render the canonical Quadlet unit text. Pure function; no I/O.
    pub fn render(&self) -> String {
        let mut s = String::with_capacity(512);
        let _ = writeln!(s, "# Generated by archipelago. DO NOT EDIT.");
        let _ = writeln!(s, "# Edits are overwritten on the next reconcile.");
        let _ = writeln!(s);
        let _ = writeln!(s, "[Unit]");
        let _ = writeln!(s, "Description={}", self.description);
        let _ = writeln!(s, "After=network-online.target");
        let _ = writeln!(s, "Wants=network-online.target");
        for dep in &self.depends_on {
            let _ = writeln!(s, "Requires={dep}");
            let _ = writeln!(s, "After={dep}");
        }
        let _ = writeln!(s);
        let _ = writeln!(s, "[Container]");
        let _ = writeln!(s, "ContainerName={}", self.name);
        let _ = writeln!(s, "Image={}", self.image);
        // Pull=never: companions are pre-pulled or built. A missing image
        // must surface as a unit start failure, not a silent retry storm.
        let _ = writeln!(s, "Pull=never");
        match &self.network {
            NetworkMode::Host => {
                let _ = writeln!(s, "Network=host");
            }
            NetworkMode::Bridge(net) => {
                let _ = writeln!(s, "Network={net}");
            }
        }
        if let Some(user) = &self.user {
            let _ = writeln!(s, "User={user}");
        }
        if self.cap_drop_all {
            let _ = writeln!(s, "DropCapability=ALL");
        }
        for cap in &self.cap_add {
            let _ = writeln!(s, "AddCapability={cap}");
        }
        if let Some(mb) = self.memory_mb {
            let _ = writeln!(s, "PodmanArgs=--memory={mb}m");
        }
        for bm in &self.bind_mounts {
            let mode = if bm.read_only { ":ro,Z" } else { ":Z" };
            let _ = writeln!(
                s,
                "Volume={}:{}{}",
                bm.host.display(),
                bm.container.display(),
                mode
            );
        }
for (host, container, proto) in &self.ports {
let p = if proto.is_empty() { "tcp" } else { proto.as_str() };
let _ = writeln!(s, "PublishPort={host}:{container}/{p}");
}
for env in &self.environment {
// env entries already arrive shaped as "KEY=VALUE"; quadlet
// accepts that form on a single Environment= line per pair.
let _ = writeln!(s, "Environment={env}");
}
for dev in &self.devices {
let _ = writeln!(s, "AddDevice={dev}");
}
for (name, ip) in &self.add_hosts {
let _ = writeln!(s, "AddHost={name}:{ip}");
}
if self.read_only_root {
let _ = writeln!(s, "ReadOnly=true");
}
if self.no_new_privileges {
let _ = writeln!(s, "NoNewPrivileges=true");
}
if let Some(cpus) = self.cpu_quota {
let _ = writeln!(s, "PodmanArgs=--cpus={cpus}");
}
if let Some(h) = &self.health {
let _ = writeln!(s, "HealthCmd={}", h.cmd);
let _ = writeln!(s, "HealthInterval={}", h.interval);
let _ = writeln!(s, "HealthTimeout={}", h.timeout);
let _ = writeln!(s, "HealthRetries={}", h.retries);
// Notify=healthy: systemd treats the unit as "started" only
// after the first green health probe. Start ordering
// (Requires=/After=) downstream of this unit therefore
// doesn't fire until the app is actually serving requests.
let _ = writeln!(s, "Notify=healthy");
}
if let Some(ep) = &self.entrypoint {
// Quadlet's Exec= supplies the arguments that follow the image
// name in `podman run`, so it overrides the image CMD but not
// its ENTRYPOINT. When the manifest provides both entrypoint
// and command we concatenate them; if only command is set we
// emit that on its own below.
let mut parts: Vec<String> = ep.clone();
parts.extend(self.command.iter().cloned());
let _ = writeln!(s, "Exec={}", shell_join(&parts));
} else if !self.command.is_empty() {
let _ = writeln!(s, "Exec={}", shell_join(&self.command));
}
for arg in &self.extra_podman_args {
let _ = writeln!(s, "PodmanArgs={arg}");
}
let _ = writeln!(s);
let _ = writeln!(s, "[Service]");
// Restart policy + 10s backoff. RestartSec keeps a crash-loop
// from saturating the journal. Companions: Always. Backends:
// OnFailure (clean stops stay stopped).
let _ = writeln!(s, "Restart={}", self.restart_policy.as_systemd());
let _ = writeln!(s, "RestartSec=10");
if self.health.is_some() {
// Notify=healthy makes systemd block the unit's "started"
// state on the first green health probe. systemd's default
// TimeoutStartSec is 90s — but `HealthInterval=30s` ×
// `HealthRetries=3` is itself 90s, so the timeout fires the
// moment the third probe MIGHT succeed. On .228 every backend
// (lnd, electrumx, fedimint, btcpay-server, mempool-api,
// bitcoin-knots) timed out at 90s and systemd terminated the
// container while it was still warming up. Bump to 600s — long
// enough for slow-starting backends (electrumx replays its
// index, lnd unlocks its wallet) without being so long that a
// truly stuck unit hangs forever.
let _ = writeln!(s, "TimeoutStartSec=600");
}
let _ = writeln!(s);
let _ = writeln!(s, "[Install]");
let _ = writeln!(s, "WantedBy=default.target");
s
}
}
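// A minimal sketch of the arithmetic behind the TimeoutStartSec bump in
// the [Service] section above. It assumes probes fire once per interval,
// starting one interval after container start; `probe_window_secs` and
// `start_timeout_too_tight` are hypothetical names used only for
// illustration and are not part of this module.

```rust
/// Seconds until `retries` probes have run, assuming one probe per
/// interval and the first probe one full interval after start.
fn probe_window_secs(interval_secs: u64, retries: u32) -> u64 {
    interval_secs * u64::from(retries)
}

/// True when the start timeout leaves no slack beyond the probe window,
/// i.e. systemd may kill the unit just as the last probe could succeed.
fn start_timeout_too_tight(timeout_secs: u64, interval_secs: u64, retries: u32) -> bool {
    timeout_secs <= probe_window_secs(interval_secs, retries)
}
```

// With HealthInterval=30s and HealthRetries=3 the window is 90s, so the
// 90s systemd default is too tight while 600s leaves ample slack.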
/// Render a manifest's argv-style list as a single Exec= line. We do
/// the minimum quoting needed so quadlet's parser sees one element per
/// item: anything containing whitespace, quotes, or shell metacharacters
/// gets wrapped in double quotes with embedded `"` and `\` escaped.
fn shell_join(parts: &[String]) -> String {
parts
.iter()
.map(|p| {
if p.is_empty() || p.chars().any(|c| c.is_whitespace() || "\"'\\$`".contains(c)) {
let escaped = p.replace('\\', "\\\\").replace('"', "\\\"");
format!("\"{escaped}\"")
} else {
p.clone()
}
})
.collect::<Vec<_>>()
.join(" ")
}
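// A usage sketch for the quoting rule described above. The function body
// is copied from this file so the snippet compiles on its own;
// `demo_exec_line` and its bitcoind-style arguments are hypothetical and
// exist only to show what an Exec= value might look like.

```rust
/// Copy of the quoting helper above: wrap an item in double quotes when
/// it is empty or contains whitespace, `"`, `\`, `$`, or a backtick.
fn shell_join(parts: &[String]) -> String {
    parts
        .iter()
        .map(|p| {
            if p.is_empty() || p.chars().any(|c| c.is_whitespace() || "\"\\$`".contains(c)) {
                // Escape backslashes first so the escapes added for `"`
                // are not themselves doubled.
                let escaped = p.replace('\\', "\\\\").replace('"', "\\\"");
                format!("\"{escaped}\"")
            } else {
                p.clone()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

/// Hypothetical argv: two plain items, one with a space, one with quotes.
fn demo_exec_line() -> String {
    let argv: Vec<String> = ["bitcoind", "-datadir=/data", "-rpcauth=a b", "say \"hi\""]
        .iter()
        .map(|s| s.to_string())
        .collect();
    format!("Exec={}", shell_join(&argv))
}
```

// Plain items pass through untouched; only the last two get wrapped.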
impl QuadletUnit {
/// Build a backend-flavour QuadletUnit from a parsed AppManifest.
feat(orchestrator): Phase 3.2 — wire Quadlet path behind feature flag prod_orchestrator::install_fresh now branches on the new Config::use_quadlet_backends flag (default false): * off (today's production behavior) — unchanged: runtime.create_container + start_container, container parented under archipelago.service's cgroup, FM3 cascade SIGKILL on every archipelago restart. * on — install_via_quadlet renders the manifest as a Quadlet unit via QuadletUnit::from_manifest, writes it atomically into ~/.config/containers/systemd/, calls daemon-reload, and starts the generated <name>.service. Container ends up under user.slice — no more cgroup parented under archipelago, so archipelago restarts don't touch the container's lifetime. Default off so this commit is structurally safe to ship: nothing changes at runtime until an operator opts in. Flip the default once tests/lifecycle/run-20x.sh has gone green against the new path on .228 + .198 (the v1.7.52 release gate). Plumbing: * config.rs — `use_quadlet_backends: bool` w/ Default false * prod_orchestrator.rs — flag stored on the struct, threaded through new(), with set_use_quadlet_backends(bool) test setter * prod_orchestrator.rs — install_via_quadlet helper * dropped the Phase-3.1 #[allow(dead_code)] markers on from_manifest / parse_memory_mib / RestartPolicy::OnFailure now that the call path exists; if a future revert removes the wiring, the warnings come back. Tests: 624 passing, cargo check clean (0 warnings). Existing companion behavior unaffected — render_skips_backend_directives_when_default still passes byte-equal to before quadlet.rs grew the new fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:22:10 -04:00
/// Wired through `prod_orchestrator::install_via_quadlet`, gated by
/// `Config::use_quadlet_backends`.
///
/// `name` is the on-disk container name (typically the manifest's
/// `app.id`, but the orchestrator may rename — see
/// `compute_container_name`). The returned unit is NOT yet written;
/// the caller is expected to merge in any environment overrides
/// (resolve_dynamic_env, secret_env) before calling write_if_changed.
pub fn from_manifest(manifest: &AppManifest, name: &str) -> Self {
let app = &manifest.app;
let network = match app.security.network_policy.as_str() {
"host" => NetworkMode::Host,
// Bridge name comes from the manifest's container.network if
// set; otherwise the orchestrator manages a default network
// separately and we fall back to host. Quadlet won't refuse
// either form.
other if !other.is_empty() && other != "isolated" => NetworkMode::Bridge(other.into()),
_ => match app.container.network.as_deref() {
Some(n) if !n.is_empty() && n != "host" => NetworkMode::Bridge(n.into()),
_ => NetworkMode::Host,
},
};
let bind_mounts = app
.volumes
.iter()
.filter(|v| v.volume_type != "tmpfs" && !v.source.is_empty())
.map(|v| BindMount {
host: PathBuf::from(&v.source),
container: PathBuf::from(&v.target),
read_only: v.options.iter().any(|o| o == "ro"),
})
.collect::<Vec<_>>();
let memory_mb = app.resources.memory_limit.as_ref().and_then(|s| {
// Manifests use forms like "1g", "512m", "1024". Convert to
// MiB. Anything we can't parse gets dropped (renderer skips
// None) — better to lose the limit than to mis-cap.
parse_memory_mib(s)
});
Self {
name: name.to_string(),
description: format!("Archipelago app: {}", app.id),
image: app.container.image_ref().unwrap_or_default(),
network,
user: None,
memory_mb,
cap_drop_all: true,
cap_add: app.security.capabilities.clone(),
bind_mounts,
extra_podman_args: vec![],
depends_on: vec![],
health: app.health_check.as_ref().and_then(translate_health_check),
ports: app
.ports
.iter()
.map(|p| (p.host, p.container, p.protocol.clone()))
.collect(),
environment: app.environment.clone(),
devices: app.devices.clone(),
add_hosts: vec![("host.archipelago".into(), "10.89.0.1".into())],
entrypoint: app.container.entrypoint.clone(),
command: app.container.custom_args.clone(),
read_only_root: app.security.readonly_root,
no_new_privileges: true,
cpu_quota: app.resources.cpu_limit,
restart_policy: RestartPolicy::OnFailure,
}
}
}
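// `parse_memory_mib` is called above but its body is not shown in this
// part of the file. The sketch below is one possible shape for it, based
// only on the forms the comment names ("1g", "512m", "1024"). The name
// `parse_memory_mib_sketch`, the `u64` return type, and the
// case-insensitive suffix handling are assumptions, not the real code.

```rust
/// Convert a manifest memory string to MiB. Returns None for anything
/// that does not parse, so the caller drops the limit instead of
/// applying a wrong cap.
fn parse_memory_mib_sketch(s: &str) -> Option<u64> {
    let s = s.trim().to_ascii_lowercase();
    // Split off an optional unit suffix; a bare number is taken as MiB.
    let (digits, multiplier) = if let Some(n) = s.strip_suffix('g') {
        (n, 1024)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 1)
    } else {
        (s.as_str(), 1)
    };
    // checked_mul guards against overflow on absurdly large inputs.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

// Under these assumptions "1g" maps to 1024 MiB and unparseable input
// yields None, which the renderer then skips.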
/// Translate the manifest's HealthCheck shape into a HealthSpec the
/// renderer understands. Returns None when the manifest's health spec
/// is malformed or unsupported — we'd rather skip Notify=healthy than
/// emit a broken HealthCmd that fails the unit start forever.
///
/// Supported shapes:
/// - type: tcp, endpoint: "host:port" → `nc -z host port`
fix(quadlet): http:// double-prefix + companion migration race Two bugs surfaced by the first real-node validation of Phase 3.2-3.4 on .228 (2026-05-02), both caught before flipping the default. Bug 1 — translate_health_check double-prefixed http://. Manifests in the wild carry the scheme inside the endpoint string ("http://localhost:8175"), and we were prepending another http:// unconditionally. Result on .228: every backend HealthCmd read `curl -fsS -m 5 http://http://localhost...`, every probe failed, fedimint hit a 14-restart loop. Now we accept either form and skip appending hc.path when the endpoint already carries one. Regression test asserts no double-prefix and that an in-endpoint path is honoured. Bug 2 — Phase 3.3 migration ran for UI companions (bitcoin-ui / electrs-ui / lnd-ui) that have shipped via Quadlet since v1.7.41. Migration tore down the running companion + raced companion.rs render, producing "Phase 3.3: re-install archy-bitcoin-ui via Quadlet" reconcile errors and leaving archy-bitcoin-ui down. Companions now short-circuit out of migrate_to_quadlet_if_needed before any IO. Also: when try_exists returns Err for an unrelated reason (permissions, EIO), we now skip migration instead of treating "I can't tell" as "go ahead and migrate" — migrating on top of a possibly-existing unit is destructive. What this does not fix yet: * the orchestrator's reconciler iterating every manifest in /opt/archipelago/apps/, not just installed apps. Pre-existing behavior (also affects the legacy path) — separate scope. * fedimint /data UID mismatch surfaced when Quadlet started fedimint fresh. Likely orthogonal — defer. * no rollback when install_via_quadlet fails after a remove_container. Tracked as Phase 3.3.1 — defer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:37:37 -04:00
/// - type: http, endpoint: "host:port" or "http(s)://host:port", path → curl
/// - type: cmd, endpoint: "<shell command>" → `<shell command>` verbatim
///
/// For type=http we accept the endpoint with or without scheme; manifests
/// in the wild use both forms (`localhost:8175` and
/// `http://localhost:8175/`). Earlier we blindly prepended `http://` even
/// when one was already there, producing `http://http://...` HealthCmds
/// that shipped to .228 on 2026-05-02 and failed every probe.
fn translate_health_check(
hc: &archipelago_container::HealthCheck,
) -> Option<HealthSpec> {
let cmd = match hc.check_type.as_str() {
"tcp" => {
let endpoint = hc.endpoint.as_deref()?;
let (host, port) = endpoint.rsplit_once(':')?;
// nc is expected on every base image we ship (e.g. via busybox).
// The -z flag does a "scan" that exits 0 on connect, 1 otherwise.
format!("nc -z {host} {port}")
}
"http" => {
let endpoint = hc.endpoint.as_deref()?.trim();
// Accept either bare host:port or a full URL. If endpoint
// already includes a scheme we use it as-is; otherwise we
// prepend http://. This keeps existing http://foo manifests
// working and stops the http://http:// double-prefix bug.
let url = if endpoint.starts_with("http://") || endpoint.starts_with("https://") {
endpoint.to_string()
} else {
format!("http://{endpoint}")
};
// If the endpoint already carried a path component, honour it
// and ignore hc.path (manifests that bake the path into the
// endpoint don't expect to merge a separate path field).
// Otherwise append hc.path (default "/").
let already_has_path = url
.splitn(4, '/')
.nth(3)
.map(|p| !p.is_empty())
.unwrap_or(false);
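let final_url = if already_has_path {
url
} else {
// Strip a trailing '/' first so "http://host/" + "/health" does
// not render as "http://host//health".
let path = hc.path.as_deref().unwrap_or("/");
format!("{}{path}", url.trim_end_matches('/'))
};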
// -f: fail on HTTP status >= 400; -s: silent; -S: still print errors.
// -m 5: cap each probe at 5 seconds, the default manifest timeout.
format!("curl -fsS -m 5 {final_url}")
}
"cmd" => hc.endpoint.as_deref()?.to_string(),
_ => return None,
};
Some(HealthSpec {
cmd,
interval: hc.interval.clone(),
timeout: hc.timeout.clone(),
retries: hc.retries,
})
}
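The URL assembly in the "http" branch can be read in isolation. The following is a minimal sketch, not the project's code: `build_probe_url` is a hypothetical name, and the sketch additionally assumes that a trailing slash on a path-less endpoint should be dropped before the path is appended.

```rust
/// Hypothetical helper (not part of quadlet.rs): combine a health-check
/// endpoint with an optional path into a single probe URL.
fn build_probe_url(endpoint: &str, path: Option<&str>) -> String {
    let endpoint = endpoint.trim();
    // Accept either bare host:port or a full URL.
    let url = if endpoint.starts_with("http://") || endpoint.starts_with("https://") {
        endpoint.to_string()
    } else {
        format!("http://{endpoint}")
    };
    // Everything after "scheme://" is the authority plus an optional path.
    let after_scheme = url
        .split_once("://")
        .map(|(_, rest)| rest)
        .unwrap_or(url.as_str());
    let has_path = match after_scheme.split_once('/') {
        Some((_, p)) => !p.is_empty(),
        None => false,
    };
    if has_path {
        // The endpoint carries its own path, so the separate field is ignored.
        return url;
    }
    // Assumption of this sketch: normalise "http://host/" to "http://host"
    // so the appended path does not produce a double slash.
    let base = url.trim_end_matches('/');
    format!("{base}{}", path.unwrap_or("/"))
}
```

Keeping the logic in a pure function like this makes the double-prefix and in-endpoint-path cases cheap to test without building a full `HealthCheck`.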
feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52) The QuadletUnit struct now covers everything a backend manifest needs (ports, environment, devices, add_hosts, entrypoint+command, read-only root, no_new_privileges, cpu_quota, restart policy choice). Adds QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB forms. The renderer skips empty/false directives so existing companion units render byte-identically — no behavior change for shipping companions; the backend renderer is dead code until Phase 3.2 wires it into the orchestrator. Eight new unit tests cover: * parse_memory_mib forms (1024, 512m, 2g, garbage) * shell_join quoting (whitespace, embedded quotes) * RestartPolicy → systemd string mapping * render emits backend directives when set * render skips them when defaulted (companion regression gate) * from_manifest happy path on a bitcoin-knots-shaped manifest * from_manifest read-only volume detection * from_manifest tmpfs filtering * end-to-end manifest → render bytes assertion Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was implicitly covered before but is now explicit). Cargo warnings: 0. `from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are marked allow(dead_code) with explicit references to Phase 3.2 — if 3.2 doesn't wire them, the dead-code warning resurfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:09:50 -04:00
/// Parse the manifest's memory_limit string into MiB. Recognises the
/// forms our manifests actually use: "<n>", "<n>m"/"<n>M", "<n>g"/"<n>G".
/// Returns None for anything else; the caller treats None as unlimited.
fn parse_memory_mib(raw: &str) -> Option<u32> {
let trimmed = raw.trim();
if trimmed.is_empty() {
return None;
}
let (num_part, mul) = match trimmed.chars().last()? {
'g' | 'G' => (&trimmed[..trimmed.len() - 1], 1024u32),
'm' | 'M' => (&trimmed[..trimmed.len() - 1], 1u32),
'k' | 'K' => return None, // sub-MiB precision: drop, not worth it
c if c.is_ascii_digit() => (trimmed, 1u32), // bare number, treat as MiB
_ => return None,
};
num_part.trim().parse::<u32>().ok()?.checked_mul(mul)
}
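As an illustration of how the parsed value might feed the rendered unit, here is a small sketch. `parse_memory_mib` is copied from above so the block stands alone; `memory_directive` is a hypothetical helper, and the `PodmanArgs=--memory=<n>m` form is taken from the renderer tests in this file rather than from the renderer itself.

```rust
/// Copied from above: parse "<n>", "<n>m"/"<n>M", "<n>g"/"<n>G" into MiB.
fn parse_memory_mib(raw: &str) -> Option<u32> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return None;
    }
    let (num_part, mul) = match trimmed.chars().last()? {
        'g' | 'G' => (&trimmed[..trimmed.len() - 1], 1024u32),
        'm' | 'M' => (&trimmed[..trimmed.len() - 1], 1u32),
        'k' | 'K' => return None,
        c if c.is_ascii_digit() => (trimmed, 1u32),
        _ => return None,
    };
    // checked_mul turns a u32 overflow into None instead of wrapping.
    num_part.trim().parse::<u32>().ok()?.checked_mul(mul)
}

/// Hypothetical helper: render the memory limit as a unit-file line,
/// or None when the manifest value means "unlimited".
fn memory_directive(raw: &str) -> Option<String> {
    parse_memory_mib(raw).map(|mib| format!("PodmanArgs=--memory={mib}m"))
}
```

Note that an overflowing value such as "4194304g" yields None, which the caller would treat as unlimited, the same as an unrecognised string.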
/// Resolve the per-user quadlet dir under $HOME. Created if missing.
pub async fn unit_dir() -> Result<PathBuf> {
let home = std::env::var_os("HOME")
.map(PathBuf::from)
.ok_or_else(|| anyhow!("HOME not set; cannot locate quadlet unit dir"))?;
let dir = home.join(DEFAULT_REL_UNIT_DIR);
fs::create_dir_all(&dir)
.await
.with_context(|| format!("create_dir_all {}", dir.display()))?;
Ok(dir)
}
/// Atomically write `unit` into `dir/<name>.container` if the bytes
/// differ from what's already there. Returns true if the file changed.
pub async fn write_if_changed(unit: &QuadletUnit, dir: &Path) -> Result<bool> {
let path = dir.join(unit.unit_filename());
let new_bytes = unit.render();
if let Ok(old) = fs::read_to_string(&path).await {
if old == new_bytes {
return Ok(false);
}
}
fs::create_dir_all(dir)
.await
.with_context(|| format!("create_dir_all {}", dir.display()))?;
let tmp = path.with_extension("container.tmp");
fs::write(&tmp, new_bytes.as_bytes())
.await
.with_context(|| format!("write tmp {}", tmp.display()))?;
fs::rename(&tmp, &path)
.await
.with_context(|| format!("rename {} -> {}", tmp.display(), path.display()))?;
Ok(true)
}
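The write-to-temp-then-rename pattern used by `write_if_changed` can be sketched with only the standard library. This is an illustrative synchronous version, assuming a plain string payload; `write_if_changed_sync` is a hypothetical name and not part of the module.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical synchronous sketch: write `contents` to `path` only if the
/// bytes differ, going through a sibling ".tmp" file and a rename so readers
/// never observe a half-written file. Returns true if the file changed.
fn write_if_changed_sync(path: &Path, contents: &str) -> io::Result<bool> {
    // Unreadable or missing files fall through to the write path.
    if let Ok(old) = fs::read_to_string(path) {
        if old == contents {
            return Ok(false);
        }
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // Build "<file name>.tmp" next to the target so the rename stays on
    // the same filesystem.
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = path.with_file_name(tmp_name);
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

The boolean return value is what lets a caller skip `daemon-reload` when nothing changed on disk.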
/// Reload the user systemd manager. Required after any quadlet write
/// or removal so systemd picks up the generated `.service` translation.
pub async fn daemon_reload_user() -> Result<()> {
let status = Command::new("systemctl")
.args(["--user", "daemon-reload"])
.status()
.await
.context("spawn systemctl --user daemon-reload")?;
if !status.success() {
return Err(anyhow!("systemctl --user daemon-reload exited {status}"));
}
Ok(())
}
/// Start a quadlet-generated service. Despite the name, this only runs
/// `systemctl --user start`; reboot persistence comes from the unit's
/// `[Install] WantedBy`, not from `enable`.
pub async fn enable_now(service: &str) -> Result<()> {
// Quadlet-generated units cannot be `enable`d directly because the
// .service file lives under /run, not /etc — `enable` would refuse
// ("transient or generated"). The unit's `[Install] WantedBy` is
// honoured at daemon-reload, so we just start it.
let status = Command::new("systemctl")
.args(["--user", "start", service])
.status()
.await
.with_context(|| format!("spawn systemctl --user start {service}"))?;
if !status.success() {
return Err(anyhow!("systemctl --user start {service} exited {status}"));
}
Ok(())
}
/// Stop + remove a quadlet unit and its on-disk file. Best-effort:
/// failures from `stop`, `daemon-reload`, and `podman rm` are ignored.
/// Only a failed unit-file removal is returned as an error, since a
/// leftover file would be regenerated into a service on the next reload.
pub async fn disable_remove(unit_name: &str, dir: &Path) -> Result<()> {
let svc = format!("{unit_name}.service");
// Stop first; ignore failure (unit may already be down).
let _ = Command::new("systemctl")
.args(["--user", "stop", &svc])
.status()
.await;
let path = dir.join(format!("{unit_name}.container"));
if fs::try_exists(&path).await.unwrap_or(false) {
fs::remove_file(&path)
.await
.with_context(|| format!("remove {}", path.display()))?;
}
daemon_reload_user().await.ok();
// Defensive: kill the actual container too, in case quadlet left it.
let _ = Command::new("podman")
.args(["rm", "-f", unit_name])
.status()
.await;
Ok(())
}
/// Is the quadlet-generated service currently active?
pub async fn is_active(service: &str) -> bool {
Command::new("systemctl")
.args(["--user", "is-active", "--quiet", service])
.status()
.await
.map(|s| s.success())
.unwrap_or(false)
}
#[cfg(test)]
mod tests {
use super::*;
use tempfile::tempdir;
fn sample_unit() -> QuadletUnit {
QuadletUnit {
name: "archy-bitcoin-ui".into(),
description: "Bitcoin RPC UI proxy".into(),
image: "146.59.87.168:3000/lfg2025/bitcoin-ui:latest".into(),
network: NetworkMode::Host,
user: Some("0:0".into()),
memory_mb: Some(128),
cap_drop_all: true,
cap_add: vec![
"CHOWN".into(),
"DAC_OVERRIDE".into(),
"NET_BIND_SERVICE".into(),
"SETUID".into(),
"SETGID".into(),
],
bind_mounts: vec![BindMount {
host: PathBuf::from("/var/lib/archipelago/bitcoin-ui/nginx.conf"),
container: PathBuf::from("/etc/nginx/conf.d/default.conf"),
read_only: true,
}],
extra_podman_args: vec![],
depends_on: vec![],
..QuadletUnit::default()
}
}
#[test]
fn render_contains_required_directives() {
let s = sample_unit().render();
assert!(s.contains("[Container]"));
assert!(s.contains("ContainerName=archy-bitcoin-ui"));
assert!(s.contains("Image=146.59.87.168:3000/lfg2025/bitcoin-ui:latest"));
assert!(s.contains("Pull=never"));
assert!(s.contains("Network=host"));
assert!(s.contains("DropCapability=ALL"));
assert!(s.contains("AddCapability=CHOWN"));
assert!(s.contains("AddCapability=NET_BIND_SERVICE"));
assert!(s.contains("PodmanArgs=--memory=128m"));
assert!(s.contains(
"Volume=/var/lib/archipelago/bitcoin-ui/nginx.conf:/etc/nginx/conf.d/default.conf:ro,Z"
));
assert!(s.contains("[Service]"));
assert!(s.contains("Restart=always"));
assert!(s.contains("WantedBy=default.target"));
}
#[test]
fn render_bridge_network_emits_network_name() {
let mut u = sample_unit();
u.network = NetworkMode::Bridge("archy-bitcoin-ui-net".into());
let s = u.render();
assert!(s.contains("Network=archy-bitcoin-ui-net"));
assert!(!s.contains("Network=host"));
}
#[test]
fn unit_filename_and_service_name_are_consistent() {
let u = sample_unit();
assert_eq!(u.unit_filename(), "archy-bitcoin-ui.container");
assert_eq!(u.service_name(), "archy-bitcoin-ui.service");
}
#[tokio::test]
async fn write_if_changed_writes_first_time_then_noops() {
let dir = tempdir().unwrap();
let u = sample_unit();
let changed = write_if_changed(&u, dir.path()).await.unwrap();
assert!(changed, "first write must report changed");
let on_disk = tokio::fs::read_to_string(dir.path().join(u.unit_filename()))
.await
.unwrap();
assert!(on_disk.starts_with("# Generated by archipelago"));
let changed2 = write_if_changed(&u, dir.path()).await.unwrap();
assert!(!changed2, "second write with identical bytes must no-op");
}
#[tokio::test]
async fn write_if_changed_rewrites_when_field_changes() {
let dir = tempdir().unwrap();
let mut u = sample_unit();
write_if_changed(&u, dir.path()).await.unwrap();
u.memory_mb = Some(256);
let changed = write_if_changed(&u, dir.path()).await.unwrap();
assert!(changed, "field change must trigger rewrite");
let on_disk = tokio::fs::read_to_string(dir.path().join(u.unit_filename()))
.await
.unwrap();
assert!(on_disk.contains("PodmanArgs=--memory=256m"));
}
#[tokio::test]
async fn write_if_changed_atomic_rename_leaves_no_tmp() {
let dir = tempdir().unwrap();
write_if_changed(&sample_unit(), dir.path()).await.unwrap();
let mut entries = tokio::fs::read_dir(dir.path()).await.unwrap();
while let Some(e) = entries.next_entry().await.unwrap() {
assert!(
!e.file_name().to_string_lossy().ends_with(".tmp"),
"atomic rename must leave no .tmp residue"
);
}
}
// ────────────────────────────────────────────────────────────────
// Phase 3.1 backend renderer tests
// ────────────────────────────────────────────────────────────────
#[test]
fn parse_memory_mib_recognises_common_forms() {
assert_eq!(parse_memory_mib("1024"), Some(1024));
assert_eq!(parse_memory_mib("512m"), Some(512));
assert_eq!(parse_memory_mib("512M"), Some(512));
assert_eq!(parse_memory_mib("2g"), Some(2048));
assert_eq!(parse_memory_mib("2G"), Some(2048));
assert_eq!(parse_memory_mib("1k"), None); // sub-MiB rejected
assert_eq!(parse_memory_mib("garbage"), None);
assert_eq!(parse_memory_mib(""), None);
assert_eq!(parse_memory_mib(" 256m "), Some(256));
}
#[test]
fn shell_join_quotes_only_when_needed() {
assert_eq!(shell_join(&["bitcoind".into()]), "bitcoind");
assert_eq!(
shell_join(&["bitcoind".into(), "-server=1".into()]),
"bitcoind -server=1"
);
// Whitespace forces quoting:
assert_eq!(
shell_join(&["bash".into(), "-c".into(), "echo hi".into()]),
"bash -c \"echo hi\""
);
// Embedded quotes must escape:
assert_eq!(
shell_join(&[r#"say "hi""#.into()]),
r#""say \"hi\"""#
);
}
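`shell_join` itself is defined outside this part of the file. The assertions above are consistent with a function along the following lines; this is a reconstruction under assumptions, not the actual implementation. The name `shell_join_sketch`, the backslash escaping, and the quoting of empty arguments are all choices made for this sketch.

```rust
/// Hypothetical reconstruction: join arguments with spaces, wrapping an
/// argument in double quotes only when it needs them.
fn shell_join_sketch(args: &[String]) -> String {
    args.iter()
        .map(|a| {
            // Assumption: whitespace, quotes, backslashes, and empty
            // strings are the cases that require quoting.
            let needs_quotes = a.is_empty()
                || a.chars().any(|c| c.is_whitespace() || c == '"' || c == '\\');
            if needs_quotes {
                // Escape backslashes first so the escapes added for quotes
                // are not themselves doubled.
                let escaped = a.replace('\\', "\\\\").replace('"', "\\\"");
                format!("\"{escaped}\"")
            } else {
                a.clone()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```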
#[test]
fn restart_policy_emits_correct_systemd_string() {
assert_eq!(RestartPolicy::Always.as_systemd(), "always");
assert_eq!(RestartPolicy::OnFailure.as_systemd(), "on-failure");
}
#[test]
fn render_emits_backend_directives_when_set() {
let u = QuadletUnit {
name: "bitcoin-knots".into(),
description: "Bitcoin Knots backend".into(),
image: "registry/bitcoin-knots:latest".into(),
network: NetworkMode::Bridge("archy-net".into()),
cap_drop_all: true,
cap_add: vec!["NET_BIND_SERVICE".into()],
ports: vec![(8332, 8332, "tcp".into()), (8333, 8333, "tcp".into())],
environment: vec![
"BITCOIN_RPC_USER=archipelago".into(),
"BITCOIN_RPC_PASS=secret".into(),
],
devices: vec!["/dev/kvm".into()],
add_hosts: vec![("host.archipelago".into(), "10.89.0.1".into())],
entrypoint: Some(vec!["/usr/local/bin/bitcoind".into()]),
command: vec!["-server=1".into(), "-rpcbind=0.0.0.0".into()],
read_only_root: true,
no_new_privileges: true,
cpu_quota: Some(2),
restart_policy: RestartPolicy::OnFailure,
..QuadletUnit::default()
};
let s = u.render();
assert!(s.contains("PublishPort=8332:8332/tcp"));
assert!(s.contains("PublishPort=8333:8333/tcp"));
assert!(s.contains("Environment=BITCOIN_RPC_USER=archipelago"));
assert!(s.contains("Environment=BITCOIN_RPC_PASS=secret"));
assert!(s.contains("AddDevice=/dev/kvm"));
assert!(s.contains("AddHost=host.archipelago:10.89.0.1"));
assert!(s.contains("ReadOnly=true"));
assert!(s.contains("NoNewPrivileges=true"));
assert!(s.contains("PodmanArgs=--cpus=2"));
assert!(s.contains("Exec=/usr/local/bin/bitcoind -server=1 -rpcbind=0.0.0.0"));
assert!(s.contains("Restart=on-failure"));
assert!(s.contains("Network=archy-net"));
}
#[test]
fn render_skips_backend_directives_when_default() {
// Companion-style unit: backend extension fields all defaulted.
// Rendered bytes must not include any of the backend directives,
// so existing companion units stay byte-identical to before.
let s = sample_unit().render();
assert!(!s.contains("PublishPort="));
assert!(!s.contains("Environment="));
assert!(!s.contains("AddDevice="));
assert!(!s.contains("AddHost="));
assert!(!s.contains("ReadOnly="));
assert!(!s.contains("NoNewPrivileges="));
assert!(!s.contains("Exec="));
assert!(!s.contains("--cpus="));
// Default RestartPolicy is Always — companions rely on this.
assert!(s.contains("Restart=always"));
}
#[test]
fn from_manifest_translates_a_typical_backend() {
let yaml = r#"
app:
id: bitcoin-knots
name: Bitcoin Knots
version: 1.0.0
container:
image: registry/bitcoin-knots:1.0
entrypoint: ["/usr/local/bin/bitcoind"]
custom_args: ["-server=1", "-rpcbind=0.0.0.0"]
ports:
- host: 8332
container: 8332
protocol: tcp
volumes:
- type: bind
source: /var/lib/archipelago/bitcoin
target: /home/bitcoin/.bitcoin
options: []
environment:
- BITCOIN_NETWORK=mainnet
devices: []
resources:
cpu_limit: 4
memory_limit: 2g
security:
capabilities: ["NET_BIND_SERVICE"]
readonly_root: true
network_policy: archy-net
"#;
let m = AppManifest::parse(yaml).expect("manifest must parse");
let u = QuadletUnit::from_manifest(&m, "bitcoin-knots");
assert_eq!(u.name, "bitcoin-knots");
assert_eq!(u.image, "registry/bitcoin-knots:1.0");
assert!(matches!(u.network, NetworkMode::Bridge(ref n) if n == "archy-net"));
assert_eq!(u.memory_mb, Some(2048));
assert_eq!(u.cpu_quota, Some(4));
assert!(u.read_only_root);
assert!(u.no_new_privileges);
assert_eq!(u.cap_add, vec!["NET_BIND_SERVICE"]);
assert_eq!(u.ports, vec![(8332, 8332, "tcp".to_string())]);
assert_eq!(u.environment, vec!["BITCOIN_NETWORK=mainnet"]);
assert_eq!(u.bind_mounts.len(), 1);
assert_eq!(
u.bind_mounts[0].host,
PathBuf::from("/var/lib/archipelago/bitcoin")
);
assert!(!u.bind_mounts[0].read_only);
assert_eq!(u.entrypoint, Some(vec!["/usr/local/bin/bitcoind".into()]));
assert_eq!(u.command, vec!["-server=1", "-rpcbind=0.0.0.0"]);
assert!(u.add_hosts.iter().any(|(n, ip)| n == "host.archipelago" && ip == "10.89.0.1"));
assert_eq!(u.restart_policy, RestartPolicy::OnFailure);
}
#[test]
fn from_manifest_marks_ro_volumes_read_only() {
let yaml = r#"
app:
id: x
name: X
version: 1.0.0
container:
image: x:latest
volumes:
- type: bind
source: /etc/host-conf
target: /etc/conf
options: ["ro"]
"#;
let m = AppManifest::parse(yaml).unwrap();
let u = QuadletUnit::from_manifest(&m, "x");
assert_eq!(u.bind_mounts.len(), 1);
assert!(u.bind_mounts[0].read_only);
}
#[test]
fn from_manifest_skips_tmpfs_volumes() {
let yaml = r#"
app:
id: x
name: X
version: 1.0.0
container:
image: x:latest
volumes:
- type: tmpfs
target: /tmp
tmpfs_options: "rw,size=64m"
- type: bind
source: /var/lib/x
target: /data
options: []
"#;
let m = AppManifest::parse(yaml).unwrap();
let u = QuadletUnit::from_manifest(&m, "x");
// tmpfs entry is dropped from bind_mounts; bind entry survives.
assert_eq!(u.bind_mounts.len(), 1);
assert_eq!(u.bind_mounts[0].host, PathBuf::from("/var/lib/x"));
}
#[test]
fn render_emits_health_directives_when_set() {
let mut u = QuadletUnit::default();
u.name = "lnd".into();
u.image = "x:1".into();
u.health = Some(HealthSpec {
cmd: "nc -z localhost 10009".into(),
interval: "30s".into(),
timeout: "5s".into(),
retries: 3,
});
let s = u.render();
assert!(s.contains("HealthCmd=nc -z localhost 10009"));
assert!(s.contains("HealthInterval=30s"));
assert!(s.contains("HealthTimeout=5s"));
assert!(s.contains("HealthRetries=3"));
assert!(s.contains("Notify=healthy"));
// Notify=healthy needs a long-enough TimeoutStartSec or systemd
// kills the unit before the first probe can pass — observed live
// on .228 2026-05-02 across all six backends.
assert!(s.contains("TimeoutStartSec=600"), "got: {s}");
}
#[test]
fn render_skips_health_directives_when_absent() {
// No health spec → no Notify=healthy, no HealthCmd, no
// TimeoutStartSec override (default 90s applies). Companions rely
// on this so their rendered bytes stay unchanged.
let s = sample_unit().render();
assert!(!s.contains("HealthCmd="));
assert!(!s.contains("Notify=healthy"));
assert!(!s.contains("HealthRetries="));
assert!(!s.contains("TimeoutStartSec="));
}
#[test]
fn translate_health_check_handles_each_supported_type() {
use archipelago_container::HealthCheck;
let tcp = HealthCheck {
check_type: "tcp".into(),
endpoint: Some("localhost:10009".into()),
path: None,
interval: "30s".into(),
timeout: "5s".into(),
retries: 3,
};
let h = translate_health_check(&tcp).expect("tcp must translate");
assert_eq!(h.cmd, "nc -z localhost 10009");
assert_eq!(h.retries, 3);
let http = HealthCheck {
check_type: "http".into(),
endpoint: Some("localhost:8080".into()),
path: Some("/health".into()),
interval: "10s".into(),
timeout: "3s".into(),
retries: 5,
};
let h = translate_health_check(&http).expect("http must translate");
assert_eq!(h.cmd, "curl -fsS -m 5 http://localhost:8080/health");
let cmdck = HealthCheck {
check_type: "cmd".into(),
endpoint: Some("/usr/local/bin/probe.sh".into()),
path: None,
interval: "60s".into(),
timeout: "15s".into(),
retries: 2,
};
let h = translate_health_check(&cmdck).expect("cmd must translate");
assert_eq!(h.cmd, "/usr/local/bin/probe.sh");
// Unknown type → None (renderer skips Notify=healthy entirely
// rather than emit a broken HealthCmd that hangs the unit start).
let bad = HealthCheck {
check_type: "exec".into(),
endpoint: Some("foo".into()),
path: None,
interval: "30s".into(),
timeout: "5s".into(),
retries: 3,
};
assert!(translate_health_check(&bad).is_none());
// Malformed tcp endpoint → None (no port separator).
let badtcp = HealthCheck {
check_type: "tcp".into(),
endpoint: Some("hostonly".into()),
path: None,
interval: "30s".into(),
timeout: "5s".into(),
retries: 3,
};
assert!(translate_health_check(&badtcp).is_none());
}
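The tcp cases in the test above (a valid "host:port" becomes an `nc -z` command, an endpoint without a port separator yields None) could be produced by a helper like the one below. This is a sketch only: `tcp_probe_cmd` is a hypothetical name, and validating the port as a `u16` is an assumption that the tests here do not confirm.

```rust
/// Hypothetical helper: turn "host:port" into an `nc -z` probe command.
/// Returns None when the endpoint has no port separator, an empty host,
/// or a port that is not a valid u16 (the last check is an assumption).
fn tcp_probe_cmd(endpoint: &str) -> Option<String> {
    let (host, port) = endpoint.trim().rsplit_once(':')?;
    if host.is_empty() {
        return None;
    }
    let port: u16 = port.parse().ok()?;
    Some(format!("nc -z {host} {port}"))
}
```

Returning None for a malformed endpoint lets the renderer skip the health directives entirely, rather than emit a probe that can never pass.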
    #[test]
    fn translate_health_check_http_does_not_double_prefix_scheme() {
        // Regression: on .228 2026-05-02 we shipped HealthCmds reading
        // `curl -fsS -m 5 http://http://localhost:8175/` because manifests
        // in the wild carry the scheme inside the endpoint string. Every
        // probe failed and the unit looped. Now we accept either form.
        use archipelago_container::HealthCheck;

        let with_scheme = HealthCheck {
            check_type: "http".into(),
            endpoint: Some("http://localhost:8175".into()),
            path: Some("/".into()),
            interval: "30s".into(),
            timeout: "5s".into(),
            retries: 3,
        };
        let h = translate_health_check(&with_scheme).expect("with-scheme must translate");
        assert_eq!(h.cmd, "curl -fsS -m 5 http://localhost:8175/");
        assert!(!h.cmd.contains("http://http://"), "got: {}", h.cmd);

        let with_https = HealthCheck {
            check_type: "http".into(),
            endpoint: Some("https://example.local/health".into()),
            path: None,
            interval: "30s".into(),
            timeout: "5s".into(),
            retries: 3,
        };
        let h = translate_health_check(&with_https).expect("https must translate");
        // Endpoint already has /health → don't append the default "/".
        assert_eq!(h.cmd, "curl -fsS -m 5 https://example.local/health");
    }

    #[test]
    fn from_manifest_picks_up_health_check() {
        let yaml = r#"
app:
id: lnd
name: LND
version: 1.0.0
container:
image: x:1
health_check:
type: tcp
endpoint: localhost:10009
interval: 15s
timeout: 4s
retries: 5
"#;
        let m = AppManifest::parse(yaml).unwrap();
        let u = QuadletUnit::from_manifest(&m, "lnd");
        let h = u.health.as_ref().expect("health should be populated");
        assert_eq!(h.cmd, "nc -z localhost 10009");
        assert_eq!(h.interval, "15s");
        assert_eq!(h.timeout, "4s");
        assert_eq!(h.retries, 5);
        assert!(u.render().contains("Notify=healthy"));
    }
feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52)

The QuadletUnit struct now covers everything a backend manifest needs (ports, environment, devices, add_hosts, entrypoint+command, read-only root, no_new_privileges, cpu_quota, restart policy choice). Adds QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB forms.

The renderer skips empty/false directives so existing companion units render byte-identically: no behavior change for shipping companions; the backend renderer is dead code until Phase 3.2 wires it into the orchestrator.

Eight new unit tests cover:
* parse_memory_mib forms (1024, 512m, 2g, garbage)
* shell_join quoting (whitespace, embedded quotes)
* RestartPolicy → systemd string mapping
* render emits backend directives when set
* render skips them when defaulted (companion regression gate)
* from_manifest happy path on a bitcoin-knots-shaped manifest
* from_manifest read-only volume detection
* from_manifest tmpfs filtering
* end-to-end manifest → render bytes assertion

Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was implicitly covered before but is now explicit). Cargo warnings: 0.

`from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are marked allow(dead_code) with explicit references to Phase 3.2; if 3.2 doesn't wire them, the dead-code warning resurfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 17:09:50 -04:00
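The memory-limit forms listed in the commit message above ("1g", "512m", raw MiB) could be handled along these lines. This is a hedged sketch, not the project's `parse_memory_mib`: the name `parse_memory_mib_sketch`, the `Option<u64>` return type, and the case-insensitive suffix handling are all assumptions made for illustration.

```rust
/// Hypothetical sketch: turn a manifest memory limit into whole MiB.
/// Accepts a bare number (taken as MiB already), or an `m` / `g` suffix.
/// Returns `None` for anything else, including overflow.
fn parse_memory_mib_sketch(raw: &str) -> Option<u64> {
    // Assumption: suffixes are matched case-insensitively.
    let s = raw.trim().to_ascii_lowercase();

    if let Some(gib) = s.strip_suffix('g') {
        // 1 GiB = 1024 MiB; checked_mul guards against overflow.
        return gib.parse::<u64>().ok()?.checked_mul(1024);
    }
    if let Some(mib) = s.strip_suffix('m') {
        return mib.parse::<u64>().ok();
    }
    // No suffix: the value is read as MiB.
    s.parse::<u64>().ok()
}
```

Under these assumptions, "1g" maps to 1024, which is consistent with the `PodmanArgs=--memory=1024m` assertion in the end-to-end render test.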
    #[test]
    fn from_manifest_renders_to_a_systemd_unit() {
        // End-to-end: parse a real-shape manifest, build the unit, render
        // the bytes, and assert the unit body contains the directives a
        // human would write by hand.
        let yaml = r#"
app:
id: lnd
name: LND
version: 1.0.0
container:
image: registry/lnd:latest
ports:
- host: 10009
container: 10009
protocol: tcp
volumes:
- type: bind
source: /var/lib/archipelago/lnd
target: /root/.lnd
options: []
environment:
- LND_NETWORK=mainnet
resources:
memory_limit: 1g
security:
capabilities: []
network_policy: archy-net
"#;
        let m = AppManifest::parse(yaml).unwrap();
        let body = QuadletUnit::from_manifest(&m, "lnd").render();
        assert!(body.contains("ContainerName=lnd"));
        assert!(body.contains("Image=registry/lnd:latest"));
        assert!(body.contains("Network=archy-net"));
        assert!(body.contains("PublishPort=10009:10009/tcp"));
        assert!(body.contains("Volume=/var/lib/archipelago/lnd:/root/.lnd:Z"));
        assert!(body.contains("Environment=LND_NETWORK=mainnet"));
        assert!(body.contains("PodmanArgs=--memory=1024m"));
        assert!(body.contains("AddHost=host.archipelago:10.89.0.1"));
        assert!(body.contains("DropCapability=ALL"));
        assert!(body.contains("NoNewPrivileges=true"));
        assert!(body.contains("Restart=on-failure"));
    }
2026-05-01 10:45:07 -04:00
}