lfg2025/archy

archipelago 9becafafd3 feat(quadlet): backend-manifest renderer (Phase 3.1 of v1.7.52)

The QuadletUnit struct now covers everything a backend manifest needs
(ports, environment, devices, add_hosts, entrypoint+command, read-only
root, no_new_privileges, cpu_quota, restart policy choice). Adds
QuadletUnit::from_manifest(&AppManifest, name) that translates a parsed
manifest into a unit, plus parse_memory_mib for "1g"/"512m"/raw-MiB
forms. The renderer skips empty/false directives so existing companion
units render byte-identically — no behavior change for shipping
companions; the backend renderer is dead code until Phase 3.2 wires it
into the orchestrator.
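The "skip empty/false directives" rule can be illustrated with a minimal sketch. `UnitSketch` and `render_sketch` are hypothetical stand-ins, not the real `QuadletUnit` or its renderer; the directive names (`Image`, `PublishPort`, `Environment`, `ReadOnly`, `NoNewPrivileges`) are standard Quadlet `[Container]` keys, and the field set is trimmed for brevity.

```rust
/// Hypothetical, trimmed-down stand-in for QuadletUnit (not the real struct).
#[derive(Default)]
pub struct UnitSketch {
    pub image: String,
    pub ports: Vec<String>,                 // e.g. "8332:8332"
    pub environment: Vec<(String, String)>, // key/value pairs
    pub read_only: bool,
    pub no_new_privileges: bool,
}

/// Render a [Container] section, emitting a directive only when it is set.
/// Empty lists and `false` flags produce no output at all, so a unit that
/// never sets the backend fields renders exactly as it did before.
pub fn render_sketch(unit: &UnitSketch) -> String {
    let mut out = String::from("[Container]\n");
    out.push_str(&format!("Image={}\n", unit.image));
    for port in &unit.ports {
        out.push_str(&format!("PublishPort={}\n", port));
    }
    for (key, value) in &unit.environment {
        // No quoting here; a real renderer must escape values with spaces.
        out.push_str(&format!("Environment={}={}\n", key, value));
    }
    if unit.read_only {
        out.push_str("ReadOnly=true\n");
    }
    if unit.no_new_privileges {
        out.push_str("NoNewPrivileges=true\n");
    }
    out
}
```

A defaulted unit renders only `[Container]` and `Image=`, which is the property the companion regression gate relies on.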

Eight new unit tests cover:
* parse_memory_mib forms (1024, 512m, 2g, garbage)
* shell_join quoting (whitespace, embedded quotes)
* RestartPolicy → systemd string mapping
* render emits backend directives when set
* render skips them when defaulted (companion regression gate)
* from_manifest happy path on a bitcoin-knots-shaped manifest
* from_manifest read-only volume detection
* from_manifest tmpfs filtering
* end-to-end manifest → render bytes assertion
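For the first bullet, a sketch of what a parser for the "1g"/"512m"/raw-MiB forms might look like. The function name comes from the commit message; the signature, the `Option` return type, and the case-insensitive suffix handling are assumptions, not the shipped implementation.

```rust
/// Parse a memory limit into MiB.
/// Accepted forms (per the commit message): "2g", "512m", or a raw MiB number.
/// Returns None for anything else. The Option return type is an assumption.
pub fn parse_memory_mib(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_lowercase();
    if s.is_empty() {
        return None;
    }
    // Split off a trailing unit suffix, if any. The suffix is a single ASCII
    // byte, so slicing one byte off the end stays on a char boundary.
    let (digits, multiplier) = match s.as_bytes()[s.len() - 1] {
        b'g' => (&s[..s.len() - 1], 1024), // GiB -> MiB
        b'm' => (&s[..s.len() - 1], 1),    // already MiB
        _ => (s.as_str(), 1),              // raw MiB
    };
    // Non-numeric input fails the parse; overflow fails checked_mul.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

With this sketch, `parse_memory_mib("2g")` yields `Some(2048)` and `parse_memory_mib("garbage")` yields `None`, matching the four forms listed in the bullet.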

Tests: 615 → 624 (+9 net; one pre-existing parse_memory_mib path was
implicitly covered before but is now explicit). Cargo warnings: 0.

`from_manifest`, `parse_memory_mib`, and `RestartPolicy::OnFailure` are
marked allow(dead_code) with explicit references to Phase 3.2 — if
3.2 doesn't wire them, the dead-code warning resurfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-01 17:09:50 -04:00


Container subsystem testing — scorecard and roadmap

The bar (verbatim from the v1.7.52 owner):

"best performant, minimal code, tested containers possible in the world. No bloated code, no problems installing a single one, no problems uninstalling, every one needs to be tested 20+ times in every state before we make another update, not a single container failure outside of hardware or internet failure is allowed."

This document is the live tracker for whether we're meeting that bar. Every PR that touches the container subsystem updates the scorecard below. If you can't honestly tick the box, the change isn't ready.

Test layers

Layer	What it asserts	Toolchain	Latency / iteration
L0 — Rust unit	Pure-function behaviour (manifest parsing, secret resolution, structural invariants)	`cargo test --workspace --bins`	~5s
L1 — RPC API	The JSON-RPC API responds correctly per app (`container-list`, `package.{install,start,stop,restart,uninstall}`, `bitcoin.getinfo`, etc.)	bats + lib/rpc.bash	~30s per suite
L2 — UI surface	The URLs a user actually clicks (dashboard, `/app/<id>/`, direct-port iframes) return 200 with non-empty bodies	bats + lib/ui-probes.bash	~10s per suite
L3 — Lifecycle survival	Containers survive operational events (archipelago restart, host reboot, kill -9 mid-install, OOM)	bats (gated)	~60s per scenario
L4 — Browser journey	Real DOM-level user flow (login → install → wait → click → use)	playwright (TBD)	~30-120s per journey
L5 — Chaos / failure-path	Failure modes recover gracefully (corrupt config, deleted bolt DB, network partition)	bats (chaos-gated)	~120s per scenario
L6 — Performance	Cold install latency, reconcile-tick cost, podman call count per lifecycle event	timed bats + Prometheus (TBD)	~60s per benchmark

Release gate: L0+L1+L2+L3 green × 20 iterations on .228 AND .198. L4+L5+L6 are quality gates we add as they mature; they do not block the v1.7.52 tag.
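The gate can be read as a predicate over recorded runs. The sketch below is one possible encoding, assuming "green × 20" means at least 20 recorded runs per blocking layer per host with no red run among them; `RunResult`, `release_gate_passes`, and the constants are hypothetical names, not part of the test harness.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Layer { L0, L1, L2, L3, L4, L5, L6 }

/// Layers that block the tag; L4..L6 are recorded but ignored by the gate.
pub const BLOCKING: [Layer; 4] = [Layer::L0, Layer::L1, Layer::L2, Layer::L3];
/// The two verification hosts named in this document.
pub const HOSTS: [&str; 2] = [".228", ".198"];
pub const REQUIRED_ITERATIONS: usize = 20;

/// One recorded run of one layer on one host (hypothetical record shape).
pub struct RunResult {
    pub host: String,
    pub layer: Layer,
    pub green: bool,
}

/// True only if every blocking layer has >= 20 runs on every host,
/// and none of those runs is red.
pub fn release_gate_passes(results: &[RunResult]) -> bool {
    HOSTS.iter().all(|host| {
        BLOCKING.iter().all(|layer| {
            let runs: Vec<&RunResult> = results
                .iter()
                .filter(|r| r.host == *host && r.layer == *layer)
                .collect();
            runs.len() >= REQUIRED_ITERATIONS && runs.iter().all(|r| r.green)
        })
    })
}
```

Under this reading, a single red L3 run on either host fails the gate, while a red L4 run does not.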

Coverage matrix — current state

Legend: ● fully covered, ◐ partial, ○ missing

Per-app × per-state matrix (L1 + L2)

App	Container present	Valid state	RPC reachable	UI URL 200	Stop	Start	Restart	Reinstall	Reboot survives	Archipelago-restart survives
bitcoin-knots	●	●	●	● (port 8334)	●	●	●	●	○	◐ regression-gate only
bitcoin-core	◐ shares with knots	◐	○	◐	○	○	○	○	○	◐ regression-gate
lnd	●	●	● (lncli)	● (`/app/lnd/`)	●	●	●	●	○	◐ regression-gate
electrumx	●	●	● (TCP 50001)	● (`/app/electrumx/`)	●	●	●	●	○	◐ regression-gate
btcpay-server	●	●	◐ frontend-port	● (`/app/btcpay/`)	●	●	●	●	○	○
mempool	●	●	● (`/api/v1/backend-info`)	● (`/app/mempool/`)	●	●	●	●	○	○
fedimint	●	●	◐ container-only	● (`/app/fedimint/`)	●	●	●	●	○	○
filebrowser	○	○	○	● probe-only	○	○	○	○	○	◐ via companions
archy-bitcoin-ui	◐ via companions	◐	n/a	● (port 8334)	○	○	○	n/a	◐ via companions	●
archy-lnd-ui	◐ via companions	◐	n/a	● (`/app/lnd/`)	○	○	○	n/a	◐ via companions	●
archy-electrs-ui	◐ via companions	◐	n/a	● (`/app/electrumx/`)	○	○	○	n/a	◐ via companions	●

Done: 50 of 110 cells. Goal: 110/110 ● for the listed apps before v1.7.52 tags.
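Tallying cells by hand is error-prone, so a small helper may be worth having. This is a sketch only: `Tally` and `tally_cells` are hypothetical names, and it assumes a cell is classified by its leading glyph, so "● (port 8334)" counts as fully covered and "n/a" is tracked separately.

```rust
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub full: usize,           // ●
    pub partial: usize,        // ◐
    pub missing: usize,        // ○
    pub not_applicable: usize, // "n/a" or anything unrecognised
}

/// Classify each state cell of one matrix row by its first character.
/// `cells` excludes the leading app-name column.
pub fn tally_cells(cells: &[&str]) -> Tally {
    let mut tally = Tally::default();
    for cell in cells {
        match cell.trim().chars().next() {
            Some('●') => tally.full += 1,
            Some('◐') => tally.partial += 1,
            Some('○') => tally.missing += 1,
            _ => tally.not_applicable += 1,
        }
    }
    tally
}
```

Fed the ten state cells of the bitcoin-knots row above, this returns 8 full, 1 partial, and 1 missing.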

Layer-by-layer status

Layer	Tests	Suites	Status
L0 unit	624	n/a	● green
L1 RPC	70	bitcoin-knots, lnd, electrumx, btcpay, mempool, fedimint, required-stack, package-update-smoke	● for the 6 core apps
L2 UI	9	ui-coverage	● for dashboard + 7 proxy paths + bitcoin-ui:8334
L3 lifecycle survival	8	companion-survives-archipelago-restart, backend-survives-archipelago-restart, required-stack-destructive	◐ (companions ●; backends ◐ regression-gate, will fail until Phase 3 Quadlet ships)
L4 browser journey	0	none	○ not started
L5 chaos	0	none	○ not started
L6 performance	0	none	○ not started

Run commands

# L0 unit:
cd core && cargo test --workspace --bins

# Single bats suite:
ARCHY_PASSWORD=password123 tests/lifecycle/run.sh bitcoin-knots

# Full bats suite (read-only):
ARCHY_PASSWORD=password123 tests/lifecycle/run.sh

# Full + destructive (for the verification fleet):
ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 tests/lifecycle/run.sh

# 20× release-gate run (the actual v1.7.52 ship gate):
ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 \
  tests/lifecycle/run-20x.sh
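The contents of `run-20x.sh` are not shown here. As an illustration of the fail-fast repetition a 20× gate implies, the sketch below assumes the wrapper simply reruns `tests/lifecycle/run.sh` and stops at the first non-zero exit; `run_n_times` and `run_suite_once` are hypothetical helpers.

```rust
use std::process::Command;

/// Run `run_once` up to `iterations` times, stopping at the first failure.
/// Returns Err(i) with the 1-based index of the failing iteration.
pub fn run_n_times(
    iterations: u32,
    mut run_once: impl FnMut(u32) -> bool,
) -> Result<(), u32> {
    for i in 1..=iterations {
        if !run_once(i) {
            return Err(i);
        }
    }
    Ok(())
}

/// One full suite run, using the same environment variables as the
/// commands above. A spawn error is treated as a failed run.
pub fn run_suite_once(password: &str, destructive: bool) -> bool {
    let mut cmd = Command::new("tests/lifecycle/run.sh");
    cmd.env("ARCHY_PASSWORD", password);
    if destructive {
        cmd.env("ARCHY_ALLOW_DESTRUCTIVE", "1");
    }
    cmd.status().map(|status| status.success()).unwrap_or(false)
}
```

A caller would combine the two as `run_n_times(20, |_| run_suite_once("password123", true))`, so one red iteration is enough to fail the whole gate.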

LoC budget

Goal: minimum-viable container subsystem.

Module	LoC today	Target	Δ	Status
`core/container/src/dependency_resolver.rs`	—	—	-270	● deleted
`core/container/src/health_monitor.rs`	196	0	-196	◐ pending health migration into reconciler (Phase 3.5)
`core/container/src/podman_client.rs::create/start/stop`	~400	~150	-250	◐ pending Quadlet migration (Phase 3.5)
`core/archipelago/src/container/dev_orchestrator.rs`	410	0	-410	○ pending dev_mode strategy decision
`core/archipelago/src/container/data_manager.rs`	96	0	-96	○ couples with dev_orchestrator
`core/container/src/bitcoin_simulator.rs`	219	0	-219	○ couples with dev_orchestrator
`core/container/src/port_manager.rs`	175	0	-175	○ couples with dev_orchestrator
`core/archipelago/src/api/rpc/package/install.rs::install_bitcoincoin_rpc_repair`	~150	0	-150	◐ pending fold into orchestrator pre-start
imperative `install_fresh` in prod_orchestrator	~120	0	-120	○ pending Phase 3.2 Quadlet renderer

Today: -270 LoC committed. Outstanding deletes possible: ~1,616 LoC (if Phase 3 ships fully + dev_mode resolved).
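The ~1,616 figure is the sum of the Δ column over the eight rows that are not yet deleted. A short check, with labels abbreviated from the table and the approximate (~) rows taken at face value:

```rust
/// (module, lines still to delete), copied from the Δ column above.
/// The already-deleted dependency_resolver.rs row (-270) is excluded.
pub const PENDING_DELETES: [(&str, u32); 8] = [
    ("health_monitor.rs", 196),
    ("podman_client.rs::create/start/stop", 250),
    ("dev_orchestrator.rs", 410),
    ("data_manager.rs", 96),
    ("bitcoin_simulator.rs", 219),
    ("port_manager.rs", 175),
    ("install.rs RPC repair", 150),
    ("install_fresh in prod_orchestrator", 120),
];

/// Total LoC that can still be removed if every pending row lands.
pub fn outstanding_deletes() -> u32 {
    PENDING_DELETES.iter().map(|(_, lines)| lines).sum()
}
```

`outstanding_deletes()` returns 1616, which agrees with the total quoted above.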

Net target for v1.7.52: container subsystem ≈ half of today's LoC.

Performance KPIs (TBD — measure first, then target)

We don't have a performance harness yet. Fill in the rows below as L6 lands:

KPI	Today	Target	Notes
cold install: bitcoin-knots manifest → `running` healthcheck	unknown	< 30s once image is local	excludes the ~1GB image pull
cold install: lnd	unknown	< 60s once image is local	wallet unlock dominates
reconcile-tick wall time (no-op pass over all installed apps)	unknown	< 250ms	the current orchestrator does many `podman inspect` calls
podman shell-outs per package.install (orchestrator path)	7-10	1-2 (Quadlet)	post-Phase-3
daemon startup (boot → port 5678 listening)	unknown	< 5s	reconcile is async after this
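Until the L6 harness exists, a wall-time check against one of these targets could be as small as the sketch below. `timed`, `within_budget`, and `RECONCILE_TICK_BUDGET` are hypothetical names; the 250 ms value is the reconcile-tick target from the table, and the strict comparison mirrors its "<".

```rust
use std::time::{Duration, Instant};

/// Target from the KPI table: a no-op reconcile tick in under 250 ms.
pub const RECONCILE_TICK_BUDGET: Duration = Duration::from_millis(250);

/// Run `f` once and return its result together with the elapsed wall time.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Strictly-less-than comparison, so exactly 250 ms does not pass.
pub fn within_budget(elapsed: Duration, budget: Duration) -> bool {
    elapsed < budget
}
```

A benchmark would wrap the real reconcile call in `timed` and assert `within_budget(elapsed, RECONCILE_TICK_BUDGET)`; a single sample is noisy, so a real harness would likely repeat the measurement and report a percentile.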

Release gates

v1.7.52 ships only when ALL of the following hold:

☐ Bitcoin-stops fix verified live on a fresh node (tests/lifecycle/bats/bitcoin-knots.bats fully ● after a cold install)
☐ tests/lifecycle/run-20x.sh returns 0 against .228 (full suite, ARCHY_ALLOW_DESTRUCTIVE=1)
☐ tests/lifecycle/run-20x.sh returns 0 against .198 (same)
☐ The L3 backend-survives-archipelago-restart suite passes (= Phase 3 Quadlet shipped for backends)
☐ Cargo: 0 warnings, 0 unused, all tests green (sustained ✓ since 1c0df95f)
☐ LoC: at least one of {Phase 3 Quadlet, dev_mode resolution} merged
☐ Layman-readable changelog (per feedback_changelog_layman.md)
☐ Tag pushed to origin + gitea-local + gitea-vps2 (per feedback_ship_ritual.md)

How to update this document

When you land a change that materially moves any cell of the matrix or any LoC row, update this file in the same commit. Reviewers checking the PR can read the TESTING.md diff as the answer to "what did this commit improve?" Without the update, the change is half-shipped.
