Registry-Distributed App Manifests — Design
Status: design (2026-06-21)

Goal (north-star): every app installs from a manifest distributed via the signed app-catalog on the registry, with no OS-level code reliance and no OTA-shipped disk manifest required. Rootless, signed, robust, reboot-survivable.
See also: docs/dht-distribution-design.md (this is its "discovery/authenticity" layer), MEMORY → project_manifest_driven_north_star.
1. Where we are today
Two distinct mechanisms, only one of which is registry-distributed:
| Thing | Source | Reaches node via | Carries |
|---|---|---|---|
| `apps/*/manifest.yml` (48) | repo working tree | OTA: `self-update.sh` rsyncs `apps/` → `/opt/archipelago/apps/` | full manifest (the orchestrator's real source of truth) |
| `app-catalog.json` (28) | `releases/app-catalog.json` | registry HTTP fetch, hourly, signed (`app_catalog::refresh_catalog`) | version + image override only |
- Orchestrator registry = in-memory `state.manifests: HashMap<app_id, LoadedManifest>`, populated by `ProdContainerOrchestrator::load_manifests()` walking the disk dir. `install(app_id)` → `loaded(app_id)` → "unknown app_id" if absent.
- `app_catalog.rs` is already: signed (release-root, `trust::verify_detached` over the raw JSON), mirror-derived URLs, atomic cache at `<data_dir>/app-catalog.json`, forward-compatible (no `deny_unknown_fields`, so adding fields never breaks old nodes).
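To make the current failure mode concrete, here is a minimal sketch of the lookup chain described above. The `Registry` type, the fields of `LoadedManifest`, and the exact error string are assumptions for illustration; only the `install` → `loaded` → "unknown app_id" shape comes from this document.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for the real `LoadedManifest`; fields are illustrative.
#[derive(Debug, Clone, PartialEq)]
struct LoadedManifest {
    app_id: String,
    manifest_dir: PathBuf,
}

/// Hypothetical stand-in for the orchestrator's in-memory `state.manifests`.
struct Registry {
    manifests: HashMap<String, LoadedManifest>,
}

impl Registry {
    /// Look up a manifest; an app that was never loaded from disk is unknown.
    fn loaded(&self, app_id: &str) -> Result<&LoadedManifest, String> {
        self.manifests
            .get(app_id)
            .ok_or_else(|| format!("unknown app_id: {app_id}"))
    }

    /// Install resolves the manifest first, so a missing disk file fails here.
    fn install(&self, app_id: &str) -> Result<String, String> {
        let m = self.loaded(app_id)?;
        Ok(format!(
            "would install {} from {}",
            m.app_id,
            m.manifest_dir.display()
        ))
    }
}
```

In this sketch, an app that appears only in the catalog can never be installed, because nothing except the disk walk populates the map.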
Gap: the manifest itself is never registry-distributed. Every app — btcpay, grafana, immich — depends on an OTA-shipped disk file. That is the OS-level reliance to eliminate.
2. Target
The signed catalog entry carries the full manifest. The orchestrator loads manifests from the catalog cache (origin), falling back to disk only during the migration window. Publishing an app = editing the catalog + signing + push — no binary OTA, no disk manifest.
publisher:  apps/*/manifest.yml ──generate──▶ releases/app-catalog.json (embeds + signs)
node:       refresh_catalog() ──fetch+verify──▶ <data_dir>/app-catalog.json
            load_manifests()  ──merge──▶ state.manifests (catalog wins; disk = fallback)
            install(app_id)   ──▶ render Quadlet unit (rootless, systemd-managed)
3. Schema change (app_catalog::AppCatalogEntry)
Add one optional, forward-compatible field:
/// Full app manifest, embedded so the app installs from the registry alone
/// (no OTA-shipped disk file). Carried as the raw value the publisher signed;
/// deserialized into `AppManifest` at load time. Absent during migration =>
/// the node uses the disk manifest fallback.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub manifest: Option<serde_json::Value>,
Why serde_json::Value, not AppManifest:
- keeps the signed preimage intact (we verify over the raw JSON bytes; a typed round-trip could drop/reorder unknown fields and break the signature),
- decouples catalog schema from manifest schema churn,
- deserialize + `validate()` happens at orchestrator load, exactly like `from_file`.
Authenticity is free: fetch_one already verifies the release-root signature
over the whole document, so an embedded manifest is covered by the same signature.
A present-but-bad signature is already a hard reject.
4. Orchestrator load path (load_manifests)
Extend (not replace) the disk walk:
- Load disk manifests as today → `disk: HashMap<app_id, LoadedManifest>`.
- Load catalog manifests from the cache: for each entry with `manifest: Some(v)`, `serde_json::from_value::<AppManifest>(v)` then `validate()`; on success build a `LoadedManifest { manifest, manifest_dir }`.
- Merge, catalog-wins: a catalog manifest overrides the disk one for the same `app_id`. Disk remains the fallback for apps the catalog doesn't cover (migration).
  - Rationale: the registry is the authoritative origin; disk is the legacy transport we're retiring. This matches `app_catalog`'s "catalog verdict is authoritative when it covers the app" posture.
- A catalog manifest that fails parse/validate is logged and skipped → disk fallback used (one bad entry never blocks the fleet, same as the disk walk).
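The merge rules above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the orchestrator's code: `AppManifest` is reduced to two fields, `validate()` is a toy check, the `Origin` enum and `merge_manifests` are hypothetical names, and the `serde_json::from_value` parse step is omitted (catalog entries arrive already parsed).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified manifest; the real `AppManifest` has more fields.
#[derive(Debug, Clone, PartialEq)]
struct AppManifest {
    app_id: String,
    image: String,
}

impl AppManifest {
    /// Toy validation standing in for the real `validate()`.
    fn validate(&self) -> Result<(), String> {
        if self.image.is_empty() {
            return Err(format!("{}: empty image", self.app_id));
        }
        Ok(())
    }
}

/// Where a merged manifest came from (illustrative; not part of the design).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Origin {
    Catalog,
    Disk,
}

/// Start from the disk manifests, then let every valid catalog manifest
/// override the disk entry with the same app_id. Invalid catalog entries
/// are logged and skipped, which leaves the disk entry in place.
fn merge_manifests(
    disk: HashMap<String, AppManifest>,
    catalog: Vec<AppManifest>,
) -> HashMap<String, (AppManifest, Origin)> {
    let mut merged: HashMap<String, (AppManifest, Origin)> = disk
        .into_iter()
        .map(|(id, m)| (id, (m, Origin::Disk)))
        .collect();
    for m in catalog {
        match m.validate() {
            Ok(()) => {
                merged.insert(m.app_id.clone(), (m, Origin::Catalog));
            }
            Err(e) => eprintln!("skipping catalog manifest: {e}"),
        }
    }
    merged
}
```

Seeding the map from disk and overwriting from the catalog gives all three behaviours at once: override, fallback for a bad entry, and fallback for an app the catalog does not cover.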
manifest_dir for registry manifests
LoadedManifest.manifest_dir is used for build contexts and generated files
(GeneratedFile) that live next to a disk manifest. Registry manifests have no dir.
- Phase 1 scope = image-only apps (no `build:`, no `generated_files:` needing a source dir). immich, grafana, the fedimint apps, postgres/redis all qualify.
- Represent the absent dir explicitly: `manifest_dir: Option<PathBuf>` (or a sentinel under `<data_dir>/registry-apps/<app_id>/` materialized on demand). Companion build apps (bitcoin-ui, …) keep their disk path until a later phase teaches the catalog to carry build contexts (content-addressed, per the DHT plan).
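A minimal sketch of the two representations discussed above, assuming a cut-down `LoadedManifest`. The `needs_source_dir` flag and the helper names `source_dir` and `sentinel_dir` are hypothetical; the real struct and its callers may look different.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical shape: `manifest_dir` is `None` for registry manifests.
struct LoadedManifest {
    app_id: String,
    manifest_dir: Option<PathBuf>,
    /// Assumed flag: true when the manifest has `build:` or source-dir
    /// `generated_files:` entries.
    needs_source_dir: bool,
}

/// Resolve the directory used for build contexts and generated files.
/// Image-only registry manifests have none; a registry manifest that
/// needs one is rejected, which is the Phase 1 scope limit.
fn source_dir(m: &LoadedManifest) -> Result<Option<&Path>, String> {
    match (&m.manifest_dir, m.needs_source_dir) {
        (Some(dir), _) => Ok(Some(dir.as_path())),
        (None, false) => Ok(None),
        (None, true) => Err(format!(
            "{}: registry manifest needs a source dir (out of Phase 1 scope)",
            m.app_id
        )),
    }
}

/// The alternative: a sentinel directory under the data dir.
/// This only computes the path; creating it on demand is left out.
fn sentinel_dir(data_dir: &Path, app_id: &str) -> PathBuf {
    data_dir.join("registry-apps").join(app_id)
}
```

The `Option` variant forces every caller to handle the missing directory at compile time, while the sentinel keeps existing call sites unchanged at the cost of a directory that may be empty.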
5. Publishing (publish-side generator)
Add a generator (extend create-release.sh / a small scripts/gen-app-catalog):
- walk `apps/*/manifest.yml`, parse, embed each as the entry's `manifest` (JSON),
- keep `version`/`image`/`images` derived from the manifest for the badge path,
- write `releases/app-catalog.json`, then sign with the existing release-root ceremony (`archipelago ceremony` / Phase 0 seed). Unsigned still accepted in the migration window.
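The first step of the generator, walking `apps/*/manifest.yml`, might look like the sketch below. It uses only the standard library, so the YAML parse, the JSON embedding, and the signing ceremony are deliberately left out. The function name `collect_manifests` is hypothetical, and taking the directory name as the app id is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collect `(app_id, raw manifest text)` for every `<apps_dir>/<app_id>/manifest.yml`.
/// Directories without a manifest are skipped.
fn collect_manifests(apps_dir: &Path) -> io::Result<Vec<(String, String)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(apps_dir)? {
        let entry = entry?;
        let manifest_path = entry.path().join("manifest.yml");
        if !manifest_path.is_file() {
            continue;
        }
        // Assumption: the directory name is the app id.
        let app_id = entry.file_name().to_string_lossy().into_owned();
        out.push((app_id, fs::read_to_string(&manifest_path)?));
    }
    // `read_dir` order is platform-dependent; sorting keeps the output stable.
    out.sort();
    Ok(out)
}
```

Sorting is a design choice worth keeping in a real generator: a stable entry order makes the catalog reproducible, so regenerating it from an unchanged tree does not produce a different file to sign.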
6. Migration & rollback
- Backward compatible: old nodes ignore the new `manifest` field (no `deny_unknown_fields`) and keep using disk manifests.
- Forward: new nodes prefer catalog manifests, disk as fallback. Once the catalog covers every app and is verified live, drop `apps/` from the OTA rsync.
- Rollback: delete `<data_dir>/app-catalog.json` (or revert the published catalog) → nodes fall back to disk manifests. No data touched.
7. Phases
1. Schema + load merge (this design): `manifest` field, `load_manifests` catalog-wins merge, `manifest_dir: Option`, unit tests (catalog overrides disk; bad catalog manifest → disk fallback; absent → disk). Image-only apps.
2. Publisher generator + signing: emit embedded+signed catalog; CI/release wiring.
3. First real app end-to-end: immich as 3 registry manifests (`immich-postgres`/`immich-redis`/`immich-server`) installed via `install_stack_via_orchestrator` (delete legacy `install_immich_stack`). Uses `generated_secrets: [immich-db-password]` (already built).
4. Build-context apps: content-addressed build contexts in the catalog (DHT swarm fetch) so companions stop needing disk too.
5. Drop `apps/` from OTA once coverage + live verification complete.
8. Open questions
- Do we embed manifests inline or reference them by content hash (BLAKE3) with a separate signed blob? Inline is simplest for Phase 1; hashing aligns with the DHT image-by-digest plan and keeps the catalog small. Lean inline now, revisit at Phase 4 when build contexts (large) need addressing anyway.
- `generated_files` with inline content (vs. source-dir): is this already supported in the manifest schema? If so, registry manifests can carry small rendered files inline, removing another disk dependency.