Workstream B phase 1 (node-side consume). The signed app-catalog can now carry a full manifest per entry; the orchestrator overlays it over the disk manifest (origin-wins) with disk as the migration fallback. Moves apps toward registry-distributed manifests with no OTA-shipped disk file.

- app_catalog: `manifest: Option<Value>` on AppCatalogEntry (forward-compatible, covered by the existing release-root signature over the raw JSON); `catalog_manifest_values()` accessor.
- prod_orchestrator: `load_manifests` overlays catalog manifests after the disk walk; `catalog_manifest_to_overlay()` returns None (→ disk fallback) on unparseable value / app-id mismatch / failed validate() / build source (build contexts aren't registry-distributed yet; phase 1 is image-only).
- manifest_dir stays PathBuf (build-only field); image-only apps never read it.
- 6 unit tests; compiles clean.

No-op until a catalog embeds a manifest, so existing nodes are unaffected. See docs/registry-manifest-design.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Registry-Distributed App Manifests — Design
Status: design (2026-06-21)
Goal (north-star): every app installs from a manifest distributed via the signed app-catalog on the registry, with no OS-level code reliance and no OTA-shipped disk manifest required. Rootless, signed, robust, reboot-survivable.
See also: docs/dht-distribution-design.md (this is its "discovery/authenticity" layer), MEMORY → project_manifest_driven_north_star.
1. Where we are today
Two distinct mechanisms, only one of which is registry-distributed:
| Thing | Source | Reaches node via | Carries |
|---|---|---|---|
| `apps/*/manifest.yml` (48) | repo working tree | OTA: `self-update.sh` rsyncs `apps/` → `/opt/archipelago/apps/` | full manifest (the orchestrator's real source of truth) |
| `app-catalog.json` (28) | `releases/app-catalog.json` | registry HTTP fetch, hourly, signed (`app_catalog::refresh_catalog`) | version + image override only |
- Orchestrator registry = in-memory `state.manifests: HashMap<app_id, LoadedManifest>`, populated by `ProdContainerOrchestrator::load_manifests()` walking the disk dir. `install(app_id)` → `loaded(app_id)` → "unknown app_id" if absent.
- `app_catalog.rs` is already: signed (release-root, `trust::verify_detached` over the raw JSON), mirror-derived URLs, atomic cache at `<data_dir>/app-catalog.json`, forward-compatible (no `deny_unknown_fields`, so adding fields never breaks old nodes).
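The lookup chain in the first bullet can be sketched as below. This is a minimal illustration and not the orchestrator's code: `State`, the one-field `LoadedManifest`, and the error string format are simplified stand-ins assumed for the example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the orchestrator's `LoadedManifest` (assumption:
/// the real type carries a full manifest; one image field is enough here).
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedManifest {
    pub image: String,
}

/// Hypothetical holder for the in-memory registry described above.
pub struct State {
    pub manifests: HashMap<String, LoadedManifest>,
}

impl State {
    /// Look up a manifest; an app that was never loaded is an error.
    pub fn loaded(&self, app_id: &str) -> Result<&LoadedManifest, String> {
        self.manifests
            .get(app_id)
            .ok_or_else(|| format!("unknown app_id: {}", app_id))
    }

    /// Install can only proceed for apps present in the registry.
    pub fn install(&self, app_id: &str) -> Result<String, String> {
        let manifest = self.loaded(app_id)?;
        Ok(format!("install {} from image {}", app_id, manifest.image))
    }
}
```

Because the registry is filled only from the disk walk, any app without a disk manifest ends in the error branch, which is the dependency the rest of this design removes.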
Gap: the manifest itself is never registry-distributed. Every app — btcpay, grafana, immich — depends on an OTA-shipped disk file. That is the OS-level reliance to eliminate.
2. Target
The signed catalog entry carries the full manifest. The orchestrator loads manifests from the catalog cache (origin), falling back to disk only during the migration window. Publishing an app = editing the catalog + signing + push — no binary OTA, no disk manifest.
    publisher:  apps/*/manifest.yml ──generate──▶ releases/app-catalog.json (embeds + signs)
    node:       refresh_catalog()   ──fetch+verify──▶ <data_dir>/app-catalog.json
                load_manifests()    ──merge──▶ state.manifests (catalog wins; disk = fallback)
                install(app_id)     ──▶ render Quadlet unit (rootless, systemd-managed)
3. Schema change (app_catalog::AppCatalogEntry)
Add one optional, forward-compatible field:
/// Full app manifest, embedded so the app installs from the registry alone
/// (no OTA-shipped disk file). Carried as the raw value the publisher signed;
/// deserialized into `AppManifest` at load time. Absent during migration =>
/// the node uses the disk manifest fallback.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub manifest: Option<serde_json::Value>,
Why serde_json::Value, not AppManifest:
- keeps the signed preimage intact (we verify over the raw JSON bytes; a typed round-trip could drop/reorder unknown fields and break the signature),
- decouples catalog schema from manifest schema churn,
- deserialize + `validate()` happens at orchestrator load, exactly like `from_file`.
Authenticity is free: fetch_one already verifies the release-root signature
over the whole document, so an embedded manifest is covered by the same signature.
A present-but-bad signature is already a hard reject.
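The preimage concern in the first bullet can be illustrated with a toy example. This sketch deliberately avoids JSON and serde: it uses a made-up `key=value;key=value` format and a hypothetical `TypedEntry` that knows only two fields, purely to show that a parse-then-reserialize round trip can change the bytes a signature was computed over.

```rust
/// Toy typed view that only knows two fields (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
pub struct TypedEntry {
    pub id: String,
    pub image: String,
}

/// Parse a toy `key=value;key=value` document, silently ignoring unknown
/// keys, as a tolerant typed deserializer would.
pub fn parse_typed(raw: &str) -> TypedEntry {
    let mut entry = TypedEntry { id: String::new(), image: String::new() };
    for pair in raw.split(';') {
        if let Some((key, value)) = pair.split_once('=') {
            match key {
                "id" => entry.id = value.to_string(),
                "image" => entry.image = value.to_string(),
                _ => {} // unknown field: dropped on the way in
            }
        }
    }
    entry
}

/// Re-serialize in the struct's own field order.
pub fn serialize_typed(entry: &TypedEntry) -> String {
    format!("id={};image={}", entry.id, entry.image)
}
```

Feeding `image=example/immich:1;id=immich;future_field=1` through `parse_typed` and then `serialize_typed` yields `id=immich;image=example/immich:1`: the fields are reordered and the unknown one is gone, so a signature over the original bytes would no longer match the re-serialized form.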
4. Orchestrator load path (load_manifests)
Extend (not replace) the disk walk:
- Load disk manifests as today → `disk: HashMap<app_id, LoadedManifest>`.
- Load catalog manifests from the cache: for each entry with `manifest: Some(v)`, `serde_json::from_value::<AppManifest>(v)` then `validate()`; on success build a `LoadedManifest { manifest, manifest_dir }`.
- Merge, catalog-wins: a catalog manifest overrides the disk one for the same `app_id`. Disk remains the fallback for apps the catalog doesn't cover (migration).
  - Rationale: the registry is the authoritative origin; disk is the legacy transport we're retiring. This matches `app_catalog`'s "catalog verdict is authoritative when it covers the app" posture.
- A catalog manifest that fails parse/validate is logged and skipped → disk fallback used (one bad entry never blocks the fleet, same as the disk walk).
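The catalog-wins merge can be sketched as follows. This is a minimal sketch under stated assumptions, not the shipped `load_manifests`: `LoadedManifest` is reduced to a version string plus `manifest_dir`, `merge_catalog_wins` is a hypothetical helper name, and the `catalog` map is assumed to contain only manifests that already passed parse and `validate()`.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Simplified stand-in for the orchestrator's `LoadedManifest` (assumption:
/// the real type wraps a full `AppManifest`; a version string stands in).
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedManifest {
    pub version: String,
    pub manifest_dir: PathBuf,
}

/// Merge with catalog-wins semantics: start from the disk map, then let each
/// accepted catalog manifest replace the disk entry with the same app_id.
/// Apps the catalog does not cover keep their disk manifest.
pub fn merge_catalog_wins(
    disk: HashMap<String, LoadedManifest>,
    catalog: HashMap<String, LoadedManifest>,
) -> HashMap<String, LoadedManifest> {
    let mut merged = disk;
    // `HashMap::extend` overwrites the value for keys that already exist.
    merged.extend(catalog);
    merged
}
```

Since rejected catalog entries never enter `catalog`, a bad entry simply leaves the disk manifest in place, which gives the "logged and skipped" behaviour without special casing in the merge itself.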
manifest_dir for registry manifests — IMPLEMENTED
LoadedManifest.manifest_dir is used only in the ResolvedSource::Build branch
(relative container.build.context resolution — two call sites). Image-only apps
(ResolvedSource::Pull) never read it.
Decision (phase 1, shipped): keep manifest_dir: PathBuf (no Option ripple
through the codebase). A catalog manifest with a build source is skipped so its
disk manifest stays in effect — build contexts aren't registry-distributed until a
later phase (content-addressed, per the DHT plan). For an accepted (image-only)
catalog manifest, manifest_dir = the disk app dir if the app also exists on disk,
else a sentinel <manifests_dir>/<app_id> (never read for image-only apps).
This is enforced by catalog_manifest_to_overlay(app_id, value) -> Option<AppManifest>
in prod_orchestrator.rs, which returns None (→ disk fallback) for: unparseable
value, embedded-id ≠ catalog-key, failed validate(), or a build source.
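The four rejection rules and the `manifest_dir` choice can be sketched as below. This is an illustration only: the real `catalog_manifest_to_overlay` takes a JSON value, whereas the hypothetical `overlay_gate` here takes an already-attempted parse result so the example stays free of external crates. `Source`, the reduced `AppManifest`, its `validate()` rule, and `manifest_dir_for` are all assumptions made for the sketch.

```rust
use std::path::{Path, PathBuf};

/// Simplified source kinds, mirroring the Pull/Build split described above.
#[derive(Debug, Clone, PartialEq)]
pub enum Source {
    Pull { image: String },
    Build { context: String },
}

/// Reduced stand-in for `AppManifest` (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct AppManifest {
    pub id: String,
    pub source: Source,
}

impl AppManifest {
    /// Hypothetical validation rule, only to give the gate something to check.
    pub fn validate(&self) -> Result<(), String> {
        if self.id.is_empty() {
            return Err("empty id".to_string());
        }
        Ok(())
    }
}

/// Return `Some` only for a manifest that may overlay the disk one;
/// `None` means "keep the disk manifest". `parsed` stands in for the
/// outcome of deserializing the embedded value.
pub fn overlay_gate(app_id: &str, parsed: Result<AppManifest, String>) -> Option<AppManifest> {
    let manifest = parsed.ok()?; // unparseable value
    if manifest.id != app_id {
        return None; // embedded id differs from the catalog key
    }
    manifest.validate().ok()?; // failed validate()
    if matches!(manifest.source, Source::Build { .. }) {
        return None; // build sources are not registry-distributed in phase 1
    }
    Some(manifest)
}

/// Pick `manifest_dir` for an accepted manifest: the disk app dir when one
/// exists, otherwise the sentinel `<manifests_dir>/<app_id>`.
pub fn manifest_dir_for(manifests_dir: &Path, app_id: &str, disk_dir: Option<&Path>) -> PathBuf {
    match disk_dir {
        Some(dir) => dir.to_path_buf(),
        None => manifests_dir.join(app_id),
    }
}
```

Returning `Option` keeps every rejection reason on the same fallback path, so the caller needs only one branch to decide between overlay and disk.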
5. Publishing (publish-side generator)
Add a generator (extend create-release.sh / a small scripts/gen-app-catalog):
- walk `apps/*/manifest.yml`, parse, embed each as the entry's `manifest` (JSON),
- keep `version`/`image`/`images` derived from the manifest for the badge path,
- write `releases/app-catalog.json`, then sign with the existing release-root ceremony (`archipelago ceremony` / Phase 0 seed). An unsigned catalog is still accepted during the migration window.
6. Migration & rollback
- Backward compatible: old nodes ignore the new `manifest` field (no `deny_unknown_fields`) and keep using disk manifests.
- Forward: new nodes prefer catalog manifests, disk as fallback. Once the catalog covers every app and is verified live, drop `apps/` from the OTA rsync.
- Rollback: delete `<data_dir>/app-catalog.json` (or revert the published catalog) → nodes fall back to disk manifests. No data touched.
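The rollback path relies on a missing cache file being treated as "no catalog" rather than as an error. A minimal sketch of that behaviour follows, assuming a hypothetical `read_catalog_cache` helper; how the real cache reader handles a missing file is not stated in this document.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// Read the cached catalog text. A missing file maps to `Ok(None)`, so the
/// caller proceeds with disk manifests only; other I/O errors are surfaced.
pub fn read_catalog_cache(path: &Path) -> std::io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(Some(text)),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

With this shape, deleting the cache file empties the catalog overlay on the next load, and the merge leaves every app on its disk manifest.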
7. Phases
- Schema + load merge (this design): `manifest` field, `load_manifests` catalog-wins merge, `manifest_dir` kept as `PathBuf` (see the section 4 decision), unit tests (catalog overrides disk; bad catalog manifest → disk fallback; absent → disk). Image-only apps.
- Publisher generator + signing: emit embedded+signed catalog; CI/release wiring.
- First real app end-to-end: immich as 3 registry manifests (`immich-postgres` / `immich-redis` / `immich-server`) installed via `install_stack_via_orchestrator` (delete legacy `install_immich_stack`). Uses `generated_secrets: [immich-db-password]` (already built).
- Build-context apps: content-addressed build contexts in the catalog (DHT swarm fetch) so companions stop needing disk too.
- Drop `apps/` from OTA once coverage + live verification complete.
8. Open questions
- Do we embed manifests inline or reference them by content hash (BLAKE3) with a separate signed blob? Inline is simplest for Phase 1; hashing aligns with the DHT image-by-digest plan and keeps the catalog small. Lean inline now, revisit at Phase 4 when build contexts (large) need addressing anyway.
- `generated_files` with inline content (vs. source-dir): already supported in the manifest schema? If so, registry manifests can carry small rendered files inline, removing another disk dependency.