archy/docs/registry-manifest-design.md

Registry-Distributed App Manifests — Design

Status: design (2026-06-21)

Goal (north-star): every app installs from a manifest distributed via the signed app-catalog on the registry, with no OS-level code reliance and no OTA-shipped disk manifest required. Rootless, signed, robust, reboot-survivable.

See also: docs/dht-distribution-design.md (this is its "discovery/authenticity" layer), MEMORY → project_manifest_driven_north_star.


1. Where we are today

Two distinct mechanisms, only one of which is registry-distributed:

| Thing | Source | Reaches node via | Carries |
| --- | --- | --- | --- |
| apps/*/manifest.yml (48) | repo working tree | OTA: self-update.sh rsyncs apps/ → /opt/archipelago/apps/ | full manifest (the orchestrator's real source of truth) |
| app-catalog.json (28) | releases/app-catalog.json | registry HTTP fetch, hourly, signed (app_catalog::refresh_catalog) | version + image override only |

  • Orchestrator registry = in-memory state.manifests: HashMap<app_id, LoadedManifest>, populated by ProdContainerOrchestrator::load_manifests() walking the disk dir. install(app_id) → loaded(app_id) → "unknown app_id" if absent.
  • app_catalog.rs is already: signed (release-root, trust::verify_detached over the raw JSON), mirror-derived URLs, atomic cache at <data_dir>/app-catalog.json, forward-compatible (no deny_unknown_fields — adding fields never breaks old nodes).

Gap: the manifest itself is never registry-distributed. Every app — btcpay, grafana, immich — depends on an OTA-shipped disk file. That is the OS-level reliance to eliminate.

2. Target

The signed catalog entry carries the full manifest. The orchestrator loads manifests from the catalog cache (origin), falling back to disk only during the migration window. Publishing an app = editing the catalog + signing + pushing, with no binary OTA and no disk manifest.

publisher: apps/*/manifest.yml  ──generate──▶  releases/app-catalog.json (embeds + signs)
node:      refresh_catalog() ──fetch+verify──▶ <data_dir>/app-catalog.json
           load_manifests()  ──merge──▶ state.manifests   (catalog wins; disk = fallback)
           install(app_id)   ──▶ render Quadlet unit (rootless, systemd-managed)

3. Schema change (app_catalog::AppCatalogEntry)

Add one optional, forward-compatible field:

/// Full app manifest, embedded so the app installs from the registry alone
/// (no OTA-shipped disk file). Carried as the raw value the publisher signed;
/// deserialized into `AppManifest` at load time. Absent during migration =>
/// the node uses the disk manifest fallback.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub manifest: Option<serde_json::Value>,

Why serde_json::Value, not AppManifest:

  • keeps the signed preimage intact (we verify over the raw JSON bytes; a typed round-trip could drop/reorder unknown fields and break the signature),
  • decouples catalog schema from manifest schema churn,
  • deserialize + validate() happens at orchestrator load, exactly like from_file.

Authenticity is free: fetch_one already verifies the release-root signature over the whole document, so an embedded manifest is covered by the same signature. A present-but-bad signature is already a hard reject.

4. Orchestrator load path (load_manifests)

Extend (not replace) the disk walk:

  1. Load disk manifests as today → disk: HashMap<app_id, LoadedManifest>.
  2. Load catalog manifests from the cache: for each entry with manifest: Some(v), serde_json::from_value::<AppManifest>(v) then validate(); on success build a LoadedManifest { manifest, manifest_dir }.
  3. Merge, catalog-wins: a catalog manifest overrides the disk one for the same app_id. Disk remains the fallback for apps the catalog doesn't cover (migration).
    • Rationale: the registry is the authoritative origin; disk is the legacy transport we're retiring. This matches app_catalog's "catalog verdict is authoritative when it covers the app" posture.
  4. A catalog manifest that fails parse/validate is logged and skipped → disk fallback used (one bad entry never blocks the fleet, same as the disk walk).
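
The four steps above can be sketched as a single merge function. This is a minimal illustration under stated assumptions, not the orchestrator's code: the `LoadedManifest` here is a trimmed stand-in with hypothetical fields, `merge_manifests` is a hypothetical name, and each catalog entry is assumed to arrive as the outcome of the parse + validate() step.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Trimmed stand-in for the orchestrator's `LoadedManifest` (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedManifest {
    pub app_id: String,
    pub version: String,
    /// `None` for registry manifests, which have no directory on disk.
    pub manifest_dir: Option<PathBuf>,
}

/// Merge disk and catalog manifests, catalog wins.
///
/// `catalog` holds one parse/validate outcome per catalog entry that carried
/// an embedded manifest. An `Err` is logged and skipped, so any disk manifest
/// for the same app_id stays in place as the fallback.
pub fn merge_manifests(
    disk: HashMap<String, LoadedManifest>,
    catalog: Vec<(String, Result<LoadedManifest, String>)>,
) -> HashMap<String, LoadedManifest> {
    // Step 1: start from the disk walk.
    let mut merged = disk;
    for (app_id, outcome) in catalog {
        match outcome {
            // Step 3: a valid catalog manifest overrides the disk one.
            Ok(manifest) => {
                merged.insert(app_id, manifest);
            }
            // Step 4: a bad entry never blocks the rest of the fleet.
            Err(reason) => {
                eprintln!("skipping catalog manifest for {}: {}", app_id, reason);
            }
        }
    }
    merged
}
```

Because the merge starts from the disk map and only inserts on success, deleting the catalog cache (an empty `catalog` input) leaves the result identical to today's disk-only behavior.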

manifest_dir for registry manifests

LoadedManifest.manifest_dir is used for build contexts and generated files (GeneratedFile) that live next to a disk manifest. Registry manifests have no dir.

  • Phase 1 scope = image-only apps (no build:, no generated_files: needing a source dir). immich, grafana, the fedimint apps, postgres/redis all qualify.
  • Represent the absent dir explicitly: manifest_dir: Option<PathBuf> (or a sentinel under <data_dir>/registry-apps/<app_id>/ materialized on demand). Companion build apps (bitcoin-ui, …) keep their disk path until a later phase teaches the catalog to carry build contexts (content-addressed, per the DHT plan).
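
One way to combine the two options above is to store `Option<PathBuf>` and fall back to the sentinel path only when a directory is actually needed. The sketch below assumes that combination; `ManifestShape`, `phase1_eligible`, and `effective_dir` are hypothetical names, and the two boolean fields stand in for whatever the real `AppManifest` exposes.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, trimmed view of the manifest properties that decide Phase 1 scope.
pub struct ManifestShape {
    /// The manifest has a `build:` section.
    pub has_build: bool,
    /// The manifest has `generated_files:` entries that read from a source dir.
    pub has_source_dir_generated_files: bool,
}

/// Phase 1 covers image-only apps: no build context and no
/// generated files that need a directory next to the manifest.
pub fn phase1_eligible(m: &ManifestShape) -> bool {
    !m.has_build && !m.has_source_dir_generated_files
}

/// Resolve the directory an app's files live in.
///
/// Disk manifests keep their own directory. Registry manifests have none,
/// so they resolve to the sentinel `<data_dir>/registry-apps/<app_id>/`.
/// This function only computes the path; creating it is left to the caller.
pub fn effective_dir(data_dir: &Path, app_id: &str, manifest_dir: Option<&Path>) -> PathBuf {
    match manifest_dir {
        Some(dir) => dir.to_path_buf(),
        None => data_dir.join("registry-apps").join(app_id),
    }
}
```

Keeping the `Option` in the type makes every caller that assumes a disk directory fail to compile until it handles the registry case, which is the main argument for it over a bare sentinel path.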

5. Publishing (publish-side generator)

Add a generator (extend create-release.sh / a small scripts/gen-app-catalog):

  • walk apps/*/manifest.yml, parse, embed each as the entry's manifest (JSON),
  • keep version/image/images derived from the manifest for the badge path,
  • write releases/app-catalog.json, then sign with the existing release-root ceremony (archipelago ceremony / Phase 0 seed). An unsigned catalog is still accepted during the migration window.
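
On the node side, the signature posture this implies (a bad signature is a hard reject, an unsigned catalog is tolerated only during migration) can be written as a small decision table. This is a sketch, not the logic in app_catalog.rs: `SignatureCheck`, `CatalogDecision`, and `decide` are hypothetical names, and rejecting unsigned catalogs once the migration window closes is an assumption about the end state rather than something this design specifies.

```rust
/// Result of checking the detached release-root signature (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SignatureCheck {
    /// A signature is present and verifies over the raw JSON bytes.
    Valid,
    /// A signature is present but does not verify.
    Invalid,
    /// No signature was published alongside the catalog.
    Missing,
}

/// What the node does with a fetched catalog (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum CatalogDecision {
    Accept,
    Reject(&'static str),
}

/// Decide whether to accept a fetched catalog.
pub fn decide(check: SignatureCheck, migration_window: bool) -> CatalogDecision {
    match (check, migration_window) {
        (SignatureCheck::Valid, _) => CatalogDecision::Accept,
        // Present-but-bad is always a hard reject, migration or not.
        (SignatureCheck::Invalid, _) => CatalogDecision::Reject("bad signature"),
        // Unsigned is tolerated only while publishers are still migrating.
        (SignatureCheck::Missing, true) => CatalogDecision::Accept,
        // Assumption: unsigned becomes a reject once the window closes.
        (SignatureCheck::Missing, false) => {
            CatalogDecision::Reject("unsigned catalog outside migration window")
        }
    }
}
```

Since the signature covers the whole document, the same decision applies to every embedded manifest in it; there is no per-entry verdict.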

6. Migration & rollback

  • Backward compatible: old nodes ignore the new manifest field (no deny_unknown_fields) and keep using disk manifests.
  • Forward: new nodes prefer catalog manifests, disk as fallback. Once the catalog covers every app and is verified live, drop apps/ from the OTA rsync.
  • Rollback: delete <data_dir>/app-catalog.json (or revert the published catalog) → nodes fall back to disk manifests. No data touched.

7. Phases

  1. Schema + load merge (this design): manifest field, load_manifests catalog-wins merge, manifest_dir: Option<PathBuf>, unit tests (catalog overrides disk; bad catalog manifest → disk fallback; absent → disk). Image-only apps.
  2. Publisher generator + signing: emit embedded+signed catalog; CI/release wiring.
  3. First real app end-to-end: immich as 3 registry manifests (immich-postgres/immich-redis/immich-server) installed via install_stack_via_orchestrator (delete legacy install_immich_stack). Uses generated_secrets: [immich-db-password] (already built).
  4. Build-context apps: content-addressed build contexts in the catalog (DHT swarm fetch) so companions stop needing disk too.
  5. Drop apps/ from OTA once coverage + live verification complete.

8. Open questions

  • Do we embed manifests inline or reference them by content hash (BLAKE3) with a separate signed blob? Inline is simplest for Phase 1; hashing aligns with the DHT image-by-digest plan and keeps the catalog small. Lean inline now, revisit at Phase 4 when build contexts (large) need addressing anyway.
  • Are generated_files with inline content (vs. source-dir) already supported in the manifest schema? If so, registry manifests can carry small rendered files inline, removing another disk dependency.