feat(immich): manifest-driven stack via orchestrator — live-migrated on .228

Completes the immich migration off the legacy hardcoded install_immich_stack
(podman run + sudo chown) to the registry-manifest + orchestrator path. Validated
live on .228 (one clean set of containers, no duplicates; healthy v2.7.4; data dir
ownership correct).

- install_immich_stack now tries install_stack_via_orchestrator(immich_stack_app_ids)
  first; legacy remains only as the no-manifests fallback.
- immich-{postgres,redis,server} manifests corrected from live findings:
  * named by app_id (dropped container_name override) — using container_name
    spawned DUPLICATE containers (app_id-named install vs name-override reconcile)
    on the same PGDATA, which corrupted a postgres cluster. Server reaches its
    siblings via app_id aliases (DB_HOSTNAME=immich-postgres, REDIS=immich-redis).
  * immich-postgres data_uid 100998:100998 (postgres drops to container 999 →
    host 100998 under rootless; verified the fresh dir is chowned correctly).
  * immich-server version "release"→"2.7.4" (manifest validation requires a digit;
    the bad version made the manifest silently skip → partial orchestrator install
    → legacy fallback → the duplicate corruption above).
- HARDEN install_stack_via_orchestrator: only fall back to the legacy installer
  when NOTHING was installed yet. An "unknown app_id" AFTER a member is up now
  errors instead of double-creating containers on shared data (the corruption
  root cause).
- Tighten the all-manifests round-trip test: fail (not skip) on any invalid shipped
  manifest. This gap is what let the bad immich-server version through.
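
The hardened fallback rule can be sketched in isolation. This is a minimal
illustration, not the code in this commit: UnknownAppAction, on_unknown_app and
plan_stack_install are hypothetical names, and a plain list of known app_ids
stands in for the orchestrator.

```rust
/// What a stack installer should do when one member reports "unknown app_id".
/// Illustrative type only; the real code returns Ok(None) or an error instead.
#[derive(Debug, PartialEq, Eq)]
pub enum UnknownAppAction {
    /// No member is up yet, so the legacy installer can safely take over.
    FallBackToLegacy,
    /// At least one member is already installed: abort rather than let a
    /// second installer create containers on the same data directory.
    Abort { already_installed: usize },
}

/// The decision itself: fall back only when nothing was installed yet.
pub fn on_unknown_app(already_installed: usize) -> UnknownAppAction {
    if already_installed == 0 {
        UnknownAppAction::FallBackToLegacy
    } else {
        UnknownAppAction::Abort { already_installed }
    }
}

/// Walks the stack in dependency order. `known` is a stand-in for the set of
/// app_ids the orchestrator has manifests for (an assumption of this sketch).
pub fn plan_stack_install(
    app_ids: &[&str],
    known: &[&str],
) -> Result<usize, UnknownAppAction> {
    let mut installed = 0usize;
    for app_id in app_ids {
        if known.contains(app_id) {
            installed += 1;
        } else {
            return Err(on_unknown_app(installed));
        }
    }
    Ok(installed)
}
```

With no manifests deployed the sketch defers to legacy; with only the postgres and
redis manifests known it aborts after two members instead of falling back.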

Known follow-up (pre-existing, platform-wide): orchestrator-installed backends
(immich, btcpay-db) run as podman --restart, not Quadlet, and podman-restart.service
is disabled on .228 → reboot-survival gap independent of this migration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
archipelago 2026-06-21 07:08:45 -04:00
parent 011081d180
commit 9e6c5370fc
6 changed files with 61 additions and 18 deletions


@ -4,15 +4,20 @@ app:
version: "14-vectorchord0.4.3-pgvectors0.2.0" version: "14-vectorchord0.4.3-pgvectors0.2.0"
description: Postgres (pgvecto.rs / vectorchord) backend for Immich. description: Postgres (pgvecto.rs / vectorchord) backend for Immich.
# The Immich server connects via DB_HOSTNAME=immich_postgres, so the container # No container_name override: the container is named by app_id (immich-postgres),
# name (and thus its archy-net alias) must be the underscore form. # which is also its archy-net alias and the server's DB_HOSTNAME. (Overriding the
extensions: # name diverges from the orchestrator's app_id-based naming and spawns duplicate
container_name: immich_postgres # containers — mirror the btcpay stack, which names members by app_id.)
container: container:
image: 146.59.87.168:3000/lfg2025/immich-postgres:14-vectorchord0.4.3-pgvectors0.2.0 image: 146.59.87.168:3000/lfg2025/immich-postgres:14-vectorchord0.4.3-pgvectors0.2.0
pull_policy: if-not-present pull_policy: if-not-present
network: archy-net network: archy-net
# postgres drops to its own uid (container 999 → host 100998 under rootless),
# so the data dir must be owned by that mapped uid — mirrors archy-btcpay-db.
# Verified on .228: the live immich-db is owned 100998. Without this a FRESH
# install's dir would be service-user-owned and postgres would EACCES.
data_uid: "100998:100998"
generated_secrets: generated_secrets:
- name: immich-db-password - name: immich-db-password
kind: hex32 kind: hex32
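
The 999 → 100998 mapping in the data_uid comment can be reproduced with a small
sketch of the usual rootless user-namespace layout. The function name and the
concrete values (service user uid 1000, subordinate range starting at 100000 with
65536 entries) are assumptions for illustration; the real values come from the
node's /etc/subuid.

```rust
/// Maps a container uid to a host uid under a single-range rootless mapping:
/// container uid 0 is the invoking user, and container uids 1..=subuid_count
/// map onto the subordinate range that begins at subuid_start.
/// Returns None when the container uid falls outside the mapped range.
pub fn rootless_host_uid(
    container_uid: u32,
    user_uid: u32,
    subuid_start: u32,
    subuid_count: u32,
) -> Option<u32> {
    match container_uid {
        0 => Some(user_uid),
        // Container uid 1 is the first subordinate uid, hence the "- 1".
        n if n <= subuid_count => subuid_start.checked_add(n - 1),
        _ => None,
    }
}
```

Under those assumed values, rootless_host_uid(999, 1000, 100000, 65536) yields
Some(100998), which matches the ownership observed on .228.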


@ -4,9 +4,7 @@ app:
version: "7-alpine" version: "7-alpine"
description: Valkey (Redis-compatible) cache for Immich. description: Valkey (Redis-compatible) cache for Immich.
# Immich server connects via REDIS_HOSTNAME=immich_redis — alias must match. # Named by app_id (immich-redis) = archy-net alias = server's REDIS_HOSTNAME.
extensions:
container_name: immich_redis
container: container:
image: 146.59.87.168:3000/lfg2025/valkey:7-alpine image: 146.59.87.168:3000/lfg2025/valkey:7-alpine


@ -1,11 +1,11 @@
app: app:
id: immich-server id: immich-server
name: Immich name: Immich
version: "release" version: "2.7.4"
description: Self-hosted photo and video backup with mobile apps and search. description: Self-hosted photo and video backup with mobile apps and search.
extensions: # Named by app_id (immich-server); connects to its siblings by their app_id
container_name: immich_server # aliases on archy-net (see DB_HOSTNAME / REDIS_HOSTNAME below).
container: container:
image: 146.59.87.168:3000/lfg2025/immich-server:release image: 146.59.87.168:3000/lfg2025/immich-server:release
@ -41,10 +41,10 @@ app:
options: [rw] options: [rw]
environment: environment:
- DB_HOSTNAME=immich_postgres - DB_HOSTNAME=immich-postgres
- DB_USERNAME=postgres - DB_USERNAME=postgres
- DB_DATABASE_NAME=immich - DB_DATABASE_NAME=immich
- REDIS_HOSTNAME=immich_redis - REDIS_HOSTNAME=immich-redis
- UPLOAD_LOCATION=/usr/src/app/upload - UPLOAD_LOCATION=/usr/src/app/upload
health_check: health_check:


@ -620,16 +620,25 @@ async fn install_stack_via_orchestrator(
)) ))
.await; .await;
let mut installed = 0usize;
for app_id in app_ids { for app_id in app_ids {
match orchestrator.install(app_id).await { match orchestrator.install(app_id).await {
Ok(container_name) => { Ok(container_name) => {
installed += 1;
install_log(&format!( install_log(&format!(
"INSTALL ORCH: {} stack — app {} installed as {}", "INSTALL ORCH: {} stack — app {} installed as {}",
stack_name, app_id, container_name stack_name, app_id, container_name
)) ))
.await; .await;
} }
Err(e) if e.to_string().contains("unknown app_id") => { Err(e) if e.to_string().contains("unknown app_id") && installed == 0 => {
// None of the stack's manifests are known — the orchestrator
// can't render this stack at all, so defer to the legacy
// installer. Only safe when NOTHING was installed yet: once an
// earlier member is up, falling back would let the legacy path
// double-create containers on the same data dir (observed
// corrupting an immich postgres cluster — two postmasters, one
// PGDATA). A partial set means a deploy bug, not a legacy node.
install_log(&format!( install_log(&format!(
"INSTALL ORCH SKIP: {} stack — app {} unknown, falling back to legacy stack installer", "INSTALL ORCH SKIP: {} stack — app {} unknown, falling back to legacy stack installer",
stack_name, app_id stack_name, app_id
@ -637,6 +646,17 @@ async fn install_stack_via_orchestrator(
.await; .await;
return Ok(None); return Ok(None);
} }
Err(e) if e.to_string().contains("unknown app_id") => {
install_log(&format!(
"INSTALL ORCH FAIL: {} stack — app {} unknown AFTER {} installed; refusing legacy fallback (would double-create on shared data)",
stack_name, app_id, installed
))
.await;
return Err(e.context(format!(
"orchestrator stack install {} aborted: app {} has no manifest but {} member(s) already installed — deploy all stack manifests",
stack_name, app_id, installed
)));
}
Err(e) => { Err(e) => {
install_log(&format!( install_log(&format!(
"INSTALL ORCH FAIL: {} stack — app {} failed: {}", "INSTALL ORCH FAIL: {} stack — app {} failed: {}",
@ -668,6 +688,11 @@ fn mempool_stack_app_ids() -> &'static [&'static str] {
&["archy-mempool-db", "mempool-api", "archy-mempool-web"] &["archy-mempool-db", "mempool-api", "archy-mempool-web"]
} }
fn immich_stack_app_ids() -> &'static [&'static str] {
// Install order = dependency order: db + cache before the server.
&["immich-postgres", "immich-redis", "immich-server"]
}
const REGISTRY: &str = "146.59.87.168:3000/lfg2025"; const REGISTRY: &str = "146.59.87.168:3000/lfg2025";
const NETBIRD_DASHBOARD_IMAGE: &str = "docker.io/netbirdio/dashboard:v2.38.0"; const NETBIRD_DASHBOARD_IMAGE: &str = "docker.io/netbirdio/dashboard:v2.38.0";
@ -734,6 +759,17 @@ async fn pull_image_with_retry(image: &str) -> Result<()> {
impl RpcHandler { impl RpcHandler {
/// Install Immich stack (postgres + redis + server). /// Install Immich stack (postgres + redis + server).
pub(super) async fn install_immich_stack(&self) -> Result<serde_json::Value> { pub(super) async fn install_immich_stack(&self) -> Result<serde_json::Value> {
// Manifest-driven path (workstream B/C): render the stack from
// apps/immich-*/manifest.yml via the orchestrator (rootless Quadlet
// units, generated_secrets, reboot-survivable). Falls back to the legacy
// installer below only when the orchestrator doesn't know these app_ids
// (manifests not yet deployed). See docs/PRODUCTION-MASTER-PLAN.md.
if let Some(orchestrated) =
install_stack_via_orchestrator(self, "immich", immich_stack_app_ids()).await?
{
return Ok(orchestrated);
}
if let Some(adopted) = adopt_stack_if_exists( if let Some(adopted) = adopt_stack_if_exists(
"immich_server", "immich_server",
"immich", "immich",


@ -3778,10 +3778,14 @@ app:
if !mf.exists() { if !mf.exists() {
continue; continue;
} }
let m = match AppManifest::from_file(&mf) { // Every shipped manifest MUST be valid. load_manifests() silently
Ok(m) => m, // skips malformed ones in prod, which once let an invalid app.version
Err(_) => continue, // a malformed disk manifest is a separate concern // ("release", no digit) ship — the app then vanished from the
}; // orchestrator and a stack install half-fell-back to the legacy path.
// Fail loudly here instead.
let m = AppManifest::from_file(&mf).unwrap_or_else(|e| {
panic!("shipped manifest {} must be valid: {e}", mf.display())
});
let id = m.app.id.clone(); let id = m.app.id.clone();
let is_build = m.app.container.build.is_some(); let is_build = m.app.container.build.is_some();
let value = serde_json::to_value(&m).expect("manifest serializes to JSON"); let value = serde_json::to_value(&m).expect("manifest serializes to JSON");
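
The difference between skipping and failing on an invalid manifest can be shown
with a reduced sketch. It is not the project's validator: version_is_valid only
models the "must contain a digit" rule mentioned in this commit, and load_lenient
and load_strict are hypothetical helpers operating on (app_id, version) pairs.

```rust
/// Stand-in for the version rule described in this commit: a version string
/// must contain at least one ASCII digit. The real validator may check more.
pub fn version_is_valid(version: &str) -> bool {
    version.chars().any(|c| c.is_ascii_digit())
}

/// Lenient loading: invalid entries are silently dropped, so a bad manifest
/// simply disappears from the result.
pub fn load_lenient<'a>(manifests: &[(&'a str, &'a str)]) -> Vec<&'a str> {
    manifests
        .iter()
        .filter(|(_, version)| version_is_valid(version))
        .map(|(id, _)| *id)
        .collect()
}

/// Strict loading: the first invalid entry becomes an error naming the app.
pub fn load_strict<'a>(
    manifests: &[(&'a str, &'a str)],
) -> Result<Vec<&'a str>, String> {
    manifests
        .iter()
        .map(|(id, version)| {
            if version_is_valid(version) {
                Ok(*id)
            } else {
                Err(format!("manifest {id} has invalid version {version:?}"))
            }
        })
        .collect()
}
```

Given an immich-server entry with version "release", the lenient loader returns
only the postgres and redis ids, while the strict loader returns an error, which
is the behaviour the tightened test now enforces for shipped manifests.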


@ -63,7 +63,7 @@ real nodes. Until then, this plan is the priority.
| # | Workstream | Detail doc | Status | | # | Workstream | Detail doc | Status |
|---|-----------|-----------|--------| |---|-----------|-----------|--------|
| A | **Manifest-driven app platform** — packaging contract, single/multi-container runtime, routing, controlled hooks, dev tooling (6 phases, security model, migration rules) | `APP-PACKAGING-MIGRATION-PLAN.md` | mostly done; immich + multi-container polish remain | | A | **Manifest-driven app platform** — packaging contract, single/multi-container runtime, routing, controlled hooks, dev tooling (6 phases, security model, migration rules) | `APP-PACKAGING-MIGRATION-PLAN.md` | mostly done; immich + multi-container polish remain |
| B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **design done — implementing phase 1** | | B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
| C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending | | C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
| D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0-2 code-complete (worktree) | | D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0-2 code-complete (worktree) |
| E | **Production test gate** — 20× lifecycle on .228 + .198, per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** | | E | **Production test gate** — 20× lifecycle on .228 + .198, per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** |