# 🚩 PRODUCTION MASTER PLAN β€” Archipelago App Platform & Registry > **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until > the production test gate (Β§5) is green.** It overrides ad-hoc direction and > supersedes all prior roadmap/handoff/status docs. When the gate passes, remove > the priority banner and demote this doc. > > Last updated: 2026-06-21 Β· Binary: v1.7.99-alpha --- ## 1. The North Star Make Archipelago a **world-class, developer-ready app platform** where: 1. **Every app is manifest-driven** β€” install/run/update/uninstall needs only the app's manifest (+ catalog entry). **Zero OS-level code reliance**: no per-app Rust installers, no `sudo mkdir/chown`, no host provisioning. 2. **Manifests are distributed via the (signed) registry**, not baked into the binary OTA as disk files. Bumping/adding an app = a signed catalog change. 3. **Third-party developers can build and ship apps via an external registry** β€” a decentralized marketplace (DID-signed manifests, Nostr discovery, reputation), not a gatekept central store. `archy app validate/render/install/test` tooling. 4. The platform stays **rootless, secure-by-default, elegant, robust, and 100%-uptime-capable** (reboot-survivable, self-healing, no data loss on migrate). **Definition of done:** the production test gate (Β§5) is green for the app set on real nodes. Until then, this plan is the priority. ## 2. Invariants (never violate) - **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged containers unless explicitly approved. (ADR-001, ADR-009.) - **No app-specific business logic in the Rust backend.** The orchestrator owns the lifecycle state machine; apps are declarative. Legacy `install_immich_stack` (hardcoded `podman run` + `sudo chown`) is the anti-pattern being deleted. - **Secrets are manifest-declared** (`generated_secrets`, materialised by `container::secrets` 0600/rootless, idempotent + self-healing) β€” never hardcoded, per-app, or logged. Replaces the deleted `ensure_fmcd_password`. - **Migrations never destroy data.** Preserve `/var/lib/archipelago/`, generated secrets, displayed credentials, public ports, and adoption container names. Always provide a rollback path. Stop/recreate only when necessary. - **Verify on a real node (.228, then .198) before any tag.** ## 3. Current state (2026-06-21) - **~40 apps are manifest-based and Quadlet-migrated** (survive `archipelago.service` restart + reboot). Exhaustive per-app table: `docs/app-registry-status-2026-06-21.md`. - **Legacy holdout: immich** β€” the one app with **no manifest** and a hardcoded Rust stack installer (in-cgroup, not Quadlet). 3 containers, healthy, live data. The migration proof case. - **Manifests still travel by OTA disk rsync** (`apps/ β†’ /opt/archipelago/apps`). The signed catalog (`app-catalog.json`) currently distributes **only image overrides** β€” not full manifests. Gap closed by workstream B. - **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`, `-fedimint-ui`) build from `docker/` contexts via `companion.rs`, not the manifest registry β€” a later phase folds them in. - **No app has passed the formal 20Γ— production gate.** That is the blocker. ## 4. Workstreams (each links its authoritative detail doc) | # | Workstream | Detail doc | Status | |---|-----------|-----------|--------| | A | **Manifest-driven app platform** β€” packaging contract, single/multi-container runtime, routing, controlled hooks, dev tooling (6 phases, security model, migration rules) | `APP-PACKAGING-MIGRATION-PLAN.md` | mostly done; immich + multi-container polish remain | | B | **Registry-distributed manifests** β€” catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet | | C | **Developer-ready external registry** β€” 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending | | D | **Distribution backbone** β€” signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) | | E | **Production test gate** β€” 20Γ— lifecycle on .228 + .198, per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green β€” exit criterion** | **Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md` (ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption scan, Quadlet rendering) and `bulletproof-containers.md` (the six container failure modes FM1–FM6 + the desired-state-first reconciler that fixes them). ## 5. Production test gate (exit criterion) An app is **production-ready** only when `tests/lifecycle/run-20x.sh` is green across the full matrix β€” install / UI-reachable / stop / start / restart / reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall β€” **20Γ— on .228 AND .198**. All 8 gate checkboxes in `tests/lifecycle/TESTING.md` are currently unchecked. Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps, L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated coverage. ## 6. Immediate sequence (live workstream) 1. βœ… **B-phase 1** β€” `manifest` field on `AppCatalogEntry`; `load_manifests` catalog-wins merge; `manifest_dir` kept (build-source catalog manifests skipped in phase 1); unit tests. *(commit 220666d3)* 2. βœ… **B-phase 2** β€” `EMBED_MANIFESTS` publisher generator + round-trip guard. *(7bfbe8fe; signing via existing ceremony β€” not yet flipped on for the fleet.)* 3. βœ… **C immich proof** β€” immich is a manifest-driven stack (immich + immich-postgres + immich-redis) installed via `install_stack_via_orchestrator`; legacy installer is now fallback-only. Live-migrated + verified on .228. Found+fixed: container_name duplicate-on-shared-PGDATA, version-digit validation, partial-fallback hardening, data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)* 4. βœ… **Reboot-survival** β€” podman-restart.service enabled (startup, fleet-wide) for the podman-`--restart` path. *(f160e0c4)* 5. β—» **Verify on .198** (immich migration validated on .228 only so far). 6. β—» **E** β€” run the 20Γ— gate; fix until green. 7. β—» Demote this banner. **Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the published catalog (then sign) to actually distribute manifests via the registry; Phase-3 `use_quadlet_backends` rollout so orchestrator backends are Quadlet (not just podman-`--restart`); immich on .198. ## 7. Release blockers & operational gotchas (durable) Carried forward from prior handoffs (deduped against persistent memory): - **Rootless control-plane responsiveness** β€” slow `podman ps`/store cleanup at startup must not surface a false "no apps installed" UI. **My Apps must preserve last-known apps during scanner backoff**, never show empty during a transient. - **Reboot survival** β€” gate on β‰₯3 (prefer 5) consecutive clean post-reboot lifecycle passes. Quadlet units under `user.slice` survive `archipelago.service` restart; legacy in-cgroup containers get SIGKILLed and reconciled back. - **Startup patterns** β€” wait on a socket/health, never `sleep`. Tailscale waits for its socket; Fedimint Guardian waits for Bitcoin RPC `initialblockdownload:false` before launching fedimintd (proxy/wait companion on :8175 during IBD). - **Bitcoin must run full** (`txindex=1`, non-pruned) for ElectrumX/mempool. - **Adoption** β€” match existing containers by name and adopt without recreate; record a migration version in app state; preserve Nostr signer bridges (IndeeHub needs `/nostr-provider.js` served, not just port reachability). - **Image presence** β€” use bounded targeted `podman image inspect`, not `podman image exists` (avoids store-walk stalls). - **Companion rebuilds** β€” `companion.rs` must rebuild `:latest` when the build context changes (staleness check), else baked-in fixes (e.g. guardian CSS) never reach nodes. `:local` is a manual override, never auto-rebuilt. ## 8. Roadmap **Pipeline:** Feature Testing (internal) β†’ User Testing (controlled hardware) β†’ Beta Live (public). Hardening priorities feeding the gate: - **P0** Container app reliability β€” bulletproof install/health/restart/uninstall across all apps, dependency chains, multi-container stacks. - **P0** Networking stack first-install β†’ reboot-proof (WireGuard/NetBird, Tor hidden services, LND Connect). - **P1** LUKS2 full-partition encryption for `/var/lib/archipelago/` (AES-256-XTS, Argon2id, key from setup password + hardware salt). - **P1** Meshtastic plug-and-play parity with MeshCore. **Post-beta (deferred β€” do not start until gate is green):** P2P encrypted voice/video (WebRTC over federation via Tor); watch-only wallet + mesh BTC hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.md`); Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash phases 2–6 (`dual-ecash-design.md`). ## 8b. SESSION STATE + RESUME (2026-06-21, live) **Landed + committed on main this session (newest first):** - `b1eea8c0` indeedhub (#20) **phase 3 β€” CODE COMPLETE, unit-tested; NOT yet live-verified.** 7 manifests (apps/indeedhub-{postgres,redis,minio,relay,api, ffmpeg} + apps/indeedhub frontend) + install_indeedhub_stack orchestrator-first (immich pattern). Data-preserving by construction = ADOPTION on .228: exact live hyphen container names, named volumes indeedhub-*-data, dedicated indeedhub-net + network_aliases [postgres|redis|minio|relay|api], generated_secrets reuse live /var/lib/archipelago/secrets values (ensure_one no-ops on existing). Frontend carries the post_install nginx hook (replaces patch_indeedhub_nostr_provider; defensive since indeedhub:1.0.0 already bakes it). .228 GROUND TRUTH captured: 7 containers Up, volumes indeedhub-{postgres,redis,minio,relay}-data, network indeedhub-net; frontend nginx upstreams api:4000/minio:9000/relay:8080; image already bakes X-Frame strip + nostr-provider.js (6347B) + sub_filter. **NEXT = live verify on .228:** build+sideload binary, restart, package.install indeedhub β†’ expect adoption (NoOp, no data touch), then full lifecycle. Risk: service restart SIGKILL-cascade if Quadlet not fully shipped on .228. - `b94b61f6` `network_aliases` manifest field (ContainerConfig) + podman_client & quadlet rendering + DNS-label validation; also fixed 4 pre-existing from_manifest test failures (network_policy: archy-net invalid; bind sources outside /var/lib/archipelago). Enables indeedhub's short aliases on indeedhub-net. - `955c54b7` hook capability (#20) **phase 2** β€” `container::hooks::run_post_install` executor (podman exec + copy_from_host w/ allowlist canonicalise + symlink-escape prefix check; best-effort/idempotent) wired into `install_fresh` after container is up (fresh-container-only). 5 unit tests; `cargo test -p archipelago` green. - `4c1a4e59` hook capability (#20) **phase 1** β€” `LifecycleHooks`/`HookStep`/`HostCopy` schema + validate() + re-exports + 3 schema tests; also fixed 3 pre-existing `ContainerConfig` test literals missing `generated_secrets` (container crate now compiles; `cargo test -p archipelago-container` green, 53 pass). - `f0c6b79d` immich containers named underscore (immich_server/_postgres/_redis) to match runtime lifecycle code β€” fixes package.stop/start/restart. **immich fully migrated + verified on .228** (manifest-driven stack via orchestrator). - `b0b54a96` immich lifecycle bats suite (tests/lifecycle/bats/immich.bats). - `d5ef4573`/`9e6c5370`/`011081d1` immich migration (renameβ†’immich, orchestrator-first). - `f160e0c4` podman-restart.service enabled at startup (reboot-survival). - `0860dfac` Services-tab UI (backendsβ†’Services, parent icons, categories sub-nav, swipe). - `220666d3`/`7bfbe8fe` registry-manifest infra phases 1+2 (consume + EMBED_MANIFESTS publish). - `192238cb` docs consolidation 56β†’28 + CLAUDE.md. - `03a4ee1b` generated-secrets system + companion/quadlet fixes. **DONE β€” hook capability (#20), phases 1+2 (schema + executor + wiring):** controlled post-install hooks so indeedhub/netbird can migrate. Design: `docs/manifest-hooks-design.md`. Schema, validate(), executor, and install-path wiring all landed + green (commits `4c1a4e59`/`955c54b7` above). Remaining #20 phases: 3 = indeedhub migration (NEXT, below); 4 = netbird; 5 = `pre_start` hooks (type exists, NOT yet executed β€” wire into `prepare_for_start` if/when needed). **NEXT β€” #20 phase 3, indeedhub migration:** author 7 member manifests (postgres/redis/minio/relay/api/ffmpeg + frontend) on archy-net with container-name hostnames; frontend carries the `post_install` hook (strip X-Frame-Options, copy nostr-provider.js, inject script, nginx reload β€” see `patch_indeedhub_nostr_provider` in install.rs:68 for exact ops); wire `install_indeedhub_stack` orchestrator-first; generated_secrets: indeedhub-db-password/indeedhub-jwt/indeedhub-minio-password (reuse live values); preserve hardcoded AES_MASTER_SECRET literal + minio user "indeeadmin". Then netbird (assess its setup steps). Then single-container legacy apps (add to `uses_orchestrator_install_flow` allowlist in install.rs + verify each). Then the lifecycle gate (#6) β€” needs harness hardening (#18) + .228 bitcoin synced. **Test/deploy facts:** .228 = archi resilience node, UI/RPC pw `password123` (https), SSH pw `archipelago`. Lifecycle harness runs from .116: `cd tests/lifecycle && ARCHY_HOST=192.168.1.228 ARCHY_SCHEME=https ARCHY_PASSWORD=password123 ARCHY_ALLOW_DESTRUCTIVE=1 ./run.sh `. RPC trigger: auth.login (sets session + csrf cookies) β†’ send csrf cookie value as `X-CSRF-Token` header. package.install needs `{"id":"","dockerImage":""}` (dockerImage required even for stacks). Rust workspace root = `core/`. Linker `undefined hidden symbol` β†’ rebuild with `CARGO_INCREMENTAL=0`. immich on .228: app_id `immich`, containers immich_server/immich_postgres/immich_redis, data dir owner 100998:100998. ## 9. Documentation map (what survives) This master plan is the hub. Authoritative standalone docs (linked above), kept: - **Design:** `architecture.md`, `app-developer-guide.md`, `APP-PACKAGING-MIGRATION-PLAN.md`, `registry-manifest-design.md`, `marketplace-protocol.md`, `dht-distribution-design.md`, `multi-node-architecture.md`, `rust-orchestrator-migration.md`, `bulletproof-containers.md`, `three-mode-ui-design.md`, `dual-ecash-design.md`, `meshroller-integration-design.md`, `phase4-streaming-ecash-plan.md`, `adr/*`. - **Reference:** `app-manifest-spec.md`, `api-reference.md`, `developer-guide.md`, `operations-runbook.md`, `troubleshooting.md`, `user-walkthrough.md`, `bitcoin-rpc-relay.md`, `security-code-audit-2026-03.md`, `GAMEPAD-NAV.md`, `SEED-VERIFICATION.md`, `hotfix-process.md`, `app-registry-status-2026-06-21.md`. All dated handoffs/resumes/transcripts/superseded trackers were consolidated here and removed (recoverable via git) on 2026-06-21.