# 🚩 PRODUCTION MASTER PLAN — Archipelago App Platform & Registry

> **THIS IS THE AUTHORITATIVE PLAN. Agents: read this first and keep it open until
> the production test gate (§5) is green.** It overrides ad-hoc direction and
> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
> the priority banner and demote this doc.
>
> Last updated: 2026-06-22 · Binary: v1.7.99-alpha · See §8b for the live resume.

---

## 1. The North Star

Make Archipelago a **world-class, developer-ready app platform** where:

1. **Every app is manifest-driven** — install/run/update/uninstall needs only the
   app's manifest (+ catalog entry). **Zero OS-level code reliance**: no per-app
   Rust installers, no `sudo mkdir/chown`, no host provisioning.
2. **Manifests are distributed via the (signed) registry**, not baked into the
   binary OTA as disk files. Bumping/adding an app = a signed catalog change.
3. **Third-party developers can build and ship apps via an external registry** —
   a decentralized marketplace (DID-signed manifests, Nostr discovery, reputation),
   not a gatekept central store. `archy app validate/render/install/test` tooling.
4. The platform stays **rootless, secure-by-default, elegant, robust, and
   100%-uptime-capable** (reboot-survivable, self-healing, no data loss on migrate).

**Definition of done:** the production test gate (§5) is green for the app set on
real nodes. Until then, this plan is the priority.

## 2. Invariants (never violate)

- **Rootless Podman only.** No rootful, no Docker-socket mounts, no privileged
  containers unless explicitly approved. (ADR-001, ADR-009.)
- **No app-specific business logic in the Rust backend.** The orchestrator owns
  the lifecycle state machine; apps are declarative. Legacy `install_immich_stack`
  (hardcoded `podman run` + `sudo chown`) is the anti-pattern being deleted.
- **Secrets are manifest-declared** (`generated_secrets`, materialised by
  `container::secrets` 0600/rootless, idempotent + self-healing) — never hardcoded,
  per-app, or logged. Replaces the deleted `ensure_fmcd_password`.
- **Migrations never destroy data.** Preserve `/var/lib/archipelago/<app>`,
  generated secrets, displayed credentials, public ports, and adoption container
  names. Always provide a rollback path. Stop/recreate only when necessary.
- **Verify on a real node (.228, then .198) before any tag.**

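
The secrets invariant above (mode 0600, idempotent, self-healing) can be sketched roughly as follows. This is a minimal illustration and not the actual `container::secrets` code: the function name `ensure_secret`, the hex encoding, and reading entropy from `/dev/urandom` are all assumptions made for the example.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper (NOT the real `container::secrets` API): return the
/// secret stored at `path`, generating it on first use.
pub fn ensure_secret(path: &Path, len_bytes: usize) -> io::Result<String> {
    if path.exists() {
        // Self-heal: tighten permissions if something loosened them.
        let mode = fs::metadata(path)?.permissions().mode() & 0o777;
        if mode != 0o600 {
            fs::set_permissions(path, fs::Permissions::from_mode(0o600))?;
        }
        let existing = fs::read_to_string(path)?;
        if !existing.trim().is_empty() {
            // Idempotent: never rotate a secret that already exists.
            return Ok(existing.trim().to_string());
        }
    }

    // Assumption: /dev/urandom is an acceptable entropy source for this sketch.
    let mut raw = vec![0u8; len_bytes];
    File::open("/dev/urandom")?.read_exact(&mut raw)?;
    let secret: String = raw.iter().map(|b| format!("{b:02x}")).collect();

    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600) // only takes effect when the file is newly created
        .open(path)?;
    file.write_all(secret.as_bytes())?;
    Ok(secret)
}
```

Calling it twice on the same path returns the same value, which is the property that lets an adopted app keep its live secret instead of having it rotated underneath it.
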
## 3. Current state (2026-06-21)

- **~40 apps are manifest-based and Quadlet-migrated** (survive
  `archipelago.service` restart + reboot). Exhaustive per-app table:
  `docs/app-registry-status-2026-06-21.md`.
- **Legacy holdout: immich** — the one app with **no manifest** and a hardcoded
  Rust stack installer (in-cgroup, not Quadlet). 3 containers, healthy, live data.
  The migration proof case.
- **Manifests still travel by OTA disk rsync** (`apps/ → /opt/archipelago/apps`).
  The signed catalog (`app-catalog.json`) currently distributes **only image
  overrides** — not full manifests. Gap closed by workstream B.
- **The 4 companions** (`archy-bitcoin-ui`, `-lnd-ui`, `-electrs-ui`,
  `-fedimint-ui`) build from `docker/<name>` contexts via `companion.rs`, not the
  manifest registry — a later phase folds them in.
- **No app has passed the formal production gate (5× for now, was 20×).** That is the blocker.

## 4. Workstreams (each links its authoritative detail doc)

| # | Workstream | Detail doc | Status |
|---|-----------|-----------|--------|
| A | **Manifest-driven app platform** — packaging contract, single/multi-container runtime, routing, controlled hooks, dev tooling (6 phases, security model, migration rules) | `APP-PACKAGING-MIGRATION-PLAN.md` | mostly done; immich + multi-container polish remain |
| B | **Registry-distributed manifests** — catalog carries full signed manifest; orchestrator installs from registry; disk = migration fallback | `registry-manifest-design.md` | **phases 1+2 done** (node consume + opt-in publisher embed); not yet flipped on for the fleet |
| C | **Developer-ready external registry** — 3rd-party DID-signed manifests, decentralized Nostr discovery (NIP-78 kind 30078) + trust score, `archy app …` tooling | `marketplace-protocol.md`, `app-developer-guide.md` | design exists; tooling + trust UX pending |
| D | **Distribution backbone** — signed catalog, BLAKE3 content-addressing, iroh swarm (origin-always-wins) | `dht-distribution-design.md` | phases 0–2 code-complete (worktree) |
| E | **Production test gate** — 5× lifecycle on .228 + .198 (for now; was 20×), per-app L1/L2 matrix | `tests/lifecycle/TESTING.md`, `bulletproof-containers.md` | **never green — exit criterion** |

**Orchestrator architecture** (foundation for A/B): `rust-orchestrator-migration.md`
(ProdContainerOrchestrator, BootReconciler 30s level-triggered reconcile, adoption
scan, Quadlet rendering) and `bulletproof-containers.md` (the six container failure
modes FM1–FM6 + the desired-state-first reconciler that fixes them).

## 5. Production test gate (exit criterion)

An app is **production-ready** only when `tests/lifecycle/run-20x.sh` is green
across the full matrix — install / UI-reachable / stop / start / restart /
reinstall / **reboot-survive** / **archipelago-restart-survive** / uninstall —
**5× on .228 AND .198 for now** (`ARCHY_ITERATIONS=5`; temporarily reduced from
20× — restore to 20× before the final ship). All 8 gate checkboxes in `tests/lifecycle/TESTING.md`
are currently unchecked. Coverage today: L0 unit (631 ●), L1 RPC ● for 6 core apps,
L2 UI ● dashboard + proxies; L3 survival ◐; ~30 apps have zero automated coverage.

## 6. Immediate sequence (live workstream)

1. ✅ **B-phase 1** — `manifest` field on `AppCatalogEntry`; `load_manifests`
   catalog-wins merge; `manifest_dir` kept (build-source catalog manifests skipped
   in phase 1); unit tests. *(commit 220666d3)*
2. ✅ **B-phase 2** — `EMBED_MANIFESTS` publisher generator + round-trip guard.
   *(7bfbe8fe; signing via existing ceremony — not yet flipped on for the fleet.)*
3. ✅ **C immich proof** — immich is a manifest-driven stack
   (immich + immich-postgres + immich-redis) installed via
   `install_stack_via_orchestrator`; legacy installer is now fallback-only.
   Live-migrated + verified on .228. Found+fixed: container_name
   duplicate-on-shared-PGDATA, version-digit validation, partial-fallback hardening,
   data_uid 100998. Canonical app_id `immich` (title+icon). *(9e6c5370, d5ef4573)*
4. ✅ **Reboot-survival** — podman-restart.service enabled (startup, fleet-wide)
   for the podman-`--restart` path. *(f160e0c4)*
5. ◻ **Verify on .198** (immich migration validated on .228 only so far).
6. ◻ **E** — run the 5× gate (`ARCHY_ITERATIONS=5`, was 20×); fix until green.
7. ◻ Demote this banner.

**Not yet done / deliberate follow-ups:** flip `EMBED_MANIFESTS` on for the
published catalog (then sign) to actually distribute manifests via the registry;
Phase-3 `use_quadlet_backends` rollout so orchestrator backends are Quadlet (not
just podman-`--restart`); immich on .198.

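
The "catalog-wins merge" from B-phase 1 can be pictured with a small sketch: load the on-disk manifests first, then let any manifest carried by the signed catalog replace the disk copy for the same app id. The `Manifest` struct, the `Source` enum, and `merge_manifests` below are hypothetical names for illustration; the real `load_manifests` is richer (for example, it skips build-source catalog manifests in phase 1, which is not modelled here).

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed manifest; the real type has many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub app_id: String,
    pub version: String,
}

/// Where the winning manifest for an app came from.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Source {
    Catalog,
    Disk,
}

/// Merge manifests so a catalog entry always overrides the on-disk copy
/// of the same app id; disk-only apps are kept as a migration fallback.
pub fn merge_manifests(
    disk: Vec<Manifest>,
    catalog: Vec<Manifest>,
) -> BTreeMap<String, (Manifest, Source)> {
    let mut merged = BTreeMap::new();
    for m in disk {
        merged.insert(m.app_id.clone(), (m, Source::Disk));
    }
    // Inserted second, so catalog entries replace disk entries on key collision.
    for m in catalog {
        merged.insert(m.app_id.clone(), (m, Source::Catalog));
    }
    merged
}
```

Tracking the source alongside the manifest is a design choice of the sketch: it makes it easy to report which apps would still depend on the OTA disk copy once `EMBED_MANIFESTS` is switched on.
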
## 7. Release blockers & operational gotchas (durable)

Carried forward from prior handoffs (deduped against persistent memory):

- **Rootless control-plane responsiveness** — slow `podman ps`/store cleanup at
  startup must not surface a false "no apps installed" UI. **My Apps must preserve
  last-known apps during scanner backoff**, never show empty during a transient.
- **Reboot survival** — gate on ≥3 (prefer 5) consecutive clean post-reboot
  lifecycle passes. Quadlet units under `user.slice` survive `archipelago.service`
  restart; legacy in-cgroup containers get SIGKILLed and reconciled back.
- **Startup patterns** — wait on a socket/health, never `sleep`. Tailscale waits
  for its socket; Fedimint Guardian waits for Bitcoin RPC `initialblockdownload:false`
  before launching fedimintd (proxy/wait companion on :8175 during IBD).
- **Bitcoin must run full** (`txindex=1`, non-pruned) for ElectrumX/mempool.
- **Adoption** — match existing containers by name and adopt without recreate;
  record a migration version in app state; preserve Nostr signer bridges
  (IndeeHub needs `/nostr-provider.js` served, not just port reachability).
- **Image presence** — use bounded targeted `podman image inspect`, not
  `podman image exists` (avoids store-walk stalls).
- **Companion rebuilds** — `companion.rs` must rebuild `:latest` when the build
  context changes (staleness check), else baked-in fixes (e.g. guardian CSS) never
  reach nodes. `:local` is a manual override, never auto-rebuilt.

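
As a rough illustration of the "wait on a socket/health, never `sleep`" rule, the sketch below polls a TCP endpoint until it accepts a connection or a deadline passes. The name `wait_for_tcp` and the 250 ms / 100 ms intervals are assumptions for the example, not values taken from the codebase. The short pause between probes is only a poll interval; the point is that readiness, not a fixed delay, decides when startup continues.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Poll `addr` until a TCP connect succeeds or `timeout` elapses.
/// Returns `true` as soon as the endpoint is reachable.
pub fn wait_for_tcp(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Each attempt is itself bounded so a black-holed port cannot stall the caller.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(250)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Short poll interval between probes, not a blind startup delay.
        thread::sleep(Duration::from_millis(100));
    }
}
```

The same probe shape works as a cheap `tcp:` style health check, which is less likely to false-fail on a loaded node than an HTTP GET that has to be fully served.
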
## 8. Roadmap

**Pipeline:** Feature Testing (internal) → User Testing (controlled hardware) →
Beta Live (public). Hardening priorities feeding the gate:

- **P0** Container app reliability — bulletproof install/health/restart/uninstall
  across all apps, dependency chains, multi-container stacks.
- **P0** Networking stack first-install → reboot-proof (WireGuard/NetBird, Tor
  hidden services, LND Connect).
- **P1** LUKS2 full-partition encryption for `/var/lib/archipelago/`
  (AES-256-XTS, Argon2id, key from setup password + hardware salt).
- **P1** Meshtastic plug-and-play parity with MeshCore.

**Post-beta (deferred — do not start until gate is green):** P2P encrypted
voice/video (WebRTC over federation via Tor); watch-only wallet + mesh BTC
hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.md`);
Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
phases 2–6 (`dual-ecash-design.md`).

## 8b. SESSION STATE + RESUME (updated 2026-06-22) — READ THIS FIRST ON RESUME

### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified

Manifest-driven lifecycle hooks + the IndeedHub stack migration are **complete and
live-verified on BOTH .228 and .198** (adoption + fresh-create + post_install hook
exec, stable under load). 15 commits this session: `4c1a4e59`..`e2a012d0`. Working
tree clean. The release lifecycle gate is temporarily **5×** (was 20×; `ARCHY_ITERATIONS=5`).

**Shipped (all on `main`, newest first):**

- `e2a012d0` indeedhub frontend health → `tcp:7777` (was http GET `/`; the http check
  false-failed under load and the reconciler churned the frontend — fixed).
- `ff78b312` hook `exec` runs in a transient user scope
  (`systemd-run --user --scope --quiet --collect podman exec …`) — fixes
  "crun: write cgroup.procs: Permission denied" when exec'ing from archipelago.service.
- `ff8f11b8` indeedhub frontend caps `[CHOWN,DAC_OVERRIDE,SETGID,SETUID]` — nginx
  workers died "setgid(101) failed" under the orchestrator's `--cap-drop=ALL`.
- `b73084db` DELETED the legacy indeedhub orchestrator special-cases (−382 lines:
  reconcile_indeedhub_stack, start_indeedhub_backends, the 120s dependency-DNS gate,
  patch_indeedhub_nostr_provider, repair_indeedhub_network_aliases, INDEEDHUB_* consts)
  → "indeedhub" now uses the GENERIC install_fresh/reconcile path.
- `b1eea8c0` 7 indeedhub manifests
  (`apps/indeedhub{,-postgres,-redis,-minio,-relay,-api,-ffmpeg}`) +
  `install_indeedhub_stack` orchestrator-first (immich pattern).
- `b94b61f6` `network_aliases` ContainerConfig field (podman_client + quadlet rendering,
  DNS-label validated) — lets the frontend nginx reach `api:4000`/`minio:9000`/`relay:8080`
  on the dedicated `indeedhub-net`.
- `955c54b7`/`4c1a4e59` #20 hooks phases 1-2: schema (LifecycleHooks/HookStep/HostCopy in
  archipelago-container::manifest) + executor `container::hooks::run_post_install`
  (allowlist-canonicalised copy_from_host + scoped exec), wired into `install_fresh`.
- `84031e62` gate 20×→5× (docs only: CLAUDE.md, this file, tests/lifecycle/TESTING.md).

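
The scoped hook `exec` from `ff78b312` boils down to prefixing `podman exec` with `systemd-run --user --scope --quiet --collect`, so the exec runs in a fresh transient scope rather than inside the service's own cgroup. A minimal sketch of building that command line follows; `scoped_podman_exec` is a hypothetical helper name, and the real executor in `container::hooks` may assemble its arguments differently.

```rust
use std::process::Command;

/// Build (but do not run) a `podman exec` wrapped in a transient user scope.
/// Hypothetical helper: the flag set mirrors the command quoted in the notes above.
pub fn scoped_podman_exec(container: &str, argv: &[&str]) -> Command {
    let mut cmd = Command::new("systemd-run");
    cmd.args([
        "--user", "--scope", "--quiet", "--collect",
        "podman", "exec", container,
    ]);
    // The hook step's own command and arguments follow the container name.
    cmd.args(argv);
    cmd
}
```

A caller would then invoke `.status()` or `.output()` on the returned `Command`; for instance `scoped_podman_exec("indeedhub", &["nginx", "-s", "reload"])` would correspond to an nginx reload step, assuming that is how the hook step is expressed.
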
**Design = adoption-safe + manifest-driven.** Manifests reproduce the live install exactly
so existing nodes ADOPT (NoOp) instead of recreate: hyphen container_names the runtime
already references, named volumes `indeedhub-{postgres,redis,minio,relay}-data`,
`indeedhub-net` + network_aliases [postgres|redis|minio|relay|api], generated_secrets reuse
the live /var/lib/archipelago/secrets values (ensure_one no-ops on existing; postgres pw is
fixed at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept. The
frontend image indeedhub:1.0.0 already bakes the iframe nginx (X-Frame omit +
nostr-provider.js + sub_filter), so the post_install hook (sed X-Frame / copy
nostr-provider.js / inject / nginx reload) is defensive/idempotent. crash_recovery.rs's
frontend-after-deps ordering guard is KEPT on purpose (beneficial; not a blocker).

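
The `network_aliases` field is described as DNS-label validated. The exact rule used in the codebase is not shown here, so the sketch below assumes the usual RFC 1123 label shape (1 to 63 ASCII letters, digits, or hyphens, with no leading or trailing hyphen); `is_dns_label` and `invalid_aliases` are hypothetical names.

```rust
/// Assumed rule: RFC 1123 style label, 1..=63 chars of [A-Za-z0-9-],
/// not starting or ending with a hyphen.
pub fn is_dns_label(s: &str) -> bool {
    let bytes = s.as_bytes();
    if bytes.is_empty() || bytes.len() > 63 {
        return false;
    }
    let first_ok = bytes[0].is_ascii_alphanumeric();
    let last_ok = bytes[bytes.len() - 1].is_ascii_alphanumeric();
    first_ok
        && last_ok
        && bytes.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'-')
}

/// Return every alias that fails validation, so a manifest error can list them all.
pub fn invalid_aliases<'a>(aliases: &[&'a str]) -> Vec<&'a str> {
    aliases.iter().copied().filter(|a| !is_dns_label(a)).collect()
}
```

Under this assumed rule, the aliases named above (`postgres`, `redis`, `minio`, `relay`, `api`) all pass, while something like `api_1` would be rejected before it ever reaches the rendered unit.
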
### ⚠️ GATE FINDING 2026-06-22 — `package.stop` non-propagation (mostly self-inflicted on .228; verify on .198)

Step 1 (sync .228 tcp-health manifest) is **DONE + verified** (frontend adopted, UI 200, no
churn). Step 2 (the 5× gate) surfaced a `package.stop` failure — **but the headline cause turned
out to be MY cascade-gate contaminating .228**, not a fundamental product gap. Severity downgraded
from ⛔ to ⚠️ after the .198 ground-truth check (below). A real robustness sub-bug remains.

**Symptom.** On (post-contamination) .228, `package.stop electrumx` returns `{"status":"stopping"}`
but the container **never stops** — `container-list` still shows `running` after 66s+. The gate's
`wait_for_container_status electrumx stopped 60` times out. The same failure hit
bitcoin-knots/btcpay/fedimint/immich. **Contrast:** `filebrowser` stops correctly
(`running → stopped` in ~6s).

**.198 GROUND TRUTH (decisive, checked 2026-06-22):** .198 (untouched today) **HAS quadlet
`.container` files for backend apps** — `bitcoin-knots.container`, `btcpay-server.container`,
`fedimint.container`, `filebrowser.container`, `indeedhub.container`, `gitea.container`,
`grafana.container`, `botfights.container`, `archy-{btcpay-db,nbxplorer}.container`, the
`fedimint*`/`indeedhub*` members, etc. **⇒ Quadlet IS the intended backend runtime.** .228 instead
has NONE of these (only the 4 UI companions + home-assistant; `bitcoin-core.container` is
`.disabled-20260506`). **So .228's plain-podman state is contamination:** my cascade-destructive
gate UNINSTALLED its apps (removing the `.container` files) and my `package.start` restore brought
them back as plain `podman run --restart=unless-stopped` **without regenerating the quadlet units**.
`podman inspect electrumx` on .228 → `PODMAN_SYSTEMD_UNIT` EMPTY; `systemctl --user stop
electrumx.service` → `Unit not loaded (rc=5)`. (NB: electrumx specifically shows no `PODMAN_SYSTEMD_UNIT`
on .198 too — confirm whether electrumx has its own `.container` on .198; the listing was truncated.)

**Two real sub-bugs remain (independent of the contamination):**

1. **`package.start`/restore recreates a container as plain podman when its quadlet unit is missing**
   instead of regenerating the `.container` unit — leaving it un-stoppable via systemctl. It should
   reconcile the quadlet unit, not silently fall back to bare podman.
2. **`prod_orchestrator::stop()` podman-fallback doesn't fire for electrumx-class apps.** Stop path
   (prod_orchestrator.rs:2890): `loaded(app_id)?` → `quadlet::stop_service` (fail-soft) →
   `runtime.stop_container` (podman). `compute_container_name(electrumx)` → bare `"electrumx"`
   (correct target). filebrowser reaches the fallback and stops; electrumx does NOT ⇒ suspect
   `loaded("electrumx")` erroring before the fallback AND the error not classed as
   `is_unknown_app_id_error` (so `do_orchestrator_package_stop` never reaches `do_package_stop`).
   Confirm by promoting the best-effort `install_log("STOP …")`/`STOP FAIL` to `tracing::error!`
   (it was empty in .228's install log) and reading `loaded()` + `is_unknown_app_id_error`.

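
To make the intended behaviour for sub-bug 2 concrete, here is a small sketch of a stop flow in which a failed manifest lookup or a missing quadlet unit can never prevent the podman fallback from running. Everything in it is hypothetical scaffolding (the `Runtime` trait, `stop_app`, `NoQuadletMock`); it is not the code at prod_orchestrator.rs:2890, and it simplifies by using the app id as the container name and by returning early when the unit stop succeeds.

```rust
use std::cell::RefCell;

#[derive(Debug, PartialEq)]
pub enum StopOutcome {
    StoppedViaQuadlet,
    StoppedViaPodman,
}

/// Hypothetical seam over the real runtime so the flow can be exercised with a mock.
pub trait Runtime {
    fn manifest_loaded(&self, app_id: &str) -> Result<(), String>;
    fn stop_unit(&self, unit: &str) -> Result<(), String>;
    fn stop_container(&self, name: &str) -> Result<(), String>;
}

pub fn stop_app<R: Runtime>(rt: &R, app_id: &str) -> Result<StopOutcome, String> {
    // A failed lookup is logged, not returned: it must not abort the stop.
    if let Err(e) = rt.manifest_loaded(app_id) {
        eprintln!("STOP {app_id}: manifest lookup failed ({e}); trying podman");
    } else if rt.stop_unit(&format!("{app_id}.service")).is_ok() {
        return Ok(StopOutcome::StoppedViaQuadlet);
    }
    // Fallback always runs when the quadlet path did not stop the app.
    rt.stop_container(app_id)
        .map(|_| StopOutcome::StoppedViaPodman)
        .map_err(|e| format!("STOP FAIL {app_id}: {e}"))
}

/// Mock for the electrumx-style case: no manifest loaded, no quadlet unit.
pub struct NoQuadletMock {
    pub podman_stops: RefCell<Vec<String>>,
}

impl Runtime for NoQuadletMock {
    fn manifest_loaded(&self, _app_id: &str) -> Result<(), String> {
        Err("manifest not loaded".into())
    }
    fn stop_unit(&self, _unit: &str) -> Result<(), String> {
        Err("Unit not loaded (rc=5)".into())
    }
    fn stop_container(&self, name: &str) -> Result<(), String> {
        self.podman_stops.borrow_mut().push(name.to_string());
        Ok(())
    }
}
```

With the mock, `stop_app(&mock, "electrumx")` ends in `StoppedViaPodman` and records exactly one podman stop, which is the assertion a mock-orchestrator regression test for this case would likely want.
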
**Correction to the status doc:** the "Quadlet-everywhere ~96%" survey may have mis-read the signal
*on contaminated nodes*; .198 genuinely is quadlet, so re-survey from `.container` file presence +
`PODMAN_SYSTEMD_UNIT`, not from "container running".

- `prod_orchestrator::stop()` (prod_orchestrator.rs:2890) does: `self.loaded(app_id)?` →
  `quadlet::stop_service("{name}.service")` (fail-soft for non-quadlet) → `runtime.stop_container`
  (podman fallback). For electrumx, `compute_container_name` → bare `"electrumx"` (correct target).
  filebrowser hits the podman fallback and stops; electrumx does NOT ⇒ suspect `loaded("electrumx")`
  **erroring before the fallback** (manifest not loaded in orchestrator) AND the error not being
  classed as `is_unknown_app_id_error` (so `do_orchestrator_package_stop` never falls back to the
  plain-podman `do_package_stop`). **NEXT: confirm by capturing the `STOP:`/`STOP FAIL:` line**
  (the best-effort install-log was empty on .228 — promote it to a `tracing::error!` so the failure
  reason is visible in journalctl, or reproduce locally with the mock orchestrator) and inspecting
  `loaded()` + `is_unknown_app_id_error`.
- The **stop→stopped STATE reporting is correct** when the container actually stops: server.rs:1334
  keeps a `--rm`'d app visible as `Stopped` via the `user_stopped` guard (proven on filebrowser).
  So the bug is purely "container never stops", not "state not reported".

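
The re-survey suggested above (judge by `.container` file presence plus `PODMAN_SYSTEMD_UNIT`, not by "container running") can be reduced to a small classification. The sketch assumes the caller has already obtained the `PODMAN_SYSTEMD_UNIT` value from `podman inspect` and knows which directory holds the node's quadlet units; `RuntimeKind`, `classify`, and `unit_file_present` are hypothetical names.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
pub enum RuntimeKind {
    /// Unit file exists and the container reports a systemd unit.
    Quadlet,
    /// Neither signal is present: a bare `podman run` container.
    PlainPodman,
    /// Signals disagree (e.g. unit file present but container not started by it).
    Inconsistent,
}

/// `systemd_unit` is the `PODMAN_SYSTEMD_UNIT` value seen in `podman inspect`, if any.
pub fn classify(unit_file_present: bool, systemd_unit: Option<&str>) -> RuntimeKind {
    let has_unit = systemd_unit.map_or(false, |v| !v.trim().is_empty());
    match (unit_file_present, has_unit) {
        (true, true) => RuntimeKind::Quadlet,
        (false, false) => RuntimeKind::PlainPodman,
        _ => RuntimeKind::Inconsistent,
    }
}

/// Check for `<app_id>.container` in the given quadlet directory.
pub fn unit_file_present(quadlet_dir: &Path, app_id: &str) -> bool {
    quadlet_dir.join(format!("{app_id}.container")).is_file()
}
```

The `Inconsistent` bucket is the interesting one for a contaminated node: it separates "never was quadlet" from "half-restored", which a simple running/not-running survey cannot do.
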
**Quadlet-vs-podman question: RESOLVED.** Quadlet is intended (.198 has the `.container` files;
see ground-truth block above). No need to redesign — the work is (a) restore .228's quadlet units,
(b) fix the two robustness sub-bugs, (c) re-run the canonical gate on a clean node.

### MY-SESSION ERRATA (own it on resume)

- I ran the gate with `ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1`, which is **NOT** the canonical gate (that
  is `ARCHY_ALLOW_DESTRUCTIVE=1` only — stop/start/restart, no uninstall/reinstall; see run-20x.sh
  "Suggested release-gate invocation"). Cascade ran uninstall/reinstall on every app and, when I
  killed the run mid-iteration, left bitcoin-knots/electrumx/btcpay/fedimint/immich uninstalled or
  stranded. **I fully restored .228** (reinstalled bitcoin-knots with the correct image
  `146.59.87.168:3000/lfg2025/bitcoin-knots:latest`; started the rest; cleared a stale
  `user-stopped.json`). Verified healthy: UI 200, 35 containers, 17 apps `running`.
- Reinstall gotcha: `package.install` needs a REAL image ref in `dockerImage`; a bare app name
  → `Invalid Docker image format`.

### NEXT STEPS (in order)

1. ✅ **DONE — .198 ground truth:** quadlet is intended (.198 has the backend `.container` files).
2. **Run the CANONICAL gate on .198 FIRST** — it is the clean, properly-quadletized node (I did NOT
   touch it today). `ARCHY_HOST=192.168.1.198 ARCHY_SCHEME=https ARCHY_PASSWORD='ThisIsWeb54321@'
   ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5 tests/lifecycle/run-20x.sh`. NO cascade; never kill
   mid-iteration. This tells us whether the stop bug reproduces on a quadlet-correct node (→ real
   product bug) or was purely .228 contamination (→ just re-quadletize .228).
3. **Restore .228's quadlet units** — properly reinstall its backend apps so `.container` files
   regenerate (match .198). The cleanest route is the gate's own install path or a forced reconcile;
   verify `.container` files reappear + `PODMAN_SYSTEMD_UNIT` is set, then re-run the gate on .228.
4. **Fix the two robustness sub-bugs** (only if they reproduce on quadlet-correct nodes / as
   hardening): (a) `package.start` must regenerate a missing quadlet unit, not fall back to bare
   podman; (b) `prod_orchestrator::stop()` podman-fallback must fire when there's no quadlet unit
   (`loaded()` failure / non-`unknown_app_id` error must not abort the stop). Add a mock-orchestrator
   test reproducing electrumx-style "no quadlet unit" stop.
5. **netbird migration (#20 phase 4)** — same pattern; assess setup steps first (TLS cert gen,
   config files, resolver IP — may need host-file-write hooks beyond exec/copy_from_host; legacy is
   install_netbird_stack in stacks.rs).
6. Then single-container legacy apps onto the orchestrator install flow; then demote the banner.

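
For step 4(a), one way to picture the fix is as a start plan that inserts "render the unit, reload systemd" ahead of the start whenever the `.container` file is missing, instead of dropping to a bare `podman run`. The enum and function below are a hypothetical sketch of that decision only; how the real `package.start` renders units is not shown in this document.

```rust
#[derive(Debug, PartialEq)]
pub enum StartStep {
    /// Re-render `<app>.container` from the manifest.
    RenderQuadletUnit,
    /// `systemctl --user daemon-reload`, so the quadlet generator picks up the new file.
    DaemonReload,
    /// Start the generated `<app>.service`.
    StartUnit,
}

/// Plan a start that never silently falls back to plain podman:
/// a missing unit file is repaired first, then the unit is started.
pub fn plan_start(unit_file_present: bool) -> Vec<StartStep> {
    let mut steps = Vec::new();
    if !unit_file_present {
        steps.push(StartStep::RenderQuadletUnit);
        steps.push(StartStep::DaemonReload);
    }
    steps.push(StartStep::StartUnit);
    steps
}
```

Keeping the plan as data rather than executing it inline makes the "missing unit" branch easy to assert in a unit test without touching systemd.
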
### KNOWN ISSUES / WATCH-OUTS

- **.198 is a weak/loaded node** (load avg ~3–5). The generic reconcile recreates
  containers it deems unhealthy; under load, false-failing health checks → churn. The
  tcp-health fix (`e2a012d0`) mitigated the frontend case. If the lifecycle gate churns on
  .198, look for other apps whose http health checks false-fail under load → prefer tcp.
- **Many concurrent SSH sessions to .198 wedge its sshd** (MaxStartups) — it pings but SSH
  hangs for minutes. Use ONE ssh at a time to .198; `pkill -f 192.168.1.198` to clear strays.
- Hook `exec` only works in the scoped form (committed). `copy_from_host` is direct `cp`.

### DEPLOY / VERIFY FACTS (both nodes, ISO Debian, glibc 2.41 — binary built on .116 runs on both)

- **Build:** `cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`
  (~12 min, opt-level=3). Binary at `core/target/release/archipelago`. Linker
  "undefined hidden symbol" → rebuild with CARGO_INCREMENTAL=0. `archipelago` is a
  bin-only crate (no lib). Filtered tests: `cargo test -p archipelago --bin archipelago -- hooks quadlet`.
- **Sideload:** `scp binary $H:/tmp/archipelago-new` → `sudo systemctl stop archipelago;
  sudo cp /tmp/archipelago-new /usr/local/bin/archipelago; sudo chmod +x …; sudo systemctl
  start archipelago`. Containers SURVIVE the restart (--restart unless-stopped +
  podman-restart.service). Binary path is /usr/local/bin/archipelago.
- **Manifests** live at /opt/archipelago/apps/<app_id>/manifest.yml (root-owned ok). The
  orchestrator CACHES them at startup → **edit on disk then RESTART archipelago to reload**.
  Bulk deploy: `tar czf t.tgz -C apps indeedhub indeedhub-postgres indeedhub-redis
  indeedhub-minio indeedhub-relay indeedhub-api indeedhub-ffmpeg`; scp; `sudo tar xzf t.tgz
  -C /opt/archipelago/apps`.
- **Nodes:** .228 = 192.168.1.228, SSH pw `archipelago`, RPC/UI pw `password123` (https).
  .198 = 192.168.1.198, SSH pw `archipelago`, **RPC/UI pw `ThisIsWeb54321@`** (https). Both
  have the 7-container indeedhub stack + secrets + named volumes pre-existing.
- **Trigger install via RPC:** `auth.login` (sets session+csrf cookies) → send the csrf
  cookie value as `X-CSRF-Token` header → `package.install` with params
  `{"id":"indeedhub","dockerImage":"<any>"}` (dockerImage required even for stacks; install
  is async → returns `{"status":"installing"}`). Install logs go to
  /var/log/archipelago/container-installs.log (best-effort) AND journalctl -u archipelago.
- **Fresh-create test recipe:** `podman rm -f indeedhub` (stateless frontend) → package.install
  indeedhub → expect install_fresh + post_install hook (all 4 steps `ok`) + UI 200 on :7778
  (`/`, `/nostr-provider.js`, `/api/`). On adoption the frontend is NoOp (hook does NOT run —
  install_fresh is the only hook trigger).

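
The RPC trigger above hinges on echoing the csrf cookie back as an `X-CSRF-Token` header. A sketch of the two string-handling pieces follows: pulling a named cookie out of raw `Set-Cookie` header values, and building the `package.install` params. The cookie name is not given in these notes, so `csrf_token` in the usage note is an assumed placeholder; the RPC envelope and HTTP transport are deliberately left out, and `install_params` performs no JSON escaping (it assumes inputs without quotes or backslashes).

```rust
/// Find cookie `name` among raw `Set-Cookie` header values and return its value.
pub fn cookie_value<'a>(set_cookie_headers: &[&'a str], name: &str) -> Option<&'a str> {
    set_cookie_headers.iter().find_map(|header| {
        // Only the first `;`-separated pair is the cookie; the rest are attributes.
        let pair = header.split(';').next()?.trim();
        let (key, value) = pair.split_once('=')?;
        (key.trim() == name).then(|| value.trim())
    })
}

/// Build the `package.install` params shown in the notes above.
/// Assumption: neither argument contains characters that need JSON escaping.
pub fn install_params(app_id: &str, docker_image: &str) -> String {
    format!(r#"{{"id":"{app_id}","dockerImage":"{docker_image}"}}"#)
}
```

A client would call `cookie_value(&headers, "csrf_token")` on the `auth.login` response and send the result as `X-CSRF-Token` on the `package.install` request, substituting whatever cookie name the server actually sets.
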
## 9. Documentation map (what survives)

This master plan is the hub. Authoritative standalone docs (linked above), kept:

- **Design:** `architecture.md`, `app-developer-guide.md`,
  `APP-PACKAGING-MIGRATION-PLAN.md`, `registry-manifest-design.md`,
  `marketplace-protocol.md`, `dht-distribution-design.md`,
  `multi-node-architecture.md`, `rust-orchestrator-migration.md`,
  `bulletproof-containers.md`, `three-mode-ui-design.md`, `dual-ecash-design.md`,
  `meshroller-integration-design.md`, `phase4-streaming-ecash-plan.md`, `adr/*`.
- **Reference:** `app-manifest-spec.md`, `api-reference.md`, `developer-guide.md`,
  `operations-runbook.md`, `troubleshooting.md`, `user-walkthrough.md`,
  `bitcoin-rpc-relay.md`, `security-code-audit-2026-03.md`, `GAMEPAD-NAV.md`,
  `SEED-VERIFICATION.md`, `hotfix-process.md`, `app-registry-status-2026-06-21.md`.

All dated handoffs/resumes/transcripts/superseded trackers were consolidated here
and removed (recoverable via git) on 2026-06-21.