lnd's RPC isn't ready until its wallet auto-unlocks on (re)start, which lags the
container 'running' state — single-shot lncli getinfo raced that window and
false-failed (gate tests 60 + 85). Retry up to ~90s like a health probe. lnd is
functional (getinfo returns cleanly once ready).
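A minimal sketch of the retry, in Python for brevity (the gate itself is a bats suite). `wait_until_ready`, its parameters, the 3s poll interval and the `podman exec lnd lncli getinfo` invocation are assumptions for illustration; only the ~90s budget and `lncli getinfo` come from the message above.

```python
import subprocess
import time


def wait_until_ready(probe, timeout_s=90.0, interval_s=3.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def lncli_getinfo_ok(container="lnd"):
    """One readiness probe. The container name and argv are assumptions."""
    try:
        result = subprocess.run(
            ["podman", "exec", container, "lncli", "getinfo"],
            capture_output=True, timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

A single failed probe inside the window is treated as "not ready yet", not as a test failure; only exhausting the whole budget fails.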
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Mempool and IndeeHub load their real site directly in the iframe (reverted the
proxy/new-tab — per request "use https://indee.tx1138.com/").
- Real app UIs now served as whole static dirs under /app/<id>/ (express.static)
so their bundled assets (qrcode.js, css, bg images) resolve; /app/<id>/assets/*
redirect to the frontend's shared assets. Fixes the console 404 cascade.
- Bitcoin Core/Knots: register rpc/v1 + bitcoin-rpc on their paths (relay-status
no longer 404s); per-impl bitcoin-status preserved.
- AIUI chat returns a fixed line in demo ("Not available in demo, check out the
previous chats to experience AIUI") instead of calling Claude — no key spend.
- Add /api/app-catalog (serves the baked catalog) to stop that 404.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- App UIs now use the real registry shells with dummy data: bitcoin-ui for
Bitcoin Core (Satoshi subversion) and Bitcoin Knots (Knots subversion) via
per-path /app/bitcoin-{core,knots}/bitcoin-status; the real lnd-ui (mock
/proxy/lnd/v1/getinfo+channels, /lnd-connect-info, /api/container/logs); the
static fedimint-ui. ElectrumX already on the real electrs-ui. Custom mock UIs
dropped — accurate UX.
- IndeeHub loads in the iframe: nginx reverse-proxies /app/indeedhub/ →
indee.tx1138.com and strips X-Frame-Options/CSP (it blocked framing before).
- Mempool opens in a new tab (mempool.space can't be iframed).
- Cloud media playback: HTTP Range support in the curated-file server so audio/
video can stream and seek (needs real files dropped into demo/files/).
- Dockerfile/.dockerignore copy docker/lnd-ui + docker/fedimint-ui.
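The Range handling from the cloud media playback item can be sketched as follows. This is a Python illustration, not the curated-file server's code: `parse_range` and `range_headers` are hypothetical names, and only single-range `bytes=` requests are covered.

```python
def parse_range(header, size):
    """Return (start, end), inclusive, for a single-range 'bytes=' header.

    Returns None if the header is absent, malformed, or unsatisfiable.
    """
    if not header or not header.startswith("bytes=") or size <= 0:
        return None
    spec = header[len("bytes="):].strip()
    if "," in spec or "-" not in spec:
        return None  # multi-range requests are out of scope for this sketch
    first, _, last = spec.partition("-")
    try:
        if first == "":
            # Suffix form: "bytes=-500" means the final 500 bytes.
            length = int(last)
            if length <= 0:
                return None
            return max(size - length, 0), size - 1
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start < 0 or start >= size or end < start:
        return None
    return start, min(end, size - 1)


def range_headers(start, end, size):
    """Headers for a 206 Partial Content response."""
    return {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
```

Seeking in a media element issues open-ended requests such as `bytes=500-`, which is why the open-ended and suffix forms both need handling.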
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- immich restart: bump wait 120s->240s. Restart = ordered stop+start of the 3-
container stack (postgres->redis->server w/ DB migrations), so it needs at least
as long as the start test (180s) — the old 120s was inconsistent and false-failed
on loaded nodes. immich does return to running.
- fedimint orphan check: the unanchored 'total' regex (^fedimint) counts the
legitimate fedimint-clientd (dual-ecash bridge), but the anchored 'known' regex
omitted it, so total > known and a false orphan was reported on every node running
fedimint-clientd. Add fedimint-clientd to known.
Both run as LOCAL podman/systemctl on the gate runner, so they test the runner node
(.116), not the RPC target — surfaced while driving the .228 gate green.
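The fedimint orphan miscount can be reproduced with two patterns shaped like the ones described above. The exact regexes in the gate are not shown in this message, so `KNOWN_BEFORE` and `KNOWN_AFTER` are assumptions; only `^fedimint` and the `fedimint-clientd` name are from the source.

```python
import re

TOTAL_RE = re.compile(r"^fedimint")                      # prefix match only
KNOWN_BEFORE = re.compile(r"^(fedimint)$")               # assumed old list
KNOWN_AFTER = re.compile(r"^(fedimint|fedimint-clientd)$")


def orphan_count(names, known_re):
    """Orphans = containers matching the prefix minus the explicitly known ones."""
    total = sum(1 for n in names if TOTAL_RE.search(n))
    known = sum(1 for n in names if known_re.search(n))
    return total - known
```

With `["fedimint", "fedimint-clientd"]` the old list yields one orphan and the new list yields none.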
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Mock app UIs (ElectrumX, LND, Fedimint, Bitcoin Core) + the "Not available"
notice now use the Archipelago black theme and show the app's My-Apps icon.
- Bitcoin Core gets its own UI (/app/bitcoin-core/) so it no longer shows Bitcoin
Knots branding; the Knots-branded bitcoin-ui shell is reserved for Bitcoin Knots.
- ElectrumX now serves the real electrs-ui shell (+ qrcode.js + a dummy
/electrs-status) with the correct ElectrumX icon; "Electrs" renamed to ElectrumX.
- My Apps: pre-install Bitcoin Knots again, drop ThunderHub, rename Electrs→ElectrumX.
- App store no longer shows "Checking…" forever in demo — non-demoable apps show
"No demo" immediately (skip the container-scan state).
- Relay endpoint no longer reveals a real domain (randomised host).
- Dockerfile/.dockerignore copy docker/electrs-ui into the backend image.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Independent companion loop (452f05d8) validated on .228: deleted archy-electrs-ui
recreates in ~10s (was stuck 100s+). Also: companion-survives bats does LOCAL
rm/systemctl --user, so running it from .116 via RPC tests .116's companions with
.116's binary, NOT the remote target — must run ON the target node. Explains the
'failed on both nodes' runs (both silently tested .116).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The companion-unit repair stage ran at the END of each boot-reconciler tick, after
reconcile_existing(). On a heavily loaded node that per-app pass takes >60-90s, so a
deleted/lost companion unit (electrs-ui, bitcoin-ui, …) wasn't repaired within any
reasonable window (gate test 31 'deleted unit recreated within one reconcile tick'
timed out at 90s on the 45-app .228 node). Detecting + rewriting a companion unit is
cheap, so spawn it as its own ~interval(30s) loop, independent of the slow app pass.
Handle is aborted when the main loop exits (shutdown uses notify_one, so a second
waiter would steal the wake permit). tick() is now app-reconcile only.
All 4 boot_reconciler cadence tests still green (companion_stage=false in tests).
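A rough Python/asyncio analogue of the split, for illustration only (the reconciler itself is Rust). All names and intervals other than the ~30s companion cadence are assumptions. The companion task is cancelled by the main loop instead of waiting on the shutdown signal itself, mirroring the abort described above.

```python
import asyncio


async def companion_loop(repair_companions, interval_s):
    """Cheap pass on its own cadence, independent of the slow app pass."""
    while True:
        repair_companions()
        await asyncio.sleep(interval_s)


async def run_reconciler(reconcile_apps, repair_companions, shutdown,
                         tick_s=60.0, companion_s=30.0):
    """Main loop does app reconcile only and owns the companion task."""
    companion = asyncio.create_task(
        companion_loop(repair_companions, companion_s))
    try:
        while not shutdown.is_set():
            await reconcile_apps()
            try:
                await asyncio.wait_for(shutdown.wait(), timeout=tick_s)
            except asyncio.TimeoutError:
                pass  # no shutdown yet, start the next tick
    finally:
        # Only the main loop waits on `shutdown`; the helper is cancelled here.
        companion.cancel()
        await asyncio.gather(companion, return_exceptions=True)
```

However long `reconcile_apps` takes, the companion pass keeps firing on its own interval.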
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- nginx-demo.conf + vite proxy now route every /app/<id>/ to the mock backend, so
the per-app mock UIs and the generic "Not available in the demo" notice render
(previously only /app/filebrowser was proxied → most apps 404'd).
- Mempool and IndeeHub now load in the in-app iframe (not a new tab).
- Add an LND Lightning mock UI (channels, balances, routing) with dummy data;
lnd/thunderhub are demoable. Notice page reworded to "Not available in the demo".
- Fix missing icons: Bitcoin Core → bitcoin-core.png, Mempool → mempool.webp.
- Pre-install only Bitcoin Core (drop duplicate Bitcoin Knots; still installable).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Last 2 .228 stragglers confirmed load/timing, not bugs: test 31 (companion recreate)
= contamination + ~108s reconcile cadence > 90s window; test 55 (immich restart) =
heavy stack restarts >120s under load but DOES return. Path to literally-green gate
is infra (bitcoin sync, re-quadletize .228) + minor test-window tuning. Optional
product improvement noted: independent ~30s companion-reconcile cadence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
companion::reconcile only recreates a deleted companion unit when its parent
backend is in manifest_ids. On contaminated .228, electrumx ran as plain podman
and was NOT a tracked manifest install (manifest on disk but unloaded), so the
reconciler never iterated it -> archy-electrs-ui companion orphaned. Proven:
package.install electrumx re-registered it + restored the companion. Self-heal
logic is sound; test 31 clears on re-quadletize. electrumx on .228 de-contaminated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The backend COPY of docker/bitcoin-ui failed in Portainer because .dockerignore
(* + whitelist) excluded it. Re-include docker/ then exclude its contents except
bitcoin-ui, so the build context contains the Bitcoin UI mock shell. demo/files is
already covered by !demo/.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- demo/files/<Folder>/<file> becomes the cloud's content for every visitor
(read-only; "private login" = git/repo access). Text inlined, binaries streamed
from disk; empty folder falls back to the built-in seeded set.
- Dockerfile.backend now copies docker/bitcoin-ui and demo/files into the image
(they live outside neode-ui/) — this also fixes the Bitcoin UI mock, which the
backend reads from /docker/bitcoin-ui and was previously absent in the container.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.228 104/110, .198 94/110 with the 3-fix binary. Every package.stop test passes on
healthy apps. 14 of .198's 16 failures trace to bitcoin in IBD (test 83: ~137k blocks
behind) cascading to lnd/btcpay/electrumx/mempool. 2 node-independent: companion
recreate (31, both nodes), fedimint orphan pollution (44). Path to green 5x gate is
now infra (sync bitcoin, re-quadletize .228) + minor (test 31), not lifecycle bugs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Claude proxy injects a system-prompt describing this node (version, signet
chain + height, wallet balances, installed apps, 5 FIPS peers / 12 trusted nodes)
into every demo chat request. The assistant answers local-node and Bitcoin
questions with the node's real-looking data automatically — no /seed needed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fips.status reports installed+active with 5 authenticated peers and an anchor
connection; list/add/remove/apply seed-anchors and reconnect/install all resolve
to working states so the FIPS Mesh + Seed Anchors cards light green in the demo.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Tx/explorer links open mempool.space/testnet/tx/<id>; the backend hydrates the
wallet's transactions with REAL recent testnet txids at startup (best-effort,
falls back to mock hashes offline). Mempool app + demo-external apps open in a
new tab; deep-link paths are carried through.
- Add the content.* paid-download handlers the buy flow needs (owned-list,
preview-peer, download-peer-{paid,invoice,onchain}, request-invoice,
invoice-status, request-onchain, onchain-status) — every path resolves to a
success state with testnet receive addresses / bolt11 invoices so visitors can
walk the full buy → unlock journey.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
App launching (DEMO):
- resolveAppUrl routes every app to its demo target: mock UIs for Bitcoin Core,
ElectrumX, Fedimint (served by the backend), IndeeHub → iframe indee.tx1138.com,
Mempool → mempool.space/testnet (new tab); all others → a generic "Demo preview"
notice page.
- Non-demoable apps show a disabled "No demo" install button (marketplace details,
app grid, featured apps).
Onboarding:
- Demo treats the visitor as fully set up so the onboarding WIZARD (seed/identity)
is never forced; the welcome intro still replays per day. Intro CTA goes straight
to login; wizard entry points + login restart-onboarding link hidden in demo.
Network:
- federation.list-nodes now returns 12 trusted/federated nodes (9 trusted, 3
observer); transport.peers already at 5.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stop failure was 3 real product bugs (grace / reconcile-resurrection /
container-list user-stopped state), all fixed (2dad64b2, 760a32bc, 6e49ce6f) +
deployed. electrumx lifecycle suite 10/10 green (66s). fedimint 'crash loop' was
probe-induced churn (stable when left alone). Validating breadth next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turn the mock backend + UI into a public, click-to-play demo deployable as a
Portainer stack, gated behind DEMO=1 (classic single-user mock unchanged when off).
Backend (neode-ui/mock-backend.js):
- Per-session state isolation via AsyncLocalStorage + Proxy: every visitor gets
an isolated, deep-cloned copy of mockData/walletState/userState/etc., keyed by
a demo_sid cookie. Per-session WebSocket fan-out, idle reaper, session cap.
- Real per-session file storage (upload/folder/rename/delete) with a 50MB quota,
replacing the no-op filebrowser handlers; adds the missing app.filebrowser-token RPC.
- Force simulation mode (never touch a host Docker/Podman socket).
- Testnet (signet) flavor; shared login password "entertoexit".
- Report the real app version suffixed with -demo.
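The per-session isolation idea, sketched in Python as an analogue. The backend uses Node's AsyncLocalStorage plus a Proxy; here `contextvars` plays the AsyncLocalStorage role and an accessor function stands in for the Proxy. `SEED_STATE`, `state`, `run_in_session` and `spend` are hypothetical.

```python
import contextvars
import copy

# Hypothetical seed data standing in for mockData/walletState/etc.
SEED_STATE = {"wallet": {"balance_sats": 250_000}, "installed": ["bitcoin-core"]}

_current_sid = contextvars.ContextVar("demo_sid")
_sessions = {}


def state():
    """Return the calling session's private copy, cloning the seed on first use."""
    sid = _current_sid.get()  # raises LookupError outside a session
    if sid not in _sessions:
        _sessions[sid] = copy.deepcopy(SEED_STATE)
    return _sessions[sid]


def run_in_session(sid, handler, *args):
    """Run handler with `sid` bound for the duration of the call."""
    def bound():
        _current_sid.set(sid)
        return handler(*args)
    return contextvars.copy_context().run(bound)


def spend(amount):
    """Example handler: mutates only the current session's wallet."""
    state()["wallet"]["balance_sats"] -= amount
    return state()["wallet"]["balance_sats"]
```

Handlers never receive the session id explicitly; they call `state()` and get the right copy, which is what lets existing single-user handlers stay unchanged.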
Frontend:
- VITE_DEMO build flag (useDemoIntro.ts): replay the intro once per calendar day
per browser; prefill + show the "entertoexit" login hint.
Deploy:
- docker-compose.demo.yml wired for DEMO, UI on :2100 (build-from-repo).
- demo-deploy/ thin stack (prebuilt :demo image refs + .env.example + README).
- .github/workflows/demo-images.yml builds/pushes archy-demo-{web,backend} images.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A user-stopped backend (electrumx, bitcoin, lnd, fedimint) kept reading 'running'
in container-list because its UI companion (electrs-ui, …) still serves the launch
port, and the state-refresh upgrades any reachable launch port to 'running'. The
gate's wait_for_container_status <app> stopped therefore never saw 'stopped'.
Fix: load the user_stopped marker in handle_container_list and force 'stopped' for
those apps before the launch-port refresh. The reconcile guard keeps the backend
down, so the marker is authoritative. package.start clears it first, so a started
app reports 'running' normally.
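The ordering matters more than the check itself, so here is a small Python sketch of it. `resolve_states` and its arguments are hypothetical; the real handler is `handle_container_list` in the Rust backend.

```python
def resolve_states(containers, user_stopped, port_reachable):
    """Resolve the reported state for each app.

    containers: {app_id: raw_state}
    user_stopped: set of app ids that have a user_stopped marker
    port_reachable: callable(app_id) -> bool
    """
    resolved = {}
    for app_id, raw in containers.items():
        if app_id in user_stopped:
            # Marker wins: a UI companion may still answer on the launch port.
            resolved[app_id] = "stopped"
        elif port_reachable(app_id):
            resolved[app_id] = "running"
        else:
            resolved[app_id] = raw
    return resolved
```

Checking the marker first means a reachable launch port can no longer upgrade a user-stopped backend back to 'running'.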
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
package.stop a dependency (e.g. electrumx, a mempool dep) and the reconciler
restarts it within ~8s: the reconcile filter's dependency_required override
re-includes a user-stopped app that an active app depends on, and the in-memory
disabled set is wiped on manifest reload — so ensure_running runs, the stopped
app's unreachable ports look like a fault, the host-port repair restarts it, and
package.stop never sticks (gate 'transitions to stopped' times out).
Fix: guard ensure_running_with_mode on the on-disk user_stopped marker (the single
choke point every reconcile flows through) → Left('user-stopped'). Explicit
install/start clear the marker first (added clear_user_stopped to orchestrator
install/start, symmetric with disabled.remove; start/restart RPC already cleared
it) so user actions are unaffected. The container itself already stopped correctly
— this stops the resurrection.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix deployed to .198+.228; vaultwarden stops clean (no regression). But validation
showed the gate failures have multiple causes: (2) fedimint crash-looping/unhealthy on
both nodes can't be stopped; (3) host-listener repair watchdog restarts
port-unreachable containers fighting stop; (4) gate waits for 'stopped' but apps end
'exited'/'absent' (Exited->Stopped conversion key mismatch); (5) grace vs 60s
gate-timeout (electrumx 300s); (6) .228 contamination. Documented + re-sequenced
NEXT STEPS (fedimint health is the new top blocker).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reproduced live on CLEAN .198: package.stop fedimint -> 'podman stop -t 30
timed out after 30s' -> stop fails -> state reverts to running. Real fleet-wide
bug (NOT .228 contamination). stop_timeout_secs() per-app grace (bitcoin 600/lnd
330/electrumx 300/fedimint 60) is used by legacy stop paths but NOT the
orchestrator path: ContainerRuntime::stop_container hardcodes API ?t=10 / CLI
-t 30, and PODMAN_CLI_DEFAULT_TIMEOUT=30s equals the -t grace, so the wrapper's await
times out just as podman SIGKILLs. Fix = thread per-app grace + widen wrapper deadline; owner picks
table-based vs manifest-driven stop_grace_secs. Re-escalated to blocker.
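A sketch of the table-based variant of the fix, in Python. The grace values are the ones listed above; `DEFAULT_GRACE_SECS`, `WRAPPER_MARGIN_SECS` and `stop_plan` are assumptions, not the orchestrator's names.

```python
# Per-app grace in seconds, as listed above.
STOP_GRACE_SECS = {"bitcoin": 600, "lnd": 330, "electrumx": 300, "fedimint": 60}
DEFAULT_GRACE_SECS = 30    # assumed fallback for apps not in the table
WRAPPER_MARGIN_SECS = 15   # assumed headroom for the CLI wrapper


def stop_plan(app_id, container):
    """Return (argv, wrapper_deadline_secs) for stopping one container."""
    grace = STOP_GRACE_SECS.get(app_id, DEFAULT_GRACE_SECS)
    argv = ["podman", "stop", "-t", str(grace), container]
    # The wrapper must outlive the grace period; if both are equal, the await
    # expires at the same moment podman escalates to SIGKILL.
    return argv, grace + WRAPPER_MARGIN_SECS
```

The key invariant is deadline > grace for every app, whichever source the grace value comes from.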
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
.198 ground truth: backend apps ARE quadlet (.container files present) -> quadlet
is the intended runtime. .228's plain-podman state traced to my cascade-gate
uninstall + package.start restore (no quadlet regen). Two real robustness sub-bugs
remain (start should regen quadlet; stop podman-fallback gap). Next: canonical
gate on CLEAN .198 first to tell real-bug from contamination.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
5x gate run surfaced a real blocker: package.stop does not stop electrumx/
bitcoin-knots/btcpay/fedimint/immich (container stays running; gate stop-wait
times out). Root cause chain: these backend apps run as plain podman
--restart=unless-stopped, NOT quadlet units (PODMAN_SYSTEMD_UNIT empty; only UI
companions + home-assistant have .container files; bitcoin-core.container is
.disabled). orchestrator.stop() podman-fallback fires for filebrowser but not
electrumx -> suspect loaded()/is_unknown_app_id_error gap. stop->stopped state
reporting itself is correct (filebrowser proof, user_stopped guard).
Also: corrected the canonical gate invocation (DESTRUCTIVE only, not CASCADE);
restored .228 after my cascade-gate left apps stranded.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On the loaded .198 the frontend churned (created → "unhealthy" → reconciler
recreates → loop). The http health check fetched / through nginx (SPA +
sub_filter) and false-failed under node load; the reconciler then treated the
frontend as wedged and recreated it. nginx binds 7777 at startup, so a tcp
liveness check passes immediately and stays green under load while still
catching a real "nginx not listening" failure. Generous retries/start_period.
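What a tcp liveness check amounts to, as a minimal Python sketch. `tcp_alive` is a hypothetical helper; the manifest's health check is declarative and not shown here.

```python
import socket


def tcp_alive(host, port, timeout_s=2.0):
    """True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For the frontend this would be `tcp_alive("127.0.0.1", 7777)`: it does not depend on how long the SPA response takes under load, yet it still fails when nginx is not listening.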
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live on .228 the post_install `exec` steps failed with "crun: write
cgroup.procs: Permission denied / OCI permission denied": a `podman exec`
launched from archipelago.service can't place its child in the container's
cgroup (under the service's own slice). Wrap `exec` in
`systemd-run --user --scope --quiet --collect podman exec …` so it gets its own
delegated cgroup — same trick as `podman_user_scope` for pasta starts.
`copy_from_host` (a host-side `cp`, no in-container process) stays direct.
Without this only copy_from_host worked; indeedhub happened to be unaffected
(its image pre-bakes the nginx config so the exec steps were no-ops), but the
hook capability is only generally useful with exec working. hooks unit tests
pass; live verify on .228 next.
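The resulting command lines, sketched as a Python argv builder. `hook_argv` and the step-kind strings are hypothetical; the `systemd-run` flags and the `podman cp` form are the ones quoted in these messages.

```python
def hook_argv(step_kind, container, args):
    """Build the host command for one post_install step."""
    if step_kind == "exec":
        # Own transient scope, so the child is not placed from within the
        # service's cgroup.
        return ["systemd-run", "--user", "--scope", "--quiet", "--collect",
                "podman", "exec", container, *args]
    if step_kind == "copy_from_host":
        # Host-side copy, no in-container process: stays direct.
        src, dest = args
        return ["podman", "cp", src, f"{container}:{dest}"]
    raise ValueError(f"unknown hook step: {step_kind}")
```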
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live fresh-create on .228 (post special-case removal) had nginx workers die
with "setgid(101) failed (Operation not permitted)" → workers exited code 2,
port published but nothing served (HTTP 000). The orchestrator does
--cap-drop=ALL, so unlike the legacy `podman run` (default caps) nginx's master
couldn't drop workers to the nginx user. Declare CHOWN/DAC_OVERRIDE/SETGID/SETUID
(SET* to drop the worker user, CHOWN+DAC_OVERRIDE for the tmpfs proxy cache).
Verified on .228: frontend fresh-creates, caps applied, nginx serves, UI 200
incl. /api/ and /nostr-provider.js.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fresh-create path was blocked by hardcoded indeedhub orchestrator logic
that predated and conflicted with the manifest migration:
- ensure_running routed app_id=="indeedhub" → reconcile_indeedhub_stack, which
REFUSED to create the frontend from its manifest (returned Left("stack-managed")).
- run_pre_start_hooks("indeedhub") → start_indeedhub_backends →
wait_for_indeedhub_dependencies_ready(120) — a DNS gate with a chicken-and-egg
bug (required the frontend's own alias present before the frontend could be
created), which failed install_fresh with "dependencies were not ready within
120s" and left the frontend down (caught live on .228).
Delete all of it (−382 lines): reconcile_indeedhub_stack, start_indeedhub_backends,
wait_for_indeedhub_dependencies_ready, indeedhub_api_dependency_dns_ready,
indeedhub_required_aliases_present, repair_indeedhub_network_aliases,
indeedhub_alias_present, patch_indeedhub_nostr_provider, and the INDEEDHUB_*
consts. The manifests now carry everything these did: network_aliases (short
hostnames), generated_secrets, dependencies, and the post_install nginx hook. So
"indeedhub" + every member flows through the generic install_fresh/reconcile path
— the frontend fresh-creates normally and runs its hook.
(crash_recovery.rs's frontend-after-deps ordering guard is kept — it's beneficial
startup ordering, not a blocker.) cargo check + release build green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per user direction: the production test gate is 5x (ARCHY_ITERATIONS=5) on
.228 AND .198 for now, down from 20x. Restore to 20x before the final ship.
Updated CLAUDE.md, PRODUCTION-MASTER-PLAN.md, and tests/lifecycle/TESTING.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Author the IndeedHub stack as 7 manifests (postgres/redis/minio/relay/api/
ffmpeg + frontend) and route install_indeedhub_stack through the
orchestrator first (immich pattern), falling back to the legacy installer
only when the manifests aren't deployed.
Data-preserving by construction — the manifests reproduce the live install
exactly so an existing node ADOPTS rather than recreates:
- container_name = the live hyphenated names the runtime already references
(health_monitor tiers/deps, crash_recovery).
- named volumes indeedhub-{postgres,redis,minio,relay}-data (not bind mounts).
- dedicated indeedhub-net + network_aliases [postgres|redis|minio|relay|api]
so the api/ffmpeg env hostnames and the frontend nginx upstreams resolve
unchanged.
- generated_secrets (indeedhub-db-password/-minio-password owned by their
backends, indeedhub-jwt by the api) reuse the live /var/lib/archipelago/
secrets values (ensure_one no-ops on existing files; postgres pw is fixed
at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept.
The frontend carries the post_install hook (#20) that replaces the hardcoded
patch_indeedhub_nostr_provider: strip X-Frame-Options, refresh
nostr-provider.js from /opt/archipelago/web-ui, inject the <script> if
absent, reload nginx — defensive/idempotent since indeedhub:1.0.0 already
bakes these. Frontend manifest also corrected off its dead Next.js shape
(health check now nginx :7777, tmpfs /run + /var/cache/nginx).
Builds + unit-tested; live adoption/lifecycle verification on .228 next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add `container.network_aliases: Vec<String>` (serde default, DNS-label
validated) so a stack member can answer to short hostnames its peers bake
in, beyond its own container name. Rendered in both runtime paths:
- podman_client: merged (deduped) into the custom-network aliases array.
- quadlet from_manifest: appended after the container name; emitted only
for Bridge networks (slirp/pasta reject aliases).
Needed for the indeedhub migration: its frontend nginx proxies to
`api:4000` / `minio:9000` / `relay:8080`, so those members declare
`network_aliases: [api|minio|relay]` to keep the short names resolvable on
the dedicated indeedhub-net (vs. colliding generic aliases on archy-net).
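The validate-then-merge behaviour can be sketched in Python. The label rule below is the usual RFC 1123 shape restricted to lowercase, which is an assumption about the real validator; `validate_alias`, `merged_aliases` and the container name in the usage note are illustrative.

```python
import re

# 1-63 chars, lowercase alphanumerics and hyphens, no leading/trailing hyphen.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def validate_alias(alias):
    """Return alias unchanged, or raise ValueError if it is not a DNS label."""
    if not _DNS_LABEL.fullmatch(alias):
        raise ValueError(f"invalid network alias: {alias!r}")
    return alias


def merged_aliases(container_name, network_aliases):
    """Container name first, then declared aliases, duplicates dropped in order."""
    seen, out = set(), []
    for name in [container_name, *map(validate_alias, network_aliases)]:
        if name not in seen:
            seen.add(name)
            out.append(name)
    return out
```

For example, `merged_aliases("indeedhub-api", ["api"])` gives `["indeedhub-api", "api"]`, so peers can reach the member as `api:4000`.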
Also fixes 4 pre-existing from_manifest test failures (unrelated to this
change, surfaced now that the quadlet suite runs green): test manifests
used the long-invalid `network_policy: archy-net` (allowlist is
isolated/bridge/host → moved to network_policy: isolated + container.network)
and bind sources outside /var/lib/archipelago.
Tests: container crate 53 pass; archipelago quadlet+alias 47 pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add container::hooks::run_post_install — runs an app's declarative
post_install hooks against its own running container:
- Exec -> podman exec <container> <args…> (60s timeout-bounded)
- CopyFromHost -> resolve src against allowlist roots (<data_dir>/<app>
and /opt/archipelago), canonicalise + prefix-check (defeats symlink
escape), then podman cp <abs-src> <container>:<dest>
Best-effort + idempotent: a failed step is warned and skipped, never
fails the install — matching the legacy patch_indeedhub_nostr_provider
behaviour this replaces. Wired into install_fresh after the container is
up, so it runs only on a freshly created container (not plain start), and
re-applies on recreate-after-drift.
5 unit tests on resolve_copy_src (accept in-data-dir, reject absolute /
traversal / missing / symlink-escape). cargo test -p archipelago green.
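The canonicalise + prefix-check step, as a Python sketch of the same idea. This is not the Rust `resolve_copy_src`; the signature and the return-None convention are assumptions.

```python
import os


def resolve_copy_src(src, roots):
    """Return the canonical path of a relative `src` under one of `roots`.

    Returns None if src is absolute, contains '..', is missing, or resolves
    outside every root.
    """
    if os.path.isabs(src) or ".." in src.replace("\\", "/").split("/"):
        return None
    for root in roots:
        real_root = os.path.realpath(root)
        candidate = os.path.realpath(os.path.join(real_root, src))
        if not os.path.exists(candidate):
            continue
        # Prefix check on the canonical path: a symlink pointing outside the
        # root resolves to a path that no longer sits under the root.
        if os.path.commonpath([real_root, candidate]) == real_root:
            return candidate
    return None
```

Checking the prefix after resolving symlinks, not before, is what defeats the symlink escape.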
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add controlled post_install/pre_start hook schema to AppDefinition:
LifecycleHooks/HookStep (Exec | CopyFromHost)/HostCopy with allowlist
validation (relative src, no '..', absolute container dest, non-empty
exec). Re-exported from the crate root. Design: docs/manifest-hooks-design.md.
Also add the missing generated_secrets: vec![] field to three
pre-existing ContainerConfig test literals (the field was added to the
struct in 03a4ee1b but the container crate's own tests were never rerun,
so -p archipelago-container failed to compile). cargo test green: 53 pass.
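The four allowlist rules, sketched with Python dataclasses. The class and field names loosely mirror the schema named above but are assumptions, as are the error messages.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Exec:
    args: List[str]


@dataclass
class CopyFromHost:
    src: str    # relative to an allowlisted host root
    dest: str   # absolute path inside the container


def validate_step(step: Union[Exec, CopyFromHost]) -> None:
    """Raise ValueError if the step breaks one of the schema rules."""
    if isinstance(step, Exec):
        if not step.args or not step.args[0]:
            raise ValueError("exec must not be empty")
        return
    if step.src.startswith("/"):
        raise ValueError("src must be relative")
    if ".." in step.src.split("/"):
        raise ValueError("src must not contain '..'")
    if not step.dest.startswith("/"):
        raise ValueError("dest must be an absolute container path")
```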
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
container-list reports stack apps package-level (.name="immich"), so the suite
checks the "immich" package (presence, valid state, :2283 lan-address) rather than
individual container names. Destructive tier fires async stop/start/restart and
asserts on the end state via wait_for_container_status.
KNOWN: the destructive tier is flaky for slow multi-container stacks — bats runs
ops back-to-back with no settling while immich's async stack ops take 30s+, and
stopped reports as "exited" not "stopped". The immich migration itself is verified
working (manual stop/start/restart succeed; all 3 containers healthy). Hardening
the harness for stack apps (inter-op settling + stopped|exited acceptance) is a
follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
package.stop/start/restart broke ("no containers found" / "no such object
immich_postgres") because the runtime hardcodes the immich stack's container names
as immich_server/immich_postgres/immich_redis (underscore) across 8 files
(lifecycle, health, crash-recovery, ports, config). The migration had named the
containers by app_id (hyphen), mismatching all of it.
Root cause of the earlier failed attempt: container_name was nested under an
`extensions:` block, but `app.extensions` is serde(flatten) — container_name must
be a TOP-LEVEL app key to be read by compute_container_name. Fixed: set
container_name: immich_server / immich_postgres / immich_redis at top level, and
point DB_HOSTNAME/REDIS_HOSTNAME at the underscore aliases. App ids stay hyphen
(immich/immich-postgres/immich-redis) so the catalog identity (title+icon) holds.
Manifest-only change — container names now match existing runtime references, no
code edits to the 8 files. (Deriving stack containers from manifests instead of
hardcoded lists remains a north-star follow-up.)
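Why the nested key was invisible, mimicked in Python. With a flattened catch-all map, unknown top-level keys are collected as-is, so a nested `extensions:` block becomes a single entry named "extensions". `KNOWN_KEYS`, `load_app` and the fallback to the app id are assumptions for illustration, not the real struct.

```python
KNOWN_KEYS = {"id", "title", "image"}  # hypothetical typed fields


def load_app(raw):
    """Split a manifest dict into typed fields and a flattened catch-all map."""
    typed = {k: v for k, v in raw.items() if k in KNOWN_KEYS}
    extensions = {k: v for k, v in raw.items() if k not in KNOWN_KEYS}
    return typed, extensions


def compute_container_name(raw):
    """Use the explicit container_name if present, else fall back to the id."""
    typed, extensions = load_app(raw)
    return extensions.get("container_name", typed["id"])
```

A manifest with `extensions: {container_name: immich_postgres}` falls back to `immich-postgres`; moving the key to the top level yields `immich_postgres`.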
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RPC-based (host-agnostic) lifecycle coverage for the manifest-driven immich stack
(immich + immich-postgres + immich-redis): presence + valid state of all 3 members,
a guard that no legacy underscore containers exist (catches botched migration /
legacy-installer fallback), destructive stop/start/restart of the server with
postgres+redis staying up, and cascade uninstall/reinstall (preserve_data).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Orchestrator-installed backends (immich, btcpay-db, …) run as plain podman
`--restart=unless-stopped` containers until the Phase-3 Quadlet rollout flips
use_quadlet_backends on. Nothing in the codebase enabled the user's
podman-restart.service, so those containers had NO reboot-survival mechanism.
Enable it (idempotent, best-effort) at orchestrator startup so unless-stopped
containers come back after a reboot. Already applied manually on .228 (covers
31 containers incl. immich + btcpay); this codifies it fleet-wide.
The deeper fix (render Quadlet for all orchestrator installs) remains the gated
Phase-3 Quadlet-everywhere rollout.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After the manifest migration the launcher installed as "immich-server" (app_id),
which has no catalog entry → showed the raw id and no icon. Rename the server
manifest app_id immich-server→immich so it matches the catalog/curated "immich"
entry (title "Immich", icon immich.png) and is recognised as a known launcher app
(APP_CATEGORY_MAP) → stays in My Apps. immich_stack_app_ids now installs
[immich-postgres, immich-redis, immich]; orchestrator.install bypasses package
routing so there's no recursion with the "immich"→stack-installer mapping.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Classify databases/APIs/backends into Services (#10): add immich-postgres/redis
to SERVICE_NAMES; isServiceContainer matches -postgres/-redis/-valkey/-cache/-db
suffixes; isWebsitePackage final fallback now routes any no-UI, non-known package
to Services ("anything that isn't the frontend UI launcher").
- Services show their parent app's icon (#14): backends reuse the app logo
(immich-* → immich, archy-btcpay-db → btcpay, indeedhub-* → indeedhub, etc.)
via explicit APP_ICON_FALLBACKS + prefix map, instead of 404 → 📦.
- Categories sub-nav for Services (#12): getServiceCategory + buildServiceCategories
+ useServiceCategories; Services tab gets the same desktop/mobile category strips
(Databases/Caches/APIs/Backends), shown only for categories with items. Shared
selectedCategory resets to 'all' on tab switch.
- Mobile swipe (#11): the tab-swipe gesture is suppressed over .mobile-category-strip
so swiping the category chips scrolls them instead of changing tabs (covers both
My Apps and the new Services strip).
vue-tsc build clean.
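The classification and icon rules from the first two items, sketched in Python (the frontend code is TypeScript). The suffix list and the three prefix examples are from the message above; the function names and data shapes are assumptions.

```python
SERVICE_SUFFIXES = ("-postgres", "-redis", "-valkey", "-cache", "-db")
SERVICE_NAMES = {"immich-postgres", "immich-redis"}  # subset named above

# Assumed shape for the prefix map; entries taken from the examples above.
ICON_PREFIXES = [("immich-", "immich"), ("archy-btcpay-", "btcpay"),
                 ("indeedhub-", "indeedhub")]


def is_service_container(name):
    """True for explicitly listed services or names with a backend suffix."""
    return name in SERVICE_NAMES or name.endswith(SERVICE_SUFFIXES)


def icon_app_for(name):
    """Map a backend container to the app whose icon it should reuse."""
    for prefix, app in ICON_PREFIXES:
        if name.startswith(prefix):
            return app
    return name
```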
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>