Compare commits


10 Commits

Author SHA1 Message Date
archipelago
2715f2d847 feat(demo): public multi-visitor demo sandbox for Portainer
Turn the mock backend + UI into a public, click-to-play demo deployable as a
Portainer stack, gated behind DEMO=1 (classic single-user mock unchanged when off).

Backend (neode-ui/mock-backend.js):
- Per-session state isolation via AsyncLocalStorage + Proxy: every visitor gets
  an isolated, deep-cloned copy of mockData/walletState/userState/etc., keyed by
  a demo_sid cookie. Per-session WebSocket fan-out, idle reaper, session cap.
- Real per-session file storage (upload/folder/rename/delete) with a 50MB quota,
  replacing the no-op filebrowser handlers; adds the missing app.filebrowser-token RPC.
- Force simulation mode (never touch a host Docker/Podman socket).
- Testnet (signet) flavor; shared login password "entertoexit".
- Report the real app version suffixed with -demo.
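The mock backend does this in JavaScript with AsyncLocalStorage + Proxy. As a language-neutral sketch of the same bookkeeping (clone a pristine template on a visitor's first request, cap concurrent sessions, reap idle ones), here is a minimal Rust version. `SessionStore`, `state_for`, `reap` and `AtCapacity` are hypothetical names, not the backend's API, and the clock is passed in so reaping is deterministic.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Returned when a new visitor arrives while the session cap is reached.
#[derive(Debug, PartialEq)]
pub struct AtCapacity;

/// Per-visitor state keyed by a session id (e.g. the `demo_sid` cookie value).
pub struct SessionStore<T: Clone> {
    template: T,                             // pristine state every visitor starts from
    ttl: Duration,                           // idle time before a session is reaped
    max_sessions: usize,                     // concurrent visitor cap
    sessions: HashMap<String, (T, Instant)>, // sid -> (own copy, last seen)
}

impl<T: Clone> SessionStore<T> {
    pub fn new(template: T, ttl: Duration, max_sessions: usize) -> Self {
        Self { template, ttl, max_sessions, sessions: HashMap::new() }
    }

    /// Return this visitor's own state, creating it on first touch.
    /// New sids are refused at the cap; existing sessions keep working.
    pub fn state_for(&mut self, sid: &str, now: Instant) -> Result<&mut T, AtCapacity> {
        if !self.sessions.contains_key(sid) {
            if self.sessions.len() >= self.max_sessions {
                return Err(AtCapacity);
            }
            // Deep copy on first touch: later mutations never leak between visitors.
            self.sessions.insert(sid.to_string(), (self.template.clone(), now));
        }
        let entry = self.sessions.get_mut(sid).expect("present after insert");
        entry.1 = now; // any request resets the idle timer
        Ok(&mut entry.0)
    }

    /// Drop sessions idle for longer than `ttl`; returns how many were reaped.
    pub fn reap(&mut self, now: Instant) -> usize {
        let before = self.sessions.len();
        let ttl = self.ttl;
        self.sessions.retain(|_, (_, last_seen)| now.duration_since(*last_seen) <= ttl);
        before - self.sessions.len()
    }

    pub fn len(&self) -> usize {
        self.sessions.len()
    }
}
```

A periodic timer calling `reap(Instant::now())` plays the role of the idle reaper; per-session file storage would be wiped for each reaped sid.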

Frontend:
- VITE_DEMO build flag (useDemoIntro.ts): replay the intro once per calendar day
  per browser; prefill + show the "entertoexit" login hint.
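One way to implement "once per calendar day per browser", assuming the browser stores the day bucket of the last replay (the helpers below are hypothetical, not `useDemoIntro.ts`): bucket the timestamp by local midnight and replay whenever the stored bucket differs from today's.

```rust
/// Day bucket for a Unix timestamp, shifted so buckets roll over at the
/// visitor's local midnight (`utc_offset_secs`, e.g. -18_000 for UTC-5).
pub fn day_bucket(unix_secs: i64, utc_offset_secs: i64) -> i64 {
    // div_euclid keeps pre-1970 / negative-offset edge cases on the right day.
    (unix_secs + utc_offset_secs).div_euclid(86_400)
}

/// Replay if the intro was never shown in this browser, or last shown on another day.
pub fn should_replay_intro(last_shown_day: Option<i64>, today: i64) -> bool {
    last_shown_day != Some(today)
}
```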

Deploy:
- docker-compose.demo.yml wired for DEMO, UI on :2100 (build-from-repo).
- demo-deploy/ thin stack (prebuilt :demo image refs + .env.example + README).
- .github/workflows/demo-images.yml builds/pushes archy-demo-{web,backend} images.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:28:05 -04:00
archipelago
29cd167894 docs(gate): stop-grace fix shipped+validated; gate is multi-caused (5 issues)
Fix deployed to .198+.228; vaultwarden stops cleanly (no regression). But validation
showed the gate failures are multi-caused: (2) fedimint is crash-looping/unhealthy on
both nodes and can't be stopped; (3) the host-listener repair watchdog restarts
port-unreachable containers, fighting the stop; (4) the gate waits for 'stopped' but
apps end 'exited'/'absent' (Exited->Stopped conversion key mismatch); (5) the 60s gate
timeout is shorter than some per-app graces (electrumx 300s); (6) .228 contamination. Documented + re-sequenced
NEXT STEPS (fedimint health is the new top blocker).
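Issue (4) is a state-vocabulary mismatch. A sketch of a tolerant check, assuming the gate sees podman's status string, or `None` when the container is gone; `stop_verdict` and `StopVerdict` are hypothetical, and treating `paused` as still running is an assumption:

```rust
/// Gate-side verdict for a container the gate just asked to stop.
#[derive(Debug, PartialEq)]
pub enum StopVerdict {
    Stopped,
    StillRunning,
    Unknown,
}

/// `None` means the container no longer exists ("absent").
pub fn stop_verdict(podman_status: Option<&str>) -> StopVerdict {
    match podman_status.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        // Every terminal state counts, not just the literal "stopped".
        None | Some("stopped") | Some("exited") => StopVerdict::Stopped,
        Some("running") | Some("stopping") | Some("paused") => StopVerdict::StillRunning,
        Some(_) => StopVerdict::Unknown,
    }
}
```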

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 08:07:43 -04:00
archipelago
2dad64b2ee fix(stop): honour per-app graceful-stop grace in orchestrator stop path
package.stop left slow-to-SIGTERM apps (fedimint/electrumx/bitcoin/btcpay/immich)
running: the orchestrator path hardcoded podman API ?t=10 / CLI -t 30 and the CLI
wrapper deadline (30s) equalled the -t grace, so the await fired exactly as podman
SIGKILLed -> stop reported failed -> state reverted to running. Reproduced live on
clean .198 (fedimint).

- container/runtime.rs: add ContainerRuntime::stop_container_with_grace (defaulted
  so mock/dev impls are unchanged); PodmanRuntime honours grace for API + CLI with
  deadline = grace + 15s buffer; AutoRuntime delegates. New canonical per-app table
  stop_grace_secs_for() + DEFAULT_STOP_GRACE_SECS / STOP_GRACE_DEADLINE_BUFFER_SECS.
- podman_client.rs: stop_container_with_grace uses ?t=<grace> + longer HTTP deadline.
- prod_orchestrator::stop: resolve grace = manifest stop_grace_secs (north-star) else
  the table; pass to quadlet::stop_service_with_timeout AND stop_container_with_grace.
- quadlet.rs: stop_service_with_timeout so slow apps aren't SIGKILLed at 45s.
- rpc/package/runtime.rs: doc-note its &str stop_timeout_secs mirrors the canonical table.
- tests: resolve_stop_grace_secs (manifest field wins / table fallback / default 30).
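The race described above can be modelled as plain arithmetic. This is a toy timing model, not the runtime code: `wrapper_sees_success` and the 1s `kill_latency` are hypothetical; only the 15s buffer comes from this commit.

```rust
/// Headroom added on top of the grace to form the wrapper deadline.
const STOP_GRACE_DEADLINE_BUFFER_SECS: u64 = 15;

/// Wrapper deadline around `podman stop -t <grace>`.
fn stop_deadline(grace_secs: u64) -> u64 {
    grace_secs + STOP_GRACE_DEADLINE_BUFFER_SECS
}

/// When does `podman stop -t grace` return, and does the wrapper see it in time?
/// `exits_after_sigterm`: seconds the app needs to exit on SIGTERM, or None if it
/// never exits on its own. `kill_latency`: time for the post-grace SIGKILL to land.
fn wrapper_sees_success(grace: u64, deadline: u64, exits_after_sigterm: Option<u64>, kill_latency: u64) -> bool {
    let returns_at = match exits_after_sigterm {
        Some(t) if t <= grace => t,
        _ => grace + kill_latency, // podman SIGKILLs once the grace expires
    };
    returns_at < deadline // a tie counts as the wrapper timing out first
}
```

With deadline == grace the wrapper can never observe the SIGKILL path; any positive buffer larger than the kill latency fixes it.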

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 06:59:40 -04:00
archipelago
470e3c649a docs(gate): ROOT-CAUSE the stop blocker — orchestrator ignores per-app stop grace
Reproduced live on CLEAN .198: package.stop fedimint -> 'podman stop -t 30
timed out after 30s' -> stop fails -> state reverts to running. Real fleet-wide
bug (NOT .228 contamination). stop_timeout_secs() per-app grace (bitcoin 600/lnd
330/electrumx 300/fedimint 60) is used by legacy stop paths but NOT the
orchestrator path: ContainerRuntime::stop_container hardcodes API ?t=10 / CLI
-t 30, and PODMAN_CLI_DEFAULT_TIMEOUT=30s == the -t grace so the await fires as
podman SIGKILLs. Fix = thread per-app grace + widen wrapper deadline; owner picks
table-based vs manifest-driven stop_grace_secs. Re-escalated to blocker.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 06:17:23 -04:00
archipelago
a111d79a05 docs(gate): downgrade stop-blocker ⚠️ — .198 has quadlet units, .228 state was my contamination
.198 ground truth: backend apps ARE quadlet (.container files present) -> quadlet
is the intended runtime. .228's plain-podman state traced to my cascade-gate
uninstall + package.start restore (no quadlet regen). Two real robustness sub-bugs
remain (start should regen quadlet; stop podman-fallback gap). Next: canonical
gate on CLEAN .198 first to tell real-bug from contamination.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 06:00:42 -04:00
archipelago
47026fae30 docs(gate): document package.stop blocker + quadlet-vs-podman finding (.228)
5x gate run surfaced a real blocker: package.stop does not stop electrumx/
bitcoin-knots/btcpay/fedimint/immich (container stays running; gate stop-wait
times out). Root cause chain: these backend apps run as plain podman
--restart=unless-stopped, NOT quadlet units (PODMAN_SYSTEMD_UNIT empty; only UI
companions + home-assistant have .container files; bitcoin-core.container is
.disabled). orchestrator.stop() podman-fallback fires for filebrowser but not
electrumx -> suspect loaded()/is_unknown_app_id_error gap. stop->stopped state
reporting itself is correct (filebrowser proof, user_stopped guard).

Also: corrected the canonical gate invocation (DESTRUCTIVE only, not CASCADE);
restored .228 after my cascade-gate left apps stranded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 05:47:11 -04:00
archipelago
d6fa262d69 docs(#20): consolidate master-plan resume — indeedhub migration 2-node verified (.228+.198); cutoff-proof next-steps + deploy facts
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 04:23:52 -04:00
archipelago
e2a012d086 fix(indeedhub): frontend health = tcp:7777 not http GET / (stops reconcile churn)
On the loaded .198 the frontend churned (created → "unhealthy" → reconciler
recreates → loop). The http health check fetched / through nginx (SPA +
sub_filter) and false-failed under node load; the reconciler then treated the
frontend as wedged and recreated it. nginx binds 7777 at startup, so a tcp
liveness check passes immediately and stays green under load while still
catching a real "nginx not listening" failure. Generous retries/start_period.
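A minimal Rust sketch of what a TCP liveness probe checks, plus a consecutive-failure rule in the spirit of the manifest's `retries`. Both functions are hypothetical illustrations, not the orchestrator's health checker.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Healthy iff something accepts a TCP connection on `addr` within `timeout`.
/// No HTTP request is sent, so the SPA / sub_filter path is never exercised.
pub fn tcp_alive(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

/// Unhealthy only after the last `retries` probes all failed.
pub fn is_unhealthy(recent_results: &[bool], retries: usize) -> bool {
    recent_results.len() >= retries
        && recent_results.iter().rev().take(retries).all(|&ok| !ok)
}
```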

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 03:39:26 -04:00
archipelago
e4d3f94913 docs(#20): hook exec cgroup gap FIXED + verified on .228 (scoped exec)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 17:57:17 -04:00
archipelago
ff78b31212 fix(hooks): run post_install exec in a transient user scope (fixes cgroup denial)
Live on .228 the post_install `exec` steps failed with "crun: write
cgroup.procs: Permission denied / OCI permission denied": a `podman exec`
launched from archipelago.service can't place its child in the container's
cgroup (under the service's own slice). Wrap `exec` in
`systemd-run --user --scope --quiet --collect podman exec …` so it gets its own
delegated cgroup — same trick as `podman_user_scope` for pasta starts.
`copy_from_host` (a host-side `cp`, no in-container process) stays direct.
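The argv difference between scoped and direct steps, as a pure Rust function. `hook_command` is hypothetical; the `run_podman` in this commit builds a `tokio::process::Command` the same way and adds the timeout.

```rust
/// Build (program, argv) for a hook step. When `scoped`, podman is launched
/// through `systemd-run --user --scope` so it gets its own delegated cgroup.
pub fn hook_command(podman_args: &[&str], scoped: bool) -> (String, Vec<String>) {
    let mut argv: Vec<String> = Vec::new();
    let program = if scoped {
        // --collect unloads the transient unit afterwards, even if it failed.
        argv.extend(["--user", "--scope", "--quiet", "--collect", "podman"].map(String::from));
        "systemd-run"
    } else {
        "podman"
    };
    argv.extend(podman_args.iter().map(|s| s.to_string()));
    (program.to_string(), argv)
}
```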

Without this only copy_from_host worked; indeedhub happened to be unaffected
(its image pre-bakes the nginx config so the exec steps were no-ops), but the
hook capability is only generally useful with exec working. hooks unit tests
pass; live verify on .228 next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 17:38:23 -04:00
19 changed files with 1020 additions and 260 deletions

.github/workflows/demo-images.yml (new file, 67 lines)

@ -0,0 +1,67 @@
name: Demo images
# Builds and pushes the public-demo images on every change to the UI / mock
# backend, so the separated `archy-demo` Portainer stack auto-tracks the real
# code (see demo-deploy/ and docs/demo-deployment-design.md).
#
# Required repo configuration:
#   vars.DEMO_REGISTRY          e.g. 146.59.87.168:3000/lfg2025
#   secrets.DEMO_REGISTRY_USER
#   secrets.DEMO_REGISTRY_TOKEN
# Optional:
#   vars.DEMO_REGISTRY_HOST     login host, if it differs from DEMO_REGISTRY
#   secrets.PORTAINER_WEBHOOK   redeploy hook called after a successful push
on:
  push:
    branches: [main]
    paths:
      - 'neode-ui/**'
      - 'docker-compose.demo.yml'
      - '.github/workflows/demo-images.yml'
  workflow_dispatch:
jobs:
  build:
    name: Build & push demo images
    runs-on: ubuntu-latest
    # Skip cleanly on forks / before registry config is set.
    if: ${{ vars.DEMO_REGISTRY != '' }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Log in to registry
        uses: docker/login-action@v3
        with:
          registry: ${{ vars.DEMO_REGISTRY_HOST || vars.DEMO_REGISTRY }}
          username: ${{ secrets.DEMO_REGISTRY_USER }}
          password: ${{ secrets.DEMO_REGISTRY_TOKEN }}
      - name: Build & push backend
        uses: docker/build-push-action@v6
        with:
          context: .
          file: neode-ui/Dockerfile.backend
          push: true
          tags: |
            ${{ vars.DEMO_REGISTRY }}/archy-demo-backend:demo
            ${{ vars.DEMO_REGISTRY }}/archy-demo-backend:${{ github.sha }}
      - name: Build & push web
        uses: docker/build-push-action@v6
        with:
          context: .
          file: neode-ui/Dockerfile.web
          push: true
          build-args: |
            VITE_DEMO=1
          tags: |
            ${{ vars.DEMO_REGISTRY }}/archy-demo-web:demo
            ${{ vars.DEMO_REGISTRY }}/archy-demo-web:${{ github.sha }}
      # `secrets` can't be referenced in a step `if:`, so pass the webhook via
      # env and skip inside the shell when it isn't configured.
      - name: Trigger Portainer redeploy
        if: ${{ success() }}
        env:
          PORTAINER_WEBHOOK: ${{ secrets.PORTAINER_WEBHOOK }}
        run: |
          if [ -n "$PORTAINER_WEBHOOK" ]; then
            curl -fsS -X POST "$PORTAINER_WEBHOOK"
          fi

@ -67,14 +67,18 @@ app:
- exec: ["sh", "-c", "grep -q nostr-provider /etc/nginx/conf.d/default.conf || sed -i 's#</head>#<script src=\"/nostr-provider.js\"></script></head>#' /etc/nginx/conf.d/default.conf"]
- exec: ["nginx", "-s", "reload"]
# TCP liveness on the nginx port, NOT an http GET of /. nginx binds 7777 at
# startup (before workers), so this passes immediately and stays green under
# load. An http check of / runs the SPA + sub_filter and false-fails when the
# node is busy → the reconciler then treats the frontend as wedged and
# recreates it in a loop (observed churning the frontend on the loaded .198).
health_check:
type: http
endpoint: http://localhost:7777
path: /
type: tcp
endpoint: localhost:7777
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
timeout: 5s
retries: 5
start_period: 30s
interfaces:
main:

@ -22,6 +22,11 @@ const PODMAN_LOG_TIMEOUT: Duration = Duration::from_secs(15);
/// Per-container graceful shutdown timeout in seconds.
/// Bitcoin Core needs 600s to flush UTXO set, LND 330s for channel state,
/// indexers 300s for index flush, databases 120s for WAL/transaction commit.
///
/// MIRRORS `archipelago_container::runtime::stop_grace_secs_for` (which returns
/// `u64` and is the canonical table used by the orchestrator stop path). This
/// `&str` variant exists for the legacy `podman stop -t <s>` call sites here —
/// keep the two tables in sync until those are migrated to the orchestrator.
pub fn stop_timeout_secs(container_name: &str) -> &'static str {
let id = container_name
.strip_prefix("archy-")

@ -97,26 +97,44 @@ async fn run_step(
args.push("exec");
args.push(container);
args.extend(exec.iter().map(String::as_str));
run_podman(&args).await
// `exec` spawns a process INSIDE the container's cgroup. When the
// container was started by archipelago.service, that cgroup is under
// the service's slice and a bare `podman exec` from the service can't
// write its `cgroup.procs` ("crun: ... Permission denied / OCI
// permission denied"). Run it in a transient user scope (its own
// delegated cgroup) — mirrors `podman_user_scope` for pasta starts.
run_podman(&args, /* scoped */ true).await
}
HookStep::CopyFromHost { copy_from_host } => {
let abs = resolve_copy_src(&copy_from_host.src, app_id, data_dir)?;
let abs = abs.to_string_lossy().into_owned();
let dest = format!("{container}:{}", copy_from_host.dest);
run_podman(&["cp", &abs, &dest]).await
// `cp` is a host-side copy (no in-container process), so no scope needed.
run_podman(&["cp", &abs, &dest], /* scoped */ false).await
}
}
}
async fn run_podman(args: &[&str]) -> Result<()> {
/// Run a podman command, optionally inside a transient systemd user scope. The
/// scope gives the invocation its own delegated cgroup so `podman exec` can
/// place its child process — without it, an exec launched from the service's
/// own cgroup is denied write to the container's `cgroup.procs`.
async fn run_podman(args: &[&str], scoped: bool) -> Result<()> {
let rendered = args.join(" ");
let out = tokio::time::timeout(
HOOK_TIMEOUT,
tokio::process::Command::new("podman").args(args).output(),
)
.await
.map_err(|_| anyhow::anyhow!("podman {rendered} timed out after {:?}", HOOK_TIMEOUT))?
.map_err(|e| anyhow::anyhow!("podman {rendered}: {e}"))?;
let mut cmd = if scoped {
let mut c = tokio::process::Command::new("systemd-run");
c.args(["--user", "--scope", "--quiet", "--collect", "podman"]);
c.args(args);
c
} else {
let mut c = tokio::process::Command::new("podman");
c.args(args);
c
};
let out = tokio::time::timeout(HOOK_TIMEOUT, cmd.output())
.await
.map_err(|_| anyhow::anyhow!("podman {rendered} timed out after {:?}", HOOK_TIMEOUT))?
.map_err(|e| anyhow::anyhow!("podman {rendered}: {e}"))?;
if !out.status.success() {
bail!(

@ -171,6 +171,22 @@ pub fn compute_container_name(manifest: &AppManifest) -> String {
}
}
/// Resolve the graceful-stop grace (seconds) for an app: the manifest
/// `stop_grace_secs` extension if declared (manifest-driven, north-star), else
/// the historical per-app `stop_timeout_secs` table keyed by container name.
pub fn resolve_stop_grace_secs(manifest: &AppManifest, container_name: &str) -> u64 {
if let Some(v) = manifest.app.extensions.get("stop_grace_secs") {
// Accept either a YAML integer or a numeric string.
if let Some(n) = v.as_u64() {
return n;
}
if let Some(n) = v.as_str().and_then(|s| s.trim().parse::<u64>().ok()) {
return n;
}
}
archipelago_container::runtime::stop_grace_secs_for(container_name)
}
/// Fingerprint a local build context so a changed source tree (e.g. a rebuilt
/// `neode-ui` dist copied into `docker/<ui>/`) forces an image rebuild even
/// when the image tag already exists (#34). Walks the context directory and
@ -2896,13 +2912,25 @@ impl ContainerOrchestrator for ProdContainerOrchestrator {
let lock = self.app_lock(app_id).await;
let _guard = lock.lock().await;
let name = compute_container_name(&lm.manifest);
// Per-app graceful-stop grace: manifest `stop_grace_secs` if declared,
// else the historical per-app table. Slow-to-SIGTERM apps (bitcoin-core
// 600s, lnd 330s, electrumx 300s, fedimint 60s…) otherwise get a too-short
// `podman stop -t` and the stop is reported failed while the container
// keeps running. See PRODUCTION-MASTER-PLAN §8b.
let grace_secs = resolve_stop_grace_secs(&lm.manifest, &name);
// Quadlet-owned containers are restarted by systemd if only `podman stop`
// is used. Stop the user service first, then stop the container as a
// defensive fallback for legacy/non-Quadlet installs.
if let Err(err) = quadlet::stop_service(&format!("{name}.service")).await {
// defensive fallback for legacy/non-Quadlet installs. Give systemd the
// per-app grace before it force-kills the app-scoped unit.
let quadlet_timeout = std::time::Duration::from_secs(
grace_secs + archipelago_container::runtime::STOP_GRACE_DEADLINE_BUFFER_SECS,
);
if let Err(err) =
quadlet::stop_service_with_timeout(&format!("{name}.service"), quadlet_timeout).await
{
tracing::debug!(container = %name, error = %err, "quadlet stop skipped/failed");
}
match self.runtime.stop_container(&name).await {
match self.runtime.stop_container_with_grace(&name, grace_secs).await {
Ok(()) => Ok(()),
Err(err) => {
let stuck_stopping = self
@ -3467,6 +3495,37 @@ app:
assert_eq!(compute_container_name(&m), "legacy-bitcoin-ui");
}
fn manifest_with_stop_grace(id: &str, grace: &str) -> AppManifest {
let yaml = format!(
"app:\n id: {id}\n name: {id}\n version: 1.0.0\n stop_grace_secs: {grace}\n container:\n image: foo:1\n"
);
AppManifest::parse(&yaml).unwrap()
}
#[test]
fn stop_grace_manifest_field_wins() {
// An explicit stop_grace_secs overrides the per-app table (fedimint=60).
let m = manifest_with_stop_grace("fedimint", "180");
assert_eq!(resolve_stop_grace_secs(&m, "fedimint"), 180);
}
#[test]
fn stop_grace_falls_back_to_table() {
// No manifest field → the historical per-app table by container name.
let m = pull_manifest("fedimint", "foo:1");
assert_eq!(resolve_stop_grace_secs(&m, "fedimint"), 60);
let m = pull_manifest("bitcoin-knots", "foo:1");
assert_eq!(resolve_stop_grace_secs(&m, "bitcoin-knots"), 600);
let m = pull_manifest("electrumx", "foo:1");
assert_eq!(resolve_stop_grace_secs(&m, "electrumx"), 300);
}
#[test]
fn stop_grace_unknown_app_defaults_to_30() {
let m = pull_manifest("some-unknown-app", "foo:1");
assert_eq!(resolve_stop_grace_secs(&m, "some-unknown-app"), 30);
}
async fn orch_with(runtime: Arc<MockRuntime>) -> ProdContainerOrchestrator {
let mut orch = ProdContainerOrchestrator::with_runtime(
runtime,

@ -642,7 +642,17 @@ pub async fn restart_service(service: &str) -> Result<()> {
/// Stop a generated Quadlet service without removing its unit file.
pub async fn stop_service(service: &str) -> Result<()> {
match systemctl_user_status(&["stop", service], QUADLET_STOP_TIMEOUT).await {
stop_service_with_timeout(service, QUADLET_STOP_TIMEOUT).await
}
/// Stop a user service, waiting up to `timeout` for a graceful stop before
/// force-killing the app-scoped unit. Slow-to-SIGTERM apps (bitcoin-core ~600s,
/// lnd ~330s) must not be SIGKILLed at the default 45s — that risks data
/// corruption — so the orchestrator passes the per-app grace here. Never waits
/// less than `QUADLET_STOP_TIMEOUT`.
pub async fn stop_service_with_timeout(service: &str, timeout: Duration) -> Result<()> {
let timeout = timeout.max(QUADLET_STOP_TIMEOUT);
match systemctl_user_status(&["stop", service], timeout).await {
Ok(status) if status.success() => Ok(()),
Ok(status) => Err(anyhow!("systemctl --user stop {service} exited {status}")),
Err(err) => {

@ -422,11 +422,22 @@ impl PodmanClient {
}
pub async fn stop_container(&self, name: &str) -> Result<()> {
self.stop_container_with_grace(name, 10).await
}
/// Stop via libpod honouring a per-app grace (seconds). The HTTP deadline is
/// kept above the grace so the post-grace SIGKILL lands before we give up —
/// otherwise slow-to-SIGTERM apps (fedimint, bitcoin-core, electrumx…) time
/// out at exactly the grace boundary and the stop is reported as failed.
pub async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
let deadline = std::time::Duration::from_secs(
grace_secs + crate::runtime::STOP_GRACE_DEADLINE_BUFFER_SECS,
);
self.api_request(
"POST",
&format!("libpod/containers/{}/stop?t=10", name),
&format!("libpod/containers/{}/stop?t={}", name, grace_secs),
None,
DEFAULT_TIMEOUT,
deadline,
)
.await
.map(|_| ())

@ -10,6 +10,35 @@ const PODMAN_CLI_DEFAULT_TIMEOUT: Duration = Duration::from_secs(30);
const PODMAN_CLI_IMAGE_CHECK_TIMEOUT: Duration = Duration::from_secs(10);
const PODMAN_CLI_BUILD_TIMEOUT: Duration = Duration::from_secs(900);
/// Default graceful-stop grace (seconds) when a caller doesn't supply a per-app
/// value. Mirrors the historical `podman stop -t 30`.
pub const DEFAULT_STOP_GRACE_SECS: u64 = 30;
/// Headroom added to a stop grace to form the await/HTTP deadline, so podman's
/// post-grace SIGKILL completes before the wrapper times out.
pub const STOP_GRACE_DEADLINE_BUFFER_SECS: u64 = 15;
/// Canonical per-app graceful-stop grace (seconds), keyed by container name.
/// Slow-to-SIGTERM apps need far longer than the 30s default: bitcoin-core
/// flushes its chainstate, lnd closes channels, electrumx finishes indexing,
/// stack DBs checkpoint. Used as the fallback when a manifest doesn't declare
/// `stop_grace_secs`. NOTE: the RPC layer's `stop_timeout_secs` mirrors this
/// (returns the same values as `&str` for legacy `podman stop -t` call sites) —
/// keep the two in sync until that path is retired.
pub fn stop_grace_secs_for(container_name: &str) -> u64 {
let id = container_name
.strip_prefix("archy-")
.unwrap_or(container_name);
match id {
"bitcoin-knots" | "bitcoin-core" | "bitcoin" => 600,
"lnd" => 330,
"electrumx" | "electrs" | "mempool-electrs" => 300,
"btcpay-db" | "mempool-db" | "penpot-postgres" | "immich_postgres" | "nextcloud-db"
| "endurain-db" => 120,
"btcpay-server" | "nbxplorer" | "fedimint" | "fedimint-gateway" => 60,
_ => DEFAULT_STOP_GRACE_SECS,
}
}
#[async_trait]
pub trait ContainerRuntime: Send + Sync {
async fn pull_image(&self, image: &str, signature: Option<&str>) -> Result<()>;
@ -21,6 +50,19 @@ pub trait ContainerRuntime: Send + Sync {
) -> Result<String>;
async fn start_container(&self, name: &str) -> Result<()>;
async fn stop_container(&self, name: &str) -> Result<()>;
/// Stop a container honouring a per-app graceful-shutdown grace (seconds).
///
/// Slow-to-SIGTERM apps (bitcoin-core, lnd, electrumx, fedimint, immich…)
/// need a longer `podman stop -t` than the default 30s, or `podman stop`
/// returns before the container exits and the orchestrator treats the stop
/// as failed (the container keeps running). The wrapping deadline is always
/// kept strictly greater than `grace_secs` so podman's post-grace SIGKILL
/// lands inside the await. The default impl ignores the grace and calls
/// `stop_container` — only the real podman runtime honours it.
async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
let _ = grace_secs;
self.stop_container(name).await
}
async fn remove_container(&self, name: &str) -> Result<()>;
async fn get_container_status(&self, name: &str) -> Result<ContainerStatus>;
async fn get_container_logs(&self, name: &str, lines: u32) -> Result<Vec<String>>;
@ -122,10 +164,23 @@ impl ContainerRuntime for PodmanRuntime {
}
async fn stop_container(&self, name: &str) -> Result<()> {
match self.client.stop_container(name).await {
self.stop_container_with_grace(name, DEFAULT_STOP_GRACE_SECS)
.await
}
async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
match self.client.stop_container_with_grace(name, grace_secs).await {
Ok(()) => Ok(()),
Err(api_err) => {
let output = self.podman_cli(&["stop", "-t", "30", name]).await?;
// CLI fallback. Keep the wrapper deadline strictly above the
// `-t` grace so podman's post-grace SIGKILL completes before the
// await gives up (otherwise a deadline == grace races the kill
// and reports a spurious timeout).
let grace = grace_secs.to_string();
let deadline = Duration::from_secs(grace_secs + STOP_GRACE_DEADLINE_BUFFER_SECS);
let output = self
.podman_cli_timeout(&["stop", "-t", &grace, name], deadline)
.await?;
if output.status.success() {
Ok(())
} else {
@ -841,6 +896,10 @@ impl ContainerRuntime for AutoRuntime {
self.runtime.stop_container(name).await
}
async fn stop_container_with_grace(&self, name: &str, grace_secs: u64) -> Result<()> {
self.runtime.stop_container_with_grace(name, grace_secs).await
}
async fn remove_container(&self, name: &str) -> Result<()> {
self.runtime.remove_container(name).await
}

demo-deploy/.env.example (new file, 18 lines)

@ -0,0 +1,18 @@
# Copy to .env and adjust. Used by demo-deploy/docker-compose.yml.
# Registry host + namespace that holds the prebuilt demo images.
REGISTRY=146.59.87.168:3000/lfg2025
# Image tag to deploy (CI publishes :demo and :<git-sha>).
IMAGE_TAG=demo
# Host port for the demo UI.
DEMO_WEB_PORT=2100
# Optional — enables the in-app AI chat panel. Leave blank to disable.
ANTHROPIC_API_KEY=
# Optional sandbox tuning (defaults shown).
DEMO_SESSION_TTL_MS=2700000 # 45 min idle before a visitor session is reaped
DEMO_MAX_SESSIONS=500 # concurrent visitor cap
DEMO_FILE_QUOTA_BYTES=52428800 # 50 MB uploads per visitor
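The backend that reads these is JavaScript; as a hedged Rust sketch of the fallback behaviour (a missing or unparsable value keeps the default shown above), here is a hypothetical `demo_limits` that takes a lookup closure so it can be tested without touching the process environment:

```rust
use std::time::Duration;

/// Demo sandbox limits, with the defaults documented in .env.example.
#[derive(Debug, PartialEq)]
pub struct DemoLimits {
    pub session_ttl: Duration,
    pub max_sessions: usize,
    pub file_quota_bytes: u64,
}

/// Read limits via any key lookup, e.g. `demo_limits(|k| std::env::var(k).ok())`.
pub fn demo_limits(get: impl Fn(&str) -> Option<String>) -> DemoLimits {
    // Missing or non-numeric values fall back to the default.
    let num = |key: &str, default: u64| {
        get(key).and_then(|v| v.trim().parse::<u64>().ok()).unwrap_or(default)
    };
    DemoLimits {
        session_ttl: Duration::from_millis(num("DEMO_SESSION_TTL_MS", 2_700_000)), // 45 min
        max_sessions: num("DEMO_MAX_SESSIONS", 500) as usize,
        file_quota_bytes: num("DEMO_FILE_QUOTA_BYTES", 50 * 1024 * 1024), // 50 MB
    }
}
```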

demo-deploy/README.md (new file, 33 lines)

@ -0,0 +1,33 @@
# Archipelago — Public Demo deploy
A click-to-play demo of the Archipelago UI, backed entirely by a mock backend.
Every visitor gets an **isolated, ephemeral sandbox** (own apps, wallet, files),
real container runtimes are never touched, and Bitcoin runs on **signet** test
coins. **Login password: `entertoexit`** (shown on the login screen).
This directory is the full contents of the public `archy-demo` repo. It holds no
source — only this compose file that pulls prebuilt `:demo` images.
## Deploy in Portainer
1. **Stacks → Add stack → Repository** (or paste `docker-compose.yml` into the web editor).
2. Set environment variables (see `.env.example`) — at minimum `REGISTRY`, and
`ANTHROPIC_API_KEY` if you want the AI chat panel.
3. Deploy. The UI is served on `:2100` (override with `DEMO_WEB_PORT`).
To pick up a new build, redeploy the stack (or wire the CI Portainer webhook).
## How it stays current
The images are built from the Archipelago monorepo by
`.github/workflows/demo-images.yml` on every change to `neode-ui/`, tagged `:demo`
and `:<git-sha>`, and pushed to `REGISTRY`. Editing the real UI → CI rebuilds →
redeploy here. No source lives in this repo.
## What's mocked
- **Per-visitor isolation** — state keyed by a `demo_sid` cookie, idle-reaped.
- **Apps** — install/uninstall/start/stop are simulated (no real Docker).
- **Wallet/Bitcoin** — signet-flavored; use the in-UI faucet for test sats.
- **Files** — real per-session upload/rename/delete, 50 MB quota, wiped on reap.
- **Intro** — replays once per calendar day per browser.
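For the 50 MB per-session file quota, a sketch of the admission check, assuming the backend tracks each session's used bytes (hypothetical `upload_allowed`; renames leave usage unchanged and deletes reduce it):

```rust
/// Accept an upload only if it keeps the session's total at or under the quota.
/// `checked_add` rejects a declared size so large that the sum would overflow.
pub fn upload_allowed(used_bytes: u64, incoming_bytes: u64, quota_bytes: u64) -> bool {
    used_bytes
        .checked_add(incoming_bytes)
        .map_or(false, |total| total <= quota_bytes)
}
```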

@ -0,0 +1,49 @@
# Archipelago Public Demo: thin deploy stack
#
# This is the ENTIRE contents intended for the public `archy-demo` repo. It holds
# NO source; it pulls prebuilt `:demo` images that CI builds from the monorepo on
# every neode-ui change (see .github/workflows/demo-images.yml). Deploy this in
# Portainer ("deploy from repository" or paste into the web editor).
#
# Demo login password: entertoexit
# Access on http://<host>:2100
#
# Configure via a .env file (see .env.example):
#   REGISTRY            registry host/namespace holding the demo images
#   IMAGE_TAG           image tag to pull (default: demo)
#   ANTHROPIC_API_KEY   optional; enables the AI chat panel
#   DEMO_WEB_PORT       host port for the UI (default 2100)
services:
  neode-backend:
    image: ${REGISTRY:-146.59.87.168:3000/lfg2025}/archy-demo-backend:${IMAGE_TAG:-demo}
    container_name: archy-demo-backend
    environment:
      DEMO: "1"
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
      NODE_OPTIONS: "--dns-result-order=ipv4first"
      DEMO_SESSION_TTL_MS: ${DEMO_SESSION_TTL_MS:-2700000}
      DEMO_MAX_SESSIONS: ${DEMO_MAX_SESSIONS:-500}
      DEMO_FILE_QUOTA_BYTES: ${DEMO_FILE_QUOTA_BYTES:-52428800}
    expose:
      - "5959"
    dns:
      - 8.8.8.8
      - 1.1.1.1
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:5959/health"]
      interval: 30s
      timeout: 10s
      retries: 3
  neode-web:
    image: ${REGISTRY:-146.59.87.168:3000/lfg2025}/archy-demo-web:${IMAGE_TAG:-demo}
    container_name: archy-demo-web
    ports:
      - "${DEMO_WEB_PORT:-2100}:80"
    environment:
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
    depends_on:
      - neode-backend
    restart: unless-stopped

@ -1,6 +1,13 @@
# Archipelago Demo Stack - Mock backend + Vue UI + AIUI Chat
# Deploy via Portainer: Web editor -> paste this, or deploy from repo
# Access at http://localhost:4848
# Archipelago Public Demo Stack - Mock backend + Vue UI + AIUI Chat
# Deploy via Portainer: Web editor -> paste this, or deploy from repo (build).
# Access at http://localhost:2100
#
# This builds the demo images from source. For the separated, auto-updating
# deploy that pulls prebuilt :demo images, see demo-deploy/docker-compose.yml.
#
# DEMO=1 turns on the public multi-visitor sandbox: each visitor gets an
# isolated, ephemeral copy of all state; real container runtimes are never
# touched; the shared login password is "entertoexit".
#
# Required: Set ANTHROPIC_API_KEY in environment or .env file for chat to work
# IndeedHub is deployed as a separate Portainer stack (indee-demo repo)
@ -12,9 +19,13 @@ services:
dockerfile: neode-ui/Dockerfile.backend
container_name: archy-demo-backend
environment:
VITE_DEV_MODE: "existing"
DEMO: "1"
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
NODE_OPTIONS: "--dns-result-order=ipv4first"
# Optional tuning (defaults shown):
# DEMO_SESSION_TTL_MS: "2700000" # 45 min idle before a session is reaped
# DEMO_MAX_SESSIONS: "500" # concurrent visitor cap
# DEMO_FILE_QUOTA_BYTES: "52428800" # 50 MB uploads per visitor
expose:
- "5959"
dns:
@ -31,9 +42,11 @@ services:
build:
context: .
dockerfile: neode-ui/Dockerfile.web
args:
VITE_DEMO: "1"
container_name: archy-demo-web
ports:
- "4848:80"
- "2100:80"
depends_on:
- neode-backend
restart: unless-stopped

@ -5,7 +5,7 @@
> supersedes all prior roadmap/handoff/status docs. When the gate passes, remove
> the priority banner and demote this doc.
>
> Last updated: 2026-06-21 · Binary: v1.7.99-alpha
> Last updated: 2026-06-22 · Binary: v1.7.99-alpha · See §8b for the live resume.
---
@ -148,126 +148,214 @@ hardening; paid swarm streaming + IndeeHub source (`phase4-streaming-ecash-plan.
Meshroller Rust-native mesh AI (`meshroller-integration-design.md`); dual-ecash
phases 26 (`dual-ecash-design.md`).
## 8b. SESSION STATE + RESUME (2026-06-21, live)
## 8b. SESSION STATE + RESUME (updated 2026-06-22) — READ THIS FIRST ON RESUME
**Landed + committed on main this session (newest first):**
- **#20 phase 3 — ADOPTION PATH LIVE-VERIFIED on .228 (2026-06-21).** Built
v1.7.99-alpha, sideloaded binary + 7 manifests, restarted (stop/replace/start —
containers survived via --restart unless-stopped + podman-restart.service). RPC
`package.install indeedhub` → `complete`, orchestrator-first path adopted all 7
members (`reconcile action app_id=indeedhub-* action=NoOp`), containers stayed
**Up 4 days (NOT recreated)** — zero data/credential disruption. UI green:
frontend :7778 → 200, nostr-provider.js → 200, **/api/ → 200 (proves
network_aliases: frontend nginx `http://api:4000` resolved on indeedhub-net)**.
Fleet healthy (36 containers, none down).
**FRESH-CREATE PATH = FIXED + VERIFIED (2026-06-21).** Deleted the legacy
indeedhub orchestrator special-cases (`b73084db`, 382 lines: reconcile_indeedhub_stack,
start_indeedhub_backends, the 120s dependency-DNS gate, patch_indeedhub_nostr_provider,
etc.) so "indeedhub" flows through the generic install_fresh path. Then two live fixes
on .228: (1) frontend nginx needs `capabilities: [CHOWN,DAC_OVERRIDE,SETGID,SETUID]`
under the orchestrator's --cap-drop=ALL (workers died "setgid(101) failed"); manifest
fix `ff8f11b8`. (2) NOTE: manifest reload needs an archipelago restart (manifests
cached at startup) — a disk manifest edit alone won't take. RESULT: frontend
fresh-creates via install_fresh, caps applied, post_install hook FIRES
(copy_from_host nostr-provider.js ✅), UI 200 (/, /nostr-provider.js, /api/).
**KNOWN GAP (general hook capability, NOT blocking indeedhub):** the post_install
`exec` steps fail via `podman exec` from the archipelago.service systemd cgroup
(`crun: write cgroup.procs: Permission denied / OCI permission denied`). Harmless
here (image bakes the nginx config so the exec steps are no-ops; copy_from_host is
the one that matters and works). FIX = wrap the hook executor's `podman exec` in a
transient user scope (`systemd-run --user --scope`, like `podman_user_scope`) in
core/archipelago/src/container/hooks.rs::run_podman. Do before relying on exec hooks
for an app whose image does NOT pre-bake its mutations.
### Where we are — Task #20 (manifest lifecycle hooks) + indeedhub migration: DONE & 2-node verified
PRIOR (now resolved) — was: **FRESH-CREATE PATH = BLOCKED (found live 2026-06-21).** Removed the stateless
frontend + reinstalled to exercise install_fresh → it FAILED:
`orchestrator stack install indeedhub failed at app indeedhub: IndeedHub
dependencies were not ready within 120s (indeedhub-api dependency DNS not ready)`,
and the frontend was left down. Recovered manually on .228 (podman run w/ alias
indeedhub on indeedhub-net; UI 200). ROOT CAUSE = hardcoded indeedhub orchestrator
special-cases that predate + conflict with the manifest path:
- prod_orchestrator `ensure_running` ~L1377: `app_id=="indeedhub"` →
`reconcile_indeedhub_stack`, which REFUSES manifest creation when the frontend
is absent (returns Left("stack-managed")).
- `run_pre_start_hooks("indeedhub")` ~L2324 → `start_indeedhub_backends` →
`wait_for_indeedhub_dependencies_ready(120)` — the gate that blocked install_fresh
(`indeedhub_api_dependency_dns_ready` returns false while the frontend's own alias
is absent + a getent transiently fails).
- also `repair_indeedhub_network_aliases`, `patch_indeedhub_nostr_provider`, the
"frontend did not stay reachable; restart" path (~L2474), `INDEEDHUB_BACKEND_*`
consts, and a crash_recovery.rs indeedhub special-case.
**FIX (next, its own build/deploy/test cycle):** delete these special-cases now
that the manifest carries dependencies/network_aliases/post_install — route
"indeedhub" through the GENERIC install_fresh + reconcile path so the frontend
fresh-creates normally (hook fires). Then re-run the destructive lifecycle on .228
(frontend recreate must succeed + run the hook), then .198, then the gate.
NOTE: .228 currently runs v1.7.99-alpha (these special-cases still present) — the
running stack is fine (adoption NoOp); only a frontend-absent event re-triggers the
bug, and the frontend is up.
- `b1eea8c0` indeedhub (#20) **phase 3 — CODE COMPLETE, unit-tested.** 7 manifests (apps/indeedhub-{postgres,redis,minio,relay,api,
ffmpeg} + apps/indeedhub frontend) + install_indeedhub_stack orchestrator-first
(immich pattern). Data-preserving by construction = ADOPTION on .228: exact live
hyphen container names, named volumes indeedhub-*-data, dedicated indeedhub-net +
network_aliases [postgres|redis|minio|relay|api], generated_secrets reuse live
/var/lib/archipelago/secrets values (ensure_one no-ops on existing). Frontend
carries the post_install nginx hook (replaces patch_indeedhub_nostr_provider;
defensive since indeedhub:1.0.0 already bakes it). .228 GROUND TRUTH captured:
7 containers Up, volumes indeedhub-{postgres,redis,minio,relay}-data, network
indeedhub-net; frontend nginx upstreams api:4000/minio:9000/relay:8080; image
already bakes X-Frame strip + nostr-provider.js (6347B) + sub_filter.
**NEXT = live verify on .228:** build+sideload binary, restart, package.install
indeedhub → expect adoption (NoOp, no data touch), then full lifecycle. Risk:
service restart SIGKILL-cascade if Quadlet not fully shipped on .228.
- `b94b61f6` `network_aliases` manifest field (ContainerConfig) + podman_client &
quadlet rendering + DNS-label validation; also fixed 4 pre-existing from_manifest
test failures (network_policy: archy-net invalid; bind sources outside
/var/lib/archipelago). Enables indeedhub's short aliases on indeedhub-net.
- `955c54b7` hook capability (#20) **phase 2**: `container::hooks::run_post_install`
executor (podman exec + copy_from_host w/ allowlist canonicalise + symlink-escape
prefix check; best-effort/idempotent) wired into `install_fresh` after container
is up (fresh-container-only). 5 unit tests; `cargo test -p archipelago` green.
- `4c1a4e59` hook capability (#20) **phase 1**: `LifecycleHooks`/`HookStep`/`HostCopy`
schema + validate() + re-exports + 3 schema tests; also fixed 3 pre-existing
`ContainerConfig` test literals missing `generated_secrets` (container crate now
compiles; `cargo test -p archipelago-container` green, 53 pass).
- `f0c6b79d` immich containers named underscore (immich_server/_postgres/_redis) to
match runtime lifecycle code — fixes package.stop/start/restart. **immich fully
migrated + verified on .228** (manifest-driven stack via orchestrator).
- `b0b54a96` immich lifecycle bats suite (tests/lifecycle/bats/immich.bats).
- `d5ef4573`/`9e6c5370`/`011081d1` immich migration (rename→immich, orchestrator-first).
- `f160e0c4` podman-restart.service enabled at startup (reboot-survival).
- `0860dfac` Services-tab UI (backends→Services, parent icons, categories sub-nav, swipe).
- `220666d3`/`7bfbe8fe` registry-manifest infra phases 1+2 (consume + EMBED_MANIFESTS publish).
- `192238cb` docs consolidation 56→28 + CLAUDE.md.
- `03a4ee1b` generated-secrets system + companion/quadlet fixes.
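`b94b61f6` mentions DNS-label validation for `network_aliases`. The sketch below shows what such a check might look like. It assumes RFC 1123 label rules: 1 to 63 characters, letters, digits and hyphens only, and no leading or trailing hyphen. `validateNetworkAliases` is a hypothetical name, not the Rust function.

```javascript
// Hypothetical validator for `network_aliases` entries (RFC 1123 labels).
// First and last characters must be alphanumeric; up to 61 inner characters
// may also be hyphens, giving a total length of 1..63.
const DNS_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i

function validateNetworkAliases(aliases) {
  const errors = []
  for (const alias of aliases) {
    if (typeof alias !== 'string' || !DNS_LABEL.test(alias)) {
      errors.push(`invalid network alias: ${JSON.stringify(alias)}`)
    }
  }
  return errors // empty array means every alias is a valid label
}

console.log(validateNetworkAliases(['postgres', 'redis', 'minio', 'relay', 'api'])) // []
console.log(validateNetworkAliases(['api', 'bad_alias'])) // one error, for "bad_alias"
```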
Manifest-driven lifecycle hooks + the IndeedHub stack migration are **complete and
live-verified on BOTH .228 and .198** (adoption + fresh-create + post_install hook
exec, stable under load). 15 commits this session: `4c1a4e59`..`e2a012d0`. Working
tree clean. The release lifecycle gate is temporarily **5×** (was 20×; `ARCHY_ITERATIONS=5`).
**DONE — hook capability (#20), phases 1+2 (schema + executor + wiring):**
controlled post-install hooks so indeedhub/netbird can migrate. Design:
`docs/manifest-hooks-design.md`. Schema, validate(), executor, and install-path
wiring all landed + green (commits `4c1a4e59`/`955c54b7` above). Remaining #20
phases: 3 = indeedhub migration (NEXT, below); 4 = netbird; 5 = `pre_start` hooks
(type exists, NOT yet executed — wire into `prepare_for_start` if/when needed).
**Shipped (all on `main`, newest first):**
- `e2a012d0` indeedhub frontend health → `tcp:7777` (was http GET `/`; the http check
false-failed under load and the reconciler churned the frontend — fixed).
- `ff78b312` hook `exec` runs in a transient user scope
(`systemd-run --user --scope --quiet --collect podman exec …`) — fixes
"crun: write cgroup.procs: Permission denied" when exec'ing from archipelago.service.
- `ff8f11b8` indeedhub frontend caps `[CHOWN,DAC_OVERRIDE,SETGID,SETUID]` — nginx
workers died "setgid(101) failed" under the orchestrator's `--cap-drop=ALL`.
- `b73084db` DELETED the legacy indeedhub orchestrator special-cases (382 lines:
reconcile_indeedhub_stack, start_indeedhub_backends, the 120s dependency-DNS gate,
patch_indeedhub_nostr_provider, repair_indeedhub_network_aliases, INDEEDHUB_* consts)
→ "indeedhub" now uses the GENERIC install_fresh/reconcile path.
- `b1eea8c0` 7 indeedhub manifests (apps/indeedhub{,-postgres,-redis,-minio,-relay,-api,
-ffmpeg}) + `install_indeedhub_stack` orchestrator-first (immich pattern).
- `b94b61f6` `network_aliases` ContainerConfig field (podman_client + quadlet rendering,
DNS-label validated) — lets the frontend nginx reach `api:4000`/`minio:9000`/`relay:8080`
on the dedicated `indeedhub-net`.
- `955c54b7`/`4c1a4e59` #20 hooks phases 1-2: schema (LifecycleHooks/HookStep/HostCopy in
archipelago-container::manifest) + executor `container::hooks::run_post_install`
(allowlist-canonicalised copy_from_host + scoped exec), wired into `install_fresh`.
- `84031e62` gate 20×→5× (docs only: CLAUDE.md, this file, tests/lifecycle/TESTING.md).
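The copy_from_host guard described for `955c54b7` combines an allowlist, canonicalisation, and a symlink-escape prefix check. It can be sketched as below, under these assumptions: the allowlist root `/var/lib/archipelago` and the `assets/` paths are illustrative, and `realpath` is injected so the example runs against a fake filesystem (in Node it would be `fs.realpathSync`). The point is to compare *resolved* paths on a path-component boundary.

```javascript
// Resolve a path, returning null instead of throwing for missing/unreadable paths.
function safeRealpath(realpath, p) {
  try {
    return realpath(p) // follows symlinks and collapses "." / ".." segments
  } catch {
    return null
  }
}

// Hypothetical check: is `requested`, once canonicalised, inside an allowed root?
function isAllowedHostSource(requested, allowRoots, realpath) {
  const resolved = safeRealpath(realpath, requested)
  if (resolved === null) return false
  return allowRoots.some((root) => {
    const canonicalRoot = safeRealpath(realpath, root)
    if (canonicalRoot === null) return false
    const base = canonicalRoot.replace(/\/+$/, '')
    // Boundary-aware prefix check so "/srv/a" does not admit "/srv/ab".
    return resolved === base || resolved.startsWith(base + '/')
  })
}

// Fake filesystem: one regular file, one symlink that escapes, one sibling dir.
const fakeLinks = {
  '/var/lib/archipelago': '/var/lib/archipelago',
  '/var/lib/archipelago/assets/nostr-provider.js': '/var/lib/archipelago/assets/nostr-provider.js',
  '/var/lib/archipelago/assets/escape': '/etc/shadow',
  '/var/lib/archipelago-old/file': '/var/lib/archipelago-old/file',
}
const fakeRealpath = (p) => {
  if (!(p in fakeLinks)) throw new Error(`ENOENT: ${p}`)
  return fakeLinks[p]
}
const roots = ['/var/lib/archipelago']
console.log(isAllowedHostSource('/var/lib/archipelago/assets/nostr-provider.js', roots, fakeRealpath)) // true
console.log(isAllowedHostSource('/var/lib/archipelago/assets/escape', roots, fakeRealpath)) // false
```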
**NEXT — #20 phase 3, indeedhub migration:** author 7 member manifests
(postgres/redis/minio/relay/api/ffmpeg + frontend) on archy-net with container-name
hostnames; frontend carries the `post_install` hook (strip X-Frame-Options, copy
nostr-provider.js, inject script, nginx reload — see `patch_indeedhub_nostr_provider`
in install.rs:68 for exact ops); wire `install_indeedhub_stack` orchestrator-first;
generated_secrets: indeedhub-db-password/indeedhub-jwt/indeedhub-minio-password
(reuse live values); preserve hardcoded AES_MASTER_SECRET literal + minio user
"indeeadmin". Then netbird (assess its setup steps). Then single-container legacy
apps (add to `uses_orchestrator_install_flow` allowlist in install.rs + verify each).
Then the lifecycle gate (#6) — needs harness hardening (#18) + .228 bitcoin synced.
**Design = adoption-safe + manifest-driven.** Manifests reproduce the live install exactly
so existing nodes ADOPT (NoOp) instead of recreate: hyphen container_names the runtime
already references, named volumes `indeedhub-{postgres,redis,minio,relay}-data`,
`indeedhub-net` + network_aliases [postgres|redis|minio|relay|api], generated_secrets reuse
the live /var/lib/archipelago/secrets values (ensure_one no-ops on existing; postgres pw is
fixed at PGDATA init). minio user "indeeadmin" + AES_MASTER_SECRET literal kept. The
frontend image indeedhub:1.0.0 already bakes the iframe nginx (X-Frame omit + nostr-provider.js
+ sub_filter), so the post_install hook (sed X-Frame / copy nostr-provider.js / inject /
nginx reload) is defensive/idempotent. crash_recovery.rs's frontend-after-deps ordering
guard is KEPT on purpose (beneficial; not a blocker).
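The sketch below covers the two adoption-safe behaviours described above. First, `ensure_one` keeps existing secrets. Second, a reconcile decision returns NoOp when the live container already matches the manifest. Only `NoOp` appears in the logs quoted earlier. The `Create`/`Recreate` labels, the compared fields, and the `postgres:16` image are illustrative assumptions, not the reconciler's real logic.

```javascript
// ensure_one (sketch): generate a secret only when it is missing, so live values survive.
function ensureOne(secrets, name, generate) {
  if (!secrets.has(name)) secrets.set(name, generate())
  return secrets.get(name)
}

function sameSet(a = [], b = []) {
  const sa = new Set(a)
  const sb = new Set(b)
  return sa.size === sb.size && [...sa].every((x) => sb.has(x))
}

// Adopt (NoOp) when the live container already matches what the manifest
// would create; otherwise a recreate would be needed.
function reconcileAction(live, desired) {
  if (!live) return 'Create'
  const matches =
    live.name === desired.name &&
    live.image === desired.image &&
    live.network === desired.network &&
    sameSet(live.aliases, desired.aliases) &&
    sameSet(live.volumes, desired.volumes)
  return matches ? 'NoOp' : 'Recreate'
}

const desiredPostgres = {
  name: 'indeedhub-postgres',
  image: 'postgres:16', // placeholder tag for illustration
  network: 'indeedhub-net',
  aliases: ['postgres'],
  volumes: ['indeedhub-postgres-data'],
}
console.log(reconcileAction({ ...desiredPostgres }, desiredPostgres)) // NoOp
```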
**Test/deploy facts:** .228 = archi resilience node, UI/RPC pw `password123` (https),
SSH pw `archipelago`. Lifecycle harness runs from .116: `cd tests/lifecycle &&
ARCHY_HOST=192.168.1.228 ARCHY_SCHEME=https ARCHY_PASSWORD=password123
ARCHY_ALLOW_DESTRUCTIVE=1 ./run.sh <suite>`. RPC trigger: auth.login (sets session
+ csrf cookies) → send csrf cookie value as `X-CSRF-Token` header. package.install
needs `{"id":"<app>","dockerImage":"<any-valid-image>"}` (dockerImage required even
for stacks). Rust workspace root = `core/`. Linker `undefined hidden symbol` →
rebuild with `CARGO_INCREMENTAL=0`. immich on .228: app_id `immich`, containers
immich_server/immich_postgres/immich_redis, data dir owner 100998:100998.
### ⛔ GATE BLOCKER 2026-06-22 — `package.stop` ignores the per-app stop grace (REAL, fleet-wide, ROOT-CAUSED)
Step 1 (sync .228 tcp-health manifest) is **DONE + verified**. Step 2 (the 5× gate) surfaced a
real, fleet-wide `package.stop` bug — **reproduced on the CLEAN, quadlet-correct .198**, so it is a
genuine product bug, not node contamination. Root cause is fully pinned (below).
**Symptom.** `package.stop <app>` returns `{"status":"stopping"}` but the container **never stops**
(`container-list` shows `running` 60s+); the gate's `wait_for_container_status … stopped 60` times
out. Hits **fedimint, electrumx, bitcoin-knots, btcpay-server, immich** (slow-to-SIGTERM apps).
`filebrowser` passes because it exits on SIGTERM in <30s.
**ROOT CAUSE (from .198 journal during a live `package.stop fedimint`):**
```
WARN quadlet: systemctl --user stop fedimint.service timed out after 45s
ERROR runtime: package.stop fedimint failed: stop_container fedimint:
podman stop -t 30 fedimint timed out after 30s: deadline has elapsed
```
The orchestrator stop path **ignores the per-app graceful-stop table** and the wrapper deadline
equals the grace:
- `archipelago::api::rpc::package::runtime::stop_timeout_secs()` defines per-app grace
(**bitcoin 600s, lnd 330s, electrumx 300s, immich_postgres 120s, fedimint/btcpay 60s**, default 30).
The **legacy** stop paths use it (runtime.rs:329/607/1060 `podman stop -t <stop_timeout_secs>`).
- The **orchestrator** path does NOT: `prod_orchestrator::stop()` → `ContainerRuntime::stop_container`
(`container/src/runtime.rs:124`) → API `PodmanClient::stop_container` hardcodes **`?t=10`**
(podman_client.rs) and the CLI fallback hardcodes **`-t 30`** (runtime.rs:128). fedimint needs 60s
but gets 10s/30s ⇒ SIGTERM grace expires; the API/CLI stop errors out and the whole stop fails →
state reverts to `running`.
- **Compounding:** `PODMAN_CLI_DEFAULT_TIMEOUT = 30s` (runtime.rs:9) wraps `podman stop -t 30`, so
the await fires **exactly** when podman would SIGKILL → "timed out after 30s" even though the kill
would land a moment later. The wrapper deadline must exceed the `-t` grace.
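The compounding race fits in a few lines. This is a hypothetical JavaScript model, not podman's implementation. The `exitsAfter: 45` value for a slow-to-stop app and the 1-second SIGKILL latency are made-up numbers, chosen to reproduce the shape of the fedimint failure.

```javascript
// Hypothetical model of one `podman stop -t <grace>` call awaited under a
// wrapper deadline. All times are seconds after SIGTERM is sent.
//   exitsAfter:  when the app would exit on its own (Infinity = ignores SIGTERM)
//   grace:       the -t value; podman sends SIGKILL when it elapses
//   deadline:    how long the caller awaits podman before reporting a timeout
//   killLatency: assumed time for SIGKILL to land and podman to return
function simulateStop({ exitsAfter, grace, deadline, killLatency = 1 }) {
  const podmanReturnsAt = exitsAfter < grace ? exitsAfter : grace + killLatency
  // This model counts a tie as a timeout: the await fires first.
  return podmanReturnsAt < deadline ? 'stopped' : 'wrapper-timeout'
}

// Before the fix: grace hardcoded to 30s and the wrapper deadline also 30s.
console.log(simulateStop({ exitsAfter: 45, grace: 30, deadline: 30 })) // wrapper-timeout
// After the fix: a 60s per-app grace plus a 15s buffer on the deadline.
console.log(simulateStop({ exitsAfter: 45, grace: 60, deadline: 75 })) // stopped
// The proposed regression case: an app that never honours SIGTERM.
console.log(simulateStop({ exitsAfter: Infinity, grace: 60, deadline: 75 })) // stopped
```

The model reproduces the observed behaviour. Whenever `deadline <= grace`, a stop that needs the SIGKILL can never report success, so an app that exits quickly (like filebrowser) hides the bug.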
**FIX (two parts, design choice flagged):**
1. **Thread the per-app stop grace into the orchestrator stop path.** Either (A) move/duplicate
`stop_timeout_secs` into the `container` crate and have `stop_container` use it, (B) extend the
`ContainerRuntime::stop_container` signature to take a `grace: Duration` and have
`prod_orchestrator::stop()` compute it from the loaded manifest, or **(C, north-star-aligned)**
add a `stop_grace_secs` field to the manifest (default 30) and read it from `lm.manifest` in
`stop()`. (C) is the manifest-driven choice; bitcoin/lnd/electrumx/fedimint manifests then declare
their value. **DECISION NEEDED from owner: A/B (fast, table-based) vs C (manifest-driven).**
2. **Make the CLI/API wrapper deadline = grace + buffer** (e.g. grace + 15s) so podman's SIGKILL
completes inside the await. Apply to both `PodmanClient::stop_container` (`?t=`+HTTP timeout) and
the `runtime.rs` CLI fallback (`-t`+`PODMAN_CLI_DEFAULT_TIMEOUT`).
Add a mock-orchestrator test: a container that ignores SIGTERM for >30s must still end `stopped`.
**Build/deploy after the fix:** `cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`
→ sideload to .228 + .198 (stop archipelago, cp binary, start) → **re-quadletize .228** (its backend
`.container` files are gone from my cascade-gate contamination — reinstall its apps so units
regenerate, matching .198) → re-run the canonical gate (DESTRUCTIVE only).
### ✅/⚠️ FIX SHIPPED + VALIDATED 2026-06-22 — and the gate has MORE causes than the grace bug
**Done:** the grace fix is implemented (option **C+table fallback**: manifest `stop_grace_secs` →
`stop_grace_secs_for()` table; deadline = grace + 15s), unit-tested (3 tests green), committed
(`2dad64b2`), release-built, and **deployed to BOTH .228 and .198** (active, UI 200). Quadlet
regression suite green (37/37). **Validated:** healthy app `vaultwarden` stops cleanly on .198
(running→exited→removed) — no regression; the deployed binary's stop path works.
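The sketch below shows the shipped resolution order in JavaScript: manifest `stop_grace_secs` first, then the table, then 30s, with a deadline of grace + 15s. The actual code is the Rust `stop_grace_secs_for()` path. Table values are the ones listed for `stop_timeout_secs()`. The exact table keys (for example `btcpay-server`) are assumptions.

```javascript
// Hypothetical mirror of the resolution order; values from stop_timeout_secs().
const STOP_GRACE_TABLE = {
  bitcoin: 600,
  lnd: 330,
  electrumx: 300,
  immich_postgres: 120,
  fedimint: 60,
  'btcpay-server': 60, // key name is an assumption
}
const DEFAULT_STOP_GRACE_SECS = 30
const STOP_DEADLINE_BUFFER_SECS = 15 // wrapper deadline = grace + buffer

function stopGraceSecs(appId, manifest = {}) {
  const declared = manifest.stop_grace_secs
  if (Number.isInteger(declared) && declared > 0) return declared // manifest wins
  return Object.hasOwn(STOP_GRACE_TABLE, appId)
    ? STOP_GRACE_TABLE[appId] // legacy per-app table
    : DEFAULT_STOP_GRACE_SECS
}

// The grace feeds podman's -t (or the API's ?t=); the deadline bounds the await.
function stopPlan(appId, manifest = {}) {
  const graceSecs = stopGraceSecs(appId, manifest)
  return { graceSecs, deadlineSecs: graceSecs + STOP_DEADLINE_BUFFER_SECS }
}

console.log(stopPlan('fedimint'))                           // { graceSecs: 60, deadlineSecs: 75 }
console.log(stopPlan('filebrowser'))                        // { graceSecs: 30, deadlineSecs: 45 }
console.log(stopPlan('electrumx', { stop_grace_secs: 90 })) // { graceSecs: 90, deadlineSecs: 105 }
```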
**But validation revealed the gate failures are MULTI-CAUSED — the grace bug is only one of ~5:**
1. ✅ FIXED — orchestrator ignored per-app stop grace (`podman stop -t 30` spurious 30s timeout).
2. ⛔ **`fedimint` is crash-looping / unhealthy on BOTH nodes** (`health_monitor: Auto-restarting
unhealthy container: fedimint`, attempt 6/10). An app that won't stay up can't be cleanly
stopped — fedimint was a *confounded* test case. Needs a fedimint-health investigation
(why is its container unhealthy / why does host port 8173 not become reachable).
`health_monitor` DOES respect `user_stopped` (health_monitor.rs:983) so that part is correct.
3. ⛔ **Host-listener repair watchdog** (`prod_orchestrator`: "host listener disappeared after
startup; restarting container app_id=fedimint") restarts containers whose launch port isn't
reachable — fights any stop of a port-unreachable app.
4. ⚠️ **State-model nuance:** `vaultwarden` showed `exited` → `absent`, never `stopped`; the gate waits
for exactly `"stopped"` (`wait_for_container_status … stopped`). The `Exited→Stopped` conversion
(server.rs:1191, needs `user_stopped.contains(id)`) isn't always firing — likely an id-vs-name
key mismatch. The gate may need to accept `exited`/`absent` as terminal, or the conversion fixed.
5. ⚠️ **Grace vs gate-timeout:** `electrumx` grace is 300s; if it ignores SIGQUIT the container
only dies at the 300s SIGKILL — far past the gate's 60s wait. `-t` is a *ceiling*, so a HEALTHY
electrumx that honours SIGQUIT stops fast; an unhealthy/ignoring one blows the gate window.
Decide: trim graces, make the gate's per-app stop-wait ≥ grace, or both.
6. ⚠️ **.228 contamination** (plain podman, no quadlet units) — my cascade-gate; re-quadletize.
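Item 4 offers two options, and the sketch below covers both. The first is a gate predicate that treats `exited`/`absent` as terminal after a user stop. The second is an Exited→Stopped conversion that looks up `user_stopped` under both the app id and the container name, which is one way to remove the suspected id-vs-name mismatch. The function names are hypothetical: the gate itself is a shell harness, and the conversion lives in `server.rs`.

```javascript
const TERMINAL_AFTER_STOP = new Set(['stopped', 'exited', 'absent'])

// Gate side: accept any terminal state once a stop has been requested.
function isTerminalAfterStop(status) {
  return TERMINAL_AFTER_STOP.has(String(status).toLowerCase())
}

// Server side: report an exited container as stopped when the user stopped it,
// whichever key (app id or container name) the stop was recorded under.
function reportedState(rawState, appId, containerName, userStopped) {
  const stoppedByUser = userStopped.has(appId) || userStopped.has(containerName)
  return rawState === 'exited' && stoppedByUser ? 'stopped' : rawState
}

console.log(isTerminalAfterStop('absent')) // true
console.log(reportedState('exited', 'immich', 'immich_server', new Set(['immich_server']))) // stopped
```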
**Bottom line:** the grace fix is correct and shipped, but **the gate will not go green until #2 through #6
are addressed**. These are pre-existing product/health issues the gate is correctly surfacing, not
regressions from this work. They need owner prioritization (esp. fedimint health, the
watchdog-vs-stop interaction, and the gate's terminal-state acceptance).
**Quadlet context (still true, but SEPARATE from the bug above):** quadlet IS the intended backend
runtime — .198 has the backend `.container` files (bitcoin-knots/btcpay-server/fedimint/filebrowser/
indeedhub/gitea/grafana/botfights/…). .228 lost them (only UI companions + home-assistant remain;
`bitcoin-core.container` is `.disabled-20260506`) **because my cascade-gate uninstalled its apps and
my `package.start` restore recreated them as bare `podman run --restart=unless-stopped`** without
regenerating units. Two related hardening items: (a) `package.start` should regenerate a missing
quadlet unit, not fall back to bare podman; (b) re-survey the status doc's "Quadlet-everywhere ~96%"
from `.container`-file presence + `PODMAN_SYSTEMD_UNIT`, not from "container running".
The **stop→stopped STATE reporting is correct** once the container actually stops (server.rs:1334
keeps a `--rm`'d app visible as `Stopped` via the `user_stopped` guard — proven on filebrowser); the
bug is purely "container never stops", not "state not reported".
### MY-SESSION ERRATA (own it on resume)
- I ran the gate with `ARCHY_ALLOW_CASCADE_DESTRUCTIVE=1`, which is **NOT** the canonical gate (that
is `ARCHY_ALLOW_DESTRUCTIVE=1` only — stop/start/restart, no uninstall/reinstall; see run-20x.sh
"Suggested release-gate invocation"). Cascade ran uninstall/reinstall on every app and, when I
killed the run mid-iteration, left bitcoin-knots/electrumx/btcpay/fedimint/immich uninstalled or
stranded. **I fully restored .228** (reinstalled bitcoin-knots with the correct image
`146.59.87.168:3000/lfg2025/bitcoin-knots:latest`; started the rest; cleared a stale
`user-stopped.json`). Verified healthy: UI 200, 35 containers, 17 apps `running`.
- Reinstall gotcha: `package.install` needs a REAL image ref in `dockerImage`; a bare app name
→ `Invalid Docker image format`.
### NEXT STEPS (in order)
1. ✅ **DONE** — root-caused the stop-grace bug, fixed it (commit `2dad64b2`), unit-tested,
release-built, **deployed to .198 + .228**, validated no-regression (vaultwarden stops on .198).
2. ⛔ **fedimint health** — why is its container unhealthy on both nodes (health_monitor restart
6/10; host port 8173 unreachable)? A crash-looping app can't pass the lifecycle gate. Likely the
real top blocker now. Same lens for any other unhealthy app surfaced by the gate.
3. ⛔ **Host-listener repair vs user-stop** — the launch-port watchdog
(`prod_orchestrator`: "host listener disappeared after startup; restarting container") must NOT
restart a container the user just stopped. Check it consults `disabled`/`user_stopped`.
4. ⚠️ **Gate terminal-state acceptance** — apps end `exited`/`absent`, not always `stopped`
(Exited→Stopped conversion at server.rs:1191 needs a matching `user_stopped` key). Either fix the
conversion (id-vs-name) or have `wait_for_container_status … stopped` accept exited/absent.
5. ⚠️ **Grace vs gate-timeout** — trim over-long graces (electrumx 300s) and/or make the gate's
per-app stop-wait ≥ the app's grace.
6. **Re-quadletize .228** (backend `.container` files wiped by my cascade-gate; reinstall its apps so
units regenerate, matching .198; verify `.container` + `PODMAN_SYSTEMD_UNIT`).
7. **Run the canonical gate** `ARCHY_ALLOW_DESTRUCTIVE=1 ARCHY_ITERATIONS=5` (NO cascade; never kill
mid-iteration) on .198 then .228. Green = Step-2-of-plan done.
8. Hardening: `package.start` should regenerate a missing quadlet unit, not fall back to bare podman;
re-survey the status doc's quadlet % from `.container`-file presence.
9. **netbird migration (#20 phase 4)** — same pattern; assess setup steps first (TLS cert gen,
config files, resolver IP — may need host-file-write hooks beyond exec/copy_from_host; legacy is
install_netbird_stack in stacks.rs).
10. Then single-container legacy apps onto the orchestrator install flow; then demote the banner.
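Step 3 amounts to a guard in front of the host-listener repair. The sketch below is hypothetical: the function name and parameter shape are assumptions, while `disabled` and `user_stopped` are the state names used above.

```javascript
// Hypothetical guard for the host-listener repair watchdog: an unreachable
// launch port triggers a restart only if the user has not stopped or disabled
// the app.
function shouldRepairHostListener(appId, { portReachable, disabled, userStopped }) {
  if (portReachable) return false          // listener is fine, nothing to repair
  if (disabled.has(appId)) return false    // app is disabled: leave it down
  if (userStopped.has(appId)) return false // user asked for a stop: respect it
  return true
}

const none = new Set()
console.log(shouldRepairHostListener('fedimint', { portReachable: false, disabled: none, userStopped: none })) // true
console.log(shouldRepairHostListener('fedimint', { portReachable: false, disabled: none, userStopped: new Set(['fedimint']) })) // false
```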
### KNOWN ISSUES / WATCH-OUTS
- **.198 is a weak/loaded node** (load avg ~35). The generic reconcile recreates
containers it deems unhealthy; under load, false-failing health checks → churn. The
tcp-health fix (`e2a012d0`) mitigated the frontend case. If the lifecycle gate churns on
.198, look for other apps whose http health checks false-fail under load → prefer tcp.
- **Many concurrent SSH sessions to .198 wedge its sshd** (MaxStartups) — it pings but SSH
hangs for minutes. Use ONE ssh at a time to .198; `pkill -f 192.168.1.198` to clear strays.
- Hook `exec` only works in the scoped form (committed). `copy_from_host` is direct `cp`.
### DEPLOY / VERIFY FACTS (both nodes, ISO Debian, glibc 2.41 — binary built on .116 runs on both)
- **Build:** `cd core && CARGO_INCREMENTAL=0 cargo build --release -p archipelago`
(~12 min, opt-level=3). Binary at `core/target/release/archipelago`. Linker
"undefined hidden symbol" → rebuild with CARGO_INCREMENTAL=0. `archipelago` is a
bin-only crate (no lib). Filtered tests: `cargo test -p archipelago --bin archipelago -- hooks quadlet`.
- **Sideload:** `scp binary $H:/tmp/archipelago-new` → `sudo systemctl stop archipelago;
sudo cp /tmp/archipelago-new /usr/local/bin/archipelago; sudo chmod +x …; sudo systemctl
start archipelago`. Containers SURVIVE the restart (--restart unless-stopped +
podman-restart.service). Binary path is /usr/local/bin/archipelago.
- **Manifests** live at /opt/archipelago/apps/<app_id>/manifest.yml (root-owned ok). The
orchestrator CACHES them at startup → **edit on disk then RESTART archipelago to reload**.
Bulk deploy: `tar czf t.tgz -C apps indeedhub indeedhub-postgres indeedhub-redis
indeedhub-minio indeedhub-relay indeedhub-api indeedhub-ffmpeg`; scp; `sudo tar xzf t.tgz
-C /opt/archipelago/apps`.
- **Nodes:** .228 = 192.168.1.228, SSH pw `archipelago`, RPC/UI pw `password123` (https).
.198 = 192.168.1.198, SSH pw `archipelago`, **RPC/UI pw `ThisIsWeb54321@`** (https). Both
have the 7-container indeedhub stack + secrets + named volumes pre-existing.
- **Trigger install via RPC:** `auth.login` (sets session+csrf cookies) → send the csrf
cookie value as `X-CSRF-Token` header → `package.install` with params
`{"id":"indeedhub","dockerImage":"<any-valid-image>"}` (dockerImage required even for stacks; install
is async → returns `{"status":"installing"}`). install logs go to
/var/log/archipelago/container-installs.log (best-effort) AND journalctl -u archipelago.
- **Fresh-create test recipe:** `podman rm -f indeedhub` (stateless frontend) → package.install
indeedhub → expect install_fresh + post_install hook (all 4 steps `ok`) + UI 200 on :7778
(/ , /nostr-provider.js, /api/). On adoption the frontend is NoOp (hook does NOT run —
install_fresh is the only hook trigger).
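The RPC trigger recipe can be sketched as a small Node client. Several things here are assumptions to check against the server: the endpoint is `/rpc/v1` (as in the mock backend), `auth.login` takes `{ password }`, the JSON-RPC envelope shape, and the CSRF cookie's name containing `csrf`. The sketch needs Node 20+ for the global `fetch` and `Headers.getSetCookie()`.

```javascript
// Parse Set-Cookie header values into { name: value } (attributes ignored).
function cookiesFromSetCookie(setCookieHeaders) {
  const jar = {}
  for (const header of setCookieHeaders) {
    const pair = header.split(';')[0]
    const eq = pair.indexOf('=')
    if (eq > 0) jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim()
  }
  return jar
}

// Assumption: the CSRF cookie is the one whose name contains "csrf".
function csrfToken(jar) {
  const name = Object.keys(jar).find((k) => k.toLowerCase().includes('csrf'))
  return name === undefined ? null : jar[name]
}

async function rpc(baseUrl, method, params, headers = {}) {
  const res = await fetch(`${baseUrl}/rpc/v1`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', ...headers },
    body: JSON.stringify({ jsonrpc: '2.0', id: 1, method, params }),
  })
  if (!res.ok) throw new Error(`${method}: HTTP ${res.status}`)
  return res
}

// auth.login (sets session + csrf cookies) → package.install with X-CSRF-Token.
async function installApp(baseUrl, password, id, dockerImage) {
  const login = await rpc(baseUrl, 'auth.login', { password })
  const jar = cookiesFromSetCookie(login.headers.getSetCookie())
  const cookieHeader = Object.entries(jar).map(([k, v]) => `${k}=${v}`).join('; ')
  const res = await rpc(baseUrl, 'package.install', { id, dockerImage }, {
    Cookie: cookieHeader,
    'X-CSRF-Token': csrfToken(jar) ?? '',
  })
  // Install is async on the server; expect a "installing" status in the reply.
  return res.json()
}
```

Usage would look like `await installApp('https://192.168.1.228', 'password123', 'indeedhub', '<real image ref>')`. If the node's HTTPS certificate is self-signed, a lab-only run needs `NODE_TLS_REJECT_UNAUTHORIZED=0` in the environment.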
## 9. Documentation map (what survives)


@ -20,6 +20,12 @@ RUN find public/assets -name "*backup*" -type f -delete || true && \
ENV DOCKER_BUILD=true
ENV NODE_ENV=production
# Public-demo build flag — inlined into the bundle (import.meta.env.VITE_DEMO).
# Enables the per-day intro replay, the "entertoexit" login hint, and other
# demo-only UI affordances. Override with --build-arg VITE_DEMO=0 for a plain build.
ARG VITE_DEMO=1
ENV VITE_DEMO=$VITE_DEMO
# Use npm script which handles build better
RUN npm run build:docker || (echo "Build failed! Listing files:" && ls -la && echo "Checking vite config:" && cat vite.config.ts && exit 1)
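The `VITE_DEMO` flag above enables a per-day intro replay. The sketch below is a hypothetical version of such a gate. The storage key and the local-date rule are assumptions; the real logic lives in `useDemoIntro.ts`, which is not part of this hunk.

```javascript
// Hypothetical once-per-calendar-day gate for a demo intro.
const INTRO_KEY = 'demo-intro-last-shown'

// Local calendar day as YYYY-MM-DD.
function localDayKey(date) {
  const y = date.getFullYear()
  const m = String(date.getMonth() + 1).padStart(2, '0')
  const d = String(date.getDate()).padStart(2, '0')
  return `${y}-${m}-${d}`
}

// `storage` is anything with getItem/setItem (window.localStorage in a browser).
// `demoEnabled` stands in for a check like import.meta.env.VITE_DEMO === '1'.
function shouldPlayIntro(storage, demoEnabled, now = new Date()) {
  if (!demoEnabled) return false
  const today = localDayKey(now)
  if (storage.getItem(INTRO_KEY) === today) return false
  storage.setItem(INTRO_KEY, today)
  return true
}

// In-memory stand-in for localStorage, for trying it outside a browser.
function memoryStorage() {
  const m = new Map()
  return {
    getItem: (k) => (m.has(k) ? m.get(k) : null),
    setItem: (k, v) => { m.set(k, String(v)) },
  }
}
```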


@ -17,14 +17,34 @@ import fs from 'fs/promises'
import path from 'path'
import { fileURLToPath } from 'url'
import Docker from 'dockerode'
import { AsyncLocalStorage } from 'node:async_hooks'
import crypto from 'crypto'
const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)
const execPromise = promisify(exec)
// DEMO mode: public, multi-visitor sandbox. Each visitor gets an isolated,
// ephemeral copy of all mutable state (see per-session store below), real
// container runtimes are never touched, and idle sessions are reaped.
// When DEMO is off, behaviour is identical to the classic single-user dev mock.
const DEMO =
process.env.DEMO === '1' ||
process.env.VITE_DEMO === '1' ||
process.env.VITE_DEV_MODE === 'demo'
// Find container socket: Podman (macOS/Linux) or Docker
import { existsSync } from 'fs'
import { existsSync, readFileSync } from 'fs'
// Report the real app version, suffixed with -demo in the public sandbox so it's
// obviously the demo while still tracking whatever version the UI ships.
let APP_VERSION = '0.1.0'
try {
const pkg = JSON.parse(readFileSync(new URL('./package.json', import.meta.url), 'utf-8'))
if (pkg.version) APP_VERSION = pkg.version
} catch { /* fall back to default */ }
if (DEMO) APP_VERSION += '-demo'
function findContainerSocket() {
// DOCKER_HOST env var (set by podman machine start)
@ -47,9 +67,13 @@ function findContainerSocket() {
return null
}
const containerSocket = findContainerSocket()
// In DEMO mode we never bind to a real container runtime — the public demo must
// be host-independent and unable to touch the host's Docker/Podman.
const containerSocket = DEMO ? null : findContainerSocket()
const docker = containerSocket ? new Docker({ socketPath: containerSocket }) : null
if (containerSocket) {
if (DEMO) {
console.log('[Container] DEMO mode — simulation only (real runtime disabled)')
} else if (containerSocket) {
console.log(`[Container] Socket: ${containerSocket}`)
} else {
console.log('[Container] No socket found — simulation mode (no Docker/Podman)')
@ -84,12 +108,25 @@ app.use((req, res, next) => {
})
app.use(cookieParser())
// DEMO: bind every request to an isolated per-visitor state store (keyed by the
// `demo_sid` cookie) for the remainder of the request. Outside DEMO this is a
// no-op and all handlers share the single default store (classic mock behaviour).
app.use((req, res, next) => {
if (!DEMO) return next()
const store = resolveSessionStore(req, res)
stateContext.run(store, () => next())
})
// Mock session storage
const sessions = new Map()
const MOCK_PASSWORD = 'password123'
// Public demo uses a memorable shared password (shown on the login screen);
// the classic dev mock keeps password123.
const MOCK_PASSWORD = DEMO ? 'entertoexit' : 'password123'
// Mutable wallet state — faucet/send/receive modify these values
const walletState = {
// Mutable wallet state — faucet/send/receive modify these values.
// SEED_* objects are pristine templates; each demo session gets a deep clone
// (see makeSessionStore). `walletState` itself becomes a session-aware proxy below.
const SEED_WALLET = {
onchain_sats: 2_350_000,
channel_sats: 8_250_000,
ecash_sats: 250_000,
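The per-session middleware in this hunk relies on `AsyncLocalStorage` plus a Proxy. The sketch below is a reduced, hypothetical version of that pattern, written as CommonJS (the real file is an ES module). `makeSessionStore` and `currentStore` are named in the diff, but their bodies are not shown, so these versions are guesses. The example shows that a write in one session is invisible to another.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

const SEED_WALLET = { onchain_sats: 2_350_000, channel_sats: 8_250_000 }

// Each session gets a deep clone of the seed plus its own WebSocket set.
function makeSessionStore() {
  return { wallet: structuredClone(SEED_WALLET), sockets: new Set() }
}

const stateContext = new AsyncLocalStorage()
const defaultStore = makeSessionStore()
// Outside stateContext.run() (e.g. at startup) fall back to the default store.
const currentStore = () => stateContext.getStore() ?? defaultStore

// Forward every read/write to the active session's wallet, so existing
// handlers can keep using `walletState.x` unchanged.
const walletState = new Proxy({}, {
  get: (_target, key) => currentStore().wallet[key],
  set: (_target, key, value) => {
    currentStore().wallet[key] = value
    return true
  },
  has: (_target, key) => key in currentStore().wallet,
  ownKeys: () => Reflect.ownKeys(currentStore().wallet),
  getOwnPropertyDescriptor: (_target, key) =>
    Reflect.getOwnPropertyDescriptor(currentStore().wallet, key),
})

// Two visitors: a spend in session A is invisible to session B.
const sessionA = makeSessionStore()
const sessionB = makeSessionStore()
stateContext.run(sessionA, () => { walletState.onchain_sats -= 100_000 })
const balanceA = stateContext.run(sessionA, () => walletState.onchain_sats)
const balanceB = stateContext.run(sessionB, () => walletState.onchain_sats)
console.log(balanceA, balanceB) // 2250000 2350000
```

In the Express middleware, `stateContext.run(store, () => next())` plays the role of the `run` calls here. `AsyncLocalStorage` propagates the store through the awaits and callbacks that follow.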
@ -103,7 +140,7 @@ const walletState = {
}
function randomHex(bytes) { return Array.from({length: bytes}, () => Math.floor(Math.random()*256).toString(16).padStart(2,'0')).join('') }
const bitcoinRelayMockState = {
const SEED_BTCRELAY = {
settings: {
enabled_for_peers: true,
allow_peer_requests: true,
@ -144,77 +181,45 @@ const bitcoinRelayMockState = {
],
}
// User state (simulated file-based storage)
let userState = {
setupComplete: false,
onboardingComplete: false,
passwordHash: null, // In real app, this would be bcrypt hash
}
let mockState = { analyticsEnabled: false }
// Initialize user state based on dev mode
function initializeUserState() {
switch (DEV_MODE) {
// User state (simulated file-based storage). Returns a fresh object per session.
// In DEMO mode the effective dev mode is "onboarding" so the intro/onboarding can
// play for each new visitor (the per-day replay gate lives in the frontend).
function seedUserState() {
const mode = DEMO ? 'onboarding' : DEV_MODE
switch (mode) {
case 'setup':
// Setup mode: Original StartOS node setup - user needs to set password
// This is the simple password setup, NOT the experimental onboarding
userState = {
setupComplete: false, // User hasn't set password yet
onboardingComplete: false, // Onboarding not relevant for setup mode
passwordHash: null,
}
break
// Setup mode: user needs to set a password (simple setup, not onboarding).
return { setupComplete: false, onboardingComplete: false, passwordHash: null }
case 'onboarding':
// Onboarding mode: Experimental onboarding flow
// User has set password (via setup) but needs to go through experimental onboarding
userState = {
setupComplete: true, // Password already set
onboardingComplete: false, // Needs experimental onboarding
passwordHash: MOCK_PASSWORD,
}
break
// Password already set; visitor still needs to go through onboarding/intro.
return { setupComplete: true, onboardingComplete: false, passwordHash: MOCK_PASSWORD }
case 'existing':
// Existing user: Fully set up, just needs to login
userState = {
setupComplete: true,
onboardingComplete: true,
passwordHash: MOCK_PASSWORD,
}
break
// Fully set up, just needs to log in.
return { setupComplete: true, onboardingComplete: true, passwordHash: MOCK_PASSWORD }
case 'boot':
// Boot mode: Simulate server startup delay (shows boot screen)
// Server responds with 502 for the first 10 seconds, then works like onboarding mode
userState = {
setupComplete: true,
onboardingComplete: false,
passwordHash: MOCK_PASSWORD,
}
break
// Simulate server startup delay (boot screen), then behave like onboarding.
return { setupComplete: true, onboardingComplete: false, passwordHash: MOCK_PASSWORD }
default:
// Default: Fully set up (for UI development)
userState = {
setupComplete: true,
onboardingComplete: true,
passwordHash: MOCK_PASSWORD,
}
// Default: fully set up (for UI development).
return { setupComplete: true, onboardingComplete: true, passwordHash: MOCK_PASSWORD }
}
console.log(`[Auth] Dev mode: ${DEV_MODE}`)
console.log(`[Auth] Setup: ${userState.setupComplete}, Onboarding: ${userState.onboardingComplete}`)
}
initializeUserState()
function seedMockState() {
return { analyticsEnabled: false }
}
// WebSocket clients for broadcasting updates
const wsClients = new Set()
console.log(`[Auth] Dev mode: ${DEV_MODE}${DEMO ? ' (DEMO multi-session)' : ''}`)
// Helper: Broadcast data update to all WebSocket clients
// Broadcast a data-update patch to the WebSocket clients of the CURRENT session
// only (so demo visitors never see each other's state). Outside a request context
// (e.g. startup) this resolves to the default store, matching single-user mode.
function broadcastUpdate(patch) {
const message = JSON.stringify({
rev: Date.now(),
patch: patch
})
wsClients.forEach(client => {
currentStore().sockets.forEach(client => {
if (client.readyState === 1) { // OPEN
client.send(message)
}
@ -676,10 +681,10 @@ async function uninstallPackage(id) {
}
// Mock data
const mockData = {
const SEED_MOCKDATA = {
'server-info': {
id: 'archipelago-demo',
version: '0.1.0',
version: APP_VERSION,
name: 'Archipelago',
pubkey: 'a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456',
'status-info': {
@ -907,9 +912,12 @@ app.get('/rpc/v1', (_req, res) => {
.send(`JSON-RPC is available at /rpc/v1 for POST requests only.\nOpen the dashboard at http://localhost:${uiPort}/.\n`)
})
// DEMO runs on a testnet (signet) so visitors can play with worthless coins.
const DEMO_CHAIN = DEMO ? 'signet' : 'main'
function mockBitcoinBlockchainInfo() {
return {
chain: 'main',
chain: DEMO_CHAIN,
blocks: 902418,
headers: 902418,
bestblockhash: randomHex(32),
@ -948,7 +956,7 @@ function bitcoinRelayStatusPayload() {
synced: true,
blocks: 902418,
headers: 902418,
chain: 'main',
chain: DEMO_CHAIN,
status_ok: true,
status_stale: false,
error: null,
@ -1142,6 +1150,11 @@ app.post('/rpc/v1', (req, res) => {
return res.json({ result: true })
}
case 'app.filebrowser-token': {
// The Cloud/Files UI exchanges this for a filebrowser auth cookie.
// The mock filebrowser endpoints don't validate it, so any token works.
return res.json({ result: { token: `demo-fb-${Date.now().toString(36)}` } })
}
case 'node.did': {
const mockDid = 'did:key:z6MkpTHR8VNsBxYAAWHut2Geadd9jSwuBV8xRoAnwWsdvktH'
const mockPubkey = 'a1b2c3d4e5f6789012345678901234567890abcdef1234567890abcdef123456'
@ -3012,10 +3025,10 @@ app.post('/rpc/v1', (req, res) => {
case 'update.status': {
return res.json({
result: {
current_version: '0.1.0',
latest_version: '0.1.1',
update_available: true,
release_notes: 'Bug fixes and performance improvements.',
current_version: APP_VERSION,
latest_version: APP_VERSION,
update_available: false,
release_notes: 'You are running the latest demo build.',
channel: 'stable',
},
})
@ -3251,7 +3264,7 @@ app.post('/rpc/v1', (req, res) => {
// =============================================================================
// Mock FileBrowser API (for Cloud page in demo/Docker deployments)
// =============================================================================
const MOCK_FILES = {
const SEED_FILES = {
'/': [
{ name: 'Music', path: '/Music', size: 0, modified: '2025-03-01T10:00:00Z', isDir: true, type: '' },
{ name: 'Documents', path: '/Documents', size: 0, modified: '2025-02-28T14:30:00Z', isDir: true, type: '' },
@@ -3298,7 +3311,7 @@ const MOCK_FILES = {
],
}
const MOCK_FILE_CONTENTS = {
const SEED_FILE_CONTENTS = {
'/Documents/bitcoin-whitepaper-notes.md': `# Bitcoin Whitepaper Notes\n\n## Key Concepts\n\n### Peer-to-Peer Electronic Cash\n- No trusted third party needed\n- Double-spending solved via proof-of-work\n- Longest chain = truth\n\n### Proof of Work\n- SHA-256 based hashing\n- Difficulty adjusts every 2016 blocks (~2 weeks)\n- Incentive: block reward + transaction fees\n\n## My Thoughts\n- The 21M supply cap is genius - digital scarcity\n- Lightning Network solves the scaling concern\n- Self-custody is the whole point`,
'/Documents/node-setup-checklist.md': `# Archipelago Node Setup Checklist\n\n## Hardware\n- [x] Intel NUC / Mini PC (16GB RAM minimum)\n- [x] 2TB NVMe SSD\n- [x] USB drive for installer\n- [x] Ethernet cable\n\n## Core Apps\n- [x] Bitcoin Knots\n- [x] LND\n- [x] Mempool Explorer\n- [ ] BTCPay Server\n- [ ] Fedimint`,
'/Documents/lightning-channels.csv': `channel_id,peer_alias,capacity_sats,local_balance,remote_balance,status\nch_001,ACINQ,5000000,2450000,2550000,active\nch_002,WalletOfSatoshi,2000000,1200000,800000,active\nch_003,Voltage,10000000,4500000,5500000,active\nch_004,Kraken,3000000,1800000,1200000,active`,
@@ -3320,53 +3333,153 @@ app.post('/app/filebrowser/api/login', (req, res) => {
res.send('"mock-filebrowser-token-demo"')
})
// FileBrowser list resources
app.get('/app/filebrowser/api/resources/*', (req, res) => {
const reqPath = decodeURIComponent(req.params[0] || '/').replace(/\/+$/, '') || '/'
const items = MOCK_FILES[reqPath] || []
// ── Per-session file store helpers ──────────────────────────────────────────
// currentStore().files = { tree: { '<dir>': [entries] }, contents: { '<path>': string|Buffer }, bytes }
// `bytes` is the total uploaded size counted against FB_QUOTA_BYTES.
const FB_QUOTA_BYTES = Number(process.env.DEMO_FILE_QUOTA_BYTES) || 50 * 1024 * 1024
function fbNormalize(raw) {
// → leading slash, no trailing slash (root stays '/')
const p = '/' + decodeURIComponent(raw || '').split('/').filter(Boolean).join('/')
return p === '/' ? '/' : p.replace(/\/+$/, '')
}
function fbParent(p) {
const i = p.lastIndexOf('/')
return i <= 0 ? '/' : p.slice(0, i)
}
function fbBase(p) { return p.slice(p.lastIndexOf('/') + 1) }
function fbType(name) {
const ext = (name.includes('.') ? name.split('.').pop() : '').toLowerCase()
if (['mp3', 'wav', 'flac', 'ogg', 'm4a', 'aac'].includes(ext)) return 'audio'
if (['jpg', 'jpeg', 'png', 'gif', 'webp', 'svg', 'bmp'].includes(ext)) return 'image'
if (['mp4', 'webm', 'mov', 'mkv', 'avi'].includes(ext)) return 'video'
if (['txt', 'md', 'json', 'csv', 'log', 'yaml', 'yml', 'xml', 'conf', 'ini'].includes(ext)) return 'text'
return ''
}
function fbContentType(name) {
const t = fbType(name)
const ext = (name.includes('.') ? name.split('.').pop() : '').toLowerCase()
if (t === 'audio') return ext === 'wav' ? 'audio/wav' : 'audio/mpeg'
if (t === 'image') return ext === 'png' ? 'image/png' : ext === 'svg' ? 'image/svg+xml' : 'image/jpeg'
if (t === 'video') return 'video/mp4'
return 'text/plain; charset=utf-8'
}
function fbListResponse(res, items) {
res.json({
items,
numDirs: items.filter(i => i.isDir).length,
numFiles: items.filter(i => !i.isDir).length,
sorting: { by: 'name', asc: true },
})
}
// FileBrowser list resources (root: /api/resources or /api/resources/)
app.get(['/app/filebrowser/api/resources', '/app/filebrowser/api/resources/*'], (req, res) => {
const dir = fbNormalize(req.params[0] || '')
const items = currentStore().files.tree[dir] || []
fbListResponse(res, items)
})
app.get('/app/filebrowser/api/resources', (req, res) => {
const items = MOCK_FILES['/'] || []
res.json({
items,
numDirs: items.filter(i => i.isDir).length,
numFiles: items.filter(i => !i.isDir).length,
sorting: { by: 'name', asc: true },
})
})
// FileBrowser upload (POST to resources path) — mock accepts and discards the body
// FileBrowser POST = upload a file OR create a folder (trailing slash ⇒ folder)
app.post('/app/filebrowser/api/resources/*', (req, res) => {
req.resume()
req.on('end', () => res.sendStatus(200))
})
const store = currentStore()
const { tree, contents } = store.files
const isFolder = (req.params[0] || '').endsWith('/')
const full = fbNormalize(req.params[0] || '')
const parent = fbParent(full)
const name = fbBase(full)
if (!name) return res.sendStatus(400)
if (!tree[parent]) tree[parent] = []
// FileBrowser delete
app.delete('/app/filebrowser/api/resources/*', (req, res) => {
res.sendStatus(200)
})
// FileBrowser rename
app.patch('/app/filebrowser/api/resources/*', (req, res) => {
res.sendStatus(200)
})
// FileBrowser raw file content (for text file reading)
app.get('/app/filebrowser/api/raw/*', (req, res) => {
const reqPath = '/' + decodeURIComponent(req.params[0] || '')
const content = MOCK_FILE_CONTENTS[reqPath]
if (content) {
res.type('text/plain').send(content)
} else {
res.status(404).send('File not found')
if (isFolder) {
if (!tree[parent].some(e => e.name === name && e.isDir)) {
tree[parent].push({ name, path: full, size: 0, modified: new Date().toISOString(), isDir: true, type: '' })
if (!tree[full]) tree[full] = []
}
return res.sendStatus(200)
}
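// File upload: collect the body with a quota guard. Once over quota, keep
// draining (and discarding) the body rather than calling req.destroy(): a
// destroyed request never emits 'end', so the 507 below would never be sent.
const chunks = []
let size = 0
let aborted = false
req.on('data', (c) => {
if (aborted) return
size += c.length
if (store.files.bytes + size > FB_QUOTA_BYTES) {
aborted = true
chunks.length = 0 // release what was buffered so far
return
}
chunks.push(c)
})
req.on('end', () => {
if (aborted) return res.status(507).send(`Demo storage quota exceeded (${Math.round(FB_QUOTA_BYTES / (1024 * 1024))} MB)`)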
const buf = Buffer.concat(chunks)
// Replace existing entry of the same name (override=true).
const existing = tree[parent].find(e => e.name === name && !e.isDir)
if (existing) store.files.bytes -= existing.size
tree[parent] = tree[parent].filter(e => !(e.name === name && !e.isDir))
const type = fbType(name)
tree[parent].push({ name, path: full, size: buf.length, modified: new Date().toISOString(), isDir: false, type })
contents[full] = type === 'text' ? buf.toString('utf-8') : buf
store.files.bytes += buf.length
res.sendStatus(200)
})
req.on('error', () => { if (!res.headersSent) res.sendStatus(400) })
})
// FileBrowser delete (file or folder + its subtree)
app.delete('/app/filebrowser/api/resources/*', (req, res) => {
const store = currentStore()
const { tree, contents } = store.files
const full = fbNormalize(req.params[0] || '')
const parent = fbParent(full)
if (tree[parent]) {
const entry = tree[parent].find(e => e.path === full)
if (entry && !entry.isDir) store.files.bytes -= entry.size || 0
tree[parent] = tree[parent].filter(e => e.path !== full)
}
// Recursively drop a directory's children.
if (tree[full]) {
const stack = [full]
while (stack.length) {
const d = stack.pop()
for (const e of tree[d] || []) {
if (e.isDir) stack.push(e.path)
else { store.files.bytes -= e.size || 0; delete contents[e.path] }
}
delete tree[d]
}
}
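delete contents[full]
// `bytes` starts at 0, so seed entries were never counted against the quota;
// clamp so deleting them cannot drive the counter negative and enlarge the quota.
store.files.bytes = Math.max(0, store.files.bytes)
res.sendStatus(200)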
})
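// FileBrowser rename/move. Destination from the JSON body ({ destination }) or a ?destination= query param.
app.patch('/app/filebrowser/api/resources/*', (req, res) => {
const store = currentStore()
const { tree, contents } = store.files
const full = fbNormalize(req.params[0] || '')
const dest = fbNormalize((req.body && req.body.destination) || req.query.destination || '')
// Reject moving to the root or into the item's own subtree; renaming to itself is a no-op.
if (dest === '/' || dest.startsWith(full + '/')) return res.sendStatus(400)
if (dest === full) return res.sendStatus(200)
const parent = fbParent(full)
const entry = (tree[parent] || []).find(e => e.path === full)
if (!entry) return res.sendStatus(404)
const destParent = fbParent(dest)
if ((tree[destParent] || []).some(e => e.path === dest)) return res.sendStatus(409)
// Detach from the old parent and attach under the destination's parent (handles moves).
tree[parent] = tree[parent].filter(e => e !== entry)
entry.name = fbBase(dest)
entry.path = dest
entry.modified = new Date().toISOString()
entry.type = entry.isDir ? '' : fbType(entry.name)
if (!tree[destParent]) tree[destParent] = []
tree[destParent].push(entry)
// Re-key a folder's subtree (and file contents) from the old path prefix to the new one.
const under = (p) => p === full || p.startsWith(full + '/')
const rekey = (p) => dest + p.slice(full.length)
for (const d of Object.keys(tree).filter(under)) {
for (const e of tree[d]) e.path = rekey(e.path)
tree[rekey(d)] = tree[d]
delete tree[d]
}
for (const p of Object.keys(contents).filter(under)) {
contents[rekey(p)] = contents[p]
delete contents[p]
}
res.sendStatus(200)
})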
// FileBrowser raw file content (text reads, blob/stream fetches)
app.get('/app/filebrowser/api/raw/*', (req, res) => {
const full = fbNormalize(req.params[0] || '')
const content = currentStore().files.contents[full]
if (content === undefined) return res.status(404).send('File not found')
res.type(fbContentType(fbBase(full)))
res.send(Buffer.isBuffer(content) ? content : String(content))
})
// Claude API Proxy (reads ANTHROPIC_API_KEY from environment)
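The per-session file store keeps a flat map from directory path to that directory's entries (`tree`) and a map from file path to content (`contents`). Because of that layout, a folder delete has to walk every nested directory key. The sketch below exercises the model outside Express.

- `fbNormalize`, `fbParent` and `fbBase` mirror the helpers in the diff. The redundant trailing-slash strip is dropped, since `filter(Boolean)` already removes empty segments.
- `makeStore`, `addDir`, `addFile` and `removePath` are hypothetical names used only for this illustration.
- Entry fields are trimmed to the ones the walk needs.

```javascript
// Path helpers mirroring fbNormalize / fbParent / fbBase from the diff.
function fbNormalize(raw) {
  // filter(Boolean) drops empty segments, so '' becomes '/' and 'a//b/' becomes '/a/b'.
  return '/' + decodeURIComponent(raw || '').split('/').filter(Boolean).join('/')
}
function fbParent(p) {
  const i = p.lastIndexOf('/')
  return i <= 0 ? '/' : p.slice(0, i)
}
function fbBase(p) {
  return p.slice(p.lastIndexOf('/') + 1)
}

// Hypothetical store with the same shape as currentStore().files.
function makeStore() {
  return { tree: { '/': [] }, contents: {}, bytes: 0 }
}

// Hypothetical helpers: register a directory or a text file under its parent.
function addDir(store, raw) {
  const full = fbNormalize(raw)
  const parent = fbParent(full)
  if (!store.tree[parent]) store.tree[parent] = []
  store.tree[parent].push({ name: fbBase(full), path: full, size: 0, isDir: true })
  if (!store.tree[full]) store.tree[full] = []
}
function addFile(store, raw, text) {
  const full = fbNormalize(raw)
  const parent = fbParent(full)
  if (!store.tree[parent]) store.tree[parent] = []
  const size = Buffer.byteLength(text)
  store.tree[parent].push({ name: fbBase(full), path: full, size, isDir: false })
  store.contents[full] = text
  store.bytes += size
}

// Remove a file, or a directory plus everything under it, walking nested
// directory keys with an explicit stack (the same approach as the DELETE handler).
function removePath(store, raw) {
  const { tree, contents } = store
  const full = fbNormalize(raw)
  const parent = fbParent(full)
  if (tree[parent]) {
    const entry = tree[parent].find(e => e.path === full)
    if (entry && !entry.isDir) store.bytes -= entry.size
    tree[parent] = tree[parent].filter(e => e.path !== full)
  }
  const stack = tree[full] ? [full] : []
  while (stack.length) {
    const dir = stack.pop()
    for (const e of tree[dir] || []) {
      if (e.isDir) stack.push(e.path)
      else { store.bytes -= e.size; delete contents[e.path] }
    }
    delete tree[dir]
  }
  delete contents[full]
}

const s = makeStore()
addDir(s, 'Music/')
addDir(s, '/Music/Live')
addFile(s, '/Music/Live/set.txt', 'hello')
addFile(s, '/notes.md', '# hi')
removePath(s, '/Music')
console.log(Object.keys(s.tree), s.bytes) // → [ '/' ] 4
```

Keying `tree` by full directory path keeps listings O(1), at the cost of having to visit and drop every descendant key when a folder goes away.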
@@ -3718,13 +3831,137 @@ app.get('/health', (req, res) => {
res.status(200).send('healthy')
})
// ───────────────────────────────────────────────────────────────────────────
// Per-session state isolation (DEMO multi-visitor sandbox)
//
// Every mutable global (mockData, walletState, userState, mockState,
// bitcoinRelayMockState) and the filebrowser file store is partitioned per
// visitor. Handlers keep referring to those names unchanged — the names are
// Proxies that forward to the current request's store, resolved via
// AsyncLocalStorage. Outside DEMO (or outside a request, e.g. at startup) they
// resolve to a single shared `defaultStore`, so classic single-user behaviour
// is byte-for-byte preserved.
// ───────────────────────────────────────────────────────────────────────────
const stateContext = new AsyncLocalStorage()
// Build a fresh, fully-isolated state bundle from the pristine seeds.
function makeSessionStore() {
const md = structuredClone(SEED_MOCKDATA)
// No real runtime in DEMO → package list is the curated static app set.
md['package-data'] = structuredClone(staticDevApps)
return {
mockData: md,
walletState: structuredClone(SEED_WALLET),
userState: seedUserState(),
mockState: seedMockState(),
bitcoinRelayMockState: structuredClone(SEED_BTCRELAY),
files: { tree: structuredClone(SEED_FILES), contents: structuredClone(SEED_FILE_CONTENTS), bytes: 0 },
sockets: new Set(),
lastSeen: Date.now(),
}
}
// The shared store used in single-user mode and for any work outside a request.
const defaultStore = makeSessionStore()
function currentStore() {
return stateContext.getStore() || defaultStore
}
// A Proxy whose every operation is delegated to currentStore()[bucket], so the
// existing handler code (`mockData['package-data']`, `walletState.x += n`, …)
// transparently reads/writes the right visitor's state.
function sessionBucketProxy(bucket) {
const target = () => currentStore()[bucket]
return new Proxy(Object.create(null), {
get: (_t, k) => target()[k],
set: (_t, k, v) => { target()[k] = v; return true },
has: (_t, k) => k in target(),
deleteProperty: (_t, k) => { delete target()[k]; return true },
ownKeys: () => Reflect.ownKeys(target()),
getOwnPropertyDescriptor: (_t, k) => {
const d = Object.getOwnPropertyDescriptor(target(), k)
if (d) d.configurable = true
return d
},
defineProperty: (_t, k, d) => { Object.defineProperty(target(), k, d); return true },
getPrototypeOf: () => Object.prototype,
})
}
const mockData = sessionBucketProxy('mockData')
const walletState = sessionBucketProxy('walletState')
const userState = sessionBucketProxy('userState')
const mockState = sessionBucketProxy('mockState')
const bitcoinRelayMockState = sessionBucketProxy('bitcoinRelayMockState')
// Demo session lifecycle: keyed by the `demo_sid` cookie, capped, idle-reaped.
const demoSessions = new Map() // sid -> store
const DEMO_SESSION_TTL_MS = Number(process.env.DEMO_SESSION_TTL_MS) || 45 * 60 * 1000
const DEMO_MAX_SESSIONS = Number(process.env.DEMO_MAX_SESSIONS) || 500
function resolveSessionStore(req, res) {
let sid = req.cookies?.demo_sid
if (!sid || !demoSessions.has(sid)) {
// Cap concurrent sessions: evict the oldest if we're at the limit.
if (demoSessions.size >= DEMO_MAX_SESSIONS) {
let oldestSid = null, oldest = Infinity
for (const [k, s] of demoSessions) if (s.lastSeen < oldest) { oldest = s.lastSeen; oldestSid = k }
if (oldestSid) { reapSession(oldestSid) }
}
sid = crypto.randomUUID()
demoSessions.set(sid, makeSessionStore())
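}
// Re-issue the cookie on every request so its expiry slides with activity,
// matching the idle-based reaper, instead of expiring a busy session mid-visit.
res.cookie('demo_sid', sid, { httpOnly: true, sameSite: 'lax', maxAge: DEMO_SESSION_TTL_MS })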
const store = demoSessions.get(sid)
store.lastSeen = Date.now()
store.sid = sid
return store
}
// Resolve the session store for a WebSocket upgrade request. The HTTP layer has
// normally issued the `demo_sid` cookie by the time the socket connects. If the
// cookie is absent, fall back to a fresh ephemeral store (there is no response to
// set a cookie on); if it names an unknown or reaped session, register a fresh
// store under that sid.
function wsStoreForRequest(req) {
const raw = req.headers?.cookie || ''
const m = raw.match(/(?:^|;\s*)demo_sid=([^;]+)/)
const sid = m && m[1]
if (sid && demoSessions.has(sid)) {
const store = demoSessions.get(sid)
store.lastSeen = Date.now()
return store
}
const store = makeSessionStore()
if (sid) demoSessions.set(sid, store)
return store
}
function reapSession(sid) {
const store = demoSessions.get(sid)
if (!store) return
for (const ws of store.sockets) { try { ws.close(4000, 'session expired') } catch { /* ignore */ } }
demoSessions.delete(sid)
}
if (DEMO) {
setInterval(() => {
const now = Date.now()
for (const [sid, store] of demoSessions) {
if (now - store.lastSeen > DEMO_SESSION_TTL_MS) reapSession(sid)
}
}, 60 * 1000).unref?.()
}
// WebSocket endpoint
const server = http.createServer(app)
const wss = new WebSocketServer({ server, path: '/ws/db' })
wss.on('connection', (ws, req) => {
console.log('[WebSocket] Client connected from', req.socket.remoteAddress)
wsClients.add(ws)
// Attach this socket to the visitor's session store so broadcasts only reach
// that visitor. In non-DEMO mode every socket joins the single default store.
const wsStore = DEMO ? wsStoreForRequest(req) : defaultStore
wsStore.sockets.add(ws)
// Set up ping/pong to keep connection alive
const pingInterval = setInterval(() => {
@@ -3751,11 +3988,12 @@ wss.on('connection', (ws, req) => {
}
}, 45000) // Every 45s (client expects data within 60s)
// Send initial data immediately
// Send initial data immediately (this visitor's store, not the global proxy —
// there is no request context inside the WS connection handler).
try {
ws.send(JSON.stringify({
type: 'initial',
data: mockData,
data: wsStore.mockData,
}))
console.log('[WebSocket] Initial data sent')
} catch (err) {
@@ -3780,14 +4018,14 @@ wss.on('connection', (ws, req) => {
console.log('[WebSocket] Client disconnected', { code, reason: reason.toString() })
clearInterval(pingInterval)
clearInterval(heartbeatInterval)
wsClients.delete(ws)
wsStore.sockets.delete(ws)
})
ws.on('error', (error) => {
console.error('[WebSocket Error]', error)
clearInterval(pingInterval)
clearInterval(heartbeatInterval)
wsClients.delete(ws)
wsStore.sockets.delete(ws)
})
})
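The isolation layer relies on two pieces:

- `AsyncLocalStorage` carries the visitor's store through everything started inside `stateContext.run(...)`, including `await`s and timers.
- A Proxy forwards each property operation to `currentStore()[bucket]`.

The standalone sketch below reproduces that pattern for a single bucket. The `SEED_WALLET` shape and the `spend` handler are hypothetical stand-ins, and the Proxy keeps only the traps the example exercises. It assumes Node 17+ (global `structuredClone`) and a CommonJS context for `require`.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks')

// Carries the active visitor's store through callbacks, timers and awaits
// started inside stateContext.run(store, fn).
const stateContext = new AsyncLocalStorage()

// Hypothetical seed; the real backend clones several seeds per session.
const SEED_WALLET = { balance: 100000, history: [] }

function makeSessionStore() {
  return { walletState: structuredClone(SEED_WALLET) }
}

// Used when no request context is active (startup, single-user mode).
const defaultStore = makeSessionStore()
const currentStore = () => stateContext.getStore() || defaultStore

// Every trap forwards to currentStore()[bucket]; trimmed to the traps used here.
function sessionBucketProxy(bucket) {
  const target = () => currentStore()[bucket]
  return new Proxy(Object.create(null), {
    get: (_t, k) => target()[k],
    set: (_t, k, v) => { target()[k] = v; return true },
    has: (_t, k) => k in target(),
    deleteProperty: (_t, k) => { delete target()[k]; return true },
    ownKeys: () => Reflect.ownKeys(target()),
    getOwnPropertyDescriptor: (_t, k) => {
      const d = Object.getOwnPropertyDescriptor(target(), k)
      if (d) d.configurable = true // the empty proxy target has no such property
      return d
    },
  })
}

const walletState = sessionBucketProxy('walletState')

// Hypothetical handler, written against the global name only.
async function spend(amount) {
  await new Promise(resolve => setTimeout(resolve, 5)) // context survives the await
  walletState.balance -= amount
  walletState.history.push(amount)
  return walletState.balance
}

async function main() {
  const alice = makeSessionStore()
  const bob = makeSessionStore()
  const [a, b] = await Promise.all([
    stateContext.run(alice, () => spend(1000)),
    stateContext.run(bob, () => spend(250)),
  ])
  console.log(a, b, defaultStore.walletState.balance) // → 99000 99750 100000
}

main()
```

Both `spend` calls run concurrently, yet each sees only its own balance, and the shared `defaultStore` is untouched. The `getOwnPropertyDescriptor` trap forces `configurable: true` because the Proxy's real target is an empty object. JavaScript's Proxy invariants reject reporting a non-configurable property that the target does not actually have.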


@@ -0,0 +1,49 @@
/**
* Public-demo helpers.
*
* The demo build (VITE_DEMO=1) replays the intro/onboarding on each visit, but
* at most once per calendar day per browser. The gate lives in localStorage so
* it survives the short-lived backend session. Also exposes the shared demo
* credentials shown on the login screen.
*/
export const IS_DEMO =
import.meta.env.VITE_DEMO === '1' || import.meta.env.VITE_DEMO === 'true'
/** Memorable shared password for the public demo (must match the mock backend). */
export const DEMO_PASSWORD = 'entertoexit'
const INTRO_DATE_KEY = 'demo_intro_date'
function todayKey(): string {
// Local calendar day, e.g. "2026-06-22".
const d = new Date()
return `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, '0')}-${String(d.getDate()).padStart(2, '0')}`
}
/** True if this browser already watched the intro earlier today. */
export function demoIntroSeenToday(): boolean {
try {
return localStorage.getItem(INTRO_DATE_KEY) === todayKey()
} catch {
return false
}
}
/** Record that the intro has been seen today, so it won't replay until tomorrow. */
export function markDemoIntroSeen(): void {
try {
localStorage.setItem(INTRO_DATE_KEY, todayKey())
} catch {
/* ignore (private mode / storage disabled) */
}
}
/** Forget today's "seen" marker so the intro plays again (e.g. "Replay Intro"). */
export function clearDemoIntroSeen(): void {
try {
localStorage.removeItem(INTRO_DATE_KEY)
} catch {
/* ignore */
}
}
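The per-day gate compares a stored local-date string with today's. The sketch below reproduces that logic with an injectable clock and a Map-backed stand-in for `localStorage`; `dayKey`, `memoryStorage`, `seenToday` and `markSeen` are hypothetical names for testing only. It shows where the rollover happens. Because `todayKey()` uses local getters (`getFullYear`, `getMonth`, `getDate`), the intro becomes eligible again at the visitor's local midnight rather than at a UTC boundary.

```javascript
// Same local-calendar-day key as todayKey(), with the clock injectable.
function dayKey(d = new Date()) {
  const mm = String(d.getMonth() + 1).padStart(2, '0')
  const dd = String(d.getDate()).padStart(2, '0')
  return `${d.getFullYear()}-${mm}-${dd}`
}

// Hypothetical Map-backed stand-in for window.localStorage.
function memoryStorage() {
  const m = new Map()
  return {
    getItem: k => (m.has(k) ? m.get(k) : null),
    setItem: (k, v) => { m.set(k, String(v)) },
    removeItem: k => { m.delete(k) },
  }
}

const KEY = 'demo_intro_date'
const seenToday = (storage, now) => storage.getItem(KEY) === dayKey(now)
const markSeen = (storage, now) => storage.setItem(KEY, dayKey(now))

const ls = memoryStorage()
const morning = new Date(2026, 5, 22, 9, 0) // months are 0-based: 5 = June
const evening = new Date(2026, 5, 22, 23, 30)
const nextDay = new Date(2026, 5, 23, 0, 5)

console.log(seenToday(ls, morning)) // → false (first visit: play the intro)
markSeen(ls, morning)
console.log(seenToday(ls, evening)) // → true (same local day: go to /login)
console.log(seenToday(ls, nextDay)) // → false (new day: intro again)
```

The real helpers also wrap storage access in `try`/`catch`, so a browser with storage disabled simply sees the intro on every visit.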


@@ -156,6 +156,11 @@
<!-- Normal Login Mode -->
<template v-else>
<!-- Demo credential hint -->
<div v-if="isDemo" class="mb-4 p-3 bg-orange-500/15 border border-orange-400/30 rounded-lg text-orange-100 text-sm text-center">
🎮 Demo mode · Password: <span class="font-mono font-semibold">{{ DEMO_PASSWORD }}</span>
</div>
<div class="mb-6">
<label for="login-password" class="block text-sm font-medium text-white/80 mb-2">
{{ t('login.password') }}
@@ -228,6 +233,7 @@ const { t } = useI18n()
import { useLoginTransitionStore } from '../stores/loginTransition'
import { rpcClient } from '../api/rpc-client'
import { resumeAudioContext, startSynthwave, stopSynthwave, playLoginSuccessWhoosh, playPop } from '@/composables/useLoginSounds'
import { IS_DEMO, DEMO_PASSWORD, clearDemoIntroSeen } from '@/composables/useDemoIntro'
const router = useRouter()
const currentRoute = useRoute()
@@ -241,7 +247,8 @@ const loginRedirectTo = computed(() => {
const store = useAppStore()
const loginTransition = useLoginTransitionStore()
const password = ref('')
const isDemo = IS_DEMO
const password = ref(IS_DEMO ? DEMO_PASSWORD : '')
const confirmPassword = ref('')
const loading = ref(false)
const error = ref<string | null>(null)
@@ -520,6 +527,8 @@ async function handleTotpVerify() {
function replayIntro() {
// Clear the intro seen flag
localStorage.removeItem('neode_intro_seen')
// Demo: also clear the per-day gate so the intro plays again now.
if (IS_DEMO) clearDemoIntroSeen()
// Navigate to root to trigger splash screen
window.location.href = '/'
}


@@ -53,11 +53,15 @@ import { ref, onMounted } from 'vue'
import { useRouter } from 'vue-router'
import AnimatedLogo from '@/components/AnimatedLogo.vue'
import { playNavSound } from '@/composables/useNavSounds'
import { IS_DEMO, markDemoIntroSeen } from '@/composables/useDemoIntro'
const router = useRouter()
const ctaButton = ref<HTMLButtonElement | null>(null)
onMounted(() => {
// Demo: once the visitor has seen the intro today, don't auto-replay it again
// until tomorrow (they can still use "Replay Intro" on the login screen).
if (IS_DEMO) markDemoIntroSeen()
// Auto-focus after entry animation completes (1.4s animation delay + 0.6s duration)
setTimeout(() => {
ctaButton.value?.focus({ preventScroll: true })


@@ -16,11 +16,22 @@
import { ref, onMounted } from 'vue'
import { useRouter } from 'vue-router'
import { isOnboardingComplete } from '@/composables/useOnboarding'
import { IS_DEMO, demoIntroSeenToday } from '@/composables/useDemoIntro'
import BootScreen from '@/components/BootScreen.vue'
const router = useRouter()
const showBootScreen = ref(false)
/**
* Public demo: replay the intro on every visit, but at most once per calendar
* day per browser. If already seen today, go straight to login; otherwise, show the intro.
*/
function demoRoute() {
const dest = demoIntroSeenToday() ? '/login' : '/onboarding/intro'
log('demoRoute', { dest })
router.replace(dest).catch(() => {})
}
function log(msg: string, data?: unknown) {
const ts = new Date().toISOString()
const entry = `[RootRedirect ${ts}] ${msg}` + (data !== undefined ? ` ${JSON.stringify(data)}` : '')
@@ -68,6 +79,10 @@ async function checkOnboarded(): Promise<boolean> {
}
async function proceedToApp() {
if (IS_DEMO) {
demoRoute()
return
}
const devMode = import.meta.env.VITE_DEV_MODE
if (devMode === 'setup' || devMode === 'existing') {
log('proceedToApp devMode', { devMode })
@@ -121,6 +136,11 @@ onMounted(async () => {
log('production flow', { isUp })
if (isUp) {
// Demo: per-day intro gate instead of server-side onboarding state.
if (IS_DEMO) {
demoRoute()
return
}
const onboarded = await checkOnboarded()
if (onboarded) {
log('server up + onboarded → proceedToApp')
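Across the three frontend files, the demo flow works like this:

- `RootRedirect` chooses `/login` or `/onboarding/intro` from the per-day flag.
- The intro page sets the flag when it mounts.
- "Replay Intro" clears the flag before returning to `/`.

The hypothetical model below collapses those steps into one object with an injectable day, which makes the sequence of destinations easy to check. `createDemoFlow`, `visitRoot` and `replayIntro` are illustration-only names, and the closure variable stands in for `localStorage`.

```javascript
// Hypothetical model of the demo routing flow:
//   visitRoot()   ~ RootRedirect.demoRoute(), plus the intro page marking itself seen
//   replayIntro() ~ Login's replayIntro(): clear the per-day flag, then reload '/'
function createDemoFlow(today) {
  let seenDate = null // stands in for localStorage['demo_intro_date']
  return {
    visitRoot() {
      const dest = seenDate === today() ? '/login' : '/onboarding/intro'
      if (dest === '/onboarding/intro') seenDate = today() // intro mounts and marks it
      return dest
    },
    replayIntro() {
      seenDate = null
      return this.visitRoot()
    },
  }
}

let day = '2026-06-22'
const flow = createDemoFlow(() => day)
const visits = [flow.visitRoot(), flow.visitRoot(), flow.replayIntro()]
day = '2026-06-23'
visits.push(flow.visitRoot())
console.log(visits.join(', '))
// → /onboarding/intro, /login, /onboarding/intro, /onboarding/intro
```

Keeping this gate purely client-side means it is unaffected by the backend reaping a visitor's sandbox session.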