archy/image-recipe/configs/archipelago-kiosk.service
archipelago c375ecc441 fix: fresh-ISO feedback bug-bash — onboarding, status truthfulness, recovery, kiosk, logs
Fixes from real fresh-install feedback (Framework node .81) + its log bundle:

Backend:
- websocket: subscribe before initial snapshot — broadcasts in the gap were
  silently lost, stranding clients on stale state until a hard refresh
  (the "everything needs ctrl-r" bug: My Apps stuck Loading, App Store
  stuck Checking, containers-scanned never arriving)
- crash recovery: check the crash marker BEFORE writing our own PID —
  recovery had never run on any node (always saw its own PID and skipped);
  PID-reuse guard via /proc cmdline
- boot status: pending-boot-starts registry (recovery, stack recovery,
  reconciler, adoption) — scanner overlays queued-but-down apps as
  Restarting instead of Stopped after a reboot; scanner-authored
  Restarting resolves immediately on a settled scan (no transitional wedge)
- install deps: bounded wait (36x5s) when a dependency is installed but
  still starting ("Waiting for Bitcoin to start…") instead of instant
  rejection; dependency-gate rejections remove the optimistic entry (no
  phantom Stopped tile) and surface as a notification
- seed backup: auth.setup persists the onboarding mnemonic as the
  encrypted seed backup (reveal previously failed on EVERY node — nothing
  ever wrote master_seed.enc); seed.restore stashes too; error sanitizer
  lets seed/2FA errors through instead of "Check server logs"
- lnd: bitcoind.rpchost resolved from the running Bitcoin variant
  (hardcoded bitcoin-knots broke Core nodes); manifest uses derived_env
- bitcoin status: clean human message for connection-reset/startup; raw
  URLs + os-error chains no longer reach the app card
- fedimint-clientd: chown /var/lib/archipelago/fmcd to 1000:1000 (root-
  created dir crash-looped the rootless container, EACCES) — first-boot
  script + pre-start self-heal
- log volume (>1GB/day on a day-old node): journald caps drop-in (ISO +
  bootstrap self-heal), bitcoind -printtoconsole=0 everywhere (90% of the
  journal was IBD UpdateTip spam), tracing default debug→info

Frontend:
- Login: Enter advances to confirm field then submits; submit always
  clickable with inline errors (was silently disabled on mismatch);
  Restart Onboarding needs a confirming second click (the mismatch →
  "onboarding restarted" trap)
- sync store: 30s state reconciliation + refetch on re-entrant connect;
  20s containers-scanned escape hatch so Checking can never show forever;
  fresh empty node reaches the real "no apps yet" state
- intro video: CRF20 re-encode (SSIM 0.988) + faststart — moov was at EOF
  so playback needed the full 15MB first (the intro lag)
- backgrounds: 10 heaviest JPEGs → WebP q90 (9.4MB→6.6MB); 7 stayed JPEG
  (WebP larger on noisy sources)
- Web5ConnectedNodes: drop unused template ref that failed vue-tsc -b

ISO/kiosk:
- nginx: /assets/ 404s no longer cached immutable for a year; HTTPS block
  gained the missing /assets/ location (served index.html as images)
- kiosk: launcher/service spliced from configs/ at ISO build (stale
  heredoc force-disabled GPU); MemoryHigh/Max 1200/1500→2200/2800M (kiosk
  rode the reclaim throttle = the lag); firmware-intel-graphics +
  firmware-amd-graphics (trixie split DMC blobs out of misc-nonfree)

Verified: cargo test 898/898 green, npm run build green with dist
contents confirmed (webp refs, lnd.png, faststart video, new strings).
Handover for ISO build + deploy: docs/HANDOVER-2026-07-02-iso-feedback.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 08:00:39 -04:00

39 lines
1.7 KiB
Desktop File

[Unit]
Description=Archipelago Kiosk (X11 + Chromium)
After=archipelago.service systemd-user-sessions.service network-online.target
Wants=archipelago.service network-online.target
ConditionPathExists=/usr/local/bin/archipelago-kiosk-launcher
Conflicts=getty@tty1.service
[Service]
Type=simple
# Wait up to 5 min for archipelago to serve /health. On slow hardware
# first-boot is dominated by the FileBrowser pull (unbundled ISO),
# initial archipelago state sync, and frontend settle — .198 took
# longer than 120s and chromium launched against an empty backend,
# producing a white window that only recovered on reboot. 300s gives
# slow-but-functional hardware enough headroom; TimeoutStartSec is
# bumped in lockstep so systemd doesn't kill us mid-wait.
ExecStartPre=/bin/bash -c 'for i in $(seq 1 150); do curl -sf http://localhost/health >/dev/null 2>&1 && break; sleep 2; done'
ExecStart=/usr/local/bin/archipelago-kiosk-launcher
TimeoutStartSec=360
Restart=always
RestartSec=5
# Resource guardrail (#36). On GPU-less / headless hardware chromium could spin
# software compositing at ~92% of a core, saturating the node and starving the
# backend (it caused the .198 receive timeout + deploy storms). Cap CPU + memory
# so a runaway kiosk can never take the whole machine down; Delegate so the cap
# also binds the chromium/Xorg children in this unit's cgroup.
Delegate=yes
CPUQuota=75%
# Raised from 1500M/1200M: a Framework (Tiger Lake) kiosk sat at 806M used /
# 1.1G peak, riding the old MemoryHigh reclaim-throttle line — the throttling
# itself was the perceived UI lag. Keep Max well above real peaks; High stays
# the soft reclaim line so a runaway kiosk still can't take the machine down.
MemoryMax=2800M
MemoryHigh=2200M
[Install]
WantedBy=multi-user.target