archy/image-recipe/configs/archipelago-kiosk.service
archipelago 45ac9be965 fix(kiosk): cap chromium resources + drop GPU rasterization when headless (#36)
The kiosk chromium pinned ~92% of a core (software-compositing spin from
--enable-gpu-rasterization on a GPU-less/headless node), saturating the machine
and starving the backend + container builds — it caused the .198 receive timeout
and the deploy storms.

- archipelago-kiosk.service: CPUQuota=75% + MemoryMax/High + Delegate, so a
  runaway kiosk can never take the whole node down.
- archipelago-kiosk-launcher.sh: detect /dev/dri — use GPU rasterization only
  when a GPU exists, else --disable-gpu (avoids the headless spin).
- bootstrap::ensure_kiosk_hardened: OTA self-heal that installs the updated
  unit+launcher on already-deployed nodes, daemon-reloads, and only try-restarts
  a *running* kiosk (never re-enables an operator-disabled one).

cargo check clean; launcher bash -n clean; unit syntax valid.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 11:10:26 -04:00

35 lines
1.4 KiB
Desktop File

[Unit]
Description=Archipelago Kiosk (X11 + Chromium)
After=archipelago.service systemd-user-sessions.service network-online.target
Wants=archipelago.service network-online.target
ConditionPathExists=/usr/local/bin/archipelago-kiosk-launcher
Conflicts=getty@tty1.service
[Service]
Type=simple
# Wait up to 5 min for archipelago to serve /health. On slow hardware
# first-boot is dominated by the FileBrowser pull (unbundled ISO),
# initial archipelago state sync, and frontend settle — .198 took
# longer than 120s and chromium launched against an empty backend,
# producing a white window that only recovered on reboot. 300s gives
# slow-but-functional hardware enough headroom; TimeoutStartSec is
# bumped in lockstep so systemd doesn't kill us mid-wait.
ExecStartPre=/bin/bash -c 'for i in $(seq 1 150); do curl -sf http://localhost/health >/dev/null 2>&1 && break; sleep 2; done'
ExecStart=/usr/local/bin/archipelago-kiosk-launcher
TimeoutStartSec=360
Restart=always
RestartSec=5
# Resource guardrail (#36). On GPU-less / headless hardware chromium could spin
# software compositing at ~92% of a core, saturating the node and starving the
# backend (it caused the .198 receive timeout + deploy storms). Cap CPU + memory
# so a runaway kiosk can never take the whole machine down; Delegate so the cap
# also binds the chromium/Xorg children in this unit's cgroup.
Delegate=yes
CPUQuota=75%
MemoryMax=1500M
MemoryHigh=1200M
[Install]
WantedBy=multi-user.target