archy/.claude/plans/toasty-inventing-cascade.md
Dorian 1e283daf13 fix: overhaul container lifecycle — recovery, health, uninstall, UI state
Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:03:57 +01:00

Plan: ISO Polish — Fix Everything for Beta Release

Context

Fresh ISO install on .198 revealed 11 issues ranging from critical (failing app installs, broken Tor setup) to UX polish (GRUB scaling, boot splash, kiosk reliability). Goal: the next ISO build produces a flawless out-of-box experience.

Issues & Fixes (priority order)

1. CRITICAL: Tor services.json not written (escaping bug)

Symptom: setup-tor.sh: line 12: $ARCHY_TOR_DIR/services.json: No such file or directory

Root cause: In build-auto-installer-iso.sh, the setup-tor heredoc writes $ARCHY_TOR_DIR as \$ARCHY_TOR_DIR, and the generated script ends up with a literal $ARCHY_TOR_DIR that is never expanded to a real path at runtime (the error message shows the unexpanded name).

Fix: In the heredoc that generates setup-tor.sh (~line 1200), correct the escaping so the path resolves to the real directory. The heredoc uses <<TORSCRIPT (unquoted), so an unescaped $ARCHY_TOR_DIR expands when the ISO is built, while \$ARCHY_TOR_DIR is deferred until setup-tor.sh runs. Check which of the two is intended, and whether the variable is actually set at that point, before changing the quoting.

File: image-recipe/build-auto-installer-iso.sh (setup-tor heredoc section)
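To make the quoting question concrete, here is a minimal sketch of how bash treats $ inside unquoted and quoted heredocs. DEMO_DIR, the GEN delimiter, and the generated file names are hypothetical and are not taken from build-auto-installer-iso.sh; the sketch only illustrates the general shell behaviour, not the actual bug.

```shell
#!/usr/bin/env bash
# Sketch: when does a variable inside a heredoc expand?
# DEMO_DIR and the generated file names are hypothetical.
set -eu

DEMO_DIR="/build/time/path"
workdir="$(mktemp -d)"

# Unquoted delimiter: $DEMO_DIR expands now (generation time);
# \$DEMO_DIR is written to the file as a literal "$DEMO_DIR".
cat > "$workdir/unquoted.sh" <<GEN
echo "build-time: $DEMO_DIR"
echo "run-time: \$DEMO_DIR"
GEN

# Quoted delimiter: nothing expands while the file is generated.
cat > "$workdir/quoted.sh" <<'GEN'
echo "run-time: $DEMO_DIR"
GEN

# Run the generated scripts with a different value in their environment.
unquoted_out="$(DEMO_DIR=/run/time/path bash "$workdir/unquoted.sh")"
quoted_out="$(DEMO_DIR=/run/time/path bash "$workdir/quoted.sh")"

printf '%s\n' "$unquoted_out" "$quoted_out"
rm -rf "$workdir"
```

If the generated script shows a literal variable name at runtime, one thing worth checking is whether the reference ends up inside single quotes or a second, nested heredoc in the generated file, since either would change when the expansion happens.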

2. CRITICAL: App installs failing ("Operation failed")

Symptom: Screenshot shows "Failed: Error: Operation failed. Check server logs" with the UI stuck on "Downloading...".

Root cause: The tested ISO is the OLD build (pre-CSRF fix); the new build has the fix. However, sanitize_error_message() in middleware.rs masks ALL real errors, so the underlying failure is not visible to the user. The new build still needs to be verified.

Fix: Already fixed (auth.ts, rpc-client.ts, mod.rs). Verify on next ISO. Also consider allowing "Failed to pull" errors through the sanitizer so users see meaningful install errors.

File: core/archipelago/src/api/rpc/middleware.rs

3. HIGH: Kiosk white screen / never loads on first boot

Symptom: On first boot the display goes black screen → white screen and the kiosk never loads. Second boot works fine.

Root cause: The kiosk ExecStartPre health check polls 15 times with a 2s delay (30s max), but on first boot the backend may not be ready within 30s because first-boot-containers, Tor setup, and other first-boot tasks are all running. Chromium then opens http://localhost/kiosk before nginx and the backend are fully up, which renders a white page. The launcher has no retry logic.

Fix: Increase the health check to 30 attempts (60s). Add a loading page that Chromium shows while waiting (a simple HTML file served by nginx even when the backend is down). Add the --disable-gpu flag to Chromium (fixes some white-screen issues on low-end GPUs).

File: image-recipe/build-auto-installer-iso.sh (kiosk launcher + ExecStartPre)
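A bounded polling loop of the kind described could look roughly like the sketch below. The wait_until helper, the backend_ready probe, and the http://localhost/health URL are all assumptions made for illustration; the real ExecStartPre check may probe a different endpoint.

```shell
#!/usr/bin/env bash
# Sketch of a bounded health-check loop. All names here are hypothetical.

# wait_until ATTEMPTS DELAY CMD...: run CMD until it succeeds or attempts run out.
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical probe; the URL and success criteria are assumptions.
backend_ready() {
  curl --silent --fail --max-time 2 --output /dev/null "http://localhost/health"
}

# Possible use in a launcher (30 attempts, 2s apart):
#   wait_until 30 2 backend_ready || echo "backend not ready, showing loading page"
```

Keeping the attempt count and delay as parameters makes it easy to tune the timeout without touching the loop, and a non-zero return lets the launcher fall back to the loading page instead of opening a blank kiosk URL.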

4. HIGH: GRUB theme text not scaling / cut off on 4:3 monitors

Symptom: Screenshot shows "Install (var" cut off and menu items barely readable on a 1280x1024 Dell monitor.

Root cause: The GRUB theme uses a percentage-based layout but has no font size control. theme.txt loads no explicit font, so GRUB falls back to its small default bitmap font, and item_height = 40 is a fixed pixel value that is too small at some resolutions.

Fix: In grub.cfg, add a loadfont call for a larger font (24px DejaVu or similar). In theme.txt, increase item_height, move the menu position up, and ensure the text fits.

Files: image-recipe/branding/grub-theme/theme.txt, image-recipe/build-auto-installer-iso.sh (grub.cfg generation)

5. HIGH: LUKS partition not showing in disk stats

Symptom: Server view doesn't show LUKS encryption status or the encrypted partition.

Root cause: The backend system.disk-status RPC uses df / or df /var/lib/archipelago but doesn't report LUKS status; there is no cryptsetup status call. The frontend only shows used/total/free/percent.

Fix: Add LUKS detection to the disk status RPC: check whether /dev/mapper/archipelago* exists and read cryptsetup status. Return encrypted: true/false and encryption_cipher fields. Frontend: show a lock icon and a "LUKS2 Encrypted" badge in disk stats.

Files: core/archipelago/src/api/rpc/system/handlers.rs, neode-ui/src/views/Server.vue
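The detection logic can be prototyped in shell before porting it to the Rust handler. The sketch below assumes that cryptsetup status prints indented "type:" and "cipher:" lines, which should be confirmed against the cryptsetup version shipped in the ISO; parse_luks_status, luks_status_for, and the output format are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive "encrypted" and "cipher" from `cryptsetup status` output.
# Function names and the key=value output format are hypothetical.

# parse_luks_status: read `cryptsetup status <name>` output on stdin and print
# "encrypted=<true|false> type=<type> cipher=<cipher>".
parse_luks_status() {
  awk '
    $1 == "type:"   { kind = $2 }
    $1 == "cipher:" { cipher = $2 }
    END {
      encrypted = (kind ~ /^LUKS/) ? "true" : "false"
      printf "encrypted=%s type=%s cipher=%s\n", encrypted, kind, cipher
    }
  '
}

# luks_status_for NAME: report status for /dev/mapper/NAME.
# Usually needs root; falls back to "not encrypted" when the mapping or
# the cryptsetup binary is missing.
luks_status_for() {
  local name="$1"
  if [ -e "/dev/mapper/$name" ] && command -v cryptsetup >/dev/null 2>&1; then
    cryptsetup status "$name" 2>/dev/null | parse_luks_status
  else
    echo "encrypted=false type= cipher="
  fi
}
```

Separating the parser from the command invocation keeps the parsing testable with canned output, which is also a reasonable shape for the eventual Rust implementation.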

6. MEDIUM: No Plymouth boot splash showing

Symptom: No animation between GRUB and login, just a black screen with a blinking cursor.

Root cause: Plymouth theme files exist in image-recipe/branding/plymouth-theme/, but the ISO build doesn't copy logo.png or install the theme properly. The kernel cmdline also needs splash quiet, and plymouth-set-default-theme must be run.

Fix: Verify that the ISO build copies the Plymouth theme and logo.png to the rootfs, runs plymouth-set-default-theme archipelago, and includes splash quiet in the kernel cmdline.

File: image-recipe/build-auto-installer-iso.sh (plymouth setup section)
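The kernel cmdline part of that verification is easy to script. The helper below is a hypothetical sketch (missing_kernel_args is not an existing tool) that reports which required arguments are absent from a given command line.

```shell
#!/usr/bin/env bash
# Sketch: check that required kernel arguments are present.
# missing_kernel_args is a hypothetical helper name.

# missing_kernel_args CMDLINE ARG...: print each ARG that is absent from
# CMDLINE (matched as a whole word); return non-zero if any are missing.
missing_kernel_args() {
  local cmdline="$1"
  shift
  local -a present
  read -r -a present <<< "$cmdline"
  local wanted arg found status=0
  for wanted in "$@"; do
    found=0
    for arg in "${present[@]}"; do
      if [ "$arg" = "$wanted" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "$wanted"
      status=1
    fi
  done
  return "$status"
}

# On an installed system this might be run as:
#   missing_kernel_args "$(cat /proc/cmdline)" splash quiet
```

Matching whole words avoids false positives such as nosplash being accepted as splash.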

7. MEDIUM: No custom MOTD

Symptom: Default Debian MOTD on VT1 login.

Fix: Add a custom MOTD to the ISO build that shows the Archipelago ASCII logo, version, IP address, and useful commands (kiosk toggle, SSH info).

File: image-recipe/build-auto-installer-iso.sh (add MOTD generation)
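A MOTD generator along these lines might look like the following sketch. The banner text, the render_motd and first_ipv4 helpers, and the archipelago-kiosk enable|disable syntax are assumptions for illustration; the real logo and command names should come from the project.

```shell
#!/usr/bin/env bash
# Sketch of a MOTD generator. Banner text and helper names are hypothetical.

# render_motd VERSION [IP]: print the login banner to stdout.
render_motd() {
  local version="$1" ip="${2:-unknown}"
  cat <<EOF
  ARCHIPELAGO (ASCII logo placeholder)

  Version:  $version
  Address:  $ip

  Useful commands:
    archipelago-kiosk enable|disable   toggle the kiosk display (assumed syntax)
    ssh <user>@$ip                     remote shell access
EOF
}

# first_ipv4: first address reported by `hostname -I`, empty if none.
first_ipv4() {
  hostname -I 2>/dev/null | awk '{ print $1 }'
}

# Possible use at boot or login rather than at ISO build time:
#   render_motd "$ARCHY_VERSION" "$(first_ipv4)" > /etc/motd
```

Writing /etc/motd once at build time would freeze the IP address, so generating it at boot or login is probably preferable if the address can change.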

8. MEDIUM: Onboarding intro needs double press

Symptom: Pressing the intro circle/button once resets it; it has to be pressed twice to proceed.

Root cause: SplashScreen.vue has a 48-segment ring animation triggered on hover. The splash → intro transition may have a race condition with animation completion. OnboardingIntro.vue auto-focuses the CTA after a 2100ms delay; if the user clicks before that, the focus change may steal the event.

Fix: Investigate SplashScreen.vue transition timing. Add a click debounce or ensure a single click always proceeds.

Files: neode-ui/src/components/SplashScreen.vue, neode-ui/src/views/OnboardingIntro.vue

9. MEDIUM: No TUI animations in actual installer

Symptom: Installer is functional but plain: no bouncing Bitcoin, no glitch effects from the demo.

Root cause: scripts/install-tui-demo.sh has elaborate animations, but the actual installer in the ISO build script is minimal (basic spinner + typewriter only).

Fix: Port key animations from install-tui-demo.sh into the actual installer: logo decrypt reveal, progress bars with percentage, phase transitions. Keep it lightweight but visually distinctive.

File: image-recipe/build-auto-installer-iso.sh (auto-install.sh section)
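As a starting point for the "progress bars with percentage" item, a lightweight bar can be rendered in pure bash. This is a generic sketch, not the code from install-tui-demo.sh; render_bar and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a text progress bar. render_bar is a hypothetical helper.

# render_bar CURRENT TOTAL [WIDTH]: print "[####----] NN%" without a newline.
render_bar() {
  local current="$1" total="$2" width="${3:-20}"
  if ((total <= 0)); then
    return 1
  fi
  # Clamp so the bar never overflows its width.
  if ((current > total)); then current="$total"; fi
  if ((current < 0)); then current=0; fi
  local percent=$((current * 100 / total))
  local filled=$((current * width / total))
  local bar="" i
  for ((i = 0; i < width; i++)); do
    if ((i < filled)); then bar+="#"; else bar+="-"; fi
  done
  printf '[%s] %3d%%' "$bar" "$percent"
}

# Possible use inside an install phase (redraws on one line):
#   for step in 1 2 3 4 5; do
#     printf '\r'; render_bar "$step" 5 30; sleep 0.2
#   done; printf '\n'
```

Because the function only prints a string, the caller decides how to redraw it, which keeps it usable both on a real TTY and in logs.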

10. LOW: Container tests CI failing

Symptom: cargo: command not found in container-tests workflow.

Fix: Add source $HOME/.cargo/env to test steps. Already staged locally.

File: .gitea/workflows/container-tests.yml

11. LOW: Kiosk enable/disable command lacks visual feedback

Symptom: User runs the command and the MOTD changes, but there is no immediate visual confirmation.

Root cause: The archipelago-kiosk script DOES print feedback messages. The issue may be that the VT auto-switches before the user sees the output.

Fix: Add a brief sleep before the VT switch so the user sees the confirmation message. Consider adding a --quiet flag for scripted use.

File: image-recipe/build-auto-installer-iso.sh (kiosk toggle script)

Execution Order

  1. Tor fix (#1) — 5 min, critical
  2. Kiosk reliability (#3) — 15 min, high impact
  3. GRUB text scaling (#4) — 15 min, visible
  4. LUKS disk stats (#5) — 20 min, backend + frontend
  5. App install error messages (#2) — 10 min, verify + improve
  6. Plymouth boot splash (#6) — 15 min
  7. Custom MOTD (#7) — 10 min
  8. Intro double-press (#8) — 10 min
  9. TUI animations (#9) — 30 min (port from demo)
  10. CI fix (#10) — 2 min
  11. Kiosk feedback (#11) — 5 min

Verification

  • Build new ISO on .228 via CI (push to main)
  • Flash to USB, install on .198
  • Check: GRUB readable → Plymouth splash → TUI installer animations → MOTD shows → Kiosk loads first time → Tor onion addresses visible → App installs work → Disk shows LUKS → Intro single-click works