archy/docs/BETA-ISSUES-20260328.md

release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet

Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 + v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500) with no recovery path short of SSH. This release adds a self-check guardrail to the update flow.

What changed:

- apply_update() writes a pending-verify marker with old+new version and a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is present and within its freshness window, the new binary waits 15s for nginx + backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and nothing else happens.
- On window-exhaust, the new binary:
  1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts> (quarantined, not deleted, so we can post-mortem).
  2. Restores web-ui.bak on top of web-ui.
  3. Calls rollback_update() to restore the previous binary.
  4. Updates state.current_version to reflect the rollback.
  5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without probing, so a crashed-during-startup marker from weeks ago cannot spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount namespace. Without this, the rollback silently failed with EROFS on /usr/local/bin and orphaned the rollback - the exact opposite of what auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip, absent-marker noop, no-panic on verify_pending_update with nothing to verify, and an invariant assert that the 90s probe window stays below the 600s stale threshold. All passing.

Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a single awk NR==1 that doesn't short-circuit the upstream pipe, so the release-build flow is idempotent again.
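
The SIGPIPE side fix can be illustrated with a small sketch. This is not the actual contents of scripts/create-release-manifest.sh: the helper name `first_line` is hypothetical, and `seq` stands in for `tar tvzf` on a large archive. The point is that `awk 'NR==1'` prints the first line but keeps reading to end of input, so the upstream producer never writes into a closed pipe.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print only the first line of stdin.
# awk 'NR==1' has no early exit, so it consumes the whole stream and the
# upstream command finishes normally instead of being killed by SIGPIPE
# (which pipefail would surface as exit status 141).
first_line() {
    awk 'NR==1'
}

# A large producer, standing in for `tar tvzf` on a big archive.
first=$(seq 1 200000 | first_line)
echo "$first"
```

In the manifest script this would presumably be applied as something like `tar tvzf "$archive" | first_line`, where `$archive` is a placeholder. The trade-off is that awk reads the full listing rather than stopping after one line, which costs a little time but makes the pipeline's exit status deterministic.
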
2026-04-22 16:14:35 -04:00
# Beta Test Issues — 2026-03-28 (ISO build 2137)
Hardware: Dell OptiPlex 3020M, i5, 8GB RAM, 465G HDD, UEFI+Legacy
## ISO / Boot (image-recipe)
### 1. UEFI autodetect broken
- **Severity**: High
- **Detail**: Only autodetects/boots in Legacy BIOS mode. UEFI boot does not autodetect the install disk.
- **Where**: `build-auto-installer-iso.sh` GRUB config, EFI boot chain
- **Status**: TODO
### 2. Installation TUI screens need redesign
- **Severity**: Medium
- **Detail**: Current installer output is plain/ugly. Needs polished design.
- **Action**: User will provide .md mockup for each screen, then we implement.
- **Where**: `build-auto-installer-iso.sh` auto-install.sh embedded script
- **Status**: AWAITING DESIGN
### 3. No TUI animations
- **Severity**: Low
- **Detail**: Would like Claude-style spinner/progress animations during install. May not be possible with bash.
- **Where**: auto-install.sh
- **Status**: TODO (investigate)
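
As a starting point for the investigation, a basic spinner is feasible in plain bash: run the step in the background and redraw one character with a carriage return until it exits. The sketch below is only an illustration. `run_with_spinner` and its label argument are hypothetical names, not part of auto-install.sh, and it assumes a `sleep` that accepts fractional seconds and a terminal that honors `\r`.

```bash
#!/usr/bin/env bash
# Minimal spinner sketch (hypothetical helper, not from auto-install.sh).

# Usage: run_with_spinner "label" command [args...]
# Runs the command in the background, animates a spinner on stderr until
# it exits, then returns the command's exit status.
run_with_spinner() {
    local label="$1"; shift
    local frames='|/-\'
    local i=0 rc=0

    "$@" &                       # start the long-running step
    local pid=$!

    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        printf '\r%s %s' "${frames:i:1}" "$label" >&2
        i=$(( (i + 1) % ${#frames} ))
        sleep 0.1
    done

    # Collect the exit status of the finished background job.
    wait "$pid" || rc=$?

    if [ "$rc" -eq 0 ]; then
        printf '\r%s %s\n' "+" "$label" >&2
    else
        printf '\r%s %s\n' "x" "$label" >&2
    fi
    return "$rc"
}
```

A call such as `run_with_spinner "Copying system files" sleep 2` shows the pattern. Any output the wrapped command prints will interleave with the spinner line, so a real installer step would likely redirect it to a log file.
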
### 4. USB read errors on boot
- **Severity**: Medium (cosmetic but bad first impression)
- **Detail**: Read errors scroll on screen during USB boot before installer loads. Scares new users.
- **Where**: Kernel/initramfs boot, possibly `quiet` not suppressing early messages
- **Status**: TODO
### 5. GRUB background tiling + text cutoff
- **Severity**: Medium
- **Detail**: Boot menu background image tiles instead of scaling. Menu text ("Install Archipelago", "Failsafe mode") is cut off.
- **Where**: `branding/grub-theme/`, `boot/grub/grub.cfg`, theme.txt resolution settings
- **Status**: TODO
### 6. USB removal drops to command line
- **Severity**: Medium
- **Detail**: After install completes, removing USB drops to shell before user presses Enter to reboot. Confuses non-technical users.
- **Where**: auto-install.sh — end of install, before `read -s` / `reboot`
- **Status**: TODO
## Frontend / UI (neode-ui)
### 7. Broken splash screen flashes before onboarding
- **Severity**: High
- **Detail**: Black screen with "online/offline" top-right, broken archipelago image top-left, "use arrow keys" text. Flashes briefly before onboarding loads.
- **Where**: Likely `RootRedirect.vue` or `SplashScreen.vue` — routing/transition timing
- **Status**: TODO (reported before, persists)
### 8. Skip buttons still visible in onboarding
- **Severity**: Medium
- **Detail**: Onboarding flow still shows skip buttons. Should be removed for clean UX.
- **Where**: `src/views/onboarding/` components
- **Status**: TODO
### 9. App install UX outdated
- **Severity**: High
- **Detail**: Missing the yellow "Installing..." button that persists across navigation. Apps don't show as "installing" in My Apps view during install.
- **Where**: `src/views/marketplace/`, `src/views/myapps/`, app install store
- **Status**: TODO
### 10. Login requires double Enter
- **Severity**: Medium
- **Detail**: Password field on login page requires pressing Enter twice to submit.
- **Where**: `src/views/LoginView.vue` — form submission handler
- **Status**: TODO (reported before, persists)
### 11. No password setting UI
- **Severity**: High
- **Detail**: No way for the user to set or change their password from the web UI. The password is currently hardcoded to `password123`.
- **Where**: Settings view, backend auth API
- **Status**: TODO
### 12. Browser login loops (non-kiosk)
- **Severity**: High
- **Detail**: Logging in from a browser (not kiosk) on the same network redirects back to login in a loop. Kiosk mode works fine.
- **Where**: Auth/session handling — possibly cookie `SameSite` or redirect logic in `RootRedirect.vue`
- **Status**: TODO
### 13. Can't exit input fields with arrow keys
- **Severity**: Medium
- **Detail**: When focused on a text input, up/down arrow keys don't move focus to adjacent UI elements. Stuck in the field.
- **Where**: `useControllerNav.ts` — input field focus trap logic
- **Status**: TODO (reported before, persists)
---
## Summary
| Category | Critical | High | Medium | Low |
|----------|----------|------|--------|-----|
| ISO/Boot | 0 | 1 | 4 | 1 |
| Frontend | 0 | 4 | 3 | 0 |
| **Total** | **0** | **5** | **7** | **1** |