archy/docs/BETA-PROGRESS.md
Dorian 1a31c33ae8 fix: BUG-1 CSRF, TASK-8 H2/H3/H4, BUG-20/37/40/41 — 7 bugs fixed
BUG-1 (P0): CSRF tokens now HMAC-derived from session token instead of
random — survives backend restarts, eliminates cookie/header race conditions.
Frontend retries 403s as belt-and-suspenders.

TASK-8 H2: federation.peer-joined verifies ed25519 signature on join messages.
TASK-8 H3: federation.peer-address-changed requires signed proof from known peer.
TASK-8 H4: Rust backend default bind 0.0.0.0 → 127.0.0.1 (nginx proxies all).

BUG-20: ElectrumX index estimate string fixed from ~55GB to ~130GB.
BUG-37: App card Start/Stop buttons split into loading vs interactive states
        to prevent WebSocket state flicker during container scans.
BUG-40: Uninstall modal uses Teleport to body with z-[3000] for full overlay.
BUG-41: Uninstalling overlay on card + optimistic store removal.

Updated MASTER_PLAN.md and BETA-PROGRESS.md to reflect all completed work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:05:21 +00:00


Beta Progress Tracker

Goal: Flawless beta that works perfectly on every machine we install it on.
Freeze started: 2026-03-18
Last updated: 2026-03-18


Pipeline

PHASE 1: Feature Testing (internal)     ← WE ARE HERE
    ↓
PHASE 2: User Testing (real users, controlled)
    ↓
PHASE 3: Beta Live (public release)

Current phase: PHASE 1 (Feature Testing)
Gate to Phase 2: Every feature works, all bugs fixed, security hardened, ISO verified
Gate to Phase 3: User testing feedback resolved, no P0/P1 issues remaining


Phase 1: Feature Testing (Internal)

Everything in this phase must pass before we hand it to real users.

Overall Status: IN PROGRESS (~35%)

| Workstream | Status | Completion | Gate-blocking? |
|---|---|---|---|
| 1A. Critical Bugs (BUG-1 CSRF) | NOT STARTED | 0% | YES |
| 1B. Boot Screen (FEATURE-4) | IN PROGRESS | ~20% | YES |
| 1C. Security Hardening (TASK-8) | IN PROGRESS | ~75% (9/12 fixed) | YES |
| 1D. Rootless Podman (TASK-11) | DONE (.228), IN PROGRESS (.198) | ~80% | YES |
| 1E. Beta Telemetry (TASK-12) | NOT STARTED | 0% | YES |
| 1F. App Testing (every feature) | NOT STARTED | 0% | YES |
| 1G. ISO Build & Fresh Install | NOT STARTED | 0% | YES |
| 1H. UI Polish & Layout | DONE (batch) | ~80% | No |
| 1I. WebSocket Reliability | NOT STARTED | 0% | No |
| 1J. Quality Baseline Check | NOT STARTED | 0% | No |

1A. Critical Bugs

BUG-1: Random logout / CSRF mismatch — P0

Status: PLANNED
Impact: Users get randomly logged out. This blocks user testing and is unacceptable UX.

What's known:

  • Sessions now persist to disk (fixed)
  • CSRF token mismatch between cookie and header still causes 403s
  • Likely caused by cookie rotation in multi-tab or deploy scenarios

Remaining work:

  • Add debug logging to capture actual cookie vs header values
  • Reproduce reliably (multi-tab, deploy, long idle)
  • Fix the root cause
  • Verify fix survives deploys and multi-tab use
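The commit header for this revision records the fix that landed: CSRF tokens are now derived from the session token with an HMAC instead of being generated at random. Below is a minimal sketch of that scheme. It is written in TypeScript with Node's built-in crypto module so it stays self-contained; the real backend is Rust. The secret handling, the "csrf:" prefix and the function names are assumptions, not the shipped code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server secret. It must be persisted (not regenerated at boot)
// for tokens to survive a backend restart.
const CSRF_SECRET = Buffer.from("replace-with-a-persisted-random-secret", "utf8");

// The CSRF token is a pure function of the session token, so every restart
// computes the same value for the same session.
function deriveCsrfToken(sessionToken: string, secret: Buffer = CSRF_SECRET): string {
  return createHmac("sha256", secret)
    .update("csrf:" + sessionToken) // prefix separates this use of the secret from others
    .digest("hex");
}

// Recompute the expected token from the session cookie and compare it with the
// header value in constant time.
function verifyCsrfToken(sessionToken: string, headerToken: string, secret: Buffer = CSRF_SECRET): boolean {
  const expected = Buffer.from(deriveCsrfToken(sessionToken, secret), "utf8");
  const provided = Buffer.from(headerToken, "utf8");
  // timingSafeEqual throws if the lengths differ, so check that first.
  return expected.length === provided.length && timingSafeEqual(expected, provided);
}
```

No random value is stored per session, so there is no cookie value to rotate and no window where cookie and header disagree. Rotating CSRF_SECRET invalidates every outstanding token at once.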

BUG-3: IndeedHub WebSocket spam — P2

Status: PLANNED
Impact: Minor console noise. Should be fixed before user testing.

  • Rebuild IndeedHub with relative WebSocket URL
  • Verify fix

1B. Boot Screen (FEATURE-4)

Status: IN PROGRESS (started 2026-03-17)
Impact: Users hit errors on first boot before the backend is ready. Blocks user testing.

  • Audit current /health endpoint — what does it check?
  • Add granular service readiness to health endpoint
  • Design boot screen component (screensaver + progress)
  • Create pixel art icon animations
  • Implement health polling with smooth transition
  • Handle edge cases (slow start, partial failures, timeout)
  • Test on fresh ISO install (first-boot path)
  • Test on normal reboot (existing user path)
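The polling and edge-case steps could take roughly the shape below. This is a sketch under assumptions:

  • The /health response format (a per-service status map) is hypothetical.
  • The option names and polling interval are hypothetical.
  • The fetch function is passed in, so the loop can be exercised without a live backend.

```typescript
// Hypothetical shape of a granular /health response: one status per service.
interface HealthReport {
  services: Record<string, "starting" | "ready" | "failed">;
}

type BootResult =
  | { kind: "ready" }
  | { kind: "partial-failure"; failed: string[] }
  | { kind: "timeout"; pending: string[]; lastError?: string };

interface PollOptions {
  intervalMs: number;
  timeoutMs: number;
  onProgress?: (ready: number, total: number) => void; // drives the progress display
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until every service reports ready, any service fails, or the deadline passes.
async function waitForBoot(fetchHealth: () => Promise<HealthReport>, opts: PollOptions): Promise<BootResult> {
  const deadline = Date.now() + opts.timeoutMs;
  let pending: string[] = [];
  let lastError: string | undefined;
  while (Date.now() < deadline) {
    try {
      const entries = Object.entries((await fetchHealth()).services);
      const failed = entries.filter(([, status]) => status === "failed").map(([name]) => name);
      pending = entries.filter(([, status]) => status !== "ready").map(([name]) => name);
      opts.onProgress?.(entries.length - pending.length, entries.length);
      if (failed.length > 0) return { kind: "partial-failure", failed };
      if (entries.length > 0 && pending.length === 0) return { kind: "ready" };
    } catch (err) {
      // Slow start: the backend may not accept connections yet. Remember the
      // error for the timeout screen and keep polling.
      lastError = err instanceof Error ? err.message : String(err);
    }
    await sleep(opts.intervalMs);
  }
  return { kind: "timeout", pending, lastError };
}
```

A boot screen could render the onProgress counts as it goes. On failure or timeout, it could list the failed or pending service names instead of showing a generic error.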

1C. Security Hardening (TASK-8)

Status: IN PROGRESS — 9 of 12 pentest findings fixed (commits 27f205f, c1db74e)

Fixed (9/12)

  • C1: /lnd-connect-info requires session auth
  • C3: DEV_MODE removed from production service
  • H1: node-message verifies ed25519 signatures
  • M1: content.add rejects path traversal via .. segments
  • M2: NIP-07 postMessage uses specific origin
  • M3: AIUI nginx checks session_id cookie
  • L2: Strict v3 onion validation
  • MED-03: bitcoin.conf generation hardened against shell injection
  • MED-07: Body size limit added to /rpc/
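For M1, a check based on resolved paths is more robust than searching the input for "..". The sketch below uses Node's path module and assumes POSIX paths. resolveUnderRoot is a hypothetical helper, not the actual content.add code.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a user-supplied path under `root` and throw if
// the result escapes it (e.g. "../../etc/passwd" or "/etc/passwd").
function resolveUnderRoot(root: string, userPath: string): string {
  if (userPath.includes("\0")) {
    throw new Error("path contains a NUL byte");
  }
  const base = path.resolve(root);
  const target = path.resolve(base, userPath);
  // The path from base to target starts with a ".." segment (or is absolute,
  // e.g. another drive on Windows) exactly when target lies outside base.
  const rel = path.relative(base, target);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes content root: ${userPath}`);
  }
  return target;
}
```

A bare substring check would wrongly reject names such as ..notes.txt and would miss absolute paths. Symlinks inside the root can still point elsewhere; resolving them with fs.realpath before the comparison closes that gap.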

Remaining (3/12)

  • H2: Federation peer-joined signature verification
  • H3: Federation address-changed signature verification
  • H4: Bind service ports to 127.0.0.1 (Bitcoin RPC, LND, etc.)
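The commit header for this revision reports H2 and H3 as implemented:

  • peer-joined verifies an ed25519 signature on join messages.
  • peer-address-changed requires signed proof from a known peer.

A sketch of such a check with Node's built-in ed25519 support follows. All of these are assumptions:

  • the message fields
  • the JSON-array encoding of the signed bytes
  • the 5-minute freshness window
  • the key lookup

For peer-address-changed, the lookup would consult the known-peer list. This tracker does not say where peer-joined obtains the joiner's key, so that is left to the caller.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical federation message; the signature covers every other field.
interface SignedPeerMessage {
  kind: "peer-joined" | "peer-address-changed";
  peerDid: string;
  address: string;
  timestamp: number; // ms since epoch, lets the receiver reject stale replays
  signature: string; // base64-encoded ed25519 signature
}

type UnsignedPeerMessage = Omit<SignedPeerMessage, "signature">;

const MAX_SKEW_MS = 5 * 60 * 1000; // assumed freshness window

// Canonical bytes to sign: fixed field order so signer and verifier agree.
function signedBytes(m: UnsignedPeerMessage): Buffer {
  return Buffer.from(JSON.stringify([m.kind, m.peerDid, m.address, m.timestamp]), "utf8");
}

function signPeerMessage(m: UnsignedPeerMessage, privateKey: KeyObject): SignedPeerMessage {
  // For ed25519 keys, Node requires the algorithm argument to be null or undefined.
  return { ...m, signature: sign(null, signedBytes(m), privateKey).toString("base64") };
}

// lookupKey maps a peer DID to a trusted ed25519 public key, or undefined.
function verifyPeerMessage(
  m: SignedPeerMessage,
  lookupKey: (did: string) => KeyObject | undefined,
  now: number = Date.now(),
): boolean {
  const publicKey = lookupKey(m.peerDid);
  if (publicKey === undefined) return false; // unknown peer: reject
  if (Math.abs(now - m.timestamp) > MAX_SKEW_MS) return false; // stale or future-dated
  const { signature, ...fields } = m;
  return verify(null, signedBytes(fields), publicKey, Buffer.from(signature, "base64"));
}

// Usage: peer A signs an address change; a node that already knows A verifies it.
const peerA = generateKeyPairSync("ed25519");
const knownPeers = new Map([["did:example:peer-a", peerA.publicKey]]);
const change = signPeerMessage(
  { kind: "peer-address-changed", peerDid: "did:example:peer-a", address: "10.0.0.7:443", timestamp: Date.now() },
  peerA.privateKey,
);
console.log(verifyPeerMessage(change, (did) => knownPeers.get(did))); // true
```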

1D. Rootless Podman (TASK-11)

Status: DONE on .228 (30 containers rootless), IN PROGRESS on .198
Impact: Security posture. Containers no longer require root.

  • Migrate existing root Podman containers to rootless (archipelago user)
  • Update PodmanClient to run podman directly (no sudo) — 9 Rust files
  • Deploy script auto-fixes ownership + sysctl + linger on every deploy
  • All 30 containers running rootless on .228
  • .198: only 2 containers running — needs full container recreation (TASK-39)
  • Tailscale deploy script: full deploy-tailscale.sh with split-mode SSH, rootful→rootless migration, container creation, all infrastructure
  • Test full deploy on .198 (validation before Tailscale)
  • Deploy to Tailscale nodes (Arch 1/2/3)

1E. Beta Telemetry — Node Reporting (TASK-12)

Status: NOT STARTED
Impact: Without this we're blind during user testing and can't see what's broken on testers' machines.

All beta nodes report health/errors to a central log. We build a panel to monitor and triage issues.

Design:

  • Opt-in telemetry (user consents during onboarding or settings)
  • Each node periodically reports: health status, error log digest, container states, uptime
  • Central endpoint collects reports (could be a simple API on one of our servers)
  • Dashboard panel shows all reporting nodes, their status, recent errors
  • Privacy: no wallet data, no keys, no personal data — only system health and error logs
  • Nodes identified by anonymous ID (hash of DID), not IP or name
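One possible report payload that follows these privacy rules is sketched below. The tracker only says "hash of DID", so SHA-256 over a fixed prefix is an assumption. All field names and the error-digest format are also hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical payload: system health only. No wallet data, keys, IPs or names.
interface TelemetryReport {
  nodeId: string; // anonymous ID derived from the DID, never the DID itself
  reportedAt: string; // ISO-8601 timestamp
  uptimeSeconds: number;
  versions: Record<string, string>;
  health: "ok" | "degraded" | "down";
  containers: Record<string, "running" | "stopped" | "restarting">;
  errorDigest: ErrorDigestEntry[];
}

interface ErrorDigestEntry {
  source: string;
  message: string;
  count: number;
}

// Anonymous node ID: SHA-256 of a fixed prefix plus the DID (hash choice assumed).
function anonymousNodeId(did: string): string {
  return createHash("sha256").update("telemetry-node-id:" + did).digest("hex");
}

// Collapse raw error lines into per-(source, message) counts, most frequent first,
// so a report carries a digest instead of the full log.
function digestErrors(errors: { source: string; message: string }[]): ErrorDigestEntry[] {
  const counts = new Map<string, ErrorDigestEntry>();
  for (const { source, message } of errors) {
    const key = JSON.stringify([source, message]);
    const entry = counts.get(key) ?? { source, message, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
  }
  return Array.from(counts.values()).sort((a, b) => b.count - a.count);
}
```

An unkeyed hash stays stable across reports, which the dashboard needs. However, anyone who already knows a node's DID can recompute its ID. If that linkage matters, a keyed hash would be the stricter choice.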

Tasks:

  • Design report payload (health, errors, container states, versions, uptime)
  • Design privacy model — what's collected, what's NOT, user consent flow
  • Build reporting endpoint (backend RPC → central collector)
  • Build central collector service (receives + stores reports)
  • Build monitoring dashboard/panel (view all nodes, filter by error type)
  • Add opt-in toggle to Settings UI
  • Add reporting interval config (default: every 15 min?)
  • Test with multi-node fleet (.228, .198, Tailscale nodes)

1F. App Testing — Every Feature

Status: NOT STARTED
Reference: docs/BETA-RELEASE-CHECKLIST.md (full matrix)

Systematic test of every feature on the dev server, then on fresh install.

Core Flows

  • Onboarding: welcome → password → path → DID → backup → dashboard
  • Login / logout / re-login
  • Password change (invalidates other sessions)
  • 2FA enrollment and verification
  • Settings: view server name, version, DID, Tor address
  • Dashboard: all overview cards render with data

App Lifecycle (every app)

  • Bitcoin Knots: install, sync starts, UI loads, uninstall
  • Electrs: install, auto-connects to Bitcoin, UI loads, uninstall
  • LND: install, auto-connects to Bitcoin, UI loads, uninstall
  • BTCPay Server: install, connects, Lightning available, uninstall
  • Mempool: install with Bitcoin+Electrs, shows data, uninstall
  • Fedimint + Gateway: install, UI loads, uninstall
  • File Browser: install, UI loads, uninstall
  • Immich: install, UI loads, uninstall
  • PhotoPrism: install, UI loads, uninstall
  • Penpot: install, UI loads, uninstall
  • SearXNG: install, UI loads, uninstall
  • Ollama: install, UI loads, uninstall
  • Nostr Relay: install, UI loads, uninstall
  • Nginx Proxy Manager: install, UI loads, uninstall
  • Tailscale: install, UI loads, uninstall
  • Home Assistant: install, UI loads (new tab), uninstall
  • IndeedHub: opens external URL in iframe

Dependency Chain Errors

  • Electrs without Bitcoin → clear error message
  • LND without Bitcoin → clear error message
  • Mempool without Bitcoin+Electrs → clear error message
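These three cases reduce to a table of required apps that is checked before install. A sketch of such a check follows. The app IDs (for instance bitcoin-knots), the display names and the message wording are hypothetical. Only the dependencies named in the list above are encoded.

```typescript
// Hypothetical app IDs; dependencies taken from the test list above.
const DEPENDENCIES: Record<string, string[]> = {
  electrs: ["bitcoin-knots"],
  lnd: ["bitcoin-knots"],
  mempool: ["bitcoin-knots", "electrs"],
};

const DISPLAY_NAMES: Record<string, string> = {
  "bitcoin-knots": "Bitcoin Knots",
  electrs: "Electrs",
  lnd: "LND",
  mempool: "Mempool",
};

// Returns null when the app can be installed, otherwise a user-facing message
// naming every missing dependency.
function checkDependencies(appId: string, installed: ReadonlySet<string>): string | null {
  const missing = (DEPENDENCIES[appId] ?? []).filter((dep) => !installed.has(dep));
  if (missing.length === 0) return null;
  const name = (id: string) => DISPLAY_NAMES[id] ?? id;
  const pronoun = missing.length === 1 ? "it" : "them";
  return `${name(appId)} requires ${missing.map(name).join(" and ")}. Install ${pronoun} first.`;
}
```

For example, installing Mempool with only Bitcoin Knots present yields "Mempool requires Electrs. Install it first." instead of a failed container start.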

Federation & Identity

  • Federation invite + join between nodes
  • DWN sync between federated nodes
  • Backup create + download
  • Backup restore on fresh install

WebSocket

  • Connects on login, receives initial data
  • Reconnects after network drop
  • Ping/pong heartbeat both directions
  • Connection state visible in UI
  • Install progress delivered real-time
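The reconnect and heartbeat items depend on timing logic that can be tested separately from any socket. A sketch of that logic follows. The full-jitter backoff formula is a common choice, not something this tracker specifies. The 500 ms base, the 30 s cap and the timeout passed to HeartbeatMonitor are assumptions. Because the heartbeat runs in both directions, equivalent logic would be needed on the client and on the backend.

```typescript
// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 30000, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Declares the connection dead when a ping has been outstanding longer than
// timeoutMs with no pong (or any other message) received since it was sent.
class HeartbeatMonitor {
  private readonly timeoutMs: number;
  private lastPingAt: number | null = null;
  private lastSeenAt: number;

  constructor(timeoutMs: number, now: number) {
    this.timeoutMs = timeoutMs;
    this.lastSeenAt = now;
  }

  pingSent(now: number): void {
    this.lastPingAt = now;
  }

  messageReceived(now: number): void {
    this.lastSeenAt = now;
  }

  isDead(now: number): boolean {
    if (this.lastPingAt === null) return false; // nothing outstanding yet
    return this.lastSeenAt < this.lastPingAt && now - this.lastPingAt > this.timeoutMs;
  }
}
```

A client would:

  • call pingSent on each ping and messageReceived on every inbound frame
  • check isDead on a timer
  • when the socket is dead or closed, wait reconnectDelayMs(attempt) before reopening
  • reset attempt after a successful connect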

Nginx Proxies

  • Every /app/* proxy resolves correctly
  • BTCPay and Home Assistant open in new tab
  • Tor hidden services resolve

1G. ISO Build & Fresh Install

Status: NOT STARTED

  • ISO builds successfully on dev server
  • ISO size < 10 GB
  • All container images captured
  • Boot from USB on x86_64 hardware
  • Auto-installer partitions correctly
  • Services start on first boot
  • Web UI accessible within 3 minutes
  • Full onboarding flow completes
  • Second machine test (different hardware)
  • ARM64 test (if targeting)

1H. UI Polish & Layout

Status: MOSTLY DONE (batch of fixes shipped 2026-03-18)
Note: Layout rearrangements and UX improvements are allowed during the freeze.

  • Rename fedimintd → "Fedimint Guardian" + icon (TASK-26)
  • Tab-launch icons for apps opening in new tabs (TASK-27)
  • Installed apps sorted to end of marketplace (TASK-28)
  • Mesh mobile: header hidden, overflow fixed (TASK-29)
  • On-Chain first in receive modals (TASK-30)
  • Federation node names — show name not DID, hover for key (TASK-35)
  • Cleaner iframe error screen with remediation (TASK-36)
  • CPU alert threshold fixed (BUG-33)
  • ElectrumX shows index size during indexing
  • Container startup "Checking..." shimmer
  • Sticky nav header (TASK-31)
  • Review all views for consistent glass design
  • Verify all loading/empty/error states work
  • Check responsive layout on tablet/mobile

1I. WebSocket Reliability

Covered under 1F testing — no separate workstream needed.


1J. Quality Baseline Check

Last known (2026-03-11):

  • Silent catches: 0
  • Console statements: 0
  • any types: 0
  • TypeScript errors: 0
  • Tests: 515 passed
  • npm audit (runtime): 0

Remaining work:

  • Re-run the full quality sweep and verify there are no regressions
  • Fix any new violations


Phase 2: User Testing (Controlled)

Gate: All Phase 1 items pass. No P0/P1 bugs open.

Starts when we hand ISOs to real users on real hardware we don't control.

| Item | Status |
|---|---|
| Recruit test users (3-5 people, varied hardware) | NOT STARTED |
| Provide ISOs + install instructions | NOT STARTED |
| Beta telemetry collecting reports from user nodes | NOT STARTED |
| Monitor dashboard for errors across fleet | NOT STARTED |
| Triage + fix reported issues | NOT STARTED |
| User feedback collection (structured form or channel) | NOT STARTED |
| Fix all P0/P1 issues from user reports | NOT STARTED |
| Rebuild ISO with fixes, re-test | NOT STARTED |

Phase 3: Beta Live (Public)

Gate: User testing complete. No P0/P1 issues. Telemetry shows stable fleet.

| Item | Status |
|---|---|
| Final ISO build with all fixes | NOT STARTED |
| Release notes / changelog | NOT STARTED |
| Download page / distribution | NOT STARTED |
| Public announcement | NOT STARTED |
| Telemetry monitoring active for early adopters | NOT STARTED |

Session Log

| Date | Session | Work Done | Items Closed |
|---|---|---|---|
| 2026-03-18 | #1 | Created beta freeze plan, progress tracker | |
| 2026-03-18 | #2 | Restructured into 3-phase pipeline, added telemetry workstream | |
| 2026-03-18 | #3 | Updated tracking to reflect completed work: TASK-11 done, TASK-8 9/12, UI batch done | TASK-11, TASK-26-30, TASK-32, TASK-34-36, BUG-33 |
| 2026-03-18 | #4 | Rewrote deploy-tailscale.sh (full deploy with split-mode SSH, rootful migration, containers, infra). Fixed first-boot-containers.sh rootless bugs (subnet, UID mapping, prereqs). Dynamic HTTPS certs. | |

Post-Beta Parking Lot

These are explicitly deferred until after beta ships:

  • FEATURE-6: Watch-only wallet architecture
  • TASK-7: Mesh Bitcoin security hardening
  • INQUIRY-5: Offline balance check via mesh relay
  • TASK-2: Roll incoming-tx into deploy & ISO (P2, not blocking)
  • did:dht integration
  • Multi-user support
  • Cluster mode
  • Mobile companion PWA