- Update all skill SSH commands from sshpass to key-based auth (~/.ssh/archipelago-deploy) - Add proxy_connect_timeout 120s to nginx Ollama location blocks - Add new polish/sweep skills for overnight automation - Add demo content (documents, photos) for demo stack - Add .ssh/ to .gitignore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
22 KiB
Archipelago Production Polish Plan
Duration: 8 weeks (March 10 – May 4, 2026) Goal: Zero new features. Every existing feature polished to flawless production quality. Philosophy: The iPhone moment — everything just works, feels inevitable, no rough edges.
SSH Access
All remote commands use SSH key auth (password auth is disabled):
ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228
Never use sshpass. The deploy script handles this automatically via SSH_KEY.
Audit Summary
Full codebase audit completed March 8, 2026. Findings:
| Layer | Critical | High | Medium | Low |
|---|---|---|---|---|
| Frontend (Vue/TS) | 4 | 6 | 10 | 4 |
| Backend (Rust) | 6 | 6 | 6 | 7 |
| Infrastructure | 5 | 6 | 7 | 3 |
| UX Flows | 4 | 4 | 6 | 3 |
| Total | 19 | 22 | 29 | 17 |
Skills Required
Existing Skills (14)
deploy, deploy-both, diagnose, check-server, frontend-dev, sync-configs, build-iso, server-logs, add-app, harden, test, lint, ux-review, refactor
New Skills (9)
| Skill | Purpose |
|---|---|
polish |
Main orchestrator — reads this plan, detects week, executes tasks |
polish-errors |
Fix silent error handling, add user-facing error states |
polish-loading |
Add skeleton loaders, loading indicators, empty states |
polish-forms |
Input validation, trimming, real-time feedback |
polish-backend |
Fix unwrap/expect, add timeouts, connection pooling |
polish-deploy |
Add rollback, health checks, pre-deploy validation |
polish-security |
Systemd hardening, nginx CSP, secrets management |
polish-websocket |
Reconnection UX, connection status indicator, heartbeat |
sweep |
Full automated quality sweep: lint + type-check + verify fixes |
Week 1: Silent Failures & Error Handling (March 10–16)
Theme: Nothing fails silently. Every error is visible, actionable, recoverable.
Tasks
1.1 Frontend: Kill all silent catch blocks
- Files: Settings.vue, Web5.vue, router/index.ts, Apps.vue, OnboardingIntro.vue
- Action: Replace 21+
.catch(() => {})patterns with proper error handling - Pattern: Log to console in dev, show toast/inline error to user in prod
- Acceptance: Zero
.catch(() => {})in codebase (grep confirms) - Skill:
/polish-errors
1.2 Frontend: Remove all console.log from production
- Files: stores/app.ts (15+), api/websocket.ts (12+)
- Action: Replace with conditional dev-only logging or remove
- Pattern:
if (import.meta.env.DEV) console.log(...)or remove entirely - Acceptance: Zero
console.logoutside of dev guards (grep confirms) - Skill:
/lint
1.3 Backend: Fix all unwrap/expect in handler.rs
- Files: core/archipelago/src/api/handler.rs (11 unwraps)
- Action: Replace
.unwrap()on Response builders with.map_err()and? - Acceptance: Zero
unwrap()in handler.rs - Skill:
/polish-backend
1.4 Backend: Fix unwrap/expect across all production paths
- Files: main.rs, identity.rs, totp.rs, rpc/mod.rs, image_verifier.rs
- Action: Audit all 32
.unwrap()/.expect()calls, replace with?or.context() - Acceptance: Zero unwrap/expect outside of test modules
- Skill:
/polish-backend
1.5 Backend: Hardcoded Bitcoin RPC credentials
- Files: core/archipelago/src/api/rpc/bitcoin.rs:89
- Action: Move
archipelago/archipelago123to env var or secrets manager - Pattern:
std::env::var("ARCHIPELAGO_BITCOIN_RPC_USER").unwrap_or("archipelago".into()) - Acceptance: No hardcoded credentials in Rust source
1.6 Deploy & verify
- Run
/lintto confirm zero violations - Run
/deployto live server - Run
/check-serverto verify health - Manual spot-check: trigger errors in UI, confirm they're visible
Week 2: Loading States & Visual Feedback (March 17–23)
Theme: The user always knows what's happening. No blank screens, no mystery waits.
Tasks
2.1 Add skeleton loaders to all async views
- Files: Apps.vue, AppDetails.vue, Marketplace.vue, Cloud.vue, Server.vue, Settings.vue
- Action: Create
SkeletonLoader.vuecomponent, add to every view that fetches data - Pattern: Show skeleton immediately, swap to real content on load
- Acceptance: Every view shows placeholder content during load
- Skill:
/polish-loading
2.2 Add timeout warnings to long operations
- Files: Login.vue (server startup), Marketplace.vue (app install)
- Action: After 15s show "Taking longer than expected...", after 30s show troubleshoot options
- Acceptance: No operation silently hangs
2.3 Fix Start/Stop button state mismatch
- Files: Apps.vue, AppDetails.vue, ContainerApps.vue
- Action: Button reflects actual backend state, not a fixed 5s timer
- Pattern: Poll backend every 2s during state transition, update button immediately on response
- Acceptance: Button state always matches container state within 3s
2.4 Connection status indicator
- Files: Create
ConnectionStatus.vue, integrate into App.vue header - Action: Show green/amber/red dot based on WebSocket connection state
- Pattern: Use
wsClient.isConnected()— green=connected, amber=reconnecting, red=disconnected - Acceptance: User always knows if they're connected
- Skill:
/polish-websocket
2.5 Fix OnlineStatusPill to use real data
- Files: components/OnlineStatusPill.vue
- Action: Connect to actual WebSocket state instead of hardcoded "Online"
- Acceptance: Pill reflects real connection state
2.6 Empty states for all views
- Files: Apps.vue, Cloud.vue, ContainerApps.vue
- Action: When no data, show helpful message with CTA (e.g., "No apps installed — Browse Marketplace")
- Acceptance: Every view handles the zero-data case gracefully
2.7 Deploy & verify
/deploythen/check-server- Test: disconnect network, observe status indicator
- Test: slow network (throttle), observe skeleton loaders
- Test: fresh account with no apps, observe empty states
Week 3: Form Validation & Input Quality (March 24–30)
Theme: Every input feels responsive, validated, impossible to misuse.
Tasks
3.1 Real-time password validation
- Files: Login.vue (password setup), Settings.vue (password change)
- Action: Show inline validation as user types: length check, match check, strength meter
- Pattern: Debounced validation on input, green checkmark / red X per rule
- Acceptance: User sees validation state before clicking submit
- Skill:
/polish-forms
3.2 TOTP input improvements
- Files: Login.vue (TOTP verify step)
- Action: Auto-submit on 6 digits, show session countdown timer, trim whitespace
- Pattern:
watch(code, () => { if (code.length === 6) submit() }) - Acceptance: TOTP flow is fast and clear, session timeout is visible
3.3 Input trimming on all forms
- Files: Login.vue, Settings.vue, any form input
- Action:
.trim()all text inputs before submission - Acceptance: Leading/trailing whitespace never causes failures
3.4 Disable submit buttons during operations
- Files: Settings.vue (password change), Login.vue (login), Marketplace.vue (install)
- Action: Add
:disabled="isSubmitting"to all action buttons - Pattern: Button shows spinner + disabled state during async operation
- Acceptance: No button can be double-clicked during an operation
3.5 Error message consistency
- Files: All views with error messages
- Action: Create
formatError()utility that normalizes error messages - Pattern: Network errors -> "Can't reach server", Auth errors -> "Session expired", Server errors -> "Something went wrong"
- Acceptance: Error messages are user-friendly, never show raw error strings
3.6 Deploy & verify
- Test every form: login, password change, TOTP setup, app install
- Try invalid inputs, verify feedback is immediate and clear
Week 4: Backend Robustness (March 31 – April 6)
Theme: The backend never crashes, never hangs, handles every edge case.
Tasks
4.1 Add timeouts to all container operations
- Files: core/archipelago/src/container/dev_orchestrator.rs
- Action: Wrap all podman calls with
tokio::time::timeout(Duration::from_secs(30), ...) - Acceptance: No container operation can hang indefinitely
4.2 Add timeouts to all external HTTP calls
- Files: bitcoin.rs, handler.rs (LND proxy)
- Action: Explicit
reqwest::Clientwith timeout, not default - Pattern: Reuse a single
Clientstored inRpcHandlerstate - Acceptance: Every HTTP call has an explicit timeout
4.3 Connection pooling for Bitcoin RPC
- Files: core/archipelago/src/api/rpc/bitcoin.rs
- Action: Store
reqwest::ClientinRpcHandler, reuse across requests - Acceptance: One client instance, connection pooled
4.4 Fix all clippy warnings
- Action: Run
cargo clippy --all-targets --all-featureson dev server, fix all 10 warnings - Warnings:
should_implement_trait,get_first,assign_op_pattern,wildcard_in_or_patterns,redundant_field_names,unused_import,ptr_arg,very_complex_type,if_else_collapse,io::Error::other - Acceptance:
cargo clippyreturns zero warnings - Skill:
/lint
4.5 Rate limiting on unauthenticated endpoints
- Files: core/archipelago/src/api/handler.rs
- Action: Add per-IP rate limiting to
/archipelago/node-messageand/electrs-status - Pattern: In-memory rate limiter with 60 req/min per IP
- Acceptance: Endpoints return 429 when rate exceeded
4.6 Consistent error codes and messages
- Files: All RPC endpoints
- Action: Define error code constants, consistent capitalization
- Pattern:
const ERR_AUTH: i32 = -1001;etc. - Acceptance: All error responses use defined constants
4.7 Remove dead code
- Files: identity.rs (unused field, unused methods), auth.rs (dead_code allows)
- Action: Remove
identity_dirfield, remove unusedverify()anddid_key()methods, remove#[allow(dead_code)]and verify usage - Acceptance: Zero
#[allow(dead_code)]outside of generated code
4.8 Replace println/eprintln with tracing
- Files: core/startos/src/* (23+ instances)
- Action: Replace
println!->tracing::info!,eprintln!->tracing::warn! - Acceptance: Zero
println!/eprintln!in non-test code
4.9 Deploy & verify
/deploythen/check-serverthen/diagnose- Test: kill Bitcoin container, verify backend doesn't crash
- Test: flood unauthenticated endpoint, verify rate limiting
- Test: restart backend, verify graceful startup
Week 5: WebSocket & Real-Time Quality (April 7–13)
Theme: Real-time updates are bulletproof. Connection issues are transparent to the user.
Tasks
5.1 WebSocket reconnection UX
- Files: api/websocket.ts, App.vue
- Action: After max reconnect attempts, show persistent banner "Connection lost. Click to retry."
- Pattern: Don't silently give up after 10 attempts
- Acceptance: User always has a path to reconnect
- Skill:
/polish-websocket
5.2 WebSocket heartbeat improvement
- Files: api/websocket.ts
- Action: Send ping every 30s, expect pong within 5s, reconnect if missed
- Acceptance: Stale connections detected within 35s, not 60s
5.3 RPC client session detection
- Files: api/rpc-client.ts
- Action: On 401/403 response, redirect to login page instead of showing generic error
- Pattern:
if (status === 401) { router.push('/login'); return; } - Acceptance: Expired sessions redirect to login immediately
5.4 Message queuing during reconnection
- Files: api/rpc-client.ts, api/websocket.ts
- Action: If WebSocket is down, queue state-update subscriptions, replay on reconnect
- Pattern: Don't lose container state updates during brief disconnects
- Acceptance: State is consistent after reconnection without page refresh
5.5 WebSocket race condition fix
- Files: stores/app.ts, api/websocket.ts
- Action: Fix duplicate listener issue on rapid reconnect (
isWsSubscribedflag) - Pattern: Use a Set of listener IDs, deduplicate on registration
- Acceptance: No duplicate event handlers after reconnect cycles
5.6 Deploy & verify
- Test: kill backend, observe frontend reconnection behavior
- Test: toggle wifi, observe status indicator + reconnection
- Test: let session expire, verify redirect to login
Week 6: Deployment & Infrastructure Hardening (April 14–20)
Theme: Deployments are safe, reversible, and verified. Infrastructure is production-grade.
Tasks
6.1 Deploy script: add rollback capability
- Files: scripts/deploy-to-target.sh
- Action: Before overwriting binary/frontend, backup to
.backupsuffix - Pattern: On health check failure after restart, restore from backup
- Acceptance: Failed deploy auto-restores previous working version
- Skill:
/polish-deploy
6.2 Deploy script: pre-deploy sanity checks
- Files: scripts/deploy-to-target.sh
- Action: Check disk space (2GB min), verify SSH key exists, verify target dir exists
- Acceptance: Deploy fails early with clear message if preconditions not met
6.3 Deploy script: post-deploy health verification
- Files: scripts/deploy-to-target.sh
- Action: After restart, poll
/healthendpoint for 30s. If no 200, trigger rollback - Acceptance: Every deploy is verified healthy before declaring success
6.4 Deploy script: deployment locking
- Files: scripts/deploy-to-target.sh
- Action: Use flock to prevent concurrent deploys
- Acceptance: Second simultaneous deploy fails immediately with message
6.5 First-boot script: add error handling
- Files: scripts/first-boot-containers.sh
- Action: Add
set -e, verify each container starts before creating dependents - Acceptance: If Bitcoin fails, Mempool is not attempted
6.6 Systemd service hardening
- Files: image-recipe/configs/archipelago.service
- Action: Add
PrivateTmp=yes,NoNewPrivileges=true,ProtectSystem=strict,ProtectHome=yes,SystemCallFilter=@system-service - Acceptance: Service runs with minimal privileges
- Skill:
/harden
6.7 Nginx security headers
- Files: image-recipe/configs/nginx-archipelago.conf
- Action: Add HSTS, fix CSP (remove unsafe-inline), add rate limiting zones, custom log format that strips tokens
- Acceptance: Security headers pass Mozilla Observatory scan
6.8 Nginx config: test before reload
- Files: scripts/deploy-to-target.sh
- Action:
nginx -tfailure should abort deploy and restore backup config - Acceptance: Invalid nginx config never goes live
6.9 Deploy & verify
- Test: deploy with intentionally broken binary, verify rollback
- Test: deploy with invalid nginx config, verify rollback
- Test: concurrent deploy attempt, verify lock
- Run
/diagnosefull check
Week 7: Accessibility, Polish & Edge Cases (April 21–27)
Theme: Every interaction is crisp. Keyboard users, slow networks, edge cases — all handled.
Tasks
7.1 ARIA labels on all interactive elements
- Files: All views and components
- Action: Add
aria-labelto buttons, links, form inputs that lack visible labels - Pattern:
<button aria-label="Install Bitcoin Core" ...> - Acceptance: Every interactive element has accessible name
7.2 Focus management in modals
- Files: Apps.vue (uninstall modal), Marketplace.vue (filter modal), Settings.vue
- Action: Trap focus inside modals, return focus on close, autofocus first interactive element
- Pattern: Use
useFocusTrapcomposable - Acceptance: Tab key never leaves modal; Escape closes; focus returns to trigger
7.3 Keyboard navigation completeness
- Files: All views
- Action: Verify every action is reachable via keyboard (Tab/Enter/Escape)
- Acceptance: Full app usable without mouse
7.4 Fix inline Tailwind violations
- Files: Web5.vue, AppDetails.vue, Cloud.vue, onboarding views
- Action: Extract inline classes to global classes in style.css
- Pattern:
px-3 py-1.5 rounded-lg bg-white/5->.info-rowclass - Acceptance: Zero inline Tailwind utility classes in components
- Skill:
/ux-review
7.5 Touch feedback on mobile
- Files: style.css, app card components
- Action: Add
:activestates for mobile touch feedback - Pattern:
.app-card:active { transform: scale(0.98); } - Acceptance: Every tappable element has tactile feedback
7.6 Responsive edge cases
- Files: Marketplace.vue, Dashboard.vue, AppDetails.vue
- Action: Test at 320px, 375px, 768px, 1024px, 1440px widths
- Fix: Any overflow, text truncation, or broken layouts
- Acceptance: No horizontal scroll or broken layout at any standard width
7.7 Fix template crash risks
- Files: ContainerApps.vue:76 (
app.image.split('/').pop()) - Action: Add null guards on all template expressions that chain methods
- Pattern:
app.image?.split('/').pop() ?? 'unknown' - Acceptance: No template expression can crash on null/undefined data
7.8 Remove all TODO/FIXME from production code
- Files: Web5.vue, AppDetails.vue, backend TODO comments
- Action: Either implement the TODO or remove the dead code
- Pattern: If feature isn't ready, remove the UI element entirely
- Acceptance: Zero TODO/FIXME/HACK in committed code
- Skill:
/refactor
7.9 Deploy & verify
- Test: navigate entire app with keyboard only
- Test: resize browser through all breakpoints
- Test: screen reader (VoiceOver) basic navigation
- Run
/ux-reviewon every view
Week 8: Integration Testing, Final Sweep & ISO (April 28 – May 4)
Theme: Everything works together. The final product is tested end-to-end and burned to ISO.
Tasks
8.1 Create critical path tests — Frontend
- Files: Create
neode-ui/src/__tests__/directory - Tests to write:
- Login flow: valid password, invalid password, TOTP, session timeout
- App lifecycle: install -> start -> launch -> stop -> uninstall
- Settings: password change, TOTP setup, TOTP disable
- WebSocket: connect, disconnect, reconnect
- Framework: Vitest + @vue/test-utils (already in package.json)
- Acceptance: 10+ critical path tests passing
- Skill:
/test
8.2 Create critical path tests — Backend
- Tests to write:
- RPC endpoint validation (good/bad input for each endpoint)
- Session management (create, validate, expire, invalidate)
- Container manifest parsing (valid, invalid, missing fields)
- Rate limiting (under limit, at limit, over limit)
- Acceptance: 10+ backend tests passing
- Skill:
/test
8.3 Create deployment verification test
- Files: scripts/verify-deploy.sh (new)
- Action: Script that hits every endpoint, checks every container, verifies every UI route
- Pattern: Automated smoke test run after every deploy
- Acceptance: Script exits 0 only if everything works
8.4 Full quality sweep
- Run
/lint— zero violations - Run
/harden— zero findings - Run
/ux-review— zero findings - Run
/diagnose— all green - Run
/sweep— clean bill of health - Acceptance: All skills report zero issues
8.5 Build final ISO
- Sync all configs:
/sync-configs - Build ISO:
/build-iso - Flash to USB, boot on clean hardware
- Verify first-boot experience end-to-end
- Acceptance: ISO boots, onboarding works, Bitcoin syncs, apps install
8.6 Performance baseline
- Measure and document:
- Time to first meaningful paint (target: <2s)
- Login flow completion time (target: <3s)
- App install completion time (document actual)
- WebSocket reconnection time (target: <5s)
- Backend cold start time (target: <3s)
- Acceptance: All targets met or documented with explanation
8.7 Final documentation pass
- Update
docs/current-state.mdto reflect production status - Update
CHANGELOG.mdwith all polish work - Verify all CLAUDE.md instructions are still accurate
- Acceptance: Docs match reality
Metrics & Definition of Done
Per-Week Exit Criteria
Each week is "done" when:
- All tasks for that week have acceptance criteria met
/sweepreturns zero violations for that week's focus area/deploysucceeds and/check-serveris green- Manual spot-check of affected features passes
Project Exit Criteria (Week 8)
The project is done when ALL of these are true:
- Zero
.catch(() => {})in frontend - Zero
console.logoutside dev guards - Zero
unwrap()/expect()in backend production paths - Zero clippy warnings
- Zero inline Tailwind in components
- Zero TODO/FIXME in committed code
- Every view has: loading state, error state, empty state
- Every form has: real-time validation, disabled during submit
- Every button action has: loading feedback, error feedback
- WebSocket shows connection status to user
- Session timeout redirects to login
- Deploy has: rollback, health check, locking
- Systemd service is hardened
- Nginx has: HSTS, proper CSP, rate limiting, clean logs
- 10+ frontend tests passing
- 10+ backend tests passing
- ISO boots and onboards successfully
- All performance targets met
Risk Register
| Risk | Mitigation |
|---|---|
| Skeleton loaders change visual feel | Match exact glassmorphism style, use existing color tokens |
| Backend changes break existing functionality | Deploy to secondary server (198) first, test, then primary |
| Nginx CSP changes break app iframes | Test each framed app individually before deploying |
| Rate limiting blocks legitimate use | Set generous limits (60/min), monitor false positives |
| Test suite becomes maintenance burden | Only test critical paths, no unit tests for trivial code |
| ISO build captures incomplete state | Always build ISO from clean deploy, never mid-development |