# Archipelago Production Polish Plan **Duration**: 8 weeks (March 10 – May 4, 2026) **Goal**: Zero new features. Every existing feature polished to flawless production quality. **Philosophy**: The iPhone moment — everything just works, feels inevitable, no rough edges. ## SSH Access All remote commands use SSH key auth (password auth is disabled): ```bash ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228 ``` Never use `sshpass`. The deploy script handles this automatically via `SSH_KEY`. --- ## Audit Summary Full codebase audit completed March 8, 2026. Findings: | Layer | Critical | High | Medium | Low | |-------|----------|------|--------|-----| | Frontend (Vue/TS) | 4 | 6 | 10 | 4 | | Backend (Rust) | 6 | 6 | 6 | 7 | | Infrastructure | 5 | 6 | 7 | 3 | | UX Flows | 4 | 4 | 6 | 3 | | **Total** | **19** | **22** | **29** | **17** | --- ## Skills Required ### Existing Skills (14) `deploy`, `deploy-both`, `diagnose`, `check-server`, `frontend-dev`, `sync-configs`, `build-iso`, `server-logs`, `add-app`, `harden`, `test`, `lint`, `ux-review`, `refactor` ### New Skills (9) | Skill | Purpose | |-------|---------| | `polish` | Main orchestrator — reads this plan, detects week, executes tasks | | `polish-errors` | Fix silent error handling, add user-facing error states | | `polish-loading` | Add skeleton loaders, loading indicators, empty states | | `polish-forms` | Input validation, trimming, real-time feedback | | `polish-backend` | Fix unwrap/expect, add timeouts, connection pooling | | `polish-deploy` | Add rollback, health checks, pre-deploy validation | | `polish-security` | Systemd hardening, nginx CSP, secrets management | | `polish-websocket` | Reconnection UX, connection status indicator, heartbeat | | `sweep` | Full automated quality sweep: lint + type-check + verify fixes | --- ## Week 1: Silent Failures & Error Handling (March 10–16) **Theme**: Nothing fails silently. Every error is visible, actionable, recoverable. ### Tasks #### 1.1 Frontend: Kill all silent catch blocks - **Files**: Settings.vue, Web5.vue, router/index.ts, Apps.vue, OnboardingIntro.vue - **Action**: Replace 21+ `.catch(() => {})` patterns with proper error handling - **Pattern**: Log to console in dev, show toast/inline error to user in prod - **Acceptance**: Zero `.catch(() => {})` in codebase (grep confirms) - **Skill**: `/polish-errors` #### 1.2 Frontend: Remove all console.log from production - **Files**: stores/app.ts (15+), api/websocket.ts (12+) - **Action**: Replace with conditional dev-only logging or remove - **Pattern**: `if (import.meta.env.DEV) console.log(...)` or remove entirely - **Acceptance**: Zero `console.log` outside of dev guards (grep confirms) - **Skill**: `/lint` #### 1.3 Backend: Fix all unwrap/expect in handler.rs - **Files**: core/archipelago/src/api/handler.rs (11 unwraps) - **Action**: Replace `.unwrap()` on Response builders with `.map_err()` and `?` - **Acceptance**: Zero `unwrap()` in handler.rs - **Skill**: `/polish-backend` #### 1.4 Backend: Fix unwrap/expect across all production paths - **Files**: main.rs, identity.rs, totp.rs, rpc/mod.rs, image_verifier.rs - **Action**: Audit all 32 `.unwrap()`/`.expect()` calls, replace with `?` or `.context()` - **Acceptance**: Zero unwrap/expect outside of test modules - **Skill**: `/polish-backend` #### 1.5 Backend: Hardcoded Bitcoin RPC credentials - **Files**: core/archipelago/src/api/rpc/bitcoin.rs:89 - **Action**: Move `archipelago/archipelago123` to env var or secrets manager - **Pattern**: `std::env::var("ARCHIPELAGO_BITCOIN_RPC_USER").unwrap_or("archipelago".into())` - **Acceptance**: No hardcoded credentials in Rust source #### 1.6 Deploy & verify - Run `/lint` to confirm zero violations - Run `/deploy` to live server - Run `/check-server` to verify health - Manual spot-check: trigger errors in UI, confirm they're visible --- ## Week 2: Loading States & Visual Feedback (March 17–23) **Theme**: The user always knows what's happening. No blank screens, no mystery waits. ### Tasks #### 2.1 Add skeleton loaders to all async views - **Files**: Apps.vue, AppDetails.vue, Marketplace.vue, Cloud.vue, Server.vue, Settings.vue - **Action**: Create `SkeletonLoader.vue` component, add to every view that fetches data - **Pattern**: Show skeleton immediately, swap to real content on load - **Acceptance**: Every view shows placeholder content during load - **Skill**: `/polish-loading` #### 2.2 Add timeout warnings to long operations - **Files**: Login.vue (server startup), Marketplace.vue (app install) - **Action**: After 15s show "Taking longer than expected...", after 30s show troubleshoot options - **Acceptance**: No operation silently hangs #### 2.3 Fix Start/Stop button state mismatch - **Files**: Apps.vue, AppDetails.vue, ContainerApps.vue - **Action**: Button reflects actual backend state, not a fixed 5s timer - **Pattern**: Poll backend every 2s during state transition, update button immediately on response - **Acceptance**: Button state always matches container state within 3s #### 2.4 Connection status indicator - **Files**: Create `ConnectionStatus.vue`, integrate into App.vue header - **Action**: Show green/amber/red dot based on WebSocket connection state - **Pattern**: Use `wsClient.isConnected()` — green=connected, amber=reconnecting, red=disconnected - **Acceptance**: User always knows if they're connected - **Skill**: `/polish-websocket` #### 2.5 Fix OnlineStatusPill to use real data - **Files**: components/OnlineStatusPill.vue - **Action**: Connect to actual WebSocket state instead of hardcoded "Online" - **Acceptance**: Pill reflects real connection state #### 2.6 Empty states for all views - **Files**: Apps.vue, Cloud.vue, ContainerApps.vue - **Action**: When no data, show helpful message with CTA (e.g., "No apps installed — Browse Marketplace") - **Acceptance**: Every view handles the zero-data case gracefully #### 2.7 Deploy & verify - `/deploy` then `/check-server` - Test: disconnect network, observe status indicator - Test: slow network (throttle), observe skeleton loaders - Test: fresh account with no apps, observe empty states --- ## Week 3: Form Validation & Input Quality (March 24–30) **Theme**: Every input feels responsive, validated, impossible to misuse. ### Tasks #### 3.1 Real-time password validation - **Files**: Login.vue (password setup), Settings.vue (password change) - **Action**: Show inline validation as user types: length check, match check, strength meter - **Pattern**: Debounced validation on input, green checkmark / red X per rule - **Acceptance**: User sees validation state before clicking submit - **Skill**: `/polish-forms` #### 3.2 TOTP input improvements - **Files**: Login.vue (TOTP verify step) - **Action**: Auto-submit on 6 digits, show session countdown timer, trim whitespace - **Pattern**: `watch(code, () => { if (code.length === 6) submit() })` - **Acceptance**: TOTP flow is fast and clear, session timeout is visible #### 3.3 Input trimming on all forms - **Files**: Login.vue, Settings.vue, any form input - **Action**: `.trim()` all text inputs before submission - **Acceptance**: Leading/trailing whitespace never causes failures #### 3.4 Disable submit buttons during operations - **Files**: Settings.vue (password change), Login.vue (login), Marketplace.vue (install) - **Action**: Add `:disabled="isSubmitting"` to all action buttons - **Pattern**: Button shows spinner + disabled state during async operation - **Acceptance**: No button can be double-clicked during an operation #### 3.5 Error message consistency - **Files**: All views with error messages - **Action**: Create `formatError()` utility that normalizes error messages - **Pattern**: Network errors -> "Can't reach server", Auth errors -> "Session expired", Server errors -> "Something went wrong" - **Acceptance**: Error messages are user-friendly, never show raw error strings #### 3.6 Deploy & verify - Test every form: login, password change, TOTP setup, app install - Try invalid inputs, verify feedback is immediate and clear --- ## Week 4: Backend Robustness (March 31 – April 6) **Theme**: The backend never crashes, never hangs, handles every edge case. ### Tasks #### 4.1 Add timeouts to all container operations - **Files**: core/archipelago/src/container/dev_orchestrator.rs - **Action**: Wrap all podman calls with `tokio::time::timeout(Duration::from_secs(30), ...)` - **Acceptance**: No container operation can hang indefinitely #### 4.2 Add timeouts to all external HTTP calls - **Files**: bitcoin.rs, handler.rs (LND proxy) - **Action**: Explicit `reqwest::Client` with timeout, not default - **Pattern**: Reuse a single `Client` stored in `RpcHandler` state - **Acceptance**: Every HTTP call has an explicit timeout #### 4.3 Connection pooling for Bitcoin RPC - **Files**: core/archipelago/src/api/rpc/bitcoin.rs - **Action**: Store `reqwest::Client` in `RpcHandler`, reuse across requests - **Acceptance**: One client instance, connection pooled #### 4.4 Fix all clippy warnings - **Action**: Run `cargo clippy --all-targets --all-features` on dev server, fix all 10 warnings - **Warnings**: `should_implement_trait`, `get_first`, `assign_op_pattern`, `wildcard_in_or_patterns`, `redundant_field_names`, `unused_import`, `ptr_arg`, `very_complex_type`, `if_else_collapse`, `io::Error::other` - **Acceptance**: `cargo clippy` returns zero warnings - **Skill**: `/lint` #### 4.5 Rate limiting on unauthenticated endpoints - **Files**: core/archipelago/src/api/handler.rs - **Action**: Add per-IP rate limiting to `/archipelago/node-message` and `/electrs-status` - **Pattern**: In-memory rate limiter with 60 req/min per IP - **Acceptance**: Endpoints return 429 when rate exceeded #### 4.6 Consistent error codes and messages - **Files**: All RPC endpoints - **Action**: Define error code constants, consistent capitalization - **Pattern**: `const ERR_AUTH: i32 = -1001;` etc. - **Acceptance**: All error responses use defined constants #### 4.7 Remove dead code - **Files**: identity.rs (unused field, unused methods), auth.rs (dead_code allows) - **Action**: Remove `identity_dir` field, remove unused `verify()` and `did_key()` methods, remove `#[allow(dead_code)]` and verify usage - **Acceptance**: Zero `#[allow(dead_code)]` outside of generated code #### 4.8 Replace println/eprintln with tracing - **Files**: core/startos/src/* (23+ instances) - **Action**: Replace `println!` -> `tracing::info!`, `eprintln!` -> `tracing::warn!` - **Acceptance**: Zero `println!` / `eprintln!` in non-test code #### 4.9 Deploy & verify - `/deploy` then `/check-server` then `/diagnose` - Test: kill Bitcoin container, verify backend doesn't crash - Test: flood unauthenticated endpoint, verify rate limiting - Test: restart backend, verify graceful startup --- ## Week 5: WebSocket & Real-Time Quality (April 7–13) **Theme**: Real-time updates are bulletproof. Connection issues are transparent to the user. ### Tasks #### 5.1 WebSocket reconnection UX - **Files**: api/websocket.ts, App.vue - **Action**: After max reconnect attempts, show persistent banner "Connection lost. Click to retry." - **Pattern**: Don't silently give up after 10 attempts - **Acceptance**: User always has a path to reconnect - **Skill**: `/polish-websocket` #### 5.2 WebSocket heartbeat improvement - **Files**: api/websocket.ts - **Action**: Send ping every 30s, expect pong within 5s, reconnect if missed - **Acceptance**: Stale connections detected within 35s, not 60s #### 5.3 RPC client session detection - **Files**: api/rpc-client.ts - **Action**: On 401/403 response, redirect to login page instead of showing generic error - **Pattern**: `if (status === 401) { router.push('/login'); return; }` - **Acceptance**: Expired sessions redirect to login immediately #### 5.4 Message queuing during reconnection - **Files**: api/rpc-client.ts, api/websocket.ts - **Action**: If WebSocket is down, queue state-update subscriptions, replay on reconnect - **Pattern**: Don't lose container state updates during brief disconnects - **Acceptance**: State is consistent after reconnection without page refresh #### 5.5 WebSocket race condition fix - **Files**: stores/app.ts, api/websocket.ts - **Action**: Fix duplicate listener issue on rapid reconnect (`isWsSubscribed` flag) - **Pattern**: Use a Set of listener IDs, deduplicate on registration - **Acceptance**: No duplicate event handlers after reconnect cycles #### 5.6 Deploy & verify - Test: kill backend, observe frontend reconnection behavior - Test: toggle wifi, observe status indicator + reconnection - Test: let session expire, verify redirect to login --- ## Week 6: Deployment & Infrastructure Hardening (April 14–20) **Theme**: Deployments are safe, reversible, and verified. Infrastructure is production-grade. ### Tasks #### 6.1 Deploy script: add rollback capability - **Files**: scripts/deploy-to-target.sh - **Action**: Before overwriting binary/frontend, backup to `.backup` suffix - **Pattern**: On health check failure after restart, restore from backup - **Acceptance**: Failed deploy auto-restores previous working version - **Skill**: `/polish-deploy` #### 6.2 Deploy script: pre-deploy sanity checks - **Files**: scripts/deploy-to-target.sh - **Action**: Check disk space (2GB min), verify SSH key exists, verify target dir exists - **Acceptance**: Deploy fails early with clear message if preconditions not met #### 6.3 Deploy script: post-deploy health verification - **Files**: scripts/deploy-to-target.sh - **Action**: After restart, poll `/health` endpoint for 30s. If no 200, trigger rollback - **Acceptance**: Every deploy is verified healthy before declaring success #### 6.4 Deploy script: deployment locking - **Files**: scripts/deploy-to-target.sh - **Action**: Use flock to prevent concurrent deploys - **Acceptance**: Second simultaneous deploy fails immediately with message #### 6.5 First-boot script: add error handling - **Files**: scripts/first-boot-containers.sh - **Action**: Add `set -e`, verify each container starts before creating dependents - **Acceptance**: If Bitcoin fails, Mempool is not attempted #### 6.6 Systemd service hardening - **Files**: image-recipe/configs/archipelago.service - **Action**: Add `PrivateTmp=yes`, `NoNewPrivileges=true`, `ProtectSystem=strict`, `ProtectHome=yes`, `SystemCallFilter=@system-service` - **Acceptance**: Service runs with minimal privileges - **Skill**: `/harden` #### 6.7 Nginx security headers - **Files**: image-recipe/configs/nginx-archipelago.conf - **Action**: Add HSTS, fix CSP (remove unsafe-inline), add rate limiting zones, custom log format that strips tokens - **Acceptance**: Security headers pass Mozilla Observatory scan #### 6.8 Nginx config: test before reload - **Files**: scripts/deploy-to-target.sh - **Action**: `nginx -t` failure should abort deploy and restore backup config - **Acceptance**: Invalid nginx config never goes live #### 6.9 Deploy & verify - Test: deploy with intentionally broken binary, verify rollback - Test: deploy with invalid nginx config, verify rollback - Test: concurrent deploy attempt, verify lock - Run `/diagnose` full check --- ## Week 7: Accessibility, Polish & Edge Cases (April 21–27) **Theme**: Every interaction is crisp. Keyboard users, slow networks, edge cases — all handled. ### Tasks #### 7.1 ARIA labels on all interactive elements - **Files**: All views and components - **Action**: Add `aria-label` to buttons, links, form inputs that lack visible labels - **Pattern**: `