
# Overnight Plan — 2-Year Production Hardening & Security Roadmap

> **Goal**: Take Archipelago from development prototype to production-grade, security-hardened Bitcoin Node OS.
> Every phase: fix → test → harden → test → verify nothing broke → move to next module → review at end.
> Deploy after every change: `./scripts/deploy-to-target.sh --live` — test at http://192.168.1.228
> See `CLAUDE.md` for all project rules and conventions.
>
> **NOTE — DEV ENVIRONMENT IS OUT OF SCOPE**: SSH keys, deploy script credentials, `StrictHostKeyChecking=no`,
> dev passwords in test scripts, and `password123` in dev mode are intentional development tooling on a private
> home LAN. Do NOT change these. This plan covers PRODUCTION code only — what runs on the deployed server.

---

## ============================================================
## YEAR 1 — QUARTER 1: CRITICAL & HIGH SEVERITY FIXES
## ============================================================

---

## Phase 1: Infrastructure — CRITICAL Production Credential Hardening

> **Layman version**: Every Archipelago installation currently uses the same passwords (like every house
> in a neighborhood using the same door key). We fix this by generating unique random passwords per
> installation and storing them encrypted. This is the single most important security fix.

- [ ] **Generate random Bitcoin RPC credentials at first boot**: In `scripts/first-boot-containers.sh`, find all occurrences of `-rpcuser=archipelago` and `-rpcpassword=archipelago123`. Replace the hardcoded values with dynamically generated credentials:
  1. At the top of the script (after the shebang and initial variables), add:
     ```bash
     # Generate per-installation credentials if not already saved
     SECRETS_DIR="/var/lib/archipelago/secrets"
     mkdir -p "$SECRETS_DIR" && chmod 700 "$SECRETS_DIR"
     if [ ! -f "$SECRETS_DIR/bitcoin-rpc-password" ]; then
         # Hex output is safe inside URLs; base64 can contain '/' and '+'
         openssl rand -hex 24 > "$SECRETS_DIR/bitcoin-rpc-password"
         chmod 600 "$SECRETS_DIR/bitcoin-rpc-password"
     fi
     BITCOIN_RPC_USER="archipelago"
     BITCOIN_RPC_PASS=$(cat "$SECRETS_DIR/bitcoin-rpc-password")
     ```
  2. Replace every `-rpcpassword=archipelago123` with `-rpcpassword=$BITCOIN_RPC_PASS` throughout the script.
  3. Replace every `archipelago:archipelago123@` in connection strings (ElectrumX DAEMON_URL, etc.) with `$BITCOIN_RPC_USER:$BITCOIN_RPC_PASS@`.
  4. Do the same in `scripts/deploy-to-target.sh` — search for `archipelago123` and replace with `$BITCOIN_RPC_PASS` (read from the same secrets file on the target server).
  5. SSH to 192.168.1.228, generate the initial password file, restart bitcoin-knots with the new password, then restart all dependent containers (electrs, mempool-api, lnd, btcpay).
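  6. Verify: `sudo podman exec bitcoin-knots bitcoin-cli -rpcuser=archipelago -rpcpassword="$(sudo cat /var/lib/archipelago/secrets/bitcoin-rpc-password)" getblockchaininfo` should succeed (the command substitution runs on the host, so it needs `sudo` to read a root-owned, mode-600 file). The same command with the old hardcoded password should fail.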
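
Step 3 embeds the password in `user:password@host` connection strings. If a generated password ever contains characters outside the URL-unreserved set (base64 output can include `/`, `+` and `=`), it has to be percent-encoded first or the URL may be parsed incorrectly. A minimal sketch of that encoding follows; `percent_encode_userinfo` and `rpc_url` are hypothetical helper names, not existing project functions, and whether a given consumer decodes the escapes should be checked per service.

```rust
/// Percent-encode a string for the userinfo part of a URL (`user:password@`).
/// Hypothetical helper for illustration: every byte outside the RFC 3986
/// "unreserved" set is written as %XX.
pub fn percent_encode_userinfo(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build an RPC URL from its parts. Host and port are caller-supplied
/// placeholders, not values taken from the project.
pub fn rpc_url(user: &str, pass: &str, host: &str, port: u16) -> String {
    format!(
        "http://{}:{}@{}:{}/",
        percent_encode_userinfo(user),
        percent_encode_userinfo(pass),
        host,
        port
    )
}
```

A password made only of hex digits passes through unchanged, which is one reason to prefer hex output for secrets that end up in URLs.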

- [ ] **Generate random database passwords at first boot**: Same pattern for all database passwords. In `scripts/first-boot-containers.sh`:
  1. Add credential generation for each database service:
     ```bash
     for svc in mempool btcpay immich penpot; do
         if [ ! -f "$SECRETS_DIR/${svc}-db-password" ]; then
             # Hex output avoids '/' and '+', which break connection strings
             openssl rand -hex 24 > "$SECRETS_DIR/${svc}-db-password"
             chmod 600 "$SECRETS_DIR/${svc}-db-password"
         fi
     done
     MEMPOOL_DB_PASS=$(cat "$SECRETS_DIR/mempool-db-password")
     BTCPAY_DB_PASS=$(cat "$SECRETS_DIR/btcpay-db-password")
     IMMICH_DB_PASS=$(cat "$SECRETS_DIR/immich-db-password")
     PENPOT_DB_PASS=$(cat "$SECRETS_DIR/penpot-db-password")
     ```
  2. Replace `mempoolpass` with `$MEMPOOL_DB_PASS`, `btcpaypass` with `$BTCPAY_DB_PASS`, `immichpass` with `$IMMICH_DB_PASS`, `penpot` (password) with `$PENPOT_DB_PASS` throughout the script.
  3. Replace `rootpass` (MySQL root) with a generated password too.
  4. On the live server, update existing containers: stop each DB container, update the password in the DB itself, restart with new env vars.
  5. Verify each service still connects to its database by checking container logs for connection errors.

- [ ] **Generate unique Fedimint gateway password per deployment**: In `scripts/first-boot-containers.sh` and `scripts/deploy-to-target.sh`, find the hardcoded bcrypt hash `$2y$10$t9YjjxkiktrlYvjajB/zgOMDnSNVg4HqrbDqh47u7Jf42whNdxNqC`. Replace with:
  1. Generate a random password and hash it:
     ```bash
     if [ ! -f "$SECRETS_DIR/fedimint-gateway-password" ]; then
         FEDI_PASS=$(openssl rand -base64 16)
         echo "$FEDI_PASS" > "$SECRETS_DIR/fedimint-gateway-password"
         chmod 600 "$SECRETS_DIR/fedimint-gateway-password"
     fi
     FEDI_PASS=$(cat "$SECRETS_DIR/fedimint-gateway-password")
     FEDI_HASH=$(htpasswd -bnBC 10 "" "$FEDI_PASS" | tr -d ':\n')
     ```
  2. Use `$FEDI_HASH` in the `--bcrypt-password-hash` argument.
  3. Display the password in the first-boot log so the operator can note it.
  4. Verify: open Fedimint gateway web UI and log in with the generated password.

- [ ] **Remove hardcoded Bitcoin RPC credentials from Rust backend**: In `core/archipelago/src/mesh/mod.rs`, find line ~610 with `.basic_auth("archipelago", Some("archipelago123"))`. Replace with:
  1. Add a function to read credentials from the secrets file:
     ```rust
     // Must be `async` because the body awaits tokio::fs
     async fn read_bitcoin_rpc_credentials() -> Result<(String, String)> {
         let pass = tokio::fs::read_to_string("/var/lib/archipelago/secrets/bitcoin-rpc-password")
             .await
             .context("Failed to read Bitcoin RPC password from secrets")?;
         Ok(("archipelago".to_string(), pass.trim().to_string()))
     }
     ```
  2. Call this function where RPC credentials are needed instead of hardcoding.
  3. Do the same for any other `.basic_auth("archipelago", Some("archipelago123"))` calls in the codebase. Search with `grep -rn "archipelago123" core/` to find all occurrences.
  4. Build on dev server: `cd ~/archy/core && cargo clippy --all-targets --all-features`.
  5. Deploy and verify mesh Bitcoin relay still works.
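
The secrets-file read can also enforce the `600` permission that Phase 1 expects and reject an empty file. The following is a synchronous, std-only sketch under those assumptions; `read_secret` is a hypothetical name, and in the async backend the same checks would be done with the `tokio::fs` equivalents.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: read a secret file, trim the trailing newline that
/// shell redirection leaves behind, and refuse unusable files.
pub fn read_secret(path: &Path) -> io::Result<String> {
    // On Unix, refuse files that group or other can read or write.
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        let mode = fs::metadata(path)?.permissions().mode();
        if mode & 0o077 != 0 {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                format!("{} has mode {:o}; expected 600", path.display(), mode & 0o777),
            ));
        }
    }

    let raw = fs::read_to_string(path)?;
    let secret = raw.trim();
    if secret.is_empty() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} is empty", path.display()),
        ));
    }
    Ok(secret.to_string())
}
```

Failing loudly on a world-readable or empty file surfaces a broken first boot early instead of sending an empty password to Bitcoin RPC.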

- [ ] **Verify Phase 1 — No hardcoded passwords remain**: Run these checks:
  1. `grep -rn "archipelago123" scripts/ core/ --include="*.rs" --include="*.sh"` — should return zero results (except comments explaining the migration).
  2. `grep -rn "mempoolpass\|btcpaypass\|immichpass\|rootpass" scripts/ --include="*.sh"` — should return zero results.
  3. `ls -la /var/lib/archipelago/secrets/` on the server — should show password files with `600` permissions.
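  4. All services still running: `sudo podman ps -a --format '{{.Names}} {{.Status}}' | grep -v "Up"` should show nothing (all containers Up). The `-a` matters: without it, stopped containers are not listed at all and the check passes vacuously.
  5. Bitcoin RPC works: `sudo podman exec bitcoin-knots bitcoin-cli -rpcuser=archipelago -rpcpassword="$(sudo cat /var/lib/archipelago/secrets/bitcoin-rpc-password)" getblockchaininfo | head -5`.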
  6. Web UI loads and all apps accessible at http://192.168.1.228.

---

## Phase 2: Infrastructure — Systemd & Network Hardening

> **Layman version**: The backend currently runs as the all-powerful "root" user with no restrictions.
> If any bug is exploited, the attacker gets complete control of everything. We lock it down so the
> backend can only do what it needs to do — like giving a bank teller access to the cash drawer but
> not the vault, the CEO's office, or the security cameras.

- [ ] **Create unprivileged archipelago user for backend**: SSH to 192.168.1.228:
  1. Check the existing account: `id archipelago`. On the target server `archipelago` is the login user (UID 1000), and the backend should run as this user, NOT root. A dedicated system account (`sudo useradd -r -s /usr/sbin/nologin -d /var/lib/archipelago archipelago-svc`) is only needed if the login user must be kept separate from the service.
  2. Change `/etc/systemd/system/archipelago.service` to use `User=archipelago` instead of `User=root`.
  3. Fix file ownership: `sudo chown -R archipelago:archipelago /var/lib/archipelago/`.
  4. The backend needs to talk to Podman. Rootless Podman for UID 1000 keeps its own container store, separate from containers created with `sudo podman`, so check what the backend will actually see: `sudo -u archipelago podman ps`. If this list is empty while `sudo podman ps` shows the running services, the backend needs the scoped access described in the next step.
  5. If Podman needs root for some operations, use `sudo` with specific commands only via sudoers. Do NOT run the entire backend as root.

- [ ] **Add systemd sandboxing to archipelago.service**: Edit `image-recipe/configs/archipelago.service`. Add these directives under `[Service]`:
  ```ini
  # Filesystem protection
  ProtectSystem=strict
  ProtectHome=yes
  PrivateTmp=yes
  ReadWritePaths=/var/lib/archipelago

  # Privilege restriction
  NoNewPrivileges=yes
  PrivateDevices=yes

  # Network restriction (allow only IPv4/IPv6 + Unix sockets)
  RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6

  # Restrict what the process can do
  RestrictNamespaces=yes
  RestrictRealtime=yes
  RestrictSUIDSGID=yes

  # Only allow needed syscalls
  SystemCallArchitectures=native
  SystemCallFilter=@system-service
  SystemCallFilter=~@privileged @resources

  # Memory protection
  MemoryDenyWriteExecute=yes

  # Logging
  StandardOutput=journal
  StandardError=journal
  ```
  Deploy the service file to the server: `scp image-recipe/configs/archipelago.service archipelago@192.168.1.228:/tmp/ && ssh archipelago@192.168.1.228 'sudo cp /tmp/archipelago.service /etc/systemd/system/ && sudo systemctl daemon-reload && sudo systemctl restart archipelago'`.
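  Watch the journal for errors: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago -n 50 --no-pager'`. If the service fails to start due to a denied syscall or path, adjust the sandboxing (e.g., add the path to `ReadWritePaths` or the syscall group to `SystemCallFilter`). Iterate until the service starts cleanly. Two directives above conflict with the Podman access described earlier in this phase: `NoNewPrivileges=yes` stops `sudo` from elevating, so a scoped-sudoers fallback cannot work while it is set, and `RestrictNamespaces=yes` blocks the user namespaces that rootless Podman relies on. If the backend invokes Podman directly, expect to relax these two first.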

- [ ] **Bind Bitcoin RPC to localhost only**: SSH to 192.168.1.228. Edit the bitcoin-knots container's start command:
  1. Find where bitcoin-knots is started (in `scripts/first-boot-containers.sh` or via `podman inspect bitcoin-knots`).
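  2. Change `-rpcbind=0.0.0.0:8332` to `-rpcbind=127.0.0.1:8332 -rpcbind=[::1]:8332` (an IPv6 address needs brackets when a port is given). This only works if bitcoin-knots shares the host network namespace. If the container has its own network namespace, a loopback bind makes RPC unreachable from the other containers; in that case keep the in-container bind and restrict the published port on the host instead, e.g. `-p 127.0.0.1:8332:8332`.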
  3. Change `-rpcallowip=0.0.0.0/0` to `-rpcallowip=127.0.0.1/32 -rpcallowip=10.88.0.0/16` (the 10.88.x.x is Podman's default network — containers need to reach Bitcoin RPC).
  4. Stop and recreate bitcoin-knots with the new flags.
  5. Verify containers on the Podman network can still reach it: `sudo podman exec lnd bitcoin-cli -rpcconnect=bitcoin-knots -rpcuser=... getblockchaininfo`.
  6. Verify external access is blocked: from another machine on the LAN, `curl http://192.168.1.228:8332` should fail/timeout.

- [ ] **Reduce Tailscale container privileges**: In `scripts/first-boot-containers.sh`, find the Tailscale container creation (line ~460). Replace `--privileged` with:
  ```bash
  --cap-drop=ALL \
  --cap-add=NET_ADMIN \
  --cap-add=NET_RAW \
  --device=/dev/net/tun:/dev/net/tun \
  --read-only \
  --tmpfs /tmp \
  --tmpfs /var/lib/tailscale \
  ```
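  Recreate the Tailscale container on the server. Verify Tailscale still works: `sudo podman exec tailscale tailscale status`. Note that `--tmpfs /var/lib/tailscale` discards the node state on every container restart, which forces re-authentication; if the existing container mounts a persistent volume at that path, keep that mount instead of the tmpfs.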

- [ ] **Verify Phase 2 — Systemd hardening active**: Run these checks:
  1. `sudo systemctl show archipelago | grep -E "ProtectSystem|NoNewPrivileges|PrivateTmp"` — should show `strict`, `yes`, `yes`.
  2. `sudo systemctl status archipelago` — should be active and running.
  3. `ss -tlnp | grep 8332` — Bitcoin RPC should show `127.0.0.1:8332`, NOT `0.0.0.0:8332`.
  4. `sudo podman inspect tailscale | jq '.[0].HostConfig.Privileged'` — should be `false`.
  5. All apps still load in the web UI.
  6. Mesh networking still works (if enabled).

---

## Phase 3: Backend — CRITICAL Code Fixes

> **Layman version**: Two bugs in the Rust backend could let an attacker either run any command on your
> server (command injection) or crash your entire node at will (unwrap panic). These are the most
> dangerous code-level bugs found.

- [ ] **Fix command injection in VPN key generation**: In `core/archipelago/src/vpn.rs`, find lines 132-137 where `sh -c` is used with `format!("echo '{}' | wg pubkey", private_key)`. This is a textbook command injection vulnerability. Replace the entire block with safe stdin piping:
  ```rust
  let mut child = tokio::process::Command::new("wg")
      .arg("pubkey")
      .stdin(std::process::Stdio::piped())
      .stdout(std::process::Stdio::piped())
      .stderr(std::process::Stdio::piped())
      .spawn()
      .context("Failed to spawn wg pubkey")?;

  if let Some(mut stdin) = child.stdin.take() {
      use tokio::io::AsyncWriteExt;
      stdin.write_all(private_key.as_bytes()).await
          .context("Failed to write private key to wg stdin")?;
      // stdin is dropped here, closing it
  }

  let output = child.wait_with_output().await
      .context("wg pubkey process failed")?;

  if !output.status.success() {
      anyhow::bail!("wg pubkey failed: {}", String::from_utf8_lossy(&output.stderr));
  }

  let pubkey = String::from_utf8(output.stdout)
      .context("wg pubkey output is not valid UTF-8")?
      .trim()
      .to_string();
  ```
  Search the entire `core/` directory for other `sh -c` or `bash -c` patterns: `grep -rn 'Command::new("sh")\|Command::new("bash")' core/`. Fix any other occurrences with the same pattern.
  Build: `cd ~/archy/core && cargo clippy --all-targets --all-features`.
  Test: If VPN setup is available in the UI, test generating a WireGuard key.
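
To see why the `sh -c` form is exploitable, it helps to look at the exact string the shell would receive. The sketch below reproduces the vulnerable `format!` purely for illustration and adds a shape check that could run before the key is piped to `wg`; both function names are hypothetical. The check relies only on the fact that a 32-byte key encoded as base64 is always 44 characters ending in `=`.

```rust
/// The vulnerable pattern, reproduced only to show what the shell receives.
pub fn vulnerable_shell_command(private_key: &str) -> String {
    format!("echo '{}' | wg pubkey", private_key)
}

/// Shape check for a WireGuard key: 43 base64 characters followed by '='.
/// This is defence in depth, not a replacement for stdin piping.
pub fn looks_like_wg_key(key: &str) -> bool {
    let b = key.as_bytes();
    b.len() == 44
        && b[43] == b'='
        && b[..43]
            .iter()
            .all(|c| c.is_ascii_alphanumeric() || *c == b'+' || *c == b'/')
}
```

A key such as `x'; touch /tmp/pwned; echo '` closes the quote and turns into three separate shell commands, and it is also rejected by the shape check because it contains characters outside the base64 alphabet.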

- [ ] **Fix unwrap crash in secrets manager**: In `core/security/src/secrets_manager.rs`, find line 112 with `secret_path.parent().unwrap()`. Replace with:
  ```rust
  let parent = secret_path.parent()
      .ok_or_else(|| anyhow::anyhow!("Invalid secret path: no parent directory for {:?}", secret_path))?;
  fs::create_dir_all(parent).await?;
  ```
  Search for ALL `.unwrap()` calls in the file: `grep -n "unwrap()" core/security/src/secrets_manager.rs`. For each one in a non-test function, evaluate whether it can actually fail and replace with `?` or `.ok_or_else()` if so. Common safe unwraps (e.g., after a `.is_some()` check) can stay but should get a comment explaining why they're safe.
  Build and deploy.
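
The `unwrap()` here is not merely theoretical: `Path::parent()` returns `None` for the filesystem root and for an empty path. A std-only sketch of the same guard follows, using a plain `String` error instead of `anyhow` so it compiles on its own; `parent_dir` is a hypothetical name.

```rust
use std::path::Path;

/// Return the parent directory of a secret path, or an error instead of a panic.
pub fn parent_dir(secret_path: &Path) -> Result<&Path, String> {
    secret_path.parent().ok_or_else(|| {
        format!("invalid secret path: no parent directory for {:?}", secret_path)
    })
}
```

Note that a bare file name such as `db.enc` does have a parent: the empty path. Code that passes the result on should be prepared for that value.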

- [ ] **Fix expect crash in Tor proxy fallback**: In `core/archipelago/src/api/rpc/tor.rs`, find line ~525 with `.expect("valid proxy")`. Replace the entire proxy chain with proper error handling:
  ```rust
  let proxy_url = format!("socks5h://{}", proxy);
  let proxy = reqwest::Proxy::all(&proxy_url)
      .or_else(|_| reqwest::Proxy::all("socks5h://127.0.0.1:9050"))
      .context("Failed to create SOCKS5 proxy for Tor")?;
  ```
  Search for ALL `.expect(` calls in non-test code: `grep -rn "\.expect(" core/archipelago/src/ --include="*.rs" | grep -v "#\[cfg(test)\]" | grep -v "mod tests"`. List them and fix any that could realistically fail in production.
  Build: `cargo clippy --all-targets --all-features`.

- [ ] **Fix image verifier accepting unsigned images**: In `core/security/src/image_verifier.rs`, find lines 18-22 where the verifier returns `Ok(false)` for unsigned images. Change to:
  ```rust
  if signature.is_none() && self.cosign_public_key.is_none() {
      return Err(anyhow::anyhow!(
          "Image '{}' has no signature and no cosign key is configured. \
           All container images must be signed for production use.",
          image
      ));
  }
  ```
  Also fix lines 25-32, where a missing cosign binary returns `Ok(false)`:
  ```rust
  if !cosign_available {
      return Err(anyhow::anyhow!(
          "Cosign binary not found. Install cosign to verify container image signatures."
      ));
  }
  ```
  Build and test. Note: this may cause existing unsigned images to fail verification. If the system doesn't use cosign yet, add a config flag `require_signatures: bool` that defaults to `false` for now but can be flipped to `true` when cosign is deployed.
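
One way to reason about the `require_signatures` flag is as a small decision table kept separate from the cosign call itself. The sketch below is an assumption about how such a policy could look, not the verifier's actual structure; `ImagePolicy` and `image_policy` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ImagePolicy {
    /// Everything needed is present: run the cosign verification.
    Verify,
    /// Verification is impossible but not required; caller should log a warning.
    AllowUnverified,
    /// Refuse the image, with the reason.
    Reject(&'static str),
}

/// Decide what to do with an image before invoking cosign.
pub fn image_policy(
    require_signatures: bool,
    has_signature: bool,
    has_public_key: bool,
    cosign_available: bool,
) -> ImagePolicy {
    if has_signature && has_public_key && cosign_available {
        return ImagePolicy::Verify;
    }
    if !require_signatures {
        return ImagePolicy::AllowUnverified;
    }
    if !cosign_available {
        ImagePolicy::Reject("cosign binary not found")
    } else if !has_public_key {
        ImagePolicy::Reject("no cosign public key configured")
    } else {
        ImagePolicy::Reject("image has no signature")
    }
}
```

With the flag at `false` the behaviour stays permissive, and flipping it to `true` turns every "cannot verify" case into a hard error without touching the call sites.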

- [ ] **Verify Phase 3 — No more crash vectors**: Run these checks:
  1. `grep -rn 'Command::new("sh")\|Command::new("bash")' core/ --include="*.rs"` should return zero results.
  2. `grep -rn "\.unwrap()" core/security/src/secrets_manager.rs | grep -v test` — should be minimal/commented.
  3. `grep -rn "\.expect(" core/archipelago/src/api/ --include="*.rs" | grep -v test | grep -v "// SAFE:"` — review each remaining expect.
  4. `cargo clippy --all-targets --all-features` — zero warnings.
  5. Backend starts cleanly: `sudo systemctl restart archipelago && sudo journalctl -u archipelago -n 20 --no-pager`.
  6. Web UI login works. Container start/stop works. Settings page works.

---

## Phase 4: Mesh Networking — Authentication & Validation

> **Layman version**: The mesh network currently accepts messages from anyone who claims to be someone.
> It's like accepting a phone call from someone who says "Hi, I'm your bank" without verifying. We add
> cryptographic proof of identity (digital signatures) so every message is provably from who it claims.
> We also add checks so fake Bitcoin data can't be relayed.

- [ ] **Implement signed identity announcements**: In `core/archipelago/src/mesh/listener.rs`, find the identity advertisement handling (around line 923+). Modify the peer identity broadcast to include an Ed25519 signature:
  1. When broadcasting identity (DID + Ed25519 pubkey), sign the announcement with the node's private key:
     ```rust
     // In the identity broadcast function
     let identity_payload = format!("{}:{}", did, hex::encode(&pubkey));
     let signature = signing_key.sign(identity_payload.as_bytes());
     // Include signature in the broadcast envelope
     ```
  2. When receiving an identity announcement, verify the signature before accepting the peer:
     ```rust
     // In the identity receive handler
     let identity_payload = format!("{}:{}", claimed_did, hex::encode(&claimed_pubkey));
     let verifying_key = ed25519_dalek::VerifyingKey::from_bytes(&claimed_pubkey)?;
     verifying_key.verify_strict(identity_payload.as_bytes(), &signature)
         .map_err(|_| anyhow::anyhow!("Identity announcement signature verification failed for {}", claimed_did))?;
     ```
  3. Reject any identity announcement without a valid signature. Log the rejection at `warn!` level.
  4. Update the `TypedEnvelope` struct in `message_types.rs` to include an optional `identity_signature` field if not already present.
  Build and test with two mesh-connected nodes if available. If only one node, verify the code compiles and the identity broadcast includes signatures.

- [ ] **Verify envelope signatures on received messages**: In `core/archipelago/src/mesh/listener.rs`, find where incoming `TypedEnvelope` messages are processed. Add signature verification:
  1. Before processing any message, call `envelope.verify_signature()` (which should already exist in `message_types.rs`).
  2. If verification fails, log a warning and drop the message:
     ```rust
     if !envelope.verify_signature(&peer_pubkey)? {
         tracing::warn!(peer = %contact_id, "Dropping message with invalid signature");
         continue;
     }
     ```
  3. For alert messages specifically, verify the alert is signed by the claimed peer's key before displaying or relaying.
  Build and deploy.

- [ ] **Add Bitcoin transaction/block validation before relay**: In `core/archipelago/src/mesh/bitcoin_relay.rs`, find lines 210-232 where block headers and transactions are relayed:
  1. For block headers, add basic validation:
     ```rust
     fn validate_block_header(header: &BlockHeader, last_known_height: u32) -> Result<bool> {
         // Check header version is valid (1-4 or BIP9 signaling)
         if header.version < 1 {
             return Ok(false);
         }
         // Check that height is sequential (within reason for mesh delays)
         if header.height > last_known_height.saturating_add(100) { // saturating: no overflow near u32::MAX
             tracing::warn!("Block header height {} is too far ahead of known height {}", header.height, last_known_height);
             return Ok(false);
         }
         // Check prev_block_hash is 32 bytes
         if header.prev_block_hash.len() != 32 {
             return Ok(false);
         }
         Ok(true)
     }
     ```
  2. For transactions, add basic syntax validation:
     ```rust
     fn validate_raw_transaction(tx_bytes: &[u8]) -> Result<bool> {
         // Minimum valid transaction size is ~60 bytes
         if tx_bytes.len() < 60 || tx_bytes.len() > 400_000 {
             return Ok(false);
         }
         // Check version bytes (first 4 bytes, little-endian)
         let version = u32::from_le_bytes(tx_bytes[0..4].try_into()?);
         if version < 1 || version > 3 {
             return Ok(false);
         }
         Ok(true)
     }
     ```
  3. Add rate limiting: max 10 block headers per minute, max 5 transactions per minute per peer.
  4. Call these validation functions before relaying any data.
  Build and deploy.
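
The rate limit in step 3 can be sketched as a sliding window per peer and message kind. This is a std-only illustration that assumes both limits apply per peer over a 60-second window; `RelayKind` and `RelayRateLimiter` are hypothetical names, and the caller passes in the current time so the logic stays testable.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum RelayKind {
    BlockHeader,
    Transaction,
}

/// Sliding-window limiter: remembers the timestamps of accepted messages
/// per (peer, kind) and refuses new ones once the window is full.
pub struct RelayRateLimiter {
    window: Duration,
    seen: HashMap<(String, RelayKind), VecDeque<Instant>>,
}

impl RelayRateLimiter {
    pub fn new() -> Self {
        Self { window: Duration::from_secs(60), seen: HashMap::new() }
    }

    /// Limits from the plan: 10 block headers and 5 transactions per minute.
    fn limit(kind: RelayKind) -> usize {
        match kind {
            RelayKind::BlockHeader => 10,
            RelayKind::Transaction => 5,
        }
    }

    /// Returns true and records the message if the peer is under its limit.
    pub fn allow(&mut self, peer: &str, kind: RelayKind, now: Instant) -> bool {
        let window = self.window;
        let recent = self.seen.entry((peer.to_string(), kind)).or_default();
        // Forget timestamps that have aged out of the window.
        while let Some(&oldest) = recent.front() {
            if now.duration_since(oldest) >= window {
                recent.pop_front();
            } else {
                break;
            }
        }
        if recent.len() >= Self::limit(kind) {
            return false;
        }
        recent.push_back(now);
        true
    }
}
```

Calling `allow` before the validation functions keeps a flooding peer from consuming CPU on parsing; rejected messages are not recorded, so a peer cannot extend its own lockout.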

- [ ] **Add message sequence numbers**: In `core/archipelago/src/mesh/message_types.rs`, add a `sequence: u64` field to `TypedEnvelope`:
  1. Add the field to the struct (with `#[serde(default)]` for backwards compatibility with old messages).
  2. In the message creation code, increment a per-peer counter for each outgoing message.
  3. On receive, track the last seen sequence per peer and log out-of-order messages at `debug!` level.
  4. Do NOT reject out-of-order messages (mesh is unreliable), but allow upper layers to reorder if needed.
  Build and deploy.
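
The receive-side tracking in steps 3 and 4 can be sketched as a per-peer map of the highest sequence seen so far. The tracker classifies each message and never rejects it, matching the rule above; `SeqStatus` and `SequenceTracker` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum SeqStatus {
    /// First message ever seen from this peer.
    First,
    /// Exactly one past the previous highest sequence.
    InOrder,
    /// Newer than expected; `missing` messages were skipped or are delayed.
    Gap { missing: u64 },
    /// Not newer than the highest seen: a duplicate or a late arrival.
    Stale,
}

pub struct SequenceTracker {
    last_seen: HashMap<String, u64>,
}

impl SequenceTracker {
    pub fn new() -> Self {
        Self { last_seen: HashMap::new() }
    }

    /// Classify an incoming sequence number. The message is always accepted;
    /// the status is meant for `debug!` logging and optional reordering.
    pub fn observe(&mut self, peer: &str, seq: u64) -> SeqStatus {
        match self.last_seen.get(peer).copied() {
            None => {
                self.last_seen.insert(peer.to_string(), seq);
                SeqStatus::First
            }
            Some(last) if seq > last => {
                self.last_seen.insert(peer.to_string(), seq);
                let missing = seq - last - 1;
                if missing == 0 { SeqStatus::InOrder } else { SeqStatus::Gap { missing } }
            }
            Some(_) => SeqStatus::Stale,
        }
    }
}
```

Because of `#[serde(default)]`, messages from nodes that predate the field all carry sequence 0, so after the first one they show up as `Stale`; that is expected and is another reason not to drop on status.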

- [ ] **Verify Phase 4 — Mesh authentication active**: Run these checks:
  1. `grep -rn "verify_signature\|verify_strict" core/archipelago/src/mesh/ --include="*.rs"` — should show verification calls in listener.rs and message_types.rs.
  2. `grep -rn "validate_block_header\|validate_raw_transaction" core/archipelago/src/mesh/bitcoin_relay.rs` — validation functions exist.
  3. `cargo test --all-features` — all mesh tests pass.
  4. `cargo clippy --all-targets --all-features` — zero warnings.
  5. Backend starts cleanly with mesh enabled.

---

## ============================================================
## YEAR 1 — QUARTER 2: FRONTEND, NGINX, AND MEDIUM FIXES
## ============================================================

---

## Phase 5: Frontend — XSS, Auth, and Input Validation

> **Layman version**: The web interface has a few places where an attacker could inject malicious code
> into the page (XSS), steal login cookies, or redirect you to a fake site after login. We fix all
> of these and add proper input sanitization everywhere.

- [ ] **Fix v-html XSS in BootScreen and Settings**: In `neode-ui/src/components/BootScreen.vue` line 55, replace `v-html="icons[currentIcon]"` with a safe rendering approach:
  1. Since the icons are hardcoded SVG strings, create a computed property that returns the current icon and use `v-html` with a DOMPurify sanitizer.
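  2. Verify the package exists first: `npm view dompurify version`.
  3. Install DOMPurify: `cd neode-ui && npm install dompurify`. Recent DOMPurify releases ship their own TypeScript definitions; add `npm install -D @types/dompurify` only if `npm run type-check` reports missing types.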
  4. In BootScreen.vue:
     ```typescript
     import DOMPurify from 'dompurify'
     const sanitizedIcon = computed(() => DOMPurify.sanitize(icons[currentIcon.value], { USE_PROFILES: { svg: true } }))
     ```
     Then use `v-html="sanitizedIcon"`.
  5. In Settings.vue line 286, do the same for `totpQrSvg`:
     ```typescript
     const sanitizedQrSvg = computed(() => DOMPurify.sanitize(totpQrSvg.value, { USE_PROFILES: { svg: true } }))
     ```
  6. Run `npm run type-check` to verify.
  7. Build and deploy. Verify boot screen animation still works. Verify TOTP QR code still renders on Settings page.

- [ ] **Fix FileBrowser cookie security flags**: In `neode-ui/src/api/filebrowser-client.ts` line 62, find `document.cookie = \`auth=${this.token}; path=/app/filebrowser; SameSite=Strict\``. This cookie is missing security flags. Since we can't set `HttpOnly` from JavaScript (that's a server-side flag), the best we can do client-side is:
  ```typescript
  document.cookie = `auth=${this.token}; path=/app/filebrowser; SameSite=Strict; Secure`
  ```
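  The `Secure` flag ensures the cookie is only sent over HTTPS. Browsers also refuse to store a `Secure` cookie set by a page loaded over plain HTTP, so this change only works when the UI itself is served over HTTPS. For the long term (Phase 13), the FileBrowser auth should be proxied through the backend so the cookie can be set server-side with `HttpOnly`.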
  Also add an expiration so the cookie doesn't persist indefinitely:
  ```typescript
  const expires = new Date(Date.now() + 24 * 60 * 60 * 1000).toUTCString() // 24 hours
  document.cookie = `auth=${this.token}; path=/app/filebrowser; SameSite=Strict; Secure; expires=${expires}`
  ```
  Build and deploy. Verify FileBrowser still works (login, browse, download).

- [ ] **Hide TOTP secret by default**: In `neode-ui/src/views/Settings.vue`, find line 289 with `{{ totpSecretBase32 }}`. Wrap it in a reveal toggle:
  1. Add a ref: `const showTotpSecret = ref(false)`
  2. Replace the display with:
     ```vue
     <div v-if="totpSecretBase32" class="mt-3">
       <p class="text-xs text-white/50 mb-1">Manual entry key (keep secret!):</p>
       <div v-if="showTotpSecret" class="flex items-center gap-2">
         <p class="text-sm font-mono text-orange-400 break-all">{{ totpSecretBase32 }}</p>
         <button class="glass-button text-xs px-2 py-1" @click="showTotpSecret = false">Hide</button>
       </div>
       <button v-else class="glass-button text-xs px-3 py-1" @click="showTotpSecret = true">
         Show manual entry key
       </button>
     </div>
     ```
  3. Remove the `select-all` class — users should deliberately copy, not accidentally select.
  Build and deploy. Verify TOTP setup flow still works.

- [ ] **Validate redirect URL after login**: In `neode-ui/src/router/index.ts`, find line 231 with `const redirectTo = (to.query.redirect as string) || '/dashboard'`. Replace with:
  ```typescript
  function isLocalRedirect(path: unknown): path is string {
    if (typeof path !== 'string') return false
    try {
      // Must be a relative path, not an absolute URL
      if (path.startsWith('//') || path.includes('://')) return false
      const url = new URL(path, window.location.origin)
      return url.origin === window.location.origin
    } catch {
      return false
    }
  }

  const redirectTo = isLocalRedirect(to.query.redirect) ? to.query.redirect : '/dashboard'
  ```
  Run `npm run type-check`. Build and deploy. Test: visit `http://192.168.1.228/login?redirect=https://evil.com` — after login should go to `/dashboard`, NOT `evil.com`. Visit `http://192.168.1.228/login?redirect=/mesh` — after login should go to `/mesh`.

- [ ] **Add input trimming to all auth fields**: In `neode-ui/src/views/Login.vue`, find all password and input submissions. Add `.trim()` before sending:
  1. Search for `password.value` in the file. Wherever it's submitted via RPC (e.g., `params: { password: password.value }`), change to `params: { password: password.value.trim() }`.
  2. Do the same for TOTP code inputs, setup passwords, confirm passwords.
  3. Also check `neode-ui/src/views/Settings.vue` for password change forms — trim those too.
  Run `npm run type-check`. Build and deploy. Test login with a password that has trailing spaces — should still work.

- [ ] **Validate route parameters**: In `neode-ui/src/views/AppDetails.vue` (line ~485) and `neode-ui/src/views/AppSession.vue` (line ~267), add app ID validation:
  1. Create a utility function in `neode-ui/src/utils/` or inline:
     ```typescript
     function isValidAppId(id: unknown): id is string {
       // The optional group also accepts single-character IDs
       return typeof id === 'string' && id.length <= 64 && /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/.test(id)
     }
     ```
  2. In each view's `setup`, validate the route param early:
     ```typescript
     const appId = computed(() => {
       const id = route.params.id
       if (!isValidAppId(id)) {
         router.replace('/apps')
         return ''
       }
       return id
     })
     ```
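  Build and deploy. Test: navigate to a valid app (should work). Navigate to an ID that fails the pattern, such as `/app/Not_Valid!`, and confirm it redirects to `/apps`. A path like `/app/../../etc/passwd` is not a useful test on its own, because the browser collapses the `..` segments before the router ever sees them.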

- [ ] **Verify Phase 5 — Frontend hardened**: Run these checks:
  1. `grep -rn "v-html" neode-ui/src/ --include="*.vue" | grep -v "DOMPurify\|sanitize"` — any remaining v-html should be justified.
  2. `grep -rn "select-all" neode-ui/src/ --include="*.vue"` — TOTP secret should NOT have select-all.
  3. `npm run type-check` — zero errors.
  4. `npm run build` — builds successfully.
  5. Test login flow, TOTP setup, app navigation, FileBrowser at http://192.168.1.228.

---

## Phase 6: Nginx — Security Headers & Rate Limiting

> **Layman version**: The web server (nginx) is missing security headers that tell browsers how to
> protect users. We add headers that prevent clickjacking, content type confusion, and XSS. We also
> add rate limiting so attackers can't overwhelm the server with requests.

- [ ] **Fix Content Security Policy**: In `image-recipe/configs/nginx-archipelago.conf`, find line ~14 with the existing CSP. Replace the CSP header with a strict version:
  ```nginx
  add_header Content-Security-Policy "default-src 'self'; script-src 'self'; style-src 'self' 'unsafe-inline'; img-src 'self' data: blob:; font-src 'self' data:; connect-src 'self' ws: wss:; frame-src 'self'; frame-ancestors 'self'; base-uri 'self'; form-action 'self';" always;
  ```
  Note: `'unsafe-inline'` for styles is needed because Vue scoped styles sometimes inject inline styles. `'unsafe-eval'` is removed — if the app breaks, it means some JS is using `eval()` which should be fixed in code instead.
  Deploy the nginx config. Test the web UI thoroughly — if anything breaks, check browser console for CSP violations and adjust the policy minimally.

- [ ] **Replace X-Frame-Options stripping with SAMEORIGIN**: In `image-recipe/configs/snippets/archipelago-https-app-proxies.conf`, find all 38 occurrences of `proxy_hide_header X-Frame-Options;`. For each one, add after it:
  ```nginx
  add_header X-Frame-Options "SAMEORIGIN" always;
  ```
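  This allows Archipelago's own UI to iframe apps but blocks external sites from framing them. Do the same in the HTTP config in `nginx-archipelago.conf`. Be aware that nginx inherits `add_header` directives from the enclosing level only when the current level defines none, so each `location` that gains this line stops inheriting the server-level headers (CSP, `X-Content-Type-Options`, and so on). Repeat those headers in the same block or pull them in from a shared snippet with `include`.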
  Deploy and test: open an app in the Archipelago iframe — should still load.

- [ ] **Add HSTS header**: In `image-recipe/configs/nginx-archipelago.conf`, add to the HTTPS server block (or main server block if using HTTPS):
  ```nginx
  add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
  ```
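  Note: Do NOT add `preload`; this is a local server, not a public domain. Browsers also ignore HSTS for hosts addressed by IP literal (RFC 6797), so the header only takes effect when the node is reached by hostname over HTTPS.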

- [ ] **Add rate limiting to RPC endpoint**: In `image-recipe/configs/nginx-archipelago.conf`, add at the top (before the `server` block):
  ```nginx
  # Rate limit zones
  limit_req_zone $binary_remote_addr zone=rpc:10m rate=20r/s;
  limit_req_zone $binary_remote_addr zone=auth:10m rate=3r/s;
  ```
  Then in the `/rpc/` location block, add:
  ```nginx
  limit_req zone=rpc burst=40 nodelay;
  limit_req_status 429;
  ```
  For auth-specific endpoints, apply stricter limits in the backend or add a separate location for auth RPCs.
  Deploy and test: normal UI use should work fine. Rapid-fire requests should get 429 responses.
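
To predict when the 429s appear, the `rate=20r/s burst=40 nodelay` setting can be approximated with a small leaky-bucket model. This is a simplified sketch of the algorithm as described in the nginx documentation, not nginx's code; `LeakyBucket` is a hypothetical name and time is passed in as milliseconds.

```rust
/// Simplified leaky bucket: each accepted request adds one unit of "excess",
/// which drains at `rate_per_sec`. A request is refused when the excess
/// would go above `burst`. Excess is tracked in thousandths of a request.
pub struct LeakyBucket {
    rate_per_sec: u64,
    burst: u64,
    excess_milli: u64,
    last_ms: Option<u64>,
}

impl LeakyBucket {
    pub fn new(rate_per_sec: u64, burst: u64) -> Self {
        Self { rate_per_sec, burst, excess_milli: 0, last_ms: None }
    }

    /// Returns true if a request arriving at `now_ms` is accepted.
    pub fn allow(&mut self, now_ms: u64) -> bool {
        let last = match self.last_ms {
            // The very first request is always accepted with zero excess.
            None => {
                self.last_ms = Some(now_ms);
                return true;
            }
            Some(last) => last,
        };
        let elapsed_ms = now_ms.saturating_sub(last);
        // rate [req/s] * elapsed [ms] equals the drained amount in milli-requests.
        let drained = self.rate_per_sec.saturating_mul(elapsed_ms);
        let excess = self.excess_milli.saturating_sub(drained) + 1000;
        if excess > self.burst * 1000 {
            return false; // refused requests do not update the state
        }
        self.excess_milli = excess;
        self.last_ms = Some(now_ms);
        true
    }
}
```

Under this model a simultaneous burst of 100 requests yields 41 accepted (one plus the burst of 40), while a client that sends one request every 50 ms, which is exactly 20 per second, is never limited. A test loop therefore has to exceed 20 requests per second for long enough to exhaust the burst before any 429 shows up.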

- [ ] **Add remaining security headers**: In `image-recipe/configs/nginx-archipelago.conf`, add to the server block:
  ```nginx
  add_header X-Content-Type-Options "nosniff" always;
  add_header X-DNS-Prefetch-Control "off" always;
  add_header Referrer-Policy "strict-origin-when-cross-origin" always;
  add_header Permissions-Policy "camera=(), microphone=(), geolocation=(), payment=()" always;
  ```
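  Deploy and verify: `curl -sI http://192.168.1.228 | grep -i "x-content\|referrer\|permissions"`. The HSTS header is only sent from the HTTPS server block, so check it separately: `curl -skI https://192.168.1.228 | grep -i "strict-transport"`.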

- [ ] **Verify Phase 6 — Nginx hardened**: Run these checks from another machine:
  1. `curl -sI http://192.168.1.228 | grep -i "content-security-policy"` — CSP header present, no `unsafe-eval`.
  2. `curl -sI http://192.168.1.228 | grep -i "x-content-type"` — `nosniff` present.
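  3. `curl -sI http://192.168.1.228 | grep -i "x-frame-options"` should show the header on app proxies. Run this against one of the proxied app URLs as well as the root, because the root URL is not an app proxy.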
  4. `curl -sI http://192.168.1.228 | grep -i "referrer-policy"` — present.
  5. Rate limit test: `for i in $(seq 1 100); do curl -s -o /dev/null -w "%{http_code}\n" http://192.168.1.228/rpc/v1; done | sort | uniq -c` — should show some 429s.
  6. All UI features still work normally.

---

## Phase 7: Backend — MEDIUM Severity Fixes

> **Layman version**: These fixes improve defense-in-depth. They're not immediately exploitable like
> the critical bugs, but they close gaps that a sophisticated attacker could chain together. Think of
> it as adding deadbolts after fixing the broken window.

- [ ] **Add zeroization to SecretsManager**: In `core/security/src/secrets_manager.rs`, the encryption key stays in memory for the lifetime of the struct. Add zeroization on drop:
  1. Add `zeroize` dependency to `core/security/Cargo.toml` if not present: `zeroize = { version = "1", features = ["derive"] }`.
  2. Wrap the key material in a zeroizing wrapper. Since `Aes256Gcm` doesn't implement `Zeroize`, store the raw key separately:
     ```rust
     use zeroize::Zeroizing;

     pub struct SecretsManager {
         secrets_dir: PathBuf,
         cipher: Aes256Gcm,
         raw_key: Zeroizing<[u8; 32]>, // Wiped from memory when dropped
     }
     ```
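  3. In the constructor, store the key bytes before creating the cipher, and wrap in `Zeroizing`. This wipes only the raw copy; the expanded key schedule held inside `cipher` is not covered by `raw_key`.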
  Build and test: secrets should still encrypt/decrypt correctly.

- [ ] **Replace thread_rng with OsRng in secrets manager**: In `core/security/src/secrets_manager.rs`, find lines 64 and 221 where `rand::thread_rng().fill_bytes()` is used. Replace with:
  ```rust
  use rand::rngs::OsRng; // fill_bytes also needs the rand::RngCore trait in scope
  OsRng.fill_bytes(&mut nonce_bytes);  // Line 64
  OsRng.fill_bytes(&mut new_secret_bytes);  // Line 221
  ```
  Build and test.

- [ ] **Encrypt the remember-me HMAC secret**: In `core/archipelago/src/session.rs`, find lines 395-403 where the remember-me secret is stored as plaintext. Encrypt it using the secrets manager:
  1. Instead of `std::fs::write(REMEMBER_SECRET_FILE, &secret)`, use the SecretsManager to encrypt the secret before writing.
  2. On read, decrypt using SecretsManager.
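  3. If SecretsManager is not available at that point in the boot sequence, derive the secret from a combination of machine-specific data (e.g., `/etc/machine-id` + salt) using Argon2, so it's different per installation but deterministic. Treat this as a fallback only: `/etc/machine-id` is world-readable, so the derived secret is only as private as the salt.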
  Build, deploy, and test: remember-me login should still work after restart.

- [ ] **Use checked arithmetic for Bitcoin amounts**: In `core/archipelago/src/wallet/ecash.rs` line 64, replace the `.sum()` with checked addition:
  ```rust
  pub fn balance(&self) -> u64 {
      self.tokens.iter()
          .filter(|t| !t.spent)
          .try_fold(0u64, |acc, t| acc.checked_add(t.amount_sats))
          .unwrap_or(u64::MAX) // Saturate on overflow rather than wrapping
  }
  ```
  Search for other `.sum()` calls on monetary amounts: `grep -rn "\.sum()" core/ --include="*.rs"`. Fix any that operate on `u64` Bitcoin amounts.
  Build and test.
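
  Saturating to `u64::MAX` avoids wrap-around, but the caller still receives a balance that is not real. A minimal sketch of an alternative that surfaces the overflow instead; `Token` and `checked_balance` are illustrative stand-ins, not the actual types in `ecash.rs`:

  ```rust
  /// Illustrative stand-in for the wallet's token type (hypothetical).
  #[derive(Debug, Clone)]
  pub struct Token {
      pub amount_sats: u64,
      pub spent: bool,
  }

  /// Sums the unspent token amounts.
  /// Returns `None` if the total does not fit in a `u64`.
  pub fn checked_balance(tokens: &[Token]) -> Option<u64> {
      tokens
          .iter()
          .filter(|t| !t.spent)
          .try_fold(0u64, |acc, t| acc.checked_add(t.amount_sats))
  }
  ```

  Whether to saturate or to return an error is a design choice: saturating keeps the existing `u64` signature, while `Option` forces every caller to decide what an impossible balance means.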

- [ ] **Create validated AppId newtype**: In `core/archipelago/src/api/rpc/container.rs`, create a newtype for app IDs:
  ```rust
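  #[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
  #[serde(try_from = "String")] // Route deserialization through AppId::new
  pub struct AppId(String);

  impl std::convert::TryFrom<String> for AppId {
      type Error = anyhow::Error;

      fn try_from(id: String) -> Result<Self, Self::Error> {
          Self::new(&id)
      }
  }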

  impl AppId {
      pub fn new(id: &str) -> Result<Self> {
          // Only allow lowercase alphanumeric + hyphens, 1-64 chars
          if id.is_empty() || id.len() > 64 {
              anyhow::bail!("App ID must be 1-64 characters");
          }
          if !id.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '-') {
              anyhow::bail!("App ID must contain only lowercase letters, digits, and hyphens");
          }
          if id.starts_with('-') || id.ends_with('-') || id.contains("--") {
              anyhow::bail!("App ID must not start/end with hyphen or contain consecutive hyphens");
          }
          Ok(Self(id.to_string()))
      }

      pub fn as_str(&self) -> &str { &self.0 }
  }

  impl std::fmt::Display for AppId {
      fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
          write!(f, "{}", self.0)
      }
  }
  ```
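  Use `AppId` in RPC handler signatures where app IDs are accepted. Deserialization validates automatically only when it is routed through `AppId::new` (for example via `#[serde(try_from = "String")]`); a plain derived `Deserialize` on the newtype would accept any string.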
  Build — fix all compilation errors from the type change. Deploy and test app operations.

- [ ] **Validate Tor service names**: In `core/archipelago/src/api/rpc/tor.rs`, find lines 426-427 where `name` is used in path operations. Add validation:
  ```rust
  fn validate_service_name(name: &str) -> Result<()> {
      if name.is_empty() || name.len() > 64 {
          anyhow::bail!("Service name must be 1-64 characters");
      }
      if !name.chars().all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_') {
          anyhow::bail!("Service name must contain only alphanumeric characters, hyphens, and underscores");
      }
      Ok(())
  }
  ```
  Call `validate_service_name(&name)?;` before any filesystem operation with the name.
  Build and deploy.

- [ ] **Add per-user rate limiting on CPU-intensive RPC endpoints**: In `core/archipelago/src/api/rpc/mod.rs`, add a rate limiter for expensive operations:
  1. Add a simple token-bucket rate limiter using a `HashMap<String, (Instant, u32)>` behind a `Mutex`.
  2. Apply rate limits to: `backup.create` (1/minute), container install/uninstall (5/minute), `auth.totp.setup` (3/minute), password change (3/minute).
  3. Return HTTP 429 with a `Retry-After` header when rate limited.
  Build and deploy. Test: rapid-fire backup requests should be throttled.
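
  The limiter described in steps 1-3 could look roughly like the sketch below. It is a standard-library-only illustration, not the project's implementation: `RateLimiter`, `check`, and the key format are assumptions, and the bucket stores a fractional token count rather than the `(Instant, u32)` pair mentioned above.

  ```rust
  use std::collections::HashMap;
  use std::sync::Mutex;
  use std::time::{Duration, Instant};

  /// Per-key token bucket: `capacity` tokens, refilled evenly over `period`.
  /// `period` must be non-zero.
  pub struct RateLimiter {
      capacity: f64,
      refill_per_sec: f64,
      // key -> (tokens remaining, time of last refill)
      buckets: Mutex<HashMap<String, (f64, Instant)>>,
  }

  impl RateLimiter {
      pub fn new(capacity: u32, period: Duration) -> Self {
          Self {
              capacity: f64::from(capacity),
              refill_per_sec: f64::from(capacity) / period.as_secs_f64(),
              buckets: Mutex::new(HashMap::new()),
          }
      }

      /// Returns `Ok(())` if the call is allowed, or `Err(wait)` with the time
      /// until one token is available (usable for a `Retry-After` header).
      pub fn check(&self, key: &str, now: Instant) -> Result<(), Duration> {
          // A poisoned lock still holds valid counters, so keep using them.
          let mut buckets = self.buckets.lock().unwrap_or_else(|e| e.into_inner());
          let (tokens, last) = buckets
              .entry(key.to_string())
              .or_insert((self.capacity, now));

          // Refill for the time elapsed since the last call, capped at capacity.
          let elapsed = now.saturating_duration_since(*last).as_secs_f64();
          *tokens = (*tokens + elapsed * self.refill_per_sec).min(self.capacity);
          *last = now;

          if *tokens >= 1.0 {
              *tokens -= 1.0;
              Ok(())
          } else {
              Err(Duration::from_secs_f64((1.0 - *tokens) / self.refill_per_sec))
          }
      }
  }
  ```

  One limiter per operation class keeps the limits independent, for example `RateLimiter::new(1, Duration::from_secs(60))` for `backup.create` keyed by session or user ID. Entries are never evicted in this sketch, so a long-running backend would also need periodic cleanup of idle keys.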

- [ ] **Implement backup recovery codes**: In `core/archipelago/src/auth.rs` or `session.rs`, add recovery code generation during initial setup:
  1. Generate 8 random recovery codes (each 8 characters, alphanumeric) during password setup.
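  2. Hash them with a slow, salted password hash (for example Argon2, which this plan already uses for key derivation) rather than bare SHA-256, and store the hashes in `/var/lib/archipelago/recovery-codes.json`. An 8-character alphanumeric code has under 48 bits of entropy, so unsalted SHA-256 hashes could be brute-forced offline if the file leaks.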
  3. Display the codes to the user once (they must write them down).
  4. Add an RPC endpoint `auth.recover` that accepts a recovery code, verifies against stored hashes, and allows password reset.
  5. Each code is single-use — delete the hash after successful use.
  Build, deploy, and test the full flow.
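
  For step 1, turning random bytes into alphanumeric characters with a plain `byte % 62` makes some characters slightly more likely than others. A minimal sketch of an unbiased mapping, assuming the caller supplies bytes from a CSPRNG such as `OsRng`; the function name and alphabet order are illustrative:

  ```rust
  const ALPHABET: &[u8; 62] =
      b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

  /// Maps random bytes to a `len`-character alphanumeric code.
  /// Bytes >= 248 are discarded so that every character is equally likely
  /// (248 is the largest multiple of 62 that fits in a byte).
  /// Returns `None` if `random` runs out before `len` characters are produced.
  pub fn encode_recovery_code(random: &[u8], len: usize) -> Option<String> {
      let code: String = random
          .iter()
          .filter(|&&b| b < 248)
          .take(len)
          .map(|&b| ALPHABET[(b % 62) as usize] as char)
          .collect();
      (code.len() == len).then_some(code)
  }
  ```

  Supplying a few more random bytes than `len` (for example 16 bytes for an 8-character code) makes a `None` result very unlikely; on `None` the caller can simply draw fresh bytes and retry.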

- [ ] **Verify Phase 7 — Backend medium fixes complete**: Run these checks:
  1. `cargo clippy --all-targets --all-features` — zero warnings.
  2. `cargo test --all-features` — all tests pass.
  3. `grep -rn "thread_rng" core/security/ --include="*.rs"` — zero results.
  4. Backend starts cleanly after deploy.
  5. All UI features work: login, remember-me, app install, settings.

---

## Phase 8: Mesh — MEDIUM Fixes & Atomic State

> **Layman version**: The encrypted messaging system has some edge cases where a crash at the wrong
> moment could weaken security, and emergency alerts can be faked. We fix the crash safety and add
> signature checks to alerts.

- [ ] **Add alert signature verification on receive**: In `core/archipelago/src/mesh/listener.rs`, find where emergency alerts are processed. Before displaying or relaying an alert:
  ```rust
  // Verify the alert is actually signed by the claimed peer
  let peer_pubkey = resolve_peer_pubkey(&envelope.sender)?;
  if !envelope.verify_signature(&peer_pubkey)? {
      tracing::warn!(
          claimed_sender = %envelope.sender,
          "Dropping emergency alert with invalid signature — possible spoofing attempt"
      );
      continue; // Skip this alert
  }
  ```
  Build and test.

- [ ] **Implement atomic ratchet state persistence**: In `core/archipelago/src/mesh/session.rs`, find lines 156-159 where ratchet state is saved. Replace with atomic write (write to temp file, then rename):
  ```rust
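  async fn save_session_atomic(&self, did: &str, state: &RatchetState) -> Result<()> {
      use tokio::io::AsyncWriteExt; // provides write_all

      let path = self.session_path(did);
      let tmp_path = path.with_extension("tmp");

      let data = serde_json::to_vec(state)
          .context("Failed to serialize ratchet state")?;

      // Write the temp file and flush it to disk before the rename. Without
      // the fsync, a power loss can leave a truncated file under the final name.
      let mut tmp = tokio::fs::File::create(&tmp_path).await
          .context("Failed to create temporary ratchet state")?;
      tmp.write_all(&data).await
          .context("Failed to write temporary ratchet state")?;
      tmp.sync_all().await
          .context("Failed to fsync temporary ratchet state")?;
      drop(tmp);

      tokio::fs::rename(&tmp_path, &path).await
          .context("Failed to atomically rename ratchet state file")?;

      Ok(())
  }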
  ```
  This ensures that a crash during write leaves either the old state (intact) or the new state (complete), never a partial/corrupt file.
  Build and test.

- [ ] **Encrypt GPS in dead man's switch alerts**: In `core/archipelago/src/mesh/alerts.rs`, find where GPS coordinates are included in alerts. Encrypt the GPS data for intended recipients only:
  1. Make GPS optional in the alert struct: `gps: Option<EncryptedGps>`.
  2. When creating an alert, encrypt GPS coordinates using each trusted peer's public key.
  3. Only intended recipients can decrypt the GPS. Other mesh relayers see the alert but not the location.
  Build and test.

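- [ ] **Systematic unwrap audit in mesh code**: Run `grep -rn "\.unwrap()\|\.expect(" core/archipelago/src/mesh/ --include="*.rs" | grep -v "mod tests" | grep -v "#\[test\]"`. The two `grep -v` filters only drop the marker lines themselves, so skip hits inside test modules by hand. For each occurrence outside tests: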
  1. If it's in message parsing/deserialization — replace with `?` (incoming data is untrusted).
  2. If it's after a guaranteed check (e.g., `if x.is_some() { x.unwrap() }`) — refactor to `if let Some(v) = x`.
  3. If it's truly infallible (e.g., regex compilation of a literal) — add `// SAFETY: literal regex cannot fail` comment.
  Target: reduce unwrap/expect in non-test mesh code to under 20, all documented.
  Build and run full test suite.
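
  As an illustration of rule 1, the sketch below parses an untrusted frame without `unwrap()`, `expect()`, or a panicking index. The wire format (`[type][length][payload]`) and the names `Frame`, `ParseError`, and `parse_frame` are hypothetical and chosen only for this example; they are not the mesh protocol's real layout.

  ```rust
  use std::fmt;

  /// Hypothetical wire format: [kind: u8][len: u16 big-endian][payload].
  #[derive(Debug, PartialEq)]
  pub struct Frame {
      pub kind: u8,
      pub payload: Vec<u8>,
  }

  #[derive(Debug, PartialEq)]
  pub enum ParseError {
      Truncated,
      LengthMismatch { declared: usize, actual: usize },
  }

  impl fmt::Display for ParseError {
      fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
          match self {
              ParseError::Truncated => write!(f, "frame is truncated"),
              ParseError::LengthMismatch { declared, actual } => {
                  write!(f, "declared {} payload bytes, got {}", declared, actual)
              }
          }
      }
  }

  impl std::error::Error for ParseError {}

  /// Every short or inconsistent input becomes an `Err`, never a panic.
  pub fn parse_frame(bytes: &[u8]) -> Result<Frame, ParseError> {
      // `get` returns `None` on short input where `bytes[i]` would panic.
      let kind = *bytes.first().ok_or(ParseError::Truncated)?;
      let hi = *bytes.get(1).ok_or(ParseError::Truncated)?;
      let lo = *bytes.get(2).ok_or(ParseError::Truncated)?;
      let declared = usize::from(u16::from_be_bytes([hi, lo]));

      let payload = bytes.get(3..).ok_or(ParseError::Truncated)?;
      if payload.len() != declared {
          return Err(ParseError::LengthMismatch { declared, actual: payload.len() });
      }
      Ok(Frame { kind, payload: payload.to_vec() })
  }
  ```

  Because `ParseError` implements `std::error::Error`, callers that return `anyhow::Result` can propagate it with `?` directly.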

- [ ] **Verify Phase 8 — Mesh hardened**: Run these checks:
  1. `cargo test --all-features` — all tests pass.
  2. `grep -rn "\.unwrap()\|\.expect(" core/archipelago/src/mesh/ --include="*.rs" | grep -v test | wc -l`: the total should be under 20 (`grep -c` would print one count per file rather than a total).
  3. Backend starts cleanly with mesh enabled.
  4. No ratchet state `.tmp` files left behind: `ls /var/lib/archipelago/mesh/sessions/*.tmp` should find nothing (`ls` reports "No such file or directory").

---

## ============================================================
## YEAR 1 — QUARTER 3: PRODUCTION FEATURES & INFRASTRUCTURE
## ============================================================

---

## Phase 9: Tor-by-Default Integration

> **Layman version**: Currently, Tor is optional. Competitors like Start9 and nix-bitcoin route all
> traffic through Tor by default for maximum privacy. We match this by making Tor the default for
> all Bitcoin and Lightning network connections.

- [ ] **Install and configure Tor on first boot**: In `scripts/first-boot-containers.sh`, add a Tor container (or system service) that starts before other services:
  1. Add a Tor container or verify the system Tor service is installed and enabled.
  2. Configure Tor with a SocksPort on `127.0.0.1:9050`.
  3. Add hidden service configs for: web UI (port 80), LND (port 8081), Bitcoin P2P (port 8333).
  4. Save the generated `.onion` addresses to `/var/lib/archipelago/tor-hostnames/`.

- [ ] **Route Bitcoin Core through Tor by default**: Add `-proxy=127.0.0.1:9050` and `-onlynet=onion` to bitcoin-knots container flags. This routes all P2P connections through Tor, hiding the node's IP address from the Bitcoin network.
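  Test: `sudo podman exec bitcoin-knots bitcoin-cli getnetworkinfo` should report only the `onion` network as reachable, and `sudo podman exec bitcoin-knots bitcoin-cli getpeerinfo` should show `"network": "onion"` for every peer.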

- [ ] **Route LND through Tor**: Configure LND to use Tor for all connections. Add `--tor.active --tor.socks=127.0.0.1:9050` to LND start flags. Verify LND peers are connected via Tor.

- [ ] **Add .onion URL display in web UI**: In `neode-ui/src/views/Settings.vue`, add a section showing the node's .onion address for remote access via Tor Browser.

- [ ] **Add Tor toggle in settings**: Allow users to disable Tor if they prefer clearnet (some use cases require it). Default should be Tor-on.

- [ ] **Verify Phase 9 — Tor active**: Bitcoin peers are onion-only, LND via Tor, .onion address displayed in UI.

---

## Phase 10: Encrypted Backup System

> **Layman version**: If your hardware dies, you lose everything — Bitcoin wallet, Lightning channels,
> all app data. We build an encrypted backup system so you can restore to new hardware. Start9 has this;
> we need it too.

- [ ] **Design backup manifest**: Create a backup manifest that lists what to back up per app: data directories, config files, secrets. Store in `apps/{app-id}/manifest.yml` under a `backup:` key.

- [ ] **Implement encrypted backup creation**: Add an RPC endpoint `backup.create` that:
  1. Snapshots all app data directories using `tar`.
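  2. Encrypts the tarball with AES-256-GCM using a key derived from the user's master password with Argon2. Encrypt in chunks, each under its own nonce: a single GCM invocation is limited to about 64 GiB of plaintext, which a full node's data can exceed.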
  3. Saves to a configurable destination (local USB, network share, etc.).
  4. Shows progress in the UI.

- [ ] **Implement encrypted backup restore**: Add an RPC endpoint `backup.restore` that:
  1. Accepts a backup file and the master password.
  2. Decrypts and verifies integrity.
  3. Stops affected containers, restores data, restarts containers.
  4. Handles version migration if backup is from an older version.

- [ ] **Add scheduled backups**: Allow users to configure automatic backups (daily/weekly) to external storage.

- [ ] **Verify Phase 10 — Backup/restore works**: Create a backup, delete an app's data, restore from backup, verify app works.

---

## Phase 11: Automated Update System

> **Layman version**: Currently, updates require SSH access and running a script manually. Users need
> a "click to update" button like Umbrel has. We build this with atomic updates that can roll back
> if something breaks.

- [ ] **Design update architecture**: Plan the update mechanism:
  1. Backend checks for updates by fetching a signed manifest from a known URL (or local file for air-gapped).
  2. Updates are downloaded as delta tarballs (frontend + backend binary).
  3. Applied atomically: new binary placed alongside old, symlink swapped.
  4. Rollback: if health check fails after update, swap symlink back.
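
  Steps 3 and 4 hinge on the symlink swap being atomic. A minimal Unix-only sketch, assuming a layout where a `current` symlink points at a versioned binary; the function name and the `.new` suffix are illustrative:

  ```rust
  use std::fs;
  use std::io;
  use std::os::unix::fs::symlink;
  use std::path::Path;

  /// Points `link` at `target` atomically: a new symlink is created under a
  /// temporary name and then renamed over the old one, so `link` always
  /// resolves to either the previous target or the new one.
  pub fn swap_symlink(link: &Path, target: &Path) -> io::Result<()> {
      let tmp = link.with_extension("new");

      // Clear a leftover temp link from an interrupted earlier swap.
      match fs::remove_file(&tmp) {
          Ok(()) => {}
          Err(e) if e.kind() == io::ErrorKind::NotFound => {}
          Err(e) => return Err(e),
      }

      symlink(target, &tmp)?;
      // rename(2) replaces the symlink itself, not the file it points to.
      fs::rename(&tmp, link)
  }
  ```

  Rollback is the same call with the previous target, which is why the old binary has to stay on disk until the health check has passed.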

- [ ] **Implement update check RPC endpoint**: Add `system.check_updates` that fetches the update manifest and returns available version + changelog.

- [ ] **Implement update apply RPC endpoint**: Add `system.apply_update` that downloads, verifies signature, applies, and restarts.

- [ ] **Add rollback mechanism**: If the backend fails to start after update (health check via systemd), automatically roll back to previous binary.

- [ ] **Add update UI in Settings**: Show current version, available updates, changelog, and "Update Now" button with progress indicator.

- [ ] **Verify Phase 11 — Updates work**: Simulate an update (place a new binary version), apply it, verify the system comes back up. Simulate a bad update, verify rollback.

---

## Phase 12: App Ecosystem Expansion

> **Layman version**: We have ~15 apps. The Bitcoin essentials are covered, but users expect at least
> 30 apps to compete with Start9/RaspiBlitz. We add the most-requested apps with proper security hardening.

- [ ] **Add missing essential Bitcoin apps**: Ensure these are available and work out of the box:
  1. Fulcrum (Electrum server alternative — faster than Electrs for large wallets)
  2. Thunderhub (Lightning management — alternative to Ride the Lightning)
  3. LNbits (Lightning toolkit with extensions)
  4. Lightning Terminal (Loop, Pool, Faraday in one UI)
  5. Specter Desktop (multisig wallet management)

- [ ] **Add privacy-enhancing apps**:
  1. JoinMarket / JAM (CoinJoin — RaspiBlitz has this, we should too)
  2. Whirlpool CLI (if legally permissible post-Samourai)

- [ ] **Add self-hosting essentials**:
  1. Matrix / Synapse (decentralized chat)
  2. Gitea (self-hosted Git)
  3. WireGuard (VPN — nix-bitcoin has this)

- [ ] **Harden all new app manifests**: Every new app must have:
  - `readonly_root: true`
  - `cap_drop: ALL` + only required caps added
  - Non-root user (UID > 1000)
  - `no-new-privileges: true`
  - Pinned image by SHA256 digest
  - Health check configured

- [ ] **Verify Phase 12 — All apps work**: Install each new app, verify it starts, verify the UI loads, verify it connects to Bitcoin/Lightning if needed.

---

## ============================================================
## YEAR 1 — QUARTER 4: PRODUCTION READINESS
## ============================================================

---

## Phase 13: Advanced Security Hardening

> **Layman version**: We've fixed all the known bugs. Now we add proactive security measures — things
> that prevent entire classes of bugs from being exploitable, even if new bugs are introduced later.

- [ ] **Add Content Security Policy nonce support**: Replace `'unsafe-inline'` in CSP with nonce-based script loading. This requires the backend to generate a random nonce per page load and inject it into both the CSP header and the script tags.

- [ ] **Implement session timeout**: In `core/archipelago/src/session.rs`, add configurable session timeout (default 24 hours, configurable in settings). Auto-expire sessions that haven't been active.
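
  The idle-expiry check itself is small. A sketch under the assumption that each session records its last activity time; `Session`, `is_expired`, and `touch` are hypothetical names, not the existing API in `session.rs`:

  ```rust
  use std::time::{Duration, Instant};

  /// Default idle timeout from the plan: 24 hours.
  pub const DEFAULT_SESSION_TIMEOUT: Duration = Duration::from_secs(24 * 60 * 60);

  /// Hypothetical session record; only the field needed for expiry is shown.
  pub struct Session {
      pub last_active: Instant,
  }

  impl Session {
      /// True once the session has been idle for at least `timeout`.
      pub fn is_expired(&self, now: Instant, timeout: Duration) -> bool {
          now.saturating_duration_since(self.last_active) >= timeout
      }

      /// Records activity. Returns `false` and leaves the session untouched
      /// if it had already expired, so the caller can force a new login.
      pub fn touch(&mut self, now: Instant, timeout: Duration) -> bool {
          if self.is_expired(now, timeout) {
              return false;
          }
          self.last_active = now;
          true
      }
  }
  ```

  `Instant` does not survive a process restart, so sessions that are persisted to disk would need a wall-clock timestamp instead.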

- [ ] **Add "active sessions" management**: Show all active sessions in the Settings UI with last-active time and IP. Allow users to terminate individual sessions or "log out everywhere."

- [ ] **Require re-authentication for sensitive operations**: Password change, 2FA setup/disable, and recovery code regeneration should require entering the current password, even if already logged in.

- [ ] **Implement audit logging**: Log all security-relevant events (login, logout, failed login, password change, 2FA change, app install/uninstall) to a dedicated audit log file with timestamps and source IPs.

- [ ] **Verify Phase 13**: Session timeout works, active sessions visible, re-auth required for sensitive ops, audit log populated.

---

## Phase 14: ISO Build Hardening

> **Layman version**: The ISO installer creates the initial system. We harden it so that a freshly
> installed Archipelago is secure out of the box — no manual hardening needed.

- [ ] **Force password change on first boot**: The installer should require setting a unique admin password. No default passwords should work after first boot.

- [ ] **Enable automatic security updates for the OS**: Configure unattended-upgrades for Debian security patches only (not full upgrades).

- [ ] **Harden SSH configuration**: In the installed system's sshd_config:
  1. Disable password authentication (key-only).
  2. Disable root login.
  3. Use ed25519 host keys only.
  Note: This is for the PRODUCTION installed system, not the dev server.

- [ ] **Configure firewall (UFW)**: Enable UFW on first boot with:
  - Allow: 80 (HTTP), 443 (HTTPS), 8333 (Bitcoin P2P), 9735 (Lightning P2P)
  - Allow: Podman container networking (forward policy ACCEPT)
  - Deny: everything else by default

- [ ] **Pin all container images in first-boot script by SHA256 digest**: Replace any remaining `:latest` or version-only tags with `image@sha256:...` digests. Document how to update digests when new versions are released.

- [ ] **Verify Phase 14**: Flash a test ISO, boot it, verify all hardening is active, verify apps work.

---

## Phase 15: Penetration Test Round 1

> **Layman version**: We've fixed everything we know about. Now we try to break in ourselves to find
> what we missed. This is a structured attempt to attack the system from different angles.

- [ ] **Network-level testing**: From another machine on the LAN:
  1. Port scan: `nmap -sV 192.168.1.228` — only expected ports should be open.
  2. Try accessing Bitcoin RPC directly: `curl http://192.168.1.228:8332` — should fail.
  3. Try accessing container ports that shouldn't be exposed.
  4. Test rate limiting: spam the login endpoint.

- [ ] **Web application testing**:
  1. Test for XSS: inject `<script>alert(1)</script>` in every input field.
  2. Test for CSRF: craft cross-origin POST to `/rpc/v1` from a different origin — should fail.
  3. Test for open redirect: `?redirect=https://evil.com` — should not redirect externally.
  4. Test for path traversal: `../../etc/passwd` in app IDs, file paths.
  5. Check CSP: browser console should show no violations during normal use.
  6. Check cookies: all session cookies should have the `Secure`, `HttpOnly`, and `SameSite` attributes.

- [ ] **Authentication testing**:
  1. Brute force login: 100 rapid login attempts — should be rate limited.
  2. Session reuse after logout: use an old session token after logout; it should be rejected. Also confirm that login issues a fresh session ID (session fixation).
  3. TOTP bypass: try using old TOTP codes — should fail (replay protection).
  4. Remember-me token: should not work after password change.

- [ ] **Container escape testing**:
  1. Verify all containers run as non-root: `sudo podman inspect --format '{{.Config.User}}' $(sudo podman ps -q)`.
  2. Verify read-only root: `sudo podman exec {container} touch /test-file` — should fail.
  3. Verify no capabilities beyond required: `sudo podman inspect --format '{{.HostConfig.CapDrop}} {{.HostConfig.CapAdd}}' $(sudo podman ps -q)`.

- [ ] **Document all findings**: Create a test report with pass/fail for each test. Fix any failures found.

---

## Phase 16: Documentation & User Guides

> **Layman version**: The best security in the world is useless if users can't set it up correctly.
> We write clear guides so anyone can install, configure, and maintain their node securely.

- [ ] **Write installation guide**: Step-by-step guide from downloading the ISO to first login.

- [ ] **Write security best practices guide**: How to keep your node secure — password strength, 2FA setup, backup procedures, network security.

- [ ] **Write app integration guide**: How each app connects to Bitcoin/Lightning, what data it stores, how to back it up.

- [ ] **Write recovery guide**: What to do if you lose your password, how to restore from backup, how to migrate to new hardware.

- [ ] **Verify Phase 16**: Have someone unfamiliar with the project follow the guides and report any confusion.

---

## ============================================================
## YEAR 2 — QUARTERS 1-2: POLISH, SCALE, AND ADVANCED FEATURES
## ============================================================

---

## Phase 17: Reproducible Builds

> **Layman version**: Users should be able to verify that the binary they're running was built from
> the exact source code they can read. This prevents supply chain attacks — nobody can sneak in
> malicious code without it being visible in the source.

- [ ] **Containerized build environment**: Create a Dockerfile that builds the Rust backend and Vue frontend in a deterministic environment (pinned Rust version, pinned Node version, pinned system libraries).

- [ ] **Publish build checksums**: After each release build, publish SHA256 checksums of all artifacts (backend binary, frontend bundle, ISO image).

- [ ] **Document verification process**: Write instructions for users to verify their installed binary matches the published checksum.

- [ ] **Verify Phase 17**: Build the same commit twice in the containerized environment — checksums should match.

---

## Phase 18: Mobile Companion & Remote Access

> **Layman version**: Umbrel has a mobile app. Start9 uses Tor .onion addresses for remote access.
> We need at least one of these so users can check on their node from their phone.

- [ ] **Implement Tor hidden service for web UI**: The web UI should be accessible via a .onion address from Tor Browser on any device, anywhere in the world, without port forwarding.

- [ ] **Optimize web UI for mobile**: Make the Vue UI responsive for phone-sized screens. Test on iOS Safari and Android Chrome.

- [ ] **Add PWA support**: Make the web UI installable as a Progressive Web App on mobile devices.

- [ ] **Verify Phase 18**: Access the node via Tor Browser on a phone. Install as PWA. All core features work on mobile.

---

## Phase 19: CoinJoin Integration

> **Layman version**: RaspiBlitz has JoinMarket, RoninDojo had Whirlpool. CoinJoin is essential for
> Bitcoin privacy — it mixes your coins with others so transactions can't be traced back to you.

- [ ] **Integrate JoinMarket/JAM**: Add JoinMarket as a containerized app with the JAM web UI. Auto-connect to the local Bitcoin Core instance.

- [ ] **Add CoinJoin guide**: Document how to use JoinMarket for privacy, including maker/taker roles and fee settings.

- [ ] **Verify Phase 19**: JoinMarket starts, connects to Bitcoin Core, JAM UI accessible, can create a test CoinJoin (testnet or small amount).

---

## Phase 20: Advanced Mesh Features

> **Layman version**: The mesh networking is already unique. Now we polish it — make it more reliable,
> add peer reputation (trust peers who send valid data), and improve the steganography to resist
> more sophisticated analysis.

- [ ] **Implement peer reputation system**: Track which peers send valid vs invalid data. Peers that consistently send valid block headers get higher trust scores. Peers that send invalid data get deprioritized.
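
  One simple way to turn valid/invalid counts into a trust score is a smoothed ratio, sketched below. The scoring formula and the names `PeerStats` and `Reputation` are assumptions made for illustration; the plan does not prescribe a particular model.

  ```rust
  use std::collections::HashMap;

  /// Counts of valid and invalid messages seen from one peer.
  #[derive(Debug, Default, Clone, Copy)]
  pub struct PeerStats {
      pub valid: u32,
      pub invalid: u32,
  }

  impl PeerStats {
      /// Smoothed fraction of valid messages; a peer with no history scores 0.5.
      pub fn score(&self) -> f64 {
          let valid = f64::from(self.valid);
          let invalid = f64::from(self.invalid);
          (valid + 1.0) / (valid + invalid + 2.0)
      }
  }

  #[derive(Default)]
  pub struct Reputation {
      peers: HashMap<String, PeerStats>,
  }

  impl Reputation {
      /// Records the outcome of validating one message from `peer`.
      pub fn record(&mut self, peer: &str, was_valid: bool) {
          let stats = self.peers.entry(peer.to_string()).or_default();
          if was_valid {
              stats.valid = stats.valid.saturating_add(1);
          } else {
              stats.invalid = stats.invalid.saturating_add(1);
          }
      }

      pub fn score(&self, peer: &str) -> f64 {
          self.peers.get(peer).copied().unwrap_or_default().score()
      }

      /// Known peer IDs ordered from most to least trusted (ties by ID).
      pub fn ranked(&self) -> Vec<&str> {
          let mut ids: Vec<&str> = self.peers.keys().map(String::as_str).collect();
          ids.sort_by(|a, b| {
              self.score(b).total_cmp(&self.score(a)).then_with(|| a.cmp(b))
          });
          ids
      }
  }
  ```

  The smoothing keeps a single early mistake from pinning a new peer at zero, while a peer that keeps sending invalid data still sinks to the bottom of `ranked()`.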

- [ ] **Improve steganography resistance**: Add timing jitter to mesh transmissions so traffic patterns don't reveal communication. Vary message sizes to resist traffic analysis.

- [ ] **Add mesh health dashboard**: Show mesh network status, connected peers, message latency, relay statistics in the web UI.

- [ ] **Verify Phase 20**: Mesh connects, messages relay, peer reputation tracks correctly, steganography modes work.

---

## ============================================================
## YEAR 2 — QUARTERS 3-4: FINAL HARDENING & v1.0
## ============================================================

---

## Phase 21: Penetration Test Round 2

> **Layman version**: We did this in Phase 15 with the early fixes. Now we repeat it with the full
> production system including all new features. This is the final check before v1.0.

- [ ] **Repeat all Phase 15 tests**: Network, web, auth, container — every test from Phase 15.

- [ ] **Test new features**: Tor access, backup/restore, updates, CoinJoin, mesh.

- [ ] **Test adversarial mesh scenarios**:
  1. Rogue peer sending fake identities — should be rejected (Phase 4 fix).
  2. Rogue peer sending invalid Bitcoin data — should be filtered (Phase 4 fix).
  3. Rogue peer sending fake emergency alerts — should be rejected (Phase 8 fix).
  4. Replay attack on mesh messages — sequence numbers should detect.
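
  For test 4, the property being checked is that a sequence number is never accepted twice from the same peer. A minimal sketch of such a check, with hypothetical names and a strictly-increasing rule that the real mesh code may relax:

  ```rust
  use std::collections::HashMap;

  /// Tracks the highest sequence number accepted from each peer.
  #[derive(Default)]
  pub struct ReplayGuard {
      last_seen: HashMap<String, u64>,
  }

  impl ReplayGuard {
      /// Accepts a message only if its sequence number is strictly greater
      /// than the last one accepted from the same peer.
      pub fn accept(&mut self, peer: &str, seq: u64) -> bool {
          let is_replay = self
              .last_seen
              .get(peer)
              .map_or(false, |&last| seq <= last);
          if is_replay {
              return false;
          }
          self.last_seen.insert(peer.to_string(), seq);
          true
      }
  }
  ```

  A strictly-increasing rule also rejects messages that merely arrive out of order, which can happen on a lossy radio link; a sliding window of recently seen numbers is the usual way to tolerate that.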

- [ ] **Test disaster recovery**:
  1. Kill the server during a backup — verify partial backups are handled safely.
  2. Kill the server during an update — verify rollback works.
  3. Kill the server during a ratchet state write, then verify that the state file on disk is either the old or the new version, never a partial one (Phase 8 fix).
  4. Lose the admin password — verify recovery codes work (Phase 7 fix).

- [ ] **Document all findings and fix any issues**.

---

## Phase 22: Dependency Audit & Supply Chain

> **Layman version**: Our code might be secure, but if a library we depend on has a vulnerability,
> we're still exposed. We audit every dependency.

- [ ] **Run cargo audit**: `cd core && cargo install cargo-audit && cargo audit`. Fix or document all advisories.

- [ ] **Run npm audit**: `cd neode-ui && npm audit`. Fix all critical and high severity issues.

- [ ] **Review transitive dependencies**: For each direct dependency, check its dependency tree for abandoned or suspicious packages.

- [ ] **Commit and enforce Cargo.lock and package-lock.json**: Ensure these lock files are committed and used in all builds (for example `cargo build --locked` and `npm ci`).

- [ ] **Set up automated dependency monitoring**: Configure Dependabot or similar for automated security alerts on dependency vulnerabilities.

- [ ] **Verify Phase 22**: Zero critical/high advisories in both `cargo audit` and `npm audit`.

---

## Phase 23: Performance & Reliability Under Load

> **Layman version**: Security under normal use is one thing. Security under stress (many users,
> large blockchain, limited resources) is another. We test that the system remains stable and secure
> when pushed to its limits.

- [ ] **Stress test RPC endpoints**: Send 1000 concurrent RPC requests — verify rate limiting works and the server doesn't crash.

- [ ] **Test with full blockchain**: Verify the system handles a 600GB+ blockchain without running out of disk space, memory, or CPU.

- [ ] **Test mesh under high message volume**: Send 100 messages per minute through the mesh — verify encryption/decryption keeps up and memory doesn't leak.

- [ ] **Test container resource limits**: Start all apps simultaneously — verify memory and CPU limits prevent any single app from starving others.

- [ ] **Monitor for memory leaks**: Run the backend for 7 days continuously. Monitor RSS memory — should be stable, not growing.

- [ ] **Verify Phase 23**: System stable after 7 days of continuous operation with all apps running.

---

## Phase 24: Final Review & v1.0 Release

> **Layman version**: Everything is fixed, tested, hardened, and tested again. This is the final
> review before declaring the system production-ready.

- [ ] **Full code review**: Review every module one more time:
  1. `core/security/` — secrets manager, image verifier, AppArmor
  2. `core/archipelago/src/api/` — all RPC endpoints
  3. `core/archipelago/src/mesh/` — all mesh code
  4. `core/container/` — Podman client
  5. `neode-ui/src/api/` — RPC client, WebSocket, container client
  6. `neode-ui/src/views/` — all views
  7. `image-recipe/configs/` — nginx, systemd
  8. `scripts/` — first-boot, deploy

- [ ] **Verify all Phase checks pass**: Go through every "Verify Phase N" checklist from Phases 1-23. Every check must pass.

- [ ] **Compare against competitors one final time**: Re-evaluate the competitive comparison table. Document where Archipelago stands on every dimension.

- [ ] **Create security advisory process**: Document how security vulnerabilities should be reported, triaged, and disclosed. Create a SECURITY.md in the repository.

- [ ] **Tag v1.0 release**: Create the release with full changelog, checksums, and documentation.

- [ ] **Build and publish v1.0 ISO**: Final ISO build with all hardening active.

---

## ============================================================
## APPENDIX A: COMPETITIVE COMPARISON (Reference)
## ============================================================

> This section is informational — it explains WHERE Archipelago stands versus competitors so each
> phase's priorities are clear.

### Architecture Comparison

**Archipelago**
- Language: Rust + Vue 3 + TypeScript
- Containers: Podman (rootless)
- OS: Debian 12
- Status: Pre-production (2024)

**Umbrel**
- Language: TypeScript + Node.js + React
- Containers: Docker (root daemon)
- OS: Custom Debian
- Status: Production (since 2020, 10.8k GitHub stars)

**Start9 (StartOS)**
- Language: Rust + TypeScript
- Containers: Docker
- OS: Custom Linux
- Status: Production (since 2020, 1.6k GitHub stars)

**RaspiBlitz**
- Language: Python + Bash
- Containers: None (bare metal systemd)
- OS: Raspberry Pi OS
- Status: Production (since 2018, 2.6k GitHub stars, 207 contributors)

**myNode**
- Language: Python + Bash
- Containers: Docker (partial)
- OS: Debian
- Status: Production (since 2019, 730 GitHub stars)

**Nodl**
- Language: Unknown (proprietary)
- Containers: Unknown
- OS: Custom Linux
- Status: Production (since 2018, hardware-only)

**nix-bitcoin**
- Language: Nix + Shell
- Containers: None (systemd services)
- OS: NixOS
- Status: Production (since 2018, 600 GitHub stars)

**RoninDojo**
- Language: Bash
- Containers: Docker
- OS: Debian 12
- Status: Uncertain (Samourai arrest impact, since 2019)

**Citadel**
- Language: TypeScript (Umbrel fork)
- Containers: Docker
- OS: Pi OS
- Status: Abandoned (since 2022, 137 GitHub stars)

---

### Security Comparison

**Archipelago** — Rootless containers, AES-256-GCM secrets, TOTP 2FA, Signal protocol mesh.
Needs: systemd hardening (Phase 2), credential rotation (Phase 1).

**Umbrel** — Root Docker, plaintext secrets, no 2FA, no LAN encryption.
Known critical vuln: default passwords allowed fund theft.
License: PolyForm NC (NOT open source).

**Start9** — Docker containers, encrypted backups, self-signed CA for LAN HTTPS, Tor default.
Strongest incumbent security posture among GUI-based platforms.

**RaspiBlitz** — No containers (bare metal), separate bitcoin user, fully transparent.
No sandboxing, and the Bash scripts are fragile.

**myNode** — Mixed Docker/systemd, basic security, Tor optional.
License: CC BY-NC-ND (restrictive: non-commercial, no derivatives).

**Nodl** — Full disk encryption, physical kill switch, RAID redundancy.
Best hardware security. Software details not public.

**nix-bitcoin** — BEST SECURITY overall. Hardened kernel, seccomp-bpf, namespace isolation,
systemd sandboxing, reproducible builds, security bounty fund. No GUI (CLI only).

**RoninDojo** — Privacy-first (Whirlpool CoinJoin), Tor default.
Future uncertain due to Samourai legal situation.

---

### Features Unique to Archipelago

1. Mesh networking (LoRa/RF peer-to-peer)
2. Off-grid Bitcoin relay (TX + block headers over radio)
3. Signal Protocol encrypted P2P (X3DH + Double Ratchet)
4. Steganography (data as weather/sensor readings)
5. Dead man's switch (automated emergency alerts)
6. Rootless containers (Podman — no root daemon)
7. TOTP 2FA on web UI
8. Encrypted secrets manager (AES-256-GCM at rest)

### Features Archipelago Needs to Add

1. Tor-by-default (Phase 9) — Start9, nix-bitcoin, RoninDojo have this
2. Encrypted backups (Phase 10) — Start9 has this
3. Automated updates (Phase 11) — Umbrel, Start9, Nodl have this
4. Larger app ecosystem (Phase 12) — Umbrel has 300+
5. Systemd hardening (Phase 2) — nix-bitcoin has this
6. CoinJoin (Phase 19) — RaspiBlitz, RoninDojo have this
7. Mobile access (Phase 18) — Umbrel, Start9 have this
8. Reproducible builds (Phase 17) — nix-bitcoin has this

---

## ============================================================
## APPENDIX B: DEV ENVIRONMENT (OUT OF SCOPE)
## ============================================================

> These items are INTENTIONAL development tooling. They exist for convenience on a private home LAN.
> They are NOT production security issues. DO NOT CHANGE THEM.

1. **SSH keys and passwords in deploy scripts** — Used to deploy from Mac to dev server over home LAN.
   `StrictHostKeyChecking=no` is acceptable for a known server on a trusted network.

2. **`password123` default in dev mode** — Only active when `config.dev_mode` is true. Not compiled
   into production builds. Used for rapid development iteration.

3. **Test script passwords** — Test scripts (`test-security.sh`, `test-app-install.sh`) use known
   passwords for automated testing against dev servers.

4. **SSH credentials in CLAUDE.md** — Development convenience for AI-assisted deployment. The dev
   server is behind a home router with no port forwarding.

5. **Deploy script SSH config** — `scripts/deploy-config.sh` stores dev server access credentials.
   Gitignored. Not part of the production system.

6. **Mock backend** (`neode-ui/mock-backend.js`) — Dev-only Node.js server for frontend development.
   Never deployed to production. Uses `password123` for testing.

These are all standard development practices for a pre-production project on a private network.
The production system (what gets installed via ISO) does not use any of these credentials.
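
The claim that the production system uses none of these credentials can be checked mechanically without touching the dev tooling itself. The following is a minimal sketch, not part of the existing scripts: the function name `scan_for_dev_credentials`, the `DEV_ONLY_PATTERNS` list, and the idea of pointing it at an unpacked production image tree are all assumptions made for illustration. It only reports files containing the dev-only strings named in this appendix and changes nothing.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): report dev-only strings found in a directory tree.
# Intended to be pointed at production artifacts, never at the dev tooling,
# which is intentionally out of scope.

# Dev-only strings named in Appendix B. Extend this list as needed.
DEV_ONLY_PATTERNS=("password123" "StrictHostKeyChecking=no")

# scan_for_dev_credentials DIR
# Prints every file under DIR that contains a dev-only string.
# Returns 0 if the tree is clean, 1 if at least one match was found.
scan_for_dev_credentials() {
    local dir="$1" found=0 pattern matches
    for pattern in "${DEV_ONLY_PATTERNS[@]}"; do
        # -r: recurse, -l: list file names only, -F: fixed string (no regex)
        matches="$(grep -rlF -- "$pattern" "$dir" 2>/dev/null || true)"
        if [ -n "$matches" ]; then
            printf 'dev-only string "%s" found in:\n%s\n' "$pattern" "$matches"
            found=1
        fi
    done
    return "$found"
}
```

A build or release step could call `scan_for_dev_credentials "$IMAGE_ROOT" || exit 1`, where `IMAGE_ROOT` is a placeholder for wherever the unpacked production filesystem lives. Because the check is a plain string search, it will not see credentials inside compressed or encrypted archives.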