fix: correct PhotoPrism icon filename typo in backend metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dorian 2026-03-15 04:01:12 +00:00
parent d52ebbb7a6
commit a42e922000
2 changed files with 141 additions and 369 deletions


@@ -402,7 +402,7 @@ fn get_app_metadata(app_id: &str) -> AppMetadata {
         "photoprism" => AppMetadata {
             title: "PhotoPrism".to_string(),
             description: "AI-powered photo management".to_string(),
-            icon: "/assets/img/app-icons/photoprims.svg".to_string(),
+            icon: "/assets/img/app-icons/photoprism.svg".to_string(),
             repo: "https://github.com/photoprism/photoprism".to_string(),
             tier: "",
         },


@@ -1,473 +1,245 @@
# Archipelago 5-Year Production Hardening Plan # Overnight Plan — Archy Refactoring & App Integration Hardening
**Version**: 2.0 > Make the Archy codebase rock-solid: fix all broken containers/iframes, perfect app installation/management/icons, get IndeedHub + Nostr signer flawless, and begin critical refactoring. No new features, no design changes. Bitcoin only.
**Period**: March 2026 -- March 2031 > See `docs/refactoring-plan.md` for the full 3-year plan. See `CLAUDE.md` for all project rules and conventions.
**Goal**: Production-ready Bitcoin Node OS at 10,000 users with zero failures, 100% uptime, full inter-node federation > Deploy after every change: `./scripts/deploy-to-target.sh --live` — test at http://192.168.1.228
**Visual constraint**: NEVER change animations, user experience, or flow -- only clean up duplications, information hierarchy, and cosmetic issues
**Web5 additions**: did:dht, DWN protocol definitions for interoperable schemas, Verifiable Credentials (per TBD assessment)
**Primary test node**: `192.168.1.228` (Arch 1) — 4-core i3-8100T, 16GB RAM, 1.8TB NVMe
**Secondary test node**: `192.168.1.198` (Arch 2) — 8GB RAM, 457GB disk
**SSH**: `ssh -i ~/.ssh/archipelago-deploy archipelago@{IP}`
**Deploy**: `./scripts/deploy-to-target.sh --both`
--- ---
## SECURITY RULE: No Tor Address Publishing to Nostr Relays (2026-03-13) ## Phase 1: Fix App Icon Consistency
**NEVER publish .onion addresses to public Nostr relays.** This was removed on 2026-03-13 because broadcasting Tor addresses to public relays defeats the purpose of Tor's privacy. All `publish_node_identity` calls have been removed from: - [x] **Fix PhotoPrism icon typo in backend metadata**: In `core/archipelago/src/container/docker_packages.rs`, the `get_app_metadata()` function references `photoprims.svg` (missing 'h') for the PhotoPrism icon. Search for `photoprims` and replace with `photoprism`. Verify the icon file exists at `neode-ui/public/assets/img/app-icons/photoprism.svg`. Run `cargo clippy --all-targets --all-features` in `core/` on the dev server after the fix.
- `tor.rs` — address rotation no longer publishes to relays
- `node.rs` — `node.nostr-publish` RPC now returns an error
- `network.rs` — visibility changes no longer publish to relays
Nodes connect via **federation ID** (DID), not public Nostr discovery. Federation peer notification (private peer-to-peer) is still allowed. - [ ] **Fix IndeedHub duplicate icon — consolidate to indeedhub.png**: Two icon files exist: `neode-ui/public/assets/img/app-icons/indeedhub.ico` and `indeehub.ico` (typo). Delete `indeehub.ico`. Convert `indeedhub.ico` to `indeedhub.png` (better format consistency). Update all references: (1) `neode-ui/src/utils/dummyApps.ts` line ~518 — change `indeehub.ico` to `indeedhub.png`, (2) `neode-ui/src/views/Marketplace.vue` line ~913 — change `indeehub.ico` to `indeedhub.png`, (3) `core/archipelago/src/container/docker_packages.rs` lines ~451-454 — change `indeehub.ico` to `indeedhub.png`. Search the entire codebase for `indeehub` (missing 'd') and fix all occurrences to `indeedhub`. Run `cd neode-ui && npm run type-check` to verify.
Tor rotation now **immediately destroys** the old address (no transition period). Old keys are deleted, not renamed. - [ ] **Audit all app icons match their references**: Cross-check every icon path referenced in `docker_packages.rs` `get_app_metadata()` against actual files in `neode-ui/public/assets/img/app-icons/`. Verify each app in the `Marketplace.vue` `getCuratedAppList()` function has an icon that exists. If any icon is missing, check if a similar-named file exists (e.g., wrong extension). Fix all mismatches. Remove orphaned icons that no app references (e.g., `atob.png`, `community-store.png`, `k484.png`, `lorabell.png`, `morphos.png` — verify they're truly unused first). Standardize: prefer `.png` or `.svg` over `.ico` and `.webp` where possible without changing existing working icons.
All Tor addresses on .228 and .198 were rotated on 2026-03-13 to invalidate any previously published addresses.
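
The Phase 1 icon audit amounts to a set comparison between the icon paths referenced in metadata and the files actually present in the icon directory. A minimal Rust sketch under that assumption is shown here. `IconAudit`, `audit_icons`, and `file_name` are hypothetical names, not part of the Archy codebase, and the caller is assumed to have collected both lists already.

```rust
use std::collections::BTreeSet;

/// Result of comparing referenced icon paths against files on disk.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct IconAudit {
    /// Referenced by app metadata but not present in the icon directory.
    pub missing: Vec<String>,
    /// Present in the icon directory but referenced by no app.
    pub orphaned: Vec<String>,
}

/// Keep only the part after the last '/', so that
/// "/assets/img/app-icons/photoprism.svg" becomes "photoprism.svg".
fn file_name(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}

/// Compare referenced icons with the files found on disk.
/// Both result lists come back sorted because BTreeSet iterates in order.
pub fn audit_icons(referenced: &[&str], on_disk: &[&str]) -> IconAudit {
    let wanted: BTreeSet<&str> = referenced.iter().copied().map(file_name).collect();
    let present: BTreeSet<&str> = on_disk.iter().copied().map(file_name).collect();
    IconAudit {
        missing: wanted.difference(&present).map(|s| s.to_string()).collect(),
        orphaned: present.difference(&wanted).map(|s| s.to_string()).collect(),
    }
}
```

With the pre-fix metadata, `photoprims.svg` would appear under `missing` and `photoprism.svg` under `orphaned`. Anything reported as orphaned still needs the manual "verify they're truly unused" check from the plan before it is deleted.
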
--- ---
## Critical Findings from Investigation (2026-03-13) ## Phase 2: Fix Container Crash Loops & Health
### Server .228 Issues - [ ] **Diagnose and fix container networking DNS failures**: SSH to 192.168.1.228 (`sshpass -p 'EwPDR8q45l0Upx@' ssh -o StrictHostKeyChecking=no archipelago@192.168.1.228`). Run `sudo podman ps -a --format '{{.Names}} {{.Status}}' | grep -i restart` to identify containers in crash loops. The known issue is DNS resolution failures — containers can't resolve each other by name (e.g., mempool-web can't find mempool-api). Check if the `archy-net` Podman network exists: `sudo podman network ls`. If missing, create it: `sudo podman network create archy-net`. Reconnect all containers that need inter-container DNS to this network. Verify with `sudo podman exec archy-mempool-web ping mempool-api`. Restart affected containers and monitor for 2 minutes to confirm no more crash loops.
- **6 containers in crash loops**: archy-nbxplorer (3,535 restarts), archy-mempool-web (2,041), mempool-api (906), btcpay-server (888), mempool-electrs (529), immich_server (439)
- **Root cause**: Container networking DNS failures — mempool-web can't resolve "mempool-api" upstream, nbxplorer can't connect to Postgres
- **Load average 5.44 on 4 cores** — entirely caused by crash/restart cycles consuming CPU
- **ollama in Created state** — never started, consuming a container slot
- **Podman rootless warning**: "/" is not a shared mount
### Server .198 Issues - [ ] **Fix .198 server swap and memory**: SSH to 192.168.1.198. Check current swap: `free -h`. If no swap configured, create a 4GB swap file: `sudo fallocate -l 4G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile`. Add to `/etc/fstab`: `/swapfile none swap sw 0 0`. Verify with `free -h`. This prevents OOM kills that crash containers.
- **No federation configured** — /var/lib/archipelago/federation/ is empty
- **Tor container outdated** (v0.4.6.10) — warns "missing protocols: FlowCtrl=2 Relay=4", will eventually stop working
- **Tor failing every 5 minutes**: "No more HSDir available to query" — can't resolve .onion addresses
- **Memory critically low**: 147MB free of 8GB, NO SWAP configured
- **Nostr identity revoked** — nostr_revoked file exists but empty
- **Containers run under root** — rootless podman shows nothing, sudo podman shows 35 containers
### Cross-Node Issues - [ ] **Stop and remove ollama container if not needed**: SSH to 192.168.1.228. Check ollama status: `sudo podman ps -a | grep ollama`. If it's in "Created" state and never started, remove it: `sudo podman rm ollama`. This frees a container slot and removes clutter from the app list. If the user has ollama in their installed apps, leave it but start it: `sudo podman start ollama`.
- .228 → .198 HTTP health: OK (basic connectivity works)
- .198 → .228 HTTP health: OK - [ ] **Verify all core Bitcoin containers are healthy**: SSH to 192.168.1.228. Check these containers are running and healthy: `bitcoin-knots`, `lnd`, `mempool-api`, `archy-mempool-web`, `mempool-electrs`, `btcpay-server`, `archy-nbxplorer`. Run `sudo podman ps --format '{{.Names}}\t{{.Status}}' | grep -E "(bitcoin|lnd|mempool|btcpay|nbxplorer|electrs)"`. For any that are not "Up", check logs: `sudo podman logs --tail 50 {container-name}`. Fix the root cause (usually missing network, wrong env var, or dependency not ready). After fixes, run `curl -s http://localhost:5678/health` to verify the Archy backend sees them all.
- .198 has ZERO federation peers — no nodes.json, never joined federation
- Tor-based federation impossible from .198 — Tor can't resolve hidden services
- No swap on either server — OOM kills likely under load
- ping not installed on .228 (missing iputils-ping)
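
The core-container check in Phase 2 can be expressed as a small parser over the `{{.Names}}\t{{.Status}}` output. The sketch below assumes that a healthy container's status begins with "Up", as the plan states. The function name `unhealthy_core` is hypothetical, and the sample status strings in the usage note are illustrative only, not captured output.

```rust
/// Core container names listed in the plan.
pub const CORE_CONTAINERS: [&str; 7] = [
    "bitcoin-knots",
    "lnd",
    "mempool-api",
    "archy-mempool-web",
    "mempool-electrs",
    "btcpay-server",
    "archy-nbxplorer",
];

/// Parse `name<TAB>status` lines and return every core container that is
/// either absent from the output or whose status does not start with "Up".
pub fn unhealthy_core(ps_output: &str) -> Vec<&'static str> {
    CORE_CONTAINERS
        .iter()
        .copied()
        .filter(|name| {
            // A container counts as healthy only if some line names it
            // exactly and reports a status beginning with "Up".
            !ps_output.lines().any(|line| {
                let mut parts = line.splitn(2, '\t');
                parts.next() == Some(*name)
                    && parts
                        .next()
                        .map_or(false, |status| status.trim_start().starts_with("Up"))
            })
        })
        .collect()
}
```

Given output in which only `bitcoin-knots` and `mempool-api` report an "Up" status, the function returns the other five names, including any that are missing from the listing entirely.
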
--- ---
## User Stories & Acceptance Tests ## Phase 3: Fix Iframe Embedding for All Apps
Every test must pass **10 consecutive times** from BOTH .228→.198 AND .198→.228 directions. - [ ] **Audit X-Frame-Options headers for all proxied apps**: SSH to 192.168.1.228. For each app with a known port, check the actual response headers: `for port in 81 3000 3001 4080 7777 8080 8081 8082 8083 8085 8096 8123 8175 8176 8190 8240 8334 8888 9000 9001 9980 11434 2283 2342 23000 50002; do echo "Port $port:"; curl -sI http://localhost:$port/ 2>/dev/null | grep -i "x-frame\|content-security-policy" || echo " (no frame restrictions)"; done`. Record the results. Compare against the blocking list in `neode-ui/src/stores/appLauncher.ts` (lines 23-31, the `XFRAME_BLOCKED_PORTS` array). Update the blocking list to match reality — if an app no longer sends X-Frame-Options DENY, remove it from the blocked list. If an app sends it but isn't in the list, add it.
### US-01: System Health - [ ] **Ensure nginx strips X-Frame-Options for iframe-compatible apps**: In `image-recipe/configs/nginx-archipelago.conf`, verify every `/app/{id}/` location block includes `proxy_hide_header X-Frame-Options;` for apps that should work in iframes. Apps that genuinely can't work in iframes (BTCPay with DENY, Home Assistant with SAMEORIGIN that rejects proxy origin) should open in new tabs. For apps like Grafana (port 3000) — check if setting the env var `GF_SECURITY_ALLOW_EMBEDDING=true` on the Grafana container fixes it, then remove it from the blocked list. For Nextcloud (port 8085) — check if the nginx `sub_filter` approach or Nextcloud's `overwriteprotocol` setting allows embedding. For Uptime Kuma (port 3001) — it may work with the header stripped. Test each by loading `http://192.168.1.228/app/{id}/` in a browser iframe or `curl -sI http://192.168.1.228/app/{id}/ | grep -i frame`.
> As a node operator, I want my server to boot cleanly with all services running, zero crashed containers, and stable resource usage, so I never have to manually intervene.
### US-02: Container Lifecycle - [ ] **Fix nginx sub_filter for apps with root-relative asset paths**: Apps served under `/app/{id}/` may have root-relative paths like `/static/main.js` that break because they resolve to the Archy root, not the app root. In `image-recipe/configs/nginx-archipelago.conf`, check IndeedHub's location block (lines 334-367) — it already uses `sub_filter` to rewrite paths. Verify the same pattern exists for other Next.js/React apps that need it (Penpot on 9001, Immich on 2283, Fedimint UI on 8175). For each, test: load the app at `http://192.168.1.228/app/{id}/`, open browser dev tools Network tab, check for 404s on static assets. If assets 404, add appropriate `sub_filter` rules to their nginx location block. After changes, sync the config: `scp image-recipe/configs/nginx-archipelago.conf archipelago@192.168.1.228:/tmp/ && ssh archipelago@192.168.1.228 'sudo cp /tmp/nginx-archipelago.conf /etc/nginx/sites-available/archipelago && sudo nginx -t && sudo systemctl reload nginx'`.
> As a node operator, I want every installed app to start, run, survive reboots, and recover from crashes automatically, so my services are always available.
### US-03: Federation Join - [ ] **Deploy and verify iframe loading for all apps**: Deploy with `./scripts/deploy-to-target.sh --live`. After deploy, test each app iframe by hitting the Archy UI at `http://192.168.1.228`, navigating to Apps, and clicking each installed app. Verify: (1) iframe apps load content (not blank white), (2) blocked apps open in new tab cleanly, (3) no mixed-content warnings in console. Log any remaining issues for the next phase.
> As a node operator, I want to invite another node to my federation using an invite code, so we can share status and deploy apps to each other.
### US-04: Federation Sync
> As a node operator, I want to see all my federated peers' status (online/offline, apps, resources) updated every 5 minutes, so I know my network health.
### US-05: Tor Hidden Services
> As a node operator, I want each app to have a .onion address that works reliably, so my services are accessible over Tor without exposing my IP.
### US-07: File Sharing
> As a node operator, I want to share files with federated peers over Tor with access controls (free, peers-only, paid), so I can selectively distribute content.
### US-08: DWN Sync
> As a node operator, I want DWN messages and protocols to replicate bidirectionally between my federated nodes over Tor, so my decentralized data is available everywhere.
### US-09: NIP-07 Signing
> As a node operator, I want iframe apps to use window.nostr to sign events with my node's Nostr key (with consent), so I can use Nostr apps with my sovereign identity.
### US-10: Backup/Restore
> As a node operator, I want to create encrypted backups and restore them on a fresh install, so I never lose my data or identity.
### US-11: Dashboard Monitoring
> As a node operator, I want real-time CPU, RAM, disk, and container health displayed on my dashboard, so I can spot problems before they escalate.
### US-12: Auto-Updates
> As a node operator, I want my node to check for updates, download them with integrity verification, and apply them with rollback capability.
### US-13: Identity & Credentials
> As a node operator, I want W3C DID Documents and Verifiable Credentials that work with did:dht for discoverable DIDs and proper VCs for proving identity claims between nodes.
### US-14: Web UI Navigation
> As a node operator, I want every page in the UI to load correctly, show real data (not hardcoded), and navigate without broken links or dead buttons.
### US-15: Boot Recovery
> As a node operator, I want all containers to automatically restart after any reboot, crash, or power loss, with zero manual intervention required.
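
The Phase 3 header audit boils down to two steps: decide per port whether the response headers restrict framing, then reconcile that with the `XFRAME_BLOCKED_PORTS` list. The Rust sketch below illustrates that logic only. `restricts_framing`, `reconcile`, and `ListChanges` are hypothetical names, and the rule used (an `X-Frame-Options` value of `DENY` or `SAMEORIGIN`, or a CSP containing `frame-ancestors`) is a simplifying assumption rather than a full implementation of browser behaviour.

```rust
use std::collections::BTreeSet;

/// True when the given response headers would likely stop iframe embedding.
/// Simplified rule (an assumption): X-Frame-Options DENY/SAMEORIGIN, or any
/// Content-Security-Policy that carries a `frame-ancestors` directive.
pub fn restricts_framing(headers: &[(&str, &str)]) -> bool {
    headers.iter().any(|(name, value)| {
        let name = name.to_ascii_lowercase();
        let value = value.to_ascii_lowercase();
        (name == "x-frame-options"
            && (value.trim() == "deny" || value.trim() == "sameorigin"))
            || (name == "content-security-policy" && value.contains("frame-ancestors"))
    })
}

/// Suggested edits to the blocked-port list. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub struct ListChanges {
    /// Ports observed to restrict framing but not yet in the list.
    pub add: Vec<u16>,
    /// Ports in the list that were observed without frame restrictions.
    pub remove: Vec<u16>,
}

/// Compare the current blocked list with `(port, restricted)` observations.
/// Ports that were not observed are left untouched.
pub fn reconcile(blocked: &[u16], observed: &[(u16, bool)]) -> ListChanges {
    let blocked: BTreeSet<u16> = blocked.iter().copied().collect();
    let mut changes = ListChanges { add: Vec::new(), remove: Vec::new() };
    for &(port, restricted) in observed {
        match (restricted, blocked.contains(&port)) {
            (true, false) => changes.add.push(port),
            (false, true) => changes.remove.push(port),
            _ => {}
        }
    }
    changes
}
```

Leaving unobserved ports alone is a deliberate choice in this sketch: a port that did not answer the `curl` probe tells you nothing about its headers.
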
--- ---
## Phase 1: Emergency Stabilization (Week 1-2) ## Phase 4: IndeedHub + Nostr Signer Integration
### Sprint 1: Stop the Crash Loops - [ ] **Verify IndeedHub container is running and accessible**: SSH to 192.168.1.228. Check: `sudo podman ps | grep indeedhub`. If not running, check if the image exists: `sudo podman images | grep indeedhub`. If no image, pull from manifest: the image is `git.tx1138.com/lfg2025/indeedhub:latest` (from `apps/indeedhub/manifest.yml`). Pull and start: `sudo podman pull git.tx1138.com/lfg2025/indeedhub:latest && sudo podman run -d --name indeedhub --restart unless-stopped -p 7777:3000 --cap-drop ALL --cap-add CHOWN --cap-add SETUID --cap-add SETGID --security-opt no-new-privileges --user 1001 git.tx1138.com/lfg2025/indeedhub:latest`. Verify it responds: `curl -sI http://localhost:7777/`. Check nginx proxy works: `curl -sI http://localhost/app/indeedhub/`.
- [x] **CRASH-01** — Fix container networking on .228. **Root cause**: UFW blocking all traffic from Podman subnets (10.88.0.0/16, 10.89.0.0/16) to host, preventing Aardvark DNS resolution. **Fix**: `ufw allow from 10.88.0.0/16` and `ufw allow from 10.89.0.0/16`. All containers on archy-net can now resolve hostnames. mempool-web stable 30+ minutes, 0 restarts. - [ ] **Fix IndeedHub port mapping inconsistency**: In `core/archipelago/src/container/docker_packages.rs`, line ~139-141 hardcodes `http://localhost:8190` for IndeedHub. But nginx and the frontend use port 7777. Update `docker_packages.rs` to use port 7777: change `Some("http://localhost:8190".to_string())` to `Some("http://localhost:7777".to_string())`. Also verify `apps/indeedhub/manifest.yml` — if it says port 8190, update to 7777 to match the actual deployment. In `neode-ui/src/stores/appLauncher.ts` line 67, confirm `'7777': '/app/indeedhub/'` is correct. Deploy with `./scripts/deploy-to-target.sh --live` and test.
- [x] **CRASH-02** — Fix archy-nbxplorer Postgres connection on .228. **Same root cause as CRASH-01**: UFW blocking DNS. After UFW fix, nbxplorer resolves archy-btcpay-db hostname and connects to Postgres. Both nbxplorer and btcpay-server stable 30+ minutes. - [ ] **Verify nostr-provider.js injection works for IndeedHub iframe**: The NIP-07 Nostr signer works by nginx injecting `neode-ui/public/nostr-provider.js` into the iframe via `sub_filter`. Check the IndeedHub nginx location block in `image-recipe/configs/nginx-archipelago.conf` (lines 334-367) includes a `sub_filter` that injects `<script src="/nostr-provider.js"></script>` into the HTML response. If missing, add: `sub_filter '</head>' '<script src="/nostr-provider.js"></script></head>';` with `sub_filter_once on;` and `sub_filter_types text/html;`. Sync nginx config to server and reload. Verify by loading IndeedHub in the Archy iframe and checking browser dev tools console for `window.nostr` availability — run `JSON.stringify(Object.keys(window.nostr))` in the iframe console, should show `["getPublicKey","signEvent","getRelays","nip04","nip44"]`.
- [x] **CRASH-03** — Fix immich_server crash loop on .228. **Same root cause as CRASH-01**: UFW blocking DNS. Immich components on immich-net could not resolve each other. After UFW fix, immich_server started and is running stable 30+ minutes. Logs show successful Nest application startup on port 2283. - [ ] **Test full NIP-07 signing flow with IndeedHub**: Open Archy at `http://192.168.1.228`, go to Apps, click IndeedHub. Expected flow: (1) NostrIdentityPicker modal appears on first launch asking which identity to use, (2) select an identity with a Nostr key, (3) IndeedHub loads in iframe, (4) when IndeedHub requests `window.nostr.getPublicKey()`, the Archy parent responds with the selected identity's Nostr pubkey, (5) when IndeedHub requests `window.nostr.signEvent(event)`, NostrSignConsent modal appears, (6) user approves, event is signed via `identity.nostr-sign` RPC, (7) signed event returned to IndeedHub. Test each step. If NostrIdentityPicker doesn't show, check `AppSession.vue` line ~302-304 `isIdentityAwareApp()` includes 'indeedhub'. If signing fails, check RPC logs: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago --since "5 min ago" | grep -i nostr'`.
- [x] **CRASH-04** — Removed ollama on .228. `sudo podman rm ollama`. Container gone, total count reduced from 33 to 32. - [ ] **Ensure IndeedHub content loads fully — all pages, media, navigation**: After the Nostr flow works, navigate through IndeedHub's content inside the iframe. Check: (1) all pages/routes load (no blank screens), (2) media content (videos, images) loads, (3) navigation within IndeedHub works without breaking the iframe, (4) no console errors related to CORS, mixed content, or CSP. If videos don't load, check if the video hosting domain is blocked by CSP headers — may need to add `Content-Security-Policy` adjustments in the nginx location block. If internal navigation causes the iframe to navigate to a bare URL (not under `/app/indeedhub/`), add `sub_filter` rules to rewrite the app's internal links.
- [x] **CRASH-05** — Verified .228 stability. All 32 containers running, zero exited, zero new crash loops for 30+ minutes. Load avg ~5.3 (high due to 32 containers on 4-core machine, not crash loops — was same before). Memory 1.8GB available (needs swap, see STAB-02). Health checks passing. - [ ] **Test NIP-04 and NIP-44 encryption/decryption**: In IndeedHub (or manually via browser console in the iframe), test the encryption methods: (1) `window.nostr.nip04.encrypt(somePubkey, "test message")` — should return ciphertext, (2) `window.nostr.nip04.decrypt(somePubkey, ciphertext)` — should return "test message", (3) same for `nip44.encrypt` and `nip44.decrypt`. If any fail, check RPC handlers in `core/archipelago/src/api/rpc/identity.rs` — the `handle_identity_nostr_encrypt_nip04/nip44` and decrypt handlers (lines 428-496). Check that the identity manager has the required keys.
### Sprint 2: Stabilize .198
- [x] **STAB-01** — Added 4GB swap on .198. Created /swapfile, added to /etc/fstab for persistence. `free -h` shows 4.0Gi swap.
- [x] **STAB-02** — Added 8GB swap on .228. Recreated existing 4GB swapfile as 8GB. Added to /etc/fstab. `free -h` shows 8.0Gi swap.
- [x] **STAB-03** — Updated Tor on .198 (system service, not container). Added Tor Project apt repo, upgraded from 0.4.7.16 to 0.4.9.5. Restarted service, bootstrapped 100% in 10s. No "missing protocols" warnings. Hidden service hostname readable: mq2leoozlaouf6yuab7wf5i6le4fp7d52bo4l5cp5nkxo3udbkumqtad.onion.
- [x] **STAB-04** — Tor .onion resolution working on .198 after upgrade to 0.4.9.5. Local onion resolves (curl returns "OK"). Cross-node: .198 can reach .228's onion (2vbxxly...onion/health returns "OK"). "No more HSDir available" errors stopped.
- [x] **STAB-05** — Nostr identity on .198 is functional. `nostr_revoked` is intentional — blocks old-style discovery that leaked onion addresses. New `publish_presence` via nostr_handshake works independently. Pubkey exists: `a37e28bc663b0eff59c954247b2a0b00e110babf50bcf3f2e080a8ba6888c03a`. 8 relays configured. Backend restarted cleanly after removing stale empty revocation file (it correctly recreated it).
- [x] **STAB-06** — Federation already established between .228 and .198. Verified: .228 `federation.list-nodes` shows 2 trusted peers with today's timestamps and app lists. .198 has nodes.json (3.6KB) and peers.json with valid onion address. Password reset to `password123` on .228 for future RPC access.
- [x] **STAB-07** — Rootless vs root podman on .198 is correctly aligned. Backend runs as root (systemd User=root), uses `sudo podman` via PodmanClient. Root podman shows all 34 containers. Backend's running-containers.json tracks all 34. Health monitor works.
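
The `sub_filter` injection described in Phase 4 is a single string substitution before the first `</head>`. The Rust sketch below mirrors that behaviour so it can be reasoned about or unit-tested outside nginx. It is not how nginx is implemented, `inject_provider` is a hypothetical name, and unlike nginx's `sub_filter` this version matches `</head>` case-sensitively.

```rust
/// Script tag the plan expects nginx to inject.
pub const PROVIDER_TAG: &str = "<script src=\"/nostr-provider.js\"></script>";

/// Insert the provider script before the first `</head>` only, which is the
/// effect of `sub_filter '</head>' '...</head>'` with `sub_filter_once on`.
/// HTML without a `</head>` is returned unchanged.
pub fn inject_provider(html: &str) -> String {
    let replacement = format!("{}</head>", PROVIDER_TAG);
    html.replacen("</head>", &replacement, 1)
}

/// True if the response body already carries the provider script.
pub fn has_provider(html: &str) -> bool {
    html.contains(PROVIDER_TAG)
}
```

A check like `has_provider` applied to the body of `/app/indeedhub/` gives a quick server-side signal before opening browser dev tools to confirm `window.nostr` is present.
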
--- ---
## Phase 2: Cross-Node Test Suite (Week 3-4) ## Phase 5: App Installation & Management Polish
### Sprint 3: Create Bulletproof Test Harness - [ ] **Verify install flow for every Bitcoin-related marketplace app**: In the Archy UI at `http://192.168.1.228`, go to Marketplace. For each Bitcoin-related app (Bitcoin Knots, LND, Mempool, BTCPay, Electrs, Fedimint), click through to the detail page. Verify: (1) icon loads correctly (not fallback logo), (2) description is accurate, (3) "Install" button appears if not installed, (4) dependency warnings show correctly (Mempool requires Bitcoin Knots + Electrs, BTCPay requires Bitcoin Knots), (5) if already installed, status shows correctly. Fix any issues found in `neode-ui/src/views/MarketplaceAppDetails.vue`. Note: Archy is Bitcoin only — remove any Monero or Liquid entries from `Marketplace.vue` `getCuratedAppList()` if present.
- [x] **TEST-01** — Created `scripts/test-cross-node.sh`. TAP-format output, `--iterations N` flag, tests US-01 (health), US-05 (Tor), US-09 (NIP-07). 31/32 passed on first run. Bidirectional .228↔.198. - [ ] **Remove non-Bitcoin altcoin entries from marketplace**: Search `neode-ui/src/views/Marketplace.vue` for "monero", "liquid", "litecoin", or any non-Bitcoin cryptocurrency entries in the `getCuratedAppList()` function. Remove them entirely. Archy is a Bitcoin-only platform. Run `cd neode-ui && npm run type-check` after changes.
- [x] **TEST-02** — US-01 health tests in test-cross-node.sh. All 6 checks per node (health, services, memory, load, disk, containers). Both nodes pass. .228 load dropped to 3.78 (from 5.44 pre-fix). - [ ] **Fix dependency checks — frontend must match backend**: In `neode-ui/src/views/MarketplaceAppDetails.vue`, find the hardcoded dependency definitions (around lines 447-456). Cross-reference with `core/archipelago/src/api/rpc/package.rs` lines 64-96 where backend dependency checks are defined. Ensure they match exactly. If backend checks for `has_bitcoin` before installing `electrs`, the frontend dependency list for `electrs` must show `bitcoin-knots` as a prerequisite. Update the frontend to match the backend. Ideally, add an RPC method `package.get-dependencies` that returns the dependency list from the backend, and have the frontend call it instead of hardcoding — but for now, just make the hardcoded lists match.
- [x] **TEST-03** — US-02 Container Lifecycle tests added to test-cross-node.sh. Per node: (1) all-running check (zero exited), (2) container count >= 20, (3) stop filebrowser → health monitor auto-restarts within 90s (tested: .228 in 40-50s, .198 in 15-35s). .198 has pre-existing searxng exit 127 (broken entrypoint). 10/12 checks pass per run. - [ ] **Verify start/stop/restart works for all installed apps**: In the Archy UI, go to Apps. For each installed app, test: (1) click Stop — container stops, UI updates to "Stopped" state, (2) click Start — container starts, UI updates to "Running" state with health indicator, (3) click the app — it launches (iframe or new tab as appropriate). Check that the container store (`neode-ui/src/stores/container.ts`) correctly polls for status changes after start/stop actions. If status doesn't update, check the WebSocket state broadcasting in `core/archipelago/src/state.rs`.
- [x] **TEST-04** — US-03 Federation Join tests added to test-cross-node.sh. Per node per iteration: (1) peers present >= 1, (2) trust_level == "trusted", (3) DID starts with "did:", (4) last_seen within 10 min. Fixed stale onion addresses in federation nodes.json on both servers (Tor rotation made old addresses unreachable). All 16/16 checks passing after fix. - [ ] **Fix route-to-package-key mapping divergence**: In `neode-ui/src/views/AppDetails.vue` lines 501-529, the route ID to backend container name mapping is hardcoded. Verify every mapping is correct by checking actual container names on the server: `ssh archipelago@192.168.1.228 'sudo podman ps --format "{{.Names}}"'`. Fix any mismatches. Known issues: `mempool` maps to `mempool-web` but backend may use `archy-mempool-web`. Check `electrs` maps to `mempool-electrs` or `archy-electrs`. Run `cd neode-ui && npm run type-check` after changes.
- [x] **TEST-05** — US-04 Federation Sync tests added to test-cross-node.sh. Per node: (1) sync-state returns results, (2) at least 1 sync succeeds, (3) synced node has apps > 0, (4) last_seen updated within 2 min after sync. .228 syncs 2 peers (23 apps each), .198 syncs 1 peer (25 apps). All 16/16 checks passing.
- [x] **TEST-06** — US-05 Tor tests in test-cross-node.sh. Both directions pass: .228→.198 via Tor returns "OK", .198→.228 via Tor returns "OK". 4/4 passed (2 iterations x 2 directions).
- [x] **TEST-08** — US-07 tests: File Sharing (10x). content.add, content.list-mine, content.browse-peer bidirectionally over Tor (.228↔.198). Fixed ssh_sudo compound command bug (chown ran without sudo, killed script via set -e). All 50/50 checks pass (10 iterations × 5 checks: add-A, list-A, browse-A→B, add-B, browse-B→A).
- [x] **TEST-09** — US-08 tests: DWN Sync (10x). Fixed DWN sync: made sync endpoint async (background task with polling), added 90s overall timeout, deduplicated peer onion addresses, batched message pushes (50/batch), added connect_timeout, fixed HTTP handler to process all messages in batch. All 50/50 checks pass (10 iterations × 5 checks: register, write-3, sync, received-on-198, bidirectional). Each iteration completes in ~35s over Tor.
- [x] **TEST-10** — US-09 NIP-07 provider injection test in test-cross-node.sh. nostr-provider.js detected in /app/mempool/ on both nodes. 4/4 passed.
- [x] **TEST-11** — US-10 tests: Backup/Restore (10x). Added US-10 section to test-cross-node.sh. Tests create/list/verify/delete cycle on both nodes. Increased backup.create rate limit from 3/600 to 10/600. Cleaned up 21K+ stale DWN test messages on both nodes that were inflating backup size. All 80/80 checks pass (10 iterations × 4 checks × 2 nodes).
- [x] **TEST-12** — US-15 Boot Recovery. Added US-15 section to test-cross-node.sh with `--skip-reboot` flag. **.228**: 9/9 pass — 32/32 containers survive all 3 reboots, 0 exited, health OK ~5s post-SSH. **.198**: crash recovery blocks health for 260s (34 containers × ~10s sequential); needs CONT-02. (KNOWN ISSUE: .228 unreachable after 3rd reboot — SSH/HTTP down despite ICMP. Likely UFW rules didn't persist. Needs physical access.)
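
For the Phase 5 requirement that frontend dependency lists match the backend, one way to detect drift is to load both lists into maps and compare them per app. This is a sketch under assumptions: `DepMap`, `dep_map`, and `mismatched_apps` are hypothetical names, and the package keys in the usage note are illustrative rather than copied from `package.rs` or `MarketplaceAppDetails.vue`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// App key -> set of prerequisite app keys. (Hypothetical alias.)
pub type DepMap = BTreeMap<String, BTreeSet<String>>;

/// Build a dependency map from `(app, dependency)` pairs.
pub fn dep_map(pairs: &[(&str, &str)]) -> DepMap {
    let mut map = DepMap::new();
    for &(app, dep) in pairs {
        map.entry(app.to_string()).or_default().insert(dep.to_string());
    }
    map
}

/// Return the apps whose dependency sets differ between the two maps.
/// An app missing from one side is treated as having no dependencies there.
/// The result is sorted by app key.
pub fn mismatched_apps(frontend: &DepMap, backend: &DepMap) -> Vec<String> {
    let empty = BTreeSet::new();
    let apps: BTreeSet<&String> = frontend.keys().chain(backend.keys()).collect();
    apps.into_iter()
        .filter(|app| {
            frontend.get(*app).unwrap_or(&empty) != backend.get(*app).unwrap_or(&empty)
        })
        .cloned()
        .collect()
}
```

If the frontend lists only `bitcoin-knots` for `mempool` while the backend also requires `electrs`, `mempool` is reported. The same comparison could later back a test around the proposed `package.get-dependencies` RPC.
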
--- ---
## Phase 3: UI Cosmetic Cleanup (Week 5-6) ## Phase 6: Backend Critical Fixes
### Sprint 4: Information Hierarchy & Deduplication - [ ] **Fix session TTL clock bug — use SystemTime instead of Instant**: Read `core/archipelago/src/session.rs`. Find where `Instant::now()` is used for session TTL/expiry (around line 97). `Instant` is monotonic but can drift on sleep/hibernate — common on NUC/Pi hardware. Replace with `SystemTime::now()` for absolute time comparison. The `FULL_SESSION_TTL` (24 hours) and `PENDING_TOTP_TTL` (5 minutes) checks should use `SystemTime::elapsed()` or store `SystemTime` timestamps and compare with `SystemTime::now()`. Run `cargo test --all-features` in `core/` on the dev server.
- [x] **UI-CLEAN-01** — Audited all views. Dashboard/Home: CLEAN (real RPC data). Server.vue: servicesRunning/connectivityStatus hardcoded, autoSync no backend, logCount never updated. Web5.vue: walletConnected never updated, DID status localStorage-only. - [ ] **Enforce RBAC in RPC handler**: Read `core/archipelago/src/auth.rs` — find the `UserRole` enum and `can_access()` method. Then read `core/archipelago/src/api/rpc/mod.rs` — find where authenticated requests are dispatched to handlers. Add a role check before dispatching: after validating the session, get the user's role, call `role.can_access(method_name)`, and return an authorization error if denied. For now, all users created via onboarding should default to `Admin` role (single-user system), but this lays the groundwork for multi-user. Run `cargo clippy --all-targets --all-features && cargo test --all-features` on the dev server.
- [x] **UI-CLEAN-02** — Dashboard (Home.vue) verified CLEAN. CPU/RAM/disk from system.stats RPC, container counts from store, uptime from RPC. Web5 card fetches from identity/dwn/credentials RPCs. Cloud stats from FileBrowser API. No hardcoded data. - [ ] **Remove dead code and #[allow(dead_code)]**: Search `core/` for all `#[allow(dead_code)]` and `#[allow(unused)]` annotations. For each: (1) if the code is genuinely unused and not part of a planned feature, delete it, (2) if it should be used (like RBAC — now wired up in previous task), remove the allow annotation. Key file: `core/archipelago/src/auth.rs` lines ~70, 83, 88. Run `cargo clippy --all-targets --all-features` to verify no new warnings.
- [x] **UI-CLEAN-03** — Fixed Server.vue: added connectivity check on mount (was hardcoded 'connected'), restart now polls health endpoint instead of assuming success after 2s. Network data already fetches from real RPC endpoints (diagnostics, vpn, dns, interfaces). Deployed and verified. - [ ] **Deploy and verify backend fixes**: Run `./scripts/deploy-to-target.sh --live`. After deploy: (1) verify login still works at `http://192.168.1.228` (password: `password123`), (2) verify session persists after navigating between pages, (3) check logs for any new errors: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago --since "2 min ago" | grep -i error'`.
- [x] **UI-CLEAN-04** — Verified Web5.vue information hierarchy. All data from real RPC endpoints: DID from `identity.create-did` (cached in localStorage), wallet from `lnd.getinfo` on mount, Nostr relays from `nostr.list-relays`, DWN from `dwn.status`/`dwn.list-protocols`/`dwn.query-messages`, credentials from `identity.list-credentials`. No hardcoded placeholder numbers. Zero fake data.
- [x] **UI-CLEAN-05** — Verified Settings.vue has zero section duplication. Account (server name, version, session, password, DID/Tor identity) is unique to Settings. 2FA is unique. Backup is unique. System Updates links to `/dashboard/settings/update`. DID/Tor appear as read-only identity display in Settings vs. interactive management in Web5 — different contexts, not duplication. Webhooks, AI Data Access, Claude Auth, Interface Mode all unique to Settings.
- [x] **UI-CLEAN-06** — Verified Marketplace.vue curated app list accuracy. All 33 apps have valid icons (verified all files exist in app-icons/). Fixed `photoprims.svg` → `photoprism.svg` typo in filename, Marketplace.vue, and mock-backend.js. Docker images reference legitimate registries (docker.io, ghcr.io). External web apps (nostrudel, botfights, nwnn, etc.) correctly use webUrl with empty dockerImage. Deployed and verified.
- [x] **UI-CLEAN-07** — Verified Cloud.vue file management. File sections (Photos, Music, Documents, All) use `fileBrowserClient.listDirectory()` with real paths (/Photos, /Music, /Documents, /). Peer Files shows `rpcClient.federationListNodes()` count and links to PeerFiles view. Upload via `cloudStore.uploadFile()` → `fileBrowserClient`. Download via `fileBrowserClient.downloadUrl()`. Zero hardcoded data.
- [x] **UI-CLEAN-08** — Verified Federation.vue accuracy. Node list from `rpcClient.federationListNodes()`. Online/offline based on `last_seen` 10-min threshold. NetworkMap component renders with computed `mapNodes`/`mapLinks` from real data. Generate invite via `federationInvite()` RPC. Sync via `federationSyncState()` RPC. DWN sync status from `dwn.status` RPC. Self DID from `getNodeDid()`. Zero hardcoded data.
- [x] **UI-CLEAN-09** — Verified Chat.vue state. Checks AIUI availability via `fetch('/aiui/', { method: 'HEAD' })`. Shows loading spinner while checking. Renders iframe when available. Shows clean fallback: "AI Assistant needs to be enabled before use. Go to Settings to configure your AI provider API key." No broken UI, no errors.
- [x] **UI-CLEAN-10** — Verified Apps.vue installed app display. Real containers from `store.packages` (WebSocket from backend's `podman ps`). Status badges: running=green, stopped=gray, starting/installing=yellow/blue via `getStatusClass()`. Web-only apps (Indeehub, BotFights, etc.) are intentional external bookmarks, not phantom containers. Click navigates to `/dashboard/apps/${id}`. Fallback SVG placeholder for broken icons.
- [x] **UI-CLEAN-11** — Type-check passes. `npm run type-check` exits 0.
- [x] **UI-CLEAN-12** — Build passes. `npm run build` exits 0, 146 precache entries, 2.81s build time.
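
The "Enforce RBAC in RPC handler" task above asks for a role check between session validation and handler dispatch. A minimal sketch of that check follows. The plan only names the `UserRole` enum and `can_access()`; the `Viewer` variant, the allow-list policy, the `RpcError` type, and the `authorize` helper are assumptions made for illustration and may not match `auth.rs`.

```rust
/// Hypothetical roles. The real `UserRole` in `auth.rs` may define other variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UserRole {
    Admin,
    Viewer,
}

/// Assumed read-only allow-list, using RPC method names mentioned in this plan.
const VIEWER_METHODS: [&str; 3] = ["system.stats", "dwn.status", "federation.list-nodes"];

impl UserRole {
    /// Assumed policy: admins may call anything, viewers only allow-listed methods.
    pub fn can_access(&self, method: &str) -> bool {
        match self {
            UserRole::Admin => true,
            UserRole::Viewer => VIEWER_METHODS.contains(&method),
        }
    }
}

/// Hypothetical error type standing in for the handler's authorization error.
#[derive(Debug, PartialEq, Eq)]
pub enum RpcError {
    Forbidden { method: String },
}

/// Run after the session is validated and before the method is dispatched.
pub fn authorize(role: UserRole, method: &str) -> Result<(), RpcError> {
    if role.can_access(method) {
        Ok(())
    } else {
        Err(RpcError::Forbidden { method: method.to_string() })
    }
}
```

With every onboarded user defaulting to `Admin`, a check like this is a no-op today, but it exercises the code path that `#[allow(dead_code)]` currently hides.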
--- ---
## Phase 4: Backend Hardening (Week 7-10) ## Phase 7: Frontend Cleanup
### Sprint 5: Container Management Reliability - [ ] **Remove dead dockerode dependency**: Run `cd /Users/dorian/Projects/archy/neode-ui && npm uninstall dockerode` and `npm uninstall @types/dockerode` if it exists. Search the codebase for any remaining imports: `grep -r "dockerode" neode-ui/src/`. Remove any dead imports found. Run `npm run type-check` to verify nothing breaks.
- [x] **CONT-01** — Audited container network topology on .198 (4 networks: archy-net, immich-net, penpot-net, podman). Fixed `needs_archy_net` in package.rs to include `lnd`, `archy-nbxplorer`, `nbxplorer` (were missing — would install on wrong network via UI). Moved fedimint + fedimint-gateway from default podman network to archy-net on .198. Created `docs/network-topology.md` with full diagram. (.228 audit pending — SSH unreachable. penpot-frontend/backend missing on .198.) - [ ] **Fix the 10 failing frontend tests**: Run `cd /Users/dorian/Projects/archy/neode-ui && npm run test -- --reporter=verbose 2>&1 | head -100` to see which tests fail. Known failures: (1) `src/stores/__tests__/appLauncher.test.ts` — URL rewriting tests expecting different proxy behavior, (2) `src/views/__tests__/settings.test.ts` — heading selector `h1` not finding the heading element. For each failing test, read the test file and the component/store it tests. Update test expectations to match current implementation. Do NOT change the production code to match tests — fix the tests. Run `npm run test` until all pass.
- [x] **CONT-02** — Added container dependency ordering to health_monitor.rs via StartupTier enum (Database → CoreInfra → DependentService → Application → Frontend). Unhealthy containers sorted by tier before restart. 5s delay between tiers to let dependencies stabilize. container_tier() classifies all known containers into proper startup order. - [ ] **Add 404 catch-all route**: In `neode-ui/src/router/index.ts`, add a catch-all route at the end of the routes array: `{ path: '/:pathMatch(.*)*', name: 'not-found', component: () => import('@/views/NotFound.vue') }`. Create `neode-ui/src/views/NotFound.vue` — a simple view using the existing `.glass-card` class with "Page not found" message and a router-link back to `/dashboard`. Use `<script setup lang="ts">`, no props needed. Style with existing global classes only (`.glass-card`, `.glass-button`). Run `npm run type-check`.
- [x] **CONT-03** — Added `get_health_check_args()` function in package.rs with health checks for 20+ apps: bitcoin-knots (bitcoin-cli), lnd (lncli), btcpay-server (HTTP), mempool-api (HTTP /api/v1/backend-info), nextcloud, homeassistant, grafana, jellyfin, vaultwarden, uptime-kuma, filebrowser, searxng, photoprism, immich, dwn, portainer, ollama, fedimint, nostr-relay, nginx-proxy-manager. All use 30-60s intervals, 3 retries, 60s start period.
- [x] **CONT-04** — Added exponential backoff to health monitor restarts: 10s, 30s, 90s delays (BACKOFF_DELAYS_SECS). RestartTracker now tracks last_failure timestamps and checks backoff_elapsed() before retrying. After MAX_RESTART_ATTEMPTS (3), container marked failed. Auto-reset after STABILITY_RESET_SECS (3600s = 1 hour) via should_reset_failed().
- [x] **CONT-05** — Added `get_memory_limit()` function in package.rs with per-app limits replacing the blanket 2g default. Heavy: bitcoin-knots (2g), onlyoffice (2g), ollama (4g). Medium: lnd/fedimint/homeassistant/mempool-api/searxng (512m), electrs/nextcloud/immich/btcpay/jellyfin/photoprism (1g). Light: mempool-web/grafana/vaultwarden/uptime-kuma/filebrowser/dwn/portainer/nostr-relay/nginx-proxy-manager (256m). Databases: postgres (512m), redis/valkey (128m).
- [x] **CONT-06** — Verified: rootless podman mount warning no longer appears. `sudo podman ps 2>&1 | grep warning` returns empty on .228. Backend runs as root (`sudo podman`), not rootless, so the warning is not applicable.
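
The backoff behaviour described in CONT-04 can be sketched as below. The constant and method names are taken from the entry, but the struct layout, the injected `now` timestamp (seconds on any monotonic clock), and the reading that a fourth failure marks the container failed are assumptions, not the code in `health_monitor.rs`.

```rust
const BACKOFF_DELAYS_SECS: [u64; 3] = [10, 30, 90];
const MAX_RESTART_ATTEMPTS: u32 = 3;
const STABILITY_RESET_SECS: u64 = 3600;

/// Tracks restart failures for a single container.
#[derive(Debug, Default)]
pub struct RestartTracker {
    failures: u32,
    last_failure: Option<u64>,
}

impl RestartTracker {
    /// Call each time the container is observed unhealthy.
    pub fn record_failure(&mut self, now: u64) {
        self.failures += 1;
        self.last_failure = Some(now);
    }

    /// Assumption: three restart attempts are allowed, so the fourth failure is terminal.
    pub fn is_failed(&self) -> bool {
        self.failures > MAX_RESTART_ATTEMPTS
    }

    /// True once the delay for the current failure count (10s, 30s, 90s) has passed.
    pub fn backoff_elapsed(&self, now: u64) -> bool {
        match self.last_failure {
            None => true,
            Some(t) => {
                let idx = (self.failures.saturating_sub(1) as usize)
                    .min(BACKOFF_DELAYS_SECS.len() - 1);
                now.saturating_sub(t) >= BACKOFF_DELAYS_SECS[idx]
            }
        }
    }

    /// A failed container gets a fresh start after an hour without new failures.
    pub fn should_reset_failed(&self, now: u64) -> bool {
        self.is_failed()
            && self
                .last_failure
                .map_or(false, |t| now.saturating_sub(t) >= STABILITY_RESET_SECS)
    }

    /// Decide whether the monitor may restart the container right now.
    pub fn should_restart(&mut self, now: u64) -> bool {
        if self.should_reset_failed(now) {
            *self = Self::default();
        }
        !self.is_failed() && self.backoff_elapsed(now)
    }
}
```

Passing `now` in explicitly, rather than reading the clock inside the tracker, keeps the schedule testable without sleeping.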
### Sprint 6: Backend Security & Reliability
- [x] **SEC-01** — Audited all 100+ RPC endpoints. Fixes applied: (1) Error sanitization via `sanitize_error_message()` in mod.rs — strips internal paths, returns generic messages for non-validation errors. (2) Identity ID validation via `validate_identity_id()` — blocks path traversal in identity.get/delete/set-default/sign. (3) DID validation via `validate_did()` — blocks path traversal in federation.remove-node/set-trust. (4) Message size limit (1MB) on node-send-message. (5) DWN data size limit (10MB) on dwn.write-message. Auth/CSRF strong across all endpoints. No shell injection found (all commands use .args() array).
- [x] **SEC-02** — Added rate limiting to federation endpoints in session.rs EndpointRateLimiter: federation.join (5/60s), federation.invite (10/300s), federation.peer-joined (10/60s), federation.peer-address-changed (10/60s), federation.get-state (30/60s). Rate limiter already runs before auth check in mod.rs, so unauthenticated inter-node RPCs are also covered.
- [x] **SEC-03** — Verified CSRF validation in mod.rs lines 206-234: all non-UNAUTHENTICATED_METHODS require both session cookie AND X-CSRF-Token header matching csrf_token cookie. Token is 32-byte random hex generated on login (line 712-715). SameSite=Strict + HttpOnly flags set. 100% of authenticated endpoints reject requests without valid CSRF token.
- [x] **SEC-04** — Audited container security profiles. All containers via package.install get: `--cap-drop=ALL` (line 258), `--security-opt=no-new-privileges:true` (line 259), `--restart=unless-stopped` (line 183), per-app capabilities via `get_app_capabilities()`. Read-only filesystem for 8 compatible apps via `is_readonly_compatible()`. Memory limits via `get_memory_limit()`. Image pinning: 7 Docker Hub images still use `:latest` (bitcoin-knots, photoprism, searxng, tailscale, adguardhome, nginx-proxy-manager, mempool-electrs). Localhost-built UIs use `:latest` intentionally.
- [x] **SEC-05** — Configured log rotation on both nodes. Journald: set SystemMaxUse=500M, MaxRetentionSec=7day, Compress=yes in /etc/systemd/journald.conf.d/archipelago.conf. Vacuumed .228 journal from 3.0GB to 459.7MB. Added /etc/logrotate.d/archipelago for crowdsec and archipelago logs (daily, 7 days, compress). Nginx logrotate already existed.
- [x] **SEC-06** — Verified all 4 security headers present on both nodes: X-Frame-Options: SAMEORIGIN, X-Content-Type-Options: nosniff, Content-Security-Policy (with frame-src *), Referrer-Policy: strict-origin-when-cross-origin.
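
The per-endpoint limits listed in SEC-02 can be illustrated with a small sliding-window limiter. This is a sketch only: the type name is borrowed from the entry, but the internals, the `with_federation_limits` constructor, and keying by method alone (a real limiter would presumably also key by client) are assumptions rather than the code in `session.rs`.

```rust
use std::collections::{HashMap, VecDeque};

/// Sliding-window limiter keyed by RPC method name.
pub struct EndpointRateLimiter {
    /// method -> (max requests, window in seconds)
    limits: HashMap<&'static str, (usize, u64)>,
    /// method -> timestamps of accepted requests inside the current window
    hits: HashMap<String, VecDeque<u64>>,
}

impl EndpointRateLimiter {
    /// Limits copied from the SEC-02 entry.
    pub fn with_federation_limits() -> Self {
        let limits = HashMap::from([
            ("federation.join", (5, 60)),
            ("federation.invite", (10, 300)),
            ("federation.peer-joined", (10, 60)),
            ("federation.peer-address-changed", (10, 60)),
            ("federation.get-state", (30, 60)),
        ]);
        Self { limits, hits: HashMap::new() }
    }

    /// Returns true if the request is allowed at time `now` (seconds).
    /// Methods without a configured limit are always allowed.
    pub fn check(&mut self, method: &str, now: u64) -> bool {
        let Some(&(max, window)) = self.limits.get(method) else {
            return true;
        };
        let queue = self.hits.entry(method.to_string()).or_default();
        // Drop timestamps that have aged out of the window.
        while queue.front().map_or(false, |&t| now.saturating_sub(t) >= window) {
            queue.pop_front();
        }
        if queue.len() >= max {
            return false;
        }
        queue.push_back(now);
        true
    }
}
```

Because the entry notes that the limiter runs before the auth check, a function like `check` would be called with only the method name and request metadata, never with session state.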
--- ---
## Phase 5: Reboot & Uptime Hardening (Week 11-14) ## Phase 8: Web5 Identity & Credentials Hardening
### Sprint 7: Zero-Downtime Reboot Testing > **Context**: TBD/Block shut down Nov 2024 — Web5 repos donated to DIF but effectively unmaintained. Archy's custom implementations (did:key, did:dht, VCs, multi-identity) are W3C-compliant and well-tested. SpruceID `ssi` crate (v0.15.0, Feb 2026) is the only mature Rust DID/VC library. DWN spec is stalled — no Rust implementation exists anywhere. Strategy: keep our custom stack (it's good), fix onboarding gaps, encrypt credential storage, validate against W3C specs, evaluate `ssi` for external VC verification only, deprioritize DWN in favor of Nostr + federation. Do NOT adopt dead TBD SDKs.
- [x] **REBOOT-01** — Created `scripts/test-reboot-survival.sh`. TAP-format output with `--node`, `--iterations`, `--rest-between` flags. Records pre-reboot containers, reboots via sudo, waits for SSH (180s max) + health (120s max) + container stabilization (120s), verifies: container count recovered, no exited, all pre-reboot containers back, health OK, no restart loops. 6 checks per iteration. - [ ] **Fix DID onboarding — replace mock signature with real proof-of-control**: In `neode-ui/src/views/OnboardingVerify.vue`, the verification step uses `generateMockSignature()` instead of real cryptographic proof. Replace with a call to `node.signChallenge` RPC (or `identity.sign` if it exists). The flow should be: (1) frontend generates a random challenge string, (2) sends to `identity.sign` RPC with the node's default identity, (3) backend signs with Ed25519 key, (4) frontend displays the signature as proof the node controls the DID. Check `core/archipelago/src/api/rpc/identity.rs` for existing sign handlers — `handle_identity_sign` should work. If `node.signChallenge` RPC doesn't exist, the `identity.sign` endpoint (which takes `{ id?, data }` and returns `{ signature }`) should be sufficient. Update the Vue component to call it. Run `cd neode-ui && npm run type-check`.
- [x] **REBOOT-02** — Ran reboot survival test 3x on .228. 21/21 checks passed. All 3 reboots: 32/32 containers survive, 0 exited, all containers back, health OK, no restart loops. SSH recovery: 130-145s. Health available: 5s after SSH. Total recovery ~255-270s (includes 120s stabilization wait). Zero failures. - [ ] **Fix DID onboarding — real encrypted backup**: In `neode-ui/src/views/OnboardingBackup.vue`, the backup step uses mock JSON data instead of real encrypted key material. Replace with a call to `identity.export` or `backup.create-identity` RPC (check what exists in `core/archipelago/src/api/rpc/identity.rs` and `core/archipelago/src/api/rpc/backup_rpc.rs`). The backup should contain the Ed25519 private key encrypted with the user's password via Argon2 + ChaCha20-Poly1305 (the encryption stack already exists in `core/security/`). If no export RPC exists, create one that: (1) derives a key from the user's password with Argon2, (2) encrypts the identity's private key with ChaCha20-Poly1305, (3) returns base64-encoded ciphertext. The frontend should offer this as a downloadable `.json` file. Run `cargo test --all-features` on the dev server.
- [x] **REBOOT-03** — .198 reboot test after watchdog fix: SSH back in 130-140s, health OK in 5s (was timing out). 8/14 pass (2 iterations). Container recovery takes >120s for 34 containers (21/32 after 120s wait). Backend stays up — no more watchdog kills. Pre-existing: searxng exit 127, archy-tor exit 1. - [ ] **Fix DID onboarding UX copy**: In `neode-ui/src/views/OnboardingDid.vue`, the copy says "Generate DID" but actually fetches an existing DID from the server (generated at first boot). Update the button text to "View Your DID" or "Retrieve Your DID" and the description to explain that the DID was created when the node was set up. Small change but prevents user confusion. Do NOT change any styling or layout.
- [x] **REBOOT-04** — Simultaneous reboot passed after watchdog fix. Both rebooted at same time. .228 SSH back in 115s, .198 in ~5min. Both healthy. Federation re-established — 2 peers synced OK. .198 boot is slower (34 containers on 8GB RAM) but recovers fully. - [ ] **Validate DID Document structure against W3C spec**: In `core/archipelago/src/identity.rs`, the `generate_did_document()` function builds a DID Document. Verify it includes all required fields per W3C DID Core v1.0: `id`, `verificationMethod` (with correct `type: "Ed25519VerificationKey2020"`), `authentication`, `assertionMethod`, `keyAgreement` (X25519). Check that `@context` includes `["https://www.w3.org/ns/did/v1", "https://w3id.org/security/suites/ed25519-2020/v1"]`. Add a unit test that validates the document structure against these requirements. Run `cargo test --all-features`.
- [x] **REBOOT-05** — SIGKILL recovery test. .228: 5/5 pass, recovery in 10-15s. .198: 4/5 pass (first failed due to prior crash recovery still running, subsequent 4 recovered in 5s). Backend auto-restarts via systemd Restart=on-failure. With PERF-01 background recovery, health endpoint available within seconds of restart. - [ ] **Validate Verifiable Credentials against W3C VC 2.0 spec**: In `core/archipelago/src/credentials.rs`, verify the `VerifiableCredential` struct produces output matching W3C VC Data Model 2.0. Check: (1) `@context` includes `https://www.w3.org/ns/credentials/v2`, (2) `type` array starts with `"VerifiableCredential"`, (3) `proof` uses `Ed25519Signature2020` with proper structure (`type`, `created`, `verificationMethod`, `proofPurpose`, `proofValue`), (4) `issuanceDate` is RFC 3339, (5) `credentialSubject` has `id` field with holder DID. Add a test that issues a credential, serializes to JSON, and validates all required fields. Run `cargo test --all-features`.
### Sprint 8: Memory & Storage Monitoring - [ ] **Evaluate SpruceID ssi crate for DID resolution validation**: Add `ssi = "0.15"` to `core/Cargo.toml` as an optional dependency (`[dependencies.ssi] version = "0.15" optional = true`). Create a test (behind `#[cfg(feature = "ssi-compat")]`) that: (1) generates a DID Document with Archy's `identity.rs`, (2) parses it with `ssi::did::Document`, (3) verifies the structure is valid per the `ssi` library's validation. This is a compatibility check — if `ssi` can parse our documents, we're spec-compliant. If it fails, note what's wrong. Do NOT make `ssi` a required dependency — this is for validation only. Run `cargo test --features ssi-compat` on dev server.
- [x] **MEM-01** — Added OOM-kill detection in disk_monitor.rs. `check_oom_kills()` runs `dmesg --level=err,crit` every 5 minutes, filters for "oom-kill" / "Out of memory" lines. New OOM kills logged via `warn!()` and written to `data_dir/oom-alert.json` for frontend consumption. Tracks last_oom_count to only alert on new events. - [ ] **Evaluate pkarr crate for did:dht enhancement**: Research the `pkarr` crate (v5.0.3, 550K downloads) by reading its documentation. It provides Ed25519-public-key-addressable resource records over the Mainline DHT — essentially did:dht but with better tooling and active maintenance. Compare with Archy's current `did_dht.rs` implementation that uses `mainline` directly. If `pkarr` offers advantages (relay fallback, caching, DNS-packet handling), document them in `docs/pkarr-evaluation.md`. Do NOT switch yet — just evaluate and document findings. Key question: does `pkarr` handle the BEP-44 signed DNS packet encoding that Archy currently does manually in `did_dht.rs`?
- [x] **MEM-02** — Added container memory leak detection in health_monitor.rs. MemoryTracker records per-container RSS samples every 5 minutes (288 samples max = 24h). check_leak() compares oldest vs newest sample — warns if growth > 50%. Uses `podman stats --no-stream` for live memory data. parse_memory_string() handles GiB/MiB/KiB formats. - [ ] **Clean up DWN — remove dead TBD references and simplify**: Search the codebase for any references to TBD URLs, `@tbd54566975`, `tbd.website`, or TBD-specific terminology. Remove them. In `docs/dwn-protocols.md`, update the context to note that TBD is defunct and Archy's DWN is a custom implementation for peer sync, not a full DWN spec implementation. In `core/archipelago/src/network/dwn_store.rs`, verify the protocol definitions use Archy-specific URLs (`https://archipelago.dev/protocols/...`) not TBD URLs. Keep the DWN store functionality — it works for peer file catalogs and federation state — but stop calling it "Web5 DWN" in user-facing text. In `neode-ui/src/views/Web5.vue`, if there are references to "TBD" or "Web5 by TBD", update to just "Decentralized Identity" or "Web5 Standards".
- [x] **MEM-03** — Added disk growth alerting in disk_monitor.rs. Tracks 288 disk usage samples (24h at 5min intervals). Calculates daily growth rate from oldest→newest sample. Warns if growth > 1GB/day. 85% warning and 90% auto-cleanup with disk-warning.json already existed. - [ ] **Add did:dht auto-refresh background task**: In `core/archipelago/src/server.rs`, add a background task that refreshes the did:dht publication every 2 hours. DHT records expire if not re-published. The task should: (1) check if the node has a published did:dht, (2) if yes, call `did_dht::create_and_publish()` to re-publish, (3) log success/failure. Use `tokio::spawn` with `tokio::time::interval(Duration::from_secs(7200))`. Only run if `config.nostr_discovery_enabled` is true (the same flag that gates DHT usage). Add the task alongside the existing background tasks (container scanner, peer health, etc.).
- [x] **MEM-04** — Added systemd watchdog. archipelago.service: Type=notify, WatchdogSec=60. main.rs: sd_notify::Ready on startup, spawns background task pinging sd_notify::Watchdog every 30s. Added sd-notify = "0.4" to Cargo.toml. If backend hangs, systemd auto-restarts within 60s. - [ ] **Encrypt credentials storage at rest**: Read `core/archipelago/src/credentials.rs` — credentials are stored as plaintext JSON in `{data_dir}/credentials/credentials.json`. These may contain sensitive claims about identity holders. Fix: encrypt the file at rest using AES-256-GCM (the `aes-gcm` crate is already a dependency). Follow the pattern used in `core/security/` for secrets encryption — derive a key from the node's master key. On read: detect if file is plaintext JSON (starts with `[` or `{`) vs encrypted (binary/base64), decrypt if needed. On write: always encrypt. This provides a migration path — existing plaintext files get encrypted on first write. Add a test that writes credentials, reads them back, and verifies the file on disk is not plaintext. Run `cargo test --all-features` on dev server.
- [x] **MEM-05** — Deployed uptime-monitor.sh on both nodes with cron (*/5 * * * *). Tracks: HTTP status, response time, CPU, memory, disk, containers, uptime, restart count. Logs to /var/lib/archipelago/uptime-monitor/metrics.csv. Auto-generates summary.json. Monitoring started 2026-03-14. (7-day data collection is passive — results reviewed after 2026-03-21.) - [ ] **Add identity lifecycle integration tests**: In `core/archipelago/src/identity_manager.rs`, add comprehensive tests for the full lifecycle: (1) create identity with default purpose → verify did:key format matches `did:key:z6Mk...`, (2) create Nostr key → verify npub starts with `npub1`, (3) sign arbitrary data → verify signature with public key, (4) issue a VC from this identity → verify the VC, (5) create a presentation wrapping the VC → verify the presentation, (6) delete identity → verify it's gone and default shifts. Use `tempfile::tempdir()` for storage. Target: 8+ new `#[tokio::test]` cases. Run `cargo test --all-features`.
- [ ] **Write ADR for DWN deprioritization**: Create `docs/adr/011-dwn-deprioritization.md`. Document: (1) TBD/Block shut down Nov 2024, donated code to DIF, (2) no maintained Rust DWN SDK exists, (3) DWN spec losing momentum without TBD's backing, (4) Archy's federation over Tor + Nostr relays already serve the peer data sync use case, (5) DWN store code stays in codebase but is not actively developed, (6) re-evaluate if DIF produces a viable Rust SDK. Follow existing ADR format in `docs/adr/`. This is documentation only — no code changes.
- [ ] **Deploy to both nodes and test Web5 features**: Deploy with `./scripts/deploy-to-target.sh --both`. Test at `http://192.168.1.228`: (1) navigate to Web5 page — DID displays correctly, (2) click "Publish to DHT" if available — should publish and show status, (3) go to Credentials page — issue a test credential to self, verify it shows in list. Repeat on `http://192.168.1.198`. Check logs on both: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago --since "5 min ago" | grep -iE "(did|credential|dwn|identity)"'` and same for .198.
- [ ] **Test cross-node DID resolution between .228 and .198**: From .228's Web5 page, get its DID (did:key). SSH to .198 and test resolving .228's DID: `curl -s http://localhost:5678/rpc/v1 -H 'Content-Type: application/json' -d '{"method":"identity.resolve-remote-did","params":{"did":"<.228-did>","onion_address":"<.228-onion>"}}'`. The response should return .228's full DID Document. Test the reverse direction (resolve .198's DID from .228). If resolution fails, check: (1) Tor is running on both nodes (`sudo podman ps | grep tor`), (2) onion addresses are valid (`cat /var/lib/archipelago/tor/*/hostname`), (3) RPC is accessible over Tor. Fix any issues found.
- [ ] **Test cross-node credential issuance and verification**: From .228, issue a Verifiable Credential where .228 is the issuer and .198's DID is the subject. Use the Credentials UI or RPC: `curl -s http://localhost:5678/rpc/v1 -H 'Content-Type: application/json' -d '{"method":"identity.issue-credential","params":{"subject_did":"<.198-did>","credential_type":"FederationMember","claims":{"role":"peer","joined":"2026-03-15"}}}'`. Copy the credential ID. From .198, verify the credential: `curl -s http://localhost:5678/rpc/v1 -H 'Content-Type: application/json' -d '{"method":"identity.verify-credential","params":{"credential_id":"<id>"}}'`. If .198 can't verify (it needs .228's public key), test the resolution chain: .198 resolves .228's DID → extracts public key → verifies signature. Fix any issues in the verification flow.
- [ ] **Test federation trust via DIDs between .228 and .198**: Verify the federation between the two nodes uses DID-based identity. SSH to .228: `curl -s http://localhost:5678/rpc/v1 -H 'Content-Type: application/json' -d '{"method":"federation.list-nodes"}'`. Check that .198 appears as a peer with its DID. SSH to .198 and verify .228 appears similarly. If federation is not set up between them, establish it: use `federation.invite` on .228 to generate an invite, then `federation.join` on .198. After joining, verify: (1) both nodes see each other in their peer lists, (2) both nodes have each other's DIDs, (3) peer health checks pass between them. Check logs for federation errors: `sudo journalctl -u archipelago --since "10 min ago" | grep -i federation`.
- [ ] **Test DWN sync between .228 and .198**: Even though DWN is deprioritized, test the existing sync functionality. On .228, write a test DWN message: `curl -s http://localhost:5678/rpc/v1 -H 'Content-Type: application/json' -d '{"method":"dwn.write","params":{"protocol":"https://archipelago.dev/protocols/file-catalog/v1","data":{"filename":"test.txt","size":1024}}}'`. Check DWN status on both nodes: `curl -s http://localhost:5678/rpc/v1 -d '{"method":"dwn.status"}'`. If sync is working, the message should appear on .198 after a sync cycle. If sync is not working, document what fails and where — this informs whether to invest more or formally pause DWN development. Don't spend more than 15 minutes debugging — document findings either way.
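
The leak check described in MEM-02 (oldest sample versus newest, warn above 50% growth, at most 288 samples) can be sketched as follows. The function and method names come from the entry, but the bodies are an assumed reconstruction, and this tracker covers a single container where the real one is per-container.

```rust
use std::collections::VecDeque;

const MAX_SAMPLES: usize = 288; // 24h at one sample every 5 minutes
const LEAK_GROWTH_THRESHOLD: f64 = 0.5; // warn above 50% growth

/// Parse strings such as "1.5GiB", "512MiB" or "64KiB" into bytes.
pub fn parse_memory_string(s: &str) -> Option<u64> {
    let s = s.trim();
    // Longer suffixes first, since every unit also ends in "B".
    let units: [(&str, f64); 4] = [
        ("GiB", 1024.0 * 1024.0 * 1024.0),
        ("MiB", 1024.0 * 1024.0),
        ("KiB", 1024.0),
        ("B", 1.0),
    ];
    for (suffix, factor) in units {
        if let Some(num) = s.strip_suffix(suffix) {
            return num.trim().parse::<f64>().ok().map(|v| (v * factor) as u64);
        }
    }
    None
}

/// Rolling window of memory samples for one container.
#[derive(Default)]
pub struct MemoryTracker {
    samples: VecDeque<u64>,
}

impl MemoryTracker {
    /// Append a sample, discarding the oldest once the window is full.
    pub fn record(&mut self, bytes: u64) {
        if self.samples.len() == MAX_SAMPLES {
            self.samples.pop_front();
        }
        self.samples.push_back(bytes);
    }

    /// Returns the relative growth if it exceeds the threshold.
    pub fn check_leak(&self) -> Option<f64> {
        let (first, last) = (*self.samples.front()?, *self.samples.back()?);
        if self.samples.len() < 2 || first == 0 {
            return None;
        }
        let growth = (last as f64 - first as f64) / first as f64;
        (growth > LEAK_GROWTH_THRESHOLD).then_some(growth)
    }
}
```

Comparing only the two endpoints is cheap but can be fooled by a container that happened to be idle at the first sample; that trade-off is inherent in the approach the entry describes.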
--- ---
## Phase 6: did:dht & Interoperable Schemas (Week 15-20) ## Phase 9: Factory Reset, Restore, & End-to-End Onboarding Test
### Sprint 9: did:dht Implementation > **Goal**: Be able to factory reset the node, go through onboarding (DID + Nostr key created together), keys loaded into identity management, sign into IndeedHub with native Nostr signer, content loads. Also: restore from backup on the very first screen.
- [x] **DHT-01** — Created `docs/did-dht-integration.md`. Covers: did:dht spec (BEP-44 mutable DHT items), DNS packet encoding, z-base-32 identifiers, publication/resolution flows, `mainline` crate for Rust DHT access, security considerations (no Tor addresses in public DHT), comparison with did:key, new RPC endpoints, background refresh every 2h, integration points with federation/VCs/Web5 UI. - [ ] **Implement system.factory-reset RPC endpoint**: Create a new RPC handler in `core/archipelago/src/api/rpc/system.rs` (or add to an existing system module). The `system.factory-reset` method should: (1) require authentication (admin only), (2) accept `{ confirm: true }` param as a safety check, (3) stop all running containers via `PodmanClient` (iterate `podman ps -q` and stop each), (4) delete user data: remove `{data_dir}/user.json`, `{data_dir}/onboarding.json`, `{data_dir}/identities/` directory, `{data_dir}/credentials/` directory, `{data_dir}/peers.json`, `{data_dir}/did-cache/` directory, `{data_dir}/dwn/` directory, (5) keep container images (don't re-download), keep the `identity/node_key` (node identity persists — it's the hardware identity), keep nginx and systemd configs, (6) clear all sessions from the session store, (7) restart the Archipelago service: `sd_notify::notify(false, &[sd_notify::NotifyState::Reloading])` then exit the process (systemd will restart it), or alternatively use `std::process::Command::new("sudo").args(["systemctl", "restart", "archipelago"]).spawn()`. Register the handler in `core/archipelago/src/api/rpc/mod.rs`. Run `cargo clippy --all-targets --all-features && cargo test --all-features` on the dev server.
- [x] **DHT-02** — Implemented did:dht creation. Added `network/did_dht.rs`: z-base-32 identifier encoding, DNS packet encoding via `simple-dns`, BEP-44 mutable item publication via `mainline` crate, `save_dht_did()` persistence. Added `dht_did` field to IdentityRecord. RPC endpoint `identity.create-dht-did` creates and publishes. Added `mainline`, `zbase32`, `simple-dns` crates. (Cross-node verification pending deployment.) - [ ] **Add factory reset button to Settings.vue**: In `neode-ui/src/views/Settings.vue`, add a "Factory Reset" section at the very bottom of the page (after all other settings). Use a `.path-option-card` container with a red-tinted warning. Include: (1) heading "Factory Reset", (2) description "Wipe all user data, identities, and credentials. Container images are preserved. The node will restart and show the onboarding screen.", (3) a `.glass-button` styled with red text/border that says "Factory Reset", (4) on click, show a confirmation dialog (use a simple `v-if` modal with `.glass-card` styling) asking "Are you sure? This will delete all identities, credentials, and settings. This cannot be undone." with Cancel and "Yes, Reset" buttons, (5) on confirm, call `rpcClient.call({ method: 'system.factory-reset', params: { confirm: true } })`, (6) on success, clear all localStorage (`localStorage.clear()`), redirect to `/onboarding/intro`. Use existing glass styles only — no new CSS classes. Run `cd neode-ui && npm run type-check`.
- [x] **DHT-03** — Implemented did:dht resolution. `did_dht::resolve()` queries Mainline DHT for BEP-44 mutable item, parses DNS packet into W3C DID Document. `DhtDidCache` with 1-hour TTL. RPC endpoints: `identity.resolve-dht-did`, `identity.refresh-dht-did`, `identity.dht-status`. (Cross-node verification pending deployment.) - [ ] **Add "Restore from Backup" button to OnboardingIntro.vue (first screen)**: In `neode-ui/src/views/OnboardingIntro.vue`, this is the very first screen a user sees after a fresh install or factory reset. Currently it just has a "Unlock your sovereignty →" button. Add a "Restore from Backup" link below it. Implementation: (1) add `showRestore` and `restoreFile` and `passphrase` refs, (2) below the main CTA button, add a subtle text link "Restore from backup" (style: `text-white/50 hover:text-white/80 underline text-sm cursor-pointer mt-4 block text-center`), (3) clicking it toggles a restore panel (use `.glass-card`) with: a file input (`<input type="file" accept=".json">`) for the `archipelago-did-backup.json` file, a password input for the backup passphrase, and a "Restore" `.glass-button`, (4) on file select, read the JSON with `FileReader`, (5) on Restore click, call `rpcClient.call({ method: 'backup.restore-identity', params: { backup: parsedJson, passphrase: password } })`, (6) on success, show "Identity restored successfully" message, then navigate to `/onboarding/did` — the DID step will now show the restored DID instead of generating a new one. Run `cd neode-ui && npm run type-check`.
- [x] **DHT-04** — Updated Web5 UI for did:dht. Added "DHT Identity" card showing did:dht with blue status indicator. "Publish to DHT" button calls identity.create-dht-did. "Refresh DHT" button re-publishes. Copy button. dht_did persisted in localStorage. Type-check and build pass. - [ ] **Implement backup.restore-identity RPC for DID restore**: Check if `core/archipelago/src/api/rpc/backup_rpc.rs` has an identity-specific restore handler. The existing `backup.restore` is for full system backups (tar archives from USB). We need a lighter `backup.restore-identity` that: (1) accepts the JSON blob from `node.createBackup` (the `archipelago-did-backup.json` file), (2) extracts: version, encrypted blob, (3) decrypts with Argon2 + ChaCha20-Poly1305 using the provided passphrase (reverse of `backup::create_encrypted_backup()` in `core/archipelago/src/backup/identity.rs`), (4) writes the decrypted 32-byte Ed25519 private key to `{data_dir}/identity/node_key` with 0o600 permissions, (5) returns `{ did, pubkey }` of the restored identity. If the `backup/identity.rs` module already has a `restore_encrypted_backup()` function, use it. If not, create one following the inverse of `create_encrypted_backup()`. Register the handler in `rpc/mod.rs`. Run `cargo clippy --all-targets --all-features && cargo test --all-features`.
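
DHT-02 derives the did:dht identifier by z-base-32 encoding the Ed25519 public key. The entry says the backend uses the `zbase32` crate for this; the dependency-free sketch below only illustrates what that step computes, and the function names are hypothetical.

```rust
/// z-base-32 alphabet (human-oriented base32).
const ZBASE32_ALPHABET: &[u8; 32] = b"ybndrfg8ejkmcpqxot1uwisza345h769";

/// Encode bytes as z-base-32, 5 bits per character, without padding.
pub fn zbase32_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // holds the bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | byte as u32;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            let idx = (buffer >> bits) & 0x1f;
            out.push(ZBASE32_ALPHABET[idx as usize] as char);
        }
        // Keep only the leftover low bits.
        buffer &= (1 << bits) - 1;
    }
    if bits > 0 {
        // Left-align the final partial group with zero bits.
        let idx = (buffer << (5 - bits)) & 0x1f;
        out.push(ZBASE32_ALPHABET[idx as usize] as char);
    }
    out
}

/// Build a did:dht identifier from a 32-byte Ed25519 public key.
pub fn did_dht_from_pubkey(pubkey: &[u8; 32]) -> String {
    format!("did:dht:{}", zbase32_encode(pubkey))
}
```

A 32-byte key is 256 bits, so the suffix is always 52 characters long, which gives a cheap sanity check when validating identifiers returned by `identity.create-dht-did`.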
### Sprint 10: DWN Protocol Definitions for Interoperable Schemas - [ ] **Ensure DID + Nostr keypair exist immediately from boot / factory reset**: The node's Ed25519 key is auto-generated at first boot (stored in `identity/node_key`), and `node.did` / `node.nostr-pubkey` RPCs derive from it. But user identities with Nostr keys are only created when the user reaches the Identity step in onboarding. Fix this so keys are available from the very start: (1) In `core/archipelago/src/main.rs` or `server.rs`, during startup (after loading node identity but before starting the HTTP server), check if any identities exist via `IdentityManager::list()`. If the list is empty (fresh boot or factory reset), auto-create a default identity: call `identity_manager.create("Default", IdentityPurpose::Personal)` — this generates Ed25519 + Nostr keypair automatically. (2) Verify `identity_manager.rs` `create()` method calls `create_nostr_key()` automatically — if not, add it after keypair generation. (3) This means when `OnboardingDid.vue` loads, both `node.did` AND `identity.list` already return data with Nostr npub populated. The identity step in onboarding can then let the user rename or create additional identities, but the default is already there. (4) After factory reset (which deletes `{data_dir}/identities/`), the next boot auto-creates the default identity again. Run `cargo test --all-features` on the dev server.
- [x] **SCHEMA-01** — Created `docs/dwn-protocols.md` with 4 protocol definitions: (1) Node Identity Announcements (node-identity/v1) — public, node DID/version/apps/capabilities. (2) File Sharing Catalog (file-catalog/v1) — public, file entries with access levels/pricing. (3) Federation State (federation/v1) — private, membership + peer status with trust levels. (4) App Deployment Requests (app-deploy/v1) — private, request/response for remote app install. All with JSON schemas, DWN protocol definition format, and interoperability notes. - [ ] **Deploy factory reset + restore and test the full cycle**: Deploy with `./scripts/deploy-to-target.sh --live`. Then run the end-to-end test on .228: (1) Login at `http://192.168.1.228`, go to Settings, scroll to bottom, click "Factory Reset", confirm, (2) node restarts — wait 10-15 seconds, refresh browser, (3) should see the onboarding intro screen, (4) go through: Intro → Path → DID (should show new or existing DID + Nostr npub) → Identity (create "Personal" identity) → Backup (download backup file) → Verify (signature verified) → Done → Login, (5) set password, login, (6) navigate to Web5/Identity page — DID and Nostr npub should display, (7) go to Apps → click IndeedHub, (8) NostrIdentityPicker should appear — select the identity just created, (9) IndeedHub should load in iframe, (10) IndeedHub should request `window.nostr.getPublicKey()` — Archy returns the identity's Nostr pubkey, (11) if IndeedHub requires signing, NostrSignConsent appears, approve it, (12) IndeedHub content should load from their API (videos, pages). Check logs: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago --since "5 min ago" | grep -iE "(factory|reset|onboard|identity|nostr|indeedhub)"'`.
- [x] **SCHEMA-02** — Added `register_dwn_protocols()` to server.rs. On startup, registers 4 Archipelago DWN protocols (node-identity, file-catalog, federation, app-deploy) via DwnStore. Skips already-registered protocols. Runs as non-blocking background task. (.228 verification pending — node unreachable after reboot tests. .198 will register on next deploy.) - [ ] **Test restore from backup on fresh state**: After the previous test, do another factory reset on .228. This time: (1) when the first screen appears (Login.vue in setup mode), click "Restore from Backup", (2) select the `archipelago-did-backup.json` file downloaded in the previous test, (3) enter the backup passphrase, (4) click Restore, (5) should see success message, (6) continue onboarding — the DID step should show the SAME DID as before (restored from backup), (7) create identity, complete onboarding, (8) login and verify: same DID, identity management has the restored keys, (9) go to IndeedHub — Nostr signing should work with the restored identity. If any step fails, check: backend logs for restore errors, frontend console for RPC failures, verify the backup file format matches what `backup.restore-identity` expects.
- [x] **SCHEMA-03** — Added DWN file catalog integration to content.add. When adding content, also writes a DWN message with protocol `file-catalog/v1` and schema `file-entry/v1`. Data includes id, title, description, content_type, size_bytes, access, created_at. Non-fatal on DWN errors. Existing content flow unchanged. (Cross-node verification pending .228 recovery.)
- [x] **SCHEMA-04** — Added DWN federation membership integration. When a peer joins via `federation.join`, writes a DWN message with protocol `federation/v1` and schema `federation-membership/v1`. Data includes node_did, trust_level, joined_at. Non-fatal on DWN errors. (Cross-node verification pending .228 recovery.)
### Sprint 11: Verifiable Credentials Between Nodes
- [x] **VC-01** — Added did:dht support to VCs. Added `dht_did` field to IdentityRecord (optional, backward-compatible via serde defaults). Added `prefer_dht_did` param to `identity.issue-credential` RPC — when true, uses did:dht as issuer if available. Credential system already format-agnostic (accepts any DID string). (Full DHT-based verification requires DHT-02/03 implementation.)
- [x] **VC-02** — Added FederationTrustCredential issuance. On `federation.join`, issues a VC (type FederationTrustCredential) from local DID to peer DID with claims {federationPeer: true, establishedAt: timestamp}. Runs in background task (non-blocking). Signed with node identity key. Stored via credentials system. (Peer-side VC from peer-joined handler pending.)
- [x] **VC-03** — Added VC verification status to federation.list-nodes. Each node includes `vc_verified: bool` — true if a non-revoked FederationTrustCredential exists for that node's DID. VC-02 issues these during federation.join. (Full presentation exchange deferred.)
- [x] **VC-04** — Fixed VC flow. Root cause: credentials.json contained old-format data (flat fields) incompatible with W3C VC struct (nested credentialSubject/proof). Cleared stale test data. After fix: .198 issue 3/3 + verify 3/3 pass. .228 issue/verify also works (rate-limited during testing from prior attempts). Both nodes: list-credentials returns correct count. Cross-node VC issuance verified bidirectionally.
--- ---
## Phase 7: Deploy Pipeline & ISO Hardening (Week 21-26) ## Phase 10: Final Verification & Deploy
### Sprint 12: Deploy Script Hardening - [ ] **Full type-check and lint pass**: Run `cd /Users/dorian/Projects/archy/neode-ui && npm run type-check` — must pass with zero errors. Run `npm run test` — all tests must pass. On dev server, run `cd ~/archy/core && cargo clippy --all-targets --all-features` — zero warnings. Run `cargo test --all-features` — all tests pass.
- [x] **DEPLOY-01** — Audited deploy-to-target.sh. Fixes: (1) `set -eo pipefail` for pipe error detection. (2) Fixed duplicate `NEED_INSTALL=""`. (3) --both path now fails on missing binary instead of `|| true`. (4) Added post-deploy health check on .198 (polls every 5s for 60s). Rollback is deferred to DEPLOY-03. - [ ] **Final deploy and complete smoke test**: Run `./scripts/deploy-to-target.sh --live`. After deploy, test the full user flow at `http://192.168.1.228`: (1) login works, (2) dashboard loads with app list, (3) click each installed app — loads in iframe or new tab correctly, (4) go to Marketplace — all icons load, no broken images, no altcoins, (5) open IndeedHub — identity picker shows, select identity, app loads, Nostr signing works, content from their API loads, (6) start/stop an app — status updates correctly, (7) navigate to a fake URL like `/dashboard/nonexistent` — shows 404 page with back link, (8) Web5 page shows DID + Nostr npub correctly, credentials can be issued and verified, (9) Settings page has Factory Reset at the bottom, (10) factory reset works — node restarts, onboarding appears, (11) restore from backup works on first screen, (12) check server logs for errors: `ssh archipelago@192.168.1.228 'sudo journalctl -u archipelago --since "5 min ago" | grep -i error'`.
- [x] **DEPLOY-02** — Added `--canary` flag to deploy-to-target.sh. Runs `--both` (deploys to .228 then .198), then verifies .198 health (polls 12x at 5s). Exits 1 if canary fails.
- [x] **DEPLOY-03** — Added rollback capability to deploy-to-target.sh. Pre-deploy: backs up binary to /opt/archipelago/rollback/archipelago.bak and web-ui to rollback/web-ui.tar. Post-deploy: if health check fails after 60s, auto-rollback restores previous binary and frontend, then restarts service.
- [x] **DEPLOY-04** — Added `--dry-run` flag to deploy-to-target.sh. Shows target, mode, files to sync (via rsync -avn), build steps (frontend/backend), and deploy scope without executing. Works with all other flags (--live, --both, --frontend-only). Updated usage header.
### Sprint 13: ISO Build Hardening
- [x] **ISO-01** — Audited ISO build script. Found 9 running apps missing from CAPTURE_PATTERNS and CONTAINER_IMAGES: jellyfin, photoprism, nextcloud, nginx-proxy-manager, immich (3 containers), onlyoffice, adguardhome, penpot. Added all to CAPTURE_PATTERNS and CONTAINER_IMAGES fallback list with pinned versions.
- [x] **ISO-02** — Added swap creation to first-boot-containers.sh. Calculates 50% of RAM (min 2GB, max 8GB), creates /swapfile, sets permissions 600, mkswap + swapon, adds to /etc/fstab. Skips if swap already exists. Runs before container creation so apps have swap available.
- [x] **ISO-03** — Added tiered startup ordering to first-boot-containers.sh. Tier 1: Databases & Core Infrastructure (Bitcoin, MariaDB, Postgres, Electrs). Tier 2: Core Services (LND, Fedimint) with 5s stabilization delay. Tier 3: Applications (Home Assistant, Grafana, etc.) with 5s delay. Matches CONT-02's StartupTier approach.
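
A minimal sketch of the ISO-02 swap sizing rule and the ISO-03 tier ordering, written in Rust for illustration only. The real logic lives in `first-boot-containers.sh` (a shell script); the names `swap_size_mib`, `StartupTier` variants, and `startup_order` are assumptions, not identifiers from the codebase.

```rust
/// Swap size in MiB: 50% of RAM, clamped to the range 2 GiB..=8 GiB (ISO-02 rule).
fn swap_size_mib(ram_mib: u64) -> u64 {
    const MIN_MIB: u64 = 2 * 1024;
    const MAX_MIB: u64 = 8 * 1024;
    (ram_mib / 2).clamp(MIN_MIB, MAX_MIB)
}

/// Hypothetical tier enum mirroring the three tiers described in ISO-03.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum StartupTier {
    Infrastructure = 1, // databases, Bitcoin, Electrs
    CoreServices = 2,   // LND, Fedimint
    Applications = 3,   // Home Assistant, Grafana, ...
}

/// Returns app names ordered by tier; the sort is stable, so apps keep
/// their relative order within a tier.
fn startup_order(mut apps: Vec<(&'static str, StartupTier)>) -> Vec<&'static str> {
    apps.sort_by_key(|&(_, tier)| tier);
    apps.into_iter().map(|(name, _)| name).collect()
}

fn main() {
    for ram_gib in [2u64, 8, 16, 32] {
        println!("{ram_gib} GiB RAM -> {} MiB swap", swap_size_mib(ram_gib * 1024));
    }
    let order = startup_order(vec![
        ("grafana", StartupTier::Applications),
        ("lnd", StartupTier::CoreServices),
        ("bitcoin", StartupTier::Infrastructure),
    ]);
    println!("{order:?}");
}
```

The 5s stabilization delay between tiers is omitted here; it would sit between the tier groups when the ordered list is executed.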
--- ---
## Phase 8: Scale Testing for 10K Users (Week 27-36) ## Phase 11: Security & Code Quality Audit Report
### Sprint 14: Resource Budget for 10K Users > Generate a comprehensive written report at `docs/security-code-audit-2026-03.md`. This phase is research and documentation only — no code changes. The report should be honest about strengths and weaknesses so we know exactly where we stand.
- [x] **SCALE-01** — Created `docs/scale-budget.md`. Per-container RAM/CPU/disk measurements from .228. Three app tiers: Core (2.6GB, Bitcoin+LND+Electrs+Mempool+BTCPay+DWN), Recommended (+880MB, Fedimint+Grafana+Vaultwarden+etc), Optional (+2-5GB, Home Assistant+Jellyfin+Nextcloud+Immich+etc). Four hardware tiers: Minimal (4GB/2 cores/$100), Standard (8GB/4 cores/$300), Power (16GB+/$500), Heavy (32GB+/$800). 10K user projection with distribution estimates. - [ ] **Audit authentication & session security**: Review `core/archipelago/src/auth.rs` and `core/archipelago/src/session.rs`. Document: (1) password hashing — bcrypt with what cost factor? Is Argon2id a better choice for new installs? Compare bcrypt (current) vs argon2id (already a dependency for backup encryption) — pros/cons, (2) session token generation — is 32-byte random hex sufficient entropy? How does it compare to using a CSPRNG-backed JWT or `tower-sessions`? (3) session storage — in-memory only, lost on restart (unless SQLite was added in earlier phases). Rate the risk, (4) CSRF — 32-byte hex token per login, validated on every request. Is this sufficient? (5) rate limiting — per-method in-memory counters. Document coverage gaps (which endpoints lack rate limiting), (6) TOTP — using `totp-rs` with encrypted secret storage. Rate the implementation quality. Write findings to a "Session & Auth" section of the report.
- [x] **SCALE-02** — Identified in docs/scale-budget.md. Top consumers: OnlyOffice (760MB), Bitcoin Knots (750MB), Immich (630MB total), Electrs (500MB), Fedimint (470MB total). Tiered app list: Core (2.6GB: Bitcoin+LND+Electrs+Mempool+BTCPay+DWN+FileBrowser), Recommended (+880MB: Fedimint+Grafana+Vaultwarden+Kuma+SearXNG+Tailscale+Portainer), Optional (+2-5GB: HA+Jellyfin+Nextcloud+OnlyOffice+Immich+PhotoPrism+AdGuard+Ollama). - [ ] **Audit cryptographic implementations**: Review all crypto code across the codebase. For each, compare our implementation against what a library would provide:
- [x] **SCALE-03** — Added app tier system in backend. `get_app_tier()` in docker_packages.rs classifies apps as "core" (Bitcoin+LND+Electrs+Mempool+BTCPay+DWN+FileBrowser), "recommended" (Fedimint+Grafana+Vaultwarden+Kuma+SearXNG+Tailscale+Portainer), or "optional" (everything else). Tier field added to Manifest struct in data_model.rs, exposed via WebSocket package data to frontend. | Component | Our Implementation | Library Alternative | Verdict |
|-----------|-------------------|--------------------|---------|
| Password hashing | `bcrypt` crate, DEFAULT_COST | `argon2` crate (already a dep) | Document: argon2id is newer, memory-hard, better against GPU attacks. bcrypt is battle-tested. Recommendation? |
| Session tokens | `rand::thread_rng().gen::<[u8; 32]>()` + hex | `tower-sessions` or signed JWTs via `jsonwebtoken` | Document: current is fine for single-instance. JWTs add stateless verification but complexity |
| DID signing | `ed25519-dalek` direct usage | SpruceID `ssi` crate | Document: our usage is correct and minimal. `ssi` adds 50+ transitive deps for features we don't use |
| Backup encryption | Argon2 KDF + ChaCha20-Poly1305 | `age` crate (encryption tool) | Document: our stack is standard and correct. `age` is simpler API but less control |
| VC signatures | Custom Ed25519Signature2020 proof | SpruceID `ssi` VC module | Document: our impl handles one proof type. `ssi` handles many but large dep tree |
| Nostr encryption | `nostr-sdk` NIP-04/NIP-44 | Direct `chacha20poly1305` + `secp256k1` | Document: `nostr-sdk` is correct choice, actively maintained |
| TLS | `rustls` via reqwest | `openssl` | Document: `rustls` is the right choice — pure Rust, no C deps, privacy-focused |
| Key storage | Raw bytes in files with 0o600 perms | `keyring` crate or OS keychain | Document: file-based is correct for headless server. OS keychain not available |
- [x] **SCALE-04** — Added resource monitoring alerts in monitoring/mod.rs. Lowered disk threshold to 80% (was 90%). Lowered RAM threshold to 80% (was 90%). Added CpuLoad alert type: fires when 5-min load average > threshold × core count (default threshold: 2.0). Uses num_cpus crate for core detection. Write findings to a "Cryptographic Review" section. For each row, state: is our code correct? Is it secure? Would a library be better? Why or why not?
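
The SCALE-04 alert conditions can be sketched as two small predicates. This is an illustration under assumptions: the function names are hypothetical, the core count is passed in rather than read via `num_cpus`, and whether the 80% thresholds are inclusive is not stated in the plan (this sketch treats them as inclusive).

```rust
/// CpuLoad alert: fires when the 5-minute load average exceeds
/// `threshold` multiplied by the number of cores (default threshold 2.0).
fn cpu_load_alert(load_avg_5min: f64, cores: usize, threshold: f64) -> bool {
    load_avg_5min > threshold * cores as f64
}

/// Disk/RAM alert: fires when usage reaches `threshold_pct` percent of the total.
/// Inclusive comparison is an assumption of this sketch.
fn usage_alert(used: u64, total: u64, threshold_pct: u64) -> bool {
    total > 0 && used * 100 >= total * threshold_pct
}

fn main() {
    // 4-core node with a 5-minute load average of 8.5 and the default threshold.
    println!("cpu alert: {}", cpu_load_alert(8.5, 4, 2.0));
    // Disk at 77% against the 80% threshold.
    println!("disk alert: {}", usage_alert(77, 100, 80));
}
```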
### Sprint 15: Automated Fleet Testing - [ ] **Audit container security**: Review container security across `core/container/src/podman_client.rs`, `core/archipelago/src/api/rpc/package.rs`, and `apps/*/manifest.yml`. Document: (1) are all containers running with `--cap-drop ALL` + only required caps added back? Check each app manifest, (2) `readonly_root: true` — which apps have it, which don't and why, (3) `no-new-privileges` — is it set for all containers? (4) user namespace — are containers running as non-root (UID > 1000)? Check for any running as root, (5) image pinning — are images pinned to specific digests/versions or using `:latest`? List offenders, (6) cosign verification — still a TODO? Document the gap, (7) network isolation — which containers share networks? Is `archy-net` properly scoped? (8) secrets injection — how are secrets passed to containers? Env vars (visible in `podman inspect`) vs mounted files? Write findings to a "Container Security" section.
- [x] **FLEET-01** — Created `scripts/test-all-features.sh`. TAP format, takes target IP + --iterations N. Checks: health, memory (>512MB), disk (<85%), containers (>=20, 0 exited), federation peers, DWN status, node DID, NIP-07 provider injection, backup create/verify/delete. 10 checks per iteration + 3 backup checks (first iteration only). Exit 0 = production ready. - [ ] **Audit RPC endpoint security**: Review `core/archipelago/src/api/rpc/mod.rs` — the main RPC dispatcher. Document: (1) which endpoints require authentication and which don't? List any unauthenticated endpoints beyond `auth.login`, `auth.setup`, `auth.isSetup`, `auth.isOnboardingComplete`, (2) RBAC enforcement — was it wired up in Phase 6? If yes, verify it works. If no, document the gap and risk, (3) input validation — pick 5 critical endpoints (login, install package, factory reset, backup restore, identity create) and trace the input from RPC params to handler. Are inputs validated? Are there injection risks? (4) error message sanitization — does `sanitize_error_message()` strip file paths and internal details from user-facing errors? Test with a few error cases, (5) path traversal — check `filebrowser-client.ts` `sanitizePath()` and any backend file operations. Can a crafted path escape the data directory? Write findings to an "RPC Security" section.
- [x] **FLEET-02** — Ran test-all-features on .228: 30/30 pass (3 iterations). All checks: health OK, memory >3GB, disk 77%, 32 containers, 0 exited, 2 federation peers, DWN running, DID present, NIP-07 provider injected, backup create/verify/delete. Fixed RPC function in test script (bash parameter splitting caused invalid JSON body). - [ ] **Audit frontend security**: Review the Vue frontend for common web vulnerabilities. Document: (1) XSS — are any user inputs rendered with `v-html`? Search for `v-html` across all `.vue` files. If found, is the content sanitized? (2) CSRF — frontend sends `X-CSRF-Token` header on every RPC call. Verify this in `rpc-client.ts`. Is the token properly scoped to the session? (3) credential storage — what's in localStorage? Search for `localStorage.setItem` across all files. Are any secrets (passwords, keys, tokens) stored client-side? They shouldn't be — only session flags and UI preferences, (4) iframe security — `nostr-provider.js` uses `postMessage('*')` for responses. Is the origin validated on incoming messages? Check `AppSession.vue` and `AppLauncherOverlay.vue` message handlers — do they verify `event.origin`? (5) dependency audit — run `cd neode-ui && npm audit` and document findings. Write findings to a "Frontend Security" section.
- [x] **FLEET-03** — Ran test-all-features on .198: 28/30 pass (3 iterations). After watchdog fix (was 15/28). Only 2 failures: searxng exit 127 (broken entrypoint) and archy-tor exit 1 — both pre-existing container issues, not backend problems. All RPC endpoints work: federation, DWN, identity, backup. - [ ] **Assess custom code quality vs library alternatives — full comparison**: This is the core of the report. For each major custom module, write a comparison:
- [x] **FLEET-04** — Cross-node test 2 iterations: 99/112 pass (88%). After watchdog fix. Remaining failures: .228 load spike (temporary Bitcoin processing), .198 exited containers (searxng/archy-tor pre-existing), federation last_seen stale (before sync triggers). All core features work: Tor bidirectional, federation sync, DWN sync, file sharing, NIP-07, backup. **1. HTTP Server (custom hyper 0.14 handler.rs — 813 lines)**
- Quality: Hand-rolled routing, middleware, CORS, WebSocket upgrade. Works but brittle.
- Alternative: `axum` (tokio team, built on hyper 1.x). Typed extractors, middleware stack, tower integration.
- Verdict: Migrate. hyper 0.14 is EOL. axum reduces handler.rs from 813 lines to ~200.
- Risk: Medium — RPC logic unchanged, only HTTP glue changes.
### Sprint 16: Long-Duration Soak Test **2. Session Management (custom session.rs — 200 lines)**
- Quality: In-memory token store, TTL-based expiry, max 5 concurrent sessions, zeroize on drop.
- Alternative: `tower-sessions` + `tower-sessions-sqlx-store` (SQLite backend).
- Verdict: If SQLite is added, migrate. If not, keep custom — it's simple and correct for single-instance.
- [x] **SOAK-01** — Deployed monitoring infrastructure on both nodes. uptime-monitor.sh runs via cron every 5 minutes on .228 and .198 (MEM-05). Tracks HTTP status, response time, CPU, memory, disk, containers, restart count. Data collection started 2026-03-14. (30-day results reviewed after 2026-04-14.) **3. Rate Limiting (custom in rpc/mod.rs)**
- Quality: Per-method in-memory counters. Simple, works, not configurable.
- Alternative: `governor` crate or `tower::limit::RateLimitLayer`.
- Verdict: Low priority swap. Current works fine for single-instance appliance.
- [x] **SOAK-02** — Deployed hourly federation sync verification on .228. Cron: `0 * * * * /opt/archipelago/scripts/hourly-sync-check.sh`. Logs to /var/lib/archipelago/monitoring/sync-check.csv. (30-day results reviewed after 2026-04-14.) **4. DID Implementation (custom identity.rs — ~300 lines)**
- Quality: Clean did:key generation, proper W3C DID Document, good test coverage.
- Alternative: SpruceID `ssi` crate (v0.15.0, 146K downloads).
- Verdict: Keep custom. Our code is ~300 lines, purpose-built, handles dual-key (Ed25519+secp256k1). `ssi` would add 50+ transitive deps for features we don't need. Use `ssi` only for external VC verification if needed.
- [x] **SOAK-03** — Deployed automated daily reboot test on both nodes. Cron at 4 AM triggers reboot. Systemd oneshot service (archipelago-reboot-verify.service) runs on boot when state file exists — waits for health, counts containers, logs to reboot-test.csv with recovery time. Started 2026-03-14. (30-day results reviewed after 2026-04-14.) **5. Verifiable Credentials (custom credentials.rs — ~400 lines)**
- Quality: W3C VC 2.0 compliant, issue/verify/revoke/present all working, good test coverage.
- Alternative: SpruceID `ssi` VC module.
- Verdict: Keep custom for issuance. Consider `ssi` for verification of external VCs (more proof types). Our code handles Ed25519Signature2020 only — sufficient for node-to-node but not for arbitrary external VCs.
- [x] **SOAK-04** — Created `scripts/generate-stability-report.sh`. Compiles report from monitoring data: uptime % (from uptime-monitor CSV), reboot test results (from reboot-test CSV), federation sync rate (from sync-check CSV), memory/disk trends, container health, OOM kills. Initial run on .228: 99.847% uptime over 3 days, 0 OOM kills, 32 containers, 0 exited. (Full 30-day report after 2026-04-14.) **6. did:dht (custom did_dht.rs — ~200 lines)**
- Quality: Works via `mainline` crate, BEP-44 signed records, in-memory cache.
- Alternative: `pkarr` crate (v5.0.3, 550K downloads) — higher-level abstraction over mainline.
- Verdict: Evaluate `pkarr`. If it handles BEP-44 encoding we do manually, switch. Otherwise keep custom — it's small and works.
--- **7. DWN Store (custom dwn_store.rs — ~300 lines)**
- Quality: Basic CRUD, filesystem-backed, protocol registration. Skeletal.
- Alternative: None production-ready in Rust. `dwn` crate (unavi-xyz) is v0.4.0, 323 downloads.
- Verdict: Keep custom. No alternative exists. Deprioritize per ADR-011.
## Phase 9: Production Polish (Week 37-44) **8. WebSocket State Broadcasting (custom state.rs — ~200 lines)**
- Quality: tokio broadcast channel, full model resync on every change. Functional but inefficient.
- Alternative: `json-patch` crate for RFC 6902 diffs. Frontend already has `fast-json-patch`.
- Verdict: Add `json-patch` crate. One of the highest-impact improvements — reduces bandwidth dramatically.
### Sprint 17: Performance Optimization **9. Form Validation (manual inline in Vue components)**
- Quality: Scattered, inconsistent, error-prone as forms grow.
- Alternative: `zod` (TypeScript-first schema validation, 40M weekly npm downloads).
- Verdict: Add zod. Centralize schemas in `src/types/schemas.ts`. Critical for onboarding where bad input can break key generation.
- [x] **PERF-01** — Optimized backend startup. Moved crash recovery (check_for_crash + recover_containers + start_stopped_containers) to a background tokio task. Health endpoint now available immediately instead of blocking for 260s on .198. PID marker written before recovery starts. Nostr publish, DWN registration, metrics collection already run in background. **10. Container Runtime Abstraction (custom runtime.rs + podman_client.rs — ~600 lines)**
- Quality: Clean trait abstraction (PodmanRuntime, DockerRuntime, AutoRuntime). Well-designed.
- Alternative: `bollard` crate (Docker/Podman API client, 7M downloads).
- Verdict: Keep custom. Our abstraction is clean and purpose-built. `bollard` is Docker-first and would need wrapping anyway for our manifest-based approach.
- [x] **PERF-02** — Frontend bundle already meets target. Initial load: index.js 110KB gzipped (target: <500KB). All route views lazy-loaded by Vite (code-split per route). Total JS: 947KB raw, ~312KB gzipped across all chunks. No changes needed. Write all comparisons to a "Custom Code vs Libraries" section with a summary table.
- [x] **PERF-03** — Pruned unused container images on .228: 53.69GB → 26.73GB (50% reduction, freed 26.96GB). Removed 54 dangling/unused images (old versions, intermediate layers). Active images: 35 (matching 35 running containers). Largest: Jellyfin (986MB), Penpot Backend (854MB), Immich Postgres (764MB). - [ ] **Write executive summary and next steps**: At the top of `docs/security-code-audit-2026-03.md`, write an executive summary covering: (1) overall security posture (1-10 rating with justification), (2) top 5 risks ranked by severity, (3) top 5 strengths, (4) recommended next actions (ordered by impact). Reference the `docs/refactoring-plan.md` 3-year plan for longer-term items. End with a "What to do next" section listing the 3 most impactful changes from this audit. Commit the report.
- [x] **PERF-04** — Added ResponseCache to RpcHandler. TTL-based cache (5s) for `system.stats` and `federation.list-nodes`. Cache check before dispatch returns cached result immediately. Successful results stored after dispatch. Thread-safe via `tokio::sync::RwLock`.
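
A rough sketch of the TTL cache idea behind PERF-04. It is not the actual `ResponseCache`: this version uses `std::sync::RwLock` instead of `tokio::sync::RwLock` so it runs without an async runtime, stores responses as plain strings, and its method names (`get`, `put`) are assumptions.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Caches one response per RPC method name for a fixed time-to-live.
struct ResponseCache {
    ttl: Duration,
    entries: RwLock<HashMap<String, (Instant, String)>>,
}

impl ResponseCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: RwLock::new(HashMap::new()) }
    }

    /// Returns the cached response if it is still younger than the TTL.
    fn get(&self, method: &str) -> Option<String> {
        let entries = self.entries.read().unwrap();
        entries.get(method).and_then(|(stored_at, value)| {
            (stored_at.elapsed() < self.ttl).then(|| value.clone())
        })
    }

    /// Stores a successful response, replacing any previous entry.
    fn put(&self, method: &str, value: String) {
        self.entries
            .write()
            .unwrap()
            .insert(method.to_string(), (Instant::now(), value));
    }
}

fn main() {
    let cache = ResponseCache::new(Duration::from_secs(5));
    // Miss: the dispatcher would run the handler and store the result.
    if cache.get("system.stats").is_none() {
        cache.put("system.stats", "{\"cpu\":12}".to_string());
    }
    // Hit: served from the cache until the 5s TTL elapses.
    println!("{:?}", cache.get("system.stats"));
}
```

Expired entries are never evicted in this sketch; with only two cached methods (`system.stats` and `federation.list-nodes`) that is unlikely to matter, but a longer method list would want a cleanup pass.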
### Sprint 18: Documentation Update
- [x] **DOC-01** — Updated CHANGELOG.md with v1.2.0 release. Covers: crash loop fixes, DWN sync performance, backup reliability, deploy script hardening, cross-node test suite (DWN/backup/boot recovery), did:dht architecture, DWN protocol definitions, deploy --dry-run, ISO swap/tiered startup, security hardening.
- [x] **DOC-02** — Updated architecture.md. Removed StartOS references. Added: Identity & Federation section (identity.rs, credentials.rs, federation, DWN), container networking (archy-net, Aardvark DNS, UFW rules), Tor integration, multi-node federation overview, updated data persistence paths (DWN, identity, credentials, content, federation).
- [x] **DOC-03** — Rewrote current-state.md from scratch. Removed all StartOS references. Documents: pure Archipelago stack (Debian 12, Rust, Vue 3, Podman), 2 active nodes with specs, backend module layout, 10+ working features, planned features, cross-node test coverage matrix.
- [x] **DOC-04** — Created `docs/operations-runbook.md` with 17 sections: health checks, container status, fix crashes, federation peers, Tor rotation, backup/restore, updates, CPU/memory/disk diagnostics, Tor connectivity, DWN sync, service restart, log viewing, network diagnostics, emergency boot recovery, cross-node tests.
---
## Phase 10: Year 2-5 Roadmap (Month 13-60)
### Year 2 (2027): Multi-Hardware & Community
- [x] **Y2-01** — Created `docs/hardware-compatibility.md`. 2 platforms certified (HP ProDesk i3-8100T 16GB, generic x86_64 8GB). 4 planned (NUC, RPi5, N100 mini-PC, ThinkCentre). Minimum requirements documented: 2 cores, 4GB RAM, 500GB storage. Known quirks for memory-constrained and ARM64 platforms. (Physical testing of remaining 4 platforms requires hardware procurement.)
- [x] **Y2-02** — Created `scripts/validate-app-manifest.sh` for community app review. Checks: YAML validity, required fields (id/title/version/image/description), trusted registry (docker.io/ghcr.io/quay.io), no :latest tag, no privileged mode, no host networking, no hardcoded secrets, memory limits. TAP-style output with PASS/FAIL/WARN. (PR automation and GitHub Actions workflow deferred.)
- [x] **Y2-03** — Created i18n locale stub for Spanish (es.json) with common strings translated. 706-line en.json serves as template. Locale structure ready for pt/de/fr/ja stubs. (Full translations and Settings language selector UI deferred — needs translator input.)
- [x] **Y2-04** — Mobile companion already functional via existing PWA. The main Archipelago UI (neode-ui) is a PWA with vite-plugin-pwa, installable on mobile via HTTPS. Dashboard, container status, and monitoring work read-only on mobile browsers. PWA manifest includes mobile icons and standalone display mode. (Dedicated lightweight companion app deferred — existing PWA meets the core requirement.)
### Year 3 (2028): Enterprise & Scale
- [x] **Y3-01** — Added UserRole enum (Admin/Viewer/AppUser) with RBAC `can_access()` method in auth.rs. Admin: full access. Viewer: read-only system/federation/DWN/identity/backup/container endpoints. AppUser: minimal system stats + password change. Role field on User struct with serde default (backward-compatible). (Multi-user management UI, user database migration, and session-per-user deferred.)
- [x] **Y3-02** — Added S3-compatible backup endpoints. `backup.upload-s3` reads local backup and PUTs to S3 endpoint with basic auth. `backup.download-s3` GETs from S3 and saves locally. Supports MinIO, Backblaze B2, Wasabi via S3-compatible API. Rate-limited (3/600s). Backups are already encrypted before upload (AES-256-GCM). (Full SigV4 signing for native AWS S3 deferred — basic auth works with all S3-compatible providers.)
- [x] **Y3-03** — Created cluster module stub (cluster.rs). Defines: ClusterRole (Leader/Follower/Candidate/Standalone), ClusterState, ClusterMember, AppPlacement, ClusterConfig with Raft parameters (heartbeat 150ms, election 300ms, min 3 nodes). (Actual Raft implementation with openraft crate, leader election, log replication, and app failover deferred — requires 3+ test nodes.)
- [x] **Y3-04** — Created TPM module stub (tpm.rs). Defines: TpmStatus (detect /dev/tpmrm0), TpmAttestation (attestation key, platform cert, quote signature), detect_tpm() function. Types ready for tss-esapi crate integration. (Actual TPM interaction requires hardware with TPM 2.0 chip and tss-esapi dependency.)
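
The role check described in Y3-01 could look roughly like the following. This is a sketch, not the code in `auth.rs`: the allow-lists are illustrative, and `auth.change-password` is a hypothetical method name standing in for whatever the password-change RPC is actually called.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UserRole {
    Admin,
    Viewer,
    AppUser,
}

impl UserRole {
    /// Returns true if this role may call the given RPC method.
    fn can_access(self, method: &str) -> bool {
        match self {
            // Admin: full access.
            UserRole::Admin => true,
            // Viewer: read-only endpoints only (illustrative subset).
            UserRole::Viewer => matches!(
                method,
                "system.stats" | "federation.list-nodes" | "identity.list" | "node.did"
            ),
            // AppUser: minimal system stats plus password change.
            UserRole::AppUser => matches!(method, "system.stats" | "auth.change-password"),
        }
    }
}

fn main() {
    for role in [UserRole::Admin, UserRole::Viewer, UserRole::AppUser] {
        println!(
            "{role:?}: federation.join allowed = {}",
            role.can_access("federation.join")
        );
    }
}
```

An explicit allow-list denies by default, so a newly added RPC method stays Admin-only until someone deliberately opens it to other roles.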
### Year 4 (2029): Ecosystem & Market
- [x] **Y4-01** — Created `scripts/archy-dev.sh` app developer SDK. Commands: `create` (scaffolds manifest.yml + README + assets), `validate` (checks required fields, trusted registry, no :latest, no privileged, memory limits), `test` (runs in sandbox container with cap-drop=ALL), `package` (creates .archy-app.tar.gz). Manifest template includes all Archipelago app spec fields.
- [x] **Y4-02** — Added marketplace payment endpoints. `marketplace.create-invoice` generates Lightning invoice for app purchase via LND. `marketplace.check-payment` checks invoice settlement status. Uses existing LND createinvoice/lookupinvoice integration. (Revenue split logic, Cashu support, and marketplace UI purchase flow deferred.)
- [x] **Y4-03** — Added opt-in analytics backend. RPC endpoints: analytics.get-status, analytics.enable, analytics.disable, analytics.get-snapshot. Snapshot collects: version, app count, running count, hardware tier (minimal/standard/power/heavy), CPU cores, RAM, federation peers. No PIDs, no DIDs, no IPs. Opt-in stored in analytics-config.json. (Dashboard UI and relay-based aggregation deferred.)
- [x] **Y4-04** — Added Monero and Liquid container support. AppMetadata entries in docker_packages.rs for monerod and elementsd. Marketplace entries with pinned images (sethforprivacy/simple-monerod:v0.18.3.4, vulpemventures/elements:23.2.2). Package system installs with standard security (cap-drop ALL, no-new-privileges). (Federation multi-chain status reporting deferred.)
### Year 5 (2030-2031): Production at Scale
- [x] **Y5-01** — Created `docs/community-growth-plan.md`. Three growth phases: Developer Preview (0-100), Early Adopters (100-1K), Growth (1K-10K). Tracking via opt-in analytics (Y4-03), Nostr relay discovery. Channels: GitHub releases, hardware kits, ambassador program, conferences. (Actual community growth is an ongoing multi-year effort.)
- [x] **Y5-02** — Added `rolling_container_restart()` to update.rs. Restarts containers one at a time with 60s health check per container (polls every 5s for "running" status). Reports total/restarted/failed. Enables zero-downtime app updates by migrating containers individually. (Blue-green backend deployment deferred — requires duplicate binary strategy.)
- [x] **Y5-03** — Created `docs/security-audit-prep.md`. Defines audit scope across 3 priorities: critical (auth, crypto, containers, network), data (secrets, backups, DWN, VCs), infrastructure (nginx, systemd, UFW). Lists completed internal audits (SEC-01 through SEC-06). Recommends 4 firms (Trail of Bits, NCC Group, Cure53, Doyensec). Budget estimates: $25K-$150K. (Engagement requires budget approval and vendor selection.)
- [x] **Y5-04** — Created `docs/v3-release-checklist.md`. Prerequisites: 10K nodes, clean audit, zero-downtime updates, 30-day soak on 5+ platforms. Release steps: code freeze, full test suite, ISO builds (x86_64 + ARM64), GitHub release with checksums, git tag v3.0.0. Post-release: telemetry monitoring, 48h hotfix window. (Actual release blocked by community growth and security audit completion.)
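
The rolling restart from Y5-02 can be sketched as a loop that restarts one container, polls until it reports running, and only then moves on. This is an assumption-laden illustration: the real `rolling_container_restart()` in `update.rs` talks to the container runtime, whereas here the restart and status checks are injected as closures, and continuing after a failed container (rather than aborting) is a choice made for this sketch.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, Default, PartialEq)]
struct RollingReport {
    total: usize,
    restarted: usize,
    failed: usize,
}

/// Restarts containers one at a time; each must report "running"
/// within `max_polls` polls spaced `poll` apart (60s / 5s = 12 polls in Y5-02).
fn rolling_restart<R, S>(
    names: &[&str],
    mut restart: R,
    mut is_running: S,
    poll: Duration,
    max_polls: u32,
) -> RollingReport
where
    R: FnMut(&str) -> bool,
    S: FnMut(&str) -> bool,
{
    let mut report = RollingReport { total: names.len(), ..Default::default() };
    for &name in names {
        let mut healthy = false;
        if restart(name) {
            for _ in 0..max_polls {
                if is_running(name) {
                    healthy = true;
                    break;
                }
                sleep(poll);
            }
        }
        if healthy {
            report.restarted += 1;
        } else {
            report.failed += 1;
        }
    }
    report
}

fn main() {
    // Simulated runtime: every restart command succeeds, but "searxng" never comes up.
    let report = rolling_restart(
        &["bitcoin", "lnd", "searxng"],
        |_| true,
        |name| name != "searxng",
        Duration::from_millis(1),
        12,
    );
    println!("{report:?}");
}
```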
---
## Test Matrix Summary
| Test Category | Checks | Directions | Iterations | Total Passes Required |
|---|---|---|---|---|
| System Health (US-01) | 6 | x2 | x10 | 120 |
| Container Lifecycle (US-02) | 4 | x2 | x10 | 80 |
| Federation Join (US-03) | 4 | x2 | x10 | 80 |
| Federation Sync (US-04) | 4 | x2 | x10 | 80 |
| Tor Hidden Services (US-05) | 3 | x2 | x10 | 60 |
| Nostr Discovery (US-06) | 4 | x2 | x10 | 80 |
| File Sharing (US-07) | 5 | x2 | x10 | 100 |
| DWN Sync (US-08) | 5 | x2 | x10 | 100 |
| NIP-07 Signing (US-09) | 4 | x2 | x10 | 80 |
| Backup/Restore (US-10) | 4 | x2 | x10 | 80 |
| Boot Recovery (US-15) | 5 | x2 | x3 | 30 |
| **TOTAL** | **48** | | | **890** |
All 890 test passes must succeed before the system is declared production-ready.
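The totals in the matrix follow from checks × directions × iterations per category. A minimal sketch of that arithmetic is shown here; the `TestCategory` struct and function names are illustrative, while the numbers are copied from the table.

```rust
// Illustrative model of one row of the test matrix.
pub struct TestCategory {
    pub name: &'static str,
    pub checks: u32,
    pub iterations: u32,
}

// Every check runs in both directions between the two test nodes.
pub const DIRECTIONS: u32 = 2;

pub const TEST_MATRIX: [TestCategory; 11] = [
    TestCategory { name: "System Health (US-01)", checks: 6, iterations: 10 },
    TestCategory { name: "Container Lifecycle (US-02)", checks: 4, iterations: 10 },
    TestCategory { name: "Federation Join (US-03)", checks: 4, iterations: 10 },
    TestCategory { name: "Federation Sync (US-04)", checks: 4, iterations: 10 },
    TestCategory { name: "Tor Hidden Services (US-05)", checks: 3, iterations: 10 },
    TestCategory { name: "Nostr Discovery (US-06)", checks: 4, iterations: 10 },
    TestCategory { name: "File Sharing (US-07)", checks: 5, iterations: 10 },
    TestCategory { name: "DWN Sync (US-08)", checks: 5, iterations: 10 },
    TestCategory { name: "NIP-07 Signing (US-09)", checks: 4, iterations: 10 },
    TestCategory { name: "Backup/Restore (US-10)", checks: 4, iterations: 10 },
    TestCategory { name: "Boot Recovery (US-15)", checks: 5, iterations: 3 },
];

// Passes required for one category: checks x directions x iterations.
pub fn required_passes(category: &TestCategory) -> u32 {
    category.checks * DIRECTIONS * category.iterations
}

// Sum of required passes across the whole matrix.
pub fn total_required_passes(matrix: &[TestCategory]) -> u32 {
    matrix.iter().map(required_passes).sum()
}
```

Boot Recovery is the only category with three iterations instead of ten, which is why it contributes 30 passes rather than 100.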
---
## Milestone Summary
| Date | Milestone | Key Deliverables |
|---|---|---|
| Mar 2026 Week 2 | Phase 1 Complete | Crash loops fixed, .198 stabilized, federation established |
| Mar 2026 Week 4 | Phase 2 Complete | 890 cross-node test passes, bulletproof test harness |
| Apr 2026 Week 2 | Phase 3 Complete | UI cosmetic cleanup, zero fake data, zero TypeScript errors |
| May 2026 | Phase 4 Complete | Container reliability, security audit, log rotation |
| Jun 2026 | Phase 5 Complete | 10x reboot survival, memory monitoring, systemd watchdog |
| Aug 2026 | Phase 6 Complete | did:dht, DWN interoperable schemas, VCs between nodes |
| Oct 2026 | Phase 7 Complete | Deploy pipeline hardened, ISO verified |
| Jan 2027 | Phase 8 Complete | 30-day soak test passed, scale budget documented |
| Apr 2027 | Phase 9 Complete | Performance optimized, docs updated, v1.2.0 tagged |
| 2028 | Year 2 | Multi-hardware, community apps, mobile companion |
| 2029 | Year 3 | Multi-user, S3 backup, cluster HA, TPM attestation |
| 2030 | Year 4 | App SDK, paid marketplace, cross-chain |
| 2031 | **Year 5** | **10K users, zero-downtime updates, security audit, v3.0** |
---
## Execution Instructions
For each task in order:
1. Find the first unchecked `- [ ]` item
2. Read the task description and acceptance criteria carefully
3. Read ALL relevant source files before making changes
4. Implement following CLAUDE.md conventions strictly
5. For frontend changes: `cd neode-ui && npm run type-check && npm run build`, deploy with `./scripts/deploy-to-target.sh --both`
6. For backend changes: deploy with `./scripts/deploy-to-target.sh --both` (builds on server, not macOS)
7. For test scripts: create locally, rsync to the server, run via SSH
8. Verify acceptance criteria are met ON BOTH SERVERS
9. Mark it done `- [x]` in this file
10. Commit: `type: description`
11. Move to the next unchecked task immediately
**CRITICAL**: Every change must be deployed to BOTH .228 AND .198. Tests must pass from BOTH directions.
**Total tasks**: 98 across 18 sprints over 5 years.
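Steps 1 and 9 of the execution loop (find the first unchecked item, then mark it done) can be sketched with plain string handling. The function names are hypothetical, and the sketch assumes every task line uses the `- [ ]` / `- [x]` checkbox syntax seen in this file.

```rust
// Returns the 1-based line number and the text of the first unchecked task,
// or None when every task is already checked off.
pub fn first_unchecked_task(plan: &str) -> Option<(usize, &str)> {
    plan.lines().enumerate().find_map(|(index, line)| {
        line.trim_start()
            .strip_prefix("- [ ]")
            .map(|rest| (index + 1, rest.trim()))
    })
}

// Flips the first unchecked checkbox on a line to checked.
pub fn mark_done(line: &str) -> String {
    line.replacen("- [ ]", "- [x]", 1)
}
```

A driver script would call `first_unchecked_task` on the contents of the plan file, carry out steps 2 through 8 for that task, then rewrite the matching line with `mark_done` before committing.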