276 Commits

Author SHA1 Message Date
Dorian
fa7a89ce9b feat: add canary deploy and auto-rollback (DEPLOY-02, DEPLOY-03)
DEPLOY-02: --canary flag deploys to both then verifies .198 health
DEPLOY-03: Pre-deploy rollback backup (binary + web-ui) to
/opt/archipelago/rollback/. Auto-rollback on post-deploy health failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:09:06 +00:00
Dorian
ec8eedca15 docs: v1.2.0 changelog and operations runbook
- DOC-01: CHANGELOG.md for v1.2.0 — crash fixes, DWN sync perf, test
  suite, did:dht planning, DWN protocols, deploy hardening, ISO improvements
- DOC-04: operations-runbook.md — 17 sections covering health checks,
  container management, federation, Tor, backups, updates, diagnostics,
  emergency recovery, and test execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:08:48 +00:00
Dorian
f9272650c4 feat: add tiered startup ordering to first-boot containers
- Tier 1: Databases & Core Infrastructure (Bitcoin, MariaDB, Postgres)
- Tier 2: Core Services (LND, Fedimint) with 5s stabilization delay
- Tier 3: Applications (Home Assistant, Grafana, etc.) with 5s delay
- Matches health_monitor.rs StartupTier approach

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:06:20 +00:00
Dorian
a9624f3fe6 feat: auto-create swap file on first boot
- Add swap creation to first-boot-containers.sh
- Size: 50% of RAM (min 2GB, max 8GB)
- Creates /swapfile, adds to /etc/fstab for persistence
- Runs before container creation to prevent OOM during startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:05:04 +00:00
Dorian
2becf9391a fix: audit and harden deploy script reliability
- Add pipefail to catch pipe errors (set -eo pipefail)
- Fix duplicate NEED_INSTALL="" initialization
- Fail on missing binary in --both path (was silently ignored)
- Add post-deploy health check on .198 (polls 60s)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:04:08 +00:00
Dorian
55deb69175 feat: add --dry-run flag to deploy script
Shows target, mode, files to sync, build steps, and deploy scope
without executing any changes. Works with --live, --both, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:02:37 +00:00
Dorian
f1ca3948c4 feat: auto-register Archipelago DWN protocols on startup
- Add register_dwn_protocols() in server.rs
- Registers 4 protocols: node-identity, file-catalog, federation, app-deploy
- Skips already-registered protocols (idempotent)
- Runs as non-blocking background task during server init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 03:00:29 +00:00
Dorian
9afe93a5b4 docs: did:dht integration architecture and DWN protocol schemas
- DHT-01: docs/did-dht-integration.md — did:dht spec analysis, DNS packet
  encoding, mainline crate, publication/resolution flows, security notes
- SCHEMA-01: docs/dwn-protocols.md — 4 DWN protocol definitions with JSON
  schemas: node-identity, file-catalog, federation, app-deploy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:59:16 +00:00
Dorian
952b7d1c92 feat: add container memory leak detection (MEM-02)
MemoryTracker in health_monitor.rs tracks per-container RSS every 5 min.
Warns when a container's memory grows >50% over tracking period.
Parses podman stats output (GiB/MiB/KiB formats).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:56:18 +00:00
Dorian
e3e279331f feat: add systemd watchdog, OOM detection, disk growth alerting
MEM-01: OOM kill detection via dmesg checks every 5 minutes
MEM-03: Disk growth rate tracking (288 samples over 24h), warns at >1GB/day
MEM-04: Systemd watchdog (WatchdogSec=60, sd_notify::Watchdog every 30s)
        Service Type=notify for proper startup notification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:54:59 +00:00
Dorian
91cad8a9ab test: US-15 boot recovery tests — .228 passes 9/9, .198 needs CONT-02
- Add US-15 boot recovery test to test-cross-node.sh (--skip-reboot flag)
- .228: 32/32 containers survive all 3 reboots, 0 exited
- .198: sequential crash recovery blocks health for 260s
- Add federation rate limits (federation.join 5/60, peer RPCs 10/60)
- Add DWN message data size limit (10MB max)
- Known: .228 unreachable after reboot tests, needs physical access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:54:16 +00:00
Dorian
e9849be311 test: add reboot survival test script (REBOOT-01)
Creates scripts/test-reboot-survival.sh with TAP format output.
Records pre-reboot containers, reboots node, waits for SSH + health,
verifies container count/state/health. 6 checks per iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:52:55 +00:00
Dorian
3fe25fb8dc feat: Phase 4 backend hardening — container reliability + security audit
Container Management (CONT-01 through CONT-06):
- Fix needs_archy_net: add lnd, nbxplorer to archy-net list
- Add StartupTier dependency ordering to health monitor (DB→Core→Dependent→App→UI)
- Add exponential backoff (10s/30s/90s) with 1hr stability reset
- Add get_health_check_args() with health checks for 20+ apps
- Add get_memory_limit() with per-app limits (128m-4g vs blanket 2g)
- Create docs/network-topology.md
- Fix fedimint containers on both nodes (moved to archy-net)

Security Audit (SEC-01 through SEC-06):
- Add sanitize_error_message() — strips internal paths from RPC errors
- Add validate_identity_id() — blocks path traversal on identity operations
- Add validate_did() — blocks path traversal on federation operations
- Add message size limits: node-send-message (1MB), dwn.write-message (10MB)
- Add rate limits for federation endpoints (join: 5/60s, invite: 10/300s)
- Configure journald (500MB max, 7 day retention) on both nodes
- Add /etc/logrotate.d/archipelago for backend + crowdsec logs
- Verify all 4 nginx security headers on both nodes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:45:28 +00:00
Dorian
385c58bd87 test: US-10 backup/restore tests pass 80/80 — add rate limit headroom
- Add US-10 backup/restore test section to test-cross-node.sh
- Test cycle: create → list → verify → delete, 10 iterations × 2 nodes
- Increase backup.create rate limit from 3/600 to 10/600 (still conservative)
- Increase backup.restore rate limit from 2/600 to 5/600
- Clean up 21K+ stale DWN test messages on both servers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:11:24 +00:00
Dorian
8173c09c34 test: US-08 DWN sync tests pass 50/50 — fix sync performance
- Make dwn.sync endpoint async: spawns background task, returns immediately
- Add 90s overall timeout to sync_with_peers via tokio::time::timeout
- Deduplicate peer onion addresses before syncing
- Batch message pushes (50 per request) instead of one-at-a-time over Tor
- Add 15s connect_timeout to Tor SOCKS5 client
- Cap local message query to 200 messages per sync
- Fix DWN HTTP handler to process ALL messages in batch (was only first)
- Add recordId deduplication in handler to prevent duplicate imports
- Update test script to poll dwn.status for sync completion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 01:35:56 +00:00
Dorian
9f9ac06aa0 test: US-07 file sharing tests pass 50/50 — fix ssh_sudo compound command bug
Fixed ssh_sudo in US-07 section where chown ran without sudo because
&& in the command broke the sudo pipe. With set -e, this silently killed
the script. Wrapped compound commands in sudo bash -c to keep everything
under sudo. All file sharing tests pass bidirectionally over Tor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 00:44:10 +00:00
Dorian
c235b0a090 test: add container lifecycle, federation join/sync tests to cross-node suite
- TEST-03: US-02 container lifecycle — stop filebrowser, verify health monitor
  auto-restarts within 90s (40s on .228, 15s on .198)
- TEST-04: US-03 federation join — verify peers present, trust level, DID, last_seen
- TEST-05: US-04 federation sync — trigger sync, verify app counts, freshness
- Fix: updated stale onion addresses in federation nodes.json on both servers
  after Tor address rotation broke inter-node sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 23:56:56 +00:00
Dorian
3488b07c25 chore: complete Phase 3 UI cleanup — verify all views use real data
- UI-CLEAN-04: Web5.vue verified clean (DID, wallet, DWN, credentials all from RPC)
- UI-CLEAN-05: Settings.vue no section duplication with other pages
- UI-CLEAN-06: Marketplace — fix photoprims.svg → photoprism.svg typo, all 33 icons verified
- UI-CLEAN-07: Cloud.vue file management from real FileBrowser API
- UI-CLEAN-08: Federation.vue all data from federation RPC endpoints
- UI-CLEAN-09: Chat.vue proper AIUI availability check with fallback
- UI-CLEAN-10: Apps.vue shows real containers from store + intentional web bookmarks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 23:40:29 +00:00
Dorian
7d1d8541e0 fix: Server.vue — check connectivity on mount, poll health after restart
- Added checkConnectivity() call on mount instead of assuming connected
- Restart now polls server.health up to 15 times instead of blindly
  assuming success after 2s
- Marks UI-CLEAN-01, 02, 03 done in plan

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 23:13:50 +00:00
Dorian
c7cfd032ae test: add cross-node test suite with TAP output
Created scripts/test-cross-node.sh covering:
- US-01: System health (6 checks per node per iteration)
- US-05: Tor hidden service resolution (bidirectional)
- US-09: NIP-07 nostr-provider injection

31/32 tests pass. Both nodes healthy, Tor working bidirectionally,
NIP-07 provider injected on both nodes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 23:06:49 +00:00
Dorian
745bcf76bd fix: stabilize both servers — swap, Tor upgrade, federation verified
STAB-01: Added 4GB swap on .198
STAB-02: Added 8GB swap on .228
STAB-03: Upgraded Tor on .198 from 0.4.7.16 to 0.4.9.5 (Tor Project repo)
STAB-04: .onion resolution working — .198 can reach .228 via Tor
STAB-05: Nostr identity valid — revocation is intentional (blocks old format)
STAB-06: Federation already established between .228 and .198
STAB-07: Root podman correctly aligned with backend on .198

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 23:02:18 +00:00
Dorian
66bf30547b fix: resolve container crash loops on .228 — UFW blocking Podman DNS
Root cause: UFW firewall was blocking all traffic from Podman container
subnets (10.88.0.0/16, 10.89.0.0/16) to the host, which prevented
Aardvark DNS resolution. Containers could not resolve each other by
hostname, causing mempool-web, mempool-api, nbxplorer, btcpay-server,
and immich_server to crash loop (6000+ total restarts).

Fix: Added UFW allow rules for Podman network subnets. Also removed
unused ollama container. All 32 containers now stable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-13 22:35:04 +00:00
Dorian
62b023fcef feat: release v1.1.0 — Nostr signing, file sharing, DWN sync, Tor rotation
Bump version to 1.1.0 in Cargo.toml and package.json.
Add comprehensive CHANGELOG.md entry covering all v1.1.0 features:
NIP-07 iframe signing, file sharing across nodes, DWN multi-node sync,
node visualization map, Tor address rotation, boot container recovery,
and full monitoring/testing suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 04:02:21 +00:00
Dorian
f99d2becb2 chore: mark UPTIME-03 complete — all uptime issues documented and fixed
Three issues found during uptime testing: boot container recovery,
uptime monitor auth, Tor hostname permissions — all fixed in prior
commits. No memory leaks detected. 99.5% uptime over 415 checks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:55:59 +00:00
Dorian
62f37eca00 feat: auto-start stopped containers on boot, add failure recovery tests
Added start_stopped_containers() to crash_recovery.rs that starts all
exited/created containers on backend startup, fixing the issue where
containers didn't come back after clean reboot (PID marker removed by
systemd stop). Created test-failure-recovery.sh covering 5 failure
scenarios: container crash, backend restart, Tor restart, full reboot,
and Tor traffic block (UPTIME-02).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:55:14 +00:00
Dorian
15d6fece3d feat: add federation health monitoring, fix uptime monitor auth
Created federation-health-check.sh tracking peer online/offline state,
DWN sync status, and federation success rate. Fixed uptime-monitor.sh
to authenticate for system.stats RPC. Both run every 5min via cron
on primary server (UPTIME-01).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:38:33 +00:00
Dorian
1a94a24d00 feat: add full integration test script — 23 checks all passing
Covers federation, content sharing, DWN messages + sync, health
monitor auto-restart, Tor rotation endpoints, and NIP-07 signing.
Fixed content.list → content.list-mine, system.stats field name.
(INSTALL-04)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:35:42 +00:00
Dorian
1068267c5a feat: fix Tor rotation to handle system Tor and hostname caching
read_onion_address() now checks tor-hostnames readable cache first,
clears cache before wait_for_hostname, updates it after rotation.
Rotation restarts system Tor (not just archy-tor container). Created
test-tor-rotation.sh with 10 automated checks (INSTALL-03).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:32:21 +00:00
Dorian
540836f3d6 feat: fix NIP-07 signing to use node Nostr key, add test script
Added node.nostr-sign RPC that uses the node-level Nostr key (matching
getPublicKey), fixing pubkey mismatch where identity.nostr-sign used a
different key. Updated appLauncher to call node.nostr-sign. Added
nostr_sign_hash() to nostr_discovery.rs. Created test-nip07.sh with
11 automated checks (INSTALL-02).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 03:18:45 +00:00
Dorian
a137c137a2 chore: mark MAP-04 complete — DWN management already in Web5.vue
Web5.vue already has protocol management (register/list/remove),
message browser with pagination, sync targets, and sync now button.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:55:51 +00:00
Dorian
0a9a255cc5 feat: add D3.js network map visualization to Federation page
- Install d3@7 and @types/d3@7
- NetworkMap.vue: force-directed graph with draggable nodes, trust-level
  coloring (green/amber/red), online/offline opacity, dashed links
- Federation.vue: List/Map tab switcher with localStorage persistence
- Wire map to real federation data (self node centered, peers as satellites)
- Default to map view when 3+ nodes federated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:55:16 +00:00
Dorian
9a22a3c5df feat: add DWN sync status section to Federation node detail modal
- Shows message count, last sync time, sync status indicator
- Sync Now button triggers dwn.sync RPC with loading state
- DWN status dot in node list cards (green/amber/red)
- Loads DWN status on mount alongside federation nodes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:50:55 +00:00
Dorian
fa674fc79e test: DWN sync across federation peers — infrastructure verified
Sync successfully contacts federation peers over Tor. Pull/push protocol
works end-to-end (tested via direct Tor DWN endpoint). Peers need updated
backend deployed for full cross-node replication.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:47:42 +00:00
Dorian
d55319205b feat: fix DWN sync to use federation peers and standard port
- DWN sync now uses federation node list instead of old peer list
- Fix sync URL to use port 80 (nginx) instead of 5678 (direct backend)
- DWN /dwn endpoint now accessible without auth for peer sync
- Support both message formats: {message:{}} and {messages:[{}]}
- Replace request["message"] with unified message variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:47:09 +00:00
Dorian
4176e640a0 feat: add Peer Files UI for browsing and downloading federated content
- New PeerFiles.vue view shows federated peers and their shared catalogs
- Peer Files card in Cloud.vue shows when federation peers exist
- New content.download-peer RPC fetches content from peer via Tor
- Route: /dashboard/cloud/peers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:37:59 +00:00
Dorian
9f5c2ee656 test: verify content sharing at scale — 10 files, checksums match
Catalog browse: 0.33s over Tor. All file sizes (28B to 10MB) download
correctly with matching MD5 checksums. Transfer speeds ~500-800KB/s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:29:21 +00:00
Dorian
18e8a178a3 feat: implement peers_only and specific availability access control for content
- PeersOnly access now checks X-Federation-DID header against known federation nodes
- Specific availability restricts content to named peer DIDs only
- Anonymous/unknown DID requests get 403 Forbidden
- Free content remains accessible to everyone
- Paid content still returns 402 with price info

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:27:38 +00:00
Dorian
fc8e3f2d39 feat: fix content sharing — nginx proxy, file path resolution, catalog filtering
- Add /content and /dwn proxy locations to nginx config (both HTTP and HTTPS)
  so peer requests reach the backend instead of the SPA catch-all
- Update content_file_path() to check FileBrowser data dir as fallback when
  files aren't in the dedicated content/files/ directory
- Populate size_bytes from actual file metadata in content.add
- Filter out availability:nobody items from the public catalog endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:20:55 +00:00
Dorian
8c3c2104b2 feat: test federation resilience across 4 scenarios (FED-DEPLOY-04)
Verified: backend stop detection, restart recovery, Tor stop detection,
full reboot recovery. Fixed AppArmor read rules for Tor directories.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 02:06:15 +00:00
Dorian
6a90cf9a13 feat: validate Nostr discovery across all federated nodes (FED-DEPLOY-03)
All 3 servers publish to Nostr relays and discover each other.
Removed stale revocation files and suspicious SSRF relay entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 01:56:36 +00:00
Dorian
95d97c008a feat: federate 3 servers with Tor, fix inter-node auth (FED-DEPLOY-02)
- Add tor-hostnames fallback for reading onion addresses when system Tor
  owns hidden_service directories (permissions 700)
- Exempt federation.peer-joined, federation.get-state, and
  federation.peer-address-changed from auth/CSRF (inter-node RPC)
- Set up system Tor with AppArmor overrides on archipelago-2 and 3
- All 3 servers federated and syncing successfully

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 01:52:50 +00:00
Dorian
d5d3d2430e feat: deploy latest code to all available servers (FED-DEPLOY-01)
Deployed to primary (192.168.1.228), archipelago-2, and archipelago-3.
Secondary (192.168.1.198) is offline. All 3 servers healthy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 01:06:53 +00:00
Dorian
98da9b2b4c feat: add Tor services management UI in Settings
Settings page shows all Tor hidden services with toggle switches
(enable/disable per app) and a Rotate button for the main node address.
Added RPC client methods for tor.list-services, tor.toggle-app,
tor.rotate-service, tor.cleanup-rotated. Toggle CSS classes in style.css.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:13:38 +00:00
Dorian
e74bf9bcfd feat: propagate Tor address rotation to Nostr relays and federation peers
After rotation, spawns background task that publishes updated .onion to
Nostr relays and sends federation.peer-address-changed RPC to all peers
over Tor. Peers update their nodes.json with the new address.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 00:08:16 +00:00
Dorian
f1ca60e732 feat: add Tor address rotation, cleanup, and per-app toggle RPC endpoints
tor.rotate-service: renames hidden service dir, restarts Tor, waits
for new hostname. Old dir kept for 24h transition.
tor.cleanup-rotated: removes expired old service directories.
tor.toggle-app: enable/disable Tor access per app with service dir
management and container restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:57:38 +00:00
Dorian
83c0092f1b feat: add NIP-04 and NIP-44 encrypt/decrypt RPC endpoints for iframe apps
Backend: identity.nostr-encrypt-nip04, identity.nostr-decrypt-nip04,
identity.nostr-encrypt-nip44, identity.nostr-decrypt-nip44 endpoints
with auto-resolve to default identity. Frontend: appLauncher routes
nip04.* and nip44.* postMessage calls to backend RPC.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:50:56 +00:00
Dorian
1d3a8e2050 feat: add noStrudel Nostr client with NIP-07 iframe proxy support
Added nostrudel.ninja as a web-only app in Marketplace (community category).
Configured nginx reverse proxy at /ext/nostrudel/ with NIP-07 provider
injection in both HTTP and HTTPS blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:38:22 +00:00
Dorian
cbf971b6b2 feat: add NIP-07 signing consent modal with remember-per-app support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:33:30 +00:00
Dorian
aca1d66f71 feat: inject NIP-07 nostr-provider.js into all nginx app proxy blocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 23:21:15 +00:00
Dorian
e2cb235b42 chore: mark IDENT-04 complete — onboarding backup already wired
OnboardingBackup.vue was already calling rpcClient.createBackup()
with real RPC backend. No code changes needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:57:09 +00:00