# Architecture Review — Fix Remaining Issues ## Context The architecture review (`docs/architecture-review.html`) identified 4 P0, 6 P1, and 6 medium-priority issues across the codebase. After research, **all 4 P0s and 4 of 6 P1s are already fixed**. This plan addresses the remaining open items that improve reliability and security during the beta freeze. **What's already fixed:** P0-1 (health RPC), P0-2 (health checks), P0-3 (backup rollback), P0-4 (nginx protections), P1-B (rate limiter cleanup), P1-C (systemd limits), P1-E (WS reconnect), P1-F (Vue error handler), Issue 11 (session async I/O). **What we're fixing now (4 items):** --- ## Item 1: Add 10s timeout to 6 bare `client.connect()` calls — DONE **Why:** A down Nostr relay hangs the async task indefinitely, blocking identity publishing, node discovery, and marketplace operations. Direct uptime impact. ### Files & locations | File | Line | Function | |------|------|----------| | `core/archipelago/src/identity_manager.rs` | 409 | `publish_profile()` | | `core/archipelago/src/nostr_discovery.rs` | 113 | `publish_node_revocation()` | | `core/archipelago/src/nostr_discovery.rs` | 200 | `verify_revocation()` | | `core/archipelago/src/nostr_discovery.rs` | 264 | `discover_archipelago_nodes()` | | `core/archipelago/src/marketplace.rs` | 298 | `discover()` | | `core/archipelago/src/marketplace.rs` | 406 | `publish()` | ### Pattern (from `nostr_handshake.rs:126`) Replace each `client.connect().await;` with: ```rust if tokio::time::timeout(Duration::from_secs(10), client.connect()).await.is_err() { tracing::warn!("Nostr relay connection timed out after 10s, continuing anyway"); } ``` Ensure `use std::time::Duration;` is imported in each file. `tracing::warn!` is already available in all three files. ### Risk: LOW — Mechanical pattern replication, no logic changes. --- ## Item 2: Pin all crypto dependency versions exactly — DONE **Why:** Floating versions (`"2.1"` instead of `"2.2.0"`) allow `cargo update` to silently change crypto libraries. Supply chain risk + project rules violation. ### Versions (verified from Cargo.lock) **`core/archipelago/Cargo.toml`:** | Line | Current | Pin to | |------|---------|--------| | 44 | `sha2 = "0.10"` | `"0.10.9"` | | 45 | `hmac = "0.12"` | `"0.12.1"` | | 50 | `ed25519-dalek = { version = "2.1", ... }` | `version = "2.2.0"` | | 51 | `curve25519-dalek = "4"` | `"4.1.3"` | | 52 | `rand = "0.8"` | `"0.8.5"` | | 69 | `argon2 = "0.5"` | `"0.5.3"` | | 70 | `chacha20poly1305 = "0.10"` | `"0.10.1"` | | 81 | `zeroize = { version = "1.7", ... }` | `version = "1.8.2"` | | 92 | `hkdf = "0.12"` | `"0.12.4"` | **`core/security/Cargo.toml`:** | Line | Current | Pin to | |------|---------|--------| | 16 | `aes-gcm = "0.10"` | `"0.10.3"` | | 17 | `rand = "0.8"` | `"0.8.5"` | | 19 | `zeroize = { version = "1", ... }` | `version = "1.8.2"` | **Note:** `core/models/Cargo.toml` has `ed25519-dalek = "2.0.0"` but this crate is NOT in the workspace — it's dead code. Skip it. ### Risk: LOW — Pins to versions already resolved in Cargo.lock. No actual dependency changes. --- ## Item 3: Pin all floating container image tags — DONE **Why:** Floating tags (`:1`, `:7`, `:alpine`, `:main`) mean two installs a week apart get different software. Supply chain risk and a support nightmare. ### File: `scripts/image-versions.sh` | Line | Variable | Current Tag | Action | |------|----------|-------------|--------| | 16 | `MARIADB_IMAGE` | `:11.4` | SSH -> get exact patch version | | 21 | `POSTGRES_IMAGE` | `:15` | SSH -> get exact patch version | | 22 | `BTCPAY_POSTGRES_IMAGE` | `:15` | SSH -> get exact patch version | | 25 | `HOMEASSISTANT_IMAGE` | `:2024.12` | SSH -> get exact patch version | | 27 | `UPTIME_KUMA_IMAGE` | `:1` | SSH -> get exact patch version | | 32 | `NEXTCLOUD_IMAGE` | `:29` | SSH -> get exact patch version | | 34 | `ONLYOFFICE_IMAGE` | `:8.2` | SSH -> get exact patch version | | 35 | `FILEBROWSER_IMAGE` | `:v2` | SSH -> get exact patch version | | 36 | `NPM_IMAGE` | `:2` | SSH -> get exact patch version | | 49 | `REDIS_IMAGE` | `:7` | SSH -> get exact patch version | | 52 | `VALKEY_IMAGE` | `:8` | SSH -> get exact patch version | | 60 | `INDEEDHUB_POSTGRES_IMAGE` | `:16-alpine` | SSH -> get exact patch version | | 61 | `INDEEDHUB_REDIS_IMAGE` | `:7-alpine` | SSH -> get exact patch version | | 64 | `DWN_SERVER_IMAGE` | `:main` | SSH -> get image digest, pin by SHA or tag | | 68 | `NGINX_ALPINE_IMAGE` | `:alpine` | SSH -> get exact version | ### Pre-work required Run on 192.168.1.228: `podman images --format '{{.Repository}}:{{.Tag}}'` to get exact versions currently deployed. Pin to THOSE — don't upgrade. ### Risk: MEDIUM — Must match what's actually running. Wrong pin = containers fail on next creation. --- ## Item 4: Add CI pipeline for Rust + frontend checks — DONE **Why:** No tests or linting run in CI. Regressions from Items 1-3 (and all future beta fixes) go undetected until they hit the server. ### File to create: `.github/workflows/ci.yml` Two parallel jobs: 1. **`rust`** (ubuntu-latest): `cargo fmt --check` -> `cargo clippy -D warnings` -> `cargo test` 2. **`frontend`** (ubuntu-latest): `npm ci` -> `npm run type-check` -> `npm test` Trigger: push to `main` + all PRs. Reference existing `build-macos.yml` for action versions (checkout@v4, setup-node@v4 with Node 18). ### Risk: LOW — Additive only, new file, doesn't affect existing workflows. --- ## Execution Order 1. **Item 1** (Nostr timeouts) — lowest risk, immediate reliability gain 2. **Item 2** (crypto pins) — batch with Item 1 for single deploy 3. **Item 3** (container image pins) — requires SSH query first 4. **Item 4** (CI) — validates everything, no deploy needed Items 1+2 deploy together. Item 3 deploys separately (script only). Item 4 is push-only. ## Verification - Items 1+2: `cargo clippy --all-targets --all-features` on dev server (zero warnings), then deploy + test identity/discovery/marketplace features - Item 3: `source scripts/image-versions.sh` + verify all vars have exact patch versions - Item 4: Push to branch, verify both CI jobs pass green on GitHub Actions ## Deferred (post-beta) - Issue 6: Generate TS types from Rust (ts-rs) — new dependency - Issue 7: Consolidate container metadata to single source — structural refactor - Issue 8: Split deploy/ISO scripts into modules — already planned in script comments - Issue 9: Single app manifest driving all 6+ locations — architectural change - Issue 12: useAsyncState composable — touches 14+ views, risky during freeze