Architecture review (all P0+P1 issues now fixed): - Add 10s timeout to 6 bare Nostr client.connect() calls - Pin all 12 crypto deps to exact versions from Cargo.lock - Pin all 15 floating container image tags to exact patch versions - Add CI pipeline (cargo fmt + clippy + tests, frontend type-check + build) Self-update system (git.tx1138.com): - scripts/self-update.sh: pull, build, install, restart with rollback - systemd timer checks daily at 3 AM - update.check RPC does git-based checks when repo is present - update.git-apply RPC triggers self-update from UI - Default update URL changed from GitHub to git.tx1138.com - Git added to ISO package list for fresh installs Documentation: - CHANGELOG v1.3.1 with all changes - README updated (version, update system section) - BETA-PROGRESS session #6 logged - architecture-review.html: 4 issues marked FIXED, 8/12 refactoring done Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
146 lines
6.5 KiB
Markdown
146 lines
6.5 KiB
Markdown
# Architecture Review — Fix Remaining Issues
|
|
|
|
## Context
|
|
|
|
The architecture review (`docs/architecture-review.html`) identified 4 P0, 6 P1, and 6 medium-priority issues across the codebase. After research, **all 4 P0s and 4 of 6 P1s are already fixed**. This plan addresses the remaining open items that improve reliability and security during the beta freeze.
|
|
|
|
**What's already fixed:** P0-1 (health RPC), P0-2 (health checks), P0-3 (backup rollback), P0-4 (nginx protections), P1-B (rate limiter cleanup), P1-C (systemd limits), P1-E (WS reconnect), P1-F (Vue error handler), Issue 11 (session async I/O).
|
|
|
|
**What we're fixing now (4 items):**
|
|
|
|
---
|
|
|
|
## Item 1: Add 10s timeout to 6 bare `client.connect()` calls — DONE
|
|
|
|
**Why:** A down Nostr relay hangs the async task indefinitely, blocking identity publishing, node discovery, and marketplace operations. Direct uptime impact.
|
|
|
|
### Files & locations
|
|
|
|
| File | Line | Function |
|
|
|------|------|----------|
|
|
| `core/archipelago/src/identity_manager.rs` | 409 | `publish_profile()` |
|
|
| `core/archipelago/src/nostr_discovery.rs` | 113 | `publish_node_revocation()` |
|
|
| `core/archipelago/src/nostr_discovery.rs` | 200 | `verify_revocation()` |
|
|
| `core/archipelago/src/nostr_discovery.rs` | 264 | `discover_archipelago_nodes()` |
|
|
| `core/archipelago/src/marketplace.rs` | 298 | `discover()` |
|
|
| `core/archipelago/src/marketplace.rs` | 406 | `publish()` |
|
|
|
|
### Pattern (from `nostr_handshake.rs:126`)
|
|
|
|
Replace each `client.connect().await;` with:
|
|
```rust
|
|
if tokio::time::timeout(Duration::from_secs(10), client.connect()).await.is_err() {
|
|
tracing::warn!("Nostr relay connection timed out after 10s, continuing anyway");
|
|
}
|
|
```
|
|
|
|
Ensure `use std::time::Duration;` is imported in each file. `tracing::warn!` is already available in all three files.
|
|
|
|
### Risk: LOW — Mechanical pattern replication, no logic changes.
|
|
|
|
---
|
|
|
|
## Item 2: Pin all crypto dependency versions exactly — DONE
|
|
|
|
**Why:** Floating versions (`"2.1"` instead of `"2.2.0"`) allow `cargo update` to silently change crypto libraries. Supply chain risk + project rules violation.
|
|
|
|
### Versions (verified from Cargo.lock)
|
|
|
|
**`core/archipelago/Cargo.toml`:**
|
|
|
|
| Line | Current | Pin to |
|
|
|------|---------|--------|
|
|
| 44 | `sha2 = "0.10"` | `"0.10.9"` |
|
|
| 45 | `hmac = "0.12"` | `"0.12.1"` |
|
|
| 50 | `ed25519-dalek = { version = "2.1", ... }` | `version = "2.2.0"` |
|
|
| 51 | `curve25519-dalek = "4"` | `"4.1.3"` |
|
|
| 52 | `rand = "0.8"` | `"0.8.5"` |
|
|
| 69 | `argon2 = "0.5"` | `"0.5.3"` |
|
|
| 70 | `chacha20poly1305 = "0.10"` | `"0.10.1"` |
|
|
| 81 | `zeroize = { version = "1.7", ... }` | `version = "1.8.2"` |
|
|
| 92 | `hkdf = "0.12"` | `"0.12.4"` |
|
|
|
|
**`core/security/Cargo.toml`:**
|
|
|
|
| Line | Current | Pin to |
|
|
|------|---------|--------|
|
|
| 16 | `aes-gcm = "0.10"` | `"0.10.3"` |
|
|
| 17 | `rand = "0.8"` | `"0.8.5"` |
|
|
| 19 | `zeroize = { version = "1", ... }` | `version = "1.8.2"` |
|
|
|
|
**Note:** `core/models/Cargo.toml` has `ed25519-dalek = "2.0.0"` but this crate is NOT in the workspace — it's dead code. Skip it.
|
|
|
|
### Risk: LOW — Pins to versions already resolved in Cargo.lock. No actual dependency changes.
|
|
|
|
---
|
|
|
|
## Item 3: Pin all floating container image tags — DONE
|
|
|
|
**Why:** Floating tags (`:1`, `:7`, `:alpine`, `:main`) mean two installs a week apart get different software. Supply chain risk and a support nightmare.
|
|
|
|
### File: `scripts/image-versions.sh`
|
|
|
|
| Line | Variable | Current Tag | Action |
|
|
|------|----------|-------------|--------|
|
|
| 16 | `MARIADB_IMAGE` | `:11.4` | SSH -> get exact patch version |
|
|
| 21 | `POSTGRES_IMAGE` | `:15` | SSH -> get exact patch version |
|
|
| 22 | `BTCPAY_POSTGRES_IMAGE` | `:15` | SSH -> get exact patch version |
|
|
| 25 | `HOMEASSISTANT_IMAGE` | `:2024.12` | SSH -> get exact patch version |
|
|
| 27 | `UPTIME_KUMA_IMAGE` | `:1` | SSH -> get exact patch version |
|
|
| 32 | `NEXTCLOUD_IMAGE` | `:29` | SSH -> get exact patch version |
|
|
| 34 | `ONLYOFFICE_IMAGE` | `:8.2` | SSH -> get exact patch version |
|
|
| 35 | `FILEBROWSER_IMAGE` | `:v2` | SSH -> get exact patch version |
|
|
| 36 | `NPM_IMAGE` | `:2` | SSH -> get exact patch version |
|
|
| 49 | `REDIS_IMAGE` | `:7` | SSH -> get exact patch version |
|
|
| 52 | `VALKEY_IMAGE` | `:8` | SSH -> get exact patch version |
|
|
| 60 | `INDEEDHUB_POSTGRES_IMAGE` | `:16-alpine` | SSH -> get exact patch version |
|
|
| 61 | `INDEEDHUB_REDIS_IMAGE` | `:7-alpine` | SSH -> get exact patch version |
|
|
| 64 | `DWN_SERVER_IMAGE` | `:main` | SSH -> get image digest, pin by SHA or tag |
|
|
| 68 | `NGINX_ALPINE_IMAGE` | `:alpine` | SSH -> get exact version |
|
|
|
|
### Pre-work required
|
|
Run on 192.168.1.228: `podman images --format '{{.Repository}}:{{.Tag}}'` to get exact versions currently deployed. Pin to THOSE — don't upgrade.
|
|
|
|
### Risk: MEDIUM — Must match what's actually running. Wrong pin = containers fail on next creation.
|
|
|
|
---
|
|
|
|
## Item 4: Add CI pipeline for Rust + frontend checks — DONE
|
|
|
|
**Why:** No tests or linting run in CI. Regressions from Items 1-3 (and all future beta fixes) go undetected until they hit the server.
|
|
|
|
### File to create: `.github/workflows/ci.yml`
|
|
|
|
Two parallel jobs:
|
|
1. **`rust`** (ubuntu-latest): `cargo fmt --check` -> `cargo clippy -D warnings` -> `cargo test`
|
|
2. **`frontend`** (ubuntu-latest): `npm ci` -> `npm run type-check` -> `npm test`
|
|
|
|
Trigger: push to `main` + all PRs. Reference existing `build-macos.yml` for action versions (checkout@v4, setup-node@v4 with Node 18).
|
|
|
|
### Risk: LOW — Additive only, new file, doesn't affect existing workflows.
|
|
|
|
---
|
|
|
|
## Execution Order
|
|
|
|
1. **Item 1** (Nostr timeouts) — lowest risk, immediate reliability gain
|
|
2. **Item 2** (crypto pins) — batch with Item 1 for single deploy
|
|
3. **Item 3** (container image pins) — requires SSH query first
|
|
4. **Item 4** (CI) — validates everything, no deploy needed
|
|
|
|
Items 1+2 deploy together. Item 3 deploys separately (script only). Item 4 is push-only.
|
|
|
|
## Verification
|
|
|
|
- Items 1+2: `cargo clippy --all-targets --all-features` on dev server (zero warnings), then deploy + test identity/discovery/marketplace features
|
|
- Item 3: `source scripts/image-versions.sh` + verify all vars have exact patch versions
|
|
- Item 4: Push to branch, verify both CI jobs pass green on GitHub Actions
|
|
|
|
## Deferred (post-beta)
|
|
|
|
- Issue 6: Generate TS types from Rust (ts-rs) — new dependency
|
|
- Issue 7: Consolidate container metadata to single source — structural refactor
|
|
- Issue 8: Split deploy/ISO scripts into modules — already planned in script comments
|
|
- Issue 9: Single app manifest driving all 6+ locations — architectural change
|
|
- Issue 12: useAsyncState composable — touches 14+ views, risky during freeze
|