Architecture review (all P0+P1 issues now fixed): - Add 10s timeout to 6 bare Nostr client.connect() calls - Pin all 12 crypto deps to exact versions from Cargo.lock - Pin all 15 floating container image tags to exact patch versions - Add CI pipeline (cargo fmt + clippy + tests, frontend type-check + build) Self-update system (git.tx1138.com): - scripts/self-update.sh: pull, build, install, restart with rollback - systemd timer checks daily at 3 AM - update.check RPC does git-based checks when repo is present - update.git-apply RPC triggers self-update from UI - Default update URL changed from GitHub to git.tx1138.com - Git added to ISO package list for fresh installs Documentation: - CHANGELOG v1.3.1 with all changes - README updated (version, update system section) - BETA-PROGRESS session #6 logged - architecture-review.html: 4 issues marked FIXED, 8/12 refactoring done Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
6.5 KiB
Architecture Review — Fix Remaining Issues
Context
The architecture review (docs/architecture-review.html) identified 4 P0, 6 P1, and 6 medium-priority issues across the codebase. After research, all 4 P0s and 4 of 6 P1s are already fixed. This plan addresses the remaining open items that improve reliability and security during the beta freeze.
What's already fixed: P0-1 (health RPC), P0-2 (health checks), P0-3 (backup rollback), P0-4 (nginx protections), P1-B (rate limiter cleanup), P1-C (systemd limits), P1-E (WS reconnect), P1-F (Vue error handler), Issue 11 (session async I/O).
What we're fixing now (4 items):
Item 1: Add 10s timeout to 6 bare client.connect() calls — DONE
Why: A down Nostr relay hangs the async task indefinitely, blocking identity publishing, node discovery, and marketplace operations. Direct uptime impact.
Files & locations
| File | Line | Function |
|---|---|---|
core/archipelago/src/identity_manager.rs |
409 | publish_profile() |
core/archipelago/src/nostr_discovery.rs |
113 | publish_node_revocation() |
core/archipelago/src/nostr_discovery.rs |
200 | verify_revocation() |
core/archipelago/src/nostr_discovery.rs |
264 | discover_archipelago_nodes() |
core/archipelago/src/marketplace.rs |
298 | discover() |
core/archipelago/src/marketplace.rs |
406 | publish() |
Pattern (from nostr_handshake.rs:126)
Replace each client.connect().await; with:
if tokio::time::timeout(Duration::from_secs(10), client.connect()).await.is_err() {
tracing::warn!("Nostr relay connection timed out after 10s, continuing anyway");
}
Ensure use std::time::Duration; is imported in each file. tracing::warn! is already available in all three files.
Risk: LOW — Mechanical pattern replication, no logic changes.
Item 2: Pin all crypto dependency versions exactly — DONE
Why: Floating versions ("2.1" instead of "2.2.0") allow cargo update to silently change crypto libraries. Supply chain risk + project rules violation.
Versions (verified from Cargo.lock)
core/archipelago/Cargo.toml:
| Line | Current | Pin to |
|---|---|---|
| 44 | sha2 = "0.10" |
"0.10.9" |
| 45 | hmac = "0.12" |
"0.12.1" |
| 50 | ed25519-dalek = { version = "2.1", ... } |
version = "2.2.0" |
| 51 | curve25519-dalek = "4" |
"4.1.3" |
| 52 | rand = "0.8" |
"0.8.5" |
| 69 | argon2 = "0.5" |
"0.5.3" |
| 70 | chacha20poly1305 = "0.10" |
"0.10.1" |
| 81 | zeroize = { version = "1.7", ... } |
version = "1.8.2" |
| 92 | hkdf = "0.12" |
"0.12.4" |
core/security/Cargo.toml:
| Line | Current | Pin to |
|---|---|---|
| 16 | aes-gcm = "0.10" |
"0.10.3" |
| 17 | rand = "0.8" |
"0.8.5" |
| 19 | zeroize = { version = "1", ... } |
version = "1.8.2" |
Note: core/models/Cargo.toml has ed25519-dalek = "2.0.0" but this crate is NOT in the workspace — it's dead code. Skip it.
Risk: LOW — Pins to versions already resolved in Cargo.lock. No actual dependency changes.
Item 3: Pin all floating container image tags — DONE
Why: Floating tags (:1, :7, :alpine, :main) mean two installs a week apart get different software. Supply chain risk and a support nightmare.
File: scripts/image-versions.sh
| Line | Variable | Current Tag | Action |
|---|---|---|---|
| 16 | MARIADB_IMAGE |
:11.4 |
SSH -> get exact patch version |
| 21 | POSTGRES_IMAGE |
:15 |
SSH -> get exact patch version |
| 22 | BTCPAY_POSTGRES_IMAGE |
:15 |
SSH -> get exact patch version |
| 25 | HOMEASSISTANT_IMAGE |
:2024.12 |
SSH -> get exact patch version |
| 27 | UPTIME_KUMA_IMAGE |
:1 |
SSH -> get exact patch version |
| 32 | NEXTCLOUD_IMAGE |
:29 |
SSH -> get exact patch version |
| 34 | ONLYOFFICE_IMAGE |
:8.2 |
SSH -> get exact patch version |
| 35 | FILEBROWSER_IMAGE |
:v2 |
SSH -> get exact patch version |
| 36 | NPM_IMAGE |
:2 |
SSH -> get exact patch version |
| 49 | REDIS_IMAGE |
:7 |
SSH -> get exact patch version |
| 52 | VALKEY_IMAGE |
:8 |
SSH -> get exact patch version |
| 60 | INDEEDHUB_POSTGRES_IMAGE |
:16-alpine |
SSH -> get exact patch version |
| 61 | INDEEDHUB_REDIS_IMAGE |
:7-alpine |
SSH -> get exact patch version |
| 64 | DWN_SERVER_IMAGE |
:main |
SSH -> get image digest, pin by SHA or tag |
| 68 | NGINX_ALPINE_IMAGE |
:alpine |
SSH -> get exact version |
Pre-work required
Run on 192.168.1.228: podman images --format '{{.Repository}}:{{.Tag}}' to get exact versions currently deployed. Pin to THOSE — don't upgrade.
Risk: MEDIUM — Must match what's actually running. Wrong pin = containers fail on next creation.
Item 4: Add CI pipeline for Rust + frontend checks — DONE
Why: No tests or linting run in CI. Regressions from Items 1-3 (and all future beta fixes) go undetected until they hit the server.
File to create: .github/workflows/ci.yml
Two parallel jobs:
rust(ubuntu-latest):cargo fmt --check->cargo clippy -D warnings->cargo testfrontend(ubuntu-latest):npm ci->npm run type-check->npm test
Trigger: push to main + all PRs. Reference existing build-macos.yml for action versions (checkout@v4, setup-node@v4 with Node 18).
Risk: LOW — Additive only, new file, doesn't affect existing workflows.
Execution Order
- Item 1 (Nostr timeouts) — lowest risk, immediate reliability gain
- Item 2 (crypto pins) — batch with Item 1 for single deploy
- Item 3 (container image pins) — requires SSH query first
- Item 4 (CI) — validates everything, no deploy needed
Items 1+2 deploy together. Item 3 deploys separately (script only). Item 4 is push-only.
Verification
- Items 1+2:
cargo clippy --all-targets --all-featureson dev server (zero warnings), then deploy + test identity/discovery/marketplace features - Item 3:
source scripts/image-versions.sh+ verify all vars have exact patch versions - Item 4: Push to branch, verify both CI jobs pass green on GitHub Actions
Deferred (post-beta)
- Issue 6: Generate TS types from Rust (ts-rs) — new dependency
- Issue 7: Consolidate container metadata to single source — structural refactor
- Issue 8: Split deploy/ISO scripts into modules — already planned in script comments
- Issue 9: Single app manifest driving all 6+ locations — architectural change
- Issue 12: useAsyncState composable — touches 14+ views, risky during freeze