# Multi-Node Architecture
## Overview
Archipelago supports federation — multiple nodes can form a trusted cluster to share status, deploy apps remotely, and coordinate services. This document describes the architecture for multi-node orchestration.
## Discovery & Trust Model
### Node Discovery
All inter-node communication uses Tor hidden services (.onion addresses) for privacy and NAT traversal. On top of that transport, nodes discover each other through two complementary channels:
1. **Nostr Relay Discovery**: Each node publishes its identity (DID, onion address, pubkey) to configured Nostr relays as a NIP-78 application-specific event. Other nodes query relays to find peers.
2. **Direct Invite**: A node generates an invite code containing its DID, onion address, and a one-time authentication token. The recipient node uses this code to establish a direct connection.
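
The invite code described above carries three fields: the inviter's DID, its onion address, and a one-time token. The sketch below shows one way such a code could be packed and unpacked. The wire format (base64url-encoded JSON) and the names `Invite`, `new_invite`, `encode_invite`, and `decode_invite` are assumptions made for illustration, not the format Archipelago actually uses.

```python
import base64
import json
import secrets
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Invite:
    did: str    # inviter's DID
    onion: str  # inviter's .onion address
    token: str  # one-time authentication token


def new_invite(did: str, onion: str) -> Invite:
    """Create an invite with a fresh random one-time token."""
    return Invite(did=did, onion=onion, token=secrets.token_urlsafe(32))


def encode_invite(invite: Invite) -> str:
    """Serialize an invite to a URL-safe string (hypothetical wire format)."""
    payload = json.dumps(asdict(invite), sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_invite(code: str) -> Invite:
    """Parse an invite code, raising ValueError on malformed input."""
    try:
        fields = json.loads(base64.urlsafe_b64decode(code.encode("ascii")))
        return Invite(did=fields["did"], onion=fields["onion"], token=fields["token"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("malformed invite code") from exc
```

A code produced this way is only a carrier for the three fields. The receiving node would still need to verify the inviter's DID document before trusting anything in it.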
### Trust Establishment
Federation uses a mutual DID verification model:
```
Node A                                       Node B
│                                                 │
│── federation.invite (generates invite code) ──► │
│                                                 │
│ ◄── federation.join (presents invite + DID) ─── │
│                                                 │
│── Verify Node B's DID Document over Tor ──────► │
│ ◄── Verify Node A's DID Document over Tor ───── │
│                                                 │
│── Exchange signed challenge/response ─────────► │
│ ◄── Exchange signed challenge/response ──────── │
│                                                 │
│           [Mutual trust established]            │
│    [Both nodes add each other to federation]    │
```
**Trust Levels**:
- `trusted`: Full federation — can deploy apps, sync state, see all container statuses
- `observer`: Read-only — can see status but cannot deploy or modify
- `untrusted`: Discovered but not yet verified — pending invite acceptance
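
The three trust levels map naturally onto a small set of permission checks. The following is a minimal sketch of that mapping. The names `TrustLevel`, `can_read_state`, and `can_deploy` are hypothetical and chosen only for this example.

```python
from enum import Enum


class TrustLevel(Enum):
    TRUSTED = "trusted"      # full federation
    OBSERVER = "observer"    # read-only
    UNTRUSTED = "untrusted"  # discovered, not yet verified


def can_read_state(level: TrustLevel) -> bool:
    """Trusted peers and observers may see status information."""
    return level in (TrustLevel.TRUSTED, TrustLevel.OBSERVER)


def can_deploy(level: TrustLevel) -> bool:
    """Only fully trusted peers may deploy or modify."""
    return level is TrustLevel.TRUSTED
```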
### ADR: Decentralized Trust over Centralized Authority
**Decision**: Use DID-based mutual verification instead of a central authority or PKI.
**Context**: Archipelago nodes are sovereign — no central server should control trust. Each node maintains its own trust list.
**Consequences**:
- (+) No single point of failure for trust
- (+) Nodes can federate without any central service (a direct Tor connection between the two nodes is enough)
- (+) Consistent with the DID identity model already in use
- (-) No global revocation mechanism (each node manages its own trust)
- (-) Trust is bilateral and not transitive: A trusting B doesn't imply that C trusts B
## Shared State Protocol
### State Sync
Federated nodes periodically sync their state. Each node exposes a state summary via its RPC endpoint, accessible only to trusted federation peers.
**Synced data**:
- Container/app statuses (installed, running, stopped, version)
- Node health (CPU, memory, disk, uptime)
- Available storage capacity
- Tor hidden service status
- Lightning Network status (channels, capacity)
**Not synced** (privacy):
- Credentials and secrets
- Private keys
- Session data
- User passwords
### Sync Protocol
```
Every 5 minutes (configurable):
  For each federated node:
    1. POST to peer's /rpc/ endpoint: federation.get-state
    2. Authenticate with signed challenge (DID key)
    3. Receive state snapshot
    4. Store in local federation cache
    5. Broadcast changes via WebSocket to local UI
```
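
One pass of the loop above can be sketched as a plain function. This is an illustration under stated assumptions: `fetch_state` stands in for the authenticated `federation.get-state` call (steps 1 to 3), `notify` stands in for the WebSocket broadcast (step 5), and both are hypothetical callables supplied by the caller. Skipping an unreachable peer and keeping its last snapshot is also an assumption, since the document does not say how failures are handled.

```python
from typing import Callable, Dict, Iterable, List

State = Dict[str, object]


def sync_once(
    peers: Iterable[str],
    fetch_state: Callable[[str], State],
    cache: Dict[str, State],
    notify: Callable[[str, State], None],
) -> List[str]:
    """Run one sync pass and return the DIDs whose snapshot changed."""
    changed = []
    for did in peers:
        try:
            snapshot = fetch_state(did)  # steps 1-3: authenticated request
        except OSError:
            continue  # peer unreachable: keep the last snapshot, retry next pass
        if cache.get(did) != snapshot:
            cache[did] = snapshot  # step 4: store in the local cache
            notify(did, snapshot)  # step 5: push the change to the local UI
            changed.append(did)
    return changed
```

A scheduler would call `sync_once` every five minutes, or whatever interval is configured.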
### State Storage
```
/var/lib/archipelago/federation/
├── nodes.json              # List of federated nodes with trust levels
├── state-cache/
│   ├── <node-did>.json     # Latest state snapshot from each peer
│   └── ...
└── invites/
    ├── pending.json        # Outgoing invites awaiting acceptance
    └── received.json       # Incoming invites awaiting approval
```
```
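
Reading and writing the per-peer snapshot files in `state-cache/` could look roughly like the sketch below. The helper names (`snapshot_path`, `write_snapshot`, `read_snapshot`), the write-then-rename strategy, and the rejection of DIDs containing `/` are assumptions added for the example. The document specifies only the directory layout.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def snapshot_path(root: Path, node_did: str) -> Path:
    """Map a peer DID to state-cache/<node-did>.json under the federation root."""
    if "/" in node_did or node_did in ("", ".", ".."):
        raise ValueError("DID is not usable as a file name")
    return root / "state-cache" / f"{node_did}.json"


def write_snapshot(root: Path, node_did: str, snapshot: dict) -> Path:
    """Write a snapshot via a temp file and rename, so readers never see a partial file."""
    target = snapshot_path(root, node_did)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle)
        os.replace(tmp_name, target)  # atomic on POSIX file systems
    except BaseException:
        os.unlink(tmp_name)  # clean up the temp file if anything failed
        raise
    return target


def read_snapshot(root: Path, node_did: str) -> Optional[dict]:
    """Return the cached snapshot for a peer, or None if none has been stored."""
    try:
        return json.loads(snapshot_path(root, node_did).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return None
```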
## RPC Endpoints
### Federation Management
| Method | Description | Auth |
|--------|-------------|------|
| `federation.invite` | Generate invite code for a new peer | Local |
| `federation.join` | Accept an invite and establish federation | Local |
| `federation.list-nodes` | List all federated nodes with status | Local |
| `federation.remove-node` | Remove a node from federation | Local |
| `federation.set-trust` | Change trust level for a federated node | Local |
### Federation Data Exchange
| Method | Description | Auth |
|--------|-------------|------|
| `federation.get-state` | Return node's state snapshot | Federation peer |
| `federation.deploy-app` | Request remote app installation | Trusted peer |
| `federation.sync-state` | Trigger manual state sync | Local |
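
The Auth column in the two tables can be expressed as a lookup that a request handler consults before dispatching. The sketch below is illustrative only. `METHOD_AUTH` and `is_allowed` are hypothetical names, and treating "Federation peer" as "trusted or observer" is an assumption based on the trust levels defined earlier.

```python
from typing import Optional

# Auth requirement per method, copied from the tables above.
METHOD_AUTH = {
    "federation.invite": "local",
    "federation.join": "local",
    "federation.list-nodes": "local",
    "federation.remove-node": "local",
    "federation.set-trust": "local",
    "federation.get-state": "peer",
    "federation.deploy-app": "trusted",
    "federation.sync-state": "local",
}


def is_allowed(method: str, is_local: bool, trust_level: Optional[str] = None) -> bool:
    """Decide whether a caller may invoke a federation method."""
    required = METHOD_AUTH.get(method)
    if required is None:
        return False  # unknown method
    if required == "local":
        return is_local
    if required == "trusted":
        return trust_level == "trusted"
    # "peer": any verified federation member (assumed: trusted or observer)
    return trust_level in ("trusted", "observer")
```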
### Authentication for Inter-Node RPC
Federation RPC calls between nodes use DID-based authentication:
1. Caller includes `X-Federation-DID` header with their DID
2. Caller includes `X-Federation-Sig` header with a signed timestamp
3. Receiver verifies the DID is in their trusted federation list
4. Receiver verifies the signature using the DID's public key
5. Receiver rejects timestamps more than 5 minutes from its own clock, which limits the window for replay attacks
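
The receiver-side checks can be sketched as follows. Several details are assumptions made so the example is concrete: the `X-Federation-Sig` value is taken to be `<unix-timestamp>.<hex-signature>`, the signature is taken to cover the timestamp text, and `verify_sig` is a caller-supplied function standing in for whatever signature scheme the DID key uses. None of these are confirmed by this document.

```python
import time
from typing import Callable, Mapping, Optional

MAX_SKEW_SECONDS = 5 * 60  # the 5-minute window from step 5


def verify_request(
    headers: Mapping[str, str],
    trusted_keys: Mapping[str, bytes],
    verify_sig: Callable[[bytes, bytes, bytes], bool],
    now: Optional[float] = None,
) -> bool:
    """Apply the DID, signature, and timestamp checks to one request."""
    did = headers.get("X-Federation-DID")
    public_key = trusted_keys.get(did) if did else None
    if public_key is None:
        return False  # step 3: DID is not in the federation list

    # Assumed header layout: "<unix-timestamp>.<hex-signature>"
    timestamp_text, sep, sig_hex = headers.get("X-Federation-Sig", "").partition(".")
    if not sep:
        return False
    try:
        timestamp = int(timestamp_text)
        signature = bytes.fromhex(sig_hex)
    except ValueError:
        return False

    # step 4: signature over the timestamp, checked with the DID's public key
    if not verify_sig(public_key, timestamp_text.encode("utf-8"), signature):
        return False

    # step 5: reject stale or far-future timestamps
    current = time.time() if now is None else now
    return abs(current - timestamp) <= MAX_SKEW_SECONDS
```

Passing `now` explicitly keeps the function easy to test. In production the default `time.time()` would be used.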
## Federated App Deployment
### Flow
```
Local Node              Remote Node
│                                 │
│── federation.deploy-app ──────► │
│   {app_id, version, config}     │
│                                 │
│  [Remote verifies trust level]  │
│  [Remote checks if app exists]  │
│  [Remote pulls container image] │
│  [Remote starts container]      │
│                                 │
│ ◄── Status update via sync ──── │
│     {app_id: "running"}         │
```
### Constraints
- Only `trusted` peers can deploy apps to each other
- Remote node can reject deployment (insufficient resources, policy)
- Container images are pulled from registry, not transferred between nodes
- App configuration is sent with the deploy command
- Remote node applies its own security policies (AppArmor, capabilities)
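
The remote node's accept-or-reject decision could be sketched like this. The function and parameter names (`check_deploy`, `free_disk_bytes`, `required_disk_bytes`, `allowed_apps`) are hypothetical, and using free disk space as the only resource check is a simplification for the example.

```python
from dataclasses import dataclass
from typing import Collection, Optional


@dataclass(frozen=True)
class DeployRequest:
    app_id: str
    version: str
    config: dict  # app configuration sent with the deploy command


def check_deploy(
    request: DeployRequest,
    caller_trust: str,
    free_disk_bytes: int,
    required_disk_bytes: int,
    allowed_apps: Optional[Collection[str]] = None,
) -> Optional[str]:
    """Return None to accept the deployment, or a reason string to reject it."""
    if caller_trust != "trusted":
        return "caller is not a trusted peer"
    if allowed_apps is not None and request.app_id not in allowed_apps:
        return "app not permitted by local policy"
    if free_disk_bytes < required_disk_bytes:
        return "insufficient resources"
    return None
```

Only after this check passes would the remote node pull the image from its registry and start the container under its own security policies.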
## UI: Federation Dashboard
**Route**: `/dashboard/server/federation`
**Components**:
1. **Node List**: Table of federated nodes showing:
   - Node name (DID-derived or custom alias)
   - Status: online/offline (based on last successful sync)
   - Trust level badge (trusted/observer)
   - App count, resource usage summary
   - Last seen timestamp
2. **Add Node**: Form with invite code input or QR code scanner
3. **Node Detail Modal**: Clicking a node shows:
   - Full DID and onion address
   - Container/app list with statuses
   - Resource usage (CPU, memory, disk)
   - Deploy app button (if trusted)
   - Change trust level / remove node
## Security Considerations
1. **All federation traffic over Tor**: Prevents IP address leakage between nodes
2. **DID-based auth**: No shared secrets; each node proves identity with its key
3. **Replay protection**: Signed timestamps limit replay of a captured request to the 5-minute validity window
4. **Trust is bilateral**: Both nodes must agree to federate
5. **App deployment is opt-in**: Remote node can refuse deployment requests
6. **State snapshots are read-only**: `federation.get-state` only returns data, so a compromised peer cannot use state sync to modify another node's state
7. **Invite codes are single-use**: Once accepted, the invite token is invalidated
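
Single-use invite tokens (item 7) amount to a set of pending tokens from which an entry is removed the first time it is presented. Below is a minimal in-memory sketch. `InviteRegistry` is a hypothetical name, and a real node would presumably persist pending tokens (for example in `invites/pending.json`) rather than hold them only in memory.

```python
import hmac
from typing import Optional, Set


class InviteRegistry:
    """Tracks outstanding invite tokens; each can be redeemed at most once."""

    def __init__(self) -> None:
        self._pending: Set[str] = set()

    def issue(self, token: str) -> None:
        """Record a newly generated invite token as pending."""
        self._pending.add(token)

    def redeem(self, presented: str) -> bool:
        """Accept a token once; later attempts with the same token fail."""
        candidate = presented.encode("utf-8")
        match: Optional[str] = None
        for token in self._pending:
            # Constant-time comparison avoids leaking token contents via timing.
            if hmac.compare_digest(token.encode("utf-8"), candidate):
                match = token
        if match is None:
            return False
        self._pending.remove(match)  # invalidate on first use
        return True
```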