# Multi-Node Architecture

## Overview

Archipelago supports federation: multiple nodes can form a trusted cluster to share status, deploy apps remotely, and coordinate services. This document describes the architecture for multi-node orchestration.

## Discovery & Trust Model

### Node Discovery

Nodes discover each other through two complementary channels:

1. **Nostr Relay Discovery**: Each node publishes its identity (DID, onion address, pubkey) to configured Nostr relays as a NIP-78 application-specific event. Other nodes query the relays to find peers.

2. **Direct Invite**: A node generates an invite code containing its DID, onion address, and a one-time authentication token. The recipient node uses this code to establish a direct connection.

**Transport (Tor Hidden Services)**: All inter-node communication, whichever channel discovered the peer, uses Tor hidden services (.onion addresses) for privacy and NAT traversal.
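
The direct-invite channel can be illustrated with a small sketch. The encoding below (compact JSON wrapped in unpadded URL-safe base64) and the names `make_invite` and `parse_invite` are assumptions for illustration only; the document does not specify the actual invite format.

```python
import base64
import json
import secrets


def make_invite(did: str, onion: str) -> tuple[str, str]:
    """Return (invite_code, token); the issuer keeps the token to check it later."""
    token = secrets.token_urlsafe(32)  # one-time authentication token
    payload = json.dumps(
        {"did": did, "onion": onion, "token": token}, separators=(",", ":")
    ).encode()
    # Hypothetical wire format: URL-safe base64 without padding,
    # which is easy to paste or render as a QR code.
    code = base64.urlsafe_b64encode(payload).rstrip(b"=").decode()
    return code, token


def parse_invite(code: str) -> dict:
    """Decode an invite code and check that the three expected fields exist."""
    padded = code + "=" * (-len(code) % 4)  # restore stripped padding
    data = json.loads(base64.urlsafe_b64decode(padded))
    if not isinstance(data, dict):
        raise ValueError("invite payload is not an object")
    missing = {"did", "onion", "token"} - set(data)
    if missing:
        raise ValueError(f"invite is missing fields: {sorted(missing)}")
    return data
```

The recipient would pass the pasted code to `parse_invite` and then contact the onion address it contains, presenting the token.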

### Trust Establishment

Federation uses a mutual DID verification model:

```
Node A                                          Node B
│                                                 │
│── federation.invite (generates invite code) ──► │
│                                                 │
│ ◄── federation.join (presents invite + DID) ──  │
│                                                 │
│── Verify Node B's DID Document over Tor ──────► │
│ ◄── Verify Node A's DID Document over Tor ──    │
│                                                 │
│── Exchange signed challenge/response ─────────► │
│ ◄── Exchange signed challenge/response ──────   │
│                                                 │
│  [Mutual trust established]                     │
│  [Both nodes add each other to federation]      │
```

**Trust Levels**:

- `trusted`: Full federation. Can deploy apps, sync state, and see all container statuses.
- `observer`: Read-only. Can see status but cannot deploy or modify.
- `untrusted`: Discovered but not yet verified; pending invite acceptance.
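
One way to express these levels in code is a small permission table. This is a minimal sketch: the action names (`view_status`, `sync_state`, `deploy_app`) and the `is_allowed` helper are hypothetical, chosen only to mirror the three descriptions above.

```python
from enum import Enum


class TrustLevel(Enum):
    TRUSTED = "trusted"
    OBSERVER = "observer"
    UNTRUSTED = "untrusted"


# Hypothetical action names mapped from the trust-level descriptions.
PERMISSIONS = {
    TrustLevel.TRUSTED: {"view_status", "sync_state", "deploy_app"},
    TrustLevel.OBSERVER: {"view_status"},
    TrustLevel.UNTRUSTED: set(),  # discovered only; nothing is allowed yet
}


def is_allowed(level: TrustLevel, action: str) -> bool:
    """Return True if a peer at this trust level may perform the action."""
    return action in PERMISSIONS[level]
```

Keeping the table in one place means a change to what observers may do touches a single definition rather than every RPC handler.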

### ADR: Decentralized Trust over Centralized Authority

**Decision**: Use DID-based mutual verification instead of a central authority or PKI.

**Context**: Archipelago nodes are sovereign, so no central server should control trust. Each node maintains its own trust list.

**Consequences**:

- (+) No single point of failure for trust
- (+) Nodes can federate without any central infrastructure (direct Tor connection between peers)
- (+) Consistent with the DID identity model already in use
- (-) No global revocation mechanism (each node manages its own trust)
- (-) Trust is bilateral and non-transitive: A trusting B does not imply that C trusts B

## Shared State Protocol

### State Sync

Federated nodes periodically sync their state. Each node exposes a state summary via its RPC endpoint, accessible only to trusted federation peers.

**Synced data**:

- Container/app statuses (installed, running, stopped, version)
- Node health (CPU, memory, disk, uptime)
- Available storage capacity
- Tor hidden service status
- Lightning Network status (channels, capacity)

**Not synced** (privacy):

- Credentials and secrets
- Private keys
- Session data
- User passwords
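
The split between synced and private data can be enforced with an allowlist, so that anything not explicitly named never leaves the node. The sketch below assumes the local state is a plain dictionary; the key names in `SYNCED_KEYS` and the function `build_snapshot` are illustrative, not the project's actual schema.

```python
# Hypothetical top-level keys corresponding to the "Synced data" list.
SYNCED_KEYS = ("apps", "health", "storage", "tor", "lightning")


def build_snapshot(local_state: dict) -> dict:
    """Copy only allowlisted keys, so secrets are excluded by default."""
    return {key: local_state[key] for key in SYNCED_KEYS if key in local_state}
```

An allowlist fails closed: a new field added to the local state stays private until someone deliberately adds it to `SYNCED_KEYS`.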

### Sync Protocol

```
Every 5 minutes (configurable):
  For each federated node:
    1. POST to peer's /rpc/ endpoint: federation.get-state
    2. Authenticate with signed challenge (DID key)
    3. Receive state snapshot
    4. Store in local federation cache
    5. Broadcast changes via WebSocket to local UI
```
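
The loop above could be sketched as follows. The transport is injected as `fetch_state` (standing in for steps 1 to 3, the authenticated POST over Tor) and the UI broadcast as `notify` (step 5); both callables, and the choice to treat `OSError` as "peer unreachable", are assumptions for illustration.

```python
import time
from typing import Callable, Iterable


def sync_once(
    peers: Iterable[str],
    fetch_state: Callable[[str], dict],
    cache: dict,
    notify: Callable[[list], None],
) -> list:
    """Run one sync pass; return the DIDs whose cached state changed."""
    changed = []
    for did in peers:
        try:
            snapshot = fetch_state(did)  # steps 1-3: authenticated request
        except OSError:
            continue  # peer unreachable: keep the previous cached snapshot
        if cache.get(did) != snapshot:
            cache[did] = snapshot  # step 4: update local federation cache
            changed.append(did)
    if changed:
        notify(changed)  # step 5: push changes to the local UI
    return changed


def run_sync_loop(peers, fetch_state, cache, notify, interval_s: float = 300.0):
    """Repeat sync_once every interval_s seconds (5 minutes by default)."""
    while True:
        sync_once(peers, fetch_state, cache, notify)
        time.sleep(interval_s)
```

Separating `sync_once` from the timer also gives a manual trigger such as `federation.sync-state` something to call directly.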

### State Storage

```
/var/lib/archipelago/federation/
├── nodes.json              # List of federated nodes with trust levels
├── state-cache/
│   ├── <node-did>.json     # Latest state snapshot from each peer
│   └── ...
└── invites/
    ├── pending.json        # Outgoing invites awaiting acceptance
    └── received.json       # Incoming invites awaiting approval
```
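
A minimal sketch of writing a peer snapshot into this layout is shown below. The helper names `snapshot_path` and `save_snapshot`, the rejection of DIDs containing path separators, and the write-then-rename step are assumptions added for illustration; only the directory layout comes from the tree above.

```python
import json
from pathlib import Path

FEDERATION_DIR = Path("/var/lib/archipelago/federation")


def snapshot_path(did: str, root: Path = FEDERATION_DIR) -> Path:
    """Map a peer DID to state-cache/<node-did>.json under the root."""
    # The DID comes from a remote peer, so refuse anything that could
    # escape the cache directory.
    if not did or did in (".", "..") or "/" in did or "\0" in did:
        raise ValueError(f"unsafe DID for a file name: {did!r}")
    return root / "state-cache" / f"{did}.json"


def save_snapshot(did: str, snapshot: dict, root: Path = FEDERATION_DIR) -> Path:
    """Write the latest snapshot for a peer, replacing any previous one."""
    path = snapshot_path(did, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(snapshot))
    tmp.replace(path)  # rename so readers never see a half-written file
    return path
```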

## RPC Endpoints

### Federation Management

| Method | Description | Auth |
|--------|-------------|------|
| `federation.invite` | Generate invite code for a new peer | Local |
| `federation.join` | Accept an invite and establish federation | Local |
| `federation.list-nodes` | List all federated nodes with status | Local |
| `federation.remove-node` | Remove a node from federation | Local |
| `federation.set-trust` | Change trust level for a federated node | Local |

### Federation Data Exchange

| Method | Description | Auth |
|--------|-------------|------|
| `federation.get-state` | Return node's state snapshot | Federation peer |
| `federation.deploy-app` | Request remote app installation | Trusted peer |
| `federation.sync-state` | Trigger manual state sync | Local |

### Authentication for Inter-Node RPC

Federation RPC calls between nodes use DID-based authentication:

1. The caller includes an `X-Federation-DID` header containing its DID
2. The caller includes an `X-Federation-Sig` header containing a signed timestamp
3. The receiver verifies that the DID is in its trusted federation list
4. The receiver verifies the signature using the DID's public key
5. The receiver rejects timestamps more than 5 minutes from its own clock, which limits the window for replay attacks
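
The receiver-side checks (steps 3 to 5) might look like the sketch below. Two things are assumed rather than taken from the document: the header value layout `<unix-timestamp>.<signature>`, and the `verify_sig` callable, which stands in for real signature verification against the key from the peer's DID document.

```python
import time
from typing import Callable, Mapping, Optional

MAX_SKEW_S = 300  # 5-minute window from step 5


def verify_request(
    headers: Mapping[str, str],
    trusted_keys: Mapping[str, str],
    verify_sig: Callable[[str, bytes, str], bool],
    now: Optional[float] = None,
) -> bool:
    """Return True only if the request passes the DID, signature and time checks."""
    did = headers.get("X-Federation-DID")
    sig_header = headers.get("X-Federation-Sig")
    if not did or not sig_header:
        return False
    public_key = trusted_keys.get(did)  # step 3: DID must be a known peer
    if public_key is None:
        return False
    # Assumed layout of the header value: "<unix-timestamp>.<signature>"
    ts_text, sep, signature = sig_header.partition(".")
    if not sep or not (ts_text.isascii() and ts_text.isdigit()):
        return False
    if not verify_sig(public_key, ts_text.encode(), signature):  # step 4
        return False
    current = time.time() if now is None else now
    return abs(current - int(ts_text)) <= MAX_SKEW_S  # step 5
```

Checking the trust list before the signature avoids spending verification work on unknown callers.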

## Federated App Deployment

### Flow

```
Local Node                   Remote Node
│                                 │
│── federation.deploy-app ──────► │
│   {app_id, version, config}     │
│                                 │
│ [Remote verifies trust level]   │
│ [Remote checks if app exists]   │
│ [Remote pulls container image]  │
│ [Remote starts container]       │
│                                 │
│ ◄── Status update via sync ──   │
│   {app_id: "running"}           │
```

### Constraints

- Only `trusted` peers can deploy apps to each other
- Remote node can reject deployment (insufficient resources, policy)
- Container images are pulled from registry, not transferred between nodes
- App configuration is sent with the deploy command
- Remote node applies its own security policies (AppArmor, capabilities)
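
The remote node's admission decision can be sketched as a pure function that runs before any image is pulled. The `DeployRequest` fields follow the `{app_id, version, config}` payload in the flow; `check_deploy`, the `known_apps` catalog (one reading of "checks if app exists"), and the disk-space check as the resource test are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DeployRequest:
    app_id: str
    version: str
    config: dict = field(default_factory=dict)


def check_deploy(
    request: DeployRequest,
    peer_trust: str,
    known_apps: set,
    free_bytes: int,
    needed_bytes: int,
) -> tuple:
    """Return (accepted, reason) for an incoming federation.deploy-app call."""
    if peer_trust != "trusted":
        return False, "peer is not trusted"
    if request.app_id not in known_apps:
        return False, "unknown app"
    if needed_bytes > free_bytes:
        return False, "insufficient resources"
    return True, "accepted"
```

Returning a reason alongside the verdict lets the calling node show why a deployment was refused.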

## UI: Federation Dashboard

**Route**: `/dashboard/server/federation`

**Components**:

1. **Node List**: Table of federated nodes showing:
   - Node name (DID-derived or custom alias)
   - Status: online/offline (based on last successful sync)
   - Trust level badge (trusted/observer)
   - App count, resource usage summary
   - Last seen timestamp

2. **Add Node**: Form with invite code input or QR code scanner

3. **Node Detail Modal**: Clicking a node shows:
   - Full DID and onion address
   - Container/app list with statuses
   - Resource usage (CPU, memory, disk)
   - Deploy app button (if trusted)
   - Change trust level / remove node

## Security Considerations

1. **All federation traffic over Tor**: Prevents IP address leakage between nodes
2. **DID-based auth**: No shared secrets; each node proves identity with its key
3. **Replay protection**: Signed timestamps limit replay of a captured request to the 5-minute validity window
4. **Trust is bilateral**: Both nodes must agree to federate
5. **App deployment is opt-in**: Remote node can refuse deployment requests
6. **State snapshots are read-only**: A compromised peer cannot modify another node's state through state sync
7. **Invite codes are single-use**: Once accepted, the invite token is invalidated
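
The single-use property in item 7 can be illustrated with a small in-memory store. The `InviteStore` class is hypothetical; a real node would presumably persist pending tokens (for example alongside its other invite state) rather than hold them in memory.

```python
class InviteStore:
    """Tracks outstanding invite tokens; each token can be redeemed once."""

    def __init__(self) -> None:
        self._pending = set()

    def add(self, token: str) -> None:
        """Record a freshly issued invite token."""
        self._pending.add(token)

    def redeem(self, token: str) -> bool:
        """Accept a token if it is outstanding, and invalidate it immediately."""
        if token in self._pending:
            self._pending.remove(token)
            return True
        return False
```

A second `redeem` call with the same token returns `False`, so a leaked or intercepted invite code is useless once the intended peer has joined.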