# App Manifest Specification
## Overview
App manifests define containerized applications in Archipelago. They use YAML format and specify container configuration, dependencies, resources, security policies, and integration metadata.
## File Location
Each app manifest is stored at `apps/{app-id}/manifest.yml`.
## Schema
### Required Fields
```yaml
app:
  id: string  # Unique app identifier (lowercase, kebab-case)
  name: string  # Human-readable name
  version: string  # Semantic version (e.g., "1.0.0")
  container:
    image: string  # Container image (e.g., "bitcoin/bitcoin:26.0")
```
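The `id` and `version` constraints noted in the comments above can be checked mechanically. The following is a minimal Python sketch, not Archipelago's actual validator. It assumes that kebab-case means lowercase alphanumeric words joined by single hyphens and that `version` uses the plain MAJOR.MINOR.PATCH form without pre-release or build suffixes. The function names are hypothetical.

```python
import re

# Assumption: kebab-case = lowercase letters/digits in hyphen-separated words.
APP_ID_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
# Assumption: plain MAJOR.MINOR.PATCH, no leading zeros, no suffixes.
SEMVER_RE = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def is_valid_app_id(app_id: str) -> bool:
    """Return True if app_id is a lowercase kebab-case identifier."""
    return APP_ID_RE.fullmatch(app_id) is not None


def is_valid_version(version: str) -> bool:
    """Return True if version has the form MAJOR.MINOR.PATCH."""
    return SEMVER_RE.fullmatch(version) is not None
```

Under these assumptions, `is_valid_app_id("bitcoin-core")` is true, while `is_valid_version("26.0")` is false because the PATCH component is missing.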
### Optional Fields
```yaml
app:
  description: string  # App description
  container:
    image_signature: string  # Cosign signature URL (e.g., "cosign://...")
    pull_policy: string  # "if-not-present" | "always" | "never"
  dependencies:
    - storage: string  # Minimum disk space (e.g., "500Gi")
    - app_id: string  # Required app dependency
      version: string  # Version constraint (e.g., ">=26.0")
    - string  # Simple app dependency
  resources:
    cpu_limit: number  # CPU cores (e.g., 2)
    memory_limit: string  # Memory limit (e.g., "2Gi", "512Mi")
    disk_limit: string  # Disk limit (e.g., "500Gi")
  security:
    capabilities: [string]  # Linux capabilities (e.g., ["NET_BIND_SERVICE"])
    readonly_root: boolean  # Read-only root filesystem (default: true)
    network_policy: string  # "isolated" | "host" | network name
    apparmor_profile: string  # AppArmor profile name
  ports:
    - host: number  # Host port
      container: number  # Container port
      protocol: string  # "tcp" | "udp" (default: "tcp")
  volumes:
    - type: string  # "bind" | "tmpfs" | "volume"
      source: string  # Host path
      target: string  # Container path
      options: [string]  # Mount options (e.g., ["rw", "noexec"])
  environment:
    - string  # Environment variable (e.g., "NETWORK=mainnet")
  devices:
    - string  # Device path (e.g., "/dev/ttyUSB0")
  health_check:
    type: string  # "http" | "exec"
    endpoint: string  # HTTP URL or command
    path: string  # HTTP path (for http type)
    interval: string  # Check interval (e.g., "30s")
    timeout: string  # Timeout (e.g., "5s")
    retries: number  # Failure retries (default: 3)
  # Integration-specific metadata
  bitcoin_integration:
    rpc_access: string  # "admin" | "read-only"
    sync_required: boolean  # Requires synced node
    testnet_support: boolean
    pruning_support: boolean
  lightning_integration:
    channel_management: boolean
    payment_routing: boolean
  nostr_integration:
    relay_type: string  # "public" | "private"
    monetization_enabled: boolean
    event_storage: string  # "sqlite" | "postgres"
  web5_integration:
    did_support: boolean
    dwn_protocol: boolean
    sync_enabled: boolean
  networking:
    mesh_enabled: boolean
    local_network_access: boolean
    device_discovery: boolean
    routing_protocols: [string]  # e.g., ["olsr", "babel"]
```
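Several fields above carry quantities as strings, such as `memory_limit: "2Gi"` or `interval: "30s"`. The sketch below shows one way such values could be converted to plain numbers before limits are compared. It is an illustration only: it assumes binary size suffixes (`Ki`, `Mi`, `Gi`, `Ti`, as in Kubernetes-style quantities) and duration suffixes `s`, `m`, and `h`. The spec itself only shows `Mi`, `Gi`, and `s`, and the function names are hypothetical.

```python
import re

# Assumption: binary suffixes, so "1Gi" means 2**30 bytes.
_SIZE_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
# Assumption: only seconds, minutes, and hours are accepted.
_DURATION_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_size(text: str) -> int:
    """Convert a size such as "512Mi" into a number of bytes."""
    match = re.fullmatch(r"([0-9]+)(Ki|Mi|Gi|Ti)", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * _SIZE_UNITS[match.group(2)]


def parse_duration(text: str) -> int:
    """Convert a duration such as "30s" into a number of seconds."""
    match = re.fullmatch(r"([0-9]+)(s|m|h)", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    return int(match.group(1)) * _DURATION_UNITS[match.group(2)]
```

With these definitions, `parse_size("2Gi")` returns `2147483648` and `parse_duration("30s")` returns `30`. Anything that does not match the expected pattern raises `ValueError` instead of being silently accepted.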
## Examples
See the `apps/` directory for complete examples:
- `apps/bitcoin-core/manifest.yml`
- `apps/lnd/manifest.yml`
- `apps/nostr-rs-relay/manifest.yml`
- `apps/meshtastic/manifest.yml`
## Validation
Manifests are validated on installation. The validator checks that:
- All required fields are present
- The version format is valid
- Resource limits are reasonable
- Ports do not conflict
- Dependencies do not form a cycle
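The port and dependency checks in the list above are both small graph or set problems. The following Python sketch illustrates one possible approach and is not Archipelago's actual validator. It assumes that a port conflict means the same host port and protocol being claimed twice, and that dependencies are available as a mapping from each app id to the app ids it requires. Both function names and the `graph` shape are hypothetical.

```python
from collections import Counter


def find_port_conflicts(ports):
    """Return sorted (host_port, protocol) pairs claimed more than once.

    `ports` is an iterable of dicts shaped like the `ports` entries in the
    schema; `protocol` falls back to "tcp", the documented default.
    """
    counts = Counter((p["host"], p.get("protocol", "tcp")) for p in ports)
    return sorted(key for key, n in counts.items() if n > 1)


def find_dependency_cycle(graph):
    """Return one cycle as a list of app ids, or None if there is none.

    `graph` maps each app id to the app ids it depends on (assumed shape).
    Uses a depth-first search that tracks the nodes on the current path.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for dep in graph.get(node, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == ON_PATH:
                # Reached a node already on the current path: cycle found.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, `find_dependency_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` returns `["a", "b", "c", "a"]`, while a graph in which `lnd` depends on `bitcoin-core` and `bitcoin-core` depends on nothing yields `None`. The search is recursive, so a very deep dependency chain would need an iterative variant.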
## Versioning
- Use semantic versioning (MAJOR.MINOR.PATCH)
- Breaking changes increment MAJOR
- New features increment MINOR
- Bug fixes increment PATCH
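The bump rules above can be expressed as a small helper. This is an illustrative Python sketch with a hypothetical function name. It assumes a plain MAJOR.MINOR.PATCH string and follows the Semantic Versioning convention of resetting lower components to zero when a higher one is incremented.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a "major", "minor", or "patch" change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Breaking change: reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if change == "minor":
        # New feature: reset PATCH.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For instance, `bump_version("1.4.2", "minor")` returns `"1.5.0"`.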