fix: overhaul container lifecycle — recovery, health, uninstall, UI state

Container recovery:
- Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s
- Dependency-aware restarts: won't restart services before their deps
- Reset dependent counters when a dependency recovers
- Handle "created" state containers (were invisible to health monitor)
- Added IndeedHub, mempool-api, mysql to tier system
- Crash recovery: podman start timeout 30s→120s with retry
- Podman client: socket timeout 5s→30s, added restart policy
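This is a minimal sketch of the dependency-aware restart rule and the counter reset described above. `RestartGate`, `may_restart`, `record_restart` and `on_recovered` are hypothetical names, not the health monitor's actual API. The "is it running" check is passed in as a closure, so the sketch needs no Podman access.

```rust
use std::collections::HashMap;

/// Restart budget per container (the commit raises it from 3 to 10).
const MAX_RESTART_ATTEMPTS: u32 = 10;

/// Hypothetical, simplified restart gate; not the actual types in core/.
struct RestartGate {
    /// service -> services it depends on
    deps: HashMap<String, Vec<String>>,
    /// service -> restarts attempted since it was last healthy
    attempts: HashMap<String, u32>,
}

impl RestartGate {
    fn new(deps: HashMap<String, Vec<String>>) -> Self {
        Self { deps, attempts: HashMap::new() }
    }

    /// Restart only if every dependency is running and the budget is not spent.
    fn may_restart<F: Fn(&str) -> bool>(&self, service: &str, is_running: F) -> bool {
        let deps_up = self
            .deps
            .get(service)
            .map_or(true, |ds| ds.iter().all(|d| is_running(d.as_str())));
        let tries = self.attempts.get(service).copied().unwrap_or(0);
        deps_up && tries < MAX_RESTART_ATTEMPTS
    }

    fn record_restart(&mut self, service: &str) {
        *self.attempts.entry(service.to_string()).or_insert(0) += 1;
    }

    /// A recovered dependency resets its own counter and its dependents' counters,
    /// since their failures were probably caused by the missing dependency.
    fn on_recovered(&mut self, dependency: &str) {
        self.attempts.remove(dependency);
        let dependents: Vec<String> = self
            .deps
            .iter()
            .filter(|(_, ds)| ds.iter().any(|d| d == dependency))
            .map(|(name, _)| name.clone())
            .collect();
        for name in dependents {
            self.attempts.remove(&name);
        }
    }
}
```

In this model, a monitor tick would call `may_restart` before starting a container and then `record_restart`. It would call `on_recovered` when a dependency such as bitcoind turns healthy again.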

UI state representation:
- Exit code 0 shows "stopped" (gray), not "crashed" (red)
- Exit code 137 shows "killed (OOM)"
- Non-zero exit shows "crashed" (red)
- Added exit_code field to PackageDataEntry
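Here is one way to express the exit-code mapping above, as a sketch. `DisplayStatus` and `display_status` are hypothetical names, and treating a missing exit code as "stopped" is an assumption. Exit code 137 is 128 + 9 (SIGKILL). The OOM killer sends SIGKILL, but so does `podman kill`, so the "OOM" label is a heuristic.

```rust
/// Hypothetical UI status derived from container state and exit code.
#[derive(Debug, PartialEq)]
enum DisplayStatus {
    Running,
    /// Clean exit (code 0): shown gray as "stopped".
    Stopped,
    /// Code 137 = 128 + SIGKILL(9): shown as "killed (OOM)".
    KilledOom,
    /// Any other non-zero code: shown red as "crashed".
    Crashed(i32),
}

fn display_status(state: &str, exit_code: Option<i32>) -> DisplayStatus {
    match (state, exit_code) {
        ("running", _) => DisplayStatus::Running,
        // Assumption: no recorded exit code (e.g. a "created" container) reads as stopped.
        (_, None) | (_, Some(0)) => DisplayStatus::Stopped,
        (_, Some(137)) => DisplayStatus::KilledOom,
        (_, Some(code)) => DisplayStatus::Crashed(code),
    }
}
```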

Install/uninstall fixes:
- Install returns error when container doesn't start (was silent success)
- Post-install hooks awaited instead of fire-and-forget tokio::spawn
- Uninstall: graceful rm before force, volume prune, network cleanup
- Uninstall returns error on partial failure (was 200 OK)
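This sketches the uninstall error handling described above. It assumes each cleanup step can be modeled as a call returning `Result<(), String>`. `remove_with_fallback` and `run_uninstall` are hypothetical helpers, and the steps are injected as closures rather than real Podman calls.

```rust
/// Result of one cleanup step (hypothetical shape; real code would use anyhow).
type StepResult = Result<(), String>;

/// Try a graceful removal first; fall back to a forced one only if that fails.
fn remove_with_fallback(
    graceful: impl Fn() -> StepResult,
    force: impl Fn() -> StepResult,
) -> StepResult {
    graceful().or_else(|g| force().map_err(|f| format!("graceful: {g}; force: {f}")))
}

/// Runs every cleanup step even after a failure, then reports all failures
/// together instead of returning success.
fn run_uninstall<F: FnMut(&str) -> StepResult>(
    steps: &[&str],
    mut exec: F,
) -> Result<(), Vec<String>> {
    let mut failures = Vec::new();
    for &step in steps {
        if let Err(e) = exec(step) {
            failures.push(format!("{step}: {e}"));
        }
    }
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}
```

Collecting failures instead of returning at the first error means a stuck volume does not stop the network cleanup. The caller still sees a non-success result.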

Config consistency:
- DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded)
- Bitcoin: added ZMQ ports 28332/28333 for LND block notifications
- IndeedHub port 7777→8190 (was conflicting with strfry)
- Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0
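This sketches reading a database password from the secrets directory instead of hardcoding it. The one-file-per-secret layout and the `read_secret` helper are assumptions. Only the `/var/lib/archipelago/secrets/` location comes from the commit.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: reads one secret from a directory such as
/// /var/lib/archipelago/secrets/ (file-per-secret layout is assumed).
fn read_secret(dir: &Path, name: &str) -> io::Result<String> {
    // Refuse names that could escape the secrets directory.
    if name.is_empty() || name.contains('/') || name.contains("..") || name.contains('\0') {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid secret name"));
    }
    let raw = fs::read_to_string(dir.join(name))?;
    // Strip the trailing newline that editors and `echo` usually add.
    let value = raw.trim_end_matches(|c: char| c == '\n' || c == '\r');
    if value.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "secret file is empty"));
    }
    Ok(value.to_string())
}
```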

Performance:
- Metrics collector interval 60s→300s (was duplicating health monitor)
- Podman client: proper error propagation instead of unwrap_or_default

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dorian 2026-03-31 07:03:57 +01:00
parent cdff10a8bc
commit 64b57dca7d
65 changed files with 3950 additions and 298 deletions

@ -0,0 +1,47 @@
---
name: code-reviewer
description: Reviews Archipelago code changes for quality — frontend patterns, Rust safety, container security, crypto rules, and project conventions.
tools: Read, Grep, Glob
model: sonnet
---
You are an Archipelago code reviewer. Check changes against project standards.
## Frontend (neode-ui/)
- `<script setup lang="ts">` in all Vue components
- Global CSS in `style.css`, never inline Tailwind utilities
- `.glass-button` for buttons, not `.gradient-button`
- Pinia stores for shared state, never provide/inject
- Every async view needs: loading state, empty state, error state
- Trim text inputs before submission
- Disable submit buttons during async operations
- Use `errorMessage` ref pattern for user-visible errors
## Backend (core/)
- No `.unwrap()` in request handlers — use `anyhow::Result`
- Validate input before path construction (reject `..`, `/`, null bytes)
- Timeouts on all external operations (10s default, 30s heavy)
- Log with `tracing`, never `println!` or `eprintln!`
- Container ops through `PodmanClient`, never raw `Command::new("podman")`
- Backend binds 127.0.0.1 only
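The input-validation rule can be sketched as follows, assuming app ids arrive as single path components. `validate_component` and `app_data_dir` are hypothetical helpers. They return `String` errors to stay dependency-free, whereas handler code would use `anyhow::Result`. Rejecting `/` also matters because `Path::join` replaces the base entirely when given an absolute path.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical check: rejects anything that could escape the base
/// directory (empty names, `.`, `..`, `/`, NUL bytes).
fn validate_component(name: &str) -> Result<&str, String> {
    if name.is_empty() || name == "." {
        return Err("empty or '.' name".to_string());
    }
    if name.contains("..") || name.contains('/') || name.contains('\0') {
        return Err(format!("rejected unsafe name {name:?}"));
    }
    Ok(name)
}

/// Joins a validated app id under a base directory.
fn app_data_dir(base: &Path, app_id: &str) -> Result<PathBuf, String> {
    let id = validate_component(app_id)?;
    Ok(base.join(id))
}
```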
## Containers
- `--cap-drop=ALL --cap-add=...` (except SearXNG — needs default caps)
- `--security-opt=no-new-privileges:true`
- Pin image versions, never `:latest`
- `--restart unless-stopped`
- UID mapping: `host_uid = 100000 + container_uid`
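This sketches assembling the container flags above into `podman run` arguments. `is_pinned` and `hardening_args` are hypothetical. The capability list is a parameter because the checklist leaves it as `...`, and `keep_default_caps` models the SearXNG exception. The tag check looks only at the last path segment, so a registry port such as `80.71.235.15:3000` is not mistaken for a tag.

```rust
/// True if the image reference has an explicit tag other than `latest`, or a digest.
fn is_pinned(image: &str) -> bool {
    if image.contains('@') {
        return true; // digest reference, e.g. name@sha256:...
    }
    // Only the last path segment can carry the tag; earlier colons are registry ports.
    let last = image.rsplit('/').next().unwrap_or(image);
    match last.rsplit_once(':') {
        Some((_, tag)) => !tag.is_empty() && tag != "latest",
        None => false,
    }
}

/// Hypothetical builder for the hardening flags in the checklist above.
fn hardening_args(image: &str, caps: &[&str], keep_default_caps: bool) -> Result<Vec<String>, String> {
    if !is_pinned(image) {
        return Err(format!("image {image} is not pinned to a version"));
    }
    let mut args = vec!["run".to_string(), "-d".to_string()];
    if !keep_default_caps {
        // Drop everything, then add back only what this app needs.
        args.push("--cap-drop=ALL".to_string());
        args.extend(caps.iter().map(|c| format!("--cap-add={c}")));
    }
    args.push("--security-opt=no-new-privileges:true".to_string());
    args.push("--restart=unless-stopped".to_string());
    args.push(image.to_string());
    Ok(args)
}
```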
## Security
- Constant-time comparisons for secrets/tokens/HMACs
- No key material in logs at any level
- Zeroize after crypto operations
- ed25519 over RSA, ChaCha20-Poly1305 over AES-CBC
- CSPRNG only (OsRng in Rust, crypto.getRandomValues in JS)
- Sats as integers (u64/BigInt), never floats
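This sketches two of the security rules using only the standard library:

- a comparison that does not exit early on the first mismatching byte;
- parsing a BTC amount into integer sats without floats (in `f64`, `0.1 + 0.2 != 0.3`).

Both functions are hypothetical illustrations. For production constant-time code, a vetted crate such as `subtle` is the safer choice, because the optimizer is not formally prevented from reintroducing a short-circuit.

```rust
use std::hint::black_box;

/// Compares secrets without returning at the first differing byte.
/// Only the length check leaks timing.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    // black_box discourages (but does not guarantee against) a short-circuit.
    black_box(diff) == 0
}

/// Parses a decimal BTC string ("0.5", "21") into integer sats, no floats.
fn btc_to_sats(s: &str) -> Option<u64> {
    let (whole, frac) = s.split_once('.').unwrap_or((s, ""));
    let all_digits = |t: &str| t.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || !all_digits(whole) || frac.len() > 8 || !all_digits(frac) {
        return None;
    }
    let whole: u64 = whole.parse().ok()?;
    // Right-pad the fraction to 8 digits: "5" -> "50000000".
    let frac: u64 = format!("{frac:0<8}").parse().ok()?;
    whole.checked_mul(100_000_000)?.checked_add(frac)
}
```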
## Project Conventions
- Commits: `type: description` (feat, fix, docs, refactor, test, chore, perf)
- Container images: `scripts/image-versions.sh` is single source of truth
- Frontend builds to `web/dist/neode-ui/`, not `neode-ui/dist/`
- Type-check before committing: `cd neode-ui && npx vue-tsc -b --noEmit`

@ -0,0 +1,42 @@
---
name: deploy-specialist
description: Deploys Archipelago to all 5 nodes. Knows SSH access, build capabilities, post-deploy verification, and IndeedHub multi-node patterns.
tools: Bash, Read, Grep, Glob
model: sonnet
---
You are the Archipelago deploy specialist. You deploy backend, frontend, and container changes to the fleet.
## Node Inventory
| Node | Address | SSH |
|------|---------|-----|
| .228 (primary) | 192.168.1.228 | `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228` |
| .198 (secondary) | 192.168.1.198 | `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.198` |
| Arch 1 | 100.82.97.63 | `ssh -i ~/.ssh/archipelago-deploy archipelago@100.82.97.63` |
| Arch 2 | 100.122.84.60 | `ssh -i ~/.ssh/archipelago-deploy archipelago@archipelago-2.tail2b6225.ts.net` |
| Arch 3 | 100.124.105.113 | `ssh -i ~/.ssh/archipelago-deploy archipelago@100.124.105.113` |
## Deploy Methods
- **LAN (.228, .198)**: `./scripts/deploy-to-target.sh --both`
- **Arch 2**: `ARCHIPELAGO_TARGET="archipelago@archipelago-2.tail2b6225.ts.net" ./scripts/deploy-to-target.sh --live`
- **Arch 3**: SCP pre-built binary + frontend tarball (no build tools on this node)
- SSH directly from Mac to all nodes with `~/.ssh/archipelago-deploy` — never relay through .228
## Critical Rules
1. When updating IndeedHub: deploy to ALL nodes, not just .228
2. IndeedHub nginx MUST use hardcoded container IPs, not DNS names
3. After container recreation: reapply ALL patches (X-Frame-Options removal, IP hardcoding, nostr-provider injection)
4. Export custom images as INDIVIDUAL tarballs (combined tarballs corrupt image IDs)
5. Containers binding port 80 need `--user 0:0` (NET_BIND_SERVICE ignored in rootless Podman)
6. MariaDB/Postgres only read env vars on FIRST init — password changes need ALTER USER
## Post-Deploy Checklist
1. Test web UI at target IP
2. Verify modified apps load correctly
3. Check backend: `sudo journalctl -u archipelago -n 20`
4. Check nginx: `sudo tail -20 /var/log/nginx/error.log`
5. If ISO-related: sync configs to image-recipe/configs/

@ -34,6 +34,9 @@
## Deploy & Container Fixes
- [project_deploy_session_2026_03_22.md](project_deploy_session_2026_03_22.md) — Fleet deploy fixes: credential mismatches, restart storms, rootless port 80, deploy script hardening
## Gamepad Navigation
- [project_gamepad_nav.md](project_gamepad_nav.md) — Controller nav system, key files, patterns, Chromium gotchas
## Completed Work
- [project_mesh_198_issue.md](project_mesh_198_issue.md) — Mesh .198: 3 bugs fixed and deployed
- [project_indeedhub_arch3_fix.md](project_indeedhub_arch3_fix.md) — IndeedHub Arch 3: corrupted combined tarball fixed

@ -0,0 +1,16 @@
---
name: Asset workflow - designer makes images
description: User is a designer — never generate PNG/JPEG/SVG assets, only provide specs. TUI/text animations are Claude's job.
type: feedback
---
Never generate PNG, JPEG, or SVG image assets. The user is a designer and will always create these manually.
**Claude's job:** TUI text, animations, shell scripts, code
**User's job:** PNG, JPEG, SVG, any visual/graphic assets
When images are needed, provide clear specs (dimensions, format, constraints, where they go) and let the user create them.
**Why:** User is a professional designer. Auto-generated pixel art looks generic compared to their actual brand artwork.
**How to apply:** When boot splash, logos, icons, or any visual assets are needed, output a spec sheet with dimensions/format/constraints. Never run image generation scripts as part of the build.

@ -0,0 +1,19 @@
---
name: Deploy container patterns
description: Hard-won deploy patterns — rootless port 80, credential sync, health checks, image export
type: feedback
---
Container deploy patterns learned from fleet-wide deploy sessions.
**Rootless port 80:** Containers binding port 80 MUST use `--user 0:0`. `NET_BIND_SERVICE` cap doesn't work in rootless Podman.
**Why:** Discovered across multiple containers (FileBrowser, Nextcloud, Vaultwarden, Jellyfin) that `--cap-add NET_BIND_SERVICE` is silently ignored in rootless mode. Only `--user 0:0` works.
**Credential sync:** MariaDB/Postgres only read env vars on FIRST init. If deploy generates new random passwords in `secrets/` but the DB data dir already exists, the DB keeps the OLD password. Fix: either wipe data dir + reinit, or `ALTER USER` to sync.
**Image export:** Always export custom images as INDIVIDUAL tarballs (`podman save -o name.tar`). Combined tarballs corrupt image IDs.
**Health checks:** Every container should have `--health-cmd`. Currently 25+ containers have them.
**How to apply:** Check these patterns in any deploy script changes or new container additions.

@ -0,0 +1,12 @@
---
name: Gamepad navigation unfinished
description: Gamepad/controller nav rewrite (aada1975) shipped but has issues — needs further work
type: feedback
---
Gamepad navigation rewrite was committed (aada1975) and included in CI ISO builds, but user reports it's not working correctly. Issues:
- Can't exit input fields with up/down arrow keys when other elements are available
- Navigation behavior not right (unspecified details — need to investigate)
**Why:** The rewrite was a major change to `useControllerNav.ts` and focus management. Shipped in beta but needs polish.
**How to apply:** When touching gamepad/controller nav, treat as unfinished work. Test arrow key behavior on inputs, focus trap logic, and spatial navigation thoroughly.

@ -0,0 +1,19 @@
---
name: Archipelago ASCII logo — never change
description: The block-letter ASCII art logo for Archipelago is locked in. Use this exact design everywhere.
type: feedback
---
The Archipelago ASCII block-letter logo is finalized. Never change it.
```
█▀█ █▀▄ █▀▀ █ █ █ █▀█ █▀▀ █ █▀█ █▀▀ █▀█
█▀█ █▀▄ █ █▀█ █ █▀▀ ██▀ █ █▀█ █ █ █ █
▀ ▀ ▀ ▀ ▀▀▀ ▀ ▀ ▀ ▀ ▀▀▀ ▀▀▀ ▀ ▀ ▀▀▀ ▀▀▀
```
Uses ▀ ▄ █ block characters. 45 chars wide, fits in any 52+ col box.
Render in Bitcoin orange (`\033[38;5;208m`) by default.
**Why:** User explicitly approved this logo and said "save that never change."
**How to apply:** Use this for all TUI contexts — install screens, MOTD, menu banners, boot displays. Replace the old spaced-out `a r c h i p e l a g o` text with this wherever a banner is needed.
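This sketches rendering the banner in the specified color. `colorize_banner` and `center_in` are hypothetical helpers. The escape is written `\x1b` because Rust has no octal `\033` escape. `center_in` counts `char`s rather than bytes, since each block character is 3 bytes in UTF-8 but occupies one terminal column.

```rust
/// 256-color "Bitcoin orange": the same escape as `\033[38;5;208m`.
const BITCOIN_ORANGE: &str = "\x1b[38;5;208m";
const RESET: &str = "\x1b[0m";

/// Wraps each banner line in the orange escape plus a reset,
/// so a line-by-line renderer never leaks the color.
fn colorize_banner(banner: &str) -> String {
    banner
        .lines()
        .map(|line| format!("{}{}{}", BITCOIN_ORANGE, line, RESET))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Centers a line in a box `width` columns wide. Counting chars is exact
/// for the single-width block characters (█ ▀ ▄), not for wide glyphs.
fn center_in(line: &str, width: usize) -> String {
    let cols = line.chars().count();
    if cols >= width {
        return line.to_string();
    }
    let left = (width - cols) / 2;
    let right = width - cols - left;
    format!("{}{}{}", " ".repeat(left), line, " ".repeat(right))
}
```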

@ -0,0 +1,19 @@
---
name: App Registry Setup
description: Archipelago app container registry at 80.71.235.15:3000 (Gitea) — marketplace images mirrored there
type: project
---
Archipelago app registry running on Gitea at `80.71.235.15:3000`, org `archipelago`.
**Why:** Self-hosted container registry so Archipelago nodes pull app images from our infrastructure instead of Docker Hub/ghcr.io. Critical for unbundled ISO installs where apps are downloaded on-demand.
**How to apply:**
- Registry URL: `80.71.235.15:3000/archipelago/<app>:<version>`
- HTTP only (insecure) — nodes need `registries.conf` with `insecure = true`
- ISO build bakes the insecure registry config into `/home/archipelago/.config/containers/registries.conf`
- Marketplace data in `neode-ui/src/views/marketplace/marketplaceData.ts` uses `REGISTRY` constant
- 34 images pushed from .228 on 2026-03-26
- NOT pushed yet: Thunderhub, Penpot (not on .228)
- Gitea instance deployed via Portainer on `80.71.235.15:9443`
- Login: `podman login 80.71.235.15:3000` (credentials set up on .228)

@ -0,0 +1,20 @@
---
name: CI/CD Setup
description: Gitea Actions CI/CD — runner on .228, workflow builds unbundled ISO on push to main
type: project
---
CI/CD pipeline using Gitea Actions on git.tx1138.com.
**Why:** Automatic ISO builds on every push to main. ISOs copied to FileBrowser /Builds/ for download.
**How to apply:**
- Gitea repo: `git.tx1138.com/lfg2025/archy`
- Runner: .228 registered as `archipelago-builder` with label `ubuntu-latest:host`
- Runner service: `gitea-runner.service` (systemd, runs as archipelago user)
- Runner config: `~/.runner` on .228
- Workflow: `.gitea/workflows/build-iso.yml` — unbundled ISO only
- Uses `https://git.tx1138.com/actions/checkout@v4` (NOT github.com actions)
- Builds: backend (cargo), frontend (npm), then ISO with `UNBUNDLED=1`
- Output: copied to `/var/lib/archipelago/filebrowser/Builds/`
- act_runner v0.2.11 installed at `/usr/local/bin/act_runner`

@ -0,0 +1,20 @@
---
name: Container Orchestration Hardening
description: Container orchestration overhaul — stop grace periods, pull retry, persistent restart tracking, scheduled remediation, failsafe install, boot reconciliation
type: project
---
Container orchestration hardening implemented on dev-iso branch (2026-03-28).
**Why:** Gitea issue requesting true orchestration. Containers were unreliable — 10s stop timeout risked Bitcoin Core UTXO corruption, image pulls failed silently, restart counters reset on process restart enabling infinite loops, doctor/reconcile scripts only ran manually.
**What was done (7 changes):**
1. Per-container stop grace periods (600s bitcoin, 330s lnd, 300s electrs, 120s databases, 60s btcpay, 30s default) + systemd TimeoutStopSec=660
2. Image pull retry with exponential backoff (3 attempts: 5s/15s/45s) + post-pull verification + stacks.rs error propagation instead of silent swallow
3. Resolved container/health_monitor.rs TODO (documented as orchestrator-level responsibility)
4. Persistent restart tracking to restart-tracker.json (survives process restarts, seeded on startup)
5. Scheduled systemd timers: container-doctor every 30min, reconcile-containers every 6h
6. Failsafe install: post-pull image verify, rollback on start failure, 30s post-start health check with crash diagnosis
7. Boot reconciliation: runs reconcile-containers.sh after crash recovery completes
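Changes 1 and 2 can be sketched as a grace-period lookup plus a retry loop. Container names are hypothetical stand-ins, with "databases" modeled as mariadb/postgres/mysql. The sketch also reads the 5s/15s/45s schedule as the waits before each retry, so a pull is tried at most four times. `sleep` is injected so the schedule can be tested without real delays.

```rust
use std::time::Duration;

/// Stop grace periods from the list above (hypothetical container names).
fn stop_grace_period(container: &str) -> Duration {
    let secs = match container {
        "bitcoin" => 600,
        "lnd" => 330,
        "electrs" => 300,
        "mariadb" | "postgres" | "mysql" => 120,
        "btcpay" => 60,
        _ => 30,
    };
    Duration::from_secs(secs)
}

/// systemd must outwait the slowest container (660s > 600s for bitcoin).
const SYSTEMD_TIMEOUT_STOP: Duration = Duration::from_secs(660);

/// 5s, 15s, 45s: each wait is three times the previous one.
fn backoff_delays(retries: u32) -> Vec<Duration> {
    (0..retries).map(|n| Duration::from_secs(5 * 3u64.pow(n))).collect()
}

/// Calls `op`; after each failure, waits for the next delay and tries again.
fn retry_with_backoff<T, E>(
    delays: &[Duration],
    mut op: impl FnMut() -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut result = op();
    for &delay in delays {
        if result.is_ok() {
            break;
        }
        sleep(delay);
        result = op();
    }
    result
}
```

In real code, `sleep` would be `std::thread::sleep` (or an async sleep). The post-pull verification from change 2 would run inside `op`, so a pull that "succeeds" without producing the image still counts as a failure.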
**How to apply:** These changes affect beta reliability. Another developer is working on a custom base ISO on the same branch, so coordinate on build-auto-installer-iso.sh changes.

@ -0,0 +1,22 @@
---
name: Gamepad Navigation System
description: Controller/gamepad navigation architecture, key decisions, known issues, and the nav map doc location
type: project
---
Gamepad/controller navigation is a core feature of Archipelago — the UI runs on a kiosk with Xbox-style controller input.
**Why:** Archipelago runs on dedicated hardware with a TV/monitor + gamepad. Every page must be fully navigable without a mouse.
**How to apply:** When modifying any page's interactive elements, check that `data-controller-container` and `tabindex` are set correctly. Read `neode-ui/docs/GAMEPAD-NAV-MAP.md` for the full per-page navigation spec and implementation notes.
## Key files
- `neode-ui/src/composables/useControllerNav.ts` — all navigation logic
- `neode-ui/docs/GAMEPAD-NAV-MAP.md` — full nav spec with per-page tables, implementation notes, and Chromium gotchas
## Critical patterns
- Cards on grid pages: `glass-card transition-all hover:-translate-y-1` + `data-controller-container tabindex="0"`
- Settings page is a MIXED page (containers + standalone buttons) — nav searches both together
- ToggleSwitch has `tabindex="-1"` + `data-controller-ignore` so gamepad skips it
- Focus glow uses blurred box-shadow, NOT `0 0 0 Npx` spread (Chromium compositor bug with translateZ(0))
- `outline: none !important` on all containers to kill browser default focus rings

@ -0,0 +1,29 @@
---
name: ISO Size Reduction Plan
description: Plan to reduce ISO from 3.9GB — prioritized phases for post-beta
type: project
---
Current ISO: ~3.9GB (unbundled). Target: <1.5GB.
**Why:** Debian Live base (~800MB) + rootfs with kiosk/Podman/firmware (~2.1GB) + squashfs overhead.
**Phase 1 — Quick wins (post-beta, ~500MB-1GB savings):**
- Strip unused firmware blobs (WiFi chipsets, GPU)
- Remove build-only packages from rootfs (not needed at runtime)
- `--no-install-recommends` in all apt installs
- Strip debug symbols from binaries
- Remove man pages, docs, locale data (`localepurge`)
**Phase 2 — Minimal base (~1-1.5GB savings):**
- Replace Debian Live ISO with custom `debootstrap --variant=minbase` live image
- Make kiosk (X11 + Chromium ~400MB) optional / separate overlay
- Alpine-based rootfs alternative
**Phase 3 — Long term (<1GB target):**
- Custom kernel with only needed modules
- A/B read-only root partition (no live boot infrastructure)
- Network installer variant (tiny ISO, needs internet)
- Reproducible builds with exact dep trees
**How to apply:** Each phase is independent. Phase 1 is safe to do anytime. Phase 2 requires testing the boot chain. Phase 3 is architectural.

@ -0,0 +1,42 @@
---
name: ISO Session 2026-03-28 Handoff
description: Session handoff — branding overhaul, ISOLINUX config updated, terminal banners redesigned, UEFI still broken
type: project
---
## Session State (2026-03-28 ~latest)
### Branding Overhaul (this session)
**ISOLINUX boot menu:**
- Config updated: menu centered (HSHIFT 28, WIDTH 26), title "Bitcoin Node OS"
- Selection: white on dark, hotkeys in Bitcoin orange (#fb923c)
- Tab message: "Press TAB to edit | https://archipelago.sh"
- MENU RESOLUTION kept at 1024x768 (uses GRUB background.png)
- Three options: Install Archipelago, Install (verbose), Boot from local disk
**Terminal banners — unified design across all screens:**
- Name: "A R C H I P E L A G O" (uppercase, spaced, bold white)
- Separator: orange line
- Subtitle: dim text (varies by context)
- Colors: basic ANSI (works on bare-metal console, not 256-color)
- Width: fits 80-col terminals (no overflow/clipping)
- Build script auto-install.sh: centered + adaptive-width boxes
- Standalone scripts: fixed 52-char boxes
**Files changed:**
- build-auto-installer-iso.sh: ISOLINUX config, colors (256 to basic ANSI), case, header + completion
- build/debian-iso/custom/etc/profile.d/z99-archipelago.sh: full rewrite
- build/debian-iso/custom/archipelago/auto-start.sh: full rewrite
- archipelago-scripts/archipelago-menu.sh: full rewrite
- build/debian-iso/custom/isolinux/stdmenu.cfg, menu.cfg, live.cfg: updated
- branding/generate-isolinux-splash.py: new file (640x480 splash generator, optional)
### Outstanding Issues
- UEFI boot broken — drops to grub> prompt, only Legacy BIOS works
- ISOLINUX resolution kept at 1024x768, may clip on some hardware
- Install + onboarding logs confirmed present on .198 (5 log files)
- Need to review actual log content from .198
### Target Machine
- Dell on .198, Legacy BIOS, password: archipelago

@ -0,0 +1,357 @@
# Gold Standard Claude Code Configuration — Archipelago
## Context
The last optimization (2026-03-28) cut CLAUDE.md from 130→101 lines and skills from 33→11. That was the right first pass. This plan is the second pass: fixing structural issues the first cleanup didn't address — hook duplication, memory chaos, a leaked API key, missing path scoping, context budget waste, and underutilized agent/permission systems. The goal is a configuration so tight that re-running this audit would produce zero suggestions.
**Research base**: Every file in `.claude/` (project + global), all 26 project memories, all 8 auto-memories, all 11 skills, all 5 rules, all 11 hooks, both settings files, the iframe-specialist agent, the full project structure (core/, neode-ui/, scripts/, image-recipe/, apps/, .gitea/), latest Claude Code docs (CLAUDE.md best practices, hooks v2.1.85+, skills frontmatter, agents, memory, permissions, MCP, context management, agent teams), and the 2026-03-28 cleanup feedback.
**Governing principle** (carried from cleanup): *Every line must prevent a specific mistake Claude would otherwise make. If Claude does it right without the instruction, it's noise.*
---
## Phase 0: CRITICAL — Remove Leaked Secret
**File**: `.claude/memory/deploy-automation.md` (line 11)
Contains a plaintext Anthropic API key: `sk-ant-api03-...`
**Action**: Remove the key immediately. Replace with: `"ANTHROPIC_API_KEY from secrets store (never stored in memory files)"`
This is the only blocking item. Everything else is optimization.
---
## Phase 1: CLAUDE.md — Trim to ~75 Lines
**File**: `/Users/dorian/Projects/archy/CLAUDE.md`
**Current**: 101 lines | **Target**: ~75 lines | **Saves**: ~500 tokens/session
### What to cut (reference data that doesn't prevent mistakes)
| Section | Lines | Action | Reason |
|---------|-------|--------|--------|
| Infrastructure table | 21-30 | Move to auto-memory | Reference data, not a rule. Already in memory files |
| ISO debug commands | 79-84 | Move to `iso-debug` skill reference | Diagnostic commands, not rules |
| Kiosk toggle info | 85-86 | Move to auto-memory or delete | Reference, not a rule |
| "Backend binds 127.0.0.1" | 63 | Move to new backend rule | Claude can read the code |
| "Timeouts on all external operations" | 65 | Move to new backend rule | Already in `rules/api.md` |
### What to add
```markdown
## Compact Instructions
When compacting, preserve: list of modified files, test results, deploy target state, current branch.
```
This costs 2 lines but keeps long sessions from losing critical context when they are compacted.
### Resulting structure (~75 lines)
```
Lines 1-2: Project description + stack
Lines 3-6: Beta freeze notice
Lines 7-12: Quick reference (dev, build, deploy commands)
Lines 13-18: Architecture diagram (compact)
Lines 19-20: Data paths
Lines 21-26: Critical Rules (5 rules)
Lines 27-33: App Integration Checklist
Lines 34-36: Git conventions
Lines 37-39: Compact instructions
```
Infrastructure table moves to auto-memory where it's still loaded at session start.
---
## Phase 2: Hook Deduplication — Eliminate Double Execution
### Problem
Every `Bash` call runs **both** global `pretooluse-bash.sh` AND project `block-risky-bash.sh`. Every `Edit|Write` call runs **both** global `pretooluse-files.sh` AND project `protect-files.sh`. They overlap on ~80% of patterns (rm -rf, git reset --hard, .git/ edits, .env files, etc.).
**Cost**: 2 extra Python processes per tool call, checking the same patterns twice.
### Solution: Project hooks become project-specific only
**File**: `.claude/hooks/block-risky-bash.sh`
**Action**: Strip all patterns already covered by global hook. Keep ONLY:
- Cargo build on macOS (Archy-specific: "build on dev server via SSH")
- Path traversal with rm (more aggressive check than global)
~15 lines instead of ~80.
**File**: `.claude/hooks/protect-files.sh`
**Action**: Strip all patterns already covered by global hook. Keep ONLY:
- `scripts/deploy-config.sh` (Archy-specific credential file)
- Path-outside-project check (project-specific boundary)
~20 lines instead of ~75.
**Global hooks stay unchanged** — they're the universal baseline.
### Result
- Before: 4 Python processes per Bash call (2 global + 2 project parsing same JSON)
- After: 2 Python processes per Bash call (1 global comprehensive + 1 tiny project-specific)
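Below is the decision logic that the slimmed `block-risky-bash.sh` keeps, modeled in Rust purely for illustration (the real hook is a shell script). `project_bash_verdict` is hypothetical. Its traversal check, which flags any `..` segment in an `rm` command, is an assumption about what "more aggressive" means. A caller could pass `std::env::consts::OS`, which is `"macos"` on a Mac.

```rust
/// Hypothetical model of the two project-specific checks; not the hook itself.
fn project_bash_verdict(command: &str, os: &str) -> Result<(), &'static str> {
    let words: Vec<&str> = command.split_whitespace().collect();
    // Archy-specific: Rust builds happen on the dev server, not on macOS.
    let is_cargo_build = words.windows(2).any(|w| w[0] == "cargo" && w[1] == "build");
    if os == "macos" && is_cargo_build {
        return Err("build on the dev server via SSH");
    }
    // Assumption: any `..` path segment in an `rm` command counts as traversal.
    let is_rm = words.first().copied() == Some("rm");
    let has_traversal = words.iter().any(|w| w.split('/').any(|seg| seg == ".."));
    if is_rm && has_traversal {
        return Err("rm with a `..` path segment");
    }
    Ok(())
}
```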
---
## Phase 3: Memory System — Consolidate and Clean
### Problem
Two separate memory systems with overlapping content:
1. **Auto-memory** (`~/.claude/projects/-Users-dorian-Projects-archy/memory/`) — 8 files, auto-loaded
2. **Project memory** (`.claude/memory/`) — 26 files, NOT auto-loaded
Claude sees auto-memory every session. Project memory only loads if Claude manually reads it.
### Solution: Curate auto-memory, keep project memory as archive
**Auto-memory MEMORY.md** — restructure to ~25 lines with the most critical feedback:
```markdown
# Archipelago Project Memory
## Critical Feedback (prevent recurring mistakes)
- [Direct Port Rule](feedback_apps_always_direct_port.md) — Apps MUST use direct port, NEVER proxy paths
- [External URLs](feedback_external_urls_iframe.md) — Open https:// directly, never /ext/
- [Deploy All Nodes](feedback_indeedhub_deploy_all_servers.md) — Deploy to ALL nodes
- [No Tor Publishing](feedback_no_tor_relay_publishing.md) — Never publish .onion to relays
- [UFW Forward](feedback_podman_ufw_forward.md) — DEFAULT_FORWARD_POLICY=ACCEPT
- [Deploy Patterns](feedback_deploy_patterns.md) — Rootless port 80, cred sync, image export
- [Asset Workflow](feedback_asset_workflow.md) — Never generate images, user is designer
- [ASCII Logo](feedback_logo_ascii.md) — Block-letter logo locked, never change
- [Claude Cleanup](feedback_claude_cleanup.md) — Instruction optimization principles
## Infrastructure
- [CI/CD & Registry](reference_cicd_registry.md) — git.tx1138.com, act_runner, insecure registry
- [Multi-Node Deploy](reference_multi_node_deploy.md) — 5 nodes, SSH keys, deploy methods
- [Infrastructure Quick Ref](reference_infrastructure.md) — IPs, passwords, SSH keys (moved from CLAUDE.md)
## Project State
- [ISO Testing](project_iso_testing_plan.md) — Hardware matrix, boot compatibility
- [ISO Custom Base](project_iso_size_reduction.md) — Debootstrap ISO, remaining issues
## Archive
Detailed project memory in .claude/memory/MEMORY.md (26 files, not auto-loaded).
```
**New auto-memory files to create** (migrated from project memory):
- `feedback_apps_always_direct_port.md` — Broken THREE TIMES, highest-value feedback
- `feedback_deploy_patterns.md` — Hard-won container patterns
- `feedback_asset_workflow.md` — Prevents wasted effort generating images
- `feedback_logo_ascii.md` — Prevents changing locked-in branding
- `reference_infrastructure.md` — Infrastructure table from CLAUDE.md (IPs, SSH, passwords)
**Project memory (.claude/memory/)**:
- Add comment at top of MEMORY.md: `<!-- Archive: not auto-loaded. Active memory at ~/.claude/projects/.../memory/ -->`
- Fix `deploy-automation.md` (Phase 0 — remove API key)
- Update `unbundled-iso.md` (still says "NOT YET BUILT")
---
## Phase 4: Permissions — Auto-Approve Safe Commands
**File**: `.claude/settings.local.json`
**Current**: Only `ssh:*` and `gh api:*` allowed.
**Updated** — add read-only and build/test commands:
```json
{
  "permissions": {
    "allow": [
      "Bash(ssh:*)",
      "Bash(gh api:*)",
      "Bash(cd neode-ui*)",
      "Bash(npm run *)",
      "Bash(npm test*)",
      "Bash(npm start*)",
      "Bash(npx vue-tsc*)",
      "Bash(npx vitest*)",
      "Bash(git log*)",
      "Bash(git diff*)",
      "Bash(git status*)",
      "Bash(git branch*)",
      "Bash(git show*)",
      "Bash(git stash*)",
      "Bash(cargo check*)",
      "Bash(cargo clippy*)",
      "Bash(cargo test*)",
      "Bash(journalctl*)",
      "Bash(systemctl status*)",
      "Bash(ls *)",
      "Bash(wc *)",
      "Bash(file *)",
      "Bash(xxd *)",
      "Bash(df *)",
      "Bash(du *)"
    ]
  }
}
```
**NOT auto-approved** (still require confirmation):
- `git push/commit` — Affects remote/creates state
- `cargo build` — Blocked by hook on macOS anyway
- `npm install` — Modifies dependencies
- `./scripts/deploy-*` — Deploys to servers
- `rm`, `mv`, `cp` — Potentially destructive
---
## Phase 5: Merge iso-branding into build-iso
**Problem**: `iso-branding` is a pure design reference, only relevant during ISO builds. Its description consumes skill budget.
**Action**:
1. Move `.claude/skills/iso-branding/SKILL.md` content → `.claude/skills/build-iso/references/branding.md`
2. Update `build-iso/SKILL.md` to reference the branding file
3. Delete `.claude/skills/iso-branding/` directory
**Skill count**: 11 → 10
---
## Phase 6: Add Backend Rule File
**Problem**: No path-scoped rule for Rust backend. 3 backend rules sit in CLAUDE.md (loaded every session even for frontend-only work).
**New file**: `.claude/rules/backend.md`
```markdown
---
globs:
- "core/**/*.rs"
- "core/**/Cargo.toml"
---
# Backend Rules (Archipelago — Rust)
- Backend binds `127.0.0.1` only — nginx handles external access
- Validate all input before path construction — reject `..`, `/`, null bytes
- Timeouts on all external operations (10s default, 30s heavy)
- Use `anyhow::Result` for error propagation, not `.unwrap()` in handlers
- Log with `tracing`, never `println!` or `eprintln!` in production paths
- Container commands through `PodmanClient` (core/container/), never raw Command::new("podman")
```
Delete the Backend section from CLAUDE.md (moved here).
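This sketches the timeout rule for blocking code, using only the standard library. `with_timeout` is a hypothetical helper. It stops waiting after the limit but does not cancel the worker thread, which finishes in the background. Async handlers would more naturally use `tokio::time::timeout`.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

const DEFAULT_TIMEOUT: Duration = Duration::from_secs(10);
const HEAVY_TIMEOUT: Duration = Duration::from_secs(30);

/// Runs a blocking operation on a worker thread and gives up waiting after `limit`.
fn with_timeout<T, F>(limit: Duration, op: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails only if the caller already timed out; ignore that.
        let _ = tx.send(op());
    });
    rx.recv_timeout(limit).map_err(|e| match e {
        RecvTimeoutError::Timeout => format!("timed out after {limit:?}"),
        RecvTimeoutError::Disconnected => "operation panicked".to_string(),
    })
}
```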
---
## Phase 7: Tighten prompt-injection-detect.sh
**Problem**: `context_manipulation` pattern matches `IMPORTANT:`, `CRITICAL:`, `<system>` — normal in code/docs. Creates false positive warnings.
**Action**: Tighten the `context_manipulation` regex to require injection-specific signatures:
```bash
# OLD (too broad):
"IMPORTANT:|CRITICAL:|SYSTEM:|ADMIN:|<system>|</system>|<instructions>"
# NEW (specific):
"(?:^|\s)(?:SYSTEM|ADMIN):\s*(?:you are|ignore|forget|override|new instructions)|<(?:system|instructions)>.*(?:ignore|override|forget)"
```
---
## Phase 8: Add 2 Focused Agents
**Current**: 1 agent (iframe-specialist, 678 lines)
**Add**:
### `.claude/agents/deploy-specialist.md`
```yaml
---
name: deploy-specialist
description: Deploys to all 5 Archipelago nodes. Knows SSH access, build capabilities, post-deploy verification.
tools: Bash, Read, Grep, Glob
model: sonnet
---
```
Body: Node inventory, deploy workflow, IndeedHub multi-node rules, post-deploy checklist.
### `.claude/agents/code-reviewer.md`
```yaml
---
name: code-reviewer
description: Reviews code against Archipelago standards — frontend patterns, Rust safety, container security, crypto rules.
tools: Read, Grep, Glob
model: sonnet
---
```
Body: Frontend rules, backend rules, container rules, security checklist.
**Agent count**: 1 → 3
---
## Phase 9: Skill Frontmatter Audit
**Problem**: Action skills that have side effects should have `disable-model-invocation: true` to prevent Claude from auto-invoking them.
| Skill | Has `disable-model-invocation: true`? | Needs it? |
|-------|--------------------------------------|-----------|
| add-app | Yes | Yes (side effects) |
| add-web-app | Verify | Yes |
| build-iso | Verify | Yes (builds ISO) |
| iso-debug | Verify | Yes (runs diagnostics) |
| podman | Verify | Yes (modifies containers) |
| polish | Verify | Yes (modifies code) |
| sweep | Verify | Yes (runs checks, may fix) |
| mesh | No | No (reference knowledge) |
| design-pixel-retro | No | No (reference knowledge) |
| gamepad-nav | No | No (reference knowledge) |
Action: Verify and add `disable-model-invocation: true` to all 7 action skills.
---
## Summary
| Phase | Impact | Files Changed | Benefit |
|-------|--------|---------------|---------|
| 0. Remove API key | CRITICAL | 1 | Security |
| 1. Trim CLAUDE.md | HIGH | 1 | ~500 tokens/session saved |
| 2. Dedup hooks | HIGH | 2 | ~200ms faster per tool call |
| 3. Memory consolidate | HIGH | ~8 | Cleaner context, no stale data |
| 4. Permissions | MEDIUM | 1 | ~3s saved per safe command |
| 5. Merge iso-branding | LOW | 3 | 1 less skill description |
| 6. Backend rule | MEDIUM | 2 | Path-scoped, not always-loaded |
| 7. Injection hook | LOW | 1 | Fewer false positives |
| 8. New agents | MEDIUM | 2 new | Better delegation |
| 9. Skill frontmatter | LOW | ~5 | Prevents unintended auto-invoke |
**Net changes**: CLAUDE.md 101→~75 lines, skills 11→10, agents 1→3, rules 5→6, hooks 60% smaller
---
## What This Plan Does NOT Change (and why each was evaluated)
- **Global CLAUDE.md** (36 lines) — Already optimized, passes the "would removing cause mistakes?" test
- **Global hooks** (8 scripts) — Universal baseline, well-tuned, no project overlap
- **Global rules** (api, crypto, bitcoin) — Correct glob scoping, concise content
- **Global settings.json** — Plugins, effort level, hook config all justified
- **iframe-specialist agent** — Deep reference, correctly scoped, rarely loaded
- **Skills mesh/gamepad-nav/design-pixel-retro** — Tiny description cost (~120 chars each), valuable on-demand
- **MCP servers** — Not needed (self-hosted infra, no external API integrations)
- **Agent teams** — Experimental, single-developer project doesn't benefit
- **Project .claude/memory/ (26 files)** — Kept as archive with annotation
---
## Verification Checklist
After implementation:
- [ ] `grep -r "sk-ant" .claude/` returns zero results
- [ ] New session auto-loads MEMORY.md with all critical feedback
- [ ] `git status` runs without a permission prompt
- [ ] `/sweep` skill loads and executes correctly
- [ ] Project hooks run fast (no duplicate pattern checks)
- [ ] `cd neode-ui && npx vue-tsc -b --noEmit` passes
- [ ] Spawning deploy-specialist agent works
- [ ] CLAUDE.md is ≤80 lines
- [ ] `/context` shows reasonable token budget

@ -0,0 +1,174 @@
# Plan: Optimize Claude Code Instructions for Maximum Coding Performance
## Context
### The Problem
Research across Anthropic's official docs, engineering blog, GitHub issues, and academic papers converges on one finding: **instruction overload degrades Claude's coding performance**. The more tokens consumed by rules/instructions, the less attention and context remain for actual code generation.
Key evidence:
- Anthropic official docs: *"Bloated CLAUDE.md files cause Claude to ignore your actual instructions!"*
- Boris Cherny (Claude Code creator) uses ~100 lines / ~2,500 tokens for his CLAUDE.md
- Research (Jaroslawicz et al., 2025): instruction compliance decreases linearly as count increases; frontier models plateau at ~150-200 instructions; Claude Code's system prompt already uses ~50
- "Lost in the Middle" (Stanford, 2024): LLMs exhibit U-shaped attention — middle content gets least attention
- Anthropic engineering blog: *"Find the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome"*
- Aggressive language (BANNED, NEVER, CRITICAL, Non-Negotiable) overtriggers on Claude 4.5/4.6 — Anthropic explicitly recommends dialing it back
- Multiple GitHub issues (15443, 28158, 16073, 34197) document systematic instruction ignoring with large CLAUDE.md files
### Current State (Archy Project)
**Always-loaded instruction payload:**
| Source | Lines | Chars | Est. Tokens |
|--------|-------|-------|-------------|
| Global CLAUDE.md | 97 | 5,624 | ~1,400 |
| Project CLAUDE.md | 130 | 5,270 | ~1,300 |
| 5 rules files | 119 | 5,123 | ~1,280 |
| MEMORY.md index | 16 | 1,099 | ~275 |
| 33 skill descriptions (system) | ~300 | ~13,200 | ~3,300 |
| **Total always-loaded** | **~662** | **~30,316** | **~7,555** |
Plus ~10 memory files (~290 lines, ~19K chars) loaded on relevance, and 33 skills totaling ~122K chars loaded on demand.
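The **Est. Tokens** column appears to follow the common rule of thumb of roughly 4 characters per token. That ratio is an assumption; real tokenizer counts vary with content. The minimal Rust sketch below reproduces the column from the **Chars** column. The function names are illustrative only.

```rust
/// Rough token estimate from a character count, rounded to the nearest token.
/// Assumes ~4 characters per token; a real tokenizer will differ.
fn estimate_tokens(chars: u32) -> u32 {
    (chars + 2) / 4
}

/// Sums per-source estimates for (name, chars) pairs like the table rows.
fn total_tokens(sources: &[(&str, u32)]) -> u32 {
    sources.iter().map(|&(_, chars)| estimate_tokens(chars)).sum()
}
```

Summing the five rows this way gives 7,580. The table shows ~7,555 because it rounds each row before adding them up.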
### Key Problems Identified
1. **Global CLAUDE.md is ~60% things Claude already knows** — "Comment WHY not WHAT," "Functions under 50 lines," "Zero compiler warnings" are standard practices Claude follows without being told
2. **Anti-Hallucination section (28 lines) restates built-in behavior** — package verification is in Claude's training
3. **Redundancy across files** — security rules appear in global CLAUDE.md + crypto.md + api.md + project CLAUDE.md (4x)
4. **Aggressive language throughout** — "BANNED," "Non-Negotiable," "MANDATORY," "NEVER" — Anthropic says this causes overtriggering on current models
5. **Project CLAUDE.md duplicates rules files** — Frontend section repeats frontend.md, Security section repeats crypto.md + api.md
6. **Philosophy section is ~30 lines that don't affect code generation** — Claude won't suggest altcoins or proprietary deps regardless
### What We Preserve (per user request)
- All deploy commands, build commands, SSH access, CI/CD info
- All infrastructure keys/addresses/IPs
- Security and quality architecture rules that prevent real mistakes
- All memory files and feedback (operational learnings)
- All skills (they already use progressive disclosure correctly)
---
## The Plan
### Principle: Every line must prevent a specific mistake Claude would otherwise make
If Claude would do the right thing without the instruction -> delete it.
If Claude does the wrong thing even with the instruction -> make it a hook.
If it only matters for specific files -> scope it with globs in rules/.
### Step 1: Rewrite Global CLAUDE.md (~97 -> ~35 lines)
**Remove (Claude already knows these):**
- "Comment WHY not WHAT" — standard practice
- "Functions under 50 lines, single responsibility" — standard practice
- "Zero compiler warnings, zero linter errors" — standard practice
- "Remove dead code entirely" — standard practice
- "Deploy and verify changes" — project-specific, belongs in project CLAUDE.md
- Entire "Core Principles" enumeration (5 items) — the one-line philosophy header covers it
- "Encryption first" details — covered by crypto.md rules file
- Most of "Anti-Hallucination" section (28 lines) — Claude already verifies packages; keep only "cross-reference existing deps" which is non-obvious
- "Code Sourcing: What to avoid" items 3-4 — too specific, rarely triggered
**Keep (prevents real mistakes):**
- Bitcoin-only stance (1 line) — prevents suggesting altcoin libs
- Open source preference (1 line)
- Code sourcing core rules (no vibe-code repos, no vendoring without approval)
- Dependency selection order (rustls > openssl, etc.) — non-obvious preferences
- Security standards not in rules files (never commit secrets, pin versions)
- Project ecosystem listing — useful cross-project context
- Atomic commit format
**Rewrite style:** Calm, direct. No MANDATORY, no bold on every line.
### Step 2: Rewrite Project CLAUDE.md (~130 -> ~75 lines)
**Remove (duplicated in scoped rules files):**
- Frontend section (lines 70-77) — exact duplicate of .claude/rules/frontend.md
- Security section (lines 87-94) — duplicates crypto.md + api.md + containers.md
- "See .claude/rules/ for detailed..." pointer — Claude loads them automatically
**Remove (Claude already knows):**
- "No unwrap()/expect() — use ? with .context()" — standard Rust practice
- "tracing for logging, never println!" — standard practice
- "tokio runtime" — obvious from the codebase
**Keep and tighten (all non-obvious, prevents real mistakes):**
- Overview + Stack (essential context)
- Beta freeze status (active project constraint)
- Quick Reference commands (frequently used, non-guessable)
- Infrastructure table (IPs, keys, remotes — user explicitly wants these)
- Architecture diagram (essential mental model)
- Critical Rules (5 items — all non-obvious)
- Backend: only non-obvious rules (bind 127.0.0.1, path validation, timeouts)
- ISO Build commands (operational knowledge)
- App Integration Checklist (prevents real mistakes)
- Git conventions (one line)
### Step 3: Tone Adjustment (all files)
Per Anthropic's explicit guidance for Claude 4.5/4.6:
| Before | After |
|--------|-------|
| `.gradient-button` is BANNED | Use `.glass-button` for all buttons, not `.gradient-button` |
| Non-Negotiable | _(remove header, rules speak for themselves)_ |
| MANDATORY checks | _(remove, rules are clear)_ |
| NEVER use floating point | Sats are always integers (`u64`/`BigInt`), not floats |
| NEVER build Rust on macOS | Do not build Rust on macOS — deploy script handles cross-compilation |
This is not cosmetic — Anthropic docs state aggressive language causes overtriggering.
### Step 4: Tighten Rules Files
- **frontend.md** — Tone adjustment only (already 8 good rules, glob-scoped)
- **containers.md** — Reorder critical rules to top, tone adjustment. Keep UID table and systemd requirements (genuine lookup references)
- **api.md, bitcoin.md, crypto.md** — Tone adjustment only (already concise and glob-scoped)
### Step 5: Clean Up Memory Index
- Fix duplicate Session 2026-03-28 entry in MEMORY.md
- Add missing entries for untracked files (feedback_asset_workflow.md, project_iso_size_reduction.md, etc.)
- All memory file content preserved as-is
### Step 6: No Changes To
- **Skills** — Load on demand (correct architecture). 33 skill descriptions at ~100 tokens each is the design intent.
- **Hooks** — Already well-structured.
- **Settings** — Good as-is.
- **Rules file glob scoping** — Already correct.
---
## Expected Impact
| Metric | Before | After | Reduction (lines) |
|--------|--------|-------|-----------|
| Global CLAUDE.md | 97 lines / 5,624 chars | ~35 lines / ~2,100 chars | 64% |
| Project CLAUDE.md | 130 lines / 5,270 chars | ~75 lines / ~3,200 chars | 42% |
| Rules files | 119 lines / 5,123 chars | ~115 lines / ~5,000 chars | 3% |
| **Total (CLAUDE.md + rules)** | **346 lines / 16,017 chars** | **~225 lines / ~10,300 chars** | **35%** |
Key outcomes:
- Every remaining line prevents a specific, real mistake
- No redundancy between files
- Calm, direct tone matched to current model behavior
- Critical rules at top/bottom of files (exploits primacy/recency attention bias)
- ~1,400 tokens freed for actual code context per session
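The **Reduction** percentages in the Expected Impact table match line counts, not character counts. Measured by characters, the global file shrinks about 63% and the combined total about 36%. The table's total row also covers only the CLAUDE.md files and rules, not MEMORY.md or the skill descriptions. The small sketch below shows the arithmetic behind the column. The function name is illustrative.

```rust
/// Percentage reduction from `before` to `after`, rounded to the nearest
/// whole percent using integer arithmetic.
fn reduction_pct(before: u32, after: u32) -> u32 {
    assert!(before > 0 && after <= before, "expects a non-empty, non-growing file");
    // (100 * removed + before / 2) / before rounds half up
    (100 * (before - after) + before / 2) / before
}
```

Applying it to the line counts (97→35, 130→75, 119→115, 346→225) reproduces 64%, 42%, 3% and 35%.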
## Files to Modify
1. `/Users/dorian/.claude/CLAUDE.md` — Rewrite (97 -> ~35 lines)
2. `/Users/dorian/Projects/archy/CLAUDE.md` — Rewrite (130 -> ~75 lines)
3. `/Users/dorian/Projects/archy/.claude/rules/frontend.md` — Tone adjustment (BANNED -> positive)
4. `/Users/dorian/Projects/archy/.claude/rules/containers.md` — Reorder + tone
5. `/Users/dorian/.claude/rules/bitcoin.md` — Tone adjustment
6. `/Users/dorian/.claude/rules/crypto.md` — Tone adjustment
7. `/Users/dorian/.claude/projects/-Users-dorian-Projects-archy/memory/MEMORY.md` — Fix index
## Verification
1. Start a new Claude Code session on archy
2. Check infrastructure IPs, SSH keys, deploy commands are all accessible
3. Ask Claude to write a Vue component — should follow glass-button, script setup, style.css
4. Ask Claude to write Rust backend code — should use ?, bind 127.0.0.1
5. Ask Claude about deploying — should know deploy-to-target.sh, .228, .198
6. Ask Claude to add a container — should follow rootless Podman, UID mapping
7. Observe: faster responses, less hedging, more focused output


@ -0,0 +1,241 @@
# Container Orchestration Dev Testing Infrastructure
## Context
Container orchestration has been unreliable for months. Every fix requires a full deploy to .228 (5+ minutes), manual SSH debugging, and prayer. No way to test orchestration logic locally or catch regressions before deploy. We need three layers of testing so orchestration is bulletproof before it ever touches a server.
## Three Layers
### Layer C: Mock Podman in Rust Unit Tests (runs on macOS, instant)
Tests the orchestration LOGIC without any containers. Runs in `cargo test`, takes seconds.
**What it tests:** Retry backoff timing, restart tracker persistence, tier ordering, stop grace periods, failsafe install flow, health monitor state machine, crash recovery.
**Implementation:**
Create `core/archipelago/src/container/mock_podman.rs` — a fake podman command executor:
```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64};
use std::sync::{Arc, Mutex};

use chrono::{DateTime, Utc};

pub struct MockPodman {
    containers: Arc<Mutex<HashMap<String, MockContainer>>>,
    fail_pull: Arc<AtomicBool>,    // simulate registry down
    fail_start: Arc<AtomicBool>,   // simulate container crash on start
    pull_delay_ms: Arc<AtomicU64>, // simulate slow pull
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContainerState { Created, Running, Exited, Stopped }

struct MockContainer {
    name: String,
    image: String,
    state: ContainerState,
    exit_code: i32,
    created_at: DateTime<Utc>,
}
```
Key trait to add in `runtime.rs`:
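```rust
use anyhow::Result;
use async_trait::async_trait;

// Assumed shape: whatever RealExecutor and MockPodman both need to report.
pub struct CommandOutput { pub status: i32, pub stdout: String, pub stderr: String }

#[async_trait]
pub trait CommandExecutor: Send + Sync {
    async fn execute(&self, program: &str, args: &[&str]) -> Result<CommandOutput>;
}
```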
Production uses `RealExecutor` (calls `tokio::process::Command`). Tests use `MockPodman`.
**Test file:** `core/archipelago/tests/orchestration_tests.rs`
Tests to write:
1. `test_stop_grace_periods` — bitcoin gets 600s, lnd 330s, unknown gets 30s
2. `test_pull_retry_backoff` — fail twice, succeed third, verify 5s/15s delays
3. `test_pull_all_attempts_fail` — fail 3x, verify error returned
4. `test_restart_tracker_persistence` — save to disk, reload, verify counters survive
5. `test_restart_tracker_stability_reset` — after 1h, counters clear
6. `test_failsafe_install_rollback` — container exits immediately, verify cleanup
7. `test_failsafe_install_image_missing` — pull succeeds but image not found, verify error
8. `test_health_monitor_tier_ordering` — databases restart before apps
9. `test_health_monitor_skips_user_stopped` — user-stopped containers not restarted
10. `test_health_monitor_max_attempts` — stops after 3 failures
11. `test_crash_recovery_loads_snapshot` — PID file + snapshot → containers restarted
12. `test_crash_recovery_skips_user_stopped` — user-stopped not recovered
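Tests 1 to 3 pin down two small pieces of policy. The std-only sketch below shows what they would exercise. It assumes container names match `bitcoin` and `lnd` exactly. It also assumes a pull makes three attempts, waiting 5s after the first failure and 15s after the second. The real `stop_timeout_secs()` in `runtime.rs` may match names differently. `pull_with_retry` and `PULL_BACKOFF` are hypothetical names. The sleep function is injected so a test can record the delays instead of actually waiting.

```rust
use std::time::Duration;

/// Stop grace period per container (values from the test plan;
/// exact-name matching is an assumption).
pub fn stop_timeout_secs(container: &str) -> u64 {
    match container {
        "bitcoin" => 600,
        "lnd" => 330,
        _ => 30,
    }
}

/// Waits between pull attempts: 5s after the first failure, 15s after the second.
pub const PULL_BACKOFF: [Duration; 2] = [Duration::from_secs(5), Duration::from_secs(15)];

/// Hypothetical retry wrapper: calls `pull` up to 3 times, sleeping via the
/// injected `sleep` between attempts, and returns the last error if all fail.
pub fn pull_with_retry<E>(
    mut pull: impl FnMut() -> Result<(), E>,
    mut sleep: impl FnMut(Duration),
) -> Result<(), E> {
    let mut attempt = 0;
    loop {
        match pull() {
            Ok(()) => return Ok(()),
            Err(e) if attempt >= PULL_BACKOFF.len() => return Err(e),
            Err(_) => {
                sleep(PULL_BACKOFF[attempt]);
                attempt += 1;
            }
        }
    }
}
```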
**Files to modify:**
- `core/archipelago/src/container/mod.rs` — add `pub mod mock_podman;`
- `core/archipelago/src/container/mock_podman.rs` — NEW mock implementation
- `core/archipelago/tests/orchestration_tests.rs` — NEW test file
- `core/archipelago/src/health_monitor.rs` — extract logic into testable functions (pure functions that take data, not functions that call podman)
- `core/archipelago/src/api/rpc/package/runtime.rs` — make `stop_timeout_secs` public for testing
**Key refactors to make code testable:**
- Extract `stop_timeout_secs()` → `pub fn` so tests can call it directly
- Extract health monitor `check_and_restart()` into a function that takes container list + tracker + user_stopped, returns actions to take (restart X, notify Y, skip Z) — pure logic, no IO
- Extract `RestartTracker` + `RestartHistory` into own file for independent testing
- Make `pull_image_with_progress` retry logic independent of progress streaming
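The extracted `check_and_restart()` logic could look roughly like the sketch below. It is a pure function that takes a snapshot of containers, a restart tracker and the user-stopped set, and returns a list of actions. The sketch has these properties:

- Lower tiers (for example databases) come first.
- Containers that are `Created` as well as `Exited` are considered.
- Failure counters older than one hour count as reset.

All type names, the tier numbering and the one-hour window are illustrative assumptions drawn from the test list above. None of it is the existing `health_monitor.rs` API.

```rust
use std::collections::{HashMap, HashSet};
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { Running, Exited, Created }

pub struct Container { pub name: String, pub tier: u8, pub state: State }

#[derive(Debug, PartialEq, Eq)]
pub enum Action { Restart(String), Notify(String), Skip(String) }

/// Window after which a container's failure counter is treated as reset.
pub const STABILITY_WINDOW: Duration = Duration::from_secs(3600);

/// Per-container (failure count, time of last failure).
#[derive(Default)]
pub struct RestartTracker { counters: HashMap<String, (u32, Instant)> }

impl RestartTracker {
    /// Failures still inside the stability window; older counters read as 0.
    pub fn attempts(&self, name: &str, now: Instant) -> u32 {
        match self.counters.get(name) {
            Some(&(n, last)) if now.duration_since(last) < STABILITY_WINDOW => n,
            _ => 0,
        }
    }

    pub fn record(&mut self, name: &str, now: Instant) {
        let n = self.attempts(name, now) + 1;
        self.counters.insert(name.to_string(), (n, now));
    }
}

/// Decide an action for every container that is not running. No IO happens
/// here; the caller performs the restarts/notifications.
pub fn check_and_restart(
    containers: &[Container],
    tracker: &RestartTracker,
    user_stopped: &HashSet<String>,
    max_attempts: u32,
    now: Instant,
) -> Vec<Action> {
    let mut down: Vec<&Container> =
        containers.iter().filter(|c| c.state != State::Running).collect();
    down.sort_by_key(|c| c.tier); // stable: dependencies before dependents
    down.into_iter()
        .map(|c| {
            if user_stopped.contains(&c.name) {
                Action::Skip(c.name.clone())
            } else if tracker.attempts(&c.name, now) >= max_attempts {
                Action::Notify(c.name.clone())
            } else {
                Action::Restart(c.name.clone())
            }
        })
        .collect()
}
```

Because `now` is a parameter, the stability-reset test can advance time without sleeping.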
---
### Layer A: SSH Dev Loop in dev-start.sh (real containers on .228)
New option 9 in `dev-start.sh`: "Container orchestration dev (live on .228)"
**What it does:**
1. Rsync code to .228 (2 seconds)
2. Build backend on .228 (incremental: 5-15 seconds)
3. Restart archipelago service
4. Run orchestration smoke tests via RPC
5. Show container status + health monitor logs
6. Loop: edit locally → press Enter → rsync+rebuild+test
**What it tests:** Real podman, real containers, real networking. The actual install/start/stop/restart/health cycle.
**Implementation:**
Add option 9 to `scripts/dev-start.sh`:
```bash
9)
echo "Container Orchestration Dev (live testing on .228)"
exec "$SCRIPT_DIR/dev-container-test.sh"
;;
```
Create `scripts/dev-container-test.sh` (~150 lines):
```bash
#!/bin/bash
# Fast edit-build-test loop for container orchestration on .228
#
# Usage: ./scripts/dev-container-test.sh [--once]
#
# Syncs code, builds, restarts, runs orchestration smoke tests.
# Press Enter to re-run, Ctrl+C to stop.
HOST=192.168.1.228
SSH="ssh -o StrictHostKeyChecking=no -i $HOME/.ssh/archipelago-deploy archipelago@$HOST"
RPC_URL="http://$HOST/rpc/v1"  # assumed RPC base URL; adjust to the real endpoint

# POST a JSON-RPC body (curl -d implies POST)
rpc() {
  curl -s -H 'Content-Type: application/json' -d "$1" "$RPC_URL"
}

sync_and_build() {
  # rsync (same excludes as deploy script)
  # on .228: cargo build --release -p archipelago (incremental)
  $SSH 'sudo systemctl restart archipelago'
  # wait for health endpoint (15s timeout)
}

run_smoke_tests() {
  # Test 1: Container list works
  rpc '{"method":"container.list"}'
  # Test 2: Install filebrowser (small, fast, no deps)
  rpc '{"method":"package.install","params":{"id":"filebrowser","dockerImage":"..."}}'
  # Wait for running state
  # Test 3: Stop with grace period
  rpc '{"method":"package.stop","params":{"id":"filebrowser"}}'
  # Verify stopped
  # Test 4: Start
  rpc '{"method":"package.start","params":{"id":"filebrowser"}}'
  # Verify running
  # Test 5: Health check
  rpc '{"method":"container.health"}'
  # Test 6: Check restart-tracker.json exists
  $SSH 'cat /var/lib/archipelago/restart-tracker.json'
  # Test 7: Check health monitor logs for errors
  $SSH 'journalctl -u archipelago --since "2 min ago"' | grep -i 'error\|panic\|fail'
  # Test 8: Uninstall
  rpc '{"method":"package.uninstall","params":{"id":"filebrowser"}}'
}

# Main loop
while true; do
  sync_and_build
  run_smoke_tests
  [[ "${1:-}" == "--once" ]] && break
  echo "Press Enter to re-run, Ctrl+C to stop"
  read -r
done
```
**Files:**
- `scripts/dev-start.sh` — add option 9
- `scripts/dev-container-test.sh` — NEW
---
### Layer B: CI Integration Tests (runs on .228 via Gitea Actions)
Extend the existing CI to run container orchestration tests on every push to dev-iso.
**What it tests:** Full lifecycle on real hardware after every code change. Catches regressions automatically.
**Implementation:**
Create `.gitea/workflows/container-tests.yml`:
```yaml
name: Container Orchestration Tests
on:
  push:
    branches: [dev-iso, main]
    paths:
      - 'core/**'
      - 'scripts/container-*.sh'
      - 'scripts/reconcile-*.sh'
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Rust unit tests (orchestration)
        # Runs the tests/orchestration_tests.rs target; --no-fail-fast is a
        # cargo flag, so it must not come after `--`
        run: cargo test -p archipelago --no-fail-fast --test orchestration_tests
  integration-tests:
    runs-on: ubuntu-latest
    needs: unit-tests
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to test node
        run: |
          # Rsync + build on .228
          # Run orchestration smoke tests
          bash scripts/run-container-tests.sh
```
Create `scripts/run-container-tests.sh` (~200 lines):
Reuses the smoke test logic from dev-container-test.sh but structured for CI:
- JSON output for CI parsing
- Exit codes for pass/fail
- Timeout handling (5 min max)
- Cleanup after test (remove test containers)
- Tests: install, start, stop, restart, uninstall, health check, restart tracker, reconciliation
**Files:**
- `.gitea/workflows/container-tests.yml` — NEW
- `scripts/run-container-tests.sh` — NEW
---
## Execution Order
1. **Layer C first** (mock tests) — Get the logic tested, runs locally, fast feedback
2. **Layer A second** (dev loop) — Test against real containers with fast iteration
3. **Layer B last** (CI) — Automate regression catching
## Files Summary
| File | Action | Layer |
|------|--------|-------|
| `core/archipelago/src/container/mock_podman.rs` | NEW | C |
| `core/archipelago/src/container/mod.rs` | MODIFY | C |
| `core/archipelago/tests/orchestration_tests.rs` | NEW | C |
| `core/archipelago/src/health_monitor.rs` | REFACTOR (extract pure logic) | C |
| `core/archipelago/src/api/rpc/package/runtime.rs` | MODIFY (pub fn) | C |
| `scripts/dev-start.sh` | MODIFY (add option 9) | A |
| `scripts/dev-container-test.sh` | NEW | A |
| `.gitea/workflows/container-tests.yml` | NEW | B |
| `scripts/run-container-tests.sh` | NEW | B |
## Verification
- Layer C: `cargo test -p archipelago --test orchestration_tests` — all pass on macOS
- Layer A: `./scripts/dev-start.sh` → option 9 → green smoke tests on .228
- Layer B: Push to dev-iso → CI green on container-tests workflow


@ -0,0 +1,89 @@
# Plan: ISO Polish — Fix Everything for Beta Release
## Context
Fresh ISO install on .198 revealed 11 issues ranging from critical (app installs, Tor broken) to UX (GRUB scaling, boot splash, kiosk reliability). Goal: next ISO build produces a flawless out-of-box experience.
## Issues & Fixes (priority order)
### 1. CRITICAL: Tor services.json not written (escaping bug)
**Symptom:** `setup-tor.sh: line 12: $ARCHY_TOR_DIR/services.json: No such file or directory`
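**Root cause:** The error message prints the literal text `$ARCHY_TOR_DIR`, so the generated setup-tor.sh never expands the variable. In `build-auto-installer-iso.sh`, the setup-tor heredoc writes it as `\$ARCHY_TOR_DIR`. Most likely the escape is carried into the output script as a literal string rather than a variable reference.
**Fix:** In the heredoc that generates setup-tor.sh (~line 1200), make sure the generated script contains a plain `$ARCHY_TOR_DIR`, and that the variable is set when the script runs, so it expands at runtime. Check the quoting of `<<TORSCRIPT` and of any enclosing heredoc carefully:
- In an unquoted heredoc, `\$VAR` is written out as `$VAR`, while a bare `$VAR` expands at build time.
- In a quoted heredoc (`<<'TORSCRIPT'`), everything is copied verbatim, backslashes included.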
**File:** `image-recipe/build-auto-installer-iso.sh` (setup-tor heredoc section)
### 2. CRITICAL: App installs failing ("Operation failed")
**Symptom:** Screenshot shows "Failed: Error: Operation failed. Check server logs" + "Downloading..." stuck
**Root cause:** This is the OLD build (pre-CSRF fix). The new build has the fix. However, `sanitize_error_message()` in `middleware.rs` masks ALL real errors. Need to verify the new build works.
**Fix:** Already fixed (auth.ts, rpc-client.ts, mod.rs). Verify on next ISO.
**Also:** Consider allowing "Failed to pull" errors through the sanitizer so users see meaningful install errors.
**File:** `core/archipelago/src/api/rpc/middleware.rs`
### 3. HIGH: Kiosk white screen / never loads on first boot
**Symptom:** First boot: black screen → white screen → kiosk never loads. Second boot works fine.
**Root cause:** The kiosk `ExecStartPre` health check polls 15x with 2s delay (30s max), but on first boot the backend may not be ready within 30s (first-boot-containers, Tor setup, etc. all running). Chromium opens `http://localhost/kiosk` before nginx/backend is fully up → white page. No retry logic in the launcher.
**Fix:** Increase health check to 30 attempts (60s). Add a loading page that Chromium shows while waiting (a simple HTML file served by nginx even when backend is down). Add `--disable-gpu` flag to Chromium (fixes some white screen issues on low-end GPUs).
**File:** `image-recipe/build-auto-installer-iso.sh` (kiosk launcher + ExecStartPre)
### 4. HIGH: GRUB theme text not scaling / cut off on 4:3 monitors
**Symptom:** Screenshot shows "Install (var" cut off, menu items barely readable on 1280x1024 Dell
**Root cause:** GRUB theme uses percentage-based layout but no font size control. GRUB defaults to a small bitmap font. The `item_height = 40` is fixed pixels, too small at some resolutions. No explicit font loaded in theme.txt.
**Fix:** In `grub.cfg`, load a larger font (24px DejaVu or similar). Adjust theme.txt: increase `item_height`, move menu position up, ensure text fits. Add `loadfont` to grub.cfg.
**Files:** `image-recipe/branding/grub-theme/theme.txt`, `image-recipe/build-auto-installer-iso.sh` (grub.cfg generation)
### 5. HIGH: LUKS partition not showing in disk stats
**Symptom:** Server view doesn't show LUKS encryption status or the encrypted partition
**Root cause:** Backend `system.disk-status` uses `df /` or `df /var/lib/archipelago` but doesn't report LUKS status. No `cryptsetup status` call. Frontend only shows used/total/free/percent.
**Fix:** Add LUKS detection to the disk status RPC: check if `/dev/mapper/archipelago*` exists, read `cryptsetup status`. Return `encrypted: true/false` and `encryption_cipher` fields. Frontend: show a lock icon + "LUKS2 Encrypted" badge in disk stats.
**Files:** `core/archipelago/src/api/rpc/system/handlers.rs`, `neode-ui/src/views/Server.vue`
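For the LUKS detection, the text output of `cryptsetup status <mapping>` already carries both values the RPC needs: an indented `type:` line (for example `LUKS2`) and a `cipher:` line. The std-only sketch below covers only the parsing half. The `LuksStatus` struct and the function name are hypothetical. Running `cryptsetup` (which normally needs root) and finding the `/dev/mapper/archipelago*` mapping are left to the handler.

```rust
/// Values the disk-status RPC could return (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct LuksStatus {
    pub luks_type: String,         // e.g. "LUKS2"
    pub encryption_cipher: String, // e.g. "aes-xts-plain64"
}

/// Parse `cryptsetup status <mapping>` output. Returns None when the mapping
/// is inactive or the reported type is not a LUKS type.
pub fn parse_cryptsetup_status(output: &str) -> Option<LuksStatus> {
    let mut luks_type = None;
    let mut cipher = None;
    for line in output.lines() {
        // Detail lines look like "  cipher:  aes-xts-plain64"
        if let Some((key, value)) = line.trim().split_once(':') {
            match key.trim() {
                "type" => luks_type = Some(value.trim().to_string()),
                "cipher" => cipher = Some(value.trim().to_string()),
                _ => {}
            }
        }
    }
    match (luks_type, cipher) {
        (Some(t), Some(c)) if t.starts_with("LUKS") => {
            Some(LuksStatus { luks_type: t, encryption_cipher: c })
        }
        _ => None,
    }
}
```

The handler would map `Some(..)` to `encrypted: true` plus `encryption_cipher`, and `None` to `encrypted: false`.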
### 6. MEDIUM: No Plymouth boot splash showing
**Symptom:** No animation between GRUB and login — just black screen with blinking cursor
**Root cause:** Plymouth theme files exist in `image-recipe/branding/plymouth-theme/` but the ISO build doesn't copy the logo.png or install the theme properly. Also kernel cmdline needs `splash quiet` and `plymouth-set-default-theme` must be run.
**Fix:** Verify the ISO build copies plymouth theme + logo.png to rootfs, runs `plymouth-set-default-theme -R archipelago` (`-R` rebuilds the initramfs so the theme is available at early boot), and kernel cmdline includes `splash quiet`.
**File:** `image-recipe/build-auto-installer-iso.sh` (plymouth setup section)
### 7. MEDIUM: No custom MOTD
**Symptom:** Default Debian MOTD on VT1 login
**Fix:** Add custom MOTD to ISO build that shows Archipelago ASCII logo, version, IP address, and useful commands (kiosk toggle, SSH info).
**File:** `image-recipe/build-auto-installer-iso.sh` (add MOTD generation)
### 8. MEDIUM: Onboarding intro needs double press
**Symptom:** Pressing the intro circle/button once resets, need to press twice
**Root cause:** `SplashScreen.vue` has a 48-segment ring animation triggered on hover. The splash → intro transition may have a race condition with animation completion. `OnboardingIntro.vue` auto-focuses CTA after 2100ms delay — if user clicks before that, focus may steal the event.
**Fix:** Investigate SplashScreen.vue transition timing. Add click debounce or ensure single-click always proceeds.
**Files:** `neode-ui/src/components/SplashScreen.vue`, `neode-ui/src/views/OnboardingIntro.vue`
### 9. MEDIUM: No TUI animations in actual installer
**Symptom:** Installer is functional but plain — no bouncing Bitcoin, no glitch effects from demo
**Root cause:** `scripts/install-tui-demo.sh` has elaborate animations but the actual installer in the ISO build script is minimal (basic spinner + typewriter only).
**Fix:** Port key animations from install-tui-demo.sh into the actual installer: logo decrypt reveal, progress bars with percentage, phase transitions. Keep it lightweight but visually distinctive.
**File:** `image-recipe/build-auto-installer-iso.sh` (auto-install.sh section)
### 10. LOW: Container tests CI failing
**Symptom:** `cargo: command not found` in container-tests workflow
**Fix:** Add `source $HOME/.cargo/env` to test steps. Already staged locally.
**File:** `.gitea/workflows/container-tests.yml`
### 11. LOW: Kiosk enable/disable command lacks visual feedback
**Symptom:** User runs command, MOTD changes but no immediate visual confirmation
**Root cause:** The `archipelago-kiosk` script DOES print feedback messages. The issue may be that VT auto-switches and the user doesn't see the output.
**Fix:** Add a brief sleep before VT switch so user sees the confirmation message. Consider adding a `--quiet` flag for scripted use.
**File:** `image-recipe/build-auto-installer-iso.sh` (kiosk toggle script)
## Execution Order
1. Tor fix (#1) — 5 min, critical
2. Kiosk reliability (#3) — 15 min, high impact
3. GRUB text scaling (#4) — 15 min, visible
4. LUKS disk stats (#5) — 20 min, backend + frontend
5. App install error messages (#2) — 10 min, verify + improve
6. Plymouth boot splash (#6) — 15 min
7. Custom MOTD (#7) — 10 min
8. Intro double-press (#8) — 10 min
9. TUI animations (#9) — 30 min (port from demo)
10. CI fix (#10) — 2 min
11. Kiosk feedback (#11) — 5 min
## Verification
- Build new ISO on .228 via CI (push to main)
- Flash to USB, install on .198
- Check: GRUB readable → Plymouth splash → TUI installer animations → MOTD shows → Kiosk loads first time → Tor onion addresses visible → App installs work → Disk shows LUKS → Intro single-click works


@ -0,0 +1,205 @@
# BIP-39 Master Seed — Unified Key Derivation for Archipelago
## Context
Archipelago's current identity system is broken:
- Node key generated randomly at boot, before the user exists
- Each identity creates independent random Ed25519 + secp256k1 keys
- ADR-008 says "both keys derived from same master seed" but code doesn't do this
- Backup only covers the node key, not identity keys
- No seed phrase — backup is an opaque encrypted blob with a user passphrase
- Restore path disabled ("Coming Soon")
- No connection between node identity and Bitcoin/LND wallet keys
**Goal:** One 24-word BIP-39 seed phrase derives ALL keys. User writes down 24 words, can recover everything on a fresh install.
---
## Derivation Scheme
```
BIP-39 Mnemonic (24 words, 256-bit entropy)
-> PBKDF2-HMAC-SHA512 (2048 rounds, empty passphrase)
-> Master Seed (64 bytes)
|
+-- HKDF-SHA256(seed, info="archipelago/node/ed25519/v1") -> Node Ed25519 key -> did:key
+-- HKDF-SHA256(seed, info="archipelago/nostr-node/secp256k1/v1") -> Node Nostr key
+-- HKDF-SHA256(seed, info="archipelago/identity/{i}/ed25519/v1") -> Identity i Ed25519 -> did:key
+-- BIP-32 m/44'/1237'/{i}'/0/0 -> Identity i Nostr key (NIP-06, account = i)
+-- BIP-32 m/84'/0'/0' -> Bitcoin Core wallet (native segwit)
+-- HKDF-SHA256(seed, info="archipelago/lnd/entropy/v1") -> 16 bytes -> LND aezeed entropy
```
---
## Phase 1: Seed Module (foundation)
### New crates in `core/archipelago/Cargo.toml`
```toml
bip39 = "=2.1.0"
bitcoin = { version = "=0.32.5", features = ["rand-std"] }
```
### New file: `core/archipelago/src/seed.rs`
**`MasterSeed` struct** — wraps `Zeroizing<[u8; 64]>`, implements `ZeroizeOnDrop`
Functions:
- `MasterSeed::generate() -> (Mnemonic, MasterSeed)` — 256-bit entropy, 24 words
- `MasterSeed::from_mnemonic(mnemonic) -> MasterSeed` — for restore
- `MasterSeed::from_mnemonic_words(words: &str) -> Result<(Mnemonic, MasterSeed)>` — parse + validate
- `derive_node_ed25519(&MasterSeed) -> SigningKey` — HKDF with info `archipelago/node/ed25519/v1`
- `derive_identity_ed25519(&MasterSeed, index: u32) -> SigningKey` — HKDF with info `archipelago/identity/{index}/ed25519/v1`
- `derive_nostr_identity_key(&MasterSeed, index: u32) -> nostr_sdk::Keys` — BIP-32 `m/44'/1237'/{index}'/0/0` (NIP-06 varies the account level per key)
- `derive_node_nostr_key(&MasterSeed) -> nostr_sdk::Keys` — HKDF with info `archipelago/nostr-node/secp256k1/v1`
- `derive_bitcoin_xprv(&MasterSeed) -> Xpriv` — BIP-32 `m/84'/0'/0'`
- `derive_lnd_entropy(&MasterSeed) -> [u8; 16]` — HKDF with info `archipelago/lnd/entropy/v1`
- `save_seed_encrypted(data_dir, mnemonic, passphrase)` — Argon2+ChaCha20 to `master_seed.enc`
- `load_seed_encrypted(data_dir, passphrase) -> Mnemonic`
- `seed_exists(data_dir) -> bool`
- `save_identity_index(data_dir, next_index: u32)` / `load_identity_index(data_dir) -> u32`
Security: Never log seed/mnemonic. All seed types implement `ZeroizeOnDrop`. File permissions 0o600.
Existing building blocks to reuse:
- `mesh/crypto.rs:hkdf_sha256()` / `hkdf_sha256_32()` — already implemented
- `backup/identity.rs` encryption pattern — Argon2+ChaCha20 (reuse for `save_seed_encrypted`)
- `ed25519-dalek`, `sha2`, `hmac`, `hkdf`, `zeroize` — all in Cargo.toml already
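The pairing of "256-bit entropy" with "24 words" in `MasterSeed::generate()` follows from BIP-39's layout. The checksum adds ENT/32 bits to the entropy, and each word encodes 11 bits. So 256 + 8 = 264 bits, and 264 / 11 = 24 words. The tiny Rust check below makes the arithmetic explicit. The function name is illustrative and is not part of the `bip39` crate.

```rust
/// BIP-39 mnemonic length for an entropy size in bits.
/// Valid sizes are 128..=256 in steps of 32; the checksum adds ENT/32 bits
/// and every word encodes 11 bits.
pub fn bip39_word_count(entropy_bits: u32) -> Option<u32> {
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    let checksum_bits = entropy_bits / 32;
    Some((entropy_bits + checksum_bits) / 11)
}
```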
---
## Phase 2: Onboarding UI
### New Vue views:
**`OnboardingSeedGenerate.vue`** — calls `seed.generate`, displays 24 words in grid, "I wrote these down" checkbox
**`OnboardingSeedVerify.vue`** — picks 4 random word positions, user types them back, calls `seed.verify`, shows DID + npub on success
**`OnboardingSeedRestore.vue`** — 24 input fields with BIP-39 wordlist autocomplete, calls `seed.restore`
### New onboarding flow:
```
Intro -> Options (Fresh / Restore) -> [branch]
FRESH: SeedGenerate -> SeedVerify -> Identity (name/purpose) -> Done
RESTORE: SeedRestore -> Done
```
### Router changes (`neode-ui/src/router/index.ts`):
- Add routes: `onboarding/seed`, `onboarding/seed-verify`, `onboarding/seed-restore`
- Remove: `onboarding/did`, `onboarding/backup`, `onboarding/verify`
- Enable Restore path in `OnboardingOptions.vue`
### RPC client (`neode-ui/src/api/rpc-client.ts`):
- `generateSeed()`, `verifySeed()`, `restoreSeed()`, `saveSeedEncrypted()`, `seedStatus()`
---
## Phase 3: Backend Integration
### `identity.rs` — add `NodeIdentity::from_seed(identity_dir, &MasterSeed)`
- Derives Ed25519 node key via `seed::derive_node_ed25519()`
- Writes to `node_key` / `node_key.pub` (same format as today)
- Existing `load_or_create()` unchanged (loads from disk, works for both seed-derived and legacy keys)
### `identity_manager.rs` — seed-aware `create()`
- When seed available: derive Ed25519 from `derive_identity_ed25519(seed, index)`, Nostr from `derive_nostr_identity_key(seed, index)`
- Increment and persist `identity_index`
- Add `derivation_index: Option<u32>` to `IdentityFile` (serde default, backward-compatible)
- When no seed (legacy): fall back to current random generation
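`save_identity_index` and `load_identity_index` only need to round-trip one integer. Atomic writes still matter here: a truncated or rolled-back index would cause an already-used key to be derived again. The minimal std-only sketch below makes three assumptions:

- The file is the plain-text `identity_index` from the file layout.
- A missing file means index 0.
- The write goes to a temp file, is synced, and is then renamed into place with mode 0o600.

These are illustrative choices, not existing code.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

const INDEX_FILE: &str = "identity_index";

/// Read the next derivation index; a missing file means no identities yet.
pub fn load_identity_index(dir: &Path) -> io::Result<u32> {
    match fs::read_to_string(dir.join(INDEX_FILE)) {
        Ok(s) => s
            .trim()
            .parse::<u32>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// Persist the next index via write-to-temp + rename, so a crash never
/// leaves a truncated file. Unix-only because of the 0o600 mode.
#[cfg(unix)]
pub fn save_identity_index(dir: &Path, next_index: u32) -> io::Result<()> {
    use std::os::unix::fs::OpenOptionsExt;
    let tmp = dir.join(format!("{}.tmp", INDEX_FILE));
    let mut f = fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600)
        .open(&tmp)?;
    writeln!(f, "{}", next_index)?;
    f.sync_all()?;
    fs::rename(&tmp, dir.join(INDEX_FILE))
}
```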
### `server.rs` — startup flow:
```
seed exists + node_key exists -> Normal seed-backed operation
no seed + node_key exists -> Legacy node, show migration prompt
no seed + no node_key -> Fresh install, await onboarding
seed exists + no node_key -> Re-derive from seed (recovery)
```
- Add `seed_backed: bool` to `ServerInfo`
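The four startup cases cover every combination of two booleans. That makes them a natural fit for a `match` that the compiler checks for exhaustiveness. The sketch below uses hypothetical enum and function names. In `server.rs` the inputs would presumably come from `seed_exists(data_dir)` and from checking for the `node_key` file.

```rust
/// Startup cases from the table above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupState {
    SeedBacked,       // seed + node_key: normal seed-backed operation
    LegacyNode,       // node_key only: show migration prompt
    FreshInstall,     // neither: await onboarding
    RederiveFromSeed, // seed only: re-derive node_key (recovery)
}

pub fn startup_state(seed_exists: bool, node_key_exists: bool) -> StartupState {
    match (seed_exists, node_key_exists) {
        (true, true) => StartupState::SeedBacked,
        (false, true) => StartupState::LegacyNode,
        (false, false) => StartupState::FreshInstall,
        (true, false) => StartupState::RederiveFromSeed,
    }
}
```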
### New RPC endpoints in `api/rpc/seed.rs`:
- `seed.generate` — generates mnemonic, derives & writes node keys, returns words (onboarding only, unauth)
- `seed.verify` — validates user re-entered correct words (onboarding only)
- `seed.restore` — accepts 24 words, derives all keys, writes to disk (onboarding only, unauth)
- `seed.save-encrypted` — encrypts mnemonic to `master_seed.enc` (optional convenience)
- `seed.status` — returns `{ has_seed, is_legacy, identity_count, next_index }`
- `seed.derive-lnd-entropy` — password-protected, returns 16 bytes for LND wallet init
- `seed.derive-bitcoin-xprv` — password-protected, returns xprv for Bitcoin Core import
In-memory mnemonic between `seed.generate` and `seed.verify`: held in `Mutex<Option<Zeroizing<String>>>` with 10-minute auto-clear timeout.
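The std-only sketch below shows that pending-mnemonic slot with the 10-minute expiry. It makes these assumptions:

- Expiry is checked lazily on access, with no background timer.
- Positions are 0-based.
- The slot is cleared after a successful verify.

The real version would hold `zeroize::Zeroizing<String>` as described. The best-effort `wipe` helper here only stands in for it in a dependency-free example. `PendingMnemonic` is a hypothetical name.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

pub const PENDING_TTL: Duration = Duration::from_secs(10 * 60);

/// Holds the mnemonic between seed.generate and seed.verify.
#[derive(Default)]
pub struct PendingMnemonic {
    slot: Mutex<Option<(String, Instant)>>,
}

/// Best-effort overwrite before drop (zeroize does this more reliably).
fn wipe(s: String) {
    let mut bytes = s.into_bytes();
    bytes.fill(0);
    std::hint::black_box(&bytes); // discourage eliding the dead writes
}

impl PendingMnemonic {
    pub fn store(&self, words: String, now: Instant) {
        let old = self.slot.lock().unwrap().replace((words, now));
        if let Some((w, _)) = old {
            wipe(w);
        }
    }

    /// True if `candidate[k]` equals the word at `positions[k]` for every k.
    /// Clears the slot on success or once it has expired.
    pub fn verify(&self, positions: &[usize], candidate: &[&str], now: Instant) -> bool {
        let mut slot = self.slot.lock().unwrap();
        let expired = matches!(&*slot, Some((_, created)) if now.duration_since(*created) >= PENDING_TTL);
        if expired {
            if let Some((w, _)) = slot.take() { wipe(w); }
            return false;
        }
        let ok = match &*slot {
            Some((words, _)) => {
                let list: Vec<&str> = words.split_whitespace().collect();
                positions.len() == candidate.len()
                    && positions.iter().zip(candidate).all(|(&i, c)| list.get(i) == Some(c))
            }
            None => false,
        };
        if ok {
            if let Some((w, _)) = slot.take() { wipe(w); }
        }
        ok
    }
}
```

A wrong answer leaves the words in place so the user can retry within the window.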
---
## Phase 4: Bitcoin/LND Integration
### LND wallet from seed:
- `lnd.init-wallet-from-seed` handler — derives 16-byte entropy, calls LND REST `GET /v1/genseed` with `seed_entropy` to obtain the aezeed mnemonic, then `POST /v1/initwallet` with that `cipher_seed_mnemonic`
- Triggered during LND first-install flow
### Bitcoin Core wallet from seed:
- `bitcoin.init-wallet-from-seed` handler — derives BIP-84 xprv, calls `createwallet` + `importdescriptors` via Bitcoin Core RPC
- Triggered during Bitcoin Core first-install flow
Both endpoints require password re-verification.
---
## Phase 5: Migration & Polish
### Legacy node migration:
- Detect legacy nodes (node_key exists, no master_seed.enc)
- Settings page shows prompt: "Set up seed phrase to protect future identities"
- Existing keys preserved — only NEW identities use seed derivation
- Optional full migration (`seed.migrate-legacy`) can be added later
### Cleanup:
- Remove old `OnboardingDid.vue`, `OnboardingBackup.vue`, `OnboardingVerify.vue`
- Update Settings backup section to show seed status
- Update ADR-008 to reflect implementation matches description
---
## File Layout After Implementation
```
{data_dir}/identity/
node_key # 32 bytes Ed25519 secret (derived from seed or legacy)
node_key.pub # 32 bytes Ed25519 public
master_seed.enc # NEW: encrypted mnemonic (optional convenience backup)
identity_index # NEW: next derivation index (plain text integer)
{data_dir}/identities/
{uuid}.json # Same format + optional derivation_index field
```
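A sketch of how `identity_index` (the plain-text next-index file) could be read and advanced. It assumes:

- a missing file means index 0;
- the caller already holds whatever lock serialises identity creation.

Writing a temporary file and renaming it over the original keeps the update atomic on a single filesystem. The function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the next derivation index; a missing file counts as 0.
pub fn read_identity_index(path: &Path) -> io::Result<u32> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<u32>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// Return the index to use for a new identity and persist index + 1.
/// The caller must serialise calls (e.g. behind the identity manager's lock).
pub fn reserve_identity_index(path: &Path) -> io::Result<u32> {
    let current = read_identity_index(path)?;
    let next = current
        .checked_add(1)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "identity index overflow"))?;
    // Write a sibling temp file, then rename it over the original.
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, format!("{next}\n"))?;
    fs::rename(&tmp, path)?;
    Ok(current)
}
```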
---
## Critical Files to Modify
| File | Change |
|------|--------|
| `core/archipelago/Cargo.toml` | Add `bip39`, `bitcoin` crates |
| `core/archipelago/src/seed.rs` | **NEW** — all seed logic |
| `core/archipelago/src/identity.rs` | Add `from_seed()` constructor |
| `core/archipelago/src/identity_manager.rs` | Seed-aware `create()`, add `derivation_index` |
| `core/archipelago/src/server.rs` | Startup state detection (seed/legacy/fresh) |
| `core/archipelago/src/api/rpc/seed.rs` | **NEW** — seed RPC handlers |
| `core/archipelago/src/api/rpc/dispatcher.rs` | Register seed.* endpoints |
| `neode-ui/src/views/OnboardingSeedGenerate.vue` | **NEW** — show 24 words |
| `neode-ui/src/views/OnboardingSeedVerify.vue` | **NEW** — verify written words |
| `neode-ui/src/views/OnboardingSeedRestore.vue` | **NEW** — enter 24 words to restore |
| `neode-ui/src/views/OnboardingOptions.vue` | Enable Restore path |
| `neode-ui/src/router/index.ts` | Update onboarding routes |
| `neode-ui/src/api/rpc-client.ts` | Add seed RPC methods |
---
## Verification
1. **Unit tests**: Deterministic derivation (same mnemonic -> same keys), invalid mnemonic rejection, index increment, zeroization
2. **Integration**: Fresh install flow end-to-end, restore flow (generate on node A, enter words on node B, verify same DID/npub)
3. **Security**: Grep seed.rs for tracing macros that interpolate seed vars, verify file permissions
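4. **LND**: Derive entropy, init wallet, and verify the same entropy yields the same node `identity_pubkey` (the aezeed mnemonic embeds a birthday and a random salt, so the words themselves differ between runs)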
5. **Bitcoin Core**: Derive xprv, import descriptors, verify addresses match
6. **Legacy**: Existing node without seed starts normally, can still create identities
7. **Type check**: `cd neode-ui && npx vue-tsc -b --noEmit`

.claude/rules/backend.md

@ -0,0 +1,14 @@
---
globs:
- "core/**/*.rs"
- "core/**/Cargo.toml"
---
# Backend Rules (Archipelago — Rust)
- Backend binds `127.0.0.1` only — nginx handles external access
- Validate all input before path construction — reject `..`, `/`, null bytes
- Timeouts on all external operations (10s default, 30s for heavy operations such as Bitcoin RPC)
- Use `anyhow::Result` for error propagation, not `.unwrap()` in handlers
- Log with `tracing`, never `println!` or `eprintln!` in production paths
- Container commands through `PodmanClient` (core/container/), never raw `Command::new("podman")`
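A minimal std-only sketch of the input-validation rule. Before a user-supplied id is joined onto a data directory, it rejects:

- empty input and `.`;
- any `..`;
- `/`;
- NUL bytes.

It returns `String` errors to stay dependency-free; in the backend this would be an `anyhow::Result`. The names are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Check a single user-supplied path component (e.g. a package id).
pub fn validate_path_component(input: &str) -> Result<&str, String> {
    if input.is_empty() || input == "." {
        return Err(format!("invalid path component: {input:?}"));
    }
    // `..` anywhere, any `/`, or a NUL byte could escape or truncate the path.
    if input.contains("..") || input.contains('/') || input.contains('\0') {
        return Err(format!("rejected path component: {input:?}"));
    }
    Ok(input)
}

/// Join a validated id onto a base directory such as /var/lib/archipelago.
pub fn data_dir_for(base: &Path, id: &str) -> Result<PathBuf, String> {
    Ok(base.join(validate_path_component(id)?))
}
```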


@ -0,0 +1,80 @@
# ISO Boot Branding — Archipelago
Design and build the visual boot experience from USB power-on to web UI.
## Brand Identity
**Archipelago** = self-sovereign Bitcoin node OS. Floating islands in the sky.
| Element | Value |
|---------|-------|
| Primary accent | `#fb923c` (Bitcoin orange) |
| Secondary accent | `#f7931a` (deeper orange) |
| Success | `#4ade80` (green) |
| Background | `#0a0a0a` -> `#050505` (near-black) |
| Text | `#ffffff` (white), `#aaaaaa` (dim), `#555555` (subtle) |
| Glass | `rgba(255,255,255,0.06)` frost overlay |
| Style | Pixel art cyberpunk, dark glass morphism, CRT scanlines |
| Logo | Pixel-art lowercase "a" (from SVG favicon) |
## Boot Stages & What's Customizable
### 1. GRUB Menu (UEFI boot)
- **Background**: `branding/grub-theme/background.png` — any PNG, GRUB scales it
- **Theme**: `branding/grub-theme/theme.txt` — colors, layout, labels
- **Fonts**: Generated with `grub-mkfont` during build, .pf2 format
- **Config**: Written by build script in Step 5 (`grub.cfg` heredoc)
GRUB theme.txt properties that work:
```
desktop-color: "#rrggbb"
desktop-image: "background.png"
title-text: ""
+ boot_menu { left/top/width/height = N%; item_color/selected_item_color = "#rrggbb" }
+ label { left/top/width = N%; text = "string"; color = "#rrggbb"; align = "center" }
```
**IMPORTANT**: Do NOT reference font names in theme.txt unless you know the exact internal name from grub-mkfont output.
### 2. ISOLINUX Menu (BIOS boot)
- Styled with `MENU COLOR` directives (ANSI attribute codes plus `#AARRGGBB` foreground/background colors)
- Use `vesamenu.c32` for graphical, `menu.c32` for compatibility
### 3. Plymouth Splash (kernel boot -> login)
- Theme: `branding/plymouth-theme/archipelago.script`
- Logo: `branding/plymouth-theme/logo.png` (PNG with transparency)
- Config: `branding/plymouth-theme/archipelago.plymouth`
- Kernel param `splash` must be present
### 4. Console Banner (TTY login)
- ASCII art in `/etc/profile.d/archipelago.sh`
- Uses ANSI escape codes for color
### 5. Installer Prompt
- In systemd service wrapper: `/usr/local/bin/archipelago-start-installer`
## Image Specs
| Asset | Format | Size | Notes |
|-------|--------|------|-------|
| GRUB background | PNG | 1024x768 recommended | Large images slow boot |
| Plymouth logo | PNG (RGBA) | 256x256 recommended | Transparent background |
| GRUB fonts | .pf2 | Generated | `grub-mkfont -s SIZE -o out.pf2 input.ttf` |
## Build Integration
GRUB theme: Step 2 (copied from `branding/grub-theme/`, fonts generated with `grub-mkfont`)
Plymouth theme: Step 3 (component copy) + Step 4 (auto-install.sh copies to target)
GRUB on target: auto-install.sh copies to `/mnt/target/boot/grub/themes/archipelago/`
## What to Edit
| File | Affects |
|------|---------|
| `branding/grub-theme/background.png` | GRUB boot screen image |
| `branding/grub-theme/theme.txt` | GRUB menu colors, layout |
| `branding/plymouth-theme/logo.png` | Plymouth boot logo |
| `branding/plymouth-theme/archipelago.script` | Plymouth animation/progress |
| `branding/generate-grub-background.py` | Procedural background generator |
| `branding/generate-plymouth-logo.py` | Procedural logo generator |


@ -0,0 +1,89 @@
---
name: podman
description: Rootless Podman container management — diagnose, fix, and harden uptime. Use for container issues, port problems, UID mapping, health checks, or uptime hardening.
disable-model-invocation: true
allowed-tools: Bash, Read, Edit, Write, Glob, Grep
argument-hint: "[diagnose|fix|uptime] [container-name]"
---
# Podman — Container Management
Archipelago runs rootless Podman as `archipelago` user (UID 1000). All `podman` commands run without sudo. UID mapping: container UID N → host UID (100000 + N).
**SSH**: `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
## Diagnose
```bash
# Container status
podman ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}\t{{.Networks}}"
# Restart policies (must be "unless-stopped")
for c in $(podman ps -a --format "{{.Names}}"); do
echo -n "$c: "; podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}"
done
# Health checks
for c in $(podman ps --format "{{.Names}}"); do
health=$(podman inspect "$c" --format "{{.State.Health.Status}}" 2>/dev/null)
[ -n "$health" ] && [ "$health" != "<no value>" ] && echo "$c: $health"
done
# Resource usage + recent deaths
podman stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
podman events --filter event=died --since 24h 2>/dev/null | tail -10
# Rootless prerequisites
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR" # must be /run/user/1000
grep archipelago /etc/subuid # must show archipelago:100000:65536
ls /var/lib/systemd/linger/ | grep archipelago # must exist
grep DEFAULT_FORWARD_POLICY /etc/default/ufw # must be ACCEPT
```
Cross-check 4 layers for port consistency: Backend config (package.rs) → Podman ports → Nginx proxy → Frontend appLauncher.ts. See `references/port-map.md`.
## Fix
**Restart policy missing**: `podman update --restart unless-stopped CONTAINER_NAME`
**UID mapping (permission denied)**: `sudo chown -R HOST_UID:HOST_UID /var/lib/archipelago/APP`. Formula: host_uid = 100000 + container_uid. See `references/uid-mapping.md`.
**Port conflict**: `ss -tlnp | grep :PORT` to find offender. Can't add ports to running container — must recreate.
**Network missing**: `podman network connect archy-net CONTAINER_NAME`
**UFW blocking LAN**: `sudo sed -i 's/DEFAULT_FORWARD_POLICY="DROP"/DEFAULT_FORWARD_POLICY="ACCEPT"/' /etc/default/ufw && sudo ufw reload`
**Stale processes**: `pgrep -c -f "podman ps"` — if >10, kill stuck processes.
See `references/common-failures.md` for the full error→cause→fix lookup table.
## Uptime Hardening
### Layer 1: Restart policies
```bash
for c in $(podman ps -a --format "{{.Names}}"); do
policy=$(podman inspect "$c" --format "{{.HostConfig.RestartPolicy.Name}}")
[ "$policy" = "no" ] || [ -z "$policy" ] && podman update --restart unless-stopped "$c"
done
```
### Layer 2: Watchdog timer
Create `/usr/local/bin/archipelago-container-watchdog.sh` that restarts stopped/unhealthy containers every 2 minutes via systemd timer. Script runs as archipelago user with `XDG_RUNTIME_DIR=/run/user/1000`.
### Layer 3: Ordered startup
Bitcoin stack has dependency chain: bitcoin-knots → electrumx + lnd → mempool + btcpay + fedimint → UI containers. Create `/usr/local/bin/archipelago-ordered-start.sh` with wait-for-container logic between tiers.
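The tiers can be computed from dependency edges rather than hard-coded. This sketch applies Kahn's algorithm one level at a time: each tier holds the services whose dependencies all started in earlier tiers.

The edges in the example are illustrative, loosely following the chain above (e.g. mempool is assumed to wait on electrumx). The function name is hypothetical. A real ordered-start script would still wait for each container to report `running` before starting the next tier.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group services into start tiers so each one starts after all of its dependencies.
/// Returns the unscheduled services on a cycle, or when a dependency is not itself listed.
pub fn start_tiers<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<Vec<&'a str>>, Vec<&'a str>> {
    let mut pending: BTreeMap<&'a str, BTreeSet<&'a str>> = deps
        .iter()
        .map(|(svc, ds)| (*svc, ds.iter().copied().collect()))
        .collect();
    let mut tiers = Vec::new();
    while !pending.is_empty() {
        // Every service with no unmet dependencies can start in this tier.
        let ready: Vec<&'a str> = pending
            .iter()
            .filter(|(_, ds)| ds.is_empty())
            .map(|(svc, _)| *svc)
            .collect();
        if ready.is_empty() {
            return Err(pending.keys().copied().collect());
        }
        for svc in &ready {
            pending.remove(svc);
        }
        for ds in pending.values_mut() {
            for svc in &ready {
                ds.remove(svc);
            }
        }
        tiers.push(ready);
    }
    Ok(tiers)
}
```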
### Verification
```bash
sudo reboot # then SSH back after 3 min
podman ps --format "{{.Names}}" | sort # should match pre-reboot list
```
## Systemd Requirements
The archipelago.service needs these for rootless Podman:
- `ProtectHome=no` (podman stores in ~/.local/share/containers/)
- `PrivateTmp=no` (runtime in /tmp/podman-run-1000/)
- Do not set `RestrictNamespaces=` or `SystemCallFilter=`
- `Environment=XDG_RUNTIME_DIR=/run/user/1000`


@ -0,0 +1,102 @@
# Common Podman Failure Patterns
## Rootless Podman Specific Failures
| Error | Cause | Fix |
|-------|-------|-----|
| `ERRO[0000] cannot find UID/GID for user` | subuid/subgid not configured | Add `archipelago:100000:65536` to `/etc/subuid` and `/etc/subgid` |
| `Error: unshare: operation not permitted` | Systemd `RestrictNamespaces` blocks user namespaces | Remove `RestrictNamespaces=` from `archipelago.service` |
| `Error: could not get runtime: creating runtime` | XDG_RUNTIME_DIR not set or /run/user/1000 missing | Set `Environment=XDG_RUNTIME_DIR=/run/user/1000` in service, ensure `loginctl enable-linger archipelago` |
| `permission denied` on volume mount | Wrong UID ownership — must use mapped UIDs | `sudo chown -R 100000:100000 /var/lib/archipelago/APP` (see UID mapping table) |
| `ERRO[0000] rootless containers not supported` | Podman not configured for rootless | Run `podman system migrate`, check `/etc/subuid` |
| `Error: creating container storage: layer not known` | Corrupted rootless storage | `podman system reset` (destroys all containers — last resort) |
| `Error: stat /tmp/podman-run-1000/...: no such file` | PrivateTmp=yes in systemd isolates /tmp | Set `PrivateTmp=no` in `archipelago.service` |
| Container ports unreachable from LAN | UFW DEFAULT_FORWARD_POLICY="DROP" | Change to "ACCEPT" in `/etc/default/ufw`, then `sudo ufw reload` |
| `Error: error creating network namespace` | Systemd `SystemCallFilter` blocks clone/unshare | Remove `SystemCallFilter=` from `archipelago.service` |
| Containers lose network after service restart | podman runtime dir in /tmp cleaned | Ensure `PrivateTmp=no` so /tmp/podman-run-1000/ persists |
## Container Won't Start
| Error | Cause | Fix |
|-------|-------|-----|
| `exec format error` | Image or binary built for the wrong CPU architecture | Pull or rebuild for the host architecture (build on the Linux server, not macOS) |
| `address already in use` | Port conflict | `ss -tlnp \| grep :PORT` to find offender |
| `permission denied` | Missing capability, wrong UID ownership, or read-only root | Check capabilities, check volume ownership with mapped UID, add tmpfs |
| `OCI runtime error` | Corrupt container state | `podman rm -f NAME`, then recreate the container |
| `image not known` | Image not pulled | `podman pull IMAGE:TAG` |
| `no such network` | Network missing | `podman network create archy-net` |
| `Error: netavark: ...subnet overlap` | Network CIDR conflict | `podman network rm archy-net && podman network create archy-net` |
## Container Starts But App Unreachable
| Symptom | Check Layer | Fix |
|---------|------------|-----|
| Direct port works, /app/ doesn't | Nginx config | Add `/app/{id}/` location block |
| Neither works | Podman ports | `podman port NAME` — verify mapping exists |
| Port mapped but refused | Container logs | App crashing internally — check logs |
| Works sometimes | Resources | Check OOM kills, CPU, disk space |
| 502 Bad Gateway | Nginx→Container | Wrong port in proxy_pass or container restarted |
| Works locally but not from LAN | UFW forward policy | Set `DEFAULT_FORWARD_POLICY="ACCEPT"` in `/etc/default/ufw` |
## Container Keeps Dying
| Pattern | Cause | Fix |
|---------|-------|-----|
| Exits immediately (code 1) | Config error | Check `podman logs NAME` |
| Dies after minutes | OOM killed | Increase `--memory` limit |
| Dies when dep restarts | No restart policy | Add `--restart unless-stopped` |
| Crash loop | Repeated crash | Fix root cause, don't just restart |
| Exit code 127 | Missing binary in container | Wrong image tag or corrupted image — re-pull |
| Exit code 137 | Killed by OOM or signal | Check `dmesg` for OOM kill, check `podman inspect` for OOMKilled |
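This commit maps exit codes to UI states as follows: exit 0 is "stopped", 137 is "killed", and any other non-zero code is "crashed". The sketch below encodes that mapping with one refinement.

Exit code 137 only means SIGKILL (128 + 9), which `podman stop` also sends after its timeout. So this version reports OOM only when `{{.State.OOMKilled}}` is true, following the table's advice to check that flag. The function name and the plain "killed" label are assumptions.

```rust
/// UI label from `podman inspect` fields: `{{.State.Status}}`, `{{.State.ExitCode}}`
/// and `{{.State.OOMKilled}}`.
pub fn display_label(status: &str, exit_code: i32, oom_killed: bool) -> &'static str {
    match status {
        "running" => "running",
        // Created but never started; these used to be invisible to the health monitor.
        "created" => "created",
        "exited" | "stopped" => match exit_code {
            0 => "stopped",                      // clean exit: gray, not red
            137 if oom_killed => "killed (OOM)", // 128 + SIGKILL (9), OOM confirmed
            137 => "killed",                     // SIGKILL from elsewhere, e.g. a stop timeout
            _ => "crashed",                      // any other non-zero exit: red
        },
        _ => "unknown",
    }
}
```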
## Network Issues
| Problem | Cause | Fix |
|---------|-------|-----|
| Can't resolve container names | Not on archy-net | Recreate with `--network=archy-net` |
| Can't reach internet | DNS missing | Add `--dns 1.1.1.1` |
| Container-to-container timeout | Different networks | Put both on same network |
| Bitcoin RPC refused from container | rpcallowip wrong subnet | Use `rpcallowip=0.0.0.0/0` (safe: port mapped, not exposed) |
| Old containers can't find new network | Subnet changed (rootful→rootless) | Recreate containers on new archy-net (rootless uses 10.89.x.x) |
## Volume Permission Patterns (Rootless UID Mapping)
Formula: **host_uid = 100000 + container_uid**
| Container UID | Host UID | Apps | Data Directory |
|---|---|---|---|
| 0 (root) | 100000 | lnd, fedimint, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay, immich | `/var/lib/archipelago/{app}` |
| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `/var/lib/archipelago/postgres-*` |
| 101 | 100101 | bitcoin-knots | `/var/lib/archipelago/bitcoin` |
| 472 | 100472 | grafana | `/var/lib/archipelago/grafana` |
| 999 | 100999 | MariaDB (mysql-mempool) | `/var/lib/archipelago/mysql-mempool` |
## Capability Reference
| Capability | Apps That Need It | Failure Mode |
|-----------|------------------|-------------|
| CHOWN | nextcloud, homeassistant, btcpay, jellyfin, portainer | Can't chown during setup |
| SETUID/SETGID | nextcloud, homeassistant, btcpay, jellyfin | Can't switch to service user |
| DAC_OVERRIDE | nextcloud, homeassistant, btcpay | Can't access cross-UID files |
| FOWNER | bitcoin-knots, lnd, fedimint | Can't modify data dir perms |
| NET_BIND_SERVICE | nginx-proxy-manager, vaultwarden | Can't bind ports <1024 |
| NET_ADMIN + NET_RAW | tailscale | Can't create TUN device or manage routes |
## Read-Only Safe Apps
Only these apps can run with `--read-only` + tmpfs: searxng, grafana, filebrowser, electrumx, mempool-electrs, electrs, nostr-rs-relay, ollama, indeedhub
All others need writable root or will fail silently.
## Systemd Sandbox Requirements for Rootless Podman
These systemd service settings MUST be configured for rootless Podman to work:
| Setting | Required Value | Why |
|---------|---------------|-----|
| `ProtectHome=` | `no` | Podman stores images in `~/.local/share/containers/` |
| `PrivateTmp=` | `no` | Podman runtime lives in `/tmp/podman-run-1000/` |
| `RestrictNamespaces=` | NOT SET | Rootless podman creates user namespaces |
| `SystemCallFilter=` | NOT SET | Rootless podman needs clone/unshare syscalls |
| `ReadWritePaths=` | Include `/var/lib/archipelago /run/user /tmp /etc/containers /var/lib/containers /run/containers` | Volume data + podman runtime paths |
| `Environment=` | `XDG_RUNTIME_DIR=/run/user/1000` | Podman socket location |


@ -0,0 +1,71 @@
# Archipelago Canonical Port Map
All port assignments across the 4 configuration layers. When adding or debugging an app, every row must be consistent across all columns.
## Bitcoin Stack
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| bitcoin-knots | 8332, 8333, 28332, 28333 | 8332, 8333, 28332, 28333 | archy-net | /app/bitcoin-knots/ | 8332→bitcoin-knots |
| bitcoin-ui | 8334 | 80 | bridge | /app/bitcoin-ui/ | 8334→bitcoin-knots |
| electrs | 50001 | 50001 | archy-net | /app/electrs/ | 50001→electrs |
| lnd | 9735, 10009, 8080 | 9735, 10009, 8080 | archy-net | /app/lnd/ | 10009→lnd |
| lnd-ui (RTL) | 8081 | 80 | bridge | /app/lnd-ui/ | 8081→lnd |
## Lightning & Payment
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| btcpay-server | 23000 | 49392 | archy-net | /app/btcpay/ | 23000→btcpay-server |
| nbxplorer | 24444 | 32838 | archy-net | N/A (internal) | N/A |
| fedimint | 8173, 8174, 8175 | 8173, 8174, 8175 | archy-net | /app/fedimint/ | 8174→fedimint |
| fedimint-gateway | 8175 | 8175 | archy-net | /app/fedimint-gateway/ | 8175→fedimint-gateway |
## Explorer & Monitoring
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| mempool | 4080 | 8080 | archy-net | /app/mempool/ | 4080→mempool |
| grafana | 3000 | 3000 | bridge | /app/grafana/ | 3000→grafana (new tab) |
## Self-Hosted Apps
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nextcloud | 8085 | 80 | bridge | /app/nextcloud/ | 8085→nextcloud |
| vaultwarden | 8082 | 80 | bridge | /app/vaultwarden/ | 8082→vaultwarden (new tab) |
| filebrowser | 8083 | 80 | bridge | /app/filebrowser/ | 8083→filebrowser |
| searxng | 8888 | 8080 | bridge | /app/searxng/ | 8888→searxng |
| photoprism | 2342 | 2342 | bridge | /app/photoprism/ | 2342→photoprism (new tab) |
| jellyfin | 8096 | 8096 | bridge | /app/jellyfin/ | 8096→jellyfin |
| homeassistant | 8123 | 8123 | bridge | /app/homeassistant/ | 8123→homeassistant (new tab) |
| ollama | 11434 | 11434 | archy-net | /app/ollama/ | 11434→ollama |
| open-webui | 3080 | 8080 | archy-net | /app/open-webui/ | 3080→open-webui |
## Nostr & Social
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| nostr-rs-relay | 7000 | 8080 | archy-net | /app/nostr-rs-relay/ | 7000→nostr-rs-relay |
| indeedhub | 3001 | 3000 | archy-net | /app/indeedhub/ | 3001→indeedhub |
## System
| App | Host Port(s) | Container Port(s) | Network | Nginx Path | Frontend Map |
|-----|-------------|-------------------|---------|------------|-------------|
| tailscale | 8240 | 8240 | host | /app/tailscale/ | N/A |
| nginx-proxy-manager | 81, 8443 | 81, 443 | bridge | N/A | 81→nginx-proxy-manager |
## Multi-Container Stacks
**Immich**: immich-server (2283), immich-postgres (internal 5432), immich-redis (internal 6379) — all on immich-net
**Penpot**: penpot-frontend (9001→80), penpot-backend, penpot-exporter, penpot-postgres, penpot-mailcatch — all on penpot-net
**Mempool**: mempool (4080→8080), mempool-db (internal 3306) — on archy-net
**BTCPay**: btcpay-server (23000→49392), nbxplorer (24444→32838), btcpay-postgres (internal 5432) — on archy-net
## Key Notes
- **archy-net apps** resolve each other by container name (e.g., `bitcoin-knots:8332`)
- **bridge apps** are standalone — access services via host IP/port
- **host network** (tailscale only) — shares host namespace, no port mapping
- **New tab apps**: btcpay (23000), grafana (3000), vaultwarden (8082), photoprism (2342), homeassistant (8123) — X-Frame-Options blocks iframe
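Cross-checking the host-port column by hand is error-prone. This small sketch groups apps by host port and reports any port claimed more than once. The rows must be transcribed from the tables above (or from `package.rs`), and the function name is hypothetical. Run over the Lightning & Payment table, it would flag 8175, which is listed for both fedimint and fedimint-gateway.

```rust
use std::collections::BTreeMap;

/// Map each host port to the apps that claim it, keeping only ports claimed more than once.
/// `rows` are (app, host ports) pairs transcribed from the port tables or `package.rs`.
pub fn host_port_conflicts<'a>(rows: &[(&'a str, Vec<u16>)]) -> BTreeMap<u16, Vec<&'a str>> {
    let mut owners: BTreeMap<u16, Vec<&'a str>> = BTreeMap::new();
    for (app, ports) in rows {
        for &port in ports {
            owners.entry(port).or_default().push(*app);
        }
    }
    owners.retain(|_, apps| apps.len() > 1);
    owners
}
```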


@ -0,0 +1,93 @@
# Rootless Podman UID Mapping Reference
## How Rootless UID Mapping Works
When Podman runs as the `archipelago` user (UID 1000), container processes don't run as their "apparent" UID on the host. Instead, Linux user namespaces remap UIDs.
**Mapping formula**: `host_uid = 100000 + container_uid`
This is configured in `/etc/subuid` and `/etc/subgid`:
```
archipelago:100000:65536
```
This means:
- Container UID 0 (root inside container) → Host UID 100000 (unprivileged on host)
- Container UID 70 (postgres) → Host UID 100070
- Container UID 101 (bitcoin) → Host UID 100101
- etc.
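The formula above assumes container UID 0 lands on host UID 100000, which is worth confirming per container. Under Podman's default rootless mapping the result is different, unless containers are created with an explicit mapping such as `--uidmap 0:1:65536`:

- container root maps to the `archipelago` user itself (UID 1000);
- container UID N >= 1 maps to 100000 + N - 1.

The authoritative source is the container's own `uid_map`, e.g. `/proc/<pid>/uid_map` read on the host, with the PID from `podman inspect --format '{{.State.Pid}}'`. A sketch that parses that file and translates a UID; the names are hypothetical.

```rust
/// One `uid_map` line: container IDs [inside, inside + count) map to host IDs
/// [outside, outside + count).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UidRange {
    pub inside: u32,
    pub outside: u32,
    pub count: u32,
}

/// Parse `uid_map` text: three whitespace-separated numbers per non-empty line.
pub fn parse_uid_map(text: &str) -> Result<Vec<UidRange>, String> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<UidRange, String> {
            let nums = line
                .split_whitespace()
                .map(|n| n.parse::<u32>().map_err(|e| format!("{line:?}: {e}")))
                .collect::<Result<Vec<u32>, String>>()?;
            match nums.as_slice() {
                &[inside, outside, count] => Ok(UidRange { inside, outside, count }),
                _ => Err(format!("expected 3 fields in {line:?}")),
            }
        })
        .collect()
}

/// Host UID that owns files written by `container_uid`, or None if it is unmapped.
pub fn to_host_uid(map: &[UidRange], container_uid: u32) -> Option<u32> {
    map.iter().find_map(|r| {
        let offset = container_uid.checked_sub(r.inside)?;
        if offset < r.count {
            r.outside.checked_add(offset)
        } else {
            None
        }
    })
}
```

With a single `0 100000 65536` line this reproduces the formula above. With Podman's default two-line map it gives the offset-by-one result.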
## Why This Matters
Volume directories (bind mounts) on the host must be owned by the **mapped** UID, not the container UID. If Bitcoin runs as UID 101 inside its container, the host directory must be owned by UID 100101.
If ownership is wrong, the container gets `permission denied` when trying to read/write its data.
## Complete UID Mapping Table
| Container UID | Host UID | Containers | Fix Command |
|---|---|---|---|
| 0 (root) | 100000 | lnd, fedimint, fedimint-gateway, homeassistant, jellyfin, vaultwarden, photoprism, ollama, filebrowser, electrumx, btcpay-server, nbxplorer, immich, nostr-rs-relay, strfry, nextcloud, searxng, onlyoffice, tailscale, uptime-kuma | `sudo chown -R 100000:100000 /var/lib/archipelago/{app}` |
| 70 | 100070 | postgres (btcpay-db, immich-db, penpot-postgres) | `sudo chown -R 100070:100070 /var/lib/archipelago/postgres-*` |
| 101 | 100101 | bitcoin-knots, bitcoin-core | `sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin` |
| 472 | 100472 | grafana | `sudo chown -R 100472:100472 /var/lib/archipelago/grafana` |
| 999 | 100999 | MariaDB (mysql-mempool) | `sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool` |
## How to Find a Container's UID
If you encounter a new container with permission issues:
```bash
# Check what user the container runs as
podman inspect CONTAINER_NAME --format "{{.Config.User}}"
# If empty, it runs as root (UID 0) → host UID 100000
# If it shows a username, find the UID inside the image
podman run --rm IMAGE_NAME id
# Then calculate: host_uid = 100000 + container_uid
```
## Fix Script
Run this after any fresh install, migration, or when containers have permission errors:
```bash
#!/bin/bash
# Fix all rootless podman volume ownership
# UID 0 → 100000 (most containers)
for dir in lnd fedimint fedimint-gateway homeassistant jellyfin vaultwarden photoprism \
ollama filebrowser electrumx btcpay nbxplorer immich nostr-rs-relay nextcloud \
searxng onlyoffice uptime-kuma; do
[ -d "/var/lib/archipelago/$dir" ] && sudo chown -R 100000:100000 "/var/lib/archipelago/$dir"
done
# UID 101 → 100101 (Bitcoin)
[ -d "/var/lib/archipelago/bitcoin" ] && sudo chown -R 100101:100101 /var/lib/archipelago/bitcoin
# UID 70 → 100070 (PostgreSQL)
for dir in /var/lib/archipelago/postgres-* /var/lib/archipelago/btcpay-db /var/lib/archipelago/immich-db; do
[ -d "$dir" ] && sudo chown -R 100070:100070 "$dir"
done
# UID 999 → 100999 (MariaDB)
[ -d "/var/lib/archipelago/mysql-mempool" ] && sudo chown -R 100999:100999 /var/lib/archipelago/mysql-mempool
# UID 472 → 100472 (Grafana)
[ -d "/var/lib/archipelago/grafana" ] && sudo chown -R 100472:100472 /var/lib/archipelago/grafana
```
## Rootful vs Rootless Comparison
| Aspect | Rootful (old) | Rootless (current) |
|--------|---------------|-------------------|
| Podman command | `sudo podman` | `podman` (as archipelago user) |
| Container storage | `/var/lib/containers/storage` | `~/.local/share/containers/storage` |
| Container subnet | `10.88.0.0/16` | `10.89.0.0/16` |
| Volume ownership | Container UID directly | Mapped UID (100000 + container_uid) |
| Requires root? | Yes | No (except fixing volume ownership) |
| XDG_RUNTIME_DIR | Not needed | Required: `/run/user/1000` |
| User lingering | Not needed | Required: `loginctl enable-linger` |
| Systemd restrictions | All can be enabled | Must disable: RestrictNamespaces, SystemCallFilter |


@ -0,0 +1,27 @@
# Polish: Backend Quality
Build all changes on the dev server, not macOS: `./scripts/deploy-to-target.sh --live`
## Priority 1: Eliminate panics
```bash
ssh archipelago@192.168.1.228 "grep -rn 'unwrap()\|\.expect(' ~/archy/core/archipelago/src/ --include='*.rs' | grep -v test | grep -v '_test.rs'"
```
Replace with `?` + `.context()` or `.map_err()`.
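A before/after sketch of replacing panicking calls with error propagation. It uses `map_err` with `String` errors so it needs only the standard library. With `anyhow`, the same lines would read `.with_context(|| format!(...))?`. The function and the file it reads are hypothetical.

```rust
use std::fs;

/// Before: `fs::read_to_string(path).unwrap().trim().parse::<u16>().unwrap()` panics the handler.
/// After: every failure becomes an error value the RPC layer can report.
pub fn read_port(path: &str) -> Result<u16, String> {
    let raw = fs::read_to_string(path).map_err(|e| format!("reading {path}: {e}"))?;
    raw.trim()
        .parse::<u16>()
        .map_err(|e| format!("parsing port from {path}: {e}"))
}
```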
## Priority 2: Add timeouts
- Container ops: `tokio::time::timeout(Duration::from_secs(30), op).await`
- HTTP/RPC calls: `reqwest::Client::builder().timeout(Duration::from_secs(10))`
## Priority 3: Connection pooling
Store reusable `reqwest::Client` in RpcHandler instead of creating per-request.
## Priority 4: Clippy
```bash
ssh archipelago@192.168.1.228 "cd ~/archy && cargo clippy --all-targets --all-features 2>&1"
```
## Priority 5: Replace println with tracing
`println!` → `tracing::info!`, `eprintln!` → `tracing::warn!`
## Verify
Zero clippy warnings, zero unwrap/expect in prod code, zero println.


@ -0,0 +1,26 @@
# Polish: Deployment Pipeline
## Pre-Deploy Checks
Add checks to deploy-to-target.sh that the SSH key exists, the target is reachable, and at least 2GB of disk space is free.
## Backup Before Deploy
```bash
sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.backup
sudo cp -a /opt/archipelago/web-ui /opt/archipelago/web-ui.backup
sudo cp /etc/nginx/sites-available/archipelago /etc/nginx/sites-available/archipelago.backup
```
## Health Check After Deploy
Loop up to 15 attempts, 2s apart, checking `curl http://localhost:5678/health` returns 200.
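The post-deploy health check is a bounded retry loop. The backend's post-start container check in this commit has a similar shape. Below is a std-only sketch of the pattern; `check` stands in for the `GET /health` request, which would need an HTTP client, and the function name is hypothetical.

```rust
use std::thread;
use std::time::Duration;

/// Call `check` up to `attempts` times, sleeping `delay` between failed tries.
/// Returns the 1-based attempt that succeeded, or None so the caller can roll back.
pub fn wait_until_healthy(
    attempts: u32,
    delay: Duration,
    mut check: impl FnMut() -> bool,
) -> Option<u32> {
    for attempt in 1..=attempts {
        if check() {
            return Some(attempt);
        }
        // No point sleeping after the final failed attempt.
        if attempt < attempts {
            thread::sleep(delay);
        }
    }
    None
}
```

With the values above this would be called as `wait_until_healthy(15, Duration::from_secs(2), check)`, where a `None` result triggers the rollback step.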
## Rollback on Failure
If health check fails: restore binary, frontend, nginx from .backup files, restart services.
## Deployment Lock
Use `flock` on `/tmp/archipelago-deploy.lock` to prevent concurrent deploys.
## Nginx Validation
Always `sudo nginx -t` before reload. If invalid, restore backup config.
## Integration Flow
1. acquire_lock → 2. pre_deploy_checks → 3. backup_current → 4. build + deploy → 5. validate_nginx → 6. restart services → 7. health_check || rollback


@ -0,0 +1,23 @@
# Polish: Error Handling
## Find
- Silent catches: `grep -rn "catch.*=>.*{}" --include="*.vue" --include="*.ts" src/`
- Empty try/catch: `grep -rn "catch.*{$" -A1` looking for immediate `}`
- Missing error states in views: check each view has `errorMessage` ref
## Fix Pattern
```typescript
.catch((err) => {
console.error('[ComponentName] operation failed:', err)
errorMessage.value = err instanceof Error ? err.message : 'Operation failed'
})
```
Template: `<p v-if="errorMessage" class="text-red-400 text-sm mt-2">{{ errorMessage }}</p>`
## Backend
- Replace `unwrap_or_default()` on serialization with proper error propagation
- Consistent RPC error structure: `{ error: { code: string, message: string } }`
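A dependency-free sketch of producing the `{ error: { code, message } }` shape with correct JSON string escaping. In the backend, `serde_json` would normally do this. The `RpcError` type and its method are hypothetical.

```rust
/// Hypothetical RPC error carrying a machine-readable code and a user-facing message.
pub struct RpcError {
    pub code: String,
    pub message: String,
}

/// Escape a string for inclusion inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl RpcError {
    /// Serialize as `{"error":{"code":"...","message":"..."}}`.
    pub fn to_json(&self) -> String {
        format!(
            r#"{{"error":{{"code":"{}","message":"{}"}}}}"#,
            json_escape(&self.code),
            json_escape(&self.message)
        )
    }
}
```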
## Verify
Both searches (silent catches and empty catch blocks) should return zero matches.


@ -0,0 +1,30 @@
# Polish: Form Validation
## Pattern
```typescript
const isSubmitting = ref(false)
const passwordErrors = computed(() => {
const errors: string[] = []
if (password.value.length > 0 && password.value.length < 8)
errors.push('Must be at least 8 characters')
return errors
})
async function submit() {
if (isSubmitting.value) return
isSubmitting.value = true
try { await rpcClient.call(...) }
catch (err) { errorMessage.value = formatError(err) }
finally { isSubmitting.value = false }
}
```
## Checklist per form
- Real-time validation as user types (debounced 300ms)
- Submit button disabled during operation and when validation fails
- All text inputs trimmed before submission
- Error messages are user-friendly (no raw error strings)
- TOTP: `inputmode="numeric"`, auto-submit at 6 digits
## Forms to polish
Login.vue (password setup, TOTP), Settings.vue (password change), any other form inputs.


@ -0,0 +1,26 @@
# Polish: Loading States
Every async view needs 3 states: loading skeleton, empty state, timeout warning.
## Skeleton Pattern
```vue
<div v-if="isLoading"><!-- skeleton matching layout --></div>
<div v-else-if="items.length === 0" class="glass-card text-center py-12">
<p class="text-white/60">No items yet</p>
</div>
<div v-else><!-- real content --></div>
```
## Timeout Warning
After 15s show "Taking longer than expected...", after 30s show troubleshooting.
```typescript
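import { ref, watch } from 'vue'
const loadingTooLong = ref(false)
const showTroubleshooting = ref(false)
const warnTimer = setTimeout(() => { loadingTooLong.value = true }, 15000)
const helpTimer = setTimeout(() => { showTroubleshooting.value = true }, 30000)
watch(isLoading, (val) => { if (!val) { clearTimeout(warnTimer); clearTimeout(helpTimer) } })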
```
## Priority Views
Apps.vue, AppDetails.vue, Marketplace.vue, Dashboard.vue, Cloud.vue, Settings.vue, Server.vue
## Verify
Each view has: `isLoading` ref, skeleton section, empty state, timeout warning. Use global classes only.


@ -0,0 +1,22 @@
# Polish: Security Hardening
## 1. Systemd Service
Add to `image-recipe/configs/archipelago.service`:
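`NoNewPrivileges=true`, `ProtectSystem=strict`, and `ReadWritePaths=` covering `/var/lib/archipelago` plus the rootless Podman paths (`/run/user /tmp /etc/containers /var/lib/containers /run/containers`)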
Verify: `ssh ... "sudo systemd-analyze security archipelago"` — score < 5.0
## 2. Nginx Headers
- HSTS (HTTPS only): `add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;`
- Rate limiting zones: `limit_req_zone $binary_remote_addr zone=auth:10m rate=5r/m;`
- Custom log format stripping tokens
## 3. Secrets Management
Replace hardcoded `archipelago123` with generated secrets:
- Generate on first boot: `openssl rand -base64 24 > /var/lib/archipelago/secrets/bitcoin-rpc-pass`
- Backend reads from env var: `std::env::var("ARCHIPELAGO_BITCOIN_RPC_PASS")`
## 4. SSH Hardening
Replace `StrictHostKeyChecking=no` with `StrictHostKeyChecking=accept-new` in deploy script.
## Verify
`grep -rn 'archipelago123' scripts/ core/` should return zero. Nginx headers pass curl check. Rate limiting returns 429 on rapid auth requests.


@ -0,0 +1,25 @@
# Polish: WebSocket & Real-Time
## 1. Connection Status Indicator
Add to App.vue header: green dot (connected), amber pulse (reconnecting), red (disconnected).
Connect to actual WebSocket state from websocket.ts.
## 2. Reconnection UX
After max reconnect attempts, show persistent banner "Connection lost. Click to retry."
Add `forceReconnect()` method that resets attempt counter.
## 3. Heartbeat
Active ping every 30s with 5s pong timeout (replace passive 60s stale detection).
Backend must respond to `ping` with `pong` — check handler.rs.
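The heartbeat timing can be written as a small state machine that is easy to unit-test. A timer calls `tick`, which returns one of three actions: send a ping, keep waiting, or reconnect.

The sketch is in Rust for consistency with the backend; the client in `websocket.ts` would port the same logic. All names here are hypothetical.

```rust
use std::time::{Duration, Instant};

pub const PING_INTERVAL: Duration = Duration::from_secs(30);
pub const PONG_TIMEOUT: Duration = Duration::from_secs(5);

#[derive(Debug, PartialEq, Eq)]
pub enum HeartbeatAction {
    Wait,
    SendPing,
    Reconnect,
}

pub struct Heartbeat {
    /// When the outstanding ping was sent, if its pong has not arrived yet.
    pending_since: Option<Instant>,
    /// When the last ping was sent (or the socket opened).
    last_ping: Instant,
}

impl Heartbeat {
    pub fn new(opened_at: Instant) -> Self {
        Self { pending_since: None, last_ping: opened_at }
    }

    /// Decide what to do at time `now`; call this from a periodic timer.
    pub fn tick(&mut self, now: Instant) -> HeartbeatAction {
        if let Some(sent) = self.pending_since {
            // A ping is in flight: give up once the pong is 5 s late.
            if now.saturating_duration_since(sent) >= PONG_TIMEOUT {
                return HeartbeatAction::Reconnect;
            }
            return HeartbeatAction::Wait;
        }
        if now.saturating_duration_since(self.last_ping) >= PING_INTERVAL {
            self.pending_since = Some(now);
            self.last_ping = now;
            return HeartbeatAction::SendPing;
        }
        HeartbeatAction::Wait
    }

    /// Record a `pong` from the server.
    pub fn on_pong(&mut self) {
        self.pending_since = None;
    }
}
```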
## 4. Session Timeout
In rpc-client.ts base `call()`: on 401/403 response, redirect to `/login`.
## 5. Race Condition Fix
Prevent duplicate listeners after a reconnect: deduplicate listeners with a `Set`, or remove all listeners and resubscribe.
## 6. Message Queuing
Queue subscription requests while disconnected, replay on reconnect.
## Verify
Kill backend → shows "Disconnected" → restart → auto-reconnects. Toggle wifi → status updates. Session timeout → redirects to login.


@ -389,6 +389,15 @@ pub(super) fn get_data_dirs_for_app(package_id: &str) -> Vec<String> {
}
}
+/// Read a secret from /var/lib/archipelago/secrets/{name}.
+/// Falls back to the provided default if the file doesn't exist.
+fn read_secret(name: &str, default: &str) -> String {
+let path = format!("/var/lib/archipelago/secrets/{}", name);
+std::fs::read_to_string(&path)
+.map(|s| s.trim().to_string())
+.unwrap_or_else(|_| default.to_string())
+}
/// Get app-specific configuration
/// Returns: (ports, volumes, env_vars, custom_command, custom_args)
pub(super) async fn get_app_config(
@ -413,7 +422,12 @@ pub(super) async fn get_app_config(
None,
),
"bitcoin" | "bitcoin-core" | "bitcoin-knots" => (
-vec!["8332:8332".to_string(), "8333:8333".to_string()],
+vec![
+"8332:8332".to_string(),
+"8333:8333".to_string(),
+"28332:28332".to_string(),
+"28333:28333".to_string(),
+],
vec!["/var/lib/archipelago/bitcoin:/home/bitcoin/.bitcoin".to_string()],
vec![],
None,
@ -453,7 +467,8 @@ pub(super) async fn get_app_config(
format!("BTCPAY_BTCRPCURL=http://{}:8332", host_ip),
format!("BTCPAY_BTCRPCUSER={}", rpc_user),
format!("BTCPAY_BTCRPCPASSWORD={}", rpc_pass),
-"BTCPAY_POSTGRES=User ID=btcpay;Password=btcpaypass;Host=archy-btcpay-db;Port=5432;Database=btcpay;Include Error Detail=true".to_string(),
+format!("BTCPAY_POSTGRES=User ID=btcpay;Password={};Host=archy-btcpay-db;Port=5432;Database=btcpay;Include Error Detail=true",
+read_secret("btcpay-db-password", "btcpaypass")),
],
None,
None,
@ -481,7 +496,7 @@ pub(super) async fn get_app_config(
"DATABASE_HOST=archy-mempool-db".to_string(),
"DATABASE_DATABASE=mempool".to_string(),
"DATABASE_USERNAME=mempool".to_string(),
-"DATABASE_PASSWORD=mempoolpass".to_string(),
+format!("DATABASE_PASSWORD={}", read_secret("mempool-db-password", "mempoolpass")),
],
None,
None,
@ -511,8 +526,8 @@ pub(super) async fn get_app_config(
vec![
"MYSQL_DATABASE=mempool".to_string(),
"MYSQL_USER=mempool".to_string(),
"MYSQL_PASSWORD=mempoolpass".to_string(),
"MYSQL_ROOT_PASSWORD=rootpass".to_string(),
format!("MYSQL_PASSWORD={}", read_secret("mempool-db-password", "mempoolpass")),
format!("MYSQL_ROOT_PASSWORD={}", read_secret("mempool-db-root-password", "rootpass")),
],
None,
None,
@ -607,7 +622,7 @@ pub(super) async fn get_app_config(
vec![
"DB_HOSTNAME=immich_postgres".to_string(),
"DB_USERNAME=postgres".to_string(),
"DB_PASSWORD=immichpass".to_string(),
format!("DB_PASSWORD={}", read_secret("immich-db-password", "immichpass")),
"DB_DATABASE_NAME=immich".to_string(),
"REDIS_HOSTNAME=immich_redis".to_string(),
"UPLOAD_LOCATION=/usr/src/app/upload".to_string(),


@ -256,8 +256,9 @@ impl RpcHandler {
.trim()
.to_string();
// Post-start health verification: wait up to 30s for container to be running
for i in 0..6u32 {
// Post-start health verification: wait up to 60s for container to be running
let mut container_running = false;
for i in 0..12u32 {
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
let status = tokio::process::Command::new("podman")
.args(["inspect", container_name, "--format", "{{.State.Status}}"])
@ -266,6 +267,7 @@ impl RpcHandler {
if let Ok(o) = status {
let state = String::from_utf8_lossy(&o.stdout).trim().to_string();
if state == "running" {
container_running = true;
break;
}
if state == "exited" {
@ -288,12 +290,19 @@ impl RpcHandler {
));
}
}
if i == 5 {
debug!("Container {} health check timeout (30s) — continuing anyway", container_name);
if i == 11 {
warn!("Container {} not running after 60s — install may have failed", container_name);
}
}
// Post-install hooks
if !container_running {
return Err(anyhow::anyhow!(
"Container {} did not reach running state within 60s. Check logs with: podman logs {}",
container_name, container_name
));
}
// Post-install hooks — await completion before returning success
self.run_post_install_hooks(package_id).await;
Ok(serde_json::json!({
@ -536,98 +545,106 @@ printtoconsole=1\n",
}
/// Run post-install hooks (Nextcloud trusted domains, Bitcoin UI container).
/// Critical hooks (credential setup, config) are awaited; UI container builds are background.
async fn run_post_install_hooks(&self, package_id: &str) {
if package_id == "filebrowser" {
tokio::spawn(async move {
// Wait for filebrowser to start and initialize its database
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
// Wait for filebrowser to start and initialize its database
tokio::time::sleep(std::time::Duration::from_secs(5)).await;
// Generate a random password (32 bytes, hex-encoded)
let mut buf = [0u8; 32];
rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut buf);
let password = hex::encode(buf);
// Generate a random password (32 bytes, hex-encoded)
let mut buf = [0u8; 32];
rand::RngCore::fill_bytes(&mut rand::rngs::OsRng, &mut buf);
let password = hex::encode(buf);
// Get a JWT token with default credentials
let login_res = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(10))
.build()
.unwrap_or_default()
.post("http://127.0.0.1:8083/api/login")
.json(&serde_json::json!({"username": "admin", "password": "admin"}))
.send()
.await;
// Get a JWT token with default credentials
let client = match reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(10))
.build()
{
Ok(c) => c,
Err(e) => {
tracing::warn!("Failed to create HTTP client for FileBrowser hook: {}", e);
return;
}
};
let token = match login_res {
Ok(resp) if resp.status().is_success() => {
resp.text().await.unwrap_or_default().trim_matches('"').to_string()
}
_ => {
tracing::warn!("FileBrowser not ready for password change — keeping default");
return;
}
};
let login_res = client
.post("http://127.0.0.1:8083/api/login")
.json(&serde_json::json!({"username": "admin", "password": "admin"}))
.send()
.await;
// Change admin password via filebrowser API
let change_res = reqwest::Client::builder()
.timeout(std::time::Duration::from_secs(10))
.build()
.unwrap_or_default()
.put("http://127.0.0.1:8083/api/users/1")
.header("X-Auth", &token)
.json(&serde_json::json!({"password": password}))
.send()
.await;
match change_res {
Ok(resp) if resp.status().is_success() => {
let secret_dir = "/var/lib/archipelago/secrets/filebrowser";
let _ = tokio::fs::create_dir_all(secret_dir).await;
let _ = tokio::fs::write(
format!("{}/password", secret_dir),
&password,
).await;
info!("FileBrowser admin password secured (default credentials replaced)");
}
Ok(resp) => {
tracing::warn!("FileBrowser password change failed: {}", resp.status());
}
Err(e) => {
tracing::warn!("FileBrowser password change error: {}", e);
let token = match login_res {
Ok(resp) if resp.status().is_success() => {
match resp.text().await {
Ok(t) => t.trim_matches('"').to_string(),
Err(e) => {
tracing::warn!("FileBrowser login response parse failed: {}", e);
return;
}
}
}
});
_ => {
tracing::warn!("FileBrowser not ready for password change — keeping default");
return;
}
};
// Change admin password via filebrowser API
let change_res = client
.put("http://127.0.0.1:8083/api/users/1")
.header("X-Auth", &token)
.json(&serde_json::json!({"password": password}))
.send()
.await;
match change_res {
Ok(resp) if resp.status().is_success() => {
let secret_dir = "/var/lib/archipelago/secrets/filebrowser";
let _ = tokio::fs::create_dir_all(secret_dir).await;
let _ = tokio::fs::write(
format!("{}/password", secret_dir),
&password,
).await;
info!("FileBrowser admin password secured (default credentials replaced)");
}
Ok(resp) => {
tracing::warn!("FileBrowser password change failed: {}", resp.status());
}
Err(e) => {
tracing::warn!("FileBrowser password change error: {}", e);
}
}
}
if package_id == "nextcloud" {
let host_ip = self.config.host_ip.clone();
tokio::spawn(async move {
// Wait for Nextcloud to finish first-run initialization
tokio::time::sleep(std::time::Duration::from_secs(30)).await;
for domain_idx in 1..=2u8 {
let value = if domain_idx == 1 {
host_ip.as_str()
} else {
"localhost"
};
let _ = tokio::process::Command::new("podman")
.args([
"exec",
"-u",
"33",
"nextcloud",
"php",
"occ",
"config:system:set",
"trusted_domains",
&domain_idx.to_string(),
"--value",
value,
])
.output()
.await;
}
info!("Nextcloud trusted domains configured for {}", host_ip);
});
let host_ip = &self.config.host_ip;
// Wait for Nextcloud to finish first-run initialization
tokio::time::sleep(std::time::Duration::from_secs(30)).await;
for domain_idx in 1..=2u8 {
let value = if domain_idx == 1 {
host_ip.as_str()
} else {
"localhost"
};
let _ = tokio::process::Command::new("podman")
.args([
"exec",
"-u",
"33",
"nextcloud",
"php",
"occ",
"config:system:set",
"trusted_domains",
&domain_idx.to_string(),
"--value",
value,
])
.output()
.await;
}
info!("Nextcloud trusted domains configured for {}", host_ip);
}
// Build and start companion UI containers for headless services


@ -58,6 +58,7 @@ fn create_installing_entry(package_id: &str) -> PackageDataEntry {
PackageDataEntry {
state: PackageState::Installing,
health: None,
exit_code: None,
static_files: StaticFiles {
license: String::new(),
instructions: String::new(),


@ -221,18 +221,30 @@ impl RpcHandler {
}
}
// Remove container (without -f to respect graceful shutdown above)
tracing::info!("Uninstall {}: removing container {}", package_id, name);
let rm_out = tokio::process::Command::new("podman")
.args(["rm", "-f", name])
.args(["rm", name])
.output()
.await;
match rm_out {
Ok(o) if o.status.success() => removed += 1,
Ok(o) => {
// If normal rm fails (e.g., still running), force as fallback
let stderr = String::from_utf8_lossy(&o.stderr);
let msg = format!("Failed to remove {}: {}", name, stderr.trim());
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
tracing::warn!("Uninstall {}: rm {} failed ({}), trying force", package_id, name, stderr.trim());
let force_rm = tokio::process::Command::new("podman")
.args(["rm", "-f", name])
.output()
.await;
match force_rm {
Ok(o2) if o2.status.success() => removed += 1,
_ => {
let msg = format!("Failed to remove {}: {}", name, stderr.trim());
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
}
}
}
Err(e) => {
let msg = format!("Failed to remove {}: {}", name, e);
@ -242,6 +254,26 @@ impl RpcHandler {
}
}
// Clean up dangling volumes associated with removed containers
let _ = tokio::process::Command::new("podman")
.args(["volume", "prune", "-f"])
.output()
.await;
// Clean up app-specific networks (only if no other containers use them)
let app_networks: Vec<&str> = match package_id {
"immich" | "immich_server" => vec!["immich-net"],
"penpot" | "penpot-frontend" => vec!["penpot-net"],
"indeedhub" | "indeedhub-api" => vec!["indeedhub-net"],
_ => vec![],
};
for net in &app_networks {
let _ = tokio::process::Command::new("podman")
.args(["network", "rm", net])
.output()
.await;
}
// Release port allocation
{
let mut allocator = self.port_allocator.lock().await;
@ -257,10 +289,19 @@ impl RpcHandler {
.args(["rm", "-rf", dir])
.output()
.await;
if let Ok(o) = rm_out {
if !o.status.success() {
tracing::warn!("Uninstall {}: rm {} failed", package_id, dir);
match rm_out {
Ok(o) if !o.status.success() => {
let stderr = String::from_utf8_lossy(&o.stderr);
let msg = format!("Failed to remove data {}: {}", dir, stderr.trim());
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
}
Err(e) => {
let msg = format!("Failed to remove data {}: {}", dir, e);
tracing::error!("Uninstall {}: {}", package_id, msg);
errors.push(msg);
}
_ => {}
}
}
}
@ -271,20 +312,24 @@ impl RpcHandler {
package_id,
errors
);
} else {
tracing::info!(
"Uninstall {} complete: stopped={}, removed={}",
return Err(anyhow::anyhow!(
"Uninstall {} partially failed: {}",
package_id,
stopped,
removed
);
errors.join("; ")
));
}
tracing::info!(
"Uninstall {} complete: stopped={}, removed={}",
package_id,
stopped,
removed
);
Ok(serde_json::json!({
"status": if errors.is_empty() { "uninstalled" } else { "partial" },
"status": "uninstalled",
"stopped": stopped,
"removed": removed,
"errors": errors,
}))
}


@ -146,6 +146,7 @@ impl DockerPackageScanner {
let package = PackageDataEntry {
state: package_state.clone(),
health: container.health.clone(),
exit_code: if package_state == PackageState::Exited { container.exit_code } else { None },
static_files: StaticFiles {
license: "MIT".to_string(),
instructions: metadata.description.clone(),


@ -262,33 +262,47 @@ pub async fn recover_containers(containers: &[RunningContainerRecord]) -> Recove
tokio::time::sleep(std::time::Duration::from_secs(3)).await;
}
let result = tokio::time::timeout(
std::time::Duration::from_secs(30),
tokio::process::Command::new("podman")
.args(["start", &record.name])
.output(),
)
.await;
// Try up to 2 attempts with increasing timeout (120s first, 180s retry)
let mut started = false;
for attempt in 0..2u32 {
let timeout_secs = if attempt == 0 { 120 } else { 180 };
if attempt > 0 {
info!("Retrying container {} (attempt {})", record.name, attempt + 1);
tokio::time::sleep(std::time::Duration::from_secs(10)).await;
}
let result = tokio::time::timeout(
std::time::Duration::from_secs(timeout_secs),
tokio::process::Command::new("podman")
.args(["start", &record.name])
.output(),
)
.await;
match result {
Ok(Ok(output)) if output.status.success() => {
info!("Successfully restarted container: {}", record.name);
report.recovered += 1;
}
Ok(Ok(output)) => {
let stderr = String::from_utf8_lossy(&output.stderr);
warn!("Failed to restart container {}: {}", record.name, stderr.trim());
report.failed.push(record.name.clone());
}
Ok(Err(e)) => {
warn!("Failed to execute podman start for {}: {}", record.name, e);
report.failed.push(record.name.clone());
}
Err(_) => {
warn!("Timeout starting container {} (30s)", record.name);
report.failed.push(record.name.clone());
match result {
Ok(Ok(output)) if output.status.success() => {
info!("Successfully restarted container: {}", record.name);
report.recovered += 1;
started = true;
break;
}
Ok(Ok(output)) => {
let stderr = String::from_utf8_lossy(&output.stderr);
warn!("Failed to restart container {} (attempt {}): {}",
record.name, attempt + 1, stderr.trim());
}
Ok(Err(e)) => {
warn!("Failed to execute podman start for {} (attempt {}): {}",
record.name, attempt + 1, e);
}
Err(_) => {
warn!("Timeout starting container {} ({}s, attempt {})",
record.name, timeout_secs, attempt + 1);
}
}
}
if !started {
report.failed.push(record.name.clone());
}
}
report
@ -313,7 +327,7 @@ fn is_process_running(pid: u32) -> bool {
/// Skips containers that the user intentionally stopped via the UI.
pub async fn start_stopped_containers(data_dir: &Path) -> RecoveryReport {
let output = match tokio::time::timeout(
std::time::Duration::from_secs(30),
std::time::Duration::from_secs(60),
tokio::process::Command::new("podman")
.args(["ps", "-a", "--filter", "status=exited", "--filter", "status=created", "--format", "{{.Names}}"])
.output(),
@ -322,7 +336,7 @@ pub async fn start_stopped_containers(data_dir: &Path) -> RecoveryReport {
{
Ok(result) => result,
Err(_) => {
warn!("Timeout listing stopped containers (30s)");
warn!("Timeout listing stopped containers (60s)");
return RecoveryReport { total: 0, recovered: 0, failed: Vec::new() };
}
};
@ -374,12 +388,21 @@ pub async fn start_stopped_containers(data_dir: &Path) -> RecoveryReport {
fn container_boot_tier(name: &str) -> u8 {
let id = name.strip_prefix("archy-").unwrap_or(name);
match id {
"btcpay-db" | "mempool-db" | "penpot-postgres" | "immich_postgres"
| "immich_redis" | "penpot-valkey" => 0,
// Tier 0: Databases and data stores
"btcpay-db" | "mempool-db" | "mysql-mempool" | "penpot-postgres"
| "immich_postgres" | "immich_redis" | "penpot-valkey"
| "endurain-db" | "nextcloud-db"
| "indeedhub-postgres" | "indeedhub-redis" | "indeedhub-minio" => 0,
// Tier 1: Core infrastructure
"bitcoin-knots" | "bitcoin-core" | "bitcoin" => 1,
"lnd" | "electrumx" | "mempool-electrs" | "electrs" | "nbxplorer" => 2,
// Tier 2: Dependent services
"lnd" | "electrumx" | "mempool-electrs" | "electrs" | "nbxplorer"
| "mempool-api" | "indeedhub-api" => 2,
// Tier 4: Frontend/UI
"mempool-web" | "bitcoin-ui" | "lnd-ui" | "electrs-ui"
| "penpot-frontend" | "penpot-exporter" => 4,
| "penpot-frontend" | "penpot-exporter"
| "indeedhub" => 4,
// Tier 3: Everything else
_ => 3,
}
}


@ -124,6 +124,9 @@ pub struct PackageDataEntry {
/// Container health: "healthy", "unhealthy", "starting", or null
#[serde(skip_serializing_if = "Option::is_none")]
pub health: Option<String>,
/// Container exit code (only set when state is Exited): 0 = clean, non-zero = crash
#[serde(rename = "exit-code", skip_serializing_if = "Option::is_none")]
pub exit_code: Option<i32>,
#[serde(rename = "static-files")]
pub static_files: StaticFiles,
pub manifest: Manifest,


@ -1,6 +1,7 @@
// Container Health Monitor
// Checks container health every 60s, auto-restarts unhealthy containers (max 3 times)
// with exponential backoff (10s, 30s, 90s), dependency-aware startup ordering,
// Checks container health every 120s, auto-restarts unhealthy containers (max 10 times)
// with exponential backoff (10s..120s), dependency-aware restart ordering (deps first),
// handles "created" state containers, resets dependent counters when deps recover,
// and sends WebSocket notifications to the UI on failure.
use crate::data_model::{Notification, NotificationLevel};
@ -13,10 +14,10 @@ use std::sync::Arc;
use std::time::Instant;
use tracing::{debug, info, warn};
const MAX_RESTART_ATTEMPTS: u32 = 3;
const CHECK_INTERVAL_SECS: u64 = 60;
/// Backoff delays per attempt: 10s, 30s, 90s
const BACKOFF_DELAYS_SECS: [u64; 3] = [10, 30, 90];
const MAX_RESTART_ATTEMPTS: u32 = 10;
const CHECK_INTERVAL_SECS: u64 = 120;
/// Backoff delays per attempt — escalating from 10s to 120s
const BACKOFF_DELAYS_SECS: [u64; 10] = [10, 15, 20, 30, 30, 45, 60, 60, 90, 120];
/// Reset restart counter after 1 hour of stability
const STABILITY_RESET_SECS: u64 = 3600;
@ -39,25 +40,83 @@ enum StartupTier {
fn container_tier(name: &str) -> StartupTier {
let id = name.strip_prefix("archy-").unwrap_or(name);
match id {
// Tier 0: Databases
"btcpay-db" | "mempool-db" | "penpot-postgres" | "immich_postgres"
| "immich_redis" | "penpot-valkey" | "endurain-db" | "nextcloud-db" => StartupTier::Database,
// Tier 0: Databases and data stores
"btcpay-db" | "mempool-db" | "mysql-mempool" | "penpot-postgres"
| "immich_postgres" | "immich_redis" | "penpot-valkey"
| "endurain-db" | "nextcloud-db"
| "indeedhub-postgres" | "indeedhub-redis" | "indeedhub-minio" => StartupTier::Database,
// Tier 1: Core infrastructure
"bitcoin-knots" | "bitcoin-core" | "bitcoin" => StartupTier::CoreInfra,
// Tier 2: Dependent services
"lnd" | "electrumx" | "mempool-electrs" | "electrs" | "nbxplorer" => StartupTier::DependentService,
// Tier 2: Dependent services (need databases or bitcoin)
"lnd" | "electrumx" | "mempool-electrs" | "electrs" | "nbxplorer"
| "mempool-api" | "indeedhub-api" => StartupTier::DependentService,
// Tier 4: Frontend/UI
"mempool-web" | "bitcoin-ui" | "lnd-ui" | "electrs-ui"
| "penpot-frontend" | "penpot-exporter" => StartupTier::Frontend,
| "penpot-frontend" | "penpot-exporter"
| "indeedhub" => StartupTier::Frontend,
// Tier 3: Everything else
// Tier 3: Application layer (everything else)
_ => StartupTier::Application,
}
}
/// Map containers to their required dependencies.
/// When a dependent fails, check and restart its dependencies first.
fn container_dependencies(name: &str) -> &'static [&'static str] {
let id = name.strip_prefix("archy-").unwrap_or(name);
match id {
// Bitcoin-dependent chain
"lnd" => &["bitcoin-knots"],
"electrumx" | "mempool-electrs" | "electrs" => &["bitcoin-knots"],
"nbxplorer" => &["bitcoin-knots"],
"btcpay-server" => &["btcpay-db", "nbxplorer"],
"mempool-api" => &["mempool-db", "electrumx"],
"mempool-web" => &["mempool-api"],
"fedimint" => &["bitcoin-knots"],
"fedimint-gateway" => &["lnd"],
// IndeedHub stack
"indeedhub-api" => &["indeedhub-postgres", "indeedhub-redis"],
"indeedhub" => &["indeedhub-api"],
"indeedhub-relay" => &["indeedhub-postgres"],
"indeedhub-ffmpeg" => &["indeedhub-api"],
// Multi-container stacks
"immich_server" => &["immich_postgres", "immich_redis"],
"penpot-backend" => &["penpot-postgres", "penpot-valkey"],
"penpot-frontend" => &["penpot-backend"],
// UI containers
"bitcoin-ui" => &["bitcoin-knots"],
"lnd-ui" => &["lnd"],
"electrs-ui" => &["electrumx"],
_ => &[],
}
}
/// Check if all of a container's dependencies are currently running.
fn deps_are_running(name: &str, containers: &[ContainerHealth]) -> bool {
let deps = container_dependencies(name);
if deps.is_empty() {
return true;
}
for dep in deps {
// Check both plain name and archy- prefixed name
let dep_running = containers.iter().any(|c| {
let c_id = c.name.strip_prefix("archy-").unwrap_or(&c.name);
(c_id == *dep || c.name == *dep) && c.state == "running"
});
if !dep_running {
return false;
}
}
true
}
/// Track restart attempts per container with exponential backoff and stability reset.
struct RestartTracker {
attempts: HashMap<String, u32>,
@ -372,7 +431,7 @@ async fn check_containers() -> Vec<ContainerHealth> {
async fn restart_container(name: &str) -> bool {
info!("Auto-restarting unhealthy container: {}", name);
let result = tokio::time::timeout(
std::time::Duration::from_secs(30),
std::time::Duration::from_secs(120),
tokio::process::Command::new("podman")
.args(["start", name])
.output(),
@ -394,7 +453,7 @@ async fn restart_container(name: &str) -> bool {
false
}
Err(_) => {
warn!("Timeout starting container {} (30s)", name);
warn!("Timeout starting container {} (120s)", name);
false
}
}
@ -466,13 +525,33 @@ pub fn spawn_health_monitor(state: Arc<StateManager>, data_dir: PathBuf) {
if container.healthy {
if tracker.attempt_count(&container.name) > 0 {
info!("Container {} is healthy again after restart", container.name);
// Reset attempt counters for containers that depend on this one,
// since their previous failures may have been caused by this
// dependency being down
let recovered_id = container.name.strip_prefix("archy-")
.unwrap_or(&container.name).to_string();
for other in &containers {
let deps = container_dependencies(&other.name);
if deps.iter().any(|d| *d == recovered_id || *d == container.name) {
if tracker.attempt_count(&other.name) > 0 {
info!("Resetting restart counter for {} (dependency {} recovered)",
other.name, container.name);
tracker.clear(&other.name);
restart_history.clear(&other.name);
history_dirty = true;
}
}
}
tracker.clear(&container.name);
restart_history.clear(&container.name);
history_dirty = true;
}
continue;
}
if container.state == "exited" || container.state == "stopped" {
// Handle exited, stopped, AND created state containers
if container.state == "exited" || container.state == "stopped"
|| container.state == "created"
{
// Skip user-stopped containers
if user_stopped.contains(&container.name) {
debug!("Skipping user-stopped container: {}", container.name);
@ -509,6 +588,13 @@ pub fn spawn_health_monitor(state: Arc<StateManager>, data_dir: PathBuf) {
continue;
}
// Skip if dependencies aren't running — they need to start first
if !deps_are_running(&container.name, &containers) {
let deps = container_dependencies(&container.name);
debug!("Container {} waiting for dependencies {:?}", container.name, deps);
continue;
}
// When transitioning to a higher tier, wait briefly for previous tier to stabilize
if let Some(prev) = prev_tier {
if tier > prev {
@ -695,13 +781,13 @@ mod tests {
#[test]
fn test_max_restart_attempts_constant() {
assert!(MAX_RESTART_ATTEMPTS >= 1);
assert!(MAX_RESTART_ATTEMPTS <= 10);
assert_eq!(MAX_RESTART_ATTEMPTS, 3);
assert!(MAX_RESTART_ATTEMPTS <= 20);
assert_eq!(MAX_RESTART_ATTEMPTS, 10);
}
#[test]
fn test_check_interval_constant() {
assert_eq!(CHECK_INTERVAL_SECS, 60);
assert_eq!(CHECK_INTERVAL_SECS, 120);
}
#[test]
@ -740,6 +826,44 @@ mod tests {
assert_eq!(container_tier("archy-btcpay-db"), StartupTier::Database);
assert_eq!(container_tier("immich_postgres"), StartupTier::Database);
assert_eq!(container_tier("penpot-valkey"), StartupTier::Database);
assert_eq!(container_tier("indeedhub-postgres"), StartupTier::Database);
assert_eq!(container_tier("indeedhub-redis"), StartupTier::Database);
assert_eq!(container_tier("indeedhub-minio"), StartupTier::Database);
}
#[test]
fn test_container_tier_indeedhub_api() {
assert_eq!(container_tier("indeedhub-api"), StartupTier::DependentService);
}
#[test]
fn test_container_tier_mempool_api() {
assert_eq!(container_tier("mempool-api"), StartupTier::DependentService);
}
#[test]
fn test_container_dependencies() {
assert!(container_dependencies("lnd").contains(&"bitcoin-knots"));
assert!(container_dependencies("indeedhub-api").contains(&"indeedhub-postgres"));
assert!(container_dependencies("indeedhub-api").contains(&"indeedhub-redis"));
assert!(container_dependencies("mempool-api").contains(&"mempool-db"));
assert!(container_dependencies("mempool-api").contains(&"electrumx"));
assert!(container_dependencies("nextcloud").is_empty());
}
#[test]
fn test_deps_are_running() {
let containers = vec![
ContainerHealth { name: "indeedhub-postgres".into(), app_id: "indeedhub-postgres".into(), state: "running".into(), healthy: true },
ContainerHealth { name: "indeedhub-redis".into(), app_id: "indeedhub-redis".into(), state: "running".into(), healthy: true },
ContainerHealth { name: "indeedhub-api".into(), app_id: "indeedhub-api".into(), state: "exited".into(), healthy: false },
];
assert!(deps_are_running("indeedhub-api", &containers));
// Missing postgres
let partial = vec![
ContainerHealth { name: "indeedhub-redis".into(), app_id: "indeedhub-redis".into(), state: "running".into(), healthy: true },
];
assert!(!deps_are_running("indeedhub-api", &partial));
}
#[test]


@ -14,18 +14,21 @@ use std::path::PathBuf;
use std::sync::Arc;
use tracing::{debug, warn};
/// Spawn the background metrics collector (runs every 60 seconds).
/// Spawn the background metrics collector (runs every 300 seconds / 5 minutes).
/// Evaluates alert rules on each snapshot and dispatches notifications.
/// Note: health_monitor.rs handles container state polling at 120s intervals.
/// This collector handles system-level metrics (CPU, disk, network) and only
/// calls podman stats every 5 minutes to avoid duplicate subprocess overhead.
pub fn spawn_metrics_collector(
store: Arc<MetricsStore>,
state: Option<Arc<crate::state::StateManager>>,
data_dir: Option<PathBuf>,
) {
tokio::spawn(async move {
// Wait 30s for system to stabilize after boot
tokio::time::sleep(std::time::Duration::from_secs(30)).await;
// Wait 60s for system to stabilize after boot
tokio::time::sleep(std::time::Duration::from_secs(60)).await;
let mut interval = tokio::time::interval(std::time::Duration::from_secs(60));
let mut interval = tokio::time::interval(std::time::Duration::from_secs(300));
loop {
interval.tick().await;


@ -34,6 +34,7 @@ pub struct ContainerStatus {
pub name: String,
pub state: ContainerState,
pub health: Option<String>,
pub exit_code: Option<i32>,
pub started_at: Option<String>,
pub image: String,
pub created: String,
@ -150,13 +151,13 @@ impl PodmanClient {
) -> Result<serde_json::Value> {
let socket_path = self.socket_path.clone();
// Connect to the unix socket
// Connect to the unix socket (30s timeout — podman can be slow under load on boot)
let stream = tokio::time::timeout(
std::time::Duration::from_secs(5),
std::time::Duration::from_secs(30),
UnixStream::connect(&socket_path),
)
.await
.map_err(|_| anyhow::anyhow!("Podman socket connection timed out"))?
.map_err(|_| anyhow::anyhow!("Podman socket connection timed out (30s)"))?
.context(format!("Cannot connect to Podman socket at {}", socket_path.display()))?;
// Build the hyper client with the unix stream
@ -179,8 +180,11 @@ impl PodmanClient {
let req = match method {
"POST" => {
let body_str = body.map(|b| serde_json::to_string(&b).unwrap_or_default())
.unwrap_or_default();
let body_str = match body {
Some(b) => serde_json::to_string(&b)
.context("Failed to serialize request body to JSON")?,
None => String::new(),
};
Request::builder()
.method("POST")
.uri(uri)
@ -326,6 +330,8 @@ impl PodmanClient {
"cap_drop": cap_drop,
"read_only_filesystem": manifest.app.security.readonly_root,
"no_new_privileges": true,
"restart_policy": "unless-stopped",
"restart_tries": 5,
"netns": {
"nsmode": match manifest.app.security.network_policy.as_str() {
"host" => "host",
@ -342,8 +348,9 @@ impl PodmanClient {
).await?;
let id = result["Id"].as_str()
.unwrap_or("")
.to_string();
.filter(|s| !s.is_empty())
.map(|s| s.to_string())
.context("Podman API returned no container ID — creation may have failed")?;
Ok(id)
}
@ -396,11 +403,14 @@ impl PodmanClient {
let ports = parse_port_bindings(&data["HostConfig"]["PortBindings"]);
let lan_address = Self::lan_address_for(&container_name);
let exit_code = data["State"]["ExitCode"].as_i64().map(|c| c as i32);
Ok(ContainerStatus {
id: data["Id"].as_str().unwrap_or("").to_string(),
name: container_name,
state: ContainerState::from(state_str),
health,
exit_code,
started_at,
image: data["ImageName"].as_str()
.or_else(|| data["Config"]["Image"].as_str())
@ -477,11 +487,16 @@ impl PodmanClient {
.map(|s| s.to_string());
let lan_address = Self::lan_address_for(&name);
let exit_code = c["ExitCode"].as_i64()
.or_else(|| c["State"]["ExitCode"].as_i64())
.map(|c| c as i32);
result.push(ContainerStatus {
id: c["Id"].as_str().unwrap_or("").to_string(),
name,
state: ContainerState::from(c["State"].as_str().unwrap_or("unknown")),
health,
exit_code,
started_at,
image: c["Image"].as_str().unwrap_or("").to_string(),
created: c["Created"].as_str().unwrap_or("").to_string(),


@ -285,6 +285,7 @@ impl ContainerRuntime for DockerRuntime {
name: parts[1].to_string(),
state: crate::podman_client::ContainerState::from(parts[2]),
health: None,
exit_code: None,
started_at: None,
image: parts[3].to_string(),
created: parts[4].to_string(),
@ -359,6 +360,7 @@ impl ContainerRuntime for DockerRuntime {
container["State"].as_str().unwrap_or("unknown")
),
health: None,
exit_code: container["ExitCode"].as_i64().map(|c| c as i32),
started_at: None,
image: container["Image"].as_str().unwrap_or("").to_string(),
created: container["CreatedAt"].as_str().unwrap_or("").to_string(),


@ -0,0 +1,508 @@
# Archipelago Container Infrastructure — Critical Issues Report
**Date:** 2026-03-31
**Status:** Server .228 rebooted — some apps recovered, many did not. UI showed everything as "crashed" during recovery window.
**Purpose:** Fix guide for getting container lifecycle to production quality.
---
## Executive Summary
The container system has **7 systemic failures** that compound each other:
1. **Silent failures everywhere** — errors are swallowed with `|| true`, `.unwrap_or_default()`, and warn-level logs. Nothing actually tells the user (or the system) that something broke.
2. **Health checks are fake** — manifests define real health checks (HTTP probes, exec checks) but they are **never executed**. "Healthy" just means `podman ps` shows "running".
3. **Duplicate polling burns CPU** — health monitor + metrics collector both call `podman stats` every 60 seconds independently. Add crash recovery snapshots, the disk monitor, and frontend polling, and the result is constant subprocess spawning.
4. **Uninstall doesn't clean up** — no volume removal, no network cleanup, force-kills stateful containers (risking wallet/DB corruption), returns 200 OK on partial failure.
5. **Two divergent install paths**: `first-boot-containers.sh` and the Rust RPC installer use different passwords, ports, capabilities, memory limits, and Bitcoin config. They are never in sync.
6. **UI misrepresents state**: `Exited` (even with a clean exit code 0) shows as "crashed". No "recovering" or "starting up" state exists. During boot recovery, the UI shows a wall of red/gray "crashed" labels.
7. **Dependency-blind restarts** — health monitor restarts services without restarting their dependencies first, so they immediately fail again and burn through the 3-attempt limit.
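Item 7 implies that restarts should follow the dependency graph, dependencies first. The sketch below assumes a plain dependency map shaped like the `container_dependencies` table this commit adds. `restart_order` is a hypothetical helper, not code from the repository. It uses a depth-first post-order walk, which is a standard way to get a dependencies-first ordering.

```rust
use std::collections::{HashMap, HashSet};

/// Return `targets` plus their transitive dependencies, ordered so every
/// dependency appears before the containers that need it. `deps` maps a
/// container id to the ids it depends on. Cycles are not expected; a node
/// already on the current DFS path is skipped rather than revisited.
pub fn restart_order(targets: &[&str], deps: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    fn visit(
        id: &str,
        deps: &HashMap<&str, Vec<&str>>,
        done: &mut HashSet<String>,
        on_path: &mut HashSet<String>,
        out: &mut Vec<String>,
    ) {
        // Skip nodes already emitted, or already on the path (cycle guard).
        if done.contains(id) || !on_path.insert(id.to_string()) {
            return;
        }
        if let Some(children) = deps.get(id) {
            for d in children {
                visit(d, deps, done, on_path, out);
            }
        }
        // Post-order: a node is emitted only after all of its dependencies.
        on_path.remove(id);
        done.insert(id.to_string());
        out.push(id.to_string());
    }

    let mut done = HashSet::new();
    let mut on_path = HashSet::new();
    let mut out = Vec::new();
    for t in targets {
        visit(t, deps, &mut done, &mut on_path, &mut out);
    }
    out
}
```

Applied to the IndeedHub stack, restarting `indeedhub` first brings up postgres and redis, then the API, then the frontend.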
---
## LIVE EVIDENCE: .228 Reboot on 2026-03-31
After rebooting .228, here's the actual container state 30 minutes later:
### Permanently Dead (exceeded 3 restart attempts, abandoned)
| Container | Exit Code | Cause |
|-----------|-----------|-------|
| `indeedhub-postgres` | 0 (clean) | Shut down by reboot. The health monitor tried 3 restarts, but it keeps exiting cleanly. Once it was abandoned, all dependent services died too. |
| `indeedhub-redis` | 0 | Same — clean exit, 3 failed restart attempts, abandoned |
| `indeedhub-minio` | 0 | Same |
| `indeedhub-relay` | 0 | Same |
| `indeedhub` | 0 | Same |
| `indeedhub-api` | 1 | Can't resolve hostname `indeedhub-postgres` (postgres is dead, DNS entry gone from network) |
| `jellyfin` | 137 (OOM) | "Failed to create CoreCLR" — memory limit too low for the .NET runtime. Exit code 137 = 128 + 9 (SIGKILL), consistent with an OOM kill. 3 attempts exhausted. |
### Crash-Looping (still failing on every restart)
| Container | Cause |
|-----------|-------|
| `mempool-api` | `ECONNREFUSED 10.89.0.42:3306` — DB (`archy-mempool-db`) just restarted, not ready yet |
| `portainer` | "database schema version does not align with server version" — image upgraded, DB not migrated. Will NEVER recover. |
| `photoprism` | "Failed creating test file in storage folder" — volume permission issue (rootless UID mapping) |
### Never Started (stuck in "Created" state)
| Container | Cause |
|-----------|-------|
| `archy-mempool-web` | "cannot assign requested address" — network binding failure |
| `fedimint` | Same network error |
### Running but Unhealthy
| Container | Notes |
|-----------|-------|
| `homeassistant` | Up 14 min, health check failing |
| `searxng` | Up 13 min, health check failing |
| `onlyoffice` | Up 10 min, health check failing |
### Actually Recovered (healthy)
`filebrowser`, `bitcoin-knots`, `vaultwarden`, `nginx-proxy-manager`, `archy-btcpay-db`, `lnd`, `electrumx`, `grafana`
### Key Observations
1. **All containers have `unless-stopped` restart policy** — but this doesn't help because containers that exit cleanly (code 0) don't get restarted by Podman. The health monitor is the only restart mechanism, and it gives up after 3 attempts.
2. **The entire IndeedHub stack died** because postgres was abandoned first. Once postgres hit 3 restart attempts, every dependent service (api, redis, minio, relay, main) also failed and hit their own 3-attempt limit. **No dependency awareness.**
3. **Containers in "Created" state** were never even started — some kind of network assignment failure during creation. The health monitor doesn't handle "Created" state containers.
4. **The UI showed ALL apps as "crashed"** during the first few minutes, even the ones that eventually recovered. This is because `Exited` state (even exit code 0) maps to the label "crashed" in `appsConfig.ts`.
---
## Problem 1: Containers Don't Start or Recover After Reboot
**Confirmed:** After the .228 reboot on 2026-03-31, every app initially showed as crashed and many never recovered.
### Root Causes
#### A. Crash recovery has a 30-second timeout that's too short
**File:** `core/archipelago/src/crash_recovery.rs:265-271`
```rust
let result = tokio::time::timeout(
std::time::Duration::from_secs(30),
tokio::process::Command::new("podman").args(["start", &record.name]).output(),
).await;
```
On a cold boot with many containers, Podman is under load. 30 seconds is not enough. If it times out, the container is **skipped** — no retry.
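
Fix 1 below (120s timeout, 3 attempts with backoff) could look roughly like this. This is a std-only sketch: the backend itself uses `tokio::process` with `tokio::time::timeout`, while this version uses blocking `std::process` and polls `try_wait()` for brevity. The 5s base delay and factor of 3 are assumptions, and `sleep` is injected so the retry logic can be tested without waiting.

```rust
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `podman start <name>` and kills the child if it exceeds `timeout`.
pub fn podman_start(name: &str, timeout: Duration) -> Result<(), String> {
    let mut child = Command::new("podman")
        .args(["start", name])
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait().map_err(|e| e.to_string())? {
            Some(status) if status.success() => return Ok(()),
            Some(status) => return Err(format!("podman start {name} exited with {status}")),
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed process
                return Err(format!("podman start {name} timed out after {timeout:?}"));
            }
            None => thread::sleep(Duration::from_millis(250)),
        }
    }
}

/// Calls `op` up to `attempts` times, sleeping base, base*factor, ... between failures.
/// Returns the last error instead of silently skipping the container.
pub fn retry_with_backoff<T, E>(
    attempts: u32,
    base: Duration,
    factor: u32,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay *= factor;
                attempt += 1;
            }
        }
    }
}
```

Usage: `retry_with_backoff(3, Duration::from_secs(5), 3, thread::sleep, |_| podman_start("bitcoin-knots", Duration::from_secs(120)))`.
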
#### B. If `podman ps` itself times out, recovery finds zero containers
**File:** `core/archipelago/src/crash_recovery.rs:318`
The `podman ps -a` call to discover stopped containers has a 30-second timeout. On a busy system post-reboot, this can time out. Result: `all_names` is empty, and recovery silently exits having started nothing.
#### C. Boot tier ordering uses a catch-all that misses dependencies
**File:** `core/archipelago/src/crash_recovery.rs:374-385`
```rust
fn container_boot_tier(name: &str) -> u8 {
match id {
"btcpay-db" | "mempool-db" | ... => 0, // databases
"bitcoin-knots" | ... => 1, // bitcoin
"lnd" | "electrumx" | ... => 2, // depends on bitcoin
"mempool-web" | ... => 4, // frontend
_ => 3, // EVERYTHING ELSE - may start before its dependencies
}
}
```
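Any container not explicitly listed gets tier 3. If one of its dependencies is also unlisted, both land in tier 3 and start in no guaranteed order, so the dependent can start before its dependency is ready.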
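
Deriving tiers from dependencies (fix 3 below) amounts to layering the dependency graph: a container's tier is one more than its deepest dependency's tier. A minimal sketch, assuming a name-to-dependencies map that would be built from the app manifests; the map and the IndeedHub edges in the test are illustrative, not taken from the real manifests. Unknown dependencies and cycles return an error instead of falling into a catch-all.

```rust
use std::collections::HashMap;

/// Boot tier = length of the longest dependency chain below a container.
/// Containers with no dependencies get tier 0.
pub fn boot_tiers<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> Result<HashMap<String, u8>, String> {
    fn visit<'a>(
        name: &'a str,
        deps: &HashMap<&'a str, Vec<&'a str>>,
        tiers: &mut HashMap<String, u8>,
        stack: &mut Vec<&'a str>,
    ) -> Result<u8, String> {
        if let Some(&t) = tiers.get(name) {
            return Ok(t); // already computed
        }
        if stack.contains(&name) {
            return Err(format!("dependency cycle involving {name}"));
        }
        let children = deps
            .get(name)
            .ok_or_else(|| format!("unknown container {name}"))?;
        stack.push(name);
        let mut tier = 0u8;
        for &dep in children {
            tier = tier.max(visit(dep, deps, tiers, stack)? + 1);
        }
        stack.pop();
        tiers.insert(name.to_string(), tier);
        Ok(tier)
    }

    let mut tiers = HashMap::new();
    let mut stack = Vec::new();
    for &name in deps.keys() {
        visit(name, deps, &mut tiers, &mut stack)?;
    }
    Ok(tiers)
}
```
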
#### D. First-boot script swallows ALL errors
**File:** `scripts/first-boot-containers.sh:8` — no `set -e`
48+ commands have `|| true` appended. Every `podman run` failure is silently ignored. The script always exits 0 and reports "complete" to systemd even if 50% of containers failed.
#### E. Install RPC returns success before container is actually running
**File:** `core/archipelago/src/api/rpc/package/install.rs:260-294`
After container creation, the installer polls for 30 seconds (6 checks x 5 seconds). If the container is still in "created" or "starting" state after 30 seconds:
```rust
if i == 5 {
debug!("Container {} health check timeout (30s) -- continuing anyway");
}
```
It logs at debug level and **returns success**. The user sees "installed" but the container never actually started.
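
A minimal sketch of fix 5 below: keep the same 6 × 5s polling, but return an error when the container never reaches `Running`, and fail fast if it has already exited. The `State` enum is a hypothetical stand-in for the real `ContainerState`, and the state query and sleep are injected closures so the logic is testable without Podman.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { Created, Starting, Running, Exited(i32) }

/// Polls `get_state` up to `checks` times, `interval` apart.
pub fn wait_until_running(
    name: &str,
    checks: u32,
    interval: Duration,
    mut get_state: impl FnMut() -> State,
    mut sleep: impl FnMut(Duration),
) -> Result<(), String> {
    let mut last = State::Created;
    for i in 0..checks {
        last = get_state();
        match last {
            State::Running => return Ok(()),
            // An exited container will not become Running by waiting longer.
            State::Exited(code) => return Err(format!("{name} exited with code {code} during install")),
            State::Created | State::Starting => {
                if i + 1 < checks {
                    sleep(interval);
                }
            }
        }
    }
    Err(format!("{name} not running after {checks} checks (last state: {last:?})"))
}
```
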
### Fixes Required
1. **Increase crash recovery timeout to 120s** and add retry with backoff (3 attempts per container)
2. **Increase `podman ps` timeout to 60s** during boot recovery
3. **Replace tier catch-all** — every container must be explicitly listed or derived from manifest dependencies
4. **Remove `|| true`** from critical commands in first-boot-containers.sh. Use proper error handling: log the error, record the failure, continue to next container, but report actual failures at the end
5. **Install RPC must return failure** if container isn't running after timeout, not silently succeed
6. **Add `--restart unless-stopped`** to container creation in the Podman client (`core/container/src/podman_client.rs:303-335`) — currently missing, so Podman itself never auto-restarts crashed containers
---
## Problem 2: Health Checks Are Fake
### Root Causes
#### A. "Healthy" just means "running" — application health is never checked
**File:** `core/archipelago/src/container/dev_orchestrator.rs:239-249`
```rust
pub async fn get_health_status(&self, app_id: &str) -> Result<String> {
match status.state {
ContainerState::Running => Ok("healthy".to_string()), // <-- THIS IS THE ENTIRE CHECK
ContainerState::Stopped | ContainerState::Exited => Ok("unhealthy".to_string()),
...
}
}
```
A container can be "running" but the application inside is completely broken. This is reported as "healthy".
#### B. Manifest health checks exist but are never executed
All 30+ app manifests in `image-recipe/build/debian-iso/custom/archipelago/apps/*/manifest.yml` define health checks like:
```yaml
health_check:
type: http
endpoint: http://localhost:4080
path: /api/health
interval: 30s
timeout: 5s
retries: 3
```
The `HealthMonitor` struct at `core/container/src/health_monitor.rs` can execute these checks. **But it is never instantiated.** No code path creates a `HealthMonitor` from the manifest health check definitions.
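
The manifest fields map fairly directly onto a probe. Below is a std-only sketch of an HTTP probe plus a consecutive-failure tracker. It assumes Docker-style semantics for `retries` (report unhealthy only after that many failures in a row) and treats any 2xx/3xx response as healthy; neither assumption comes from the manifest schema, and this is not the `HealthMonitor` API in `core/container/src/health_monitor.rs`. It also reads only the first chunk of the response, which is enough for the status line in practice.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Extracts the status code from an HTTP/1.x status line, e.g. "HTTP/1.1 200 OK".
pub fn parse_status_code(response: &str) -> Option<u16> {
    let line = response.lines().next()?;
    let mut parts = line.split_whitespace();
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Healthy iff the server answers `GET path` with a 2xx or 3xx status within `timeout`.
pub fn http_probe(addr: SocketAddr, path: &str, timeout: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, timeout) else { return false };
    if stream.set_read_timeout(Some(timeout)).is_err() || stream.set_write_timeout(Some(timeout)).is_err() {
        return false;
    }
    let request = format!("GET {path} HTTP/1.0\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    let mut buf = [0u8; 512];
    let n = match stream.read(&mut buf) {
        Ok(n) if n > 0 => n,
        _ => return false,
    };
    matches!(parse_status_code(&String::from_utf8_lossy(&buf[..n])), Some(code) if (200..400).contains(&code))
}

/// Flips to "unhealthy" only after `retries` consecutive failed probes.
pub struct HealthTracker { retries: u32, failures: u32 }

impl HealthTracker {
    pub fn new(retries: u32) -> Self { Self { retries, failures: 0 } }
    pub fn record(&mut self, ok: bool) -> &'static str {
        if ok { self.failures = 0 } else { self.failures += 1 }
        if self.failures >= self.retries.max(1) { "unhealthy" } else { "healthy" }
    }
}
```

For the manifest example above this would be `http_probe("127.0.0.1:4080".parse().unwrap(), "/api/health", Duration::from_secs(5))`, run every 30s with `HealthTracker::new(3)`.
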
#### C. Health status is never pushed to the frontend via WebSocket
**File:** `core/archipelago/src/data_model.rs:120-127`
```rust
pub struct PackageDataEntry {
pub health: Option<String>, // Field exists but is NEVER POPULATED
}
```
The health field in the data model is always `None`. Frontend can only get health via explicit RPC call, which it almost never makes.
#### D. Frontend never polls health status
**File:** `neode-ui/src/stores/container.ts:169-175`
`fetchHealthStatus()` is only called after `startContainer()` and `startBundledApp()`. There is **no setInterval, no periodic polling, no watch**. After the initial call, health status is never refreshed.
### Fixes Required
1. **Wire up manifest health checks** — instantiate `HealthMonitor` from manifest definitions, run actual HTTP/exec probes instead of just checking `podman ps`
2. **Populate the `health` field in `PackageDataEntry`** so WebSocket pushes real health status to frontend
3. **Add 30-second health polling** in the frontend container store (with backoff to 60s when all healthy)
4. **Fix `get_health_status()`** in dev_orchestrator to call actual health checks, not just check container state
---
## Problem 3: CPU Exhaustion from Duplicate Polling
### Root Causes
#### A. Two independent monitors both call `podman stats` every 60 seconds
- **Health monitor:** `core/archipelago/src/health_monitor.rs:17` (`CHECK_INTERVAL_SECS = 60`)
  - Runs `podman ps -a --format json` (lines 305-323)
  - Runs `podman stats --no-stream` every 5 cycles (lines 442-450)
- **Metrics collector:** `core/archipelago/src/monitoring/mod.rs:28` — 60-second interval
- Runs `podman stats --no-stream --format json` independently (collector.rs:220-224)
These are **not coordinated**. Both spawn separate subprocesses. On a system with 15+ containers, each `podman stats` call is expensive.
#### B. Total subprocess spawning frequency
| Component | Interval | What it runs |
|-----------|----------|-------------|
| Health monitor | 60s | `podman ps`, `podman stats` (every 5th), restart attempts |
| Metrics collector | 60s | `podman stats` (duplicate!) |
| Crash recovery snapshot | 120s | `podman ps` |
| Disk monitor | 300s | `df`, `sudo dmesg`, potentially `podman image prune` |
| Telemetry | 900s | `podman stats` (another duplicate) |
| Systemd watchdog | 120s | sd_notify ping |
| Frontend fleet polling | 60s | RPC calls that trigger more podman commands |
That's roughly **one `podman` subprocess every 10-15 seconds** on average, plus all the triggered operations.
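
A rough check of that estimate from the table, assuming each listed component spawns exactly one `podman` process per firing. The terms are health-monitor `ps` (every 60s), health-monitor `stats` (every 5th cycle, i.e. 300s), metrics `stats` (60s), crash-recovery snapshot (120s), and telemetry (900s). The disk monitor (only potentially calls podman), the watchdog, and restart attempts are excluded:

$$
\begin{aligned}
\lambda_{\text{backend}} &= \tfrac{1}{60} + \tfrac{1}{300} + \tfrac{1}{60} + \tfrac{1}{120} + \tfrac{1}{900} = \tfrac{30+6+30+15+2}{1800} = \tfrac{83}{1800}\,\text{s}^{-1} \approx \tfrac{1}{21.7\,\text{s}} \\
\lambda_{\text{backend+fleet}} &\ge \lambda_{\text{backend}} + \tfrac{1}{60} = \tfrac{113}{1800}\,\text{s}^{-1} \approx \tfrac{1}{15.9\,\text{s}}
\end{aligned}
$$

Counting each fleet poll as at least one more `podman` call gives about one subprocess every 16s. Restart attempts and RPCs that trigger several podman commands push this further toward the 10-15s range.
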
#### C. No restart policy means polling-driven restarts
**File:** `core/container/src/podman_client.rs:303-335`
Container creation spec does NOT include `RestartPolicy`. Podman itself never restarts crashed containers. Instead, the health monitor's 60-second poll detects the crash and attempts a restart. This is far more CPU-intensive than Podman's built-in restart mechanism.
#### D. Health monitor restart attempts with exponential backoff still spawn processes
When a container fails, the health monitor retries with backoff delays of 10s, 30s, and 90s. Each attempt spawns `podman start`, `podman inspect`, etc., so several unhealthy containers multiply the number of processes.
### Fixes Required
1. **Deduplicate `podman stats`** — create a shared cache layer. One component fetches, others read from cache (TTL: 30s)
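2. **Add a restart policy (`unless-stopped`) to all container creation** and let Podman handle restarts natively instead of polling. Note that Podman only honors a maximum retry count with `on-failure` (e.g. `--restart=on-failure:5`); `unless-stopped` has no retry limit.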
3. **Increase health monitor interval to 120s** (60s is too aggressive when health checks are just `podman ps`)
4. **Remove duplicate `podman stats`** call from metrics collector — share data with health monitor
5. **Make frontend fleet polling viewport-aware** — only poll when user is actually viewing the fleet page
6. **Batch all container queries** — use a single `podman ps -a --format json` per check cycle, shared across all consumers
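
Fixes 1, 4 and 6 come down to one shared, TTL-bounded snapshot that every consumer reads. A minimal std-only sketch, assuming the parsed `podman stats` output is cloneable; `StatsCache` is a hypothetical name, and the fetch closure stands in for whatever actually runs the subprocess.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// One shared snapshot; only the first caller after the TTL expires pays for a subprocess.
pub struct StatsCache<T: Clone> {
    ttl: Duration,
    inner: Mutex<Option<(Instant, T)>>,
}

impl<T: Clone> StatsCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, inner: Mutex::new(None) }
    }

    /// Returns the cached value if fresh, otherwise calls `fetch` and stores the result.
    /// Holding the lock during `fetch` makes concurrent callers wait for that one fetch
    /// instead of spawning duplicate `podman stats` processes.
    pub fn get<E, F: FnOnce() -> Result<T, E>>(&self, fetch: F) -> Result<T, E> {
        let mut guard = self.inner.lock().unwrap();
        if let Some((at, value)) = guard.as_ref() {
            if at.elapsed() < self.ttl {
                return Ok(value.clone());
            }
        }
        let value = fetch()?;
        *guard = Some((Instant::now(), value.clone()));
        Ok(value)
    }
}
```

The health monitor and metrics collector would share one `StatsCache` (e.g. behind an `Arc`) created with `Duration::from_secs(30)`.
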
---
## Problem 4: Uninstall Doesn't Work
### Root Causes
#### A. No volume removal
**File:** `core/archipelago/src/api/rpc/package/runtime.rs:172-289`
The uninstall function stops containers, removes containers, releases ports, and attempts data directory cleanup. It **never removes Podman volumes**. Orphaned volumes accumulate forever.
#### B. No network cleanup
**File:** `core/archipelago/src/api/rpc/package/runtime.rs:172-289`
Multi-container stacks create networks (`archy-net`, `immich-net`, `penpot-net`) during install (`stacks.rs:89, 211`). These are **never cleaned up** during uninstall. Leftover networks can prevent reinstallation.
#### C. Force-kills stateful containers without graceful shutdown
**File:** `core/archipelago/src/api/rpc/package/runtime.rs:226`
```rust
let rm_out = tokio::process::Command::new("podman")
.args(["rm", "-f", name]) // -f = force kill
.output().await;
```
The code defines proper shutdown timeouts (Bitcoin: 600s, LND: 330s, databases: 120s) but only applies them to `stop`. The `rm -f` that follows does not pass them: if the container is still running at that point (for example, because `stop` failed), Podman force-stops it with its own, much shorter default timeout. This risks corrupting Bitcoin's UTXO set, LND channel state, or database WAL.
#### D. Returns 200 OK even on partial failure
**File:** `core/archipelago/src/api/rpc/package/runtime.rs:268-289`
```rust
Ok(serde_json::json!({
"status": if errors.is_empty() { "uninstalled" } else { "partial" },
...
}))
```
Returns HTTP 200 with `"partial"` status. Frontend at `neode-ui/src/views/apps/useAppsActions.ts:74` doesn't check for "partial" — it deletes the app from the UI regardless.
#### E. Data directory cleanup requires sudo and fails silently
**File:** `core/archipelago/src/api/rpc/package/runtime.rs:256-265`
```rust
let rm_out = tokio::process::Command::new("sudo")
.args(["rm", "-rf", dir]).output().await;
if let Ok(o) = rm_out {
if !o.status.success() {
tracing::warn!(...); // Warning only, continues
}
}
```
If sudo isn't configured or fails, data remains on disk but UI shows "uninstalled".
#### F. Container name detection has gaps
**File:** `core/archipelago/src/api/rpc/package/config.rs:287-340`
Container names are hardcoded patterns. If a container was created with a different naming convention (e.g., by first-boot-containers.sh vs RPC installer), it won't be found and won't be removed.
### Fixes Required
1. **Add `podman volume rm`** for all volumes associated with the app after container removal
2. **Add network cleanup** — remove app-specific networks after all containers on that network are gone
3. **Use `podman stop -t {timeout}` then `podman rm`** (without -f) — respect graceful shutdown timeouts, especially for Bitcoin/LND/databases
4. **Return an error (not 200)** when uninstall has failures. Frontend must check and display errors
5. **Surface "partial" failures to the user** with specific error messages
6. **Unify container naming** — derive names from a single source (manifest), not hardcoded patterns in multiple files
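
Fixes 1-4 can be combined into an ordered plan that is then executed with every failure collected. In this sketch the name-based timeout matching and the 30s default are hypothetical (only the 600/330/120s values come from the code described above), and the command runner is injected. Callers should pass only networks no other app uses, since a shared network cannot be removed while other containers are attached.

```rust
use std::time::Duration;

/// Graceful stop timeout per container (name matching and 30 s default are assumptions).
pub fn stop_timeout(container: &str) -> Duration {
    let secs = if container.contains("bitcoin") {
        600
    } else if container.contains("lnd") {
        330
    } else if container.ends_with("-db") || container.contains("postgres") || container.contains("mysql") {
        120
    } else {
        30
    };
    Duration::from_secs(secs)
}

/// `podman stop -t <secs>` then `podman rm` (no -f) per container, then volumes, then networks.
pub fn uninstall_plan(containers: &[&str], volumes: &[&str], networks: &[&str]) -> Vec<Vec<String>> {
    let mut plan = Vec::new();
    for c in containers {
        let t = stop_timeout(c).as_secs().to_string();
        plan.push(vec!["stop".into(), "-t".into(), t, c.to_string()]);
        plan.push(vec!["rm".into(), c.to_string()]);
    }
    for v in volumes {
        plan.push(vec!["volume".into(), "rm".into(), v.to_string()]);
    }
    for n in networks {
        plan.push(vec!["network".into(), "rm".into(), n.to_string()]);
    }
    plan
}

/// Runs every step, continuing past failures, and returns Err listing all of them
/// so the RPC can report an error instead of 200 OK with "partial".
pub fn run_uninstall(
    plan: &[Vec<String>],
    mut run: impl FnMut(&[String]) -> Result<(), String>,
) -> Result<(), Vec<String>> {
    let errors: Vec<String> = plan
        .iter()
        .filter_map(|args| run(args.as_slice()).err().map(|e| format!("podman {}: {e}", args.join(" "))))
        .collect();
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```
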
---
## Problem 5: Two Divergent Install Paths
The first-boot bash script and the Rust RPC installer create containers with **different configurations**. This is a major source of bugs.
### Specific Divergences
#### A. Database passwords
- **First-boot** (`scripts/first-boot-containers.sh:118-127`): Generates random passwords with `openssl rand -base64 24`, stores in `/var/lib/archipelago/secrets/`
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:456,484,514-515,610`): Uses hardcoded `"btcpaypass"`, `"mempoolpass"`, `"rootpass"`, `"immichpass"`
**Result:** Apps installed via RPC after first-boot can't connect to databases because passwords don't match.
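
Reading the generated password instead of hardcoding it could be as small as the helper below. It assumes a one-file-per-secret layout under `/var/lib/archipelago/secrets/`; the file names the real script uses are not shown here, so any name passed in is hypothetical. Only the trailing newline that `openssl rand -base64` emits is stripped, and an empty file is treated as an error rather than an empty password.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a generated secret from `dir/name`, e.g. dir = /var/lib/archipelago/secrets.
pub fn read_secret(dir: &Path, name: &str) -> io::Result<String> {
    let raw = fs::read_to_string(dir.join(name))?;
    let secret = raw.trim_end_matches(&['\n', '\r'][..]).to_string();
    if secret.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, format!("secret {name} is empty")));
    }
    Ok(secret)
}
```
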
#### B. Bitcoin configuration
- **First-boot** (`scripts/first-boot-containers.sh:295-313`): Dynamically sets `-prune=550` on small disks, `-txindex=1` on large disks
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:415-420`): No custom args at all
**Result:** Bitcoin installed via RPC has no pruning or txindex regardless of disk size.
#### C. ZMQ configuration for LND
- **First-boot** (`scripts/first-boot-containers.sh:100-114`): Bitcoin.conf generated without ZMQ publisher settings
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:438-439`): LND configured to connect to `tcp://bitcoin-knots:28332` and `tcp://bitcoin-knots:28333`
**Result:** LND can't receive block notifications from Bitcoin: LND is configured to subscribe over ZMQ, but neither path configures Bitcoin's ZMQ publishers.
#### D. Port conflicts
- **First-boot** (`scripts/first-boot-containers.sh:813,835`): Both strfry and indeedhub bind to host port 7777
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:734`): IndeedHub uses `8190:3000`
**Result:** On first-boot, whichever of strfry/indeedhub starts second fails to bind port 7777. Via RPC, IndeedHub uses host port 8190 instead, so the two paths disagree.
#### E. Memory limits
- **First-boot** (`scripts/first-boot-containers.sh:253-283`): Ollama gets 1g on low-mem systems
- **Rust RPC** (`core/archipelago/src/api/rpc/package/config.rs:245-280`): Ollama gets 4g always
**Result:** Same app gets different resource limits depending on how it was installed.
#### F. Version mismatches in marketplace UI
- `scripts/image-versions.sh:17`: LND image is `v0.18.4-beta`
- `neode-ui/src/views/marketplace/marketplaceData.ts:155`: Shows `0.17.4`
- `scripts/image-versions.sh:21-22`: Mempool images are `v3.0.0`
- `neode-ui/src/views/marketplace/marketplaceData.ts:177`: Shows `2.5.0`
### Fixes Required
1. **Single source of truth for container config** — Rust config must read passwords from `/var/lib/archipelago/secrets/`, not hardcode them
2. **Add ZMQ config** to Bitcoin startup in both paths: `zmqpubrawblock=tcp://0.0.0.0:28332` and `zmqpubrawtx=tcp://0.0.0.0:28333`
3. **Fix port 7777 conflict** — assign unique ports to strfry and indeedhub
4. **Add disk-aware Bitcoin config** to Rust installer (prune/txindex based on disk size)
5. **Sync memory limits** between first-boot and Rust config
6. **Update marketplace version strings** to match actual image versions in `image-versions.sh`
7. **Long-term: eliminate first-boot-containers.sh** — have the backend handle all container creation using the same Rust code path
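
Fixes 2 and 4 could share one argument builder in the Rust installer. In this sketch the disk threshold is a hypothetical parameter (the document only distinguishes "small" and "large" disks); the ZMQ options and ports are the ones listed above. `-prune` and `-txindex` are mutually exclusive because bitcoind refuses to start with `-txindex` in prune mode.

```rust
/// Extra bitcoind arguments for the Rust installer.
pub fn bitcoin_args(free_disk_gb: u64, large_disk_threshold_gb: u64) -> Vec<String> {
    let mut args = vec![
        // ZMQ publishers that LND subscribes to.
        "-zmqpubrawblock=tcp://0.0.0.0:28332".to_string(),
        "-zmqpubrawtx=tcp://0.0.0.0:28333".to_string(),
    ];
    if free_disk_gb >= large_disk_threshold_gb {
        args.push("-txindex=1".to_string());
    } else {
        // 550 MiB is the smallest prune target bitcoind accepts.
        args.push("-prune=550".to_string());
    }
    args
}
```
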
---
## Problem 6: Post-Install Hooks Run Async and Fail Silently
**File:** `core/archipelago/src/api/rpc/package/install.rs:541-625`
Post-install hooks (setting FileBrowser password, configuring NextCloud, etc.) are spawned as background tasks:
```rust
tokio::spawn(async move {
let _ = tokio::fs::create_dir_all(secret_dir).await;
let _ = tokio::fs::write(...).await;
});
```
The install RPC returns success **before hooks complete**. If a hook fails (network timeout, service not ready), the error is discarded (`let _ = ...`) or at most logged, yet the user is told installation succeeded. Credentials aren't set and configs aren't applied.
### Fix Required
Await post-install hooks before returning success, or return a "configuring" status and let the frontend poll for completion.
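
Awaiting the hooks mostly means propagating their errors with `?` instead of discarding them with `let _ =`. A blocking std-only sketch, assuming hooks can be modeled as fallible closures; the real hooks are async and would be `.await`ed in sequence. The hook names and the credential file are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Example hook: write a credential file, reporting failure instead of ignoring it.
pub fn write_credential(secret_dir: &Path, file: &str, value: &str) -> io::Result<()> {
    fs::create_dir_all(secret_dir)?;
    fs::write(secret_dir.join(file), value)?;
    Ok(())
}

pub type Hook = Box<dyn FnOnce() -> io::Result<()>>;

/// Runs hooks in order and stops at the first failure, so install can return an
/// error (or a "configuring" status) instead of unconditional success.
pub fn run_post_install_hooks(hooks: Vec<(&str, Hook)>) -> Result<(), String> {
    for (name, hook) in hooks {
        hook().map_err(|e| format!("post-install hook '{name}' failed: {e}"))?;
    }
    Ok(())
}
```
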
---
## Problem 7: Podman Client Swallows Errors
**File:** `core/container/src/podman_client.rs`
#### A. JSON serialization failures return empty strings (lines 182-183)
```rust
let body_str = body.map(|b| serde_json::to_string(&b).unwrap_or_default()).unwrap_or_default();
```
#### B. Container ID parsing failures return empty string (lines 344-348)
```rust
let id = result["Id"].as_str().unwrap_or("").to_string();
Ok(id) // Empty string = success?
```
#### C. Socket timeout is only 5 seconds (lines 154-160)
On a busy system or during boot, Podman socket may take >5s to respond. Every API call fails. No retry logic.
### Fixes Required
1. Replace `.unwrap_or_default()` with proper error propagation using `?`
2. Return `Err` when container ID is empty
3. Increase socket timeout to 15-30s
4. Add retry with backoff (3 attempts) on socket connection
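
Fix 2 in isolation: a missing or empty `Id` becomes an error rather than `Ok("")`. At the call site this would read `let id = container_id(result["Id"].as_str())?;`, assuming `result` is the `serde_json::Value` from the excerpt and the client's error type can absorb `ClientError` (a hypothetical type for this sketch).

```rust
use std::fmt;

#[derive(Debug, PartialEq, Eq)]
pub enum ClientError {
    /// The response JSON lacked the field entirely.
    MissingField(&'static str),
    /// The field was present but empty or whitespace.
    EmptyField(&'static str),
}

impl fmt::Display for ClientError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClientError::MissingField(k) => write!(f, "podman response missing field {k}"),
            ClientError::EmptyField(k) => write!(f, "podman response field {k} is empty"),
        }
    }
}

impl std::error::Error for ClientError {}

/// Replaces `result["Id"].as_str().unwrap_or("")`.
pub fn container_id(id_field: Option<&str>) -> Result<String, ClientError> {
    match id_field {
        None => Err(ClientError::MissingField("Id")),
        Some(id) if id.trim().is_empty() => Err(ClientError::EmptyField("Id")),
        Some(id) => Ok(id.to_string()),
    }
}
```
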
---
## Problem 8: UI Misrepresents Container State
### Root Causes
#### A. "Exited" always displays as "Crashed" — even for clean shutdowns
**File:** `neode-ui/src/views/apps/appsConfig.ts:119-146`
```typescript
getStatusLabel(state, health):
- "exited" → "crashed" // <-- THIS IS THE PROBLEM
```
Every container that exited — whether from a clean reboot (exit 0), OOM kill (exit 137), or app error (exit 1) — shows the same "crashed" label. After a reboot, the UI is a wall of "crashed" labels even though containers are in the process of starting up.
#### B. No "recovering" or "boot in progress" state exists
**File:** `core/archipelago/src/data_model.rs:103-119`
PackageState enum has `Starting`, but it's only set during **explicit user start actions**, not during automatic crash recovery. During boot recovery, containers transition from `Exited → Running` without ever passing through `Starting`, so the UI never shows a spinner or "starting up" message.
#### C. Backend skips sub-containers from package listing, so their state is invisible
**File:** `core/archipelago/src/container/docker_packages.rs:39-117`
The excluded_services list filters out backend services like `mempool-db`, `btcpay-db`, `nbxplorer`, `penpot-postgres`, etc. UI containers ending in `-ui` are also skipped. These containers are invisible to the user even when they're the actual cause of a stack failure (e.g., `indeedhub-postgres` being dead kills the entire IndeedHub stack, but only `indeedhub-api` errors are visible).
#### D. No distinction between "needs manual intervention" and "will recover soon"
The UI shows the same visual treatment for:
- Portainer (DB migration error — will NEVER recover without manual intervention)
- mempool-api (DB not ready yet — will recover in 30 seconds)
- IndeedHub (dependencies abandoned — won't recover until deps are manually restarted)
### Fixes Required
1. **Differentiate exit codes**: Exit 0 = "stopped" (gray), Exit non-zero = "crashed" (red), Exit 137 = "killed (OOM)" (red with warning)
2. **Add a "recovering" state**: During boot/crash recovery window (first 5 minutes after backend start), show "Starting up..." instead of "crashed" for exited containers
3. **Show sub-container health**: When a parent app is unhealthy, show which sub-service caused the failure (e.g., "IndeedHub: postgres is down")
4. **Distinguish recoverable from permanent failures**: After health monitor gives up (3 attempts), change label to "Needs attention" instead of keeping "crashed"
5. **Add recovery progress indicator**: During boot, show "Recovering containers: 15/22 started" on the dashboard
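
Fixes 1, 2 and 4 combine into one mapping. The fixes target `appsConfig.ts`; this is shown in Rust, and the same logic could equally live in the backend next to the new `exit_code` field. The `Label` enum and the precedence order (restarts exhausted, then recovery window, then exit code) are assumptions. Exit 137 means SIGKILL, which is usually, but not always, the OOM killer.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Label { Running, Stopped, Crashed, KilledOom, StartingUp, NeedsAttention }

/// Recovery window after backend start (fix 2).
const RECOVERY_WINDOW: Duration = Duration::from_secs(5 * 60);

pub fn status_label(state: &str, exit_code: Option<i32>, since_backend_start: Duration, restarts_exhausted: bool) -> Label {
    if state == "running" {
        return Label::Running;
    }
    // Fix 4: the health monitor gave up, so waiting will not help.
    if restarts_exhausted {
        return Label::NeedsAttention;
    }
    match state {
        "created" | "starting" => Label::StartingUp,
        "exited" if since_backend_start < RECOVERY_WINDOW => Label::StartingUp,
        "exited" => match exit_code {
            Some(0) | None => Label::Stopped,
            Some(137) => Label::KilledOom, // 128 + SIGKILL (9)
            Some(_) => Label::Crashed,
        },
        // Other states (paused, stopping, ...) are outside this sketch.
        _ => Label::Stopped,
    }
}
```
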
---
## Problem 9: Dependency-Blind Restarts
### Root Cause (Confirmed by .228 reboot)
The health monitor restarts containers individually without considering dependencies. This was proven by the IndeedHub stack failure:
1. `indeedhub-postgres` exits cleanly (code 0) on reboot
2. Health monitor restarts postgres; it starts but exits again (likely waiting on a volume mount or the network)
3. After 3 attempts, postgres is **abandoned**
4. Meanwhile, `indeedhub-api` tries to connect to postgres → `ENOTFOUND indeedhub-postgres` → exits
5. Health monitor restarts api → same DNS failure → exits
6. After 3 attempts, api is **abandoned**
7. Same cascade for redis, minio, relay, main container — all abandoned within minutes
**File:** `core/archipelago/src/health_monitor.rs:500-530`
The restart loop treats each container independently. There's no logic to:
- Check if a container's dependencies are running before restarting it
- Restart dependencies first when a dependent container fails
- Reset attempt counters when a dependency comes back online
**3 attempts is too few**, especially when dependencies need time:
- Attempt 1: 10s backoff → dependency still starting
- Attempt 2: 30s backoff → dependency crashed and is being restarted
- Attempt 3: 90s backoff → dependency hit its own 3-attempt limit and was abandoned
- Game over. Entire stack is dead.
### Fixes Required
1. **Dependency-aware restart ordering**: Before restarting a container, check if its dependencies are running. If not, restart dependencies first.
2. **Increase max restart attempts to 5-10** for containers with dependencies
3. **Reset attempt counters** when a dependency comes back online (the dependent container failed because of the dependency, not itself)
4. **Add a "stack restart" concept**: When restarting any container in a multi-container stack (indeedhub, mempool, btcpay, immich, penpot), restart the entire stack in dependency order
5. **Handle "Created" state containers**: `archy-mempool-web` and `fedimint` are in "Created" state (never started). The health monitor should detect these and attempt to start them.
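
Fixes 1-3 can be sketched as a small planner: when a container fails, restart its deepest non-running dependency first, count attempts per restarted container, and clear the counters of a dependency's direct dependents when it recovers. `RestartPlanner` and `Action` are hypothetical names, not the types in `health_monitor.rs`; backoff timing is omitted, and the limit of 10 follows the P0 list below.

```rust
use std::collections::{HashMap, HashSet};

pub const MAX_RESTART_ATTEMPTS: u32 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum Action { Restart(String), GiveUp }

pub struct RestartPlanner {
    deps: HashMap<String, Vec<String>>, // container -> direct dependencies
    attempts: HashMap<String, u32>,
}

impl RestartPlanner {
    pub fn new(deps: HashMap<String, Vec<String>>) -> Self {
        Self { deps, attempts: HashMap::new() }
    }

    /// Decides what to restart for a failed container, given the running set.
    pub fn plan(&mut self, failed: &str, running: &HashSet<String>) -> Action {
        let target = self
            .first_down_dependency(failed, running, &mut HashSet::new())
            .unwrap_or_else(|| failed.to_string());
        let n = self.attempts.entry(target.clone()).or_insert(0);
        if *n >= MAX_RESTART_ATTEMPTS {
            return Action::GiveUp;
        }
        *n += 1;
        Action::Restart(target)
    }

    fn first_down_dependency(&self, name: &str, running: &HashSet<String>, seen: &mut HashSet<String>) -> Option<String> {
        if !seen.insert(name.to_string()) {
            return None; // cycle guard
        }
        for dep in self.deps.get(name).into_iter().flatten() {
            if !running.contains(dep) {
                // The dependency may itself be waiting on something deeper.
                return Some(self.first_down_dependency(dep, running, seen).unwrap_or_else(|| dep.clone()));
            }
        }
        None
    }

    /// Dependents failed because of this dependency, not themselves: reset their counters.
    pub fn on_recovered(&mut self, name: &str) {
        self.attempts.remove(name);
        let dependents: Vec<String> = self
            .deps
            .iter()
            .filter(|(_, ds)| ds.iter().any(|d| d == name))
            .map(|(c, _)| c.clone())
            .collect();
        for d in dependents {
            self.attempts.remove(&d);
        }
    }
}
```
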
---
## Priority Order for Fixes
### P0 — System is broken without these (reboot = broken system)
1. **Dependency-aware restarts** in health_monitor.rs — restart dependencies before dependents, reset attempt counters when deps recover
2. **Increase max restart attempts to 10** (currently 3) — dependency chains need more time on boot
3. **Handle "Created" state** — containers stuck in Created are never started by health monitor
4. **Fix UI state labels** — "exited" code 0 should say "stopped", not "crashed". Add "recovering" state during boot window.
5. Fix Rust config to read secrets from `/var/lib/archipelago/secrets/` instead of hardcoded passwords
6. Fix port 7777 conflict (strfry vs indeedhub)
7. Add ZMQ config to Bitcoin for LND block notifications
### P1 — Core functionality broken
8. Wire up manifest health checks (replace fake "running = healthy" with actual HTTP/exec probes)
9. Fix uninstall to clean up volumes, networks, and respect graceful shutdown timeouts
10. Return actual errors from install/uninstall instead of silent success on partial failure
11. Remove `|| true` from critical first-boot commands
12. Show sub-container health in UI (which dependency is actually broken)
### P2 — Performance and CPU
13. Deduplicate `podman stats` calls (health monitor + metrics collector both call every 60s independently)
14. Increase health monitor interval to 120s
15. Push health status to the frontend via WebSocket (populate the `health` field in the data model)
16. Make fleet polling viewport-aware (don't poll when user isn't viewing)
### P3 — Consistency and correctness
17. Sync memory limits between first-boot and Rust config
18. Update marketplace version strings (LND shows 0.17.4, actual is 0.18.4; Mempool shows 2.5.0, actual is 3.0.0)
19. Unify container naming conventions between first-boot script and Rust config
20. Add disk-aware Bitcoin config (prune/txindex) to Rust installer
21. Distinguish "needs manual intervention" from "will recover soon" in UI
---
## Key Files to Modify
| File | What to fix |
|------|-------------|
| `core/archipelago/src/health_monitor.rs` | Dependency-aware restarts, increase MAX_RESTART_ATTEMPTS to 10, handle Created state, deduplicate with metrics collector |
| `core/container/src/podman_client.rs` | Add RestartPolicy to container creation spec, fix `.unwrap_or_default()` error swallowing, increase socket timeout to 15-30s |
| `core/archipelago/src/crash_recovery.rs` | Increase timeouts to 120s, add retry with backoff, fix tier ordering catch-all |
| `core/archipelago/src/api/rpc/package/install.rs` | Return failure on timeout (not silent success), await post-install hooks |
| `core/archipelago/src/api/rpc/package/runtime.rs` | Add volume/network cleanup on uninstall, use `podman stop -t` then `podman rm` (not `-f`), return errors on partial failure |
| `core/archipelago/src/api/rpc/package/config.rs` | Read secrets from disk, fix port 7777, add ZMQ config, sync memory limits |
| `core/archipelago/src/container/dev_orchestrator.rs` | Wire up manifest-defined health checks instead of just checking podman state |
| `core/archipelago/src/container/docker_packages.rs` | Stop filtering sub-containers from state — or expose their health as part of parent app status |
| `core/archipelago/src/data_model.rs` | Populate `health` field for WebSocket push, add exit code to state |
| `core/archipelago/src/monitoring/mod.rs` | Share podman stats data with health monitor instead of duplicate subprocess calls |
| `neode-ui/src/views/apps/appsConfig.ts` | Fix state labels: exit 0 = "stopped", exit non-zero = "crashed", add "recovering" during boot window |
| `neode-ui/src/stores/container.ts` | Add periodic health polling (30s) |
| `neode-ui/src/views/apps/useAppsActions.ts` | Check for "partial" uninstall status, show errors to user |
| `neode-ui/src/views/marketplace/marketplaceData.ts` | Fix version strings to match image-versions.sh |
| `scripts/first-boot-containers.sh` | Remove `\|\| true` from critical commands, fix port 7777 conflict, add proper error reporting |

docs/GAMEPAD-NAV.md
# Gamepad / Controller Navigation Map
## Global Controls
| Button | Action |
|--------|--------|
| D-pad Up/Down | Navigate between items |
| D-pad Left | Go to sidebar (from any page) |
| D-pad Right | Enter main content from sidebar |
| Enter (A) | Activate / click focused element |
| Escape (B) | Go back one level (inner → container → sidebar → detail page back) |
## Navigation Layers
```
SIDEBAR ──Right──► CONTAINERS (or NAV BAR) ──Enter──► INNER CONTROLS
▲ ▲ │
└──Escape──────────────┘◄─────────Escape──────────────────┘
```
### Sidebar
- **Up/Down**: Move between sidebar items (wraps), auto-navigates links
- **Right**: Jump to main content (first container, or first button on container-free pages)
- **Left**: Nothing
### Nav Bar (mode-switcher tabs, category buttons)
- **Left/Right**: Move between tabs
- **Down**: Jump to first container below (remembers which tab for Up return)
- **Up**: Nothing (Escape to go to sidebar)
- **Left from leftmost**: Go to sidebar
### Container Grid (card tiles on most pages)
- **Arrows**: Spatial nav between containers
- **Enter**: Activate primary action (Install/Launch/navigate) or enter inner controls
- **Escape**: Go to sidebar
- **Left from leftmost**: Go to sidebar
- **Up from top row**: Return to remembered nav bar tab, or spatial to nearest nav item
### Inside Container (inner buttons after Enter)
- **Arrows**: Move between inner controls
- **Escape**: Exit back to the container tile
### Text Inputs
- **Up/Down**: Exit field, navigate spatially
- **Enter**: Submit (click adjacent button)
- **Left/Right**: Cursor movement (exit at edges)
### Container-Free Pages (Settings)
- **Right from sidebar**: Focus first button immediately (no 1s poll delay)
- **Up/Down**: Linear navigation through all buttons/toggles
- **Left**: Go to sidebar
- **Escape**: Go to sidebar
---
## Per-Page Mappings
### Home (`/dashboard`)
Container grid. Dashboard info cards.
### My Apps (`/dashboard/apps`)
| # | Element | Type |
|---|---------|------|
| Nav | My Apps / App Store / Services tabs | Nav bar (Left/Right) |
| 1–N | App cards (grid) | Containers — Enter to view details, inner Launch/Stop/Restart buttons |
### App Store / Discover (`/dashboard/discover`)
| # | Element | Type |
|---|---------|------|
| Nav | My Apps / App Store / Services tabs | Nav bar (Left/Right) |
| 1–2 | Sovereignty Stack featured cards | Containers (`glass-card transition-all hover:-translate-y-1`) |
| 3–N | All Applications grid cards | Containers — Enter for details, inner Install/Launch buttons |
### Network (`/dashboard/server`)
| # | Element | Type |
|---|---------|------|
| 1 | Quick Actions card | Single container — Enter to access Restart/Check Tor/View Logs buttons |
| 2 | Local Network card | Container |
| 3 | Web3 card | Container |
| 4 | Network Interfaces card | Container |
| 5 | Tor Services card | Container |
### Mesh (`/dashboard/mesh`)
| # | Element | Type |
|---|---------|------|
| 1 | Device status card | Container (left column) |
| 2 | Actions row (Enable/Broadcast/Off-Grid/Refresh) | Container |
| 3 | Peers list card | Container — Enter peer to open chat, inner peer items navigable |
| 4 | Chat panel | Container (right column) — message input + send |
| 5+ | Tool panels (Bitcoin/Dead Man/Map) | Containers |
**Chat flow**: Select peer (Enter) → focus auto-jumps to message input → type → Enter sends.
### Cloud (`/dashboard/cloud`)
Container grid. Folder/file cards.
### Settings (`/dashboard/settings`)
**Container-free page** — linear button navigation, no containers.
| # | Element | Section |
|---|---------|---------|
| 1 | Server Name input + save | Account Info |
| 2 | What's New button | Account Info |
| 3 | Copy DID button | Account Info |
| 4 | Copy Onion Address button | Account Info |
| 5 | Change Password button | Account → opens modal |
| 6 | Enable 2FA / Disable 2FA button | Account |
| 7 | Logout button | Account |
| 8 | Language selector buttons | Interface Mode |
| 9 | Login with Claude button | Claude Auth |
| 10 | Enable All / toggle per-category | AI Data Access |
| 11 | Manage Updates button | System Updates |
| 12 | Webhook URL input | Webhooks |
| 13 | Secret input | Webhooks |
| 14 | Container Crash / Update Available toggles | Webhooks |
| 15 | Disk Space Warning / Backup Complete toggles | Webhooks |
| 16 | Save Configuration / Send Test buttons | Webhooks |
| 17 | Enable Beta Telemetry button | Telemetry |
| 18 | Create Backup button | Backup |
| 19 | Export Channel Backup button | Backup |
| 20 | Network Diagnostics button | Danger Zone |
| 21 | Reboot button | Danger Zone → confirms with modal |
| 22 | Factory Reset button | Danger Zone → confirms with modal |
### Monitoring (`/dashboard/monitoring`)
Container grid. Stats/chart cards.
---
## Focus Memory
| Key | Remembers | Used When |
|-----|-----------|-----------|
| `sidebar` | Last sidebar item | Returning to sidebar via Escape/Left |
| `main` | Last focused container | Re-entering main zone |
| `navBar` | Last focused tab/button | Up from container returns to same tab |
All focus memory is cleared on route change.
## Data Attributes
| Attribute | Purpose |
|-----------|---------|
| `data-controller-zone="main"` | Main content area (on `<main>`) |
| `data-controller-zone="sidebar"` | Sidebar navigation |
| `data-controller-container` | Focusable card/tile (with `tabindex="0"`) |
| `data-controller-install` | Container has an Install button (Enter prioritizes it) |
| `data-controller-launch` | Container has a Launch button (Enter prioritizes it) |
| `data-controller-install-btn` | The actual Install button inside a container |
| `data-controller-launch-btn` | The actual Launch button inside a container |
| `data-controller-ignore` | Skip this element and descendants from navigation |
| `data-controller-focus` | Make non-standard element focusable |
## Implementation
- **File**: `neode-ui/src/composables/useControllerNav.ts`
- **Store**: `neode-ui/src/stores/controller.ts` (tracks active state + gamepad count)
- **Sounds**: `neode-ui/src/composables/useNavSounds.ts` (move/action/back)
- **Spatial nav**: `findNearestInDirection()` — filters by direction, scores by overlap + distance
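The one-line description of `findNearestInDirection()` leaves the scoring rule open. Below is a minimal, language-neutral sketch of one way to implement "filter by direction, score by overlap + distance", written in Python for illustration only. The `Rect` type, the tuple ranking (overlapping candidates first, then gap plus perpendicular offset, then reading order), and the coordinates are assumptions. They are not the logic in `useControllerNav.ts`. The demo grid uses the Server page layout (Quick Actions full width, then two 2-column rows).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rect:
    """Axis-aligned box in page coordinates (y grows downward, like the DOM)."""
    name: str
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    @property
    def cx(self) -> float:
        return self.left + self.width / 2

    @property
    def cy(self) -> float:
        return self.top + self.height / 2


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of [a0, a1] and [b0, b1] (0 if disjoint)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def find_nearest_in_direction(
    current: Rect, candidates: Sequence[Rect], direction: str
) -> Optional[Rect]:
    """Best candidate in `direction` ('up', 'down', 'left', 'right'), or None."""
    best, best_key = None, None
    for c in candidates:
        if c is current:
            continue
        if direction in ("up", "down"):
            ahead = c.cy > current.cy if direction == "down" else c.cy < current.cy
            gap = c.top - current.bottom if direction == "down" else current.top - c.bottom
            overlap = _overlap(current.left, current.right, c.left, c.right)
            cross = abs(c.cx - current.cx)
        elif direction in ("left", "right"):
            ahead = c.cx > current.cx if direction == "right" else c.cx < current.cx
            gap = c.left - current.right if direction == "right" else current.left - c.right
            overlap = _overlap(current.top, current.bottom, c.top, c.bottom)
            cross = abs(c.cy - current.cy)
        else:
            raise ValueError(f"unknown direction: {direction!r}")
        if not ahead:  # 1. filter: only candidates on the requested side
            continue
        # 2. score: overlapping first, then nearest, then top-to-bottom / left-to-right
        key = (overlap == 0, max(gap, 0.0) + cross, c.top, c.left)
        if best_key is None or key < best_key:
            best, best_key = c, key
    return best


qa = Rect("Quick Actions", 0, 0, 800, 100)
ln = Rect("Local Network", 0, 120, 390, 200)
w3 = Rect("Web3", 410, 120, 390, 200)
ni = Rect("Network Interfaces", 0, 340, 390, 200)
tor = Rect("Tor Services", 410, 340, 390, 200)
grid = [qa, ln, w3, ni, tor]
print(find_nearest_in_direction(ln, grid, "down").name)  # Network Interfaces
print(find_nearest_in_direction(tor, grid, "left").name)  # Network Interfaces
```

Ranking by a tuple rather than a weighted sum keeps the rule predictable. Any candidate that shares a column (or row) with the focused card wins over one that does not, and distance only breaks ties within each group.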

docs/SEED-VERIFICATION.md Normal file

@ -0,0 +1,443 @@
# Archipelago Seed Verification
Independently verify that your 24-word BIP-39 mnemonic produces the correct
Nostr keys and DID identifiers — using only standard cryptographic primitives,
no Archipelago code.
```
24-word mnemonic
|
v
PBKDF2-HMAC-SHA512 (2048 rounds, salt = "mnemonic")
|
v
64-byte master seed
|
+-- HKDF-SHA256 (info="archipelago/node/ed25519/v1")
| --> Node Ed25519 keypair --> did:key:z...
|
+-- HKDF-SHA256 (info="archipelago/nostr-node/secp256k1/v1")
| --> Node Nostr key --> npub1...
|
+-- HKDF-SHA256 (info="archipelago/identity/{i}/ed25519/v1")
| --> Identity[i] Ed25519 --> did:key:z...
|
+-- BIP-32 m/44'/1237'/0'/0/{i} (NIP-06)
| --> Identity[i] Nostr key --> npub1...
|
+-- BIP-32 m/84'/0'/0'
| --> Bitcoin HD wallet
|
+-- HKDF-SHA256 (info="archipelago/lnd/entropy/v1")
--> 16 bytes LND aezeed entropy
```
Source: [`core/archipelago/src/seed.rs`](../core/archipelago/src/seed.rs) and
[`core/archipelago/src/identity.rs`](../core/archipelago/src/identity.rs)
---
## Setup
```bash
pip3 install cryptography ecdsa
```
Two packages, both pure crypto, no network calls. Python 3.9+.
---
## The Verification Script
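Save as `verify-seed.py`. It reads the mnemonic from the `MNEMONIC` environment
variable, or prompts for it when the variable is unset. Do not type the words
inline on the command line, because they would end up in shell history (see How to Run):
```bash
python3 verify-seed.py
```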
```python
#!/usr/bin/env python3
"""
Archipelago seed derivation verifier.

Re-derives every key from a BIP-39 mnemonic using the exact same algorithms
as the Rust backend (seed.rs), so you can compare outputs independently.

Dependencies: cryptography, ecdsa (pip3 install cryptography ecdsa)
No network calls. No file writes. Safe to run air-gapped.
"""
import hashlib, hmac, os, sys, unicodedata

# ── BIP-39: mnemonic --> 64-byte master seed ─────────────────────────────
def mnemonic_to_seed(mnemonic: str) -> bytes:
    """PBKDF2-HMAC-SHA512, 2048 rounds, salt = 'mnemonic', no passphrase."""
    # BIP-39 hashes the NFKD-normalized, single-space-joined word list,
    # so stray whitespace in the input must not change the seed.
    normalized = unicodedata.normalize("NFKD", " ".join(mnemonic.split()))
    return hashlib.pbkdf2_hmac(
        "sha512",
        normalized.encode("utf-8"),
        b"mnemonic",  # BIP-39 salt prefix + empty passphrase
        2048,
    )
# ── HKDF-SHA256 (RFC 5869) ──────────────────────────────────────────────
def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32) -> bytes:
    """
    HKDF-Extract(salt=None, ikm) then HKDF-Expand(PRK, info, L).
    Salt=None means 32 zero bytes per RFC 5869 section 2.2.
    Matches: hkdf::Hkdf::<Sha256>::new(None, ikm).expand(info, &mut okm)
    """
    # The single-block expand below only covers L <= 32 bytes
    if not 0 < length <= 32:
        raise ValueError("this helper only supports 1..32 output bytes")
    # Extract
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand (L <= 32 bytes = 1 block, only T(1) needed)
    t1 = hmac.new(prk, info + b"\x01", hashlib.sha256).digest()
    return t1[:length]
# ── Ed25519 ──────────────────────────────────────────────────────────────
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

def ed25519_keypair(secret_32: bytes) -> tuple[bytes, bytes]:
    """Returns (private_32, public_32) from a 32-byte seed."""
    sk = Ed25519PrivateKey.from_private_bytes(secret_32)
    pk = sk.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
    return secret_32, pk
# ── secp256k1 ────────────────────────────────────────────────────────────
from ecdsa import SECP256k1, SigningKey as ECDSASigningKey

def secp256k1_xonly(secret_32: bytes) -> bytes:
    """32-byte x-only pubkey (Schnorr/Nostr format) from private key bytes."""
    sk = ECDSASigningKey.from_string(secret_32, curve=SECP256k1)
    point = sk.get_verifying_key().pubkey.point
    return point.x().to_bytes(32, "big")
# ── BIP-32 HD derivation (secp256k1) ────────────────────────────────────
import struct
SECP256K1_N = SECP256k1.order

def _bip32_master(seed: bytes) -> tuple[bytes, bytes]:
    """BIP-32 master key: HMAC-SHA512(key='Bitcoin seed', data=seed)."""
    I = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return I[:32], I[32:]  # (secret, chain_code)

def _bip32_ckd(key: bytes, chain: bytes, index: int) -> tuple[bytes, bytes]:
    """Child key derivation (private -> private)."""
    if index >= 0x80000000:
        # Hardened: 0x00 || parent private key || ser32(index)
        data = b"\x00" + key + struct.pack(">I", index)
    else:
        # Non-hardened: compressed parent pubkey || ser32(index)
        sk = ECDSASigningKey.from_string(key, curve=SECP256k1)
        pt = sk.get_verifying_key().pubkey.point
        prefix = b"\x02" if pt.y() % 2 == 0 else b"\x03"
        data = prefix + pt.x().to_bytes(32, "big") + struct.pack(">I", index)
    I = hmac.new(chain, data, hashlib.sha512).digest()
    il = int.from_bytes(I[:32], "big")
    child = (il + int.from_bytes(key, "big")) % SECP256K1_N
    if il >= SECP256K1_N or child == 0:
        # Invalid child per BIP-32 (probability below 1 in 2^127)
        raise ValueError(f"invalid BIP-32 child at index {index}")
    return child.to_bytes(32, "big"), I[32:]

def bip32_derive(seed: bytes, path: str) -> bytes:
    """
    Derive private key for a BIP-32 path like 'm/44h/1237h/0h/0/0'.
    Matches: bitcoin::bip32::Xpriv::new_master + derive_priv
    """
    parts = path.split("/")
    if parts[0] != "m":
        raise ValueError(f"path must start with 'm/': {path!r}")
    key, chain = _bip32_master(seed)
    for part in parts[1:]:
        hardened = part.endswith("'") or part.endswith("h")
        idx = int(part.rstrip("'h"))
        if hardened:
            idx += 0x80000000
        key, chain = _bip32_ckd(key, chain, idx)
    return key
# ── Bech32 encoding (NIP-19: npub / nsec) ───────────────────────────────
_BECH32 = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def _bech32_polymod(values):
    GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_encode(hrp: str, data: bytes) -> str:
    """Bech32 encode (NIP-19 for npub1.../nsec1...)."""
    # Convert 8-bit to 5-bit
    acc, bits, vals = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            vals.append((acc >> bits) & 31)
    if bits:
        vals.append((acc << (5 - bits)) & 31)
    # Checksum
    hrp_exp = [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]
    polymod = _bech32_polymod(hrp_exp + vals + [0] * 6) ^ 1
    checksum = [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(_BECH32[d] for d in vals + checksum)
# ── did:key encoding ────────────────────────────────────────────────────
_B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    result = ""
    while n > 0:
        n, r = divmod(n, 58)
        result = _B58[r] + result
    for b in data:
        if b == 0:
            result = "1" + result
        else:
            break
    return result

def to_did_key(ed25519_pub_32: bytes) -> str:
    """did:key:z<base58btc(0xed01 + pubkey)> (W3C did:key method, Ed25519)."""
    return "did:key:z" + base58_encode(b"\xed\x01" + ed25519_pub_32)
# ── Main ─────────────────────────────────────────────────────────────────
def main():
    mnemonic = os.environ.get("MNEMONIC", "").strip()
    if not mnemonic:
        print("Enter your 24-word mnemonic (space-separated):")
        mnemonic = input("> ").strip()
    words = mnemonic.split()
    if len(words) != 24:
        print(f"Error: expected 24 words, got {len(words)}", file=sys.stderr)
        sys.exit(1)
    # Hash the canonical single-space form so stray whitespace can't change the seed
    seed = mnemonic_to_seed(" ".join(words))
    W = 72
    print()
    print("=" * W)
    print("  ARCHIPELAGO SEED DERIVATION VERIFICATION")
    print("=" * W)
    print()
    print(f"  Seed fingerprint (SHA-256): {hashlib.sha256(seed).hexdigest()[:16]}...")
    print(f"  Seed length: {len(seed)} bytes")

    # ── 1. Node Ed25519 + DID ────────────────────────────────────────────
    print()
    print("-" * W)
    print("  1. NODE ED25519 KEY")
    print("     HKDF-SHA256(seed, info='archipelago/node/ed25519/v1')")
    print("-" * W)
    node_ed_priv, node_ed_pub = ed25519_keypair(
        hkdf_sha256(seed, b"archipelago/node/ed25519/v1")
    )
    node_did = to_did_key(node_ed_pub)
    print(f"  Private: {node_ed_priv.hex()}")
    print(f"  Public:  {node_ed_pub.hex()}")
    print(f"  did:key: {node_did}")

    # ── 2. Node Nostr key ────────────────────────────────────────────────
    print()
    print("-" * W)
    print("  2. NODE NOSTR KEY")
    print("     HKDF-SHA256(seed, info='archipelago/nostr-node/secp256k1/v1')")
    print("-" * W)
    node_nostr_priv = hkdf_sha256(seed, b"archipelago/nostr-node/secp256k1/v1")
    node_nostr_pub = secp256k1_xonly(node_nostr_priv)
    print(f"  Private: {node_nostr_priv.hex()}")
    print(f"  X-only:  {node_nostr_pub.hex()}")
    print(f"  nsec:    {bech32_encode('nsec', node_nostr_priv)}")
    print(f"  npub:    {bech32_encode('npub', node_nostr_pub)}")

    # ── 3. Identity[0..2] Ed25519 + DID ─────────────────────────────────
    print()
    print("-" * W)
    print("  3. IDENTITY ED25519 KEYS + DID")
    print("     HKDF-SHA256(seed, info='archipelago/identity/{i}/ed25519/v1')")
    print("-" * W)
    for i in range(3):
        info = f"archipelago/identity/{i}/ed25519/v1".encode()
        priv, pub = ed25519_keypair(hkdf_sha256(seed, info))
        did = to_did_key(pub)
        print(f"  [{i}] Public:  {pub.hex()}")
        print(f"      did:key: {did}")

    # ── 4. Identity[0..2] Nostr (NIP-06 BIP-32) ────────────────────────
    print()
    print("-" * W)
    print("  4. IDENTITY NOSTR KEYS (NIP-06)")
    print("     BIP-32 m/44'/1237'/0'/0/{i}")
    print("-" * W)
    for i in range(3):
        priv = bip32_derive(seed, f"m/44'/1237'/0'/0/{i}")
        pub = secp256k1_xonly(priv)
        print(f"  [{i}] X-only: {pub.hex()}")
        print(f"      nsec:   {bech32_encode('nsec', priv)}")
        print(f"      npub:   {bech32_encode('npub', pub)}")

    # ── 5. Bitcoin BIP-84 ───────────────────────────────────────────────
    print()
    print("-" * W)
    print("  5. BITCOIN WALLET (BIP-84)")
    print("     BIP-32 m/84'/0'/0'")
    print("-" * W)
    btc_acct = bip32_derive(seed, "m/84'/0'/0'")
    btc_pub = secp256k1_xonly(btc_acct)
    print(f"  Account private key:     {btc_acct.hex()}")
    print(f"  Account pubkey (x-only): {btc_pub.hex()}")

    # ── 6. LND Entropy ──────────────────────────────────────────────────
    print()
    print("-" * W)
    print("  6. LND AEZEED ENTROPY")
    print("     HKDF-SHA256(seed, info='archipelago/lnd/entropy/v1')  [16 bytes]")
    print("-" * W)
    lnd = hkdf_sha256(seed, b"archipelago/lnd/entropy/v1", 16)
    print(f"  Entropy: {lnd.hex()}")

    # ── Done ─────────────────────────────────────────────────────────────
    print()
    print("=" * W)
    print("  Compare these values with your Archipelago node:")
    print("    UI:  Settings > Identity")
    print("    SSH: xxd -p -c 32 /var/lib/archipelago/identity/node_key.pub")
    print("    RPC: curl -s http://<ip>/api/rpc \\")
    print("           -H 'Content-Type: application/json' \\")
    print("           -d '{\"method\":\"identity.get-node\"}' | jq .")
    print("=" * W)
    print()

if __name__ == "__main__":
    main()
```
---
## How to Run
```bash
# Install (two packages, pure crypto, no telemetry)
pip3 install cryptography ecdsa
# Option A: environment variable (doesn't persist in shell history)
read -rs MNEMONIC && export MNEMONIC
# (type or paste your 24 words, press Enter)
python3 verify-seed.py
unset MNEMONIC
# Option B: interactive prompt
python3 verify-seed.py
# Enter your 24-word mnemonic (space-separated):
# > abandon abandon ... art
```
---
## What to Compare
| Output field | Where to find on your node |
|---|---|
| Node Ed25519 public | `xxd -p -c 32 /var/lib/archipelago/identity/node_key.pub` (`-c 32` keeps the 32-byte key on one line) |
| Node did:key | Settings > Identity > Node DID |
| Node npub | Settings > Identity > Nostr Public Key |
| Identity[0] did:key | Settings > Identity > first identity DID |
| Identity[0] npub | Settings > Identity > first identity Nostr key |
RPC alternative (from any machine on the LAN):
```bash
# Node identity
curl -s http://192.168.1.228/api/rpc \
-H 'Content-Type: application/json' \
-d '{"method":"identity.get-node"}' | jq .
# All identities
curl -s http://192.168.1.228/api/rpc \
-H 'Content-Type: application/json' \
-d '{"method":"identity.list"}' | jq .
```
---
## Cryptographic Reference
### HKDF-SHA256 (RFC 5869)
Used for all Ed25519 keys, the node-level Nostr key, and the 16-byte LND entropy. Domain separation via unique `info` strings
prevents key reuse across contexts.
```
Extract: PRK = HMAC-SHA256(salt=0x00*32, ikm=64_byte_seed)
Expand:  OKM = HMAC-SHA256(PRK, info || 0x01)   [first 32 bytes; first 16 for LND entropy]
```
The Rust backend uses `hkdf::Hkdf::<Sha256>::new(None, ikm)` where `None` salt = 32 zero bytes.
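The script's `hkdf_sha256()` hard-codes the single-block case, so it helps to check the construction itself against published vectors before trusting any derived key. Below is a minimal stdlib-only sketch of a general HKDF-SHA256 (multi-block expand, optional salt), checked against RFC 5869 Appendix A.1 and A.3. A.3 uses an empty salt, which RFC 5869 treats the same as the 32 zero bytes produced by `Hkdf::<Sha256>::new(None, ikm)`. The function signature here is illustrative and differs from the script's.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes = b"", length: int = 32, salt: bytes = b"") -> bytes:
    """RFC 5869 HKDF with SHA-256. An empty salt is replaced by 32 zero bytes."""
    if not 0 < length <= 255 * 32:
        raise ValueError("HKDF-SHA256 output must be 1..8160 bytes")
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()  # Extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # Expand: T(n) = HMAC(PRK, T(n-1) || info || n)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


ikm = b"\x0b" * 22

# RFC 5869 A.1: explicit salt and info
okm_a1 = hkdf_sha256(ikm, bytes(range(0xF0, 0xFA)), 42, salt=bytes(range(0x0D)))
assert okm_a1.hex() == ("3cb25f25faacd57a90434f64d0362f2a2d2d0a90cf1a5a4c"
                        "5db02d56ecc4c5bf34007208d5b887185865")

# RFC 5869 A.3: no salt, no info (the salt=None case the backend relies on)
okm_a3 = hkdf_sha256(ikm, b"", 42)
assert okm_a3.hex() == ("8da4e775a563c18f715f802a063c5a31b8a11f5c5ee1879e"
                        "c3454e5f3c738d2d9d201395faa4b61a96c8")

# A 16-byte request (LND entropy) is just the prefix of the 32-byte output
assert hkdf_sha256(ikm, b"x", 16) == hkdf_sha256(ikm, b"x", 32)[:16]
print("HKDF-SHA256 matches RFC 5869 test vectors")
```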
### BIP-32 (secp256k1 HD derivation)
Used for per-identity Nostr keys (NIP-06) and Bitcoin wallet.
```
Master: HMAC-SHA512(key="Bitcoin seed", data=64_byte_seed)
Child:  HMAC-SHA512(key=chain_code, data=0x00 || key || ser32(index))          [hardened]
        HMAC-SHA512(key=chain_code, data=compressed_pubkey || ser32(index))   [normal]
        I = IL || IR;  child_key = (IL + parent_key) mod n;  child_chain = IR
```
The Rust backend uses the `bitcoin` crate: `Xpriv::new_master()` + `derive_priv()`.
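The master-key step and hardened child derivation need nothing beyond HMAC-SHA512 and modular addition, so you can cross-check them with the standard library alone. Below is a minimal sketch checked against BIP-32 test vector 1. Non-hardened steps, such as the `/0/{i}` tail of the NIP-06 path, need secp256k1 point multiplication and are deliberately left out. `master_key` and `ckd_hardened` are illustrative names, not the script's functions.

```python
import hashlib
import hmac
from typing import Tuple

# Order n of the secp256k1 group
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000


def master_key(seed: bytes) -> Tuple[bytes, bytes]:
    """(private key, chain code) = split(HMAC-SHA512(key='Bitcoin seed', data=seed))."""
    i = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return i[:32], i[32:]


def ckd_hardened(key: bytes, chain: bytes, index: int) -> Tuple[bytes, bytes]:
    """Hardened private child: HMAC-SHA512(chain, 0x00 || key || ser32(index))."""
    if index < HARDENED:
        raise ValueError("only hardened indices are supported without EC math")
    i = hmac.new(chain, b"\x00" + key + index.to_bytes(4, "big"), hashlib.sha512).digest()
    il = int.from_bytes(i[:32], "big")
    child = (il + int.from_bytes(key, "big")) % SECP256K1_N
    if il >= SECP256K1_N or child == 0:
        raise ValueError("invalid child key; BIP-32 says move to the next index")
    return child.to_bytes(32, "big"), i[32:]


# BIP-32 test vector 1, chain m
seed = bytes.fromhex("000102030405060708090a0b0c0d0e0f")
k, c = master_key(seed)
assert k.hex() == "e8f32e723decf4051aefac8e2c93c9c5b214313817cdb01a1494b917c8436b35"
assert c.hex() == "873dff81c02f525623fd1fe5167eac3a55a049de3d314bb42ee227ffed37d508"

# m/84'/0'/0' (the script's Bitcoin account path) is fully hardened, so it fits here
for index in (84 + HARDENED, 0 + HARDENED, 0 + HARDENED):
    k, c = ckd_hardened(k, c, index)
print("m/84'/0'/0' account key for the test seed:", k.hex())
```

Feeding your own 64-byte seed into `master_key()` and the same loop should reproduce the script's "Account private key" line.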
### did:key (W3C)
```
did:key:z + base58btc( 0xED 0x01 || 32_byte_ed25519_pubkey )
```
Multicodec prefix `0xED 0x01` identifies Ed25519 public keys.
The Rust backend uses `bs58::encode()` over a 34-byte buffer.
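Decoding a did:key back to the raw 32-byte key makes it directly comparable with the hex printed by the script or by `xxd`. Below is a minimal stdlib-only sketch of both directions, assuming an Ed25519 key (multicodec prefix `0xED 0x01`). The helper names are hypothetical. Every Ed25519 did:key begins with `did:key:z6Mk`, so a wrong key type is easy to spot by eye.

```python
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ED25519_PUB_PREFIX = b"\xed\x01"  # multicodec 0xed as an unsigned varint
DID_KEY_B58 = "did:key:z"  # 'z' = multibase base58btc


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, r = divmod(n, 58)
        out = B58[r] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))  # each leading 0x00 becomes '1'
    return "1" * zeros + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58.index(ch)  # ValueError on characters outside the alphabet
    ones = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    return b"\x00" * ones + body


def did_key_from_ed25519(pub: bytes) -> str:
    if len(pub) != 32:
        raise ValueError("Ed25519 public keys are 32 bytes")
    return DID_KEY_B58 + b58encode(ED25519_PUB_PREFIX + pub)


def ed25519_from_did_key(did: str) -> bytes:
    if not did.startswith(DID_KEY_B58):
        raise ValueError("not a base58btc did:key")
    raw = b58decode(did[len(DID_KEY_B58):])
    if len(raw) != 34 or raw[:2] != ED25519_PUB_PREFIX:
        raise ValueError("not an Ed25519 did:key")
    return raw[2:]


pub = bytes(range(32))
did = did_key_from_ed25519(pub)
print(did[:12])  # did:key:z6Mk
print(ed25519_from_did_key(did).hex() == pub.hex())  # True
```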
### NIP-19 Bech32 (npub/nsec)
```
npub1... = bech32(hrp="npub", data=32_byte_x_only_pubkey)
nsec1... = bech32(hrp="nsec", data=32_byte_private_key)
```
X-only pubkey = just the x-coordinate of the secp256k1 point (Schnorr format).
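The script only encodes. To check an npub or nsec copied from the UI, decode it and verify the checksum. Below is a minimal stdlib-only sketch of bech32 (the BIP-173 variant that NIP-19 uses) in both directions, checked against the npub example in the NIP-19 specification. `nip19_encode` and `nip19_decode` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = (0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3)


def _polymod(values: Sequence[int]) -> int:
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def _hrp_expand(hrp: str) -> List[int]:
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def _convertbits(data: Sequence[int], frombits: int, tobits: int, pad: bool) -> List[int]:
    """Regroup a bit stream; with pad=False, reject non-zero leftover bits."""
    acc, bits, out, maxv = 0, 0, [], (1 << tobits) - 1
    for value in data:
        acc = (acc << frombits) | value
        bits += frombits
        while bits >= tobits:
            bits -= tobits
            out.append((acc >> bits) & maxv)
    if pad and bits:
        out.append((acc << (tobits - bits)) & maxv)
    elif not pad and (bits >= frombits or (acc << (tobits - bits)) & maxv):
        raise ValueError("invalid padding")
    return out


def nip19_encode(hrp: str, payload: bytes) -> str:
    data = _convertbits(payload, 8, 5, pad=True)
    chk = _polymod(_hrp_expand(hrp) + data + [0] * 6) ^ 1
    data += [(chk >> 5 * (5 - i)) & 31 for i in range(6)]
    return hrp + "1" + "".join(CHARSET[d] for d in data)


def nip19_decode(text: str) -> Tuple[str, bytes]:
    if text.lower() != text and text.upper() != text:
        raise ValueError("mixed-case bech32 string")
    hrp, sep, rest = text.lower().rpartition("1")
    if not sep or not hrp or len(rest) < 6:
        raise ValueError("malformed bech32 string")
    data = [CHARSET.index(ch) for ch in rest]  # ValueError on invalid characters
    if _polymod(_hrp_expand(hrp) + data) != 1:
        raise ValueError("bad bech32 checksum")
    return hrp, bytes(_convertbits(data[:-6], 5, 8, pad=False))


# Example from the NIP-19 specification
NPUB = "npub10elfcs4fr0l0r8af98jlmgdh9c8tcxjvz9qkw038js35mp4dma8qzvjptg"
PUB_HEX = "7e7e9c42a91bfef19fa929e5fda1b72e0ebc1a4c1141673e2794234d86addf4e"
assert nip19_encode("npub", bytes.fromhex(PUB_HEX)) == NPUB
assert nip19_decode(NPUB) == ("npub", bytes.fromhex(PUB_HEX))
```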
---
## Security
- Run on an air-gapped machine or at minimum a private terminal session
- The script makes zero network calls and writes zero files
- After verification, clean up:
```bash
rm verify-seed.py
unset MNEMONIC
history -c # bash
# or: fc -W /dev/null # zsh
```
- Never paste your mnemonic into a web tool, online REPL, or shared terminal


@ -1165,10 +1165,11 @@ cat > "$WORK_DIR/setup-tor.sh" <<'TORSCRIPT'
# Prefers system Tor (apt package) over container
ARCHY_TOR_DIR="/var/lib/archipelago/tor"
TOR_CONFIG_DIR="/var/lib/archipelago/tor-config"
TOR_DIR="/var/lib/tor"
LOG="/var/log/archipelago-tor.log"
mkdir -p "$ARCHY_TOR_DIR"
mkdir -p "$ARCHY_TOR_DIR" "$TOR_CONFIG_DIR"
# Write services.json for the backend to read
cat > "$ARCHY_TOR_DIR/services.json" <<TORJSON
@ -1185,11 +1186,17 @@ cat > "$ARCHY_TOR_DIR/services.json" <<TORJSON
}
TORJSON
echo "services.json created"
# Backend reads from tor-config/, not tor/
cp "$ARCHY_TOR_DIR/services.json" "$TOR_CONFIG_DIR/services.json"
chown -R archipelago:archipelago "$TOR_CONFIG_DIR" 2>/dev/null || true
# Generate torrc — use /var/lib/tor/ for hidden services (AppArmor-safe)
cat > /etc/tor/torrc <<TORRC
SocksPort 9050
ControlPort 0
SocksPort 0.0.0.0:9050
SocksPolicy accept 10.89.0.0/16
SocksPolicy accept 127.0.0.0/8
SocksPolicy reject *
# ControlPort disabled for security
HiddenServiceDir $TOR_DIR/hidden_service_archipelago
HiddenServicePort 80 127.0.0.1:80
@ -1257,6 +1264,18 @@ for i in 1 2 3 4 5 6 7 8 9 10; do
break
fi
done
# Sync hostnames to backend-readable directory
HOSTNAMES_DIR="/var/lib/archipelago/tor-hostnames"
mkdir -p "$HOSTNAMES_DIR"
for svc in archipelago bitcoin electrumx lnd btcpay mempool fedimint; do
if [ -f "$TOR_DIR/hidden_service_${svc}/hostname" ]; then
cp "$TOR_DIR/hidden_service_${svc}/hostname" "$HOSTNAMES_DIR/$svc"
echo "$(date): Synced hostname: $svc" >> "$LOG"
fi
done
chown -R archipelago:archipelago "$HOSTNAMES_DIR" 2>/dev/null || true
echo "$(date): Hostnames synced: $(ls $HOSTNAMES_DIR 2>/dev/null | tr '\n' ' ')" >> "$LOG"
TORSCRIPT
chmod +x "$WORK_DIR/setup-tor.sh"


@ -17,13 +17,33 @@ For each task in `loop/plan.md`:
8. Mark it done `- [x]` in `loop/plan.md`
9. Move to the next unchecked task immediately
## Rules
## Critical Rules
- **Deploy-test-fix LOOPS**: Many tasks require you to deploy, test, find failures, fix them, redeploy, and retest. Do NOT mark a task complete until ALL tests in that task pass. If a fix introduces a new failure, fix that too. Keep looping.
- **Read logs obsessively**: After every deploy, read `journalctl`, `podman logs`, and curl output. The logs tell you what's broken.
- **Fix the root cause**: Don't patch symptoms. If a container won't restart, find out WHY (wrong restart policy? health check failing? missing dependency?) and fix the actual cause.
- Never skip a testing gate -- if tests fail, fix before moving on
- If a task is proving difficult, make at least 10 genuine attempts before moving on
- Always read source files before editing them
- Do not stop until all tasks are checked or you are rate limited
- Commit after each completed task
- DEPLOY ONLY TO .198 -- Never deploy to .228
- Use `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.198` for SSH
- Run Rust checks on the dev server, NOT macOS
- Commit after each completed fix (multiple commits per task is fine)
- DO NOT PUSH -- a CI build is in progress, we will push manually later
- Deploy to .228 -- `ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228`
- Run Rust builds/checks on .228, NOT macOS
- Production-quality code only -- no shortcuts, no TODO comments, no unwrap()
## SSH Quick Reference
```bash
SSH="ssh -i ~/.ssh/archipelago-deploy archipelago@192.168.1.228"
# Deploy from macOS:
./scripts/deploy-to-target.sh --target 192.168.1.228
# Build Rust on .228:
$SSH "cd ~/archy/core && cargo clippy --all-targets --all-features && cargo test --all-features"
# Check containers:
$SSH "podman ps -a --format '{{.Names}} {{.State}} {{.Status}}' | sort"
# Read container logs:
$SSH "podman logs bitcoin-knots --tail 30"
# Check backend:
$SSH "journalctl -u archipelago --no-pager -n 50"
```


@ -142,18 +142,21 @@ Row 2: [C] Files [C] Peer1 [C] Peer2 (etc)
No nav bar.
### Grid `[C]` (2-col)
### Grid `[C]`
```
Row 1: [C] Local Network [C] Web3
Row 2: [C] Quick Actions (etc)
Row 1: [C] Quick Actions (full-width, contains Restart/Check Tor/Auto-Sync/Logs)
Row 2: [C] Local Network [C] Web3
Row 3: [C] Network Interfaces [C] Tor Services
```
| Position | Up | Down | Left | Right | Enter |
|----------------|-----------|---------------|-----------|-----------|------------------|
| Local Network | nothing | Quick Actions | Sidebar | Web3 | Drill into [Y] |
| Web3 | nothing | Quick Actions | Local Net | nothing | Drill into [Y] |
| Quick Actions | Local Net | nothing | Sidebar | nothing | Drill into [Y] |
| Position | Up | Down | Left | Right | Enter |
|----------------------|-----------------|---------------------|---------------|-------------------|------------------|
| Quick Actions | nothing | Local Network | Sidebar | nothing | Drill into [Y] |
| Local Network | Quick Actions | Network Interfaces | Sidebar | Web3 | Drill into [Y] |
| Web3 | Quick Actions | Tor Services | Local Network | nothing | Drill into [Y] |
| Network Interfaces | Local Network | nothing | Sidebar | Tor Services | Drill into [Y] |
| Tor Services | Web3 | nothing | Net Interfaces| nothing | Drill into [Y] |
---
@ -179,23 +182,28 @@ Standard spatial grid nav. Left from leftmost = Sidebar. Enter = drill into [Y]
### Nav bar `[N]`
```
[N] My Apps [N] App Store [N] Services | [N] Category filters (etc)
[N] My Apps [N] App Store [N] Services | [N] Discover [N] Categories... | [N] Search
```
### Grid `[C]` (3-col)
Down from nav bar → first container. Nav bar remembers last-focused tab — Up from cards returns to it.
### Grid `[C]`
```
Row 0: [C] Featured1 [C] Featured2 [C] Featured3
Row 1: [C] App1 [C] App2 [C] App3
(etc)
Featured (2-col): [C] Featured1 [C] Featured2
All Apps (3-col): [C] App1 [C] App2 [C] App3
[C] App4 [C] App5 [C] App6
(etc)
```
| Position | Up | Down | Left | Right | Enter |
|--------------|-------------|----------|-----------|------------|---------------|
| [N] tabs | nothing | Featured1| left tab | right tab | Switch/filter |
| Featured1 | [N] bar | App1 | Sidebar | Featured2 | View details |
| App1 | Featured1 | App4 | Sidebar | App2 | Install |
| (etc) | above | below | left/side | right | Install |
Cards use same style as My Apps: `glass-card transition-all hover:-translate-y-1`.
| Position | Up | Down | Left | Right | Enter |
|--------------|----------------|----------|-----------|------------|--------------------|
| [N] tabs | nothing | Featured1| left tab | right tab | Switch/filter |
| Featured1 | remembered [N] | App1 | Sidebar | Featured2 | View details |
| App1 | Featured1 | App4 | Sidebar | App2 | Install / details |
| (etc) | above | below | left/side | right | Install / details |
---
@ -204,11 +212,19 @@ Row 1: [C] App1 [C] App2 [C] App3
### Grid `[C]`
```
Row 1: [C] Device Status [C] Chat Panel
Row 2: [C] Peers List [C] Tab Panel (Bitcoin/Dead Man/Map)
Left column: [C] Device Status [C] Actions [C] Peers List
Right column: [C] Chat Panel [C] Tools (Bitcoin/Dead Man/Map)
```
Spatial grid nav. Enter = drill into controls.
| Position | Up | Down | Left | Right | Enter |
|-----------------|---------------|-----------|-----------|-------------|--------------------------------|
| Device Status | nothing | Actions | Sidebar | Chat Panel | Drill into [Y] |
| Actions | Device Status | Peers | Sidebar | Chat Panel | Drill into [Y] buttons |
| Peers List | Actions | nothing | Sidebar | Chat Panel | Drill into peer rows |
| Chat Panel | nothing | Tools | Device | nothing | Drill into [Y] |
| Tools | Chat Panel | nothing | Peers | nothing | Drill into [Y] |
**Chat flow:** Select a peer/channel (Enter on peer row) → focus auto-jumps to message input → type → Enter sends.
---
@ -227,23 +243,72 @@ Spatial grid nav. Enter = view node details.
## SETTINGS `/dashboard/settings`
### Grid `[C]` (vertical stack)
**Mixed page:** Two containers ([C] Server Name, [C] Interface Mode) + linear buttons.
Up/Down steps through elements. Right navigates paired items on the same row. Left → sidebar.
Enter on containers → drill in. Enter on buttons → activate. Escape → exit container / sidebar.
`[C]` = Container `[B]` = Button `[I]` = Input `[T]` = Toggle
### Account Section (glass-card)
```
Row 1: [C] Account Info
Row 2: [C] Change Password
Row 3: [C] Two-Factor Auth
Row 4: [C] System Info
Row 5: [C] Danger Zone
1. [C] Server Name → Enter: edit name, Enter: save, Escape: cancel
[B] What's New → right of Server Name
2. [B] Copy DID
3. [B] Copy Onion Address
4. [B] Change Password → opens modal
5. [B] Enable 2FA / Disable 2FA → opens modal
6. [B] Logout
```
| Position | Up | Down | Left | Right | Enter |
|-------------------|-----------------|------------------|---------|---------|------------------|
| Account Info | nothing | Change Password | Sidebar | nothing | Drill into [Y] |
| Change Password | Account Info | Two-Factor | Sidebar | nothing | Drill into [Y] |
| Two-Factor | Change Password | System Info | Sidebar | nothing | Drill into [Y] |
| System Info | Two-Factor | Danger Zone | Sidebar | nothing | Drill into [Y] |
| Danger Zone | System Info | nothing | Sidebar | nothing | Drill into [Y] |
### System Section
```
7. [C] Interface Mode → Enter: drill in, Left/Right between Easy/Gamer/Chat, Enter: select, Escape: exit
[B] Language buttons → below Interface Mode
8. [B] Login with Claude → opens modal
9. [T] Enable All (AI data) + per-category [T] toggles
10. [B] Manage Updates
11. [I] Webhook URL
12. [I] Webhook Secret
13. [T] Container Crash [T] Update Available
14. [T] Disk Space Warning [T] Backup Complete
15. [B] Save Configuration [B] Send Test
16. [T] Enable Beta Telemetry
17. [B] Create Backup
18. [B] Export Channel Backup
19. [B] Network Diagnostics → navigates to /dashboard/server
20. [B] Reboot → opens confirm modal
21. [B] Factory Reset → opens confirm modal
```
| Position | Up | Down | Left | Right | Enter |
|---------------------------|-------------|-------------|---------------|----------------|--------------------|
| 1. Server Name | nothing | Copy DID | Sidebar | What's New | Edit name |
| 1b. What's New | nothing | Copy DID | Server Name | nothing | Show release notes |
| 2. Copy DID | Server Name | Copy Onion | Sidebar | nothing | Copy to clipboard |
| 3. Copy Onion | Copy DID | Change PW | Sidebar | nothing | Copy to clipboard |
| 4. Change Password | Copy Onion | Enable 2FA | Sidebar | nothing | Open modal |
| 5. Enable 2FA | Change PW | Logout | Sidebar | nothing | Open modal |
| 6. Logout | Enable 2FA | Language | Sidebar | nothing | Logout |
| 7. Language | Logout | Claude Login| Sidebar | nothing | Select language |
| 8. Login with Claude | Language | AI toggles | Sidebar | nothing | Open modal |
| 9. AI toggles (each row) | above | below | Sidebar | next toggle | Toggle on/off |
| 10. Manage Updates | AI toggles | Webhook URL | Sidebar | nothing | Open updates |
| 11. Webhook URL | Updates | Secret | Sidebar | nothing | Edit field |
| 12. Secret | Webhook URL | Crash toggle| Sidebar | nothing | Edit field |
| 13a. Container Crash | Secret | Disk Space | Sidebar | Update Avail | Toggle on/off |
| 13b. Update Available | Secret | Backup Done | Container Crash| nothing | Toggle on/off |
| 14a. Disk Space Warning | Crash | Save Config | Sidebar | Backup Done | Toggle on/off |
| 14b. Backup Complete | Update Avail| Send Test | Disk Space | nothing | Toggle on/off |
| 15a. Save Configuration | Disk Space | Telemetry | Sidebar | Send Test | Save |
| 15b. Send Test | Backup Done | Telemetry | Save Config | nothing | Send test webhook |
| 16. Telemetry | Save/Test | Create Bkup | Sidebar | nothing | Toggle on/off |
| 17. Create Backup | Telemetry | Export Chan | Sidebar | nothing | Open modal |
| 18. Export Channel | Create Bkup | Net Diag | Sidebar | nothing | Export |
| 19. Network Diagnostics | Export Chan | Reboot | Sidebar | nothing | → /dashboard/server|
| 20. Reboot | Net Diag | Factory Rst | Sidebar | nothing | Open confirm |
| 21. Factory Reset | Reboot | nothing | Sidebar | nothing | Open confirm |
---
@ -526,3 +591,70 @@ Default focus: `[B] Set Password`
6. Inside [Y]: arrows move between inner controls. Escape → back to [C].
7. Escape from [C] → Sidebar.
8. No dead ends.
---
## Implementation Notes (for future sessions)
### Key files
- **Navigation logic**: `neode-ui/src/composables/useControllerNav.ts`
- **Controller store**: `neode-ui/src/stores/controller.ts`
- **Nav sounds**: `neode-ui/src/composables/useNavSounds.ts`
- **Focus styles**: `neode-ui/src/style.css` (lines ~53-142, search `focus-visible`)
### Data attributes
| Attribute | Purpose |
|-----------|---------|
| `data-controller-zone="main"` | Main content area (`<main>` in Dashboard.vue) |
| `data-controller-zone="sidebar"` | Sidebar nav |
| `data-controller-container` + `tabindex="0"` | Focusable card tile — gamepad can land on it, Enter drills in |
| `data-controller-install` | Container has Install button (Enter prioritizes it) |
| `data-controller-launch` | Container has Launch button (Enter prioritizes it) |
| `data-controller-install-btn` | The actual Install button inside a container |
| `data-controller-launch-btn` | The actual Launch button inside a container |
| `data-controller-ignore` | Skip element and descendants from gamepad nav |
| `tabindex="-1"` | Remove from gamepad focus order (used on ToggleSwitch) |
### Focus memory keys
| Key | Purpose | Cleared on |
|-----|---------|------------|
| `sidebar` | Last sidebar item focused | never (persists) |
| `main` | Last container/element in main zone | route change |
| `navBar` | Last nav bar tab (for Up return from containers) | route change |
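The clearing rules in this table amount to a small keyed store. Below is a minimal sketch, assuming element IDs stand in for DOM references. The `FocusMemory` class and its method names are hypothetical and do not mirror `stores/controller.ts`.

```python
from typing import Dict, Optional


class FocusMemory:
    """Remembers the last focused element per zone key."""

    # Keys that survive navigation; everything else is per-route
    PERSISTENT = frozenset({"sidebar"})

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def remember(self, key: str, element_id: str) -> None:
        self._last[key] = element_id

    def recall(self, key: str) -> Optional[str]:
        return self._last.get(key)

    def on_route_change(self) -> None:
        # Drop `main` and `navBar` (and any other per-page key), keep `sidebar`
        self._last = {k: v for k, v in self._last.items() if k in self.PERSISTENT}


memory = FocusMemory()
memory.remember("sidebar", "nav-apps")
memory.remember("navBar", "tab-app-store")
memory.remember("main", "card-bitcoin")
memory.on_route_change()
print(memory.recall("sidebar"), memory.recall("navBar"))  # nav-apps None
```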
### Navigation handler order (handleKeyDown)
1. **Text inputs** — special handling (Enter submits, Up/Down exits field)
2. **Escape** — close overlays → exit inner controls → exit to sidebar → back on detail pages
3. **Enter** — container actions (install/launch/link/inner) → regular click
4. **Sidebar** — Up/Down wrap, Right → main (containers or first focusable)
5. **Inside container** — arrows move between inner controls, can't leave via arrows
6. **Nav bar items** — Left/Right between tabs, Down/Up to nearest focusable (containers + buttons)
7. **Main zone** — spatial nav through containers + standalone focusables, fallbacks for edges
### Mixed pages (containers + standalone buttons, e.g. Settings)
- `isNavBarItem()` returns false on container-free pages (lets main zone handler do linear nav)
- Both nav bar handler and main zone handler search containers + standalone focusables together
- This prevents "jumping" where Down skips standalone buttons to reach the next container
- The filter `el.hasAttribute('data-controller-container') || !el.closest('[data-controller-container]')` excludes inner buttons
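The filter is easiest to see on a toy tree. The sketch below mimics the DOM calls `hasAttribute()` and `closest()` on a hypothetical `Node` class. As in the DOM, `closest()` also matches the element itself. The sketch assumes the listed tags are the only natively focusable ones. It illustrates the rule in Python and is not the composable's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

CONTAINER = "data-controller-container"
IGNORE = "data-controller-ignore"
FOCUSABLE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass(eq=False)
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def has_attribute(self, name: str) -> bool:
        return name in self.attrs

    def closest(self, name: str) -> Optional["Node"]:
        """Nearest ancestor-or-self carrying attribute `name` (like Element.closest)."""
        node = self
        while node is not None:
            if node.has_attribute(name):
                return node
            node = node.parent
        return None


def nav_targets(zone: Node) -> List[Node]:
    """Containers plus focusables NOT inside a container, in document order."""
    targets: List[Node] = []

    def walk(node: Node) -> None:
        if node.has_attribute(IGNORE):
            return  # skip the element and all its descendants
        if node.tag in FOCUSABLE_TAGS or node.has_attribute(CONTAINER):
            # el.hasAttribute(container) || !el.closest(container)
            if node.has_attribute(CONTAINER) or node.closest(CONTAINER) is None:
                targets.append(node)
        for child in node.children:
            walk(child)

    for child in zone.children:
        walk(child)
    return targets


zone = Node("main", {"data-controller-zone": "main"})
server = zone.add(Node("div", {CONTAINER: "", "id": "server-name"}))
server.add(Node("button", {"id": "save-name"}))  # inner button: excluded
zone.add(Node("button", {"id": "copy-did"}))  # standalone: kept
mode = zone.add(Node("div", {CONTAINER: "", "id": "interface-mode"}))
mode.add(Node("button", {"id": "mode-gamer"}))  # inner button: excluded
row = zone.add(Node("div", {"id": "telemetry-row"}))
row.add(Node("button", {"id": "telemetry"}))  # standalone: kept
row.add(Node("button", {IGNORE: "", "id": "toggle-switch"}))  # ignored
print([n.attrs["id"] for n in nav_targets(zone)])
# ['server-name', 'copy-did', 'interface-mode', 'telemetry']
```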
### Container-free pages (e.g. Settings if all containers removed)
- Sidebar → Right: checks `zone.querySelector('[data-controller-container]')` — if none found, focuses first focusable immediately (no 1s poll delay)
- `isNavBarItem()` returns false (prevents nav bar handler from catching everything)
- Main zone handler's spatial nav through all focusables handles Up/Down/Left/Right
### ToggleSwitch component
- Has `tabindex="-1"` and `data-controller-ignore` — invisible to gamepad nav
- Parent button handles the toggle click, so the switch doesn't need its own focus
- Without this, nav gets stuck bouncing between parent button and toggle switch
### Focus glow styles (Chromium gotchas)
- `box-shadow: 0 0 0 Npx` (spread-based ring) does NOT follow `border-radius` on composited layers (`translateZ(0)`)
- `outline` doesn't follow `border-radius` in Chrome < 94
- Safe approach: use blurred `box-shadow` (`0 0 6px 2px`) or `border-color` change for focus rings
- All `[data-controller-container]` have `outline: none !important` to kill browser defaults
- Cards use `glass-card transition-all hover:-translate-y-1` for consistent hover/focus lift
### Mesh chat auto-focus
- `openChat()`, `openChannelChat()`, `openArchChannel()` all call `nextTick(() => chatInputEl.value?.focus())`
- Message input has `@keydown.enter.exact.prevent="handleSendMessage"` — Enter sends immediately
- Ref: `chatInputEl` on the `<input>` element in Mesh.vue


@ -1,6 +1,6 @@
<template>
<Teleport to="body">
<div class="fixed top-4 right-4 z-[9999] flex flex-col gap-2 pointer-events-none max-w-sm w-full">
<div class="fixed right-4 z-[9999] flex flex-col gap-2 pointer-events-none max-w-sm w-full" style="top: calc(var(--safe-area-top, env(safe-area-inset-top, 0px)) + 16px);">
<TransitionGroup name="toast-stack">
<div
v-for="toast in toasts"


@ -96,10 +96,18 @@ select:focus-visible {
/* Mobile: override with tab bar clearance */
@media (max-width: 767px) {
.mobile-scroll-pad {
padding-bottom: calc(var(--mobile-tab-bar-height, 88px) + env(safe-area-inset-bottom, 0px) + 16px);
padding-bottom: calc(var(--mobile-tab-bar-height, 88px) + var(--safe-area-bottom, env(safe-area-inset-bottom, 0px)) + 16px);
}
.mobile-scroll-pad-back {
padding-bottom: calc(var(--mobile-tab-bar-height, 88px) + env(safe-area-inset-bottom, 0px) + 64px);
padding-bottom: calc(var(--mobile-tab-bar-height, 88px) + var(--safe-area-bottom, env(safe-area-inset-bottom, 0px)) + 64px);
}
/* Safe area top padding for all mobile content views.
When tabs are showing, Dashboard.vue sets an explicit paddingTop via :style
which overrides this. When no tabs (e.g. Home), this kicks in.
Android WebView sets --safe-area-top; iOS uses env(). */
.mobile-safe-top {
padding-top: calc(var(--safe-area-top, env(safe-area-inset-top, 0px)) + 16px);
}
}
@ -309,7 +317,7 @@ input[type="radio"]:active + * {
.chat-mode-pill {
position: absolute;
top: 2.25rem;
right: 1.25rem;
right: 2.25rem;
z-index: 10;
}


@ -81,6 +81,7 @@ export type PackageState = typeof PackageState[keyof typeof PackageState]
export interface PackageDataEntry {
state: PackageState
health?: string | null // "healthy", "unhealthy", "starting", or null
'exit-code'?: number | null // container exit code: 0 = clean stop, non-zero = crash
'static-files'?: {
license: string
instructions: string


@ -38,16 +38,29 @@
@open-new-tab-and-back="openNewTabAndBack"
/>
<!-- Mobile: floating glass close button -->
<button
class="md:hidden app-session-mobile-close"
aria-label="Close"
@click="closeSession"
>
<svg class="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
<!-- Mobile bottom browser bar: part of flex layout, doesn't overlay content -->
<div class="md:hidden app-session-mobile-bar">
<button class="app-session-bar-btn" aria-label="Back" @click="iframeGoBack">
<svg fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M15 19l-7-7 7-7" />
</svg>
</button>
<button class="app-session-bar-btn" aria-label="Forward" @click="iframeGoForward">
<svg fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M9 5l7 7-7 7" />
</svg>
</button>
<button class="app-session-bar-btn" aria-label="Open in new tab" @click="openNewTab">
<svg fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M10 6H6a2 2 0 00-2 2v10a2 2 0 002 2h10a2 2 0 002-2v-4M14 4h6m0 0v6m0-6L10 14" />
</svg>
</button>
<button class="app-session-bar-btn" aria-label="Close" @click="closeSession">
<svg fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M6 18L18 6M6 6l12 12" />
</svg>
</button>
</div>
</div>
<NostrIdentityPicker
@ -116,7 +129,10 @@ const appId = computed(() => {
})
const appTitle = computed(() => resolveAppTitle(appId.value))
const mustOpenNewTab = computed(() => NEW_TAB_APPS.has(appId.value))
const isMobile = typeof window !== 'undefined' && window.innerWidth < 768
// On mobile (Android WebView), all apps load in the iframe; X-Frame-Options
// doesn't apply since the WebView is the top-level browsing context.
const mustOpenNewTab = computed(() => isMobile ? false : NEW_TAB_APPS.has(appId.value))
const appUrl = computed(() => {
return resolveAppUrl(appId.value, route.query.path as string | undefined)
@ -347,6 +363,7 @@ onBeforeUnmount(() => {
width: 100%;
height: 100%;
border-radius: 0;
border: none;
}
@media (min-width: 768px) {
@ -389,7 +406,8 @@ onBeforeUnmount(() => {
width: 100%;
height: 100%;
border-radius: 0;
box-shadow: 0 25px 50px -12px rgba(0, 0, 0, 0.6);
border: none;
box-shadow: none;
}
@media (min-width: 768px) {
@ -472,29 +490,63 @@ onBeforeUnmount(() => {
opacity: 0;
}
/* Mobile floating glass close button */
.app-session-mobile-close {
position: fixed;
bottom: calc(24px + env(safe-area-inset-bottom, 0px));
left: 50%;
transform: translateX(-50%);
z-index: 2500;
width: 48px;
height: 48px;
/* Mobile: full-bleed app sessions — no border, no radius, no shadow */
@media (max-width: 767px) {
.app-session-panel.glass-card {
border: none !important;
border-radius: 0 !important;
box-shadow: none !important;
}
.app-session-backdrop-overlay {
padding: 0;
backdrop-filter: none;
background: black;
}
/* Iframe frame: push content below status bar on mobile */
.app-session-frame-safe {
padding-top: var(--safe-area-top, env(safe-area-inset-top, 0px));
}
/* Iframe within padded container: fill remaining space */
.app-session-frame-safe iframe {
top: var(--safe-area-top, env(safe-area-inset-top, 0px));
height: calc(100% - var(--safe-area-top, env(safe-area-inset-top, 0px)));
}
}
/* Mobile bottom browser bar, sized like the main tab bar.
The base rule sets display without !important so Tailwind's md:hidden can override it. */
@media (min-width: 768px) {
.app-session-mobile-bar { display: none !important; }
}
.app-session-mobile-bar {
display: flex;
justify-content: space-around;
align-items: center;
flex-shrink: 0;
padding: 12px 16px;
padding-bottom: calc(12px + var(--safe-area-bottom, env(safe-area-inset-bottom, 0px)));
background: rgba(0, 0, 0, 0.25);
backdrop-filter: blur(18px);
-webkit-backdrop-filter: blur(18px);
border-top: 1px solid rgba(255, 255, 255, 0.06);
}
.app-session-bar-btn {
display: flex;
align-items: center;
justify-content: center;
border-radius: 50%;
background: rgba(0, 0, 0, 0.45);
backdrop-filter: blur(16px);
-webkit-backdrop-filter: blur(16px);
border: 1px solid rgba(255, 255, 255, 0.15);
color: rgba(255, 255, 255, 0.85);
box-shadow: 0 4px 20px rgba(0, 0, 0, 0.4);
transition: background 0.15s ease, transform 0.15s ease;
width: 56px;
height: 56px;
border-radius: 14px;
color: rgba(255, 255, 255, 0.65);
transition: color 0.15s ease, background 0.15s ease;
}
.app-session-mobile-close:active {
background: rgba(0, 0, 0, 0.65);
transform: translateX(-50%) scale(0.9);
.app-session-bar-btn svg {
width: 24px;
height: 24px;
}
.app-session-bar-btn:active {
color: white;
background: rgba(255, 255, 255, 0.12);
}
</style>

@ -83,14 +83,15 @@
<div
v-if="route.path === '/dashboard/chat' || route.path === '/dashboard/mesh'"
:class="['h-full', mobileTabPaddingTop ? 'overflow-y-auto' : '']"
:style="mobileTabPaddingTop ? { paddingTop: (mobileTabPaddingTop + 16) + 'px' } : undefined"
:style="{ paddingTop: mobileTabPaddingTop ? (mobileTabPaddingTop + 16) + 'px' : undefined }"
class="mobile-safe-top"
>
<component :is="Component" />
</div>
<div
v-else
:class="[
'absolute inset-0 px-4 pt-4 md:pt-8 md:px-8 overflow-y-auto',
'absolute inset-0 px-4 pt-4 md:pt-8 md:px-8 overflow-y-auto mobile-safe-top',
needsMobileBackButtonSpace
? 'mobile-scroll-pad-back'
: 'mobile-scroll-pad'

@ -566,6 +566,7 @@ async function restartOnboarding() {
}
.login-card {
overflow: visible !important;
transform-style: preserve-3d;
transition: transform 0.5s cubic-bezier(0.25, 0.46, 0.45, 0.94),
opacity 0.5s cubic-bezier(0.25, 0.46, 0.45, 0.94),

@ -450,7 +450,26 @@ async function loadTorServices() {
catch { torServices.value = []; torDaemonRunning.value = false } finally { torServicesLoading.value = false }
}
function copyTorAddress(address: string) { navigator.clipboard.writeText(address); logsToast.value = 'Onion address copied to clipboard'; setTimeout(() => { logsToast.value = '' }, 3000) }
async function copyTorAddress(address: string) {
try {
if (navigator.clipboard?.writeText) {
await navigator.clipboard.writeText(address)
} else {
const ta = document.createElement('textarea')
ta.value = address
ta.style.position = 'fixed'
ta.style.opacity = '0'
document.body.appendChild(ta)
ta.select()
document.execCommand('copy')
document.body.removeChild(ta)
}
logsToast.value = 'Onion address copied to clipboard'
} catch {
logsToast.value = 'Failed to copy address'
}
setTimeout(() => { logsToast.value = '' }, 3000)
}
async function toggleTorApp(appId: string, enabled: boolean) { try { await rpcClient.call({ method: 'tor.toggle-app', params: { app_id: appId, enabled }, timeout: 90000 }); await loadTorServices() } catch { /* handled */ } }
async function rotateService(name: string) { torRotating.value = name; try { await rpcClient.call({ method: 'tor.rotate-service', params: { name }, timeout: 90000 }); await loadTorServices() } catch { /* handled */ } finally { torRotating.value = false } }
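The clipboard change above needs a fallback because `navigator.clipboard` is only exposed in secure contexts, so it is `undefined` when the UI is opened over plain HTTP from another device. Two details of the committed version may be worth revisiting: a rejected `writeText()` goes straight to the error toast without trying the legacy path, and the boolean returned by `document.execCommand('copy')` (which is `false` when nothing was copied) is ignored. The sketch below isolates that decision as a pure function so it can be tested without a DOM. `copyWithFallback`, `ClipboardLike`, and `LegacyCopy` are hypothetical names; the legacy callback stands in for the textarea/`execCommand` code in the diff.

```typescript
// Subset of the async Clipboard API used here (assumption: only writeText is needed).
interface ClipboardLike {
  writeText(text: string): Promise<void>
}

// Hypothetical legacy copier: returns true only if the copy actually happened,
// e.g. the boolean result of document.execCommand('copy').
type LegacyCopy = (text: string) => boolean

export async function copyWithFallback(
  text: string,
  clipboard: ClipboardLike | undefined,
  legacyCopy: LegacyCopy,
): Promise<boolean> {
  if (clipboard) {
    try {
      await clipboard.writeText(text)
      return true
    } catch {
      // Rejected (no permission, document not focused, etc.): try the legacy path instead.
    }
  }
  try {
    return legacyCopy(text)
  } catch {
    // Treat a throwing fallback as a failed copy rather than an unhandled error.
    return false
  }
}
```

In the component, the toast text would then follow the returned boolean (`'Onion address copied to clipboard'` when `true`, `'Failed to copy address'` otherwise), with the legacy callback wrapping the existing textarea code and returning the `execCommand` result.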

@ -15,10 +15,10 @@
<div class="flex items-center gap-2">
<span
class="inline-flex items-center px-2.5 py-1 rounded-lg text-xs font-medium"
:class="getStatusClass(pkg.state, pkg.health)"
:class="getStatusClass(pkg.state, pkg.health, pkg['exit-code'])"
>
<span class="w-1.5 h-1.5 rounded-full mr-1.5" :class="getStatusDotClass(pkg.state, pkg.health)"></span>
{{ getStatusLabel(pkg.state, pkg.health) }}
<span class="w-1.5 h-1.5 rounded-full mr-1.5" :class="getStatusDotClass(pkg.state, pkg.health, pkg['exit-code'])"></span>
{{ getStatusLabel(pkg.state, pkg.health, pkg['exit-code']) }}
</span>
<span class="text-white/50 text-xs">v{{ pkg.manifest.version }}</span>
</div>
@ -107,10 +107,10 @@
<div class="flex flex-wrap items-center gap-2">
<span
class="inline-flex items-center px-2 py-0.5 rounded text-xs font-medium"
:class="getStatusClass(pkg.state, pkg.health)"
:class="getStatusClass(pkg.state, pkg.health, pkg['exit-code'])"
>
<span class="w-1.5 h-1.5 rounded-full mr-1" :class="getStatusDotClass(pkg.state, pkg.health)"></span>
{{ getStatusLabel(pkg.state, pkg.health) }}
<span class="w-1.5 h-1.5 rounded-full mr-1" :class="getStatusDotClass(pkg.state, pkg.health, pkg['exit-code'])"></span>
{{ getStatusLabel(pkg.state, pkg.health, pkg['exit-code']) }}
</span>
<span class="text-white/50 text-xs">v{{ pkg.manifest.version }}</span>
</div>

@ -107,7 +107,7 @@ export function isRealOnionAddress(addr: string | undefined): boolean {
return !!(addr && addr.endsWith('.onion') && addr.length >= 60 && addr.length <= 70)
}
export function getStatusClass(state: PackageState, health?: string | null): string {
export function getStatusClass(state: PackageState, health?: string | null, exitCode?: number | null): string {
if (state === PackageState.Running && health === 'starting') return 'bg-yellow-500/20 text-yellow-200 border border-yellow-500/30'
if (state === PackageState.Running && health === 'unhealthy') return 'bg-orange-500/20 text-orange-200 border border-orange-500/30'
switch (state) {
@ -116,7 +116,9 @@ export function getStatusClass(state: PackageState, health?: string | null): str
case PackageState.Stopped:
return 'bg-gray-500/20 text-gray-200 border border-gray-500/30'
case PackageState.Exited:
return 'bg-red-500/20 text-red-200 border border-red-500/30'
return exitCode != null && exitCode !== 0
? 'bg-red-500/20 text-red-200 border border-red-500/30'
: 'bg-gray-500/20 text-gray-200 border border-gray-500/30'
case PackageState.Starting:
case PackageState.Stopping:
case PackageState.Restarting:
@ -128,7 +130,7 @@ export function getStatusClass(state: PackageState, health?: string | null): str
}
}
export function getStatusDotClass(state: PackageState, health?: string | null): string {
export function getStatusDotClass(state: PackageState, health?: string | null, exitCode?: number | null): string {
if (state === PackageState.Running && health === 'starting') return 'bg-yellow-400 animate-pulse'
if (state === PackageState.Running && health === 'unhealthy') return 'bg-orange-400 animate-pulse'
switch (state) {
@ -137,7 +139,9 @@ export function getStatusDotClass(state: PackageState, health?: string | null):
case PackageState.Stopped:
return 'bg-gray-400'
case PackageState.Exited:
return 'bg-red-400 animate-pulse'
return exitCode != null && exitCode !== 0
? 'bg-red-400 animate-pulse'
: 'bg-gray-400'
case PackageState.Starting:
case PackageState.Stopping:
case PackageState.Restarting:
@ -149,10 +153,14 @@ export function getStatusDotClass(state: PackageState, health?: string | null):
}
}
export function getStatusLabel(state: PackageState, health?: string | null): string {
export function getStatusLabel(state: PackageState, health?: string | null, exitCode?: number | null): string {
if (state === PackageState.Running && health === 'starting') return 'starting up'
if (state === PackageState.Running && health === 'unhealthy') return 'unhealthy'
if (state === PackageState.Running && health === 'healthy') return 'healthy'
if (state === PackageState.Exited) return 'crashed'
if (state === PackageState.Exited) {
if (exitCode === 137) return 'killed (OOM)'
if (exitCode != null && exitCode !== 0) return 'crashed'
return 'stopped'
}
return state
}
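The new labels rely on the shell convention that a process terminated by signal N exits with status 128 + N. Under that convention 137 means SIGKILL (9), which the kernel OOM killer sends, but so do `podman kill` and `podman stop` once its grace period expires, so `'killed (OOM)'` may overstate what is known from the exit code alone. Docker and Podman both report an `OOMKilled` flag in `inspect` output that could disambiguate. Below is a sketch of a decoder that only claims OOM when told so. `describeExit`, `ExitSummary`, and the `oomKilled` parameter are hypothetical and are not part of `PackageDataEntry` in this commit.

```typescript
// Linux signal numbers for the names this sketch spells out; others print as "signal N".
const SIGNAL_NAMES: Record<number, string> = { 2: 'SIGINT', 6: 'SIGABRT', 9: 'SIGKILL', 15: 'SIGTERM' }

export interface ExitSummary {
  label: string
  crashed: boolean // drives the red vs gray styling, as in getStatusClass
}

// `oomKilled` is a hypothetical extra input, e.g. copied from the runtime's inspect output.
export function describeExit(exitCode: number | null | undefined, oomKilled = false): ExitSummary {
  // Same rule as the diff: 0 or unknown is a clean stop.
  if (exitCode == null || exitCode === 0) return { label: 'stopped', crashed: false }
  if (oomKilled) return { label: 'killed (OOM)', crashed: true }
  // 128 + N: terminated by signal N (Linux signal numbers run up to 64).
  if (exitCode > 128 && exitCode <= 128 + 64) {
    const signal = exitCode - 128
    const name = SIGNAL_NAMES[signal] ?? `signal ${signal}`
    return { label: `killed (${name})`, crashed: true }
  }
  return { label: 'crashed', crashed: true }
}
```

A caller could keep the existing colour rule (`crashed` maps to the red classes) and display `label` in place of the hard-coded strings.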

@ -1,5 +1,5 @@
<template>
<div class="relative flex-1 min-h-0 bg-black/40 overflow-hidden">
<div class="relative flex-1 min-h-0 bg-black/40 overflow-hidden app-session-frame-safe">
<Transition name="content-fade">
<div v-if="loading" class="absolute inset-0 z-10 flex items-center justify-center bg-black/40">
<svg class="animate-spin h-8 w-8 text-blue-400" viewBox="0 0 24 24" fill="none">

@ -84,7 +84,7 @@
<div class="flex items-center gap-2">
<span
class="inline-flex items-center gap-1.5 px-2 py-1 rounded text-xs font-medium"
:class="getStatusClass(pkg.state, pkg.health)"
:class="getStatusClass(pkg.state, pkg.health, pkg['exit-code'])"
>
<svg
v-if="isTransitioning"
@ -97,12 +97,13 @@
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
<span v-if="pkg.state === 'running' && pkg.health === 'unhealthy'" class="w-1.5 h-1.5 rounded-full bg-orange-400 animate-pulse"></span>
{{ getStatusLabel(pkg.state, pkg.health) }}
{{ getStatusLabel(pkg.state, pkg.health, pkg['exit-code']) }}
</span>
</div>
<!-- Quick Actions -->
<!-- Quick Actions: icon buttons in uniform dark containers -->
<div v-if="!isUninstalling" class="mt-4 flex gap-2">
<!-- Launch -->
<button
v-if="canLaunch(pkg)"
data-controller-launch-btn
@ -112,51 +113,56 @@
{{ t('common.launch') }}
<svg v-if="opensInTab(id)" class="w-3.5 h-3.5 opacity-60" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M10 6H6a2 2 0 00-2 2v10a2 2 0 002 2h10a2 2 0 002-2v-4M14 4h6m0 0v6m0-6L10 14" /></svg>
</button>
<!-- Start (play icon) -->
<button
v-if="!isWebOnly && !isLoading && (pkg.state === 'stopped' || pkg.state === 'exited')"
@click.stop="$emit('start', id)"
class="flex-1 px-4 py-2 glass-button glass-button-success rounded-lg text-sm font-medium flex items-center justify-center gap-2"
class="px-3 py-2 glass-button glass-button-sm rounded-lg flex items-center justify-center"
:title="pkg.state === 'exited' ? 'Restart' : t('common.start')"
>
<span>{{ pkg.state === 'exited' ? 'Restart' : t('common.start') }}</span>
<svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M8 5v14l11-7z" /></svg>
</button>
<!-- Starting (spinner) -->
<button
v-if="!isWebOnly && isLoading && (pkg.state === 'stopped' || pkg.state === 'exited' || pkg.state === 'starting')"
disabled
class="flex-1 px-4 py-2 glass-button glass-button-success rounded-lg text-sm font-medium opacity-50 cursor-not-allowed flex items-center justify-center gap-2"
class="px-3 py-2 glass-button glass-button-sm rounded-lg opacity-50 cursor-not-allowed flex items-center justify-center"
>
<svg class="animate-spin h-4 w-4" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<svg class="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
<span>{{ t('common.starting') }}</span>
</button>
<!-- Stop (square icon) -->
<button
v-if="!isWebOnly && !isLoading && (pkg.state === 'running' || pkg.state === 'starting')"
@click.stop="$emit('stop', id)"
class="flex-1 px-4 py-2 bg-yellow-500/20 border border-yellow-500/40 rounded-lg text-yellow-200 text-sm font-medium hover:bg-yellow-500/30 transition-colors flex items-center justify-center gap-2"
class="px-3 py-2 glass-button glass-button-sm rounded-lg flex items-center justify-center"
:title="t('common.stop')"
>
<span>{{ t('common.stop') }}</span>
<svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><rect x="6" y="6" width="12" height="12" rx="1" /></svg>
</button>
<!-- Restart -->
<button
v-if="!isWebOnly && !isLoading && (pkg.state === 'running' || pkg.state === 'starting')"
@click.stop="$emit('restart', id)"
class="px-2.5 py-2 glass-button glass-button-sm rounded-lg flex items-center justify-center"
class="px-3 py-2 glass-button glass-button-sm rounded-lg flex items-center justify-center"
:title="t('common.restart')"
>
<svg class="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" />
</svg>
</button>
<!-- Stopping (spinner) -->
<button
v-if="!isWebOnly && isLoading && (pkg.state === 'running' || pkg.state === 'starting' || pkg.state === 'stopping')"
disabled
class="flex-1 px-4 py-2 bg-yellow-500/20 border border-yellow-500/40 rounded-lg text-yellow-200 text-sm font-medium opacity-50 cursor-not-allowed flex items-center justify-center gap-2"
class="px-3 py-2 glass-button glass-button-sm rounded-lg opacity-50 cursor-not-allowed flex items-center justify-center"
>
<svg class="animate-spin h-4 w-4" aria-hidden="true" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<svg class="animate-spin h-4 w-4" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"></circle>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
<span>{{ t('common.stopping') }}</span>
</button>
</div>
</div>

@ -116,7 +116,7 @@ export function canLaunch(pkg: PackageDataEntry): boolean {
return !!hasUI && canLaunchState
}
export function getStatusClass(state: PackageState, health?: string | null): string {
export function getStatusClass(state: PackageState, health?: string | null, exitCode?: number | null): string {
if (state === PackageState.Running && health === 'starting') return 'bg-yellow-500/20 text-yellow-200'
if (state === PackageState.Running && health === 'unhealthy') return 'bg-orange-500/20 text-orange-200'
switch (state) {
@ -125,7 +125,10 @@ export function getStatusClass(state: PackageState, health?: string | null): str
case PackageState.Stopped:
return 'bg-gray-500/20 text-gray-200'
case PackageState.Exited:
return 'bg-red-500/20 text-red-200'
// Exit code 0 or missing = clean shutdown (gray); non-zero = crash (red)
return exitCode != null && exitCode !== 0
? 'bg-red-500/20 text-red-200'
: 'bg-gray-500/20 text-gray-200'
case PackageState.Starting:
case PackageState.Stopping:
case PackageState.Restarting:
@ -137,11 +140,15 @@ export function getStatusClass(state: PackageState, health?: string | null): str
}
}
export function getStatusLabel(state: PackageState, health?: string | null): string {
export function getStatusLabel(state: PackageState, health?: string | null, exitCode?: number | null): string {
if (state === PackageState.Running && health === 'starting') return 'starting up'
if (state === PackageState.Running && health === 'unhealthy') return 'unhealthy'
if (state === PackageState.Running && health === 'healthy') return 'healthy'
if (state === PackageState.Exited) return 'crashed'
if (state === PackageState.Exited) {
if (exitCode === 137) return 'killed (OOM)'
if (exitCode != null && exitCode !== 0) return 'crashed'
return 'stopped'
}
return state
}

@ -72,12 +72,10 @@ export function useAppsActions() {
try {
uninstallingApps.value.add(appId)
await store.uninstallPackage(appId)
if (store.packages && store.packages[appId]) {
delete store.packages[appId]
}
// State update comes via WebSocket — no manual deletion needed
} catch (err) {
if (import.meta.env.DEV) console.error('Failed to uninstall app:', err)
showActionError(`Failed to uninstall app: ${err instanceof Error ? err.message : 'Unknown error'}`)
showActionError(`Failed to uninstall: ${err instanceof Error ? err.message : 'Unknown error'}`)
} finally {
uninstallingApps.value.delete(appId)
uninstalling.value = false

@ -2,9 +2,9 @@
<!-- Persistent Mobile Tabs for Apps/Marketplace -->
<div
v-if="showAppsTabs && !isAppSessionActive"
class="md:hidden fixed top-0 left-0 right-0 z-40 px-4 pt-4 pb-2 glass-piece"
class="md:hidden fixed top-0 left-0 right-0 z-40 px-4 pb-2 glass-piece mobile-top-tabs"
:class="{ 'glass-throw-mobile-tabs': showZoomIn }"
style="background: rgba(0, 0, 0, 0.25); backdrop-filter: blur(18px); -webkit-backdrop-filter: blur(18px); transform: translateZ(0);"
style="background: rgba(0, 0, 0, 0.25); backdrop-filter: blur(18px); -webkit-backdrop-filter: blur(18px); transform: translateZ(0); padding-top: calc(var(--safe-area-top, env(safe-area-inset-top, 0px)) + 16px);"
>
<div class="mode-switcher mode-switcher-full">
<RouterLink
@ -29,10 +29,10 @@
<!-- Persistent Mobile Tabs for Network/Cloud -->
<div
v-if="showNetworkTabs && !isAppSessionActive"
class="md:hidden fixed top-0 left-0 right-0 z-40 px-4 pt-4 pb-2 glass-piece"
class="md:hidden fixed left-0 right-0 z-40 px-4 pb-2 glass-piece mobile-top-tabs"
:class="{ 'glass-throw-mobile-tabs-2': showZoomIn }"
style="background: rgba(0, 0, 0, 0.25); backdrop-filter: blur(18px); -webkit-backdrop-filter: blur(18px); transform: translateZ(0);"
:style="{ top: showAppsTabs ? '80px' : '0' }"
:style="{ top: showAppsTabs ? '80px' : '0', paddingTop: showAppsTabs ? '16px' : 'calc(var(--safe-area-top, env(safe-area-inset-top, 0px)) + 16px)' }"
>
<div class="mode-switcher mode-switcher-full">
<RouterLink
@ -66,7 +66,7 @@
:aria-label="t('dashboard.mobileNav')"
class="md:hidden fixed bottom-0 left-0 right-0 border-t border-glass-border shadow-glass z-50 glass-piece"
:class="{ 'glass-throw-tabbar': showZoomIn }"
style="background: rgba(0, 0, 0, 0.25); backdrop-filter: blur(18px); -webkit-backdrop-filter: blur(18px); padding-bottom: env(safe-area-inset-bottom, 0px);"
style="background: rgba(0, 0, 0, 0.25); backdrop-filter: blur(18px); -webkit-backdrop-filter: blur(18px); padding-bottom: var(--safe-area-bottom, env(safe-area-inset-bottom, 0px));"
>
<div class="flex justify-around items-center px-2 py-3 relative">
<RouterLink
@ -160,11 +160,21 @@ const showNetworkTabs = computed(() => {
return route.path.includes('/server') || route.path.includes('/cloud') || route.path.includes('/web5') || route.path.includes('/mesh')
})
// Top padding for content div to clear fixed mobile tab overlays
// Top padding for content div to clear fixed mobile tab overlays.
// Includes safe area inset for Android (read from CSS custom property set by WebView).
const safeAreaTop = ref(0)
function readSafeAreaTop() {
if (typeof window === 'undefined') return
const val = getComputedStyle(document.documentElement).getPropertyValue('--safe-area-top').trim()
if (val) safeAreaTop.value = parseInt(val, 10) || 0
}
const mobileTabPaddingTop = computed(() => {
if (typeof window === 'undefined' || window.innerWidth >= 768) return 0
if (showAppsTabs.value && showNetworkTabs.value) return 160
if (showAppsTabs.value || showNetworkTabs.value) return 80
const sat = safeAreaTop.value
if (showAppsTabs.value && showNetworkTabs.value) return 160 + sat
if (showAppsTabs.value || showNetworkTabs.value) return 80 + sat
return 0
})
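`readSafeAreaTop` and the stacked bars share one assumption: each bar is 80px tall before the inset, and only the topmost visible bar absorbs `--safe-area-top`. In the diff, `mobileTabPaddingTop` adds the inset once, but the network bar's `top` stays at `80px` when the apps bar is also shown, even though the apps bar's padding now grows by the inset. If 80px is the pre-inset height, the second bar would overlap the first by that amount. The sketch below derives both numbers from the same inputs. `TAB_BAR_HEIGHT`, `parsePxValue`, and `mobileTabLayout` are hypothetical names, and `parseFloat` is used so a fractional value such as `24.5px` is not truncated the way `parseInt` truncates it.

```typescript
// Assumption: one mobile tab bar is 80px tall before any safe-area inset,
// matching the hard-coded 80/160 values in mobileTabPaddingTop.
const TAB_BAR_HEIGHT = 80

/** Parse a CSS length such as "24px"; empty, negative, or non-numeric values yield 0. */
export function parsePxValue(raw: string): number {
  const n = parseFloat(raw.trim())
  return Number.isFinite(n) && n > 0 ? n : 0
}

export interface TabLayout {
  networkBarTop: number // px offset of the network bar (only meaningful when it is shown)
  contentPaddingTop: number // px of padding the scrolling content needs
}

export function mobileTabLayout(showApps: boolean, showNetwork: boolean, safeAreaTop: number): TabLayout {
  const visibleBars = Number(showApps) + Number(showNetwork)
  return {
    // Below the apps bar, which is taller by the inset; otherwise the network bar is topmost.
    networkBarTop: showApps ? TAB_BAR_HEIGHT + safeAreaTop : 0,
    // The topmost bar carries the inset exactly once.
    contentPaddingTop: visibleBars === 0 ? 0 : visibleBars * TAB_BAR_HEIGHT + safeAreaTop,
  }
}
```

`networkBarTop` would feed the network bar's `top` style, and `contentPaddingTop` would replace the body of `mobileTabPaddingTop`, with `safeAreaTop.value` as the third argument.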
@ -188,7 +198,10 @@ function onResize() {
onMounted(() => {
updateTabBarHeight()
readSafeAreaTop()
window.addEventListener('resize', onResize)
// Re-read after WebView injection has had time to run
setTimeout(readSafeAreaTop, 500)
})
onBeforeUnmount(() => {

@ -1,7 +1,8 @@
<template>
<div
v-if="healthNotifications.length > 0"
class="fixed top-4 right-4 z-[200] flex flex-col gap-2 max-w-sm"
class="fixed right-4 z-[200] flex flex-col gap-2 max-w-sm"
style="top: calc(var(--safe-area-top, env(safe-area-inset-top, 0px)) + 16px);"
>
<div
v-for="notif in healthNotifications"

@ -152,7 +152,7 @@ export function getCuratedAppList(): MarketplaceApp[] {
{
id: 'lnd',
title: 'LND',
version: '0.17.4',
version: '0.18.4',
description: 'Lightning Network Daemon. Fast and cheap Bitcoin payments through the Lightning Network.',
icon: '/assets/img/app-icons/lnd.svg',
author: 'Lightning Labs',
@ -174,11 +174,11 @@ export function getCuratedAppList(): MarketplaceApp[] {
{
id: 'mempool',
title: 'Mempool Explorer',
version: '2.5.0',
version: '3.0.0',
description: 'Self-hosted Bitcoin blockchain and mempool visualizer with beautiful explorer interface.',
icon: '/assets/img/app-icons/mempool.webp',
author: 'Mempool',
dockerImage: `${REGISTRY}/mempool-frontend:v2.5.0`,
dockerImage: `${REGISTRY}/mempool-frontend:v3.0.0`,
manifestUrl: undefined,
repoUrl: 'https://github.com/mempool/mempool'
},

@ -110,6 +110,9 @@ rpcallowip=0.0.0.0/0
rpcport=8332
listen=1
printtoconsole=1
# ZMQ publishers for LND and other services that need real-time block/tx notifications
zmqpubrawblock=tcp://0.0.0.0:28332
zmqpubrawtx=tcp://0.0.0.0:28333
BTCCONF
log "Generated bitcoin.conf with rpcauth (no plaintext credentials)"
fi
@ -305,7 +308,7 @@ if ! $DOCKER ps --format '{{.Names}}' 2>/dev/null | grep -qE 'bitcoin-knots|arch
--memory=$(mem_limit bitcoin-knots) --network archy-net \
--cap-drop ALL --cap-add CHOWN --cap-add FOWNER --cap-add SETUID --cap-add SETGID --cap-add DAC_OVERRIDE \
--security-opt no-new-privileges:true \
-p 8332:8332 -p 8333:8333 \
-p 8332:8332 -p 8333:8333 -p 28332:28332 -p 28333:28333 \
-v /var/lib/archipelago/bitcoin:/home/bitcoin/.bitcoin \
"${BITCOIN_KNOTS_IMAGE}" \
-server=1 $BTC_EXTRA_ARGS \
@ -832,7 +835,7 @@ if ! $DOCKER ps --format '{{.Names}}' 2>/dev/null | grep -q indeedhub; then
--memory=$(mem_limit indeedhub) \
--cap-drop ALL --security-opt no-new-privileges:true \
--read-only --tmpfs /tmp:rw,noexec,nosuid,size=64m --tmpfs /app/.next/cache:rw,noexec,nosuid,size=128m \
-p 7777:7777 \
-p 8190:3000 \
-e NODE_ENV=production -e NEXT_TELEMETRY_DISABLED=1 \
"$INDEEDHUB_IMAGE" 2>>"$LOG" || true
# Fix IndeedHub for iframe: remove X-Frame-Options so it loads in the Archipelago panel