29 Commits

Author SHA1 Message Date
archipelago
34b1fdc1a3 fix(boot): order archipelago.service after the data volume mount (B17)
On production nodes /var/lib/archipelago (the app data dir AND podman's
graphroot=/var/lib/archipelago/containers/storage) is a separate
device-mapper volume. archipelago.service ordered only After=network-online
.target, so on cold boots it (and its ExecStartPre) could start BEFORE
var-lib-archipelago.mount, write to the bare mountpoint on rootfs, fail every
podman call, exit, and be restarted every 5s until the volume mounted — the
"~20x [FAILED] Failed to start over ~5min" boot flap. Proven live on .198:
"var-lib-archipelago.mount: Directory /var/lib/archipelago to mount over is
not empty, mounting anyway" — the service had written there pre-mount.

Fix: RequiresMountsFor=/var/lib/archipelago (adds Requires= + After= on the
mount unit).
- image-recipe/configs/archipelago.service: ships the directive on fresh ISOs.
- bootstrap::ensure_archipelago_mount_ordering(): self-heals already-deployed
  nodes' installed unit + daemon-reload (boot-ordering only, effective next
  reboot; never restarts the running service). Idempotent; harmless on rootfs
  installs (maps to the always-mounted root).

Verified on .198: after applying, systemctl shows After=var-lib-archipelago
.mount and systemd-analyze verify is clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 03:33:29 -04:00
archipelago
c393b96da3 backend: harden rootless app lifecycle orchestration 2026-06-11 00:24:32 -04:00
Dorian
835c525218 chore(release): stage v1.7.55-alpha 2026-05-13 15:09:22 -04:00
archipelago
f507b847ef chore: release v1.7.48-alpha
Hotfix: archipelago.service ExecStartPre now mkdirs /run/containers and
/var/lib/containers before the unit's mount-namespace setup tries to bind
them. Without this, fresh nodes that don't have /run/containers (e.g.
nodes provisioned without a prior podman session) fail at the namespace
step with:

  Failed to set up mount namespacing: /run/containers: No such file or directory
  Failed at step NAMESPACE spawning /bin/bash: No such file or directory

Existing nodes don't pick up systemd unit changes via OTA — they need a
one-time `systemctl edit archipelago` adding the same mkdir. ISO installs
from this version forward have the fix baked in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:27:22 -04:00
archipelago
80765c5755 feat(systemd): delegate cgroup controllers to archipelago.service
Adds Delegate=memory pids cpu io to the archipelago.service unit.

Context: the service runs as User=archipelago under system.slice with
rootless podman. When podman creates transient libpod-*.scope units for
containers under user.slice, systemd needs the caller to hold
CAP_SYS_ADMIN on the target cgroup subtree \u2014 which happens iff
Delegate= lists the controllers we want to set. Without Delegate, any
future code path that goes through the podman CLI (runtime.rs) instead
of the libpod HTTP API (podman_client.rs) would hit MemoryMax
rejections that have exactly the same symptom as the bug I just fixed
in parse_memory_limit but with a completely different root cause.

Belt-and-braces: current production path uses PodmanClient and was
fixed in the preceding commit. But the DockerRuntime CLI path in
runtime.rs:262-268 (cmd.arg("--memory")) is still reachable via
AutoRuntime fallback on hosts without podman, and future rust
orchestrator code may legitimately need cgroup delegation. This
directive is no-op harmful on hosts that already delegate upstream
(systemd gracefully handles duplicate/nested delegation).
2026-04-23 03:44:36 -04:00
Dorian
2517379ac3 chore: Debian 12 → 13 (Trixie) migration, service hardening
- Update all references from Debian 12 (Bookworm) to Debian 13 (Trixie)
- Enable SystemCallArchitectures, RestrictAddressFamilies, RestrictRealtime
  in archipelago.service (safe on systemd 256+ which respects NoNewPrivileges=no)
- Update GLIBC compatibility checks from 2.36 to 2.40
- ISO filename, build container, and docs updated throughout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 21:32:08 +02:00
Dorian
b8a09b448b fix: AIUI /aiui/ base path, nginx alias cycle, VPN auth, container boot
- AIUI: rebuild with /aiui/ base path (router, chunk loader, SW scope)
- nginx: remove alias from /aiui/ location (caused try_files redirect cycle)
- VPN: WireGuard standalone setup, auth improvements
- ISO: build script hardening, service file updates
- first-boot-containers: networking stack fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 20:42:09 +02:00
Dorian
b0907c48b2 feat: NostrVPN mesh + VPN card UI + nvpn v0.3.7
- VPN card: relay URLs, device management, invite QR, add participant
- Backend: vpn.invite, vpn.add-participant, vpn.peer-config RPCs
- nvpn v0.3.7 system service (fixes event processing bug in v0.3.4)
- First-boot: auto-configure nvpn with node identity and endpoint
- Service: AF_NETLINK for WireGuard, NoNewPrivileges=no for sudo wg
- TASK-50: networking stack reliability from first install

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-08 15:00:00 +02:00
Dorian
e5f695c1c4 fix: service file crash on fresh installs, CI workflow portability
- Remove MemoryDenyWriteExecute=yes from archipelago.service — ring
  (rustls) and secp256k1 (bitcoin/nostr) crypto libraries need
  executable memory mappings that this restriction blocks
- Add + prefix to ExecStartPre so mkdir/chown run as root
- Use $HOME/archy instead of /home/archipelago/archy in CI workflows
  so builds work on both .228 and VPS CI runners

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:08:21 +01:00
Dorian
ada035f1b8 fix: reduce TimeoutStopSec from 660s to 15s
The backend shuts down in <1s. The 660s timeout was left from when
Bitcoin Core was managed by this service. With 660s, systemctl stop
hangs for 11 minutes if the process is already dead but systemd
hasn't noticed, blocking all deploys and restarts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 18:25:13 +01:00
Dorian
87bc0baa94 fix: add debian-tor group to backend service for onion address access
The backend couldn't read Tor hidden service hostnames because the
systemd service only had SupplementaryGroups=dialout. Adding debian-tor
allows the backend to read /var/lib/tor/hidden_service_*/hostname
without needing sudo (which is blocked by NoNewPrivileges=yes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 19:14:27 +01:00
Dorian
99400a7165 feat: container orchestration, branding overhaul, onboarding logging
Container orchestration:
- Health monitor with crash recovery and auto-restart
- Doctor service (periodic health checks via systemd timer)
- Reconcile service (desired-state convergence)
- Stack-aware install/uninstall with dependency tracking

Branding:
- Custom GRUB background (designer artwork, 1024x768)
- ISOLINUX boot menu: centered, orange accents, clean labels
- Terminal banners: adaptive width, basic ANSI colors, fits 80-col
- Removed auto-generated splash scripts (designer provides assets)
- GRUB theme: lowercase branding

Frontend:
- 401 handler clears localStorage immediately (prevents cascade)

Backend:
- Onboarding/auth logging ([onboarding] tag in journalctl)
- Cookie Secure flag logging for debugging HTTP/HTTPS issues

ISO fixes:
- Install log saved before unmount (was silently failing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:34:29 +00:00
Dorian
121f17e44e fix: container install flow, filebrowser auth, AppCard enrichment
- Fix .198-style fresh installs: systemd service ExecStartPre creates
  /run/user/1000, enable podman.socket, chmod 644 /etc/hosts
- Filebrowser: add /data volume for database (fixes read-only crash),
  secure auth with random password via backend RPC (no more admin/admin)
- AppCard: enrich installing state with marketplace metadata (icon,
  title, description, tier badge, author, version)
- Registry: btcpayserver 1.13.5 → 1.13.7, images mirrored
- ReadWritePaths: add home container paths for rootless podman

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:32:54 +00:00
Dorian
c3d4a7063b fix: systemd resource limits, Tor rotation transition, unwrap elimination, RPC timeouts
- I2: Add MemoryMax=4G, LimitNOFILE=65535, TasksMax=2048 to systemd service
- I3: Tor rotation keeps old service for 1h transition before cleanup
- R14: Replace .parse().unwrap() with .unwrap_or(localhost) in rate limiter
- R15: Replace 7 unwrap/expect in mesh protocol with proper error propagation
- R27: Add 10s timeouts to mesh Bitcoin RPC calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:46:40 +00:00
Dorian
5aafb6e40c fix: What's New v1.3.0, backend bind 127.0.0.1 in deploy + systemd, dead man's switch permissions
- Added v1.3.0 release notes to Settings "What's New" modal
- Deploy script now auto-fixes backend bind address (0.0.0.0 → 127.0.0.1)
- All image-recipe systemd/service files updated to 127.0.0.1
- Fixed dead man's switch: alert-config.json owned by root, now chown'd
- Removed unused toggleAutoSync function (build error)
- Deploy script adds LND REST port 8080 to Tor config generation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 12:55:31 +00:00
Dorian
0d28d28bf7 security(TASK-8): fix 8 pentest findings — C1/C3/H1/M1/M2/L2
CRITICAL:
- C1: /lnd-connect-info now requires session auth, CORS wildcard removed
- C3: DEV_MODE removed from production service file (dev override only)

HIGH:
- H1: node-message endpoint now verifies ed25519 signatures when
  provided, logs warning for unsigned messages

MEDIUM:
- M1: content.add rejects filenames containing ".." (path traversal)
- M2: NIP-07 postMessage responses use specific origin instead of '*'

LOW:
- L2: Onion validation now enforces strict v3 format (56 base32 chars
  + ".onion", exactly 62 chars, no colons)

Previously fixed: C2 (RPC creds generated per-install from secrets)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 19:45:10 +00:00
Dorian
7f5bbbd74c fix: rootless podman scanning — relax namespace/syscall restrictions
RestrictNamespaces and SystemCallFilter block rootless podman from
creating user namespaces needed for container isolation. Removed these
along with RestrictSUIDSGID (implied by NoNewPrivileges). ProtectHome
set to no (rootless podman needs ~/.local/share/containers writable).

Remaining active protections: NoNewPrivileges, ProtectSystem=strict,
ReadWritePaths, RestrictAddressFamilies, MemoryDenyWriteExecute,
RestrictRealtime, SystemCallArchitectures=native.

Also reduced initial scan delay from 15s to 3s for faster container
visibility after boot, and removed Ollama from auto-deploy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:22:00 +00:00
Dorian
39c7ac1924 feat: rootless podman, session hardening, boot stability, sidebar fix
Rootless podman migration (TASK-11):
- Remove sudo from all podman calls in PodmanClient + 8 backend files
- Remove sudo from all podman/docker calls in deploy script
- Restore full systemd security hardening: NoNewPrivileges,
  RestrictAddressFamilies, MemoryDenyWriteExecute, RestrictRealtime,
  RestrictNamespaces, RestrictSUIDSGID, SystemCallFilter, ProtectSystem=strict
- Enable loginctl linger for rootless container persistence
- Remove Ollama from auto-deploy (marketplace-only)

Session & auth hardening:
- Increase MAX_CONCURRENT_SESSIONS 20→50 (prevents eviction storms)
- Debounced 401 redirect in rpc-client.ts (prevents redirect storms)

Boot stability:
- optimize-debian.sh: adds chrony, swap, removes policy-rc.d
- deploy script: pre-restart chrony + swap setup
- ISO build: chrony package, swap file creation
- BootScreen: no longer clears localStorage (prevents splash replay)
- RootRedirect: sole owner of localStorage clearing on server ready

UI fixes:
- Sidebar opacity default changed from 0→visible (fixes missing sidebar
  after page-persistence login without entrance animation)
- Console.log/error wrapped in import.meta.env.DEV guards
- Remove unused route import from RootRedirect

Beta tracking:
- CLAUDE.md: beta freeze protocol added
- MASTER_PLAN.md: TASK-11, TASK-17, phase structure
- BETA-PROGRESS.md: initial tracking doc
- Tagged v1.2.0-alpha.1 as pre-rootless baseline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 13:53:27 +00:00
Dorian
b89808862c fix: restore container scanning — relax systemd sandbox for podman
The security hardening (NoNewPrivileges, RestrictAddressFamilies,
MemoryDenyWriteExecute, RestrictRealtime, ProtectSystem=strict) all
blocked podman container management via sudo. These are temporarily
disabled until TASK-11 (rootless podman migration) is complete.

Remaining active protections: ProtectSystem=true (/usr, /boot),
ProtectHome=yes, PrivateTmp=yes, PrivateDevices=no (mesh radio).

Also adds TASK-11 to MASTER_PLAN.md for tracking the rootless podman
migration that will allow re-enabling full security hardening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 12:06:35 +00:00
Dorian
e4089287a3 fix: bulletproof mesh serial connection — PrivateDevices, auto-detect fallback, backoff
Root cause: systemd PrivateDevices=yes hid /dev/ttyUSB* from the service,
preventing .198 from connecting to its Heltec V3 after the security hardening.

Changes:
- Set PrivateDevices=no in systemd service (serial access needs physical devices;
  other hardening layers remain: NoNewPrivileges, ProtectSystem, RestrictNamespaces)
- Add SupplementaryGroups=dialout for explicit serial permissions
- Add fallback auto-detect when configured serial path fails to open
- Add exponential backoff on reconnect (5s→60s cap) to reduce log spam
- Add pre-open device existence check with actionable error messages
- Add udev rule (99-mesh-radio.rules) for stable /dev/mesh-radio symlink
- Add /dev/mesh-radio to serial candidate list (checked first)
- Add Connect button per detected device in Mesh UI
- Deploy udev rule to both servers and ISO build
- Fix FEDI_HASH unbound variable in deploy script
- Fix deploy binary step to handle hung service stop gracefully

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 10:50:13 +00:00
Dorian
12b5fb2d1b security hardening 2026-03-18 09:56:40 +00:00
Dorian
430d174389 feat: Phase 2 — systemd sandboxing, Bitcoin RPC localhost binding, Tailscale deprivilege
- Service runs as unprivileged `archipelago` user instead of root
- Added systemd sandboxing: ProtectSystem=strict, NoNewPrivileges, PrivateTmp,
  MemoryDenyWriteExecute, RestrictNamespaces, SystemCallFilter
- Bitcoin RPC rpcallowip restricted to localhost + Podman subnet (10.88.0.0/16)
- Tailscale container: removed --privileged, uses cap-drop ALL + cap-add NET_ADMIN/NET_RAW

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 00:42:29 +00:00
Dorian
7776191fac fix: watchdog killing backend every 60s on .198 (47 restarts/day)
Root cause: sd_notify::notify(true, ...) cleared NOTIFY_SOCKET env var,
so watchdog pings never reached systemd. Backend killed every 60s.

Fixes:
- Change sd_notify::notify first param to false (keep socket)
- Increase WatchdogSec from 60 to 300 (5min) for crash recovery
- Add TimeoutStartSec=300 for slow container startups
- Adjust watchdog ping interval to 120s

This was causing 47 restarts/day on .198 and blocking REBOOT-03,
FLEET-03, FLEET-04, VC-04.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 04:30:57 +00:00
Dorian
e3e279331f feat: add systemd watchdog, OOM detection, disk growth alerting
MEM-01: OOM kill detection via dmesg checks every 5 minutes
MEM-03: Disk growth rate tracking (288 samples over 24h), warns at >1GB/day
MEM-04: Systemd watchdog (WatchdogSec=60, sd_notify::Watchdog every 30s)
        Service Type=notify for proper startup notification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 02:54:59 +00:00
Dorian
a2aa9657b1 fix: prevent My Apps crash when installing apps + add filebrowser to demo
The My Apps page went blank after installing apps because pkg['static-files'].icon
was accessed without optional chaining on dynamically installed packages that lack
the static-files property.

- Make static-files optional in PackageDataEntry type
- Add defensive ?.icon access with fallback in Apps.vue and AppDetails.vue
- Add filebrowser to mock backend staticDevApps (enables Cloud page in demo)
- Expand portMappings and marketplaceMetadata for all marketplace apps
- installPackage now uses staticApp() format for consistent data shape

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:09:59 +00:00
Dorian
589adb8b18 fix: alpha release hardening — onboarding, security, and ISO build
- Convert "Choose Your Path" screen to informative (read-only cards)
- Harden "Choose Your Setup" (gray out Coming Soon options, auto-select Fresh Start)
- Auto-fetch DID on mount with retry and auto-advance after success
- Improve backup download for mobile compatibility
- Add retry logic to verify step with graceful skip option
- Route verify → done → login for complete onboarding flow
- Add AIUI install confirmation via custom event (SEC-001)
- Add file path whitelist for AIUI file access (SEC-002)
- Add log redaction for container logs sent to AIUI (SEC-003)
- Add Secure flag to session cookie in production (SEC-004)
- Fix ISO build script to handle zstd compression errors gracefully
- Sync archipelago.service from live server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:00:28 +00:00
Dorian
6656d2f1d9 fix: implement 22 security pentest remediation fixes
Server-side session management with SHA-256 hashed tokens and HttpOnly
cookies. Auth middleware gating all RPC/WS/proxy routes with method
allowlist. Login rate limiting (5/60s per IP). CORS restricted to
config origin. Docker registry allowlist. App ID and path validation.
P2P message sanitization (HTML + log injection). Onion address and
known-peer validation. Nginx security headers (CSP, X-Frame-Options,
etc.) and AIUI proxy auth. Systemd hardening (non-root, NoNewPrivileges,
ProtectSystem).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 03:26:56 +00:00
Dorian
1073d9fd2c Update Fedimint configuration and enhance onboarding process
- Upgraded Fedimint version to v0.10.0 in docker-compose.yml and manifest.yml, adding support for the built-in Guardian UI.
- Modified .gitignore to exclude deploy-config.sh script.
- Enhanced onboarding process in AuthManager to persist onboarding state and validate password strength during user setup.
- Updated API to handle onboarding completion and password change requests, ensuring a smoother user experience.
- Improved configuration management to support Nostr discovery and Tor proxy settings, enhancing node identity features.
2026-02-17 15:03:34 +00:00
Dorian
34fc06726e Enhance development workflow and deployment practices for Archipelago
- Updated the Development-Workflow documentation to clarify deployment strategy, emphasizing direct deployment to the live system for testing.
- Added detailed instructions for the deployment command, including syncing code, building frontend and backend, and restarting services.
- Improved SSH key management section to assist with authentication issues.
- Expanded the testing workflow to include steps for checking logs and syncing changes back to the ISO build.
- Updated the ISO build integration section to ensure system-level changes are captured for future builds.
- Refactored various sections for clarity and completeness, including deployment paths and system configuration files.
2026-02-01 13:24:03 +00:00