archy/.claude/plans/tailscale-migration.md
Dorian f20f0650cf feat: Discover view, Fleet dashboard, MeshMap, type fixes
- New Discover.vue (app store redesign)
- Fleet.vue dashboard for .228
- MeshMap.vue component
- Fixed Discover.vue type errors (unused var, type predicate)
- Various UI updates (Apps, Dashboard, Marketplace, Mesh, Web5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 16:12:01 +00:00


# Plan: Seamless Tailscale Migration for Alpha Testers
## Context
Tailscale nodes (Arch 1/2/3) are alpha tester machines. They need full deployment — binary, frontend, infrastructure, and containers — with zero friction. Currently `deploy-tailscale.sh` only deploys binary + frontend (85 lines), missing ALL infrastructure that `deploy-to-target.sh --live` provides (rootless prereqs, UID mapping, containers, nginx, Tor, HTTPS, dev mode, UFW, etc.).
These nodes may also have old **rootful** containers that need migrating to rootless.
## Approach
**Don't refactor the 1615-line deploy-to-target.sh** — too risky during beta freeze. Instead:
1. **Rewrite `deploy-tailscale.sh`** as a full-deploy script with split-mode SSH resilience
2. **Add `--tailscale` flag** to `deploy-to-target.sh` as a convenience wrapper
3. **Add rootful→rootless migration** as an automatic pre-step
4. **Fix `first-boot-containers.sh`** for rootless (separate concern, for ISO builds)
## Changes
### 1. Rewrite `scripts/deploy-tailscale.sh` (~400 lines)
Currently 85 lines doing only binary+frontend. Rewrite to be a full deploy for any node, using split-mode SSH (each step = separate short SSH session) for Tailscale stability.
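A minimal sketch of the split-mode idea, assuming a hypothetical `run_step` helper. `SSH_BIN`, `TARGET`, `RETRY_DELAY`, the retry count, and the timeout values are illustrative and not taken from the existing script.

```shell
# Hypothetical helper: run one deploy step as its own short SSH session.
# SSH_BIN and TARGET are assumptions; override them for your environment.
SSH_BIN="${SSH_BIN:-ssh}"
TARGET="${TARGET:-archipelago@arch2}"

run_step() {
    local name="$1" attempt
    shift
    for attempt in 1 2 3; do
        # A fresh connection per attempt keeps each session short.
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=10 \
            -o ServerAliveInterval=15 "$TARGET" "$@"; then
            echo "ok: $name"
            return 0
        fi
        echo "retry $attempt: $name" >&2
        sleep "${RETRY_DELAY:-2}"
    done
    echo "FAILED: $name" >&2
    return 1
}
```

With this shape, a dropped Tailscale connection costs one retried step instead of the whole deploy, for example `run_step "connectivity" true`.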
**Steps the new script will run (each as its own SSH session):**
1. SSH connectivity check
2. Install prerequisites (rsync, node, npm) if missing
3. Rsync code to target
4. **Rootful→rootless migration** (detect via `sudo podman ps -a`, then stop and remove old rootful containers)
5. Build frontend (nohup + poll, or skip if copy-only node)
6. Build backend (nohup + poll, or skip if copy-only node)
7. Create rollback backup
8. Deploy binary (build locally or copy from .228)
9. Deploy frontend (build locally or copy from .228)
10. Deploy AIUI
11. Sync nginx config + HTTPS snippets
12. Sync systemd service
13. **Setup rootless prereqs** (sysctl, linger, podman.socket)
14. **Create data dirs + UID mapping** (full chown table from deploy-to-target.sh:670-689)
15. **Dev mode** (ARCHIPELAGO_DEV_MODE=true for HTTP cookies over Tailscale)
16. Deploy nostr-provider.js
17. Deploy Claude API proxy (if ANTHROPIC_API_KEY available)
18. Setup NTP + swap
19. Restart services
20. **Setup HTTPS** (with node's own IP in SAN)
21. **Read Bitcoin RPC credentials** from server secrets
22. **Create all containers** (Bitcoin, Mempool, BTCPay, ElectrumX, LND, Fedimint, Immich, HA, Grafana, Jellyfin, Vaultwarden, SearXNG, FileBrowser)
23. **Setup Tor** hidden services
24. **Fix UFW** forward policy
25. **Fix IndeedHub** NIP-07 (if running)
26. **Transfer custom images** for copy-only nodes (individual tarballs, never combined)
27. Run container doctor
28. Write deploy manifest
29. Post-deploy health check
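The "nohup + poll" pattern in steps 5 and 6 could look roughly like the following. This is a sketch under assumptions: `ssh_run`, `start_build`, `poll_build`, and the `/tmp` file names are hypothetical, and the build command must not contain single quotes.

```shell
# Assumption: ssh_run opens one short SSH session to the target node.
ssh_run() { ssh -o ConnectTimeout=10 "$TARGET" "$@"; }

# Start the build detached so it survives the SSH session ending.
# The exit code is written to /tmp/<tag>.rc when the build finishes.
start_build() {
    local cmd="$1" tag="$2"
    ssh_run "rm -f /tmp/$tag.rc; nohup sh -c '( $cmd ); echo \$? > /tmp/$tag.rc' > /tmp/$tag.log 2>&1 < /dev/null &"
}

# Poll with short sessions until the exit-code file appears.
poll_build() {
    local tag="$1" tries="${2:-120}" delay="${3:-10}" rc
    while [ "$tries" -gt 0 ]; do
        rc=$(ssh_run "cat /tmp/$tag.rc 2>/dev/null") || rc=""
        if [ -n "$rc" ]; then
            return "$rc"
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "timed out waiting for $tag" >&2
    return 124
}
```

A failed poll caused by a dropped connection is treated the same as "not finished yet", so the loop simply tries again on the next interval.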
**Copy-only mode**: When target can't build (Arch 1/3), script detects no `cargo`/`npm` on target and copies pre-built artifacts from .228 via SSH pipe.
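One way the copy-only detection and the SSH pipe might be sketched. The helper names (`ssh_run`, `target_can_build`, `deploy_mode`, `copy_artifact`) and the build-host argument are assumptions for illustration.

```shell
# Assumption: ssh_run opens one short SSH session to the target node.
ssh_run() { ssh -o ConnectTimeout=10 "$TARGET" "$@"; }

# True only when both build tools are on the target's PATH.
target_can_build() {
    ssh_run 'command -v cargo >/dev/null 2>&1 && command -v npm >/dev/null 2>&1'
}

deploy_mode() {
    if target_can_build; then
        echo "build"
    else
        echo "copy-only"
    fi
}

# Stream one pre-built artifact from the build host to the target,
# piping through the machine that runs the deploy script.
copy_artifact() {
    local build_host="$1" src="$2" dst="$3"
    ssh "$build_host" "cat '$src'" | ssh_run "cat > '$dst'"
}
```

Because a failed SSH connection also makes `target_can_build` return non-zero, this check is only meaningful after the connectivity check in step 1 has passed.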
**Key sections to port from deploy-to-target.sh:**
- Lines 646-689 — rootless prereqs + UID mapping
- Lines 629-641 — dev mode
- Lines 839-1474 — all container creation
- Lines 1143-1234 — Tor setup
- Lines 1477-1485 — UFW fix
- Lines 1487-1545 — IndeedHub NIP-07
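For orientation, the rootless prereqs in step 13 (sysctl, linger, podman.socket) might reduce to something like the sketch below, meant to run on the target. The sysctl key and value are an assumption, since the plan does not name them; the authoritative version is the block in `deploy-to-target.sh`. The `run` wrapper and `DRY_RUN` variable are hypothetical.

```shell
# Hypothetical dry-run wrapper: print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_rootless_prereqs() {
    local user="${1:-archipelago}"
    # Assumed sysctl: let unprivileged processes bind ports from 80 up.
    # Not persistent across reboots when set with -w alone.
    run sudo sysctl -w net.ipv4.ip_unprivileged_port_start=80
    # Keep the user's systemd instance alive without a login session.
    run sudo loginctl enable-linger "$user"
    # Enable the rootless Podman API socket; run this as the service user.
    run systemctl --user enable --now podman.socket
}
```

Running it once with `DRY_RUN=1` prints the three commands, which is a cheap way to review them before touching an alpha tester's machine.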
### 2. Add `--tailscale` flag to `deploy-to-target.sh` (~30 lines)
Wrapper that calls `deploy-tailscale.sh` for each node sequentially. Also add `--tailscale-node=arch1|arch2|arch3` for single-node targeting.
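A sketch of how the two flags could be parsed. It assumes `deploy-tailscale.sh` accepts a node name as its first argument, which the plan does not state; `select_nodes`, `deploy_nodes`, `ALL_NODES`, and `DEPLOY_CMD` are hypothetical names.

```shell
ALL_NODES="arch1 arch2 arch3"
DEPLOY_CMD="${DEPLOY_CMD:-./scripts/deploy-tailscale.sh}"

# Print the nodes selected by the flags; unrelated flags are ignored.
select_nodes() {
    local arg nodes=""
    for arg in "$@"; do
        case "$arg" in
            --tailscale)
                nodes="$ALL_NODES" ;;
            --tailscale-node=arch1|--tailscale-node=arch2|--tailscale-node=arch3)
                nodes="${arg#--tailscale-node=}" ;;
            --tailscale-node=*)
                echo "unknown node: ${arg#--tailscale-node=}" >&2
                return 2 ;;
        esac
    done
    echo "$nodes"
}

# Deploy sequentially and stop at the first node that fails.
deploy_nodes() {
    local nodes node
    nodes=$(select_nodes "$@") || return $?
    for node in $nodes; do
        "$DEPLOY_CMD" "$node" || return 1
    done
}
```

Stopping at the first failure keeps a broken deploy from being repeated on the remaining tester machines.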
### 3. Rootful→rootless migration (in deploy-tailscale.sh step 4)
Auto-detect and handle:
```
ssh TARGET 'if [ -n "$(sudo podman ps -aq 2>/dev/null)" ]; then sudo podman stop --all && sudo podman rm --all; fi'
```
Data is safe: `/var/lib/archipelago/` is never deleted; the UID mapping step only fixes its ownership.
### 4. Fix `scripts/first-boot-containers.sh` (5 targeted edits)
- **Line 15**: Change root check → archipelago user check (UID 1000)
- **Line 140**: Change `10.88.0.0/16` → `0.0.0.0/0` (match deploy-to-target.sh)
- **After line 111**: Add rootless prereqs (sysctl, linger, podman.socket)
- **After line 113**: Add full UID mapping block
- **Pin `:latest` tags**: photoprism, ollama, searxng, nginx-proxy-manager, penpot
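To find the remaining `:latest` references before pinning them, a small check along these lines may help. `find_unpinned_images` is a hypothetical helper, and it matches any occurrence of `:latest`, including ones in comments.

```shell
# Print "line-number:line" for every line that references a :latest tag.
# Returns non-zero when nothing matches, which makes it usable as a gate.
find_unpinned_images() {
    grep -n ':latest' "$1"
}
```

For example, `find_unpinned_images scripts/first-boot-containers.sh` should print nothing once all five images are pinned.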
### 5. Update `scripts/setup-https-dev.sh`
Dynamic SAN — detect node's own IPs (including Tailscale interface) instead of hardcoding .228/.198.
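A sketch of the dynamic SAN, assuming the node has iproute2's `ip` available. `list_node_ips` and `build_san` are hypothetical helpers, and the `DNS:localhost` and `IP:127.0.0.1` defaults are assumptions rather than what the current script emits.

```shell
# List every globally scoped IPv4 address on the node. The Tailscale
# interface address is included because it is also scope global.
list_node_ips() {
    ip -4 -o addr show scope global | awk '{ split($4, a, "/"); print a[1] }'
}

# Build a subjectAltName value from the addresses passed as arguments.
build_san() {
    local san="DNS:localhost,IP:127.0.0.1" ip
    for ip in "$@"; do
        san="$san,IP:$ip"
    done
    echo "$san"
}
```

The result of `build_san $(list_node_ips)` could then be passed to the certificate step, for instance through `openssl req -addext "subjectAltName=..."` on OpenSSL 1.1.1 or later.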
## Files Modified
| File | Change | ~Lines |
|------|--------|--------|
| `scripts/deploy-tailscale.sh` | Full rewrite — complete deploy with split-mode SSH | ~400 |
| `scripts/deploy-to-target.sh` | Add `--tailscale` / `--tailscale-node` flags | ~30 |
| `scripts/first-boot-containers.sh` | Fix for rootless (user check, subnet, UID mapping, prereqs, pinned tags) | ~40 |
| `scripts/setup-https-dev.sh` | Dynamic SAN with Tailscale IPs | ~15 |
| `docs/BETA-PROGRESS.md` | Update TASK-11 status | ~5 |
## Auth State Preservation
All user state in `/var/lib/archipelago/` is **never deleted or overwritten** by deploys (the UID mapping step only changes ownership):
- `sessions.json`, `user.json`, `identities/`, `secrets/`, `federation/`
## Verification
1. Deploy to Arch 2 first (has build tools, safest test)
2. Then Arch 1/3 (copy-only mode)
3. For each node: `podman ps` shows containers, `curl` against the node's `/health` endpoint returns 200, UI loads, login works
4. Run container doctor — 0 fixes needed
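The HTTP part of the per-node check could be scripted roughly as follows. `http_status` and `verify_node` are hypothetical helpers, and the base URL is whatever address the node serves its UI on.

```shell
# Print only the HTTP status code for a URL; curl prints 000 on failure.
http_status() {
    curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# Pass when the node's /health endpoint answers 200.
verify_node() {
    local base_url="$1" code
    code=$(http_status "$base_url/health") || code="000"
    if [ "$code" = "200" ]; then
        echo "PASS $base_url"
    else
        echo "FAIL $base_url (HTTP $code)"
        return 1
    fi
}
```

This covers only the `/health` item; container listing, UI load, and login still need to be checked per node.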
## Order
1. Rewrite `deploy-tailscale.sh` (main deliverable)
2. Add `--tailscale` flags to `deploy-to-target.sh`
3. Fix `first-boot-containers.sh`
4. Update `setup-https-dev.sh`
5. Test: Arch 2 → Arch 1 → Arch 3
6. Update BETA-PROGRESS.md