diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md
new file mode 100644
index 00000000..e1b93237
--- /dev/null
+++ b/docs/troubleshooting.md
@@ -0,0 +1,506 @@
+# Archipelago Troubleshooting Guide
+
+This guide covers the 20 most common issues you may encounter with Archipelago, along with diagnostic commands and solutions.
+
+## Connection & Access
+
+### 1. Can't connect to the web UI
+
+**Symptoms**: Browser shows "connection refused" or spins forever when accessing `http://`
+
+**Diagnosis**:
+```bash
+# Check if the server is reachable on the network
+ping
+
+# SSH in and check Nginx
+ssh archipelago@
+sudo systemctl status nginx
+sudo nginx -t
+
+# Check if the backend is running
+sudo systemctl status archipelago
+curl -s http://localhost:5678/health
+```
+
+**Solutions**:
+- Ensure you're on the same network (LAN) as the server
+- If Nginx is down: `sudo systemctl restart nginx`
+- If backend is down: `sudo systemctl restart archipelago`
+- Check firewall: `sudo ufw status` — port 80 (HTTP) and 443 (HTTPS) must be allowed
+- If the server IP changed, check your router's DHCP lease table or run `ip addr show` on the server
+
+### 2. Login page loads but login fails
+
+**Symptoms**: You see the login screen but entering the correct password shows an error
+
+**Diagnosis**:
+```bash
+# Check backend logs
+sudo journalctl -u archipelago --since "5 minutes ago" --no-pager
+
+# Test the RPC endpoint directly
+curl -s -X POST http://localhost:5678/rpc/v1 \
+  -H 'Content-Type: application/json' \
+  -d '{"method":"server.echo","params":{"message":"test"}}' | head -100
+```
+
+**Solutions**:
+- Default password is `password123` — change it after first login
+- Clear browser cookies and try again (stale session cookie)
+- Restart the backend: `sudo systemctl restart archipelago`
+- Check if the database is accessible: `ls -la /var/lib/archipelago/`
+
+### 3. Web UI loads but shows blank white page
+
+**Symptoms**: Browser loads but nothing renders, or you see a white screen
+
+**Diagnosis**:
+```bash
+# Check if frontend files exist
+ls -la /opt/archipelago/web-ui/index.html
+ls -la /opt/archipelago/web-ui/assets/
+
+# Check browser console (F12 > Console) for JavaScript errors
+# Check Nginx error log
+sudo tail -20 /var/log/nginx/error.log
+```
+
+**Solutions**:
+- Redeploy the frontend: run the deploy script from the development machine
+- Check if files exist in `/opt/archipelago/web-ui/` — if missing, the deploy didn't complete
+- Clear browser cache (Ctrl+Shift+R or Cmd+Shift+R)
+- Try a different browser or incognito mode
+
+### 4. HTTPS certificate warning
+
+**Symptoms**: Browser shows "Your connection is not private" or certificate error
+
+**Solutions**:
+- Archipelago uses a self-signed certificate by default — this is expected on first visit
+- Click "Advanced" > "Proceed to site" (Chrome) or "Accept the Risk" (Firefox)
+- For a permanent fix, configure a domain name and use Let's Encrypt
+- In kiosk mode, the certificate is auto-accepted
+
+---
+
+## App Issues
+
+### 5. App won't start (container fails to launch)
+
+**Symptoms**: Clicking "Start" on an app shows an error, or the app stays in "stopped" state
+
+**Diagnosis**:
+```bash
+# Check container status
+podman ps -a --filter "name="
+
+# Check container logs
+podman logs --tail 50
+
+# Check if the image exists
+podman images | grep
+
+# Check available disk space
+df -h /var/lib/archipelago
+```
+
+**Solutions**:
+- If the image is missing: reinstall the app from the Marketplace
+- If disk is full: run disk cleanup from Settings, or manually `podman system prune`
+- If the container exits immediately: check logs for the root cause (usually missing config or permissions)
+- Restart podman: `sudo systemctl restart podman`
+
+### 6. App shows "unhealthy" status
+
+**Symptoms**: App is running but shows a yellow or red health indicator
+
+**Diagnosis**:
+```bash
+# Check container health
+podman healthcheck run
+
+# Check container resource usage
+podman stats --no-stream
+
+# Check container logs for errors
+podman logs --tail 100 | grep -i error
+```
+
+**Solutions**:
+- Some apps take time to become healthy after starting (especially Bitcoin, which needs to sync)
+- Check if the app has enough resources (RAM, CPU)
+- Restart the specific app from the UI or: `podman restart `
+- Check if dependent services are running (e.g., LND requires Bitcoin)
+
+### 7. Bitcoin not syncing / stuck at a block height
+
+**Symptoms**: Bitcoin node shows the same block height for an extended period
+
+**Diagnosis**:
+```bash
+# Check Bitcoin logs
+podman logs bitcoin-knots --tail 50
+
+# Check if Bitcoin is connected to peers
+podman exec bitcoin-knots bitcoin-cli -datadir=/data getpeerinfo | grep -c '"addr"'
+
+# Check sync progress
+podman exec bitcoin-knots bitcoin-cli -datadir=/data getblockchaininfo | grep -E "blocks|headers|verificationprogress"
+```
+
+**Solutions**:
+- Initial sync takes 1-7 days depending on hardware — be patient
+- Ensure the server has a stable internet connection
+- Check disk space: Bitcoin requires 600GB+ for the full chain
+- If stuck: restart the container `podman restart bitcoin-knots`
+- If peers = 0: check firewall allows port 8333 outbound
+- Add manual peers: edit bitcoin.conf to add `addnode=` entries
+
+### 8. LND won't connect to Bitcoin
+
+**Symptoms**: LND shows errors about Bitcoin connection, or channels aren't working
+
+**Diagnosis**:
+```bash
+# Check LND logs
+podman logs lnd --tail 50
+
+# Check if Bitcoin RPC is accessible from LND
+podman exec lnd wget -qO- http://bitcoin-knots:8332/ 2>&1 | head -5
+
+# Check LND status
+podman exec lnd lncli getinfo 2>&1 | head -20
+```
+
+**Solutions**:
+- Ensure Bitcoin is fully synced before starting LND
+- Both containers must be on the same Podman network (`archy-net`)
+- Check Bitcoin RPC credentials match what LND expects
+- Restart both containers in order: Bitcoin first, then LND
+
+---
+
+## Backup & Recovery
+
+### 9. Backup fails to create
+
+**Symptoms**: Backup button shows an error, or backup file is empty
+
+**Diagnosis**:
+```bash
+# Check disk space
+df -h /var/lib/archipelago
+
+# Check backup directory permissions
+ls -la /var/lib/archipelago/backups/
+
+# Check backend logs for backup errors
+sudo journalctl -u archipelago --since "10 minutes ago" | grep -i backup
+```
+
+**Solutions**:
+- Ensure sufficient disk space (backups can be large)
+- Check permissions: backup directory should be owned by `archipelago` user
+- Try creating a smaller backup (exclude app data)
+- Restart the backend service and try again
+
+### 10. Can't restore from backup
+
+**Symptoms**: Restore process fails or data doesn't appear after restore
+
+**Diagnosis**:
+```bash
+# Verify backup file integrity
+file /path/to/backup.archipelago
+ls -la /path/to/backup.archipelago
+
+# Check backend logs during restore
+sudo journalctl -u archipelago -f
+```
+
+**Solutions**:
+- Ensure the backup file is not corrupted (check file size is reasonable)
+- Passphrase must match what was used during backup creation
+- Stop all running apps before restoring
+- After restore, restart the backend: `sudo systemctl restart archipelago`
+
+---
+
+## System Updates
+
+### 11. System update fails
+
+**Symptoms**: Update button shows an error, or update process hangs
+
+**Diagnosis**:
+```bash
+# Check internet connectivity
+curl -s https://start9.com > /dev/null && echo "Internet OK" || echo "No internet"
+
+# Check backend logs
+sudo journalctl -u archipelago --since "15 minutes ago" | grep -i update
+
+# Check disk space (updates need temporary space)
+df -h /
+```
+
+**Solutions**:
+- Ensure stable internet connection during updates
+- Ensure at least 2GB free disk space
+- If update hangs: wait 10 minutes, then restart the backend
+- Do NOT power off during an update — this can corrupt the system
+- If the system is in a bad state after a failed update: boot from the USB installer and select "Repair"
+
+### 12. Server won't boot after update
+
+**Symptoms**: Server doesn't respond after a system update
+
+**Solutions**:
+- Wait 5 minutes — the first boot after update may take longer
+- If still unresponsive: connect a monitor/keyboard to check boot messages
+- Try the recovery mode: boot from USB installer and select "Repair"
+- As a last resort: reflash the USB and restore from backup
+
+---
+
+## Kiosk Mode
+
+### 13. Kiosk display shows black screen
+
+**Symptoms**: Connected monitor shows black screen instead of the Archipelago UI
+
+**Diagnosis**:
+```bash
+# SSH in and check kiosk service
+sudo systemctl status archipelago-kiosk
+
+# Check if X11/Wayland is running
+ps aux | grep -E "(Xorg|weston|chromium|firefox)"
+
+# Check display output
+ls /dev/dri/
+xrandr --query 2>/dev/null || echo "No display server"
+```
+
+**Solutions**:
+- Restart the kiosk service: `sudo systemctl restart archipelago-kiosk`
+- Check that the HDMI cable is securely connected
+- Try a different HDMI port or cable
+- Check if the display is set to the correct input source
+- Review kiosk logs: `sudo journalctl -u archipelago-kiosk --since "5 minutes ago"`
+
+### 14. Kiosk display is stuck or frozen
+
+**Symptoms**: Kiosk shows the UI but it's unresponsive to touch/mouse
+
+**Solutions**:
+- The watchdog service should auto-restart a frozen kiosk — wait 30 seconds
+- SSH in and restart: `sudo systemctl restart archipelago-kiosk`
+- Check if the backend is responsive: `curl -s http://localhost:5678/health`
+- If backend is down too, restart everything: `sudo systemctl restart archipelago archipelago-kiosk`
+
+---
+
+## Network & Connectivity
+
+### 15. Tor address not available
+
+**Symptoms**: Settings shows "Tor: Not configured" or the .onion address is missing
+
+**Diagnosis**:
+```bash
+# Check Tor container
+podman ps --filter "name=tor"
+podman logs tor --tail 20
+
+# Check if Tor hostname file exists
+cat /var/lib/archipelago/tor/hidden_service/hostname 2>/dev/null
+```
+
+**Solutions**:
+- Tor takes 30-60 seconds to bootstrap — wait and refresh
+- If Tor container is stopped: start it from the Apps page
+- Check that the Tor data directory exists and has correct permissions
+- Restart Tor: `podman restart tor`
+
+### 16. Peers can't reach my node
+
+**Symptoms**: Federation peers show "unreachable" status
+
+**Diagnosis**:
+```bash
+# Check if Tor is running (needed for peer connectivity)
+podman ps --filter "name=tor"
+
+# Check your Tor address
+cat /var/lib/archipelago/tor/hidden_service/hostname
+
+# Test connectivity from the server side
+curl -s http://localhost:5678/rpc/v1 \
+  -H 'Content-Type: application/json' \
+  -d '{"method":"node.tor-address","params":{}}' | head -50
+```
+
+**Solutions**:
+- Ensure Tor is running (required for peer-to-peer communication)
+- Tor circuits can be slow — connections may take 30+ seconds
+- Share your correct .onion address with peers
+- Both nodes must have Tor running and be on the same federation
+
+### 17. DNS resolution issues
+
+**Symptoms**: Apps can't reach external services, container downloads fail
+
+**Diagnosis**:
+```bash
+# Test DNS from the server
+nslookup google.com
+dig google.com
+
+# Check DNS configuration
+cat /etc/resolv.conf
+
+# Test from within a container
+podman exec bitcoin-knots nslookup seed.bitcoin.sipa.be
+```
+
+**Solutions**:
+- Configure DNS from Settings > Network: try Cloudflare (1.1.1.1) or Google (8.8.8.8)
+- If using custom DNS, verify the server addresses are correct
+- Restart the DNS resolver: `sudo systemctl restart systemd-resolved`
+
+---
+
+## Performance & Resources
+
+### 18. Server is very slow / high CPU usage
+
+**Symptoms**: Web UI is slow to respond, apps are laggy
+
+**Diagnosis**:
+```bash
+# Check CPU and memory usage
+top -bn1 | head -15
+
+# Check per-container resource usage
+podman stats --no-stream
+
+# Check disk I/O
+iostat -x 1 3
+```
+
+**Solutions**:
+- Bitcoin initial sync uses heavy CPU — this is normal and temporary
+- Check which container is using the most resources with `podman stats`
+- Stop apps you don't need
+- If RAM is full: add swap space or upgrade hardware
+- Consider using an SSD if running on HDD (massive I/O improvement)
+
+### 19. Disk full
+
+**Symptoms**: Apps fail, UI shows disk warning, new installs fail
+
+**Diagnosis**:
+```bash
+# Check disk usage
+df -h /var/lib/archipelago
+
+# Find largest directories
+du -sh /var/lib/archipelago/*/ | sort -rh | head -10
+
+# Check Podman image/container sizes
+podman system df
+```
+
+**Solutions**:
+- Run disk cleanup from Settings
+- Reclaim space from unused containers and images: `podman system prune -a` (WARNING: removes all stopped containers and unused images)
+- Move Bitcoin data to external drive if chain data is too large
+- Check for large log files: `du -sh /var/log/*/ | sort -rh`
+- Consider upgrading to a larger disk
+
+### 20. WebSocket disconnections / "Reconnecting..." banner
+
+**Symptoms**: UI shows a reconnecting indicator, real-time updates stop
+
+**Diagnosis**:
+```bash
+# Check backend health
+curl -s http://localhost:5678/health
+
+# Check backend logs for WebSocket errors
+sudo journalctl -u archipelago --since "5 minutes ago" | grep -i websocket
+
+# Check system resources (WebSocket can drop under load)
+free -h
+```
+
+**Solutions**:
+- Brief disconnections are normal during backend restarts — the UI auto-reconnects
+- If persistent: check if the backend is overloaded (high CPU/RAM)
+- Restart the backend: `sudo systemctl restart archipelago`
+- Check Nginx WebSocket proxy config: `/etc/nginx/sites-available/archipelago` must include `proxy_set_header Upgrade $http_upgrade`
+- If on WiFi, try wired Ethernet for more stable connectivity
+
+---
+
+## General Maintenance
+
+### Quick Health Check Commands
+
+```bash
+# Overall system status
+sudo systemctl status archipelago nginx
+
+# All containers
+podman ps -a
+
+# Disk usage
+df -h /var/lib/archipelago
+
+# Memory usage
+free -h
+
+# Recent errors
+sudo journalctl -u archipelago --since "1 hour ago" -p err
+
+# Backend health endpoint
+curl -s http://localhost:5678/health
+```
+
+### Emergency Recovery
+
+If the system is completely unresponsive:
+
+1. **Power cycle**: Hold power button for 10 seconds, then turn back on
+2. **Wait 5 minutes**: Services take time to start, especially if containers need to recover
+3. **SSH in**: If web UI is down but SSH works, restart services manually
+4. **USB recovery**: Boot from the Archipelago USB installer and select "Repair"
+5. **Clean install + restore**: As a last resort, do a fresh install and restore from backup
+
+### Collecting Diagnostic Information
+
+If you need to report an issue, collect this information:
+
+```bash
+# System info
+uname -a
+cat /etc/os-release
+
+# Service status
+sudo systemctl status archipelago nginx
+
+# Recent logs (last 100 lines)
+sudo journalctl -u archipelago --no-pager -n 100
+
+# Container status
+podman ps -a
+
+# Disk and memory
+df -h
+free -h
+
+# Network
+ip addr show
+```
diff --git a/loop/plan.md b/loop/plan.md
index dd79e6ec..f25cfe0f 100644
--- a/loop/plan.md
+++ b/loop/plan.md
@@ -384,7 +384,7 @@
 
 #### Sprint 32: Documentation and Community (Week 9-12)
 
-- [ ] **FINALDOC-01** — Write comprehensive troubleshooting guide. Create `docs/troubleshooting.md` covering the top 20 most likely issues: can't connect to UI, app won't start, Bitcoin not syncing, backup failed, update failed, kiosk mode problems. Include diagnostic commands and solutions.
+- [x] **FINALDOC-01** — Write comprehensive troubleshooting guide. Create `docs/troubleshooting.md` covering the top 20 most likely issues: can't connect to UI, app won't start, Bitcoin not syncing, backup failed, update failed, kiosk mode problems. Include diagnostic commands and solutions.
 - [ ] **FINALDOC-02** — Create video/screenshot walkthrough documentation. Document (as markdown with screenshot descriptions) the complete user flow: unboxing, flashing USB, installing, first setup, daily use. These become the basis for future video tutorials.