27 lines
1014 B
Markdown
Raw Normal View History

fix: overhaul container lifecycle — recovery, health, uninstall, UI state Container recovery: - Health monitor: MAX_RESTART_ATTEMPTS 3→10, interval 60s→120s - Dependency-aware restarts: won't restart services before their deps - Reset dependent counters when a dependency recovers - Handle "created" state containers (were invisible to health monitor) - Added IndeedHub, mempool-api, mysql to tier system - Crash recovery: podman start timeout 30s→120s with retry - Podman client: socket timeout 5s→30s, added restart policy UI state representation: - Exit code 0 shows "stopped" (gray), not "crashed" (red) - Exit code 137 shows "killed (OOM)" - Non-zero exit shows "crashed" (red) - Added exit_code field to PackageDataEntry Install/uninstall fixes: - Install returns error when container doesn't start (was silent success) - Post-install hooks awaited instead of fire-and-forget tokio::spawn - Uninstall: graceful rm before force, volume prune, network cleanup - Uninstall returns error on partial failure (was 200 OK) Config consistency: - DB passwords read from /var/lib/archipelago/secrets/ (was hardcoded) - Bitcoin: added ZMQ ports 28332/28333 for LND block notifications - IndeedHub port 7777→8190 (was conflicting with strfry) - Marketplace versions: LND 0.17.4→0.18.4, Mempool 2.5.0→3.0.0 Performance: - Metrics collector interval 60s→300s (was duplicating health monitor) - Podman client: proper error propagation instead of unwrap_or_default Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 07:03:57 +01:00
# Polish: Deployment Pipeline
## Pre-Deploy Checks
Add to deploy-to-target.sh: SSH key exists, target reachable, 2GB free disk space.
## Backup Before Deploy
```bash
sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.backup
sudo cp -a /opt/archipelago/web-ui /opt/archipelago/web-ui.backup
sudo cp /etc/nginx/sites-available/archipelago /etc/nginx/sites-available/archipelago.backup
```
## Health Check After Deploy
Loop up to 15 attempts, 2s apart, checking `curl http://localhost:5678/health` returns 200.
## Rollback on Failure
If health check fails: restore binary, frontend, nginx from .backup files, restart services.
## Deployment Lock
Use `flock` on `/tmp/archipelago-deploy.lock` to prevent concurrent deploys.
## Nginx Validation
Always `sudo nginx -t` before reload. If invalid, restore backup config.
## Integration Flow
1. acquire_lock → 2. pre_deploy_checks → 3. backup_current → 4. build + deploy → 5. validate_nginx → 6. restart services → 7. health_check || rollback