docs: record fleet-deploy ENOSPC bug + fix + cleanup outcome

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
archipelago 2026-07-01 11:01:27 -04:00
parent 84d35b3b68
commit e3baaa5de3

@@ -1226,3 +1226,35 @@ fails under full-parallel `--workspace` runs, and never on the same test twice
test-fixture/tempfile collision generating non-UTF8 bytes under parallelism, not a real credentials
bug and not related to anything touched this session. Worth a real fix at some point (a test isolation
issue makes CI flaky) but out of scope here.
## 15. Fleet deploy of this session's fixes + deploy-script ENOSPC bug (2026-07-01)
User asked to build and deploy all 8 fixes above to `.116`/`.198`/`.228` via
`scripts/deploy-to-target.sh`. **Found and fixed a real bug in the deploy script itself**: its
`rsync --exclude` list never excluded `releases/` (the local repo's own historical build artifacts:
dozens of versioned binaries plus frontend tarballs, 7-10GB) or `reticulum-daemon/.venv` (a Python
virtualenv bundling PyInstaller, from ~87MB to several hundred MB depending on state), so every
deploy synced both to the target's root disk. This **filled `.198` (29GB disk) to exactly 100%
mid-deploy**, aborting that deploy with `rsync: ... No space left on device`, and **filled `.228`
to 100% right after a "successful" deploy**. The post-deploy health check kept passing throughout
because it does not check free disk space, so nothing raised an alarm. Neither node's services were
corrupted (verified: containers unaffected, HTTP/HTTPS still 200 after disk was freed); the risk
was latent (the next log/DB write failing), not realized.
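The gap described above (a health check that passes on a full disk) could be closed with a small free-space check. The following is a minimal sketch, not the script's actual health check: the function names `disk_used_pct` and `check_disk_headroom` and the 90% default limit are assumptions made for illustration.

```shell
# Hypothetical helpers; not taken from scripts/deploy-to-target.sh.

# Print the used-space percentage (integer, no % sign) of the filesystem holding $1.
disk_used_pct() {
    # df -P prints one line per filesystem in POSIX format; field 5 is capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is at or below the limit, 1 if above, 2 if usage cannot be read.
check_disk_headroom() {
    local path="$1" max_pct="${2:-90}" used
    used="$(disk_used_pct "$path")"
    case "$used" in
        ''|*[!0-9]*) echo "UNKNOWN: cannot read disk usage for $path" >&2; return 2 ;;
    esac
    if [ "$used" -gt "$max_pct" ]; then
        echo "FAIL: $path is ${used}% full (limit ${max_pct}%)" >&2
        return 1
    fi
    echo "OK: $path is ${used}% full (limit ${max_pct}%)"
}
```

Running `check_disk_headroom / 90` on the target after a deploy would have flagged both the `.198` and `.228` cases, since each ended at 100%.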
**Fixed**: added `--exclude 'releases'` (`aa849849`) and `--exclude '.venv'` (`84d35b3b`) to the
rsync command in `scripts/deploy-to-target.sh:545-559`. Manually removed the already-synced
`releases/` and `.venv` copies from `.116`/`.198`/`.228` (safe: these are deploy-staging copies of
build artifacts, not live node data). Re-ran `.198`'s deploy after the fix; all three nodes are
now on `84d35b3b` and healthy.
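One way to catch this class of bug before it reaches a target is to keep the exclude patterns in one place and preview a sync with a dry run. This is a sketch under stated assumptions: the real script passes `--exclude` flags inline and has a longer list that is not reproduced here; `deploy_excludes` and `preview_deploy_sync` are hypothetical names, and feeding patterns through `--exclude-from=-` (read from standard input) is a variant chosen for illustration.

```shell
# Hypothetical helpers; only the two patterns named in this section are listed.

# Print one rsync exclude pattern per line.
deploy_excludes() {
    printf '%s\n' \
        'releases' \
        '.venv'
}

# Show what a deploy would transfer without copying anything.
# $1 = local source directory, $2 = rsync destination (e.g. user@host:/some/dir/).
preview_deploy_sync() {
    deploy_excludes | rsync -a --dry-run --stats --exclude-from=- "$1" "$2"
}
```

Because `--dry-run` makes no changes on the destination, the `--stats` summary can be compared against the target's free space before the real sync runs.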
**Also checked** (per user request) the broader Tailscale fleet for the same bloat, at IPs the user
supplied. These nodes were not part of this deploy round and were only inspected for bloat:
`100.72.136.5`, `100.89.209.89`, `100.70.96.88`, `100.82.34.38` were all clean (no
`releases/`/`.venv`, 13-32% disk used).
`100.66.157.120` was intentionally **not touched** (reserved as another developer's dev node per
[[reference_test_deploy_roster]]). `100.64.83.15` and `100.102.169.103` were **unreachable** with
every credential combination in memory (both `archipelago`/`debian` users, all 3 known passwords,
plus a `tailscale nc` proxy attempt for the timed-out one); the user needs to supply correct
access details if these need checking later.
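The per-node bloat check can be sketched as a small function that reports any leftover `releases/` or `.venv` directory under a staging root. The function name `bloat_report` and the assumption that `.venv` lands at `reticulum-daemon/.venv` relative to the staging root (mirroring the local repo layout) are illustrative; the actual staging path on the nodes is not recorded in this section.

```shell
# Hypothetical helper: list leftover deploy bloat under a staging directory.
# Prints "<size-in-KiB> <path>" per offending directory; prints nothing if clean.
bloat_report() {
    local root="$1" sub
    for sub in releases reticulum-daemon/.venv; do
        if [ -d "$root/$sub" ]; then
            # du -sk: one summary line, size in 1024-byte units.
            du -sk "$root/$sub"
        fi
    done
}
```

From a bash session, the function could be run on a remote node with something like `ssh user@host "$(declare -f bloat_report); bloat_report /path/to/staging"`, where `user@host` and `/path/to/staging` are placeholders.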
`.116`'s HTTPS not responding is **not a bug**: that node's nginx binds only `:80` by design (a
pre-existing dev-node config, see [[reference_116_dev_node]]) and is unrelated to this deploy.