docs: record fleet-deploy ENOSPC bug + fix + cleanup outcome
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
parent 84d35b3b68
commit e3baaa5de3
@@ -1226,3 +1226,35 @@ fails under full-parallel `--workspace` runs, and never on the same test twice
test-fixture/tempfile collision generating non-UTF8 bytes under parallelism, not a real credentials
bug and not related to anything touched this session. Worth a real fix at some point (a test isolation
issue makes CI flaky) but out of scope here.

## 15. Fleet deploy of this session's fixes + deploy-script ENOSPC bug (2026-07-01)

User asked to build and deploy all 8 fixes above to `.116`/`.198`/`.228` via
`scripts/deploy-to-target.sh`. **Found and fixed a real bug in the deploy script itself**: its
`rsync --exclude` list never excluded `releases/` (the local repo's own historical build artifacts:
dozens of versioned binaries and frontend tarballs, 7-10 GB) or `reticulum-daemon/.venv` (a Python
virtualenv bundling PyInstaller, roughly 87 MB to several hundred MB depending on state), so every
deploy synced both to the target's root disk. This **filled `.198` (29 GB disk) to exactly 100%
mid-deploy**, aborting that deploy with `rsync: ... No space left on device`, and **filled `.228` to
100% right after a "successful" deploy** (the post-deploy health check kept passing throughout
because it does not check free disk space, so nothing raised an alarm). Neither node's services were
corrupted (verified: containers unaffected, HTTP/HTTPS still returning 200 after the disk was
freed); the risk was latent (the next log or DB write failing), not realized.

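
The gap noted above, a health check that keeps passing while the root disk sits at 100%, could be
closed with a free-space guard. The following is a minimal sketch, not the project's actual health
check: the function names, the `/` path, and the 90% threshold are all hypothetical. It relies only
on Python's standard `shutil.disk_usage`. The percentage it computes can differ slightly from the
`Use%` column of `df`, which typically accounts for root-reserved blocks. A deploy script could call
`disk_headroom_ok("/")` after syncing and treat `False` as a failed check.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 when total is not positive)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def disk_headroom_ok(path: str = "/", max_percent: float = 90.0) -> bool:
    """True when the filesystem holding `path` is at or below `max_percent` used.

    The 90% default is an illustrative assumption, not a value from the deploy script.
    """
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (in bytes)
    return percent_used(usage.total, usage.used) <= max_percent
```
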
**Fixed**: added `--exclude 'releases'` (`aa849849`) and `--exclude '.venv'` (`84d35b3b`) to the
rsync command in `scripts/deploy-to-target.sh:545-559`. Manually removed the already-synced
`releases/` and `.venv` copies from `.116`/`.198`/`.228` (safe: these are deploy-staging copies of
build artifacts, not live node data). Re-ran `.198`'s deploy after the fix; `.116`, `.198`, and
`.228` are now all on `84d35b3b` and healthy.

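
To see how much an exclude list actually saves before a deploy, one option is to total the file
sizes that would still be synced. The sketch below is an approximation under stated assumptions: it
models only bare-name rsync excludes such as `releases` and `.venv`, which rsync matches against a
path component at any depth, and it ignores wildcards, anchored patterns, and include rules. The
helper names `is_excluded`, `synced_bytes`, and `walk_sizes` are hypothetical and are not part of
`scripts/deploy-to-target.sh`. Comparing `synced_bytes(walk_sizes("."), [])` with
`synced_bytes(walk_sizes("."), ["releases", ".venv"])` would show the difference for a checkout.

```python
import os
from pathlib import PurePosixPath
from typing import Iterable, Iterator, Tuple


def is_excluded(rel_path: str, excluded_names: Iterable[str]) -> bool:
    """True if any component of the relative path equals an excluded name."""
    names = set(excluded_names)
    return any(part in names for part in PurePosixPath(rel_path).parts)


def synced_bytes(entries: Iterable[Tuple[str, int]], excluded_names: Iterable[str]) -> int:
    """Sum the sizes of (relative_path, size) entries that no exclude name removes."""
    names = set(excluded_names)
    return sum(size for rel_path, size in entries if not is_excluded(rel_path, names))


def walk_sizes(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (relative_path, size_in_bytes) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks rather than following them
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            yield rel, os.path.getsize(full)
```
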
**Also checked** (per user request) the broader Tailscale fleet for the same bloat, at IPs the user
supplied. `100.72.136.5`, `100.89.209.89`, `100.70.96.88`, and `100.82.34.38` were all clean (no
`releases/` or `.venv`, 13-32% disk used); they were not part of this deploy round and were only
checked for bloat. `100.66.157.120` was intentionally **not touched** (reserved as another
developer's dev node per [[reference_test_deploy_roster]]). `100.64.83.15` and `100.102.169.103`
were **unreachable** with every credential combination in memory (both the `archipelago` and
`debian` users, all 3 known passwords, plus a `tailscale nc` proxy attempt for the one that timed
out). The user needs to supply correct access details if these need checking later.

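
A fleet-wide disk check like the one above can be made repeatable by parsing `df -P` output, whose
POSIX format keeps each filesystem on a single line. The sketch below is illustrative only: the
function name is hypothetical, and it assumes the text was captured separately (for example from
`df -P /` run on each host) and that the filesystem name contains no spaces.

```python
def parse_df_capacity(df_output: str) -> int:
    """Return the Capacity column (percent used) from the first data line of `df -P` output."""
    lines = [line for line in df_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and at least one data line")
    # POSIX columns: Filesystem, blocks, Used, Available, Capacity, Mounted on.
    fields = lines[1].split()
    if len(fields) < 6:
        raise ValueError("unexpected df -P data line: %r" % lines[1])
    capacity = fields[4]
    if not capacity.endswith("%"):
        raise ValueError("capacity field is not a percentage: %r" % capacity)
    return int(capacity[:-1])
```
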
`.116`'s HTTPS not responding is **not a bug**: that node's nginx binds only `:80` by design (a
pre-existing dev-node config, see [[reference_116_dev_node]]), which is unrelated to this deploy.