docs: record fleet-deploy ENOSPC bug + fix + cleanup outcome
Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
parent 84d35b3b68
commit e3baaa5de3
@@ -1226,3 +1226,35 @@ fails under full-parallel `--workspace` runs, and never on the same test twice
test-fixture/tempfile collision generating non-UTF8 bytes under parallelism, not a real credentials
bug and not related to anything touched this session. Worth a real fix at some point (a test isolation
issue makes CI flaky) but out of scope here.

## 15. Fleet deploy of this session's fixes + deploy-script ENOSPC bug (2026-07-01)

User asked to build and deploy all 8 fixes above to `.116`/`.198`/`.228` via
`scripts/deploy-to-target.sh`. **Found and fixed a real bug in the deploy script itself**: its
`rsync --exclude` list never excluded `releases/` (the local repo's own historical build artifacts:
dozens of versioned binaries + frontend tarballs, 7-10GB) or `reticulum-daemon/.venv` (a Python
virtualenv bundling PyInstaller, from ~87MB to several hundred MB depending on state), so every deploy
synced both to the target's root disk. This **filled `.198` (29GB disk) to exactly 100% mid-deploy**,
aborting that deploy with `rsync: ... No space left on device`, and **filled `.228` to 100% right
after a "successful" deploy** (the post-deploy health check kept passing throughout; it doesn't
check free disk space, so nothing raised an alarm). Neither node's services were corrupted by
this (verified: containers unaffected, HTTP/HTTPS still 200 after disk was freed). The risk was
latent (the next log/DB write failing), not realized.
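The gap described above (a health check that keeps passing while the root disk sits at 100%) could be closed with a free-space guard. The following is a minimal sketch, not the script's actual code: the helper names `disk_used_pct` and `disk_has_headroom` and the 90% default threshold are assumptions made for illustration. The first helper reads `df -P` output from stdin, so it can be exercised without touching a real node.

```sh
#!/bin/sh
# Hypothetical helpers; not part of scripts/deploy-to-target.sh.

# Read `df -P <path>` output on stdin and print the Capacity column
# of the first filesystem row as a bare integer (e.g. "100% " -> 100).
disk_used_pct() {
    awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed only if usage ($1) is strictly below the threshold ($2).
# The default of 90 is an assumed value, not one taken from the log.
disk_has_headroom() {
    [ "$1" -lt "${2:-90}" ]
}
```

A health check could then run something like `disk_has_headroom "$(df -P / | disk_used_pct)"` on the target and fail the deploy when it returns non-zero. `df -P` is used because the POSIX output format keeps each filesystem on a single line, which makes the column position predictable.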

**Fixed**: added `--exclude 'releases'` (`aa849849`) and `--exclude '.venv'` (`84d35b3b`) to the
rsync command in `scripts/deploy-to-target.sh:545-559`. Manually removed the already-synced
`releases/` and `.venv` copies from `.116`/`.198`/`.228` (safe: these are deploy-staging copies of
build artifacts, not live node data). Re-ran `.198`'s deploy after the fix; `.116`, `.198`, and `.228`
are now all on `84d35b3b` and healthy.
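The shape of the exclusion fix can be sketched as follows. The real rsync invocation at `scripts/deploy-to-target.sh:545-559` is not reproduced in this log, so the helper name `build_rsync_excludes` is hypothetical; only the two patterns, `releases` and `.venv`, come from the commits above.

```sh
#!/bin/sh
# Hypothetical sketch; the helper name is illustrative, not from the deploy script.

# Print one --exclude=PATTERN argument per line for the two directories
# that the fix keeps out of the transfer.
build_rsync_excludes() {
    for pattern in releases .venv; do
        printf '%s%s\n' '--exclude=' "$pattern"
    done
}
```

Because neither pattern starts with `/`, rsync matches it against path components at any depth, which is why a bare `.venv` also covers `reticulum-daemon/.venv`. Passing the helper's output to `rsync -a --dry-run --stats` (unquoted, which is acceptable here only because the patterns contain no spaces or glob characters) would show the projected transfer size before anything is written to a target.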

**Also checked** (per user request) the broader Tailscale fleet for the same bloat, at IPs the user
supplied: `100.72.136.5`, `100.89.209.89`, `100.70.96.88`, `100.82.34.38` were all clean (no
`releases/` or `.venv`, 13-32% disk used). These were not part of this deploy round, just checked for
bloat. `100.66.157.120` was intentionally **not touched** (reserved as another developer's dev node per
[[reference_test_deploy_roster]]). `100.64.83.15` and `100.102.169.103` were **unreachable** with
every credential combination in memory (both `archipelago`/`debian` users, all 3 known passwords,
plus a `tailscale nc` proxy attempt for the one that timed out). The user needs to supply correct
access details if these need checking later.
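The per-node bloat check amounts to testing for the two synced directories. A rough sketch, assuming a hypothetical `find_bloat` helper: the staging directory is passed as an argument because this log does not record where the deploy script stages files on a target.

```sh
#!/bin/sh
# Hypothetical helper; the staging path is supplied by the caller.

# Print each known bloat directory that exists under the staging dir ($1).
# Prints nothing when the node is clean.
find_bloat() {
    for rel in releases reticulum-daemon/.venv; do
        if [ -d "$1/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done
}
```

Empty output corresponds to the "clean" result reported above. Run against a remote node, the same `[ -d ... ]` tests would execute on the far side of an SSH session rather than locally.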

`.116`'s HTTPS not responding is **not a bug**: that node's nginx only binds `:80` by design (a
pre-existing dev-node config, see [[reference_116_dev_node]]), unrelated to this deploy.