archy/docs/WEEKLY_RELEASE_TRACKER.md
archipelago 329e7811eb test(lifecycle): add os-audit OS-wide health gate; docs: v1.7.91 resume notes
os-audit.sh: one non-destructive scorecard tying backend/RPC health, the
all-apps lifecycle audit (delegates to remote-lifecycle.sh), and the FM-guards
(port-drift, secret-completeness, orphan-container sweep, OTA-wedge). The
per-boot building block for the reboot-survival loop. FM12 check uses jq has()
not // (// treats a legit false as empty). Section A validated all-PASS on .116.

docs: v1.7.91 release-pass resume notes + the bitcoinReceive blocker writeup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 04:36:06 -04:00

222 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Weekly Release Tracker
Last updated: 2026-06-14 (session on node .116 / archi-thinkpad)
---
# ▶ CURRENT PASS — v1.7.91-alpha (2026-06-14)
## RESUME PROMPT (paste into a fresh session, any machine)
> Resume the v1.7.91-alpha release pass for the `archy` repo (on node .116 / archi-thinkpad
> the tree is at /home/archipelago/Projects/archy; on another machine, clone/pull `main` from
> gitea-vps2 http://146.59.87.168:3000/lfg2025/archy.git — my fix commit is pushed there).
> Read the top section of docs/WEEKLY_RELEASE_TRACKER.md FIRST — it has the blocker, the fix
> already made, and exact next steps. Two goals: (1) cut & PUBLISH v1.7.91-alpha, (2) finish
> validating + integrating the new tests/lifecycle/os-audit.sh OS-wide health harness.
> Do NOT redo: the bitcoinReceive.ts TS2538 fix (done, committed) or the os-audit jq false-trap
> fix (done). Resume at "EXACT NEXT STEPS — v1.7.91" below. .116 login password: ThisIsWeb54321@
> (.116 serves http on :80 → ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http).
## What happened this session
- `scripts/create-release.sh 1.7.91-alpha` was running; its release gate PASSED all 7 checks,
backend built clean (7m22s), then it **FAILED at step [4/8] frontend build** with:
`src/utils/bitcoinReceive.ts(23,24): error TS2538: Type 'undefined' cannot be used as an index type.`
Cause: `noUncheckedIndexedAccess``codeMatch[1]` is `string | undefined` and was used directly
to index `RECEIVE_CODE_MESSAGES`. **FIXED**`const code = message.match(/\[([A-Z_]+)\]/)?.[1]`
then `if (code && RECEIVE_CODE_MESSAGES[code])`. `npx vue-tsc --noEmit` is now clean (exit 0).
The failed run aborted BEFORE bumping the manifest (still 1.7.90) or tagging (no v1.7.91 tag),
but it HAD already partial-bumped Cargo.toml/package.json/locks to 1.7.91 — those partial bumps
are reverted (create-release.sh re-owns the bump); only the genuine TS fix + harness are committed.
- Built a new OS-wide health harness `tests/lifecycle/os-audit.sh` (non-destructive, one scorecard):
Section A backend/RPC health, Section B all-apps lifecycle audit (delegates to remote-lifecycle.sh),
Section C FM-guards (port-drift + secret-completeness bats, orphan-container sweep). Section A
validated all-PASS on .116. Fixed a jq bug in the FM12 OTA-wedge check: `//` treats a legit
`false` as empty and fell through to "unknown" — now uses `has()`. Section B is slow (~3 min) and
opaque while running because output is captured (`out=$(...)`) not streamed — minor wart, TODO.
## EXACT NEXT STEPS — v1.7.91 (in order)
1. Confirm clean tree + on main (`git status`; create-release.sh requires `git diff --quiet HEAD`).
The TS fix + os-audit.sh are committed & pushed; version-bump artifacts reverted to 1.7.90.
2. Re-run the release: `scripts/create-release.sh 1.7.91-alpha`. Backend is cached (only a .ts
changed) so it's fast; the frontend build now passes. It bumps versions, builds, writes
releases/manifest.json (→1.7.91-alpha), commits, and tags v1.7.91-alpha.
- Memory guards: grep the staged frontend tarball for "1.7.91-alpha" before shipping (silent
vue-tsc failures); tarball must be flat (`tar -C web/dist/neode-ui .`).
3. Publish: `scripts/publish-release-assets.sh 1.7.91-alpha gitea-vps2`, then
`git push origin main && git push origin --tags` (origin pushes to BOTH gitea-local + vps2).
4. Verify manifest LIVE (this is "published"):
`curl -fsS http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json | jq .version`
must show `1.7.91-alpha`. **Then notify the user — they asked to be told when 1.7.91 publishes.**
5. os-audit harness: run a full green pass on .116
(`ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http ARCHY_PASSWORD='ThisIsWeb54321@' tests/lifecycle/os-audit.sh`),
confirm Section A FM12 now reads `update_in_progress=false` (PASS not WARN), review B + C findings,
then wire os-audit.sh into the reboot-survival (L3) loop as the per-boot gate.
---
# ─ HISTORY — v1.7.89-alpha pass (2026-06-12), superseded ─
Last updated: 2026-06-12 ~17:45 EDT (session on node .116)
## RESUME PROMPT (paste into a fresh session)
> Continue the v1.7.89-alpha release pass from /home/archipelago/Projects/archy on node .116.
> Read docs/WEEKLY_RELEASE_TRACKER.md fully first — it has root causes, fixes already made,
> and exact next steps. Do NOT redo: AIUI revert (done, validated), updater fixes in
> core/archipelago/src/update.rs (done, uncommitted), .116 OTA unwedge (done). Resume at
> "EXACT NEXT STEPS" below.
## EXACT NEXT STEPS (in order)
1. Backend focused tests were running in background:
`cd core && timeout 1500 cargo test -p archipelago -- update:: lnd container::image_versions scanner`
(log: /tmp/claude-.../tasks/bds4jk19e.output — if lost, just rerun the command; first
attempt died at 400s timeout during test compile, 1500s is the right budget).
Need: all green.
2. RESOLVED before session end: vitest recheck passed clean — EXIT=0, 79 files / 645 tests,
even while cargo test was compiling. The earlier harness ui-unit-tests FAIL was load/flake
(machine saturated by the parallel cargo test compile), not a real failure. On resume just
rerun `tests/release/run.sh --quick` WITHOUT a parallel cargo build to confirm green;
if it ever fails again, the failing test name is in the stage output (drop `--silent`).
3. Run full harness: `tests/release/run.sh` (static+frontend+backend). Then commit ALL
working-tree changes (one commit, e.g. "fix: harden OTA updates, AIUI desktop gap, LND
no-proxy" — CHANGELOG v1.7.89 section is already curated).
4. Cut release: `scripts/create-release.sh 1.7.89-alpha` (needs clean tree, on main,
validates CHANGELOG section exists — it does). Then
`tests/release/run.sh --manifest` should pass, and grep the staged frontend tarball
for 1.7.89-alpha (memory: silent build failures).
5. Publish: `scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2`, then
`git push origin main && git push origin --tags` and push gitea-local + tags too.
Verify manifest live on http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json
6. Verify OTA on THIS node (.116): schedule is auto_apply; either wait for the scheduler
or trigger via UI. Confirm /var/lib/archipelago/update_state.json current_version
becomes 1.7.89-alpha, `update_in_progress` returns to false, web-ui + binary versions
MATCH (this node currently has web-ui 1.7.84 / binary 1.7.85 mismatch — the OTA heals it),
and journalctl shows "Post-OTA verification succeeded" (the new probe falls back to
http://127.0.0.1/ which is what .116 serves).
7. Update this tracker + docs/PROGRESS_MEMORY.md, mark tasks done.
Purpose: live tracker for this pass — test everything shipped this week (v1.7.83→v1.7.89),
build the release test harness, fix OTA updates on .116, make updates bulletproof, cut v1.7.89-alpha.
If the session is cut off, resume from here.
## Task status
| # | Task | Status |
|---|------|--------|
| 1 | AIUI revert (mobile back/close gone, desktop gap fixed) | DONE — validated |
| 2 | Dev server on :8100 with embedded AIUI | DONE — see below |
| 3 | Inventory this week's release-log items | DONE — see checklist |
| 4 | Test harness covering this week + seed of system-wide harness | IN PROGRESS |
| 5 | Fix OTA updates on .116 + bulletproof updates | IN PROGRESS — diagnosis below |
| 6 | Cut v1.7.89-alpha release | PENDING (gates: 4, 5) |
## State of the working tree
- HEAD = 495b9078 (v1.7.89 changelog + AIUI mobile restore committed).
- Uncommitted, intended for v1.7.89-alpha:
- `neode-ui/src/views/Dashboard.vue` — chat route back to plain `h-full` (desktop bottom-gap fix). Validated.
- `core/.../rpc/lnd/*` + `container/lnd.rs` — LND REST no-proxy + wallet readiness/unlock fixes.
- Version bumps to 1.7.89-alpha (Cargo.toml, package.json, locks), CHANGELOG entry.
- `neode-ui/vite.config.ts` — added `/aiui` dev proxy (keep; dev-only convenience).
## AIUI validation (task 1) — DONE
- HEAD already removed the mobile back button and restored `hideClose=true` (495b9078).
- Working-tree Dashboard.vue removes `dashboard-scroll-panel mobile-scroll-pad` from the chat
route (that padding caused the desktop bottom gap); mesh keeps its styling.
- Chat CSS verified byte-identical to last-good 34c4e87d (May 20).
- Playwright check (desktop 1440x900, mobile 390x844): chat fills full viewport, no bottom gap,
no mobile back/close. `npm run type-check` + focused route tests + full vitest (645/645) pass.
## Dev server on :8100 (task 2) — DONE
- Running: `BACKEND_URL=http://127.0.0.1:5678 VITE_AIUI_URL=/aiui/ npx vite --host 0.0.0.0 --port 8100`
from `neode-ui/` (real local backend on 5678).
- AIUI now embeds in /dashboard/chat via new vite proxy `/aiui``http://127.0.0.1:80`
(the node's deployed AIUI), same-origin like production.
- Secondary throwaway instance for automated checks: :8101 against mock backend
(`node mock-backend.js` on 5959, password `password123`).
## This week's shipped items (v1.7.83 → v1.7.89) — test checklist
### Frontend (vitest/type-check/build cover most; full suite 645/645 green 2026-06-12)
- [x] AIUI fast launch, no availability probe (v1.7.88) — covered by visual check + Chat.vue tests
- [x] AIUI mobile layout restore (v1.7.89) — playwright visual check
- [x] App-session launch metadata from manifests / typed interfaces (v1.7.83) — appSessionConfig tests
- [x] OnlyOffice + Saleor removal (v1.7.83) — catalog tests
- [ ] Bitcoin receive UI flow end-to-end (v1.7.87/88) — needs live LND node check
- [ ] Fleet tab keeps node list/alerts during refresh, names not hashes (v1.7.85/86) — store tests?
- [ ] Credential interstitial full-screen overlay (v1.7.87) — visual
- [ ] Mobile federation/system-update buttons full width (v1.7.86) — visual
### Backend (cargo)
- [ ] LND REST no-proxy client + GET newaddress p2wkh (v1.7.88/89) — unit tests + live check
- [ ] LND wallet readiness/unlock after restart (v1.7.89) — unit + live
- [ ] Bitcoin trusted-node relay rpcauth/txrelay (v1.7.84) — unit tests exist? check
- [ ] Container scanner RAII in-flight guard (v1.7.84) — cargo test
- [ ] ElectrumX health-check startup window + cache tuning (v1.7.85/86)
- [ ] Portainer pin 2.19.4 / bitcoin-ui image pin (v1.7.84/85) — image-versions tests
- [ ] Fleet telemetry name/hostname/URL fields (v1.7.85)
- [ ] Federation no self-import (v1.7.85)
- [ ] Kiosk safe-area + self-update refreshes kiosk files (v1.7.84)
- [ ] Wi-Fi scan error/retry/escaped SSID/open networks (v1.7.84)
### OTA / updates (task 5)
- [ ] .116 stuck: current 1.7.85-alpha, `update_in_progress: true` since 1.7.88 attempt — diagnose+fix
- [ ] Updater hardening: stuck-in-progress recovery, resumable/atomic apply, verify post-restart version
## OTA diagnosis on .116 — ROOT CAUSES FOUND + FIXED (code staged for v1.7.89)
Four bugs, all reproduced from the journal (Jun 12 03:4504:33):
1. Post-OTA probe only tries `https://127.0.0.1/`; .116's nginx binds only :80 (443 is
tailscale's) → connection refused × 18 → a GOOD 1.7.85 update was "rolled back".
FIX: probe falls back to `http://127.0.0.1/` on connect error (update.rs probe_frontend_once).
2. That rollback's binary restore did `host_sudo cp` onto the RUNNING binary → ETXTBSY exit 1
→ binary stayed 1.7.85 while web-ui rolled back to 1.7.84 (mismatch confirmed live).
FIX: rollback now cp→tmp→atomic mv, same pattern as apply (update.rs rollback_update).
3. The rollback chown'd `update-backup/archipelago` root:root IN PLACE → next apply's
fs::copy (as service user) hit EACCES → "Failed to backup current binary" × 3 → 1.7.86/88
never applied. FIX: apply unlinks stale backup first; rollback chowns only its temp copy.
4. Failed apply left `update_in_progress: true` wedged (staging still populated so the
stale-flag guard never fires). Unwedged operationally; fixed structurally by 13.
Operational cleanup DONE on .116 (2026-06-12 17:15): removed root-owned
`update-backup/archipelago`, stale `update-staging/` (1.7.86), and the stale
`update-pending-verify.json`. Next state load clears `update_in_progress`.
NOTE: live web-ui is 1.7.84 / binary 1.7.85 (mismatch from bug 2). Not hand-patched —
the v1.7.89 OTA will resync both. Good 1.7.85 frontend is quarantined at
`/opt/archipelago/web-ui.failed.1781250438247`.
Verification plan: after v1.7.89 release, watch .116 auto-apply (schedule auto_apply),
confirm `update_state.json.current_version == 1.7.89-alpha` and web-ui version matches.
## Test harness (task 4) — CREATED at tests/release/run.sh
- Stages: static (git diff --check, cargo fmt, catalog drift, optional --manifest),
frontend (type-check, full vitest), optional --with-build (build + grep dist for version),
backend (cargo check + focused cargo test: update:: lnd container::image_versions scanner,
all wrapped in `timeout`), optional --live URL smoke (/, /aiui/, /rpc/v1).
- Results so far (2026-06-12): type-check PASS, full vitest 645/645 PASS, cargo fmt PASS,
cargo check PASS, catalog drift PASS (3 pre-existing MISSING_CATALOG warnings, exit 0,
identical on HEAD). Focused backend cargo tests running (first run hit the known slow
test-compile on .116 at 400s timeout; rerunning with 1500s).
- AIUI embed verified end-to-end via playwright on :8101 (mock backend): iframe loads,
`ready` handshake clears the loading overlay, hideClose honored.
- Release flow confirmed: commit all → `scripts/create-release.sh 1.7.89-alpha` (validates
curated CHANGELOG section, builds, manifests, commits, tags) →
`scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2` → push origin main + tags.
Tarball layout/perms safety is already inside create-release-manifest.sh.
- CHANGELOG v1.7.89 section rewritten layman-readable (updater fixes added).
## Release gates for v1.7.89-alpha (task 6)
1. All harness stages green locally.
2. OTA fix for stuck `update_in_progress` included + .116 updates successfully to the new release.
3. Frontend build: grep packaged tarball for "1.7.89-alpha" before shipping (memory: silent vue-tsc failures).
4. Flat tarball layout (`tar -C web/dist/neode-ui .`).
5. Commit, tag `v1.7.89-alpha`, push origin + gitea-local + tags, publish release assets, verify
manifest + node OTA picks it up.