archy/docs/WEEKLY_RELEASE_TRACKER.md
archipelago 329e7811eb test(lifecycle): add os-audit OS-wide health gate; docs: v1.7.91 resume notes
os-audit.sh: one non-destructive scorecard tying backend/RPC health, the
all-apps lifecycle audit (delegates to remote-lifecycle.sh), and the FM-guards
(port-drift, secret-completeness, orphan-container sweep, OTA-wedge). The
per-boot building block for the reboot-survival loop. FM12 check uses jq has()
not // (// treats a legit false as empty). Section A validated all-PASS on .116.

docs: v1.7.91 release-pass resume notes + the bitcoinReceive blocker writeup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 04:36:06 -04:00

14 KiB
Raw Blame History

Weekly Release Tracker

Last updated: 2026-06-14 (session on node .116 / archi-thinkpad)


▶ CURRENT PASS — v1.7.91-alpha (2026-06-14)

RESUME PROMPT (paste into a fresh session, any machine)

Resume the v1.7.91-alpha release pass for the archy repo (on node .116 / archi-thinkpad the tree is at /home/archipelago/Projects/archy; on another machine, clone/pull main from gitea-vps2 http://146.59.87.168:3000/lfg2025/archy.git — my fix commit is pushed there). Read the top section of docs/WEEKLY_RELEASE_TRACKER.md FIRST — it has the blocker, the fix already made, and exact next steps. Two goals: (1) cut & PUBLISH v1.7.91-alpha, (2) finish validating + integrating the new tests/lifecycle/os-audit.sh OS-wide health harness. Do NOT redo: the bitcoinReceive.ts TS2538 fix (done, committed) or the os-audit jq false-trap fix (done). Resume at "EXACT NEXT STEPS — v1.7.91" below. .116 login password: ThisIsWeb54321@ (.116 serves http on :80 → ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http).

What happened this session

  • scripts/create-release.sh 1.7.91-alpha was running; its release gate PASSED all 7 checks, backend built clean (7m22s), then it FAILED at step [4/8] frontend build with: src/utils/bitcoinReceive.ts(23,24): error TS2538: Type 'undefined' cannot be used as an index type. Cause: noUncheckedIndexedAccesscodeMatch[1] is string | undefined and was used directly to index RECEIVE_CODE_MESSAGES. FIXEDconst code = message.match(/\[([A-Z_]+)\]/)?.[1] then if (code && RECEIVE_CODE_MESSAGES[code]). npx vue-tsc --noEmit is now clean (exit 0). The failed run aborted BEFORE bumping the manifest (still 1.7.90) or tagging (no v1.7.91 tag), but it HAD already partial-bumped Cargo.toml/package.json/locks to 1.7.91 — those partial bumps are reverted (create-release.sh re-owns the bump); only the genuine TS fix + harness are committed.
  • Built a new OS-wide health harness tests/lifecycle/os-audit.sh (non-destructive, one scorecard): Section A backend/RPC health, Section B all-apps lifecycle audit (delegates to remote-lifecycle.sh), Section C FM-guards (port-drift + secret-completeness bats, orphan-container sweep). Section A validated all-PASS on .116. Fixed a jq bug in the FM12 OTA-wedge check: // treats a legit false as empty and fell through to "unknown" — now uses has(). Section B is slow (~3 min) and opaque while running because output is captured (out=$(...)) not streamed — minor wart, TODO.

EXACT NEXT STEPS — v1.7.91 (in order)

  1. Confirm clean tree + on main (git status; create-release.sh requires git diff --quiet HEAD). The TS fix + os-audit.sh are committed & pushed; version-bump artifacts reverted to 1.7.90.
  2. Re-run the release: scripts/create-release.sh 1.7.91-alpha. Backend is cached (only a .ts changed) so it's fast; the frontend build now passes. It bumps versions, builds, writes releases/manifest.json (→1.7.91-alpha), commits, and tags v1.7.91-alpha.
    • Memory guards: grep the staged frontend tarball for "1.7.91-alpha" before shipping (silent vue-tsc failures); tarball must be flat (tar -C web/dist/neode-ui .).
  3. Publish: scripts/publish-release-assets.sh 1.7.91-alpha gitea-vps2, then git push origin main && git push origin --tags (origin pushes to BOTH gitea-local + vps2).
  4. Verify manifest LIVE (this is "published"): curl -fsS http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json | jq .version must show 1.7.91-alpha. Then notify the user — they asked to be told when 1.7.91 publishes.
  5. os-audit harness: run a full green pass on .116 (ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http ARCHY_PASSWORD='ThisIsWeb54321@' tests/lifecycle/os-audit.sh), confirm Section A FM12 now reads update_in_progress=false (PASS not WARN), review B + C findings, then wire os-audit.sh into the reboot-survival (L3) loop as the per-boot gate.

─ HISTORY — v1.7.89-alpha pass (2026-06-12), superseded ─

Last updated: 2026-06-12 ~17:45 EDT (session on node .116)

RESUME PROMPT (paste into a fresh session)

Continue the v1.7.89-alpha release pass from /home/archipelago/Projects/archy on node .116. Read docs/WEEKLY_RELEASE_TRACKER.md fully first — it has root causes, fixes already made, and exact next steps. Do NOT redo: AIUI revert (done, validated), updater fixes in core/archipelago/src/update.rs (done, uncommitted), .116 OTA unwedge (done). Resume at "EXACT NEXT STEPS" below.

EXACT NEXT STEPS (in order)

  1. Backend focused tests were running in background: cd core && timeout 1500 cargo test -p archipelago -- update:: lnd container::image_versions scanner (log: /tmp/claude-.../tasks/bds4jk19e.output — if lost, just rerun the command; first attempt died at 400s timeout during test compile, 1500s is the right budget). Need: all green.
  2. RESOLVED before session end: vitest recheck passed clean — EXIT=0, 79 files / 645 tests, even while cargo test was compiling. The earlier harness ui-unit-tests FAIL was load/flake (machine saturated by the parallel cargo test compile), not a real failure. On resume just rerun tests/release/run.sh --quick WITHOUT a parallel cargo build to confirm green; if it ever fails again, the failing test name is in the stage output (drop --silent).
  3. Run full harness: tests/release/run.sh (static+frontend+backend). Then commit ALL working-tree changes (one commit, e.g. "fix: harden OTA updates, AIUI desktop gap, LND no-proxy" — CHANGELOG v1.7.89 section is already curated).
  4. Cut release: scripts/create-release.sh 1.7.89-alpha (needs clean tree, on main, validates CHANGELOG section exists — it does). Then tests/release/run.sh --manifest should pass, and grep the staged frontend tarball for 1.7.89-alpha (memory: silent build failures).
  5. Publish: scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2, then git push origin main && git push origin --tags and push gitea-local + tags too. Verify manifest live on http://146.59.87.168:3000/lfg2025/archy/raw/branch/main/releases/manifest.json
  6. Verify OTA on THIS node (.116): schedule is auto_apply; either wait for the scheduler or trigger via UI. Confirm /var/lib/archipelago/update_state.json current_version becomes 1.7.89-alpha, update_in_progress returns to false, web-ui + binary versions MATCH (this node currently has web-ui 1.7.84 / binary 1.7.85 mismatch — the OTA heals it), and journalctl shows "Post-OTA verification succeeded" (the new probe falls back to http://127.0.0.1/ which is what .116 serves).
  7. Update this tracker + docs/PROGRESS_MEMORY.md, mark tasks done. Purpose: live tracker for this pass — test everything shipped this week (v1.7.83→v1.7.89), build the release test harness, fix OTA updates on .116, make updates bulletproof, cut v1.7.89-alpha. If the session is cut off, resume from here.

Task status

# Task Status
1 AIUI revert (mobile back/close gone, desktop gap fixed) DONE — validated
2 Dev server on :8100 with embedded AIUI DONE — see below
3 Inventory this week's release-log items DONE — see checklist
4 Test harness covering this week + seed of system-wide harness IN PROGRESS
5 Fix OTA updates on .116 + bulletproof updates IN PROGRESS — diagnosis below
6 Cut v1.7.89-alpha release PENDING (gates: 4, 5)

State of the working tree

  • HEAD = 495b9078 (v1.7.89 changelog + AIUI mobile restore committed).
  • Uncommitted, intended for v1.7.89-alpha:
    • neode-ui/src/views/Dashboard.vue — chat route back to plain h-full (desktop bottom-gap fix). Validated.
    • core/.../rpc/lnd/* + container/lnd.rs — LND REST no-proxy + wallet readiness/unlock fixes.
    • Version bumps to 1.7.89-alpha (Cargo.toml, package.json, locks), CHANGELOG entry.
    • neode-ui/vite.config.ts — added /aiui dev proxy (keep; dev-only convenience).

AIUI validation (task 1) — DONE

  • HEAD already removed the mobile back button and restored hideClose=true (495b9078).
  • Working-tree Dashboard.vue removes dashboard-scroll-panel mobile-scroll-pad from the chat route (that padding caused the desktop bottom gap); mesh keeps its styling.
  • Chat CSS verified byte-identical to last-good 34c4e87d (May 20).
  • Playwright check (desktop 1440x900, mobile 390x844): chat fills full viewport, no bottom gap, no mobile back/close. npm run type-check + focused route tests + full vitest (645/645) pass.

Dev server on :8100 (task 2) — DONE

  • Running: BACKEND_URL=http://127.0.0.1:5678 VITE_AIUI_URL=/aiui/ npx vite --host 0.0.0.0 --port 8100 from neode-ui/ (real local backend on 5678).
  • AIUI now embeds in /dashboard/chat via new vite proxy /aiuihttp://127.0.0.1:80 (the node's deployed AIUI), same-origin like production.
  • Secondary throwaway instance for automated checks: :8101 against mock backend (node mock-backend.js on 5959, password password123).

This week's shipped items (v1.7.83 → v1.7.89) — test checklist

Frontend (vitest/type-check/build cover most; full suite 645/645 green 2026-06-12)

  • AIUI fast launch, no availability probe (v1.7.88) — covered by visual check + Chat.vue tests
  • AIUI mobile layout restore (v1.7.89) — playwright visual check
  • App-session launch metadata from manifests / typed interfaces (v1.7.83) — appSessionConfig tests
  • OnlyOffice + Saleor removal (v1.7.83) — catalog tests
  • Bitcoin receive UI flow end-to-end (v1.7.87/88) — needs live LND node check
  • Fleet tab keeps node list/alerts during refresh, names not hashes (v1.7.85/86) — store tests?
  • Credential interstitial full-screen overlay (v1.7.87) — visual
  • Mobile federation/system-update buttons full width (v1.7.86) — visual

Backend (cargo)

  • LND REST no-proxy client + GET newaddress p2wkh (v1.7.88/89) — unit tests + live check
  • LND wallet readiness/unlock after restart (v1.7.89) — unit + live
  • Bitcoin trusted-node relay rpcauth/txrelay (v1.7.84) — unit tests exist? check
  • Container scanner RAII in-flight guard (v1.7.84) — cargo test
  • ElectrumX health-check startup window + cache tuning (v1.7.85/86)
  • Portainer pin 2.19.4 / bitcoin-ui image pin (v1.7.84/85) — image-versions tests
  • Fleet telemetry name/hostname/URL fields (v1.7.85)
  • Federation no self-import (v1.7.85)
  • Kiosk safe-area + self-update refreshes kiosk files (v1.7.84)
  • Wi-Fi scan error/retry/escaped SSID/open networks (v1.7.84)

OTA / updates (task 5)

  • .116 stuck: current 1.7.85-alpha, update_in_progress: true since 1.7.88 attempt — diagnose+fix
  • Updater hardening: stuck-in-progress recovery, resumable/atomic apply, verify post-restart version

OTA diagnosis on .116 — ROOT CAUSES FOUND + FIXED (code staged for v1.7.89)

Four bugs, all reproduced from the journal (Jun 12 03:4504:33):

  1. Post-OTA probe only tries https://127.0.0.1/; .116's nginx binds only :80 (443 is tailscale's) → connection refused × 18 → a GOOD 1.7.85 update was "rolled back". FIX: probe falls back to http://127.0.0.1/ on connect error (update.rs probe_frontend_once).
  2. That rollback's binary restore did host_sudo cp onto the RUNNING binary → ETXTBSY exit 1 → binary stayed 1.7.85 while web-ui rolled back to 1.7.84 (mismatch confirmed live). FIX: rollback now cp→tmp→atomic mv, same pattern as apply (update.rs rollback_update).
  3. The rollback chown'd update-backup/archipelago root:root IN PLACE → next apply's fs::copy (as service user) hit EACCES → "Failed to backup current binary" × 3 → 1.7.86/88 never applied. FIX: apply unlinks stale backup first; rollback chowns only its temp copy.
  4. Failed apply left update_in_progress: true wedged (staging still populated so the stale-flag guard never fires). Unwedged operationally; fixed structurally by 13.

Operational cleanup DONE on .116 (2026-06-12 17:15): removed root-owned update-backup/archipelago, stale update-staging/ (1.7.86), and the stale update-pending-verify.json. Next state load clears update_in_progress. NOTE: live web-ui is 1.7.84 / binary 1.7.85 (mismatch from bug 2). Not hand-patched — the v1.7.89 OTA will resync both. Good 1.7.85 frontend is quarantined at /opt/archipelago/web-ui.failed.1781250438247. Verification plan: after v1.7.89 release, watch .116 auto-apply (schedule auto_apply), confirm update_state.json.current_version == 1.7.89-alpha and web-ui version matches.

Test harness (task 4) — CREATED at tests/release/run.sh

  • Stages: static (git diff --check, cargo fmt, catalog drift, optional --manifest), frontend (type-check, full vitest), optional --with-build (build + grep dist for version), backend (cargo check + focused cargo test: update:: lnd container::image_versions scanner, all wrapped in timeout), optional --live URL smoke (/, /aiui/, /rpc/v1).
  • Results so far (2026-06-12): type-check PASS, full vitest 645/645 PASS, cargo fmt PASS, cargo check PASS, catalog drift PASS (3 pre-existing MISSING_CATALOG warnings, exit 0, identical on HEAD). Focused backend cargo tests running (first run hit the known slow test-compile on .116 at 400s timeout; rerunning with 1500s).
  • AIUI embed verified end-to-end via playwright on :8101 (mock backend): iframe loads, ready handshake clears the loading overlay, hideClose honored.
  • Release flow confirmed: commit all → scripts/create-release.sh 1.7.89-alpha (validates curated CHANGELOG section, builds, manifests, commits, tags) → scripts/publish-release-assets.sh 1.7.89-alpha gitea-vps2 → push origin main + tags. Tarball layout/perms safety is already inside create-release-manifest.sh.
  • CHANGELOG v1.7.89 section rewritten layman-readable (updater fixes added).

Release gates for v1.7.89-alpha (task 6)

  1. All harness stages green locally.
  2. OTA fix for stuck update_in_progress included + .116 updates successfully to the new release.
  3. Frontend build: grep packaged tarball for "1.7.89-alpha" before shipping (memory: silent vue-tsc failures).
  4. Flat tarball layout (tar -C web/dist/neode-ui .).
  5. Commit, tag v1.7.89-alpha, push origin + gitea-local + tags, publish release assets, verify manifest + node OTA picks it up.