From 4faac9cb7411feddcd6ee05add75bf4b7d78ae9d Mon Sep 17 00:00:00 2001 From: archipelago Date: Thu, 23 Apr 2026 09:14:36 -0400 Subject: [PATCH] docs(resume): add RESUME.md for context-restart recovery Consolidated single-file snapshot of plan + progress for a fresh OpenCode session to pick up the install UX polish work: - Where we are: v1.7.43-alpha shipped, 5 commits on main, deployed to .228, browser verification in progress. - Immediate next step: await user's verification results from https://192.168.1.228/ browser checklist. - Working layout: SSHFS mount, ssh archy / archy228, deploy recipes. - Architecture patterns: async-spawn lifecycle, phase-based install progress, scanner kick, .23 auto-purge migration. - Backlog: Vaultwarden exit-on-start, install log perms, 22 stale cargo test failures, historical changelog entries left intact. - User preferences: "best long-term first", one-by-one, no push, Bitcoin-only, conventional commits. Complements STATUS.md (which remains the engineering log) with a tighter resume-the-work narrative focused on the current round. --- docs/RESUME.md | 179 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 179 insertions(+) create mode 100644 docs/RESUME.md diff --git a/docs/RESUME.md b/docs/RESUME.md new file mode 100644 index 00000000..4123199e --- /dev/null +++ b/docs/RESUME.md @@ -0,0 +1,179 @@ +# RESUME — Install UX polish round (v1.7.43-alpha) + +Last updated: 2026-04-23 + +Read this first if you're a fresh OpenCode session resuming the install/uninstall/update UX work. + +--- + +## Where we are right now + +**v1.7.43-alpha shipped and deployed to .228**. 5 commits on `.116:main`, unpushed per user mirror protocol. + +Commits (newest first): +- `f9fef8d2` docs(status): record rounds 3-5 + config migration + changelog as shipped +- `008da477` docs(changelog): add v1.7.43-alpha entry covering async lifecycle + .23 retirement +- `0ee16820` fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs +- `22052325` chore: retire .23 VPS mirror, promote .168 OVH to primary +- `f86d86c3` fix(install): kick scanner post-install so Launch button appears immediately +- `8cc84ebc` feat(install): phase-based progress bar replaces unparseable pull bytes +- `2d5b859e` feat(rpc): async-spawn install/uninstall/update lifecycle (Round 2) +- `0733ac40` fix(ui): shorten install/uninstall/update timeouts for async RPCs (Round 2) +- `e471ef75` fix(rpc): empty icon in transient install entry (Round 2) + +**Deployed artifacts on .228**: +- Backend: `/usr/local/bin/archipelago` md5 `d2b619949f19815faaeab10429e36ba0` +- Frontend: `/opt/archipelago/web-ui/` (v1.7.43-alpha changelog + .168-only registry) +- Rollback backups: `/usr/local/bin/archipelago.bak-pre-async-install` + `/opt/archipelago/web-ui.bak-pre-async-install/` + +**Rollback command** (if catastrophic): +``` +ssh archy228 'sudo cp -a /usr/local/bin/archipelago.bak-pre-async-install /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-install/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx' +``` + +--- + +## Immediate next step + +**User is doing browser verification on https://192.168.1.228/** right now. Checklist they were given: + +1. Settings → About: top changelog entry reads "v1.7.43-alpha · Apr 23, 2026" with 4 bullets starting "Installing, updating, and removing apps no longer freezes…". Hard-refresh (Cmd+Shift+R) if stale. +2. Settings → App Registries: only `146.59.87.168:3000/lfg2025` + `git.tx1138.com`. No .23. +3. Settings → System Update → Update Mirrors: only `.168` (Server 1 primary) + `tx1138` (Server 2). No .23. +4. Install SearXNG (small, fast image). Expect: instant button response, 7 phase labels in progress bar (Preparing → Pulling image → Creating container → Starting container → Waiting for healthy → Finalizing → Done), Launch button appears within ~3s of "Done". +5. Uninstall: snappy, no freeze. + +**When user reports back**: if any item fails, investigate that item. If everything passes, ask user for the next install/uninstall/update UX issue to tackle (they work "one by one in order"). + +--- + +## Overall mission (unchanged) + +User mandate: _"best server containers in the world"_. Polish install/uninstall/update flows for all 6 bundled server containers + marketplace apps before release. Tackle UX issues one by one in order. + +--- + +## Working layout — SSH + SSHFS + +- SSHFS mount: `/Users/dorian/mnt/archy-thinkpad/` → `archy:Projects/archy/`. Use for all file ops (read/edit/write/glob/grep). +- Direct SSH: `ssh archy` (= `archipelago@192.168.1.116`, ThinkPad dev). Use for git/cargo/npm/systemctl. +- Demo node: `ssh archy228` (= `archipelago@192.168.1.228`). NOPASSWD sudo. Dashboard login pw: `password123`. +- Sudo pw on .116: `ThisIsWeb54321@`. Fallback sudo pw on .228: `archipelago`. +- Cargo: `~/.cargo/bin/cargo` on .116. Long builds: `nohup ... & disown` to `/tmp/cargo-build-*.log`. +- SSHFS flake: `write` sometimes returns `NotFound: FileSystem.readFile` on new files — retry once. + +## Deploy recipes + +**Backend binary** (can't cp while running — "Text file busy"; binary ferries via Mac because .116 can't resolve archy228): +``` +# On .116: +~/.cargo/bin/cargo build --release # ~3.5 min +# From Mac: +scp archy:Projects/archy/core/target/release/archipelago /tmp/archipelago-new +scp /tmp/archipelago-new archy228:/tmp/archipelago-new +ssh archy228 'sudo systemctl stop archipelago && sudo cp /tmp/archipelago-new /usr/local/bin/archipelago && sudo systemctl start archipelago && sudo systemctl reload nginx' +``` + +**Frontend** (rsyncs via Mac): +``` +ssh archy 'cd ~/Projects/archy/neode-ui && npm run build' # outputs to ../web/dist/neode-ui/ +rsync -az --delete archy:Projects/archy/web/dist/neode-ui/ /tmp/archy-web/ +rsync -az /tmp/archy-web/ archy228:/tmp/archy-web/ +ssh archy228 'sudo rsync -a --delete /tmp/archy-web/ /opt/archipelago/web-ui/ && sudo systemctl reload nginx' +``` + +Note: frontend source is `neode-ui/` (has package.json). `web/` has no package.json; `web/dist/neode-ui/` is the build output. + +## Commit protocol + +- Never push. User mirrors to Gitea remotes manually. +- Conventional Commits. No em-dashes or fancy punctuation. +- Multi-line messages via `tmp-commit-msg.txt`: `git commit -F tmp-commit-msg.txt && rm tmp-commit-msg.txt`. +- Git remotes on .116: `gitea-local`, `gitea-vps2` (.168 OVH), `tx1138` (canonical), `origin` (multi-push alias). `.23` URLs were removed from origin and `gitea-vps` remote was deleted — working-copy change, not in any commit. + +## Verification gates + +1. `cargo check` +2. `cargo test -p archipelago --bin archipelago ` (MUST use `--bin archipelago`; no lib target) +3. `cargo build --release` + +Known issues: +- `rust-lld: undefined hidden symbol` → cargo bug with test+release incremental collision. Fix: `rm -rf core/target/debug/incremental` and retry. +- 22 pre-existing `cargo test` failures in unrelated modules (mesh/wallet/credentials/avatar/session/transport/update-mirrors/fips/identity_manager/image_versions). Not blocking. Tech debt. + +--- + +## Architecture — locked-in patterns + +### Async-spawn lifecycle (install/update/uninstall/start/stop/restart) + +- RPC returns `{status, package_id}` immediately (15s client timeout). +- Wrapper flips state to transitional variant (Installing/Updating/Removing/Stopping/Starting/Restarting) BEFORE spawn. +- `tokio::spawn` runs existing monolithic inner handler with `self: Arc`. +- Install/update success: MUST explicitly write terminal Running state. `merge_preserving_transitional` in `server.rs` refuses to let scanner overwrite transitional states. +- Uninstall success: inner handler removes the entry itself. +- On error: revert pre-transition state (or remove entry for install). + +Key files: +- `core/archipelago/src/api/rpc/package/async_lifecycle.rs` — full install/update/uninstall wrappers +- `core/archipelago/src/api/rpc/transitional.rs` — start/stop/restart wrappers +- `core/archipelago/src/server.rs:832-871` — `merge_preserving_transitional`, `is_transitional` +- `core/archipelago/src/server.rs:295-380` — scan loop with `tokio::select!` and tick bump + +### Install progress (phase-based, 7 levels) + +- `podman pull` emits zero parseable progress when stderr is piped (no TTY). Legacy byte regex never matched. +- Phases + UI %: Preparing (5) → PullingImage (20) → CreatingContainer (70) → StartingContainer (80) → WaitingHealthy (88) → PostInstall (95) → Done (100). +- UI bar only advances forward (`Math.max`). +- Final phase label is "Finalizing…" (renamed from "Running post-install…" which confused users). + +Key files: +- `neode-ui/src/stores/server.ts:25-33` — `PHASE_INFO` mapper + +### Scanner kick (instant Launch button) + +- Scan runs every 60s. Post-install state flipped to Running but skeletal manifest (`interfaces: None`) persisted until next scan → `canLaunch(pkg)` false for up to 60s. +- `lan_address` derived from live container port bindings. `manifest.interfaces.main.ui` only populated when `lan_address.is_some() || tor_address.is_some()`. +- Fix: `scan_kick: Arc` + `scan_tick: Arc>` on `RpcHandler`. Scan loop `tokio::select!` between 60s tick + notify. `kick_scanner_and_wait` helper (2s timeout) called in install/update success paths BEFORE writing Running. Merge during Installing keeps state + takes fresh manifest. + +Key files: +- `core/archipelago/src/api/rpc/mod.rs:89-93` — fields on RpcHandler; accessors :186-199 +- `core/archipelago/src/api/rpc/package/async_lifecycle.rs:405-430` — `kick_scanner_and_wait` +- `core/archipelago/src/container/docker_packages.rs:132-218` — where `lan_address` + manifest get populated +- `neode-ui/src/views/apps/appsConfig.ts:106-111` — `canLaunch(pkg)` +- `neode-ui/src/views/apps/AppCard.vue:141-149` — Launch button render + +### Config migration (.23 auto-purge) + +- `load_mirrors` + `load_registries` normally only ADD missing defaults ("explicit removals stick"). +- .23 was a default the user never chose, so we need the opposite: strip it. +- `.retain(|m| !m.url.contains("23.182.128.160"))` before defaults-merge step. Narrow-scope exception, commented in-code. +- Triggers lazily on next load (install RPC, update RPC, Settings UI open). Not tied to boot. + +Key files: +- `core/archipelago/src/container/registry.rs` — `load_registries` +- `core/archipelago/src/update.rs` — `load_mirrors` + +--- + +## Backlog — after v1.7.43 verification + +1. User reports browser-verification results. Fix anything that fails. +2. Continue user's "one by one" install/uninstall/update UX queue — ask for next issue. +3. Tech debt (low priority, not blocking release): + - Vaultwarden container exits immediately on start (separate container-config issue). + - `/var/log/archipelago-container-installs.log` never created — backend runs non-root. Tmpfiles.d rule or path change. + - 22 pre-existing cargo test failures in unrelated modules. + - "Server 3 (OVH)" historical changelog entries in `AccountInfoSection.vue` left intact (user approved — they're release notes for what shipped at the time). + +--- + +## User preferences (must follow) + +- Always state which option is "best long-term" first and explain why in plain terms. Trust my recommendation unless overridden. +- "Tackle them one by one in order" — fix issues sequentially, not in a big bang. +- Bitcoin-only. No altcoins, no proprietary deps without approval. +- Prefer established OSS, crypto-first libs (rustls, argon2, ed25519), privacy-focused (no telemetry), minimal dep trees. +- Atomic commits. +- Never commit secrets. Pin dependency versions. +- Never push — user mirrors to Gitea manually.