docs(resume): add RESUME.md for context-restart recovery

Consolidated single-file snapshot of plan + progress for a fresh
OpenCode session to pick up the install UX polish work:

- Where we are: v1.7.43-alpha shipped, 5 commits on main, deployed
  to .228, browser verification in progress.
- Immediate next step: await user's verification results from
  https://192.168.1.228/ browser checklist.
- Working layout: SSHFS mount, ssh archy / archy228, deploy recipes.
- Architecture patterns: async-spawn lifecycle, phase-based install
  progress, scanner kick, .23 auto-purge migration.
- Backlog: Vaultwarden exit-on-start, install log perms, 22 stale
  cargo test failures, historical changelog entries left intact.
- User preferences: "best long-term first", one-by-one, no push,
  Bitcoin-only, conventional commits.

Complements STATUS.md (which remains the engineering log) with a
tighter resume-the-work narrative focused on the current round.
archipelago 2026-04-23 09:14:36 -04:00
parent f9fef8d2cc
commit 013e8df077

docs/RESUME.md Normal file

@@ -0,0 +1,179 @@
# RESUME — Install UX polish round (v1.7.43-alpha)
Last updated: 2026-04-23
Read this first if you're a fresh OpenCode session resuming the install/uninstall/update UX work.
---
## Where we are right now
**v1.7.43-alpha shipped and deployed to .228**. 5 commits on `.116:main`, unpushed per user mirror protocol.
Commits (newest first):
- `f9fef8d2` docs(status): record rounds 3-5 + config migration + changelog as shipped
- `008da477` docs(changelog): add v1.7.43-alpha entry covering async lifecycle + .23 retirement
- `0ee16820` fix(config): auto-purge decommissioned .23 VPS from saved registry/mirror configs
- `22052325` chore: retire .23 VPS mirror, promote .168 OVH to primary
- `f86d86c3` fix(install): kick scanner post-install so Launch button appears immediately
- `8cc84ebc` feat(install): phase-based progress bar replaces unparseable pull bytes
- `2d5b859e` feat(rpc): async-spawn install/uninstall/update lifecycle (Round 2)
- `0733ac40` fix(ui): shorten install/uninstall/update timeouts for async RPCs (Round 2)
- `e471ef75` fix(rpc): empty icon in transient install entry (Round 2)
**Deployed artifacts on .228**:
- Backend: `/usr/local/bin/archipelago` md5 `d2b619949f19815faaeab10429e36ba0`
- Frontend: `/opt/archipelago/web-ui/` (v1.7.43-alpha changelog + .168-only registry)
- Rollback backups: `/usr/local/bin/archipelago.bak-pre-async-install` + `/opt/archipelago/web-ui.bak-pre-async-install/`
**Rollback command** (if catastrophic):
```
ssh archy228 'sudo cp -a /usr/local/bin/archipelago.bak-pre-async-install /usr/local/bin/archipelago && sudo rsync -a --delete /opt/archipelago/web-ui.bak-pre-async-install/ /opt/archipelago/web-ui/ && sudo systemctl restart archipelago && sudo systemctl reload nginx'
```
---
## Immediate next step
**User is doing browser verification on https://192.168.1.228/** right now. Checklist they were given:
1. Settings → About: top changelog entry reads "v1.7.43-alpha · Apr 23, 2026" with 4 bullets starting "Installing, updating, and removing apps no longer freezes…". Hard-refresh (Cmd+Shift+R) if stale.
2. Settings → App Registries: only `146.59.87.168:3000/lfg2025` + `git.tx1138.com`. No .23.
3. Settings → System Update → Update Mirrors: only `.168` (Server 1 primary) + `tx1138` (Server 2). No .23.
4. Install SearXNG (small, fast image). Expect: instant button response, 7 phase labels in progress bar (Preparing → Pulling image → Creating container → Starting container → Waiting for healthy → Finalizing → Done), Launch button appears within ~3s of "Done".
5. Uninstall: snappy, no freeze.
**When user reports back**: if any item fails, investigate that item. If everything passes, ask user for the next install/uninstall/update UX issue to tackle (they work "one by one in order").
---
## Overall mission (unchanged)
User mandate: _"best server containers in the world"_. Polish install/uninstall/update flows for all 6 bundled server containers + marketplace apps before release. Tackle UX issues one by one in order.
---
## Working layout — SSH + SSHFS
- SSHFS mount: `/Users/dorian/mnt/archy-thinkpad/` → `archy:Projects/archy/`. Use for all file ops (read/edit/write/glob/grep).
- Direct SSH: `ssh archy` (= `archipelago@192.168.1.116`, ThinkPad dev). Use for git/cargo/npm/systemctl.
- Demo node: `ssh archy228` (= `archipelago@192.168.1.228`). NOPASSWD sudo. Dashboard login pw: `password123`.
- Sudo pw on .116: `ThisIsWeb54321@`. Fallback sudo pw on .228: `archipelago`.
- Cargo: `~/.cargo/bin/cargo` on .116. Long builds: `nohup ... & disown` to `/tmp/cargo-build-*.log`.
- SSHFS flake: `write` sometimes returns `NotFound: FileSystem.readFile` on new files — retry once.
## Deploy recipes
**Backend binary** (can't cp while running — "Text file busy"; binary ferries via Mac because .116 can't resolve archy228):
```
# On .116:
~/.cargo/bin/cargo build --release # ~3.5 min
# From Mac:
scp archy:Projects/archy/core/target/release/archipelago /tmp/archipelago-new
scp /tmp/archipelago-new archy228:/tmp/archipelago-new
ssh archy228 'sudo systemctl stop archipelago && sudo cp /tmp/archipelago-new /usr/local/bin/archipelago && sudo systemctl start archipelago && sudo systemctl reload nginx'
```
**Frontend** (rsyncs via Mac):
```
ssh archy 'cd ~/Projects/archy/neode-ui && npm run build' # outputs to ../web/dist/neode-ui/
rsync -az --delete archy:Projects/archy/web/dist/neode-ui/ /tmp/archy-web/
rsync -az /tmp/archy-web/ archy228:/tmp/archy-web/
ssh archy228 'sudo rsync -a --delete /tmp/archy-web/ /opt/archipelago/web-ui/ && sudo systemctl reload nginx'
```
Note: frontend source is `neode-ui/` (has package.json). `web/` has no package.json; `web/dist/neode-ui/` is the build output.
## Commit protocol
- Never push. User mirrors to Gitea remotes manually.
- Conventional Commits. No em-dashes or fancy punctuation.
- Multi-line messages via `tmp-commit-msg.txt`: `git commit -F tmp-commit-msg.txt && rm tmp-commit-msg.txt`.
- Git remotes on .116: `gitea-local`, `gitea-vps2` (.168 OVH), `tx1138` (canonical), `origin` (multi-push alias). `.23` URLs were removed from origin and `gitea-vps` remote was deleted — working-copy change, not in any commit.
## Verification gates
1. `cargo check`
2. `cargo test -p archipelago --bin archipelago <filter>` (MUST use `--bin archipelago`; no lib target)
3. `cargo build --release`
Known issues:
- `rust-lld: undefined hidden symbol` → cargo bug with test+release incremental collision. Fix: `rm -rf core/target/debug/incremental` and retry.
- 22 pre-existing `cargo test` failures in unrelated modules (mesh/wallet/credentials/avatar/session/transport/update-mirrors/fips/identity_manager/image_versions). Not blocking. Tech debt.
---
## Architecture — locked-in patterns
### Async-spawn lifecycle (install/update/uninstall/start/stop/restart)
- RPC returns `{status, package_id}` immediately (15s client timeout).
- Wrapper flips state to transitional variant (Installing/Updating/Removing/Stopping/Starting/Restarting) BEFORE spawn.
- `tokio::spawn` runs existing monolithic inner handler with `self: Arc<Self>`.
- Install/update success: MUST explicitly write terminal Running state. `merge_preserving_transitional` in `server.rs` refuses to let scanner overwrite transitional states.
- Uninstall success: inner handler removes the entry itself.
- On error: revert pre-transition state (or remove entry for install).
Key files:
- `core/archipelago/src/api/rpc/package/async_lifecycle.rs` — full install/update/uninstall wrappers
- `core/archipelago/src/api/rpc/transitional.rs` — start/stop/restart wrappers
- `core/archipelago/src/server.rs:832-871`: `merge_preserving_transitional`, `is_transitional`
- `core/archipelago/src/server.rs:295-380` — scan loop with `tokio::select!` and tick bump
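A minimal sketch of the lifecycle rules above, not the real code. Assumptions: `std::thread` stands in for `tokio::spawn`, the `Entry`/`Db` types and the `install_async` helper are hypothetical, `Stopped` is an invented non-transitional state, and the signatures of `is_transitional` and `merge_preserving_transitional` are guesses (only the names come from `server.rs`).

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Transitional variants are the ones listed in this doc; `Running` and
// `Stopped` plus the `Entry` shape are assumptions made for the sketch.
#[allow(dead_code)] // not every variant is constructed here
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Installing,
    Updating,
    Removing,
    Stopping,
    Starting,
    Restarting,
    Running,
    Stopped,
}

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    state: State,
    manifest: Option<String>,
}

type Db = Arc<Mutex<HashMap<String, Entry>>>;

fn is_transitional(state: State) -> bool {
    !matches!(state, State::Running | State::Stopped)
}

/// Scanner-side merge: never overwrite a transitional state, but do take
/// the freshly scanned manifest.
fn merge_preserving_transitional(existing: &Entry, scanned: Entry) -> Entry {
    if is_transitional(existing.state) {
        Entry { state: existing.state, manifest: scanned.manifest }
    } else {
        scanned
    }
}

/// Install wrapper: flip to Installing BEFORE spawning, then let the worker
/// write the terminal state, or remove the entry on failure.
fn install_async<F>(db: &Db, id: &str, inner: F) -> JoinHandle<()>
where
    F: FnOnce() -> Result<(), String> + Send + 'static,
{
    db.lock().unwrap().insert(
        id.to_string(),
        Entry { state: State::Installing, manifest: None },
    );
    let (db, id) = (Arc::clone(db), id.to_string());
    thread::spawn(move || {
        // Run the slow handler without holding the lock.
        let result = inner();
        let mut map = db.lock().unwrap();
        match result {
            // Success must write Running explicitly; the scanner merge will not.
            Ok(()) => {
                if let Some(entry) = map.get_mut(&id) {
                    entry.state = State::Running;
                }
            }
            // Failed install: nothing to revert to, so drop the transient entry.
            Err(_) => {
                map.remove(&id);
            }
        }
    })
}
```

The caller gets control back as soon as the entry reads `Installing`; joining the handle is only needed in tests. Update and uninstall would differ in the error arm (revert to the pre-transition state instead of removing).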
### Install progress (phase-based, 7 levels)
- `podman pull` emits zero parseable progress when stderr is piped (no TTY). Legacy byte regex never matched.
- Phases + UI %: Preparing (5) → PullingImage (20) → CreatingContainer (70) → StartingContainer (80) → WaitingHealthy (88) → PostInstall (95) → Done (100).
- UI bar only advances forward (`Math.max`).
- Final phase label is "Finalizing…" (renamed from "Running post-install…" which confused users).
Key files:
- `neode-ui/src/stores/server.ts:25-33`: `PHASE_INFO` mapper
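For reference, the phase table and the forward-only rule can be sketched as below. The real mapper is the TypeScript `PHASE_INFO` in `server.ts`; this Rust version only mirrors the numbers listed above, and `InstallPhase`, `percent`, and `advance` are hypothetical names.

```rust
// Phase names and percentages are taken from the list above; the enum and
// function names are illustrative only.
#[allow(dead_code)] // not every variant is constructed here
#[derive(Debug, Clone, Copy, PartialEq)]
enum InstallPhase {
    Preparing,
    PullingImage,
    CreatingContainer,
    StartingContainer,
    WaitingHealthy,
    PostInstall,
    Done,
}

impl InstallPhase {
    /// UI percentage shown when this phase is reported.
    fn percent(self) -> u8 {
        match self {
            InstallPhase::Preparing => 5,
            InstallPhase::PullingImage => 20,
            InstallPhase::CreatingContainer => 70,
            InstallPhase::StartingContainer => 80,
            InstallPhase::WaitingHealthy => 88,
            InstallPhase::PostInstall => 95,
            InstallPhase::Done => 100,
        }
    }
}

/// Forward-only update, the equivalent of `Math.max(shown, next)` in the UI:
/// a late or out-of-order phase event never moves the bar backwards.
fn advance(shown: u8, phase: InstallPhase) -> u8 {
    shown.max(phase.percent())
}
```

With this rule, a stale `PullingImage` event arriving after `CreatingContainer` leaves the bar at 70.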
### Scanner kick (instant Launch button)
- Scan runs every 60s. Post-install state flipped to Running but skeletal manifest (`interfaces: None`) persisted until next scan → `canLaunch(pkg)` false for up to 60s.
- `lan_address` derived from live container port bindings. `manifest.interfaces.main.ui` only populated when `lan_address.is_some() || tor_address.is_some()`.
- Fix: `scan_kick: Arc<Notify>` + `scan_tick: Arc<watch::Sender<u64>>` on `RpcHandler`. Scan loop `tokio::select!` between 60s tick + notify. `kick_scanner_and_wait` helper (2s timeout) called in install/update success paths BEFORE writing Running. Merge during Installing keeps state + takes fresh manifest.
Key files:
- `core/archipelago/src/api/rpc/mod.rs:89-93` — fields on RpcHandler; accessors :186-199
- `core/archipelago/src/api/rpc/package/async_lifecycle.rs:405-430`: `kick_scanner_and_wait`
- `core/archipelago/src/container/docker_packages.rs:132-218` — where `lan_address` + manifest get populated
- `neode-ui/src/views/apps/appsConfig.ts:106-111`: `canLaunch(pkg)`
- `neode-ui/src/views/apps/AppCard.vue:141-149` — Launch button render
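The kick-and-wait idea can be sketched with std primitives. This is not the real implementation: the backend uses `tokio::select!` over a 60s tick and an `Arc<Notify>`, with a `watch::Sender<u64>` as the tick counter. Here `mpsc::recv_timeout` plays the role of the select, a `Mutex<u64>` plus `Condvar` plays the role of the watch channel, and `ScanTick`, `ScannerHandle`, `spawn_scanner`, and `kick_and_wait` are hypothetical names.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Scan counter plus a condvar that wakes waiters whenever it is bumped.
#[derive(Default)]
struct ScanTick {
    count: Mutex<u64>,
    bumped: Condvar,
}

/// Handle held by the RPC side (stand-in for the fields on `RpcHandler`).
struct ScannerHandle {
    kick: Sender<()>,
    tick: Arc<ScanTick>,
}

/// Scan loop: run `scan` every `interval`, or immediately when kicked.
fn spawn_scanner<F>(interval: Duration, mut scan: F) -> ScannerHandle
where
    F: FnMut() + Send + 'static,
{
    let (kick, kicked) = mpsc::channel::<()>();
    let tick = Arc::new(ScanTick::default());
    let shared = Arc::clone(&tick);
    thread::spawn(move || loop {
        match kicked.recv_timeout(interval) {
            // Either a kick arrived or the periodic interval elapsed.
            Ok(()) | Err(RecvTimeoutError::Timeout) => {}
            // All handles dropped: stop scanning.
            Err(RecvTimeoutError::Disconnected) => break,
        }
        scan();
        // Bump the tick only after the scan finished, then wake waiters.
        *shared.count.lock().unwrap() += 1;
        shared.bumped.notify_all();
    });
    ScannerHandle { kick, tick }
}

impl ScannerHandle {
    /// Kick the scanner and wait until one more scan completes.
    /// Returns false if the timeout expires first.
    fn kick_and_wait(&self, timeout: Duration) -> bool {
        let before = *self.tick.count.lock().unwrap();
        if self.kick.send(()).is_err() {
            return false;
        }
        let guard = self.tick.count.lock().unwrap();
        let (_guard, result) = self
            .tick
            .bumped
            .wait_timeout_while(guard, timeout, |count| *count <= before)
            .unwrap();
        !result.timed_out()
    }
}
```

In the install success path this would be called with a 2s timeout before writing Running, so the entry already carries the fresh manifest when the UI sees the state change. A timeout is not fatal: the next periodic scan fills in the manifest.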
### Config migration (.23 auto-purge)
- `load_mirrors` + `load_registries` normally only ADD missing defaults ("explicit removals stick").
- .23 was a default the user never chose, so we need the opposite: strip it.
- `.retain(|m| !m.url.contains("23.182.128.160"))` before defaults-merge step. Narrow-scope exception, commented in-code.
- Triggers lazily on next load (install RPC, update RPC, Settings UI open). Not tied to boot.
Key files:
- `core/archipelago/src/container/registry.rs`: `load_registries`
- `core/archipelago/src/update.rs`: `load_mirrors`
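A sketch of the purge-then-merge order, under assumptions: `Mirror`, `RETIRED_HOST`, and `purge_then_merge_defaults` are hypothetical names, the defaults merge is simplified to "append any default URL not already present", and how the real loaders remember explicit removals is not modelled. Only the retired IP and the `.retain` call come from the notes above.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Mirror {
    url: String,
}

/// IP of the decommissioned .23 VPS, as used in the retain filter above.
const RETIRED_HOST: &str = "23.182.128.160";

/// Strip the retired host first, then append defaults that are missing.
/// The purge must run before the merge so it also covers saved configs
/// written while .23 was still a default.
fn purge_then_merge_defaults(mut saved: Vec<Mirror>, defaults: &[Mirror]) -> Vec<Mirror> {
    // Narrow-scope exception: remove an entry the user never chose.
    saved.retain(|m| !m.url.contains(RETIRED_HOST));

    // Simplified stand-in for the normal "add missing defaults" step.
    for default in defaults {
        if !saved.iter().any(|m| m.url == default.url) {
            saved.push(default.clone());
        }
    }
    saved
}
```

Because the match is a plain substring test on the full IP, it catches the host regardless of scheme, port, or path in the saved URL.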
---
## Backlog — after v1.7.43 verification
1. User reports browser-verification results. Fix anything that fails.
2. Continue user's "one by one" install/uninstall/update UX queue — ask for next issue.
3. Tech debt (low priority, not blocking release):
- Vaultwarden container exits immediately on start (separate container-config issue).
- `/var/log/archipelago-container-installs.log` never created — backend runs non-root. Tmpfiles.d rule or path change.
- 22 pre-existing cargo test failures in unrelated modules.
- "Server 3 (OVH)" historical changelog entries in `AccountInfoSection.vue` left intact (user approved — they're release notes for what shipped at the time).
---
## User preferences (must follow)
- Always state which option is "best long-term" first and explain why in plain terms. Trust my recommendation unless overridden.
- "Tackle them one by one in order" — fix issues sequentially, not in a big bang.
- Bitcoin-only. No altcoins, no proprietary deps without approval.
- Prefer established OSS, crypto-first libs (rustls, argon2, ed25519), privacy-focused (no telemetry), minimal dep trees.
- Atomic commits.
- Never commit secrets. Pin dependency versions.
- Never push — user mirrors to Gitea manually.