# RESUME — Rust orchestrator migration, Step 8b
Last updated: 2026-04-23 (evening, post-architecture-audit)
Read this first if you're a fresh OpenCode session resuming work. Paste the "Resume prompt" below verbatim.
---
## Resume prompt (paste this into a new opencode session)
> We are mid-migration: `docs/rust-orchestrator-migration.md` + `docs/bulletproof-containers.md` are the plan, Steps 1-7 + 8a are shipped on `main`, Step 8b is next. Read `docs/RESUME.md` + `docs/STEP-8B-PORT-AUDIT.md` in full. Do NOT run any container mutations or edit `scripts/container-specs.sh`, `scripts/first-boot-containers.sh`, or `scripts/reconcile-containers.sh`; those are dead code scheduled for deletion in Step 8c. Work happens in `core/container/src/manifest.rs`, `core/archipelago/src/container/prod_orchestrator.rs`, and `apps/<id>/manifest.yml`. Summarize back to me what you understand the current state to be, and wait for approval before touching anything.
---
## Standing directive from the user
> Please get back to a well architected, minimal as possible, perfect working container architecture. If we've gone off track and the system is getting complex rather than elegant and perfect best containers ever then we need to review all the current state of the system and get back to making the best container system ever and according to our projects goals. We will be working on this until it's perfect.
**Interpretation (validated with the user):** resume the Rust orchestrator migration. Stop patching bash scripts. The bash scripts were supposed to be deleted three months' worth of commits ago, and we drifted into maintaining them by accident.
## Latest user comment (must be followed)
> please continue, please state my last comment in the resume doc and first before making this plan to adhere to
Adherence rule for this session:
- Before proposing or executing a plan, first record the user's latest directive in `docs/RESUME.md`.
- Keep work aligned to Step 8 migration goals and avoid off-scope drift.
Most recent directive:
> And we need to get every container working on .116 and tested before we release
Release gate update:
- `.116` must have all required containers healthy and tested before release is allowed.
- Treat runtime stabilization on `.116` as immediate priority while continuing Step 8 migration work.
---
## Where we actually are
### Shipped (Steps 1-7 + 8a)
Commits on `main` (unpushed to `origin`/tx1138 until release gate; user-visible history):
| Step | Commit | What |
|------|--------|------|
| 1 | (schema in place from earlier commits) | `ContainerConfig.image` / `ContainerConfig.build`: mutually exclusive pull-or-build source |
| 2 | `34af4d9d` | `ContainerRuntime` trait gains `image_exists` + `build_image`; `PodmanRuntime` impl |
| 3 | `b6a04d31` | `ProdContainerOrchestrator` with build-or-pull + adoption + reconcile |
| 4 | `e8a59c93` | `ContainerOrchestrator` trait; `RpcHandler` uses it in prod |
| 5 | `fc39b04b` | `BootReconciler` — periodic reconcile loop |
| 6 | `48f08aa3` | Wire both into `main.rs` |
| 7 | `069bc4a5` | `bitcoin-ui` pre-start hook renders `nginx.conf` from embedded template (the pattern for "derived config" at apply time) |
| 8a | `a0707f4d`, `1c81a739` | Retire `archipelago-reconcile` systemd timer; split Step 8 into 8a/8b/8c |
Three `apps/*/manifest.yml` files are genuinely ported and running under the Rust orchestrator on `.116` + `.228`: `bitcoin-ui`, `electrs-ui`, `lnd-ui` (Step 7).
### Where we drifted (the session that produced the previous RESUME.md)
On 2026-04-23 a fedimint outage on `.116` pulled a session into patching `scripts/reconcile-containers.sh`, `scripts/container-specs.sh`, `scripts/first-boot-containers.sh` — files that Step 8c is scheduled to delete. Five bugs deep, the user halted the session. That cluster of bugs is a symptom of running two incompatible codepaths in parallel (bash first-boot/reconcile + Rust `BootReconciler`), which is exactly the condition Step 8c fixes by deleting the bash half.
**Discard-of-scope decision:** the uncommitted bash edits on `.116` (listed in the previous RESUME.md's "Uncommitted script changes" section) will not be committed. The fedimint mDNS-URLs fix, the filebrowser custom-args fix, and the bcrypt-escape fix all land as changes to `apps/<id>/manifest.yml` + the Rust orchestrator in Steps 8b.0 through 8b.3. See `docs/STEP-8B-PORT-AUDIT.md` for the exact mapping.
### Current container state on `.116`
Running but drifted. See the "Current container state" section in the previous RESUME.md. Decision (approved by user): accept `.116` is limping until 8b.3 lands. Do not run `scripts/reconcile-containers.sh` or any mutations; all rescues go through the Rust orchestrator or wait for the manifest port.
`.228` is happier — it's already adopted by the Rust orchestrator for the three UI apps.
---
## Next step — Step 8b.0
**Concretely:** schema extensions to `core/container/src/manifest.rs` + unit tests. No orchestrator changes, no manifest changes, no container mutations.
Fields to add (justified in `docs/STEP-8B-PORT-AUDIT.md`, § Schema gaps):
- `container.network: Option<String>` — podman `--network` value (`"archy-net"`, `"host"`, or `None` = isolated default).
- `container.custom_args: Vec<String>` — appended to the container command.
- `container.entrypoint: Option<Vec<String>>` — override.
- `container.derived_env: Vec<{key, template}>` — template strings resolved against `HostFacts { host_ip, host_mdns, disk_gb }` at apply time.
- `container.secret_env: Vec<{key, secret_file}>` — read from `/var/lib/archipelago/secrets/<file>` at apply time.
- `container.data_uid: Option<String>`: a `"NNNNN:NNNNN"` owner string, applied via `chown -R` before container create.
- `Volume.volume_type: "tmpfs"` + `Volume.tmpfs_options: String` — OR a new `container.tmpfs: Vec<{target, options}>`. Pick one at implementation time.
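The field list above could be sketched in plain Rust as follows. This is a minimal illustration, not the actual contents of `core/container/src/manifest.rs`: the names `ContainerConfigExt`, `DerivedEnv`, `SecretEnv`, and `network_args` are hypothetical, the real fields would live on `ContainerConfig`, and the tmpfs field is left out because its shape is still undecided. Serde attributes are omitted so the sketch compiles with the standard library alone.

```rust
/// Hypothetical element type for `container.derived_env` (name is an assumption).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DerivedEnv {
    pub key: String,
    pub template: String,
}

/// Hypothetical element type for `container.secret_env` (name is an assumption).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SecretEnv {
    pub key: String,
    pub secret_file: String,
}

/// Stand-in for the new `ContainerConfig` fields. Every field defaults to
/// "absent" so manifests that predate the extension would keep parsing.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ContainerConfigExt {
    pub network: Option<String>,
    pub custom_args: Vec<String>,
    pub entrypoint: Option<Vec<String>>,
    pub derived_env: Vec<DerivedEnv>,
    pub secret_env: Vec<SecretEnv>,
    pub data_uid: Option<String>,
}

impl ContainerConfigExt {
    /// Arguments for podman's `--network` flag, or nothing when the
    /// field is `None` (the isolated default).
    pub fn network_args(&self) -> Vec<String> {
        match &self.network {
            Some(name) => vec!["--network".to_string(), name.clone()],
            None => Vec::new(),
        }
    }
}
```

Deriving `Default` for the whole struct is one way to get the "sensible defaults" the tests ask for: an old manifest that sets none of the new fields maps to empty vectors and `None` values.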
**Tests** (block the commit until green):
- Every existing `apps/*/manifest.yml` still parses (`parse_every_real_manifest` test).
- Each new field parses correctly with sensible defaults.
- `validate()` rejects: empty custom_args elements, empty entrypoint elements, duplicate derived_env keys, derived_env templates referencing unknown host facts, secret_env with `..` or `/` in secret_file (path-traversal guard).
- `resolve_env(HostFacts)` returns expected strings for each supported placeholder.
- `resolve_secret_env(SecretsProvider)` returns expected strings; missing secret file is a hard error.
This is the smallest useful commit and unblocks every port in 8b.1+.
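The `validate()` rejections listed in the tests above could look roughly like the sketch below. It is an illustration under stated assumptions: the `{{name}}` placeholder syntax, the `ValidationError` enum, the free-function signature, and the tuple representation of `derived_env` are all hypothetical and not taken from the real manifest code.

```rust
use std::collections::HashSet;

/// Host facts named in this document; anything else is rejected.
const KNOWN_FACTS: [&str; 3] = ["host_ip", "host_mdns", "disk_gb"];

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    EmptyCustomArg,
    EmptyEntrypointElement,
    DuplicateDerivedKey(String),
    UnknownHostFact(String),
    UnsafeSecretFile(String),
}

/// Collect the names inside `{{...}}` placeholders (assumed syntax).
fn placeholders(template: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                out.push(after[..end].trim().to_string());
                rest = &after[end + 2..];
            }
            None => break, // unterminated placeholder: stop scanning
        }
    }
    out
}

/// `derived_env` is a list of (key, template) pairs in this sketch.
pub fn validate(
    custom_args: &[String],
    entrypoint: Option<&[String]>,
    derived_env: &[(String, String)],
    secret_files: &[String],
) -> Result<(), ValidationError> {
    if custom_args.iter().any(|a| a.is_empty()) {
        return Err(ValidationError::EmptyCustomArg);
    }
    if let Some(parts) = entrypoint {
        if parts.iter().any(|p| p.is_empty()) {
            return Err(ValidationError::EmptyEntrypointElement);
        }
    }
    let mut seen = HashSet::new();
    for (key, template) in derived_env {
        if !seen.insert(key.as_str()) {
            return Err(ValidationError::DuplicateDerivedKey(key.clone()));
        }
        for name in placeholders(template) {
            if !KNOWN_FACTS.contains(&name.as_str()) {
                return Err(ValidationError::UnknownHostFact(name));
            }
        }
    }
    // Path-traversal guard: a secret file must be a bare file name.
    for file in secret_files {
        if file.contains("..") || file.contains('/') {
            return Err(ValidationError::UnsafeSecretFile(file.clone()));
        }
    }
    Ok(())
}
```

Returning on the first failure keeps the sketch short; a real implementation might prefer to collect every error so a manifest author sees all problems in one pass.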
---
## Project ground rules (standing)
- `archy` SSH alias = `.116`. `archy228` = `.228`. **Do not swap.**
- SSHFS at `/Users/dorian/mnt/archy-thinkpad/` = `archy:Projects/archy/`.
- `.116` sudo password: `ThisIsWeb54321@` — works passwordless in-session via `sudo -nS` after first use.
- `.228` has NOPASSWD.
- Git commits on `.116` MUST use `git commit -F /tmp/tmp-msg.txt` over `ssh archy` — SSHFS `git commit` hangs.
- Never push except current release (granted: `gitea-local` + `gitea-vps2`).
- No em-dashes. Conventional Commits.
- No altcoin mentions, Bitcoin-only.
---
## Recommended next action for the fresh session
1. Read this file + `docs/STEP-8B-PORT-AUDIT.md` + the "Open decisions" section of the audit.
2. Answer the four open decisions (or confirm the recommended defaults).
3. Implement 8b.0 commit 1: add `network`, `custom_args`, `entrypoint`, `derived_env`, `secret_env`, `data_uid` fields to `ContainerConfig` + validation + unit tests. Backwards-compat: every existing `apps/*/manifest.yml` must still parse.
4. Run `cargo test -p archipelago-container`, commit once green, then stop.
Do not touch `scripts/*.sh`. Do not run `reconcile-containers.sh`. Do not live-test on `.116` or `.228` until the schema + orchestrator pieces in 8b.0 + 8b.1 are both in.
---
## Recent release (out of scope, for grep context)
v1.7.43-alpha shipped yesterday: tarball-only OTA, async install/uninstall/update lifecycle, install UX polish, `.23` VPS retirement. Manifest at `gitea-local` + `gitea-vps2`. `.228` on the new binary. See `docs/STATUS.md` for the full rundown.
Earlier session notes (container rescue on `.116`, "never fails" directive, env-drift detector experiment) are obsolete — superseded by this file. The directive ("never fails") is honored by the Step 8 migration itself: a declarative manifest regenerated on every reconcile tick can't bake stale IPs into consensus data because the env comes from derived/secret sources that are re-resolved every apply.
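To make the "re-resolved every apply" point concrete, here is a minimal sketch of resolving derived env against the current host facts. The `{{name}}` placeholder syntax, the `u64` type for `disk_gb`, the tuple representation of `derived_env`, and the `resolve_template` helper are assumptions made for illustration; only the `HostFacts` field names and the `resolve_env` name come from this document.

```rust
/// Host facts gathered at apply time. Field names follow this document;
/// the concrete types are assumptions.
#[derive(Debug, Clone)]
pub struct HostFacts {
    pub host_ip: String,
    pub host_mdns: String,
    pub disk_gb: u64,
}

/// Substitute `{{name}}` placeholders (assumed syntax) with current facts.
pub fn resolve_template(template: &str, facts: &HostFacts) -> String {
    template
        .replace("{{host_ip}}", &facts.host_ip)
        .replace("{{host_mdns}}", &facts.host_mdns)
        .replace("{{disk_gb}}", &facts.disk_gb.to_string())
}

/// Resolve every (key, template) pair. Because this runs on each apply,
/// the resulting env always reflects the facts passed in, never a value
/// cached from an earlier run.
pub fn resolve_env(
    derived: &[(String, String)],
    facts: &HostFacts,
) -> Vec<(String, String)> {
    derived
        .iter()
        .map(|(key, template)| (key.clone(), resolve_template(template, facts)))
        .collect()
}
```

Calling `resolve_env` with the same templates but a `HostFacts` whose `host_ip` has changed yields the new address, which is the property the "never fails" directive relies on.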