From 2a2f10608b2644ed1ce552d0d7b3c0a7a40ac202 Mon Sep 17 00:00:00 2001
From: archipelago
Date: Thu, 23 Apr 2026 04:17:56 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20STATUS.md=20=E2=80=94=20.228=20dashboar?=
 =?UTF-8?q?d=20bugs=20fixed=20(macaroon=20+=20ExtraHost)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/STATUS.md | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/docs/STATUS.md b/docs/STATUS.md
index c8ccf919..084c58dd 100644
--- a/docs/STATUS.md
+++ b/docs/STATUS.md
@@ -1,6 +1,6 @@
 # RESUME HERE — Rust orchestrator migration
 
-Updated: 2026-04-23 (Step 9 complete on .228, Step 10 next)
+Updated: 2026-04-23 (Step 9 + .228 dashboard bug fixes complete, Step 10 / chaos matrix next)
 
 **To resume this work, SSH into the ThinkPad and run `opencode` from `~/Projects/archy/`. Or work from the laptop via the SSHFS mount at `~/mnt/archy-thinkpad/`.**
 
@@ -16,15 +16,27 @@ Working through the 11-step plan in [`rust-orchestrator-migration.md`](./rust-or
 - [x] **Step 6** — `48f08aa3` main.rs wire-up (orchestrator construction + adopt_existing + BootReconciler spawn + shutdown Notify)
 - [x] **Step 7** — `069bc4a5` bitcoin-ui pre-start hook + embedded nginx.conf template (8 unit tests + 1 integration test), 39/39 container:: tests pass
 - [x] **Step 8a** — `a0707f4d` retire archipelago-reconcile.{service,timer} + ISO builder touchpoints, keep scripts for update.rs
-- [x] **Step 9** — **Hot-swap on .228 verified.** All three UIs (bitcoin-ui/lnd-ui/electrs-ui) installing + serving HTTP 200. Adoption + reconciler + pre-start hook + dependency ordering all working under the prod code path. See "Step 9 evidence" below.
+- [x] **Step 9** — **Hot-swap on .228 verified.** All three UIs (bitcoin-ui/lnd-ui/electrs-ui) installing + serving HTTP 200.
+- [x] **.228 dashboard bugs** — ExtraHost `192.168.1.254` bug (`3ee192ba`) + LND macaroon permission bug (`be960023`). See "Post-Step 9 bug hunt" below.
 - [ ] **Step 8b** — Port remaining ~25 container creations from `first-boot-containers.sh` into `apps//manifest.yml`, then port `update.rs` to orchestrator (deferred, multi-day work)
 - [ ] **Step 8c** — Rename `first-boot-containers.sh` → `first-boot-setup.sh`, strip container ops, keep setup. Delete `reconcile-containers.sh` + `container-specs.sh`. Add ISO lines to copy `apps/` (final one-way door, requires 8b complete)
 - [ ] **Step 10** — Hot-swap + verify on .116 (adoption-heavy test — .116 already has all containers running)
-- [ ] **Step 11** — Chaos matrix on both nodes
+- [ ] **Step 11** — Chaos matrix on both nodes (all 8 scenarios × all containers incl. bitcoin-core)
+
+## Post-Step 9 bug hunt (.228, 2026-04-23)
+
+User reported three visible dashboard bugs after Step 9 verification:
+1. LND — "no connect details or QR"
+2. ElectrumX — stuck at "Building index (2 KB / ~130 GB)" for days
+3. bitcoin-core — in scope for chaos testing
+
+**Root cause #1 (ExtraHost, commit `3ee192ba`)**: `scripts/first-boot-containers.sh` computed `HOST_GATEWAY` from `ip route show default`, which returns the **LAN router** (e.g. 192.168.1.254), not the gateway to the host. Every container configured with `--add-host=host.containers.internal:$HOST_GATEWAY` was dialing the WiFi router instead of the host. LND crash-looped with `dial tcp 192.168.1.254:8332: connection refused`; ElectrumX's DAEMON_URL hit the same dead end; any `archy-net` bridge consumer of bitcoin-core's RPC was broken. Fixed by replacing the computed value with podman's magic `host-gateway` literal (supported since 4.4; we ship 5.4.2). Live-recreated bitcoin-core/electrumx/lnd on .228 with the corrected `--add-host`; LND reached chain backend; ElectrumX resumed indexing (went from 2 KB → 164.9 MB in under an hour).
+
+**Root cause #2 (macaroon permissions, commit `be960023`)**: LND's `admin.macaroon` lives at `/var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon`, owned by rootless-podman subordinate UID 100000, mode 640. The archipelago server runs as host UID 1000 and literally cannot read the file. Every LND RPC (`getinfo`, `connect-info`, `export-channel-backup`) plus the shared `lnd_client()` helper failed with "Failed to read LND admin macaroon". **Confirmed pre-existing on .116 too** (long-standing bug unrelated to Step 9). Fix: centralised the path as `LND_ADMIN_MACAROON_PATH`, added a `read_lnd_admin_macaroon()` helper in `api/rpc/lnd/mod.rs` that tries direct read first then falls back to `sudo -n cat` (mirrors the pattern already used for Tor onion hostnames). Four call sites routed through the helper. Verified on .228 — `curl -k https:///lnd-connect-info` now returns 200 with cert + macaroon + tor_onion; dashboard QR unblocked.
 
 ## Step 9 evidence (.228, 2026-04-23)
 
-- Binary: `fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes` (`732df1b8`) + `feat(systemd): delegate cgroup controllers` (`ba83f9bc`), built on .116, scp'd to .228 as `/usr/local/bin/archipelago`. Old binary backed up at `/usr/local/bin/archipelago.bak-pre-step9`.
+- Binary: Step 9 build with `732df1b8` + `ba83f9bc`, scp'd to .228 as `/usr/local/bin/archipelago`. Old binary backed up at `/usr/local/bin/archipelago.bak-pre-step9`. Later replaced with macaroon-fix build (`be960023`); previous backed up at `/usr/local/bin/archipelago.bak-pre-macaroon`.
 - DEV_MODE override disabled (`override.conf` → `override.conf.disabled-pre-step9`).
 - `/opt/archipelago/apps/{bitcoin-ui,electrs-ui,lnd-ui}/manifest.yml` populated.
 - `/opt/archipelago/docker/bitcoin-ui/Dockerfile` replaced with the Step 7 version (no `COPY nginx.conf`). Old dir backed up as `bitcoin-ui.bak-pre-step9`.
@@ -34,16 +46,20 @@ Working through the 11-step plan in [`rust-orchestrator-migration.md`](./rust-or
 - `bitcoin-ui nginx.conf rendered path=/var/lib/archipelago/bitcoin-ui/nginx.conf auth_hash=97af1c18` — pre-start hook fires in `install_fresh`
 - `curl localhost:8334` → HTTP 200 (bitcoin-ui), `:8081` → 200 (lnd-ui), `:50002` → 200 (electrs-ui)
 - OCI memory limits correctly applied: bitcoin-ui=128Mi, electrs-ui=128Mi, lnd-ui=64Mi (was emitted as 0 pre-fix)
-- bitcoin-core / filebrowser / lnd / electrumx continue running untouched (prod orchestrator currently only manages apps it has manifests for; Step 8b expands that scope).
 
-## Two bugs found & fixed during Step 9
+## Bugs fixed this session
 
-1. **`parse_memory_limit` truncation bug** (`732df1b8`): lowercased "128Mi" → "128mi" → `trim_end_matches('m')` → "128i" → f64 parse fails → `None.unwrap_or(0)` → OCI `memory.limit:0` → systemd rejects MemoryMax=0 at container start. Every manifest in `apps/` uses IEC suffixes so every ProdContainerOrchestrator install was DOA. Now handles Ki/Mi/Gi/Ti + SI decimal + shorthand + raw bytes; 6 regression tests. Also changed `create_container` to OMIT the memory/cpu fields on absent/unparseable input rather than emitting 0.
-2. **`archipelago.service` cgroup delegation missing** (`ba83f9bc`): not the root cause of Step 9 failures (bug #1 was), but belt-and-braces so future code paths using the `--memory` CLI arg (runtime.rs DockerRuntime) don't hit the same systemd rejection on hosts where podman needs to create libpod scopes inside a non-delegated system-slice subtree. Added `Delegate=memory pids cpu io`.
+1. **`parse_memory_limit` truncation bug** (`732df1b8`): lowercased "128Mi" → "128mi" → `trim_end_matches('m')` → "128i" → f64 parse fails → `None.unwrap_or(0)` → OCI `memory.limit:0` → systemd rejects MemoryMax=0. 6 regression tests; `create_container` now omits instead of emitting 0.
+2. **`archipelago.service` cgroup delegation missing** (`ba83f9bc`): belt-and-braces `Delegate=memory pids cpu io`.
+3. **ExtraHost `192.168.1.254`** (`3ee192ba`): see Post-Step 9 bug hunt above.
+4. **LND admin.macaroon unreadable** (`be960023`): see Post-Step 9 bug hunt above.
 
 ## Commits made this session
 
 ```
+3ee192ba fix(first-boot): use podman host-gateway magic for host.containers.internal
+be960023 fix(lnd): read admin macaroon via sudo fallback
+4b8ef0a0 docs: STATUS.md through Step 9 (.228 hot-swap verified)
 ba83f9bc feat(systemd): delegate cgroup controllers to archipelago.service
 732df1b8 fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes
 a0707f4d refactor: retire archipelago-reconcile.{service,timer} (Step 8a)
@@ -52,11 +68,11 @@ a0707f4d refactor: retire archipelago-reconcile.{service,timer} (Step 8a)
 069bc4a5 feat: bitcoin-ui pre-start hook (Step 7)
 ```
 
-Branch is **17 commits ahead of tx1138/main** (local only — user pushes to mirrors personally).
+Branch is **19 commits ahead of tx1138/main** (local only — user pushes to mirrors personally).
 
 ## Uncommitted state
 
-Clean. Only untracked: `tests/` (bats harness from prior session, not in scope).
+Clean. Only untracked: `tests/` (bats harness from prior session, not in scope), `tmp-dump-spec.py` (scratch).
 
 ## Answered design questions (no need to re-ask)
 