# Chat Transcript And Working Notes

Date: 2026-05-02

This file captures the current chat context, decisions, progress, and next steps so work can continue from another device/session.

## User Request

The user asked to continue hardening the Archipelago app/container lifecycle, then asked several times to save the plan, progress, and next steps, and finally asked to save the entire chat to Markdown.

Key user constraints and corrections:

- Continue if the next steps are clear; ask only if blocked.
- Exhaustively harden the app/container lifecycle before release.
- Preserve data during destructive lifecycle testing unless explicitly instructed otherwise.
- Do not rely on `/app/...` proxy paths for app launch or testing. The user's correction: “we never use paths only ports.”
- LND/Electrum wallet-connect tests must validate real connection details and QR codes, including Tor.

## Earlier Progress Summary

Before the latest work, the project already had substantial lifecycle hardening in progress:

- A remote lifecycle harness exists at `tests/lifecycle/remote-lifecycle.sh`.
- SSH to `.198` works with `/home/archipelago/.ssh/id_ed25519`.
- RPC to `.228` works, but SSH is blocked with `Permission denied (publickey,password)`.
- Multiple backend release binaries were built and deployed to `.198`, with backups in `/usr/local/bin/archipelago.bak-*`.
- Fixed stale package-scanner state recovery from `Removing -> Running` when a container is actually live.
- Fixed startup ordering so crash recovery runs before BootReconciler.
- Removed the dangerous automatic deletion of the Podman runtime directory on `podman info` failure.
- Narrowed generic crash recovery to safe legacy containers.
- Fixed companion reconciliation on install/start/restart.
- Fixed uninstall/reinstall behavior: uninstall now disables manifest apps instead of deleting their manifest availability, and reinstall re-enables them.
- Fixed LND config generation/repair:
  - `bitcoin.active=true`
  - `bitcoin.mainnet=true`
  - `bitcoin.node=bitcoind`
  - `bitcoind.rpchost=bitcoin-knots:8332`
  - sudo fallback for writing container-owned config paths.
- `.198` had previously passed focused lifecycle tests for `filebrowser`, `bitcoin-knots`, and a looser LND launch test.
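
The LND settings listed above can be spot-checked from a shell after a config repair. The helper below is a hypothetical sketch, not part of the repository. It assumes the settings appear as literal `key=value` lines with no spaces around `=`, and that the caller can already read the file. On `.198` reading the file may itself need a sudo fallback, because the path is container-owned.

```bash
# Hypothetical helper (not in the repo): print each expected LND setting that
# is missing from a config file; return non-zero if any are missing.
# Assumes literal "key=value" lines with no spaces around "=".
lnd_conf_missing_keys() {
  local conf="$1" line missing=0
  for line in \
    'bitcoin.active=true' \
    'bitcoin.mainnet=true' \
    'bitcoin.node=bitcoind' \
    'bitcoind.rpchost=bitcoin-knots:8332'
  do
    # -x: match the whole line; -F: treat the pattern as a literal string.
    if ! grep -qxF -- "$line" "$conf"; then
      echo "$line"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `lnd_conf_missing_keys "$conf" || echo "LND config needs repair"`, where `$conf` is the path of the generated LND config file. A file that spaces out `key = value` would be reported as missing, so this is a strict check, not a parser.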

## Major Files Touched In This Session

- `docs/CONTAINER_LIFECYCLE_HANDOFF.md`
- `docs/CHAT_TRANSCRIPT_2026-05-02.md`
- `tests/lifecycle/remote-lifecycle.sh`
- `core/archipelago/src/container/lnd.rs`
- `core/archipelago/src/container/companion.rs`
- `core/archipelago/src/container/prod_orchestrator.rs`
- `core/archipelago/src/container/docker_packages.rs`
- `core/container/src/podman_client.rs`
- `core/archipelago/src/port_allocator.rs`
- `apps/lnd-ui/manifest.yml`
- `neode-ui/src/views/appSession/appSessionConfig.ts`
- `neode-ui/src/stores/container.ts`
- `neode-ui/src/stores/appLauncher.ts`
- `neode-ui/src/views/appDetails/appDetailsData.ts`
- nginx config/snippet files under `scripts/` and `image-recipe/`

## LND Wallet Bootstrap Investigation

The initial strict LND probe failed because `/lnd-connect-info` could not read `admin.macaroon`:

```text
Failed to read LND admin macaroon — is LND installed?
direct: Permission denied (os error 13)
sudo: cat: /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon: No such file or directory
```

LND logs showed that the wallet was uninitialized/locked:

```text
Waiting for wallet encryption password. Use lncli create...
```

Testing showed that `lncli create` is interactive and does not support `--stdin`:

```text
[lncli] flag provided but not defined: -stdin
```

`lncli unlock --stdin` is supported, so the final approach was:

- Use the LND REST unlocker endpoints to create a new wallet.
- Use `lncli unlock --stdin` only for an existing wallet.
- Treat “wallet already exists” from REST as a signal to unlock.
- Use sudo-aware checks and reads for wallet artifacts, because the LND data directories are container-owned and `0700`.
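
The create-or-unlock decision above can be sketched as two small shell functions. This is a hypothetical illustration only: the real logic is `ensure_wallet_initialized()` in Rust, and the inputs used here are assumptions about what that code inspects. Those inputs are whether a `getinfo` call succeeds, whether a wallet already exists on disk, and the HTTP status and body of the REST init call.

```bash
# Hypothetical sketch of the wallet bootstrap decision; the real logic is
# ensure_wallet_initialized() in core/archipelago/src/container/lnd.rs.

# wallet_bootstrap_action <getinfo_ok:0|1> <wallet_exists:0|1>
# Prints the next step: "ready", "unlock", or "init".
wallet_bootstrap_action() {
  local getinfo_ok="$1" wallet_exists="$2"
  if [ "$getinfo_ok" = "1" ]; then
    echo ready     # wallet already unlocked and answering RPC
  elif [ "$wallet_exists" = "1" ]; then
    echo unlock    # existing wallet: lncli unlock --stdin
  else
    echo init      # no wallet yet: REST unlocker endpoints
  fi
}

# classify_init_response <http_status> <body>
# Maps the REST wallet-init reply to "created", "unlock", or "error".
classify_init_response() {
  local status="$1" body="$2"
  case "$body" in
    # "wallet already exists" means the on-disk check missed a wallet.
    *"wallet already exists"*) echo unlock; return 0 ;;
  esac
  if [ "$status" = "200" ]; then
    echo created
  else
    echo error
  fi
}
```

Checking the response body before the status code lets a stale or failed existence check (for example, a sudo read that could not see the `0700` directory) fall back to unlocking instead of surfacing as an error.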

Implemented in `core/archipelago/src/container/lnd.rs`:

- `ensure_wallet_initialized()`
- `file_exists_as_root()`
- `read_file_as_root()`
- `init_wallet_via_rest()`
- `get_lnd_unlocker_json()`
- `post_lnd_unlocker_json()`
- `unlock_existing_wallet()`
- `wait_for_admin_macaroon()`
- `lnd_getinfo_ready()`
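
`file_exists_as_root()` and `read_file_as_root()` are Rust functions. The shell analogue below is a rough sketch of the same idea and is an assumption about their behavior, based on the sudo-aware description above. The error output earlier in this section shows a similar sequence: a direct read, then `sudo cat`. The sketch tries the unprivileged path first and falls back to non-interactive `sudo -n` for paths inside the `0700` LND data directory.

```bash
# Hypothetical shell analogue of file_exists_as_root()/read_file_as_root():
# try the unprivileged path first, then fall back to non-interactive sudo
# (sudo -n fails instead of prompting when a password would be required).

file_exists_as_root() {
  local path="$1"
  [ -e "$path" ] && return 0
  sudo -n test -e "$path" 2>/dev/null
}

read_file_as_root() {
  local path="$1"
  cat -- "$path" 2>/dev/null && return 0
  sudo -n cat -- "$path"
}
```

Command substitution drops NUL bytes and trailing newlines. A binary file such as `admin.macaroon` is therefore better piped onward, for example `read_file_as_root "$path" | base64`, than captured in a shell variable.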

The focused Rust tests pass:

```bash
cd /home/archipelago/Projects/archy/core
cargo test -p archipelago --bin archipelago lnd
```

Result:

```text
7 passed; 0 failed
```

## LND UI Port Collision

The strict LND UI test then failed with a `502`.

Investigation found a real port collision:

- `nostr-rs-relay` uses host port `8081`.
- The old `archy-lnd-ui` also used host port `8081`.
- The nginx `/app/lnd/` proxy also pointed at `8081`.
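
A collision like this can be caught before it surfaces as a `502` by scanning host-port assignments. The helper below is a hypothetical sketch, not part of the harness. It reads lines in the `<name> <ports...>` shape that `podman ps --format '{{.Names}} {{.Ports}}'` prints and skips unpublished ports. It prints each host port claimed by more than one container. Port ranges are compared as literal strings and are not expanded.

```bash
# Hypothetical helper: read "<name> <ports...>" lines and print every host
# port published by more than one container, followed by the owner names.
# Port ranges are not expanded; they are compared as literal strings.
find_host_port_collisions() {
  awk '
    {
      name = $1
      for (i = 2; i <= NF; i++) {
        field = $i
        sub(/,$/, "", field)              # drop the list separator
        if (field !~ /->/) continue        # skip unpublished ports
        split(field, parts, "->")          # parts[1] is e.g. 0.0.0.0:8081
        n = split(parts[1], hostport, ":")
        port = hostport[n]
        if (seen[port, name]++) continue   # same container on IPv4 and IPv6
        owners[port] = owners[port] " " name
        count[port]++
      }
    }
    END {
      for (p in count)
        if (count[p] > 1) print p owners[p]
    }'
}
```

A second container cannot actually bind a port that is already taken. Feeding the helper the planned assignments, and not only live `podman ps` output, is therefore what catches the conflict before the second container fails to start.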

Fix implemented:

- Move the LND UI companion to host port `18083` (container port `80`).
- Keep `nostr-rs-relay` on `8081`.
- Update app metadata and routing to `18083`.
- Update tests to expect a direct port launch.

Important correction from the user:

```text
we never use paths only ports, how many times do you need to be told
```

Action taken after the correction:

- Stop validating through `/app/lnd/` and `/app/electrumx/` in the lifecycle harness.
- Switch `launch_url_for()` to direct app ports.
- Switch the app session resolver to a direct `http://host:port` launch, even from HTTPS parent pages.
- Remove the use of `HTTPS_PROXY_PATHS[id]` in `resolveAppUrl()`.
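
A port-only `launch_url_for()` reduces to a lookup table with no path fallback. The sketch below is hypothetical. The `lnd` and `electrumx` ports come from these notes. The `bitcoin-knots` entry assumes that app id is the one serving the Bitcoin UI on `8334`. The real harness function may also cover more apps.

```bash
# Hypothetical port-only launch URL lookup. Ports are the ones recorded in
# these notes; the bitcoin-knots -> 8334 entry is an assumption.
launch_url_for() {
  local app="$1" port
  if [ -z "${ARCHY_HOST:-}" ]; then
    echo "ARCHY_HOST is not set" >&2
    return 1
  fi
  case "$app" in
    lnd)           port=18083 ;;  # archy-lnd-ui companion
    electrumx)     port=50002 ;;
    bitcoin-knots) port=8334 ;;   # assumed id for the Bitcoin UI
    *)
      # No /app/<id>/ fallback: an unmapped app is an error.
      echo "no known direct port for $app" >&2
      return 1
      ;;
  esac
  printf 'http://%s:%s/\n' "$ARCHY_HOST" "$port"
}
```

Failing on an unknown app id, instead of falling back to an `/app/<id>/` path, keeps a missing mapping visible and avoids silently reintroducing path-based launch.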

Direct-port LND audit command:

```bash
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh
```

Result:

```text
### 192.168.1.198 iteration 1 / 1 ###
lnd state=running
all checks passed
```

The audit now validates `http://192.168.1.198:18083/`, not `/app/lnd/`.

## Lifecycle Harness Changes

Changes made to `tests/lifecycle/remote-lifecycle.sh`:

- Normalize package states with `ascii_downcase`, because the API returned `Running`.
- Direct port launch URLs:
  - LND: `http://${ARCHY_HOST}:18083/`
  - Electrum/Electrs: `http://${ARCHY_HOST}:50002/`
  - Bitcoin UI: `http://${ARCHY_HOST}:8334/`
  - Other apps are mapped to direct ports where known.
- LND probe checks:
  - `Connect Your Wallet`
  - `id="lndQrBox"`
  - `id="connHost"`
  - `value="rest-tor"`
  - `value="grpc-tor"`
  - `value="rest-local"`
  - `value="grpc-local"`
  - `Copy lndconnect URI`
  - `/lnd-connect-info` cert, macaroon, ports, and Tor onion.
- Electrum probe checks:
  - local QR container and address field
  - Tor QR container and onion field
  - port `50001`
  - QR renderer
  - direct `http://${ARCHY_HOST}:50002/qrcode.js`
  - `/electrs-status` Tor onion.
- The full lifecycle now fails immediately on any failed phase (`|| return 1`), so a later reinstall cannot mask a failed restart or probe.
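
The probe and fail-fast behavior above can be illustrated with a few shell helpers. These are hypothetical sketches, not the harness's actual functions:

- `normalize_state` uses `tr` in place of `ascii_downcase`.
- `is_v3_onion` checks only the shape of a v3 onion hostname (56 base32 characters), not reachability.
- The phase list and the `run_phase` hook in `run_full_lifecycle` are assumed.

```bash
# Hypothetical helpers illustrating the harness checks; not the actual
# functions in tests/lifecycle/remote-lifecycle.sh.

# Lower-case a package state ("Running" -> "running"); tr stands in for
# the ascii_downcase normalization the harness uses.
normalize_state() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# has_all_markers <html> <marker>...: succeed only if every literal marker
# (e.g. id="lndQrBox") appears in the page; name the first one missing.
has_all_markers() {
  local html="$1" marker
  shift
  for marker in "$@"; do
    if ! printf '%s\n' "$html" | grep -qF -- "$marker"; then
      echo "missing marker: $marker" >&2
      return 1
    fi
  done
  return 0
}

# is_v3_onion <host>: shape check only (56 base32 chars + ".onion");
# it does not prove the onion service is reachable.
is_v3_onion() {
  [ "${#1}" -eq 62 ] && printf '%s\n' "$1" | grep -Eqx '[a-z2-7]{56}\.onion'
}

# Fail fast: stop at the first failed phase so a later reinstall cannot
# mask it. The phase list and the run_phase hook are assumptions.
run_full_lifecycle() {
  local app="$1" phase
  for phase in install stop start restart uninstall reinstall; do
    echo "== $app: $phase =="
    run_phase "$app" "$phase" || return 1
  done
}
```

In `run_full_lifecycle`, returning on the first failure prevents a successful reinstall at the end of the sequence from hiding an earlier failed stop or restart.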

## Deployments To `.198`

Several release builds were made and deployed:

```bash
cd /home/archipelago/Projects/archy/core
cargo build -p archipelago --bin archipelago --release
```

Deploy pattern:

```bash
# Copy the new binary to a temporary path on .198.
scp -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
  /home/archipelago/Projects/archy/core/target/release/archipelago \
  archipelago@192.168.1.198:/tmp/archipelago.new

# Back up the current binary, install the new one, restart, and confirm the
# service is active. <timestamp> is a placeholder: replace it before running,
# or the remote shell will parse it as a redirection.
ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
  archipelago@192.168.1.198 \
  "sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.bak-<timestamp> && \
   sudo install -m 0755 /tmp/archipelago.new /usr/local/bin/archipelago && \
   sudo systemctl restart archipelago.service && \
   systemctl is-active archipelago.service"
```

The latest deploy returned:

```text
active
```

## `.198` Current Observations

After forcing an LND package restart, companion reconciliation succeeded:

```text
nostr-rs-relay Up ... 0.0.0.0:8081->8080/tcp
lnd Up ... 0.0.0.0:8080->8080/tcp, 0.0.0.0:9735->9735/tcp, 0.0.0.0:10009->10009/tcp
archy-lnd-ui Up ... 0.0.0.0:18083->80/tcp
```

A direct UI test from `.198` returned `200`:

```bash
curl -i http://127.0.0.1:18083/
```

The strict direct-port LND audit is green:

```text
lnd state=running
all checks passed
```

## Full LND Lifecycle Status

A full direct-port lifecycle run was started:

```bash
ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
```

It reached:

```text
### 192.168.1.198 iteration 1 / 1 ###
== lnd: install ==
== lnd: stop ==
```

The user then aborted the command while asking to save memory and the transcript.

The next continuation point is to rerun the full LND direct-port lifecycle from scratch and inspect the stop phase if it hangs or fails.

## Handoff File

A durable handoff file was also created:

```text
docs/CONTAINER_LIFECYCLE_HANDOFF.md
```

It contains the plan, progress, current blockers, and next steps.

## Immediate Next Steps

1. Rerun the full strict LND direct-port lifecycle:

   ```bash
   ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
   ```

2. If it hangs or fails at `stop`, inspect the package runtime stop path and logs:

   ```bash
   ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 \
     'journalctl -u archipelago.service -n 260 --no-pager | grep -Ei "package\.(stop|start|restart|install|uninstall)|lnd|companion|error|failed" | sed -n "1,220p"; podman ps -a --format "{{.Names}} {{.Status}} {{.Ports}}" | grep -E "lnd|nostr" || true'
   ```

3. If stop is unreliable, inspect and fix:

   - `core/archipelago/src/api/rpc/package/runtime.rs`
   - `core/archipelago/src/container/prod_orchestrator.rs`

   Likely causes to check:

   - The reconciler restarting LND while a stop is expected.
   - The state scanner reporting a stale `running` state.
   - Companion handling interfering with the parent app's state.
   - The async lifecycle call returning before the stop actually completes.

4. Once the LND full lifecycle is green, run the Electrum strict lifecycle with direct port `50002`:

   ```bash
   ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh
   ```

5. Continue with the app groups after LND/Electrum:

   - `filebrowser`
   - `bitcoin-knots`
   - `lnd`
   - `electrumx`
   - `mempool`
   - `btcpay-server`
   - `fedimint`
   - remaining catalog apps.
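
For the last likely cause in step 3, where the async lifecycle call returns before the stop completes, a harness can poll until the expected state appears instead of trusting the RPC return. The function below is a hypothetical sketch. The state-fetching command passed to it and the `stopped` state name are assumptions, not confirmed harness or API details.

```bash
# wait_for_state <expected> <timeout_s> <interval_s> <command...>
# Hypothetical poller: runs <command...>, which must print the current package
# state, until the lower-cased output equals <expected> (pass it lower-case)
# or the timeout expires.
wait_for_state() {
  local expected="$1" timeout="$2" interval="$3" deadline state
  shift 3
  deadline=$(( $(date +%s) + timeout ))
  while :; do
    state=$("$@" | tr '[:upper:]' '[:lower:]')
    [ "$state" = "$expected" ] && return 0
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for '$expected' (last state: '$state')" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

For example, `wait_for_state stopped 120 2 get_lnd_state`, where `get_lnd_state` is a hypothetical wrapper that prints the package state. Lower-casing before the comparison also tolerates the `Running`/`running` casing difference noted in the harness changes. If the reconciler restarts LND before the stop is observed, the timeout message reports the last state seen.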

## Important Instruction To Preserve

Use ports only for app launch/testing. Do not add or rely on `/app/...` path proxy launch behavior unless the user explicitly changes this requirement.