archy/docs/CHAT_TRANSCRIPT_2026-05-02.md
2026-05-06 09:23:57 -04:00

Chat Transcript And Working Notes

Date: 2026-05-02

This file captures the current chat context, decisions, progress, and next steps so work can continue from another device/session.

User Request

The user asked to continue hardening the Archipelago app/container lifecycle, then asked several times to save the plan, progress, and next steps, and finally asked to save the entire chat to Markdown.

Key user constraints and corrections:

  • Continue if next steps are clear; ask only if blocked.
  • Exhaustively harden app/container lifecycle before release.
  • Preserve data during destructive lifecycle testing unless explicitly instructed otherwise.
  • Do not rely on /app/... proxy paths for app launch/testing. The user corrected: “we never use paths only ports.”
  • LND/Electrum wallet-connect tests must validate real connection details and QR, including Tor.

Earlier Progress Summary

Before the latest work, the project already had substantial lifecycle hardening in progress:

  • Remote lifecycle harness exists at tests/lifecycle/remote-lifecycle.sh.
  • .198 SSH works with /home/archipelago/.ssh/id_ed25519.
  • .228 RPC works, but SSH is blocked with Permission denied (publickey,password).
  • Multiple backend release binaries were built and deployed to .198 with backups in /usr/local/bin/archipelago.bak-*.
  • Fixed stale package scanner state recovery from Removing -> Running when a container is actually live.
  • Fixed startup ordering so crash recovery runs before BootReconciler.
  • Removed dangerous automatic Podman runtime directory deletion on podman info failure.
  • Narrowed generic crash recovery to safe legacy containers.
  • Fixed companion reconciliation on install/start/restart.
  • Fixed uninstall/reinstall behavior so uninstall disables manifest apps instead of deleting manifest availability, and reinstall re-enables them.
  • Fixed LND config generation/repair:
    • bitcoin.active=true
    • bitcoin.mainnet=true
    • bitcoin.node=bitcoind
    • bitcoind.rpchost=bitcoin-knots:8332
    • sudo fallback for writing container-owned config paths.
  • .198 had previously passed focused lifecycle runs for filebrowser and bitcoin-knots, plus a looser LND launch test.

Major Files Touched In This Session

  • docs/CONTAINER_LIFECYCLE_HANDOFF.md
  • docs/CHAT_TRANSCRIPT_2026-05-02.md
  • tests/lifecycle/remote-lifecycle.sh
  • core/archipelago/src/container/lnd.rs
  • core/archipelago/src/container/companion.rs
  • core/archipelago/src/container/prod_orchestrator.rs
  • core/archipelago/src/container/docker_packages.rs
  • core/container/src/podman_client.rs
  • core/archipelago/src/port_allocator.rs
  • apps/lnd-ui/manifest.yml
  • neode-ui/src/views/appSession/appSessionConfig.ts
  • neode-ui/src/stores/container.ts
  • neode-ui/src/stores/appLauncher.ts
  • neode-ui/src/views/appDetails/appDetailsData.ts
  • nginx config/snippet files under scripts/ and image-recipe/

LND Wallet Bootstrap Investigation

Initial strict LND probe failed because /lnd-connect-info could not read admin.macaroon:

Failed to read LND admin macaroon — is LND installed?
direct: Permission denied (os error 13)
sudo: cat: /var/lib/archipelago/lnd/data/chain/bitcoin/mainnet/admin.macaroon: No such file or directory

LND logs showed the wallet was uninitialized/locked:

Waiting for wallet encryption password. Use lncli create...

Tests showed lncli create is interactive and does not support --stdin:

[lncli] flag provided but not defined: -stdin

lncli unlock --stdin is supported, so the final approach was:

  • Use LND REST unlocker endpoints for new wallet creation.
  • Use lncli unlock --stdin only for an existing wallet.
  • Treat “wallet already exists” from REST as a signal to unlock.
  • Use sudo-aware checks/reads for wallet artifacts because LND data directories are container-owned and 0700.
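
The create-or-unlock decision above can be sketched in shell. This is only an illustration of the branching, not the code in lnd.rs: next_wallet_step is a hypothetical helper, and treating any 2xx status as success is an assumption. The only string taken from these notes is "wallet already exists".

```shell
# Hypothetical helper: pick the next bootstrap step from the result of the
# REST wallet-creation call.
#   $1: HTTP status code of the create request (assumed to be available)
#   $2: response body
next_wallet_step() {
  # An existing wallet is not an error; it means "unlock instead of create".
  case "$2" in
    *"wallet already exists"*) echo unlock; return 0 ;;
  esac
  case "$1" in
    2??) echo wait_for_macaroon ;;   # assumption: any 2xx means the wallet was created
    *)   echo fail; return 1 ;;
  esac
}
```

For example, next_wallet_step 200 '{}' prints wait_for_macaroon, while a body containing "wallet already exists" prints unlock regardless of the status code.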

Implemented in core/archipelago/src/container/lnd.rs:

  • ensure_wallet_initialized()
  • file_exists_as_root()
  • read_file_as_root()
  • init_wallet_via_rest()
  • get_lnd_unlocker_json()
  • post_lnd_unlocker_json()
  • unlock_existing_wallet()
  • wait_for_admin_macaroon()
  • lnd_getinfo_ready()
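
The helpers listed above are Rust. As a rough shell analogue of the sudo-aware read, assuming passwordless sudo is available on the host, a direct read can fall back to sudo only when the direct read fails. read_file_with_sudo_fallback is a hypothetical name, not a function from the codebase.

```shell
# Hypothetical helper: try a direct read first, then fall back to sudo.
# "sudo -n" never prompts, so the function fails fast when sudo would
# need a password or is not installed.
read_file_with_sudo_fallback() {
  cat -- "$1" 2>/dev/null || sudo -n cat -- "$1"
}
```

A failure from both attempts still has two possible meanings, as the macaroon error above shows: permission denied on the direct read, and a genuinely missing file on the sudo read.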

Focused Rust test passes:

cd /home/archipelago/Projects/archy/core
cargo test -p archipelago --bin archipelago lnd

Result:

7 passed; 0 failed

LND UI Port Collision

The strict LND UI test then failed with 502.

Investigation found a real port collision:

  • nostr-rs-relay uses host 8081.
  • Old archy-lnd-ui also used host 8081.
  • nginx /app/lnd/ proxy also pointed at 8081.

Fix implemented:

  • Move LND UI companion to host port 18083, container port 80.
  • Keep nostr-rs-relay on 8081.
  • Update app metadata/routing to 18083.
  • Update tests to expect direct port launch.

Important correction from user:

we never use paths only ports, how many times do you need to be told

Action taken after correction:

  • Stop validating through /app/lnd/ and /app/electrumx/ in the lifecycle harness.
  • Switch launch_url_for() to direct app ports.
  • Switch app session resolver to direct http://host:port launch, even from HTTPS parent pages.
  • Remove use of HTTPS_PROXY_PATHS[id] in resolveAppUrl().
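
A minimal sketch of what a ports-only launch_url_for() could look like follows. The real function in tests/lifecycle/remote-lifecycle.sh may be structured differently; the app ids used as case labels are assumptions, while the ports are the ones recorded in these notes.

```shell
# Sketch only: map an app id to its direct http://host:port/ launch URL.
# No /app/... proxy path is ever produced.
#   $1: app id; the host is read from ARCHY_HOST, as in the audit command.
launch_url_for() {
  case "$1" in
    lnd)       port=18083 ;;   # LND UI companion
    electrumx) port=50002 ;;   # Electrum status page
    *) echo "no direct port known for $1" >&2; return 1 ;;
  esac
  printf 'http://%s:%s/\n' "$ARCHY_HOST" "$port"
}
```

Returning non-zero for unknown apps, instead of falling back to a path-based URL, keeps the ports-only rule enforceable in the harness.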

Direct-port LND audit command:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd tests/lifecycle/remote-lifecycle.sh

Result:

### 192.168.1.198 iteration 1 / 1 ###
lnd                    state=running
all checks passed

The audit now validates http://192.168.1.198:18083/, not /app/lnd/.

Lifecycle Harness Changes

tests/lifecycle/remote-lifecycle.sh changes made:

  • Normalize package states with ascii_downcase because the API returned the capitalized state Running, which a lowercase comparison would not match.
  • Direct port launch URLs:
    • LND: http://${ARCHY_HOST}:18083/
    • Electrum/Electrs: http://${ARCHY_HOST}:50002/
    • Bitcoin UI: http://${ARCHY_HOST}:8334/
    • Other apps mapped to direct ports where known.
  • LND probe checks:
    • Connect Your Wallet
    • id="lndQrBox"
    • id="connHost"
    • value="rest-tor"
    • value="grpc-tor"
    • value="rest-local"
    • value="grpc-local"
    • Copy lndconnect URI
    • /lnd-connect-info cert, macaroon, ports, and Tor onion.
  • Electrum probe checks:
    • local QR container and address field
    • Tor QR container and onion field
    • port 50001
    • QR renderer
    • direct http://${ARCHY_HOST}:50002/qrcode.js
    • /electrs-status Tor onion.
  • Full lifecycle now fails immediately on any failed phase with || return 1 so a later reinstall cannot mask a failed restart/probe.
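
The three ideas in this list (state normalization, marker probes, fail-fast phases) can be sketched as small shell functions. These are illustrative only: the function names are hypothetical, and tr stands in for the jq ascii_downcase filter that the harness uses.

```shell
# Lowercase a package state so "Running" and "running" compare equal.
# Swapped-in technique: tr instead of jq's ascii_downcase.
normalize_state() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Succeed only if every marker string occurs literally in the page body.
#   $1: page body; remaining arguments: markers such as 'id="lndQrBox"'
page_has_markers() {
  body=$1; shift
  for marker in "$@"; do
    case "$body" in
      *"$marker"*) ;;
      *) echo "missing marker: $marker" >&2; return 1 ;;
    esac
  done
}

# Run lifecycle phases in order and stop at the first failure, so a later
# phase (for example reinstall) cannot mask an earlier failed one.
run_phases() {
  for phase in "$@"; do
    echo "== $phase =="
    "$phase" || return 1
  done
}
```

A call such as run_phases install stop start restart probe would then return non-zero as soon as any phase function fails, assuming each phase is defined as a shell function of that name.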

Deployments To .198

Several release builds were made and deployed:

cd /home/archipelago/Projects/archy/core
cargo build -p archipelago --bin archipelago --release

Deploy pattern:

scp -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
  /home/archipelago/Projects/archy/core/target/release/archipelago \
  archipelago@192.168.1.198:/tmp/archipelago.new

ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no \
  archipelago@192.168.1.198 \
  "sudo cp /usr/local/bin/archipelago /usr/local/bin/archipelago.bak-<timestamp> && \
   sudo install -m 0755 /tmp/archipelago.new /usr/local/bin/archipelago && \
   sudo systemctl restart archipelago.service && \
   systemctl is-active archipelago.service"

Latest deploy returned:

active

.198 Current Observations

After forcing an LND package restart, companion reconciliation succeeded:

nostr-rs-relay Up ... 0.0.0.0:8081->8080/tcp
lnd Up ... 0.0.0.0:8080->8080/tcp, 0.0.0.0:9735->9735/tcp, 0.0.0.0:10009->10009/tcp
archy-lnd-ui Up ... 0.0.0.0:18083->80/tcp

Direct UI test from .198 returned 200:

curl -i http://127.0.0.1:18083/

Strict direct-port LND audit is green:

lnd                    state=running
all checks passed

Full LND Lifecycle Status

Full direct-port lifecycle was started:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh

It reached:

### 192.168.1.198 iteration 1 / 1 ###
== lnd: install ==
== lnd: stop ==

Then the user aborted the command while asking to save memory/transcript.

The next continuation point is to rerun the full LND direct-port lifecycle from scratch and, if it hangs or fails, inspect the stop phase.

Handoff File

A durable handoff file was also created:

docs/CONTAINER_LIFECYCLE_HANDOFF.md

It contains the plan, progress, current blockers, and next steps.

Immediate Next Steps

  1. Rerun full strict LND direct-port lifecycle:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=lnd ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh

  2. If it hangs/fails at stop, inspect package runtime stop path and logs:

ssh -i /home/archipelago/.ssh/id_ed25519 -o StrictHostKeyChecking=no archipelago@192.168.1.198 \
  'journalctl -u archipelago.service -n 260 --no-pager | egrep -i "package\.(stop|start|restart|install|uninstall)|lnd|companion|error|failed" | sed -n "1,220p"; podman ps -a --format "{{.Names}} {{.Status}} {{.Ports}}" | egrep "lnd|nostr" || true'

  3. If stop is unreliable, inspect/fix:
    • core/archipelago/src/api/rpc/package/runtime.rs
    • core/archipelago/src/container/prod_orchestrator.rs

     Likely causes to check:
    • Reconciler restarting LND while stop is expected.
    • State scanner reporting stale running.
    • Companion handling interfering with parent app state.
    • Async lifecycle returning before actual stop completes.
  4. Once LND full lifecycle is green, run Electrum strict lifecycle with direct port 50002:

ARCHY_HOST=192.168.1.198 ARCHY_PASSWORD=password123 ARCHY_APPS=electrumx ARCHY_FULL_LIFECYCLE=1 tests/lifecycle/remote-lifecycle.sh

  5. Continue with app groups after LND/Electrum:
    • filebrowser
    • bitcoin-knots
    • lnd
    • electrumx
    • mempool
    • btcpay-server
    • fedimint
    • remaining catalog apps.
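
One of the likely causes listed above is the async lifecycle returning before the stop has actually completed. A bounded polling wait is one way to tell that apart from a real hang. The sketch below is an assumption about how such a wait could look, not the harness's code: wait_for_state is a hypothetical helper, and the command that reports the current state is supplied by the caller.

```shell
# Hypothetical helper: poll a state-reporting command until it prints the
# expected state (compared case-insensitively) or the timeout expires.
#   usage: wait_for_state <expected> <timeout_seconds> <command> [args...]
wait_for_state() {
  expected=$1; timeout=$2; shift 2
  elapsed=0
  while :; do
    current=$("$@" 2>/dev/null | tr '[:upper:]' '[:lower:]')
    [ "$current" = "$expected" ] && return 0
    [ "$elapsed" -ge "$timeout" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

If the stop RPC returns immediately but wait_for_state only succeeds some seconds later, the stop is asynchronous rather than broken. If it times out, the reconciler or a stale scanner state is the more likely culprit.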

Important Instruction To Preserve

Use ports only for app launch/testing. Do not add or rely on /app/... path proxy launch behavior unless the user explicitly changes this requirement.