fix: derive launch port from URL authority, not naive rsplit
reachable_lan_address() parsed the launch port with url.rsplit(':')
which yields "8096/" for manifest interfaces.main URLs that carry a
path (http://localhost:8096/). That fails to parse and silently drops
a perfectly reachable launch URL, so apps like jellyfin, btcpay-server,
fedimint, gitea, nextcloud and portainer showed running with no launch
link in the UI. New launch_url_port() reads digits after the final
colon (mirroring port_from_url in the RPC layer) and tolerates a
trailing path. Adds regression tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
0ed892a412
commit
a483fe4baa
2
core/Cargo.lock
generated
2
core/Cargo.lock
generated
@ -80,7 +80,7 @@ checksum = "a23eb6b1614318a8071c9b2521f36b424b2c83db5eb3a0fead4a6c0809af6e61"
|
|||||||
|
|
||||||
[[package]]
|
[[package]]
|
||||||
name = "archipelago"
|
name = "archipelago"
|
||||||
version = "1.7.89-alpha"
|
version = "1.7.90-alpha"
|
||||||
dependencies = [
|
dependencies = [
|
||||||
"anyhow",
|
"anyhow",
|
||||||
"archipelago-container",
|
"archipelago-container",
|
||||||
|
|||||||
@ -699,7 +699,7 @@ async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Optio
|
|||||||
if !requires_reachable_launch(app_id) {
|
if !requires_reachable_launch(app_id) {
|
||||||
return Some(url);
|
return Some(url);
|
||||||
}
|
}
|
||||||
let Some(port) = url.rsplit(':').next().and_then(|p| p.parse::<u16>().ok()) else {
|
let Some(port) = launch_url_port(&url) else {
|
||||||
return None;
|
return None;
|
||||||
};
|
};
|
||||||
if launch_port_reachable(port).await {
|
if launch_port_reachable(port).await {
|
||||||
@ -710,6 +710,23 @@ async fn reachable_lan_address(app_id: &str, candidate: Option<String>) -> Optio
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Extract the TCP port from a launch URL's authority.
|
||||||
|
///
|
||||||
|
/// The candidate URL can carry a path when it comes from a manifest
|
||||||
|
/// `interfaces.main` declaration (e.g. `http://localhost:8096/`). A naive
|
||||||
|
/// `rsplit(':')` then yields `"8096/"`, which fails to parse and silently
|
||||||
|
/// drops a reachable launch URL. Reading digits after the final colon mirrors
|
||||||
|
/// `port_from_url` in the RPC layer and tolerates a trailing path.
|
||||||
|
fn launch_url_port(url: &str) -> Option<u16> {
|
||||||
|
let after_colon = url.rsplit_once(':')?.1;
|
||||||
|
after_colon
|
||||||
|
.chars()
|
||||||
|
.take_while(|c| c.is_ascii_digit())
|
||||||
|
.collect::<String>()
|
||||||
|
.parse::<u16>()
|
||||||
|
.ok()
|
||||||
|
}
|
||||||
|
|
||||||
async fn launch_port_reachable(port: u16) -> bool {
|
async fn launch_port_reachable(port: u16) -> bool {
|
||||||
matches!(
|
matches!(
|
||||||
tokio::time::timeout(
|
tokio::time::timeout(
|
||||||
@ -788,3 +805,26 @@ fn package_state_str(state: &PackageState) -> &str {
|
|||||||
PackageState::Updating => "updating",
|
PackageState::Updating => "updating",
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
#[cfg(test)]
|
||||||
|
mod launch_url_port_tests {
|
||||||
|
use super::launch_url_port;
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn parses_port_with_trailing_path() {
|
||||||
|
// Regression: manifest interfaces.main yields a path-suffixed URL.
|
||||||
|
// The old rsplit(':') parse produced "8096/" and dropped the URL.
|
||||||
|
assert_eq!(launch_url_port("http://localhost:8096/"), Some(8096));
|
||||||
|
assert_eq!(launch_url_port("http://localhost:8175/admin"), Some(8175));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn parses_bare_authority_port() {
|
||||||
|
assert_eq!(launch_url_port("http://localhost:8083"), Some(8083));
|
||||||
|
}
|
||||||
|
|
||||||
|
#[test]
|
||||||
|
fn rejects_url_without_port() {
|
||||||
|
assert_eq!(launch_url_port("http://localhost/"), None);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|||||||
@ -1,6 +1,101 @@
|
|||||||
# Migration Status Report
|
# Migration Status Report
|
||||||
|
|
||||||
Last updated: 2026-06-11
|
Last updated: 2026-06-14
|
||||||
|
|
||||||
|
## Validation node (ACTIVE)
|
||||||
|
|
||||||
|
As of 2026-06-14 the app-migration lifecycle validation moves from `.198` (remote, OVH) to
|
||||||
|
**`.116` — the local dev node (`archi-thinkpad`, `192.168.1.116`)** because it is the machine
|
||||||
|
this session runs on, so the harness drives it over loopback instead of SSH (much faster, no
|
||||||
|
network latency). A separate agent owns OS-level fixes + its own test harness; this track owns
|
||||||
|
the **app-packaging migration** lifecycle validation only.
|
||||||
|
|
||||||
|
How to drive the harness against `.116` (local):
|
||||||
|
|
||||||
|
```bash
|
||||||
|
ARCHY_HOST=127.0.0.1 ARCHY_SCHEME=http ARCHY_PASSWORD='ThisIsWeb54321@' \
|
||||||
|
ARCHY_APPS='meshtastic,jellyfin,filebrowser,uptime-kuma' \
|
||||||
|
tests/lifecycle/remote-lifecycle.sh # focused, audit-only (non-destructive)
|
||||||
|
```
|
||||||
|
|
||||||
|
- `.116` serves nginx on **:80 only** (443 is tailscale's) → use `ARCHY_SCHEME=http`, `ARCHY_HOST=127.0.0.1`.
|
||||||
|
- Local node is healthy: `update_state.json.current_version == 1.7.90-alpha`, `update_in_progress=false`
|
||||||
|
(the OTA self-heal that was a follow-up gap in PROGRESS_MEMORY is now confirmed resolved on .116).
|
||||||
|
- Login password for `.116`: `ThisIsWeb54321@` (verified against `auth.login`). Note: auth.login
|
||||||
|
has a login rate-limiter — avoid rapid repeated attempts.
|
||||||
|
- `.198` results below remain the prior baseline; new results are tagged `[.116]`.
|
||||||
|
|
||||||
|
### [.116] audit log (newest first)
|
||||||
|
|
||||||
|
- **2026-06-14 — focused audit `meshtastic,jellyfin,filebrowser,uptime-kuma` (audit-only, non-destructive):**
|
||||||
|
harness exit 1, FAILED checks: 1.
|
||||||
|
- `filebrowser` — running, pass (also passed a standalone single-app smoke run).
|
||||||
|
- `uptime-kuma` — running, pass.
|
||||||
|
- `meshtastic` — `state=absent`. Not installed on `.116` (was installed/validated on `.198`).
|
||||||
|
Not a regression; just node state. To exercise meshtastic here, install it first (it needs
|
||||||
|
`/dev/ttyUSB0`, which `.116` may not have) or drop it from the focused set on this node.
|
||||||
|
- `jellyfin` — **running but FAILED: "launch metadata missing: jellyfin has no lan_address".**
|
||||||
|
**ROOT-CAUSED 2026-06-14 — real, current bug in the working tree (a regression).** See
|
||||||
|
"FINDING F1" below.
|
||||||
|
|
||||||
|
### [.116] FINDING F1 — manifest launch URLs with a path are silently dropped (OPEN, fix pending)
|
||||||
|
|
||||||
|
**Symptom:** `jellyfin` is `running` and genuinely serving (`curl 127.0.0.1:8096/` → 302), but
|
||||||
|
`container-list` reports `lan_address: null`, so the UI/harness sees no launch URL.
|
||||||
|
|
||||||
|
**Root cause:** `core/archipelago/src/container/docker_packages.rs::reachable_lan_address()` parses
|
||||||
|
the port out of the candidate URL with `url.rsplit(':').next()`. When the candidate comes from the
|
||||||
|
manifest `interfaces.main` (via `PodmanClient::lan_address_for` →
|
||||||
|
`core/container/src/podman_client.rs::manifest_primary_interface_url`), the URL **includes the
|
||||||
|
manifest `path`** — e.g. jellyfin → `http://localhost:8096/`. Then `rsplit(':').next()` yields
|
||||||
|
`"8096/"`, which **fails to `parse::<u16>()`**, so the function hits its `else { return None }`
|
||||||
|
branch and drops a perfectly reachable launch URL. (Diagnostic tell: the dropped-at-parse path
|
||||||
|
emits **no** log, whereas a genuine unreachable port logs "suppressing unreachable launch URL".
|
||||||
|
jellyfin has no such log; uptime-kuma — whose candidate `…:3002` has no path — does.)
|
||||||
|
|
||||||
|
**Why it's a regression:** the old `extract_lan_address(ports)` produced `http://localhost:PORT`
|
||||||
|
(no path), which parsed fine. The newer manifest-interface feature appends the declared `path`,
|
||||||
|
so any app routed through `lan_address_for` now yields `…:PORT/` and trips the parser.
|
||||||
|
|
||||||
|
**Blast radius (apps in `requires_reachable_launch` whose `interfaces.main.path` = `/`):**
|
||||||
|
`botfights`, `btcpay-server`, `fedimint`, `jellyfin`, `gitea`, `nextcloud`, `portainer`.
|
||||||
|
(`filebrowser`/`nextcloud`/`nginx-proxy-manager`/`vaultwarden` are in `uses_allocated_launch_port`
|
||||||
|
so they hit `extract_lan_address` first and dodge it; `grafana`/`mempool`/`uptime-kuma`/`searxng`
|
||||||
|
have no manifest `interfaces.main` path.) On `.198` this likely went unnoticed because those apps
|
||||||
|
weren't all running during the launch-metadata assertion, or predated the interfaces.main addition.
|
||||||
|
|
||||||
|
**Fix (IMPLEMENTED in working tree, uncommitted):**
|
||||||
|
`docker_packages.rs::reachable_lan_address` now parses the port via a new `launch_url_port()`
|
||||||
|
helper that reads digits after the final colon (`take_while(is_ascii_digit)`), mirroring the
|
||||||
|
RPC-layer `port_from_url`, so `http://localhost:8096/` → `Some(8096)`. Added unit tests
|
||||||
|
(`launch_url_port_tests`) covering the trailing-path regression, the bare-authority case, and a
|
||||||
|
no-port reject. The existing `lan_address_prefers_manifest_main_interface` test only exercised
|
||||||
|
`lan_address_for` (which always returned `…:8175/`) and never the `reachable_lan_address` wrapper,
|
||||||
|
which is why the bug slipped through.
|
||||||
|
|
||||||
|
**Unit validation: GREEN (2026-06-14).** `cargo test -p archipelago --bin archipelago launch_url_port`
|
||||||
|
→ 3 passed / 0 failed (trailing-path, bare-authority, no-port-reject); crate compiles clean.
|
||||||
|
|
||||||
|
**Coordination note (shared tree):** the repo is on branch `fix/wallet-receive-portdrift-secrets`
|
||||||
|
at commit `bb808df8` (= the deployed 1.7.90-alpha). A parallel agent has uncommitted changes here
|
||||||
|
(lnd `wallet.rs`, `bitcoin_relay.rs`, `prod_orchestrator.rs`, electrumx manifest, neode-ui, new
|
||||||
|
bats). To validate F1 in isolation (and NOT deploy their in-flight work onto the live node, nor
|
||||||
|
disturb their tree), the live-validation build is done in a detached git worktree at
|
||||||
|
`/home/archipelago/archy-f1` = clean `bb808df8` + only the F1 `docker_packages.rs` change. Build:
|
||||||
|
`cd /home/archipelago/archy-f1/core && TMPDIR=/home/archipelago/.buildtmp cargo build --release -p archipelago`
|
||||||
|
(`.116`'s `/tmp` is a 7.7G tmpfs that runs 100% full → the ring crate's C compile fails with
|
||||||
|
"No space left on device"; redirect `TMPDIR` to `/` which has ~399G). After validation the
|
||||||
|
worktree is removed (`git worktree remove`). NOTE: sideloading replaces the OTA-managed
|
||||||
|
`/usr/local/bin/archipelago` with a local 1.7.90-alpha+F1 build until the next OTA — back up the
|
||||||
|
current binary first (`/usr/local/bin/archipelago.pre-f1.bak`).
|
||||||
|
|
||||||
|
**Live validation status — PENDING, GATED ON USER OK.** Proving F1 on `.116` requires a backend
|
||||||
|
rebuild + redeploy + `systemctl restart archipelago`. Per `feedback_no_systemctl_deploy_until_quadlet`
|
||||||
|
that restart SIGKILLs every container in the service cgroup (the boot reconciler re-adopts them, but
|
||||||
|
it is disruptive on this live node that runs the user's real bitcoin/lnd/apps). So: unit test first
|
||||||
|
(non-disruptive), then ask before the redeploy. Do NOT run the prod binary directly to "check a
|
||||||
|
version" — `/usr/local/bin/archipelago <anyflag>` boots a whole second node instance (learned the
|
||||||
|
hard way 2026-06-14; it exited without leaving a stray, but don't repeat).
|
||||||
|
|
||||||
## Goal
|
## Goal
|
||||||
|
|
||||||
|
|||||||
Loading…
x
Reference in New Issue
Block a user