lfg2025/archy

Author	SHA1	Message	Date
archipelago	44cd5eefdf	feat(rpc): spawn_transitional helper for async lifecycle ops Introduces a new RPC-layer helper that bridges the synchronous ContainerOrchestrator trait with RPC handlers that must return in <1s. The helper flips the package state to a transitional variant (Stopping / Starting / Restarting) in the StateManager so WebSocket clients see the live label immediately, then tokio::spawns the actual orchestrator call. On success it writes the final state; on error it reverts to the pre-transition state and logs via install_log(). The ContainerOrchestrator trait stays synchronous so the reconciler, boot flow, unit tests, and chaos harness keep deterministic behaviour. Async only lives in the RPC layer. Not wired to any handler yet — Commit 2 consumes this helper. Widens install_log visibility from pub(super) to pub(in crate::api::rpc) so the new sibling module can reach it.	2026-04-23 04:55:52 -04:00
archipelago	f721ecf39b	docs: STATUS.md — FUSE/SSHFS development loop section Dedicated section covering the file-ops-via-mount + git/cargo-via-ssh split that makes this dev setup work. Includes: - Exact running mount command (pulled from ps) - macFUSE + sshfs-mac brew install path - Health check + recovery sequence for when mount hangs (it will) - Full which-path-for-which-operation table - Don't-do list (cargo from mount, rsync without AppleDouble exclude, etc) - Cache caveat and inode-sharing note between mount and SSH views No code change.	2026-04-23 04:51:53 -04:00
archipelago	120a307343	docs: STATUS.md — complete SSH/key/sudo/deploy reference for next session Expands NEXT SESSION header with fully verified access info so a fresh agent has zero ambiguity: - SSH key inventory across laptop, .116, .228 (every file, purpose noted) - Actual SSH config aliases (archy, archy228) with IdentitiesOnly - Verified connectivity matrix (laptop -> both; .116 -> .228; .228 has no outbound key) - Corrected sudo state: .228 sudoers file is /etc/sudoers.d/archipelago (not archipelago-ci); .116 has archipelago-ci + archipelago-wg scope-limited drop-ins - SSHFS mount source command + AppleDouble gotcha - Cargo over SSH PATH gotcha + detached build pattern for >2min timeout - End-to-end deploy-to-.228 recipe (build, SCP, atomic swap, verify) - Git workflow rules (no push, no amend, no force, conventional commits) Removes duplicate host-reference block that the prior edit left trailing. No code change.	2026-04-23 04:49:45 -04:00
archipelago	e557e0156f	docs: STATUS.md — dashboard Stop UX bug diagnosis + async-spawn fix plan Captures full design for the next session: - Full bug sequence (5.5min blocking RPC + 30s scan clobbering transitional state) - 4-commit implementation order with exact file:line targets - Single-button UI spec with full label table - Verification gates including manual LND stop test on .228 - Architectural decision: spawn lives in RPC layer, orchestrator trait stays sync No code change yet; next session implements.	2026-04-23 04:45:12 -04:00
archipelago	1ab66f33a3	docs: STATUS.md — .228 dashboard bugs fixed (macaroon + ExtraHost)	2026-04-23 04:17:56 -04:00
archipelago	3ee192ba1f	fix(first-boot): use podman host-gateway magic for host.containers.internal The previous code computed HOST_GATEWAY from `ip route show default` to work around an alleged podman 4.3.x limitation. Two problems: 1. The comment was wrong. Podman 4.4+ supports --add-host=host-gateway natively, and we ship 5.4.2. 2. More critically, `ip route show default` returns the LAN router (e.g. 192.168.1.254) — the gateway to the internet, not the gateway to the host. Every container configured with DAEMON_URL or --bitcoind.rpchost=host.containers.internal was therefore dialing the WiFi router instead of the host machine, silently failing. Symptoms this caused on .228: - LND crash-looped with "dial tcp 192.168.1.254:8332: connection refused" - Dashboard showed no LND connect details or QR - ElectrumX DAEMON_URL broken; stuck at 2 KB index for days - Any service reaching bitcoin-core through the `archy-net` bridge Replace the computed value with the literal string "host-gateway", which podman translates to the correct in-network gateway at container start. Also drop the stale HOST_GATEWAY reference in the Tor-bootstrap branch (it always fell back to TARGET_IP anyway). Verified on .228: after recreating bitcoin-core/electrumx/lnd with the new flag, LND reached the chain backend, ElectrumX resumed indexing, and the dashboard /lnd-connect-info endpoint succeeded.	2026-04-23 04:16:42 -04:00
archipelago	be96002372	fix(lnd): read admin macaroon via sudo fallback LND's admin.macaroon is owned by a rootless-podman subordinate UID (typically 100000) with mode 640. The archipelago server runs as UID 1000 and cannot read the file directly, which caused every dashboard LND RPC (getinfo, connect-info, export-channel-backup) and lnd_client to fail with "Failed to read LND admin macaroon". Add a read_lnd_admin_macaroon() helper that first tries a direct read (for operators who have relaxed permissions) then falls back to `sudo -n cat`, mirroring the pattern already used for Tor hidden service hostnames in handle_lnd_connect_info. Centralise the canonical macaroon path as LND_ADMIN_MACAROON_PATH and route all four callers through the helper. Verified on .228: GET /lnd-connect-info now returns 200 with cert, macaroon, and tor_onion fields. Dashboard QR/connect-string UI unblocked.	2026-04-23 04:15:44 -04:00
archipelago	4b8ef0a098	docs: STATUS.md through Step 9 (.228 hot-swap verified) Logs Step 9 acceptance evidence, the two bugs caught and fixed during the hot-swap (parse_memory_limit IEC suffix bug in 732df1b8 and cgroup Delegate in ba83f9bc), and outlines the Step 10 plan for .116.	2026-04-23 03:46:23 -04:00
archipelago	ba83f9bce2	feat(systemd): delegate cgroup controllers to archipelago.service Adds Delegate=memory pids cpu io to the archipelago.service unit. Context: the service runs as User=archipelago under system.slice with rootless podman. When podman creates transient libpod-*.scope units for containers under user.slice, systemd needs the caller to hold CAP_SYS_ADMIN on the target cgroup subtree, which happens iff Delegate= lists the controllers we want to set. Without Delegate, any future code path that goes through the podman CLI (runtime.rs) instead of the libpod HTTP API (podman_client.rs) would hit MemoryMax rejections that have exactly the same symptom as the bug I just fixed in parse_memory_limit but with a completely different root cause. Belt-and-braces: current production path uses PodmanClient and was fixed in the preceding commit. But the DockerRuntime CLI path in runtime.rs:262-268 (cmd.arg("--memory")) is still reachable via AutoRuntime fallback on hosts without podman, and future Rust orchestrator code may legitimately need cgroup delegation. This directive is a harmless no-op on hosts that already delegate upstream (systemd gracefully handles duplicate/nested delegation).	2026-04-23 03:44:36 -04:00
archipelago	732df1b8cb	fix: parse_memory_limit accepts Ki/Mi/Gi IEC binary suffixes The libpod HTTP API path (PodmanClient::create_container) ran manifest memory_limit values like "128Mi" through parse_memory_limit which lowercased+trim_end_matches("m"), leaving "128i" which parse::<f64>() rejected. The resulting None became 0 via .unwrap_or(0), and podman serialised that into the OCI config as memory.limit:0. At container start time systemd then rejected MemoryMax=0 with "Value specified in MemoryMax is out of range". Silently wrong for every manifest in apps/ that uses Kubernetes-style suffixes (all of them). Became visible on .228 when Step 9 first exercised the ProdContainerOrchestrator path for bitcoin-ui and lnd-ui installs; the old first-boot-containers.sh bash script used podman run --memory 128m directly, which podman-the-CLI parses correctly, so the bug never surfaced before. Two parts: - parse_memory_limit now recognises Ki/Mi/Gi/Ti (IEC binary, what k8s and our manifests use), kB/MB/GB/TB (SI decimal), k/K/m/M/g/G/t/T (docker shorthand, treated as IEC binary for backwards compat), and bare byte integers. Filters out zero/negative results. - create_container omits the memory/cpu fields entirely when the manifest has no limit or parsing fails, rather than emitting 0. The libpod API treats absent as unlimited; 0 is "set MemoryMax=0" which systemd rightly rejects. Defence in depth against the next weird suffix someone puts in a manifest. Six regression tests in the new tests module cover IEC, SI, shorthand, raw bytes, invalid input (empty/garbage/0/negative), and whitespace.	2026-04-23 03:44:23 -04:00
archipelago	a0707f4d48	feat(iso): Step 8a — retire archipelago-reconcile systemd timer BootReconciler (in-process, 30s interval, spawned from main.rs as of Step 6 commit 48f08aa3) fully replaces the timer-driven bash reconciliation path. Delete the systemd unit + timer and their ISO-builder touchpoints. Removed: - image-recipe/configs/archipelago-reconcile.service - image-recipe/configs/archipelago-reconcile.timer - image-recipe/build-auto-installer-iso.sh L412-413 (COPY unit+timer) - image-recipe/build-auto-installer-iso.sh L449 (systemctl enable) - image-recipe/build-auto-installer-iso.sh L542-543 (cp to WORK_DIR) Kept (intentionally): - scripts/reconcile-containers.sh - scripts/container-specs.sh Reason: core/archipelago/src/api/rpc/package/update.rs still invokes reconcile-containers.sh at two sites (OTA update + rollback paths). Porting those call sites to ContainerOrchestrator::upgrade() requires manifests for every container update.rs might touch — that scope belongs in Step 8b. Until then the script stays on disk, just no longer runs on a periodic timer. No Rust code changes. cargo check -p archipelago clean, 6 pre-existing warnings. Skipped full ISO rebuild validation per user decision — edits are 5 textual deletions with zero behavioral ambiguity; Step 9 live hot-swap on .228 will catch any regression.	2026-04-23 03:04:58 -04:00
archipelago	1c81a739d6	docs: split Step 8 into 8a/8b/8c Discovered during Step 8 execution that first-boot-containers.sh creates 30+ containers with per-container logic (wallet loads, DB init, rpcauth derivations, post-create health waits) and does substantial non-container setup (secret gen, rootless-podman subuid chowns, Tor hostnames, WireGuard, firewall, nostr-relay). Only 3 of the 30+ containers have manifests today (the UIs from Step 7). Deleting the bash in a single step bricks first-boot on fresh installs. Split into: - 8a: delete reconcile-containers.sh + container-specs.sh + reconcile systemd unit + timer. BootReconciler fully covers these. Safe, atomic, no manifest porting required. - 8b: port remaining ~25 containers into apps/<id>/manifest.yml. One manifest per commit, validated against current bash behavior. Multi-day scope. - 8c: rename first-boot-containers.sh -> first-boot-setup.sh, strip container ops, keep secret/dir/Tor/WG/firewall setup. Final one-way door, requires 8b complete.	2026-04-23 02:34:43 -04:00
archipelago	6e46932f72	docs: STATUS.md through Step 7	2026-04-23 02:21:01 -04:00
archipelago	069bc4a561	feat(container): bitcoin-ui pre-start hook renders nginx.conf from embedded template Replaces the first-boot-containers.sh sed/envsubst approach with a Rust-native render step bound into the ContainerOrchestrator lifecycle. - New container::bitcoin_ui module: embeds the nginx.conf template via include_str!, reads the plaintext RPC password from /var/lib/archipelago/secrets/bitcoin-rpc-password, substitutes {{BITCOIN_RPC_AUTH}} with base64(archipelago:<password>), and atomic- writes (tmp + rename) to /var/lib/archipelago/bitcoin-ui/nginx.conf. Idempotent: byte-compares before writing so unchanged input is a no-op (no inode churn, no restart cascade). - ProdContainerOrchestrator gains run_pre_start_hooks(app_id) returning HookOutcome::{Rewritten, Unchanged}. Fires in install_fresh before create_container, and in ensure_running: on Running + Rewritten triggers a restart; on Stopped re-renders then starts. - bitcoin-ui Dockerfile no longer COPYs a default.conf; the file now arrives via runtime bind-mount of the rendered config. If the bind- mount is ever missing, nginx starts with no site configured and returns 404 everywhere — safe failure vs. serving upstream RPC with a stale Authorization header. - apps/{bitcoin,electrs,lnd}-ui/manifest.yml land as first-class manifests. bitcoin-ui declares the bind-mount target and a dependency on bitcoin-core; electrs-ui and lnd-ui declare their own deps and health checks. - 8 new unit tests on the render fn (idempotency, rotation, trimming, missing/empty secret, template invariants) plus an integration test asserting install(bitcoin-ui) actually lands a substituted nginx.conf on disk via the hook. 39/39 container:: tests pass (test_parse_image_versions pre-existing failure unchanged, out of scope).	2026-04-23 02:19:52 -04:00
archipelago	ca734e4ea6	docs: STATUS.md through Step 6	2026-04-22 19:20:17 -04:00
archipelago	48f08aa3e4	feat(container): wire ProdContainerOrchestrator + BootReconciler into main Step 6 of the rust-orchestrator migration. Construct the container orchestrator once in main.rs, call load_manifests + adopt_existing immediately after Config::load, log the adoption report, and spawn BootReconciler::run_forever with the 30s default interval. Thread the orchestrator through Server::new -> ApiHandler::new -> RpcHandler::new so the reconciler and RPC layer share one instance. Wire a tokio::sync::Notify through the SIGTERM/SIGINT shutdown path so the reconciler exits cleanly alongside the server drain. Uses notify_one so the signal stores a permit if the reconciler is mid reconcile_all when the signal fires. Delete the commented-out run_boot_reconciliation block in main.rs that documented the prior bash-script approach being unsafe on unbundled installs — the new reconciler is manifest-driven and only touches apps present in /opt/archipelago/apps, fixing that concern. cargo check -p archipelago clean (6 pre-existing dead-code warnings on trait methods not yet exercised until Step 9 hot-swap). Container test suite 43/44 pass; the one failure (container::image_versions:: test_parse_image_versions) is pre-existing and unrelated.	2026-04-22 19:20:13 -04:00
archipelago	fc39b04b4e	feat(container): BootReconciler — periodic reconcile loop for prod orchestrator Step 5 of the rust-orchestrator migration. New file boot_reconciler.rs holds a small Tokio task that calls ProdContainerOrchestrator::reconcile_all() on a 30-second cadence (answered design Q3). * BootReconciler::new(orch, interval, shutdown) — shutdown is an Arc<Notify> so callers can trigger a graceful exit without pulling in tokio-util. * run_forever(self) — does one reconcile immediately, then loops on tokio::select! { sleep_until | shutdown.notified() }. Shutdown interrupts the sleep but never an in-flight reconcile_all call. * Per-pass outcomes are logged at debug/warn; failures never propagate out because reconcile_all already absorbs per-app errors into ReconcileReport. Four tokio::test(start_paused = true) tests verify the loop cadence against a CountingRuntime test double: * initial_pass_fires_immediately — first reconcile runs with no delay * second_pass_fires_after_interval — second pass fires after exactly interval elapses in paused-clock time * shutdown_terminates_loop — notify_one() lets run_forever return * failure_in_one_pass_does_not_stop_loop — the loop keeps ticking even when the first pass had to install a missing container Not wired into main.rs yet — that is Step 6. Re-exported from container::mod as BootReconciler + RECONCILER_DEFAULT_INTERVAL for the wire-up step.	2026-04-22 19:04:34 -04:00
archipelago	d7692790bc	docs: update STATUS.md — Step 4 done, Step 5 next Records acceptance evidence for Steps 1-4 (container tests 21/21 pass, build clean with expected unused-method warnings) and queues the BootReconciler implementation for Step 5.	2026-04-22 18:57:43 -04:00
archipelago	138588422a	chore: gitignore macOS AppleDouble files from SSHFS writes The laptop mounts ~/Projects/archy over SSHFS and macOS finder / Spotlight sidecars write ._<name> resource-fork files alongside every edit. They are noise; keep them out of git.	2026-04-22 18:56:58 -04:00
archipelago	e8a59c93c6	feat(container): ContainerOrchestrator trait, RpcHandler uses it in prod Step 4 of the rust-orchestrator migration. Unifies the container lifecycle surface behind a single trait so the RPC layer stops caring whether it is talking to the dev or prod orchestrator. * New trait core/archipelago/src/container/traits.rs: ContainerOrchestrator with install / start / stop / restart / remove / upgrade / status / list / logs / health, all keyed by app_id. Every method is async_trait-based. * ProdContainerOrchestrator: the lifecycle methods are moved from inherent impl into the trait impl (avoids name-shadowing recursion). Adoption and reconcile remain inherent since only main.rs / BootReconciler call them. * DevContainerOrchestrator: new trait impl that forwards to the existing Dev-named methods, applying the dev container-name + port-offset rules internally. New load_manifest_for() helper resolves app_id to <data_dir>/apps/<app_id>/manifest.yml so trait-level install(app_id) works in dev too. install_container(manifest, path) stays inherent for the manifest-path RPC shape. * RpcHandler now holds Option<Arc<dyn ContainerOrchestrator>> and, when in dev mode, a separate Option<Arc<DevContainerOrchestrator>> for the manifest_path install RPC. In prod mode RpcHandler::new() constructs a ProdContainerOrchestrator and calls load_manifests() at startup. * All seven container-* RPC guards no longer say dev mode required. container-install still requires dev mode because its manifest_path argument has no prod meaning; every other container RPC now works in both modes via the trait. BOOT STILL DOES NOT USE THIS. main.rs wire-up (Step 6) and BootReconciler (Step 5) come next. Until then the prod orchestrator is constructed but nothing populates /opt/archipelago/apps so it has zero manifests to manage, matching the pre-Step-4 behaviour. Verification: cargo build -p archipelago clean (11 expected unused method warnings for methods not yet wired from main.rs). 
cargo test -p archipelago: all 21 container::* tests pass (16 prod_orchestrator + 5 others). 24 other test failures are pre-existing and unrelated (identity_manager / session / wallet / mesh / credentials — all independently flaky on file-backed state).	2026-04-22 18:56:52 -04:00
archipelago	b6a04d315a	feat(container): ProdContainerOrchestrator with build-or-pull, adoption, reconcile Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC) implements the full public surface that will replace scripts/first-boot-containers.sh: * install / start / stop / restart / remove / upgrade / status / list / logs / health * adopt_existing: read-only scan that claims containers matching our manifests by name, without recreating — preserves the v1.7.42 fixture on .116. * reconcile_all: level-triggered, per-app failures collected rather than aborting. * install_fresh: build-or-pull (Step 2 trait methods), relative build contexts resolved against the manifest directory. Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the archy- prefix; backends keep their bare ID. An explicit extensions.container_name always wins. Codified in compute_container_name() with unit tests for all three tiers. Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily, protecting every mutating op against the reconciler loop. Acquiring the per-app lock only needs a read lock on the map, so independent apps do not serialize. 16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull, build-absent, build-present, relative-context), reconcile (noop/exited/missing/ mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health state mapping, and unknown-app-id rejection. All pass. Not wired into main.rs yet — that is Step 6. Crate builds clean with expected unused warnings for the new re-exports.	2026-04-22 18:32:31 -04:00
archipelago	34af4d9d4e	feat(container): runtime trait gains image_exists + build_image Adds two methods to ContainerRuntime so the upcoming ProdContainerOrchestrator can inspect local image storage and build images from BuildConfig: - image_exists(image_ref) -> Result<bool>: local-storage check only, does not consult registries. Distinguishes exit 0 (present) from exit 1 (absent) from other failures (environment error). - build_image(&BuildConfig) -> Result<()>: shells out to podman/docker build with -t, -f, deterministically-sorted --build-arg pairs, and the context path last. Implemented on all three runtimes: - PodmanRuntime: new podman_cli helper shells out alongside the existing HTTP API calls (build and image inspect are awkward over the HTTP API) - DockerRuntime: native docker CLI, same exit-code semantics - AutoRuntime: delegates to the selected inner runtime Argv construction extracted into pure build_args_for_podman helper so it can be unit-tested without a real podman. 4 new tests cover minimal args, custom Dockerfile path, deterministic build-arg sorting (guards against HashMap iteration non-determinism), and context-is-last (positional arg placement is load-bearing for podman build). Step 2 of docs/rust-orchestrator-migration.md. 25/25 tests pass.	2026-04-22 17:46:47 -04:00
archipelago	3767c2670c	feat(container): add build source to manifest schema ContainerConfig.image is now Option<String>, mutually exclusive with a new optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image or build must be present, enforced in AppManifest::validate. Adds ResolvedSource enum (Pull | Build) and ContainerConfig::resolve + ::image_ref helpers so the orchestrator can treat pull and build uniformly. All 26 existing pull-only manifests continue to parse unchanged (covered by existing_pull_only_manifests_still_parse test). Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator. Dev orchestrator errors out cleanly on Build sources until Step 2 lands build_image support on the runtime trait. Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass. Also includes: docs/rust-orchestrator-migration.md (design spec) and docs/STATUS.md resume section for the next session.	2026-04-22 17:46:36 -04:00
archipelago	7ecd30bde2	release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block validation. The old 10s client-side timeout was rejecting roughly 30% of UI calls even though the node was perfectly healthy. 20x stress test on the live .116 node (caught in IBD catch-up at block 797k) used to drop 10 of 20 calls; now drops 0 of 20. What changed: - core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up to 3 times with 500ms and 1500ms backoffs between attempts. Only transient transport errors (timeout, connect refused, send/recv IO) trigger retry. A well-formed bitcoind error response is surfaced immediately - real RPC bugs are never masked. - Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top of reqwest's own timeout, so DNS starvation or TLS wedging can't steal the entire retry budget. - handle_bitcoin_getinfo client builder gained a 3s connect_timeout so a dead bitcoind is fast-failed inside the first attempt instead of eating the whole 15s. - Retry policy extracted into a RetryConfig struct so tests can dial down timeouts to ~100ms per attempt. Production defaults live in RetryConfig::production(). Not changed (tracked as follow-up): - mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the same 10s-timeout pattern. Not migrated to the new wrapper in this release; scheduled for v1.7.43 alongside the render_bitcoin_conf work. - lnd/info.rs and electrs_status have similar 10s/15s timeouts but different failure profiles - audit first, migrate only the ones that actually exhibit the bug. Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing. Uses an in-process hyper server (already a transitive dep) to simulate bitcoind responses; no new crates required. 
- happy_path_first_attempt: no retry when first attempt succeeds - retries_on_timeout_then_succeeds: first attempt times out, second succeeds, returns OK (uses a short-timeout RetryConfig so the test runs in <1s instead of 15s) - retries_exhausted_on_persistent_connect_refused: all attempts fail against a closed port, error bubbles up, elapsed time confirms backoffs actually ran - does_not_retry_on_rpc_level_error: bitcoind-returned error body is surfaced immediately, no retry - does_not_retry_parse_errors: non-JSON response (e.g. 503 with html body) is NOT retried - guards against the tempting "retry all non-2xx" mistake that would mask real bitcoind misconfig - retry_budget_invariants: asserts total wall-time ceiling stays under 60s so a bumped constant can't silently hang a UI call forever Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the old 10s timeout. Worst-case latency was 48.9s during peak validation; happy-path latency (cached result) remains 28-77ms.	2026-04-22 16:46:28 -04:00
archipelago	048679065e	release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 + v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500) with no recovery path short of SSH. This release adds a self-check guardrail to the update flow. What changed: - apply_update() writes a pending-verify marker with old+new version and a 150s deadline immediately before scheduling the service restart. - verify_pending_update() runs from main.rs startup. If the marker is present and within its freshness window, the new binary waits 15s for nginx + backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s (self-signed certs accepted). - On any probe success within the window, the marker is cleared and nothing else happens. - On window-exhaust, the new binary: 1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts> (quarantined, not deleted, so we can post-mortem). 2. Restores web-ui.bak on top of web-ui. 3. Calls rollback_update() to restore the previous binary. 4. Updates state.current_version to reflect the rollback. 5. systemctl --no-block restart archipelago so the OLD binary boots. - Markers older than 10 minutes are treated as stale and cleared without probing, so a crashed-during-startup marker from weeks ago cannot spontaneously roll back a healthy node on a later reboot. - rollback_update() binary copy now goes through host_sudo instead of tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount namespace. Without this, the rollback silently failed with EROFS on /usr/local/bin and orphaned the rollback - the exact opposite of what auto-rollback is for. Tests: 4 new unit tests in update::tests covering marker round-trip, absent-marker noop, no-panic on verify_pending_update with nothing to verify, and an invariant assert that the 90s probe window stays below the 600s stale threshold. All passing. 
Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a single awk NR==1 that doesn't short-circuit the upstream pipe, so the release-build flow is idempotent again.	2026-04-22 16:14:35 -04:00
Dorian	50744952b7	release(v1.7.40-alpha): fix tarball root perms at source so OTA can't 500 again v1.7.38 and v1.7.39 both shipped with `./` inside the frontend tarball marked drwx------ (700). Tar extraction preserves archive perms, so every node that pulled the OTA landed with /opt/archipelago/web-ui at 700, nginx (www-data) returned 500 "permission denied" on every page, and the browser showed "Internal Server Error nginx". .116 hit this on both v1.7.38 and v1.7.39 rollouts. The v1.7.39 runtime self-heal in main.rs was the wrong layer — systemd's ReadOnlyPaths namespace made /opt/archipelago read-only from inside the archipelago service, so chmod from there returned EROFS. Root cause: create-release-manifest.sh used mktemp -d (700 default umask) for staging, then tar preserved that 700 in the archive's root entry. Fix the archive itself: - chmod 755 staging dir + `find -type d -exec chmod 755` + `-type f chmod 644` before tar, so the on-disk entries are correct. - tar --owner=0 --group=0 --mode='u=rwX,go=rX' to normalize archive perms belt-and-braces in case file-mode drift ever reappears. - Post-tar verify: `tar tvzf | head -1` must show drwxr-xr-x at root, or the release script aborts before the manifest is even generated. Binary unchanged semantically — the main.rs self-heal stays in as a last- resort belt (can't hurt on nodes whose FS isn't namespace-isolated), and the update.rs in-extractor chmod stays in so v1.7.40-onwards extractors are double-safe. The authoritative fix is the archive. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:54:44 -04:00
Dorian	3218f71703	release(v1.7.39-alpha): hotfix web-ui perms after OTA (nginx 500) + startup self-heal v1.7.38 shipped with an OTA bug: the tar-extracted staging dir inherited 700 perms and nginx (www-data) returned 500/403 on every request after the swap. .116 hit this on rollout; had to chmod by hand to recover. - update.rs: after extraction, explicitly chmod 755 dirs + 644 files on the new staging dir before the mv into place, so nginx can stat/serve them. - main.rs: self-heal on startup — if /opt/archipelago/web-ui is not world-readable, run `sudo chmod -R u=rwX,go=rX` to repair. This is what rescues nodes upgrading from v1.7.37/v1.7.38, since their extractor (running on the old binary) doesn't have the chmod fix yet — the new binary's first boot fixes the mess before nginx serves a single request. Everything v1.7.38 shipped is still in this release: - auth.rs auto-heals is_onboarding_complete() from setup_complete + password_hash so nodes don't bounce back to /onboarding/intro after browser clear / reboot / update - useOnboarding tri-state: backend-unreachable no longer defaults to intro - login sounds gated by isFirstInstallPhase() — silent after onboarding, typing sounds unaffected - FIPS app / Nostr Relay / Nostr VPN / Routstr / Penpot removed from catalog + frontend + Rust + docker + icons; 15 image versions deleted from tx1138, .168, gitea-local - AIUI baked into release tarball via demo/aiui/ - prebuild hook syncs app-catalog/catalog.json → public/catalog.json Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:26:54 -04:00
Dorian	ca5d2cc42a	release(v1.7.38-alpha): onboarding auto-heal + silent returning logins + app-store trim - auth.rs now infers onboarding-complete from setup_complete + password_hash so nodes stop bouncing users through the intro wizard after browser clear / update / reboot; the flag self-heals to disk on next check - frontend: "backend uncertain" no longer defaults to /onboarding/intro — useOnboarding returns null + callers poll / retry instead of flashing the wizard - login sounds (synthwave, welcome voice, pop, whoosh, oomph) gated by isFirstInstallPhase(); typing sounds unaffected - removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog, frontend config, Rust AppMetadata + install dispatch + install_penpot_stack; docker/fips-ui + docker/nostr-vpn-ui + apps/penpot dirs and 5 icons deleted; 15 image versions deleted from tx1138, .168, gitea-local registries (.160 Gitea was 502 at release time — follow-up) - AIUI baked into frontend release tarball via demo/aiui/; deploy-to-target falls back to demo/aiui/ when the AIUI sibling checkout is missing - prebuild hook syncs app-catalog/catalog.json → public/catalog.json so the two copies can no longer drift (was the source of the "apps still visible" bug — public/ had stale data) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:02:24 -04:00
Dorian	9cb114c50a	release(v1.7.37-alpha): bitcoin-core install fixes + dynamic node UI + full-archive default Install flow - api/rpc/package/install.rs: always append the literal image URL as a last-resort pull candidate in do_pull_image, so images not carried by any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install instead of masquerading as a generic pull failure across every mirror. - api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat error, not just "file exists". Once bitcoin-knots' first-boot chowns /var/lib/archipelago/bitcoin into the container's user namespace (700 perms, UID 100100/100101), the archipelago daemon can't even traverse in: try_exists returns Err which unwrap_or(false) treated as "not present" and drove a doomed write. Now errors out of the directory traversal are treated as "conf already owned by container user" and the write is skipped. Mirrors the lnd.conf pattern. - api/rpc/package/install.rs: drop the hardcoded `prune=550` from the conf default. Operators with multi-TB drives shouldn't be silently pruned; users who want a pruned node can set it in bitcoin.conf themselves. Full archive is the only honest default. - api/rpc/package/config.rs: bitcoin-core now passes explicit -server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and reads conf + argv only; without these the RPC listens on 127.0.0.1 inside the container and rootlessport can't reach it, so the bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call. Bitcoin Knots keeps its own entrypoint-driven defaults. - container/docker_packages.rs: split bitcoin-core out of the shared AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with bitcoin-core.svg and a Reference-implementation description; the bitcoin + bitcoin-knots ids keep the Knots branding. Fixes the home card showing "Bitcoin Knots" for a Core install. Bitcoin node UI (docker/bitcoin-ui) - index.html: impl name/tagline/logo now dynamic. applyImplBranding() reads subversion from getnetworkinfo: /Satoshi:X/Knots:Y/ resolves to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both get their own icon and subtitle. Settings modal replaced its hardcoded Regtest/txindex=1/port-18443 placeholders with live values from getblockchaininfo + getindexinfo + getzmqnotifications. - index.html: new Storage info card (Full Archive · X GB / Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on the main dashboard, same level as Network. Settings modal mirrors it with the prune height when applicable. - Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the bg-network.jpg used by the dashboard are now COPY'd into the image under /usr/share/nginx/html/assets. Previously the <img src> pointed at paths that 404'd into the SPA fallback and the onerror handler hid the broken logo silently. Frontend - appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334), HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core). Without these the AppSessionFrame showed "No URL found for bitcoin-core" and the home/app-list title fell through to the raw id. - settings/AccountInfoSection.vue: backfill What's New entries for v1.7.31 through v1.7.37 that had been missed in earlier cuts. Release plumbing - releases/v1.7.37-alpha/: binary + frontend tarball. - releases/manifest.json: v1.7.37-alpha, sha256/size refreshed. - Cargo.toml / package.json: version bumps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:03:47 -04:00
Dorian	1f912a0f58	fix(catalog): prefix bitcoin-core image with docker.io/ so the install validator accepts it The trusted-registry allowlist in api/rpc/package/config.rs splits the image on '/' and matches the first segment against a fixed set (docker.io, ghcr.io, git.tx1138.com, 23.182.128.160:3000, ghcr.io, localhost). A bare 'bitcoin/bitcoin:28.4' splits to registry="bitcoin" which isn't on the list, so the install RPC was returning 'Invalid Docker image format'. Live catalogs on .160 and gitea-local already hotfixed directly; these static copies keep ISO builds and the final hardcoded fallback in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 09:18:49 -04:00
Dorian	7106a81c6a	release(v1.7.36-alpha): bitcoin-core in App Store + Sovereignty Stack + dynamic catalog URL - neode-ui/public/assets/img/app-icons/bitcoin-core.svg (NEW): 256×256 Umbrel community Bitcoin icon sourced from getumbrel.github.io/umbrel-apps-gallery/bitcoin/icon.svg. Referenced by the static catalog, the curated fallback, and the upstream lfg2025/app-catalog entry so every surface shows the same image. - app-catalog/catalog.json + neode-ui/public/catalog.json: add bitcoin-core (v28.4) entry pointing at bitcoin/bitcoin:28.4. Same entry pushed to the lfg2025/app-catalog repo on .160 and the local gitea mirror so nodes see it without needing a full archipelago update. Sovereignty Stack entry added to FEATURED_DEFINITIONS with a description that frames it as a Knots alternative, not a rival. - core/archipelago/src/api/handler/mod.rs: handle_app_catalog_proxy is now instance-scoped (&self) and derives its upstream list from load_registries: each active container registry contributes one `<scheme>://<reg.url>/app-catalog/raw/branch/main/catalog.json` URL in priority order (scheme follows tls_verify). When the operator switches mirrors in Settings, the App Store now follows. Falls back to the legacy hardcoded .160/tx1138 pair only when registry config can't be loaded, so the App Store still renders on nodes that haven't persisted one yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 09:06:10 -04:00
Dorian	987158ef5f	release(v1.7.35-alpha): rootless-netns self-heal + app update button + bitcoin-core 28.4 + Node DID unification - core/archipelago/src/bootstrap.rs (NEW): embed scripts/container-doctor.sh and image-recipe/configs/archipelago-doctor.{service,timer} via include_str! and sync to disk + enable the timer on every archipelago startup. Idempotent (content-hash compare), dev-box symlink guard keeps the git checkout untouched, best-effort (warn-only on failure) so bootstrap never blocks server readiness. Wired in main.rs as a background tokio task. - scripts/container-doctor.sh: add fix_rootless_netns_egress(). Detects when the rootless-netns has lost its pasta tap (container-to-container still works but outbound DNS/TCP fails) via an nsenter probe into aardvark-dns; with a two-probe 10s debounce to rule out transients and a host-precheck that bails out if the host itself is offline. When the rootless-netns is truly broken, does a graceful podman stop --all / start --all so pasta + aardvark-dns rebuild the netns from scratch. Bitcoin-knots and every other outbound container recover in one cycle. - core/archipelago/src/update.rs: host_sudo → pub(crate) so bootstrap.rs can reuse the existing systemd-run escape hatch. - apps/bitcoin-core/manifest.yml: bump app version 24.0.0 → 28.4.0 and image bitcoin/bitcoin:24.0 → bitcoin/bitcoin:28.4. Resources aligned with the real container-specs.sh large-disk tune (4 GiB memory cap, cpu_limit: 0 so bitcoind can run -par=auto across every core). - neode-ui/src/views/apps/AppCard.vue + Apps.vue: add an Update button + Updating spinner to every app card that has available-update set. Wires through serverStore.updatePackage(id), the same RPC the detail view already calls. common.update / common.updating i18n keys added in en.json and es.json. - core/archipelago/src/identity_manager.rs: add create_from_signing_key() that mirrors an existing Ed25519 key as a manager-level identity with a deterministic id (`node-<pubkey16>`). Idempotent across restarts, gets the hex-SVG master avatar. - core/archipelago/src/server.rs: the auto-create path on first boot now mirrors the node's own signing_key (seed-derived on onboarded installs) as a "Node" identity instead of generating a random "Default" keypair. Once this ships, the DID on the Web5 DID Status card (via node.did RPC), the Node entry on the Identities page (via identity.list), and the DID used for peer-to-peer connects (via server_info.pubkey) all resolve to the same seed-derived pubkey. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 08:29:56 -04:00
Dorian	5f6b4232d2	release(v1.7.34-alpha): re-seed onboarding cache + rotating login bg + drop re-login zoom - useOnboarding.ts: when the backend gives a definitive answer (true/false, not a null retry failure), re-seed the neode_onboarding_complete localStorage flag accordingly. Fixes the case where a user clears site data on an already-onboarded node: OnboardingWrapper's useVideoBackground computed reads localStorage synchronously, so without this re-seed the intro video would fire again on /login even though RootRedirect correctly sent them straight to /login. - OnboardingWrapper.vue: login background now rotates through bg-intro-1..6 on each /login mount, with the current index persisted to localStorage (neode_login_bg_idx) so subsequent logouts advance rather than repeat the same image. - Dashboard.vue: subsequent-login branch drops the 1.2s showZoomIn entirely. Only the first dashboard entry after onboarding plays the full zoom + glitch reveal; every re-login now just fades in with the welcome typing (~300ms). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 05:42:52 -04:00
Dorian	65582d67c6	release(v1.7.33-alpha): onboarding/login UX fixes + PWA cache bust - useOnboarding.ts: prefer the backend over localStorage when checking onboarding completion. The old order (localStorage first) meant any browser that had ever onboarded a node would treat every new fresh node as already-onboarded and skip the wizard, dumping the user straight at the inline set-password form. Backend is now authoritative; localStorage stays as the offline fallback. - OnboardingWrapper.vue: skip the intro video on `/login` once `neode_onboarding_complete` is set. Returning logged-out users now get the static lock-screen background + glitch overlay instead of replaying the full intro on every logout. - RootRedirect.vue: when the health check fails, only show the full BootScreen if the node was never onboarded. For already-onboarded nodes (i.e. an OTA-update blip), keep the spinner and poll the health endpoint every 2s for up to 60s before falling back to the boot screen. Fixes the "fake boot loader" / "server starting up" screens flashing on every successful update. - loginTransition store: new `justCompletedOnboarding` flag distinct from `justLoggedIn`. Set true only by the inline setup-password flow (handleSetup). Dashboard.vue branches on it: full glitch+zoom reveal for the post-onboarding entry, quick zoom + welcome typing on every other login (no triple glitch flashes, ~1.2s vs 8s). - vite.config.ts: bump assets cache from `assets-cache-v2` to `assets-cache-v3` so service workers running the previous bundle invalidate their cache and pick up the new UI cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 04:45:33 -04:00
Dorian	fd3f5d2701	release(v1.7.32-alpha): fix frontend tarball layout + mDNS shutdown hang - HOTFIX: v1.7.31-alpha's frontend tarball was packaged with a `neode-ui/` top-level directory instead of the flat layout v1.7.30 and earlier used. Nodes that applied v1.7.31 ended up with `/opt/archipelago/web-ui/neode-ui/index.html` instead of `/opt/archipelago/web-ui/index.html`, and nginx returned 403/500. v1.7.32's tarball is built with `tar -C web/dist/neode-ui .` so files land directly at web-ui root. Broken nodes auto-heal on this update (web-ui dir is replaced). - transport/lan.rs: add Drop impl that calls ServiceDaemon::shutdown() on the mdns_sd daemon. Without this the OS thread it spawns, plus the blocking `receiver.recv()` task, keep the tokio runtime alive past SIGTERM, long enough for systemd's TimeoutStopSec to SIGKILL the service and mark it Failed. Was visible on every update: "shut down cleanly" logged, then 15s later systemd forcibly kills. - main.rs: after logging "Archipelago shut down cleanly", call `std::process::exit(0)` explicitly. Belt-and-suspenders against any future non-daemon thread creeping in (reqwest resolver pool, etc.) and causing the same SIGKILL regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 03:52:22 -04:00
Dorian	fdaa5646b2	release(v1.7.31-alpha): idempotent IndeedHub install + auto-merge default mirrors/registries + 3rd OVH update mirror - Backend: install.rs registry reachability probe now strips the `host[:port]/namespace` suffix before appending `/v2/` (the Docker V2 API lives at the host root, not under the namespace) and accepts HTTP 405 in addition to 200/401 as "registry daemon alive". This fixes false "unreachable" reports on the Test button for Gitea and other registries that protect their /v2/ endpoint. - Backend: stacks.rs install_indeedhub_stack now force-removes any leftover indeedhub-* containers and indeedhub-net before creating the stack. A partial install (or the old first-boot stub racing the installer) used to leave containers around that blocked re-install with "name already in use". Re-running the App Store install now self-heals. - Backend: registry.rs load_registries auto-merges any default registry URLs missing from the saved config (appended with priority max+10+i, persisted). Lets new default mirrors (e.g. Server 3 OVH) roll out to existing nodes without manual config edits. Explicit removals still stick: URLs absent from disk AND absent from defaults stay gone. - Backend: update.rs adds DEFAULT_TERTIARY_MIRROR_URL at http://146.59.87.168:3000/ (Server 3 OVH) to default_mirrors, with the same auto-merge-on-load behavior as registries. Test updated for 3-mirror default (.160, tx1138, .168). - Scripts: dropped the first-boot IndeedHub stub (~38 lines in first-boot-containers.sh §8b). It predated the proper stack installer, raced it, and was the main source of the name-conflict mess the stacks.rs cleanup above now also guards against. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 03:26:09 -04:00
Dorian	f9b44f5e2e	release(v1.7.30-alpha): live install/uninstall progress + cleaner pull waterfall - Backend: unified pull-progress streaming across primary AND fallback registries. Earlier code only streamed for the primary attempt; if it failed fast (VPS 404, etc.) the UI froze at 0% until the fallback finished. The waterfall now uses a single shared helper that streams podman stderr through update_install_progress for every URL tried. - Backend: PackageDataEntry gains uninstall_stage, set at each phase of handle_package_uninstall ("Stopping containers (i/total)", "Cleaning up volumes", "Removing app data"). State flips to Removing during the pipeline. - Frontend: MarketplaceAppCard renders the live progress bar with byte counts during installs, matching the System Update download bar style. - Frontend: AppCard renders the live uninstall stage label per app. Modal closes immediately on confirm so concurrent uninstalls each show their own progress on their own card. - Cleanup: removed dead helpers (image_candidates, rewrite_for_primary, primary_image_url, pull_from_registries_with_skip) made unused by the install.rs refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 19:11:36 -04:00
Dorian	7432d84545	release(v1.7.29-alpha): VPS as default app registry + settings UI - New Settings → App registries page (/dashboard/settings/registries) that mirrors the update-mirrors experience: list of configured registries, test reachability, set primary, add/remove. New registry.set-primary RPC; existing registry.{list,add,remove,test} reused. - Default RegistryConfig flipped: VPS (23.182.128.160:3000/lfg2025) is now Server 1 (primary), tx1138 is Server 2 (fallback). - Install pipeline now rewrites the first pull to the primary registry URL before attempting it. Before this, installs always hit whichever registry the image was hardcoded to, so changing the primary didn't actually affect where images came from. On failure, the existing fallback walk skips the primary (already tried) and walks the rest. - App catalog proxy UPSTREAMS order flipped so the catalog follows the same VPS-first rule. - Reboot overlay: animated "a" logo now sits in the center of the ring (matches the screensaver composition). Extracted the logo-wrapper pattern inline. 7/7 registry tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:54:07 -04:00
Dorian	79ae14a127	release(v1.7.28-alpha): reboot progress overlay + VPS default primary - New reboot progress overlay: full-screen black with the screensaver's pulsing ring, rebooting → reconnecting → back-online → stalled stages, elapsed counter, auto-reload on health-check success, manual reload button at 3 min stall. Mirrors the existing update overlay. - Ring extracted from Screensaver.vue into a reusable ScreensaverRing component so the reboot overlay reuses the same animation. - default_mirrors() now puts the VPS as Server 1 (primary) and tx1138 as Server 2 — new nodes fetch manifests from VPS first; existing nodes keep whatever mirror order they've customized. - What's New entry prepended for v1.7.28-alpha. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:06:37 -04:00
Dorian	c3b3b03ee1	release(v1.7.27-alpha): mirror transparency — served-by line + one-click test button - New "Served by {mirror}" line on the System Update page so operators can see which mirror actually served the available manifest (vs. which is configured primary). Backend threads the served URL through UpdateState.manifest_mirror. - New update.test-mirror RPC + per-row lightning-bolt button that pings a mirror and renders reachable/latency or error inline under the URL. - UI polish on the mirrors section: Set Primary, Remove, and the new Test action are compact icon buttons; add-mirror form moved into a dialog. - "What's New" block prepended for v1.7.27-alpha. 21/21 update module tests pass. vue-tsc + vite build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 13:05:42 -04:00
Dorian	97a3803640	release(v1.7.26-alpha): mirror list + origin-relative download URLs Adds a multi-mirror manifest fetch. `check_for_updates` walks a configurable list (data_dir/update-mirrors.json) in priority order and falls through to the next mirror on any HTTP / parse / timeout failure. Two defaults bake in: Server 1 (git.tx1138.com) and Server 2 (23.182.128.160:3000). Critical fix: after parsing a manifest, rewrite every component's `download_url` so its origin matches the manifest URL we fetched. Before this, the manifest hard-coded absolute URLs pointing at one specific server — so even when a node fetched the manifest from a faster mirror, the actual 200MB download went back to the slow original. Now the faster mirror wins end-to-end. New RPCs: update.list-mirrors, update.add-mirror, update.remove-mirror, update.set-primary-mirror. New UI section on the System Update page for operator management. 5 new unit tests for origin parsing and manifest rewriting (21/21 green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:09:28 -04:00
Dorian	5c634baa6d	release(v1.7.25-alpha): TCP transport for public FIPS mesh + modal cleanup Re-adds the TCP transport (`0.0.0.0:8443`) to the rendered fips.yaml alongside UDP. Upstream factory default enables both; we had inadvertently narrowed to UDP-only when the yaml rewriter was last touched, which left nodes unable to reach fips.v0l.io (the public anchor only answers on TCP right now) or talk across networks that block UDP. Backend startup now compares the installed yaml against the current rendered schema and restarts whichever fips unit is active when they differ — so OTA-upgrading nodes pick up the new transport without anyone having to click Reconnect. Dropped the earlier plan to auto-add federated peers as seed anchors: invites don't carry a FIPS-reachable IP:port, and once TCP reconnects the public mesh, federated peers become npub-routable without needing a seed entry. Seed Anchors modal cleanup: replaced malformed header icon with a three-arc broadcast glyph, and the close button now matches the What's New modal (embedded in the card header, same icon + hover style) instead of the earlier floating off-design placeholder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:25:53 -04:00
Dorian	41474047bf	release(v1.7.24-alpha): unbreak frontend pipeline — fresh UI for the first time since v1.7.17 The npm run build step in the release ritual had been silently failing for roughly seven releases. vue-tsc died with EACCES on a root-owned node_modules/.tmp, exited non-zero, and my `tail -5` of the build output happened to only show vite's precache summary — which makes vite look successful even when the typecheck that precedes it failed. The resulting archipelago-frontend-*.tar.gz files were rebuilds from whatever content happened to live in web/dist/neode-ui/ at the moment (files left over from v1.7.9, owned root:root from an earlier sudo'd operation, unchanged since). Fixed by chowning both paths back to the archipelago user and rebuilding. Every published frontend tarball from v1.7.17 through v1.7.23 therefore shipped the same frozen UI; v1.7.24 is the first release in that stretch whose frontend actually matches its backend. Recorded the build-verification rule as a persistent feedback memory (feedback_frontend_build_verify.md) — future ships must grep the packaged tarball for the new version string before push. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:53:00 -04:00
Dorian	005bbd9a9a	release(v1.7.23-alpha): FIPS Seed Anchors reachable via gear icon Adds a gear button next to the FIPS Mesh card's status pill that opens a Teleport-ed modal containing FipsSeedAnchorsCard. The card was landed on disk in v1.7.21 but never wired into a UI entry point per the entry-point convention, so users couldn't access the Add/Remove/Apply controls at all. One gear click now opens them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 08:17:26 -04:00
Dorian	d0c50bc9ce	release(v1.7.22-alpha): honest anchor status + Reconnect works on all nodes - fips::service::active_unit() picks whichever fips unit is running (archipelago-fips.service vs upstream fips.service) so handle_fips_restart and handle_fips_reconnect don't silently no-op on hosts where the archipelago-managed unit was never created. - peer_connectivity_summary(anchor_candidates) replaces the old identity-cache check. anchor_connected is now true when at least one authenticated peer's npub matches the public anchor OR any entry in seed-anchors.json, which matches what the user actually cares about ("am I in the mesh?") rather than what the card used to claim ("is this one specific public anchor reachable?"). - FipsStatus::query takes data_dir now (so it can read seed-anchors) rather than identity_dir. All call-sites updated. - handle_fips_reconnect re-pushes seed anchors after restart so the new daemon gets dialed without waiting for the 5-min apply loop. - FipsNetworkCard label drops "(fips.v0l.io)" — misleading now that multiple anchors may be configured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 07:08:26 -04:00
Dorian	e88719df50	release(v1.7.21-alpha): operator-editable FIPS seed anchors Adds a local seed-anchor list at <data_dir>/seed-anchors.json. Each entry is {npub, address, transport, label}. On archipelago startup and every 5 minutes the list is pushed into the running fips daemon via `fipsctl connect <npub> <addr> <transport>`, so a cluster can anchor itself independently of the global fips.v0l.io. A flaky or unreachable public anchor no longer strands a fresh install. New RPCs: - fips.list-seed-anchors - fips.add-seed-anchor (validates npub1… + host:port) - fips.remove-seed-anchor - fips.apply-seed-anchors (on-demand re-dial) New standalone UI card at views/server/FipsSeedAnchorsCard.vue. Not wired into Home.vue / Server.vue — operator places it per the entry-point convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 06:21:37 -04:00
Dorian	4d8a9e66e3	release(v1.7.20-alpha): stop auto-apply scheduler killing the service The 3AM auto-update path called std::process::exit(0) immediately after apply_update returned. apply_update had already spawned a 2s-delayed systemctl restart, but exit(0) killed the runtime before that spawned task could run, and the unit's Restart=on-failure does not trigger on a clean exit 0, so the service stayed dead until someone SSH'd in and started it manually (.253 hit this today). Scheduler now returns from the task without killing the process; apply_update's existing restart path (same one the UI's Install Update button uses) brings the new version up cleanly. Also hardens the ISO CI: the AIUI inclusion step now falls back to extracting from the newest release tarball if the runner's cached /opt/archipelago/web-ui/aiui path is missing, so a reprovisioned runner can't silently ship a frontend tarball without AIUI. The ISO build step also sanity-checks the binary exists before invoking the builder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 04:33:11 -04:00
Dorian	9fc9696dbd	release(v1.7.19-alpha): kill stale available_update + numeric version compare load_state now drops any stored available_update whenever the running binary version differs from what's on disk — the old migration only cleared it when the stale entry happened to match the new version, so skipping releases (e.g. sideloading 1.7.16 → 1.7.18 without 1.7.17) left a pointer to an intermediate version as the "update available", which the UI then offered as a downgrade prompt. check_for_updates also uses a numeric version comparator so a stale or cached manifest with an older version can't offer itself as an update, and 1.7.10 correctly outranks 1.7.9 past the single-digit patch boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 04:04:20 -04:00
Dorian	062e1fada2	release(v1.7.18-alpha): transitive peers default Trusted + update-flow logs Flip transitively-discovered federation peers to Trusted instead of Observer. Hints are already only ingested from peers we trust and only peers we trust are re-exported via build_local_state, so the chain of trust is already vetted end-to-end — making the user promote each newcomer by hand was friction with no security win. Backend: - federation/sync.rs: merge_transitive_peers now inserts TrustLevel::Trusted (doc comment updated to explain the transitive-trust rationale) - update.rs: info! log at download start (version, components, total_bytes, staging path), cancel (staging wiped?, marker cleared?), and apply (backup path) so journalctl reveals where a stuck update actually is Frontend: - SystemUpdate What's New block gets a v1.7.18-alpha entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:20:36 -04:00
Dorian	4706dd16e7	release(v1.7.17-alpha): cancel download + stall detection Add Cancel Download button + stall detection so a wedged download can be recovered instead of leaving the UI stuck on a frozen progress bar. Backend: - update.rs: DOWNLOAD_CANCEL AtomicBool + DOWNLOAD_PROGRESS_AT AtomicU64 - download loop checks cancel between chunks and during retry backoff (500ms slices instead of one exponential sleep, so Cancel wakes fast) - cancel_download() wipes staging + clears update_in_progress - update.status exposes download_progress.stalled (30s no-progress) - RPC: update.cancel-download + dispatcher entry Frontend: - SystemUpdate.vue: Cancel Download button, amber stall styling, stalled copy, cancel-download confirm branch in modal - i18n keys (en + es) for cancel/stall flow - v1.7.17-alpha What's New block in AccountInfoSection Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 19:10:34 -04:00
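
The numeric version comparison described in the v1.7.19-alpha entry (1.7.10 must outrank 1.7.9, and an older manifest must never be offered as an update) can be sketched roughly as follows. This is a minimal illustration, not the code in update.rs: the function names `parse_version`, `compare_versions`, and `is_newer` are hypothetical, and the handling of the leading `v` and the `-alpha` suffix (both simply ignored here) is an assumption.

```rust
use std::cmp::Ordering;

/// Parse "v1.7.19-alpha" or "1.7.19" into numeric components, e.g. [1, 7, 19].
/// Assumption: a leading "v" and anything after the first '-' are ignored.
fn parse_version(s: &str) -> Option<Vec<u64>> {
    let core = s.trim().trim_start_matches('v');
    let core = core.split('-').next()?;
    // Any non-numeric component makes the whole version unparseable.
    core.split('.').map(|part| part.parse::<u64>().ok()).collect()
}

/// Compare two version strings component by component as integers.
/// Missing trailing components count as zero, so "1.7" equals "1.7.0".
fn compare_versions(a: &str, b: &str) -> Option<Ordering> {
    let mut a = parse_version(a)?;
    let mut b = parse_version(b)?;
    let len = a.len().max(b.len());
    a.resize(len, 0);
    b.resize(len, 0);
    Some(a.cmp(&b))
}

/// True only when `candidate` is strictly newer than `running`.
/// Unparseable input is treated as "not newer", so it is never offered.
fn is_newer(candidate: &str, running: &str) -> bool {
    matches!(compare_versions(candidate, running), Some(Ordering::Greater))
}
```

Comparing the components as integers rather than as strings is what carries the ordering past the single-digit patch boundary: as strings, "1.7.9" sorts after "1.7.10". Returning `false` for unparseable input is a conservative choice for this sketch, so a malformed manifest version is never presented as an upgrade.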

974 Commits