lfg2025/archy

Author	SHA1	Message	Date
archipelago	48f08aa3e4	feat(container): wire ProdContainerOrchestrator + BootReconciler into main Step 6 of the rust-orchestrator migration. Construct the container orchestrator once in main.rs, call load_manifests + adopt_existing immediately after Config::load, log the adoption report, and spawn BootReconciler::run_forever with the 30s default interval. Thread the orchestrator through Server::new -> ApiHandler::new -> RpcHandler::new so the reconciler and RPC layer share one instance. Wire a tokio::sync::Notify through the SIGTERM/SIGINT shutdown path so the reconciler exits cleanly alongside the server drain. Uses notify_one so the signal stores a permit if the reconciler is mid reconcile_all when the signal fires. Delete the commented-out run_boot_reconciliation block in main.rs that documented the prior bash-script approach being unsafe on unbundled installs — the new reconciler is manifest-driven and only touches apps present in /opt/archipelago/apps, fixing that concern. cargo check -p archipelago clean (6 pre-existing dead-code warnings on trait methods not yet exercised until Step 9 hot-swap). Container test suite 43/44 pass; the one failure (container::image_versions:: test_parse_image_versions) is pre-existing and unrelated.	2026-04-22 19:20:13 -04:00
archipelago	fc39b04b4e	feat(container): BootReconciler — periodic reconcile loop for prod orchestrator Step 5 of the rust-orchestrator migration. New file boot_reconciler.rs holds a small Tokio task that calls ProdContainerOrchestrator::reconcile_all() on a 30-second cadence (answered design Q3). * BootReconciler::new(orch, interval, shutdown) — shutdown is an Arc<Notify> so callers can trigger a graceful exit without pulling in tokio-util. * run_forever(self) — does one reconcile immediately, then loops on tokio::select! { sleep_until \| shutdown.notified() }. Shutdown interrupts the sleep but never an in-flight reconcile_all call. * Per-pass outcomes are logged at debug/warn; failures never propagate out because reconcile_all already absorbs per-app errors into ReconcileReport. Four tokio::test(start_paused = true) tests verify the loop cadence against a CountingRuntime test double: * initial_pass_fires_immediately — first reconcile runs with no delay * second_pass_fires_after_interval — second pass fires after exactly interval elapses in paused-clock time * shutdown_terminates_loop — notify_one() lets run_forever return * failure_in_one_pass_does_not_stop_loop — the loop keeps ticking even when the first pass had to install a missing container Not wired into main.rs yet — that is Step 6. Re-exported from container::mod as BootReconciler + RECONCILER_DEFAULT_INTERVAL for the wire-up step.	2026-04-22 19:04:34 -04:00
archipelago	e8a59c93c6	feat(container): ContainerOrchestrator trait, RpcHandler uses it in prod Step 4 of the rust-orchestrator migration. Unifies the container lifecycle surface behind a single trait so the RPC layer stops caring whether it is talking to the dev or prod orchestrator. * New trait core/archipelago/src/container/traits.rs: ContainerOrchestrator with install / start / stop / restart / remove / upgrade / status / list / logs / health, all keyed by app_id. Every method is async_trait-based. * ProdContainerOrchestrator: the lifecycle methods are moved from inherent impl into the trait impl (avoids name-shadowing recursion). Adoption and reconcile remain inherent since only main.rs / BootReconciler call them. * DevContainerOrchestrator: new trait impl that forwards to the existing Dev-named methods, applying the dev container-name + port-offset rules internally. New load_manifest_for() helper resolves app_id to <data_dir>/apps/<app_id>/manifest.yml so trait-level install(app_id) works in dev too. install_container(manifest, path) stays inherent for the manifest-path RPC shape. * RpcHandler now holds Option<Arc<dyn ContainerOrchestrator>> and, when in dev mode, a separate Option<Arc<DevContainerOrchestrator>> for the manifest_path install RPC. In prod mode RpcHandler::new() constructs a ProdContainerOrchestrator and calls load_manifests() at startup. * All seven container-* RPC guards no longer say dev mode required. container-install still requires dev mode because its manifest_path argument has no prod meaning; every other container RPC now works in both modes via the trait. BOOT STILL DOES NOT USE THIS. main.rs wire-up (Step 6) and BootReconciler (Step 5) come next. Until then the prod orchestrator is constructed but nothing populates /opt/archipelago/apps so it has zero manifests to manage, matching the pre-Step-4 behaviour. Verification: cargo build -p archipelago clean (11 expected unused method warnings for methods not yet wired from main.rs). 
cargo test -p archipelago: all 21 container::* tests pass (16 prod_orchestrator + 5 others). 24 other test failures are pre-existing and unrelated (identity_manager / session / wallet / mesh / credentials — all independently flaky on file-backed state).	2026-04-22 18:56:52 -04:00
archipelago	b6a04d315a	feat(container): ProdContainerOrchestrator with build-or-pull, adoption, reconcile Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC) implements the full public surface that will replace scripts/first-boot-containers.sh: * install / start / stop / restart / remove / upgrade / status / list / logs / health * adopt_existing: read-only scan that claims containers matching our manifests by name, without recreating — preserves the v1.7.42 fixture on .116. * reconcile_all: level-triggered, per-app failures collected rather than aborting. * install_fresh: build-or-pull (Step 2 trait methods), relative build contexts resolved against the manifest directory. Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the archy- prefix; backends keep their bare ID. An explicit extensions.container_name always wins. Codified in compute_container_name() with unit tests for all three tiers. Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily, protecting every mutating op against the reconciler loop. Acquiring the per-app lock only needs a read lock on the map, so independent apps do not serialize. 16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull, build-absent, build-present, relative-context), reconcile (noop/exited/missing/ mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health state mapping, and unknown-app-id rejection. All pass. Not wired into main.rs yet — that is Step 6. Crate builds clean with expected unused warnings for the new re-exports.	2026-04-22 18:32:31 -04:00
archipelago	3767c2670c	feat(container): add build source to manifest schema ContainerConfig.image is now Option<String>, mutually exclusive with a new optional ContainerConfig.build: Option<BuildConfig>. Exactly one of image or build must be present, enforced in AppManifest::validate. Adds ResolvedSource enum (Pull \| Build) and ContainerConfig::resolve + ::image_ref helpers so the orchestrator can treat pull and build uniformly. All 26 existing pull-only manifests continue to parse unchanged (covered by existing_pull_only_manifests_still_parse test). Call sites updated: podman_client, runtime::DockerRuntime, dev_orchestrator. Dev orchestrator errors out cleanly on Build sources until Step 2 lands build_image support on the runtime trait. Step 1 of docs/rust-orchestrator-migration.md. 10 new unit tests, all pass. Also includes: docs/rust-orchestrator-migration.md (design spec) and docs/STATUS.md resume section for the next session.	2026-04-22 17:46:36 -04:00
archipelago	7ecd30bde2	release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block validation. The old 10s client-side timeout was rejecting roughly 30% of UI calls even though the node was perfectly healthy. 20x stress test on the live .116 node (caught in IBD catch-up at block 797k) used to drop 10 of 20 calls; now drops 0 of 20. What changed: - core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up to 3 times with 500ms and 1500ms backoffs between attempts. Only transient transport errors (timeout, connect refused, send/recv IO) trigger retry. A well-formed bitcoind error response is surfaced immediately - real RPC bugs are never masked. - Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top of reqwest's own timeout, so DNS starvation or TLS wedging can't steal the entire retry budget. - handle_bitcoin_getinfo client builder gained a 3s connect_timeout so a dead bitcoind is fast-failed inside the first attempt instead of eating the whole 15s. - Retry policy extracted into a RetryConfig struct so tests can dial down timeouts to ~100ms per attempt. Production defaults live in RetryConfig::production(). Not changed (tracked as follow-up): - mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the same 10s-timeout pattern. Not migrated to the new wrapper in this release; scheduled for v1.7.43 alongside the render_bitcoin_conf work. - lnd/info.rs and electrs_status have similar 10s/15s timeouts but different failure profiles - audit first, migrate only the ones that actually exhibit the bug. Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing. Uses an in-process hyper server (already a transitive dep) to simulate bitcoind responses; no new crates required. 
- happy_path_first_attempt: no retry when first attempt succeeds - retries_on_timeout_then_succeeds: first attempt times out, second succeeds, returns OK (uses a short-timeout RetryConfig so the test runs in <1s instead of 15s) - retries_exhausted_on_persistent_connect_refused: all attempts fail against a closed port, error bubbles up, elapsed time confirms backoffs actually ran - does_not_retry_on_rpc_level_error: bitcoind-returned error body is surfaced immediately, no retry - does_not_retry_parse_errors: non-JSON response (e.g. 503 with html body) is NOT retried - guards against the tempting "retry all non-2xx" mistake that would mask real bitcoind misconfig - retry_budget_invariants: asserts total wall-time ceiling stays under 60s so a bumped constant can't silently hang a UI call forever Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the old 10s timeout. Worst-case latency was 48.9s during peak validation; happy-path latency (cached result) remains 28-77ms.	2026-04-22 16:46:28 -04:00
archipelago	048679065e	release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 + v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500) with no recovery path short of SSH. This release adds a self-check guardrail to the update flow. What changed: - apply_update() writes a pending-verify marker with old+new version and a 150s deadline immediately before scheduling the service restart. - verify_pending_update() runs from main.rs startup. If the marker is present and within its freshness window, the new binary waits 15s for nginx + backend to settle, then probes https://127.0.0.1/ every 5s for up to 90s (self-signed certs accepted). - On any probe success within the window, the marker is cleared and nothing else happens. - On window-exhaust, the new binary: 1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts> (quarantined, not deleted, so we can post-mortem). 2. Restores web-ui.bak on top of web-ui. 3. Calls rollback_update() to restore the previous binary. 4. Updates state.current_version to reflect the rollback. 5. systemctl --no-block restart archipelago so the OLD binary boots. - Markers older than 10 minutes are treated as stale and cleared without probing, so a crashed-during-startup marker from weeks ago cannot spontaneously roll back a healthy node on a later reboot. - rollback_update() binary copy now goes through host_sudo instead of tokio::fs::copy, so it escapes the service's ProtectSystem=strict mount namespace. Without this, the rollback silently failed with EROFS on /usr/local/bin and orphaned the rollback - the exact opposite of what auto-rollback is for. Tests: 4 new unit tests in update::tests covering marker round-trip, absent-marker noop, no-panic on verify_pending_update with nothing to verify, and an invariant assert that the 90s probe window stays below the 600s stale threshold. All passing. 
Side fix: scripts/create-release-manifest.sh was dying with exit 141 (SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail. Replaced with a single awk NR==1 that doesn't short-circuit the upstream pipe, so the release-build flow is idempotent again.	2026-04-22 16:14:35 -04:00
Dorian	3218f71703	release(v1.7.39-alpha): hotfix web-ui perms after OTA (nginx 500) + startup self-heal v1.7.38 shipped with an OTA bug: the tar-extracted staging dir inherited 700 perms and nginx (www-data) returned 500/403 on every request after the swap. .116 hit this on rollout; had to chmod by hand to recover. - update.rs: after extraction, explicitly chmod 755 dirs + 644 files on the new staging dir before the mv into place, so nginx can stat/serve them. - main.rs: self-heal on startup — if /opt/archipelago/web-ui is not world-readable, run `sudo chmod -R u=rwX,go=rX` to repair. This is what rescues nodes upgrading from v1.7.37/v1.7.38, since their extractor (running on the old binary) doesn't have the chmod fix yet — the new binary's first boot fixes the mess before nginx serves a single request. Everything v1.7.38 shipped is still in this release: - auth.rs auto-heals is_onboarding_complete() from setup_complete + password_hash so nodes don't bounce back to /onboarding/intro after browser clear / reboot / update - useOnboarding tri-state: backend-unreachable no longer defaults to intro - login sounds gated by isFirstInstallPhase() — silent after onboarding, typing sounds unaffected - FIPS app / Nostr Relay / Nostr VPN / Routstr / Penpot removed from catalog + frontend + Rust + docker + icons; 15 image versions deleted from tx1138, .168, gitea-local - AIUI baked into release tarball via demo/aiui/ - prebuild hook syncs app-catalog/catalog.json → public/catalog.json Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:26:54 -04:00
Dorian	ca5d2cc42a	release(v1.7.38-alpha): onboarding auto-heal + silent returning logins + app-store trim - auth.rs now infers onboarding-complete from setup_complete + password_hash so nodes stop bouncing users through the intro wizard after browser clear / update / reboot; the flag self-heals to disk on next check - frontend: "backend uncertain" no longer defaults to /onboarding/intro — useOnboarding returns null + callers poll / retry instead of flashing the wizard - login sounds (synthwave, welcome voice, pop, whoosh, oomph) gated by isFirstInstallPhase(); typing sounds unaffected - removed FIPS app, Nostr Relay, Nostr VPN, Routstr, Penpot from catalog, frontend config, Rust AppMetadata + install dispatch + install_penpot_stack; docker/fips-ui + docker/nostr-vpn-ui + apps/penpot dirs and 5 icons deleted; 15 image versions deleted from tx1138, .168, gitea-local registries (.160 Gitea was 502 at release time — follow-up) - AIUI baked into frontend release tarball via demo/aiui/; deploy-to-target falls back to demo/aiui/ when the AIUI sibling checkout is missing - prebuild hook syncs app-catalog/catalog.json → public/catalog.json so the two copies can no longer drift (was the source of the "apps still visible" bug — public/ had stale data) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 13:02:24 -04:00
Dorian	9cb114c50a	release(v1.7.37-alpha): bitcoin-core install fixes + dynamic node UI + full-archive default Install flow - api/rpc/package/install.rs: always append the literal image URL as a last-resort pull candidate in do_pull_image, so images not carried by any configured mirror (docker.io/bitcoin/bitcoin:28.4) still install instead of masquerading as a generic pull failure across every mirror. - api/rpc/package/install.rs: write_bitcoin_conf now skips on any stat error, not just "file exists". Once bitcoin-knots' first-boot chowns /var/lib/archipelago/bitcoin into the container's user namespace (700 perms, UID 100100/100101), the archipelago daemon can't even traverse in — try_exists returns Err which unwrap_or(false) treated as "not present" and drove a doomed write. Now errors out of the directory traversal are treated as "conf already owned by container user" and the write is skipped. Mirrors the lnd.conf pattern. - api/rpc/package/install.rs: drop the hardcoded `prune=550` from the conf default. Operators with multi-TB drives shouldn't be silently pruned; users who want a pruned node can set it in bitcoin.conf themselves. Full archive is the only honest default. - api/rpc/package/config.rs: bitcoin-core now passes explicit -server/-rpcbind/-rpcallowip/-rpcport/-printtoconsole/-datadir CLI args. Vanilla bitcoin/bitcoin:28.4 has no entrypoint wrapper and reads conf + argv only; without these the RPC listens on 127.0.0.1 inside the container and rootlessport can't reach it, so the bitcoin-ui companion gets 502 on every /bitcoin-rpc/ call. Bitcoin Knots keeps its own entrypoint-driven defaults. - container/docker_packages.rs: split bitcoin-core out of the shared AppMetadata arm. bitcoin-core now surfaces as "Bitcoin Core" with bitcoin-core.svg and a Reference-implementation description; the bitcoin + bitcoin-knots ids keep the Knots branding. 
Fixes the home card showing "Bitcoin Knots" for a Core install. Bitcoin node UI (docker/bitcoin-ui) - index.html: impl name/tagline/logo now dynamic. applyImplBranding() reads subversion from getnetworkinfo — /Satoshi:X/Knots:Y/ resolves to Bitcoin Knots, plain /Satoshi:X/ resolves to Bitcoin Core. Both get their own icon and subtitle. Settings modal replaced its hardcoded Regtest/txindex=1/port-18443 placeholders with live values from getblockchaininfo + getindexinfo + getzmqnotifications. - index.html: new Storage info card (Full Archive · X GB / Pruned · X GB from blockchainInfo.pruned + size_on_disk) visible on the main dashboard, same level as Network. Settings modal mirrors it with the prune height when applicable. - Dockerfile + assets/: bitcoin-core.svg, bitcoin-knots.webp, and the bg-network.jpg used by the dashboard are now COPY'd into the image under /usr/share/nginx/html/assets. Previously the <img src> pointed at paths that 404'd into the SPA fallback and the onerror handler hid the broken logo silently. Frontend - appSession/appSessionConfig.ts: add bitcoin-core to APP_PORTS (8334), HTTPS_PROXY_PATHS (/app/bitcoin-ui/), and APP_TITLES (Bitcoin Core). Without these the AppSessionFrame showed "No URL found for bitcoin-core" and the home/app-list title fell through to the raw id. - settings/AccountInfoSection.vue: backfill What's New entries for v1.7.31 through v1.7.37 that had been missed in earlier cuts. Release plumbing - releases/v1.7.37-alpha/: binary + frontend tarball. - releases/manifest.json: v1.7.37-alpha, sha256/size refreshed. - Cargo.toml / package.json: version bumps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 11:03:47 -04:00
Dorian	7106a81c6a	release(v1.7.36-alpha): bitcoin-core in App Store + Sovereignty Stack + dynamic catalog URL - neode-ui/public/assets/img/app-icons/bitcoin-core.svg (NEW): 256×256 Umbrel community Bitcoin icon sourced from getumbrel.github.io/ umbrel-apps-gallery/bitcoin/icon.svg. Referenced by the static catalog, the curated fallback, and the upstream lfg2025/app-catalog entry so every surface shows the same image. - app-catalog/catalog.json + neode-ui/public/catalog.json: add bitcoin-core (v28.4) entry pointing at bitcoin/bitcoin:28.4. Same entry pushed to the lfg2025/app-catalog repo on .160 and the local gitea mirror so nodes see it without needing a full archipelago update. Sovereignty Stack entry added to FEATURED_DEFINITIONS with a description that frames it as a Knots alternative, not a rival. - core/archipelago/src/api/handler/mod.rs: handle_app_catalog_proxy is now instance-scoped (&self) and derives its upstream list from load_registries — each active container registry contributes one `<scheme>://<reg.url>/app-catalog/raw/branch/main/catalog.json` URL in priority order (scheme follows tls_verify). When the operator switches mirrors in Settings, the App Store now follows. Falls back to the legacy hardcoded .160/tx1138 pair only when registry config can't be loaded, so the App Store still renders on nodes that haven't persisted one yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 09:06:10 -04:00
Dorian	987158ef5f	release(v1.7.35-alpha): rootless-netns self-heal + app update button + bitcoin-core 28.4 + Node DID unification - core/archipelago/src/bootstrap.rs (NEW): embed scripts/container-doctor.sh and image-recipe/configs/archipelago-doctor.{service,timer} via include_str! and sync to disk + enable the timer on every archipelago startup. Idempotent (content-hash compare), dev-box symlink guard keeps the git checkout untouched, best-effort (warn-only on failure) so bootstrap never blocks server readiness. Wired in main.rs as a background tokio task. - scripts/container-doctor.sh: add fix_rootless_netns_egress(). Detects when the rootless-netns has lost its pasta tap (container-to-container still works but outbound DNS/TCP fails) via an nsenter probe into aardvark-dns; with a two-probe 10s debounce to rule out transients and a host-precheck that bails out if the host itself is offline. When the rootless-netns is truly broken, does a graceful podman stop --all / start --all so pasta + aardvark-dns rebuild the netns from scratch. Bitcoin-knots and every other outbound container recover in one cycle. - core/archipelago/src/update.rs: host_sudo → pub(crate) so bootstrap.rs can reuse the existing systemd-run escape hatch. - apps/bitcoin-core/manifest.yml: bump app version 24.0.0 → 28.4.0 and image bitcoin/bitcoin:24.0 → bitcoin/bitcoin:28.4. Resources aligned with the real container-specs.sh large-disk tune (4 GiB memory cap, cpu_limit: 0 so bitcoind can run -par=auto across every core). - neode-ui/src/views/apps/AppCard.vue + Apps.vue: add an Update button + Updating spinner to every app card that has available-update set. Wires through serverStore.updatePackage(id) — the same RPC the detail view already calls. common.update / common.updating i18n keys added in en.json and es.json. 
- core/archipelago/src/identity_manager.rs: add create_from_signing_key() that mirrors an existing Ed25519 key as a manager-level identity with a deterministic id (`node-<pubkey16>`). Idempotent across restarts, gets the hex-SVG master avatar. - core/archipelago/src/server.rs: the auto-create path on first boot now mirrors the node's own signing_key (seed-derived on onboarded installs) as a "Node" identity instead of generating a random "Default" keypair. Once this ships, the DID on the Web5 DID Status card (via node.did RPC), the Node entry on the Identities page (via identity.list), and the DID used for peer-to-peer connects (via server_info.pubkey) all resolve to the same seed-derived pubkey. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 08:29:56 -04:00
Dorian	fd3f5d2701	release(v1.7.32-alpha): fix frontend tarball layout + mDNS shutdown hang - HOTFIX: v1.7.31-alpha's frontend tarball was packaged with a `neode-ui/` top-level directory instead of the flat layout v1.7.30 and earlier used. Nodes that applied v1.7.31 ended up with `/opt/archipelago/web-ui/neode-ui/index.html` instead of `/opt/archipelago/web-ui/index.html`, and nginx returned 403/500. v1.7.32's tarball is built with `tar -C web/dist/neode-ui .` so files land directly at web-ui root. Broken nodes auto-heal on this update (web-ui dir is replaced). - transport/lan.rs: add Drop impl that calls ServiceDaemon::shutdown() on the mdns_sd daemon. Without this the OS thread it spawns, plus the blocking `receiver.recv()` task, keep the tokio runtime alive past SIGTERM — long enough for systemd's TimeoutStopSec to SIGKILL the service and mark it Failed. Was visible on every update: "shut down cleanly" logged, then 15s later systemd forcibly kills. - main.rs: after logging "Archipelago shut down cleanly", call `std::process::exit(0)` explicitly. Belt-and-suspenders against any future non-daemon thread creeping in (reqwest resolver pool, etc.) and causing the same SIGKILL regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 03:52:22 -04:00
Dorian	fdaa5646b2	release(v1.7.31-alpha): idempotent IndeedHub install + auto-merge default mirrors/registries + 3rd OVH update mirror - Backend: install.rs registry reachability probe now strips the `host[:port]/namespace` suffix before appending `/v2/` (the Docker V2 API lives at the host root, not under the namespace) and accepts HTTP 405 in addition to 200/401 as "registry daemon alive". This fixes false "unreachable" reports on the Test button for Gitea and other registries that protect their /v2/ endpoint. - Backend: stacks.rs install_indeedhub_stack now force-removes any leftover indeedhub-* containers and indeedhub-net before creating the stack. A partial install (or the old first-boot stub racing the installer) used to leave containers around that blocked re-install with "name already in use". Re-running the App Store install now self-heals. - Backend: registry.rs load_registries auto-merges any default registry URLs missing from the saved config (appended with priority max+10+i, persisted). Lets new default mirrors (e.g. Server 3 OVH) roll out to existing nodes without manual config edits. Explicit removals still stick — URLs absent from disk AND absent from defaults stay gone. - Backend: update.rs adds DEFAULT_TERTIARY_MIRROR_URL at http://146.59.87.168:3000/ (Server 3 OVH) to default_mirrors, with the same auto-merge-on-load behavior as registries. Test updated for 3-mirror default (.160, tx1138, .168). - Scripts: dropped the first-boot IndeedHub stub (~38 lines in first-boot-containers.sh §8b). It predated the proper stack installer, raced it, and was the main source of the name-conflict mess the stacks.rs cleanup above now also guards against. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 03:26:09 -04:00
Dorian	f9b44f5e2e	release(v1.7.30-alpha): live install/uninstall progress + cleaner pull waterfall - Backend: unified pull-progress streaming across primary AND fallback registries. Earlier code only streamed for the primary attempt; if it failed fast (VPS 404, etc.) the UI froze at 0% until the fallback finished. The waterfall now uses a single shared helper that streams podman stderr through update_install_progress for every URL tried. - Backend: PackageDataEntry gains uninstall_stage, set at each phase of handle_package_uninstall ("Stopping containers (i/total)", "Cleaning up volumes", "Removing app data"). State flips to Removing during the pipeline. - Frontend: MarketplaceAppCard renders the live progress bar with byte counts during installs, matching the System Update download bar style. - Frontend: AppCard renders the live uninstall stage label per app. Modal closes immediately on confirm so concurrent uninstalls each show their own progress on their own card. - Cleanup: removed dead helpers (image_candidates, rewrite_for_primary, primary_image_url, pull_from_registries_with_skip) made unused by the install.rs refactor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 19:11:36 -04:00
Dorian	7432d84545	release(v1.7.29-alpha): VPS as default app registry + settings UI - New Settings → App registries page (/dashboard/settings/registries) that mirrors the update-mirrors experience: list of configured registries, test reachability, set primary, add/remove. New registry.set-primary RPC; existing registry.{list,add,remove,test} reused. - Default RegistryConfig flipped: VPS (23.182.128.160:3000/lfg2025) is now Server 1 (primary), tx1138 is Server 2 (fallback). - Install pipeline now rewrites the first pull to the primary registry URL before attempting it. Before this, installs always hit whichever registry the image was hardcoded to, so changing the primary didn't actually affect where images came from. On failure, the existing fallback walk skips the primary (already tried) and walks the rest. - App catalog proxy UPSTREAMS order flipped so the catalog follows the same VPS-first rule. - Reboot overlay: animated "a" logo now sits in the center of the ring (matches the screensaver composition). Extracted the logo-wrapper pattern inline. 7/7 registry tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:54:07 -04:00
Dorian	79ae14a127	release(v1.7.28-alpha): reboot progress overlay + VPS default primary - New reboot progress overlay: full-screen black with the screensaver's pulsing ring, rebooting → reconnecting → back-online → stalled stages, elapsed counter, auto-reload on health-check success, manual reload button at 3 min stall. Mirrors the existing update overlay. - Ring extracted from Screensaver.vue into a reusable ScreensaverRing component so the reboot overlay reuses the same animation. - default_mirrors() now puts the VPS as Server 1 (primary) and tx1138 as Server 2 — new nodes fetch manifests from VPS first; existing nodes keep whatever mirror order they've customized. - What's New entry prepended for v1.7.28-alpha. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 15:06:37 -04:00
Dorian	c3b3b03ee1	release(v1.7.27-alpha): mirror transparency — served-by line + one-click test button - New "Served by {mirror}" line on the System Update page so operators can see which mirror actually served the available manifest (vs. which is configured primary). Backend threads the served URL through UpdateState.manifest_mirror. - New update.test-mirror RPC + per-row lightning-bolt button that pings a mirror and renders reachable/latency or error inline under the URL. - UI polish on the mirrors section: Set Primary, Remove, and the new Test action are compact icon buttons; add-mirror form moved into a dialog. - "What's New" block prepended for v1.7.27-alpha. 21/21 update module tests pass. vue-tsc + vite build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 13:05:42 -04:00
Dorian	97a3803640	release(v1.7.26-alpha): mirror list + origin-relative download URLs Adds a multi-mirror manifest fetch. `check_for_updates` walks a configurable list (data_dir/update-mirrors.json) in priority order and falls through to the next mirror on any HTTP / parse / timeout failure. Two defaults bake in: Server 1 (git.tx1138.com) and Server 2 (23.182.128.160:3000). Critical fix: after parsing a manifest, rewrite every component's `download_url` so its origin matches the manifest URL we fetched. Before this, the manifest hard-coded absolute URLs pointing at one specific server — so even when a node fetched the manifest from a faster mirror, the actual 200MB download went back to the slow original. Now the faster mirror wins end-to-end. New RPCs: update.list-mirrors, update.add-mirror, update.remove-mirror, update.set-primary-mirror. New UI section on the System Update page for operator management. 5 new unit tests for origin parsing and manifest rewriting (21/21 green). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 10:09:28 -04:00
Dorian	5c634baa6d	release(v1.7.25-alpha): TCP transport for public FIPS mesh + modal cleanup Re-adds the TCP transport (`0.0.0.0:8443`) to the rendered fips.yaml alongside UDP. Upstream factory default enables both; we had inadvertently narrowed to UDP-only when the yaml rewriter was last touched, which left nodes unable to reach fips.v0l.io (the public anchor only answers on TCP right now) or talk across networks that block UDP. Backend startup now compares the installed yaml against the current rendered schema and restarts whichever fips unit is active when they differ — so OTA-upgrading nodes pick up the new transport without anyone having to click Reconnect. Dropped the earlier plan to auto-add federated peers as seed anchors: invites don't carry a FIPS-reachable IP:port, and once TCP reconnects the public mesh, federated peers become npub-routable without needing a seed entry. Seed Anchors modal cleanup: replaced malformed header icon with a three-arc broadcast glyph, and the close button now matches the What's New modal (embedded in the card header, same icon + hover style) instead of the earlier floating off-design placeholder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 09:25:53 -04:00
Dorian	d0c50bc9ce	release(v1.7.22-alpha): honest anchor status + Reconnect works on all nodes - fips::service::active_unit() picks whichever fips unit is running (archipelago-fips.service vs upstream fips.service) so handle_fips_restart and handle_fips_reconnect don't silently no-op on hosts where the archipelago-managed unit was never created. - peer_connectivity_summary(anchor_candidates) replaces the old identity-cache check. anchor_connected is now true when at least one authenticated peer's npub matches the public anchor OR any entry in seed-anchors.json, which matches what the user actually cares about ("am I in the mesh?") rather than what the card used to claim ("is this one specific public anchor reachable?"). - FipsStatus::query takes data_dir now (so it can read seed-anchors) rather than identity_dir. All call-sites updated. - handle_fips_reconnect re-pushes seed anchors after restart so the new daemon gets dialed without waiting for the 5-min apply loop. - FipsNetworkCard label drops "(fips.v0l.io)" — misleading now that multiple anchors may be configured. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 07:08:26 -04:00
Dorian	e88719df50	release(v1.7.21-alpha): operator-editable FIPS seed anchors Adds a local seed-anchor list at <data_dir>/seed-anchors.json. Each entry is {npub, address, transport, label}. On archipelago startup and every 5 minutes the list is pushed into the running fips daemon via `fipsctl connect <npub> <addr> <transport>`, so a cluster can anchor itself independently of the global fips.v0l.io. A flaky or unreachable public anchor no longer strands a fresh install. New RPCs: - fips.list-seed-anchors - fips.add-seed-anchor (validates npub1… + host:port) - fips.remove-seed-anchor - fips.apply-seed-anchors (on-demand re-dial) New standalone UI card at views/server/FipsSeedAnchorsCard.vue. Not wired into Home.vue / Server.vue — operator places it per the entry-point convention. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 06:21:37 -04:00
Dorian	4d8a9e66e3	release(v1.7.20-alpha): stop auto-apply scheduler killing the service The 3AM auto-update path called std::process::exit(0) immediately after apply_update returned. apply_update had already spawned a 2s-delayed systemctl restart, but exit(0) killed the runtime before that spawned task could run — and the unit's Restart=on-failure does not trigger on a clean exit 0, so the service stayed dead until someone SSH'd in and started it manually (.253 hit this today). Scheduler now returns from the task without killing the process; apply_update's existing restart path (same one the UI's Install Update button uses) brings the new version up cleanly. Also hardens the ISO CI: the AIUI inclusion step now falls back to extracting from the newest release tarball if the runner's cached /opt/archipelago/web-ui/aiui path is missing, so a reprovisioned runner can't silently ship a frontend tarball without AIUI. The ISO build step also sanity-checks the binary exists before invoking the builder. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 04:33:11 -04:00
Dorian	9fc9696dbd	release(v1.7.19-alpha): kill stale available_update + numeric version compare load_state now drops any stored available_update whenever the running binary version differs from what's on disk — the old migration only cleared it when the stale entry happened to match the new version, so skipping releases (e.g. sideloading 1.7.16 → 1.7.18 without 1.7.17) left a pointer to an intermediate version as the "update available", which the UI then offered as a downgrade prompt. check_for_updates also uses a numeric version comparator so a stale or cached manifest with an older version can't offer itself as an update, and 1.7.10 correctly outranks 1.7.9 past the single-digit patch boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 04:04:20 -04:00
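A numeric comparator of the kind the v1.7.19-alpha entry describes could look like the sketch below. The function names are hypothetical and the pre-release suffix (`-alpha`) is simply ignored, which is an assumption rather than the behaviour of the actual `check_for_updates` code.

```rust
/// Parse "1.7.19-alpha" or "v1.7.19" into (major, minor, patch).
/// Hypothetical helper: anything after the first '-' is ignored.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let core = s.trim().trim_start_matches('v');
    let core = core.split('-').next()?;
    let mut parts = core.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three numeric fields
    }
    Some((major, minor, patch))
}

/// True only when `candidate` is strictly newer than `running`.
/// Unparseable input never counts as an update.
fn is_newer(candidate: &str, running: &str) -> bool {
    match (parse_version(candidate), parse_version(running)) {
        // Tuples compare field by field, so the patch number is compared as a number.
        (Some(c), Some(r)) => c > r,
        _ => false,
    }
}
```

A plain string comparison orders `"1.7.10"` before `"1.7.9"` because `'1' < '9'`. Comparing the parsed tuples avoids that, and a manifest carrying an older version is never offered as an update.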
Dorian	062e1fada2	release(v1.7.18-alpha): transitive peers default Trusted + update-flow logs Flip transitively-discovered federation peers to Trusted instead of Observer. Hints are already only ingested from peers we trust and only peers we trust are re-exported via build_local_state, so the chain of trust is already vetted end-to-end — making the user promote each newcomer by hand was friction with no security win. Backend: - federation/sync.rs: merge_transitive_peers now inserts TrustLevel::Trusted (doc comment updated to explain the transitive-trust rationale) - update.rs: info! log at download start (version, components, total_bytes, staging path), cancel (staging wiped?, marker cleared?), and apply (backup path) so journalctl reveals where a stuck update actually is Frontend: - SystemUpdate What's New block gets a v1.7.18-alpha entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 20:20:36 -04:00
Dorian	4706dd16e7	release(v1.7.17-alpha): cancel download + stall detection Add Cancel Download button + stall detection so a wedged download can be recovered instead of leaving the UI stuck on a frozen progress bar. Backend: - update.rs: DOWNLOAD_CANCEL AtomicBool + DOWNLOAD_PROGRESS_AT AtomicU64 - download loop checks cancel between chunks and during retry backoff (500ms slices instead of one exponential sleep, so Cancel wakes fast) - cancel_download() wipes staging + clears update_in_progress - update.status exposes download_progress.stalled (30s no-progress) - RPC: update.cancel-download + dispatcher entry Frontend: - SystemUpdate.vue: Cancel Download button, amber stall styling, stalled copy, cancel-download confirm branch in modal - i18n keys (en + es) for cancel/stall flow - v1.7.17-alpha What's New block in AccountInfoSection Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 19:10:34 -04:00
Dorian	3cbfcabedf	release(v1.7.16-alpha): bidirectional + transitive federation, no self-peering Federation join flow now notifies the inviter with the joiner's name and immediately bumps state so the Federation UI reloads without a manual Sync click. Accepting an invite that points back at the local node is rejected up front (DID/pubkey/onion match). After a peer joins, we spawn a transitive sync that pulls the new peer's federated peer hints so all nodes in the federation learn about each other as Observer entries. Federation.vue polls every 5s while mounted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 18:12:02 -04:00
Dorian	0fad7ee431	release(v1.7.15-alpha): bulletproof downloads — resume, retry, real progress download_update Each component download is now resumable via HTTP Range requests (Range: bytes=N-) and retried up to 6 times with exponential backoff (5/15/30/60/120/180s). On a dropped connection the next attempt picks up at the last written byte offset instead of restarting at zero. Streams via reqwest::Response::chunk() to the staging file so a 160 MB frontend tarball doesn't sit in RAM. SHA is verified over the complete file at the end of each component; mismatch nukes the staged file and restarts from scratch. Real download progress counters New AtomicU64 globals DOWNLOAD_BYTES/DOWNLOAD_TOTAL are updated from the chunk loop. update.status exposes them as download_progress.{bytes_downloaded, total_bytes, active}. The SystemUpdate.vue progress bar now polls update.status every second instead of incrementing a fake random counter — and crucially, if the user navigates away and back, the component picks up the in-progress download from the backend atomics immediately. Update-check retries handle_update_check now retries the manifest fetch up to 3 times with a 5s gap if the first try hits a transport error, so a momentary gitea hiccup doesn't make a node report "up to date" when there actually is a new release. Tight 10s connect timeout per attempt keeps the total bounded. Artefacts: archipelago 1070c87f…c081c162b 40584792 archipelago-frontend-1.7.15-alpha.tar.gz 8e630eba…63fd43f 162078068 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 17:17:58 -04:00
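The resume-and-retry policy in the v1.7.15-alpha entry reduces to two small decisions: which `Range` value to send, and how long to wait before the next attempt. The sketch below shows only those two pieces. The names are hypothetical, and reading "up to 6 times" as six retries, one per listed delay, is an assumption.

```rust
use std::time::Duration;

/// Delays from the commit message, in seconds (5/15/30/60/120/180).
const RETRY_BACKOFF_SECS: [u64; 6] = [5, 15, 30, 60, 120, 180];

/// Value for the HTTP `Range` request header when resuming a partial download.
/// Nothing on disk yet means a plain request with no Range header.
fn range_header(bytes_on_disk: u64) -> Option<String> {
    if bytes_on_disk == 0 {
        None
    } else {
        Some(format!("bytes={}-", bytes_on_disk))
    }
}

/// Delay before retry number `retry` (0-based), or None once retries are exhausted.
fn backoff_before_retry(retry: usize) -> Option<Duration> {
    RETRY_BACKOFF_SECS
        .get(retry)
        .map(|secs| Duration::from_secs(*secs))
}
```

A download loop built on this would stat the staging file, send `range_header(len)` if it returns a value, append the response body to the file, and sleep for `backoff_before_retry(n)` after a failed attempt until it returns `None`.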
Dorian	923c404678	release(v1.7.14-alpha): install overlay + FIPS real fix + AIUI restore Install UX SystemUpdate.vue now shows a full-screen overlay after apply: the BitcoinFaceAscii logo, a target-version label, an indeterminate progress stripe (solid orange; solid green on ready), and an elapsed-time readout. Polls /health every 1.5s and auto-reloads once the backend reports the new version. 3-min stall → "Reload now" button. Download UI also shows a spinner + "Finishing download — verifying checksum…" while the fake bar sits at 95%. FIPS reconnect — for real this time New fips.reconnect RPC does stop → start → wait 20s → re-poll → classify. Classification buckets: connected / daemon_down / no_seed_key / no_outbound_udp_or_anchor_down / peers_but_no_anchor, each with a plain-language hint surfaced verbatim by the Reconnect button. The real reason nodes like .198/.253 couldn't reach the anchor: identity::write_fips_key_from_seed was writing fips_key.pub as a bech32 npub TEXT file, but upstream fips expects 32 raw bytes. The daemon silently authenticated with garbage. Fix: PublicKey::to_bytes() → raw 32 bytes, and new fips::config::normalize_pub_file migrates legacy files by decoding the npub and rewriting in place. fips.reconnect also re-installs the config + healed keys to /etc/fips before restarting. AIUI preservation + restore apply_update was wiping /opt/archipelago/web-ui/aiui because the Vue build doesn't include it — every OTA lost the Claude sidebar. The preserve block now copies aiui/ + archipelago-companion.apk from the old web-ui into the staging dir before the swap, and prefers new-tar versions if present. To restore it on the three nodes that already lost it (.116/.198/.253), this release bundles the 85 MB aiui build into the frontend tarball. Frontend component size is now ~155 MB. Download / install timeouts Backend download client timeout 1800s → 3600s (1 h). Larger tarball + slow gitea raw throughput put us above the old cap. 
Frontend update.download rpc timeout 30 min → 65 min to match. package.install rpc timeout 15 min → 45 min — IndeedHub pulls 6 images and was timing out mid-install. UI nit "Rollback to Previous" → "Rollback Available". App-catalog proxy already landed in v1.7.13. Artefacts: archipelago 725e18e6…3c525e6 40462288 archipelago-frontend-1.7.14-alpha.tar.gz c35284be…ff2c16 162077052 (+aiui) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 16:40:25 -04:00
Dorian	30a26f94f7	release(v1.7.13-alpha): proxy app catalog server-side (CORS + CSP fix) The Discover / Marketplace page fetched the app catalog directly from git.tx1138.com/lfg2025/app-catalog/raw/.../catalog.json in the browser. Two blockers hit the fleet simultaneously: (1) tx1138's Gitea doesn't emit Access-Control-Allow-Origin so the HTTPS fetch got CORS-blocked; (2) the HTTP IP-port fallback (http://23.182.128.160:3000/...) falls outside the node's `connect-src` CSP. Users saw the hardcoded fallback instead of the live catalog. Backend: new authenticated GET /api/app-catalog handler uses reqwest to pull catalog.json server-side (15s timeout) and returns it with application/json + 1h Cache-Control. Tries the HTTPS URL first, HTTP IP-port second. Frontend: curatedApps.ts now calls /api/app-catalog (same-origin, no CORS/CSP) with credentials included so the session cookie authenticates the proxy. Baked /catalog.json stays as the last resort. Artefacts: archipelago 0aaf7262…b979f22c 40371192 archipelago-frontend-1.7.13-alpha.tar.gz 27505811…efc6f4142 76982505 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 15:43:45 -04:00
Dorian	b8ab06dd47	release(v1.7.10-alpha): apply namespace fix + FIPS cascade + profile polish THE apply fix archipelago.service uses ProtectSystem=strict, so /opt and /usr are read-only inside the service's mount namespace. sudo inherits that namespace — every sudo mkdir/mv/chown from apply_update was hitting EROFS even as root. Every prior "Failed to apply update" was a symptom of this. New `host_sudo()` helper wraps every filesystem call in `sudo systemd-run --wait --collect --pipe -- <cmd>`, which spawns a transient unit with systemd's default (no ProtectSystem) protections — the command runs in the host namespace and can touch /opt/archipelago + /usr/local/bin normally. FIPS cascade (#2) Home.vue and Server.vue both carry a FIPS row that previously only looked at {installed, service_active, key_present}. Now they also read anchor_connected + authenticated_peer_count and mirror the full FIPS card: green "Active · N peers" when healthy, orange "No anchor" when the DHT bootstrap has failed. Profile paste URL fallback (#4) Web5Identities.vue list + editor previously had `@error="display:none"` on the <img>, which hid the tag without re-rendering the fallback — a broken pasted URL showed up blank. Replaced with reactive pictureLoadFailed / listPictureFailed flags plus a watcher that resets on URL change. Broken URL now falls back to the initial (or identicon for seed-derived identities). Small-upload data URL (#3) Uploaded profile pictures ≤ 64 KB are now inlined as `data:image/png;base64,...` into profile.picture on the client before calling update-profile. That kind-0 event is fetchable by any Nostr client — no Tor needed. Larger uploads fall back to the onion-rooted public_url with a hint telling the user to paste a public https:// URL for broader visibility. 
Deferred: #1 FIPS Reconnect "actually fixes" — the current Reconnect calls fips.restart which clears the daemon state, but when the anchor is truly unreachable (UDP 8668 blocked by network/ISP), no amount of restart can help. A richer diagnostic is out of scope for this bundle. Artefacts: archipelago 4a77c704…82aa6f8 40379696 archipelago-frontend-1.7.10-alpha.tar.gz 0644a436…54f58 76983846 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:46:03 -04:00
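The `host_sudo()` wrapper from the v1.7.10-alpha entry can be approximated as below. Only the command line (`sudo systemd-run --wait --collect --pipe -- <cmd>`) comes from the commit message. The function signatures and the split into an argv builder plus a runner are assumptions made for illustration.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Build the argv that runs `cmd` in a transient systemd unit, outside the
/// service's ProtectSystem=strict mount namespace.
fn host_sudo_argv(cmd: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = ["sudo", "systemd-run", "--wait", "--collect", "--pipe", "--"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    argv.extend(cmd.iter().map(|s| s.to_string()));
    argv
}

/// Run `cmd` through the wrapper and return its exit status.
/// Hypothetical signature; the real helper may differ.
fn host_sudo(cmd: &[&str]) -> io::Result<ExitStatus> {
    let argv = host_sudo_argv(cmd);
    Command::new(&argv[0]).args(&argv[1..]).status()
}
```

For example, `host_sudo(&["mkdir", "-p", "/opt/archipelago/web-ui.new"])` would run the `mkdir` in the host namespace, where `/opt` is writable. The directory name here is illustrative only.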
Dorian	cbf30e2e29	release(v1.7.8-alpha): fix apply ETXTBSY — use mv instead of install apply_update's binary swap called `sudo install -m 0755 src /usr/local/bin/archipelago`. install opens the destination for write with O_TRUNC; the kernel returns ETXTBSY (exit 1) when the path is a currently-running executable, which it always is during apply because apply_update is called by the archipelago RPC handler — running as archipelago itself. Every previous "Failed to apply update" was this one root cause; the manual sideload path only worked because we stopped the service first. rename() doesn't modify the file it replaces — it repoints the path at a new inode while the old inode stays alive for any process that has it mapped. `mv` uses rename(). Switched to `sudo mv` (with prior chmod+chown on the staging file) so the swap is atomic and tolerant of the running binary. Frontend tarball byte-identical to v1.7.7-alpha; only the binary version string changes. Artefacts: archipelago 2753daec…48094d 40377648 archipelago-frontend-1.7.8-alpha.tar.gz 4fb79664…0172e9 76984615 (reused) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 13:04:09 -04:00
Dorian	9c6251c784	release(v1.7.6-alpha): robust apply_update + manifest-override env var apply_update frontend swap Transient EROFS on .198 (filesystem hiccup — root FS mounts with errors=remount-ro so a fleeting glitch can bounce /opt to RO for a moment) caught the pre-cleanup `rm -rf web-ui.new web-ui.bak` mid-stride and aborted the apply. Rewrote the swap to use a timestamped staging dir (web-ui.new.<ms>) and a timestamped old-copy path so nothing needs to be rm'd before the extract. After the new tree is mv'd into place, the previous rollback copy is rotated aside with a .<ms> suffix (best-effort) and this apply's old copy becomes the new web-ui.bak. If the final mv fails, the staged old is restored so nginx keeps serving. handle_update_check manifest override handle_update_check takes the git path whenever ~/archy/.git exists. On the dev box (.116) that meant the Pull & Rebuild button was always the only option even though the manifest-path OTA was already wired via ARCHIPELAGO_UPDATE_URL. Now: if that env var is set, we skip the git detection entirely and use the manifest path. The regular fleet (no env var, no repo) hits the manifest branch naturally; beta dev nodes (repo + no env var) still get Pull & Rebuild; dev nodes with the env var explicitly set can finally test the manifest OTA end-to-end. Artefacts: archipelago 356e78cc…91a6dd 40372288 archipelago-frontend-1.7.6-alpha.tar.gz 4fb79664…0172e9 76984615 (reused) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:33:10 -04:00
Dorian	170f8ae787	release(v1.7.4-alpha): fix Install Update tar extraction + progress overshoot apply_update was extracting the frontend tarball with `tar -xzf -C /opt/archipelago`, but the tar contents are the inside of web-ui/ (root entries are ./test-aiui.html, ./assets/, etc.). So the files landed directly in /opt/archipelago instead of under web-ui/, and tar bailed on nginx-owned paths mid-extraction. First end-to-end OTA test (.198) found it: "tar: ./assets/SystemUpdate-…js: Cannot open: No such file or directory". Now extracts into web-ui.new, chowns, then atomically swaps: move existing web-ui → web-ui.bak, then web-ui.new → web-ui. Same pattern as the manual sideload that's been working. Frontend: SystemUpdate.vue fake download progress was capped at "<90" with a Math.random()*15 increment — the last tick could push to ~104.99%. Capped at 95% with a smaller step so it stops at 95 and the real RPC completion jumps it to 100. Artefacts: archipelago a14ad7e4…2a2be3 40361984 archipelago-frontend-1.7.4-alpha.tar.gz 4fb79664…0172e9 76984615 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 12:02:14 -04:00
Dorian	3a479e5b09	release(v1.7.3-alpha): sidebar version sync + FIPS reconnect + profile pic render Sidebar version detect_build_version() no longer reads /opt/archipelago/build-info.txt first. That file was written by the ISO installer at flash time and never rewritten by OTA or sideload, so after any binary swap the sidebar kept advertising whatever the ISO shipped with. Now just returns env!("CARGO_PKG_VERSION") unconditionally — always matches the running binary. FIPS card The two-column grid in FipsNetworkCard.vue placed version/npub boxes side-by-side on mobile but the anchor-status panel forced col-span-2, creating an unbalanced empty column at every desktop width. Anchor status moves to its own full-width row below the grid. When the anchor is not reached, a "Reconnect" button appears next to the status line; it calls fips.restart (45s timeout), waits 5s for the daemon to come back, then reloads fips.status. Surfaces whether the restart actually recovered the anchor in a status flash. Profile picture render Uploaded profile pictures are stored with an onion-rooted URL so external Nostr clients can fetch them. The local browser isn't Tor-routed though, so the <img src> silently 404'd and the UI fell back to showing initials. Added a displayableUrl() helper on Web5Identities.vue that rewrites http://<onion>/blob/<cid>[?...] to same-origin /blob/<cid> for rendering, while the stored URL keeps its onion prefix so publishing to Nostr still works for external viewers. Pass-through for data: URLs and already-relative paths. Identity row title The identity list header now renders profile.display_name (when set) and keeps identity.name as a muted parenthetical. Before, only the internal name was shown and a user who'd customised their Nostr display_name saw a mismatch between their own UI and what peers rendered. 
Artefacts: archipelago 99184b95…22dc1b 40350664 archipelago-frontend-1.7.3-alpha.tar.gz 7b933cf4…74a8bc 76987031 Changelog layman-style per the saved feedback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 11:44:59 -04:00
Dorian	0d5128a121	release(v1.7.2-alpha): fix Install Update + identity avatar backfill + label Three user-visible fixes shipped together. 1. update.apply permission-denied apply_update() was doing fs::copy into /usr/local/bin/archipelago and tar xzf into /opt/archipelago as the archipelago user — both root-owned. The backup step succeeded (it wrote to data_dir) but the swap failed with a silent permission denied, wrapped as "Failed to apply archipelago". Now uses `sudo install -m 0755` for the binary and `sudo tar -xzf` for the frontend, plus a post-apply `sudo systemctl --no-block restart archipelago` scheduled 2s after the RPC reply so the UI sees success. 2. Apply → Install label en/es locale strings: applyUpdate / applyTitle / applyNow changed from "Apply" to "Install". Matches the user's mental model and distinguishes the user-facing verb from the internal apply_update() function. 3. Identity avatar backfill Identities created before df83163f had profile=None on disk and so rendered as initials. load_record() now synthesizes an IdentityProfile with a default picture (identicon for regular identities, the hex node SVG for derivation_index=0) when profile is missing. The synthetic profile lives only in the returned record; the file stays untouched so a later explicit Save persists whatever the user actually chose. Artefacts: archipelago 70e5444e…67c589 40381960 archipelago-frontend-1.7.2-alpha.tar.gz 806b027b…358a824 76983699 Changelog rewritten layman-style per saved feedback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 11:25:10 -04:00
Dorian	df83163f15	feat(identity,update): default avatars, public blobs, long-running downloads Follow-up to 1fb71b4b on the same v1.7.0-alpha line. Identity avatars • New module `avatar.rs` generates two deterministic SVG styles keyed off the pubkey: a 5×5 mirrored identicon for sub-identities and a hexagonal-network motif for the master (seed index 0) identity. Both returned as base64 data URLs, so a fresh identity has a recognisable picture before the user uploads anything. • `IdentityManager::create()` and `create_from_seed()` populate `profile.picture` on creation. Index 0 gets the node SVG; all other seed-derived + ad-hoc identities get the identicon. Blob store — public flag for profile assets • `BlobMeta.public` (default false) added; `BlobStore::put()` takes a `public: bool`. Missing in legacy meta files = false. • `POST /api/blob` now stores uploads with public=true and returns `public_url` alongside `self_test_url`. public_url is `http://<node-onion>/blob/<cid>` (no cap) if Tor has published the archipelago hidden service, else falls back to the local path. • `GET /blob/<cid>` bypasses the HMAC capability check when the requested blob is flagged public — external Nostr clients fetching a kind-0 `picture` URL can't hold a cap. • Mesh callers (content_ref attachments, dispatch rehydration) pin public=false explicitly so nothing leaks out of the mesh path. Profile editor UX • Collapsed Save + Save & Publish into one button — the Save action now persists locally AND publishes the kind-0 metadata event in one step. Uploads store `public_url` into `profile.picture` / `profile.banner` so the published URL is reachable by external clients. Update client — the 15-second cliff • Frontend `rpcClient.call` for `update.download` now has an explicit 30-minute timeout (was falling back to the default 15 s). `update.apply` gets 5 min, `update.git-apply` gets 15 min. Matches what the backend is actually willing to wait for. 
• Backend `load_state()` reconciles `state.current_version` with `CARGO_PKG_VERSION` on every start. Sideloaded or reflashed nodes were stuck advertising the old version even with a new binary in place, which kept re-offering the same release as an update. Manifest changelog rewritten for fleet readers per the saved feedback (no function names, no file paths). Artefacts refreshed: binary 12f838c5…5ba82d 40381864 frontend dc3b63af…e9a8370 76984288 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 10:03:38 -04:00
Dorian	1fb71b4b4e	fix(update): 30-min download timeout + tidier progress number Follow-up to 56d4875b, same v1.7.0-alpha shipping band. Backend download timeout bumped from 300s to 1800s (update.rs) with an explicit 30s connect timeout. git.tx1138.com raw-file throughput can sit around 70–80 KB/s, which meant OTA downloads were timing out at ~55% through the 40 MB binary even though the SHA would have matched on a full pull. 30 min gives ample headroom for the worst LAN-to-VPS link we actually hit. Frontend: SystemUpdate.vue now formats downloadPercent with toFixed(2) via a new computed, so the progress card shows "45.23%" instead of "45.270894%". Cosmetic only; the underlying ref still tracks raw floats. Manifest changelog rewritten in user-facing language per the saved feedback — no file paths, function names, or "root cause" phrasing. Artifacts refreshed: binary d85a71c5…982f4 40360936 frontend 8adcdacf…e687f6 76986852 ISO at image-recipe/results/archipelago-installer-unbundled-x86_64.iso (Apr 20 09:00) carries both fixes for fresh installs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 09:03:24 -04:00
Dorian	56d4875b35	fix(vpn,reconcile): restore WG peers on boot + filebrowser spec drift Follow-up to 8b7cb002 (no version bump — same v1.7.0-alpha manifest): * WireGuard peer persistence. Kernel peer state is ephemeral; the add-peer RPC wrote each peer to data_dir/nostr-vpn/peers/.json but nothing re-pushed them on reboot. Result on .198: wg0 came up listening with zero peers after last night's reboot. Added vpn::restore_wg_peers() — reads the peers dir, waits up to 30s for wg0 to exist, then replays each via `archipelago-wg add-peer`. Spawned from main.rs alongside the other startup tasks. * Reconcile + filebrowser drift. scripts/container-specs.sh load_spec_filebrowser now declares SPEC_NETWORK="archy-net" (to match what first-boot-containers.sh creates) and pins the filebrowser-data volume + wget-style healthcheck so the reconciler stops reporting network drift. Without this, reconcile would kill the healthy first-boot filebrowser container and recreate it on bridge, breaking the archy-net DNS name the backend proxies to. Manifest binary sha/size refreshed: 6c178a76…3582cc, 40361912 bytes. Rebuilt ISO at image-recipe/results/archipelago-installer-unbundled-x86_64.iso (Apr 20 07:10) carries both fixes baked in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-20 07:10:49 -04:00
Dorian	6b78bd692d	fix(fips,kiosk): auto-activate FIPS at onboarding end + 5-min kiosk wait 1. FIPS auto-activate at server startup only fires if fips_key already exists on disk, which on a fresh install is never true until AFTER onboarding. By the time the user completes seed-generate/restore, archipelago has been running for minutes and the startup task has long since exited. User still had to hit Activate. Fix: call spawn_post_onboarding_fips_activate() from the tail of handle_seed_generate and handle_seed_restore — the moment the fips_key materialises, a detached task runs `fips::config::install` + `archipelago-fips.service activate`. Logged only, never blocks the onboarding RPC. 2. Kiosk health-poll window was 30 × 2s (configs/ copy was 60 × 2s but unused — the heredoc in build-auto-installer-iso.sh is what actually lands on disk). On .198's slower hardware archipelago /health wasn't ready within 60s, so Chromium launched against a not-yet-running backend → blank window until manual reboot. Bumped to 150 × 2s (5 min) + TimeoutStartSec=360. .253 was already well within the window; this protects the slower box too. Standalone configs/archipelago-kiosk.service updated in lockstep so the two copies don't drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 18:09:46 -04:00
Dorian	b643b30bba	fix(fips,iso): bulletproof FIPS from install — no Activate button needed Problems addressed (all observed on .198): * fips_key was written as raw 32 bytes; upstream fips daemon reads it with read_to_string() and bailed with "stream did not contain valid UTF-8", crashlooping indefinitely. * Activate button racy: user had to hit it, and it would keep failing silently because the daemon couldn't parse its own config. * FIPS schema drift (already fixed in 7d8a5864) put the config write path behind the same broken "Activate" flow, so the fix alone didn't help existing nodes. * Journal was on tmpfs — every reboot wiped install/onboarding history, making post-hoc debugging impossible. Changes: * identity.rs: write fips_key as bech32 nsec + newline. load_fips_keys now auto-migrates legacy 32-byte files to bech32 the first time it reads them, so OTA updates from v1.5.0-alpha self-heal without user action. * server.rs: post-onboarding auto-activate task runs on every archipelago startup. If fips_key exists it ensures /etc/fips/fips.yaml is schema-current and starts archipelago-fips.service. Pre-onboarding nodes stay quiet (guarded on fips_key_exists). * ISO build: un-mask archipelago-fips + archipelago-wg + wg-address — all use ConditionPathExists on their key files, so systemd silently skips them pre-onboarding (no MOTD [FAILED]). Only nostr-vpn stays masked (legacy service, superseded by upstream fips). * Journald made persistent via /var/log/journal + 500M cap, so install and first-boot logs survive reboots for diagnosis. After this, a fresh install + onboarding should bring FIPS up automatically with no user interaction. The UI "Activate" button can stay as an escape hatch (the RPC is still there) but is no longer on the critical path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:33:21 -04:00
Dorian	7d8a586401	fix(fips,iso): match upstream fips schema + guard ISO against stale binary 1. FIPS daemon config schema drifted: upstream jmcorgan/fips now takes `node.identity.persistent: true` (keys read from config-dir/fips.key) and `transports.udp.bind_addr: "0.0.0.0:PORT"` instead of `identity.key_file/pub_file` + `transports.udp.enabled/port`. The `tor:` transport was dropped entirely; archipelago handles Tor fallback itself. fips.yaml generated by archipelago::fips::config now matches the upstream schema, and archipelago-fips.service stops crashlooping on Activate. Observed on .198: 52 restarts with "data did not match any variant of untagged enum TransportInstances at line 7 column 3". 2. ISO backend-binary capture didn't verify that the captured binary matched the checked-out Cargo.toml version. Today's 14:40 ISO shipped a stale 1.4.0 binary because `core/target/release/archipelago` pre-dated the 1.5.0-alpha bump — the build grabbed it via the first-priority "local release build" path without looking at it. All four capture sources now go through verify_backend_version() which greps the binary for the expected version string; mismatches are skipped so the build falls through to the source-build path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 15:19:56 -04:00
Dorian	122d00f81e	feat(fips): surface anchor connectivity + peer count in FipsStatus Two new fields on the /rpc fips.status payload: - authenticated_peer_count: how many FIPS peers the daemon has an authenticated session to right now. 0 means isolated / not on the mesh; >0 means traffic to any known npub can DHT-route. - anchor_connected: true when the public anchor (fips.v0l.io, npub1zv58cn7…) is present in the daemon's identity cache. The anchor bootstraps DHT routing for general-case deployments, so this is the best single-value indicator the UI can show for "will federation traffic over FIPS work between previously- unknown peers?" Implementation: fips::service::peer_connectivity_summary shells out to `sudo -n fipsctl show peers` + `... show identity-cache` (archipelago user already has NOPASSWD:ALL per the ISO sudoers and live fleet nodes, confirmed). Failure returns (0, false) so the UI degrades to "unknown" state without crashing. Only queried when service_active — pre-onboarding / daemon-down nodes skip the fipsctl call entirely. UI side (FipsNetworkCard) consumes the full status JSON, so the two new fields are available via existing prop plumbing; visual treatment can come later. Also fixes ISO build (commit 3e04456c wasn't sufficient): the Dockerfile needs `cargo build --release --bins` — upstream FIPS added a `fips-gateway` binary target, and plain `cargo build --release` only builds the default bin list, which caused `cargo deb --no-build` to fail hunting for the missing binary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 08:40:31 -04:00
Dorian	ec5f14166a	feat(federation): periodic sync every 30 minutes Until now federation.sync-state only fired on (a) user clicking Sync in the UI or (b) server-name push. That meant own_fips_npub, last_transport, peer state updates — all the things v1.5 added for auto-upgrade from Tor to FIPS — didn't propagate until the user poked the button. Fix: spawn a background task in server.rs that runs federation::sync_with_peer for every Trusted peer every 30 minutes. First run is 60s after boot (let onboarding settle) and peers are staggered 5s apart to not hammer Tor's SOCKS proxy with concurrent connects. The sync path already prefers FIPS (via PeerRequest), so once peers have learned each other's fips_npub (now automatic thanks to the own_fips_npub broadcast in state snapshots), subsequent periodic syncs route over FIPS — transport badge cycles from 'tor' to 'fips' on its own without user action. Covers task #30. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 08:32:11 -04:00
Dorian	f1c982bc95	fix(nostr): profile publish broadcasts to ALL enabled relays Previously handle_identity_publish_profile defaulted to a single hard-coded relay (ws://localhost:18081) so the user's kind:0 profile event only ever landed on the local relay — hence "Manage Relays shows N connected, but profile edits don't propagate" from testing. Fix — two-layer change: - identity_manager::publish_profile now takes `&[String]` relays instead of one URL. Adds each relay to the nostr-sdk client, gives 15s for handshakes, publishes, then surfaces per-relay accept/reject in a new ProfilePublishOutcome struct so the UI can show WHICH relays accepted vs. rejected and WHY. - RPC handle_identity_publish_profile no longer defaults to the local relay: pulls the ENABLED list from nostr_relays::list_relays (the same table that powers Manage Relays) and publishes to every entry. Accepts an optional `relays: [...]` override for tests. - At-least-one-accept guarantee: if every relay rejects, the call errors instead of silently reporting published=true. User gets a real error message listing the failures. - Response shape: `{event_id, accepted: [urls], rejected: [[url, reason]], relays_attempted: N, published: bool}` so the UI can show a useful status block after clicking Publish. relay_url_matches is tolerant of trailing-slash / case differences since nostr-sdk canonicalises URLs internally. Covers the publishing half of task #29; avatar/banner upload UI is still open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 04:42:25 -04:00
Dorian	2d78c2ef2b	feat(peers): bidirectional /network peer requests Before: Alice sent /network.send-request to Bob, Bob accepted via /network.accept-request and gained Alice in his peers list, but Alice was never notified — her pending row sat there and she had to manually add Bob separately. User complaint: "it's strange you have to do it both ways." Fix — the accept now fires a best-effort connection_accepted message back to the requester: - handle_network_accept_request: after writing the local peer record, assembles a `{type: "connection_accepted", request_id, from_did, from_onion, from_pubkey}` JSON, signs + encrypts + POSTs it to the requester via node_message::send_to_peer. Uses PeerRequest internally so it prefers FIPS and falls back to Tor. - handle_node_message: parses incoming plaintext as JSON; on a match for type=connection_accepted, auto-adds the sender to peers.json (the existing self-pubkey guard in add_peer still applies) and short-circuits the normal store_received path so the acceptance doesn't also land as a chat message in Alice's inbox. Offline handling: if Alice is offline when Bob accepts, the notify warns and the local accept still succeeds. Alice will receive any subsequent message from Bob normally; future iteration could retry on reconnect. Federation-invite flow (federation.accept-invite → notify_join) was already bidirectional; this closes the gap for the peer flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 04:34:37 -04:00
Dorian	84943aaa04	feat(server): lazy-bind FIPS peer listener so fips.install doesn't need an archipelago restart Previously the server checked `fips0` once at startup; if the interface wasn't up (pre-onboarding, or post-onboarding before the user clicked Activate FIPS), the peer listener never bound and stayed unreachable until the next archipelago restart. Replaced with a `peer_late_bind_loop` background task: polls every 30s for an fd00::/8 address on `fips0` and binds the listener the moment one appears. First tick fires immediately so the hot path — fips0 already up at startup — is still zero-cost. Cancellation cascades through the same `tokio::sync::watch` channel the main listener uses. Side effects: - main.rs no longer computes peer_addr eagerly; dropped the unused param from serve_with_shutdown. - FipsTransport::is_available already caches the service probe so the 30s poll doesn't thrash systemctl. Covers task #21. Unblocks the first-boot + onboarding flow for fresh ISO installs on .253. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 04:21:20 -04:00
Dorian	bfe2603f69	feat(federation): advertise own_fips_npub in state snapshots Pre-v1.4 federation pairs (who exchanged invites before fips_npub was part of the invite code) had no path to learn each other's FIPS npub — they'd stay Tor-only forever even after upgrading. Fix: every state snapshot now carries the sender's own_fips_npub, and update_node_state refreshes the stored fips_npub on the receiver side whenever it differs. - NodeStateSnapshot.own_fips_npub (serde default for back-compat). - build_local_state takes own_fips_npub alongside the other single-value fields. - handle_federation_get_state populates own_fips_npub from identity::fips_npub, with a fallback to the upstream daemon's /etc/fips/fips.pub for legacy nodes that never materialised a seed-derived key. - storage::update_node_state now writes fips_npub into the FederatedNode when a new value arrives and trims whitespace before comparing, so key rotations also flow through. - Test fixtures (storage + transport/delta + sync) updated for the new field; existing tests pass. Net effect: on the next sync, .116 and .228 learn each other's fips_npub (currently null from the old invite) and subsequent federation calls route FIPS-first automatically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 04:16:05 -04:00
Dorian	3c83440a60	fix(peers): reject self-add in add_peer() Observed on .228: /var/lib/archipelago/peers.json contained an entry matching the node's own node_key.pub pubkey. It had been added 2026-03-02 and stuck around forever since add_peer() only dedupes by pubkey — nothing stops a pubkey that happens to be ours. How it probably got there: somewhere in the auto-add paths (node-message receive, mesh federation bridge, invite back-and-forth) a message we'd sent was fed back and the receiver-side add used the echoed from_pubkey without realising it was us. Doesn't matter which path — the guard belongs in storage. add_peer now short-circuits when the candidate pubkey matches data_dir/identity/node_key.pub. Helper is_own_pubkey best-effort: unreadable identity → returns false so normal peers aren't blocked. Also manually purged the one stray entry on .228 (1 removed, 2 real peers remain). Future deploys include this guard so the phantom can't come back. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 04:02:15 -04:00
Dorian	4c8c4ebc47	feat(federation): v1.5.0 bump + transport badge on each node card Every federated node card now shows a colored badge indicating how archipelago actually reached the peer on the most recent successful call — FIPS / TOR / LAN / MESH — not a prediction based on available addresses. The badge is hidden when we've never reached the peer. Backend: - Cargo.toml: 1.4.0 → 1.5.0 (visible in the sidebar health endpoint). - FederatedNode gains last_transport + last_transport_at (serde default for back-compat with v1.4 nodes.json files). - federation::storage::record_peer_transport(did, onion, transport) — writes both fields plus last_seen after each successful peer call. Matches by DID first, falls back to onion. - federation::sync::sync_with_peer now calls record_peer_transport immediately after a successful PeerRequest return, so the badge on the sync'ing peer's card reflects the transport the call actually rode (fips vs tor). Frontend: - types.ts FederatedNode gains last_transport / last_transport_at (union-typed to the four known kinds). - NodeList.vue: new transportBadge(node) returns {label, cls, title} tuned per transport. Hidden when last_transport is absent so we never lie. Tooltip shows "Last reached via <x> · <time ago>" so stale data is self-evident. Removed the predictive icon from the transport store — badge is now 100% ground-truth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 02:51:26 -04:00
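
The transport-badge behaviour described in 4c8c4ebc47 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the archipelago source: the names FederatedNode, last_transport, last_transport_at, last_seen and record_peer_transport come from the commit message, while the in-memory slice, the explicit `now` parameter, the bool return value and the `transport_badge` helper (written in Rust here, whereas the real badge lives in NodeList.vue) are hypothetical.

```rust
// Hypothetical sketch: names follow the commit message, but the types and
// signatures are assumptions, not the actual archipelago code.

#[derive(Debug, Clone, PartialEq)]
pub struct FederatedNode {
    pub did: String,
    pub onion: String,
    /// "fips" | "tor" | "lan" | "mesh"; None until a peer call has succeeded.
    pub last_transport: Option<String>,
    /// Unix seconds of the call that set `last_transport`.
    pub last_transport_at: Option<u64>,
    pub last_seen: Option<u64>,
}

/// Record the transport a successful peer call actually used.
/// Matches by DID first and falls back to the onion address.
/// Returns true when a node was found and updated.
pub fn record_peer_transport(
    nodes: &mut [FederatedNode],
    did: &str,
    onion: &str,
    transport: &str,
    now: u64,
) -> bool {
    // Empty identifiers never match, so a blank DID cannot hit the wrong node.
    let idx = nodes
        .iter()
        .position(|n| !did.is_empty() && n.did == did)
        .or_else(|| {
            nodes
                .iter()
                .position(|n| !onion.is_empty() && n.onion == onion)
        });

    match idx {
        Some(i) => {
            let node = &mut nodes[i];
            node.last_transport = Some(transport.to_string());
            node.last_transport_at = Some(now);
            node.last_seen = Some(now);
            true
        }
        None => false,
    }
}

/// Badge label for a node card. None means the badge is hidden because the
/// peer has never been reached, so the UI never shows a predicted transport.
pub fn transport_badge(node: &FederatedNode) -> Option<String> {
    node.last_transport.as_deref().map(str::to_uppercase)
}
```

Passing the node list and timestamp in explicitly keeps the sketch free of file I/O and clock access; the commit itself describes the real function as writing to storage after each successful peer call.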

360 Commits