110 Commits

Author SHA1 Message Date
archipelago
a483fe4baa fix: derive launch port from URL authority, not naive rsplit
reachable_lan_address() parsed the launch port with url.rsplit(':')
which yields "8096/" for manifest interfaces.main URLs that carry a
path (http://localhost:8096/). That fails to parse and silently drops
a perfectly reachable launch URL, so apps like jellyfin, btcpay-server,
fedimint, gitea, nextcloud and portainer showed running with no launch
link in the UI. New launch_url_port() reads digits after the final
colon (mirroring port_from_url in the RPC layer) and tolerates a
trailing path. Adds regression tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 03:35:19 -04:00
archipelago
c49e8fcacd fix: harden OTA updates, AIUI desktop gap, LND no-proxy
- update.rs: post-OTA probe falls back to http://127.0.0.1/ on connect
  error (nginx binds :80, not :443) so good updates are no longer rolled
  back; recover stuck update_in_progress; avoid ETXTBSY on running binary
- LND: REST client bypasses proxy, GET newaddress p2wkh, wallet
  readiness/unlock after restart
- Dashboard.vue: chat route back to plain h-full (desktop bottom-gap fix)
- vite.config.ts: dev-only /aiui proxy
- tests/release/run.sh: release gate harness (static+frontend+backend)
- CHANGELOG: v1.7.89-alpha notes

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-13 01:23:32 -04:00
archipelago
b8ac68d844 fix: restore aiui and bitcoin receive before release 2026-06-12 05:10:03 -04:00
archipelago
0339268c43 chore: sync cargo lock for v1.7.87-alpha 2026-06-12 04:55:09 -04:00
archipelago
b11c6c17d1 chore: release v1.7.86-alpha 2026-06-12 04:21:18 -04:00
archipelago
e474a2b4c9 chore: sync generated release artifacts 2026-06-12 03:15:24 -04:00
archipelago
6a30ff11bd chore: release v1.7.84-alpha 2026-06-11 04:44:58 -04:00
archipelago
22df3f8f5f chore: release v1.7.83-alpha 2026-06-11 03:03:32 -04:00
archipelago
af9d531a00 chore: sync cargo lock for v1.7.82-alpha 2026-05-22 17:24:42 -04:00
archipelago
68784be4db chore: sync cargo lock for v1.7.81-alpha 2026-05-21 21:48:46 -04:00
archipelago
c31c3765f4 chore: sync cargo lock for v1.7.80-alpha 2026-05-21 00:39:53 -04:00
archipelago
4da6e3b43c chore: sync cargo lock for v1.7.79-alpha 2026-05-20 23:17:04 -04:00
archipelago
e61c757633 chore: release v1.7.78-alpha 2026-05-20 20:53:23 -04:00
archipelago
0898c54765 chore: bump version to v1.7.77-alpha 2026-05-20 00:38:26 -04:00
archipelago
92c58141af fix(apps): stabilize saleor and netbird launch 2026-05-19 21:45:17 -04:00
archipelago
7b2f4cb05f chore: sync cargo lock for v1.7.75-alpha 2026-05-19 20:27:34 -04:00
archipelago
bd69ef41d5 fix(apps): repair netbird login and iframe focus 2026-05-19 19:21:43 -04:00
archipelago
3e01e57c8d chore: release v1.7.72-alpha 2026-05-19 17:42:11 -04:00
archipelago
ca3e2ee0ca fix(settings): update whats new release notes 2026-05-19 17:33:45 -04:00
archipelago
cede77f3bc chore: update release lockfile 2026-05-19 16:17:13 -04:00
archipelago
881779005a chore: update release lockfile 2026-05-19 14:45:20 -04:00
archipelago
edaece8716 chore: update release lockfile 2026-05-19 09:41:57 -04:00
archipelago
e9898ead76 chore: update release lockfile 2026-05-18 11:55:20 -04:00
archipelago
92c578d3d9 chore: update release lockfile 2026-05-18 10:17:20 -04:00
archipelago
b49d8f1f8a chore: update release lockfile 2026-05-18 09:31:57 -04:00
archipelago
d0b08d2790 chore: update release lockfile 2026-05-17 23:25:16 -04:00
archipelago
837ba63466 chore: update release lockfile 2026-05-17 23:03:44 -04:00
archipelago
a992abcd06 chore: release v1.7.61-alpha 2026-05-17 22:13:21 -04:00
archipelago
4d6b4f76af chore: release v1.7.60-alpha 2026-05-17 20:45:56 -04:00
archipelago
0a94c0097f chore: release v1.7.59-alpha 2026-05-17 19:44:54 -04:00
archipelago
e05e356d64 chore: release v1.7.58-alpha 2026-05-17 18:40:50 -04:00
archipelago
7804223152 chore: release v1.7.57-alpha 2026-05-17 17:30:04 -04:00
Dorian
5818541721 chore: release v1.7.56-alpha 2026-05-14 09:13:58 -04:00
Dorian
835c525218 chore(release): stage v1.7.55-alpha 2026-05-13 15:09:22 -04:00
archipelago
c0751e2551 chore(release): stage v1.7.54-alpha 2026-05-06 09:23:57 -04:00
archipelago
1a0d8a432c chore(release): stage v1.7.53-alpha 2026-05-05 13:59:50 -04:00
archipelago
745cb1c626 chore(release): stage v1.7.52-alpha 2026-05-05 11:29:18 -04:00
archipelago
05e6c2e738 fix: release v1.7.51-alpha install hardening 2026-05-01 05:02:39 -04:00
archipelago
be9f9528c3 fix: release v1.7.50-alpha OTA runtime repair 2026-05-01 03:14:07 -04:00
archipelago
7ab788d178 chore: release v1.7.49-alpha 2026-04-30 16:37:54 -04:00
archipelago
f507b847ef chore: release v1.7.48-alpha
Hotfix: archipelago.service ExecStartPre now mkdirs /run/containers and
/var/lib/containers before the unit's mount-namespace setup tries to bind
them. Without this, fresh nodes that don't have /run/containers (e.g.
nodes provisioned without a prior podman session) fail at the namespace
step with:

  Failed to set up mount namespacing: /run/containers: No such file or directory
  Failed at step NAMESPACE spawning /bin/bash: No such file or directory

Existing nodes don't pick up systemd unit changes via OTA — they need a
one-time `systemctl edit archipelago` adding the same mkdir. ISO installs
from this version forward have the fix baked in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 16:27:22 -04:00
archipelago
8a2899ab4a chore: release v1.7.47-alpha
Sync-perf tuning for bitcoin/bitcoin-core/bitcoin-knots/electrumx.

- Drop the --cpus=2 cap on bitcoin/electrumx variants. Script verification
  is parallelizable; the cap halved IBD speed on 4-8 core machines.
- Bump bitcoin --memory 4g→8g so dbcache=4096 has headroom for mempool +
  connection buffers + I/O. 4g was OOM-prone during heavy IBD.
- Bump electrumx --memory 1g→2g + add CACHE_MB=2048 + MAX_SEND=10MB.
- bitcoin-core CLI args gain -dbcache=4096 -par=0 -maxconnections=125.
- bitcoin-knots manifest matched (1024MB pruned / 4096MB full + par=0).

Future v2: host-RAM-aware dbcache scaling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:47:51 -04:00
archipelago
992b673b20 chore: release v1.7.46-alpha
Follow-up to v1.7.45-alpha closing the remaining tasks identified by the
resilience sweeps + the new bitcoin orphan / install-fail-vanish bugs.

User-visible:
- Health monitor: stop paging on orphaned containers from variant switches
- Install fail: card stays visible (was vanishing) with error message
- Stack pull progress: interpolate 20→70% (was stuck at 20%)
- docker.io → lfg2025 mirror: bitcoin/gitea/nextcloud/valkey

Internal:
- Resilience harness — install-wait uses expected_containers_for, ui+auth
  probes retry with 60s backoff, dep-snapshot fix
- InstallProgress gains optional `message` field (frontend renders it
  when phase is None)

binary  $(stat -c %s releases/v1.7.46-alpha/archipelago)  sha256:$(sha256sum releases/v1.7.46-alpha/archipelago | awk '{print $1}')
tarball $(stat -c %s releases/v1.7.46-alpha/archipelago-frontend-1.7.46-alpha.tar.gz)  sha256:$(sha256sum releases/v1.7.46-alpha/archipelago-frontend-1.7.46-alpha.tar.gz | awk '{print $1}')

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 14:50:33 -04:00
archipelago
4ec6ca98c1 chore: release v1.7.45-alpha
Resilience-validated release. Three full sweeps of the new resilience
harness against .228 confirm no shipstoppers.

Big user-visible:
- Bitcoin RPC auth durably correct via host-rendered nginx.conf bind-mount,
  replaces fragile post-start exec that failed under restricted-cap rootless
  podman ("crun: write cgroup.procs: Permission denied")
- Multi-container stack installs (indeedhub, immich, btcpay, mempool) now
  emit phase events at every boundary so the progress bar advances
- Apps no longer vanish from the dashboard mid-install (absent-scanner skips
  packages in transitional states)
- Indeedhub fresh installs work end-to-end (was 8500+ restart loop): five
  missing env vars (DATABASE_PORT, QUEUE_HOST, QUEUE_PORT,
  S3_PRIVATE_BUCKET_NAME, AES_MASTER_SECRET) added to install code
- Tailscale install fixed: --entrypoint string was being passed as a single
  shell-line arg; switched to custom_args array
- Catalog cleaned of broken entries (dwn, endurain, ollama removed; nextcloud
  restored on docker.io)
- Bitcoin Core update path uses correct image (was looking for nonexistent
  lfg2025/bitcoin:28.4)
- ISO installs now allocate swap on the encrypted data partition

Infra:
- New resilience harness (scripts/resilience/) — black-box state-machine
  tester, every app × every transition. Run before each release.

Sweep #3 final: PASS 107 / FAIL 12 / SKIP 14. The 12 fails are 1 cosmetic
(homeassistant trusted_hosts), 8 harness/timing false-positives, and 3
non-shipstopper tracked items. Down from 23 in baseline sweep #1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 12:31:45 -04:00
archipelago
310c709aba chore(release): bump version to 1.7.43-alpha 2026-04-23 13:21:58 -04:00
archipelago
e103925a4e feat(container): ProdContainerOrchestrator with build-or-pull, adoption, reconcile
Step 3 of the rust-orchestrator-migration. New file prod_orchestrator.rs (999 LOC)
implements the full public surface that will replace scripts/first-boot-containers.sh:

  * install / start / stop / restart / remove / upgrade / status / list / logs / health
  * adopt_existing: read-only scan that claims containers matching our manifests by
    name, without recreating — preserves the v1.7.42 fixture on .116.
  * reconcile_all: level-triggered, per-app failures collected rather than aborting.
  * install_fresh: build-or-pull (Step 2 trait methods), relative build contexts
    resolved against the manifest directory.

Naming rule (answered design Q1): UI app IDs (bitcoin-ui/electrs-ui/lnd-ui) get the
archy- prefix; backends keep their bare ID. An explicit extensions.container_name
always wins. Codified in compute_container_name() with unit tests for all three tiers.

Concurrency (answered design Q4): per-app tokio::sync::Mutex<()> created lazily,
protecting every mutating op against the reconciler loop. Acquiring the per-app
lock only needs a read lock on the map, so independent apps do not serialize.

16 tests: 3 sync naming rule tests + 13 tokio async tests covering install (pull,
build-absent, build-present, relative-context), reconcile (noop/exited/missing/
mixed-failure), adopt-by-name, upgrade sequence ordering, list filtering, health
state mapping, and unknown-app-id rejection. All pass.

Not wired into main.rs yet — that is Step 6. Crate builds clean with expected
unused warnings for the new re-exports.
2026-04-22 18:32:31 -04:00
archipelago
0ac673deb4 release(v1.7.42-alpha): bitcoin RPC retry wrapper so syncing nodes stop flashing red
Closes failure mode adjacent to FM3 (docs/bulletproof-containers.md): on
a syncing pruned node, bitcoind's RPC thread blocks for 5-10s during block
validation. The old 10s client-side timeout was rejecting roughly 30% of
UI calls even though the node was perfectly healthy. 20x stress test on
the live .116 node (caught in IBD catch-up at block 797k) used to drop
10 of 20 calls; now drops 0 of 20.

What changed:
- core/archipelago/src/api/rpc/bitcoin.rs: bitcoin_rpc_call now retries up
  to 3 times with 500ms and 1500ms backoffs between attempts. Only
  transient transport errors (timeout, connect refused, send/recv IO)
  trigger retry. A well-formed bitcoind error response is surfaced
  immediately - real RPC bugs are never masked.
- Per-attempt hard deadline (tokio::time::timeout, 15s) layered on top
  of reqwest's own timeout, so DNS starvation or TLS wedging can't
  steal the entire retry budget.
- handle_bitcoin_getinfo client builder gained a 3s connect_timeout
  so a dead bitcoind is fast-failed inside the first attempt instead
  of eating the whole 15s.
- Retry policy extracted into a RetryConfig struct so tests can dial
  down timeouts to ~100ms per attempt. Production defaults live in
  RetryConfig::production().

Not changed (tracked as follow-up):
- mesh/mod.rs bitcoin_rpc_getblockcount and related helpers use the
  same 10s-timeout pattern. Not migrated to the new wrapper in this
  release; scheduled for v1.7.43 alongside the render_bitcoin_conf
  work.
- lnd/info.rs and electrs_status have similar 10s/15s timeouts but
  different failure profiles - audit first, migrate only the ones
  that actually exhibit the bug.

Tests: 6 new unit tests under api::rpc::bitcoin::tests, all passing.
Uses an in-process hyper server (already a transitive dep) to simulate
bitcoind responses; no new crates required.
  - happy_path_first_attempt: no retry when first attempt succeeds
  - retries_on_timeout_then_succeeds: first attempt times out, second
    succeeds, returns OK (uses a short-timeout RetryConfig so the test
    runs in <1s instead of 15s)
  - retries_exhausted_on_persistent_connect_refused: all attempts fail
    against a closed port, error bubbles up, elapsed time confirms
    backoffs actually ran
  - does_not_retry_on_rpc_level_error: bitcoind-returned error body is
    surfaced immediately, no retry
  - does_not_retry_parse_errors: non-JSON response (e.g. 503 with html
    body) is NOT retried - guards against the tempting "retry all
    non-2xx" mistake that would mask real bitcoind misconfig
  - retry_budget_invariants: asserts total wall-time ceiling stays
    under 60s so a bumped constant can't silently hang a UI call
    forever

Validated live on .116: 20/20 bitcoin.getinfo calls succeed during IBD
catch-up (chain at block 797419 -> 797464), vs ~40% baseline under the
old 10s timeout. Worst-case latency was 48.9s during peak validation;
happy-path latency (cached result) remains 28-77ms.
2026-04-22 16:46:28 -04:00
archipelago
d1bcf271f9 release(v1.7.41-alpha): post-OTA auto-rollback so a bad release cannot strand the fleet
Closes failure mode FM5 from docs/bulletproof-containers.md: the v1.7.38 +
v1.7.39 rollouts left every affected node on an unreachable UI (nginx 500)
with no recovery path short of SSH. This release adds a self-check
guardrail to the update flow.

What changed:
- apply_update() writes a pending-verify marker with old+new version and
  a 150s deadline immediately before scheduling the service restart.
- verify_pending_update() runs from main.rs startup. If the marker is
  present and within its freshness window, the new binary waits 15s for
  nginx + backend to settle, then probes https://127.0.0.1/ every 5s for
  up to 90s (self-signed certs accepted).
- On any probe success within the window, the marker is cleared and
  nothing else happens.
- On window-exhaust, the new binary:
    1. Moves the broken /opt/archipelago/web-ui to web-ui.failed.<ts>
       (quarantined, not deleted, so we can post-mortem).
    2. Restores web-ui.bak on top of web-ui.
    3. Calls rollback_update() to restore the previous binary.
    4. Updates state.current_version to reflect the rollback.
    5. systemctl --no-block restart archipelago so the OLD binary boots.
- Markers older than 10 minutes are treated as stale and cleared without
  probing, so a crashed-during-startup marker from weeks ago cannot
  spontaneously roll back a healthy node on a later reboot.
- rollback_update() binary copy now goes through host_sudo instead of
  tokio::fs::copy, so it escapes the service's ProtectSystem=strict
  mount namespace. Without this, the rollback silently failed with
  EROFS on /usr/local/bin and orphaned the rollback - the exact
  opposite of what auto-rollback is for.

Tests: 4 new unit tests in update::tests covering marker round-trip,
absent-marker noop, no-panic on verify_pending_update with nothing to
verify, and an invariant assert that the 90s probe window stays below
the 600s stale threshold. All passing.

Side fix: scripts/create-release-manifest.sh was dying with exit 141
(SIGPIPE from tar tvzf pipe head pipe awk) under set -euo pipefail.
Replaced with a single awk NR==1 that doesn't short-circuit the upstream
pipe, so the release-build flow is idempotent again.
2026-04-22 16:14:35 -04:00
Dorian
85417de952 release(v1.7.40-alpha): fix tarball root perms at source so OTA can't 500 again
v1.7.38 and v1.7.39 both shipped with `./` inside the frontend tarball marked
drwx------ (700). Tar extraction preserves archive perms, so every node that
pulled the OTA landed with /opt/archipelago/web-ui at 700, nginx (www-data)
returned 500 "permission denied" on every page, and the browser showed
"Internal Server Error nginx". .116 hit this on both v1.7.38 and v1.7.39
rollouts. The v1.7.39 runtime self-heal in main.rs was the wrong layer —
systemd's ReadOnlyPaths namespace made /opt/archipelago read-only from inside
the archipelago service, so chmod from there returned EROFS.

Root cause: create-release-manifest.sh used mktemp -d (700 default umask) for
staging, then tar preserved that 700 in the archive's root entry.

Fix the archive itself:
- chmod 755 staging dir + `find -type d -exec chmod 755` + `-type f chmod 644`
  before tar, so the on-disk entries are correct.
- tar --owner=0 --group=0 --mode='u=rwX,go=rX' to normalize archive perms
  belt-and-braces in case file-mode drift ever reappears.
- Post-tar verify: `tar tvzf | head -1` must show drwxr-xr-x at root, or
  the release script aborts before the manifest is even generated.

Binary unchanged semantically — the main.rs self-heal stays in as a last-
resort belt (can't hurt on nodes whose FS isn't namespace-isolated), and the
update.rs in-extractor chmod stays in so v1.7.40-onwards extractors are
double-safe. The authoritative fix is the archive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:54:44 -04:00
Dorian
b8d084368e release(v1.7.39-alpha): hotfix web-ui perms after OTA (nginx 500) + startup self-heal
v1.7.38 shipped with an OTA bug: the tar-extracted staging dir inherited 700
perms and nginx (www-data) returned 500/403 on every request after the swap.
.116 hit this on rollout; had to chmod by hand to recover.

- update.rs: after extraction, explicitly chmod 755 dirs + 644 files on the
  new staging dir before the mv into place, so nginx can stat/serve them.
- main.rs: self-heal on startup — if /opt/archipelago/web-ui is not
  world-readable, run `sudo chmod -R u=rwX,go=rX` to repair. This is what
  rescues nodes upgrading from v1.7.37/v1.7.38, since their extractor
  (running on the old binary) doesn't have the chmod fix yet — the new
  binary's first boot fixes the mess before nginx serves a single request.

Everything v1.7.38 shipped is still in this release:
- auth.rs auto-heals is_onboarding_complete() from setup_complete +
  password_hash so nodes don't bounce back to /onboarding/intro after
  browser clear / reboot / update
- useOnboarding tri-state: backend-unreachable no longer defaults to intro
- login sounds gated by isFirstInstallPhase() — silent after onboarding,
  typing sounds unaffected
- FIPS app / Nostr Relay / Nostr VPN / Routstr / Penpot removed from
  catalog + frontend + Rust + docker + icons; 15 image versions deleted
  from tx1138, .168, gitea-local
- AIUI baked into release tarball via demo/aiui/
- prebuild hook syncs app-catalog/catalog.json → public/catalog.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 13:26:54 -04:00