# Unified Task Tracker — OTA 1.8.0 + Master Plan
Single working list for everything left before 1.8.0 ships and the next master-plan
exit criteria (multinode + workstreams B/C/D) are met. Supersedes the open-task
sections of `docs/SESSION-1.8.0-OTA-PROGRESS.md` and `docs/PRODUCTION-MASTER-PLAN.md`
as the day-to-day tracker — those docs remain the historical record / detailed
narrative and are still linked from here where useful. **Ordered fastest/simplest
first** so we work top-down instead of hunting across docs.

Verified against actual code state on 2026-07-01 (not just doc text: several
items the source docs still listed as "open" turned out to already be shipped;
those are marked ✅ below with the commit that did it, so we stop re-litigating them).

---
## Tier 0 — Quick / mechanical, no blockers
- [ ] **Update `tests/lifecycle/TESTING.md`'s stale Release Gates checklist** (lines
  289-296). Several boxes are unchecked but actually true now:
  - #1 bitcoin-stops: covered by `tests/lifecycle/bats/bitcoin-knots.bats` stop/restart
    tier, included in the 5/5 green gate run.
  - #2 `ARCHY_ITERATIONS=5` on .228: **GREEN 2026-06-23 per CLAUDE.md**; check the box.
  - #5 cargo 0 warnings: confirmed 0 warnings on `cargo build --release` (2026-07-01).
  - #7 layman changelog: `CHANGELOG.md` is backfilled with layman-readable entries
    through v1.8.00-alpha; check the box.
  - Leave #3 (multinode), #4 (backend-survives-restart / Phase-3 default-on), #6
    (LoC decision), #8 (tag pushed) unchecked: these are genuinely still open, see Tier 2/3.
- [x] ~~Finish the archival/full-node manifest generalization~~ — investigated 2026-07-01:
the hardcoded fallback names in `dependencies.rs:48-52` (`electrs`, `mempool-electrs`,
`mempool-web`) are legacy **alias** ids for `electrumx`/`mempool`, resolved via
id-mapping in a dozen other places (`install.rs`, `runtime.rs`, `config.rs`, etc.),
not separate un-migrated apps with their own manifests. `electrumx` and `mempool`
themselves already declare `bitcoin:archival`. The fallback is correct as-is —
not tech debt, closing this item rather than risk breaking alias resolution.
- [x] ~~Confirm/close the Portainer image-pin item~~ — confirmed 2026-07-01:
`146.59.87.168:3000/lfg2025/portainer:2.19.4` is present in `podman images` on
all 3 LAN nodes (.116/.198/.228), i.e. actually resolvable/pulled from the mirror.
Not a live bug.
- [x] ~~grafana Quadlet "stuck activating"~~ — checked live on .116 (2026-07-01):
`grafana.service` is `active (running)`, container `Up 2 hours (healthy)`. The
2026-06-21 report is stale for grafana. **strfry still unconfirmed** — not
installed on any of .116/.198/.228 to check directly; low priority until someone
actually needs it installed.
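
The alias resolution described in the manifest-generalization item above can be pictured with a minimal sketch. The function names `canonical_app_id` and `satisfies`, and the exact alias-to-app pairing, are assumptions made for illustration: the tracker only states that `electrs`, `mempool-electrs`, and `mempool-web` are legacy aliases for `electrumx`/`mempool`, and the real mapping lives in `dependencies.rs` and the other call sites.

```rust
/// Hypothetical helper: map a legacy alias id to its canonical app id.
/// The pairing below is an assumed example, not copied from `dependencies.rs`.
fn canonical_app_id(id: &str) -> &str {
    match id {
        // Assumed: both electrs-style names resolve to the electrumx app.
        "electrs" | "mempool-electrs" => "electrumx",
        // Assumed: the web front-end alias resolves to the mempool app.
        "mempool-web" => "mempool",
        // Anything else is treated as already canonical and passes through.
        other => other,
    }
}

/// Hypothetical check: an installed app satisfies a dependency when both ids
/// resolve to the same canonical app, even if one is recorded under an alias.
fn satisfies(installed_id: &str, wanted_id: &str) -> bool {
    canonical_app_id(installed_id) == canonical_app_id(wanted_id)
}
```

Under this reading, removing the fallback names would make any record still stored under a legacy id stop matching its canonical app, which is the breakage the item above chose not to risk.
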
## Tier 1 — Medium effort, unblocked
- [x] ~~immich → Quadlet migration~~ — investigated 2026-07-01, turned out already done:
immich uses the same `install_stack_via_orchestrator` primitive as netbird/btcpay
(`immich_stack_app_ids()` in `stacks.rs:690`), and is confirmed running as real
Quadlet units live on .228 (`immich_server.container`, `immich_postgres.container`,
`immich_redis.container`, all active). Not a legacy in-cgroup app — the only
remaining piece is the fleet-wide Phase-3 default-flip, already tracked in Tier 2.
- [x] ~~Netbird reinstall adoption path~~ — investigated 2026-07-01, **not a bug, by
design.** `adopt_stack_if_exists()` (`stacks.rs:140-198`) is only used as a
fallback when the orchestrator has no manifest for the app — there's nothing to
render certs/config from in that case, so skipping rendering is correct. When
the orchestrator *does* have the manifest (the normal path), the reconcile loop
already re-renders certs even for adopted-running containers, fixed in
`4519dbf0` (`prod_orchestrator.rs:1707-1708`).
- [x] ~~TanStack Query (or equivalent) investigation~~ — spike complete 2026-07-01,
**recommendation: don't adopt / close as not needed.** Only 3 stores actually fetch
data, WebSocket push already handles hot data (server-info/package-data), no
cache-invalidation or stale-data bugs found, migration would touch 62 RPC call
sites for no concrete payoff. If boilerplate ever bothers us, extract a
`usePolling()` composable instead — much cheaper than a query-cache migration.
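
The Netbird adoption reasoning above reduces to a small decision table, sketched here under stated assumptions. `RenderAction` and `render_action` are hypothetical names, and the `NothingToDo` case is an assumption added for completeness; the actual logic lives in `adopt_stack_if_exists()` and the orchestrator's reconcile loop.

```rust
/// Hypothetical outcome of the install path for a stack's certs/config.
#[derive(Debug, PartialEq, Eq)]
enum RenderAction {
    /// Manifest available: render certs/config, even for adopted-running containers.
    Render,
    /// No manifest: there is nothing to render from, so adopt the stack as-is.
    AdoptWithoutRender,
    /// Assumed case: no manifest and nothing running, so nothing to adopt.
    NothingToDo,
}

/// Sketch of the decision: the manifest decides rendering, not whether the
/// containers happen to be running already.
fn render_action(has_manifest: bool, stack_running: bool) -> RenderAction {
    match (has_manifest, stack_running) {
        (true, _) => RenderAction::Render,
        (false, true) => RenderAction::AdoptWithoutRender,
        (false, false) => RenderAction::NothingToDo,
    }
}
```

The point of the table is that skipping rendering only ever happens in the no-manifest fallback, which is why the item was closed as by design.
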
## Tier 2 — High effort, mostly unblocked (the actual next exit criteria)
- [~] **Multinode test pass** (`docs/multinode-testing-plan.md`). Worked the
  preconditions on .198 on 2026-07-01:
  - ✅ Cleared 2 stale failed-unit records (`archy-mempool-db.service`,
    `meshtastic.service`; both `not-found`/dead for 6 and 5 days respectively,
    harmless bookkeeping) with `systemctl --user reset-failed`.
  - ✅ nginx `/app/lnd/` proxy target confirmed correct (→ `18083`, matches the
    running `archy-lnd-ui` port), so the plan's "stale proxy target" concern doesn't
    apply here.
  - ⛔ .198 disk (448GB) is below the 1TB archival threshold and was only 21%
    through IBD, so the user chose to **swap in a different node** rather than wait/add
    storage. **.116 ruled out** (no bitcoin container installed at all, just the
    UI companion). **.120 ruled out** (reserved for another developer). **.5**
    (archy-x250-beta, Tailscale `100.72.136.5`) chosen: also sub-1TB (472GB, so
    still pruned; that ceiling is shared by every non-.228 node), but **fully
    synced** (`ibd:false`, blocks==headers 956,240). Bootstrapped bats 1.11.1 +
    jq 1.7.1 onto it 2026-07-01 and **launched the 5× destructive gate
    (`ARCHY_ITERATIONS=5 ARCHY_ALLOW_DESTRUCTIVE=1`), running now**, log at
    `/tmp/gate.log` on .5, background poller watching for the `RESULTS` banner.
  - Once .5's gate reports: bring the rest of the fleet to precondition, then run the
    cross-node federation/mesh/transport suites. This is the literal
    "next exit criterion" called out in `CLAUDE.md`.
- [ ] **Phase-3 Quadlet default-flip** — code is validated + opt-in via
`ARCHIPELAGO_USE_QUADLET_BACKENDS=true` on .228/.198 already (confirmed live
2026-07-01). Flip needs: re-test on a healthy idle legacy node, then flip the
default, then multinode gate re-run.
- [ ] **Per-app test coverage for the ~30 apps with zero automated coverage**:
the framework exists (bats + reusable helpers), it just needs per-app suites written.
- [ ] **Convert remaining multi-container legacy stacks to the manifest-owned model**
(workstream A tail): netbird's legacy installer is already deleted (`89d397bb`) and
immich turned out to be already migrated (see Tier 1), so any other multi-container
stacks are what's left.
- [ ] **Developer tooling CLI suite** (validate/render/local-install/lifecycle-test) —
APP-PACKAGING-MIGRATION-PLAN.md step 5, needed before external devs can publish.
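
The node-swap reasoning in the multinode item above (bitcoin installed, not reserved, out of IBD with blocks equal to headers; pruned is acceptable) can be written down as a small predicate. This is a sketch only: the `Node` struct, its field names, and `gate_candidates` are hypothetical, and treating the 1TB archival threshold as 1000 GB is an assumption.

```rust
/// Assumed value: the tracker's 1TB archival threshold, expressed in GB.
const ARCHIVAL_THRESHOLD_GB: u64 = 1000;

/// Hypothetical snapshot of one fleet node; field names are illustrative.
struct Node {
    name: &'static str,
    has_bitcoin: bool,
    reserved: bool,
    ibd: bool,
    blocks: u64,
    headers: u64,
}

impl Node {
    /// Fully synced: out of initial block download and caught up to headers.
    fn fully_synced(&self) -> bool {
        !self.ibd && self.blocks == self.headers
    }

    /// Ready for the destructive gate: has a bitcoin container, is not
    /// reserved for someone else, and is fully synced.
    fn gate_ready(&self) -> bool {
        self.has_bitcoin && !self.reserved && self.fully_synced()
    }
}

/// Disks below the archival threshold run pruned; this does not block the gate.
fn is_pruned(disk_gb: u64) -> bool {
    disk_gb < ARCHIVAL_THRESHOLD_GB
}

/// Names of the nodes that pass every gate precondition, in input order.
fn gate_candidates(nodes: &[Node]) -> Vec<&'static str> {
    nodes.iter().filter(|n| n.gate_ready()).map(|n| n.name).collect()
}
```

With the facts reported above, only .5 would pass: .198 is still in IBD, .116 has no bitcoin container, and .120 is reserved. Disk size is deliberately kept out of `gate_ready`, since every non-.228 node is pruned anyway.
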
## Tier 3 — Blocked on a decision or resource only you can supply
- [ ] **Version naming decision (1.7.99-alpha → 1.8.0 vs 1.8.00-alpha)** — code is
otherwise ready to tag; this is a one-line decision, then a mechanical bump +
tag + push. **Needs your call**, not more engineering.
- [ ] **Workstream B signing ceremony**: `core/archipelago/src/trust/anchor.rs:21`
still has `RELEASE_ROOT_PUBKEY_HEX = None`. Needs the offline
`RELEASE_MASTER_MNEMONIC` to run `docs/workstream-b-signing-runbook.md`'s
4-step ceremony — can't be automated by me.
- [ ] **Bitcoin multi-version fleet-wide OTA**: `.228` is fully working on branch;
per your prior gating, this rollout is explicitly held for your decision on
timing (`docs/bitcoin-version-bulletproof-rollout.md`).
- [ ] **3ccc stock-Meshtastic RF validation** — needs a live send/receive test with
physical radios in your hands; code fix is in place, just unverified live.
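
To show why the unset `RELEASE_ROOT_PUBKEY_HEX` in the signing-ceremony item above blocks release verification, here is a minimal fail-closed sketch. Only the constant name and its `None` value come from the tracker; its `Option<&str>` type, the `AnchorError` enum, and the `decode_anchor`/`release_anchor` helpers are assumptions, and no key length is checked because the tracker does not state one.

```rust
/// Mirrors the state described in the tracker: no release root key pinned yet.
/// The `Option<&str>` type is an assumption about the real declaration.
const RELEASE_ROOT_PUBKEY_HEX: Option<&str> = None;

/// Hypothetical error type for the trust anchor.
#[derive(Debug, PartialEq, Eq)]
enum AnchorError {
    /// The ceremony has not been run, so there is no key to trust.
    NotConfigured,
    /// The pinned string is not well-formed hex.
    InvalidHex,
}

/// Decode a pinned hex key into bytes, failing closed when it is unset.
fn decode_anchor(hex: Option<&str>) -> Result<Vec<u8>, AnchorError> {
    let hex = hex.ok_or(AnchorError::NotConfigured)?;
    let well_formed = !hex.is_empty()
        && hex.len() % 2 == 0
        && hex.bytes().all(|b| b.is_ascii_hexdigit());
    if !well_formed {
        return Err(AnchorError::InvalidHex);
    }
    // Every byte is an ASCII hex digit, so slicing two bytes at a time is safe.
    (0..hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|_| AnchorError::InvalidHex))
        .collect()
}

/// Hypothetical accessor used by verification code paths.
fn release_anchor() -> Result<Vec<u8>, AnchorError> {
    decode_anchor(RELEASE_ROOT_PUBKEY_HEX)
}
```

Until the ceremony supplies a real key, `release_anchor()` in this sketch returns `NotConfigured` rather than silently trusting anything.
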
## Backlog — deferred, no scope decided, low priority
- [ ] **Marketplace protocol (workstream C)** — design-only (`docs/marketplace-protocol.md`),
no tooling/trust UX built. Future work, not urgent.
- [ ] **DHT distribution (workstream D)** — confirmed design-only, no code
(`docs/dht-distribution-design.md` explicitly says "Status: Design (no code yet)");
an experimental iroh provider skeleton exists behind a feature flag for future
PoC measurement, nothing fleet-facing.
- [ ] **Custom live voice-call protocol** — deprioritized 2026-07-01 per user request;
scope not yet decided. Revisit after the tiers above are worked down.
---
*Historical narrative and detailed per-session logs remain in
`docs/SESSION-1.8.0-OTA-PROGRESS.md` and `docs/PRODUCTION-MASTER-PLAN.md` §6/§8b —
this doc is the live "what's left, in priority order" list. Update it (don't just
append to the old docs) as items close or new ones surface.*