saving

2026-06-12 05:48:30 -04:00
parent 9d91ac8ebc
commit 64a5ab4b7f
37 changed files with 4530 additions and 407 deletions
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,117 +1,131 @@
 # openrun — roadmap

-Living backlog. Items are grouped by phase; within a phase, top to bottom is rough priority. Each item names its **test plan** alongside the implementation so they ship together.
-
-This isn't a contract. If something turns out to be harder, smaller, or pointless once we're in the code, edit it here rather than rationalise away.
+Living backlog. Items are grouped by phase; within a phase, top to bottom is rough priority. Edit this file as we go — items removed because they turned out to be misguided are more valuable signal than items completed.

 ---

-## Phase 0 — test scaffolding (prerequisite, one-time)
+## Done — highlights from past sessions

-The codebase has no `tests/` directory and no pytest in `pyproject.toml`. Everything below assumes this is in place first.
+The major arcs that are already landed (see git history for per-commit detail):

- **Add pytest + a minimal `tests/` layout.** Add `pytest` and `pytest-cov` to `[dependency-groups.dev]` in [pyproject.toml](pyproject.toml). Create `tests/{unit,integration,fixtures}/`. A `tests/conftest.py` should expose a shared `tmp_conn` fixture that builds an in-memory SQLite via `openrun.db.connect(":memory:")` (so the live `data/garmin.db` is never touched).
-  - **Test plan:** N/A — this *is* the test plan.
-  - **DoD:** `uv run pytest` exits 0 with one trivial passing test.
+**Project structure & test scaffolding**
+- `src/openrun/` package split out: `config` (UserProfile / HRZones / BanisterParams + TOML reader & writer), `db`, `model`, `plots`, `setup`, and `ingest/*` (auth, garmin_api, garmin_export, fit_linker, manual, time_in_zone).
+- `openrun.toml` is the single source of user-specific config — no athlete numbers anywhere in `src/openrun/`.
+- pytest + pytest-cov in `[dependency-groups.dev]`, `tests/unit/` and `tests/integration/`, `tmp_conn` fixture for in-memory SQLite. **88 tests passing** at last count.

- **Pin a tiny fixture set.** One `.fit` (~5 MB workout), one Takeout `summarizedActivities` snippet (~3 activities), one Connect-format snippet, one of each daily wellness JSON. Anonymise; commit under `tests/fixtures/`. These back every test below.
-  - **DoD:** Each fixture has a one-line `tests/fixtures/README.md` entry recording its provenance + what it exercises.
+**Ingest correctness & robustness**
+- `fit_linker.record_link` + `relink` — absolute paths stored at link time; `openrun-link-fit <new_root> --relink` rewrites the table after an export moves. `_resolve_fit_path` is now an O(1) existence check with a clear "run --relink" hint on miss.
+- `link()` accepts an injected `fit_iter` for unit testing without real FIT files; warn-and-skip on activity-id collision.
+- `handle_fit` skips Takeout-style filenames cleanly and surfaces a "run openrun-link-fit" hint in the summary.
+- Schema round-trip tests: ingest JSON → DB row → loader → DataFrame for `activities` + 7 wellness tables.
+
+**Derived metrics & helpers**
+- `banister_forecast(history, future, *, today=None)` — splices historical load with planned future load and runs the same EWMA; load-bearing splice invariant tested.
+- `plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates)` — converts weekly km plan rows into daily load with race-week-aware distribution.
+- `calibrate_tl_per_km(conn)` — empirical median/IQR of TL/km from history, replaces hard-coded constants in race-plan flows.
+- `personal_records(activities, distance_bins_km, tolerance)` — fastest run within ±tolerance of each bin.
+- `weekly_time_in_zone(conn)` — ISO-week pivot of the cached TIZ table.
+- `load_sleep_stages(conn)` — deep/light/rem/awake seconds + percentages, with NaN-aware invariant (present-stage pcts sum to 1).
+- `plot_fit_decoupling(records, *, segments)` — new `openrun.plots` submodule (lazy matplotlib import).
+
+**Per-second data from the live API (Path B)**
+- `openrun-sync` downloads each new activity's original FIT via `/download-service/files/activity/{id}`, stores it in `data/fit/<id>.fit`, and links it through `fit_linker.record_link` — so decoupling, FIT-based TIZ, and the route map work without a website export.
+- `_extract_fit_bytes` handles both the zip-wrapped and bare-FIT responses; `download_fit` is idempotent (skips when on-disk + linked); `backfill_fits` pulls per-second history for past activities (`--fit-backfill [--fit-type] [--fit-limit]`).
+- Tested at the `garth.download` boundary (`tests/unit/test_fit_download.py`); network call mocked per ROADMAP conventions.
+
+**Off-watch volume integration**
+- New `manual_activities` table + `openrun-import-manual <csv>` CLI + web form on the Manual log page.
+- `daily_training_load_series(..., include_manual=True)` unions both sources via UNION ALL; same-day rows sum.
+
+**Web app (Streamlit)**
+- `openrun-web` launches a multipage browser UI on `localhost:8501`.
+- Pages: Home, Dashboard, Activities (with row-click drill-in to Activity detail), Race plan (editable + projected PMC), Manual log, Recovery, Efficiency, Sync (in-browser zip ingest), Welcome wizard.
+- `st.navigation`-driven sidebar: pages grouped into sections; **Welcome only appears when `openrun.toml` is missing or the DB is empty**.
+- First-run wizard writes `openrun.toml`, runs `init_workspace`, and offers to ingest a zip in the same flow — no terminal needed.

 ---

-## Phase 1 — code gaps (existing functionality)
-
-### 1.1  Takeout export doesn't ingest splits — RETIRED (premise wrong for current account)
-~~[ingest/garmin_export.py](src/openrun/ingest/garmin_export.py) only handles `summarizedActivities`. Lap detail in Takeout lives in `<activity_id>_<name>.json` under `DI-Connect-Fitness/`~~
-
-**Finding (2026-05-18):** the actual Takeout dump for this account contains zero per-activity lap-detail JSONs under `DI-Connect-Fitness/` — only `summarizedActivities`, `personalRecord`, `trainingPlan`, `workout`, `gear`. Lap data is only retrievable from the FIT files via `link_fit_files.py` (Path B sync also pulls it directly from `/activity-service/activity/{aid}/splits`). If a future Takeout shape *does* include `<activity_id>.json` lap files, reopen this item; until then the assumption that drove it doesn't hold.
+## Phase 1 — code gaps still open

 ### 1.2  Unhandled Takeout JSON categories
-[project_takeout_export_quirks.md](../.claude/projects/-Users-noise-Documents-obsidian-RunningLog-garmin/memory/project_takeout_export_quirks.md) lists files we currently `unrecognized`: `HydrationLog`, `TrainingReadinessDTO`, `EnduranceScore`, `HillScore`, `RunRacePredictions`. Each is a new SQLite table + dispatch entry. Skip `HydrationLog` (low value), prioritise `TrainingReadinessDTO` (Garmin's own readiness score — useful for comparing against derived TSB) and `RunRacePredictions` (a free baseline for the projected-PMC plan).
- **Test plan:** per category, `test_garmin_export.py::test_handle_<category>` fixture-driven insert; one schema-roundtrip test (`SELECT *` returns the same values).
+The Takeout dump includes several JSON categories we currently mark `unrecognized`: `TrainingReadinessDTO`, `EnduranceScore`, `HillScore`, `RunRacePredictions` (skip `HydrationLog` — low value).

-### 1.3  fit_linker can't disambiguate near-simultaneous activities — DONE
-Implemented warn-and-skip on collision: `link()` tracks `linked_aids: dict[int, Path]`; the second FIT to match an already-linked activity is logged on stderr and skipped (count surfaced in summary). Pure `_match_activity` helper extracted for unit-testable tolerance/closest-pick behaviour, and `link()` accepts a `fit_iter=` injection point so tests avoid the FIT-parsing path. See [tests/unit/test_fit_linker.py](tests/unit/test_fit_linker.py).
+- **Priority:** `TrainingReadinessDTO` first (Garmin's own readiness score — useful as a sanity check against derived TSB) and `RunRacePredictions` (a free baseline for the projected-PMC plan).
+- Each needs a new SQLite table + dispatch entry + a fixture JSON pulled from a real Takeout dump.
+- **Test plan:** per category, `test_garmin_export.py::test_handle_<category>` does fixture-driven insert; one schema-roundtrip test per table.

-### 1.4  `handle_fit` silently drops FITs with no parseable activity-id chunk — DONE
-Main loop now counts FITs whose stem has no 8+-digit chunk and prints a one-line `n FITs skipped … run openrun-link-fit <root> for Takeout-style exports` hint. The activity-id extraction is now `_activity_id_from_filename`, with five unit tests covering classic/Takeout/year-prefix/empty cases, plus a subprocess end-to-end test of `main()`.
-
-### 1.5  `_resolve_fit_path` is fragile (decided: absolute paths) — DONE
-Switched to absolute paths everywhere. `fit_linker.record_link` is a small shared helper used by both ingest paths and stores `str(p.resolve())`. `relink(conn, new_root)` walks a moved export by basename and rewrites the table; CLI flag is `openrun-link-fit <new_root> --relink`. `_resolve_fit_path` collapsed to a one-line existence check that raises with the relink hint on miss. The dead `fit_search_paths` config field (and matching `[ingest]` block in `openrun.toml`) was removed. Live DB migrated (349 paths rewritten, 0 unmatched).
-
-### 1.6  Schema round-trip tests — DONE (wellness + activities; splits/fit_files/tiz open)
-[tests/integration/test_loaders.py](tests/integration/test_loaders.py) covers 8 ingest-handler → DB → loader round-trips:
- `activities` (Takeout scaled-int units → `load_activities` derived columns)
- 7 wellness tables (steps / sleep / stress / hrv / resting_hr / intensity_minutes / body_battery), each through `load_wellness`. Sleep additionally rides through `load_sleep_stages` to lock the deep+light+rem=1.0 invariant.
-**Caught a real bug in the test fixture:** my first draft used `averageSpeed: 27.78` thinking that was the SI value. Inspecting live Takeout data showed Garmin actually stores it as `m/s × 0.1` (e.g. a 3.247 m/s sprint stored as 0.3247), so the `× 10` conversion in the handler is correct — the README table can stay; the fixture was wrong.
-**Still open:** `activity_splits`, `activity_fit_files`, `activity_time_in_zone` — these aren't reached through a single Takeout JSON handler (splits are sync-only, FIT files are linker-driven, TIZ is precomputed), so each needs a different fixture/path. Pulled forward to a follow-up; the existing helpers' tests (test_fit_linker.py, test_weekly_tiz.py) cover most of what these round-trips would.
+### 1.6  Schema round-trip leftovers
+`activity_splits`, `activity_fit_files`, and `activity_time_in_zone` aren't reached through a single Takeout JSON handler (splits are sync-only, FITs are linker-driven, TIZ is precomputed). Each needs a different fixture/path. The existing helper-level tests in `test_fit_linker.py` and `test_weekly_tiz.py` cover most of what these would, but a true round-trip test would close the gap.

 ---

-## Phase 2 — analytical features hinted at in the README
+## Phase 2 — analytical features still open

 ### 2.1  DBSCAN route clustering at scale
-[README §3](README.md) explicitly: *"Adequate for a few hundred starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` for thousands."* Add `cluster_routes(..., method='dbscan')` alternate path. Don't replace the greedy version — it's a good correctness baseline.
- **Test plan:**
-  - `test_geo.py::test_haversine_known_pairs` — two cities, compare to a published value within 0.1 km.
-  - `test_geo.py::test_cluster_routes_greedy_vs_dbscan_agree` — on a synthetic 200-point dataset (3 well-separated clusters + noise), both methods produce the same cluster assignment up to label permutation.
+The greedy haversine clusterer is fine for hundreds of starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` once a dataset gets to thousands. Add `cluster_routes(..., method='dbscan')` as an alternate path; keep greedy as the baseline.

-### 2.2  Off-watch volume integration
-[feedback_db_not_full_picture.md](../.claude/projects/-Users-noise-Documents-obsidian-RunningLog-garmin/memory/feedback_db_not_full_picture.md): recorded activity ≠ full training. Add a `manual_activities` table + a CSV/TOML loader so the user can log strength, hikes, unrecorded runs without faking Garmin uploads. `daily_training_load_series` learns an optional `include_manual=True`. Existing analyses pick this up via the same Banister pipeline.
- **Test plan:**
-  - `test_manual_activities.py::test_load_from_csv` — schema check.
-  - `test_loaders.py::test_daily_training_load_series_with_manual` — assert `include_manual=True` increases the total when synthetic manual rows are present, and that the Banister output is monotonically affected.
-
-### 2.3  PR detection by distance bucket — DONE
-`personal_records(activities, distance_bins_km=..., tolerance=0.05)` in [model.py](src/openrun/model.py); tested in [test_race_plan.py](tests/unit/test_race_plan.py) — fastest-in-band selection, tolerance enforcement, NaN-pace masking, empty input. On live data: PR table comes out 5K @ 4:49, 10K @ 5:38, half @ 6:18, 50 @ 8:04 (no marathon bin populated, correctly skipped).
-
-### 2.4  Race-plan TL/km calibration from history — DONE
-`calibrate_tl_per_km(conn, *, activity_types=..., min_distance_km=2.0, lookback_days=365)` returns `{median, q1, q3, n}`. Tests cover the median/IQR math, the short-distance and other-activity-type exclusions, and the empty-DB → NaN path. On live data the median came in at 11.4 with IQR [9.2, 16.4] — confirms the README's "training runs are ~11" claim and shows the spread is wide enough that hardcoded constants are a real weakness.
-
-### 2.5  Banister projection helper — DONE
-`banister_forecast(history, future, *, today=None)` lives in [model.py](src/openrun/model.py); the notebook's splice (`hist[hist.index <= TODAY]` ⨁ `forecast[forecast.index > TODAY]`) is now in one place. Tests in [test_banister.py](tests/unit/test_banister.py):
- `test_banister_forecast_matches_full_history` — splice invariant: PMC frame equals `banister(full)` when full is split (history, future).
- Plus a closed-form impulse-response check on the EWMA itself: with load=L on day 0 and zeros after, `CTL[i] = L·(1-decay)·decay^i` (the ROADMAP's original "CTL at day 42 ≈ L·(1-1/e)" was wrong — that's the step response after τ days, not impulse — corrected in the test).
- `test_banister_steady_state_approaches_input` covers the step response (constant input L converges to L; at i=τ it's L·(1-1/e)).
- Boundary-day overlap, default-today, and tail-extension tests round it out.
-
-### 2.6  Sleep-stage breakdown surfacing — DONE
-`load_sleep_stages(conn)` in [model.py](src/openrun/model.py) returns deep/light/rem/awake (seconds + percentages) keyed by date. Percentages of present stages sum to 1.0 by construction; awake_pct uses total-time-in-bed (not asleep-only). Tests in [test_sleep_stages.py](tests/unit/test_sleep_stages.py) cover the invariant + the zero-sleep / NaN-stage edge cases (the latter exposed by real 2026-05-17 row where `rem_s` is NULL).
-
-### 2.7  Per-second decoupling: drop-in plotting helper — DONE
-New `openrun.plots` submodule (matplotlib imported lazily inside the function so the rest of `openrun` stays import-light). `plot_fit_decoupling(records, *, segments=2, ax=None)` draws a per-segment bar chart with Friel's 5% / 10% reference lines. Tests in [test_plots.py](tests/unit/test_plots.py) cover bar count, caller-supplied Axes, and the "no usable records" sentinel path.
-
-### 2.8  Time-in-zone weekly summary — DONE
-`weekly_time_in_zone(conn, *, start=None, end=None, activity_types=...)` in [model.py](src/openrun/model.py) pivots the cached `activity_time_in_zone` table by ISO week (Monday-anchored). Tests in [test_weekly_tiz.py](tests/unit/test_weekly_tiz.py): within-week summation, week splitting, activity-type filter, date window, empty-frame shape. On live data, recent weeks show a Z3-heavy profile that matches the README's "junk-miles middle" warning.
+- **Adds dependency:** `scikit-learn`. Defer until there's a concrete dataset that needs it.
+- **Test plan:** `test_geo.py::test_haversine_known_pairs` (two cities vs published value within 0.1 km) and `test_cluster_routes_greedy_vs_dbscan_agree` on a synthetic 200-point dataset.

 ---

-## Phase 3 — aspirational (not on the critical path)
+## Phase 3 — web app polish (post-MVP)

-These are README-shaped or memory-shaped *maybes*. Don't take them on without a concrete reason; listed so they stop showing up as recurring "we should…" thoughts.
+The web app covers the full workflow but a few rough edges are worth picking off when there's time.

- **Heat & humidity adjustment of decoupling.** FIT messages carry `temperature` and sometimes `developer_data` from external sensors. We don't capture it. A weather-adjusted Pa:Hr would explain summer-vs-winter decoupling gaps. Real work: capture temp in `load_fit_records`, add a `decoupling_adjusted(...)` that conditions on it. Test plan: regression with vs without adjustment on a hot-run fixture should differ only when `temperature` is present.
- **Multi-athlete support.** The config already accepts a per-user profile. Promoting it to `openrun.toml` having multiple `[athletes.<name>]` blocks + a `--athlete` CLI flag is straightforward but premature for a single-user dataset. Test plan: profile-resolution unit tests would multiply by N.
- **Power & vertical-oscillation regression.** Lap raw has `averagePower`, `verticalOscillation`, `verticalRatio`. No analysis surfaces them. Would slot into [04_efficiency.ipynb](examples/notebooks/04_efficiency.ipynb). Worth a notebook section before becoming a `model.py` helper.
- **Race-day projected vs actual diff tracker.** After a race, compare the [06_race_plan.ipynb](examples/notebooks/06_race_plan.ipynb) projection to what actually happened (CTL on race day, predicted vs realised pace from `RunRacePredictions`). One-shot retrospective; doesn't need to be reusable.
+### 3.1  Browser-driven Garmin live sync — **done**
+The `garth` library logs in to Garmin with a screen-scraped email/password/MFA flow, not real OAuth. The Sync page now drives the full multi-step flow in the browser: email/password → (if required) MFA code → token store in `.secrets/` → **🔄 Sync now** (activities + per-second FIT + wellness), with a log-out that forgets tokens.
+
+- `openrun.ingest.auth` gained web-friendly, `input()`-free helpers: `has_tokens`, `resume`, `current_user`, `begin_login` (returns `("ok", user)` or `("needs_mfa", state)`), `complete_mfa`. The MFA `client_state` is held in `st.session_state` between the password and code steps.
+- `openrun.ingest.garmin_api.run_sync` is the auth-free orchestrator shared by the CLI `main()` and the Sync page; it streams step progress via a `progress` callback.
+- **Done in tests:** `test_auth.py` (token-store helpers + no-MFA and MFA-required login paths, mocked at `garth.sso`), `test_run_sync.py` (orchestration contract).
+- **Possible follow-up:** a Streamlit `AppTest` smoke test for the page (the repo has none yet); auto-run `openrun-time-in-zone` after a sync that pulled new FITs so the TIZ cache never lags.
+
+### 3.2  Route map on Activity detail
+Each activity with a linked FIT has per-second `position_lat` / `position_long` values (in semicircles). Render the route on a map (`pydeck` or `folium`) and highlight HR zone or pace via segment colouring. Pure visualisation; the data is already there.
+
+- **Adds dependency:** `pydeck` or `folium`.
+
+### 3.3  Activity-detail polish
+- Heart rate over time (5-min rolling), with zone bands shaded.
+- Splits sortable.
+- "Compare against another activity" — overlay this run's pace/HR curve on a prior run of similar distance.
+
+### 3.4  Race-plan UX
+- Inline race-week badges in the data-editor table.
+- "Suggest a plan" button that auto-generates a ramp given the race calendar + current CTL.
+- Save vs Reset — currently the editor blows away the plan table on every save.
+
+### 3.5  Multi-athlete support
+The config already accepts a per-user profile. Promoting `openrun.toml` to allow multiple `[athletes.<name>]` blocks + a profile selector in the sidebar is straightforward but premature for the single-user case the README is written for.
+
+---
+
+## Phase 4 — distribution
+
+When the tool is solid enough to share with non-Python folks:
+
+- **One-command installer.** A `make install` / `./run.sh` that handles `uv sync` + first launch. The only terminal step left after the wizard.
+- **Electron/Tauri desktop wrap.** Bundle a Python runtime + the wheel + a thin native shell that opens the Streamlit server in a webview. Mechanical; the HTTP server design is already wrap-ready.
+- **Public release.** PyPI, GitHub releases, screenshots in README. Pick an actual name (`openrun` is provisional — squat-check before shipping).

 ---

 ## Test conventions

 - **Layout:**
-  - `tests/unit/` — pure-function metrics ([model.py](src/openrun/model.py) public surface). One file per concept: `test_banister.py`, `test_decoupling.py`, `test_geo.py`, `test_zones.py`.
-  - `tests/integration/` — ingest pipelines + loaders. Use the in-memory SQLite fixture.
-  - `tests/fixtures/` — pinned JSON / FIT / CSV samples. Anonymised.
- **Granularity:** prefer one assertion per test (within reason). Friel's `< 5%` / `5–10%` thresholds make for natural assertions; the round-trip schema tests should be exact-equality on inserted rows.
+  - `tests/unit/` — pure-function metrics (model.py public surface). One file per concept: `test_banister.py`, `test_race_plan.py`, `test_sleep_stages.py`, etc.
+  - `tests/integration/` — ingest pipelines, loaders, cross-cutting flows.
+  - `tests/fixtures/` — pinned JSON / FIT / CSV samples. Anonymised, small.
+- **Granularity:** one assertion per test when reasonable; closed-form math checks for derivations (impulse response, step response).
 - **What we don't test:**
-  - `garth.connectapi` — network-dependent, fragile, and garth is upstream-deprecated. Mock at the `_safe_call` boundary if we ever need to.
-  - Notebook content. The build scripts ([examples/notebooks/_build_05.py](examples/notebooks/_build_05.py) etc.) are easier to read than the generated `.ipynb`; we test what they import from, not the notebook itself.
-  - Plotting beyond smoke tests (artist counts, axis labels). Visual regression isn't worth the maintenance.
- **Run:** `uv run pytest` (no markers needed at this scale). Add `pytest-cov` for an off-the-shelf coverage report; ignore the percentage as a target until we have meaningful coverage of the pure-function surface.
+  - `garth.connectapi` — network-dependent, fragile, upstream-deprecated. Mock at `_safe_call` if we ever need to.
+  - Notebook content. The build scripts are easier to read than the generated JSON; we test what they import.
+  - Streamlit visual output. AppTest framework smoke-tests page rendering (zero exceptions on a real DB); we don't visual-regress.
+- **Run:** `uv run pytest`. `pytest-cov` is in dev deps; coverage isn't a target.

 ---

 ## Working agreement

-When picking up an item: write the failing test first against the API the test plan describes, then make it pass. If the test plan turns out to be wrong (the function shouldn't behave that way after all), update this file in the same PR. Items removed because they turned out to be misguided are more valuable signal than items completed.
+When picking up an item: write the failing test first against the API the test plan describes, then make it pass. If the test plan turns out to be wrong (the function shouldn't behave that way after all), update this file in the same PR. **Items removed because they turned out to be misguided are more valuable signal than items completed.**
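
A minimal Python sketch of the closed-form checks named under Test conventions (impulse response, step response) and of the splice invariant listed for `banister_forecast`. The helper names `ewma` and `forecast`, the 42-day time constant, and the `exp(-1/tau)` decay convention are assumptions for illustration; this is not openrun's actual implementation.

```python
import math


def ewma(load, tau=42.0):
    """Exponentially weighted average of a daily load list (hypothetical helper)."""
    decay = math.exp(-1.0 / tau)  # assumed decay convention
    out, prev = [], 0.0
    for x in load:
        prev = decay * prev + (1.0 - decay) * x
        out.append(prev)
    return out


def forecast(history, future, tau=42.0):
    """Splice past and planned load, then run the same EWMA over the whole series."""
    return ewma(list(history) + list(future), tau)


TAU, L = 42.0, 100.0
decay = math.exp(-1.0 / TAU)

# Impulse: load L on day 0, zeros after. Closed form: L * (1 - decay) * decay**i.
impulse = ewma([L] + [0.0] * 59, TAU)

# Step: constant load L. Closed form: L * (1 - decay**(i + 1)), converging to L.
step = ewma([L] * 60, TAU)

# Splice invariant: forecasting over (history, future) equals one run over the full series.
full = [float(i % 7) for i in range(60)]
spliced = forecast(full[:30], full[30:], TAU)
```

Under these assumptions the step response reaches `L * (1 - 1/e)` after 42 samples (zero-based index 41), so a test can pin both the impulse and the step values exactly rather than eyeballing a curve.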