openrun/ROADMAP.md
2026-05-19 08:34:22 -04:00


openrun — roadmap

Living backlog. Items are grouped by phase; within a phase, top to bottom is rough priority. Each item names its test plan alongside the implementation so they ship together.

This isn't a contract. If something turns out to be harder, smaller, or pointless once we're in the code, edit it here rather than rationalise it away.


Phase 0 — test scaffolding (prerequisite, one-time)

The codebase has no tests/ directory and no pytest in pyproject.toml. Everything below assumes this is in place first.

  • Add pytest + a minimal tests/ layout. Add pytest and pytest-cov to [dependency-groups.dev] in pyproject.toml. Create tests/{unit,integration,fixtures}/. A tests/conftest.py should expose a shared tmp_conn fixture that builds an in-memory SQLite via openrun.db.connect(":memory:") (so the live data/garmin.db is never touched).

    • Test plan: N/A — this is the test plan.
    • DoD: uv run pytest exits 0 with one trivial passing test.
  • Pin a tiny fixture set. One .fit (~5 MB workout), one Takeout summarizedActivities snippet (~3 activities), one Connect-format snippet, one of each daily wellness JSON. Anonymise; commit under tests/fixtures/. These back every test below.

    • DoD: Each fixture has a one-line tests/fixtures/README.md entry recording its provenance + what it exercises.
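A minimal sketch of what the shared tmp_conn fixture could look like. The connect function here is a stand-in built on sqlite3, on the assumption that openrun.db.connect returns a plain sqlite3 connection; in tests/conftest.py the generator would be decorated with @pytest.fixture and would call the real openrun.db.connect instead.

```python
import sqlite3
from collections.abc import Iterator


def connect(path: str) -> sqlite3.Connection:
    """Stand-in for openrun.db.connect (assumed to return a sqlite3 connection)."""
    return sqlite3.connect(path)


def tmp_conn() -> Iterator[sqlite3.Connection]:
    """Yield a throwaway in-memory database, then close it.

    In tests/conftest.py this generator would carry @pytest.fixture.
    """
    conn = connect(":memory:")
    try:
        yield conn
    finally:
        # Closing discards the in-memory database, so nothing leaks between tests
        # and the live data/garmin.db is never opened.
        conn.close()
```

Because the path is ":memory:", each test gets a fresh, empty database.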

Phase 1 — code gaps (existing functionality)

1.1 Takeout export doesn't ingest splits — RETIRED (premise wrong for current account)

ingest/garmin_export.py only handles summarizedActivities. Lap detail in Takeout lives in <activity_id>_<name>.json under DI-Connect-Fitness/.

Finding (2026-05-18): the actual Takeout dump for this account contains zero per-activity lap-detail JSONs under DI-Connect-Fitness/ — only summarizedActivities, personalRecord, trainingPlan, workout, gear. Lap data is only retrievable from the FIT files via link_fit_files.py (Path B sync also pulls it directly from /activity-service/activity/{aid}/splits). If a future Takeout shape does include <activity_id>.json lap files, reopen this item; until then the assumption that drove it doesn't hold.

1.2 Unhandled Takeout JSON categories

project_takeout_export_quirks.md lists files we currently don't recognise: HydrationLog, TrainingReadinessDTO, EnduranceScore, HillScore, RunRacePredictions. Each is a new SQLite table + dispatch entry. Skip HydrationLog (low value), prioritise TrainingReadinessDTO (Garmin's own readiness score, useful for comparing against derived TSB) and RunRacePredictions (a free baseline for the projected-PMC plan).

  • Test plan: per category, test_garmin_export.py::test_handle_<category> fixture-driven insert; one schema-roundtrip test (SELECT * returns the same values).
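The "new table + dispatch entry" shape could be sketched roughly as follows. Everything named here is hypothetical: the training_readiness table, its columns, the calendarDate and score JSON keys, and the DISPATCH mapping are illustrative assumptions, not the actual Takeout schema or the existing garmin_export.py layout.

```python
import sqlite3
from typing import Callable, Optional

Handler = Callable[[sqlite3.Connection, list], int]


def handle_training_readiness(conn: sqlite3.Connection, rows: list) -> int:
    """Insert readiness rows; the JSON key names are assumed, not confirmed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS training_readiness ("
        "calendar_date TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO training_readiness VALUES (:calendarDate, :score)",
        rows,
    )
    return len(rows)


# One entry per recognised category; unlisted categories are reported, not guessed at.
DISPATCH: dict = {"TrainingReadinessDTO": handle_training_readiness}


def dispatch(conn: sqlite3.Connection, category: str, rows: list) -> Optional[int]:
    """Route rows to their handler; return None for an unrecognised category."""
    handler = DISPATCH.get(category)
    if handler is None:
        return None
    return handler(conn, rows)
```

The schema round-trip test then reduces to inserting a fixture row through dispatch and asserting that SELECT * returns the same values.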

1.3 fit_linker can't disambiguate near-simultaneous activities — DONE

Implemented warn-and-skip on collision: link() tracks linked_aids: dict[int, Path]; the second FIT to match an already-linked activity is logged on stderr and skipped (count surfaced in summary). Pure _match_activity helper extracted for unit-testable tolerance/closest-pick behaviour, and link() accepts a fit_iter= injection point so tests avoid the FIT-parsing path. See tests/unit/test_fit_linker.py.
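For illustration, the closest-pick and warn-and-skip behaviour might look something like this. It is a simplified sketch, not the code in fit_linker: the function signatures, the 60-second tolerance, and the (path, start_time) shape of fit_iter are all assumptions.

```python
from __future__ import annotations

import sys
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable


def match_activity(
    fit_start: datetime,
    activities: dict[int, datetime],
    tolerance: timedelta = timedelta(seconds=60),  # assumed default
) -> int | None:
    """Return the id of the activity starting closest to fit_start, within tolerance."""
    best_id, best_gap = None, None
    for aid, start in activities.items():
        gap = abs(start - fit_start)
        if gap <= tolerance and (best_gap is None or gap < best_gap):
            best_id, best_gap = aid, gap
    return best_id


def link(
    fit_iter: Iterable[tuple[Path, datetime]],
    activities: dict[int, datetime],
) -> tuple[dict[int, Path], int]:
    """Link each FIT to at most one activity; warn and skip on collision."""
    linked_aids: dict[int, Path] = {}
    skipped = 0
    for path, fit_start in fit_iter:
        aid = match_activity(fit_start, activities)
        if aid is None:
            continue  # no activity close enough in time
        if aid in linked_aids:
            print(
                f"warning: {path} also matches activity {aid} "
                f"(already linked to {linked_aids[aid]}); skipping",
                file=sys.stderr,
            )
            skipped += 1
            continue
        linked_aids[aid] = path
    return linked_aids, skipped
```

Passing fit_iter in as plain tuples is what lets a unit test exercise the collision path without parsing any FIT file.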

1.4 handle_fit silently drops FITs with no parseable activity-id chunk — DONE

Main loop now counts FITs whose stem has no 8+-digit chunk and prints a one-line n FITs skipped … run openrun-link-fit <root> for Takeout-style exports hint. The activity-id extraction is now _activity_id_from_filename, with five unit tests covering classic/Takeout/year-prefix/empty cases, plus a subprocess end-to-end test of main().
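A rough sketch of the "8+-digit chunk" rule, for orientation only. The real _activity_id_from_filename may differ; in particular, picking the longest chunk when several qualify is an assumption, and the filenames in the usage note are made up.

```python
import re
from pathlib import Path
from typing import Optional

# A run of eight or more digits; shorter runs (e.g. a four-digit year) never match.
_ID_CHUNK = re.compile(r"\d{8,}")


def activity_id_from_filename(name: str) -> Optional[int]:
    """Extract the activity id from a FIT filename, or None if there is no candidate."""
    chunks = _ID_CHUNK.findall(Path(name).stem)
    if not chunks:
        return None
    # Assumption: when several chunks qualify, the longest one is the id.
    return int(max(chunks, key=len))
```

With this rule a name such as 2024_12345678901.fit yields 12345678901, while morning_run.fit yields None and would be counted toward the "n FITs skipped" hint.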

1.5 _resolve_fit_path is fragile (decided: absolute paths) — DONE

Switched to absolute paths everywhere. fit_linker.record_link is a small shared helper used by both ingest paths and stores str(p.resolve()). relink(conn, new_root) walks a moved export by basename and rewrites the table; CLI flag is openrun-link-fit <new_root> --relink. _resolve_fit_path collapsed to a one-line existence check that raises with the relink hint on miss. The dead fit_search_paths config field (and matching [ingest] block in openrun.toml) was removed. Live DB migrated (349 paths rewritten, 0 unmatched).
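The basename-driven relink can be pictured with a sketch like the one below. The activity_fit_files table name appears elsewhere in this roadmap, but the activity_id and fit_path column names and the return shape are assumptions made for the example.

```python
import sqlite3
from pathlib import Path


def relink(conn: sqlite3.Connection, new_root: Path) -> tuple:
    """Rewrite stored FIT paths by matching basenames under new_root.

    Returns (rewritten, unmatched). Column names are assumed, not confirmed.
    """
    # Index the moved export once; the glob is case-sensitive on most platforms.
    by_name = {p.name: p.resolve() for p in new_root.rglob("*.fit")}
    rewritten = unmatched = 0
    rows = conn.execute("SELECT activity_id, fit_path FROM activity_fit_files").fetchall()
    for aid, old_path in rows:
        new_path = by_name.get(Path(old_path).name)
        if new_path is None:
            unmatched += 1
            continue
        conn.execute(
            "UPDATE activity_fit_files SET fit_path = ? WHERE activity_id = ?",
            (str(new_path), aid),
        )
        rewritten += 1
    conn.commit()
    return rewritten, unmatched
```

Storing str(p.resolve()) on both the link and relink paths keeps every row absolute, which is what allows _resolve_fit_path to shrink to an existence check.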

1.6 Schema round-trip tests — DONE (wellness + activities; splits/fit_files/tiz open)

tests/integration/test_loaders.py covers 8 ingest-handler → DB → loader round-trips:

  • activities (Takeout scaled-int units → load_activities derived columns)
  • 7 wellness tables (steps / sleep / stress / hrv / resting_hr / intensity_minutes / body_battery), each through load_wellness. Sleep additionally rides through load_sleep_stages to lock the deep+light+rem=1.0 invariant.

Caught a real bug in the test fixture: my first draft used averageSpeed: 27.78, thinking that was the SI value. Inspecting live Takeout data showed Garmin actually stores it as m/s × 0.1 (e.g. a 3.247 m/s sprint stored as 0.3247), so the × 10 conversion in the handler is correct. The README table can stay; the fixture was wrong.

Still open: activity_splits, activity_fit_files, activity_time_in_zone. These aren't reached through a single Takeout JSON handler (splits are sync-only, FIT files are linker-driven, TIZ is precomputed), so each needs a different fixture/path. Deferred to a follow-up; the existing helpers' tests (test_fit_linker.py, test_weekly_tiz.py) cover most of what these round-trips would.

Phase 2 — analytical features hinted at in the README

2.1 DBSCAN route clustering at scale

README §3 says explicitly: "Adequate for a few hundred starts; switch to sklearn.cluster.DBSCAN(metric='haversine') for thousands." Add a cluster_routes(..., method='dbscan') alternate path. Don't replace the greedy version; it's a good correctness baseline.

  • Test plan:
    • test_geo.py::test_haversine_known_pairs — two cities, compare to a published value within 0.1 km.
    • test_geo.py::test_cluster_routes_greedy_vs_dbscan_agree — on a synthetic 200-point dataset (3 well-separated clusters + noise), both methods produce the same cluster assignment up to label permutation.
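Both tests lean on two small pieces that can be sketched independently of the clustering code: a haversine distance and a check that two labelings agree up to label permutation. The function names and the Earth-radius constant below are assumptions for the example, and noise handling (sklearn's DBSCAN labels noise as -1) is left out.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the project's constant may differ


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_partition(labels_a: list, labels_b: list) -> bool:
    """True if the two labelings group points identically, ignoring label names."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b: dict = {}
    b_to_a: dict = {}
    for a, b in zip(labels_a, labels_b):
        # Each label on one side must map to exactly one label on the other.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True
```

A quarter of the equator is a handy sanity check for the distance function because its length is exactly the radius times π/2. Note that sklearn's haversine metric expects coordinates in radians, so the DBSCAN path would need a conversion the greedy path may not.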

2.2 Off-watch volume integration

feedback_db_not_full_picture.md: recorded activity ≠ full training. Add a manual_activities table + a CSV/TOML loader so the user can log strength, hikes, unrecorded runs without faking Garmin uploads. daily_training_load_series learns an optional include_manual=True. Existing analyses pick this up via the same Banister pipeline.

  • Test plan:
    • test_manual_activities.py::test_load_from_csv — schema check.
    • test_loaders.py::test_daily_training_load_series_with_manual — assert include_manual=True increases the total when synthetic manual rows are present, and that the Banister output is monotonically affected.
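One possible shape for the CSV loader and the include_manual switch is sketched here. The CSV columns (date, activity_type, duration_min, training_load), the table layout, and the daily_load helper are hypothetical; the real daily_training_load_series works on the project's own schema.

```python
import csv
import sqlite3
from typing import TextIO

REQUIRED = ("date", "activity_type", "duration_min", "training_load")  # assumed columns


def load_manual_csv(conn: sqlite3.Connection, fh: TextIO) -> int:
    """Load manually logged sessions from CSV; returns the number of rows inserted."""
    reader = csv.DictReader(fh)
    missing = set(REQUIRED) - set(reader.fieldnames or ())
    if missing:
        raise ValueError(f"manual activities CSV is missing columns: {sorted(missing)}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS manual_activities ("
        "date TEXT NOT NULL, activity_type TEXT NOT NULL, "
        "duration_min REAL NOT NULL, training_load REAL NOT NULL)"
    )
    rows = [
        (r["date"], r["activity_type"], float(r["duration_min"]), float(r["training_load"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO manual_activities VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def daily_load(conn: sqlite3.Connection, recorded: dict, include_manual: bool = False) -> dict:
    """Per-day load totals, optionally adding manual rows on top of recorded ones."""
    totals = dict(recorded)
    if include_manual:
        query = "SELECT date, SUM(training_load) FROM manual_activities GROUP BY date"
        for day, load in conn.execute(query):
            totals[day] = totals.get(day, 0.0) + load
    return totals
```

Keeping manual rows in their own table means the recorded-only series stays reproducible and the flag is purely additive.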

2.3 PR detection by distance bucket — DONE

personal_records(activities, distance_bins_km=..., tolerance=0.05) in model.py; tested in test_race_plan.py — fastest-in-band selection, tolerance enforcement, NaN-pace masking, empty input. On live data: PR table comes out 5K @ 4:49, 10K @ 5:38, half @ 6:18, 50 @ 8:04 (no marathon bin populated, correctly skipped).
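The fastest-in-band selection can be illustrated with plain dictionaries. This is a sketch under assumptions: the real personal_records takes an activities frame, and reading tolerance as a fractional band (±5% of the target distance) is an interpretation, as are the distance_km and pace_min_per_km key names.

```python
import math


def personal_records(activities: list, distance_bins_km: dict, tolerance: float = 0.05) -> dict:
    """Pick the fastest activity within ±tolerance of each target distance."""
    records = {}
    for label, target in distance_bins_km.items():
        best = None
        for act in activities:
            pace = act["pace_min_per_km"]
            if pace is None or math.isnan(pace):
                continue  # NaN-pace masking
            if abs(act["distance_km"] - target) > tolerance * target:
                continue  # outside the distance band
            if best is None or pace < best["pace_min_per_km"]:
                best = act
        if best is not None:
            records[label] = best  # bins with no qualifying activity are skipped
    return records
```

Unpopulated bins simply do not appear in the result, which matches the "no marathon bin populated, correctly skipped" behaviour noted above.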

2.4 Race-plan TL/km calibration from history — DONE

calibrate_tl_per_km(conn, *, activity_types=..., min_distance_km=2.0, lookback_days=365) returns {median, q1, q3, n}. Tests cover the median/IQR math, the short-distance and other-activity-type exclusions, and the empty-DB → NaN path. On live data the median came in at 11.4 with IQR [9.2, 16.4] — confirms the README's "training runs are ~11" claim and shows the spread is wide enough that hardcoded constants are a real weakness.
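The median/IQR summary itself is small enough to sketch with the standard library. This version works on (distance_km, training_load) pairs rather than a connection, and the choice of the "inclusive" quantile method is an assumption; the real calibrate_tl_per_km may compute quartiles differently.

```python
import math
import statistics


def summarise_tl_per_km(runs: list, min_distance_km: float = 2.0) -> dict:
    """Median and quartiles of training load per km, ignoring very short runs."""
    ratios = [load / dist for dist, load in runs if dist >= min_distance_km]
    n = len(ratios)
    if n == 0:
        # Empty input maps to NaN rather than raising, mirroring the empty-DB path.
        return {"median": math.nan, "q1": math.nan, "q3": math.nan, "n": 0}
    median = statistics.median(ratios)
    if n < 2:
        q1 = q3 = median  # quartiles are undefined for a single point
    else:
        q1, _, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return {"median": median, "q1": q1, "q3": q3, "n": n}
```

Reporting the quartiles alongside the median is what exposes how wide the spread is compared with a single hardcoded constant.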

2.5 Banister projection helper — DONE

banister_forecast(history, future, *, today=None) lives in model.py; the notebook's splice (hist[hist.index <= TODAY] followed by forecast[forecast.index > TODAY]) is now in one place. Tests in test_banister.py:

  • test_banister_forecast_matches_full_history — splice invariant: PMC frame equals banister(full) when full is split (history, future).
  • Plus a closed-form impulse-response check on the EWMA itself: with load=L on day 0 and zeros after, CTL[i] = L·(1-decay)·decay^i (the ROADMAP's original "CTL at day 42 ≈ L·(1-1/e)" was wrong — that's the step response after τ days, not impulse — corrected in the test).
  • test_banister_steady_state_approaches_input covers the step response (constant input L converges to L; at i=τ it's L·(1-1/e)).
  • Boundary-day overlap, default-today, and tail-extension tests round it out.
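The impulse, step, and splice properties listed above can be checked against a bare EWMA. This is a sketch, assuming decay = exp(-1/τ) and a zero starting level; the ewma function and its start parameter are illustrative, not the model.py API.

```python
import math


def ewma(loads: list, tau: float = 42.0, start: float = 0.0) -> list:
    """Exponentially weighted load: level = decay * level + (1 - decay) * load."""
    decay = math.exp(-1.0 / tau)  # assumed form of the decay constant
    out, level = [], start
    for load in loads:
        level = decay * level + (1.0 - decay) * load
        out.append(level)
    return out


L, TAU = 100.0, 42
DECAY = math.exp(-1.0 / TAU)

# Impulse response: a single load L on day 0 decays geometrically.
impulse = ewma([L] + [0.0] * 10, TAU)
assert all(math.isclose(impulse[i], L * (1 - DECAY) * DECAY**i) for i in range(11))

# Step response: constant load L reaches L * (1 - 1/e) after TAU updates.
step = ewma([L] * TAU, TAU)
assert math.isclose(step[-1], L * (1 - 1 / math.e))

# Splice invariant: forecasting from the last historical level equals one full pass.
history, future = [50.0, 0.0, 80.0], [20.0, 20.0]
assert ewma(history + future, TAU) == ewma(history, TAU) + ewma(future, TAU, start=ewma(history, TAU)[-1])
```

With this indexing the 1 − 1/e level is reached after τ updates, which is zero-based index τ − 1; an implementation that seeds the series differently may land on index τ instead.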

2.6 Sleep-stage breakdown surfacing — DONE

load_sleep_stages(conn) in model.py returns deep/light/rem/awake (seconds + percentages) keyed by date. Percentages of present stages sum to 1.0 by construction; awake_pct uses total-time-in-bed (not asleep-only). Tests in test_sleep_stages.py cover the invariant + the zero-sleep / NaN-stage edge cases (the latter exposed by a real 2026-05-17 row where rem_s is NULL).
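The two denominators are easy to mix up, so here is a per-row sketch of the percentage rules as described. The function name, argument names, and output keys other than awake_pct are assumptions; the real load_sleep_stages returns a frame keyed by date.

```python
import math
from typing import Optional


def sleep_stage_shares(
    deep_s: Optional[float],
    light_s: Optional[float],
    rem_s: Optional[float],
    awake_s: Optional[float],
) -> dict:
    """Stage shares over time asleep; awake share over total time in bed."""
    stages = {"deep": deep_s, "light": light_s, "rem": rem_s}
    present = {name: secs for name, secs in stages.items() if secs is not None}
    asleep = sum(present.values())
    out = {}
    for name in stages:
        if name in present and asleep > 0:
            out[f"{name}_pct"] = present[name] / asleep
        else:
            out[f"{name}_pct"] = math.nan  # missing stage or zero sleep
    # Awake time is measured against time in bed, not against time asleep.
    in_bed = asleep + (awake_s or 0)
    if awake_s is not None and in_bed > 0:
        out["awake_pct"] = awake_s / in_bed
    else:
        out["awake_pct"] = math.nan
    return out
```

A row with a NULL rem_s, like the 2026-05-17 one, still yields deep and light shares that sum to 1.0 while rem_pct comes back as NaN.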

2.7 Per-second decoupling: drop-in plotting helper — DONE

New openrun.plots submodule (matplotlib imported lazily inside the function so the rest of openrun stays import-light). plot_fit_decoupling(records, *, segments=2, ax=None) draws a per-segment bar chart with Friel's 5% / 10% reference lines. Tests in test_plots.py cover bar count, caller-supplied Axes, and the "no usable records" sentinel path.

2.8 Time-in-zone weekly summary — DONE

weekly_time_in_zone(conn, *, start=None, end=None, activity_types=...) in model.py pivots the cached activity_time_in_zone table by ISO week (Monday-anchored). Tests in test_weekly_tiz.py: within-week summation, week splitting, activity-type filter, date window, empty-frame shape. On live data, recent weeks show a Z3-heavy profile that matches the README's "junk-miles middle" warning.
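Monday-anchored bucketing is the core of the pivot and can be sketched without pandas. The (day, zone, seconds) row shape and both function names are assumptions for the example; the activity-type and date-window filters are omitted.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the ISO week containing day (date.weekday() is 0 on Monday)."""
    return day - timedelta(days=day.weekday())


def weekly_zone_totals(rows: list) -> dict:
    """Sum seconds per zone per Monday-anchored week from (day, zone, seconds) rows."""
    weekly: dict = defaultdict(lambda: defaultdict(float))
    for day, zone, seconds in rows:
        weekly[week_start(day)][zone] += seconds
    # Plain dicts, ordered by week, are easier to assert against in tests.
    return {wk: dict(zones) for wk, zones in sorted(weekly.items())}
```

A Sunday activity and the following Monday's activity land in different buckets, which is the week-splitting case the tests cover.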


Phase 3 — aspirational (not on the critical path)

These are README-shaped or memory-shaped maybes. Don't take them on without a concrete reason; listed so they stop showing up as recurring "we should…" thoughts.

  • Heat & humidity adjustment of decoupling. FIT messages carry temperature and sometimes developer_data from external sensors. We don't capture it. A weather-adjusted Pa:Hr would explain summer-vs-winter decoupling gaps. Real work: capture temp in load_fit_records, add a decoupling_adjusted(...) that conditions on it. Test plan: regression with vs without adjustment on a hot-run fixture should differ only when temperature is present.
  • Multi-athlete support. The config already accepts a per-user profile. Promoting it to multiple [athletes.<name>] blocks in openrun.toml plus a --athlete CLI flag is straightforward but premature for a single-user dataset. Test plan: profile-resolution unit tests would multiply by N.
  • Power & vertical-oscillation regression. Lap raw has averagePower, verticalOscillation, verticalRatio. No analysis surfaces them. Would slot into 04_efficiency.ipynb. Worth a notebook section before becoming a model.py helper.
  • Race-day projected vs actual diff tracker. After a race, compare the 06_race_plan.ipynb projection to what actually happened (CTL on race day, predicted vs realised pace from RunRacePredictions). One-shot retrospective; doesn't need to be reusable.

Test conventions

  • Layout:
    • tests/unit/ — pure-function metrics (model.py public surface). One file per concept: test_banister.py, test_decoupling.py, test_geo.py, test_zones.py.
    • tests/integration/ — ingest pipelines + loaders. Use the in-memory SQLite fixture.
    • tests/fixtures/ — pinned JSON / FIT / CSV samples. Anonymised.
  • Granularity: prefer one assertion per test (within reason). Friel's < 5% / 5–10% thresholds make for natural assertions; the round-trip schema tests should be exact-equality on inserted rows.
  • What we don't test:
    • garth.connectapi — network-dependent, fragile, and garth is upstream-deprecated. Mock at the _safe_call boundary if we ever need to.
    • Notebook content. The build scripts (examples/notebooks/_build_05.py etc.) are easier to read than the generated .ipynb; we test what they import from, not the notebook itself.
    • Plotting beyond smoke tests (artist counts, axis labels). Visual regression isn't worth the maintenance.
  • Run: uv run pytest (no markers needed at this scale). Add pytest-cov for an off-the-shelf coverage report; ignore the percentage as a target until we have meaningful coverage of the pure-function surface.

Working agreement

When picking up an item: write the failing test first against the API the test plan describes, then make it pass. If the test plan turns out to be wrong (the function shouldn't behave that way after all), update this file in the same PR. Items removed because they turned out to be misguided are more valuable signal than items completed.