1.x updates

2026-05-19 08:34:22 -04:00
parent 3f3fce62d3
commit 9d91ac8ebc
53 changed files with 4541 additions and 2111 deletions

5
.gitignore vendored

@@ -20,5 +20,10 @@ data/raw/
# Notebook checkpoints
.ipynb_checkpoints/
# Test artefacts
.pytest_cache/
.coverage
htmlcov/
# OS
.DS_Store

26
.vscode/settings.json vendored Normal file

@@ -0,0 +1,26 @@
{
    "python.defaultInterpreterPath": "${workspaceFolder}/.venv/bin/python",
    "python.testing.pytestEnabled": true,
    "python.testing.unittestEnabled": false,
    "python.testing.pytestArgs": ["tests"],
    "workbench.colorCustomizations": {
        "activityBar.activeBackground": "#2f7c47",
        "activityBar.background": "#2f7c47",
        "activityBar.foreground": "#e7e7e7",
        "activityBar.inactiveForeground": "#e7e7e799",
        "activityBarBadge.background": "#422c74",
        "activityBarBadge.foreground": "#e7e7e7",
        "commandCenter.border": "#e7e7e799",
        "sash.hoverBorder": "#2f7c47",
        "statusBar.background": "#215732",
        "statusBar.foreground": "#e7e7e7",
        "statusBarItem.hoverBackground": "#2f7c47",
        "statusBarItem.remoteBackground": "#215732",
        "statusBarItem.remoteForeground": "#e7e7e7",
        "titleBar.activeBackground": "#215732",
        "titleBar.activeForeground": "#e7e7e7",
        "titleBar.inactiveBackground": "#21573299",
        "titleBar.inactiveForeground": "#e7e7e799"
    },
    "peacock.color": "#215732"
}


@@ -1,26 +1,59 @@
# Garmin → local SQLite
# openrun — endurance running analytics
A small, self-contained pipeline that pulls Garmin Connect data into a local SQLite database and a stack of pandas notebooks. The aim is a *reference* you can run against your own data — not a polished product — so this README focuses on what's in the data, what gets computed from it, and exactly how each derived metric is defined.
A local-first, open-source endurance-training analysis pipeline. Garmin data in (live API or official export), SQLite on disk, pandas notebooks out. Banister CTL/ATL/TSB, Pa:HR decoupling (per-mile and per-second from FIT), Garmin-configured HR-zone time-in-zone, route clustering, race-plan projection with forward CTL/ATL/TSB.
This README focuses on what's in the data, what gets computed from it, and how each derived metric is defined. The package name `openrun` is provisional; the local repo can be renamed without code changes.
```
garmin/
├── auth.py                   # Path B: OAuth login for live API
├── sync.py                   # Path B: incremental sync via garth
├── ingest_export.py          # Path A: parse Garmin Connect data export
├── link_fit_files.py         # match Takeout FIT files to activities by start_time
├── compute_time_in_zone.py   # cache per-activity HR-zone breakdowns
├── analysis.py               # shared loaders + derived metrics
├── db.py                     # schema + connection helpers
├── data/garmin.db            # SQLite (created on first run)
└── notebooks/
    ├── 01_overview.ipynb     # data coverage, recent activity
    ├── 02_running.ipynb      # volume, pace, ACWR, PRs
    ├── 03_recovery.ipynb     # sleep, HRV, RHR, body battery
    ├── 04_efficiency.ipynb   # m/beat trends year over year
    ├── 05_intra_run.ipynb    # decoupling, cadence, routes, HR zones
    └── 06_race_plan.ipynb    # build a periodised plan + adherence tracker
openrun/
├── pyproject.toml            # packaged as `openrun` (src layout, hatchling)
├── openrun.toml              # user config (HR zones, weight, LTHR, race calendar)
├── src/openrun/
│   ├── __init__.py           # re-exports the common API
│   ├── config.py             # UserProfile, HRZones, BanisterParams, TOML loader
│   ├── db.py                 # SQLite schema + connection helper
│   ├── model.py              # loaders + derived metrics (was analysis.py)
│   └── ingest/
│       ├── auth.py           # Garmin OAuth login
│       ├── garmin_api.py     # Path B: incremental sync via garth
│       ├── garmin_export.py  # Path A: parse Connect/Takeout JSON + index FITs
│       ├── fit_linker.py     # match Takeout FITs to activities by session.start_time
│       └── time_in_zone.py   # cache per-activity HR-zone breakdowns
├── examples/notebooks/
│   ├── 01_overview.ipynb     # data coverage, recent activity
│   ├── 02_running.ipynb      # volume, pace, PMC (Banister), PRs
│   ├── 03_recovery.ipynb     # sleep, HRV, RHR, body battery
│   ├── 04_efficiency.ipynb   # m/beat trends year over year
│   ├── 05_intra_run.ipynb    # decoupling, cadence, routes, HR zones
│   └── 06_race_plan.ipynb    # periodised plan + tracker + projected PMC
├── data/garmin.db            # SQLite (created on first run)
└── {auth,sync,ingest_export,link_fit_files,compute_time_in_zone,db,analysis}.py
                              # thin top-level shims for back-compat
```
Most callers want:
```python
from openrun import open_conn, load_activities, banister, daily_training_load_series
conn = open_conn()
runs = load_activities(conn, type='running')
pmc = banister(daily_training_load_series(conn))
```
User-specific values (HR zones, LTHR, weight, race calendar) live in `openrun.toml` — config discovery walks up from cwd looking for it. See [openrun.toml](openrun.toml) for the schema.
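The walk-up search can be pictured with a short sketch. `find_config` is a hypothetical helper written for illustration, not the package's actual loader; it assumes only that the file is called `openrun.toml` and that the nearest match wins.

```python
from __future__ import annotations

from pathlib import Path


def find_config(start: Path | None = None, name: str = "openrun.toml") -> Path | None:
    """Return the nearest `name` at or above `start`, or None if none exists.

    Hypothetical helper: it illustrates the walk-up search only.
    """
    here = (start or Path.cwd()).resolve()
    # Check the starting folder first, then each parent up to the filesystem root.
    for folder in (here, *here.parents):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None
```

Under this scheme a notebook started from `examples/notebooks/` would pick up the `openrun.toml` at the repo root without any path argument.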
CLI entry points (after `uv sync`):
```bash
openrun-auth # OAuth login (Path B)
openrun-sync [--full] # incremental Garmin Connect sync
openrun-ingest <export> # parse a Connect/Takeout export
openrun-link-fit <export> # match Takeout FITs to activities by content
openrun-time-in-zone # populate the activity_time_in_zone cache
```
Top-level shims (`uv run sync.py`, `uv run ingest_export.py`, etc.) still work — they re-export `main()` from the corresponding `openrun.ingest.*` module.
---
## 1 — Source data

117
ROADMAP.md Normal file

@@ -0,0 +1,117 @@
# openrun — roadmap
Living backlog. Items are grouped by phase; within a phase, top to bottom is rough priority. Each item names its **test plan** alongside the implementation so they ship together.
This isn't a contract. If something turns out to be harder, smaller, or pointless once we're in the code, edit it here rather than rationalise it away.
---
## Phase 0 — test scaffolding (prerequisite, one-time)
The codebase has no `tests/` directory and no pytest in `pyproject.toml`. Everything below assumes this is in place first.
- **Add pytest + a minimal `tests/` layout.** Add `pytest` and `pytest-cov` to `[dependency-groups.dev]` in [pyproject.toml](pyproject.toml). Create `tests/{unit,integration,fixtures}/`. A `tests/conftest.py` should expose a shared `tmp_conn` fixture that builds an in-memory SQLite via `openrun.db.connect(":memory:")` (so the live `data/garmin.db` is never touched).
- **Test plan:** N/A — this *is* the test plan.
- **DoD:** `uv run pytest` exits 0 with one trivial passing test.
- **Pin a tiny fixture set.** One `.fit` (~5 MB workout), one Takeout `summarizedActivities` snippet (~3 activities), one Connect-format snippet, one of each daily wellness JSON. Anonymise; commit under `tests/fixtures/`. These back every test below.
- **DoD:** Each fixture has a one-line `tests/fixtures/README.md` entry recording its provenance + what it exercises.
---
## Phase 1 — code gaps (existing functionality)
### 1.1 Takeout export doesn't ingest splits — RETIRED (premise wrong for current account)
~~[ingest/garmin_export.py](src/openrun/ingest/garmin_export.py) only handles `summarizedActivities`. Lap detail in Takeout lives in `<activity_id>_<name>.json` under `DI-Connect-Fitness/`~~
**Finding (2026-05-18):** the actual Takeout dump for this account contains zero per-activity lap-detail JSONs under `DI-Connect-Fitness/` — only `summarizedActivities`, `personalRecord`, `trainingPlan`, `workout`, `gear`. Lap data is only retrievable from the FIT files via `link_fit_files.py` (Path B sync also pulls it directly from `/activity-service/activity/{aid}/splits`). If a future Takeout shape *does* include `<activity_id>.json` lap files, reopen this item; until then the assumption that drove it doesn't hold.
### 1.2 Unhandled Takeout JSON categories
[project_takeout_export_quirks.md](../.claude/projects/-Users-noise-Documents-obsidian-RunningLog-garmin/memory/project_takeout_export_quirks.md) lists files we currently treat as `unrecognized`: `HydrationLog`, `TrainingReadinessDTO`, `EnduranceScore`, `HillScore`, `RunRacePredictions`. Each needs a new SQLite table plus a dispatch entry. Skip `HydrationLog` (low value); prioritise `TrainingReadinessDTO` (Garmin's own readiness score, useful for comparing against derived TSB) and `RunRacePredictions` (a free baseline for the projected-PMC plan).
- **Test plan:** per category, `test_garmin_export.py::test_handle_<category>` fixture-driven insert; one schema-roundtrip test (`SELECT *` returns the same values).
### 1.3 fit_linker can't disambiguate near-simultaneous activities — DONE
Implemented warn-and-skip on collision: `link()` tracks `linked_aids: dict[int, Path]`; the second FIT to match an already-linked activity is logged on stderr and skipped (count surfaced in summary). Pure `_match_activity` helper extracted for unit-testable tolerance/closest-pick behaviour, and `link()` accepts a `fit_iter=` injection point so tests avoid the FIT-parsing path. See [tests/unit/test_fit_linker.py](tests/unit/test_fit_linker.py).
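A minimal sketch of the closest-pick and warn-and-skip behaviour described above. The function names, the `(name, start)` input shape, and the 60-second tolerance are assumptions made for illustration; the real `_match_activity` and `link()` may differ.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def match_activity(
    fit_start: datetime,
    activity_starts: dict[int, datetime],
    tolerance: timedelta = timedelta(seconds=60),  # assumed value
) -> int | None:
    """Return the activity id whose start is closest to `fit_start`, within tolerance."""
    best_id, best_gap = None, None
    for aid, start in activity_starts.items():
        gap = abs(start - fit_start)
        if gap <= tolerance and (best_gap is None or gap < best_gap):
            best_id, best_gap = aid, gap
    return best_id


def link_all(fits, activity_starts):
    """Link each (name, start) FIT at most once per activity.

    A later FIT that matches an already-linked activity is skipped and reported.
    """
    linked: dict[int, str] = {}
    skipped: list[str] = []
    for name, start in fits:
        aid = match_activity(start, activity_starts)
        if aid is None:
            continue  # nothing within tolerance
        if aid in linked:
            skipped.append(name)  # collision: first match wins
            continue
        linked[aid] = name
    return linked, skipped
```

Keeping the matcher pure (no database, no FIT parsing) is what makes the tolerance and tie-breaking rules cheap to unit-test.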
### 1.4 `handle_fit` silently drops FITs with no parseable activity-id chunk — DONE
Main loop now counts FITs whose stem has no 8+-digit chunk and prints a one-line `n FITs skipped … run openrun-link-fit <root> for Takeout-style exports` hint. The activity-id extraction is now `_activity_id_from_filename`, with five unit tests covering classic/Takeout/year-prefix/empty cases, plus a subprocess end-to-end test of `main()`.
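The "8+-digit chunk" rule can be sketched as a single regex search over the file stem. The function below is a hypothetical stand-in for `_activity_id_from_filename`, and the filenames in the usage note are invented examples, not real export names.

```python
from __future__ import annotations

import re
from pathlib import Path

# Any run of eight or more digits counts as a candidate activity id.
_ID_CHUNK = re.compile(r"\d{8,}")


def activity_id_from_filename(path: str | Path) -> int | None:
    """Return the first 8+-digit run in the file stem as an int, else None."""
    match = _ID_CHUNK.search(Path(path).stem)
    return int(match.group()) if match else None
```

With this rule `12345678901_ACTIVITY.fit` yields `12345678901`, a short year prefix such as `2024_` is ignored because it has fewer than eight digits, and a stem with no long digit run returns `None` so the caller can count it as skipped. One caveat of the sketch: an undelimited date like `20240501` would itself look like an id.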
### 1.5 `_resolve_fit_path` is fragile (decided: absolute paths) — DONE
Switched to absolute paths everywhere. `fit_linker.record_link` is a small shared helper used by both ingest paths and stores `str(p.resolve())`. `relink(conn, new_root)` walks a moved export by basename and rewrites the table; CLI flag is `openrun-link-fit <new_root> --relink`. `_resolve_fit_path` collapsed to a one-line existence check that raises with the relink hint on miss. The dead `fit_search_paths` config field (and matching `[ingest]` block in `openrun.toml`) was removed. Live DB migrated (349 paths rewritten, 0 unmatched).
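As an illustration of the basename-driven relink, here is a sketch that works on a plain dict instead of the SQLite table. `relink_by_basename` and its input shape are assumptions for the example; the real `relink(conn, new_root)` rewrites rows in place.

```python
from __future__ import annotations

from pathlib import Path


def relink_by_basename(
    stored: dict[int, str], new_root: Path
) -> tuple[dict[int, str], list[int]]:
    """Map each stored FIT path to the same-named file under `new_root`.

    Returns (activity_id -> new absolute path, ids with no match).
    """
    # Index the moved export once; if two files share a basename, the last one seen wins.
    index = {p.name: p.resolve() for p in new_root.rglob("*.fit")}
    updated: dict[int, str] = {}
    unmatched: list[int] = []
    for aid, old_path in stored.items():
        hit = index.get(Path(old_path).name)
        if hit is None:
            unmatched.append(aid)
        else:
            updated[aid] = str(hit)
    return updated, unmatched
```

Reporting the unmatched ids separately mirrors the "349 paths rewritten, 0 unmatched" style of summary, so a partial move is visible rather than silent.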
### 1.6 Schema round-trip tests — DONE (wellness + activities; splits/fit_files/tiz open)
[tests/integration/test_loaders.py](tests/integration/test_loaders.py) covers 8 ingest-handler → DB → loader round-trips:
- `activities` (Takeout scaled-int units → `load_activities` derived columns)
- 7 wellness tables (steps / sleep / stress / hrv / resting_hr / intensity_minutes / body_battery), each through `load_wellness`. Sleep additionally rides through `load_sleep_stages` to lock the deep+light+rem=1.0 invariant.
**Caught a real bug in the test fixture:** my first draft used `averageSpeed: 27.78` thinking that was the SI value. Inspecting live Takeout data showed Garmin actually stores it as `m/s × 0.1` (e.g. a 3.247 m/s sprint stored as 0.3247), so the `× 10` conversion in the handler is correct — the README table can stay; the fixture was wrong.
**Still open:** `activity_splits`, `activity_fit_files`, `activity_time_in_zone`. These aren't reached through a single Takeout JSON handler (splits are sync-only, FIT files are linker-driven, TIZ is precomputed), so each needs a different fixture/path. Deferred to a follow-up; the existing helpers' tests (test_fit_linker.py, test_weekly_tiz.py) already cover most of what these round-trips would.
---
## Phase 2 — analytical features hinted at in the README
### 2.1 DBSCAN route clustering at scale
[README §3](README.md) says explicitly: *"Adequate for a few hundred starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` for thousands."* Add `cluster_routes(..., method='dbscan')` as an alternate path. Don't replace the greedy version; it's a good correctness baseline.
- **Test plan:**
- `test_geo.py::test_haversine_known_pairs` — two cities, compare to a published value within 0.1 km.
- `test_geo.py::test_cluster_routes_greedy_vs_dbscan_agree` — on a synthetic 200-point dataset (3 well-separated clusters + noise), both methods produce the same cluster assignment up to label permutation.
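A sketch of what the known-pairs test could look like. It uses a scalar, stdlib-only copy of the haversine formula (the package version is vectorised) and swaps the published city distance for a value that can be derived analytically: one degree of longitude on the equator is R·π/180 for the same R = 6371 km.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great-circle distance in kilometres; inputs in degrees (scalar sketch)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * r * math.asin(math.sqrt(a))


def test_haversine_one_degree_at_equator():
    # On the equator the great circle is the equator itself, so the
    # distance for one degree of longitude is exactly the arc length.
    expected = 6371.0 * math.pi / 180
    assert abs(haversine_km(0, 0, 0, 1) - expected) < 1e-6
```

An analytic reference avoids depending on which source "published" the number; a city-pair check with a 0.1 km tolerance can sit alongside it.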
### 2.2 Off-watch volume integration
[feedback_db_not_full_picture.md](../.claude/projects/-Users-noise-Documents-obsidian-RunningLog-garmin/memory/feedback_db_not_full_picture.md): recorded activity ≠ full training. Add a `manual_activities` table + a CSV/TOML loader so the user can log strength, hikes, unrecorded runs without faking Garmin uploads. `daily_training_load_series` learns an optional `include_manual=True`. Existing analyses pick this up via the same Banister pipeline.
- **Test plan:**
- `test_manual_activities.py::test_load_from_csv` — schema check.
- `test_loaders.py::test_daily_training_load_series_with_manual` — assert `include_manual=True` increases the total when synthetic manual rows are present, and that the Banister output is monotonically affected.
### 2.3 PR detection by distance bucket — DONE
`personal_records(activities, distance_bins_km=..., tolerance=0.05)` in [model.py](src/openrun/model.py); tested in [test_race_plan.py](tests/unit/test_race_plan.py) — fastest-in-band selection, tolerance enforcement, NaN-pace masking, empty input. On live data: PR table comes out 5K @ 4:49, 10K @ 5:38, half @ 6:18, 50 @ 8:04 (no marathon bin populated, correctly skipped).
### 2.4 Race-plan TL/km calibration from history — DONE
`calibrate_tl_per_km(conn, *, activity_types=..., min_distance_km=2.0, lookback_days=365)` returns `{median, q1, q3, n}`. Tests cover the median/IQR math, the short-distance and other-activity-type exclusions, and the empty-DB → NaN path. On live data the median came in at 11.4 with IQR [9.2, 16.4] — confirms the README's "training runs are ~11" claim and shows the spread is wide enough that hardcoded constants are a real weakness.
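The summary step can be sketched without a database. This assumes TL/km is computed per activity as `training_load / distance_km` and that quartiles use the inclusive method; both are assumptions, and `summarize_tl_per_km` is a hypothetical name. The activity-type and lookback filters are left out.

```python
from __future__ import annotations

import math
import statistics


def summarize_tl_per_km(
    runs: list[tuple[float, float]], min_distance_km: float = 2.0
) -> dict[str, float]:
    """Median and quartiles of training load per km over (load, km) pairs."""
    ratios = [load / km for load, km in runs if km >= min_distance_km]
    if not ratios:
        # Empty input maps to NaN, as in the empty-DB case.
        return {"median": math.nan, "q1": math.nan, "q3": math.nan, "n": 0}
    if len(ratios) == 1:
        only = ratios[0]
        return {"median": only, "q1": only, "q3": only, "n": 1}
    q1, median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    return {"median": median, "q1": q1, "q3": q3, "n": len(ratios)}
```

Returning `n` next to the quartiles lets a caller decide whether the sample is large enough to replace a hardcoded constant.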
### 2.5 Banister projection helper — DONE
`banister_forecast(history, future, *, today=None)` lives in [model.py](src/openrun/model.py); the notebook's splice (`hist[hist.index <= TODAY]` followed by `forecast[forecast.index > TODAY]`) is now in one place. Tests in [test_banister.py](tests/unit/test_banister.py):
- `test_banister_forecast_matches_full_history` — splice invariant: PMC frame equals `banister(full)` when full is split (history, future).
- Plus a closed-form impulse-response check on the EWMA itself: with load=L on day 0 and zeros after, `CTL[i] = L·(1-decay)·decay^i` (the ROADMAP's original "CTL at day 42 ≈ L·(1-1/e)" was wrong — that's the step response after τ days, not impulse — corrected in the test).
- `test_banister_steady_state_approaches_input` covers the step response (constant input L converges to L; at i=τ it's L·(1-1/e)).
- Boundary-day overlap, default-today, and tail-extension tests round it out.
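Both closed forms can be checked with a few lines of plain Python. The sketch below re-implements only the EWMA recursion (zero initial state, `decay = exp(-1/τ)`), not the package's `banister`; `ewma` is a hypothetical name.

```python
import math


def ewma(loads, tau):
    """Exponentially weighted load with time constant `tau` days, starting from 0."""
    decay = math.exp(-1 / tau)
    out, prev = [], 0.0
    for load in loads:
        prev = prev * decay + load * (1 - decay)
        out.append(prev)
    return out


L, tau = 100.0, 42
decay = math.exp(-1 / tau)

# Impulse: load L on day 0, zeros afterwards -> CTL[i] = L * (1 - decay) * decay**i.
impulse = ewma([L] + [0.0] * 59, tau)
assert all(abs(impulse[i] - L * (1 - decay) * decay**i) < 1e-9 for i in range(60))

# Step: constant load L -> after n updates CTL = L * (1 - decay**n),
# which equals L * (1 - 1/e) after exactly tau updates.
step = ewma([L] * tau, tau)
assert abs(step[-1] - L * (1 - 1 / math.e)) < 1e-9
```

In this sketch the step-response identity is exact after τ updates, which is 0-based index τ−1; the value at index τ is slightly higher, so a test should either pick the index deliberately or use a tolerance.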
### 2.6 Sleep-stage breakdown surfacing — DONE
`load_sleep_stages(conn)` in [model.py](src/openrun/model.py) returns deep/light/rem/awake (seconds + percentages) keyed by date. Percentages of present stages sum to 1.0 by construction; awake_pct uses total-time-in-bed (not asleep-only). Tests in [test_sleep_stages.py](tests/unit/test_sleep_stages.py) cover the invariant + the zero-sleep / NaN-stage edge cases (the latter exposed by a real 2026-05-17 row where `rem_s` is NULL).
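The invariant is easier to see on a single row. The sketch below is a per-row, stdlib-only illustration under the rules stated above (missing stages are excluded from the asleep total; awake is divided by time in bed); `stage_percentages` is a hypothetical name and the real loader works on a DataFrame.

```python
from __future__ import annotations

import math


def _present(value: float | None) -> bool:
    return value is not None and not math.isnan(value)


def stage_percentages(deep_s, light_s, rem_s, awake_s) -> dict[str, float]:
    """Fractions of asleep time per stage, plus awake as a fraction of time in bed."""
    stages = {"deep": deep_s, "light": light_s, "rem": rem_s}
    asleep = sum(v for v in stages.values() if _present(v))
    out = {
        f"{name}_pct": (v / asleep if _present(v) and asleep > 0 else math.nan)
        for name, v in stages.items()
    }
    # Awake uses time in bed (asleep + awake) as the denominator.
    in_bed = asleep + awake_s if _present(awake_s) else math.nan
    out["awake_pct"] = awake_s / in_bed if _present(awake_s) and in_bed > 0 else math.nan
    return out
```

For a night with `rem_s` missing, `stage_percentages(3600, 7200, None, 1200)` gives deep and light fractions that sum to 1.0, a NaN `rem_pct`, and an `awake_pct` of 0.1.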
### 2.7 Per-second decoupling: drop-in plotting helper — DONE
New `openrun.plots` submodule (matplotlib imported lazily inside the function so the rest of `openrun` stays import-light). `plot_fit_decoupling(records, *, segments=2, ax=None)` draws a per-segment bar chart with Friel's 5% / 10% reference lines. Tests in [test_plots.py](tests/unit/test_plots.py) cover bar count, caller-supplied Axes, and the "no usable records" sentinel path.
### 2.8 Time-in-zone weekly summary — DONE
`weekly_time_in_zone(conn, *, start=None, end=None, activity_types=...)` in [model.py](src/openrun/model.py) pivots the cached `activity_time_in_zone` table by ISO week (Monday-anchored). Tests in [test_weekly_tiz.py](tests/unit/test_weekly_tiz.py): within-week summation, week splitting, activity-type filter, date window, empty-frame shape. On live data, recent weeks show a Z3-heavy profile that matches the README's "junk-miles middle" warning.
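A sketch of the Monday-anchored pivot using only the standard library. The `(date, zone, seconds)` row shape and the name `weekly_tiz_sketch` are assumptions for illustration; the real function reads the cached table and returns a pandas frame.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the ISO week containing `day` (weekday() is 0 on Monday)."""
    return day - timedelta(days=day.weekday())


def weekly_tiz_sketch(rows):
    """Sum seconds per zone per Monday-anchored week.

    `rows` is an iterable of (date, zone_label, seconds) tuples.
    """
    table = defaultdict(lambda: defaultdict(float))
    for day, zone, seconds in rows:
        table[week_start(day)][zone] += seconds
    # Plain dicts, ordered by week, are easier to compare in tests.
    return {week: dict(zones) for week, zones in sorted(table.items())}
```

Anchoring on the Monday date rather than an ISO week number keeps the key sortable across year boundaries.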
---
## Phase 3 — aspirational (not on the critical path)
These are README-shaped or memory-shaped *maybes*. Don't take them on without a concrete reason; listed so they stop showing up as recurring "we should…" thoughts.
- **Heat & humidity adjustment of decoupling.** FIT messages carry `temperature` and sometimes `developer_data` from external sensors. We don't capture it. A weather-adjusted Pa:Hr would explain summer-vs-winter decoupling gaps. Real work: capture temp in `load_fit_records`, add a `decoupling_adjusted(...)` that conditions on it. Test plan: regression with vs without adjustment on a hot-run fixture should differ only when `temperature` is present.
- **Multi-athlete support.** The config already accepts a per-user profile. Promoting it to `openrun.toml` having multiple `[athletes.<name>]` blocks + a `--athlete` CLI flag is straightforward but premature for a single-user dataset. Test plan: profile-resolution unit tests would multiply by N.
- **Power & vertical-oscillation regression.** Lap raw has `averagePower`, `verticalOscillation`, `verticalRatio`. No analysis surfaces them. Would slot into [04_efficiency.ipynb](examples/notebooks/04_efficiency.ipynb). Worth a notebook section before becoming a `model.py` helper.
- **Race-day projected vs actual diff tracker.** After a race, compare the [06_race_plan.ipynb](examples/notebooks/06_race_plan.ipynb) projection to what actually happened (CTL on race day, predicted vs realised pace from `RunRacePredictions`). One-shot retrospective; doesn't need to be reusable.
---
## Test conventions
- **Layout:**
- `tests/unit/` — pure-function metrics ([model.py](src/openrun/model.py) public surface). One file per concept: `test_banister.py`, `test_decoupling.py`, `test_geo.py`, `test_zones.py`.
- `tests/integration/` — ingest pipelines + loaders. Use the in-memory SQLite fixture.
- `tests/fixtures/` — pinned JSON / FIT / CSV samples. Anonymised.
- **Granularity:** prefer one assertion per test (within reason). Friel's `< 5%` / `5–10%` thresholds make for natural assertions; the round-trip schema tests should be exact-equality on inserted rows.
- **What we don't test:**
- `garth.connectapi` — network-dependent, fragile, and garth is upstream-deprecated. Mock at the `_safe_call` boundary if we ever need to.
- Notebook content. The build scripts ([examples/notebooks/_build_05.py](examples/notebooks/_build_05.py) etc.) are easier to read than the generated `.ipynb`; we test what they import from, not the notebook itself.
- Plotting beyond smoke tests (artist counts, axis labels). Visual regression isn't worth the maintenance.
- **Run:** `uv run pytest` (no markers needed at this scale). Add `pytest-cov` for an off-the-shelf coverage report; ignore the percentage as a target until we have meaningful coverage of the pure-function surface.
---
## Working agreement
When picking up an item: write the failing test first against the API the test plan describes, then make it pass. If the test plan turns out to be wrong (the function shouldn't behave that way after all), update this file in the same PR. Items removed because they turned out to be misguided are more valuable signal than items completed.

BIN
agentdb.rvf Normal file

Binary file not shown.

BIN
agentdb.rvf.lock Normal file

Binary file not shown.


@@ -1,597 +1,12 @@
"""Shared loaders + derived columns for Garmin analysis notebooks.
"""Backwards-compat shim — everything is now in openrun.
Usage in a notebook:
    from analysis import open_conn, load_activities, load_wellness
    conn = open_conn()
    runs = load_activities(conn, type='running')
    wellness = load_wellness(conn)
This module used to host all loaders + derived metrics. Phase 0 of the
openrun refactor moved them into the `openrun` package. Existing notebooks
that still `from analysis import ...` continue to work; new code should
`from openrun import ...` directly.
"""
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
import pandas as pd
DB_PATH = Path(__file__).parent / "data" / "garmin.db"
def open_conn() -> sqlite3.Connection:
    return sqlite3.connect(DB_PATH)
# ---------------------------------------------------------------------------
# activities
# ---------------------------------------------------------------------------
def load_activities(conn: sqlite3.Connection, *, type: str | None = None) -> pd.DataFrame:
    """Activities with derived: distance_km, duration_min, pace_min_per_km, week, month, year."""
    sql = """
        SELECT activity_id, start_time_local, activity_type, activity_name,
               distance_m, duration_s, moving_duration_s,
               avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
               elevation_gain_m, elevation_loss_m,
               training_load, aerobic_te, anaerobic_te, vo2_max
        FROM activities
    """
    if type:
        sql += " WHERE activity_type = ?"
        df = pd.read_sql(sql, conn, params=[type], parse_dates=["start_time_local"])
    else:
        df = pd.read_sql(sql, conn, parse_dates=["start_time_local"])
    df["distance_km"] = df["distance_m"] / 1000
    df["duration_min"] = df["duration_s"] / 60
    df["moving_min"] = df["moving_duration_s"] / 60
    df["pace_min_per_km"] = df["moving_min"] / df["distance_km"]
    # Filter implausible paces: sub-3 min/km is elite-marathon race pace, so treat it as a
    # GPS/timer glitch; >30 min/km is a walk with stops.
    impossible = (df["distance_km"] < 0.2) | (df["pace_min_per_km"] < 3) | (df["pace_min_per_km"] > 30)
    df.loc[impossible, "pace_min_per_km"] = pd.NA
    df["date"] = df["start_time_local"].dt.normalize()
    df["week"] = df["start_time_local"].dt.to_period("W").dt.start_time
    df["month"] = df["start_time_local"].dt.to_period("M").dt.to_timestamp()
    df["year"] = df["start_time_local"].dt.year
    return df.sort_values("start_time_local").reset_index(drop=True)
# ---------------------------------------------------------------------------
# wellness
# ---------------------------------------------------------------------------
def load_wellness(conn: sqlite3.Connection) -> pd.DataFrame:
    """Joined daily wellness frame indexed by calendar_date (datetime)."""
    df = pd.read_sql(
        """
        SELECT s.calendar_date,
               s.total_steps,
               sl.sleep_score,
               sl.deep_s, sl.light_s, sl.rem_s, sl.awake_s,
               st.avg_stress,
               h.last_night_avg AS hrv_last_night,
               h.weekly_avg AS hrv_weekly,
               h.status AS hrv_status,
               im.moderate_minutes,
               im.vigorous_minutes,
               rh.resting_hr,
               bb.charged AS bb_charged,
               bb.drained AS bb_drained,
               bb.highest AS bb_highest,
               bb.lowest AS bb_lowest
        FROM daily_steps s
        LEFT JOIN daily_sleep sl ON sl.calendar_date = s.calendar_date
        LEFT JOIN daily_stress st ON st.calendar_date = s.calendar_date
        LEFT JOIN daily_hrv h ON h.calendar_date = s.calendar_date
        LEFT JOIN daily_intensity_minutes im ON im.calendar_date = s.calendar_date
        LEFT JOIN daily_resting_hr rh ON rh.calendar_date = s.calendar_date
        LEFT JOIN daily_body_battery bb ON bb.calendar_date = s.calendar_date
        ORDER BY s.calendar_date
        """,
        conn,
        parse_dates=["calendar_date"],
    ).set_index("calendar_date")
    df["sleep_total_s"] = df[["deep_s", "light_s", "rem_s"]].sum(axis=1, min_count=1)
    df["sleep_hours"] = df["sleep_total_s"] / 3600
    df["deep_pct"] = df["deep_s"] / df["sleep_total_s"]
    df["rem_pct"] = df["rem_s"] / df["sleep_total_s"]
    return df
# ---------------------------------------------------------------------------
# combine: training load by day, joined with next-day wellness
# ---------------------------------------------------------------------------
def daily_training_load(conn: sqlite3.Connection) -> pd.DataFrame:
    """Sum training load + distance per calendar date (any activity type)."""
    acts = load_activities(conn)
    daily = (
        acts.groupby("date")
        .agg(
            training_load=("training_load", "sum"),
            distance_km=("distance_km", "sum"),
            duration_min=("duration_min", "sum"),
            n_activities=("activity_id", "count"),
            avg_hr_weighted=("avg_hr", "mean"),  # simple unweighted; refine if needed
        )
    )
    daily.index = pd.to_datetime(daily.index)
    return daily
def joined(conn: sqlite3.Connection) -> pd.DataFrame:
    """Wellness joined with same-day and previous-day training load."""
    wellness = load_wellness(conn)
    tl = daily_training_load(conn)
    df = wellness.join(tl, how="left")
    df[["training_load", "distance_km", "duration_min", "n_activities"]] = (
        df[["training_load", "distance_km", "duration_min", "n_activities"]].fillna(0)
    )
    # previous day training load (commonly correlated with overnight HRV / next-morning RHR)
    df["training_load_prev"] = df["training_load"].shift(1)
    df["distance_km_prev"] = df["distance_km"].shift(1)
    return df
# ---------------------------------------------------------------------------
# expand the raw JSON of a table when you want fields the schema doesn't surface
# ---------------------------------------------------------------------------
def expand_raw(df: pd.DataFrame, raw_col: str = "raw") -> pd.DataFrame:
    """For a frame with a `raw` JSON column, return a normalized companion frame."""
    if raw_col not in df.columns:
        raise KeyError(f"no '{raw_col}' column in frame")
    return pd.json_normalize([json.loads(r) for r in df[raw_col]])
# ---------------------------------------------------------------------------
# splits — per-lap data with cadence, stride, GPS, etc. extracted from raw JSON
# ---------------------------------------------------------------------------
_SPLIT_RAW_FIELDS = (
    "averageRunCadence",
    "maxRunCadence",
    "strideLength",
    "verticalOscillation",
    "verticalRatio",
    "groundContactTime",
    "averagePower",
    "normalizedPower",
    "startLatitude",
    "startLongitude",
    "endLatitude",
    "endLongitude",
    "avgGradeAdjustedSpeed",
    "maxHR",
    "elevationGain",
    "elevationLoss",
from openrun import * # noqa: F401,F403
from openrun.model import ( # noqa: F401 (re-export private-ish helpers some callers use)
    _SPLIT_RAW_FIELDS,
    _resolve_fit_path,
)
def load_splits(conn: sqlite3.Connection, *, activity_type: str | None = "running") -> pd.DataFrame:
    """Per-split frame with rich fields expanded from raw JSON, joined to activity start time.
    Derived columns:
        pace_min_per_km, pace_min_per_mile, speed_kmh, split_seq (0-based position in run),
        n_splits (total in that run), frac_through (0..1), year, month.
    Splits with implausible values (no HR, distance < 200m, pace > 30 min/km) are dropped.
    """
    sql = """
        SELECT s.activity_id, s.split_index, s.distance_m, s.duration_s,
               s.avg_hr, s.avg_speed_mps, s.elevation_gain_m AS split_elev_gain_m,
               s.raw, a.start_time_local, a.activity_type
        FROM activity_splits s
        JOIN activities a ON a.activity_id = s.activity_id
    """
    params: list = []
    if activity_type:
        sql += " WHERE a.activity_type = ?"
        params.append(activity_type)
    sql += " ORDER BY s.activity_id, s.split_index"
    df = pd.read_sql(sql, conn, params=params, parse_dates=["start_time_local"])
    raws = [json.loads(r) if r else {} for r in df["raw"]]
    for k in _SPLIT_RAW_FIELDS:
        df[k] = [r.get(k) for r in raws]
    df = df.drop(columns=["raw"])
    df["pace_min_per_km"] = (df["duration_s"] / 60) / (df["distance_m"] / 1000)
    df["pace_min_per_mile"] = (df["duration_s"] / 60) / (df["distance_m"] / 1609.344)
    df["speed_kmh"] = df["avg_speed_mps"] * 3.6
    bad = (
        df["distance_m"].lt(200)
        | df["avg_hr"].isna()
        | df["avg_hr"].lt(60)
        | df["pace_min_per_km"].gt(30)
        | df["pace_min_per_km"].lt(2.5)
    )
    df = df.loc[~bad].copy()
    df["split_seq"] = df.groupby("activity_id").cumcount()
    df["n_splits"] = df.groupby("activity_id")["activity_id"].transform("count")
    denom = (df["n_splits"] - 1).replace(0, pd.NA)
    df["frac_through"] = df["split_seq"] / denom
    df["year"] = df["start_time_local"].dt.year
    df["month"] = df["start_time_local"].dt.to_period("M").dt.to_timestamp()
    return df.reset_index(drop=True)
def decoupling(splits: pd.DataFrame, min_splits: int = 6) -> pd.DataFrame:
    """Per-activity Pa:Hr decoupling using duration-weighted halves.
    `efficiency` per half = mean(speed_mps weighted by duration) / mean(HR weighted by duration).
    `decoupling_pct` = (first_half_eff / second_half_eff - 1) * 100.
    Positive = pace/HR dropped in 2nd half (the textbook 'cardiac drift' direction).
    Negative = ran faster per beat in 2nd half (often: negative split, conservative start).
    Endurance benchmark: <5% on a steady aerobic run is 'aerobically developed'.
    """
    valid = splits[splits["n_splits"] >= min_splits].copy()
    valid["half"] = (valid["frac_through"] >= 0.5).map({False: "first", True: "second"})
    def _half_eff(d: pd.DataFrame) -> float:
        w = d["duration_s"].to_numpy()
        speed = (d["avg_speed_mps"].to_numpy() * w).sum() / w.sum()
        hr = (d["avg_hr"].to_numpy() * w).sum() / w.sum()
        return speed / hr if hr else float("nan")
    eff = (
        valid.groupby(["activity_id", "half"])[["avg_speed_mps", "avg_hr", "duration_s"]]
        .apply(_half_eff)
        .unstack("half")
    )
    eff["decoupling_pct"] = (eff["first"] / eff["second"] - 1) * 100
    eff = eff.dropna(subset=["decoupling_pct"])
    meta = valid.groupby("activity_id").agg(
        start_time_local=("start_time_local", "first"),
        distance_km=("distance_m", lambda s: s.sum() / 1000),
        duration_min=("duration_s", lambda s: s.sum() / 60),
        avg_hr=("avg_hr", "mean"),
        avg_pace_min_per_km=("pace_min_per_km", "mean"),
        n_splits=("n_splits", "first"),
    )
    out = eff.join(meta).reset_index()
    out["year"] = out["start_time_local"].dt.year
    return out
_HR_ZONE_BOUNDS = (0.50, 0.60, 0.70, 0.80, 0.90, 1.01)
_HR_ZONE_LABELS = ("Z1", "Z2", "Z3", "Z4", "Z5")
def assign_hr_zone(hr: float, hr_max: float) -> str | None:
    if hr is None or pd.isna(hr) or not hr_max:
        return None
    frac = hr / hr_max
    for lo, hi, lab in zip(_HR_ZONE_BOUNDS[:-1], _HR_ZONE_BOUNDS[1:], _HR_ZONE_LABELS):
        if lo <= frac < hi:
            return lab
    return "Z5" if frac >= _HR_ZONE_BOUNDS[-2] else "Z1"
# Garmin-configured HR zones for this user (source: DI_CONNECT/.../heartRateZones.json).
# trainingMethod=HR_MAX, maxHeartRateUsed=209, lactateThresholdHR=182, RHR=52.
HR_ZONES_USER: tuple[tuple[str, int, int], ...] = (
    ("Z1", 102, 122),  # recovery
    ("Z2", 123, 143),  # easy aerobic — long-run target
    ("Z3", 144, 164),  # tempo / "junk-miles middle"
    ("Z4", 165, 185),  # threshold (LTHR sits inside Z4 at 182)
    ("Z5", 186, 209),  # VO2 max
)
def hr_to_user_zone(hr: float, zones: tuple[tuple[str, int, int], ...] = HR_ZONES_USER) -> str | None:
    """Map a single HR reading to its configured zone label (Z1..Z5).
    Below Z1 floor → None (warmup / walking).
    Above Z5 ceiling → still Z5 (rare, edge of effort).
    """
    if hr is None or pd.isna(hr):
        return None
    if hr < zones[0][1]:  # below Z1 lower bound
        return None
    for label, lo, hi in zones:
        if lo <= hr <= hi:
            return label
    return zones[-1][0]  # above Z5 ceiling
def time_in_zone_from_fit(records: pd.DataFrame,
                          zones: tuple[tuple[str, int, int], ...] = HR_ZONES_USER) -> dict[str, float]:
    """Per-second time-in-zone (seconds) from a FIT records frame.
    Each record contributes `elapsed_s - prev_elapsed_s` to whichever zone its HR
    falls in. Large gaps (>30 s, e.g. a paused recording) are clipped to 30 s
    so a stopped watch doesn't dump hours into one zone.
    """
    if records is None or records.empty or "heart_rate" not in records:
        return {}
    r = records.dropna(subset=["heart_rate", "elapsed_s"]).copy()
    if r.empty:
        return {}
    r["dt"] = r["elapsed_s"].diff().fillna(1.0).clip(lower=0, upper=30.0)
    r["zone"] = r["heart_rate"].apply(lambda h: hr_to_user_zone(h, zones))
    return r.dropna(subset=["zone"]).groupby("zone")["dt"].sum().to_dict()
def time_in_zone_from_splits(splits_df: pd.DataFrame,
                             zones: tuple[tuple[str, int, int], ...] = HR_ZONES_USER) -> dict[str, float]:
    """Fallback when there's no FIT — assign each split's avg HR to one zone.
    Coarser than the per-second method: a split with avg HR 155 contributes its
    entire duration to Z3, even if it was actually 1 min Z2 + 3 min Z3 + 1 min Z4.
    """
    if splits_df is None or splits_df.empty:
        return {}
    s = splits_df.dropna(subset=["avg_hr", "duration_s"]).copy()
    s["zone"] = s["avg_hr"].apply(lambda h: hr_to_user_zone(h, zones))
    return s.dropna(subset=["zone"]).groupby("zone")["duration_s"].sum().to_dict()
def haversine_km(lat1, lon1, lat2, lon2):
"""Vectorised great-circle distance, kilometres. Inputs in degrees."""
import numpy as np
r = 6371.0
lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
dlat = lat2 - lat1
dlon = lon2 - lon1
a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
return 2 * r * np.arcsin(np.sqrt(a))
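A quick sanity check of the haversine formula, written with the standard library only so it runs without NumPy. `haversine_km_scalar` is a hypothetical scalar counterpart of the function above, using the same 6371 km Earth radius.

```python
import math


def haversine_km_scalar(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is radius * pi / 180 kilometres.
one_degree_lat = haversine_km_scalar(0.0, 0.0, 1.0, 0.0)
same_point = haversine_km_scalar(45.0, -75.0, 45.0, -75.0)
print(round(one_degree_lat, 3), same_point)
```

One degree of latitude should come out close to 111.2 km, which is a convenient check when choosing a clustering radius in kilometres.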
def banister(
    daily_load: pd.Series,
    *,
    ctl_tau: float = 42.0,
    atl_tau: float = 7.0,
    start_date: str | pd.Timestamp | None = None,
    end_date: str | pd.Timestamp | None = None,
) -> pd.DataFrame:
    """Banister fitness/fatigue/form (CTL/ATL/TSB) from a daily training-load series.

    `daily_load` should be a Series indexed by date (one value per day, 0 for rest days).
    Missing dates inside the range are filled with 0; a rest day still updates both
    EWMAs (CTL drifts down slowly, ATL drifts down fast → TSB recovers).

    Returns a frame indexed by date with columns CTL, ATL, TSB. The first row's
    TSB is NaN because there is no previous day to take it from.

    Conventions (per Coggan / TrainingPeaks):
        CTL_today = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))
        ATL_today = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))
        TSB_today = CTL_yesterday − ATL_yesterday      # *yesterday's* values

    TSB interpretation:
        < −30          severely fatigued (injury risk)
        −10 to −30     productive overload, the heart of a build block
        −10 to 0       balanced building
        0 to +10       sharpening
        +10 to +25     fresh / peaked   ← race-day target
        > +25          detrained (taper too long)
    """
    import numpy as np

    if daily_load.empty:
        return pd.DataFrame(columns=["CTL", "ATL", "TSB"])

    # Collapse to one value per calendar day, then fill rest days with 0.
    idx = pd.to_datetime(daily_load.index).normalize()
    s = pd.Series(daily_load.values, index=idx).groupby(level=0).sum()
    lo = pd.Timestamp(start_date) if start_date else s.index.min()
    hi = pd.Timestamp(end_date) if end_date else s.index.max()
    full = s.reindex(pd.date_range(lo, hi, freq="D"), fill_value=0.0)

    decay_ctl, decay_atl = np.exp(-1 / ctl_tau), np.exp(-1 / atl_tau)
    w_ctl, w_atl = 1 - decay_ctl, 1 - decay_atl

    n = len(full)
    ctl = np.zeros(n)
    atl = np.zeros(n)
    loads = full.to_numpy()
    for i in range(n):
        prev_ctl = ctl[i - 1] if i else 0.0
        prev_atl = atl[i - 1] if i else 0.0
        ctl[i] = prev_ctl * decay_ctl + loads[i] * w_ctl
        atl[i] = prev_atl * decay_atl + loads[i] * w_atl

    out = pd.DataFrame({"CTL": ctl, "ATL": atl}, index=full.index)
    out["TSB"] = out["CTL"].shift(1) - out["ATL"].shift(1)
    return out

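To make the recursion concrete, here is a minimal pure-Python sketch of the same update rule, run on an invented load pattern (four weeks at a constant load of 100, then one rest week). `banister_lists` is a hypothetical helper for this example; it starts both averages from zero, as the loop above does, and uses `None` where the pandas version yields NaN.

```python
import math


def banister_lists(loads, ctl_tau=42.0, atl_tau=7.0):
    """Return (ctl, atl, tsb) lists for a list of daily loads, starting from zero."""
    decay_ctl = math.exp(-1.0 / ctl_tau)
    decay_atl = math.exp(-1.0 / atl_tau)
    ctl, atl, tsb = [], [], []
    prev_ctl = prev_atl = 0.0
    for day, load in enumerate(loads):
        # Form uses yesterday's fitness and fatigue, so it is undefined on day 0.
        tsb.append(prev_ctl - prev_atl if day else None)
        prev_ctl = prev_ctl * decay_ctl + load * (1.0 - decay_ctl)
        prev_atl = prev_atl * decay_atl + load * (1.0 - decay_atl)
        ctl.append(prev_ctl)
        atl.append(prev_atl)
    return ctl, atl, tsb


# Invented pattern: 28 days at load 100, then 7 rest days.
loads = [100.0] * 28 + [0.0] * 7
ctl, atl, tsb = banister_lists(loads)
print(f"end of block: CTL={ctl[27]:.1f} ATL={atl[27]:.1f}")
print(f"after rest:   CTL={ctl[34]:.1f} ATL={atl[34]:.1f} TSB={tsb[34]:+.1f}")
```

Because the short time constant reacts faster, ATL sits well above CTL at the end of the block (negative TSB) and falls back faster during the rest week, which is the taper effect the TSB bands describe.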
def daily_training_load_series(
conn: sqlite3.Connection,
*,
activity_types: tuple[str, ...] = ("running", "trail_running"),
) -> pd.Series:
"""Daily-summed training_load across the given activity types, in ascending date order."""
placeholders = ",".join(["?"] * len(activity_types))
df = pd.read_sql(
f"""SELECT date(start_time_local) AS d, SUM(training_load) AS tl
FROM activities
WHERE activity_type IN ({placeholders}) AND training_load IS NOT NULL
GROUP BY d ORDER BY d""",
conn,
params=list(activity_types),
parse_dates=["d"],
)
return df.set_index("d")["tl"]
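The query above builds one `?` placeholder per activity type. A self-contained sketch of that pattern against an in-memory database follows; the table is reduced to the three columns the query touches and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (start_time_local TEXT, activity_type TEXT, training_load REAL)"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        ("2025-01-01 07:00:00", "running", 50.0),
        ("2025-01-01 18:00:00", "trail_running", 30.0),
        ("2025-01-02 07:00:00", "cycling", 80.0),   # filtered out by type
        ("2025-01-03 07:00:00", "running", None),   # filtered out: no load recorded
    ],
)

types = ("running", "trail_running")
placeholders = ",".join("?" * len(types))  # "?,?"
rows = conn.execute(
    f"SELECT date(start_time_local) AS d, SUM(training_load) AS tl "
    f"FROM activities WHERE activity_type IN ({placeholders}) "
    f"AND training_load IS NOT NULL GROUP BY d ORDER BY d",
    types,
).fetchall()
print(rows)
```

Only the placeholder string is interpolated into the SQL; the values themselves are still bound as parameters. Days without a qualifying activity simply do not appear in the result, which is why `banister` fills missing dates with 0.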
def _resolve_fit_path(rel_path: str) -> Path:
"""Find a FIT file on disk. `fit_path` in the DB is stored relative to the
export root that was passed to `link_fit_files.py`. We don't know which
top-level folder under the project that was, so try each."""
project_root = Path(__file__).parent
for entry in project_root.iterdir():
if entry.is_dir():
candidate = entry / rel_path
if candidate.exists():
return candidate
# Maybe the path is already absolute or relative to cwd
p = Path(rel_path)
if p.exists():
return p
raise FileNotFoundError(f"could not locate FIT file: {rel_path}")
def load_fit_records(conn: sqlite3.Connection, activity_id: int) -> pd.DataFrame:
"""Per-second FIT `record` messages for one activity as a DataFrame.
Columns (subset of what's present):
timestamp (UTC, tz-aware), elapsed_s, heart_rate, speed_mps, distance_m,
cadence_spm (both legs), altitude_m, power_w, position_lat_deg,
position_long_deg, vertical_oscillation_mm, step_length_mm.
Raises if no FIT is linked for the activity.
"""
import fitparse # heavy-ish import; keep lazy
row = conn.execute(
"SELECT fit_path FROM activity_fit_files WHERE activity_id = ?", (activity_id,)
).fetchone()
if row is None:
raise ValueError(f"no FIT linked for activity {activity_id}")
fit_file = _resolve_fit_path(row[0])
fit = fitparse.FitFile(str(fit_file))
rows: list[dict] = []
for msg in fit.get_messages("record"):
rows.append(msg.get_values())
if not rows:
return pd.DataFrame()
df = pd.DataFrame(rows)
# Normalise & rename
out = pd.DataFrame()
if "timestamp" in df.columns:
out["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
out["elapsed_s"] = (out["timestamp"] - out["timestamp"].iloc[0]).dt.total_seconds()
out["heart_rate"] = df.get("heart_rate")
# Prefer enhanced_speed (always m/s) over the legacy `speed` field
out["speed_mps"] = df.get("enhanced_speed", df.get("speed"))
out["distance_m"] = df.get("distance")
# `cadence` is already both-legs SPM in this account's exports;
# fractional_cadence is a 0–1 fractional adjustment, ignored.
out["cadence_spm"] = df.get("cadence")
out["altitude_m"] = df.get("enhanced_altitude", df.get("altitude"))
out["power_w"] = df.get("power")
out["vertical_oscillation_mm"] = df.get("vertical_oscillation")
out["step_length_mm"] = df.get("step_length")
# Position: semicircles → degrees
SEMI = 180.0 / (2 ** 31)
if "position_lat" in df.columns:
out["position_lat_deg"] = df["position_lat"] * SEMI
if "position_long" in df.columns:
out["position_long_deg"] = df["position_long"] * SEMI
return out
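The position conversion above relies on FIT storing coordinates as signed 32-bit "semicircles", where 2^31 semicircles correspond to 180 degrees. A tiny sketch with easy-to-check values; `semicircles_to_degrees` is a hypothetical helper name.

```python
SEMICIRCLE_TO_DEG = 180.0 / (2 ** 31)


def semicircles_to_degrees(value):
    """Convert a FIT semicircle coordinate to decimal degrees."""
    return value * SEMICIRCLE_TO_DEG


quarter_turn = semicircles_to_degrees(2 ** 30)      # half of the positive range
south_west = semicircles_to_degrees(-(2 ** 29))     # negative values are south / west
print(quarter_turn, south_west)
```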
def fit_decoupling(
    records: pd.DataFrame,
    *,
    segments: int = 2,
    warmup_min: float = 5.0,
    cooldown_min: float = 2.0,
    min_speed_mps: float = 0.5,
) -> pd.DataFrame:
    """Per-second Pa:HR decoupling, following Friel's method.

    Steps:
      1. Drop the first `warmup_min` and last `cooldown_min` of the run.
      2. Drop "stopped" records (speed below `min_speed_mps`) so aid-station
         pauses don't drag mean speed down.
      3. Slice the remaining moving records into `segments` equal-sized chunks
         (equal moving time when the watch records at a fixed rate).
      4. For each chunk: `efficiency = mean(speed_mps) / mean(heart_rate)`.
      5. decoupling % = (efficiency_1 / efficiency_i − 1) × 100, relative to the
         first segment. Positive ⇒ cardiac drift. Negative ⇒ pace/HR improved
         (negative split).

    Returns one row per segment, or an empty frame if too few records remain.
    """
    r = records.dropna(subset=["heart_rate", "speed_mps", "elapsed_s"]).copy()
    if r.empty:
        return pd.DataFrame()

    total = r["elapsed_s"].iloc[-1]
    r = r[(r["elapsed_s"] >= warmup_min * 60) & (r["elapsed_s"] <= total - cooldown_min * 60)]
    moving = r[r["speed_mps"] >= min_speed_mps].copy()
    # Need at least one record per segment, otherwise a chunk below would be empty.
    if len(moving) < segments:
        return pd.DataFrame()
    moving = moving.reset_index(drop=True)

    seg_size = len(moving) // segments
    out_rows: list[dict] = []
    for i in range(segments):
        s = i * seg_size
        # The last segment absorbs any remainder records.
        e = (i + 1) * seg_size if i < segments - 1 else len(moving)
        chunk = moving.iloc[s:e]
        speed = chunk["speed_mps"].mean()
        hr = chunk["heart_rate"].mean()
        out_rows.append(
            {
                "segment": i + 1,
                "from_min": chunk["elapsed_s"].iloc[0] / 60,
                "to_min": chunk["elapsed_s"].iloc[-1] / 60,
                "mean_speed_mps": speed,
                "mean_pace_min_per_km": (1 / speed) * 1000 / 60 if speed else float("nan"),
                "mean_hr": hr,
                "efficiency": speed / hr if hr else float("nan"),
            }
        )

    out = pd.DataFrame(out_rows)
    base = out["efficiency"].iloc[0]
    out["decoupling_pct"] = (base / out["efficiency"] - 1) * 100
    return out

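A worked example of the sign convention used by `decoupling_pct` in the code above (`base / efficiency - 1`), with invented numbers: the same 3.0 m/s pace held at 150 bpm in the first half and 157.5 bpm in the second. The helper names are hypothetical.

```python
def efficiency(mean_speed_mps, mean_hr):
    """Speed produced per heartbeat-per-minute; higher is better."""
    return mean_speed_mps / mean_hr


def decoupling_pct(first_eff, later_eff):
    """Same convention as the code above: positive means HR drifted up relative to pace."""
    return (first_eff / later_eff - 1.0) * 100.0


first = efficiency(3.0, 150.0)
second = efficiency(3.0, 157.5)      # same pace, HR 5 % higher
drift = decoupling_pct(first, second)

# A faster second half at the same HR gives a negative value.
improved = decoupling_pct(first, efficiency(3.1, 150.0))
print(f"drift={drift:.2f}%  improved={improved:.2f}%")
```

Note that this form divides by the later segment's efficiency. Normalising by the first segment's efficiency instead, `(first - later) / first`, gives a slightly smaller number for the same run, so values are only comparable when the same convention is used throughout.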
def fit_rolling_efficiency(records: pd.DataFrame, window_s: int = 300) -> pd.DataFrame:
"""Rolling mean speed/HR (efficiency) and its derived rolling pace + HR.
Useful for plotting when efficiency declines through a race. `window_s`
defaults to 5 minutes — long enough to smooth GPS/HR jitter but short
enough to see drift in the second half.
"""
r = records.dropna(subset=["heart_rate", "speed_mps", "elapsed_s"]).copy()
if r.empty:
return r
r = r.set_index("elapsed_s")
win = f"{window_s}s"
# Rolling needs a DatetimeIndex; build a synthetic one from elapsed_s.
r["_ts"] = pd.to_datetime(r.index, unit="s")
r = r.set_index("_ts")
rolled = pd.DataFrame(index=r.index)
rolled["rolling_speed_mps"] = r["speed_mps"].rolling(win).mean()
rolled["rolling_hr"] = r["heart_rate"].rolling(win).mean()
rolled["rolling_efficiency"] = rolled["rolling_speed_mps"] / rolled["rolling_hr"]
rolled["elapsed_min"] = (rolled.index - rolled.index[0]).total_seconds() / 60
return rolled.reset_index(drop=True)
def cluster_routes(lats, lons, radius_km: float = 0.25):
"""Greedy haversine-radius clustering of run start points.
Assigns each point to the cluster of the first unassigned point within `radius_km`.
Returns an integer label array; -1 means unclustered (no neighbours).
Good enough for a few hundred runs; for thousands, switch to sklearn DBSCAN with metric='haversine'.
"""
import numpy as np
lats = np.asarray(lats, dtype=float)
lons = np.asarray(lons, dtype=float)
n = len(lats)
labels = np.full(n, -1, dtype=int)
next_label = 0
for i in range(n):
if labels[i] != -1:
continue
d = haversine_km(lats[i], lons[i], lats, lons)
neigh = np.where((d <= radius_km) & (labels == -1))[0]
# Require at least 2 runs to count as a "route"; singletons stay -1.
if len(neigh) >= 2:
labels[neigh] = next_label
next_label += 1
return labels
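The greedy pass above can be traced by hand on a few points. This is a plain-Python sketch under the same rules (unassigned neighbours within the radius, at least two points per cluster); the coordinates are invented and both helper names are hypothetical.

```python
import math


def haversine_km_scalar(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def greedy_clusters(points, radius_km=0.25):
    """points is a list of (lat, lon) in degrees; returns one integer label per point."""
    labels = [-1] * len(points)
    next_label = 0
    for i, (lat_i, lon_i) in enumerate(points):
        if labels[i] != -1:
            continue
        # Unassigned points within the radius of point i (always includes i itself).
        neigh = [
            j for j, (lat_j, lon_j) in enumerate(points)
            if labels[j] == -1
            and haversine_km_scalar(lat_i, lon_i, lat_j, lon_j) <= radius_km
        ]
        if len(neigh) >= 2:  # singletons stay unclustered
            for j in neigh:
                labels[j] = next_label
            next_label += 1
    return labels


starts = [
    (45.0000, -75.0000),
    (45.0010, -75.0000),  # roughly 0.11 km north of the first point
    (46.0000, -75.0000),  # roughly 111 km away
    (45.0005, -75.0000),
]
labels = greedy_clusters(starts)
print(labels)
```

Because each cluster is seeded by the first unassigned point, the result can depend on input order when points sit near the edge of two candidate clusters.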

30
auth.py

@@ -1,31 +1,5 @@
"""One-time login to Garmin Connect; saves OAuth tokens to .secrets/.
Usage:
uv run auth.py
Prompts for email + password (and MFA code if your account has 2FA).
Tokens are refreshed automatically by sync.py on subsequent runs
(valid ~1 year).
"""
import getpass
import os
from pathlib import Path
import garth
TOKEN_DIR = Path(__file__).parent / ".secrets"
def main() -> None:
TOKEN_DIR.mkdir(exist_ok=True)
email = os.environ.get("GARMIN_EMAIL") or input("Garmin email: ").strip()
password = os.environ.get("GARMIN_PASSWORD") or getpass.getpass("Garmin password: ")
garth.login(email, password)
garth.save(TOKEN_DIR)
profile = garth.client.username
print(f"Logged in as {profile}. Tokens saved to {TOKEN_DIR}.")
"""Shim — see openrun.ingest.auth."""
from openrun.ingest.auth import main
if __name__ == "__main__":
main()


@@ -1,134 +1,5 @@
"""Precompute per-activity time-in-zone and cache it in `activity_time_in_zone`.
For each activity, prefer the linked FIT file (per-second HR) over splits
(lap-averaged HR). Skips activities already in the cache unless --force.
Usage:
uv run compute_time_in_zone.py # incremental
uv run compute_time_in_zone.py --force # recompute all
uv run compute_time_in_zone.py --type running # restrict by activity type
The split (lap-averaged) fallback is intentionally noisy compared to FIT —
re-running with new FITs is the fix, not retuning the lap method.
"""
from __future__ import annotations
import argparse
import sys
import time
from pathlib import Path
import pandas as pd
from analysis import (
HR_ZONES_USER,
load_fit_records,
time_in_zone_from_fit,
time_in_zone_from_splits,
)
from db import connect
_ZONE_LABELS = tuple(z[0] for z in HR_ZONES_USER)
def _upsert(conn, activity_id: int, by_zone: dict[str, float], source: str) -> None:
z = {lab: by_zone.get(lab, 0.0) for lab in _ZONE_LABELS}
total = sum(z.values())
conn.execute(
"""INSERT OR REPLACE INTO activity_time_in_zone
(activity_id, z1_s, z2_s, z3_s, z4_s, z5_s, total_s, source, computed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))""",
(activity_id, z["Z1"], z["Z2"], z["Z3"], z["Z4"], z["Z5"], total, source),
)
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--force", action="store_true", help="Recompute even if cached")
parser.add_argument("--type", default=None, help="Restrict to one activity_type (e.g. running)")
parser.add_argument("--limit", type=int, default=None, help="Stop after N activities (debugging)")
args = parser.parse_args()
conn = connect()
# Pick targets: all activities, optionally filtered by type, minus already-cached
sql = "SELECT a.activity_id, a.activity_type FROM activities a"
params: list = []
where: list[str] = []
if args.type:
where.append("a.activity_type = ?")
params.append(args.type)
if not args.force:
where.append(
"a.activity_id NOT IN (SELECT activity_id FROM activity_time_in_zone)"
)
if where:
sql += " WHERE " + " AND ".join(where)
sql += " ORDER BY a.start_time_local DESC"
if args.limit:
sql += f" LIMIT {args.limit}"
targets = list(conn.execute(sql, params))
print(f"{len(targets):,} activities to compute "
f"({'force' if args.force else 'incremental'}"
+ (f", type={args.type}" if args.type else "")
+ ")")
if not targets:
return
# Pre-load all splits in one query for the lap-fallback path
splits_df = pd.read_sql(
"SELECT activity_id, avg_hr, duration_s FROM activity_splits",
conn,
)
splits_by_id = {aid: g for aid, g in splits_df.groupby("activity_id")}
fit_count = lap_count = empty = errors = 0
t0 = time.time()
for i, row in enumerate(targets, 1):
aid = row["activity_id"]
has_fit = conn.execute(
"SELECT 1 FROM activity_fit_files WHERE activity_id = ? LIMIT 1", (aid,)
).fetchone() is not None
try:
if has_fit:
rec = load_fit_records(conn, aid)
tiz = time_in_zone_from_fit(rec)
if tiz:
_upsert(conn, aid, tiz, "fit")
fit_count += 1
if i % 25 == 0:
rate = i / max(time.time() - t0, 1e-6)
eta = (len(targets) - i) / rate
print(f" {i}/{len(targets)} fit={fit_count} lap={lap_count} empty={empty} "
f"rate={rate:.1f}/s eta={eta:.0f}s")
continue
# Fallback: lap-averaged
s = splits_by_id.get(aid)
if s is not None and not s.empty:
tiz = time_in_zone_from_splits(s)
if tiz:
_upsert(conn, aid, tiz, "lap")
lap_count += 1
continue
empty += 1
except Exception as exc: # noqa: BLE001
errors += 1
print(f" ! aid={aid}: {type(exc).__name__}: {exc}", file=sys.stderr)
if i % 50 == 0:
conn.commit()
conn.commit()
print()
print("=== summary ===")
print(f" from FIT : {fit_count:,}")
print(f" from laps : {lap_count:,}")
print(f" empty : {empty:,} (no FIT, no useful splits)")
print(f" errors : {errors:,}")
print(f" elapsed : {time.time() - t0:.1f} s")
"""Shim — see openrun.ingest.time_in_zone."""
from openrun.ingest.time_in_zone import main
if __name__ == "__main__":
main()

164
db.py

@@ -1,162 +1,2 @@
"""SQLite schema + connection helpers."""
from __future__ import annotations
import sqlite3
from pathlib import Path
DB_PATH = Path(__file__).parent / "data" / "garmin.db"
SCHEMA = """
CREATE TABLE IF NOT EXISTS activities (
activity_id INTEGER PRIMARY KEY,
start_time_local TEXT,
start_time_gmt TEXT,
activity_type TEXT,
activity_name TEXT,
distance_m REAL,
duration_s REAL,
moving_duration_s REAL,
avg_speed_mps REAL,
max_speed_mps REAL,
avg_hr REAL,
max_hr REAL,
calories REAL,
elevation_gain_m REAL,
elevation_loss_m REAL,
training_load REAL,
aerobic_te REAL,
anaerobic_te REAL,
vo2_max REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_activities_start ON activities(start_time_local);
CREATE INDEX IF NOT EXISTS idx_activities_type ON activities(activity_type);
CREATE TABLE IF NOT EXISTS activity_splits (
activity_id INTEGER NOT NULL,
split_index INTEGER NOT NULL,
distance_m REAL,
duration_s REAL,
avg_hr REAL,
avg_speed_mps REAL,
elevation_gain_m REAL,
raw TEXT NOT NULL,
PRIMARY KEY (activity_id, split_index),
FOREIGN KEY (activity_id) REFERENCES activities(activity_id)
);
CREATE TABLE IF NOT EXISTS daily_steps (
calendar_date TEXT PRIMARY KEY,
total_steps INTEGER,
step_goal INTEGER,
distance_m REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_sleep (
calendar_date TEXT PRIMARY KEY,
sleep_start_gmt TEXT,
sleep_end_gmt TEXT,
deep_s REAL,
light_s REAL,
rem_s REAL,
awake_s REAL,
sleep_score REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_stress (
calendar_date TEXT PRIMARY KEY,
avg_stress REAL,
max_stress REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_hrv (
calendar_date TEXT PRIMARY KEY,
weekly_avg REAL,
last_night_avg REAL,
last_night_5min REAL,
status TEXT,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_body_battery (
calendar_date TEXT PRIMARY KEY,
charged INTEGER,
drained INTEGER,
highest INTEGER,
lowest INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_intensity_minutes (
calendar_date TEXT PRIMARY KEY,
moderate_minutes INTEGER,
vigorous_minutes INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_resting_hr (
calendar_date TEXT PRIMARY KEY,
resting_hr INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS activity_fit_files (
activity_id INTEGER PRIMARY KEY,
fit_path TEXT NOT NULL,
indexed_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS activity_time_in_zone (
activity_id INTEGER PRIMARY KEY,
z1_s REAL,
z2_s REAL,
z3_s REAL,
z4_s REAL,
z5_s REAL,
total_s REAL,
source TEXT,
computed_at TEXT NOT NULL,
FOREIGN KEY (activity_id) REFERENCES activities(activity_id)
);
CREATE TABLE IF NOT EXISTS sync_state (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
updated_at TEXT NOT NULL
);
"""
def connect() -> sqlite3.Connection:
DB_PATH.parent.mkdir(exist_ok=True)
conn = sqlite3.connect(DB_PATH)
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA foreign_keys = ON")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
return conn
def set_state(conn: sqlite3.Connection, key: str, value: str) -> None:
conn.execute(
"INSERT INTO sync_state(key, value, updated_at) VALUES (?, ?, datetime('now')) "
"ON CONFLICT(key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at",
(key, value),
)
def get_state(conn: sqlite3.Connection, key: str) -> str | None:
row = conn.execute("SELECT value FROM sync_state WHERE key = ?", (key,)).fetchone()
return row["value"] if row else None
"""Back-compat shim — see openrun.db."""
from openrun.db import SCHEMA, connect, get_state, set_state # noqa: F401
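The `set_state` helper in the removed `db.py` relies on SQLite's `ON CONFLICT ... DO UPDATE` upsert. The sketch below reproduces that statement against an in-memory database to show that a second write replaces the value instead of adding a row. The key name `last_sync` is invented, and the upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sync_state ("
    "key TEXT PRIMARY KEY, value TEXT NOT NULL, updated_at TEXT NOT NULL)"
)


def set_state(conn, key, value):
    """Insert the key, or overwrite its value and timestamp if it already exists."""
    conn.execute(
        "INSERT INTO sync_state(key, value, updated_at) VALUES (?, ?, datetime('now')) "
        "ON CONFLICT(key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at",
        (key, value),
    )


set_state(conn, "last_sync", "2025-01-01")  # hypothetical key, for illustration
set_state(conn, "last_sync", "2025-01-02")
rows = conn.execute("SELECT key, value FROM sync_state").fetchall()
print(rows)
```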


@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 01 Overview\n",
"# 01 \u2014 Overview\n",
"\n",
"Sanity-check the synced data: what's there, how complete it is, and what you've been up to recently.\n",
"\n",
@@ -13,7 +13,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
@@ -22,7 +22,7 @@
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import open_conn, load_activities, load_wellness\n",
"from openrun import open_conn, load_activities, load_wellness\n",
"\n",
"pd.options.display.float_format = '{:.2f}'.format\n",
"conn = open_conn()"
@@ -514,7 +514,7 @@
"ax.set_xticklabels(['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])\n",
"ax.set_ylabel('km')\n",
"ax.set_xlabel('')\n",
"ax.set_title('Monthly running km year over year')\n",
"ax.set_title('Monthly running km \u2014 year over year')\n",
"ax.legend(title='Year')\n",
"ax.grid(alpha=0.3)\n",
"plt.tight_layout()"
@@ -628,4 +628,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}


@@ -4,30 +4,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 02 Running\n",
"# 02 \u2014 Running\n",
"\n",
"Volume, pace, HR efficiency, training load."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"331 runs from 2022-04-09 to 2026-05-10\n"
]
}
],
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import open_conn, load_activities\n",
"from openrun import open_conn, load_activities\n",
"\n",
"conn = open_conn()\n",
"runs = load_activities(conn, type='running')\n",
@@ -103,7 +94,7 @@
" ax.scatter(g['avg_hr'], g['pace_min_per_km'], s=g['distance_km'] * 6, alpha=0.5, label=str(yr))\n",
"ax.invert_yaxis() # faster pace = smaller number, want top\n",
"ax.set_xlabel('Avg HR (bpm)')\n",
"ax.set_ylabel('Pace (min/km, faster ↑)')\n",
"ax.set_ylabel('Pace (min/km, faster \u2191)')\n",
"ax.set_title('Pace vs. HR by run, sized by distance')\n",
"ax.legend(title='Year')\n",
"ax.grid(alpha=0.3)\n",
@@ -147,10 +138,10 @@
"monthly_easy = easy.set_index('start_time_local')['pace_min_per_km'].resample('ME').median()\n",
"\n",
"fig, ax = plt.subplots(figsize=(13, 4))\n",
"ax.scatter(easy['start_time_local'], easy['pace_min_per_km'], alpha=0.3, s=easy['distance_km']*5, label='runs (HR<150, ≥3km)')\n",
"ax.scatter(easy['start_time_local'], easy['pace_min_per_km'], alpha=0.3, s=easy['distance_km']*5, label='runs (HR<150, \u22653km)')\n",
"monthly_easy.plot(ax=ax, color='C3', lw=2, marker='o', label='Monthly median')\n",
"ax.invert_yaxis()\n",
"ax.set_ylabel('Pace (min/km, faster ↑)')\n",
"ax.set_ylabel('Pace (min/km, faster \u2191)')\n",
"ax.set_title('Easy-pace trend (proxy for aerobic fitness)')\n",
"ax.legend()\n",
"ax.grid(alpha=0.3)\n",
@@ -160,14 +151,49 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Training load Banister CTL / ATL / TSB\n\nGarmin reports a per-activity training load. The standard endurance-training lens (TrainingPeaks \"Performance Management Chart\") tracks three derived numbers:\n\n- **CTL** *Chronic Training Load* EWMA of daily load with τ = 42 days. **Fitness.**\n- **ATL** *Acute Training Load* same EWMA with τ = 7 days. **Fatigue.**\n- **TSB** *Training Stress Balance* yesterday's CTL minus yesterday's ATL. **Form.**\n\nTSB interpretation:\n\n| TSB | meaning |\n|---|---|\n| < 30 | severely fatigued (injury risk) |\n| 10 to 30 | productive overload heart of a build |\n| 10 to 0 | balanced building |\n| 0 to +10 | sharpening |\n| **+10 to +25** | **fresh / peaked race-day target** |\n| > +25 | detrained (taper too long) |\n\nThis replaces the older 7/28-day rolling ACWR plot same data, EWMAs are smoother and TSB gives you race-day-readiness directly."
"source": "## Training load \u2014 Banister CTL / ATL / TSB\n\nGarmin reports a per-activity training load. The standard endurance-training lens (TrainingPeaks \"Performance Management Chart\") tracks three derived numbers:\n\n- **CTL** *Chronic Training Load* \u2014 EWMA of daily load with \u03c4 = 42 days. **Fitness.**\n- **ATL** *Acute Training Load* \u2014 same EWMA with \u03c4 = 7 days. **Fatigue.**\n- **TSB** *Training Stress Balance* \u2014 yesterday's CTL minus yesterday's ATL. **Form.**\n\nTSB interpretation:\n\n| TSB | meaning |\n|---|---|\n| < \u221230 | severely fatigued (injury risk) |\n| \u221210 to \u221230 | productive overload \u2014 heart of a build |\n| \u221210 to 0 | balanced building |\n| 0 to +10 | sharpening |\n| **+10 to +25** | **fresh / peaked \u2014 race-day target** |\n| > +25 | detrained (taper too long) |\n\nThis replaces the older 7/28-day rolling ACWR plot \u2014 same data, EWMAs are smoother and TSB gives you race-day-readiness directly."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "from analysis import banister, daily_training_load_series\n\ntl = daily_training_load_series(conn)\npmc = banister(tl)\n\nfig, (ax1, ax2) = plt.subplots(2, 1, figsize=(13, 7), sharex=True)\n\n# top panel: fitness + fatigue\nax1.plot(pmc.index, pmc['CTL'], color='#2a9d8f', lw=2, label='CTL (fitness, 42d)')\nax1.plot(pmc.index, pmc['ATL'], color='#e76f51', lw=1.2, alpha=0.85, label='ATL (fatigue, 7d)')\nax1.set_ylabel('training load')\nax1.legend(loc='upper left')\nax1.grid(alpha=0.3)\nax1.set_title('Performance Management Chart — CTL / ATL / TSB')\n\n# bottom panel: form\nax2.plot(pmc.index, pmc['TSB'], color='#264653', lw=1.5)\nax2.axhspan(10, 25, color='#2a9d8f', alpha=0.12, label='race-ready (+10 to +25)')\nax2.axhspan(-30, -10, color='#e9c46a', alpha=0.12, label='productive overload (30 to 10)')\nax2.axhline(0, color='gray', lw=0.6)\nax2.axhline(-30, color='#e76f51', ls='--', lw=0.8, label='injury-risk floor (30)')\nax2.set_ylabel('TSB (form)')\nax2.legend(loc='lower left', fontsize=9)\nax2.grid(alpha=0.3)\n\n# annotate prior race days\nrace_dates = pd.to_datetime(['2023-09-23', '2024-09-21', '2025-09-06', '2025-09-20'])\nfor rd in race_dates:\n if rd in pmc.index:\n ax2.axvline(rd, color='#d62828', alpha=0.4, lw=1)\n ax2.annotate(f\"{rd.strftime('%Y-%m-%d')}\\nTSB={pmc.loc[rd, 'TSB']:+.0f}\",\n xy=(rd, pmc.loc[rd, 'TSB']), xytext=(5, 8),\n textcoords='offset points', fontsize=8, color='#d62828')\nplt.tight_layout()"
"source": [
"from openrun import banister, daily_training_load_series\n",
"\n",
"tl = daily_training_load_series(conn)\n",
"pmc = banister(tl)\n",
"\n",
"fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(13, 7), sharex=True)\n",
"\n",
"# top panel: fitness + fatigue\n",
"ax1.plot(pmc.index, pmc['CTL'], color='#2a9d8f', lw=2, label='CTL (fitness, 42d)')\n",
"ax1.plot(pmc.index, pmc['ATL'], color='#e76f51', lw=1.2, alpha=0.85, label='ATL (fatigue, 7d)')\n",
"ax1.set_ylabel('training load')\n",
"ax1.legend(loc='upper left')\n",
"ax1.grid(alpha=0.3)\n",
"ax1.set_title('Performance Management Chart \u2014 CTL / ATL / TSB')\n",
"\n",
"# bottom panel: form\n",
"ax2.plot(pmc.index, pmc['TSB'], color='#264653', lw=1.5)\n",
"ax2.axhspan(10, 25, color='#2a9d8f', alpha=0.12, label='race-ready (+10 to +25)')\n",
"ax2.axhspan(-30, -10, color='#e9c46a', alpha=0.12, label='productive overload (\u221230 to \u221210)')\n",
"ax2.axhline(0, color='gray', lw=0.6)\n",
"ax2.axhline(-30, color='#e76f51', ls='--', lw=0.8, label='injury-risk floor (\u221230)')\n",
"ax2.set_ylabel('TSB (form)')\n",
"ax2.legend(loc='lower left', fontsize=9)\n",
"ax2.grid(alpha=0.3)\n",
"\n",
"# annotate prior race days\n",
"race_dates = pd.to_datetime(['2023-09-23', '2024-09-21', '2025-09-06', '2025-09-20'])\n",
"for rd in race_dates:\n",
" if rd in pmc.index:\n",
" ax2.axvline(rd, color='#d62828', alpha=0.4, lw=1)\n",
" ax2.annotate(f\"{rd.strftime('%Y-%m-%d')}\\nTSB={pmc.loc[rd, 'TSB']:+.0f}\",\n",
" xy=(rd, pmc.loc[rd, 'TSB']), xytext=(5, 8),\n",
" textcoords='offset points', fontsize=8, color='#d62828')\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",


@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 03 Recovery & wellness\n",
"# 03 \u2014 Recovery & wellness\n",
"\n",
"Sleep, HRV, RHR, body battery, and how they relate to training load."
]
@@ -16,23 +16,22 @@
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import open_conn, load_wellness, joined\n",
"from openrun import open_conn, load_wellness, joined\n",
"\n",
"conn = open_conn()\n",
"w = load_wellness(conn)\n",
"j = joined(conn)\n",
"print(f'{len(w)} days, {w.index.min().date()} → {w.index.max().date()}')"
"print(f'{len(w)} days, {w.index.min().date()} \u2192 {w.index.max().date()}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Recent 30 days at a glance"
"## Recent 30 days \u2014 at a glance"
]
},
{
@@ -74,7 +73,7 @@
"fig, ax = plt.subplots(figsize=(13, 4))\n",
"stages.tail(60).plot.area(ax=ax, alpha=0.7, color=['#1f3a93','#6dd5fa','#9b59b6','#e74c3c'])\n",
"ax.set_ylabel('Hours')\n",
"ax.set_title('Sleep stages last 60 nights')\n",
"ax.set_title('Sleep stages \u2014 last 60 nights')\n",
"ax.grid(alpha=0.3)\n",
"plt.tight_layout()"
]
@@ -185,9 +184,15 @@
}
],
"metadata": {
"kernelspec": {"display_name": ".venv", "language": "python", "name": "python3"},
"language_info": {"name": "python"}
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
}


@@ -4,41 +4,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 04 Are you running more efficiently?\n",
"# 04 \u2014 Are you running more efficiently?\n",
"\n",
"Heart rate and speed as a function of distance, year over year.\n",
"\n",
"## Method\n",
"\n",
"**Efficiency metric:** *meters per heartbeat* = `distance_m / (duration_min × avg_HR)`.\n",
"**Efficiency metric:** *meters per heartbeat* = `distance_m / (duration_min \u00d7 avg_HR)`.\n",
"Higher = more forward motion produced per beat = more aerobically efficient.\n",
"\n",
"**Controlling for confounders:**\n",
"- Distance: longer runs naturally drift HR up and pace down → we bucket by distance.\n",
"- Effort type: hard intervals are not comparable to easy aerobic runs → we report an *all runs* view and an *easy runs only* (HR < 155) view.\n",
"- Distance: longer runs naturally drift HR up and pace down \u2192 we bucket by distance.\n",
"- Effort type: hard intervals are not comparable to easy aerobic runs \u2192 we report an *all runs* view and an *easy runs only* (HR < 155) view.\n",
"- Activity type: filtered to `running` (excludes trail, cycling, etc.)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"322 runs with HR + pace, 2022-2026\n"
]
}
],
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import open_conn, load_activities\n",
"from openrun import open_conn, load_activities\n",
"\n",
"pd.options.display.float_format = '{:.2f}'.format\n",
"conn = open_conn()\n",
@@ -178,10 +169,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Yearly summary (runs ≥ 5 km)\n",
"## Yearly summary (runs \u2265 5 km)\n",
"\n",
"If `m/beat` is *increasing* year over year → more efficient.\n",
"If it's *decreasing* with HR roughly constant → losing fitness *or* something is off (sensor, conditions)."
"If `m/beat` is *increasing* year over year \u2192 more efficient.\n",
"If it's *decreasing* with HR roughly constant \u2192 losing fitness *or* something is off (sensor, conditions)."
]
},
{
@@ -301,7 +292,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Meters per heartbeat the headline efficiency view\n",
"## Meters per heartbeat \u2014 the headline efficiency view\n",
"\n",
"One line per year. Each line shows the median efficiency at each distance bucket."
]
@@ -371,7 +362,7 @@
"\n",
"pace_pv.plot(ax=ax2, marker='o')\n",
"ax2.set_title('Median pace by distance')\n",
"ax2.set_ylabel('min/km (faster ↓)'); ax2.grid(alpha=0.3); ax2.legend(title='Year')\n",
"ax2.set_ylabel('min/km (faster \u2193)'); ax2.grid(alpha=0.3); ax2.legend(title='Year')\n",
"plt.tight_layout()"
]
},
@@ -379,7 +370,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pace vs HR scatter every run, colored by year\n",
"## Pace vs HR scatter \u2014 every run, colored by year\n",
"\n",
"If you're getting more efficient, points should drift **right and up** (faster pace at same HR) over time."
]
@@ -407,7 +398,7 @@
" alpha=0.5, label=str(yr))\n",
"ax.invert_yaxis()\n",
"ax.set_xlabel('Avg HR (bpm)')\n",
"ax.set_ylabel('Pace (min/km, faster ↑)')\n",
"ax.set_ylabel('Pace (min/km, faster \u2191)')\n",
"ax.set_title('Every run, sized by distance, colored by year')\n",
"ax.legend(title='Year')\n",
"ax.grid(alpha=0.3)\n",
@@ -418,9 +409,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Easy runs only (HR < 155, ≥5km)\n",
"## Easy runs only (HR < 155, \u22655km)\n",
"\n",
"This is the cleanest comparison aerobic baseline pace at a given heart-rate ceiling.\n",
"This is the cleanest comparison \u2014 aerobic baseline pace at a given heart-rate ceiling.\n",
"Hard sessions vary a lot; easy runs are more reproducible."
]
},
@@ -563,7 +554,7 @@
"ax.scatter(easy['start_time_local'], easy['mpb'], alpha=0.4, s=easy['distance_km']*4, label='Run (sized by km)')\n",
"monthly.plot(ax=ax, color='C3', lw=2, marker='o', label='Monthly median')\n",
"ax.set_ylabel('Meters per heartbeat')\n",
"ax.set_title(f'Easy-run efficiency over time (HR < {HR_CEIL}, ≥5km)')\n",
"ax.set_title(f'Easy-run efficiency over time (HR < {HR_CEIL}, \u22655km)')\n",
"ax.legend()\n",
"ax.grid(alpha=0.3)\n",
"plt.tight_layout()"
@@ -573,7 +564,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Curve fits HR and pace as a function of distance, per year\n",
"## Curve fits \u2014 HR and pace as a function of distance, per year\n",
"\n",
"Smooth curves instead of bucketed bars, so you can see the shape clearly."
]
@@ -615,7 +606,7 @@
"\n",
"ax2.set_xscale('log')\n",
"ax2.invert_yaxis()\n",
"ax2.set_xlabel('Distance (km, log)'); ax2.set_ylabel('Pace (min/km, faster ↑)')\n",
"ax2.set_xlabel('Distance (km, log)'); ax2.set_ylabel('Pace (min/km, faster \u2191)')\n",
"ax2.set_title('Pace vs distance, per year'); ax2.legend(title='Year'); ax2.grid(alpha=0.3)\n",
"plt.tight_layout()"
]
@@ -771,4 +762,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -18,11 +18,10 @@
"outputs": [],
"source": [
"import sys\n",
"sys.path.insert(0, '..')\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import (\n",
"from openrun import (\n",
" open_conn, load_splits, decoupling,\n",
" assign_hr_zone, cluster_routes, haversine_km,\n",
")\n",
@@ -166,7 +165,7 @@
"metadata": {},
"outputs": [],
"source": [
"from analysis import load_fit_records, fit_decoupling, fit_rolling_efficiency\n",
"from openrun import load_fit_records, fit_decoupling, fit_rolling_efficiency\n",
"\n",
"race_meta = pd.read_sql('''\n",
" SELECT a.activity_id, a.start_time_local, a.distance_m/1000 AS km, a.avg_hr\n",
@@ -627,7 +626,7 @@
"metadata": {},
"outputs": [],
"source": [
"from analysis import HR_ZONES_USER\n",
"from openrun import HR_ZONES_USER\n",
"\n",
"tiz = pd.read_sql('''\n",
" SELECT t.activity_id, t.z1_s, t.z2_s, t.z3_s, t.z4_s, t.z5_s, t.total_s, t.source,\n",
@@ -660,7 +659,7 @@
"metadata": {},
"outputs": [],
"source": [
"from analysis import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits\n",
"from openrun import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits\n",
"\n",
"race_aid = race_meta.iloc[-1]['activity_id'] # most recent 50K (2025-09-20)\n",
"race_date = race_meta.iloc[-1]['start_time_local'].date()\n",

View File

@@ -25,11 +25,10 @@
"metadata": {},
"outputs": [],
"source": [
"import sys; sys.path.insert(0, '..')\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from analysis import open_conn, load_activities\n",
"from openrun import open_conn, load_activities\n",
"\n",
"PLAN_START = pd.Timestamp('2026-05-18')\n",
"RACES = {\n",
@@ -378,7 +377,7 @@
"metadata": {},
"outputs": [],
"source": [
"from analysis import banister, daily_training_load_series\n",
"from openrun import banister, daily_training_load_series\n",
"\n",
"# Historical daily load (running + trail) up to today\n",
"hist = daily_training_load_series(conn)\n",

View File

@@ -46,7 +46,7 @@ cells.append(code(
"import numpy as np",
"import pandas as pd",
"import matplotlib.pyplot as plt",
"from analysis import (",
"from openrun import (",
" open_conn, load_splits, decoupling,",
" assign_hr_zone, cluster_routes, haversine_km,",
")",
@@ -159,7 +159,7 @@ cells.append(md(
))
cells.append(code(
"from analysis import load_fit_records, fit_decoupling, fit_rolling_efficiency",
"from openrun import load_fit_records, fit_decoupling, fit_rolling_efficiency",
"",
"race_meta = pd.read_sql('''",
" SELECT a.activity_id, a.start_time_local, a.distance_m/1000 AS km, a.avg_hr",
@@ -524,7 +524,7 @@ cells.append(md(
))
cells.append(code(
"from analysis import HR_ZONES_USER",
"from openrun import HR_ZONES_USER",
"",
"tiz = pd.read_sql('''",
" SELECT t.activity_id, t.z1_s, t.z2_s, t.z3_s, t.z4_s, t.z5_s, t.total_s, t.source,",
@@ -549,7 +549,7 @@ cells.append(md(
))
cells.append(code(
"from analysis import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits",
"from openrun import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits",
"",
"race_aid = race_meta.iloc[-1]['activity_id'] # most recent 50K (2025-09-20)",
"race_date = race_meta.iloc[-1]['start_time_local'].date()",

View File

@@ -54,7 +54,7 @@ cells.append(code(
"import numpy as np",
"import pandas as pd",
"import matplotlib.pyplot as plt",
"from analysis import open_conn, load_activities",
"from openrun import open_conn, load_activities",
"",
"PLAN_START = pd.Timestamp('2026-05-18')",
"RACES = {",
@@ -354,7 +354,7 @@ cells.append(md(
))
cells.append(code(
"from analysis import banister, daily_training_load_series",
"from openrun import banister, daily_training_load_series",
"",
"# Historical daily load (running + trail) up to today",
"hist = daily_training_load_series(conn)",

View File

@@ -17,14 +17,11 @@
"metadata": {},
"outputs": [],
"source": [
"import sqlite3\n",
"from pathlib import Path\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from openrun import open_conn\n",
"\n",
"DB = Path('..') / 'data' / 'garmin.db'\n",
"conn = sqlite3.connect(DB)\n",
"conn = open_conn()\n",
"\n",
"# What tables do we have, and how many rows in each?\n",
"tables = pd.read_sql(\"SELECT name FROM sqlite_master WHERE type='table' ORDER BY name\", conn)\n",
@@ -37,7 +34,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Activities load into pandas"
"## Activities \u2014 load into pandas"
]
},
{
@@ -99,7 +96,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sleep, stress, HRV daily timeline"
"## Sleep, stress, HRV \u2014 daily timeline"
]
},
{
@@ -178,4 +175,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}

View File

@@ -1,498 +1,5 @@
"""Ingest a Garmin Connect data export (zip or unzipped directory) into SQLite.
Usage:
uv run ingest_export.py path/to/export.zip
uv run ingest_export.py path/to/unzipped_export_dir/
Garmin's export contains a tree like:
DI_CONNECT/
DI-Connect-Fitness/ # activity JSONs + .fit files
DI-Connect-Wellness/ # daily wellness JSONs
DI-Connect-Aggregator/ # rolled-up summaries
DI-Connect-User/ # profile
...
File names vary by account / export date. We dispatch on filename substrings
and log anything unrecognized so you can tell us about new shapes.
"""
from __future__ import annotations
import argparse
import json
import sqlite3
import sys
import tempfile
import zipfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Iterable
from db import connect, set_state
# ---------------------------------------------------------------------------
# helpers
# ---------------------------------------------------------------------------
def _load_json(path: Path) -> Any:
try:
with path.open() as fh:
return json.load(fh)
except (OSError, json.JSONDecodeError) as exc:
print(f" ! failed to read {path.name}: {exc}", file=sys.stderr)
return None
def _as_list(payload: Any) -> list[dict]:
"""Garmin JSON files are sometimes a list, sometimes a {"key": [...]} envelope."""
if isinstance(payload, list):
return [x for x in payload if isinstance(x, dict)]
if isinstance(payload, dict):
for v in payload.values():
if isinstance(v, list):
return [x for x in v if isinstance(x, dict)]
return [payload]
return []
def _dump(obj: Any) -> str:
return json.dumps(obj, separators=(",", ":"), default=str)
def _first(d: dict, *keys: str) -> Any:
"""Pull the first non-null value among candidate keys (Garmin renames fields between exports)."""
for k in keys:
if k in d and d[k] is not None:
return d[k]
return None
def _date_key(d: dict) -> str | None:
"""Find the calendar date in a daily-stats record, normalized to ISO."""
raw = _first(d, "calendarDate", "date", "summaryDate", "statisticsStartDate")
if not raw:
return None
if isinstance(raw, str):
return raw[:10]
return str(raw)[:10]
# ---------------------------------------------------------------------------
# handlers — one per data category
# ---------------------------------------------------------------------------
def handle_activities(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
# summarizedActivities format: [{"summarizedActivitiesExport": [{...}, {...}]}]
if isinstance(payload, list) and payload and isinstance(payload[0], dict):
if "summarizedActivitiesExport" in payload[0]:
items = payload[0]["summarizedActivitiesExport"]
else:
items = payload
elif isinstance(payload, dict):
items = _as_list(payload)
else:
items = []
# The Garmin Takeout `summarizedActivities` export uses scaled integer units:
# distance cm → m (÷100)
# duration ms → s (÷1000)
# elevation cm → m (÷100)
# speed m/s ÷ 10 → m/s (×10)
# The live API (`sync.py`) returns these in SI directly, so we only convert
# when the source is this export and the raw values are present in the scaled form.
def _scale(v, factor):
return None if v is None else v * factor
n = 0
for raw in items:
aid = _first(raw, "activityId", "activityIdLocal")
if aid is None:
continue
atype = raw.get("activityType")
if isinstance(atype, dict):
type_key = atype.get("typeKey")
else:
type_key = atype # sometimes a plain string in exports
conn.execute(
"""
INSERT INTO activities (
activity_id, start_time_local, start_time_gmt, activity_type,
activity_name, distance_m, duration_s, moving_duration_s,
avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
elevation_gain_m, elevation_loss_m, training_load,
aerobic_te, anaerobic_te, vo2_max, raw, fetched_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))
ON CONFLICT(activity_id) DO UPDATE SET
activity_name=excluded.activity_name,
raw=excluded.raw,
fetched_at=excluded.fetched_at
""",
(
aid,
_first(raw, "startTimeLocal", "beginTimestamp"),
_first(raw, "startTimeGmt", "startTimeGMT"),
type_key,
_first(raw, "activityName", "name"),
_scale(_first(raw, "distance"), 0.01),
_scale(_first(raw, "duration"), 0.001),
_scale(_first(raw, "movingDuration"), 0.001),
_scale(_first(raw, "averageSpeed", "avgSpeed"), 10.0),
_scale(_first(raw, "maxSpeed"), 10.0),
_first(raw, "averageHR", "avgHr"),
_first(raw, "maxHR", "maxHr"),
_first(raw, "calories"),
_scale(_first(raw, "elevationGain"), 0.01),
_scale(_first(raw, "elevationLoss"), 0.01),
_first(raw, "activityTrainingLoad"),
_first(raw, "aerobicTrainingEffect"),
_first(raw, "anaerobicTrainingEffect"),
_first(raw, "vO2MaxValue"),
_dump(raw),
),
)
n += 1
conn.commit()
return n
def handle_sleep(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_sleep
(calendar_date, sleep_start_gmt, sleep_end_gmt, deep_s, light_s, rem_s, awake_s, sleep_score, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "sleepStartTimestampGMT", "sleepStartTimeGmt"),
_first(item, "sleepEndTimestampGMT", "sleepEndTimeGmt"),
_first(item, "deepSleepSeconds"),
_first(item, "lightSleepSeconds"),
_first(item, "remSleepSeconds"),
_first(item, "awakeSleepSeconds", "awakeSeconds"),
_first(item, "sleepScore", "overallSleepScore", "value"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_steps(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_steps
(calendar_date, total_steps, step_goal, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "totalSteps", "steps"),
_first(item, "stepGoal", "dailyStepGoal"),
_first(item, "totalDistance", "distance"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_stress(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_stress
(calendar_date, avg_stress, max_stress, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "overallStressLevel", "averageStressLevel", "avgStress"),
_first(item, "maxStressLevel", "maxStress"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_hrv(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_hrv
(calendar_date, weekly_avg, last_night_avg, last_night_5min, status, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "weeklyAvg"),
_first(item, "lastNightAvg"),
_first(item, "lastNight5MinHigh"),
_first(item, "status"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_resting_hr(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
rhr = _first(item, "restingHeartRate", "value")
if rhr is None:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_resting_hr
(calendar_date, resting_hr, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(date_key, rhr, _dump(item)),
)
n += 1
conn.commit()
return n
def handle_intensity_minutes(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_intensity_minutes
(calendar_date, moderate_minutes, vigorous_minutes, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "moderateIntensityMinutes", "moderateValue"),
_first(item, "vigorousIntensityMinutes", "vigorousValue"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_body_battery(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_body_battery
(calendar_date, charged, drained, highest, lowest, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "charged", "bodyBatteryChargedValue"),
_first(item, "drained", "bodyBatteryDrainedValue"),
_first(item, "highest", "highestBatteryLevel", "bodyBatteryHighestValue"),
_first(item, "lowest", "lowestBatteryLevel", "bodyBatteryLowestValue"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_fit(conn: sqlite3.Connection, path: Path, export_root: Path) -> int:
"""Index .fit file location by activity ID. Don't parse the binary here.
Garmin uses a few filename formats — handle both the classic `<id>_<name>.fit`
and the Takeout variant `<email>_<id>.fit` by picking the first 8+ digit chunk.
"""
aid = None
for chunk in path.stem.split("_"):
if chunk.isdigit() and len(chunk) >= 8:
aid = int(chunk)
break
if aid is None:
# filename has no recognisable activity-id chunk
return 0
# Verify the parsed number is actually an activity_id we know about.
# The Garmin Takeout dump uses *upload IDs* in FIT filenames (different ID space
# from activity IDs), so naive insert would create thousands of orphaned rows.
# If the id isn't in `activities`, skip — link_fit_files.py handles the
# by-content matching for takeout-format exports.
row = conn.execute(
"SELECT 1 FROM activities WHERE activity_id = ? LIMIT 1", (aid,)
).fetchone()
if row is None:
return 0
rel = path.relative_to(export_root)
conn.execute(
"""INSERT OR REPLACE INTO activity_fit_files (activity_id, fit_path, indexed_at)
VALUES (?, ?, datetime('now'))""",
(aid, str(rel)),
)
conn.commit()
return 1
# ---------------------------------------------------------------------------
# dispatch — pattern → handler
# ---------------------------------------------------------------------------
Handler = Callable[[sqlite3.Connection, Path], int]
# Order matters: first match wins. Needles are matched case-insensitively as filename substrings (see classify()).
DISPATCH: list[tuple[str, str, Handler]] = [
("summarizedActivities", "activities", handle_activities),
("sleepData", "sleep", handle_sleep),
("sleep", "sleep", handle_sleep),
("UDSFile", "steps", handle_steps), # daily steps file naming varies
("step", "steps", handle_steps),
("stressLevel", "stress", handle_stress),
("stress", "stress", handle_stress),
("hrvStatus", "hrv", handle_hrv),
("hrv", "hrv", handle_hrv),
("restingHeart", "resting_hr", handle_resting_hr),
("RHR", "resting_hr", handle_resting_hr),
("intensityMinute", "intensity", handle_intensity_minutes),
("bodyBattery", "body_batt", handle_body_battery),
("BBS", "body_batt", handle_body_battery),
]
def classify(name: str) -> tuple[str, Handler] | None:
lower = name.lower()
for needle, label, fn in DISPATCH:
if needle.lower() in lower:
return label, fn
return None
# ---------------------------------------------------------------------------
# entry
# ---------------------------------------------------------------------------
def iter_files(root: Path) -> Iterable[Path]:
for p in root.rglob("*"):
if p.is_file():
yield p
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("source", help="Path to export.zip or unzipped export directory")
parser.add_argument("--dry-run", action="store_true",
help="Show what would be ingested without writing to the DB")
args = parser.parse_args()
src = Path(args.source).expanduser().resolve()
if not src.exists():
sys.exit(f"path does not exist: {src}")
cleanup_dir: tempfile.TemporaryDirectory | None = None
if src.is_file() and src.suffix.lower() == ".zip":
cleanup_dir = tempfile.TemporaryDirectory(prefix="garmin_export_")
export_root = Path(cleanup_dir.name)
print(f"unzipping {src.name} → {export_root}")
with zipfile.ZipFile(src) as zf:
zf.extractall(export_root)
elif src.is_dir():
export_root = src
else:
sys.exit(f"unsupported source: {src}")
conn = connect()
counts: dict[str, int] = defaultdict(int)
unknown: list[str] = []
fit_count = 0
for path in iter_files(export_root):
if path.suffix.lower() == ".fit":
if not args.dry_run:
fit_count += handle_fit(conn, path, export_root)
else:
fit_count += 1
continue
if path.suffix.lower() != ".json":
continue
matched = classify(path.name)
if not matched:
unknown.append(str(path.relative_to(export_root)))
continue
label, fn = matched
if args.dry_run:
counts[label] += 1
continue
try:
counts[label] += fn(conn, path)
except Exception as exc: # noqa: BLE001
print(f" ! error in {label} handler for {path.name}: {exc}", file=sys.stderr)
print("\n=== ingest summary ===")
for label, n in sorted(counts.items()):
print(f" {label:20s} {n:>6} rows" + (" (file count, dry run)" if args.dry_run else ""))
print(f" fit_files {fit_count:>6}")
if unknown:
print(f"\n unrecognized JSON files ({len(unknown)}):")
for name in unknown[:25]:
print(f" {name}")
if len(unknown) > 25:
print(f" ... and {len(unknown) - 25} more")
if not args.dry_run:
set_state(conn, "last_ingest_utc", datetime.utcnow().isoformat(timespec="seconds"))
set_state(conn, "last_ingest_source", str(src))
conn.commit()
conn.close()
if cleanup_dir:
cleanup_dir.cleanup()
print("\n✓ done")
"""Shim — see openrun.ingest.garmin_export."""
from openrun.ingest.garmin_export import main
if __name__ == "__main__":
main()
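
The unit handling in the removed `handle_activities` is easy to get wrong, so here is a minimal standalone sketch of the same idea. The factors are copied from the comment in that handler; `normalize_activity` and the sample record are hypothetical names and values invented for illustration, not part of openrun.

```python
from typing import Any, Optional


def first(d: dict, *keys: str) -> Any:
    """Return the first non-null value among candidate keys."""
    for k in keys:
        if d.get(k) is not None:
            return d[k]
    return None


def scale(v: Optional[float], factor: float) -> Optional[float]:
    """Multiply by factor, passing None through untouched."""
    return None if v is None else v * factor


def normalize_activity(raw: dict) -> dict:
    """Hypothetical helper: convert export-scaled fields to SI units."""
    return {
        "distance_m": scale(first(raw, "distance"), 0.01),        # cm -> m
        "duration_s": scale(first(raw, "duration"), 0.001),       # ms -> s
        "avg_speed_mps": scale(first(raw, "averageSpeed", "avgSpeed"), 10.0),
        "avg_hr": first(raw, "averageHR", "avgHr"),               # already bpm
    }


# Made-up record in the export's scaled units (10 km in 50 minutes).
sample = {"distance": 1_000_000, "duration": 3_000_000, "avgHr": 148}
row = normalize_activity(sample)
```

Missing fields come back as `None` rather than raising, which matches how the handler tolerates field renames between exports.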

View File

@@ -1,150 +1,5 @@
"""Match FIT files to activity rows by content (session.start_time).
The Garmin Connect data export embeds activity IDs in FIT filenames
(`<id>_<name>.fit`), so `ingest_export.py` can index them straight from the
filename. The newer Garmin **Takeout** dump uses upload IDs instead
(`<email>_<upload_id>.fit`), which are a different ID space, so filename-based
linking fails. This script does the linking via FIT content: it opens each FIT,
reads the `session.start_time` (UTC), and matches it against
`activities.start_time_gmt` within a small tolerance.
Usage:
uv run link_fit_files.py <export_root>
uv run link_fit_files.py <export_root> --dry-run --min-size-kb 50
Defaults:
Only parses FITs >= 50 KB (workout FITs; smaller files are HRV/sleep/body-battery
monitoring snapshots, ~99% of the dump by count).
Tolerance: 60 seconds. Garmin sometimes rounds session start to the second
while activities.start_time_gmt is also second-precision.
Re-running is safe — the table is upsert by activity_id.
"""
from __future__ import annotations
import argparse
import sqlite3
import sys
from datetime import datetime, timezone
from pathlib import Path
import fitparse
from db import connect
def _parse_session_start(fit_path: Path) -> datetime | None:
"""Return the session start_time as a UTC datetime, or None if the file has none."""
try:
fit = fitparse.FitFile(str(fit_path))
for msg in fit.get_messages("session"):
vals = msg.get_values()
ts = vals.get("start_time")
if ts is None:
continue
if isinstance(ts, datetime):
# fitparse returns naive datetimes in UTC for FIT timestamps.
return ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts.astimezone(timezone.utc)
return None
except Exception as exc: # noqa: BLE001
print(f" ! parse failed for {fit_path.name}: {exc}", file=sys.stderr)
return None
def _load_activity_index(conn: sqlite3.Connection) -> dict[int, int]:
"""Map (epoch_seconds_utc) -> activity_id for every activity with a start_time_gmt."""
idx: dict[int, int] = {}
for aid, gmt in conn.execute("SELECT activity_id, start_time_gmt FROM activities"):
if gmt is None:
continue
# start_time_gmt may be an ISO string, ms epoch number, or sec epoch.
try:
if isinstance(gmt, (int, float)):
v = float(gmt)
# heuristic: >1e12 means ms epoch, otherwise seconds
secs = int(v / 1000) if v > 1e12 else int(v)
else:
s = str(gmt).strip().rstrip("Z")
secs = int(datetime.fromisoformat(s).replace(tzinfo=timezone.utc).timestamp())
except (ValueError, TypeError):
continue
idx[secs] = aid
return idx
def link(export_root: Path, *, dry_run: bool, min_size_kb: int, tolerance_s: int) -> None:
if not export_root.exists():
sys.exit(f"path does not exist: {export_root}")
conn = connect()
index = _load_activity_index(conn)
if not index:
sys.exit("no activities with start_time_gmt — ingest activities first")
sorted_keys = sorted(index.keys())
print(f"loaded {len(index):,} activity start times")
candidates = [p for p in export_root.rglob("*.fit") if p.stat().st_size >= min_size_kb * 1024]
print(f"scanning {len(candidates):,} FIT files ≥ {min_size_kb} KB")
linked = unmatched = parse_failed = 0
for i, p in enumerate(candidates, 1):
if i % 50 == 0:
print(f"{i}/{len(candidates)} processed (linked={linked}, unmatched={unmatched})")
start = _parse_session_start(p)
if start is None:
parse_failed += 1
continue
target = int(start.timestamp())
# Binary search the closest key within tolerance.
from bisect import bisect_left
pos = bisect_left(sorted_keys, target)
candidates_near: list[int] = []
if pos < len(sorted_keys):
candidates_near.append(sorted_keys[pos])
if pos > 0:
candidates_near.append(sorted_keys[pos - 1])
best = min((c for c in candidates_near if abs(c - target) <= tolerance_s), default=None, key=lambda c: abs(c - target))
if best is None:
unmatched += 1
continue
aid = index[best]
if not dry_run:
rel = p.relative_to(export_root)
conn.execute(
"""INSERT OR REPLACE INTO activity_fit_files (activity_id, fit_path, indexed_at)
VALUES (?, ?, datetime('now'))""",
(aid, str(rel)),
)
linked += 1
if not dry_run:
conn.commit()
conn.close()
print()
print("=== linker summary ===")
print(f" candidates scanned: {len(candidates):,}")
print(f" linked : {linked:,}" + (" (dry-run)" if dry_run else ""))
print(f" unmatched start : {unmatched:,} (no activity within ±{tolerance_s}s)")
print(f" parse failures : {parse_failed:,}")
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("export_root", help="Path to the unzipped Garmin export folder")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--min-size-kb", type=int, default=50,
help="Skip FITs smaller than this; default 50 KB filters out HRV/sleep snapshots")
parser.add_argument("--tolerance-s", type=int, default=60,
help="Max seconds between FIT session start and activity start to count as a match")
args = parser.parse_args()
link(Path(args.export_root).expanduser().resolve(),
dry_run=args.dry_run,
min_size_kb=args.min_size_kb,
tolerance_s=args.tolerance_s)
"""Shim — see openrun.ingest.fit_linker."""
from openrun.ingest.fit_linker import main
if __name__ == "__main__":
main()

Binary file not shown.

39
openrun.toml Normal file

@@ -0,0 +1,39 @@
# openrun project config. Discovered by openrun.config.load_config().
# Edit values here; nothing in src/openrun/ should be user-specific.
[user]
name = "Ethan"
height_cm = 190.5
weight_kg = 89.3
hr_max = 209 # Garmin-configured max (heartRateZones.json)
lthr = 182 # Lactate-threshold HR
resting_hr = 52
# Garmin's configured zones (HR_MAX method). Override if you recompute.
[user.hr_zones]
Z1 = [102, 122] # recovery
Z2 = [123, 143] # easy aerobic — long-run target
Z3 = [144, 164] # tempo
Z4 = [165, 185] # threshold (LTHR=182 sits inside)
Z5 = [186, 209] # VO₂ max
[db]
path = "data/garmin.db"
[banister]
ctl_tau_days = 42.0
atl_tau_days = 7.0
race_day_tl_per_km = 7.0 # ultra race-pace; training runs are ~11
# 2026 race calendar. Add/remove as the season evolves.
[[races]]
label = "wk 4 — 30K"
date = "2026-06-13"
[[races]]
label = "wk 10 — 50K"
date = "2026-07-25"
[[races]]
label = "wk 17 — 50 MILE"
date = "2026-09-12"
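
For readers unfamiliar with what `ctl_tau_days` and `atl_tau_days` control, here is a rough sketch of one common Banister-style formulation: two exponentially weighted averages of daily load, with TSB as their difference. This is an assumption for illustration; the diff does not show how `openrun.model.banister` is implemented, and `pmc` and `ewma_step` are hypothetical names.

```python
import math


def ewma_step(prev: float, load: float, tau: float) -> float:
    """Advance an exponentially weighted average by one day."""
    alpha = 1.0 - math.exp(-1.0 / tau)
    return prev + alpha * (load - prev)


def pmc(daily_load: list[float], ctl_tau: float = 42.0, atl_tau: float = 7.0):
    """Return one (ctl, atl, tsb) tuple per day, starting from zero fitness.

    Assumed convention: TSB is same-day CTL minus ATL. Other tools use the
    previous day's values instead.
    """
    ctl = atl = 0.0
    out = []
    for load in daily_load:
        ctl = ewma_step(ctl, load, ctl_tau)
        atl = ewma_step(atl, load, atl_tau)
        out.append((ctl, atl, ctl - atl))
    return out


# Two weeks of steady load followed by a one-week taper with no load.
series = pmc([100.0] * 14 + [0.0] * 7)
```

With the default time constants, fatigue (ATL) rises and decays about six times faster than fitness (CTL), which is why TSB recovers during a taper.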

View File

@@ -1,7 +1,9 @@
[project]
name = "garmin"
name = "openrun"
version = "0.1.0"
description = "Add your description here"
description = "Endurance-running analytics — local-first, open-source. Banister CTL/ATL/TSB, Pa:HR decoupling, FIT-aware HR zones, route clustering, race-plan projection."
readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.13"
dependencies = [
"fitparse>=1.2.0",
@@ -11,3 +13,28 @@ dependencies = [
"matplotlib>=3.10.9",
"pandas>=3.0.3",
]
[project.scripts]
openrun-init = "openrun.setup:main"
openrun-auth = "openrun.ingest.auth:main"
openrun-sync = "openrun.ingest.garmin_api:main"
openrun-ingest = "openrun.ingest.garmin_export:main"
openrun-link-fit = "openrun.ingest.fit_linker:main"
openrun-time-in-zone = "openrun.ingest.time_in_zone:main"
[dependency-groups]
dev = [
"pytest>=8.0.0",
"pytest-cov>=5.0.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/openrun"]
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-ra --strict-markers"

Binary file not shown.

84
src/openrun/__init__.py Normal file

@@ -0,0 +1,84 @@
"""openrun — endurance-running analytics, local-first.

Most notebooks just need:

    from openrun import open_conn, load_activities, load_wellness
    conn = open_conn()
    runs = load_activities(conn, type='running')

Submodules:
openrun.config — UserProfile, HR zones, Banister params, race calendar
openrun.db — schema + connection helper
openrun.model — loaders + derived metrics (decoupling, zones, Banister, FIT)
openrun.ingest.* — data-source ingesters (Garmin export, FIT linker, time-in-zone)
"""
from .config import (
Config,
HRZones,
UserProfile,
BanisterParams,
default_config,
default_profile,
load_config,
)
from .db import connect, get_state, set_state
from .model import (
# connections
open_conn,
# loaders
load_activities,
load_wellness,
load_sleep_stages,
load_splits,
daily_training_load,
daily_training_load_series,
joined,
expand_raw,
# FIT
load_fit_records,
fit_decoupling,
fit_rolling_efficiency,
# zones
assign_hr_zone,
hr_to_user_zone,
time_in_zone_from_fit,
time_in_zone_from_splits,
weekly_time_in_zone,
# Banister
banister,
banister_forecast,
# Race-plan helpers
personal_records,
calibrate_tl_per_km,
# Geo
haversine_km,
cluster_routes,
# split-level decoupling (kept for back-compat)
decoupling,
)
__all__ = [
"Config", "HRZones", "UserProfile", "BanisterParams",
"default_config", "default_profile", "load_config",
"connect", "get_state", "set_state",
"open_conn",
"load_activities", "load_wellness", "load_sleep_stages", "load_splits",
"daily_training_load",
"daily_training_load_series", "joined", "expand_raw",
"load_fit_records", "fit_decoupling", "fit_rolling_efficiency",
"assign_hr_zone", "hr_to_user_zone",
"time_in_zone_from_fit", "time_in_zone_from_splits", "weekly_time_in_zone",
"banister", "banister_forecast", "personal_records", "calibrate_tl_per_km",
"haversine_km", "cluster_routes", "decoupling",
"HR_ZONES_USER", # back-compat alias, resolved lazily
]
def __getattr__(name: str):
    # Lazy fallback so `from openrun import HR_ZONES_USER` works.
    # Resolves to the current default profile's zones at access time.
    if name == "HR_ZONES_USER":
        from .config import default_profile
        return default_profile().hr_zones().as_tuples()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
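
The `__getattr__` hook above relies on module-level attribute fallback (PEP 562). A small self-contained sketch of the mechanism, using a throwaway module object so it runs without openrun installed; the `demo` module, `_lazy` function, and zone values are made up for illustration.

```python
import types

calls: list[str] = []


def _lazy(name: str):
    """Called only when normal attribute lookup on the module fails."""
    if name == "HR_ZONES_USER":
        calls.append(name)
        return (("Z1", 100, 120), ("Z2", 121, 140))
    raise AttributeError(f"module 'demo' has no attribute {name!r}")


demo = types.ModuleType("demo")
demo.__getattr__ = _lazy  # stored in the module's __dict__, as a real module would

zones = demo.HR_ZONES_USER            # triggers _lazy
missing = hasattr(demo, "NOT_THERE")  # AttributeError is swallowed by hasattr
```

Because the value is computed on each access rather than at import time, it picks up whatever profile is current when the notebook cell runs.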

195
src/openrun/config.py Normal file

@@ -0,0 +1,195 @@
"""User profile and project configuration.

A single TOML file (default: `./openrun.toml`) describes the athlete's biometric
profile, configured HR zones, race calendar, and project paths. `load_config()`
reads it; `default_config()` caches the result once per process.

Discovery order:
  1. `OPENRUN_CONFIG` env var (absolute path)
  2. `openrun.toml` in the current working directory or any parent directory
  3. `~/.config/openrun/config.toml`
  4. Built-in defaults (Z1..Z5 from %HRmax with HRmax=200). Fine for getting
     started, but the per-second time-in-zone and projected PMC are only as
     good as this file.

The profile is intentionally simple: a dataclass, not an ORM. Functions in
`openrun.model` that need user-specific numbers take a `UserProfile` argument
(usually defaulting to `default_profile()`, which reads the current config).
"""
from __future__ import annotations

import os
import sys
from dataclasses import dataclass, field
from pathlib import Path

if sys.version_info >= (3, 11):
    import tomllib
else:  # pragma: no cover
    import tomli as tomllib


@dataclass(frozen=True)
class HRZones:
    """Inclusive [lo, hi] bpm bounds per zone, Z1..Z5."""
    z1: tuple[int, int]
    z2: tuple[int, int]
    z3: tuple[int, int]
    z4: tuple[int, int]
    z5: tuple[int, int]

    def as_tuples(self) -> tuple[tuple[str, int, int], ...]:
        return (
            ("Z1", self.z1[0], self.z1[1]),
            ("Z2", self.z2[0], self.z2[1]),
            ("Z3", self.z3[0], self.z3[1]),
            ("Z4", self.z4[0], self.z4[1]),
            ("Z5", self.z5[0], self.z5[1]),
        )

    @classmethod
    def from_pct_hr_max(cls, hr_max: int) -> "HRZones":
        """Generic %HRmax fallback (plain percentage of max HR, not heart-rate reserve; not personalised)."""
        def band(lo_pct: float, hi_pct: float) -> tuple[int, int]:
            return int(round(lo_pct * hr_max)), int(round(hi_pct * hr_max))
        return cls(
            z1=band(0.50, 0.60),
            z2=band(0.60, 0.70),
            z3=band(0.70, 0.80),
            z4=band(0.80, 0.90),
            z5=band(0.90, 1.00),
        )


@dataclass(frozen=True)
class UserProfile:
    """One athlete's identity + biometric + zone config."""
    name: str = "Athlete"
    height_cm: float | None = None
    weight_kg: float | None = None
    hr_max: int = 200
    lthr: int | None = None
    resting_hr: int | None = None
    zones: HRZones | None = None  # if None, falls back to from_pct_hr_max(hr_max)

    def hr_zones(self) -> HRZones:
        return self.zones or HRZones.from_pct_hr_max(self.hr_max)


@dataclass(frozen=True)
class BanisterParams:
    ctl_tau_days: float = 42.0
    atl_tau_days: float = 7.0
    race_day_tl_per_km: float = 7.0  # ultra-paced races burn ~7 TL/km; easy runs ~11


@dataclass(frozen=True)
class Config:
    """Top-level project config."""
    user: UserProfile = field(default_factory=UserProfile)
    db_path: Path = field(default=Path("data/garmin.db"))
    banister: BanisterParams = field(default_factory=BanisterParams)
    # Optional race calendar: list of (label, ISO date).
    races: tuple[tuple[str, str], ...] = ()

    @property
    def race_dates(self) -> dict[str, str]:
        return dict(self.races)


def _walk_up_for(filename: str, start: Path | None = None) -> Path | None:
    """Walk up the directory tree from `start` (default cwd) looking for `filename`."""
    cur = (start or Path.cwd()).resolve()
    while True:
        candidate = cur / filename
        if candidate.is_file():
            return candidate
        if cur.parent == cur:
            return None
        cur = cur.parent


def _find_config() -> Path | None:
    env = os.environ.get("OPENRUN_CONFIG")
    if env:
        p = Path(env).expanduser()
        if p.is_file():
            return p
    found = _walk_up_for("openrun.toml")
    if found is not None:
        return found
    user_cfg = Path.home() / ".config" / "openrun" / "config.toml"
    if user_cfg.is_file():
        return user_cfg
    return None


def _zones_from_toml(d: dict) -> HRZones | None:
    if not d:
        return None

    def _pair(key: str) -> tuple[int, int]:
        v = d[key]
        return int(v[0]), int(v[1])

    try:
        return HRZones(
            z1=_pair("Z1"), z2=_pair("Z2"), z3=_pair("Z3"),
            z4=_pair("Z4"), z5=_pair("Z5"),
        )
    except (KeyError, TypeError, ValueError):
        return None


def load_config(path: Path | str | None = None) -> Config:
    """Load configuration. If `path` is None, walks the discovery list."""
    cfg_path = Path(path).expanduser() if path else _find_config()
    if cfg_path is None:
        return Config()
    with cfg_path.open("rb") as fh:
        data = tomllib.load(fh)
    base_dir = cfg_path.parent

    user_d = data.get("user", {}) or {}
    zones = _zones_from_toml(user_d.get("hr_zones") or data.get("hr_zones") or {})
    user = UserProfile(
        name=user_d.get("name", "Athlete"),
        height_cm=user_d.get("height_cm"),
        weight_kg=user_d.get("weight_kg"),
        hr_max=int(user_d.get("hr_max", 200)),
        lthr=user_d.get("lthr"),
        resting_hr=user_d.get("resting_hr"),
        zones=zones,
    )

    ban_d = data.get("banister", {}) or {}
    banister = BanisterParams(
        ctl_tau_days=float(ban_d.get("ctl_tau_days", 42.0)),
        atl_tau_days=float(ban_d.get("atl_tau_days", 7.0)),
        race_day_tl_per_km=float(ban_d.get("race_day_tl_per_km", 7.0)),
    )

    db_d = data.get("db", {}) or {}
    db_raw = db_d.get("path", "data/garmin.db")
    db_path = Path(db_raw)
    if not db_path.is_absolute():
        db_path = (base_dir / db_path).resolve()

    races_raw = data.get("races", []) or []
    races = tuple((str(r["label"]), str(r["date"])) for r in races_raw if "label" in r and "date" in r)
    return Config(user=user, db_path=db_path, banister=banister, races=races)


# Cached default: load once per process; explicit overrides go through load_config().
_default: Config | None = None


def default_config() -> Config:
    global _default
    if _default is None:
        _default = load_config()
    return _default


def default_profile() -> UserProfile:
    return default_config().user

src/openrun/db.py Normal file

@@ -0,0 +1,169 @@
"""SQLite schema + connection helpers.
The DB path comes from the loaded `Config` (see `openrun.config`); pass an explicit
`Path` to `connect()` to override (handy for tests).
"""
from __future__ import annotations
import sqlite3
from pathlib import Path
from .config import default_config
SCHEMA = """
CREATE TABLE IF NOT EXISTS activities (
activity_id INTEGER PRIMARY KEY,
start_time_local TEXT,
start_time_gmt TEXT,
activity_type TEXT,
activity_name TEXT,
distance_m REAL,
duration_s REAL,
moving_duration_s REAL,
avg_speed_mps REAL,
max_speed_mps REAL,
avg_hr REAL,
max_hr REAL,
calories REAL,
elevation_gain_m REAL,
elevation_loss_m REAL,
training_load REAL,
aerobic_te REAL,
anaerobic_te REAL,
vo2_max REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_activities_start ON activities(start_time_local);
CREATE INDEX IF NOT EXISTS idx_activities_type ON activities(activity_type);
CREATE TABLE IF NOT EXISTS activity_splits (
activity_id INTEGER NOT NULL,
split_index INTEGER NOT NULL,
distance_m REAL,
duration_s REAL,
avg_hr REAL,
avg_speed_mps REAL,
elevation_gain_m REAL,
raw TEXT NOT NULL,
PRIMARY KEY (activity_id, split_index),
FOREIGN KEY (activity_id) REFERENCES activities(activity_id)
);
CREATE TABLE IF NOT EXISTS daily_steps (
calendar_date TEXT PRIMARY KEY,
total_steps INTEGER,
step_goal INTEGER,
distance_m REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_sleep (
calendar_date TEXT PRIMARY KEY,
sleep_start_gmt TEXT,
sleep_end_gmt TEXT,
deep_s REAL,
light_s REAL,
rem_s REAL,
awake_s REAL,
sleep_score REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_stress (
calendar_date TEXT PRIMARY KEY,
avg_stress REAL,
max_stress REAL,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_hrv (
calendar_date TEXT PRIMARY KEY,
weekly_avg REAL,
last_night_avg REAL,
last_night_5min REAL,
status TEXT,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_body_battery (
calendar_date TEXT PRIMARY KEY,
charged INTEGER,
drained INTEGER,
highest INTEGER,
lowest INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_intensity_minutes (
calendar_date TEXT PRIMARY KEY,
moderate_minutes INTEGER,
vigorous_minutes INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS daily_resting_hr (
calendar_date TEXT PRIMARY KEY,
resting_hr INTEGER,
raw TEXT NOT NULL,
fetched_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS activity_fit_files (
activity_id INTEGER PRIMARY KEY,
fit_path TEXT NOT NULL,
indexed_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS activity_time_in_zone (
activity_id INTEGER PRIMARY KEY,
z1_s REAL,
z2_s REAL,
z3_s REAL,
z4_s REAL,
z5_s REAL,
total_s REAL,
source TEXT,
computed_at TEXT NOT NULL,
FOREIGN KEY (activity_id) REFERENCES activities(activity_id)
);
CREATE TABLE IF NOT EXISTS sync_state (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
updated_at TEXT NOT NULL
);
"""
def connect(db_path: Path | str | None = None) -> sqlite3.Connection:
"""Open (and create-if-missing) the SQLite DB. Defaults to config-supplied path."""
path = Path(db_path) if db_path is not None else default_config().db_path
path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode = WAL")
conn.execute("PRAGMA foreign_keys = ON")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
return conn
def set_state(conn: sqlite3.Connection, key: str, value: str) -> None:
conn.execute(
"INSERT INTO sync_state(key, value, updated_at) VALUES (?, ?, datetime('now')) "
"ON CONFLICT(key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at",
(key, value),
)
def get_state(conn: sqlite3.Connection, key: str) -> str | None:
row = conn.execute("SELECT value FROM sync_state WHERE key = ?", (key,)).fetchone()
return row["value"] if row else None
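
A small sketch of how the `activities` table might be queried once populated. It uses an in-memory database and a cut-down copy of the table (only the columns the query touches); the three sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # same row factory that connect() sets

# Cut-down version of the activities table from SCHEMA above.
conn.execute(
    """CREATE TABLE activities (
        activity_id INTEGER PRIMARY KEY,
        start_time_local TEXT,
        activity_type TEXT,
        distance_m REAL,
        raw TEXT NOT NULL,
        fetched_at TEXT NOT NULL
    )"""
)

# Invented sample data: two runs and one ride.
rows = [
    (1, "2026-01-05 07:00:00", "running", 10000.0),
    (2, "2026-01-07 07:00:00", "running", 5000.0),
    (3, "2026-01-08 18:00:00", "cycling", 20000.0),
]
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?, ?, '{}', datetime('now'))", rows
)

# Total distance per activity type, converted from metres to kilometres.
totals = {
    r["activity_type"]: r["km"]
    for r in conn.execute(
        "SELECT activity_type, SUM(distance_m) / 1000.0 AS km "
        "FROM activities GROUP BY activity_type"
    )
}
print(totals)
```

Distances are stored in metres and durations in seconds, so unit conversion happens at query time.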


@@ -0,0 +1,2 @@
"""Data-source ingesters. Each module exposes a `main()` entry point so it can be
invoked as a CLI (`python -m openrun.ingest.<name>`) or imported."""


@@ -0,0 +1,38 @@
"""One-time Garmin Connect login; saves OAuth tokens to ./.secrets/.
Usage:
python -m openrun.ingest.auth
# or, via shim, from the project root:
uv run auth.py
Prompts for email + password (and MFA if your account has 2FA). Tokens are
refreshed automatically by `openrun.ingest.garmin_api` on subsequent runs
(valid ~1 year).
Run from the project root — the token directory is `.secrets/` relative to cwd.
"""
import getpass
import os
from pathlib import Path
import garth
def _token_dir() -> Path:
return Path.cwd() / ".secrets"
def main() -> None:
tok = _token_dir()
tok.mkdir(exist_ok=True)
email = os.environ.get("GARMIN_EMAIL") or input("Garmin email: ").strip()
password = os.environ.get("GARMIN_PASSWORD") or getpass.getpass("Garmin password: ")
garth.login(email, password)
garth.save(tok)
profile = garth.client.username
print(f"Logged in as {profile}. Tokens saved to {tok}.")
if __name__ == "__main__":
main()
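
The token directory is resolved against the current working directory, so a sync started from another folder will not see the saved tokens. A rough sketch of a pre-flight check, assuming only that a usable token directory exists and is non-empty; `has_tokens` and the file name `token.json` are hypothetical.

```python
import tempfile
from pathlib import Path


def has_tokens(token_dir: Path) -> bool:
    """True when `token_dir` is a directory containing at least one entry."""
    return token_dir.is_dir() and any(token_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    tok = Path(tmp) / ".secrets"
    before = has_tokens(tok)          # directory does not exist yet
    tok.mkdir()
    empty = has_tokens(tok)           # exists but holds nothing
    (tok / "token.json").write_text("{}")  # placeholder file name
    after = has_tokens(tok)           # now non-empty

print(before, empty, after)
```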


@@ -0,0 +1,277 @@
"""Match FIT files to activity rows by content (session.start_time).
The Garmin Connect data export embeds activity IDs in FIT filenames
(`<id>_<name>.fit`), so `ingest_export.py` can index them straight from the
filename. The newer Garmin **Takeout** dump uses upload IDs instead
(`<email>_<upload_id>.fit`), which are a different ID space, so filename-based
linking fails. This script does the linking via FIT content: it opens each FIT,
reads the `session.start_time` (UTC), and matches it against
`activities.start_time_gmt` within a small tolerance.
Stored paths are **absolute** (resolved at link time). If the export directory
is moved, run with `--relink <new_root>` to rewrite the table by basename.
Usage:
uv run link_fit_files.py <export_root>
uv run link_fit_files.py <export_root> --dry-run --min-size-kb 50
uv run link_fit_files.py <new_root> --relink
Defaults:
Only parses FITs >= 50 KB (workout FITs; smaller files are HRV/sleep/body-battery
monitoring snapshots, ~99% of the dump by count).
Tolerance: 60 seconds. The FIT session start and activities.start_time_gmt are both
second-precision, so true matches should land well inside this window; adjust with --tolerance-s.
Re-running is safe: the table is upserted by activity_id.
"""
from __future__ import annotations
import argparse
import sqlite3
import sys
from bisect import bisect_left
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, Iterator
import fitparse
from ..db import connect
def _parse_session_start(fit_path: Path) -> datetime | None:
"""Return the session start_time as a UTC datetime, or None if the file has none."""
try:
fit = fitparse.FitFile(str(fit_path))
for msg in fit.get_messages("session"):
vals = msg.get_values()
ts = vals.get("start_time")
if ts is None:
continue
if isinstance(ts, datetime):
# fitparse returns naive datetimes in UTC for FIT timestamps.
return ts.replace(tzinfo=timezone.utc) if ts.tzinfo is None else ts.astimezone(timezone.utc)
return None
except Exception as exc: # noqa: BLE001
print(f" ! parse failed for {fit_path.name}: {exc}", file=sys.stderr)
return None
def _load_activity_index(conn: sqlite3.Connection) -> dict[int, int]:
"""Map (epoch_seconds_utc) -> activity_id for every activity with a start_time_gmt."""
idx: dict[int, int] = {}
for aid, gmt in conn.execute("SELECT activity_id, start_time_gmt FROM activities"):
if gmt is None:
continue
# start_time_gmt may be an ISO string, ms epoch number, or sec epoch.
try:
if isinstance(gmt, (int, float)):
v = float(gmt)
# heuristic: >1e12 means ms epoch, otherwise seconds
secs = int(v / 1000) if v > 1e12 else int(v)
else:
s = str(gmt).strip().rstrip("Z")
secs = int(datetime.fromisoformat(s).replace(tzinfo=timezone.utc).timestamp())
except (ValueError, TypeError):
continue
idx[secs] = aid
return idx
def _match_activity(
target_secs: int,
sorted_keys: list[int],
index: dict[int, int],
tolerance_s: int,
) -> int | None:
"""Return the activity_id whose start_time_gmt is closest to `target_secs`,
if within ±tolerance_s; otherwise None.
`sorted_keys` is `sorted(index.keys())`; passing it in lets callers reuse it
across many matches without re-sorting.
"""
pos = bisect_left(sorted_keys, target_secs)
near: list[int] = []
if pos < len(sorted_keys):
near.append(sorted_keys[pos])
if pos > 0:
near.append(sorted_keys[pos - 1])
in_tolerance = [c for c in near if abs(c - target_secs) <= tolerance_s]
if not in_tolerance:
return None
best = min(in_tolerance, key=lambda c: abs(c - target_secs))
return index[best]
def _iter_fits_with_session_start(
export_root: Path, *, min_size_kb: int
) -> Iterator[tuple[Path, datetime | None]]:
"""Yield (fit_path, session_start_utc) for every FIT under `export_root`
that's at least `min_size_kb` in size. The start is `None` when parsing fails, so
callers can count parse failures separately from unmatched files.
"""
for p in export_root.rglob("*.fit"):
if p.stat().st_size < min_size_kb * 1024:
continue
yield p, _parse_session_start(p)
def record_link(conn: sqlite3.Connection, activity_id: int, fit_path: Path) -> None:
"""Upsert `activity_fit_files` with an absolute path to the FIT.
Pure helper — no commit, no I/O beyond `.resolve()`. Callers commit when
they're done batching.
"""
abs_path = str(Path(fit_path).resolve())
conn.execute(
"""INSERT OR REPLACE INTO activity_fit_files (activity_id, fit_path, indexed_at)
VALUES (?, ?, datetime('now'))""",
(activity_id, abs_path),
)
def relink(conn: sqlite3.Connection, new_root: Path) -> tuple[int, int]:
"""Rewrite `activity_fit_files.fit_path` after the export directory moved.
Walks `new_root` for `*.fit`, builds a basename → absolute-path map, and
updates every row whose current path's basename appears in that map.
Returns (updated, unmatched) counts.
"""
new_root = Path(new_root).resolve()
if not new_root.is_dir():
raise FileNotFoundError(f"relink root is not a directory: {new_root}")
by_name: dict[str, Path] = {}
for p in new_root.rglob("*.fit"):
# Last-write-wins on duplicate basenames; in practice Garmin FIT names
# are unique per dump (upload ID or activity ID embedded).
by_name[p.name] = p.resolve()
updated = unmatched = 0
rows = conn.execute(
"SELECT activity_id, fit_path FROM activity_fit_files"
).fetchall()
for aid, old_path in rows:
name = Path(old_path).name
if name in by_name:
record_link(conn, aid, by_name[name])
updated += 1
else:
unmatched += 1
conn.commit()
return updated, unmatched
def link(
export_root: Path,
*,
conn: sqlite3.Connection | None = None,
dry_run: bool,
min_size_kb: int,
tolerance_s: int,
fit_iter: Iterable[tuple[Path, datetime | None]] | None = None,
) -> dict[str, int]:
"""Match each FIT under `export_root` to an activity by session start time.
Returns a summary dict: ``{"linked", "unmatched", "parse_failed", "collisions"}``.
`fit_iter` is an optional injection point — tests pass a pre-built sequence
of (path, session_start_utc | None) tuples to avoid the FIT-parsing path.
When None, FITs are discovered under `export_root` and parsed.
Collision behavior: if two FITs both match the *same* activity, the first
wins; subsequent matches are logged + skipped, never silently overwritten.
"""
if fit_iter is None and not export_root.exists():
sys.exit(f"path does not exist: {export_root}")
own_conn = conn is None
if conn is None:
conn = connect()
index = _load_activity_index(conn)
if not index:
sys.exit("no activities with start_time_gmt — ingest activities first")
sorted_keys = sorted(index.keys())
print(f"loaded {len(index):,} activity start times")
if fit_iter is None:
fit_iter = _iter_fits_with_session_start(export_root, min_size_kb=min_size_kb)
linked_aids: dict[int, Path] = {}
linked = unmatched = parse_failed = collisions = 0
for i, (p, start) in enumerate(fit_iter, 1):
if i % 50 == 0:
print(f"{i} processed (linked={linked}, unmatched={unmatched})")
if start is None:
parse_failed += 1
continue
aid = _match_activity(int(start.timestamp()), sorted_keys, index, tolerance_s)
if aid is None:
unmatched += 1
continue
if aid in linked_aids:
print(
f" ! collision: FIT {p.name} matches activity {aid} already linked "
f"to {linked_aids[aid].name}; skipping",
file=sys.stderr,
)
collisions += 1
continue
linked_aids[aid] = p
if not dry_run:
record_link(conn, aid, p)
linked += 1
if not dry_run:
conn.commit()
if own_conn:
conn.close()
print()
print("=== linker summary ===")
print(f" linked : {linked:,}" + (" (dry-run)" if dry_run else ""))
print(f" unmatched start : {unmatched:,} (no activity within ±{tolerance_s}s)")
print(f" collisions : {collisions:,} (two FITs matched the same activity)")
print(f" parse failures : {parse_failed:,}")
return {
"linked": linked,
"unmatched": unmatched,
"parse_failed": parse_failed,
"collisions": collisions,
}
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("export_root", help="Path to the unzipped Garmin export folder (or new location, with --relink)")
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--min-size-kb", type=int, default=50,
help="Skip FITs smaller than this; default 50 KB filters out HRV/sleep snapshots")
parser.add_argument("--tolerance-s", type=int, default=60,
help="Max seconds between FIT session start and activity start to count as a match")
parser.add_argument("--relink", action="store_true",
help="Rewrite stored FIT paths to a new export root by basename; skips FIT parsing")
args = parser.parse_args()
root = Path(args.export_root).expanduser().resolve()
if args.relink:
conn = connect()
try:
updated, unmatched = relink(conn, root)
finally:
conn.close()
print(f"=== relink summary ===\n updated : {updated:,}\n unmatched : {unmatched:,}")
return
link(
root,
dry_run=args.dry_run,
min_size_kb=args.min_size_kb,
tolerance_s=args.tolerance_s,
)
if __name__ == "__main__":
main()
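
The matching step in `_match_activity` is a nearest-neighbour lookup over sorted epoch seconds. A self-contained sketch of the same idea, with invented timestamps; `nearest_within` is a hypothetical name and returns the matched key, not an activity ID.

```python
from bisect import bisect_left
from typing import Optional


def nearest_within(target: int, sorted_keys: list, tolerance_s: int) -> Optional[int]:
    """Return the key closest to `target` if it lies within ±tolerance_s, else None."""
    pos = bisect_left(sorted_keys, target)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_keys[max(pos - 1, 0): pos + 1]
    in_tolerance = [c for c in candidates if abs(c - target) <= tolerance_s]
    return min(in_tolerance, key=lambda c: abs(c - target), default=None)


# Invented activity start times (epoch seconds, sorted ascending).
starts = [1_700_000_000, 1_700_003_600, 1_700_090_000]

hit = nearest_within(1_700_003_630, starts, 60)    # 30 s after the second start
miss = nearest_within(1_700_003_700, starts, 60)   # 100 s away from anything
early = nearest_within(1_699_999_990, starts, 60)  # 10 s before the first start
print(hit, miss, early)
```

Sorting once and bisecting keeps each lookup logarithmic, which matters when an export holds thousands of FIT files.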


@@ -0,0 +1,430 @@
"""Sync Garmin Connect data to local SQLite (Path B — live API via garth).
Usage:
python -m openrun.ingest.garmin_api # incremental: last 14 days wellness, new activities only
python -m openrun.ingest.garmin_api --full # full backfill (activities + 365 days wellness)
python -m openrun.ingest.garmin_api --days 90 # custom wellness backfill window
python -m openrun.ingest.garmin_api --no-details # skip per-activity detail/splits (faster)
python -m openrun.ingest.garmin_api --skip-activities # wellness only
Run from the project root. Run `openrun.ingest.auth` first to log in.
"""
from __future__ import annotations
import argparse
import dataclasses
import json
import sqlite3
import sys
import time
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Any, Callable
import garth
from garth.exc import GarthHTTPError
from ..db import connect, get_state, set_state
def _token_dir() -> Path:
return Path.cwd() / ".secrets"
# ---------------------------------------------------------------------------
# helpers
# ---------------------------------------------------------------------------
def _to_jsonable(obj: Any) -> Any:
"""Recursively convert dataclasses/datetimes/dates to JSON-safe values."""
if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
return {f.name: _to_jsonable(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
if isinstance(obj, (datetime, date)):
return obj.isoformat()
if isinstance(obj, dict):
return {k: _to_jsonable(v) for k, v in obj.items()}
if isinstance(obj, (list, tuple)):
return [_to_jsonable(v) for v in obj]
return obj
def _dump(obj: Any) -> str:
return json.dumps(_to_jsonable(obj), separators=(",", ":"), default=str)
def _safe_call(fn: Callable[[], Any], description: str) -> Any:
"""Run an API call with one retry on transient errors; return None on failure."""
for attempt in (1, 2):
try:
return fn()
except GarthHTTPError as exc:
status = getattr(getattr(exc, "error", None), "response", None)
code = getattr(status, "status_code", None)
if code == 404:
return None
if attempt == 1 and code in (429, 500, 502, 503, 504):
print(f" retry {description} ({code})", file=sys.stderr)
time.sleep(2)
continue
print(f" ! {description} failed: {exc}", file=sys.stderr)
return None
except Exception as exc: # noqa: BLE001
print(f" ! {description} failed: {exc}", file=sys.stderr)
return None
return None
# ---------------------------------------------------------------------------
# activity sync
# ---------------------------------------------------------------------------
def sync_activities(conn: sqlite3.Connection, *, full: bool, fetch_details: bool) -> int:
existing_ids: set[int] = {
row["activity_id"] for row in conn.execute("SELECT activity_id FROM activities")
}
page_size = 100
offset = 0
inserted = 0
detail_count = 0
while True:
page = _safe_call(
lambda: garth.connectapi(
"/activitylist-service/activities/search/activities",
params={"limit": page_size, "start": offset},
),
f"activity list start={offset}",
)
if not page:
break
new_in_page = 0
for raw in page:
aid = raw.get("activityId")
if aid is None:
continue
if aid in existing_ids and not full:
continue
new_in_page += 1
existing_ids.add(aid)
atype = raw.get("activityType") or {}
conn.execute(
"""
INSERT INTO activities (
activity_id, start_time_local, start_time_gmt, activity_type,
activity_name, distance_m, duration_s, moving_duration_s,
avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
elevation_gain_m, elevation_loss_m, training_load,
aerobic_te, anaerobic_te, vo2_max, raw, fetched_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))
ON CONFLICT(activity_id) DO UPDATE SET
activity_name=excluded.activity_name,
distance_m=excluded.distance_m,
duration_s=excluded.duration_s,
raw=excluded.raw,
fetched_at=excluded.fetched_at
""",
(
aid,
raw.get("startTimeLocal"),
raw.get("startTimeGMT"),
atype.get("typeKey"),
raw.get("activityName"),
raw.get("distance"),
raw.get("duration"),
raw.get("movingDuration"),
raw.get("averageSpeed"),
raw.get("maxSpeed"),
raw.get("averageHR"),
raw.get("maxHR"),
raw.get("calories"),
raw.get("elevationGain"),
raw.get("elevationLoss"),
raw.get("activityTrainingLoad"),
raw.get("aerobicTrainingEffect"),
raw.get("anaerobicTrainingEffect"),
raw.get("vO2MaxValue"),
_dump(raw),
),
)
inserted += 1
if fetch_details:
if _fetch_activity_details(conn, aid):
detail_count += 1
conn.commit()
print(f" activities: page offset={offset}, {new_in_page} new (total inserted: {inserted})")
if not full and new_in_page == 0:
break
if len(page) < page_size:
break
offset += page_size
if fetch_details:
print(f" fetched details/splits for {detail_count} activities")
return inserted
def _fetch_activity_details(conn: sqlite3.Connection, aid: int) -> bool:
detail = _safe_call(
lambda: garth.connectapi(f"/activity-service/activity/{aid}"),
f"activity detail {aid}",
)
if detail:
summary = detail.get("summaryDTO", {}) or {}
conn.execute(
"""
UPDATE activities SET
training_load = COALESCE(?, training_load),
aerobic_te = COALESCE(?, aerobic_te),
anaerobic_te = COALESCE(?, anaerobic_te),
vo2_max = COALESCE(?, vo2_max),
raw = ?
WHERE activity_id = ?
""",
(
summary.get("activityTrainingLoad"),
summary.get("trainingEffect"),
summary.get("anaerobicTrainingEffect"),
detail.get("vO2MaxValue") or summary.get("vO2MaxValue"),
_dump(detail),
aid,
),
)
splits = _safe_call(
lambda: garth.connectapi(f"/activity-service/activity/{aid}/splits"),
f"activity splits {aid}",
)
if splits and isinstance(splits, dict):
lap_dtos = splits.get("lapDTOs", []) or []
for idx, lap in enumerate(lap_dtos):
conn.execute(
"""
INSERT OR REPLACE INTO activity_splits
(activity_id, split_index, distance_m, duration_s, avg_hr, avg_speed_mps, elevation_gain_m, raw)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
aid,
idx,
lap.get("distance"),
lap.get("duration"),
lap.get("averageHR"),
lap.get("averageSpeed"),
lap.get("elevationGain"),
_dump(lap),
),
)
return detail is not None
# ---------------------------------------------------------------------------
# wellness sync
# ---------------------------------------------------------------------------
def sync_steps(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailySteps.list(end=end, period=period), "DailySteps") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_steps
(calendar_date, total_steps, step_goal, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.total_steps, r.step_goal, r.total_distance, _dump(r)),
)
conn.commit()
return len(rows)
def sync_sleep(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailySleep.list(end=end, period=period), "DailySleep") or []
for r in rows:
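# Upsert only the columns this endpoint provides, so stage durations written by
# the export ingester are not reset to NULL (INSERT OR REPLACE rewrites the whole row).
conn.execute(
"""INSERT INTO daily_sleep
(calendar_date, sleep_score, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))
ON CONFLICT(calendar_date) DO UPDATE SET
sleep_score=excluded.sleep_score,
raw=excluded.raw,
fetched_at=excluded.fetched_at""",
(r.calendar_date.isoformat(), r.value, _dump(r)),
)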
conn.commit()
return len(rows)
def sync_stress(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailyStress.list(end=end, period=period), "DailyStress") or []
for r in rows:
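# Upsert only the columns this endpoint provides, so max_stress written by the
# export ingester is not reset to NULL (INSERT OR REPLACE rewrites the whole row).
conn.execute(
"""INSERT INTO daily_stress
(calendar_date, avg_stress, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))
ON CONFLICT(calendar_date) DO UPDATE SET
avg_stress=excluded.avg_stress,
raw=excluded.raw,
fetched_at=excluded.fetched_at""",
(r.calendar_date.isoformat(), r.overall_stress_level, _dump(r)),
)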
conn.commit()
return len(rows)
def sync_hrv(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailyHRV.list(end=end, period=period), "DailyHRV") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_hrv
(calendar_date, weekly_avg, last_night_avg, last_night_5min, status, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
r.calendar_date.isoformat(),
r.weekly_avg,
r.last_night_avg,
r.last_night_5_min_high,
r.status,
_dump(r),
),
)
conn.commit()
return len(rows)
def sync_intensity_minutes(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(
lambda: garth.DailyIntensityMinutes.list(end=end, period=period),
"DailyIntensityMinutes",
) or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_intensity_minutes
(calendar_date, moderate_minutes, vigorous_minutes, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.moderate_value, r.vigorous_value, _dump(r)),
)
conn.commit()
return len(rows)
def sync_body_battery(conn: sqlite3.Connection, end: date, period: int) -> int:
inserted = 0
for i in range(period):
d = end - timedelta(days=i)
existing = conn.execute(
"SELECT 1 FROM daily_body_battery WHERE calendar_date = ?", (d.isoformat(),)
).fetchone()
if existing and i > 2:
continue
data = _safe_call(
lambda d=d: garth.connectapi(
"/wellness-service/wellness/bodyBattery/reports/daily",
params={"startDate": d.isoformat(), "endDate": d.isoformat()},
),
f"body battery {d}",
)
if not data:
continue
item = data[0] if isinstance(data, list) and data else data
if not isinstance(item, dict):
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_body_battery
(calendar_date, charged, drained, highest, lowest, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
d.isoformat(),
item.get("charged"),
item.get("drained"),
item.get("highestBatteryLevel") or item.get("highest"),
item.get("lowestBatteryLevel") or item.get("lowest"),
_dump(item),
),
)
inserted += 1
conn.commit()
return inserted
def sync_resting_hr(conn: sqlite3.Connection, end: date, period: int) -> int:
inserted = 0
for i in range(period):
d = end - timedelta(days=i)
if i > 2:
existing = conn.execute(
"SELECT 1 FROM daily_resting_hr WHERE calendar_date = ?", (d.isoformat(),)
).fetchone()
if existing:
continue
data = _safe_call(
lambda d=d: garth.connectapi(
f"/usersummary-service/usersummary/daily/{garth.client.username}",
params={"calendarDate": d.isoformat()},
),
f"daily summary {d}",
)
if not isinstance(data, dict):
continue
rhr = data.get("restingHeartRate")
if rhr is None:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_resting_hr
(calendar_date, resting_hr, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(d.isoformat(), rhr, _dump(data)),
)
inserted += 1
conn.commit()
return inserted
# ---------------------------------------------------------------------------
# main
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--full", action="store_true", help="Full backfill (all activities + 365 days of wellness)")
parser.add_argument("--days", type=int, default=None, help="Days of wellness to backfill")
parser.add_argument("--no-details", action="store_true", help="Skip per-activity detail/splits")
parser.add_argument("--skip-activities", action="store_true",
help="Wellness only, no activity sync")
args = parser.parse_args()
tok = _token_dir()
if not tok.exists() or not any(tok.iterdir()):
print(f"No tokens at {tok} — run `python -m openrun.ingest.auth` first.", file=sys.stderr)
sys.exit(1)
garth.resume(tok)
days = args.days if args.days is not None else (365 if args.full else 14)
end = date.today()
conn = connect()
if not args.skip_activities:
print(f"→ activities (full={args.full}, details={not args.no_details})")
n = sync_activities(conn, full=args.full, fetch_details=not args.no_details)
print(f" done: {n} activities upserted")
wellness = [
("steps", sync_steps),
("sleep score", sync_sleep),
("stress", sync_stress),
("HRV", sync_hrv),
("intensity minutes", sync_intensity_minutes),
("resting HR", sync_resting_hr),
("body battery", sync_body_battery),
]
for name, fn in wellness:
print(f"{name} (last {days} days)")
n = fn(conn, end, days)
print(f" done: {n} rows")
# time.gmtime() avoids datetime.utcnow(), deprecated since Python 3.12; same ISO-8601 output.
set_state(conn, "last_sync_utc", time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime()))
conn.commit()
conn.close()
print("✓ sync complete")
if __name__ == "__main__":
main()
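
The incremental (non `--full`) mode of `sync_activities` stops paging as soon as a whole page contains nothing new. A simulation of that stop rule, assuming the activity list comes back newest first (an assumption about the API, not something the code checks); `sync_new` and the integer IDs are illustrative.

```python
def sync_new(pages, existing_ids, page_size):
    """Simulate the incremental paging loop; returns (inserted_ids, pages_fetched)."""
    inserted = []
    fetched = 0
    for page in pages:
        fetched += 1
        if not page:
            break
        new = [aid for aid in page if aid not in existing_ids]
        existing_ids.update(new)
        inserted.extend(new)
        if not new:
            # Every ID on this page is already stored: assume older pages are too.
            break
        if len(page) < page_size:
            # Short page means the end of the list was reached.
            break
    return inserted, fetched


# Three newest-first pages; activities 1..6 are already in the database.
pages = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
inserted, fetched = sync_new(pages, existing_ids={1, 2, 3, 4, 5, 6}, page_size=3)
print(inserted, fetched)
```

One consequence of this rule: an older activity that only appears behind a full page of already-known ones would be skipped until a `--full` run.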


@@ -0,0 +1,512 @@
"""Ingest a Garmin Connect data export (zip or unzipped directory) into SQLite.
Usage:
uv run ingest_export.py path/to/export.zip
uv run ingest_export.py path/to/unzipped_export_dir/
Garmin's export contains a tree like:
DI_CONNECT/
DI-Connect-Fitness/ # activity JSONs + .fit files
DI-Connect-Wellness/ # daily wellness JSONs
DI-Connect-Aggregator/ # rolled-up summaries
DI-Connect-User/ # profile
...
File names vary by account / export date. We dispatch on filename substrings
and log anything unrecognized so you can tell us about new shapes.
"""
from __future__ import annotations
import argparse
import json
import sqlite3
import sys
import tempfile
import zipfile
from collections import defaultdict
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Iterable
from ..db import connect, set_state
from .fit_linker import record_link
# ---------------------------------------------------------------------------
# helpers
# ---------------------------------------------------------------------------
def _load_json(path: Path) -> Any:
try:
with path.open() as fh:
return json.load(fh)
except (OSError, json.JSONDecodeError) as exc:
print(f" ! failed to read {path.name}: {exc}", file=sys.stderr)
return None
def _as_list(payload: Any) -> list[dict]:
"""Garmin JSON files are sometimes a list, sometimes a {"key": [...]} envelope."""
if isinstance(payload, list):
return [x for x in payload if isinstance(x, dict)]
if isinstance(payload, dict):
for v in payload.values():
if isinstance(v, list):
return [x for x in v if isinstance(x, dict)]
return [payload]
return []
def _dump(obj: Any) -> str:
return json.dumps(obj, separators=(",", ":"), default=str)
def _first(d: dict, *keys: str) -> Any:
"""Pull the first non-null value among candidate keys (Garmin renames fields between exports)."""
for k in keys:
if k in d and d[k] is not None:
return d[k]
return None
def _date_key(d: dict) -> str | None:
"""Find the calendar date in a daily-stats record, normalized to ISO."""
raw = _first(d, "calendarDate", "date", "summaryDate", "statisticsStartDate")
if not raw:
return None
if isinstance(raw, str):
return raw[:10]
return str(raw)[:10]
# ---------------------------------------------------------------------------
# handlers — one per data category
# ---------------------------------------------------------------------------
def handle_activities(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
# summarizedActivities format: [{"summarizedActivitiesExport": [{...}, {...}]}]
if isinstance(payload, list) and payload and isinstance(payload[0], dict):
if "summarizedActivitiesExport" in payload[0]:
items = payload[0]["summarizedActivitiesExport"]
else:
items = payload
elif isinstance(payload, dict):
items = _as_list(payload)
else:
items = []
# The Garmin Takeout `summarizedActivities` export uses scaled integer units:
# distance cm → m (÷100)
# duration ms → s (÷1000)
# elevation cm → m (÷100)
# speed m/s ÷ 10 → m/s (×10)
# The live API (`sync.py`) returns these in SI directly. The factors below are applied
# unconditionally, so this handler assumes every file routed to it uses the scaled export units.
def _scale(v, factor):
return None if v is None else v * factor
n = 0
for raw in items:
aid = _first(raw, "activityId", "activityIdLocal")
if aid is None:
continue
atype = raw.get("activityType")
if isinstance(atype, dict):
type_key = atype.get("typeKey")
else:
type_key = atype # sometimes a plain string in exports
conn.execute(
"""
INSERT INTO activities (
activity_id, start_time_local, start_time_gmt, activity_type,
activity_name, distance_m, duration_s, moving_duration_s,
avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
elevation_gain_m, elevation_loss_m, training_load,
aerobic_te, anaerobic_te, vo2_max, raw, fetched_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))
ON CONFLICT(activity_id) DO UPDATE SET
activity_name=excluded.activity_name,
raw=excluded.raw,
fetched_at=excluded.fetched_at
""",
(
aid,
_first(raw, "startTimeLocal", "beginTimestamp"),
_first(raw, "startTimeGmt", "startTimeGMT"),
type_key,
_first(raw, "activityName", "name"),
_scale(_first(raw, "distance"), 0.01),
_scale(_first(raw, "duration"), 0.001),
_scale(_first(raw, "movingDuration"), 0.001),
_scale(_first(raw, "averageSpeed", "avgSpeed"), 10.0),
_scale(_first(raw, "maxSpeed"), 10.0),
_first(raw, "averageHR", "avgHr"),
_first(raw, "maxHR", "maxHr"),
_first(raw, "calories"),
_scale(_first(raw, "elevationGain"), 0.01),
_scale(_first(raw, "elevationLoss"), 0.01),
_first(raw, "activityTrainingLoad"),
_first(raw, "aerobicTrainingEffect"),
_first(raw, "anaerobicTrainingEffect"),
_first(raw, "vO2MaxValue"),
_dump(raw),
),
)
n += 1
conn.commit()
return n
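A minimal sketch of the unit conversion described in the comment inside `handle_activities`, assuming a made-up export record (the values below are hypothetical, and `scale` simply mirrors the `_scale` helper):

```python
# Hypothetical record in the scaled-integer units of the summarizedActivities export.
raw = {
    "distance": 1_000_000,    # centimetres
    "duration": 3_600_000,    # milliseconds
    "elevationGain": 12_500,  # centimetres
    "averageSpeed": 0.25,     # (m/s) / 10
    "maxSpeed": None,         # missing values must stay missing
}


def scale(v, factor):
    """Multiply by `factor`, passing None through unchanged (mirrors `_scale`)."""
    return None if v is None else v * factor


converted = {
    "distance_m": scale(raw.get("distance"), 0.01),             # cm -> m
    "duration_s": scale(raw.get("duration"), 0.001),            # ms -> s
    "elevation_gain_m": scale(raw.get("elevationGain"), 0.01),  # cm -> m
    "avg_speed_mps": scale(raw.get("averageSpeed"), 10.0),      # (m/s)/10 -> m/s
    "max_speed_mps": scale(raw.get("maxSpeed"), 10.0),          # None stays None
}
print(converted)
```

Passing `None` through rather than coercing it to zero keeps "not recorded" distinguishable from "recorded as 0" once the row is in SQLite.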
def handle_sleep(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_sleep
(calendar_date, sleep_start_gmt, sleep_end_gmt, deep_s, light_s, rem_s, awake_s, sleep_score, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "sleepStartTimestampGMT", "sleepStartTimeGmt"),
_first(item, "sleepEndTimestampGMT", "sleepEndTimeGmt"),
_first(item, "deepSleepSeconds"),
_first(item, "lightSleepSeconds"),
_first(item, "remSleepSeconds"),
_first(item, "awakeSleepSeconds", "awakeSeconds"),
_first(item, "sleepScore", "overallSleepScore", "value"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_steps(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_steps
(calendar_date, total_steps, step_goal, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "totalSteps", "steps"),
_first(item, "stepGoal", "dailyStepGoal"),
_first(item, "totalDistance", "distance"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_stress(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_stress
(calendar_date, avg_stress, max_stress, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "overallStressLevel", "averageStressLevel", "avgStress"),
_first(item, "maxStressLevel", "maxStress"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_hrv(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_hrv
(calendar_date, weekly_avg, last_night_avg, last_night_5min, status, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "weeklyAvg"),
_first(item, "lastNightAvg"),
_first(item, "lastNight5MinHigh"),
_first(item, "status"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_resting_hr(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
rhr = _first(item, "restingHeartRate", "value")
if rhr is None:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_resting_hr
(calendar_date, resting_hr, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(date_key, rhr, _dump(item)),
)
n += 1
conn.commit()
return n
def handle_intensity_minutes(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_intensity_minutes
(calendar_date, moderate_minutes, vigorous_minutes, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "moderateIntensityMinutes", "moderateValue"),
_first(item, "vigorousIntensityMinutes", "vigorousValue"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def handle_body_battery(conn: sqlite3.Connection, path: Path) -> int:
payload = _load_json(path)
if payload is None:
return 0
n = 0
for item in _as_list(payload):
date_key = _date_key(item)
if not date_key:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_body_battery
(calendar_date, charged, drained, highest, lowest, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
date_key,
_first(item, "charged", "bodyBatteryChargedValue"),
_first(item, "drained", "bodyBatteryDrainedValue"),
_first(item, "highest", "highestBatteryLevel", "bodyBatteryHighestValue"),
_first(item, "lowest", "lowestBatteryLevel", "bodyBatteryLowestValue"),
_dump(item),
),
)
n += 1
conn.commit()
return n
def _activity_id_from_filename(stem: str) -> int | None:
"""Pick the first 8+-digit chunk from a FIT filename stem.
Garmin uses a few filename formats — `<id>_<name>.fit` for the classic
Connect export and `<email>_<upload_id>.fit` for the Takeout dump. The 8+
digit threshold filters out the year prefix used in some Takeout naming.
"""
for chunk in stem.split("_"):
if chunk.isdigit() and len(chunk) >= 8:
return int(chunk)
return None
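To illustrate the 8+-digit rule, here is a standalone copy of the same logic run against a few hypothetical filename stems (the stems are invented for illustration and are not taken from a real export):

```python
from typing import Optional


def activity_id_from_stem(stem: str) -> Optional[int]:
    """Return the first all-digit chunk of length >= 8, or None."""
    for chunk in stem.split("_"):
        if chunk.isdigit() and len(chunk) >= 8:
            return int(chunk)
    return None


# Classic-style stem: the id comes first.
classic = activity_id_from_stem("12345678901_Morning_Run")
# Takeout-style stem: the 4-digit year is skipped, the longer number is used.
takeout = activity_id_from_stem("runner@example.com_2024_987654321")
# No chunk is long enough, so nothing is returned.
missing = activity_id_from_stem("2024_notes")
print(classic, takeout, missing)
```

Note that for Takeout-style names the number returned is an upload id, which is why `handle_fit` still checks it against `activities` before linking.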
def handle_fit(conn: sqlite3.Connection, path: Path) -> int:
"""Index .fit file location by activity ID. Don't parse the binary here.
Returns 1 on link, 0 otherwise (no parseable id, or parseable id not in
`activities`). Callers needing to distinguish the two cases should call
`_activity_id_from_filename` themselves before calling this.
"""
aid = _activity_id_from_filename(path.stem)
if aid is None:
return 0
# Verify the parsed number is actually an activity_id we know about.
# The Garmin Takeout dump uses *upload IDs* in FIT filenames (different ID space
# from activity IDs), so naive insert would create thousands of orphaned rows.
# If the id isn't in `activities`, skip — link_fit_files.py handles the
# by-content matching for takeout-format exports.
row = conn.execute(
"SELECT 1 FROM activities WHERE activity_id = ? LIMIT 1", (aid,)
).fetchone()
if row is None:
return 0
record_link(conn, aid, path)
conn.commit()
return 1
# ---------------------------------------------------------------------------
# dispatch — pattern → handler
# ---------------------------------------------------------------------------
Handler = Callable[[sqlite3.Connection, Path], int]
# Order matters: first match wins. Needles are matched as case-insensitive
# substrings of the filename (both sides are lowercased in `classify`).
DISPATCH: list[tuple[str, str, Handler]] = [
("summarizedActivities", "activities", handle_activities),
("sleepData", "sleep", handle_sleep),
("sleep", "sleep", handle_sleep),
("UDSFile", "steps", handle_steps), # daily steps file naming varies
("step", "steps", handle_steps),
("stressLevel", "stress", handle_stress),
("stress", "stress", handle_stress),
("hrvStatus", "hrv", handle_hrv),
("hrv", "hrv", handle_hrv),
("restingHeart", "resting_hr", handle_resting_hr),
("RHR", "resting_hr", handle_resting_hr),
("intensityMinute", "intensity", handle_intensity_minutes),
("bodyBattery", "body_batt", handle_body_battery),
("BBS", "body_batt", handle_body_battery),
]
def classify(name: str) -> tuple[str, Handler] | None:
lower = name.lower()
for needle, label, fn in DISPATCH:
if needle.lower() in lower:
return label, fn
return None
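A small sketch of the first-match-wins, case-insensitive substring dispatch, using a trimmed, hypothetical table that maps needles to labels only (the filenames are invented for illustration):

```python
from typing import Optional

# Hypothetical subset of the dispatch table: (needle, label). Order matters.
DISPATCH_SKETCH = [
    ("sleepData", "sleep"),
    ("sleep", "sleep"),
    ("stressLevel", "stress"),
    ("stress", "stress"),
    ("RHR", "resting_hr"),
]


def classify_label(name: str) -> Optional[str]:
    """Return the label of the first needle found in `name`, ignoring case."""
    lower = name.lower()
    for needle, label in DISPATCH_SKETCH:
        if needle.lower() in lower:
            return label
    return None


print(classify_label("2024_sleepData.json"))        # matched by "sleepData"
print(classify_label("STRESSLEVEL_01.json"))        # case is ignored
print(classify_label("sleep_stress_summary.json"))  # "sleep" is listed first
print(classify_label("otherhrdata.json"))           # short needle "rhr" matches inside "otherhrdata"
print(classify_label("hydration.json"))             # no needle matches
```

Short needles such as `RHR` can match unrelated names once lowercased, so it may be worth running with `--dry-run` first and checking the per-label counts.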
# ---------------------------------------------------------------------------
# entry
# ---------------------------------------------------------------------------
def iter_files(root: Path) -> Iterable[Path]:
for p in root.rglob("*"):
if p.is_file():
yield p
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("source", help="Path to export.zip or unzipped export directory")
parser.add_argument("--dry-run", action="store_true",
help="Show what would be ingested without writing to the DB")
args = parser.parse_args()
src = Path(args.source).expanduser().resolve()
if not src.exists():
sys.exit(f"path does not exist: {src}")
cleanup_dir: tempfile.TemporaryDirectory | None = None
if src.is_file() and src.suffix.lower() == ".zip":
cleanup_dir = tempfile.TemporaryDirectory(prefix="garmin_export_")
export_root = Path(cleanup_dir.name)
print(f"unzipping {src.name} → {export_root}")
with zipfile.ZipFile(src) as zf:
zf.extractall(export_root)
elif src.is_dir():
export_root = src
else:
sys.exit(f"unsupported source: {src}")
conn = connect()
counts: dict[str, int] = defaultdict(int)
unknown: list[str] = []
fit_count = 0
fit_skipped_no_id = 0
for path in iter_files(export_root):
if path.suffix.lower() == ".fit":
if _activity_id_from_filename(path.stem) is None:
fit_skipped_no_id += 1
continue
if not args.dry_run:
fit_count += handle_fit(conn, path)
else:
fit_count += 1
continue
if path.suffix.lower() != ".json":
continue
matched = classify(path.name)
if not matched:
unknown.append(str(path.relative_to(export_root)))
continue
label, fn = matched
if args.dry_run:
counts[label] += 1
continue
try:
counts[label] += fn(conn, path)
except Exception as exc: # noqa: BLE001
print(f" ! error in {label} handler for {path.name}: {exc}", file=sys.stderr)
print("\n=== ingest summary ===")
for label, n in sorted(counts.items()):
print(f" {label:20s} {n:>6} rows" + (" (file count, dry run)" if args.dry_run else ""))
print(f" fit_files {fit_count:>6}")
if fit_skipped_no_id:
print(
f" {fit_skipped_no_id:>6} FITs skipped (no activity-id in filename; "
f"run openrun-link-fit {export_root} for Takeout-style exports)"
)
if unknown:
print(f"\n unrecognized JSON files ({len(unknown)}):")
for name in unknown[:25]:
print(f" {name}")
if len(unknown) > 25:
print(f" ... and {len(unknown) - 25} more")
if not args.dry_run:
set_state(conn, "last_ingest_utc", datetime.utcnow().isoformat(timespec="seconds"))
set_state(conn, "last_ingest_source", str(src))
conn.commit()
conn.close()
if cleanup_dir:
cleanup_dir.cleanup()
print("\n✓ done")
if __name__ == "__main__":
main()


@@ -0,0 +1,134 @@
"""Precompute per-activity time-in-zone and cache it in `activity_time_in_zone`.
For each activity, prefer the linked FIT file (per-second HR) over splits
(lap-averaged HR). Skips activities already in the cache unless --force.
Usage:
uv run compute_time_in_zone.py # incremental
uv run compute_time_in_zone.py --force # recompute all
uv run compute_time_in_zone.py --type running # restrict by activity type
The split (lap-averaged) fallback is intentionally noisy compared to FIT —
re-running with new FITs is the fix, not retuning the lap method.
"""
from __future__ import annotations
import argparse
import sys
import time
from pathlib import Path
import pandas as pd
from ..config import default_profile
from ..db import connect
from ..model import (
load_fit_records,
time_in_zone_from_fit,
time_in_zone_from_splits,
)
# Zone labels from the configured profile. `_upsert` writes to the fixed
# z1_s..z5_s columns, so the profile is expected to expose exactly Z1..Z5.
_ZONE_LABELS = tuple(z[0] for z in default_profile().hr_zones().as_tuples())
def _upsert(conn, activity_id: int, by_zone: dict[str, float], source: str) -> None:
z = {lab: by_zone.get(lab, 0.0) for lab in _ZONE_LABELS}
total = sum(z.values())
conn.execute(
"""INSERT OR REPLACE INTO activity_time_in_zone
(activity_id, z1_s, z2_s, z3_s, z4_s, z5_s, total_s, source, computed_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))""",
(activity_id, z["Z1"], z["Z2"], z["Z3"], z["Z4"], z["Z5"], total, source),
)
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--force", action="store_true", help="Recompute even if cached")
parser.add_argument("--type", default=None, help="Restrict to one activity_type (e.g. running)")
parser.add_argument("--limit", type=int, default=None, help="Stop after N activities (debugging)")
args = parser.parse_args()
conn = connect()
# Pick targets: all activities, optionally filtered by type, minus already-cached
sql = "SELECT a.activity_id, a.activity_type FROM activities a"
params: list = []
where: list[str] = []
if args.type:
where.append("a.activity_type = ?")
params.append(args.type)
if not args.force:
where.append(
"a.activity_id NOT IN (SELECT activity_id FROM activity_time_in_zone)"
)
if where:
sql += " WHERE " + " AND ".join(where)
sql += " ORDER BY a.start_time_local DESC"
if args.limit:
sql += f" LIMIT {args.limit}"
targets = list(conn.execute(sql, params))
print(f"{len(targets):,} activities to compute "
f"({'force' if args.force else 'incremental'}"
+ (f", type={args.type}" if args.type else "")
+ ")")
if not targets:
return
# Pre-load all splits in one query for the lap-fallback path
splits_df = pd.read_sql(
"SELECT activity_id, avg_hr, duration_s FROM activity_splits",
conn,
)
splits_by_id = {aid: g for aid, g in splits_df.groupby("activity_id")}
fit_count = lap_count = empty = errors = 0
t0 = time.time()
for i, row in enumerate(targets, 1):
aid = row["activity_id"]
has_fit = conn.execute(
"SELECT 1 FROM activity_fit_files WHERE activity_id = ? LIMIT 1", (aid,)
).fetchone() is not None
try:
if has_fit:
rec = load_fit_records(conn, aid)
tiz = time_in_zone_from_fit(rec)
if tiz:
_upsert(conn, aid, tiz, "fit")
fit_count += 1
if i % 25 == 0:
rate = i / max(time.time() - t0, 1e-6)
eta = (len(targets) - i) / rate
print(f" {i}/{len(targets)} fit={fit_count} lap={lap_count} empty={empty} "
f"rate={rate:.1f}/s eta={eta:.0f}s")
continue
# Fallback: lap-averaged
s = splits_by_id.get(aid)
if s is not None and not s.empty:
tiz = time_in_zone_from_splits(s)
if tiz:
_upsert(conn, aid, tiz, "lap")
lap_count += 1
continue
empty += 1
except Exception as exc: # noqa: BLE001
errors += 1
print(f" ! aid={aid}: {type(exc).__name__}: {exc}", file=sys.stderr)
if i % 50 == 0:
conn.commit()
conn.commit()
print()
print("=== summary ===")
print(f" from FIT : {fit_count:,}")
print(f" from laps : {lap_count:,}")
print(f" empty : {empty:,} (no FIT, no useful splits)")
print(f" errors : {errors:,}")
print(f" elapsed : {time.time() - t0:.1f} s")
if __name__ == "__main__":
main()

797
src/openrun/model.py Normal file

@@ -0,0 +1,797 @@
"""Loaders + derived metrics — single-module home for everything notebooks consume.
Originally `analysis.py`. Moved into the `openrun` package as part of Phase 0.
Functions that need user-specific configuration (HR zones, race calendar) pick it
up from `openrun.config.default_config()` unless an explicit override is passed.
"""
from __future__ import annotations
import json
import sqlite3
from pathlib import Path
import pandas as pd
from .config import HRZones, UserProfile, default_config, default_profile
from .db import connect
# ---------------------------------------------------------------------------
# Convenience re-export — most notebooks just want `open_conn()`.
# ---------------------------------------------------------------------------
def open_conn(db_path: Path | str | None = None) -> sqlite3.Connection:
"""Open (and create-if-missing) the SQLite DB. Defaults to config path."""
return connect(db_path)
# ---------------------------------------------------------------------------
# activities
# ---------------------------------------------------------------------------
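def load_activities(conn: sqlite3.Connection, *, type: str | None = None) -> pd.DataFrame:
"""Activities plus derived columns: distance_km, duration_min, moving_min, pace_min_per_km, date, week, month, year.
Implausible paces (distance under 0.2 km, or pace outside 3 to 30 min/km) are masked to NA.
"""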
sql = """
SELECT activity_id, start_time_local, activity_type, activity_name,
distance_m, duration_s, moving_duration_s,
avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
elevation_gain_m, elevation_loss_m,
training_load, aerobic_te, anaerobic_te, vo2_max
FROM activities
"""
if type:
sql += " WHERE activity_type = ?"
df = pd.read_sql(sql, conn, params=[type], parse_dates=["start_time_local"])
else:
df = pd.read_sql(sql, conn, parse_dates=["start_time_local"])
df["distance_km"] = df["distance_m"] / 1000
df["duration_min"] = df["duration_s"] / 60
df["moving_min"] = df["moving_duration_s"] / 60
df["pace_min_per_km"] = df["moving_min"] / df["distance_km"]
impossible = (df["distance_km"] < 0.2) | (df["pace_min_per_km"] < 3) | (df["pace_min_per_km"] > 30)
df.loc[impossible, "pace_min_per_km"] = pd.NA
df["date"] = df["start_time_local"].dt.normalize()
df["week"] = df["start_time_local"].dt.to_period("W").dt.start_time
df["month"] = df["start_time_local"].dt.to_period("M").dt.to_timestamp()
df["year"] = df["start_time_local"].dt.year
return df.sort_values("start_time_local").reset_index(drop=True)
# ---------------------------------------------------------------------------
# wellness
# ---------------------------------------------------------------------------
def load_wellness(conn: sqlite3.Connection) -> pd.DataFrame:
"""Joined daily wellness frame indexed by calendar_date (datetime)."""
df = pd.read_sql(
"""
SELECT s.calendar_date,
s.total_steps,
sl.sleep_score,
sl.deep_s, sl.light_s, sl.rem_s, sl.awake_s,
st.avg_stress,
h.last_night_avg AS hrv_last_night,
h.weekly_avg AS hrv_weekly,
h.status AS hrv_status,
im.moderate_minutes,
im.vigorous_minutes,
rh.resting_hr,
bb.charged AS bb_charged,
bb.drained AS bb_drained,
bb.highest AS bb_highest,
bb.lowest AS bb_lowest
FROM daily_steps s
LEFT JOIN daily_sleep sl ON sl.calendar_date = s.calendar_date
LEFT JOIN daily_stress st ON st.calendar_date = s.calendar_date
LEFT JOIN daily_hrv h ON h.calendar_date = s.calendar_date
LEFT JOIN daily_intensity_minutes im ON im.calendar_date = s.calendar_date
LEFT JOIN daily_resting_hr rh ON rh.calendar_date = s.calendar_date
LEFT JOIN daily_body_battery bb ON bb.calendar_date = s.calendar_date
ORDER BY s.calendar_date
""",
conn,
parse_dates=["calendar_date"],
).set_index("calendar_date")
df["sleep_total_s"] = df[["deep_s", "light_s", "rem_s"]].sum(axis=1, min_count=1)
df["sleep_hours"] = df["sleep_total_s"] / 3600
df["deep_pct"] = df["deep_s"] / df["sleep_total_s"]
df["rem_pct"] = df["rem_s"] / df["sleep_total_s"]
return df
def load_sleep_stages(conn: sqlite3.Connection) -> pd.DataFrame:
"""Per-night sleep-stage breakdown from `daily_sleep`.
Returns a DataFrame indexed by `calendar_date` (datetime) with columns:
deep_s, light_s, rem_s, awake_s, sleep_score,
sleep_total_s, sleep_hours,
deep_pct, light_pct, rem_pct (the non-NaN ones sum to 1.0),
awake_pct (awake / total-time-in-bed; counts awake_s separately).
Nights with no asleep time (deep+light+rem == 0) yield NaN percentages.
Nights where one stage is NULL upstream (e.g. older Garmin firmware) leave
that stage's pct as NaN; the remaining present-stage pcts still sum to 1.
"""
df = pd.read_sql(
"""
SELECT calendar_date, deep_s, light_s, rem_s, awake_s, sleep_score
FROM daily_sleep
ORDER BY calendar_date
""",
conn,
parse_dates=["calendar_date"],
).set_index("calendar_date")
df["sleep_total_s"] = df[["deep_s", "light_s", "rem_s"]].sum(axis=1, min_count=1)
df["sleep_hours"] = df["sleep_total_s"] / 3600
# Percent of *asleep* time spent in each sleep stage — these three sum to 1
# for any night with positive sleep_total_s.
df["deep_pct"] = df["deep_s"] / df["sleep_total_s"]
df["light_pct"] = df["light_s"] / df["sleep_total_s"]
df["rem_pct"] = df["rem_s"] / df["sleep_total_s"]
# Awake is a fourth bucket — fraction of *time in bed* spent awake.
time_in_bed = df["sleep_total_s"] + df["awake_s"]
df["awake_pct"] = df["awake_s"] / time_in_bed
return df
def daily_training_load(conn: sqlite3.Connection) -> pd.DataFrame:
"""Sum training load + distance per calendar date (any activity type)."""
acts = load_activities(conn)
daily = (
acts.groupby("date")
.agg(
training_load=("training_load", "sum"),
distance_km=("distance_km", "sum"),
duration_min=("duration_min", "sum"),
n_activities=("activity_id", "count"),
avg_hr_weighted=("avg_hr", "mean"),  # NOTE: plain mean over the day's activities, not duration-weighted, despite the name
)
)
daily.index = pd.to_datetime(daily.index)
return daily
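def joined(conn: sqlite3.Connection) -> pd.DataFrame:
"""Wellness joined with same-day training load and the previous row's load (`*_prev`).
The previous row is the previous calendar day only when the wellness index has no date gaps.
"""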
wellness = load_wellness(conn)
tl = daily_training_load(conn)
df = wellness.join(tl, how="left")
df[["training_load", "distance_km", "duration_min", "n_activities"]] = (
df[["training_load", "distance_km", "duration_min", "n_activities"]].fillna(0)
)
df["training_load_prev"] = df["training_load"].shift(1)
df["distance_km_prev"] = df["distance_km"].shift(1)
return df
def expand_raw(df: pd.DataFrame, raw_col: str = "raw") -> pd.DataFrame:
"""For a frame with a `raw` JSON column, return a normalized companion frame."""
if raw_col not in df.columns:
raise KeyError(f"no '{raw_col}' column in frame")
return pd.json_normalize([json.loads(r) for r in df[raw_col]])
# ---------------------------------------------------------------------------
# splits — per-lap data with cadence, stride, GPS, etc. extracted from raw JSON
# ---------------------------------------------------------------------------
_SPLIT_RAW_FIELDS = (
"averageRunCadence",
"maxRunCadence",
"strideLength",
"verticalOscillation",
"verticalRatio",
"groundContactTime",
"averagePower",
"normalizedPower",
"startLatitude",
"startLongitude",
"endLatitude",
"endLongitude",
"avgGradeAdjustedSpeed",
"maxHR",
"elevationGain",
"elevationLoss",
)
def load_splits(conn: sqlite3.Connection, *, activity_type: str | None = "running") -> pd.DataFrame:
"""Per-split frame with rich fields expanded from raw JSON, joined to activity start time."""
sql = """
SELECT s.activity_id, s.split_index, s.distance_m, s.duration_s,
s.avg_hr, s.avg_speed_mps, s.elevation_gain_m AS split_elev_gain_m,
s.raw, a.start_time_local, a.activity_type
FROM activity_splits s
JOIN activities a ON a.activity_id = s.activity_id
"""
params: list = []
if activity_type:
sql += " WHERE a.activity_type = ?"
params.append(activity_type)
sql += " ORDER BY s.activity_id, s.split_index"
df = pd.read_sql(sql, conn, params=params, parse_dates=["start_time_local"])
raws = [json.loads(r) if r else {} for r in df["raw"]]
for k in _SPLIT_RAW_FIELDS:
df[k] = [r.get(k) for r in raws]
df = df.drop(columns=["raw"])
df["pace_min_per_km"] = (df["duration_s"] / 60) / (df["distance_m"] / 1000)
df["pace_min_per_mile"] = (df["duration_s"] / 60) / (df["distance_m"] / 1609.344)
df["speed_kmh"] = df["avg_speed_mps"] * 3.6
bad = (
df["distance_m"].lt(200)
| df["avg_hr"].isna()
| df["avg_hr"].lt(60)
| df["pace_min_per_km"].gt(30)
| df["pace_min_per_km"].lt(2.5)
)
df = df.loc[~bad].copy()
df["split_seq"] = df.groupby("activity_id").cumcount()
df["n_splits"] = df.groupby("activity_id")["activity_id"].transform("count")
denom = (df["n_splits"] - 1).replace(0, pd.NA)
df["frac_through"] = df["split_seq"] / denom
df["year"] = df["start_time_local"].dt.year
df["month"] = df["start_time_local"].dt.to_period("M").dt.to_timestamp()
return df.reset_index(drop=True)
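def decoupling(splits: pd.DataFrame, min_splits: int = 6) -> pd.DataFrame:
"""Per-activity Pa:HR decoupling using duration-weighted halves (split-level).
Efficiency per half is duration-weighted speed / duration-weighted HR, and
`decoupling_pct = (first / second - 1) * 100`, so a positive value means
efficiency dropped in the second half.
See `fit_decoupling` for the per-second version (cleaner: it excludes stopped time).
"""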
valid = splits[splits["n_splits"] >= min_splits].copy()
valid["half"] = (valid["frac_through"] >= 0.5).map({False: "first", True: "second"})
def _half_eff(d: pd.DataFrame) -> float:
w = d["duration_s"].to_numpy()
speed = (d["avg_speed_mps"].to_numpy() * w).sum() / w.sum()
hr = (d["avg_hr"].to_numpy() * w).sum() / w.sum()
return speed / hr if hr else float("nan")
eff = (
valid.groupby(["activity_id", "half"])[["avg_speed_mps", "avg_hr", "duration_s"]]
.apply(_half_eff)
.unstack("half")
)
eff["decoupling_pct"] = (eff["first"] / eff["second"] - 1) * 100
eff = eff.dropna(subset=["decoupling_pct"])
meta = valid.groupby("activity_id").agg(
start_time_local=("start_time_local", "first"),
distance_km=("distance_m", lambda s: s.sum() / 1000),
duration_min=("duration_s", lambda s: s.sum() / 60),
avg_hr=("avg_hr", "mean"),
avg_pace_min_per_km=("pace_min_per_km", "mean"),
n_splits=("n_splits", "first"),
)
out = eff.join(meta).reset_index()
out["year"] = out["start_time_local"].dt.year
return out
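A worked sketch of the decoupling arithmetic in pure Python, assuming four hypothetical laps given as `(duration_s, speed_mps, hr)` tuples; it follows the same `first / second - 1` definition used in `decoupling`, not any other published variant:

```python
def weighted_efficiency(laps):
    """Duration-weighted speed divided by duration-weighted HR."""
    total = sum(d for d, _, _ in laps)
    speed = sum(d * v for d, v, _ in laps) / total
    hr = sum(d * h for d, _, h in laps) / total
    return speed / hr


# Hypothetical run: pace fades slightly while HR drifts up in the second half.
first_half = [(300, 3.0, 150), (300, 3.0, 150)]
second_half = [(300, 2.9, 155), (300, 2.9, 155)]

ef_first = weighted_efficiency(first_half)    # 3.0 / 150
ef_second = weighted_efficiency(second_half)  # 2.9 / 155
decoupling_pct = (ef_first / ef_second - 1) * 100
print(round(decoupling_pct, 2))  # roughly 6.9
```

With equal-length laps the weighting has no effect; it matters once laps differ in duration, for example a short final lap.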
# ---------------------------------------------------------------------------
# HR zones — bound to the loaded user profile by default
# ---------------------------------------------------------------------------
ZoneTuples = tuple[tuple[str, int, int], ...]
def _resolve_zones(zones: ZoneTuples | HRZones | UserProfile | None) -> ZoneTuples:
"""Accept any of: explicit tuples, HRZones, UserProfile, or None (= config default)."""
if zones is None:
return default_profile().hr_zones().as_tuples()
if isinstance(zones, UserProfile):
return zones.hr_zones().as_tuples()
if isinstance(zones, HRZones):
return zones.as_tuples()
return zones
# Legacy %HRmax helper, preserved for callers who want generic percentage-of-max
# bands. These are not Karvonen zones, which are based on heart-rate reserve.
_HR_ZONE_BOUNDS_PCT = (0.50, 0.60, 0.70, 0.80, 0.90, 1.01)
_HR_ZONE_LABELS = ("Z1", "Z2", "Z3", "Z4", "Z5")
def assign_hr_zone(hr: float, hr_max: float) -> str | None:
"""Generic %HRmax bands (Z1=50–60%, Z2=60–70%, …, Z5=90–100%). Prefer hr_to_user_zone."""
if hr is None or pd.isna(hr) or not hr_max:
return None
frac = hr / hr_max
for lo, hi, lab in zip(_HR_ZONE_BOUNDS_PCT[:-1], _HR_ZONE_BOUNDS_PCT[1:], _HR_ZONE_LABELS):
if lo <= frac < hi:
return lab
return "Z5" if frac >= _HR_ZONE_BOUNDS_PCT[-2] else "Z1"
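def hr_to_user_zone(
    hr: float,
    zones: ZoneTuples | HRZones | UserProfile | None = None,
) -> str | None:
    """Map a single HR reading to its configured zone label (Z1..Z5).

    Below Z1 floor → None (warmup / walking).
    Above Z5 ceiling → still Z5 (rare, edge of effort).
    Zones are assumed to be listed in ascending order with inclusive integer
    bounds. A fractional reading that lands between two bands (e.g. a lap
    average of 140.5 with bands ending at 140 and starting at 141) stays in
    the lower band.
    """
    if hr is None or pd.isna(hr):
        return None
    z = _resolve_zones(zones)
    if hr < z[0][1]:
        return None
    for i, (label, _lo, hi) in enumerate(z):
        # Extend each band up to the next band's floor so that gaps between
        # integer bounds cannot fall through to the top zone.
        next_lo = z[i + 1][1] if i + 1 < len(z) else None
        if hr <= hi or (next_lo is not None and hr < next_lo):
            return label
    return z[-1][0]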
def time_in_zone_from_fit(
records: pd.DataFrame,
zones: ZoneTuples | HRZones | UserProfile | None = None,
) -> dict[str, float]:
"""Per-second time-in-zone (seconds) from a FIT records frame.
Each record contributes `elapsed_s - prev_elapsed_s` to whichever zone its HR
falls in. Large gaps (>30 s, e.g. a paused recording) are clipped to 30 s.
"""
if records is None or records.empty or "heart_rate" not in records:
return {}
r = records.dropna(subset=["heart_rate", "elapsed_s"]).copy()
if r.empty:
return {}
z = _resolve_zones(zones)
r["dt"] = r["elapsed_s"].diff().fillna(1.0).clip(lower=0, upper=30.0)
r["zone"] = r["heart_rate"].apply(lambda h: hr_to_user_zone(h, z))
return r.dropna(subset=["zone"]).groupby("zone")["dt"].sum().to_dict()
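The per-record accumulation can be sketched without pandas. This is an illustrative reimplementation under stated assumptions: the two zone bands are hypothetical, and records are `(elapsed_s, heart_rate)` pairs rather than a FIT frame:

```python
from collections import defaultdict

# Hypothetical zone bands (label, low, high), inclusive.
ZONES = (("Z1", 100, 129), ("Z2", 130, 159))


def zone_for(hr):
    """Label for `hr` within the hypothetical bands above, else None."""
    for label, lo, hi in ZONES:
        if lo <= hr <= hi:
            return label
    return None


def time_in_zone(records, max_gap_s=30.0):
    """Sum seconds per zone; each gap is clipped to [0, max_gap_s]."""
    totals = defaultdict(float)
    prev = None
    for elapsed_s, hr in records:
        # The first record has no predecessor, so it counts as one second.
        dt = 1.0 if prev is None else min(max(elapsed_s - prev, 0.0), max_gap_s)
        prev = elapsed_s
        zone = zone_for(hr)
        if zone is not None:
            totals[zone] += dt
    return dict(totals)


# A 98-second pause between t=2 and t=100 is clipped to 30 seconds.
records = [(0, 120), (1, 125), (2, 150), (100, 150), (101, 155)]
tiz = time_in_zone(records)
print(tiz)
```

Clipping rather than dropping the long gap credits a bounded amount of time to the zone at resume, which is a compromise between ignoring pauses and counting them in full.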
def time_in_zone_from_splits(
splits_df: pd.DataFrame,
zones: ZoneTuples | HRZones | UserProfile | None = None,
) -> dict[str, float]:
"""Fallback when there's no FIT — assign each split's avg HR to one zone.
Biased toward whichever zone the lap-average falls in; FIT version is the
ground truth where available.
"""
if splits_df is None or splits_df.empty:
return {}
z = _resolve_zones(zones)
s = splits_df.dropna(subset=["avg_hr", "duration_s"]).copy()
s["zone"] = s["avg_hr"].apply(lambda h: hr_to_user_zone(h, z))
return s.dropna(subset=["zone"]).groupby("zone")["duration_s"].sum().to_dict()
def weekly_time_in_zone(
conn: sqlite3.Connection,
*,
start: str | pd.Timestamp | None = None,
end: str | pd.Timestamp | None = None,
activity_types: tuple[str, ...] = ("running", "trail_running"),
) -> pd.DataFrame:
"""ISO-week pivot of the cached `activity_time_in_zone` table.
Returns a DataFrame indexed by the week's Monday with columns
z1_s..z5_s, total_s, n_activities. Only activities whose
`activity_type` is in `activity_types` are included; pass `()` to skip
the type filter entirely.
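`start` / `end` filter on `activities.start_time_local` (inclusive). A date-only
`end` is compared as midnight at the start of that day, so pass an end-of-day
timestamp to include activities on the final date.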
Empty input returns an empty frame with the same columns.
"""
sql = """
SELECT a.start_time_local AS d,
tiz.z1_s, tiz.z2_s, tiz.z3_s, tiz.z4_s, tiz.z5_s, tiz.total_s
FROM activity_time_in_zone AS tiz
JOIN activities AS a USING(activity_id)
"""
where: list[str] = []
params: list = []
if activity_types:
where.append(f"a.activity_type IN ({','.join(['?'] * len(activity_types))})")
params.extend(activity_types)
if start is not None:
where.append("a.start_time_local >= ?")
params.append(str(pd.Timestamp(start)))
if end is not None:
where.append("a.start_time_local <= ?")
params.append(str(pd.Timestamp(end)))
if where:
sql += " WHERE " + " AND ".join(where)
df = pd.read_sql(sql, conn, params=params, parse_dates=["d"])
cols = ["z1_s", "z2_s", "z3_s", "z4_s", "z5_s", "total_s", "n_activities"]
if df.empty:
return pd.DataFrame(columns=cols)
# Week label = Monday of the ISO week (calendar week starting Monday).
df["week"] = df["d"].dt.to_period("W-SUN").dt.start_time
df["n_activities"] = 1
agg = df.groupby("week")[["z1_s", "z2_s", "z3_s", "z4_s", "z5_s", "total_s", "n_activities"]].sum()
return agg.sort_index()
# Backwards-compat alias — old notebooks reference HR_ZONES_USER directly.
# Resolves lazily via the config default.
def __getattr__(name: str):
if name == "HR_ZONES_USER":
return default_profile().hr_zones().as_tuples()
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
# ---------------------------------------------------------------------------
# Banister fitness/fatigue/form
# ---------------------------------------------------------------------------
def banister(
daily_load: pd.Series,
*,
ctl_tau: float | None = None,
atl_tau: float | None = None,
start_date: str | pd.Timestamp | None = None,
end_date: str | pd.Timestamp | None = None,
) -> pd.DataFrame:
"""Banister fitness/fatigue/form (CTL/ATL/TSB) from a daily training-load series.
Time constants default to config (`banister.ctl_tau_days` / `atl_tau_days`).
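Returns a frame indexed by date with columns CTL, ATL, TSB. TSB for a day is
the previous day's CTL minus the previous day's ATL, so the first row's TSB is NaN.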
"""
import numpy as np
if daily_load.empty:
return pd.DataFrame(columns=["CTL", "ATL", "TSB"])
ban = default_config().banister
if ctl_tau is None:
ctl_tau = ban.ctl_tau_days
if atl_tau is None:
atl_tau = ban.atl_tau_days
idx = pd.to_datetime(daily_load.index).normalize()
s = pd.Series(daily_load.values, index=idx).groupby(level=0).sum()
lo = pd.Timestamp(start_date) if start_date else s.index.min()
hi = pd.Timestamp(end_date) if end_date else s.index.max()
full = s.reindex(pd.date_range(lo, hi, freq="D"), fill_value=0.0)
decay_ctl, decay_atl = np.exp(-1 / ctl_tau), np.exp(-1 / atl_tau)
w_ctl, w_atl = 1 - decay_ctl, 1 - decay_atl
n = len(full)
ctl = np.zeros(n)
atl = np.zeros(n)
loads = full.to_numpy()
for i in range(n):
prev_ctl = ctl[i - 1] if i else 0.0
prev_atl = atl[i - 1] if i else 0.0
ctl[i] = prev_ctl * decay_ctl + loads[i] * w_ctl
atl[i] = prev_atl * decay_atl + loads[i] * w_atl
out = pd.DataFrame({"CTL": ctl, "ATL": atl}, index=full.index)
out["TSB"] = out["CTL"].shift(1) - out["ATL"].shift(1)
return out
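The recursion can be followed in a dependency-free sketch. The time constants of 42 and 7 days are commonly used values and are an assumption here; the project reads its own from config, which may differ:

```python
import math


def banister_sketch(loads, ctl_tau=42.0, atl_tau=7.0):
    """Return (ctl, atl, tsb) lists for a list of daily loads.

    tsb[i] is ctl[i-1] - atl[i-1]; the first entry is None.
    The tau defaults are assumed for illustration only.
    """
    decay_ctl = math.exp(-1 / ctl_tau)
    decay_atl = math.exp(-1 / atl_tau)
    ctl, atl, tsb = [], [], []
    prev_ctl = prev_atl = 0.0
    for i, load in enumerate(loads):
        tsb.append(None if i == 0 else prev_ctl - prev_atl)
        prev_ctl = prev_ctl * decay_ctl + load * (1 - decay_ctl)
        prev_atl = prev_atl * decay_atl + load * (1 - decay_atl)
        ctl.append(prev_ctl)
        atl.append(prev_atl)
    return ctl, atl, tsb


# Two weeks at a constant load of 100, then one week of rest.
ctl, atl, tsb = banister_sketch([100.0] * 14 + [0.0] * 7)
print(round(ctl[13], 1), round(atl[13], 1), round(tsb[14], 1), round(tsb[20], 1))
```

Because fatigue (ATL) responds faster than fitness (CTL), form (TSB) is strongly negative at the end of the loading block and recovers during the rest week.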
def banister_forecast(
history: pd.Series,
future: pd.Series,
*,
today: pd.Timestamp | str | None = None,
ctl_tau: float | None = None,
atl_tau: float | None = None,
) -> pd.DataFrame:
"""Splice historical daily training load with a planned future load schedule,
then run `banister` over the combined series.
`today` is the boundary: history is kept through and including `today`,
future is kept strictly after. When `today` is None it defaults to the last
day in `history`. Index of either series may be `DatetimeIndex` or any
date-like values that `pd.to_datetime` accepts.
The combined series feeds the same recursion as `banister`, so the PMC at
any historical date matches what `banister(history)` would have produced.
"""
today_ts = (
pd.Timestamp(today).normalize()
if today is not None
else pd.to_datetime(history.index).normalize().max()
)
hist_idx = pd.to_datetime(history.index).normalize()
fut_idx = pd.to_datetime(future.index).normalize()
hist = pd.Series(history.values, index=hist_idx)
fut = pd.Series(future.values, index=fut_idx)
hist_part = hist[hist.index <= today_ts]
fut_part = fut[fut.index > today_ts]
combined = pd.concat([hist_part, fut_part]).sort_index()
# Same-day duplicates are left in place on purpose: `banister` sums entries that
# share a date, so dropping them here would make the historical PMC diverge from
# `banister(history)`.
return banister(combined, ctl_tau=ctl_tau, atl_tau=atl_tau)
def personal_records(
activities: pd.DataFrame,
*,
distance_bins_km: tuple[float, ...] = (5.0, 10.0, 21.0975, 42.195, 50.0),
tolerance: float = 0.05,
) -> pd.DataFrame:
"""Pick the fastest activity (lowest `pace_min_per_km`) within ±`tolerance` of each bin distance.
`tolerance` is a fraction of the bin distance, so 0.05 means ±5%.
Operates on the DataFrame `load_activities()` returns: requires
`distance_km` and `pace_min_per_km`. Activities with NaN pace
(masked by `load_activities` for implausible values) are ignored.
Returns one row per bin (in the original bin order) with columns:
bin_km, activity_id, start_time_local, distance_km, pace_min_per_km,
moving_duration_s.
Bins with no qualifying activity are omitted from the result.
"""
if activities.empty or "pace_min_per_km" not in activities.columns:
return pd.DataFrame(
columns=[
"bin_km", "activity_id", "start_time_local",
"distance_km", "pace_min_per_km", "moving_duration_s",
]
)
df = activities.dropna(subset=["pace_min_per_km", "distance_km"])
rows: list[dict] = []
for bin_km in distance_bins_km:
lo, hi = bin_km * (1 - tolerance), bin_km * (1 + tolerance)
in_bin = df[(df["distance_km"] >= lo) & (df["distance_km"] <= hi)]
if in_bin.empty:
continue
best = in_bin.loc[in_bin["pace_min_per_km"].idxmin()]
rows.append({
"bin_km": bin_km,
"activity_id": int(best["activity_id"]),
"start_time_local": best.get("start_time_local"),
"distance_km": float(best["distance_km"]),
"pace_min_per_km": float(best["pace_min_per_km"]),
"moving_duration_s": float(best.get("moving_duration_s", float("nan"))),
})
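# Pass the column list explicitly so a result with no qualifying bins has the
# same columns as the early-return empty frame above.
return pd.DataFrame(
rows,
columns=[
"bin_km", "activity_id", "start_time_local",
"distance_km", "pace_min_per_km", "moving_duration_s",
],
)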
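The tolerance window is easy to check by hand. The sketch below mirrors the bin logic in plain Python; `fastest_in_bins` and its tuple layout are hypothetical and exist only for illustration.

```python
def fastest_in_bins(runs, bins_km=(5.0, 10.0), tolerance=0.05):
    """runs: list of (activity_id, distance_km, pace_min_per_km) tuples."""
    best = {}
    for bin_km in bins_km:
        lo, hi = bin_km * (1 - tolerance), bin_km * (1 + tolerance)
        # Ignore runs with no usable pace, then keep those inside the window.
        in_bin = [r for r in runs if r[2] is not None and lo <= r[1] <= hi]
        if in_bin:
            best[bin_km] = min(in_bin, key=lambda r: r[2])  # lowest pace wins
    return best


runs = [
    (1, 5.1, 5.20),
    (2, 4.9, 4.95),
    (3, 5.4, 4.50),   # outside the 4.75 to 5.25 km window, ignored
    (4, 10.2, None),  # no usable pace, ignored
]
records = fastest_in_bins(runs)
```

Run 3 has the best pace overall but is not a 5 km record because its distance falls outside the ±5% window; the 10 km bin is omitted because its only candidate has no pace.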
def calibrate_tl_per_km(
conn: sqlite3.Connection,
*,
activity_types: tuple[str, ...] = ("running", "trail_running"),
min_distance_km: float = 2.0,
lookback_days: int | None = 365,
) -> dict:
"""Empirical training-load-per-km ratio for converting plan-km → forecast load.
Returns a dict with `median`, `q1`, `q3`, `n`. Designed to replace the
hard-coded `RACE_DAY_TL_PER_KM` / training-run constant in race-plan flows:
feed `median` into `banister_forecast`'s upstream load builder.
Activities with NULL `training_load`, NULL `distance_m`, or distance below
`min_distance_km` are excluded. `lookback_days=None` removes the recency
filter and uses all history.
"""
placeholders = ",".join(["?"] * len(activity_types))
where = [
f"activity_type IN ({placeholders})",
"training_load IS NOT NULL",
"distance_m IS NOT NULL",
"distance_m >= ?",
]
params: list = [*activity_types, min_distance_km * 1000]
if lookback_days is not None:
# Bind the offset as a parameter instead of interpolating it into the SQL text.
where.append("date(start_time_local) >= date('now', ?)")
params.append(f"-{int(lookback_days)} days")
sql = f"""
SELECT distance_m, training_load
FROM activities
WHERE {' AND '.join(where)}
"""
df = pd.read_sql(sql, conn, params=params)
if df.empty:
return {"median": float("nan"), "q1": float("nan"), "q3": float("nan"), "n": 0}
ratio = df["training_load"] / (df["distance_m"] / 1000.0)
return {
"median": float(ratio.median()),
"q1": float(ratio.quantile(0.25)),
"q3": float(ratio.quantile(0.75)),
"n": int(len(ratio)),
}
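To make the filtering rules concrete, here is a small self-contained sketch against an in-memory SQLite database. The three-column table is a simplified stand-in for the real `activities` schema, and `statistics.median` replaces the pandas call; both are assumptions made for illustration.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (activity_type TEXT, distance_m REAL, training_load REAL)"
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        ("running", 10_000, 100.0),       # 10.0 load per km
        ("running", 5_000, 60.0),         # 12.0 load per km
        ("trail_running", 8_000, 112.0),  # 14.0 load per km
        ("running", 1_000, 30.0),         # below the minimum distance, excluded
        ("cycling", 40_000, 90.0),        # wrong activity type, excluded
        ("running", 6_000, None),         # NULL training load, excluded
    ],
)

activity_types = ("running", "trail_running")
min_distance_km = 2.0
placeholders = ",".join("?" * len(activity_types))

# Only the number of placeholders is interpolated; every value is bound.
rows = conn.execute(
    f"""SELECT training_load / (distance_m / 1000.0)
        FROM activities
        WHERE activity_type IN ({placeholders})
          AND training_load IS NOT NULL
          AND distance_m >= ?""",
    (*activity_types, min_distance_km * 1000),
).fetchall()
conn.close()

ratios = sorted(r[0] for r in rows)
median = statistics.median(ratios)
```

With the median in hand, a planned 20 km run would be forecast at roughly `20 * median` load units, which is the conversion the docstring describes.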
def daily_training_load_series(
conn: sqlite3.Connection,
*,
activity_types: tuple[str, ...] = ("running", "trail_running"),
) -> pd.Series:
"""Daily-summed training_load across the given activity types."""
placeholders = ",".join(["?"] * len(activity_types))
df = pd.read_sql(
f"""SELECT date(start_time_local) AS d, SUM(training_load) AS tl
FROM activities
WHERE activity_type IN ({placeholders}) AND training_load IS NOT NULL
GROUP BY d ORDER BY d""",
conn,
params=list(activity_types),
parse_dates=["d"],
)
return df.set_index("d")["tl"]
# ---------------------------------------------------------------------------
# FIT records
# ---------------------------------------------------------------------------
def _resolve_fit_path(stored_path: str) -> Path:
"""Return the FIT path stored in `activity_fit_files.fit_path`.
Paths are stored absolute at link time. If the export directory has since
moved, the FIT won't be at the recorded location — raise with a hint to
run `openrun-link-fit <new_root> --relink` to rewrite the table.
"""
p = Path(stored_path)
if p.exists():
return p
raise FileNotFoundError(
f"FIT file not at recorded path: {stored_path}\n"
"If the export directory has moved, run:\n"
" openrun-link-fit <new_export_root> --relink"
)
def load_fit_records(conn: sqlite3.Connection, activity_id: int) -> pd.DataFrame:
"""Per-second FIT `record` messages for one activity as a DataFrame.
Columns (subset of what's present):
timestamp (UTC), elapsed_s, heart_rate, speed_mps, distance_m, cadence_spm,
altitude_m, power_w, position_lat_deg, position_long_deg,
vertical_oscillation_mm, step_length_mm.
"""
import fitparse
row = conn.execute(
"SELECT fit_path FROM activity_fit_files WHERE activity_id = ?", (activity_id,)
).fetchone()
if row is None:
raise ValueError(f"no FIT linked for activity {activity_id}")
fit_file = _resolve_fit_path(row[0])
fit = fitparse.FitFile(str(fit_file))
rows: list[dict] = []
for msg in fit.get_messages("record"):
rows.append(msg.get_values())
if not rows:
return pd.DataFrame()
df = pd.DataFrame(rows)
out = pd.DataFrame()
if "timestamp" in df.columns:
out["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)
out["elapsed_s"] = (out["timestamp"] - out["timestamp"].iloc[0]).dt.total_seconds()
out["heart_rate"] = df.get("heart_rate")
out["speed_mps"] = df.get("enhanced_speed", df.get("speed"))
out["distance_m"] = df.get("distance")
out["cadence_spm"] = df.get("cadence")
out["altitude_m"] = df.get("enhanced_altitude", df.get("altitude"))
out["power_w"] = df.get("power")
out["vertical_oscillation_mm"] = df.get("vertical_oscillation")
out["step_length_mm"] = df.get("step_length")
SEMI = 180.0 / (2 ** 31)
if "position_lat" in df.columns:
out["position_lat_deg"] = df["position_lat"] * SEMI
if "position_long" in df.columns:
out["position_long_deg"] = df["position_long"] * SEMI
return out
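The `SEMI` constant converts FIT position values, which are stored as signed 32-bit "semicircles", into degrees. A minimal sketch of that conversion follows; `semicircles_to_degrees` is a hypothetical helper name.

```python
SEMICIRCLE_TO_DEG = 180.0 / (2 ** 31)


def semicircles_to_degrees(value):
    """Convert a FIT semicircle coordinate to decimal degrees."""
    return value * SEMICIRCLE_TO_DEG


quarter_turn = semicircles_to_degrees(2 ** 30)      # a quarter of the full range
southern = semicircles_to_degrees(-(2 ** 29))       # negative values are south or west
```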
def fit_decoupling(
records: pd.DataFrame,
*,
segments: int = 2,
warmup_min: float = 5.0,
cooldown_min: float = 2.0,
min_speed_mps: float = 0.5,
) -> pd.DataFrame:
"""Per-second Pa:Hr decoupling (Friel's method): chunks of moving-only time."""
r = records.dropna(subset=["heart_rate", "speed_mps", "elapsed_s"]).copy()
if r.empty:
return pd.DataFrame()
total = r["elapsed_s"].iloc[-1]
r = r[(r["elapsed_s"] >= warmup_min * 60) & (r["elapsed_s"] <= total - cooldown_min * 60)]
moving = r[r["speed_mps"] >= min_speed_mps].copy()
if moving.empty:
return pd.DataFrame()
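moving = moving.reset_index(drop=True)
if segments < 1 or len(moving) < segments:
# Too few samples to fill every segment; an empty chunk would fail below.
return pd.DataFrame()
seg_size = len(moving) // segments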
out_rows: list[dict] = []
for i in range(segments):
s = i * seg_size
e = (i + 1) * seg_size if i < segments - 1 else len(moving)
chunk = moving.iloc[s:e]
speed = chunk["speed_mps"].mean()
hr = chunk["heart_rate"].mean()
out_rows.append(
{
"segment": i + 1,
"from_min": chunk["elapsed_s"].iloc[0] / 60,
"to_min": chunk["elapsed_s"].iloc[-1] / 60,
"mean_speed_mps": speed,
"mean_pace_min_per_km": (1 / speed) * 1000 / 60 if speed else float("nan"),
"mean_hr": hr,
"efficiency": speed / hr if hr else float("nan"),
}
)
out = pd.DataFrame(out_rows)
base = out["efficiency"].iloc[0]
out["decoupling_pct"] = (base / out["efficiency"] - 1) * 100
return out
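The arithmetic behind `decoupling_pct` is worth working through once. The sketch below reproduces the function's own formula on two hand-made halves; `decoupling_pct` here is a hypothetical plain-Python helper and the sample values are invented for illustration.

```python
def decoupling_pct(first_half, second_half):
    """Each half is a list of (speed_mps, heart_rate) samples."""

    def efficiency(samples):
        mean_speed = sum(s for s, _ in samples) / len(samples)
        mean_hr = sum(h for _, h in samples) / len(samples)
        return mean_speed / mean_hr

    base = efficiency(first_half)
    # Same normalisation as the function above: base / efficiency - 1.
    return (base / efficiency(second_half) - 1) * 100


# Constant 3.0 m/s pace while heart rate drifts from 150 to 160 bpm.
first = [(3.0, 150), (3.0, 150)]
second = [(3.0, 160), (3.0, 160)]
drift = decoupling_pct(first, second)  # about 6.67
```

Note that this divides by the later segment's efficiency. Dividing the efficiency drop by the first segment's efficiency instead would give 6.25% for the same data, so the two conventions diverge slightly as drift grows.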
def fit_rolling_efficiency(records: pd.DataFrame, window_s: int = 300) -> pd.DataFrame:
"""Rolling (smoothed) speed/HR over elapsed time, useful for drift plots."""
r = records.dropna(subset=["heart_rate", "speed_mps", "elapsed_s"]).copy()
if r.empty:
return r
r = r.set_index("elapsed_s")
win = f"{window_s}s"
r["_ts"] = pd.to_datetime(r.index, unit="s")
r = r.set_index("_ts")
rolled = pd.DataFrame(index=r.index)
rolled["rolling_speed_mps"] = r["speed_mps"].rolling(win).mean()
rolled["rolling_hr"] = r["heart_rate"].rolling(win).mean()
rolled["rolling_efficiency"] = rolled["rolling_speed_mps"] / rolled["rolling_hr"]
rolled["elapsed_min"] = (rolled.index - rolled.index[0]).total_seconds() / 60
return rolled.reset_index(drop=True)
# ---------------------------------------------------------------------------
# Geo helpers + route clustering
# ---------------------------------------------------------------------------
def haversine_km(lat1, lon1, lat2, lon2):
"""Vectorised great-circle distance, kilometres. Inputs in degrees."""
import numpy as np
r = 6371.0
lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
dlat = lat2 - lat1
dlon = lon2 - lon1
a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
return 2 * r * np.arcsin(np.sqrt(a))
def cluster_routes(lats, lons, radius_km: float = 0.25):
"""Greedy haversine-radius clustering of run start points.
Returns an integer label array; -1 means unclustered (no neighbours within radius).
"""
import numpy as np
lats = np.asarray(lats, dtype=float)
lons = np.asarray(lons, dtype=float)
n = len(lats)
labels = np.full(n, -1, dtype=int)
next_label = 0
for i in range(n):
if labels[i] != -1:
continue
d = haversine_km(lats[i], lons[i], lats, lons)
neigh = np.where((d <= radius_km) & (labels == -1))[0]
if len(neigh) >= 2:
labels[neigh] = next_label
next_label += 1
return labels
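The greedy clustering above can be traced on three points using only the standard library. This is a sketch, not the vectorised NumPy version: `cluster_starts` is a hypothetical name and the coordinates are made up.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres; inputs in degrees."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_starts(points, radius_km=0.25):
    labels = [-1] * len(points)
    next_label = 0
    for i, (lat, lon) in enumerate(points):
        if labels[i] != -1:
            continue  # already claimed by an earlier seed
        neigh = [
            j for j, (la, lo) in enumerate(points)
            if labels[j] == -1 and haversine_km(lat, lon, la, lo) <= radius_km
        ]
        if len(neigh) >= 2:  # the seed itself plus at least one neighbour
            for j in neigh:
                labels[j] = next_label
            next_label += 1
    return labels


points = [
    (40.0000, -75.0000),
    (40.0010, -75.0000),  # about 0.11 km north of the first point
    (40.1000, -75.0000),  # about 11 km away
]
labels = cluster_starts(points)
```

Because the first unlabelled point in input order seeds each cluster, the result can depend on the order of the inputs; that is the trade-off of the greedy approach.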

71
src/openrun/plots.py Normal file

@@ -0,0 +1,71 @@
"""Plot helpers — small, opinionated wrappers over matplotlib.
Each function is a pure helper: pass in a records / decoupling DataFrame, an
optional Axes, and get the populated Axes back so the caller can tweak titles,
add their own colors, save the figure, etc.
The package itself doesn't take a hard matplotlib dependency at import time;
matplotlib is imported lazily inside the functions so notebooks that never
plot still load the rest of `openrun` cleanly.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
import pandas as pd
from .model import fit_decoupling
if TYPE_CHECKING: # pragma: no cover
from matplotlib.axes import Axes
def plot_fit_decoupling(
records: pd.DataFrame,
*,
segments: int = 2,
ax: "Axes | None" = None,
warmup_min: float = 5.0,
cooldown_min: float = 2.0,
min_speed_mps: float = 0.5,
) -> "Axes":
"""Bar chart of per-segment Pa:Hr decoupling for one activity's FIT records.
Each bar is one segment's `decoupling_pct` from `fit_decoupling`. Two
horizontal references are drawn — Friel's `< 5%` "developed" line and the
`10%` "unsustainable" threshold — so a reader can place the run at a
glance.
Returns the matplotlib Axes the bars were drawn on; pass `ax=` to compose
multiple plots in one figure.
"""
import matplotlib.pyplot as plt
dec = fit_decoupling(
records,
segments=segments,
warmup_min=warmup_min,
cooldown_min=cooldown_min,
min_speed_mps=min_speed_mps,
)
if ax is None:
_, ax = plt.subplots(figsize=(6, 3.5))
if dec.empty:
ax.text(0.5, 0.5, "no usable records", ha="center", va="center", transform=ax.transAxes)
ax.set_axis_off()
return ax
xs = dec["segment"].to_numpy()
ys = dec["decoupling_pct"].to_numpy()
ax.bar(xs, ys, width=0.7, color="#264653", edgecolor="white")
ax.axhline(0, color="black", lw=0.6)
ax.axhline(5, color="#2a9d8f", ls=":", lw=1, label="Friel < 5% (developed)")
ax.axhline(10, color="#e76f51", ls="--", lw=1, label="Friel > 10% (unsustainable)")
ax.set_xticks(xs)
ax.set_xticklabels([f"seg {int(s)}" for s in xs])
ax.set_ylabel("decoupling (%)")
ax.set_title(f"Pa:Hr decoupling — {segments} segments")
ax.legend(loc="best", fontsize=8)
return ax

168
src/openrun/setup.py Normal file

@@ -0,0 +1,168 @@
"""Idempotent workspace bootstrap + status reporting.
`init_workspace()` is safe to re-run any number of times: it creates the data
directory, applies the SQL schema via CREATE TABLE IF NOT EXISTS, and returns
a snapshot of what's currently in the workspace (config, DB tables, row
counts, auth-token presence, last sync/ingest timestamps).
CLI:
openrun-init # bootstrap + human-readable status
openrun-init --json # same, but machine-readable
"""
from __future__ import annotations
import argparse
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from .config import Config, _find_config, default_config
from .db import connect, get_state
# Tables the loaders in `openrun.model` depend on. Kept in sync with `db.SCHEMA`.
EXPECTED_TABLES = (
"activities",
"activity_splits",
"activity_fit_files",
"activity_time_in_zone",
"daily_steps",
"daily_sleep",
"daily_stress",
"daily_hrv",
"daily_body_battery",
"daily_intensity_minutes",
"daily_resting_hr",
"sync_state",
)
@dataclass(frozen=True)
class WorkspaceStatus:
"""Snapshot of the workspace after `init_workspace()` runs."""
config_path: Path | None # Where openrun.toml was found (None ⇒ defaults-only)
db_path: Path # Resolved SQLite path
db_existed_before: bool # True if the DB file existed before this call
tables_present: tuple[str, ...]
table_counts: dict[str, int]
auth_tokens_present: bool # True iff secrets_dir has at least one file
last_sync_utc: str | None
last_ingest_utc: str | None
def is_ready(self) -> bool:
"""Schema covers every table the loaders need."""
return set(EXPECTED_TABLES).issubset(self.tables_present)
def to_dict(self) -> dict:
d = asdict(self)
d["config_path"] = str(self.config_path) if self.config_path else None
d["db_path"] = str(self.db_path)
d["tables_present"] = list(self.tables_present)
return d
def init_workspace(
*,
config: Config | None = None,
db_path: Path | None = None,
secrets_dir: Path | None = None,
) -> WorkspaceStatus:
"""Idempotently bootstrap an openrun workspace and report its state.
Re-running is safe: schema is applied via CREATE TABLE IF NOT EXISTS,
`mkdir(..., exist_ok=True)` is used everywhere, and no rows are ever
deleted or modified by this function.
Args:
config: Optional Config override. If None, `default_config()` is used,
which walks up from cwd looking for `openrun.toml`.
db_path: Optional explicit DB path. Overrides `config.db_path`.
secrets_dir: Optional auth-token directory. Defaults to `./.secrets`,
matching what `openrun.ingest.auth` / `openrun.ingest.garmin_api` use.
"""
cfg = config if config is not None else default_config()
resolved_db = Path(db_path) if db_path is not None else cfg.db_path
secrets = Path(secrets_dir) if secrets_dir is not None else (Path.cwd() / ".secrets")
db_existed = resolved_db.exists()
# connect() creates the parent dir and applies SCHEMA — both idempotent.
conn = connect(resolved_db)
try:
tables = tuple(sorted(
row[0]
for row in conn.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
)
))
counts = {
t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
for t in tables
}
last_sync = get_state(conn, "last_sync_utc")
last_ingest = get_state(conn, "last_ingest_utc")
finally:
conn.close()
# Only attempt config discovery when the caller didn't pass an explicit Config —
# an explicit Config has unknown provenance, so we leave config_path None.
config_path = _find_config() if config is None else None
tokens_present = secrets.is_dir() and any(secrets.iterdir())
return WorkspaceStatus(
config_path=config_path,
db_path=resolved_db,
db_existed_before=db_existed,
tables_present=tables,
table_counts=counts,
auth_tokens_present=tokens_present,
last_sync_utc=last_sync,
last_ingest_utc=last_ingest,
)
def _format_human(status: WorkspaceStatus) -> str:
lines: list[str] = []
lines.append("openrun workspace")
lines.append("=" * 17)
if status.config_path:
lines.append(f" config : {status.config_path}")
else:
lines.append(" config : (built-in defaults — write openrun.toml to customise)")
lines.append(
f" db : {status.db_path} "
f"({'existed' if status.db_existed_before else 'created'})"
)
lines.append(f" schema : {'ready' if status.is_ready() else 'incomplete'}")
lines.append(
f" auth tokens : {'present' if status.auth_tokens_present else 'missing — run openrun-auth'}"
)
lines.append(f" last sync : {status.last_sync_utc or '(never)'}")
lines.append(f" last ingest : {status.last_ingest_utc or '(never)'}")
if status.table_counts:
lines.append("")
lines.append("table counts:")
width = max(len(t) for t in status.table_counts)
for t, n in status.table_counts.items():
lines.append(f" {t:<{width}} {n:>10,}")
return "\n".join(lines)
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--json", action="store_true", help="Emit machine-readable JSON")
args = parser.parse_args()
status = init_workspace()
if args.json:
print(json.dumps(status.to_dict(), indent=2, default=str))
else:
print(_format_human(status))
if __name__ == "__main__":
main()
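The idempotency claim rests on `CREATE TABLE IF NOT EXISTS` and `mkdir(..., exist_ok=True)`. A minimal sketch of that pattern, assuming a hypothetical two-table schema and a hypothetical `bootstrap` helper rather than the real `db.SCHEMA` and `connect()`:

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema for illustration only, not the real db.SCHEMA.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sync_state (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE IF NOT EXISTS activities (activity_id INTEGER PRIMARY KEY);
"""


def bootstrap(db_path):
    """Create the DB if needed and report (existed_before, table_counts)."""
    existed = db_path.exists()
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
        tables = sorted(
            row[0]
            for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        )
        counts = {
            t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
    finally:
        conn.close()
    return existed, counts


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "data" / "demo.db"
    first = bootstrap(db)   # creates the directory, file, and tables
    second = bootstrap(db)  # finds everything in place and changes nothing
```

The only difference between the two calls is the `existed` flag, which is the same signal `db_existed_before` carries in `WorkspaceStatus`.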

430
sync.py

@@ -1,431 +1,5 @@
"""Sync Garmin Connect data to local SQLite.
Usage:
uv run sync.py # incremental: last 14 days of wellness, new activities only
uv run sync.py --full # full backfill (activities + last 365 days wellness)
uv run sync.py --days 90 # backfill last 90 days of wellness
uv run sync.py --no-details # skip per-activity detail/splits (faster)
Run auth.py first to log in.
"""
from __future__ import annotations
import argparse
import dataclasses
import json
import sqlite3
import sys
import time
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import Any, Callable
import garth
from garth.exc import GarthHTTPError
from db import connect, get_state, set_state
TOKEN_DIR = Path(__file__).parent / ".secrets"
# ---------------------------------------------------------------------------
# helpers
# ---------------------------------------------------------------------------
def _to_jsonable(obj: Any) -> Any:
"""Recursively convert dataclasses/datetimes/dates to JSON-safe values."""
if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
return {f.name: _to_jsonable(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
if isinstance(obj, (datetime, date)):
return obj.isoformat()
if isinstance(obj, dict):
return {k: _to_jsonable(v) for k, v in obj.items()}
if isinstance(obj, (list, tuple)):
return [_to_jsonable(v) for v in obj]
return obj
def _dump(obj: Any) -> str:
return json.dumps(_to_jsonable(obj), separators=(",", ":"), default=str)
def _safe_call(fn: Callable[[], Any], description: str) -> Any:
"""Run an API call with one retry on transient errors; return None on failure."""
for attempt in (1, 2):
try:
return fn()
except GarthHTTPError as exc:
status = getattr(getattr(exc, "error", None), "response", None)
code = getattr(status, "status_code", None)
if code == 404:
return None # data doesn't exist for this date
if attempt == 1 and code in (429, 500, 502, 503, 504):
print(f" retry {description} ({code})", file=sys.stderr)
time.sleep(2)
continue
print(f" ! {description} failed: {exc}", file=sys.stderr)
return None
except Exception as exc: # noqa: BLE001
print(f" ! {description} failed: {exc}", file=sys.stderr)
return None
return None
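The retry policy in `_safe_call` (treat 404 as "no data", retry once on a transient status, otherwise give up) can be exercised without a network. In this sketch `HttpError` is a hypothetical stand-in for the client's exception type, and the delay is a parameter so the example runs instantly.

```python
import time

RETRYABLE = {429, 500, 502, 503, 504}


class HttpError(Exception):
    """Hypothetical stand-in for an HTTP client error carrying a status code."""

    def __init__(self, code):
        super().__init__(f"HTTP {code}")
        self.code = code


def safe_call(fn, delay_s=0.0):
    for attempt in (1, 2):
        try:
            return fn()
        except HttpError as exc:
            if exc.code == 404:
                return None  # nothing recorded for this request
            if attempt == 1 and exc.code in RETRYABLE:
                time.sleep(delay_s)
                continue  # one retry only
            return None
    return None


calls = []


def flaky():
    calls.append(1)
    if len(calls) == 1:
        raise HttpError(503)  # transient failure on the first attempt
    return "ok"


def missing():
    raise HttpError(404)


result = safe_call(flaky)
```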
# ---------------------------------------------------------------------------
# activity sync
# ---------------------------------------------------------------------------
def sync_activities(conn: sqlite3.Connection, *, full: bool, fetch_details: bool) -> int:
"""Page through the activity list and upsert anything new (or all, in full mode)."""
existing_ids: set[int] = {
row["activity_id"] for row in conn.execute("SELECT activity_id FROM activities")
}
page_size = 100
offset = 0
inserted = 0
detail_count = 0
while True:
page = _safe_call(
lambda: garth.connectapi(
"/activitylist-service/activities/search/activities",
params={"limit": page_size, "start": offset},
),
f"activity list start={offset}",
)
if not page:
break
new_in_page = 0
for raw in page:
aid = raw.get("activityId")
if aid is None:
continue
if aid in existing_ids and not full:
continue
new_in_page += 1
existing_ids.add(aid)
atype = raw.get("activityType") or {}
conn.execute(
"""
INSERT INTO activities (
activity_id, start_time_local, start_time_gmt, activity_type,
activity_name, distance_m, duration_s, moving_duration_s,
avg_speed_mps, max_speed_mps, avg_hr, max_hr, calories,
elevation_gain_m, elevation_loss_m, training_load,
aerobic_te, anaerobic_te, vo2_max, raw, fetched_at
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))
ON CONFLICT(activity_id) DO UPDATE SET
activity_name=excluded.activity_name,
distance_m=excluded.distance_m,
duration_s=excluded.duration_s,
raw=excluded.raw,
fetched_at=excluded.fetched_at
""",
(
aid,
raw.get("startTimeLocal"),
raw.get("startTimeGMT"),
atype.get("typeKey"),
raw.get("activityName"),
raw.get("distance"),
raw.get("duration"),
raw.get("movingDuration"),
raw.get("averageSpeed"),
raw.get("maxSpeed"),
raw.get("averageHR"),
raw.get("maxHR"),
raw.get("calories"),
raw.get("elevationGain"),
raw.get("elevationLoss"),
raw.get("activityTrainingLoad"),
raw.get("aerobicTrainingEffect"),
raw.get("anaerobicTrainingEffect"),
raw.get("vO2MaxValue"),
_dump(raw),
),
)
inserted += 1
if fetch_details:
if _fetch_activity_details(conn, aid):
detail_count += 1
conn.commit()
print(f" activities: page offset={offset} → {new_in_page} new (total inserted: {inserted})")
if not full and new_in_page == 0:
break # incremental: stop once we hit fully-known territory
if len(page) < page_size:
break # last page
offset += page_size
if fetch_details:
print(f" fetched details/splits for {detail_count} activities")
return inserted
def _fetch_activity_details(conn: sqlite3.Connection, aid: int) -> bool:
"""Fetch detail and splits for a single activity. Returns True on success."""
detail = _safe_call(
lambda: garth.connectapi(f"/activity-service/activity/{aid}"),
f"activity detail {aid}",
)
if detail:
summary = detail.get("summaryDTO", {}) or {}
conn.execute(
"""
UPDATE activities SET
training_load = COALESCE(?, training_load),
aerobic_te = COALESCE(?, aerobic_te),
anaerobic_te = COALESCE(?, anaerobic_te),
vo2_max = COALESCE(?, vo2_max),
raw = ?
WHERE activity_id = ?
""",
(
summary.get("activityTrainingLoad"),
summary.get("trainingEffect"),
summary.get("anaerobicTrainingEffect"),
detail.get("vO2MaxValue") or summary.get("vO2MaxValue"),
_dump(detail),
aid,
),
)
splits = _safe_call(
lambda: garth.connectapi(f"/activity-service/activity/{aid}/splits"),
f"activity splits {aid}",
)
if splits and isinstance(splits, dict):
lap_dtos = splits.get("lapDTOs", []) or []
for idx, lap in enumerate(lap_dtos):
conn.execute(
"""
INSERT OR REPLACE INTO activity_splits
(activity_id, split_index, distance_m, duration_s, avg_hr, avg_speed_mps, elevation_gain_m, raw)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""",
(
aid,
idx,
lap.get("distance"),
lap.get("duration"),
lap.get("averageHR"),
lap.get("averageSpeed"),
lap.get("elevationGain"),
_dump(lap),
),
)
return detail is not None
# ---------------------------------------------------------------------------
# wellness sync (one function per data type)
# ---------------------------------------------------------------------------
def sync_steps(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailySteps.list(end=end, period=period), "DailySteps") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_steps
(calendar_date, total_steps, step_goal, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.total_steps, r.step_goal, r.total_distance, _dump(r)),
)
conn.commit()
return len(rows)
def sync_sleep(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailySleep.list(end=end, period=period), "DailySleep") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_sleep
(calendar_date, sleep_score, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.value, _dump(r)),
)
conn.commit()
return len(rows)
def sync_stress(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailyStress.list(end=end, period=period), "DailyStress") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_stress
(calendar_date, avg_stress, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.overall_stress_level, _dump(r)),
)
conn.commit()
return len(rows)
def sync_hrv(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(lambda: garth.DailyHRV.list(end=end, period=period), "DailyHRV") or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_hrv
(calendar_date, weekly_avg, last_night_avg, last_night_5min, status, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
r.calendar_date.isoformat(),
r.weekly_avg,
r.last_night_avg,
r.last_night_5_min_high,
r.status,
_dump(r),
),
)
conn.commit()
return len(rows)
def sync_intensity_minutes(conn: sqlite3.Connection, end: date, period: int) -> int:
rows = _safe_call(
lambda: garth.DailyIntensityMinutes.list(end=end, period=period),
"DailyIntensityMinutes",
) or []
for r in rows:
conn.execute(
"""INSERT OR REPLACE INTO daily_intensity_minutes
(calendar_date, moderate_minutes, vigorous_minutes, raw, fetched_at)
VALUES (?, ?, ?, ?, datetime('now'))""",
(r.calendar_date.isoformat(), r.moderate_value, r.vigorous_value, _dump(r)),
)
conn.commit()
return len(rows)
def sync_body_battery(conn: sqlite3.Connection, end: date, period: int) -> int:
"""Per-day calls — slow for large backfills but the only path for BB aggregates."""
inserted = 0
for i in range(period):
d = end - timedelta(days=i)
existing = conn.execute(
"SELECT 1 FROM daily_body_battery WHERE calendar_date = ?", (d.isoformat(),)
).fetchone()
if existing and i > 2: # always refresh last 3 days, skip older if already present
continue
data = _safe_call(
lambda d=d: garth.connectapi(
"/wellness-service/wellness/bodyBattery/reports/daily",
params={"startDate": d.isoformat(), "endDate": d.isoformat()},
),
f"body battery {d}",
)
if not data:
continue
item = data[0] if isinstance(data, list) and data else data
if not isinstance(item, dict):
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_body_battery
(calendar_date, charged, drained, highest, lowest, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, datetime('now'))""",
(
d.isoformat(),
item.get("charged"),
item.get("drained"),
item.get("highestBatteryLevel") or item.get("highest"),
item.get("lowestBatteryLevel") or item.get("lowest"),
_dump(item),
),
)
inserted += 1
conn.commit()
return inserted
def sync_resting_hr(conn: sqlite3.Connection, end: date, period: int) -> int:
"""Resting HR per day via user-summary endpoint."""
inserted = 0
for i in range(period):
d = end - timedelta(days=i)
if i > 2:
existing = conn.execute(
"SELECT 1 FROM daily_resting_hr WHERE calendar_date = ?", (d.isoformat(),)
).fetchone()
if existing:
continue
data = _safe_call(
lambda d=d: garth.connectapi(
f"/usersummary-service/usersummary/daily/{garth.client.username}",
params={"calendarDate": d.isoformat()},
),
f"daily summary {d}",
)
if not isinstance(data, dict):
continue
rhr = data.get("restingHeartRate")
if rhr is None:
continue
conn.execute(
"""INSERT OR REPLACE INTO daily_resting_hr
(calendar_date, resting_hr, raw, fetched_at)
VALUES (?, ?, ?, datetime('now'))""",
(d.isoformat(), rhr, _dump(data)),
)
inserted += 1
conn.commit()
return inserted
# ---------------------------------------------------------------------------
# main
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--full", action="store_true", help="Full backfill (365 days of wellness)")
parser.add_argument("--days", type=int, default=None, help="Days of wellness to backfill")
parser.add_argument("--no-details", action="store_true", help="Skip per-activity detail/splits")
parser.add_argument(
"--skip-activities", action="store_true", help="Wellness only, no activity sync"
)
args = parser.parse_args()
if not TOKEN_DIR.exists() or not any(TOKEN_DIR.iterdir()):
print("No tokens found — run `uv run auth.py` first.", file=sys.stderr)
sys.exit(1)
garth.resume(TOKEN_DIR)
days = args.days if args.days is not None else (365 if args.full else 14)
end = date.today()
conn = connect()
if not args.skip_activities:
print(f"→ activities (full={args.full}, details={not args.no_details})")
n = sync_activities(conn, full=args.full, fetch_details=not args.no_details)
print(f" done: {n} activities upserted")
wellness = [
("steps", sync_steps),
("sleep score", sync_sleep),
("stress", sync_stress),
("HRV", sync_hrv),
("intensity minutes", sync_intensity_minutes),
("resting HR", sync_resting_hr),
("body battery", sync_body_battery),
]
for name, fn in wellness:
print(f"→ {name} (last {days} days)")
n = fn(conn, end, days)
print(f" done: {n} rows")
set_state(conn, "last_sync_utc", datetime.utcnow().isoformat(timespec="seconds"))
conn.commit()
conn.close()
print("✓ sync complete")
"""Shim — see openrun.ingest.garmin_api."""
from openrun.ingest.garmin_api import main
if __name__ == "__main__":
main()

28
tests/conftest.py Normal file

@@ -0,0 +1,28 @@
"""Shared pytest fixtures for openrun tests.
The `tmp_conn` fixture provides a fresh in-memory SQLite connection with the
full openrun schema applied — every test that touches the DB should request
this rather than `openrun.db.connect()` so the live `data/garmin.db` is never
read or written by the test suite.
"""
from __future__ import annotations
import sqlite3
import pytest
from openrun.db import SCHEMA
@pytest.fixture
def tmp_conn() -> sqlite3.Connection:
"""Fresh in-memory SQLite with the openrun schema applied."""
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.row_factory = sqlite3.Row
conn.executescript(SCHEMA)
try:
yield conn
finally:
conn.close()

15
tests/fixtures/README.md vendored Normal file

@@ -0,0 +1,15 @@
# tests/fixtures/
Pinned, anonymised data samples used by the test suite. Each entry below records what the fixture is, where it came from, and which tests it exercises. Add a row when you add a fixture; remove it when nothing references the file.
| Fixture | Origin | Used by |
|---|---|---|
_(empty — populated as Phase 1 items land)_
## Conventions
- **Anonymise before committing.** Strip emails, full names, raw lat/lon outside the run radius. Activity IDs and timestamps are fine.
- **Keep them small.** ~5 MB max per `.fit`; aim for the smallest sample that exercises the code path.
- **One fixture, one purpose.** If you need two FITs for different tests, commit two files — don't reuse one as both a "happy path" and an "edge case."
- **Don't regenerate.** Fixtures are inputs, not outputs. If a test asserts decoupling = 7.3%, that number is locked to *this exact* fixture.

@@ -0,0 +1,180 @@
"""Schema round-trip tests: ingest a synthetic JSON payload through the
real handler, then read it back via the loader and assert the parsed/derived
values are right.
This is the contract test for every table: any future schema or unit-handling
change has to keep these green. Fixtures are inline JSON (anonymised) rather
than pinned files — these tests exercise *parsing*, not realistic data shapes.
"""
from __future__ import annotations
import json
from pathlib import Path
import pandas as pd
import pytest
from openrun.ingest.garmin_export import (
handle_activities,
handle_body_battery,
handle_hrv,
handle_intensity_minutes,
handle_resting_hr,
handle_sleep,
handle_steps,
handle_stress,
)
from openrun.model import (
load_activities,
load_sleep_stages,
load_wellness,
)
def _write_json(tmp_path: Path, name: str, payload) -> Path:
p = tmp_path / name
p.write_text(json.dumps(payload))
return p
def _seed_steps(conn, tmp_path: Path, date: str = "2026-05-04") -> None:
"""`load_wellness` joins everything onto daily_steps, so wellness tests must
seed at least one steps row on the date under test to anchor the join."""
handle_steps(conn, _write_json(tmp_path, "steps_anchor.json",
[{"calendarDate": date, "totalSteps": 1}]))
# ---------------------------------------------------------------------------
# activities — exercises the Takeout scaled-int unit conversion
# ---------------------------------------------------------------------------
def test_activities_roundtrip_takeout_units(tmp_conn, tmp_path: Path) -> None:
"""A Takeout `summarizedActivities` row in scaled-int units must come back
out in SI through `load_activities`, with derived `distance_km` correct."""
payload = [{
"summarizedActivitiesExport": [{
"activityId": 12345678,
"activityName": "Morning Run",
"startTimeLocal": "2026-05-04 06:00:00",
"startTimeGmt": "2026-05-04 10:00:00",
"activityType": {"typeKey": "running"},
"distance": 1_000_000, # 10 000 m, encoded as cm
"duration": 3_600_000, # 3 600 s, encoded as ms
"movingDuration": 3_500_000,
"averageSpeed": 0.2778, # 2.778 m/s stored as m/s ÷ 10
"maxSpeed": 0.4167, # 4.167 m/s stored as m/s ÷ 10
"averageHR": 145,
"maxHR": 168,
"calories": 540,
"elevationGain": 5000, # 50 m, encoded as cm
"elevationLoss": 4800,
"activityTrainingLoad": 110.0,
"aerobicTrainingEffect": 3.5,
"anaerobicTrainingEffect": 0.4,
"vO2MaxValue": 52.0,
}]
}]
n = handle_activities(tmp_conn, _write_json(tmp_path, "activities.json", payload))
assert n == 1
runs = load_activities(tmp_conn, type="running")
assert len(runs) == 1
row = runs.iloc[0]
assert row["activity_id"] == 12345678
assert row["distance_m"] == pytest.approx(10_000.0)
assert row["distance_km"] == pytest.approx(10.0)
assert row["duration_s"] == pytest.approx(3600.0)
assert row["pace_min_per_km"] == pytest.approx((3500 / 60) / 10.0, rel=1e-4)
assert row["avg_speed_mps"] == pytest.approx(2.778, rel=1e-3)
assert row["elevation_gain_m"] == pytest.approx(50.0)
assert row["training_load"] == 110.0
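The scale factors named in the fixture comments (centimetres, milliseconds, speed divided by ten) can be sketched as a standalone conversion. `takeout_to_si` and its output keys are hypothetical names chosen for illustration; the real conversion lives in the ingest handler and loader and may differ in detail.

```python
# Hypothetical sketch of the Takeout scaled-int conversion. The factors come
# from the fixture comments in the test, not from the real handler.
def takeout_to_si(raw: dict) -> dict:
    moving_s = raw["movingDuration"] / 1000.0        # ms -> s
    distance_m = raw["distance"] / 100.0             # cm -> m
    distance_km = distance_m / 1000.0
    return {
        "distance_m": distance_m,
        "distance_km": distance_km,
        "duration_s": raw["duration"] / 1000.0,      # ms -> s
        "avg_speed_mps": raw["averageSpeed"] * 10.0,  # stored as m/s divided by 10
        "elevation_gain_m": raw["elevationGain"] / 100.0,  # cm -> m
        # Pace uses moving time, matching the assertion in the test above.
        "pace_min_per_km": (moving_s / 60.0) / distance_km,
    }


converted = takeout_to_si({
    "distance": 1_000_000,
    "duration": 3_600_000,
    "movingDuration": 3_500_000,
    "averageSpeed": 0.2778,
    "elevationGain": 5000,
})
```

The sketch does not guard against a zero distance; a real loader would presumably mask the pace in that case.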
# ---------------------------------------------------------------------------
# Wellness daily tables — same shape: handler reads JSON, loader joins
# ---------------------------------------------------------------------------
def test_steps_roundtrip(tmp_conn, tmp_path: Path) -> None:
payload = [{"calendarDate": "2026-05-04", "totalSteps": 12345, "stepGoal": 10000, "totalDistance": 9234.5}]
handle_steps(tmp_conn, _write_json(tmp_path, "steps.json", payload))
df = load_wellness(tmp_conn)
assert len(df) == 1
row = df.loc[pd.Timestamp("2026-05-04")]
assert row["total_steps"] == 12345
def test_sleep_roundtrip_through_load_sleep_stages(tmp_conn, tmp_path: Path) -> None:
"""handle_sleep → daily_sleep → load_sleep_stages derived columns."""
payload = [{
"calendarDate": "2026-05-04",
"deepSleepSeconds": 3600,
"lightSleepSeconds": 10800,
"remSleepSeconds": 5400,
"awakeSleepSeconds": 1800,
"sleepScore": 78,
}]
handle_sleep(tmp_conn, _write_json(tmp_path, "sleep.json", payload))
df = load_sleep_stages(tmp_conn)
row = df.loc[pd.Timestamp("2026-05-04")]
assert row["sleep_score"] == 78
assert row["sleep_hours"] == pytest.approx((3600 + 10800 + 5400) / 3600.0)
# The invariant we lock in test_sleep_stages.py — duplicated here to assert
# the *roundtrip* preserves it, not just the in-memory frame.
assert row["deep_pct"] + row["light_pct"] + row["rem_pct"] == pytest.approx(1.0)
def test_stress_roundtrip(tmp_conn, tmp_path: Path) -> None:
_seed_steps(tmp_conn, tmp_path)
payload = [{"calendarDate": "2026-05-04", "overallStressLevel": 42, "maxStressLevel": 88}]
handle_stress(tmp_conn, _write_json(tmp_path, "stress.json", payload))
assert load_wellness(tmp_conn).loc[pd.Timestamp("2026-05-04"), "avg_stress"] == 42
def test_hrv_roundtrip(tmp_conn, tmp_path: Path) -> None:
_seed_steps(tmp_conn, tmp_path)
payload = [{
"calendarDate": "2026-05-04",
"weeklyAvg": 55.0,
"lastNightAvg": 60.0,
"lastNight5MinHigh": 72.0,
"status": "BALANCED",
}]
handle_hrv(tmp_conn, _write_json(tmp_path, "hrv.json", payload))
row = load_wellness(tmp_conn).loc[pd.Timestamp("2026-05-04")]
assert row["hrv_last_night"] == 60.0
assert row["hrv_weekly"] == 55.0
assert row["hrv_status"] == "BALANCED"
def test_resting_hr_roundtrip(tmp_conn, tmp_path: Path) -> None:
_seed_steps(tmp_conn, tmp_path)
payload = [{"calendarDate": "2026-05-04", "restingHeartRate": 52}]
handle_resting_hr(tmp_conn, _write_json(tmp_path, "rhr.json", payload))
assert load_wellness(tmp_conn).loc[pd.Timestamp("2026-05-04"), "resting_hr"] == 52
def test_intensity_minutes_roundtrip(tmp_conn, tmp_path: Path) -> None:
_seed_steps(tmp_conn, tmp_path)
payload = [{"calendarDate": "2026-05-04", "moderateIntensityMinutes": 30, "vigorousIntensityMinutes": 15}]
handle_intensity_minutes(tmp_conn, _write_json(tmp_path, "im.json", payload))
row = load_wellness(tmp_conn).loc[pd.Timestamp("2026-05-04")]
assert row["moderate_minutes"] == 30
assert row["vigorous_minutes"] == 15
def test_body_battery_roundtrip(tmp_conn, tmp_path: Path) -> None:
_seed_steps(tmp_conn, tmp_path)
payload = [{
"calendarDate": "2026-05-04",
"charged": 60,
"drained": 45,
"highest": 95,
"lowest": 20,
}]
handle_body_battery(tmp_conn, _write_json(tmp_path, "bb.json", payload))
row = load_wellness(tmp_conn).loc[pd.Timestamp("2026-05-04")]
assert row["bb_charged"] == 60
assert row["bb_lowest"] == 20

126
tests/unit/test_banister.py Normal file

@@ -0,0 +1,126 @@
"""Tests for Banister CTL/ATL/TSB and the forecast splice helper.
Two things to lock down:
- the EWMA math itself, via a closed-form impulse-response check.
- the forecast splice, which must produce the same output as feeding the
whole series at once (i.e. no double-counting on the boundary day).
"""
from __future__ import annotations
import math
import numpy as np
import pandas as pd
import pytest
from openrun.model import banister, banister_forecast
def _daily_zero_load(start: str, days: int) -> pd.Series:
    idx = pd.date_range(start, periods=days, freq="D")
    return pd.Series(0.0, index=idx)
# ---------------------------------------------------------------------------
# banister() — closed-form correctness
# ---------------------------------------------------------------------------
def test_banister_impulse_decays_exponentially() -> None:
"""Single load on day 0, zeros after. The recursion
CTL[i] = CTL[i-1]*decay + load[i]*(1-decay)
with decay = exp(-1/tau) yields
CTL[i] = L * (1-decay) * decay**i.
"""
load = _daily_zero_load("2026-01-01", 50)
load.iloc[0] = 100.0
pmc = banister(load, ctl_tau=42.0, atl_tau=7.0)
decay_ctl = math.exp(-1 / 42)
decay_atl = math.exp(-1 / 7)
for i in (0, 1, 7, 21, 42):
expected_ctl = 100.0 * (1 - decay_ctl) * decay_ctl**i
expected_atl = 100.0 * (1 - decay_atl) * decay_atl**i
assert pmc["CTL"].iloc[i] == pytest.approx(expected_ctl, rel=1e-9)
assert pmc["ATL"].iloc[i] == pytest.approx(expected_atl, rel=1e-9)
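The recursion quoted in the docstring can be checked against its closed form without the library. This is a minimal sketch that assumes the filter starts from zero; `ewma_response` is a hypothetical helper, not the implementation behind `banister`.

```python
import math


def ewma_response(load: list, tau: float) -> list:
    """Apply out[i] = out[i-1] * decay + load[i] * (1 - decay), starting from 0."""
    decay = math.exp(-1.0 / tau)
    out, prev = [], 0.0
    for x in load:
        prev = prev * decay + x * (1.0 - decay)
        out.append(prev)
    return out


# Single impulse of 100 on day 0, zeros afterwards.
impulse = [100.0] + [0.0] * 49
ctl = ewma_response(impulse, tau=42.0)

decay = math.exp(-1.0 / 42.0)
closed_form = [100.0 * (1.0 - decay) * decay**i for i in range(50)]
max_abs_error = max(abs(a - b) for a, b in zip(ctl, closed_form))
```

Because every later day only multiplies the previous value by `decay`, the impulse response is a pure geometric sequence, which is why the test can use a tight relative tolerance.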
def test_banister_steady_state_approaches_input() -> None:
    """Constant input L converges to CTL=L; after `tau` days CTL ≈ L·(1-1/e)."""
    load = pd.Series(100.0, index=pd.date_range("2026-01-01", periods=200, freq="D"))
    pmc = banister(load, ctl_tau=42.0, atl_tau=7.0)
    # Index 41 is the 42nd day, so CTL = L·(1 - decay**42) = L·(1 - 1/e).
    assert pmc["CTL"].iloc[41] == pytest.approx(100.0 * (1 - 1 / math.e), rel=1e-3)
    # After 200 days (200/42 ≈ 4.76τ), CTL is within 1% of the steady-state
    # value: 1 - exp(-200/42) ≈ 0.991.
    assert pmc["CTL"].iloc[-1] == pytest.approx(100.0, rel=0.01)
def test_banister_rest_days_filled_with_zero() -> None:
"""Sparse input (only some dates) must be reindexed daily with zeros so
both EWMAs continue to decay on rest days."""
idx = pd.to_datetime(["2026-01-01", "2026-01-10"])
load = pd.Series([100.0, 0.0], index=idx)
pmc = banister(load, ctl_tau=42.0, atl_tau=7.0)
assert len(pmc) == 10 # 2026-01-01 through 2026-01-10 inclusive
def test_banister_empty_input_returns_empty_frame() -> None:
out = banister(pd.Series(dtype=float))
assert list(out.columns) == ["CTL", "ATL", "TSB"]
assert len(out) == 0
# ---------------------------------------------------------------------------
# banister_forecast() — splice + invariant
# ---------------------------------------------------------------------------
def test_banister_forecast_matches_full_history() -> None:
"""Splitting a full series into (history, future) and running through the
forecast helper produces the identical PMC frame to feeding the whole
series at once. This is the load-bearing invariant."""
rng = np.random.default_rng(seed=42)
idx = pd.date_range("2026-01-01", periods=120, freq="D")
full = pd.Series(rng.integers(0, 150, size=120).astype(float), index=idx)
cutoff = idx[60]
history = full[full.index <= cutoff]
future = full[full.index > cutoff]
pmc_full = banister(full)
pmc_spliced = banister_forecast(history, future, today=cutoff)
pd.testing.assert_frame_equal(pmc_full, pmc_spliced)
def test_banister_forecast_default_today_is_history_max() -> None:
history = pd.Series([10.0, 20.0, 30.0],
index=pd.date_range("2026-01-01", periods=3, freq="D"))
future = pd.Series([5.0, 5.0],
index=pd.date_range("2026-01-04", periods=2, freq="D"))
out_default = banister_forecast(history, future)
out_explicit = banister_forecast(history, future, today="2026-01-03")
pd.testing.assert_frame_equal(out_default, out_explicit)
def test_banister_forecast_drops_future_dates_at_or_before_today() -> None:
"""A future row dated ≤ today must be dropped — history is authoritative for that day."""
history = pd.Series([100.0],
index=pd.date_range("2026-01-03", periods=1, freq="D"))
# Future contains an "overlapping" entry at today that would otherwise sum with history.
future = pd.Series([999.0, 5.0],
index=pd.to_datetime(["2026-01-03", "2026-01-04"]))
pmc = banister_forecast(history, future, today="2026-01-03")
# Day 0 should reflect history's 100, not the 999 from future.
decay_ctl = math.exp(-1 / 42)
assert pmc["CTL"].iloc[0] == pytest.approx(100.0 * (1 - decay_ctl), rel=1e-9)
def test_banister_forecast_extends_through_future_end() -> None:
history = pd.Series([10.0],
index=pd.date_range("2026-01-01", periods=1, freq="D"))
future = pd.Series([5.0] * 5,
index=pd.date_range("2026-01-02", periods=5, freq="D"))
pmc = banister_forecast(history, future)
assert pmc.index.max() == pd.Timestamp("2026-01-06")
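The splice behaviour these tests pin down (history wins on the boundary day, future rows dated at or before `today` are dropped, rest days become zeros) can be sketched as below. `splice_loads` is a hypothetical helper written for illustration; how `banister_forecast` actually combines its inputs is not shown in this diff.

```python
import pandas as pd


def splice_loads(history: pd.Series, future: pd.Series, today=None) -> pd.Series:
    """Combine past and planned daily loads into one gap-free daily series."""
    # Assumption: `today` defaults to the last date present in history.
    today = pd.Timestamp(today) if today is not None else history.index.max()
    past = history[history.index <= today]
    planned = future[future.index > today]  # history is authoritative for today
    combined = pd.concat([past, planned]).sort_index()
    full_index = pd.date_range(combined.index.min(), combined.index.max(), freq="D")
    return combined.reindex(full_index, fill_value=0.0)  # rest days -> 0


history = pd.Series([100.0], index=pd.to_datetime(["2026-01-03"]))
future = pd.Series([999.0, 5.0], index=pd.to_datetime(["2026-01-03", "2026-01-04"]))
spliced = splice_loads(history, future, today="2026-01-03")
```

Feeding `spliced` through a single EWMA pass is what makes the spliced result comparable with the full-history result: there is only one series, so the boundary day cannot be counted twice.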


@@ -0,0 +1,227 @@
"""Tests for `openrun.ingest.fit_linker`.
We don't exercise the FIT-parsing path here — the linker's responsibilities at
the DB boundary are (1) match a session timestamp to an activity, (2) store
absolute paths, (3) rewrite them when an export moves, and (4) refuse to
silently overwrite an already-linked activity. All four are testable without
a real .fit fixture.
"""
from __future__ import annotations
from datetime import datetime, timezone
from pathlib import Path
from openrun.ingest.fit_linker import (
_match_activity,
link,
record_link,
relink,
)
def _seed_activity(conn, aid: int, when: datetime) -> None:
    conn.execute(
        "INSERT INTO activities (activity_id, start_time_gmt, raw, fetched_at) VALUES (?, ?, '{}', 'now')",
        (aid, when.astimezone(timezone.utc).isoformat().replace("+00:00", "")),
    )
    conn.commit()
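def _row(conn, aid: int):
    """Return the linked row for `aid`, or None if the activity is unlinked.

    The tests index the result by column name, so the connection's row
    factory must yield rows that support key access rather than plain tuples.
    """
    return conn.execute(
        "SELECT activity_id, fit_path FROM activity_fit_files WHERE activity_id = ?",
        (aid,),
    ).fetchone()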
def test_record_link_stores_absolute_path(tmp_conn, tmp_path: Path, monkeypatch) -> None:
    """A relative path passed to record_link is stored as an absolute path."""
    fit = tmp_path / "sub" / "run.fit"
    fit.parent.mkdir(parents=True)
    fit.write_bytes(b"")  # contents irrelevant: record_link doesn't parse
    # Pass a relative path on purpose to verify resolution. monkeypatch.chdir
    # restores the original working directory when the test finishes.
    rel = fit.relative_to(tmp_path)
    monkeypatch.chdir(tmp_path)
    record_link(tmp_conn, 1001, rel)
    tmp_conn.commit()
    stored = _row(tmp_conn, 1001)["fit_path"]
    assert Path(stored).is_absolute(), f"expected absolute, got {stored}"
    assert Path(stored) == fit.resolve()
def test_record_link_is_idempotent(tmp_conn, tmp_path: Path) -> None:
fit = tmp_path / "run.fit"
fit.write_bytes(b"")
record_link(tmp_conn, 42, fit)
record_link(tmp_conn, 42, fit)
tmp_conn.commit()
count = tmp_conn.execute(
"SELECT COUNT(*) FROM activity_fit_files WHERE activity_id = 42"
).fetchone()[0]
assert count == 1
def test_relink_rewrites_paths_by_basename(tmp_conn, tmp_path: Path) -> None:
"""After moving an export, `relink` should update stored absolute paths to the new root."""
old_root = tmp_path / "old"
new_root = tmp_path / "new"
(old_root / "fit").mkdir(parents=True)
(new_root / "fit").mkdir(parents=True)
# Two activities, one of which will move; one filename has no counterpart in new_root.
fit_a_old = old_root / "fit" / "a.fit"
fit_b_old = old_root / "fit" / "b.fit"
fit_a_old.write_bytes(b"")
fit_b_old.write_bytes(b"")
record_link(tmp_conn, 1, fit_a_old)
record_link(tmp_conn, 2, fit_b_old)
tmp_conn.commit()
fit_a_new = new_root / "fit" / "a.fit"
fit_a_new.write_bytes(b"")
# Intentionally leave `b.fit` absent from new_root to test the unmatched path.
updated, unmatched = relink(tmp_conn, new_root)
assert updated == 1
assert unmatched == 1
a_after = _row(tmp_conn, 1)["fit_path"]
b_after = _row(tmp_conn, 2)["fit_path"]
assert Path(a_after) == fit_a_new.resolve()
# Unmatched row left untouched, still pointing at the old (now non-existent) location.
assert Path(b_after) == fit_b_old.resolve()
def test_relink_rejects_non_directory(tmp_conn, tmp_path: Path) -> None:
missing = tmp_path / "does_not_exist"
try:
relink(tmp_conn, missing)
except FileNotFoundError as exc:
assert "not a directory" in str(exc)
else:
raise AssertionError("expected FileNotFoundError for missing relink root")
# ---------------------------------------------------------------------------
# _match_activity — pure function, no DB
# ---------------------------------------------------------------------------
def test_match_activity_within_tolerance() -> None:
index = {1000: 42, 2000: 99}
keys = sorted(index)
assert _match_activity(1030, keys, index, tolerance_s=60) == 42
def test_match_activity_picks_closest() -> None:
"""Two candidates both within tolerance → closest one wins."""
index = {1000: 42, 1050: 99}
keys = sorted(index)
# 1030 is 30s after 1000 and 20s before 1050 → picks 1050
assert _match_activity(1030, keys, index, tolerance_s=60) == 99
def test_match_activity_outside_tolerance_returns_none() -> None:
index = {1000: 42}
keys = sorted(index)
assert _match_activity(1500, keys, index, tolerance_s=60) is None
def test_match_activity_empty_index() -> None:
assert _match_activity(1000, [], {}, tolerance_s=60) is None
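The four cases above describe a nearest-neighbour lookup over sorted timestamps with a tolerance cut-off. A minimal sketch using `bisect` follows; `match_nearest` is a hypothetical stand-in, and details such as an inclusive tolerance and tie-breaking toward the earlier key are assumptions rather than documented behaviour of `_match_activity`.

```python
from bisect import bisect_left
from typing import Optional


def match_nearest(ts: int, keys: list, index: dict, tolerance_s: int) -> Optional[int]:
    """Return the activity id whose start time is closest to ts, or None."""
    if not keys:
        return None
    pos = bisect_left(keys, ts)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = keys[max(pos - 1, 0): pos + 1]
    best = min(candidates, key=lambda k: abs(k - ts))
    return index[best] if abs(best - ts) <= tolerance_s else None


within = match_nearest(1030, [1000, 2000], {1000: 42, 2000: 99}, tolerance_s=60)
closest = match_nearest(1030, [1000, 1050], {1000: 42, 1050: 99}, tolerance_s=60)
outside = match_nearest(1500, [1000], {1000: 42}, tolerance_s=60)
empty = match_nearest(1000, [], {}, tolerance_s=60)
```

Keeping the sorted keys separate from the id mapping lets the lookup run in logarithmic time per FIT file, which is presumably why the real function takes both arguments.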
# ---------------------------------------------------------------------------
# link() collision handling — uses fit_iter injection so we never parse a FIT
# ---------------------------------------------------------------------------
def test_link_writes_match(tmp_conn, tmp_path: Path) -> None:
when = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
_seed_activity(tmp_conn, aid=7, when=when)
fit = tmp_path / "good.fit"
fit.write_bytes(b"")
summary = link(
export_root=tmp_path,
conn=tmp_conn,
dry_run=False,
min_size_kb=0,
tolerance_s=60,
fit_iter=[(fit, when)],
)
assert summary == {"linked": 1, "unmatched": 0, "parse_failed": 0, "collisions": 0}
row = tmp_conn.execute(
"SELECT fit_path FROM activity_fit_files WHERE activity_id = 7"
).fetchone()
assert Path(row["fit_path"]) == fit.resolve()
def test_link_warns_and_skips_on_collision(tmp_conn, tmp_path: Path, capsys) -> None:
"""Two FITs whose session times both match the *same* activity:
first wins, second is reported as a collision and not written."""
when = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
_seed_activity(tmp_conn, aid=7, when=when)
first = tmp_path / "first.fit"
second = tmp_path / "second.fit"
first.write_bytes(b"")
second.write_bytes(b"")
summary = link(
export_root=tmp_path,
conn=tmp_conn,
dry_run=False,
min_size_kb=0,
tolerance_s=60,
# Both FITs report the same session start → both match aid=7.
fit_iter=[(first, when), (second, when)],
)
assert summary["linked"] == 1
assert summary["collisions"] == 1
# The first FIT's path must still be in the table — second must not have clobbered.
stored = tmp_conn.execute(
"SELECT fit_path FROM activity_fit_files WHERE activity_id = 7"
).fetchone()["fit_path"]
assert Path(stored) == first.resolve()
# Collision is reported on stderr with both filenames for diagnosis.
err = capsys.readouterr().err
assert "collision" in err
assert "second.fit" in err
assert "first.fit" in err
def test_link_counts_parse_failures_and_unmatched(tmp_conn, tmp_path: Path) -> None:
when = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
_seed_activity(tmp_conn, aid=7, when=when)
far_off = datetime(2030, 1, 1, tzinfo=timezone.utc)
parse_fail = tmp_path / "broken.fit"
orphan = tmp_path / "orphan.fit"
parse_fail.write_bytes(b"")
orphan.write_bytes(b"")
summary = link(
export_root=tmp_path,
conn=tmp_conn,
dry_run=False,
min_size_kb=0,
tolerance_s=60,
fit_iter=[(parse_fail, None), (orphan, far_off)],
)
assert summary["parse_failed"] == 1
assert summary["unmatched"] == 1
assert summary["linked"] == 0


@@ -0,0 +1,61 @@
"""Tests for `openrun.ingest.garmin_export`."""
from __future__ import annotations
import subprocess
import sys
from pathlib import Path
from openrun.ingest.garmin_export import _activity_id_from_filename
def test_activity_id_from_filename_connect_style() -> None:
# Connect export: <id>_<activity_name>.fit
assert _activity_id_from_filename("12345678_morning_run") == 12345678
def test_activity_id_from_filename_takeout_style() -> None:
# Takeout: <email>_<upload_id>.fit (email is the literal prefix Garmin uses)
assert _activity_id_from_filename("o@o00.io_132001424015") == 132001424015
def test_activity_id_from_filename_takeout_with_year_prefix() -> None:
# Some Takeout names embed a date prefix; we want the upload-id (longer chunk), not the year.
assert _activity_id_from_filename("2024_132001424015") == 132001424015
def test_activity_id_from_filename_none_when_no_digit_chunk() -> None:
assert _activity_id_from_filename("trainingPlan") is None
assert _activity_id_from_filename("just_words_here") is None
def test_activity_id_from_filename_short_digit_chunk_rejected() -> None:
# <8 digits is treated as not-an-activity-id (filters out year-like 2024, etc).
assert _activity_id_from_filename("2024_run") is None
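One rule that satisfies all five filename cases is "take the longest run of at least eight digits". The sketch below illustrates that rule; `activity_id_from_stem` is a hypothetical name, and the real `_activity_id_from_filename` may use a different pattern.

```python
import re
from typing import Optional


def activity_id_from_stem(stem: str) -> Optional[int]:
    """Return the longest digit run of 8+ digits in a filename stem, if any."""
    chunks = re.findall(r"\d{8,}", stem)  # year-like "2024" is too short to match
    if not chunks:
        return None
    return int(max(chunks, key=len))


connect_style = activity_id_from_stem("12345678_morning_run")
takeout_style = activity_id_from_stem("o@o00.io_132001424015")
year_prefixed = activity_id_from_stem("2024_132001424015")
no_digits = activity_id_from_stem("trainingPlan")
short_chunk = activity_id_from_stem("2024_run")
```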
def test_main_hint_printed_for_takeout_named_fits(tmp_path: Path) -> None:
"""End-to-end through main(): a FIT whose stem has no activity-id chunk
triggers the 'run openrun-link-fit' hint in the summary."""
export = tmp_path / "export"
export.mkdir()
# A Takeout-style name ("<email>_<upload_id>.fit") would not reach this branch:
# its long digit chunk is parsed as an id (the upload id, not the activity id),
# and with no activities seeded the activity-not-found branch returns 0.
# To exercise the *no-id-in-filename* branch we use a FIT whose stem has no digits:
(export / "trainingPlan.fit").write_bytes(b"")
# Run via a child process so we don't touch the real DB. We point
# openrun at an isolated workspace via OPENRUN_CONFIG.
cfg = tmp_path / "openrun.toml"
cfg.write_text(f'[db]\npath = "{tmp_path / "test.db"}"\n')
proc = subprocess.run(
[sys.executable, "-m", "openrun.ingest.garmin_export", str(export)],
capture_output=True,
text=True,
env={"OPENRUN_CONFIG": str(cfg), "PATH": __import__("os").environ.get("PATH", "")},
)
assert proc.returncode == 0, proc.stderr
assert "FITs skipped" in proc.stdout
assert "openrun-link-fit" in proc.stdout

25
tests/unit/test_model.py Normal file

@@ -0,0 +1,25 @@
"""Tests for pure helpers in `openrun.model`."""
from __future__ import annotations
from pathlib import Path
import pytest
from openrun.model import _resolve_fit_path
def test_resolve_fit_path_returns_existing_absolute_path(tmp_path: Path) -> None:
fit = tmp_path / "sub" / "run.fit"
fit.parent.mkdir(parents=True)
fit.write_bytes(b"")
assert _resolve_fit_path(str(fit)) == fit
def test_resolve_fit_path_missing_raises_with_relink_hint(tmp_path: Path) -> None:
missing = tmp_path / "gone.fit"
with pytest.raises(FileNotFoundError) as exc_info:
_resolve_fit_path(str(missing))
msg = str(exc_info.value)
assert "--relink" in msg, f"expected hint about --relink, got: {msg!r}"
assert str(missing) in msg

63
tests/unit/test_plots.py Normal file

@@ -0,0 +1,63 @@
"""Smoke tests for `openrun.plots`.
We deliberately don't visual-regress matplotlib output — see ROADMAP test
conventions. Each test asserts only structural properties: bar count, axis
labels, "no data" sentinel behavior.
"""
from __future__ import annotations
import matplotlib
matplotlib.use("Agg") # must be set before pyplot import
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from openrun.plots import plot_fit_decoupling
def _synthetic_records(duration_s: int = 1800, decel_at: int = 900) -> pd.DataFrame:
"""A FIT-records-like frame: constant HR, but speed drops halfway through.
With 2 segments, the second-half efficiency must be lower → positive decoupling.
"""
elapsed = np.arange(duration_s, dtype=float)
speed = np.where(elapsed < decel_at, 3.0, 2.5)
hr = np.full(duration_s, 150.0)
return pd.DataFrame({"elapsed_s": elapsed, "speed_mps": speed, "heart_rate": hr})
def test_plot_fit_decoupling_returns_axes_with_n_bars() -> None:
records = _synthetic_records()
ax = plot_fit_decoupling(records, segments=4)
bars = [p for p in ax.patches if hasattr(p, "get_height")]
assert len(bars) == 4
assert ax.get_ylabel() == "decoupling (%)"
plt.close(ax.figure)
def test_plot_fit_decoupling_segments_default_is_two() -> None:
records = _synthetic_records()
ax = plot_fit_decoupling(records)
bars = [p for p in ax.patches if hasattr(p, "get_height")]
assert len(bars) == 2
plt.close(ax.figure)
def test_plot_fit_decoupling_handles_empty_records() -> None:
"""Empty records frame → 'no usable records' sentinel, not a crash."""
empty = pd.DataFrame(columns=["elapsed_s", "speed_mps", "heart_rate"])
ax = plot_fit_decoupling(empty)
texts = [t.get_text() for t in ax.texts]
assert any("no usable records" in t for t in texts)
plt.close(ax.figure)
def test_plot_fit_decoupling_respects_caller_axes() -> None:
"""When the caller passes ax=, no new figure is created."""
fig, ax = plt.subplots()
out = plot_fit_decoupling(_synthetic_records(), ax=ax)
assert out is ax
plt.close(fig)
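For readers unfamiliar with the quantity being plotted, here is a sketch of per-segment decoupling on the same synthetic shape. It assumes efficiency is mean speed divided by mean heart rate and that each segment is compared with the first; `segment_decoupling` is hypothetical, and the definition used inside `plot_fit_decoupling` may differ.

```python
import numpy as np
import pandas as pd


def segment_decoupling(records: pd.DataFrame, segments: int = 2) -> list:
    """Percent drop in speed/HR efficiency of each segment versus the first."""
    ordered = records.sort_values("elapsed_s")
    efficiency = []
    for idx in np.array_split(np.arange(len(ordered)), segments):
        part = ordered.iloc[idx]
        efficiency.append(part["speed_mps"].mean() / part["heart_rate"].mean())
    base = efficiency[0]
    return [100.0 * (base - e) / base for e in efficiency]


elapsed = np.arange(1800, dtype=float)
records = pd.DataFrame({
    "elapsed_s": elapsed,
    "speed_mps": np.where(elapsed < 900, 3.0, 2.5),  # slows down at half-way
    "heart_rate": np.full(1800, 150.0),              # constant effort
})
decoupling = segment_decoupling(records, segments=2)
```

With constant heart rate and a speed drop from 3.0 to 2.5 m/s, the second half loses one sixth of its efficiency, so the second value is positive.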


@@ -0,0 +1,116 @@
"""Tests for race-plan helpers: `personal_records` and `calibrate_tl_per_km`."""
from __future__ import annotations
import numpy as np
import pandas as pd
from openrun.model import calibrate_tl_per_km, personal_records
def _act(aid: int, dist_km: float, pace: float, when: str = "2026-01-01") -> dict:
"""Build a single-activity dict matching `load_activities` output columns."""
duration_s = pace * 60.0 * dist_km
return {
"activity_id": aid,
"start_time_local": pd.Timestamp(when),
"distance_km": dist_km,
"pace_min_per_km": pace,
"moving_duration_s": duration_s,
}
# ---------------------------------------------------------------------------
# personal_records
# ---------------------------------------------------------------------------
def test_personal_records_picks_fastest_in_bin() -> None:
"""Three 10K candidates: returned row must be the one with the lowest pace."""
acts = pd.DataFrame([
_act(1, 10.0, pace=5.0),
_act(2, 10.2, pace=4.5), # winner
_act(3, 9.8, pace=4.8),
])
pr = personal_records(acts, distance_bins_km=(10.0,))
assert len(pr) == 1
assert pr.loc[0, "activity_id"] == 2
assert pr.loc[0, "pace_min_per_km"] == 4.5
def test_personal_records_respects_tolerance_band() -> None:
"""±5% band on 10K = [9.5, 10.5]. An 11K activity is too far out to qualify."""
acts = pd.DataFrame([
_act(1, 10.0, pace=5.0),
_act(2, 11.0, pace=3.5), # faster but outside tolerance
])
pr = personal_records(acts, distance_bins_km=(10.0,))
assert pr.loc[0, "activity_id"] == 1
def test_personal_records_skips_bins_with_no_qualifying_activity() -> None:
acts = pd.DataFrame([_act(1, 5.0, pace=5.0)])
pr = personal_records(acts, distance_bins_km=(5.0, 10.0, 21.0975))
assert list(pr["bin_km"]) == [5.0]
def test_personal_records_ignores_nan_pace() -> None:
acts = pd.DataFrame([
{**_act(1, 10.0, pace=5.0), "pace_min_per_km": np.nan}, # masked by loader
_act(2, 10.0, pace=5.5),
])
pr = personal_records(acts, distance_bins_km=(10.0,))
assert pr.loc[0, "activity_id"] == 2
def test_personal_records_empty_input_returns_empty_frame() -> None:
pr = personal_records(pd.DataFrame())
assert pr.empty
assert set(pr.columns) >= {"bin_km", "activity_id", "pace_min_per_km"}
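The selection rule exercised above (a ±5% distance band per bin, fastest non-NaN pace wins, empty bins skipped) can be sketched in a few lines of pandas. `fastest_per_bin` and its `tolerance` parameter are hypothetical; only the column names are taken from the tests.

```python
import numpy as np
import pandas as pd


def fastest_per_bin(acts: pd.DataFrame, bins_km, tolerance: float = 0.05) -> pd.DataFrame:
    """Pick the lowest-pace activity inside each distance band."""
    columns = ["bin_km", "activity_id", "pace_min_per_km"]
    rows = []
    if not acts.empty:
        valid = acts.dropna(subset=["pace_min_per_km"])
        for bin_km in bins_km:
            lo, hi = bin_km * (1 - tolerance), bin_km * (1 + tolerance)
            in_band = valid[valid["distance_km"].between(lo, hi)]
            if in_band.empty:
                continue  # no qualifying activity: the bin is skipped
            best = in_band.loc[in_band["pace_min_per_km"].idxmin()]
            rows.append({
                "bin_km": bin_km,
                "activity_id": int(best["activity_id"]),
                "pace_min_per_km": float(best["pace_min_per_km"]),
            })
    return pd.DataFrame(rows, columns=columns)


acts = pd.DataFrame({
    "activity_id": [1, 2, 3, 4],
    "distance_km": [10.0, 10.2, 11.0, 10.0],
    "pace_min_per_km": [5.0, 4.5, 3.5, np.nan],  # id 3 is out of band, id 4 masked
})
records = fastest_per_bin(acts, bins_km=(5.0, 10.0))
no_records = fastest_per_bin(pd.DataFrame(), bins_km=(10.0,))
```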
# ---------------------------------------------------------------------------
# calibrate_tl_per_km
# ---------------------------------------------------------------------------
def _insert_activity(conn, aid: int, distance_m: float, training_load: float,
activity_type: str = "running", when: str = "2026-05-01") -> None:
conn.execute(
"""INSERT INTO activities
(activity_id, start_time_local, activity_type, distance_m, training_load, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, '{}', 'now')""",
(aid, when, activity_type, distance_m, training_load),
)
def test_calibrate_tl_per_km_returns_median_iqr(tmp_conn) -> None:
    """Median and quartiles (q1, q3) of synthetic tl/km values equal the constructed truth."""
    # 5 runs, each 10 km, with training loads chosen so tl/km = 8, 9, 10, 11, 12.
    for i, tl_per_km in enumerate([8.0, 9.0, 10.0, 11.0, 12.0], start=1):
        _insert_activity(tmp_conn, aid=i, distance_m=10_000, training_load=10 * tl_per_km)
    tmp_conn.commit()
    result = calibrate_tl_per_km(tmp_conn, lookback_days=None)
    assert result["n"] == 5
    assert result["median"] == 10.0
    assert result["q1"] == 9.0
    assert result["q3"] == 11.0
def test_calibrate_tl_per_km_excludes_short_distance(tmp_conn) -> None:
_insert_activity(tmp_conn, aid=1, distance_m=10_000, training_load=100)
_insert_activity(tmp_conn, aid=2, distance_m=500, training_load=50) # too short
tmp_conn.commit()
assert calibrate_tl_per_km(tmp_conn, lookback_days=None)["n"] == 1
def test_calibrate_tl_per_km_excludes_other_activity_types(tmp_conn) -> None:
_insert_activity(tmp_conn, aid=1, distance_m=10_000, training_load=100, activity_type="running")
_insert_activity(tmp_conn, aid=2, distance_m=10_000, training_load=100, activity_type="cycling")
tmp_conn.commit()
assert calibrate_tl_per_km(tmp_conn, lookback_days=None)["n"] == 1
def test_calibrate_tl_per_km_returns_nan_when_empty(tmp_conn) -> None:
result = calibrate_tl_per_km(tmp_conn, lookback_days=None)
assert result["n"] == 0
assert np.isnan(result["median"])
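The summary these tests expect (count, median, quartiles, NaN when nothing qualifies) can be sketched with `numpy.percentile`. `summarize_tl_per_km` is hypothetical, and the 1 000 m minimum distance is an assumption: the tests only show that a 500 m run is excluded and a 10 km run is kept.

```python
import numpy as np


def summarize_tl_per_km(distance_m, training_load, min_distance_m: float = 1000.0) -> dict:
    """Median and quartiles of training load per kilometre."""
    d = np.asarray(distance_m, dtype=float)
    tl = np.asarray(training_load, dtype=float)
    keep = d >= min_distance_m  # assumed threshold, see lead-in
    if not keep.any():
        nan = float("nan")
        return {"n": 0, "median": nan, "q1": nan, "q3": nan}
    ratio = tl[keep] / (d[keep] / 1000.0)
    q1, median, q3 = np.percentile(ratio, [25, 50, 75])
    return {"n": int(keep.sum()), "median": float(median), "q1": float(q1), "q3": float(q3)}


summary = summarize_tl_per_km(
    distance_m=[10_000] * 5 + [500],
    training_load=[80.0, 90.0, 100.0, 110.0, 120.0, 50.0],
)
empty_summary = summarize_tl_per_km([], [])
```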

106
tests/unit/test_setup.py Normal file

@@ -0,0 +1,106 @@
"""Tests for `openrun.setup.init_workspace` — idempotency, state reporting."""
from __future__ import annotations
import sqlite3
from pathlib import Path
from openrun.config import Config
from openrun.setup import EXPECTED_TABLES, init_workspace
def _cfg(tmp_path: Path) -> Config:
"""A Config that points at an isolated tmp_path DB."""
return Config(db_path=tmp_path / "test.db")
def _empty_secrets(tmp_path: Path) -> Path:
"""A non-existent secrets dir under tmp_path — guarantees tests never read the real one."""
return tmp_path / ".secrets"
def test_first_run_creates_db(tmp_path: Path) -> None:
cfg = _cfg(tmp_path)
status = init_workspace(config=cfg, secrets_dir=_empty_secrets(tmp_path))
assert not status.db_existed_before
assert cfg.db_path.exists()
assert status.is_ready()
assert set(EXPECTED_TABLES).issubset(status.tables_present)
def test_second_run_is_idempotent(tmp_path: Path) -> None:
cfg = _cfg(tmp_path)
secrets = _empty_secrets(tmp_path)
first = init_workspace(config=cfg, secrets_dir=secrets)
second = init_workspace(config=cfg, secrets_dir=secrets)
assert not first.db_existed_before
assert second.db_existed_before
assert first.tables_present == second.tables_present
# Idempotent: re-running mutates nothing, so all counts stay zero.
assert all(n == 0 for n in second.table_counts.values())
def test_preserves_existing_data(tmp_path: Path) -> None:
cfg = _cfg(tmp_path)
secrets = _empty_secrets(tmp_path)
init_workspace(config=cfg, secrets_dir=secrets)
# Insert one row through a raw connection; init_workspace must not touch it.
conn = sqlite3.connect(cfg.db_path)
conn.execute(
"INSERT INTO activities (activity_id, raw, fetched_at) VALUES (1, '{}', 'now')"
)
conn.commit()
conn.close()
status = init_workspace(config=cfg, secrets_dir=secrets)
assert status.db_existed_before
assert status.table_counts["activities"] == 1
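The idempotency and data-preservation properties tested here are what `CREATE TABLE IF NOT EXISTS` provides in SQLite. The sketch below uses a cut-down, hypothetical schema and an `init_db` helper invented for illustration; it is not the schema or logic of `init_workspace`.

```python
import sqlite3

# Hypothetical one-table schema; the real SCHEMA has many more tables/columns.
SCHEMA_SKETCH = """
CREATE TABLE IF NOT EXISTS activities (
    activity_id INTEGER PRIMARY KEY,
    raw         TEXT NOT NULL,
    fetched_at  TEXT NOT NULL
);
"""


def init_db(conn: sqlite3.Connection) -> dict:
    """Apply the schema (a no-op if it already exists) and report row counts."""
    conn.executescript(SCHEMA_SKETCH)
    count = conn.execute("SELECT COUNT(*) FROM activities").fetchone()[0]
    return {"activities": count}


conn = sqlite3.connect(":memory:")
first = init_db(conn)
conn.execute(
    "INSERT INTO activities (activity_id, raw, fetched_at) VALUES (1, '{}', 'now')"
)
conn.commit()
second = init_db(conn)  # second run must leave the inserted row alone
```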
def test_reports_auth_tokens(tmp_path: Path) -> None:
cfg = _cfg(tmp_path)
secrets = tmp_path / ".secrets"
missing = init_workspace(config=cfg, secrets_dir=secrets)
assert not missing.auth_tokens_present
secrets.mkdir()
(secrets / "oauth1_token.json").write_text("{}")
present = init_workspace(config=cfg, secrets_dir=secrets)
assert present.auth_tokens_present
def test_reports_sync_state(tmp_path: Path) -> None:
cfg = _cfg(tmp_path)
secrets = _empty_secrets(tmp_path)
init_workspace(config=cfg, secrets_dir=secrets)
conn = sqlite3.connect(cfg.db_path)
conn.execute(
"INSERT INTO sync_state (key, value, updated_at) VALUES (?, ?, datetime('now'))",
("last_sync_utc", "2026-05-18T10:30:00"),
)
conn.commit()
conn.close()
status = init_workspace(config=cfg, secrets_dir=secrets)
assert status.last_sync_utc == "2026-05-18T10:30:00"
assert status.last_ingest_utc is None
def test_status_serialises_to_json(tmp_path: Path) -> None:
"""`to_dict()` output must be JSON-encodable for `--json` CLI usage."""
import json
status = init_workspace(config=_cfg(tmp_path), secrets_dir=_empty_secrets(tmp_path))
blob = json.dumps(status.to_dict(), default=str)
parsed = json.loads(blob)
assert parsed["db_path"].endswith("test.db")
assert parsed["db_existed_before"] is False
assert "activities" in parsed["tables_present"]


@@ -0,0 +1,68 @@
"""Tests for `load_sleep_stages` — percentage invariants + edge cases."""
from __future__ import annotations
import numpy as np
import pandas as pd
import pytest
from openrun.model import load_sleep_stages
def _seed(conn, date: str, deep: float, light: float, rem: float,
awake: float = 0.0, score: float | None = None) -> None:
conn.execute(
"""INSERT INTO daily_sleep
(calendar_date, deep_s, light_s, rem_s, awake_s, sleep_score, raw, fetched_at)
VALUES (?, ?, ?, ?, ?, ?, '{}', 'now')""",
(date, deep, light, rem, awake, score),
)
conn.commit()
def test_sleep_stages_percentages_sum_to_one(tmp_conn) -> None:
    """deep_pct + light_pct + rem_pct always sum to 1.0: each is a fraction of
    asleep time (the load-bearing invariant)."""
    _seed(tmp_conn, "2026-05-01", deep=3600, light=10800, rem=5400)  # 1h / 3h / 1.5h
    _seed(tmp_conn, "2026-05-02", deep=1800, light=9000, rem=3600)   # 0.5h / 2.5h / 1h
    df = load_sleep_stages(tmp_conn)
    total_pct = df["deep_pct"] + df["light_pct"] + df["rem_pct"]
    np.testing.assert_allclose(total_pct.values, [1.0, 1.0], rtol=1e-12)
def test_sleep_stages_percentages_match_seconds(tmp_conn) -> None:
_seed(tmp_conn, "2026-05-01", deep=3600, light=10800, rem=5400)
row = load_sleep_stages(tmp_conn).iloc[0]
total = 3600 + 10800 + 5400
assert row["deep_pct"] == pytest.approx(3600 / total)
assert row["light_pct"] == pytest.approx(10800 / total)
assert row["rem_pct"] == pytest.approx(5400 / total)
def test_sleep_stages_awake_pct_uses_time_in_bed(tmp_conn) -> None:
    """awake_pct = awake / (asleep + awake), i.e. awake time over time in bed,
    not awake / asleep."""
    # 4h asleep + 1h awake → awake_pct = 1/5 = 0.2
    _seed(tmp_conn, "2026-05-01", deep=3600, light=7200, rem=3600, awake=3600)
    row = load_sleep_stages(tmp_conn).iloc[0]
    assert row["awake_pct"] == pytest.approx(0.2)
def test_sleep_stages_sleep_hours_excludes_awake(tmp_conn) -> None:
_seed(tmp_conn, "2026-05-01", deep=3600, light=7200, rem=3600, awake=999)
row = load_sleep_stages(tmp_conn).iloc[0]
# 1 + 2 + 1 = 4 hours.
assert row["sleep_hours"] == pytest.approx(4.0)
def test_sleep_stages_zero_sleep_yields_nan_pct(tmp_conn) -> None:
    """Recorded night with no asleep time (data glitch) → NaN percentages, no ZeroDivisionError."""
    _seed(tmp_conn, "2026-05-01", deep=0, light=0, rem=0, awake=1800)
    row = load_sleep_stages(tmp_conn).iloc[0]
    assert np.isnan(row["deep_pct"])
    assert np.isnan(row["light_pct"])
    assert np.isnan(row["rem_pct"])
def test_sleep_stages_empty_returns_empty_frame(tmp_conn) -> None:
df = load_sleep_stages(tmp_conn)
assert df.empty
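The derived columns checked in this file can be sketched directly from the stored seconds. `stage_fractions` is a hypothetical helper that mirrors the asserted behaviour (stage fractions over asleep time, `awake_pct` over time in bed, NaN instead of a division error); the real `load_sleep_stages` may be written differently.

```python
import numpy as np
import pandas as pd


def stage_fractions(df: pd.DataFrame) -> pd.DataFrame:
    """Derive sleep_hours and *_pct columns from per-stage seconds."""
    asleep = df["deep_s"] + df["light_s"] + df["rem_s"]
    in_bed = asleep + df["awake_s"]
    asleep_or_nan = asleep.where(asleep > 0)  # zero asleep time -> NaN divisor
    out = pd.DataFrame(index=df.index)
    out["sleep_hours"] = asleep / 3600.0      # awake time is excluded
    for stage in ("deep", "light", "rem"):
        out[f"{stage}_pct"] = df[f"{stage}_s"] / asleep_or_nan
    out["awake_pct"] = df["awake_s"] / in_bed.where(in_bed > 0)
    return out


nights = pd.DataFrame({
    "deep_s": [3600, 0],
    "light_s": [7200, 0],
    "rem_s": [3600, 0],
    "awake_s": [3600, 1800],  # second night: recorded, but no asleep time
})
fractions = stage_fractions(nights)
```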

35
tests/unit/test_smoke.py Normal file

@@ -0,0 +1,35 @@
"""Smoke test — verifies the test scaffolding itself works.
If this passes, `uv run pytest` is wired up, the `tmp_conn` fixture builds a
DB from `openrun.db.SCHEMA`, and the schema covers every table that loaders
in `openrun.model` reference.
"""
from __future__ import annotations
EXPECTED_TABLES = frozenset({
"activities",
"activity_splits",
"activity_fit_files",
"activity_time_in_zone",
"daily_steps",
"daily_sleep",
"daily_stress",
"daily_hrv",
"daily_body_battery",
"daily_intensity_minutes",
"daily_resting_hr",
"sync_state",
})
def test_tmp_conn_applies_full_schema(tmp_conn) -> None:
tables = {
row[0]
for row in tmp_conn.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
)
}
missing = EXPECTED_TABLES - tables
assert not missing, f"schema is missing expected tables: {sorted(missing)}"


@@ -0,0 +1,74 @@
"""Tests for `weekly_time_in_zone`."""
from __future__ import annotations

import pandas as pd

from openrun.model import weekly_time_in_zone


def _seed(conn, aid: int, when: str, *, atype: str = "running",
          z1=0.0, z2=0.0, z3=0.0, z4=0.0, z5=0.0) -> None:
    conn.execute(
        """INSERT INTO activities
           (activity_id, start_time_local, activity_type, raw, fetched_at)
           VALUES (?, ?, ?, '{}', 'now')""",
        (aid, when, atype),
    )
    conn.execute(
        """INSERT INTO activity_time_in_zone
           (activity_id, z1_s, z2_s, z3_s, z4_s, z5_s, total_s, source, computed_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, 'fit', 'now')""",
        (aid, z1, z2, z3, z4, z5, z1 + z2 + z3 + z4 + z5),
    )
    conn.commit()


def test_weekly_tiz_sums_within_week(tmp_conn) -> None:
    """Three running activities in the same ISO week sum into one row."""
    # 2026-05-04 is Mon, 2026-05-06 Wed, 2026-05-10 Sun: all in the same Mon-anchored week.
    _seed(tmp_conn, 1, "2026-05-04 06:00:00", z1=600, z2=1200)
    _seed(tmp_conn, 2, "2026-05-06 06:00:00", z2=300, z3=900)
    _seed(tmp_conn, 3, "2026-05-10 06:00:00", z4=400)

    out = weekly_time_in_zone(tmp_conn)
    assert len(out) == 1
    row = out.iloc[0]
    assert row["z1_s"] == 600
    assert row["z2_s"] == 1500
    assert row["z3_s"] == 900
    assert row["z4_s"] == 400
    assert row["z5_s"] == 0
    assert row["total_s"] == 600 + 1200 + 300 + 900 + 400
    assert row["n_activities"] == 3


def test_weekly_tiz_splits_across_weeks(tmp_conn) -> None:
    _seed(tmp_conn, 1, "2026-05-04 06:00:00", z2=1000)  # week of 2026-05-04
    _seed(tmp_conn, 2, "2026-05-12 06:00:00", z3=2000)  # week of 2026-05-11
    out = weekly_time_in_zone(tmp_conn)
    assert len(out) == 2
    assert out.index.min() == pd.Timestamp("2026-05-04")
    assert out.index.max() == pd.Timestamp("2026-05-11")


def test_weekly_tiz_filters_by_activity_type(tmp_conn) -> None:
    _seed(tmp_conn, 1, "2026-05-04 06:00:00", atype="running", z2=1000)
    _seed(tmp_conn, 2, "2026-05-04 18:00:00", atype="cycling", z2=2000)
    out = weekly_time_in_zone(tmp_conn)
    assert out.iloc[0]["z2_s"] == 1000  # cycling excluded by default


def test_weekly_tiz_date_window(tmp_conn) -> None:
    _seed(tmp_conn, 1, "2026-04-01 06:00:00", z2=100)
    _seed(tmp_conn, 2, "2026-05-04 06:00:00", z2=500)
    _seed(tmp_conn, 3, "2026-06-01 06:00:00", z2=900)
    out = weekly_time_in_zone(tmp_conn, start="2026-05-01", end="2026-05-31")
    assert len(out) == 1
    assert out.iloc[0]["z2_s"] == 500


def test_weekly_tiz_empty_returns_empty_frame_with_columns(tmp_conn) -> None:
    out = weekly_time_in_zone(tmp_conn)
    assert out.empty
    assert {"z1_s", "z2_s", "z3_s", "z4_s", "z5_s", "total_s", "n_activities"} <= set(out.columns)
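
These tests expect one row per Monday-anchored week, indexed by the Monday's date, with zone seconds summed and an `n_activities` count. The sketch below shows one way to do that grouping on an in-memory frame. It is an illustration only: `weekly_sum` is a hypothetical helper, not the `weekly_time_in_zone` implementation in `openrun.model`, and it skips the SQL load, the activity-type filter, and the date window.

```python
import pandas as pd

ZONE_COLS = ["z1_s", "z2_s", "z3_s", "z4_s", "z5_s", "total_s"]


def weekly_sum(df: pd.DataFrame) -> pd.DataFrame:
    """Sum zone seconds per Monday-anchored week (hypothetical helper)."""
    if df.empty:
        # Keep the expected columns even when there is nothing to aggregate.
        return pd.DataFrame(columns=ZONE_COLS + ["n_activities"])
    start = pd.to_datetime(df["start_time_local"])
    # weekday is 0 for Monday, so subtracting it lands on the week's Monday.
    monday = start.dt.normalize() - pd.to_timedelta(start.dt.weekday, unit="D")
    monday = monday.rename("week")
    out = df.groupby(monday)[ZONE_COLS].sum()
    out["n_activities"] = df.groupby(monday).size()
    return out


activities = pd.DataFrame(
    {
        "start_time_local": [
            "2026-05-04 06:00:00",  # Monday
            "2026-05-10 06:00:00",  # Sunday of the same week
            "2026-05-12 06:00:00",  # Tuesday of the following week
        ],
        "z1_s": [600, 0, 0],
        "z2_s": [1200, 300, 0],
        "z3_s": [0, 900, 2000],
        "z4_s": [0, 0, 0],
        "z5_s": [0, 0, 0],
        "total_s": [1800, 1200, 2000],
    }
)
weekly = weekly_sum(activities)
```

Anchoring on the computed Monday rather than on a resample frequency keeps weeks with no activities out of the result, which matches the two-row expectation in `test_weekly_tiz_splits_across_weeks`.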

175
uv.lock generated

@@ -220,6 +220,75 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ae/8c/469afb6465b853afff216f9528ffda78a915ff880ed58813ba4faf4ba0b6/contourpy-1.3.3-cp314-cp314t-win_arm64.whl", hash = "sha256:b7448cb5a725bb1e35ce88771b86fba35ef418952474492cf7c764059933ff8b", size = 203831 },
]
[[package]]
name = "coverage"
version = "7.14.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/23/7f/d0720730a397a999ffc0fd3f5bebef347338e3a47b727da66fbb228e2ff2/coverage-7.14.0.tar.gz", hash = "sha256:057a6af2f160a85384cde4ab36f0d2777bae1057bae255f95413cdd382aa5c74", size = 919489 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/6b/76/b7c66ee3c66e1b0f9d894c8125983aa0c03fb2336f2fd16559f9c966157f/coverage-7.14.0-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:f2bbb8254370eb4c628ff3d6fa8a7f74ddc40565394d4f7ab791d1fe568e37ef", size = 219990 },
{ url = "https://files.pythonhosted.org/packages/b3/af/e567cbad5ba69c013a50146dfa886dc7193361fda77521f51274ff620e1b/coverage-7.14.0-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:23b81107f46d3f21d0cbce30664fcec0f5d9f585638a67081750f99738f6bf66", size = 220365 },
{ url = "https://files.pythonhosted.org/packages/44/6f/9ad575d505b4d805b254febc8a5b338a2efe278f8786e56ff1cb8413f9c3/coverage-7.14.0-cp313-cp313-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:22a7e06a5f11a757cdfe79018e9095f9f69ae283c5cd8123774c788deec8717b", size = 251363 },
{ url = "https://files.pythonhosted.org/packages/6f/5f/b5370068b2f57787454592ed7dcd1002f0f1703b7db1fa30f6a325a4ca6e/coverage-7.14.0-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9d1aa57a1dc8e05bdc42e81c5d671d849577aeedf279f4c449d6d286f9ed88ca", size = 253961 },
{ url = "https://files.pythonhosted.org/packages/29/1e/51adf17738976e8f2b85ddef7b7aa12a0838b056c92f175941d8862767c1/coverage-7.14.0-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:90c1a51bcfddf645b3bb7ec333d9e94393a8e94f55642380fa8a9a5a9e636cb7", size = 255193 },
{ url = "https://files.pythonhosted.org/packages/9e/7b/5bfd7ac1df3b881c2ac7a5cbc99c7609e6296c402f5ef587cd81c6f355b3/coverage-7.14.0-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:a841fae2fadcae4f438d43b6ccc4aac2ad609f47cdb6cfdce60cbb3fe5ca7bc2", size = 257326 },
{ url = "https://files.pythonhosted.org/packages/7d/38/1d37d316b174fad3843a1d76dbdfe4398771c9ecd0515935dd9ece9cd627/coverage-7.14.0-cp313-cp313-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:c79d2319cabef1fe8e86df73371126931550804738f78ad7d31e3aad85a67367", size = 251582 },
{ url = "https://files.pythonhosted.org/packages/34/46/746704f95980ba220214e1a41e18cec5aea80a898eaa53c51bf2d645ff36/coverage-7.14.0-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:1b23b0c6f0b1db6ad769b7050c8b641c0bf215ded26c1816955b17b7f26edfa9", size = 253325 },
{ url = "https://files.pythonhosted.org/packages/e1/b9/bbe87206d9687b192352f893797825b5f5b15ecd3aa9c68fbff0c074d77b/coverage-7.14.0-cp313-cp313-musllinux_1_2_i686.whl", hash = "sha256:55d3089079ce181a4566b1065ab28d2575eb76d8ac8f81f4fcda2bf037fee087", size = 251291 },
{ url = "https://files.pythonhosted.org/packages/46/57/b8cdb12ac0d73ef0243218bd5e22c9df8f92edab8018213a86aec67c5324/coverage-7.14.0-cp313-cp313-musllinux_1_2_ppc64le.whl", hash = "sha256:49c005cba1e2f9677fb2845dcdf9a2e72a52a17d63e8231aaaae35d9f50215ef", size = 255448 },
{ url = "https://files.pythonhosted.org/packages/1f/d4/5002019538b2036ce3c84340f54d2fd5100d55b0a6b0894eee56128d03c7/coverage-7.14.0-cp313-cp313-musllinux_1_2_riscv64.whl", hash = "sha256:9117377b823daa28aa8635fbb08cda1cd6be3d7143257345459559aeef852d52", size = 251110 },
{ url = "https://files.pythonhosted.org/packages/37/53/20c5009477660f084e6ed60bc02a91894b8e234e617e86ecfd9aaf78e27b/coverage-7.14.0-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:7b79d646cf46d5cf9a9f40281d4441df5849e445726e369006d2b117710b33fe", size = 252885 },
{ url = "https://files.pythonhosted.org/packages/ae/ab/3cf6427ac9c1f1db747dbb1ce71dde47984876d4c2cfd018a3fef0a78d4d/coverage-7.14.0-cp313-cp313-win32.whl", hash = "sha256:fb609b3658479e33f9516d46f1a89dbb9b6c261366e3a11844a96ec487533dae", size = 222539 },
{ url = "https://files.pythonhosted.org/packages/8f/b8/9228523e80321c2cb4880d1f589bc0171f2f71432c35118ad04dc01decce/coverage-7.14.0-cp313-cp313-win_amd64.whl", hash = "sha256:0773d8329cf32b6fd222e4b52622c61fe8d503eb966cfc8d3c3c10c96266d50e", size = 223344 },
{ url = "https://files.pythonhosted.org/packages/a3/99/118daa192f95e3a6cb2740100fbf8797cda1734b4134ef0b5d501a7fa8f3/coverage-7.14.0-cp313-cp313-win_arm64.whl", hash = "sha256:b4e26a0f1b696faf283bffe5b8569e44e336c582439df5d53281ab89ee0cba96", size = 221966 },
{ url = "https://files.pythonhosted.org/packages/e6/f1/a46cc0c013be170216253184a32366d7cbdb9252feaec866b05c2d12a894/coverage-7.14.0-cp313-cp313t-macosx_10_13_x86_64.whl", hash = "sha256:953f521ca9445300397e65fda3dca58b2dbd68fee983777420b57ac3c77e9f90", size = 220679 },
{ url = "https://files.pythonhosted.org/packages/64/8c/9c30a3d311a34177fa432995be7fbfc64477d8bac5630bd38055b1c9b424/coverage-7.14.0-cp313-cp313t-macosx_11_0_arm64.whl", hash = "sha256:98af83fd65ae24b1fdd03aaead967a9f523bcd2f1aab2d4f3ffda65bb568a6f1", size = 221033 },
{ url = "https://files.pythonhosted.org/packages/9a/cd/3fb5e06c3badefd0c1b47e2044fdca67f8220a4ec2e7fcfb476aa0a67c6c/coverage-7.14.0-cp313-cp313t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:668b92e6958c4db7cf92e81caac328dfbbdbb215db2850ad28f0cbe1eea0bfbd", size = 262333 },
{ url = "https://files.pythonhosted.org/packages/a8/e6/fbc322325c7294d3e22c1ad6b79e45d0806b25228c8e5842aed6d8169aa7/coverage-7.14.0-cp313-cp313t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9fbd898551762dea00d3fef2b1c4f99afd2c6a3ff952ea07d60a9bd5ed4f34bc", size = 264410 },
{ url = "https://files.pythonhosted.org/packages/08/92/c497b264bec1673c47cc77e26f760fcda4654cabf1f39546d1a23a3b8c35/coverage-7.14.0-cp313-cp313t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:68af363c07ecd8d4b7d4043d85cb376d7d227eceb54e5323ee45da73dbd3e426", size = 266836 },
{ url = "https://files.pythonhosted.org/packages/78/fc/045da320987f401af5d2815d351e8aa799aec859f60e29f445e3089eeedb/coverage-7.14.0-cp313-cp313t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:6e57054a583da8ac55edf24117ea4c9133032cfc4cf72aa2d48c1e5d4b52f899", size = 267974 },
{ url = "https://files.pythonhosted.org/packages/1b/ae/227b1e379497fb7a4fc3286e620f80c8a1e7cec66d45695a01639eb1af65/coverage-7.14.0-cp313-cp313t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:cc3499459bbcdd51a65b64c35ab7ed2764eaf3cba826e0df3f1d7fe2e102b70b", size = 261578 },
{ url = "https://files.pythonhosted.org/packages/a0/f5/3570342900f2acea31d33ff1590c5d8bac1a8e1a2e1c6d34a5d5e61de681/coverage-7.14.0-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:45899ec2138a4346ed34d601dedf5076fb74edf2d1dd9dc76a78e82397edee90", size = 264394 },
{ url = "https://files.pythonhosted.org/packages/16/29/de1bbc01c935b28f89b1dc3db85b011c055e843a8e5e3b83141c3f80af7f/coverage-7.14.0-cp313-cp313t-musllinux_1_2_i686.whl", hash = "sha256:8767486808c436f05b23ab98eb963fb29185e32a9357a166971685cb3459900f", size = 262022 },
{ url = "https://files.pythonhosted.org/packages/35/95/f53890b0bf2fc10ab168e05d38869215e73ca24c4cb521c3bb0eb62fe16b/coverage-7.14.0-cp313-cp313t-musllinux_1_2_ppc64le.whl", hash = "sha256:a3b5ddfd6aa7ddad53ee3edb231e88a2151507a43229b7d71b953916deca127d", size = 265732 },
{ url = "https://files.pythonhosted.org/packages/ed/ea/c919e259081dd2bdf0e43b87209709ba7ec2e4117c2a7f5185379c43463c/coverage-7.14.0-cp313-cp313t-musllinux_1_2_riscv64.whl", hash = "sha256:63df0fe568e698e1045792399f8ab6da3a6c2dce3182813fb92afa2641087b47", size = 260921 },
{ url = "https://files.pythonhosted.org/packages/1a/2c/c2831889705a81dc5d1c6ca12e4d8e9b95dfc146d153488a6c0ea685d28e/coverage-7.14.0-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:827d6397dbd95144939b18f89edf31f63e1f99633e8d5f32f22ba8bdda567477", size = 263109 },
{ url = "https://files.pythonhosted.org/packages/5a/a9/2fcae5003cac3d63fe344d2166243c2756935f48420863c5272b240d550b/coverage-7.14.0-cp313-cp313t-win32.whl", hash = "sha256:7bf43e000d24012599b879791cff41589af90674722421ef11b11a5431920bab", size = 223212 },
{ url = "https://files.pythonhosted.org/packages/3f/bb/18e94d7b14b9b398164197114a587a04ab7c9fdbe1d237eef57311c5e883/coverage-7.14.0-cp313-cp313t-win_amd64.whl", hash = "sha256:3f5549365af25d770e06b1f8f5682d9a5637d06eb494db91c6fa75d3950cc917", size = 224272 },
{ url = "https://files.pythonhosted.org/packages/db/56/4f14fad782b035c81c4ffd09159e7103d42bb1d93ac8496d04b90a11b7da/coverage-7.14.0-cp313-cp313t-win_arm64.whl", hash = "sha256:6d160217ec6fe890f16ad3a9531761589443749e448f91986c972714fad361c8", size = 222530 },
{ url = "https://files.pythonhosted.org/packages/1c/18/b9a6586d73992807c26f9a5f274131be3d76b56b18a82b9392e2a25d2e45/coverage-7.14.0-cp314-cp314-macosx_10_15_x86_64.whl", hash = "sha256:9aed9fa983514ca032790f3fe0d1c0e42ca7e16b42432af1706b50a9a46bef5d", size = 220036 },
{ url = "https://files.pythonhosted.org/packages/f3/9b/4165a1d56ddc302a0e2d518fd9d412a4fd0b57562618c78c5f21c57194f5/coverage-7.14.0-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:ba3b8390db29296dbbf49e91b6fe08f990743a90c8f447ba4c2ffc29670dfa63", size = 220368 },
{ url = "https://files.pythonhosted.org/packages/69/aa/c12e52a5ba148d9995229d557e3be6e554fe469addc0e9241b2f0956d8ea/coverage-7.14.0-cp314-cp314-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:3a5d8e876dfa2f102e970b183863d6dedd023d3c0eeca1fe7a9787bc5f28b212", size = 251417 },
{ url = "https://files.pythonhosted.org/packages/d7/51/ec641c26e6dca1b25a7d2035ba6ecb7c884ef1a100a9e42fbe4ce4405139/coverage-7.14.0-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:5ebb8f4614a3787d567e610bbfdf96a4798dd69a1afb1bd8ad228d4111fe6ff3", size = 253924 },
{ url = "https://files.pythonhosted.org/packages/33/c4/59c3de0bd1b538824173fd518fed51c1ce740ca5ed68e74545983f4053a9/coverage-7.14.0-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6b9bf47223dd8db3d4c4b2e443b02bace480d428f0822c3f991600448a176c97", size = 255269 },
{ url = "https://files.pythonhosted.org/packages/7b/a9/36dfa153a62040296f6e7febfdb20a5720622f6ef5a81a41e8237b9a5344/coverage-7.14.0-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:3485a836550b303d006d57cc06e3d5afaabc642c77050b7c985a97b13e3776b8", size = 257583 },
{ url = "https://files.pythonhosted.org/packages/26/7b/cc2c048d4114d9ab1c2409e9ee365e5ae10736df6dffcfc9444effa6c708/coverage-7.14.0-cp314-cp314-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:3e7e88110bae996d199d1693ca8ec3fd52441d426401ae963437598667b4c5eb", size = 251434 },
{ url = "https://files.pythonhosted.org/packages/ee/df/6770eaa576e604575e9a78055313250faef5faa84bd6f71a39fece519c43/coverage-7.14.0-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:15228a6800ce7bdf1b74800595e56db7138cecb338fdbf044806e10dcf182dfe", size = 253280 },
{ url = "https://files.pythonhosted.org/packages/ad/9e/1c0264514a3f98259a6d64765a397b2c8373e3ba59ee722a4802d3ec0c61/coverage-7.14.0-cp314-cp314-musllinux_1_2_i686.whl", hash = "sha256:9d26ac7f5398bafc5b57421ad994e8a4749e8a7a0e62d05ec7d53014d5963bfa", size = 251241 },
{ url = "https://files.pythonhosted.org/packages/64/16/4efdf3e3c4079cdbf0ece56a2fea872df9e8a3e15a13a0af4400e1075944/coverage-7.14.0-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:2fb73254ff43c911c967a899e1359bc5049b4b115d6e8fbdde4937d0a2246cd5", size = 255516 },
{ url = "https://files.pythonhosted.org/packages/93/69/b1de96346603881b3d1bc8d6447c83200e1c9700ffbaff926ba01ff5724c/coverage-7.14.0-cp314-cp314-musllinux_1_2_riscv64.whl", hash = "sha256:454a380af72c6adada298ed270d38c7a391288198dbfb8467f786f588751a90c", size = 251059 },
{ url = "https://files.pythonhosted.org/packages/a4/66/2881853e0363a5e0a724d1103e53650795367471b6afb234f8b49e713bc6/coverage-7.14.0-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:65c86fb646d2bd2972e96bd1a8b45817ed907cee68655d6295fe7ec031d04cca", size = 252716 },
{ url = "https://files.pythonhosted.org/packages/55/5c/0d3305d002c41dcde873dbe456491e663dc55152ca526b630b5c47efd62f/coverage-7.14.0-cp314-cp314-win32.whl", hash = "sha256:6a6516b02a6101398e19a3f44820f69bab2590697f7def4331f668b14adaf828", size = 222788 },
{ url = "https://files.pythonhosted.org/packages/f9/58/6e1b8f52fdc3184b47dc5037f5070d83a3d11042db1594b02d2a44d786c8/coverage-7.14.0-cp314-cp314-win_amd64.whl", hash = "sha256:45e0f79d8351fa76e256716df91eab12890d32678b9590df7ae1042e4bd4cf5d", size = 223600 },
{ url = "https://files.pythonhosted.org/packages/00/70/a18c408e674bc26281cadaedc7351f929bd2094e191e4b15271c30b084cc/coverage-7.14.0-cp314-cp314-win_arm64.whl", hash = "sha256:4b899594a8b2d81e5cc064a0d7f9cac2081fed91049456cae7676787e41549c9", size = 222168 },
{ url = "https://files.pythonhosted.org/packages/3d/89/2681f071d238b62aff8dfc2ab44fc24cfdb38d1c01f391a80522ff5d3a16/coverage-7.14.0-cp314-cp314t-macosx_10_15_x86_64.whl", hash = "sha256:f580f8c80acd94ac72e863efe2cab791d8c38d153e0b463b92dfa000d5c84cd1", size = 220766 },
{ url = "https://files.pythonhosted.org/packages/bd/c7/c987babafd9207ffa1995e1ef1f9b26762cf4963aa768a66b6f0501e4616/coverage-7.14.0-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:a2bd259c442cd43c49b30fbafc51776eb19ea396faf159d26a83e6a0a5f13b0c", size = 221035 },
{ url = "https://files.pythonhosted.org/packages/5a/e9/d6a5ac3b333088143d6fc877d398a9a674dc03124a2f776e131f03864823/coverage-7.14.0-cp314-cp314t-manylinux1_i686.manylinux_2_28_i686.manylinux_2_5_i686.whl", hash = "sha256:a706b908dfa85538863504c624b237a3cc34232bf403c057414ebfdb3b4d9f84", size = 262405 },
{ url = "https://files.pythonhosted.org/packages/38/b1/e70838d29a7c08e22d44398a46db90815bbcbf28de06992bd9210d1a8d8e/coverage-7.14.0-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:7333cd944ee4393b9b3d3c1b598c936d4fc8d70573a4c7dacfec5590dd50e436", size = 264530 },
{ url = "https://files.pythonhosted.org/packages/6b/73/5c31ef97763288d03d9995152b96d5475b527c63d91c84b01caea894b83a/coverage-7.14.0-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0f162bc9a15b82d947b02651b0c7e1609d6f7a8735ca330cfadec8481dd97d5a", size = 266932 },
{ url = "https://files.pythonhosted.org/packages/e1/76/dd56d80f29c5f05b4d76f7e7c6d47cafacae017189c75c5759d24f9ff0cc/coverage-7.14.0-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:362cb78e01a5dc82009d88004cf60f2e6b6d6fcbfdec05b05af73b0abf40118f", size = 268062 },
{ url = "https://files.pythonhosted.org/packages/6e/c7/27ba85cd5b95614f159ff93ebff1901584a8d192e2e5e24c4943a7453f59/coverage-7.14.0-cp314-cp314t-manylinux_2_31_riscv64.manylinux_2_39_riscv64.whl", hash = "sha256:acebd068fca5512c3a6fde9c045f901613478781a73f0e82b307b214daef23fb", size = 261504 },
{ url = "https://files.pythonhosted.org/packages/13/2e/e8149f60ab5d5684c6eee881bdf34b127115cddbb958b196768dd9d63473/coverage-7.14.0-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:29fe3da551dface75deb2ccbf87b6b66e2e7ef38f6d89050b428be94afff3490", size = 264398 },
{ url = "https://files.pythonhosted.org/packages/d9/7f/1261b025285323225f4b4abffa5a643649dfd67e25ddca7ebcbdea3b7cb3/coverage-7.14.0-cp314-cp314t-musllinux_1_2_i686.whl", hash = "sha256:b4cc4fce8672fffcb09b0eafc167b396b3ba53c4a7230f54b7aaffbf6c835fa9", size = 262000 },
{ url = "https://files.pythonhosted.org/packages/d3/dc/829c54f60b9d08389439c00f813c752781c496fc5788c78d8006db4b4f2b/coverage-7.14.0-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:5d4a51aad8ba8bdcd2b8bd8f03d4aca19693fa2327a3470e4718a25b03481020", size = 265732 },
{ url = "https://files.pythonhosted.org/packages/ed/b0/70bd1419941652fa062689cba9c3eeafb8f5e6fbb890bce41c3bdda5dbd6/coverage-7.14.0-cp314-cp314t-musllinux_1_2_riscv64.whl", hash = "sha256:9f323af3e1e4f68b60b7b247e37b8515563a61375518fa59de1af48ba28a3db6", size = 260847 },
{ url = "https://files.pythonhosted.org/packages/f2/73/be40b2390656c654d35ea0015ea7ba3d945769cf80790ad5e0bb2d56d2ba/coverage-7.14.0-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:1a0abc7342ea9711c469dd8b821c6c311e6bc6aac1442e5fbd6b27fae0a8f3db", size = 263166 },
{ url = "https://files.pythonhosted.org/packages/29/55/4a643f712fcf7cf2881f8ec1e0ccb7b164aff3108f69b51801246c8799f2/coverage-7.14.0-cp314-cp314t-win32.whl", hash = "sha256:a9f864ef57b7172e2db87a096642dd51e179e085ab6b2c371c29e885f65c8fb2", size = 223573 },
{ url = "https://files.pythonhosted.org/packages/27/96/3acae5da0953be042c0b4dea6d6789d2f080701c77b88e44d5bd41b9219b/coverage-7.14.0-cp314-cp314t-win_amd64.whl", hash = "sha256:29943e552fdc08e082eb51400fb2f58e118a83b5542bd06531214e084399b644", size = 224680 },
{ url = "https://files.pythonhosted.org/packages/93/3d/6ab5d2dd8325d838737c6f8d83d62eb6230e0d70b87b51b57bbfd08fa767/coverage-7.14.0-cp314-cp314t-win_arm64.whl", hash = "sha256:742a73ea621953b012f2c4c2219b512180dd84489acf5b1596b0aafc55b9100b", size = 222703 },
{ url = "https://files.pythonhosted.org/packages/61/e8/cb8e80d6f9f55b99588625062822bf946cf03ed06315df4bd8397f5632a1/coverage-7.14.0-py3-none-any.whl", hash = "sha256:8de5b61163aee3d05c8a2beab6f47913df7981dad1baf82c414d99158c286ab1", size = 211764 },
]
[[package]]
name = "cycler"
version = "0.12.1"
@@ -303,29 +372,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/fd/ba/56147c165442cc5ba7e82ecf301c9a68353cede498185869e6e02b4c264f/fonttools-4.62.1-py3-none-any.whl", hash = "sha256:7487782e2113861f4ddcc07c3436450659e3caa5e470b27dc2177cade2d8e7fd", size = 1152647 },
]
[[package]]
name = "garmin"
version = "0.1.0"
source = { virtual = "." }
dependencies = [
{ name = "fitparse" },
{ name = "garth" },
{ name = "ipykernel" },
{ name = "jinja2" },
{ name = "matplotlib" },
{ name = "pandas" },
]
[package.metadata]
requires-dist = [
{ name = "fitparse", specifier = ">=1.2.0" },
{ name = "garth", specifier = ">=0.8.0" },
{ name = "ipykernel", specifier = ">=7.2.0" },
{ name = "jinja2", specifier = ">=3.1.6" },
{ name = "matplotlib", specifier = ">=3.10.9" },
{ name = "pandas", specifier = ">=3.0.3" },
]
[[package]]
name = "garth"
version = "0.8.0"
@@ -375,6 +421,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/fa/5e/f8e9a1d23b9c20a551a8a02ea3637b4642e22c2626e3a13a9a29cdea99eb/importlib_metadata-8.7.1-py3-none-any.whl", hash = "sha256:5a1f80bf1daa489495071efbb095d75a634cf28a8bc299581244063b53176151", size = 27865 },
]
[[package]]
name = "iniconfig"
version = "2.3.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484 },
]
[[package]]
name = "ipykernel"
version = "7.2.0"
@@ -775,6 +830,41 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/be/9c/92789c596b8df838baa98fa71844d84283302f7604ed565dafe5a6b5041a/oauthlib-3.3.1-py3-none-any.whl", hash = "sha256:88119c938d2b8fb88561af5f6ee0eec8cc8d552b7bb1f712743136eb7523b7a1", size = 160065 },
]
[[package]]
name = "openrun"
version = "0.1.0"
source = { editable = "." }
dependencies = [
{ name = "fitparse" },
{ name = "garth" },
{ name = "ipykernel" },
{ name = "jinja2" },
{ name = "matplotlib" },
{ name = "pandas" },
]
[package.dev-dependencies]
dev = [
{ name = "pytest" },
{ name = "pytest-cov" },
]
[package.metadata]
requires-dist = [
{ name = "fitparse", specifier = ">=1.2.0" },
{ name = "garth", specifier = ">=0.8.0" },
{ name = "ipykernel", specifier = ">=7.2.0" },
{ name = "jinja2", specifier = ">=3.1.6" },
{ name = "matplotlib", specifier = ">=3.10.9" },
{ name = "pandas", specifier = ">=3.0.3" },
]
[package.metadata.requires-dev]
dev = [
{ name = "pytest", specifier = ">=8.0.0" },
{ name = "pytest-cov", specifier = ">=5.0.0" },
]
[[package]]
name = "opentelemetry-api"
version = "1.40.0"
@@ -1013,6 +1103,15 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/75/a6/a0a304dc33b49145b21f4808d763822111e67d1c3a32b524a1baf947b6e1/platformdirs-4.9.6-py3-none-any.whl", hash = "sha256:e61adb1d5e5cb3441b4b7710bea7e4c12250ca49439228cc1021c00dcfac0917", size = 21348 },
]
[[package]]
name = "pluggy"
version = "1.6.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538 },
]
[[package]]
name = "prompt-toolkit"
version = "3.0.52"
@@ -1206,6 +1305,36 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/10/bd/c038d7cc38edc1aa5bf91ab8068b63d4308c66c4c8bb3cbba7dfbc049f9c/pyparsing-3.3.2-py3-none-any.whl", hash = "sha256:850ba148bd908d7e2411587e247a1e4f0327839c40e2e5e6d05a007ecc69911d", size = 122781 },
]
[[package]]
name = "pytest"
version = "9.0.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
{ name = "iniconfig" },
{ name = "packaging" },
{ name = "pluggy" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249 },
]
[[package]]
name = "pytest-cov"
version = "7.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "coverage" },
{ name = "pluggy" },
{ name = "pytest" },
]
sdist = { url = "https://files.pythonhosted.org/packages/b1/51/a849f96e117386044471c8ec2bd6cfebacda285da9525c9106aeb28da671/pytest_cov-7.1.0.tar.gz", hash = "sha256:30674f2b5f6351aa09702a9c8c364f6a01c27aae0c1366ae8016160d1efc56b2", size = 55592 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/9d/7a/d968e294073affff457b041c2be9868a40c1c71f4a35fcc1e45e5493067b/pytest_cov-7.1.0-py3-none-any.whl", hash = "sha256:a0461110b7865f9a271aa1b51e516c9a95de9d696734a2f71e3e78f46e1d4678", size = 22876 },
]
[[package]]
name = "python-dateutil"
version = "2.9.0.post0"