# openrun — endurance running analytics
A local-first, open-source endurance-training analysis tool. Garmin data in (live API or official export), SQLite on disk, an in-browser dashboard out — plus a Python API and Jupyter notebooks for power users. No accounts, no cloud, no telemetry. Your data lives in one file on your machine.
**What it gives you:** Banister CTL/ATL/TSB (fitness/fatigue/form), Pa:HR decoupling per-mile and per-second from FIT, FIT-aware HR zone time-in-zone, GPS route clustering, race-plan projection with forward CTL/ATL/TSB, off-watch activity logging, and a first-run web wizard so non-CLI users can get going in a few minutes.
```
openrun/
├── pyproject.toml              # packaged as `openrun` (src layout, hatchling)
├── openrun.toml                # your config — created by the setup wizard
├── data/garmin.db              # SQLite, created on first run
├── src/openrun/
│   ├── config.py               # UserProfile, HRZones, BanisterParams + TOML reader/writer
│   ├── db.py                   # schema + connect() helper
│   ├── model.py                # loaders + derived metrics
│   ├── plots.py                # matplotlib plot helpers
│   ├── setup.py                # idempotent workspace bootstrap + status
│   ├── ingest/
│   │   ├── auth.py             # Garmin OAuth login
│   │   ├── garmin_api.py       # Path B: incremental sync via garth
│   │   ├── garmin_export.py    # Path A: parse Connect / Takeout export
│   │   ├── fit_linker.py       # match Takeout FITs by session.start_time
│   │   ├── manual.py           # CSV → manual_activities (off-watch sessions)
│   │   └── time_in_zone.py     # per-activity TIZ cache
│   └── web/                    # Streamlit app
│       ├── app.py              # controller — st.navigation builds the sidebar
│       ├── _helpers.py         # cached loaders + sidebar chrome
│       └── pages/              # one file per page
└── examples/notebooks/         # original analysis notebooks (kept as reference)
```
## TL;DR — install and use
```bash
uv sync
uv run openrun-web # opens http://localhost:8501
```
A first-launch wizard walks you through profile + HR zones + race calendar, then either ingests a Garmin export zip or takes you to manual logging.
See [QUICKSTART.md](QUICKSTART.md) for the step-by-step.
---
## The two interfaces
### Web app (recommended for everyone)
```bash
uv run openrun-web
```
Eight pages, all wired to the same SQLite DB:
| Page | What's on it |
|---|---|
| 📊 Dashboard | CTL/ATL/TSB tiles + chart, weekly km + ACWR, polarized split, recent runs |
| 📋 Activities | Filter by type / date / distance / name → click a row to drill in |
| 🏃 Activity detail | Splits + pace-vs-HR scatter, time-in-zone bars, FIT decoupling chart |
| 📈 Race plan | Editable plan rows, projected PMC through race day, race-day TSB table |
| 📝 Manual log | Form-based logging of off-watch sessions (strength, hikes, unrecorded runs) |
| 🛌 Recovery | Sleep stages, HRV + RHR overlay, training→next-morning-HRV correlation |
| 🫀 Efficiency | Metres per heartbeat with 30-run rolling median, distance × year heatmap |
| 🔄 Sync | Status + in-browser zip ingest with live progress |
### CLI (for scripting / power users)
```bash
openrun-init # bootstrap + status dashboard
openrun-auth # Garmin OAuth login (live sync)
openrun-sync [--full] # incremental Garmin Connect sync (downloads per-second FIT for new activities)
openrun-sync --fit-backfill --fit-type running # pull per-second FITs for past runs, no website export
openrun-sync --no-fit # sync without the per-second FIT download
openrun-ingest <path|zip> # parse a Garmin export
openrun-link-fit <export> # match Takeout FITs by start time
openrun-link-fit <root> --relink # rewrite paths after moving the export
openrun-time-in-zone # populate the TIZ cache
openrun-import-manual <csv> # bulk off-watch import
openrun-web # launch the Streamlit app
```
### Python API (for notebooks / one-offs)
```python
from openrun import open_conn, load_activities, banister, daily_training_load_series
conn = open_conn()
runs = load_activities(conn, type="running")
pmc = banister(daily_training_load_series(conn, include_manual=True))
```
---
## 1 — Source data
### Two paths in, intentionally overlapping
| Path | Tool | What you get | When to use |
|---|---|---|---|
| **A. Official data export** (recommended for first load) | `openrun-ingest` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
| **B. Live API sync** | `openrun-auth` then `openrun-sync` *or the web app's Sync page* | Last N days, lap-level splits, **per-second FIT files** (downloaded via the API) | Incremental top-ups between exports |
The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.
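As a rough illustration of that conflict rule, here is a minimal SQLite sketch. The table layout and column names are assumptions made for the example, not the real `db.py:SCHEMA`; the point is only that `DO UPDATE` touches `raw` and `fetched_at` and leaves the parsed columns alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down table: the real schema has many more columns.
conn.execute(
    "CREATE TABLE activities ("
    " activity_id INTEGER PRIMARY KEY,"
    " distance_m REAL, raw TEXT, fetched_at TEXT)"
)

UPSERT = """
INSERT INTO activities (activity_id, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(activity_id) DO UPDATE SET
    raw = excluded.raw,
    fetched_at = excluded.fetched_at
"""

# First write (say, from the live API), then a re-ingest of the same activity.
conn.execute(UPSERT, (1, 10000.0, '{"src": "api"}', "2026-01-01"))
conn.execute(UPSERT, (1, 9999.0, '{"src": "export"}', "2026-02-01"))

row = conn.execute(
    "SELECT distance_m, raw, fetched_at FROM activities WHERE activity_id = 1"
).fetchone()
print(row)  # distance_m keeps the first value; raw and fetched_at are refreshed
```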
**No terminal required.** The web app's **Sync** page now logs in to Garmin (email/password, plus an MFA code when the account needs one) and runs the same pull in-process behind a **🔄 Sync now** button — with checkboxes for full backfill, per-second FIT download, and FIT backfill. Credentials go straight to Garmin; only OAuth tokens are cached in `.secrets/`. The CLI (`openrun-auth` + `openrun-sync`) does the identical thing for scripting.
**Per-second from the API.** `openrun-sync` downloads each new activity's original FIT (`/download-service/files/activity/{id}`) into `data/fit/<id>.fit` and links it in `activity_fit_files` — the same table the export-based linker writes, so decoupling, FIT-based time-in-zone, and the route map all work without a website export. Skip it with `--no-fit`. To pull per-second history for activities synced before this existed, run `openrun-sync --fit-backfill [--fit-type running] [--fit-limit N]`, then `openrun-time-in-zone` to refresh the per-second TIZ cache. Downloads are idempotent — skipped when the FIT is already on disk and linked.
### Garmin's two export formats
Garmin has two unrelated formats both called "the export":
1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `<activity_id>_<activity_name>.fit`, SI units throughout. The web app's Sync page accepts this zip directly.
2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many domains (aviation, Tacx, InReach…). Running data lives under `DI_CONNECT/`. **Different conventions:**
   - FIT filenames: `<email>_<upload_id>.fit` — upload IDs ≠ activity IDs
   - `summarizedActivities` uses scaled-integer units: distance **cm**, duration **ms**, elevation **cm**, speed **m/s × 0.1**
   - `openrun-ingest` detects and converts these; `openrun-link-fit` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename trick doesn't work.
For Takeout dumps, the full sequence is:
```bash
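mkdir -p <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit
# Loop over the parts: with a bare glob, unzip would read the extra zips as member names
for part in <export>/DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip; do
  unzip -d <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit "$part"
done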
uv run openrun-ingest <export>/
uv run openrun-link-fit <export>/
uv run openrun-time-in-zone
```
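The content-based linking step (match a FIT's `session.start_time` to an activity within ±60 s) can be sketched as below. This is not the code in `fit_linker.py`; the function name and the `(start, activity_id)` tuple layout are assumptions for illustration.

```python
from bisect import bisect_left
from datetime import datetime


def match_fit_to_activity(fit_start, activity_starts, tolerance_s=60):
    """Return the activity_id starting nearest to fit_start, or None.

    activity_starts: list of (start_datetime, activity_id), sorted by start.
    Only activities within tolerance_s seconds of fit_start qualify.
    """
    times = [t for t, _ in activity_starts]
    i = bisect_left(times, fit_start)
    best = None
    # The nearest start is either just before or just after the insertion point.
    for j in (i - 1, i):
        if 0 <= j < len(times):
            delta = abs((times[j] - fit_start).total_seconds())
            if delta <= tolerance_s and (best is None or delta < best[0]):
                best = (delta, activity_starts[j][1])
    return best[1] if best else None


activities = [
    (datetime(2025, 1, 1, 7, 0, 0), 101),
    (datetime(2025, 1, 2, 7, 0, 0), 102),
]
print(match_fit_to_activity(datetime(2025, 1, 1, 7, 0, 45), activities))
```

A FIT that starts two minutes away from every activity returns `None` and stays unlinked.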
### Off-watch activities
Recorded Garmin activity isn't the whole picture — strength work, hikes, runs you forgot to record, group runs on someone else's watch. The web app's **Manual log** writes to a separate `manual_activities` table; loaders union it into the PMC when you pass `include_manual=True` (the Dashboard does this by default).
For bulk import, drop a CSV with `activity_date,activity_type,distance_km,duration_min,training_load,notes[,external_id]` and run `uv run openrun-import-manual workouts.csv`. The optional `external_id` makes re-imports idempotent.
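One way an `external_id` can make re-imports idempotent is a unique key plus an upsert. The sketch below assumes a simplified `manual_activities` table and a hypothetical `import_manual_csv` helper; the real importer in `ingest/manual.py` may work differently.

```python
import csv
import io
import sqlite3

COLUMNS = ("activity_date", "activity_type", "distance_km", "duration_min",
           "training_load", "notes", "external_id")

# Hypothetical table definition; the UNIQUE key is what enables the upsert.
CREATE = """
CREATE TABLE manual_activities (
    id INTEGER PRIMARY KEY,
    activity_date TEXT, activity_type TEXT, distance_km REAL,
    duration_min REAL, training_load REAL, notes TEXT,
    external_id TEXT UNIQUE
)
"""

UPSERT = """
INSERT INTO manual_activities
    (activity_date, activity_type, distance_km, duration_min,
     training_load, notes, external_id)
VALUES (:activity_date, :activity_type, :distance_km, :duration_min,
        :training_load, :notes, :external_id)
ON CONFLICT(external_id) DO UPDATE SET
    activity_date = excluded.activity_date,
    activity_type = excluded.activity_type,
    distance_km = excluded.distance_km,
    duration_min = excluded.duration_min,
    training_load = excluded.training_load,
    notes = excluded.notes
"""


def import_manual_csv(conn, fileobj):
    """Upsert CSV rows keyed on external_id; rows without one are always inserted."""
    for row in csv.DictReader(fileobj):
        # Empty cells and a missing external_id column both become NULL.
        conn.execute(UPSERT, {c: (row.get(c) or None) for c in COLUMNS})
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(CREATE)
text = ("activity_date,activity_type,distance_km,duration_min,training_load,notes,external_id\n"
        "2026-01-05,strength,,45,30,gym session,s-1\n")
import_manual_csv(conn, io.StringIO(text))
import_manual_csv(conn, io.StringIO(text))  # second run updates, does not duplicate
```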
### Units worth knowing (Connect vs Takeout)
| Field | Live API / Connect | Takeout `summarizedActivities` | Convert by |
|---|---|---|---|
| distance | m | cm | × 0.01 |
| duration / movingDuration | s | ms | × 0.001 |
| elevationGain / Loss | m | cm | × 0.01 |
| averageSpeed / maxSpeed | m/s | m/s × 0.1 | × 10 |
| HR (avg, max), calories, training_effect, vo2_max | bpm / kcal / scalar | — | — |
| cadence (`averageRunCadence`) | both-legs SPM | both-legs SPM | — (do **not** double) |
| strideLength | cm | cm | × 0.01 for m |
| FIT `position_lat/long` | semicircles | semicircles | × (180 / 2³¹) for degrees |
| FIT `enhanced_speed` | m/s | m/s | — |
The cadence trap is real: `averageRunCadence` is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+, which trips any sane filter.
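The conversions in the table can be written as a small lookup. This is a sketch only: the helper names are hypothetical, and the field keys are the ones listed in the table rather than a verified dump of the Takeout JSON.

```python
SEMICIRCLE_TO_DEG = 180 / 2**31

# Multipliers that bring Takeout `summarizedActivities` fields to SI units.
TAKEOUT_SCALE = {
    "distance": 0.01,         # cm -> m
    "duration": 0.001,        # ms -> s
    "movingDuration": 0.001,  # ms -> s
    "elevationGain": 0.01,    # cm -> m
    "elevationLoss": 0.01,    # cm -> m
    "averageSpeed": 10,       # (m/s x 0.1) -> m/s
    "maxSpeed": 10,           # (m/s x 0.1) -> m/s
}


def takeout_to_si(activity):
    """Return a copy with scaled Takeout fields converted; other keys untouched."""
    return {
        key: value * TAKEOUT_SCALE[key]
        if key in TAKEOUT_SCALE and value is not None else value
        for key, value in activity.items()
    }


def semicircles_to_degrees(value):
    """Convert a FIT position_lat / position_long value to degrees."""
    return value * SEMICIRCLE_TO_DEG


converted = takeout_to_si(
    {"distance": 1_000_000, "duration": 3_000_000, "averageRunCadence": 172}
)
print(converted)  # cadence is passed through unchanged, never doubled
```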
---
## 2 — Schema
`db.py:SCHEMA` is the source of truth; what follows is the gist.
| Table | Grain | Origin | Notes |
|---|---|---|---|
| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; parsed columns are a curated subset |
| `activity_splits` | one row per lap | Path B only | Auto-laps (~1 km or 1 mi). `raw` has cadence, stride, GPS bounds |
| `activity_fit_files` | one row per linked FIT | linker | `fit_path` is stored **absolute** at link time. Move the export → run `openrun-link-fit --relink` |
| `activity_time_in_zone` | one row per activity | precomputed | `source = 'fit'` (per-second) or `'lap'` (split-average) |
| `manual_activities` | one row per logged session | web form / CSV | Parallel to `activities`, never clobbered by Garmin re-ingest |
| `race_plan` | one row per ISO-Monday week | web editor | Feeds `banister_forecast` for projected PMC |
| `daily_*` (steps / sleep / stress / hrv / body_battery / intensity_minutes / resting_hr) | one row per date | both paths | Each has a `raw` JSON column with the full payload |
| `sync_state` | key/value | both paths | last-sync / last-ingest timestamps |
The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface, unpack it from `raw`:
```python
from openrun import open_conn, load_activities, expand_raw
acts = load_activities(open_conn(), type="running")
fields = expand_raw(acts) # pd.json_normalize'd companion frame
```
---
## 3 — Metrics & how they're calculated
All loaders and helpers live in [src/openrun/model.py](src/openrun/model.py). Highlights:
### Pace
```
pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)
```
Implausible paces (sub-3 min/km or over 30 min/km — covers world-record marathon pace and walks-with-stops) get masked to NaN in `load_activities` rather than dropped, so row count is preserved.
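A minimal sketch of the mask-don't-drop idea for a single activity follows. `load_activities` presumably does this column-wise on a DataFrame; the scalar helper and its `lo` / `hi` parameters are illustrative names only.

```python
import math


def pace_min_per_km(moving_duration_s, distance_m, lo=3.0, hi=30.0):
    """Pace in min/km, or NaN when inputs are missing or the pace is implausible."""
    if not moving_duration_s or not distance_m:
        return math.nan
    pace = (moving_duration_s / 60) / (distance_m / 1000)
    # Out-of-range values become NaN so the row itself survives.
    return pace if lo <= pace <= hi else math.nan


print(pace_min_per_km(3000, 10000))  # 50 minutes for 10 km
print(pace_min_per_km(600, 10000))   # 1 min/km is implausible, so NaN
```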
### Acute:Chronic Workload Ratio (ACWR)
```
acute = sum(training_load) over the last 7 days
chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
ACWR = acute / chronic
```
Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Surfaced on the Dashboard.
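The ratio is easy to reproduce from a plain list of daily loads. The sketch below assumes one value per calendar day with rest days as 0; the function name and the `None` return for short histories are choices made for the example.

```python
def acwr(daily_load):
    """Acute:chronic workload ratio from per-day load, oldest first."""
    if len(daily_load) < 28:
        return None  # not enough history for the chronic window
    acute = sum(daily_load[-7:])
    chronic = sum(daily_load[-28:]) / 4  # weekly equivalent of the last 28 days
    return acute / chronic if chronic else None


steady = [50] * 28
ramp = [50] * 21 + [100] * 7  # load doubled in the most recent week
print(acwr(steady), acwr(ramp))
```

Note that the acute week is part of the chronic window here, as in the definition above, so a sudden ramp is partly damped by its own contribution to the denominator.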
### Aerobic efficiency — "m/beat"
```
m_per_beat = distance_m / (duration_s * avg_hr / 60)
```
Higher = more metres per heartbeat = fitter aerobic system. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. The Efficiency page does this with a 30-run rolling median plus a distance-bucket × year heatmap and an "easy runs only" headline view (HR < 75 % LTHR).
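For a feel of the numbers, here is the formula with a trailing median. Both helpers are illustrative; in particular, letting the window be shorter at the start of the series is an assumption, not necessarily how the Efficiency page handles it.

```python
from statistics import median


def m_per_beat(distance_m, duration_s, avg_hr):
    """Metres covered per heartbeat over the whole activity."""
    total_beats = duration_s * avg_hr / 60
    return distance_m / total_beats


def rolling_median(values, window=30):
    """Median of the trailing `window` values at each position."""
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


# 10 km in 50 minutes at 150 bpm is 7500 beats.
print(round(m_per_beat(10_000, 3_000, 150), 3))
print(rolling_median([1, 3, 2], window=2))
```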
### Decoupling (Pa:Hr drift)
Friel's method, two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fuelling deficit). *Negative* = improved (negative split).
**Per-mile (lap level)**, `decoupling(splits, min_splits=6)`:
```
For each activity with ≥ min_splits laps:
    halves = first / second by lap count
    eff_half = duration-weighted mean(speed_mps) / mean(heart_rate)
    decoupling_pct = (eff_first / eff_second - 1) × 100
```
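A plain-Python sketch of the lap-level calculation, for one activity. It assumes each lap is a `(duration_s, speed_mps, heart_rate)` tuple and that both speed and heart rate are duration-weighted; the real `decoupling` works on a splits frame and may weight differently.

```python
def lap_decoupling_pct(laps, min_splits=6):
    """Decoupling % between the first and second half of the laps, or None."""
    if len(laps) < min_splits:
        return None
    mid = len(laps) // 2  # split by lap count, not by time

    def efficiency(half):
        total = sum(d for d, _, _ in half)
        speed = sum(d * s for d, s, _ in half) / total
        hr = sum(d * h for d, _, h in half) / total
        return speed / hr

    return (efficiency(laps[:mid]) / efficiency(laps[mid:]) - 1) * 100


# Same pace throughout, heart rate 5 % higher in the second half.
laps = [(300, 3.0, 140)] * 3 + [(300, 3.0, 147)] * 3
print(round(lap_decoupling_pct(laps), 2))
```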
**Per-second (FIT)**, `fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:
```
records = per-second FIT messages
drop first warmup_min and last cooldown_min
drop records where speed_mps < min_speed_mps   # ignore stopped time
slice remaining moving time into N equal-time chunks
for each chunk:
    efficiency = mean(speed_mps) / mean(heart_rate)
decoupling_pct = (efficiency[0] / efficiency[i] - 1) × 100
```
Friel's thresholds (interpret on **steady aerobic** efforts only):
- **< 5 %** — aerobically developed
- **5–10 %** — sustainable
- **> 10 %** — pacing unsustainable for the distance, OR fueling shortfall, OR heat
The Activity-detail page plots this with `plot_fit_decoupling`.
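The per-second procedure can be sketched without any FIT parsing by assuming the records are already `(elapsed_s, speed_mps, heart_rate)` tuples, one per second. The name `fit_decoupling_sketch` and the choice to drop any remainder records after the last full chunk are assumptions of this example, not the behaviour of `fit_decoupling`.

```python
def fit_decoupling_sketch(records, segments=2, warmup_min=5,
                          cooldown_min=2, min_speed_mps=0.5):
    """Decoupling % of each chunk relative to the first chunk."""
    if not records:
        return []
    end = records[-1][0]
    kept = [
        (speed, hr)
        for t, speed, hr in records
        if warmup_min * 60 <= t <= end - cooldown_min * 60  # trim both ends
        and speed >= min_speed_mps                          # ignore stopped time
        and hr                                              # skip missing HR
    ]
    size = len(kept) // segments  # one record per second, so equal counts = equal time
    if size == 0:
        return []
    eff = []
    for i in range(segments):
        chunk = kept[i * size:(i + 1) * size]
        mean_speed = sum(s for s, _ in chunk) / size
        mean_hr = sum(h for _, h in chunk) / size
        eff.append(mean_speed / mean_hr)
    return [(eff[0] / e - 1) * 100 for e in eff]


# One hour at constant speed; heart rate steps from 140 to 147 at half-way.
records = [(t, 3.0, 140 if t < 1800 else 147) for t in range(3600)]
result = fit_decoupling_sketch(records, segments=2, warmup_min=0, cooldown_min=0)
print([round(x, 2) for x in result])
```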
### HR zone time-in-zone
Zone boundaries come from `[user.hr_zones]` in your `openrun.toml` (matching what Garmin's `heartRateZones.json` advertises). Two implementations; `openrun-time-in-zone` picks the better one per activity and caches it.
**From FIT (`source='fit'`)**, `time_in_zone_from_fit(records)`: per-second; large gaps (>30 s, paused recording) are clipped to 30 s.
**From splits (`source='lap'`)** — fallback when no FIT is linked: assigns each split's avg HR to a single zone. Biased toward the middle zone (smooths over within-lap variation); the FIT version is ground truth where available.
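The gap-clipping rule for the FIT path might look like the following. It is a sketch under stated assumptions: samples are `(timestamp_s, heart_rate)` pairs, zones are inclusive `(low, high)` bounds as in `openrun.toml`, and each sample is credited with the time until the next one (so the final sample gets none).

```python
def time_in_zone_sketch(samples, zones, max_gap_s=30):
    """Seconds spent in each HR zone, with recording gaps capped at max_gap_s."""
    totals = {name: 0.0 for name in zones}
    for (t0, hr), (t1, _) in zip(samples, samples[1:]):
        dt = min(t1 - t0, max_gap_s)  # a paused watch should not inflate a zone
        for name, (low, high) in zones.items():
            if low <= hr <= high:
                totals[name] += dt
                break
    return totals


zones = {"Z1": (100, 120), "Z2": (121, 140), "Z3": (141, 160),
         "Z4": (161, 180), "Z5": (181, 200)}
# The 98 s jump between t=2 and t=100 is credited as 30 s.
samples = [(0, 110), (1, 110), (2, 150), (100, 150), (101, 150)]
tiz = time_in_zone_sketch(samples, zones)
print(tiz)
```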
### Polarized split
```
easy = Z1 + Z2 (recovery + easy aerobic)
moderate = Z3 (tempo)
hard = Z4 + Z5 (threshold + VO₂)
```
Seiler's polarised target is roughly 80 % easy and 20 % harder work, with moderate (Z3) time kept under 10 %. The Dashboard shows your last-12-week split as tiles plus a stacked bar.
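Given cached time-in-zone totals, the three-way split is a simple regrouping. The helper below is illustrative and assumes a dict keyed `Z1` to `Z5` holding seconds.

```python
GROUPS = {"easy": ("Z1", "Z2"), "moderate": ("Z3",), "hard": ("Z4", "Z5")}


def polarized_split(tiz_seconds):
    """Percentage of zoned time in each intensity group."""
    total = sum(tiz_seconds.get(z, 0) for zs in GROUPS.values() for z in zs)
    if total == 0:
        return {name: 0.0 for name in GROUPS}
    return {
        name: 100 * sum(tiz_seconds.get(z, 0) for z in zs) / total
        for name, zs in GROUPS.items()
    }


split = polarized_split({"Z1": 3000, "Z2": 5000, "Z3": 500, "Z4": 1000, "Z5": 500})
print(split)
```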
### Banister fitness / fatigue / form (PMC)
The standard endurance lens (TrainingPeaks PMC), in `banister(daily_load, ctl_tau=42, atl_tau=7)`:
```
CTL_today = CTL_yesterday · exp(-1/τ_CTL) + load_today · (1 - exp(-1/τ_CTL))    τ_CTL = 42d
ATL_today = ATL_yesterday · exp(-1/τ_ATL) + load_today · (1 - exp(-1/τ_ATL))    τ_ATL = 7d
TSB_today = CTL_yesterday - ATL_yesterday    # yesterday's values (TP convention)
```
`daily_training_load_series(conn, *, include_manual=True)` is the canonical input — it unions Garmin-recorded TL with `manual_activities`. Rest days are filled with 0; both EWMAs still update.
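The recursion is short enough to write out in plain Python. This is a sketch, not the body of `banister`: it takes a list instead of a series and starts CTL and ATL at 0, which is an assumption about the seed values.

```python
import math


def banister_sketch(daily_load, ctl_tau=42, atl_tau=7):
    """Return (ctl, atl, tsb) per day for per-day load, oldest first."""
    k_ctl = math.exp(-1 / ctl_tau)
    k_atl = math.exp(-1 / atl_tau)
    ctl = atl = 0.0  # assumed starting point
    out = []
    for load in daily_load:
        tsb = ctl - atl  # form uses yesterday's fitness and fatigue
        ctl = ctl * k_ctl + load * (1 - k_ctl)
        atl = atl * k_atl + load * (1 - k_atl)
        out.append((ctl, atl, tsb))
    return out


pmc = banister_sketch([100] * 400)
print(tuple(round(v, 1) for v in pmc[-1]))
```

With a constant daily load both averages converge on that load and TSB settles near zero; early on, ATL rises faster than CTL because of its shorter time constant, so TSB goes negative during a build.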
TSB interpretation (used across the app):
| TSB | meaning |
|---|---|
| < −30 | severely fatigued — injury risk |
| −10 to −30 | productive overload — heart of a build |
| −10 to 0 | balanced building |
| 0 to +10 | sharpening |
| **+10 to +25** | **fresh / peaked — race-day target** |
| > +25 | detrained (taper too long) |
`banister_forecast(history, future, *, today=None)` splices historical load with planned future load and runs the same recursion forward, so the Race-plan page can project CTL/ATL/TSB through race day. The planned-future series comes from `plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates)`, which converts your weekly plan rows into per-day load using a long-run-Saturday-heavy distribution (Sat 40 %, Tue 20 %, Thu 20 %, Mon 10 %, Wed 10 %), with race weeks treating race day specially.
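To make the weekday distribution concrete, here is a sketch for a single non-race week. `week_to_daily_load` is a hypothetical helper; it leaves out the race-day handling that `plan_to_daily_load` is described as doing.

```python
from datetime import date, timedelta

# Share of weekly distance per weekday (Mon=0 ... Sun=6); Fri and Sun are rest.
WEEKDAY_SHARE = {0: 0.10, 1: 0.20, 2: 0.10, 3: 0.20, 5: 0.40}


def week_to_daily_load(week_start, weekly_km, tl_per_km):
    """Spread one plan week, starting on an ISO Monday, into per-day load."""
    if week_start.weekday() != 0:
        raise ValueError("week_start must be a Monday")
    return {
        week_start + timedelta(days=d): weekly_km * WEEKDAY_SHARE.get(d, 0.0) * tl_per_km
        for d in range(7)
    }


week = week_to_daily_load(date(2026, 6, 8), weekly_km=50, tl_per_km=6.0)
for day, load in week.items():
    print(day.strftime("%a"), round(load, 1))
```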
`calibrate_tl_per_km(conn)` reports the empirical median + IQR of `training_load / km` from your history — that's the number to feed into the plan, rather than a hard-coded constant.
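A rough stand-in for that calibration, using only the standard library. The helper name and the `(training_load, distance_km)` input shape are assumptions, and `statistics.quantiles` uses its default "exclusive" method, which can differ slightly from what a pandas-based implementation reports.

```python
from statistics import median, quantiles


def tl_per_km_summary(runs):
    """Return (median, q1, q3) of training_load / km over the given runs."""
    ratios = [tl / km for tl, km in runs if tl and km]
    q1, _, q3 = quantiles(ratios, n=4)  # needs at least two usable runs
    return median(ratios), q1, q3


history = [(40, 10), (50, 10), (60, 10), (70, 10), (80, 10), (0, 5)]
mid, q1, q3 = tl_per_km_summary(history)
print(mid, q1, q3)
```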
### Personal records
`personal_records(activities, distance_bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05)` picks the fastest run within ±5 % of each bin distance.
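The bin-and-tolerance selection can be sketched as follows. The input shape is hypothetical, and "fastest" is taken here to mean lowest average pace, since a slightly short run would otherwise win on raw duration.

```python
def personal_records_sketch(runs, distance_bins_km=(5, 10, 21.0975, 42.195, 50),
                            tolerance=0.05):
    """runs: list of (distance_km, duration_s). Returns {bin_km: best_run}."""
    records = {}
    for target in distance_bins_km:
        candidates = [r for r in runs if abs(r[0] - target) <= tolerance * target]
        if candidates:
            # Lowest seconds-per-km wins within the bin.
            records[target] = min(candidates, key=lambda r: r[1] / r[0])
    return records


runs = [(5.1, 1500), (4.9, 1400), (10.0, 3100), (7.0, 2000)]
prs = personal_records_sketch(runs)
print(prs)
```

The 7 km run falls outside every bin's ±5 % window, so it never appears as a record.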
### GPS route clustering
`cluster_routes(lats, lons, radius_km=0.25)` — greedy haversine-radius clustering of run starts. Per-route pace trends control for terrain (the same loop in 2023 vs 2025 says more about fitness than raw monthly pace). Adequate for hundreds of starts; switching to DBSCAN for thousands of starts is still an open TODO.
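A minimal version of greedy haversine-radius clustering is shown below. It is one plausible reading of "greedy" (each start joins the first existing cluster whose seed point is within the radius, otherwise it seeds a new cluster); `cluster_routes` itself may pick seeds or break ties differently.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_starts(lats, lons, radius_km=0.25):
    """Return one integer cluster label per start point."""
    seeds, labels = [], []
    for lat, lon in zip(lats, lons):
        for idx, (slat, slon) in enumerate(seeds):
            if haversine_km(lat, lon, slat, slon) <= radius_km:
                labels.append(idx)
                break
        else:
            # No seed close enough: this start becomes a new cluster.
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels


labels = cluster_starts([40.0, 40.001, 41.0, 40.0005], [-74.0] * 4)
print(labels)
```

Each point is compared against every seed, so the cost grows with points times clusters, which is fine at the scale of a few hundred starts.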
---
## 4 — Reference notebooks (kept around for power-user analysis)
Each notebook under [examples/notebooks/](examples/notebooks/) bootstraps the same way:
```python
from openrun import open_conn, load_activities  # plus whichever helpers the notebook uses
conn = open_conn()
```
| # | Notebook | Covers | Web equivalent |
|---|---|---|---|
| 01 | Overview | Data coverage, recent activities | Dashboard |
| 02 | Running | Weekly volume, pace-HR, PMC, PRs | Dashboard + Activities |
| 03 | Recovery | Sleep composition, HRV, RHR, body battery | Recovery |
| 04 | Efficiency | YoY m/beat | Efficiency |
| 05 | Intra-run | Decoupling, cadence, route clusters, TIZ | Activity detail |
| 06 | Race plan | Periodised plan + projected PMC | Race plan |
Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts — edit those, then `uv run python examples/notebooks/_build_NN.py` regenerates the .ipynb. They're kept as a reference for the math the web app implements.
---
## 5 — Setup
```bash
uv sync # creates .venv from pyproject.toml
uv run openrun-web # launch the web app
```
The first time you open the app, a setup wizard collects your profile + HR zones, optionally takes a race calendar, and offers to ingest a Garmin export zip in-browser. Your config is written to `openrun.toml` in the current directory — edit by hand any time.
If you'd rather work from the CLI directly:
```bash
uv run openrun-init # bootstrap (no input required)
uv run openrun-ingest <export> # Path A: parse an export; or use Path B below
uv run openrun-auth # OAuth login (Path B)
uv run openrun-sync --full # 365-day backfill
```
`garth` (the third-party library the live sync is built on) is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If the live sync ever stops working, fall back to Path A.
---
## 6 — Configuration
[openrun.toml](openrun.toml) — created by the setup wizard, hand-editable.
```toml
[user]
name = "Athlete"
hr_max = 200
lthr = 175
resting_hr = 50
[user.hr_zones]
Z1 = [100, 120]
Z2 = [121, 140]
Z3 = [141, 160]
Z4 = [161, 180]
Z5 = [181, 200]
[db]
path = "data/garmin.db"
[banister]
ctl_tau_days = 42.0
atl_tau_days = 7.0
race_day_tl_per_km = 7.0
[[races]]
label = "wk 4 — 30K"
date = "2026-06-13"
```
Discovery walks up from cwd looking for `openrun.toml`, falls back to `$OPENRUN_CONFIG`, then `~/.config/openrun/config.toml`, then built-in defaults.
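That lookup order can be sketched with `pathlib`. The function below is illustrative rather than the code in `config.py`; its `start`, `env`, and `home` parameters exist only to make the example testable, and returning `None` stands in for "use built-in defaults".

```python
import os
from pathlib import Path


def find_config(start=None, env=None, home=None):
    """Return the first config file found, or None for built-in defaults."""
    env = os.environ if env is None else env
    start = Path(start or Path.cwd()).resolve()
    # 1. Walk up from the starting directory looking for openrun.toml.
    for folder in (start, *start.parents):
        candidate = folder / "openrun.toml"
        if candidate.is_file():
            return candidate
    # 2. Explicit override via the environment.
    override = env.get("OPENRUN_CONFIG")
    if override and Path(override).is_file():
        return Path(override)
    # 3. Per-user config file.
    fallback = Path(home or Path.home()) / ".config" / "openrun" / "config.toml"
    return fallback if fallback.is_file() else None


print(find_config())
```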
---
## 7 — Tests
```bash
uv run pytest
```
[tests/unit/](tests/unit/) covers the pure-function surface (loaders, derivations, helpers). [tests/integration/](tests/integration/) covers the ingest paths and cross-cutting behaviour (the `tmp_conn` fixture in [tests/conftest.py](tests/conftest.py) builds an in-memory schema). The Streamlit pages are smoke-tested via `streamlit.testing.v1.AppTest`; see the test invocations in [ROADMAP.md](ROADMAP.md).