# openrun — endurance running analytics
A local-first, open-source endurance-training analysis tool. Garmin data in (live API or official export), SQLite on disk, an in-browser dashboard out — plus a Python API and Jupyter notebooks for power users. No accounts, no cloud, no telemetry. Your data lives in one file on your machine.

**What it gives you:** Banister CTL/ATL/TSB (fitness/fatigue/form), Pa:HR decoupling per-mile and per-second from FIT, FIT-aware HR zone time-in-zone, GPS route clustering, race-plan projection with forward CTL/ATL/TSB, off-watch activity logging, and a first-run web wizard so non-CLI users can get going in a few minutes.
```
openrun/
├── pyproject.toml # packaged as `openrun` (src layout, hatchling)
├── openrun.toml # your config — created by the setup wizard
├── data/garmin.db # SQLite, created on first run
├── src/openrun/
│ ├── config.py # UserProfile, HRZones, BanisterParams + TOML reader/writer
│ ├── db.py # schema + connect() helper
│ ├── model.py # loaders + derived metrics
│ ├── plots.py # matplotlib plot helpers
│ ├── setup.py # idempotent workspace bootstrap + status
│ ├── ingest/
│ │ ├── auth.py # Garmin OAuth login
│ │ ├── garmin_api.py # Path B: incremental sync via garth
│ │ ├── garmin_export.py # Path A: parse Connect / Takeout export
│ │ ├── fit_linker.py # match Takeout FITs by session.start_time
│ │ ├── manual.py # CSV → manual_activities (off-watch sessions)
│ │ └── time_in_zone.py # per-activity TIZ cache
│ └── web/ # Streamlit app
│ ├── app.py # controller — st.navigation builds the sidebar
│ ├── _helpers.py # cached loaders + sidebar chrome
│ └── pages/ # one file per page
└── examples/notebooks/ # original analysis notebooks (kept as reference)
```
## TL;DR — install and use
```bash
uv sync
uv run openrun-web # opens http://localhost:8501
```
A first-launch wizard walks you through profile + HR zones + race calendar, then either ingests a Garmin export zip or takes you to manual logging.

See [QUICKSTART.md](QUICKSTART.md) for the step-by-step.

---
## The three interfaces
### Web app (recommended for everyone)
```bash
uv run openrun-web
```
Eight pages, all wired to the same SQLite DB:

| Page | What's on it |
|---|---|
| 📊 Dashboard | CTL/ATL/TSB tiles + chart, weekly km + ACWR, polarized split, recent runs |
| 📋 Activities | Filter by type / date / distance / name → click a row to drill in |
| 🏃 Activity detail | Splits + pace-vs-HR scatter, time-in-zone bars, FIT decoupling chart |
| 📈 Race plan | Editable plan rows, projected PMC through race day, race-day TSB table |
| 📝 Manual log | Form-based logging of off-watch sessions (strength, hikes, unrecorded runs) |
| 🛌 Recovery | Sleep stages, HRV + RHR overlay, training→next-morning-HRV correlation |
| 🫀 Efficiency | Metres per heartbeat with 30-run rolling median, distance × year heatmap |
| 🔄 Sync | Status + in-browser zip ingest with live progress |
### CLI (for scripting / power users)
```bash
openrun-init # bootstrap + status dashboard
openrun-auth # Garmin OAuth login (live sync)
openrun-sync [--full] # incremental Garmin Connect sync (downloads per-second FIT for new activities)
openrun-sync --fit-backfill --fit-type running # pull per-second FITs for past runs, no website export
openrun-sync --no-fit # sync without the per-second FIT download
openrun-ingest <path|zip> # parse a Garmin export
openrun-link-fit <export> # match Takeout FITs by start time
openrun-link-fit <root> --relink # rewrite paths after moving the export
openrun-time-in-zone # populate the TIZ cache
openrun-import-manual <csv> # bulk off-watch import
openrun-web # launch the Streamlit app
```
### Python API (for notebooks / one-offs)
```python
from openrun import open_conn, load_activities, banister, daily_training_load_series
conn = open_conn()
runs = load_activities(conn, type="running")
pmc = banister(daily_training_load_series(conn, include_manual=True))
```
---
## 1 — Source data
### Two paths in, intentionally overlapping
| Path | Tool | What you get | When to use |
|---|---|---|---|
| **A. Official data export** (recommended for first load) | `openrun-ingest` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
| **B. Live API sync** | `openrun-auth` then `openrun-sync`, *or the web app's Sync page* | Last N days, lap-level splits, **per-second FIT files** (downloaded via the API) | Incremental top-ups between exports |

The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.

**No terminal required.** The web app's **Sync** page logs in to Garmin (email/password, plus an MFA code when the account needs one) and runs the same pull in-process behind a **🔄 Sync now** button, with checkboxes for full backfill, per-second FIT download, and FIT backfill. Credentials go straight to Garmin; only OAuth tokens are cached in `.secrets/`. The CLI (`openrun-auth` + `openrun-sync`) does the identical thing for scripting.

**Per-second from the API.** `openrun-sync` downloads each new activity's original FIT (`/download-service/files/activity/{id}`) into `data/fit/<id>.fit` and links it in `activity_fit_files` — the same table the export-based linker writes, so decoupling, FIT-based time-in-zone, and the route map all work without a website export. Skip it with `--no-fit`. To pull per-second history for activities synced before this existed, run `openrun-sync --fit-backfill [--fit-type running] [--fit-limit N]`, then `openrun-time-in-zone` to refresh the per-second TIZ cache. Downloads are idempotent — skipped when the FIT is already on disk and linked.
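The conflict rule for the two ingest paths (only the raw JSON and `fetched_at` are overwritten) maps directly onto SQLite's upsert syntax (`INSERT ... ON CONFLICT ... DO UPDATE`, available since SQLite 3.24). The sketch below is a minimal illustration under assumptions: the three-column `activities` table and the `upsert_activity` helper are hypothetical stand-ins, not `db.py:SCHEMA` or openrun's actual ingest code.

```python
import json
import sqlite3

# Hypothetical, trimmed-down table; the real schema lives in db.py:SCHEMA.
DDL = """
CREATE TABLE activities (
    activity_id INTEGER PRIMARY KEY,
    avg_hr      REAL,   -- a parsed column that a re-ingest must not clobber
    raw         TEXT,   -- full upstream JSON
    fetched_at  TEXT
)
"""

UPSERT = """
INSERT INTO activities (activity_id, avg_hr, raw, fetched_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(activity_id) DO UPDATE SET
    raw = excluded.raw,
    fetched_at = excluded.fetched_at
"""


def upsert_activity(conn, activity_id, avg_hr, payload, fetched_at):
    """Insert a new row, or refresh only raw + fetched_at on an existing one."""
    conn.execute(UPSERT, (activity_id, avg_hr, json.dumps(payload), fetched_at))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
upsert_activity(conn, 1, 150.0, {"src": "api"}, "2026-06-01")     # Path B first
upsert_activity(conn, 1, 149.0, {"src": "export"}, "2026-06-02")  # Path A re-ingest
row = conn.execute("SELECT avg_hr, raw, fetched_at FROM activities").fetchone()
# avg_hr keeps the Path B value; raw and fetched_at come from the re-ingest.
```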
### Garmin's two export formats
Garmin has two unrelated formats both called "the export":

1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `<activity_id>_<activity_name>.fit`, SI units throughout. The web app's Sync page accepts this zip directly.
2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many domains (aviation, Tacx, InReach…). Running data lives under `DI_CONNECT/`. **Different conventions:**
   - FIT filenames: `<email>_<upload_id>.fit` — upload IDs ≠ activity IDs
   - `summarizedActivities` uses scaled-integer units: distance **cm**, duration **ms**, elevation **cm**, speed **m/s × 0.1**
   - `openrun-ingest` detects and converts these; `openrun-link-fit` links FITs by content instead (it parses `session.start_time` and matches activities within ±60 s), because the filename carries an upload ID that can't be mapped to an activity ID.

For Takeout dumps, the full sequence is:
```bash
mkdir -p <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit
# unzip treats extra arguments as member names, so extract each part separately
for part in <export>/DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip; do
  unzip -d <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit "$part"
done

uv run openrun-ingest <export>/
uv run openrun-link-fit <export>/
uv run openrun-time-in-zone
```
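The two Takeout-specific steps can be sketched with the standard library: rescaling a `summarizedActivities` record to SI with the factors from the units table, and pairing each FIT's `session.start_time` with the nearest activity start within ±60 s. This is a minimal sketch under assumptions, not the code in `garmin_export.py` or `fit_linker.py`: the field names follow the units table, the inputs are plain dicts of already-parsed timestamps, and `takeout_to_si` / `link_fits` are hypothetical names.

```python
import bisect
from datetime import datetime, timedelta, timezone

# Multipliers from the units table: Takeout summarizedActivities -> SI.
# Field names follow that table; the exact Takeout keys are an assumption.
TAKEOUT_TO_SI = {
    "distance": 0.01,         # cm -> m
    "duration": 0.001,        # ms -> s
    "movingDuration": 0.001,  # ms -> s
    "elevationGain": 0.01,    # cm -> m
    "elevationLoss": 0.01,    # cm -> m
    "averageSpeed": 10.0,     # (m/s x 0.1) -> m/s
    "maxSpeed": 10.0,         # (m/s x 0.1) -> m/s
}


def takeout_to_si(record):
    """Return a copy of one Takeout activity dict with scaled fields in SI units."""
    out = dict(record)
    for field, factor in TAKEOUT_TO_SI.items():
        if out.get(field) is not None:
            out[field] = out[field] * factor
    return out


def link_fits(fit_starts, activity_starts, tolerance_s=60):
    """Map each FIT file to the activity whose start time is nearest, within tolerance.

    fit_starts: {fit_path: session.start_time as an aware UTC datetime}
    activity_starts: {activity_id: start time as an aware UTC datetime}
    """
    ordered = sorted(activity_starts.items(), key=lambda kv: kv[1])
    times = [start for _, start in ordered]
    links = {}
    for path, fit_start in fit_starts.items():
        i = bisect.bisect_left(times, fit_start)
        # The nearest activity start sits just before or just after the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs((times[j] - fit_start).total_seconds()))
        if abs((times[best] - fit_start).total_seconds()) <= tolerance_s:
            links[path] = ordered[best][0]
    return links


t0 = datetime(2025, 3, 1, 7, 0, tzinfo=timezone.utc)
acts = {101: t0, 102: t0 + timedelta(hours=3)}
fits = {"a.fit": t0 + timedelta(seconds=20), "b.fit": t0 + timedelta(hours=1)}
links = link_fits(fits, acts)  # only a.fit links; b.fit is an hour from any activity
```

Binary search keeps each lookup logarithmic in the number of activities, which matters when a Takeout dump holds thousands of uploads.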
### Off-watch activities
Recorded Garmin activity isn't the whole picture — strength work, hikes, runs you forgot to record, group runs on someone else's watch. The web app's **Manual log** writes to a separate `manual_activities` table; loaders union it into the PMC when you pass `include_manual=True` (the Dashboard does this by default).

For bulk import, drop a CSV with `activity_date,activity_type,distance_km,duration_min,training_load,notes[,external_id]` and run `uv run openrun-import-manual workouts.csv`. The optional `external_id` makes re-imports idempotent.
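A minimal sketch of producing such a file with Python's `csv` module. The rows are made-up examples, and the accepted `activity_type` values and the ISO `YYYY-MM-DD` date format are assumptions to check against `ingest/manual.py`.

```python
import csv
import io

FIELDS = ["activity_date", "activity_type", "distance_km", "duration_min",
          "training_load", "notes", "external_id"]

# Illustrative rows only; activity_type values and the date format are assumptions.
rows = [
    {"activity_date": "2026-05-02", "activity_type": "strength",
     "distance_km": "", "duration_min": 45, "training_load": 40,
     "notes": "gym, legs", "external_id": "gym-2026-05-02"},
    {"activity_date": "2026-05-03", "activity_type": "running",
     "distance_km": 12.0, "duration_min": 70, "training_load": 95,
     "notes": "group run, friend's watch", "external_id": "grp-2026-05-03"},
]

buf = io.StringIO()  # use open("workouts.csv", "w", newline="") to write a real file
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)  # notes containing commas are quoted automatically
print(buf.getvalue())
```

Giving each session a stable `external_id` (here derived from the date) is what lets a second run of `openrun-import-manual` recognise rows it has already imported.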
### Units worth knowing (Connect vs Takeout)
| Field | Live API / Connect | Takeout `summarizedActivities` | Convert by |
|---|---|---|---|
| distance | m | cm | × 0.01 |
| duration / movingDuration | s | ms | × 0.001 |
| elevationGain / Loss | m | cm | × 0.01 |
| averageSpeed / maxSpeed | m/s | m/s × 0.1 | × 10 |
| HR (avg, max), calories, training_effect, vo2_max | bpm / kcal / scalar | — | — |
| cadence (`averageRunCadence`) | both-legs SPM | both-legs SPM | — (do **not** double) |
| strideLength | cm | cm | × 0.01 for m |
| FIT `position_lat/long` | semicircles | semicircles | × (180 / 2³¹) for degrees |
| FIT `enhanced_speed` | m/s | m/s | — |

The cadence trap is real: `averageRunCadence` is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+, which trips any sane filter.

---
## 2 — Schema
`db.py:SCHEMA` is the source of truth; what follows is the gist.
| Table | Grain | Origin | Notes |
|---|---|---|---|
| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; parsed columns are a curated subset |
| `activity_splits` | one row per lap | Path B only | Auto-laps (~1 km or 1 mi). `raw` has cadence, stride, GPS bounds |
| `activity_fit_files` | one row per linked FIT | linker | `fit_path` is stored **absolute** at link time. Move the export → run `openrun-link-fit --relink` |
| `activity_time_in_zone` | one row per activity | precomputed | `source = 'fit'` (per-second) or `'lap'` (split-average) |
| `manual_activities` | one row per logged session | web form / CSV | Parallel to `activities`, never clobbered by Garmin re-ingest |
| `race_plan` | one row per ISO-Monday week | web editor | Feeds `banister_forecast` for projected PMC |
| `daily_*` (steps / sleep / stress / hrv / body_battery / intensity_minutes / resting_hr) | one row per date | both paths | Each has a `raw` JSON column with the full payload |
| `sync_state` | key/value | both paths | last-sync / last-ingest timestamps |

On the Garmin-sourced tables, the `raw` column holds the full upstream JSON. If you need a field the schema doesn't surface, unpack it from `raw`:
```python
from openrun import open_conn, load_activities, expand_raw
acts = load_activities(open_conn(), type="running")
fields = expand_raw(acts) # pd.json_normalize'd companion frame
```
---
## 3 — Metrics & how they're calculated
All loaders and helpers live in [src/openrun/model.py](src/openrun/model.py). Highlights:
### Pace
```
pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)
```
Implausible paces (faster than 3 min/km or slower than 30 min/km; the bounds sit roughly at world-record marathon pace on the fast end and walks-with-stops on the slow end) are masked to NaN in `load_activities` rather than dropped, so the row count is preserved.
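A minimal sketch of the pace formula with the masking rule. The 3 and 30 min/km bounds come from the text; treating them as inclusive, and the `pace_min_per_km` helper itself, are assumptions rather than the `load_activities` implementation.

```python
import math


def pace_min_per_km(moving_duration_s, distance_m, fastest=3.0, slowest=30.0):
    """Pace in min/km, or NaN when inputs are missing or the pace is implausible."""
    if not distance_m or moving_duration_s is None:
        return math.nan
    pace = (moving_duration_s / 60) / (distance_m / 1000)
    # Mask instead of dropping, so callers keep exactly one value per activity.
    return pace if fastest <= pace <= slowest else math.nan


# 3000 s over 10 km is 5.0 min/km; 600 s over 5 km is 2.0 min/km (masked);
# a zero-distance activity also comes back as NaN.
paces = [pace_min_per_km(s, d) for s, d in [(3000, 10_000), (600, 5_000), (3600, 0)]]
```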
### Acute:Chronic Workload Ratio (ACWR)
```
acute = sum(training_load) over the last 7 days
chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
ACWR = acute / chronic
```
Sweet spot ~0.8–1.3; > 1.5 = aggressive ramp / injury risk. Surfaced on the Dashboard.
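The ACWR definition above reduces to a few lines once you have a gap-free daily series. This is a minimal sketch assuming one load value per calendar day (rest days as 0, oldest first); `acwr` is a hypothetical helper name, not part of the openrun API.

```python
def acwr(daily_load):
    """Acute:chronic workload ratio for the last day of a daily load series.

    daily_load: one training-load value per calendar day, oldest first,
    with rest days as 0. Needs at least 28 days of history.
    """
    if len(daily_load) < 28:
        raise ValueError("need at least 28 days of history")
    acute = sum(daily_load[-7:])
    chronic = sum(daily_load[-28:]) / 4  # 28-day load expressed per week
    return acute / chronic if chronic else float("nan")


steady = [50.0] * 28              # flat load
ramp = [50.0] * 21 + [100.0] * 7  # last week doubled
print(acwr(steady), acwr(ramp))   # prints: 1.0 1.6
```

Doubling one week after three steady ones already lands at 1.6, past the 1.5 line, which is why the ratio reacts quickly to sudden jumps.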
### Aerobic efficiency — "m/beat"
```
m_per_beat = distance_m / (duration_s * avg_hr / 60)
```
Higher = more metres per heartbeat = fitter aerobic system. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. The Efficiency page does this with a 30-run rolling median plus a distance-bucket × year heatmap and an "easy runs only" headline view (HR < 75 % LTHR).
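A minimal sketch of the metric, the trailing median, and the easy-run filter described above. The helper names are hypothetical, and letting the median window grow from one run at the start of the history (rather than waiting for 30 runs) is an assumption about how the Efficiency page handles the warm-up period.

```python
import statistics


def m_per_beat(distance_m, duration_s, avg_hr):
    """Metres per heartbeat: distance divided by total beats (minutes x bpm)."""
    beats = duration_s * avg_hr / 60
    return distance_m / beats


def rolling_median(values, window=30):
    """Trailing median over the last `window` values, growing from 1 at the start."""
    return [statistics.median(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]


def is_easy(avg_hr, lthr):
    """Easy-run filter for the headline view: average HR below 75 % of LTHR."""
    return avg_hr < 0.75 * lthr


# 10 km in 50 min at 150 bpm is 7,500 beats, i.e. about 1.33 m/beat.
print(round(m_per_beat(10_000, 3000, 150), 2))  # prints: 1.33
```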
### Decoupling (Pa:HR drift)
Friel's method, two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fuelling deficit). *Negative* = improved (negative split).

**Per-mile (lap level)** — `decoupling(splits, min_splits=6)`:
```
For each activity with ≥ min_splits laps:
    halves = first / second by lap count
    eff_half = duration-weighted mean(speed_mps) / mean(heart_rate)
    decoupling_pct = (eff_first / eff_second − 1) × 100
```
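The lap-level pseudocode translates into a short function. This is a minimal sketch, not openrun's `decoupling`: the `(duration_s, speed_mps, heart_rate)` tuple layout, weighting HR by lap duration as well as speed, and giving the extra lap of an odd count to the second half are all assumptions, and `decoupling_from_laps` is a hypothetical name.

```python
def decoupling_from_laps(laps, min_splits=6):
    """Pa:HR decoupling (%) from lap splits, or None when there are too few laps.

    laps: list of (duration_s, speed_mps, heart_rate) tuples in lap order.
    Both speed and HR are duration-weighted within each half (an assumption).
    """
    if len(laps) < min_splits:
        return None
    mid = len(laps) // 2  # an odd extra lap lands in the second half

    def efficiency(half):
        total = sum(d for d, _, _ in half)
        speed = sum(d * v for d, v, _ in half) / total
        hr = sum(d * h for d, _, h in half) / total
        return speed / hr

    first, second = efficiency(laps[:mid]), efficiency(laps[mid:])
    return (first / second - 1) * 100


# Same pace throughout, HR up 5 % in the second half: about +5 % decoupling.
laps = [(300, 3.0, 140)] * 3 + [(300, 3.0, 147)] * 3
print(round(decoupling_from_laps(laps), 1))  # prints: 5.0
```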
**Per-second (FIT)** — `fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:
```
records = per-second FIT messages
drop first warmup_min and last cooldown_min
drop records where speed_mps < min_speed_mps # ignore stopped time
slice remaining moving time into N equal-time chunks
for each chunk i:
    efficiency[i] = mean(speed_mps) / mean(heart_rate)
    decoupling_pct[i] = (efficiency[0] / efficiency[i] − 1) × 100
```
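A runnable version of the per-second steps, offered as a sketch rather than openrun's `fit_decoupling`. It assumes records arrive as `(elapsed_s, speed_mps, heart_rate)` tuples at roughly 1 Hz, so equal record counts stand in for equal moving time; the name `fit_decoupling_sketch` is hypothetical.

```python
def fit_decoupling_sketch(records, segments, warmup_min=5, cooldown_min=2,
                          min_speed_mps=0.5):
    """Per-chunk Pa:HR decoupling (%) relative to the first chunk.

    records: (elapsed_s, speed_mps, heart_rate) per FIT record, assumed ~1 Hz,
    so equal record counts approximate equal moving time.
    """
    if not records or segments < 1:
        return []
    start, end = records[0][0], records[-1][0]
    kept = [(v, h) for t, v, h in records
            if start + warmup_min * 60 <= t <= end - cooldown_min * 60
            and v is not None and v >= min_speed_mps  # ignore stopped time
            and h]                                     # ignore missing HR
    if len(kept) < segments:
        return []
    size = len(kept) // segments
    eff = []
    for k in range(segments):
        # The last chunk absorbs any remainder records.
        chunk = kept[k * size:] if k == segments - 1 else kept[k * size:(k + 1) * size]
        mean_speed = sum(v for v, _ in chunk) / len(chunk)
        mean_hr = sum(h for _, h in chunk) / len(chunk)
        eff.append(mean_speed / mean_hr)
    return [(eff[0] / e - 1) * 100 for e in eff]


# One hour at constant speed with HR stepping from 140 to 147 halfway through.
recs = [(t, 3.0, 140 if t < 1800 else 147) for t in range(3600)]
out = fit_decoupling_sketch(recs, segments=2, warmup_min=0, cooldown_min=0)
```

With two segments and no trimming, the second chunk shows roughly +5 %, matching the lap-level figure for the same drift.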
Friel's thresholds (interpret on **steady aerobic** efforts only):

- **< 5 %** — aerobically developed
- **5–10 %** — sustainable
- **> 10 %** — pacing unsustainable for the distance, OR fuelling shortfall, OR heat

The Activity-detail page plots this with `plot_fit_decoupling`.
### HR zone time-in-zone
Zone boundaries come from `[user.hr_zones]` in your `openrun.toml` (matching what Garmin's `heartRateZones.json` advertises). There are two implementations; `openrun-time-in-zone` uses the FIT-based one when a FIT is linked, falls back to splits otherwise, and caches the result per activity.

**From FIT (`source='fit'`)** — `time_in_zone_from_fit(records)`: per-second; large gaps (>30 s, paused recording) are clipped to 30 s.

**From splits (`source='lap'`)** — fallback when no FIT is linked: assigns each split's avg HR to a single zone. Biased toward the middle zone (smooths over within-lap variation); the FIT version is ground truth where available.
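The per-second method with gap clipping can be sketched as follows. Crediting each sample with the time until the next one (so the final sample contributes nothing) and the `(timestamp_s, heart_rate)` input layout are assumptions; `time_in_zone_sketch` is a hypothetical name, not `time_in_zone_from_fit`.

```python
def time_in_zone_sketch(records, zones, max_gap_s=30):
    """Seconds per HR zone from time-ordered (timestamp_s, heart_rate) samples.

    zones: {"Z1": (low, high), ...} with inclusive bpm bounds, as in [user.hr_zones].
    Each sample is credited with the time until the next one, capped at
    max_gap_s so a paused recording does not inflate a zone.
    """
    totals = {name: 0.0 for name in zones}
    for (t, hr), (t_next, _) in zip(records, records[1:]):
        if hr is None:
            continue
        dt = min(t_next - t, max_gap_s)
        for name, (low, high) in zones.items():
            if low <= hr <= high:
                totals[name] += dt
                break
    return totals


zones = {"Z1": (100, 120), "Z2": (121, 140)}
samples = [(0, 110), (10, 130), (20, 130), (200, 115), (201, 115)]
tiz = time_in_zone_sketch(samples, zones)  # the 180 s pause counts as only 30 s
```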
### Polarized split
```
easy = Z1 + Z2 (recovery + easy aerobic)
moderate = Z3 (tempo)
hard = Z4 + Z5 (threshold + VO₂)
```
Seiler's polarised target is ~80 % easy, < 10 % moderate, ~20 % hard. The Dashboard shows your last-12-week split as tiles plus a stacked bar.
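Given total seconds per zone over the window, the three-way split is a straightforward grouping. This minimal sketch assumes the 12 weeks of cached time-in-zone rows have already been summed into one `{"Z1": seconds, ...}` dict (that aggregation is left out); `polarized_split` is a hypothetical name.

```python
def polarized_split(zone_seconds):
    """Fractions of time easy (Z1+Z2), moderate (Z3) and hard (Z4+Z5)."""
    easy = zone_seconds.get("Z1", 0) + zone_seconds.get("Z2", 0)
    moderate = zone_seconds.get("Z3", 0)
    hard = zone_seconds.get("Z4", 0) + zone_seconds.get("Z5", 0)
    total = easy + moderate + hard
    if total == 0:
        return {"easy": 0.0, "moderate": 0.0, "hard": 0.0}
    return {"easy": easy / total, "moderate": moderate / total, "hard": hard / total}


# 2.5 h of logged zone time: 80 % easy, ~7 % moderate, ~13 % hard.
split = polarized_split({"Z1": 3000, "Z2": 4200, "Z3": 600, "Z4": 900, "Z5": 300})
```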
### Banister fitness / fatigue / form (PMC)
The standard endurance lens (TrainingPeaks PMC), in `banister(daily_load, ctl_tau=42, atl_tau=7)`:
```
CTL_today = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))    τ_CTL = 42d
ATL_today = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))    τ_ATL = 7d
TSB_today = CTL_yesterday − ATL_yesterday    # yesterday's values (TP convention)
```
`daily_training_load_series(conn, *, include_manual=True)` is the canonical input — it unions Garmin-recorded TL with `manual_activities`. Rest days are filled with 0; both EWMAs still update.
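The recursion is short enough to write out directly. This is a minimal sketch over a plain list of daily loads, not openrun's `banister`: the function name `banister_sketch` and the zero seeds `ctl0` / `atl0` are assumptions (the real helper may seed the EWMAs differently).

```python
import math


def banister_sketch(daily_load, ctl_tau=42.0, atl_tau=7.0, ctl0=0.0, atl0=0.0):
    """(CTL, ATL, TSB) per day from a gap-free daily load list (rest days = 0).

    TSB on each day uses the previous day's CTL and ATL (TrainingPeaks convention).
    """
    k_ctl = math.exp(-1 / ctl_tau)
    k_atl = math.exp(-1 / atl_tau)
    ctl, atl = ctl0, atl0
    rows = []
    for load in daily_load:
        tsb = ctl - atl  # computed before today's update, i.e. yesterday's values
        ctl = ctl * k_ctl + load * (1 - k_ctl)
        atl = atl * k_atl + load * (1 - k_atl)
        rows.append((ctl, atl, tsb))
    return rows


# A hard week (100 TL/day) followed by three rest days.
pmc = banister_sketch([100.0] * 7 + [0.0] * 3)
```

During the rest days ATL decays much faster than CTL (τ = 7 d vs 42 d), so TSB climbs back toward positive; that asymmetry is what a taper exploits.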
TSB interpretation (used across the app):
| TSB | meaning |
|---|---|
| < −30 | severely fatigued — injury risk |
| −30 to −10 | productive overload — heart of a build |
| −10 to 0 | balanced building |
| 0 to +10 | sharpening |
| **+10 to +25** | **fresh / peaked — race-day target** |
| > +25 | detrained (taper too long) |

`banister_forecast(history, future, *, today=None)` splices historical load with planned future load and runs the same recursion forward, so the Race-plan page can project CTL/ATL/TSB through race day. The planned-future series comes from `plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates)`, which converts weekly plan rows into per-day load using a long-run-Saturday-heavy distribution (Sat 40 %, Tue 20 %, Thu 20 %, Mon 10 %, Wed 10 %); race weeks treat race day specially.
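A minimal sketch of the weekday distribution for a single, non-race plan row. Race-week handling is deliberately left out, and the `week_to_daily_load` helper, its `(week_monday, planned_km, tl_per_km)` inputs, and the 6.0 TL/km value are illustrative assumptions rather than `plan_to_daily_load` itself.

```python
from datetime import date, timedelta

# Share of a week's planned distance per weekday (Monday=0 ... Sunday=6),
# from the distribution described above; Friday and Sunday get none.
WEEKDAY_SHARE = {0: 0.10, 1: 0.20, 2: 0.10, 3: 0.20, 5: 0.40}


def week_to_daily_load(week_monday, planned_km, tl_per_km):
    """Spread one plan row's km across its ISO week and convert to training load.

    Race-week handling (race_dates / race_day_tl_per_km) is deliberately omitted.
    """
    if week_monday.weekday() != 0:
        raise ValueError("plan rows are keyed by their ISO Monday")
    return {
        week_monday + timedelta(days=d): planned_km * WEEKDAY_SHARE.get(d, 0.0) * tl_per_km
        for d in range(7)
    }


# 50 km planned for the week of Monday 2026-05-04, at an illustrative 6.0 TL/km.
load = week_to_daily_load(date(2026, 5, 4), planned_km=50, tl_per_km=6.0)
```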
`calibrate_tl_per_km(conn)` reports the empirical median + IQR of `training_load / km` from your history — that's the number to feed into the plan, rather than a hard-coded constant.
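The calibration boils down to summary statistics of per-activity ratios. This minimal sketch works on plain `(training_load, distance_km)` pairs rather than a database connection; `tl_per_km_summary` is a hypothetical name, and the use of `statistics.quantiles` with its default "exclusive" method is an assumption (the real helper may compute quartiles differently).

```python
import statistics


def tl_per_km_summary(activities):
    """Median and interquartile range of training_load / km.

    activities: iterable of (training_load, distance_km) pairs; rows with a
    missing load or zero distance are skipped. Needs at least two usable rows.
    """
    ratios = [load / km for load, km in activities if load is not None and km]
    q1, _, q3 = statistics.quantiles(ratios, n=4)  # default "exclusive" method
    return {"median": statistics.median(ratios), "iqr": (q1, q3)}


# The zero-distance row is skipped; the rest give ratios 5, 6, 7, 8, 9 TL/km.
summary = tl_per_km_summary([(50, 10), (60, 10), (70, 10), (80, 10), (90, 10), (40, 0)])
```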
### Personal records
`personal_records(activities, distance_bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05)` picks the fastest run within ±5 % of each bin distance.
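A minimal sketch of the binning rule. Ranking by pace (seconds per km) rather than raw elapsed time is an assumption, chosen because runs inside one ±5 % bin differ slightly in length; `personal_records_sketch` and the `(run_id, distance_km, duration_s)` layout are hypothetical.

```python
def personal_records_sketch(runs, bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05):
    """Best run within ±tolerance of each target distance.

    runs: iterable of (run_id, distance_km, duration_s).
    "Best" here means lowest pace in s/km (an assumption).
    Returns {target_km: run_tuple} for every bin that has at least one run.
    """
    best = {}  # target_km -> (run, pace)
    for run in runs:
        _, dist_km, secs = run
        if not dist_km:
            continue
        pace = secs / dist_km
        for target in bins_km:
            if abs(dist_km - target) <= tolerance * target:
                if target not in best or pace < best[target][1]:
                    best[target] = (run, pace)
    return {target: run for target, (run, _) in best.items()}


runs = [("a", 5.1, 1500), ("b", 4.9, 1400), ("c", 10.0, 3000), ("d", 7.0, 2000)]
prs = personal_records_sketch(runs)  # "b" wins 5 km, "c" takes 10 km, "d" fits no bin
```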
### GPS route clustering
`cluster_routes(lats, lons, radius_km=0.25)` does greedy haversine-radius clustering of run starts. Per-route pace trends control for terrain (the same loop in 2023 vs 2025 says more about fitness than raw monthly pace). The greedy pass is adequate for hundreds of starts; switching to DBSCAN for thousands is still an open TODO.
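A minimal sketch of greedy radius clustering, assuming each cluster is represented by its first ("seed") start and every later start joins the first seed within `radius_km`; whether `cluster_routes` uses seeds or running centroids is not stated, and `greedy_clusters` / `haversine_km` are hypothetical names. The seed-by-seed scan is what makes the approach quadratic in the worst case, hence the DBSCAN note.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def greedy_clusters(lats, lons, radius_km=0.25):
    """Label each start with the index of the first seed within radius_km."""
    seeds, labels = [], []
    for lat, lon in zip(lats, lons):
        for idx, (seed_lat, seed_lon) in enumerate(seeds):
            if haversine_km(lat, lon, seed_lat, seed_lon) <= radius_km:
                labels.append(idx)
                break
        else:  # no seed close enough: this start opens a new cluster
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels


# Two starts ~110 m apart share a cluster; one ~1.1 km away gets its own.
labels = greedy_clusters([45.0, 45.001, 45.01], [-73.0, -73.0, -73.0])
```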
---
## 4 — Reference notebooks (kept around for power-user analysis)
Each notebook under [examples/notebooks/](examples/notebooks/) bootstraps the same way:
```python
from openrun import open_conn, load_activities  # , ... plus whatever the notebook needs
conn = open_conn()
```
| # | Notebook | Covers | Web equivalent |
|---|---|---|---|
| 01 | Overview | Data coverage, recent activities | Dashboard |
| 02 | Running | Weekly volume, pace-HR, PMC, PRs | Dashboard + Activities |
| 03 | Recovery | Sleep composition, HRV, RHR, body battery | Recovery |
| 04 | Efficiency | YoY m/beat | Efficiency |
| 05 | Intra-run | Decoupling, cadence, route clusters, TIZ | Activity detail |
| 06 | Race plan | Periodised plan + projected PMC | Race plan |

Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts — edit those, then `uv run python examples/notebooks/_build_NN.py` regenerates the .ipynb. They're kept as a reference for the math the web app implements.

---
## 5 — Setup
```bash
uv sync # creates .venv from pyproject.toml
uv run openrun-web # launch the web app
```
The first time you open the app, a setup wizard collects your profile + HR zones, optionally takes a race calendar, and offers to ingest a Garmin export zip in-browser. Your config is written to `openrun.toml` in the current directory — edit by hand any time.

If you'd rather work from the CLI directly:
```bash
uv run openrun-init              # bootstrap (no input required)
uv run openrun-ingest <export>   # Path A: parse an export, or ...
uv run openrun-auth              # ... Path B: OAuth login
uv run openrun-sync --full       # then a 365-day backfill
```
`garth`, the third-party Garmin Connect client behind live sync, is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If live sync stops working, fall back to Path A.

---
## 6 — Configuration
[openrun.toml](openrun.toml) — created by the setup wizard, hand-editable.
```toml
[user]
name = "Athlete"
hr_max = 200
lthr = 175
resting_hr = 50

[user.hr_zones]
Z1 = [100, 120]
Z2 = [121, 140]
Z3 = [141, 160]
Z4 = [161, 180]
Z5 = [181, 200]

[db]
path = "data/garmin.db"

[banister]
ctl_tau_days = 42.0
atl_tau_days = 7.0
race_day_tl_per_km = 7.0

[[races]]
label = "wk 4 — 30K"
date = "2026-06-13"
```
Discovery walks up from cwd looking for `openrun.toml`, falls back to `$OPENRUN_CONFIG`, then `~/.config/openrun/config.toml`, then built-in defaults.
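A minimal sketch of that lookup order using `pathlib`. Whether openrun requires `$OPENRUN_CONFIG` to point at an existing file, and how a found file is merged with the built-in defaults, are assumptions left out here; `find_config` is a hypothetical name that returns `None` when the defaults should apply.

```python
import os
from pathlib import Path


def find_config(start=None):
    """Return the first config file that exists, or None to use built-in defaults.

    Order: openrun.toml in the start directory or any parent, then
    $OPENRUN_CONFIG, then ~/.config/openrun/config.toml.
    """
    here = Path(start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / "openrun.toml"
        if candidate.is_file():
            return candidate
    env = os.environ.get("OPENRUN_CONFIG")
    if env and Path(env).expanduser().is_file():
        return Path(env).expanduser()
    user_cfg = Path.home() / ".config" / "openrun" / "config.toml"
    return user_cfg if user_cfg.is_file() else None
```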
---
## 7 — Tests
```bash
uv run pytest
```
[tests/unit/](tests/unit/) covers the pure-function surface (loaders, derivations, helpers). [tests/integration/](tests/integration/) covers the ingest paths and cross-cutting behaviour (the `tmp_conn` fixture in [tests/conftest.py](tests/conftest.py) builds an in-memory schema). The Streamlit pages are smoke-tested via `streamlit.testing.v1.AppTest`; the relevant test invocations are listed in the sections of [ROADMAP.md](ROADMAP.md).