2026-06-12 05:48:30 -04:00
parent 9d91ac8ebc
commit 64a5ab4b7f
37 changed files with 4530 additions and 407 deletions

422
README.md

@@ -1,117 +1,154 @@
# openrun — endurance running analytics
A local-first, open-source endurance-training analysis pipeline. Garmin data in (live API or official export), SQLite on disk, pandas notebooks out. Banister CTL/ATL/TSB, Pa:HR decoupling (per-mile and per-second from FIT), Garmin-configured HR-zone time-in-zone, route clustering, race-plan projection with forward CTL/ATL/TSB.
A local-first, open-source endurance-training analysis tool. Garmin data in (live API or official export), SQLite on disk, an in-browser dashboard out — plus a Python API and Jupyter notebooks for power users. No accounts, no cloud, no telemetry. Your data lives in one file on your machine.
This README focuses on what's in the data, what gets computed from it, and how each derived metric is defined. The package name `openrun` is provisional; the local repo can be renamed without code changes.
**What it gives you:** Banister CTL/ATL/TSB (fitness/fatigue/form), Pa:HR decoupling per-mile and per-second from FIT, FIT-aware HR zone time-in-zone, GPS route clustering, race-plan projection with forward CTL/ATL/TSB, off-watch activity logging, and a first-run web wizard so non-CLI users can get going in a few minutes.
```
openrun/
├── pyproject.toml # packaged as `openrun` (src layout, hatchling)
├── openrun.toml # user config (HR zones, weight, LTHR, race calendar)
├── pyproject.toml # packaged as `openrun` (src layout, hatchling)
├── openrun.toml # your config — created by the setup wizard
├── data/garmin.db # SQLite, created on first run
├── src/openrun/
│ ├── __init__.py # re-exports the common API
│ ├── config.py # UserProfile, HRZones, BanisterParams, TOML loader
│ ├── db.py # SQLite schema + connection helper
│ ├── model.py # loaders + derived metrics (was analysis.py)
│ └── ingest/
│   ├── auth.py # Garmin OAuth login
│   ├── garmin_api.py # Path B: incremental sync via garth
│   ├── garmin_export.py # Path A: parse Connect/Takeout JSON + index FITs
│   ├── fit_linker.py # match Takeout FITs to activities by session.start_time
│   └── time_in_zone.py # cache per-activity HR-zone breakdowns
├── examples/notebooks/
│ ├── 01_overview.ipynb # data coverage, recent activity
│ ├── 02_running.ipynb # volume, pace, PMC (Banister), PRs
│ ├── 03_recovery.ipynb # sleep, HRV, RHR, body battery
│ ├── 04_efficiency.ipynb # m/beat trends year over year
│ ├── 05_intra_run.ipynb # decoupling, cadence, routes, HR zones
│ └── 06_race_plan.ipynb # periodised plan + tracker + projected PMC
├── data/garmin.db # SQLite (created on first run)
└── {auth,sync,ingest_export,link_fit_files,compute_time_in_zone,db,analysis}.py
# thin top-level shims for back-compat
│ ├── config.py # UserProfile, HRZones, BanisterParams + TOML reader/writer
│ ├── db.py # schema + connect() helper
│ ├── model.py # loaders + derived metrics
│ ├── plots.py # matplotlib plot helpers
│ ├── setup.py # idempotent workspace bootstrap + status
│ ├── ingest/
│ │ ├── auth.py # Garmin OAuth login
│ │ ├── garmin_api.py # Path B: incremental sync via garth
│ │ ├── garmin_export.py # Path A: parse Connect / Takeout export
│ │ ├── fit_linker.py # match Takeout FITs by session.start_time
│ │ ├── manual.py # CSV → manual_activities (off-watch sessions)
│ │ └── time_in_zone.py # per-activity TIZ cache
│ └── web/ # Streamlit app
│   ├── app.py # controller — st.navigation builds the sidebar
│   ├── _helpers.py # cached loaders + sidebar chrome
│   └── pages/ # one file per page
└── examples/notebooks/ # original analysis notebooks (kept as reference)
```
Most callers want:
## TL;DR — install and use
```bash
uv sync
uv run openrun-web # opens http://localhost:8501
```
A first-launch wizard walks you through profile + HR zones + race calendar, then either ingests a Garmin export zip or takes you to manual logging.
See [QUICKSTART.md](QUICKSTART.md) for the step-by-step.
---
## The two interfaces
### Web app (recommended for everyone)
```bash
uv run openrun-web
```
Eight pages, all wired to the same SQLite DB:
| Page | What's on it |
|---|---|
| 📊 Dashboard | CTL/ATL/TSB tiles + chart, weekly km + ACWR, polarized split, recent runs |
| 📋 Activities | Filter by type / date / distance / name → click a row to drill in |
| 🏃 Activity detail | Splits + pace-vs-HR scatter, time-in-zone bars, FIT decoupling chart |
| 📈 Race plan | Editable plan rows, projected PMC through race day, race-day TSB table |
| 📝 Manual log | Form-based logging of off-watch sessions (strength, hikes, unrecorded runs) |
| 🛌 Recovery | Sleep stages, HRV + RHR overlay, training→next-morning-HRV correlation |
| 🫀 Efficiency | Metres per heartbeat with 30-run rolling median, distance × year heatmap |
| 🔄 Sync | Status + in-browser zip ingest with live progress |
### CLI (for scripting / power users)
```bash
openrun-init # bootstrap + status dashboard
openrun-auth # Garmin OAuth login (live sync)
openrun-sync [--full] # incremental Garmin Connect sync (downloads per-second FIT for new activities)
openrun-sync --fit-backfill --fit-type running # pull per-second FITs for past runs, no website export
openrun-sync --no-fit # sync without the per-second FIT download
openrun-ingest <path|zip> # parse a Garmin export
openrun-link-fit <export> # match Takeout FITs by start time
openrun-link-fit <root> --relink # rewrite paths after moving the export
openrun-time-in-zone # populate the TIZ cache
openrun-import-manual <csv> # bulk off-watch import
openrun-web # launch the Streamlit app
```
### Python API (for notebooks / one-offs)
```python
from openrun import open_conn, load_activities, banister, daily_training_load_series
conn = open_conn()
runs = load_activities(conn, type='running')
pmc = banister(daily_training_load_series(conn))
runs = load_activities(conn, type="running")
pmc = banister(daily_training_load_series(conn, include_manual=True))
```
User-specific values (HR zones, LTHR, weight, race calendar) live in `openrun.toml` — config discovery walks up from cwd looking for it. See [openrun.toml](openrun.toml) for the schema.
CLI entry points (after `uv sync`):
```bash
openrun-auth # OAuth login (Path B)
openrun-sync [--full] # incremental Garmin Connect sync
openrun-ingest <export> # parse a Connect/Takeout export
openrun-link-fit <export> # match Takeout FITs to activities by content
openrun-time-in-zone # populate the activity_time_in_zone cache
```
Top-level shims (`uv run sync.py`, `uv run ingest_export.py`, etc.) still work — they re-export `main()` from the corresponding `openrun.ingest.*` module.
---
## 1 — Source data
### Two ways to get data into the DB
### Two paths in, intentionally overlapping
| Path | Script | What you get | When to use |
| Path | Tool | What you get | When to use |
|---|---|---|---|
| **A. Official data export** (recommended) | `ingest_export.py` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
| **B. Live API sync** | `sync.py` (after `auth.py`) | Last N days, lap-level splits, no FIT files | Incremental top-ups between exports |
| **A. Official data export** (recommended for first load) | `openrun-ingest` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
| **B. Live API sync** | `openrun-auth` then `openrun-sync`, *or the web app's Sync page* | Last N days, lap-level splits, **per-second FIT files** (downloaded via the API) | Incremental top-ups between exports |
The two paths overlap intentionally: the export is the source of truth, the live API fills the gap between exports. The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.
The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.
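The conflict rule above can be sketched with SQLite's `ON CONFLICT ... DO UPDATE`. This is a minimal illustration under assumptions, not the project's schema: the `activities` table below is a hypothetical cut-down version with only a few columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down schema; the real table has many more parsed columns.
conn.execute(
    "CREATE TABLE activities ("
    " activity_id INTEGER PRIMARY KEY, name TEXT, distance_m REAL,"
    " raw TEXT, fetched_at TEXT)"
)

UPSERT = """
INSERT INTO activities (activity_id, name, distance_m, raw, fetched_at)
VALUES (:activity_id, :name, :distance_m, :raw, :fetched_at)
ON CONFLICT(activity_id) DO UPDATE SET
    raw = excluded.raw,
    fetched_at = excluded.fetched_at
"""

# First write (say, from the live API) populates every column.
conn.execute(UPSERT, dict(activity_id=1, name="Morning run", distance_m=10012.0,
                          raw='{"src": "api"}', fetched_at="2026-06-01T07:00:00"))
# A later re-ingest of the same id only refreshes raw + fetched_at.
conn.execute(UPSERT, dict(activity_id=1, name=None, distance_m=None,
                          raw='{"src": "export"}', fetched_at="2026-06-10T09:00:00"))

row = conn.execute(
    "SELECT name, distance_m, raw, fetched_at FROM activities WHERE activity_id = 1"
).fetchone()
```

After the second write the parsed columns (`name`, `distance_m`) keep their first values, while `raw` and `fetched_at` reflect the re-ingest.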
### Garmin's export formats
**No terminal required.** The web app's **Sync** page now logs in to Garmin (email/password, plus an MFA code when the account needs one) and runs the same pull in-process behind a **🔄 Sync now** button — with checkboxes for full backfill, per-second FIT download, and FIT backfill. Credentials go straight to Garmin; only OAuth tokens are cached in `.secrets/`. The CLI (`openrun-auth` + `openrun-sync`) does the identical thing for scripting.
Garmin actually has *two* export formats, both confusingly called "the export":
**Per-second from the API.** `openrun-sync` downloads each new activity's original FIT (`/download-service/files/activity/{id}`) into `data/fit/<id>.fit` and links it in `activity_fit_files` — the same table the export-based linker writes, so decoupling, FIT-based time-in-zone, and the route map all work without a website export. Skip it with `--no-fit`. To pull per-second history for activities synced before this existed, run `openrun-sync --fit-backfill [--fit-type running] [--fit-limit N]`, then `openrun-time-in-zone` to refresh the per-second TIZ cache. Downloads are idempotent — skipped when the FIT is already on disk and linked.
1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `<activity_id>_<activity_name>.fit`, distances/durations in SI units (m, s). `ingest_export.py` was built for this.
2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many unrelated domains (aviation, Tacx, InReach…). The running data lives under `DI_CONNECT/`. **Crucially different conventions:**
### Garmin's two export formats
Garmin has two unrelated formats both called "the export":
1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `<activity_id>_<activity_name>.fit`, SI units throughout. The web app's Sync page accepts this zip directly.
2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many domains (aviation, Tacx, InReach…). Running data lives under `DI_CONNECT/`. **Different conventions:**
- FIT filenames: `<email>_<upload_id>.fit` — upload IDs ≠ activity IDs
- `summarizedActivities` uses scaled-integer units: distance in **cm**, duration in **ms**, elevation in **cm**, speed in **m/s ÷ 10**
- `ingest_export.py` now detects and converts these; `link_fit_files.py` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename-based approach doesn't work
- `summarizedActivities` uses scaled-integer units: distance **cm**, duration **ms**, elevation **cm**, speed **m/s × 0.1**
- `openrun-ingest` detects and converts these; `openrun-link-fit` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename trick doesn't work.
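The start-time matching described in the last bullet could look roughly like this. It is a sketch only: `link_by_start_time` and its dictionary inputs are hypothetical names, and it assumes both timestamps are already in the same timezone (for example UTC).

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def link_by_start_time(fit_starts, activity_starts, tolerance_s=60):
    """Map FIT path -> activity id for the nearest activity start within tolerance.

    fit_starts: {path: datetime parsed from the FIT session.start_time}
    activity_starts: {activity_id: datetime of the activity start}
    """
    ordered = sorted((ts, aid) for aid, ts in activity_starts.items())
    times = [ts for ts, _ in ordered]
    links = {}
    for path, start in fit_starts.items():
        i = bisect_left(times, start)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = ordered[max(i - 1, 0): i + 1]
        if not candidates:
            continue
        ts, aid = min(candidates, key=lambda c: abs(c[0] - start))
        if abs(ts - start) <= timedelta(seconds=tolerance_s):
            links[path] = aid
    return links


acts = {101: datetime(2025, 5, 1, 6, 0, 0), 102: datetime(2025, 5, 2, 6, 0, 0)}
fits = {"a.fit": datetime(2025, 5, 1, 6, 0, 42), "b.fit": datetime(2025, 5, 3, 6, 0, 0)}
links = link_by_start_time(fits, acts)
```

Here `a.fit` starts 42 s after activity 101 and is linked; `b.fit` has no activity within 60 s and stays unlinked.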
For a Takeout dump, the full sequence is:
For Takeout dumps, the full sequence is:
```bash
# 1. Extract the FIT archives in place
mkdir -p <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit
unzip -d <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit \
<export>/DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip
# 2. Ingest JSON + index FITs that have parseable activity IDs
uv run ingest_export.py <export>/
# 3. Link Takeout FITs to activities by start-time match
uv run link_fit_files.py <export>/
# 4. (Optional) precompute time-in-zone cache
uv run compute_time_in_zone.py
uv run openrun-ingest <export>/
uv run openrun-link-fit <export>/
uv run openrun-time-in-zone
```
### Garmin's units & quirks worth knowing
### Off-watch activities
| Field | Live API | Takeout summarizedActivities | Convert by |
Recorded Garmin activity isn't the whole picture — strength work, hikes, runs you forgot to record, group runs on someone else's watch. The web app's **Manual log** writes to a separate `manual_activities` table; loaders union it into the PMC when you pass `include_manual=True` (the Dashboard does this by default).
For bulk import, drop a CSV with `activity_date,activity_type,distance_km,duration_min,training_load,notes[,external_id]` and run `uv run openrun-import-manual workouts.csv`. The optional `external_id` makes re-imports idempotent.
### Units worth knowing (Connect vs Takeout)
| Field | Live API / Connect | Takeout `summarizedActivities` | Convert by |
|---|---|---|---|
| distance | m | cm | × 0.01 |
| duration / movingDuration | s | ms | × 0.001 |
| elevationGain / Loss | m | cm | × 0.01 |
| averageSpeed / maxSpeed | m/s | m/s ÷ 10 | × 10 |
| HR (avg, max) | bpm | bpm | — |
| calories | kcal | kcal | — |
| training_effect, vo2_max | scalar | scalar | — |
| averageSpeed / maxSpeed | m/s | m/s × 0.1 | × 10 |
| HR (avg, max), calories, training_effect, vo2_max | bpm / kcal / scalar | — | — |
| cadence (`averageRunCadence`) | both-legs SPM | both-legs SPM | — (do **not** double) |
| strideLength | cm | cm | × 0.01 for m |
| FIT `position_lat/long` | semicircles | semicircles | × (180 / 2³¹) for degrees |
| FIT `enhanced_speed` | m/s | m/s | — |
The cadence trap bit me: `averageRunCadence` on this account's exports is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+ which trips any sane filter, then nothing makes it through.
The cadence trap is real: `averageRunCadence` is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+, which trips any sane filter.
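As a rough illustration of the conversion table, the helpers below scale a Takeout summary record to SI units and convert FIT semicircles to degrees. The function names are hypothetical and the key names are taken from the table above; the actual ingest code may use different keys.

```python
# Scale factors from the units table (Takeout value * factor = SI value).
TAKEOUT_SCALE = {
    "distance": 0.01,         # cm -> m
    "duration": 0.001,        # ms -> s
    "movingDuration": 0.001,  # ms -> s
    "elevationGain": 0.01,    # cm -> m
    "elevationLoss": 0.01,    # cm -> m
    "averageSpeed": 10.0,     # (m/s)/10 -> m/s
    "maxSpeed": 10.0,
}


def takeout_to_si(summary):
    """Return a copy of one summarizedActivities record with SI units."""
    out = dict(summary)
    for key, factor in TAKEOUT_SCALE.items():
        if out.get(key) is not None:
            out[key] = out[key] * factor
    return out  # cadence, HR, calories are deliberately left untouched


def semicircles_to_degrees(semicircles):
    """FIT position_lat/long are stored in semicircles."""
    return semicircles * (180.0 / 2**31)


converted = takeout_to_si(
    {"distance": 1_000_000, "duration": 3_000_000, "averageSpeed": 0.333,
     "averageRunCadence": 168}
)
```

Note that `averageRunCadence` passes through unchanged, matching the "do not double" rule.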
---
@@ -121,18 +158,20 @@ The cadence trap bit me: `averageRunCadence` on this account's exports is alread
| Table | Grain | Origin | Notes |
|---|---|---|---|
| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; the parsed columns are a curated subset |
| `activity_splits` | one row per lap | Path B only | Garmin defaults to auto-laps (~1 km or 1 mi per split). `raw` has cadence, stride, vert osc, GPS bounds |
| `activity_fit_files` | one row per linked FIT | Path A | `fit_path` is **relative** to whichever export root was passed to `link_fit_files.py` — keep the export folder in place |
| `activity_time_in_zone` | one row per activity | precomputed | Cached by `compute_time_in_zone.py`; `source` is `'fit'` or `'lap'` |
| `daily_steps` / `daily_sleep` / `daily_stress` / `daily_hrv` / `daily_body_battery` / `daily_intensity_minutes` / `daily_resting_hr` | one row per calendar date | both paths | Each has a `raw` JSON column with the full source payload |
| `sync_state` | key-value | both paths | Records last-sync / last-ingest timestamps |
| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; parsed columns are a curated subset |
| `activity_splits` | one row per lap | Path B only | Auto-laps (~1 km or 1 mi). `raw` has cadence, stride, GPS bounds |
| `activity_fit_files` | one row per linked FIT | linker | `fit_path` is stored **absolute** at link time. Move the export → run `openrun-link-fit --relink` |
| `activity_time_in_zone` | one row per activity | precomputed | `source = 'fit'` (per-second) or `'lap'` (split-average) |
| `manual_activities` | one row per logged session | web form / CSV | Parallel to `activities`, never clobbered by Garmin re-ingest |
| `race_plan` | one row per ISO-Monday week | web editor | Feeds `banister_forecast` for projected PMC |
| `daily_*` (steps / sleep / stress / hrv / body_battery / intensity_minutes / resting_hr) | one row per date | both paths | Each has a `raw` JSON column with the full payload |
| `sync_state` | key/value | both paths | last-sync / last-ingest timestamps |
The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface (sleep stage breakdowns, GPS bounding box, swimming-specific metrics, sport sub-types), unpack it from `raw` rather than re-ingesting:
The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface, unpack it from `raw`:
```python
from analysis import expand_raw, load_activities
acts = load_activities(open_conn(), type='running')
from openrun import open_conn, load_activities, expand_raw
acts = load_activities(open_conn(), type="running")
fields = expand_raw(acts) # pd.json_normalize'd companion frame
```
@@ -140,7 +179,7 @@ fields = expand_raw(acts) # pd.json_normalize'd companion frame
## 3 — Metrics & how they're calculated
All loaders and helpers live in [analysis.py](analysis.py). The key derivations:
All loaders and helpers live in [src/openrun/model.py](src/openrun/model.py). Highlights:
### Pace
@@ -148,19 +187,17 @@ All loaders and helpers live in [analysis.py](analysis.py). The key derivations:
pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)
```
`moving_duration_s` excludes paused time. Activities with implausible paces (sub-3 min/km or over 30 min/km — covers world-record marathon pace on the fast side and walks-with-stops on the slow side) get masked to NaN in `load_activities` rather than dropped, so the row count is preserved.
Implausible paces (sub-3 min/km or over 30 min/km — covers world-record marathon pace and walks-with-stops) get masked to NaN in `load_activities` rather than dropped, so row count is preserved.
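A minimal sketch of the pace formula and the mask, written as a plain function rather than the vectorised pandas code the loader presumably uses. The function name and the `fastest` / `slowest` parameters are assumptions for illustration.

```python
import math


def pace_min_per_km(moving_duration_s, distance_m, fastest=3.0, slowest=30.0):
    """Pace in min/km, or NaN when the value falls outside the plausible band."""
    if not distance_m or distance_m <= 0:
        return math.nan
    pace = (moving_duration_s / 60) / (distance_m / 1000)
    # Mask rather than drop: the caller keeps the row, only the pace is NaN.
    return pace if fastest <= pace <= slowest else math.nan
```

For example, 5 km in 25 minutes gives 5.0 min/km, while 5 km in 10 minutes (2.0 min/km) is masked to NaN.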
### Acute:Chronic Workload Ratio (ACWR)
Daily training-load sum, then:
```
acute = sum(training_load) over the last 7 days
chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
ACWR = acute / chronic
```
Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Implemented inline in [02_running.ipynb](notebooks/02_running.ipynb).
Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Surfaced on the Dashboard.
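The ratio can be sketched over a plain list of daily load sums. The function name is hypothetical, and returning `None` until 28 days of history exist is an assumption rather than documented behaviour.

```python
def acwr(daily_load):
    """daily_load: per-day training-load sums, oldest first, rest days as 0."""
    if len(daily_load) < 28:
        return None  # assumption: undefined until a full chronic window exists
    acute = sum(daily_load[-7:])
    chronic = sum(daily_load[-28:]) / 4  # week-equivalent of the 28-day sum
    return acute / chronic if chronic > 0 else None
```

A perfectly steady load gives 1.0; three weeks at 50 per day followed by one week at 100 per day gives 700 / 437.5 = 1.6, above the 1.5 warning line.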
### Aerobic efficiency — "m/beat"
@@ -168,24 +205,22 @@ Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Implemented inline
m_per_beat = distance_m / (duration_s * avg_hr / 60)
```
Higher = more metres covered per heartbeat = fitter aerobic system at the run's effort. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. Used as the headline view in [04_efficiency.ipynb](notebooks/04_efficiency.ipynb).
Higher = more metres per heartbeat = fitter aerobic system. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. The Efficiency page does this with a 30-run rolling median plus a distance-bucket × year heatmap and an "easy runs only" headline view (HR < 75 % LTHR).
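As a worked instance of the formula, assuming a hypothetical helper name:

```python
def m_per_beat(distance_m, duration_s, avg_hr):
    """Metres covered per heartbeat: distance divided by total beats."""
    beats = duration_s * avg_hr / 60  # avg_hr is in beats per minute
    return distance_m / beats if beats > 0 else float("nan")


# 10 km in 50 minutes at 150 bpm: 7500 beats, so about 1.33 m per beat.
example = m_per_beat(10_000, 3_000, 150)
```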
### Decoupling (Pa:Hr drift)
Friel's method, in two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fueling deficit). *Negative* = improved (negative split).
Friel's method, two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fuelling deficit). *Negative* = improved (negative split).
**Per-mile (lap level)**, `analysis.decoupling(splits, min_splits=6)`:
**Per-mile (lap level)**, `decoupling(splits, min_splits=6)`:
```
For each activity with ≥ min_splits laps:
halves = first half / second half by lap count
halves = first / second by lap count
eff_half = duration-weighted mean(speed_mps) / mean(heart_rate)
decoupling_pct = (eff_first / eff_second - 1) × 100
```
Cheap, but smears across aid-station stops and rounds at lap boundaries.
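The lap-level recipe can be sketched as below. The function name and tuple layout are hypothetical, and weighting the heart-rate mean by lap duration (as well as the speed mean) is an assumption; the pseudocode above only states the weighting for speed.

```python
def lap_decoupling(laps, min_splits=6):
    """laps: ordered list of (duration_s, speed_mps, heart_rate) tuples.

    Returns decoupling in percent, or None when there are too few laps.
    """
    if len(laps) < min_splits:
        return None
    mid = len(laps) // 2  # halves by lap count, not by time

    def efficiency(half):
        total = sum(d for d, _, _ in half)
        speed = sum(d * s for d, s, _ in half) / total
        hr = sum(d * h for d, _, h in half) / total  # assumption: also weighted
        return speed / hr

    return (efficiency(laps[:mid]) / efficiency(laps[mid:]) - 1) * 100


# Constant pace, HR drifting from 140 to 154 bpm: roughly +10 %.
laps = [(300, 3.0, 140)] * 3 + [(300, 3.0, 154)] * 3
drift = lap_decoupling(laps)
```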
**Per-second (FIT)**, `analysis.fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:
**Per-second (FIT)**, `fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:
```
records = per-second FIT messages
@@ -197,97 +232,44 @@ for each chunk:
decoupling_pct = (efficiency[0] / efficiency[i] - 1) × 100
```
Friel's published thresholds (interpret on **steady aerobic** efforts only):
- **< 5%** — aerobically developed
- **5–10%** — sustainable
- **> 10%** — pacing unsustainable for the distance, OR fueling shortfall, OR heat
Friel's thresholds (interpret on **steady aerobic** efforts only):
- **< 5 %** — aerobically developed
- **5–10 %** — sustainable
- **> 10 %** — pacing unsustainable for the distance, OR fueling shortfall, OR heat
In this dataset the per-second number is consistently **7–15 percentage points lower** than the per-mile number for the same run — the gap is aid-station noise.
The Activity-detail page plots this with `plot_fit_decoupling`.
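A simplified sketch of the per-second variant, reusing the parameter names from the signature above. Parts of the real pseudocode are not visible here, so the details are assumptions: records are `(elapsed_s, speed_mps, heart_rate)` tuples, the trimmed window is split into equal-count chunks, and each chunk's efficiency is mean speed over mean HR.

```python
def fit_decoupling_sketch(records, segments=2, warmup_min=5, cooldown_min=2,
                          min_speed_mps=0.5):
    """Return decoupling_pct per segment, relative to the first segment."""
    if not records:
        return []
    lo = warmup_min * 60
    hi = records[-1][0] - cooldown_min * 60
    # Drop warm-up, cool-down, near-stationary samples and missing HR.
    kept = [(s, hr) for t, s, hr in records
            if lo <= t <= hi and s >= min_speed_mps and hr > 0]
    if len(kept) < segments:
        return []
    size = len(kept) // segments
    eff = []
    for i in range(segments):
        # The last chunk absorbs any remainder.
        chunk = kept[i * size:(i + 1) * size] if i < segments - 1 else kept[i * size:]
        mean_speed = sum(s for s, _ in chunk) / len(chunk)
        mean_hr = sum(h for _, h in chunk) / len(chunk)
        eff.append(mean_speed / mean_hr)
    return [(eff[0] / e - 1) * 100 for e in eff]


# One hour at 3 m/s; HR steps from 140 to 154 bpm halfway through the kept window.
records = [(t, 3.0, 140 if t < 1890 else 154) for t in range(3600)]
per_segment = fit_decoupling_sketch(records, segments=2)
```

Because stops below `min_speed_mps` are filtered out sample by sample, this version is less sensitive to aid-station pauses than the lap-level calculation.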
### HR zone time-in-zone
**Zone boundaries come from Garmin's `heartRateZones.json`** (training method `HR_MAX`), not estimated from observed HR. The constants for this user live in `analysis.HR_ZONES_USER`:
Zone boundaries come from `[user.hr_zones]` in your `openrun.toml` (matching what Garmin's `heartRateZones.json` advertises). Two implementations; `openrun-time-in-zone` picks the better one per activity and caches it.
```python
Z1: 102–122 (recovery)
Z2: 123–143 (easy aerobic, long-run target)
Z3: 144–164 (tempo / "junk-miles middle")
Z4: 165–185 (threshold; LTHR=182 sits inside Z4)
Z5: 186–209 (VO₂ max)
```
**From FIT (`source='fit'`)**, `time_in_zone_from_fit(records)`: per-second; large gaps (>30 s, paused recording) are clipped to 30 s.
Time-in-zone has two implementations; `compute_time_in_zone.py` picks the better one per activity and caches the result in `activity_time_in_zone`.
**From FIT (`source='fit'`)**, `analysis.time_in_zone_from_fit(records)`:
```
records = per-second FIT messages
for each consecutive record pair:
dt = elapsed_seconds delta (clipped to [0, 30] so paused recording doesn't dump hours into one zone)
zone = bucket(this record's heart_rate)
accumulate dt into zone
```
**From splits (`source='lap'`)** — fallback when no FIT is linked:
```
for each split:
zone = bucket(split's avg_hr)
accumulate split.duration_s into that single zone
```
The FIT version is the right number; the lap version is **biased toward the middle zone**: when a lap's HR average is in Z3 but the actual HR oscillated into Z2 and Z4, the lap method dumps the whole lap into Z3. On a sample 8-hour race in this dataset, the lap method over-counted Z3 by 67 minutes and under-counted Z2 and Z4 by ~32 minutes each.
**From splits (`source='lap'`)** — fallback when no FIT is linked: assigns each split's avg HR to a single zone. Biased toward the middle zone (smooths over within-lap variation); the FIT version is ground truth where available.
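The per-second accumulation with gap clipping might look like the following. It is a sketch under assumptions: the helper names are hypothetical, records are `(elapsed_s, heart_rate)` tuples, and the zone bounds are the sample values from the configuration example in this README.

```python
ZONES = {"Z1": (100, 120), "Z2": (121, 140), "Z3": (141, 160),
         "Z4": (161, 180), "Z5": (181, 200)}


def zone_of(hr, zones=ZONES):
    """Name of the zone whose inclusive bounds contain hr, else None."""
    for name, (lo, hi) in zones.items():
        if lo <= hr <= hi:
            return name
    return None


def time_in_zone(records, zones=ZONES, max_gap_s=30):
    """Accumulate seconds per zone from ordered (elapsed_s, heart_rate) records."""
    totals = {name: 0.0 for name in zones}
    for (t0, hr), (t1, _) in zip(records, records[1:]):
        # Clip the gap so a paused recording cannot dump minutes into one zone.
        dt = min(max(t1 - t0, 0), max_gap_s)
        zone = zone_of(hr, zones)
        if zone is not None:
            totals[zone] += dt
    return totals


# The 600 s gap after t=2 (a pause) only contributes 30 s to Z2.
tiz = time_in_zone([(0, 110), (1, 110), (2, 130), (602, 130), (603, 150)])
```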
### Polarized split
A three-bucket simplification used in [05_intra_run.ipynb §4](notebooks/05_intra_run.ipynb):
```
easy = Z1 + Z2 (HR ≤ 143)
moderate = Z3 (144–164, the "junk-miles middle")
hard = Z4 + Z5 (≥ 165, threshold and up)
easy = Z1 + Z2 (recovery + easy aerobic)
moderate = Z3 (tempo)
hard = Z4 + Z5 (threshold + VO₂)
```
Seiler's polarized target is ~80% easy, < 10% moderate, ~20% hard. Pyramidal training is high easy, modest moderate, low hard. Threshold-heavy training is the opposite — most time in moderate.
Seiler's polarised target is ~80 % easy, < 10 % moderate, ~20 % hard. The Dashboard shows your last-12-week split as tiles plus a stacked bar.
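Collapsing five zones into the three buckets is a small aggregation. The helper below is a hypothetical sketch that takes seconds per zone and returns percentages.

```python
def polarized_split(tiz_seconds):
    """tiz_seconds: {"Z1": s, ..., "Z5": s} -> percent easy / moderate / hard."""
    buckets = {
        "easy": tiz_seconds.get("Z1", 0) + tiz_seconds.get("Z2", 0),
        "moderate": tiz_seconds.get("Z3", 0),
        "hard": tiz_seconds.get("Z4", 0) + tiz_seconds.get("Z5", 0),
    }
    total = sum(buckets.values())
    if total == 0:
        return {name: 0.0 for name in buckets}
    return {name: 100 * secs / total for name, secs in buckets.items()}


split = polarized_split({"Z1": 1000, "Z2": 3000, "Z3": 500, "Z4": 400, "Z5": 100})
```

With these inputs the split is 80 % easy, 10 % moderate, 10 % hard.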
### Cadence & stride
### Banister fitness / fatigue / form (PMC)
Both come directly from Garmin's lap data (`averageRunCadence`, `strideLength`) and from FIT records (`cadence`, `step_length`). Two gotchas:
1. **Cadence is already both-legs SPM** for this user's exports — do not double. Sanity range for running: 140–200 spm.
2. **Stride length is in cm** in lap data (`strideLength: 78` → 78 cm → 0.78 m) and in **mm** in FIT records (`step_length: 1041` → 104 cm). Divide accordingly.
The notebook restricts form analysis to splits with cadence > 0 (drops walks/standing intervals) and to a controlled pace band (5:30–6:30 min/km) when comparing YoY so fitness changes don't contaminate the form signal.
### GPS route clustering
`analysis.cluster_routes(lats, lons, radius_km=0.25)`:
The standard endurance lens (TrainingPeaks PMC), in `banister(daily_load, ctl_tau=42, atl_tau=7)`:
```
Greedy haversine clustering:
For each unassigned point i:
distances = haversine(point_i, all_other_points)
neighbours = points within radius_km
if len(neighbours) >= 2:
assign all to next_label
next_label += 1
CTL_today = CTL_yesterday · exp(-1/τ_CTL) + load_today · (1 - exp(-1/τ_CTL)) τ_CTL = 42d
ATL_today = ATL_yesterday · exp(-1/τ_ATL) + load_today · (1 - exp(-1/τ_ATL)) τ_ATL = 7d
TSB_today = CTL_yesterday - ATL_yesterday # yesterday's values (TP convention)
```
Per-route pace trends control for terrain — the same loop run in 2023 vs 2025 says way more about fitness than raw monthly pace. Adequate for a few hundred starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` for thousands.
`daily_training_load_series(conn, *, include_manual=True)` is the canonical input — it unions Garmin-recorded TL with `manual_activities`. Rest days are filled with 0; both EWMAs still update.
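The recursion can be sketched over a plain list of daily loads. This is an illustration, not the packaged `banister`: the function name is hypothetical, it returns tuples instead of a DataFrame, and starting CTL and ATL at zero is an assumption.

```python
import math


def banister_sketch(daily_load, ctl_tau=42.0, atl_tau=7.0):
    """Return a list of (ctl, atl, tsb) tuples, one per day, oldest first."""
    k_ctl = math.exp(-1 / ctl_tau)
    k_atl = math.exp(-1 / atl_tau)
    ctl = atl = 0.0  # assumption: no prior training history
    rows = []
    for load in daily_load:
        tsb = ctl - atl  # uses yesterday's values, before today's update
        ctl = ctl * k_ctl + load * (1 - k_ctl)
        atl = atl * k_atl + load * (1 - k_atl)
        rows.append((ctl, atl, tsb))
    return rows


rows = banister_sketch([100] * 400)
```

Under a constant load both averages converge to that load, so TSB drifts back toward zero; ATL gets there much faster than CTL, which is why TSB is negative early in a build.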
### Banister fitness / fatigue / form (Performance Management Chart)
The standard endurance-training lens (TrainingPeaks PMC), implemented in `analysis.banister(daily_load, ctl_tau=42, atl_tau=7)`:
```
CTL_today = CTL_yesterday · exp(-1/τ_CTL) + load_today · (1 - exp(-1/τ_CTL)) τ_CTL = 42d
ATL_today = ATL_yesterday · exp(-1/τ_ATL) + load_today · (1 - exp(-1/τ_ATL)) τ_ATL = 7d
TSB_today = CTL_yesterday - ATL_yesterday # yesterday's values (TP convention)
```
Helper: `daily_training_load_series(conn, activity_types=('running','trail_running'))` returns the per-day summed `training_load` Series that feeds it. Rest days are filled with 0 — both EWMAs still update, ATL drops faster than CTL, TSB rises.
TSB interpretation (used in both notebooks 02 and 06):
TSB interpretation (used across the app):
| TSB | meaning |
|---|---|
@@ -298,73 +280,103 @@ TSB interpretation (used in both notebooks 02 and 06):
| **+10 to +25** | **fresh / peaked — race-day target** |
| > +25 | detrained (taper too long) |
[02_running.ipynb](notebooks/02_running.ipynb) plots the historical PMC; [06_race_plan.ipynb](notebooks/06_race_plan.ipynb) projects it forward through race day by converting plan-week km into estimated daily loads (separate `RACE_DAY_TL_PER_KM ≈ 7` for the race itself vs `~11` for training runs — race-day TL/km is empirically lower because ultras spread load over more time).
`banister_forecast(history, future, *, today=None)` splices historical load with planned future load and runs the same recursion forward, so the Race-plan page can project CTL/ATL/TSB through race day. The planned-future series comes from `plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates)`, which converts your weekly plan rows into per-day load using a long-run-Saturday-heavy distribution (Sat 40 %, Tue 20 %, Thu 20 %, Mon 10 %, Wed 10 %) and treats race day specially in race weeks.
`calibrate_tl_per_km(conn)` reports the empirical median + IQR of `training_load / km` from your history — that's the number to feed into the plan, rather than a hard-coded constant.
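The weekday distribution for a single non-race week can be sketched as follows. The function name is hypothetical, race-week handling is deliberately left out, and the `tl_per_km` value in the example is made up; in practice it would come from the calibration described above.

```python
from datetime import date, timedelta

# Share of the weekly load per weekday offset from Monday (0 = Mon ... 6 = Sun).
WEEKDAY_SHARE = {0: 0.10, 1: 0.20, 2: 0.10, 3: 0.20, 5: 0.40}


def week_to_daily_load(week_start, planned_km, tl_per_km):
    """Spread one plan week (starting on an ISO Monday) across its seven days."""
    if week_start.weekday() != 0:
        raise ValueError("week_start must be a Monday")
    weekly_load = planned_km * tl_per_km
    return {
        week_start + timedelta(days=offset): weekly_load * WEEKDAY_SHARE.get(offset, 0.0)
        for offset in range(7)
    }


# 50 km planned at an illustrative 10 load units per km.
loads = week_to_daily_load(date(2026, 6, 8), planned_km=50, tl_per_km=10.0)
```

Friday and Sunday receive zero load, and the Saturday long run carries 40 % of the week.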
### Personal records
`personal_records(activities, distance_bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05)` picks the fastest run within ±5 % of each bin distance.
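The bin-and-tolerance selection can be sketched on plain tuples. The function name and input layout are hypothetical, and ranking "fastest" by pace (seconds per km) rather than raw duration is an assumption.

```python
def personal_records_sketch(runs, distance_bins_km=(5, 10, 21.0975, 42.195, 50),
                            tolerance=0.05):
    """runs: list of (distance_km, duration_s). Returns {bin_km: best run}."""
    records = {}
    for target in distance_bins_km:
        lo, hi = target * (1 - tolerance), target * (1 + tolerance)
        candidates = [r for r in runs if lo <= r[0] <= hi and r[1] > 0]
        if candidates:
            # Lowest seconds-per-km wins within the bin.
            records[target] = min(candidates, key=lambda r: r[1] / r[0])
    return records


runs = [(5.0, 1500), (5.1, 1479), (9.8, 3000), (7.0, 2000)]
prs = personal_records_sketch(runs)
```

The 7 km run falls outside every bin and is ignored; bins with no candidate are simply absent from the result.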
### GPS route clustering
`cluster_routes(lats, lons, radius_km=0.25)` — greedy haversine-radius clustering of run starts. Per-route pace trends control for terrain (the same loop in 2023 vs 2025 says more about fitness than raw monthly pace). Adequate for hundreds of starts; the README's old TODO for DBSCAN at thousands is still open.
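A sketch of greedy haversine-radius clustering follows. The function names are hypothetical, and two details are assumptions: a point counts as its own neighbour, and a cluster needs at least two members, with unclustered points labelled `-1`.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of mean Earth radius."""
    r = 6371.0088
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_starts(lats, lons, radius_km=0.25):
    """Return one label per point; -1 means the point joined no cluster."""
    labels = [-1] * len(lats)
    next_label = 0
    for i in range(len(lats)):
        if labels[i] != -1:
            continue
        # Unassigned points within the radius of point i (includes i itself).
        members = [j for j in range(len(lats)) if labels[j] == -1
                   and haversine_km(lats[i], lons[i], lats[j], lons[j]) <= radius_km]
        if len(members) >= 2:
            for j in members:
                labels[j] = next_label
            next_label += 1
    return labels


labels = cluster_starts([40.0, 40.001, 41.0, 40.0005], [-105.0] * 4)
```

The pairwise scan is quadratic in the number of starts, which is consistent with the note that this is adequate for hundreds of points but not thousands.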
---
## 4 — Notebook tour
## 4 — Reference notebooks (kept around for power-user analysis)
Each notebook bootstraps the same way:
Each notebook under [examples/notebooks/](examples/notebooks/) bootstraps the same way:
```python
import sys; sys.path.insert(0, '..')
from analysis import open_conn, load_activities, load_wellness, ...
from openrun import open_conn, load_activities, ...
conn = open_conn()
```
| # | Notebook | Covers |
|---|---|---|
| 01 | Overview | Data coverage by table, recent activities, monthly mileage YoY, wellness completeness |
| 02 | Running | Weekly volume + rolling, pace-vs-HR scatter, aerobic-efficiency trend, ACWR, PRs by distance |
| 03 | Recovery | Sleep composition, after-run vs after-rest HRV/RHR, correlations between training and recovery, weekly km vs recovery indicators |
| 04 | Efficiency | YoY m/beat by distance bucket, HR/pace split, polynomial curve fits, easy-runs-only headline view |
| 05 | Intra-run | (1) per-mile decoupling, (1b) per-second decoupling on race FITs, (2) cadence & stride at controlled pace, (3) GPS route clustering, (4) HR-zone time-in-zone (FIT-based) + polarized split |
| 06 | Race plan | Periodised 17-week plan around 30K → 50K → 50-mile, with auto-tracking against actual recorded volume |
| # | Notebook | Covers | Web equivalent |
|---|---|---|---|
| 01 | Overview | Data coverage, recent activities | Dashboard |
| 02 | Running | Weekly volume, pace-HR, PMC, PRs | Dashboard + Activities |
| 03 | Recovery | Sleep composition, HRV, RHR, body battery | Recovery |
| 04 | Efficiency | YoY m/beat | Efficiency |
| 05 | Intra-run | Decoupling, cadence, route clusters, TIZ | Activity detail |
| 06 | Race plan | Periodised plan + projected PMC | Race plan |
Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts. Edit those, then `uv run python notebooks/_build_NN.py` regenerates the .ipynb. Easier than editing 30 KB of JSON.
Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts — edit those, then `uv run python examples/notebooks/_build_NN.py` regenerates the .ipynb. They're kept as a reference for the math the web app implements.
---
## 5 — Setup
```bash
uv sync # creates .venv from pyproject.toml
uv sync # creates .venv from pyproject.toml
uv run openrun-web # launch the web app
```
Kernel: in VSCode, pick the project's `.venv` (top-right of any notebook). If it's not discoverable:
The first time you open the app, a setup wizard collects your profile + HR zones, optionally takes a race calendar, and offers to ingest a Garmin export zip in-browser. Your config is written to `openrun.toml` in the current directory — edit by hand any time.
If you'd rather work from the CLI directly:
```bash
uv run python -m ipykernel install --user --name garmin --display-name "garmin (.venv)"
uv run openrun-init # bootstrap (no input required)
uv run openrun-ingest <export> # or
uv run openrun-auth # OAuth login (Path B)
uv run openrun-sync --full # 365-day backfill
```
### Path A — official export
`garth` (the third-party Garmin Connect client used for live sync) is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If the live sync ever stops working, fall back to Path A.
1. Go to <https://www.garmin.com/account/datamanagement/exportdata> and request the export. Garmin emails the link within 24–72 h.
2. Download the zip (several hundred MB to a few GB) or, for the Takeout dump, the UUID-named folder.
3. Ingest:
---
## 6 — Configuration
[openrun.toml](openrun.toml) — created by the setup wizard, hand-editable.
```toml
[user]
name = "Athlete"
hr_max = 200
lthr = 175
resting_hr = 50
[user.hr_zones]
Z1 = [100, 120]
Z2 = [121, 140]
Z3 = [141, 160]
Z4 = [161, 180]
Z5 = [181, 200]
[db]
path = "data/garmin.db"
[banister]
ctl_tau_days = 42.0
atl_tau_days = 7.0
race_day_tl_per_km = 7.0
[[races]]
label = "wk 4 — 30K"
date = "2026-06-13"
```
Discovery walks up from cwd looking for `openrun.toml`, falls back to `$OPENRUN_CONFIG`, then `~/.config/openrun/config.toml`, then built-in defaults.
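The lookup order can be sketched with `pathlib`. This is not the packaged loader: `find_config` and its parameters are hypothetical (they exist to make the order testable), and returning the `$OPENRUN_CONFIG` path without checking that it exists is an assumption.

```python
import os
from pathlib import Path


def find_config(start=None, env=None, home=None):
    """Return the config path to use, or None to signal built-in defaults."""
    start = Path(start) if start is not None else Path.cwd()
    env = os.environ if env is None else env
    home = Path(home) if home is not None else Path.home()

    # 1. Walk up from the starting directory to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / "openrun.toml"
        if candidate.is_file():
            return candidate
    # 2. Explicit override via environment variable.
    if env.get("OPENRUN_CONFIG"):
        return Path(env["OPENRUN_CONFIG"])
    # 3. Per-user config file, then 4. built-in defaults (None).
    fallback = home / ".config" / "openrun" / "config.toml"
    return fallback if fallback.is_file() else None
```

Walking up from the working directory first means a project-local `openrun.toml` always wins over the environment variable and the per-user file.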
---
## 7 — Tests
```bash
uv run ingest_export.py path/to/connect.zip
# or, if already unzipped:
uv run ingest_export.py path/to/<export>/
uv run pytest
```
Add `--dry-run` to preview without writing. The data is upserted on `activity_id` / `calendar_date`, so it's safe to re-ingest a refreshed export later.
For Takeout dumps specifically, follow the 4-step extract+ingest+link+precompute sequence in §1.
### Path B — live API sync
Garmin's SSO endpoint is rate-limited; you may hit 429s. If so, wait 30–60 min, switch networks, or use Path A.
```bash
uv run auth.py # prompts for email / password / MFA, saves OAuth to .secrets/
uv run sync.py --full # 365-day backfill
uv run sync.py # incremental thereafter (last 14 days)
uv run sync.py --days 90 # custom backfill window
uv run sync.py --no-details # skip per-activity detail/splits (much faster)
uv run sync.py --skip-activities # wellness only
```
`garth` is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If the live sync stops working, fall back to Path A.
[tests/unit/](tests/unit/) covers the pure-function surface (loaders, derivations, helpers). [tests/integration/](tests/integration/) covers the ingest paths and cross-cutting behaviour (the `tmp_conn` fixture in [tests/conftest.py](tests/conftest.py) builds an in-memory schema). The Streamlit pages are smoke-tested via `streamlit.testing.v1.AppTest`; see the test invocations listed in [ROADMAP.md](ROADMAP.md).