# Garmin → local SQLite

A small, self-contained pipeline that pulls Garmin Connect data into a local SQLite database and a stack of pandas notebooks. The aim is a *reference* you can run against your own data — not a polished product — so this README focuses on what's in the data, what gets computed from it, and exactly how each derived metric is defined.

```
garmin/
├── auth.py                   # Path B: OAuth login for live API
├── sync.py                   # Path B: incremental sync via garth
├── ingest_export.py          # Path A: parse Garmin Connect data export
├── link_fit_files.py         # match Takeout FIT files to activities by start_time
├── compute_time_in_zone.py   # cache per-activity HR-zone breakdowns
├── analysis.py               # shared loaders + derived metrics
├── db.py                     # schema + connection helpers
├── data/garmin.db            # SQLite (created on first run)
└── notebooks/
    ├── 01_overview.ipynb     # data coverage, recent activity
    ├── 02_running.ipynb      # volume, pace, ACWR, PRs
    ├── 03_recovery.ipynb     # sleep, HRV, RHR, body battery
    ├── 04_efficiency.ipynb   # m/beat trends year over year
    ├── 05_intra_run.ipynb    # decoupling, cadence, routes, HR zones
    └── 06_race_plan.ipynb    # build a periodised plan + adherence tracker
```

---

## 1 — Source data

### Two ways to get data into the DB

| Path | Script | What you get | When to use |
|---|---|---|---|
| **A. Official data export** (recommended) | `ingest_export.py` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
| **B. Live API sync** | `sync.py` (after `auth.py`) | Last N days, lap-level splits, no FIT files | Incremental top-ups between exports |

The two paths overlap intentionally: the export is the source of truth, the live API fills the gap between exports. The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.

### Garmin's export formats

Garmin actually has *two* export formats, both confusingly called "the export":

1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `_.fit`, distances/durations in SI units (m, s). `ingest_export.py` was built for this.
2. **Garmin Takeout dump** — UUID-named folder (`_N/`) with many unrelated domains (aviation, Tacx, InReach…). The running data lives under `DI_CONNECT/`. **Crucially different conventions:**
   - FIT filenames: `_.fit` — upload IDs ≠ activity IDs
   - `summarizedActivities` uses scaled-integer units: distance in **cm**, duration in **ms**, elevation in **cm**, speed in **m/s ÷ 10**
   - `ingest_export.py` now detects and converts these; `link_fit_files.py` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename-based approach doesn't work

For a Takeout dump, the full sequence is:

```bash
# 1. Extract the FIT archives in place
mkdir -p /DI_CONNECT/DI-Connect-Uploaded-Files/fit
unzip -d /DI_CONNECT/DI-Connect-Uploaded-Files/fit \
    /DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip

# 2. Ingest JSON + index FITs that have parseable activity IDs
uv run ingest_export.py /

# 3. Link Takeout FITs to activities by start-time match
uv run link_fit_files.py /

# 4. (Optional) precompute time-in-zone cache
uv run compute_time_in_zone.py
```

### Garmin's units & quirks worth knowing

| Field | Live API | Takeout `summarizedActivities` | Convert by |
|---|---|---|---|
| distance | m | cm | × 0.01 |
| duration / movingDuration | s | ms | × 0.001 |
| elevationGain / Loss | m | cm | × 0.01 |
| averageSpeed / maxSpeed | m/s | m/s ÷ 10 | × 10 |
| HR (avg, max) | bpm | bpm | — |
| calories | kcal | kcal | — |
| training_effect, vo2_max | scalar | scalar | — |
| cadence (`averageRunCadence`) | both-legs SPM | both-legs SPM | — (do **not** double) |
| strideLength | cm | cm | × 0.01 for m |
| FIT `position_lat/long` | semicircles | semicircles | × (180 / 2³¹) for degrees |
| FIT `enhanced_speed` | m/s | m/s | — |

The cadence trap bit me: `averageRunCadence` on this account's exports is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+ spm, which trips any sane filter, so no rows make it through.

---

## 2 — Schema

`db.py:SCHEMA` is the source of truth; what follows is the gist.

| Table | Grain | Origin | Notes |
|---|---|---|---|
| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; the parsed columns are a curated subset |
| `activity_splits` | one row per lap | Path B only | Garmin defaults to auto-laps (~1 km or 1 mi per split). `raw` has cadence, stride, vert osc, GPS bounds |
| `activity_fit_files` | one row per linked FIT | Path A | `fit_path` is **relative** to whichever export root was passed to `link_fit_files.py` — keep the export folder in place |
| `activity_time_in_zone` | one row per activity | precomputed | Cached by `compute_time_in_zone.py`; `source` is `'fit'` or `'lap'` |
| `daily_steps` / `daily_sleep` / `daily_stress` / `daily_hrv` / `daily_body_battery` / `daily_intensity_minutes` / `daily_resting_hr` | one row per calendar date | both paths | Each has a `raw` JSON column with the full source payload |
| `sync_state` | key-value | both paths | Records last-sync / last-ingest timestamps |

The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface (sleep stage breakdowns, GPS bounding box, swimming-specific metrics, sport sub-types), unpack it from `raw` rather than re-ingesting:

```python
from analysis import expand_raw, load_activities, open_conn

acts = load_activities(open_conn(), type='running')
fields = expand_raw(acts)  # pd.json_normalize'd companion frame
```

---

## 3 — Metrics & how they're calculated

All loaders and helpers live in [analysis.py](analysis.py). The key derivations:

### Pace

```
pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)
```

`moving_duration_s` excludes paused time. Activities with implausible paces (faster than 3 min/km or slower than 30 min/km) get masked to NaN in `load_activities` rather than dropped, so the row count is preserved. The fast bound sits close to world-record marathon pace; the slow bound allows for walks with stops.

### Acute:Chronic Workload Ratio (ACWR)

Daily training-load sum, then:

```
acute   = sum(training_load) over the last 7 days
chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
ACWR    = acute / chronic
```

Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Implemented inline in [02_running.ipynb](notebooks/02_running.ipynb).
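The ACWR definition above can be sketched in a few lines of plain Python. This is an illustration only, not the notebook's code: the function name `acwr_series` is hypothetical, the input is assumed to be a gap-free list of daily load sums (rest days as 0), and returning `None` until a full 28-day window exists is a choice made here rather than something the notebook is known to do.

```python
def acwr_series(daily_load, acute_days=7, chronic_days=28):
    """Return one ACWR value per day (hypothetical helper, not from analysis.py).

    daily_load: list of per-day training-load sums, oldest first, no gaps.
    Days without a full chronic window, or with zero chronic load, yield None.
    """
    weeks_in_chronic = chronic_days / acute_days  # 4 for the 7/28 defaults
    ratios = []
    for i in range(len(daily_load)):
        end = i + 1
        if end < chronic_days:
            ratios.append(None)  # assumption: no ratio until 28 days of history
            continue
        acute = sum(daily_load[end - acute_days:end])
        chronic = sum(daily_load[end - chronic_days:end]) / weeks_in_chronic
        ratios.append(acute / chronic if chronic > 0 else None)
    return ratios


# Three steady weeks followed by a week at double the load.
ramp = acwr_series([50.0] * 21 + [100.0] * 7)
print(round(ramp[-1], 2))  # 1.6
```

In the example, the last week sums to 700 against a week-equivalent chronic load of 437.5, so the ratio lands at 1.6, just past the 1.5 "aggressive ramp" line.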
### Aerobic efficiency — "m/beat"

```
m_per_beat = distance_m / (duration_s * avg_hr / 60)
```

Higher = more metres covered per heartbeat = fitter aerobic system at the run's effort. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. Used as the headline view in [04_efficiency.ipynb](notebooks/04_efficiency.ipynb).

### Decoupling (Pa:Hr drift)

Friel's method, in two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fueling deficit). *Negative* = improved (negative split).

**Per-mile (lap level)** — `analysis.decoupling(splits, min_splits=6)`:

```
For each activity with ≥ min_splits laps:
    halves         = first half / second half by lap count
    eff_half       = duration-weighted mean(speed_mps) / mean(heart_rate)
    decoupling_pct = (eff_first / eff_second − 1) × 100
```

Cheap, but smears across aid-station stops and rounds at lap boundaries.

**Per-second (FIT)** — `analysis.fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:

```
records = per-second FIT messages
drop first warmup_min and last cooldown_min
drop records where speed_mps < min_speed_mps   # ignore stopped time
slice remaining moving time into N equal-time chunks
for each chunk: efficiency = mean(speed_mps) / mean(heart_rate)
decoupling_pct = (efficiency[0] / efficiency[i] − 1) × 100
```

Friel's published thresholds (interpret on **steady aerobic** efforts only):

- **< 5%** — aerobically developed
- **5–10%** — sustainable
- **> 10%** — pacing unsustainable for the distance, OR fueling shortfall, OR heat

In this dataset the per-second number is consistently **7–15 percentage points lower** than the per-mile number for the same run — the gap is aid-station noise.

### HR zone time-in-zone

**Zone boundaries come from Garmin's `heartRateZones.json`** (training method `HR_MAX`), not estimated from observed HR.
The constants for this user live in `analysis.HR_ZONES_USER`:

```
Z1: 102–122   (recovery)
Z2: 123–143   (easy aerobic — long-run target)
Z3: 144–164   (tempo / "junk-miles middle")
Z4: 165–185   (threshold — LTHR=182 sits inside Z4)
Z5: 186–209   (VO₂ max)
```

Time-in-zone has two implementations; `compute_time_in_zone.py` picks the better one per activity and caches the result in `activity_time_in_zone`.

**From FIT (`source='fit'`)** — `analysis.time_in_zone_from_fit(records)`:

```
records = per-second FIT messages
for each consecutive record pair:
    dt   = elapsed_seconds delta (clipped to [0, 30] so paused
           recording doesn't dump hours into one zone)
    zone = bucket(this record's heart_rate)
    accumulate dt into zone
```

**From splits (`source='lap'`)** — fallback when no FIT is linked:

```
for each split:
    zone = bucket(split's avg_hr)
    accumulate split.duration_s into that single zone
```

The FIT version is the right number; the lap version is **biased toward the middle zone**: when a lap's HR average is in Z3 but the actual HR oscillated into Z2 and Z4, the lap method dumps the whole lap into Z3. On a sample 8-hour race in this dataset, the lap method over-counted Z3 by 67 minutes and under-counted Z2 and Z4 by ~32 minutes each.

### Polarized split

A three-bucket simplification used in [05_intra_run.ipynb §4](notebooks/05_intra_run.ipynb):

```
easy     = Z1 + Z2   (HR ≤ 143)
moderate = Z3        (144–164, the "junk-miles middle")
hard     = Z4 + Z5   (≥ 165, threshold and up)
```

Seiler's polarized target is ~80% easy with the remaining ~20% mostly hard, keeping moderate under 10%. Pyramidal training is high easy, modest moderate, low hard. Threshold-heavy training differs from both: most time is in moderate.

### Cadence & stride

Both come directly from Garmin's lap data (`averageRunCadence`, `strideLength`) and from FIT records (`cadence`, `step_length`). Two gotchas:

1. **Cadence is already both-legs SPM** for this user's exports — do not double. Sanity range for running: 140–200 spm.
2. **Stride length is in cm** in lap data (`strideLength: 78` → 78 cm → 0.78 m) and in **mm** in FIT records (`step_length: 1041` → 104 cm). Divide accordingly.

The notebook restricts form analysis to splits with cadence > 0 (drops walks/standing intervals) and to a controlled pace band (5:30–6:30 min/km) when comparing YoY so fitness changes don't contaminate the form signal.

### GPS route clustering

`analysis.cluster_routes(lats, lons, radius_km=0.25)`:

```
Greedy haversine clustering:
    For each unassigned point i:
        distances  = haversine(point_i, all_other_points)
        neighbours = points within radius_km
        if len(neighbours) >= 2:
            assign all to next_label
            next_label += 1
```

Per-route pace trends control for terrain — the same loop run in 2023 vs 2025 says way more about fitness than raw monthly pace. Adequate for a few hundred starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` for thousands (that metric expects coordinates in radians, so `eps` becomes `radius_km / 6371`).

### Banister fitness / fatigue / form (Performance Management Chart)

The standard endurance-training lens (TrainingPeaks PMC), implemented in `analysis.banister(daily_load, ctl_tau=42, atl_tau=7)`:

```
CTL_today = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))    τ_CTL = 42d
ATL_today = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))    τ_ATL = 7d
TSB_today = CTL_yesterday − ATL_yesterday    # yesterday's values (TP convention)
```

Helper: `daily_training_load_series(conn, activity_types=('running','trail_running'))` returns the per-day summed `training_load` Series that feeds it. Rest days are filled with 0 — both EWMAs still update, ATL drops faster than CTL, TSB rises.
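The recursion above is short enough to sketch in plain Python. This is not the code in `analysis.banister`: the name `banister_sketch` is hypothetical, the input is assumed to be a gap-free list of daily loads, and starting CTL and ATL at 0 is an assumption made here for illustration.

```python
import math


def banister_sketch(daily_load, ctl_tau=42, atl_tau=7):
    """Return a list of (CTL, ATL, TSB) tuples, one per day (illustrative only).

    daily_load: per-day training load, oldest first, rest days as 0.
    """
    decay_ctl = math.exp(-1 / ctl_tau)
    decay_atl = math.exp(-1 / atl_tau)
    ctl = atl = 0.0  # assumption: no prior training history
    rows = []
    for load in daily_load:
        tsb = ctl - atl  # uses yesterday's CTL and ATL
        ctl = ctl * decay_ctl + load * (1 - decay_ctl)
        atl = atl * decay_atl + load * (1 - decay_atl)
        rows.append((ctl, atl, tsb))
    return rows


# 60 days of steady load, then a 10-day rest block.
rows = banister_sketch([100.0] * 60 + [0.0] * 10)
print(rows[59][2] < 0, rows[-1][2] > 0)  # True True
```

The example reproduces the behaviour described for rest days: TSB is negative at the end of the build and turns positive during the rest block, because ATL decays much faster than CTL.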
TSB interpretation (used in both notebooks 02 and 06):

| TSB | meaning |
|---|---|
| < −30 | severely fatigued — injury risk |
| −10 to −30 | productive overload — heart of a build |
| −10 to 0 | balanced building |
| 0 to +10 | sharpening |
| **+10 to +25** | **fresh / peaked — race-day target** |
| > +25 | detrained (taper too long) |

[02_running.ipynb](notebooks/02_running.ipynb) plots the historical PMC; [06_race_plan.ipynb](notebooks/06_race_plan.ipynb) projects it forward through race day by converting plan-week km into estimated daily loads (separate `RACE_DAY_TL_PER_KM ≈ 7` for the race itself vs `~11` for training runs — race-day TL/km is empirically lower because ultras spread load over more time).

---

## 4 — Notebook tour

Each notebook bootstraps the same way:

```python
import sys; sys.path.insert(0, '..')
from analysis import open_conn, load_activities, load_wellness, ...
conn = open_conn()
```

| # | Notebook | Covers |
|---|---|---|
| 01 | Overview | Data coverage by table, recent activities, monthly mileage YoY, wellness completeness |
| 02 | Running | Weekly volume + rolling, pace-vs-HR scatter, aerobic-efficiency trend, ACWR, PRs by distance |
| 03 | Recovery | Sleep composition, after-run vs after-rest HRV/RHR, correlations between training and recovery, weekly km vs recovery indicators |
| 04 | Efficiency | YoY m/beat by distance bucket, HR/pace split, polynomial curve fits, easy-runs-only headline view |
| 05 | Intra-run | (1) per-mile decoupling, (1b) per-second decoupling on race FITs, (2) cadence & stride at controlled pace, (3) GPS route clustering, (4) HR-zone time-in-zone (FIT-based) + polarized split |
| 06 | Race plan | Periodised 17-week plan around 30K → 50K → 50-mile, with auto-tracking against actual recorded volume |

Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts. Edit those, then `uv run python notebooks/_build_NN.py` regenerates the .ipynb. Easier than editing 30 KB of JSON.
---

## 5 — Setup

```bash
uv sync   # creates .venv from pyproject.toml
```

Kernel: in VSCode, pick the project's `.venv` (top-right of any notebook). If it's not discoverable:

```bash
uv run python -m ipykernel install --user --name garmin --display-name "garmin (.venv)"
```

### Path A — official export

1. Go to Account Management → Export Your Data and request the export. Garmin emails the link within 24–72 h.
2. Download the zip (several hundred MB to a few GB) or, for the Takeout dump, the UUID-named folder.
3. Ingest:

   ```bash
   uv run ingest_export.py path/to/connect.zip
   # or, if already unzipped:
   uv run ingest_export.py path/to//
   ```

Add `--dry-run` to preview without writing. The data is upserted on `activity_id` / `calendar_date`, so it's safe to re-ingest a refreshed export later.

For Takeout dumps specifically, follow the 4-step extract+ingest+link+precompute sequence in §1.

### Path B — live API sync

Garmin's SSO endpoint is rate-limited; you may hit 429s. If so, wait 30–60 min, switch networks, or use Path A.

```bash
uv run auth.py                     # prompts for email / password / MFA, saves OAuth to .secrets/
uv run sync.py --full              # 365-day backfill
uv run sync.py                     # incremental thereafter (last 14 days)
uv run sync.py --days 90           # custom backfill window
uv run sync.py --no-details        # skip per-activity detail/splits (much faster)
uv run sync.py --skip-activities   # wellness only
```

`garth` is deprecated upstream. If the live sync stops working, fall back to Path A.