
noisedestroyers 6df25bcd8f P0: untrack Takeout dump + scratch DBs, add CI and DB backup script

- .gitignore now excludes the Garmin Takeout dump (0debd9f2-*/, personal
  health data), ruvector.db / agentdb.rvf scratch files, data/fit/ and
  data/backups/; untracked 23,673 files (kept on disk)
- .gitlab-ci.yml runs pytest via uv on every push
- scripts/backup_db.sh snapshots data/garmin.db (keeps last 14)
- canonical DB is garmin/data/garmin.db; the stray vault-root copy
  (Takeout ingest run from wrong cwd on 2026-06-08) is preserved at
  data/backups/vault-root-takeout-ingest-2026-06-08.db

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

2026-06-12 06:14:53 -04:00

.vscode

1.x updates

2026-05-19 08:34:22 -04:00

examples/notebooks

1.x updates

2026-05-19 08:34:22 -04:00

notebooks/.vscode

P0: untrack Takeout dump + scratch DBs, add CI and DB backup script

2026-06-12 06:14:53 -04:00

scripts

P0: untrack Takeout dump + scratch DBs, add CI and DB backup script

2026-06-12 06:14:53 -04:00

src/openrun

saving

2026-06-12 05:48:30 -04:00

tests

saving

2026-06-12 05:48:30 -04:00

.gitignore

P0: untrack Takeout dump + scratch DBs, add CI and DB backup script

2026-06-12 06:14:53 -04:00

.gitlab-ci.yml

P0: untrack Takeout dump + scratch DBs, add CI and DB backup script

2026-06-12 06:14:53 -04:00

.python-version

first commit

2026-05-18 12:53:24 -04:00

analysis.py

1.x updates

2026-05-19 08:34:22 -04:00

auth.py

1.x updates

2026-05-19 08:34:22 -04:00

compute_time_in_zone.py

1.x updates

2026-05-19 08:34:22 -04:00

db.py

1.x updates

2026-05-19 08:34:22 -04:00

ingest_export.py

1.x updates

2026-05-19 08:34:22 -04:00

link_fit_files.py

1.x updates

2026-05-19 08:34:22 -04:00

openrun.toml

1.x updates

2026-05-19 08:34:22 -04:00

pyproject.toml

saving

2026-06-12 05:48:30 -04:00

QUICKSTART.md

saving

2026-06-12 05:48:30 -04:00

README.md

saving

2026-06-12 05:48:30 -04:00

ROADMAP.md

saving

2026-06-12 05:48:30 -04:00

sync.py

1.x updates

2026-05-19 08:34:22 -04:00

uv.lock

saving

2026-06-12 05:48:30 -04:00

README.md

openrun — endurance running analytics

A local-first, open-source endurance-training analysis tool. Garmin data in (live API or official export), SQLite on disk, an in-browser dashboard out — plus a Python API and Jupyter notebooks for power users. No accounts, no cloud, no telemetry. Your data lives in one file on your machine.

What it gives you: Banister CTL/ATL/TSB (fitness/fatigue/form), Pa:HR decoupling per-mile and per-second from FIT, FIT-aware HR zone time-in-zone, GPS route clustering, race-plan projection with forward CTL/ATL/TSB, off-watch activity logging, and a first-run web wizard so non-CLI users can get going in a few minutes.

openrun/
├── pyproject.toml             # packaged as `openrun` (src layout, hatchling)
├── openrun.toml               # your config — created by the setup wizard
├── data/garmin.db             # SQLite, created on first run
├── src/openrun/
│   ├── config.py              # UserProfile, HRZones, BanisterParams + TOML reader/writer
│   ├── db.py                  # schema + connect() helper
│   ├── model.py               # loaders + derived metrics
│   ├── plots.py               # matplotlib plot helpers
│   ├── setup.py               # idempotent workspace bootstrap + status
│   ├── ingest/
│   │   ├── auth.py            # Garmin OAuth login
│   │   ├── garmin_api.py      # Path B: incremental sync via garth
│   │   ├── garmin_export.py   # Path A: parse Connect / Takeout export
│   │   ├── fit_linker.py      # match Takeout FITs by session.start_time
│   │   ├── manual.py          # CSV → manual_activities (off-watch sessions)
│   │   └── time_in_zone.py    # per-activity TIZ cache
│   └── web/                   # Streamlit app
│       ├── app.py             # controller — st.navigation builds the sidebar
│       ├── _helpers.py        # cached loaders + sidebar chrome
│       └── pages/             # one file per page
└── examples/notebooks/        # original analysis notebooks (kept as reference)

TL;DR — install and use

uv sync
uv run openrun-web      # opens http://localhost:8501

A first-launch wizard walks you through profile + HR zones + race calendar, then either ingests a Garmin export zip or takes you to manual logging.

See QUICKSTART.md for the step-by-step.

The two interfaces

Web app (recommended for everyone)

uv run openrun-web

Eight pages, all wired to the same SQLite DB:

Page	What's on it
📊 Dashboard	CTL/ATL/TSB tiles + chart, weekly km + ACWR, polarized split, recent runs
📋 Activities	Filter by type / date / distance / name → click a row to drill in
🏃 Activity detail	Splits + pace-vs-HR scatter, time-in-zone bars, FIT decoupling chart
📈 Race plan	Editable plan rows, projected PMC through race day, race-day TSB table
📝 Manual log	Form-based logging of off-watch sessions (strength, hikes, unrecorded runs)
🛌 Recovery	Sleep stages, HRV + RHR overlay, training→next-morning-HRV correlation
🫀 Efficiency	Metres per heartbeat with 30-run rolling median, distance × year heatmap
🔄 Sync	Status + in-browser zip ingest with live progress

CLI (for scripting / power users)

openrun-init                  # bootstrap + status dashboard
openrun-auth                  # Garmin OAuth login (live sync)
openrun-sync [--full]         # incremental Garmin Connect sync (downloads per-second FIT for new activities)
openrun-sync --fit-backfill --fit-type running  # pull per-second FITs for past runs, no website export
openrun-sync --no-fit         # sync without the per-second FIT download
openrun-ingest <path|zip>     # parse a Garmin export
openrun-link-fit <export>     # match Takeout FITs by start time
openrun-link-fit <root> --relink  # rewrite paths after moving the export
openrun-time-in-zone          # populate the TIZ cache
openrun-import-manual <csv>   # bulk off-watch import
openrun-web                   # launch the Streamlit app

Python API (for notebooks / one-offs)

from openrun import open_conn, load_activities, banister, daily_training_load_series

conn = open_conn()
runs = load_activities(conn, type="running")
pmc = banister(daily_training_load_series(conn, include_manual=True))

1 — Source data

Two paths in, intentionally overlapping

Path	Tool	What you get	When to use
A. Official data export (recommended for first load)	`openrun-ingest`	Complete history, FIT files, multi-year wellness	First load; periodic refreshes
B. Live API sync	`openrun-auth` then `openrun-sync` — or the web app's Sync page	Last N days, lap-level splits, per-second FIT files (downloaded via the API)	Incremental top-ups between exports

The upsert logic only overwrites the raw-JSON column and fetched_at on conflict, so Path B values aren't clobbered by Path A re-ingests.
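The conflict rule above can be illustrated with a minimal SQLite sketch. The table below is a hypothetical, cut-down stand-in for `activities` (the real schema is in db.py), and the statement is an assumption about how such an upsert could be written, not a copy of the project's SQL.

```python
import sqlite3

# Hypothetical, reduced table; the real schema lives in db.py.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities ("
    " activity_id INTEGER PRIMARY KEY,"
    " distance_m REAL, raw TEXT, fetched_at TEXT)"
)

# On conflict, only raw and fetched_at are refreshed; parsed columns are kept.
UPSERT = """
INSERT INTO activities (activity_id, distance_m, raw, fetched_at)
VALUES (?, ?, ?, ?)
ON CONFLICT(activity_id) DO UPDATE SET
    raw = excluded.raw,
    fetched_at = excluded.fetched_at
"""

# First write (for example from the live API) sets every column.
conn.execute(UPSERT, (1, 10000.0, '{"src": "api"}', "2026-06-01"))
# A later re-ingest of the same id leaves distance_m untouched.
conn.execute(UPSERT, (1, 9999.0, '{"src": "export"}', "2026-06-10"))

row = conn.execute(
    "SELECT distance_m, raw, fetched_at FROM activities WHERE activity_id = 1"
).fetchone()
```

After both writes, `row` still carries the first `distance_m` alongside the newer `raw` and `fetched_at`.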

No terminal required. The web app's Sync page now logs in to Garmin (email/password, plus an MFA code when the account needs one) and runs the same pull in-process behind a 🔄 Sync now button — with checkboxes for full backfill, per-second FIT download, and FIT backfill. Credentials go straight to Garmin; only OAuth tokens are cached in .secrets/. The CLI (openrun-auth + openrun-sync) does the identical thing for scripting.

Per-second from the API. openrun-sync downloads each new activity's original FIT (/download-service/files/activity/{id}) into data/fit/<id>.fit and links it in activity_fit_files — the same table the export-based linker writes, so decoupling, FIT-based time-in-zone, and the route map all work without a website export. Skip it with --no-fit. To pull per-second history for activities synced before this existed, run openrun-sync --fit-backfill [--fit-type running] [--fit-limit N], then openrun-time-in-zone to refresh the per-second TIZ cache. Downloads are idempotent — skipped when the FIT is already on disk and linked.

Garmin's two export formats

Garmin has two unrelated formats both called "the export":

Garmin Connect data export (connect.zip) — request via Account Management → Export Your Data. Filenames are <activity_id>_<activity_name>.fit, SI units throughout. The web app's Sync page accepts this zip directly.
Garmin Takeout dump — UUID-named folder (<uuid>_N/) with many domains (aviation, Tacx, InReach…). Running data lives under DI_CONNECT/. Different conventions:
- FIT filenames: <email>_<upload_id>.fit — upload IDs ≠ activity IDs
- summarizedActivities uses scaled-integer units: distance cm, duration ms, elevation cm, speed m/s × 0.1
- openrun-ingest detects and converts these; openrun-link-fit links FITs by content (parses session.start_time and matches activities by ±60 s) since the filename trick doesn't work.

For Takeout dumps, the full sequence is:

mkdir -p <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit
unzip -d <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit \
   <export>/DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip

uv run openrun-ingest <export>/
uv run openrun-link-fit <export>/
uv run openrun-time-in-zone
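As a rough illustration of the content-based linking that openrun-link-fit performs, the sketch below matches one FIT start time against known activity start times within ±60 s. The function name, the dict input, and the "closest start wins" tie-break are assumptions made for this example, not the project's actual code.

```python
from datetime import datetime


def match_fit_to_activity(fit_start, activity_starts, tolerance_s=60):
    """Return the activity id whose start is nearest fit_start, or None.

    activity_starts maps activity_id -> start datetime (hypothetical shape).
    Only starts within tolerance_s seconds are considered.
    """
    best_id, best_gap = None, None
    for activity_id, start in activity_starts.items():
        gap = abs((start - fit_start).total_seconds())
        if gap <= tolerance_s and (best_gap is None or gap < best_gap):
            best_id, best_gap = activity_id, gap
    return best_id


starts = {
    101: datetime(2025, 5, 1, 7, 0, 0),
    102: datetime(2025, 5, 1, 18, 30, 0),
}
linked = match_fit_to_activity(datetime(2025, 5, 1, 7, 0, 42), starts)
unlinked = match_fit_to_activity(datetime(2025, 5, 1, 12, 0, 0), starts)
```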

Off-watch activities

Recorded Garmin activities aren't the whole picture: strength work, hikes, runs you forgot to record, and group runs on someone else's watch all count too. The web app's Manual log writes to a separate manual_activities table; loaders union it into the PMC when you pass include_manual=True (the Dashboard does this by default).

For bulk import, drop a CSV with activity_date,activity_type,distance_km,duration_min,training_load,notes[,external_id] and run uv run openrun-import-manual workouts.csv. The optional external_id makes re-imports idempotent.

Units worth knowing (Connect vs Takeout)

Field	Live API / Connect	Takeout `summarizedActivities`	Convert by
distance	m	cm	× 0.01
duration / movingDuration	s	ms	× 0.001
elevationGain / Loss	m	cm	× 0.01
averageSpeed / maxSpeed	m/s	m/s × 0.1	× 10
HR (avg, max), calories, training_effect, vo2_max	bpm / kcal / scalar	—	—
cadence (`averageRunCadence`)	both-legs SPM	both-legs SPM	— (do not double)
strideLength	cm	cm	× 0.01 for m
FIT `position_lat/long`	semicircles	semicircles	× (180 / 2³¹) for degrees
FIT `enhanced_speed`	m/s	m/s	—

The cadence trap is real: averageRunCadence is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+, which trips any sane filter.
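The conversion factors in the table can be applied mechanically. The following is a minimal sketch, assuming a Takeout summary arrives as a plain dict keyed by the field names above; the helper names are hypothetical and this is not the converter inside openrun-ingest.

```python
# Scale factors taken from the units table (Takeout -> SI).
TAKEOUT_SCALE = {
    "distance": 0.01,         # cm -> m
    "duration": 0.001,        # ms -> s
    "movingDuration": 0.001,  # ms -> s
    "elevationGain": 0.01,    # cm -> m
    "elevationLoss": 0.01,    # cm -> m
    "averageSpeed": 10.0,     # (m/s x 0.1) -> m/s
    "maxSpeed": 10.0,         # (m/s x 0.1) -> m/s
}


def takeout_to_si(summary):
    """Return a copy of summary with the scaled fields converted to SI."""
    out = dict(summary)
    for key, factor in TAKEOUT_SCALE.items():
        if out.get(key) is not None:
            out[key] = out[key] * factor
    return out  # cadence and HR pass through unchanged


def semicircles_to_degrees(semicircles):
    """Convert a FIT position_lat/long value to degrees."""
    return semicircles * (180.0 / 2**31)


converted = takeout_to_si(
    {"distance": 1_000_000, "duration": 3_600_000, "averageRunCadence": 170}
)
```

Note that `averageRunCadence` is deliberately absent from the scale table, so it is never doubled.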

2 — Schema

db.py:SCHEMA is the source of truth; what follows is the gist.

Table	Grain	Origin	Notes
`activities`	one row per activity	both paths	`raw` JSON has every Garmin field; parsed columns are a curated subset
`activity_splits`	one row per lap	Path B only	Auto-laps (~1 km or 1 mi). `raw` has cadence, stride, GPS bounds
`activity_fit_files`	one row per linked FIT	linker	`fit_path` is stored absolute at link time. Move the export → run `openrun-link-fit --relink`
`activity_time_in_zone`	one row per activity	precomputed	`source = 'fit'` (per-second) or `'lap'` (split-average)
`manual_activities`	one row per logged session	web form / CSV	Parallel to `activities`, never clobbered by Garmin re-ingest
`race_plan`	one row per ISO-Monday week	web editor	Feeds `banister_forecast` for projected PMC
`daily_*` (steps / sleep / stress / hrv / body_battery / intensity_minutes / resting_hr)	one row per date	both paths	Each has a `raw` JSON column with the full payload
`sync_state`	key/value	both paths	last-sync / last-ingest timestamps

The raw column on every table is the full upstream JSON. If you need a field the schema doesn't surface, unpack it from raw:

from openrun import open_conn, load_activities, expand_raw
acts = load_activities(open_conn(), type="running")
fields = expand_raw(acts)  # pd.json_normalize'd companion frame

3 — Metrics & how they're calculated

All loaders and helpers live in src/openrun/model.py. Highlights:

Pace

pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)

Implausible paces (faster than 3 min/km or slower than 30 min/km; the bounds sit near world-record marathon pace at one end and walks with stops at the other) are masked to NaN in load_activities rather than dropped, so the row count is preserved.
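A scalar sketch of the pace formula and the masking rule follows. The function name is hypothetical, and treating the 3 and 30 min/km bounds as inclusive is an assumption; load_activities may handle the edges differently.

```python
import math


def pace_min_per_km(moving_duration_s, distance_m, lo=3.0, hi=30.0):
    """Pace in min/km, or NaN when the result falls outside [lo, hi]."""
    if not distance_m or distance_m <= 0:
        return math.nan
    pace = (moving_duration_s / 60) / (distance_m / 1000)
    # Mask instead of raising, so a caller can keep the row.
    return pace if lo <= pace <= hi else math.nan


steady = pace_min_per_km(3000, 10000)   # 50 minutes for 10 km
glitch = pace_min_per_km(600, 10000)    # 1 min/km, implausible
```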

Acute:Chronic Workload Ratio (ACWR)

acute   = sum(training_load) over the last 7 days
chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
ACWR    = acute / chronic

Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Surfaced on the Dashboard.
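The ratio can be sketched on a plain list of daily loads. This assumes one value per calendar day, oldest first, with rest days already filled with 0; the function name is hypothetical.

```python
def acwr(daily_load):
    """Acute:chronic workload ratio from daily loads, oldest first."""
    if len(daily_load) < 28:
        raise ValueError("need at least 28 days of history")
    acute = sum(daily_load[-7:])
    chronic = sum(daily_load[-28:]) / 4  # week-equivalent of the last 28 days
    return acute / chronic if chronic else float("nan")


flat = acwr([50] * 28)                 # constant load
ramp = acwr([50] * 21 + [100] * 7)     # load doubled in the last week
```

With a constant load the ratio sits at 1; doubling the final week pushes it past the 1.5 warning level.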

Aerobic efficiency — "m/beat"

m_per_beat = distance_m / (duration_s * avg_hr / 60)

Higher = more metres per heartbeat = fitter aerobic system. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. The Efficiency page does this with a 30-run rolling median plus a distance-bucket × year heatmap and an "easy runs only" headline view (HR < 75 % LTHR).

Decoupling (Pa:HR drift)

Friel's method, two resolutions. Positive = pace/HR fell off in the second half (cardiac drift, possibly fuelling deficit). Negative = improved (negative split).

Per-mile (lap level) — decoupling(splits, min_splits=6):

For each activity with ≥ min_splits laps:
    halves   = first / second by lap count
    eff_half = duration-weighted mean(speed_mps) / mean(heart_rate)
    decoupling_pct = (eff_first / eff_second − 1) × 100
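The lap-level steps above can be sketched in plain Python. Assumptions for this example: splits arrive as (duration_s, speed_mps, heart_rate) tuples in lap order, heart rate is averaged unweighted, and an odd lap count puts the extra lap in the second half. The function name is hypothetical.

```python
def lap_decoupling(splits, min_splits=6):
    """Pa:HR decoupling in percent from lap splits, or None if too few laps."""
    if len(splits) < min_splits:
        return None
    mid = len(splits) // 2

    def efficiency(half):
        total_s = sum(d for d, _, _ in half)
        # Duration-weighted mean speed over the half.
        speed = sum(d * s for d, s, _ in half) / total_s
        hr = sum(h for _, _, h in half) / len(half)
        return speed / hr

    return (efficiency(splits[:mid]) / efficiency(splits[mid:]) - 1) * 100


# Same pace throughout, heart rate 10 % higher in the second half.
laps = [(300, 3.0, 140)] * 3 + [(300, 3.0, 154)] * 3
drift = lap_decoupling(laps)
```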

Per-second (FIT) — fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5):

records = per-second FIT messages
drop first warmup_min and last cooldown_min
drop records where speed_mps < min_speed_mps     # ignore stopped time
slice remaining moving time into N equal-time chunks
for each chunk:
    efficiency = mean(speed_mps) / mean(heart_rate)
decoupling_pct[i] = (efficiency[0] / efficiency[i] − 1) × 100    # for each later chunk i, relative to the first
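A minimal per-second sketch of the same procedure is shown here. It assumes records are (elapsed_s, speed_mps, heart_rate) tuples at one sample per second, so equal-count chunks stand in for equal-time chunks, and it reports each chunk relative to the first. The name `fit_decoupling_sketch` is hypothetical and this is not the library's `fit_decoupling`.

```python
def fit_decoupling_sketch(records, segments=2, warmup_min=5,
                          cooldown_min=2, min_speed_mps=0.5):
    """Per-chunk decoupling (percent) relative to the first chunk."""
    if not records:
        return []
    start, end = records[0][0], records[-1][0]
    # Trim warm-up and cool-down, then drop stopped time.
    kept = [
        (speed, hr)
        for t, speed, hr in records
        if start + warmup_min * 60 <= t <= end - cooldown_min * 60
        and speed >= min_speed_mps
    ]
    size = len(kept) // segments
    if size == 0:
        return []
    eff = []
    for i in range(segments):
        chunk = kept[i * size:(i + 1) * size]
        mean_speed = sum(s for s, _ in chunk) / size
        mean_hr = sum(h for _, h in chunk) / size
        eff.append(mean_speed / mean_hr)
    return [(eff[0] / e - 1) * 100 for e in eff]


# One hour at constant speed; heart rate steps up 10 % at the midpoint
# of the retained window (t = 1890 s).
hour = [(t, 3.0, 140 if t < 1890 else 154) for t in range(3600)]
drift_by_chunk = fit_decoupling_sketch(hour, segments=2)
```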

Friel's thresholds (interpret on steady aerobic efforts only):

< 5 % — aerobically developed
5–10 % — sustainable
> 10 % — pacing unsustainable for the distance, OR fueling shortfall, OR heat

The Activity-detail page plots this with plot_fit_decoupling.

HR zone time-in-zone

Zone boundaries come from [user.hr_zones] in your openrun.toml (matching what Garmin's heartRateZones.json advertises). Two implementations; openrun-time-in-zone picks the better one per activity and caches it.

From FIT (source='fit') — time_in_zone_from_fit(records): per-second; large gaps (>30 s, paused recording) are clipped to 30 s.

From splits (source='lap') — fallback when no FIT is linked: assigns each split's avg HR to a single zone. Biased toward the middle zone (smooths over within-lap variation); the FIT version is ground truth where available.
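The per-second variant with gap clipping can be sketched as follows. Assumptions: samples are (timestamp_s, heart_rate) pairs sorted by time, each sample's heart rate holds until the next sample, and zone bounds are inclusive as in the sample openrun.toml. The function name is hypothetical.

```python
def time_in_zone(samples, zones, max_gap_s=30):
    """Seconds per zone; gaps longer than max_gap_s count as max_gap_s."""
    totals = {name: 0.0 for name in zones}
    for (t0, hr), (t1, _) in zip(samples, samples[1:]):
        dt = min(t1 - t0, max_gap_s)  # clip paused-recording gaps
        for name, (lo, hi) in zones.items():
            if lo <= hr <= hi:
                totals[name] += dt
                break
    return totals


zones = {"Z1": (100, 120), "Z2": (121, 140)}
# The 300 s jump between t=2 and t=302 simulates a paused recording.
samples = [(0, 110), (1, 110), (2, 130), (302, 130), (303, 130)]
tiz = time_in_zone(samples, zones)
```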

Polarized split

easy     = Z1 + Z2     (recovery + easy aerobic)
moderate = Z3          (tempo)
hard     = Z4 + Z5     (threshold + VO₂)

Seiler's polarized target is roughly 80 % easy and 20 % moderate-plus-hard, with the moderate share kept under 10 %. The Dashboard shows your last-12-week split as tiles plus a stacked bar.
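Given per-zone seconds, the three-bucket split reduces to a few sums. A small sketch, assuming a dict keyed Z1 to Z5 (the function name is hypothetical):

```python
def polarized_split(zone_seconds):
    """Return easy / moderate / hard shares in percent of total zone time."""
    easy = zone_seconds.get("Z1", 0) + zone_seconds.get("Z2", 0)
    moderate = zone_seconds.get("Z3", 0)
    hard = zone_seconds.get("Z4", 0) + zone_seconds.get("Z5", 0)
    total = easy + moderate + hard
    if total == 0:
        return {"easy": 0.0, "moderate": 0.0, "hard": 0.0}
    return {
        "easy": 100 * easy / total,
        "moderate": 100 * moderate / total,
        "hard": 100 * hard / total,
    }


split = polarized_split(
    {"Z1": 3000, "Z2": 5000, "Z3": 500, "Z4": 1000, "Z5": 500}
)
```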

Banister fitness / fatigue / form (PMC)

The standard endurance lens (TrainingPeaks PMC), in banister(daily_load, ctl_tau=42, atl_tau=7):

CTL_today = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))   τ_CTL = 42d
ATL_today = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))   τ_ATL =  7d
TSB_today = CTL_yesterday − ATL_yesterday    # yesterday's values (TP convention)

daily_training_load_series(conn, *, include_manual=True) is the canonical input — it unions Garmin-recorded TL with manual_activities. Rest days are filled with 0; both EWMAs still update.
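The recursion can be written out in a few lines. This sketch assumes a plain list of daily loads (rest days as 0) and seeds CTL and ATL at 0, which is an assumption; the name `banister_sketch` is hypothetical and it returns tuples rather than whatever structure the library's `banister` produces.

```python
import math


def banister_sketch(daily_load, ctl_tau=42.0, atl_tau=7.0):
    """Return a list of (ctl, atl, tsb) tuples, one per day."""
    k_ctl = math.exp(-1 / ctl_tau)
    k_atl = math.exp(-1 / atl_tau)
    ctl = atl = 0.0  # assumed starting point
    rows = []
    for load in daily_load:
        tsb = ctl - atl  # uses yesterday's CTL and ATL
        ctl = ctl * k_ctl + load * (1 - k_ctl)
        atl = atl * k_atl + load * (1 - k_atl)
        rows.append((ctl, atl, tsb))
    return rows


pmc_rows = banister_sketch([100] * 400)
```

Under a constant load both averages converge toward that load, while TSB turns negative early on because ATL responds faster than CTL.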

TSB interpretation (used across the app):

TSB	meaning
< −30	severely fatigued — injury risk
−10 to −30	productive overload — heart of a build
−10 to 0	balanced building
0 to +10	sharpening
+10 to +25	fresh / peaked — race-day target
> +25	detrained (taper too long)

banister_forecast(history, future, *, today=None) splices historical load with planned future load and runs the same recursion forward, so the Race-plan page can project CTL/ATL/TSB through race day. The planned-future series comes from plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates), which converts your weekly plan rows into per-day load using a long-run-Saturday-heavy distribution (Sat 40 %, Tue 20 %, Thu 20 %, Mon 10 %, Wed 10 %); race weeks treat race day specially.
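The weekday distribution can be illustrated for a single non-race week. This sketch assumes the week starts on a Monday and that weekly load is simply weekly km times tl_per_km; race-week handling is left out because its rule is not spelled out here. The function name is hypothetical.

```python
from datetime import date, timedelta

# Share of weekly load per weekday (Mon=0 ... Sun=6); Fri and Sun rest.
WEEKDAY_SHARE = {0: 0.10, 1: 0.20, 2: 0.10, 3: 0.20, 5: 0.40}


def week_to_daily_load(week_start, weekly_km, tl_per_km):
    """Spread one plan week over its seven dates."""
    if week_start.weekday() != 0:
        raise ValueError("week_start must be a Monday")
    weekly_tl = weekly_km * tl_per_km
    return {
        week_start + timedelta(days=d): weekly_tl * WEEKDAY_SHARE.get(d, 0.0)
        for d in range(7)
    }


week = week_to_daily_load(date(2026, 6, 8), weekly_km=50, tl_per_km=2.0)
```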

calibrate_tl_per_km(conn) reports the empirical median + IQR of training_load / km from your history — that's the number to feed into the plan, rather than a hard-coded constant.

Personal records

personal_records(activities, distance_bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05) picks the fastest run within ±5 % of each bin distance.
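A compact sketch of the bin-and-pick logic, assuming runs are (distance_km, duration_s) tuples and that "fastest" means lowest time per km (an assumption). The name `personal_records_sketch` is hypothetical and does not mirror the library's input or return types.

```python
def personal_records_sketch(runs,
                            distance_bins_km=(5, 10, 21.0975, 42.195, 50),
                            tolerance=0.05):
    """Map each bin distance to its fastest run within +/- tolerance."""
    records = {}
    for target in distance_bins_km:
        candidates = [
            run for run in runs
            if abs(run[0] - target) <= tolerance * target
        ]
        if candidates:
            # Lowest seconds-per-km wins.
            records[target] = min(candidates, key=lambda r: r[1] / r[0])
    return records


runs = [(5.0, 1500), (5.1, 1479), (10.0, 3000), (7.0, 2100)]
prs = personal_records_sketch(runs)
```

The 7 km run falls outside every bin and is ignored; bins with no matching run are simply absent from the result.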

GPS route clustering

cluster_routes(lats, lons, radius_km=0.25) performs greedy haversine-radius clustering of run starts. Per-route pace trends control for terrain (the same loop in 2023 vs 2025 says more about fitness than raw monthly pace). The greedy pass is adequate for hundreds of starts; switching to DBSCAN for thousands remains an open TODO.
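One way to read "greedy haversine-radius clustering" is sketched below: each start joins the first existing cluster whose seed point lies within the radius, otherwise it seeds a new cluster. Using the first point as a fixed cluster centre is an assumption, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    r = 6371.0088  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_starts(lats, lons, radius_km=0.25):
    """Return one integer cluster label per start point."""
    centers, labels = [], []
    for lat, lon in zip(lats, lons):
        for idx, (clat, clon) in enumerate(centers):
            if haversine_km(lat, lon, clat, clon) <= radius_km:
                labels.append(idx)
                break
        else:
            # No existing cluster is close enough: seed a new one.
            centers.append((lat, lon))
            labels.append(len(centers) - 1)
    return labels


labels = cluster_starts([40.0, 40.001, 40.1, 40.0005], [-75.0] * 4)
```

Each new point is compared against every existing centre, which is why this approach suits hundreds of starts better than thousands.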

4 — Reference notebooks (kept around for power-user analysis)

Each notebook under examples/notebooks/ bootstraps the same way:

from openrun import open_conn, load_activities, ...
conn = open_conn()

#	Notebook	Covers	Web equivalent
01	Overview	Data coverage, recent activities	Dashboard
02	Running	Weekly volume, pace-HR, PMC, PRs	Dashboard + Activities
03	Recovery	Sleep composition, HRV, RHR, body battery	Recovery
04	Efficiency	YoY m/beat	Efficiency
05	Intra-run	Decoupling, cadence, route clusters, TIZ	Activity detail
06	Race plan	Periodised plan + projected PMC	Race plan

Notebooks 05 and 06 are generated from _build_05.py / _build_06.py build scripts — edit those, then uv run python examples/notebooks/_build_NN.py regenerates the .ipynb. They're kept as a reference for the math the web app implements.

5 — Setup

uv sync                          # creates .venv from pyproject.toml
uv run openrun-web               # launch the web app

The first time you open the app, a setup wizard collects your profile + HR zones, optionally takes a race calendar, and offers to ingest a Garmin export zip in-browser. Your config is written to openrun.toml in the current directory — edit by hand any time.

If you'd rather work from the CLI directly:

uv run openrun-init              # bootstrap (no input required)
uv run openrun-ingest <export>   # or
uv run openrun-auth              # OAuth login (Path B)
uv run openrun-sync --full       # 365-day backfill

garth (the unofficial third-party Garmin Connect client that live sync is built on) is deprecated upstream (https://github.com/matin/garth/discussions/222). If the live sync ever stops working, fall back to Path A.

6 — Configuration

openrun.toml — created by the setup wizard, hand-editable.

[user]
name        = "Athlete"
hr_max      = 200
lthr        = 175
resting_hr  = 50

[user.hr_zones]
Z1 = [100, 120]
Z2 = [121, 140]
Z3 = [141, 160]
Z4 = [161, 180]
Z5 = [181, 200]

[db]
path = "data/garmin.db"

[banister]
ctl_tau_days       = 42.0
atl_tau_days       = 7.0
race_day_tl_per_km = 7.0

[[races]]
label = "wk 4 — 30K"
date  = "2026-06-13"

Discovery walks up from cwd looking for openrun.toml, falls back to $OPENRUN_CONFIG, then ~/.config/openrun/config.toml, then built-in defaults.
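The lookup order can be sketched with pathlib. The function name and its injectable `env` and `home` parameters are hypothetical conveniences for this example; returning None stands in for "use built-in defaults".

```python
import os
from pathlib import Path


def find_config(cwd=None, env=None, home=None):
    """Return the first config path found, or None for built-in defaults."""
    cwd = Path(cwd or Path.cwd()).resolve()
    env = os.environ if env is None else env
    home = Path(home) if home else Path.home()

    # 1. Walk up from the working directory.
    for folder in (cwd, *cwd.parents):
        candidate = folder / "openrun.toml"
        if candidate.is_file():
            return candidate

    # 2. Explicit override via environment variable.
    override = env.get("OPENRUN_CONFIG")
    if override and Path(override).is_file():
        return Path(override)

    # 3. Per-user config file.
    fallback = home / ".config" / "openrun" / "config.toml"
    return fallback if fallback.is_file() else None
```

Calling `find_config()` from a nested project directory returns the nearest openrun.toml above it, if one exists.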

7 — Tests

uv run pytest

tests/unit/ covers the pure-function surface (loaders, derivations, helpers). tests/integration/ covers the ingest paths and cross-cutting behavior; the tmp_conn fixture in tests/conftest.py builds an in-memory schema for it. The Streamlit pages are smoke-tested via streamlit.testing.v1.AppTest. The test invocations are listed in the relevant sections of ROADMAP.md.
