Files
openrun/examples/notebooks/_build_05.py
2026-05-19 08:34:22 -04:00

700 lines
33 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
"""One-shot builder for notebooks/05_intra_run.ipynb.
Run with: uv run python notebooks/_build_05.py
"""
from __future__ import annotations
import json
from pathlib import Path
def md(*lines: str) -> dict:
return {"cell_type": "markdown", "metadata": {}, "source": _join(lines)}
def code(*lines: str) -> dict:
return {
"cell_type": "code",
"execution_count": None,
"metadata": {},
"outputs": [],
"source": _join(lines),
}
def _join(lines):
text = "\n".join(lines)
# Jupyter expects a list where each entry ends with \n except possibly the last.
parts = text.split("\n")
return [p + ("\n" if i < len(parts) - 1 else "") for i, p in enumerate(parts)]
cells: list[dict] = []
cells.append(md(
"# 05 — Intra-run dynamics",
"",
"Within-run signals from lap-level splits: cardiac drift, cadence/stride, route-controlled pace, HR-zone distribution.",
"",
"**Data note.** This project's sync was via the Garmin live API (`sync.py`), not the official zip export, so `activity_fit_files` is empty and per-second FIT data isn't available. Everything here runs on `activity_splits` (per-mile laps, ~2 000 rows). When FIT files arrive via `ingest_export.py`, these same analyses upgrade to per-second resolution — only the loader needs to change.",
))
cells.append(code(
"import sys",
"sys.path.insert(0, '..')",
"import numpy as np",
"import pandas as pd",
"import matplotlib.pyplot as plt",
"from openrun import (",
" open_conn, load_splits, decoupling,",
" assign_hr_zone, cluster_routes, haversine_km,",
")",
"",
"conn = open_conn()",
"splits = load_splits(conn) # running only by default",
"print(f'{len(splits):,} splits across {splits.activity_id.nunique():,} runs, "
"{splits.start_time_local.min().date()} → {splits.start_time_local.max().date()}')",
))
# ----- Section 1: cardiac drift -----
cells.append(md(
"## 1. Cardiac drift (Pa:Hr decoupling)",
"",
"Within a single run, divide the laps into first half and second half. For each half compute the duration-weighted ratio `speed / HR` — essentially \"pace per heartbeat,\" the gold-standard aerobic-fitness index. Decoupling = how much that ratio falls between halves.",
"",
"$$\\text{decoupling}\\;\\% = \\left(\\frac{(\\text{speed}/\\text{HR})_{1st}}{(\\text{speed}/\\text{HR})_{2nd}} - 1\\right) \\times 100$$",
"",
"Friel's rule of thumb for steady aerobic runs:",
"- **< 5 %** — aerobically developed",
"- **510 %** — moderate drift; sustainable",
"- **> 10 %** — significant drift; pace was unsustainable or it was a hot/hard day",
"",
"Negative values mean you ran *more efficiently* in the second half (a negative split or conservative opener).",
))
cells.append(code(
"dec = decoupling(splits, min_splits=6)",
"print(f'{len(dec)} runs with ≥ 6 splits')",
"dec['decoupling_pct'].describe().round(2)",
))
cells.append(code(
"fig, axes = plt.subplots(1, 2, figsize=(13, 4.5))",
"",
"ax = axes[0]",
"ax.hist(dec['decoupling_pct'], bins=30, edgecolor='white')",
"for x, lab, color in [(5, 'good <5%', '#2a9d8f'), (10, 'caution 10%', '#e76f51')]:",
" ax.axvline(x, color=color, ls='--', lw=1.2, label=lab)",
"ax.axvline(0, color='gray', lw=0.8)",
"ax.set_xlabel('decoupling (%)'); ax.set_ylabel('runs')",
"ax.set_title(f'Pa:Hr decoupling distribution (n={len(dec)})')",
"ax.legend()",
"",
"ax = axes[1]",
"sc = ax.scatter(dec['start_time_local'], dec['decoupling_pct'],",
" c=dec['distance_km'], cmap='viridis', alpha=0.75, s=28)",
"ax.axhline(5, color='#2a9d8f', ls='--', lw=1)",
"ax.axhline(10, color='#e76f51', ls='--', lw=1)",
"ax.axhline(0, color='gray', lw=0.6)",
"ax.set_ylabel('decoupling (%)'); ax.set_title('decoupling over time (color = distance km)')",
"plt.colorbar(sc, ax=ax, label='distance (km)')",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
cells.append(md(
"### Aerobic runs only — the clean view",
"",
"Decoupling is only interpretable on **steady aerobic** efforts. Filter to runs ≥ 8 km with avg HR < 165 (well below threshold for a sub-3:30 marathoner) and look at the trend. Less drift over time = better aerobic conditioning.",
))
cells.append(code(
"aerobic = dec[(dec['distance_km'] >= 8) & (dec['avg_hr'] < 165)].copy()",
"aerobic['quarter'] = aerobic['start_time_local'].dt.to_period('Q').dt.to_timestamp()",
"print(f'{len(aerobic)} aerobic runs')",
"",
"fig, ax = plt.subplots(figsize=(11, 4.5))",
"ax.scatter(aerobic['start_time_local'], aerobic['decoupling_pct'],",
" c=aerobic['avg_hr'], cmap='magma_r', s=40, alpha=0.85)",
"",
"# rolling median (smooth trend)",
"aerobic_sorted = aerobic.sort_values('start_time_local')",
"rolling = aerobic_sorted.set_index('start_time_local')['decoupling_pct'].rolling('120D', min_periods=5).median()",
"ax.plot(rolling.index, rolling.values, color='black', lw=2, label='120-day rolling median')",
"",
"ax.axhline(5, color='#2a9d8f', ls='--', lw=1)",
"ax.axhline(10, color='#e76f51', ls='--', lw=1)",
"ax.set_ylabel('decoupling (%)')",
"ax.set_title('Cardiac drift on aerobic runs (≥8 km, avg HR < 165)')",
"ax.legend(loc='upper right')",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
cells.append(code(
"# Year-over-year aerobic decoupling summary",
"aerobic.groupby('year').agg(",
" n=('decoupling_pct', 'size'),",
" median_drift=('decoupling_pct', 'median'),",
" mean_drift=('decoupling_pct', 'mean'),",
" pct_under_5=('decoupling_pct', lambda s: (s < 5).mean() * 100),",
").round(2)",
))
# ----- Section 1b: per-second race-day decoupling from FIT files -----
cells.append(md(
"## 1b. Per-second decoupling — race-day deep dive",
"",
"Lap-level decoupling (above) is coarse. With FIT files linked (since the takeout-export ingest), we can read the per-second `heart_rate` and `enhanced_speed` directly and compute Friel's decoupling without the noise from aid-station stops and lap rounding.",
"",
"**Method:**",
"1. Drop the first 5 min (warmup) and last 2 min (cooldown / finish sprint).",
"2. Drop records with speed < 0.5 m/s — aid-station pauses don't drag the mean.",
"3. Slice the moving time into equal-time chunks (halves or quartiles).",
"4. For each chunk: `efficiency = mean(speed) / mean(HR)`.",
"5. `decoupling % = (eff_first / eff_chunk 1) × 100` — positive = drift.",
"",
"Friel's rule: < 5% on a steady aerobic run = aerobically developed; > 10% = unsustainable pacing or fueling deficit. Race-day numbers are expected to be higher than training (you push the back half), but *how much* higher matters.",
))
cells.append(code(
"from openrun import load_fit_records, fit_decoupling, fit_rolling_efficiency",
"",
"race_meta = pd.read_sql('''",
" SELECT a.activity_id, a.start_time_local, a.distance_m/1000 AS km, a.avg_hr",
" FROM activities a JOIN activity_fit_files f USING(activity_id)",
" WHERE a.distance_m >= 45000 AND a.distance_m <= 60000",
" AND a.activity_type='running'",
" ORDER BY a.start_time_local",
"''', conn, parse_dates=['start_time_local'])",
"print(f'{len(race_meta)} prior 50K-class races with FIT linked:')",
"race_meta",
))
cells.append(code(
"# Load all four FITs once, cache the records frames",
"race_records = {}",
"for _, r in race_meta.iterrows():",
" aid = int(r['activity_id'])",
" race_records[aid] = load_fit_records(conn, aid)",
" print(f\" {r['start_time_local'].date()} aid={aid} records={len(race_records[aid]):,}\")",
))
cells.append(md(
"### Halves and quartiles — when does the drift start?",
))
cells.append(code(
"halves = []",
"quarts = []",
"for _, r in race_meta.iterrows():",
" aid = int(r['activity_id'])",
" h = fit_decoupling(race_records[aid], segments=2)",
" h.insert(0, 'race', r['start_time_local'].date())",
" halves.append(h)",
" q = fit_decoupling(race_records[aid], segments=4)",
" q.insert(0, 'race', r['start_time_local'].date())",
" quarts.append(q)",
"halves_df = pd.concat(halves, ignore_index=True)",
"quarts_df = pd.concat(quarts, ignore_index=True)",
"",
"print('Per-race half-by-half decoupling:')",
"(halves_df.pivot(index='race', columns='segment', values='decoupling_pct')",
" .round(1).rename(columns={1:'Q1+Q2', 2:'Q3+Q4'}))",
))
cells.append(code(
"print('Quartile decoupling — where the drift actually starts:')",
"(quarts_df.pivot(index='race', columns='segment', values='decoupling_pct')",
" .round(1).rename(columns={1:'Q1',2:'Q2',3:'Q3',4:'Q4'}))",
))
cells.append(code(
"# Visualise quartile decoupling — bars per race, grouped by quartile",
"fig, ax = plt.subplots(figsize=(11, 4.5))",
"races = sorted(quarts_df['race'].unique())",
"x = np.arange(len(races))",
"w = 0.2",
"colors = ['#2a9d8f', '#e9c46a', '#f4a261', '#e76f51'] # cool → hot",
"for i in range(1, 5):",
" vals = [quarts_df[(quarts_df.race == r) & (quarts_df.segment == i)]['decoupling_pct'].iloc[0]",
" for r in races]",
" ax.bar(x + (i - 2.5) * w, vals, w, color=colors[i-1], label=f'Q{i}')",
"ax.axhline(0, color='black', lw=0.5)",
"ax.axhline(10, color='gray', ls='--', lw=1, label='Friel \"unsustainable\" threshold')",
"ax.set_xticks(x)",
"ax.set_xticklabels([str(r) for r in races])",
"ax.set_ylabel('decoupling (%)')",
"ax.set_title('Per-second decoupling by race quartile — the wall lands in Q3 every time')",
"ax.legend(loc='upper left', ncol=5)",
"fig.tight_layout()",
))
cells.append(md(
"### Rolling efficiency curves — when does the wheels-come-off moment hit?",
"",
"5-minute rolling speed/HR over elapsed time. Flat = pacing matches HR. Falling curve = decoupling in progress. The y-axis is the same physical quantity Friel's method aggregates, just plotted continuously.",
))
cells.append(code(
"fig, axes = plt.subplots(len(race_meta), 1, figsize=(13, 2.5 * len(race_meta)),",
" sharex=True, squeeze=False)",
"axes = axes.flatten()",
"for ax, (_, r) in zip(axes, race_meta.iterrows()):",
" aid = int(r['activity_id'])",
" rolled = fit_rolling_efficiency(race_records[aid], window_s=300)",
" valid = rolled.dropna(subset=['rolling_efficiency'])",
" ax.plot(valid['elapsed_min'], valid['rolling_efficiency'], color='#264653', lw=1.5)",
" # Normalise against the first 30 minutes' mean to show % drop",
" base = valid.loc[valid['elapsed_min'] < 30, 'rolling_efficiency'].mean()",
" if base and base > 0:",
" ax2 = ax.twinx()",
" ax2.plot(valid['elapsed_min'], (valid['rolling_efficiency'] / base - 1) * 100,",
" color='#e76f51', lw=1, alpha=0.6)",
" ax2.set_ylabel('% vs first 30 min', color='#e76f51', fontsize=9)",
" ax2.axhline(0, color='#e76f51', ls=':', lw=0.6, alpha=0.5)",
" ax.set_ylabel('speed / HR')",
" ax.set_title(f\"{r['start_time_local'].date()} — {r['km']:.1f} km, avg HR {r['avg_hr']:.0f}\",",
" fontsize=10)",
"axes[-1].set_xlabel('elapsed minutes')",
"fig.suptitle('Rolling efficiency through each race (5-min window)', y=1.01)",
"fig.tight_layout()",
))
cells.append(md(
"### HR and pace traces, side by side",
"",
"Same data, separated: HR (left axis, magma colour-scale) and pace (right axis, inverted so faster is up). The interesting moments are where the curves *diverge* — HR climbing while pace stays flat (drift) or HR steady while pace falls (just tired legs).",
))
cells.append(code(
"fig, axes = plt.subplots(len(race_meta), 1, figsize=(13, 2.8 * len(race_meta)),",
" sharex=True, squeeze=False)",
"axes = axes.flatten()",
"for ax, (_, r) in zip(axes, race_meta.iterrows()):",
" aid = int(r['activity_id'])",
" rec = race_records[aid].dropna(subset=['heart_rate','speed_mps','elapsed_s'])",
" rec = rec[rec['speed_mps'] > 0.5]",
" em = rec['elapsed_s'] / 60",
" rolled_hr = rec['heart_rate'].rolling(300, min_periods=30).mean()",
" rolled_pace = (1 / rec['speed_mps']) * 1000 / 60",
" rolled_pace = rolled_pace.rolling(300, min_periods=30).mean()",
" ax.plot(em, rolled_hr, color='#9b2226', lw=1.4, label='HR (5-min avg)')",
" ax.set_ylabel('HR (bpm)', color='#9b2226')",
" ax.tick_params(axis='y', labelcolor='#9b2226')",
" ax2 = ax.twinx()",
" ax2.plot(em, rolled_pace, color='#264653', lw=1.4, label='pace (5-min avg)')",
" ax2.set_ylabel('pace (min/km)', color='#264653')",
" ax2.tick_params(axis='y', labelcolor='#264653')",
" ax2.invert_yaxis() # faster = up",
" ax.set_title(f\"{r['start_time_local'].date()} — {r['km']:.1f} km\", fontsize=10)",
"axes[-1].set_xlabel('elapsed minutes')",
"fig.suptitle('HR (red) and pace (dark) — divergence = decoupling', y=1.01)",
"fig.tight_layout()",
))
cells.append(md(
"### Per-second vs per-mile decoupling — sanity check",
"",
"How does the FIT-derived number compare to the lap-level decoupling we computed in §1? Per-second is correctly excluding stopped time and lap rounding, so should be **lower** than the per-mile number for the same race — but the qualitative ranking should agree.",
))
cells.append(code(
"# Pull the per-mile (§1) value for each race and compare to per-second",
"lap_dec = dec.set_index('activity_id')['decoupling_pct']",
"rows = []",
"for _, r in race_meta.iterrows():",
" aid = int(r['activity_id'])",
" ps = halves_df[(halves_df.race == r['start_time_local'].date()) & (halves_df.segment == 2)]['decoupling_pct'].iloc[0]",
" lap = lap_dec.get(aid, float('nan'))",
" rows.append({'race': r['start_time_local'].date(), 'km': r['km'],",
" 'per_mile_decoupling_pct': round(lap, 1),",
" 'per_second_decoupling_pct': round(ps, 1),",
" 'delta': round(lap - ps, 1) if not pd.isna(lap) else None})",
"pd.DataFrame(rows)",
))
cells.append(md(
"### What this means for the 50-mile",
"",
"The per-second view localises the drift: in every prior race the wheels come off around the **4-hour mark** (between Q2 and Q3). For the 50-mile that's roughly halfway through the race — exactly when fueling errors stop being recoverable.",
"",
"Three concrete implications:",
"",
"1. **Front-load fueling.** The textbook glycogen depletion curve says 90 min of running on stored glycogen, then performance falls off without external carbs. Q1 (the easy half) shouldn't be a fueling holiday — every aid station, every hour, from the start.",
"2. **Recalibrate pace by HR, not by feel.** The rolling-efficiency plots show HR rising while pace falls. Setting an HR ceiling (e.g. Z2 top = 143 bpm for the long run, slightly higher for race) and *enforcing it* would flatten the Q3 collapse.",
"3. **What success looks like on Sept 12.** A 50-mile race executed cleanly should look like the *first half* of these 50K curves repeated twice. If the Q3 wall reappears around hour 45, treat it as a planned aid-station break to top up calories before continuing.",
))
# ----- Section 2: cadence + stride -----
cells.append(md(
"## 2. Cadence and stride length",
"",
"At a given pace, faster runners tend to have **higher cadence and shorter stride**. Watching cadence-vs-pace and stride-vs-pace by year shows whether form is shifting independently of fitness.",
"",
"Garmin's `averageRunCadence` per split is already **both-legs** steps-per-minute (typical running range 150185). `strideLength` is in cm.",
))
cells.append(code(
"form = splits.dropna(subset=['averageRunCadence', 'strideLength', 'pace_min_per_km']).copy()",
"form = form[form['averageRunCadence'] > 0].copy() # zeros = walking/standing intervals",
"form['cadence_spm'] = form['averageRunCadence'] # already both-legs SPM",
"form['stride_m'] = form['strideLength'] / 100",
"form = form[(form['cadence_spm'] >= 140) & (form['cadence_spm'] <= 200)] # drop walks/junk",
"print(f'{len(form):,} clean form splits, {form.activity_id.nunique()} runs')",
"",
"fig, axes = plt.subplots(1, 2, figsize=(13, 5), sharex=True)",
"years = sorted(form['year'].unique())",
"cmap = plt.cm.viridis(np.linspace(0, 0.9, len(years)))",
"",
"for c, y in zip(cmap, years):",
" d = form[form['year'] == y]",
" axes[0].scatter(d['pace_min_per_km'], d['cadence_spm'], s=8, alpha=0.35, color=c, label=str(y))",
" axes[1].scatter(d['pace_min_per_km'], d['stride_m'], s=8, alpha=0.35, color=c, label=str(y))",
"",
"axes[0].set_ylabel('cadence (steps/min, both legs)')",
"axes[1].set_ylabel('stride length (m)')",
"for ax in axes:",
" ax.set_xlabel('pace (min/km)')",
" ax.invert_xaxis() # faster runs to the right",
"axes[0].legend(title='year', loc='lower left', fontsize=8)",
"axes[0].set_title('Cadence vs pace')",
"axes[1].set_title('Stride length vs pace')",
"fig.tight_layout()",
))
cells.append(md(
"### Form at a controlled pace",
"",
"Bin splits into a narrow easy-pace band (5:306:30 min/km) and look at cadence / stride / vertical metrics year-over-year. Holding pace constant strips out the obvious \"faster = higher cadence\" effect and isolates technique drift.",
))
cells.append(code(
"band = form[(form['pace_min_per_km'] >= 5.5) & (form['pace_min_per_km'] <= 6.5)].copy()",
"vert = splits.dropna(subset=['verticalOscillation', 'verticalRatio', 'groundContactTime'])",
"vert = vert[(vert['pace_min_per_km'] >= 5.5) & (vert['pace_min_per_km'] <= 6.5)]",
"",
"summary = band.groupby('year').agg(",
" n_splits=('cadence_spm', 'size'),",
" cadence_med=('cadence_spm', 'median'),",
" stride_med=('stride_m', 'median'),",
")",
"vsum = vert.groupby('year').agg(",
" vert_osc_cm=('verticalOscillation', 'median'),",
" vert_ratio=('verticalRatio', 'median'),",
" gct_ms=('groundContactTime', 'median'),",
")",
"summary = summary.join(vsum).round(2)",
"summary",
))
cells.append(code(
"fig, axes = plt.subplots(1, 3, figsize=(14, 4))",
"summary['cadence_med'].plot(kind='bar', ax=axes[0], color='#264653')",
"axes[0].set_ylabel('cadence (spm)'); axes[0].set_title('Cadence at easy pace')",
"axes[0].set_ylim(summary['cadence_med'].min() - 3, summary['cadence_med'].max() + 3)",
"",
"summary['stride_med'].plot(kind='bar', ax=axes[1], color='#2a9d8f')",
"axes[1].set_ylabel('stride length (m)'); axes[1].set_title('Stride at easy pace')",
"axes[1].set_ylim(summary['stride_med'].min() - 0.05, summary['stride_med'].max() + 0.05)",
"",
"if summary['vert_osc_cm'].notna().any():",
" summary['vert_osc_cm'].plot(kind='bar', ax=axes[2], color='#e76f51')",
" axes[2].set_ylabel('vertical oscillation (cm)'); axes[2].set_title('Vertical bounce at easy pace')",
" axes[2].set_ylim(summary['vert_osc_cm'].min() - 0.5, summary['vert_osc_cm'].max() + 0.5)",
"else:",
" axes[2].text(0.5, 0.5, 'no vertical-osc data', ha='center', va='center', transform=axes[2].transAxes)",
"",
"for ax in axes:",
" ax.set_xlabel('year')",
"fig.suptitle('Form metrics in the 5:306:30 min/km band, year over year')",
"fig.tight_layout()",
))
# ----- Section 3: GPS route clustering -----
cells.append(md(
"## 3. Route clustering — pace controlled for terrain",
"",
"Raw pace year-over-year mixes terrain, weather, intent. Cluster runs by their **start coordinates** (greedy haversine, 250 m radius) and you get \"my usual routes.\" Within a cluster the route is roughly the same, so pace differences are mostly fitness, not geography.",
))
cells.append(code(
"starts = (splits.dropna(subset=['startLatitude', 'startLongitude'])",
" .groupby('activity_id')",
" .agg(lat=('startLatitude', 'first'),",
" lon=('startLongitude', 'first'),",
" start_time=('start_time_local', 'first'),",
" distance_km=('distance_m', lambda s: s.sum() / 1000),",
" avg_hr=('avg_hr', 'mean'),",
" avg_pace=('pace_min_per_km', 'mean')))",
"starts['cluster'] = cluster_routes(starts['lat'].values, starts['lon'].values, radius_km=0.25)",
"print(f'{len(starts)} runs with start coords; {(starts.cluster >= 0).sum()} clustered, "
"{(starts.cluster == -1).sum()} singletons')",
"",
"top = (starts[starts.cluster >= 0]",
" .groupby('cluster')",
" .agg(n=('cluster', 'size'),",
" lat=('lat', 'median'),",
" lon=('lon', 'median'),",
" first=('start_time', 'min'),",
" last=('start_time', 'max'),",
" med_dist=('distance_km', 'median'))",
" .sort_values('n', ascending=False))",
"top.head(10)",
))
cells.append(code(
"fig, ax = plt.subplots(figsize=(9, 7))",
"",
"# unclustered points faint",
"unc = starts[starts.cluster == -1]",
"ax.scatter(unc['lon'], unc['lat'], color='lightgray', s=12, alpha=0.6, label='singletons')",
"",
"# top-10 clusters colored",
"top10 = top.head(10).index.tolist()",
"cmap = plt.cm.tab10(np.linspace(0, 1, len(top10)))",
"for c, cl in zip(cmap, top10):",
" pts = starts[starts.cluster == cl]",
" ax.scatter(pts['lon'], pts['lat'], color=c, s=40, alpha=0.8,",
" label=f'route #{cl} (n={len(pts)})')",
"",
"ax.set_xlabel('longitude'); ax.set_ylabel('latitude')",
"ax.set_title('Run start points — top 10 recurring routes')",
"ax.legend(loc='best', fontsize=8)",
"ax.set_aspect('equal', adjustable='datalim')",
"fig.tight_layout()",
))
cells.append(md(
"### Pace progression within each frequent route",
"",
"Now restrict to clusters with ≥5 runs and plot pace over time per route. Slopes here are much cleaner than the global pace trend because terrain is held constant.",
))
cells.append(code(
"frequent = top[top['n'] >= 5].index.tolist()",
"print(f'{len(frequent)} routes with ≥5 runs')",
"",
"if frequent:",
" n_cols = min(3, len(frequent))",
" n_rows = (len(frequent) + n_cols - 1) // n_cols",
" fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 3.2 * n_rows),",
" squeeze=False, sharey=True)",
" for ax, cl in zip(axes.flat, frequent):",
" d = (starts[starts.cluster == cl]",
" .dropna(subset=['avg_pace'])",
" .sort_values('start_time'))",
" ax.scatter(d['start_time'], d['avg_pace'], s=30, alpha=0.75, color='#264653')",
" if len(d) >= 3:",
" # rolling median by date",
" rd = d.set_index('start_time')['avg_pace'].rolling('120D', min_periods=2).median()",
" ax.plot(rd.index, rd.values, color='#e76f51', lw=1.5)",
" ax.invert_yaxis() # faster up",
" ax.set_title(f'route #{cl} — n={len(d)}, ~{top.loc[cl, \"med_dist\"]:.1f} km')",
" ax.set_ylabel('pace (min/km)')",
" # blank unused axes",
" for ax in axes.flat[len(frequent):]:",
" ax.set_visible(False)",
" fig.suptitle('Per-route pace over time (terrain held roughly constant)')",
" fig.tight_layout()",
"else:",
" print('No clusters with ≥5 runs.')",
))
# ----- Section 4: HR zones (Garmin-configured, FIT-based when available) -----
cells.append(md(
"## 4. HR-zone time-in-zone (Garmin-configured zones, per-second when possible)",
"",
"**Zones come from Garmin's `heartRateZones.json` (training method: HR_MAX),** not estimated from observed HR. Lactate-threshold HR sits at 182 inside Z4.",
"",
"| Zone | range (bpm) | feel | role |",
"|------|-------------|------|------|",
"| Z1 | 102122 | walk / recovery | active rest |",
"| Z2 | 123143 | conversational | **long-run target** |",
"| Z3 | 144164 | tempo | the \"junk-miles middle\" |",
"| Z4 | 165185 | threshold (LTHR=182) | hard sustained |",
"| Z5 | 186209 | VO₂ max | intervals |",
"",
"For each activity, time-in-zone comes from `activity_time_in_zone` (precomputed by `compute_time_in_zone.py`):",
"- **`source='fit'`** — per-second HR from the FIT file. Each record's `dt` (typically 1 s) goes into whichever zone its HR falls in. Accurate even when laps span zone boundaries.",
"- **`source='lap'`** — fallback for activities without a linked FIT. The whole lap's duration is assigned to whichever zone the *lap's average* HR sits in. Smears across boundaries, biases toward middle zones.",
"",
"**Polarized-training rule (Seiler):** elites accumulate ~80% of weekly time in Z1+Z2 and ~20% in Z4+Z5, with little Z3.",
))
cells.append(code(
"from openrun import HR_ZONES_USER",
"",
"tiz = pd.read_sql('''",
" SELECT t.activity_id, t.z1_s, t.z2_s, t.z3_s, t.z4_s, t.z5_s, t.total_s, t.source,",
" a.start_time_local, a.activity_type, a.distance_m, a.duration_s",
" FROM activity_time_in_zone t",
" JOIN activities a USING(activity_id)",
" WHERE a.activity_type IN ('running','trail_running')",
"''', conn, parse_dates=['start_time_local'])",
"tiz['week'] = tiz['start_time_local'].dt.to_period('W-MON').dt.start_time",
"tiz['year'] = tiz['start_time_local'].dt.year",
"",
"print(f'{len(tiz)} activities with cached time-in-zone')",
"print(' source breakdown:')",
"print(tiz['source'].value_counts().to_string())",
"print(f' fit coverage: {(tiz.source==\"fit\").mean()*100:.0f}% of running activities')",
))
cells.append(md(
"### Sanity check: FIT vs lap method on the same race",
"",
"On the same activity, how different are the two estimates? Take the 2025-09-20 race (8 hours, 28k FIT records) and compute both, then compare. The lap method should over-weight whichever zone the typical lap average falls in (here, Z3) and under-count time spent in adjacent zones because boundary-crossing laps get rounded to one zone.",
))
cells.append(code(
"from openrun import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits",
"",
"race_aid = race_meta.iloc[-1]['activity_id'] # most recent 50K (2025-09-20)",
"race_date = race_meta.iloc[-1]['start_time_local'].date()",
"",
"fit_tiz = time_in_zone_from_fit(race_records[int(race_aid)])",
"race_splits = pd.read_sql(",
" 'SELECT avg_hr, duration_s FROM activity_splits WHERE activity_id = ?',",
" conn, params=[int(race_aid)]",
")",
"lap_tiz = time_in_zone_from_splits(race_splits)",
"",
"compare = pd.DataFrame({",
" 'FIT (per-sec) min': {k: round(v / 60, 1) for k, v in fit_tiz.items()},",
" 'lap (avg-HR) min': {k: round(v / 60, 1) for k, v in lap_tiz.items()},",
"}).reindex(['Z1','Z2','Z3','Z4','Z5']).fillna(0)",
"compare.loc['total'] = compare.sum()",
"compare['delta (min)'] = (compare['FIT (per-sec) min'] - compare['lap (avg-HR) min']).round(1)",
"print(f'Race {race_date} — FIT vs lap time-in-zone:')",
"compare",
))
cells.append(code(
"# Weekly time-in-zone (hours) using whichever method was available per activity",
"wk_cols = ['z1_s','z2_s','z3_s','z4_s','z5_s']",
"weekly = tiz.groupby('week')[wk_cols].sum() / 3600",
"weekly.columns = ['Z1','Z2','Z3','Z4','Z5']",
"",
"fig, ax = plt.subplots(figsize=(14, 4.8))",
"colors = ['#264653', '#2a9d8f', '#e9c46a', '#f4a261', '#e76f51']",
"ax.stackplot(weekly.index, weekly.T.values, labels=weekly.columns, colors=colors, alpha=0.92)",
"ax.set_ylabel('hours / week')",
"ax.set_title('Weekly running time by HR zone (Garmin-configured zones)')",
"ax.legend(loc='upper left', ncol=5, fontsize=9)",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
cells.append(md(
"### Polarized index over time",
"",
"Collapse to three buckets:",
"- **easy** = Z1 + Z2 (HR ≤ 143)",
"- **moderate \"junk\"** = Z3 (144164)",
"- **hard** = Z4 + Z5 (HR ≥ 165, threshold and up)",
"",
"Polarized: high easy, low moderate, modest hard. Pyramidal: high easy, modest moderate, low hard. Threshold-heavy: lots of moderate.",
))
cells.append(code(
"buckets = pd.DataFrame({",
" 'easy': weekly[['Z1','Z2']].sum(axis=1),",
" 'moderate': weekly['Z3'],",
" 'hard': weekly[['Z4','Z5']].sum(axis=1),",
"})",
"totals = buckets.sum(axis=1)",
"pct = buckets.div(totals.replace(0, np.nan), axis=0) * 100",
"",
"fig, axes = plt.subplots(2, 1, figsize=(14, 7), sharex=True)",
"",
"ax = axes[0]",
"ax.stackplot(pct.index, pct[['easy','moderate','hard']].T.values,",
" labels=['easy (Z1+Z2)', 'moderate (Z3)', 'hard (Z4+Z5)'],",
" colors=['#2a9d8f', '#e9c46a', '#e76f51'], alpha=0.9)",
"ax.axhline(80, color='black', ls='--', lw=0.8, label='80% easy target')",
"ax.set_ylabel('% of weekly time'); ax.set_ylim(0, 100)",
"ax.set_title('Polarized-training split (weekly)')",
"ax.legend(loc='lower left', ncol=4, fontsize=9)",
"",
"ax = axes[1]",
"rolling_pct = pct.rolling(4, min_periods=2).mean()",
"for col, c in [('easy','#2a9d8f'), ('moderate','#e9c46a'), ('hard','#e76f51')]:",
" ax.plot(rolling_pct.index, rolling_pct[col], color=c, lw=2, label=col)",
"ax.axhline(80, color='#2a9d8f', ls='--', lw=0.8)",
"ax.axhline(20, color='#e76f51', ls='--', lw=0.8)",
"ax.set_ylabel('% (4-week rolling mean)')",
"ax.legend(loc='best', fontsize=9)",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
cells.append(md(
"### Yearly summary + FIT-method coverage",
"",
"`fit_coverage_%` shows what fraction of each year's activities had a linked FIT (and therefore got per-second zones). 2026's lower coverage reflects activities that synced via the live API but aren't in the takeout dump.",
))
cells.append(code(
"yearly_hours = (tiz.groupby('year')[wk_cols].sum() / 3600).round(1)",
"yearly_hours.columns = ['Z1','Z2','Z3','Z4','Z5']",
"yearly_pct = yearly_hours.div(yearly_hours.sum(axis=1), axis=0) * 100",
"",
"out = pd.concat({'hours': yearly_hours, '%': yearly_pct.round(1)}, axis=1)",
"out['easy_pct'] = (yearly_pct[['Z1','Z2']].sum(axis=1)).round(1)",
"out['hard_pct'] = (yearly_pct[['Z4','Z5']].sum(axis=1)).round(1)",
"fit_coverage = tiz.groupby('year').apply(",
" lambda g: (g['source']=='fit').mean() * 100, include_groups=False",
").round(0)",
"out['fit_coverage_%'] = fit_coverage",
"out",
))
cells.append(md(
"### Race-build vs base-period zone distribution",
"",
"Compare what training looked like in the 12 weeks before each prior 50K race vs the rest of the year. A serious build should shift time into Z2 (long aerobic) and Z4 (threshold/tempo) and away from Z3.",
))
cells.append(code(
"race_dates = race_meta['start_time_local']",
"tiz['phase'] = 'base'",
"for rd in race_dates:",
" build_start = rd - pd.Timedelta(weeks=12)",
" mask = (tiz['start_time_local'] >= build_start) & (tiz['start_time_local'] < rd)",
" tiz.loc[mask, 'phase'] = f'build {rd.year}'",
"",
"phase_hours = (tiz.groupby('phase')[wk_cols].sum() / 3600).round(1)",
"phase_hours.columns = ['Z1','Z2','Z3','Z4','Z5']",
"phase_pct = (phase_hours.div(phase_hours.sum(axis=1), axis=0) * 100).round(1)",
"phase_pct['easy_Z1+Z2'] = (phase_pct['Z1'] + phase_pct['Z2']).round(1)",
"phase_pct['junk_Z3'] = phase_pct['Z3']",
"phase_pct['hard_Z4+Z5'] = (phase_pct['Z4'] + phase_pct['Z5']).round(1)",
"phase_pct[['easy_Z1+Z2','junk_Z3','hard_Z4+Z5']].sort_index()",
))
cells.append(md(
"## What's left when FIT files are ingested",
"",
"All four sections now use per-second FIT data where it's linked (349 of 378 activities, 92%). Remaining lap-only activities are mostly old multi-sport / triathlon legs that no FIT was uploaded for. Useful follow-ups:",
"",
"- **Cadence stability** — plot cadence over elapsed time within a long run; quantify the drop in the final 15 %.",
"- **GPS polylines for route clustering** — current §3 uses start coordinates only; with full FIT GPS tracks, match routes by Hausdorff distance (more accurate than start-only).",
"- **Decoupling vs fueling protocol** — once the user logs even informal fueling notes for a few long runs, regress decoupling against carb intake.",
))
notebook = {
"cells": cells,
"metadata": {
"kernelspec": {"display_name": ".venv", "language": "python", "name": "python3"},
"language_info": {"name": "python"},
},
"nbformat": 4,
"nbformat_minor": 5,
}
out = Path(__file__).parent / "05_intra_run.ipynb"
out.write_text(json.dumps(notebook, indent=1))
print(f"wrote {out} ({out.stat().st_size:,} bytes, {len(cells)} cells)")