"**Data note.** This project synced through the Garmin live API (`sync.py`) rather than the official zip export, so `activity_fit_files` is empty and per-second FIT data isn't available. Everything here runs on `activity_splits` (per-mile laps, ~2 000 rows). Once FIT files arrive via `ingest_export.py`, the same analyses upgrade to per-second resolution; only the loader needs to change.",
"Within a single run, split the laps into a first half and a second half. For each half compute the duration-weighted ratio `speed / HR`: essentially \"speed per heartbeat,\" a widely used aerobic-efficiency index. Decoupling is the percentage by which that ratio falls from the first half to the second.",
"ax.set_ylabel('decoupling (%)')",
"ax.set_title('decoupling over time (color = distance km)')",
"plt.colorbar(sc, ax=ax, label='distance (km)')",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
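# Illustrative sketch, not one of the notebook cells: the lap-level decoupling
# described in the markdown above. Assumptions: laps arrive as three parallel
# sequences (the names duration_s, speed_mps, and avg_hr are hypothetical),
# the halves are split by lap count, and "duration-weighted ratio" is read as
# duration-weighted mean speed over duration-weighted mean HR.
import numpy as np


def lap_decoupling_pct(duration_s, speed_mps, avg_hr):
    """Percent drop in speed/HR from the first half of the laps to the second."""
    d = np.asarray(duration_s, dtype=float)
    v = np.asarray(speed_mps, dtype=float)
    h = np.asarray(avg_hr, dtype=float)
    if d.size < 2:
        return float('nan')  # need at least one lap in each half
    mid = d.size // 2

    def efficiency(sl):
        # Weight each lap by its duration so short laps don't dominate.
        return np.average(v[sl], weights=d[sl]) / np.average(h[sl], weights=d[sl])

    first = efficiency(slice(None, mid))
    second = efficiency(slice(mid, None))
    return float((first - second) / first * 100.0)


# Positive values mean speed per heartbeat fell in the second half (drift);
# values near zero mean the two halves were equally efficient.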
cells.append(md(
"### Aerobic runs only — the clean view",
"",
"Decoupling is only interpretable on **steady aerobic** efforts. Filter to runs ≥ 8 km with avg HR < 165 (well below threshold for a sub-3:30 marathoner) and look at the trend. Less drift over time = better aerobic conditioning.",
"ax.plot(rolling.index, rolling.values, color='black', lw=2, label='120-day rolling median')",
"",
"ax.axhline(5, color='#2a9d8f', ls='--', lw=1)",
"ax.axhline(10, color='#e76f51', ls='--', lw=1)",
"ax.set_ylabel('decoupling (%)')",
"ax.set_title('Cardiac drift on aerobic runs (≥8 km, avg HR < 165)')",
"ax.legend(loc='upper right')",
"fig.autofmt_xdate()",
"fig.tight_layout()",
))
cells.append(code(
"# Year-over-year aerobic decoupling summary",
"aerobic.groupby('year').agg(",
" n=('decoupling_pct', 'size'),",
" median_drift=('decoupling_pct', 'median'),",
" mean_drift=('decoupling_pct', 'mean'),",
" pct_under_5=('decoupling_pct', lambda s: (s < 5).mean() * 100),",
").round(2)",
))
# ----- Section 1b: per-second race-day decoupling from FIT files -----
cells.append(md(
"## 1b. Per-second decoupling — race-day deep dive",
"",
"Lap-level decoupling (above) is coarse. With FIT files linked (since the takeout-export ingest), we can read the per-second `heart_rate` and `enhanced_speed` directly and compute Friel's decoupling without the noise from aid-station stops and lap rounding.",
"",
"**Method:**",
"1. Drop the first 5 min (warmup) and last 2 min (cooldown / finish sprint).",
"2. Drop records with speed < 0.5 m/s so that aid-station pauses don't drag the mean down.",
"3. Slice the moving time into equal-time chunks (halves or quartiles).",
"4. For each chunk: `efficiency = mean(speed) / mean(HR)`.",
"Friel's rule: < 5% on a steady aerobic run = aerobically developed; > 10% = unsustainable pacing or fueling deficit. Race-day numbers are expected to be higher than training (you push the back half), but *how much* higher matters.",
"ax.set_title('Per-second decoupling by race quartile — the wall lands in Q3 every time')",
"ax.legend(loc='upper left', ncol=5)",
"fig.tight_layout()",
))
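# Illustrative sketch, not one of the notebook cells: method steps 1 to 4 from
# the "Per-second decoupling" markdown above. Assumptions: records are roughly
# 1 s apart (so equal record counts approximate equal moving time), inputs are
# parallel arrays with hypothetical names, and the final percentage is taken
# as the drop from the first chunk to the last, which the visible text does
# not spell out.
import numpy as np


def chunk_efficiency(elapsed_s, speed_mps, hr_bpm, n_chunks=2,
                     warmup_s=300, cooldown_s=120, min_speed_mps=0.5):
    """Return (per-chunk speed/HR, percent drop from first chunk to last)."""
    t = np.asarray(elapsed_s, dtype=float)
    v = np.asarray(speed_mps, dtype=float)
    h = np.asarray(hr_bpm, dtype=float)

    # Steps 1 and 2: trim warmup and cooldown, then drop near-stationary records.
    keep = (t >= t[0] + warmup_s) & (t <= t[-1] - cooldown_s) & (v >= min_speed_mps)
    v, h = v[keep], h[keep]
    if v.size < n_chunks:
        raise ValueError('not enough moving records after trimming')

    # Steps 3 and 4: equal-sized chunks of moving time, mean(speed) / mean(HR).
    eff = np.array([vc.mean() / hc.mean()
                    for vc, hc in zip(np.array_split(v, n_chunks),
                                      np.array_split(h, n_chunks))])
    decoupling_pct = float((eff[0] - eff[-1]) / eff[0] * 100.0)
    return eff, decoupling_pct


# Passing n_chunks=4 gives the quartile view; n_chunks=2 gives the halves.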
cells.append(md(
"### Rolling efficiency curves — when does the wheels-come-off moment hit?",
"",
"5-minute rolling speed/HR over elapsed time. Flat = pacing matches HR. Falling curve = decoupling in progress. The y-axis is the same physical quantity Friel's method aggregates, just plotted continuously.",
" ax.set_title(f\"{r['start_time_local'].date()} — {r['km']:.1f} km, avg HR {r['avg_hr']:.0f}\",",
" fontsize=10)",
"axes[-1].set_xlabel('elapsed minutes')",
"fig.suptitle('Rolling efficiency through each race (5-min window)', y=1.01)",
"fig.tight_layout()",
))
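# Illustrative sketch, not one of the notebook cells: one way to build the
# 5-minute rolling speed/HR curve described above. Assumptions: inputs are
# parallel arrays sorted by elapsed time (names are hypothetical), and the
# curve is the rolling mean speed divided by the rolling mean HR rather than
# a rolling mean of the per-record ratio.
import numpy as np
import pandas as pd


def rolling_efficiency(elapsed_s, speed_mps, hr_bpm, window='5min'):
    """Rolling speed/HR indexed by elapsed minutes."""
    elapsed = np.asarray(elapsed_s, dtype=float)
    # Time-based rolling windows need a datetime-like index.
    idx = pd.to_datetime(elapsed, unit='s')
    df = pd.DataFrame({'speed': np.asarray(speed_mps, dtype=float),
                       'hr': np.asarray(hr_bpm, dtype=float)}, index=idx)
    rolled = df.rolling(window).mean()
    eff = rolled['speed'] / rolled['hr']
    eff.index = elapsed / 60.0  # elapsed minutes, matching the plot's x-axis
    return eff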
cells.append(md(
"### HR and pace traces, side by side",
"",
"Same data, separated: HR (left axis, magma colour-scale) and pace (right axis, inverted so faster is up). The interesting moments are where the curves *diverge* — HR climbing while pace stays flat (drift) or HR steady while pace falls (just tired legs).",
"fig.suptitle('HR (red) and pace (dark) — divergence = decoupling', y=1.01)",
"fig.tight_layout()",
))
cells.append(md(
"### Per-second vs per-mile decoupling — sanity check",
"",
"How does the FIT-derived number compare to the lap-level decoupling we computed in §1? The per-second method excludes stopped time and avoids lap rounding, so it should come out **lower** than the per-mile number for the same race, but the qualitative ranking should agree.",
))
cells.append(code(
"# Pull the per-mile (§1) value for each race and compare to per-second",
"'delta': round(lap - ps, 1) if not pd.isna(lap) else None})",
"pd.DataFrame(rows)",
))
cells.append(md(
"### What this means for the 50-mile",
"",
"The per-second view localises the drift: in every prior race the wheels come off around the **4-hour mark** (between Q2 and Q3). For the 50-mile that's roughly halfway through the race — exactly when fueling errors stop being recoverable.",
"",
"Three concrete implications:",
"",
"1. **Front-load fueling.** The textbook glycogen depletion curve allows roughly 90 min of running on stored glycogen, after which performance falls off without external carbs. Q1 (the easy opening quarter) shouldn't be a fueling holiday: eat at every aid station, every hour, from the start.",
"2. **Recalibrate pace by HR, not by feel.** The rolling-efficiency plots show HR rising while pace falls. Setting an HR ceiling (e.g. Z2 top = 143 bpm for the long run, slightly higher for race) and *enforcing it* would flatten the Q3 collapse.",
"3. **What success looks like on Sept 12.** A 50-mile race executed cleanly should look like the *first half* of these 50K curves repeated twice. If the Q3 wall reappears around hour 4–5, treat it as a planned aid-station break to top up calories before continuing.",
))
# ----- Section 2: cadence + stride -----
cells.append(md(
"## 2. Cadence and stride length",
"",
"At a given pace, faster runners tend to have **higher cadence and shorter stride**. Watching cadence-vs-pace and stride-vs-pace by year shows whether form is shifting independently of fitness.",
"",
"Garmin's `averageRunCadence` per split is already **both-legs** steps-per-minute (typical running range 150–185). `strideLength` is in cm.",
"Bin splits into a narrow easy-pace band (5:30–6:30 min/km) and look at cadence / stride / vertical metrics year-over-year. Holding pace constant strips out the obvious \"faster = higher cadence\" effect and isolates technique drift.",
"fig.suptitle('Form metrics in the 5:30–6:30 min/km band, year over year')",
"fig.tight_layout()",
))
# ----- Section 3: GPS route clustering -----
cells.append(md(
"## 3. Route clustering — pace controlled for terrain",
"",
"Raw pace year-over-year mixes terrain, weather, and intent. Cluster runs by their **start coordinates** (greedy haversine, 250 m radius) and you get \"my usual routes.\" Within a cluster the route is roughly the same, so pace differences are mostly fitness, not geography.",
"ax.set_title('Run start points — top 10 recurring routes')",
"ax.legend(loc='best', fontsize=8)",
"ax.set_aspect('equal', adjustable='datalim')",
"fig.tight_layout()",
))
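# Illustrative sketch, not one of the notebook cells: a greedy haversine
# clustering of start coordinates with a 250 m radius. Assumption: "greedy"
# is read as "assign each run to the first existing cluster whose seed point
# is within the radius, otherwise start a new cluster"; the notebook's actual
# rule may differ. Function names are hypothetical.
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def greedy_cluster(points, radius_m=250.0):
    """Return one cluster label per (lat, lon) start point, in input order."""
    seeds = []   # first point seen in each cluster
    labels = []
    for lat, lon in points:
        for idx, (slat, slon) in enumerate(seeds):
            if haversine_m(lat, lon, slat, slon) <= radius_m:
                labels.append(idx)
                break
        else:
            # No existing seed is close enough: this run starts a new cluster.
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels


# The result depends on input order because seeds never move; sorting runs by
# date first keeps the labels reproducible.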
cells.append(md(
"### Pace progression within each frequent route",
"",
"Now restrict to clusters with ≥5 runs and plot pace over time per route. Slopes here are much cleaner than the global pace trend because terrain is held constant.",
" fig.suptitle('Per-route pace over time (terrain held roughly constant)')",
" fig.tight_layout()",
"else:",
" print('No clusters with ≥5 runs.')",
))
# ----- Section 4: HR zones (Garmin-configured, FIT-based when available) -----
cells.append(md(
"## 4. HR-zone time-in-zone (Garmin-configured zones, per-second when possible)",
"",
"**Zones come from Garmin's `heartRateZones.json` (training method: HR_MAX),** not estimated from observed HR. Lactate-threshold HR sits at 182 inside Z4.",
"",
"| Zone | range (bpm) | feel | role |",
"|------|-------------|------|------|",
"| Z1 | 102–122 | walk / recovery | active rest |",
"For each activity, time-in-zone comes from `activity_time_in_zone` (precomputed by `compute_time_in_zone.py`):",
"- **`source='fit'`** — per-second HR from the FIT file. Each record's `dt` (typically 1 s) goes into whichever zone its HR falls in. Accurate even when laps span zone boundaries.",
"- **`source='lap'`** — fallback for activities without a linked FIT. The whole lap's duration is assigned to whichever zone the *lap's average* HR sits in. Smears across boundaries, biases toward middle zones.",
"",
"**Polarized-training rule (Seiler):** elites accumulate ~80% of weekly time in Z1+Z2 and ~20% in Z4+Z5, with little Z3.",
"print(f' fit coverage: {(tiz.source==\"fit\").mean()*100:.0f}% of running activities')",
))
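# Illustrative sketch, not compute_time_in_zone.py itself: the 'fit' and 'lap'
# time-in-zone methods are the same binning applied at different granularity.
# Assumptions: only the Z1 range (102 to 122) and the Z2 top (143) come from
# the notebook text; the Z4 and Z5 lower bounds below are made-up placeholders
# for the values in heartRateZones.json.
import numpy as np

ZONE_LOWER_BPM = np.array([102, 123, 144, 164, 184])  # Z1..Z5 lower bounds


def time_in_zone(hr_bpm, dt_s):
    """Seconds per zone; index 0 is below Z1, indices 1..5 are Z1..Z5."""
    zones = np.searchsorted(ZONE_LOWER_BPM, np.asarray(hr_bpm, dtype=float), side='right')
    return np.bincount(zones, weights=np.asarray(dt_s, dtype=float), minlength=6)


# 'fit' method: one entry per record, dt typically 1 s.
fit_example = time_in_zone([120] * 300 + [160] * 300, [1] * 600)
# 'lap' method: the same 10 minutes as a single lap with average HR 140.
lap_example = time_in_zone([140], [600])
# The lap version places all 600 s in Z2, a zone the runner never occupied,
# which is the boundary smearing the markdown above describes.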
cells.append(md(
"### Sanity check: FIT vs lap method on the same race",
"",
"On the same activity, how different are the two estimates? Take the 2025-09-20 race (8 hours, 28k FIT records) and compute both, then compare. The lap method should over-weight whichever zone the typical lap average falls in (here, Z3) and under-count time spent in adjacent zones because boundary-crossing laps get rounded to one zone.",
"`fit_coverage_%` shows what fraction of each year's activities had a linked FIT (and therefore got per-second zones). 2026's lower coverage reflects activities that synced via the live API but aren't in the takeout dump.",
"### Race-build vs base-period zone distribution",
"",
"Compare what training looked like in the 12 weeks before each prior 50K race vs the rest of the year. A serious build should shift time into Z2 (long aerobic) and Z4 (threshold/tempo) and away from Z3.",
"All four sections now use per-second FIT data where it's linked (349 of 378 activities, 92%). The remaining lap-only activities are mostly old multi-sport / triathlon legs for which no FIT file was uploaded. Useful follow-ups:",
"",
"- **Cadence stability** — plot cadence over elapsed time within a long run; quantify the drop in the final 15 %.",
"- **GPS polylines for route clustering** — current §3 uses start coordinates only; with full FIT GPS tracks, match routes by Hausdorff distance (more accurate than start-only).",
"- **Decoupling vs fueling protocol** — once the user logs even informal fueling notes for a few long runs, regress decoupling against carb intake.",