843 lines
36 KiB
Plaintext
843 lines
36 KiB
Plaintext
|
|
{
|
||
|
|
"cells": [
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"# 05 \u2014 Intra-run dynamics\n",
|
||
|
|
"\n",
|
||
|
|
"Within-run signals from lap-level splits: cardiac drift, cadence/stride, route-controlled pace, HR-zone distribution.\n",
|
||
|
|
"\n",
|
||
|
|
"**Data note.** This project's sync was via the Garmin live API (`sync.py`), not the official zip export, so `activity_fit_files` is empty and per-second FIT data isn't available. Everything here runs on `activity_splits` (per-mile laps, ~2 000 rows). When FIT files arrive via `ingest_export.py`, these same analyses upgrade to per-second resolution \u2014 only the loader needs to change."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"import sys\n",
|
||
|
|
"sys.path.insert(0, '..')\n",
|
||
|
|
"import numpy as np\n",
|
||
|
|
"import pandas as pd\n",
|
||
|
|
"import matplotlib.pyplot as plt\n",
|
||
|
|
"from analysis import (\n",
|
||
|
|
" open_conn, load_splits, decoupling,\n",
|
||
|
|
" assign_hr_zone, cluster_routes, haversine_km,\n",
|
||
|
|
")\n",
|
||
|
|
"\n",
|
||
|
|
"conn = open_conn()\n",
|
||
|
|
"splits = load_splits(conn) # running only by default\n",
|
||
|
|
"print(f'{len(splits):,} splits across {splits.activity_id.nunique():,} runs, {splits.start_time_local.min().date()} \u2192 {splits.start_time_local.max().date()}')"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 1. Cardiac drift (Pa:Hr decoupling)\n",
|
||
|
|
"\n",
|
||
|
|
"Within a single run, divide the laps into first half and second half. For each half compute the duration-weighted ratio `speed / HR` \u2014 essentially \"pace per heartbeat,\" the gold-standard aerobic-fitness index. Decoupling = how much that ratio falls between halves.\n",
|
||
|
|
"\n",
|
||
|
|
"$$\\text{decoupling}\\;\\% = \\left(\\frac{(\\text{speed}/\\text{HR})_{1st}}{(\\text{speed}/\\text{HR})_{2nd}} - 1\\right) \\times 100$$\n",
|
||
|
|
"\n",
|
||
|
|
"Friel's rule of thumb for steady aerobic runs:\n",
|
||
|
|
"- **< 5 %** \u2014 aerobically developed\n",
|
||
|
|
"- **5\u201310 %** \u2014 moderate drift; sustainable\n",
|
||
|
|
"- **> 10 %** \u2014 significant drift; pace was unsustainable or it was a hot/hard day\n",
|
||
|
|
"\n",
|
||
|
|
"Negative values mean you ran *more efficiently* in the second half (a negative split or conservative opener)."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"dec = decoupling(splits, min_splits=6)\n",
|
||
|
|
"print(f'{len(dec)} runs with \u2265 6 splits')\n",
|
||
|
|
"dec['decoupling_pct'].describe().round(2)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"fig, axes = plt.subplots(1, 2, figsize=(13, 4.5))\n",
|
||
|
|
"\n",
|
||
|
|
"ax = axes[0]\n",
|
||
|
|
"ax.hist(dec['decoupling_pct'], bins=30, edgecolor='white')\n",
|
||
|
|
"for x, lab, color in [(5, 'good <5%', '#2a9d8f'), (10, 'caution 10%', '#e76f51')]:\n",
|
||
|
|
" ax.axvline(x, color=color, ls='--', lw=1.2, label=lab)\n",
|
||
|
|
"ax.axvline(0, color='gray', lw=0.8)\n",
|
||
|
|
"ax.set_xlabel('decoupling (%)'); ax.set_ylabel('runs')\n",
|
||
|
|
"ax.set_title(f'Pa:Hr decoupling distribution (n={len(dec)})')\n",
|
||
|
|
"ax.legend()\n",
|
||
|
|
"\n",
|
||
|
|
"ax = axes[1]\n",
|
||
|
|
"sc = ax.scatter(dec['start_time_local'], dec['decoupling_pct'],\n",
|
||
|
|
" c=dec['distance_km'], cmap='viridis', alpha=0.75, s=28)\n",
|
||
|
|
"ax.axhline(5, color='#2a9d8f', ls='--', lw=1)\n",
|
||
|
|
"ax.axhline(10, color='#e76f51', ls='--', lw=1)\n",
|
||
|
|
"ax.axhline(0, color='gray', lw=0.6)\n",
|
||
|
|
"ax.set_ylabel('decoupling (%)'); ax.set_title('decoupling over time (color = distance km)')\n",
|
||
|
|
"plt.colorbar(sc, ax=ax, label='distance (km)')\n",
|
||
|
|
"fig.autofmt_xdate()\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Aerobic runs only \u2014 the clean view\n",
|
||
|
|
"\n",
|
||
|
|
"Decoupling is only interpretable on **steady aerobic** efforts. Filter to runs \u2265 8 km with avg HR < 165 (well below threshold for a sub-3:30 marathoner) and look at the trend. Less drift over time = better aerobic conditioning."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"aerobic = dec[(dec['distance_km'] >= 8) & (dec['avg_hr'] < 165)].copy()\n",
|
||
|
|
"aerobic['quarter'] = aerobic['start_time_local'].dt.to_period('Q').dt.to_timestamp()\n",
|
||
|
|
"print(f'{len(aerobic)} aerobic runs')\n",
|
||
|
|
"\n",
|
||
|
|
"fig, ax = plt.subplots(figsize=(11, 4.5))\n",
|
||
|
|
"ax.scatter(aerobic['start_time_local'], aerobic['decoupling_pct'],\n",
|
||
|
|
" c=aerobic['avg_hr'], cmap='magma_r', s=40, alpha=0.85)\n",
|
||
|
|
"\n",
|
||
|
|
"# rolling median (smooth trend)\n",
|
||
|
|
"aerobic_sorted = aerobic.sort_values('start_time_local')\n",
|
||
|
|
"rolling = aerobic_sorted.set_index('start_time_local')['decoupling_pct'].rolling('120D', min_periods=5).median()\n",
|
||
|
|
"ax.plot(rolling.index, rolling.values, color='black', lw=2, label='120-day rolling median')\n",
|
||
|
|
"\n",
|
||
|
|
"ax.axhline(5, color='#2a9d8f', ls='--', lw=1)\n",
|
||
|
|
"ax.axhline(10, color='#e76f51', ls='--', lw=1)\n",
|
||
|
|
"ax.set_ylabel('decoupling (%)')\n",
|
||
|
|
"ax.set_title('Cardiac drift on aerobic runs (\u22658 km, avg HR < 165)')\n",
|
||
|
|
"ax.legend(loc='upper right')\n",
|
||
|
|
"fig.autofmt_xdate()\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# Year-over-year aerobic decoupling summary\n",
|
||
|
|
"aerobic.groupby('year').agg(\n",
|
||
|
|
" n=('decoupling_pct', 'size'),\n",
|
||
|
|
" median_drift=('decoupling_pct', 'median'),\n",
|
||
|
|
" mean_drift=('decoupling_pct', 'mean'),\n",
|
||
|
|
" pct_under_5=('decoupling_pct', lambda s: (s < 5).mean() * 100),\n",
|
||
|
|
").round(2)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 1b. Per-second decoupling \u2014 race-day deep dive\n",
|
||
|
|
"\n",
|
||
|
|
"Lap-level decoupling (above) is coarse. With FIT files linked (since the takeout-export ingest), we can read the per-second `heart_rate` and `enhanced_speed` directly and compute Friel's decoupling without the noise from aid-station stops and lap rounding.\n",
|
||
|
|
"\n",
|
||
|
|
"**Method:**\n",
|
||
|
|
"1. Drop the first 5 min (warmup) and last 2 min (cooldown / finish sprint).\n",
|
||
|
|
"2. Drop records with speed < 0.5 m/s \u2014 aid-station pauses don't drag the mean.\n",
|
||
|
|
"3. Slice the moving time into equal-time chunks (halves or quartiles).\n",
|
||
|
|
"4. For each chunk: `efficiency = mean(speed) / mean(HR)`.\n",
|
||
|
|
"5. `decoupling % = (eff_first / eff_chunk \u2212 1) \u00d7 100` \u2014 positive = drift.\n",
|
||
|
|
"\n",
|
||
|
|
"Friel's rule: < 5% on a steady aerobic run = aerobically developed; > 10% = unsustainable pacing or fueling deficit. Race-day numbers are expected to be higher than training (you push the back half), but *how much* higher matters."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"from analysis import load_fit_records, fit_decoupling, fit_rolling_efficiency\n",
|
||
|
|
"\n",
|
||
|
|
"race_meta = pd.read_sql('''\n",
|
||
|
|
" SELECT a.activity_id, a.start_time_local, a.distance_m/1000 AS km, a.avg_hr\n",
|
||
|
|
" FROM activities a JOIN activity_fit_files f USING(activity_id)\n",
|
||
|
|
" WHERE a.distance_m >= 45000 AND a.distance_m <= 60000\n",
|
||
|
|
" AND a.activity_type='running'\n",
|
||
|
|
" ORDER BY a.start_time_local\n",
|
||
|
|
"''', conn, parse_dates=['start_time_local'])\n",
|
||
|
|
"print(f'{len(race_meta)} prior 50K-class races with FIT linked:')\n",
|
||
|
|
"race_meta"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# Load all four FITs once, cache the records frames\n",
|
||
|
|
"race_records = {}\n",
|
||
|
|
"for _, r in race_meta.iterrows():\n",
|
||
|
|
" aid = int(r['activity_id'])\n",
|
||
|
|
" race_records[aid] = load_fit_records(conn, aid)\n",
|
||
|
|
" print(f\" {r['start_time_local'].date()} aid={aid} records={len(race_records[aid]):,}\")"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Halves and quartiles \u2014 when does the drift start?"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"halves = []\n",
|
||
|
|
"quarts = []\n",
|
||
|
|
"for _, r in race_meta.iterrows():\n",
|
||
|
|
" aid = int(r['activity_id'])\n",
|
||
|
|
" h = fit_decoupling(race_records[aid], segments=2)\n",
|
||
|
|
" h.insert(0, 'race', r['start_time_local'].date())\n",
|
||
|
|
" halves.append(h)\n",
|
||
|
|
" q = fit_decoupling(race_records[aid], segments=4)\n",
|
||
|
|
" q.insert(0, 'race', r['start_time_local'].date())\n",
|
||
|
|
" quarts.append(q)\n",
|
||
|
|
"halves_df = pd.concat(halves, ignore_index=True)\n",
|
||
|
|
"quarts_df = pd.concat(quarts, ignore_index=True)\n",
|
||
|
|
"\n",
|
||
|
|
"print('Per-race half-by-half decoupling:')\n",
|
||
|
|
"(halves_df.pivot(index='race', columns='segment', values='decoupling_pct')\n",
|
||
|
|
" .round(1).rename(columns={1:'Q1+Q2', 2:'Q3+Q4'}))"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"print('Quartile decoupling \u2014 where the drift actually starts:')\n",
|
||
|
|
"(quarts_df.pivot(index='race', columns='segment', values='decoupling_pct')\n",
|
||
|
|
" .round(1).rename(columns={1:'Q1',2:'Q2',3:'Q3',4:'Q4'}))"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# Visualise quartile decoupling \u2014 bars per race, grouped by quartile\n",
|
||
|
|
"fig, ax = plt.subplots(figsize=(11, 4.5))\n",
|
||
|
|
"races = sorted(quarts_df['race'].unique())\n",
|
||
|
|
"x = np.arange(len(races))\n",
|
||
|
|
"w = 0.2\n",
|
||
|
|
"colors = ['#2a9d8f', '#e9c46a', '#f4a261', '#e76f51'] # cool \u2192 hot\n",
|
||
|
|
"for i in range(1, 5):\n",
|
||
|
|
" vals = [quarts_df[(quarts_df.race == r) & (quarts_df.segment == i)]['decoupling_pct'].iloc[0]\n",
|
||
|
|
" for r in races]\n",
|
||
|
|
" ax.bar(x + (i - 2.5) * w, vals, w, color=colors[i-1], label=f'Q{i}')\n",
|
||
|
|
"ax.axhline(0, color='black', lw=0.5)\n",
|
||
|
|
"ax.axhline(10, color='gray', ls='--', lw=1, label='Friel \"unsustainable\" threshold')\n",
|
||
|
|
"ax.set_xticks(x)\n",
|
||
|
|
"ax.set_xticklabels([str(r) for r in races])\n",
|
||
|
|
"ax.set_ylabel('decoupling (%)')\n",
|
||
|
|
"ax.set_title('Per-second decoupling by race quartile \u2014 the wall lands in Q3 every time')\n",
|
||
|
|
"ax.legend(loc='upper left', ncol=5)\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Rolling efficiency curves \u2014 when does the wheels-come-off moment hit?\n",
|
||
|
|
"\n",
|
||
|
|
"5-minute rolling speed/HR over elapsed time. Flat = pacing matches HR. Falling curve = decoupling in progress. The y-axis is the same physical quantity Friel's method aggregates, just plotted continuously."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"fig, axes = plt.subplots(len(race_meta), 1, figsize=(13, 2.5 * len(race_meta)),\n",
|
||
|
|
" sharex=True, squeeze=False)\n",
|
||
|
|
"axes = axes.flatten()\n",
|
||
|
|
"for ax, (_, r) in zip(axes, race_meta.iterrows()):\n",
|
||
|
|
" aid = int(r['activity_id'])\n",
|
||
|
|
" rolled = fit_rolling_efficiency(race_records[aid], window_s=300)\n",
|
||
|
|
" valid = rolled.dropna(subset=['rolling_efficiency'])\n",
|
||
|
|
" ax.plot(valid['elapsed_min'], valid['rolling_efficiency'], color='#264653', lw=1.5)\n",
|
||
|
|
" # Normalise against the first 30 minutes' mean to show % drop\n",
|
||
|
|
" base = valid.loc[valid['elapsed_min'] < 30, 'rolling_efficiency'].mean()\n",
|
||
|
|
" if base and base > 0:\n",
|
||
|
|
" ax2 = ax.twinx()\n",
|
||
|
|
" ax2.plot(valid['elapsed_min'], (valid['rolling_efficiency'] / base - 1) * 100,\n",
|
||
|
|
" color='#e76f51', lw=1, alpha=0.6)\n",
|
||
|
|
" ax2.set_ylabel('% vs first 30 min', color='#e76f51', fontsize=9)\n",
|
||
|
|
" ax2.axhline(0, color='#e76f51', ls=':', lw=0.6, alpha=0.5)\n",
|
||
|
|
" ax.set_ylabel('speed / HR')\n",
|
||
|
|
" ax.set_title(f\"{r['start_time_local'].date()} \u2014 {r['km']:.1f} km, avg HR {r['avg_hr']:.0f}\",\n",
|
||
|
|
" fontsize=10)\n",
|
||
|
|
"axes[-1].set_xlabel('elapsed minutes')\n",
|
||
|
|
"fig.suptitle('Rolling efficiency through each race (5-min window)', y=1.01)\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### HR and pace traces, side by side\n",
|
||
|
|
"\n",
|
||
|
|
"Same data, separated: HR (left axis, magma colour-scale) and pace (right axis, inverted so faster is up). The interesting moments are where the curves *diverge* \u2014 HR climbing while pace stays flat (drift) or HR steady while pace falls (just tired legs)."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"fig, axes = plt.subplots(len(race_meta), 1, figsize=(13, 2.8 * len(race_meta)),\n",
|
||
|
|
" sharex=True, squeeze=False)\n",
|
||
|
|
"axes = axes.flatten()\n",
|
||
|
|
"for ax, (_, r) in zip(axes, race_meta.iterrows()):\n",
|
||
|
|
" aid = int(r['activity_id'])\n",
|
||
|
|
" rec = race_records[aid].dropna(subset=['heart_rate','speed_mps','elapsed_s'])\n",
|
||
|
|
" rec = rec[rec['speed_mps'] > 0.5]\n",
|
||
|
|
" em = rec['elapsed_s'] / 60\n",
|
||
|
|
" rolled_hr = rec['heart_rate'].rolling(300, min_periods=30).mean()\n",
|
||
|
|
" rolled_pace = (1 / rec['speed_mps']) * 1000 / 60\n",
|
||
|
|
" rolled_pace = rolled_pace.rolling(300, min_periods=30).mean()\n",
|
||
|
|
" ax.plot(em, rolled_hr, color='#9b2226', lw=1.4, label='HR (5-min avg)')\n",
|
||
|
|
" ax.set_ylabel('HR (bpm)', color='#9b2226')\n",
|
||
|
|
" ax.tick_params(axis='y', labelcolor='#9b2226')\n",
|
||
|
|
" ax2 = ax.twinx()\n",
|
||
|
|
" ax2.plot(em, rolled_pace, color='#264653', lw=1.4, label='pace (5-min avg)')\n",
|
||
|
|
" ax2.set_ylabel('pace (min/km)', color='#264653')\n",
|
||
|
|
" ax2.tick_params(axis='y', labelcolor='#264653')\n",
|
||
|
|
" ax2.invert_yaxis() # faster = up\n",
|
||
|
|
" ax.set_title(f\"{r['start_time_local'].date()} \u2014 {r['km']:.1f} km\", fontsize=10)\n",
|
||
|
|
"axes[-1].set_xlabel('elapsed minutes')\n",
|
||
|
|
"fig.suptitle('HR (red) and pace (dark) \u2014 divergence = decoupling', y=1.01)\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Per-second vs per-mile decoupling \u2014 sanity check\n",
|
||
|
|
"\n",
|
||
|
|
"How does the FIT-derived number compare to the lap-level decoupling we computed in \u00a71? Per-second is correctly excluding stopped time and lap rounding, so should be **lower** than the per-mile number for the same race \u2014 but the qualitative ranking should agree."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# Pull the per-mile (\u00a71) value for each race and compare to per-second\n",
|
||
|
|
"lap_dec = dec.set_index('activity_id')['decoupling_pct']\n",
|
||
|
|
"rows = []\n",
|
||
|
|
"for _, r in race_meta.iterrows():\n",
|
||
|
|
" aid = int(r['activity_id'])\n",
|
||
|
|
" ps = halves_df[(halves_df.race == r['start_time_local'].date()) & (halves_df.segment == 2)]['decoupling_pct'].iloc[0]\n",
|
||
|
|
" lap = lap_dec.get(aid, float('nan'))\n",
|
||
|
|
" rows.append({'race': r['start_time_local'].date(), 'km': r['km'],\n",
|
||
|
|
" 'per_mile_decoupling_pct': round(lap, 1),\n",
|
||
|
|
" 'per_second_decoupling_pct': round(ps, 1),\n",
|
||
|
|
" 'delta': round(lap - ps, 1) if not pd.isna(lap) else None})\n",
|
||
|
|
"pd.DataFrame(rows)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### What this means for the 50-mile\n",
|
||
|
|
"\n",
|
||
|
|
"The per-second view localises the drift: in every prior race the wheels come off around the **4-hour mark** (between Q2 and Q3). For the 50-mile that's roughly halfway through the race \u2014 exactly when fueling errors stop being recoverable.\n",
|
||
|
|
"\n",
|
||
|
|
"Three concrete implications:\n",
|
||
|
|
"\n",
|
||
|
|
"1. **Front-load fueling.** The textbook glycogen depletion curve says 90 min of running on stored glycogen, then performance falls off without external carbs. Q1 (the easy half) shouldn't be a fueling holiday \u2014 every aid station, every hour, from the start.\n",
|
||
|
|
"2. **Recalibrate pace by HR, not by feel.** The rolling-efficiency plots show HR rising while pace falls. Setting an HR ceiling (e.g. Z2 top = 143 bpm for the long run, slightly higher for race) and *enforcing it* would flatten the Q3 collapse.\n",
|
||
|
|
"3. **What success looks like on Sept 12.** A 50-mile race executed cleanly should look like the *first half* of these 50K curves repeated twice. If the Q3 wall reappears around hour 4\u20135, treat it as a planned aid-station break to top up calories before continuing."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 2. Cadence and stride length\n",
|
||
|
|
"\n",
|
||
|
|
"At a given pace, faster runners tend to have **higher cadence and shorter stride**. Watching cadence-vs-pace and stride-vs-pace by year shows whether form is shifting independently of fitness.\n",
|
||
|
|
"\n",
|
||
|
|
"Garmin's `averageRunCadence` per split is already **both-legs** steps-per-minute (typical running range 150\u2013185). `strideLength` is in cm."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"form = splits.dropna(subset=['averageRunCadence', 'strideLength', 'pace_min_per_km']).copy()\n",
|
||
|
|
"form = form[form['averageRunCadence'] > 0].copy() # zeros = walking/standing intervals\n",
|
||
|
|
"form['cadence_spm'] = form['averageRunCadence'] # already both-legs SPM\n",
|
||
|
|
"form['stride_m'] = form['strideLength'] / 100\n",
|
||
|
|
"form = form[(form['cadence_spm'] >= 140) & (form['cadence_spm'] <= 200)] # drop walks/junk\n",
|
||
|
|
"print(f'{len(form):,} clean form splits, {form.activity_id.nunique()} runs')\n",
|
||
|
|
"\n",
|
||
|
|
"fig, axes = plt.subplots(1, 2, figsize=(13, 5), sharex=True)\n",
|
||
|
|
"years = sorted(form['year'].unique())\n",
|
||
|
|
"cmap = plt.cm.viridis(np.linspace(0, 0.9, len(years)))\n",
|
||
|
|
"\n",
|
||
|
|
"for c, y in zip(cmap, years):\n",
|
||
|
|
" d = form[form['year'] == y]\n",
|
||
|
|
" axes[0].scatter(d['pace_min_per_km'], d['cadence_spm'], s=8, alpha=0.35, color=c, label=str(y))\n",
|
||
|
|
" axes[1].scatter(d['pace_min_per_km'], d['stride_m'], s=8, alpha=0.35, color=c, label=str(y))\n",
|
||
|
|
"\n",
|
||
|
|
"axes[0].set_ylabel('cadence (steps/min, both legs)')\n",
|
||
|
|
"axes[1].set_ylabel('stride length (m)')\n",
|
||
|
|
"for ax in axes:\n",
|
||
|
|
" ax.set_xlabel('pace (min/km)')\n",
|
||
|
|
" ax.invert_xaxis() # faster runs to the right\n",
|
||
|
|
"axes[0].legend(title='year', loc='lower left', fontsize=8)\n",
|
||
|
|
"axes[0].set_title('Cadence vs pace')\n",
|
||
|
|
"axes[1].set_title('Stride length vs pace')\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Form at a controlled pace\n",
|
||
|
|
"\n",
|
||
|
|
"Bin splits into a narrow easy-pace band (5:30\u20136:30 min/km) and look at cadence / stride / vertical metrics year-over-year. Holding pace constant strips out the obvious \"faster = higher cadence\" effect and isolates technique drift."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"band = form[(form['pace_min_per_km'] >= 5.5) & (form['pace_min_per_km'] <= 6.5)].copy()\n",
|
||
|
|
"vert = splits.dropna(subset=['verticalOscillation', 'verticalRatio', 'groundContactTime'])\n",
|
||
|
|
"vert = vert[(vert['pace_min_per_km'] >= 5.5) & (vert['pace_min_per_km'] <= 6.5)]\n",
|
||
|
|
"\n",
|
||
|
|
"summary = band.groupby('year').agg(\n",
|
||
|
|
" n_splits=('cadence_spm', 'size'),\n",
|
||
|
|
" cadence_med=('cadence_spm', 'median'),\n",
|
||
|
|
" stride_med=('stride_m', 'median'),\n",
|
||
|
|
")\n",
|
||
|
|
"vsum = vert.groupby('year').agg(\n",
|
||
|
|
" vert_osc_cm=('verticalOscillation', 'median'),\n",
|
||
|
|
" vert_ratio=('verticalRatio', 'median'),\n",
|
||
|
|
" gct_ms=('groundContactTime', 'median'),\n",
|
||
|
|
")\n",
|
||
|
|
"summary = summary.join(vsum).round(2)\n",
|
||
|
|
"summary"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"fig, axes = plt.subplots(1, 3, figsize=(14, 4))\n",
|
||
|
|
"summary['cadence_med'].plot(kind='bar', ax=axes[0], color='#264653')\n",
|
||
|
|
"axes[0].set_ylabel('cadence (spm)'); axes[0].set_title('Cadence at easy pace')\n",
|
||
|
|
"axes[0].set_ylim(summary['cadence_med'].min() - 3, summary['cadence_med'].max() + 3)\n",
|
||
|
|
"\n",
|
||
|
|
"summary['stride_med'].plot(kind='bar', ax=axes[1], color='#2a9d8f')\n",
|
||
|
|
"axes[1].set_ylabel('stride length (m)'); axes[1].set_title('Stride at easy pace')\n",
|
||
|
|
"axes[1].set_ylim(summary['stride_med'].min() - 0.05, summary['stride_med'].max() + 0.05)\n",
|
||
|
|
"\n",
|
||
|
|
"if summary['vert_osc_cm'].notna().any():\n",
|
||
|
|
" summary['vert_osc_cm'].plot(kind='bar', ax=axes[2], color='#e76f51')\n",
|
||
|
|
" axes[2].set_ylabel('vertical oscillation (cm)'); axes[2].set_title('Vertical bounce at easy pace')\n",
|
||
|
|
" axes[2].set_ylim(summary['vert_osc_cm'].min() - 0.5, summary['vert_osc_cm'].max() + 0.5)\n",
|
||
|
|
"else:\n",
|
||
|
|
" axes[2].text(0.5, 0.5, 'no vertical-osc data', ha='center', va='center', transform=axes[2].transAxes)\n",
|
||
|
|
"\n",
|
||
|
|
"for ax in axes:\n",
|
||
|
|
" ax.set_xlabel('year')\n",
|
||
|
|
"fig.suptitle('Form metrics in the 5:30\u20136:30 min/km band, year over year')\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 3. Route clustering \u2014 pace controlled for terrain\n",
|
||
|
|
"\n",
|
||
|
|
"Raw pace year-over-year mixes terrain, weather, intent. Cluster runs by their **start coordinates** (greedy haversine, 250 m radius) and you get \"my usual routes.\" Within a cluster the route is roughly the same, so pace differences are mostly fitness, not geography."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"starts = (splits.dropna(subset=['startLatitude', 'startLongitude'])\n",
|
||
|
|
" .groupby('activity_id')\n",
|
||
|
|
" .agg(lat=('startLatitude', 'first'),\n",
|
||
|
|
" lon=('startLongitude', 'first'),\n",
|
||
|
|
" start_time=('start_time_local', 'first'),\n",
|
||
|
|
" distance_km=('distance_m', lambda s: s.sum() / 1000),\n",
|
||
|
|
" avg_hr=('avg_hr', 'mean'),\n",
|
||
|
|
" avg_pace=('pace_min_per_km', 'mean')))\n",
|
||
|
|
"starts['cluster'] = cluster_routes(starts['lat'].values, starts['lon'].values, radius_km=0.25)\n",
|
||
|
|
"print(f'{len(starts)} runs with start coords; {(starts.cluster >= 0).sum()} clustered, {(starts.cluster == -1).sum()} singletons')\n",
|
||
|
|
"\n",
|
||
|
|
"top = (starts[starts.cluster >= 0]\n",
|
||
|
|
" .groupby('cluster')\n",
|
||
|
|
" .agg(n=('cluster', 'size'),\n",
|
||
|
|
" lat=('lat', 'median'),\n",
|
||
|
|
" lon=('lon', 'median'),\n",
|
||
|
|
" first=('start_time', 'min'),\n",
|
||
|
|
" last=('start_time', 'max'),\n",
|
||
|
|
" med_dist=('distance_km', 'median'))\n",
|
||
|
|
" .sort_values('n', ascending=False))\n",
|
||
|
|
"top.head(10)"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"fig, ax = plt.subplots(figsize=(9, 7))\n",
|
||
|
|
"\n",
|
||
|
|
"# unclustered points faint\n",
|
||
|
|
"unc = starts[starts.cluster == -1]\n",
|
||
|
|
"ax.scatter(unc['lon'], unc['lat'], color='lightgray', s=12, alpha=0.6, label='singletons')\n",
|
||
|
|
"\n",
|
||
|
|
"# top-10 clusters colored\n",
|
||
|
|
"top10 = top.head(10).index.tolist()\n",
|
||
|
|
"cmap = plt.cm.tab10(np.linspace(0, 1, len(top10)))\n",
|
||
|
|
"for c, cl in zip(cmap, top10):\n",
|
||
|
|
" pts = starts[starts.cluster == cl]\n",
|
||
|
|
" ax.scatter(pts['lon'], pts['lat'], color=c, s=40, alpha=0.8,\n",
|
||
|
|
" label=f'route #{cl} (n={len(pts)})')\n",
|
||
|
|
"\n",
|
||
|
|
"ax.set_xlabel('longitude'); ax.set_ylabel('latitude')\n",
|
||
|
|
"ax.set_title('Run start points \u2014 top 10 recurring routes')\n",
|
||
|
|
"ax.legend(loc='best', fontsize=8)\n",
|
||
|
|
"ax.set_aspect('equal', adjustable='datalim')\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Pace progression within each frequent route\n",
|
||
|
|
"\n",
|
||
|
|
"Now restrict to clusters with \u22655 runs and plot pace over time per route. Slopes here are much cleaner than the global pace trend because terrain is held constant."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"frequent = top[top['n'] >= 5].index.tolist()\n",
|
||
|
|
"print(f'{len(frequent)} routes with \u22655 runs')\n",
|
||
|
|
"\n",
|
||
|
|
"if frequent:\n",
|
||
|
|
" n_cols = min(3, len(frequent))\n",
|
||
|
|
" n_rows = (len(frequent) + n_cols - 1) // n_cols\n",
|
||
|
|
" fig, axes = plt.subplots(n_rows, n_cols, figsize=(5 * n_cols, 3.2 * n_rows),\n",
|
||
|
|
" squeeze=False, sharey=True)\n",
|
||
|
|
" for ax, cl in zip(axes.flat, frequent):\n",
|
||
|
|
" d = (starts[starts.cluster == cl]\n",
|
||
|
|
" .dropna(subset=['avg_pace'])\n",
|
||
|
|
" .sort_values('start_time'))\n",
|
||
|
|
" ax.scatter(d['start_time'], d['avg_pace'], s=30, alpha=0.75, color='#264653')\n",
|
||
|
|
" if len(d) >= 3:\n",
|
||
|
|
" # rolling median by date\n",
|
||
|
|
" rd = d.set_index('start_time')['avg_pace'].rolling('120D', min_periods=2).median()\n",
|
||
|
|
" ax.plot(rd.index, rd.values, color='#e76f51', lw=1.5)\n",
|
||
|
|
" ax.invert_yaxis() # faster up\n",
|
||
|
|
" ax.set_title(f'route #{cl} \u2014 n={len(d)}, ~{top.loc[cl, \"med_dist\"]:.1f} km')\n",
|
||
|
|
" ax.set_ylabel('pace (min/km)')\n",
|
||
|
|
" # blank unused axes\n",
|
||
|
|
" for ax in axes.flat[len(frequent):]:\n",
|
||
|
|
" ax.set_visible(False)\n",
|
||
|
|
" fig.suptitle('Per-route pace over time (terrain held roughly constant)')\n",
|
||
|
|
" fig.tight_layout()\n",
|
||
|
|
"else:\n",
|
||
|
|
" print('No clusters with \u22655 runs.')"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## 4. HR-zone time-in-zone (Garmin-configured zones, per-second when possible)\n",
|
||
|
|
"\n",
|
||
|
|
"**Zones come from Garmin's `heartRateZones.json` (training method: HR_MAX),** not estimated from observed HR. Lactate-threshold HR sits at 182 inside Z4.\n",
|
||
|
|
"\n",
|
||
|
|
"| Zone | range (bpm) | feel | role |\n",
|
||
|
|
"|------|-------------|------|------|\n",
|
||
|
|
"| Z1 | 102\u2013122 | walk / recovery | active rest |\n",
|
||
|
|
"| Z2 | 123\u2013143 | conversational | **long-run target** |\n",
|
||
|
|
"| Z3 | 144\u2013164 | tempo | the \"junk-miles middle\" |\n",
|
||
|
|
"| Z4 | 165\u2013185 | threshold (LTHR=182) | hard sustained |\n",
|
||
|
|
"| Z5 | 186\u2013209 | VO\u2082 max | intervals |\n",
|
||
|
|
"\n",
|
||
|
|
"For each activity, time-in-zone comes from `activity_time_in_zone` (precomputed by `compute_time_in_zone.py`):\n",
|
||
|
|
"- **`source='fit'`** \u2014 per-second HR from the FIT file. Each record's `dt` (typically 1 s) goes into whichever zone its HR falls in. Accurate even when laps span zone boundaries.\n",
|
||
|
|
"- **`source='lap'`** \u2014 fallback for activities without a linked FIT. The whole lap's duration is assigned to whichever zone the *lap's average* HR sits in. Smears across boundaries, biases toward middle zones.\n",
|
||
|
|
"\n",
|
||
|
|
"**Polarized-training rule (Seiler):** elites accumulate ~80% of weekly time in Z1+Z2 and ~20% in Z4+Z5, with little Z3."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"from analysis import HR_ZONES_USER\n",
|
||
|
|
"\n",
|
||
|
|
"tiz = pd.read_sql('''\n",
|
||
|
|
" SELECT t.activity_id, t.z1_s, t.z2_s, t.z3_s, t.z4_s, t.z5_s, t.total_s, t.source,\n",
|
||
|
|
" a.start_time_local, a.activity_type, a.distance_m, a.duration_s\n",
|
||
|
|
" FROM activity_time_in_zone t\n",
|
||
|
|
" JOIN activities a USING(activity_id)\n",
|
||
|
|
" WHERE a.activity_type IN ('running','trail_running')\n",
|
||
|
|
"''', conn, parse_dates=['start_time_local'])\n",
|
||
|
|
"tiz['week'] = tiz['start_time_local'].dt.to_period('W-MON').dt.start_time\n",
|
||
|
|
"tiz['year'] = tiz['start_time_local'].dt.year\n",
|
||
|
|
"\n",
|
||
|
|
"print(f'{len(tiz)} activities with cached time-in-zone')\n",
|
||
|
|
"print(' source breakdown:')\n",
|
||
|
|
"print(tiz['source'].value_counts().to_string())\n",
|
||
|
|
"print(f' fit coverage: {(tiz.source==\"fit\").mean()*100:.0f}% of running activities')"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Sanity check: FIT vs lap method on the same race\n",
|
||
|
|
"\n",
|
||
|
|
"On the same activity, how different are the two estimates? Take the 2025-09-20 race (8 hours, 28k FIT records) and compute both, then compare. The lap method should over-weight whichever zone the typical lap average falls in (here, Z3) and under-count time spent in adjacent zones because boundary-crossing laps get rounded to one zone."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"from analysis import load_fit_records, time_in_zone_from_fit, time_in_zone_from_splits\n",
|
||
|
|
"\n",
|
||
|
|
"race_aid = race_meta.iloc[-1]['activity_id'] # most recent 50K (2025-09-20)\n",
|
||
|
|
"race_date = race_meta.iloc[-1]['start_time_local'].date()\n",
|
||
|
|
"\n",
|
||
|
|
"fit_tiz = time_in_zone_from_fit(race_records[int(race_aid)])\n",
|
||
|
|
"race_splits = pd.read_sql(\n",
|
||
|
|
" 'SELECT avg_hr, duration_s FROM activity_splits WHERE activity_id = ?',\n",
|
||
|
|
" conn, params=[int(race_aid)]\n",
|
||
|
|
")\n",
|
||
|
|
"lap_tiz = time_in_zone_from_splits(race_splits)\n",
|
||
|
|
"\n",
|
||
|
|
"compare = pd.DataFrame({\n",
|
||
|
|
" 'FIT (per-sec) min': {k: round(v / 60, 1) for k, v in fit_tiz.items()},\n",
|
||
|
|
" 'lap (avg-HR) min': {k: round(v / 60, 1) for k, v in lap_tiz.items()},\n",
|
||
|
|
"}).reindex(['Z1','Z2','Z3','Z4','Z5']).fillna(0)\n",
|
||
|
|
"compare.loc['total'] = compare.sum()\n",
|
||
|
|
"compare['delta (min)'] = (compare['FIT (per-sec) min'] - compare['lap (avg-HR) min']).round(1)\n",
|
||
|
|
"print(f'Race {race_date} \u2014 FIT vs lap time-in-zone:')\n",
|
||
|
|
"compare"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"# Weekly time-in-zone (hours) using whichever method was available per activity\n",
|
||
|
|
"wk_cols = ['z1_s','z2_s','z3_s','z4_s','z5_s']\n",
|
||
|
|
"weekly = tiz.groupby('week')[wk_cols].sum() / 3600\n",
|
||
|
|
"weekly.columns = ['Z1','Z2','Z3','Z4','Z5']\n",
|
||
|
|
"\n",
|
||
|
|
"fig, ax = plt.subplots(figsize=(14, 4.8))\n",
|
||
|
|
"colors = ['#264653', '#2a9d8f', '#e9c46a', '#f4a261', '#e76f51']\n",
|
||
|
|
"ax.stackplot(weekly.index, weekly.T.values, labels=weekly.columns, colors=colors, alpha=0.92)\n",
|
||
|
|
"ax.set_ylabel('hours / week')\n",
|
||
|
|
"ax.set_title('Weekly running time by HR zone (Garmin-configured zones)')\n",
|
||
|
|
"ax.legend(loc='upper left', ncol=5, fontsize=9)\n",
|
||
|
|
"fig.autofmt_xdate()\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Polarized index over time\n",
|
||
|
|
"\n",
|
||
|
|
"Collapse to three buckets:\n",
|
||
|
|
"- **easy** = Z1 + Z2 (HR \u2264 143)\n",
|
||
|
|
"- **moderate \"junk\"** = Z3 (144\u2013164)\n",
|
||
|
|
"- **hard** = Z4 + Z5 (HR \u2265 165, threshold and up)\n",
|
||
|
|
"\n",
|
||
|
|
"Polarized: high easy, low moderate, modest hard. Pyramidal: high easy, modest moderate, low hard. Threshold-heavy: lots of moderate."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"buckets = pd.DataFrame({\n",
|
||
|
|
" 'easy': weekly[['Z1','Z2']].sum(axis=1),\n",
|
||
|
|
" 'moderate': weekly['Z3'],\n",
|
||
|
|
" 'hard': weekly[['Z4','Z5']].sum(axis=1),\n",
|
||
|
|
"})\n",
|
||
|
|
"totals = buckets.sum(axis=1)\n",
|
||
|
|
"pct = buckets.div(totals.replace(0, np.nan), axis=0) * 100\n",
|
||
|
|
"\n",
|
||
|
|
"fig, axes = plt.subplots(2, 1, figsize=(14, 7), sharex=True)\n",
|
||
|
|
"\n",
|
||
|
|
"ax = axes[0]\n",
|
||
|
|
"ax.stackplot(pct.index, pct[['easy','moderate','hard']].T.values,\n",
|
||
|
|
" labels=['easy (Z1+Z2)', 'moderate (Z3)', 'hard (Z4+Z5)'],\n",
|
||
|
|
" colors=['#2a9d8f', '#e9c46a', '#e76f51'], alpha=0.9)\n",
|
||
|
|
"ax.axhline(80, color='black', ls='--', lw=0.8, label='80% easy target')\n",
|
||
|
|
"ax.set_ylabel('% of weekly time'); ax.set_ylim(0, 100)\n",
|
||
|
|
"ax.set_title('Polarized-training split (weekly)')\n",
|
||
|
|
"ax.legend(loc='lower left', ncol=4, fontsize=9)\n",
|
||
|
|
"\n",
|
||
|
|
"ax = axes[1]\n",
|
||
|
|
"rolling_pct = pct.rolling(4, min_periods=2).mean()\n",
|
||
|
|
"for col, c in [('easy','#2a9d8f'), ('moderate','#e9c46a'), ('hard','#e76f51')]:\n",
|
||
|
|
" ax.plot(rolling_pct.index, rolling_pct[col], color=c, lw=2, label=col)\n",
|
||
|
|
"ax.axhline(80, color='#2a9d8f', ls='--', lw=0.8)\n",
|
||
|
|
"ax.axhline(20, color='#e76f51', ls='--', lw=0.8)\n",
|
||
|
|
"ax.set_ylabel('% (4-week rolling mean)')\n",
|
||
|
|
"ax.legend(loc='best', fontsize=9)\n",
|
||
|
|
"fig.autofmt_xdate()\n",
|
||
|
|
"fig.tight_layout()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Yearly summary + FIT-method coverage\n",
|
||
|
|
"\n",
|
||
|
|
"`fit_coverage_%` shows what fraction of each year's activities had a linked FIT (and therefore got per-second zones). 2026's lower coverage reflects activities that synced via the live API but aren't in the takeout dump."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"yearly_hours = (tiz.groupby('year')[wk_cols].sum() / 3600).round(1)\n",
|
||
|
|
"yearly_hours.columns = ['Z1','Z2','Z3','Z4','Z5']\n",
|
||
|
|
"yearly_pct = yearly_hours.div(yearly_hours.sum(axis=1), axis=0) * 100\n",
|
||
|
|
"\n",
|
||
|
|
"out = pd.concat({'hours': yearly_hours, '%': yearly_pct.round(1)}, axis=1)\n",
|
||
|
|
"out['easy_pct'] = (yearly_pct[['Z1','Z2']].sum(axis=1)).round(1)\n",
|
||
|
|
"out['hard_pct'] = (yearly_pct[['Z4','Z5']].sum(axis=1)).round(1)\n",
|
||
|
|
"fit_coverage = tiz.groupby('year').apply(\n",
|
||
|
|
" lambda g: (g['source']=='fit').mean() * 100, include_groups=False\n",
|
||
|
|
").round(0)\n",
|
||
|
|
"out['fit_coverage_%'] = fit_coverage\n",
|
||
|
|
"out"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"### Race-build vs base-period zone distribution\n",
|
||
|
|
"\n",
|
||
|
|
"Compare what training looked like in the 12 weeks before each prior 50K race vs the rest of the year. A serious build should shift time into Z2 (long aerobic) and Z4 (threshold/tempo) and away from Z3."
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "code",
|
||
|
|
"execution_count": null,
|
||
|
|
"metadata": {},
|
||
|
|
"outputs": [],
|
||
|
|
"source": [
|
||
|
|
"race_dates = race_meta['start_time_local']\n",
|
||
|
|
"tiz['phase'] = 'base'\n",
|
||
|
|
"for rd in race_dates:\n",
|
||
|
|
" build_start = rd - pd.Timedelta(weeks=12)\n",
|
||
|
|
" mask = (tiz['start_time_local'] >= build_start) & (tiz['start_time_local'] < rd)\n",
|
||
|
|
" tiz.loc[mask, 'phase'] = f'build {rd.year}'\n",
|
||
|
|
"\n",
|
||
|
|
"phase_hours = (tiz.groupby('phase')[wk_cols].sum() / 3600).round(1)\n",
|
||
|
|
"phase_hours.columns = ['Z1','Z2','Z3','Z4','Z5']\n",
|
||
|
|
"phase_pct = (phase_hours.div(phase_hours.sum(axis=1), axis=0) * 100).round(1)\n",
|
||
|
|
"phase_pct['easy_Z1+Z2'] = (phase_pct['Z1'] + phase_pct['Z2']).round(1)\n",
|
||
|
|
"phase_pct['junk_Z3'] = phase_pct['Z3']\n",
|
||
|
|
"phase_pct['hard_Z4+Z5'] = (phase_pct['Z4'] + phase_pct['Z5']).round(1)\n",
|
||
|
|
"phase_pct[['easy_Z1+Z2','junk_Z3','hard_Z4+Z5']].sort_index()"
|
||
|
|
]
|
||
|
|
},
|
||
|
|
{
|
||
|
|
"cell_type": "markdown",
|
||
|
|
"metadata": {},
|
||
|
|
"source": [
|
||
|
|
"## What's left when FIT files are ingested\n",
|
||
|
|
"\n",
|
||
|
|
"All four sections now use per-second FIT data where it's linked (349 of 378 activities, 92%). Remaining lap-only activities are mostly old multi-sport / triathlon legs that no FIT was uploaded for. Useful follow-ups:\n",
|
||
|
|
"\n",
|
||
|
|
"- **Cadence stability** \u2014 plot cadence over elapsed time within a long run; quantify the drop in the final 15 %.\n",
|
||
|
|
"- **GPS polylines for route clustering** \u2014 current \u00a73 uses start coordinates only; with full FIT GPS tracks, match routes by Hausdorff distance (more accurate than start-only).\n",
|
||
|
|
"- **Decoupling vs fueling protocol** \u2014 once the user logs even informal fueling notes for a few long runs, regress decoupling against carb intake."
|
||
|
|
]
|
||
|
|
}
|
||
|
|
],
|
||
|
|
"metadata": {
|
||
|
|
"kernelspec": {
|
||
|
|
"display_name": ".venv",
|
||
|
|
"language": "python",
|
||
|
|
"name": "python3"
|
||
|
|
},
|
||
|
|
"language_info": {
|
||
|
|
"name": "python"
|
||
|
|
}
|
||
|
|
},
|
||
|
|
"nbformat": 4,
|
||
|
|
"nbformat_minor": 5
|
||
|
|
}
|