saving

2026-06-12 05:48:30 -04:00
parent 9d91ac8ebc
commit 64a5ab4b7f
37 changed files with 4530 additions and 407 deletions
--- a/README.md
+++ b/README.md
@@ -1,117 +1,154 @@
 # openrun — endurance running analytics

-A local-first, open-source endurance-training analysis pipeline. Garmin data in (live API or official export), SQLite on disk, pandas notebooks out. Banister CTL/ATL/TSB, Pa:HR decoupling (per-mile and per-second from FIT), Garmin-configured HR-zone time-in-zone, route clustering, race-plan projection with forward CTL/ATL/TSB.
+A local-first, open-source endurance-training analysis tool. Garmin data in (live API or official export), SQLite on disk, an in-browser dashboard out — plus a Python API and Jupyter notebooks for power users. No accounts, no cloud, no telemetry. Your data lives in one file on your machine.

-This README focuses on what's in the data, what gets computed from it, and how each derived metric is defined. The package name `openrun` is provisional; the local repo can be renamed without code changes.
+**What it gives you:** Banister CTL/ATL/TSB (fitness/fatigue/form), Pa:HR decoupling per-mile and per-second from FIT, FIT-aware HR zone time-in-zone, GPS route clustering, race-plan projection with forward CTL/ATL/TSB, off-watch activity logging, and a first-run web wizard so non-CLI users can get going in a few minutes.

 ```
 openrun/
-├── pyproject.toml         # packaged as `openrun` (src layout, hatchling)
-├── openrun.toml           # user config (HR zones, weight, LTHR, race calendar)
+├── pyproject.toml             # packaged as `openrun` (src layout, hatchling)
+├── openrun.toml               # your config — created by the setup wizard
+├── data/garmin.db             # SQLite, created on first run
 ├── src/openrun/
-│   ├── __init__.py        # re-exports the common API
-│   ├── config.py          # UserProfile, HRZones, BanisterParams, TOML loader
-│   ├── db.py              # SQLite schema + connection helper
-│   ├── model.py           # loaders + derived metrics (was analysis.py)
-│   └── ingest/
-│       ├── auth.py        # Garmin OAuth login
-│       ├── garmin_api.py  # Path B: incremental sync via garth
-│       ├── garmin_export.py # Path A: parse Connect/Takeout JSON + index FITs
-│       ├── fit_linker.py  # match Takeout FITs to activities by session.start_time
-│       └── time_in_zone.py # cache per-activity HR-zone breakdowns
-├── examples/notebooks/
-│   ├── 01_overview.ipynb     # data coverage, recent activity
-│   ├── 02_running.ipynb      # volume, pace, PMC (Banister), PRs
-│   ├── 03_recovery.ipynb     # sleep, HRV, RHR, body battery
-│   ├── 04_efficiency.ipynb   # m/beat trends year over year
-│   ├── 05_intra_run.ipynb    # decoupling, cadence, routes, HR zones
-│   └── 06_race_plan.ipynb    # periodised plan + tracker + projected PMC
-├── data/garmin.db            # SQLite (created on first run)
-└── {auth,sync,ingest_export,link_fit_files,compute_time_in_zone,db,analysis}.py
-                              # thin top-level shims for back-compat
+│   ├── config.py              # UserProfile, HRZones, BanisterParams + TOML reader/writer
+│   ├── db.py                  # schema + connect() helper
+│   ├── model.py               # loaders + derived metrics
+│   ├── plots.py               # matplotlib plot helpers
+│   ├── setup.py               # idempotent workspace bootstrap + status
+│   ├── ingest/
+│   │   ├── auth.py            # Garmin OAuth login
+│   │   ├── garmin_api.py      # Path B: incremental sync via garth
+│   │   ├── garmin_export.py   # Path A: parse Connect / Takeout export
+│   │   ├── fit_linker.py      # match Takeout FITs by session.start_time
+│   │   ├── manual.py          # CSV → manual_activities (off-watch sessions)
+│   │   └── time_in_zone.py    # per-activity TIZ cache
+│   └── web/                   # Streamlit app
+│       ├── app.py             # controller — st.navigation builds the sidebar
+│       ├── _helpers.py        # cached loaders + sidebar chrome
+│       └── pages/             # one file per page
+└── examples/notebooks/        # original analysis notebooks (kept as reference)
 ```

-Most callers want:
+## TL;DR — install and use
+
+```bash
+uv sync
+uv run openrun-web      # opens http://localhost:8501
+```
+
+A first-launch wizard walks you through profile + HR zones + race calendar, then either ingests a Garmin export zip or takes you to manual logging.
+
+See [QUICKSTART.md](QUICKSTART.md) for the step-by-step.
+
+---
+
+## The two interfaces
+
+### Web app (recommended for everyone)
+
+```bash
+uv run openrun-web
+```
+
+Eight pages, all wired to the same SQLite DB:
+
+| Page | What's on it |
+|---|---|
+| 📊 Dashboard | CTL/ATL/TSB tiles + chart, weekly km + ACWR, polarized split, recent runs |
+| 📋 Activities | Filter by type / date / distance / name → click a row to drill in |
+| 🏃 Activity detail | Splits + pace-vs-HR scatter, time-in-zone bars, FIT decoupling chart |
+| 📈 Race plan | Editable plan rows, projected PMC through race day, race-day TSB table |
+| 📝 Manual log | Form-based logging of off-watch sessions (strength, hikes, unrecorded runs) |
+| 🛌 Recovery | Sleep stages, HRV + RHR overlay, training→next-morning-HRV correlation |
+| 🫀 Efficiency | Metres per heartbeat with 30-run rolling median, distance × year heatmap |
+| 🔄 Sync | Status + in-browser zip ingest with live progress |
+
+### CLI (for scripting / power users)
+
+```bash
+openrun-init                  # bootstrap + status dashboard
+openrun-auth                  # Garmin OAuth login (live sync)
+openrun-sync [--full]         # incremental Garmin Connect sync (downloads per-second FIT for new activities)
+openrun-sync --fit-backfill --fit-type running  # pull per-second FITs for past runs, no website export
+openrun-sync --no-fit         # sync without the per-second FIT download
+openrun-ingest <path|zip>     # parse a Garmin export
+openrun-link-fit <export>     # match Takeout FITs by start time
+openrun-link-fit <root> --relink  # rewrite paths after moving the export
+openrun-time-in-zone          # populate the TIZ cache
+openrun-import-manual <csv>   # bulk off-watch import
+openrun-web                   # launch the Streamlit app
+```
+
+### Python API (for notebooks / one-offs)

 ```python
 from openrun import open_conn, load_activities, banister, daily_training_load_series
+
 conn = open_conn()
-runs = load_activities(conn, type='running')
-pmc  = banister(daily_training_load_series(conn))
+runs = load_activities(conn, type="running")
+pmc = banister(daily_training_load_series(conn, include_manual=True))
 ```

-User-specific values (HR zones, LTHR, weight, race calendar) live in `openrun.toml` — config discovery walks up from cwd looking for it. See [openrun.toml](openrun.toml) for the schema.
-
-CLI entry points (after `uv sync`):
-
-```bash
-openrun-auth               # OAuth login (Path B)
-openrun-sync [--full]      # incremental Garmin Connect sync
-openrun-ingest <export>    # parse a Connect/Takeout export
-openrun-link-fit <export>  # match Takeout FITs to activities by content
-openrun-time-in-zone       # populate the activity_time_in_zone cache
-```
-
-Top-level shims (`uv run sync.py`, `uv run ingest_export.py`, etc.) still work — they re-export `main()` from the corresponding `openrun.ingest.*` module.
-
 ---

 ## 1 — Source data

-### Two ways to get data into the DB
+### Two paths in, intentionally overlapping

-| Path | Script | What you get | When to use |
+| Path | Tool | What you get | When to use |
 |---|---|---|---|
-| **A. Official data export** (recommended) | `ingest_export.py` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
-| **B. Live API sync** | `sync.py` (after `auth.py`) | Last N days, lap-level splits, no FIT files | Incremental top-ups between exports |
+| **A. Official data export** (recommended for first load) | `openrun-ingest` | Complete history, FIT files, multi-year wellness | First load; periodic refreshes |
+| **B. Live API sync** | `openrun-auth` then `openrun-sync` — *or the web app's Sync page* | Last N days, lap-level splits, **per-second FIT files** (downloaded via the API) | Incremental top-ups between exports |

-The two paths overlap intentionally: the export is the source of truth, the live API fills the gap between exports. The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.
+The upsert logic only overwrites the raw-JSON column and `fetched_at` on conflict, so Path B values aren't clobbered by Path A re-ingests.

-### Garmin's export formats
+**No terminal required.** The web app's **Sync** page now logs in to Garmin (email/password, plus an MFA code when the account needs one) and runs the same pull in-process behind a **🔄 Sync now** button — with checkboxes for full backfill, per-second FIT download, and FIT backfill. Credentials go straight to Garmin; only OAuth tokens are cached in `.secrets/`. The CLI (`openrun-auth` + `openrun-sync`) does the identical thing for scripting.

-Garmin actually has *two* export formats, both confusingly called "the export":
+**Per-second from the API.** `openrun-sync` downloads each new activity's original FIT (`/download-service/files/activity/{id}`) into `data/fit/<id>.fit` and links it in `activity_fit_files` — the same table the export-based linker writes, so decoupling, FIT-based time-in-zone, and the route map all work without a website export. Skip it with `--no-fit`. To pull per-second history for activities synced before this existed, run `openrun-sync --fit-backfill [--fit-type running] [--fit-limit N]`, then `openrun-time-in-zone` to refresh the per-second TIZ cache. Downloads are idempotent — skipped when the FIT is already on disk and linked.

-1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data.  Filenames are `<activity_id>_<activity_name>.fit`, distances/durations in SI units (m, s). `ingest_export.py` was built for this.
-2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many unrelated domains (aviation, Tacx, InReach…). The running data lives under `DI_CONNECT/`. **Crucially different conventions:**
+### Garmin's two export formats
+
+Garmin has two unrelated formats both called "the export":
+
+1. **Garmin Connect data export** (`connect.zip`) — request via Account Management → Export Your Data. Filenames are `<activity_id>_<activity_name>.fit`, SI units throughout. The web app's Sync page accepts this zip directly.
+2. **Garmin Takeout dump** — UUID-named folder (`<uuid>_N/`) with many domains (aviation, Tacx, InReach…). Running data lives under `DI_CONNECT/`. **Different conventions:**
   - FIT filenames: `<email>_<upload_id>.fit` — upload IDs ≠ activity IDs
-   - `summarizedActivities` uses scaled-integer units: distance in **cm**, duration in **ms**, elevation in **cm**, speed in **m/s ÷ 10**
-   - `ingest_export.py` now detects and converts these; `link_fit_files.py` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename-based approach doesn't work
+   - `summarizedActivities` uses scaled-integer units: distance **cm**, duration **ms**, elevation **cm**, speed **m/s × 0.1**
+   - `openrun-ingest` detects and converts these; `openrun-link-fit` links FITs by content (parses `session.start_time` and matches activities by ±60 s) since the filename trick doesn't work.
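
The content-based linking described above amounts to a nearest-start-time match. The sketch below illustrates that idea only and is not the `openrun-link-fit` implementation: `link_by_start_time` and the sample IDs are hypothetical, and only the ±60 s tolerance comes from the text.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def link_by_start_time(fit_starts, activity_starts, tolerance_s=60):
    """Map each FIT upload ID to the nearest activity start within tolerance_s.

    Both arguments are dicts of id -> datetime (assumed to share a timezone).
    """
    acts = sorted(activity_starts.items(), key=lambda kv: kv[1])
    times = [t for _, t in acts]
    links = {}
    for fit_id, start in fit_starts.items():
        i = bisect_left(times, start)
        # Only the neighbours either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(acts)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs((times[j] - start).total_seconds()))
        if abs((times[best] - start).total_seconds()) <= tolerance_s:
            links[fit_id] = acts[best][0]
    return links


t0 = datetime(2025, 5, 1, 7, 0, 0)
links = link_by_start_time(
    {"upload_1": t0 + timedelta(seconds=12), "upload_2": t0 + timedelta(hours=3)},
    {111: t0, 222: t0 + timedelta(days=1)},
)
print(links)
```

Here `upload_2` stays unlinked because no activity starts within 60 s of it, which is the behaviour you want for FITs that belong to other Takeout domains.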

-For a Takeout dump, the full sequence is:
+For Takeout dumps, the full sequence is:

 ```bash
-# 1. Extract the FIT archives in place
 mkdir -p <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit
 unzip -d <export>/DI_CONNECT/DI-Connect-Uploaded-Files/fit \
   <export>/DI_CONNECT/DI-Connect-Uploaded-Files/UploadedFiles_0-_Part*.zip

-# 2. Ingest JSON + index FITs that have parseable activity IDs
-uv run ingest_export.py <export>/
-
-# 3. Link Takeout FITs to activities by start-time match
-uv run link_fit_files.py <export>/
-
-# 4. (Optional) precompute time-in-zone cache
-uv run compute_time_in_zone.py
+uv run openrun-ingest <export>/
+uv run openrun-link-fit <export>/
+uv run openrun-time-in-zone
 ```

-### Garmin's units & quirks worth knowing
+### Off-watch activities

-| Field | Live API | Takeout summarizedActivities | Convert by |
+Garmin-recorded activities aren't the whole picture: strength work, hikes, runs you forgot to record, and group runs on someone else's watch all count too. The web app's **Manual log** writes these to a separate `manual_activities` table; loaders union it into the PMC when you pass `include_manual=True` (the Dashboard does this by default).
+
+For bulk import, drop a CSV with `activity_date,activity_type,distance_km,duration_min,training_load,notes[,external_id]` and run `uv run openrun-import-manual workouts.csv`. The optional `external_id` makes re-imports idempotent.
+
+### Units worth knowing (Connect vs Takeout)
+
+| Field | Live API / Connect | Takeout `summarizedActivities` | Convert by |
 |---|---|---|---|
 | distance | m | cm | × 0.01 |
 | duration / movingDuration | s | ms | × 0.001 |
 | elevationGain / Loss | m | cm | × 0.01 |
-| averageSpeed / maxSpeed | m/s | m/s ÷ 10 | × 10 |
-| HR (avg, max) | bpm | bpm | — |
-| calories | kcal | kcal | — |
-| training_effect, vo2_max | scalar | scalar | — |
+| averageSpeed / maxSpeed | m/s | m/s × 0.1 | × 10 |
+| HR (avg, max), calories, training_effect, vo2_max | bpm / kcal / scalar | bpm / kcal / scalar | — |
 | cadence (`averageRunCadence`) | both-legs SPM | both-legs SPM | — (do **not** double) |
 | strideLength | cm | cm | × 0.01 for m |
 | FIT `position_lat/long` | semicircles | semicircles | × (180 / 2³¹) for degrees |
 | FIT `enhanced_speed` | m/s | m/s | — |

-The cadence trap bit me: `averageRunCadence` on this account's exports is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+ which trips any sane filter, then nothing makes it through.
+The cadence trap is real: `averageRunCadence` is already both-legs (~150–180 spm). Doubling it as if it were single-leg yields 300+, which trips any sane filter.
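
As a rough illustration of the table above, the conversion factors can be applied with a small lookup. This is a sketch rather than the code in `openrun-ingest`: `TAKEOUT_SCALE`, `takeout_to_si`, and `semicircles_to_degrees` are hypothetical names, and the sample row is made up.

```python
# Multipliers taken from the "Convert by" column; fields not listed pass through.
TAKEOUT_SCALE = {
    "distance": 0.01,         # cm -> m
    "duration": 0.001,        # ms -> s
    "movingDuration": 0.001,  # ms -> s
    "elevationGain": 0.01,    # cm -> m
    "elevationLoss": 0.01,    # cm -> m
    "averageSpeed": 10.0,     # scaled speed -> m/s
    "maxSpeed": 10.0,
}


def takeout_to_si(row):
    """Return a copy of a summarizedActivities-style dict in SI units."""
    out = {}
    for key, value in row.items():
        if key in TAKEOUT_SCALE and value is not None:
            out[key] = value * TAKEOUT_SCALE[key]
        else:
            out[key] = value  # e.g. cadence is already both-legs SPM: do not double
    return out


def semicircles_to_degrees(value):
    """Convert FIT position_lat / position_long semicircles to degrees."""
    return value * (180 / 2**31)


si = takeout_to_si(
    {"distance": 1_000_000, "duration": 3_600_000, "averageSpeed": 0.2778, "averageRunCadence": 172}
)
print(si, semicircles_to_degrees(2**30))
```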

 ---

@@ -121,18 +158,20 @@ The cadence trap bit me: `averageRunCadence` on this account's exports is alread

 | Table | Grain | Origin | Notes |
 |---|---|---|---|
-| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; the parsed columns are a curated subset |
-| `activity_splits` | one row per lap | Path B only | Garmin defaults to auto-laps (~1 km or 1 mi per split). `raw` has cadence, stride, vert osc, GPS bounds |
-| `activity_fit_files` | one row per linked FIT | Path A | `fit_path` is **relative** to whichever export root was passed to `link_fit_files.py` — keep the export folder in place |
-| `activity_time_in_zone` | one row per activity | precomputed | Cached by `compute_time_in_zone.py`; `source` is `'fit'` or `'lap'` |
-| `daily_steps` / `daily_sleep` / `daily_stress` / `daily_hrv` / `daily_body_battery` / `daily_intensity_minutes` / `daily_resting_hr` | one row per calendar date | both paths | Each has a `raw` JSON column with the full source payload |
-| `sync_state` | key-value | both paths | Records last-sync / last-ingest timestamps |
+| `activities` | one row per activity | both paths | `raw` JSON has every Garmin field; parsed columns are a curated subset |
+| `activity_splits` | one row per lap | Path B only | Auto-laps (~1 km or 1 mi). `raw` has cadence, stride, GPS bounds |
+| `activity_fit_files` | one row per linked FIT | linker | `fit_path` is stored **absolute** at link time. Move the export → run `openrun-link-fit --relink` |
+| `activity_time_in_zone` | one row per activity | precomputed | `source = 'fit'` (per-second) or `'lap'` (split-average) |
+| `manual_activities` | one row per logged session | web form / CSV | Parallel to `activities`, never clobbered by Garmin re-ingest |
+| `race_plan` | one row per ISO-Monday week | web editor | Feeds `banister_forecast` for projected PMC |
+| `daily_*` (steps / sleep / stress / hrv / body_battery / intensity_minutes / resting_hr) | one row per date | both paths | Each has a `raw` JSON column with the full payload |
+| `sync_state` | key/value | both paths | last-sync / last-ingest timestamps |

-The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface (sleep stage breakdowns, GPS bounding box, swimming-specific metrics, sport sub-types), unpack it from `raw` rather than re-ingesting:
+The `raw` column on every table is the full upstream JSON. If you need a field the schema doesn't surface, unpack it from `raw`:

 ```python
-from analysis import expand_raw, load_activities
-acts = load_activities(open_conn(), type='running')
+from openrun import open_conn, load_activities, expand_raw
+acts = load_activities(open_conn(), type="running")
 fields = expand_raw(acts)  # pd.json_normalize'd companion frame
 ```

@@ -140,7 +179,7 @@ fields = expand_raw(acts)  # pd.json_normalize'd companion frame

 ## 3 — Metrics & how they're calculated

-All loaders and helpers live in [analysis.py](analysis.py). The key derivations:
+All loaders and helpers live in [src/openrun/model.py](src/openrun/model.py). Highlights:

 ### Pace

@@ -148,19 +187,17 @@ All loaders and helpers live in [analysis.py](analysis.py). The key derivations:
 pace_min_per_km = (moving_duration_s / 60) / (distance_m / 1000)
 ```

-`moving_duration_s` excludes paused time. Activities with implausible paces (sub-3 min/km or over 30 min/km — covers world-record marathon pace on the fast side and walks-with-stops on the slow side) get masked to NaN in `load_activities` rather than dropped, so the row count is preserved.
+Implausible paces (sub-3 min/km or over 30 min/km — covers world-record marathon pace and walks-with-stops) get masked to NaN in `load_activities` rather than dropped, so row count is preserved.
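
A minimal sketch of the mask-instead-of-drop idea, assuming the 3 and 30 min/km bounds from the text. The function below is illustrative and standalone; it is not the code inside `load_activities`.

```python
import math


def pace_min_per_km(moving_duration_s, distance_m, lo=3.0, hi=30.0):
    """Pace in min/km, or NaN when the value is missing or implausible."""
    if not distance_m:
        return math.nan
    pace = (moving_duration_s / 60) / (distance_m / 1000)
    # Keep the row but blank the value, so counts stay intact downstream.
    return pace if lo <= pace <= hi else math.nan


paces = [pace_min_per_km(3000, 10_000), pace_min_per_km(600, 10_000)]
print(paces)
```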

 ### Acute:Chronic Workload Ratio (ACWR)

-Daily training-load sum, then:
-
 ```
 acute   = sum(training_load) over the last 7 days
 chronic = sum(training_load) over the last 28 days, divided by 4 for week-equivalence
 ACWR    = acute / chronic
 ```

-Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Implemented inline in [02_running.ipynb](notebooks/02_running.ipynb).
+Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Surfaced on the Dashboard.
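
The ratio above is easy to check by hand. Here is a small sketch under the assumption that the input is one load value per calendar day, oldest first, with rest days already filled as 0; `acwr` is a hypothetical helper name.

```python
def acwr(daily_load):
    """Acute:chronic workload ratio from a per-day load list (oldest first)."""
    if len(daily_load) < 28:
        return None  # not enough history for the chronic window
    acute = sum(daily_load[-7:])
    chronic = sum(daily_load[-28:]) / 4  # week-equivalent of the 28-day sum
    return acute / chronic if chronic else None


steady = acwr([50] * 28)
ramp = acwr([50] * 21 + [100] * 7)
print(steady, ramp)
```

A steady block sits at 1.0, while doubling the load for a single week pushes the ratio to about 1.6, past the 1.5 warning line.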

 ### Aerobic efficiency — "m/beat"

@@ -168,24 +205,22 @@ Sweet spot ~0.8–1.3, > 1.5 = aggressive ramp / injury risk. Implemented inline
 m_per_beat = distance_m / (duration_s * avg_hr / 60)
 ```

-Higher = more metres covered per heartbeat = fitter aerobic system at the run's effort. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. Used as the headline view in [04_efficiency.ipynb](notebooks/04_efficiency.ipynb).
+Higher = more metres per heartbeat = fitter aerobic system. Best compared YoY within a tight distance bucket so terrain and intent don't dominate. The Efficiency page does this with a 30-run rolling median plus a distance-bucket × year heatmap and an "easy runs only" headline view (HR < 75 % LTHR).

 ### Decoupling (Pa:Hr drift)

-Friel's method, in two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fueling deficit). *Negative* = improved (negative split).
+Friel's method, two resolutions. *Positive* = pace/HR fell off in the second half (cardiac drift, possibly fuelling deficit). *Negative* = improved (negative split).

-**Per-mile (lap level)** — `analysis.decoupling(splits, min_splits=6)`:
+**Per-mile (lap level)** — `decoupling(splits, min_splits=6)`:

 ```
 For each activity with ≥ min_splits laps:
-    halves   = first half / second half by lap count
+    halves   = first / second by lap count
    eff_half = duration-weighted mean(speed_mps) / mean(heart_rate)
    decoupling_pct = (eff_first / eff_second − 1) × 100
 ```
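
The pseudocode above can be sketched in plain Python. Two choices here are assumptions rather than facts about `decoupling`: heart rate is duration-weighted like speed, and with an odd lap count the extra lap goes to the second half. `lap_decoupling` is a hypothetical name.

```python
def lap_decoupling(splits, min_splits=6):
    """splits: list of (duration_s, speed_mps, heart_rate) tuples in lap order."""
    if len(splits) < min_splits:
        return None
    mid = len(splits) // 2

    def efficiency(laps):
        total = sum(d for d, _, _ in laps)
        speed = sum(d * s for d, s, _ in laps) / total
        hr = sum(d * h for d, _, h in laps) / total
        return speed / hr

    return (efficiency(splits[:mid]) / efficiency(splits[mid:]) - 1) * 100


steady = lap_decoupling([(300, 3.0, 150)] * 6)
drift = lap_decoupling([(300, 3.0, 150)] * 3 + [(300, 3.0, 165)] * 3)
print(steady, drift)
```

Holding pace while HR climbs from 150 to 165 bpm gives roughly 10 % decoupling, right at the upper edge of Friel's "sustainable" band.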

-Cheap, but smears across aid-station stops and rounds at lap boundaries.
-
-**Per-second (FIT)** — `analysis.fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:
+**Per-second (FIT)** — `fit_decoupling(records, segments=N, warmup_min=5, cooldown_min=2, min_speed_mps=0.5)`:

 ```
 records = per-second FIT messages
@@ -197,97 +232,44 @@ for each chunk:
 decoupling_pct = (efficiency[0] / efficiency[i] − 1) × 100
 ```

-Friel's published thresholds (interpret on **steady aerobic** efforts only):
- **< 5%** — aerobically developed
- **5–10%** — sustainable
- **> 10%** — pacing unsustainable for the distance, OR fueling shortfall, OR heat
+Friel's thresholds (interpret on **steady aerobic** efforts only):
+- **< 5 %** — aerobically developed
+- **5–10 %** — sustainable
+- **> 10 %** — pacing unsustainable for the distance, OR fuelling shortfall, OR heat

-In this dataset the per-second number is consistently **7–15 percentage points lower** than the per-mile number for the same run — the gap is aid-station noise.
+The Activity-detail page plots this with `plot_fit_decoupling`.

 ### HR zone time-in-zone

-**Zone boundaries come from Garmin's `heartRateZones.json`** (training method `HR_MAX`), not estimated from observed HR. The constants for this user live in `analysis.HR_ZONES_USER`:
+Zone boundaries come from `[user.hr_zones]` in your `openrun.toml` (matching what Garmin's `heartRateZones.json` advertises). Two implementations; `openrun-time-in-zone` picks the better one per activity and caches it.

-```python
-Z1: 102–122   (recovery)
-Z2: 123–143   (easy aerobic — long-run target)
-Z3: 144–164   (tempo / "junk-miles middle")
-Z4: 165–185   (threshold — LTHR=182 sits inside Z4)
-Z5: 186–209   (VO₂ max)
-```
+**From FIT (`source='fit'`)** — `time_in_zone_from_fit(records)`: per-second; large gaps (>30 s, paused recording) are clipped to 30 s.

-Time-in-zone has two implementations; `compute_time_in_zone.py` picks the better one per activity and caches the result in `activity_time_in_zone`.
-
-**From FIT (`source='fit'`)** — `analysis.time_in_zone_from_fit(records)`:
-
-```
-records = per-second FIT messages
-for each consecutive record pair:
-    dt = elapsed_seconds delta (clipped to [0, 30] so paused recording doesn't dump hours into one zone)
-    zone = bucket(this record's heart_rate)
-    accumulate dt into zone
-```
-
-**From splits (`source='lap'`)** — fallback when no FIT is linked:
-
-```
-for each split:
-    zone = bucket(split's avg_hr)
-    accumulate split.duration_s into that single zone
-```
-
-The FIT version is the right number; the lap version is **biased toward the middle zone**: when a lap's HR average is in Z3 but the actual HR oscillated into Z2 and Z4, the lap method dumps the whole lap into Z3. On a sample 8-hour race in this dataset, the lap method over-counted Z3 by 67 minutes and under-counted Z2 and Z4 by ~32 minutes each.
+**From splits (`source='lap'`)**, the fallback when no FIT is linked: each split's full duration is assigned to the single zone that contains its average HR. This is biased toward the middle zone because it smooths over within-lap variation; the FIT version is ground truth where available.
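
To make the middle-zone bias concrete, here is a sketch of both approaches on a made-up two-minute lap. The function names are hypothetical, the zone bounds are the sample values from the configuration section, and crediting each interval to the earlier record's HR is an assumption about how `time_in_zone_from_fit` buckets.

```python
ZONES = {"Z1": (100, 120), "Z2": (121, 140), "Z3": (141, 160), "Z4": (161, 180), "Z5": (181, 200)}


def bucket(hr, zones=ZONES):
    """Return the zone containing an integer bpm value, or None if outside all zones."""
    for name, (lo, hi) in zones.items():
        if lo <= hr <= hi:
            return name
    return None


def tiz_from_records(records, zones=ZONES, max_gap_s=30):
    """records: list of (elapsed_s, heart_rate) in time order."""
    totals = {name: 0.0 for name in zones}
    for (t0, hr), (t1, _) in zip(records, records[1:]):
        dt = min(max(t1 - t0, 0), max_gap_s)  # clip pauses so they can't flood one zone
        zone = bucket(hr, zones)
        if zone:
            totals[zone] += dt
    return totals


def tiz_from_laps(laps, zones=ZONES):
    """laps: list of (duration_s, avg_hr); the whole lap lands in one zone."""
    totals = {name: 0.0 for name in zones}
    for duration_s, avg_hr in laps:
        zone = bucket(avg_hr, zones)
        if zone:
            totals[zone] += duration_s
    return totals


# One 120 s lap: 60 s at 135 bpm, then 60 s at 165 bpm (average 150 bpm).
records = [(t, 135) for t in range(60)] + [(t, 165) for t in range(60, 121)]
per_second = tiz_from_records(records)
per_lap = tiz_from_laps([(120, 150)])
print(per_second, per_lap)
```

The per-second version splits the lap evenly between Z2 and Z4, while the lap version reports two minutes of Z3 that never happened.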

 ### Polarized split

-A three-bucket simplification used in [05_intra_run.ipynb §4](notebooks/05_intra_run.ipynb):
-
 ```
-easy     = Z1 + Z2     (HR ≤ 143)
-moderate = Z3          (144–164, the "junk-miles middle")
-hard     = Z4 + Z5     (≥ 165, threshold and up)
+easy     = Z1 + Z2     (recovery + easy aerobic)
+moderate = Z3          (tempo)
+hard     = Z4 + Z5     (threshold + VO₂)
 ```

-Seiler's polarized target is ~80% easy, < 10% moderate, ~20% hard. Pyramidal training is high easy, modest moderate, low hard. Threshold-heavy training is the opposite — most time in moderate.
+Seiler's polarised target is roughly 80 % easy, with the remaining ~20 % mostly hard and moderate kept under 10 %. The Dashboard shows your last-12-week split as tiles plus a stacked bar.

-### Cadence & stride
+### Banister fitness / fatigue / form (PMC)

-Both come directly from Garmin's lap data (`averageRunCadence`, `strideLength`) and from FIT records (`cadence`, `step_length`). Two gotchas:
-
-1. **Cadence is already both-legs SPM** for this user's exports — do not double. Sanity range for running: 140–200 spm.
-2. **Stride length is in cm** in lap data (`strideLength: 78` → 78 cm → 0.78 m) and in **mm** in FIT records (`step_length: 1041` → 104 cm). Divide accordingly.
-
-The notebook restricts form analysis to splits with cadence > 0 (drops walks/standing intervals) and to a controlled pace band (5:30–6:30 min/km) when comparing YoY so fitness changes don't contaminate the form signal.
-
-### GPS route clustering
-
-`analysis.cluster_routes(lats, lons, radius_km=0.25)`:
+The standard endurance lens (TrainingPeaks PMC), in `banister(daily_load, ctl_tau=42, atl_tau=7)`:

 ```
-Greedy haversine clustering:
-  For each unassigned point i:
-    distances = haversine(point_i, all_other_points)
-    neighbours = points within radius_km
-    if len(neighbours) >= 2:
-        assign all to next_label
-    next_label += 1
+CTL_today = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))   τ_CTL = 42d
+ATL_today = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))   τ_ATL =  7d
+TSB_today = CTL_yesterday − ATL_yesterday    # yesterday's values (TP convention)
 ```

-Per-route pace trends control for terrain — the same loop run in 2023 vs 2025 says way more about fitness than raw monthly pace. Adequate for a few hundred starts; switch to `sklearn.cluster.DBSCAN(metric='haversine')` for thousands.
+`daily_training_load_series(conn, *, include_manual=True)` is the canonical input — it unions Garmin-recorded TL with `manual_activities`. Rest days are filled with 0; both EWMAs still update.
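
The recursion above fits in a few lines of Python. This is a standalone sketch, not the package's `banister`: the name `banister_sketch` is hypothetical, and seeding CTL and ATL at 0 is an assumption.

```python
import math


def banister_sketch(daily_load, ctl_tau=42, atl_tau=7):
    """Return one (ctl, atl, tsb) tuple per day for a per-day load list."""
    k_ctl = math.exp(-1 / ctl_tau)
    k_atl = math.exp(-1 / atl_tau)
    ctl = atl = 0.0  # assumed starting point; real history would seed these
    out = []
    for load in daily_load:
        tsb = ctl - atl  # yesterday's values, per the TP convention
        ctl = ctl * k_ctl + load * (1 - k_ctl)
        atl = atl * k_atl + load * (1 - k_atl)
        out.append((ctl, atl, tsb))
    return out


# 400 days at a constant load of 100, then a 7-day rest block.
series = banister_sketch([100] * 400 + [0] * 7)
print(series[399], series[-1])
```

After the long constant block both averages converge on the daily load; during the rest week ATL falls faster than CTL, so TSB turns positive.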

-### Banister fitness / fatigue / form (Performance Management Chart)
-
-The standard endurance-training lens (TrainingPeaks PMC), implemented in `analysis.banister(daily_load, ctl_tau=42, atl_tau=7)`:
-
-```
-CTL_today  = CTL_yesterday · exp(−1/τ_CTL) + load_today · (1 − exp(−1/τ_CTL))   τ_CTL = 42d
-ATL_today  = ATL_yesterday · exp(−1/τ_ATL) + load_today · (1 − exp(−1/τ_ATL))   τ_ATL =  7d
-TSB_today  = CTL_yesterday − ATL_yesterday          # yesterday's values (TP convention)
-```
-
-Helper: `daily_training_load_series(conn, activity_types=('running','trail_running'))` returns the per-day summed `training_load` Series that feeds it. Rest days are filled with 0 — both EWMAs still update, ATL drops faster than CTL, TSB rises.
-
-TSB interpretation (used in both notebooks 02 and 06):
+TSB interpretation (used across the app):

 | TSB | meaning |
 |---|---|
@@ -298,73 +280,103 @@ TSB interpretation (used in both notebooks 02 and 06):
 | **+10 to +25** | **fresh / peaked — race-day target** |
 | > +25 | detrained (taper too long) |

-[02_running.ipynb](notebooks/02_running.ipynb) plots the historical PMC; [06_race_plan.ipynb](notebooks/06_race_plan.ipynb) projects it forward through race day by converting plan-week km into estimated daily loads (separate `RACE_DAY_TL_PER_KM ≈ 7` for the race itself vs `~11` for training runs — race-day TL/km is empirically lower because ultras spread load over more time).
+`banister_forecast(history, future, *, today=None)` splices historical load with planned future load and runs the same recursion forward, so the Race-plan page can project CTL/ATL/TSB through race day. The planned-future series comes from `plan_to_daily_load(plan, *, tl_per_km, race_day_tl_per_km, race_dates)`, which converts your weekly plan rows into per-day load using a long-run-Saturday-heavy distribution (Sat 40 %, Tue 20 %, Thu 20 %, Mon 10 %, Wed 10 %); race weeks treat race day specially.
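
As an illustration of the weekday distribution only, the sketch below spreads one plan week across its days. It is not `plan_to_daily_load`: `week_to_daily_load` is a hypothetical helper, race-week handling is left out, and the 50 km week and `tl_per_km=11` are made-up sample inputs.

```python
from datetime import date, timedelta

# Share of the week's load per weekday (Mon=0 ... Sun=6); Fri and Sun are rest days.
WEEKDAY_SHARE = {0: 0.10, 1: 0.20, 2: 0.10, 3: 0.20, 5: 0.40}


def week_to_daily_load(week_start, week_km, tl_per_km):
    """Spread one plan week (starting on a Monday) into a {date: load} dict."""
    if week_start.weekday() != 0:
        raise ValueError("week_start must be a Monday")
    week_load = week_km * tl_per_km
    return {
        week_start + timedelta(days=d): week_load * WEEKDAY_SHARE.get(d, 0.0)
        for d in range(7)
    }


daily = week_to_daily_load(date(2026, 6, 1), week_km=50, tl_per_km=11)
for day, load in daily.items():
    print(day.isoformat(), round(load, 1))
```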
+
+`calibrate_tl_per_km(conn)` reports the empirical median + IQR of `training_load / km` from your history — that's the number to feed into the plan, rather than a hard-coded constant.
+
+### Personal records
+
+`personal_records(activities, distance_bins_km=(5, 10, 21.0975, 42.195, 50), tolerance=0.05)` picks the fastest run within ±5 % of each bin distance.
+
+### GPS route clustering
+
+`cluster_routes(lats, lons, radius_km=0.25)` performs greedy haversine-radius clustering of run starts. Per-route pace trends control for terrain (the same loop in 2023 vs 2025 says more about fitness than raw monthly pace). Adequate for hundreds of starts; for thousands, switching to `sklearn.cluster.DBSCAN(metric='haversine')` is still an open TODO.
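
Greedy radius clustering of this kind can be sketched without any dependencies. The code below is an illustration under assumptions, not the body of `cluster_routes`: the function names are hypothetical, a cluster needs the seed plus at least one neighbour, and unclustered points get the label -1.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_starts(lats, lons, radius_km=0.25):
    """Label each start point with a cluster id, or -1 if it has no neighbours."""
    labels = [-1] * len(lats)
    next_label = 0
    for i in range(len(lats)):
        if labels[i] != -1:
            continue
        group = [
            j
            for j in range(len(lats))
            if labels[j] == -1
            and haversine_km(lats[i], lons[i], lats[j], lons[j]) <= radius_km
        ]
        if len(group) >= 2:  # the seed itself plus at least one other start
            for j in group:
                labels[j] = next_label
            next_label += 1
    return labels


labels = cluster_starts([40.0, 40.001, 41.0, 40.0005], [-75.0, -75.0, -75.0, -75.0])
print(labels)
```

Each seed compares against every other point, so the cost grows quadratically with the number of starts.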

 ---

-## 4 — Notebook tour
+## 4 — Reference notebooks (kept around for power-user analysis)

-Each notebook bootstraps the same way:
+Each notebook under [examples/notebooks/](examples/notebooks/) bootstraps the same way:

 ```python
-import sys; sys.path.insert(0, '..')
-from analysis import open_conn, load_activities, load_wellness, ...
+from openrun import open_conn, load_activities, ...
 conn = open_conn()
 ```

-| # | Notebook | Covers |
-|---|---|---|
-| 01 | Overview | Data coverage by table, recent activities, monthly mileage YoY, wellness completeness |
-| 02 | Running | Weekly volume + rolling, pace-vs-HR scatter, aerobic-efficiency trend, ACWR, PRs by distance |
-| 03 | Recovery | Sleep composition, after-run vs after-rest HRV/RHR, correlations between training and recovery, weekly km vs recovery indicators |
-| 04 | Efficiency | YoY m/beat by distance bucket, HR/pace split, polynomial curve fits, easy-runs-only headline view |
-| 05 | Intra-run | (1) per-mile decoupling, (1b) per-second decoupling on race FITs, (2) cadence & stride at controlled pace, (3) GPS route clustering, (4) HR-zone time-in-zone (FIT-based) + polarized split |
-| 06 | Race plan | Periodised 17-week plan around 30K → 50K → 50-mile, with auto-tracking against actual recorded volume |
+| # | Notebook | Covers | Web equivalent |
+|---|---|---|---|
+| 01 | Overview | Data coverage, recent activities | Dashboard |
+| 02 | Running | Weekly volume, pace-HR, PMC, PRs | Dashboard + Activities |
+| 03 | Recovery | Sleep composition, HRV, RHR, body battery | Recovery |
+| 04 | Efficiency | YoY m/beat | Efficiency |
+| 05 | Intra-run | Decoupling, cadence, route clusters, TIZ | Activity detail |
+| 06 | Race plan | Periodised plan + projected PMC | Race plan |

-Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts. Edit those, then `uv run python notebooks/_build_NN.py` regenerates the .ipynb. Easier than editing 30 KB of JSON.
+Notebooks 05 and 06 are generated from `_build_05.py` / `_build_06.py` build scripts — edit those, then `uv run python examples/notebooks/_build_NN.py` regenerates the .ipynb. They're kept as a reference for the math the web app implements.

 ---

 ## 5 — Setup

 ```bash
-uv sync                                  # creates .venv from pyproject.toml
+uv sync                          # creates .venv from pyproject.toml
+uv run openrun-web               # launch the web app
 ```

-Kernel: in VSCode, pick the project's `.venv` (top-right of any notebook). If it's not discoverable:
+The first time you open the app, a setup wizard collects your profile + HR zones, optionally takes a race calendar, and offers to ingest a Garmin export zip in-browser. Your config is written to `openrun.toml` in the current directory — edit by hand any time.
+
+If you'd rather work from the CLI directly:

 ```bash
-uv run python -m ipykernel install --user --name garmin --display-name "garmin (.venv)"
+uv run openrun-init              # bootstrap (no input required)
+uv run openrun-ingest <export>   # Path A: parse an export; or, for Path B:
+uv run openrun-auth              # OAuth login (Path B)
+uv run openrun-sync --full       # 365-day backfill
 ```

-### Path A — official export
+`garth` (the third-party library that powers live sync) is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If the live sync ever stops working, fall back to Path A.

-1. Go to <https://www.garmin.com/account/datamanagement/exportdata> and request the export. Garmin emails the link within 24–72 h.
-2. Download the zip (several hundred MB to a few GB) or, for the Takeout dump, the UUID-named folder.
-3. Ingest:
+---
+
+## 6 — Configuration
+
+[openrun.toml](openrun.toml) — created by the setup wizard, hand-editable.
+
+```toml
+[user]
+name        = "Athlete"
+hr_max      = 200
+lthr        = 175
+resting_hr  = 50
+
+[user.hr_zones]
+Z1 = [100, 120]
+Z2 = [121, 140]
+Z3 = [141, 160]
+Z4 = [161, 180]
+Z5 = [181, 200]
+
+[db]
+path = "data/garmin.db"
+
+[banister]
+ctl_tau_days       = 42.0
+atl_tau_days       = 7.0
+race_day_tl_per_km = 7.0
+
+[[races]]
+label = "wk 4 — 30K"
+date  = "2026-06-13"
+```
+
+Discovery walks up from cwd looking for `openrun.toml`, falls back to `$OPENRUN_CONFIG`, then `~/.config/openrun/config.toml`, then built-in defaults.
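
The lookup order above can be sketched as follows. This is an assumption-laden illustration rather than the loader in `config.py`: `find_config` is a hypothetical name, and returning `None` stands in for "use built-in defaults".

```python
import os
import tempfile
from pathlib import Path


def find_config(start=None, env=None):
    """Return the first config path found, or None to signal built-in defaults."""
    env = os.environ if env is None else env
    start = Path(start or Path.cwd()).resolve()
    # 1. Walk up from the starting directory.
    for folder in (start, *start.parents):
        candidate = folder / "openrun.toml"
        if candidate.is_file():
            return candidate
    # 2. Explicit override via environment variable.
    override = env.get("OPENRUN_CONFIG")
    if override and Path(override).is_file():
        return Path(override)
    # 3. Per-user config.
    user = Path.home() / ".config" / "openrun" / "config.toml"
    return user if user.is_file() else None


# Demo: a config at the workspace root is found from a nested folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "openrun.toml").write_text("[user]\n")
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    found_matches = find_config(nested) == root / "openrun.toml"

print(found_matches)
```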
+
+---
+
+## 7 — Tests

 ```bash
-uv run ingest_export.py path/to/connect.zip
-# or, if already unzipped:
-uv run ingest_export.py path/to/<export>/
+uv run pytest
 ```

-Add `--dry-run` to preview without writing. The data is upserted on `activity_id` / `calendar_date`, so it's safe to re-ingest a refreshed export later.
-
-For Takeout dumps specifically, follow the 4-step extract+ingest+link+precompute sequence in §1.
-
-### Path B — live API sync
-
-Garmin's SSO endpoint is rate-limited; you may hit 429s. If so, wait 30–60 min, switch networks, or use Path A.
-
-```bash
-uv run auth.py            # prompts for email / password / MFA, saves OAuth to .secrets/
-uv run sync.py --full     # 365-day backfill
-uv run sync.py            # incremental thereafter (last 14 days)
-
-uv run sync.py --days 90          # custom backfill window
-uv run sync.py --no-details       # skip per-activity detail/splits (much faster)
-uv run sync.py --skip-activities  # wellness only
-```
-
-`garth` is deprecated upstream (<https://github.com/matin/garth/discussions/222>). If the live sync stops working, fall back to Path A.
+[tests/unit/](tests/unit/) covers the pure-function surface (loaders, derivations, helpers). [tests/integration/](tests/integration/) covers the ingest paths and cross-cutting behaviour (the `tmp_conn` fixture in [tests/conftest.py](tests/conftest.py) builds an in-memory schema). The Streamlit pages are smoke-tested via `streamlit.testing.v1.AppTest`; see the test invocations in [ROADMAP.md](ROADMAP.md).