diff --git a/.claude/skills/.gitignore b/.claude/skills/.gitignore new file mode 100644 index 0000000..7a60b85 --- /dev/null +++ b/.claude/skills/.gitignore @@ -0,0 +1,2 @@ +__pycache__/ +*.pyc diff --git a/.claude/skills/REFERENCE.md b/.claude/skills/REFERENCE.md new file mode 100644 index 0000000..786740c --- /dev/null +++ b/.claude/skills/REFERENCE.md @@ -0,0 +1,145 @@ +# Solar install — system map (shared reference for the solar skills) + +This file is the ground truth the `solar-*` / `troubleshoot-*` / `power-usage` +skills build on. Read it once at the start of any solar task. Everything below +was verified live on this host (the monitoring Pi) on 2026-06-23; re-verify +anything load-bearing before acting on it. + +## Topology + +``` +6× EG4 LifePower4 v2 packs ──RS485 (1 FTDI each)──┐ +2× MPP Solar LVX6048 inverters ──USB-HID/PI18─────┤ this Pi ──MQTT──► HA broker +1× OpenEVSE charger (10.0.0.249) ──its own WiFi───┘ (daemons) 10.0.0.41:1883 +``` + +All telemetry lands on the **MQTT broker at 10.0.0.41:1883** under HA +auto-discovery (`homeassistant/<component>/<object_id>/config` retained, `.../state` +republished each poll cycle — **state topics are NOT retained**, so to read +current values you must listen for a window: use `lib/solar-snapshot`). + +Broker credentials live in `~/.config/powermon/powermon.yaml` +(`mqttbroker.{name,port,username,password}`). **Never hardcode them** — every +tool here reads them from that file. `lib/solar-snapshot` does too. + +## The snapshot helper + +`./lib/solar-snapshot` (relative to this skills dir) captures the latest value of +every matching MQTT topic over a short window and prints a table. This is the +primary read tool — prefer it over raw `mosquitto_sub`. + +``` +solar-snapshot [-w SECONDS] [-g GREP_RE] [-f] TOPIC_FILTER... +``` +MQTT `+` matches one WHOLE level, so `lifepower4_+` matches nothing. 
Subscribe to +`homeassistant/sensor/+/state` and narrow with `-g`: +``` +solar-snapshot -g 'lvx6048_1_' 'homeassistant/sensor/+/state' +solar-snapshot -w 16 -g 'lifepower4_[1-6]_soc/' 'homeassistant/sensor/+/state' +solar-snapshot 'openevse/#' # EVSE publishes on-change; idle when unplugged +``` + +## The history helper + +`solar-snapshot` only sees *now*. For "when did X last happen / show last week", +use `./lib/ha-history`, which queries **Home Assistant's recorder** (the only +store that keeps history — local journald is volatile, ~1 day, wiped on reboot; +no solar data goes to InfluxDB). Default window 7 days; HA recorder default +retention is 10 days. +``` +ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY... +ha-history -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode +ha-history -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code +``` +**HA entity_ids ≠ MQTT object names.** powermon's hass output doubles the device +slug and is inconsistent across commands, so you must use the real ids, e.g.: +- device mode: `sensor.lvx6048_lvx6048_{1,2}_device_mode` (device slug `lvx6048`) +- fault code: `sensor.lvx6048_0{1,2}_lvx6048_{1,2}_fault_code` (slug `lvx6048_01`/`_02`) +- PV/batt/load: `sensor.lvx6048_lvx6048_1_{mppt1_input_power,mppt1_input_voltage,battery_voltage,ac_output_active_power}` +- EG4 packs follow the same doubling, e.g. `sensor.lifepower4_*`. When unsure, list + them: `curl -s -H "Authorization: Bearer $(cat ~/.config/ha/token)" $HA/api/states + | python3 -c 'import sys,json;[print(s["entity_id"]) for s in json.load(sys.stdin) if "lvx6048" in s["entity_id"]]'` +Auth: reads a long-lived token from `~/.config/ha/token` (mode 600) or `$HA_TOKEN` +— never on the command line, never hardcoded. Base URL `$HA_URL` else +`~/.config/ha/url` else `http://10.0.0.41:8123`. 
If it reports "no token", the user +must create one (HA → Profile → Security → Long-lived access tokens) and write it +to `~/.config/ha/token`; tell them which file, don't ask them to paste it in chat. +Recorder excludes (per `eg4battery/homeassistant/recorder.yaml`) drop EG4 +per-cell/register/string entities — those have no history; the inverter +`device_mode`/`fault_code` and pack `soc`/`pack_voltage` etc. are recorded. + +## Services (this Pi) + +| Service | Role | Entities it feeds | +|---|---|---| +| `powermon.service` | LVX6048 #1 poller (PI18/USB) | `lvx6048_1_*` | +| `powermon2.service` | LVX6048 #2 poller (PI18/USB) | `lvx6048_2_*` | +| `lvx-resolve-links.service` | oneshot: maps `/dev/hidraw*` → `/dev/lvx6048-{1,2}` by PI18 serial; runs before powermon | (links) | +| `lvx-control.service` | bridges `solar/control/lvx6048/*` → powermon adhoc queue | (control) | +| `eg4-battery.service` | polls all 6 packs over RS485/Modbus | `lifepower4_1..6_*` | + +Quick health: `systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service` +Logs: `journalctl -u <service> --since "10 min ago" --no-pager` + +## Entities cheat-sheet + +**Inverters** `lvx6048_{1,2}_*` (PI18 GS/MOD/PIRI/FWS/ET): +`device_mode` (Power-On/Standby/Bypass/Battery/Fault/Charge…), `fault_code`, +`battery_voltage`, `battery_capacity` (%), `ac_output_active_power` (W), +`ac_output_voltage`, `grid_voltage`, `mppt1_input_power`/`mppt2_input_power` (W, PV), +`inverter_heat_sink_temperature`, `parallel_instance_number` (0 = master, 1+ = slave). + +**Packs** `lifepower4_{1..6}_*` (Modbus): `soc`, `soc_alt`, `pack_voltage`, +`pack_current` (signed, + = charging), `cell_01..16_voltage`, +`cell_voltage_delta_mv` (imbalance), `cell_voltage_min`/`max`, `capacity_ah`, +`temperature_01..04`, `temperature_pcb`, `model`, `firmware_version`, +`firmware_date`, warning/protection bits, `register_NN` raw. There are 16 cells/pack. 
+ +**EVSE** `openevse/` and `openevse_*` HA entities: `power` (W), `voltage`, +`amp` (mA raw → A in HA), `pilot`, `max_current`, `session_energy` (Wh), +`total_energy`, `status` (active/sleeping/disabled…), `state`, `temp`, +`vehicle` (plug). Charger HTTP UI at http://10.0.0.249. + +Derived HA template sensors (`lifepower4_N_pack_power`, `_temperature_max`, +`_cell_imbalance_pct`, `lifepower4_stack_*`) are computed **inside HA**, not on +MQTT — compute them yourself from the raw entities when working off the Pi. + +## Known issues / gotchas (check memory for the canonical versions) + +- **Inverter `battery_voltage` is INTERMITTENTLY wrong** — read a correct ~54 V on + 2026-06-20 (verified via HA history), but ~9–10 V on 2026-06-23/24 after the Jun 22 + 14:18 reboot, with packs steady at ~52–53 V throughout. So it's a post-reboot / + re-init glitch (the inverter or PI18 GS field not settling after restart), NOT a + permanent scaling bug. Implication: treat the inverter battery reading as + untrustworthy and use the `lifepower4_*` pack entities for any battery math; if it + reads ~10 V right now, a powermon (or inverter) restart may clear it — worth testing. +- **Pack 6 is an oddball**: Modbus addr `0x01` @ 115200 (packs 1–5 are `0x40` @ + 9600); ran 65 % SoC while 1–5 sat 40–44 %. Treat as a distinct member. +- **EG4 SoC never re-anchors** (drifts because packs rarely hit 100 % to reset the + coulomb counter). See memory `project_eg4_soc_drift_remediation`. +- **RS485 daisy-chain silences slave packs** — each pack needs its own FTDI; an + inter-pack chain demotes slaves. See memory `project_eg4_daisy_chain_silences_slaves`. +- **No per-day inverter energy** — PI18 only gives `ET` (lifetime Wh); ED/EM/EY NAK. + Daily kWh must come from HA recorder or ET deltas. +- **Parallel cluster**: changing inverter settings on only one unit risks fault 86 + (desync). `lvx-control` always mirrors to both — that's why setters go through it. 
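
The derived sensors mentioned above exist only inside HA, so off the Pi they have to be rebuilt from the raw pack entities. A minimal sketch, assuming pack power is simply `pack_voltage × pack_current` with the cheat-sheet sign convention (+ = charging). The function names and sample readings are hypothetical, and the real HA template definitions are not shown in this file, so treat the formula as an approximation of them:

```python
# Hypothetical helpers: rebuild lifepower4_N_pack_power and a stack total
# from raw MQTT readings (assumption: power = V x I, + = charging).

def stack_power_w(readings):
    """readings: {pack_number: (pack_voltage_V, pack_current_A)}.

    Returns (per-pack power in W, stack total in W).
    """
    per_pack = {n: volts * amps for n, (volts, amps) in readings.items()}
    return per_pack, sum(per_pack.values())


def flow_label(total_w):
    """Positive total = charging, negative = discharging, zero = idle."""
    if total_w > 0:
        return "charging"
    if total_w < 0:
        return "discharging"
    return "idle"


if __name__ == "__main__":
    # Made-up sample values, not captured from the install.
    per_pack, total = stack_power_w({1: (52.0, -10.0), 2: (53.0, 4.0)})
    print(per_pack, total, flow_label(total))
```

Feed it the `pack_voltage` / `pack_current` values from a `solar-snapshot` capture; never substitute the inverter's `battery_voltage`, which the known-issues list flags as unreliable.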
+ +## Action policy for these skills + +**Allowed (safe remediation):** +- Read anything: `solar-snapshot`, `mosquitto_sub`, `journalctl`, `systemctl status/is-active`. +- Restart the data-plane daemons when they're wedged: + `sudo systemctl restart powermon.service` / `powermon2.service` / `eg4-battery.service` / `lvx-control.service` +- Recover inverter USB links: `sudo systemctl restart lvx-resolve-links.service` + or `sudo /usr/local/sbin/lvx-resolve-links`. + +**Forbidden (escalate to the user instead — propose the exact command, don't run it):** +- Any inverter/battery **setter**: `solar/control/lvx6048/*` publishes + (charger priority, max charge current, output priority, …). +- `lvx-flash/flash.py apply` and `dump`/`compare`/`sync-check` — they contend for + exclusive USB and stop powermon; advanced, user-driven only. +- Anything that writes battery thresholds, output mode, or factory resets. +- Power-cycling hardware, moving cables, breaker changes. + +When a fix is outside the allowed set, report the finding and hand the user the +precise command(s) to run. diff --git a/.claude/skills/lib/ha-history b/.claude/skills/lib/ha-history new file mode 100755 index 0000000..069100c --- /dev/null +++ b/.claude/skills/lib/ha-history @@ -0,0 +1,184 @@ +#!/usr/bin/env python3 +"""ha-history — pull state history for HA entities from the Home Assistant +recorder, and print a compact change-point timeline. Companion to solar-snapshot +(which only sees live values); this is the historic-lookback tool. + +Auth: reads a long-lived access token from ~/.config/ha/token (or $HA_TOKEN). + No secret is ever passed on the command line or hardcoded. +Base URL: $HA_URL, else ~/.config/ha/url, else http://10.0.0.41:8123. + +Usage: + ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY [ENTITY ...] + + ENTITY entity_id; a bare name with no dot is auto-prefixed `sensor.` + e.g. `lvx6048_1_device_mode` -> `sensor.lvx6048_1_device_mode` + -s SINCE start of window. 
default "7 days ago". + accepts: "7 days ago", "7d", "36h", "90m", an ISO timestamp, + or a date "2026-06-16". + -e END end of window. default: now. same formats as -s. + -m REGEX only show change-points whose state matches REGEX (case-insensitive); + the per-entity header still reports the full count. e.g. -m fault + -a show every recorded point, not just state *changes*. + +Examples (use the real HA entity_ids, not the MQTT object names): + ha-history sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode + ha-history -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code + ha-history -s 2026-06-16 -e 2026-06-23 sensor.lvx6048_lvx6048_1_device_mode +""" +import sys, os, re, json, argparse, urllib.request, urllib.parse, urllib.error +from datetime import datetime, timedelta, timezone + +CONF_DIR = os.path.expanduser("~/.config/ha") +DEFAULT_URL = "http://10.0.0.41:8123" + + +def die(msg, code=1): + print(f"ha-history: {msg}", file=sys.stderr) + sys.exit(code) + + +def load_token(): + tok = os.environ.get("HA_TOKEN") + if tok: + return tok.strip() + path = os.path.join(CONF_DIR, "token") + if not os.path.exists(path): + die("no token. 
Create a Long-Lived Access Token in HA " + "(Profile -> Security), then:\n" + " mkdir -p ~/.config/ha && install -m600 /dev/stdin ~/.config/ha/token\n" + "or set $HA_TOKEN.") + with open(path) as f: + tok = f.read().strip() + if not tok: + die(f"{path} is empty") + return tok + + +def base_url(): + if os.environ.get("HA_URL"): + return os.environ["HA_URL"].rstrip("/") + p = os.path.join(CONF_DIR, "url") + if os.path.exists(p): + with open(p) as f: + u = f.read().strip() + if u: + return u.rstrip("/") + return DEFAULT_URL + + +def parse_when(s, *, default_now=False): + if s is None: + return datetime.now(timezone.utc).astimezone() if default_now else None + s = s.strip() + m = re.fullmatch(r"(\d+)\s*(d|h|m)(?:ays?|ours?|in(?:ute)?s?)?(?:\s*ago)?", s, re.I) + if m: + n, unit = int(m.group(1)), m.group(2).lower() + delta = {"d": timedelta(days=n), "h": timedelta(hours=n), "m": timedelta(minutes=n)}[unit] + return datetime.now(timezone.utc).astimezone() - delta + # ISO timestamp or bare date + try: + dt = datetime.fromisoformat(s) + except ValueError: + die(f"can't parse time {s!r}. Use '7 days ago', '36h', ISO, or 'YYYY-MM-DD'.") + if dt.tzinfo is None: # assume local tz + dt = dt.astimezone() + return dt + + +def fetch(url, token): + req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}) + try: + with urllib.request.urlopen(req, timeout=30) as r: + return json.load(r) + except urllib.error.HTTPError as e: + if e.code == 401: + die("401 Unauthorized — token rejected. 
Regenerate it in HA and rewrite " + "~/.config/ha/token.") + die(f"HTTP {e.code} from HA: {e.reason}") + except urllib.error.URLError as e: + die(f"cannot reach HA at {url.split('/api')[0]}: {e.reason}") + + +def fmt_local(iso): + """HA returns UTC ISO; show local time, second precision.""" + try: + return datetime.fromisoformat(iso).astimezone().strftime("%Y-%m-%d %H:%M:%S") + except (ValueError, TypeError): + return str(iso) + + +def main(): + ap = argparse.ArgumentParser(add_help=False) + ap.add_argument("-s", "--since", default="7 days ago") + ap.add_argument("-e", "--end", default=None) + ap.add_argument("-m", "--match", default=None) + ap.add_argument("-a", "--all-points", action="store_true") + ap.add_argument("-h", "--help", action="store_true") + ap.add_argument("entities", nargs="*") + a = ap.parse_args() + if a.help or not a.entities: + print(__doc__.strip()) + sys.exit(0 if a.help else 2) + + ents = [e if "." in e else f"sensor.{e}" for e in a.entities] + start = parse_when(a.since) + end = parse_when(a.end, default_now=True) + matcher = re.compile(a.match, re.I) if a.match else None + + token = load_token() + url = (f"{base_url()}/api/history/period/" + f"{urllib.parse.quote(start.isoformat())}" + f"?end_time={urllib.parse.quote(end.isoformat())}" + f"&filter_entity_id={urllib.parse.quote(','.join(ents))}" + f"&minimal_response&no_attributes") + data = fetch(url, token) + + print(f"# HA history {start.strftime('%Y-%m-%d %H:%M')} -> " + f"{end.strftime('%Y-%m-%d %H:%M')} ({base_url()})\n") + + by_id = {} + for series in data or []: + if series: + by_id[series[0].get("entity_id")] = series + + for ent in ents: + series = by_id.get(ent) + if not series: + print(f"{ent}\n (no recorded history in window — entity wrong, " + f"excluded from recorder, or purged)\n") + continue + # Build (time, state) points, collapsing consecutive identical states + # unless --all-points. 
+ points, prev = [], object() + for item in series: + st = item.get("state") + ts = item.get("last_changed") or item.get("last_updated") + if a.all_points or st != prev: + points.append((ts, st)) + prev = st + shown = [(ts, st) for ts, st in points if not matcher or matcher.search(str(st))] + label = "points" if a.all_points else "changes" + extra = f", {len(shown)} match /m" if matcher else "" + print(f"{ent} ({len(points)} {label}{extra})") + if not shown: + print(" (nothing matched)\n") + continue + for i, (ts, st) in enumerate(shown): + mark = " <<< FAULT" if re.search(r"fault", str(st), re.I) and st not in ("No fault",) else "" + # duration until next change-point in the *full* timeline + dur = "" + if not matcher: + nxt = points[i + 1][0] if i + 1 < len(points) else None + if nxt: + try: + d = datetime.fromisoformat(nxt) - datetime.fromisoformat(ts) + secs = int(d.total_seconds()) + dur = f" ({secs//3600}h{secs%3600//60:02d}m)" if secs >= 3600 else f" ({secs//60}m)" + except (ValueError, TypeError): + pass + print(f" {fmt_local(ts)} {st}{dur}{mark}") + print() + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/lib/solar-snapshot b/.claude/skills/lib/solar-snapshot new file mode 100755 index 0000000..772365a --- /dev/null +++ b/.claude/skills/lib/solar-snapshot @@ -0,0 +1,97 @@ +#!/usr/bin/env bash +# solar-snapshot — capture the latest retained/published value of every MQTT +# topic matching a filter, over a short listen window, and print a clean table. +# +# Why a listen window: powermon/eg4-battery STATE topics are NOT retained — they +# are republished every poll cycle (GS ~5s, packs ~one cycle, EVSE on-change). +# So we subscribe for a few seconds and keep the last value seen per topic. +# (HA discovery `.../config` topics ARE retained and show up immediately.) +# +# Broker credentials are read from ~/.config/powermon/powermon.yaml (the same +# source the openevse + lvx-control tools use) so nothing is hardcoded here. 
+# +# NOTE on MQTT wildcards: `+` matches exactly ONE whole level, so it cannot be +# used as a name prefix. `homeassistant/sensor/lifepower4_+/state` matches NOTHING. +# To grab a family of entities, subscribe to the level wildcard and filter with -g: +# solar-snapshot -g lifepower4 'homeassistant/sensor/+/state' +# +# Usage: +# solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER [TOPIC_FILTER ...] +# -w SECONDS listen window (default 12) +# -f print full topic path (default: strip homeassistant/<component>/ prefix) +# -g GREP_RE keep only topics whose path matches this extended-regex +# +# Examples: +# solar-snapshot -g 'lvx6048_1' 'homeassistant/sensor/+/state' +# solar-snapshot -w 18 -g 'lifepower4_[1-6]_soc' 'homeassistant/sensor/+/state' +# solar-snapshot 'openevse/#' +# solar-snapshot -w 6 'homeassistant/sensor/lvx6048_1_battery_voltage/state' \ +# 'homeassistant/sensor/lifepower4_1_pack_voltage/state' +# +# Exit status reflects the formatting stage, not mosquitto_sub's benign -W +# window-expiry code, so callers don't misread a normal capture as a failure. +set -eu + +WINDOW=12 +FULL=0 +GREP_RE="" +while getopts "w:fg:" opt; do + case "$opt" in + w) WINDOW="$OPTARG" ;; + f) FULL=1 ;; + g) GREP_RE="$OPTARG" ;; + *) echo "usage: solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER..." >&2; exit 2 ;; + esac +done +shift $((OPTIND - 1)) +if [ "$#" -lt 1 ]; then + echo "usage: solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER..." >&2 + exit 2 +fi + +CONF="${POWERMON_CONF:-$HOME/.config/powermon/powermon.yaml}" +if [ ! -r "$CONF" ]; then + echo "solar-snapshot: cannot read broker config $CONF" >&2 + exit 1 +fi + +# Pull host/port/user/pass from the mqttbroker: block of powermon.yaml. +# Keys are anchored to leading whitespace + exact key so `name:` doesn't also +# match `username:`. 
+read -r HOST PORT USER PASS < <(awk ' + /^[^[:space:]]/ { inblk=0 } + /^mqttbroker:/ { inblk=1; next } + inblk && /^[[:space:]]+name:/ { h=$2 } + inblk && /^[[:space:]]+port:/ { p=$2 } + inblk && /^[[:space:]]+username:/ { u=$2 } + inblk && /^[[:space:]]+password:/ { w=$2 } + END { print h, (p?p:1883), u, w } +' "$CONF") + +if [ -z "${HOST:-}" ]; then + echo "solar-snapshot: no mqttbroker.name found in $CONF" >&2 + exit 1 +fi + +# Build -t args from filters. +TARGS=() +for f in "$@"; do TARGS+=(-t "$f"); done + +# Subscribe for the window, then reduce to last-value-per-topic. +timeout "$((WINDOW + 2))" mosquitto_sub -h "$HOST" -p "$PORT" -u "$USER" -P "$PASS" \ + -W "$WINDOW" -v "${TARGS[@]}" 2>/dev/null \ +| { [ -n "$GREP_RE" ] && grep -E "$GREP_RE" || cat; } \ +| awk -v full="$FULL" ' + { t=$1; $1=""; sub(/^ /,""); v=$0; last[t]=v; order[t]=NR } + END { + n=0 + for (t in last) { keys[n++]=t } + # stable-ish sort by topic name + for (i=0;i- + Analyze where the power is going across the install — load vs PV generation vs + grid vs battery flow, plus EVSE charging sessions. Use when the user asks "why is + my battery draining / how much am I using / where are the watts going / is the car + charging / what's my solar production / power consumption", or wants an energy + balance or breakdown. Read-only; this skill measures and explains, it does not + change anything. +--- + +# power-usage + +## 0. Load context +Shell cwd is the repo root; anchor paths there: +```bash +ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot" +``` +Read `$ROOT/.claude/skills/REFERENCE.md` for entity names. Key sign conventions: pack +`pack_current` is signed (**+ = charging, − = discharging**); inverter +`mppt*_input_power` is PV in (W); `ac_output_active_power` is load out (W). + +## 1. 
Instantaneous energy balance +```bash +# Generation (PV) + load, per inverter: +"$SNAP" -w 10 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|ac_output_active_power|grid_voltage|device_mode)/' 'homeassistant/sensor/+/state' +# Battery flow, per pack (sum the pack_power = V×I yourself): +"$SNAP" -w 16 -g 'lifepower4_[1-6]_(pack_voltage|pack_current|soc)/' 'homeassistant/sensor/+/state' +# EV charger: +"$SNAP" -w 8 'openevse/status' 'openevse/power' 'openevse/amp' 'openevse/voltage' 'openevse/session_energy' +``` +Then state the balance in words: +- **PV in** = sum of all `mppt*_input_power`. +- **Battery** = sum of (pack_voltage × pack_current) over 6 packs. Negative total = + discharging (load exceeds PV+grid); positive = charging. +- **Load out** = sum of inverter `ac_output_active_power`. +- **EVSE** = `openevse/power` — and the EVSE load is a *subset* of total load, so a + draining battery with the car plugged usually explains itself here. +- **Grid**: `device_mode` Bypass/Line means grid is carrying/supplementing; Battery + mode means running off the bank. The LVX6048 has no clean grid-power entity, so + infer grid = load − PV − battery_discharge. + +Sanity: PV + grid + battery_discharge ≈ load (within metering noise). A big residual +means one feed is mis-reported — note it (e.g. the known `lvx6048_1_battery_voltage` +~10 V glitch will corrupt any pack-power math that uses the *inverter's* battery +reading; always use the **pack** entities for battery flow). + +## 2. "Why is the battery draining?" +Walk the chain: is PV low (night/shade/§5 of troubleshoot-inverter dead string)? Is +load high (check `ac_output_active_power` and EVSE `power`)? Is the inverter in +Battery mode instead of using grid (`device_mode`)? Pin the drain on the largest +negative contributor and say which. + +## 3. 
EVSE sessions +```bash +"$SNAP" -w 10 'openevse/status' 'openevse/state' 'openevse/session_energy' 'openevse/total_energy' 'openevse/vehicle' 'openevse/pilot' 'openevse/max_current' +``` +- `status` active = charging; sleeping/disabled = not drawing. `vehicle` = plugged. +- `session_energy` (Wh) this plug-in; `pilot`/`max_current` = the current cap the + EVSE is signalling. Idle EVSE publishes little — a short empty capture is normal. +- For history/trends (daily kWh, past sessions), the data lives in **Home Assistant's + recorder**, not on MQTT — direct the user to the HA Energy dashboard / + `sensor.openevse_total_day|week|month`. PI18 has no per-day inverter energy + (memory `project_lvx6048_no_daily_energy_query`); only `ET` lifetime Wh exists. + +## 4. Report +Give the live balance (PV / load / battery / grid / EVSE, with numbers and signs), +the headline ("you're pulling X W from the bank because load Y W > PV Z W, car is +taking W W"), and point at HA recorder for anything historical. This skill never +changes settings — if the answer is "shift charging to solar hours" etc., suggest it +as advice, don't actuate. diff --git a/.claude/skills/solar-health-check/SKILL.md b/.claude/skills/solar-health-check/SKILL.md new file mode 100644 index 0000000..b41a517 --- /dev/null +++ b/.claude/skills/solar-health-check/SKILL.md @@ -0,0 +1,103 @@ +--- +name: solar-health-check +description: >- + Top-level health snapshot of the whole solar/power install — 2 LVX6048 + inverters, 6 EG4 LifePower4 packs, and the OpenEVSE charger — with cross-checks + and a green/yellow/red verdict. Use when the user asks "how's the solar / + battery / power system doing", "is everything ok", "check the install", wants a + status report, or as the first step before deeper troubleshooting. For deep + dives into one subsystem, hand off to troubleshoot-inverter, troubleshoot-battery, + or power-usage. 
+--- + +# solar-health-check + +A fast, read-only sweep of every subsystem that ends in a clear verdict. Do NOT +change settings here; if something needs a restart, that's allowed (see policy). + +## 0. Load context +Skills run with the shell cwd at the repo root, so anchor paths there: +```bash +ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history" +``` +Read `$ROOT/.claude/skills/REFERENCE.md` (system map, entity names, snapshot helper, +action policy) before proceeding. + +## 1. Services up? +```bash +systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service lvx-resolve-links.service +``` +`lvx-resolve-links` is a oneshot → expect `active`/`exited` (not `failed`). Any +`failed`/`inactive` on the others is RED. For a wedged data-plane daemon, a +restart is allowed (see §6). + +## 2. Capture live telemetry +```bash +"$SNAP" -w 10 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|battery_capacity|ac_output_active_power|mppt1_input_power|mppt2_input_power|grid_voltage|inverter_heat_sink_temperature|parallel_instance_number)/' 'homeassistant/sensor/+/state' +"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current|cell_voltage_delta_mv|temperature_pcb)/' 'homeassistant/sensor/+/state' +"$SNAP" -w 6 'openevse/status' 'openevse/amp' 'openevse/power' 'openevse/session_energy' +``` +If a family returns "(no messages)": the feeding daemon is silent → that subsystem +is RED regardless of `is-active` (running but not publishing). EVSE idle/unplugged +publishing nothing is normal — confirm via `openevse/status`. + +## 3. Cross-checks (this is the value-add — single sensors can each look fine) +- **Battery voltage agreement**: each inverter's `battery_voltage` should be within + ~1 V of the pack stack voltage (`pack_voltage` ≈ 51–55 V). 
**Known anomaly:** the + inverter reading is *intermittently* wrong (correct ~54 V on 2026-06-20, ~9–10 V + after the Jun 22 reboot) — a post-reboot glitch, not a permanent bug. If it reads + ~10 V, note it and suggest a powermon restart; use the `lifepower4_*` pack entities, + never the inverter reading, for any battery math (see REFERENCE known-issues). +- **Cross-unit PV production (catches a silently-dead inverter)**: compare + `lvx6048_1_mppt1_input_power` vs `lvx6048_2_mppt1_input_power`. In daylight (the + *other* unit clearly producing), one unit pinned at **0 W** = that inverter is down + and being masked by its sibling — RED → troubleshoot-inverter. This is exactly the + 2026-06-20 fault-08 failure mode (unit 1 sat at 0 W for ~1.8 days). At night/heavy + shade both at 0 W is normal. +- **SoC spread across packs**: `max(soc) - min(soc)` over the 6 packs. BUT first + cross-check against `pack_voltage`/`cell_voltage_max`: the packs are paralleled, so + if all `pack_voltage` agree (±0.1 V) the packs are physically at the same charge and + any SoC spread is **counter drift**, not real imbalance (pack 6 ran 76 % while + reading the same 53.4 V / 3.337 V/cell as packs at 50–55 % on 2026-06-24). Real + imbalance = pack voltages actually diverge. Drift → note it, recommend a calibration + charge; >20 % spread with diverging voltages = RED → troubleshoot-battery. +- **Cell imbalance**: any pack with `cell_voltage_delta_mv` > 50 = YELLOW, > 100 = RED. +- **Parallel master/slave**: exactly one inverter should report + `parallel_instance_number` 0 (master); the other 1+. Two masters or two slaves = RED. +- **Faults**: any `fault_code` non-zero, or `device_mode` = Fault = RED → troubleshoot-inverter. +- **Temps**: pack `temperature_pcb` > 55 °C or inverter heat-sink > 75 °C = YELLOW. +- **Power balance sanity**: PV in (`mppt*_input_power`) vs AC out vs pack + `pack_current` should roughly conserve. Gross mismatch = investigate via power-usage. + +## 4. 
Verdict +Print a compact table (subsystem → state → one-line reason), then an overall +GREEN / YELLOW / RED with the top 1–3 issues and which deeper skill to run. + +## 5. Recent error scan (only if anything looked off) +```bash +for s in powermon powermon2 eg4-battery lvx-control; do + echo "== $s =="; journalctl -u $s.service --since "15 min ago" --no-pager | grep -iE 'error|timeout|fail|crc|nak|reconnect' | tail -5 +done +``` + +## 5b. Historical sanity — did anything fail while unattended? (needs HA token) +Live snapshots miss faults that already cleared and silent-unit spells that ended. +If `~/.config/ha/token` exists (see REFERENCE), scan the recorder for the last few +days. Use the REAL HA entity_ids (doubled slug — see REFERENCE), not MQTT names: +```bash +"$HIST" -s "5 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code +# silent-unit hunt: sample midday PV both units across recent days; one pinned 0 while +# the other produced = it was down. e.g. check a midday window per day: +"$HIST" -s "2 days ago" sensor.lvx6048_lvx6048_1_mppt1_input_power sensor.lvx6048_lvx6048_2_mppt1_input_power | head -40 +``` +Any fault-08 / silent-unit episode → report with timestamps and hand off to +troubleshoot-inverter §2–§5. No token → say so and point the user at REFERENCE to add one. + +## 6. Allowed remediation +If a daemon is `failed` or running-but-silent, restarting it is permitted: +```bash +sudo systemctl restart eg4-battery.service # or powermon / powermon2 / lvx-control +``` +Re-run the relevant snapshot to confirm data resumes. Anything beyond a restart +(settings, flash, cabling) → report and hand the user the exact command. Never +publish to `solar/control/lvx6048/*` from this skill. 
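
The §3 cross-check thresholds of solar-health-check can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names are hypothetical, "voltages agree (±0.1 V)" is read as a max−min spread of at most 0.2 V, and the `"check"` outcome for cases the skill text does not grade is an invention of this sketch.

```python
# Hypothetical sketch of the solar-health-check section 3 thresholds.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def cell_delta_verdict(delta_mv):
    """cell_voltage_delta_mv: > 100 mV is RED, > 50 mV is YELLOW."""
    if delta_mv > 100:
        return "RED"
    if delta_mv > 50:
        return "YELLOW"
    return "GREEN"


def soc_spread_kind(socs, voltages, v_agree=0.2, red_spread=20.0):
    """Classify an SoC spread as counter drift or real imbalance.

    Assumption: packs "agree" when max-min pack_voltage <= v_agree volts.
    """
    soc_spread = max(socs) - min(socs)
    v_spread = max(voltages) - min(voltages)
    if v_spread <= v_agree:
        return "drift"        # same physical charge, counters disagree
    if soc_spread > red_spread:
        return "imbalance"    # > 20 % spread with diverging voltages = RED
    return "check"            # not graded by the skill text; look closer


def parallel_roles_ok(instance_numbers):
    """Exactly one unit may report parallel_instance_number 0 (master)."""
    return sum(1 for n in instance_numbers if n == 0) == 1


def worst(verdicts):
    """Overall verdict is the most severe individual verdict."""
    return max(verdicts, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    print(worst([cell_delta_verdict(30), cell_delta_verdict(75)]))
```

Keeping the drift/imbalance decision separate from the colour verdict mirrors the skill text, which treats drift as something to note rather than a fault.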
diff --git a/.claude/skills/troubleshoot-battery/SKILL.md b/.claude/skills/troubleshoot-battery/SKILL.md new file mode 100644 index 0000000..2e9491f --- /dev/null +++ b/.claude/skills/troubleshoot-battery/SKILL.md @@ -0,0 +1,77 @@ +--- +name: troubleshoot-battery +description: >- + Diagnose the 6× EG4 LifePower4 v2 battery packs — per-pack SoC, cell imbalance, + SoC drift, RS485/Modbus comms silence, temperature, and warning/protection bits. + Use when a pack reads oddly, packs disagree on SoC, a pack stopped reporting, + cells look imbalanced, the user mentions "battery problem / pack down / SoC wrong + / imbalance / one battery", or after solar-health-check flags the stack. Read-only + plus a safe eg4-battery daemon restart; never writes BMS settings. +--- + +# troubleshoot-battery + +## 0. Load context +Shell cwd is the repo root; anchor paths there: +```bash +ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot" +``` +Read `$ROOT/.claude/skills/REFERENCE.md`. There are **6 packs** `lifepower4_1..6_*`, +all served by `eg4-battery.service` (one FTDI RS485 adapter per pack). Pack config: +`~/.config/eg4-battery/eg4-battery.yaml`. + +## 1. Are all 6 packs reporting? +```bash +systemctl is-active eg4-battery.service +"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current)/' 'homeassistant/sensor/+/state' +``` +- Fewer than 6 packs in the output → a pack is **silent on RS485**, go to §4. +- All 6 present → go to §2/§3. + +## 2. SoC spread & drift +```bash +"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|soc_alt|pack_voltage)/' 'homeassistant/sensor/+/state' +``` +- Compute `max(soc) - min(soc)`. >10 % = imbalance worth noting; >20 % = significant. + **Pack 6 historically runs high** (it's the oddball: Modbus addr `0x01`/115200 vs + `0x40`/9600 for packs 1–5) — judge it on its own, don't assume it tracks 1–5. 
+- **SoC drift is a known design limitation**, not a live fault: the coulomb counter + never re-anchors because the bank rarely reaches 100 % to reset. See memory + `project_eg4_soc_drift_remediation`. If SoC looks wrong but `pack_voltage` is + sane, suspect drift, not a dead pack. Voltage→SoC sanity for LFP at rest: + ~51.2 V ≈ low, ~53.5 V ≈ mid, ~54+ V ≈ high (loaded/charging skews this). + +## 3. Cell imbalance, temperature, protection bits +```bash +"$SNAP" -w 16 -g 'lifepower4_[1-6]_(cell_voltage_delta_mv|cell_voltage_min|cell_voltage_max|temperature_pcb)/' 'homeassistant/sensor/+/state' +# For a specific suspect pack N, pull all 16 cells + bits: +"$SNAP" -w 14 -g 'lifepower4_3_(cell_[0-9]+_voltage|warning|protection|temperature)' 'homeassistant/sensor/+/state' +``` +- `cell_voltage_delta_mv`: <30 mV good, 30–50 mV watch, >50 mV imbalanced, >100 mV + bad (a weak/failing cell, or the pack simply needs a long absorb to balance). +- Any warning/protection bit set → read its name; over-/under-voltage, + over-temp, and over-current protections will also explain a pack dropping current. +- Temps: `temperature_pcb` > 55 °C = watch. + +## 4. RS485 / Modbus comms silence (a pack missing from §1) +```bash +journalctl -u eg4-battery.service --since "15 min ago" --no-pager | grep -iE 'timeout|crc|nak|error|no response|pack|addr' | tail -30 +ls -l /dev/serial/by-id/ | grep -i ft232 # all 6 FTDI adapters enumerated? +grep -E 'name:|port:|address|baud' ~/.config/eg4-battery/eg4-battery.yaml +``` +- Missing FTDI under `by-id` → USB/adapter/cable issue for that pack (hardware → + report to user; don't unplug things yourself). +- FTDI present but pack times out → check it isn't demoted by an inter-pack + daisy-chain (memory `project_eg4_daisy_chain_silences_slaves`: each pack must be on + its own dongle; a chain silences slaves). Also confirm the pack's `address`/`baud` + in config match the unit (pack 6 legitimately differs). 
+- Daemon wedged after a USB re-enumerate → restart is ALLOWED: + ```bash + sudo systemctl restart eg4-battery.service + ``` + Then re-run §1 to confirm all 6 return. + +## 5. Report +Per-pack table (SoC, voltage, current, delta_mv, max temp, any bits) + the stack +spread, whether it's drift vs a real fault, comms state, what you restarted, and any +hardware action left for the user. Do not write BMS registers or thresholds. diff --git a/.claude/skills/troubleshoot-inverter/SKILL.md b/.claude/skills/troubleshoot-inverter/SKILL.md new file mode 100644 index 0000000..55f75ed --- /dev/null +++ b/.claude/skills/troubleshoot-inverter/SKILL.md @@ -0,0 +1,113 @@ +--- +name: troubleshoot-inverter +description: >- + Diagnose the 2× MPP Solar LVX6048 inverters — faults/warnings (FWS), operating + mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and + powermon daemon health. Use when an inverter shows a fault, is in the wrong mode, + stopped publishing, the two units disagree, PV looks low, or the user says + "inverter problem / fault code / no solar / one inverter is down". Read-only plus + safe link/daemon recovery; never changes inverter settings. +--- + +# troubleshoot-inverter + +## 0. Load context +Shell cwd is the repo root; anchor paths there: +```bash +ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history" +``` +Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter entities are `lvx6048_1_*` +(powermon.service) and `lvx6048_2_*` (powermon2.service). + +## 1. Is it a data problem or a device problem? +```bash +systemctl is-active powermon.service powermon2.service lvx-resolve-links.service +ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? point at hidraw? +"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state' +``` +- Service active + data flowing → **device/config** issue, go to §2. 
+- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is + missing/dangling → **USB-HID link** issue, go to §4. + +## 2. Faults & mode +```bash +"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state' +journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20 +``` +- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault. + `Bypass`/`Line` = passing grid through (normal when grid present + low PV). + `Fault` = stop, decode the fault. +- The FWS fault/warning bit → label mapping lives in the patched driver + `$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`). + Read it to translate a raw `fault_code`. MOD code labels are there too. + Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout, + 08 bus voltage too high, 56 battery connection open, 71 parallel version + different, 80–86 parallel-cluster faults (86 = output setting mismatch). + +### Historic faults — "when did it last happen / show me last week" +Local logs only reach the last reboot (`journalctl` here is volatile, ~1 day), so +for anything older query HA's recorder via `ha-history` (needs `~/.config/ha/token` +— see REFERENCE; if absent, tell the user how to create it, don't block). +Use the REAL HA entity_ids (NOT the MQTT object names — see REFERENCE; the slug is +doubled and differs per command): +```bash +"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode +"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code +``` +Each `Fault`/non-`No fault` change-point prints with a local timestamp and how long +it lasted (marked `<<< FAULT`). 
To pin the cause, re-query the same window with `-a` +for the surrounding conditions — `mppt1_input_voltage`, `mppt1_input_power`, +`battery_voltage`, `ac_output_active_power`: +- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal + DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory). +- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation / + cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units). +- **`mppt1_input_power` flatlines at 0 after the fault and stays there** → that unit + silently stopped producing; check it didn't sit dead until a reboot (a real 2026-06-20 + occurrence: unit 1 fault 08 at 17:20 → 0 W PV until the Jun 22 reboot, ~half the + array offline ~1.8 days while unit 2 masked it). Cross-check the *other* unit's PV + over the same span to catch this. + +## 3. Parallel-cluster sync +The two units run paralleled; desync throws **fault 86**. +```bash +"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state' +``` +- Exactly one unit = `parallel_instance_number` 0 (master); other ≥1. Both 0 (two + masters) or neither 0 (two slaves) → cluster confused → likely needs a coordinated + power cycle (user action: propose it, don't do it). +- Healthy parallel = both units sharing load roughly symmetrically + (`ac_output_active_power` comparable). One at ~0 W while the other carries + everything = that unit dropped out of the cluster. +- Deeper sync/settings comparison via `$ROOT/LVX6048/lvx-flash/flash.py sync-check` / + `compare` exists but **stops powermon and grabs the USB** — advanced, user-run + only. Propose the command; don't execute. + +## 4. USB-HID link recovery (ALLOWED) +Symptom: one unit's entities stale/absent, or `/dev/lvx6048-*` missing/dangling. +The resolver maps hidraw→stable symlink by PI18 serial and restarts powermon. 
+```bash +sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2} +# or run the resolver directly: +sudo /usr/local/sbin/lvx-resolve-links +ls -l /dev/lvx6048-* # both symlinks back? +sudo systemctl restart powermon.service powermon2.service # if still wedged +``` +Then re-run §1 snapshot to confirm both units publish again. Note: after an +inverter power-cycle the udev rule normally self-heals; manual restart is the +fallback when it didn't. + +## 5. PV / MPPT looks low +```bash +"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state' +``` +Cross-reference array memory (`project_solar_array_config`): 9s2p per inverter, and +there's a **suspected dead string per inverter** — one MPPT reading ~half the other, +or roughly half nameplate in full sun, is consistent with that and worth confirming +at the combiner, not a software bug. Zero PV at night/heavy shade is normal. + +## 6. Report +State: which unit, mode, fault (decoded), cluster health, link state, what you +recovered (if anything), and any remaining action that needs the user (power cycle, +flash.py, combiner check) as exact commands/steps. Never publish to +`solar/control/lvx6048/*` here.
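
The §3 master/slave and load-sharing checks can be sketched as a small classifier, which may help when filling in the "cluster health" part of the report. This is a minimal sketch and not part of the skill: `classify_cluster`, its argument names, and the `idle_w` / `share_ratio` thresholds are hypothetical and chosen only for illustration. The inputs are the `parallel_instance_number` and `ac_output_active_power` values read from the snapshot for each unit.

```python
def classify_cluster(instance_1, power_1_w, instance_2, power_2_w,
                     idle_w=50.0, share_ratio=0.1):
    """Return a list of findings for the two-unit parallel cluster.

    instance_N : parallel_instance_number of unit N (0 = master, >=1 = slave)
    power_N_w  : ac_output_active_power of unit N, in watts
    idle_w, share_ratio : ASSUMED thresholds, not values taken from the skill.
    """
    findings = []

    # Exactly one unit should report instance number 0 (the master).
    masters = [n for n in (instance_1, instance_2) if n == 0]
    if len(masters) == 2:
        findings.append("two masters: cluster confused, propose a coordinated power cycle")
    elif len(masters) == 0:
        findings.append("two slaves: cluster confused, propose a coordinated power cycle")

    # One unit near 0 W while the other carries real load suggests a drop-out.
    low, high = sorted((power_1_w, power_2_w))
    if high > idle_w and low < high * share_ratio:
        findings.append("load is one-sided: a unit may have dropped out of the cluster")

    return findings or ["cluster looks healthy"]


if __name__ == "__main__":
    # Example values only; real readings come from the §3 snapshot command.
    for line in classify_cluster(0, 1500.0, 1, 1400.0):
        print(line)
```

The function only interprets readings. Any recovery it points at (power cycle, `flash.py`) remains a user action, as stated in §3.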