Add solar monitoring/troubleshooting skills for agents

Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check  : whole-system sweep + cross-checks + R/Y/G verdict,
                        incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage         : PV/load/grid/battery balance + EVSE sessions

Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00
parent 5484bb5fa6
commit aa97d65b0c
8 changed files with 792 additions and 0 deletions

2
.claude/skills/.gitignore vendored Normal file

@@ -0,0 +1,2 @@
__pycache__/
*.pyc

145
.claude/skills/REFERENCE.md Normal file

@@ -0,0 +1,145 @@
# Solar install — system map (shared reference for the solar skills)
This file is the ground truth the `solar-*` / `troubleshoot-*` / `power-usage`
skills build on. Read it once at the start of any solar task. Everything below
was verified live on this host (the monitoring Pi) on 2026-06-23; re-verify
anything load-bearing before acting on it.
## Topology
```
6× EG4 LifePower4 v2 packs ──RS485 (1 FTDI each)──┐
2× MPP Solar LVX6048 inverters ──USB-HID/PI18─────┤ this Pi ──MQTT──► HA broker
1× OpenEVSE charger (10.0.0.249) ──its own WiFi───┘ (daemons) 10.0.0.41:1883
```
All telemetry lands on the **MQTT broker at 10.0.0.41:1883** under HA
auto-discovery (`homeassistant/<class>/<entity>/config` retained, `.../state`
republished each poll cycle — **state topics are NOT retained**, so to read
current values you must listen for a window: use `lib/solar-snapshot`).
Broker credentials live in `~/.config/powermon/powermon.yaml`
(`mqttbroker.{name,port,username,password}`). **Never hardcode them** — every
tool here reads them from that file. `lib/solar-snapshot` does too.
## The snapshot helper
`./lib/solar-snapshot` (relative to this skills dir) captures the latest value of
every matching MQTT topic over a short window and prints a table. This is the
primary read tool — prefer it over raw `mosquitto_sub`.
```
solar-snapshot [-w SECONDS] [-g GREP_RE] [-f] TOPIC_FILTER...
```
MQTT `+` matches one WHOLE level, so `lifepower4_+` matches nothing. Subscribe to
`homeassistant/sensor/+/state` and narrow with `-g`:
```
solar-snapshot -g 'lvx6048_1_' 'homeassistant/sensor/+/state'
solar-snapshot -w 16 -g 'lifepower4_[1-6]_soc/' 'homeassistant/sensor/+/state'
solar-snapshot 'openevse/#' # EVSE publishes on-change; idle when unplugged
```
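The wildcard rule above can be sketched as a small matcher. This is an illustrative simplification (the function name `topic_matches` is made up here, and `$`-prefixed system topics are ignored), not the broker's actual implementation:

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":                       # multi-level wildcard: matches the rest
            return True
        if i >= len(t_levels):             # filter is longer than the topic
            return False
        if f != "+" and f != t_levels[i]:  # "+" only counts as a whole level
            return False
    return len(f_levels) == len(t_levels)


topic = "homeassistant/sensor/lifepower4_1_soc/state"
print(topic_matches("homeassistant/sensor/lifepower4_+/state", topic))  # False
print(topic_matches("homeassistant/sensor/+/state", topic))             # True
print(topic_matches("openevse/#", "openevse/power"))                    # True
```

Because `lifepower4_+` is compared as a literal level name, it never equals `lifepower4_1_soc`, which is why the family has to be narrowed with `-g` instead.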
## The history helper
`solar-snapshot` only sees *now*. For "when did X last happen / show last week",
use `./lib/ha-history`, which queries **Home Assistant's recorder** (the only
store that keeps history — local journald is volatile, ~1 day, wiped on reboot;
no solar data goes to InfluxDB). Default window 7 days; HA recorder default
retention is 10 days.
```
ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY...
ha-history -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
ha-history -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
**HA entity_ids ≠ MQTT object names.** powermon's hass output doubles the device
slug and is inconsistent across commands, so you must use the real ids, e.g.:
- device mode: `sensor.lvx6048_lvx6048_{1,2}_device_mode` (device slug `lvx6048`)
- fault code: `sensor.lvx6048_0{1,2}_lvx6048_{1,2}_fault_code` (slug `lvx6048_01`/`_02`)
- PV/batt/load: `sensor.lvx6048_lvx6048_1_{mppt1_input_power,mppt1_input_voltage,battery_voltage,ac_output_active_power}`
- EG4 packs follow the same doubling, e.g. `sensor.lifepower4_*`. When unsure, list
them: `curl -s -H "Authorization: Bearer $(cat ~/.config/ha/token)" $HA/api/states
| python3 -c 'import sys,json;[print(s["entity_id"]) for s in json.load(sys.stdin) if "lvx6048" in s["entity_id"]]'`
Auth: reads a long-lived token from `~/.config/ha/token` (mode 600) or `$HA_TOKEN`
— never on the command line, never hardcoded. Base URL `$HA_URL` else
`~/.config/ha/url` else `http://10.0.0.41:8123`. If it reports "no token", the user
must create one (HA → Profile → Security → Long-lived access tokens) and write it
to `~/.config/ha/token`; tell them which file, don't ask them to paste it in chat.
Recorder excludes (per `eg4battery/homeassistant/recorder.yaml`) drop EG4
per-cell/register/string entities — those have no history; the inverter
`device_mode`/`fault_code` and pack `soc`/`pack_voltage` etc. are recorded.
## Services (this Pi)
| Service | Role | Entities it feeds |
|---|---|---|
| `powermon.service` | LVX6048 #1 poller (PI18/USB) | `lvx6048_1_*` |
| `powermon2.service` | LVX6048 #2 poller (PI18/USB) | `lvx6048_2_*` |
| `lvx-resolve-links.service` | oneshot: maps `/dev/hidraw*` → `/dev/lvx6048-{1,2}` by PI18 serial; runs before powermon | (links) |
| `lvx-control.service` | bridges `solar/control/lvx6048/*` → powermon adhoc queue | (control) |
| `eg4-battery.service` | polls all 6 packs over RS485/Modbus | `lifepower4_1..6_*` |
Quick health: `systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service`
Logs: `journalctl -u <svc> --since "10 min ago" --no-pager`
## Entities cheat-sheet
**Inverters** `lvx6048_{1,2}_*` (PI18 GS/MOD/PIRI/FWS/ET):
`device_mode` (Power-On/Standby/Bypass/Battery/Fault/Charge…), `fault_code`,
`battery_voltage`, `battery_capacity` (%), `ac_output_active_power` (W),
`ac_output_voltage`, `grid_voltage`, `mppt1_input_power`/`mppt2_input_power` (W, PV),
`inverter_heat_sink_temperature`, `parallel_instance_number` (0 = master, 1+ = slave).
**Packs** `lifepower4_{1..6}_*` (Modbus): `soc`, `soc_alt`, `pack_voltage`,
`pack_current` (signed, + = charging), `cell_01..16_voltage`,
`cell_voltage_delta_mv` (imbalance), `cell_voltage_min`/`max`, `capacity_ah`,
`temperature_01..04`, `temperature_pcb`, `model`, `firmware_version`,
`firmware_date`, warning/protection bits, `register_NN` raw. There are 16 cells/pack.
**EVSE** `openevse/<key>` and `openevse_*` HA entities: `power` (W), `voltage`,
`amp` (mA raw → A in HA), `pilot`, `max_current`, `session_energy` (Wh),
`total_energy`, `status` (active/sleeping/disabled…), `state`, `temp`,
`vehicle` (plug). Charger HTTP UI at http://10.0.0.249.
Derived HA template sensors (`lifepower4_N_pack_power`, `_temperature_max`,
`_cell_imbalance_pct`, `lifepower4_stack_*`) are computed **inside HA**, not on
MQTT — compute them yourself from the raw entities when working off the Pi.
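A minimal sketch of recomputing the power figures on the Pi, assuming pack power is simply `pack_voltage × pack_current` (as the power-usage skill states). The function name and the result keys are hypothetical; the exact formulas behind HA's `_cell_imbalance_pct` and `lifepower4_stack_*` templates are not shown in this file and are not reproduced here:

```python
def derive_stack(packs):
    """packs maps pack number -> {"pack_voltage": V, "pack_current": A}."""
    pack_power_w = {
        n: p["pack_voltage"] * p["pack_current"]  # + = charging, - = discharging
        for n, p in packs.items()
    }
    return {
        "pack_power_w": pack_power_w,                 # per-pack watts
        "stack_power_w": sum(pack_power_w.values()),  # whole bank, signed
        "stack_current_a": sum(p["pack_current"] for p in packs.values()),
    }


# Six packs each discharging 2 A at 53.0 V (made-up sample values).
packs = {n: {"pack_voltage": 53.0, "pack_current": -2.0} for n in range(1, 7)}
result = derive_stack(packs)
print(result["stack_power_w"], result["stack_current_a"])
```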
## Known issues / gotchas (check memory for the canonical versions)
- **Inverter `battery_voltage` is INTERMITTENTLY wrong** — read a correct ~54 V on
  2026-06-20 (verified via HA history), but ~9–10 V on 2026-06-23/24 after the Jun 22
  14:18 reboot, with packs steady at ~52–53 V throughout. So it's a post-reboot /
re-init glitch (the inverter or PI18 GS field not settling after restart), NOT a
permanent scaling bug. Implication: treat the inverter battery reading as
untrustworthy and use the `lifepower4_*` pack entities for any battery math; if it
reads ~10 V right now, a powermon (or inverter) restart may clear it — worth testing.
- **Pack 6 is an oddball**: Modbus addr `0x01` @ 115200 (packs 1–5 are `0x40` @
  9600); ran 65 % SoC while 1–5 sat 40–44 %. Treat as a distinct member.
- **EG4 SoC never re-anchors** (drifts because packs rarely hit 100 % to reset the
coulomb counter). See memory `project_eg4_soc_drift_remediation`.
- **RS485 daisy-chain silences slave packs** — each pack needs its own FTDI; an
inter-pack chain demotes slaves. See memory `project_eg4_daisy_chain_silences_slaves`.
- **No per-day inverter energy** — PI18 only gives `ET` (lifetime Wh); ED/EM/EY NAK.
Daily kWh must come from HA recorder or ET deltas.
- **Parallel cluster**: changing inverter settings on only one unit risks fault 86
(desync). `lvx-control` always mirrors to both — that's why setters go through it.
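The "ET deltas" route for daily energy could look roughly like this. It is a sketch that assumes you already have `(timestamp, lifetime_wh)` samples of the `ET` counter (for example from the HA recorder); the function name and the sample values are invented for illustration:

```python
from datetime import datetime


def daily_kwh(samples):
    """samples: iterable of (datetime, lifetime_wh). Returns {date: kWh}."""
    last_by_day = {}
    for ts, wh in sorted(samples):
        last_by_day[ts.date()] = wh  # later samples overwrite earlier ones
    days = sorted(last_by_day)
    # Energy for a day = last counter value that day minus the previous day's last.
    return {
        str(day): (last_by_day[day] - last_by_day[prev]) / 1000.0
        for prev, day in zip(days, days[1:])
    }


samples = [
    (datetime(2026, 6, 21, 23, 55), 1_000_000),
    (datetime(2026, 6, 22, 12, 0), 1_006_000),
    (datetime(2026, 6, 22, 23, 55), 1_012_500),
    (datetime(2026, 6, 23, 23, 55), 1_020_000),
]
print(daily_kwh(samples))
```

The first day in the window has no baseline and is skipped. A negative result would suggest a counter reset or a bad sample, so treat it as suspect rather than as real energy.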
## Action policy for these skills
**Allowed (safe remediation):**
- Read anything: `solar-snapshot`, `mosquitto_sub`, `journalctl`, `systemctl status/is-active`.
- Restart the data-plane daemons when they're wedged:
`sudo systemctl restart powermon.service` / `powermon2.service` / `eg4-battery.service` / `lvx-control.service`
- Recover inverter USB links: `sudo systemctl restart lvx-resolve-links.service`
or `sudo /usr/local/sbin/lvx-resolve-links`.
**Forbidden (escalate to the user instead — propose the exact command, don't run it):**
- Any inverter/battery **setter**: `solar/control/lvx6048/*` publishes
(charger priority, max charge current, output priority, …).
- `lvx-flash/flash.py apply` and `dump`/`compare`/`sync-check` — they contend for
exclusive USB and stop powermon; advanced, user-driven only.
- Anything that writes battery thresholds, output mode, or factory resets.
- Power-cycling hardware, moving cables, breaker changes.
When a fix is outside the allowed set, report the finding and hand the user the
precise command(s) to run.

184
.claude/skills/lib/ha-history Executable file

@@ -0,0 +1,184 @@
#!/usr/bin/env python3
"""ha-history — pull state history for HA entities from the Home Assistant
recorder, and print a compact change-point timeline. Companion to solar-snapshot
(which only sees live values); this is the historic-lookback tool.
Auth: reads a long-lived access token from ~/.config/ha/token (or $HA_TOKEN).
No secret is ever passed on the command line or hardcoded.
Base URL: $HA_URL, else ~/.config/ha/url, else http://10.0.0.41:8123.
Usage:
  ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY [ENTITY ...]

  ENTITY    entity_id; a bare name with no dot is auto-prefixed `sensor.`
            e.g. `lvx6048_lvx6048_1_device_mode` -> `sensor.lvx6048_lvx6048_1_device_mode`
            (use the real HA ids with the doubled slug, not the MQTT object names)
  -s SINCE  start of window. default "7 days ago".
            accepts: "7 days ago", "7d", "36h", "90m", an ISO timestamp,
            or a date "2026-06-16".
  -e END    end of window. default: now. same formats as -s.
  -m REGEX  only show change-points whose state matches REGEX (case-insensitive);
            the per-entity header still reports the full count. e.g. -m fault
  -a        show every recorded point, not just state *changes*.

Examples:
  ha-history lvx6048_lvx6048_1_device_mode lvx6048_lvx6048_2_device_mode
  ha-history -s "10 days ago" -m fault lvx6048_01_lvx6048_1_fault_code lvx6048_02_lvx6048_2_fault_code
  ha-history -s 2026-06-16 -e 2026-06-23 lvx6048_lvx6048_1_device_mode
"""
import sys, os, re, json, argparse, urllib.request, urllib.parse, urllib.error
from datetime import datetime, timedelta, timezone
CONF_DIR = os.path.expanduser("~/.config/ha")
DEFAULT_URL = "http://10.0.0.41:8123"


def die(msg, code=1):
    print(f"ha-history: {msg}", file=sys.stderr)
    sys.exit(code)


def load_token():
    tok = os.environ.get("HA_TOKEN")
    if tok:
        return tok.strip()
    path = os.path.join(CONF_DIR, "token")
    if not os.path.exists(path):
        die("no token. Create a Long-Lived Access Token in HA "
            "(Profile -> Security), then:\n"
            " mkdir -p ~/.config/ha && install -m600 /dev/stdin ~/.config/ha/token\n"
            "or set $HA_TOKEN.")
    with open(path) as f:
        tok = f.read().strip()
    if not tok:
        die(f"{path} is empty")
    return tok


def base_url():
    if os.environ.get("HA_URL"):
        return os.environ["HA_URL"].rstrip("/")
    p = os.path.join(CONF_DIR, "url")
    if os.path.exists(p):
        with open(p) as f:
            u = f.read().strip()
        if u:
            return u.rstrip("/")
    return DEFAULT_URL


def parse_when(s, *, default_now=False):
    if s is None:
        return datetime.now(timezone.utc).astimezone() if default_now else None
    s = s.strip()
    m = re.fullmatch(r"(\d+)\s*(d|h|m)(?:ays?|ours?|in(?:ute)?s?)?(?:\s*ago)?", s, re.I)
    if m:
        n, unit = int(m.group(1)), m.group(2).lower()
        delta = {"d": timedelta(days=n), "h": timedelta(hours=n), "m": timedelta(minutes=n)}[unit]
        return datetime.now(timezone.utc).astimezone() - delta
    # ISO timestamp or bare date
    try:
        dt = datetime.fromisoformat(s)
    except ValueError:
        die(f"can't parse time {s!r}. Use '7 days ago', '36h', ISO, or 'YYYY-MM-DD'.")
    if dt.tzinfo is None:  # assume local tz
        dt = dt.astimezone()
    return dt


def fetch(url, token):
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=30) as r:
            return json.load(r)
    except urllib.error.HTTPError as e:
        if e.code == 401:
            die("401 Unauthorized: token rejected. Regenerate it in HA and rewrite "
                "~/.config/ha/token.")
        die(f"HTTP {e.code} from HA: {e.reason}")
    except urllib.error.URLError as e:
        die(f"cannot reach HA at {url.split('/api')[0]}: {e.reason}")


def fmt_local(iso):
    """HA returns UTC ISO; show local time, second precision."""
    try:
        return datetime.fromisoformat(iso).astimezone().strftime("%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        return str(iso)


def main():
    ap = argparse.ArgumentParser(add_help=False)
    ap.add_argument("-s", "--since", default="7 days ago")
    ap.add_argument("-e", "--end", default=None)
    ap.add_argument("-m", "--match", default=None)
    ap.add_argument("-a", "--all-points", action="store_true")
    ap.add_argument("-h", "--help", action="store_true")
    ap.add_argument("entities", nargs="*")
    a = ap.parse_args()
    if a.help or not a.entities:
        print(__doc__.strip())
        sys.exit(0 if a.help else 2)

    ents = [e if "." in e else f"sensor.{e}" for e in a.entities]
    start = parse_when(a.since)
    end = parse_when(a.end, default_now=True)
    matcher = re.compile(a.match, re.I) if a.match else None
    token = load_token()
    url = (f"{base_url()}/api/history/period/"
           f"{urllib.parse.quote(start.isoformat())}"
           f"?end_time={urllib.parse.quote(end.isoformat())}"
           f"&filter_entity_id={urllib.parse.quote(','.join(ents))}"
           f"&minimal_response&no_attributes")
    data = fetch(url, token)
    print(f"# HA history {start.strftime('%Y-%m-%d %H:%M')} -> "
          f"{end.strftime('%Y-%m-%d %H:%M')} ({base_url()})\n")

    by_id = {}
    for series in data or []:
        if series:
            by_id[series[0].get("entity_id")] = series

    for ent in ents:
        series = by_id.get(ent)
        if not series:
            print(f"{ent}\n (no recorded history in window: entity wrong, "
                  f"excluded from recorder, or purged)\n")
            continue
        # Build (time, state) points, collapsing consecutive identical states
        # unless --all-points.
        points, prev = [], object()
        for item in series:
            st = item.get("state")
            ts = item.get("last_changed") or item.get("last_updated")
            if a.all_points or st != prev:
                points.append((ts, st))
            prev = st
        shown = [(ts, st) for ts, st in points if not matcher or matcher.search(str(st))]
        label = "points" if a.all_points else "changes"
        extra = f", {len(shown)} match /m" if matcher else ""
        print(f"{ent} ({len(points)} {label}{extra})")
        if not shown:
            print(" (nothing matched)\n")
            continue
        for i, (ts, st) in enumerate(shown):
            mark = " <<< FAULT" if re.search(r"fault", str(st), re.I) and st not in ("No fault",) else ""
            # duration until next change-point in the *full* timeline
            dur = ""
            if not matcher:
                nxt = points[i + 1][0] if i + 1 < len(points) else None
                if nxt:
                    try:
                        d = datetime.fromisoformat(nxt) - datetime.fromisoformat(ts)
                        secs = int(d.total_seconds())
                        dur = f" ({secs//3600}h{secs%3600//60:02d}m)" if secs >= 3600 else f" ({secs//60}m)"
                    except (ValueError, TypeError):
                        pass
            print(f" {fmt_local(ts)} {st}{dur}{mark}")
        print()


if __name__ == "__main__":
    main()


@@ -0,0 +1,97 @@
#!/usr/bin/env bash
# solar-snapshot — capture the latest retained/published value of every MQTT
# topic matching a filter, over a short listen window, and print a clean table.
#
# Why a listen window: powermon/eg4-battery STATE topics are NOT retained — they
# are republished every poll cycle (GS ~5s, packs ~one cycle, EVSE on-change).
# So we subscribe for a few seconds and keep the last value seen per topic.
# (HA discovery `.../config` topics ARE retained and show up immediately.)
#
# Broker credentials are read from ~/.config/powermon/powermon.yaml (the same
# source the openevse + lvx-control tools use) so nothing is hardcoded here.
#
# NOTE on MQTT wildcards: `+` matches exactly ONE whole level, so it cannot be
# used as a name prefix. `homeassistant/sensor/lifepower4_+/state` matches NOTHING.
# To grab a family of entities, subscribe to the level wildcard and filter with -g:
# solar-snapshot -g lifepower4 'homeassistant/sensor/+/state'
#
# Usage:
# solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER [TOPIC_FILTER ...]
# -w SECONDS listen window (default 12)
# -f print full topic path (default: strip homeassistant/<class>/ prefix)
# -g GREP_RE keep only topics whose path matches this extended-regex
#
# Examples:
# solar-snapshot -g 'lvx6048_1' 'homeassistant/sensor/+/state'
# solar-snapshot -w 18 -g 'lifepower4_[1-6]_soc' 'homeassistant/sensor/+/state'
# solar-snapshot 'openevse/#'
# solar-snapshot -w 6 'homeassistant/sensor/lvx6048_1_battery_voltage/state' \
# 'homeassistant/sensor/lifepower4_1_pack_voltage/state'
#
# Exit status reflects the formatting stage, not mosquitto_sub's benign -W
# window-expiry code, so callers don't misread a normal capture as a failure.
set -eu
WINDOW=12
FULL=0
GREP_RE=""
while getopts "w:fg:" opt; do
  case "$opt" in
    w) WINDOW="$OPTARG" ;;
    f) FULL=1 ;;
    g) GREP_RE="$OPTARG" ;;
    *) echo "usage: solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER..." >&2; exit 2 ;;
  esac
done
shift $((OPTIND - 1))

if [ "$#" -lt 1 ]; then
  echo "usage: solar-snapshot [-w SECONDS] [-f] [-g GREP_RE] TOPIC_FILTER..." >&2
  exit 2
fi

CONF="${POWERMON_CONF:-$HOME/.config/powermon/powermon.yaml}"
if [ ! -r "$CONF" ]; then
  echo "solar-snapshot: cannot read broker config $CONF" >&2
  exit 1
fi

# Pull host/port/user/pass from the mqttbroker: block of powermon.yaml.
# Keys are anchored to leading whitespace + exact key so `name:` doesn't also
# match `username:`.
read -r HOST PORT USER PASS < <(awk '
  /^[^[:space:]]/ { inblk=0 }
  /^mqttbroker:/ { inblk=1; next }
  inblk && /^[[:space:]]+name:/ { h=$2 }
  inblk && /^[[:space:]]+port:/ { p=$2 }
  inblk && /^[[:space:]]+username:/ { u=$2 }
  inblk && /^[[:space:]]+password:/ { w=$2 }
  END { print h, (p?p:1883), u, w }
' "$CONF")

if [ -z "${HOST:-}" ]; then
  echo "solar-snapshot: no mqttbroker.name found in $CONF" >&2
  exit 1
fi
# Build -t args from filters.
TARGS=()
for f in "$@"; do TARGS+=(-t "$f"); done
# Subscribe for the window, then reduce to last-value-per-topic.
timeout "$((WINDOW + 2))" mosquitto_sub -h "$HOST" -p "$PORT" -u "$USER" -P "$PASS" \
-W "$WINDOW" -v "${TARGS[@]}" 2>/dev/null \
| { [ -n "$GREP_RE" ] && grep -E "$GREP_RE" || cat; } \
| awk -v full="$FULL" '
{ t=$1; $1=""; sub(/^ /,""); v=$0; last[t]=v; order[t]=NR }
END {
n=0
for (t in last) { keys[n++]=t }
# stable-ish sort by topic name
for (i=0;i<n;i++) for (j=i+1;j<n;j++) if (keys[j]<keys[i]) { tmp=keys[i];keys[i]=keys[j];keys[j]=tmp }
for (i=0;i<n;i++) {
t=keys[i]; disp=t
if (!full) { sub(/^homeassistant\/[^/]+\//,"",disp); sub(/\/state$/,"",disp) }
printf "%-44s %s\n", disp, last[t]
}
if (n==0) print "(no messages in window — topics idle, broker unreachable, or filter wrong)"
}'


@@ -0,0 +1,71 @@
---
name: power-usage
description: >-
Analyze where the power is going across the install — load vs PV generation vs
grid vs battery flow, plus EVSE charging sessions. Use when the user asks "why is
my battery draining / how much am I using / where are the watts going / is the car
charging / what's my solar production / power consumption", or wants an energy
balance or breakdown. Read-only; this skill measures and explains, it does not
change anything.
---
# power-usage
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
```
Read `$ROOT/.claude/skills/REFERENCE.md` for entity names. Key sign conventions: pack
`pack_current` is signed (**+ = charging, − = discharging**); inverter
`mppt*_input_power` is PV in (W); `ac_output_active_power` is load out (W).
## 1. Instantaneous energy balance
```bash
# Generation (PV) + load, per inverter:
"$SNAP" -w 10 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|ac_output_active_power|grid_voltage|device_mode)/' 'homeassistant/sensor/+/state'
# Battery flow, per pack (sum the pack_power = V×I yourself):
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(pack_voltage|pack_current|soc)/' 'homeassistant/sensor/+/state'
# EV charger:
"$SNAP" -w 8 'openevse/status' 'openevse/power' 'openevse/amp' 'openevse/voltage' 'openevse/session_energy'
```
Then state the balance in words:
- **PV in** = sum of all `mppt*_input_power`.
- **Battery** = sum of (pack_voltage × pack_current) over 6 packs. Negative total =
discharging (load exceeds PV+grid); positive = charging.
- **Load out** = sum of inverter `ac_output_active_power`.
- **EVSE** = `openevse/power` — and the EVSE load is a *subset* of total load, so a
draining battery with the car plugged usually explains itself here.
- **Grid**: `device_mode` Bypass/Line means grid is carrying/supplementing; Battery
mode means running off the bank. The LVX6048 has no clean grid-power entity, so
  infer grid = load − PV − battery_discharge.
Sanity: PV + grid + battery_discharge ≈ load (within metering noise). A big residual
means one feed is mis-reported — note it (e.g. the known `lvx6048_1_battery_voltage`
~10 V glitch will corrupt any pack-power math that uses the *inverter's* battery
reading; always use the **pack** entities for battery flow).
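As a rough illustration of the balance arithmetic above, the following sketch infers the grid contribution from the other three feeds. The function and key names are hypothetical, the sample numbers are invented, and `battery_w` is the signed pack total (sum of `pack_voltage × pack_current`, positive when charging):

```python
def energy_balance(pv_w, load_w, battery_w, evse_w=0.0):
    """pv_w / load_w: per-inverter watt lists; battery_w: signed pack total."""
    pv = sum(pv_w)
    load = sum(load_w)
    battery_discharge = -battery_w  # positive when the bank is feeding the load
    grid = load - pv - battery_discharge  # inferred, the LVX6048 has no grid-power entity
    return {
        "pv_w": pv,
        "load_w": load,
        "battery_discharge_w": battery_discharge,
        "grid_inferred_w": grid,
        "non_evse_load_w": load - evse_w,  # EVSE is a subset of total load
    }


b = energy_balance(
    pv_w=[400.0, 350.0, 250.0, 200.0],  # mppt1/mppt2 on both inverters
    load_w=[1100.0, 900.0],             # ac_output_active_power per inverter
    battery_w=-636.0,                   # bank discharging 636 W
    evse_w=1400.0,
)
print(b)
```

Since grid is inferred, any metering error lands in `grid_inferred_w`. A large value while `device_mode` says Battery is therefore a hint that one of the feeds is mis-reported rather than evidence of real grid draw.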
## 2. "Why is the battery draining?"
Walk the chain: is PV low (night/shade/§5 of troubleshoot-inverter dead string)? Is
load high (check `ac_output_active_power` and EVSE `power`)? Is the inverter in
Battery mode instead of using grid (`device_mode`)? Pin the drain on the largest
negative contributor and say which.
## 3. EVSE sessions
```bash
"$SNAP" -w 10 'openevse/status' 'openevse/state' 'openevse/session_energy' 'openevse/total_energy' 'openevse/vehicle' 'openevse/pilot' 'openevse/max_current'
```
- `status` active = charging; sleeping/disabled = not drawing. `vehicle` = plugged.
- `session_energy` (Wh) this plug-in; `pilot`/`max_current` = the current cap the
EVSE is signalling. Idle EVSE publishes little — a short empty capture is normal.
- For history/trends (daily kWh, past sessions), the data lives in **Home Assistant's
recorder**, not on MQTT — direct the user to the HA Energy dashboard /
`sensor.openevse_total_day|week|month`. PI18 has no per-day inverter energy
(memory `project_lvx6048_no_daily_energy_query`); only `ET` lifetime Wh exists.
## 4. Report
Give the live balance (PV / load / battery / grid / EVSE, with numbers and signs),
the headline ("you're pulling X W from the bank because load Y W > PV Z W, car is
taking W W"), and point at HA recorder for anything historical. This skill never
changes settings — if the answer is "shift charging to solar hours" etc., suggest it
as advice, don't actuate.


@@ -0,0 +1,103 @@
---
name: solar-health-check
description: >-
Top-level health snapshot of the whole solar/power install — 2 LVX6048
inverters, 6 EG4 LifePower4 packs, and the OpenEVSE charger — with cross-checks
and a green/yellow/red verdict. Use when the user asks "how's the solar /
battery / power system doing", "is everything ok", "check the install", wants a
status report, or as the first step before deeper troubleshooting. For deep
dives into one subsystem, hand off to troubleshoot-inverter, troubleshoot-battery,
or power-usage.
---
# solar-health-check
A fast, read-only sweep of every subsystem that ends in a clear verdict. Do NOT
change settings here; if something needs a restart, that's allowed (see policy).
## 0. Load context
Skills run with the shell cwd at the repo root, so anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md` (system map, entity names, snapshot helper,
action policy) before proceeding.
## 1. Services up?
```bash
systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service lvx-resolve-links.service
```
`lvx-resolve-links` is a oneshot → expect `active`/`exited` (not `failed`). Any
`failed`/`inactive` on the others is RED. For a wedged data-plane daemon, a
restart is allowed (see §6).
## 2. Capture live telemetry
```bash
"$SNAP" -w 10 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|battery_capacity|ac_output_active_power|mppt1_input_power|mppt2_input_power|grid_voltage|inverter_heat_sink_temperature|parallel_instance_number)/' 'homeassistant/sensor/+/state'
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current|cell_voltage_delta_mv|temperature_pcb)/' 'homeassistant/sensor/+/state'
"$SNAP" -w 6 'openevse/status' 'openevse/amp' 'openevse/power' 'openevse/session_energy'
```
If a family returns "(no messages)": the feeding daemon is silent → that subsystem
is RED regardless of `is-active` (running but not publishing). EVSE idle/unplugged
publishing nothing is normal — confirm via `openevse/status`.
## 3. Cross-checks (this is the value-add — single sensors can each look fine)
- **Battery voltage agreement**: each inverter's `battery_voltage` should be within
  ~1 V of the pack stack voltage (`pack_voltage` ≈ 51–55 V). **Known anomaly:** the
  inverter reading is *intermittently* wrong (correct ~54 V on 2026-06-20, ~9–10 V
after the Jun 22 reboot) — a post-reboot glitch, not a permanent bug. If it reads
~10 V, note it and suggest a powermon restart; use the `lifepower4_*` pack entities,
never the inverter reading, for any battery math (see REFERENCE known-issues).
- **Cross-unit PV production (catches a silently-dead inverter)**: compare
`lvx6048_1_mppt1_input_power` vs `lvx6048_2_mppt1_input_power`. In daylight (the
*other* unit clearly producing), one unit pinned at **0 W** = that inverter is down
and being masked by its sibling — RED → troubleshoot-inverter. This is exactly the
2026-06-20 fault-08 failure mode (unit 1 sat at 0 W for ~1.8 days). At night/heavy
shade both at 0 W is normal.
- **SoC spread across packs**: `max(soc) - min(soc)` over the 6 packs. BUT first
cross-check against `pack_voltage`/`cell_voltage_max`: the packs are paralleled, so
if all `pack_voltage` agree (±0.1 V) the packs are physically at the same charge and
any SoC spread is **counter drift**, not real imbalance (pack 6 ran 76 % while
  reading the same 53.4 V / 3.337 V/cell as packs at 50–55 % on 2026-06-24). Real
imbalance = pack voltages actually diverge. Drift → note it, recommend a calibration
charge; >20 % spread with diverging voltages = RED → troubleshoot-battery.
- **Cell imbalance**: any pack with `cell_voltage_delta_mv` > 50 = YELLOW, > 100 = RED.
- **Parallel master/slave**: exactly one inverter should report
`parallel_instance_number` 0 (master); the other 1+. Two masters or two slaves = RED.
- **Faults**: any `fault_code` non-zero, or `device_mode` = Fault = RED → troubleshoot-inverter.
- **Temps**: pack `temperature_pcb` > 55 °C or inverter heat-sink > 75 °C = YELLOW.
- **Power balance sanity**: PV in (`mppt*_input_power`) vs AC out vs pack
`pack_current` should roughly conserve. Gross mismatch = investigate via power-usage.
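Two of the cross-checks above, sketched as plain functions. Several thresholds here are assumptions and not taken from this skill: `producing_w=200` stands in for "clearly producing", "voltages agree" is read as a max-min range of at most 0.2 V (±0.1 V), the 10 % note threshold is borrowed from troubleshoot-battery, and the `"check"` verdict covers the case the text leaves undefined:

```python
def silent_inverter(pv1_w, pv2_w, producing_w=200.0):
    """Return the unit number that looks dead in daylight, or None."""
    if pv1_w == 0 and pv2_w >= producing_w:
        return 1
    if pv2_w == 0 and pv1_w >= producing_w:
        return 2
    return None  # both producing, or both at 0 W (night / heavy shade)


def soc_spread_verdict(socs, voltages, v_range=0.2, note_pct=10.0, red_pct=20.0):
    """Classify SoC spread as ok / drift / red / check using pack voltages."""
    spread = max(socs) - min(socs)
    voltages_agree = (max(voltages) - min(voltages)) <= v_range
    if spread <= note_pct:
        return "ok"
    if voltages_agree:
        return "drift"  # same physical charge, counters disagree
    if spread > red_pct:
        return "red"    # diverging voltages and a large spread
    return "check"      # diverging voltages, moderate spread (assumed bucket)


print(silent_inverter(0, 850))  # 1
print(soc_spread_verdict([50, 52, 55, 51, 53, 76], [53.4] * 6))  # drift
```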
## 4. Verdict
Print a compact table (subsystem → state → one-line reason), then an overall
GREEN / YELLOW / RED with the top 1–3 issues and which deeper skill to run.
## 5. Recent error scan (only if anything looked off)
```bash
for s in powermon powermon2 eg4-battery lvx-control; do
echo "== $s =="; journalctl -u $s.service --since "15 min ago" --no-pager | grep -iE 'error|timeout|fail|crc|nak|reconnect' | tail -5
done
```
## 5b. Historical sanity — did anything fail while unattended? (needs HA token)
Live snapshots miss faults that already cleared and silent-unit spells that ended.
If `~/.config/ha/token` exists (see REFERENCE), scan the recorder for the last few
days. Use the REAL HA entity_ids (doubled slug — see REFERENCE), not MQTT names:
```bash
"$HIST" -s "5 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
# silent-unit hunt: sample midday PV both units across recent days; one pinned 0 while
# the other produced = it was down. e.g. check a midday window per day:
"$HIST" -s "2 days ago" sensor.lvx6048_lvx6048_1_mppt1_input_power sensor.lvx6048_lvx6048_2_mppt1_input_power | head -40
```
Any fault-08 / silent-unit episode → report with timestamps and hand off to
troubleshoot-inverter §2–§5. No token → say so and point the user at REFERENCE to add one.
## 6. Allowed remediation
If a daemon is `failed` or running-but-silent, restarting it is permitted:
```bash
sudo systemctl restart eg4-battery.service # or powermon / powermon2 / lvx-control
```
Re-run the relevant snapshot to confirm data resumes. Anything beyond a restart
(settings, flash, cabling) → report and hand the user the exact command. Never
publish to `solar/control/lvx6048/*` from this skill.


@@ -0,0 +1,77 @@
---
name: troubleshoot-battery
description: >-
Diagnose the 6× EG4 LifePower4 v2 battery packs — per-pack SoC, cell imbalance,
SoC drift, RS485/Modbus comms silence, temperature, and warning/protection bits.
Use when a pack reads oddly, packs disagree on SoC, a pack stopped reporting,
cells look imbalanced, the user mentions "battery problem / pack down / SoC wrong
/ imbalance / one battery", or after solar-health-check flags the stack. Read-only
plus a safe eg4-battery daemon restart; never writes BMS settings.
---
# troubleshoot-battery
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. There are **6 packs** `lifepower4_1..6_*`,
all served by `eg4-battery.service` (one FTDI RS485 adapter per pack). Pack config:
`~/.config/eg4-battery/eg4-battery.yaml`.
## 1. Are all 6 packs reporting?
```bash
systemctl is-active eg4-battery.service
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current)/' 'homeassistant/sensor/+/state'
```
- Fewer than 6 packs in the output → a pack is **silent on RS485**, go to §4.
- All 6 present → go to §2/§3.
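One way to make the "fewer than 6 packs" check mechanical is to parse the snapshot table. This sketch assumes the default (non `-f`) output, where each line starts with the entity name such as `lifepower4_3_soc`; the helper name `missing_packs` is made up for illustration:

```python
import re


def missing_packs(snapshot_text, expected=range(1, 7)):
    """Return the pack numbers that never appear in solar-snapshot output."""
    seen = {
        int(m.group(1))
        for m in re.finditer(r"^lifepower4_(\d+)_", snapshot_text, re.M)
    }
    return sorted(set(expected) - seen)


sample = """\
lifepower4_1_soc                             52
lifepower4_2_soc                             51
lifepower4_3_soc                             53
lifepower4_5_soc                             50
lifepower4_6_soc                             76
"""
print(missing_packs(sample))  # [4]
```

An empty list means all six packs published within the window. A pack that shows up only for some entities would still count as present here, so glance at the table as well.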
## 2. SoC spread & drift
```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|soc_alt|pack_voltage)/' 'homeassistant/sensor/+/state'
```
- Compute `max(soc) - min(soc)`. >10 % = imbalance worth noting; >20 % = significant.
**Pack 6 historically runs high** (it's the oddball: Modbus addr `0x01`/115200 vs
  `0x40`/9600 for packs 1–5); judge it on its own, don't assume it tracks 1–5.
- **SoC drift is a known design limitation**, not a live fault: the coulomb counter
never re-anchors because the bank rarely reaches 100 % to reset. See memory
`project_eg4_soc_drift_remediation`. If SoC looks wrong but `pack_voltage` is
sane, suspect drift, not a dead pack. Voltage→SoC sanity for LFP at rest:
~51.2 V ≈ low, ~53.5 V ≈ mid, ~54+ V ≈ high (loaded/charging skews this).
## 3. Cell imbalance, temperature, protection bits
```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(cell_voltage_delta_mv|cell_voltage_min|cell_voltage_max|temperature_pcb)/' 'homeassistant/sensor/+/state'
# For a specific suspect pack N, pull all 16 cells + bits:
"$SNAP" -w 14 -g 'lifepower4_3_(cell_[0-9]+_voltage|warning|protection|temperature)' 'homeassistant/sensor/+/state'
```
- `cell_voltage_delta_mv`: <30 mV good, 30–50 mV watch, >50 mV imbalanced, >100 mV
bad (a weak/failing cell, or the pack simply needs a long absorb to balance).
- Any warning/protection bit set → read its name; over-/under-voltage,
over-temp, and over-current protections will also explain a pack dropping current.
- Temps: `temperature_pcb` > 55 °C = watch.
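A small sketch of the delta and temperature bands listed above, assuming the values have already been parsed to numbers. The function names are hypothetical:

```python
def classify_cell_delta(delta_mv):
    """Map cell_voltage_delta_mv onto the good/watch/imbalanced/bad bands."""
    if delta_mv > 100:
        return "bad"
    if delta_mv > 50:
        return "imbalanced"
    if delta_mv >= 30:
        return "watch"
    return "good"


def pcb_temp_watch(temperature_pcb_c):
    """True when temperature_pcb is above the 55 °C watch line."""
    return temperature_pcb_c > 55
```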
## 4. RS485 / Modbus comms silence (a pack missing from §1)
```bash
journalctl -u eg4-battery.service --since "15 min ago" --no-pager | grep -iE 'timeout|crc|nak|error|no response|pack|addr' | tail -30
ls -l /dev/serial/by-id/ | grep -i ft232 # all 6 FTDI adapters enumerated?
grep -E 'name:|port:|address|baud' ~/.config/eg4-battery/eg4-battery.yaml
```
- Missing FTDI under `by-id` → USB/adapter/cable issue for that pack (hardware →
report to user; don't unplug things yourself).
- FTDI present but pack times out → check it isn't demoted by an inter-pack
daisy-chain (memory `project_eg4_daisy_chain_silences_slaves`: each pack must be on
its own dongle; a chain silences slaves). Also confirm the pack's `address`/`baud`
in config match the unit (pack 6 legitimately differs).
- Daemon wedged after a USB re-enumerate → restart is ALLOWED:
```bash
sudo systemctl restart eg4-battery.service
```
Then re-run §1 to confirm all 6 return.
## 5. Report
Per-pack table (SoC, voltage, current, delta_mv, max temp, any bits) + the stack
spread, whether it's drift vs a real fault, comms state, what you restarted, and any
hardware action left for the user. Do not write BMS registers or thresholds.

@@ -0,0 +1,113 @@
---
name: troubleshoot-inverter
description: >-
Diagnose the 2× MPP Solar LVX6048 inverters — faults/warnings (FWS), operating
mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and
powermon daemon health. Use when an inverter shows a fault, is in the wrong mode,
stopped publishing, the two units disagree, PV looks low, or the user says
"inverter problem / fault code / no solar / one inverter is down". Read-only plus
safe link/daemon recovery; never changes inverter settings.
---
# troubleshoot-inverter
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter entities are `lvx6048_1_*`
(powermon.service) and `lvx6048_2_*` (powermon2.service).
## 1. Is it a data problem or a device problem?
```bash
systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? point at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
```
- Service active + data flowing → **device/config** issue, go to §2.
- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is
missing/dangling → **USB-HID link** issue, go to §4.
## 2. Faults & mode
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
```
- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault.
`Bypass`/`Line` = passing grid through (normal when grid present + low PV).
`Fault` = stop, decode the fault.
- The FWS fault/warning bit → label mapping lives in the patched driver
`$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`).
Read it to translate a raw `fault_code`. MOD code labels are there too.
Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout,
08 bus voltage too high, 56 battery connection open, 71 parallel version
different, 80–86 parallel-cluster faults (86 = output setting mismatch).
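The quick refs can be kept as a lookup for a first-pass decode. This is a sketch only: it covers just the codes listed above, assumes the code arrives as an integer or numeric string, and `describe_fault` is a hypothetical helper. The authoritative mapping remains `pi18.py`.

```python
# Only the quick-reference codes from this skill; not the full driver table.
QUICK_REFS = {
    2: "over-temp",
    3: "battery voltage high",
    4: "battery voltage low",
    7: "overload timeout",
    8: "bus voltage too high",
    56: "battery connection open",
    71: "parallel version different",
    86: "parallel-cluster fault: output setting mismatch",
}


def describe_fault(code):
    """Return a short label for a fault code, or point at the driver."""
    number = int(code)  # accepts 8, "8" or "08"
    if number in QUICK_REFS:
        return QUICK_REFS[number]
    if 80 <= number <= 86:
        return "parallel-cluster fault"
    return "not in quick refs; look it up in pi18.py"
```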
### Historic faults — "when did it last happen / show me last week"
Local logs only reach the last reboot (`journalctl` here is volatile, ~1 day), so
for anything older query HA's recorder via `ha-history`. It needs `~/.config/ha/token`
(see REFERENCE); if the token is absent, tell the user how to create it and don't block on it.
Use the REAL HA entity_ids, NOT the MQTT object names (see REFERENCE; the slug is
doubled and differs per command):
```bash
"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
Each `Fault`/non-`No fault` change-point prints with a local timestamp and how long
it lasted (marked `<<< FAULT`). To pin the cause, re-query the same window with `-a`
for the surrounding conditions — `mppt1_input_voltage`, `mppt1_input_power`,
`battery_voltage`, `ac_output_active_power`:
- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal
DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation /
cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
- **`mppt1_input_power`/`mppt2_input_power` flatline at 0 after the fault and stay there** → that unit
silently stopped producing; check it didn't sit dead until a reboot (a real 2026-06-20
occurrence: unit 1 fault 08 at 17:20 → 0 W PV until the Jun 22 reboot, ~half the
array offline ~1.8 days while unit 2 masked it). Cross-check the *other* unit's PV
over the same span to catch this.
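The cross-unit check in the last bullet can be sketched like this, assuming the post-fault PV power samples for each unit have been pulled into plain lists of watts over the same span. The function name and the `min_other_w` threshold are assumptions, not values from the install:

```python
def silently_dead(pv_suspect_w, pv_other_w, min_other_w=200.0):
    """True when the suspect unit stayed at 0 W while the other unit produced.

    min_other_w is an assumed floor proving there was sun to harvest.
    """
    if not pv_suspect_w or not pv_other_w:
        return False  # no data is a link problem, not this pattern
    suspect_flat = all(w <= 0 for w in pv_suspect_w)
    other_producing = max(pv_other_w) >= min_other_w
    return suspect_flat and other_producing
```

A `True` result matches the pattern where one unit's output hides the other's outage; both units at 0 W (night) returns `False`.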
## 3. Parallel-cluster sync
The two units run paralleled; desync throws **fault 86**.
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
```
- Exactly one unit should report `parallel_instance_number` 0 (master) and the other ≥1.
Both 0 (two masters) or both ≥1 (two slaves) → cluster confused → likely needs a
coordinated power cycle (user action: propose it, don't do it).
- Healthy parallel = both units sharing load roughly symmetrically
(`ac_output_active_power` comparable). One at ~0 W while the other carries
everything = that unit dropped out of the cluster.
- Deeper sync/settings comparison via `$ROOT/LVX6048/lvx-flash/flash.py sync-check` /
`compare` exists but **stops powermon and grabs the USB** — advanced, user-run
only. Propose the command; don't execute.
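A sketch of the two read-only checks above (role assignment and load sharing), assuming the snapshot values have been parsed into numbers for units 1 and 2. `check_cluster` and both watt thresholds are illustrative assumptions:

```python
def check_cluster(instances, loads_w, dropout_w=50.0, carrying_w=500.0):
    """Return a list of findings; an empty list means nothing suspicious.

    instances: parallel_instance_number for (unit 1, unit 2)
    loads_w:   ac_output_active_power in watts for (unit 1, unit 2)
    """
    findings = []
    masters = sum(1 for n in instances if n == 0)
    if masters != 1:
        findings.append(f"expected exactly one master, found {masters}")
    low, high = sorted(loads_w)
    # Assumed thresholds: one unit idle while the other carries a real load.
    if low <= dropout_w and high >= carrying_w:
        findings.append("one unit near 0 W while the other carries the load")
    return findings
```

Any finding here is something to report and propose action on, never a trigger for changing settings.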
## 4. USB-HID link recovery (ALLOWED)
Symptom: one unit's entities stale/absent, or `/dev/lvx6048-*` missing/dangling.
The resolver maps hidraw→stable symlink by PI18 serial and restarts powermon.
```bash
sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-* # both symlinks back?
sudo systemctl restart powermon.service powermon2.service # if still wedged
```
Then re-run §1 snapshot to confirm both units publish again. Note: after an
inverter power-cycle the udev rule normally self-heals; manual restart is the
fallback when it didn't.
## 5. PV / MPPT looks low
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'
```
Cross-reference array memory (`project_solar_array_config`): 9s2p per inverter, with a
**suspected dead string per inverter**. One MPPT reading ~half the other, or roughly
half nameplate in full sun, is consistent with that; confirm it at the combiner
rather than treating it as a software bug. Zero PV at night/heavy shade is normal.
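The "one MPPT reading ~half the other" pattern can be flagged with a ratio test. This is a sketch: the function name, the 0.35–0.65 ratio band, and the 500 W minimum are assumptions chosen for illustration, not measured values for this array:

```python
def half_output_suspect(mppt_a_w, mppt_b_w, min_w=500.0, lo=0.35, hi=0.65):
    """True when the weaker MPPT sits at roughly half the stronger one."""
    strong = max(mppt_a_w, mppt_b_w)
    weak = min(mppt_a_w, mppt_b_w)
    if strong < min_w:
        return False  # night or heavy shade: too little power to judge
    return lo <= weak / strong <= hi
```

A weaker input at exactly 0 W falls outside the band on purpose; that is a whole-input outage rather than the half-output pattern.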
## 6. Report
State: which unit, mode, fault (decoded), cluster health, link state, what you
recovered (if anything), and any remaining action that needs the user (power cycle,
flash.py, combiner check) as exact commands/steps. Never publish to
`solar/control/lvx6048/*` here.