Add solar monitoring/troubleshooting skills for agents

Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check   : whole-system sweep + cross-checks + R/Y/G verdict,
                         incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage          : PV/load/grid/battery balance + EVSE sessions

Shared lib:

- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)

REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
145
.claude/skills/REFERENCE.md
Normal file
@@ -0,0 +1,145 @@
# Solar install: system map (shared reference for the solar skills)

This file is the ground truth the `solar-*` / `troubleshoot-*` / `power-usage`
skills build on. Read it once at the start of any solar task. Everything below
was verified live on this host (the monitoring Pi) on 2026-06-23; re-verify
anything load-bearing before acting on it.

## Topology

```
6× EG4 LifePower4 v2 packs ──RS485 (1 FTDI each)──┐
2× MPP Solar LVX6048 inverters ──USB-HID/PI18─────┤ this Pi ──MQTT──► HA broker
1× OpenEVSE charger (10.0.0.249) ──its own WiFi───┘ (daemons)         10.0.0.41:1883
```

All telemetry lands on the **MQTT broker at 10.0.0.41:1883** under HA
auto-discovery: `homeassistant/<class>/<entity>/config` is retained, while
`.../state` is republished each poll cycle. **State topics are NOT retained**,
so to read current values you must listen for a window: use `lib/solar-snapshot`.

Broker credentials live in `~/.config/powermon/powermon.yaml`
(`mqttbroker.{name,port,username,password}`). **Never hardcode them**: every
tool here, including `lib/solar-snapshot`, reads them from that file.

## The snapshot helper

`./lib/solar-snapshot` (relative to this skills dir) captures the latest value of
every matching MQTT topic over a short window and prints a table. This is the
primary read tool; prefer it over raw `mosquitto_sub`.
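
Since state topics are not retained, a snapshot amounts to "keep the last payload seen per topic within a window". The sketch below illustrates that reduction only. It assumes messages arrive time-ordered as `(seconds_since_start, topic, payload)` tuples; the function name, the tuple shape, and the sample payloads are hypothetical and are not taken from how `solar-snapshot` is actually implemented.

```python
from typing import Iterable


def latest_per_topic(
    messages: Iterable[tuple[float, str, str]], window_s: float
) -> dict[str, str]:
    """Keep the most recent payload for each topic seen within window_s seconds."""
    latest: dict[str, str] = {}
    for elapsed, topic, payload in messages:
        if elapsed > window_s:
            break  # input is assumed time-ordered, so nothing later is in the window
        latest[topic] = payload  # a later message overwrites an earlier one
    return latest


# Illustrative messages only; the payload values are made up.
msgs = [
    (0.5, "homeassistant/sensor/lifepower4_1_soc/state", "41"),
    (1.2, "homeassistant/sensor/lvx6048_1_device_mode/state", "Battery"),
    (6.0, "homeassistant/sensor/lifepower4_1_soc/state", "42"),
    (20.0, "homeassistant/sensor/lifepower4_2_soc/state", "44"),
]
snapshot = latest_per_topic(msgs, window_s=10)
for topic, value in sorted(snapshot.items()):
    print(f"{topic}\t{value}")
```

A topic that does not publish inside the window is simply absent from the result, so a missing row is not by itself proof that a device is down; a longer `-w` may be needed for slow publishers.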

```
solar-snapshot [-w SECONDS] [-g GREP_RE] [-f] TOPIC_FILTER...
```

MQTT `+` matches one WHOLE level, so `lifepower4_+` matches nothing. Subscribe to
`homeassistant/sensor/+/state` and narrow with `-g`:

```
solar-snapshot -g 'lvx6048_1_' 'homeassistant/sensor/+/state'
solar-snapshot -w 16 -g 'lifepower4_[1-6]_soc/' 'homeassistant/sensor/+/state'
solar-snapshot 'openevse/#'   # EVSE publishes on-change; idle when unplugged
```
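
The whole-level rule is easy to check with a small matcher. This is a simplified sketch of MQTT filter matching (it ignores `$`-prefixed topics and does not validate filters), followed by the regex narrowing step. Treating `-g` as a Python-style regex searched against the topic string is an assumption about `solar-snapshot`, and `topic_matches` is a hypothetical helper name.

```python
import re


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT match: '+' is a wildcard only as an entire level, '#' matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here on matches
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level (including 'lifepower4_+') must match exactly
    return len(filter_levels) == len(topic_levels)


topic = "homeassistant/sensor/lifepower4_3_soc/state"
print(topic_matches("homeassistant/sensor/+/state", topic))             # True
print(topic_matches("homeassistant/sensor/lifepower4_+/state", topic))  # False

# Narrowing afterwards with a regex; the trailing '/' keeps soc_alt out.
grep = re.compile(r"lifepower4_[1-6]_soc/")
print(bool(grep.search(topic)))                                              # True
print(bool(grep.search("homeassistant/sensor/lifepower4_3_soc_alt/state")))  # False
```

The trailing `/` in the pattern is what separates `soc` from `soc_alt`, which is why the example filter ends that way.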

## The history helper

`solar-snapshot` only sees *now*. For "when did X last happen / show last week",
use `./lib/ha-history`, which queries **Home Assistant's recorder**. That is the
only store that keeps history: local journald is volatile (~1 day, wiped on
reboot) and no solar data goes to InfluxDB. The default window is 7 days; HA's
default recorder retention is 10 days.

```
ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY...
ha-history -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
ha-history -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```

**HA entity_ids ≠ MQTT object names.** powermon's hass output doubles the device
slug and is inconsistent across commands, so you must use the real ids, e.g.:

- device mode: `sensor.lvx6048_lvx6048_{1,2}_device_mode` (device slug `lvx6048`)
- fault code: `sensor.lvx6048_0{1,2}_lvx6048_{1,2}_fault_code` (slug `lvx6048_01`/`_02`)
- PV/batt/load: `sensor.lvx6048_lvx6048_1_{mppt1_input_power,mppt1_input_voltage,battery_voltage,ac_output_active_power}`
- EG4 packs follow the same doubling, e.g. `sensor.lifepower4_*`. When unsure, list
  them: `curl -s -H "Authorization: Bearer $(cat ~/.config/ha/token)" $HA_URL/api/states
  | python3 -c 'import sys,json;[print(s["entity_id"]) for s in json.load(sys.stdin) if "lvx6048" in s["entity_id"]]'`
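
To avoid retyping the doubled slugs, the two inverter patterns listed above can be generated. This is a small sketch covering only those two documented patterns; the helper names are hypothetical, and any other entity should still be looked up in HA rather than guessed.

```python
def device_mode_entity(unit: int) -> str:
    """device_mode uses the plain device slug 'lvx6048'."""
    return f"sensor.lvx6048_lvx6048_{unit}_device_mode"


def fault_code_entity(unit: int) -> str:
    """fault_code uses the zero-padded device slug 'lvx6048_01' / 'lvx6048_02'."""
    return f"sensor.lvx6048_{unit:02d}_lvx6048_{unit}_fault_code"


for unit in (1, 2):
    print(device_mode_entity(unit), fault_code_entity(unit))
```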

Auth: reads a long-lived token from `~/.config/ha/token` (mode 600) or `$HA_TOKEN`,
never from the command line and never hardcoded. Base URL: `$HA_URL`, else
`~/.config/ha/url`, else `http://10.0.0.41:8123`. If it reports "no token", the user
must create one (HA → Profile → Security → Long-lived access tokens) and write it
to `~/.config/ha/token`; tell them which file, don't ask them to paste it in chat.
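
The lookup order can be sketched as follows. This is an illustration, not the code in `ha-history`: the function name is hypothetical, and checking the token file before `$HA_TOKEN` is an assumption, since the text does not say which of the two wins when both are set.

```python
import tempfile
from pathlib import Path

DEFAULT_URL = "http://10.0.0.41:8123"


def resolve_ha(env: dict[str, str], config_dir: Path) -> tuple[str, str]:
    """Return (base_url, token), or raise if no token can be found."""
    token_file = config_dir / "token"
    if token_file.is_file():  # assumption: the file takes precedence over $HA_TOKEN
        token = token_file.read_text().strip()
    else:
        token = env.get("HA_TOKEN", "").strip()
    if not token:
        raise RuntimeError(f"no token: create a long-lived token and write it to {token_file}")

    url_file = config_dir / "url"
    file_url = url_file.read_text().strip() if url_file.is_file() else ""
    url = env.get("HA_URL") or file_url or DEFAULT_URL  # env, then file, then default
    return url.rstrip("/"), token


# Demo against a throwaway directory so no real token is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp)
    (cfg / "token").write_text("EXAMPLE-TOKEN\n")
    print(resolve_ha({}, cfg))
```

In real use the arguments would be `dict(os.environ)` and `Path.home() / ".config" / "ha"`; passing them in keeps the token out of argv and makes the function easy to test.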

Recorder excludes (per `eg4battery/homeassistant/recorder.yaml`) drop EG4
per-cell/register/string entities, so those have no history; the inverter
`device_mode`/`fault_code` and pack `soc`/`pack_voltage` etc. are recorded.

## Services (this Pi)

| Service | Role | Entities it feeds |
|---|---|---|
| `powermon.service` | LVX6048 #1 poller (PI18/USB) | `lvx6048_1_*` |
| `powermon2.service` | LVX6048 #2 poller (PI18/USB) | `lvx6048_2_*` |
| `lvx-resolve-links.service` | oneshot: maps `/dev/hidraw*` → `/dev/lvx6048-{1,2}` by PI18 serial; runs before powermon | (links) |
| `lvx-control.service` | bridges `solar/control/lvx6048/*` → powermon adhoc queue | (control) |
| `eg4-battery.service` | polls all 6 packs over RS485/Modbus | `lifepower4_1..6_*` |

Quick health: `systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service`

Logs: `journalctl -u <svc> --since "10 min ago" --no-pager`
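
The quick-health command can be turned into a per-service verdict with a few lines. This sketch assumes `systemctl is-active` prints one state per line in the same order as the units given, which is worth confirming on the host; the helper names and the sample output are illustrative.

```python
import subprocess

SERVICES = [
    "powermon.service",
    "powermon2.service",
    "eg4-battery.service",
    "lvx-control.service",
]


def parse_is_active(services: list[str], output: str) -> dict[str, str]:
    """Pair each service with the state printed on the corresponding output line."""
    states = output.split()
    return {
        svc: (states[i] if i < len(states) else "unknown")
        for i, svc in enumerate(services)
    }


def unhealthy(states: dict[str, str]) -> list[str]:
    """Services whose state is anything other than 'active'."""
    return [svc for svc, state in states.items() if state != "active"]


def check(services: list[str] = SERVICES) -> dict[str, str]:
    # systemctl can exit non-zero here, so check=True is deliberately not used.
    proc = subprocess.run(
        ["systemctl", "is-active", *services], capture_output=True, text=True
    )
    return parse_is_active(services, proc.stdout)


# Parsing a made-up sample instead of calling systemctl:
sample = "active\nactive\nfailed\nactive\n"
print(unhealthy(parse_is_active(SERVICES, sample)))
```

On the Pi, `unhealthy(check())` would give the list of services to look at with `journalctl`.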

## Entities cheat-sheet

**Inverters** `lvx6048_{1,2}_*` (PI18 GS/MOD/PIRI/FWS/ET):
`device_mode` (Power-On/Standby/Bypass/Battery/Fault/Charge…), `fault_code`,
`battery_voltage`, `battery_capacity` (%), `ac_output_active_power` (W),
`ac_output_voltage`, `grid_voltage`, `mppt1_input_power`/`mppt2_input_power` (W, PV),
`inverter_heat_sink_temperature`, `parallel_instance_number` (0 = master, 1+ = slave).

**Packs** `lifepower4_{1..6}_*` (Modbus): `soc`, `soc_alt`, `pack_voltage`,
`pack_current` (signed, + = charging), `cell_01..16_voltage`,
`cell_voltage_delta_mv` (imbalance), `cell_voltage_min`/`max`, `capacity_ah`,
`temperature_01..04`, `temperature_pcb`, `model`, `firmware_version`,
`firmware_date`, warning/protection bits, `register_NN` raw. There are 16 cells/pack.

**EVSE** `openevse/<key>` and `openevse_*` HA entities: `power` (W), `voltage`,
`amp` (mA raw → A in HA), `pilot`, `max_current`, `session_energy` (Wh),
`total_energy`, `status` (active/sleeping/disabled…), `state`, `temp`,
`vehicle` (plug). Charger HTTP UI at http://10.0.0.249.

Derived HA template sensors (`lifepower4_N_pack_power`, `_temperature_max`,
`_cell_imbalance_pct`, `lifepower4_stack_*`) are computed **inside HA**, not on
MQTT; compute them yourself from the raw entities when working off the Pi.
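
A rough way to rebuild a few of the derived values from raw pack readings is sketched below. The formulas are assumptions, not the HA template definitions: pack power is taken as `pack_voltage × pack_current` (W, positive while charging), the maximum temperature covers only `temperature_01..04`, and the stack figure is a plain sum. The sample readings are made up; check the HA templates before expecting exact parity.

```python
def pack_power_w(pack: dict[str, float]) -> float:
    """Assumed: power = voltage * signed current, so positive means charging."""
    return pack["pack_voltage"] * pack["pack_current"]


def temperature_max(pack: dict[str, float]) -> float:
    """Assumed: the hottest of the four cell probes (temperature_pcb excluded)."""
    return max(pack[f"temperature_{i:02d}"] for i in range(1, 5))


# Illustrative readings for two packs; values are not from the real install.
packs = {
    1: {"pack_voltage": 52.0, "pack_current": 10.0,
        "temperature_01": 24.0, "temperature_02": 25.0,
        "temperature_03": 24.5, "temperature_04": 26.0},
    2: {"pack_voltage": 53.0, "pack_current": -4.0,
        "temperature_01": 23.0, "temperature_02": 23.5,
        "temperature_03": 22.0, "temperature_04": 23.0},
}

stack_power_w = sum(pack_power_w(p) for p in packs.values())
for n, p in packs.items():
    print(n, pack_power_w(p), temperature_max(p))
print("stack", stack_power_w)
```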

## Known issues / gotchas (check memory for the canonical versions)

- **Inverter `battery_voltage` is INTERMITTENTLY wrong**: it read a correct ~54 V on
  2026-06-20 (verified via HA history), but ~9–10 V on 2026-06-23/24 after the Jun 22
  14:18 reboot, with packs steady at ~52–53 V throughout. So it's a post-reboot /
  re-init glitch (the inverter or PI18 GS field not settling after restart), NOT a
  permanent scaling bug. Implication: treat the inverter battery reading as
  untrustworthy and use the `lifepower4_*` pack entities for any battery math; if it
  reads ~10 V right now, a powermon (or inverter) restart may clear it (worth testing).
- **Pack 6 is an oddball**: Modbus addr `0x01` @ 115200 (packs 1–5 are `0x40` @
  9600); ran 65 % SoC while 1–5 sat 40–44 %. Treat as a distinct member.
- **EG4 SoC never re-anchors** (drifts because packs rarely hit 100 % to reset the
  coulomb counter). See memory `project_eg4_soc_drift_remediation`.
- **RS485 daisy-chain silences slave packs**: each pack needs its own FTDI; an
  inter-pack chain demotes slaves. See memory `project_eg4_daisy_chain_silences_slaves`.
- **No per-day inverter energy**: PI18 only gives `ET` (lifetime Wh); the
  `ED`/`EM`/`EY` queries are NAKed. Daily kWh must come from HA recorder or ET deltas.
- **Parallel cluster**: changing inverter settings on only one unit risks fault 86
  (desync). `lvx-control` always mirrors to both, which is why setters go through it.
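
Two of the gotchas above reduce to small checks: cross-checking the inverter's `battery_voltage` against the packs, and deriving daily energy from two `ET` readings. A minimal sketch, with hypothetical function names; the 3 V tolerance and all sample numbers are illustrative assumptions rather than tuned values.

```python
from statistics import median


def inverter_voltage_plausible(
    inverter_v: float, pack_vs: list[float], tolerance_v: float = 3.0
) -> bool:
    """True if the inverter reading is within tolerance_v of the median pack voltage."""
    return abs(inverter_v - median(pack_vs)) <= tolerance_v


def daily_energy_kwh(et_start_wh: float, et_end_wh: float) -> float:
    """Energy between two lifetime-counter (ET, in Wh) readings, in kWh."""
    delta_wh = et_end_wh - et_start_wh
    if delta_wh < 0:
        raise ValueError("ET went backwards; counter reset or bad reading")
    return delta_wh / 1000.0


pack_vs = [52.4, 52.6, 52.9, 53.0, 52.5, 52.7]   # made-up pack voltages
print(inverter_voltage_plausible(9.8, pack_vs))   # the glitch case
print(inverter_voltage_plausible(54.0, pack_vs))  # a sane reading
print(daily_energy_kwh(1_234_500, 1_250_700))     # made-up ET readings
```

`ET` is per inverter, so a whole-system daily figure would be the sum of both units' deltas taken over the same interval.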

## Action policy for these skills

**Allowed (safe remediation):**

- Read anything: `solar-snapshot`, `mosquitto_sub`, `journalctl`, `systemctl status/is-active`.
- Restart the data-plane daemons when they're wedged:
  `sudo systemctl restart powermon.service` / `powermon2.service` / `eg4-battery.service` / `lvx-control.service`
- Recover inverter USB links: `sudo systemctl restart lvx-resolve-links.service`
  or `sudo /usr/local/sbin/lvx-resolve-links`.

**Forbidden (escalate to the user instead: propose the exact command, don't run it):**

- Any inverter/battery **setter**: `solar/control/lvx6048/*` publishes
  (charger priority, max charge current, output priority, …).
- `lvx-flash/flash.py apply` and `dump`/`compare`/`sync-check`: they contend for
  exclusive USB and stop powermon; advanced, user-driven only.
- Anything that writes battery thresholds, output mode, or factory resets.
- Power-cycling hardware, moving cables, breaker changes.

When a fix is outside the allowed set, report the finding and hand the user the
precise command(s) to run.
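
The policy above is effectively a default-deny allowlist. One way to encode it is sketched here: a coarse, hypothetical `classify` helper that only recognises the commands named in the allowed list and escalates everything else. It is not part of the skills and does not inspect arguments of the read tools.

```python
import os
import shlex

RESTARTABLE = {
    "powermon.service",
    "powermon2.service",
    "eg4-battery.service",
    "lvx-control.service",
    "lvx-resolve-links.service",
}
READ_TOOLS = {"solar-snapshot", "mosquitto_sub", "journalctl"}


def classify(command: str) -> str:
    """Return 'allowed' for commands on the safe list, otherwise 'escalate'."""
    argv = shlex.split(command)
    if argv[:1] == ["sudo"]:
        argv = argv[1:]
    if not argv:
        return "escalate"
    if argv[0] == "systemctl" and len(argv) >= 2:
        if argv[1] in {"status", "is-active"}:
            return "allowed"  # read-only queries
        if argv[1] == "restart" and argv[2:] and all(u in RESTARTABLE for u in argv[2:]):
            return "allowed"  # restarting the listed daemons only
    if argv == ["/usr/local/sbin/lvx-resolve-links"]:
        return "allowed"
    if os.path.basename(argv[0]) in READ_TOOLS:
        return "allowed"
    return "escalate"  # default deny: setters, flash tooling, anything unknown


print(classify("sudo systemctl restart powermon2.service"))
print(classify("mosquitto_pub -t solar/control/lvx6048/example -m 1"))
print(classify("python3 lvx-flash/flash.py apply"))
```

Defaulting to `escalate` means a command that is merely unrecognised is handed to the user rather than run, which matches the intent of the forbidden list.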