shaggy-solar/.claude/skills/REFERENCE.md
noise aa97d65b0c Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check  : whole-system sweep + cross-checks + R/Y/G verdict,
                        incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage         : PV/load/grid/battery balance + EVSE sessions

Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00

# Solar install — system map (shared reference for the solar skills)
This file is the ground truth the `solar-*` / `troubleshoot-*` / `power-usage`
skills build on. Read it once at the start of any solar task. Everything below
was verified live on this host (the monitoring Pi) on 2026-06-23; re-verify
anything load-bearing before acting on it.
## Topology
```
6× EG4 LifePower4 v2 packs ──RS485 (1 FTDI each)──┐
2× MPP Solar LVX6048 inverters ──USB-HID/PI18─────┤ this Pi ──MQTT──► HA broker
1× OpenEVSE charger (10.0.0.249) ──its own WiFi───┘ (daemons) 10.0.0.41:1883
```
All telemetry lands on the **MQTT broker at 10.0.0.41:1883** under HA
auto-discovery (`homeassistant/<class>/<entity>/config` retained, `.../state`
republished each poll cycle — **state topics are NOT retained**, so to read
current values you must listen for a window: use `lib/solar-snapshot`).
Broker credentials live in `~/.config/powermon/powermon.yaml`
(`mqttbroker.{name,port,username,password}`). **Never hardcode them** — every
tool here reads them from that file. `lib/solar-snapshot` does too.
## The snapshot helper
`./lib/solar-snapshot` (relative to this skills dir) captures the latest value of
every matching MQTT topic over a short window and prints a table. This is the
primary read tool — prefer it over raw `mosquitto_sub`.
```
solar-snapshot [-w SECONDS] [-g GREP_RE] [-f] TOPIC_FILTER...
```
MQTT `+` matches one WHOLE level, so `lifepower4_+` matches nothing. Subscribe to
`homeassistant/sensor/+/state` and narrow with `-g`:
```
solar-snapshot -g 'lvx6048_1_' 'homeassistant/sensor/+/state'
solar-snapshot -w 16 -g 'lifepower4_[1-6]_soc/' 'homeassistant/sensor/+/state'
solar-snapshot 'openevse/#' # EVSE publishes on-change; idle when unplugged
```
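The whole-level rule for `+` can be sketched in a few lines of Python. This is an illustrative matcher, not the code inside `solar-snapshot`; the names `topic_matches` and `narrow` are hypothetical, and it assumes `-g` is applied to the topic string. A `+` embedded in a level is compared literally here, which is a simplification (brokers may reject such a filter outright).

```python
import re


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches the MQTT subscription `topic_filter`.

    `+` stands for exactly one whole level, `#` for all remaining levels.
    A `+` embedded in a level (e.g. `lifepower4_+`) is treated as a literal
    string, so it never matches a real topic.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # multi-level wildcard swallows the rest
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if f != "+" and f != t_levels[i]:
            return False  # literal level mismatch
    return len(f_levels) == len(t_levels)


def narrow(topics, topic_filter, grep_re):
    """Keep topics that match the filter AND the regex (like `-g`)."""
    pattern = re.compile(grep_re)
    return [t for t in topics if topic_matches(topic_filter, t) and pattern.search(t)]
```

Subscribing broadly and narrowing with a regex keeps the subscription valid while still selecting a single family of entities.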
## The history helper
`solar-snapshot` only sees *now*. For "when did X last happen / show last week",
use `./lib/ha-history`, which queries **Home Assistant's recorder** (the only
store that keeps history — local journald is volatile, ~1 day, wiped on reboot;
no solar data goes to InfluxDB). Default window 7 days; HA recorder default
retention is 10 days.
```
ha-history [-s SINCE] [-e END] [-m REGEX] [-a] ENTITY...
ha-history -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
ha-history -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
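For orientation, a recorder lookback of this kind can be expressed against Home Assistant's REST history endpoint (`/api/history/period/<timestamp>` with `filter_entity_id` and `end_time`). The sketch below only builds the request URL; whether `ha-history` uses this exact endpoint is an assumption, and `history_url` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def history_url(base_url, entity_ids, since, end=None):
    """Build a HA REST history URL for the given entities and time window.

    `since` and `end` are timezone-aware datetimes. This is a sketch of the
    request shape only; it performs no network I/O.
    """
    start = quote(since.isoformat())  # percent-encode ':' and '+'
    params = {"filter_entity_id": ",".join(entity_ids)}
    if end is not None:
        params["end_time"] = end.isoformat()
    return f"{base_url.rstrip('/')}/api/history/period/{start}?{urlencode(params)}"


if __name__ == "__main__":
    print(history_url(
        "http://10.0.0.41:8123",
        ["sensor.lvx6048_lvx6048_1_device_mode"],
        datetime(2026, 6, 14, tzinfo=timezone.utc),
    ))
```

The request would still need the `Authorization: Bearer` header described in the Auth paragraph of this section.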
**HA entity_ids ≠ MQTT object names.** powermon's hass output doubles the device
slug and is inconsistent across commands, so you must use the real ids, e.g.:
- device mode: `sensor.lvx6048_lvx6048_{1,2}_device_mode` (device slug `lvx6048`)
- fault code: `sensor.lvx6048_0{1,2}_lvx6048_{1,2}_fault_code` (slug `lvx6048_01`/`_02`)
- PV/batt/load: `sensor.lvx6048_lvx6048_1_{mppt1_input_power,mppt1_input_voltage,battery_voltage,ac_output_active_power}`
- EG4 packs follow the same doubling, e.g. `sensor.lifepower4_*`. When unsure, list
them: `curl -s -H "Authorization: Bearer $(cat ~/.config/ha/token)" "$HA_URL/api/states"
| python3 -c 'import sys,json;[print(s["entity_id"]) for s in json.load(sys.stdin) if "lvx6048" in s["entity_id"]]'`
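
The two inverter id patterns listed above can be captured as small helpers so the doubled slug is not retyped by hand. This is a minimal sketch; the function names are hypothetical, and only the `device_mode` and `fault_code` patterns documented above are covered.

```python
def _check_unit(unit: int) -> None:
    """The install has exactly two inverters, numbered 1 and 2."""
    if unit not in (1, 2):
        raise ValueError(f"unknown inverter unit: {unit!r}")


def device_mode_entity(unit: int) -> str:
    """HA entity_id for an inverter's device mode (device slug `lvx6048`)."""
    _check_unit(unit)
    return f"sensor.lvx6048_lvx6048_{unit}_device_mode"


def fault_code_entity(unit: int) -> str:
    """HA entity_id for an inverter's fault code (slug `lvx6048_01`/`_02`)."""
    _check_unit(unit)
    return f"sensor.lvx6048_{unit:02d}_lvx6048_{unit}_fault_code"
```

The outputs for units 1 and 2 are the same ids used in the `ha-history` examples earlier in this section.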
Auth: reads a long-lived token from `~/.config/ha/token` (mode 600) or `$HA_TOKEN`
— never on the command line, never hardcoded. Base URL `$HA_URL` else
`~/.config/ha/url` else `http://10.0.0.41:8123`. If it reports "no token", the user
must create one (HA → Profile → Security → Long-lived access tokens) and write it
to `~/.config/ha/token`; tell them which file, don't ask them to paste it in chat.
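
The lookup order described above can be sketched as follows. This is not the `ha-history` source: the function names are hypothetical, and checking `$HA_TOKEN` before the token file is an assumption (the text does not say which of the two wins).

```python
import os
from pathlib import Path

DEFAULT_HA_URL = "http://10.0.0.41:8123"


def _read_trimmed(path: Path):
    """Return the file's stripped contents, or None if missing or empty."""
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return text or None


def resolve_ha_url(env=None, config_dir=None) -> str:
    """Base URL: $HA_URL, else ~/.config/ha/url, else the default."""
    env = os.environ if env is None else env
    config_dir = Path.home() / ".config" / "ha" if config_dir is None else Path(config_dir)
    return env.get("HA_URL") or _read_trimmed(config_dir / "url") or DEFAULT_HA_URL


def resolve_ha_token(env=None, config_dir=None) -> str:
    """Token: $HA_TOKEN (assumed to take precedence), else ~/.config/ha/token."""
    env = os.environ if env is None else env
    config_dir = Path.home() / ".config" / "ha" if config_dir is None else Path(config_dir)
    token = env.get("HA_TOKEN") or _read_trimmed(config_dir / "token")
    if token is None:
        # Name the file so the user can write the token there themselves.
        raise RuntimeError(f"no token: write a long-lived access token to {config_dir / 'token'}")
    return token
```

Passing `env` and `config_dir` explicitly keeps the functions testable without touching the real environment or home directory.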
Recorder excludes (per `eg4battery/homeassistant/recorder.yaml`) drop EG4
per-cell/register/string entities — those have no history; the inverter
`device_mode`/`fault_code` and pack `soc`/`pack_voltage` etc. are recorded.
## Services (this Pi)
| Service | Role | Entities it feeds |
|---|---|---|
| `powermon.service` | LVX6048 #1 poller (PI18/USB) | `lvx6048_1_*` |
| `powermon2.service` | LVX6048 #2 poller (PI18/USB) | `lvx6048_2_*` |
| `lvx-resolve-links.service` | oneshot: maps `/dev/hidraw*` → `/dev/lvx6048-{1,2}` by PI18 serial; runs before powermon | (links) |
| `lvx-control.service` | bridges `solar/control/lvx6048/*` → powermon adhoc queue | (control) |
| `eg4-battery.service` | polls all 6 packs over RS485/Modbus | `lifepower4_1..6_*` |
Quick health: `systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service`
Logs: `journalctl -u <svc> --since "10 min ago" --no-pager`
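
When the quick health check needs to be consumed by a script, the `systemctl is-active` output can be paired back up with the unit names. The sketch below assumes `systemctl` prints one state per line in argument order; `parse_is_active` and `inactive_services` are hypothetical helper names.

```python
import subprocess

SERVICES = [
    "powermon.service",
    "powermon2.service",
    "eg4-battery.service",
    "lvx-control.service",
]


def parse_is_active(services, output):
    """Pair each unit name with the state line systemctl printed for it."""
    states = output.split()
    if len(states) != len(services):
        raise ValueError(f"expected {len(services)} states, got {len(states)}")
    return dict(zip(services, states))


def inactive_services(services=SERVICES):
    """Return the units whose state is anything other than 'active'."""
    # is-active exits non-zero when any unit is not active, so no check=True.
    result = subprocess.run(
        ["systemctl", "is-active", *services],
        capture_output=True,
        text=True,
    )
    status = parse_is_active(services, result.stdout)
    return [name for name, state in status.items() if state != "active"]
```

An empty list from `inactive_services()` means all four daemons report `active`; anything else names the units worth checking with `journalctl`.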
## Entities cheat-sheet
**Inverters** `lvx6048_{1,2}_*` (PI18 GS/MOD/PIRI/FWS/ET):
`device_mode` (Power-On/Standby/Bypass/Battery/Fault/Charge…), `fault_code`,
`battery_voltage`, `battery_capacity` (%), `ac_output_active_power` (W),
`ac_output_voltage`, `grid_voltage`, `mppt1_input_power`/`mppt2_input_power` (W, PV),
`inverter_heat_sink_temperature`, `parallel_instance_number` (0 = master, 1+ = slave).
**Packs** `lifepower4_{1..6}_*` (Modbus): `soc`, `soc_alt`, `pack_voltage`,
`pack_current` (signed, + = charging), `cell_01..16_voltage`,
`cell_voltage_delta_mv` (imbalance), `cell_voltage_min`/`max`, `capacity_ah`,
`temperature_01..04`, `temperature_pcb`, `model`, `firmware_version`,
`firmware_date`, warning/protection bits, `register_NN` raw. There are 16 cells/pack.
**EVSE** `openevse/<key>` and `openevse_*` HA entities: `power` (W), `voltage`,
`amp` (mA raw → A in HA), `pilot`, `max_current`, `session_energy` (Wh),
`total_energy`, `status` (active/sleeping/disabled…), `state`, `temp`,
`vehicle` (plug). Charger HTTP UI at http://10.0.0.249.
Derived HA template sensors (`lifepower4_N_pack_power`, `_temperature_max`,
`_cell_imbalance_pct`, `lifepower4_stack_*`) are computed **inside HA**, not on
MQTT — compute them yourself from the raw entities when working off the Pi.
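
A minimal sketch of recomputing some derived values from a snapshot of raw entities. The HA template definitions are not shown in this file, so the formulas are assumptions: pack power as voltage times current, and maximum temperature over `temperature_01..04` only (whether HA also includes `temperature_pcb` is unknown). `values` is a hypothetical dict mapping MQTT object names to their latest payloads.

```python
def pack_power_w(values, pack: int) -> float:
    """Pack power in watts; positive means charging (pack_current is signed)."""
    voltage = float(values[f"lifepower4_{pack}_pack_voltage"])
    current = float(values[f"lifepower4_{pack}_pack_current"])
    return voltage * current


def pack_temperature_max(values, pack: int) -> float:
    """Hottest of the four pack temperature probes (PCB sensor excluded)."""
    return max(
        float(values[f"lifepower4_{pack}_temperature_{n:02d}"])
        for n in range(1, 5)
    )


def stack_power_w(values, packs=range(1, 7)) -> float:
    """Sum of pack powers across the given packs (all six by default)."""
    return sum(pack_power_w(values, p) for p in packs)
```

Payloads arrive as strings over MQTT, which is why each value is converted with `float()` before use.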
## Known issues / gotchas (check memory for the canonical versions)
- **Inverter `battery_voltage` is INTERMITTENTLY wrong** — read a correct ~54 V on
2026-06-20 (verified via HA history), but ~9–10 V on 2026-06-23/24 after the Jun 22
14:18 reboot, with packs steady at ~52–53 V throughout. So it's a post-reboot /
re-init glitch (the inverter or PI18 GS field not settling after restart), NOT a
permanent scaling bug. Implication: treat the inverter battery reading as
untrustworthy and use the `lifepower4_*` pack entities for any battery math; if it
reads ~10 V right now, a powermon (or inverter) restart may clear it — worth testing.
- **Pack 6 is an oddball**: Modbus addr `0x01` @ 115200 (packs 1–5 are `0x40` @
9600); ran 65 % SoC while 1–5 sat 40–44 %. Treat as a distinct member.
- **EG4 SoC never re-anchors** (drifts because packs rarely hit 100 % to reset the
coulomb counter). See memory `project_eg4_soc_drift_remediation`.
- **RS485 daisy-chain silences slave packs** — each pack needs its own FTDI; an
inter-pack chain demotes slaves. See memory `project_eg4_daisy_chain_silences_slaves`.
- **No per-day inverter energy** — PI18 only gives `ET` (lifetime Wh); ED/EM/EY NAK.
Daily kWh must come from HA recorder or ET deltas.
- **Parallel cluster**: changing inverter settings on only one unit risks fault 86
(desync). `lvx-control` always mirrors to both — that's why setters go through it.
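
Two of the gotchas above reduce to small arithmetic checks. The sketch below flags an inverter `battery_voltage` that disagrees with the packs, and turns two lifetime `ET` readings into an energy figure for the interval between them. The function names and the 3 V tolerance are assumptions chosen for illustration, not values from the install.

```python
from statistics import mean


def inverter_voltage_trustworthy(inverter_v, pack_voltages, tolerance_v=3.0):
    """True if the inverter reading is within tolerance of the mean pack voltage.

    The packs are treated as the reference; tolerance_v is an assumed threshold.
    """
    return abs(inverter_v - mean(pack_voltages)) <= tolerance_v


def kwh_from_et(et_start_wh, et_end_wh):
    """Energy in kWh between two lifetime ET readings (both in Wh)."""
    if et_end_wh < et_start_wh:
        raise ValueError("ET went backwards; counter reset or bad sample")
    return (et_end_wh - et_start_wh) / 1000.0
```

For example, a ~10 V inverter reading against packs at ~52–53 V fails the check, which is the cue to use the `lifepower4_*` entities for battery math instead.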
## Action policy for these skills
**Allowed (safe remediation):**
- Read anything: `solar-snapshot`, `mosquitto_sub`, `journalctl`, `systemctl status/is-active`.
- Restart the data-plane daemons when they're wedged:
`sudo systemctl restart powermon.service` / `powermon2.service` / `eg4-battery.service` / `lvx-control.service`
- Recover inverter USB links: `sudo systemctl restart lvx-resolve-links.service`
or `sudo /usr/local/sbin/lvx-resolve-links`.
**Forbidden (escalate to the user instead — propose the exact command, don't run it):**
- Any inverter/battery **setter**: `solar/control/lvx6048/*` publishes
(charger priority, max charge current, output priority, …).
- `lvx-flash/flash.py apply` and `dump`/`compare`/`sync-check` — they contend for
exclusive USB and stop powermon; advanced, user-driven only.
- Anything that writes battery thresholds, output mode, or factory resets.
- Power-cycling hardware, moving cables, breaker changes.
When a fix is outside the allowed set, report the finding and hand the user the
precise command(s) to run.
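
The restart half of the allowed set can be expressed as an explicit allowlist check. This is an illustrative guard, not part of the skills' tooling; `is_allowed_restart` is a hypothetical name, and it deliberately recognises only the `systemctl restart` form for the five units named above.

```python
import shlex

RESTARTABLE = {
    "powermon.service",
    "powermon2.service",
    "eg4-battery.service",
    "lvx-control.service",
    "lvx-resolve-links.service",
}


def is_allowed_restart(command: str) -> bool:
    """True only for `[sudo] systemctl restart <unit>` with an allowlisted unit."""
    argv = shlex.split(command)
    if argv[:1] == ["sudo"]:
        argv = argv[1:]
    return (
        len(argv) == 3
        and argv[:2] == ["systemctl", "restart"]
        and argv[2] in RESTARTABLE
    )
```

Anything the guard rejects falls under the escalate path: report the finding and hand the exact command to the user rather than running it.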