Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:
- solar-health-check : whole-system sweep + cross-checks + R/Y/G verdict,
incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage : PV/load/grid/battery balance + EVSE sessions
Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).
Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
113
.claude/skills/troubleshoot-inverter/SKILL.md
Normal file
@@ -0,0 +1,113 @@
---
name: troubleshoot-inverter
description: >-
  Diagnose the 2× MPP Solar LVX6048 inverters: faults/warnings (FWS), operating
  mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and
  powermon daemon health. Use when an inverter shows a fault, is in the wrong mode,
  stopped publishing, the two units disagree, PV looks low, or the user says
  "inverter problem / fault code / no solar / one inverter is down". Read-only plus
  safe link/daemon recovery; never changes inverter settings.
---

# troubleshoot-inverter

## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter entities are `lvx6048_1_*`
(powermon.service) and `lvx6048_2_*` (powermon2.service).

## 1. Is it a data problem or a device problem?
```bash
systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? point at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
```
- Service active + data flowing → **device/config** issue, go to §2.
- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is
  missing/dangling → **USB-HID link** issue, go to §4.
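
The "missing/dangling" distinction above can be checked mechanically instead of read off `ls -l`. A minimal sketch, assuming only POSIX `test` semantics; `link_state` is a hypothetical helper name, not part of the shared lib:

```bash
# link_state PATH: classify a device symlink as ok / dangling / missing.
# Hypothetical helper for illustration; not shipped with solar-snapshot.
link_state() {
  local path="$1"
  if [ ! -L "$path" ]; then
    echo missing      # no symlink at that path
  elif [ ! -e "$path" ]; then
    echo dangling     # symlink exists but its target is gone
  else
    echo ok           # symlink resolves to an existing node
  fi
}

for link in /dev/lvx6048-1 /dev/lvx6048-2; do
  printf '%s: %s\n' "$link" "$(link_state "$link")"
done
```

Anything other than `ok` for either link points at §4 rather than §2.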

## 2. Faults & mode
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
```
- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault.
  `Bypass`/`Line` = passing grid through (normal when grid present + low PV).
  `Fault` = stop, decode the fault.
- The FWS fault/warning bit → label mapping lives in the patched driver
  `$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`).
  Read it to translate a raw `fault_code`. MOD code labels are there too.
  Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout,
  08 bus voltage too high, 56 battery connection open, 71 parallel version
  different, 80–86 parallel-cluster faults (86 = output setting mismatch).
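
For fast triage, the quick refs above can be wrapped in a lookup. This is a sketch only: `fault_label` is a hypothetical helper, it assumes the code arrives as a zero-padded two-digit string, it reads "03/04 battery V high/low" as 03 = high and 04 = low, and it covers nothing beyond the codes listed here. `pi18.py` remains the authoritative mapping.

```bash
# fault_label CODE: translate a two-digit fault code using the quick refs only.
# Hypothetical helper; anything not listed here must be looked up in pi18.py.
fault_label() {
  case "$1" in
    02) echo "over-temp" ;;
    03) echo "battery voltage high" ;;   # assumes 03/04 map to high/low in that order
    04) echo "battery voltage low" ;;
    07) echo "overload timeout" ;;
    08) echo "bus voltage too high" ;;
    56) echo "battery connection open" ;;
    71) echo "parallel version different" ;;
    86) echo "parallel-cluster fault: output setting mismatch" ;;
    8[0-5]) echo "parallel-cluster fault (decode in pi18.py)" ;;
    *) echo "not in quick refs (decode in pi18.py)" ;;
  esac
}

fault_label 08
```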

### Historic faults: "when did it last happen / show me last week"
Local logs only reach back to the last reboot (`journalctl` here is volatile, ~1 day), so
for anything older query HA's recorder via `ha-history`. It needs `~/.config/ha/token`
(see REFERENCE); if the token is absent, tell the user how to create it rather than blocking.
Use the REAL HA entity_ids, NOT the MQTT object names (see REFERENCE; the slug is
doubled and differs per command):
```bash
"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
Each `Fault`/non-`No fault` change-point prints with a local timestamp and how long
it lasted (marked `<<< FAULT`). To pin the cause, re-query the same window with `-a`
for the surrounding conditions (`mppt1_input_voltage`, `mppt1_input_power`,
`battery_voltage`, `ac_output_active_power`):
- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal
  DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation /
  cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
- **`mppt1_input_power` (or `mppt2_input_power`) flatlines at 0 after the fault and stays there** → that unit
  silently stopped producing; check it didn't sit dead until a reboot (a real 2026-06-20
  occurrence: unit 1 fault 08 at 17:20 → 0 W PV until the Jun 22 reboot, ~half the
  array offline ~1.8 days while unit 2 masked it). Cross-check the *other* unit's PV
  over the same span to catch this.
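
The cross-check in the last bullet reduces to comparing the two units' PV power at the same instant. A minimal sketch, assuming integer watts taken from the history output; `silent_unit` and the 200 W floor are hypothetical and would need tuning to the array:

```bash
# silent_unit PV1_W PV2_W: flag a unit sitting at 0 W while its twin produces.
# Hypothetical helper; the 200 W floor is an assumed "clearly daylight" level.
silent_unit() {
  local pv1="$1" pv2="$2" floor=200
  if [ "$pv1" -eq 0 ] && [ "$pv2" -ge "$floor" ]; then
    echo "unit1-suspect"
  elif [ "$pv2" -eq 0 ] && [ "$pv1" -ge "$floor" ]; then
    echo "unit2-suspect"
  else
    echo "no-signal"    # both producing, or both dark (night/shade)
  fi
}

silent_unit 0 1500
```

Both units at 0 W is deliberately not flagged, since that is the normal night-time state.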

## 3. Parallel-cluster sync
The two units run paralleled; desync throws **fault 86**.
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
```
- Exactly one unit = `parallel_instance_number` 0 (master); other ≥1. Both 0 (two
  masters) or neither 0 (two slaves) → cluster confused → likely needs a coordinated
  power cycle (user action: propose it, don't do it).
- Healthy parallel = both units sharing load roughly symmetrically
  (`ac_output_active_power` comparable). One at ~0 W while the other carries
  everything = that unit dropped out of the cluster.
- Deeper sync/settings comparison via `$ROOT/LVX6048/lvx-flash/flash.py sync-check` /
  `compare` exists but **stops powermon and grabs the USB**, so it is advanced and
  user-run only. Propose the command; don't execute.
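
The master/slave rule in the first bullet is a count of how many units report instance 0. A minimal sketch, assuming the two `parallel_instance_number` values have already been read as integers; `cluster_roles` is a hypothetical helper name:

```bash
# cluster_roles N1 N2: check that exactly one unit reports instance 0 (master).
# Hypothetical helper; inputs are the two parallel_instance_number values.
cluster_roles() {
  local masters=0 n
  for n in "$1" "$2"; do
    if [ "$n" -eq 0 ]; then
      masters=$((masters + 1))
    fi
  done
  case "$masters" in
    1) echo "ok" ;;            # one master, one slave
    2) echo "two-masters" ;;   # both report 0
    *) echo "no-master" ;;     # neither reports 0
  esac
}

cluster_roles 0 1
```

Any result other than `ok` is a reason to propose the coordinated power cycle, not to perform it.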

## 4. USB-HID link recovery (ALLOWED)
Symptom: one unit's entities stale/absent, or `/dev/lvx6048-*` missing/dangling.
The resolver maps hidraw→stable symlink by PI18 serial and restarts powermon.
```bash
sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-* # both symlinks back?
sudo systemctl restart powermon.service powermon2.service # if still wedged
```
Then re-run the §1 snapshot to confirm both units publish again. Note: after an
inverter power-cycle the udev rule normally self-heals; a manual restart is the
fallback when it doesn't.

## 5. PV / MPPT looks low
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'
```
Cross-reference array memory (`project_solar_array_config`): 9s2p per inverter, and
there's a **suspected dead string per inverter**. One MPPT reading ~half the other,
or roughly half nameplate in full sun, is consistent with that: confirm it at the
combiner rather than treating it as a software bug. Zero PV at night/heavy shade is normal.
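
The "one MPPT reading ~half the other" test can be made explicit. A minimal sketch, assuming integer watts for `mppt1_input_power` and `mppt2_input_power`; `mppt_balance`, the 200 W daylight floor and the 60% ratio are all hypothetical values chosen for illustration:

```bash
# mppt_balance MPPT1_W MPPT2_W: flag one tracker reading about half the other.
# Hypothetical helper; 200 W floor and 60% ratio are assumed thresholds.
mppt_balance() {
  local a="$1" b="$2" hi lo
  if [ "$a" -ge "$b" ]; then hi="$a"; lo="$b"; else hi="$b"; lo="$a"; fi
  if [ "$hi" -lt 200 ]; then
    echo "too-dark"       # night or heavy shade: ratio is meaningless
  elif [ $((lo * 100)) -le $((hi * 60)) ]; then
    echo "imbalanced"     # weaker tracker at or below 60% of the stronger one
  else
    echo "balanced"
  fi
}

mppt_balance 2000 1000
```

An `imbalanced` result in full sun supports the dead-string suspicion; it does not replace the combiner check.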

## 6. Report
State: which unit, mode, fault (decoded), cluster health, link state, what you
recovered (if anything), and any remaining action that needs the user (power cycle,
flash.py, combiner check) as exact commands/steps. Never publish to
`solar/control/lvx6048/*` here.