shaggy-solar/.claude/skills/troubleshoot-inverter/SKILL.md
noise aa97d65b0c Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check  : whole-system sweep + cross-checks + R/Y/G verdict,
                        incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage         : PV/load/grid/battery balance + EVSE sessions

Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00

---
name: troubleshoot-inverter
description: >-
  Diagnose the 2× MPP Solar LVX6048 inverters — faults/warnings (FWS), operating
  mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and
  powermon daemon health. Use when an inverter shows a fault, is in the wrong mode,
  stopped publishing, the two units disagree, PV looks low, or the user says
  "inverter problem / fault code / no solar / one inverter is down". Read-only plus
  safe link/daemon recovery; never changes inverter settings.
---
# troubleshoot-inverter
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter entities are `lvx6048_1_*`
(powermon.service) and `lvx6048_2_*` (powermon2.service).
## 1. Is it a data problem or a device problem?
```bash
systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? point at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
```
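- Service active + data flowing → **device/config** issue, go to §2.
- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is
missing/dangling → **USB-HID link** issue, go to §4.
- Service inactive/failed → **daemon** issue; check its `journalctl` output for the
reason, then use the restart commands in §4.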
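The branching above can be sketched as a small shell function. This is a hypothetical helper, not part of the skill's lib: the name `triage`, the output labels, and the idea of passing a captured-message count are all assumptions, and the `daemon-down` branch covers a case the bullets only imply.

```bash
# Hypothetical helper: classify the §1 outcome for ONE unit.
#   $1 = result of `systemctl is-active <service>` (e.g. "active")
#   $2 = that unit's stable symlink (e.g. /dev/lvx6048-1)
#   $3 = how many messages the snapshot captured for that unit (assumed integer)
triage() {
  if [ "$1" != "active" ]; then
    echo "daemon-down"      # assumption: service not running, restart per §4
  elif [ ! -e "$2" ]; then  # -e follows symlinks: missing AND dangling both fail
    echo "usb-link"         # go to §4
  elif [ "$3" -eq 0 ]; then
    echo "usb-link"         # symlink resolves but the unit is silent: still §4
  else
    echo "device-config"    # data flowing: go to §2
  fi
}
```

Example call: `triage "$(systemctl is-active powermon.service)" /dev/lvx6048-1 "$count"`, where `$count` is however many lines you saw for unit 1 in the snapshot.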
## 2. Faults & mode
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
```
- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault.
`Bypass`/`Line` = passing grid through (normal when grid present + low PV).
`Fault` = stop, decode the fault.
- The FWS fault/warning bit → label mapping lives in the patched driver
`$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`).
Read it to translate a raw `fault_code`. MOD code labels are there too.
Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout,
08 bus voltage too high, 56 battery connection open, 71 parallel version
different, 80-86 parallel-cluster faults (86 = output setting mismatch).
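For a first-pass hint before opening `pi18.py`, the quick refs can be put in a lookup. This is a sketch only: `fault_hint` is a hypothetical name, it assumes `fault_code` arrives as a bare two-digit code, and it reads the parallel-cluster range as 80 to 86. The driver's mapping remains the authority.

```bash
# Hypothetical quick-ref lookup; covers ONLY the codes listed above.
# Assumes a bare two-digit code such as "08" (not a label like "No fault").
fault_hint() {
  case "$1" in
    02)     echo "over-temperature" ;;
    03)     echo "battery voltage too high" ;;
    04)     echo "battery voltage too low" ;;
    07)     echo "overload timeout" ;;
    08)     echo "bus voltage too high" ;;
    56)     echo "battery connection open" ;;
    71)     echo "parallel version different" ;;
    86)     echo "parallel cluster: output setting mismatch" ;;
    8[0-5]) echo "parallel-cluster fault, decode via pi18.py" ;;
    *)      echo "not in quick refs, decode via pi18.py" ;;
  esac
}
```

`fault_hint 08` prints `bus voltage too high`; anything outside the list falls through to the driver lookup rather than guessing.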
### Historic faults — "when did it last happen / show me last week"
Local logs only reach back to the last reboot (`journalctl` here is volatile, ~1 day), so
for anything older query HA's recorder via `ha-history`. It needs `~/.config/ha/token`
(see REFERENCE); if the token is absent, tell the user how to create it rather than blocking.
Use the REAL HA entity_ids, NOT the MQTT object names; the slug is doubled and
differs per command (see REFERENCE):
```bash
"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
Each `Fault`/non-`No fault` change-point prints with a local timestamp and how long
it lasted (marked `<<< FAULT`). To pin the cause, re-query the same window with `-a`
for the surrounding conditions — `mppt1_input_voltage`, `mppt1_input_power`,
`battery_voltage`, `ac_output_active_power`:
- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal
DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation /
cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
- **`mppt1_input_power`/`mppt2_input_power` flatline at 0 after the fault and stay there** → that unit
silently stopped producing; check it didn't sit dead until a reboot (a real 2026-06-20
occurrence: unit 1 fault 08 at 17:20 → 0 W PV until the Jun 22 reboot, ~half the
array offline ~1.8 days while unit 2 masked it). Cross-check the *other* unit's PV
over the same span to catch this.
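The cross-unit check in the last bullet can be sketched with `awk`. The input format here is an assumption (one line per sample: epoch seconds, unit 1 PV watts, unit 2 PV watts); `ha-history` output would need reshaping into it first. `masked_outages` and its 200 W default threshold are hypothetical.

```bash
# Hypothetical cross-check: find spans where one unit reads 0 W PV while the
# other is clearly producing. Stdin: "<epoch_s> <unit1_pv_w> <unit2_pv_w>".
# $1 = minimum watts on the healthy unit to count as "producing" (default 200).
masked_outages() {
  awk -v min_w="${1:-200}" '
    function flush_span() {
      if (dead != "")
        printf "unit%s dead %d..%d (%.1f h)\n", dead, start, last, (last - start) / 3600
      dead = ""
    }
    {
      now = ""
      if ($2 == 0 && $3 >= min_w) now = "1"
      else if ($3 == 0 && $2 >= min_w) now = "2"
      # A change of state closes the open span and may open a new one.
      if (now != dead) { flush_span(); if (now != "") { dead = now; start = $1 } }
      if (now != "") last = $1
    }
    END { flush_span() }
  '
}
```

Samples where both units read 0 W (night, heavy shade) are ignored on purpose, since that is normal rather than a masked outage.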
## 3. Parallel-cluster sync
The two units run paralleled; desync throws **fault 86**.
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
```
- Exactly one unit should report `parallel_instance_number` 0 (master) and the other ≥1
(slave). Both 0 (two masters) or both ≥1 (two slaves) → cluster confused → likely needs
a coordinated power cycle (user action: propose it, don't do it).
- Healthy parallel = both units sharing load roughly symmetrically
(`ac_output_active_power` comparable). One at ~0 W while the other carries
everything = that unit dropped out of the cluster.
- Deeper sync/settings comparison via `$ROOT/LVX6048/lvx-flash/flash.py sync-check` /
`compare` exists but **stops powermon and grabs the USB** — advanced, user-run
only. Propose the command; don't execute.
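The first two checks can be expressed as a read-only function over values already captured by the snapshot. This is a sketch under assumptions: `cluster_check` is a hypothetical name, and the 50 W / 500 W cut-offs for "~0 W" and "carrying everything" are illustrative, not values from the skill.

```bash
# Hypothetical read-only check of cluster roles and load sharing.
#   $1 $2 = parallel_instance_number of unit 1 and unit 2
#   $3 $4 = ac_output_active_power of unit 1 and unit 2 (W; decimals truncated)
cluster_check() {
  i1="$1"; i2="$2"; p1="${3%.*}"; p2="${4%.*}"
  masters=0
  if [ "$i1" -eq 0 ]; then masters=$((masters + 1)); fi
  if [ "$i2" -eq 0 ]; then masters=$((masters + 1)); fi
  if [ "$masters" -ne 1 ]; then
    echo "role-conflict"        # two masters or two slaves: propose a power cycle
  elif [ "$p1" -lt 50 ] && [ "$p2" -gt 500 ]; then
    echo "unit1-dropped-out"    # assumed thresholds, tune to the real load
  elif [ "$p2" -lt 50 ] && [ "$p1" -gt 500 ]; then
    echo "unit2-dropped-out"
  else
    echo "ok"
  fi
}
```

It only reports; any remedy (power cycle, `flash.py`) stays a proposal to the user.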
## 4. USB-HID link recovery (ALLOWED)
Symptom: one unit's entities stale/absent, or `/dev/lvx6048-*` missing/dangling.
The resolver maps hidraw→stable symlink by PI18 serial and restarts powermon.
```bash
sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-* # both symlinks back?
sudo systemctl restart powermon.service powermon2.service # if still wedged
```
Then re-run §1 snapshot to confirm both units publish again. Note: after an
inverter power-cycle the udev rule normally self-heals; manual restart is the
fallback when it didn't.
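After a restart the symlinks may take a moment to reappear, so a bounded wait can be more reliable than a single `ls`. A minimal sketch, assuming a one-second poll is acceptable; `wait_for_links` is a hypothetical helper, not something the resolver ships.

```bash
# Hypothetical helper: poll until every given symlink resolves, or time out.
#   $1 = timeout in seconds; remaining args = symlink paths
wait_for_links() {
  deadline="$1"; shift
  elapsed=0
  while [ "$elapsed" -le "$deadline" ]; do
    missing=0
    for p in "$@"; do
      # -e follows the symlink, so missing and dangling links both count
      if [ ! -e "$p" ]; then missing=1; fi
    done
    if [ "$missing" -eq 0 ]; then return 0; fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Example: `wait_for_links 30 /dev/lvx6048-1 /dev/lvx6048-2 && echo "both links back"`. A non-zero return is the cue to try the explicit powermon restart.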
## 5. PV / MPPT looks low
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'
```
Cross-reference array memory (`project_solar_array_config`): 9s2p per inverter, and
there's a **suspected dead string per inverter** — one MPPT reading ~half the other,
or roughly half nameplate in full sun, is consistent with that and worth confirming
at the combiner, not a software bug. Zero PV at night/heavy shade is normal.
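The "one MPPT reading ~half the other" pattern can be checked numerically. This is a sketch: `mppt_check` is a hypothetical name, and both the 40-60% band for "about half" and the 100 W floor below which no judgement is made are assumed values.

```bash
# Hypothetical check for the "one MPPT ~half the other" pattern.
#   $1 $2 = mppt1_input_power and mppt2_input_power (W; decimals truncated)
mppt_check() {
  a="${1%.*}"; b="${2%.*}"
  if [ "$a" -ge "$b" ]; then hi="$a"; lo="$b"; else hi="$b"; lo="$a"; fi
  if [ "$hi" -lt 100 ]; then
    echo "too-dark-to-judge"       # night / heavy shade: zero PV is normal
    return
  fi
  pct=$((lo * 100 / hi))           # weaker input as a percentage of the stronger
  if [ "$pct" -ge 40 ] && [ "$pct" -le 60 ]; then
    echo "suspect-dead-string"     # consistent with one string missing: check combiner
  else
    echo "no-half-pattern"
  fi
}
```

A `suspect-dead-string` result is a prompt to confirm at the combiner, not proof; compare readings taken in full sun.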
## 6. Report
State: which unit, mode, fault (decoded), cluster health, link state, what you
recovered (if anything), and any remaining action that needs the user (power cycle,
flash.py, combiner check) as exact commands/steps. Never publish to
`solar/control/lvx6048/*` here.