Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:
- solar-health-check : whole-system sweep + cross-checks + R/Y/G verdict,
incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage : PV/load/grid/battery balance + EVSE sessions
Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).
Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00
---
name: troubleshoot-inverter
description: >-
  Diagnose the 2× MPP Solar LVX6048 inverters: faults/warnings (FWS), operating
  mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and
  powermon daemon health. Use when an inverter shows a fault, is in the wrong mode,
  has stopped publishing, the two units disagree, PV looks low, or the user says
  "inverter problem / fault code / no solar / one inverter is down". Read-only plus
  safe link/daemon recovery; never changes inverter settings.
---
# troubleshoot-inverter
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"
SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter MQTT object names are `lvx6048_1_*`
(published by powermon.service) and `lvx6048_2_*` (published by powermon2.service).
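A quick preflight can surface a missing helper or HA token before diagnosis starts, so it does not fail partway through. This is a minimal sketch and is not part of the repo. The function name `preflight` is hypothetical. It only checks that the two lib scripts are executable and that the token file is non-empty; it does not validate the token against HA.

```bash
# Hypothetical preflight helper (not part of the repo). It reports missing
# pieces instead of aborting, because a missing HA token should not block
# live MQTT diagnosis.
preflight() {
  local snap=$1 hist=$2 token=$3 ok=0
  for tool in "$snap" "$hist"; do
    if [ ! -x "$tool" ]; then
      echo "MISSING tool: $tool"
      ok=1
    fi
  done
  if [ ! -s "$token" ]; then
    # ha-history needs this; live snapshots still work without it
    echo "NO HA token at $token (history lookback unavailable)"
    ok=1
  fi
  return "$ok"
}

# Usage:
# preflight "$SNAP" "$HIST" "$HOME/.config/ha/token" || echo "partial: continue with live checks"
```

A non-zero return signals a partial setup. It does not mean the diagnosis should stop.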
## 1. Is it a data problem or a device problem?
```bash
systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? pointing at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
```
- Service active + data flowing → **device/config** issue, go to §2.
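- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is
missing/dangling → **USB-HID link** issue, go to §4.
- A powermon service not `active` (failed/inactive) → **daemon** issue: read its journal,
then restart it with the last command in §4.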
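The `ls -l` check above can be made mechanical with a small classifier. This is a minimal sketch. `link_state` is a hypothetical helper and is not in the repo. It assumes a healthy link resolves to a node whose name starts with `hidraw`, which is what the §1 comment implies.

```bash
# Hypothetical helper (not part of the repo): classify one /dev/lvx6048-* path.
link_state() {
  local link=$1 target
  if [ -L "$link" ] && [ ! -e "$link" ]; then
    # -e follows the symlink, so a symlink that fails -e points at nothing
    echo "dangling"
  elif [ ! -e "$link" ]; then
    echo "missing"
  elif [ ! -L "$link" ]; then
    echo "not-a-symlink"
  else
    target=$(readlink -f "$link")
    case "$(basename "$target")" in
      hidraw*) echo "ok -> $target" ;;
      *)       echo "not-hidraw -> $target" ;;
    esac
  fi
}

# Usage: any result other than "ok" means a USB-HID link issue.
# for l in /dev/lvx6048-1 /dev/lvx6048-2; do echo "$l: $(link_state "$l")"; done
```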
## 2. Faults & mode
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
```
- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault.
`Bypass`/`Line` = passing grid through (normal when grid is present and PV is low).
`Fault` = stop and decode the fault.
- The FWS fault/warning bit → label mapping lives in the patched driver
`$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`).
Read it to translate a raw `fault_code`. MOD code labels are there too.
Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout,
08 bus voltage too high, 56 battery connection open, 71 parallel version
different, 80–86 parallel-cluster faults (86 = output setting mismatch).
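For a first pass, the quick refs above can be wrapped in a lookup. This is a minimal sketch. `fault_label` is hypothetical and covers only the codes listed above. It assumes `fault_code` arrives as a bare number such as `08`. Any other value, including a text value, is flagged as non-numeric. `pi18.py` remains the authoritative mapping.

```bash
# Hypothetical quick-ref lookup: ONLY the codes listed above; pi18.py is authoritative.
fault_label() {
  local raw=$1 code
  case "$raw" in
    ''|*[!0-9]*) echo "not a numeric code: $raw"; return 1 ;;
  esac
  # Strip leading zeros so "08" and "8" hit the same branch
  code=${raw#"${raw%%[!0]*}"}
  [ -n "$code" ] || code=0
  case "$code" in
    2)      echo "over-temperature" ;;
    3)      echo "battery voltage high" ;;
    4)      echo "battery voltage low" ;;
    7)      echo "overload timeout" ;;
    8)      echo "bus voltage too high" ;;
    56)     echo "battery connection open" ;;
    71)     echo "parallel version different" ;;
    86)     echo "parallel-cluster: output setting mismatch" ;;
    8[0-5]) echo "parallel-cluster fault (exact label in pi18.py)" ;;
    *)      echo "not in quick refs (decode via pi18.py)" ;;
  esac
}
```

Stripping the zeros with parameter expansion, rather than using shell arithmetic, avoids `08` being read as an invalid octal number.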
### Historic faults — "when did it last happen / show me last week"
Local logs only reach back to the last reboot (`journalctl` here is volatile, ~1 day), so
for anything older query HA's recorder via `ha-history`. It needs `~/.config/ha/token`
(see REFERENCE); if the token is absent, tell the user how to create it rather than
blocking. Use the REAL HA entity_ids, NOT the MQTT object names (see REFERENCE: the slug
is doubled and differs per command):
```bash
"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
Each change-point to `Fault` (or to any `fault_code` other than `No fault`) prints with a
local timestamp and how long it lasted, marked `<<< FAULT`. To pin the cause, re-query the
same window with `-a` for the surrounding conditions (`mppt1_input_voltage`,
`mppt1_input_power`, `battery_voltage`, `ac_output_active_power`):
- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal
DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation /
cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
- **`mppt1_input_power` flatlines at 0 after the fault and stays there** → that unit
silently stopped producing; check whether it sat dead until a reboot. Real occurrence on
2026-06-20: unit 1 threw fault 08 at 17:20 and read 0 W PV until the Jun 22 reboot, so
~half the array was offline for ~1.8 days while unit 2 masked it. Cross-check the *other*
unit's PV over the same span to catch this: if the other unit is producing, it is daylight,
so the 0 W is not just night.
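The cross-check can be scripted once both units' PV samples are lined up. This is a minimal sketch. It assumes you have already converted the two units' `mppt1_input_power` histories into aligned lines of `epoch unit1_W unit2_W`; it assumes nothing about `ha-history`'s own output format. `dead_unit_check` is hypothetical, and its two thresholds are illustrative values only.

```bash
# Hypothetical helper: flag samples where one unit reads ~0 W PV while the other
# clearly produces (so it is daylight and the zero is not just night).
# stdin: "epoch unit1_W unit2_W" per line; args: IDLE_W LIVE_W thresholds.
dead_unit_check() {
  awk -v idle="$1" -v live="$2" '
    $2 <= idle && $3 >= live { n1++; if (n1 == 1) f1 = $1; l1 = $1 }
    $3 <= idle && $2 >= live { n2++; if (n2 == 1) f2 = $1; l2 = $1 }
    END {
      # span = first to last suspect sample, so it can include nights
      if (n1) printf "unit1 suspect: %d samples, %.1f h span\n", n1, (l1 - f1) / 3600
      if (n2) printf "unit2 suspect: %d samples, %.1f h span\n", n2, (l2 - f2) / 3600
      if (!n1 && !n2) print "no cross-unit PV mismatch"
    }'
}

# Usage (thresholds and file name illustrative): dead_unit_check 20 300 < aligned_pv.txt
```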
## 3. Parallel-cluster sync
The two units run in parallel; a desync throws **fault 86**.
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
```
- Healthy: exactly one unit reports `parallel_instance_number` 0 (master) and the other ≥1.
Both 0 (two masters) or both ≥1 (no master) → cluster confused → likely needs a
coordinated power cycle (user action: propose it, don't do it).
- Healthy parallel = both units sharing load roughly symmetrically
(`ac_output_active_power` comparable). One at ~0 W while the other carries
everything = that unit dropped out of the cluster.
- A deeper sync/settings comparison exists (`$ROOT/LVX6048/lvx-flash/flash.py sync-check` /
`compare`), but it **stops powermon and grabs the USB**, so it is advanced and user-run
only. Propose the command; don't execute it.
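The first two bullets above reduce to a small check on the §3 snapshot values. This is a minimal sketch. `cluster_check` is hypothetical and assumes integer watts. Its dropout rule (one unit below 10% of a load above 500 W) is an illustrative threshold, not a vendor limit.

```bash
# Hypothetical helper: PIN = parallel_instance_number, W = ac_output_active_power.
# usage: cluster_check PIN1 PIN2 W1 W2   (integer watts assumed)
cluster_check() {
  local p1=$1 p2=$2 w1=$3 w2=$4 total
  if [ "$p1" -eq 0 ] && [ "$p2" -eq 0 ]; then
    echo "CONFUSED: two masters"; return 1
  elif [ "$p1" -ne 0 ] && [ "$p2" -ne 0 ]; then
    echo "CONFUSED: no master"; return 1
  fi
  total=$((w1 + w2))
  # Illustrative rule: one unit below 10% of a non-trivial (>500 W) load = dropout
  if [ "$total" -gt 500 ] && { [ $((w1 * 10)) -lt "$total" ] || [ $((w2 * 10)) -lt "$total" ]; }; then
    echo "DROPOUT: one unit carries almost all load ($w1 W / $w2 W)"; return 1
  fi
  echo "ok: master=unit$([ "$p1" -eq 0 ] && echo 1 || echo 2), $w1 W / $w2 W"
}
```

A `CONFUSED` result still leads only to a proposed power cycle for the user. The helper changes nothing on the inverters.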
## 4. USB-HID link recovery (ALLOWED)
Symptom: one unit's entities are stale/absent, or `/dev/lvx6048-*` is missing/dangling.
The resolver maps each hidraw node to a stable symlink by PI18 serial number, then
restarts powermon.
```bash
sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-* # both symlinks back?
sudo systemctl restart powermon.service powermon2.service # if still wedged
```
Then re-run the §1 snapshot to confirm both units publish again. Note: after an
inverter power cycle the udev rule normally self-heals the links; a manual restart is the
fallback when it doesn't.
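After a restart the symlinks can take a moment to reappear, so polling is more reliable than a single `ls`. This is a minimal sketch. `wait_for_links` is hypothetical. It only checks that each path resolves; it does not confirm that powermon is publishing, which the §1 snapshot still has to show.

```bash
# Hypothetical helper: poll until every given path exists (following
# symlinks) or TIMEOUT seconds pass. Returns 0 when all are present.
wait_for_links() {
  local timeout=$1 waited=0 p missing
  shift
  while :; do
    missing=""
    for p in "$@"; do
      [ -e "$p" ] || missing="$missing $p"
    done
    [ -z "$missing" ] && return 0
    if [ "$waited" -ge "$timeout" ]; then
      echo "still missing after ${timeout}s:$missing"
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Usage:
# sudo systemctl restart lvx-resolve-links.service
# wait_for_links 30 /dev/lvx6048-1 /dev/lvx6048-2 && ls -l /dev/lvx6048-*
```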
## 5. PV / MPPT looks low
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'
```
Docs: reflect this session's findings across the repo
- top-level README.md (new): system overview, subsystem map, skills pointer,
notable findings.
- eg4battery README/NOTES: 3 -> 6 packs (pack 6 oddball 0x01/115200); SoC drift
+ calibration section; closed-loop comms evaluated and rejected (loses per-pack
telemetry, no native protocol, doesn't fix drift); how to force a grid charge
via output-priority SUB.
- LVX6048 README: closed-loop pending item -> resolved decision; new "SoC
calibration & known firmware quirks" section (POP single-digit/POP01, MCHGC
charge-lock, re_discharge=re-discharge can't exceed float, PIRI lag, powermon
adhoc wedge); skills pointer.
- lvx-control README: POP encoding fix, POP crc-but-applies quirk, verify-by-
behavior, grid-charge-via-SUB usage.
- troubleshoot-inverter skill: corrected the stale "dead string per inverter"
claim — both strings healthy; low PV is tilt/heat/shade/curtailment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 21:46:02 -04:00
Cross-reference array memory (`project_solar_array_config`): one MPPT per inverter, fed by a
9s2p array. **"Low" PV is mostly expected, not a fault** (verified 2026-06-24): at a
clear-noon peak each inverter made ~16 A @ ~300 V, whereas with one string down it would read
~10 A, so both parallel strings are live. The ~66%-of-nameplate output comes from the 45° tilt
(wrong for high summer sun), heat derate, tree shading, AND **demand-throttling**: at noon the
battery charge pins at its cap, so the array is curtailed, not weak. To judge the array,
measure on a clear noon with the bank *hungry* (uncurtailed); ~16 A @ ~300 V under those
conditions is healthy. Zero PV at night or in heavy shade is normal. Don't diagnose "low PV"
off a single off-peak or demand-limited sample.
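The string arithmetic above is just current = power / voltage. This is a minimal sketch. It assumes you feed it `mppt1_input_power` and `mppt1_input_voltage` from a clear-noon, uncurtailed sample. `pv_strings` is hypothetical. Its 13 A cut-off is an illustrative midpoint between the ~10 A (one string) and ~16 A (both strings) figures above.

```bash
# Hypothetical helper: estimate MPPT current and compare it with the two-string level.
# Only meaningful on a clear-noon, uncurtailed (bank hungry) sample.
pv_strings() {
  awk -v w="$1" -v v="$2" 'BEGIN {
    if (v < 1) { print "no PV voltage (night or no PV input)"; exit }
    a = w / v
    # 13 A: illustrative midpoint between ~10 A (one string) and ~16 A (both)
    if (a >= 13) printf "%.1f A: both strings live\n", a
    else printf "%.1f A: below two-string level (curtailed, off-peak, or a string down)\n", a
  }'
}

# Usage: pv_strings 4800 300   # → 16.0 A: both strings live
```

A "below two-string level" result is a prompt to re-measure under the right conditions. On its own it is not a diagnosis.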
|
## 6. Report
State which unit, its mode, the decoded fault, cluster health, link state, what you
recovered (if anything), and any remaining action that needs the user (power cycle,
flash.py, combiner check) as exact commands/steps. Never publish to
`solar/control/lvx6048/*` here.
|