shaggy-solar/.claude/skills/troubleshoot-inverter/SKILL.md
noise 8dafce7dfe Docs: reflect this session's findings across the repo
- top-level README.md (new): system overview, subsystem map, skills pointer,
  notable findings.
- eg4battery README/NOTES: 3 -> 6 packs (pack 6 oddball 0x01/115200); SoC drift
  + calibration section; closed-loop comms evaluated and rejected (loses per-pack
  telemetry, no native protocol, doesn't fix drift); how to force a grid charge
  via output-priority SUB.
- LVX6048 README: closed-loop pending item -> resolved decision; new "SoC
  calibration & known firmware quirks" section (POP single-digit/POP01, MCHGC
  charge-lock, re_discharge=re-discharge can't exceed float, PIRI lag, powermon
  adhoc wedge); skills pointer.
- lvx-control README: POP encoding fix, POP crc-but-applies quirk, verify-by-
  behavior, grid-charge-via-SUB usage.
- troubleshoot-inverter skill: corrected the stale "dead string per inverter"
  claim — both strings healthy; low PV is tilt/heat/shade/curtailment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 21:46:02 -04:00

---
name: troubleshoot-inverter
description: >-
  Diagnose the 2× MPP Solar LVX6048 inverters: faults/warnings (FWS), operating
  mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and
  powermon daemon health. Use when an inverter shows a fault, is in the wrong mode,
  stopped publishing, the two units disagree, PV looks low, or the user says
  "inverter problem / fault code / no solar / one inverter is down". Read-only plus
  safe link/daemon recovery; never changes inverter settings.
---
# troubleshoot-inverter
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. Inverter entities are `lvx6048_1_*`
(powermon.service) and `lvx6048_2_*` (powermon2.service).
## 1. Is it a data problem or a device problem?
```bash
systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2 # symlinks present? point at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
```
- Service active + data flowing → **device/config** issue, go to §2.
- Service active but a unit's entities are silent, or a `/dev/lvx6048-*` symlink is
missing/dangling → **USB-HID link** issue, go to §4.
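
The symlink check above can be made mechanical. This is a minimal sketch, assuming a healthy link resolves to a path containing `hidraw`; the function name `link_state` and its four labels are illustrative, not part of the repo.
```bash
# Classify a stable device symlink. Labels are illustrative assumptions:
#   ok         - symlink exists and resolves to a hidraw node
#   missing    - no symlink at that path
#   dangling   - symlink exists but its target is gone
#   unexpected - symlink resolves to something that is not hidraw
link_state() {
  local link="$1" target
  if [ ! -L "$link" ]; then
    echo "missing"
    return
  fi
  if [ ! -e "$link" ]; then      # -e follows the link, so this catches dangling links
    echo "dangling"
    return
  fi
  target="$(readlink -f -- "$link")"
  case "$target" in
    */hidraw*) echo "ok" ;;
    *)         echo "unexpected" ;;
  esac
}
```
Calling `link_state /dev/lvx6048-1` and `link_state /dev/lvx6048-2` gives one word per unit; anything other than `ok` points at §4.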
## 2. Faults & mode
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
```
- `device_mode` values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault.
`Bypass`/`Line` = passing grid through (normal when grid present + low PV).
`Fault` = stop, decode the fault.
- The FWS fault/warning bit → label mapping lives in the patched driver
`$ROOT/LVX6048/powermon-patches/pi18.py` (search `FWS`, `fault`, `warning`).
Read it to translate a raw `fault_code`. MOD code labels are there too.
Quick refs: 02 over-temp, 03/04 battery V high/low, 07 overload timeout,
08 bus voltage too high, 56 battery connection open, 71 parallel version
different, 80-86 parallel-cluster faults (86 = output setting mismatch).
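
For quick triage without opening the driver, the quick refs can be held in a small lookup. This is a sketch only: `fault_label` is a hypothetical helper, it assumes the raw code arrives as a two-digit string, and it reads the parallel-cluster range as 80 to 86. `pi18.py` remains the source of truth for labels.
```bash
# Map a two-digit fault code to the quick-ref label listed in this skill.
# Not exhaustive; anything unlisted must be looked up in pi18.py.
fault_label() {
  case "$1" in
    02)     echo "over-temperature" ;;
    03)     echo "battery voltage too high" ;;
    04)     echo "battery voltage too low" ;;
    07)     echo "overload timeout" ;;
    08)     echo "bus voltage too high" ;;
    56)     echo "battery connection open" ;;
    71)     echo "parallel version different" ;;
    86)     echo "parallel output setting mismatch" ;;
    8[0-5]) echo "parallel-cluster fault" ;;   # assumed range 80-85, see lead-in
    *)      echo "unknown, check pi18.py" ;;
  esac
}
```
For example, `fault_label 08` prints `bus voltage too high`.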
### Historic faults — "when did it last happen / show me last week"
Local logs only reach back to the last reboot (`journalctl` here is volatile, ~1 day),
so for anything older query HA's recorder via `ha-history`. It needs
`~/.config/ha/token` (see REFERENCE); if the token is absent, tell the user how to
create it and don't block on it. Use the real HA entity_ids, not the MQTT object
names (see REFERENCE): the slug is doubled and differs per command.
```bash
"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
```
Each `Fault`/non-`No fault` change-point prints with a local timestamp and how long
it lasted (marked `<<< FAULT`). To pin the cause, re-query the same window with `-a`
for the surrounding conditions — `mppt1_input_voltage`, `mppt1_input_power`,
`battery_voltage`, `ac_output_active_power`:
- **Normal PV V (~300 V) + normal battery (~54 V) at the fault** → it's an internal
DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
- **Fault on ONE unit only, repeatedly** → unit-specific weakness (bus regulation /
cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
- **`mppt1_input_power` flatlines at 0 after the fault and stays there** → that unit
silently stopped producing; check whether it sat dead until a reboot (a real
occurrence on 2026-06-20: unit 1 threw fault 08 at 17:20 → 0 W PV until the Jun 22
reboot, so ~half the array was offline for ~1.8 days while unit 2 masked it).
Cross-check the *other* unit's PV over the same span to catch this.
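
The flatline case can be spotted numerically once the PV power history is exported as a time series. This is a hedged sketch: the input format (one `epoch_seconds watts` pair per line, time-ordered) is an assumption and is not the documented output of `ha-history`, and `longest_zero_run_hours` is a hypothetical helper name.
```bash
# Read "epoch_seconds watts" lines on stdin (assumed format) and print the
# longest continuous stretch of 0 W, in hours, measured from the first to the
# last zero sample of that stretch.
longest_zero_run_hours() {
  awk '
    NF < 2 { next }                       # skip blank or malformed lines
    $2 + 0 == 0 {
      if (!in_run) { start = $1; in_run = 1 }
      last = $1
      next
    }
    {
      if (in_run && last - start > best) best = last - start
      in_run = 0
    }
    END {
      if (in_run && last - start > best) best = last - start
      printf "%.1f\n", best / 3600
    }'
}
```
Zero PV overnight is normal, so only a run much longer than one night (the incident above was roughly 43 hours) suggests a unit that stayed dead after its fault.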
## 3. Parallel-cluster sync
The two units run paralleled; desync throws **fault 86**.
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
```
- Exactly one unit should report `parallel_instance_number` 0 (master) and the other
≥1. Both 0 (two masters) or both ≥1 (two slaves) → cluster confused → likely needs a
coordinated power cycle (user action: propose it, don't do it).
- Healthy parallel = both units sharing load roughly symmetrically
(`ac_output_active_power` comparable). One at ~0 W while the other carries
everything = that unit dropped out of the cluster.
- Deeper sync/settings comparison via `$ROOT/LVX6048/lvx-flash/flash.py sync-check` /
`compare` exists but **stops powermon and grabs the USB** — advanced, user-run
only. Propose the command; don't execute.
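
The two read-only checks above reduce to a pair of comparisons. This is a minimal sketch with hypothetical helper names; it assumes integer inputs, and the 50 W idle threshold is an illustrative guess, not a value from the repo.
```bash
# Classify master/slave roles from the two parallel_instance_number values.
cluster_state() {
  local a="$1" b="$2"
  if [ "$a" -eq 0 ] && [ "$b" -eq 0 ]; then
    echo "two-masters"
  elif [ "$a" -ge 1 ] && [ "$b" -ge 1 ]; then
    echo "no-master"
  else
    echo "ok"
  fi
}

# Print 1 or 2 if that unit sits at ~0 W while the other carries load, else "none".
# Inputs are ac_output_active_power in whole watts.
dropped_unit() {
  local p1="$1" p2="$2" idle=50    # idle threshold in W (assumption)
  if [ "$p1" -le "$idle" ] && [ "$p2" -gt "$idle" ]; then
    echo 1
  elif [ "$p2" -le "$idle" ] && [ "$p1" -gt "$idle" ]; then
    echo 2
  else
    echo "none"
  fi
}
```
`cluster_state 0 1` prints `ok`, and `dropped_unit 0 3200` prints `1`. Either helper only informs the report; the remedy is still a user-run power cycle.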
## 4. USB-HID link recovery (ALLOWED)
Symptom: one unit's entities stale/absent, or `/dev/lvx6048-*` missing/dangling.
The resolver maps hidraw→stable symlink by PI18 serial and restarts powermon.
```bash
sudo systemctl restart lvx-resolve-links.service # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-* # both symlinks back?
sudo systemctl restart powermon.service powermon2.service # if still wedged
```
Then re-run §1 snapshot to confirm both units publish again. Note: after an
inverter power-cycle the udev rule normally self-heals; manual restart is the
fallback when it didn't.
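
After a restart the symlinks can take a moment to reappear, so polling is gentler than restarting again straight away. This is a sketch assuming a bash shell; `wait_for_paths` is a hypothetical helper and the timeout is whatever you pass in.
```bash
# Wait until every given path exists, or give up after <timeout> seconds.
# Returns 0 when all paths are present, 1 on timeout.
wait_for_paths() {
  local timeout="$1"; shift
  local deadline=$((SECONDS + timeout)) p missing
  while :; do
    missing=0
    for p in "$@"; do
      [ -e "$p" ] || missing=1
    done
    [ "$missing" -eq 0 ] && return 0
    [ "$SECONDS" -ge "$deadline" ] && return 1
    sleep 1
  done
}
```
For example, `wait_for_paths 30 /dev/lvx6048-1 /dev/lvx6048-2 || echo "links still missing"` waits up to 30 seconds before falling back to the manual powermon restart.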
## 5. PV / MPPT looks low
```bash
"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'
```
Cross-reference array memory (`project_solar_array_config`): single MPPT per inverter,
9s2p into it. **"Low" PV is mostly expected, not a fault** (verified 2026-06-24): at a
clear-noon peak each inverter made ~16 A @ ~300 V — a down string would read ~10 A, so
both parallel strings are live. The ~66%-of-nameplate output is 45° tilt (wrong for
high summer sun), heat derate, tree shading, AND **demand-throttling** (at noon the
battery charge pins at its cap, so the array is curtailed — not weak). To judge the
array, measure on a clear noon with the bank *hungry* (uncurtailed): ~16 A @ 300 V then
is healthy. Zero PV at night/heavy shade is normal. Don't diagnose "low PV" off a
single off-peak or demand-limited sample.
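
The string-count reasoning is simple arithmetic: power is current times voltage, and the current tells you whether both parallel strings contribute. This is a sketch using the figures above; `pv_check` is a hypothetical helper and the 13 A cut-off (midway between the ~10 A one-string and ~16 A two-string readings) is an assumption.
```bash
# usage: pv_check <amps> <volts>
# Prints the PV input power and a rough verdict on string health.
pv_check() {
  awk -v a="$1" -v v="$2" 'BEGIN {
    # 13 A threshold is an assumed midpoint, not a measured limit
    verdict = (a >= 13) ? "both strings live" : "possible down string or curtailed sample"
    printf "%.0f W, %s\n", a * v, verdict
  }'
}
```
`pv_check 16 300` prints `4800 W, both strings live`. A low reading only counts as evidence when it was taken at clear noon with the bank uncurtailed.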
## 6. Report
State: which unit, mode, fault (decoded), cluster health, link state, what you
recovered (if anything), and any remaining action that needs the user (power cycle,
flash.py, combiner check) as exact commands/steps. Never publish to
`solar/control/lvx6048/*` here.