shaggy-solar/.claude/skills/troubleshoot-inverter/SKILL.md
noise aa97d65b0c Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check  : whole-system sweep + cross-checks + R/Y/G verdict,
                        incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage         : PV/load/grid/battery balance + EVSE sessions

Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00

6.3 KiB

name: troubleshoot-inverter
description: Diagnose the 2× MPP Solar LVX6048 inverters: faults/warnings (FWS), operating mode, parallel-cluster master/slave sync, PV/MPPT input, USB-HID link loss, and powermon daemon health. Use when an inverter shows a fault, is in the wrong mode, stopped publishing, the two units disagree, PV looks low, or the user says "inverter problem / fault code / no solar / one inverter is down". Read-only plus safe link/daemon recovery; never changes inverter settings.

troubleshoot-inverter

0. Load context

Shell cwd is the repo root; anchor paths there:

ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"; HIST="$ROOT/.claude/skills/lib/ha-history"

Read $ROOT/.claude/skills/REFERENCE.md. Inverter entities are lvx6048_1_* (powermon.service) and lvx6048_2_* (powermon2.service).
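A quick preflight can confirm the pieces this section relies on before going further. A minimal bash sketch: `preflight_missing` is a hypothetical helper (not part of the skill's lib). It only checks that the paths above exist, and it reports a missing HA token without failing, since the token is needed only for history lookbacks.

```shell
# Hypothetical preflight helper (not part of the skill's lib): report which
# prerequisites are missing, without aborting.
# Usage: preflight_missing ROOT [TOKEN_FILE]
# Exit status = number of missing items (0 = all present).
preflight_missing() {
  local root=$1
  local token=${2:-$HOME/.config/ha/token}
  local missing=0 f
  # The two shared lib tools must exist and be executable.
  for f in "$root/.claude/skills/lib/solar-snapshot" "$root/.claude/skills/lib/ha-history"; do
    if [ ! -x "$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # REFERENCE.md only needs to be readable.
  if [ ! -r "$root/.claude/skills/REFERENCE.md" ]; then
    echo "missing: $root/.claude/skills/REFERENCE.md"
    missing=$((missing + 1))
  fi
  # The HA token only matters for ha-history; report it, don't block on it.
  if [ ! -s "$token" ]; then
    echo "missing: $token (ha-history only)"
    missing=$((missing + 1))
  fi
  return "$missing"
}
```

Run it as `preflight_missing "$ROOT" || echo "prerequisites missing, see above"`; it lists what is absent and leaves the decision to continue with you.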

1. Is it a data problem or a device problem?

systemctl is-active powermon.service powermon2.service lvx-resolve-links.service
ls -l /dev/lvx6048-1 /dev/lvx6048-2          # symlinks present? point at hidraw?
"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|ac_output_active_power)/' 'homeassistant/sensor/+/state'
  • Service active + data flowing → device/config issue, go to §2.
  • Service active but a unit's entities are silent, or a /dev/lvx6048-* symlink is missing/dangling → USB-HID link issue, go to §4.
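The symlink half of that triage can be scripted. A minimal bash sketch: `classify_link` is a hypothetical helper. It assumes GNU `readlink -f` (standard on a Linux host) and that a healthy link resolves to a device node named `hidraw*`, as the `ls -l` comment in §1 implies.

```shell
# Hypothetical helper: classify one inverter symlink for the §1 triage.
# Prints: ok | dangling | missing | not-hidraw
classify_link() {
  local link=$1 target
  if [ ! -L "$link" ]; then
    echo missing                      # no symlink at this path
  elif [ ! -e "$link" ]; then
    echo dangling                     # symlink exists but its target is gone
  else
    target=$(readlink -f "$link")     # resolve to the real device node
    case "${target##*/}" in
      hidraw*) echo ok ;;
      *)       echo not-hidraw ;;
    esac
  fi
}

for n in 1 2; do
  echo "/dev/lvx6048-$n: $(classify_link "/dev/lvx6048-$n")"
done
```

Any result other than `ok` points at the USB-HID link path (§4); `ok` on both with silent entities still points there too, per the second bullet.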

2. Faults & mode

"$SNAP" -w 12 -g 'lvx6048_[12]_(device_mode|fault_code|inverter_heat_sink_temperature)/' 'homeassistant/sensor/+/state'
journalctl -u powermon.service -u powermon2.service --since "20 min ago" --no-pager | grep -iE 'fault|warn|FWS|mode|error' | tail -20
  • device_mode values: Power-On / Standby / Bypass / Battery / Line / Charge / Fault. Bypass/Line = passing grid power through (normal when the grid is present and PV is low). Fault = stop and decode the fault code.
  • The FWS fault/warning bit → label mapping lives in the patched driver $ROOT/LVX6048/powermon-patches/pi18.py (search for FWS, fault, warning); read it to translate a raw fault_code. MOD code labels are there too. Quick refs: 02 over-temperature, 03/04 battery voltage high/low, 07 overload timeout, 08 bus voltage too high, 56 battery connection open, 71 parallel version different, 80-86 parallel-cluster faults (86 = output setting mismatch).
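For a first pass before opening pi18.py, the quick refs can be wrapped in a lookup. A minimal bash sketch: `fault_label` is hypothetical, knows only the quick-ref codes (treating 80 to 85 as generic parallel-cluster faults), and assumes fault_code arrives as a plain decimal number. pi18.py stays the authoritative table.

```shell
# Hypothetical quick-ref lookup; anything unlisted defers to pi18.py.
fault_label() {
  local code=$((10#$1))     # force base 10 so "08" is not read as octal
  case $code in
    2)      echo "over-temperature" ;;
    3)      echo "battery voltage high" ;;
    4)      echo "battery voltage low" ;;
    7)      echo "overload timeout" ;;
    8)      echo "bus voltage too high" ;;
    56)     echo "battery connection open" ;;
    71)     echo "parallel version different" ;;
    86)     echo "parallel: output setting mismatch" ;;
    8[0-5]) echo "parallel-cluster fault (decode in pi18.py)" ;;
    *)      echo "unlisted code $code (see pi18.py)" ;;
  esac
}
```

Usage: `fault_label 08` prints `bus voltage too high`.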

Historic faults — "when did it last happen / show me last week"

Local logs only reach back to the last reboot (the journal here is volatile, roughly 1 day), so for anything older query HA's recorder via ha-history. It needs ~/.config/ha/token (see REFERENCE); if the token is absent, tell the user how to create it rather than blocking. Use the REAL HA entity_ids, NOT the MQTT object names (see REFERENCE): the slug is doubled and differs per command:

"$HIST" -s "10 days ago" sensor.lvx6048_lvx6048_1_device_mode sensor.lvx6048_lvx6048_2_device_mode
"$HIST" -s "10 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code

Each change-point where device_mode is Fault, or fault_code is anything other than "No fault", prints with a local timestamp and how long it lasted, marked <<< FAULT. To pin down the cause, re-query the same window with -a for the surrounding conditions (mppt1_input_voltage, mppt1_input_power, battery_voltage, ac_output_active_power):

  • Normal PV V (~300 V) + normal battery (~54 V) at the fault → it's an internal DC-bus transient, NOT input/battery over-voltage (rules out the cold-Voc theory).
  • Fault on ONE unit only, repeatedly → unit-specific weakness (bus regulation / cap / sensor / slave-CPU FW), not environmental (which hits both paralleled units).
  • mppt1_input_power flatlines at 0 after the fault and stays there → that unit silently stopped producing; check whether it sat dead until a reboot. A real occurrence on 2026-06-20: unit 1 threw fault 08 at 17:20 and produced 0 W PV until the Jun 22 reboot, leaving ~half the array offline for ~1.8 days while unit 2 masked it. Cross-check the other unit's PV over the same span to catch this.
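The cross-check in the last bullet can be automated over a lookback window. A minimal sketch in bash + awk, assuming both units' PV history has already been merged into one time-aligned series (that merge step, and ha-history's actual output format, are not shown or assumed here). `dead_unit_spans`, the 5 W "dead" floor and the default 200 W / 30-minute thresholds are all hypothetical.

```shell
# Hypothetical sketch: flag spans where one unit's PV is ~0 W while the other
# is clearly producing, i.e. a silently dead unit masked by its partner.
# stdin, one sample per line: <epoch_seconds> <unit1_W> <unit2_W>
# Args: minimum span in seconds (default 1800), "live" threshold in W (default 200).
dead_unit_spans() {
  awk -v min="${1:-1800}" -v live="${2:-200}" -v dead=5 '
    function flush(end) {
      if (who != "" && end - start >= min)
        printf "unit %s dead %d-%d (%ds)\n", who, start, end, end - start
      who = ""
    }
    {
      t = $1; p1 = $2; p2 = $3
      cur = ""
      if (p1 <= dead && p2 >= live) cur = "1"
      else if (p2 <= dead && p1 >= live) cur = "2"
      if (cur != who) {             # state change: close old span, open new one
        flush(t)
        if (cur != "") { who = cur; start = t }
      }
      last = t
    }
    END { flush(last) }             # close a span still open at end of input
  '
}
```

Night and heavy shade are not flagged, because both units read low together; only the asymmetric "one at zero, other producing" pattern counts.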

3. Parallel-cluster sync

The two units run paralleled; desync throws fault 86.

"$SNAP" -w 12 -g 'lvx6048_[12]_(parallel_instance_number|ac_output_active_power|ac_output_voltage)/' 'homeassistant/sensor/+/state'
  • Exactly one unit = parallel_instance_number 0 (master); other ≥1. Two masters / two slaves / both 0 → cluster confused → likely needs a coordinated power cycle (user action — propose it, don't do it).
  • Healthy parallel = both units sharing load roughly symmetrically (ac_output_active_power comparable). One at ~0 W while the other carries everything = that unit dropped out of the cluster.
  • A deeper sync/settings comparison exists ($ROOT/LVX6048/lvx-flash/flash.py sync-check / compare), but it stops powermon and takes over the USB link: advanced, user-run only. Propose the command; don't execute it.
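The first two rules can be applied mechanically to one snapshot's values. A minimal bash sketch: `cluster_check` is hypothetical, the 50 W "~0" and 500 W "carrying the load" thresholds are illustrative guesses, and it only reports. Any power cycle stays a proposal to the user.

```shell
# Hypothetical helper for one parallel-cluster snapshot.
# Args: <unit1 parallel_instance_number> <unit2 parallel_instance_number>
#       <unit1 ac_output_active_power W> <unit2 ac_output_active_power W>
cluster_check() {
  local n1=$1 n2=$2 w1=${3%.*} w2=${4%.*}   # drop decimals: test(1) compares integers only
  local masters=0
  if [ "$n1" -eq 0 ]; then masters=$((masters + 1)); fi
  if [ "$n2" -eq 0 ]; then masters=$((masters + 1)); fi
  if [ "$masters" -ne 1 ]; then
    # two masters or two slaves: cluster confused
    echo "confused: $masters master(s); propose coordinated power cycle"
  elif [ "$w1" -lt 50 ] && [ "$w2" -ge 500 ]; then
    echo "unit 1 dropped out of the cluster"
  elif [ "$w2" -lt 50 ] && [ "$w1" -ge 500 ]; then
    echo "unit 2 dropped out of the cluster"
  else
    echo "ok"
  fi
}
```

Usage: `cluster_check 0 1 1200 1100` prints `ok`; `cluster_check 0 0 1200 1100` reports two masters.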

4. USB-HID link recovery

Symptom: one unit's entities are stale/absent, or /dev/lvx6048-* is missing/dangling. The resolver maps each hidraw device to a stable symlink by PI18 serial, then restarts both powermon services.

sudo systemctl restart lvx-resolve-links.service     # remaps + restarts powermon{,2}
# or run the resolver directly:
sudo /usr/local/sbin/lvx-resolve-links
ls -l /dev/lvx6048-*                                  # both symlinks back?
sudo systemctl restart powermon.service powermon2.service   # if still wedged

Then re-run the §1 snapshot to confirm both units publish again. Note: after an inverter power-cycle the udev rule normally self-heals the links; the manual restart is the fallback when it doesn't.
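Instead of re-running `ls` by hand, a short poll can wait for the resolver to finish. A minimal bash sketch: `wait_for_links` is hypothetical and uses bash's built-in `SECONDS` counter for the timeout.

```shell
# Hypothetical helper: poll until every given symlink resolves.
# Returns 0 when all resolve, 1 if the timeout (seconds) expires first.
wait_for_links() {
  local timeout=$1; shift
  local deadline=$((SECONDS + timeout)) l ok
  while :; do
    ok=1
    for l in "$@"; do
      # -e follows the symlink, so a dangling link counts as not ready
      [ -e "$l" ] || ok=0
    done
    if [ "$ok" -eq 1 ]; then return 0; fi
    if [ "$SECONDS" -ge "$deadline" ]; then return 1; fi
    sleep 1
  done
}
```

For example, `wait_for_links 30 /dev/lvx6048-1 /dev/lvx6048-2 || journalctl -u lvx-resolve-links.service --no-pager | tail -20` shows the resolver's log if a link does not come back.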

5. PV / MPPT looks low

"$SNAP" -w 12 -g 'lvx6048_[12]_(mppt1_input_power|mppt2_input_power|device_mode)/' 'homeassistant/sensor/+/state'

Cross-reference array memory (project_solar_array_config): 9s2p per inverter, and there's a suspected dead string per inverter — one MPPT reading ~half the other, or roughly half nameplate in full sun, is consistent with that and worth confirming at the combiner, not a software bug. Zero PV at night/heavy shade is normal.
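The "one MPPT reading ~half the other" test can be applied to a single snapshot. A minimal bash sketch: `mppt_check` is hypothetical, and its 100 W low-light floor and 65% cut-off are illustrative assumptions, not calibrated values. It does not cover the "half nameplate in full sun" case, which needs the array rating.

```shell
# Hypothetical helper for one inverter's two MPPT readings.
# Args: <mppt1_input_power W> <mppt2_input_power W>
mppt_check() {
  local a=${1%.*} b=${2%.*}          # drop decimals for integer comparison
  local hi=$a lo=$b
  if [ "$b" -gt "$a" ]; then hi=$b; lo=$a; fi
  if [ "$hi" -lt 100 ]; then
    echo "low-light"                 # night/heavy shade: nothing to compare
  elif [ $((lo * 100)) -lt $((hi * 65)) ]; then
    echo "imbalanced: $lo W vs $hi W (possible dead string)"
  else
    echo "balanced"
  fi
}
```

An `imbalanced` result is a prompt to confirm at the combiner, not proof of a fault.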

6. Report

State: which unit, mode, fault (decoded), cluster health, link state, what you recovered (if anything), and any remaining action that needs the user (power cycle, flash.py, combiner check) as exact commands/steps. Never publish to solar/control/lvx6048/* here.