Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:
- solar-health-check : whole-system sweep + cross-checks + R/Y/G verdict,
incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage : PV/load/grid/battery balance + EVSE sessions
Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).
Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
104 lines
6.0 KiB
Markdown
---
name: solar-health-check
description: >-
  Top-level health snapshot of the whole solar/power install (2 LVX6048
  inverters, 6 EG4 LifePower4 packs, and the OpenEVSE charger) with cross-checks
  and a green/yellow/red verdict. Use when the user asks "how's the solar /
  battery / power system doing", "is everything ok", "check the install", wants a
  status report, or as the first step before deeper troubleshooting. For deep
  dives into one subsystem, hand off to troubleshoot-inverter, troubleshoot-battery,
  or power-usage.
---

# solar-health-check

A fast, read-only sweep of every subsystem that ends in a clear verdict. Do NOT
change settings here; restarting a wedged daemon is the one permitted action (see §6).

## 0. Load context
Skills run with the shell cwd at the repo root, so anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"
SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
HIST="$ROOT/.claude/skills/lib/ha-history"
```
Read `$ROOT/.claude/skills/REFERENCE.md` (system map, entity names, snapshot helper,
action policy) before proceeding.

## 1. Services up?
```bash
systemctl is-active powermon.service powermon2.service eg4-battery.service lvx-control.service lvx-resolve-links.service
```
The command prints one state per line, in argument order. `lvx-resolve-links` is a
oneshot, so expect `active` (shown as `active (exited)` in `systemctl status`), not
`failed`. Any `failed`/`inactive` on the others is RED. For a wedged data-plane daemon,
a restart is allowed (see §6).

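Because `systemctl is-active` prints bare states without unit names, it can help to pair each unit with its state and a colour. The following is a minimal sketch, not part of the skill: the function names are hypothetical, and treating any state other than `active`, `failed` or `inactive` as YELLOW is an assumption.

```bash
# Map a `systemctl is-active` state to a verdict colour.
# active -> GREEN; failed/inactive -> RED (as stated in this section);
# anything else (e.g. activating) -> YELLOW, which is an assumption.
classify_state() {
  case "$1" in
    active)          echo GREEN ;;
    failed|inactive) echo RED ;;
    *)               echo YELLOW ;;
  esac
}

# Print "unit state colour" for each unit given as an argument.
# Needs systemd on the host; an unreadable state is reported as "unknown".
check_units() {
  local unit state
  for unit in "$@"; do
    state="$(systemctl is-active "$unit" 2>/dev/null)"
    state="${state:-unknown}"
    printf '%-28s %-10s %s\n' "$unit" "$state" "$(classify_state "$state")"
  done
}
```

Usage would look like `check_units powermon.service powermon2.service eg4-battery.service lvx-control.service lvx-resolve-links.service`.
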
## 2. Capture live telemetry
```bash
"$SNAP" -w 10 -g 'lvx6048_[12]_(device_mode|fault_code|battery_voltage|battery_capacity|ac_output_active_power|mppt1_input_power|mppt2_input_power|grid_voltage|inverter_heat_sink_temperature|parallel_instance_number)/' 'homeassistant/sensor/+/state'
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current|cell_voltage_delta_mv|temperature_pcb)/' 'homeassistant/sensor/+/state'
"$SNAP" -w 6 'openevse/status' 'openevse/amp' 'openevse/power' 'openevse/session_energy'
```
If a family returns "(no messages)", the feeding daemon is silent, so that subsystem
is RED regardless of `is-active` (running but not publishing). An idle or unplugged EVSE
publishing nothing is normal; confirm via `openevse/status`.

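The silent-daemon rule above can be applied mechanically to each capture. This is a rough sketch under two assumptions: that the snapshot helper writes the literal text `(no messages)` to stdout, and that an empty capture should be treated the same way. `family_status` and its labels are hypothetical names.

```bash
# Classify one snapshot capture read from stdin.
# $1 is a family label; "evse" is special-cased because an idle or unplugged
# charger publishing nothing is normal and needs a manual look instead.
family_status() {
  local family="$1" out
  out="$(cat)"
  if [ -z "$out" ] || printf '%s\n' "$out" | grep -qF '(no messages)'; then
    if [ "$family" = "evse" ]; then
      echo CHECK   # confirm via openevse/status before calling it RED
    else
      echo RED     # daemon may be running but is not publishing
    fi
  else
    echo OK
  fi
}
```

A capture can then be piped straight in, for example `"$SNAP" -w 16 -g '...' 'homeassistant/sensor/+/state' | family_status battery`.
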
## 3. Cross-checks (this is the value-add: single sensors can each look fine)
- **Battery voltage agreement**: each inverter's `battery_voltage` should be within
  ~1 V of the pack stack voltage (`pack_voltage` ≈ 51–55 V). **Known anomaly:** the
  inverter reading is *intermittently* wrong (correct ~54 V on 2026-06-20, ~9–10 V
  after the 2026-06-22 reboot); this is a post-reboot glitch, not a permanent bug. If it
  reads ~10 V, note it and suggest a powermon restart; use the `lifepower4_*` pack
  entities, never the inverter reading, for any battery math (see REFERENCE known-issues).
- **Cross-unit PV production (catches a silently-dead inverter)**: compare
  `lvx6048_1_mppt1_input_power` vs `lvx6048_2_mppt1_input_power`. In daylight (the
  *other* unit clearly producing), one unit pinned at **0 W** = that inverter is down
  and being masked by its sibling: RED → troubleshoot-inverter. This is exactly the
  2026-06-20 fault-08 failure mode (unit 1 sat at 0 W for ~1.8 days). At night or in
  heavy shade, both at 0 W is normal.
- **SoC spread across packs**: `max(soc) - min(soc)` over the 6 packs. BUT first
  cross-check against `pack_voltage`/`cell_voltage_max`: the packs are paralleled, so
  if all `pack_voltage` agree (±0.1 V) the packs are physically at the same charge and
  any SoC spread is **counter drift**, not real imbalance (on 2026-06-24 pack 6 reported
  76 % while reading the same 53.4 V / 3.337 V/cell as packs at 50–55 %). Real
  imbalance = pack voltages actually diverge. Drift → note it, recommend a calibration
  charge; >20 % spread with diverging voltages = RED → troubleshoot-battery.
- **Cell imbalance**: any pack with `cell_voltage_delta_mv` > 50 mV = YELLOW, > 100 mV = RED.
- **Parallel master/slave**: exactly one inverter should report
  `parallel_instance_number` 0 (master); the other 1+. Two masters or two slaves = RED.
- **Faults**: any `fault_code` non-zero, or `device_mode` = Fault = RED → troubleshoot-inverter.
- **Temps**: pack `temperature_pcb` > 55 °C or inverter heat-sink > 75 °C = YELLOW.
- **Power balance sanity**: PV in (`mppt*_input_power`), AC out
  (`ac_output_active_power`), and battery power (`pack_voltage` × `pack_current`)
  should roughly balance. Gross mismatch = investigate via power-usage.

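Three of the cross-checks above reduce to small threshold comparisons, sketched here with `awk` for the decimal arithmetic. This is an illustration, not the skill's own code: the function names are hypothetical, "agree (±0.1 V)" is read as a max-min span of at most 0.2 V, diverging voltages with a spread of 20 % or less are labelled YELLOW, and the 200 W "clearly producing" floor is an assumed value.

```bash
# SoC spread vs. voltage agreement.
# $1: space-separated SoC values (%), $2: space-separated pack voltages (V).
soc_spread_verdict() {
  awk -v socs="$1" -v volts="$2" 'BEGIN {
    n = split(socs, s, " "); m = split(volts, v, " ")
    smin = smax = s[1] + 0; vmin = vmax = v[1] + 0
    for (i = 2; i <= n; i++) { x = s[i] + 0; if (x < smin) smin = x; if (x > smax) smax = x }
    for (i = 2; i <= m; i++) { x = v[i] + 0; if (x < vmin) vmin = x; if (x > vmax) vmax = x }
    spread = smax - smin; vspan = vmax - vmin
    # Assumption: voltages "agree" when max-min <= 0.2 V (i.e. +/-0.1 V).
    if (vspan <= 0.2 + 1e-9) verdict = (spread > 0) ? "DRIFT" : "OK"
    else                     verdict = (spread > 20) ? "RED" : "YELLOW"
    printf "%s spread=%.0f%% vspan=%.2fV\n", verdict, spread, vspan
  }'
}

# Cell imbalance: > 100 mV RED, > 50 mV YELLOW, otherwise GREEN.
cell_delta_verdict() {
  awk -v mv="$1" 'BEGIN {
    if (mv + 0 > 100)     print "RED"
    else if (mv + 0 > 50) print "YELLOW"
    else                  print "GREEN"
  }'
}

# Silently-dead inverter: one unit at 0 W while the other clearly produces.
# $1, $2: mppt1_input_power of unit 1 and unit 2 (W).
# $3: "clearly producing" floor in W (assumption, default 200).
silent_inverter() {
  awk -v a="$1" -v b="$2" -v min_w="${3:-200}" 'BEGIN {
    if (a + 0 == 0 && b + 0 >= min_w)      print "RED unit1"
    else if (b + 0 == 0 && a + 0 >= min_w) print "RED unit2"
    else                                   print "OK"
  }'
}
```

With values shaped like the 2026-06-24 observation, `soc_spread_verdict "50 52 55 76 51 53" "53.4 53.4 53.4 53.4 53.4 53.4"` reports drift instead of imbalance, because the voltages agree even though the SoC counters do not.
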
## 4. Verdict
Print a compact table (subsystem → state → one-line reason), then an overall
GREEN / YELLOW / RED with the top 1–3 issues and which deeper skill to run.

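One way to produce that table and the overall colour is sketched below. It assumes the overall verdict is simply the worst per-subsystem state, which this section implies but does not state, and it uses a hypothetical `subsystem|state|reason` row format.

```bash
# Read "subsystem|state|reason" rows on stdin, print them as a table,
# then print the worst state seen as the overall verdict.
print_verdict() {
  awk -F'|' '
    BEGIN { rank["GREEN"] = 0; rank["YELLOW"] = 1; rank["RED"] = 2; worst = "GREEN" }
    {
      printf "%-10s %-7s %s\n", $1, $2, $3
      if (rank[$2] > rank[worst]) worst = $2
    }
    END { print "OVERALL: " worst }
  '
}
```

For example, `printf 'inverters|GREEN|both producing\nbattery|YELLOW|SoC counter drift\n' | print_verdict` ends with an overall YELLOW line.
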
## 5. Recent error scan (only if anything looked off)
```bash
for s in powermon powermon2 eg4-battery lvx-control; do
  echo "== $s =="
  journalctl -u "$s.service" --since "15 min ago" --no-pager | grep -iE 'error|timeout|fail|crc|nak|reconnect' | tail -5
done
```

## 5b. Historical sanity: did anything fail while unattended? (needs HA token)
Live snapshots miss faults that already cleared and silent-unit spells that ended.
If `~/.config/ha/token` exists (see REFERENCE), scan the recorder for the last few
days. Use the REAL HA entity_ids (doubled slug, see REFERENCE), not MQTT names:
```bash
"$HIST" -s "5 days ago" -m fault sensor.lvx6048_01_lvx6048_1_fault_code sensor.lvx6048_02_lvx6048_2_fault_code
# Silent-unit hunt: sample midday PV on both units across recent days; one pinned at 0
# while the other produced means it was down. E.g. check a midday window per day:
"$HIST" -s "2 days ago" sensor.lvx6048_lvx6048_1_mppt1_input_power sensor.lvx6048_lvx6048_2_mppt1_input_power | head -40
```
Any fault-08 / silent-unit episode → report with timestamps and hand off to
troubleshoot-inverter §2–§5. No token → say so and point the user at REFERENCE to add one.

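The token precondition can be checked before any recorder query. A small sketch follows; `ha_token_ready` is a hypothetical helper, and requiring the file to be non-empty as well as readable is an added assumption.

```bash
# Succeed only if the HA token file exists, is readable and is non-empty.
# $1: token path (defaults to ~/.config/ha/token, the location named above).
ha_token_ready() {
  local token_file="${1:-$HOME/.config/ha/token}"
  [ -r "$token_file" ] && [ -s "$token_file" ]
}
```

A caller might wrap the history scan as `if ha_token_ready; then "$HIST" -s "5 days ago" ...; else echo "No HA token; see REFERENCE.md to add one"; fi`.
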
## 6. Allowed remediation
If a daemon is `failed` or running-but-silent, restarting it is permitted:
```bash
sudo systemctl restart eg4-battery.service   # or powermon / powermon2 / lvx-control
```
Re-run the relevant snapshot to confirm data resumes. Anything beyond a restart
(settings, flash, cabling) → report and hand the user the exact command. Never
publish to `solar/control/lvx6048/*` from this skill.

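To keep restarts inside that boundary, the restart can be routed through an allowlist. This is a sketch only: `is_restartable` and `safe_restart` are hypothetical names, and limiting the list to the four daemons named in the command comment above is an assumption drawn from that comment.

```bash
# Return 0 only for the four daemons this skill may restart.
# Accepts the unit name with or without the ".service" suffix.
is_restartable() {
  case "${1%.service}" in
    powermon|powermon2|eg4-battery|lvx-control) return 0 ;;
    *) return 1 ;;
  esac
}

# Restart a unit if it is on the allowlist; otherwise refuse without touching it.
safe_restart() {
  local unit="${1%.service}.service"
  if ! is_restartable "$unit"; then
    echo "refusing to restart $unit: not on the allowlist" >&2
    return 1
  fi
  sudo systemctl restart "$unit"
}
```

After `safe_restart eg4-battery`, re-running the matching snapshot from §2 confirms whether data has resumed.
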