shaggy-solar/.claude/skills/troubleshoot-battery/SKILL.md
noise aa97d65b0c Add solar monitoring/troubleshooting skills for agents
Four Skill-tool skills under .claude/skills/ that let an agent monitor and
troubleshoot the install (2x LVX6048, 6x EG4 LifePower4, OpenEVSE), grounded
in the real MQTT/HA topology rather than generic advice:

- solar-health-check  : whole-system sweep + cross-checks + R/Y/G verdict,
                        incl. cross-unit "silently-dead inverter" detection
- troubleshoot-inverter: FWS fault decode, parallel sync, USB link recovery
- troubleshoot-battery : per-pack imbalance vs SoC-counter-drift, RS485 silence
- power-usage         : PV/load/grid/battery balance + EVSE sessions

Shared lib:
- solar-snapshot : live MQTT capture (creds from powermon.yaml, no hardcoding)
- ha-history     : HA recorder lookback (token from ~/.config/ha/token)
REFERENCE.md documents topology, real HA entity_ids (doubled slug), known
issues, and a safe-remediation-only action policy (restarts yes; setters no).

Action boundary: diagnose + restart wedged daemons / recover USB links;
never touches inverter/battery setters or flash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:46:20 -04:00


name: troubleshoot-battery
description: Diagnose the 6× EG4 LifePower4 v2 battery packs: per-pack SoC, cell imbalance, SoC drift, RS485/Modbus comms silence, temperature, and warning/protection bits. Use when a pack reads oddly, packs disagree on SoC, a pack stopped reporting, cells look imbalanced, the user mentions "battery problem / pack down / SoC wrong / imbalance / one battery", or after solar-health-check flags the stack. Read-only plus a safe eg4-battery daemon restart; never writes BMS settings.

troubleshoot-battery

0. Load context

Shell cwd is the repo root; anchor paths there:

ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"

Read $ROOT/.claude/skills/REFERENCE.md. There are 6 packs (lifepower4_1..6_*), all served by eg4-battery.service, with one FTDI RS485 adapter per pack. Pack config: ~/.config/eg4-battery/eg4-battery.yaml.

1. Are all 6 packs reporting?

systemctl is-active eg4-battery.service
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current)/' 'homeassistant/sensor/+/state'
  • Fewer than 6 packs in the output → a pack is silent on RS485, go to §4.
  • All 6 present → go to §2/§3.
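The "fewer than 6 packs" check can be made mechanical. The sketch below uses a hypothetical helper, `missing_packs`, which is not part of the shared lib. It assumes only that every reporting pack appears somewhere in the captured text as `lifepower4_<N>_`; the exact output layout of solar-snapshot is not assumed.

```shell
# Hypothetical helper: print the pack numbers (1..6) that never appear on stdin.
# Assumption: a reporting pack shows up as "lifepower4_<N>_" somewhere in the text.
missing_packs() (
  seen="$(grep -oE 'lifepower4_[1-6]_' | sort -u)"
  missing=""
  for n in 1 2 3 4 5 6; do
    case "$seen" in
      *"lifepower4_${n}_"*) ;;            # pack n reported at least once
      *) missing="$missing $n" ;;         # pack n is silent
    esac
  done
  echo "${missing# }"                     # empty output means all 6 reported
)
```

Piping the §1 snapshot command into `missing_packs` should print nothing when all packs report, or the silent pack numbers to carry into §4.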

2. SoC spread & drift

"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|soc_alt|pack_voltage)/' 'homeassistant/sensor/+/state'
  • Compute max(soc) - min(soc). >10 % = imbalance worth noting; >20 % = significant. Pack 6 historically runs high (it's the oddball: Modbus addr 0x01/115200 vs 0x40/9600 for packs 1–5), so judge it on its own and don't assume it tracks 1–5.
  • SoC drift is a known design limitation, not a live fault: the coulomb counter never re-anchors because the bank rarely reaches 100 % to reset. See memory project_eg4_soc_drift_remediation. If SoC looks wrong but pack_voltage is sane, suspect drift, not a dead pack. Voltage→SoC sanity for LFP at rest: ~51.2 V ≈ low, ~53.5 V ≈ mid, ~54+ V ≈ high (loaded/charging skews this).
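As a worked version of the spread rule above, here is a minimal sketch. `soc_spread` is a hypothetical helper (not in the shared lib) that takes SoC percentages as arguments, so it makes no assumption about the snapshot's output format.

```shell
# Hypothetical helper: print "<spread> <verdict>" for the SoC values given as arguments.
# Thresholds follow the rule above: >10 % = imbalance, >20 % = significant.
soc_spread() (
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    { v = $1 + 0; if (v < min) min = v; if (v > max) max = v }
    END {
      spread = max - min
      verdict = "ok"
      if (spread > 10) verdict = "imbalance"
      if (spread > 20) verdict = "significant"
      printf "%.1f %s\n", spread, verdict
    }'
)
```

Since pack 6 is judged on its own, one option is to pass only the values for packs 1–5 and compare pack 6 separately.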

3. Cell imbalance, temperature, protection bits

"$SNAP" -w 16 -g 'lifepower4_[1-6]_(cell_voltage_delta_mv|cell_voltage_min|cell_voltage_max|temperature_pcb)/' 'homeassistant/sensor/+/state'
# For a specific suspect pack N (pack 3 in this example), pull all 16 cells + bits:
"$SNAP" -w 14 -g 'lifepower4_3_(cell_[0-9]+_voltage|warning|protection|temperature)' 'homeassistant/sensor/+/state'
  • cell_voltage_delta_mv: <30 mV good, 30–50 mV watch, >50 mV imbalanced, >100 mV bad (a weak/failing cell, or the pack simply needs a long absorb to balance).
  • Any warning/protection bit set → read its name; over-/under-voltage, over-temp, and over-current protections will also explain a pack dropping current.
  • Temps: temperature_pcb > 55 °C = watch.
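The delta bands above map directly onto a small classifier. This is a sketch only: `classify_delta_mv` is a hypothetical helper, and it assumes the delta arrives as a whole number of millivolts.

```shell
# Hypothetical helper: map a cell_voltage_delta_mv reading to the bands above.
# Assumption: the argument is an integer number of millivolts.
classify_delta_mv() (
  d="$1"
  if   [ "$d" -gt 100 ]; then echo bad          # >100 mV
  elif [ "$d" -gt 50 ];  then echo imbalanced   # >50 mV
  elif [ "$d" -ge 30 ];  then echo watch        # 30-50 mV
  else                        echo good         # <30 mV
  fi
)
```

For example, `classify_delta_mv 42` prints `watch`.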

4. RS485 / Modbus comms silence (a pack missing from §1)

journalctl -u eg4-battery.service --since "15 min ago" --no-pager | grep -iE 'timeout|crc|nak|error|no response|pack|addr' | tail -30
ls -l /dev/serial/by-id/ | grep -i ft232      # all 6 FTDI adapters enumerated?
grep -E 'name:|port:|address|baud' ~/.config/eg4-battery/eg4-battery.yaml
  • Missing FTDI under by-id → USB/adapter/cable issue for that pack (hardware → report to user; don't unplug things yourself).
  • FTDI present but pack times out → check it isn't demoted by an inter-pack daisy-chain (memory project_eg4_daisy_chain_silences_slaves: each pack must be on its own dongle; a chain silences slaves). Also confirm the pack's address/baud in config match the unit (pack 6 legitimately differs).
  • Daemon wedged after a USB re-enumerate → restart is ALLOWED:
    sudo systemctl restart eg4-battery.service
    
    Then re-run §1 to confirm all 6 return.
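To turn the "all 6 FTDI adapters enumerated?" question into a number, a minimal sketch follows. `count_ftdi` is a hypothetical helper; like the `ls | grep` above, it assumes the adapters' by-id names contain `ft232` (case-insensitive). The directory argument exists only so the helper can be pointed elsewhere for testing.

```shell
# Hypothetical helper: count by-id entries whose name contains "ft232".
# Assumption: each pack's adapter appears as one such entry.
count_ftdi() (
  dir="${1:-/dev/serial/by-id}"
  ls "$dir" 2>/dev/null | grep -ci 'ft232'
)
```

A result below 6 points at the hardware branch above (report to the user); a result of 6 with a pack still silent points at the daisy-chain or address/baud checks.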

5. Report

Per-pack table (SoC, voltage, current, delta_mv, max temp, any bits) + the stack spread, whether it's drift vs a real fault, comms state, what you restarted, and any hardware action left for the user. Do not write BMS registers or thresholds.