---
name: troubleshoot-battery
description: >-
Diagnose the 6× EG4 LifePower4 v2 battery packs — per-pack SoC, cell imbalance,
SoC drift, RS485/Modbus comms silence, temperature, and warning/protection bits.
Use when a pack reads oddly, packs disagree on SoC, a pack stopped reporting,
cells look imbalanced, the user mentions "battery problem / pack down / SoC wrong
/ imbalance / one battery", or after solar-health-check flags the stack. Read-only
plus a safe eg4-battery daemon restart; never writes BMS settings.
---
# troubleshoot-battery
## 0. Load context
Shell cwd is the repo root; anchor paths there:
```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
```
Read `$ROOT/.claude/skills/REFERENCE.md`. There are **6 packs** `lifepower4_1..6_*`,
all served by `eg4-battery.service` (one FTDI RS485 adapter per pack). Pack config:
`~/.config/eg4-battery/eg4-battery.yaml`.
## 1. Are all 6 packs reporting?
```bash
systemctl is-active eg4-battery.service
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current)/' 'homeassistant/sensor/+/state'
```
- Fewer than 6 packs in the output → a pack is **silent on RS485**, go to §4.
- All 6 present → go to §2/§3.
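Optional sketch for checking the pack count mechanically instead of by eye. It assumes each line of the snapshot output carries exactly one sensor name such as `lifepower4_3_soc` (the snapshot's output format isn't specified here); `missing_packs` is a hypothetical helper, not part of the repo:

```bash
# Hypothetical helper: read snapshot output on stdin and print the pack
# numbers (1-6) that never appear. Assumes one lifepower4_N_* name per line.
missing_packs() {
  awk '
    # "lifepower4_" is 11 characters, so the pack digit sits at RSTART + 11.
    match($0, /lifepower4_[1-6]_/) { seen[substr($0, RSTART + 11, 1)] = 1 }
    END {
      out = ""
      for (n = 1; n <= 6; n++)
        if (!(n in seen)) out = out (out == "" ? "" : " ") n
      print out
    }'
}
```

Pipe the §1 snapshot command into `missing_packs`: an empty line means all 6 reported; otherwise it prints the silent pack numbers to chase in §4.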
## 2. SoC spread & drift
```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|soc_alt|pack_voltage)/' 'homeassistant/sensor/+/state'
```
- Compute `max(soc) - min(soc)`. >10 % = imbalance worth noting; >20 % = significant.
**Pack 6 historically runs high** (it's the oddball: Modbus addr `0x01`/115200 vs
`0x40`/9600 for packs 1–5); judge it on its own, don't assume it tracks 1–5.
- **SoC drift is a known design limitation**, not a live fault: the coulomb counter
never re-anchors because the bank rarely reaches 100 % to reset. See memory
`project_eg4_soc_drift_remediation`. If SoC looks wrong but `pack_voltage` is
sane, suspect drift, not a dead pack. Voltage→SoC sanity for LFP at rest:
~51.2 V ≈ low, ~53.5 V ≈ mid, ~54+ V ≈ high (loaded/charging skews this).
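The spread thresholds above can be sketched as a small helper. This is only an illustration: `soc_spread` is a hypothetical name, and it expects the SoC values to be passed in by hand (one percentage per pack) rather than parsed from the snapshot:

```bash
# Hypothetical helper: classify max(soc) - min(soc) using the thresholds above.
# Usage: soc_spread 81 79 80 83 78 92
soc_spread() {
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1; max = $1 }
    { if ($1 < min) min = $1; if ($1 > max) max = $1 }
    END {
      spread = max - min
      if (spread > 20)      label = "significant"
      else if (spread > 10) label = "imbalance"
      else                  label = "ok"
      printf "spread=%.1f %s\n", spread, label
    }'
}
```

Because pack 6 historically runs high, it can help to run it twice: once with all six values and once with packs 1–5 only, so pack 6 doesn't mask (or fake) a spread among the others.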
## 3. Cell imbalance, temperature, protection bits
```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(cell_voltage_delta_mv|cell_voltage_min|cell_voltage_max|temperature_pcb)/' 'homeassistant/sensor/+/state'
# For a specific suspect pack N (pack 3 shown; substitute the pack number), pull all 16 cells + bits:
"$SNAP" -w 14 -g 'lifepower4_3_(cell_[0-9]+_voltage|warning|protection|temperature)' 'homeassistant/sensor/+/state'
```
- `cell_voltage_delta_mv`: <30 mV good, 30–50 mV watch, >50 mV imbalanced, >100 mV
bad (a weak/failing cell, or the pack simply needs a long absorb to balance).
- Any warning/protection bit set → read its name; over-/under-voltage,
over-temp, and over-current protections will also explain a pack dropping current.
- Temps: `temperature_pcb` > 55 °C = watch.
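For the report, the `cell_voltage_delta_mv` bands above can be turned into a label with a sketch like this (`classify_delta_mv` is a hypothetical helper; it takes the delta in mV as its only argument):

```bash
# Hypothetical helper: map a cell_voltage_delta_mv reading to the bands above.
# Usage: classify_delta_mv 42
classify_delta_mv() {
  awk -v d="$1" 'BEGIN {
    if (d > 100)      print "bad"
    else if (d > 50)  print "imbalanced"
    else if (d >= 30) print "watch"
    else              print "good"
  }'
}
```

The label is only a triage hint: a high delta can mean a weak cell or just a pack that needs a long absorb, so it doesn't replace reading the per-cell voltages.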
## 4. RS485 / Modbus comms silence (a pack missing from §1)
```bash
journalctl -u eg4-battery.service --since "15 min ago" --no-pager | grep -iE 'timeout|crc|nak|error|no response|pack|addr' | tail -30
ls -l /dev/serial/by-id/ | grep -i ft232 # all 6 FTDI adapters enumerated?
grep -E 'name:|port:|address|baud' ~/.config/eg4-battery/eg4-battery.yaml
```
- Missing FTDI under `by-id` → USB/adapter/cable issue for that pack (hardware →
report to user; don't unplug things yourself).
- FTDI present but pack times out → check it isn't demoted by an inter-pack
daisy-chain (memory `project_eg4_daisy_chain_silences_slaves`: each pack must be on
its own dongle; a chain silences slaves). Also confirm the pack's `address`/`baud`
in config match the unit (pack 6 legitimately differs).
- Daemon wedged after a USB re-enumerate → restart is ALLOWED:
```bash
sudo systemctl restart eg4-battery.service
```
Then re-run §1 to confirm all 6 return.
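To compare the enumerated adapters against the expected 6 without counting lines by eye, a minimal sketch (`count_ftdi` is a hypothetical helper; like the `grep` above, it assumes the adapters' `by-id` names contain `ft232`):

```bash
# Hypothetical helper: count lines on stdin that mention an FT232 adapter.
# Usage: ls /dev/serial/by-id/ | count_ftdi    (expect 6, one per pack)
count_ftdi() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the pipeline usable.
  grep -ic 'ft232' || true
}
```

Anything other than 6 points at the USB/adapter/cable case above, which is a hardware item to report rather than fix.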
## 5. Report
Per-pack table (SoC, voltage, current, delta_mv, max temp, any bits) + the stack
spread, whether it's drift vs a real fault, comms state, what you restarted, and any
hardware action left for the user. Do not write BMS registers or thresholds.