---
name: troubleshoot-battery
description: >-
  Diagnose the 6× EG4 LifePower4 v2 battery packs: per-pack SoC, cell
  imbalance, SoC drift, RS485/Modbus comms silence, temperature, and
  warning/protection bits. Use when a pack reads oddly, packs disagree on SoC,
  a pack stopped reporting, cells look imbalanced, the user mentions "battery
  problem / pack down / SoC wrong / imbalance / one battery", or after
  solar-health-check flags the stack. Read-only plus a safe eg4-battery daemon
  restart; never writes BMS settings.
---

# troubleshoot-battery

## 0. Load context

Shell cwd is the repo root; anchor paths there:

```bash
ROOT="$(git rev-parse --show-toplevel)"; SNAP="$ROOT/.claude/skills/lib/solar-snapshot"
```

Read `$ROOT/.claude/skills/REFERENCE.md`. There are **6 packs** `lifepower4_1..6_*`, all served by `eg4-battery.service` (one FTDI RS485 adapter per pack). Pack config: `~/.config/eg4-battery/eg4-battery.yaml`.

## 1. Are all 6 packs reporting?

```bash
systemctl is-active eg4-battery.service
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|pack_voltage|pack_current)/' 'homeassistant/sensor/+/state'
```

- Fewer than 6 packs in the output → a pack is **silent on RS485**, go to §4.
- All 6 present → go to §2/§3.

## 2. SoC spread & drift

```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(soc|soc_alt|pack_voltage)/' 'homeassistant/sensor/+/state'
```

- Compute `max(soc) - min(soc)`. >10 % = imbalance worth noting; >20 % = significant. **Pack 6 historically runs high** (it's the oddball: Modbus addr `0x01`/115200 vs `0x40`/9600 for packs 1–5), so judge it on its own and don't assume it tracks 1–5.
- **SoC drift is a known design limitation**, not a live fault: the coulomb counter does not re-anchor because the bank rarely reaches the 100 % point that would reset it. See memory `project_eg4_soc_drift_remediation`. If SoC looks wrong but `pack_voltage` is sane, suspect drift, not a dead pack. Voltage→SoC sanity for LFP at rest: ~51.2 V ≈ low, ~53.5 V ≈ mid, ~54+ V ≈ high (loaded/charging skews this).

## 3. Cell imbalance, temperature, protection bits

```bash
"$SNAP" -w 16 -g 'lifepower4_[1-6]_(cell_voltage_delta_mv|cell_voltage_min|cell_voltage_max|temperature_pcb)/' 'homeassistant/sensor/+/state'
# For a specific suspect pack, pull all 16 cells + bits (pack 3 shown; substitute the pack number):
"$SNAP" -w 14 -g 'lifepower4_3_(cell_[0-9]+_voltage|warning|protection|temperature)' 'homeassistant/sensor/+/state'
```

- `cell_voltage_delta_mv`: <30 mV good, 30–50 mV watch, >50 mV imbalanced, >100 mV bad (a weak/failing cell, or the pack simply needs a long absorb to balance).
- Any warning/protection bit set → read its name; over-/under-voltage, over-temp, and over-current protections will also explain a pack dropping current.
- Temps: `temperature_pcb` > 55 °C = watch.

## 4. RS485 / Modbus comms silence (a pack missing from §1)

```bash
journalctl -u eg4-battery.service --since "15 min ago" --no-pager | grep -iE 'timeout|crc|nak|error|no response|pack|addr' | tail -30
ls -l /dev/serial/by-id/ | grep -i ft232   # all 6 FTDI adapters enumerated?
grep -E 'name:|port:|address|baud' ~/.config/eg4-battery/eg4-battery.yaml
```

- Missing FTDI under `by-id` → USB/adapter/cable issue for that pack (hardware → report to user; don't unplug things yourself).
- FTDI present but pack times out → check it isn't demoted by an inter-pack daisy-chain (memory `project_eg4_daisy_chain_silences_slaves`: each pack must be on its own dongle; a chain silences slaves). Also confirm the pack's `address`/`baud` in config match the unit (pack 6 legitimately differs).
- Daemon wedged after a USB re-enumerate → restart is ALLOWED:

  ```bash
  sudo systemctl restart eg4-battery.service
  ```

  Then re-run §1 to confirm all 6 return.

## 5. Report

Give a per-pack table (SoC, voltage, current, delta_mv, max temp, any bits), plus the stack spread, whether it's drift vs a real fault, comms state, what you restarted, and any hardware action left for the user. Do not write BMS registers or thresholds.
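
When filling in the table, the §2 spread thresholds and the §3 `cell_voltage_delta_mv` bands can be applied mechanically. The following is a minimal sketch, assuming the SoC and delta values have already been read off the snapshot output as plain numbers. `soc_spread` and `classify_delta_mv` are hypothetical helper names for illustration; they are not part of this repo or of `solar-snapshot`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo): apply the §2 and §3 thresholds
# to values that were already extracted from the snapshot output.

# soc_spread SOC...  -> prints "<max-min> <verdict>" using the §2 thresholds.
soc_spread() {
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }                      # force numeric comparison
    NR == 1 { min = v; max = v }
    v < min { min = v }
    v > max { max = v }
    END {
      spread = max - min
      verdict = "ok"
      if (spread > 10) verdict = "imbalance"     # >10 %: worth noting
      if (spread > 20) verdict = "significant"   # >20 %: significant
      printf "%.1f %s\n", spread, verdict
    }'
}

# classify_delta_mv MV  -> prints the §3 band for one pack's cell delta.
classify_delta_mv() {
  awk -v mv="$1" 'BEGIN {
    if      (mv > 100) print "bad"
    else if (mv > 50)  print "imbalanced"
    else if (mv >= 30) print "watch"
    else               print "good"
  }'
}

soc_spread 55 62 58 60 57 71   # prints: 16.0 imbalance
classify_delta_mv 42           # prints: watch
```

The sketch treats every value it is given equally. Since pack 6 is judged on its own, one option is to pass only packs 1–5 to `soc_spread` and note pack 6 separately.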