# Codebase Quality Assessment -- LLM Instructions

**Purpose:** Reproducible, automated quality scoring for the Impakt codebase.
Run this assessment at any milestone (feature complete, pre-release, periodic review) to track quality over time. Write the results to `docs/QA-<YYYY-MM-DD_HHMM>.md`, starting from the template in `docs/QA-TEMPLATE.md`.

---

## Prerequisites

```bash
cd /Users/noise/Code/impakt
uv sync --dev
```

Python tooling below is invoked through `uv run`, and every command assumes the project root as the working directory.

---

## Step 1: Collect Raw Metrics

Run these commands and record the output. They are independent and can run in parallel.

### 1a. Inventory

```bash
# Source file count and line count
find src -name "*.py" | wc -l
find src -name "*.py" -exec cat {} + | wc -l

# Test file count and line count
find tests -name "*.py" | wc -l
find tests -name "*.py" -exec cat {} + | wc -l
```

### 1b. Test Suite

```bash
# Test count and pass/fail
uv run python -m pytest --tb=no -q 2>&1 | tail -5

# Test collection (verify count)
uv run python -m pytest --co -q 2>&1 | tail -3
```

### 1c. Type Safety

```bash
# mypy strict error count and file breakdown
# (assumes strict mode is enabled in the project's mypy config; pass --strict otherwise)
uv run python -m mypy src/impakt --ignore-missing-imports 2>&1 | tail -3

# Error category breakdown
uv run python -m mypy src/impakt --ignore-missing-imports 2>&1 \
  | grep "error:" | sed 's/.*\[/[/' | sort | uniq -c | sort -rn
```
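
When recording the error count, the `tail -3` output can be reduced to two numbers. The following is a minimal sketch, assuming the captured mypy output ends with its usual summary line; the function name `parse_mypy_summary` is hypothetical and not part of the project.

```python
import re

# mypy's closing summary normally looks like one of:
#   "Found 12 errors in 4 files (checked 57 source files)"
#   "Success: no issues found in 57 source files"
_FOUND = re.compile(r"Found (\d+) errors? in (\d+) files?")
_SUCCESS = re.compile(r"Success: no issues found")


def parse_mypy_summary(output: str) -> tuple[int, int]:
    """Return (error_count, files_with_errors) from captured mypy output."""
    # Scan from the end because the summary is the last meaningful line.
    for line in reversed(output.splitlines()):
        match = _FOUND.search(line)
        if match:
            return int(match.group(1)), int(match.group(2))
        if _SUCCESS.search(line):
            return 0, 0
    raise ValueError("no mypy summary line found")
```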

### 1d. Lint

```bash
# Ruff violation count
uv run python -m ruff check src/ 2>&1 | tail -5

# Violation breakdown by rule (rule codes may have multi-letter prefixes, e.g. UP006)
uv run python -m ruff check src/ 2>&1 \
  | grep -oE '[A-Z]+[0-9]+' | sort | uniq -c | sort -rn
```

### 1e. Complexity

```bash
# File size distribution
find src/impakt -name "*.py" -exec wc -l {} + | grep -v ' total$' \
  | awk '{print $1}' | sort -n | awk '
    BEGIN {n=0; sum=0}
    {a[n++]=$1; sum+=$1}
    END {
      printf "Min: %d, Max: %d, Mean: %.0f, Median: %d\n", a[0], a[n-1], sum/n, a[int(n/2)]
      over300=0; for(i=0;i<n;i++) if(a[i]>300) over300++
      printf "Files >300 lines: %d / %d\n", over300, n
    }'

# High-complexity files (branch density)
find src/impakt -name "*.py" ! -path "*__pycache__*" | while IFS= read -r f; do
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  ifs=$(grep -cE '^\s*(if |elif )' "$f" || true)
  fors=$(grep -cE '^\s*(for |while )' "$f" || true)
  excepts=$(grep -cE '^\s*except' "$f" || true)
  complexity=$((ifs + fors + excepts + 1))
  if [ "$complexity" -gt 15 ]; then
    lines=$(wc -l < "$f")
    echo "  $complexity $f ($lines lines)"
  fi
done | sort -rn
```
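
The grep-based branch count also matches keywords inside strings and docstrings. As a cross-check, the same quantity can be computed from the syntax tree. This is a sketch of an alternative technique, not the documented procedure; `branch_complexity` is a hypothetical helper, and its numbers can differ slightly from the grep figures (for example, it also counts `async for`).

```python
import ast


def branch_complexity(source: str) -> int:
    """Count if/elif, for/while, and except clauses in a module, plus one."""
    tree = ast.parse(source)
    # An `elif` is parsed as a nested ast.If, so it is counted like an `if`.
    branch_nodes = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)
    branches = sum(isinstance(node, branch_nodes) for node in ast.walk(tree))
    return branches + 1
```

Apply it per file, for example `branch_complexity(path.read_text())`, and flag results above 15 as the shell loop does.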

### 1f. Documentation

```bash
# Docstring coverage (approximate: a closing """ on its own indented line is counted too,
# and unindented module-level docstrings are not counted)
echo "Docstrings: $(grep -rcE '^\s+"""' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s}')"
echo "Definitions: $(grep -rcE '^\s*(def |class )' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s}')"

# __all__ coverage
for d in src/impakt/*/; do
  [ -d "$d" ] || continue
  mod=$(basename "$d")
  init="${d}__init__.py"
  # grep -c already prints 0 when nothing matches, so only its exit status is suppressed
  [ -f "$init" ] && has_all=$(grep -c "__all__" "$init" || true) || has_all="-"
  echo "  $mod: __all__=$has_all"
done
```
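
Because the grep counts above are only approximate, a syntax-tree count can serve as a cross-check for the docstring coverage percentage used in the Documentation rubric. The sketch below is an alternative technique under stated assumptions: `docstring_coverage` is a hypothetical helper, and it considers only functions and classes, not module docstrings.

```python
import ast


def docstring_coverage(source: str) -> tuple[int, int]:
    """Return (documented, total) over the functions and classes in one module."""
    documented = total = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            total += 1
            # ast.get_docstring returns None when the body has no leading string.
            if ast.get_docstring(node) is not None:
                documented += 1
    return documented, total
```

Summing both numbers over every file under `src/impakt` and dividing gives a coverage ratio.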

### 1g. Security

```bash
# Dangerous patterns (extended regex via -E so alternation works across grep implementations)
echo "eval/exec: $(grep -rnE 'eval\(|exec\(' src/impakt/ --include='*.py' | grep -v '# noqa' | wc -l | tr -d ' ')"
echo "subprocess/os.system: $(grep -rnE 'subprocess|os\.system|os\.popen' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
echo "bare except: $(grep -rn 'except:' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
echo "hardcoded secrets: $(grep -rinE 'password\s*=|secret\s*=|api_key\s*=|token\s*=' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
```
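
The `eval(`/`exec(` pattern also matches names such as `ast.literal_eval(` and text inside comments, so each hit still needs a manual look. A syntax-tree scan can narrow the list to direct calls first. This is a hedged sketch of that idea; `find_dynamic_exec` is a hypothetical helper and it will miss aliased or indirect calls (for example `builtins.eval`).

```python
import ast


def find_dynamic_exec(source: str) -> list[tuple[int, str]]:
    """Return (line_number, name) for each direct call to the eval or exec builtins."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)  # plain name, not an attribute access
            and node.func.id in {"eval", "exec"}
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

The returned line numbers point at the places whose surrounding context should be read to judge sandboxing.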

### 1h. Maintainability

```bash
# Debt markers
echo "TODO: $(grep -rc 'TODO' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "FIXME: $(grep -rc 'FIXME' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "HACK: $(grep -rc 'HACK' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"

# Logging usage
echo "logging calls: $(grep -rc 'logger\.\|logging\.' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"

# Error handling
echo "try/except blocks: $(grep -rc 'try:' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "bare excepts: $(grep -rc 'except:' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"

# Internal coupling
echo "internal imports: $(grep -rc 'from impakt\.' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
```

---

## Step 2: Score Each Dimension

Use the raw metrics to assign each dimension a score from 0.0 to 10.0 according to the rubrics below.

### Test Health (weight: 20%)

| Score | Criteria |
|-------|----------|
| 10 | 100% pass, test:source ratio >= 0.5, coverage >= 80%, integration tests present |
| 8 | 100% pass, ratio 0.2-0.5, integration tests present |
| 6 | 100% pass, ratio < 0.2 or missing coverage measurement |
| 4 | >95% pass |
| 2 | >80% pass |
| 0 | <=80% pass or no tests |

### Type Safety (weight: 15%)

| Score | Criteria |
|-------|----------|
| 10 | mypy strict, 0 errors |
| 8 | mypy strict, <10 errors |
| 6 | mypy strict, <50 errors OR mypy non-strict with 0 errors |
| 4 | mypy configured but >=50 errors |
| 2 | no type checker configured |
| 0 | no type hints |

### Lint Hygiene (weight: 10%)

| Score | Criteria |
|-------|----------|
| 10 | 0 violations |
| 8 | <10 violations, all auto-fixable |
| 6 | <100 violations, mostly auto-fixable |
| 4 | 100-500 violations |
| 2 | >500 violations |
| 0 | no linter configured |

### Architecture (weight: 15%)

| Score | Criteria |
|-------|----------|
| 10 | Clear layering, all modules have `__all__`, low coupling, plugin system |
| 9 | Clear layering, most modules have `__all__`, plugin system |
| 7 | Identifiable layers, some modules lack public API definition |
| 5 | Flat structure, unclear module boundaries |
| 3 | God objects, circular dependencies |
| 0 | Single-file or completely unstructured |

Evaluate qualitatively: read `__init__.py` files, check import patterns, assess layer violations (does `io` import from `web`? does `channel` depend on `plot`?).

### Documentation (weight: 10%)

| Score | Criteria |
|-------|----------|
| 10 | >90% docstring coverage, generated API docs, architectural docs with diagrams |
| 8 | >80% docstring coverage, README with architecture, no generated API docs |
| 6 | >60% docstring coverage, basic README |
| 4 | <=60% docstring coverage |
| 2 | README only |
| 0 | No documentation |

### Complexity (weight: 10%)

| Score | Criteria |
|-------|----------|
| 10 | Median file <150 lines, 0 files >300, max complexity <15 |
| 8 | Median <150, <=3 files >300, high-complexity files are justified (parsers, UI assembly) |
| 6 | Median <200, <=10 files >300 |
| 4 | Median >=200 or many files >500 |
| 2 | Multiple files >1000 lines |
| 0 | Monolithic files, no decomposition |

### Security (weight: 10%)

| Score | Criteria |
|-------|----------|
| 10 | No eval/exec, no subprocess, no secrets, input validation at boundaries |
| 9 | eval/exec present but sandboxed (restricted builtins, blocklist), no secrets |
| 7 | eval/exec partially sandboxed, or subprocess with shell=False |
| 5 | Unsandboxed eval/exec or subprocess with shell=True |
| 3 | Hardcoded secrets present |
| 0 | Multiple critical security anti-patterns |

### Maintainability (weight: 10%)

| Score | Criteria |
|-------|----------|
| 10 | 0 debt markers, consistent error handling, logging, modern tooling |
| 8 | <5 debt markers, no bare excepts, logging present |
| 6 | <20 debt markers, some bare excepts |
| 4 | >=20 debt markers or many bare excepts |
| 2 | Inconsistent patterns, no logging |
| 0 | No discernible standards |

---

## Step 3: Compute Composite Score

```
composite = (
    test_health     * 0.20 +
    type_safety     * 0.15 +
    lint_hygiene    * 0.10 +
    architecture    * 0.15 +
    documentation   * 0.10 +
    complexity      * 0.10 +
    security        * 0.10 +
    maintainability * 0.10
) * 10
```
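
The formula above can be expressed as a small function so the arithmetic is not done by hand. This is a sketch: the weights come from the rubric headings, while the names `WEIGHTS` and `composite_score` are hypothetical. With illustrative (made-up) scores of 8, 6, 8, 9, 6, 8, 9, 8 in the order listed, the weighted sum is 7.75 and the composite is 77.5.

```python
# Weights as stated in the Step 2 rubric headings; they sum to 1.0.
WEIGHTS = {
    "test_health": 0.20,
    "type_safety": 0.15,
    "lint_hygiene": 0.10,
    "architecture": 0.15,
    "documentation": 0.10,
    "complexity": 0.10,
    "security": 0.10,
    "maintainability": 0.10,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 dimension scores, scaled to a 0-100 composite."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the eight dimensions")
    return sum(scores[name] * weight for name, weight in WEIGHTS.items()) * 10
```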

### Grade Scale

| Range | Grade | Meaning |
|-------|-------|---------|
| 90-100 | A | Production-grade, minimal tech debt |
| 80 to <90 | B+ | Strong codebase, minor gaps |
| 70 to <80 | B | Good foundation, actionable improvements exist |
| 60 to <70 | C+ | Functional but accumulating debt |
| 50 to <60 | C | Significant quality concerns |
| <50 | D | Major remediation needed |
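
The grade lookup can be sketched as a threshold scan. One assumption is made here that the table does not spell out: each band is treated as closed at its lower bound, so a fractional composite such as 89.5 falls into the lower band (B+). The function name `grade` is hypothetical.

```python
def grade(composite: float) -> str:
    """Map a 0-100 composite score to the letter grade from the table above."""
    # (lower bound, grade), checked from the highest band down.
    bands = [(90, "A"), (80, "B+"), (70, "B"), (60, "C+"), (50, "C")]
    for lower_bound, letter in bands:
        if composite >= lower_bound:
            return letter
    return "D"
```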

---

## Step 4: Write Report

Copy `docs/QA-TEMPLATE.md` to `docs/QA-<YYYY-MM-DD_HHMM>.md` and fill in:

1. All raw metric values
2. All dimension scores with brief justification
3. Composite score and grade
4. Delta from previous assessment (if one exists)
5. Top 3-5 actionable improvements
6. Acknowledgment of any scoring judgment calls
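
The file naming and the lookup of the previous assessment can be sketched as follows. Both helpers are hypothetical; they assume every report follows the `QA-<YYYY-MM-DD_HHMM>.md` pattern, in which case sorting by name is the same as sorting by time.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def report_path(now: datetime, docs_dir: Path = Path("docs")) -> Path:
    """Build docs/QA-<YYYY-MM-DD_HHMM>.md for the given timestamp."""
    return docs_dir / f"QA-{now.strftime('%Y-%m-%d_%H%M')}.md"


def latest_previous(report_names: list[str], current_name: str) -> Optional[str]:
    """Return the newest earlier report name, or None if this is the first one."""
    earlier = sorted(
        name
        for name in report_names
        if name != "QA-TEMPLATE.md" and name < current_name
    )
    return earlier[-1] if earlier else None


# Typical use (not executed here): copy the template to report_path(datetime.now())
# with shutil.copyfile, then diff against latest_previous(...) for the delta section.
```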

---

## Notes for LLM Execution

- Run all Step 1 commands. Do not skip any -- partial data leads to unreliable scores.
- When scoring Architecture, actually read the import graph. Do not guess from file names alone.
- For Security, read the context around any `eval`/`exec`/`subprocess` hits to assess sandboxing.
- Be consistent: same rubric, same thresholds, every time. If a metric falls between rubric rows, interpolate linearly.
- If the project adds coverage measurement (`pytest-cov`), factor it into Test Health.
- Record the exact command output, not just the derived score. Future assessments need the raw data to compute deltas.
- Check for previous `QA-*.md` files in `docs/` to compute improvement trends.
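
The linear interpolation rule in the notes above can be made concrete with a piecewise-linear helper. This is only one possible reading, and the anchor points are an assumption: the example treats the first three Type Safety rows as the points (0 errors, 10), (10 errors, 8), and (50 errors, 6), so 5 errors scores 9.0 and 30 errors scores 7.0. The name `interpolate_score` is hypothetical, and values beyond the last anchor are simply clamped, so the anchor list must be extended to cover the remaining rubric rows.

```python
def interpolate_score(value: float, anchors: list[tuple[float, float]]) -> float:
    """Piecewise-linear score between (metric_value, score) anchors.

    `anchors` must be sorted by metric value; inputs outside the anchor
    range are clamped to the nearest end score.
    """
    if value <= anchors[0][0]:
        return anchors[0][1]
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    return anchors[-1][1]


# Assumed anchors for mypy strict error counts (illustrative, not from the rubric text).
TYPE_SAFETY_ANCHORS = [(0, 10.0), (10, 8.0), (50, 6.0)]
```

Whatever anchors are chosen, record them in the report so later assessments interpolate the same way.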