2026-04-11 05:05:50 -04:00
# Codebase Quality Assessment -- LLM Instructions
**Purpose:** Reproducible, automated quality scoring for the Impakt codebase.
Run this assessment at any milestone (feature complete, pre-release, periodic review) to track quality over time. Results go into `docs/QA-<YYYY-MM-DD_HHMM>.md` using the template in `docs/QA-TEMPLATE.md`.
---
## Prerequisites
```bash
cd /Users/noise/Code/impakt
uv sync --dev
```
All Python tool commands below use the `uv run` prefix, and every command assumes the project root as the working directory.
---
## Step 1: Collect Raw Metrics
Run these commands and record the output. They are independent and can run in parallel.
### 1a. Inventory
```bash
# Source file count and line count
find src -name "*.py" | wc -l
find src -name "*.py" -exec cat {} + | wc -l
# Test file count and line count
find tests -name "*.py" | wc -l
find tests -name "*.py" -exec cat {} + | wc -l
```
### 1b. Test Suite
```bash
# Test count and pass/fail
uv run python -m pytest --tb=no -q 2>&1 | tail -5
# Test collection (verify count)
uv run python -m pytest --co -q 2>&1 | tail -3
```
### 1c. Type Safety
```bash
# mypy error count and file count (summary line).
# --strict is not passed here, so strict mode must be enabled in the project's mypy config.
uv run python -m mypy src/impakt --ignore-missing-imports 2>&1 | tail -3
# Error category breakdown
uv run python -m mypy src/impakt --ignore-missing-imports 2>&1 \
| grep "error:" | sed 's/.*\[/[/' | sort | uniq -c | sort -rn
```
### 1d. Lint
```bash
# Ruff violation count
uv run python -m ruff check src/ 2>&1 | tail -5
# Violation breakdown by rule
# [A-Z]+ keeps multi-letter prefixes intact (a single [A-Z] would truncate e.g. UP006 to P006)
uv run python -m ruff check src/ 2>&1 \
  | grep -oE '[A-Z]+[0-9]+' | sort | uniq -c | sort -rn
```
### 1e. Complexity
```bash
# File size distribution
find src/impakt -name "*.py" -exec wc -l {} + | grep -v total \
  | awk '{print $1}' | sort -n | awk '
    BEGIN {n=0; sum=0}
    {a[n++]=$1; sum+=$1}
    END {
      printf "Min: %d, Max: %d, Mean: %.0f, Median: %d\n", a[0], a[n-1], sum/n, a[int(n/2)]
      over300=0; for(i=0;i<n;i++) if(a[i]>300) over300++
      printf "Files >300 lines: %d / %d\n", over300, n
    }'
# High-complexity files (branch density)
find src/impakt -name "*.py" ! -path "*__pycache__*" | while read f; do
  ifs=$(grep -cE '^\s*(if |elif )' "$f" || true)
  fors=$(grep -cE '^\s*(for |while )' "$f" || true)
  excepts=$(grep -cE '^\s*except' "$f" || true)
  complexity=$((ifs + fors + excepts + 1))
  if [ "$complexity" -gt 15 ]; then
    lines=$(wc -l < "$f")
    echo " $complexity $f ($lines lines)"
  fi
done | sort -rn
```
### 1f. Documentation
```bash
# Docstring coverage (approximate)
echo "Docstrings: $(grep -rcE '^\s+"""' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s}')"
echo "Definitions: $(grep -rcE '^\s*(def |class )' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s}')"
# __all__ coverage
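for d in src/impakt/*/; do
  [ -d "$d" ] || continue
  mod=$(basename "$d")
  init="${d}__init__.py"
  # grep -c already prints 0 when nothing matches (with exit status 1),
  # so only the status is masked; "|| echo 0" would print a second 0.
  if [ -f "$init" ]; then has_all=$(grep -c "__all__" "$init" || true); else has_all="-"; fi
  echo " $mod: __all__=$has_all"
done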
```
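The grep counts above are approximate: a multi-line docstring whose closing `"""` sits on its own indented line is counted twice, and `async def` definitions are not counted. If a tighter number is wanted, a minimal sketch using Python's standard `ast` module might look like the following. The function name `docstring_coverage` and the choice to count only functions and classes (mirroring the `def`/`class` grep) are assumptions, not part of the assessment as written.

```python
import ast
from pathlib import Path


def docstring_coverage(root: str) -> tuple[int, int]:
    """Return (documented, total) for functions and classes under root."""
    documented = 0
    total = 0
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            # Count the same kinds of definitions the grep targets, plus async defs.
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                total += 1
                if ast.get_docstring(node) is not None:
                    documented += 1
    return documented, total
```

Calling `docstring_coverage("src/impakt")` returns the two counts; dividing the first by the second gives the percentage used in the Documentation rubric. If this replaces the grep numbers, record that in the report so later assessments compare like with like.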
### 1g. Security
```bash
# Dangerous patterns
echo "eval/exec: $(grep -rn 'eval(\|exec(' src/impakt/ --include='*.py' | grep -v '# noqa' | wc -l | tr -d ' ')"
echo "subprocess/os.system: $(grep -rn 'subprocess\|os\.system\|os\.popen' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
echo "bare except: $(grep -rn 'except:' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
echo "hardcoded secrets: $(grep -rin 'password\s*=\|secret\s*=\|api_key\s*=\|token\s*=' src/impakt/ --include='*.py' | wc -l | tr -d ' ')"
```
### 1h. Maintainability
```bash
# Debt markers
echo "TODO: $(grep -rc 'TODO' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "FIXME: $(grep -rc 'FIXME' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "HACK: $(grep -rc 'HACK' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
# Logging usage
echo "logging calls: $(grep -rc 'logger\.\|logging\.' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
# Error handling
echo "try/except blocks: $(grep -rc 'try:' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
echo "bare excepts: $(grep -rc 'except:' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
# Internal coupling
echo "internal imports: $(grep -rc 'from impakt\.' src/impakt/ --include='*.py' | awk -F: '{s+=$2}END{print s+0}')"
```
---
## Step 2: Score Each Dimension
Use the raw metrics to assign a 0.0-10.0 score per dimension using these rubrics.
### Test Health (weight: 20%)
| Score | Criteria |
|-------|----------|
| 10 | 100% pass, test:source ratio >= 0.5, coverage >= 80%, integration tests present |
| 8 | 100% pass, ratio 0.2-0.5, integration tests present |
| 6 | 100% pass, ratio < 0.2 or missing coverage measurement |
| 4 | >95% pass |
| 2 | >80% pass |
| 0 | <80% pass or no tests |
### Type Safety (weight: 15%)
| Score | Criteria |
|-------|----------|
| 10 | mypy strict, 0 errors |
| 8 | mypy strict, <10 errors |
| 6 | mypy strict, <50 errors OR mypy non-strict with 0 errors |
| 4 | mypy configured but >50 errors |
| 2 | no type checker configured |
| 0 | no type hints |
### Lint Hygiene (weight: 10%)
| Score | Criteria |
|-------|----------|
| 10 | 0 violations |
| 8 | <10 violations, all auto-fixable |
| 6 | <100 violations, mostly auto-fixable |
| 4 | 100-500 violations |
| 2 | >500 violations |
| 0 | no linter configured |
### Architecture (weight: 15%)
| Score | Criteria |
|-------|----------|
| 10 | Clear layering, all modules have `__all__`, low coupling, plugin system |
| 9 | Clear layering, most modules have `__all__`, plugin system |
| 7 | Identifiable layers, some modules lack public API definition |
| 5 | Flat structure, unclear module boundaries |
| 3 | God objects, circular dependencies |
| 0 | Single-file or completely unstructured |
Evaluate qualitatively: read `__init__.py` files, check import patterns, assess layer violations (does `io` import from `web`? does `channel` depend on `plot`?).
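One way to make the import-pattern check concrete is to collect every absolute `impakt.<pkg>` import per top-level subpackage and compare the resulting edges against a list of forbidden ones. The sketch below assumes the layout `src/impakt/<pkg>/...`; the `FORBIDDEN` pairs are taken from the two questions above and are illustrative only, and both function names are hypothetical. It does not resolve relative imports or `from impakt import <pkg>`, so it supplements reading the code rather than replacing it.

```python
import ast
from pathlib import Path

# Illustrative forbidden edges as (importer, imported); an assumption, not a project rule.
FORBIDDEN = {("io", "web"), ("channel", "plot")}


def internal_edges(pkg_root: str, pkg_name: str = "impakt") -> set[tuple[str, str]]:
    """Collect (importer, imported) pairs between top-level subpackages."""
    root = Path(pkg_root)
    edges: set[tuple[str, str]] = set()
    for path in root.rglob("*.py"):
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # skip modules that sit directly under the package root
        importer = rel.parts[0]
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            targets: list[str] = []
            if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                targets.append(node.module)  # from impakt.x.y import z
            elif isinstance(node, ast.Import):
                targets.extend(alias.name for alias in node.names)  # import impakt.x
            for name in targets:
                parts = name.split(".")
                if len(parts) >= 2 and parts[0] == pkg_name and parts[1] != importer:
                    edges.add((importer, parts[1]))
    return edges


def layer_violations(pkg_root: str) -> set[tuple[str, str]]:
    """Return the forbidden edges that actually occur in the code."""
    return internal_edges(pkg_root) & FORBIDDEN
```

Running `layer_violations("src/impakt")` yields an empty set when none of the listed edges occur. The full edge set from `internal_edges` is also useful as raw data for the report, since it shows how coupled the subpackages are.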
### Documentation (weight: 10%)
| Score | Criteria |
|-------|----------|
| 10 | >90% docstring coverage, generated API docs, architectural docs with diagrams |
| 8 | >80% docstring coverage, README with architecture, no generated API docs |
| 6 | >60% docstring coverage, basic README |
| 4 | <60% docstring coverage |
| 2 | README only |
| 0 | No documentation |
### Complexity (weight: 10%)
| Score | Criteria |
|-------|----------|
| 10 | Median file <150 lines, 0 files >300, max complexity <15 |
| 8 | Median <150, <=3 files >300, high-complexity files are justified (parsers, UI assembly) |
| 6 | Median <200, <=10 files >300 |
| 4 | Median >200 or many files >500 |
| 2 | Multiple files >1000 lines |
| 0 | Monolithic files, no decomposition |
### Security (weight: 10%)
| Score | Criteria |
|-------|----------|
| 10 | No eval/exec, no subprocess, no secrets, input validation at boundaries |
| 9 | eval/exec present but sandboxed (restricted builtins, blocklist), no secrets |
| 7 | eval/exec partially sandboxed, or subprocess with shell=False |
| 5 | Unsandboxed eval/exec or subprocess with shell=True |
| 3 | Hardcoded secrets present |
| 0 | Multiple critical security anti-patterns |
### Maintainability (weight: 10%)
| Score | Criteria |
|-------|----------|
| 10 | 0 debt markers, consistent error handling, logging, modern tooling |
| 8 | <5 debt markers, no bare excepts, logging present |
| 6 | <20 debt markers, some bare excepts |
| 4 | >20 debt markers or many bare excepts |
| 2 | Inconsistent patterns, no logging |
| 0 | No discernible standards |
---
## Step 3: Compute Composite Score
```
composite = (
    test_health * 0.20 +
    type_safety * 0.15 +
    lint_hygiene * 0.10 +
    architecture * 0.15 +
    documentation * 0.10 +
    complexity * 0.10 +
    security * 0.10 +
    maintainability * 0.10
) * 10
```
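The weights sum to 1.00, so the weighted sum stays on the 0-10 scale and the final multiplication by 10 maps it to 0-100. A minimal Python sketch of the same formula, with input validation added, is shown here; the names `WEIGHTS` and `composite_score` are illustrative and not part of any existing tooling.

```python
WEIGHTS = {
    "test_health": 0.20,
    "type_safety": 0.15,
    "lint_hygiene": 0.10,
    "architecture": 0.15,
    "documentation": 0.10,
    "complexity": 0.10,
    "security": 0.10,
    "maintainability": 0.10,
}


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of 0-10 dimension scores, scaled to 0-100."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0.0 <= scores[name] <= 10.0:
            raise ValueError(f"{name} must be in [0, 10], got {scores[name]}")
    return 10.0 * sum(scores[name] * weight for name, weight in WEIGHTS.items())
```

As a worked example with hypothetical scores of 8, 6, 8, 7, 6, 8, 9, 8 (in the order listed above), the weighted sum is 1.6 + 0.9 + 0.8 + 1.05 + 0.6 + 0.8 + 0.9 + 0.8 = 7.45, which gives a composite of 74.5.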
### Grade Scale
| Range | Grade | Meaning |
|-------|-------|---------|
| 90-100 | A | Production-grade, minimal tech debt |
| 80-89 | B+ | Strong codebase, minor gaps |
| 70-79 | B | Good foundation, actionable improvements exist |
| 60-69 | C+ | Functional but accumulating debt |
| 50-59 | C | Significant quality concerns |
| <50 | D | Major remediation needed |
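The table lists integer ranges, so a fractional composite such as 89.5 does not fall in any row as written. The sketch below assumes each row's lower bound is an inclusive threshold (so 89.5 maps to B+); that reading is an assumption, and the function name `grade` is illustrative.

```python
def grade(composite: float) -> str:
    """Map a 0-100 composite score to a letter grade using lower bounds."""
    if not 0.0 <= composite <= 100.0:
        raise ValueError(f"composite must be in [0, 100], got {composite}")
    # Thresholds are the lower bounds of the rows in the grade scale table.
    for floor, letter in ((90, "A"), (80, "B+"), (70, "B"), (60, "C+"), (50, "C")):
        if composite >= floor:
            return letter
    return "D"
```

Whichever boundary convention is used, state it in the report's judgment-call section so that a score near a boundary is graded the same way in later assessments.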
---
## Step 4: Write Report
Copy `docs/QA-TEMPLATE.md` to `docs/QA-<YYYY-MM-DD_HHMM>.md` and fill in:
1. **Summary block at the top** (immediately after the `#` heading):
   - Blockquote with composite score, grade, and one-sentence summary
   - Quick-reference dimension score table
   - The summary sentence should note overall quality posture and the most significant change since the last assessment (or "Baseline assessment" if first run)
2. All raw metric values
3. All dimension scores with brief justification
4. Composite score and grade
5. Delta from previous assessment (if one exists)
6. Top 3-5 actionable improvements
7. Acknowledgment of any scoring judgment calls
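
For the delta in item 5, the previous report can be located by file name alone, because the `QA-<YYYY-MM-DD_HHMM>.md` scheme sorts chronologically as plain text. The sketch below is a minimal illustration; `previous_report` and `score_deltas` are hypothetical helper names, and how scores are read out of a report is left open because it depends on the template.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the naming scheme from this step, e.g. QA-2026-04-11_0505.md.
# QA-TEMPLATE.md does not match and is therefore ignored.
REPORT_RE = re.compile(r"^QA-\d{4}-\d{2}-\d{2}_\d{4}\.md$")


def previous_report(docs_dir: str, current: Optional[str] = None) -> Optional[Path]:
    """Return the most recent earlier report, or None for a baseline run."""
    names = sorted(
        p.name for p in Path(docs_dir).glob("QA-*.md") if REPORT_RE.match(p.name)
    )
    if current is not None:
        names = [n for n in names if n < current]  # keep only older reports
    return Path(docs_dir) / names[-1] if names else None


def score_deltas(current: dict[str, float], previous: dict[str, float]) -> dict[str, float]:
    """Per-dimension change (current minus previous) for shared dimensions."""
    return {k: round(current[k] - previous[k], 2) for k in current if k in previous}
```

A `None` result from `previous_report` corresponds to the "Baseline assessment" case in the summary block.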
---
## Notes for LLM Execution
- Run all Step 1 commands. Do not skip any -- partial data leads to unreliable scores.
- When scoring Architecture, actually read the import graph. Do not guess from file names alone.
- For Security, read the context around any `eval`/`exec`/`subprocess` hits to assess sandboxing.
- Be consistent: same rubric, same thresholds, every time. If a metric falls between rubric rows, interpolate linearly.
- If the project adds coverage measurement (`pytest-cov`), factor it into Test Health.
- Record the exact command output, not just the derived score. Future assessments need the raw data to compute deltas.
- Check for previous `QA-*.md` files in `docs/` to compute improvement trends.
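
The linear interpolation mentioned above can be sketched as a piecewise-linear lookup over (metric, score) anchor points. The rubrics do not define exact anchor points, so the anchors in the usage note are one hypothetical reading of the Lint Hygiene table, and `interpolate_score` is an illustrative name.

```python
def interpolate_score(value: float, anchors: list[tuple[float, float]]) -> float:
    """Piecewise-linear score from (metric, score) anchor points.

    Values outside the anchor range are clamped to the nearest end point.
    """
    if not anchors:
        raise ValueError("at least one anchor point is required")
    pts = sorted(anchors)
    if value <= pts[0][0]:
        return pts[0][1]
    if value >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= value <= x1:
            # Straight line between the two neighbouring rubric rows.
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
    raise AssertionError("unreachable: value lies within the anchor range")
```

With hypothetical lint anchors `[(0, 10), (10, 8), (100, 6)]`, 5 violations score 9.0 and 55 violations score 7.0. Record the anchor points used in the report so the same interpolation can be repeated in later assessments.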