Added QA process - subagent, md template, scorecards.

2026-04-11 06:31:01 -04:00
parent 40c8b0f132
commit 794de9c721
11 changed files with 740 additions and 6 deletions


@@ -1,5 +1,19 @@
# Quality Assessment -- 2026-04-11
> **Score: 78.3 / 100 — Grade: B**
> Baseline assessment. Strong architecture and documentation (9.0 each), solid test suite (240/240 pass), held back by lint debt (89 violations) and type errors (49 mypy strict). Clear path to B+ with low-effort fixes.
| Dimension | Score |
|-----------|-------|
| Test Health | 8.0 |
| Type Safety | 6.5 |
| Lint Hygiene | 6.0 |
| Architecture | 9.0 |
| Documentation | 9.0 |
| Complexity | 7.0 |
| Security | 8.5 |
| Maintainability | 8.5 |
**Version:** 0.1.0
**Assessed by:** Claude Opus 4.6
**Previous assessment:** None (baseline)

210
docs/QA-2026-04-11_0528.md Normal file

@@ -0,0 +1,210 @@
# Quality Assessment -- 2026-04-11
> **Score: 78.3 / 100 — Grade: B**
> No code changes since baseline. All metrics identical. Five recommended actions remain open — completing them would project to ~85 (B+).
| Dimension | Score |
|-----------|-------|
| Test Health | 8.0 |
| Type Safety | 6.5 |
| Lint Hygiene | 6.0 |
| Architecture | 9.0 |
| Documentation | 9.0 |
| Complexity | 7.0 |
| Security | 8.5 |
| Maintainability | 8.5 |
**Version:** 0.1.0
**Assessed by:** Claude Opus 4.6
**Previous assessment:** QA-2026-04-11_0459.md
---
## Inventory
| Metric | Value |
|--------|-------|
| Source files | 72 |
| Source lines | 10,325 |
| Test files | 30 |
| Test lines | 2,736 |
| Test:source ratio | 0.26 |
| Direct dependencies | 10 core + 1 optional + 4 dev |
---
## Raw Metrics
### Test Suite
```
240 passed, 7 warnings in 9.26s
```
- Tests collected: 240
- Tests passed: 240
- Tests failed: 0
- Test duration: 9.26s
### Type Safety (mypy --strict)
```
Found 49 errors in 20 files (checked 72 source files)
```
- Total errors: 49
- Files with errors: 20 / 72 (72% clean)
- Top error categories:
- `[type-arg]` 17 -- missing generic parameters on `dict`, `list`
- `[attr-defined]` 9 -- attribute access on loosely typed objects
- `[no-any-return]` 5 -- returning `Any` from typed functions
- `[var-annotated]` 3
- `[import-untyped]` 3
- `[assignment]` 3
- `[valid-type]` 2
- `[return-value]` 2
- `[no-untyped-call]` 2
- `[unused-ignore]` 1
- `[no-untyped-def]` 1
- `[comparison-overlap]` 1
### Lint (ruff)
```
Found 89 errors.
[*] 73 fixable with the `--fix` option (9 hidden fixes can be enabled with the `--unsafe-fixes` option).
```
- Total violations: 89
- Auto-fixable: 73 (82%)
- Top violation rules:
- `F401` 61 -- unused imports
- `I001` 9 -- unsorted imports
- `F601` 8 -- duplicate dictionary keys
- `E501` 5 -- line too long
- `F841` 3 -- unused variables
- `P035` 2 -- string concatenation in f-string
- `F541` 1 -- f-string without placeholder
### Complexity
- File size: min=1 / median=133 / mean=143 / max=693
- Files >300 lines: 6 / 72
- High-complexity files (branch density >15):
```
80 src/impakt/io/mme.py (693 lines) -- ISO 13499 parser, justified
44 src/impakt/web/components/criteria.py (343) -- UI assembly with protocol logic
30 src/impakt/channel/model.py (456) -- core data model, multiple classes
27 src/impakt/web/state.py (274) -- app state with multi-test support
27 src/impakt/protocol/euro_ncap.py (238) -- sliding-scale scoring tables
25 src/impakt/web/callbacks/plot_callbacks.py (249) -- transform pipeline orchestration
21 src/impakt/protocol/iihs.py (180) -- G/A/M/P rating logic
20 src/impakt/plot/engine.py (257) -- Plotly rendering with corridors
19 src/impakt/script/cli.py (140) -- CLI arg parsing
17 src/impakt/web/components/channel_grid.py (368) -- DataTable assembly
16 src/impakt/web/callbacks/channel_callbacks.py (195) -- selection/filter callbacks
```
### Documentation
- Docstring coverage: 414 / 454 definitions (91%)
- Modules with `__all__`: 6 / 11 public modules
- channel: YES
- criteria: YES
- io: NO
- plot: YES
- plugin: NO
- protocol: YES
- report: NO
- script: NO
- template: NO
- transform: YES
- web: YES
- README: 1,266 lines with 20 Mermaid diagram references
- Architectural diagrams: yes
### Security
- eval/exec (sandboxed): 1 -- `math_expr.py:151`, restricted builtins `{}` + token blocklist (flagged `# noqa: S307`)
- eval/exec (unsandboxed): 0
- subprocess: 0 (the string "subprocess" appears only as a blocklist entry in `math_expr.py`)
- Hardcoded secrets: 0
- Bare except: 0
### Maintainability
- TODO: 0
- FIXME: 0
- HACK: 0
- Logging calls: 48
- try/except blocks: 52
- Bare excepts: 0
- Internal imports (coupling): 190
---
## Scorecard
| # | Dimension | Weight | Score | Weighted | Justification |
|---|-----------|--------|-------|----------|---------------|
| 1 | Test Health | 20% | 8.0/10 | 16.0 | 240/240 pass. 0.26 ratio (within 0.2-0.5 band). Integration tests with real datasets. No coverage % configured. |
| 2 | Type Safety | 15% | 6.5/10 | 9.75 | mypy strict enabled. 49 errors remain in 20 files, concentrated in web layer. Mostly cosmetic (`type-arg` 17). Between 6 (<50 errors) and 8 (<10 errors). |
| 3 | Lint Hygiene | 10% | 6.0/10 | 6.0 | 89 violations, 82% auto-fixable. Dominated by unused imports (F401=61). 8 duplicate dict keys (F601) need manual fix. Rubric: <100, mostly auto-fixable = 6. |
| 4 | Architecture | 15% | 9.0/10 | 13.5 | Clean 4-layer design (data -> transform -> protocol -> web). Plugin system. No layer violations found. 6/11 modules export `__all__`. Docked 1 point for 5 missing `__all__`. |
| 5 | Documentation | 10% | 9.0/10 | 9.0 | 91% docstring coverage (>90%). README with 20 Mermaid diagrams. No generated API reference docs, so not a full 10. |
| 6 | Complexity | 10% | 7.0/10 | 7.0 | Median 133 (<150). 6 files >300 lines. `mme.py` at 693/80 complexity is the outlier -- justified as a format parser. Between 8 (<=3 files >300) and 6 (<=10 files >300). |
| 7 | Security | 10% | 8.5/10 | 8.5 | Single eval sandboxed with `{"__builtins__": {}}` + 16-item token blocklist. No subprocess, no secrets. Between 9 (sandboxed) and 7 (partially sandboxed). |
| 8 | Maintainability | 10% | 8.5/10 | 8.5 | Zero debt markers. Zero bare excepts. 48 logging calls across codebase. Modern tooling (uv, hatchling, ruff, mypy). Between 10 (perfect) and 8 (<5 markers). |
### Composite Score: **78.3 / 100**
### Grade: **B**
Calculation: (8.0*0.20 + 6.5*0.15 + 6.0*0.10 + 9.0*0.15 + 9.0*0.10 + 7.0*0.10 + 8.5*0.10 + 8.5*0.10) * 10 = (1.60 + 0.975 + 0.60 + 1.35 + 0.90 + 0.70 + 0.85 + 0.85) * 10 = 78.25 -> 78.3
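The calculation above can be reproduced with a short script. This is a minimal sketch, not the project's actual scoring code: the names `WEIGHTS`, `SCORES`, and `composite` are illustrative assumptions, with values copied from the scorecard. `Decimal` with `ROUND_HALF_UP` is used so that 78.25 rounds to 78.3 as reported; Python's built-in `round` uses round-half-to-even on floats and may give 78.2 instead.

```python
from decimal import Decimal, ROUND_HALF_UP

# Weights and scores copied from the scorecard table (names are illustrative).
WEIGHTS = {
    "Test Health": Decimal("0.20"),
    "Type Safety": Decimal("0.15"),
    "Lint Hygiene": Decimal("0.10"),
    "Architecture": Decimal("0.15"),
    "Documentation": Decimal("0.10"),
    "Complexity": Decimal("0.10"),
    "Security": Decimal("0.10"),
    "Maintainability": Decimal("0.10"),
}
SCORES = {
    "Test Health": Decimal("8.0"),
    "Type Safety": Decimal("6.5"),
    "Lint Hygiene": Decimal("6.0"),
    "Architecture": Decimal("9.0"),
    "Documentation": Decimal("9.0"),
    "Complexity": Decimal("7.0"),
    "Security": Decimal("8.5"),
    "Maintainability": Decimal("8.5"),
}


def composite(scores, weights):
    """Weighted sum of 0-10 dimension scores, scaled to 0-100, one decimal."""
    raw = sum(scores[name] * weights[name] for name in weights) * 10
    return raw.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(composite(SCORES, WEIGHTS))  # 78.3
```

Keeping the weights in one mapping also makes it easy to check that they sum to 1.0 before scoring.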
---
## Delta from Previous Assessment
| Dimension | Previous | Current | Change |
|-----------|----------|---------|--------|
| Test Health | 8.0 | 8.0 | 0.0 |
| Type Safety | 6.5 | 6.5 | 0.0 |
| Lint Hygiene | 6.0 | 6.0 | 0.0 |
| Architecture | 9.0 | 9.0 | 0.0 |
| Documentation | 9.0 | 9.0 | 0.0 |
| Complexity | 7.0 | 7.0 | 0.0 |
| Security | 8.5 | 8.5 | 0.0 |
| Maintainability | 8.5 | 8.5 | 0.0 |
| **Composite** | **78.3** | **78.3** | **0.0** |
---
## Top Improvements Since Last Assessment
No code changes since previous assessment -- all metrics are identical.
---
## Recommended Actions (Priority Order)
| # | Action | Effort | Impact | Dimensions Affected |
|---|--------|--------|--------|---------------------|
| 1 | Run `uv run ruff check --fix src/` to clear 73 auto-fixable violations | 1 min | +2.0 lint -> 8.0 | Lint Hygiene |
| 2 | Fix 8 duplicate dict keys in `channel/lookup.py` (F601) and remaining manual lint fixes | 15 min | +1.0 lint -> 9.0+ | Lint Hygiene |
| 3 | Add `--cov --cov-report=term` to pytest config, target 80%+ branch coverage | 30 min | +1.0 test | Test Health |
| 4 | Resolve 17 `[type-arg]` mypy errors (add `dict[str, X]` generics to web layer) | 1 hr | +1.0 type | Type Safety |
| 5 | Add `__all__` to `io`, `plugin`, `report`, `script`, `template` modules | 30 min | +0.5 arch | Architecture |
**Projected score after actions 1-5: ~85 (B+)**
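Action 4 usually amounts to spelling out generic parameters on bare `dict` and `list` annotations, which `mypy --strict` reports as `[type-arg]`. A hedged illustration follows; `summarize_peaks` is a hypothetical function invented for this example, not taken from the codebase.

```python
# Before (flagged under mypy --strict as [type-arg], bare generics):
#     def summarize_peaks(channels: dict) -> dict: ...


# After: element types are explicit, so callers and mypy know what to expect.
def summarize_peaks(channels: dict[str, list[float]]) -> dict[str, float]:
    """Return the peak sample per channel, skipping empty channels."""
    return {name: max(samples) for name, samples in channels.items() if samples}


peaks = summarize_peaks({"head_x": [0.1, 4.2, 3.3], "empty": []})
print(peaks)  # {'head_x': 4.2}
```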
---
## Notes
- **No code changes detected** between this assessment and the prior one (QA-2026-04-11_0459). All raw metrics are identical, yielding the same scores. The recommended actions from the baseline remain open.
- **Architecture score is qualitative.** Import graph was inspected: no layer violations found (data layer does not import from web/plot). The `web` module sits at the top of the dependency tree as expected.
- **Security eval in `math_expr.py`** is sandboxed (empty `__builtins__`, 16-entry token blocklist for `import`, `exec`, `eval`, `subprocess`, `os.`, `sys.`, `__`, etc.). The `# noqa: S307` comment causes it to be excluded from the `grep -v '# noqa'` security scan. An AST-based evaluator would be safer but is lower priority given the blocklist approach.
- **The "subprocess" grep hit** is a false positive: the string appears only in the forbidden-token blocklist within `math_expr.py`, not as an actual subprocess invocation.
- **Complexity scoring for `mme.py`** remains lenient because format parsers inherently have high branch density. If it grows beyond ~800 lines, consider extracting sub-parsers.
- **Test:source ratio of 0.26** has not changed. Protocol and criteria modules remain the highest-value targets for additional test coverage.
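
The AST-based alternative mentioned in the security note could look roughly like the sketch below. It is an assumption-laden illustration, not the code in `math_expr.py`: the `safe_eval` name, the operator whitelist, and the `variables` parameter are all hypothetical. The idea is to allow only explicitly whitelisted node types instead of blocking known-bad tokens.

```python
import ast
import operator
from typing import Optional

# Whitelisted operators (hypothetical selection; `**` is left out on purpose
# because unbounded exponentiation can be used to exhaust CPU or memory).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expr: str, variables: Optional[dict[str, float]] = None) -> float:
    """Evaluate an arithmetic expression by walking its AST.

    Anything outside the whitelist (calls, attribute access, subscripts,
    unknown names) raises ValueError instead of being executed.
    """
    names = variables or {}

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.Name) and node.id in names:
            return names[node.id]
        elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    return _eval(ast.parse(expr, mode="eval"))


print(safe_eval("2 * (x + 3)", {"x": 1.5}))  # 9.0
```

Because nothing is ever passed to `eval`, an input such as `__import__('os')` parses to a `Call` node and is rejected before any code runs.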

198
docs/QA-2026-04-11_0619.md Normal file

@@ -0,0 +1,198 @@
# Quality Assessment -- 2026-04-11
**Version:** 0.1.0
**Assessed by:** Claude Sonnet 4.6
**Previous assessment:** QA-2026-04-11_0528.md
---
## Inventory
| Metric | Value |
|--------|-------|
| Source files | 72 |
| Source lines | 10,325 |
| Test files | 30 |
| Test lines | 2,736 |
| Test:source ratio | 0.26 |
| Direct dependencies | 10 core + 1 optional + 4 dev |
---
## Raw Metrics
### Test Suite
```
240 passed, 7 warnings in 9.26s
```
- Tests collected: 240
- Tests passed: 240
- Tests failed: 0
- Test duration: 9.26s
### Type Safety (mypy --strict)
```
Found 49 errors in 20 files (checked 72 source files)
```
- Total errors: 49
- Files with errors: 20 / 72 (72% clean)
- Top error categories:
- `[type-arg]` 17 — missing generic parameters on `dict`, `list`
- `[attr-defined]` 9 — attribute access on loosely typed objects
- `[no-any-return]` 5 — returning `Any` from typed functions
- `[var-annotated]` 3
- `[import-untyped]` 3
- `[assignment]` 3
- `[valid-type]` 2
- `[return-value]` 2
- `[no-untyped-call]` 2
- `[unused-ignore]` 1
- `[no-untyped-def]` 1
- `[comparison-overlap]` 1
### Lint (ruff)
```
Found 89 errors.
[*] 73 fixable with the `--fix` option (9 hidden fixes can be enabled with the `--unsafe-fixes` option).
```
- Total violations: 89
- Auto-fixable: 73 (82%)
- Top violation rules:
- `F401` 61 — unused imports
- `I001` 9 — unsorted imports
- `F601` 8 — duplicate dictionary keys
- `E501` 5 — line too long
- `F841` 3 — unused variables
- `P035` 2 — string concatenation in f-string
- `F541` 1 — f-string without placeholder
### Complexity
- File size: min=1 / median=133 / mean=143 / max=693
- Files >300 lines: 6 / 72
- High-complexity files (branch density >15):
```
80 src/impakt/io/mme.py (693 lines) -- ISO 13499 parser, justified
44 src/impakt/web/components/criteria.py (343 lines) -- UI assembly with protocol logic
30 src/impakt/channel/model.py (456 lines) -- core data model, multiple classes
27 src/impakt/web/state.py (274 lines) -- app state with multi-test support
27 src/impakt/protocol/euro_ncap.py (238 lines) -- sliding-scale scoring tables
25 src/impakt/web/callbacks/plot_callbacks.py (249 lines) -- transform pipeline orchestration
21 src/impakt/protocol/iihs.py (180 lines) -- G/A/M/P rating logic
20 src/impakt/plot/engine.py (257 lines) -- Plotly rendering with corridors
19 src/impakt/script/cli.py (140 lines) -- CLI arg parsing
17 src/impakt/web/components/channel_grid.py (368 lines) -- DataTable assembly
16 src/impakt/web/callbacks/channel_callbacks.py (195 lines) -- selection/filter callbacks
```
### Documentation
- Docstring coverage: 414 / 454 definitions (91%)
- Modules with `__all__`: 6 / 11 public modules
- channel: YES
- criteria: YES
- io: NO
- plot: YES
- plugin: NO
- protocol: YES
- report: NO
- script: NO
- template: NO
- transform: YES
- web: YES
- README: 1,266 lines with 20 Mermaid diagram references
- Architectural diagrams: yes
### Security
- eval/exec (sandboxed): 1 — `math_expr.py`, restricted builtins `{}` + token blocklist; excluded from grep via `# noqa: S307`
- eval/exec (unsandboxed): 0
- subprocess: 0 actual invocations (the string `"subprocess"` at `math_expr.py:70` is a forbidden-token blocklist entry, not a real call)
- Hardcoded secrets: 0
- Bare except: 0
### Maintainability
- TODO: 0
- FIXME: 0
- HACK: 0
- Logging calls: 48
- try/except blocks: 52
- Bare excepts: 0
- Internal imports (coupling): 190
---
## Scorecard
| # | Dimension | Weight | Score | Weighted | Justification |
|---|-----------|--------|-------|----------|---------------|
| 1 | Test Health | 20% | 8.0/10 | 1.60 | 240/240 pass. test:source ratio 0.26 (within 0.2-0.5 band). Integration tests with real datasets present. No coverage % configured. |
| 2 | Type Safety | 15% | 6.5/10 | 0.975 | mypy strict enabled. 49 errors in 20 files, concentrated in web layer. Mostly cosmetic (`type-arg` 17). Interpolated between 6 (<50 errors) and 8 (<10 errors). |
| 3 | Lint Hygiene | 10% | 6.0/10 | 0.60 | 89 violations (82% auto-fixable). Dominated by unused imports (F401=61). 8 duplicate dict keys (F601) need manual fix. Rubric: <100, mostly auto-fixable = 6. |
| 4 | Architecture | 15% | 9.0/10 | 1.35 | Clean 4-layer design (data→transform→protocol→web). Plugin system present. No layer violations found. 6/11 modules export `__all__`. Docked 1 point for 5 missing `__all__`. |
| 5 | Documentation | 10% | 9.0/10 | 0.90 | 91% docstring coverage (>90%). README with 20 Mermaid diagrams. No generated API reference docs, so not a full 10. |
| 6 | Complexity | 10% | 7.0/10 | 0.70 | Median 133 (<150). 6 files >300 lines. `mme.py` at 693/80 complexity is the outlier — justified as a format parser. Interpolated between 8 (≤3 files >300) and 6 (≤10 files >300). |
| 7 | Security | 10% | 8.5/10 | 0.85 | Single eval sandboxed with `{"__builtins__": {}}` + 16-item token blocklist. No subprocess, no secrets, no bare excepts. Interpolated between 9 (fully sandboxed) and 7 (partially sandboxed). |
| 8 | Maintainability | 10% | 8.5/10 | 0.85 | Zero debt markers. Zero bare excepts. 48 logging calls across codebase. Modern tooling (uv, hatchling, ruff, mypy). Between 10 (perfect) and 8 (<5 markers). |
### Composite Score: **78.3 / 100**
### Grade: **B**
Calculation: (8.0×0.20 + 6.5×0.15 + 6.0×0.10 + 9.0×0.15 + 9.0×0.10 + 7.0×0.10 + 8.5×0.10 + 8.5×0.10) × 10
= (1.60 + 0.975 + 0.60 + 1.35 + 0.90 + 0.70 + 0.85 + 0.85) × 10
= 7.825 × 10 = 78.25 → **78.3**
---
## Delta from Previous Assessment
| Dimension | Previous | Current | Change |
|-----------|----------|---------|--------|
| Test Health | 8.0 | 8.0 | 0.0 |
| Type Safety | 6.5 | 6.5 | 0.0 |
| Lint Hygiene | 6.0 | 6.0 | 0.0 |
| Architecture | 9.0 | 9.0 | 0.0 |
| Documentation | 9.0 | 9.0 | 0.0 |
| Complexity | 7.0 | 7.0 | 0.0 |
| Security | 8.5 | 8.5 | 0.0 |
| Maintainability | 8.5 | 8.5 | 0.0 |
| **Composite** | **78.3** | **78.3** | **0.0** |
---
## Top Improvements Since Last Assessment
No code changes detected since QA-2026-04-11_0528 — all raw metrics are identical.
---
## Recommended Actions (Priority Order)
| # | Action | Effort | Impact | Dimensions Affected |
|---|--------|--------|--------|---------------------|
| 1 | Run `uv run ruff check --fix src/` to clear 73 auto-fixable violations | 1 min | +2.0 lint → 8.0 | Lint Hygiene |
| 2 | Manually fix 8 duplicate dict keys (`F601`) in `channel/lookup.py` and remaining non-auto-fixable lint violations | 15 min | +1.0 lint → 9.0+ | Lint Hygiene |
| 3 | Add `--cov --cov-report=term-missing` to pytest config; target ≥80% branch coverage | 30 min | +1.0 test → 9.0 | Test Health |
| 4 | Resolve 17 `[type-arg]` mypy errors (add `dict[str, X]` / `list[X]` generics, primarily in web layer) | 1 hr | +1.0 type → 7.5 | Type Safety |
| 5 | Add `__all__` to `io`, `plugin`, `report`, `script`, `template` modules | 30 min | +0.5 arch → 9.5 | Architecture |
**Projected composite after actions 1-5: ~85 (B+)**
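
The projection can be cross-checked by converting each action's dimension gain into composite points (gain × weight × 10). This is a rough sketch under stated assumptions: the names `BASELINE`, `ACTIONS`, and `composite_gain` are illustrative, the lint target of "9.0+" is taken as exactly 9.0, and every action is assumed to deliver its full estimated impact.

```python
from decimal import Decimal

BASELINE = Decimal("78.25")  # unrounded composite from the calculation above

# (dimension, weight, expected score gain) taken from the actions table.
ACTIONS = [
    ("Lint Hygiene", Decimal("0.10"), Decimal("3.0")),   # actions 1-2: 6.0 -> 9.0
    ("Test Health", Decimal("0.20"), Decimal("1.0")),    # action 3: 8.0 -> 9.0
    ("Type Safety", Decimal("0.15"), Decimal("1.0")),    # action 4: 6.5 -> 7.5
    ("Architecture", Decimal("0.15"), Decimal("0.5")),   # action 5: 9.0 -> 9.5
]


def composite_gain(weight, delta):
    """Composite points (0-100 scale) gained from a 0-10 dimension delta."""
    return weight * delta * 10


gains = {name: composite_gain(weight, delta) for name, weight, delta in ACTIONS}
projected = BASELINE + sum(gains.values())

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f}")
print(f"projected composite: {projected:.1f}")
```

Under these assumptions the total comes to 85.5, in line with the ~85 estimate. The lint fixes contribute the most composite points (3.0) for the least effort, which matches the priority order of the table.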
---
## Notes
- **No code changes detected** between this assessment and QA-2026-04-11_0528. All 72 source files and 30 test files are unchanged, yielding identical metrics and scores for the third consecutive assessment.
- **Architecture is qualitative.** Import graph inspected: no layer violations found. The `web` module sits at the top of the dependency tree; `io`/`transform`/`protocol` layers do not import from `web` or `plot`.
- **Security eval in `math_expr.py`** is sandboxed via empty `__builtins__` dict and a 16-entry token blocklist (including `import`, `exec`, `eval`, `subprocess`, `os.`, `sys.`, `__`). The `# noqa: S307` comment excludes it from the `grep -v '# noqa'` scan. An AST-based evaluator would be safer but is lower priority given existing mitigations.
- **Subprocess grep hit** is a confirmed false positive: the string `"subprocess"` appears only as a forbidden-token blocklist entry at `math_expr.py:70`, not as an actual invocation.
- **Complexity scoring for `mme.py`** remains lenient: ISO 13499 format parsers inherently carry high branch density. Consider extracting sub-parsers if it grows beyond ~800 lines.
- **The three assessments today (0459, 0528, 0619) are identical** because no source code was modified between runs. The recommended actions above remain the highest-value next steps.
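
The import-graph inspection described in the architecture note was done by hand; a check along the following lines could automate it. This is a minimal sketch, assuming the package is named `impakt` (as the `src/impakt/...` paths suggest) and using a hypothetical `FORBIDDEN` rule table. It only looks at absolute imports, so relative imports would need extra handling.

```python
import ast

# Hypothetical rule table: lower layers must not import from these subpackages.
FORBIDDEN = {
    "io": {"web", "plot"},
    "transform": {"web", "plot"},
    "protocol": {"web", "plot"},
}


def imported_subpackages(source: str, package: str = "impakt") -> set[str]:
    """Return the first-level subpackages of `package` imported by `source`."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == package:
                # `from impakt import web` names the subpackage in the alias.
                names = [f"{package}.{alias.name}" for alias in node.names]
            else:
                names = [node.module]
        else:
            continue  # not an import, or a relative import (ignored here)
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == package:
                found.add(parts[1])
    return found


def layer_violations(layer: str, source: str) -> set[str]:
    """Subpackages imported by `source` that `layer` is not allowed to use."""
    return imported_subpackages(source) & FORBIDDEN.get(layer, set())


sample = "from impakt.web.state import AppState\nimport numpy\n"
print(layer_violations("io", sample))  # {'web'}
```

Run over every file in a layer, an empty result for all files would correspond to the "no layer violations found" statement above.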


@@ -267,12 +267,16 @@ composite = (
Copy `docs/QA-TEMPLATE.md` to `docs/QA-<YYYY-MM-DD_HHMM>.md` and fill in:
-1. All raw metric values
-2. All dimension scores with brief justification
-3. Composite score and grade
-4. Delta from previous assessment (if one exists)
-5. Top 3-5 actionable improvements
-6. Acknowledgment of any scoring judgment calls
+1. **Summary block at the top** (immediately after the `#` heading):
+   - Blockquote with composite score, grade, and one-sentence summary
+   - Quick-reference dimension score table
+   - The summary sentence should note overall quality posture and the most significant change since the last assessment (or "Baseline assessment" if first run)
+2. All raw metric values
+3. All dimension scores with brief justification
+4. Composite score and grade
+5. Delta from previous assessment (if one exists)
+6. Top 3-5 actionable improvements
+7. Acknowledgment of any scoring judgment calls
---


@@ -1,5 +1,19 @@
# Quality Assessment -- [DATE]
> **Score: [ ] / 100 — Grade: [ ]**
> [ one-sentence summary of overall quality and key movement since last assessment ]
| Dimension | Score |
|-----------|-------|
| Test Health | /10 |
| Type Safety | /10 |
| Lint Hygiene | /10 |
| Architecture | /10 |
| Documentation | /10 |
| Complexity | /10 |
| Security | /10 |
| Maintainability | /10 |
**Version:** [VERSION]
**Assessed by:** [human / LLM model]
**Previous assessment:** [filename or "None (baseline)"]