# Local LLM Performance Testing Framework

A comprehensive framework for comparing the performance and output quality of locally hosted LLMs on coding tasks.

## Features

- **Performance Metrics**: Response time, tokens per second, CPU/memory/GPU usage
- **Quality Assessment**: Automated code quality evaluation
- **Multi-Model Testing**: Compare multiple LLMs in a single run
- **Parallel Execution**: Run tests concurrently to reduce total wall-clock time
- **Comprehensive Reporting**: Detailed test results and comparisons
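
The timing metrics above can be gathered with a small helper like the sketch below. Token counts here are approximated by whitespace splitting, since exact counts depend on each model's tokenizer; the function name and return shape are illustrative, not the framework's actual API.

```python
import time

def measure_generation(generate, prompt):
    """Time one generation call and estimate tokens per second.

    `generate` is any callable that takes a prompt and returns the
    model's text output. Token counts are approximated by splitting
    on whitespace, which overstates or understates real token counts
    depending on the tokenizer.
    """
    start = time.perf_counter()
    output = generate(prompt)
    elapsed = time.perf_counter() - start
    tokens = len(output.split())
    return {
        "response_time_s": elapsed,
        "tokens_per_second": tokens / elapsed if elapsed > 0 else 0.0,
        "output": output,
    }
```

Resource usage (CPU, memory) would be sampled alongside this, e.g. with `psutil`, which is why it appears in the installation step.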

## Installation

```bash
# Install required dependencies
pip install psutil
```

## Usage

### Quick Start

1. Create a test suite and add models:

```python
from llm_test_framework import TestSuite, MockLLM

# Create the test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests and save the results
suite.run_all_tests(tasks)
suite.save_results()
```

### Running the Example

```bash
python llm_test_framework.py
```

This will run sample tests with mock models and save results to JSON.

## Components

### 1. TestSuite

Main class managing the testing process, including:

- Model registration
- Test execution
- Result collection and saving

### 2. LLMInterface

Abstract base class for LLM implementations with methods:

- `generate()`: Generate text from a prompt
- `get_model_info()`: Get model information

### 3. TestResult

Data class storing individual test results with:

- Performance metrics
- Quality scores
- Raw outputs
- Success status
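
A data class along these lines could hold those fields; the exact field names below are assumptions, not the framework's actual definition.

```python
from dataclasses import dataclass, asdict

@dataclass
class TestResult:
    # Field names are illustrative; the framework's real TestResult
    # may name or group these differently.
    model_name: str
    task_name: str
    response_time_s: float
    tokens_per_second: float
    quality_score: float
    raw_output: str
    success: bool

    def to_dict(self):
        # Convenient for JSON serialization when saving results.
        return asdict(self)
```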

## Extending the Framework

To add support for a new LLM:

1. Create a new class inheriting from `LLMInterface`
2. Implement `generate()` and `get_model_info()` methods
3. Add your model to the test suite
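
The steps above might look like the sketch below. A minimal stand-in for `LLMInterface` is included only so the example is self-contained; in practice you would import the real class from the framework, and `EchoLLM` is a hypothetical adapter whose `generate()` you would replace with a call to your local model (e.g. an HTTP request to a llama.cpp server).

```python
from abc import ABC, abstractmethod

# Stand-in for the framework's LLMInterface, reproduced here only to
# make the sketch runnable on its own; import the real class instead.
class LLMInterface(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def get_model_info(self) -> dict: ...

class EchoLLM(LLMInterface):
    """Hypothetical adapter for a locally hosted model."""

    def __init__(self, name: str, config: dict):
        self.name = name
        self.config = config

    def generate(self, prompt: str) -> str:
        # Placeholder: a real adapter would invoke the model here.
        return f"[{self.name}] response to: {prompt}"

    def get_model_info(self) -> dict:
        return {"name": self.name, **self.config}
```

Once defined, the adapter is registered exactly like the mock models in the Quick Start: `suite.add_model(EchoLLM("my-model", {"model_path": "/path/to/model"}))`.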

## Test Types

The framework supports multiple types of coding tasks:

- Basic function implementations
- Code refactoring examples
- Algorithm challenges
- System design questions
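
Each type can be expressed with the same task schema used in the Quick Start. The examples below are illustrative; treating `expected_output` as a substring the model's answer should contain mirrors the Quick Start example, but is an assumption about how the framework scores answers.

```python
# One illustrative task per category, using the name/prompt/
# expected_output schema from the Quick Start.
tasks = [
    {
        "name": "Basic function",
        "prompt": "Write a Python function that reverses a string",
        "expected_output": "[::-1]",
    },
    {
        "name": "Refactoring",
        "prompt": "Refactor a for-loop that builds a list of squares "
                  "into a list comprehension",
        "expected_output": "for",
    },
    {
        "name": "Algorithm challenge",
        "prompt": "Implement binary search over a sorted list of integers",
        "expected_output": "mid",
    },
    {
        "name": "System design",
        "prompt": "Outline a rate limiter for a web API and sketch its core class",
        "expected_output": "class",
    },
]
```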

## Output

Results are saved in JSON format with:

- Performance metrics (response time, TPS)
- Resource usage (CPU, memory, GPU)
- Quality assessment scores
- Raw model outputs
- Success/failure indicators
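
Saved results can then be post-processed for comparison. The helper below is a sketch that assumes the JSON decodes to a list of records each carrying `model_name` and `tokens_per_second` keys; adjust the field names to whatever your run actually produces.

```python
import json

def summarize_results(results):
    """Rank models by mean tokens/sec across their test records.

    Assumes each record is a dict with "model_name" and
    "tokens_per_second" keys (an assumption about the saved format).
    """
    by_model = {}
    for record in results:
        by_model.setdefault(record["model_name"], []).append(
            record["tokens_per_second"]
        )
    return sorted(
        ((model, sum(tps) / len(tps)) for model, tps in by_model.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Load the file written by suite.save_results() and summarize it:
# with open("results.json") as f:  # the filename is an assumption
#     print(summarize_results(json.load(f)))
```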

## Future Enhancements

- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
- Advanced code quality evaluation using static analysis
- Web-based dashboard for results visualization
- Database storage for historical comparisons
- Automated statistical analysis of results