Initial commit: localgenai stack
Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.
- pyinfra/framework/: pyinfra deploy targeting the box
- llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
for gfx1151), OpenWebUI
- Beszel (host + container + AMD GPU dashboard via sysfs)
- OpenLIT (LLM fleet metrics)
- Phoenix (per-trace agent waterfall)
- OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
- install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
109
testing/qwen3-coder-30b/README.md
Normal file
109
testing/qwen3-coder-30b/README.md
Normal file
@@ -0,0 +1,109 @@
|
||||
# Local LLM Performance Testing Framework
|
||||
|
||||
A comprehensive framework for comparing the performance and quality of different local hosted LLMs on coding tasks.
|
||||
|
||||
## Features
|
||||
|
||||
- **Performance Metrics**: Response time, tokens per second, CPU/memory/GPU usage
|
||||
- **Quality Assessment**: Automated code quality evaluation
|
||||
- **Multi-Model Testing**: Test multiple LLMs simultaneously
|
||||
- **Parallel Execution**: Run tests efficiently
|
||||
- **Comprehensive Reporting**: Detailed test results and comparisons
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
# Install required dependencies
|
||||
pip install psutil
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Quick Start
|
||||
|
||||
1. Create a test suite and add models:
|
||||
|
||||
```python
|
||||
from llm_test_framework import TestSuite, MockLLM
|
||||
|
||||
# Create test suite
|
||||
suite = TestSuite()
|
||||
|
||||
# Add models to test
|
||||
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
|
||||
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))
|
||||
|
||||
# Define test tasks
|
||||
tasks = [
|
||||
{
|
||||
"name": "Simple function",
|
||||
"prompt": "Write a Python function to add two numbers",
|
||||
"expected_output": "return a + b"
|
||||
}
|
||||
]
|
||||
|
||||
# Run tests
|
||||
suite.run_all_tests(tasks)
|
||||
suite.save_results()
|
||||
```
|
||||
|
||||
### Running the Example
|
||||
|
||||
```bash
|
||||
python llm_test_framework.py
|
||||
```
|
||||
|
||||
This will run sample tests with mock models and save results to JSON.
|
||||
|
||||
## Components
|
||||
|
||||
### 1. TestSuite
|
||||
Main class managing the testing process, including:
|
||||
- Model registration
|
||||
- Test execution
|
||||
- Result collection and saving
|
||||
|
||||
### 2. LLMInterface
|
||||
Abstract base class for LLM implementations with methods:
|
||||
- `generate()`: Generate text from prompt
|
||||
- `get_model_info()`: Get model information
|
||||
|
||||
### 3. TestResult
|
||||
Data class storing individual test results with:
|
||||
- Performance metrics
|
||||
- Quality scores
|
||||
- Raw outputs
|
||||
- Success status
|
||||
|
||||
## Extending the Framework
|
||||
|
||||
To add support for a new LLM:
|
||||
|
||||
1. Create a new class inheriting from `LLMInterface`
|
||||
2. Implement `generate()` and `get_model_info()` methods
|
||||
3. Add your model to the test suite
|
||||
|
||||
## Test Types
|
||||
|
||||
The framework supports multiple types of coding tasks:
|
||||
- Basic function implementations
|
||||
- Code refactoring examples
|
||||
- Algorithm challenges
|
||||
- System design questions
|
||||
|
||||
## Output
|
||||
|
||||
Results are saved in JSON format with:
|
||||
- Performance metrics (response time, TPS)
|
||||
- Resource usage (CPU, memory, GPU)
|
||||
- Quality assessment scores
|
||||
- Raw model outputs
|
||||
- Success/failure indicators
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
|
||||
- Advanced code quality evaluation using static analysis
|
||||
- Web-based dashboard for results visualization
|
||||
- Database storage for historical comparisons
|
||||
- Automated statistical analysis of results
|
||||
Reference in New Issue
Block a user