# Local LLM Performance Testing Framework

A comprehensive framework for comparing the performance and output quality of locally hosted LLMs on coding tasks.

## Features

- **Performance Metrics**: Response time, tokens per second, CPU/memory/GPU usage
- **Quality Assessment**: Automated code quality evaluation
- **Multi-Model Testing**: Compare multiple LLMs in a single run
- **Parallel Execution**: Run tests concurrently to reduce total wall-clock time
- **Comprehensive Reporting**: Detailed test results and comparisons
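
The timing metrics above can be gathered with a small helper like the sketch below. Token counts here are approximated by whitespace splitting, since exact counts depend on each model's tokenizer; the function name and return shape are illustrative, not the framework's actual API.

```python
import time

def measure_generation(generate, prompt):
    """Time one generation call and estimate tokens per second.

    `generate` is any callable that takes a prompt and returns the
    model's text output. Token counts are approximated by splitting
    on whitespace, which overstates or understates real token counts
    depending on the tokenizer.
    """
    start = time.perf_counter()
    output = generate(prompt)
    elapsed = time.perf_counter() - start
    tokens = len(output.split())
    return {
        "response_time_s": elapsed,
        "tokens_per_second": tokens / elapsed if elapsed > 0 else 0.0,
        "output": output,
    }
```

Resource usage (CPU, memory) would be sampled alongside this, e.g. with `psutil`, which is why it appears in the installation step.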

## Installation

```bash
# Install required dependencies
pip install psutil
```

## Usage

### Quick Start

1. Create a test suite and add models:

```python
from llm_test_framework import TestSuite, MockLLM

# Create the test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests and save the results
suite.run_all_tests(tasks)
suite.save_results()
```

### Running the Example

```bash
python llm_test_framework.py
```

This will run sample tests with mock models and save results to JSON.

## Components

### 1. TestSuite

Main class managing the testing process, including:

- Model registration
- Test execution
- Result collection and saving

### 2. LLMInterface

Abstract base class for LLM implementations with methods:

- `generate()`: Generate text from a prompt
- `get_model_info()`: Get model information

### 3. TestResult

Data class storing individual test results with:

- Performance metrics
- Quality scores
- Raw outputs
- Success status
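
A data class along these lines could hold those fields; the exact field names below are assumptions, not the framework's actual definition.

```python
from dataclasses import dataclass, asdict

@dataclass
class TestResult:
    # Field names are illustrative; the framework's real TestResult
    # may name or group these differently.
    model_name: str
    task_name: str
    response_time_s: float
    tokens_per_second: float
    quality_score: float
    raw_output: str
    success: bool

    def to_dict(self):
        # Convenient for JSON serialization when saving results.
        return asdict(self)
```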

## Extending the Framework

To add support for a new LLM:

1. Create a new class inheriting from `LLMInterface`
2. Implement `generate()` and `get_model_info()` methods
3. Add your model to the test suite
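
The steps above might look like the sketch below. A minimal stand-in for `LLMInterface` is included only so the example is self-contained; in practice you would import the real class from the framework, and `EchoLLM` is a hypothetical adapter whose `generate()` you would replace with a call to your local model (e.g. an HTTP request to a llama.cpp server).

```python
from abc import ABC, abstractmethod

# Stand-in for the framework's LLMInterface, reproduced here only to
# make the sketch runnable on its own; import the real class instead.
class LLMInterface(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

    @abstractmethod
    def get_model_info(self) -> dict: ...

class EchoLLM(LLMInterface):
    """Hypothetical adapter for a locally hosted model."""

    def __init__(self, name: str, config: dict):
        self.name = name
        self.config = config

    def generate(self, prompt: str) -> str:
        # Placeholder: a real adapter would invoke the model here.
        return f"[{self.name}] response to: {prompt}"

    def get_model_info(self) -> dict:
        return {"name": self.name, **self.config}
```

Once defined, the adapter is registered exactly like the mock models in the Quick Start: `suite.add_model(EchoLLM("my-model", {"model_path": "/path/to/model"}))`.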

## Test Types

The framework supports multiple types of coding tasks:

- Basic function implementations
- Code refactoring examples
- Algorithm challenges
- System design questions
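
Each type can be expressed with the same task schema used in the Quick Start. The examples below are illustrative; treating `expected_output` as a substring the model's answer should contain mirrors the Quick Start example, but is an assumption about how the framework scores answers.

```python
# One illustrative task per category, using the name/prompt/
# expected_output schema from the Quick Start.
tasks = [
    {
        "name": "Basic function",
        "prompt": "Write a Python function that reverses a string",
        "expected_output": "[::-1]",
    },
    {
        "name": "Refactoring",
        "prompt": "Refactor a for-loop that builds a list of squares "
                  "into a list comprehension",
        "expected_output": "for",
    },
    {
        "name": "Algorithm challenge",
        "prompt": "Implement binary search over a sorted list of integers",
        "expected_output": "mid",
    },
    {
        "name": "System design",
        "prompt": "Outline a rate limiter for a web API and sketch its core class",
        "expected_output": "class",
    },
]
```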

## Output

Results are saved in JSON format with:

- Performance metrics (response time, TPS)
- Resource usage (CPU, memory, GPU)
- Quality assessment scores
- Raw model outputs
- Success/failure indicators
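
Saved results can then be post-processed for comparison. The helper below is a sketch that assumes the JSON decodes to a list of records each carrying `model_name` and `tokens_per_second` keys; adjust the field names to whatever your run actually produces.

```python
import json

def summarize_results(results):
    """Rank models by mean tokens/sec across their test records.

    Assumes each record is a dict with "model_name" and
    "tokens_per_second" keys (an assumption about the saved format).
    """
    by_model = {}
    for record in results:
        by_model.setdefault(record["model_name"], []).append(
            record["tokens_per_second"]
        )
    return sorted(
        ((model, sum(tps) / len(tps)) for model, tps in by_model.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

# Load the file written by suite.save_results() and summarize it:
# with open("results.json") as f:  # the filename is an assumption
#     print(summarize_results(json.load(f)))
```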

## Future Enhancements

- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
- Advanced code quality evaluation using static analysis
- Web-based dashboard for results visualization
- Database storage for historical comparisons
- Automated statistical analysis of results