# Local LLM Performance Testing Framework

A comprehensive framework for comparing the performance and quality of different locally hosted LLMs on coding tasks.

## Features

- **Performance Metrics**: Response time, tokens per second, and CPU/memory/GPU usage
- **Quality Assessment**: Automated code quality evaluation
- **Multi-Model Testing**: Compare multiple LLMs in a single run
- **Parallel Execution**: Run tests concurrently to keep total runtime low
- **Comprehensive Reporting**: Detailed test results and comparisons

## Installation

```bash
# Install required dependencies
pip install psutil
```

## Usage

### Quick Start

1. Create a test suite and add models:

```python
from llm_test_framework import TestSuite, MockLLM

# Create test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests
suite.run_all_tests(tasks)
suite.save_results()
```

### Running the Example

```bash
python llm_test_framework.py
```

This runs sample tests with mock models and saves the results to JSON.

## Components

### 1. TestSuite

Main class managing the testing process, including:

- Model registration
- Test execution
- Result collection and saving

### 2. LLMInterface

Abstract base class for LLM implementations with methods:

- `generate()`: Generate text from a prompt
- `get_model_info()`: Get model information

### 3. TestResult

Data class storing individual test results with:

- Performance metrics
- Quality scores
- Raw outputs
- Success status

## Extending the Framework

To add support for a new LLM:

1. Create a new class inheriting from `LLMInterface`
2. Implement the `generate()` and `get_model_info()` methods
3. Add your model to the test suite

A minimal sketch of these steps appears at the end of this README.

## Test Types

The framework supports multiple types of coding tasks:

- Basic function implementations
- Code refactoring examples
- Algorithm challenges
- System design questions

## Output

Results are saved in JSON format with:

- Performance metrics (response time, tokens per second)
- Resource usage (CPU, memory, GPU)
- Quality assessment scores
- Raw model outputs
- Success/failure indicators

An illustrative record layout is sketched at the end of this README.

## Future Enhancements

- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
- Advanced code quality evaluation using static analysis
- Web-based dashboard for results visualization
- Database storage for historical comparisons
- Automated statistical analysis of results
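## Example: Adding a Custom LLM Backend (Sketch)

The `LLMInterface` name and its `generate()`/`get_model_info()` methods come from the framework itself; everything else below (the constructor signature, the llama.cpp server URL, and the request/response fields) is an assumption made for illustration, so treat this as a sketch rather than the framework's actual adapter.

```python
import json
import urllib.request

from llm_test_framework import LLMInterface  # assumed to be importable like TestSuite/MockLLM


class LlamaCppServerLLM(LLMInterface):
    """Hypothetical adapter for a llama.cpp HTTP server (endpoint and fields assumed)."""

    def __init__(self, name: str, base_url: str = "http://localhost:8080"):
        self.name = name
        self.base_url = base_url

    def generate(self, prompt: str) -> str:
        # Assumed llama.cpp server /completion endpoint; adjust to your local setup.
        payload = json.dumps({"prompt": prompt, "n_predict": 256}).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/completion",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            body = json.loads(response.read().decode("utf-8"))
        return body.get("content", "")

    def get_model_info(self) -> dict:
        return {"name": self.name, "backend": "llama.cpp server", "url": self.base_url}
```

Once defined, the class is registered the same way as the mock models in the Quick Start, e.g. `suite.add_model(LlamaCppServerLLM("llama3-8b"))` (the model name here is hypothetical).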
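## Example: Result Record Layout (Sketch)

The field groups below (performance metrics, resource usage, quality score, raw output, success flag) follow the `TestResult` and Output sections above, but the concrete field names, types, and placeholder values are assumptions for illustration only, not the framework's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExampleTestResult:
    """Illustrative shape of one saved test record; all field names are assumed."""
    model_name: str
    task_name: str
    response_time_s: float     # wall-clock time for the generation
    tokens_per_second: float   # throughput observed during the run
    cpu_percent: float         # peak CPU usage
    memory_mb: float           # peak resident memory
    gpu_percent: float         # GPU utilisation, if available
    quality_score: float       # automated code-quality score
    raw_output: str            # unmodified model response
    success: bool              # whether the expected output was produced


# Placeholder values only -- shows how such a record could be serialised to JSON
# in the spirit of save_results().
record = ExampleTestResult(
    model_name="gpt4all",
    task_name="Simple function",
    response_time_s=0.0,
    tokens_per_second=0.0,
    cpu_percent=0.0,
    memory_mb=0.0,
    gpu_percent=0.0,
    quality_score=0.0,
    raw_output="def add(a, b):\n    return a + b",
    success=True,
)
print(json.dumps(asdict(record), indent=2))
```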