# Local LLM Performance Testing Framework

A comprehensive framework for comparing the performance and quality of different locally hosted LLMs on coding tasks.

## Features

- **Performance Metrics**: Response time, tokens per second, and CPU/memory/GPU usage
- **Quality Assessment**: Automated code quality evaluation
- **Multi-Model Testing**: Compare multiple LLMs in a single run
- **Parallel Execution**: Run tests concurrently to keep total runtime low
- **Comprehensive Reporting**: Detailed test results and comparisons

## Installation

```bash
# Install required dependencies
pip install psutil
```

## Usage

### Quick Start

1. Create a test suite and add models:

```python
from llm_test_framework import TestSuite, MockLLM

# Create test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests
suite.run_all_tests(tasks)
suite.save_results()
```

### Running the Example

```bash
python llm_test_framework.py
```

This runs sample tests with mock models and saves the results to JSON.

## Components

### 1. TestSuite

Main class managing the testing process, including:

- Model registration
- Test execution
- Result collection and saving

### 2. LLMInterface

Abstract base class for LLM implementations with methods:

- `generate()`: Generate text from a prompt
- `get_model_info()`: Get model information

### 3. TestResult

Data class storing individual test results with:

- Performance metrics
- Quality scores
- Raw outputs
- Success status

## Extending the Framework

To add support for a new LLM:

1. Create a new class inheriting from `LLMInterface`
2. Implement the `generate()` and `get_model_info()` methods
3. Add your model to the test suite

A minimal sketch of these steps appears at the end of this README.

## Test Types

The framework supports multiple types of coding tasks:

- Basic function implementations
- Code refactoring examples
- Algorithm challenges
- System design questions

## Output

Results are saved in JSON format with:

- Performance metrics (response time, tokens per second)
- Resource usage (CPU, memory, GPU)
- Quality assessment scores
- Raw model outputs
- Success/failure indicators

An illustrative record layout is sketched at the end of this README.

## Future Enhancements

- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
- Advanced code quality evaluation using static analysis
- Web-based dashboard for results visualization
- Database storage for historical comparisons
- Automated statistical analysis of results
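## Example: Adding a Custom LLM Backend (Sketch)

The `LLMInterface` name and its `generate()`/`get_model_info()` methods come from the framework itself; everything else below (the constructor signature, the llama.cpp server URL, and the request/response fields) is an assumption made for illustration, so treat this as a sketch rather than the framework's actual adapter.

```python
import json
import urllib.request

from llm_test_framework import LLMInterface  # assumed to be importable like TestSuite/MockLLM


class LlamaCppServerLLM(LLMInterface):
    """Hypothetical adapter for a llama.cpp HTTP server (endpoint and fields assumed)."""

    def __init__(self, name: str, base_url: str = "http://localhost:8080"):
        self.name = name
        self.base_url = base_url

    def generate(self, prompt: str) -> str:
        # Assumed llama.cpp server /completion endpoint; adjust to your local setup.
        payload = json.dumps({"prompt": prompt, "n_predict": 256}).encode("utf-8")
        request = urllib.request.Request(
            f"{self.base_url}/completion",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            body = json.loads(response.read().decode("utf-8"))
        return body.get("content", "")

    def get_model_info(self) -> dict:
        return {"name": self.name, "backend": "llama.cpp server", "url": self.base_url}
```

Once defined, the class is registered the same way as the mock models in the Quick Start, e.g. `suite.add_model(LlamaCppServerLLM("llama3-8b"))` (the model name here is hypothetical).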
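## Example: Result Record Layout (Sketch)

The field groups below (performance metrics, resource usage, quality score, raw output, success flag) follow the `TestResult` and Output sections above, but the concrete field names, types, and placeholder values are assumptions for illustration only, not the framework's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ExampleTestResult:
    """Illustrative shape of one saved test record; all field names are assumed."""
    model_name: str
    task_name: str
    response_time_s: float     # wall-clock time for the generation
    tokens_per_second: float   # throughput observed during the run
    cpu_percent: float         # peak CPU usage
    memory_mb: float           # peak resident memory
    gpu_percent: float         # GPU utilisation, if available
    quality_score: float       # automated code-quality score
    raw_output: str            # unmodified model response
    success: bool              # whether the expected output was produced


# Placeholder values only -- shows how such a record could be serialised to JSON
# in the spirit of save_results().
record = ExampleTestResult(
    model_name="gpt4all",
    task_name="Simple function",
    response_time_s=0.0,
    tokens_per_second=0.0,
    cpu_percent=0.0,
    memory_mb=0.0,
    gpu_percent=0.0,
    quality_score=0.0,
    raw_output="def add(a, b):\n    return a + b",
    success=True,
)
print(json.dumps(asdict(record), indent=2))
```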