Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.
- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md: documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local LLM Performance Testing Framework
A framework for comparing the performance and output quality of different locally hosted LLMs on coding tasks.
Features
- Performance Metrics: Response time, tokens per second, and CPU/memory/GPU usage (see the measurement sketch after this list)
- Quality Assessment: Automated code quality evaluation
- Multi-Model Testing: Register and compare multiple LLMs in a single run
- Parallel Execution: Run tests concurrently to reduce total wall-clock time
- Comprehensive Reporting: Detailed test results and comparisons
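Performance metrics are collected around each generation call. Below is a minimal sketch of that kind of measurement, assuming psutil (the only listed dependency, installed in the next section) is used for CPU and memory sampling; the function name and the whitespace token count are illustrative, not taken from the framework's internals.
import time
import psutil

def measure_generation(llm, prompt):
    # Illustrative measurement wrapper; the framework's real internals may differ.
    process = psutil.Process()
    cpu_before = process.cpu_times().user
    start = time.perf_counter()

    output = llm.generate(prompt)

    elapsed = time.perf_counter() - start
    cpu_after = process.cpu_times().user
    rss_mb = process.memory_info().rss / (1024 * 1024)

    # Rough whitespace-based token count; a real tokenizer would be more accurate.
    tokens = len(output.split())
    return {
        "response_time_s": elapsed,
        "tokens_per_second": tokens / elapsed if elapsed > 0 else 0.0,
        "cpu_time_s": cpu_after - cpu_before,
        "memory_mb": rss_mb,
    }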
Installation
# Install required dependencies
pip install psutil
Usage
Quick Start
- Create a test suite and add models:
from llm_test_framework import TestSuite, MockLLM
# Create test suite
suite = TestSuite()
# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))
# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]
# Run tests
suite.run_all_tests(tasks)
suite.save_results()
Running the Example
python llm_test_framework.py
This will run sample tests with mock models and save results to JSON.
Components
1. TestSuite
Main class managing the testing process, including:
- Model registration
- Test execution
- Result collection and saving
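A rough skeleton of these responsibilities is shown below. It is a sketch built from the calls used in the Quick Start (add_model, run_all_tests, save_results); the internal helper _run_single_test and the default output filename are assumptions, not part of the documented API.
import dataclasses
import json

class TestSuite:
    # Illustrative skeleton only; the shipped class carries more behaviour.
    def __init__(self):
        self.models = []    # registered LLMInterface implementations
        self.results = []   # collected TestResult objects

    def add_model(self, model):
        self.models.append(model)

    def run_all_tests(self, tasks):
        for model in self.models:
            for task in tasks:
                self.results.append(self._run_single_test(model, task))

    def _run_single_test(self, model, task):
        # Hypothetical helper: time the call, score the output, build a TestResult.
        ...

    def save_results(self, path="test_results.json"):
        with open(path, "w") as f:
            json.dump([dataclasses.asdict(r) for r in self.results], f, indent=2)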
2. LLMInterface
Abstract base class for LLM implementations with methods:
- generate(): Generate text from a prompt
- get_model_info(): Get model information
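A minimal sketch of what such a base class can look like, assuming Python's abc module; the constructor signature mirrors how MockLLM is instantiated in the Quick Start and is not guaranteed to match the shipped class.
from abc import ABC, abstractmethod

class LLMInterface(ABC):
    # Illustrative sketch of the abstract interface described above.
    def __init__(self, name, config):
        self.name = name      # e.g. "llama.cpp"
        self.config = config  # backend-specific settings, e.g. {"model_path": ...}

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Generate text from a prompt."""

    @abstractmethod
    def get_model_info(self) -> dict:
        """Return model metadata such as name and backend."""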
3. TestResult
Data class storing individual test results with:
- Performance metrics
- Quality scores
- Raw outputs
- Success status
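An illustrative field layout, assuming a standard dataclass; the actual field names and grouping in the framework may differ.
from dataclasses import dataclass

@dataclass
class TestResult:
    # Illustrative fields covering the categories listed above.
    model_name: str
    task_name: str
    response_time_s: float
    tokens_per_second: float
    cpu_percent: float
    memory_mb: float
    gpu_percent: float
    quality_score: float
    raw_output: str
    success: bool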
Extending the Framework
To add support for a new LLM:
- Create a new class inheriting from LLMInterface
- Implement the generate() and get_model_info() methods
- Add your model to the test suite
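As a hypothetical example, the adapter below wires the framework to a locally running llama.cpp server over HTTP. The endpoint, payload, default URL, and constructor signature are assumptions (and requests is an extra dependency); adapt them to whatever API your backend actually exposes.
import requests
from llm_test_framework import LLMInterface, TestSuite

class LlamaCppHTTP(LLMInterface):
    # Hypothetical adapter for a llama.cpp server; not part of the framework.
    def __init__(self, name, config):
        super().__init__(name, config)
        self.base_url = config.get("base_url", "http://localhost:8080")

    def generate(self, prompt: str) -> str:
        resp = requests.post(
            f"{self.base_url}/completion",
            json={"prompt": prompt, "n_predict": 256},
            timeout=300,
        )
        resp.raise_for_status()
        return resp.json().get("content", "")

    def get_model_info(self) -> dict:
        return {"name": self.name, "backend": "llama.cpp", "url": self.base_url}

suite = TestSuite()
suite.add_model(LlamaCppHTTP("llama.cpp-30b", {"base_url": "http://localhost:8080"}))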
Test Types
The framework supports multiple types of coding tasks:
- Basic function implementations
- Code refactoring examples
- Algorithm challenges
- System design questions
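The task format from the Quick Start covers all of these categories. The prompts and expected_output substrings below are illustrative examples only, not part of the shipped suite.
tasks = [
    {"name": "Basic function",
     "prompt": "Write a Python function that reverses a string",
     "expected_output": "[::-1]"},
    {"name": "Refactoring",
     "prompt": "Refactor this deeply nested if/else block to use early returns: ...",
     "expected_output": "return"},
    {"name": "Algorithm challenge",
     "prompt": "Implement binary search over a sorted list of integers",
     "expected_output": "mid"},
    {"name": "System design",
     "prompt": "Outline a rate limiter design for a REST API",
     "expected_output": "bucket"},
]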
Output
Results are saved in JSON format with:
- Performance metrics (response time, TPS)
- Resource usage (CPU, memory, GPU)
- Quality assessment scores
- Raw model outputs
- Success/failure indicators
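For downstream analysis, the saved JSON can be loaded back with the standard library. The snippet below assumes the field names from the TestResult sketch above and a results file named test_results.json; both are assumptions, not documented guarantees.
import json

with open("test_results.json") as f:
    results = json.load(f)

# Group tokens-per-second by model and print a simple throughput comparison.
by_model = {}
for r in results:
    by_model.setdefault(r["model_name"], []).append(r["tokens_per_second"])

for model, tps in sorted(by_model.items()):
    print(f"{model}: avg {sum(tps) / len(tps):.1f} tok/s over {len(tps)} runs")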
Future Enhancements
- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
- Advanced code quality evaluation using static analysis
- Web-based dashboard for results visualization
- Database storage for historical comparisons
- Automated statistical analysis of results