Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo, plus the OpenCode harness on the Mac side. - pyinfra/framework/: pyinfra deploy targeting the box - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override for gfx1151), OpenWebUI - Beszel (host + container + AMD GPU dashboard via sysfs) - OpenLIT (LLM fleet metrics) - Phoenix (per-trace agent waterfall) - OpenHands (autonomous agent in a Docker sandbox) - opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter) - install.sh deploys to ~/.config/opencode/ - StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md: documentation and planning - testing/qwen3-coder-30b/: small evaluation harness Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 11:35:10 -04:00
commit 2c4bfefa95
36 changed files with 5265 additions and 0 deletions
--- a/testing/qwen3-coder-30b/README.md
+++ b/testing/qwen3-coder-30b/README.md
@@ -0,0 +1,109 @@
+# Local LLM Performance Testing Framework
+
+A comprehensive framework for comparing the performance and quality of different local hosted LLMs on coding tasks.
+
+## Features
+
+- **Performance Metrics**: Response time, tokens per second, CPU/memory/GPU usage
+- **Quality Assessment**: Automated code quality evaluation
+- **Multi-Model Testing**: Test multiple LLMs simultaneously
+- **Parallel Execution**: Run tests efficiently
+- **Comprehensive Reporting**: Detailed test results and comparisons
+
+## Installation
+
+```bash
+# Install required dependencies
+pip install psutil
+```
+
+## Usage
+
+### Quick Start
+
+1. Create a test suite and add models:
+
+```python
+from llm_test_framework import TestSuite, MockLLM
+
+# Create test suite
+suite = TestSuite()
+
+# Add models to test
+suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
+suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))
+
+# Define test tasks
+tasks = [
+    {
+        "name": "Simple function",
+        "prompt": "Write a Python function to add two numbers",
+        "expected_output": "return a + b"
+    }
+]
+
+# Run tests
+suite.run_all_tests(tasks)
+suite.save_results()
+```
+
+### Running the Example
+
+```bash
+python llm_test_framework.py
+```
+
+This will run sample tests with mock models and save results to JSON.
+
+## Components
+
+### 1. TestSuite
+Main class managing the testing process, including:
+- Model registration
+- Test execution
+- Result collection and saving
+
+### 2. LLMInterface
+Abstract base class for LLM implementations with methods:
+- `generate()`: Generate text from prompt
+- `get_model_info()`: Get model information
+
+### 3. TestResult
+Data class storing individual test results with:
+- Performance metrics
+- Quality scores
+- Raw outputs
+- Success status
+
+## Extending the Framework
+
+To add support for a new LLM:
+
+1. Create a new class inheriting from `LLMInterface`
+2. Implement `generate()` and `get_model_info()` methods
+3. Add your model to the test suite
+
+## Test Types
+
+The framework supports multiple types of coding tasks:
+- Basic function implementations
+- Code refactoring examples
+- Algorithm challenges
+- System design questions
+
+## Output
+
+Results are saved in JSON format with:
+- Performance metrics (response time, TPS)
+- Resource usage (CPU, memory, GPU)
+- Quality assessment scores
+- Raw model outputs
+- Success/failure indicators
+
+## Future Enhancements
+
+- Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
+- Advanced code quality evaluation using static analysis
+- Web-based dashboard for results visualization
+- Database storage for historical comparisons
+- Automated statistical analysis of results