
Local LLM Performance Testing Framework

A framework for comparing the performance and output quality of locally hosted LLMs on coding tasks.

Features

  • Performance Metrics: Response time, tokens per second, CPU/memory/GPU usage
  • Quality Assessment: Automated code quality evaluation
  • Multi-Model Testing: Benchmark several LLMs in a single run
  • Parallel Execution: Run independent tests concurrently to reduce wall-clock time
  • Comprehensive Reporting: Detailed test results and comparisons

Installation

# Install required dependencies
pip install psutil

Usage

Quick Start

Create a test suite, register the models to compare, define test tasks, and run:
from llm_test_framework import TestSuite, MockLLM

# Create test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests
suite.run_all_tests(tasks)
suite.save_results()

Running the Example

python llm_test_framework.py

This will run sample tests with mock models and save results to JSON.

Components

1. TestSuite

Main class managing the testing process (a minimal sketch follows the list), including:

  • Model registration
  • Test execution
  • Result collection and saving
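
How run_all_tests might be structured, as a rough sketch based only on the behavior described here; the result keys, the substring success check, and the default output path are assumptions, not the actual llm_test_framework.py code:

import json
import time

class TestSuite:
    def __init__(self):
        self.models = []
        self.results = []

    def add_model(self, model):
        # Register an LLMInterface implementation for testing
        self.models.append(model)

    def run_all_tests(self, tasks):
        # Run every task against every registered model
        # (resource sampling and quality scoring omitted for brevity)
        for model in self.models:
            for task in tasks:
                start = time.time()
                output = model.generate(task["prompt"])
                elapsed = time.time() - start
                self.results.append({
                    "model_name": model.get_model_info()["name"],  # assumed key
                    "task_name": task["name"],
                    "response_time": elapsed,
                    "raw_output": output,
                    "success": task["expected_output"] in output,  # assumed check
                })

    def save_results(self, path="results.json"):  # default path is an assumption
        with open(path, "w") as f:
            json.dump(self.results, f, indent=2)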

2. LLMInterface

Abstract base class that LLM implementations inherit from (sketched below), with methods:

  • generate(): Generate text from prompt
  • get_model_info(): Get model information
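
A minimal sketch of that contract, assuming the (name, config) constructor implied by MockLLM("gpt4all", {...}) in the Quick Start:

from abc import ABC, abstractmethod

class LLMInterface(ABC):
    def __init__(self, name, config):
        self.name = name
        self.config = config

    @abstractmethod
    def generate(self, prompt: str) -> str:
        # Generate text from a prompt
        ...

    @abstractmethod
    def get_model_info(self) -> dict:
        # Return model metadata (e.g. name and config)
        ...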

3. TestResult

Data class storing individual test results (an illustrative shape is sketched below), with:

  • Performance metrics
  • Quality scores
  • Raw outputs
  • Success status
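
An illustrative shape; the exact field names, beyond those implied by the metrics above, are assumptions:

from dataclasses import dataclass

@dataclass
class TestResult:
    model_name: str
    task_name: str
    response_time: float      # seconds
    tokens_per_second: float
    cpu_percent: float        # sampled via psutil
    memory_mb: float
    gpu_percent: float        # may be unavailable on some hosts
    quality_score: float      # automated code-quality score
    raw_output: str
    success: bool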

Extending the Framework

To add support for a new LLM (a worked example follows these steps):

  1. Create a new class inheriting from LLMInterface
  2. Implement generate() and get_model_info() methods
  3. Add your model to the test suite
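
For example, a hypothetical backend for a llama.cpp server's built-in HTTP API. This is a sketch, not part of the framework: it assumes the server is listening at http://localhost:8080, that LLMInterface is importable from llm_test_framework (as TestSuite and MockLLM are in the Quick Start), that the base class stores name and config, and it needs the requests package (pip install requests):

import requests
from llm_test_framework import LLMInterface, TestSuite

class LlamaCppLLM(LLMInterface):
    def generate(self, prompt: str) -> str:
        # llama.cpp's server exposes a /completion endpoint
        resp = requests.post(
            "http://localhost:8080/completion",
            json={"prompt": prompt, "n_predict": 256},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def get_model_info(self) -> dict:
        return {"name": self.name, "backend": "llama.cpp", "config": self.config}

suite = TestSuite()
suite.add_model(LlamaCppLLM("qwen3-coder-30b", {"endpoint": "http://localhost:8080"}))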

Test Types

The framework supports multiple types of coding tasks (example definitions follow the list):

  • Basic function implementations
  • Code refactoring examples
  • Algorithm challenges
  • System design questions
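
Example task definitions in the dict format used by run_all_tests. The prompts are illustrative, and the expected_output values assume it is a substring checked against the model's response, as the Quick Start's "return a + b" suggests:

tasks = [
    {"name": "Basic function",
     "prompt": "Write a Python function that reverses a string",
     "expected_output": "[::-1]"},
    {"name": "Refactoring",
     "prompt": "Refactor a for-loop that builds a list into a list comprehension",
     "expected_output": "for"},
    {"name": "Algorithm challenge",
     "prompt": "Implement binary search over a sorted list",
     "expected_output": "mid"},
    {"name": "System design",
     "prompt": "Outline a rate limiter for a REST API",
     "expected_output": "bucket"},
]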

Output

Results are saved in JSON format (an inspection snippet follows the list) with:

  • Performance metrics (response time, TPS)
  • Resource usage (CPU, memory, GPU)
  • Quality assessment scores
  • Raw model outputs
  • Success/failure indicators
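
A quick way to inspect a saved run; the key names follow the assumed result shape from the TestSuite sketch above, not a documented schema:

import json

with open("results.json") as f:
    results = json.load(f)

for r in results:
    status = "ok" if r.get("success") else "FAIL"
    print(f'{r["model_name"]:<16} {r["task_name"]:<24} '
          f'{r["response_time"]:.2f}s  {status}')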

Future Enhancements

  • Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
  • Advanced code quality evaluation using static analysis
  • Web-based dashboard for results visualization
  • Database storage for historical comparisons
  • Automated statistical analysis of results