
Local LLM Performance Testing Framework

A framework for comparing the performance and output quality of locally hosted LLMs on coding tasks.

Features

  • Performance Metrics: Response time, tokens per second, CPU/memory/GPU usage
  • Quality Assessment: Automated code quality evaluation
  • Multi-Model Testing: Benchmark several LLMs in a single run
  • Parallel Execution: Run independent tests concurrently to reduce wall-clock time
  • Comprehensive Reporting: Detailed test results and comparisons

Installation

# Install required dependencies
pip install psutil

Usage

Quick Start

Create a test suite, register the models to compare, define test tasks, and run:
from llm_test_framework import TestSuite, MockLLM

# Create test suite
suite = TestSuite()

# Add models to test
suite.add_model(MockLLM("gpt4all", {"model_path": "/path/to/model"}))
suite.add_model(MockLLM("llama.cpp", {"model_path": "/path/to/llama"}))

# Define test tasks
tasks = [
    {
        "name": "Simple function",
        "prompt": "Write a Python function to add two numbers",
        "expected_output": "return a + b"
    }
]

# Run tests
suite.run_all_tests(tasks)
suite.save_results()

Running the Example

python llm_test_framework.py

This will run sample tests with mock models and save results to JSON.

Components

1. TestSuite

Main class managing the testing process (a minimal sketch follows the list), including:

  • Model registration
  • Test execution
  • Result collection and saving
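
How run_all_tests might be structured, as a rough sketch based only on the behavior described here; the result keys, the substring success check, and the default output path are assumptions, not the actual llm_test_framework.py code:

import json
import time

class TestSuite:
    def __init__(self):
        self.models = []
        self.results = []

    def add_model(self, model):
        # Register an LLMInterface implementation for testing
        self.models.append(model)

    def run_all_tests(self, tasks):
        # Run every task against every registered model
        # (resource sampling and quality scoring omitted for brevity)
        for model in self.models:
            for task in tasks:
                start = time.time()
                output = model.generate(task["prompt"])
                elapsed = time.time() - start
                self.results.append({
                    "model_name": model.get_model_info()["name"],  # assumed key
                    "task_name": task["name"],
                    "response_time": elapsed,
                    "raw_output": output,
                    "success": task["expected_output"] in output,  # assumed check
                })

    def save_results(self, path="results.json"):  # default path is an assumption
        with open(path, "w") as f:
            json.dump(self.results, f, indent=2)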

2. LLMInterface

Abstract base class that LLM implementations inherit from (sketched below), with methods:

  • generate(): Generate text from prompt
  • get_model_info(): Get model information
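
A minimal sketch of that contract, assuming the (name, config) constructor implied by MockLLM("gpt4all", {...}) in the Quick Start:

from abc import ABC, abstractmethod

class LLMInterface(ABC):
    def __init__(self, name, config):
        self.name = name
        self.config = config

    @abstractmethod
    def generate(self, prompt: str) -> str:
        # Generate text from a prompt
        ...

    @abstractmethod
    def get_model_info(self) -> dict:
        # Return model metadata (e.g. name and config)
        ...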

3. TestResult

Data class storing individual test results (an illustrative shape is sketched below), with:

  • Performance metrics
  • Quality scores
  • Raw outputs
  • Success status
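
An illustrative shape; the exact field names, beyond those implied by the metrics above, are assumptions:

from dataclasses import dataclass

@dataclass
class TestResult:
    model_name: str
    task_name: str
    response_time: float      # seconds
    tokens_per_second: float
    cpu_percent: float        # sampled via psutil
    memory_mb: float
    gpu_percent: float        # may be unavailable on some hosts
    quality_score: float      # automated code-quality score
    raw_output: str
    success: bool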

Extending the Framework

To add support for a new LLM (a worked example follows these steps):

  1. Create a new class inheriting from LLMInterface
  2. Implement generate() and get_model_info() methods
  3. Add your model to the test suite
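
For example, a hypothetical backend for a llama.cpp server's built-in HTTP API. This is a sketch, not part of the framework: it assumes the server is listening at http://localhost:8080, that LLMInterface is importable from llm_test_framework (as TestSuite and MockLLM are in the Quick Start), that the base class stores name and config, and it needs the requests package (pip install requests):

import requests
from llm_test_framework import LLMInterface, TestSuite

class LlamaCppLLM(LLMInterface):
    def generate(self, prompt: str) -> str:
        # llama.cpp's server exposes a /completion endpoint
        resp = requests.post(
            "http://localhost:8080/completion",
            json={"prompt": prompt, "n_predict": 256},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["content"]

    def get_model_info(self) -> dict:
        return {"name": self.name, "backend": "llama.cpp", "config": self.config}

suite = TestSuite()
suite.add_model(LlamaCppLLM("qwen3-coder-30b", {"endpoint": "http://localhost:8080"}))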

Test Types

The framework supports multiple types of coding tasks (example definitions follow the list):

  • Basic function implementations
  • Code refactoring examples
  • Algorithm challenges
  • System design questions
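
Example task definitions in the dict format used by run_all_tests. The prompts are illustrative, and the expected_output values assume it is a substring checked against the model's response, as the Quick Start's "return a + b" suggests:

tasks = [
    {"name": "Basic function",
     "prompt": "Write a Python function that reverses a string",
     "expected_output": "[::-1]"},
    {"name": "Refactoring",
     "prompt": "Refactor a for-loop that builds a list into a list comprehension",
     "expected_output": "for"},
    {"name": "Algorithm challenge",
     "prompt": "Implement binary search over a sorted list",
     "expected_output": "mid"},
    {"name": "System design",
     "prompt": "Outline a rate limiter for a REST API",
     "expected_output": "bucket"},
]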

Output

Results are saved in JSON format (an inspection snippet follows the list) with:

  • Performance metrics (response time, TPS)
  • Resource usage (CPU, memory, GPU)
  • Quality assessment scores
  • Raw model outputs
  • Success/failure indicators
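
A quick way to inspect a saved run; the key names follow the assumed result shape from the TestSuite sketch above, not a documented schema:

import json

with open("results.json") as f:
    results = json.load(f)

for r in results:
    status = "ok" if r.get("success") else "FAIL"
    print(f'{r["model_name"]:<16} {r["task_name"]:<24} '
          f'{r["response_time"]:.2f}s  {status}')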

Future Enhancements

  • Integration with local LLM APIs (text-generation-webui, llama.cpp, etc.)
  • Advanced code quality evaluation using static analysis
  • Web-based dashboard for results visualization
  • Database storage for historical comparisons
  • Automated statistical analysis of results