Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.
- pyinfra/framework/: pyinfra deploy targeting the box
- llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
for gfx1151), OpenWebUI
- Beszel (host + container + AMD GPU dashboard via sysfs)
- OpenLIT (LLM fleet metrics)
- Phoenix (per-trace agent waterfall)
- OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
- install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Local GenAI roadmap
What to build on top of the Framework Desktop / Strix Halo stack to narrow the functional gap with Claude Code + Anthropic's broader ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit quarterly.
Where we stand
- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models on the unified-memory iGPU at 64 GB UMA
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at https://searxng.n0n.io
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B
The mental model: three layers
The OSS ecosystem stratifies cleanly once you stop thinking about individual products:
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability │
│ Dispatch tickets to agents, watch spend, track skill │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.) │
└────────────────────────────┬────────────────────────────────────┘
│ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks │
│ Code that defines who does what, in what order. Reach for │
│ when you want to *encode* a pipeline, not drive interactively. │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.) │
└────────────────────────────┬────────────────────────────────────┘
│ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses │
│ The thing you actually drive — terminal or IDE agent that │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline, │
│ OpenHands, OpenWork, Goose, Pi.) │
└────────────────────────────┬────────────────────────────────────┘
│ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools │
│ Models, MCP servers, embedding stores, voice/vision pipelines. │
│ (Ollama, vLLM, llama.cpp, MCP catalog.) │
└─────────────────────────────────────────────────────────────────┘
Pick what you need from each layer; you don't need anything from a higher layer if the one below is doing the job.
Layer 1: Harnesses
The interactive driver's seat. We use OpenCode; alternatives worth having in your back pocket:
- Aider — terminal, git-native, every change is a commit. Strong on multi-file refactors. Often outperforms OpenCode on small-context local models.
- Cline — VS Code agent, 5M+ installs. v3.58 added native subagents and a CI/CD headless mode. Closest "Claude Code in the IDE" feel.
- OpenHands (was OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs tests, browses. Heavier than OpenCode, more autonomous.
- OpenWork — desktop GUI agent built on `deepagentsjs`. Multi-step planning, filesystem access, subagent delegation baked in. `npx` to start.
- Goose by Block — MCP-native, opinionated about toolkits. Strong fit if you're going all-in on MCP.
- Pi — minimal terminal harness; small surface area, easy to read and extend.
- Catalog: https://github.com/bradAGI/awesome-cli-coding-agents
When to add another: when you hit a workflow OpenCode doesn't handle well. Aider for refactors, Cline for IDE-bound work, OpenHands for autonomous loops, OpenWork for subagent-heavy planning.
Layer 2: Orchestration frameworks
When you want to build a pipeline rather than drive a prebuilt one — write code that defines agents, tools, and a flow. Overkill for solo coding; on-target for personal automation pipelines.
- LangGraph — directed graph with conditional edges, built-in checkpointing with time-travel. Surpassed CrewAI in stars in early 2026; production-grade.
- CrewAI — role-based crews; agents/tasks/crew in under 20 lines of Python. Lowest learning curve, and benchmarks have reported it roughly 5× faster than LangGraph on some workloads. A2A protocol support.
- AutoGen / AG2 — event-driven, async-first; GroupChat coordination with a selector deciding who speaks next.
- OpenAI Agents SDK — production replacement for Swarm. Explicit agent-to-agent handoffs carrying conversation context.
- OpenAgents — only framework with native support for both MCP and A2A protocols.
All five are model-agnostic — point them at the local Ollama endpoint or LiteLLM and they work the same as if you were using GPT-5.
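Being model-agnostic here means they all speak the OpenAI chat-completions wire format. A minimal stdlib-only sketch of building such a request; the base URL and model tag are assumptions for a local Ollama instance, so adjust both to your deployment:

```python
import json
import urllib.request

# Assumed local endpoint: Ollama exposes an OpenAI-compatible API under /v1.
# Host, port, and the model tag below are placeholders for this setup.
OLLAMA_BASE = "http://localhost:11434/v1"
MODEL = "qwen3-coder:30b"

def build_chat_request(prompt: str, base: str = OLLAMA_BASE, model: str = MODEL):
    """Construct (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Write a haiku about unified memory.")
# Once the box is reachable, sending is one urllib.request.urlopen(req) call.
```

Swapping GPT-5 for the local model is then a matter of changing `base` and `model`; nothing in the framework layer above has to care.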
Layer 3: Management & observability
The Claude Cowork analog layer. Dispatch work, track progress, monitor spend. The piece I missed in earlier roadmap drafts:
- Multica — open-source platform for managing AI coding agents as teammates. File issues, assign to agents, they pick up work, write code, report blockers, update statuses. Skill accumulation across tasks. Dispatches to Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor Agent, Hermes, Gemini, Kiro, etc. #1 GitHub TypeScript Trending in April 2026, 10.7 k stars in three months. Closest existing OSS analog to Cowork; works above OpenCode rather than replacing it.
- Plane — AI-native project management with MCP server and full Agent Run lifecycle tracking. Velocity, workload, blockers auto-populated. Self-hostable.
- Mission Control — self-hosted dashboard for agent fleets. Token-spend per model, trend charts, real-time posture scoring. SQLite, zero deps. Useful even solo just to watch your local model's spend pattern.
- ClawDeck — focused kanban for OpenClaw agents. Smaller scope, more opinionated.
- OpenWebUI — multi-user chat UI in front of Ollama. Doesn't manage agent sessions, but covers shared-LLM-access for a household.
Still a gap (May 2026): live shared agent sessions — multiple humans driving one agent on one repo in real time, Cowork's multiplayer mode. Multica is async ticket flow, not live collaboration. Watch for projects that close this.
Layer 0: Cross-cutting capabilities
Things you wire in once, useful from any harness above.
Aggregation & routing
- LiteLLM — proxy that exposes one OpenAI-compatible endpoint over many backends. Rules like "code task → local Qwen3, image input → vision model, fast chat → Haiku." Worth standing up early so all higher-layer tools just point at one URL.
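A routing rule like the one quoted is, at bottom, a small decision function. A pure-Python sketch of that logic follows; the model names are placeholders for whatever the proxy exposes, and this is illustrative only, not LiteLLM's actual (YAML-based) router configuration:

```python
def route(task: str, has_image: bool = False) -> str:
    """Toy model-routing rule in the spirit of a LiteLLM router.
    Model names are placeholders, not a real deployment's tags."""
    if has_image:
        return "qwen2.5-vl:7b"    # image-bearing prompts go to the vision model
    if task == "code":
        return "qwen3-coder:30b"  # code tasks stay on the local coding model
    return "claude-haiku"         # everything else: fast hosted chat

route("code")                   # -> "qwen3-coder:30b"
route("chat", has_image=True)   # -> "qwen2.5-vl:7b"
```

The value of standing this up early is exactly that the rule lives in one place: every higher-layer tool points at one URL and the routing evolves behind it.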
Code intelligence / RAG
The local model can't see beyond what fits in its prompt. Claude Code papers over this with long-context recall + WebFetch; locally we need explicit retrieval.
- Continue.dev — VS Code / JetBrains plugin with a built-in `@codebase` indexer. Pairs with `nomic-embed-text` via Ollama for local embeddings. Biggest single-shot quality lift for local-model coding.
- Tabby — self-hosted Copilot-style inline completion. Different niche from agentic coding; covers the "as I type" loop OpenCode doesn't.
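Under the hood, a codebase indexer is nearest-neighbour search over chunk embeddings. A toy cosine-similarity sketch, with hand-made 2-D vectors and file names standing in for real `nomic-embed-text` output over real chunks:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """index: list of (chunk_id, vector). Returns the k nearest chunk ids."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Toy index: in practice, vectors come from the embedding model.
index = [
    ("auth.py",      [1.0, 0.0]),
    ("db.py",        [0.0, 1.0]),
    ("auth_test.py", [0.9, 0.1]),
]
print(top_k([1.0, 0.1], index))  # the two chunks nearest the query embedding
```

This is why retrieval is the "explicit" fix for small context windows: only the top-k chunks, not the whole repo, get spliced into the prompt.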
Web access — done
Already wired in localgenai/opencode/:
- @playwright/mcp — browser automation. Navigate, click, fill forms, read DOM snapshots.
- mcp-searxng — web search via your `searxng.n0n.io` instance. No API keys.
- mcp-server-fetch — not yet wired; useful if Playwright feels heavy for plain page reads.
Persistent memory
OpenCode's memory is per-session unless you wire MCP for it.
- `mcp-server-memory` (official) — knowledge-graph memory across any MCP client. Crude but works.
- mem0 — sophisticated memory, self-hostable with a local embedding model.
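The knowledge-graph model behind such memory servers amounts to entities, observations on them, and typed relations between them. A toy in-memory sketch of that shape (this mirrors the concept, not `mcp-server-memory`'s real API; entity names below are invented):

```python
from collections import defaultdict

class GraphMemory:
    """Toy knowledge-graph memory: entities carry observations,
    and typed relations connect them. Illustrative shape only."""
    def __init__(self):
        self.observations = defaultdict(list)  # entity -> list of notes
        self.relations = []                    # (src, relation, dst) triples

    def observe(self, entity, note):
        self.observations[entity].append(note)

    def relate(self, src, relation, dst):
        self.relations.append((src, relation, dst))

    def recall(self, entity):
        """Everything known about an entity: its notes plus touching edges."""
        edges = [r for r in self.relations if entity in (r[0], r[2])]
        return {"notes": self.observations.get(entity, []), "relations": edges}

mem = GraphMemory()
mem.observe("strix-halo", "64 GB UMA, gfx1151")
mem.relate("opencode", "talks_to", "strix-halo")
print(mem.recall("strix-halo"))
```

The per-session limitation disappears because this store outlives any one harness session: whichever MCP client connects next can `recall` what a previous session observed.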
Multi-modal
- Vision: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama. Route via LiteLLM so OpenCode auto-picks vision for image-bearing prompts. Closes the screenshot gap.
- Voice: `whisper-compose.yaml` and `piper-compose.yaml` are already in `localgenai/`. STT in, TTS out — a "talk to your local assistant" loop is something Claude Code itself doesn't do.
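The voice loop itself is plain function composition: audio to text, text to a reply, reply back to audio. A sketch with injected callables standing in for the whisper, model, and piper endpoints (all three are stand-ins; nothing here talks to the real containers):

```python
def voice_turn(audio: bytes, stt, llm, tts) -> bytes:
    """One loop turn: audio in -> transcript -> reply text -> audio out.
    stt/llm/tts are injected so whisper, the local model, and piper
    can be swapped in without changing the loop."""
    heard = stt(audio)
    reply = llm(heard)
    return tts(reply)

# Stand-in callables; real wiring would POST to the whisper/piper services.
out = voice_turn(
    b"\x00fake-wav",
    stt=lambda a: "what's the GPU temp?",
    llm=lambda t: f"You asked: {t}",
    tts=lambda t: t.encode(),
)
```

Keeping the three stages injectable is the point: the same loop works whether the LLM stage is the local model directly or a LiteLLM route.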
Design (genuine new capability)
Anthropic ships Claude Design (Anthropic Labs); the OSS analog is real and active:
- nexu-io/open-design — local-first OSS alternative to Claude Design. 19 Skills, 71 brand-grade Design Systems, generates web/desktop/mobile prototypes, slides, images, videos with sandboxed preview and HTML/PDF/PPTX/MP4 export. Ships a stdio MCP server exposing design source — tokens, JSX components, entry HTML — as a structured API to any MCP-aware agent. Pairs naturally with OpenCode + the local model.
- Penpot — full OSS Figma alternative. Different scope (interactive design tool, not agentic).
- OpenPencil — AI-native vector design tool, "design-as-code." Newer; track.
Skills / MCP catalog
- Official MCP registry
- Awesome MCP Servers (1200+ entries)
- mcpservers.org searchable directory
Build the skills you miss; publish them.
Prioritized next steps
In rough order of impact-per-hour-spent:
- Continue.dev + `nomic-embed-text` in VS Code. Biggest single quality jump for local-model coding via codebase indexing.
- Symlink `localgenai/opencode/opencode.json` into place so the Playwright + SearXNG MCP servers activate. (Already written; one `ln -sf` away.)
- Stand up LiteLLM as the aggregation layer in front of Ollama — even at one model, the abstraction pays off the moment you add a second.
- Stand up Multica for ticket-style work dispatch. Closest Cowork analog; useful even solo for tracking what the agent has done across sessions.
- Add a vision model (`qwen2.5-vl:7b`) and route image prompts via LiteLLM. Closes the screenshot gap.
- Stand up OpenDesign as a compose stack. Connect its MCP server to OpenCode. Genuinely new capability versus Claude Code.
- Try Aider on a real refactor — different agentic loop than OpenCode, often better with smaller-context local models.
- Wire whisper + piper into a voice loop. Not a Claude Code parity item — a new capability the local stack uniquely enables.
- Mission Control on the box for token-spend visibility once multi-model routing through LiteLLM is real.
- Evaluate OpenHands or Goose for autonomous agentic loops where OpenCode's per-task UX feels too hands-on.
- OpenWebUI for household-shared chat access. Adds value for non-coding LLM use, fills a small gap in the multi-user story.
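The token-spend visibility mentioned for Mission Control reduces to per-model aggregation over usage records. A toy sketch in that spirit (model tags and token counts are invented, and this is not Mission Control's actual schema):

```python
from collections import Counter

class SpendTracker:
    """Toy per-model token-spend aggregator, in the spirit of a
    Mission-Control-style dashboard. Illustrative only."""
    def __init__(self):
        self.tokens = Counter()

    def record(self, model: str, prompt_tokens: int, completion_tokens: int):
        """Add one request's total token usage to the model's tally."""
        self.tokens[model] += prompt_tokens + completion_tokens

    def top(self):
        """Models ranked by total tokens spent, heaviest first."""
        return self.tokens.most_common()

t = SpendTracker()
t.record("qwen3-coder:30b", 1200, 300)
t.record("qwen2.5-vl:7b", 400, 100)
t.record("qwen3-coder:30b", 800, 200)
print(t.top())  # [('qwen3-coder:30b', 2500), ('qwen2.5-vl:7b', 500)]
```

Once LiteLLM is the single front door, every request's usage block passes through one place, which is what makes this kind of tally cheap to collect.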
Reference catalogs to monitor
- Official MCP registry: https://registry.modelcontextprotocol.io/
- Awesome CLI coding agents: https://github.com/bradAGI/awesome-cli-coding-agents
- Awesome MCP servers: https://github.com/punkpeye/awesome-mcp-servers
- mcpservers.org directory: https://mcpservers.org/
- Anthropic plugin catalog (parity-tracking): https://claude.com/plugins
Open gaps to keep watching
- Live shared agent sessions — Cowork's multiplayer mode (multiple humans driving one agent on one repo, real-time). No direct OSS equivalent yet; Multica is async-ticket-flow.
- Cohesive "skills" library as polished as Anthropic's plugin catalog. The MCP ecosystem has the parts but lacks the curation.
- AMD ROCm packages for Ubuntu 26.04 — relevant if we ever decide to abandon containerization for native ROCm tooling. (Tracked separately in `TODO.md`.)