Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 2c4bfefa95, 2026-05-08 11:35:10 -04:00
36 changed files with 5265 additions and 0 deletions

Roadmap.md (new file, 263 lines):

# Local GenAI roadmap
What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.
## Where we stand
- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models running in the iGPU's 64 GB unified-memory (UMA) allocation
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B
## The mental model: three layers
The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:
```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability │
│ Dispatch tickets to agents, watch spend, track skill │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.) │
└────────────────────────────┬────────────────────────────────────┘
│ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks │
│ Code that defines who does what, in what order. Reach for │
│ when you want to *encode* a pipeline, not drive interactively. │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.) │
└────────────────────────────┬────────────────────────────────────┘
│ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses │
│ The thing you actually drive — terminal or IDE agent that │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline, │
│ OpenHands, OpenWork, Goose, Pi.) │
└────────────────────────────┬────────────────────────────────────┘
│ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools │
│ Models, MCP servers, embedding stores, voice/vision pipelines. │
│ (Ollama, vLLM, llama.cpp, MCP catalog.) │
└─────────────────────────────────────────────────────────────────┘
```
Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.
## Layer 1: Harnesses
The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:
- **[Aider](https://aider.chat)** — terminal, git-native, every change
is a commit. Strong on multi-file refactors. Often outperforms
OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
installs. v3.58 added native subagents and a CI/CD headless mode.
Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
— desktop GUI agent built on `deepagentsjs`. Multi-step planning,
filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>
When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.
## Layer 2: Orchestration frameworks
When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.
- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
graph with conditional edges, built-in checkpointing with time-travel.
Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  reportedly ~5× faster than LangGraph in some benchmarks. A2A protocol
  support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
event-driven, async-first; GroupChat coordination with a selector
deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
production replacement for Swarm. Explicit agent-to-agent handoffs
carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — only
framework with native support for **both MCP and A2A** protocols.
All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.
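For anything built on the OpenAI SDK, pointing at the box is usually just two environment variables. A sketch (the `framework` hostname is a placeholder; Ollama exposes an OpenAI-compatible API under `/v1`):

```shell
# Route any OpenAI-SDK-based framework to the local Ollama endpoint.
# "framework" is a placeholder hostname for the Strix Halo box.
export OPENAI_BASE_URL="http://framework:11434/v1"
export OPENAI_API_KEY="ollama"  # Ollama ignores the key; it just has to be non-empty
echo "routing OpenAI-SDK traffic to $OPENAI_BASE_URL"
```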
## Layer 3: Management & observability
The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:
- **[Multica](https://github.com/multica-ai/multica)** — open-source
platform for managing AI coding agents *as teammates*. File issues,
assign to agents, they pick up work, write code, report blockers,
update statuses. Skill accumulation across tasks. Dispatches to
Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor Agent, Hermes,
Gemini, Kiro, etc. **#1 GitHub TypeScript Trending in April 2026,
10.7 k stars in three months.** Closest existing OSS analog to
Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with
MCP server and full Agent Run lifecycle tracking. Velocity, workload,
blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
self-hosted dashboard for agent fleets. Token-spend per model, trend
charts, real-time posture scoring. SQLite, zero deps. Useful even
solo just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
chat UI in front of Ollama. Doesn't manage *agent sessions*, but
covers shared-LLM-access for a household.
**Still a gap (May 2026):** *live* shared agent sessions — multiple humans
driving one agent on one repo in real time, Cowork's multiplayer mode.
Multica is async-ticket-flow, not live-collaborative. Watch for
projects that close this.
## Layer 0: Cross-cutting capabilities
Things you wire in once, useful from any harness above.
### Aggregation & routing
- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
one OpenAI-compatible endpoint over many backends. Rules like "code
task → local Qwen3, image input → vision model, fast chat → Haiku."
Worth standing up early so all higher-layer tools just point at one
URL.
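A sketch of what that proxy config might look like: one alias per role, all backed by the local Ollama. Model names, tags, and paths here are assumptions, not the stack's actual config.

```shell
# Write a hypothetical LiteLLM config with role-based aliases.
mkdir -p /tmp/litellm
cat > /tmp/litellm/config.yaml <<'EOF'
model_list:
  - model_name: local-coder            # what higher-layer tools ask for
    litellm_params:
      model: ollama/qwen3-coder:30b    # assumed Ollama tag
      api_base: http://localhost:11434
  - model_name: local-vision
    litellm_params:
      model: ollama/qwen2.5-vl:7b
      api_base: http://localhost:11434
EOF
# Then: litellm --config /tmp/litellm/config.yaml --port 4000
grep -c 'model_name' /tmp/litellm/config.yaml
```

Every tool above Layer 0 then talks to port 4000 and asks for `local-coder` or `local-vision`; swapping the backing model is a one-line config change.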
### Code intelligence / RAG
The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.
- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
with built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
Ollama for local embeddings. **Biggest single-shot quality lift** for
local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
Copilot-style inline completion. Different niche from agentic coding,
covers the "as I type" loop OpenCode doesn't.
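As a sketch, the relevant piece of a Continue config for local embeddings might look like this. Written to a scratch path here; Continue's real config lives under `~/.continue/` and its schema varies by version, so treat the field names as assumptions.

```shell
# Hypothetical Continue config fragment: codebase embeddings via Ollama.
mkdir -p /tmp/continue
cat > /tmp/continue/config.json <<'EOF'
{
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://framework:11434"
  }
}
EOF
python3 -m json.tool /tmp/continue/config.json > /dev/null && echo "valid JSON"
```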
### Web access — done
Already wired in `localgenai/opencode/`:
- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
web search via your `searxng.n0n.io` instance. No API keys.
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
not yet wired; useful if Playwright feels heavy for plain page reads.
### Persistent memory
OpenCode's memory is per-session unless you wire in an MCP memory server.
- **`mcp-server-memory`** (official) — knowledge-graph memory across
any MCP client. Crude but works.
- **[mem0](https://github.com/mem0ai/mem0)** — sophisticated memory,
self-hostable with a local embedding model.
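In most MCP clients the wiring is a few lines of JSON. A sketch, using the common claude-desktop-style `mcpServers` shape; OpenCode's `opencode.json` has its own schema, so adapt accordingly:

```shell
# Generic MCP client config fragment for the official memory server.
cat > /tmp/mcp-memory.json <<'EOF'
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
EOF
python3 -m json.tool /tmp/mcp-memory.json > /dev/null && echo "valid JSON"
```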
### Multi-modal
- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
Route via LiteLLM so OpenCode auto-picks vision for image-bearing
prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
already in `localgenai/`. STT in, TTS out — a "talk to your local
assistant" loop is something Claude Code itself doesn't do.
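A sketch of the round trip, under two assumptions: the whisper container exposes an OpenAI-compatible `/v1/audio/transcriptions` endpoint, and piper sits behind a simple HTTP wrapper. Hostnames, ports, and both API shapes are placeholders.

```shell
WHISPER="http://framework:8001"   # assumed STT endpoint
PIPER="http://framework:8002"     # assumed TTS endpoint
# STT: audio in, text out
# curl -s "$WHISPER/v1/audio/transcriptions" -F file=@question.wav -F model=whisper-1
# ...feed the transcript to the LLM, then TTS: text in, audio out
# curl -s "$PIPER/api/tts" --data 'Answer text here' -o answer.wav
echo "voice loop: mic -> $WHISPER -> LLM -> $PIPER -> speaker"
```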
### Design (genuine new capability)
Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:
- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
local-first OSS alternative to Claude Design. 19 Skills, 71
brand-grade Design Systems, generates web/desktop/mobile prototypes,
slides, images, videos with sandboxed preview and HTML/PDF/PPTX/MP4
export. **Ships a stdio MCP server** exposing design source — tokens,
JSX components, entry HTML — as a structured API to any MCP-aware
agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
vector design tool, "design-as-code." Newer; track.
### Skills / MCP catalog
- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
(1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** searchable directory
Build the skills you miss; publish them.
## Prioritized next steps
In rough order of impact-per-hour-spent:
1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
quality jump for local-model coding via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
Playwright + SearXNG MCP servers activate. (Already written;
one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
even at one model, the abstraction pays off the moment you add a
second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
analog; useful even solo for tracking what the agent has done across
sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
LiteLLM. Closes the screenshot gap.
6. **Stand up OpenDesign** as a compose stack. Connect its MCP server
to OpenCode. Genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — different agentic loop than
OpenCode, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
non-coding LLM use, fills a small gap in the multi-user story.
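Step 2 above really is one command once paths are pinned down. A sketch, assuming the repo is checked out at `~/localgenai`:

```shell
# Link the repo's OpenCode config into the live config dir.
CFG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/opencode"
mkdir -p "$CFG_DIR"
ln -sf "$HOME/localgenai/opencode/opencode.json" "$CFG_DIR/opencode.json"
ls -l "$CFG_DIR/opencode.json"
```

Symlinking rather than copying means config edits in the repo take effect immediately and stay under version control.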
## Reference catalogs to monitor
- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity-tracking): <https://claude.com/plugins>
## Open gaps to keep watching
- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
humans driving one agent on one repo, real-time). No direct OSS
equivalent yet; Multica is async-ticket-flow.
- **Cohesive "skills" library** as polished as Anthropic's plugin
catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
to abandon containerization for native ROCm tooling. (Tracked
separately in `TODO.md`.)