Initial commit: localgenai stack
Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.
- pyinfra/framework/: pyinfra deploy targeting the box
- llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
for gfx1151), OpenWebUI
- Beszel (host + container + AMD GPU dashboard via sysfs)
- OpenLIT (LLM fleet metrics)
- Phoenix (per-trace agent waterfall)
- OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
- install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:

Roadmap.md (new file, 263 lines)
# Local GenAI roadmap

What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.
## Where we stand

- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models on the unified-memory iGPU at 64 GB UMA
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B
## The mental model: three layers

The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability                             │
│ Dispatch tickets to agents, watch spend, track skill            │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.)      │
└────────────────────────────┬────────────────────────────────────┘
                             │ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks                               │
│ Code that defines who does what, in what order. Reach for       │
│ when you want to *encode* a pipeline, not drive interactively.  │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.)                │
└────────────────────────────┬────────────────────────────────────┘
                             │ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses                                              │
│ The thing you actually drive — terminal or IDE agent that       │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline,         │
│ OpenHands, OpenWork, Goose, Pi.)                                │
└────────────────────────────┬────────────────────────────────────┘
                             │ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools                                      │
│ Models, MCP servers, embedding stores, voice/vision pipelines.  │
│ (Ollama, vLLM, llama.cpp, MCP catalog.)                         │
└─────────────────────────────────────────────────────────────────┘
```

Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.
## Layer 1: Harnesses

The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:

- **[Aider](https://aider.chat)** — terminal, git-native, every change
  is a commit. Strong on multi-file refactors. Often outperforms
  OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
  installs. v3.58 added native subagents and a CI/CD headless mode.
  Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
  OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
  tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
  — desktop GUI agent built on `deepagentsjs`. Multi-step planning,
  filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
  opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
  harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>

When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.
## Layer 2: Orchestration frameworks

When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.

- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
  graph with conditional edges, built-in checkpointing with time-travel.
  Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
  agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  ~5× faster than LangGraph in many cases. A2A protocol support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
  event-driven, async-first; GroupChat coordination with a selector
  deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
  production replacement for Swarm. Explicit agent-to-agent handoffs
  carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — the only
  framework here with native support for **both MCP and A2A** protocols.

All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.
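Concretely, "model-agnostic" means only the base URL and model name change: every backend above speaks the OpenAI chat-completions wire format. A minimal sketch (the model name here is this stack's local Qwen3-Coder tag, adjust to taste):

```python
import json

def chat_payload(model: str, prompt: str) -> str:
    # The same OpenAI-style /v1/chat/completions body works against
    # Ollama's /v1 shim, llama.cpp's server, vLLM, or a LiteLLM proxy;
    # only the base URL you POST it to changes.
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Swap "gpt-5" for the local model and the frameworks are none the wiser:
body = chat_payload("qwen3-coder:30b", "Explain UMA in one sentence.")
print(json.loads(body)["model"])  # -> qwen3-coder:30b
```

Each framework exposes a base-URL override for exactly this reason, so switching backends is configuration, not code.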
## Layer 3: Management & observability

The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:

- **[Multica](https://github.com/multica-ai/multica)** — open-source
  platform for managing AI coding agents *as teammates*. File issues and
  assign them to agents; the agents pick up work, write code, report
  blockers, and update statuses. Skill accumulation across tasks.
  Dispatches to Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor
  Agent, Hermes, Gemini, Kiro, etc. **#1 GitHub TypeScript Trending in
  April 2026, 10.7 k stars in three months.** Closest existing OSS
  analog to Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with an
  MCP server and full Agent Run lifecycle tracking. Velocity, workload,
  and blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
  self-hosted dashboard for agent fleets. Token spend per model, trend
  charts, real-time posture scoring. SQLite, zero deps. Useful even
  solo, just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
  kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
  chat UI in front of Ollama. Doesn't manage *agent sessions*, but
  covers shared LLM access for a household.

**Still a gap (May 2026):** *live* shared agent sessions — multiple
humans driving one agent on one repo in real time, Cowork's multiplayer
mode. Multica is async ticket flow, not live-collaborative. Watch for
projects that close this.
## Layer 0: Cross-cutting capabilities

Things you wire in once, useful from any harness above.

### Aggregation & routing

- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
  one OpenAI-compatible endpoint over many backends. Rules like "code
  task → local Qwen3, image input → vision model, fast chat → Haiku."
  Worth standing up early so all higher-layer tools just point at one
  URL.
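Those routing rules read naturally as a dispatch function. LiteLLM expresses them declaratively in its proxy config rather than in code, so treat this as the decision logic only; the model ids are placeholders from this stack, not verified LiteLLM names:

```python
def route_model(task: str, has_image: bool = False) -> str:
    # Mirrors the rules above: image input -> vision model,
    # code task -> local Qwen3, everything else -> fast cheap chat.
    # Placeholder ids; the real mapping lives in the LiteLLM config.
    if has_image:
        return "qwen2.5-vl:7b"
    if task in {"code", "refactor", "review"}:
        return "qwen3-coder:30b"
    return "claude-haiku"

print(route_model("code"))                   # -> qwen3-coder:30b
print(route_model("chat", has_image=True))   # -> qwen2.5-vl:7b
```

The point of the proxy is that this logic lives in one place; every harness and framework above just sends to one URL.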
### Code intelligence / RAG

The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.

- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
  with a built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
  Ollama for local embeddings. **Biggest single-shot quality lift** for
  local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
  Copilot-style inline completion. A different niche from agentic
  coding; covers the "as I type" loop OpenCode doesn't.
### Web access — done

Already wired in `localgenai/opencode/`:

- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
  browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
  web search via your `searxng.n0n.io` instance. No API keys.

Not yet wired:

- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
  useful if Playwright feels heavy for plain page reads.
### Persistent memory

OpenCode's memory is per-session unless you wire in an MCP server for it.

- **`mcp-server-memory`** (official) — knowledge-graph memory across
  any MCP client. Crude, but it works.
- **[mem0](https://github.com/mem0ai/mem0)** — more sophisticated
  memory, self-hostable with a local embedding model.
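For a feel of what knowledge-graph memory buys you, here is a toy in the same spirit: named entities with observations plus directed relations, where `recall` is what a harness would inject into the next session's context. (Schema simplified and hypothetical; the real server persists to disk.)

```python
class Memory:
    """Toy knowledge-graph memory: entities with observations, plus relations."""

    def __init__(self) -> None:
        self.entities: dict[str, list[str]] = {}
        self.relations: list[tuple[str, str, str]] = []

    def observe(self, entity: str, fact: str) -> None:
        # Append a free-text observation to a named entity.
        self.entities.setdefault(entity, []).append(fact)

    def relate(self, src: str, rel: str, dst: str) -> None:
        # Directed edge between two entities, e.g. ("opencode", "talks_to", "strix-box").
        self.relations.append((src, rel, dst))

    def recall(self, entity: str) -> list[str]:
        # What a harness would inject into the next session's context.
        return self.entities.get(entity, [])

m = Memory()
m.observe("strix-box", "runs llama.cpp with Vulkan")
m.relate("opencode", "talks_to", "strix-box")
print(m.recall("strix-box"))  # -> ['runs llama.cpp with Vulkan']
```

Crude as it is, this survives across sessions once serialized, which is exactly the gap per-session memory leaves.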
### Multi-modal

- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
  Route via LiteLLM so OpenCode auto-picks vision for image-bearing
  prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
  already in `localgenai/`. STT in, TTS out — a "talk to your local
  assistant" loop is something Claude Code itself doesn't do.
### Design (genuine new capability)

Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:

- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
  local-first OSS alternative to Claude Design. 19 Skills, 71
  brand-grade Design Systems; generates web/desktop/mobile prototypes,
  slides, images, and videos with sandboxed preview and
  HTML/PDF/PPTX/MP4 export. **Ships a stdio MCP server** exposing design
  source — tokens, JSX components, entry HTML — as a structured API to
  any MCP-aware agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
  Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
  vector design tool, "design-as-code." Newer; track it.
### Skills / MCP catalog

- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
  (1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** — searchable directory

Build the skills you miss; publish them.
## Prioritized next steps

In rough order of impact per hour spent:

1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
   quality jump for local-model coding, via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
   Playwright + SearXNG MCP servers activate. (Already written;
   one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
   even with one model, the abstraction pays off the moment you add a
   second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
   analog; useful even solo for tracking what the agent has done across
   sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
   LiteLLM. Closes the screenshot gap.
6. **Stand up open-design** as a compose stack and connect its MCP
   server to OpenCode. A genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — a different agentic loop than
   OpenCode, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
   item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
   multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
    OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
    non-coding LLM use and fills a small gap in the multi-user story.
## Reference catalogs to monitor

- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity tracking): <https://claude.com/plugins>
## Open gaps to keep watching

- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
  humans driving one agent on one repo in real time). No direct OSS
  equivalent yet; Multica is async ticket flow.
- **A cohesive "skills" library** as polished as Anthropic's plugin
  catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
  to abandon containerization for native ROCm tooling. (Tracked
  separately in `TODO.md`.)