Snapshot of where opencode + Qwen3-Coder + MCPs + Kimi-Linear + voice + Phoenix tracing land today, plus in-flight (oc-tree, kimi-linear context ramp) and next (ComfyUI) items with pointers to per-project NEXT_STEPS.md guides.
# Local GenAI roadmap

What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.

## Where we stand

- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models on the unified-memory iGPU at 64 GB UMA
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B

## The mental model: three layers

The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability                             │
│ Dispatch tickets to agents, watch spend, track skill            │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.)      │
└────────────────────────────┬────────────────────────────────────┘
                             │ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks                               │
│ Code that defines who does what, in what order. Reach for       │
│ when you want to *encode* a pipeline, not drive interactively.  │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.)                │
└────────────────────────────┬────────────────────────────────────┘
                             │ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses                                              │
│ The thing you actually drive — terminal or IDE agent that       │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline,         │
│ OpenHands, OpenWork, Goose, Pi.)                                │
└────────────────────────────┬────────────────────────────────────┘
                             │ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools                                      │
│ Models, MCP servers, embedding stores, voice/vision pipelines.  │
│ (Ollama, vLLM, llama.cpp, MCP catalog.)                         │
└─────────────────────────────────────────────────────────────────┘
```
Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.

## Layer 1: Harnesses

The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:

- **[Aider](https://aider.chat)** — terminal, git-native, every change
  is a commit. Strong on multi-file refactors. Often outperforms
  OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
  installs. v3.58 added native subagents and a CI/CD headless mode.
  Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
  OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
  tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
  — desktop GUI agent built on `deepagentsjs`. Multi-step planning,
  filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
  opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
  harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>

When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.

## Layer 2: Orchestration frameworks

When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.

- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
  graph with conditional edges, built-in checkpointing with time-travel.
  Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
  agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  ~5× faster than LangGraph in many cases. A2A protocol support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
  event-driven, async-first; GroupChat coordination with a selector
  deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
  production replacement for Swarm. Explicit agent-to-agent handoffs
  carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — the only
  framework with native support for **both MCP and A2A** protocols.

All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.
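
What "encoding a pipeline" buys you is that the control flow lives in
code you version rather than in an interactive session. A minimal
hand-rolled sketch of a draft/review loop with a conditional edge; this
is not any of the five frameworks' actual APIs, and the lambda "agents"
in the usage line stand in for real LLM calls:

```python
# Toy pipeline: a draft step and a review step with a conditional edge.
# `draft` and `review` are stand-ins for real LLM calls; frameworks like
# LangGraph express the same shape as graph nodes and conditional edges.
from typing import Callable

def run_pipeline(task: str, draft: Callable[[str], str],
                 review: Callable[[str], bool], max_rounds: int = 3) -> str:
    """Draft, then loop back to the drafter until review passes."""
    result = draft(task)
    for _ in range(max_rounds):
        if review(result):        # conditional edge: good enough, stop
            return result
        result = draft(result)    # edge back to the draft node
    return result                 # give up after max_rounds

# Stand-in agents: drafting appends "!", review wants two of them.
out = run_pipeline("x", draft=lambda s: s + "!",
                   review=lambda s: s.endswith("!!"))
```

CrewAI would express the same thing as two role-based agents and a
task; either way the payoff is that the pipeline is reproducible and
schedulable instead of living in your terminal history.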
## Layer 3: Management & observability

The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:

- **[Multica](https://github.com/multica-ai/multica)** — open-source
  platform for managing AI coding agents *as teammates*. File issues
  and assign them to agents; they pick up work, write code, report
  blockers, and update statuses. Skill accumulation across tasks.
  Dispatches to Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor
  Agent, Hermes, Gemini, Kiro, etc. **#1 on GitHub TypeScript Trending
  in April 2026, 10.7 k stars in three months.** Closest existing OSS
  analog to Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with an
  MCP server and full Agent Run lifecycle tracking. Velocity, workload,
  and blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
  self-hosted dashboard for agent fleets. Token spend per model, trend
  charts, real-time posture scoring. SQLite, zero deps. Useful even
  solo just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
  kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
  chat UI in front of Ollama. Doesn't manage *agent sessions*, but
  covers shared LLM access for a household.

**Still a gap (May 2026):** *live* shared agent sessions — multiple humans
driving one agent on one repo in real time, Cowork's multiplayer mode.
Multica is async ticket flow, not live-collaborative. Watch for
projects that close this.

## Layer 0: Cross-cutting capabilities

Things you wire in once, useful from any harness above.

### Aggregation & routing

- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
  one OpenAI-compatible endpoint over many backends. Rules like "code
  task → local Qwen3, image input → vision model, fast chat → Haiku."
  Worth standing up early so all higher-layer tools just point at one
  URL.
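
The routing itself lives in LiteLLM's proxy config; as a plain-Python
illustration of the kind of rule described above (the model names and
heuristics here are placeholders for illustration, not a real config):

```python
# Illustrative routing rule in the spirit of the LiteLLM setup above.
# Model names are placeholders for whatever the proxy config maps them to.

def pick_model(messages: list[dict]) -> str:
    """Choose a backend model name from an OpenAI-style message list."""
    has_image = any(
        isinstance(m.get("content"), list)
        and any(part.get("type") == "image_url" for part in m["content"])
        for m in messages
    )
    if has_image:
        return "qwen2.5-vl"       # image input -> vision model
    text = " ".join(m["content"] for m in messages
                    if isinstance(m.get("content"), str))
    if "def " in text or "import " in text or "refactor" in text.lower():
        return "qwen3-coder"      # code task -> local coder model
    return "haiku"                # fast chat -> small model
```

Every higher-layer tool then sets the one proxy URL, and the proxy
applies rules of this shape per request.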
### Code intelligence / RAG

The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.

- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
  with built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
  Ollama for local embeddings. **Biggest single-shot quality lift** for
  local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
  Copilot-style inline completion. A different niche from agentic
  coding; covers the "as I type" loop OpenCode doesn't.

### Web access — done

Already wired in `localgenai/opencode/`:

- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
  browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
  web search via your `searxng.n0n.io` instance. No API keys.
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
  not yet wired; useful if Playwright feels heavy for plain page reads.

**Next: cited-search variant.** Fork `mcp-searxng` (or wrap it) to expose
a `search_cited` tool that auto-fetches the top 3-5 results, extracts
readable text via mozilla/readability or trafilatura, and returns
numbered `[n] URL\n<excerpt>` blocks, with a tool description telling the
model to cite as `[n]`. Closes the citation-quality gap with
[opencode-websearch-cited](https://github.com/ghoulr/opencode-websearch-cited)
(which delegates to Gemini/OpenAI/OpenRouter) while staying fully local.
Tradeoff: 3-5 HTTP fetches per query → slower; needs a timeout and a
graceful skip for sites that block scrapers.
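
The formatting contract is the interesting part. A sketch of the tool's
core; the `search_cited` and `fetch_readable` names are hypothetical,
and the real version would query SearXNG first and run the
readability/trafilatura extraction (with a timeout) inside
`fetch_readable`:

```python
# Sketch of the cited-search output: numbered "[n] URL\n<excerpt>" blocks.
# `fetch_readable` is injected so the fetch/extract step (and its failure
# modes) stays swappable; None or an exception means "skip this site".
from typing import Callable, Optional

def search_cited(urls: list[str],
                 fetch_readable: Callable[[str], Optional[str]],
                 limit: int = 5, max_chars: int = 500) -> str:
    blocks = []
    for url in urls[:limit]:
        try:
            text = fetch_readable(url)   # real impl: HTTP GET + extraction
        except Exception:
            text = None                  # timeout / blocked scraper
        if not text:
            continue                     # graceful skip, keep numbering dense
        n = len(blocks) + 1
        blocks.append(f"[{n}] {url}\n{text[:max_chars]}")
    return "\n\n".join(blocks)
```

The tool description then instructs the model to cite as `[n]`,
matching these block numbers.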
### Observability — partial

- **[Arize Phoenix](https://github.com/Arize-ai/phoenix)** wired via
  `opencode/.opencode/plugin/phoenix-bridge.js`: OpenCode's experimental
  OTel spans go to Phoenix at `framework:6006` with the OpenInference
  span processor for LLM/tool-aware attributes. Subagents nest under the
  parent session as a unified trace tree. **External** trace viewing is
  solved.
- **Gap: in-harness visibility.** OpenCode's TUI has no sidebar plugin
  API ([sst/opencode#5971](https://github.com/sst/opencode/issues/5971),
  open) and no built-in tree/timeline view. Plugins can only show
  toasts. While driving the agent, there's no live "what's it doing
  right now" pane — you have to alt-tab to Phoenix.
- **Next: SSE sidecar.** `opencode serve` exposes documented SSE streams
  at `/global/event` and `/session/{id}/event` carrying every tool call,
  message part, and child-session creation. A small TUI sidecar
  (Bubbletea/textual, ~200 lines) running in a tmux pane next to
  opencode can subscribe and render a live tree:
  `session → tool call → child session → tool call`. Subagents *are*
  child sessions in the data model, so the hierarchy comes for free.
  Phoenix stays the deep-trace store; the sidecar is the live pane.
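
The sidecar's core is just folding a flat event stream into an indented
tree. A sketch of that step; the event dicts below are illustrative
assumptions, not opencode's actual SSE payload schema, so the real
sidecar would first map `/session/{id}/event` payloads into this shape:

```python
# Fold flat (id, parent, kind, label) events into an indented tree.
# Subagents arrive as child sessions, so nesting falls out of `parent`.

def render_tree(events: list[dict]) -> str:
    children: dict = {}
    for ev in events:
        children.setdefault(ev.get("parent"), []).append(ev)

    lines: list[str] = []
    def walk(parent, depth: int) -> None:
        for ev in children.get(parent, []):
            lines.append("  " * depth + f"{ev['kind']}: {ev['label']}")
            walk(ev["id"], depth + 1)    # recurse into child sessions
    walk(None, 0)
    return "\n".join(lines)

# Hypothetical event stream: one session, a tool call, and a subagent.
tree = render_tree([
    {"id": "s1", "parent": None, "kind": "session", "label": "main"},
    {"id": "t1", "parent": "s1", "kind": "tool", "label": "bash"},
    {"id": "s2", "parent": "s1", "kind": "session", "label": "subagent"},
    {"id": "t2", "parent": "s2", "kind": "tool", "label": "read"},
])
```

A Bubbletea or textual wrapper would re-render this on every incoming
event; Phoenix keeps the full spans for after-the-fact digging.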
### Persistent memory

OpenCode's memory is per-session unless you wire in an MCP server for it.

- **`mcp-server-memory`** (official) — knowledge-graph memory across
  any MCP client. Crude but works.
- **[mem0](https://github.com/mem0ai/mem0)** — sophisticated memory,
  self-hostable with a local embedding model.
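
The shape of what gets persisted is small: subject/relation/object
facts that survive sessions and get re-injected into prompts. A minimal
sketch (not `mcp-server-memory`'s actual API, and the example facts are
made up):

```python
# Minimal knowledge-graph memory: triples in, per-subject recall out.
# A real server persists this to disk and exposes it as MCP tools.

class MemoryGraph:
    def __init__(self) -> None:
        self.triples: set = set()

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self.triples.add((subject, relation, obj))

    def recall(self, subject: str) -> list:
        """Everything known about a subject, ready to inject into a prompt."""
        return sorted((r, o) for s, r, o in self.triples if s == subject)

# Made-up example facts:
mem = MemoryGraph()
mem.remember("opencode", "runs_on", "the Mac")
mem.remember("opencode", "talks_to", "the box over Tailscale")
```

The value is entirely in the recall step: whatever harness connects
next session can ask for everything known about `opencode` and prepend
it to the prompt.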
### Multi-modal

- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
  Route via LiteLLM so OpenCode auto-picks vision for image-bearing
  prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
  already in `localgenai/`. STT in, TTS out — a "talk to your local
  assistant" loop is something Claude Code itself doesn't do.

### Design (genuine new capability)

Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:

- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
  local-first OSS alternative to Claude Design. 19 Skills, 71
  brand-grade Design Systems; generates web/desktop/mobile prototypes,
  slides, images, and videos with sandboxed preview and HTML/PDF/PPTX/MP4
  export. **Ships a stdio MCP server** exposing design source — tokens,
  JSX components, entry HTML — as a structured API to any MCP-aware
  agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
  Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
  vector design tool, "design-as-code." Newer; one to track.

### Skills / MCP catalog

- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
  (1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** searchable directory

Build the skills you miss; publish them.

## Prioritized next steps

In rough order of impact per hour spent:

1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
   quality jump for local-model coding via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
   Playwright + SearXNG MCP servers activate. (Already written;
   one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
   even at one model, the abstraction pays off the moment you add a
   second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
   analog; useful even solo for tracking what the agent has done across
   sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
   LiteLLM. Closes the screenshot gap.
6. **Stand up OpenDesign** as a compose stack. Connect its MCP server
   to OpenCode. A genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — a different agentic loop than
   OpenCode, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
   item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
   multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
    OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
    non-coding LLM use; fills a small gap in the multi-user story.
12. **SSE sidecar for in-harness visibility.** A live tree pane
    subscribed to opencode's `/session/{id}/event` SSE; closes the
    "what's it doing right now" gap that Phoenix doesn't fill (Phoenix
    is for after-the-fact deep traces, not glanceable status).
13. **Cited-search MCP.** Fork `mcp-searxng` to add `search_cited` with
    auto-fetch + readability extraction. Smaller impact than the
    sidecar, but cheap, and it keeps everything local.

## Reference catalogs to monitor

- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity-tracking): <https://claude.com/plugins>

## Open gaps to keep watching

- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
  humans driving one agent on one repo, in real time). No direct OSS
  equivalent yet; Multica is async ticket flow.
- **Cohesive "skills" library** as polished as Anthropic's plugin
  catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
  to abandon containerization for native ROCm tooling. (Tracked
  separately in `TODO.md`.)