# Local GenAI roadmap

What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.

## Where we stand

- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models on the unified-memory iGPU at 64 GB UMA
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B

## The mental model: three layers

The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability                             │
│ Dispatch tickets to agents, watch spend, track skill            │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.)      │
└────────────────────────────┬────────────────────────────────────┘
                             │ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks                               │
│ Code that defines who does what, in what order. Reach for       │
│ when you want to *encode* a pipeline, not drive interactively.  │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.)                │
└────────────────────────────┬────────────────────────────────────┘
                             │ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses                                              │
│ The thing you actually drive — terminal or IDE agent that       │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline,         │
│ OpenHands, OpenWork, Goose, Pi.)                                │
└────────────────────────────┬────────────────────────────────────┘
                             │ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools                                      │
│ Models, MCP servers, embedding stores, voice/vision pipelines.  │
│ (Ollama, vLLM, llama.cpp, MCP catalog.)                         │
└─────────────────────────────────────────────────────────────────┘
```

Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.

## Layer 1: Harnesses

The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:

- **[Aider](https://aider.chat)** — terminal, git-native, every change
  is a commit. Strong on multi-file refactors. Often outperforms
  OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
  installs. v3.58 added native subagents and a CI/CD headless mode.
  Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
  OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
  tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
  — desktop GUI agent built on `deepagentsjs`. Multi-step planning,
  filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
  opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
  harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>

When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.

## Layer 2: Orchestration frameworks

When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.

- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
  graph with conditional edges, built-in checkpointing with time-travel.
  Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
  agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  ~5× faster than LangGraph in many cases. A2A protocol support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
  event-driven, async-first; GroupChat coordination with a selector
  deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
  production replacement for Swarm. Explicit agent-to-agent handoffs
  carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — only
  framework with native support for **both MCP and A2A** protocols.

All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.

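Concretely, "model-agnostic" means every framework speaks the OpenAI chat-completions wire format, so swapping a hosted model for the local box is a base-URL change. A minimal stdlib sketch; the endpoint URL and model names here are assumptions for illustration, not part of this repo:

```python
import json
import urllib.request

# Assumed endpoint: substitute your LiteLLM proxy URL or Ollama's
# OpenAI-compatible /v1 route.
BASE_URL = "http://localhost:11434/v1"

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(model: str, prompt: str) -> str:
    """POST to the local endpoint; the same request shape works against
    any OpenAI-compatible backend, local or hosted."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any of the five frameworks ultimately issues requests of exactly this shape; pointing them at `BASE_URL` instead of a vendor URL is the whole trick.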
## Layer 3: Management & observability

The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:

- **[Multica](https://github.com/multica-ai/multica)** — open-source
  platform for managing AI coding agents *as teammates*. File issues,
  assign to agents, they pick up work, write code, report blockers,
  update statuses. Skill accumulation across tasks. Dispatches to
  Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor Agent, Hermes,
  Gemini, Kiro, etc. **#1 GitHub TypeScript Trending in April 2026,
  10.7k stars in three months.** Closest existing OSS analog to
  Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with
  MCP server and full Agent Run lifecycle tracking. Velocity, workload,
  blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
  self-hosted dashboard for agent fleets. Token-spend per model, trend
  charts, real-time posture scoring. SQLite, zero deps. Useful even
  solo just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
  kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
  chat UI in front of Ollama. Doesn't manage *agent sessions*, but
  covers shared-LLM-access for a household.

**Still a gap (May 2026):** *live* shared agent sessions — multiple
humans driving one agent on one repo in real time, Cowork's multiplayer
mode. Multica is async-ticket-flow, not live-collaborative. Watch for
projects that close this.

## Layer 0: Cross-cutting capabilities

Things you wire in once, useful from any harness above.

### Aggregation & routing

- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
  one OpenAI-compatible endpoint over many backends. Rules like "code
  task → local Qwen3, image input → vision model, fast chat → Haiku."
  Worth standing up early so all higher-layer tools just point at one
  URL.

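The routing rule quoted above has a simple shape that plain Python can show; in practice LiteLLM encodes it declaratively in its proxy config, and the function name and model tags below are illustrative, not LiteLLM's API:

```python
def pick_model(task: str, has_image: bool = False) -> str:
    """Toy routing rule: decide which backend serves a request.
    Model tags are examples, not exact registry names."""
    if has_image:
        return "ollama/qwen2.5-vl:7b"     # image input -> vision model
    if task == "code":
        return "ollama/qwen3-coder:30b"   # code task -> local Qwen3
    return "anthropic/claude-haiku"       # fast chat -> Haiku
```

The payoff of putting this behind one URL: every harness and framework above sees a single endpoint, and the routing policy lives in one place.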
### Code intelligence / RAG

The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.

- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
  with built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
  Ollama for local embeddings. **Biggest single-shot quality lift** for
  local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
  Copilot-style inline completion. Different niche from agentic coding,
  covers the "as I type" loop OpenCode doesn't.

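The retrieval primitive Continue.dev automates is simple enough to sketch by hand: embed chunks locally, then rank them by cosine similarity against the query embedding. The Ollama API path and model name below are assumptions; check your instance:

```python
import json
import math
import urllib.request

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Fetch an embedding from a local Ollama instance.
    The /api/embeddings path and response key are assumed."""
    req = urllib.request.Request(
        "http://localhost:11434/api/embeddings",
        data=json.dumps({"model": model, "prompt": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the ranking primitive behind @codebase-style
    retrieval."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))
```

Index the repo once (embed each file chunk, store the vectors), then at prompt time rank chunks by `cosine` against the query and prepend the top few to the prompt.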
### Web access — done

Already wired in `localgenai/opencode/`:

- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
  browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
  web search via your `searxng.n0n.io` instance. No API keys.
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
  not yet wired; useful if Playwright feels heavy for plain page reads.

### Persistent memory

OpenCode's memory is per-session unless you wire MCP for it.

- **`mcp-server-memory`** (official) — knowledge-graph memory across
  any MCP client. Crude but works.
- **[mem0](https://github.com/mem0ai/mem0)** — sophisticated memory,
  self-hostable with a local embedding model.

### Multi-modal

- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
  Route via LiteLLM so OpenCode auto-picks vision for image-bearing
  prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
  already in `localgenai/`. STT in, TTS out — a "talk to your local
  assistant" loop is something Claude Code itself doesn't do.

### Design (genuine new capability)

Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:

- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
  local-first OSS alternative to Claude Design. 19 Skills, 71
  brand-grade Design Systems, generates web/desktop/mobile prototypes,
  slides, images, videos with sandboxed preview and HTML/PDF/PPTX/MP4
  export. **Ships a stdio MCP server** exposing design source — tokens,
  JSX components, entry HTML — as a structured API to any MCP-aware
  agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
  Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
  vector design tool, "design-as-code." Newer; track.

### Skills / MCP catalog

- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
  (1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** searchable directory

Build the skills you miss; publish them.

## Prioritized next steps

In rough order of impact-per-hour-spent:

1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
   quality jump for local-model coding via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
   Playwright + SearXNG MCP servers activate. (Already written;
   one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
   even at one model, the abstraction pays off the moment you add a
   second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
   analog; useful even solo for tracking what the agent has done across
   sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
   LiteLLM. Closes the screenshot gap.
6. **Stand up OpenDesign** as a compose stack. Connect its MCP server
   to OpenCode. Genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — a different agentic loop than
   OpenCode's, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
   item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
   multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
    OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
    non-coding LLM use, fills a small gap in the multi-user story.

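Step 2 above really is one command. A hedged sketch (both paths are assumptions; confirm where your checkout lives and where OpenCode reads its config):

```shell
# Paths are assumptions: the repo checkout location and the directory
# OpenCode reads its config from may differ on your machine.
src="$PWD/localgenai/opencode/opencode.json"
dst="$HOME/.config/opencode/opencode.json"
mkdir -p "$(dirname "$dst")"
ln -sf "$src" "$dst"   # -f replaces a stale link instead of failing
```

Symlinking rather than copying keeps the repo-tracked file authoritative: edits to the MCP server list land in git and take effect in place.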
## Reference catalogs to monitor

- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity-tracking): <https://claude.com/plugins>

## Open gaps to keep watching

- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
  humans driving one agent on one repo, real-time). No direct OSS
  equivalent yet; Multica is async-ticket-flow.
- **Cohesive "skills" library** as polished as Anthropic's plugin
  catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
  to abandon containerization for native ROCm tooling. (Tracked
  separately in `TODO.md`.)