Snapshot of where opencode + Qwen3-Coder + MCPs + Kimi-Linear + voice + Phoenix tracing land today, plus in-flight (oc-tree, kimi-linear context ramp) and next (ComfyUI) items with pointers to per-project NEXT_STEPS.md guides.
# Local GenAI roadmap

What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.

## Where we stand

- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models on the unified-memory iGPU at 64 GB UMA
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B

## The mental model: three layers

The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability                             │
│ Dispatch tickets to agents, watch spend, track skill            │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.)      │
└────────────────────────────┬────────────────────────────────────┘
                             │ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks                               │
│ Code that defines who does what, in what order. Reach for       │
│ when you want to *encode* a pipeline, not drive interactively.  │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.)                │
└────────────────────────────┬────────────────────────────────────┘
                             │ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses                                              │
│ The thing you actually drive — terminal or IDE agent that       │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline,         │
│ OpenHands, OpenWork, Goose, Pi.)                                │
└────────────────────────────┬────────────────────────────────────┘
                             │ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools                                      │
│ Models, MCP servers, embedding stores, voice/vision pipelines.  │
│ (Ollama, vLLM, llama.cpp, MCP catalog.)                         │
└─────────────────────────────────────────────────────────────────┘
```
Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.

## Layer 1: Harnesses

The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:

- **[Aider](https://aider.chat)** — terminal, git-native, every change
  is a commit. Strong on multi-file refactors. Often outperforms
  OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
  installs. v3.58 added native subagents and a CI/CD headless mode.
  Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
  OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
  tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
  — desktop GUI agent built on `deepagentsjs`. Multi-step planning,
  filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
  opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
  harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>

When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.

## Layer 2: Orchestration frameworks

When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.

- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
  graph with conditional edges, built-in checkpointing with time-travel.
  Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
  agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  ~5× faster than LangGraph in many cases. A2A protocol support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
  event-driven, async-first; GroupChat coordination with a selector
  deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
  production replacement for Swarm. Explicit agent-to-agent handoffs
  carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — the only
  framework with native support for **both MCP and A2A** protocols.

All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.
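
What "encoding a pipeline" buys you is that the control flow lives in
code you version rather than in an interactive session. A minimal
hand-rolled sketch of a draft/review loop with a conditional edge; this
is not any of the five frameworks' actual APIs, and the lambda "agents"
in the usage line stand in for real LLM calls:

```python
# Toy pipeline: a draft step and a review step with a conditional edge.
# `draft` and `review` are stand-ins for real LLM calls; frameworks like
# LangGraph express the same shape as graph nodes and conditional edges.
from typing import Callable

def run_pipeline(task: str, draft: Callable[[str], str],
                 review: Callable[[str], bool], max_rounds: int = 3) -> str:
    """Draft, then loop back to the drafter until review passes."""
    result = draft(task)
    for _ in range(max_rounds):
        if review(result):        # conditional edge: good enough, stop
            return result
        result = draft(result)    # edge back to the draft node
    return result                 # give up after max_rounds

# Stand-in agents: drafting appends "!", review wants two of them.
out = run_pipeline("x", draft=lambda s: s + "!",
                   review=lambda s: s.endswith("!!"))
```

CrewAI would express the same thing as two role-based agents and a
task; either way the payoff is that the pipeline is reproducible and
schedulable instead of living in your terminal history.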
## Layer 3: Management & observability

The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:

- **[Multica](https://github.com/multica-ai/multica)** — open-source
  platform for managing AI coding agents *as teammates*. File issues
  and assign them to agents; they pick up work, write code, report
  blockers, and update statuses. Skill accumulation across tasks.
  Dispatches to Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor
  Agent, Hermes, Gemini, Kiro, etc. **#1 on GitHub TypeScript Trending
  in April 2026, 10.7 k stars in three months.** Closest existing OSS
  analog to Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with an
  MCP server and full Agent Run lifecycle tracking. Velocity, workload,
  and blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
  self-hosted dashboard for agent fleets. Token spend per model, trend
  charts, real-time posture scoring. SQLite, zero deps. Useful even
  solo just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
  kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
  chat UI in front of Ollama. Doesn't manage *agent sessions*, but
  covers shared LLM access for a household.

**Still a gap (May 2026):** *live* shared agent sessions — multiple humans
driving one agent on one repo in real time, Cowork's multiplayer mode.
Multica is async ticket flow, not live-collaborative. Watch for
projects that close this.

## Layer 0: Cross-cutting capabilities

Things you wire in once, useful from any harness above.

### Aggregation & routing

- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
  one OpenAI-compatible endpoint over many backends. Rules like "code
  task → local Qwen3, image input → vision model, fast chat → Haiku."
  Worth standing up early so all higher-layer tools just point at one
  URL.
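
The routing itself lives in LiteLLM's proxy config; as a plain-Python
illustration of the kind of rule described above (the model names and
heuristics here are placeholders for illustration, not a real config):

```python
# Illustrative routing rule in the spirit of the LiteLLM setup above.
# Model names are placeholders for whatever the proxy config maps them to.

def pick_model(messages: list[dict]) -> str:
    """Choose a backend model name from an OpenAI-style message list."""
    has_image = any(
        isinstance(m.get("content"), list)
        and any(part.get("type") == "image_url" for part in m["content"])
        for m in messages
    )
    if has_image:
        return "qwen2.5-vl"       # image input -> vision model
    text = " ".join(m["content"] for m in messages
                    if isinstance(m.get("content"), str))
    if "def " in text or "import " in text or "refactor" in text.lower():
        return "qwen3-coder"      # code task -> local coder model
    return "haiku"                # fast chat -> small model
```

Every higher-layer tool then sets the one proxy URL, and the proxy
applies rules of this shape per request.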
### Code intelligence / RAG

The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.

- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
  with built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
  Ollama for local embeddings. **Biggest single-shot quality lift** for
  local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
  Copilot-style inline completion. A different niche from agentic
  coding; covers the "as I type" loop OpenCode doesn't.

### Web access — done

Already wired in `localgenai/opencode/`:

- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
  browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
  web search via your `searxng.n0n.io` instance. No API keys.
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
  not yet wired; useful if Playwright feels heavy for plain page reads.

**Next: cited-search variant.** Fork `mcp-searxng` (or wrap it) to expose
a `search_cited` tool that auto-fetches the top 3-5 results, extracts
readable text via mozilla/readability or trafilatura, and returns
numbered `[n] URL\n<excerpt>` blocks, with a tool description telling the
model to cite as `[n]`. Closes the citation-quality gap with
[opencode-websearch-cited](https://github.com/ghoulr/opencode-websearch-cited)
(which delegates to Gemini/OpenAI/OpenRouter) while staying fully local.
Tradeoff: 3-5 HTTP fetches per query → slower; needs a timeout and a
graceful skip for sites that block scrapers.
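
The formatting contract is the interesting part. A sketch of the tool's
core; the `search_cited` and `fetch_readable` names are hypothetical,
and the real version would query SearXNG first and run the
readability/trafilatura extraction (with a timeout) inside
`fetch_readable`:

```python
# Sketch of the cited-search output: numbered "[n] URL\n<excerpt>" blocks.
# `fetch_readable` is injected so the fetch/extract step (and its failure
# modes) stays swappable; None or an exception means "skip this site".
from typing import Callable, Optional

def search_cited(urls: list[str],
                 fetch_readable: Callable[[str], Optional[str]],
                 limit: int = 5, max_chars: int = 500) -> str:
    blocks = []
    for url in urls[:limit]:
        try:
            text = fetch_readable(url)   # real impl: HTTP GET + extraction
        except Exception:
            text = None                  # timeout / blocked scraper
        if not text:
            continue                     # graceful skip, keep numbering dense
        n = len(blocks) + 1
        blocks.append(f"[{n}] {url}\n{text[:max_chars]}")
    return "\n\n".join(blocks)
```

The tool description then instructs the model to cite as `[n]`,
matching these block numbers.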
### Observability — partial

- **[Arize Phoenix](https://github.com/Arize-ai/phoenix)** wired via
  `opencode/.opencode/plugin/phoenix-bridge.js`: OpenCode's experimental
  OTel spans go to Phoenix at `framework:6006` with the OpenInference
  span processor for LLM/tool-aware attributes. Subagents nest under the
  parent session as a unified trace tree. **External** trace viewing is
  solved.
- **Gap: in-harness visibility.** OpenCode's TUI has no sidebar plugin
  API ([sst/opencode#5971](https://github.com/sst/opencode/issues/5971),
  open) and no built-in tree/timeline view. Plugins can only show
  toasts. While driving the agent, there's no live "what's it doing
  right now" pane — you have to alt-tab to Phoenix.
- **Next: SSE sidecar.** `opencode serve` exposes documented SSE streams
  at `/global/event` and `/session/{id}/event` carrying every tool call,
  message part, and child-session creation. A small TUI sidecar
  (Bubbletea/textual, ~200 lines) running in a tmux pane next to
  opencode can subscribe and render a live tree:
  `session → tool call → child session → tool call`. Subagents *are*
  child sessions in the data model, so the hierarchy comes for free.
  Phoenix stays the deep-trace store; the sidecar is the live pane.
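
The sidecar's core is just folding a flat event stream into an indented
tree. A sketch of that step; the event dicts below are illustrative
assumptions, not opencode's actual SSE payload schema, so the real
sidecar would first map `/session/{id}/event` payloads into this shape:

```python
# Fold flat (id, parent, kind, label) events into an indented tree.
# Subagents arrive as child sessions, so nesting falls out of `parent`.

def render_tree(events: list[dict]) -> str:
    children: dict = {}
    for ev in events:
        children.setdefault(ev.get("parent"), []).append(ev)

    lines: list[str] = []
    def walk(parent, depth: int) -> None:
        for ev in children.get(parent, []):
            lines.append("  " * depth + f"{ev['kind']}: {ev['label']}")
            walk(ev["id"], depth + 1)    # recurse into child sessions
    walk(None, 0)
    return "\n".join(lines)

# Hypothetical event stream: one session, a tool call, and a subagent.
tree = render_tree([
    {"id": "s1", "parent": None, "kind": "session", "label": "main"},
    {"id": "t1", "parent": "s1", "kind": "tool", "label": "bash"},
    {"id": "s2", "parent": "s1", "kind": "session", "label": "subagent"},
    {"id": "t2", "parent": "s2", "kind": "tool", "label": "read"},
])
```

A Bubbletea or textual wrapper would re-render this on every incoming
event; Phoenix keeps the full spans for after-the-fact digging.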
### Persistent memory

OpenCode's memory is per-session unless you wire in an MCP server for it.

- **`mcp-server-memory`** (official) — knowledge-graph memory across
  any MCP client. Crude but works.
- **[mem0](https://github.com/mem0ai/mem0)** — sophisticated memory,
  self-hostable with a local embedding model.
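
The shape of what gets persisted is small: subject/relation/object
facts that survive sessions and get re-injected into prompts. A minimal
sketch (not `mcp-server-memory`'s actual API, and the example facts are
made up):

```python
# Minimal knowledge-graph memory: triples in, per-subject recall out.
# A real server persists this to disk and exposes it as MCP tools.

class MemoryGraph:
    def __init__(self) -> None:
        self.triples: set = set()

    def remember(self, subject: str, relation: str, obj: str) -> None:
        self.triples.add((subject, relation, obj))

    def recall(self, subject: str) -> list:
        """Everything known about a subject, ready to inject into a prompt."""
        return sorted((r, o) for s, r, o in self.triples if s == subject)

# Made-up example facts:
mem = MemoryGraph()
mem.remember("opencode", "runs_on", "the Mac")
mem.remember("opencode", "talks_to", "the box over Tailscale")
```

The value is entirely in the recall step: whatever harness connects
next session can ask for everything known about `opencode` and prepend
it to the prompt.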
### Multi-modal

- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
  Route via LiteLLM so OpenCode auto-picks vision for image-bearing
  prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
  already in `localgenai/`. STT in, TTS out — a "talk to your local
  assistant" loop is something Claude Code itself doesn't do.

### Design (genuine new capability)

Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:

- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
  local-first OSS alternative to Claude Design. 19 Skills, 71
  brand-grade Design Systems; generates web/desktop/mobile prototypes,
  slides, images, and videos with sandboxed preview and HTML/PDF/PPTX/MP4
  export. **Ships a stdio MCP server** exposing design source — tokens,
  JSX components, entry HTML — as a structured API to any MCP-aware
  agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
  Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
  vector design tool, "design-as-code." Newer; one to track.

### Skills / MCP catalog

- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
  (1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** searchable directory

Build the skills you miss; publish them.

## Prioritized next steps

In rough order of impact per hour spent:

1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
   quality jump for local-model coding via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
   Playwright + SearXNG MCP servers activate. (Already written;
   one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
   even at one model, the abstraction pays off the moment you add a
   second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
   analog; useful even solo for tracking what the agent has done across
   sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
   LiteLLM. Closes the screenshot gap.
6. **Stand up OpenDesign** as a compose stack. Connect its MCP server
   to OpenCode. A genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — a different agentic loop than
   OpenCode, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
   item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
   multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
    OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
    non-coding LLM use; fills a small gap in the multi-user story.
12. **SSE sidecar for in-harness visibility.** A live tree pane
    subscribed to opencode's `/session/{id}/event` SSE; closes the
    "what's it doing right now" gap that Phoenix doesn't fill (Phoenix
    is for after-the-fact deep traces, not glanceable status).
13. **Cited-search MCP.** Fork `mcp-searxng` to add `search_cited` with
    auto-fetch + readability extraction. Smaller impact than the
    sidecar, but cheap, and it keeps everything local.

## Reference catalogs to monitor

- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity-tracking): <https://claude.com/plugins>

## Open gaps to keep watching

- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
  humans driving one agent on one repo, in real time). No direct OSS
  equivalent yet; Multica is async ticket flow.
- **Cohesive "skills" library** as polished as Anthropic's plugin
  catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
  to abandon containerization for native ROCm tooling. (Tracked
  separately in `TODO.md`.)