Local GenAI roadmap

What to build on top of the Framework Desktop / Strix Halo stack to narrow the functional gap with Claude Code + Anthropic's broader ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit quarterly.

Where we stand

  • Containerized inference at /srv/docker/{llama,vllm,ollama}
  • Models served from the Strix Halo iGPU's 64 GB of unified memory (UMA)
  • OpenCode on the Mac, pointed at the box over Tailscale
  • Playwright + SearXNG MCP wired into OpenCode (localgenai/opencode/)
  • Self-hosted SearXNG instance at https://searxng.n0n.io
  • Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B

The mental model: three layers above the inference floor

The OSS ecosystem stratifies cleanly once you stop thinking about individual products:

┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability                             │
│   Dispatch tickets to agents, watch spend, track skill           │
│   accumulation. Where Claude Cowork lives. (Multica, Plane.)    │
└────────────────────────────┬────────────────────────────────────┘
                             │ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks                                │
│   Code that defines who does what, in what order. Reach for      │
│   when you want to *encode* a pipeline, not drive interactively. │
│   (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.)               │
└────────────────────────────┬────────────────────────────────────┘
                             │ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses                                               │
│   The thing you actually drive — terminal or IDE agent that      │
│   takes a prompt and uses tools. (OpenCode, Aider, Cline,        │
│   OpenHands, OpenWork, Goose, Pi.)                               │
└────────────────────────────┬────────────────────────────────────┘
                             │ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools                                       │
│   Models, MCP servers, embedding stores, voice/vision pipelines. │
│   (Ollama, vLLM, llama.cpp, MCP catalog.)                        │
└─────────────────────────────────────────────────────────────────┘

Pick what you need from each layer; you don't need anything from a higher layer if the one below is doing the job.

Layer 1: Harnesses

The interactive driver's seat. We use OpenCode; alternatives worth having in your back pocket:

  • Aider — terminal, git-native, every change is a commit. Strong on multi-file refactors. Often outperforms OpenCode on small-context local models.
  • Cline — VS Code agent, 5M+ installs. v3.58 added native subagents and a CI/CD headless mode. Closest "Claude Code in the IDE" feel.
  • OpenHands (was OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs tests, browses. Heavier than OpenCode, more autonomous.
  • OpenWork — desktop GUI agent built on deepagentsjs. Multi-step planning, filesystem access, subagent delegation baked in. npx to start.
  • Goose by Block — MCP-native, opinionated about toolkits. Strong fit if you're going all-in on MCP.
  • Pi — minimal terminal harness; small surface area, easy to read and extend.
  • Catalog: https://github.com/bradAGI/awesome-cli-coding-agents

When to add another: when you hit a workflow OpenCode doesn't handle well. Aider for refactors, Cline for IDE-bound work, OpenHands for autonomous loops, OpenWork for subagent-heavy planning.

Layer 2: Orchestration frameworks

When you want to build a pipeline rather than drive a prebuilt one — write code that defines agents, tools, and a flow. Overkill for solo coding; on-target for personal automation pipelines.

  • LangGraph — directed graph with conditional edges, built-in checkpointing with time-travel. Surpassed CrewAI in stars in early 2026; production-grade.
  • CrewAI — role-based crews, agents/tasks/crew in <20 lines of Python. Lowest learning curve, ~5× faster than LangGraph in many cases. A2A protocol support.
  • AutoGen / AG2 — event-driven, async-first; GroupChat coordination with a selector deciding who speaks next.
  • OpenAI Agents SDK — production replacement for Swarm. Explicit agent-to-agent handoffs carrying conversation context.
  • OpenAgents — only framework with native support for both MCP and A2A protocols.

All five are model-agnostic — point them at the local Ollama endpoint or LiteLLM and they work the same as if you were using GPT-5.
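
A minimal sketch of what "point it at the local endpoint" looks like in LangGraph, assuming langgraph and langchain-ollama are installed and the Ollama API is reachable at http://framework:11434; the model tag and prompt are illustrative:

```python
# Minimal LangGraph graph with a single agent node backed by the local Ollama endpoint.
# Host, model tag, and prompt are illustrative; use whatever `ollama list` shows.
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama
from langgraph.graph import END, START, MessagesState, StateGraph

llm = ChatOllama(
    model="qwen3-coder:30b",            # local model served by Ollama
    base_url="http://framework:11434",  # the box, reached over Tailscale
)

def coder(state: MessagesState) -> dict:
    # One LLM call per step; tools and conditional edges get added as the pipeline grows.
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("coder", coder)
graph.add_edge(START, "coder")
graph.add_edge("coder", END)
app = graph.compile()

result = app.invoke({"messages": [HumanMessage("Write a pytest for a slugify() helper")]})
print(result["messages"][-1].content)
```

Swapping in CrewAI or the OpenAI Agents SDK changes how the agents are declared, not the endpoint wiring.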

Layer 3: Management & observability

The Claude Cowork analog layer. Dispatch work, track progress, monitor spend. The piece I missed in earlier roadmap drafts:

  • Multica — open-source platform for managing AI coding agents as teammates. File issues, assign to agents, they pick up work, write code, report blockers, update statuses. Skill accumulation across tasks. Dispatches to Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor Agent, Hermes, Gemini, Kiro, etc. #1 GitHub TypeScript Trending in April 2026, 10.7 k stars in three months. Closest existing OSS analog to Cowork; works above OpenCode rather than replacing it.
  • Plane — AI-native project management with MCP server and full Agent Run lifecycle tracking. Velocity, workload, blockers auto-populated. Self-hostable.
  • Mission Control — self-hosted dashboard for agent fleets. Token-spend per model, trend charts, real-time posture scoring. SQLite, zero deps. Useful even solo just to watch your local model's spend pattern.
  • ClawDeck — focused kanban for OpenClaw agents. Smaller scope, more opinionated.
  • OpenWebUI — multi-user chat UI in front of Ollama. Doesn't manage agent sessions, but covers shared-LLM-access for a household.

Remaining gap (May 2026): live shared agent sessions — multiple humans driving one agent on one repo in real time, Cowork's multiplayer mode. Multica is an async ticket flow, not live collaboration. Watch for projects that close this.

Layer 0: Cross-cutting capabilities

Things you wire in once, useful from any harness above.

Aggregation & routing

  • LiteLLM — proxy that exposes one OpenAI-compatible endpoint over many backends. Rules like "code task → local Qwen3, image input → vision model, fast chat → Haiku." Worth standing up early so all higher-layer tools just point at one URL.
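
Once the proxy is up, every higher-layer tool talks to one URL and picks a backend by model alias; the routing rules themselves live in the proxy's model_list config. A hedged sketch of the client side, where the port, aliases, and API key are assumptions rather than the real deployment:

```python
# What every higher-layer tool sees once LiteLLM is up: one OpenAI-compatible URL.
# Port, aliases, and key are assumptions; routing rules live in the proxy config.
from openai import OpenAI

client = OpenAI(base_url="http://framework:4000/v1", api_key="anything-local")

# "local-coder" is an alias the proxy maps to an Ollama/vLLM backend;
# "code task -> Qwen3, image input -> vision model" is decided proxy-side.
resp = client.chat.completions.create(
    model="local-coder",
    messages=[{"role": "user", "content": "Refactor this function to be pure."}],
)
print(resp.choices[0].message.content)
```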

Code intelligence / RAG

The local model can't see beyond what fits in its prompt. Claude Code papers over this with long-context recall + WebFetch; locally we need explicit retrieval.

  • Continue.dev — VS Code / JetBrains plugin with built-in @codebase indexer. Pairs with nomic-embed-text via Ollama for local embeddings. Biggest single-shot quality lift for local-model coding. A by-hand sketch of the embedding loop follows this list.
  • Tabby — self-hosted Copilot-style inline completion. Different niche from agentic coding, covers the "as I type" loop OpenCode doesn't.
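
To make the Continue.dev pairing concrete, here is the retrieval loop it automates, done by hand: embed chunks with nomic-embed-text through Ollama's embeddings endpoint, then rank by cosine similarity. Host, chunk contents, and file names are illustrative assumptions:

```python
# Hand-rolled version of the @codebase loop: embed chunks locally, rank by similarity.
# Host, chunks, and query are illustrative assumptions.
import math
import requests

OLLAMA = "http://framework:11434"

def embed(text: str) -> list[float]:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text}, timeout=30)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

chunks = {
    "auth.py": "def verify_token(tok): ...",
    "db.py": "def get_connection(dsn): ...",
}
index = {path: embed(src) for path, src in chunks.items()}

query = embed("where do we validate JWTs?")
best = max(index, key=lambda path: cosine(query, index[path]))
print("most relevant file:", best)  # context to stuff into the coding prompt
```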

Web access — done

Already wired in localgenai/opencode/:

  • @playwright/mcp — browser automation. Navigate, click, fill forms, read DOM snapshots.
  • mcp-searxng — web search via your searxng.n0n.io instance. No API keys.
  • mcp-server-fetch — not yet wired; useful if Playwright feels heavy for plain page reads.

Next: cited-search variant. Fork mcp-searxng (or wrap it) to expose a search_cited tool that auto-fetches the top 3-5 results, extracts readable text via mozilla/readability or trafilatura, and returns numbered [n] URL\n<excerpt> blocks with a tool description telling the model to cite as [n]. Closes the citation-quality gap with opencode-websearch-cited (which delegates to Gemini/OpenAI/OpenRouter) while staying fully local. Tradeoff: 3-5 HTTP fetches per query → slower; needs timeout + graceful skip for sites that block scrapers.
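
A rough sketch of that tool built on the official Python MCP SDK (FastMCP), the SearXNG JSON API, and trafilatura. This is the intended shape rather than the actual fork; result limits, truncation, and error handling are deliberately simplistic assumptions:

```python
# Sketch of search_cited: SearXNG JSON API for hits, trafilatura for readable text,
# numbered [n] blocks so the model can cite. Limits and truncation are assumptions;
# per-fetch timeouts are left to trafilatura's defaults.
import requests
import trafilatura
from mcp.server.fastmcp import FastMCP

SEARXNG = "https://searxng.n0n.io"
mcp = FastMCP("searxng-cited")

@mcp.tool()
def search_cited(query: str, max_results: int = 4) -> str:
    """Web search that fetches the top hits and returns numbered excerpts.
    Cite sources as [n] in your answer."""
    hits = requests.get(f"{SEARXNG}/search",
                        params={"q": query, "format": "json"},
                        timeout=15).json().get("results", [])[:max_results]
    blocks = []
    for n, hit in enumerate(hits, start=1):
        downloaded = trafilatura.fetch_url(hit["url"])
        text = trafilatura.extract(downloaded) if downloaded else None
        if not text:
            text = hit.get("content", "")  # graceful skip: fall back to the search snippet
        blocks.append(f"[{n}] {hit['url']}\n{text[:1500]}")
    return "\n\n".join(blocks)

if __name__ == "__main__":
    mcp.run()  # stdio transport, so OpenCode can launch it like the other MCP servers
```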

Observability — partial

  • Arize Phoenix wired via opencode/.opencode/plugin/phoenix-bridge.js: OpenCode's experimental OTel spans go to Phoenix at framework:6006 with the OpenInference span processor for LLM/tool-aware attributes. Subagents nest under the parent session as a unified trace tree. External trace viewing is solved.
  • Gap: in-harness visibility. OpenCode's TUI has no sidebar plugin API (sst/opencode#5971, open) and no built-in tree/timeline view. Plugins can only show toasts. While driving the agent, there's no live "what's it doing right now" pane — you have to alt-tab to Phoenix.
  • Next: SSE sidecar. opencode serve exposes documented SSE streams at /global/event and /session/{id}/event carrying every tool call, message part, and child-session creation. A small TUI sidecar (Bubbletea/textual, ~200 lines) running in a tmux pane next to opencode can subscribe and render a live tree: session → tool call → child session → tool call. Subagents are child sessions in the data model, so the hierarchy comes for free. Phoenix stays the deep-trace store; sidecar is the live pane.
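
Before reaching for Bubbletea or textual, a bare-bones tail of the stream is enough to prove the idea out. The event type and field names below are assumptions to be adjusted against whatever the endpoint actually emits:

```python
# Minimal sidecar: tail opencode's SSE stream and print tool calls / child sessions
# as an indented tree. Port and event shape are assumptions, not the documented schema.
import json
import requests

OPENCODE = "http://framework:4096"   # `opencode serve` address (port is illustrative)
depth: dict[str, int] = {}           # session id -> indent level

with requests.get(f"{OPENCODE}/global/event", stream=True, timeout=None) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue                              # skip SSE keep-alives and comments
        event = json.loads(line[len("data:"):])
        kind = event.get("type", "")
        props = event.get("properties", {})
        sid = props.get("sessionID", "?")
        if kind == "session.created":
            parent = props.get("parentID")        # child sessions nest one level deeper
            depth[sid] = depth.get(parent, -1) + 1
            print("  " * depth[sid] + f"session {sid}")
        elif kind.startswith("tool"):
            print("  " * (depth.get(sid, 0) + 1) + f"tool: {props.get('tool', kind)}")
```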

Persistent memory

OpenCode's memory is per-session unless you wire in an MCP memory server.

  • mcp-server-memory (official) — knowledge-graph memory across any MCP client. Crude but works.
  • mem0 — sophisticated memory, self-hostable with a local embedding model.

Multi-modal

  • Vision: qwen2.5-vl:7b or llama3.2-vision:11b via Ollama. Route via LiteLLM so OpenCode auto-picks vision for image-bearing prompts. Closes the screenshot gap; the request shape is sketched after this list.
  • Voice: whisper-compose.yaml and piper-compose.yaml are already in localgenai/. STT in, TTS out — a "talk to your local assistant" loop is something Claude Code itself doesn't do.
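
For reference, an image-bearing request through the proxy is just the standard OpenAI content-parts format, which LiteLLM forwards to whichever vision backend the alias maps to; the alias, host, and file path are assumptions:

```python
# Image-bearing chat completion through the LiteLLM proxy. Alias, host, and the
# base64 plumbing are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://framework:4000/v1", api_key="anything-local")

with open("screenshot.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="local-vision",            # alias the proxy maps to e.g. qwen2.5-vl:7b on Ollama
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What error is shown in this screenshot?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```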

Design (genuine new capability)

Anthropic ships Claude Design (Anthropic Labs); the OSS analog is real and active:

  • nexu-io/open-design — local-first OSS alternative to Claude Design. 19 Skills, 71 brand-grade Design Systems, generates web/desktop/mobile prototypes, slides, images, videos with sandboxed preview and HTML/PDF/PPTX/MP4 export. Ships a stdio MCP server exposing design source — tokens, JSX components, entry HTML — as a structured API to any MCP-aware agent. Pairs naturally with OpenCode + the local model.
  • Penpot — full OSS Figma alternative. Different scope (interactive design tool, not agentic).
  • OpenPencil — AI-native vector design tool, "design-as-code." Newer; track.

Skills / MCP catalog

Build the skills you miss; publish them.

Prioritized next steps

In rough order of impact-per-hour-spent:

  1. Continue.dev + nomic-embed-text in VS Code. Biggest single quality jump for local-model coding via codebase indexing.
  2. Symlink localgenai/opencode/opencode.json into place so the Playwright + SearXNG MCP servers activate. (Already written; one ln -sf away.)
  3. Stand up LiteLLM as the aggregation layer in front of Ollama — even at one model, the abstraction pays off the moment you add a second.
  4. Stand up Multica for ticket-style work dispatch. Closest Cowork analog; useful even solo for tracking what the agent has done across sessions.
  5. Add a vision model (qwen2.5-vl:7b) and route image prompts via LiteLLM. Closes the screenshot gap.
  6. Stand up OpenDesign as a compose stack. Connect its MCP server to OpenCode. Genuinely new capability versus Claude Code.
  7. Try Aider on a real refactor — different agentic loop than OpenCode, often better with smaller-context local models.
  8. Wire whisper + piper into a voice loop. Not a Claude Code parity item — a new capability the local stack uniquely enables.
  9. Mission Control on the box for token-spend visibility once multi-model routing through LiteLLM is real.
  10. Evaluate OpenHands or Goose for autonomous agentic loops where OpenCode's per-task UX feels too hands-on.
  11. OpenWebUI for household-shared chat access. Adds value for non-coding LLM use, fills a small gap in the multi-user story.
  12. SSE sidecar for in-harness visibility. Live tree pane subscribed to opencode's /session/{id}/event SSE; closes the "what's it doing right now" gap that Phoenix doesn't fill (Phoenix is for after-the-fact deep traces, not glanceable status).
  13. Cited-search MCP. Fork mcp-searxng to add search_cited with auto-fetch + readability extraction. Smaller-impact than the sidecar but cheap and keeps everything local.

Reference catalogs to monitor

Open gaps to keep watching

  • Live shared agent sessions — Cowork's multiplayer mode (multiple humans driving one agent on one repo, real-time). No direct OSS equivalent yet; Multica is async-ticket-flow.
  • Cohesive "skills" library as polished as Anthropic's plugin catalog. The MCP ecosystem has the parts but lacks the curation.
  • AMD ROCm packages for Ubuntu 26.04 — relevant if we ever decide to abandon containerization for native ROCm tooling. (Tracked separately in TODO.md.)