Initial commit: localgenai stack

Containerized local LLM stack for the Framework Desktop / Strix Halo,
plus the OpenCode harness on the Mac side.

- pyinfra/framework/: pyinfra deploy targeting the box
  - llama.cpp (Vulkan), vLLM (ROCm), Ollama (ROCm with HSA override
    for gfx1151), OpenWebUI
  - Beszel (host + container + AMD GPU dashboard via sysfs)
  - OpenLIT (LLM fleet metrics)
  - Phoenix (per-trace agent waterfall)
  - OpenHands (autonomous agent in a Docker sandbox)
- opencode/: OpenCode config + Phoenix bridge plugin (OTel exporter)
  - install.sh deploys to ~/.config/opencode/
- StrixHaloSetup.md / StrixHaloMemory.md / Roadmap.md / TODO.md:
  documentation and planning
- testing/qwen3-coder-30b/: small evaluation harness

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit 2c4bfefa95, 2026-05-08 11:35:10 -04:00
36 changed files with 5265 additions and 0 deletions

Roadmap.md (new file, 263 lines):

# Local GenAI roadmap
What to build on top of the Framework Desktop / Strix Halo stack to
narrow the functional gap with Claude Code + Anthropic's broader
ecosystem (Cowork, Skills, Design). Captured 2026-05-07; revisit
quarterly.
## Where we stand
- Containerized inference at `/srv/docker/{llama,vllm,ollama}`
- Models running in the iGPU's 64 GB unified-memory (UMA) allocation
- OpenCode on the Mac, pointed at the box over Tailscale
- Playwright + SearXNG MCP wired into OpenCode (`localgenai/opencode/`)
- Self-hosted SearXNG instance at `https://searxng.n0n.io`
- Solid floor for single-user terminal coding with Qwen3-Coder-30B-A3B
## The mental model: three layers
The OSS ecosystem stratifies cleanly once you stop thinking about
individual products:
```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: Management & observability │
│ Dispatch tickets to agents, watch spend, track skill │
│ accumulation. Where Claude Cowork lives. (Multica, Plane.) │
└────────────────────────────┬────────────────────────────────────┘
│ assigns work to ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 2: Orchestration frameworks │
│ Code that defines who does what, in what order. Reach for │
│ when you want to *encode* a pipeline, not drive interactively. │
│ (LangGraph, CrewAI, AutoGen, OpenAI Agents SDK.) │
└────────────────────────────┬────────────────────────────────────┘
│ runs ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 1: Harnesses │
│ The thing you actually drive — terminal or IDE agent that │
│ takes a prompt and uses tools. (OpenCode, Aider, Cline, │
│ OpenHands, OpenWork, Goose, Pi.) │
└────────────────────────────┬────────────────────────────────────┘
│ calls ↓
┌────────────────────────────┴────────────────────────────────────┐
│ Layer 0: Inference + tools │
│ Models, MCP servers, embedding stores, voice/vision pipelines. │
│ (Ollama, vLLM, llama.cpp, MCP catalog.) │
└─────────────────────────────────────────────────────────────────┘
```
Pick what you need from each layer; you don't need anything from a
higher layer if the one below is doing the job.
## Layer 1: Harnesses
The interactive driver's seat. We use **OpenCode**; alternatives worth
having in your back pocket:
- **[Aider](https://aider.chat)** — terminal, git-native, every change
is a commit. Strong on multi-file refactors. Often outperforms
OpenCode on small-context local models.
- **[Cline](https://github.com/cline/cline)** — VS Code agent, 5M+
installs. v3.58 added native subagents and a CI/CD headless mode.
Closest "Claude Code in the IDE" feel.
- **[OpenHands](https://github.com/All-Hands-AI/OpenHands)** (was
OpenDevin) — autonomous agent in a Docker sandbox; edits code, runs
tests, browses. Heavier than OpenCode, more autonomous.
- **[OpenWork](https://supergok.com/openwork-open-source-ai-agent-framework/)**
— desktop GUI agent built on `deepagentsjs`. Multi-step planning,
filesystem access, **subagent delegation** baked in. `npx` to start.
- **[Goose](https://github.com/block/goose)** by Block — MCP-native,
opinionated about toolkits. Strong fit if you're going all-in on MCP.
- **[Pi](https://github.com/bradAGI/pi-mono)** — minimal terminal
harness; small surface area, easy to read and extend.
- Catalog: <https://github.com/bradAGI/awesome-cli-coding-agents>
When to add another: when you hit a workflow OpenCode doesn't handle
well. Aider for refactors, Cline for IDE-bound work, OpenHands for
autonomous loops, OpenWork for subagent-heavy planning.
## Layer 2: Orchestration frameworks
When you want to *build* a pipeline rather than drive a prebuilt one —
write code that defines agents, tools, and a flow. Overkill for solo
coding; on-target for personal automation pipelines.
- **[LangGraph](https://github.com/langchain-ai/langgraph)** — directed
graph with conditional edges, built-in checkpointing with time-travel.
Surpassed CrewAI in stars in early 2026; production-grade.
- **[CrewAI](https://github.com/crewAIInc/crewAI)** — role-based crews,
agents/tasks/crew in <20 lines of Python. Lowest learning curve,
  reportedly ~5× faster than LangGraph in some benchmarks. A2A protocol
  support.
- **[AutoGen / AG2](https://github.com/microsoft/autogen)** —
event-driven, async-first; GroupChat coordination with a selector
deciding who speaks next.
- **[OpenAI Agents SDK](https://github.com/openai/openai-agents-python)** —
production replacement for Swarm. Explicit agent-to-agent handoffs
carrying conversation context.
- **[OpenAgents](https://github.com/openagents/openagents)** — only
framework with native support for **both MCP and A2A** protocols.
All five are model-agnostic — point them at the local Ollama endpoint
or LiteLLM and they work the same as if you were using GPT-5.
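For anything built on the OpenAI SDK, pointing at the box is usually just two environment variables. A sketch (the `framework` hostname is a placeholder; Ollama exposes an OpenAI-compatible API under `/v1`):

```shell
# Route any OpenAI-SDK-based framework to the local Ollama endpoint.
# "framework" is a placeholder hostname for the Strix Halo box.
export OPENAI_BASE_URL="http://framework:11434/v1"
export OPENAI_API_KEY="ollama"  # Ollama ignores the key; it just has to be non-empty
echo "routing OpenAI-SDK traffic to $OPENAI_BASE_URL"
```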
## Layer 3: Management & observability
The Claude Cowork analog layer. Dispatch work, track progress, monitor
spend. The piece I missed in earlier roadmap drafts:
- **[Multica](https://github.com/multica-ai/multica)** — open-source
platform for managing AI coding agents *as teammates*. File issues,
assign to agents, they pick up work, write code, report blockers,
update statuses. Skill accumulation across tasks. Dispatches to
Claude Code, Codex, Copilot CLI, OpenCode, Pi, Cursor Agent, Hermes,
Gemini, Kiro, etc. **#1 GitHub TypeScript Trending in April 2026,
10.7 k stars in three months.** Closest existing OSS analog to
Cowork; works *above* OpenCode rather than replacing it.
- **[Plane](https://plane.so/)** — AI-native project management with
MCP server and full Agent Run lifecycle tracking. Velocity, workload,
blockers auto-populated. Self-hostable.
- **[Mission Control](https://github.com/builderz-labs/mission-control)** —
self-hosted dashboard for agent fleets. Token-spend per model, trend
charts, real-time posture scoring. SQLite, zero deps. Useful even
solo just to watch your local model's spend pattern.
- **[ClawDeck](https://github.com/builderz-labs/clawdeck)** — focused
kanban for OpenClaw agents. Smaller scope, more opinionated.
- **[OpenWebUI](https://github.com/open-webui/open-webui)** — multi-user
chat UI in front of Ollama. Doesn't manage *agent sessions*, but
covers shared-LLM-access for a household.
**Still a gap (May 2026):** *live* shared agent sessions — multiple humans
driving one agent on one repo in real time, Cowork's multiplayer mode.
Multica is async-ticket-flow, not live-collaborative. Watch for
projects that close this.
## Layer 0: Cross-cutting capabilities
Things you wire in once, useful from any harness above.
### Aggregation & routing
- **[LiteLLM](https://github.com/BerriAI/litellm)** — proxy that exposes
one OpenAI-compatible endpoint over many backends. Rules like "code
task → local Qwen3, image input → vision model, fast chat → Haiku."
Worth standing up early so all higher-layer tools just point at one
URL.
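A sketch of what that proxy config might look like: one alias per role, all backed by the local Ollama. Model names, tags, and paths here are assumptions, not the stack's actual config.

```shell
# Write a hypothetical LiteLLM config with role-based aliases.
mkdir -p /tmp/litellm
cat > /tmp/litellm/config.yaml <<'EOF'
model_list:
  - model_name: local-coder            # what higher-layer tools ask for
    litellm_params:
      model: ollama/qwen3-coder:30b    # assumed Ollama tag
      api_base: http://localhost:11434
  - model_name: local-vision
    litellm_params:
      model: ollama/qwen2.5-vl:7b
      api_base: http://localhost:11434
EOF
# Then: litellm --config /tmp/litellm/config.yaml --port 4000
grep -c 'model_name' /tmp/litellm/config.yaml
```

Every tool above Layer 0 then talks to port 4000 and asks for `local-coder` or `local-vision`; swapping the backing model is a one-line config change.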
### Code intelligence / RAG
The local model can't see beyond what fits in its prompt. Claude Code
papers over this with long-context recall + WebFetch; locally we need
explicit retrieval.
- **[Continue.dev](https://continue.dev)** — VS Code / JetBrains plugin
with built-in `@codebase` indexer. Pairs with `nomic-embed-text` via
Ollama for local embeddings. **Biggest single-shot quality lift** for
local-model coding.
- **[Tabby](https://github.com/TabbyML/tabby)** — self-hosted
Copilot-style inline completion. Different niche from agentic coding,
covers the "as I type" loop OpenCode doesn't.
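As a sketch, the relevant piece of a Continue config for local embeddings might look like this. Written to a scratch path here; Continue's real config lives under `~/.continue/` and its schema varies by version, so treat the field names as assumptions.

```shell
# Hypothetical Continue config fragment: codebase embeddings via Ollama.
mkdir -p /tmp/continue
cat > /tmp/continue/config.json <<'EOF'
{
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "apiBase": "http://framework:11434"
  }
}
EOF
python3 -m json.tool /tmp/continue/config.json > /dev/null && echo "valid JSON"
```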
### Web access — done
Already wired in `localgenai/opencode/`:
- **[@playwright/mcp](https://github.com/microsoft/playwright-mcp)** —
browser automation. Navigate, click, fill forms, read DOM snapshots.
- **[mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)** —
web search via your `searxng.n0n.io` instance. No API keys.
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
not yet wired; useful if Playwright feels heavy for plain page reads.
### Persistent memory
OpenCode's memory is per-session unless you wire in an MCP memory server.
- **`mcp-server-memory`** (official) — knowledge-graph memory across
any MCP client. Crude but works.
- **[mem0](https://github.com/mem0ai/mem0)** — sophisticated memory,
self-hostable with a local embedding model.
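In most MCP clients the wiring is a few lines of JSON. A sketch, using the common claude-desktop-style `mcpServers` shape; OpenCode's `opencode.json` has its own schema, so adapt accordingly:

```shell
# Generic MCP client config fragment for the official memory server.
cat > /tmp/mcp-memory.json <<'EOF'
{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
EOF
python3 -m json.tool /tmp/mcp-memory.json > /dev/null && echo "valid JSON"
```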
### Multi-modal
- **Vision**: `qwen2.5-vl:7b` or `llama3.2-vision:11b` via Ollama.
Route via LiteLLM so OpenCode auto-picks vision for image-bearing
prompts. Closes the screenshot gap.
- **Voice**: `whisper-compose.yaml` and `piper-compose.yaml` are
already in `localgenai/`. STT in, TTS out — a "talk to your local
assistant" loop is something Claude Code itself doesn't do.
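A sketch of the round trip, under two assumptions: the whisper container exposes an OpenAI-compatible `/v1/audio/transcriptions` endpoint, and piper sits behind a simple HTTP wrapper. Hostnames, ports, and both API shapes are placeholders.

```shell
WHISPER="http://framework:8001"   # assumed STT endpoint
PIPER="http://framework:8002"     # assumed TTS endpoint
# STT: audio in, text out
# curl -s "$WHISPER/v1/audio/transcriptions" -F file=@question.wav -F model=whisper-1
# ...feed the transcript to the LLM, then TTS: text in, audio out
# curl -s "$PIPER/api/tts" --data 'Answer text here' -o answer.wav
echo "voice loop: mic -> $WHISPER -> LLM -> $PIPER -> speaker"
```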
### Design (genuine new capability)
Anthropic ships **Claude Design** (Anthropic Labs); the OSS analog is
real and active:
- **[nexu-io/open-design](https://github.com/nexu-io/open-design)** —
local-first OSS alternative to Claude Design. 19 Skills, 71
brand-grade Design Systems, generates web/desktop/mobile prototypes,
slides, images, videos with sandboxed preview and HTML/PDF/PPTX/MP4
export. **Ships a stdio MCP server** exposing design source — tokens,
JSX components, entry HTML — as a structured API to any MCP-aware
agent. Pairs naturally with OpenCode + the local model.
- **[Penpot](https://penpot.app)** — full OSS Figma alternative.
Different scope (interactive design tool, not agentic).
- **[OpenPencil](https://github.com/ZSeven-W/openpencil)** — AI-native
vector design tool, "design-as-code." Newer; track.
### Skills / MCP catalog
- **[Official MCP registry](https://registry.modelcontextprotocol.io/)**
- **[Awesome MCP Servers](https://github.com/punkpeye/awesome-mcp-servers)**
(1200+ entries)
- **[mcpservers.org](https://mcpservers.org/)** searchable directory
Build the skills you miss; publish them.
## Prioritized next steps
In rough order of impact-per-hour-spent:
1. **Continue.dev + `nomic-embed-text` in VS Code.** Biggest single
quality jump for local-model coding via codebase indexing.
2. **Symlink `localgenai/opencode/opencode.json`** into place so the
Playwright + SearXNG MCP servers activate. (Already written;
one `ln -sf` away.)
3. **Stand up LiteLLM** as the aggregation layer in front of Ollama —
even at one model, the abstraction pays off the moment you add a
second.
4. **Stand up Multica** for ticket-style work dispatch. Closest Cowork
analog; useful even solo for tracking what the agent has done across
sessions.
5. **Add a vision model** (`qwen2.5-vl:7b`) and route image prompts via
LiteLLM. Closes the screenshot gap.
6. **Stand up OpenDesign** as a compose stack. Connect its MCP server
to OpenCode. Genuinely new capability versus Claude Code.
7. **Try Aider on a real refactor** — different agentic loop than
OpenCode, often better with smaller-context local models.
8. **Wire whisper + piper into a voice loop.** Not a Claude Code parity
item — a *new* capability the local stack uniquely enables.
9. **Mission Control on the box** for token-spend visibility once
multi-model routing through LiteLLM is real.
10. **Evaluate OpenHands or Goose** for autonomous agentic loops where
OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
non-coding LLM use, fills a small gap in the multi-user story.
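Step 2 above really is one command once paths are pinned down. A sketch, assuming the repo is checked out at `~/localgenai`:

```shell
# Link the repo's OpenCode config into the live config dir.
CFG_DIR="${XDG_CONFIG_HOME:-$HOME/.config}/opencode"
mkdir -p "$CFG_DIR"
ln -sf "$HOME/localgenai/opencode/opencode.json" "$CFG_DIR/opencode.json"
ls -l "$CFG_DIR/opencode.json"
```

Symlinking rather than copying means config edits in the repo take effect immediately and stay under version control.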
## Reference catalogs to monitor
- Official MCP registry: <https://registry.modelcontextprotocol.io/>
- Awesome CLI coding agents: <https://github.com/bradAGI/awesome-cli-coding-agents>
- Awesome MCP servers: <https://github.com/punkpeye/awesome-mcp-servers>
- mcpservers.org directory: <https://mcpservers.org/>
- Anthropic plugin catalog (parity-tracking): <https://claude.com/plugins>
## Open gaps to keep watching
- **Live shared agent sessions** — Cowork's multiplayer mode (multiple
humans driving one agent on one repo, real-time). No direct OSS
equivalent yet; Multica is async-ticket-flow.
- **Cohesive "skills" library** as polished as Anthropic's plugin
catalog. The MCP ecosystem has the parts but lacks the curation.
- **AMD ROCm packages for Ubuntu 26.04** — relevant if we ever decide
to abandon containerization for native ROCm tooling. (Tracked
separately in `TODO.md`.)