Document current coding-workflow stack state

Snapshot of where opencode + Qwen3-Coder + MCPs + Kimi-Linear + voice
  + Phoenix tracing land today, plus in-flight (oc-tree, kimi-linear
  context ramp) and next (ComfyUI) items with pointers to per-project
  NEXT_STEPS.md guides.
2026-05-10 21:14:43 -04:00
parent 228fe8d1ac
commit a29793032d
35 changed files with 2067 additions and 37 deletions


@@ -68,6 +68,23 @@ opencode
Send a prompt; watch it land in Phoenix at <http://framework:6006>.
## Coding workflows — state of the stack (2026-05-10)
| Component | Status | Notes |
|---|---|---|
| **Primary**: opencode → `framework/qwen3-coder:30b` (Ollama) | Working daily-driver | Full MCP toolbox (playwright, searxng, serena, basic-memory, sequential-thinking, task-master). Phoenix bridge emits OTel traces for every call. |
| **Secondary**: opencode → `framework-vllm/kimi-linear` (vLLM) | Configured, experimental | `tool_call: false` — Kimi-Linear is a research model and isn't strongly tool-trained. Long-context chat only. Switch via `/model framework-vllm/kimi-linear`. |
| **Chat front-end for Kimi-Linear**: OpenWebUI | Working | vLLM endpoint wired via `OPENAI_API_BASE_URLS`. Pick `kimi-linear` from the model selector in OpenWebUI. |
| **Voice loop**: faster-whisper + Kokoro | Deployed | OpenAI-compatible endpoints; not yet wired into a hands-free chat loop. |
| **Observability**: Phoenix per-trace + OpenLIT fleet | Working | All opencode prompts traced via `.opencode/plugin/phoenix-bridge.js`. |
| **In flight**: oc-tree TUI sidecar | M0 skeleton done, awaiting probe | Live tree view of opencode's SSE event stream; tmux-side companion to Phoenix's post-hoc traces. See [`oc-tree/NEXT_STEPS.md`](oc-tree/NEXT_STEPS.md). |
| **In flight**: Kimi-Linear context ramp | P0 done at 32K, BIOS+GTT recipe applied | Next: ramp `--max-model-len` toward 1M and benchmark. See [`kimi-linear/NEXT_STEPS.md`](kimi-linear/NEXT_STEPS.md). |
| **Configured but not deployed**: VS Code Continue | YAML ready | Drop [`vscode-continue-config.yml`](vscode-continue-config.yml) into `~/.continue/config.yaml`; `ollama pull nomic-embed-text` on the box for `@codebase` indexing. |
| **Planned next**: ComfyUI for image generation | Phase plan written | Flux.1-Dev via kyuz0 toolbox, same pyinfra pattern as kimi-linear. See task list (CF-P0..P4). |
| **Roadmap-only** (not started): Aider, Cline, OpenHands, LiteLLM, Multica | — | See `Roadmap.md`. |
**Single-line summary**: opencode + Qwen3-Coder + the MCP toolbox is the working coding harness; Kimi-Linear is alongside for long-context chat; oc-tree and ComfyUI are the next two infra additions.
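A few hedged smoke checks against the endpoints quoted above (host `framework` and ports 11434/8000/6006 come from this doc; the voice-service ports are placeholders, so substitute whatever faster-whisper and Kokoro actually listen on):

```sh
# Ollama (Qwen3-Coder daily driver): list installed models
curl -s http://framework:11434/api/tags | jq -r '.models[].name'

# vLLM (Kimi-Linear): OpenAI-compatible model list; OpenWebUI points at the
# same base URL via OPENAI_API_BASE_URLS
curl -s http://framework:8000/v1/models | jq -r '.data[].id'

# Phoenix UI reachable?
curl -sI http://framework:6006 | head -1

# Voice loop (OpenAI-compatible paths; ports, model, and voice names below
# are examples, not verified values)
curl -s "${WHISPER_URL:-http://framework:8001}/v1/audio/transcriptions" \
  -F file=@sample.wav -F model=whisper-1
curl -s "${KOKORO_URL:-http://framework:8002}/v1/audio/speech" \
  -H 'Content-Type: application/json' \
  -d '{"model":"kokoro","input":"smoke test","voice":"af_sky"}' -o /tmp/tts-smoke.wav
```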
## Why this stack
- **Bandwidth, not VRAM, is the ceiling on Strix Halo.** 256 GB/s memory


@@ -169,6 +169,38 @@ Already wired in `localgenai/opencode/`:
- **[mcp-server-fetch](https://github.com/modelcontextprotocol/servers/tree/main/src/fetch)** —
not yet wired; useful if Playwright feels heavy for plain page reads.
**Next: cited-search variant.** Fork `mcp-searxng` (or wrap it) to expose
a `search_cited` tool that auto-fetches the top 3-5 results, extracts
readable text via mozilla/readability or trafilatura, and returns
numbered `[n] URL\n<excerpt>` blocks with a tool description telling the
model to cite as `[n]`. Closes the citation-quality gap with
[opencode-websearch-cited](https://github.com/ghoulr/opencode-websearch-cited)
(which delegates to Gemini/OpenAI/OpenRouter) while staying fully local.
Tradeoff: 3-5 HTTP fetches per query → slower; needs timeout + graceful
skip for sites that block scrapers.
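A rough shell sketch of the pipeline `search_cited` would wrap (the MCP tool would do this in-process; assumes the SearXNG instance has `format=json` enabled and `trafilatura` is on PATH, and the script name is hypothetical):

```sh
#!/usr/bin/env bash
# search_cited sketch: SearXNG top-N results -> trafilatura extract -> [n] URL + excerpt
q="$1"; n="${2:-3}"
i=0
curl -s "https://searxng.n0n.io/search?q=$(jq -rn --arg q "$q" '$q|@uri')&format=json" \
  | jq -r ".results[:$n][].url" \
  | while read -r url; do
      i=$((i+1))
      printf '[%d] %s\n' "$i" "$url"
      # timeout + graceful skip for slow or scraper-blocking sites
      timeout 10 trafilatura -u "$url" 2>/dev/null | head -c 1500
      printf '\n\n'
    done
```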
### Observability — partial
- **[Arize Phoenix](https://github.com/Arize-ai/phoenix)** wired via
`opencode/.opencode/plugin/phoenix-bridge.js`: OpenCode's experimental
OTel spans go to Phoenix at `framework:6006` with the OpenInference
span processor for LLM/tool-aware attributes. Subagents nest under the
parent session as a unified trace tree. **External** trace viewing is
solved.
- **Gap: in-harness visibility.** OpenCode's TUI has no sidebar plugin
API ([sst/opencode#5971](https://github.com/sst/opencode/issues/5971),
open) and no built-in tree/timeline view. Plugins can only show
toasts. While driving the agent, there's no live "what's it doing
right now" pane — you have to alt-tab to Phoenix.
- **Next: SSE sidecar.** `opencode serve` exposes documented SSE streams
at `/global/event` and `/session/{id}/event` carrying every tool call,
message part, and child-session creation. A small TUI sidecar
(Bubbletea/textual, ~200 lines) running in a tmux pane next to
opencode can subscribe and render a live tree:
`session → tool call → child session → tool call`. Subagents *are*
child sessions in the data model, so the hierarchy comes for free.
Phoenix stays the deep-trace store; sidecar is the live pane.
### Persistent memory
OpenCode's memory is per-session unless you wire MCP for it.
@@ -242,6 +274,13 @@ In rough order of impact-per-hour-spent:
OpenCode's per-task UX feels too hands-on.
11. **OpenWebUI for household-shared chat access.** Adds value for
non-coding LLM use, fills a small gap in the multi-user story.
12. **SSE sidecar for in-harness visibility.** Live tree pane subscribed
to opencode's `/session/{id}/event` SSE; closes the "what's it
doing right now" gap that Phoenix doesn't fill (Phoenix is for
after-the-fact deep traces, not glanceable status).
13. **Cited-search MCP.** Fork `mcp-searxng` to add `search_cited` with
auto-fetch + readability extraction. Smaller-impact than the
sidecar but cheap and keeps everything local.
## Reference catalogs to monitor


@@ -0,0 +1,38 @@
# Agent Orchestration Hooks
This directory contains dmux hooks that define how agents work in the OpenCode environment.
## Available Hooks
- `worktree_created` - Triggered when a new worktree is created
- `run_test` - For running tests
- `run_dev` - For development tasks
- `post_merge` - After merging changes
## Hook Usage
Each hook should be executable and contain logic for the specific agent role:
```bash
#!/bin/bash
# Example hook for documentation agent
# .dmux-hooks/worktree_created
# Set environment for qwen3-coder
export DMUX_AGENT="opencode"
export QWEN_MODEL="qwen3-coder:30b"
export AGENT_CONTEXT="homelab-documentation"
# Run qwen3-coder documentation tasks
echo "Initializing homelab documentation agent..."
qwen3-coder --generate-docs \
--model "qwen3-coder:30b" \
--output "$DMUX_WORKTREE_PATH/docs/homelab.md"
```
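The other hooks follow the same shape. A hypothetical `run_test` hook, for example (env vars mirror the example above; the test command is whatever the worktree's project actually uses):

```bash
#!/bin/bash
# Hypothetical .dmux-hooks/run_test -- same env contract as worktree_created
export DMUX_AGENT="opencode"
export QWEN_MODEL="qwen3-coder:30b"

cd "$DMUX_WORKTREE_PATH" || exit 1

# Run whatever test entry point the project defines; otherwise report and exit.
if [ -f Makefile ] && grep -q '^test:' Makefile; then
  make test
elif [ -f pyproject.toml ]; then
  uv run pytest
else
  echo "no test runner detected in $DMUX_WORKTREE_PATH"
fi
```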
## Agent Roles
- `documentation-agent.sh` - Generates documentation using qwen3-coder
- `scripting-agent.sh` - Creates automation scripts
- `deployment-agent.sh` - Manages deployments
- `monitoring-agent.sh` - Tracks system performance


@@ -0,0 +1,64 @@
# Gitea OpenCode Agent Project
This is a default project structure for an agentic OpenCode development environment using the qwen3-coder model (30b).
## Project Structure
```
gitea-opencode-agent-project/
├── docs/ # Documentation
├── agents/ # Agent definition files
├── scripts/ # Automation scripts
├── config/ # Configuration files
├── src/ # Source code
├── README.md # Project documentation
├── AGENTS.md # OpenCode agent definitions
└── .gitignore # Git ignore patterns
```
## Agent Definitions
Agent files are stored in the `agents/` directory with the following naming convention:
- `agent-<role>.sh` for bash agents
- `agent-<role>.py` for Python agents
- `agent-<role>.json` for configuration files
## Documentation
The `docs/` directory contains the project documentation, including:
- Agent documentation
- Implementation guides
- Usage examples
- API documentation
## Setup Instructions
1. Initialize the project with `/init`
2. Configure the qwen3-coder model
3. Deploy agents using the standard OpenCode Task Master system
4. Run the environment with `docker compose up -d`
## Configuration
Configuration files are stored in `config/` and include:
- Agent parameters
- Environment settings
- Model specifications
- Deployment configurations
## OpenCode Integration
This project is designed to work with OpenCode's agent system directly without requiring dmux. The agents can be invoked as specialized tools using OpenCode's task master functionality:
```bash
# Example OpenCode task master commands
/opencode agent documentation-agent
/opencode agent scripting-agent
/opencode agent deployment-agent
```
The qwen3-coder (30b) model integration is handled through OpenCode's provider configuration, allowing each agent to leverage the advanced reasoning capabilities of the model for:
- Complex problem identification in infrastructure documentation
- Multi-step configuration automation
- Intelligent deployment strategy recommendations
- Analysis of complex dependency chains in network diagrams


@@ -0,0 +1,57 @@
# Agent Orchestration
This project uses OpenCode's Task Master system for agent orchestration instead of dmux.
## Available Agents
- `documentation-agent.sh` - Generates documentation using qwen3-coder
- `scripting-agent.sh` - Creates automation scripts and configurations
- `deployment-agent.sh` - Manages deployments and optimizations
- `monitoring-agent.sh` - Tracks system health and performance
## Usage with OpenCode
Agents can be invoked using OpenCode's Task Master system:
```bash
# Initialize the project
/init
# Run specific agents
/opencode agent documentation-agent
/opencode agent scripting-agent
/opencode agent deployment-agent
# Run all agents
/opencode agents all
```
## Integration with qwen3-coder (30b)
Each agent leverages the qwen3-coder model's advanced reasoning capabilities:
- Complex problem identification in infrastructure documentation
- Multi-step configuration automation
- Intelligent deployment strategy recommendations
- Analysis of complex dependency chains in network diagrams
## Agent Responsibilities
### Documentation Agent
- Generates documentation from code analysis
- Creates standardized documentation templates
- Updates documentation based on infrastructure changes
### Scripting Agent
- Creates Docker Compose configurations
- Generates setup scripts
- Writes automation routines
### Deployment Agent
- Analyzes deployment strategies
- Optimizes container configurations
- Manages multi-service deployments
### Monitoring Agent
- Monitors system performance
- Analyzes infrastructure health
- Detects potential issues


@@ -0,0 +1,26 @@
# Deployment Agent
## Overview
This agent specializes in managing deployments and configurations using the qwen3-coder model (30b) for the homelab infrastructure project.
## Capabilities
- Analyzes deployment strategies
- Optimizes container configurations
- Manages multi-service deployments
- Provides intelligent deployment recommendations
## Usage
```bash
# Optimize deployment with qwen3-coder
qwen3-coder --optimize-deployment \
--model "qwen3-coder:30b" \
--config "docker-compose.yml" \
--target "homelab-system"
```
## Integration
This agent integrates with:
- Docker Compose configurations
- CI/CD pipelines
- Network documentation
- Security configurations


@@ -0,0 +1,26 @@
# Documentation Agent
## Overview
This agent specializes in generating and maintaining documentation using the qwen3-coder model (30b) for the homelab infrastructure project.
## Capabilities
- Generates documentation from code analysis
- Creates standardized documentation templates
- Updates documentation based on infrastructure changes
- Analyzes documentation gaps and suggests improvements
## Usage
```bash
# Generate documentation with qwen3-coder
qwen3-coder --generate-docs \
--model "qwen3-coder:30b" \
--project "homelab-infrastructure" \
--output "docs/homelab.md"
```
## Integration
This agent integrates with:
- Code repositories
- Network diagrams
- Configuration files
- Deployment scripts


@@ -0,0 +1,26 @@
# Monitoring Agent
## Overview
This agent specializes in monitoring system health and performance using the qwen3-coder model (30b) for the homelab infrastructure project.
## Capabilities
- Monitors system performance
- Analyzes infrastructure health
- Detects potential issues
- Provides optimization suggestions
## Usage
```bash
# Monitor system with qwen3-coder
qwen3-coder --monitor-system \
--model "qwen3-coder:30b" \
--target "homelab-network" \
--output "reports/health.md"
```
## Integration
This agent integrates with:
- System logs
- Performance metrics
- Configuration files
- Documentation updates


@@ -0,0 +1,26 @@
# Scripting Agent
## Overview
This agent specializes in generating automation scripts and configurations using the qwen3-coder model (30b) for the homelab infrastructure project.
## Capabilities
- Creates Docker Compose configurations
- Generates setup scripts
- Writes automation routines
- Creates deployment manifests
## Usage
```bash
# Generate automation script with qwen3-coder
qwen3-coder --generate-script \
--model "qwen3-coder:30b" \
--task "docker-compose generation" \
--output "docker-compose.yml"
```
## Integration
This agent integrates with:
- Docker Compose templates
- Configuration files
- Deployment workflows
- Network documentation

kimi-linear/NEXT_STEPS.md (new file, 146 lines)

@@ -0,0 +1,146 @@
# kimi-linear — resumption guide
Open this first when picking the work back up.
## What this project is
Kimi-Linear-48B-A3B-Instruct on the Strix Halo box via vLLM, ROCm/TheRock 7.x,
gfx1151. Sits beside Ollama+Qwen3-Coder. Goal: long-context (256K-1M)
local inference using the model architecturally best-suited to the box's
unified-memory shape.
Roadmap entry: `localgenai/Roadmap.md` → "Layer 0: Inference + tools"
(to be added after BIOS unblock).
Container artifacts: `pyinfra/framework/compose/kimi-linear.yml` +
`pyinfra/framework/compose/kimi-linear/` (Dockerfile, build.sh,
patch-tokenizer.sh, smoke.sh, README.md). Deploy push:
`cd pyinfra/framework && ./run.sh`.
## Where we are (2026-05-10)
**P0 — DONE, constrained.** First-ever locally-served Kimi-Linear
generation on a Strix Halo iGPU. Smoke test passes:
`/v1/models` returns `kimi-linear`; tiny generation returns "ok".
Current runtime cap: `--max-model-len 4096`, `--max-num-seqs 1`,
`--num-gpu-blocks-override 32`. The long-context point of the model is
locked behind a **VRAM ceiling** — see "What blocks progress" below.
**P1-P4 — pending.**
## The gauntlet of fixes that got us here (don't re-derive)
All of these are baked into the repo. Reproducing P0 from a clean box
amounts to pushing the repo and running the steps in
`compose/kimi-linear/README.md`. A condensed sketch of the resulting
serve invocation follows the list.
1. **Image entrypoint missing.** kyuz0 toolboxes drop into a shell;
compose's `command:` gets exec'd as the program. Fix: explicit
`entrypoint: ["vllm", "serve"]` in the compose file, model path as
positional first arg.
2. **Tokenizer ImportError.** `tokenization_kimi.py` imports
`bytes_to_unicode` from a transformers internal that's been removed.
Fix: `patch-tokenizer.sh` inlines the function. Idempotent.
3. **Missing AITER gfx1151 GEMM configs.** kyuz0 image is built for
gfx1151 but doesn't ship the AITER autotuning JSONs for every op
Kimi's MLA layers hit (validated against Qwen/MiniMax, not
MLA-heavy models). Fix: derived `Dockerfile` copies gfx1100 (RDNA3)
configs into gfx1151-named slots — kernels compile + run, tile
sizes not optimal but functional.
4. **MLA AITER FP8 BMM tries to materialize a 30 GB intermediate.**
On top of resident weights, that's ~58 GB needed and we have 31 GB.
Fix: `VLLM_ROCM_USE_AITER_MLA=0` — bypasses AITER for just the MLA
path, keeps it for everything else.
5. **`HSA_XNACK=1` is a trap with vLLM.** It enables HIP demand-paging
into GTT (115 GB ceiling per kernel cmdline `amdgpu.gttsize=117760`)
so vLLM computes "Available KV cache memory: 73.6 GiB", but
PyTorch's actual allocator stays capped at the GPU pool (~31 GB).
vLLM then OOMs trying to allocate the budget it computed. Fix: turn
XNACK *off*, live within the 31 GB pool until BIOS UMA gives a
single bigger pool.
6. **`--swap-space` was removed in modern vLLM.** Don't pass it.
7. **`--num-gpu-blocks-override 32`** as belt-and-braces against
vLLM's KV pool auto-discovery picking a too-big number even without
XNACK.
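Rolled together, the P0 flags and env above amount to roughly this invocation inside the container (a condensed sketch; `compose/kimi-linear.yml` is the source of truth and the model path here is illustrative):

```sh
# Condensed view of what kimi-linear.yml effectively runs at P0 constraints.
HSA_XNACK=0 \
VLLM_ROCM_USE_AITER_MLA=0 \
vllm serve /models/Kimi-Linear-48B-A3B-Instruct \
  --max-model-len 4096 \
  --max-num-seqs 1 \
  --num-gpu-blocks-override 32
# No --swap-space (removed in modern vLLM); the compose entrypoint is
# ["vllm", "serve"] with the model path as the first positional arg.
```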
## What blocks long context
PyTorch's discoverable VRAM equals the **BIOS UMA Frame Buffer Size**,
not `amdgpu.gttsize`. The kernel cmdline is necessary but insufficient.
- 128 GB physical → 64 GB UMA → ~62 GB visible to Linux.
- `rocminfo` reports two ~31 GB GPU pools (Pool 1 coarse / Pool 2 fine).
- PyTorch's allocator only uses one pool (~31 GB) → OOMs at ~30 GB
Kimi weights with little KV headroom.
- User has previously found Framework's BIOS caps UMA at 64 GB.
**Research outcome (2026-05-10):** the right unblock isn't a higher BIOS
UMA cap — it's the inverse. Set UMA *small* and merge the two pools.
| Layer | Setting |
| --- | --- |
| BIOS | Update to **3.05 stable** (Apr 2026); set UMA Frame Buffer = **0.5 GB** or **8 GB** (counter-intuitive but documented — frees pages for GTT) |
| Kernel | **≥ 6.16.9.** Earlier kernels cap ROCm visibility at 15.5 GB. pyinfra's `linux-generic-hwe-24.04` may be on 6.8/6.11 — verify with `uname -r` and upgrade if needed. |
| Cmdline | `amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432` (`ttm.pages_limit` in 4 KiB pages = 128 GiB) |
| Env | `HSA_XNACK=1` **+** `HSA_FORCE_FINE_GRAIN_PCIE=1` (the piece our earlier XNACK attempt was missing) |
| Env | `PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"` (HIP variant, not CUDA) |
| PyTorch | TheRock gfx1151 wheels — kyuz0:stable image already uses these |
Confirmed working in kyuz0/amd-strix-halo-vllm-toolboxes (DeepWiki:
"Kernel Parameters and Unified Memory") and the Framework community
"Linux + ROCm: January 2026 Stable Configurations Update" thread.
Single-process allocation budget after this: ~110 GB.
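Applied on the host, the recipe above looks roughly like this (a sketch; the exact GRUB editing and container env wiring live in pyinfra, and the BIOS step is manual):

```sh
# Kernel cmdline, after setting BIOS UMA Frame Buffer to 0.5 GB or 8 GB:
#   /etc/default/grub -> GRUB_CMDLINE_LINUX_DEFAULT gains:
#     amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432
sudo update-grub && sudo reboot

# Verify kernel >= 6.16.9 before expecting full GTT visibility
uname -r

# Runtime env for the vLLM container; the piece the earlier XNACK attempt
# missed was HSA_FORCE_FINE_GRAIN_PCIE alongside XNACK.
export HSA_XNACK=1
export HSA_FORCE_FINE_GRAIN_PCIE=1
export PYTORCH_HIP_ALLOC_CONF="backend:native,expandable_segments:True,garbage_collection_threshold:0.9"
```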
**The hard ceiling**, if all of the above doesn't yield enough: 96 GB
direct UMA. AMD AGESA / StrixHaloPI limit, not Framework's. No path
past 96 GB on signed firmware as of May 2026; 192 GB Strix Halo refresh
is rumored but unreleased.
**Fallback if the merge approach doesn't work for vLLM specifically**:
llama.cpp ROCm + bartowski Q4_K_M GGUF. Different memory model, splits
layers across CPU/GPU. Lower throughput, more flexible.
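If that pivot is needed, the llama.cpp path would look roughly like this (a sketch; the GGUF filename is illustrative and the context/layer split is something to tune against available memory):

```sh
# Hypothetical llama.cpp ROCm fallback: splits layers across CPU/GPU.
# -c is context length (ramp as memory allows); -ngl is GPU layer count
# (lower it to push more onto CPU RAM).
llama-server -m Kimi-Linear-48B-A3B-Instruct-Q4_K_M.gguf \
  -c 131072 -ngl 40 --host 0.0.0.0 --port 8000
```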
## When you come back
1. Read research output for BIOS UMA limits. If it landed in the chat,
it's the most recent note in this project's session. If not yet,
re-dispatch (prompt is in conversation history — Framework Desktop
May 2026 UMA Frame Buffer cap research).
2. Decide path:
- **BIOS bump available** → flash, reboot, drop the constraints in
`compose/kimi-linear.yml` (max-model-len, num-gpu-blocks-override),
re-`./smoke.sh`, ramp context.
- **BIOS capped at 64 GB** → pivot to llama.cpp ROCm path or accept
4K context as the long-term reality.
3. Then advance through P1 → P2 → P3 per the roadmap.
## Files of record
- `pyinfra/framework/compose/kimi-linear.yml` — service def, all the
flag/env tradeoffs documented inline.
- `pyinfra/framework/compose/kimi-linear/` — Dockerfile + scripts.
- `pyinfra/framework/deploy.py` — wired into the service loop +
asset-copy block.
- `Roadmap.md` — strategy.
- `StrixHaloMemory.md` — the UMA-vs-GTT discussion that needs a
follow-up paragraph from this work (PyTorch caps at UMA, XNACK is a
trap).
## Decisions worth not relitigating
- **vLLM 0.19.x via kyuz0:stable** chosen over source-build with
`v0.11.2` pin (build.sh exists as fallback). The recipe pin advice
was based on an earlier Moonshot doc; current upstream works.
- **4-bit compressed-tensors** (cyankiwi) chosen over 8-bit. With the
31 GB ceiling, 8-bit wouldn't even fit weights resident.
- **VLLM_ROCM_USE_AITER_MLA=0**, NOT `VLLM_MLA_DISABLE=1`. Granular
disable preserves AITER for non-MLA paths. The full disable is the
next escalation if needed.
- **No upstream filings.** Findings stay in this repo per project
policy (memory: `feedback_private_findings`).

oc-tree/.python-version (new file, 1 line)

@@ -0,0 +1 @@
3.11

oc-tree/NEXT_STEPS.md (new file, 120 lines)

@@ -0,0 +1,120 @@
# oc-tree — resumption guide
Open this file first when picking the work back up.
## What this project is
A Python TUI sidecar that subscribes to `opencode serve`'s SSE event
stream and renders a live tree of sessions → messages → tool calls in a
terminal pane next to the opencode TUI. Subagents nest under their
parent via `session.created.info.parentID`.
Roadmap entry: `localgenai/Roadmap.md` → "Layer 0: Cross-cutting
capabilities" → "Observability — partial" + prioritized step #12.
Phoenix (`opencode/.opencode/plugin/phoenix-bridge.js`) already handles
the *external* deep-trace store. oc-tree is the glanceable "what's it
doing right now" pane that lives in-harness (well, in tmux next to it).
## Where we are
- Plan agreed: Python + textual, lives at `localgenai/oc-tree/`.
- Phases: M0 → M1 → M2 → M3 → M4. See descriptions in this repo's task
list (`TaskList`) or the roadmap entry.
- **M0 — DONE** (skeleton). `uv sync` installs cleanly; `oc-tree` and
`oc-tree-probe` entry points resolve. Imports verified.
- **M0 — AWAITING USER ACTION** (schema verification). Probe is built
but hasn't been run against a live `opencode serve`. Three open
schema questions still unanswered.
## What blocks progress
User needs to run the probe against a real opencode session and report
back. Without these answers, M2's reducer design is guesswork.
Run in a tmux pane while `opencode serve` is up:
```sh
cd ~/Documents/obsidian/localgenai/oc-tree
uv run oc-tree-probe
```
Drive opencode through a session that hits **all three** triggers:
- Spawn a Task-tool subagent
- Trigger at least one permission prompt
- Make at least one regular tool call (Read/Bash/etc.)
Ctrl-C the probe. Then run:
```sh
# Q1: does session.created.info.parentID populate for subagents?
jq -r 'select(.type=="session.created") | .raw.properties.info.parentID' \
/tmp/oc-tree-probe.jsonl
# Q2: does message.part.updated carry full part or delta?
jq -c 'select(.type=="message.part.updated") | .raw.properties.part' \
/tmp/oc-tree-probe.jsonl | head
# Q3: what permission.* events actually fire?
jq -r '.type' /tmp/oc-tree-probe.jsonl | grep -i permission | sort -u
```
Paste the output (or the JSONL file path) into the next session.
## What happens next
Once probe answers are in:
1. Mark M0 complete, start **M1 (flat session list)** — textual app,
live-updating list of sessions with status, no nesting yet. Proves
the reducer + render loop. Independent of the schema answers, so
could start in parallel.
2. **M2 (tree view)** — needs probe answers to know:
- Whether to nest by `parentID` directly (Q1 yes) or fall back to
inferring subagents from `Task` tool-part response payloads.
- Whether the part-update reducer replaces by `partID` (Q2 = full
part) or merges a delta (Q2 = delta).
- What permission events to render (Q3).
3. **M3 (reconnect + state rebuild)** — heartbeat watchdog, REST replay
on disconnect. Driven by sst/opencode#15149/#22198 known leaks.
4. **M4 (polish)** — keybindings, theme, tmux layout doc.
## File layout
```
localgenai/oc-tree/
├── pyproject.toml uv project (textual, httpx, httpx-sse)
├── README.md user-facing readme
├── NEXT_STEPS.md this file
├── .python-version 3.11
└── src/oc_tree/
├── client.py OpenCodeClient: REST + SSE
├── probe.py schema-verification CLI
├── __main__.py stub for `oc-tree` (real TUI in M1)
└── widgets/ empty (populated in M1+)
```
## Key references
- opencode server docs: <https://opencode.ai/docs/server/>
- Authoritative schema: `GET /doc` on a running `opencode serve` (do
not hardcode — fetch per-version).
- sst/opencode#7451 — no per-session SSE endpoint; we filter `/event`
client-side.
- sst/opencode#6573 — Task subagent over `opencode serve` may have
bugs; this is what Q1 verifies.
- sst/opencode#11424 — `message.part.updated` sometimes replays full
state; this is what Q2 verifies.
- sst/opencode#15149, #22198 — SSE disconnect leaks; informs M3
shutdown discipline.
## Decisions worth not relitigating
- **Python + textual** chosen over Go+Bubbletea (faster iteration,
matches stack — uvx already in use) and Node+ink (worse SSE/UI
ergonomics; phoenix-bridge.js doesn't justify matching).
- **Read-only v1.** No sending messages, no editing. Just visibility.
- **Lives in `localgenai/oc-tree/`** rather than its own repo; can be
extracted later if it warrants a standalone release.
- **State rebuild via REST on every (re)connect** rather than trusting
SSE catchup or `Last-Event-ID` (server doesn't honor it).

oc-tree/README.md (new file, 72 lines)

@@ -0,0 +1,72 @@
# oc-tree
Live tree-view sidecar for [opencode](https://opencode.ai). Subscribes
to the `opencode serve` SSE event stream and renders a live hierarchy of
sessions → messages → tool calls in a terminal pane next to the opencode
TUI.
Phoenix (`opencode/.opencode/plugin/phoenix-bridge.js`) handles
after-the-fact deep traces; oc-tree is the glanceable "what's it doing
right now" pane.
## Status
- **M0** — skeleton + schema probe (current).
- M1 — flat session list (textual).
- M2 — tree view with subagent nesting via `parentID`.
- M3 — heartbeat watchdog + REST replay on reconnect.
- M4 — polish (keybindings, theme, tmux layout doc).
Picking the work back up: read [`NEXT_STEPS.md`](./NEXT_STEPS.md) first.
## Requirements
- Python 3.11+
- `uv`
- A running `opencode serve` (default `127.0.0.1:4096`)
## Quickstart
```sh
cd ~/Documents/obsidian/localgenai/oc-tree
uv sync
uv run oc-tree-probe
```
Drive opencode in another terminal. Probe writes JSONL frames to
`/tmp/oc-tree-probe.jsonl` and a live counter to stdout. Ctrl-C to stop;
event-type counts print on exit.
### M0 verification queries
After driving a session that includes a Task subagent, a permission
prompt, and a tool call:
```sh
# 1. Does session.created.info.parentID populate for subagents?
jq -r 'select(.type=="session.created") | .raw.properties.info.parentID' \
/tmp/oc-tree-probe.jsonl
# 2. Does message.part.updated carry full parts or deltas?
jq -c 'select(.type=="message.part.updated") | .raw.properties.part' \
/tmp/oc-tree-probe.jsonl | head
# 3. Which permission.* events actually fire?
jq -r '.type' /tmp/oc-tree-probe.jsonl | grep -i permission | sort -u
```
## Configuration
| Env var | Default |
| --------------------------- | ------------------------ |
| `OPENCODE_URL` | `http://127.0.0.1:4096` |
| `OPENCODE_SERVER_USERNAME` | `opencode` (if pw set) |
| `OPENCODE_SERVER_PASSWORD` | _(unset → no auth)_ |
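For example, running the probe against a server that isn't on the default localhost:4096 (values here are illustrative; the password vars only matter if the server has one set):

```sh
OPENCODE_URL=http://framework:4096 \
OPENCODE_SERVER_USERNAME=opencode \
OPENCODE_SERVER_PASSWORD=changeme \
uv run oc-tree-probe --out /tmp/framework-probe.jsonl
```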
## References
- [opencode server docs](https://opencode.ai/docs/server/)
- [sst/opencode#7451](https://github.com/sst/opencode/issues/7451) — no per-session SSE endpoint
- [sst/opencode#6573](https://github.com/sst/opencode/issues/6573) — Task subagent over `opencode serve`
- [sst/opencode#11424](https://github.com/sst/opencode/issues/11424) — replayed message.part.updated frames
- [sst/opencode#15149](https://github.com/sst/opencode/issues/15149) — SSE disconnect leaves server hung

oc-tree/pyproject.toml (new file, 21 lines)

@@ -0,0 +1,21 @@
[project]
name = "oc-tree"
version = "0.0.1"
description = "Live tree-view sidecar for opencode SSE events"
requires-python = ">=3.11"
dependencies = [
"textual>=0.80",
"httpx>=0.27",
"httpx-sse>=0.4",
]
[project.scripts]
oc-tree = "oc_tree.__main__:main"
oc-tree-probe = "oc_tree.probe:main"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.build.targets.wheel]
packages = ["src/oc_tree"]


@@ -0,0 +1,20 @@
"""Entry point. M0: stub that points users at the probe.
The textual app lands in M1+. Until then, this command exists so the
script registration in pyproject.toml resolves.
"""
from __future__ import annotations
def main() -> int:
print(
"oc-tree TUI ships in M1.\n"
"For now, run the schema probe:\n"
" uv run oc-tree-probe\n"
)
return 0
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,93 @@
"""Thin async client for opencode's HTTP + SSE API.
Talks to `opencode serve` (default 127.0.0.1:4096). Auth is off unless
OPENCODE_SERVER_PASSWORD is set, matching upstream defaults.
The single SSE endpoint is `GET /event`; per-session streams don't exist
(sst/opencode#7451), so callers filter by sessionID client-side.
"""
from __future__ import annotations
import base64
import os
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Any
import httpx
from httpx_sse import aconnect_sse
@dataclass(frozen=True)
class Event:
type: str
properties: dict[str, Any]
raw: dict[str, Any]
def _auth_header() -> dict[str, str]:
pw = os.environ.get("OPENCODE_SERVER_PASSWORD", "")
if not pw:
return {}
user = os.environ.get("OPENCODE_SERVER_USERNAME", "opencode")
token = base64.b64encode(f"{user}:{pw}".encode()).decode()
return {"Authorization": f"Basic {token}"}
class OpenCodeClient:
def __init__(
self,
base_url: str | None = None,
*,
timeout: float = 30.0,
) -> None:
self.base_url = (
base_url
or os.environ.get("OPENCODE_URL")
or "http://127.0.0.1:4096"
).rstrip("/")
# SSE needs no read timeout; REST calls cap at `timeout`.
self._sse_timeout = httpx.Timeout(timeout, read=None)
self._rest_timeout = httpx.Timeout(timeout)
self._headers = _auth_header()
async def list_sessions(
self, *, scope: str = "project", limit: int = 50
) -> list[dict[str, Any]]:
async with httpx.AsyncClient(
base_url=self.base_url,
headers=self._headers,
timeout=self._rest_timeout,
) as c:
r = await c.get("/session", params={"scope": scope, "limit": limit})
r.raise_for_status()
return r.json()
async def get_session_messages(self, session_id: str) -> list[dict[str, Any]]:
async with httpx.AsyncClient(
base_url=self.base_url,
headers=self._headers,
timeout=self._rest_timeout,
) as c:
r = await c.get(f"/session/{session_id}/message")
r.raise_for_status()
return r.json()
async def stream_events(self) -> AsyncIterator[Event]:
"""Yield events from /event. Caller handles reconnect."""
async with httpx.AsyncClient(
base_url=self.base_url,
headers=self._headers,
timeout=self._sse_timeout,
) as c:
async with aconnect_sse(c, "GET", "/event") as src:
async for sse in src.aiter_sse():
if not sse.data:
continue
payload = sse.json()
yield Event(
type=payload.get("type", ""),
properties=payload.get("properties", {}) or {},
raw=payload,
)


@@ -0,0 +1,100 @@
"""Schema probe: dump raw /event frames to JSONL for inspection.
Used in M0 to verify three open questions before building the UI:
1. Does session.created.info.parentID get populated for Task subagents?
(sst/opencode#6573 — may be broken over `opencode serve`.)
2. Does message.part.updated carry the full part or a delta?
(sst/opencode#11424 — frames sometimes replay full state.)
3. What permission.* event names actually fire? (Undocumented.)
Run, drive opencode through a session that includes a Task-tool
subagent + a permission prompt + at least one tool call, then grep the
output JSONL for `parentID`, `permission`, and successive part frames.
Usage:
uv run oc-tree-probe # writes /tmp/oc-tree-probe.jsonl
uv run oc-tree-probe --out file.jsonl
"""
from __future__ import annotations
import argparse
import asyncio
import json
import signal
import sys
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path
from .client import OpenCodeClient
async def _run(out_path: Path) -> int:
client = OpenCodeClient()
counts: Counter[str] = Counter()
print(f"oc-tree probe → {out_path}")
print(f" base_url={client.base_url}")
print(" drive opencode now; ctrl-c to stop and print summary\n")
stop = asyncio.Event()
loop = asyncio.get_running_loop()
for sig in (signal.SIGINT, signal.SIGTERM):
loop.add_signal_handler(sig, stop.set)
with out_path.open("a", encoding="utf-8") as f:
try:
stream = client.stream_events()
stream_task = asyncio.create_task(_consume(stream, f, counts))
stop_task = asyncio.create_task(stop.wait())
done, pending = await asyncio.wait(
{stream_task, stop_task},
return_when=asyncio.FIRST_COMPLETED,
)
for t in pending:
t.cancel()
for t in done:
exc = t.exception()
if exc and not isinstance(exc, asyncio.CancelledError):
print(f"\nstream error: {exc!r}", file=sys.stderr)
except KeyboardInterrupt:
pass
print("\n--- event counts ---")
for t, n in counts.most_common():
print(f" {n:>5} {t}")
return 0
async def _consume(stream, f, counts: Counter[str]) -> None:
async for event in stream:
counts[event.type] += 1
line = json.dumps(
{
"ts": datetime.now(timezone.utc).isoformat(),
"type": event.type,
"raw": event.raw,
},
ensure_ascii=False,
)
f.write(line + "\n")
f.flush()
# Live tick so the operator knows it's working.
sys.stdout.write(f"\r{sum(counts.values()):>6} events last={event.type:<40}")
sys.stdout.flush()
def main() -> int:
ap = argparse.ArgumentParser(description="opencode SSE schema probe")
ap.add_argument(
"--out",
type=Path,
default=Path("/tmp/oc-tree-probe.jsonl"),
help="output JSONL file (appended)",
)
args = ap.parse_args()
return asyncio.run(_run(args.out))
if __name__ == "__main__":
raise SystemExit(main())


oc-tree/uv.lock (generated, new file, 212 lines)

@@ -0,0 +1,212 @@
version = 1
requires-python = ">=3.11"
[[package]]
name = "anyio"
version = "4.13.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "idna" },
{ name = "typing-extensions", marker = "python_full_version < '3.13'" },
]
sdist = { url = "https://files.pythonhosted.org/packages/19/14/2c5dd9f512b66549ae92767a9c7b330ae88e1932ca57876909410251fe13/anyio-4.13.0.tar.gz", hash = "sha256:334b70e641fd2221c1505b3890c69882fe4a2df910cba14d97019b90b24439dc", size = 231622 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/da/42/e921fccf5015463e32a3cf6ee7f980a6ed0f395ceeaa45060b61d86486c2/anyio-4.13.0-py3-none-any.whl", hash = "sha256:08b310f9e24a9594186fd75b4f73f4a4152069e3853f1ed8bfbf58369f4ad708", size = 114353 },
]
[[package]]
name = "certifi"
version = "2026.4.22"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/25/ee/6caf7a40c36a1220410afe15a1cc64993a1f864871f698c0f93acb72842a/certifi-2026.4.22.tar.gz", hash = "sha256:8d455352a37b71bf76a79caa83a3d6c25afee4a385d632127b6afb3963f1c580", size = 137077 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/22/30/7cd8fdcdfbc5b869528b079bfb76dcdf6056b1a2097a662e5e8c04f42965/certifi-2026.4.22-py3-none-any.whl", hash = "sha256:3cb2210c8f88ba2318d29b0388d1023c8492ff72ecdde4ebdaddbb13a31b1c4a", size = 135707 },
]
[[package]]
name = "h11"
version = "0.16.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/01/ee/02a2c011bdab74c6fb3c75474d40b3052059d95df7e73351460c8588d963/h11-0.16.0.tar.gz", hash = "sha256:4e35b956cf45792e4caa5885e69fba00bdbc6ffafbfa020300e549b208ee5ff1", size = 101250 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/04/4b/29cac41a4d98d144bf5f6d33995617b185d14b22401f75ca86f384e87ff1/h11-0.16.0-py3-none-any.whl", hash = "sha256:63cf8bbe7522de3bf65932fda1d9c2772064ffb3dae62d55932da54b31cb6c86", size = 37515 },
]
[[package]]
name = "httpcore"
version = "1.0.9"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "certifi" },
{ name = "h11" },
]
sdist = { url = "https://files.pythonhosted.org/packages/06/94/82699a10bca87a5556c9c59b5963f2d039dbd239f25bc2a63907a05a14cb/httpcore-1.0.9.tar.gz", hash = "sha256:6e34463af53fd2ab5d807f399a9b45ea31c3dfa2276f15a2c3f00afff6e176e8", size = 85484 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/7e/f5/f66802a942d491edb555dd61e3a9961140fd64c90bce1eafd741609d334d/httpcore-1.0.9-py3-none-any.whl", hash = "sha256:2d400746a40668fc9dec9810239072b40b4484b640a8c38fd654a024c7a1bf55", size = 78784 },
]
[[package]]
name = "httpx"
version = "0.28.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
{ name = "certifi" },
{ name = "httpcore" },
{ name = "idna" },
]
sdist = { url = "https://files.pythonhosted.org/packages/b1/df/48c586a5fe32a0f01324ee087459e112ebb7224f646c0b5023f5e79e9956/httpx-0.28.1.tar.gz", hash = "sha256:75e98c5f16b0f35b567856f597f06ff2270a374470a5c2392242528e3e3e42fc", size = 141406 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/2a/39/e50c7c3a983047577ee07d2a9e53faf5a69493943ec3f6a384bdc792deb2/httpx-0.28.1-py3-none-any.whl", hash = "sha256:d909fcccc110f8c7faf814ca82a9a4d816bc5a6dbfea25d6591d6985b8ba59ad", size = 73517 },
]
[[package]]
name = "httpx-sse"
version = "0.4.3"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/0f/4c/751061ffa58615a32c31b2d82e8482be8dd4a89154f003147acee90f2be9/httpx_sse-0.4.3.tar.gz", hash = "sha256:9b1ed0127459a66014aec3c56bebd93da3c1bc8bb6618c8082039a44889a755d", size = 15943 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/d2/fd/6668e5aec43ab844de6fc74927e155a3b37bf40d7c3790e49fc0406b6578/httpx_sse-0.4.3-py3-none-any.whl", hash = "sha256:0ac1c9fe3c0afad2e0ebb25a934a59f4c7823b60792691f779fad2c5568830fc", size = 8960 },
]
[[package]]
name = "idna"
version = "3.13"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/ce/cc/762dfb036166873f0059f3b7de4565e1b5bc3d6f28a414c13da27e442f99/idna-3.13.tar.gz", hash = "sha256:585ea8fe5d69b9181ec1afba340451fba6ba764af97026f92a91d4eef164a242", size = 194210 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/5d/13/ad7d7ca3808a898b4612b6fe93cde56b53f3034dcde235acb1f0e1df24c6/idna-3.13-py3-none-any.whl", hash = "sha256:892ea0cde124a99ce773decba204c5552b69c3c67ffd5f232eb7696135bc8bb3", size = 68629 },
]
[[package]]
name = "linkify-it-py"
version = "2.1.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "uc-micro-py" },
]
sdist = { url = "https://files.pythonhosted.org/packages/2e/c9/06ea13676ef354f0af6169587ae292d3e2406e212876a413bf9eece4eb23/linkify_it_py-2.1.0.tar.gz", hash = "sha256:43360231720999c10e9328dc3691160e27a718e280673d444c38d7d3aaa3b98b", size = 29158 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b4/de/88b3be5c31b22333b3ca2f6ff1de4e863d8fe45aaea7485f591970ec1d3e/linkify_it_py-2.1.0-py3-none-any.whl", hash = "sha256:0d252c1594ecba2ecedc444053db5d3a9b7ec1b0dd929c8f1d74dce89f86c05e", size = 19878 },
]
[[package]]
name = "markdown-it-py"
version = "4.2.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "mdurl" },
]
sdist = { url = "https://files.pythonhosted.org/packages/06/ff/7841249c247aa650a76b9ee4bbaeae59370dc8bfd2f6c01f3630c35eb134/markdown_it_py-4.2.0.tar.gz", hash = "sha256:04a21681d6fbb623de53f6f364d352309d4094dd4194040a10fd51833e418d49", size = 82454 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b3/81/4da04ced5a082363ecfa159c010d200ecbd959ae410c10c0264a38cac0f5/markdown_it_py-4.2.0-py3-none-any.whl", hash = "sha256:9f7ebbcd14fe59494226453aed97c1070d83f8d24b6fc3a3bcf9a38092641c4a", size = 91687 },
]
[package.optional-dependencies]
linkify = [
{ name = "linkify-it-py" },
]
[[package]]
name = "mdit-py-plugins"
version = "0.6.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markdown-it-py" },
]
sdist = { url = "https://files.pythonhosted.org/packages/d8/3d/e0e8d9d1cee04f758120915e2b2a3a07eb41f8cf4654b4734788a522bcd1/mdit_py_plugins-0.6.0.tar.gz", hash = "sha256:2436f14a7295837ac9228a36feeabda867c4abc488c8d019ad5c0bda88eee040", size = 56025 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/71/d6/48f5b9e44e2e760855d7b489b1317cd7620e82dcb73197961e5cc1391348/mdit_py_plugins-0.6.0-py3-none-any.whl", hash = "sha256:f7e7a25d8b616fee99cb1e330da73451d11a8061baf39bb9663ab9ce0e005b90", size = 66655 },
]
[[package]]
name = "mdurl"
version = "0.1.2"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/d6/54/cfe61301667036ec958cb99bd3efefba235e65cdeb9c84d24a8293ba1d90/mdurl-0.1.2.tar.gz", hash = "sha256:bb413d29f5eea38f31dd4754dd7377d4465116fb207585f97bf925588687c1ba", size = 8729 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/b3/38/89ba8ad64ae25be8de66a6d463314cf1eb366222074cfda9ee839c56a4b4/mdurl-0.1.2-py3-none-any.whl", hash = "sha256:84008a41e51615a49fc9966191ff91509e3c40b939176e643fd50a5c2196b8f8", size = 9979 },
]
[[package]]
name = "oc-tree"
version = "0.0.1"
source = { editable = "." }
dependencies = [
{ name = "httpx" },
{ name = "httpx-sse" },
{ name = "textual" },
]
[package.metadata]
requires-dist = [
{ name = "httpx", specifier = ">=0.27" },
{ name = "httpx-sse", specifier = ">=0.4" },
{ name = "textual", specifier = ">=0.80" },
]
[[package]]
name = "platformdirs"
version = "4.9.6"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/9f/4a/0883b8e3802965322523f0b200ecf33d31f10991d0401162f4b23c698b42/platformdirs-4.9.6.tar.gz", hash = "sha256:3bfa75b0ad0db84096ae777218481852c0ebc6c727b3168c1b9e0118e458cf0a", size = 29400 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/75/a6/a0a304dc33b49145b21f4808d763822111e67d1c3a32b524a1baf947b6e1/platformdirs-4.9.6-py3-none-any.whl", hash = "sha256:e61adb1d5e5cb3441b4b7710bea7e4c12250ca49439228cc1021c00dcfac0917", size = 21348 },
]
[[package]]
name = "pygments"
version = "2.20.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/c3/b2/bc9c9196916376152d655522fdcebac55e66de6603a76a02bca1b6414f6c/pygments-2.20.0.tar.gz", hash = "sha256:6757cd03768053ff99f3039c1a36d6c0aa0b263438fcab17520b30a303a82b5f", size = 4955991 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/f4/7e/a72dd26f3b0f4f2bf1dd8923c85f7ceb43172af56d63c7383eb62b332364/pygments-2.20.0-py3-none-any.whl", hash = "sha256:81a9e26dd42fd28a23a2d169d86d7ac03b46e2f8b59ed4698fb4785f946d0176", size = 1231151 },
]
[[package]]
name = "rich"
version = "15.0.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markdown-it-py" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/c0/8f/0722ca900cc807c13a6a0c696dacf35430f72e0ec571c4275d2371fca3e9/rich-15.0.0.tar.gz", hash = "sha256:edd07a4824c6b40189fb7ac9bc4c52536e9780fbbfbddf6f1e2502c31b068c36", size = 230680 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/82/3b/64d4899d73f91ba49a8c18a8ff3f0ea8f1c1d75481760df8c68ef5235bf5/rich-15.0.0-py3-none-any.whl", hash = "sha256:33bd4ef74232fb73fe9279a257718407f169c09b78a87ad3d296f548e27de0bb", size = 310654 },
]
[[package]]
name = "textual"
version = "8.2.5"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "markdown-it-py", extra = ["linkify"] },
{ name = "mdit-py-plugins" },
{ name = "platformdirs" },
{ name = "pygments" },
{ name = "rich" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/62/1e/1eedc5bac184d00aaa5f9a99095f7e266af3ec46fa926c1051be5d358da1/textual-8.2.5.tar.gz", hash = "sha256:6c894e65a879dadb4f6cf46ddcfedb0173ff7e0cb1fe605ff7b357a597bdbc90", size = 1851596 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/cd/01/c4555f9c8a692ff83d84930150540f743ce94c89234f9e9a15ff4baba3a8/textual-8.2.5-py3-none-any.whl", hash = "sha256:247d2aa2faf222749c321f88a736247f37ee2c023604079c7490bfacddfcd4b2", size = 727050 },
]
[[package]]
name = "typing-extensions"
version = "4.15.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/72/94/1a15dd82efb362ac84269196e94cf00f187f7ed21c242792a923cdb1c61f/typing_extensions-4.15.0.tar.gz", hash = "sha256:0cea48d173cc12fa28ecabc3b837ea3cf6f38c6d1136f85cbaaf598984861466", size = 109391 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/18/67/36e9267722cc04a6b9f15c7f3441c2363321a3ea07da7ae0c0707beb2a9c/typing_extensions-4.15.0-py3-none-any.whl", hash = "sha256:f0fa19c6845758ab08074a0cfa8b7aecb71c999ca73d62883bc25cc018c4e548", size = 44614 },
]
[[package]]
name = "uc-micro-py"
version = "2.0.0"
source = { registry = "https://pypi.org/simple" }
sdist = { url = "https://files.pythonhosted.org/packages/78/67/9a363818028526e2d4579334460df777115bdec1bb77c08f9db88f6389f2/uc_micro_py-2.0.0.tar.gz", hash = "sha256:c53691e495c8db60e16ffc4861a35469b0ba0821fe409a8a7a0a71864d33a811", size = 6611 }
wheels = [
{ url = "https://files.pythonhosted.org/packages/61/73/d21edf5b204d1467e06500080a50f79d49ef2b997c79123a536d4a17d97c/uc_micro_py-2.0.0-py3-none-any.whl", hash = "sha256:3603a3859af53e5a39bc7677713c78ea6589ff188d70f4fee165db88e22b242c", size = 6383 },
]


@@ -5,14 +5,61 @@ stack. `install.sh` deploys it to `~/.config/opencode/` on a Mac.
## What's wired up
- **Local models**: two providers, manually switched via `/model`.
- `framework/qwen3-coder:30b` — Qwen3-Coder 30B-A3B via Ollama, the
daily-driver coding model. 128K context, port 11434.
- `framework-vllm/kimi-linear` — Kimi-Linear 48B-A3B via vLLM, the
long-context play (hybrid KDA/MLA, MoE 3B active). 32K context for
now (ramps further in P3 of the kimi-linear roadmap), port 8000.
**Tools disabled** (`tool_call: false`) — Kimi-Linear is a research
architecture release and isn't strongly tool-trained; the model
knows the Kimi-K2 tool tokens but emits non-structured output when
given an MCP toolbox. Use it for chat / long-context reasoning;
switch to `framework/qwen3-coder:30b` for agentic work.
- **Playwright MCP** ([@playwright/mcp](https://github.com/microsoft/playwright-mcp)) —
browser automation. The model can navigate pages, click, fill forms,
read DOM snapshots. Closes the agentic-browsing gap.
- **SearXNG MCP** ([mcp-searxng](https://github.com/ihor-sokoliuk/mcp-searxng)) —
web search via your self-hosted instance at <https://searxng.n0n.io>.
No external API keys, no rate-limit roulette.
- **Serena MCP** ([oraios/serena](https://github.com/oraios/serena)) —
LSP-backed semantic code navigation (find symbol, references, rename,
insert before/after). Cuts the tokens a local 70B-class model burns on
grep-style flailing by roughly an order of magnitude. Uses a **custom
trimmed context** (`serena-ide-trim.yml`) that exposes only the 8
unique-LSP-value tools — JetBrains tools, line-level edits redundant
with opencode's `Edit`, Serena's own memory tools (basic-memory MCP is
canonical), and onboarding/meta noise are all excluded. Down from 46
raw → 41 ide-context-filtered → **8 active**. Scoped to the cwd via
`--project-from-cwd`.
- **basic-memory MCP** ([basicmachines-co/basic-memory](https://github.com/basicmachines-co/basic-memory)) —
Markdown-backed persistent memory across sessions. Storage lives in
`~/Documents/obsidian/AI-memory/` (symlinked from `~/basic-memory`),
so notes are browsable in Obsidian's graph and search. Replaces
Claude Code's auto-memory write-back, which opencode lacks natively.
- **sequential-thinking MCP** ([modelcontextprotocol/servers/sequentialthinking](https://github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking)) —
externalizes chain-of-thought as tool calls. Helps weaker local
models stay on-plan over multi-step work; near-zero cost when not
actively used.
- **github MCP** ([github/github-mcp-server](https://github.com/github/github-mcp-server)) —
GitHub repo / issue / PR / code-search access. Launched with
`--read-only` and a narrowed `--toolsets repos,issues,pull_requests,code_security`
allowlist. With a **classic** PAT (`ghp_…`), GitHub's auto-scope-filtering
(Jan 2026) trims tools further by hiding ones whose scopes the token
lacks — saves ~23k tokens of tool-list overhead, meaningful for a 70B's
effective context. Requires `GITHUB_PERSONAL_ACCESS_TOKEN` to be exported
in your shell env (not in opencode.json). Drop `--read-only` from
`opencode.json` once you trust the model's tool calls.
**Note**: This MCP is disabled because this setup uses a self-hosted Gitea instance instead of GitHub.
- **task-master MCP** ([eyaltoledano/claude-task-master](https://github.com/eyaltoledano/claude-task-master)) —
Workflow / task-gate MCP. File-based: each project gets a
`.taskmaster/` dir with tasks, complexity, and config — no DB, no
external service. `OLLAMA_BASE_URL` is pre-set in `opencode.json` so
task-master's AI features (parse-prd, expand-task) route through your
framework Ollama. The npm-global install also provides a `task-master`
CLI (`task-master init` to scaffold per-project). Replaces the
workflow-gate role originally proposed for Archon, without Supabase.
- **Phoenix bridge plugin** (`.opencode/plugin/phoenix-bridge.js`) —
exports OpenTelemetry spans for every LLM call, tool call, and
subagent invocation to the Phoenix container running on the Framework
@@ -31,13 +78,22 @@ the plugin. Each step checks before doing work. Specifically:
1. Verifies Homebrew is present (won't install it for you)
2. `brew install node uv jq sst/tap/opencode` (skips if already at latest)
3. Pre-caches Playwright's chromium so the first MCP call is instant
4. `uv tool install serena-agent@latest --prerelease=allow` so opencode
can launch Serena as a plain `serena` binary on PATH (faster than
re-resolving via `uvx` on every session)
5. Creates `~/Documents/obsidian/AI-memory/` and symlinks `~/basic-memory`
to it, so basic-memory MCP writes into the Obsidian vault by default
6. `brew install github-mcp-server` and warns if `GITHUB_PERSONAL_ACCESS_TOKEN`
isn't set in your shell — the MCP needs it to authenticate
7. `npm install -g task-master-ai` (workflow MCP, also exposes the
`task-master` CLI for `task-master init` per project)
8. `npm install` in `.opencode/plugin/` for the Phoenix bridge OTel deps
9. Generates `~/.config/opencode/opencode.json` from the repo's
`opencode.json`, rewriting relative plugin paths to absolute so
OpenCode loads the plugin regardless of which directory it's launched
from
Step 9 is the reason the deployed config isn't a plain symlink. The
repo's `opencode.json` uses a relative plugin path (`./...`) so it stays
valid in place; the deployed copy is generated with that path resolved
to an absolute one. Edits to the repo's `opencode.json` need a re-run
@@ -57,11 +113,35 @@ Then in opencode:
```
opencode
> /mcp # should list playwright, searxng, serena, basic-memory,
# sequential-thinking, github, task-master as connected
> search the web for "qwen3-coder benchmarks"
> open https://example.com and tell me the H1
> use serena to find the definition of `parse_request`
> remember: this project ships its memory into the Obsidian vault
> /sequentialthinking think through the trade-offs of X vs Y
> list my recent github PRs across all repos
> task-master init # then ask the model to plan tasks for this project
```
For parallel agents, plain tmux + git worktree is enough at the 70B's
~2-pane concurrency ceiling. A two-line zsh helper covers the
"new isolated worktree → split tmux pane → start opencode" loop:
```sh
work() {
  local name="${1:?usage: work <branch-name>}"
  local wt="../$(basename "$PWD")-$name"
  git worktree add "$wt" -b "$name" && tmux split-window -h -c "$wt" "opencode"
}
unwork() { local wt="$PWD"; cd .. && git worktree remove --force "$wt"; }
```
Serena's first invocation in a project may take a few seconds — it
indexes the workspace via the language server. basic-memory's first
write creates the project layout under `~/Documents/obsidian/AI-memory/`
which Obsidian will pick up on its next vault scan.
## Phoenix tracing
The plugin at `.opencode/plugin/phoenix-bridge.js` boots an OpenTelemetry
@@ -142,3 +222,32 @@ plugin no-ops — the rest of OpenCode still works fine.
using the same `type/command/enabled` shape. The
[official MCP registry](https://registry.modelcontextprotocol.io/)
and [Awesome MCP Servers](https://mcpservers.org/) catalog options.
- **Tool-list bloat is real on a local 70B.** Every tool description
costs context. Five MCP servers exposing ~10 tools each puts the
active-tool list around 50 — manageable, but adding two more
full-spectrum servers (e.g. GitHub MCP at ~70 tools without scope
filtering, plus Context7) starts crowding effective context. Prefer
servers with toolset filtering or per-agent allow-lists in opencode.
- **basic-memory storage path.** The symlink `~/basic-memory` →
`~/Documents/obsidian/AI-memory` is created by `install.sh` only if
`~/basic-memory` doesn't already exist. If you'd previously run
basic-memory before this setup, move that directory's contents into
`AI-memory/` first, then delete `~/basic-memory` and re-run
`install.sh`.
- **Serena PATH gotcha.** `uv tool install` puts `serena` in
`~/.local/bin/`. If your shell rc doesn't export that, `opencode`
won't find the binary. The script warns; fix is one line in
`~/.zshrc`: `export PATH="$HOME/.local/bin:$PATH"`.
- **Serena tool trim** (`serena-ide-trim.yml`). The custom context
excludes 28 tools beyond what the built-in `ide` context already
filters. To re-expose any of them, edit
[`serena-ide-trim.yml`](serena-ide-trim.yml) and remove the entry
from `excluded_tools`, then re-run `./install.sh`. The path injection
(`./serena-ide-trim.yml` → absolute) is handled by install.sh's jq
pass at deploy time.
- **GitHub PAT.** Use a **classic** PAT (`ghp_…`) — auto-scope-filtering
only kicks in for classic tokens, not fine-grained ones. Without
it, the GitHub MCP exposes its full ~70-tool surface, which costs
~23k tokens of context the local 70B can ill afford. Generate at
<https://github.com/settings/tokens> with the scopes you actually
want exposed.


@@ -8,8 +8,12 @@
# 1. Verify Homebrew is present
# 2. Install node, uv, opencode, jq (skips if already at latest)
# 3. Pre-cache Playwright's chromium so the first MCP call is instant
# 4. Install Serena (uv tool — LSP-backed code navigation MCP)
# 5. Wire basic-memory's storage to the Obsidian vault's AI-memory folder
# 6. Install github-mcp-server + check for GITHUB_PERSONAL_ACCESS_TOKEN
# 7. Install task-master-ai (workflow MCP)
# 8. Install the Phoenix bridge plugin's OTel deps
# 9. Generate ~/.config/opencode/opencode.json from the repo's
# opencode.json with relative plugin paths rewritten to absolute,
# so opencode loads the plugin regardless of where it's launched.
#
@@ -28,14 +32,14 @@ warn() { printf ' \033[33m!\033[0m %s\n' "$*"; }
fail() { printf ' \033[31m✗\033[0m %s\n' "$*"; exit 1; }
# --- 1. Homebrew -------------------------------------------------------------
bold "[1/5] Homebrew"
bold "[1/9] Homebrew"
if ! command -v brew >/dev/null 2>&1; then
fail "brew not found. Install from https://brew.sh, then re-run."
fi
ok "brew $(brew --version | head -1 | awk '{print $2}')"
# --- 2. CLI deps -------------------------------------------------------------
bold "[2/5] CLI dependencies"
bold "[2/9] CLI dependencies"
brew_install_if_missing() {
local pkg="$1"
local bin="${2:-$1}"
@@ -60,7 +64,7 @@ else
fi
# --- 3. Playwright browsers --------------------------------------------------
bold "[3/5] Playwright browser cache"
bold "[3/9] Playwright browser cache"
PW_CACHE="${HOME}/Library/Caches/ms-playwright"
if [[ -d "$PW_CACHE" ]] && find "$PW_CACHE" -name "chrome" -o -name "Chromium*" 2>/dev/null | grep -q .; then
ok "browsers already cached at $PW_CACHE"
@@ -70,8 +74,93 @@ else
ok "browsers cached"
fi
# --- 4. Serena (LSP-backed semantic code navigation MCP) --------------------
# Installed once as a uv tool so opencode can launch it as `serena
# start-mcp-server ...` without paying uvx's resolution cost on every
# session start. --prerelease=allow is required because serena-agent
# ships pre-1.0 versions.
bold "[4/9] Serena MCP"
if uv tool list 2>/dev/null | awk '{print $1}' | grep -qx 'serena-agent'; then
ok "serena-agent already installed ($(serena --version 2>/dev/null | head -1 || echo 'version unknown'))"
else
info "installing serena-agent via uv tool (~30s first run)"
uv tool install -p 3.13 serena-agent@latest --prerelease=allow
ok "serena-agent installed"
fi
if ! command -v serena >/dev/null 2>&1; then
warn "serena binary not on PATH — uv tool's bin dir may not be exported."
warn "Add this to your shell rc: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
# --- 5. basic-memory storage ------------------------------------------------
# basic-memory defaults its project home to ~/basic-memory. We point that
# at a folder inside the Obsidian vault via symlink so the AI's notes
# show up in Obsidian's graph and search. Symlink (not env var) chosen
# because it's stable across basic-memory's evolving config schema.
bold "[5/9] basic-memory storage"
AI_MEM_PATH="${HOME}/Documents/obsidian/AI-memory"
if [[ ! -d "$AI_MEM_PATH" ]]; then
info "creating $AI_MEM_PATH"
mkdir -p "$AI_MEM_PATH"
ok "AI-memory directory created"
else
ok "AI-memory directory exists at $AI_MEM_PATH"
fi
if [[ -L "${HOME}/basic-memory" ]]; then
link_target="$(readlink "${HOME}/basic-memory")"
if [[ "$link_target" == "$AI_MEM_PATH" ]]; then
ok "~/basic-memory already linked to AI-memory"
else
warn "~/basic-memory points to $link_target — leaving as-is. basic-memory MCP will write there, not to AI-memory."
fi
elif [[ -e "${HOME}/basic-memory" ]]; then
warn "~/basic-memory exists and is not a symlink. Move or remove it for AI-memory linkage."
else
info "linking ~/basic-memory -> $AI_MEM_PATH"
ln -s "$AI_MEM_PATH" "${HOME}/basic-memory"
ok "symlink created"
fi
# --- 6. github-mcp-server (GitHub MCP, classic-PAT auto-scope-filtered) -----
# Homebrew formula tracks upstream releases; the binary is a Go single-file.
# We launch it via opencode.json's mcp.github entry with --read-only and a
# narrowed --toolsets allowlist; auto-scope-filtering on classic PATs (the
# Jan 2026 GitHub feature) cuts ~23k tokens of tool-list overhead — significant
# for a local 70B's effective context. Token itself is NOT in opencode.json
# (it's git-tracked); github-mcp-server inherits GITHUB_PERSONAL_ACCESS_TOKEN
# from the user's shell env.
bold "[6/9] github-mcp-server"
brew_install_if_missing github-mcp-server github-mcp-server
if [[ -z "${GITHUB_PERSONAL_ACCESS_TOKEN:-}" ]]; then
warn "GITHUB_PERSONAL_ACCESS_TOKEN is not set in your shell environment."
warn " github-mcp-server will fail to connect on opencode startup."
warn " Fix:"
warn " 1. Create a classic PAT (starts with ghp_) at"
warn " https://github.com/settings/tokens with the scopes you want"
warn " exposed (auto-filtering hides tools whose scopes the PAT lacks)."
warn " 2. Add to ~/.zshrc:"
warn " export GITHUB_PERSONAL_ACCESS_TOKEN=ghp_xxxxxxxxxxxx"
warn " 3. Re-source the shell (or open a new terminal) before launching opencode."
else
ok "GITHUB_PERSONAL_ACCESS_TOKEN is set in this shell"
fi
# --- 7. claude-task-master (workflow MCP, npm global) -----------------------
# task-master-ai: workflow/task-gate MCP. File-based (.taskmaster/ in each
# project), no DB, no external service. opencode.json launches it via
# `npx -y task-master-ai`; the global install also provides the `task-master`
# CLI (`task-master init` to scaffold a project's tasks).
bold "[7/9] claude-task-master"
if command -v task-master >/dev/null 2>&1; then
ok "task-master-ai already installed ($(task-master --version 2>/dev/null | head -1 || echo 'version unknown'))"
else
info "installing task-master-ai globally via npm"
npm install -g task-master-ai
ok "task-master-ai installed"
fi
# --- 8. Phoenix bridge plugin deps ------------------------------------------
bold "[8/9] Phoenix bridge plugin deps"
if [[ -d ".opencode/plugin/node_modules" && -f ".opencode/plugin/package-lock.json" ]]; then
# Re-run npm install if package.json is newer than the lockfile, otherwise skip.
if [[ ".opencode/plugin/package.json" -nt ".opencode/plugin/package-lock.json" ]]; then
@@ -87,12 +176,12 @@ else
ok "deps installed"
fi
# --- 5. Generate ~/.config/opencode/opencode.json ---------------------------
# --- 9. Generate ~/.config/opencode/opencode.json ---------------------------
# The repo's opencode.json uses relative plugin paths so it stays valid
# in-place. Rewriting them to absolute paths here makes opencode find the
# plugin regardless of which directory it was launched from. Re-run this
# script after editing opencode.json.
bold "[5/5] Deploy global config"
bold "[9/9] Deploy global config"
mkdir -p "${HOME}/.config/opencode"
src="${HERE}/opencode.json"
dst="${HOME}/.config/opencode/opencode.json"
@@ -109,24 +198,34 @@ if [[ -L "${HOME}/.config/opencode/.opencode" ]]; then
rm "${HOME}/.config/opencode/.opencode"
fi
# Rewrite any relative plugin path (./foo, ../foo) to an absolute path
# rooted at this directory. Absolute paths and npm-package refs pass
# through untouched.
# Rewrite any relative path (./foo, ../foo) to an absolute path rooted at
# this directory. Applies to both the top-level `plugin` array and to any
# string inside `mcp.<name>.command[]` (used for serena's --context arg
# pointing at serena-ide-trim.yml). Absolute paths and npm-package refs
# pass through untouched.
jq --arg here "$HERE" '
.plugin = (
(.plugin // [])
| map(
if type == "string" and (startswith("./") or startswith("../"))
then ($here + "/" + ltrimstr("./") | gsub("/\\./"; "/"))
else .
end
)
)
def rewrite($h):
if type == "string" and (startswith("./") or startswith("../"))
then ($h + "/" + ltrimstr("./") | gsub("/\\./"; "/"))
else .
end;
.plugin = ((.plugin // []) | map(rewrite($here)))
| .mcp = (
(.mcp // {})
| with_entries(
if (.value | type) == "object" and (.value | has("command"))
then .value.command |= map(rewrite($here))
else .
end
)
)
' "$src" > "$dst.tmp"
mv "$dst.tmp" "$dst"
ok "wrote $dst"
info "plugin paths resolved to:"
jq -r '.plugin[]?' "$dst" | sed 's/^/ /'
info "mcp.serena context resolved to:"
jq -r '.mcp.serena.command | map(select(test("\\.yml$"))) | .[]?' "$dst" | sed 's/^/ /'
echo
bold "Done."

View File

@@ -7,7 +7,7 @@
"provider": {
"framework": {
"npm": "@ai-sdk/openai-compatible",
"name": "Framework Desktop (Strix Halo)",
"name": "Framework Desktop (Strix Halo) — Ollama",
"options": {
"baseURL": "http://framework:11434/v1"
},
@@ -20,6 +20,24 @@
}
}
}
},
"framework-vllm": {
"npm": "@ai-sdk/openai-compatible",
"name": "Framework Desktop (Strix Halo) — vLLM",
"options": {
"baseURL": "http://framework:8000/v1",
"apiKey": "dummy"
},
"models": {
"kimi-linear": {
"name": "Kimi-Linear 48B-A3B (long-context, vLLM)",
"limit": {
"context": 32768,
"output": 8192
},
"tool_call": false
}
}
}
},
"mcp": {
@@ -35,6 +53,44 @@
"environment": {
"SEARXNG_URL": "https://searxng.n0n.io"
}
},
"serena": {
"type": "local",
"command": [
"serena", "start-mcp-server",
"--context", "./serena-ide-trim.yml",
"--project-from-cwd",
"--open-web-dashboard", "false"
],
"enabled": true
},
"basic-memory": {
"type": "local",
"command": ["uvx", "basic-memory", "mcp"],
"enabled": true
},
"sequential-thinking": {
"type": "local",
"command": ["npx", "-y", "@modelcontextprotocol/server-sequential-thinking"],
"enabled": true
},
"github": {
"type": "local",
"command": [
"github-mcp-server",
"stdio",
"--read-only",
"--toolsets", "repos,issues,pull_requests,code_security"
],
"enabled": false
},
"task-master": {
"type": "local",
"command": ["npx", "-y", "task-master-ai"],
"enabled": true,
"environment": {
"OLLAMA_BASE_URL": "http://framework:11434/v1"
}
}
},
"model": "framework/qwen3-coder:30b"

View File

@@ -0,0 +1,92 @@
# Custom Serena context for the localgenai stack.
#
# Extends Serena's built-in `ide` context with deeper exclusions tailored to
# opencode + a local 70B-class model. Loaded by Serena via the
# `--context /abs/path/to/this.yml` flag (install.sh rewrites the relative
# path in opencode.json to the repo's absolute path at deploy time).
#
# Trim summary (see opencode/README.md for rationale):
#   46 raw tools → ide context excludes 5 → this context excludes 29 more
# → ~12 visible tools, all unique LSP value
#
# Cut categories:
# - JetBrains backend (11) — language_backend: LSP, never JetBrains
# - Line-level edits (5) — opencode's Edit covers them
# - Memory tools (6) — basic-memory MCP is canonical
# - Onboarding / meta (5) — bootstrap noise the model rarely needs
# - Destructive / dashboard (2) — remove_project, open_dashboard
#
# Kept (the unique LSP value, the reason we installed Serena):
# find_symbol, find_referencing_symbols, get_symbols_overview,
# replace_symbol_body, insert_after_symbol, insert_before_symbol,
# rename_symbol, safe_delete_symbol, restart_language_server,
# list_queryable_projects, query_project (+ activate_project, latent)
description: opencode IDE context, trimmed for local 70B
prompt: |
You are running in an IDE assistant context where file operations,
basic (line-based) edits and reads, and shell commands are handled by
your own, internal tools.
If Serena's tools can be used to achieve your task, you should
prioritize them. In particular, it is important that you avoid reading
entire source code files unless it is strictly necessary! Instead, for
exploring and reading code in a token-efficient manner, use Serena's
symbolic-search tools (find_symbol, find_referencing_symbols,
get_symbols_overview).
excluded_tools:
# Inherited from built-in ide context (opencode built-ins cover these)
- create_text_file
- read_file
- execute_shell_command
- find_file
- list_dir
# Line-level edits — redundant with opencode's Edit and Grep
- replace_content
- delete_lines
- replace_lines
- insert_at_line
- search_for_pattern
# Memory tools — basic-memory MCP is the canonical persistent memory
- write_memory
- read_memory
- list_memories
- delete_memory
- rename_memory
- edit_memory
# Onboarding / meta — bootstrap noise
- check_onboarding_performed
- onboarding
- initial_instructions
- serena_info
- get_current_config
# Destructive / dashboard
- remove_project
- open_dashboard
# JetBrains backend — never used in this setup (language_backend: LSP)
- jet_brains_find_symbol
- jet_brains_move
- jet_brains_safe_delete
- jet_brains_inline_symbol
- jet_brains_find_referencing_symbols
- jet_brains_get_symbols_overview
- jet_brains_type_hierarchy
- jet_brains_find_declaration
- jet_brains_find_implementations
- jet_brains_rename
- jet_brains_debug
tool_description_overrides: {}
# When `single_project: true` and a project is given at startup, Serena
# limits the toolset to what the project actually needs and disables
# `activate_project`. With opencode launching Serena via
# `--project-from-cwd`, a project is always present.
single_project: true

View File

@@ -0,0 +1,112 @@
# Kimi-Linear-48B-A3B-Instruct on vLLM, gfx1151, via kyuz0's TheRock 7.x
# toolbox. Pioneer-grade: no public Strix Halo benchmarks exist for this
# model as of 2026-05.
#
# Three risks P0 verifies in one shot:
# - KDA Triton kernel on gfx1151 (fla-core) unverified
# - compressed-tensors loader on ROCm unverified
# - HIP-graph-capture on gfx1151 broken; mitigated
# via --enforce-eager
#
# Image strategy. Default `image:` is upstream `kyuz0:stable` (vLLM
# ~6aa057c from 2026-04-22). If that crashes with the v0.12-class
# `MLAModules.__init__() missing 'indexer_rotary_emb'`, build a
# v0.11.2-pinned image locally with ./build.sh and edit `image:` below to
# `kimi-linear-local:v0.11.2`. Source build is multi-hour.
#
# Weights. Despite their HF name, cyankiwi's "AWQ" Kimi-Linear weights
# are actually `compressed-tensors` int4 group-quantized — see config.json.
# Download with:
# huggingface-cli download cyankiwi/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit \
# --local-dir /models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit
# Size: ~35 GB on disk (4-bit). 8-bit variant is ~54 GB if quality drives
# us up later; both fit 128 GB unified comfortably.
services:
kimi-linear:
# Derived image: kyuz0:stable + gfx1151 AITER GEMM config fallbacks
# (Kimi-Linear's MLA layers hit FP8 BMM ops kyuz0 didn't validate
# with their tested models). See ./Dockerfile. Build is fast — just
# file copies inside the image.
build:
context: .
dockerfile: Dockerfile
image: kimi-linear-local:aiter-fixed
container_name: kimi-linear
restart: unless-stopped
devices:
- /dev/kfd:/dev/kfd
- /dev/dri:/dev/dri
cap_add:
- SYS_PTRACE
security_opt:
- seccomp=unconfined
# Numeric GIDs of host's video (44) and render (991) groups — names
# don't exist inside the container, but the GIDs need to match the
# host so /dev/kfd + /dev/dri are accessible.
group_add:
- "44"
- "991"
shm_size: 16g
ipc: host
environment:
# gfx1151 native: kyuz0 image is built with GFX=gfx1151, so unlike
# ollama.yml (which uses 11.0.0 to coerce gfx1100 kernels), here we
# want the GPU to report its real ISA.
- HSA_OVERRIDE_GFX_VERSION=11.5.1
# AITER attention path — kyuz0's image patches AITER for RDNA
# ds_swizzle fallbacks; the env flag opts vLLM into using it.
- VLLM_ROCM_USE_AITER=1
# MLA pre-processing via AITER triton_fp8_bmm tries to materialize
# a ~30 GB intermediate alongside resident weights. Bypass that op;
# other AITER paths stay on.
- VLLM_ROCM_USE_AITER_MLA=0
# Unified-memory recipe (BIOS UMA=0.5 GB + ttm.pages_limit cmdline
# + the env triple below). Lets PyTorch's HIP allocator treat the
# two rocminfo pools as one ~110 GB arena. Without the
# FINE_GRAIN_PCIE flag, XNACK alone is a trap (vLLM mis-computes
# KV budget vs. allocator ceiling).
- HSA_XNACK=1
- HSA_FORCE_FINE_GRAIN_PCIE=1
- PYTORCH_HIP_ALLOC_CONF=backend:native,expandable_segments:True,garbage_collection_threshold:0.9
volumes:
- /models:/models:ro
ports:
- "8000:8000"
# kyuz0 toolboxes drop into a shell by default; without an explicit
# entrypoint, `command:` would be exec'd as a program (the
# `exec "--model": executable file not found` failure).
entrypoint: ["vllm", "serve"]
command:
# Positional model path (vllm serve's documented form).
- /models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit
- --served-model-name
- kimi-linear
# Auto-detect would also work — config.json carries quant_method.
# Explicit flag makes the failure mode loud if the loader is wrong.
- --quantization
- compressed-tensors
# Conservative restart point after BIOS+cmdline+env unblock.
# P3 ramps further: 32K → 128K → 256K → 512K → 1M.
- --max-model-len
- "32768"
- --gpu-memory-utilization
- "0.92"
- --max-num-seqs
- "4"
# gfx1151 V1-engine HIP-graph-capture is broken (vllm-project/vllm#32180).
# Eager costs throughput, not correctness; do not remove without
# verifying upstream fix landed.
- --enforce-eager
# Kimi-Linear ships custom modeling_kimi.py — required.
- --trust-remote-code
# Tool-calling support — opencode sends tool_choice:"auto" whenever
# MCP servers are connected. vLLM is strict and rejects unless both
# flags are present. Moonshot's Kimi family uses the kimi_k2 parser
# for tool-call formatting; Kimi-Linear inherits the same template.
- --enable-auto-tool-choice
- --tool-call-parser
- kimi_k2
- --host
- 0.0.0.0
- --port
- "8000"

View File

@@ -0,0 +1,35 @@
# Derived image: kyuz0:stable plus gfx1151 AITER GEMM config fallbacks.
#
# kyuz0's image is built for gfx1151 but doesn't ship every per-op AITER
# autotuning config. Kimi-Linear's MLA layers hit FP8 BMM ops
# (BATCHED_GEMM-A8W8-A_PER_TOKEN_GROUP_PREQUANT_W_PER_BATCHED_TENSOR_QUANT
# and friends) that have no gfx1151 config in the bundle. We synthesize
# them by copying from the closest-arch config that does exist (RDNA3
# gfx1100 is closest to RDNA3.5 gfx1151). Tile sizes won't be optimal
# but the kernels will compile and run.
#
# Idempotent — only fills slots that don't already have a gfx1151 config.
#
# If we ever need a vLLM-pinned base (e.g. upstream regresses on
# Kimi-Linear), build it via ./build.sh first and change FROM here to
# kimi-linear-local:v0.11.2.
FROM kyuz0/vllm-therock-gfx1151:stable
RUN set -e; \
DIR=/opt/venv/lib64/python3.12/site-packages/aiter/ops/triton/configs/gemm; \
cd "$DIR"; \
filled=0; \
for SRC_PREFIX in gfx1100 gfx1101 gfx942 gfx90a; do \
for SRC in ${SRC_PREFIX}-*.json; do \
[ -f "$SRC" ] || continue; \
OP=${SRC#${SRC_PREFIX}-}; \
DST=gfx1151-${OP}; \
if [ ! -f "$DST" ]; then \
cp "$SRC" "$DST"; \
echo "[fix-aiter] $SRC -> $DST"; \
filled=$((filled+1)); \
fi; \
done; \
done; \
echo "[fix-aiter] filled $filled gfx1151 config slots"

View File

@@ -0,0 +1,124 @@
# kimi-linear
Kimi-Linear-48B-A3B-Instruct on vLLM, ROCm/TheRock 7.x, gfx1151. Sits
beside Ollama (port 11434, Qwen3-Coder) on port 8000. OpenAI-compatible.
This is the **P0 verification stage** — no public Strix Halo numbers
exist for this model as of 2026-05. Three things are unverified until a
first generation succeeds: KDA Triton kernel on gfx1151,
compressed-tensors loader on ROCm, and AITER + Kimi MoE topology.
Smoke-test below confirms all three at once.
## Prereqs
- Pyinfra deploy has run (`./run.sh` from `pyinfra/framework/`) — gives
you `/srv/docker/kimi-linear/`, GPU group membership, `/models/`
layout, and `huggingface-cli` on the box.
- Hugging Face CLI authenticated (`huggingface-cli login`) if the
weights repo gates downloads. cyankiwi's repo is currently public.
## Step 1 — Download weights
```sh
huggingface-cli download \
cyankiwi/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit \
--local-dir /models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit
```
~35 GB. The repo is named `AWQ-4bit` but the actual format is
`compressed-tensors` int4 group-quantized — see `config.json`.
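To double-check what the loader will see, read the quant method straight
out of the downloaded config (quick sanity check; the
`quantization_config.quant_method` key path is the usual HF layout and is
an assumption here):
```sh
python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["quantization_config"]["quant_method"])' \
  /models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit/config.json
# expected: compressed-tensors
```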
## Step 2 — Try the upstream image first
```sh
cd /srv/docker/kimi-linear
docker compose pull # ~8.5 GB
docker compose up -d
docker compose logs -f
```
Watch for one of three things:
- **Loads cleanly, model serves on :8000** → P0 passes. Run `./smoke.sh`.
- **`MLAModules.__init__() missing 'indexer_rotary_emb'`** → upstream
image is on vLLM 0.12.x; need the v0.11.2 source build. Skip to
Step 3.
- **KDA / Triton / fla-core compile error** → kernel doesn't work on
  gfx1151 yet. Fallback path: llama.cpp ROCm + bartowski Q4_K_M GGUF
in `compose/llama.yml`. Document the error in
`localgenai/kimi-linear/NOTES.md` and stop.
## Step 3 — Source build (if needed)
```sh
cd /srv/docker/kimi-linear
tmux new -s kimi-build
./build.sh # multi-hour. Detach with C-b d; reattach with `tmux a -t kimi-build`
```
Builds `kimi-linear-local:v0.11.2` from kyuz0 SHA `e2288d6` with
`VLLM_COMMIT=v0.11.2`. Then edit `docker-compose.yml`:
```yaml
image: kimi-linear-local:v0.11.2
```
…and `docker compose up -d` again.
## Step 4 — Smoke test
```sh
./smoke.sh
```
Expects: `/v1/models` returns `kimi-linear`; a tiny generation (16-token cap)
returns "ok". If both pass, **P0 is done**. Update task #6 and proceed
to P1.
## Operations
```sh
docker compose logs -f kimi-linear # tail
docker compose restart kimi-linear # reload
docker compose down # stop
docker compose exec kimi-linear bash # shell in
amdgpu_top # on host: GPU power, mem, util
```
## Pin manifest
| Component | Pin |
| --------------------------- | ---------------------------------- |
| kyuz0 toolbox | commit `e2288d6` (2026-04-22) |
| vLLM | tag `v0.11.2` (Moonshot recipe) |
| Image (default) | `kyuz0/vllm-therock-gfx1151:stable`|
| Image (pinned, if built) | `kimi-linear-local:v0.11.2` |
| Weights | `cyankiwi/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit` (compressed-tensors int4) |
| ROCm | TheRock nightlies via kyuz0 base |
| Python | 3.12 (hardcoded in kyuz0 Dockerfile) |
Bump policy: don't move vLLM to 0.12.x; don't move kyuz0 commit without
re-running smoke; bump weights only when an 8-bit A/B is in scope (P3).
## Port collision warning
`compose/vllm.yml` is a placeholder stub that also binds `:8000`. Only
one of `kimi-linear` and `vllm` can run at a time. Don't `docker compose
up` both. Long term either delete the stub or move it to a different
port; not in scope here.
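A quick pre-flight check before `docker compose up` (sketch; `ss` from
iproute2 is assumed present on the box):
```sh
sudo ss -ltnp | grep ':8000 ' || echo "port 8000 is free"
```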
## Known issues / mitigations
- **HIP graph capture broken on gfx1151** (vllm-project/vllm#32180) —
`--enforce-eager` mitigates at a throughput cost. Re-test without it
once the upstream fix lands.
- **vLLM 0.12.0 crash on Kimi-Linear** —
`MLAModules.__init__() missing 'indexer_rotary_emb'`. Hard pin to
0.11.2.
- **No published gfx1151 numbers** — we are first. Findings stay
private (no upstream filings) per project policy.
## Status
P0 in progress. If you set this aside mid-verification, capture where you
stopped in this project's `NEXT_STEPS.md` (same convention as oc-tree's).

View File

@@ -0,0 +1,51 @@
#!/usr/bin/env bash
# Source-build a vLLM 0.11.2-pinned image from kyuz0's gfx1151 toolbox.
# Use only when the upstream `kyuz0/vllm-therock-gfx1151:stable` tag
# crashes on Kimi-Linear with a v0.12-class error
# (`MLAModules.__init__() missing 'indexer_rotary_emb'`).
#
# Compiles flash-attention, AITER+CK, vLLM, and bitsandbytes from source
# with MAX_JOBS=4 (fixed upstream). Expect a multi-hour wall-clock on
# Strix Halo. Idempotent — skips if the target tag already exists.
#
# Pin policy. KYUZ0_COMMIT is the upstream SHA whose CI build produced
# the published `:stable` on 2026-04-22; bump only after re-validating
# Kimi-Linear works with the new toolbox revision. VLLM_COMMIT is the
# Moonshot recipe pin for Kimi-Linear; do not bump to v0.12.x.
set -euo pipefail
KYUZ0_REPO="https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes.git"
KYUZ0_COMMIT="e2288d6"
VLLM_COMMIT="v0.11.2"
IMAGE_TAG="kimi-linear-local:${VLLM_COMMIT}"
WORKDIR="/tmp/kimi-linear-build"
if docker image inspect "$IMAGE_TAG" >/dev/null 2>&1; then
echo "[build] $IMAGE_TAG already exists. To rebuild: docker rmi $IMAGE_TAG"
exit 0
fi
if [ ! -d "$WORKDIR/.git" ]; then
rm -rf "$WORKDIR"
git clone "$KYUZ0_REPO" "$WORKDIR"
fi
cd "$WORKDIR"
git fetch origin
git checkout --quiet "$KYUZ0_COMMIT"
echo "[build] kyuz0 toolbox @ $(git rev-parse --short HEAD)"
echo "[build] vLLM pin: $VLLM_COMMIT"
echo "[build] image tag: $IMAGE_TAG"
echo "[build] expected wall-clock: hours. Use tmux."
echo
docker build \
--build-arg "VLLM_COMMIT=${VLLM_COMMIT}" \
-t "$IMAGE_TAG" \
-f Dockerfile \
.
echo
echo "[build] done. Switch image: in docker-compose.yml to $IMAGE_TAG."

View File

@@ -0,0 +1,66 @@
#!/usr/bin/env bash
# Patch cyankiwi's tokenization_kimi.py to inline `bytes_to_unicode`.
#
# Why: tokenization_kimi.py does
# from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode
# which fails on recent transformers (the helper was removed/relocated).
# The function itself is ~10 lines of public BPE byte-mapping math; we
# inline it. Idempotent — re-running is a no-op once patched.
#
# Run on the box, after weights are downloaded, before first
# `docker compose up`. Recreates the container at the end so
# `trust_remote_code` re-copies the patched file into its module cache.
set -euo pipefail
MODEL_DIR="${MODEL_DIR:-/models/moonshotai/Kimi-Linear-48B-A3B-Instruct-AWQ-4bit}"
F="$MODEL_DIR/tokenization_kimi.py"
if [ ! -f "$F" ]; then
echo "[patch-tokenizer] not found: $F" >&2
echo "[patch-tokenizer] download weights first, or set MODEL_DIR=" >&2
exit 1
fi
if grep -q '__patched_bytes_to_unicode__' "$F"; then
echo "[patch-tokenizer] $F already patched. Nothing to do."
exit 0
fi
if ! grep -q 'from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode' "$F"; then
echo "[patch-tokenizer] expected import line not present in $F." >&2
echo "[patch-tokenizer] upstream may have changed — inspect manually:" >&2
echo " grep -n bytes_to_unicode '$F'" >&2
exit 2
fi
python3 - "$F" <<'PYEOF'
import pathlib, sys
p = pathlib.Path(sys.argv[1])
s = p.read_text()
old = "from transformers.models.gpt2.tokenization_gpt2 import bytes_to_unicode"
new = (
"# __patched_bytes_to_unicode__ — inlined; helper removed from recent transformers\n"
"def bytes_to_unicode():\n"
" bs = (list(range(ord(\"!\"), ord(\"~\") + 1))\n"
" + list(range(ord(\"¡\"), ord(\"¬\") + 1))\n"
" + list(range(ord(\"®\"), ord(\"ÿ\") + 1)))\n"
" cs = bs[:]\n"
" n = 0\n"
" for b in range(2**8):\n"
" if b not in bs:\n"
" bs.append(b)\n"
" cs.append(2**8 + n)\n"
" n += 1\n"
" cs = [chr(n) for n in cs]\n"
" return dict(zip(bs, cs))"
)
p.write_text(s.replace(old, new))
print("[patch-tokenizer] patched", p)
PYEOF
echo "[patch-tokenizer] recreating container to refresh trust_remote_code module cache"
cd "$(dirname "$0")"
docker compose down
docker compose up -d
echo "[patch-tokenizer] done. Tail logs with: docker compose logs -f"

View File

@@ -0,0 +1,24 @@
#!/usr/bin/env bash
# Smoke-test the running kimi-linear vLLM container. Exits non-zero if
# anything's wrong, so it doubles as a P1 health check.
set -euo pipefail
HOST="${KIMI_HOST:-127.0.0.1:8000}"
MODEL="${KIMI_MODEL:-kimi-linear}"
echo "[smoke] GET /v1/models on $HOST"
curl -fsS "http://$HOST/v1/models" | python3 -m json.tool
echo
echo "[smoke] POST /v1/chat/completions ($MODEL) — tiny generation"
curl -fsS "http://$HOST/v1/chat/completions" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"$MODEL\",
\"messages\": [{\"role\": \"user\", \"content\": \"Reply with exactly: ok\"}],
\"max_tokens\": 16,
\"temperature\": 0.0
}" | python3 -m json.tool
echo
echo "[smoke] passed"

View File

@@ -17,6 +17,11 @@ services:
- "host.docker.internal:host-gateway"
environment:
- OLLAMA_BASE_URL=http://host.docker.internal:11434
# vLLM (Kimi-Linear) exposed as an OpenAI-compatible backend. The
# model isn't strongly tool-trained — opencode's agentic system
# prompt confuses it. OpenWebUI's plain chat UI is the right home.
- OPENAI_API_BASE_URLS=http://host.docker.internal:8000/v1
- OPENAI_API_KEYS=dummy
# Built-in web search via the project's SearXNG instance.
- ENABLE_RAG_WEB_SEARCH=true
- RAG_WEB_SEARCH_ENGINE=searxng

View File

@@ -343,18 +343,23 @@ server.user(
_sudo=True,
)
# Kernel cmdline tuning per Gygeek/Framework-strix-halo-llm-setup:
# - amd_iommu=off — ~6 % memory-read improvement on Strix Halo
# - amdgpu.gttsize=117760 — ~115 GB GTT ceiling so the GPU can borrow
# most of system RAM dynamically. Acts as a
# ceiling, not an allocation. See ../../StrixHaloMemory.md
# for the UMA-vs-GTT trade-off discussion.
# Kernel cmdline tuning. The Strix Halo unified-memory recipe (kyuz0
# vllm-toolboxes "Kernel Parameters and Unified Memory" + Framework's
# "Linux + ROCm: January 2026 Stable Configurations" thread):
# - amd_iommu=off — ~6 % memory-read improvement
# - amdgpu.gttsize=131072 — 128 GiB GTT ceiling (deprecated knob
# but still honored on kernel 6.16+)
# - ttm.pages_limit=33554432 — 128 GiB in 4 KiB pages; forward-
# compatible TTM page cap
# Combined with BIOS UMA at 0.5 GB and HSA_FORCE_FINE_GRAIN_PCIE=1 in the
# container, PyTorch's HIP allocator merges the two rocminfo pools into a
# single ~110 GB arena. See ../../StrixHaloMemory.md for context.
# Requires a reboot to take effect; pyinfra leaves that to you.
files.line(
name="GRUB cmdline (amd_iommu, gttsize)",
name="GRUB cmdline (amd_iommu, gttsize, ttm)",
path="/etc/default/grub",
line=r"^GRUB_CMDLINE_LINUX_DEFAULT=.*",
replace='GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=117760"',
replace='GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off amdgpu.gttsize=131072 ttm.pages_limit=33554432"',
_sudo=True,
)
server.shell(
@@ -418,6 +423,7 @@ for svc in (
"llama",
"vllm",
"ollama",
"kimi-linear",
"openwebui",
"beszel",
"openlit",
@@ -559,6 +565,27 @@ for cfg in (
_sudo=True,
)
# Kimi-Linear container assets (build script, smoke test, operator doc).
# The compose file itself is copied by the for-loop above; the rest of
# the build context lives under compose/kimi-linear/ on the source side
# and at /srv/docker/kimi-linear/ on the box. Source is the source of
# truth — pyinfra overwrites drift.
for asset, mode in (
("Dockerfile", "0664"),
("build.sh", "0775"),
("smoke.sh", "0775"),
("patch-tokenizer.sh", "0775"),
("README.md", "0664"),
):
files.put(
name=f"kimi-linear: {asset}",
src=f"compose/kimi-linear/{asset}",
dest=f"{COMPOSE_DIR}/kimi-linear/{asset}",
group="docker",
mode=mode,
_sudo=True,
)
# Voice stack — Wyoming-protocol Whisper (STT) and Piper (TTS). Models
# are downloaded on first start; bind-mounting these dirs survives
# container recreation.

View File

@@ -4,4 +4,4 @@
set -euo pipefail
cd "$(dirname "$0")"
exec pyinfra -v --ssh-password-prompt inventory.py deploy.py "$@"
exec pyinfra -v --ssh-password-prompt inventory.py deploy.py "$@"

View File

@@ -0,0 +1,77 @@
# Qwen Large Codebase Roadmap
This document outlines recommendations for working with large codebases using opencode and your existing vector database setup.
## Current Setup
- Large codebase enriched by large-context LLM
- Vector database for searching codebase
- Opencode with existing MCP servers (Playwright, SearXNG, Serena, Basic-memory, Sequential-thinking, Task-master)
## Recommended Tooling
### 1. Continue.dev Integration
- **Purpose**: Better codebase indexing and search with nomic-embed-text
- **Benefits**: Inline completion support, enhanced code understanding capabilities
- **Implementation**: Deploy continue.dev alongside existing setup
### 2. Enhanced RAG Implementation
- **Purpose**: Leverage vector DB for efficient codebase navigation
- **Approach**:
- Custom MCP server that queries your vector DB for relevant code sections
- Cited search variants for precise code reference retrieval
- **Integration**: Combine with existing symbolic search from Serena MCP
### 3. Structured Workflows
- **Purpose**: Organize complex projects and search activities
- **Tools**:
- Task-master MCP for task management
- Workflow patterns that tie vector DB queries to specific tasks
- **Benefits**: Better tracking of search results and implementation progress
### 4. Memory Management
- **Purpose**: Persistent documentation of codebase insights
- **Approach**:
- Leverage basic-memory MCP for notes tied to vector DB queries
- Create patterns for documenting important code patterns, API decisions, and insights
- **Integration**: Connect memory entries to specific vector search results
### 5. Monitoring Integration
- **Purpose**: Track performance of vector database queries
- **Tools**:
- Phoenix tracing for performance monitoring
- Track tool usage patterns for optimizing search strategies
- **Benefits**: Visibility into query efficiency and system performance
### 6. Code Navigation Enhancement
- **Purpose**: Combine vector search with symbolic navigation
- **Approach**:
- Use Serena MCP for symbolic code navigation
- Augment with vector search results for context
- **Integration**: Create hybrid search approaches that use both methods
## Implementation Approach
1. **Phase 1**: Install continue.dev for enhanced code understanding (one-time prep sketched after this list)
2. **Phase 2**: Set up vector DB query tools as custom MCP servers
3. **Phase 3**: Create patterns for combining vector search with symbolic navigation
4. **Phase 4**: Implement persistent memory patterns for documenting findings
5. **Phase 5**: Establish monitoring and optimization practices
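Phase 1 prep, concretely (paths and hostnames are this stack's, per the
repo's Continue config — adjust to taste):
```sh
# on the framework box — the embedding model Continue's @codebase indexing uses
ollama pull nomic-embed-text
# on the workstation — install the repo's Continue config
mkdir -p ~/.continue
cp vscode-continue-config.yml ~/.continue/config.yaml
```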
## Key Integration Points
### Vector DB + Opencode
- Use vector database queries to find relevant code sections
- Combine with symbolic search for complete context understanding
- Enable citation-based referencing of code locations
### Memory + Search
- Document search results in basic-memory
- Create connections between vector DB entries and memory notes
- Maintain a searchable knowledge base of codebase insights
### Monitoring + Performance
- Track query performance through Phoenix
- Optimize search strategies based on usage patterns
- Monitor system efficiency as complexity scales
This roadmap provides a gradual approach to enhancing your codebase management capabilities while leveraging your existing infrastructure.

View File

@@ -0,0 +1,49 @@
name: framework
version: 0.0.1
schema: v1
# Continue.dev config for the framework-backed local LLM stack. Drop at
# ~/.continue/config.yaml (per-machine) or <repo>/.continue/config.yaml
# (repo-scoped). One-time prep on the box: `ollama pull nomic-embed-text`.
#
# Two chat models, manually switched via Continue's model dropdown.
# Qwen3-Coder is the daily driver; Kimi-Linear is the long-context play
# and only works once P0 verification passes on the box.
models:
- name: Qwen3-Coder 30B (Ollama)
provider: ollama
model: qwen3-coder:30b # Ollama tag (the "A3B" is the MoE active count, descriptive only)
apiBase: http://framework:11434 # NO /v1 — Ollama provider speaks /api/* directly
roles: [chat, edit, apply]
defaultCompletionOptions:
contextLength: 65536 # matches OLLAMA_CONTEXT_LENGTH in ollama.yml
- name: Kimi-Linear 48B-A3B (vLLM)
provider: openai # vLLM exposes OpenAI-compatible /v1
model: kimi-linear # served-model-name in compose/kimi-linear.yml
apiBase: http://framework:8000/v1
apiKey: dummy # vLLM doesn't enforce; Continue requires non-empty
roles: [chat, edit, apply]
defaultCompletionOptions:
contextLength: 32768 # P0 narrow start; bump as --max-model-len ramps
- name: nomic-embed-text
provider: ollama
model: nomic-embed-text
apiBase: http://framework:11434
roles: [embed]
context:
- provider: code
- provider: diff
- provider: terminal
- provider: problems
- provider: folder
- provider: codebase # uses the embed-role model above
- provider: file
- provider: open
# Tab autocomplete intentionally omitted — Qwen3-Coder-30B is too slow
# for inline FIM. Add a small FIM-trained model later (qwen2.5-coder:1.5b
# or starcoder2:3b) and wire as roles: [autocomplete].
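# A sketch of what that deferred entry could look like once a small FIM
# model is pulled on the box (commented out on purpose — tag and settings
# are untested assumptions, not a recommendation):
#   - name: Qwen2.5-Coder 1.5B (FIM autocomplete)
#     provider: ollama
#     model: qwen2.5-coder:1.5b
#     apiBase: http://framework:11434
#     roles: [autocomplete]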