added models, model-swap, ...

2026-06-26 08:13:33 -04:00
parent de1635872f
commit 224afbb3a6
18 changed files with 1659 additions and 243 deletions
--- a/pyinfra/framework/compose/coder/README.md
+++ b/pyinfra/framework/compose/coder/README.md
@@ -0,0 +1,110 @@
+# Coder — workspace manager (pilot)
+
+Self-hosted control plane at <http://framework:7080> that creates
+per-project dev containers from Terraform templates. Each workspace:
+its own container, browser code-server with the Claude Code extension
+pre-installed, no host-port bookkeeping (everything tunnels through the
+dashboard), and idle autostop so parked projects give their RAM back to
+the inference stacks.
+
+**Pilot status.** Evaluating against the standalone code-server stack
+(`/srv/docker/code-server`, kept as-is meanwhile). Verdict after a week
+of real use: does the dashboard + autostop + create-from-template earn
+an always-on control plane + Postgres? If yes, the standalone stack
+retires; if no, Coder goes and a thin `workspace` spawner script
+replaces it.
+
+## Bring-up
+
+```sh
+cd /srv/docker/coder && docker compose up -d
+```
+
+The sibling `.env` (DOCKER_GROUP_ID, CODER_ACCESS_URL, random Postgres
+password) is generated by deploy.py on first `./run.sh` — no hand-fill
+needed. First visit to <http://framework:7080> creates the admin
+account (pick anything; it's local to the box).
+
+> **HTTPS is required for extension panels.** VS Code webviews (the
+> Claude Code panel, markdown preview, etc.) run on service workers,
+> which browsers only allow in a secure context — over plain
+> `http://framework:7080` the editor works but webview panels render
+> blank. Fix via Tailscale Serve (real Let's Encrypt cert for the
+> tailnet name; enable "HTTPS Certificates" once in the Tailscale
+> admin console):
+>
+> ```sh
+> sudo tailscale serve --bg 7080
+> # then in .env: CODER_ACCESS_URL=https://framework.<tailnet>.ts.net
+> docker compose up -d        # recreate server with new access URL
+> # restart any existing workspace — app URLs derive from the access URL
+> ```
+>
+> Localhost is also a secure context, so
+> `ssh -L 7080:localhost:7080 framework` + http://localhost:7080 works
+> in a pinch.
+
+## Push the template (one-time + after edits)
+
+The `coder` CLI ships inside the server image; authenticate it once:
+
+```sh
+docker compose exec coder coder login http://localhost:7080
+# prints a /cli-auth URL — open http://framework:7080/cli-auth in your
+# browser, copy the session token, paste it back
+```
+
+Then push:
+
+```sh
+docker compose exec coder coder templates push code-server \
+  --directory /templates/code-server --yes
+```
+
+Template source of truth is the repo
+(`pyinfra/framework/compose/coder/templates/code-server/main.tf`) —
+edit there, `./run.sh`, re-push. Edits on the box get overwritten.
+
+## First workspace
+
+Dashboard → Workspaces → Create → `code-server` template → name it
+after the project. Open the code-server app tile, then one-time
+Claude sign-in: the extension shows an OAuth URL → open in another tab
+→ approve → paste the code back. Credentials live in `~/.claude` on
+the workspace's home volume and survive stop/start and rebuilds.
+
+Set **idle autostop** under Template → Settings → Schedule (suggest
+1–2 h inactivity). Activity = open code-server tab, SSH, web terminal.
+
+## What persists where
+
+| Thing                            | Where                                        |
+| -------------------------------- | -------------------------------------------- |
+| Workspace home (repos, ~/.claude, extensions) | named volume `coder-<workspace-id>-home` |
+| Control-plane state (users, templates, workspace defs) | Postgres → `/srv/docker/coder/postgres` |
+| Template source                  | this repo, shipped to `/srv/docker/coder/templates` |
+
+A stopped workspace's container is deleted; only the home volume
+remains. `docker volume ls | grep coder-` to audit.
+
+## Security
+
+Same posture as OpenHands: the server container holds the docker
+socket and spawns code-running containers — root-equivalent on the
+box. Tailscale-only exposure is the mitigation; never forward :7080
+anywhere else. Coder does have real auth (the admin account), but
+treat that as defense-in-depth, not as permission to expose it.
+
+## Notes
+
+- Workspaces reach the sibling services via `host.docker.internal`
+  (Ollama :11434, LiteLLM :4000, Phoenix :6006, ...).
+- Long Claude runs: same rules as anywhere — the process lives in the
+  workspace container, so it survives laptop/browser disconnects, but
+  **autostop will kill an idle-looking workspace mid-run**. For
+  multi-hour unattended tasks either bump the workspace's TTL in the
+  dashboard or use the host-side tmux + `claude remote-control`
+  pattern (framework README, "Claude Code on the box").
+- The base image is `codercom/enterprise-base:ubuntu` (sudo-enabled,
+  common toolchain). Per-project images are a later refinement —
+  swap the `image` in main.tf or parameterize with `coder_parameter`.
--- a/pyinfra/framework/compose/coder/templates/code-server/main.tf
+++ b/pyinfra/framework/compose/coder/templates/code-server/main.tf
@@ -0,0 +1,121 @@
+# code-server workspace template — one dev container per project, with
+# browser VS Code and Claude Code (extension + CLI) ready on first open.
+#
+# Source of truth is the repo copy at
+# pyinfra/framework/compose/coder/templates/code-server/main.tf; pyinfra
+# ships it to /srv/docker/coder/templates/, mounted read-only into the
+# server container at /templates. Push after edits:
+#   cd /srv/docker/coder
+#   docker compose exec coder coder templates push code-server \
+#     --directory /templates/code-server --yes
+
+terraform {
+  required_providers {
+    coder = {
+      source = "coder/coder"
+    }
+    docker = {
+      source = "kreuzwerker/docker"
+    }
+  }
+}
+
+provider "coder" {}
+
+# Talks to the host daemon via the socket mounted into the server
+# container (default unix:///var/run/docker.sock) — workspace containers
+# are siblings of the compose stacks, not children.
+provider "docker" {}
+
+data "coder_workspace" "me" {}
+data "coder_workspace_owner" "me" {}
+
+resource "coder_agent" "main" {
+  arch = "amd64"
+  os   = "linux"
+
+  # Claude Code CLI — native installer, lands in ~/.local/bin inside the
+  # persisted home volume, so this is a no-op after the first start. The
+  # code-server extension bundles its own CLI, but having `claude` on
+  # PATH enables tmux-based long runs in the workspace terminal.
+  startup_script = <<-EOT
+    set -e
+    command -v claude >/dev/null 2>&1 || curl -fsSL https://claude.ai/install.sh | bash
+  EOT
+
+  env = {
+    GIT_AUTHOR_NAME     = data.coder_workspace_owner.me.full_name
+    GIT_AUTHOR_EMAIL    = data.coder_workspace_owner.me.email
+    GIT_COMMITTER_NAME  = data.coder_workspace_owner.me.full_name
+    GIT_COMMITTER_EMAIL = data.coder_workspace_owner.me.email
+  }
+
+  metadata {
+    display_name = "CPU"
+    key          = "cpu"
+    script       = "coder stat cpu"
+    interval     = 10
+    timeout      = 1
+  }
+  metadata {
+    display_name = "RAM"
+    key          = "mem"
+    script       = "coder stat mem"
+    interval     = 10
+    timeout      = 1
+  }
+}
+
+# Browser VS Code inside the workspace, surfaced as a dashboard app.
+# Extensions install from Open VSX — anthropic.claude-code is the
+# official Claude Code extension
+# (https://open-vsx.org/extension/Anthropic/claude-code). OAuth creds
+# land in ~/.claude inside the home volume and survive rebuilds.
+module "code_server" {
+  count    = data.coder_workspace.me.start_count
+  source   = "registry.coder.com/coder/code-server/coder"
+  version  = "~> 1.0"
+  agent_id = coder_agent.main.id
+  folder   = "/home/coder/project"
+
+  extensions = [
+    "anthropic.claude-code",
+  ]
+}
+
+# Home survives workspace stop/start AND template-driven rebuilds —
+# ignore_changes keeps Terraform from recreating the volume (and wiping
+# ~/.claude, extensions, repos) when template metadata shifts.
+resource "docker_volume" "home" {
+  name = "coder-${data.coder_workspace.me.id}-home"
+  lifecycle {
+    ignore_changes = all
+  }
+}
+
+resource "docker_container" "workspace" {
+  # start_count is 0 when the workspace is stopped — the container is
+  # deleted but the home volume above persists. This is what idle
+  # autostop reclaims: RAM back to the inference stacks.
+  count    = data.coder_workspace.me.start_count
+  image    = "codercom/enterprise-base:ubuntu"
+  name     = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"
+  hostname = data.coder_workspace.me.name
+
+  # Agent bootstrap, rewritten to reach the control plane through the
+  # docker bridge (the workspace is a sibling container; "localhost"
+  # inside it isn't the Coder server).
+  entrypoint = ["sh", "-c", replace(coder_agent.main.init_script, "/localhost|127\\.0\\.0\\.1/", "host.docker.internal")]
+  env        = ["CODER_AGENT_TOKEN=${coder_agent.main.token}"]
+
+  host {
+    host = "host.docker.internal"
+    ip   = "host-gateway"
+  }
+
+  volumes {
+    container_path = "/home/coder"
+    volume_name    = docker_volume.home.name
+    read_only      = false
+  }
+}