2026-05-03 — Production Prep

Companion to BuildLog.md. Predecessor: 2026-05-03-reminders.md.

Milestone

A self-contained, reproducible production deployment story. The app + worker now build into a single Docker image (touchbase/app:latest, ~1.7 GB). Compose's prod profile brings up postgres + app + worker + caddy with sensible env defaults; the runbook in docs/deploy.md walks through first-time setup and ongoing operations.

What landed

| Path | Role |
| --- | --- |
| `Dockerfile` | Multi-stage build (deps → builder → runner). A single image runs both `pnpm start` (web) and `pnpm worker` via a `command:` override. Non-root `nextjs` user; exposes 3000. |
| `.dockerignore` | Excludes `node_modules`, `.next`, `.git`, `.env`, IDE/OS junk, and `docs/progress` (those ship with the code but stay out of the build context to keep it lean). |
| `compose.yaml` | Updated `app` + `worker` to use `env_file: .env` with an explicit `DATABASE_URL` override pointing at the `postgres` service hostname (so the prod profile works without manual env tweaking). Added a Docker `HEALTHCHECK` on `app` via `/api/health`. |
| `src/app/api/health/route.ts` | Liveness/readiness endpoint. Returns `{status, version, time, checks: {app, db}}`: 200 if Postgres is reachable, 503 otherwise. Cached 5s. (Sketched below this table.) |
| `next.config.ts` | Added `turbopack.root` to silence Next 16's "multiple lockfiles" warning at build time. |
| `src/lib/seed.ts` | Refuses to run with `NODE_ENV=production` unless `ALLOW_SEED_IN_PRODUCTION=1` is set. Reason: `seed()` TRUNCATEs every table; one accidental prod run wipes the database. |
| `docs/deploy.md` | First-time setup walkthrough, required env vars table, migrations / logs / backups / rollback / common ops sections, and a "what's not yet automated" section pointing at registry, secrets, observability, and CI as next steps. |
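For reference, a minimal sketch of the shape of the health route, assuming the shared Prisma client is exported as `db` from `@/lib/db` and the version string comes from an env var (both assumptions; the real route.ts may differ):

```ts
// Sketch of /api/health: liveness + DB readiness with a 5 s in-memory cache.
// "@/lib/db" and APP_VERSION are assumptions, not confirmed repo details.
import { NextResponse } from "next/server";
import { db } from "@/lib/db";

const CACHE_MS = 5_000;
let cached: { body: Record<string, unknown>; status: number; at: number } | null = null;

export async function GET() {
  const now = Date.now();
  if (cached && now - cached.at < CACHE_MS) {
    return NextResponse.json(cached.body, { status: cached.status });
  }

  let dbCheck: "ok" | "error" = "ok";
  try {
    await db.$queryRaw`SELECT 1`; // cheap Postgres reachability ping
  } catch {
    dbCheck = "error";
  }

  const status = dbCheck === "ok" ? 200 : 503;
  const body = {
    status: dbCheck === "ok" ? "ok" : "degraded",
    version: process.env.APP_VERSION ?? "dev",
    time: new Date().toISOString(),
    checks: { app: "ok", db: dbCheck },
  };
  cached = { body, status, at: now };
  return NextResponse.json(body, { status });
}
```

The in-memory cache is what keeps a flapping monitor from turning into a Postgres ping storm.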

What's verified

  • pnpm test — 92/92 green
  • pnpm lint — clean
  • pnpm exec tsc --noEmit — clean
  • docker-compose --profile prod build app — succeeds; produces touchbase/app:latest at 1.73 GB
  • Live container smoke (a scripted version of this check is sketched after the list):
    • Started app container against the dev Postgres → /api/health returns {"status":"ok","checks":{"app":"ok","db":"ok"}} and / returns 200 HTML
    • Started worker container with pnpm worker command override → "[worker] pg-boss started; handlers registered, idling"
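The same smoke check as a small script, in case we want it in CI later (assumes the app is reachable on localhost:3000; adjust the port for a mapped container):

```ts
// Scripted version of the manual /api/health smoke check above (Node 22+, ESM).
// The host/port are assumptions; a mapped container may use a different port.
const res = await fetch("http://localhost:3000/api/health");
const body = (await res.json()) as { status: string; checks?: { db?: string } };

if (res.status !== 200 || body.checks?.db !== "ok") {
  throw new Error(`health check failed: HTTP ${res.status} ${JSON.stringify(body)}`);
}
console.log("app + db healthy:", body.status);
```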

Decisions ratified

| Decision | Resolution |
| --- | --- |
| Image strategy | Single image with full deps for both web and worker; each container picks its role via a `command:` override. Reason: simpler than two images, and ~1.7 GB is acceptable for v1. Optimizing to a standalone build + slim image is deferred. |
| Base image | `node:22-bookworm-slim` (Debian) over `node:22-alpine`. Reason: better native-deps compatibility (sharp, esbuild) at the cost of ~80 MB. |
| pnpm in runner stage | `npm install --global pnpm@10.18.3` instead of `corepack prepare`. Reason: corepack writes to a user-specific cache, and the non-root `nextjs` user can't write to `~/.cache` when corepack prepare ran as root in the image. A global npm install is simpler and works for any UID. |
| Healthcheck endpoint | `/api/health` with a ``db.$queryRaw`SELECT 1` `` ping. 200 healthy / 503 degraded. Cached 5s so flapping monitors don't hammer the DB. |
| Compose env strategy | `env_file: .env` for convenience, plus an explicit `DATABASE_URL: ${DATABASE_URL:-postgresql://...postgres:5432/...}` override so running the prod profile locally just works (it uses the internal Docker network hostname). For a real prod deploy, export `DATABASE_URL=...` to override it before `docker-compose up`. |
| Seed safety in prod | `seed()` throws when `NODE_ENV=production` unless `ALLOW_SEED_IN_PRODUCTION=1`. Reason: forgetting that seed wipes data is a foot-gun; the env var makes "I really mean it" explicit. (Guard sketched after this table.) |
| Migration strategy | `pnpm exec prisma migrate deploy` from the host or via `docker-compose exec app`. Non-destructive; no resets. Documented in docs/deploy.md. |
| Where logs go | Container stdout/stderr → Docker's logging driver. The default `json-file` driver is fine for v1; structured logging (Pino) + a remote sink is a "next step" item. |
| Image tag in compose | `touchbase/app:latest` (was `:dev`). Reason: clearer naming for prod. Real tag-by-SHA discipline waits until we have an image registry. |
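The seed guard itself is tiny. A sketch of the check (the actual seed() body is elided here):

```ts
// Sketch of the production guard in src/lib/seed.ts; the real function body
// (truncate + insert fixtures) is omitted.
export async function seed() {
  if (
    process.env.NODE_ENV === "production" &&
    process.env.ALLOW_SEED_IN_PRODUCTION !== "1"
  ) {
    throw new Error(
      "Refusing to seed: seed() truncates every table. " +
        "Set ALLOW_SEED_IN_PRODUCTION=1 if you really mean it."
    );
  }
  // ... truncate tables and insert fixture data ...
}
```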

Gotchas hit

corepack + non-root user

First boot of the app container crashed with EACCES: permission denied, mkdir '/home/nextjs/.cache/node/corepack/v1'. Corepack (which is bundled with Node 22) downloads pnpm to a user-specific cache on first invocation; the runner stage activated pnpm as root, so when the runtime user nextjs tried to invoke pnpm start, corepack tried to download into nextjs's home dir, which was empty/unwritable. Fix: install pnpm globally via npm install --global pnpm@10.18.3 in the runner stage so it's on PATH for any UID.

Prisma openssl warnings

The build emits `Prisma failed to detect the libssl/openssl version to use, and may not work as expected. Defaulting to "openssl-1.1.x"`. This is harmless — Prisma 7 with the driver-adapter pattern doesn't use the native query engine binary at runtime; the warning fires while prisma generate checks for binary-fallback compatibility. It could be silenced with `apt-get install -y openssl` in the builder stage; deferred.

Image size (1.73 GB)

Bigger than ideal. Sources of bulk: the full node_modules (we keep dev deps so tsx can run the worker, plus the Prisma CLI for migrations) and the regular Next build rather than standalone output. Optimization paths:

  • Standalone Next build + separate slim worker image (~150 MB web, ~300 MB worker, total ~450 MB vs 1.7 GB)
  • Compile the worker to a single bundle with esbuild → drop tsx from the runtime (sketched below)
  • Strip dev deps from runtime — would also need to compile scripts/seed.ts and any other tsx-only entries

Deferred to "production prep v2" if/when image size becomes a constraint (it isn't for one-host deploys).
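If the worker-bundle option does get picked up, the build step is roughly this (a sketch; the entry point, output path, and externals are assumptions about the repo layout):

```ts
// Sketch of bundling the worker with esbuild so the runtime image can drop tsx
// and dev deps. Paths are assumptions, not the repo's actual layout.
import { build } from "esbuild";

build({
  entryPoints: ["src/worker/index.ts"], // assumed worker entry point
  bundle: true,
  platform: "node",
  target: "node22",
  format: "esm",
  outfile: "dist/worker.mjs",
  // Prisma's generated client ships runtime assets; keeping it external is safer.
  external: ["@prisma/client"],
}).catch(() => process.exit(1));
```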

Open questions

  1. Customer-visible brand name (still pending)
  2. Currency
  3. Stripe account ownership
  4. Real domain + Caddy config
  5. NEW: image registry choice (GHCR, ECR, Docker Hub)? Defer until we deploy to more than one host.
  6. NEW: CI provider (GitHub Actions presumably) — when the practice wants formal release discipline.

Roadmap status

  • Backend 1–4 done
  • 5a + 5b + 5c.1 + 5c.2 done
  • UX phases A–E done
  • 5d Stripe — scaffolded; awaits real keys for live verification
  • 5e reminders — done
  • Production prep — done 2026-05-03 (this session)

v1 is now soft-launchable. Remaining gates before opening to real customers:

  1. Live Stripe verification in test mode — user provides keys, we run the deposit flow end-to-end.
  2. Brand + policy decisions — customer-visible name, ToS/cancellation copy, real domain.
  3. First production deploy — stand up the host, run through docs/deploy.md, point a real domain at it.

Code-wise, nothing else is needed for soft launch.

Two-track:

  • Code track: nothing critical. Optional polish: the 24h reminder lead is hard-coded — make REMINDER_LEAD_MS configurable (sketched below); add a Sentry/GlitchTip integration for prod error tracking; a CI workflow running pnpm test/lint/tsc. Each is half-day-ish.
  • Operational track: depends on you — Stripe keys, domain, deploy host, brand decisions.
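The REMINDER_LEAD_MS change is about this much code (a sketch; the actual reminder module may read config differently):

```ts
// Sketch: configurable reminder lead time with the current 24 h default.
const DEFAULT_REMINDER_LEAD_MS = 24 * 60 * 60 * 1000;

export function reminderLeadMs(env: NodeJS.ProcessEnv = process.env): number {
  const parsed = Number(env.REMINDER_LEAD_MS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_REMINDER_LEAD_MS;
}
```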

Pause is appropriate here. Pick up when you want to either verify Stripe live, deploy somewhere, or polish a specific thing.

How to resume

```bash
cd /Users/noise/Documents/code/touchbase
docker-compose --profile prod build       # builds app + worker images (~3 min)
docker-compose --profile prod up -d       # starts postgres + app + worker + caddy
curl -sf http://localhost:3000/api/health # confirms app + db reachable
docker-compose --profile prod logs -f
```

Or for a single-component test:

```bash
docker-compose up -d postgres                         # dev-style postgres
docker run --rm --network touchbase_default \
  -e DATABASE_URL="postgresql://touchbase:touchbase@postgres:5432/touchbase_dev?schema=public" \
  -e AUTH_SECRET=test -p 3001:3000 \
  touchbase/app:latest
# /api/health on :3001
```