Files
touchbase/docs/progress/2026-05-03-production-prep.md
2026-05-05 10:39:59 -04:00

116 lines
8.0 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# 2026-05-03 — Production Prep
> Companion to `BuildLog.md`. Predecessor: `2026-05-03-reminders.md`.
## Milestone
A self-contained, reproducible production deployment story. The app + worker now build into a single Docker image (`touchbase/app:latest`, ~1.7 GB). Compose's `prod` profile brings up `postgres` + `app` + `worker` + `caddy` with sensible env defaults; the runbook in `docs/deploy.md` walks through first-time setup and ongoing operations.
## What landed
| Path | Role |
|---|---|
| `Dockerfile` | Multi-stage build (deps → builder → runner). Single image runs both `pnpm start` (web) and `pnpm worker` per command override. Non-root `nextjs` user, exposes 3000. |
| `.dockerignore` | Excludes node_modules, .next, .git, .env, IDE/OS junk, docs/progress (those ship with code but not in the build context to keep it lean) |
| `compose.yaml` | Updated `app` + `worker` to use `env_file: .env` with explicit `DATABASE_URL` override that points at the `postgres` service hostname (so prod profile works without manual env tweaking). Added Docker `HEALTHCHECK` on app via `/api/health`. |
| `src/app/api/health/route.ts` | Liveness/readiness endpoint. Returns `{status, version, time, checks: {app, db}}`. 200 if Postgres reachable; 503 otherwise. Cached 5s. |
| `next.config.ts` | Added `turbopack.root` to silence Next 16's "multiple lockfiles" warning at build time. |
| `src/lib/seed.ts` | Refuses to run in `NODE_ENV=production` unless `ALLOW_SEED_IN_PRODUCTION=1` is set. Reason: `seed()` `TRUNCATE`s every table; one accidental prod run wipes the database. |
| `docs/deploy.md` | First-time setup walkthrough, required env vars table, migrations / logs / backups / rollback / common ops sections, "what's not yet automated" section pointing at registry, secrets, observability, CI as next steps. |
## What's verified
- `pnpm test`**92/92 green**
- `pnpm lint` — clean
- `pnpm exec tsc --noEmit` — clean
- `docker-compose --profile prod build app` — succeeds; produces `touchbase/app:latest` at 1.73 GB
- Live container smoke:
- Started `app` container against the dev Postgres → `/api/health` returns `{"status":"ok","checks":{"app":"ok","db":"ok"}}` and `/` returns 200 HTML
- Started `worker` container with `pnpm worker` command override → "[worker] pg-boss started; handlers registered, idling"
## Decisions ratified
| Decision | Resolution |
|---|---|
| Image strategy | **Single image, full deps** for both web and worker. Container picks via `command:` override. Reason: simpler than two images, ~1.7 GB is acceptable for v1. Optimization to standalone-mode + slim image deferred. |
| Base image | `node:22-bookworm-slim` (Debian) over `node:22-alpine`. Reason: better native-deps compatibility (sharp, esbuild) at the cost of ~80 MB. |
| pnpm in runner stage | `npm install --global pnpm@10.18.3` instead of `corepack prepare`. Reason: corepack writes to user-specific cache; the non-root `nextjs` user can't write to `~/.cache` if corepack-prepare ran as root in the image. Global npm install is simpler and works for any UID. |
| Healthcheck endpoint | `/api/health` with `db.$queryRaw\`SELECT 1\`` ping. 200 healthy / 503 degraded. Cached 5s so flapping monitors don't hammer the DB. |
| Compose env strategy | `env_file: .env` for convenience; explicit `DATABASE_URL: ${DATABASE_URL:-postgresql://...postgres:5432/...}` override so prod-profile-locally just works (uses internal docker network hostname). For prod deploy, user `export DATABASE_URL=...` to override before `docker-compose up`. |
| Seed safety in prod | `seed()` throws if `NODE_ENV=production` unless `ALLOW_SEED_IN_PRODUCTION=1`. Reason: forgetting that `seed` wipes data is a foot-gun; the env var makes "I really mean it" explicit. |
| Migration strategy | `pnpm exec prisma migrate deploy` from the host or via `docker-compose exec app`. Non-destructive; no resets. Documented in `docs/deploy.md`. |
| Where logs go | Container stdout/stderr → Docker's logging driver. Default json-file is fine for v1; structured logging (Pino) + remote sink is a "next step" item. |
| Image tag in compose | `touchbase/app:latest` (was `:dev`). Reason: clearer naming for prod. Real tag-by-SHA discipline waits until we have an image registry. |
## Gotchas hit
### corepack + non-root user
First boot of the app container crashed with `EACCES: permission denied, mkdir '/home/nextjs/.cache/node/corepack/v1'`. Corepack (which is bundled with Node 22) downloads pnpm to a user-specific cache on first invocation; the runner stage activated pnpm as root, so when the runtime user `nextjs` tried to invoke `pnpm start`, corepack tried to download into nextjs's home dir, which was empty/unwritable. Fix: install pnpm globally via `npm install --global pnpm@10.18.3` in the runner stage so it's on PATH for any UID.
### Prisma openssl warnings
Build emits `Prisma failed to detect the libssl/openssl version to use, and may not work as expected. Defaulting to "openssl-1.1.x".` These are harmless — Prisma 7 with the driver-adapter pattern doesn't use the native query engine binary at runtime. The warning fires during `prisma generate` checking for binary fallback compatibility. Could silence by `apt-get install -y openssl` in the builder stage; deferred.
### Image size (1.73 GB)
Bigger than ideal. Sources of bulk: full `node_modules` (we keep dev deps for `tsx` to run the worker, plus `prisma` CLI for migrations); the regular Next build vs standalone output. Optimization paths:
- **Standalone Next build** + separate slim worker image (~150 MB web, ~300 MB worker, total ~450 MB vs 1.7 GB)
- Compile worker to a single bundle with esbuild → drop tsx from runtime
- Strip dev deps from runtime — would also need to compile `scripts/seed.ts` and any other tsx-only entries
Deferred to "production prep v2" if/when image size becomes a constraint (it isn't for one-host deploys).
## Open questions
1. Customer-visible brand name (still pending)
2. Currency
3. Stripe account ownership
4. Real domain + Caddy config
5. **NEW**: image registry choice (GHCR, ECR, Docker Hub)? Defer until we deploy to more than one host.
6. **NEW**: CI provider (GitHub Actions presumably) — when the practice wants formal release discipline.
## Roadmap status
- Backend 14 done
- 5a + 5b + 5c.1 + 5c.2 done
- UX phases AE done
- 5d Stripe — scaffolded; awaits real keys for live verification
- 5e reminders — done
- **Production prep — done 2026-05-03 (this session)**
**v1 is now soft-launchable.** Remaining gates before opening to real customers:
1. **Live Stripe verification in test mode** — user provides keys, we run the deposit flow end-to-end.
2. **Brand + policy decisions** — customer-visible name, ToS/cancellation copy, real domain.
3. **First production deploy** — stand up the host, run through `docs/deploy.md`, point a real domain at it.
Code-wise, nothing else is needed for soft launch.
## Recommended next step
Two-track:
- **Code track**: nothing critical. Optional polish: 24h reminder is hard-coded — make `REMINDER_LEAD_MS` configurable; add a Sentry/GlitchTip integration for prod error tracking; CI workflow with `pnpm test`/`lint`/`tsc`. Each is half-day-ish.
- **Operational track**: depends on you — Stripe keys, domain, deploy host, brand decisions.
Pause is appropriate here. Pick up when you want to either verify Stripe live, deploy somewhere, or polish a specific thing.
## How to resume
```bash
cd /Users/noise/Documents/code/touchbase
docker-compose --profile prod build # builds app + worker images (~3 min)
docker-compose --profile prod up -d # starts postgres + app + worker + caddy
curl -sf http://localhost:3000/api/health # confirms app + db reachable
docker-compose --profile prod logs -f
```
Or for a single-component test:
```bash
docker-compose up -d postgres # dev-style postgres
docker run --rm --network touchbase_default \
-e DATABASE_URL="postgresql://touchbase:touchbase@postgres:5432/touchbase_dev?schema=public" \
-e AUTH_SECRET=test -p 3001:3000 \
touchbase/app:latest
# /api/health on :3001
```