touchbase/docs/progress/2026-05-05-rate-limit-and-audit.md
2026-05-05 10:39:59 -04:00

2026-05-05 — Rate Limiting + Audit Logging

Companion to BuildLog.md. Predecessor: 2026-05-03-ci-and-polish.md.

Milestone

Two safety-net additions: rate limiting on the magic-link signin endpoint (5/min/IP) to prevent email-spam abuse, and audit logging on every booking lifecycle transition + admin entity create/update + sign-in. AuditLog rows now accumulate as the practice operates — load-bearing for any future compliance conversation.

What landed

Rate limiting

| Path | Role |
| --- | --- |
| src/lib/rate-limit.ts | In-memory sliding-window limiter. check(key, limit, windowMs) → { ok, remaining } or { ok: false, retryAfterSec }. Periodic cleanup of expired buckets so the map doesn't grow forever. |
| src/middleware.ts | Next.js middleware. Matches /api/auth/signin/:path* POSTs; 5 per minute per IP. Returns 429 with Retry-After header when over. |
| test/rate-limit.test.ts | 5 tests covering under/at/over limit, per-key independence, window reset, remaining count. |
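
For orientation, a minimal sketch of what a sliding-window check with this signature could look like. This is not the contents of src/lib/rate-limit.ts: the timestamp-list storage, the sweep helper, and the optional now parameter (added here so the window can be exercised without waiting) are all assumptions.

```typescript
// Result shape as described above: either allowed with a remaining count,
// or rejected with a retry hint in whole seconds.
type RateLimitResult =
  | { ok: true; remaining: number }
  | { ok: false; retryAfterSec: number };

// One list of hit timestamps (ms) per key. In-memory, so it resets per process.
const buckets = new Map<string, number[]>();

// `now` is a hypothetical extra parameter for deterministic testing.
function check(
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now(),
): RateLimitResult {
  const cutoff = now - windowMs;
  // Keep only the hits that still fall inside the sliding window.
  const hits = (buckets.get(key) ?? []).filter((t) => t > cutoff);

  if (hits.length >= limit) {
    buckets.set(key, hits);
    // The oldest surviving hit decides when a slot frees up again.
    const oldest = hits[0] ?? now;
    const retryAfterMs = oldest + windowMs - now;
    return { ok: false, retryAfterSec: Math.max(1, Math.ceil(retryAfterMs / 1000)) };
  }

  hits.push(now);
  buckets.set(key, hits);
  return { ok: true, remaining: limit - hits.length };
}

// Hypothetical cleanup pass: drop keys whose hits have all expired,
// so the map does not grow forever. Could be run on a timer.
function sweep(windowMs: number, now: number = Date.now()): void {
  const cutoff = now - windowMs;
  for (const [key, hits] of buckets) {
    if (hits.every((t) => t <= cutoff)) buckets.delete(key);
  }
}
```

Under these assumptions the middleware call would be something like check(ip, 5, 60_000), with retryAfterSec feeding the Retry-After header on the 429.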

Audit logging

| Path | Role |
| --- | --- |
| src/lib/audit.ts | audit(db, { actorId, action, entityType, entityId, meta? }). Best-effort (catches and logs write failures). Pulls IP + UA from request headers via next/headers when available; nulls when called outside a request context (CLI, worker, tests). |
| Action wiring | Booking lifecycle: booking.created (public + admin + reschedule), booking.cancelled (customer + admin), booking.rescheduled, booking.completed, booking.no_show. Admin CRUD: service.created/updated, room.created/updated, therapist.created/updated. Auth: user.signed_in via Auth.js events.signIn (lazy-imported). |
| test/audit.test.ts | 4 tests covering row creation, null IP/UA outside request context, null actorId for system events, best-effort no-throw semantics. |
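
A rough sketch of the best-effort shape described above, not the actual src/lib/audit.ts. AuditStore is a hypothetical stand-in for the Prisma client, and the request context is passed in as a third argument here instead of being read from next/headers, purely so the sketch runs on its own.

```typescript
// Dot-separated entity.verb, e.g. "booking.created" or "user.signed_in".
type AuditAction = `${string}.${string}`;

type AuditEntry = {
  actorId: string | null; // null for system events
  action: AuditAction;
  entityType: string;
  entityId: string;
  meta?: unknown;
};

// Hypothetical: stands in for the IP + UA the real helper pulls from headers.
type AuditRequestContext = { ip: string | null; ua: string | null };

// Hypothetical minimal slice of a database client with an auditLog model.
interface AuditStore {
  auditLog: { create(args: { data: Record<string, unknown> }): Promise<unknown> };
}

async function audit(
  db: AuditStore,
  entry: AuditEntry,
  ctx: AuditRequestContext = { ip: null, ua: null }, // outside a request: nulls
): Promise<void> {
  try {
    await db.auditLog.create({
      data: { ...entry, meta: entry.meta ?? null, ip: ctx.ip, ua: ctx.ua },
    });
  } catch (err) {
    // Best-effort: log and swallow, so a broken audit table
    // never fails the booking that triggered the write.
    console.error("audit write failed:", err);
  }
}
```

The point of the try/catch placement is that callers can await audit(...) inline after a booking write without adding their own error handling.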

What's verified

  • pnpm test → 101/101 green (was 92; +5 rate-limit, +4 audit)
  • pnpm lint — clean
  • pnpm exec tsc --noEmit — clean
  • Live smoke (curl + Mailpit, dev server):
    • Rate limit: 6 rapid signin POSTs to /api/auth/signin/nodemailer → first 5 return 302 (success), 6th returns 429.
    • Audit: signed in as admin@touchbase.local via magic link → AuditLog row written: action='user.signed_in', entityType='User', ip='::1', ua=<UA string>.

Decisions ratified

| Decision | Resolution |
| --- | --- |
| Limiter implementation | In-memory sliding window. Single-host only; resets per process. Reason: matches our v1 deployment model (one host). When we go multi-instance, swap for Redis or pg-backed; the call site stays the same. |
| Where to apply rate limit | /api/auth/signin/* POSTs. Reason: highest-value abuse target (sends email per request). Booking flow has DB exclusion-constraint protection already; magic-link spam has none until now. |
| Limit value | 5 per minute per IP. Reason: tight enough to prevent abuse, loose enough that a user retrying because of a typo isn't blocked. Tighten if we see abuse. |
| IP extraction | x-forwarded-for[0] → x-real-ip → "unknown". Reason: any of the standard proxy headers (Caddy sets x-forwarded-for); fallback prevents undefined-key bugs. |
| Audit semantics | Best-effort writes; failures are logged but don't propagate. Reason: a misbehaving audit table must not break a booking. |
| Audit meta typing | Prisma.InputJsonValue (the typed JSON input). Null encoded as Prisma.JsonNull per Prisma 7 convention. |
| Auth signIn audit | events.signIn callback in Auth.js config, lazy-imports @/lib/audit and @/lib/db. Reason: importing Prisma at the top of src/auth.ts adds noticeable cold-start to the auth handler; lazy import only pays the cost on actual sign-in. |
| Action vocabulary | Dot-separated entity.verb: booking.created, service.updated, user.signed_in. Documented at the top of src/lib/audit.ts. |
| What's instrumented | Booking lifecycle (5 actions) + admin CRUD entity create/update (services, rooms, therapists) + signin. Not yet instrumented: working-hours / overrides / room-blocks edits (high-volume, low compliance value); customer notes views (no UI for that yet); Stripe events (covered separately by Notification + Payment rows). |
| Where audit is stored | The existing AuditLog table from BuildLog.md §4.8. Append-only. No retention policy yet; documented as a future concern. |
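
The IP-extraction fallback chain above could be sketched roughly as follows. The function name clientIp and the HeaderSource type are made up for illustration; HeaderSource is just the get() slice of the standard Headers interface, so a real request's headers object would fit.

```typescript
// Minimal slice of the Headers interface: get() returns null when absent.
type HeaderSource = { get(name: string): string | null };

// Hypothetical helper: x-forwarded-for[0] → x-real-ip → "unknown".
function clientIp(headers: HeaderSource): string {
  // x-forwarded-for is a comma-separated chain; the first entry is the
  // address the outermost proxy saw.
  const forwarded = headers.get("x-forwarded-for");
  const first = forwarded?.split(",")[0]?.trim();
  if (first) return first;

  const realIp = headers.get("x-real-ip")?.trim();
  if (realIp) return realIp;

  // Always return a string, so the limiter key is never undefined.
  return "unknown";
}
```

One consequence worth keeping in mind: every request that falls through to "unknown" shares a single bucket, so behind a proxy that sets neither header all clients would be limited together.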

Gotchas hit

Prisma.InputJsonValue vs Record<string, unknown> typing

First pass typed meta as Record<string, unknown> and Prisma's generated meta field rejected it ('Record<string, unknown>' is not assignable to type 'NullableJsonNullValueInput | InputJsonValue | undefined'). Switched to Prisma.InputJsonValue and used Prisma.JsonNull for the absent case. Required import { Prisma } from "..." (value, not type).
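
To illustrate why the first typing failed, here is a self-contained sketch that does not import Prisma. JsonInput is a simplified stand-in for Prisma.InputJsonValue (the real type has more members), and JSON_NULL and encodeMeta are hypothetical names standing in for Prisma.JsonNull and for whatever the audit helper does internally.

```typescript
// Simplified stand-in for Prisma.InputJsonValue: every leaf must be JSON-typed.
type JsonInput =
  | string
  | number
  | boolean
  | JsonInput[]
  | { [key: string]: JsonInput | null };

// Stand-in for the Prisma.JsonNull sentinel. It is a runtime value, which is
// why a type-only import would not be enough in the real code.
const JSON_NULL = Symbol("JsonNull");

// Hypothetical helper: absent meta becomes the explicit null sentinel.
function encodeMeta(meta?: JsonInput): JsonInput | typeof JSON_NULL {
  return meta === undefined ? JSON_NULL : meta;
}

// This is the shape that was rejected: `unknown` leaves are not JSON-typed.
//   const loose: Record<string, unknown> = { reason: "customer request" };
//   const rejected: JsonInput = loose; // compile error

// A literal with JSON-typed leaves is accepted.
const typedMeta: JsonInput = { reason: "customer request", attempt: 2 };
```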

Open questions

  1. Customer-visible brand name (still pending)
  2. Currency
  3. Stripe account ownership
  4. Real domain
  5. NEW: AuditLog retention policy. Currently rows accumulate forever. For a compliance/legal conversation later, the practice will probably want a documented retention window (e.g. 7 years for healthcare-adjacent records). Defer.
  6. NEW: Rate-limit tightening — current 5/min/IP might be too generous for a busy practice; revisit once we have data.

Roadmap status

  • v1 feature-complete
  • Production prep done
  • CI in place
  • Reminder lead time configurable
  • Rate limiting + audit logging done 2026-05-05 (this session)

Outstanding gates before opening to real customers (operational, not code):

  1. Live Stripe verification in test mode
  2. Brand + policy decisions
  3. First production deploy

Code-wise, the remaining "next step" items are all genuinely optional polish:

  • Sentry / GlitchTip for prod error tracking
  • CSP / security headers via Caddy
  • Audit logging for working-hours / overrides / room-blocks if we want full coverage
  • AuditLog retention job (pg-boss recurring delete-older-than-X)
  • Push the docker image to a registry once we have one

This is a clean place to pause for operational / brand decisions. The code stack is genuinely production-ready for a soft launch.

If continuing on the code side, my pick: Sentry/GlitchTip as the last load-bearing prod-readiness item. Half-day. Everything else is nice-to-have.

How to resume

cd /Users/noise/Documents/code/touchbase
docker-compose up -d postgres mailpit
pnpm db:seed
pnpm dev

# Test rate limiting (returns 429 on 6th attempt within a minute):
TOKEN=$(curl -s -c /tmp/tb.txt http://localhost:3000/api/auth/csrf | jq -r .csrfToken)
for i in 1 2 3 4 5 6; do
  curl -s -b /tmp/tb.txt -X POST -d "csrfToken=$TOKEN&email=test$i@x.com" \
    -o /dev/null -w "%{http_code}\n" \
    http://localhost:3000/api/auth/signin/nodemailer
done

# Inspect audit log:
docker exec touchbase-postgres-1 psql -U touchbase -d touchbase_dev \
  -c 'SELECT action, "entityType", "entityId", ip, "createdAt" FROM "AuditLog" ORDER BY "createdAt" DESC LIMIT 20;'