Notes From Production

Engineering writing on the systems we actually run.

Long-form notes on AI orchestration, RAG architecture, scaling decisions, multi-tenant SaaS, and the operational discipline behind production systems. Written by the engineers building them.

RAG · Architecture

Production RAG architecture: what survives contact with real users.

A long-form walkthrough of how we design retrieval pipelines that hold up beyond the demo: chunking strategies, hybrid search, re-ranking, evaluation harnesses, and the cost discipline that separates a hobby project from a production system.

Shubham Wadhwa · 22 min read

Observability

Observability for LLM systems isn't APM.

Token economics, prompt drift, eval regressions, and tool-use failures don't show up in a Datadog dashboard. Here's the observability stack we ship with every AI system.

12 min read

Multi-Tenant SaaS

Tenant isolation when shared infra is the only option.

Row-level isolation, schema-per-tenant, and database-per-tenant aren't a continuum. They're different products. Picking the right one is the most consequential architectural decision in a SaaS.

16 min read

Infrastructure Scaling

Scaling Django past the easy plateau.

The specific patterns that keep a Django app honest as it crosses from "first scale issue" to "the platform our business runs on." Pooling, async tasks, query budgets, and what to outsource to Postgres.

18 min read

AI Orchestration

Multi-model routing without lock-in.

A pattern for routing requests across providers with consistent telemetry, predictable cost ceilings, and safe failover when one model dies on you mid-day.

14 min read