Engineering writing on the systems we actually run.
Long-form notes on AI orchestration, RAG architecture, scaling decisions, multi-tenant SaaS, and the operational discipline behind production systems. Written by the engineers building them.
Production RAG architecture: what survives contact with real users.
A long-form walkthrough of how we design retrieval pipelines that hold up beyond the demo: chunking strategies, hybrid search, re-ranking, evaluation harnesses, and the cost discipline that separates a hobby project from a production system.
Observability for LLM systems isn't APM.
Token economics, prompt drift, eval regressions, and tool-use failures don't show up in a Datadog dashboard. Here's the observability stack we ship with every AI system.
Tenant isolation when shared infra is the only option.
Row-level isolation, schema-per-tenant, and database-per-tenant aren't a continuum. They're different products. Picking the right one is among the most consequential architectural decisions in a SaaS product.
Scaling Django past the easy plateau.
The specific patterns that keep a Django app honest as it crosses from "first scale issue" to "the platform our business runs on." Pooling, async tasks, query budgets, and what to outsource to Postgres.
Multi-model routing without lock-in.
A pattern for routing requests across providers with consistent telemetry, predictable cost ceilings, and safe failover when one model dies on you mid-day.
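As a rough illustration of the routing idea, here is a minimal sketch, not the implementation from the article. It assumes each provider can be wrapped as a plain callable that takes a prompt and returns text, and that a flat per-call cost estimate is good enough for a budget check. The names `Provider`, `Router`, `cost_ceiling`, and `AllProvidersFailed` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


class AllProvidersFailed(Exception):
    """Raised when every provider errored or was skipped for budget reasons."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # hypothetical wrapper: prompt in, text out
    cost_per_call: float        # flat estimate; real pricing is per token


@dataclass
class Router:
    providers: List[Provider]   # tried in order, first entry is preferred
    cost_ceiling: float         # hard cap on total estimated spend
    spent: float = 0.0
    events: List[Dict] = field(default_factory=list)

    def _record(self, provider: str, status: str,
                cost: float = 0.0, error: Optional[str] = None) -> None:
        # Every attempt emits the same event shape, whatever the provider.
        self.events.append(
            {"provider": provider, "status": status, "cost": cost, "error": error}
        )

    def complete(self, prompt: str) -> str:
        for p in self.providers:
            # Skip any provider whose call would push spend past the ceiling.
            if self.spent + p.cost_per_call > self.cost_ceiling:
                self._record(p.name, "skipped_budget")
                continue
            try:
                text = p.call(prompt)
            except Exception as exc:
                # Fail over to the next provider. This sketch assumes
                # failed calls are not billed.
                self._record(p.name, "error", error=type(exc).__name__)
                continue
            self.spent += p.cost_per_call
            self._record(p.name, "ok", cost=p.cost_per_call)
            return text
        raise AllProvidersFailed("no provider could serve the request")
```

Keeping the event shape identical across providers is what makes the telemetry comparable, and checking the ceiling before each call is what keeps the cost bound predictable. A production version would likely add timeouts, per-token cost accounting, and retries with backoff.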