Research notes
Deep dives into RAG quality, vector search and inference performance.
2025-01-05
Evaluating LLM Reliability (Without the Drama)
A framework for measuring hallucinations, attribution quality and the regressions that actually matter in production.
2024-12-10
RAG Optimization Playbook
A practical, plain-language blueprint for improving retrieval quality, reranking and source attribution in RAG systems.
2024-11-03
Vector Search Tradeoffs for Production
How to choose between ANN libraries, index sizes and metadata filters when latency matters and the budget only pretends to be infinite.
2024-10-15
Scaling FastAPI for Inference
Concurrency patterns, batching strategies and caching to keep inference under 1s without setting the server on fire.
2024-09-02
RAG Guardrails That Do Not Ruin UX
Guardrails should reduce risk without making users wait 12 seconds for a refusal.