Research notes
Deep dives into RAG quality, vector search and inference performance.
2025-01-05
Evaluating LLM Reliability (Without the Drama)
A framework for measuring hallucinations, attribution quality and the regressions that actually matter in production.
2024-12-10
RAG Optimization Playbook
A practical, plain-language blueprint for improving retrieval quality, reranking and source attribution in RAG systems.
2024-11-03
Vector Search Tradeoffs for Production
How to choose between ANN libraries, index sizes and metadata filters when latency matters and the budget only pretends to be infinite.
2024-10-15
Scaling FastAPI for Inference
Concurrency patterns, batching strategies and caching to keep inference under 1s without setting the server on fire.
2024-09-02
RAG Guardrails That Do Not Ruin UX
Guardrails should reduce risk without making users wait 12 seconds for a refusal.