Jules Kourelakos

Eval-driven development

Anthropic - Demystifying evals for AI agents

(paper) Evaluation-Driven Development of LLM Agents: A Process Model and Reference Architecture