Jules Kourelakos

Eval-driven development

Created April 22, 2026 · Last updated April 22, 2026

Anthropic - Demystifying evals for AI agents

(paper) Evaluation-Driven Development of LLM Agents: A Process Model and Reference Architecture