What you will learn.
Evals are the single most-discussed and least-built piece of infrastructure in modern AI engineering. Everyone agrees they matter; almost no one wants to do the work of building them well. The result, across our industry, is a quiet epidemic of teams shipping model and prompt changes with no idea whether those changes helped.
This course is a focused, five-week treatment of evals and observability as engineering disciplines. We cover golden-set construction, adversarial set generation, pairwise model-graded evaluation, online judges, regression gating, drift detection, structured tracing across model boundaries, and the kinds of dashboards that survive contact with a skeptical executive. The capstone is a real, running eval suite plus an observability surface, delivered against a system of your choice.
This is not a course about LLM theory. It is a course about the unromantic but high-leverage work that distinguishes teams that improve from teams that churn.
