T5 · Quality & evals · learntobuild.ai

AI output is non-deterministic — the same prompt can be great today and broken tomorrow. Anyone can ship an AI feature; the people who get paid can prove it works and catch it when it regresses. That’s this track.

What you’ll learn

Why AI output needs testing — the difference between “it worked when I tried it” and “it works.” (LLM evaluation, plainly)
Writing evals — turning “is this good?” into a score you can track. (Evaluating AI Agents)
CI gates — wiring evals into GitHub Actions so a bad prompt turns the build red before it ships.
Reviewing agent code — reading a diff critically, spotting the plausible-but-wrong.
Cost & context management — keeping the loop fed without burning tokens or drowning in stale context.
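
A minimal sketch of what "turning 'is this good?' into a score" can look like, assuming the simplest possible grader: check whether the output contains the phrases a correct answer must mention. The names `EvalCase`, `score_output`, and `run_suite` are hypothetical and used only for illustration. A real suite would likely use richer graders.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test case: a prompt plus phrases a good answer must contain (hypothetical shape)."""
    prompt: str
    must_include: list[str]


def score_output(output: str, case: EvalCase) -> float:
    """Return the fraction of required phrases found in the output (case-insensitive)."""
    if not case.must_include:
        return 1.0
    text = output.lower()
    hits = sum(1 for phrase in case.must_include if phrase.lower() in text)
    return hits / len(case.must_include)


def run_suite(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Average score across all cases; `generate` stands in for the model call."""
    scores = [score_output(generate(case.prompt), case) for case in cases]
    return sum(scores) / len(scores)
```

Because the result is a single number, it can be logged per commit and compared over time, which is what makes a regression visible.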

The build

Add an eval suite and a CI gate to an AI feature (project 11). Done means a deliberately bad prompt fails your automated check.
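
The "done" condition can be sketched as a gate function that returns a process exit code, assuming a pass-rate threshold of 0.8 and a stubbed model. Everything here (`fake_model`, `CASES`, `THRESHOLD`, both prompts) is a made-up stand-in, not the project's actual code.

```python
import sys

THRESHOLD = 0.8  # assumed minimum pass rate; pick your own

GOOD_PROMPT = "You are a support bot. Follow the refund policy: refunds within 30 days."
BAD_PROMPT = "Be brief."  # deliberately bad: drops the policy entirely

# Hypothetical eval cases: (user question, phrases the answer must contain).
CASES = [("Can I get my money back?", ["refund", "30 days"])]


def fake_model(system_prompt: str, question: str) -> str:
    """Stand-in for a real model call; it answers well only if the prompt carries the policy."""
    if "refund policy" in system_prompt.lower():
        return "You can get a refund within 30 days."
    return "Sorry, I can't help with that."


def pass_rate(system_prompt: str) -> float:
    """Fraction of cases whose answer contains every required phrase."""
    passed = 0
    for question, required in CASES:
        answer = fake_model(system_prompt, question).lower()
        if all(phrase in answer for phrase in required):
            passed += 1
    return passed / len(CASES)


def gate(system_prompt: str) -> int:
    """Return 0 when the prompt clears the threshold, 1 otherwise."""
    rate = pass_rate(system_prompt)
    print(f"pass rate: {rate:.2f} (threshold {THRESHOLD})")
    return 0 if rate >= THRESHOLD else 1


def main() -> None:
    # A CI step would run this; a non-zero exit code fails the step.
    sys.exit(gate(GOOD_PROMPT))
```

With these stubs, `gate(GOOD_PROMPT)` returns 0 and `gate(BAD_PROMPT)` returns 1, so swapping in the bad prompt is enough to turn the build red.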

This is the track that separates hobby projects from production work. Lessons are in progress and will go out to the newsletter first.
