Quality & evals
How to know your AI is actually good — and stays good. Write evals, gate them in CI, review agent-written code, and keep cost and context under control. The senior skill.
AI output is non-deterministic — the same prompt can be great today and broken tomorrow. Anyone can ship an AI feature; the people who get paid can prove it works and catch it when it regresses. That’s this track.
What you’ll learn
- Why AI output needs testing — the difference between “it worked when I tried it” and “it works.” (LLM evaluation, plainly)
- Writing evals — turning “is this good?” into a score you can track. (Evaluating AI Agents)
- CI gates — wiring evals into GitHub Actions so a bad prompt turns the build red before it ships.
- Reviewing agent code — reading a diff critically, spotting the plausible-but-wrong.
- Cost & context management — keeping the loop fed without burning tokens or drowning in stale context.
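To make "a score you can track" concrete, here is a minimal sketch of an eval in Python. Everything named in it is an assumption for illustration: the `CASES` list, the substring check, and the `good_model` and `vague_model` stubs stand in for a real dataset, a real grader, and a real LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical eval cases: (input, substring the output must contain).
CASES: List[Tuple[str, str]] = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("Spell 'cat' backwards.", "tac"),
]


def run_eval(model: Callable[[str], str],
             cases: List[Tuple[str, str]] = CASES) -> float:
    """Return the fraction of cases whose output contains the expected text."""
    passed = sum(
        1 for question, expected in cases
        if expected.lower() in model(question).lower()
    )
    return passed / len(cases)


def good_model(question: str) -> str:
    """Stub standing in for a real LLM call (assumption, not a real API)."""
    answers = {
        "What is 2 + 2?": "The answer is 4.",
        "What is the capital of France?": "Paris is the capital.",
        "Spell 'cat' backwards.": "tac",
    }
    return answers.get(question, "")


def vague_model(question: str) -> str:
    """Stub for a regressed model that no longer answers the question."""
    return "I'm not sure."


good_score = run_eval(good_model)    # every case passes
vague_score = run_eval(vague_model)  # no case passes
print(f"good: {good_score:.2f}, vague: {vague_score:.2f}")
```

A substring check is the crudest grader there is; the point is only that the result is a number you can log per commit and compare over time.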
The build
Add an eval suite and a CI gate to an AI feature (project 11). Done means a deliberately bad prompt fails your automated check.
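One way the "done" condition could look, as a sketch under stated assumptions: `fake_model`, `GOOD_PROMPT`, `BAD_PROMPT`, and the JSON format check are all hypothetical stand-ins, and a real project would call its actual model and grader instead. The function returns a process exit code, which is what a CI step keys on.

```python
import json
from typing import Callable, List

# Hypothetical prompts. The bad one is deliberately missing the format rule.
GOOD_PROMPT = 'Reply with JSON like {"answer": "..."}.'
BAD_PROMPT = "Reply however you like."

QUESTIONS: List[str] = ["What is 2 + 2?", "Name a primary color."]


def fake_model(system_prompt: str, question: str) -> str:
    """Stand-in for a real LLM call: it only emits JSON when the prompt asks."""
    if "JSON" in system_prompt:
        return json.dumps({"answer": f"stub answer to: {question}"})
    return f"Sure! Here is my answer to: {question}"


def is_valid(output: str) -> bool:
    """Check that the output parses as a JSON object with an 'answer' key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "answer" in data


def check_prompt(system_prompt: str,
                 model: Callable[[str, str], str] = fake_model,
                 threshold: float = 1.0) -> int:
    """Return 0 if the pass rate meets the threshold, 1 otherwise."""
    passed = sum(is_valid(model(system_prompt, q)) for q in QUESTIONS)
    score = passed / len(QUESTIONS)
    return 0 if score >= threshold else 1


good_exit = check_prompt(GOOD_PROMPT)  # 0: the check passes
bad_exit = check_prompt(BAD_PROMPT)    # 1: the check fails
```

In a CI script you would finish with `sys.exit(check_prompt(...))`. GitHub Actions marks a step as failed when its command exits with a non-zero code, so the bad prompt turns the check red.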
The track that separates hobby from production. Lessons in progress — newsletter first.
Eval suite + CI gate on an AI feature
done → A bad prompt turns your GitHub Actions check red.