T5

Quality & evals

How to know your AI feature actually works, and keeps working. Write evals, gate them in CI, review agent-written code, and keep cost and context under control. The senior skill.

Claude Code · Codex

AI output is non-deterministic: the same prompt can produce a great answer on one run and a broken one on the next, and a model or prompt change can quietly shift behavior. Anyone can ship an AI feature; the people who get paid can prove it works and catch it when it regresses. That’s this track.

What you’ll learn

  • Why AI output needs testing — the difference between “it worked when I tried it” and “it works.” (LLM evaluation, plainly)
  • Writing evals — turning “is this good?” into a score you can track. (Evaluating AI Agents)
  • CI gates — wiring evals into GitHub Actions so a bad prompt turns the build red before it ships.
  • Reviewing agent code — reading a diff critically, spotting the plausible-but-wrong.
  • Cost & context management — keeping the loop fed without burning tokens or drowning in stale context.
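A minimal sketch of turning “is this good?” into a score, assuming deterministic checks (a required phrase, a word budget) are enough for the feature. `EvalCase`, `run_evals`, `contains`, `max_words`, and the stub `fake_summarizer` are hypothetical names for illustration, not a library API.

```python
from dataclasses import dataclass
from typing import Callable

# A check takes the model's output and returns True if it is acceptable.
Check = Callable[[str], bool]


@dataclass
class EvalCase:
    """One test input plus the checks its output must pass."""
    name: str
    user_input: str
    checks: list[Check]


def contains(phrase: str) -> Check:
    """Pass if the output mentions a required phrase (case-insensitive)."""
    return lambda output: phrase.lower() in output.lower()


def max_words(limit: int) -> Check:
    """Pass if the output stays within a word budget."""
    return lambda output: len(output.split()) <= limit


def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through `generate`; return the fraction that pass all checks."""
    passed = 0
    for case in cases:
        output = generate(case.user_input)
        ok = all(check(output) for check in case.checks)
        print(f"{'PASS' if ok else 'FAIL'}  {case.name}")
        passed += int(ok)
    return passed / len(cases)


# Hypothetical stand-in for the real AI feature; replace with your model call.
def fake_summarizer(text: str) -> str:
    return "Refund approved. The customer will be refunded within 5 days."


CASES = [
    EvalCase("mentions the refund", "Summarize: customer wants a refund.",
             [contains("refund"), max_words(40)]),
    EvalCase("stays under 8 words", "Summarize: long complaint.",
             [max_words(8)]),
]

if __name__ == "__main__":
    print(f"score: {run_evals(fake_summarizer, CASES):.0%}")
```

Run as a script, it prints PASS for the first case, FAIL for the second (the stub’s 10-word reply is over budget), and `score: 50%`. Fuzzier criteria such as tone or faithfulness usually need a rubric or an LLM grader on top of checks like these.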

The build

Add an eval suite and a CI gate to an AI feature (project 11). Done means a deliberately bad prompt fails your automated check.
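A minimal sketch of the gate, assuming the eval already reduces to a single score between 0 and 1. GitHub Actions fails a step when its command exits with a non-zero status, so the gate only has to turn the score into an exit code. `PASS_THRESHOLD`, `score`, `gate`, and both stub models are hypothetical; the deliberately bad stub plays the role of the bad prompt the build should catch.

```python
from typing import Callable

# Hypothetical pass bar; set it from your feature's current baseline score.
PASS_THRESHOLD = 0.9

# Each case pairs a user message with a phrase a good reply must contain.
CASES = [
    ("Where is my order #123?", "order"),
    ("I want a refund for my headphones.", "refund"),
    ("Cancel my subscription, please.", "cancel"),
]


def score(generate: Callable[[str], str]) -> float:
    """Fraction of cases whose reply contains the required phrase."""
    hits = sum(phrase in generate(message).lower() for message, phrase in CASES)
    return hits / len(CASES)


def gate(generate: Callable[[str], str], threshold: float = PASS_THRESHOLD) -> int:
    """Return an exit code: 0 if the score clears the threshold, 1 if it does not."""
    result = score(generate)
    print(f"eval score {result:.2f} (threshold {threshold:.2f})")
    return 0 if result >= threshold else 1


# Hypothetical stand-ins for the feature with a good prompt and with a bad one.
def good_prompt_model(message: str) -> str:
    return f"Happy to help. About your request: {message}"


def bad_prompt_model(message: str) -> str:
    return "As an AI, I cannot help with that."


# In CI, end the script with: sys.exit(gate(real_model))
# so a failing score exits non-zero and the Actions step goes red.
```

Here `gate(good_prompt_model)` returns 0 and `gate(bad_prompt_model)` returns 1. A fixed threshold is the simplest design; comparing against the last score recorded on the main branch can catch smaller regressions, at the cost of storing that baseline somewhere.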

The track that separates hobby from production. Lessons are in progress; newsletter subscribers get them first.

Build these in this track
11 · Advanced

Eval suite + CI gate on an AI feature

done → A bad prompt turns your GitHub Actions check red.

Free · the newsletter

Get the build logs + the Mirror System

One email a week: a real build broken down, plus a working reference agent you can clone. No fluff, unsubscribe anytime.
