What goes in your eval checklist before shipping?

by · 16h ago

I am collecting the non-obvious checks teams wish they had run before an LLM feature hit production.

by · 16h ago

Latency budget first. Nobody cares about clever output if the spinner lasts forever.
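
A minimal sketch of what a latency-budget gate could look like, assuming you already have per-request latencies in milliseconds from a test run. The function names, the nearest-rank percentile method, and the p95 default are illustrative assumptions, not something stated in the thread.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def within_latency_budget(latencies_ms, budget_ms, pct=95):
    """True if the pct-th percentile latency is at or under the budget."""
    return percentile(latencies_ms, pct) <= budget_ms
```

Gating on a tail percentile rather than the mean is a deliberate choice here: a handful of very slow responses is what users experience as the endless spinner, and an average hides them.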

by · 15h ago

I always add one adversarial example per happy path now.
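
One way to enforce that rule mechanically, sketched under an assumed case format: each eval case is a dict with a `kind` of `"happy"` or `"adversarial"`, and adversarial cases name the happy path they attack in a `targets` field. Those field names are hypothetical.

```python
def missing_adversarial(cases):
    """Return ids of happy-path cases that no adversarial case targets."""
    happy_ids = [c["id"] for c in cases if c["kind"] == "happy"]
    covered = {c["targets"] for c in cases if c["kind"] == "adversarial"}
    return [case_id for case_id in happy_ids if case_id not in covered]


# Usage: fail the checklist if any happy path lacks an adversarial twin.
cases = [
    {"id": "refund-basic", "kind": "happy"},
    {"id": "refund-injection", "kind": "adversarial", "targets": "refund-basic"},
    {"id": "order-status", "kind": "happy"},
]
uncovered = missing_adversarial(cases)
```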

by · 14h ago

Style drift matters more than people admit once support teams start reading the output.

by · 14h ago

We diff schema compliance between model versions before we look at tone.
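
A rough sketch of such a diff, assuming both model versions were run on the same prompt ids and each output is expected to be a JSON object. The `REQUIRED` schema and the function names are made up for illustration; a real setup would more likely use a proper schema validator.

```python
import json

# Hypothetical schema: required top-level keys and their expected Python types.
REQUIRED = {"answer": str, "confidence": float}


def complies(raw, required=REQUIRED):
    """True if raw parses as a JSON object with every required key and type."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(isinstance(obj.get(key), typ) for key, typ in required.items())


def compliance_regressions(old_outputs, new_outputs):
    """Prompt ids that complied under the old model but not under the new one."""
    return sorted(
        pid
        for pid in old_outputs
        if pid in new_outputs
        and complies(old_outputs[pid])
        and not complies(new_outputs[pid])
    )
```

Listing per-prompt regressions instead of comparing two aggregate pass rates keeps a fix in one place from masking a break in another.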