What goes in your eval checklist before shipping?
by 16h ago
I am collecting the non-obvious checks teams wish they had run before an LLM feature hit production.
by · 16h ago
Latency budget first. Nobody cares about clever output if the spinner lasts forever.
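
A minimal sketch of that check, assuming you already have per-request latencies in milliseconds from a replay of the eval set. The function names, the 95th-percentile default, and the budget value are illustrative assumptions, not something from a specific tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    # Nearest-rank method; pct is expected to be an integer in 1..100.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_budget(latencies_ms, budget_ms, pct=95):
    """True if the chosen percentile latency fits inside the budget (both in ms)."""
    return percentile(latencies_ms, pct) <= budget_ms
```

Gating on a tail percentile rather than the mean is a design choice here: the mean can look fine while the slowest requests are the ones users actually notice.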
by · 15h ago
I always add one adversarial example per happy path now.
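
One way to enforce that rule mechanically, sketched under the assumption that each eval case is a dict with an `id`, a `kind` (`"happy"` or `"adversarial"`), and, for adversarial cases, a `pairs_with` field naming the happy path it attacks. That layout is hypothetical; adapt it to whatever your eval set actually stores.

```python
def missing_adversarial(cases):
    """Return ids of happy-path cases that no adversarial case is paired with."""
    # Collect every happy-path id that at least one adversarial case targets.
    covered = {
        case.get("pairs_with")
        for case in cases
        if case["kind"] == "adversarial"
    }
    return [
        case["id"]
        for case in cases
        if case["kind"] == "happy" and case["id"] not in covered
    ]
```

Running this in CI and failing on a non-empty result keeps the pairing from eroding as new happy paths get added.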
by · 14h ago
Style drift matters more than people admit once support teams start reading the output.
by · 14h ago
We diff schema compliance between model versions before we look at tone.
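
A rough sketch of such a diff, assuming the feature is supposed to return a JSON object and that outputs from both model versions are stored as raw strings keyed by eval case id. The `REQUIRED` schema and the function names are made up for illustration; a real setup would likely use a proper schema validator instead of this hand-rolled check.

```python
import json

# Hypothetical output schema: required key -> expected Python type.
# Note: a JSON integer such as 1 parses as int and would fail the float check.
REQUIRED = {"intent": str, "confidence": float}


def complies(raw):
    """True if raw parses as a JSON object with every required key of the right type."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(isinstance(obj.get(key), typ) for key, typ in REQUIRED.items())


def schema_regressions(old_outputs, new_outputs):
    """Case ids whose output complied under the old version but not under the new one."""
    return sorted(
        case_id
        for case_id, old_raw in old_outputs.items()
        # A case missing from new_outputs is treated as non-compliant.
        if complies(old_raw) and not complies(new_outputs.get(case_id, ""))
    )
```

Reporting per-case regressions rather than only an aggregate compliance rate shows which prompts broke, even when fixes elsewhere leave the overall number unchanged.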