Prompt Testing Vocabulary

Prompt regression testing, prompt versioning, A/B prompt comparison, golden datasets, eval harnesses, PromptFoo, and Braintrust vocabulary.

Key vocabulary

  • Prompt regression testing — running a fixed set of test cases against a prompt after every change, to verify that new edits do not break previously working behaviour.
  • Prompt versioning — tracking prompt changes with version identifiers (v1, v2…) so you can reproduce results and roll back if quality degrades.
  • Golden dataset — a curated set of inputs with known correct outputs used as the ground truth for evaluating prompts.
  • Eval harness — the infrastructure (code + datasets + metrics) that runs evaluations automatically and reports results.
  • A/B prompt comparison — running two prompt variants on the same inputs and comparing outputs to determine which performs better.
0 / 5 completed
1 / 5
A team runs their full prompt test suite after every PR that modifies a system prompt. This practice is called: