Advanced Vocabulary #ai-llm #developer-tools #data-science-ml

LLM Eval Benchmarking Vocabulary

Practice the vocabulary of scoring a model release against a repeatable evaluation suite.

1 / 5
At standup, a dev mentions running a fixed set of representative prompts through a model release and scoring its outputs against a rubric before shipping it to users. What is this practice called?