Agentic Evaluation Vocabulary

Practice vocabulary for evaluating agentic AI systems: trajectory evaluation, task completion rate, tool call accuracy, and benchmarks.


1 / 5

A researcher says 'We use agent trajectory evaluation.' What does 'trajectory' refer to in this context?

2 / 5

Your team reports 'The agent completed the task in 8 steps vs. the expected 5.' Why does step count matter in agentic evaluation?
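To make the numbers in this question concrete, here is a minimal sketch of one way a step count can be turned into a score. The function name `step_efficiency` and the ratio itself are illustrative assumptions, not a standard metric defined by this quiz or by any particular evaluation framework.

```python
def step_efficiency(expected_steps: int, actual_steps: int) -> float:
    """Hypothetical metric: ratio of expected to actual steps.

    1.0 means the agent matched the reference path length;
    values below 1.0 mean it took more steps than expected.
    """
    if expected_steps <= 0 or actual_steps <= 0:
        raise ValueError("step counts must be positive")
    return expected_steps / actual_steps


# Figures from the question: 8 actual steps vs. 5 expected.
ratio = step_efficiency(expected_steps=5, actual_steps=8)
extra_steps = 8 - 5
print(f"efficiency={ratio:.3f}, extra steps={extra_steps}")
# prints "efficiency=0.625, extra steps=3"
```

Each extra step typically means additional model calls or tool invocations, so a ratio like this is usually read alongside cost and latency rather than on its own.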

3 / 5

An evaluation report shows 'tool call accuracy: 78%.' What does this metric measure?
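As a rough illustration of how a figure like this might be computed, the sketch below scores an agent's tool calls against a reference sequence. The position-by-position exact-match rule, the `(tool_name, arguments)` representation, and the tool names are all assumptions made for this example; real evaluation suites may define the metric differently (for instance, scoring tool choice and arguments separately).

```python
def tool_call_accuracy(predicted, expected):
    """Fraction of calls that match the reference, compared position by position.

    Each call is a (tool_name, arguments_dict) pair. Missing or surplus
    calls count as errors because we divide by the longer sequence.
    """
    if not expected:
        raise ValueError("expected sequence must not be empty")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / max(len(predicted), len(expected))


# Hypothetical reference and agent calls; the tool names are made up.
expected = [
    ("search_flights", {"origin": "SFO", "dest": "JFK"}),
    ("get_price", {"flight_id": "F1"}),
    ("book_flight", {"flight_id": "F1"}),
]
predicted = [
    ("search_flights", {"origin": "SFO", "dest": "JFK"}),
    ("get_price", {"flight_id": "F2"}),  # right tool, wrong argument
    ("book_flight", {"flight_id": "F1"}),
]

accuracy = tool_call_accuracy(predicted, expected)
print(f"tool call accuracy: {accuracy:.0%}")
```

Under this definition, a reported 78% would mean that 78 of every 100 scored calls used the expected tool with the expected arguments.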

4 / 5

A colleague mentions 'The agent's reasoning trace shows it misidentified the goal.' What is a reasoning trace?

5 / 5

Your team says 'We benchmark the agent on SWE-bench.' What type of benchmark is this?