AI Leaderboard & Ranking Vocabulary

LMSYS Chatbot Arena, Elo ratings, HELM, Open LLM Leaderboard, contamination, and benchmark gaming concerns.

Key vocabulary

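  • LMSYS Chatbot Arena — a crowdsourced leaderboard where a user submits a prompt, sees responses from two anonymous models side by side, and votes for the better one (or a tie); the votes are aggregated into Elo-style ratings.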
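  • Elo rating — a score updated after each pairwise comparison according to how surprising the result was: beating a higher-rated opponent earns more points than beating a lower-rated one. A higher Elo means a higher expected score against any given opponent; under the standard formula, a 400-point gap corresponds to an expected score of about 0.91 for the stronger side.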
  • Contamination — when benchmark test data (questions, answers, or close paraphrases) appears in a model’s training set, so its score partly reflects memorization rather than genuine capability.
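  • Benchmark gaming — optimizing specifically for leaderboard metrics (for example, training on benchmark-style data or tuning outputs to match what raters tend to prefer) without a matching improvement in real-world capability.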
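  • HELM (Holistic Evaluation of Language Models) — a benchmark framework from Stanford CRFM that evaluates models across many scenarios and reports several metrics for each, such as accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency.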
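The Elo entry above can be made concrete with the classic online Elo update. This is a minimal sketch, not the exact procedure any particular leaderboard uses (published leaderboards may instead fit a statistical model over all votes at once). The starting rating of 1000, the K-factor of 32, and the model names are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k (assumed 32 here) controls how far a single result moves the ratings.
    """
    exp_a = expected_score(rating_a, rating_b)
    # Surprising results (score far from expectation) move ratings more.
    delta = k * (score_a - exp_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_battles(battles, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Process (model_a, model_b, score_a) votes in order; names are hypothetical."""
    ratings: dict = {}
    for a, b, score_a in battles:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = elo_update(ra, rb, score_a, k)
    return ratings


# Two equally rated models: the winner gains 16 points, the loser drops 16.
print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because each update depends only on the current ratings, the result is order-dependent; this is one reason large leaderboards may prefer batch fitting over replaying votes one by one.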
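One common heuristic for detecting the contamination described above is to flag test items that share a long word n-gram with the training corpus. The sketch below assumes simple regex word tokenization and an n-gram length of 8; both are illustrative choices. Exact-match overlap like this misses paraphrased or translated leaks, so a low rate is not proof that a benchmark is clean.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of text (regex tokenization is an assumption)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(training_docs, n: int) -> set[tuple[str, ...]]:
    """Union of all n-grams appearing anywhere in the training documents."""
    index: set[tuple[str, ...]] = set()
    for doc in training_docs:
        index |= ngrams(doc, n)
    return index


def contamination_rate(test_items, training_docs, n: int = 8) -> float:
    """Fraction of test items sharing at least one n-gram with the training docs."""
    if not test_items:
        return 0.0
    index = build_index(training_docs, n)
    flagged = [item for item in test_items if ngrams(item, n) & index]
    return len(flagged) / len(test_items)


training = ["the quick brown fox jumps over the lazy dog near the river bank"]
tests = ["Q: the quick brown fox jumps over the lazy dog near what?",
         "What is the capital of France?"]
print(contamination_rate(tests, training))  # 0.5
```

Items shorter than n words produce no n-grams and are never flagged, so the choice of n trades false positives (small n) against missed leaks (large n).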