Reading Leaderboard Results — Comprehension Exercises
Read the leaderboard entry description below, then answer comprehension questions about Elo ratings, ranking methodology, and result interpretation.
📄 PASSAGE — Read carefully before answering
OpenChat Leaderboard — Methodology and Current Rankings (Excerpt)
Rankings are determined by Elo rating, a scoring system borrowed from competitive chess. When two models face each other in a "battle" (both respond to the same prompt, and human raters choose a winner), the winning model gains Elo points and the losing model loses them. The number of points exchanged depends on the rating gap: beating a higher-rated opponent earns more than beating a lower-rated one. A higher Elo score therefore reflects more head-to-head wins, weighted by the strength of the opponents faced.
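The point exchange described above can be sketched in a few lines. The passage does not give the leaderboard's actual formula, so this is only an illustration using the classic chess Elo update. The scale of 400, the K-factor of 32, and the function names `expected_score` and `update` are all assumptions, not details from the leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the classic Elo logistic model.

    The 400-point scale is the chess convention (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    k is a hypothetical K-factor; the passage does not specify one.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)  # points the winner takes from the loser
    return rating_a + delta, rating_b - delta


# Ratings taken from the passage's top two entries.
solaris, meridian = 1312.0, 1284.0
print(f"P(Solaris beats Meridian) = {expected_score(solaris, meridian):.3f}")

# An upset (the lower-rated model wins) moves more points than an expected win.
print(update(solaris, meridian, a_won=True))
print(update(solaris, meridian, a_won=False))
```

Under these assumptions, a 28-point gap corresponds to a win probability only slightly above one half, so a lead of that size is modest.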
Current top entries: Solaris-34B leads with an Elo of 1,312 (±18), based on 14,200 battles. Meridian-13B holds second place at 1,284 (±22), with 9,600 battles. Apex-7B is ranked third at 1,241 (±31), with 4,800 battles.
Models with fewer battles have wider confidence intervals, meaning their true rating, and therefore their rank, is less certain. The leaderboard notes contamination concerns for two models not shown here: their training data may have included prompts used in the evaluation set, which would inflate their apparent scores.
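One rough way to read the intervals is to check whether they overlap. The sketch below uses the three entries from the passage and assumes each ± value is the symmetric half-width of a confidence interval; the passage does not state the confidence level. Overlap is only a heuristic, not a formal significance test, and the helper names `interval` and `overlaps` are illustrative.

```python
from itertools import combinations

# (Elo rating, ± half-width) as reported in the passage.
entries = {
    "Solaris-34B": (1312, 18),
    "Meridian-13B": (1284, 22),
    "Apex-7B": (1241, 31),
}


def interval(rating: float, half_width: float) -> tuple:
    """Lower and upper bounds, assuming a symmetric interval."""
    return rating - half_width, rating + half_width


def overlaps(a: tuple, b: tuple) -> bool:
    """True if the two (rating, half_width) intervals share any point."""
    lo_a, hi_a = interval(*a)
    lo_b, hi_b = interval(*b)
    return lo_a <= hi_b and lo_b <= hi_a


for (name_a, a), (name_b, b) in combinations(entries.items(), 2):
    status = "overlap" if overlaps(a, b) else "do not overlap"
    print(f"{name_a} vs {name_b}: intervals {status}")
```

Read this way, adjacent ranks overlap while first and third place do not, so the order of neighbouring entries is less settled than the order of the top and bottom entries. The reported half-widths also shrink roughly in proportion to one over the square root of the battle count, which fits the statement that more battles give narrower intervals.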
Rankings should be interpreted as community preference signals, not definitive capability measurements. Voter demographics, prompt selection, and language distribution all influence outcomes. The leaderboard is updated weekly.
Question 1 of 4