Alignment Benchmarks & Evaluation — Vocabulary

5 exercises. Learn vocabulary for alignment evaluation: sycophancy, sandbagging, TruthfulQA, and the HHH (helpful, honest, harmless) framework.

Exercise 1 of 5
A model consistently agrees with the user's stated position even when it is factually wrong. This behaviour is called: