Advanced AI Alignment & Safety
Tags: Safety, Interpretability, Alignment

AI Safety Properties — Vocabulary

5 exercises. Learn the vocabulary of AI safety properties: corrigibility, scalable oversight, interpretability, and deceptive alignment.

Exercise 1 of 5
A corrigible AI is one that: