Corrigibility Vocabulary

5 exercises. Learn the vocabulary of AI corrigibility: corrigible AI, instrumental convergence, resistance to shutdown, and the tension between capability and human control.

1 / 5

What does it mean for an AI system to be corrigible?

2 / 5

A researcher says: "Advanced capabilities and corrigibility may be in tension." What tension are they describing?

3 / 5

What is instrumental convergence in the context of AI safety?

4 / 5

In the sentence "The agent is corrigible if it doesn't resist human control", what is the key behavioural implication?

5 / 5

Why is a fully corrigible AI also considered potentially dangerous by some alignment researchers?