Advanced AI Alignment & Safety Red-teamingSafetyAdversarial

AI Red-Teaming — Vocabulary

5 exercises — Learn the vocabulary of AI red-teaming: jailbreaks, prompt injection, adversarial probing, and safety evaluation.

0 / 5 completed

1 / 5

In AI safety, red-teaming refers to:

2 / 5

A user crafts a prompt that causes the model to ignore its system instructions and reveal confidential data. This is called:

3 / 5

A jailbreak in the context of AI models is:

4 / 5

The team reports overrefusal as a problem with the new model. What does this mean?

5 / 5

Which sentence correctly uses adversarial probing in context?