AdvancedVocabulary#ai-llm#security#developer-tools

AI Red Teaming Vocabulary

Build fluency in the vocabulary of proactively testing a model for a harmful or exploitable response.

0 / 5 completed

1 / 5

At standup, a dev mentions deliberately trying to manipulate a model into producing a harmful or policy-violating response before that model ever reaches real users. What is this practice called?

2 / 5

During a design review, the team wants red teamers to systematically try a known category of attack, like an indirect prompt injection hidden inside retrieved content, rather than testing only obvious, direct attempts. Which capability supports this?

3 / 5

In a code review, a dev notices a discovered exploit is documented with the exact prompt that triggered it and tracked until a fix is verified to close that specific gap. What does this represent?

4 / 5

An incident report shows a known jailbreak technique from a prior red-teaming exercise resurfaced in production months later because the earlier finding was never actually verified as fixed. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team invests in structured AI red teaming instead of just relying on real user reports to surface a harmful model response. What is the reasoning?