5 exercises: practice the language used in AI safety reports, capability disclosures, and risk-tier communication.
1 / 5
A safety team says: "This feature imposes an alignment tax." What do they mean?
The 'alignment tax' is the practical cost of making a model safer or better aligned: for example, refusals that reduce helpfulness, extra inference latency from safety classifiers, or lower scores on capability benchmarks. It is a real trade-off that AI teams must weigh against the safety benefit.
2 / 5
A researcher discovers a jailbreak in a deployed model. Following responsible disclosure practice, they should:
Responsible disclosure in AI mirrors cybersecurity practice: the discoverer first notifies the developer privately, giving them time to patch or mitigate the issue before the vulnerability becomes public knowledge.
3 / 5
Which sentence correctly uses 'dual-use' in an AI context?
In AI safety, 'dual-use' describes a capability that can serve beneficial purposes (e.g., biosecurity research) but could also be misused (e.g., bioweapon design). Dual-use capabilities require extra scrutiny before deployment.
4 / 5
When a capability is classified in a high risk tier, this means:
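Risk tiers classify AI capabilities by their potential for harm (e.g., CBRN, meaning chemical, biological, radiological, and nuclear, risks; cyberoffense; CSAM) and determine which mitigations must be in place before a model with that capability is deployed or made accessible. The higher the tier, the stricter the required safeguards.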
5 / 5
Fill in the blank: "We identified a jailbreak vector and are implementing a ___ in the next model version."
A 'mitigation' is the standard term for a countermeasure that addresses an identified vulnerability: for example, a classifier that catches the jailbreak pattern, a prompt prefix that reinforces safety instructions, or a new RLHF round targeting the behaviour.