#red-teaming
2 articles tagged #red-teaming
All English for IT articles related to #red-teaming.
-
AI Safety English: Vocabulary for Alignment, Red-Teaming, and Safety Evaluation
Alignment, corrigibility, RLHF, reward hacking, jailbreak — the precise English vocabulary AI safety researchers and LLM engineers use in safety reviews and evaluations.
-
English for ML Security Engineers: Adversarial Attacks, Poisoning, and Model Integrity
Learn the English vocabulary and natural discussion phrases that ML security engineers use when talking about adversarial examples, data poisoning, and model red-teaming.