#alignment
2 articles tagged #alignment
All English for IT articles related to #alignment.
- AI Safety English: Vocabulary for Alignment, Red-Teaming, and Safety Evaluation
  Alignment, corrigibility, RLHF, reward hacking, jailbreak: the precise English vocabulary AI safety researchers and LLM engineers use in safety reviews and evaluations.
- Vocabulary for AI Safety Engineers
  Essential English vocabulary for AI safety engineers: red-teaming, adversarial prompts, hallucination, guardrails, alignment, RLHF, and constitutional AI explained.