RLHF & Preference Learning — Vocabulary

5 exercises — Learn the key vocabulary of RLHF: reward models, PPO, DPO, Constitutional AI, and preference labeling.

1 / 5
What is a reward model in an RLHF pipeline?