5 exercises — Learn scalable oversight vocabulary: debate, amplification, iterated amplification, and how humans supervise AI they cannot fully verify.
1 / 5
What is the core problem that scalable oversight research tries to solve?
Scalable oversight addresses the challenge that as AI systems become more capable, humans may no longer be able to reliably judge whether AI outputs are correct, safe, or aligned. The goal is to develop techniques that extend human supervisory ability even when the AI surpasses human expertise in the task domain.
2 / 5
In AI debate (a scalable oversight technique), what is the key assumption?
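The debate approach (proposed by Irving et al. at OpenAI, 2018) assumes that judging a debate is easier than answering the question directly: a non-expert human judge can more reliably identify which of two opposing AI arguments is correct than produce the correct answer unaided. Two AI agents argue opposing sides, and the human picks the argument that seems most truthful. The approach rests on the hope that, in such a debate, it is harder to lie convincingly than to expose a lie.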
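To make the judging assumption concrete, here is a toy sketch in Python. It is an illustration only, not the protocol from Irving et al.: the task (summing a list), the names `honest_claim`, `lying_claim`, and `judge_debate`, and the liar's strategy are all assumptions made for this example. The judge never adds up the whole list. It only checks that each debater's sub-claims add up, follows the disagreement into smaller ranges, and finally verifies a single element.

```python
def honest_claim(data, lo, hi):
    """Honest debater: reports the true sum of data[lo:hi]."""
    return sum(data[lo:hi])


def lying_claim(data, lo, hi, bad_index=3, offset=10):
    """Hypothetical dishonest debater: inflates any range containing bad_index."""
    return sum(data[lo:hi]) + (offset if lo <= bad_index < hi else 0)


def judge_debate(data, claim_a, claim_b):
    """Return 'A', 'B', or 'tie'. The judge verifies only one element directly."""
    lo, hi = 0, len(data)
    if claim_a(data, lo, hi) == claim_b(data, lo, hi):
        return "tie"  # nothing to debate
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Cheap check: each debater's two sub-claims must add up to its claim.
        for name, claim in (("A", claim_a), ("B", claim_b)):
            if claim(data, lo, mid) + claim(data, mid, hi) != claim(data, lo, hi):
                return "B" if name == "A" else "A"
        # Follow the disagreement into the half where the claims differ.
        if claim_a(data, lo, mid) != claim_b(data, lo, mid):
            hi = mid
        else:
            lo = mid
    # One element left: small enough for the judge to check unaided.
    truth = data[lo]
    if claim_a(data, lo, hi) == truth:
        return "A"
    if claim_b(data, lo, hi) == truth:
        return "B"
    return "tie"


data = [4, 8, 15, 16, 23, 42]
winner = judge_debate(data, honest_claim, lying_claim)
print(winner)  # A
```

In this toy setting the lie has to live somewhere, so narrowing the disagreement eventually pins it to a claim the judge can check. Real debates over natural-language arguments offer no such guarantee; that is the assumption under study.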
3 / 5
A colleague explains: "The human's ability to evaluate AI outputs doesn't scale as the AI gets smarter." Which technique directly addresses this by recursively decomposing tasks?
Iterated amplification (Christiano et al.) addresses the scaling problem by recursively breaking a complex task into smaller sub-tasks that a human can evaluate. A human assisted by AI (the "amplified" human) answers or checks the sub-tasks, and the results are combined into an answer to the full task. That amplified answer then serves as the training signal for the next model, and the process repeats. Human oversight is extended without the human ever having to evaluate a complex output directly.
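The recursive decomposition can be sketched with a deliberately simple task. Everything here is an assumption for illustration: the task (adding a long list of numbers), the limit `HUMAN_LIMIT`, and the function names. The "human" can only handle two numbers at a time, yet the decomposed process answers a question far beyond that limit.

```python
HUMAN_LIMIT = 2   # hypothetical: the human can only handle two numbers at once
human_calls = []  # size of every task the human was actually shown


def human_answer(numbers):
    """The only step that needs human judgment: add at most HUMAN_LIMIT numbers."""
    assert len(numbers) <= HUMAN_LIMIT
    human_calls.append(len(numbers))
    return sum(numbers)


def amplified_answer(numbers):
    """Recursively split the task until each piece fits the human's limit."""
    if len(numbers) <= HUMAN_LIMIT:
        return human_answer(numbers)
    mid = len(numbers) // 2
    left = amplified_answer(numbers[:mid])   # delegated sub-task
    right = amplified_answer(numbers[mid:])  # delegated sub-task
    return human_answer([left, right])       # human combines two sub-answers


total = amplified_answer(list(range(1, 101)))
print(total)             # 5050
print(max(human_calls))  # 2
```

In an actual training setup the delegated sub-tasks would be answered by copies of a model rather than by further human calls; the sketch only shows why no single step exceeds what the human can evaluate.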
4 / 5
What does amplification mean in the context of scalable oversight?
Amplification means augmenting human supervisory ability with AI assistance. A human can consult an AI (a weaker one, or copies of the current one working in parallel) to help decompose a problem, check sub-answers, or reason about complex outputs, effectively "amplifying" what the human can supervise. The combined human+AI overseer then trains the next, more capable AI iteration.
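The last step, where the combined overseer trains the next iteration, can be sketched as a loop. This is a minimal toy under stated assumptions: the "model" is a plain dictionary of answers, "training" is just storing the overseer's answers in it, and the question is "what is the sum of 1..n?". The names `amplify` and `distill` are chosen for this example.

```python
def amplify(model, n):
    """Human plus current model answer 'sum of 1..n' with one human addition."""
    if n == 0:
        return 0            # base case the human can answer unaided
    if n - 1 not in model:
        return None         # overseer cannot yet supervise this question
    return model[n - 1] + n  # model supplies the sub-answer, human adds n


def distill(model, questions):
    """Stand-in for training: the next model stores the overseer's answers."""
    new_model = dict(model)
    for q in questions:
        answer = amplify(model, q)
        if answer is not None:
            new_model[q] = answer
    return new_model


model = {}
for _ in range(5):
    model = distill(model, range(10))

print(model)  # {0: 0, 1: 1, 2: 3, 3: 6, 4: 10}
```

Each round, the overseer can supervise exactly one step beyond what the current model already knows, so the model's reach grows with every iteration while the human's own work stays a single addition.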
5 / 5
Why is scalable oversight particularly critical for superhuman AI systems?
For AI systems operating at superhuman capability levels, a human reviewer cannot simply "check the answer." Scalable oversight techniques like debate and amplification are designed precisely for this regime — providing oversight mechanisms that don't require the human to be smarter than the AI in order to supervise it effectively.