Evaluating LLM Outputs
3 exercises — spot hallucinations, verify technical claims, and write targeted correction prompts.
0 / 3 completed
Hallucination red flags in LLM output
- Specific version numbers or dates without a cited source
- API names that "sound right" but don't match documentation
- High confidence on verifiable facts ("introduced in...", "deprecated in...")
- Named papers, RFCs, or authors — fabricated citations are common
- Statistics or percentages without a source
- Code that compiles but has subtle logic errors
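Some of these red flags can be caught mechanically before manual review. Below is a minimal heuristic sketch (the pattern names and regexes are illustrative assumptions, not an established tool) that scans LLM output for phrases matching the flags above:

```python
import re

# Illustrative red-flag patterns -- each maps a flag name to a regex that
# matches phrasing worth manual verification. These are rough heuristics.
RED_FLAG_PATTERNS = {
    "uncited version/date claim": r"\b(?:introduced|added|released|deprecated)\s+in\s+\S+",
    "uncited statistic": r"\b\d{1,3}(?:\.\d+)?%",
    "citation-like reference": r"\bRFC\s*\d+\b|\bet al\.",
}

def flag_red_flags(text: str) -> list[tuple[str, str]]:
    """Return (flag_name, matched_snippet) pairs found in the text."""
    hits = []
    for name, pattern in RED_FLAG_PATTERNS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((name, match.group(0)))
    return hits

claim = "The React useLayoutEffect hook was introduced in React 18."
for name, snippet in flag_red_flags(claim):
    print(f"{name}: {snippet!r}")
```

A hit does not mean the claim is false, only that it is the kind of specific, verifiable statement that deserves a source check.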
1 / 3
An LLM confidently states: "The React useLayoutEffect hook was introduced in React 18." How do you identify and flag this as a potential hallucination?
Signs that a response may be hallucinated:
• Specific version numbers, dates, or statistics without a cited source
• API names that "feel right" but don't match documentation
• High-confidence language on verifiable facts ("X was introduced in Y")
• Names of papers, authors, or RFC numbers that might be fabricated
Verification workflow for code-related LLM output:
1. Run the code locally — does it actually work?
2. Check official docs for any API, version, or behaviour claims
3. Search for the specific function signature — does it match?
4. Be especially sceptical of "new in version X" or "deprecated in version Y" claims
Asking the same model again is not verification — it will often give the same incorrect answer, or vary between answers, which tells you it's uncertain.
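Steps 1–3 of the workflow above can be partially automated for Python claims. The sketch below (a minimal illustration, not a complete verifier) uses the standard library to check whether a claimed API actually exists and to recover its real signature for comparison against the docs:

```python
import importlib
import inspect

def check_api_claim(module_name: str, attr_name: str) -> str:
    """Verify an LLM's API claim against the locally installed module:
    does the attribute exist, and what is its actual signature?"""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return f"module {module_name!r} not installed -- cannot verify locally"
    attr = getattr(module, attr_name, None)
    if attr is None:
        return f"{module_name}.{attr_name} does not exist -- likely hallucinated"
    try:
        sig = str(inspect.signature(attr))
    except (TypeError, ValueError):
        sig = "(signature unavailable)"
    return f"{module_name}.{attr_name}{sig} exists -- compare against official docs"

# Example: an LLM claims Python has `json.parse` (a JavaScript-ism).
print(check_api_claim("json", "parse"))   # flagged: no such attribute
print(check_api_claim("json", "loads"))   # real API; check the signature next
```

This only confirms existence and signature in your installed version; "new in version X" claims still need the official changelog or documentation.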
useLayoutEffect was introduced in React 16.8 (February 2019, the same release as all hooks). The model's statement that it was "introduced in React 18" is a hallucination with false precision.