Advanced LLM App Development #llm-evaluation #faithfulness #hallucination #ragas

LLM Application Evaluation Language

5 exercises — Use the precise vocabulary for faithfulness, context recall, hallucination, LLM-as-judge, and eval dataset curation.

Quick reference: LLM Evaluation Metrics
  • faithfulness — every claim in the answer is supported by retrieved context (no hallucination)
  • answer relevance — the answer addresses the user's actual question
  • context recall — fraction of all relevant corpus chunks that were retrieved
  • LLM-as-judge — using a language model to score another model's outputs at scale
  • groundedness score — proportion of response claims traceable to real source evidence
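The ratio-style metrics above can be sketched in a few lines. This is a toy illustration, not the RAGAS implementation: it assumes faithfulness is reported as the fraction of answer claims supported by the retrieved context, and it uses hand-assigned `supported` labels where a real pipeline would typically use an LLM judge or an entailment model. The names `Claim`, `faithfulness`, and `context_recall`, the chunk IDs, and the sample claims are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    supported: bool  # hypothetical hand label: is the claim backed by retrieved context?


def faithfulness(claims: list[Claim]) -> float:
    """Fraction of answer claims that are supported by the retrieved context."""
    if not claims:
        return 0.0
    return sum(c.supported for c in claims) / len(claims)


def context_recall(relevant_ids: set[str], retrieved_ids: set[str]) -> float:
    """Fraction of the relevant chunks that appear among the retrieved chunks."""
    if not relevant_ids:
        return 0.0
    return len(relevant_ids & retrieved_ids) / len(relevant_ids)


# Illustrative data only: one supported claim and one unsupported (hallucinated) claim.
claims = [
    Claim("Paris is the capital of France.", supported=True),
    Claim("Paris has 40 million residents.", supported=False),
]
faith = faithfulness(claims)

# Four chunks are relevant; the retriever returned three of them plus one irrelevant chunk.
recall = context_recall({"c1", "c2", "c3", "c4"}, {"c1", "c2", "c3", "c9"})

print(f"faithfulness={faith:.2f} context_recall={recall:.2f}")
```

Note that the two numbers measure different stages: `faithfulness` looks only at the answer versus the retrieved context, while `context_recall` looks only at what the retriever returned versus what it should have returned.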
1 / 5

A colleague explains their RAG evaluation setup: "We run two RAGAS metrics — faithfulness and answer relevance. They keep getting confused in our team docs." What is the precise distinction between these two metrics?