Learn the vocabulary of grounding a language model's answers in retrieved source content.
1 / 5
At standup, a dev mentions retrieving relevant documents from a knowledge base and feeding them into a language model's prompt before it generates an answer. What is this pattern called?
Retrieval-augmented generation retrieves relevant documents from an external knowledge base and includes them in the model's prompt, letting it generate an answer grounded in that specific content rather than relying solely on what it learned during training. This lets a model answer accurately about content that's current or specific to an organization, without retraining the model itself. It's become a standard pattern for building a knowledge-grounded AI assistant.
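The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the word-overlap scoring is a stand-in for a real embedding search, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical. The resulting prompt string would then be sent to whatever language model the team uses.

```python
def tokenize(text):
    # Toy tokenizer (assumption): lowercase and split on whitespace.
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    # Rank documents by how many query words they share, keep the top k.
    # A real system would typically compare embedding vectors instead.
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    # Place the retrieved text in the prompt so the model can ground its answer in it.
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


knowledge_base = [
    "Refunds are issued within 14 days.",
    "The office closes at 6 pm.",
    "Refunds require a receipt.",
]
prompt = build_prompt("how are refunds issued", knowledge_base)
```

Here the two refund-related entries end up in the prompt and the unrelated office-hours entry is left out.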
2 / 5
During a design review, the team wants to break a long source document into smaller pieces before embedding them, so retrieval returns focused, relevant sections instead of entire documents. Which capability supports this?
Document chunking splits a long source document into smaller, more focused pieces before embedding, so a retrieval query returns a specific relevant section rather than an entire lengthy document that mixes relevant and irrelevant content. This improves retrieval precision and makes more efficient use of the model's limited context window. Choosing a good chunk size is a real tuning decision, since chunks that are too small can lose context and chunks that are too large dilute relevance.
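As a rough sketch of chunking, the function below splits text into fixed-size word windows. The overlap between neighboring chunks is an assumption added here as one common way to reduce lost context at chunk boundaries; the function name `chunk_words` and the default sizes are hypothetical, and real pipelines often split on sentences, headings, or tokens instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    # chunk_size and overlap are counted in words (an assumption for this sketch).
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
```

With these settings each chunk repeats the last word of the previous one, so a sentence cut at a boundary still appears partly in both chunks.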
3 / 5
In a code review, a dev notices the pipeline reorders retrieved chunks by a secondary relevance model before passing only the top few into the prompt. What does this represent?
Re-ranking retrieved results applies a secondary, often more precise relevance model to reorder an initial set of candidates, then keeps only the top few for the final prompt. This improves answer quality by filtering out marginally relevant chunks that a faster initial retrieval step let through. It's a common two-stage pattern, since a fast broad retrieval followed by a slower precise re-rank balances speed and accuracy.
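A minimal sketch of the two-stage pattern follows. Both scoring functions are toy stand-ins chosen only to keep the example self-contained: `word_overlap` plays the role of the fast first pass, and `jaccard` plays the role of the slower, more precise model (in practice often a cross-encoder). All names here are hypothetical.

```python
def words(text):
    return set(text.lower().split())


def word_overlap(query, doc):
    # Cheap first-pass score (assumption): number of shared words.
    return len(words(query) & words(doc))


def jaccard(query, doc):
    # Stand-in for a more precise relevance model: shared words over all words.
    union = words(query) | words(doc)
    return len(words(query) & words(doc)) / len(union) if union else 0.0


def top_by_score(query, candidates, score_fn, limit):
    # Sort candidates by score, highest first, and keep the first `limit`.
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:limit]


def retrieve_and_rerank(query, documents, k=10, n=3):
    candidates = top_by_score(query, documents, word_overlap, k)  # broad, fast
    return top_by_score(query, candidates, jaccard, n)            # narrow, precise


docs = [
    "password policy and reset rules for all staff accounts in every region",
    "reset your password from the login page",
    "lunch menu for friday",
]
best = retrieve_and_rerank("reset password", docs, k=2, n=1)
```

The first pass scores the two password documents equally, and the second pass prefers the shorter, more focused one, so only that chunk would reach the prompt.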
4 / 5
An incident report shows a RAG-based assistant confidently answered a question using an outdated retrieved chunk, even though the underlying source document had since been corrected. What practice would reduce this risk?
Keeping the retrieval index synchronized with each source document's latest version ensures a retrieved chunk reflects current, corrected content rather than a stale snapshot. Simply assuming that retrieval reflects the latest content overlooks the synchronization work needed to keep an index current. This matters especially for a knowledge base where the accuracy of specific facts, like policy details, has real consequences.
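One possible way to do that synchronization is to store a content hash next to each indexed document and refresh any entry whose source text has changed. The sketch below assumes an in-memory dictionary as the index and hypothetical names (`content_hash`, `sync_index`); a real pipeline would also re-chunk and re-embed each changed document at the marked step.

```python
import hashlib


def content_hash(text):
    # Fingerprint of the document text; any edit changes the hash.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_index(index, sources):
    # index:   doc_id -> {"hash": str, "text": str}  (hypothetical layout)
    # sources: doc_id -> latest text of each source document
    changed = []
    for doc_id, text in sources.items():
        digest = content_hash(text)
        if index.get(doc_id, {}).get("hash") != digest:
            # New or edited document: a real pipeline would re-chunk and re-embed here.
            index[doc_id] = {"hash": digest, "text": text}
            changed.append(doc_id)
    # Drop entries whose source document no longer exists.
    removed = [doc_id for doc_id in index if doc_id not in sources]
    for doc_id in removed:
        del index[doc_id]
    return changed, removed


index = {}
first = sync_index(index, {"policy": "Refunds within 30 days.", "hours": "Open 9 to 5."})
second = sync_index(index, {"policy": "Refunds within 14 days."})
```

After the second call the index holds only the corrected policy text, so a retrieval step reading from it can no longer return the stale version.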
5 / 5
During a PR review, a teammate asks why the team retrieves and injects relevant documents into the prompt instead of just asking the language model directly without any retrieval step. What is the reasoning?
Asking a language model directly without retrieval limits its answer to whatever it learned during training, which may be outdated or missing organization-specific content entirely. Injecting retrieved, current source content grounds the answer in accurate, relevant material the model can directly reference. The tradeoff is the added pipeline complexity of chunking, embedding, retrieving, and re-ranking before generation even begins.