Learn the vocabulary of managing how much text a model can consider at once.
0 / 5 completed
1 / 5
At standup, a dev mentions the maximum amount of text, measured in tokens, that a model can consider at once when generating a response. What is this limit called?
The context window is the maximum number of tokens a model can consider at once, spanning the prompt, any retrieved context, and the response it's generating. The training dataset size is a separate concept describing what the model learned from, not what it can process in a single call. Managing this window carefully is essential once a conversation or document grows close to that limit.
2 / 5
During a design review, the team wants older, less relevant turns of a long conversation summarized and compressed rather than kept verbatim as the context window fills up. Which capability supports this?
Conversation summarization compresses older, less relevant turns into a shorter summary, freeing up space in the context window for new, more relevant content. Keeping every turn verbatim forever eventually exceeds the window and forces an abrupt truncation. This summarization keeps a long-running conversation coherent without silently losing the most recent, most relevant exchanges.
3 / 5
In a code review, a dev notices the system estimates a prompt's token count before sending it, rejecting or trimming it if it would exceed the model's context window. What does this represent?
Proactive token budgeting estimates a prompt's token count before sending it, rejecting or trimming content that would exceed the context window rather than letting the call fail unexpectedly. Sending every prompt with no estimate risks an opaque error partway through a user's request. This budgeting step catches an oversized prompt before it ever reaches the model.
4 / 5
An incident report shows a long document was silently truncated mid-sentence when it exceeded the context window, and the model's summary omitted key content from the cut-off portion. What practice would prevent this?
Chunking a long document into overlapping segments that each fit the context window ensures no portion is silently dropped before the model ever sees it. Sending the whole document as one prompt regardless of length risks exactly the kind of mid-sentence truncation this incident describes. This chunking approach is standard practice for handling input longer than any single context window.
5 / 5
During a PR review, a teammate asks why the team budgets tokens carefully instead of just sending the full conversation history with every request. What is the reasoning?
Exceeding the context window causes a hard failure or an unpredictable truncation that can silently drop important content. Proactive budgeting, including summarization of older turns, keeps a long-running conversation usable well past what a single window could otherwise hold. The tradeoff is the added complexity of tracking token counts and deciding what to compress or drop.