Practice the vocabulary of displaying a model's response incrementally as it's generated.
1 / 5
At standup, a dev mentions displaying a language model's response to the user incrementally, token by token, as it's generated, rather than waiting for the entire response to finish first. What is this pattern called?
Token streaming displays a language model's response incrementally to the user as each token is generated, rather than waiting for the entire response to finish before showing any of it. This significantly improves perceived responsiveness, since the user sees the first words appear almost immediately instead of facing a blank wait for a potentially long full response. It's become the standard user experience pattern for a chat-style AI interface.
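As a rough illustration, the pattern can be sketched with a generator standing in for the model. The names `fake_token_stream` and `render_incrementally` are hypothetical, and splitting on whitespace is only an assumption made to keep the example small; real tokenizers split text differently.

```python
import re
from typing import Iterable, Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a model API: yields one chunk at a time.

    Chunks here are whitespace-delimited words (an assumption for the demo).
    """
    for chunk in re.findall(r"\S+\s*", text):
        yield chunk


def render_incrementally(tokens: Iterable[str]) -> list[str]:
    """Return each successive display state the user would see."""
    shown = ""
    frames = []
    for token in tokens:
        shown += token          # append the new token to what is on screen
        frames.append(shown)    # a real UI would repaint here
    return frames


frames = render_incrementally(fake_token_stream("Streaming feels fast"))
for frame in frames:
    print(repr(frame))
```

Each printed frame is one more token than the last, which is what the user perceives as the text "typing itself out".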
2 / 5
During a design review, the team wants the client to gracefully handle a dropped connection partway through a streamed response, resuming or clearly indicating the interruption rather than silently showing an incomplete answer. Which capability supports this?
Stream interruption handling detects a dropped connection partway through a streamed response and either resumes it or clearly shows the user an interrupted state, rather than silently leaving a truncated answer displayed as though it were complete. Silently showing an incomplete response risks the user mistaking a cut-off answer for the model's actual full intended output. This graceful handling is an important detail for a reliable streaming user experience, since a network interruption is a realistic, expected occurrence.
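A minimal sketch of the "clearly indicate the interruption" half of this idea follows. It assumes the transport surfaces a dropped connection as a `ConnectionError`; `StreamResult`, `consume_stream`, `flaky_stream`, and `display` are hypothetical names, and resuming the stream is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class StreamResult:
    text: str
    complete: bool  # False when the stream ended because of a dropped connection


def consume_stream(tokens: Iterable[str]) -> StreamResult:
    """Collect tokens, recording whether the stream finished normally."""
    parts = []
    try:
        for token in tokens:
            parts.append(token)
    except ConnectionError:
        # Keep what arrived, but mark it as partial instead of hiding the failure.
        return StreamResult("".join(parts), complete=False)
    return StreamResult("".join(parts), complete=True)


def flaky_stream() -> Iterator[str]:
    """Hypothetical stream that drops partway through."""
    yield "The answer "
    yield "is "
    raise ConnectionError("connection dropped")


def display(result: StreamResult) -> str:
    """Make an interrupted response visibly different from a finished one."""
    if result.complete:
        return result.text
    return result.text.rstrip() + " [response interrupted]"


result = consume_stream(flaky_stream())
print(display(result))
```

Some transports close a dropped stream without raising an error, so a client may also want an explicit end-of-response marker from the server before treating the text as complete.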
3 / 5
In a code review, a dev notices the backend uses a server-sent events connection to push each generated token to the client incrementally as it becomes available. What does this represent?
Server-sent events, or SSE, provide a lightweight, one-directional connection that the backend uses to push each generated token to the client as it becomes available, without the client needing to poll repeatedly for updates. By contrast, a single request-response cycle that returns the entire response at once forces the user to wait for the full generation before seeing anything. SSE and WebSockets are both common transport choices for implementing token streaming in a web-based chat interface.
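To make the wire format concrete, here is a small sketch of SSE framing: each event is one or more `data:` lines followed by a blank line. Wrapping each token in a JSON object with a `token` key is an assumption of this example, not something SSE requires, and `encode_sse` and `parse_sse` are hypothetical helper names.

```python
import json


def encode_sse(token: str) -> str:
    """Frame one token as an SSE event.

    JSON-encoding the payload keeps newlines inside a token from
    breaking the line-based framing.
    """
    return f"data: {json.dumps({'token': token})}\n\n"


def parse_sse(raw: str) -> list[str]:
    """Recover tokens from a buffer of complete SSE events."""
    tokens = []
    for event in raw.split("\n\n"):  # a blank line terminates each event
        data_lines = [
            line[len("data:"):].removeprefix(" ")  # one optional leading space
            for line in event.split("\n")
            if line.startswith("data:")
        ]
        if data_lines:
            tokens.append(json.loads("\n".join(data_lines))["token"])
    return tokens


wire = "".join(encode_sse(t) for t in ["Hel", "lo\n", "world"])
print(wire)
print(parse_sse(wire))
```

In a browser, the built-in `EventSource` API does this parsing; the parser above only handles a buffer of complete events and ignores other SSE fields such as `event:` and `id:`.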
4 / 5
An incident report shows a client rendering streamed tokens directly as raw, unescaped HTML let a model's generated text inject unintended markup into the page. What practice would prevent this?
Sanitizing or safely escaping each streamed token before rendering it into the page prevents a model's generated text, which is fundamentally untrusted output, from being interpreted as unintended HTML markup. Rendering raw, unescaped tokens directly creates a real injection risk, just as it would for any other untrusted content rendered without proper escaping. This sanitization step matters for streamed output just as much as it does for any other user-supplied or model-generated dynamic content.
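A short sketch of escaping on the server side, using Python's standard `html.escape`. The function name `render_safe` is hypothetical, and the example assumes the output is inserted as HTML text content; a client that renders Markdown or rich text would need a proper sanitizer instead.

```python
import html
from typing import Iterable


def render_safe(tokens: Iterable[str]) -> str:
    """Escape each streamed token before it is appended to the page.

    html.escape works character by character, so escaping token by token
    gives the same result as escaping the full text, even when a tag is
    split across two tokens.
    """
    return "".join(html.escape(token) for token in tokens)


# A tag split across token boundaries is still neutralized.
tokens = ["<scr", "ipt>alert(1)</scr", "ipt>"]
print(render_safe(tokens))
```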
5 / 5
During a PR review, a teammate asks why the team streams a model's response incrementally instead of simply waiting for the full response and displaying it all at once. What is the reasoning?
Waiting for the full response and displaying it all at once can leave the user staring at a blank wait, especially for a longer generated response, with no feedback that anything is happening. Streaming shows the first tokens almost immediately, making the interaction feel substantially more responsive even though the total generation time is the same. The tradeoff is the added implementation complexity of a streaming transport and handling a partial or interrupted response gracefully.
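The difference can be put in rough numbers. The sketch below uses purely illustrative figures (300 tokens at 30 ms per token) and ignores network latency and any delay before the first token is generated; `time_to_first_visible_ms` is a hypothetical helper.

```python
def time_to_first_visible_ms(n_tokens: int, ms_per_token: int, streaming: bool) -> int:
    """Time until the user first sees any text, under a simplified model.

    Streaming: the first token is visible as soon as it is generated.
    Non-streaming: nothing is visible until every token has been generated.
    """
    return ms_per_token if streaming else n_tokens * ms_per_token


N_TOKENS = 300      # illustrative response length
MS_PER_TOKEN = 30   # illustrative generation speed

total_ms = N_TOKENS * MS_PER_TOKEN  # identical in both cases
streamed = time_to_first_visible_ms(N_TOKENS, MS_PER_TOKEN, streaming=True)
blocking = time_to_first_visible_ms(N_TOKENS, MS_PER_TOKEN, streaming=False)

print(f"total generation: {total_ms} ms")
print(f"first text visible (streaming): {streamed} ms")
print(f"first text visible (wait for all): {blocking} ms")
```

Under these assumed figures the total is 9000 ms either way, but the wait before anything appears drops from 9000 ms to 30 ms.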