Practice the vocabulary of reusing a static prompt segment to reduce latency and cost.
1 / 5
At standup, a dev mentions storing a long, unchanging portion of a prompt, like a system instruction, so the model doesn't have to reprocess it from scratch on every single call. What is this technique called?
Prompt caching stores a long, unchanging portion of a prompt, such as a system instruction, so the model doesn't reprocess it from scratch on every call. This reduces both latency and cost, because reprocessing identical content on every request spends compute on work that always produces the same result. Caching is especially valuable for an application that reuses the same lengthy instructions across many separate requests.
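As a rough illustration of the idea, the sketch below memoizes the processing of a static system instruction in a plain dictionary. The names `expensive_prefill` and `process_prompt` are hypothetical stand-ins, not any provider's API, and the "processing" is just a whitespace split standing in for the model's costly work.

```python
from typing import Dict, List

prefill_calls = 0  # counts how often the expensive step actually runs


def expensive_prefill(text: str) -> List[str]:
    """Hypothetical stand-in for the model's costly processing of prompt text."""
    global prefill_calls
    prefill_calls += 1
    return text.split()


_cache: Dict[str, List[str]] = {}


def process_prompt(system_instruction: str, user_message: str) -> List[str]:
    # Reuse the processed system instruction if it has been seen before.
    if system_instruction not in _cache:
        _cache[system_instruction] = expensive_prefill(system_instruction)
    cached_prefix = _cache[system_instruction]
    # The per-request message is handled fresh on every call.
    return cached_prefix + user_message.split()


SYSTEM = "You are a support assistant. Follow the policy below."
for question in ["Where is my order?", "How do I reset my password?", "Cancel my plan."]:
    process_prompt(SYSTEM, question)

print(prefill_calls)  # 1: the static instruction was processed once for three requests
```

Three requests share one expensive pass over the instruction, which is where the latency and cost savings come from.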
2 / 5
During a design review, the team wants only the truly static, reused prefix of a prompt cached, while the unique, per-request portion is still processed fresh each time. Which capability supports this?
Cache-boundary design separates a prompt's truly static, reused prefix from its unique, per-request portion, so only the static part is cached while the dynamic part is still processed fresh. Caching the entire prompt as one unit stops working the moment any part of it changes between requests, since even a tiny difference produces a cache miss for the whole prompt. Placing the boundary well is what makes prompt caching pay off across many varied requests.
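A small sketch can make the effect of the boundary concrete. It assumes a cache keyed by a SHA-256 hash and compares two hypothetical keying strategies over the same five requests: hashing the whole prompt versus hashing only the static prefix. The helper names `cache_key` and `count_hits` are illustrative only.

```python
import hashlib
from typing import Callable, List, Set, Tuple


def cache_key(text: str) -> str:
    """Derive a cache key from the text that is supposed to be reused."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def count_hits(
    requests: List[Tuple[str, str]],
    key_fn: Callable[[str, str], str],
) -> int:
    """Count how many requests find their key already present in the cache."""
    seen: Set[str] = set()
    hits = 0
    for static_prefix, dynamic_suffix in requests:
        key = key_fn(static_prefix, dynamic_suffix)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits


STATIC_PREFIX = "System: answer using the refund policy document."
requests = [(STATIC_PREFIX, f"User question #{i}") for i in range(5)]

# Strategy 1: the whole prompt is one cached block.
whole_prompt_hits = count_hits(requests, lambda p, s: cache_key(p + "\n" + s))
# Strategy 2: the boundary sits after the static prefix.
prefix_only_hits = count_hits(requests, lambda p, s: cache_key(p))

print(whole_prompt_hits, prefix_only_hits)  # 0 4
```

With the boundary after the static prefix, every request after the first reuses the cached entry. With the whole prompt as the key, the varying user question makes every request a miss.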
3 / 5
In a code review, a dev notices a cached prompt segment automatically expires and gets rebuilt after a short time-to-live window rather than being reused indefinitely without any freshness check. What does this represent?
A time-to-live (TTL) policy automatically expires and rebuilds a cached prompt segment after a set window, rather than reusing it indefinitely with no freshness check. A segment reused indefinitely can go stale if the static content it represents ever changes. The TTL balances the cost savings of caching against the risk of serving outdated content.
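One way to picture a TTL policy is the minimal in-memory sketch below. The class name `TTLPrefixCache`, the 300-second window, and the injectable clock are all assumptions chosen for illustration. This version measures age from the time an entry was built and does not refresh the timer on a read, which is a design choice that other caches may make differently.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPrefixCache:
    """Caches a processed prefix and rebuilds it once it is older than the TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.rebuilds = 0

    def get(self, prefix: str, build: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._entries.get(prefix)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # still fresh: reuse the cached value
        # Missing or expired: rebuild and record the build time.
        self.rebuilds += 1
        value = build(prefix)
        self._entries[prefix] = (now, value)
        return value


def build_prefix(text: str) -> str:
    """Hypothetical stand-in for expensive prefix processing."""
    return text.upper()


fake_now = [0.0]
cache = TTLPrefixCache(ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("static instructions", build_prefix)  # miss: built at t=0
fake_now[0] = 100.0
cache.get("static instructions", build_prefix)  # hit: 100s old, within the TTL
fake_now[0] = 400.0
cache.get("static instructions", build_prefix)  # expired: 400s old, rebuilt

print(cache.rebuilds)  # 2
```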
4 / 5
An incident report shows a prompt's static instructions were updated, but a stale cached version kept being served for hours because cache validity was not tied to the content's own version. What practice would prevent this?
Tying a cached prompt segment's validity to a content version or hash invalidates the cache automatically whenever the underlying static instructions change. Without such a tie, the cache can keep serving an outdated version long after the real content was updated. Version-aware invalidation is what keeps prompt caching both fast and correct.
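Version-aware invalidation might look something like the sketch below, under the assumption that the version is a truncated SHA-256 hash of the instruction text. `VersionedPrefixCache` and `content_version` are hypothetical names, and the "processed" value is a placeholder string rather than real model state.

```python
import hashlib
from typing import Dict, Tuple


def content_version(instructions: str) -> str:
    """Short content hash used as the version of the static instructions."""
    return hashlib.sha256(instructions.encode("utf-8")).hexdigest()[:12]


class VersionedPrefixCache:
    """Rebuilds a cached entry whenever the content hash no longer matches."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, str]] = {}  # name -> (version, processed)
        self.rebuilds = 0

    def get(self, name: str, instructions: str) -> str:
        version = content_version(instructions)
        entry = self._entries.get(name)
        if entry is not None and entry[0] == version:
            return entry[1]  # same content as when the entry was built
        # Content is new or has changed: rebuild and store under the new version.
        self.rebuilds += 1
        processed = f"processed:{instructions}"  # placeholder for real processing
        self._entries[name] = (version, processed)
        return processed


cache = VersionedPrefixCache()
cache.get("support", "Refunds within 30 days.")            # first build
cache.get("support", "Refunds within 30 days.")            # unchanged: reused
updated = cache.get("support", "Refunds within 14 days.")  # changed: rebuilt at once

print(cache.rebuilds)  # 2
print(updated)         # processed:Refunds within 14 days.
```

Because the check compares content rather than elapsed time, an update takes effect on the very next request instead of waiting for an expiry window.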
5 / 5
During a PR review, a teammate asks why the team designs a specific cache boundary instead of just caching the entire prompt as one single block regardless of what varies between requests. What is the reasoning?
Caching the whole prompt as one block produces a cache miss whenever any part of it varies between requests, which defeats the purpose, since most real prompts include some per-request content. A designed cache boundary lets the genuinely static portion be reused across many requests while the varying portion is processed fresh. The tradeoff is the upfront design work of identifying exactly which part of a prompt is truly static.