AdvancedVocabulary#ai-llm#backend#developer-tools

Prompt Caching Vocabulary

Practice the vocabulary of reusing a static prompt segment to reduce latency and cost.

0 / 5 completed

1 / 5

At standup, a dev mentions storing a long, unchanging portion of a prompt, like a system instruction, so the model doesn't have to reprocess it from scratch on every single call. What is this technique called?

2 / 5

During a design review, the team wants only the truly static, reused prefix of a prompt cached, while the unique, per-request portion is still processed fresh each time. Which capability supports this?

3 / 5

In a code review, a dev notices a cached prompt segment automatically expires and gets rebuilt after a short time-to-live window rather than being reused indefinitely without any freshness check. What does this represent?

4 / 5

An incident report shows a prompt's static instructions were updated, but a stale cached version kept being served for hours because the cache had no expiration tied to the content's own version. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team designs a specific cache boundary instead of just caching the entire prompt as one single block regardless of what varies between requests. What is the reasoning?