Anthropic's prompt caching stores frequently reused prompt prefixes server-side, enabling cache reads at ~10% of normal input token cost. Learn the vocabulary for cache_control directives, TTL behavior, creation vs read token billing, and multi-turn conversation patterns.
0 / 5 completed
1 / 5
An engineer adds cache_control: { type: 'ephemeral' } to a 900-token system prompt. On the second identical call 10 minutes later, which token counts appear in the response usage?
On a cache hit, Anthropic reports cache_read_input_tokens for the cached portion and bills at ~10% of normal input token cost. input_tokens reflects only non-cached tokens. The first call shows cache_creation_input_tokens; subsequent cache-hit calls show cache_read_input_tokens.
2 / 5
What is the minimum number of tokens a prompt block must contain before Anthropic will store it in the prompt cache?
Anthropic requires at least 1,024 tokens (for Claude 3.5 and Claude 3 models) in a cacheable block before it will be stored. Shorter blocks are processed normally without caching. This threshold exists to make cache storage cost-effective.
3 / 5
How long does an Anthropic prompt cache entry remain valid after its last use?
Anthropic ephemeral cache entries have a 5-minute TTL that resets on each cache hit. As long as a cached prefix is accessed at least once every 5 minutes, it stays warm. Unused entries expire 5 minutes after their last access.
4 / 5
A developer places cache_control on the last user message in a multi-turn conversation. What does Anthropic cache in this case?
Anthropic caches everything from the beginning of the prompt up to and including the block marked with cache_control. Placing the marker on the last user turn caches the full conversation prefix, which is the typical pattern for multi-turn conversation acceleration.
5 / 5
Which cost multiplier applies to cache_creation_input_tokens compared to standard input token pricing for Claude 3.5 Sonnet?
Writing to the prompt cache costs 1.25× the standard input token price (25% markup). This creation cost is offset after approximately 2 cache hits. Reads cost only ~0.1× (10%) of standard input pricing, making caching economical for reused prefixes.