Practice LLM cost management vocabulary: cost per 1000 tokens, caching responses, API call pricing, switching to smaller models, and token usage optimization.
1 / 5
The LLM provider charges $0.002 per 1,000 ___ for input. What unit is pricing based on?
LLM APIs price by the token, a subword unit that in English text typically represents about 4 characters or 0.75 words. Input tokens (the prompt) and output tokens (the completion) are often priced separately, so estimating cost requires understanding how text is tokenized.
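A minimal sketch of that estimate, assuming the rough 4-characters-per-token rule and the $0.002 per 1,000 input tokens rate from the question. The function names are illustrative only, and a real tokenizer will give different counts.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def input_cost(tokens: int, price_per_1k: float = 0.002) -> float:
    """Cost in dollars for a number of input tokens at a per-1,000-token rate."""
    return tokens / 1000 * price_per_1k


prompt = "Summarize the attached support ticket in two sentences."
tokens = estimate_tokens(prompt)
print(tokens, input_cost(tokens))
```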
2 / 5
To reduce costs the team ___ responses for repeated queries such as FAQ answers.
Response caching stores the LLM's output for a query and serves the stored result when the same query arrives again. This eliminates redundant API calls, reduces latency, and can cut costs dramatically for applications with repetitive queries.
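The idea can be sketched as an in-memory cache wrapped around the model call. Everything here is an assumption for illustration: `ResponseCache` is a hypothetical class, `call_model` stands in for whatever function actually calls the API, and normalizing case and whitespace before lookup is one possible choice, not a requirement.

```python
from typing import Callable, Dict


class ResponseCache:
    """Serve repeated prompts from memory instead of calling the model again."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model  # hypothetical function that hits the API
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumption: prompts differing only in case or spacing count as identical.
        return " ".join(prompt.lower().split())

    def get(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        answer = self._call_model(prompt)  # only cache misses cost money
        self._store[key] = answer
        return answer
```

A production cache would usually also need an expiry policy so that stale FAQ answers are refreshed.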
3 / 5
The engineer calculates: 'This API call costs ___.' They are estimating cost from token count × price rate.
Per-call cost is the sum of two products: (input tokens × input price) + (output tokens × output price). For example: 500 input tokens at $0.002/1K plus 200 output tokens at $0.006/1K = $0.0010 + $0.0012 = $0.0022 per call.
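The same arithmetic as a small function, using the token counts and prices from the example. The function name `call_cost` is illustrative.

```python
def call_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Dollar cost of one call: input and output tokens are priced separately."""
    input_part = input_tokens / 1000 * input_price_per_1k
    output_part = output_tokens / 1000 * output_price_per_1k
    return input_part + output_part


cost = call_cost(500, 200, 0.002, 0.006)
print(f"${cost:.4f}")  # $0.0022
```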
4 / 5
For sentiment classification the team ___ to a smaller model, cutting inference cost by 80%.
Switching to a smaller model for simple tasks (classification, extraction, routing) is one of the most effective cost optimizations. Large frontier models are not needed for every task: smaller models cost a fraction of the price and deliver acceptable quality on constrained tasks.
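One way to picture this is a simple router that picks a model by task type. The model names and prices below are made up for illustration (chosen so the small model costs 20% of the large one, matching the 80% saving in the question), and real routing would also need a quality check on the smaller model.

```python
# Hypothetical model names and per-1,000-token prices, for illustration only.
PRICE_PER_1K = {"large-model": 0.010, "small-model": 0.002}

# Task types the explanation lists as suitable for a smaller model.
SIMPLE_TASKS = {"classification", "extraction", "routing"}


def choose_model(task: str) -> str:
    """Route constrained tasks to the small model, everything else to the large one."""
    return "small-model" if task in SIMPLE_TASKS else "large-model"


def savings_fraction(task: str) -> float:
    """Fraction of cost saved compared with always using the large model."""
    chosen = PRICE_PER_1K[choose_model(task)]
    baseline = PRICE_PER_1K["large-model"]
    return 1 - chosen / baseline


print(choose_model("classification"), savings_fraction("classification"))
```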
5 / 5
The cost review identifies ___ usage optimization as a priority: prompts contain repeated boilerplate consuming many tokens.
Token usage optimization means reducing the tokens consumed per API call, for example by removing repetitive boilerplate from system prompts, using more concise instructions, truncating context, or switching to structured output formats that require fewer tokens.
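As a small illustration of the boilerplate case, the sketch below drops repeated lines from a prompt and compares rough token estimates before and after. Both helpers are hypothetical, and the 4-characters-per-token estimate is a heuristic rather than a real tokenizer.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token in English text."""
    return math.ceil(len(text) / 4)


def dedupe_lines(prompt: str) -> str:
    """Keep the first occurrence of each non-blank line and drop later repeats."""
    seen = set()
    kept = []
    for line in prompt.splitlines():
        key = line.strip()
        if key and key in seen:
            continue  # repeated boilerplate line: skip it
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


prompt = (
    "You are a helpful assistant.\n"
    "Answer briefly.\n"
    "You are a helpful assistant.\n"
    "Question: What is a token?"
)
trimmed = dedupe_lines(prompt)
print(estimate_tokens(prompt), "->", estimate_tokens(trimmed))
```

Exact-line deduplication only catches verbatim repeats; paraphrased boilerplate would need manual editing of the prompt template.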