Claude API English: Tool Use and Extended Thinking Vocabulary

Master the English vocabulary of the Anthropic Claude API — tool use, extended thinking, streaming, and prompt engineering terms explained for IT professionals.

Introduction

The Anthropic Claude API has its own vocabulary for the features that make it distinctive: tool use, extended thinking, prompt caching, and multi-turn conversations. If your team integrates Claude into a product, you will use these terms in code reviews, API documentation, and architecture discussions. Learning the precise English used in Anthropic’s official documentation and developer community helps you communicate clearly and understand Claude’s behaviour more accurately.

Messages API and Conversation Structure

Claude uses a Messages API where each API call receives a list of messages with alternating user and assistant roles. Engineers describe the structure as:

  • “We build the conversation history as a list of messages” — passing prior turns in the messages array
  • “We prepend a system prompt to guide Claude’s behaviour” — the system parameter sets context before the conversation
  • “We append the assistant’s response to the history for the next turn” — maintaining multi-turn state
  • “Tokens in, tokens out” — engineers sometimes describe API cost as input tokens consumed and output tokens generated

The word turn is important here. A single user message plus Claude’s response is called a turn. “We limit the conversation to ten turns” means ten user-assistant exchanges. Engineers also say “multi-turn conversation” to mean a conversation with history, as opposed to a single-shot prompt.

Tool Use

Tool use (sometimes called function calling in other LLM APIs) allows Claude to request that your code execute a function and return the result. The Anthropic vocabulary is specific:

  • tool definition — a JSON schema describing a tool’s name, description, and parameters
  • tool use block — when Claude’s response contains a request to call a tool
  • tool result — the output you send back after executing the tool
  • “Claude decides whether to use a tool” — the model chooses based on the conversation context
  • “We handle the tool use block and return a tool result” — the code pattern for executing a tool call

Engineers describe the flow: “When Claude returns a tool use block, we extract the tool name and input, execute the function locally, and send back a user message containing the tool result.” Note that tool results are sent back as a user role message, which is non-obvious and often discussed in code reviews.

A common phrase in architecture discussions: “We give Claude a set of tools and let it orchestrate the workflow — Claude decides the order and which tools to call.”

Extended Thinking

Extended thinking is a Claude feature where the model reasons through a problem before giving a final answer. The reasoning is visible to the developer. The vocabulary:

  • “Enable extended thinking” — set the thinking parameter in the API request
  • “Thinking budget” — the maximum number of tokens Claude can use for internal reasoning
  • “Thinking blocks” — the content blocks in the response that contain Claude’s reasoning
  • “The thinking is not shown to end users” — developers can access it but typically hide it from the UI
  • “We use extended thinking for complex multi-step problems” — a common stated use case

Engineers say: “We enable extended thinking with a budget of 10,000 tokens for our data analysis endpoint — the reasoning traces help us debug unexpected outputs.” The phrase “reasoning trace” is used interchangeably with “thinking block” in informal discussion.

Prompt Caching

Prompt caching reduces cost when the same prefix is sent repeatedly. The vocabulary:

  • “We cache the system prompt” — mark a static prefix for caching with a cache_control parameter
  • “Cache hit” — the cached content was reused, reducing cost and latency
  • “Cache miss” — the cached content was not available, full processing occurred
  • “We save tokens by caching the large document” — a common motivation for using prompt caching

Engineers often note in architecture documents: “Since our system prompt and RAG context are the same for every request to this endpoint, we use prompt caching to reduce input token costs significantly.”

Streaming Responses

Most production Claude integrations use streaming. The vocabulary:

  • “Stream the response” — receive tokens as they are generated rather than waiting for the full response
  • “Event stream” — the series of server-sent events that make up a streaming response
  • “Delta” — the incremental text in each streaming event
  • “We accumulate the deltas to build the full response” — a common implementation note

Key Vocabulary

TermDefinition
turnOne user message plus one assistant response in a conversation
system promptA parameter that provides instructions to Claude before the conversation
tool use blockA response block where Claude requests a function to be called
tool resultThe output returned to Claude after executing a requested function
extended thinkingA mode where Claude reasons internally before producing a final answer
thinking budgetThe maximum token count allocated for extended thinking
thinking blockA response block containing Claude’s internal reasoning
prompt cachingReusing a previously processed prompt prefix to reduce cost
cache hitWhen a cached prefix is successfully reused
streamingReceiving response tokens incrementally as they are generated

Practice Tips

  1. Read Anthropic’s official API documentation. Anthropic’s docs are well-structured and use consistent terminology. Pay attention to the difference between “tool use block” and “tool result” — these are the exact terms used in the API response schema.

  2. Write clear tool definitions. The description field in a tool definition is read by Claude, not just humans. Practise writing tool descriptions in natural, precise English: “Search the product catalogue by keyword and optional price range. Returns up to 20 matching products.”

  3. Practise explaining streaming to non-technical stakeholders. A common communication challenge: “Rather than waiting several seconds for the full response, the user sees text appearing word by word, which makes the interface feel much more responsive.”

  4. Use “thinking budget” in architecture conversations. When discussing extended thinking, frame it in terms of budget and cost trade-offs: “We allocate a larger thinking budget for complex queries and a smaller one for simple lookups to balance quality and cost.”

Conclusion

The Anthropic Claude API vocabulary — tool use, extended thinking, thinking budget, prompt caching, and streaming — is precise and important to understand for production integrations. Using the correct terms in documentation, code comments, and team discussions shows technical depth and prevents the ambiguity that arises from using vague synonyms. As Claude’s capabilities evolve, the vocabulary will grow, making it valuable to build a habit of reading official documentation carefully.