Why this matters: AI agents are in production. Whether you are building with LangGraph, AutoGen, CrewAI, or the Claude Agents SDK, you need to discuss agent loops, tool use, memory, and safety precisely.

Key agent vocabulary

Agent architecture

  • "The agent follows a ReAct loop — reason, act, observe, repeat."
  • "Each iteration of the loop is an agent step."
  • "The full sequence of steps is the agent's trajectory."

Tool use & memory

  • "The agent selected web_search from its tool registry."
  • "Episodic memory stores what happened in past conversations."
  • "We use a vector store for semantic long-term memory."

Safety & observability

  • "Output guardrails block harmful content before returning it."
  • "We added a human-in-the-loop checkpoint before file deletion."
  • "Each run produces a trace with individual spans."

Frequently Asked Questions

What is the ReAct pattern in AI agent architecture?

ReAct (Reasoning + Acting) is a framework where an LLM-based agent interleaves reasoning traces with actions. In each step the agent produces a Thought (what it is trying to do), selects an Action (a tool call), and receives an Observation (the tool result), then repeats until the task is complete. This think-act-observe loop is the foundation of most modern agent frameworks including LangGraph and the Claude Agents SDK. The full sequence of steps is called the agent's trajectory.

What does "tool use" mean in the context of AI agents?

Tool use (also called function calling) is the ability of an LLM to invoke external capabilities — such as a web search, a code executor, a database query, or a file system operation — by generating a structured tool call. The agent selects tools from its tool registry (the list of available tools with their schemas), generates the call arguments, receives the result, and incorporates it into its reasoning. Tool use is what allows agents to take actions in the real world beyond text generation.

What is the difference between an orchestrator and a sub-agent in multi-agent systems?

In a multi-agent system, the orchestrator is the top-level agent responsible for planning, task decomposition, and delegating work to specialised sub-agents. Sub-agents handle specific subtasks and return results to the orchestrator. Frameworks like AutoGen and CrewAI formalise this hierarchy. The orchestrator maintains the overall goal and decides when sub-agent results are sufficient or when to retry. Handoffs are the mechanism by which control transfers between agents.

What is agent memory and what types exist?

Agent memory refers to the mechanisms an agent uses to store and retrieve information. The main types are: in-context memory (information held within the active context window for the current session), episodic memory (records of past interactions, typically stored externally), semantic memory (factual knowledge stored in a vector store and retrieved via similarity search), and procedural memory (learned behaviours or skills). Context window management — deciding what to keep, summarise, or offload — is a core agent design challenge.

What are agent guardrails and why are they important?

Guardrails are safety and quality mechanisms applied to agent inputs and outputs. Input guardrails check whether incoming user requests are safe and within scope before the agent processes them. Output guardrails check agent responses before they are returned — filtering harmful content, validating that actions are within permitted scope, or enforcing format requirements. Human-in-the-loop checkpoints pause execution and require human approval before high-stakes actions (e.g. sending emails, modifying databases, spending money).

How is agent observability different from standard application monitoring?

Standard monitoring tracks metrics like uptime and latency. Agent observability requires tracking the entire agent trajectory: each LLM call (with token counts and costs), every tool invocation and its result, reasoning steps, and the final output. This is typically structured as a trace (the full run) made up of individual spans (each step). Platforms like LangSmith, Langfuse, and Arize Phoenix provide agent-specific observability. Token budget tracking is especially important since long agent runs can become expensive.

What is the planner-executor agentic design pattern?

In the planner-executor pattern, a planning agent first decomposes the goal into a structured task list or execution plan, then passes this to an executor agent (or agents) that carries out each step. The planner focuses on strategy and sequencing; the executor focuses on action. This separation reduces errors caused by single-agent context overload and allows the planner to verify intermediate results and replan if a step fails.

What does "agentic workflow" mean and how does it differ from a simple LLM call?

An agentic workflow is a sequence of LLM decisions and actions that run autonomously over multiple steps to complete a complex task. Unlike a single LLM call (prompt in, response out), an agentic workflow involves multiple iterations, tool calls, memory retrieval, and potentially multiple agents. The key characteristics are autonomy (the system decides its own next steps), tool use (ability to interact with external systems), and persistence (maintaining state across steps).

What are agent evals and why do AI teams run them?

Agent evals (evaluations) are systematic tests that measure how well an agent performs its intended task. Unlike unit tests that check deterministic code, agent evals account for non-determinism by running agents over benchmark datasets and scoring outputs. Common eval dimensions include: task completion rate, trajectory efficiency (was the shortest path taken?), tool call accuracy, and faithfulness (does the output reflect the retrieved evidence?). Evals are run before deploying agent updates to detect capability regressions.

What is "long-term memory" in AI agents and how is it implemented?

Long-term memory allows agents to recall information beyond the current context window, across sessions. It is typically implemented using a vector store: information is chunked, embedded as vectors, and stored in a database (e.g. Qdrant, Pinecone, pgvector). When relevant history is needed, a similarity search retrieves the most relevant chunks and injects them into the current context. This is distinct from in-context memory (within the prompt) and episodic logs (structured records of past runs).