Agent Memory Vocabulary
5 exercises — master the vocabulary of how AI agents store and retrieve information: in-context vs. external memory, episodic/semantic/procedural memory types, and context window management.
0 / 5 completed
Agent memory vocabulary quick reference
- In-context memory — everything in the current context window (conversation, tool results, scratchpad)
- External memory — information stored outside the LLM, retrieved on demand (vector DB, KV store)
- Episodic memory — records of specific past events, tied to time and context
- Semantic memory — general facts about the world, not tied to a specific event
- Procedural memory — knowledge of how to perform tasks; encoded skills and workflows
- RAG — Retrieval-Augmented Generation: fetching relevant documents before answering
- Context window pressure — when accumulated tokens approach the context limit
- Memory summarization — compressing old context to free up token budget
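The RAG entry above describes a retrieval step before answering. A toy sketch of that step, using word overlap as a stand-in for vector similarity (real systems use embeddings and a vector database; all names here are illustrative):

```python
# Toy RAG-style retrieval: rank documents by word overlap with the query.
# Word overlap is a crude stand-in for embedding similarity.

DOCS = [
    "Episodic memory stores records of specific past events.",
    "Semantic memory holds general facts not tied to one event.",
    "Procedural memory encodes skills and workflows.",
]

def overlap_score(query, doc):
    # Count shared lowercase words between query and document.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d)

def retrieve(query, docs, k=1):
    # Return the k highest-scoring documents for this query.
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)[:k]

best = retrieve("general facts in semantic memory", DOCS)[0]
```

The retrieved text is then prepended to the prompt so the model answers with the fetched facts in context.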
1 / 5
In agentic systems architecture, an AI agent has two fundamentally different places it can store and retrieve information. What are the two primary memory storage categories, and how do engineers distinguish between them?
In-context vs. external memory is the foundational memory architecture distinction in agentic systems.
In-context memory:
• Everything in the agent's context window — the "working memory" of the current run
• Includes: system prompt, conversation history, tool results, scratchpad
• Limitations: bounded by the context window limit (e.g. 128k tokens); not persistent across agent runs
• Cost: every token counts against the LLM API cost
External memory:
• Information stored outside the LLM, retrieved on demand
• Types: vector database (semantic search), key-value store (exact lookup), relational DB (structured queries), document store
• Advantages: effectively unlimited capacity (bounded by the store, not the model), persistent across runs, shareable across agents
• Requires a retrieval step — a tool call to fetch relevant content
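The retrieval step above can be sketched as a minimal key-value external memory with an explicit fetch call. This is an illustrative sketch, not a real library API; the class and method names are assumptions:

```python
# Minimal sketch of external memory: a key-value store outside the
# context window, accessed through an explicit retrieval call.

class ExternalMemory:
    """Key-value store that persists independently of any single agent run."""

    def __init__(self):
        self._store = {}

    def write(self, key, value):
        # Store a fact; costs no context-window tokens until retrieved.
        self._store[key] = value

    def retrieve(self, key, default=None):
        # The agent issues this as a tool call; only the returned value
        # enters the context window, never the whole store.
        return self._store.get(key, default)


memory = ExternalMemory()
memory.write("user:language", "Python")

# In a later run, fetch just the fact needed:
fact = memory.retrieve("user:language")
```

The design point: the store can hold arbitrarily many facts at zero token cost, and the agent pays context-window tokens only for what it actually retrieves.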
The distinction matters because:
• Architectural decisions (what goes in-context vs. what goes external) directly affect agent quality, cost, and latency
• Interview questions asking "how would you scale this agent's knowledge base?" expect this vocabulary
Key vocabulary:
• Context window — the maximum token capacity of the LLM's working memory
• Context stuffing — putting too much into the context window, degrading performance
• Retrieval-augmented memory — fetching relevant facts from external memory on demand
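Context stuffing and memory summarization from the vocabulary above can be illustrated together: when accumulated history nears the token budget, the oldest messages are folded into a summary. A hedged sketch, where token counting is approximated by word count (a real agent would use the model's tokenizer) and `summarize` is a placeholder for an LLM call:

```python
# Sketch of memory summarization under context-window pressure.
# estimate_tokens and summarize are illustrative stand-ins.

def estimate_tokens(text):
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())

def summarize(messages):
    # Placeholder for an LLM summarization call.
    return "SUMMARY: " + "; ".join(m[:30] for m in messages)

def compact_history(history, budget):
    """Fold the oldest messages into one summary line until the
    remaining history fits the token budget."""
    total = sum(estimate_tokens(m) for m in history)
    evicted = []
    while total > budget and len(history) > 1:
        oldest = history.pop(0)        # evict from the front (oldest first)
        evicted.append(oldest)
        total -= estimate_tokens(oldest)
    if evicted:
        history.insert(0, summarize(evicted))  # one compact line replaces many
    return history
```

Recent messages survive verbatim while older context is compressed, trading fidelity of old turns for room to keep reasoning in the current run.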