Agent Memory Vocabulary
5 exercises — master the vocabulary of how AI agents store and retrieve information: in-context vs. external memory, episodic/semantic/procedural memory types, and context window management.
0 / 5 completed
Agent memory vocabulary quick reference
- In-context memory — everything in the current context window (conversation, tool results, scratchpad)
- External memory — information stored outside the LLM, retrieved on demand (vector DB, KV store)
- Episodic memory — records of specific past events, tied to time and context
- Semantic memory — general facts about the world, not tied to a specific event
- Procedural memory — knowledge of how to perform tasks; encoded skills and workflows
- RAG — Retrieval-Augmented Generation: fetching relevant documents before answering
- Context window pressure — when accumulated tokens approach the context limit
- Memory summarization — compressing old context to free up token budget
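The RAG entry above describes a retrieval step before answering. A toy sketch of that step, using word overlap as a stand-in for vector similarity (real systems use embeddings and a vector database; all names here are illustrative):

```python
# Toy RAG-style retrieval: rank documents by word overlap with the query.
# Word overlap is a crude stand-in for embedding similarity.

DOCS = [
    "Episodic memory stores records of specific past events.",
    "Semantic memory holds general facts not tied to one event.",
    "Procedural memory encodes skills and workflows.",
]

def overlap_score(query, doc):
    # Count shared lowercase words between query and document.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d)

def retrieve(query, docs, k=1):
    # Return the k highest-scoring documents for this query.
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)[:k]

best = retrieve("general facts in semantic memory", DOCS)[0]
```

The retrieved text is then prepended to the prompt so the model answers with the fetched facts in context.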
1 / 5
In agentic systems architecture, an AI agent has two fundamentally different places it can store and retrieve information. What are the two primary memory storage categories, and how do engineers distinguish between them?
In-context vs. external memory is the foundational memory architecture distinction in agentic systems.
In-context memory:
• Everything in the agent's context window — the "working memory" of the current run
• Includes: system prompt, conversation history, tool results, scratchpad
• Limitations: bounded by the context window limit (e.g. 128k tokens); not persistent across agent runs
• Cost: every token counts against the LLM API cost
External memory:
• Information stored outside the LLM, retrieved on demand
• Types: vector database (semantic search), key-value store (exact lookup), relational DB (structured queries), document store
• Advantages: effectively unlimited capacity (bounded by the store, not the model), persistent across runs, shareable across agents
• Requires a retrieval step — a tool call to fetch relevant content
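The retrieval step above can be sketched as a minimal key-value external memory with an explicit fetch call. This is an illustrative sketch, not a real library API; the class and method names are assumptions:

```python
# Minimal sketch of external memory: a key-value store outside the
# context window, accessed through an explicit retrieval call.

class ExternalMemory:
    """Key-value store that persists independently of any single agent run."""

    def __init__(self):
        self._store = {}

    def write(self, key, value):
        # Store a fact; costs no context-window tokens until retrieved.
        self._store[key] = value

    def retrieve(self, key, default=None):
        # The agent issues this as a tool call; only the returned value
        # enters the context window, never the whole store.
        return self._store.get(key, default)


memory = ExternalMemory()
memory.write("user:language", "Python")

# In a later run, fetch just the fact needed:
fact = memory.retrieve("user:language")
```

The design point: the store can hold arbitrarily many facts at zero token cost, and the agent pays context-window tokens only for what it actually retrieves.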
The distinction matters because:
• Architectural decisions (what goes in-context vs. what goes external) directly affect agent quality, cost, and latency
• Interview questions asking "how would you scale this agent's knowledge base?" expect this vocabulary
Key vocabulary:
• Context window — the maximum token capacity of the LLM's working memory
• Context stuffing — putting too much into the context window, degrading performance
• Retrieval-augmented memory — fetching relevant facts from external memory on demand
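Context stuffing and memory summarization from the vocabulary above can be illustrated together: when accumulated history nears the token budget, the oldest messages are folded into a summary. A hedged sketch, where token counting is approximated by word count (a real agent would use the model's tokenizer) and `summarize` is a placeholder for an LLM call:

```python
# Sketch of memory summarization under context-window pressure.
# estimate_tokens and summarize are illustrative stand-ins.

def estimate_tokens(text):
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())

def summarize(messages):
    # Placeholder for an LLM summarization call.
    return "SUMMARY: " + "; ".join(m[:30] for m in messages)

def compact_history(history, budget):
    """Fold the oldest messages into one summary line until the
    remaining history fits the token budget."""
    total = sum(estimate_tokens(m) for m in history)
    evicted = []
    while total > budget and len(history) > 1:
        oldest = history.pop(0)        # evict from the front (oldest first)
        evicted.append(oldest)
        total -= estimate_tokens(oldest)
    if evicted:
        history.insert(0, summarize(evicted))  # one compact line replaces many
    return history
```

Recent messages survive verbatim while older context is compressed, trading fidelity of old turns for room to keep reasoning in the current run.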