Memory transforms AI agents from stateless responders into entities that learn and adapt. Understanding memory architectures is essential for building agents that improve with experience.
## The Memory Challenge
LLMs have no inherent memory between sessions. Each conversation starts fresh. For agents performing ongoing tasks, this limitation is critical:
- Context from previous interactions is lost
- Learned preferences must be re-explained
- Mistakes may repeat without correction
- Relationship building is impossible
## Memory Types for AI Agents
### Short-term Memory (Working Memory)
Maintains context within a session:
```python
class WorkingMemory:
    def __init__(self, max_tokens=4000):
        self.messages = []
        self.max_tokens = max_tokens

    def add(self, message):
        self.messages.append(message)
        self._prune_if_needed()

    def _token_count(self):
        # Rough estimate: ~4 characters per token
        return sum(len(m["content"]) for m in self.messages) // 4

    def _prune_if_needed(self):
        # Remove oldest non-system messages (index 0 holds the system
        # prompt); the length guard prevents looping forever once only
        # the system message remains
        while self._token_count() > self.max_tokens and len(self.messages) > 1:
            self.messages.pop(1)
```
Strategies for managing short-term memory:
- Sliding window: Keep recent N messages
- Summarization: Compress old context
- Importance scoring: Retain critical information
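The summarization strategy can be sketched as a helper that folds older messages into one synthetic system message while keeping the recent turns verbatim. `summarize_fn` is a hypothetical stand-in for whatever LLM summarization call you use:

```python
def summarize_and_prune(messages, summarize_fn, keep_recent=4):
    """Compress all but the most recent messages into a single summary.

    `summarize_fn` is a placeholder for an LLM summarization call that
    takes a list of messages and returns a string.
    """
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize_fn(old),
    }
    return [summary] + recent
```

Because the summary replaces many messages with one, repeated consolidation keeps token usage roughly constant at the cost of detail in older turns.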
### Long-term Memory (Semantic Memory)
Persists information across sessions:
```python
class LongTermMemory:
    def __init__(self, vector_store):
        self.vector_store = vector_store

    def remember(self, information, metadata):
        embedding = embed(information)
        self.vector_store.upsert(
            id=generate_id(),
            vector=embedding,
            # Store the text itself so recall() can return it
            metadata={**metadata, "content": information, "timestamp": now()},
        )

    def recall(self, query, k=5):
        results = self.vector_store.query(embed(query), top_k=k)
        return [r.metadata["content"] for r in results]
```
Use cases:
- User preferences and history
- Domain knowledge accumulation
- Learned procedures and patterns
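To make the retrieval flow concrete, here is a minimal in-memory stand-in for the vector store used above. The bag-of-words "embedding" and cosine scoring are purely illustrative assumptions; a real system would use a learned embedding model and a database such as those listed under implementation considerations:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts (stand-in for a real model)
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class InMemoryVectorStore:
    def __init__(self):
        self.items = {}  # id -> (vector, metadata)

    def upsert(self, id, vector, metadata):
        self.items[id] = (vector, metadata)

    def query(self, vector, top_k=5):
        # Return metadata of the top_k most similar stored vectors
        ranked = sorted(self.items.values(),
                        key=lambda item: cosine(vector, item[0]),
                        reverse=True)
        return [metadata for _, metadata in ranked[:top_k]]

store = InMemoryVectorStore()
store.upsert("1", embed("user prefers dark mode"),
             {"content": "user prefers dark mode"})
store.upsert("2", embed("user lives in Berlin"),
             {"content": "user lives in Berlin"})
hits = store.query(embed("dark mode preference"), top_k=1)
```

The query about "dark mode preference" retrieves the preference memory rather than the location memory, which is the behavior `recall()` relies on.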
### Episodic Memory
Stores specific experiences as narratives:
```python
class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def record_episode(self, trigger, actions, outcome, learnings):
        self.episodes.append({
            "trigger": trigger,
            "actions": actions,
            "outcome": outcome,  # "success" or "failure"
            "learnings": learnings,
            "timestamp": now(),
        })

    def find_similar_episodes(self, situation):
        # Find past experiences relevant to the current situation;
        # semantic_search is a placeholder for an embedding-based lookup
        return semantic_search(self.episodes, situation)
```
Enables:
- Learning from mistakes
- Applying past solutions
- Explaining reasoning based on experience
## Memory Architecture Patterns
### Tiered Memory
```
Immediate Context → Working Memory  → Long-term Store
       ↓                  ↓                 ↓
  Current turn      Session history   Permanent storage
```
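One way to wire the tiers together is a lookup that falls through from fastest to slowest, with explicit demotion at turn and session boundaries. This is a sketch; the dict-backed tiers and method names are assumptions, not a fixed standard:

```python
class TieredMemory:
    """Answer lookups from the fastest tier that has the information."""

    def __init__(self):
        self.immediate = {}   # current turn: key -> value
        self.working = {}     # session history
        self.long_term = {}   # permanent store

    def put(self, key, value):
        self.immediate[key] = value

    def get(self, key):
        # Check tiers from most to least recent
        for tier in (self.immediate, self.working, self.long_term):
            if key in tier:
                return tier[key]
        return None

    def end_turn(self):
        # Demote current-turn facts into session history
        self.working.update(self.immediate)
        self.immediate.clear()

    def end_session(self):
        # Persist session history into long-term storage
        self.end_turn()
        self.long_term.update(self.working)
        self.working.clear()
```

The fallthrough order means a fact remains retrievable with the same `get()` call no matter which tier it has migrated to.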
### Memory Consolidation
Analogous to memory consolidation during human sleep, periodically distill working memory into long-term stores:
```python
async def consolidate_memory(session):
    # Extract key information from the session
    summary = await llm.summarize(session.messages)
    entities = await llm.extract_entities(session.messages)
    learnings = await llm.identify_learnings(session.messages)

    # Store in long-term memory
    long_term.remember(summary, {"type": "session_summary"})
    for entity in entities:
        long_term.remember(entity, {"type": "entity"})
    for learning in learnings:
        episodic.record_episode(**learning)
```
## Retrieval Strategies
When to retrieve memories:
- At conversation start: Load user context
- On topic change: Fetch relevant knowledge
- Before actions: Check for past similar situations
- On confusion: Search for clarifying information
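The "on topic change" trigger needs a change detector. A crude lexical-overlap heuristic can serve as a sketch; a production agent would compare message embeddings instead, and the 0.2 threshold here is an arbitrary illustrative choice:

```python
def detect_topic_change(prev_message, new_message, threshold=0.2):
    # Jaccard similarity over word sets: low overlap suggests the user
    # has moved to a new topic, so relevant memories should be fetched
    prev_words = set(prev_message.lower().split())
    new_words = set(new_message.lower().split())
    if not prev_words or not new_words:
        return False
    jaccard = len(prev_words & new_words) / len(prev_words | new_words)
    return jaccard < threshold
```

When the detector fires, the agent would run a `recall()` query built from the new message before generating its reply.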
## Privacy and Retention
Memory systems must respect:
- User consent for data storage
- Right to deletion
- Data minimization principles
- Retention policies
```python
def forget_user(user_id):
    # Honor a deletion request across every persistent store
    long_term.delete_by_metadata({"user_id": user_id})
    episodic.delete_by_metadata({"user_id": user_id})
```
## Implementation Considerations
- Vector databases: Pinecone, Weaviate, Chroma
- Storage costs scale with memory size
- Retrieval latency impacts response time
- Memory quality degrades without maintenance
Effective memory systems are what separate demo agents from production agents. Invest in memory architecture early.