Agents need memory to maintain context. This guide shows you how to manage conversation history using simple Python data structures.
Coming from Software Engineering? Conversation history management is just session state. If you've built web apps with server-side sessions or a Redis-backed session store, you already know how to manage conversational state. The messages array is your session object, and each LLM call resends the whole thing; the model keeps nothing between calls.
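To make the statelessness concrete, here is a minimal sketch that swaps the real API for a hypothetical stand-in. `fake_llm`, `session`, and `send` are illustrative names, not part of any SDK; the stand-in can only see the messages passed to it on each call.

```python
def fake_llm(messages: list) -> str:
    """Hypothetical stand-in for a chat API: it sees only what is passed in."""
    return f"I can see {len(messages)} message(s)."


session = []  # plays the role of the session object


def send(user_text: str) -> str:
    """Append the user turn, resend the whole session, store the reply."""
    session.append({"role": "user", "content": user_text})
    reply = fake_llm(session)  # the full list goes out on every call
    session.append({"role": "assistant", "content": reply})
    return reply


first = send("My name is Alice")
second = send("What's my name?")
print(first)   # I can see 1 message(s).
print(second)  # I can see 3 message(s).
```

The payload grows with every turn because the caller, not the model, owns the state.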
Why Conversation History Matters
Basic History: List of Dictionaries
The simplest approach - store messages as a list:
# script_id: day_036_conversation_history/basic_chat_history
from openai import OpenAI

client = OpenAI()

# Initialize conversation history
history = []

def chat(user_message: str) -> str:
    """Send a message and maintain history."""
    # Add user message to history
    history.append({
        "role": "user",
        "content": user_message
    })
    # Call the LLM with full history
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."}
        ] + history
    )
    assistant_message = response.choices[0].message.content
    # Add assistant response to history
    history.append({
        "role": "assistant",
        "content": assistant_message
    })
    return assistant_message

# Usage
print(chat("My name is Alice"))
print(chat("What's my name?"))  # Remembers!
print(chat("What did I first tell you?"))  # Still remembers!
Structured History Class
A cleaner, reusable approach:
# script_id: day_036_conversation_history/structured_history_class
from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime

@dataclass
class Message:
    role: str  # "user", "assistant", or "system"
    content: str
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)

class ConversationHistory:
    """Manage conversation history for an agent."""

    def __init__(self, system_prompt: str = "You are a helpful assistant."):
        self.system_prompt = system_prompt
        self.messages: List[Message] = []

    def add_user_message(self, content: str, metadata: Optional[dict] = None):
        """Add a user message to history."""
        self.messages.append(Message(
            role="user",
            content=content,
            metadata=metadata or {}
        ))

    def add_assistant_message(self, content: str, metadata: Optional[dict] = None):
        """Add an assistant message to history."""
        self.messages.append(Message(
            role="assistant",
            content=content,
            metadata=metadata or {}
        ))

    def get_messages_for_api(self) -> List[dict]:
        """Get messages formatted for API call (timestamps and metadata stay local)."""
        api_messages = [{"role": "system", "content": self.system_prompt}]
        for msg in self.messages:
            api_messages.append({
                "role": msg.role,
                "content": msg.content
            })
        return api_messages

    def clear(self):
        """Clear all history."""
        self.messages = []

    def get_last_n_messages(self, n: int) -> List[Message]:
        """Get the last n messages."""
        return self.messages[-n:]

    def __len__(self):
        return len(self.messages)

# Usage
history = ConversationHistory(system_prompt="You are a coding assistant.")
history.add_user_message("How do I read a file in Python?")
history.add_assistant_message("You can use the open() function...")
messages = history.get_messages_for_api()
# Ready to send to LLM!
The rest of this lesson uses a generic add(role, content) for brevity; the interface is interchangeable — only the eviction/persistence behavior changes between sections.
Sliding Window History
The model can only read a fixed amount of text per request — its context window. That text is measured in tokens, which are just chunks of characters (roughly 4 characters, or about 3/4 of a word, each). Because every message you resend counts against that limit, a long conversation eventually won't fit — that's what "token overflow" means. The fix is the same as any bounded cache: drop the oldest entries.
This is a FIFO bounded buffer for your session: deque(maxlen=N) evicts the oldest turn by insertion order, exactly like a capped session ring buffer (nothing is reordered on access, so it is not LRU). Prevent token overflow by keeping only recent messages:
# script_id: day_036_conversation_history/sliding_window_history
from collections import deque

class SlidingWindowHistory:
    """Keep only the most recent N messages."""

    def __init__(self, max_messages: int = 20, system_prompt: str = "You are helpful."):
        self.max_messages = max_messages
        self.system_prompt = system_prompt
        self.messages = deque(maxlen=max_messages)

    def add(self, role: str, content: str):
        """Add a message (oldest messages auto-removed when full)."""
        self.messages.append({"role": role, "content": content})

    def get_messages(self) -> list:
        """Get all messages with system prompt."""
        return [
            {"role": "system", "content": self.system_prompt}
        ] + list(self.messages)

    def get_token_estimate(self) -> int:
        """Rough estimate of token count."""
        total_chars = sum(len(m["content"]) for m in self.messages)
        # Rough rule of thumb: ~4 characters per token for English text.
        # Only an estimate; use tiktoken (shown next) for the exact count.
        return total_chars // 4

# Usage
history = SlidingWindowHistory(max_messages=10)
for i in range(15):
    history.add("user", f"Message {i}")
    history.add("assistant", f"Response {i}")
print(f"Messages kept: {len(history.messages)}")  # 10, not 30
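The eviction order is easier to see in isolation. The sketch below uses a deliberately tiny window and placeholder strings (`m0` to `m4` are made-up stand-ins for messages) to show which entries survive.

```python
from collections import deque

window = deque(maxlen=3)  # tiny window so the eviction is visible
for turn in ["m0", "m1", "m2", "m3", "m4"]:
    window.append(turn)   # once full, each append drops the leftmost item

survivors = list(window)
print(survivors)  # ['m2', 'm3', 'm4']
```

Note that the cap counts messages, not tokens, so a window of a few very long messages can still exceed the model's limit.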
Heads-up: frameworks like LangGraph (covered later in this phase) can manage the message list for you automatically. The hand-rolled versions here teach you what those frameworks do under the hood — worth understanding before you hand it off.
Token-Aware History
Keep messages within a token budget. tiktoken is OpenAI's tokenizer: it encodes text the same way OpenAI models do, so the content counts are exact instead of a chars/4 guess. The fixed per-message overhead added in the class below is still an approximation. (pip install tiktoken. Token counts are model-specific; an Anthropic model would count differently.)
# script_id: day_036_conversation_history/token_aware_history
import tiktoken

class TokenAwareHistory:
    """Manage history within a token budget."""

    def __init__(self, max_tokens: int = 4000, model: str = "gpt-4o-mini"):
        self.max_tokens = max_tokens
        self.model = model
        self.encoder = tiktoken.encoding_for_model(model)
        self.system_prompt = "You are a helpful assistant."
        self.messages = []

    def count_tokens(self, text: str) -> int:
        """Count tokens in text."""
        return len(self.encoder.encode(text))

    def total_tokens(self) -> int:
        """Count total tokens in history."""
        total = self.count_tokens(self.system_prompt)
        for msg in self.messages:
            total += self.count_tokens(msg["content"])
            total += 4  # Approximate overhead per message
        return total

    def add(self, role: str, content: str):
        """Add message, removing old ones if over budget."""
        new_tokens = self.count_tokens(content) + 4
        # Remove old messages until we have room.
        # A single message larger than the budget is still appended once history is empty.
        while self.messages and (self.total_tokens() + new_tokens > self.max_tokens):
            self.messages.pop(0)
            print("Removed old message to stay within budget")
        self.messages.append({"role": role, "content": content})

    def get_messages(self) -> list:
        """Get messages for API call."""
        return [
            {"role": "system", "content": self.system_prompt}
        ] + self.messages

# Usage
history = TokenAwareHistory(max_tokens=1000)
history.add("user", "Tell me about Python")
history.add("assistant", "Python is a programming language..." * 50)  # Long response
history.add("user", "What about JavaScript?")  # Might trigger cleanup
print(f"Current tokens: {history.total_tokens()}")
The print on each eviction is just so you can see the trimming happen — production code would log this at debug level rather than print to stdout.
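As a rough sketch of what that could look like, the function below trims a plain list and reports each eviction through the standard `logging` module. The names `trim_with_logging` and `estimate_tokens`, the logger name, and the chars/4 estimate (standing in for tiktoken so the example has no dependencies) are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger("history")  # logger name is an arbitrary choice


def estimate_tokens(text: str) -> int:
    """Rough chars/4 estimate, standing in for an exact tokenizer here."""
    return len(text) // 4


def trim_with_logging(messages: list, max_tokens: int) -> int:
    """Drop oldest messages in place until the estimate fits; return how many were dropped."""
    dropped = 0
    while messages and sum(estimate_tokens(m["content"]) for m in messages) > max_tokens:
        evicted = messages.pop(0)
        dropped += 1
        logger.debug("Evicted %s message (%d chars)", evicted["role"], len(evicted["content"]))
    return dropped


logging.basicConfig(level=logging.DEBUG)  # show debug lines for this demo only
msgs = [
    {"role": "user", "content": "a" * 400},       # ~100 tokens
    {"role": "assistant", "content": "b" * 400},  # ~100 tokens
    {"role": "user", "content": "c" * 40},        # ~10 tokens
]
dropped = trim_with_logging(msgs, max_tokens=120)
print(dropped)  # 1
```

With the log level left at its default, the eviction messages disappear without touching the trimming code.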
History with Tool Calls
Include tool calls in your history. When the model decides to call a tool (the ReAct pattern from Day 35), three things must land in history in order: the assistant's tool request, the result you fed back (as a role: "tool" message whose tool_call_id matches the request's id), and the assistant's final answer. The class below records that exact sequence so the model can see what it asked for and what came back.
# script_id: day_036_conversation_history/tool_aware_history
from typing import Optional

class ToolAwareHistory:
    """History that handles tool calls."""

    def __init__(self):
        self.messages = []

    def add_user(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: Optional[str] = None, tool_calls: Optional[list] = None):
        """Add assistant message, optionally with tool calls."""
        message = {"role": "assistant"}
        if content:
            message["content"] = content
        if tool_calls:
            message["tool_calls"] = tool_calls
        self.messages.append(message)

    def add_tool_result(self, tool_call_id: str, result: str):
        """Add a tool result; tool_call_id must match the id of the request it answers."""
        self.messages.append({
            "role": "tool",
            "tool_call_id": tool_call_id,
            "content": result
        })

    def get_messages(self) -> list:
        return self.messages.copy()

# Usage with tool calls
history = ToolAwareHistory()
history.add_user("What's the weather in Tokyo?")
# LLM responds with tool call
history.add_assistant(tool_calls=[{
    "id": "call_123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"city": "Tokyo"}'
    }
}])
# Add tool result
history.add_tool_result("call_123", '{"temp": 22, "condition": "sunny"}')
# LLM gives final response
history.add_assistant("The weather in Tokyo is 22°C and sunny!")
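Because each tool result has to point back at an earlier request, any trimming strategy that drops the request but keeps the result leaves an orphan, which a chat API may reject. The checker below is a small sketch of how that pairing could be verified before sending; `find_orphan_tool_results` is a hypothetical helper name, and the message shapes simply mirror the example above.

```python
def find_orphan_tool_results(messages: list) -> list:
    """Return indices of tool messages whose tool_call_id was not requested earlier."""
    requested = set()
    orphans = []
    for index, message in enumerate(messages):
        if message["role"] == "assistant":
            for call in message.get("tool_calls") or []:
                requested.add(call["id"])
        elif message["role"] == "tool":
            if message.get("tool_call_id") not in requested:
                orphans.append(index)
    return orphans


complete = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    }]},
    {"role": "tool", "tool_call_id": "call_123", "content": '{"temp": 22}'},
]
trimmed = complete[2:]  # what a naive window could leave behind

print(find_orphan_tool_results(complete))  # []
print(find_orphan_tool_results(trimmed))   # [0]
```

One way to use it is to keep evicting from the front of the window until the returned list is empty.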
Conversation Branching
Support for exploring different conversation paths:
# script_id: day_036_conversation_history/branchable_history
from typing import Dict, List, Optional
import copy

class BranchableHistory:
    """History that supports branching and rollback."""

    def __init__(self):
        self.messages: List[dict] = []
        self.branches: Dict[str, List[dict]] = {}
        self.current_branch: str = "main"

    def add(self, role: str, content: str):
        """Add message to current branch."""
        self.messages.append({
            "role": role,
            "content": content,
            "branch": self.current_branch
        })

    def create_branch(self, name: str):
        """Fork the current state into a new, independent branch."""
        # Persist the branch we're leaving so it isn't lost...
        self.branches[self.current_branch] = self.messages
        # ...then start the new branch as a fork (independent copy).
        self.current_branch = name
        self.messages = copy.deepcopy(self.branches[name] if name in self.branches
                                      else self.messages)

    def switch_branch(self, name: str):
        """Switch to a different branch."""
        if name not in self.branches and name != self.current_branch:
            raise ValueError(f"Branch {name} not found")
        # Save current branch before swapping, so edits don't leak across branches.
        self.branches[self.current_branch] = self.messages
        self.messages = copy.deepcopy(self.branches[name])
        self.current_branch = name

    def rollback(self, n: int = 1):
        """Remove the last n messages."""
        if n <= 0:
            return
        self.messages = self.messages[:-n]

    def get_messages(self) -> list:
        """Get messages for API (without metadata)."""
        return [
            {"role": m["role"], "content": m["content"]}
            for m in self.messages
        ]

# Usage
history = BranchableHistory()
history.add("user", "Help me write a function")
history.add("assistant", "Sure! What should it do?")
history.add("user", "Calculate factorial")
# Fork off to try a recursive approach (this branch is independent)
history.create_branch("recursive")
history.add("assistant", "Here's a recursive approach...")
# Go back to the shared starting point and fork a separate iterative branch
history.switch_branch("main")
history.create_branch("iterative")
history.add("assistant", "Here's an iterative approach...")
# Branches are independent: "iterative" never sees the "recursive" turn.
history.switch_branch("recursive")  # back to the recursive line of thought
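The class forks with copy.deepcopy rather than a plain list copy. A short sketch of why that matters, using made-up variable names (`main`, `shallow_fork`, `deep_fork`): a shallow copy creates a new list but shares the message dictionaries, so an edit on one branch shows up on the other.

```python
import copy

main = [{"role": "user", "content": "Calculate factorial"}]

shallow_fork = list(main)        # new list, same dict objects inside
deep_fork = copy.deepcopy(main)  # new list, new dict objects

main[0]["content"] = "Calculate fibonacci"  # edit a message on the original

print(shallow_fork[0]["content"])  # Calculate fibonacci
print(deep_fork[0]["content"])     # Calculate factorial
```

If messages are never mutated after being appended, a shallow copy would be enough; the deep copy is the safer default when that is not guaranteed.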
Persisting History
This is just serializing your session store — json.dump/load is the conversational equivalent of writing a Redis session to disk and rehydrating it. Save and load conversation history:
# script_id: day_036_conversation_history/persistent_history
import json
from pathlib import Path
from datetime import datetime

class PersistentHistory:
    """History that can be saved to and loaded from disk."""

    def __init__(self, filepath: str = None):
        self.filepath = filepath
        self.messages = []
        self.metadata = {
            "created_at": datetime.now().isoformat(),
            "updated_at": None
        }
        if filepath and Path(filepath).exists():
            self.load()

    def add(self, role: str, content: str):
        """Add a message."""
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })
        self.metadata["updated_at"] = datetime.now().isoformat()

    def save(self, filepath: str = None):
        """Save history to file."""
        path = filepath or self.filepath
        if not path:
            raise ValueError("No filepath specified")
        data = {
            "metadata": self.metadata,
            "messages": self.messages
        }
        with open(path, 'w') as f:
            json.dump(data, f, indent=2)

    def load(self, filepath: str = None):
        """Load history from file."""
        path = filepath or self.filepath
        if not path:
            raise ValueError("No filepath specified")
        with open(path, 'r') as f:
            data = json.load(f)
        self.metadata = data.get("metadata", {})
        self.messages = data.get("messages", [])

    def get_messages_for_api(self) -> list:
        """Get messages without timestamps."""
        return [
            {"role": m["role"], "content": m["content"]}
            for m in self.messages
        ]

# Usage
history = PersistentHistory("conversation.json")
history.add("user", "Hello!")
history.add("assistant", "Hi there!")
history.save()
# Later...
loaded_history = PersistentHistory("conversation.json")
print(loaded_history.messages)  # Previous conversation loaded!
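One optional hardening step, sketched here under assumptions rather than taken from the class above: writing straight to the target file can leave a half-written JSON file if the process dies mid-save. A common workaround is to write to a temporary file in the same directory and then rename it over the target with os.replace, which is atomic on POSIX systems when both paths are on the same filesystem. `save_atomic` is a hypothetical helper name.

```python
import json
import os
import tempfile


def save_atomic(messages: list, path: str) -> None:
    """Write to a temp file next to the target, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(messages, f, indent=2)
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except Exception:
        os.unlink(tmp_path)  # clean up the partial temp file
        raise


# Round-trip check in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "conversation.json")
    original = [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there!"},
    ]
    save_atomic(original, target)
    with open(target, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == original)  # True
```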
Summary
Quick Reference
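# script_id: day_036_conversation_history/quick_reference
import json
from collections import deque

# Basic history
history = []
history.append({"role": "user", "content": "Hello"})
history.append({"role": "assistant", "content": "Hi!"})

# Sliding window (oldest message dropped automatically)
window = deque(history, maxlen=20)

# Token-aware (count_tokens and max_tokens are placeholders: use tiktoken or chars // 4)
while history and sum(count_tokens(m["content"]) for m in history) > max_tokens:
    history.pop(0)

# Save/Load (dump the plain list; a deque is not JSON-serializable)
with open("history.json", "w") as f:
    json.dump(history, f)
with open("history.json") as f:
    history = json.load(f)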
Exercises
- Build a sliding-window history with deque(maxlen=20) and confirm that the oldest messages drop off after the 21st append, but make sure the system message never gets evicted.
- Write a token_budget_trim(history, max_tokens) that removes the oldest non-system messages until the estimated token count fits, then test it on a long fake conversation.
- Add save(path)/load(path) methods that round-trip the history to JSON and back, and verify the reloaded list equals the original.
- Extend a history entry to carry a tool_calls field, then write a function that reconstructs only the user/assistant turns (dropping tool noise) for a clean transcript view.
Solutions (approaches)
- Keep the system message separate: store it once, build the window from deque(maxlen=19) of the rest, and prepend system on read. Append 21 user turns and assert len(window) == 19.
- Estimate tokens with len(content) // 4 (or tiktoken); loop while total > max_tokens and len(history) > 1: history.pop(1) (index 1 skips system) and recompute the total after each pop.
- json.dump(self.messages, open(path, "w")) and self.messages = json.load(open(path)); assert equality after reload.
- Filter with [m for m in history if m["role"] in ("user", "assistant") and not m.get("tool_calls")].
Checkpoint
Run the SlidingWindowHistory example: add 15 user/assistant pairs to a window of max_messages=10 and print len(history.messages). You should see exactly 10, not 30: the oldest messages were silently evicted by deque(maxlen=...), just like the capped session stores you've used before. If you see 30, you're appending to a plain list somewhere instead of the bounded deque.
What's Next?
Now let's learn about implementing hard stops and max iterations to prevent your agent from running forever!