This is Part 4 of the series.
- Part 1: Why Python Still Dominates in 2026
- Part 2: Build Your Own AI Chatbot — RAG From Scratch to Deployment
- Part 3: One AI Is No Longer Enough — LangGraph Multi-Agent Systems
- Part 4: AI That Finally Remembers — Complete LangGraph Memory Guide ← You are here
📌 Level: Intermediate (Parts 1–3 are sufficient background)
⏱️ Reading time: ~12 min / Hands-on time: ~2 hours
🛠️ End result: An AI assistant that remembers conversations and personalizes over time
In the last part we built a 3-agent AI team.
The researcher, writer, and fact-checker collaborated to produce solid output. But what happens when you open it the next day?
It starts completely fresh.
“I told you my name was Alex yesterday, didn’t I?” → “I’m sorry, I don’t have that information.”
“Write it in the style of that Python article we did last time.” → “What style would you like?”
From the user’s perspective, this is no better than a search box. A real assistant needs to remember.
Today we’re going to attach memory to LangGraph to build an AI that is genuinely yours.
📊 Table of Contents
- Why Memory Matters — The Limits of Amnesiac AI
- Two Types of LangGraph Memory — Short-term vs Long-term
- Short-Term Memory — MemorySaver (within-session recall)
- Mid-Term Memory — SQLiteSaver (local persistent storage)
- Long-Term Memory — PostgreSQL (production-grade)
- The Long-Term Memory Store — Learning User Preferences
- Multi-User Handling — Separating Users with thread_id
- Practical Pattern: Summary Compression for Cost Control
- Memory Design Checklist
1. Why Memory Matters
Feel the difference side by side.
AI without memory:
```
[Day 1] User: "I'm a backend developer and mainly use Python."
        AI:   "Got it!"

[Day 2] User: "Explain this at my level."
        AI:   "Could you tell me what field you're in and what language you use..."
```
AI with memory:
```
[Day 1] User: "I'm a backend developer and mainly use Python."
        AI:   "I'll remember that!"

[Day 2] User: "Explain this at my level."
        AI:   "Framing this for a Python backend developer — since you already
               know asyncio, let me use that as context..."
```
Do you feel the difference? The second AI behaves like an assistant for exactly one reason: it remembers.
LangGraph manages memory in two layers.
2. Two Types of LangGraph Memory
LangGraph’s official memory design is clear.
```
┌─────────────────────────────────────────────────────┐
│             LangGraph Memory Architecture           │
├──────────────────────────┬──────────────────────────┤
│ Short-term Memory        │ Long-term Memory         │
├──────────────────────────┼──────────────────────────┤
│ Within current thread    │ Across multiple sessions │
│ Managed automatically    │ Must be saved/retrieved  │
│ Handled by checkpointer  │ Handled by Store         │
│ Gone when session ends   │ Persists permanently     │
├──────────────────────────┼──────────────────────────┤
│ Examples:                │ Examples:                │
│ - Current conversation   │ - User's name            │
│ - Active task state      │ - Preferred writing style│
│ - Error history          │ - Past projects          │
└──────────────────────────┴──────────────────────────┘
```
- Checkpointer — handles short-term memory: saves the current conversation flow.
- Store — handles long-term memory: persists facts and preferences across sessions.
Production systems need both. Let’s build them step by step.
3. Short-Term Memory — MemorySaver (Within-Session Recall)
Start with the simplest option. MemorySaver is a checkpointer that stores in RAM. It disappears when the server restarts, but within a session it remembers everything.
```python
# memory_basic.py
import os
from dotenv import load_dotenv
from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
from langgraph.graph.message import add_messages
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

load_dotenv()

# ── State Definition ────────────────────────────────
class ChatState(TypedDict):
    # add_messages: accumulates messages instead of overwriting
    messages: Annotated[list, add_messages]

# ── LLM ─────────────────────────────────────────────
llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
    max_tokens=1024
)

# ── Chatbot Node ────────────────────────────────────
def chatbot_node(state: ChatState) -> dict:
    system = SystemMessage(content="""You are a friendly AI assistant with excellent memory.
Always reference the conversation history and respond with appropriate context.""")
    response = llm.invoke([system] + state["messages"])
    return {"messages": [response]}

# ── Graph Assembly ──────────────────────────────────
builder = StateGraph(ChatState)
builder.add_node("chatbot", chatbot_node)
builder.set_entry_point("chatbot")
builder.add_edge("chatbot", END)

# Key line: attach MemorySaver
memory = MemorySaver()
graph = builder.compile(checkpointer=memory)

# ── Chat Function ───────────────────────────────────
def chat(user_input: str, thread_id: str = "default") -> str:
    """
    thread_id: identifies the conversation session.
    Same thread_id = the AI remembers the prior conversation.
    """
    config = {"configurable": {"thread_id": thread_id}}
    result = graph.invoke(
        {"messages": [HumanMessage(content=user_input)]},
        config=config
    )
    return result["messages"][-1].content

# ── Test ────────────────────────────────────────────
if __name__ == "__main__":
    tid = "user-test-001"
    print("=" * 50)
    r1 = chat("Hi! My name is Alex. I'm a Python developer.", tid)
    print(f"AI: {r1}\n")
    r2 = chat("What's my name again?", tid)
    print(f"AI: {r2}\n")
    r3 = chat("Do you remember my job?", tid)
    print(f"AI: {r3}\n")
```
Run this and the AI will remember the name and job within the same session.
⚠️ MemorySaver’s limitation Restarting the server wipes all conversations. Fine for development and testing, not viable for real services.
4. Mid-Term Memory — SQLiteSaver (Local Persistent Storage)
If memory needs to survive a server restart, use SQLite. Everything is stored in a single .db file — no extra installation required, and perfect for personal projects or single-server deployments.
```shell
pip install langgraph-checkpoint-sqlite
```
```python
# memory_sqlite.py
from langgraph.checkpoint.sqlite import SqliteSaver
# Other imports same as above...

# Replace MemorySaver with SqliteSaver
# ":memory:"        → in-memory (testing only)
# "chat_memory.db"  → writes to file (persistent)
with SqliteSaver.from_conn_string("chat_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "user-alex"}}
    graph.invoke(
        {"messages": [HumanMessage("I mainly build FastAPI services with Python.")]},
        config=config
    )
    print("✅ Run 1 complete. Try stopping and restarting the process.")
```
Even after killing and restarting the process, the chat_memory.db file remains and the conversation can be resumed.
```python
# Running again later — the previous conversation is restored
with SqliteSaver.from_conn_string("chat_memory.db") as checkpointer:
    graph = builder.compile(checkpointer=checkpointer)
    config = {"configurable": {"thread_id": "user-alex"}}
    result = graph.invoke(
        {"messages": [HumanMessage("What framework did I say I mainly use?")]},
        config=config
    )
    print(result["messages"][-1].content)
    # → "You said you mainly use FastAPI!"
```
5. Long-Term Memory — PostgreSQL (Production-Grade)
For services that need simultaneous access from multiple servers, or that need to manage millions of conversations, use PostgreSQL. It is the backend LangGraph officially recommends for production.
```shell
pip install langgraph-checkpoint-postgres psycopg psycopg-pool
```
```python
# memory_postgres.py
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from langchain_core.messages import HumanMessage
from langgraph.checkpoint.postgres import PostgresSaver
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

DB_URI = os.getenv("DATABASE_URL")
# e.g. "postgresql://user:password@localhost:5432/mydb"

# ── Synchronous usage ───────────────────────────────
# Keep the compiled graph INSIDE the `with` block — the DB connection
# closes when the block exits, so don't return the graph out of it.
def run_with_postgres():
    with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
        checkpointer.setup()  # Auto-creates the required tables (safe to re-run)
        graph = builder.compile(checkpointer=checkpointer)
        # ... invoke the graph here ...

# ── Pattern for use with FastAPI ────────────────────
graph_instance = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Connect to the DB on startup, release on shutdown"""
    global graph_instance
    async with AsyncPostgresSaver.from_conn_string(DB_URI) as checkpointer:
        await checkpointer.setup()  # Init tables
        graph_instance = builder.compile(checkpointer=checkpointer)
        print("✅ PostgreSQL checkpointer connected")
        yield  # Server is running
    # Connection released automatically on shutdown

app = FastAPI(lifespan=lifespan)

@app.post("/chat/{user_id}")
async def chat_endpoint(user_id: str, message: str):
    config = {"configurable": {"thread_id": user_id}}
    result = await graph_instance.ainvoke(
        {"messages": [HumanMessage(content=message)]},
        config=config
    )
    return {"reply": result["messages"][-1].content}
```
| Environment | Recommended Checkpointer | Characteristics |
|---|---|---|
| Development / Testing | MemorySaver | No install needed, fast, resets on restart |
| Local service | SqliteSaver | File-based, persistent, single-server |
| Production | PostgresSaver | Multi-server, high availability, encryption support |
6. The Long-Term Memory Store — Learning User Preferences
If the checkpointer remembers “conversation flow,” the Store remembers “facts.”
It accumulates things like a user’s preferred writing style, frequently used language, and job role across multiple sessions. Without this, true personalization is impossible.
```python
# memory_store.py
from langgraph.store.memory import InMemoryStore

# Development: InMemoryStore
# Production: connect a persistent store (DB, Redis, etc.)
store = InMemoryStore()

def save_user_preference(user_id: str, key: str, value: str):
    """Save a user preference to the Store"""
    namespace = ("user_prefs", user_id)
    store.put(namespace, key, {"value": value})
    print(f"💾 Saved: [{user_id}] {key} = {value}")

def load_user_preferences(user_id: str) -> dict:
    """Load all preferences for a user"""
    namespace = ("user_prefs", user_id)
    items = store.search(namespace)
    return {item.key: item.value["value"] for item in items}

# ── Personalized chatbot node ───────────────────────
def smart_chatbot_node(state: ChatState, config: dict) -> dict:
    """
    Retrieves stored preferences via the user ID
    and injects them into the system prompt
    """
    user_id = config.get("configurable", {}).get("user_id", "anonymous")
    prefs = load_user_preferences(user_id)

    pref_text = ""
    if prefs:
        pref_text = "\n\n[What I know about this user]\n"
        for k, v in prefs.items():
            pref_text += f"- {k}: {v}\n"

    system = SystemMessage(content=f"""You are a personalized AI assistant.
Provide responses tailored to the user's background and preferences.{pref_text}""")
    response = llm.invoke([system] + state["messages"])

    # Auto-detect and save new preferences from the conversation
    last_msg = state["messages"][-1].content.lower()
    if "python" in last_msg and ("love" in last_msg or "prefer" in last_msg):
        save_user_preference(user_id, "preferred_language", "Python")
    if "developer" in last_msg or "engineer" in last_msg:
        save_user_preference(user_id, "role", "Developer")

    return {"messages": [response]}
```
Practical Usage Example
```python
# Pre-populate user info
save_user_preference("user-alex", "name", "Alex")
save_user_preference("user-alex", "role", "Python Backend Developer")
save_user_preference("user-alex", "preferred_style", "Technical and concise")
save_user_preference("user-alex", "experience", "3 years")

# Verify stored info
prefs = load_user_preferences("user-alex")
print(prefs)
# → {'name': 'Alex', 'role': 'Python Backend Developer', ...}

# From now on, every conversation with this user automatically reflects this info
config = {
    "configurable": {
        "thread_id": "session-001",
        "user_id": "user-alex"  # Used to look up the Store
    }
}
```
7. Multi-User Handling — Separating Users with thread_id
In a real service, multiple users connect simultaneously. thread_id must be designed correctly to prevent conversations from bleeding across users.
```python
# thread_id design patterns

# ✅ Good — unique per user and purpose
config_alex = {"configurable": {"thread_id": "user-alex-general"}}
config_minho = {"configurable": {"thread_id": "user-minho-general"}}

# One user can hold multiple threads
config_alex_work = {"configurable": {"thread_id": "user-alex-work-project-a"}}
config_alex_study = {"configurable": {"thread_id": "user-alex-study-python"}}

# ❌ Bad — all users share the same thread_id
# Conversations will be mixed together
config_bad = {"configurable": {"thread_id": "global"}}
```
```python
# Auto-generate a per-user thread_id in FastAPI
from fastapi import FastAPI, Header
from typing import Optional

app = FastAPI()

@app.post("/chat")
async def chat(
    message: str,
    session_id: str,                          # Client-managed session ID
    x_user_id: Optional[str] = Header(None)   # User ID from JWT
):
    # Combine user ID + session ID for guaranteed uniqueness
    thread_id = f"{x_user_id}-{session_id}"
    config = {"configurable": {
        "thread_id": thread_id,
        "user_id": x_user_id
    }}
    result = await graph_instance.ainvoke(
        {"messages": [HumanMessage(content=message)]},
        config=config
    )
    return {"reply": result["messages"][-1].content, "thread_id": thread_id}
```
8. Practical Pattern: Summary Compression for Cost Control
Memory introduces a problem: the longer the conversation, the higher the token cost.
A 100-turn conversation means sending 100 messages to the LLM every single time. That’s not sustainable.
The solution: compress old messages into a summary.
```python
# memory_summary.py — conversation summary pattern
# llm, ChatState, SystemMessage, HumanMessage as defined earlier
from langchain_core.messages import RemoveMessage

def summarize_if_too_long(state: ChatState) -> dict:
    """
    When the message count exceeds 20, compress older messages into a summary.
    Keeps the most recent 5 messages plus a summary of everything older.
    """
    messages = state["messages"]
    THRESHOLD = 20   # Trigger compression above this count
    KEEP_RECENT = 5  # Always keep the last N messages as-is

    if len(messages) <= THRESHOLD:
        return {}  # Still short — do nothing

    to_summarize = messages[:-KEEP_RECENT]

    # Generate the summary via the LLM
    summary_prompt = f"""Summarize the following conversation in 3–5 sentences.
Be sure to include: the user's name, job, preferences, and key topics discussed.

Conversation:
{chr(10).join([f"{m.type}: {m.content}" for m in to_summarize])}"""

    summary_response = llm.invoke([HumanMessage(content=summary_prompt)])
    summary_message = SystemMessage(
        content=f"[Previous Conversation Summary]\n{summary_response.content}"
    )

    print(f"🗜️ Memory compressed: {len(to_summarize)} messages → 1 summary")
    # add_messages appends by default, so the old messages must be
    # deleted explicitly with RemoveMessage; the recent ones stay put.
    return {"messages": [RemoveMessage(id=m.id) for m in to_summarize] + [summary_message]}
```
```python
# Add the summary node to the graph
builder = StateGraph(ChatState)
builder.add_node("chatbot", chatbot_node)
builder.add_node("summarize", summarize_if_too_long)
builder.set_entry_point("chatbot")
builder.add_edge("chatbot", "summarize")
builder.add_edge("summarize", END)
```
With this pattern, token costs stay roughly flat no matter how long the conversation grows.
| Approach | Context Size After 100 Turns | Cost |
|---|---|---|
| No summary | All 100 messages | 💸💸💸 |
| Summary compression | 1 summary + last 5 messages | 💸 |
9. Memory Design Checklist
Answer these questions before designing memory in your project.
Choosing the right backend
- [ ] Local dev / prototype → `MemorySaver`
- [ ] Single-server small service → `SqliteSaver`
- [ ] Multi-server / production → `PostgresSaver`
Designing thread_id
- [ ] Is every user’s ID unique?
- [ ] Are different conversation topics separated into distinct threads?
- [ ] For logged-in users, are you using a `user_id` + `session_id` combination?
Long-term memory Store
- [ ] Is there information that must persist across sessions? (name, role, preferences)
- [ ] Auto-detection vs manual save — which approach fits?
Cost optimization
- [ ] Is there logic to summarize long conversations?
- [ ] Is only the most recent N messages kept in active context?
Error handling
- [ ] Is there a fallback if the DB connection fails?
- [ ] Does the app degrade gracefully to `MemorySaver` when needed?
Wrapping Up — The Gap Between an AI That Remembers and One That Doesn’t
Before and after adding memory, an AI feels completely different.
An AI without memory is an intern who starts from zero every morning. An AI with memory is a colleague who knows you.
Here’s what we covered:
- MemorySaver → In-session memory. Development and testing only.
- SqliteSaver → File-based persistent memory. Local or small-scale services.
- PostgresSaver → DB-backed. The production standard.
- Store → Cross-session facts and preferences. True personalization.
- Summary pattern → Cost optimization for long conversations.
Combine these layers and you have an AI assistant that knows you, remembers you, and grows with you.
In Part 5 — the finale — we’ll bring everything together: packaging the full AI assistant service with Docker and deploying it to the cloud. Everything built across Parts 1–4 will come together in one complete system.
🔖 Other posts in this series
- Part 1: Why Python Still Dominates in 2026
- Part 2: Build Your Own AI Chatbot — RAG From Scratch to Deployment
- Part 3: One AI Is No Longer Enough — LangGraph Multi-Agent Systems
- Part 4: AI That Finally Remembers — Complete LangGraph Memory Guide ← You are here
- Part 5: Bringing It All Together — Docker Packaging & Cloud Deployment (coming soon)
Tags: #Python #LangGraph #AIMemory #Chatbot #LangChain #PostgreSQL #SQLite #AIDevelopment #DevTutorial #2026
Sources: LangGraph Official Docs · DigitalOcean LangGraph+Mem0 Tutorial · Markaicode LangGraph Memory Guide · LangChain Checkpoint Docs