Get started

Add Agent Memory to an agent so it can recall durable context across conversations.

This guide uses the Agents SDK and its Session API to expose memory recall as a model-callable tool. The same pattern applies if you use another agent framework: store memories with ingest() or remember(), expose recall() through one of your agent's tools, and use the system prompt to tell the model when to search memory.

Prerequisites

  1. Sign up for a Cloudflare account.
  2. Install Node.js.

Node.js version manager

Use a Node version manager like Volta or nvm to avoid permission issues and to switch between Node.js versions. Wrangler, used later in this guide, requires Node.js 16.17.0 or later.

You also need access to Agent Memory.

How agent memory works

Use recall() when the model needs relevant memory to answer or act. Use ingest() when you have conversation messages and want Agent Memory to extract durable memories automatically. Use remember() when your agent already knows the exact memory to store.

Do not call ingest() after every model turn. Instead, batch ingestion after the user goes idle, when a conversation is compacted, or at another natural checkpoint.
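The batching advice above can be sketched as a small buffer that collects messages during a conversation and hands them off once, at a checkpoint. This is a minimal sketch, not part of Agent Memory: IngestBuffer and MemoryMessage are hypothetical names, and the { role, content } shape mirrors the objects this guide later passes to ingest().

```typescript
// Hypothetical message shape, mirroring the { role, content } objects passed to ingest() in step 5.
type MemoryMessage = { role: "user" | "assistant"; content: string };

// Hypothetical helper: collects messages and releases them as one batch.
class IngestBuffer {
  private pending: MemoryMessage[] = [];

  // Record a message as the conversation progresses. No network call happens here.
  add(message: MemoryMessage): void {
    this.pending.push(message);
  }

  // Call at a checkpoint (user idle, compaction). Returns everything
  // collected so far and resets the buffer.
  drain(): MemoryMessage[] {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

// Three messages are collected, then released as a single batch.
const buffer = new IngestBuffer();
buffer.add({ role: "user", content: "I prefer TypeScript examples." });
buffer.add({ role: "assistant", content: "Noted." });
buffer.add({ role: "user", content: "Keep answers concise." });

const batch = buffer.drain(); // contains all three messages
const secondBatch = buffer.drain(); // empty: nothing new since the last checkpoint
```

At the checkpoint, the caller would pass a non-empty batch to ingest() in one call and skip the call entirely when the batch is empty.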

1. Create a project

Create a Worker project:

npm create cloudflare@latest -- memory-agent

For setup, select the following options:

  • For What would you like to start with?, choose Hello World example.
  • For Which template would you like to use?, choose Worker only.
  • For Which language do you want to use?, choose TypeScript.
  • For Do you want to use git for version control?, choose Yes.
  • For Do you want to deploy your application?, choose No (we will be making some changes before deploying).

Move into the project directory:

Terminal window
cd memory-agent

Install the dependencies used by this guide:

npm i agents ai workers-ai-provider

2. Create a namespace

A namespace scopes the memory profiles for your application. Create one with Wrangler:

npx wrangler agent-memory namespace create my-agent

You will use the namespace name, my-agent, in your Worker binding.

3. Configure bindings

Add an agent_memory binding to your Wrangler configuration. If you use the Agents SDK, also register your agent Durable Object.

JSONC
{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "name": "memory-agent",
  "main": "src/server.ts",
  // Set this to today's date
  "compatibility_date": "2026-06-04",
  "compatibility_flags": [
    "nodejs_compat"
  ],
  "ai": {
    "binding": "AI"
  },
  "agent_memory": [
    {
      "binding": "MEMORY",
      "namespace": "my-agent"
    }
  ],
  "durable_objects": {
    "bindings": [
      {
        "name": "ChatAgent",
        "class_name": "ChatAgent"
      }
    ]
  },
  "migrations": [
    {
      "tag": "v1",
      "new_sqlite_classes": [
        "ChatAgent"
      ]
    }
  ]
}

Generate local TypeScript types for your bindings:

npx wrangler types

4. Add memory recall as a tool

The model cannot use memory just because your application has a memory binding. You need to expose recall through a tool and instruct the model when to call it.

With the Agents SDK Session API, add a searchable context provider. Session turns the provider's search() method into a search_context tool for the model.

Create src/server.ts and add the recall setup:

src/server.ts
import { Agent, routeAgentRequest } from "agents";
import { Session } from "agents/experimental/memory/session";
const INSTRUCTIONS = "You are a helpful assistant.";
const MEMORY_CONTEXT = `Long-term memory is available through the search_context tool.
MEMORY POLICY
- Search memory with search_context when the user asks what you know or remember about them.
- Search memory when the request depends on prior sessions, preferences, project state, conventions, decisions, or long-running tasks.
- Phrase memory searches as concise topics, not questions.
- Do not search memory to repeat something the user just said in the current conversation.
- When search_context returns results, always incorporate them into your response. The results are real memories from previous conversations.
- Treat recalled memories as helpful context, not guaranteed truth. If a memory is important for an irreversible action, confirm with the user.`;
const MEMORY_PROFILE_NAME = "demo-user";
export class ChatAgent extends Agent {
  initialState = { cursor: 0, nextIngestAt: null };
  session = Session.create(this)
    .withContext("instructions", {
      provider: { get: async () => INSTRUCTIONS },
    })
    .withContext("memory", {
      description:
        "Searchable durable memory: facts, events, instructions, and tasks from prior conversations.",
      provider: {
        get: async () => MEMORY_CONTEXT,
        search: async (query) => {
          const profile = await this.env.MEMORY.getProfile(MEMORY_PROFILE_NAME);
          const { answer } = await profile.recall(query, {
            responseLength: "short",
          });
          return answer || "No relevant memories found.";
        },
      },
    })
    .withCachedPrompt();
}
export default {
  async fetch(request, env) {
    return (
      (await routeAgentRequest(request, env)) ??
      new Response("Not found", { status: 404 })
    );
  },
};

The system prompt is as important as the tool. It tells the model when to call search_context, when not to call it, and how to treat recalled memory.
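If you want to test the recall path without a model in the loop, one option is to pull the body of search() into a standalone function. The following is a minimal sketch under stated assumptions: searchMemory, RecallProfile, and NO_MEMORY are hypothetical names, the recall() shape is inferred from the code above rather than from an API reference, and falling back to the "no memories" message on errors is a design choice of this sketch, not documented Agent Memory behavior.

```typescript
// Minimal structural type for the part of a profile this sketch uses.
// ASSUMPTION: recall() resolves to an object with an optional `answer` string,
// as the step 4 code suggests.
type RecallProfile = {
  recall(
    query: string,
    options: { responseLength: "short" },
  ): Promise<{ answer?: string }>;
};

const NO_MEMORY = "No relevant memories found.";

// Hypothetical helper: the logic of search() as a plain, testable function.
async function searchMemory(
  profile: RecallProfile,
  query: string,
): Promise<string> {
  const topic = query.trim();
  if (!topic) return NO_MEMORY; // nothing to search for
  try {
    const { answer } = await profile.recall(topic, { responseLength: "short" });
    return answer || NO_MEMORY;
  } catch {
    // Sketch-level choice: degrade to "no memories" instead of failing the model turn.
    return NO_MEMORY;
  }
}

// A fake profile lets you exercise the function without access to Agent Memory.
const fakeProfile: RecallProfile = {
  recall: async (query) => ({
    answer:
      query === "language preferences" ? "Prefers TypeScript examples." : "",
  }),
};
```

Inside the agent, search() could then delegate to searchMemory(profile, query) with the real profile returned by getProfile().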

5. Extract memories from conversation

Next, give your agent a way to add durable memories. In a chat agent, the usual path is to store the conversation in Session, then call ingest() after the user goes idle.

Change the agents import and add the AI SDK imports. Keep the Session import from step 4.

src/server.ts
import { Agent, getAgentByName, routeAgentRequest } from "agents";
import { convertToModelMessages, generateText, stepCountIs } from "ai";
import { createWorkersAI } from "workers-ai-provider";

Add the ingestion delay near the top of the file, below the imports:

src/server.ts
const MEMORY_INGEST_DELAY_SECONDS = 10;

Then update ChatAgent with the following shape. The comment marks where to keep the Session setup from step 4.

src/server.ts
export class ChatAgent extends Agent {
  initialState = { cursor: 0, nextIngestAt: null };
  // Keep the `session = Session.create(this)` setup from step 4 here.
  async chat(message) {
    const userMessage = {
      id: `user-${crypto.randomUUID()}`,
      role: "user",
      parts: [{ type: "text", text: message }],
    };
    await this.session.appendMessage(userMessage);
    await this.scheduleIngest();
    const workersai = createWorkersAI({ binding: this.env.AI });
    const result = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      system: await this.session.freezeSystemPrompt(),
      messages: await convertToModelMessages(await this.session.getHistory()),
      tools: await this.session.tools(),
      stopWhen: stepCountIs(5),
    });
    const assistantMessage = {
      id: `assistant-${crypto.randomUUID()}`,
      role: "assistant",
      parts: [{ type: "text", text: result.text }],
    };
    await this.session.appendMessage(assistantMessage);
    return result.text;
  }
  async ingestScheduledMemory() {
    await this.runIngest();
  }
  async scheduleIngest() {
    await this.cancelPendingIngest();
    await this.schedule(
      MEMORY_INGEST_DELAY_SECONDS,
      "ingestScheduledMemory",
      {},
    );
    this.setState({
      ...this.state,
      nextIngestAt: Date.now() + MEMORY_INGEST_DELAY_SECONDS * 1000,
    });
  }
  async cancelPendingIngest() {
    const pending = await this.listSchedules();
    for (const schedule of pending) {
      if (schedule.callback === "ingestScheduledMemory") {
        await this.cancelSchedule(schedule.id);
      }
    }
  }
  async runIngest() {
    const history = await this.session.getHistory();
    const messages = history
      .slice(this.state.cursor)
      .filter(
        (message) => message.role === "user" || message.role === "assistant",
      )
      .map((message) => ({
        role: message.role,
        content: message.parts
          .map((part) => (part.type === "text" ? part.text : ""))
          .filter(Boolean)
          .join("\n\n"),
      }))
      .filter((message) => message.content);
    if (messages.length === 0) {
      this.setState({ ...this.state, nextIngestAt: null });
      return { ingested: 0 };
    }
    const profile = await this.env.MEMORY.getProfile(MEMORY_PROFILE_NAME);
    await profile.ingest(messages, { sessionId: this.name });
    this.setState({
      ...this.state,
      cursor: history.length,
      nextIngestAt: null,
    });
    return { ingested: messages.length };
  }
}

Replace the default export with a small test endpoint. Each conversationId maps to a separate Agent instance with its own Session history.

src/server.ts
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    if (request.method === "POST" && url.pathname === "/chat") {
      const { message, conversationId = "default" } = await request.json();
      if (!message) {
        return Response.json({ error: "Missing message" }, { status: 400 });
      }
      const agent = await getAgentByName(env.ChatAgent, conversationId);
      const response = await agent.chat(message);
      return Response.json({ response });
    }
    return (
      (await routeAgentRequest(request, env)) ??
      new Response("Not found", { status: 404 })
    );
  },
};

The ingestion path is scheduleIngest() and runIngest(). Each user message cancels the previous pending ingest and schedules a new one, so Agent Memory processes the conversation after the user goes idle instead of after every agent turn.
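The cancel-then-reschedule behavior described above is a debounce. To make the timing concrete, here is a minimal sketch that models it with an in-memory stand-in and explicit timestamps. IdleIngestScheduler is a hypothetical class for illustration only; it does not use the Agents SDK scheduler.

```typescript
// Hypothetical in-memory stand-in for the agent's scheduler, used only to
// illustrate the timing. Times are plain millisecond values supplied by the caller.
class IdleIngestScheduler {
  private runAt: number | null = null;
  private readonly delayMs: number;

  constructor(delayMs: number) {
    this.delayMs = delayMs;
  }

  // Called on every user message: any pending ingest is replaced by a later one.
  onUserMessage(now: number): void {
    this.runAt = now + this.delayMs;
  }

  // Called by a clock: returns true once when the pending ingest is due.
  tick(now: number): boolean {
    if (this.runAt !== null && now >= this.runAt) {
      this.runAt = null;
      return true;
    }
    return false;
  }
}

// Three messages, 4 seconds apart, with a 10 second idle delay.
const scheduler = new IdleIngestScheduler(10000);
scheduler.onUserMessage(0);
scheduler.onUserMessage(4000);
scheduler.onUserMessage(8000);

const firedEarly = scheduler.tick(10000); // false: the last message moved the deadline to 18000
const firedAfterIdle = scheduler.tick(18000); // true: 10 seconds have passed since the last message
const firedAgain = scheduler.tick(19000); // false: nothing is pending until the next message
```

Three user messages result in one ingest, 10 seconds after the last of them.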

runIngest() uses a cursor so each batch only includes messages that have not already been ingested. The sessionId groups memories by conversation inside the shared memory profile.
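To see how the cursor keeps batches from overlapping, the selection logic in runIngest() can be written as a pure function and run against a small history. This is a sketch: selectUningested and the type names are hypothetical, and the message shape is reduced to the fields the guide's code reads.

```typescript
// Reduced, hypothetical versions of the message shapes used in this guide.
type Part = { type: string; text?: string };
type SessionMessage = { role: string; parts: Part[] };
type IngestMessage = { role: string; content: string };

// Returns the messages added since `cursor`, flattened to { role, content },
// plus the cursor value to store after a successful ingest.
function selectUningested(
  history: SessionMessage[],
  cursor: number,
): { messages: IngestMessage[]; nextCursor: number } {
  const messages = history
    .slice(cursor)
    .filter((m) => m.role === "user" || m.role === "assistant")
    .map((m) => ({
      role: m.role,
      content: m.parts
        .map((p) => (p.type === "text" ? (p.text ?? "") : ""))
        .filter(Boolean)
        .join("\n\n"),
    }))
    .filter((m) => m.content); // drop messages with no text, such as tool-only turns
  return { messages, nextCursor: history.length };
}

const history: SessionMessage[] = [
  { role: "user", parts: [{ type: "text", text: "I prefer TypeScript examples." }] },
  { role: "assistant", parts: [{ type: "text", text: "Noted." }] },
];
const first = selectUningested(history, 0); // both messages, nextCursor 2

history.push(
  { role: "assistant", parts: [{ type: "tool-call" }] }, // no text part
  { role: "user", parts: [{ type: "text", text: "Keep answers concise." }] },
);
const second = selectUningested(history, first.nextCursor); // only the new user message, nextCursor 4
```

The second batch starts where the first one ended, so the two earlier messages are not sent again.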

This demo uses one Agent Memory profile, demo-user, across multiple conversations. In production, choose profile names that match your application scope, such as users, teams, tenants, or organizations.
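One way to move from a fixed profile name to scoped names is a small helper that derives the name from the scope. This is a sketch under assumptions: profileNameFor and MemoryScope are hypothetical, the kind-id naming scheme is only one option, and this guide does not state which characters Agent Memory accepts in profile names, so the normalization here is deliberately conservative.

```typescript
// Hypothetical scope type covering the examples named above.
type MemoryScope =
  | { kind: "user"; id: string }
  | { kind: "team"; id: string }
  | { kind: "tenant"; id: string };

// Hypothetical helper: builds a profile name such as "user-1234".
// ASSUMPTION: lowercase letters, digits, and hyphens are safe in profile names.
function profileNameFor(scope: MemoryScope): string {
  // Lowercase the id and replace every run of other characters with a hyphen.
  const safeId = scope.id.toLowerCase().replace(/[^a-z0-9-]+/g, "-");
  return `${scope.kind}-${safeId}`;
}

const userProfile = profileNameFor({ kind: "user", id: "Ada@Example.com" });
// → "user-ada-example-com"
const teamProfile = profileNameFor({ kind: "team", id: "Platform" });
// → "team-platform"
```

The result would replace MEMORY_PROFILE_NAME in the getProfile() calls. Because normalization can map different ids to the same name, prefer stable opaque ids, such as database keys, over email addresses.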

You can also call the ingestion logic from a Session compaction hook. The important constraint is to ingest in batches at natural checkpoints, not after every agent turn. In production, choose an ingest delay that matches your application's user experience.

6. (Optional) Store explicit memories when needed

Automatic ingestion is enough for most apps. If you want the model to store a specific memory immediately, add a server-side tool whose execute function calls remember().

Use this when the agent already knows the exact memory to store. For example, the model might call a rememberMemory tool after the user says: "Remember that I prefer concise answers."

If the model can call a memory-write tool, add system prompt instructions that define what is worth remembering and when to ask for confirmation. For many agents, automatic conversation ingestion is simpler and safer than giving the model a direct memory-write tool.
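A memory-write tool along these lines might look like the following sketch. Treat the details as assumptions: this guide does not show the signature of remember(), so the sketch assumes it accepts the memory text as a single string. createRememberTool, normalizeMemory, and the 500-character cap are hypothetical, and the returned object is a plain { description, execute } pair that you would adapt to your framework's tool format.

```typescript
// ASSUMPTION: remember() takes the memory text as one string.
// The guide does not show its signature, so check the API reference.
type RememberProfile = { remember(content: string): Promise<unknown> };

const MAX_MEMORY_LENGTH = 500; // hypothetical cap to keep stored memories short

// Collapses whitespace and rejects empty or overly long input.
function normalizeMemory(text: string): string | null {
  const cleaned = text.replace(/\s+/g, " ").trim();
  if (!cleaned || cleaned.length > MAX_MEMORY_LENGTH) return null;
  return cleaned;
}

// Hypothetical factory: returns a framework-neutral tool definition.
function createRememberTool(profile: RememberProfile) {
  return {
    description:
      "Store one durable fact or preference that the user explicitly asked you to remember.",
    execute: async ({ memory }: { memory: string }): Promise<string> => {
      const content = normalizeMemory(memory);
      if (content === null) {
        return "Nothing stored: the memory was empty or too long.";
      }
      await profile.remember(content);
      return `Stored: ${content}`;
    },
  };
}
```

Validating on the server side means a malformed tool call from the model results in a readable message instead of an empty or oversized memory.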

7. Test the app

Start local development:

npx wrangler dev

Ask the first conversation to remember a durable preference:

Terminal window
curl -X POST "http://localhost:8787/chat" \
-H "Content-Type: application/json" \
-d '{"conversationId":"first-chat","message":"I prefer TypeScript examples and concise answers."}'

Wait at least 30 seconds before sending the next request. The code schedules ingestion to run 10 seconds after the user goes idle, and Agent Memory then needs additional time to extract, classify, and index the memories before they are available for recall.

Ask a different conversation a question that depends on durable memory. This request uses a different Session history but the same Agent Memory profile.

Terminal window
curl -X POST "http://localhost:8787/chat" \
-H "Content-Type: application/json" \
-d '{"conversationId":"second-chat","message":"What do you know or remember about me and my preferences?"}'

The model should call search_context, receive recalled memory from Agent Memory, and use that context in its response. The second conversation has no shared Session history with the first, so any knowledge of user preferences comes from Agent Memory.

Next steps