How Agent Memory works
Agent Memory is a managed service that gives your applications persistent, AI-powered memory. It automatically turns raw conversations into structured knowledge and retrieves the right context when you need it.
Agent Memory classifies every extracted memory into one of four types:
- Facts — Stable knowledge about a person, project, or tool. Preferences, identities, relationships, and goals. Facts evolve over time through supersession: when a newer fact replaces an older one on the same topic, the old version is preserved but the latest surfaces in recall results.
- Events — Completed actions anchored to a point in time. Deployments, decisions, milestones, and observations. Events accumulate and do not conflict with each other.
- Instructions — Reusable procedures, workflows, and conventions. Like facts, instructions support supersession when updated.
- Tasks — Short-lived, session-scoped items such as active investigations and follow-ups. Tasks are deprioritized after the session ends.
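The difference between memories that supersede and memories that accumulate can be sketched with a small in-memory model. This is an illustration only, not the service's implementation: the `Memory` and `MemoryStore` classes, their field names, and the rule that supersession only applies within the same memory type are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    FACT = "fact"
    EVENT = "event"
    INSTRUCTION = "instruction"
    TASK = "task"


# Per the text above, only facts and instructions support supersession.
SUPERSEDABLE = {MemoryType.FACT, MemoryType.INSTRUCTION}


@dataclass
class Memory:
    type: MemoryType
    topic_key: str  # hypothetical field name
    summary: str
    superseded: bool = False


class MemoryStore:
    """Hypothetical in-memory store illustrating supersession."""

    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def add(self, new: Memory) -> None:
        if new.type in SUPERSEDABLE:
            for old in self.memories:
                # Assumption: a memory only supersedes one of the same type.
                if old.type == new.type and old.topic_key == new.topic_key:
                    old.superseded = True  # history is kept, not deleted
        self.memories.append(new)

    def current(self) -> list[Memory]:
        """Return only the memories that would surface in recall."""
        return [m for m in self.memories if not m.superseded]


store = MemoryStore()
store.add(Memory(MemoryType.FACT, "editor preference", "Prefers Vim"))
store.add(Memory(MemoryType.FACT, "editor preference", "Prefers VS Code"))
store.add(Memory(MemoryType.EVENT, "deploy", "Deployed v1 on Monday"))
store.add(Memory(MemoryType.EVENT, "deploy", "Deployed v2 on Friday"))
```

After these four calls the store still holds all four memories, but `store.current()` omits the older editor preference while keeping both deployment events.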
When you call ingest(), Agent Memory processes the conversation through several stages:
- Extraction — AI reads the conversation and identifies discrete, memorable items. Each item is a standalone piece of knowledge with a clear summary and supporting content.
- Classification — Each extracted item is classified into a memory type (fact, event, instruction, or task) and assigned a topic key, keywords, and search queries for later retrieval.
- Deduplication — The system checks for duplicates against both the current batch and existing stored memories. Facts and instructions with the same topic key supersede older versions rather than creating duplicates.
- Storage — Memories are written to durable storage with full-text search indexes. Non-task memories are also embedded as vectors for semantic search.
Raw conversation messages are always stored verbatim alongside extracted memories, preserving the original transcript for full-text search.
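The flow through these stages can be sketched as a chain of plain functions. Everything here is a stand-in: the real extraction and classification steps use AI, whereas this sketch uses trivial string rules, and the function names, dictionary fields, and list-based "indexes" are hypothetical. Deduplication is reduced to an exact-content check for brevity.

```python
def extract(conversation: list[str]) -> list[str]:
    """Stand-in for AI extraction: treat each non-empty message as one item."""
    return [m.strip() for m in conversation if m.strip()]


def classify(item: str) -> dict:
    """Stand-in for AI classification using naive keyword rules."""
    text = item.lower()
    if text.startswith("todo"):
        mem_type = "task"
    elif "deployed" in text:
        mem_type = "event"
    elif text.startswith(("always", "never")):
        mem_type = "instruction"
    else:
        mem_type = "fact"
    return {"type": mem_type, "content": item, "keywords": sorted(set(text.split()))}


def deduplicate(items: list[dict], stored: list[dict]) -> list[dict]:
    """Drop items already present in the current batch or in the store."""
    seen = {m["content"] for m in stored}
    unique = []
    for item in items:
        if item["content"] not in seen:
            seen.add(item["content"])
            unique.append(item)
    return unique


def ingest_sketch(conversation, stored, vector_index, transcript):
    transcript.extend(conversation)  # raw messages are kept verbatim
    items = [classify(i) for i in extract(conversation)]
    for memory in deduplicate(items, stored):
        stored.append(memory)  # stands in for the full-text indexed store
        if memory["type"] != "task":  # tasks are not embedded as vectors
            vector_index.append(memory["content"])


stored, vector_index, transcript = [], [], []
ingest_sketch(["I use Vim", "TODO check logs", "I use Vim"],
              stored, vector_index, transcript)
```

In this run the transcript keeps all three raw messages, the repeated message yields only one stored memory, and the task is stored but left out of the vector index.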
When you call recall(), Agent Memory processes the query through several stages, running multiple retrieval strategies in parallel:
- Query analysis — AI analyzes your query to determine the best retrieval approach, generating keyword terms, topic keys, and semantic search vectors.
- Parallel retrieval — The system simultaneously searches across keyword indexes, topic key lookups, semantic vector indexes, and raw conversation messages.
- Scoring and ranking — Results from all sources are combined and ranked to surface the most relevant memories while maintaining diversity across retrieval methods.
- Synthesis — AI generates a natural language answer from the top-ranked memories, grounded in the actual stored content.
If no memories match the query, recall() returns an empty answer rather than hallucinating a response.
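One way to picture parallel retrieval followed by combined ranking is the sketch below. The text does not say how Agent Memory scores results, so reciprocal rank fusion (RRF) is swapped in here as one common technique for merging ranked lists; it is not claimed to be the service's method. The `DOCS` data, the two toy retrievers, and `recall_sketch` are all hypothetical, and "synthesis" is reduced to returning the top memory's text.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stored memories, keyed by ID.
DOCS = {
    "m1": "prefers vim as editor",
    "m2": "deployed api v2 on friday",
    "m3": "editor theme is dark",
}


def keyword_search(query: str) -> list[str]:
    """Rank memories by how many query words they contain."""
    words = query.lower().split()
    hits = {k: sum(w in v.split() for w in words) for k, v in DOCS.items()}
    return sorted((k for k, n in hits.items() if n > 0),
                  key=lambda k: (-hits[k], k))


def topic_lookup(query: str) -> list[str]:
    """Stand-in for a topic-key lookup: match on the first query word only."""
    words = query.lower().split()
    return sorted(k for k, v in DOCS.items() if words and words[0] in v.split())


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_sketch(query: str) -> dict:
    retrievers = [keyword_search, topic_lookup]
    with ThreadPoolExecutor() as pool:  # run the retrievers concurrently
        rankings = list(pool.map(lambda r: r(query), retrievers))
    ranked = rrf(rankings)
    if not ranked:
        return {"answer": "", "memories": []}  # empty answer, nothing invented
    return {"answer": DOCS[ranked[0]], "memories": ranked}
```

With this toy data, `recall_sketch("friday editor")` ranks `m2` first because both retrievers return it, while a query with no matches, such as `recall_sketch("kubernetes")`, produces an empty answer.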
Agent Memory is designed for safe re-ingestion:
- Messages are content-addressed. Each message gets a deterministic ID derived from its content and session. Sending the same message twice does not create a duplicate.
- Sessions are deterministic. If you do not provide a sessionId, one is derived from the message content. The same conversation always maps to the same session.
- Facts and instructions evolve. When a new memory shares a topic key with an existing one (for example, "editor preference"), the old memory is marked as superseded. The latest version surfaces in recall results, but the full history is preserved.
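Content addressing of this kind can be illustrated with a hash-based sketch. The text only states that IDs are derived from content and session; the choice of SHA-256, the truncation to 16 characters, the `sess_` prefix, and all function names below are assumptions for the example, not the service's actual scheme.

```python
import hashlib
from typing import Optional


def derive_session_id(messages: list[str]) -> str:
    """Derive a session ID from message content (hypothetical scheme)."""
    digest = hashlib.sha256("\n".join(messages).encode("utf-8")).hexdigest()
    return f"sess_{digest[:16]}"


def message_id(session_id: str, content: str) -> str:
    """Derive a message ID from its session and content (hypothetical scheme)."""
    payload = f"{session_id}\x00{content}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def ingest_messages(store: dict[str, str], messages: list[str],
                    session_id: Optional[str] = None) -> str:
    """Write messages keyed by deterministic ID and return the session ID."""
    session_id = session_id or derive_session_id(messages)
    for content in messages:
        # The same session and content always hash to the same key,
        # so re-ingesting overwrites the entry instead of duplicating it.
        store[message_id(session_id, content)] = content
    return session_id


store: dict[str, str] = {}
first = ingest_messages(store, ["hi", "I use Vim"])
second = ingest_messages(store, ["hi", "I use Vim"])
```

Both calls resolve to the same session ID and the store ends up with two entries, not four, which is the property that makes re-ingestion safe.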