Outcome focus: Clarified the difference between short-term session state and durable agent memory, then mapped the operational risks around retrieval, security, retention, cost, and memory poisoning.
Tags: agents, google cloud, adk, memory, rag
Note as of April 2026: The session and memory split this post describes is still correct. The specific API names, the google-cloud-aiplatform 1.110.0 version pin, and the two-implementation framing (InMemoryMemoryService plus VertexAiMemoryBankService) are 1.x. ADK 2.0 (Beta v2.0.0b1, released April 22, 2026) expanded MemoryService to three implementations and renamed the async methods. See the follow-up on ADK 2.0 harness engineering for the current API surface and a known production gap with ToolConfirmation and persistent session services.
The first serious design fork in an agent system is memory.
Not the model. Not the tool list. Not the prompt.
Memory.
What should the agent remember? How long should it remember it? Who is allowed to read it? What should be forgotten? What happens when a memory is wrong? What happens when yesterday's context leaks into today's task? What happens when a user says something private and the system quietly turns it into durable state?
Those questions decide whether memory is a helpful continuity layer or an operational liability.
Google's Agent Development Kit makes this fork visible. ADK separates session context from long-term memory. A Session tracks the current conversation thread: events, state, and short-term context. A MemoryService is the searchable long-term knowledge layer. Its contract is intentionally simple: add a completed session to memory, then search memory later when the agent needs prior context.
That simplicity is useful.
It is also easy to underestimate.
A prototype can survive with in-process memory. A production agent cannot. Once an agent starts remembering preferences, facts, support history, user context, workflow details, or decisions across sessions, the memory layer becomes part of the system's trust boundary.
That is the point of this post. Agent memory is not just persistence. It is retrieval, scope, governance, retention, evaluation, and failure handling.
Session is not memory
ADK's docs make a clean distinction between session state and long-term memory.
A session is the conversation container. It tracks what is happening in the current interaction. It is the short-term context that helps the agent respond coherently within a thread.
Memory is different. It is a searchable archive or knowledge library the agent can consult across sessions. In ADK terms, BaseMemoryService is the interface for that layer. It is responsible for two core operations:
- add_session_to_memory, which ingests information from a session.
- search_memory, which retrieves relevant prior context for a query.
That distinction matters because teams often use "memory" to mean everything.
Conversation history is memory. User profile is memory. Tool output cache is memory. Retrieval corpus is memory. Workflow state is memory. Preference store is memory. Audit log is memory.
Those are not the same thing.
If they are all collapsed into one bucket, the system becomes hard to reason about. A temporary fact can become durable. A durable preference can get buried in chat history. A compliance record can get mixed with personalization. A user correction can fail to update the older memory that keeps reappearing.
The first memory design decision is naming the type of state.
What belongs in the current session? What belongs in long-term memory? What belongs in a normal database? What belongs in a cache? What belongs only in the audit log? What should never be persisted?
The cleaner those boundaries are, the safer the agent becomes.
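One way to force that naming exercise is to write the boundaries down as code before choosing a store. The sketch below is only an illustration: the `StateKind` names, the `ROUTING` table, and the item types are hypothetical and not part of ADK. The one deliberate choice is that unknown item types default to not being persisted.

```python
from enum import Enum


class StateKind(Enum):
    """Hypothetical taxonomy of where a piece of agent state may live."""
    SESSION = "session"            # current conversation thread only
    LONG_TERM_MEMORY = "memory"    # searchable across sessions
    DATABASE = "database"          # system-of-record rows
    CACHE = "cache"                # recomputable, safe to drop
    AUDIT_LOG = "audit_log"        # append-only, not used for personalization
    NEVER_PERSIST = "never"        # must not be written anywhere


# Hypothetical routing table: item type -> the only place it may be stored.
ROUTING = {
    "chat_turn": StateKind.SESSION,
    "stated_preference": StateKind.LONG_TERM_MEMORY,
    "subscription_status": StateKind.DATABASE,
    "tool_output": StateKind.CACHE,
    "tool_call_record": StateKind.AUDIT_LOG,
    "payment_card_number": StateKind.NEVER_PERSIST,
}


def destination(item_type: str) -> StateKind:
    """Return where an item may be stored; unknown types fail closed."""
    return ROUTING.get(item_type, StateKind.NEVER_PERSIST)


print(destination("stated_preference"))   # StateKind.LONG_TERM_MEMORY
print(destination("something_unmapped"))  # StateKind.NEVER_PERSIST
```

The table will be wrong at first. That is fine. The value is that a reviewer can argue with a line in a table, which is much harder to do with an implicit habit of saving everything.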
The ADK options
ADK currently presents two main memory service implementations.
InMemoryMemoryService stores session information in the application's memory and uses basic keyword matching for searches. It requires no setup. It is good for prototyping, local development, and simple testing. It is not durable. Restart the process and the memory is gone.
VertexAiMemoryBankService connects an ADK agent to Vertex AI Agent Engine Memory Bank, now documented under Google's Agent Platform and Gemini Enterprise Agent Platform surfaces. It is the managed path for persistent, evolving memories from user conversations. The docs describe it as extracting meaningful information from conversations, consolidating new memories with existing ones, and supporting semantic retrieval across sessions.
That is the official fork:
- In-memory for demos and tests.
- Memory Bank for managed, persistent long-term memory.
There are still middle grounds. A small internal system might use SQLite, Redis, Postgres with pgvector, or another lightweight store behind a custom memory adapter. That can be legitimate when the problem is narrow and the team wants local durability without managed cloud infrastructure. But the moment the system handles real users, sensitive data, or business-critical workflows, the adapter has to carry the same responsibilities as any production memory service.
Durable does not automatically mean production-ready.
It only means the mistake can survive a restart.
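To make the in-memory trade-off concrete, here is a toy store with naive keyword matching. It is not ADK's InMemoryMemoryService and does not use ADK at all; the class `ToyKeywordMemory` and its methods are hypothetical stand-ins that only mimic the behavior described above: no setup, keyword search, nothing survives a restart.

```python
import re
from collections import defaultdict


class ToyKeywordMemory:
    """Process-local memory with naive keyword matching (illustration only)."""

    def __init__(self) -> None:
        # (app_name, user_id) -> list of remembered text snippets
        self._store = defaultdict(list)

    def add(self, app_name: str, user_id: str, texts: list) -> None:
        self._store[(app_name, user_id)].extend(texts)

    def search(self, app_name: str, user_id: str, query: str) -> list:
        # A snippet matches if it shares at least one word with the query.
        query_words = set(re.findall(r"\w+", query.lower()))
        hits = []
        for text in self._store.get((app_name, user_id), []):
            if query_words & set(re.findall(r"\w+", text.lower())):
                hits.append(text)
        return hits


memory = ToyKeywordMemory()
memory.add("demo_app", "user_1", ["I prefer the thermostat at 21 degrees."])
print(memory.search("demo_app", "user_1", "thermostat setting"))  # one hit
print(memory.search("demo_app", "user_1", "heating"))             # [] (no shared word)

memory = ToyKeywordMemory()  # simulates a process restart
print(memory.search("demo_app", "user_1", "thermostat setting"))  # [] (nothing survived)
```

The second search is the quieter lesson. Keyword matching misses "heating" even though a person would see the connection, which is one reason semantic retrieval matters once the agent leaves the demo.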
The canonical workflow
The ADK workflow is simple in shape.
First, the user interacts with an agent in a session. The session accumulates events. Those events may include user messages, model responses, and tool actions.
Second, after the session has enough meaningful information, the application calls add_session_to_memory. With Memory Bank, that triggers memory generation from the session. The managed service can extract facts, consolidate them with existing memories, and store them under a scope such as a user identity and app name.
Third, in a later turn or later session, the agent retrieves memory. This can happen through tools such as PreloadMemoryTool, which always retrieves memory at the beginning of a turn, or LoadMemory, which lets the agent decide when memory is useful. The application can also call search_memory directly.
Fourth, retrieved memories are inserted into the model context so the agent can respond with continuity.
That flow is not complicated:
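```python
# Assumes an async context with session_service, memory_service,
# APP_NAME, USER_ID, and session_id already defined.

# Fetch the session whose events should be remembered.
session = await session_service.get_session(
    app_name=APP_NAME,
    user_id=USER_ID,
    session_id=session_id,
)

# Ingest the session into long-term memory.
await memory_service.add_session_to_memory(session)

# Later: search memory for prior context in the same app and user scope.
response = await memory_service.search_memory(
    app_name=APP_NAME,
    user_id=USER_ID,
    query="What temperature do I prefer?",
)
```

The exact call path depends on whether the app is using a local ADK runner or an Agent Engine ADK template. Google's current quickstart also documents async_add_session_to_memory and async_search_memory on the Agent Engine path, with a note that these require google-cloud-aiplatform version 1.110.0 or newer.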
That version detail is the kind of thing I would put directly into implementation notes. Memory code ages quickly.
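A version requirement like that can also be checked at startup instead of living only in notes. This is a minimal sketch using the standard library's `importlib.metadata`; the helper names (`parse_version`, `supports_async_memory_methods`, `installed_aiplatform_version`) are my own, and the parser deliberately ignores pre-release suffixes, which a real project might handle with a dedicated version library.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_AIPLATFORM = (1, 110, 0)  # minimum version cited in the quickstart note


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse the leading numeric part of a version; suffixes like 'b1' are ignored."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(g) if g is not None else 0 for g in match.groups())
    return (major, minor, patch)


def supports_async_memory_methods(installed: str) -> bool:
    """Compare numerically; comparing the raw strings would misorder 1.9 and 1.110."""
    return parse_version(installed) >= MIN_AIPLATFORM


def installed_aiplatform_version() -> Optional[str]:
    """Return the installed google-cloud-aiplatform version, or None if absent."""
    try:
        return metadata.version("google-cloud-aiplatform")
    except metadata.PackageNotFoundError:
        return None


print(supports_async_memory_methods("1.109.2"))  # False
print(supports_async_memory_methods("1.110.0"))  # True
```

Failing fast on an old SDK is cheaper than discovering a missing method in the middle of a user session.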
Memory Bank is not just a vector database
It is tempting to reduce long-term memory to "store embeddings and search them later."
That is part of the story, but it is not the whole story for Memory Bank.
Google's current Memory Bank overview describes several managed capabilities: memory extraction, memory consolidation, asynchronous generation, customizable extraction, multimodal understanding, managed storage and retrieval, scope isolation, similarity search, automatic expiration through TTLs, and memory revisions.
Those details matter.
Memory Bank is not only storing chunks. It is generating memories from source conversation events. It can consolidate new facts with prior facts so memory evolves. It stores memories under scopes. It can expire memories. It keeps revisions so a team can inspect how memories changed over time.
That makes it closer to a managed memory system than a generic vector store.
At retrieval time, Memory Bank can retrieve all memories for a scope or use similarity search. The current fetch-memory docs say similarity search compares embedding vectors between memory facts and the search query, and returns results sorted from shortest Euclidean distance to greatest Euclidean distance.
That is a useful correction to older summaries that casually say cosine similarity.
The lesson is not that Euclidean distance is spiritually important. The lesson is that retrieval mechanics are implementation details that should be verified against current docs before becoming architecture claims.
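The distinction is not only academic, because the two metrics can rank the same memories differently when vectors are not normalized. The example below uses made-up two-dimensional vectors to show the effect; it says nothing about the embeddings Memory Bank actually produces.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two vectors (smaller is closer)."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (larger is closer)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


query = (1.0, 0.0)
memories = {
    "same_direction_but_long": (3.0, 0.0),
    "nearby_but_off_axis": (1.0, 1.0),
}

by_euclidean = sorted(memories, key=lambda name: euclidean(query, memories[name]))
by_cosine = sorted(
    memories, key=lambda name: cosine_similarity(query, memories[name]), reverse=True
)

print(by_euclidean)  # ['nearby_but_off_axis', 'same_direction_but_long']
print(by_cosine)     # ['same_direction_but_long', 'nearby_but_off_axis']
```

On unit-length vectors the two orderings agree, because squared Euclidean distance then equals 2 minus twice the cosine similarity. Whether a given backend normalizes is exactly the kind of detail worth checking rather than assuming.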
Scope is the security model
Memory Bank scopes memories.
That sounds like a small implementation detail. It is not.
Scope decides which memories are considered during retrieval. The docs describe memory retrieval by exact scope match. If memories are scoped to {"user_id": "123"}, retrieval for another user should not see them. When similarity search is used, Memory Bank considers memories with the same scope as the request.
That is the beginning of the security model.
The current Google Cloud docs also support IAM Conditions for Memory Bank resources. Access can be conditioned on the memory scope through the aiplatform.googleapis.com/memoryScope API attribute. Specialized roles include memory viewer, memory editor, and memory user roles.
This is important for multi-tenant systems.
Project-level access is too broad for many agent memory workloads. Developers may need to inspect test memories without seeing production user memories. An agent identity may need write access for one application scope and read access for another. A support workflow may need scoped access to a specific user's memories, not the entire project.
The memory design should answer:
- What is the scope key?
- Is it user ID, account ID, tenant ID, app name, or some combination?
- Who can read memories in that scope?
- Who can create or generate memories?
- Who can delete or revise them?
- What happens when a user changes account or leaves the system?
If scope is sloppy, memory becomes a data leak waiting for a retrieval query.
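Exact scope matching is easy to model locally, which makes it easy to test before any cloud resource exists. The sketch below is a plain-Python approximation of the behavior described above, not Memory Bank's implementation; `ScopedMemory` and `retrieve` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ScopedMemory:
    scope: dict  # e.g. {"user_id": "123"}
    fact: str


def retrieve(memories, scope):
    """Return facts whose scope equals the requested scope exactly."""
    # Dict equality ignores key order but requires the same keys and values,
    # so a broader or narrower scope does not match.
    return [m.fact for m in memories if m.scope == scope]


store = [
    ScopedMemory({"user_id": "123"}, "Prefers email follow-ups"),
    ScopedMemory({"user_id": "456"}, "Prefers phone calls"),
    ScopedMemory({"user_id": "123", "app": "support"}, "Has an open ticket"),
]

print(retrieve(store, {"user_id": "123"}))
# ['Prefers email follow-ups']
print(retrieve(store, {"app": "support", "user_id": "123"}))
# ['Has an open ticket']
print(retrieve(store, {"user_id": "789"}))
# []
```

Note what the first query does not return: the memory scoped to both user and app. With exact matching, the choice of scope key is a retrieval decision as much as a security decision.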
Retention is not optional
One of the better current Memory Bank features is automatic expiration.
The docs describe TTL support so stale information can be automatically deleted, with TTLs configurable on the Memory Bank instance for inserted or generated memories. That matters because agent memory should not be immortal by default.
People change preferences. Business rules change. Past conversations become stale. Some memories are useful for one week and dangerous after six months. Regulated data may need explicit deletion. Users may need a way to revoke or correct what the agent remembers.
Retention should be a design input, not a cleanup task.
For every memory category, I would ask:
- Why are we storing this?
- How long should it live?
- Who can delete it?
- Can the user inspect it?
- Can the user correct it?
- Does the memory contain personal, regulated, contractual, or confidential data?
- Is it needed for personalization, audit, compliance, or model behavior?
The right answer may be different for different kinds of memory.
A user's preferred temperature may be harmless and durable. A sensitive medical detail may require strict access and retention rules. A temporary project instruction may expire when the project closes. An extracted business fact may need a provenance link back to the source session.
Memory that cannot be forgotten is not a feature.
It is debt.
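For a custom store, per-category retention can start as a small table and a purge function. This is a sketch under my own assumptions: the categories and TTL values in `TTL_BY_CATEGORY` are invented for illustration, and Memory Bank's managed TTLs are configured on the service rather than through code like this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: category -> time to live.
TTL_BY_CATEGORY = {
    "preference": timedelta(days=365),
    "project_instruction": timedelta(days=30),
    "sensitive": timedelta(days=7),
}


@dataclass
class MemoryRecord:
    category: str
    fact: str
    created_at: datetime


def is_expired(record: MemoryRecord, now: datetime) -> bool:
    """A record expires when its category TTL has elapsed."""
    ttl = TTL_BY_CATEGORY.get(record.category)
    if ttl is None:
        # No retention rule means no justification for keeping it.
        return True
    return now - record.created_at >= ttl


def purge(records, now: datetime):
    """Return only the records that are still within their TTL."""
    return [r for r in records if not is_expired(r, now)]


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
records = [
    MemoryRecord("preference", "Prefers 21 degrees", datetime(2025, 6, 1, tzinfo=timezone.utc)),
    MemoryRecord("project_instruction", "Use the Q3 template", datetime(2025, 10, 1, tzinfo=timezone.utc)),
    MemoryRecord("uncategorized", "Mentioned a vacation", datetime(2025, 12, 30, tzinfo=timezone.utc)),
]
print([r.fact for r in purge(records, now)])  # ['Prefers 21 degrees']
```

Passing `now` in explicitly keeps the function testable, which matters because retention bugs tend to appear months after the code was written.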
Memory poisoning is real
The current Memory Bank overview explicitly calls out prompt injection and memory poisoning.
That is the right warning.
Memory poisoning happens when false or malicious information gets stored as memory and later influences the agent. This is worse than a bad answer in one turn. It persists. It can bias future sessions. It can be retrieved when the user is not thinking about the old interaction at all.
An attacker might try to convince an agent to remember a false preference, a fake authorization rule, a malicious instruction, or a misleading fact about a business process. A normal user can also accidentally create bad memory by saying something ambiguous, sarcastic, incomplete, or temporary.
That is why memory generation should not be automatic in every context.
Some memories can be generated from sessions in the background. Others should be created through explicit tools with confirmation. Some should require human review. Some should never be stored.
For high-risk agents, I would separate memory into categories:
- Harmless personalization.
- User-stated durable preferences.
- Operational facts.
- Sensitive facts.
- Security-relevant facts.
- Temporary instructions.
- Explicitly prohibited memory.
Then I would decide how each category is written, retrieved, revised, and deleted.
The system should not treat all remembered facts as equally trustworthy.
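The categories above can be turned into a write gate that runs before anything is persisted. The mapping below is one possible assignment, not a recommendation from the ADK or Memory Bank docs; `WritePolicy`, `POLICY`, and `may_persist` are hypothetical names, and treating temporary instructions as never persisted to long-term memory is my own assumption.

```python
from enum import Enum


class WritePolicy(Enum):
    AUTO = "auto"              # may be generated in the background
    CONFIRM = "confirm"        # needs explicit user confirmation
    REVIEW = "human_review"    # needs a human reviewer
    NEVER = "never"            # must not become long-term memory


# Hypothetical assignment of the categories to write policies.
POLICY = {
    "harmless_personalization": WritePolicy.AUTO,
    "durable_preference": WritePolicy.CONFIRM,
    "operational_fact": WritePolicy.CONFIRM,
    "sensitive_fact": WritePolicy.REVIEW,
    "security_relevant_fact": WritePolicy.REVIEW,
    "temporary_instruction": WritePolicy.NEVER,
    "prohibited": WritePolicy.NEVER,
}


def may_persist(category: str, user_confirmed: bool = False,
                reviewer_approved: bool = False) -> bool:
    """Decide whether a candidate memory may be written; unknown categories are refused."""
    policy = POLICY.get(category, WritePolicy.NEVER)
    if policy is WritePolicy.AUTO:
        return True
    if policy is WritePolicy.CONFIRM:
        return user_confirmed
    if policy is WritePolicy.REVIEW:
        return reviewer_approved
    return False


print(may_persist("harmless_personalization"))                      # True
print(may_persist("security_relevant_fact", user_confirmed=True))   # False
print(may_persist("security_relevant_fact", reviewer_approved=True))  # True
```

The middle call is the poisoning case in miniature: a user, or an attacker speaking as one, cannot confirm a security-relevant fact into memory on their own.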
RAG is not dead, but hoarding context is
There is a popular argument that enterprises are moving away from giant RAG stores toward more agentic architectures that fetch fresh data at run time.
I think the useful version of that argument is not "RAG is dead."
It is that static retrieval is not enough.
An agent memory system should not become a landfill of embeddings. Long-term memory is useful when it preserves meaningful context that improves future behavior. It is harmful when it stores stale facts, duplicates, sensitive data, and weak summaries that crowd out fresher sources of truth.
Some context should be retrieved live from systems of record. A customer's current subscription state should come from the billing system. A ticket's current status should come from the support system. A policy should come from the policy repository. A memory should not override fresh operational truth unless the application explicitly designs that precedence.
Memory is best for continuity and personalization:
- Preferences.
- Prior decisions.
- Stable user facts.
- Summaries of past interactions.
- Patterns that are not otherwise captured in a source system.
It is not a replacement for live data access.
The strongest agent systems will use both. Memory for continuity. Tools and source systems for current truth. Retrieval for documents. Logs for audit. Databases for durable records.
The architecture should know which one is authoritative.
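Precedence can be made explicit in a few lines. In this sketch the set `LIVE_AUTHORITATIVE` and the field names are invented for illustration. The design choice worth noticing is that when a live-authoritative field is unavailable, the function reports that instead of quietly falling back to a remembered value.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical: fields for which a live system of record is authoritative.
LIVE_AUTHORITATIVE = {"subscription_status", "ticket_status"}


def resolve(field: str, remembered: Mapping[str, Any],
            live: Mapping[str, Any]) -> Tuple[Optional[Any], str]:
    """Return (value, origin), where origin is 'live', 'memory', or 'unavailable'."""
    if field in LIVE_AUTHORITATIVE:
        if field in live:
            return live[field], "live"
        # Do not fall back to memory for fields that must be current.
        return None, "unavailable"
    if field in remembered:
        return remembered[field], "memory"
    if field in live:
        return live[field], "live"
    return None, "unavailable"


remembered = {"subscription_status": "premium", "preferred_temperature": "21C"}
live = {"subscription_status": "cancelled"}

print(resolve("subscription_status", remembered, live))    # ('cancelled', 'live')
print(resolve("preferred_temperature", remembered, live))  # ('21C', 'memory')
print(resolve("ticket_status", remembered, {}))            # (None, 'unavailable')
```

Returning the origin alongside the value also gives the agent something honest to say when it cannot verify a fact.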
Embedding drift and retrieval drift
For custom memory backends, embedding drift is a real issue.
If the team changes embedding models, dimensions, chunking, normalization, or distance metrics, old vectors may not behave like new vectors. Similarity scores can shift. Retrieval quality can degrade. Thresholds tuned on one embedding model may not transfer to another.
Managed Memory Bank hides some of that implementation burden, but it does not remove the need to evaluate retrieval behavior. If the memory service changes, model behavior can change. If extraction prompts change, generated memories can change. If TTL settings change, context continuity can change.
For custom stores, I would version:
- Embedding model.
- Embedding dimension.
- Distance metric.
- Chunking or summarization strategy.
- Memory schema.
- Extraction prompt or classifier.
- Index version.
Then I would maintain a re-index plan.
Migration is not just recomputing vectors. It is re-running retrieval tests and checking whether the agent still sees the right memories at the right time.
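The versioned items above fit naturally into one immutable record stored next to the index. This is a sketch for a custom backend; the field values such as "embed-model-a" are placeholders, and which changes count as re-index triggers is an assumption a team should set for its own store.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class IndexVersion:
    embedding_model: str
    embedding_dimension: int
    distance_metric: str
    chunking_strategy: str
    memory_schema: str
    extraction_prompt: str
    index_version: int


# Assumed: changes to these fields make old vectors incomparable with new ones.
REINDEX_TRIGGERS = {
    "embedding_model", "embedding_dimension", "distance_metric", "chunking_strategy",
}


def changed_fields(old: IndexVersion, new: IndexVersion) -> list:
    """List the names of fields that differ, in declaration order."""
    return [f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)]


def needs_reindex(old: IndexVersion, new: IndexVersion) -> bool:
    return bool(REINDEX_TRIGGERS & set(changed_fields(old, new)))


v1 = IndexVersion("embed-model-a", 768, "euclidean", "per-fact", "schema-1", "prompt-v1", 1)
v2 = replace(v1, embedding_model="embed-model-b", index_version=2)
v3 = replace(v1, extraction_prompt="prompt-v2")

print(changed_fields(v1, v2), needs_reindex(v1, v2))
# ['embedding_model', 'index_version'] True
print(changed_fields(v1, v3), needs_reindex(v1, v3))
# ['extraction_prompt'] False
```

The second case does not need new vectors, but it still changes what gets remembered, so any non-empty diff is a reason to re-run the retrieval tests.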
Cost and latency belong in the design
Memory has cost.
There is write cost. Search cost. Storage cost. Latency cost. Evaluation cost. Operational cost. Security review cost.
The in-memory service is cheap and fast, but it is not durable. Local Redis or SQLite may be inexpensive, but they push reliability and access control back to the application team. A managed service reduces operational burden, but it introduces cloud dependency and pricing considerations. External vector databases may offer flexibility, but they need their own ACLs, networking, region controls, backup plans, and deletion workflows.
The right answer depends on the agent.
For a local prototype, in-memory is fine. For an internal tool with limited sensitivity, a lightweight persistent store may be enough. For a user-facing agent that needs personalization across sessions, managed memory becomes more attractive. For regulated workloads, access control, retention, regionality, auditability, and deletion may matter more than raw retrieval speed.
Memory architecture is a product decision disguised as infrastructure.
What I would build first
I would start smaller than most teams want.
First, define the memory categories. Do not start by choosing the store. Decide what the agent is allowed to remember.
Second, define the scope. User, tenant, app, organization, project, or another boundary. Write examples. Check whether two users could accidentally retrieve each other's memories.
Third, build with InMemoryMemoryService only long enough to prove behavior. Use it for local tests and demos. Do not confuse a passing demo with a production memory design.
Fourth, move to a durable memory service before real users depend on continuity. On Google Cloud, that likely means VertexAiMemoryBankService or direct Memory Bank APIs if the managed path fits the product.
Fifth, add retrieval tests. Give the agent known memories and queries. Verify that it retrieves the right memory, ignores the wrong memory, and behaves correctly when no memory is found.
Sixth, add retention and deletion. If the system cannot forget, it is not ready.
Seventh, add observability. Log when memory is generated, which scope it belongs to, which memories were retrieved, and whether the answer used them. Be careful not to create a second sensitive-data store in logs.
That is enough to learn without pretending memory is solved.
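The retrieval tests in the fifth step do not need a framework to get started. The harness below is a sketch that works against any function taking a user ID and a query; `toy_search`, `FACTS`, and `RetrievalCase` are hypothetical, and in a real project the search function would wrap whichever memory service is in use.

```python
from dataclasses import dataclass, field

# Toy data standing in for a real memory backend.
FACTS = {
    "user_1": ["Prefers thermostat at 21 degrees", "Works in the Berlin office"],
    "user_2": ["Prefers thermostat at 18 degrees"],
}


def toy_search(user_id: str, query: str) -> list:
    """Keyword overlap on words longer than three characters, scoped by user."""
    words = {w for w in query.lower().split() if len(w) > 3}
    return [f for f in FACTS.get(user_id, []) if words & set(f.lower().split())]


@dataclass
class RetrievalCase:
    user_id: str
    query: str
    must_include: list = field(default_factory=list)
    must_exclude: list = field(default_factory=list)
    expect_empty: bool = False


def run_cases(search, cases) -> list:
    """Run each case and return a list of human-readable failures."""
    failures = []
    for case in cases:
        results = search(case.user_id, case.query)
        joined = " | ".join(results)
        if case.expect_empty and results:
            failures.append(f"{case.query!r}: expected nothing, got {results}")
        for needle in case.must_include:
            if needle not in joined:
                failures.append(f"{case.query!r}: missing {needle!r}")
        for needle in case.must_exclude:
            if needle in joined:
                failures.append(f"{case.query!r}: leaked {needle!r}")
    return failures


cases = [
    RetrievalCase("user_1", "thermostat preference",
                  must_include=["21 degrees"], must_exclude=["18 degrees", "Berlin"]),
    RetrievalCase("user_1", "favorite pizza topping", expect_empty=True),
    RetrievalCase("user_3", "thermostat preference", expect_empty=True),
]
print(run_cases(toy_search, cases))  # []
```

The three cases mirror the three checks in the step: the right memory comes back, the wrong one (including another user's) does not, and an empty result is handled as a normal outcome.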
Failure modes
The first failure mode is shipping in-memory storage as if it were persistence. It works until the process restarts, then users discover the agent has amnesia.
The second is storing whole conversations when the product needs distilled facts. Full transcripts can be noisy, sensitive, and expensive to retrieve.
The third is over-extraction. The memory layer saves too much, including temporary preferences, jokes, mistakes, or sensitive details.
The fourth is under-extraction. The agent never remembers the thing users expect it to remember, so trust collapses.
The fifth is scope leakage. Memories from one user, tenant, or app become retrievable in another context.
The sixth is retrieval overconfidence. The agent treats retrieved memory as truth even when it conflicts with current source-of-record data.
The seventh is memory poisoning. Bad instructions or false facts get persisted and affect future sessions.
The eighth is no deletion story. The team can create memories but cannot explain how a user corrects or removes them.
The ninth is embedding migration without evaluation. The index changes and nobody knows retrieval quality changed until users report strange behavior.
The tenth is calling memory "production-ready" because it is durable. Production readiness also means access control, retention, monitoring, cost awareness, evals, and recovery.
The point
Agent memory is not magic.
It is state management with language in the loop.
That makes it powerful and dangerous in equal measure. The agent can remember the user's preferences, avoid asking the same questions, personalize responses, and maintain continuity across sessions. It can also remember the wrong thing, leak context, retrieve stale facts, or turn a casual sentence into durable behavior.
Google ADK gives the memory layer a clean interface. Memory Bank gives Google Cloud teams a managed path with persistent memories, extraction, consolidation, scoped retrieval, TTLs, revisions, and IAM controls. Those are useful primitives.
They do not remove the design work.
The design work is deciding what memory means for the product.
What should be remembered. What should be retrieved. What should expire. What should require review. What should come from a source system instead. What should never be stored.
That is the real memory architecture.