What ADK 2.0 Adds, and Where the Approval Path Still Breaks

Why an ADK 2.0 ToolConfirmation flow paired with VertexAiSessionService re-presented the same approval to a reviewer on Monday morning and ran the tool twice, and what the gap tells you about how to evaluate harness primitives at different maturity levels.

By Jovani Pink · April 28, 2026 · 9 min · Platform & AI Engineering

Outcome focus: Reader can map ADK 2.0 primitives onto a session-service backing store and decide which combinations are production-ready, which are beta-with-known-gaps, and which require waiting.

Updated April 28, 2026 to cover ADK 2.0 (Beta v2.0.0b1, released April 22). The November 2025 post on agent memory as an operating boundary is the conceptual anchor; the API surface in that earlier post is 1.x.

A reviewer on a payments operations team approved a sensitive refund request on a Friday at 4:55 pm. The agent had paused on ToolConfirmation waiting for a human decision. The reviewer clicked approve. The agent confirmed receipt. The reviewer left for the weekend.

The 5 pm deploy rolled the worker.

Monday morning the reviewer opened the queue. The same refund request was sitting at the top, with the same payload, the same justification text, and the same context. The reviewer assumed something had failed silently over the weekend, looked at the request again, and approved it a second time. The agent processed the refund. The customer received two refunds. The audit log showed two distinct approval events with no recoverable evidence that the second was a duplicate of the first.

The team had picked VertexAiSessionService for the agent's session backing because the production checklist required durable session events for SOC2 audit logging. They had picked ToolConfirmation for the approval flow because the ADK 2.0 documentation describes it as the canonical Human-in-the-Loop primitive. Both choices were correct in isolation. The combination was silently broken.

The current ADK 2.0 documentation states the gap directly: "DatabaseSessionService is not supported by this feature. VertexAiSessionService is not supported by this feature." The feature itself is labeled experimental. The pending confirmation is held in the worker's in-process state, not in the configured persistent backing store, so a worker restart erases it.

This post is about why that gap matters, what ADK 2.0 actually adds at different maturity levels, and how a harness engineer decides which primitives to depend on and which to wait for.

Harness Engineering, in One Paragraph

The framing the agent community has been converging on is that an agent is a model plus a harness, and the harness is the part that ships, not the model. Cobus Greyling has written about this directly in his Substack on harness engineering; the same shape shows up in this site's earlier posts on the repo as a harness and the control-plane vs compute-plane split for sandboxed agents. What makes ADK 2.0 a useful case study for the discipline is that the SDK now ships an explicit set of harness primitives at uneven maturity. Some are GA-quality and ready to depend on. One is experimental in a way that breaks production durability if you do not read the small print.

The Three-Tier Memory Architecture

ADK 2.0 separates session state, durable memory, and per-invocation scratch into three tiers, each with its own backing service. The session tier is SessionService, with three implementations: InMemorySessionService for development, VertexAiSessionService as the default persistent option backed by Google Cloud Agent Engine, and DatabaseSessionService for SQL-backed deployments (SQLite, MySQL, Postgres). The durable memory tier is MemoryService, with three implementations: InMemoryMemoryService for development, VertexAiMemoryBankService as a managed Memory-as-a-Service that extracts and consolidates facts, and VertexAiRagMemoryService for vector-indexed retrieval.

State within a session is namespaced by prefix. The state prefix documentation confirms three: user: for cross-session personal context tied to a user id, app: for global application data shared across users, and temp: for single-invocation scratch that is discarded after the turn completes.

The three-tier ADK 2.0 memory architecture. The session tier carries the conversation. State prefixes carve scope inside the session. The memory tier carries durable semantic recall.

The split matters because each tier has different durability, scope, and access semantics. Session events live for the lifetime of the session and are stored by whichever SessionService is configured. State prefixed user: survives across that user's sessions. State prefixed app: is shared across all users and sessions. State prefixed temp: does not survive past the current invocation. Memory entries written through add_session_to_memory survive across sessions and can be retrieved later through search_memory from a different session. Mixing these up is the most common ADK design mistake: treating temp: as durable, or treating MemoryService as a session log, produces the kind of subtle correctness failure that survives code review and surfaces in production.
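To make the scoping rules concrete, here is a toy router that decides which store a key lands in based on its prefix. It is a minimal sketch under stated assumptions: ToyStateRouter and its methods are hypothetical and are not the ADK State API. It also models the fact that ADK treats unprefixed keys as scoped to the current session.

```python
from collections import defaultdict


class ToyStateRouter:
    """Hypothetical model of ADK-style prefix scoping; not the ADK State API."""

    def __init__(self):
        self.app = {}                      # app: shared by every user and session
        self.user = defaultdict(dict)      # user: keyed by user id, spans sessions
        self.session = defaultdict(dict)   # unprefixed: keyed by session id
        self.temp = {}                     # temp: current invocation only

    def _store(self, user_id: str, session_id: str, key: str) -> dict:
        if key.startswith("app:"):
            return self.app
        if key.startswith("user:"):
            return self.user[user_id]
        if key.startswith("temp:"):
            return self.temp
        return self.session[session_id]    # no prefix: session-scoped

    def set(self, user_id: str, session_id: str, key: str, value) -> None:
        self._store(user_id, session_id, key)[key] = value

    def get(self, user_id: str, session_id: str, key: str, default=None):
        return self._store(user_id, session_id, key).get(key, default)

    def end_invocation(self) -> None:
        # temp: scratch is discarded when the turn completes.
        self.temp.clear()


router = ToyStateRouter()
router.set("u1", "s1", "user:timezone", "UTC")
router.set("u1", "s1", "app:refund_limit", 500)
router.set("u1", "s1", "temp:draft", "pending text")
router.set("u1", "s1", "cart_items", 3)
router.end_invocation()
```

After the invocation ends, a new session for u1 still sees user:timezone, any user sees app:refund_limit, nobody sees temp:draft, and cart_items is visible only inside s1. That last case is the one teams most often mistake for durable state.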

Lazy Context Compaction

The model's context window is the operational bottleneck for long-running sessions. ADK 2.0 ships EventsCompactionConfig to manage it without forcing the developer to write summarization logic by hand. The configuration takes a compaction_interval (the number of completed events that triggers a compaction), an overlap_size (how many previously compacted events are included again in the next compaction set, so context quality does not collapse at the boundary), and an LlmEventSummarizer that calls a configured model (Gemini, by default) to produce the summary. In production, compaction replaces older events with a summary event, so the agent's view of its own history shrinks while the load-bearing facts survive, provided the summarizer keeps them. The honest limit is that summary quality is whatever the summarizer model produces, which makes compaction one more model call to evaluate, not a free lunch.
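The interaction between the interval and the overlap is easiest to see in a toy model. This is a minimal sketch, not ADK's EventsCompactionConfig: ToyCompactor is hypothetical, and the summarize callable stands in for the LLM summarizer call.

```python
from typing import Callable, List


class ToyCompactor:
    """Hypothetical model of interval/overlap compaction; not ADK's implementation."""

    def __init__(self, compaction_interval: int, overlap_size: int,
                 summarize: Callable[[List[str]], str]):
        self.interval = compaction_interval
        self.overlap = overlap_size
        self.summarize = summarize       # stands in for a model call
        self.events: List[str] = []
        self.summaries: List[str] = []
        self.compacted_upto = 0          # index of the first uncompacted event

    def add_event(self, event: str) -> None:
        self.events.append(event)
        if len(self.events) - self.compacted_upto >= self.interval:
            # Re-include the last `overlap` already-compacted events so the new
            # summary overlaps the previous one at the boundary.
            start = max(0, self.compacted_upto - self.overlap)
            self.summaries.append(self.summarize(self.events[start:]))
            self.compacted_upto = len(self.events)

    def model_view(self) -> List[str]:
        # What the model sees: summaries plus the events not yet compacted.
        return self.summaries + self.events[self.compacted_upto:]


compactor = ToyCompactor(3, 1, lambda evs: "S(" + ",".join(evs) + ")")
for i in range(1, 8):
    compactor.add_event(f"e{i}")
print(compactor.model_view())  # ['S(e1,e2,e3)', 'S(e3,e4,e5,e6)', 'e7']
```

With an interval of 3 and an overlap of 1, e3 appears in both summaries; that shared event is what keeps the second summary anchored to the first. Every entry in summaries is one summarizer call, which is the cost this section warns about.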

Skills, Briefly

ADK 2.0 implements progressive skill loading to conserve the model's context. The Skills documentation and the developers blog guide describe an L1 metadata layer (frontmatter, around 100 tokens), an L2 instructions layer (the body of a skill, under 5,000 tokens, loaded when the skill triggers), and an L3 resources layer (external files loaded only when the agent reaches for them). The post on the repo as a harness covers the markdown side of this discipline in detail; the SDK side implements the same idea, with the model's context budget as the optimization target.
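The three layers amount to a simple loading contract, sketched below. The names ToySkill, ToySkillLoader, and rough_tokens are hypothetical, and so is the four-characters-per-token estimate; in ADK the model decides when a skill triggers, whereas here trigger is an explicit call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def rough_tokens(text: str) -> int:
    # Crude 4-characters-per-token heuristic; an assumption, not a real tokenizer.
    return len(text) // 4


@dataclass
class ToySkill:
    name: str
    description: str   # L1: frontmatter, present on every turn
    body: str          # L2: instructions, loaded only on trigger
    resources: Dict[str, Callable[[], str]] = field(default_factory=dict)  # L3


class ToySkillLoader:
    L1_BUDGET = 100
    L2_BUDGET = 5000

    def __init__(self, skills: List[ToySkill]):
        self.skills = {s.name: s for s in skills}
        self.triggered: Set[str] = set()
        for s in skills:
            if rough_tokens(s.description) > self.L1_BUDGET:
                raise ValueError(f"{s.name}: L1 metadata over budget")
            if rough_tokens(s.body) >= self.L2_BUDGET:
                raise ValueError(f"{s.name}: L2 body over budget")

    def base_context(self) -> List[str]:
        # Only L1 metadata is paid for on every turn.
        return [f"{s.name}: {s.description}" for s in self.skills.values()]

    def trigger(self, name: str) -> str:
        # L2 enters the context only when the skill is selected.
        self.triggered.add(name)
        return self.skills[name].body

    def read_resource(self, name: str, resource: str) -> str:
        # L3 is read lazily, and only for a skill that has already triggered.
        if name not in self.triggered:
            raise RuntimeError(f"trigger {name!r} before reading its resources")
        return self.skills[name].resources[resource]()


refunds = ToySkill(
    "refunds",
    "Handle customer refund requests",
    "Step 1: verify the order. Step 2: request approval.",
    {"policy.md": lambda: "Refunds over 500 need a second reviewer."},
)
loader = ToySkillLoader([refunds])
```

The budget checks run at construction so an oversized skill fails review rather than silently eating the context window.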

The Capability Matrix

The harness engineering decision is which ADK 2.0 primitive to depend on with which session-service backing. The matrix below is the artifact I keep open when reviewing an agent design.

| Primitive | Backing service required | Maturity | Production verdict | Failure mode if you ignore the verdict |
|---|---|---|---|---|
| State prefixes (user:, app:, temp:) | Any SessionService | GA | Use freely | None at the prefix layer; mistakes are application-level scope errors |
| EventsCompactionConfig | Any SessionService | GA | Use; evaluate the summarizer | Context grows past the window and the model degrades on long sessions |
| Skills (L1/L2/L3) | None at the SessionService layer | GA | Use; evaluate trigger accuracy | Context budget is consumed by always-loaded instructions; skill triggering misfires |
| MemoryService (VertexAiMemoryBankService) | Independent of SessionService | GA | Use with retention and access controls | Memory poisoning, retention bloat, IAM gaps |
| ToolConfirmation with InMemorySessionService | InMemorySessionService only | Experimental | Demo and prototyping only | None for prototypes; cannot survive a worker restart |
| ToolConfirmation with VertexAiSessionService | Not supported | Experimental | Do not deploy | Approval state lost on worker restart; reviewer re-approves; tool runs twice |
| ToolConfirmation with DatabaseSessionService | Not supported | Experimental | Do not deploy | Same failure mode as VertexAi backing |
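One way to keep the matrix from living only in a reviewer's head is a design-review check that fails CI on the "Do not deploy" rows. A minimal sketch: check_design and its plain-string inputs are hypothetical and do not inspect a real ADK agent; they encode the verdicts from the table above.

```python
from typing import List

# Backings the table marks as unsupported for ToolConfirmation.
PERSISTENT_BACKINGS = {"VertexAiSessionService", "DatabaseSessionService"}


def check_design(session_service: str, uses_tool_confirmation: bool,
                 environment: str) -> List[str]:
    """Return the matrix violations for one agent design (hypothetical helper)."""
    problems: List[str] = []
    if uses_tool_confirmation and session_service in PERSISTENT_BACKINGS:
        problems.append(
            f"ToolConfirmation is not supported with {session_service}: "
            "approval state is lost on worker restart")
    if uses_tool_confirmation and environment == "production":
        problems.append(
            "ToolConfirmation is experimental: route approvals through an "
            "external approval service")
    if session_service == "InMemorySessionService" and environment == "production":
        problems.append("InMemorySessionService does not persist session events")
    return problems


# The payments team's design trips both ToolConfirmation rules.
for problem in check_design("VertexAiSessionService", True, "production"):
    print(problem)
```

An empty list means the design only uses rows the matrix marks as safe for that environment.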

The pattern in the matrix is that the durable-memory primitives are GA and the human-in-the-loop primitive is experimental. That asymmetry is what surprises teams. Memory is the older problem; HITL is the newer one. ADK 2.0 has solved the older problem and is still working through the newer one.

Where the Approval Path Still Breaks

The mechanism behind the failure scene at the top of this post is that ADK's ToolConfirmation keeps its pending-approval state in the agent runtime's process memory. The session events that the persistent VertexAiSessionService does write are the surrounding context (the request, the tool call, the agent's reasoning) but not the confirmation state itself. When the worker restarts, the agent loads the session events from the durable backing store, sees that an approval was requested, and presents the request to the next available reviewer. There is no record that an earlier reviewer already approved it, because that approval lived in the in-process state that died with the previous worker.
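The opening scene can be replayed in a few lines. This toy models the mechanism as described above, not ADK's internals: DurableEventLog stands in for the persistent session backing, ToyWorker is hypothetical, and ledger stands in for the payments system that receives the side effect.

```python
class DurableEventLog:
    """Stands in for the persistent session backing: survives worker restarts."""

    def __init__(self):
        self.events = []


class ToyWorker:
    """Hypothetical model of the failure mechanism; not ADK's implementation."""

    def __init__(self, log: DurableEventLog, ledger: list):
        self.log = log
        self.ledger = ledger     # external side effects (the payments system)
        self.approved = set()    # in-process only: dies with the worker

    def request_approval(self, call_id: str) -> None:
        self.log.events.append(("approval_requested", call_id))  # durable

    def approve(self, call_id: str) -> None:
        self.approved.add(call_id)    # never written to the durable log
        self.ledger.append(call_id)   # the tool runs

    def pending_for_reviewer(self) -> list:
        requested = [cid for kind, cid in self.log.events
                     if kind == "approval_requested"]
        return [cid for cid in requested if cid not in self.approved]


log, ledger = DurableEventLog(), []
friday = ToyWorker(log, ledger)
friday.request_approval("refund-42")
friday.approve("refund-42")
print(friday.pending_for_reviewer())  # []

monday = ToyWorker(log, ledger)       # the 5 pm deploy replaced the worker
print(monday.pending_for_reviewer())  # ['refund-42']
monday.approve("refund-42")
print(ledger)                         # ['refund-42', 'refund-42']
```

Nothing in the durable log distinguishes Monday's approval from Friday's, which is why the audit trail could not prove the second one was a duplicate.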

The right move today, if the application requires both durable session events and HITL approval, is to keep ToolConfirmation out of the path entirely and run the approval flow through an external system. A queue of pending approvals, a small approval service that writes to the same audit store as the session events, and an idempotency key on the tool call itself are the production-shaped pieces. The post on durable agent execution and the xstate-python landscape covers why this is a state-machine plus durable-execution problem, not a primitive-availability problem; the same shape applies here. ADK 2.0's ToolConfirmation will mature; the application's correctness should not block on that.
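Here is a minimal sketch of the external pieces, assuming SQLite as a stand-in for the shared audit store and the tool-call id as the idempotency key. ApprovalStore and its methods are hypothetical, not part of ADK. The approval and the execution claim are both rows in durable storage, so a worker restart cannot resurrect an approved request or run the tool twice.

```python
import sqlite3


class ApprovalStore:
    """Hypothetical external approval service backed by SQLite."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS approvals (
                idempotency_key TEXT PRIMARY KEY,
                status TEXT NOT NULL,   -- 'pending' or 'approved'
                reviewer TEXT
            );
            CREATE TABLE IF NOT EXISTS executions (
                idempotency_key TEXT PRIMARY KEY
            );
        """)

    def request(self, key: str) -> None:
        # Re-requesting the same key after a restart is a no-op.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO approvals VALUES (?, 'pending', NULL)", (key,))

    def approve(self, key: str, reviewer: str) -> bool:
        # Only a pending request can be approved; a second approval returns False.
        with self.db:
            cur = self.db.execute(
                "UPDATE approvals SET status = 'approved', reviewer = ? "
                "WHERE idempotency_key = ? AND status = 'pending'", (reviewer, key))
        return cur.rowcount == 1

    def pending(self) -> list:
        rows = self.db.execute(
            "SELECT idempotency_key FROM approvals WHERE status = 'pending' "
            "ORDER BY idempotency_key")
        return [r[0] for r in rows]

    def execute_once(self, key: str, tool) -> bool:
        row = self.db.execute(
            "SELECT status FROM approvals WHERE idempotency_key = ?", (key,)).fetchone()
        if row is None or row[0] != "approved":
            raise PermissionError(f"{key} is not approved")
        # Claim the key before running the tool: at-most-once from this store's view.
        with self.db:
            cur = self.db.execute(
                "INSERT OR IGNORE INTO executions VALUES (?)", (key,))
        if cur.rowcount == 0:
            return False  # already executed; do not run the tool again
        tool(key)  # pass the key downstream so the payments API can dedupe too
        return True


store = ApprovalStore()
store.request("refund-42")
store.request("refund-42")  # duplicate request after a restart
refunds = []
```

Point path at a real database file (or the team's audit database) so that the state outlives the process. Claiming the key before calling the tool trades a possible missed execution after a crash for never executing twice; forwarding the same key to the downstream API is what closes that remaining gap.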

Migration: 1.x to 2.0

If you are running the architecture I described in the November 2025 post, the first step onto 2.0 is updating the API surface. The async memory methods were renamed (the 1.x async_add_session_to_memory and async_search_memory are now add_session_to_memory and search_memory), the google-cloud-aiplatform 1.110.0 version pin from that earlier post is stale, and MemoryService now ships as three implementations rather than two. The conceptual split between session and memory is unchanged. Storage, however, is not compatible between 1.x and 2.0, so plan the upgrade for a window in which you can migrate state rather than as an in-place version bump. The official ADK migration notes at adk.dev/2.0/ cover the breaking changes that matter.

The harness engineering question that survives the migration is the one this post opened with: which primitives are ready to depend on, which are still maturing, and how do you build the application so that the experimental ones can mature without forcing a rewrite. ADK 2.0 is a strong harness for the durable-memory side of the problem and a partial harness for the human-in-the-loop side. Build for the version of the SDK that ships, not the one the documentation implies.
