Google Agent Development Kit (ADK) Handbook

A practitioner's reference for building, configuring, and deploying agents with Google ADK 2.0 — covering LLM agents, agent teams, the graph-based Workflow Runtime, the Task API, A2A delegation, and deployment on Agent Runtime.

google-adk 2.3 (2.0 GA) | Python 3.10+ · Go · Java · TS | Gemini 3 / 3.5 · Claude · Ollama | Graph Workflow Runtime | A2A Protocol · Task API | Agent Runtime (fka Agent Engine)

Getting Started — Install & Setup

Google Agent Development Kit (ADK) is an open-source, code-first framework for building, evaluating, and deploying agentic AI applications. ADK supports Python, Go, Java, and TypeScript; this handbook focuses on Python, with Go equivalents noted where the graph API differs.

ℹ️

ADK 2.0 reached General Availability for Python on May 19, 2026 and for Go on June 30, 2026. It introduces the Workflow Runtime — a graph-based execution engine — alongside the original LlmAgent model. Official docs: adk.dev and the migration guide at adk.dev/2.0.

Installation

# Install ADK (requires Python 3.10+; some extras need 3.11+)
pip install google-adk        # currently 2.3.x; releases land roughly every two weeks

# Optional extras (the 2.0 line expanded this list significantly).
# Quote the brackets so shells such as zsh don't treat them as glob patterns.
pip install "google-adk[eval]"           # evaluation tools (adk eval)
pip install "google-adk[a2a]"            # Agent-to-Agent protocol client/server
pip install "google-adk[mcp]"            # Model Context Protocol toolsets
pip install "google-adk[toolbox]"        # pre-built connectors: GitHub, Jira, Notion, MongoDB, Qdrant...
pip install "google-adk[agent-identity]" # workload identity / auth for agent-to-service calls
pip install "google-adk[gcp]"            # Vertex AI / Agent Runtime deployment support
pip install "google-adk[all]"            # everything, incl. litellm, otel-gcp, db, slack, e2b

# Create a new agent project scaffold
adk create my_agent

Project Structure

The adk create command scaffolds a minimal but complete agent project. Since 2.0, the scaffold can generate either an Agent-first project or a Workflow-first (graph) project:

my_agent/
  agent.py          # root_agent definition — required entry point
  .env              # API keys and environment variables
  __init__.py       # exposes root_agent for the ADK runner

The only required element is a root_agent variable in agent.py. It can be a plain Agent, or — since 2.0 — a Workflow instance composed of nodes and edges. Everything else is optional.

🚨

Breaking change — sessions are not forward-compatible. Sessions written by ADK 2.0 add new node_info and output fields to the Event schema. They are readable by ADK 1.28+ but are incompatible with older 1.x releases. If you have a custom BaseSessionService backed by rigid SQL columns, migrate the schema before upgrading. Pin with google-adk~=1.0 if you are not ready to move.
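For a custom session store with fixed SQL columns, the schema migration mentioned above might look something like the sketch below. It assumes a SQLite backend, a table named events, and TEXT columns; all three are illustrative assumptions. Only the field names node_info and output come from the 2.0 Event schema described above.

```python
import sqlite3

# Hypothetical table name and column types; only the two field names
# (node_info, output) come from the ADK 2.0 Event schema described above.
NEW_COLUMNS = {"node_info": "TEXT", "output": "TEXT"}


def migrate_events_table(conn: sqlite3.Connection, table: str = "events") -> list[str]:
    """Add any missing 2.0 columns to the table; return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
            added.append(name)
    conn.commit()
    return added


# Demo against an in-memory database with a 1.x-style table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, author TEXT, content TEXT)")
first_run = migrate_events_table(conn)
second_run = migrate_events_table(conn)  # idempotent: nothing left to add
```

Because the function only adds columns that are missing, it is safe to run on every deploy before the upgraded service starts writing events.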

First Agent

Update agent.py with a working agent that uses a tool:

from google.adk import Agent

# Tool: plain Python function — ADK infers schema from type hints + docstring
def get_weather(city: str) -> dict:
    """Returns current weather conditions for a city.

    Args:
        city: The name of the city to get weather for.

    Returns:
        A dict with status, city, and temperature fields.
    """
    # Replace with a real weather API call in production
    return {"status": "success", "city": city, "temp_c": 22}

root_agent = Agent(
    name="weather_agent",
    model="gemini-3-flash",
    description="Provides current weather for any city.",
    instruction="You are a weather assistant. Use get_weather to answer questions.",
    tools=[get_weather],
)
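To make "ADK infers schema from type hints + docstring" more concrete, here is a rough standard-library sketch of how such inference can work in principle. It is not ADK's implementation: describe_tool, the type mapping, and the extra units parameter are all made up for illustration.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python annotations to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number",
               bool: "boolean", dict: "object", list: "array"}


def describe_tool(fn) -> dict:
    """Build a rough tool declaration from a function's signature and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the model must supply it
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.split("\n\n")[0],  # first docstring paragraph
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, units: str = "metric") -> dict:
    """Returns current weather conditions for a city.

    Args:
        city: The name of the city to get weather for.
        units: Unit system, 'metric' or 'imperial' (hypothetical parameter).
    """
    return {"status": "success", "city": city, "temp_c": 22}


declaration = describe_tool(get_weather)
```

The practical takeaway is the same whatever the real mechanism looks like: precise type hints and a clear first docstring line are what the model ends up seeing.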

💡

Model string tip: gemini-flash-latest and gemini-pro-latest aliases always point at the current recommended Flash/Pro model, which is useful in samples and demos. Pin an explicit version (e.g. gemini-3-flash) for anything you deploy to production, so a model-alias rollover doesn't silently change your agent's behaviour.

Running Your Agent

adk run — CLI

Interactive terminal session. Best for quick iteration during development.

adk run my_agent

adk web — Agent Builder UI

Starts a local browser UI at http://localhost:8000. Since 2.0 it supports pointing at either a directory of agents or a single agent folder directly, and renders the workflow graph visually.

adk web path/to/agents_dir --port 8000

⚠️

Dev only: adk web is for development and debugging. Use Cloud Run, GKE, or Agent Runtime for production deployments. The 2.0 Dev UI also added a consolidated, click-to-expand event view, full keyboard/arrow-key navigation, and rich tooltips on function calls showing arguments, responses, and state changes as they happen.

Environment Configuration

# Gemini API (Google AI Studio)
GOOGLE_API_KEY="your-key-here"

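# OR: Gemini Enterprise Agent Platform (formerly Vertex AI)
# Note: ADK 1.x also needed GOOGLE_GENAI_USE_VERTEXAI=TRUE to select this backend;
# check whether your installed version still requires it.
GOOGLE_CLOUD_PROJECT="your-project-id"
GOOGLE_CLOUD_LOCATION="us-central1"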

# Default-on OpenTelemetry tracing for agents deployed on Agent Runtime — no extra config needed
# For local/self-hosted OTel export instead:
OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4317"

Build Your Agent

The primary agent type in ADK is LlmAgent (aliased as Agent). It uses a large language model as its core engine to reason, plan, use tools, and generate responses. In ADK 2.0, an Agent can run standalone or as a node inside a Workflow graph; the same class powers both models.

LlmAgent Parameters

name

Required. Unique identifier for the agent. Used in multi-agent delegation, workflow node references, and logs.

model

Required. Model string: "gemini-3-flash", "gemini-3-pro", "gemini-3.5-flash", or any LiteLLM-supported model.

instruction

System prompt defining the agent's persona, scope, and behavioural guardrails. Supports templating with {variable} placeholders bound via instruction_provider.

description

Human-readable summary used by parent orchestrators, and by the workflow router, to decide when to delegate to this agent.

tools

List of callables or BaseTool instances. Functions are auto-wrapped; docstrings drive schema generation.

sub_agents

List of child agents this agent can delegate to (hierarchical model). For graph-based orchestration, prefer composing agents as Workflow nodes instead — see Workflow Runtime.

mode

New in 2.0. "task" turns the agent into a self-contained Task-API unit with finish_task installed automatically — useful as a workflow node that must produce structured output and hand control back to the graph.

input_schema / output_schema

Pydantic model or JSON schema for structured input/output. Forces the LLM to return valid JSON matching the schema; required for most task-mode graph nodes.

generate_content_config

Low-level GenerateContentConfig for temperature, top-p, max tokens, safety settings, thinking level, etc.

planner

Enables multi-step planning before tool execution. Use built_in_planner() for Gemini's native planning/thinking.

code_executor

Attach a code execution environment (e.g. LocalCodeExecutor or the e2b sandbox extra) allowing the agent to run Python dynamically.
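The {variable} placeholders mentioned under instruction can be pictured as ordinary string templating. The sketch below uses only the standard library and is not ADK's binding mechanism; render_instruction and the choice to leave unknown placeholders untouched are assumptions made for illustration.

```python
class _KeepMissing(dict):
    """Leave unknown placeholders in place instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def render_instruction(template: str, values: dict) -> str:
    """Substitute {variable} placeholders in an instruction template."""
    return template.format_map(_KeepMissing(values))


template = "You are a support agent for {company}. The user's plan is {plan}."
rendered = render_instruction(template, {"company": "AcmeCorp"})
```

Leaving an unbound placeholder visible (rather than substituting an empty string) makes a missing value easy to spot in logs.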

Writing Effective Instructions

Instructions are the single biggest lever on agent quality. Treat them as a precision contract.

root_agent = Agent(
    name="support_agent",
    model="gemini-3-flash",
    instruction="""
You are a Tier-1 support agent for AcmeCorp SaaS.

SCOPE
- Answer questions about billing, account access, and product features.
- For engineering or outages, route the user to the escalation_agent.
- Never discuss competitor products or internal roadmaps.

FORMAT
- Keep responses under 120 words.
- Use bullet points for steps.
- End every response with: 'Is there anything else I can help with?'

TOOLS
- Use search_kb before answering any product feature question.
- Use get_account_info only when the user provides their account ID.
    """,
    tools=[search_kb, get_account_info],
    sub_agents=[escalation_agent],
)

Supported Models

Model	String	Strengths	Best For
Gemini 3.5 Flash	`gemini-3.5-flash`	Latest agentic/coding model, 4× output tokens/sec vs. prior frontier models, best long-horizon tool use	Production agents, coding assistants, long agentic runs
Gemini 3 Pro	`gemini-3-pro`	Deepest reasoning, native multimodal, 1M-token context	Complex analysis, planning nodes, orchestrators
Gemini 3 Flash	`gemini-3-flash`	Pro-grade reasoning at Flash-level latency and cost	Default choice for most sub-agents and tool-calling nodes
Gemini 3.1 Flash-Lite	`gemini-3.1-flash-lite`	Lowest latency/cost, configurable thinking level	High-volume, cost-sensitive classification and routing nodes
Claude (via LiteLLM)	`anthropic/claude-sonnet-4-6`	Instruction following, writing, structured output	Drafting, review/verifier agents
Ollama (local)	`ollama/llama3.1`	Privacy, no API cost	Dev/test, on-premises

# Any non-Gemini model via LiteLLM
from google.adk.models.lite_llm import LiteLlm

agent = Agent(
    model=LiteLlm(model="anthropic/claude-sonnet-4-6"),
    ...
)

# Local Ollama
agent = Agent(
    model=LiteLlm(model="ollama/mistral"),
    ...
)

Sessions & Memory

ADK separates sessions (short-term per-conversation state) from memory (long-term cross-session recall). On Google Cloud, the managed memory service was renamed as part of the June 2026 Agent Platform rebrand.

Session State

Scoped to a single conversation run. Access via tool_context.state or callback_context.state inside tools and callbacks. Automatically checkpointed — including new node_info/output fields for graph runs.

Agent Platform Memory Bank

Persists facts across sessions. Use InMemoryMemoryService for dev or VertexAiRagMemoryService (backed by Agent Platform Memory Bank, formerly "Memory Bank" under Vertex AI) for production RAG-backed recall.

import asyncio

from google.adk.memory import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai.types import Content, Part

runner = Runner(
    agent=root_agent,  # the root_agent defined in agent.py
    app_name="my_app",
    session_service=InMemorySessionService(),
    memory_service=InMemoryMemoryService(),   # optional
)

async def main() -> None:
    # Create a session, then stream events until the final response arrives
    session = await runner.session_service.create_session(
        app_name="my_app", user_id="user-123"
    )
    async for event in runner.run_async(
        user_id="user-123",
        session_id=session.id,
        new_message=Content(role="user", parts=[Part(text="What is the weather in Tokyo?")]),
    ):
        if event.is_final_response():
            print(event.content.parts[0].text)

asyncio.run(main())

Agent Teams & A2A

ADK's Agent Teams pattern is a hierarchical multi-agent primitive: a root orchestrator delegates tasks to specialised sub-agents. The orchestrator is an LlmAgent with sub_agents; it decides autonomously which sub-agent handles each task. For deterministic, auditable delegation, ADK 2.0 layers the Task API and a formalised A2A protocol on top of this same model.

User Request → Orchestrator (LlmAgent) → Sub-Agent A / Sub-Agent B → Aggregated Response

Defining an Agent Team

from google.adk import Agent

# Specialist agents
billing_agent = Agent(
    name="billing_agent",
    model="gemini-3-flash",
    description="Handles billing questions, invoices, and subscription changes.",
    instruction="You are a billing specialist. Only answer billing-related questions.",
    tools=[get_invoice, update_subscription, apply_credit],
)

technical_agent = Agent(
    name="technical_agent",
    model="gemini-3-pro",  # richer model for complex debugging
    description="Handles technical support, bugs, and integration issues.",
    instruction="You are a technical support engineer. Diagnose and resolve technical issues.",
    tools=[search_kb, run_diagnostics, create_ticket],
)

# Orchestrator — delegates via sub_agents, no direct tools needed
root_agent = Agent(
    name="support_orchestrator",
    model="gemini-3-flash",
    description="Routes customer support requests to the right specialist.",
    instruction="""
You are a customer support orchestrator.
Analyse the user's intent and delegate to the appropriate specialist agent.
- Billing questions → billing_agent
- Technical issues  → technical_agent
Never answer domain questions yourself — always delegate.
    """,
    sub_agents=[billing_agent, technical_agent],
)

A2A Protocol — Formalised in 2.0

Agent-to-Agent (A2A) delegation lets one agent call a "specialist" agent running in a different process, region, or even a different framework entirely, discovered via a URL endpoint. ADK 2.0 formalises this with the a2a extra, structured task hand-off, and a shared Task API, so that a delegation can take one of three forms:

Multi-Turn Task Mode

The remote agent has a real back-and-forth conversation (e.g. clarifying questions) before returning a structured result via finish_task.

Single-Turn Controlled Output

One request, one structured response — no conversation. Ideal for classification, extraction, or scoring calls between services.

Mixed / HITL Delegation

A task can pause mid-execution and wait for a human approval step before resuming — the same primitive used for in-graph human-in-the-loop nodes.

from google.adk.a2a import A2AClient
from google.adk import Agent

# Delegate to a specialist agent running as its own remote service
credit_check_agent = A2AClient(
    name="credit_check_agent",
    endpoint="https://risk.internal.acme.com/agents/credit-check",
    description="Runs a credit check and returns an approval decision.",
)

root_agent = Agent(
    name="loan_intake_agent",
    model="gemini-3-flash",
    instruction="Collect loan application details, then delegate credit checks to credit_check_agent.",
    sub_agents=[credit_check_agent],
)

Agent Registry

ℹ️

Agent Registry is now GA on Gemini Enterprise Agent Platform — a centralised catalog for discovering and registering agents and MCP servers across an organisation, with a v1 API and client libraries. This makes A2A discovery a lookup instead of a hardcoded URL for teams running many internal agents.

Context Sharing Between Agents

All agents in a hierarchical team share the same session state. Sub-agents can read and write state keys to pass results back to the orchestrator.

from google.adk.tools import ToolContext

def fetch_user_profile(user_id: str, tool_context: ToolContext) -> dict:
    """Fetch and cache user profile in session state."""
    profile = call_user_api(user_id)
    tool_context.state["user_profile"] = profile
    return profile

# In another agent's tool, read state written by a sibling agent:
def personalize_response(tool_context: ToolContext) -> str:
    profile = tool_context.state.get("user_profile", {})
    return f"Hello {profile.get('name', 'there')}!"

💡

Design principle: Give each sub-agent a narrow, well-described description. The orchestrator uses this description — not the agent's name — to decide delegation. For anything that needs a guaranteed execution order, fan-out, or retries, prefer the graph-based Workflow Runtime over implicit LLM-driven delegation.

Multi-Tool Agents

A multi-tool agent is an LlmAgent equipped with several tools. The LLM autonomously decides which tools to call, in what order, and how to chain their outputs to fulfil the user's request.

Tool Types Overview

Type	How to Create	Best For
Function Tool	Plain Python function with type hints + docstring	Business logic, API calls, data access
AgentTool	Wrap another `Agent` as a tool	Reusing an existing agent inside another agent
Built-in Tools	`google_search()`, `code_execution()`	Web grounding, running code
MCP Tools	`MCPToolset.from_server(...)`	Third-party MCP server integrations
OpenAPI Tools	`OpenAPIToolset.from_spec(...)`	Any REST API with an OpenAPI spec
Toolbox Connectors	`from google.adk.toolbox import ...`	Pre-built, maintained connectors — see below

ℹ️

ADK Tools & Integrations Ecosystem (Feb 2026): the google-adk[toolbox] extra ships maintained connectors across four categories — Code/dev: GitHub, GitLab, Postman, Restate, Daytona · Project mgmt: Asana, Jira, Confluence, Linear, Notion · Data/memory: MongoDB, Pinecone, Qdrant, Chroma, GoodMem · Observability: MLflow (OTel-native), Arize AX, AgentOps, Freeplay, Monocle. This turns ADK from "build your own connector" into a curated execution layer over your existing engineering stack.

Multi-Tool Agent Example

from google.adk import Agent
from google.adk.tools import google_search, code_execution

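def query_database(sql: str) -> dict:
    """Run a read-only SQL query against the analytics database.

    Args:
        sql: A valid SELECT statement.

    Returns:
        A dict with 'status', plus 'rows' (list of row dicts) and 'row_count' on success.
    """
    # Enforce the read-only rule in code; don't rely on the instruction alone
    if not sql.lstrip().lower().startswith("select"):
        return {"status": "error", "message": "Only SELECT statements are allowed."}
    results = db.execute(sql).fetchall()  # db: your database connection, defined elsewhere
    return {"status": "success", "rows": [dict(r) for r in results], "row_count": len(results)}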

analyst_agent = Agent(
    name="analyst_agent",
    model="gemini-3-pro",
    instruction="""
You are a data analyst. You can query the database, search the web for context,
execute Python for data transformation, and email summaries.
Always validate your SQL is read-only before executing.
    """,
    tools=[
        query_database,
        send_email,
        google_search(),     # built-in: web grounding
        code_execution(),    # built-in: run Python for analysis
    ],
)

Tool Result Handling

ADK serialises tool return values to JSON and injects them back into the LLM context as tool response parts. Best practices:

Return structured dicts, not raw strings — the LLM reasons better over structured data.
Always include a status key ("success" or "error") so the LLM can handle failures gracefully.
Return only what the LLM needs — large payloads inflate token usage and can confuse reasoning.
Use consistent key names across tools to reduce LLM confusion (e.g. always "items" not sometimes "results").
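One way to apply these practices consistently is a small decorator that every tool passes through. The sketch below is an illustration, not an ADK feature: tool_result, MAX_PAYLOAD_CHARS, and list_orders are hypothetical names, and the wrapper assumes the wrapped function returns a list. It catches only ValueError, leaving unexpected exceptions free to propagate.

```python
import functools
import json

MAX_PAYLOAD_CHARS = 2000  # hypothetical budget; tune for your model and prompts


def tool_result(fn):
    """Wrap a list-returning tool so it always returns a dict with a 'status' key."""

    @functools.wraps(fn)  # keeps the name, signature, and docstring intact
    def wrapper(*args, **kwargs) -> dict:
        try:
            data = fn(*args, **kwargs)
        except ValueError as exc:  # expected, user-facing failures only
            return {"status": "error", "message": str(exc)}
        result = {"status": "success", "items": data}
        if len(json.dumps(result)) > MAX_PAYLOAD_CHARS:
            # Trim large lists rather than flooding the LLM context.
            result["items"] = data[:10]
            result["truncated"] = True
        return result

    return wrapper


@tool_result
def list_orders(customer_id: str) -> list:
    """Return order IDs for a customer (stubbed data for illustration)."""
    if not customer_id:
        raise ValueError("customer_id is required")
    return [f"order-{i}" for i in range(500)]


ok = list_orders("c-42")
failed = list_orders("")
```

Using functools.wraps matters here because the docstring and signature are what a schema generator would read from the wrapped function.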

💡

Tool count tip: LLMs perform best with 5–10 well-defined tools. Beyond ~15 tools, consider splitting the agent into specialised sub-agents — or into distinct nodes in a Workflow graph — with a routing layer in front.

Workflow Runtime — Graph-Based Orchestration

ADK 2.0's headline change is the Workflow Runtime: a graph-based execution engine that transitions ADK from a purely hierarchical agent executor into a system where Agents, Tools, and plain Python Functions are all evaluated as nodes in a directed graph connected by edges. This gives you deterministic, LLM-free control flow with conditional routing, fan-out/fan-in, retries, loops, and human-in-the-loop pauses — all as first-class, resumable graph behaviour.

ℹ️

The classic workflow agents from ADK 1.x — SequentialAgent, ParallelAgent, and LoopAgent — remain available for simple, linear cases and are still the fastest way to express "run these in order" or "run these concurrently." The Workflow graph API described below is the ADK 2.0 way to express anything with branching, joins, retries, or pauses.

Nodes and Edges — The Mental Model

A node is any unit of work: an Agent (LLM call), a FunctionNode (pure Python — free and instant), or a JoinNode (waits for parallel branches to complete). Edges describe the flow between nodes, and can be unconditional or conditional on a node's output.

START → classify (FunctionNode) → router (LlmAgent)
  ├─ "urgent"   → priority_handler
  └─ "standard" → standard_handler
both → JoinNode → human_review (HITL)
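The node-and-edge model can be illustrated without any framework. The following plain-Python sketch walks a tiny graph with one conditional edge; it is a teaching aid under simplified assumptions (no parallelism, joins, or pauses), and run_graph is not part of ADK.

```python
from typing import Callable, Union

Node = Callable[[str], tuple[str, str]]  # returns (output, route)
Edge = Union[str, dict[str, str]]        # next node name, or route -> node name


def run_graph(nodes: dict[str, Node], edges: dict[str, Edge], start: str, data: str) -> list[str]:
    """Walk the graph from `start`, following unconditional or routed edges."""
    visited, current = [], start
    while current is not None:
        visited.append(current)
        data, route = nodes[current](data)
        edge = edges.get(current)
        if isinstance(edge, dict):
            current = edge[route]   # conditional edge: pick by route key
        else:
            current = edge          # unconditional edge, or None at a leaf
    return visited


nodes = {
    "classify": lambda text: (text.lower(), ""),
    "router": lambda text: (text, "urgent" if "outage" in text else "standard"),
    "priority_handler": lambda text: (f"PAGE ON-CALL: {text}", ""),
    "standard_handler": lambda text: (f"queued: {text}", ""),
}
edges = {
    "classify": "router",
    "router": {"urgent": "priority_handler", "standard": "standard_handler"},
}

urgent_path = run_graph(nodes, edges, "classify", "Production OUTAGE in eu-west")
standard_path = run_graph(nodes, edges, "classify", "How do I export a report?")
```

The list of visited nodes is the useful part: a graph run leaves an explicit, inspectable path, which is what makes this style easier to audit than implicit LLM delegation.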

Basic Sequential Graph

from google.adk import Agent, Workflow

generate_fruit_agent = Agent(
    name="generate_fruit_agent",
    model="gemini-3-flash",
    instruction="Return the name of a random fruit. Return only the name.",
)
generate_benefit_agent = Agent(
    name="generate_benefit_agent",
    model="gemini-3-flash",
    instruction="Tell me a health benefit about the specified fruit.",
)

# A 3-tuple edge chains nodes in order: START → A → B
root_agent = Workflow(
    name="root_agent",
    edges=[("START", generate_fruit_agent, generate_benefit_agent)],
)

Conditional Routing

A router node returns a route key; the matching edge decides where execution goes next. The router can be a plain Python function or a lightweight agent; when the logic is deterministic, no LLM call is needed.

from google.adk import Agent, Workflow
from google.adk.events import Event

def is_urgent(text: str) -> bool:
    """Hypothetical urgency check; replace with your own rule."""
    return "urgent" in text.lower()

def task_A_node(node_input: str) -> Event:
    """Hypothetical first node: tidies the input before routing."""
    return Event(output=node_input.strip())

def router(node_input: str) -> Event:
    """Route to task B or C based on node_input."""
    if is_urgent(node_input):
        return Event(route="RUN_TASK_C")
    return Event(route="RUN_TASK_B")

task_B_agent = Agent(name="task_B_agent", model="gemini-3-flash", instruction="Handle a standard request.")

def task_C_node(node_input: str) -> Event:
    """A FunctionNode to execute node C at zero LLM cost."""
    return Event(output="Escalated to on-call engineer.")

root_agent = Workflow(
    name="routing_workflow",
    edges=[
        ("START", task_A_node, router),
        # Dict edge: the router's route key selects the next node
        (router, {
            "RUN_TASK_B": task_B_agent,
            "RUN_TASK_C": task_C_node,
        }),
    ],
)

Fan-Out / Fan-In with JoinNode

Run several branches in parallel from a shared source, then wait for all of them to complete before continuing — the graph equivalent of ParallelAgent, but composable with conditional routing and retries.

from google.adk import Workflow
from google.adk.workflow import JoinNode

my_join_node = JoinNode(name="my_join_node")

root_agent = Workflow(
    name="parallel_research",
    edges=[
        ("START", parallel_task_A, my_join_node),
        ("START", parallel_task_B, my_join_node),
        ("START", parallel_task_C, my_join_node),
        (my_join_node, final_summary_agent),
    ],
)

⚠️

JoinNode proceeds only once every upstream node has emitted an Event. If one branch fails to output anything, the join is stuck and the workflow stalls. Always include a failsafe/fallback output path from any node that feeds a JoinNode, and pair it with a RetryConfig or timeout so a stuck branch surfaces as an error rather than hanging forever.
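The stall described above, and the fallback that avoids it, can be reproduced with plain asyncio. In this sketch asyncio.gather plays the role of the join and asyncio.wait_for supplies the per-branch timeout; the branch names and delays are invented, and none of this is the Workflow Runtime's own API.

```python
import asyncio


async def branch(name: str, delay: float) -> str:
    """Stand-in for a parallel node; `delay` simulates work."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def with_fallback(coro, name: str, timeout: float) -> str:
    """Guarantee that a branch produces some output before the join."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return f"{name}: timed out"


async def fan_out_fan_in() -> list[str]:
    # gather() acts as the join: it resolves once every branch has returned.
    return await asyncio.gather(
        with_fallback(branch("A", 0.01), "A", timeout=0.5),
        with_fallback(branch("B", 5.0), "B", timeout=0.05),  # stuck branch
        with_fallback(branch("C", 0.02), "C", timeout=0.5),
    )


results = asyncio.run(fan_out_fan_in())
```

Without the wrapper, the join would wait the full five seconds for branch B; with it, the slow branch reports a fallback value and the downstream node can decide how to handle the gap.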

Retries, Timeouts & Human-in-the-Loop

Every node accepts a retry_config and timeout, and any node can pause the graph to wait for human input — the same mechanism the Task API uses for multi-turn conversations.

from google.adk import Agent
from google.adk.workflow import FunctionNode, RetryConfig

flaky_api_node = FunctionNode(
    name="call_flaky_partner_api",
    fn=call_partner_api,  # your own function, defined elsewhere
    retry_config=RetryConfig(
        max_attempts=3,
        initial_delay=1.0,
        backoff_factor=2.0,
        jitter=True,
        exceptions=(ConnectionError, TimeoutError),
    ),
    timeout=30,   # hard wall-clock cap in seconds
)

approval_gate = Agent(
    name="approval_gate",
    model="gemini-3-flash",
    mode="task",
    instruction="Summarise the pending action and wait for a human 'approve' or 'reject'.",
)  # pauses the graph until a human responds via the ADK Web UI or API

🚨

Migration note: a broad except Exception: block inside a node's underlying tool silently disables the 2.0 automatic retry mechanism, because the framework never sees the failure. Worse, catching BaseException also traps NodeInterruptedError, which breaks the framework's ability to pause for human-in-the-loop input. Let exceptions propagate out of tools; configure RetryConfig instead of catching broadly.
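A plain-Python sketch shows why a broad except defeats retries. The retry helper below mirrors the RetryConfig fields used earlier (max_attempts, initial_delay, backoff_factor, jitter, exceptions) but is an independent illustration, not the framework's retry engine; the two tool functions are invented for the demo.

```python
import random
import time


def retry(fn, max_attempts=3, initial_delay=0.01, backoff_factor=2.0,
          jitter=True, exceptions=(ConnectionError, TimeoutError)):
    """Call fn, retrying the listed exceptions with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(delay * (random.uniform(0.5, 1.5) if jitter else 1.0))
            delay *= backoff_factor


calls = {"propagating": 0, "swallowing": 0}


def propagating_tool():
    """Fails twice, then succeeds; the exception reaches the retry loop."""
    calls["propagating"] += 1
    if calls["propagating"] < 3:
        raise ConnectionError("partner API unavailable")
    return "ok"


def swallowing_tool():
    """Anti-pattern: a broad except hides the failure from the retry loop."""
    calls["swallowing"] += 1
    try:
        raise ConnectionError("partner API unavailable")
    except Exception:
        return None


good = retry(propagating_tool)
bad = retry(swallowing_tool)
```

The first tool is retried until it succeeds on the third attempt. The second is called once and returns None, so the retry loop treats a failed call as a success.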

Choosing an Orchestration Style

Pattern	Mechanism	Use When
Simple ordered pipeline	`SequentialAgent`	Each step depends on the previous step's output; no branching
Simple concurrent fanout	`ParallelAgent`	Steps are independent; no need for conditional joins
Simple retry loop	`LoopAgent`	Repeat until quality threshold, with a hard `max_iterations`
Conditional branching / DAGs	`Workflow` + `edges`	Routing depends on data; multiple paths through the graph
Fan-out with aggregation	`Workflow` + `JoinNode`	Parallel branches must be collected before continuing
Human approval mid-flow	`Workflow` node in `mode="task"`	High-stakes actions (payments, deletions) need a pause-and-resume gate
Dynamic dispatch	`LlmAgent` + `sub_agents`	Routing logic is fuzzy/ambiguous; let the LLM decide

Agent & Graph Routing

Routing controls which agent — or which graph node — handles a given request. ADK supports everything from fully LLM-driven delegation to deterministic, code-based routing inside a Workflow.

LLM-Based Routing (Default, Hierarchical)

The orchestrator's LLM reads the user input and selects a sub-agent based on each agent's description. No routing code required.

router = Agent(
    name="router",
    model="gemini-3-flash",
    instruction="Route the user's request to the most appropriate specialist.",
    sub_agents=[billing_agent, tech_agent, onboarding_agent],
    # ADK uses agent.description to build the routing prompt automatically
)

LLM-as-Router in a Graph

For graph-based flows, an LlmAgent can act purely as a classifier: it emits a route tag, and a trivial function node maps that tag to an edge. The model decides; the graph makes the decision reliable, observable, and resumable.

from google.adk import Agent, Workflow
from google.adk.events import Event

classifier = Agent(
    name="classifier", model="gemini-3.1-flash-lite",
    instruction="Classify the message as one of: question, exclamation, statement. Reply with only that word.",
)

def emit_route(classifier_output: str) -> Event:
    # Normalise the model's reply so it matches the edge keys below
    return Event(route=classifier_output.strip().lower())

# question_handler, exclamation_handler, statement_handler: agents or functions defined elsewhere
root_agent = Workflow(
    name="llm_router_workflow",
    edges=[
        ("START", classifier, emit_route),
        (emit_route, {
            "question":    question_handler,
            "exclamation": exclamation_handler,
            "statement":   statement_handler,
        }),
    ],
)

Custom Router Function (Hierarchical)

For deterministic or cost-optimised routing outside a graph, provide a before_agent_callback that sets the target agent name in state:

import re

from google.adk.agents.callback_context import CallbackContext

def intent_router(callback_context: CallbackContext) -> None:
    """Keyword-based intent classification before LLM routing."""
    content = callback_context.user_content
    if not content or not content.parts or not content.parts[0].text:
        return None  # nothing to classify; fall through to the LLM
    user_msg = content.parts[0].text.lower()

    if re.search(r"invoice|billing|payment|refund", user_msg):
        callback_context.state["intended_agent"] = "billing_agent"
    elif re.search(r"error|crash|bug|integrate", user_msg):
        callback_context.state["intended_agent"] = "tech_agent"
    # Returning None lets the LLM orchestrator handle ambiguous cases
    return None

Routing Patterns Summary

LLM Routing

Pro: Flexible, handles ambiguous inputs, zero code. Con: Costs tokens per request, non-deterministic.

Mechanism: LlmAgent + sub_agents

Graph Routing

Pro: Deterministic, observable in the workflow graph, cheap for pure function nodes. Con: Requires up-front route design.

Mechanism: Workflow edges + Event.route

Keyword / Rule Routing

Pro: Fast, free, deterministic. Con: Brittle, misses paraphrasing, requires maintenance.

Mechanism: before_agent_callback

Fallback Chain

Try primary agent; on error or low-confidence response, re-route to a fallback agent. Implement via after_agent_callback or a default route edge.

Mechanism: after_agent_callback
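The fallback chain can be sketched independently of any callback API. In the example below each handler is a plain function returning a text and a confidence score; that score, the 0.6 threshold, and the handler names are all assumptions for illustration, since the text above does not specify how confidence is measured.

```python
from typing import Callable

Handler = Callable[[str], dict]  # returns {"text": ..., "confidence": ...}


def run_with_fallback(handlers: list[Handler], message: str, min_confidence: float = 0.6) -> dict:
    """Try handlers in order; accept the first confident, error-free reply."""
    last_error = None
    for handler in handlers:
        try:
            reply = handler(message)
        except RuntimeError as exc:
            last_error = exc  # remember the failure, then try the next handler
            continue
        if reply["confidence"] >= min_confidence:
            return {**reply, "handled_by": handler.__name__}
    return {"text": "Escalating to a human.", "confidence": 0.0,
            "handled_by": "none", "error": str(last_error) if last_error else None}


def primary(message: str) -> dict:
    raise RuntimeError("primary model unavailable")


def unsure(message: str) -> dict:
    return {"text": "Maybe?", "confidence": 0.2}


def fallback(message: str) -> dict:
    return {"text": f"Here is help with: {message}", "confidence": 0.9}


answer = run_with_fallback([primary, unsure, fallback], "reset my password")
exhausted = run_with_fallback([primary, unsure], "reset my password")
```

Note the explicit terminal case: when every handler fails or is unsure, the chain returns a defined escalation result instead of raising or returning nothing.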

Agent Configuration

ADK exposes granular configuration through GenerateContentConfig for LLM parameters, RunConfig for runtime execution settings, and (2.0+) RetryConfig/timeouts at the node level.

LLM Generation Config

from google.genai.types import (
    GenerateContentConfig, HarmBlockThreshold, HarmCategory, SafetySetting, ThinkingConfig,
)

agent = Agent(
    name="configured_agent",
    model="gemini-3-flash",
    instruction="You are a precise data extraction assistant.",
    generate_content_config=GenerateContentConfig(
        temperature=0.2,          # low = more deterministic
        top_p=0.9,
        max_output_tokens=2048,
        # Gemini 3.x thinking control: "minimal" | "low" | "medium" | "high".
        # In the google-genai SDK this setting is nested under thinking_config.
        thinking_config=ThinkingConfig(thinking_level="low"),
        stop_sequences=["END_OF_OUTPUT"],
        safety_settings=[
            SafetySetting(
                category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
            )
        ],
    ),
)

Runtime Config

from google.adk.runners import RunConfig

run_config = RunConfig(
    streaming_mode="SSE",        # "SSE" | "BIDI" | None
    max_llm_calls=20,              # circuit breaker: max LLM calls per run
    response_modalities=["TEXT"],  # "TEXT" | "AUDIO" | "IMAGE"
)

async for event in runner.run_async(
    user_id="u1", session_id="s1",
    new_message=msg,
    run_config=run_config,
):
    ...

Key Configuration Parameters

Parameter	Location	Default	Notes
`temperature`	GenerateContentConfig	1.0	0.0 = most repeatable (not strictly deterministic), 2.0 = most varied
`thinking_level`	GenerateContentConfig	model default	Gemini 3.x reasoning control: minimal/low/medium/high — trades latency for depth
`max_output_tokens`	GenerateContentConfig	model default	Cap output length to control cost
`max_llm_calls`	RunConfig	unlimited	Always set in production to prevent runaway agents
`streaming_mode`	RunConfig	None	SSE for web streaming; BIDI for voice/audio
`output_schema`	Agent	None	Forces structured JSON output; required for most task-mode nodes
`retry_config`	Node / FunctionNode	no retry	New in 2.0. Per-node retry policy with backoff and jitter
`timeout`	Node	none	New in 2.0. Hard wall-clock cap per node attempt

🚨

Always set max_llm_calls in production. Without a call limit, a tool that keeps returning errors can send an LLM agent into an indefinite retry loop and run up unexpected costs. In graph workflows, pair this with per-node timeout and retry_config for defense in depth.
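The circuit-breaker idea behind max_llm_calls can be shown in a few lines of plain Python. This sketch is not how ADK implements the limit; CallBudget and LlmCallLimitExceeded are hypothetical names, and the loop simulates an agent whose tool never succeeds.

```python
class LlmCallLimitExceeded(RuntimeError):
    """Raised when a run exceeds its call budget (hypothetical name)."""


class CallBudget:
    """Counts model calls for one run and trips once the limit is passed."""

    def __init__(self, max_llm_calls: int):
        self.max_llm_calls = max_llm_calls
        self.used = 0

    def charge(self) -> None:
        self.used += 1
        if self.used > self.max_llm_calls:
            raise LlmCallLimitExceeded(f"exceeded {self.max_llm_calls} LLM calls")


def run_agent_loop(budget: CallBudget) -> str:
    """Simulate an agent whose tool always fails, so it keeps retrying."""
    while True:
        budget.charge()          # one model call per loop iteration
        tool_result = {"status": "error"}
        if tool_result["status"] == "success":
            return "done"


budget = CallBudget(max_llm_calls=20)
try:
    run_agent_loop(budget)
    outcome = "completed"
except LlmCallLimitExceeded as exc:
    outcome = str(exc)
```

The loop would otherwise never terminate; the budget converts a silent runaway into a single, attributable error after a bounded number of calls.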

Structured Output

Force the LLM to return JSON matching a Pydantic schema:

from pydantic import BaseModel
from google.adk import Agent

class ExtractedOrder(BaseModel):
    order_id: str
    customer_name: str
    total: float
    line_items: list[str]

extractor = Agent(
    name="order_extractor",
    model="gemini-3-flash",
    instruction="Extract order details from the provided email text.",
    output_schema=ExtractedOrder,  # response is always valid JSON
)
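On the consuming side, it is still worth validating the reply before trusting it. The sketch below shows the idea with the standard library only, using a dataclass as a stand-in for the Pydantic model; parse_order and the sample payloads are invented for illustration, and in a real project Pydantic's own validation would normally replace this.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ExtractedOrder:
    order_id: str
    customer_name: str
    total: float
    line_items: list


def parse_order(raw: str) -> ExtractedOrder:
    """Parse a JSON reply and check it has exactly the expected fields and types."""
    payload = json.loads(raw)
    expected = {f.name: f.type for f in fields(ExtractedOrder)}
    if set(payload) != set(expected):
        raise ValueError(f"field mismatch: {sorted(set(payload) ^ set(expected))}")
    for name, expected_type in expected.items():
        # JSON has a single number type, so accept ints where a float is expected.
        allowed = (int, float) if expected_type is float else expected_type
        if not isinstance(payload[name], allowed):
            raise ValueError(f"{name} should be {expected_type.__name__}")
    return ExtractedOrder(**payload)


order = parse_order(
    '{"order_id": "A-1001", "customer_name": "Dana", "total": 59.5, '
    '"line_items": ["widget", "gasket"]}'
)

try:
    parse_order('{"order_id": "A-1002", "total": "n/a"}')
    rejected = False
except ValueError:
    rejected = True
```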

Custom Agents & Nodes

Custom agents are created by subclassing BaseAgent. Use them when you need unique operational logic, specific control flows, or integrations not covered by LlmAgent or the workflow primitives. In graph-first designs, the equivalent extension point is a FunctionNode or a custom BaseNode subclass.

BaseAgent Contract (Hierarchical)

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from typing import AsyncGenerator

class MyCustomAgent(BaseAgent):
    """A custom agent with bespoke control flow."""

    validator: BaseAgent
    executor: BaseAgent
    max_retries: int = 3

    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        """Core execution logic — must yield Event objects."""

        for attempt in range(self.max_retries):
            async for event in self.validator.run_async(ctx):
                yield event

            if ctx.session.state.get("validation_passed"):
                async for event in self.executor.run_async(ctx):
                    yield event
                return

            ctx.session.state["retry_count"] = attempt + 1

        ctx.session.state["failed"] = True

Custom FunctionNode (Graph)

Any pure Python function can be dropped directly into a Workflow's edges list. Use a FunctionNode wrapper when you need to attach a retry_config, timeout, or explicit schema:

from google.adk.workflow import FunctionNode, RetryConfig

def validate_and_normalize(raw_input: dict) -> dict:
    """Deterministic validation node — no LLM, no cost."""
    if not raw_input.get("email"):
        raise ValueError("Missing email field")  # propagates to RetryConfig, don't swallow it
    return {**raw_input, "email": raw_input["email"].strip().lower()}

validate_node = FunctionNode(
    name="validate_and_normalize",
    fn=validate_and_normalize,
    retry_config=RetryConfig(max_attempts=2),
    timeout=5,
)

Custom Agent / Node Use Cases

Rules Engine Node

Deterministic business rule evaluation as a FunctionNode. Route around the LLM entirely for cases fully covered by rules — zero LLM cost.

Human-in-the-Loop Node

Pause a graph mid-run and wait for human approval, natively supported by the Workflow Runtime and the Task API — no bespoke resumption plumbing required.

Guard Agent

Input/output safety guard. Wrap any sub-agent or node: inspect the incoming message, optionally reject it, run the wrapped step, then inspect/redact the response before returning.

Caching Node

Semantic cache layer as a FunctionNode. Embed the query, check a vector store (Pinecone/Qdrant/Chroma via the toolbox extra) for a near-duplicate prior answer, and route around the LLM on a hit.
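The caching idea can be sketched without any ADK dependency. The example below is an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, `SemanticCache` is a hypothetical in-memory substitute for a vector store, and the 0.8 threshold is arbitrary. A function like `lookup` is the kind of plain callable that could back a caching node.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts. A real node would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """In-memory stand-in for a vector store holding (embedding, answer) pairs."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (Counter, str) tuples

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

    def lookup(self, query: str) -> Optional[str]:
        """Return the best cached answer at or above the threshold, else None."""
        query_vec = embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.8)
cache.put("what is the refund policy", "Refunds are available within 30 days.")

hit = cache.lookup("what is the refund policy please")  # near-duplicate wording
miss = cache.lookup("how do I reset my password")       # unrelated query
```

On a hit the graph can return the cached answer directly; on a miss (`None`) it would route to the LLM node and store the new answer with `put`.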

Guard Agent Example

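from typing import AsyncGenerator

from google.adk.agents import BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.genai import types

class InputGuardAgent(BaseAgent):
    """Validates inputs before passing to inner_agent."""
    inner_agent: BaseAgent
    forbidden_patterns: list[str] = []

    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        # Join all text parts; non-text parts (images, files) have text=None.
        parts = ctx.user_content.parts if ctx.user_content and ctx.user_content.parts else []
        user_text = " ".join(p.text for p in parts if p.text)

        for pattern in self.forbidden_patterns:
            if pattern.lower() in user_text.lower():
                # Reject with a plain model-authored text event and stop.
                yield Event(
                    author=self.name,
                    content=types.Content(
                        role="model",
                        parts=[types.Part(text="I'm not able to help with that request.")],
                    ),
                )
                return

        async for event in self.inner_agent.run_async(ctx):
            yield event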
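The Guard Agent card also mentions redacting the response on the way out. The helper below is a rough sketch of that output side only: `redact` and both regular expressions are illustrative assumptions, not part of ADK, and a production guard would likely rely on a vetted PII detector instead.

```python
import re

# Illustrative patterns only; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def redact(text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[redacted email]", text)
    return CARD_RE.sub("[redacted number]", text)


clean = redact("Contact ada@example.com, card 4111 1111 1111 1111.")
```

A guard agent could apply such a function to the text parts of each event coming back from the wrapped agent before yielding it onward.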

Custom Tools & Integrations

Custom tools are Python callables that agents can invoke. ADK auto-generates the tool schema from type annotations and the Google-style docstring. No additional registration boilerplate is needed.

Function Tool (Recommended)

from typing import Literal

def search_documents(
    query: str,
    collection: Literal["general", "technical", "legal"] = "general",
    max_results: int = 5,
) -> dict:
    """Search an internal document collection.

    Use this tool to find relevant documents, policies, or knowledge-base
    articles matching a natural-language query.

    Args:
        query: Natural-language search query.
        collection: Document collection to search. One of: 'general',
            'technical', 'legal'. Defaults to 'general'.
        max_results: Maximum number of results to return (1–20).

    Returns:
        dict with:
            status: 'success' or 'error'.
            results: list of dicts, each with 'title', 'url', 'snippet'.
            total_found: int — total matching documents.
    """
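    max_results = max(1, min(max_results, 20))  # enforce the documented 1–20 range
    # vector_store is assumed to be a module-level client configured elsewhere.
    hits = vector_store.search(query, collection=collection, top_k=max_results)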
    return {
        "status": "success",
        "results": [{"title": h.title, "url": h.url, "snippet": h.snippet} for h in hits],
        "total_found": hits.total,
    }
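To get a feel for what a framework can read from a function like the one above, the sketch below derives a simplified description using only `inspect` and `typing`. This is not ADK's actual schema generator; `describe_tool` is a hypothetical helper, and the trimmed `search_documents` stub exists only so the example runs standalone.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Build a simplified tool description from a function's signature and docstring."""
    hints = get_type_hints(fn)
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        hint = hints.get(name)
        params[name] = {
            "type": getattr(hint, "__name__", str(hint)),
            "required": param.default is inspect.Parameter.empty,
        }
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.split("\n\n")[0],  # first docstring paragraph
        "parameters": params,
    }


def search_documents(query: str, collection: str = "general", max_results: int = 5) -> dict:
    """Search an internal document collection.

    Args:
        query: Natural-language search query.
    """
    return {"status": "success", "results": [], "total_found": 0}


schema = describe_tool(search_documents)
```

Printing `schema` before wiring a tool into an agent is a cheap way to spot missing annotations or an uninformative first docstring line.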

Pre-Built Toolbox Connectors

Rather than hand-rolling connectors for common systems, the toolbox extra ships maintained, versioned wrappers:

import os

from google.adk.toolbox.github import GitHubToolset
from google.adk.toolbox.jira import JiraToolset
from google.adk.toolbox.mongodb import MongoDBToolset
from google.adk import Agent

github_tools  = GitHubToolset(repo="acme/backend", token=os.environ["GITHUB_TOKEN"])
jira_tools    = JiraToolset(project="ENG", base_url="https://acme.atlassian.net")
mongo_tools   = MongoDBToolset(connection_string=os.environ["MONGO_URI"], read_only=True)

devops_agent = Agent(
    name="devops_agent",
    model="gemini-3-flash",
    instruction="Triage failing CI runs: check GitHub Actions, file/update Jira tickets, and query build metrics from MongoDB.",
    tools=[*github_tools.get_tools(), *jira_tools.get_tools(), *mongo_tools.get_tools()],
)

MCP Tool Integration

Connect any Model Context Protocol server as a toolset — ADK discovers tools from the MCP server automatically. With Agent Registry now GA, MCP servers registered in your org can also be discovered by name instead of a hardcoded connection string.

from google.adk import Agent
from google.adk.tools.mcp_tool import MCPToolset, StdioServerParameters

async def build_agent_with_mcp():
    mcp_tools, exit_stack = await MCPToolset.from_server(
        connection_params=StdioServerParameters(
            command="npx",
            args=["-y", "@modelcontextprotocol/server-filesystem", "/workspace"],
        )
    )
    agent = Agent(
        name="fs_agent",
        model="gemini-3-flash",
        tools=mcp_tools,    # tools auto-discovered from MCP server
    )
    return agent, exit_stack

OpenAPI Tool Integration

from google.adk import Agent
from google.adk.tools.openapi_tool import OpenAPIToolset

async def build_api_agent():
    toolset = await OpenAPIToolset.from_spec(
        spec_url="https://api.example.com/openapi.json",
        operations_filter=lambda op: op.method == "GET",
    )
    agent = Agent(
        name="api_agent",
        model="gemini-3-flash",
        tools=toolset.tools,
    )
    return agent

Tool Best Practices

✅

Write docstrings like user stories. ADK passes your docstring verbatim to the LLM as the tool description. Vague docstrings lead to incorrect tool selection.

Use specific argument types. Prefer Literal["option_a", "option_b"] over bare str for enum-like args — it becomes an enum in the tool schema, drastically reducing LLM errors.

Never swallow exceptions from tools. Return a {"status": "error", "message": "..."} dict for expected failures, but let unexpected exceptions propagate — since 2.0, that's what feeds the node-level RetryConfig.
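The last two practices can be combined in one small tool. The sketch below is illustrative only: `list_documents`, the `Collection` alias, and the in-memory `_INDEX` are hypothetical names, and the function is called directly rather than through an agent.

```python
from typing import Literal, get_args

Collection = Literal["general", "technical", "legal"]

# Hypothetical in-memory index standing in for a real document store.
_INDEX = {
    "general": ["Holiday policy", "Office map"],
    "technical": ["Deploy runbook"],
    "legal": [],
}


def list_documents(collection: Collection, prefix: str = "") -> dict:
    """List document titles in a collection, optionally filtered by prefix.

    Args:
        collection: Which collection to list.
        prefix: Only return titles starting with this text.
    """
    # Expected failure: the caller passed a value outside the enum.
    if collection not in get_args(Collection):
        return {"status": "error", "message": f"Unknown collection: {collection!r}"}
    # Unexpected failures (for example a KeyError from a corrupted index)
    # are deliberately not caught here, so they surface to the caller.
    titles = [t for t in _INDEX[collection] if t.startswith(prefix)]
    return {"status": "success", "titles": titles}


ok = list_documents("general", prefix="Ho")
bad = list_documents("finance")  # outside the Literal; handled as an expected error
```

The error dict gives the model something it can read and correct, while anything the function did not anticipate still raises.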

Skills for Agents

Agent Skills are reusable, self-contained capability packages that work within AI context-window limits. They differ from raw tools by encapsulating a higher-level behaviour with its own context, prompt fragments, and internal logic.

ℹ️

ADK Skills docs: adk.dev/skills/ — covers both pre-built skills from the Agent Skills registry and building custom skills.

Skills vs Tools vs Sub-Agents vs Graph Nodes

	Tool	Skill	Sub-Agent	Graph Node
Scope	Single function call	Multi-step capability	Full autonomous agent	Any unit of work
Context cost	Low	Medium (context-efficient)	High (own conversation)	Depends on node type
Control flow	None	Internal only	LLM-decided delegation	Explicit edges/routes
Retries	Manual	Manual	Manual	`RetryConfig` built in
LLM calls	0	0–1 (optional)	1+	0 (FunctionNode) or 1+ (Agent node)

Using a Pre-built Skill

from google.adk.skills import load_skill
from google.adk import Agent

# Load a pre-built skill from the Agent Skills registry
web_research_skill = load_skill("google/web-research")
summariser_skill   = load_skill("google/summarise-document")

agent = Agent(
    name="research_agent",
    model="gemini-3-flash",
    tools=[web_research_skill, summariser_skill],
)

Building a Custom Skill

A skill is a Python class that implements the BaseSkill interface, packaging related tools + context instructions together:

from dataclasses import dataclass

from google.adk import Agent
from google.adk.skills import BaseSkill

@dataclass
class CRMSkill(BaseSkill):
    """CRM operations skill — search contacts, create deals, update pipeline.

    Optimised to minimise context-window usage by pre-fetching commonly
    needed metadata in a single batch call rather than individual lookups.
    """
    crm_url: str
    api_key: str

    @property
    def name(self) -> str:
        return "crm_skill"

    @property
    def description(self) -> str:
        return (
            "Search CRM contacts, create or update deals, "
            "and manage sales pipeline stages."
        )

    def get_tools(self) -> list:
        return [self._search_contacts, self._create_deal, self._update_stage]

    def _search_contacts(self, name: str = "", email: str = "") -> dict:
        """Search CRM for contacts by name or email."""
        results = crm_api.search(name=name, email=email, url=self.crm_url)
        return {"status": "success", "contacts": results}

    def _create_deal(self, contact_id: str, title: str, value: float) -> dict:
        """Create a new deal in the CRM pipeline."""
        deal = crm_api.create_deal(contact_id=contact_id, title=title, value=value)
        return {"status": "success", "deal_id": deal.id}

    def _update_stage(self, deal_id: str, stage: str) -> dict:
        """Update the pipeline stage of an existing deal."""
        crm_api.update(deal_id=deal_id, stage=stage)
        return {"status": "success"}

# Use the skill in an agent
crm_skill = CRMSkill(crm_url="https://crm.example.com", api_key="...")
agent = Agent(
    name="sales_agent",
    model="gemini-3-flash",
    tools=crm_skill.get_tools(),
)

Context-Efficiency Tips

Batch lookups: Skills should fetch all needed context in a single tool call rather than making multiple sequential calls.
Return summaries: Return compact summaries of large data sets; include a fetch_details(id) tool for when the LLM needs more.
Use state caching: Cache expensive lookups in session state so sibling tools in the same skill don't repeat the same API call.
Skill composition: Compose coarse-grained skills from fine-grained tools rather than exposing every low-level function directly to the LLM.
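The first three tips can be sketched together. In the example below a plain `dict` stands in for session state, and `_ACCOUNTS`, `_fetch_all_accounts`, `list_accounts`, and the module-level `api_calls` counter are hypothetical names used only to make the caching visible; a real skill would read state through whatever context object the framework hands to its tools.

```python
# Hypothetical backing data; a real skill would call the CRM API here.
_ACCOUNTS = {
    "a1": {"name": "Acme", "tier": "gold", "notes": "Long free-text history"},
    "a2": {"name": "Globex", "tier": "silver", "notes": "Another long record"},
}
api_calls = 0


def _fetch_all_accounts() -> dict:
    """Simulated expensive batch call; counts how often it is invoked."""
    global api_calls
    api_calls += 1
    return _ACCOUNTS


def list_accounts(state: dict) -> dict:
    """Return compact summaries, caching the batch result in session-like state."""
    if "accounts_cache" not in state:
        state["accounts_cache"] = _fetch_all_accounts()
    summaries = [
        {"id": account_id, "name": record["name"]}
        for account_id, record in state["accounts_cache"].items()
    ]
    return {"status": "success", "accounts": summaries}


def fetch_details(state: dict, account_id: str) -> dict:
    """Return the full record for one account, reusing the cached batch."""
    if "accounts_cache" not in state:
        state["accounts_cache"] = _fetch_all_accounts()
    record = state["accounts_cache"].get(account_id)
    if record is None:
        return {"status": "error", "message": f"Unknown account: {account_id}"}
    return {"status": "success", "account": record}


state = {}
summary = list_accounts(state)
details = fetch_details(state, "a1")
```

Both tools share one batch fetch, the summary keeps large fields out of the context window, and `fetch_details` is available when the model needs the full record.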

Callbacks

Callbacks hook into the agent execution lifecycle without modifying core logic. They are the primary extension point for logging, guardrails, caching, and side effects.

Callback	When	Can Modify
`before_agent_callback`	Before any agent run starts	Block execution, modify state
`after_agent_callback`	After agent run completes	Modify/replace final response
`before_model_callback`	Before each LLM API call	Modify prompt, block call, inject cached response
`after_model_callback`	After each LLM API call	Inspect/modify raw LLM response
`before_tool_callback`	Before each tool execution	Block call, modify args, return mock result
`after_tool_callback`	After each tool execution	Modify tool response, log results

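import time

from google.adk import Agent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest, LlmResponse

# Example: LLM call logger + latency tracker.
# ADK passes these arguments by keyword, so keep the parameter names as written.
def before_model_log(callback_context: CallbackContext, llm_request: LlmRequest):
    callback_context.state["_llm_start"] = time.time()
    print(f"[LLM] Calling {llm_request.model} | tokens≈{len(str(llm_request.contents))//4}")
    return None  # returning None lets the model call proceed

def after_model_log(callback_context: CallbackContext, llm_response: LlmResponse):
    # Use .get() rather than .pop(): the state object is not a plain dict.
    start = callback_context.state.get("_llm_start")
    if start is not None:
        print(f"[LLM] Done in {time.time() - start:.2f}s")
    return None  # returning None keeps the model response unchanged

agent = Agent(
    name="monitored_agent",
    model="gemini-3-flash",
    before_model_callback=before_model_log,
    after_model_callback=after_model_log,
    # ... remaining agent parameters
)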
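The tool-level hooks in the callback table can be sketched in the same spirit. The example below assumes the ADK 1.x convention for `before_tool_callback` (arguments `tool`, `args`, `tool_context`; a returned dict replaces the tool result, `None` lets the call proceed); verify this against your installed version. `BLOCKED_TOOLS`, `guard_tool_call`, and the tool names are hypothetical, and the callback is exercised with stand-in objects rather than a running agent.

```python
from types import SimpleNamespace
from typing import Any, Optional

BLOCKED_TOOLS = {"delete_records"}  # hypothetical tool names to refuse


def guard_tool_call(tool: Any, args: dict, tool_context: Any) -> Optional[dict]:
    """Block listed tools and cap an over-large 'max_results' argument.

    Returning a dict is assumed to skip the real tool and act as its result;
    returning None lets the call proceed with the (possibly modified) args.
    """
    if tool.name in BLOCKED_TOOLS:
        return {"status": "error", "message": f"Tool {tool.name} is not permitted."}
    if args.get("max_results", 0) > 20:
        args["max_results"] = 20  # mutate in place so the tool sees the capped value
    return None


# Exercise the callback with stand-in objects instead of a running agent.
blocked = guard_tool_call(SimpleNamespace(name="delete_records"), {}, None)
search_args = {"query": "vpn", "max_results": 500}
allowed = guard_tool_call(SimpleNamespace(name="search_documents"), search_args, None)
```

Keeping the policy in a plain function like this makes it straightforward to unit-test before attaching it to an agent.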

ℹ️

The wider ADK ecosystem (Go SDK first, Python following the same direction) is moving toward a unified execution context — one context object shared by tools, callbacks, and graph nodes — so instrumentation code written for a callback also works unmodified inside a workflow node. Watch the release notes if you build heavily on ToolContext/CallbackContext internals.

2.0 Migration Notes

ADK Python 2.0 reached GA on May 19, 2026; ADK Go 2.0 followed on June 30, 2026. Both introduce the Workflow Runtime as a genuinely additive layer — existing 1.x LlmAgent, SequentialAgent, ParallelAgent, and LoopAgent code keeps working — but a few things are worth reviewing before upgrading a production app.

✕ Watch for — Breaking in 2.0

Custom BaseSessionService backed by rigid SQL columns needs new node_info / output fields
Sessions written by 2.0 are unreadable by ADK <1.28
Broad except Exception in tools silently disables node retries
Catching BaseException traps NodeInterruptedError and breaks HITL pausing
Task-mode agents can't be used as static (non-task) graph nodes
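The two exception-handling items in this list are easy to reproduce in isolation. The sketch below uses a toy retry loop and a stand-in exception class; `run_with_retries` and `NodeInterruptedStandIn` are hypothetical, and the assumption that the runtime's interruption signal derives from `BaseException` rather than `Exception` is inferred from the wording above, not confirmed.

```python
class NodeInterruptedStandIn(BaseException):
    """Stand-in for the runtime's interruption signal (assumed to bypass Exception)."""


def run_with_retries(fn, max_attempts: int):
    """Toy retry loop: re-invoke fn while it raises an ordinary Exception."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return fn(), attempts
        except Exception:
            if attempts >= max_attempts:
                raise


calls = {"n": 0}


def flaky() -> str:
    """Fails on the first call, succeeds on the second."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient")
    return "ok"


def swallowing() -> dict:
    """Anti-pattern: the broad except hides the failure from the retry loop."""
    try:
        raise ConnectionError("transient")
    except Exception:
        return {"status": "error"}


def traps_interrupt() -> str:
    """Anti-pattern: except BaseException also catches the pause signal."""
    try:
        raise NodeInterruptedStandIn()
    except BaseException:
        return "interrupt swallowed"


retried = run_with_retries(flaky, max_attempts=2)
not_retried = run_with_retries(swallowing, max_attempts=2)
trapped = traps_interrupt()
```

`flaky` succeeds on its second attempt because its error reached the retry loop; `swallowing` returns after one attempt with an error dict the loop cannot distinguish from success; `traps_interrupt` shows how a pause signal would never reach the runtime.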

✓ New in 2.0 — Additive

Workflow + node/edge graph engine — routing, fan-out/fan-in, retry, loops
Task API — multi-turn task mode, single-turn controlled output, HITL
JoinNode for deterministic fan-in of parallel branches
Per-node RetryConfig and timeout
A2A protocol formalised; Agent Registry reached GA for discovery
Toolbox ecosystem: GitHub, Jira, Notion, MongoDB, Pinecone, Qdrant, MLflow, and more

Safe Upgrade Path

# If you're not ready to move to 2.0 yet, pin to the 1.x line with the compatible-release operator (>=1.0, <2.0):
pip install "google-adk~=1.0"

# When ready, upgrade and re-run your eval suite before promoting to prod
pip install --upgrade google-adk
adk eval my_agent ./evals/regression_suite.json
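For readers unfamiliar with the `~=` operator used in the pin above, the sketch below mimics its matching rule for plain numeric versions. It is a simplification (no pre-release or post-release tags), `parse` and `compatible` are hypothetical helpers, and real tooling should rely on pip or the `packaging` library instead.

```python
def parse(version: str) -> tuple:
    """Split a plain numeric version such as '1.28.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible(installed: str, spec: str) -> bool:
    """True if `installed` satisfies '~=spec' for plain numeric versions.

    '~=1.0' means: at least 1.0, and the same leading components once the
    last component of the spec is dropped (so any 1.x, but not 2.x).
    """
    inst, base = parse(installed), parse(spec)
    prefix = base[:-1]
    return inst[: len(prefix)] == prefix and inst >= base


stays_on_1x = compatible("1.28.0", "1.0")
rejects_2x = compatible("2.3.1", "1.0")
```

With the spec `1.0`, any 1.x release matches and every 2.x release is rejected, which is why the pin keeps an application on the 1.x line until the upgrade is deliberate.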

⚠️

Report any additional 1.x → 2.0 incompatibility you hit through the adk-python issue tracker — the migration guide at adk.dev/2.0 is actively updated as teams upgrade.

Deployment

ℹ️

Naming update (June 2026): Vertex AI is now part of Gemini Enterprise Agent Platform. Agent Engine is renamed Agent Runtime. Agent Builder Sessions is now Agent Platform Sessions, and Memory Bank is now Agent Platform Memory Bank. Old bookmarked Cloud Console links redirect automatically.

Cloud Run (Recommended for Most Teams)

# Build and push container
gcloud builds submit --tag gcr.io/PROJECT_ID/my-agent

# Deploy to Cloud Run (--allow-unauthenticated makes the service public; omit it to require IAM auth)
gcloud run deploy my-agent \
  --image gcr.io/PROJECT_ID/my-agent \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated \
  --set-env-vars GOOGLE_CLOUD_PROJECT=PROJECT_ID

Agent Runtime (Production Grade, formerly Agent Engine)

Agent Runtime provides session persistence, horizontal scaling, and built-in observability — with OpenTelemetry tracing enabled by default for new ADK agent deployments as of the 2026 release wave:

# Deploy using ADK CLI (note: this subcommand targets Cloud Run, not Agent Runtime)
adk deploy cloud_run \
  --project PROJECT_ID \
  --region us-central1 \
  --agent_module my_agent \
  --service_name my-agent-service \
  --verbosity debug     # detailed Cloud Run deploy logging

# Or use the agents-cli for the full managed Agent Runtime
agents deploy --source . --name my-agent --region us-central1

Sub-Second Cold Starts

Agent Runtime provisioning is now under 1 minute, and cold starts for scaled-to-zero agents are sub-second — a major change from the original Agent Engine's minutes-long spin-up.

Long-Running Operations

Agent Runtime now supports operations running for up to 7 days — suited to long agentic workflows and overnight batch-style graph runs.

Custom Containers

You can bring your own container image to Agent Runtime instead of using the managed build path, useful for agents with unusual system dependencies.

Deployment Checklist

Set max_llm_calls in RunConfig and per-node timeout/RetryConfig to prevent runaway agents and stuck graphs.
Use a production-grade session service (Cloud Spanner, Firestore, or Postgres) — verify your schema handles the 2.0 node_info/output event fields.
Rely on default-on OpenTelemetry tracing on Agent Runtime, or export to your own OTel collector / MLflow for self-hosted observability.
Store all secrets in Secret Manager — never in environment variables baked into container images.
Set up rate limiting and budget alerts in Google Cloud Billing.
Write evaluation tests with adk eval before promoting to production, and re-run them after any ADK version bump.
If exposing agents for A2A delegation, register them in Agent Registry so other teams can discover them without hardcoded endpoints.

Reference Links

Python Quickstart: Installation, project setup, first agent
ADK 2.0 Overview & Migration Guide: Workflow Runtime, Task API, breaking changes
LLM Agents: LlmAgent / Agent reference, all parameters
Graph Routes: Workflow, edges, routers, JoinNode
Classic Workflow Agents: SequentialAgent, ParallelAgent, LoopAgent
Multi-Agent Systems & A2A: Agent teams, orchestration, A2A protocol
Custom Agents: Extending BaseAgent
Agent Config: GenerateContentConfig, RunConfig, RetryConfig reference
Custom Tools: Function tools, MCP tools, OpenAPI toolsets, Toolbox connectors
Skills for Agents: Pre-built skills, custom skill development
Callbacks: Lifecycle hooks, guardrails, logging patterns
Deploy to Cloud Run: Container deployment guide
Agent Runtime on Gemini Enterprise Agent Platform: Managed deployment, formerly Vertex AI Agent Engine
Evaluation: Testing agents with adk eval
ADK Python GitHub: Source code, changelog, issues, examples
ADK Go GitHub: Idiomatic Go implementation, GA since June 2026
Python API Reference: Full class / method reference incl. google.adk.workflow
google-adk on PyPI: Latest release, changelog, extras