Enterprise Integration Reference

LLM API Standards and Tooling Handbook

A production-ready guide to OpenAI and Anthropic API standards, tool-calling formats, and architecture patterns that reduce lock-in while keeping developer velocity high.

Raw JSON First Python SDK Examples Vendor-Agnostic Design v2 · July 2026

What's new in v2 (July 2026): a new Module 7 on the Model Context Protocol (MCP) — now governed by the Linux Foundation's Agentic AI Foundation and adopted by OpenAI, Google, Microsoft, and AWS as the standard for connecting models to tools. Module 2 covers OpenAI's Responses API and the Assistants API sunset. Module 3 covers Anthropic's GA structured outputs, strict tool use, and the platform.claude.com console migration. Module 5 adds OpenAI-compatible endpoints (Gemini, Grok) and OpenRouter to the agnostic-wrapper discussion. Code samples updated to current model families (GPT-5.6, Claude Sonnet 4.6 / Opus 4.7).

How to use this handbook: every module starts with the exact wire payload sent over HTTP, then shows how Python SDKs abstract that payload. This is the fastest way to understand both API behavior and production debugging.

This handbook is organized around seven enterprise implementation modules. The sequence moves from API fundamentals, through the two dominant wire formats and tool-calling conventions, to vendor-agnostic architecture, common integration failures, and the Model Context Protocol (MCP) — the interoperability layer that has emerged alongside chat-completions as a second de facto standard.

Module 1 Included

The Paradigm of LLM APIs

Why standards matter, what lock-in looks like, and why /v1/chat/completions became the practical common language.

Module 2 Included

OpenAI API Standard

Message roles, payload anatomy, Python SDK usage, and how the newer Responses API relates to Chat Completions.

Module 3 Included

Anthropic Messages API

Top-level system prompt design, strict alternation, and implementation details in Python.

Module 4 Included

Tool Calling Formats

OpenAI function/tool calling versus Anthropic tool definitions and output behavior.

Module 5 Included

Agnostic Wrapper Pattern

Factory-based router design, LiteLLM translation, and OpenAI-compatible endpoints (Gemini, Grok) to avoid provider-specific rewrites.

Module 6 Included

Pitfalls and Anti-Patterns

Streaming gaps, schema drift, incorrect tool-result handoff, and blind trust in third-party MCP servers.

Module 7 New in v2

Model Context Protocol (MCP)

The open, Linux Foundation-governed standard for connecting models to tools and data — architecture, a minimal server, and how it differs from native tool calling.

Module 1: The Paradigm of LLM APIs

When teams integrate LLMs, they are not just calling a model. They are establishing a long-term contract between product behavior, infrastructure, and vendor capability. API standards matter because they define how quickly you can switch providers, integrate new models, or roll back during incidents.

The API Landscape

Think of LLM APIs like electrical sockets. If every country had a unique plug shape, device makers would ship custom adapters for each region. That is exactly what happens when each model provider has a unique payload format: your application code becomes a stack of adapters, fragile parsers, and duplicated logic.

In enterprise systems, vendor lock-in is not only a pricing risk. It is also an operational risk. If procurement, latency, compliance, or safety requirements change, your architecture should allow provider migration with minimal code churn.

Concern	Without Standardization	With Standardization
Cost control	Model switch requires rewrites	Model switch is often a config change
Incident response	Hard to fail over across providers	Fallback routing is easier
Team onboarding	Provider-specific mental models	Shared API semantics across teams

The De Facto Standard

The OpenAI chat-completions shape, especially /v1/chat/completions and the messages array, became the de facto standard because it is simple, expressive, and easy to proxy. Open-source inference servers such as vLLM, Ollama, and llama.cpp adopted compatible endpoints so developers could reuse clients and existing prompts. By 2026 this pattern extended to hosted frontier labs as well: Google's Gemini API and xAI's Grok API both ship OpenAI-compatible endpoints (swap the base_url and API key, keep the OpenAI SDK), and Anthropic's own /v1/messages is reachable through the same OpenAI SDK shim on gateways such as Vercel's AI Gateway.

In practice, this means many systems talk OpenAI-format JSON even when the backend model is not from OpenAI at all. This has the same effect as SQL in databases: each engine has differences, but a shared shape dramatically lowers adoption friction.

A second standard emerged in 2025–2026: while chat-completions standardized how you talk to a model, the Model Context Protocol (MCP) — introduced by Anthropic in November 2024 and donated to the Linux Foundation's Agentic AI Foundation in December 2025 — standardized how a model reaches out to tools and data sources. OpenAI, Google DeepMind, Microsoft, and AWS all shipped MCP support within about a year of launch. See Module 7 for the full picture.

Module 2: The OpenAI API Standard

The Payload Structure (Raw JSON)

At the wire level, the request body is JSON. The core object is messages, an ordered list representing conversation turns. Each item has a role and content. The three core roles are:

system: policy, tone, constraints, and behavioral instructions.
user: user intent and query.
assistant: prior model responses, useful for continuation and grounding.

{
  "model": "gpt-5.6-terra",
  "temperature": 0.2,
  "max_tokens": 400,
  "messages": [
    {
      "role": "system",
      "content": "You are an enterprise support assistant. Be concise and accurate."
    },
    {
      "role": "user",
      "content": "Summarize the SLA policy for API retries."
    }
  ]
}

The Code (Python SDK)

import os
from typing import Optional

from openai import OpenAI


def get_openai_chat_reply(user_prompt: str) -> str:
    """Send a standard chat completion request and return assistant text."""
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    response = client.chat.completions.create(
        model="gpt-5.6-terra",
        temperature=0.2,
        max_tokens=400,
        messages=[
            {"role": "system", "content": "You are an enterprise support assistant."},
            {"role": "user", "content": user_prompt},
        ],
    )

    content: Optional[str] = response.choices[0].message.content
    return content or ""


if __name__ == "__main__":
    answer = get_openai_chat_reply("Summarize the SLA policy for API retries.")
    print(answer)

Key characteristic: OpenAI-format APIs are highly flexible. Multiple system messages are allowed, and roles can repeat (for example several assistant turns), which is convenient for prompt layering and replaying conversation state.

The Responses API — OpenAI's Newer Surface

Since March 2025, OpenAI has offered a second surface, the Responses API (client.responses.create), positioned as the forward-looking default for agentic and reasoning workflows. OpenAI has stated Chat Completions will remain supported indefinitely as a de facto industry standard, but new capabilities — built-in tools, native MCP client support, encrypted stateful reasoning, and better cache utilization — land on Responses first. The Assistants API, an earlier attempt at statefulness, is scheduled to sunset on August 26, 2026, with its functionality absorbed into Responses.

{
  "model": "gpt-5.6",
  "instructions": "You are an enterprise support assistant. Be concise and accurate.",
  "input": [
    { "role": "user", "content": "Summarize the SLA policy for API retries." }
  ],
  "store": true,
  "tools": [{ "type": "web_search" }]
}

from openai import OpenAI

client = OpenAI(api_key="...")

response = client.responses.create(
    model="gpt-5.6",
    instructions="You are an enterprise support assistant.",
    input=[{"role": "user", "content": "Summarize the SLA policy for API retries."}],
    store=True,  # preserves reasoning and tool context turn-to-turn
)

print(response.output_text)

Dimension	Chat Completions	Responses API
State	Stateless — you resend full history	Optional server-side state via `store: true`
Vocabulary	`messages` array	`input` / `output` items (superset of messages)
Portability	Near-universal — the format most third-party gateways mirror	OpenAI-specific; growing MCP and built-in tool support
Recommended for	Simple stateless chat, cross-provider code, existing integrations	New agentic projects, multi-step tool use, reasoning models

Architecture decision: if cross-provider portability matters more than OpenAI's newest agentic features, stay on Chat Completions — it is what Gemini, Grok, vLLM, and most gateways mirror (see Module 1). If you are building an OpenAI-only agent that needs built-in tools, MCP connectors, or long-running reasoning state, start new work on the Responses API.

Module 3: The Anthropic API Standard (Messages API)

The Payload Structure (Raw JSON)

Anthropic's Messages API also uses a turn list, but the system instruction is defined at the top-level system field, not inside the messages array.

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 400,
  "temperature": 0.2,
  "system": "You are an enterprise support assistant. Be concise and accurate.",
  "messages": [
    {
      "role": "user",
      "content": "Summarize the SLA policy for API retries."
    }
  ]
}

The Core Differences from OpenAI

System prompt location: Anthropic keeps system top-level; OpenAI places system instructions inside messages.
Strict alternation: Anthropic enforces user/assistant alternation in messages. You cannot arbitrarily repeat roles without normalization.

The Code (Python SDK)

import os
from typing import List

from anthropic import Anthropic
from anthropic.types import MessageParam


def get_anthropic_reply(user_prompt: str) -> str:
    """Call Anthropic Messages API and return plain text output."""
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    messages: List[MessageParam] = [
        {"role": "user", "content": user_prompt}
    ]

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=400,
        temperature=0.2,
        system="You are an enterprise support assistant.",
        messages=messages,
    )

    text_blocks = [block.text for block in response.content if block.type == "text"]
    return "\n".join(text_blocks)


if __name__ == "__main__":
    answer = get_anthropic_reply("Summarize the SLA policy for API retries.")
    print(answer)

Structured Outputs and Strict Tool Use

Structured outputs reached general availability on the Claude API for Claude 4.5-and-later models (Sonnet 4.5+, Opus 4.5+, Haiku 4.5) in 2026. Rather than hoping the model returns clean JSON, the schema is compiled into a constrained-decoding grammar so the response is guaranteed to validate — no markdown fences, no trailing commas, no retry loop.

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "Extract the order: 2x SKU-4471 at $18.50, ship to Austin, TX."}
  ],
  "output_config": {
    "format": {
      "type": "json_schema",
      "schema": {
        "type": "object",
        "properties": {
          "sku": {"type": "string"},
          "quantity": {"type": "integer"},
          "unit_price": {"type": "number"},
          "ship_to_state": {"type": "string"}
        },
        "required": ["sku", "quantity", "unit_price", "ship_to_state"]
      }
    }
  }
}

The same request can pair output_config.format with tools[].strict: true to also guarantee that tool-call arguments validate against input_schema — useful in agentic loops where a malformed tool call silently breaks the next step. Note the parameter moved from the beta-era output_format to output_config.format; the old shape and its beta header still work during a transition period.

Console migration: console.anthropic.com now redirects to platform.claude.com — the API host (api.anthropic.com) and the anthropic-version: 2023-06-01 header are unchanged. Other 2026 additions worth knowing: a Files API for uploading documents once and referencing them by ID across requests, context editing and beta compaction for pruning or summarizing long agent transcripts server-side, and Claude in Amazon Bedrock's /anthropic/v1/messages endpoint, which mirrors the first-party request shape.

Module 4: The Tool Calling Formats (The Big Divide)

Why Tools Matter

In enterprise systems, free-form text is not enough. Agents must call systems of record, workflow engines, and policy services using structured input. Tool calling turns "chat" into "controlled execution."

Analogy: a tool call is like a purchase order form. A phone call saying "buy servers" is ambiguous, but a structured order with fields, validation, and IDs can be audited and automated.

OpenAI Tool Format

OpenAI defines tools in a tools array and controls execution behavior using tool_choice.

{
  "model": "gpt-5.6-terra",
  "messages": [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "Will it rain in Seattle tomorrow?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather forecast by city and date.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"}
          },
          "required": ["city", "date"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  ],
  "tool_choice": "auto"
}

Setting "strict": true on a function definition asks OpenAI to guarantee the returned arguments validate against the schema via constrained decoding — the same underlying idea as Anthropic's strict tool use, released around the same period.

When the model chooses a tool, it returns tool_calls on the assistant message:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Seattle\",\"date\":\"2026-04-21\"}"
            }
          }
        ]
      }
    }
  ]
}

Anthropic Tool Format

Anthropic declares tools with name, description, and input_schema. Tool requests appear as tool_use content blocks.

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 500,
  "system": "You are a weather assistant.",
  "messages": [
    {
      "role": "user",
      "content": "Will it rain in Seattle tomorrow?"
    }
  ],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather forecast by city and date.",
      "input_schema": {
        "type": "object",
        "properties": {
          "city": {"type": "string"},
          "date": {"type": "string"}
        },
        "required": ["city", "date"]
      },
      "strict": true
    }
  ]
}

Anthropic's Messages API can also declare an MCP connector directly in the request via a top-level mcp_servers array, letting Claude call a remote MCP server's tools without you hand-defining each one in tools. See Module 7 for how MCP tool declarations differ from this locally-defined format.

Typical response shape includes a tool_use block and may include text blocks. Some Claude variants also emit XML-like reasoning markers such as <thinking>...</thinking> in text output before tool usage. Treat those as model text, not structured API fields.

{
  "content": [
    {
      "type": "text",
      "text": "<thinking>I should call weather data first.</thinking>"
    },
    {
      "type": "tool_use",
      "id": "toolu_01ABC",
      "name": "get_weather",
      "input": {
        "city": "Seattle",
        "date": "2026-07-26"
      }
    }
  ],
  "stop_reason": "tool_use"
}

Module 5: Building an Agnostic Wrapper (Enterprise Architecture)

The Problem

If your code directly depends on provider-specific request and response shapes, moving from GPT-4o to Claude 3.5 Sonnet for cost, latency, or policy reasons becomes a refactor project. Every team touching prompts, tools, and telemetry must rewrite integration code.

Analogy: this is like writing SQL directly against one vendor's proprietary syntax in every file. It works until procurement or platform strategy changes.

The Solution: LLM Router/Wrapper

Use one internal request contract and route it through adapters. The wrapper normalizes message formats, tool schemas, and response parsing. Your app calls the wrapper, not raw vendor APIs.

Implementation Option A: Factory Pattern (Conceptual)

import json
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List

from anthropic import Anthropic
from openai import OpenAI


@dataclass
class ChatRequest:
    model: str
    messages: List[Dict[str, Any]]
    temperature: float = 0.2
    max_tokens: int = 500


class LlmProvider(ABC):
    @abstractmethod
    def complete(self, request: ChatRequest) -> str:
        raise NotImplementedError


class OpenAIProvider(LlmProvider):
    def __init__(self) -> None:
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def complete(self, request: ChatRequest) -> str:
        response = self.client.chat.completions.create(
            model=request.model,
            messages=request.messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        return response.choices[0].message.content or ""


class AnthropicProvider(LlmProvider):
    def __init__(self) -> None:
        self.client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    def complete(self, request: ChatRequest) -> str:
        # Normalize OpenAI-style messages into Anthropic expectations.
        system_messages = [m["content"] for m in request.messages if m["role"] == "system"]
        system_prompt = "\n".join(system_messages) if system_messages else ""
        non_system_messages = [m for m in request.messages if m["role"] != "system"]

        response = self.client.messages.create(
            model=request.model,
            system=system_prompt,
            messages=non_system_messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        return "\n".join(block.text for block in response.content if block.type == "text")


class LlmProviderFactory:
    @staticmethod
    def create(provider: str) -> LlmProvider:
        if provider == "openai":
            return OpenAIProvider()
        if provider == "anthropic":
            return AnthropicProvider()
        raise ValueError(f"Unsupported provider: {provider}")


if __name__ == "__main__":
    request = ChatRequest(
        model="gpt-5.6-terra",
        messages=[
            {"role": "system", "content": "You are concise."},
            {"role": "user", "content": "Explain exponential backoff in two bullets."},
        ],
    )

    provider = LlmProviderFactory.create("openai")
    print(provider.complete(request))

Implementation Option B: LiteLLM Translation Layer

LiteLLM is an industry-standard library for routing and translating OpenAI-style requests to multiple providers.

import os
from typing import Any, Dict, List

from litellm import completion


def run_with_litellm(messages: List[Dict[str, str]]) -> str:
    """Send OpenAI-formatted messages to an Anthropic model via LiteLLM."""
    os.environ["ANTHROPIC_API_KEY"] = os.environ["ANTHROPIC_API_KEY"]

    response = completion(
        model="anthropic/claude-sonnet-4-6",
        messages=messages,
        temperature=0.2,
        max_tokens=500,
    )

    content: str = response["choices"][0]["message"]["content"]
    return content


if __name__ == "__main__":
    text = run_with_litellm([
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "List three API governance checks."},
    ])
    print(text)

Enterprise note: a router is also the right place to centralize retries, timeout budgets, request IDs, telemetry, redaction, and model-level policy controls.

Implementation Option C: OpenAI-Compatible Endpoints and OpenRouter

For the simplest cases, you may not need a custom wrapper at all. Because Gemini and Grok both expose OpenAI-compatible endpoints, switching providers can be a three-line change: swap base_url, swap the API key, and — capability differences aside — the same OpenAI client keeps working.

from openai import OpenAI

# Same SDK, pointed at Gemini's OpenAI-compatible endpoint instead of OpenAI's own API.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "Summarize the SLA policy for API retries."}],
)
print(response.choices[0].message.content)

OpenRouter takes this further as a hosted multi-provider gateway: one API key, one OpenAI-shaped endpoint, and a model string prefix (openai/gpt-5.6, anthropic/claude-sonnet-4.6, google/gemini-3.5-flash) selects the backend, with automatic fallback routing across providers on outage or rate limit.

Choosing between LiteLLM, OpenAI-compat endpoints, and MCP: LiteLLM and OpenAI-compatible base URLs solve which model answers the prompt — they are model-call routers. MCP (Module 7) solves a different problem: which tools and data sources the model can reach, independent of which model you route to. Production agent platforms typically need both layers: a model router for cost/latency/failover, and MCP for a portable tool ecosystem.

Module 6: Common Pitfalls and Anti-Patterns

1) Ignoring Streaming

Blocking calls for long generations increase latency and often cause frontend or gateway timeouts. Streaming improves perceived responsiveness because the UI starts rendering tokens immediately.

# Raw request shape (OpenAI)
{
  "model": "gpt-5.6-terra",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Generate a 1200-word architecture proposal."}
  ]
}

import os
from typing import Iterator

from openai import OpenAI


def stream_openai_text(prompt: str) -> Iterator[str]:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    stream = client.chat.completions.create(
        model="gpt-5.6-terra",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta


if __name__ == "__main__":
    for token in stream_openai_text("Generate a 1200-word architecture proposal."):
        print(token, end="", flush=True)

2) Hardcoding Schemas

Manually writing JSON schemas for tools is error-prone and drifts from your actual function signature. Use Pydantic to generate schemas from typed models and keep one source of truth. Note that OpenAI's strict: true and Anthropic's structured outputs / strict tool use (Module 3) eliminate the class of failure where the model returns malformed JSON — but they do not eliminate the need for one canonical schema definition; you still generate the schema once and reuse it everywhere.

import json
from pydantic import BaseModel, Field


class GetWeatherInput(BaseModel):
    city: str = Field(description="City name, for example Seattle")
    date: str = Field(description="ISO date in YYYY-MM-DD format")


def openai_tool_definition() -> dict:
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather forecast by city and date.",
            "parameters": GetWeatherInput.model_json_schema(),
        },
    }


if __name__ == "__main__":
    print(json.dumps(openai_tool_definition(), indent=2))

3) Failing to Handle Tool Results Correctly

A tool call is only half the loop. You must execute the tool and send the result back in the provider's expected format, or the model cannot continue reasoning with fresh data.

OpenAI Tool Result Handoff

# After receiving assistant.tool_calls[0]
{
  "role": "tool",
  "tool_call_id": "call_123",
  "content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
}

Anthropic Tool Result Handoff

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01ABC",
      "content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
    }
  ]
}

import json
import os
from typing import Any, Dict

from anthropic import Anthropic
from openai import OpenAI


def run_tool_and_continue_openai() -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": "You are a weather assistant."},
        {"role": "user", "content": "Will it rain in Seattle tomorrow?"},
    ]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather forecast by city and date.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "date": {"type": "string"}
                    },
                    "required": ["city", "date"],
                },
            },
        }
    ]

    first = client.chat.completions.create(model="gpt-5.6-terra", messages=messages, tools=tools)
    tool_call = first.choices[0].message.tool_calls[0]
    args: Dict[str, Any] = json.loads(tool_call.function.arguments)

    # Simulated tool execution.
    tool_result = {"city": args["city"], "date": args["date"], "condition": "rain", "temperature_c": 9}

    messages.append(first.choices[0].message.model_dump())
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(tool_result),
        }
    )

    final = client.chat.completions.create(model="gpt-5.6-terra", messages=messages, tools=tools)
    return final.choices[0].message.content or ""


def run_tool_and_continue_anthropic() -> str:
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    tools = [
        {
            "name": "get_weather",
            "description": "Get weather forecast by city and date.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
                "required": ["city", "date"],
            },
        }
    ]

    first = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system="You are a weather assistant.",
        tools=tools,
        messages=[{"role": "user", "content": "Will it rain in Seattle tomorrow?"}],
    )

    tool_use_block = next(block for block in first.content if block.type == "tool_use")
    tool_input = dict(tool_use_block.input)
    tool_result = {
        "city": tool_input["city"],
        "date": tool_input["date"],
        "condition": "rain",
        "temperature_c": 9,
    }

    final = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        system="You are a weather assistant.",
        tools=tools,
        messages=[
            {"role": "user", "content": "Will it rain in Seattle tomorrow?"},
            {"role": "assistant", "content": [block.model_dump() for block in first.content]},
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(tool_result),
                    }
                ],
            },
        ],
    )

    return "\n".join(block.text for block in final.content if block.type == "text")

4) Blind Trust in Third-Party MCP Servers

As MCP adoption exploded through 2025–2026, so did incidents that abused it. Because an MCP server's tool descriptions are themselves text fed into the model's context, a malicious or compromised server can embed hidden instructions that hijack the agent — a class of attack commonly called tool poisoning. Documented 2026 incidents included a cross-tenant data leak in a project-management MCP integration and a path-traversal flaw that exposed thousands of hosted MCP apps. Treat every third-party MCP server the way you would treat an unreviewed dependency with production credentials, not a trusted internal library.

# Minimum viable MCP trust checklist before connecting a server
# 1. Pin the server to a specific version/commit — never "latest" from an
#    unaudited registry.
# 2. Read every tool description for embedded instructions aimed at the
#    model rather than the user ("ignore previous instructions", hidden
#    unicode, etc).
# 3. Scope OAuth/API tokens the server receives to least privilege —
#    read-only where the workflow allows it.
# 4. Run untrusted servers in an isolated process/network namespace, not
#    inline with your primary agent process.
# 5. Log every tool call and result for audit, the same way you would
#    log a native function-calling invocation.

⚠

This is not a reason to avoid MCP — it is a reason to apply the same supply-chain discipline you already apply to npm packages or Docker base images. See Module 7 for the protocol's architecture and the guardrails that mitigate this class of risk.

Operational checklist: validate schema once (or use structured outputs / strict tool use), stream where possible, enforce timeout and retry budgets, always close the tool loop by posting structured results back to the model, and audit every MCP server before granting it credentials.

Module 7: The Model Context Protocol (MCP)

Function/tool calling (Module 4) solves how a single request tells a single model about a handful of tools. It does not solve a bigger problem: if you have 10 AI applications and 100 tools, you end up hand-building up to 1,000 bespoke integrations, each coupled to one provider's tool-calling syntax. Anthropic open-sourced the Model Context Protocol (MCP) in November 2024 to collapse that N×M problem into N+M: each application implements the MCP client once, each tool implements the MCP server once, and any client can use any server.

MCP's adoption curve is one of the fastest of any developer protocol in recent memory. OpenAI adopted it in March 2025 and added ChatGPT app support in September 2025. Google DeepMind and Microsoft followed within the year. On December 9, 2025, Anthropic donated MCP's governance to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation — the same body that stewards Kubernetes, PyTorch, and Node.js. Platinum members now include AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft, and OpenAI. Monthly SDK downloads went from roughly 100,000 at launch to 97 million by March 2026. A major spec revision, versioned MCP 2026-07-28, is the largest update since launch and opens a 12-month deprecation window for older spec versions.

Concern	Native Function/Tool Calling	MCP
Coupling	Tool schema hand-defined per provider, per app	Tool schema defined once by the server; any MCP client can consume it
Discovery	Static — you enumerate tools in the request	Dynamic — clients can list a server's available tools/resources at connect time
Transport	In-band with the model API call	Separate JSON-RPC 2.0 channel (stdio locally, Streamable HTTP remotely)
Reuse	Rebuild the integration for each model provider	Write the server once; Claude, ChatGPT, Gemini clients all connect to it
Governance	Vendor-controlled API surface	Open spec under the Linux Foundation's AAIF

Architecture — Host, Client, Server

MCP borrows its shape from the Language Server Protocol. A host application (Claude, ChatGPT, an IDE, your own agent) embeds one or more clients, each holding a 1:1 connection to an MCP server. Servers expose three primitive types: tools (model-invocable functions), resources (readable data the host can attach to context), and prompts (reusable prompt templates the host can surface to the user). Messages are JSON-RPC 2.0 over stdio for local servers or Streamable HTTP for remote ones.
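In keeping with this handbook's raw-wire-first approach, here is a sketch of the JSON-RPC 2.0 messages a client writes to a stdio server, one JSON object per line. The method names (initialize, notifications/initialized, tools/list, tools/call) come from the MCP specification; the protocolVersion string, the clientInfo values, and the `jsonrpc_message` helper are illustrative assumptions, so check which spec revision your server actually supports.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # request ids must be unique within a session


def jsonrpc_message(method: str, params: Optional[dict] = None, notify: bool = False) -> str:
    """Serialize one JSON-RPC 2.0 message as a single newline-terminated line (stdio framing)."""
    msg: dict = {"jsonrpc": "2.0", "method": method}
    if not notify:  # notifications carry no id and expect no response
        msg["id"] = next(_request_ids)
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


handshake = [
    jsonrpc_message("initialize", {
        "protocolVersion": "2025-06-18",  # assumption: use the revision your server supports
        "capabilities": {},
        "clientInfo": {"name": "example-host", "version": "0.1.0"},
    }),
    jsonrpc_message("notifications/initialized", notify=True),
    jsonrpc_message("tools/list"),
    jsonrpc_message("tools/call", {
        "name": "get_weather",
        "arguments": {"city": "Seattle", "date": "2026-07-26"},
    }),
]

for line in handshake:
    print(line, end="")
```

The server answers each request with a message carrying the same id, which is how a client matches responses to requests on a shared stream.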

MCP vs. Native Tool Calling — When to Use Which

Use native tool calling for a small, stable set of functions specific to one application and one provider — it is simpler, has no extra process to run, and avoids MCP's network/transport overhead.
Use MCP when the same tool needs to serve multiple AI applications or model providers, when the tool set is large or changes independently of your app's release cycle, or when you want to consume tools that a third party already built (GitHub, Slack, a database, a SaaS product) without writing a custom adapter.
They compose: an MCP client can translate discovered MCP tools into the exact tools array shape a given provider's chat/completions or Messages call expects — MCP does not replace Module 4's wire formats, it feeds them.
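A minimal sketch of that translation step, assuming tool descriptors shaped like entries in an MCP tools/list result (name, description, inputSchema) held as plain dicts. With the Python SDK each `Tool` is a Pydantic model, so `tool.model_dump()` should yield a dict of this shape. The target shapes are the Chat Completions function format and the Anthropic Messages format from Module 4; the helper names are hypothetical.

```python
def to_openai_tool(mcp_tool: dict) -> dict:
    """MCP tool descriptor -> Chat Completions `tools` entry."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description") or "",
            "parameters": mcp_tool["inputSchema"],  # JSON Schema passes through unchanged
        },
    }


def to_anthropic_tool(mcp_tool: dict) -> dict:
    """MCP tool descriptor -> Messages API `tools` entry."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description") or "",
        "input_schema": mcp_tool["inputSchema"],
    }


# One entry as the weather server might describe get_weather in tools/list
weather_tool = {
    "name": "get_weather",
    "description": "Get weather forecast by city and ISO date (YYYY-MM-DD).",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
        "required": ["city", "date"],
    },
}

openai_tools = [to_openai_tool(weather_tool)]
anthropic_tools = [to_anthropic_tool(weather_tool)]
```

Because both providers accept JSON Schema for tool parameters, the schema itself passes through untouched; only the envelope differs.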

Building a Minimal MCP Server (Python)

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-service")


@mcp.tool()
def get_weather(city: str, date: str) -> dict:
    """Get weather forecast by city and ISO date (YYYY-MM-DD)."""
    # Replace with a real weather API call.
    return {"city": city, "date": date, "condition": "rain", "temperature_c": 9}


@mcp.resource("weather://stations/{city}")
def list_stations(city: str) -> str:
    """Expose station metadata as a readable resource."""
    # Placeholder: always returns the Seattle station; replace with a real lookup.
    return f"Nearest station for {city}: KSEA (Seattle-Tacoma Intl)"


if __name__ == "__main__":
    mcp.run(transport="stdio")

Connecting to that server from a host application is a client-side concern, not a model-API concern — the host discovers get_weather at connect time and translates it into whichever provider's tool format is in use for the current turn:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def call_weather_tool() -> list:
    # Assumes the server code above is saved as weather_server.py in the working directory.
    server_params = StdioServerParameters(command="python", args=["weather_server.py"])

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake: negotiate protocol version and capabilities

            tools = await session.list_tools()
            print([t.name for t in tools.tools])  # -> ['get_weather']

            result = await session.call_tool(
                "get_weather", arguments={"city": "Seattle", "date": "2026-07-26"}
            )
            # result is a CallToolResult; .content is a list of content blocks (e.g. TextContent)
            return result.content


if __name__ == "__main__":
    print(asyncio.run(call_weather_tool()))
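To close the tool loop (Module 6), the host still has to turn the MCP result into the provider's tool-result block. A sketch for the Anthropic Messages format, assuming it receives the full CallToolResult (not just its .content), that text blocks expose `type` and `text` attributes as the SDK's TextContent does, and that the result carries an `isError` flag. Non-text blocks such as images are skipped for brevity. The helper name and the tool_use_id value are hypothetical; in practice the id comes from the model's tool_use block.

```python
from types import SimpleNamespace


def to_anthropic_tool_result(tool_use_id: str, call_result) -> dict:
    """Convert an MCP call_tool result into a Messages API tool_result content block."""
    texts = [
        block.text
        for block in call_result.content
        if getattr(block, "type", None) == "text"  # this sketch ignores image/resource blocks
    ]
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": "\n".join(texts),
        "is_error": bool(getattr(call_result, "isError", False)),
    }


# Stand-in for the object returned by session.call_tool(...)
fake_result = SimpleNamespace(
    content=[SimpleNamespace(type="text", text='{"city": "Seattle", "condition": "rain"}')],
    isError=False,
)
block = to_anthropic_tool_result("toolu_example123", fake_result)
# The block is then sent back as: {"role": "user", "content": [block]}
```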

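On the Anthropic Messages API, an application can skip running its own MCP client loop and instead point the API at a remote server through the mcp_servers parameter. Claude then discovers and calls that server's tools on Anthropic's side within a single request. Because Anthropic's infrastructure makes the connection, the server must be reachable over HTTP; local stdio servers like the one above cannot be used this way. The MCP connector launched as a beta feature gated by an anthropic-beta header, so check Anthropic's documentation for the current header value and payload shape: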

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [
    {"role": "user", "content": "What's the weather forecast for Seattle tomorrow?"}
  ],
  "mcp_servers": [
    {
      "type": "url",
      "url": "https://mcp.example.com/weather",
      "name": "weather-service"
    }
  ]
}

Security is not optional. MCP's rapid adoption outran its security tooling in early 2026: over 30 CVEs were filed against MCP servers and hosts in January and February alone. Apply the checklist in Module 6 before granting any MCP server access to credentials, and for remote servers prefer authenticated Streamable HTTP over the legacy HTTP+SSE transport, which the MCP specification has deprecated since its 2025-03-26 revision.
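One concrete control, sketched under assumptions: before translating discovered tools into a provider's tools array, filter them against an explicit per-server allowlist that you review and version like any other configuration. `ALLOWED_TOOLS` and `filter_tools` are hypothetical names, and this complements transport-level authentication rather than replacing it.

```python
# Hypothetical reviewed allowlist: server name -> tool names the model may call
ALLOWED_TOOLS: dict[str, set[str]] = {
    "weather-service": {"get_weather"},
}


def filter_tools(server_name: str, discovered: list[dict]) -> list[dict]:
    """Keep only allowlisted tools; unknown servers expose nothing (deny by default)."""
    allowed = ALLOWED_TOOLS.get(server_name, set())
    kept = [t for t in discovered if t["name"] in allowed]
    dropped = sorted(t["name"] for t in discovered if t["name"] not in allowed)
    if dropped:
        # Surface drift for review instead of silently widening the model's capabilities
        print(f"[mcp-audit] {server_name}: ignoring non-allowlisted tools {dropped}")
    return kept


discovered = [{"name": "get_weather"}, {"name": "delete_station"}]
safe_tools = filter_tools("weather-service", discovered)
```

Denying by default also means that a server which adds tools after your review, which dynamic discovery makes possible, does not silently expand what the model can do.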


Table of Contents

Module 1: The Paradigm of LLM APIs

The API Landscape

The De Facto Standard

Module 2: The OpenAI API Standard

The Payload Structure (Raw JSON)

The Code (Python SDK)

The Responses API — OpenAI's Newer Surface

Module 3: The Anthropic API Standard (Messages API)

The Payload Structure (Raw JSON)

The Core Differences from OpenAI

The Code (Python SDK)

Structured Outputs and Strict Tool Use

Module 4: The Tool Calling Formats (The Big Divide)

Why Tools Matter

OpenAI Tool Format

Anthropic Tool Format

Module 5: Building an Agnostic Wrapper (Enterprise Architecture)

The Problem

The Solution: LLM Router/Wrapper

Implementation Option A: Factory Pattern (Conceptual)

Implementation Option B: LiteLLM Translation Layer

Implementation Option C: OpenAI-Compatible Endpoints and OpenRouter

Module 6: Common Pitfalls and Anti-Patterns

1) Ignoring Streaming

2) Hardcoding Schemas

3) Failing to Handle Tool Results Correctly

OpenAI Tool Result Handoff

Anthropic Tool Result Handoff

4) Blind Trust in Third-Party MCP Servers

Module 7: The Model Context Protocol (MCP)

Architecture — Host, Client, Server

MCP vs. Native Tool Calling — When to Use Which

Building a Minimal MCP Server (Python)