LLM API Standards and Tooling Handbook
Back to handbooks index
Enterprise Integration Reference

LLM API Standards and Tooling Handbook

A production-ready guide to OpenAI and Anthropic API standards, tool-calling formats, and architecture patterns that reduce lock-in while keeping developer velocity high.

Raw JSON First Python SDK Examples Vendor-Agnostic Design April 2026
i
How to use this handbook: every module starts with the exact wire payload sent over HTTP, then shows how Python SDKs abstract that payload. This is the fastest way to understand both API behavior and production debugging.

Table of Contents

This handbook is organized around six enterprise implementation modules. The sequence moves from API fundamentals to architecture and common integration failures.

Module 1 Included
The Paradigm of LLM APIs
Why standards matter, what lock-in looks like, and why /v1/chat/completions became the practical common language.
Module 2 Included
OpenAI API Standard
Message roles, payload anatomy, and Python SDK usage with flexible role ordering.
Module 3 Included
Anthropic Messages API
Top-level system prompt design, strict alternation, and implementation details in Python.
Module 4 Included
Tool Calling Formats
OpenAI function/tool calling versus Anthropic tool definitions and output behavior.
Module 5 Included
Agnostic Wrapper Pattern
Factory-based router design and LiteLLM translation to avoid provider-specific rewrites.
Module 6 Included
Pitfalls and Anti-Patterns
Streaming gaps, schema drift, and incorrect tool-result handoff patterns that break agents.

Module 1: The Paradigm of LLM APIs

When teams integrate LLMs, they are not just calling a model. They are establishing a long-term contract between product behavior, infrastructure, and vendor capability. API standards matter because they define how quickly you can switch providers, integrate new models, or roll back during incidents.

The API Landscape

Think of LLM APIs like electrical sockets. If every country had a unique plug shape, device makers would ship custom adapters for each region. That is exactly what happens when each model provider has a unique payload format: your application code becomes a stack of adapters, fragile parsers, and duplicated logic.

In enterprise systems, vendor lock-in is not only a pricing risk. It is also an operational risk. If procurement, latency, compliance, or safety requirements change, your architecture should allow provider migration with minimal code churn.

ConcernWithout StandardizationWith Standardization
Cost controlModel switch requires rewritesModel switch is often a config change
Incident responseHard to fail over across providersFallback routing is easier
Team onboardingProvider-specific mental modelsShared API semantics across teams

The De Facto Standard

The OpenAI chat-completions shape, especially /v1/chat/completions and the messages array, became the de facto standard because it is simple, expressive, and easy to proxy. Open-source inference servers such as vLLM, Ollama, and llama.cpp adopted compatible endpoints so developers could reuse clients and existing prompts.

In practice, this means many systems talk OpenAI-format JSON even when the backend model is not from OpenAI at all. This has the same effect as SQL in databases: each engine has differences, but a shared shape dramatically lowers adoption friction.

Module 2: The OpenAI API Standard

The Payload Structure (Raw JSON)

At the wire level, the request body is JSON. The core object is messages, an ordered list representing conversation turns. Each item has a role and content. The three core roles are:

{
  "model": "gpt-4o-mini",
  "temperature": 0.2,
  "max_tokens": 400,
  "messages": [
    {
      "role": "system",
      "content": "You are an enterprise support assistant. Be concise and accurate."
    },
    {
      "role": "user",
      "content": "Summarize the SLA policy for API retries."
    }
  ]
}

The Code (Python SDK)

import os
from typing import Optional

from openai import OpenAI


def get_openai_chat_reply(user_prompt: str) -> str:
    """Send a standard chat completion request and return assistant text."""
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,
        max_tokens=400,
        messages=[
            {"role": "system", "content": "You are an enterprise support assistant."},
            {"role": "user", "content": user_prompt},
        ],
    )

    content: Optional[str] = response.choices[0].message.content
    return content or ""


if __name__ == "__main__":
    answer = get_openai_chat_reply("Summarize the SLA policy for API retries.")
    print(answer)
+
Key characteristic: OpenAI-format APIs are highly flexible. Multiple system messages are allowed, and roles can repeat (for example several assistant turns), which is convenient for prompt layering and replaying conversation state.

Module 3: The Anthropic API Standard (Messages API)

The Payload Structure (Raw JSON)

Anthropic's Messages API also uses a turn list, but the system instruction is defined at the top-level system field, not inside the messages array.

{
  "model": "claude-3-5-sonnet-20240620",
  "max_tokens": 400,
  "temperature": 0.2,
  "system": "You are an enterprise support assistant. Be concise and accurate.",
  "messages": [
    {
      "role": "user",
      "content": "Summarize the SLA policy for API retries."
    }
  ]
}

The Core Differences from OpenAI

  1. System prompt location: Anthropic keeps system top-level; OpenAI places system instructions inside messages.
  2. Strict alternation: Anthropic enforces user/assistant alternation in messages. You cannot arbitrarily repeat roles without normalization.

The Code (Python SDK)

import os
from typing import List

from anthropic import Anthropic
from anthropic.types import MessageParam


def get_anthropic_reply(user_prompt: str) -> str:
    """Call Anthropic Messages API and return plain text output."""
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    messages: List[MessageParam] = [
        {"role": "user", "content": user_prompt}
    ]

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=400,
        temperature=0.2,
        system="You are an enterprise support assistant.",
        messages=messages,
    )

    text_blocks = [block.text for block in response.content if block.type == "text"]
    return "\n".join(text_blocks)


if __name__ == "__main__":
    answer = get_anthropic_reply("Summarize the SLA policy for API retries.")
    print(answer)

Module 4: The Tool Calling Formats (The Big Divide)

Why Tools Matter

In enterprise systems, free-form text is not enough. Agents must call systems of record, workflow engines, and policy services using structured input. Tool calling turns "chat" into "controlled execution."

Analogy: a tool call is like a purchase order form. A phone call saying "buy servers" is ambiguous, but a structured order with fields, validation, and IDs can be audited and automated.

OpenAI Tool Format

OpenAI defines tools in a tools array and controls execution behavior using tool_choice.

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a weather assistant."},
    {"role": "user", "content": "Will it rain in Seattle tomorrow?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather forecast by city and date.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "date": {"type": "string", "description": "ISO date, YYYY-MM-DD"}
          },
          "required": ["city", "date"],
          "additionalProperties": false
        }
      }
    }
  ],
  "tool_choice": "auto"
}

When the model chooses a tool, it returns tool_calls on the assistant message:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Seattle\",\"date\":\"2026-04-21\"}"
            }
          }
        ]
      }
    }
  ]
}

Anthropic Tool Format

Anthropic declares tools with name, description, and input_schema. Tool requests appear as tool_use content blocks.

{
  "model": "claude-3-5-sonnet-20240620",
  "max_tokens": 500,
  "system": "You are a weather assistant.",
  "messages": [
    {
      "role": "user",
      "content": "Will it rain in Seattle tomorrow?"
    }
  ],
  "tools": [
    {
      "name": "get_weather",
      "description": "Get weather forecast by city and date.",
      "input_schema": {
        "type": "object",
        "properties": {
          "city": {"type": "string"},
          "date": {"type": "string"}
        },
        "required": ["city", "date"]
      }
    }
  ]
}

Typical response shape includes a tool_use block and may include text blocks. Some Claude variants also emit XML-like reasoning markers such as <thinking>...</thinking> in text output before tool usage. Treat those as model text, not structured API fields.

{
  "content": [
    {
      "type": "text",
      "text": "<thinking>I should call weather data first.</thinking>"
    },
    {
      "type": "tool_use",
      "id": "toolu_01ABC",
      "name": "get_weather",
      "input": {
        "city": "Seattle",
        "date": "2026-04-21"
      }
    }
  ],
  "stop_reason": "tool_use"
}

Module 5: Building an Agnostic Wrapper (Enterprise Architecture)

The Problem

If your code directly depends on provider-specific request and response shapes, moving from GPT-4o to Claude 3.5 Sonnet for cost, latency, or policy reasons becomes a refactor project. Every team touching prompts, tools, and telemetry must rewrite integration code.

Analogy: this is like writing SQL directly against one vendor's proprietary syntax in every file. It works until procurement or platform strategy changes.

The Solution: LLM Router/Wrapper

Use one internal request contract and route it through adapters. The wrapper normalizes message formats, tool schemas, and response parsing. Your app calls the wrapper, not raw vendor APIs.

Implementation Option A: Factory Pattern (Conceptual)

import json
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List

from anthropic import Anthropic
from openai import OpenAI


@dataclass
class ChatRequest:
    model: str
    messages: List[Dict[str, Any]]
    temperature: float = 0.2
    max_tokens: int = 500


class LlmProvider(ABC):
    @abstractmethod
    def complete(self, request: ChatRequest) -> str:
        raise NotImplementedError


class OpenAIProvider(LlmProvider):
    def __init__(self) -> None:
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def complete(self, request: ChatRequest) -> str:
        response = self.client.chat.completions.create(
            model=request.model,
            messages=request.messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        return response.choices[0].message.content or ""


class AnthropicProvider(LlmProvider):
    def __init__(self) -> None:
        self.client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

    def complete(self, request: ChatRequest) -> str:
        # Normalize OpenAI-style messages into Anthropic expectations.
        system_messages = [m["content"] for m in request.messages if m["role"] == "system"]
        system_prompt = "\n".join(system_messages) if system_messages else ""
        non_system_messages = [m for m in request.messages if m["role"] != "system"]

        response = self.client.messages.create(
            model=request.model,
            system=system_prompt,
            messages=non_system_messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        return "\n".join(block.text for block in response.content if block.type == "text")


class LlmProviderFactory:
    @staticmethod
    def create(provider: str) -> LlmProvider:
        if provider == "openai":
            return OpenAIProvider()
        if provider == "anthropic":
            return AnthropicProvider()
        raise ValueError(f"Unsupported provider: {provider}")


if __name__ == "__main__":
    request = ChatRequest(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are concise."},
            {"role": "user", "content": "Explain exponential backoff in two bullets."},
        ],
    )

    provider = LlmProviderFactory.create("openai")
    print(provider.complete(request))

Implementation Option B: LiteLLM Translation Layer

LiteLLM is an industry-standard library for routing and translating OpenAI-style requests to multiple providers.

import os
from typing import Any, Dict, List

from litellm import completion


def run_with_litellm(messages: List[Dict[str, str]]) -> str:
    """Send OpenAI-formatted messages to an Anthropic model via LiteLLM."""
    os.environ["ANTHROPIC_API_KEY"] = os.environ["ANTHROPIC_API_KEY"]

    response = completion(
        model="anthropic/claude-3-5-sonnet-20240620",
        messages=messages,
        temperature=0.2,
        max_tokens=500,
    )

    content: str = response["choices"][0]["message"]["content"]
    return content


if __name__ == "__main__":
    text = run_with_litellm([
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "List three API governance checks."},
    ])
    print(text)
!
Enterprise note: a router is also the right place to centralize retries, timeout budgets, request IDs, telemetry, redaction, and model-level policy controls.

Module 6: Common Pitfalls and Anti-Patterns

1) Ignoring Streaming

Blocking calls for long generations increase latency and often cause frontend or gateway timeouts. Streaming improves perceived responsiveness because the UI starts rendering tokens immediately.

# Raw request shape (OpenAI)
{
  "model": "gpt-4o-mini",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Generate a 1200-word architecture proposal."}
  ]
}
import os
from typing import Iterator

from openai import OpenAI


def stream_openai_text(prompt: str) -> Iterator[str]:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta


if __name__ == "__main__":
    for token in stream_openai_text("Generate a 1200-word architecture proposal."):
        print(token, end="", flush=True)

2) Hardcoding Schemas

Manually writing JSON schemas for tools is error-prone and drifts from your actual function signature. Use Pydantic to generate schemas from typed models and keep one source of truth.

import json
from pydantic import BaseModel, Field


class GetWeatherInput(BaseModel):
    city: str = Field(description="City name, for example Seattle")
    date: str = Field(description="ISO date in YYYY-MM-DD format")


def openai_tool_definition() -> dict:
    return {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather forecast by city and date.",
            "parameters": GetWeatherInput.model_json_schema(),
        },
    }


if __name__ == "__main__":
    print(json.dumps(openai_tool_definition(), indent=2))

3) Failing to Handle Tool Results Correctly

A tool call is only half the loop. You must execute the tool and send the result back in the provider's expected format, or the model cannot continue reasoning with fresh data.

OpenAI Tool Result Handoff

# After receiving assistant.tool_calls[0]
{
  "role": "tool",
  "tool_call_id": "call_123",
  "content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
}

Anthropic Tool Result Handoff

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01ABC",
      "content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
    }
  ]
}
import json
import os
from typing import Any, Dict

from anthropic import Anthropic
from openai import OpenAI


def run_tool_and_continue_openai() -> str:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    messages = [
        {"role": "system", "content": "You are a weather assistant."},
        {"role": "user", "content": "Will it rain in Seattle tomorrow?"},
    ]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get weather forecast by city and date.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "date": {"type": "string"}
                    },
                    "required": ["city", "date"],
                },
            },
        }
    ]

    first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    tool_call = first.choices[0].message.tool_calls[0]
    args: Dict[str, Any] = json.loads(tool_call.function.arguments)

    # Simulated tool execution.
    tool_result = {"city": args["city"], "date": args["date"], "condition": "rain", "temperature_c": 9}

    messages.append(first.choices[0].message.model_dump())
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(tool_result),
        }
    )

    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    return final.choices[0].message.content or ""


def run_tool_and_continue_anthropic() -> str:
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    tools = [
        {
            "name": "get_weather",
            "description": "Get weather forecast by city and date.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
                "required": ["city", "date"],
            },
        }
    ]

    first = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        system="You are a weather assistant.",
        tools=tools,
        messages=[{"role": "user", "content": "Will it rain in Seattle tomorrow?"}],
    )

    tool_use_block = next(block for block in first.content if block.type == "tool_use")
    tool_input = dict(tool_use_block.input)
    tool_result = {
        "city": tool_input["city"],
        "date": tool_input["date"],
        "condition": "rain",
        "temperature_c": 9,
    }

    final = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        system="You are a weather assistant.",
        tools=tools,
        messages=[
            {"role": "user", "content": "Will it rain in Seattle tomorrow?"},
            {"role": "assistant", "content": [block.model_dump() for block in first.content]},
            {
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "content": json.dumps(tool_result),
                    }
                ],
            },
        ],
    )

    return "\n".join(block.text for block in final.content if block.type == "text")
i
Operational checklist: validate schema once, stream where possible, enforce timeout and retry budgets, and always close the tool loop by posting structured results back to the model.