LLM API Standards and Tooling Handbook
A production-ready guide to OpenAI and Anthropic API standards, tool-calling formats, and architecture patterns that reduce lock-in while keeping developer velocity high.
Table of Contents
This handbook is organized around six enterprise implementation modules. The sequence moves from API fundamentals to architecture and common integration failures.
Module 1: The Paradigm of LLM APIs
When teams integrate LLMs, they are not just calling a model. They are establishing a long-term contract between product behavior, infrastructure, and vendor capability. API standards matter because they define how quickly you can switch providers, integrate new models, or roll back during incidents.
The API Landscape
Think of LLM APIs like electrical sockets. If every country had a unique plug shape, device makers would ship custom adapters for each region. That is exactly what happens when each model provider has a unique payload format: your application code becomes a stack of adapters, fragile parsers, and duplicated logic.
In enterprise systems, vendor lock-in is not only a pricing risk. It is also an operational risk. If procurement, latency, compliance, or safety requirements change, your architecture should allow provider migration with minimal code churn.
| Concern | Without Standardization | With Standardization |
|---|---|---|
| Cost control | Model switch requires rewrites | Model switch is often a config change |
| Incident response | Hard to fail over across providers | Fallback routing is easier |
| Team onboarding | Provider-specific mental models | Shared API semantics across teams |
The De Facto Standard
The OpenAI chat-completions shape, especially /v1/chat/completions and the messages array, became the de facto standard because it is simple, expressive, and easy to proxy. Open-source inference servers such as vLLM, Ollama, and llama.cpp adopted compatible endpoints so developers could reuse clients and existing prompts.
In practice, this means many systems talk OpenAI-format JSON even when the backend model is not from OpenAI at all. This has the same effect as SQL in databases: each engine has differences, but a shared shape dramatically lowers adoption friction.
Module 2: The OpenAI API Standard
The Payload Structure (Raw JSON)
At the wire level, the request body is JSON. The core object is messages, an ordered list representing conversation turns. Each item has a role and content. The three core roles are:
- system: policy, tone, constraints, and behavioral instructions.
- user: user intent and query.
- assistant: prior model responses, useful for continuation and grounding.
{
"model": "gpt-4o-mini",
"temperature": 0.2,
"max_tokens": 400,
"messages": [
{
"role": "system",
"content": "You are an enterprise support assistant. Be concise and accurate."
},
{
"role": "user",
"content": "Summarize the SLA policy for API retries."
}
]
}
The Code (Python SDK)
import os
from typing import Optional
from openai import OpenAI
def get_openai_chat_reply(user_prompt: str) -> str:
"""Send a standard chat completion request and return assistant text."""
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
model="gpt-4o-mini",
temperature=0.2,
max_tokens=400,
messages=[
{"role": "system", "content": "You are an enterprise support assistant."},
{"role": "user", "content": user_prompt},
],
)
content: Optional[str] = response.choices[0].message.content
return content or ""
if __name__ == "__main__":
answer = get_openai_chat_reply("Summarize the SLA policy for API retries.")
print(answer)
system messages are allowed, and roles can repeat (for example several assistant turns), which is convenient for prompt layering and replaying conversation state.Module 3: The Anthropic API Standard (Messages API)
The Payload Structure (Raw JSON)
Anthropic's Messages API also uses a turn list, but the system instruction is defined at the top-level system field, not inside the messages array.
{
"model": "claude-3-5-sonnet-20240620",
"max_tokens": 400,
"temperature": 0.2,
"system": "You are an enterprise support assistant. Be concise and accurate.",
"messages": [
{
"role": "user",
"content": "Summarize the SLA policy for API retries."
}
]
}
The Core Differences from OpenAI
- System prompt location: Anthropic keeps
systemtop-level; OpenAI places system instructions insidemessages. - Strict alternation: Anthropic enforces user/assistant alternation in
messages. You cannot arbitrarily repeat roles without normalization.
The Code (Python SDK)
import os
from typing import List
from anthropic import Anthropic
from anthropic.types import MessageParam
def get_anthropic_reply(user_prompt: str) -> str:
"""Call Anthropic Messages API and return plain text output."""
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
messages: List[MessageParam] = [
{"role": "user", "content": user_prompt}
]
response = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=400,
temperature=0.2,
system="You are an enterprise support assistant.",
messages=messages,
)
text_blocks = [block.text for block in response.content if block.type == "text"]
return "\n".join(text_blocks)
if __name__ == "__main__":
answer = get_anthropic_reply("Summarize the SLA policy for API retries.")
print(answer)
Module 4: The Tool Calling Formats (The Big Divide)
Why Tools Matter
In enterprise systems, free-form text is not enough. Agents must call systems of record, workflow engines, and policy services using structured input. Tool calling turns "chat" into "controlled execution."
Analogy: a tool call is like a purchase order form. A phone call saying "buy servers" is ambiguous, but a structured order with fields, validation, and IDs can be audited and automated.
OpenAI Tool Format
OpenAI defines tools in a tools array and controls execution behavior using tool_choice.
{
"model": "gpt-4o-mini",
"messages": [
{"role": "system", "content": "You are a weather assistant."},
{"role": "user", "content": "Will it rain in Seattle tomorrow?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather forecast by city and date.",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"},
"date": {"type": "string", "description": "ISO date, YYYY-MM-DD"}
},
"required": ["city", "date"],
"additionalProperties": false
}
}
}
],
"tool_choice": "auto"
}
When the model chooses a tool, it returns tool_calls on the assistant message:
{
"choices": [
{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Seattle\",\"date\":\"2026-04-21\"}"
}
}
]
}
}
]
}
Anthropic Tool Format
Anthropic declares tools with name, description, and input_schema. Tool requests appear as tool_use content blocks.
{
"model": "claude-3-5-sonnet-20240620",
"max_tokens": 500,
"system": "You are a weather assistant.",
"messages": [
{
"role": "user",
"content": "Will it rain in Seattle tomorrow?"
}
],
"tools": [
{
"name": "get_weather",
"description": "Get weather forecast by city and date.",
"input_schema": {
"type": "object",
"properties": {
"city": {"type": "string"},
"date": {"type": "string"}
},
"required": ["city", "date"]
}
}
]
}
Typical response shape includes a tool_use block and may include text blocks. Some Claude variants also emit XML-like reasoning markers such as <thinking>...</thinking> in text output before tool usage. Treat those as model text, not structured API fields.
{
"content": [
{
"type": "text",
"text": "<thinking>I should call weather data first.</thinking>"
},
{
"type": "tool_use",
"id": "toolu_01ABC",
"name": "get_weather",
"input": {
"city": "Seattle",
"date": "2026-04-21"
}
}
],
"stop_reason": "tool_use"
}
Module 5: Building an Agnostic Wrapper (Enterprise Architecture)
The Problem
If your code directly depends on provider-specific request and response shapes, moving from GPT-4o to Claude 3.5 Sonnet for cost, latency, or policy reasons becomes a refactor project. Every team touching prompts, tools, and telemetry must rewrite integration code.
Analogy: this is like writing SQL directly against one vendor's proprietary syntax in every file. It works until procurement or platform strategy changes.
The Solution: LLM Router/Wrapper
Use one internal request contract and route it through adapters. The wrapper normalizes message formats, tool schemas, and response parsing. Your app calls the wrapper, not raw vendor APIs.
Implementation Option A: Factory Pattern (Conceptual)
import json
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List
from anthropic import Anthropic
from openai import OpenAI
@dataclass
class ChatRequest:
model: str
messages: List[Dict[str, Any]]
temperature: float = 0.2
max_tokens: int = 500
class LlmProvider(ABC):
@abstractmethod
def complete(self, request: ChatRequest) -> str:
raise NotImplementedError
class OpenAIProvider(LlmProvider):
def __init__(self) -> None:
self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
def complete(self, request: ChatRequest) -> str:
response = self.client.chat.completions.create(
model=request.model,
messages=request.messages,
temperature=request.temperature,
max_tokens=request.max_tokens,
)
return response.choices[0].message.content or ""
class AnthropicProvider(LlmProvider):
def __init__(self) -> None:
self.client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
def complete(self, request: ChatRequest) -> str:
# Normalize OpenAI-style messages into Anthropic expectations.
system_messages = [m["content"] for m in request.messages if m["role"] == "system"]
system_prompt = "\n".join(system_messages) if system_messages else ""
non_system_messages = [m for m in request.messages if m["role"] != "system"]
response = self.client.messages.create(
model=request.model,
system=system_prompt,
messages=non_system_messages,
temperature=request.temperature,
max_tokens=request.max_tokens,
)
return "\n".join(block.text for block in response.content if block.type == "text")
class LlmProviderFactory:
@staticmethod
def create(provider: str) -> LlmProvider:
if provider == "openai":
return OpenAIProvider()
if provider == "anthropic":
return AnthropicProvider()
raise ValueError(f"Unsupported provider: {provider}")
if __name__ == "__main__":
request = ChatRequest(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are concise."},
{"role": "user", "content": "Explain exponential backoff in two bullets."},
],
)
provider = LlmProviderFactory.create("openai")
print(provider.complete(request))
Implementation Option B: LiteLLM Translation Layer
LiteLLM is an industry-standard library for routing and translating OpenAI-style requests to multiple providers.
import os
from typing import Any, Dict, List
from litellm import completion
def run_with_litellm(messages: List[Dict[str, str]]) -> str:
"""Send OpenAI-formatted messages to an Anthropic model via LiteLLM."""
os.environ["ANTHROPIC_API_KEY"] = os.environ["ANTHROPIC_API_KEY"]
response = completion(
model="anthropic/claude-3-5-sonnet-20240620",
messages=messages,
temperature=0.2,
max_tokens=500,
)
content: str = response["choices"][0]["message"]["content"]
return content
if __name__ == "__main__":
text = run_with_litellm([
{"role": "system", "content": "You are concise."},
{"role": "user", "content": "List three API governance checks."},
])
print(text)
Module 6: Common Pitfalls and Anti-Patterns
1) Ignoring Streaming
Blocking calls for long generations increase latency and often cause frontend or gateway timeouts. Streaming improves perceived responsiveness because the UI starts rendering tokens immediately.
# Raw request shape (OpenAI)
{
"model": "gpt-4o-mini",
"stream": true,
"messages": [
{"role": "user", "content": "Generate a 1200-word architecture proposal."}
]
}
import os
from typing import Iterator
from openai import OpenAI
def stream_openai_text(prompt: str) -> Iterator[str]:
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": prompt}],
stream=True,
)
for chunk in stream:
delta = chunk.choices[0].delta.content
if delta:
yield delta
if __name__ == "__main__":
for token in stream_openai_text("Generate a 1200-word architecture proposal."):
print(token, end="", flush=True)
2) Hardcoding Schemas
Manually writing JSON schemas for tools is error-prone and drifts from your actual function signature. Use Pydantic to generate schemas from typed models and keep one source of truth.
import json
from pydantic import BaseModel, Field
class GetWeatherInput(BaseModel):
city: str = Field(description="City name, for example Seattle")
date: str = Field(description="ISO date in YYYY-MM-DD format")
def openai_tool_definition() -> dict:
return {
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather forecast by city and date.",
"parameters": GetWeatherInput.model_json_schema(),
},
}
if __name__ == "__main__":
print(json.dumps(openai_tool_definition(), indent=2))
3) Failing to Handle Tool Results Correctly
A tool call is only half the loop. You must execute the tool and send the result back in the provider's expected format, or the model cannot continue reasoning with fresh data.
OpenAI Tool Result Handoff
# After receiving assistant.tool_calls[0]
{
"role": "tool",
"tool_call_id": "call_123",
"content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
}
Anthropic Tool Result Handoff
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01ABC",
"content": "{\"temperature_c\": 9, \"condition\": \"rain\"}"
}
]
}
import json
import os
from typing import Any, Dict
from anthropic import Anthropic
from openai import OpenAI
def run_tool_and_continue_openai() -> str:
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
messages = [
{"role": "system", "content": "You are a weather assistant."},
{"role": "user", "content": "Will it rain in Seattle tomorrow?"},
]
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather forecast by city and date.",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string"},
"date": {"type": "string"}
},
"required": ["city", "date"],
},
},
}
]
first = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
tool_call = first.choices[0].message.tool_calls[0]
args: Dict[str, Any] = json.loads(tool_call.function.arguments)
# Simulated tool execution.
tool_result = {"city": args["city"], "date": args["date"], "condition": "rain", "temperature_c": 9}
messages.append(first.choices[0].message.model_dump())
messages.append(
{
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(tool_result),
}
)
final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
return final.choices[0].message.content or ""
def run_tool_and_continue_anthropic() -> str:
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
tools = [
{
"name": "get_weather",
"description": "Get weather forecast by city and date.",
"input_schema": {
"type": "object",
"properties": {"city": {"type": "string"}, "date": {"type": "string"}},
"required": ["city", "date"],
},
}
]
first = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=500,
system="You are a weather assistant.",
tools=tools,
messages=[{"role": "user", "content": "Will it rain in Seattle tomorrow?"}],
)
tool_use_block = next(block for block in first.content if block.type == "tool_use")
tool_input = dict(tool_use_block.input)
tool_result = {
"city": tool_input["city"],
"date": tool_input["date"],
"condition": "rain",
"temperature_c": 9,
}
final = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=500,
system="You are a weather assistant.",
tools=tools,
messages=[
{"role": "user", "content": "Will it rain in Seattle tomorrow?"},
{"role": "assistant", "content": [block.model_dump() for block in first.content]},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": tool_use_block.id,
"content": json.dumps(tool_result),
}
],
},
],
)
return "\n".join(block.text for block in final.content if block.type == "text")