V1
Back to handbooks index
Harness Engineering · AI
2026 Edition Multi-Agent Production
Field Handbook · Production AI Systems

Harness
Engineering

"Prompt engineering was 2023. Harness engineering is 2026."

The complete operational guide to building reliable AI systems — covering tool orchestration, multi-layer guardrails, error recovery, feedback loops, observability, and human oversight. Goes beyond individual model calls to the production infrastructure that makes AI agents dependable.

Tool Orchestration Multi-Agent Observability Guardrails Error Recovery Human-in-Loop
01

What Is Harness Engineering

// DEFINITION AND SCOPE

Harness engineering is the discipline of designing the execution environment, tooling infrastructure, and orchestration logic that surrounds a language model. It is everything that is not the model itself — context assembly, tool dispatch, memory management, guardrails, retry logic, state tracking, output validation, and observability.

The harness converts a probabilistic text generator into a reliable, auditable, goal-directed system. Without a harness, an LLM is a demo. With a production-grade harness, the same model becomes infrastructure.

✕ Without a Harness
  • Model calls are stateless — no continuity across turns
  • Errors surface as hallucinated "success" responses
  • Tools invoked with no permission scoping
  • No retry, no fallback, no circuit breaking
  • Unlimited token burn — no cost guardrails
  • Invisible — no tracing, no audit, no replay
  • Human oversight requires manual monitoring
✓ With a Production Harness
  • Structured state handoffs between context windows
  • Deterministic validation of every output
  • Tools scoped to minimum required permissions
  • Retry with backoff, fallback strategies, circuit breakers
  • Token budgets, rate limits, cost alerting
  • Full distributed trace: every decision auditable
  • Policy-defined human approval gates
Key insight from 2025 benchmarks: Improving the harness on the same model consistently outperforms switching to a more capable model on real production workloads. The scaffolding is the system — the model is just a component. The same Opus 4.7 running inside different harnesses produces dramatically different reliability profiles.
02

Why It Matters in 2026

// THE SHIFT FROM PROMPT TO SYSTEM

Three converging forces have made harness engineering the dominant discipline for production AI teams in 2026: models are reasoning-capable by default (reducing prompt ROI), agents are now taking consequential real-world actions (raising the cost of failure), and multi-agent topologies mean errors compound across parallel execution paths.

Prompt ROI Collapse Driver

Stanford HAI (late 2025): marginal returns on prompt optimization have flattened as frontier models reason by default. The leverage has moved entirely to execution infrastructure — context assembly, tool selection, and output validation.

Consequential Actions Driver

Agents now write to databases, push to production branches, send emails, submit PRs, and provision cloud resources. Errors are no longer correctable by re-reading the chat. A harness-less agent with shell access is an uncontrolled blast radius.

Multi-Agent Compounding Driver

Ten agents running in parallel each making small errors creates cascading failures that are nearly impossible to debug post-hoc. The harness provides the isolation, state boundaries, and inter-agent contracts that prevent compound failure.

OWASP LLM06:2025 — Excessive Agency: The most dangerous harness anti-pattern is over-provisioning. An agent given filesystem write access, network egress, and process execution rights — when it only needs to run tests — is an amplified attack surface. OWASP classifies unnecessary permissions as a top-tier LLM risk. The harness is the enforcement layer that limits what an agent can actually do versus what it thinks it can do.
03

The Three-Layer Model

// ARCHITECTURE OVERVIEW

Every production AI harness — regardless of the model or framework — is composed of three concentric layers. Most teams in 2025 only built Layer 1. Teams shipping reliable production AI in 2026 engineer all three deliberately.

💬
Layer 1: Model Interface
Prompt construction, context assembly, response parsing, token management
⚙️
Layer 2: Runtime Environment
Tool definitions, memory stores, input validation, output guardrails, context window management
🌐
Layer 3: Orchestration
Agent loops, task decomposition, conditional branching, human approval gates, parallel execution, state handoffs

Agent Execution Loop (Plan–Execute–Verify)

Input
Task + Context
Plan
Decompose + Select Tools
Guardrail
Input Validation
Execute
Tool Calls
Verify
Output Validation
Feedback
Pass / Fix / Escalate
04

Tool Orchestration

// CAPABILITY REGISTRY, DISPATCH, AND PERMISSIONS

Tools give an agent capabilities beyond text generation — web search, code execution, database queries, file operations, API calls. The harness is responsible for defining the tool registry, enforcing permission scopes, handling tool failures, and composing multi-tool workflows into coherent agent actions.

Start with 3–5 well-defined tools. A lean, well-scoped tool registry outperforms a broad, loosely defined one. Each tool should have: a clear natural-language description for model selection, typed input/output schema, explicit permission scope, timeout budget, and a documented fallback behavior.

Tool Registry Design

Python — Tool Registry with Permissions
from dataclasses import dataclass from enum import Enum from typing import Callable, Any import anthropic class PermissionLevel(Enum): READ_ONLY = "read_only" # no side effects WRITE = "write" # modifies state, reversible DESTRUCTIVE = "destructive" # requires human approval @dataclass class HarnessTool: name: str description: str schema: dict # JSON Schema for inputs handler: Callable permission: PermissionLevel timeout_s: int = 30 max_retries: int = 3 # Tool registry — define capabilities explicitly TOOL_REGISTRY: dict[str, HarnessTool] = { "read_file": HarnessTool( name="read_file", description="Read the contents of a file. Only files under ./src/ are accessible.", schema={ "type": "object", "properties": { "path": {"type": "string", "pattern": "^\\./src/"} }, "required": ["path"] }, handler=handle_read_file, permission=PermissionLevel.READ_ONLY, ), "run_tests": HarnessTool( name="run_tests", description="Execute the test suite. Returns exit code, stdout, and stderr.", schema={"type": "object", "properties": { "test_path": {"type": "string", "default": "tests/"} }}, handler=handle_run_tests, permission=PermissionLevel.READ_ONLY, timeout_s=120, ), "write_file": HarnessTool( name="write_file", description="Write content to a file. Restricted to ./src/. Creates backup before writing.", schema={ "type": "object", "properties": { "path": {"type": "string", "pattern": "^\\./src/"}, "content": {"type": "string"} }, "required": ["path", "content"] }, handler=handle_write_file, permission=PermissionLevel.WRITE, ), "delete_file": HarnessTool( name="delete_file", description="Delete a file. REQUIRES human approval before execution.", schema={"type": "object", "properties": { "path": {"type": "string"} }}, handler=handle_delete_file, permission=PermissionLevel.DESTRUCTIVE, # always gate to human ), }

Multi-Tool Workflow: Write–Test–Fix Loop

Python — Harness Execution with Tool Dispatch
import anthropic, json, time from harness import TOOL_REGISTRY, PermissionLevel, require_human_approval client = anthropic.Anthropic() def run_agent_loop(task: str, max_iterations: int = 20) -> str: """ Bounded Plan–Execute–Verify loop with tool dispatch. Exits on: task completion, max iterations, or unrecoverable error. """ messages = [{"role": "user", "content": task}] tool_defs = [t_to_anthropic_schema(t) for t in TOOL_REGISTRY.values()] iteration = 0 while iteration < max_iterations: iteration += 1 # ── Model call ────────────────────────────────────────── response = client.messages.create( model="claude-sonnet-4-6", max_tokens=8192, tools=tool_defs, messages=messages, ) # ── Stop conditions ───────────────────────────────────── if response.stop_reason == "end_turn": return extract_text(response) # task complete # ── Tool use ──────────────────────────────────────────── tool_results = [] for block in response.content: if block.type != "tool_use": continue tool = TOOL_REGISTRY.get(block.name) if not tool: tool_results.append(tool_error(block.id, f"Unknown tool: {block.name}")) continue # ── Permission gate ───────────────────────────────────── if tool.permission == PermissionLevel.DESTRUCTIVE: approved = require_human_approval(block.name, block.input) if not approved: tool_results.append(tool_error(block.id, "Action rejected by operator")) continue # ── Execute with timeout + retry ──────────────────────── result = execute_with_retry(tool, block.input) tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result}) messages.append({"role": "assistant", "content": response.content}) messages.append({"role": "user", "content": tool_results}) raise HarnessError(f"Max iterations ({max_iterations}) reached without completion") def execute_with_retry(tool, inputs, attempt=0): try: result = run_with_timeout(tool.handler, inputs, tool.timeout_s) return json.dumps(result) except TimeoutError: return json.dumps({"error": "timeout", "tool": tool.name, "timeout_s": tool.timeout_s}) except Exception as e: if attempt < tool.max_retries: time.sleep(2 ** attempt) # exponential backoff return execute_with_retry(tool, inputs, attempt + 1) return json.dumps({"error": str(e), "final_attempt": True})
05

Guardrails & Safety

// MULTI-LAYER BEHAVIORAL ENFORCEMENT

Guardrails are the deterministic rules that prevent agents from taking harmful, unauthorized, or out-of-scope actions. Unlike model-level safety (stochastic), harness guardrails are deterministic and composable. They operate at five distinct intercept points: before input reaches the model, during tool invocation, on output before delivery, at the session boundary, and at the infrastructure level.

🔍
Input Rails
Prompt injection detection, content classification, scope validation
⛓️
Dialog Rails
Conversation flow control, topic boundaries, CoT steering
🔧
Execution Rails
Tool allow/deny lists, parameter validation, sandbox boundaries
📤
Output Rails
Schema validation, linting, test gate, PII scrubbing
💰
Cost Rails
Token budgets, rate limits, per-session cost caps
📁
Data Rails
File path allowlists, secret detection, data classification gates
NVIDIA NeMo Guardrails pattern: Define rails using the Colang DSL at each intercept layer. The execution rail layer specifically governs what tools the LLM can invoke and what their inputs/outputs may contain — the reference for behavioral-level enforcement when static allow/deny lists are insufficient. More constraints yield more reliability, not less.

Guardrail Implementation: CLAUDE.md / AGENTS.md

Markdown — CLAUDE.md Harness Config (Claude Code)
# CLAUDE.md — Project Harness Configuration # Loaded automatically at session start by Claude Code ## Project Context This is a TypeScript Node.js API service. Tests use Vitest. Deployment via GitHub Actions. ## Allowed Actions - Read and modify files under ./src/ and ./tests/ - Run: npm test, npm run lint, npm run typecheck - Create new files following the naming convention: kebab-case.ts ## Prohibited Actions - NEVER modify .env, .env.production, or any secrets file - NEVER run npm publish, git push, or deployment scripts - NEVER install new packages without confirming with the user first - NEVER delete files — move to ./trash/ directory instead - NEVER expose API keys, tokens, or credentials in any output ## Required Verification Steps After any code change, you MUST: 1. Run npm run typecheck — zero errors required 2. Run npm test — all tests must pass 3. Run npm run lint — zero warnings for new code ## Output Format - Always explain the change made and why - List files modified - Show test results summary - Flag any remaining TODOs ## Escalation Triggers Pause and ask the user before proceeding if: - Any test failure that you cannot fix in 2 attempts - A change affects more than 5 files - You encounter an architectural decision not covered here

Output Validation Gate

Python — Multi-Layer Output Guardrail
import subprocess, re from dataclasses import dataclass @dataclass class ValidationResult: passed: bool errors: list[str] warnings: list[str] def validate_code_output(code: str, file_path: str) -> ValidationResult: """ Multi-layer output validation before accepting agent-written code. Each layer is deterministic — not reliant on the model. """ errors, warnings = [], [] # Layer 1: Secret detection (never commit secrets) SECRET_PATTERNS = [ r'sk-[A-Za-z0-9]{32,}', # OpenAI / Anthropic keys r'AKIA[A-Z0-9]{16}', # AWS access key r'ghp_[A-Za-z0-9]{36}', # GitHub PAT r'password\s*=\s*["\'][^"\']{8,}', # hardcoded password ] for pattern in SECRET_PATTERNS: if re.search(pattern, code, re.IGNORECASE): errors.append(f"SECRET_DETECTED: pattern '{pattern}' found in output") # Layer 2: Static analysis (TypeScript example) if file_path.endswith('.ts'): r = subprocess.run(['npx', 'tsc', '--noEmit', file_path], capture_output=True, text=True) if r.returncode != 0: errors.append(f"TYPE_ERROR: {r.stdout[:500]}") # Layer 3: Lint r = subprocess.run(['npx', 'eslint', file_path, '--format=json'], capture_output=True, text=True) if r.returncode != 0: warnings.append(f"LINT_WARNING: {r.stdout[:300]}") # Layer 4: Unit tests (if applicable) r = subprocess.run(['npm', 'test', '--', '--run'], capture_output=True, text=True, timeout=60) if r.returncode != 0: errors.append(f"TEST_FAILURE:\n{r.stdout[-800:]}") return ValidationResult(passed=len(errors)==0, errors=errors, warnings=warnings)
06

Error Recovery

// RETRY STRATEGIES, FALLBACKS, AND CIRCUIT BREAKERS

Agent failures fall into three categories: transient (network timeout, rate limit), recoverable (test failure, type error, tool validation error), and unrecoverable (stuck in a loop, contradiction in task requirements, permission denied). Each category demands a different recovery strategy. The harness must distinguish between them and respond appropriately rather than retrying uniformly.

Error TypeExamplesRecovery StrategyMax Attempts
Transient API timeout, 429 rate limit, network flap Exponential backoff with jitter 3 with backoff
Tool Failure Tool returns error, invalid output schema Return structured error to model; allow re-plan Model decides
Validation Failure Tests fail, typecheck fails, lint errors Pass error output back to model as feedback 2–3 fix attempts
Loop Detection Same tool called 3x with same inputs Break loop, surface to human checkpoint 1 detection → escalate
Budget Exceeded Token limit, cost cap, iteration cap hit Checkpoint state, pause, notify operator Hard stop
Unrecoverable Conflicting requirements, missing credentials Escalate to human with diagnosis Immediate
Python — Circuit Breaker + Loop Detection
from collections import Counter, defaultdict from datetime import datetime, timedelta class HarnessCircuitBreaker: """ Detects and breaks pathological execution patterns before they exhaust tokens, loop infinitely, or cause runaway tool calls. """ def __init__(self): self.tool_call_history = [] # (tool_name, input_hash, timestamp) self.iteration_count = 0 self.fix_attempts = defaultdict(int) # tool -> consecutive failures def record_tool_call(self, tool_name: str, inputs: dict) -> None: self.tool_call_history.append((tool_name, hash(str(inputs)), datetime.now())) def check_for_loops(self) -> tuple[bool, str]: """Detect identical tool+input pairs in recent history.""" recent = self.tool_call_history[-6:] # last 6 calls call_counts = Counter((name, h) for name, h, _ in recent) for (tool, h), count in call_counts.items(): if count >= 3: return True, f"Loop detected: '{tool}' called {count}x with identical inputs" return False, "" def record_fix_failure(self, tool: str) -> bool: """Returns True if fix attempts exhausted — escalate to human.""" self.fix_attempts[tool] += 1 return self.fix_attempts[tool] >= 3 def reset_fix_counter(self, tool: str) -> None: self.fix_attempts[tool] = 0 # success — reset counter # ─── Recovery message injected back to model ───────────────────────────── RECOVERY_PROMPT_TEMPLATE = """ The previous action failed. Here is the structured error: ERROR TYPE: {error_type} ERROR DETAIL: {error_detail} Attempt {attempt} of {max_attempts}. {"This is your FINAL attempt. If you cannot fix this, respond with ESCALATE: ." if attempt >= max_attempts else ""} Diagnose the root cause from the error output above and try a different approach. """
07

Feedback Loops

// WRITE–TEST–FIX, ONLINE EVAL, CONTINUOUS LEARNING

Feedback loops are the mechanism by which agent output drives the next agent action. The simplest and most effective feedback loop for coding agents is the write–test–fix cycle: the agent writes code, the harness runs tests, failure output is fed back as the next input. This tight loop converts test failures into self-correction signals without human involvement.

01
Write
Agent modifies code
02
Validate
Lint + typecheck (sync)
03
Test
Run test suite
04
Evaluate
Pass / Fail / Partial
05
Feed Back
Inject error context
06
Fix or Escalate
Retry or → Human

Structured Feedback Injection

Python — Feedback Context Assembly
def build_feedback_context(validation: ValidationResult, attempt: int) -> str: """ Transforms raw validation output into structured feedback the model can act on. Critical: include the exact error, not a summary of it. Models fix errors they can see; they hallucinate fixes for errors they can't. """ lines = [f"=== VALIDATION FEEDBACK (attempt {attempt}) ==="] if validation.errors: lines.append("\n🔴 ERRORS (must fix before proceeding):") for err in validation.errors: lines.append(f" • {err}") if validation.warnings: lines.append("\n🟡 WARNINGS (fix if possible):") for w in validation.warnings: lines.append(f" • {w}") lines.append("\nRequired action: Fix ALL errors above. Re-run validation after changes.") lines.append(f"Attempts remaining: {3 - attempt}") return "\n".join(lines) # ─── Offline eval loop (run after session, improve harness) ────────────── def run_offline_eval(harness_config: dict, eval_suite: list[dict]) -> dict: """ Replay agent tasks against a golden eval suite to measure harness quality. Key metrics: task completion rate, iteration count, token cost, error rate. Run in CI to catch harness regressions before production deployment. """ results = {"passed": 0, "failed": 0, "total_tokens": 0, "avg_iterations": 0} for case in eval_suite: outcome = run_agent_loop(case["task"], max_iterations=case.get("max_iter", 15)) passed = evaluate_outcome(outcome, case["expected"]) results["passed" if passed else "failed"] += 1 return results
Write–test–fix loop quality: Agent frameworks reporting 80–90%+ task completion on SWE-bench benchmarks all implement a tight write–test–fix feedback loop. The loop is more valuable than model size. A Claude Sonnet in a well-instrumented feedback harness consistently outperforms Opus running without one on multi-step coding tasks.
08

Observability

// TRACES, METRICS, LOGS, AND COST ATTRIBUTION

You cannot improve what you cannot observe. AI harness observability requires three signal types: distributed traces (for step-by-step action reconstruction), structured metrics (for quantitative performance tracking), and cost attribution (for token and compute accountability). Unlike traditional software, AI observability must also capture reasoning transparency — the decisions the model made and why.

Distributed Traces

Capture every model call, tool invocation, and validation step as a span in a trace. OpenTelemetry is the standard. Each span should record: model, input tokens, output tokens, duration, tool name, inputs, outputs, and the parent span that triggered it.

Session Metrics

Track per-session: total iterations, tool calls by type, error/retry counts, total token cost, task completion status, human intervention count, and time-to-completion. These drive harness tuning over time.

Cost Attribution

Log input tokens, output tokens, model tier, and compute duration for every call. Attribute cost to task, session, user, and team. Cost spikes are the earliest signal of runaway loops or harness misconfiguration.

Structured Session Logging

Python — OpenTelemetry Harness Instrumentation
from opentelemetry import trace from opentelemetry.trace import Status, StatusCode import json, time tracer = trace.get_tracer("ai.harness") class HarnessObserver: """ Wraps every harness action in an OTel span. Integrates with Jaeger, Honeycomb, Datadog, or any OTel backend. """ def trace_model_call(self, session_id: str, messages: list, response) -> None: with tracer.start_as_current_span("llm.call") as span: span.set_attribute("session.id", session_id) span.set_attribute("llm.model", response.model) span.set_attribute("llm.input_tokens", response.usage.input_tokens) span.set_attribute("llm.output_tokens", response.usage.output_tokens) span.set_attribute("llm.stop_reason", response.stop_reason) span.set_attribute("llm.tool_use", response.stop_reason == "tool_use") # Cost attribution: input=$3/M, output=$15/M for Sonnet cost_usd = (response.usage.input_tokens * 3 + response.usage.output_tokens * 15) / 1_000_000 span.set_attribute("llm.cost_usd", cost_usd) def trace_tool_call(self, tool_name: str, inputs: dict, result: str, duration_ms: float, error: str = None) -> None: with tracer.start_as_current_span(f"tool.{tool_name}") as span: span.set_attribute("tool.name", tool_name) span.set_attribute("tool.input_size", len(json.dumps(inputs))) span.set_attribute("tool.duration_ms", duration_ms) span.set_attribute("tool.success", error is None) if error: span.set_status(Status(StatusCode.ERROR, error)) span.record_exception(Exception(error)) def emit_session_summary(self, session: dict) -> None: """Structured log line — queryable in any log aggregation platform.""" log_line = { "event": "harness.session.complete", "session_id": session["id"], "task": session["task"][:120], "completed": session["completed"], "iterations": session["iterations"], "tool_calls": session["tool_calls"], "errors": session["error_count"], "hitl_events": session["human_interventions"], "total_tokens": session["input_tokens"] + session["output_tokens"], "cost_usd": round(session["cost_usd"], 4), "duration_s": session["duration_s"], "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()), } print(json.dumps(log_line)) # structured → Splunk / Datadog / CloudWatch

Key Metrics Dashboard: What to Track

MetricTargetAlert ThresholdDiagnosis
Task Completion Rate>85%<70%Harness guardrails too restrictive, or model+task mismatch
Avg Iterations / Task<8>15Feedback loop not informative; tool errors not being surfaced properly
Human Intervention Rate<15%>40%Escalation thresholds miscalibrated or task scope too ambiguous
Tool Error Rate<5%>20%Tool schema unclear; model misusing tools; infrastructure instability
Cost / Task (USD)Baseline ±20%>3× baselineLoop detection failure; context not being compacted; model over-calling tools
P99 Latency<60s / iteration>120sTool timeouts; upstream API degradation; context window size issues
09

Human-in-the-Loop

// APPROVAL GATES, ESCALATION POLICIES, AND BREAK-GLASS

Human-in-the-loop (HITL) is not a safety afterthought — it is an architectural primitive. The harness defines where and when human judgment is injected into the agent loop. The goal is not maximum oversight but right-placed oversight: humans at the decisions where autonomy risk is highest, automation everywhere else.

Mandatory HITL Gates Always
  • Destructive operations (delete, drop table, terminate instance)
  • External communications (send email, post to Slack, submit PR)
  • Credential or secret access beyond scoped tools
  • Changes to CI/CD pipelines or deployment configurations
  • Any action affecting more than N files (configurable)
  • Loop detection trigger — model stuck in retry cycle
  • Unrecoverable errors after max fix attempts
Configurable HITL Thresholds Tunable
  • File change scope: ask if modifying >5 files
  • Cost gate: pause if session exceeds $X
  • Architectural decisions outside CLAUDE.md scope
  • Package installation / dependency changes
  • Test coverage drops below threshold
  • New external network connections introduced
  • Off-hours execution of sensitive operations

HITL Approval Flow Implementation

Python — Human Approval Gate
from enum import Enum import asyncio class ApprovalOutcome(Enum): APPROVED = "approved" REJECTED = "rejected" MODIFIED = "modified" # human edits the proposed action ESCALATED = "escalated" # routed to senior approver class HITLGate: """ Pauses agent execution, presents action + context to a human operator, and resumes (or terminates) based on their decision. Supports async approvals via Slack, web UI, or CLI. """ def __init__(self, notification_backend): self.backend = notification_backend # Slack, PagerDuty, web webhook self.pending = {} # request_id → asyncio.Future async def request_approval(self, action_type: str, proposed_action: dict, context: str, timeout_s: int = 300) -> ApprovalOutcome: request_id = generate_id() future = asyncio.get_event_loop().create_future() self.pending[request_id] = future # Notify operator (non-blocking) await self.backend.notify({ "request_id": request_id, "action_type": action_type, "proposed_action": proposed_action, "context": context[:500], "approve_url": f"https://harness.internal/approve/{request_id}", "reject_url": f"https://harness.internal/reject/{request_id}", }) try: # Block agent loop until human responds or timeout result = await asyncio.wait_for(future, timeout=timeout_s) return result except asyncio.TimeoutError: # Timeout policy: default deny for destructive, allow for safe return ApprovalOutcome.REJECTED # conservative default finally: self.pending.pop(request_id, None) def resolve(self, request_id: str, outcome: ApprovalOutcome) -> None: """Called by the approval webhook when human responds.""" if request_id in self.pending: self.pending[request_id].set_result(outcome) # ─── Example: Slack notification for HITL approval ─────────────────────── SLACK_HITL_BLOCK = { "blocks": [ {"type": "header", "text": {"type": "plain_text", "text": "🤖 Agent Action Requires Approval"}}, {"type": "section", "fields": [ {"type": "mrkdwn", "text": "*Action:*\nDelete database migration file"}, {"type": "mrkdwn", "text": "*Risk Level:*\n🔴 HIGH — irreversible"}, ]}, {"type": "actions", "elements": [ {"type": "button", "text": {"type": "plain_text", "text": "✅ Approve"}, "style": "primary"}, {"type": "button", "text": {"type": "plain_text", "text": "❌ Reject"}, "style": "danger"}, ]}, ] }
10

Claude Code Harness

// HOOKS, SKILLS, MCP, AND SESSION MANAGEMENT

Claude Code is Anthropic's terminal-based agentic coding CLI with a built-in harness layer. It exposes three primary harness extension points: hooks (lifecycle callbacks at pre/post-tool execution), CLAUDE.md (session-scoped behavioral constraints), and MCP server integrations (tool capability extensions). Used together, these allow full harness engineering without writing custom agent scaffolding.

JSON — Claude Code Settings Harness Config
// .claude/settings.json — project-level harness configuration { "model": "claude-sonnet-4-6", "env": { "MAX_THINKING_TOKENS": "10000", // cap extended thinking "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50" // compact at 50% context, not 95% }, "hooks": { // PRE-TOOL: validate all bash commands before execution "PreToolUse": [ { "matcher": "Bash", "hooks": [{ "type": "command", "command": "./harness/validate-command.sh" // Non-zero exit = BLOCK the command + return error to model }] } ], // POST-TOOL: run linting after every file write "PostToolUse": [ { "matcher": "Write", "hooks": [{ "type": "command", "command": "./harness/post-write.sh" // Runs: lint → typecheck → output to model as feedback }] } ], // POST-TURN: log every session turn for observability "PostTurn": [{ "type": "command", "command": "./harness/log-turn.sh" }] }, "permissions": { "allow": [ "Bash(npm run *)", // npm scripts: yes "Bash(git status)", // git read: yes "Bash(git diff *)", "Read(src/**)", // src read: yes "Write(src/**)" // src write: yes ], "deny": [ "Bash(git push *)", // no remote push "Bash(rm -rf *)", // no mass delete "Bash(npm publish)", // no publish "Bash(curl * | bash)", // no remote execution "Read(.env*)" // no secrets access ] } }
Bash — Pre-Tool Hook: Command Validation
#!/bin/bash # harness/validate-command.sh # Claude Code passes tool input via stdin as JSON. # Exit 0 = allow. Exit 1 = BLOCK. Exit 2 = BLOCK + return message to model. INPUT=$(cat) # Read JSON from stdin: {"command": "...", "description": "..."} CMD=$(echo $INPUT | jq -r .command) # Block patterns — never allow these regardless of context BLOCKED=( "rm -rf /" "dd if=/" "mkfs" "chmod 777" ":(){:|:&};:" # fork bomb "curl.*|.*bash" "wget.*|.*sh" ) for pattern in "${BLOCKED[@]}"; do if echo "$CMD" | grep -qE "$pattern"; then echo "BLOCKED: Command matches prohibited pattern: $pattern" >&2 exit 2 # Exit 2 = message returned to model fi done # Log all commands for audit trail echo "$(date -u +"%Y-%m-%dT%H:%M:%SZ") CMD: $CMD" >> logs/harness-audit.log exit 0 # Allow command to proceed
MCP server risk: Don't enable all MCP servers at once. Each MCP server expands the agent's tool surface. Run npx ecc-agentshield scan (AgentShield, 2026) to audit your MCP config, hooks, and CLAUDE.md for injection risks, permission over-grants, and secret exposure patterns before production use.
11

GitHub Copilot & Codex CLI

// HARNESS CONFIGURATION FOR EDITOR-NATIVE AGENTS

GitHub Copilot and OpenAI Codex CLI each provide harness configuration primitives that parallel Claude Code's CLAUDE.md/hooks model. Copilot uses repository instruction files and organization-level policy. Codex uses AGENTS.md and sandbox permissions. Understanding the harness surface of each tool lets you apply the same disciplined engineering regardless of which agent is running.

Dimension
Claude Code
GitHub Copilot Agent
OpenAI Codex CLI
Config File
CLAUDE.md + settings.json
.github/copilot-instructions.md
AGENTS.md
Hooks / Lifecycle
YES — PreToolUse, PostToolUse, PostTurn
PARTIAL — GitHub Actions triggers
NO — static config only
Permission Scoping
FINE-GRAINED — per-tool allow/deny
MEDIUM — org policy + repo rules
MEDIUM — sandbox profiles
Observability
Custom hooks + OTel integration
GitHub Audit Log + Copilot metrics
AGENTS.md + external logging
HITL
Custom approval gate via hooks
PR review workflow (native)
Approval-required flag in config
MCP Support
YES — full MCP client
YES — via Agent HQ (Feb 2026)
LIMITED — plugin model
Feedback Loop
Built-in + customizable via hooks
CI/CD pipeline feedback
AGENTS.md defines test commands

AGENTS.md — Codex Harness Config

Markdown — AGENTS.md (OpenAI Codex)
# AGENTS.md — OpenAI Codex Harness Configuration # Equivalent to CLAUDE.md for the Codex CLI environment ## Project Python FastAPI service. Tests: pytest. Linting: ruff. Type checking: mypy. ## Commands # These are the ONLY commands the agent should run unsupervised - Run tests: pytest tests/ -v --tb=short - Lint: ruff check . --fix - Type check: mypy src/ --strict - Format: ruff format . ## Sandbox Permissions (network isolation) # Codex runs in a network-isolated sandbox by default network: disabled # prevent exfiltration, force offline operation ## Required Workflow For EVERY code change, in this order: 1. Make targeted, minimal changes 2. Run mypy — zero new errors 3. Run ruff — auto-fix, then zero remaining issues 4. Run pytest — all existing tests must pass 5. Add tests for new behavior (minimum 1 test per new function) ## Boundaries - Modify ONLY files under src/ and tests/ - Do NOT modify pyproject.toml or requirements.txt without asking - Do NOT create new API endpoints without showing the design first - Do NOT use subprocess in application code ## On Failure If any check fails after 2 attempts: output the exact error and ask for guidance. Do NOT try a third fix attempt independently.
12

Orchestration Frameworks

// LANGCHAIN, LANGGRAPH, CREWAI, AND BEYOND

Orchestration frameworks provide composable harness primitives — prompt templates, tool definitions, memory abstractions, and agent loop logic — so teams don't build the execution layer from scratch. Choice of framework shapes harness architecture. Pick based on production requirements, not demo simplicity.

FrameworkModelBest ForHarness MaturityProduction Uses
LangGraph Graph / stateful Complex multi-step workflows with branching and cycles HIGH Klarna, Replit, Elastic
LangChain Chain / pipeline Rapid prototyping, broad integrations (100k+ stars) MEDIUM Widely deployed, vary in rigor
CrewAI Multi-agent crews Role-based agent teams with task delegation MEDIUM Enterprise automation pilots
AWS Bedrock AgentCore Runtime + gateway Full production stack: runtime, memory, identity, observability HIGH AWS-native enterprise
Custom + Anthropic SDK Direct API Maximum control, minimum abstraction overhead MAX (manual) High-reliability production systems
LangGraph v1.0 (October 2025): Graph-based stateful orchestration for complex agent workflows. Nodes are agent steps; edges are conditional routing logic; state is a typed dict persisted across the graph. First-class support for human-in-the-loop checkpoints, streaming, and tool-use error recovery. The reference framework for production multi-agent harnesses as of 2026.
13

MCP & A2A Protocols

// TOOL INTEROPERABILITY AND AGENT-TO-AGENT DELEGATION

The Model Context Protocol (MCP) is the emerging standard for tool interoperability — any MCP-compliant client (Claude Code, Copilot, LangChain) can consume any MCP-compliant tool server without bespoke integration. The Agent-to-Agent (A2A) protocol extends this to multi-agent topologies: orchestrator agents delegate to specialist subagents through a standardized calling convention.

Model Context Protocol (MCP) Standard

Defines the interface between model clients and tool servers. Tool servers expose: capabilities manifest, tool schemas, and a call endpoint. Clients invoke tools by name with validated inputs. The harness governs which MCP servers are loaded per session and what their permission scopes are.

Supported by: Claude Code, Copilot Agent HQ, LangChain, LangGraph, Cursor, and 200+ community servers.

Agent-to-Agent (A2A) 2026

Standardized protocol for orchestrator→subagent delegation. An orchestrator agent can spawn a specialist subagent (code reviewer, security scanner, documentation writer) as a structured API call. A2A ensures: typed task contracts, result schemas, error propagation, and cost attribution across agent boundaries.

Reference: IEEE CAI 2026 Tutorial on Engineering Trustworthy Multi-Agent Systems.

MCP Harness Configuration Example

JSON — MCP Server Config (.claude/settings.json)
// Only include servers needed for the current project. // Each server = expanded attack surface. Audit with ecc-agentshield. { "mcpServers": { "filesystem": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-filesystem", "./src"], // Scoped to ./src only — not the full filesystem "description": "Read/write access to project source files only" }, "github": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"], "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" // Never hardcode tokens — use env var references } }, "postgres": { "command": "npx", "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://readonly_user@localhost/mydb"], // READ-ONLY database user — never give agents write access to prod DB "description": "Read-only access to development database schema and data" } // Intentionally excluded: shell execution, network access, cloud control plane } }
14

Harness Maturity Model

// WHERE IS YOUR TEAM TODAY

Harness engineering maturity maps to the CISA ZTA model structure: four stages across five dimensions. Teams rarely advance all dimensions simultaneously. Start with guardrails and feedback loops (highest safety ROI), then add observability and HITL rigor, then optimize orchestration.

Stage 01
Ad Hoc
  • No config files
  • No permission scoping
  • No feedback loop
  • Manual testing only
  • No cost visibility
  • All-or-nothing iteration
Stage 02
Structured
  • CLAUDE.md / AGENTS.md present
  • Basic allow/deny rules
  • Test feedback loop
  • Manual HITL checkpoints
  • Basic cost logging
  • Iteration cap set
Stage 03
Production
  • Multi-layer guardrails
  • Structured error recovery
  • OTel traces + metrics
  • Automated HITL gates
  • Offline eval suite (CI)
  • MCP scoped per project
Stage 04
Optimal
  • Policy-as-code guardrails
  • Dynamic HITL thresholds
  • Full A2A + orchestration
  • Continuous harness eval
  • Cost attribution per team
  • Security scan in CI/CD
15

Patterns & Anti-Patterns

// WHAT WORKS, WHAT FAILS, AND WHY
✕ Anti-Patterns (Production Failures)
  • Unbounded iteration — no max_iterations cap; agent burns tokens until OOM or API limit
  • Silent errors — tool returns error as empty string; model assumes success
  • Broad file access — agent has write access to entire repo including secrets, CI config
  • No state checkpoints — 20-iteration task has no recovery point on failure
  • Prompt-only guardrails — "don't touch .env" in the prompt; not enforced deterministically
  • No loop detection — same tool called 8× with identical inputs before token exhaustion
  • All MCPs enabled — every integration loaded regardless of current task
  • Human gate theater — HITL approval always auto-approves due to timeout policy
✓ Production Patterns (What Ships)
  • Bounded loops — hard iteration cap; structured handoff artifact on max-iter
  • Structured tool errors — every error has type, detail, and suggested recovery hint
  • Minimal tool surface — 3–5 scoped tools per session, not a full tool registry
  • Explicit checkpointing — write progress artifact to disk every N iterations
  • Deterministic guardrails — hook scripts enforce rules at OS level, not prompt level
  • Loop detection — circuit breaker triggers HITL on 3× same tool+input
  • Task-scoped MCP — only load servers required for the current task category
  • Tiered HITL — auto-approve low-risk, manual-approve destructive, reject on timeout
Prompt injection via tools: When an agent reads external content (files, web pages, emails, database rows) and that content contains instructions ("Ignore previous instructions and..."), indirect prompt injection attacks can hijack the agent loop. Defense: treat all external tool output as untrusted data, not as instructions. Sanitize tool outputs before injection into the conversation; use a dedicated trusted/untrusted content zone in the context window.
16

Implementation Roadmap

// SIX STEPS TO PRODUCTION-GRADE HARNESS

Whether you're starting from a bare API call or retrofitting an existing agent, the path to production harness engineering follows the same six steps. Each step independently improves reliability. The order matters: guardrails before observability, feedback loop before HITL tuning.

Roadmap — 6-Step Harness Engineering Path
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 1 — ASSET INVENTORY (Day 1–2) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ List every tool the agent currently uses or could use ✦ Classify each tool: read-only / write / destructive ✦ Identify data the agent can touch (files, DBs, APIs, credentials) ✦ Map current failure modes: what breaks, how often, impact ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 2 — CONFIGURATION FILE (Day 3–4) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ Create CLAUDE.md / AGENTS.md with: - Explicit allowed operations - Explicit prohibited operations - Required verification steps after any change - Escalation triggers (when to stop and ask) ✦ Set iteration cap (start: 15 for coding, 30 for research) ✦ Define context compaction strategy (compact at 50%, not 95%) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 3 — FEEDBACK LOOP (Week 1) ROI: HIGHEST ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ Implement write-test-fix cycle: - Agent writes → tests run automatically → failures injected as next input ✦ Add output validation: secret detection + static analysis + lint ✦ Structure error messages: type + detail + recovery hint ✦ Set max fix attempts (2–3) before escalating to human ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 4 — GUARDRAILS (Week 2) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ Add pre-tool hook: validate commands before execution ✦ Scope file/directory access to minimum needed ✦ Add loop detection (3× same tool+input = escalate) ✦ Set token budget + cost alert threshold ✦ Define HITL gates for all destructive operations ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 5 — OBSERVABILITY (Week 3) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ Instrument all model calls with OTel spans ✦ Log: every tool call, duration, error rate, cost ✦ Emit structured session summary on completion ✦ Set up dashboard: completion rate, avg iterations, cost/task ✦ Add post-turn hook for audit trail ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ STEP 6 — EVALUATE AND ITERATE (Ongoing) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ✦ Build an offline eval suite (10–20 representative tasks) ✦ Run eval in CI on every harness config change ✦ Track: completion rate, cost/task, iteration count, error rate ✦ Review HITL logs weekly — tune thresholds based on outcomes ✦ Scan harness config for security issues (ecc-agentshield) ✦ Update CLAUDE.md/AGENTS.md as project evolves
Common Failure Modes Avoid
  • Starting with orchestration before feedback loop is solid
  • Treating CLAUDE.md as documentation, not a contract
  • Giving the same agent all MCP servers for all tasks
  • HITL gates that always auto-approve due to short timeouts
  • No eval suite — can't tell if harness changes help or hurt
  • Measuring only task success rate, not cost efficiency
Success Indicators Target
  • Task completion rate >85% without human intervention
  • Agent self-corrects test failures in <3 iterations
  • No secret or PII ever appears in agent output
  • Cost per task stable within ±20% across model upgrades
  • Full trace replay possible for any failed session
  • Harness config checked into source control, reviewed in PR
Reference implementations and resources: NIST AI Agent Standards Initiative (Feb 2026) · OWASP LLM Top 10 (2025) — especially LLM01 Prompt Injection and LLM06 Excessive Agency · IEEE CAI 2026 — Engineering Trustworthy Multi-Agent Systems · everything-claude-code (Anthropic Hackathon Winner, 140k⭐) — battle-tested hooks, skills, MCP configs · ecc-agentshield — harness security scanner · AWS AgentCore Samples — full production harness reference across multiple frameworks