Field Handbook · Production AI Systems

Harness Engineering

"Prompt engineering was 2023. Harness engineering is 2026 — and by July it's the fourth paradigm, sitting under loop and context engineering in the same stack."

The complete operational guide to building reliable AI systems — covering tool orchestration, multi-layer guardrails, error recovery, feedback loops, observability, and human oversight. Goes beyond individual model calls to the production infrastructure that makes AI agents dependable. Updated for the July 2026 harness stack: Inner vs. Outer Harness, Claude Opus 4.8 and Sonnet 5, native /goal and /checkup in Claude Code, and the model-abstraction lessons of the Fable 5 / Mythos 5 export-control episode.

Tool Orchestration · Multi-Agent · Observability · Guardrails · Error Recovery · Human-in-Loop

What Is Harness Engineering

// DEFINITION AND SCOPE

Harness engineering is the discipline of designing the execution environment, tooling infrastructure, and orchestration logic that surrounds a language model. It is everything that is not the model itself — context assembly, tool dispatch, memory management, guardrails, retry logic, state tracking, output validation, and observability.

The harness converts a probabilistic text generator into a reliable, auditable, goal-directed system. Without a harness, an LLM is a demo. With a production-grade harness, the same model becomes infrastructure.

✕ Without a Harness

Model calls are stateless — no continuity across turns
Errors surface as hallucinated "success" responses
Tools invoked with no permission scoping
No retry, no fallback, no circuit breaking
Unlimited token burn — no cost guardrails
Invisible — no tracing, no audit, no replay
Human oversight requires manual monitoring

✓ With a Production Harness

Structured state handoffs between context windows
Deterministic validation of every output
Tools scoped to minimum required permissions
Retry with backoff, fallback strategies, circuit breakers
Token budgets, rate limits, cost alerting
Full distributed trace: every decision auditable
Policy-defined human approval gates

▸

Key insight, still holding in mid-2026: Improving the harness on the same model consistently outperforms switching to a more capable model on real production workloads. The scaffolding is the system — the model is just a component. The same Claude Opus 4.8 running inside different harnesses produces dramatically different reliability profiles.

Inner Harness vs. Outer Harness

By mid-2026 the field settled on a sharper distinction than "harness vs. model." Frontier labs build the Inner Harness — the safety layers, native tool-calling, context management, and compaction baked directly into the model runtime (what ships inside Claude Code or the Claude Agent SDK). The durable engineering moat for everyone else is the Outer Harness: the custom configuration, environment routing, evals, and situational guardrails a platform team builds on top to map a raw model onto a real business workflow. The model proposes; the harness decides what is actually allowed to happen.

Inner Harness (Lab-Built)

Native tool-calling, permission model, context compaction, hooks, sandboxing, and session persistence shipped by the model provider — e.g. Claude Code's deny-first permission system and multi-stage context pipeline.

Outer Harness (You Build)

Your CLAUDE.md/AGENTS.md contracts, custom hooks, MCP scoping, eval suites, and escalation policy. This is where competitive advantage and model-swap flexibility actually live — commoditize the model, harden the harness.

Why It Matters in 2026

// THE SHIFT FROM PROMPT TO SYSTEM

Three converging forces have made harness engineering the dominant discipline for production AI teams in 2026: models are reasoning-capable by default (reducing prompt ROI), agents are now taking consequential real-world actions (raising the cost of failure), and multi-agent topologies mean errors compound across parallel execution paths. By May 2026, industry press was calling it the "fourth paradigm" of AI engineering, after prompt, context, and loop engineering.

Prompt ROI Collapse (Driver)

Marginal returns on prompt optimization have flattened as frontier models reason by default. The leverage has moved entirely to execution infrastructure — context assembly, tool selection, and output validation.

Consequential Actions (Driver)

Agents now write to databases, push to production branches, send emails, submit PRs, and provision cloud resources. Errors are no longer correctable by re-reading the chat. A harness-less agent with shell access is an uncontrolled blast radius.

Multi-Agent Compounding (Driver)

Ten agents running in parallel, each making small errors, create cascading failures that are nearly impossible to debug post hoc. The harness provides the isolation, state boundaries, and inter-agent contracts that prevent compound failure.

Convergence, Not Capability (Data)

Harness's State of Engineering Excellence 2026 report (May 2026, 700 practitioners) found teams report record AI productivity gains while lacking instruments to verify them — 89% trust their metrics, yet 94% say tech debt, validation time, and burnout are missing from those same numbers.

Capability Horizon (Data)

METR estimated (Feb 2026) that Claude Opus 4.6 had a 50%-success time horizon of roughly 14.5 hours on software tasks, meaning it completes about half of the tasks that take a human that long. That is nearly two full workdays of autonomous execution, which raises the stakes on every guardrail decision.

Deployment Gap (Data)

The Cloud Security Alliance estimated 88% of enterprise agent projects still stall before reaching production in 2026 — precisely the gap that disciplined harness engineering, not bigger models, is closing.

⚠

OWASP LLM06:2025 — Excessive Agency (still current through 2026): The most dangerous harness anti-pattern is over-provisioning. An agent given filesystem write access, network egress, and process execution rights — when it only needs to run tests — is an amplified attack surface. OWASP classifies unnecessary permissions as a top-tier LLM risk. The harness is the enforcement layer that limits what an agent can actually do versus what it thinks it can do.

The Three-Layer Model

// ARCHITECTURE OVERVIEW

Every production AI harness — regardless of the model or framework — is composed of three concentric layers. Most teams in 2025 only built Layer 1. Teams shipping reliable production AI in 2026 engineer all three deliberately.

💬

Layer 1: Model Interface

Prompt construction, context assembly, response parsing, token management

⚙️

Layer 2: Runtime Environment

Tool definitions, memory stores, input validation, output guardrails, context window management

🌐

Layer 3: Orchestration

Agent loops, task decomposition, conditional branching, human approval gates, parallel execution, state handoffs

Agent Execution Loop (Plan–Execute–Verify)

Input

Task + Context

→

Plan

Decompose + Select Tools

→

Guardrail

Input Validation

→

Execute

Tool Calls

→

Verify

Output Validation

→

Feedback

Pass / Fix / Escalate

Tool Orchestration

// CAPABILITY REGISTRY, DISPATCH, AND PERMISSIONS

Tools give an agent capabilities beyond text generation — web search, code execution, database queries, file operations, API calls. The harness is responsible for defining the tool registry, enforcing permission scopes, handling tool failures, and composing multi-tool workflows into coherent agent actions.

▸

Start with 3–5 well-defined tools. A lean, well-scoped tool registry outperforms a broad, loosely defined one. Each tool should have: a clear natural-language description for model selection, typed input/output schema, explicit permission scope, timeout budget, and a documented fallback behavior.

Tool Registry Design

Python — Tool Registry with Permissions

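from dataclasses import dataclass
from enum import Enum
from typing import Callable

# Assumed to be defined elsewhere in the harness: handle_read_file,
# handle_run_tests, handle_write_file, handle_delete_file.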

class PermissionLevel(Enum):
    READ_ONLY  = "read_only"    # no side effects
    WRITE      = "write"         # modifies state, reversible
    DESTRUCTIVE = "destructive"  # requires human approval

@dataclass
class HarnessTool:
    name:        str
    description: str
    schema:      dict          # JSON Schema for inputs
    handler:     Callable
    permission:  PermissionLevel
    timeout_s:   int = 30
    max_retries: int = 3

# Tool registry — define capabilities explicitly
TOOL_REGISTRY: dict[str, HarnessTool] = {
    "read_file": HarnessTool(
        name="read_file",
        description="Read the contents of a file. Only files under ./src/ are accessible.",
        schema={
            "type": "object",
            "properties": {
                "path": {"type": "string", "pattern": "^\\./src/"}
            },
            "required": ["path"]
        },
        handler=handle_read_file,
        permission=PermissionLevel.READ_ONLY,
    ),
    "run_tests": HarnessTool(
        name="run_tests",
        description="Execute the test suite. Returns exit code, stdout, and stderr.",
        schema={"type": "object", "properties": {
            "test_path": {"type": "string", "default": "tests/"}
        }},
        handler=handle_run_tests,
        permission=PermissionLevel.READ_ONLY,
        timeout_s=120,
    ),
    "write_file": HarnessTool(
        name="write_file",
        description="Write content to a file. Restricted to ./src/. Creates backup before writing.",
        schema={
            "type": "object",
            "properties": {
                "path":    {"type": "string", "pattern": "^\\./src/"},
                "content": {"type": "string"}
            },
            "required": ["path", "content"]
        },
        handler=handle_write_file,
        permission=PermissionLevel.WRITE,
    ),
    "delete_file": HarnessTool(
        name="delete_file",
        description="Delete a file. REQUIRES human approval before execution.",
        schema={
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"]   # a delete with no path must fail validation
        },
        handler=handle_delete_file,
        permission=PermissionLevel.DESTRUCTIVE,  # always gate to human
    ),
}
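
The registry declares a JSON Schema per tool, but nothing shown here checks a model-supplied input against it before the handler runs. A prefix pattern such as ^\./src/ also still admits a path like ./src/../.env. The sketch below is one possible pre-dispatch check using only the standard library. The names validate_tool_input, is_inside_root, and ALLOWED_ROOT are hypothetical, and it covers only the schema keywords the registry uses; a full JSON Schema validator would handle more.

```python
import os
import re

ALLOWED_ROOT = "./src"  # assumption: mirrors the "^\\./src/" pattern in the registry


def validate_tool_input(schema: dict, inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is acceptable.

    Handles only "required", "type": "string", and "pattern".
    """
    problems = []
    for key in schema.get("required", []):
        if key not in inputs:
            problems.append(f"missing required field: {key}")

    for key, value in inputs.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected field: {key}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key}: expected string")
            continue
        pattern = spec.get("pattern")
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            problems.append(f"{key}: does not match {pattern}")
    return problems


def is_inside_root(path: str, root: str = ALLOWED_ROOT) -> bool:
    """Reject traversal such as ./src/../.env, which a prefix regex lets through."""
    root_abs = os.path.abspath(root)
    path_abs = os.path.abspath(path)
    return os.path.commonpath([root_abs, path_abs]) == root_abs
```

A dispatcher could call both checks before invoking a handler and return any problems to the model as a structured tool error, so that a malformed call becomes a re-plan rather than an exception.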

Multi-Tool Workflow: Write–Test–Fix Loop

Python — Harness Execution with Tool Dispatch

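import anthropic, json, time
# `harness` is this project's own package; every helper used below is assumed to live there.
from harness import (
    TOOL_REGISTRY, PermissionLevel, HarnessError, require_human_approval,
    t_to_anthropic_schema, extract_text, tool_error, run_with_timeout,
)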

client = anthropic.Anthropic()

def run_agent_loop(task: str, max_iterations: int = 20) -> str:
    """
    Bounded Plan–Execute–Verify loop with tool dispatch.
    Exits on: task completion, max iterations, or unrecoverable error.
    """
    messages = [{"role": "user", "content": task}]
    tool_defs = [t_to_anthropic_schema(t) for t in TOOL_REGISTRY.values()]
    iteration  = 0

    while iteration < max_iterations:
        iteration += 1
        # ── Model call ──────────────────────────────────────────
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=8192,
            tools=tool_defs,
            messages=messages,
        )

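        # ── Stop conditions ─────────────────────────────────────
        if response.stop_reason == "end_turn":
            return extract_text(response)          # task complete
        if response.stop_reason != "tool_use":
            # e.g. "max_tokens": there are no tool calls to answer, and sending
            # an empty tool_results list back would be rejected by the API.
            raise HarnessError(f"Unexpected stop_reason: {response.stop_reason}")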

        # ── Tool use ────────────────────────────────────────────
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue

            tool = TOOL_REGISTRY.get(block.name)
            if not tool:
                tool_results.append(tool_error(block.id, f"Unknown tool: {block.name}"))
                continue

            # ── Permission gate ─────────────────────────────────────
            if tool.permission == PermissionLevel.DESTRUCTIVE:
                approved = require_human_approval(block.name, block.input)
                if not approved:
                    tool_results.append(tool_error(block.id, "Action rejected by operator"))
                    continue

            # ── Execute with timeout + retry ────────────────────────
            result = execute_with_retry(tool, block.input)
            tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result})

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user",      "content": tool_results})

    raise HarnessError(f"Max iterations ({max_iterations}) reached without completion")


def execute_with_retry(tool, inputs, attempt=0):
    try:
        result = run_with_timeout(tool.handler, inputs, tool.timeout_s)
        return json.dumps(result)
    except TimeoutError:
        return json.dumps({"error": "timeout", "tool": tool.name, "timeout_s": tool.timeout_s})
    except Exception as e:
        if attempt < tool.max_retries:
            time.sleep(2 ** attempt)              # exponential backoff
            return execute_with_retry(tool, inputs, attempt + 1)
        return json.dumps({"error": str(e), "final_attempt": True})
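
The loop above calls tool_error and run_with_timeout without showing them. The following is a minimal sketch of what they might look like, not the handbook's own implementation. It assumes handlers take their inputs as keyword arguments, which the source does not state. The is_error field on a tool_result block is part of the Anthropic Messages API.

```python
import concurrent.futures


def tool_error(tool_use_id: str, message: str) -> dict:
    """Build a tool_result block flagged as an error for the Messages API."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": message,
        "is_error": True,
    }


def run_with_timeout(handler, inputs: dict, timeout_s: float):
    """Run handler(**inputs) in a worker thread; raise TimeoutError on overrun.

    Caveat: Python cannot kill a running thread, so an overrunning handler
    keeps going in the background. Use a subprocess for hard cancellation.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler, **inputs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Normalise: before Python 3.11 this is not the builtin TimeoutError.
        raise TimeoutError(f"handler exceeded {timeout_s}s") from None
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck worker
```

Because the worker thread is not cancelled, this timeout bounds how long the agent loop waits, not how long the side effect runs. For WRITE or DESTRUCTIVE tools that distinction matters.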

Guardrails & Safety

// MULTI-LAYER BEHAVIORAL ENFORCEMENT

Guardrails are rules enforced by the harness that prevent agents from taking harmful, unauthorized, or out-of-scope actions. Unlike model-level safety, which is stochastic, harness guardrails are deterministic and composable. They operate at five distinct intercept points: before input reaches the model, during tool invocation, on output before delivery, at the session boundary, and at the infrastructure level. The six rail types below are grouped by what they protect, so a single rail may act at more than one intercept point.

🔍

Input Rails

Prompt injection detection, content classification, scope validation

⛓️

Dialog Rails

Conversation flow control, topic boundaries, CoT steering

🔧

Execution Rails

Tool allow/deny lists, parameter validation, sandbox boundaries

📤

Output Rails

Schema validation, linting, test gate, PII scrubbing

💰

Cost Rails

Token budgets, rate limits, per-session cost caps

📁

Data Rails

File path allowlists, secret detection, data classification gates

▸

NVIDIA NeMo Guardrails pattern: Define rails using the Colang DSL at each intercept layer. The execution rail layer specifically governs what tools the LLM can invoke and what their inputs/outputs may contain — the reference for behavioral-level enforcement when static allow/deny lists are insufficient. More constraints yield more reliability, not less.
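
Of the rails listed above, the cost rail is the simplest to make fully deterministic. The sketch below shows one way to enforce per-session token and spend caps. SessionBudget and BudgetExceeded are hypothetical names, and pricing is passed in by the caller rather than hardcoded, since rates vary by model and change over time.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a session crosses one of its hard caps."""


@dataclass
class SessionBudget:
    max_tokens: int          # hard cap on input + output tokens per session
    max_cost_usd: float      # hard cap on spend per session
    usd_per_m_input: float   # assumption: caller supplies current pricing
    usd_per_m_output: float
    tokens_used: int = 0
    cost_usd: float = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one model call, then fail closed if a cap is crossed."""
        self.tokens_used += input_tokens + output_tokens
        self.cost_usd += (input_tokens * self.usd_per_m_input
                          + output_tokens * self.usd_per_m_output) / 1_000_000
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap hit: {self.tokens_used}/{self.max_tokens}")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost cap hit: ${self.cost_usd:.4f}")
```

Calling charge() after every model response, using the usage counts the API returns, turns a runaway loop into a hard stop that the harness can checkpoint and report.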

Guardrail Implementation: CLAUDE.md / AGENTS.md

Markdown — CLAUDE.md Harness Config (Claude Code)

# CLAUDE.md — Project Harness Configuration
# Loaded automatically at session start by Claude Code

## Project Context
This is a TypeScript Node.js API service. Tests use Vitest. Deployment via GitHub Actions.

## Allowed Actions
- Read and modify files under ./src/ and ./tests/
- Run: npm test, npm run lint, npm run typecheck
- Create new files following the naming convention: kebab-case.ts

## Prohibited Actions
- NEVER modify .env, .env.production, or any secrets file
- NEVER run npm publish, git push, or deployment scripts
- NEVER install new packages without confirming with the user first
- NEVER delete files — move to ./trash/ directory instead
- NEVER expose API keys, tokens, or credentials in any output

## Required Verification Steps
After any code change, you MUST:
1. Run npm run typecheck — zero errors required
2. Run npm test — all tests must pass
3. Run npm run lint — zero warnings for new code

## Output Format
- Always explain the change made and why
- List files modified
- Show test results summary
- Flag any remaining TODOs

## Escalation Triggers
Pause and ask the user before proceeding if:
- Any test failure that you cannot fix in 2 attempts
- A change affects more than 5 files
- You encounter an architectural decision not covered here

Output Validation Gate

Python — Multi-Layer Output Guardrail

import subprocess, re
from dataclasses import dataclass

@dataclass
class ValidationResult:
    passed:  bool
    errors:  list[str]
    warnings: list[str]

def validate_code_output(code: str, file_path: str) -> ValidationResult:
    """
    Multi-layer output validation before accepting agent-written code.
    Each layer is deterministic — not reliant on the model.
    """
    errors, warnings = [], []

    # Layer 1: Secret detection (never commit secrets)
    SECRET_PATTERNS = [
        r'sk-[A-Za-z0-9_-]{32,}',         # OpenAI / Anthropic-style keys (allows sk-ant-... hyphens)
        r'AKIA[A-Z0-9]{16}',               # AWS access key
        r'ghp_[A-Za-z0-9]{36}',            # GitHub PAT
        r'password\s*=\s*["\'][^"\']{8,}', # hardcoded password
    ]
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, code, re.IGNORECASE):
            errors.append(f"SECRET_DETECTED: pattern '{pattern}' found in output")

    # Layer 2: Static analysis (TypeScript example)
    if file_path.endswith('.ts'):
        r = subprocess.run(['npx', 'tsc', '--noEmit', file_path], capture_output=True, text=True)
        if r.returncode != 0:
            errors.append(f"TYPE_ERROR: {r.stdout[:500]}")

    # Layer 3: Lint
    r = subprocess.run(['npx', 'eslint', file_path, '--format=json'], capture_output=True, text=True)
    if r.returncode != 0:
        warnings.append(f"LINT_WARNING: {r.stdout[:300]}")

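    # Layer 4: Unit tests (if applicable)
    try:
        r = subprocess.run(['npm', 'test', '--', '--run'], capture_output=True, text=True, timeout=60)
        if r.returncode != 0:
            errors.append(f"TEST_FAILURE:\n{r.stdout[-800:]}")
    except subprocess.TimeoutExpired:
        # A hung test suite is a validation failure, not a harness crash.
        errors.append("TEST_TIMEOUT: test suite exceeded 60s")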

    return ValidationResult(passed=len(errors)==0, errors=errors, warnings=warnings)

Error Recovery

// RETRY STRATEGIES, FALLBACKS, AND CIRCUIT BREAKERS

Agent failures fall into three broad categories: transient (network timeout, rate limit), recoverable (test failure, type error, tool validation error), and unrecoverable (stuck in a loop, contradictory task requirements, permission denied). Each demands a different recovery strategy, so the harness must tell them apart rather than retry uniformly. The table below splits these categories into six concrete error types.

Error Type	Examples	Recovery Strategy	Max Attempts
Transient	API timeout, 429 rate limit, network flap	Exponential backoff with jitter	3 with backoff
Tool Failure	Tool returns error, invalid output schema	Return structured error to model; allow re-plan	Model decides
Validation Failure	Tests fail, typecheck fails, lint errors	Pass error output back to model as feedback	2–3 fix attempts
Loop Detection	Same tool called 3x with same inputs	Break loop, surface to human checkpoint	1 detection → escalate
Budget Exceeded	Token limit, cost cap, iteration cap hit	Checkpoint state, pause, notify operator	Hard stop
Unrecoverable	Conflicting requirements, missing credentials	Escalate to human with diagnosis	Immediate
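
The table's first row calls for exponential backoff with jitter, applied only to transient errors. A minimal sketch of that policy follows. It uses the "full jitter" variant (a uniform random delay up to the exponential bound), which is one common choice and not something the table prescribes. TransientError is a hypothetical marker class that the harness would raise for timeouts and 429 responses.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429)."""


def backoff_delays(max_retries: int = 3, base_s: float = 1.0, cap_s: float = 30.0):
    """Yield one full-jitter delay per retry: uniform in [0, min(cap, base * 2**n)]."""
    for attempt in range(max_retries):
        yield random.uniform(0, min(cap_s, base_s * 2 ** attempt))


def call_with_backoff(fn, max_retries: int = 3, sleep=time.sleep):
    """Call fn(); retry only on TransientError, up to max_retries times.

    Any other exception propagates immediately, so validation failures and
    permission errors are never retried blindly.
    """
    for delay in backoff_delays(max_retries):
        try:
            return fn()
        except TransientError:
            sleep(delay)
    return fn()  # final attempt: let any exception reach the caller
```

The sleep parameter exists so tests can inject a no-op. Jitter matters most when many agents share one rate limit, because without it they all retry at the same instant.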

Python — Circuit Breaker + Loop Detection

from collections import Counter, defaultdict
from datetime import datetime

class HarnessCircuitBreaker:
    """
    Detects and breaks pathological execution patterns before they
    exhaust tokens, loop infinitely, or cause runaway tool calls.
    """
    def __init__(self):
        self.tool_call_history = []     # (tool_name, input_hash, timestamp)
        self.iteration_count   = 0
        self.fix_attempts      = defaultdict(int)  # tool -> consecutive failures

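    def record_tool_call(self, tool_name: str, inputs: dict) -> None:
        # Sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        input_key = hash(repr(sorted(inputs.items())))
        self.tool_call_history.append((tool_name, input_key, datetime.now()))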

    def check_for_loops(self) -> tuple[bool, str]:
        """Detect identical tool+input pairs in recent history."""
        recent = self.tool_call_history[-6:]  # last 6 calls
        call_counts = Counter((name, h) for name, h, _ in recent)
        for (tool, h), count in call_counts.items():
            if count >= 3:
                return True, f"Loop detected: '{tool}' called {count}x with identical inputs"
        return False, ""

    def record_fix_failure(self, tool: str) -> bool:
        """Returns True if fix attempts exhausted — escalate to human."""
        self.fix_attempts[tool] += 1
        return self.fix_attempts[tool] >= 3

    def reset_fix_counter(self, tool: str) -> None:
        self.fix_attempts[tool] = 0   # success — reset counter


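# ─── Recovery message injected back to model ─────────────────────────────
RECOVERY_PROMPT_TEMPLATE = """
The previous action failed. Here is the structured error:

ERROR TYPE: {error_type}
ERROR DETAIL:
{error_detail}

Attempt {attempt} of {max_attempts}.
{final_notice}

Diagnose the root cause from the error output above and try a different approach.
"""

FINAL_NOTICE = ("This is your FINAL attempt. If you cannot fix this, "
                "respond with a line starting with ESCALATE:")

def build_recovery_prompt(error_type: str, error_detail: str,
                          attempt: int, max_attempts: int) -> str:
    # str.format() cannot evaluate a conditional inside the braces, so the
    # final-attempt notice is chosen here and passed in as a plain field.
    return RECOVERY_PROMPT_TEMPLATE.format(
        error_type=error_type, error_detail=error_detail,
        attempt=attempt, max_attempts=max_attempts,
        final_notice=FINAL_NOTICE if attempt >= max_attempts else "",
    )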

Feedback Loops

// WRITE–TEST–FIX, ONLINE EVAL, CONTINUOUS LEARNING

Feedback loops are the mechanism by which agent output drives the next agent action. The simplest and most effective feedback loop for coding agents is the write–test–fix cycle: the agent writes code, the harness runs tests, failure output is fed back as the next input. This tight loop converts test failures into self-correction signals without human involvement.

Write

Agent modifies code

Validate

Lint + typecheck (sync)

Test

Run test suite

Evaluate

Pass / Fail / Partial

Feed Back

Inject error context

Fix or Escalate

Retry or → Human
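
The cycle above can be sketched as a small driver that is independent of any particular model or test runner. This is an illustration under stated assumptions: write_test_fix, write, and validate are hypothetical names, the agent step and the validation step are passed in as callables, and the default of three attempts is illustrative.

```python
from typing import Callable


def write_test_fix(
    write: Callable[[str], None],        # agent step: receives feedback, edits code
    validate: Callable[[], list[str]],   # harness step: error strings, [] on pass
    max_attempts: int = 3,
) -> tuple[str, int]:
    """Drive the cycle until validation passes or attempts run out.

    Returns ("pass", attempts_used) or ("escalate", attempts_used).
    """
    feedback = ""  # the first attempt has no prior errors to react to
    for attempt in range(1, max_attempts + 1):
        write(feedback)
        errors = validate()
        if not errors:
            return "pass", attempt
        # Feed the exact error text back, not a summary of it.
        feedback = "\n".join(errors)
    return "escalate", max_attempts
```

Keeping the driver free of model and tooling details makes it easy to unit-test the escalation logic with stub callables before wiring in a real agent.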

Structured Feedback Injection

Python — Feedback Context Assembly

def build_feedback_context(validation: ValidationResult, attempt: int) -> str:
    """
    Transforms raw validation output into structured feedback the model
    can act on. Critical: include the exact error, not a summary of it.
    Models fix errors they can see; they hallucinate fixes for errors they can't.
    """
    lines = [f"=== VALIDATION FEEDBACK (attempt {attempt}) ==="]

    if validation.errors:
        lines.append("\n🔴 ERRORS (must fix before proceeding):")
        for err in validation.errors:
            lines.append(f"  • {err}")

    if validation.warnings:
        lines.append("\n🟡 WARNINGS (fix if possible):")
        for w in validation.warnings:
            lines.append(f"  • {w}")

    lines.append("\nRequired action: Fix ALL errors above. Re-run validation after changes.")
    lines.append(f"Attempts remaining: {3 - attempt}")
    return "\n".join(lines)


# ─── Offline eval loop (run after session, improve harness) ──────────────
def run_offline_eval(harness_config: dict, eval_suite: list[dict]) -> dict:
    """
    Replay agent tasks against a golden eval suite to measure harness quality.
    Key metrics: task completion rate, iteration count, token cost, error rate.
    Run in CI to catch harness regressions before production deployment.
    """
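    # Only pass/fail is computed here. Token and iteration metrics would need
    # run_agent_loop to return usage data, which it does not do as written.
    results = {"passed": 0, "failed": 0}
    for case in eval_suite:
        try:
            outcome = run_agent_loop(case["task"], max_iterations=case.get("max_iter", 15))
            passed  = evaluate_outcome(outcome, case["expected"])
        except HarnessError:
            passed = False   # hitting the iteration cap counts as a failed case
        results["passed" if passed else "failed"] += 1
    total = results["passed"] + results["failed"]
    results["completion_rate"] = results["passed"] / total if total else 0.0
    return results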

✓

Write–test–fix loop quality: Agent frameworks reporting 80–90%+ task completion on SWE-bench benchmarks all implement a tight write–test–fix feedback loop. The loop is more valuable than model size. A Claude Sonnet in a well-instrumented feedback harness consistently outperforms Opus running without one on multi-step coding tasks.

Observability

// TRACES, METRICS, LOGS, AND COST ATTRIBUTION

You cannot improve what you cannot observe. AI harness observability requires three signal types: distributed traces (for step-by-step action reconstruction), structured metrics (for quantitative performance tracking), and cost attribution (for token and compute accountability). Unlike traditional software, AI observability must also capture reasoning transparency — the decisions the model made and why.

Distributed Traces

Capture every model call, tool invocation, and validation step as a span in a trace. OpenTelemetry is the standard. Each span should record: model, input tokens, output tokens, duration, tool name, inputs, outputs, and the parent span that triggered it.

Session Metrics

Track per-session: total iterations, tool calls by type, error/retry counts, total token cost, task completion status, human intervention count, and time-to-completion. These drive harness tuning over time.

Cost Attribution

Log input tokens, output tokens, model tier, and compute duration for every call. Attribute cost to task, session, user, and team. Cost spikes are the earliest signal of runaway loops or harness misconfiguration.

Structured Session Logging

Python — OpenTelemetry Harness Instrumentation

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode
import json, time

tracer = trace.get_tracer("ai.harness")

class HarnessObserver:
    """
    Wraps every harness action in an OTel span.
    Integrates with Jaeger, Honeycomb, Datadog, or any OTel backend.
    """

    def trace_model_call(self, session_id: str, messages: list, response) -> None:
        with tracer.start_as_current_span("llm.call") as span:
            span.set_attribute("session.id",          session_id)
            span.set_attribute("llm.model",           response.model)
            span.set_attribute("llm.input_tokens",    response.usage.input_tokens)
            span.set_attribute("llm.output_tokens",   response.usage.output_tokens)
            span.set_attribute("llm.stop_reason",     response.stop_reason)
            span.set_attribute("llm.tool_use",        response.stop_reason == "tool_use")
            # Cost attribution: input=$3/M, output=$15/M for Sonnet
            cost_usd = (response.usage.input_tokens * 3 +
                        response.usage.output_tokens * 15) / 1_000_000
            span.set_attribute("llm.cost_usd", cost_usd)

    def trace_tool_call(self, tool_name: str, inputs: dict, result: str,
                        duration_ms: float, error: str | None = None) -> None:
        with tracer.start_as_current_span(f"tool.{tool_name}") as span:
            span.set_attribute("tool.name",        tool_name)
            span.set_attribute("tool.input_size",  len(json.dumps(inputs)))
            span.set_attribute("tool.duration_ms", duration_ms)
            span.set_attribute("tool.success",     error is None)
            if error:
                span.set_status(Status(StatusCode.ERROR, error))
                span.record_exception(Exception(error))

    def emit_session_summary(self, session: dict) -> None:
        """Structured log line — queryable in any log aggregation platform."""
        log_line = {
            "event":          "harness.session.complete",
            "session_id":     session["id"],
            "task":           session["task"][:120],
            "completed":      session["completed"],
            "iterations":     session["iterations"],
            "tool_calls":     session["tool_calls"],
            "errors":         session["error_count"],
            "hitl_events":    session["human_interventions"],
            "total_tokens":   session["input_tokens"] + session["output_tokens"],
            "cost_usd":       round(session["cost_usd"], 4),
            "duration_s":     session["duration_s"],
            "timestamp":      time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        }
        print(json.dumps(log_line))  # structured → Splunk / Datadog / CloudWatch

Key Metrics Dashboard: What to Track

Metric	Target	Alert Threshold	Diagnosis
Task Completion Rate	>85%	<70%	Harness guardrails too restrictive, or model+task mismatch
Avg Iterations / Task	<8	>15	Feedback loop not informative; tool errors not being surfaced properly
Human Intervention Rate	<15%	>40%	Escalation thresholds miscalibrated or task scope too ambiguous
Tool Error Rate	<5%	>20%	Tool schema unclear; model misusing tools; infrastructure instability
Cost / Task (USD)	Baseline ±20%	>3× baseline	Loop detection failure; context not being compacted; model over-calling tools
P99 Latency	<60s / iteration	>120s	Tool timeouts; upstream API degradation; context window size issues
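
The alert thresholds in the table can be encoded directly so that a dashboard job or CI check evaluates them mechanically. The sketch below assumes rates are reported as fractions and cost as a ratio to baseline. The metric key names, ALERT_RULES, and breached_alerts are hypothetical.

```python
# Alert thresholds copied from the table above; key names are hypothetical.
ALERT_RULES = {
    "task_completion_rate":    ("below", 0.70),
    "avg_iterations_per_task": ("above", 15),
    "human_intervention_rate": ("above", 0.40),
    "tool_error_rate":         ("above", 0.20),
    "cost_ratio_to_baseline":  ("above", 3.0),
    "p99_latency_s":           ("above", 120),
}


def breached_alerts(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that cross their alert threshold."""
    breached = []
    for name, (direction, limit) in ALERT_RULES.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported this window; skip rather than guess
        if direction == "below" and value < limit:
            breached.append(name)
        elif direction == "above" and value > limit:
            breached.append(name)
    return breached
```

Each breached name maps back to the Diagnosis column, which gives the first place to look.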

Human-in-the-Loop

// APPROVAL GATES, ESCALATION POLICIES, AND BREAK-GLASS

Human-in-the-loop (HITL) is not a safety afterthought — it is an architectural primitive. The harness defines where and when human judgment is injected into the agent loop. The goal is not maximum oversight but right-placed oversight: humans at the decisions where autonomy risk is highest, automation everywhere else.

Mandatory HITL Gates (Always)

Destructive operations (delete, drop table, terminate instance)
External communications (send email, post to Slack, submit PR)
Credential or secret access beyond scoped tools
Changes to CI/CD pipelines or deployment configurations
Any action affecting more than N files (configurable)
Loop detection trigger — model stuck in retry cycle
Unrecoverable errors after max fix attempts

Configurable HITL Thresholds (Tunable)

File change scope: ask if modifying >5 files
Cost gate: pause if session exceeds $X
Architectural decisions outside CLAUDE.md scope
Package installation / dependency changes
Test coverage drops below threshold
New external network connections introduced
Off-hours execution of sensitive operations
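
One way to make the split between mandatory gates and tunable thresholds concrete is a policy object that returns the reasons an action needs approval. This is a sketch, not a prescribed design. ProposedAction and HITLPolicy are hypothetical names, the action kinds are drawn from the lists above, and the default cost cap is an illustrative placeholder for the "$X" in the cost gate.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    kind: str                      # e.g. "delete", "send_email", "write_file"
    files_touched: int = 0
    session_cost_usd: float = 0.0


@dataclass
class HITLPolicy:
    # Mandatory gates: always require approval, not tunable per project.
    always_gate: frozenset = frozenset(
        {"delete", "drop_table", "terminate_instance", "send_email", "submit_pr"}
    )
    # Tunable thresholds; the defaults here are illustrative assumptions.
    max_files: int = 5
    max_cost_usd: float = 10.0

    def reasons_to_pause(self, action: ProposedAction) -> list[str]:
        """Return every reason a human should approve; [] means proceed."""
        reasons = []
        if action.kind in self.always_gate:
            reasons.append(f"mandatory gate: {action.kind}")
        if action.files_touched > self.max_files:
            reasons.append(
                f"touches {action.files_touched} files (limit {self.max_files})"
            )
        if action.session_cost_usd > self.max_cost_usd:
            reasons.append(f"session cost ${action.session_cost_usd:.2f} over cap")
        return reasons
```

Returning reasons rather than a boolean lets the approval request show the operator why the agent paused, which shortens review time.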

HITL Approval Flow Implementation

Python — Human Approval Gate

from enum import Enum
import asyncio

class ApprovalOutcome(Enum):
    APPROVED  = "approved"
    REJECTED  = "rejected"
    MODIFIED  = "modified"   # human edits the proposed action
    ESCALATED = "escalated"  # routed to senior approver

class HITLGate:
    """
    Pauses agent execution, presents action + context to a human operator,
    and resumes (or terminates) based on their decision.
    Supports async approvals via Slack, web UI, or CLI.
    """

    def __init__(self, notification_backend):
        self.backend   = notification_backend  # Slack, PagerDuty, web webhook
        self.pending   = {}                    # request_id → asyncio.Future

    async def request_approval(self,
                               action_type: str,
                               proposed_action: dict,
                               context: str,
                               timeout_s: int = 300) -> ApprovalOutcome:
        request_id = generate_id()             # assumed helper, e.g. uuid.uuid4().hex
        future     = asyncio.get_running_loop().create_future()
        self.pending[request_id] = future

        # Notify operator (non-blocking)
        await self.backend.notify({
            "request_id":     request_id,
            "action_type":    action_type,
            "proposed_action": proposed_action,
            "context":        context[:500],
            "approve_url":    f"https://harness.internal/approve/{request_id}",
            "reject_url":     f"https://harness.internal/reject/{request_id}",
        })

        try:
            # Block agent loop until human responds or timeout
            result = await asyncio.wait_for(future, timeout=timeout_s)
            return result
        except asyncio.TimeoutError:
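            # Timeout policy: fail closed. Every unanswered request is rejected,
            # whatever the action type; add an allow-on-timeout path only for
            # actions you have explicitly classified as safe.
            return ApprovalOutcome.REJECTED  # conservative default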
        finally:
            self.pending.pop(request_id, None)

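    def resolve(self, request_id: str, outcome: ApprovalOutcome) -> None:
        """Called by the approval webhook when human responds."""
        future = self.pending.get(request_id)
        # Ignore late or duplicate clicks: set_result() on a finished future raises.
        if future is not None and not future.done():
            future.set_result(outcome)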


# ─── Example: Slack notification for HITL approval ───────────────────────
SLACK_HITL_BLOCK = {
    "blocks": [
        {"type": "header", "text": {"type": "plain_text", "text": "🤖 Agent Action Requires Approval"}},
        {"type": "section", "fields": [
            {"type": "mrkdwn", "text": "*Action:*\nDelete database migration file"},
            {"type": "mrkdwn", "text": "*Risk Level:*\n🔴 HIGH — irreversible"},
        ]},
        {"type": "actions", "elements": [
            {"type": "button", "text": {"type": "plain_text", "text": "✅ Approve"}, "style": "primary"},
            {"type": "button", "text": {"type": "plain_text", "text": "❌ Reject"},  "style": "danger"},
        ]},
    ]
}

Claude Code Harness

// HOOKS, SKILLS, MCP, AND SESSION MANAGEMENT

Claude Code is Anthropic's terminal-based agentic coding CLI with a built-in harness layer. It exposes three primary harness extension points: hooks (lifecycle callbacks at pre/post-tool execution), CLAUDE.md (session-scoped behavioral constraints), and MCP server integrations (tool capability extensions). Used together, these allow full harness engineering without writing custom agent scaffolding.

Two loop-level primitives now sit natively inside the harness: /goal keeps Claude working across turns until a verifiable completion condition holds, and /checkup (shipped July 2026) audits and self-heals your own harness config — pruning unused skills/MCPs/plugins to save context, de-duplicating CLAUDE.md against the checked-in version, splitting a bloated root file into nested files plus skills, and disabling slow hooks, all with confirmation before it changes anything.

ℹ

Harness portability, stress-tested: Claude Fable 5 and Claude Mythos 5 launched June 9, 2026, were suspended June 12 under a U.S. Department of Commerce export control, and had access restored July 1, 2026 once the control was lifted. Teams whose harness code depended on model IDs rather than an abstraction layer inherited that outage directly; teams that treated the model as a swappable component — routing to Sonnet 5 or Opus 4.8 and back — degraded gracefully. Build the harness to expect model churn, not just model upgrades.
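
One way to treat the model as a swappable component is to route by logical role rather than by model ID. The sketch below is illustrative only: `ModelRouter`, `ModelUnavailable`, and the injected `call_model` function are hypothetical names, not part of any SDK, and the model IDs passed in are whatever your provider currently serves.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class ModelUnavailable(Exception):
    """Raised by a backend when a model ID cannot be served (hypothetical)."""


@dataclass
class ModelRouter:
    # Logical role -> ordered list of concrete model IDs to try.
    routes: Dict[str, List[str]]
    # call_model(model_id, prompt) -> completion text; injected so any SDK can sit behind it.
    call_model: Callable[[str, str], str]
    attempts: List[str] = field(default_factory=list)

    def complete(self, role: str, prompt: str) -> Tuple[str, str]:
        """Return (model_id_used, completion), trying each candidate in order."""
        last_error = None
        for model_id in self.routes[role]:
            self.attempts.append(model_id)  # keep a trail for observability
            try:
                return model_id, self.call_model(model_id, prompt)
            except ModelUnavailable as exc:
                last_error = exc
        raise RuntimeError(f"no model available for role {role!r}") from last_error
```

Harness code then asks for a role such as "coding" and never names a model directly, so a suspension or deprecation becomes a one-line change to the routing table.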

JSON — Claude Code Settings Harness Config

// .claude/settings.json — project-level harness configuration
{
  "model": "claude-sonnet-5",
  "env": {
    "MAX_THINKING_TOKENS":           "10000",   // cap extended thinking
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"      // compact at 50% context, not 95%
  },

  "hooks": {
    // PRE-TOOL: validate all bash commands before execution
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{
          "type":    "command",
          "command": "./harness/validate-command.sh"
          // Exit code 2 = BLOCK the command + return stderr to model
        }]
      }
    ],
    // POST-TOOL: run linting after every file write
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [{
          "type":    "command",
          "command": "./harness/post-write.sh"
          // Runs: lint → typecheck → output to model as feedback
        }]
      }
    ],
    // POST-TURN: log every session turn for observability
    "PostTurn": [{
      "type":    "command",
      "command": "./harness/log-turn.sh"
    }]
  },

  "permissions": {
    "allow": [
      "Bash(npm run *)",      // npm scripts: yes
      "Bash(git status)",     // git read: yes
      "Bash(git diff *)",
      "Read(src/**)",         // src read: yes
      "Write(src/**)"         // src write: yes
    ],
    "deny": [
      "Bash(git push *)",     // no remote push
      "Bash(rm -rf *)",       // no mass delete
      "Bash(npm publish)",    // no publish
      "Bash(curl * | bash)",  // no remote execution
      "Read(.env*)"           // no secrets access
    ]
  }
}

Bash — Pre-Tool Hook: Command Validation

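#!/bin/bash
# harness/validate-command.sh
# Claude Code passes the hook payload via stdin as JSON.
# Exit 0 = allow. Exit 2 = BLOCK + return stderr to the model.
# Any other non-zero exit is a non-blocking error: the command still runs.

INPUT=$(cat)  # JSON from stdin: {"tool_name": "Bash", "tool_input": {"command": "...", "description": "..."}, ...}
CMD=$(printf '%s' "$INPUT" | jq -r '.tool_input.command // empty')

# Fail closed: if the command cannot be read from the payload, block
if [ -z "$CMD" ]; then
  echo "BLOCKED: could not read command from hook payload" >&2
  exit 2
fi

# Block patterns: never allow these regardless of context
# Literal patterns are matched as fixed strings (grep -F)
BLOCKED_LITERAL=(
  "rm -rf /"
  "dd if=/"
  "mkfs"
  "chmod 777"
  ":(){:|:&};:"  # fork bomb
)
# Regex patterns (grep -E): the pipe is escaped so it matches a literal "|"
# instead of acting as alternation
BLOCKED_REGEX=(
  'curl.*\|.*bash'
  'wget.*\|.*sh'
)

block() {
  echo "BLOCKED: Command matches prohibited pattern: $1" >&2
  exit 2  # Exit 2 = message returned to model
}

for pattern in "${BLOCKED_LITERAL[@]}"; do
  printf '%s' "$CMD" | grep -qF -- "$pattern" && block "$pattern"
done
for pattern in "${BLOCKED_REGEX[@]}"; do
  printf '%s' "$CMD" | grep -qE -- "$pattern" && block "$pattern"
done

# Log all commands for audit trail
mkdir -p logs
echo "$(date -u +"%Y-%m-%dT%H:%M:%SZ") CMD: $CMD" >> logs/harness-audit.log

exit 0  # Allow command to proceed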
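
A blocklist like the one in this hook is easier to get right when literal strings and regular expressions are kept apart: in extended regex syntax an unescaped `|` is alternation, so a pattern written as `curl.*|.*bash` matches any command that merely contains `curl` or `bash`. The Python sketch below shows the same check in a unit-testable form. The payload shape (`tool_input.command`) and the meaning of exit code 2 are assumptions to verify against the Claude Code hooks reference; the function names are illustrative.

```python
import json
import re
from typing import Optional

# Fixed strings: matched literally, no regex metacharacters interpreted.
BLOCKED_LITERALS = ["rm -rf /", "dd if=/", "mkfs", "chmod 777", ":(){:|:&};:"]
# Regexes: the pipe is escaped so it means a literal shell pipe, not alternation.
BLOCKED_REGEXES = [re.compile(r"curl.*\|.*bash"), re.compile(r"wget.*\|.*sh")]


def blocked_reason(command: str) -> Optional[str]:
    """Return the matching prohibited pattern, or None if the command is allowed."""
    for literal in BLOCKED_LITERALS:
        if literal in command:
            return literal
    for regex in BLOCKED_REGEXES:
        if regex.search(command):
            return regex.pattern
    return None


def exit_code_for(payload_json: str) -> int:
    """Map a hook payload to an exit code: 0 allows, 2 blocks. Fails closed on bad input."""
    try:
        command = json.loads(payload_json)["tool_input"]["command"]
    except (ValueError, KeyError, TypeError):
        return 2
    if not isinstance(command, str):
        return 2
    return 2 if blocked_reason(command) else 0
```

Substring blocklists are a backstop, not a boundary: they are easy to evade with quoting or whitespace, so they belong alongside the allow/deny permission rules rather than in place of them.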

⚠

MCP server risk: Don't enable all MCP servers at once. Each MCP server expands the agent's tool surface. Run npx ecc-agentshield scan (AgentShield, 2026) to audit your MCP config, hooks, and CLAUDE.md for injection risks, permission over-grants, and secret exposure patterns before production use.

GitHub Copilot & Codex CLI

// HARNESS CONFIGURATION FOR EDITOR-NATIVE AGENTS

GitHub Copilot and OpenAI Codex CLI each provide harness configuration primitives that parallel Claude Code's CLAUDE.md/hooks model. Copilot uses repository instruction files and organization-level policy. Codex uses AGENTS.md and sandbox permissions. Understanding the harness surface of each tool lets you apply the same disciplined engineering regardless of which agent is running.

Dimension	Claude Code	GitHub Copilot	Codex CLI
Config File	CLAUDE.md + settings.json	.github/copilot-instructions.md	AGENTS.md
Hooks / Lifecycle	YES — PreToolUse, PostToolUse, PostTurn	PARTIAL — GitHub Actions triggers	NO — static config only
Permission Scoping	FINE-GRAINED — per-tool allow/deny	MEDIUM — org policy + repo rules	MEDIUM — sandbox profiles
Observability	Custom hooks + OTel integration	GitHub Audit Log + Copilot metrics	AGENTS.md + external logging
HITL	Custom approval gate via hooks	PR review workflow (native)	Approval-required flag in config
MCP Support	YES — full MCP client	YES — via Agent HQ (Feb 2026)	LIMITED — plugin model
Feedback Loop	Built-in + customizable via hooks	CI/CD pipeline feedback	AGENTS.md defines test commands

AGENTS.md — Codex Harness Config

Markdown — AGENTS.md (OpenAI Codex)

# AGENTS.md — OpenAI Codex Harness Configuration
Equivalent to CLAUDE.md for the Codex CLI environment.

## Project
Python FastAPI service. Tests: pytest. Linting: ruff. Type checking: mypy.

## Commands
These are the ONLY commands the agent should run unsupervised:
- Run tests:     pytest tests/ -v --tb=short
- Lint:          ruff check . --fix
- Type check:    mypy src/ --strict
- Format:        ruff format .

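## Sandbox Permissions (network isolation)
Codex runs in a network-isolated sandbox by default. Keep network access disabled
for this project: it prevents exfiltration and forces offline operation.
The restriction itself is enforced by the sandbox configuration, not by this file;
it is stated here so the agent does not attempt network calls.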

## Required Workflow
For EVERY code change, in this order:
1. Make targeted, minimal changes
2. Run mypy — zero new errors
3. Run ruff — auto-fix, then zero remaining issues
4. Run pytest — all existing tests must pass
5. Add tests for new behavior (minimum 1 test per new function)

## Boundaries
- Modify ONLY files under src/ and tests/
- Do NOT modify pyproject.toml or requirements.txt without asking
- Do NOT create new API endpoints without showing the design first
- Do NOT use subprocess in application code

## On Failure
If any check fails after 2 attempts: output the exact error and ask for guidance.
Do NOT try a third fix attempt independently.
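
The "On Failure" rule in this file is an instruction to the model, so a harness that wants the cap to hold deterministically can also enforce it in code. The following is a minimal sketch of such a budget; `CheckResult`, `FixOutcome`, and `run_with_fix_budget` are hypothetical names, and the two callables stand in for whatever runs your checks and asks the agent for a fix.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    error: str = ""


@dataclass
class FixOutcome:
    passed: bool
    attempts: int
    escalation: Optional[str] = None  # exact error text to show a human, if any


def run_with_fix_budget(
    run_checks: Callable[[], CheckResult],
    attempt_fix: Callable[[str], None],
    max_attempts: int = 2,
) -> FixOutcome:
    """Run checks; allow at most max_attempts fixes, then escalate with the exact error."""
    result = run_checks()
    attempts = 0
    while not result.passed and attempts < max_attempts:
        attempt_fix(result.error)  # the agent edits code using the failure text
        attempts += 1
        result = run_checks()
    if result.passed:
        return FixOutcome(True, attempts)
    return FixOutcome(False, attempts, escalation=result.error)
```

The escalation field carries the unmodified error from the last check run, which matches the instruction to output the exact error rather than a summary of it.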

Orchestration Frameworks

// LANGCHAIN, LANGGRAPH, CREWAI, AND BEYOND

Orchestration frameworks provide composable harness primitives — prompt templates, tool definitions, memory abstractions, and agent loop logic — so teams don't build the execution layer from scratch. Choice of framework shapes harness architecture. Pick based on production requirements, not demo simplicity.

Framework	Architecture	Best For	Harness Maturity	Production Uses
LangGraph	Graph / stateful	Complex multi-step workflows with branching and cycles	HIGH	Klarna, Replit, Elastic
LangChain	Chain / pipeline	Rapid prototyping, broad integrations (100k+ stars)	MEDIUM	Widely deployed; rigor varies
CrewAI	Multi-agent crews + Flows	Role-based agent teams; Flows (2026) add event-driven orchestration for structured pipelines	MEDIUM	Enterprise automation pilots
Google ADK	Agent Development Kit	2026 integrations ecosystem (Hugging Face, GitHub, Notion, Daytona) for wiring external services without losing determinism	HIGH	Google Cloud–native enterprise
AWS Bedrock AgentCore	Runtime + gateway	Full production stack: runtime, memory, identity, observability	HIGH	AWS-native enterprise
Custom + Anthropic SDK	Direct API	Maximum control, minimum abstraction overhead	MAX (manual)	High-reliability production systems

▸

LangGraph v1.0 (October 2025): Graph-based stateful orchestration for complex agent workflows. Nodes are agent steps; edges are conditional routing logic; state is a typed dict persisted across the graph. First-class support for human-in-the-loop checkpoints, streaming, and tool-use error recovery. The reference framework for production multi-agent harnesses as of 2026.
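
The node / conditional edge / typed state vocabulary can be illustrated without any framework. The sketch below is plain Python and deliberately does not use the LangGraph API; every name in it (`State`, `run_graph`, `NODES`, `EDGES`) is hypothetical, and the "tests pass on the second attempt" rule is a stand-in for a real verifier.

```python
from typing import Callable, Dict, List, TypedDict


class State(TypedDict):
    attempts: int
    tests_pass: bool
    log: List[str]


END = "__end__"


def write_code(state: State) -> State:
    return {**state, "attempts": state["attempts"] + 1, "log": state["log"] + ["write"]}


def run_tests(state: State) -> State:
    # Stand-in verifier: pretend the second attempt passes.
    return {**state, "tests_pass": state["attempts"] >= 2, "log": state["log"] + ["test"]}


def route_after_tests(state: State) -> str:
    # Conditional edge: loop back to "write" until the tests pass.
    return END if state["tests_pass"] else "write"


NODES: Dict[str, Callable[[State], State]] = {"write": write_code, "test": run_tests}
EDGES: Dict[str, Callable[[State], str]] = {"write": lambda s: "test", "test": route_after_tests}


def run_graph(start: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from start until END; max_steps bounds cycles."""
    current, steps = start, 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("max_steps exceeded")
        state = NODES[current](state)
        current = EDGES[current](state)
        steps += 1
    return state
```

Because each node returns a new state rather than mutating the old one, persisting the state after every step gives a checkpoint that a human gate or a recovery path can resume from.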

MCP & A2A Protocols

// TOOL INTEROPERABILITY AND AGENT-TO-AGENT DELEGATION

The Model Context Protocol (MCP) is the emerging standard for tool interoperability — any MCP-compliant client (Claude Code, Copilot, LangChain) can consume any MCP-compliant tool server without bespoke integration. The Agent-to-Agent (A2A) protocol extends this to multi-agent topologies: orchestrator agents delegate to specialist subagents through a standardized calling convention.

Model Context Protocol (MCP) Standard

Defines the interface between model clients and tool servers. Tool servers expose: capabilities manifest, tool schemas, and a call endpoint. Clients invoke tools by name with validated inputs. The harness governs which MCP servers are loaded per session and what their permission scopes are.

Supported by: Claude Code, Copilot Agent HQ, LangChain, LangGraph, Cursor, and thousands of community servers. The combined Python and TypeScript MCP SDKs surpassed 97 million monthly downloads by March 2026, up from roughly 100,000 at launch in late 2024 — every major AI provider now ships MCP-compatible tooling.

Agent-to-Agent (A2A) 2026

Standardized protocol for orchestrator→subagent delegation. An orchestrator agent can spawn a specialist subagent (code reviewer, security scanner, documentation writer) as a structured API call. A2A ensures: typed task contracts, result schemas, error propagation, and cost attribution across agent boundaries.

Reference: IEEE CAI 2026 Tutorial on Engineering Trustworthy Multi-Agent Systems.
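
To make "typed task contracts, result schemas, error propagation, and cost attribution" concrete, here is a small in-process sketch. It is not the A2A wire format: the dataclasses, field names, and the `Orchestrator` class are all hypothetical and only illustrate the shape of the guarantees.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class TaskContract:
    task_id: str
    skill: str           # e.g. "code_review"
    payload: dict
    max_cost_usd: float  # budget the orchestrator grants to the subagent


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    ok: bool
    output: Optional[dict] = None
    error_type: Optional[str] = None
    cost_usd: float = 0.0


@dataclass
class Orchestrator:
    subagents: Dict[str, Callable[[TaskContract], TaskResult]]
    cost_by_skill: Dict[str, float] = field(default_factory=dict)

    def delegate(self, contract: TaskContract) -> TaskResult:
        handler = self.subagents.get(contract.skill)
        if handler is None:
            return TaskResult(contract.task_id, ok=False, error_type="unknown_skill")
        try:
            result = handler(contract)
        except Exception as exc:  # propagate subagent failures as data, not as a crash
            result = TaskResult(contract.task_id, ok=False, error_type=type(exc).__name__)
        if result.cost_usd > contract.max_cost_usd:
            result = TaskResult(contract.task_id, ok=False,
                                error_type="budget_exceeded", cost_usd=result.cost_usd)
        # Attribute spend to the skill even when the task failed.
        self.cost_by_skill[contract.skill] = (
            self.cost_by_skill.get(contract.skill, 0.0) + result.cost_usd
        )
        return result
```

The orchestrator only ever sees a `TaskResult`, so a crashing or overspending subagent shows up as a typed error it can route on instead of an unhandled exception in the agent loop.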

MCP Harness Configuration Example

JSON — MCP Server Config (.mcp.json, project root)

// Only include servers needed for the current project.
// Each server = expanded attack surface. Audit with ecc-agentshield.
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./src"],
      // Scoped to ./src only — not the full filesystem
      "description": "Read/write access to project source files only"
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
        // Never hardcode tokens — use env var references
      }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres",
               "postgresql://readonly_user@localhost/mydb"],
      // READ-ONLY database user — never give agents write access to prod DB
      "description": "Read-only access to development database schema and data"
    }
    // Intentionally excluded: shell execution, network access, cloud control plane
  }
}
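
The rules stated in the config comments (load only the servers a project needs, reference tokens through environment variables) can be checked mechanically before a session starts. The sketch below is a hypothetical pre-flight check, not the ecc-agentshield scanner; `audit_mcp_config` and its heuristics are assumptions, and it expects the config already parsed into a dict with comments stripped.

```python
import re
from typing import List, Set

ENV_REF = re.compile(r"^\$\{[A-Z_][A-Z0-9_]*\}$")          # e.g. ${GITHUB_TOKEN}
SECRET_KEY = re.compile(r"TOKEN|SECRET|PASSWORD|KEY", re.IGNORECASE)
URL_WITH_PASSWORD = re.compile(r"://[^/@\s]+:[^/@\s]+@")   # scheme://user:pass@host


def audit_mcp_config(config: dict, allowed_servers: Set[str]) -> List[str]:
    """Return human-readable findings; an empty list means no issue was detected."""
    findings = []
    for name, server in config.get("mcpServers", {}).items():
        if name not in allowed_servers:
            findings.append(f"{name}: server not in the project allowlist")
        for key, value in server.get("env", {}).items():
            if SECRET_KEY.search(key) and not ENV_REF.match(str(value)):
                findings.append(f"{name}: {key} looks hardcoded; use an env var reference")
        for arg in server.get("args", []):
            if URL_WITH_PASSWORD.search(str(arg)):
                findings.append(f"{name}: connection string embeds a password")
    return findings
```

Running a check like this in CI on every config change keeps the allowlist explicit: adding a server becomes a reviewed diff rather than a local edit.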

Harness Maturity Model

// WHERE IS YOUR TEAM TODAY

Harness engineering maturity borrows the structure of the CISA Zero Trust Maturity Model: four stages across five dimensions. Teams rarely advance all dimensions simultaneously. Start with guardrails and feedback loops (highest safety ROI), then add observability and HITL rigor, then optimize orchestration.

Stage 01

Ad Hoc

No config files
No permission scoping
No feedback loop
Manual testing only
No cost visibility
All-or-nothing iteration

Stage 02

Structured

CLAUDE.md / AGENTS.md present
Basic allow/deny rules
Test feedback loop
Manual HITL checkpoints
Basic cost logging
Iteration cap set

Stage 03

Production

Multi-layer guardrails
Structured error recovery
OTel traces + metrics
Automated HITL gates
Offline eval suite (CI)
MCP scoped per project

Stage 04

Optimal

Policy-as-code guardrails
Dynamic HITL thresholds
Full A2A + orchestration
Continuous harness eval
Cost attribution per team
Security scan in CI/CD

Patterns & Anti-Patterns

// WHAT WORKS, WHAT FAILS, AND WHY

✕ Anti-Patterns (Production Failures)

Unbounded iteration — no max_iterations cap; agent burns tokens until the context window, budget, or API rate limit is exhausted
Silent errors — tool returns error as empty string; model assumes success
Broad file access — agent has write access to entire repo including secrets, CI config
No state checkpoints — 20-iteration task has no recovery point on failure
Prompt-only guardrails — "don't touch .env" in the prompt; not enforced deterministically
No loop detection — same tool called 8× with identical inputs before token exhaustion
All MCPs enabled — every integration loaded regardless of current task
Human gate theater — HITL approval always auto-approves due to timeout policy

✓ Production Patterns (What Ships)

Bounded loops — hard iteration cap; structured handoff artifact on max-iter
Structured tool errors — every error has type, detail, and suggested recovery hint
Minimal tool surface — 3–5 scoped tools per session, not a full tool registry
Explicit checkpointing — write progress artifact to disk every N iterations
Deterministic guardrails — hook scripts and permission rules enforce constraints in code outside the model, not in the prompt
Loop detection — circuit breaker triggers HITL on 3× same tool+input
Task-scoped MCP — only load servers required for the current task category
Tiered HITL — auto-approve low-risk, manual-approve destructive, reject on timeout
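
The loop-detection pattern above (escalate on 3× same tool+input) needs only a counter keyed on a stable fingerprint of each call. A minimal sketch, assuming tool inputs are JSON-serializable dicts; `LoopDetector` is a hypothetical name.

```python
import hashlib
import json
from collections import Counter


class LoopDetector:
    """Trips when the same tool is called with identical input `threshold` times."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: Counter = Counter()

    @staticmethod
    def _fingerprint(tool: str, tool_input: dict) -> str:
        # sort_keys makes the fingerprint independent of key order
        canonical = json.dumps(tool_input, sort_keys=True, default=str)
        return hashlib.sha256(f"{tool}:{canonical}".encode()).hexdigest()

    def record(self, tool: str, tool_input: dict) -> bool:
        """Record a call; return True when the harness should escalate to a human."""
        key = self._fingerprint(tool, tool_input)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

The harness calls `record` before dispatching each tool call and routes to the HITL gate when it returns True. Counting over the whole session, as here, is the simplest choice; a sliding window over recent calls is a reasonable variation if legitimate repeats are common.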

⚠

Prompt injection via tools: When an agent reads external content (files, web pages, emails, database rows) and that content contains instructions ("Ignore previous instructions and..."), an indirect prompt injection attack can hijack the agent loop. Defense: treat all external tool output as untrusted data, not as instructions. Sanitize tool outputs before they are inserted into the conversation, and keep trusted and untrusted content in clearly separated zones of the context window.
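
One way to realize a dedicated untrusted zone is to wrap every tool output in a delimiter that the content itself cannot close. The sketch below is an assumption-laden illustration: the tag name, the `wrap_untrusted` helper, and the single suspicious-phrase regex are all hypothetical, and delimiters reduce rather than eliminate injection risk, so this complements the deterministic guardrails instead of replacing them.

```python
import re

ZONE = "untrusted_tool_output"  # hypothetical tag name; pick one and describe it in the system prompt
SUSPICIOUS = re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE)


def wrap_untrusted(source: str, text: str) -> str:
    """Wrap external content so the model can tell data from instructions."""
    closing = f"</{ZONE}>"
    cleaned = text
    # Strip embedded closing tags repeatedly so nested tricks cannot rebuild one.
    while closing in cleaned:
        cleaned = cleaned.replace(closing, "")
    flagged = bool(SUSPICIOUS.search(cleaned))
    header = f"<{ZONE} source={source!r} flagged={str(flagged).lower()}>"
    return f"{header}\n{cleaned}\n{closing}"
```

The `flagged` attribute is only a hint for logging and HITL routing; the content stays in the untrusted zone whether or not the heuristic fires.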

Implementation Roadmap

// SIX STEPS TO PRODUCTION-GRADE HARNESS

Whether you're starting from a bare API call or retrofitting an existing agent, the path to production harness engineering follows the same six steps. Each step independently improves reliability. The order matters: guardrails before observability, feedback loop before HITL tuning.

Roadmap — 6-Step Harness Engineering Path

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1 — ASSET INVENTORY (Day 1–2)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ List every tool the agent currently uses or could use
  ✦ Classify each tool: read-only / write / destructive
  ✦ Identify data the agent can touch (files, DBs, APIs, credentials)
  ✦ Map current failure modes: what breaks, how often, impact

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2 — CONFIGURATION FILE (Day 3–4)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ Create CLAUDE.md / AGENTS.md with:
      - Explicit allowed operations
      - Explicit prohibited operations
      - Required verification steps after any change
      - Escalation triggers (when to stop and ask)
  ✦ Set iteration cap (start: 15 for coding, 30 for research)
  ✦ Define context compaction strategy (compact at 50%, not 95%)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3 — FEEDBACK LOOP (Week 1)   ROI: HIGHEST
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ Implement write-test-fix cycle:
      - Agent writes → tests run automatically → failures injected as next input
  ✦ Add output validation: secret detection + static analysis + lint
  ✦ Structure error messages: type + detail + recovery hint
  ✦ Set max fix attempts (2–3) before escalating to human

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 4 — GUARDRAILS (Week 2)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ Add pre-tool hook: validate commands before execution
  ✦ Scope file/directory access to minimum needed
  ✦ Add loop detection (3× same tool+input = escalate)
  ✦ Set token budget + cost alert threshold
  ✦ Define HITL gates for all destructive operations

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 5 — OBSERVABILITY (Week 3)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ Instrument all model calls with OTel spans
  ✦ Log: every tool call, duration, error rate, cost
  ✦ Emit structured session summary on completion
  ✦ Set up dashboard: completion rate, avg iterations, cost/task
  ✦ Add post-turn hook for audit trail

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 6 — EVALUATE AND ITERATE (Ongoing)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✦ Build an offline eval suite (10–20 representative tasks)
  ✦ Run eval in CI on every harness config change
  ✦ Track: completion rate, cost/task, iteration count, error rate
  ✦ Review HITL logs weekly — tune thresholds based on outcomes
  ✦ Scan harness config for security issues (ecc-agentshield)
  ✦ Update CLAUDE.md/AGENTS.md as project evolves

Common Failure Modes to Avoid

Starting with orchestration before feedback loop is solid
Treating CLAUDE.md as documentation, not a contract
Giving the same agent all MCP servers for all tasks
HITL gates that always auto-approve due to short timeouts
No eval suite — can't tell if harness changes help or hurt
Measuring only task success rate, not cost efficiency

Success Indicators to Target

Task completion rate >85% without human intervention
Agent self-corrects test failures in <3 iterations
No secret or PII ever appears in agent output
Cost per task stable within ±20% across model upgrades
Full trace replay possible for any failed session
Harness config checked into source control, reviewed in PR
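
Step 6 of the roadmap and the targets above can be combined into a CI gate: summarize each eval run, then compare against a baseline. A minimal sketch, assuming each eval task yields one record; `EvalRun`, `summarize`, and `regressions` are hypothetical names, and applying the 85% and ±20% figures as CI thresholds is an assumption about how a team might use them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EvalRun:
    task_id: str
    completed: bool
    iterations: int
    cost_usd: float
    errors: int


def summarize(runs: List[EvalRun]) -> Dict[str, float]:
    """Aggregate the four metrics the roadmap tracks."""
    if not runs:
        raise ValueError("eval suite produced no runs")
    return {
        "completion_rate": sum(r.completed for r in runs) / len(runs),
        "avg_iterations": mean(r.iterations for r in runs),
        "cost_per_task_usd": mean(r.cost_usd for r in runs),
        "error_rate": sum(r.errors > 0 for r in runs) / len(runs),
    }


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                min_completion: float = 0.85, cost_tolerance: float = 0.20) -> List[str]:
    """List the targets a candidate harness config misses relative to a baseline."""
    problems = []
    if candidate["completion_rate"] < min_completion:
        problems.append("completion rate below target")
    drift = abs(candidate["cost_per_task_usd"] - baseline["cost_per_task_usd"])
    if drift > cost_tolerance * baseline["cost_per_task_usd"]:
        problems.append("cost per task drifted beyond tolerance")
    return problems
```

Failing the build when `regressions` returns a non-empty list turns a harness config change into something measurable: the diff either holds the targets or it does not.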

▸

Reference implementations and resources: NIST AI Agent Standards Initiative (Feb 2026) · OWASP LLM Top 10 (2025, current through 2026) — especially LLM01 Prompt Injection and LLM06 Excessive Agency · IEEE CAI 2026 — Engineering Trustworthy Multi-Agent Systems · everything-claude-code (Anthropic Hackathon Winner, 140k⭐) — battle-tested hooks, skills, MCP configs · ecc-agentshield — harness security scanner · AWS AgentCore Samples — full production harness reference · Lilian Weng, Harness Engineering for Self-Improvement (July 2026) · Microsoft, Context Engineering for Reliable AI Agents: Lessons from Building Azure SRE Agent — GA April 2026, autonomously handling 35,000+ production incidents and cutting App Service time-to-mitigation from 40.5 hours to 3 minutes · OpenAI, Harness Engineering: Leveraging Codex in an Agent-First World · Harness, State of Engineering Excellence 2026 · ai-boost/awesome-harness-engineering (curated GitHub list)
