LangSmith Handbook
A production-oriented reference for LangSmith tracing, datasets, evaluation, feedback loops, and prompt management across LangChain and framework-agnostic Python applications.
Table of Contents
This release includes the complete handbook: setup, tracing, datasets, evaluation, metadata, feedback workflows, and prompt management. The structure stays aligned with the reference handbook while the examples stay current with the modern LangSmith SDK.
- Module 1: Setup & Core Client
- Module 2: Tracing & Observability
- Module 3: Datasets & Test Cases
- Module 4: Evaluation (Offline Testing)
- Module 5: Metadata, Tags, & User Feedback
- Module 6: The LangChain Prompt Hub
Code examples use the modern `langsmith` SDK throughout.

Module 1: Setup & Core Client
LangSmith tracing can be turned on with a minimal environment contract and then accessed programmatically through the Client. In practice, this means you can start with environment-only tracing for fast adoption, then graduate to the SDK for datasets, evaluation, automation, and production feedback workflows.
Required variables: LANGSMITH_TRACING, LANGSMITH_API_KEY, and LANGSMITH_PROJECT. These are the core switches that enable tracing, authenticate your process, and group runs under a project.
| Variable | Purpose | Example Value |
|---|---|---|
| `LANGSMITH_TRACING` | Turns tracing on for supported integrations and SDK helpers. | `true` |
| `LANGSMITH_API_KEY` | Authenticates requests to LangSmith. | `lsv2_pt_...` |
| `LANGSMITH_PROJECT` | Sets the logical project name shown in the UI. | `customer-support-prod` |
Set `LANGSMITH_WORKSPACE_ID` if your API key can access multiple workspaces, and set `LANGSMITH_ENDPOINT` if you use a self-hosted or regional deployment.

Recommended Local Setup Script
The snippet below is Python instead of shell so it can be pasted directly into a local bootstrap script, test harness, notebook, or demo app. In production, prefer a secrets manager or runtime environment injection rather than hardcoding values.
import os
def configure_langsmith_environment() -> None:
"""Set the minimum LangSmith environment required for tracing.
This is convenient for local demos and reproducible examples.
In production, use your platform's secret manager or environment injection.
"""
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "YOUR_LANGSMITH_API_KEY"
os.environ["LANGSMITH_PROJECT"] = "langsmith-handbook-dev"
# Optional when one API key can access more than one workspace.
# os.environ["LANGSMITH_WORKSPACE_ID"] = "YOUR_WORKSPACE_ID"
# Optional for EU, self-hosted, or custom LangSmith deployments.
# os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
def validate_langsmith_environment() -> None:
"""Fail early if a required variable is missing.
Explicit validation is useful in CI, worker boot, and local smoke tests.
"""
required_keys = (
"LANGSMITH_TRACING",
"LANGSMITH_API_KEY",
"LANGSMITH_PROJECT",
)
missing = [key for key in required_keys if not os.getenv(key)]
if missing:
raise RuntimeError(f"Missing LangSmith configuration: {', '.join(missing)}")
if __name__ == "__main__":
configure_langsmith_environment()
validate_langsmith_environment()
print("LangSmith environment is configured correctly.")
What This Enables
- LangChain auto-tracing: supported LangChain executions will log to LangSmith once tracing is enabled.
- Vanilla Python instrumentation: the same project and credentials are reused by `@traceable`, wrappers, evaluation helpers, and the API client.
- Consistent project boundaries: all runs from the process are grouped under the configured project unless overridden dynamically.
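Because all three behaviors hinge on the same environment variables, it is worth being able to inspect the effective configuration at process start. A minimal sketch (the `"default"` fallback project name is an assumption about LangSmith's behavior when no project is set; confirm it for your deployment):

```python
import os


def effective_tracing_config() -> dict[str, object]:
    """Report the tracing configuration this process will actually use.

    Note: the "default" fallback project name is an assumption based on
    typical LangSmith behavior; verify it against your deployment.
    """
    return {
        "tracing_enabled": os.getenv("LANGSMITH_TRACING", "").lower() == "true",
        "project": os.getenv("LANGSMITH_PROJECT") or "default",
        "has_api_key": bool(os.getenv("LANGSMITH_API_KEY")),
    }


if __name__ == "__main__":
    print(effective_tracing_config())
```

Logging this dictionary (minus secrets) at worker boot makes "why is nothing showing up in LangSmith?" debugging much faster.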
Standard: initialize Client from the langsmith package when you need to create datasets, examples, feedback, evaluations, or interact with LangSmith programmatically. The client can read credentials from environment variables or be configured explicitly.
Environment-Driven Client
This is the preferred default for application code because it keeps secrets out of source and aligns with standard deployment workflows.
import os
from langsmith import Client
def build_langsmith_client() -> Client:
"""Create a LangSmith client using environment-based configuration.
The Client automatically reads LANGSMITH_API_KEY and related settings,
so this version works well in apps, jobs, and CI pipelines.
"""
if not os.getenv("LANGSMITH_API_KEY"):
raise RuntimeError("LANGSMITH_API_KEY must be set before creating Client().")
return Client()
if __name__ == "__main__":
client = build_langsmith_client()
print(f"LangSmith client initialized: {client.__class__.__name__}")
Explicit Client Configuration
Use explicit configuration when environment variables are not available, when you are bootstrapping tracing programmatically, or when you need to target a custom LangSmith endpoint.
from langsmith import Client
def build_explicit_langsmith_client() -> Client:
"""Create a LangSmith client with explicit API settings.
This pattern is useful when credentials come from a secret manager,
a deployment platform, or a custom configuration service.
"""
return Client(
api_key="YOUR_LANGSMITH_API_KEY",
api_url="https://api.smith.langchain.com",
)
if __name__ == "__main__":
client = build_explicit_langsmith_client()
print("Explicit LangSmith client created successfully.")
Minimal Programmatic API Example
The example below proves the client is more than a tracing helper. It is the entry point for platform automation and should be treated like a first-class SDK object in your MLOps codebase.
from typing import Iterable
from langsmith import Client
def list_project_names(limit: int = 5) -> list[str]:
"""Return a small sample of project names visible to the client.
Reading back projects is a simple smoke test that confirms the client can
authenticate and talk to the LangSmith API successfully.
"""
client = Client()
projects: Iterable = client.list_projects(limit=limit)
return [project.name for project in projects]
if __name__ == "__main__":
names = list_project_names(limit=5)
print("Visible LangSmith projects:")
for name in names:
print(f"- {name}")
Prefer environment-driven configuration, and use explicit Client(...) initialization only when you intentionally need to override the runtime environment.

Module 1 Summary
- Tracing baseline: set `LANGSMITH_TRACING`, `LANGSMITH_API_KEY`, and `LANGSMITH_PROJECT`.
- Programmatic access: use `from langsmith import Client` for datasets, evaluation, feedback, and platform automation.
- Deployment guidance: prefer environment variables in production; use explicit client parameters when you need custom endpoints or runtime-only secret retrieval.
Module 2: Tracing & Observability
LangSmith tracing works in two complementary modes. With LangChain, tracing can be almost automatic once the environment is configured. Without LangChain, the langsmith SDK gives you explicit instrumentation primitives such as @traceable, the trace context manager, and client wrappers for provider SDKs like OpenAI.
Standard behavior: once LANGSMITH_TRACING=true and your credentials are configured, LangChain LCEL chains automatically emit traces to LangSmith. No extra tracing code is required for the basic path.
import os
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
def run_langchain_autotraced_example() -> str:
"""Run a simple LCEL chain that is auto-traced by LangSmith.
As long as LANGSMITH_TRACING, LANGSMITH_API_KEY, and LANGSMITH_PROJECT are
configured in the environment, this invocation will appear in LangSmith.
"""
required_keys = (
"LANGSMITH_TRACING",
"LANGSMITH_API_KEY",
"LANGSMITH_PROJECT",
"OPENAI_API_KEY",
)
missing = [key for key in required_keys if not os.getenv(key)]
if missing:
raise RuntimeError(f"Missing configuration: {', '.join(missing)}")
prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a helpful assistant. Answer only from the given context.",
),
("user", "Question: {question}\nContext: {context}"),
]
)
model = ChatOpenAI(model="gpt-4.1-mini")
parser = StrOutputParser()
chain = prompt | model | parser
return chain.invoke(
{
"question": "What happened in this morning's meeting?",
"context": "The team finalized the migration plan and assigned owners.",
}
)
if __name__ == "__main__":
print(run_langchain_autotraced_example())
Recommended SDK primitive: use the @traceable decorator to mark custom functions as LangSmith runs. Choose a run_type such as chain, tool, or llm so the trace renders correctly in the LangSmith UI.
from openai import OpenAI
from langsmith import traceable
openai_client = OpenAI()
@traceable(run_type="tool", name="Build Context")
def build_context(question: str) -> str:
"""Return retrieval context as a traceable tool step.
Marking this as a tool keeps the trace tree semantically meaningful.
"""
if "meeting" in question.lower():
return "During the meeting, the migration was approved and owners were assigned."
return "No relevant context was found."
@traceable(run_type="llm", name="OpenAI Summarizer")
def call_llm(messages: list[dict[str, str]]) -> str:
"""Call the LLM and record the step as an LLM run.
Using run_type='llm' helps LangSmith render token, latency, and model data
appropriately for this node in the trace tree.
"""
response = openai_client.chat.completions.create(
model="gpt-4.1-mini",
temperature=0,
messages=messages,
)
return response.choices[0].message.content or ""
@traceable(run_type="chain", name="Summarize Question")
def summarize_question(question: str) -> str:
"""Compose a small pipeline using traceable child steps.
LangSmith automatically nests traceable calls made inside this function.
"""
context = build_context(question)
messages = [
{
"role": "system",
"content": "You are a helpful assistant. Answer only from the provided context.",
},
{
"role": "user",
"content": f"Question: {question}\nContext: {context}",
},
]
return call_llm(messages)
if __name__ == "__main__":
answer = summarize_question("Can you summarize the meeting?")
print(answer)
Recommended convention: give orchestration functions run_type="chain", helper retrieval or enrichment functions run_type="tool", and model calls run_type="llm". This makes the trace readable for humans and evaluators.

Best use case: if you already call the native OpenAI SDK directly, wrap the client once with langsmith.wrappers.wrap_openai. This preserves your existing code style while automatically tracing chat and responses API calls.
import openai
from langsmith.wrappers import wrap_openai
def run_wrapped_openai_example() -> str:
"""Trace a native OpenAI SDK call without switching to LangChain.
The wrapped client behaves like the normal OpenAI client, but LangSmith
automatically captures the request and response as a traced run.
"""
client = wrap_openai(openai.OpenAI())
messages = [
{"role": "system", "content": "You are a concise assistant."},
{
"role": "user",
"content": "List three reasons why observability matters in LLM systems.",
},
]
completion = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
)
return completion.choices[0].message.content or ""
if __name__ == "__main__":
print(run_wrapped_openai_example())
import openai
from langsmith.wrappers import wrap_openai
def run_wrapped_openai_responses_api() -> str:
"""Trace the OpenAI Responses API through the same wrapped client.
This is useful when your application uses the newer OpenAI responses style
instead of chat.completions.
"""
client = wrap_openai(openai.OpenAI())
    response = client.responses.create(
        model="gpt-4o-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is LangSmith used for?"},
        ],
    )
return response.output_text
if __name__ == "__main__":
print(run_wrapped_openai_responses_api())
Module 3: Datasets & Test Cases
Datasets are the backbone of repeatable evaluation in LangSmith. They let you store canonical inputs, expected outputs, metadata, and trace-linked examples so offline testing and regression analysis can run against a stable benchmark instead of ad hoc prompts.
Standard: create datasets programmatically so they can be versioned, recreated in CI, and managed as part of your evaluation pipeline. For most app testing, the default kv data type is appropriate.
from langsmith import Client
from langsmith.schemas import DataType
def create_langsmith_dataset() -> str:
"""Create a dataset for offline QA and return its ID.
Defining the schemas up front helps other engineers understand exactly what
each example in the dataset is expected to contain.
"""
client = Client()
dataset = client.create_dataset(
dataset_name="support-agent-regression-suite",
description="Regression test cases for customer support answer quality.",
data_type=DataType.kv,
inputs_schema={
"type": "object",
"properties": {
"question": {"type": "string"},
"context": {"type": "string"},
},
"required": ["question", "context"],
},
outputs_schema={
"type": "object",
"properties": {
"answer": {"type": "string"},
},
"required": ["answer"],
},
metadata={"owner": "ml-platform", "use_case": "offline-evaluation"},
)
return str(dataset.id)
if __name__ == "__main__":
print(create_langsmith_dataset())
Preferred bulk pattern: use create_examples(..., examples=[...]) with a single list of example objects. This is the current recommended API instead of older split-argument upload patterns.
from typing import Any
from langsmith import Client
def populate_dataset_examples() -> dict[str, Any]:
"""Populate a dataset with question and answer examples.
The examples list is the modern API because each row stays self-contained,
making the code easier to review and update over time.
"""
client = Client()
examples = [
{
"inputs": {
"question": "What happened in the morning migration meeting?",
"context": "The migration plan was approved and owners were assigned.",
},
"outputs": {
"answer": "The migration plan was approved and owners were assigned.",
},
"metadata": {"difficulty": "easy", "topic": "meetings"},
"splits": ["train"],
},
{
"inputs": {
"question": "Who owns the migration follow-up work?",
"context": "Ava owns rollout coordination and Liam owns validation.",
},
"outputs": {
"answer": "Ava owns rollout coordination and Liam owns validation.",
},
"metadata": {"difficulty": "medium", "topic": "ownership"},
"splits": ["test"],
},
]
return client.create_examples(
dataset_name="support-agent-regression-suite",
examples=examples,
)
if __name__ == "__main__":
response = populate_dataset_examples()
print(response)
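Because the dataset above declares inputs and outputs schemas, it can pay off to check rows locally before uploading them. The helper below is hypothetical (not part of the langsmith SDK); it simply mirrors the question/context/answer schema declared earlier in this module:

```python
def validate_example_rows(examples: list[dict]) -> list[str]:
    """Return human-readable problems found in example rows.

    Hypothetical pre-upload check, not an SDK function: it mirrors the
    inputs/outputs schema declared when the dataset was created.
    """
    problems: list[str] = []
    for index, example in enumerate(examples):
        inputs = example.get("inputs", {})
        outputs = example.get("outputs", {})
        # Every input field declared as required in the dataset schema.
        for field in ("question", "context"):
            if not isinstance(inputs.get(field), str):
                problems.append(f"example {index}: inputs.{field} must be a string")
        if not isinstance(outputs.get("answer"), str):
            problems.append(f"example {index}: outputs.answer must be a string")
    return problems


if __name__ == "__main__":
    rows = [{"inputs": {"question": "Q?", "context": "C"}, "outputs": {"answer": "A"}}]
    print(validate_example_rows(rows))
```

Running this before `create_examples` turns schema mismatches into a clear local error instead of a failed or silently malformed upload.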
Use split names such as train, test, or validation to organize how the dataset is consumed later by evaluators.

Production feedback loop: promote successful production runs into datasets so you can turn real traffic into regression test cases. This is one of the strongest LangSmith workflows because it connects observability directly to evaluation.
from langsmith import Client
def create_example_from_known_run(run_id: str) -> str:
"""Promote an existing production run into a dataset example.
This is useful when a real user interaction becomes a strong regression test
case that you want to preserve for future evaluation runs.
"""
client = Client()
run = client.read_run(run_id)
example = client.create_example_from_run(
run,
dataset_name="support-agent-regression-suite",
)
return str(example.id)
if __name__ == "__main__":
print(create_example_from_known_run("YOUR_RUN_ID"))
from typing import Iterable
from langsmith import Client
def backfill_examples_from_recent_production_runs(limit: int = 3) -> list[str]:
"""Create dataset examples from recent production runs.
This pattern is useful for curating a regression suite from high-value or
representative real-world traffic after manual review.
"""
client = Client()
recent_runs: Iterable = client.list_runs(
project_name="customer-support-prod",
limit=limit,
)
created_example_ids: list[str] = []
for run in recent_runs:
        # Only promote runs that completed without error and produced outputs.
        if getattr(run, "error", None) or not getattr(run, "outputs", None):
            continue
example = client.create_example_from_run(
run,
dataset_name="support-agent-regression-suite",
)
created_example_ids.append(str(example.id))
return created_example_ids
if __name__ == "__main__":
print(backfill_examples_from_recent_production_runs())
Module 3 Summary
- Create datasets programmatically: use `client.create_dataset(...)` so evaluation infrastructure can be reproduced reliably.
- Use bulk example creation: prefer `client.create_examples(..., examples=[...])` for clean, modern dataset population.
- Promote real traces into test cases: use `client.create_example_from_run(...)` to turn production observations into durable regression coverage.
Module 4: Evaluation (Offline Testing)
Tracing explains what happened in production. Evaluation tells you whether the system is getting better. In LangSmith, the usual pattern is to define a target application function, point it at a dataset, and run one or more evaluators that score quality dimensions such as correctness, helpfulness, retrieval quality, or policy compliance.
Core workflow: give LangSmith a target callable, a dataset or iterable of examples, and a list of evaluators. LangSmith will execute the target over the dataset, record the experiment, and store evaluator outputs alongside traces for later comparison.
from langsmith import traceable
from langsmith.evaluation import evaluate
@traceable(name="support_agent", run_type="chain")
def support_agent(inputs: dict) -> dict:
question = inputs["question"]
if "refund" in question.lower():
return {"answer": "Refunds can be requested within 30 days of purchase."}
return {"answer": "Please contact support@example.com for account help."}
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
predicted = outputs.get("answer", "").strip().lower()
expected = reference_outputs.get("answer", "").strip().lower()
return {
"key": "exact_match",
"score": 1 if predicted == expected else 0,
}
if __name__ == "__main__":
experiment_results = evaluate(
target=support_agent,
data="support-agent-regression-suite",
evaluators=[exact_match],
experiment_prefix="support-agent-offline",
description="Baseline regression sweep for the current support workflow.",
max_concurrency=4,
)
print(experiment_results)
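The evaluator dicts in this module all share a key/score shape, so a quick local aggregate can be computed once results are collected. This sketch works on plain result dicts and is independent of the SDK's own experiment summaries:

```python
from collections import defaultdict


def summarize_scores(results: list[dict]) -> dict[str, float]:
    """Average evaluator scores by key.

    `results` is a plain list of evaluator outputs such as
    {"key": "exact_match", "score": 1} -- the shape returned by the
    evaluators in this module, not an SDK object.
    """
    totals: dict[str, list[float]] = defaultdict(list)
    for result in results:
        totals[result["key"]].append(float(result["score"]))
    return {key: sum(scores) / len(scores) for key, scores in totals.items()}


if __name__ == "__main__":
    sample = [
        {"key": "exact_match", "score": 1},
        {"key": "exact_match", "score": 0},
    ]
    print(summarize_scores(sample))  # {'exact_match': 0.5}
```

This is handy in CI, where a single averaged number per evaluator key can gate a merge before anyone opens the LangSmith UI.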
The target function should accept the example inputs dict and return a dict such as {"answer": ...} or {"answer": ..., "citations": ...}.

Use when you can define a rule: exact match, keyword presence, citation coverage, JSON schema validity, or refusal-policy checks are usually better handled by deterministic code before you reach for a judge model.
import re
def contains_order_number(outputs: dict) -> dict:
answer = outputs.get("answer", "")
has_order_number = bool(re.search(r"ORD-\d{6}", answer))
return {
"key": "contains_order_number",
"score": 1 if has_order_number else 0,
"comment": "Checks whether the answer includes a formatted order number.",
}
def mentions_refund_window(outputs: dict, reference_outputs: dict) -> dict:
answer = outputs.get("answer", "").lower()
expected_phrase = reference_outputs.get("required_phrase", "").lower()
return {
"key": "mentions_refund_window",
"score": 1 if expected_phrase and expected_phrase in answer else 0,
}
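The "JSON schema validity" check mentioned above fits the same deterministic style. A sketch that scores whether the answer parses as a JSON object with required keys (the `required_keys` names are illustrative, not from the source application):

```python
import json


def valid_json_answer(outputs: dict) -> dict:
    """Score 1 when the answer parses as a JSON object with required keys.

    The required key names here are illustrative; adapt them to the
    schema your application actually promises.
    """
    required_keys = ("answer", "citations")
    try:
        payload = json.loads(outputs.get("answer", ""))
    except json.JSONDecodeError:
        return {"key": "valid_json_answer", "score": 0, "comment": "Not valid JSON."}
    if not isinstance(payload, dict):
        return {"key": "valid_json_answer", "score": 0, "comment": "Not a JSON object."}
    missing = [key for key in required_keys if key not in payload]
    return {
        "key": "valid_json_answer",
        "score": 0 if missing else 1,
        "comment": f"Missing keys: {missing}" if missing else "All required keys present.",
    }
```

Like the other deterministic evaluators, this is a plain function, so it can be unit-tested on its own before being passed to `evaluate()`.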
Use when the quality bar is semantic: helpfulness, tone, groundedness, completeness, and rubric-based scoring often need language-model judgment. Keep the rubric explicit and return a structured score plus comment.
import json
from openai import OpenAI
from langsmith.wrappers import wrap_openai
judge_client = wrap_openai(OpenAI())
def helpfulness_judge(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
rubric = """
Score the assistant response from 0 to 1.
1.0 = fully correct and directly helpful
0.5 = partially helpful or incomplete
0.0 = incorrect, misleading, or irrelevant
Return strict JSON with keys: score, reasoning.
""".strip()
response = judge_client.responses.create(
model="gpt-4.1-mini",
input=[
{
"role": "system",
"content": rubric,
},
{
"role": "user",
"content": json.dumps(
{
"question": inputs.get("question"),
"assistant_answer": outputs.get("answer"),
"reference_answer": reference_outputs.get("answer"),
}
),
},
],
)
payload = json.loads(response.output_text)
return {
"key": "helpfulness_judge",
"score": float(payload["score"]),
"comment": payload["reasoning"],
}
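Even with a strict rubric, judge models occasionally return malformed JSON, and `json.loads` alone would then fail the evaluator. A defensive parse keeps one bad completion from crashing the whole experiment; the fallback score of 0.0 and the truncation length are policy assumptions, not SDK conventions:

```python
import json


def parse_judge_payload(raw_text: str) -> dict:
    """Parse judge output defensively, falling back to a zero score.

    The fallback behavior (score 0.0 with an explanatory comment) is a
    policy choice for this sketch; adjust it to your needs.
    """
    try:
        payload = json.loads(raw_text)
        return {
            "key": "helpfulness_judge",
            "score": float(payload["score"]),
            "comment": str(payload.get("reasoning", "")),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return {
            "key": "helpfulness_judge",
            "score": 0.0,
            "comment": f"Unparseable judge output: {raw_text[:200]}",
        }
```

In the judge above, `json.loads(response.output_text)` could be replaced by `parse_judge_payload(response.output_text)` so a single malformed completion scores zero instead of raising.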
Module 4 Summary
- Use `evaluate()` for repeatable experiments: target plus dataset plus evaluators is the core offline testing loop.
- Prefer deterministic evaluators where possible: they are cheaper, clearer, and easier to debug.
- Add model-based judges only for semantic criteria: rubric-driven grading works best when you require nuanced quality assessment.
Module 5: Metadata, Tags, & User Feedback
Production observability becomes useful when traces carry business context. Metadata and tags let you slice runs by tenant, feature flag, model version, channel, or experiment branch. The feedback API then closes the loop by attaching human or automated judgments directly to a run or trace.
Use inside traced execution: fetch the current run tree, then enrich it with details that matter later during filtering or root-cause analysis. This is where you annotate traces with the exact context that product teams care about.
from langsmith import get_current_run_tree, traceable
@traceable(name="answer_support_question", run_type="chain", tags=["support"])
def answer_support_question(question: str, customer_tier: str, channel: str) -> dict:
run_tree = get_current_run_tree()
if run_tree is not None:
run_tree.metadata["customer_tier"] = customer_tier
run_tree.metadata["channel"] = channel
run_tree.metadata["workflow_version"] = "2025-03-router-a"
run_tree.tags.extend([
f"tier:{customer_tier}",
f"channel:{channel}",
"experience:support",
])
return {
"answer": f"Handled question '{question}' for {customer_tier} tier customer."
}
from langsmith import trace
def sync_customer_profile(customer_id: str) -> dict:
with trace(
"sync_customer_profile",
run_type="tool",
tags=["crm", "profile-sync"],
metadata={"customer_id": customer_id, "source": "nightly-job"},
):
return {"status": "ok", "customer_id": customer_id}
Use after a run completes: attach a score, value, and comment to the run that produced an answer. This lets you build dashboards around thumbs-up rates, agent defects, human-review outcomes, or safety audits.
from langsmith import Client
def record_user_feedback(
run_id: str,
trace_id: str,
was_helpful: bool,
comment: str | None = None,
) -> None:
client = Client()
client.create_feedback(
run_id=run_id,
trace_id=trace_id,
key="user_helpfulness",
score=1.0 if was_helpful else 0.0,
value={"label": "thumbs_up" if was_helpful else "thumbs_down"},
comment=comment,
)
from langsmith import Client, get_current_run_tree, traceable
client = Client()
@traceable(name="moderated_answer", run_type="chain")
def moderated_answer(question: str) -> dict:
run_tree = get_current_run_tree()
answer = {"answer": "Please reset your password from the account settings page."}
if run_tree is not None:
client.create_feedback(
run_id=run_tree.id,
trace_id=run_tree.trace_id,
key="policy_review",
score=1.0,
comment="Passed automated policy checks.",
value={"reviewer": "automated-guardrail"},
)
return answer
| Signal | Where To Store It | Typical Example |
|---|---|---|
| Customer tier | Metadata | {"customer_tier": "enterprise"} |
| Broad grouping | Tags | ["support", "web-chat"] |
| User judgment | Feedback | score=1.0, key="user_helpfulness" |
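To keep the conventions in this table consistent across services, the annotation shapes can be centralized in one place. A hypothetical helper that produces the same `tier:` and `channel:` tag forms used earlier in this module:

```python
def build_run_annotations(customer_tier: str, channel: str) -> dict:
    """Build metadata and tags following one shared convention.

    Hypothetical helper: centralizing this prevents drift such as
    "tier:Enterprise" vs "tier:enterprise" across services, which would
    otherwise fragment filtering in the LangSmith UI.
    """
    tier = customer_tier.strip().lower()
    chan = channel.strip().lower()
    return {
        "metadata": {"customer_tier": tier, "channel": chan},
        "tags": [f"tier:{tier}", f"channel:{chan}", "experience:support"],
    }


if __name__ == "__main__":
    print(build_run_annotations("Enterprise", "Web-Chat"))
```

The returned dict can feed both `run_tree.metadata` updates and `trace(..., tags=..., metadata=...)` calls, so every entry point annotates runs identically.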
Module 5 Summary
- Annotate runs with business context: metadata and tags make traces searchable and operationally useful.
- Use `get_current_run_tree()` inside traced code: it is the clean way to enrich the active run.
- Attach explicit feedback to runs: `client.create_feedback(...)` turns production judgments into durable quality signals.
Module 6: The LangChain Prompt Hub
The modern Prompt Hub is backed by LangSmith. You can pull published prompts directly into code, pin a specific prompt revision for reproducibility, and push private prompt templates from local development into your workspace. The most practical approach is to use the LangChain helper functions when you are already inside a LangChain application, and the LangSmith client directly when you want tighter SDK control.
Use for prompt reuse and pinning: pull a community prompt, your own private prompt, or a specific revision so evaluations and deployments stay reproducible.
import os
from langchain_classic import hub
public_prompt = hub.pull("efriis/my-first-prompt")
private_prompt = hub.pull(
"support-agent-router",
api_key=os.environ["LANGSMITH_API_KEY"],
)
pinned_prompt = hub.pull(
"acme/support-agent-router:YOUR_COMMIT_HASH",
api_key=os.environ["LANGSMITH_API_KEY"],
)
print(type(public_prompt))
print(type(private_prompt))
print(type(pinned_prompt))
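The identifier strings passed to hub.pull combine an optional owner, a prompt name, and an optional commit hash. A small hypothetical parser (not an SDK function) makes that convention explicit, which is useful for config validation or logging:

```python
def parse_prompt_identifier(identifier: str) -> dict:
    """Split a prompt identifier into owner, name, and commit hash.

    Hypothetical helper. Handles "owner/name:commit", "owner/name", and
    the bare "name" form used for your own private prompt repository.
    """
    commit = None
    if ":" in identifier:
        identifier, commit = identifier.rsplit(":", 1)
    owner = None
    if "/" in identifier:
        owner, name = identifier.split("/", 1)
    else:
        name = identifier
    return {"owner": owner, "name": name, "commit": commit}


if __name__ == "__main__":
    print(parse_prompt_identifier("acme/support-agent-router:YOUR_COMMIT_HASH"))
```

Rejecting identifiers with a missing commit at deploy time, for example, is a cheap way to enforce the revision-pinning practice described below.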
Identifiers take the form owner/prompt_name, owner/prompt_name:commit_hash, or just prompt_name when you are addressing your own private prompt repository.

Use when a local prompt becomes shared infrastructure: push a prompt template into the Hub so application code, evaluations, and teammates can all reference the same artifact.
import os
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic import hub
router_prompt = ChatPromptTemplate.from_messages(
[
(
"system",
"You are a support router. Classify tickets into billing, account, or technical support.",
),
("user", "{question}"),
]
)
hub_url = hub.push(
"support-agent-router",
router_prompt,
api_key=os.environ["LANGSMITH_API_KEY"],
new_repo_is_public=False,
new_repo_description="Internal router prompt for support ticket triage.",
readme="# Support Agent Router\n\nInternal prompt used by the production support workflow.",
tags=["support", "router", "internal"],
)
print(hub_url)
Use when you want SDK-native control: the LangSmith client exposes prompt APIs directly, including flags like include_model, tags, description, and repository visibility.
from langchain_core.prompts import ChatPromptTemplate
from langsmith import Client
client = Client()
prompt = client.pull_prompt("support-agent-router", include_model=False)
published_url = client.push_prompt(
"support-agent-router-v2",
object=ChatPromptTemplate.from_messages(
[
("system", "Answer support questions using policy-approved wording only."),
("user", "{question}"),
]
),
is_public=False,
description="Second-generation prompt with stricter policy wording.",
tags=["support", "policy", "draft"],
)
print(prompt)
print(published_url)
Module 6 Summary
- Pull prompts directly into application code: use the Hub for reuse, versioning, and reproducible experiments.
- Push shared prompt templates into private repositories: this turns prompt changes into managed artifacts instead of local string edits.
- Use LangChain helpers or the LangSmith client depending on context: both paths target the same Prompt Hub capabilities.