
Palantir Foundry Developer Handbook

A production-oriented handbook for engineers, FDEs, analytics leads, and application builders working across Foundry's data integration, transforms, Ontology, and operational application stack.

Official docs synthesis · Code Repositories + Ontology + Workshop · Enterprise architecture focus · March 2026

Table of Contents

This handbook follows the same mental model Palantir uses in the documentation: data first lands as datasets, is shaped into reliable pipelines, becomes business-native through the Ontology, and is then activated in analytics and operational applications.

Module 1: Foundry Architecture and Core Concepts (Integration -> Transformation -> Ontology -> Apps)

Platform philosophy, Compass, Projects, folders, RIDs, data branching, and Data Lineage.

Module 2: Data Integration and Ingestion (Data Connection)

How Foundry connects to JDBC systems, APIs, file stores, and operational systems, then lands raw data as governed datasets.

Module 3: Data Transformation in Code Repositories (transforms.api)

Foundry-native Python transforms, incremental pipelines, and when to choose Code Repositories over Pipeline Builder.

Module 4: The Ontology (Objects + Links + Actions)

Why Foundry models the business as an operational graph instead of leaving teams with raw tables and joins.

Module 5: Analytics and Operational Applications (Code Workspaces, Contour, Quiver, Workshop)

How analysts, data scientists, and operational teams consume and act on data without breaking governance.

Module 6: Security and Governance (Markings + Lineage)

Mandatory controls, project roles, policy enforcement, and how security propagates with data rather than relying on ad hoc dashboards.

Module 7: Real-World Use Cases (Supply Chain + AML)

End-to-end scenarios tying ingestion, transforms, Ontology, ML, analytics, and actions into operational systems.

Module 1: Foundry Architecture and Core Concepts

The Platform Philosophy

Foundry is not trying to be a prettier data lake. It is trying to close the gap between data engineering and operations. The core journey is: Integration -> Transformation -> Ontology -> Applications. Each step adds more structure, more accountability, and more operational usefulness.

External systems -> Raw datasets -> Curated pipelines -> Ontology objects -> Workshop / Quiver / APIs

A useful analogy is an industrial refinery. Raw crude oil is valuable, but nobody wants to run a logistics operation directly on crude oil. You refine it into diesel, jet fuel, and lubricants with known quality and governance. Foundry does the same for enterprise data: raw ERP extracts and API payloads are refined into reusable datasets, then elevated into business-native entities like Factory, Part, Shipment, or Transaction.

Why Foundry feels different from a traditional data stack: in a warehouse-centric stack, the end state is often a table and a BI dashboard. In Foundry, the end state is meant to be an operational capability: a governed object model, reusable logic, and applications that can safely write decisions back into the system.
Integration
Foundry connects to source systems and lands data as datasets with transactions, permissions, lineage, and scheduling built in.
Transformation
Pipelines in Code Repositories or Pipeline Builder convert raw data into curated, testable, governed datasets.
Ontology
The semantic and operational layer that maps datasets into objects, properties, links, actions, and functions.
Applications
Workshop, Quiver, Contour, Code Workspaces, and the OSDK allow teams to analyze, decide, and act on the same governed substrate.

Compass and the Filesystem

Compass is the shared file-and-resource layer of Foundry. Public documentation describes Projects and resources as the basic building blocks of the platform. A Project is the collaboration boundary. A resource is the thing inside it: dataset, repository, analysis, application, report, or other artifact.

Think of a Project like a secured building, folders like rooms, and resources like the equipment in those rooms. The label on the door may change, but the serial number engraved on the machine stays fixed. That serial number is the RID.

Foundry uses the term resource instead of file because many resources, such as datasets and repositories, contain internal files of their own.
Concept | What it is | Why it matters
Project | Primary collaboration and permission boundary | Standardizes access, ownership, and discoverability for related work
Folder | Organizational container inside a Project | Keeps complex delivery programs navigable
Resource | Dataset, repo, dashboard, app, report, or other asset | Common security, metadata, comments, sharing, and auditing model
RID | Stable unique identifier for a resource | Decouples references from fragile human-readable paths

Datasets themselves are wrappers around files stored in a backing filesystem, often cloud object storage. The value of the dataset abstraction is not the bytes alone. It is the managed metadata around them: schema, transactions, branches, permissions, lineage, and build semantics.
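The RID contract can be sketched as a toy registry in plain Python. This is an illustration of the semantics, not a Foundry API: references held by RID survive renames, while path-based references would break.

```python
# Toy illustration (not a Foundry API): why RID-based references survive renames.
class Registry:
    def __init__(self):
        self._by_rid = {}       # rid -> resource metadata
        self._path_to_rid = {}  # human-readable path -> rid

    def create(self, rid, path):
        self._by_rid[rid] = {"path": path}
        self._path_to_rid[path] = rid

    def rename(self, rid, new_path):
        # The path (the label on the door) changes; the RID (the serial number) does not.
        old_path = self._by_rid[rid]["path"]
        del self._path_to_rid[old_path]
        self._by_rid[rid]["path"] = new_path
        self._path_to_rid[new_path] = rid

    def resolve(self, rid):
        return self._by_rid[rid]["path"]


registry = Registry()
registry.create("ri.foundry.main.dataset.1234", "/Acme/SupplyChain/raw/erp_shipments")
registry.rename("ri.foundry.main.dataset.1234", "/Acme/SupplyChain/raw/shipments_v2")
# A pipeline that stored the RID still resolves correctly after the rename.
assert registry.resolve("ri.foundry.main.dataset.1234") == "/Acme/SupplyChain/raw/shipments_v2"
```

This is why lineage, schedules, and permissions can reference resources durably even as teams reorganize folders.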

Branching and Data Lineage

This is where Foundry usually clicks for engineering teams. Most platforms version code. Foundry versions code and data together. Public documentation explicitly describes dataset transactions and dataset branches as the basis of Foundry's "Git for data" behavior.

Why this matters operationally: you are not only testing new transformation logic on a branch. You are testing the resulting data products on a branch too. That means you can inspect branch-specific outputs before they affect downstream consumers.

Traditional data stacks usually force teams into one of two bad choices: either test against production-like data outside the main pipeline, or run risky changes directly in shared production tables. Foundry's branch-aware build system provides a third option: a safe rehearsal environment where both transformation logic and downstream datasets can evolve together.

Code branch:        feature/late-shipments
Dataset branches:   raw_orders@master, curated_orders@feature, alerts@feature
Build fallback:     feature -> master

Result:
- unchanged upstream inputs can still be read from master
- changed transforms publish branch-specific JobSpecs
- changed outputs materialize only on the feature branch
- downstream users on master see no disruption
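The fallback behavior above can be sketched as a small resolution function. This is a hypothetical illustration of the semantics, not a Foundry API; the branch names mirror the example.

```python
# Hypothetical sketch (not a Foundry API) of branch fallback resolution:
# a build on a feature branch reads each input from the first branch in the
# fallback order that actually has materialized data.
def resolve_dataset_branch(available_branches, fallback_order):
    """Return the first branch in the fallback order that has data."""
    for branch in fallback_order:
        if branch in available_branches:
            return branch
    raise LookupError("no branch in the fallback order has data")


fallback = ["feature/late-shipments", "master"]
# raw_orders was never rebuilt on the feature branch, so reads fall back to master:
assert resolve_dataset_branch({"master"}, fallback) == "master"
# curated_orders was materialized on the feature branch, so reads stay there:
assert resolve_dataset_branch({"feature/late-shipments", "master"}, fallback) == "feature/late-shipments"
```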

Data Lineage is the explainer surface for all of this. The docs describe it as an interactive tool for holistically viewing how data flows through the platform. In practice, it is the control tower for understanding how a change anywhere in the graph ripples through everything downstream.

The best analogy is software dependency tracing plus change impact analysis, but for data products. Instead of asking "What services call this API?", you ask "If I change this transform or this upstream source, which datasets, object types, dashboards, and Workshop modules become affected?"

Traditional stack problem | Foundry answer | Enterprise value
Opaque ETL jobs | Lineage graph across datasets, code, and builds | Faster root-cause analysis and safer change review
Shared production tables make experimentation risky | Code branches plus dataset branches | Parallel development without corrupting shared outputs
Security reviews happen after data is copied around | Lineage-aware security inheritance | Access control moves with the data automatically
Important nuance: dataset branches are not Git branches in every respect. The docs explicitly note that datasets do not support dataset-branch merging the way Git does. Foundry instead uses the build system and code review process to move tested logic into the main branch and then materialize new canonical data there.

Module 2: Data Integration and Ingestion

Data Connection

Foundry's public documentation frames ingestion around the Data Connection framework. That framework is designed to manage source connections over time using dataset transactions, branching, granular security, and synchronized metadata. In field language, practitioners often refer to the underlying connector and orchestration pattern as Magritte and source agents; in the product docs, the supported surface is the Data Connection application and its source-specific sync capabilities.

At a practical level, Data Connection gives Foundry a standard contract for pulling from very different sources:

JDBC / Warehouses
ERP databases, SQL Server, PostgreSQL, Oracle, Snowflake, and similar sources are pulled into Foundry as datasets or virtualized for downstream use.
REST APIs
External REST systems can be synced from, or exported to, via Data Connection and external transforms, especially for scheduled pull or push workflows.
Object Stores and Files
S3 and other file-based sources map naturally onto dataset transactions such as snapshot mirrors or append-only mirrors.

A good analogy is a managed loading dock for the enterprise. The loading dock does not care whether the incoming goods arrived by truck, ship, or rail. It standardizes intake, manifests, timestamps, security checks, and hand-off into the warehouse. Data Connection plays that role for source systems.

Syncs and Schedules

Foundry distinguishes between getting data in once and getting data in reliably over time. That sounds obvious, but it is where many data platforms quietly fail. A one-off extract is not a data product. A repeatable sync with lineage, permissions, and clear transaction semantics is.

Pattern | How it works | When to use it
Direct or manual ingestion | Upload or one-time import into a dataset resource | Bootstrapping, prototypes, ad hoc investigations
Scheduled sync | Recurring ingestion that lands new dataset transactions over time | Operational production pipelines
Virtual access | Expose source data without full replication in some cases | Latency-sensitive or governance-constrained access patterns
External transforms | Code-driven scheduled API interaction using Code Repositories | Custom REST ingestion and outbound integration workflows

What lands in the Foundry filesystem is usually a raw dataset. That raw dataset is intentionally close to source reality. You do not want business logic hidden in the loading step. The source system should remain auditable, and the refinement should happen downstream in transformations.

Why Foundry insists on explicit landing zones: separating raw ingestion from curated transformation makes incident response, replay, backfills, and governance substantially easier. When a downstream KPI breaks, teams can verify whether the issue came from source extraction, transformation logic, or ontology mapping.

Transaction-Aware Ingestion

The public dataset documentation is explicit that ingestion style matters because it determines downstream pipeline behavior: a snapshot-style mirror forces full recomputation downstream, while append-only ingestion lets downstream pipelines stay incremental.

If you are building a serious production pipeline, the ingestion mode is not just a connector setting. It is an architectural decision that determines cost profile, latency, and whether downstream pipelines can remain incremental.
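As a concrete illustration of that decision, the two most common transaction styles can be modeled in a few lines of plain Python. This is a toy model of the semantics, not a Foundry API: a SNAPSHOT transaction replaces the dataset's full file view, while an APPEND transaction only extends it.

```python
# Toy model (not a Foundry API) of the two most common transaction types.
def apply_transaction(files, txn_type, new_files):
    if txn_type == "SNAPSHOT":
        return list(new_files)            # full replacement: downstream must recompute
    if txn_type == "APPEND":
        return files + list(new_files)    # additive: downstream can stay incremental
    raise ValueError(f"unsupported transaction type: {txn_type}")


files = []
files = apply_transaction(files, "APPEND", ["day1.parquet"])
files = apply_transaction(files, "APPEND", ["day2.parquet"])
assert files == ["day1.parquet", "day2.parquet"]

# A snapshot mirror discards the prior file view entirely.
files = apply_transaction(files, "SNAPSHOT", ["full_extract.parquet"])
assert files == ["full_extract.parquet"]
```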

Module 3: Data Transformation in Code Repositories

The Python Transforms API

Foundry's transforms.api is the contract between your Python code and Foundry's build system. This is what turns a PySpark function into a governed pipeline step with declared inputs, outputs, lineage, checks, preview support, branching, and scheduling.

That declaration step is the important difference from generic PySpark. In open Spark, you can read anything and write anywhere as long as the cluster permits it. In Foundry, you declare the data contract up front so the platform can reason about lineage, impact, permissions, and builds.

Copy-Pasteable PySpark Transform

from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output


@transform_df(
    Output("/Acme/SupplyChain/curated/factory_part_demand"),
    shipments=Input("/Acme/SupplyChain/raw/erp_shipments"),
    part_master=Input("/Acme/SupplyChain/master/parts"),
)
def compute(shipments, part_master):
    # Enrich raw ERP shipments with part master data, then derive the
    # operational fields planners care about: due date, lateness, open value.
    return (
        shipments
        .join(part_master, on="part_id", how="left")
        .withColumn("required_date", F.to_date("required_timestamp"))
        .withColumn("late_flag", F.col("promised_timestamp") > F.col("required_timestamp"))
        .withColumn("open_value_usd", F.round(F.col("quantity") * F.col("unit_cost_usd"), 2))
        .select(
            "shipment_id",
            "factory_id",
            "part_id",
            "part_description",
            "required_date",
            "quantity",
            "open_value_usd",
            "late_flag",
        )
    )

This snippet uses the standard Foundry wrapper style: from transforms.api import transform_df, Input, Output. Because the inputs and the output are declared up front, the platform knows exactly which datasets this step reads, which dataset it produces, and how to wire the step into lineage, permissions, and scheduled builds.

Why Incremental Processing Matters

Official Foundry documentation is direct here: incremental pipelines avoid recomputing unchanged data and are often necessary when input scale is high. If your transaction logs are growing by millions of rows per day, full snapshots are a tax you will keep paying forever.

The analogy is simple. A nightly batch rebuild is like recalculating every bank account in the country because one customer made a deposit. Incremental processing instead says: process the new deposit, update the affected state, and move on.

Incremental PySpark Transform

from pyspark.sql import functions as F
from transforms.api import (
    incremental,
    transform,
    Input,
    Output,
    IncrementalTransformInput,
)


@incremental()
@transform(
    risk_scores=Output("/Acme/AML/curated/transaction_risk_scores"),
    transactions=Input("/Acme/AML/raw/daily_transactions"),
    customers=Input("/Acme/AML/master/customers"),
)
def compute(ctx, risk_scores, transactions: IncrementalTransformInput, customers):
    # Read only the rows added by new transactions since the last build.
    new_transactions = transactions.dataframe("added")

    # On an incremental run with no new rows, skip the build entirely.
    if ctx.is_incremental and new_transactions.rdd.isEmpty():
        return

    customer_df = customers.dataframe()

    scored = (
        new_transactions
        .join(customer_df, on="customer_id", how="left")
        .withColumn(
            "risk_score",
            F.when(F.col("amount_usd") >= 10000, F.lit(0.60)).otherwise(F.lit(0.10))
            + F.when(F.col("high_risk_country") == F.lit(True), F.lit(0.25)).otherwise(F.lit(0.00))
            + F.when(F.col("pep_flag") == F.lit(True), F.lit(0.15)).otherwise(F.lit(0.00))
        )
        .withColumn(
            "risk_bucket",
            F.when(F.col("risk_score") >= 0.75, F.lit("HIGH")).otherwise(F.lit("STANDARD"))
        )
        .select(
            "transaction_id",
            "customer_id",
            "booking_date",
            "amount_usd",
            "risk_score",
            "risk_bucket",
        )
    )

    risk_scores.write_dataframe(scored)

This example uses IncrementalTransformInput directly and reads only the added window from the transactions input, which is the exact capability documented in the API reference. That is what keeps the transform proportional to new data instead of proportional to total historical data.

Architectural warning: incremental pipelines are powerful but more fragile. Foundry's docs explicitly note that you must understand transaction behavior and be resilient to upstream SNAPSHOT or UPDATE events. Use incremental logic when the volume justifies the added complexity.
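The warning above can be made concrete with a plain-Python sketch (not the transforms runtime): an incremental build stays proportional to new APPEND transactions, but a single upstream SNAPSHOT invalidates previously processed state and forces a full recompute over the whole history.

```python
# Plain-Python sketch (not the transforms runtime) of incremental build planning.
def plan_build(upstream_txns, last_processed):
    """Decide whether the next build can run incrementally and how much it reads."""
    new = upstream_txns[last_processed:]
    if any(t["type"] == "SNAPSHOT" for t in new):
        # An upstream snapshot invalidates prior state: recompute everything.
        return {"mode": "full", "rows": sum(t["rows"] for t in upstream_txns)}
    # Only appends since the last build: process just the delta.
    return {"mode": "incremental", "rows": sum(t["rows"] for t in new)}


txns = [{"type": "APPEND", "rows": 1_000_000},
        {"type": "APPEND", "rows": 5_000}]
assert plan_build(txns, last_processed=1) == {"mode": "incremental", "rows": 5_000}

txns.append({"type": "SNAPSHOT", "rows": 1_200_000})
assert plan_build(txns, last_processed=2) == {"mode": "full", "rows": 2_205_000}
```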

Code Repositories vs Pipeline Builder

Both are first-class. The wrong move is treating one as "for engineers" and the other as "for non-engineers." The real decision is about control, complexity, and maintainability.

Choose this | Best for | Trade-off
Code Repositories | Complex business logic, PySpark, custom libraries, tests, code review, reusable engineering standards | Higher engineering overhead, slower for simple mappings
Pipeline Builder | Fast delivery, visual composition, common joins/filters, streaming and batch patterns, lower-code delivery | Less expressive for specialized logic or heavy software-engineering workflows

A useful heuristic: if the logic needs unit tests, shared libraries, or formal code review, start in Code Repositories; if it is mostly standard joins, filters, and mappings, Pipeline Builder will usually ship faster.

Foundry-specific value: whichever authoring surface you choose, the platform still gives you the same underlying benefits: builds, branch-aware execution, lineage, health checks, scheduling, and permission-aware data products.

Module 4: The Ontology

Objects, Links, and Properties

The Ontology is the heart of Foundry because it changes the question from "What tables do we have?" to "What parts of the business are we representing, and how do they relate?" Public documentation describes the Ontology as an operational layer sitting on top of datasets, models, and other digital assets, connecting them to their real-world counterparts.

Object Type
A business entity such as Factory, Part, Supplier, Customer, Case, or Transaction.
Property
A field on that entity: status, amount, capacity, priority, owner, or date.
Link Type
A typed relationship between objects: Factory consumes Part, Customer owns Account, Transaction belongs to Case.

The conceptual shift is from row-oriented thinking to domain-oriented thinking. In a warehouse, a user may need to remember that fact_shipments.factory_id = dim_factory.id. In the Ontology, a user works with a Factory object that already knows its related Shipment, Supplier, or Part objects. The join logic becomes part of the platform's semantic contract rather than tribal knowledge in SQL.

If datasets are the refined fuel, the Ontology is the engine block. It gives the business a machine-readable representation of how the organization actually works.
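The shift from join-oriented to object-oriented access can be illustrated with a toy model in plain Python. This is not the OSDK; the Factory and Shipment types here are illustrative only. The point is that link traversal replaces remembering join keys.

```python
# Toy illustration (plain Python, not the OSDK) of object-oriented access:
# the "Factory has Shipment" link replaces knowledge of join columns.
from dataclasses import dataclass, field


@dataclass
class Shipment:
    shipment_id: str
    late: bool


@dataclass
class Factory:
    factory_id: str
    shipments: list = field(default_factory=list)  # link to related Shipment objects

    def late_shipments(self):
        # Callers traverse the link; no fact/dim join keys involved.
        return [s for s in self.shipments if s.late]


factory = Factory("F-001")
factory.shipments = [Shipment("S-1", late=False), Shipment("S-2", late=True)]
assert [s.shipment_id for s in factory.late_shipments()] == ["S-2"]
```

In the real Ontology this relationship knowledge lives in link types and is shared by every consumer, rather than being re-encoded in each SQL query.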

Why Foundry Pushes the Ontology So Hard

The short answer is reuse and writeback. Once an entity is modeled as an object type, every application, analysis, and action shares the same definitions, links, and permissions instead of re-deriving them from tables, and decisions made in applications can be written back into that shared model.

Actions and Functions

Foundry distinguishes between action types and functions. Action types are the user-facing, governed transaction surface. Functions are server-side logic units that can compute values, return object sets, or generate Ontology edits. When an edit function is wired into an action type, users can safely write decisions back into the Ontology.

That is why Foundry applications are more than dashboards. A planner can change the state of a shipment. An investigator can escalate a transaction. An operations lead can assign ownership. The logic is centralized, audited, and permissioned.

Function-Backed Ontology Action Example

import { Shipment } from "@ontology/sdk";
import { Client, Osdk } from "@osdk/client";
import { createEditBatch, Edits } from "@osdk/functions";

type ShipmentEdit = Edits.Object<Shipment>;

export default function requestExpedite(
    client: Client,
    shipment: Osdk.Instance<Shipment>,
    requestedBy: string,
    reason: string,
): ShipmentEdit[] {
    const batch = createEditBatch<ShipmentEdit>(client);

    batch.update(shipment, {
        status: "EXPEDITE_REQUESTED",
        expediteRequestedBy: requestedBy,
        expediteReason: reason,
        expediteRequestedAt: new Date().toISOString(),
    });

    return batch.getEdits();
}

This snippet follows the official TypeScript v2 functions pattern for Ontology edits: define an Edits type, create an edit batch with createEditBatch, update the object, and return the edits. Per the docs, the edits only take effect when the function is configured as a function-backed action.

Where the security comes from: not from burying checks in frontend code. The action type can enforce who may run it, what parameters are required, what submission criteria apply, and what side effects or validations should happen. That is the enterprise advantage over hand-built CRUD screens.

Read-Oriented Python Function Example

from functions.api import function
from ontology_sdk import FoundryClient
from ontology_sdk.ontology.objects import Transaction
from ontology_sdk.ontology.object_sets import TransactionObjectSet


@function
def high_risk_transactions(min_score: float) -> TransactionObjectSet:
    client = FoundryClient()
    return client.ontology.objects.Transaction.where(
        Transaction.object_type.riskScore >= min_score
    )

Use read-oriented functions like this when you need server-side logic for Workshop, Quiver, or other operational interfaces. Use edit-returning functions when the workflow needs governed writeback.

Module 5: Analytics and Operational Applications

Code Workspaces

Code Workspaces gives users managed JupyterLab, RStudio, and VS Code environments inside Foundry. The public docs emphasize that these workspaces inherit Foundry's security, permissions, branching, scheduling, and repository infrastructure.

For data scientists, the value is straightforward: work in a familiar notebook or IDE, but on the same governed datasets and object model as the rest of the platform. No shadow copy. No side channel. No unsecured export just to train a model.

Why this matters: in many organizations, notebooks become a compliance escape hatch. Code Workspaces turns them into a first-class governed surface backed by Code Repositories and platform security.

Contour and Quiver

Contour and Quiver are both analysis tools, but they sit on different mental models.

Tool | Best mental model | Best use case
Contour | Table-centric, point-and-click analysis at scale | Large tabular analysis, dataset derivation, low-code transformations, dashboards over tables
Quiver | Object-aware and time-series-aware analytics | Ontology-driven analysis, linked-object exploration, operational dashboards, time-series workflows

A simple analogy: Contour is closer to a governed, scalable spreadsheet-plus-query environment for tables. Quiver is closer to an operational analytics canvas where objects and signals are native citizens.

Workshop

Workshop is Foundry's operational application builder. The docs describe it as a flexible, object-oriented application building tool that uses Ontology objects, links, actions, and functions as first-class building blocks. That is the key distinction: it is not merely a dashboard builder.

Think of Workshop as the last mile between a governed digital twin and the humans who need to operate the business. A CRM, an alert triage desk, a maintenance queue, a parts shortage cockpit, and a fraud review inbox are all natural Workshop workloads.

What Workshop gives you
Layouts, widgets, events, object-aware views, actions, and function-backed logic, all aligned to the Ontology and the platform design system.
What it saves you from
Rebuilding auth, access control, search, writeback rules, object joins, and frontend infrastructure for every operational app.

Why Foundry does this differently: in a traditional stack, the BI layer is read-only and the operational app is a separate engineering program. Foundry tries to collapse that gap so that analytics and operations run on the same semantic and governance substrate.

Module 6: Security and Governance

Markings and Mandatory Controls

Foundry's security model is built around the idea that access control should travel with the data, not be bolted onto the final dashboard. The docs frame this as a combination of mandatory controls and discretionary controls.

The public docs are especially clear that markings inherit both through the file hierarchy and through data dependencies. That means a sensitive upstream dataset can automatically impose additional data requirements on downstream derivatives.

Raw PII dataset -> Curated customer dataset -> Ontology object -> Workshop app

If the raw dataset carries a PII marking, that constraint propagates unless it is deliberately and correctly removed as part of an approved transformation stage. This is exactly why compliance teams like the platform. You do not have to hope every downstream analyst remembered the sensitivity level. The platform enforces it.
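A minimal sketch of that inheritance rule, assuming markings behave as sets that union along data dependencies. This illustrates the documented behavior; it is not Foundry's implementation.

```python
# Sketch (not the Foundry implementation) of marking inheritance along data
# dependencies: a derived resource carries the union of its upstream markings
# unless a governed step explicitly removes one.
def effective_markings(own_markings, upstream_markings, removed=frozenset()):
    inherited = set()
    for upstream in upstream_markings:
        inherited |= set(upstream)
    return (inherited | set(own_markings)) - set(removed)


raw_pii = {"PII"}
# A curated derivative inherits the PII marking automatically:
curated = effective_markings(set(), [raw_pii])
assert curated == {"PII"}
# An approved anonymization step may deliberately strip the marking:
anonymized = effective_markings(set(), [raw_pii], removed={"PII"})
assert anonymized == set()
```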

CBAC and PBAC in Practice

In customer conversations you will often hear CBAC and PBAC. Foundry's public docs emphasize markings, organizations, roles, restricted views, and additional data requirements more than those acronyms, but the enterprise interpretation is usually:

Model | How to think about it in Foundry | Typical implementation surface
CBAC | Classification-based access control: access depends on the sensitivity classification attached to data. | Markings, organizations, inherited data requirements, project boundaries
PBAC | Purpose-based access control: access is constrained to approved workflows and legitimate business purpose. | Project roles, application-specific access, action permissions, functions, restricted views, policy-driven workflow design

In other words, CBAC answers "What classification is this data?" PBAC answers "Even if I can see it, what am I allowed to do with it in this workflow?" Foundry's advantage is that both questions can be enforced inside the same lineage-aware platform.
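A hedged sketch of the two questions layered together, using hypothetical helper functions rather than a Foundry API: the first gate is classification-based visibility, the second is purpose-bound capability.

```python
# Hypothetical sketch (not a Foundry API) of CBAC and PBAC layered together.
def can_view(user_markings, resource_markings):
    # Mandatory control: the user must hold every marking on the resource.
    return set(resource_markings) <= set(user_markings)


def can_act(user_roles, action_required_roles):
    # Purpose control: the action itself demands a workflow-specific role.
    return bool(set(user_roles) & set(action_required_roles))


analyst = {"markings": {"PII"}, "roles": {"viewer"}}
resource_markings = {"PII"}
assert can_view(analyst["markings"], resource_markings)       # may see the data...
assert not can_act(analyst["roles"], {"aml-investigator"})    # ...but may not escalate a case
```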

Enterprise value: most data platforms secure storage, then leave downstream applications to reinvent authorization. Foundry extends security into datasets, ontology objects, actions, functions, and applications so governance does not disappear at the exact moment humans start making decisions.

Automatic Propagation Through Lineage

The docs explicitly state that restricting access to a dataset restricts access to downstream derived data because markings inherit along data dependencies. That propagation is one of the most important reasons Foundry commands enterprise budget: governance scales with the pipeline instead of with manual policing.

Security review becomes much more legible too. The access checker and Data Lineage views let teams inspect not only whether a user has access to a resource, but whether they meet additional data requirements inherited from lineage.

Module 7: Real-World Use Cases

Scenario 1: Supply Chain Command Center

Goal: ingest ERP and logistics data, produce operational Factory and Part objects, and let planners trigger an Expedite Shipment action from a Workshop application.

ERP / WMS / Supplier feeds -> Raw datasets -> PySpark transforms -> Factory + Part objects -> Workshop command center

How the pieces combine

Data Connection lands the ERP, WMS, and supplier feeds as raw datasets. PySpark transforms in Code Repositories curate them into demand and shipment datasets, which back Factory, Part, and Shipment object types. A Workshop command center then gives planners object-aware search, linked views, and governed actions on top of that model.

Why Foundry is strong here

In a conventional stack, the command center is often a fragile front-end project sitting on top of replicated warehouse views and custom service endpoints. In Foundry, the app can directly use object-aware search, links, actions, and security on top of the Ontology.

Representative action

A Workshop button calls a function-backed action similar to the requestExpedite example above. That action can require planner permissions, enforce that only at-risk shipments are eligible, and write the decision into the shipment object's writeback dataset so the whole organization sees the new state.

Scenario 2: Anti-Money Laundering Alerting

Goal: process daily transaction logs incrementally, score transactions in a model workflow, and surface high-risk Transaction objects in an investigator inbox.

Core banking logs -> Append-only raw transactions -> Incremental risk transform -> Model scoring -> Transaction objects + inbox app

How the pieces combine

Core banking logs land as append-only raw transactions, an incremental transform derives features for only the newly added records, a model scores them, and high-risk results are mapped into Transaction objects that surface in an investigator inbox built in Workshop.

Where governance matters most

AML is a textbook case for lineage-aware security. Case data may require investigation-specific markings so one case team cannot casually inspect another case's evidence. The documentation's markings examples explicitly highlight case-based access control as a strong use case.

Representative model workflow

1. Land daily transactions as APPEND dataset transactions.
2. Run an incremental transform to derive features only for newly added records.
3. Score the new feature set from Code Workspaces or a model integration workflow.
4. Publish high-risk records to a curated dataset and map them into Transaction objects.
5. Surface those objects in a Workshop inbox.
6. Let investigators trigger actions such as Open Case, Escalate, or Dismiss with Reason.

The result is not just "a fraud dashboard." It is a governed operational system that unifies ingestion, scoring, review, and decision capture.

Foundry Patterns to Remember

Raw first, semantics later
Land data close to source truth, curate in transforms, then map to the Ontology. Do not hide business logic inside ingestion.
Branch both logic and outputs
Use Foundry branching to test not only code changes, but the resulting data products and downstream impact safely.
Use the Ontology for reuse
If three dashboards need the same business entity, model it once as an object type rather than re-implementing joins three times.
Security travels with data
Markings and lineage inheritance are not bureaucracy. They are what make it possible to operationalize sensitive data without constant manual policing.
