PostgreSQL: Intermediate to Advanced Developer Handbook

Production-focused PostgreSQL patterns for engineers moving beyond CRUD into advanced joins, ranking, recursive queries, stored procedures, PL/pgSQL functions, and JSONB-heavy workloads.

PostgreSQL 16+ · Advanced SQL · Window Functions · PL/pgSQL · JSONB · April 2026
ℹ
Schema theme: the examples keep using an e-commerce platform, but at this level the focus is operational reporting, hierarchical data, programmable database logic, and semi-structured payloads.

Module 1: Advanced Joins & Set Operators

At the intermediate-to-advanced level, joins go beyond simple INNER and LEFT. This module covers join patterns that solve real operational problems — hierarchical data, parameterized subqueries, combinatorial grids, and result-set algebra.

SELF JOIN — Hierarchical & Intra-Table Relationships

A self join connects a table to itself using two aliases. It is the natural pattern for any data that has a parent-child relationship stored in the same table — org charts, category trees, referral chains, or bill-of-materials structures.

-- Employee hierarchy: who reports to whom?
SELECT
    e.employee_id,
    e.employee_name  AS employee,
    e.department,
    m.employee_name  AS direct_manager,
    m.department     AS manager_department
FROM hr.employees AS e
LEFT JOIN hr.employees AS m
    ON e.manager_id = m.employee_id
ORDER BY m.employee_name NULLS FIRST, e.employee_name;

Finding peers — employees who share the same manager:

SELECT
    a.employee_name AS peer_a,
    b.employee_name AS peer_b,
    m.employee_name AS shared_manager
FROM hr.employees AS a
INNER JOIN hr.employees AS b
    ON a.manager_id = b.manager_id
    AND a.employee_id < b.employee_id          -- avoid duplicates & self-pairs
INNER JOIN hr.employees AS m
    ON m.employee_id = a.manager_id
ORDER BY shared_manager, peer_a;
💡
The a.employee_id < b.employee_id trick eliminates both self-pairs (Alice–Alice) and mirror duplicates (Alice–Bob / Bob–Alice). Without it, the result is cluttered and misleading.

Detecting referral chains in an e-commerce context:

-- Users who referred other users (single level)
SELECT
    referrer.user_id   AS referrer_id,
    referrer.full_name AS referrer_name,
    referred.user_id   AS referred_id,
    referred.full_name AS referred_name,
    referred.created_at AS signup_date
FROM sales.users AS referred
INNER JOIN sales.users AS referrer
    ON referred.referred_by_user_id = referrer.user_id
ORDER BY referrer.full_name, referred.created_at;

CROSS JOIN — Intentional Cartesian Products

A cross join produces every combination of rows from two sources. It is dangerous on large tables (M × N rows), but deliberately powerful for generating grids, scaffolds, and gap-detection frameworks from small reference data.

-- Shipping price grid: every region × every tier
SELECT
    region.region_name,
    tier.tier_name,
    region.base_rate + tier.surcharge AS estimated_rate
FROM (VALUES
    ('North America', 5.00), ('Europe', 8.50), ('APAC', 12.00)
) AS region(region_name, base_rate)
CROSS JOIN (VALUES
    ('Standard', 0.00), ('Express', 4.50), ('Overnight', 9.00)
) AS tier(tier_name, surcharge)
ORDER BY region.region_name, estimated_rate;

Gap detection scaffold — cross join a user list with a month series, then left join actual data to find zeros:

WITH months AS (
    SELECT generate_series(
        DATE '2026-01-01',
        DATE '2026-06-01',
        INTERVAL '1 month'
    )::DATE AS month_start
),
active_users AS (
    SELECT user_id, full_name
    FROM sales.users
    WHERE is_active = TRUE
)
SELECT
    au.user_id,
    au.full_name,
    m.month_start,
    COUNT(o.order_id)              AS order_count,
    COALESCE(SUM(o.total_amount), 0) AS month_revenue
FROM active_users AS au
CROSS JOIN months AS m
LEFT JOIN sales.orders AS o
    ON  o.user_id = au.user_id
    AND DATE_TRUNC('month', o.order_date) = m.month_start
    AND o.order_status IN ('paid', 'shipped', 'delivered')
GROUP BY au.user_id, au.full_name, m.month_start
ORDER BY au.user_id, m.month_start;
⚠
Guard rail: never cross join two production tables without thinking about cardinality. 10K × 10K = 100 million rows. Use CROSS JOIN only with small reference sets, VALUES lists, or generate_series output.

LATERAL — The Parameterized Subquery Join

LATERAL lets a subquery reference columns from the outer query row. PostgreSQL evaluates the subquery once per outer row, making it the SQL equivalent of a correlated for-each loop. This is the cleanest pattern for "top-N per group," derived slices, and parameterized lookups.

Pattern 1 — Latest N Orders Per User

SELECT
    u.user_id,
    u.full_name,
    recent.order_id,
    recent.order_date,
    recent.total_amount
FROM sales.users AS u
LEFT JOIN LATERAL (
    SELECT o.order_id, o.order_date, o.total_amount
    FROM sales.orders AS o
    WHERE o.user_id = u.user_id
    ORDER BY o.order_date DESC
    LIMIT 3
) AS recent
    ON TRUE
ORDER BY u.user_id, recent.order_date DESC;

ON TRUE is required because the filtering already happened inside the lateral subquery. With LEFT JOIN LATERAL, users who have zero orders still appear (with NULL columns from recent).

Pattern 2 — Inline Aggregation Per Row

-- For each product category, get aggregated order stats
SELECT
    c.category_id,
    c.category_name,
    stats.total_orders,
    stats.total_revenue,
    stats.avg_order_value
FROM catalog.categories AS c
LEFT JOIN LATERAL (
    SELECT
        COUNT(*)                    AS total_orders,
        COALESCE(SUM(oi.line_total), 0) AS total_revenue,
        ROUND(AVG(oi.line_total), 2)    AS avg_order_value
    FROM sales.order_items AS oi
    INNER JOIN catalog.products AS p
        ON p.product_id = oi.product_id
    WHERE p.category_id = c.category_id
) AS stats
    ON TRUE
ORDER BY stats.total_revenue DESC NULLS LAST;

Pattern 3 — LATERAL with unnest for Array Expansion

-- Expand a JSONB array of tags stored on each product
SELECT
    p.product_id,
    p.product_name,
    tag.value AS tag
FROM catalog.products AS p
LEFT JOIN LATERAL jsonb_array_elements_text(p.tags) AS tag(value)
    ON TRUE
ORDER BY p.product_id, tag;
ℹ
LATERAL vs. correlated subquery in SELECT: a correlated subquery in the SELECT list can return only one value. LATERAL can return multiple rows and multiple columns, making it far more flexible for set-returning operations.
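For contrast, here is the scalar correlated-subquery form the note describes, sketched against the same example schema used above. It can surface only a single value per outer row:

```sql
-- Scalar correlated subquery: limited to ONE value per outer row
SELECT
    u.user_id,
    u.full_name,
    (SELECT MAX(o.order_date)
     FROM sales.orders AS o
     WHERE o.user_id = u.user_id) AS last_order_date   -- one column, one row
FROM sales.users AS u;
-- To also return that order's id and amount, you would need three separate
-- correlated subqueries -- or one LATERAL subquery, as in Pattern 1.
```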

Set Operators — Result-Set Algebra

Set operators combine or compare entire result sets vertically. Both queries must produce the same number of columns with compatible data types. Column names come from the first query.

Operator      | Returns                                         | Duplicates        | Performance Note
--------------|-------------------------------------------------|-------------------|---------------------------------
UNION         | All unique rows from both queries               | Removed           | Implicit sort or hash — extra work
UNION ALL     | All rows from both queries                      | Kept              | No dedup — fastest
INTERSECT     | Rows common to both queries                     | Removed           | Hash or sort join
INTERSECT ALL | Common rows, preserving duplicate count         | Kept (min count)  | Less common but useful
EXCEPT        | Rows in query 1 not in query 2                  | Removed           | Directional — order matters
EXCEPT ALL    | Rows in query 1 not in query 2, preserving counts | Kept (subtracted) | Useful for audit diffs

UNION vs. UNION ALL

-- Deduplicated email list across two sources
SELECT email FROM sales.users
UNION
SELECT email FROM sales.marketing_subscribers
ORDER BY email;
-- Keep all rows (e.g., event log merge from regional shards)
SELECT order_id, event_type, created_at FROM sales.order_events_na
UNION ALL
SELECT order_id, event_type, created_at FROM sales.order_events_eu
ORDER BY created_at DESC;

INTERSECT — Finding Overlap

-- Customers who are also newsletter subscribers
SELECT email FROM sales.users
INTERSECT
SELECT email FROM sales.marketing_subscribers;

EXCEPT — Finding Differences

-- Subscribers who have NOT created an account
SELECT email FROM sales.marketing_subscribers
EXCEPT
SELECT email FROM sales.users;

Advanced: Combining Set Operators

You can chain set operators and control precedence with parentheses. INTERSECT binds tighter than UNION by default, but explicit grouping is always clearer:

-- Users who bought in Q1 but NOT in Q2
(
    SELECT DISTINCT user_id
    FROM sales.orders
    WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
)
EXCEPT
(
    SELECT DISTINCT user_id
    FROM sales.orders
    WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30'
);
-- Users who bought in BOTH Q1 AND Q2
(
    SELECT DISTINCT user_id
    FROM sales.orders
    WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
)
INTERSECT
(
    SELECT DISTINCT user_id
    FROM sales.orders
    WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30'
);
💡
Performance: UNION ALL is always cheaper than UNION. Prefer UNION ALL when you know the sources are already distinct, or when duplicates are semantically correct (e.g., event logs). Use EXCEPT over anti-joins when readability matters more than fine-tuned index usage.
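For reference, the anti-join form the tip alludes to — a NOT EXISTS rewrite of the "subscribers without accounts" EXCEPT query above. Note the semantics differ slightly: EXCEPT deduplicates and treats NULLs as equal, while NOT EXISTS keeps duplicates and a NULL email never matches.

```sql
-- Anti-join form: more verbose, but can use an index on users(email)
SELECT s.email
FROM sales.marketing_subscribers AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM sales.users AS u
    WHERE u.email = s.email
);
```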

Module 2: Window Functions & Ranking

The OVER() clause defines an analytical window. Unlike GROUP BY, which collapses many rows into one summary row, a window function keeps each original row and attaches a calculation beside it. This module covers the ranking family, offset navigation, running aggregates, and distribution functions.

The Anatomy of OVER()

Clause        | Purpose                                                      | Example
--------------|--------------------------------------------------------------|--------------------------------------------------
PARTITION BY  | Splits rows into groups (like GROUP BY, but without collapsing) | PARTITION BY user_id
ORDER BY      | Determines sequence within each partition                    | ORDER BY total_amount DESC
Frame clause  | Defines the sliding window of rows for aggregate functions   | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Ranking Functions — Deep Dive

ROW_NUMBER() — Unique Sequence

ROW_NUMBER() assigns a unique integer per row within its partition, even when ties exist. This is the go-to for "pick exactly one row per group" (deduplication, latest record per entity).

-- Most recent order per user (dedup pattern)
WITH ranked AS (
    SELECT
        user_id,
        order_id,
        order_date,
        total_amount,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY order_date DESC
        ) AS rn
    FROM sales.orders
)
SELECT user_id, order_id, order_date, total_amount
FROM ranked
WHERE rn = 1
ORDER BY order_date DESC;

RANK() — Gaps After Ties

RANK() gives tied rows the same number, then skips. Two rows tied for rank 1 means the next row gets rank 3. Use this when the competitive gap matters (leaderboards, contest standings).

SELECT
    user_id,
    order_id,
    total_amount,
    RANK() OVER (
        PARTITION BY user_id
        ORDER BY total_amount DESC
    ) AS spending_rank
FROM sales.orders
ORDER BY user_id, spending_rank;

DENSE_RANK() — No Gaps

DENSE_RANK() keeps consecutive numbers even with ties. Two rows tied for rank 1 means the next row gets rank 2. Best when you need “how many distinct tiers exist” rather than absolute position.

SELECT
    user_id,
    order_id,
    total_amount,
    DENSE_RANK() OVER (
        PARTITION BY user_id
        ORDER BY total_amount DESC
    ) AS dense_rank_val
FROM sales.orders
ORDER BY user_id, dense_rank_val;

Side-by-Side Comparison

SELECT
    user_id,
    order_id,
    total_amount,
    ROW_NUMBER() OVER w AS row_num,
    RANK()       OVER w AS rank_val,
    DENSE_RANK() OVER w AS d_rank_val
FROM sales.orders
WINDOW w AS (PARTITION BY user_id ORDER BY total_amount DESC)
ORDER BY user_id, total_amount DESC;
total_amount | ROW_NUMBER | RANK | DENSE_RANK
-------------|------------|------|-----------
500.00       | 1          | 1    | 1
500.00       | 2          | 1    | 1
300.00       | 3          | 3    | 2
150.00       | 4          | 4    | 3
💡
Named WINDOW clause: WINDOW w AS (...) defines the window once and reuses it across multiple functions — eliminating repetition and keeping the query DRY.

NTILE(n) — Percentile Buckets

NTILE(n) divides a partition into n roughly equal buckets. Use it for percentile segmentation — customer quartiles, order deciles, or performance tiers.

-- Customer spending quartiles
SELECT
    u.user_id,
    u.full_name,
    totals.lifetime_spent,
    NTILE(4) OVER (ORDER BY totals.lifetime_spent DESC) AS quartile
FROM sales.users AS u
INNER JOIN (
    SELECT user_id, SUM(total_amount) AS lifetime_spent
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
    GROUP BY user_id
) AS totals
    ON totals.user_id = u.user_id
ORDER BY quartile, totals.lifetime_spent DESC;

Offset Functions — LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()

LAG() and LEAD()

These functions peek at neighboring rows without a self-join. LAG looks backward, LEAD looks forward. They are essential for change analysis — week-over-week growth, time gaps between events, or trend detection.

WITH weekly_sales AS (
    SELECT
        DATE_TRUNC('week', order_date) AS sales_week,
        SUM(total_amount)              AS weekly_revenue
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
    GROUP BY DATE_TRUNC('week', order_date)
)
SELECT
    sales_week,
    weekly_revenue,
    LAG(weekly_revenue)  OVER w AS prev_week,
    LEAD(weekly_revenue) OVER w AS next_week,
    weekly_revenue - LAG(weekly_revenue) OVER w AS wow_change,
    ROUND(
        (weekly_revenue - LAG(weekly_revenue) OVER w)
        / NULLIF(LAG(weekly_revenue) OVER w, 0) * 100, 1
    ) AS wow_growth_pct
FROM weekly_sales
WINDOW w AS (ORDER BY sales_week)
ORDER BY sales_week;

Time gap between consecutive orders per user:

SELECT
    user_id,
    order_id,
    order_date,
    LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_date,
    order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS days_since_last_order
FROM sales.orders
ORDER BY user_id, order_date;

FIRST_VALUE() and LAST_VALUE()

These functions return the first or last value in the window frame. Useful for comparing each row against a baseline (e.g., every order vs. the customer's first order).

SELECT
    user_id,
    order_id,
    order_date,
    total_amount,
    FIRST_VALUE(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS first_order_amount,
    total_amount - FIRST_VALUE(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS diff_from_first
FROM sales.orders
ORDER BY user_id, order_date;
⚠
LAST_VALUE trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE returns the current row's value (or the last of its peers under the ORDER BY) — not the actual last row of the partition. Always specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the true last value.
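A sketch of the corrected usage against the same assumed sales.orders table — without the explicit frame, this query would simply echo each row's own amount back:

```sql
-- LAST_VALUE behaves as expected only with an explicit full frame
SELECT
    user_id,
    order_id,
    total_amount,
    LAST_VALUE(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS latest_order_amount
FROM sales.orders
ORDER BY user_id, order_date;
```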

Running Aggregates

An aggregate inside OVER() creates a running calculation. Cumulative revenue, running counts, and moving averages — all without self-joins.

SELECT
    order_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_revenue,
    AVG(total_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS moving_avg_7_orders
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
ORDER BY order_date;

Distribution Functions

PERCENT_RANK() and CUME_DIST() tell you where a row falls relative to the entire partition — useful for percentile-based reporting and outlier detection.

SELECT
    user_id,
    total_amount,
    ROUND(PERCENT_RANK() OVER (ORDER BY total_amount)::NUMERIC, 3) AS pct_rank,
    ROUND(CUME_DIST()    OVER (ORDER BY total_amount)::NUMERIC, 3) AS cumulative_dist
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
ORDER BY total_amount DESC;
ℹ
PERCENT_RANK returns 0 for the first row and scales to 1. CUME_DIST returns the fraction of rows with values ≤ the current row. Use PERCENT_RANK for "what percentile am I in?" and CUME_DIST for "what fraction of records are at or below me?"
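The two formulas, worked on a tiny self-contained VALUES list so the outputs are easy to verify by hand:

```sql
-- PERCENT_RANK = (rank - 1) / (total_rows - 1)
-- CUME_DIST    = rows with value <= current row / total_rows
SELECT
    amount,
    PERCENT_RANK() OVER (ORDER BY amount) AS pct_rank,   -- 0, 1/3, 2/3, 1
    CUME_DIST()    OVER (ORDER BY amount) AS cume_dist   -- 0.25, 0.5, 0.75, 1
FROM (VALUES (100), (200), (300), (400)) AS t(amount);
```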

Module 3: Common Table Expressions (CTEs)

CTEs name intermediate result sets inside a single statement. At the advanced level, they become powerful tools for hierarchical traversal, data pipelines, and even multi-table mutations.

Chaining Standard CTEs

Multiple CTEs let you break a large query into named steps. Each CTE can reference the ones defined before it, creating a readable data pipeline.

WITH paid_orders AS (
    SELECT user_id, total_amount, order_date
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
),
user_order_totals AS (
    SELECT
        user_id,
        COUNT(*)           AS completed_order_count,
        SUM(total_amount)  AS lifetime_value,
        MAX(order_date)    AS last_order_date
    FROM paid_orders
    GROUP BY user_id
),
vip_users AS (
    SELECT *
    FROM user_order_totals
    WHERE lifetime_value >= 5000.00
)
SELECT
    u.user_id,
    u.full_name,
    v.completed_order_count,
    v.lifetime_value,
    v.last_order_date
FROM vip_users AS v
INNER JOIN sales.users AS u
    ON u.user_id = v.user_id
ORDER BY v.lifetime_value DESC;

CTE Materialization Hints

By default, PostgreSQL may inline a CTE or materialize it. In PostgreSQL 12+, you can control this explicitly. MATERIALIZED forces the CTE to be evaluated once and stored as a temp result. NOT MATERIALIZED allows the optimizer to inline it into the outer query.

-- Force materialization when the CTE is referenced multiple times
WITH active_users AS MATERIALIZED (
    SELECT user_id, full_name, email
    FROM sales.users
    WHERE is_active = TRUE
)
SELECT au.full_name, COUNT(o.order_id) AS order_count
FROM active_users AS au
LEFT JOIN sales.orders AS o
    ON o.user_id = au.user_id
GROUP BY au.full_name
ORDER BY order_count DESC;
💡
When to use MATERIALIZED: when the CTE is referenced multiple times, or when it acts as an optimization fence (preventing the planner from pushing filters into the CTE). Use NOT MATERIALIZED when the CTE is used once and you want the planner to optimize it as part of the outer query.
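The NOT MATERIALIZED side of the trade-off, sketched against the same example schema — here the planner is free to push the user_id filter into the CTE instead of aggregating every user first:

```sql
-- NOT MATERIALIZED: the planner may inline the CTE and filter early
WITH order_totals AS NOT MATERIALIZED (
    SELECT user_id, SUM(total_amount) AS lifetime_value
    FROM sales.orders
    GROUP BY user_id
)
SELECT user_id, lifetime_value
FROM order_totals
WHERE user_id = 42;   -- pushed down into the scan, not applied afterward
```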

Writable CTEs (DML inside WITH)

PostgreSQL allows INSERT, UPDATE, and DELETE inside CTEs. This creates powerful single-statement pipelines that archive, transform, and report in one atomic operation.

-- Archive old orders and return a summary in one statement
WITH archived AS (
    DELETE FROM sales.orders
    WHERE order_date < NOW() - INTERVAL '2 years'
      AND order_status = 'delivered'
    RETURNING order_id, user_id, total_amount, order_date
),
inserted AS (
    INSERT INTO archive.orders (order_id, user_id, total_amount, order_date)
    SELECT order_id, user_id, total_amount, order_date
    FROM archived
    RETURNING order_id
)
SELECT COUNT(*) AS archived_count FROM inserted;
⚠
Data-modifying CTEs execute exactly once per statement, even if the outer query never reads them — a CTE containing DELETE removes rows whether or not its output is referenced. Also note that all sub-statements in a WITH see the same snapshot: one CTE cannot observe another's changes except through RETURNING. Test carefully.

Recursive CTEs — Hierarchical Traversal

WITH RECURSIVE has two parts: the anchor (base case) and the recursive member (step). PostgreSQL repeatedly executes the recursive member until it returns no new rows.

Org Chart with Depth and Path

WITH RECURSIVE org_chart AS (
    -- Anchor: top-level managers (no manager)
    SELECT
        employee_id,
        employee_name,
        manager_id,
        0 AS depth,
        employee_name::TEXT AS reporting_path
    FROM hr.employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: walk down the tree
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        oc.depth + 1,
        oc.reporting_path || ' → ' || e.employee_name
    FROM hr.employees AS e
    INNER JOIN org_chart AS oc
        ON e.manager_id = oc.employee_id
)
SELECT employee_id, employee_name, depth, reporting_path
FROM org_chart
ORDER BY reporting_path;

Category Breadcrumb Trail

-- Build "Electronics → Phones → Smartphones" breadcrumbs
WITH RECURSIVE cat_path AS (
    SELECT
        category_id,
        category_name,
        parent_category_id,
        category_name::TEXT AS breadcrumb
    FROM catalog.categories
    WHERE parent_category_id IS NULL

    UNION ALL

    SELECT
        c.category_id,
        c.category_name,
        c.parent_category_id,
        cp.breadcrumb || ' → ' || c.category_name
    FROM catalog.categories AS c
    INNER JOIN cat_path AS cp
        ON c.parent_category_id = cp.category_id
)
SELECT category_id, category_name, breadcrumb
FROM cat_path
ORDER BY breadcrumb;

Generating a Number Series

-- Recursive CTE as a substitute for generate_series
WITH RECURSIVE nums AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 100
)
SELECT n FROM nums;

Cycle Detection (PostgreSQL 14+)

The CYCLE clause prevents infinite loops when data has circular references (e.g., an employee who is their own indirect manager). Before PostgreSQL 14, you had to track visited nodes manually with arrays.

WITH RECURSIVE org_chart AS (
    SELECT employee_id, employee_name, manager_id, 0 AS depth
    FROM hr.employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.employee_name, e.manager_id, oc.depth + 1
    FROM hr.employees AS e
    INNER JOIN org_chart AS oc
        ON e.manager_id = oc.employee_id
)
CYCLE employee_id SET is_cycle USING path
SELECT employee_id, employee_name, depth
FROM org_chart
WHERE NOT is_cycle
ORDER BY depth, employee_name;
ℹ
CTE performance tip: recursive CTEs can be slow on large hierarchies. Always include a depth limit (WHERE depth < 20) or use CYCLE to prevent runaway recursion. Index the join column (manager_id, parent_category_id) for best performance.
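A guarded version of the recursive member from the org-chart example above, with the hard depth stop the tip recommends (the 20-level cap is an assumed, tunable limit):

```sql
    -- Recursive member with a hard stop: recursion halts at 20 levels
    -- even if the data contains an accidental cycle
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        oc.depth + 1
    FROM hr.employees AS e
    INNER JOIN org_chart AS oc
        ON e.manager_id = oc.employee_id
    WHERE oc.depth < 20
```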

Module 4: Advanced Grouping & Pivoting

Standard GROUP BY produces one level of aggregation. Advanced grouping (ROLLUP, CUBE, GROUPING SETS) produces multiple levels in a single pass — subtotals, cross-totals, and grand totals without UNION ALL gymnastics.

ROLLUP — Hierarchical Subtotals

ROLLUP creates subtotals that roll up from right to left. ROLLUP(a, b) produces groups: (a, b), (a), (). Think of it as progressively removing the rightmost column.

SELECT
    sales_region,
    order_status,
    COUNT(*)           AS order_count,
    SUM(total_amount)  AS revenue
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;

Result rows where order_status IS NULL are region subtotals. The row where both are NULL is the grand total.

CUBE — All Dimension Combinations

CUBE generates subtotals for every possible combination. CUBE(a, b) produces: (a, b), (a), (b), (). Use it when you need cross-dimensional summaries (e.g., revenue by region, by status, and by both).

SELECT
    sales_region,
    order_status,
    SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY CUBE (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;

GROUPING SETS — Explicit Control

GROUPING SETS lets you specify exactly which groupings to compute. This is the most flexible option — ROLLUP and CUBE are syntactic shortcuts for common GROUPING SETS patterns.

SELECT
    sales_region,
    payment_method,
    SUM(total_amount)  AS revenue,
    COUNT(*)           AS order_count
FROM sales.orders
GROUP BY GROUPING SETS (
    (sales_region, payment_method),   -- detail
    (sales_region),                   -- subtotal by region
    (payment_method),                 -- subtotal by payment
    ()                                -- grand total
)
ORDER BY sales_region NULLS LAST, payment_method NULLS LAST;

The GROUPING() Function — Disambiguating NULLs

Subtotal rows have NULL in the grouped column, but real data can also be NULL. The GROUPING() function returns 1 when the NULL comes from the grouping operation (subtotal), and 0 when it is a real data NULL.

SELECT
    CASE WHEN GROUPING(sales_region) = 1 THEN '** ALL REGIONS **'
         ELSE COALESCE(sales_region, '(unknown)')
    END AS region_label,
    CASE WHEN GROUPING(order_status) = 1 THEN '** ALL STATUSES **'
         ELSE order_status
    END AS status_label,
    SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY GROUPING(sales_region), sales_region NULLS LAST,
         GROUPING(order_status), order_status NULLS LAST;
💡
Sorting subtotals last: ORDER BY GROUPING(col), col puts detail rows before subtotals and subtotals before the grand total — the natural report layout.

Pivoting with FILTER

PostgreSQL does not have a native PIVOT keyword. The idiomatic replacement is conditional aggregation using the FILTER clause, which is both readable and performant.

SELECT
    DATE_TRUNC('month', order_date)  AS sales_month,
    COUNT(*)                                           AS total_orders,
    SUM(total_amount) FILTER (WHERE order_status = 'paid')      AS paid,
    SUM(total_amount) FILTER (WHERE order_status = 'shipped')   AS shipped,
    SUM(total_amount) FILTER (WHERE order_status = 'delivered') AS delivered,
    SUM(total_amount) FILTER (WHERE order_status = 'cancelled') AS cancelled
FROM sales.orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month;

Pivoting with CASE (Portable Alternative)

CASE-based pivoting works on all SQL databases. It is slightly more verbose than FILTER but useful when you need cross-database compatibility.

SELECT
    sales_region,
    SUM(CASE WHEN payment_method = 'credit_card' THEN total_amount ELSE 0 END) AS credit_card,
    SUM(CASE WHEN payment_method = 'paypal'      THEN total_amount ELSE 0 END) AS paypal,
    SUM(CASE WHEN payment_method = 'bank_transfer' THEN total_amount ELSE 0 END) AS bank_transfer
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
GROUP BY sales_region
ORDER BY sales_region;

Dynamic Pivoting with crosstab()

The tablefunc extension provides crosstab() for cases where the pivot categories are not known at write time.

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    $$
    SELECT
        sales_region,
        order_status,
        SUM(total_amount)::NUMERIC
    FROM sales.orders
    GROUP BY sales_region, order_status
    ORDER BY 1, 2
    $$,
    $$ VALUES ('cancelled'), ('delivered'), ('paid'), ('shipped') $$
) AS pivot_table (
    sales_region TEXT,
    cancelled    NUMERIC,
    delivered    NUMERIC,
    paid         NUMERIC,
    shipped      NUMERIC
);
ℹ
FILTER vs. CASE vs. crosstab(): use FILTER when categories are known and fixed — it is the cleanest PostgreSQL idiom. Use CASE for portability. Use crosstab() only when categories are dynamic or unknown at design time.

Module 5: Stored Procedures & Temporary Tables

PostgreSQL procedures were introduced in version 11. Unlike functions, procedures can manage transactions with COMMIT and ROLLBACK when executed in an appropriate call context. That makes them the right tool for multi-step batch workflows, data migrations, and any operation that needs explicit transaction control within a single callable unit.

Feature             | Function (CREATE FUNCTION)            | Procedure (CREATE PROCEDURE)
--------------------|---------------------------------------|------------------------------------------------
Returns a value     | Yes — scalar, composite, or table     | No — uses OUT / INOUT parameters
Called with         | SELECT func()                         | CALL proc()
Usable in queries   | Yes — in SELECT, WHERE, JOIN          | No
Transaction control | No                                    | Yes — COMMIT / ROLLBACK
Best for            | Computed values, reusable query logic | Batch workflows, data migrations, multi-step mutations

Basic Procedure with IN / OUT / INOUT Parameters

IN passes values in. OUT returns values. INOUT does both.

CREATE OR REPLACE PROCEDURE sales.apply_discount(
    IN  p_user_id        INTEGER,
    IN  p_discount_pct   NUMERIC,
    INOUT p_orders_updated INTEGER DEFAULT 0
)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE sales.orders
    SET total_amount = total_amount * (1 - p_discount_pct / 100)
    WHERE user_id = p_user_id
      AND order_status = 'pending_payment';

    GET DIAGNOSTICS p_orders_updated = ROW_COUNT;
END;
$$;
CALL sales.apply_discount(42, 15.0, NULL);
-- The INOUT parameter comes back as a one-row result: the count of updated orders
💡
GET DIAGNOSTICS captures metadata from the last SQL statement. ROW_COUNT returns the number of affected rows — essential for logging and conditional logic inside procedures.

Control Flow & Variables

PL/pgSQL supports DECLARE blocks, IF / ELSIF / ELSE, CASE, and loops. Use them to build conditional business logic directly in the database.

CREATE OR REPLACE PROCEDURE sales.upgrade_user_tier(
    IN p_user_id INTEGER
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_total_spent NUMERIC;
    v_new_tier    TEXT;
BEGIN
    SELECT COALESCE(SUM(total_amount), 0)
    INTO v_total_spent
    FROM sales.orders
    WHERE user_id = p_user_id
      AND order_status IN ('paid', 'shipped', 'delivered');

    IF v_total_spent >= 10000 THEN
        v_new_tier := 'platinum';
    ELSIF v_total_spent >= 5000 THEN
        v_new_tier := 'gold';
    ELSIF v_total_spent >= 1000 THEN
        v_new_tier := 'silver';
    ELSE
        v_new_tier := 'bronze';
    END IF;

    UPDATE sales.users
    SET loyalty_tier = v_new_tier,
        updated_at   = NOW()
    WHERE user_id = p_user_id;

    RAISE NOTICE 'User % upgraded to % (lifetime: %)',
        p_user_id, v_new_tier, v_total_spent;
END;
$$;

Cursor-Based Iteration

Cursors let a procedure walk through a result set row by row. Use them when each row requires individual processing that cannot be expressed as a single set-based statement.

CREATE OR REPLACE PROCEDURE sales.recalculate_all_tiers()
LANGUAGE plpgsql
AS $$
DECLARE
    r_user RECORD;
    cur_users CURSOR FOR
        SELECT user_id FROM sales.users WHERE is_active = TRUE;
BEGIN
    OPEN cur_users;
    LOOP
        FETCH cur_users INTO r_user;
        EXIT WHEN NOT FOUND;

        CALL sales.upgrade_user_tier(r_user.user_id);
    END LOOP;
    CLOSE cur_users;

    RAISE NOTICE 'All active user tiers recalculated.';
END;
$$;
⚠
Prefer set-based operations over cursors. A single UPDATE ... FROM ... is almost always faster than a cursor loop. Use cursors only when per-row logic is genuinely different (e.g., calling another procedure per row, or complex branching that varies by row).
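When per-row iteration is genuinely required, a FOR loop is more idiomatic than an explicit OPEN / FETCH / CLOSE cursor — PL/pgSQL manages the cursor implicitly. A sketch of the procedure above in that style (the `_v2` name is illustrative):

```sql
CREATE OR REPLACE PROCEDURE sales.recalculate_all_tiers_v2()
LANGUAGE plpgsql
AS $$
DECLARE
    r_user RECORD;
BEGIN
    -- Implicit cursor: opened, fetched, and closed by the FOR construct
    FOR r_user IN
        SELECT user_id FROM sales.users WHERE is_active = TRUE
    LOOP
        CALL sales.upgrade_user_tier(r_user.user_id);
    END LOOP;

    RAISE NOTICE 'All active user tiers recalculated.';
END;
$$;
```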

Temporary Tables Inside Procedures

Temp tables are powerful staging areas for multi-step logic. They exist only for the session (or the transaction, with ON COMMIT DROP). Use them to collect intermediate results, then process those results in subsequent steps.

ON COMMIT DROP — Transaction-Scoped Temp Tables

CREATE OR REPLACE PROCEDURE billing.process_batch_payments(
    IN p_batch_id BIGINT
)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Step 1: Create a transaction-scoped staging table
    CREATE TEMP TABLE IF NOT EXISTS payment_worklist (
        order_id BIGINT,
        user_id  BIGINT,
        amount   NUMERIC(12, 2)
    ) ON COMMIT DROP;

    -- Step 2: Load pending payments
    INSERT INTO payment_worklist (order_id, user_id, amount)
    SELECT order_id, user_id, total_amount
    FROM sales.orders
    WHERE payment_batch_id = p_batch_id
      AND order_status = 'pending_payment';

    -- Step 3: Mark orders as paid
    UPDATE sales.orders AS o
    SET order_status = 'paid',
        paid_at      = NOW()
    FROM payment_worklist AS pw
    WHERE o.order_id = pw.order_id;

    -- Step 4: Award loyalty points
    UPDATE sales.users AS u
    SET loyalty_points = u.loyalty_points + 10
    FROM payment_worklist AS pw
    WHERE u.user_id = pw.user_id;

    -- Step 5: Log batch completion
    INSERT INTO billing.batch_log (batch_id, processed_at, order_count)
    SELECT p_batch_id, NOW(), COUNT(*) FROM payment_worklist;

    -- COMMIT ends the transaction and drops the ON COMMIT DROP table.
    -- Note: COMMIT is not allowed in a block that has an EXCEPTION clause
    -- (the handler wraps the block in a subtransaction), so error handling
    -- is left to the caller; an uncaught error aborts the whole transaction.
    COMMIT;
END;
$$;

ON COMMIT PRESERVE ROWS — Session-Scoped Temp Tables

When the temp table needs to survive across multiple transactions within the same session (e.g., a multi-step data import), use ON COMMIT PRESERVE ROWS (the default) or simply CREATE TEMP TABLE.

-- Session-scoped: survives COMMIT, dropped when session ends
CREATE TEMP TABLE import_staging (
    raw_email   TEXT,
    raw_name    TEXT,
    is_valid    BOOLEAN DEFAULT FALSE
);

-- Use TRUNCATE to reuse the same temp table across calls
TRUNCATE import_staging;
Temp Table Lifecycle | Created With                               | Dropped When             | Use Case
Transaction-scoped   | ON COMMIT DROP                             | Transaction ends         | Single-batch procedures
Session-scoped       | ON COMMIT PRESERVE ROWS                    | Session ends             | Multi-step imports, interactive workflows
Reusable pattern     | CREATE TEMP TABLE IF NOT EXISTS + TRUNCATE | Manually or session end  | Repeated procedure calls
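The reusable pattern in the last row can be sketched as a procedure; the procedure name and staging columns below are illustrative, not part of the schema used elsewhere.

```sql
CREATE OR REPLACE PROCEDURE reporting.refresh_snapshot()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Created once per session; later calls find it already there
    CREATE TEMP TABLE IF NOT EXISTS snapshot_stage (
        user_id     BIGINT,
        order_total NUMERIC(12, 2)
    );

    -- Clear leftovers from the previous call in this session
    TRUNCATE snapshot_stage;

    INSERT INTO snapshot_stage (user_id, order_total)
    SELECT user_id, SUM(total_amount)
    FROM sales.orders
    GROUP BY user_id;
END;
$$;
```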

Error Handling with EXCEPTION

The EXCEPTION block catches errors inside PL/pgSQL. You can handle specific error conditions or use WHEN OTHERS as a catch-all. This lets you log failures, clean up resources, or provide meaningful error messages.

CREATE OR REPLACE PROCEDURE sales.safe_transfer_order(
    IN p_order_id  BIGINT,
    IN p_new_user  INTEGER
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_old_user INTEGER;
BEGIN
    SELECT user_id INTO STRICT v_old_user
    FROM sales.orders
    WHERE order_id = p_order_id;

    UPDATE sales.orders
    SET user_id = p_new_user
    WHERE order_id = p_order_id;

    RAISE NOTICE 'Order % transferred from user % to user %.',
        p_order_id, v_old_user, p_new_user;

EXCEPTION
    WHEN NO_DATA_FOUND THEN
        RAISE EXCEPTION 'Order % does not exist.', p_order_id;
    WHEN foreign_key_violation THEN
        RAISE EXCEPTION 'User % does not exist.', p_new_user;
    WHEN OTHERS THEN
        RAISE EXCEPTION 'Unexpected error: % [%]', SQLERRM, SQLSTATE;
END;
$$;
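Each EXCEPTION clause acts like a savepoint around its block, so a nested BEGIN ... EXCEPTION ... END inside a loop can absorb a failure for one row and keep processing the rest. A minimal sketch, assuming the same orders table as above:

```sql
CREATE OR REPLACE PROCEDURE sales.retag_orders(p_batch_id BIGINT)
LANGUAGE plpgsql
AS $$
DECLARE
    v_order RECORD;
BEGIN
    FOR v_order IN
        SELECT order_id
        FROM sales.orders
        WHERE payment_batch_id = p_batch_id
    LOOP
        BEGIN
            UPDATE sales.orders
            SET order_status = 'paid'
            WHERE order_id = v_order.order_id;
        EXCEPTION
            WHEN OTHERS THEN
                -- Only this row's work is rolled back; the loop continues
                RAISE WARNING 'Order % failed: %', v_order.order_id, SQLERRM;
        END;
    END LOOP;
END;
$$;
```

This per-row subtransaction has a cost: each entry into an exception-guarded block sets a savepoint, so reserve the pattern for genuinely fallible steps.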
ℹ
When to use procedures vs. application code: put logic in a procedure when it involves multiple tables in a single transaction, needs to stay close to the data for performance, or must be consistent across all callers (APIs, batch jobs, reports). Keep business rules in application code when they need unit testing, frequent iteration, or integration with external services.

Module 6: User-Defined Functions (UDFs)

Functions encapsulate reusable logic that returns a value. Unlike procedures, functions can be called inside SELECT, WHERE, and JOIN clauses, making them composable building blocks for queries.

Language Choice: SQL vs. PL/pgSQL

Language         | Use When                                                 | Performance
LANGUAGE sql     | The body is a single SQL statement or simple expression  | Can be inlined by the planner — fastest
LANGUAGE plpgsql | You need variables, control flow, or multiple statements | Slightly more overhead; not inlined

Volatility Categories

Volatility tells PostgreSQL how aggressively it can optimize calls.

Category  | Meaning                                                    | Example
IMMUTABLE | Same inputs always produce same output, no side effects    | Math, string formatting
STABLE    | Same within a single statement, reads DB but no writes     | Lookups against config tables
VOLATILE  | Can return different results even within the same statement | NOW(), random(), writes
💡
Rule of thumb: mark functions IMMUTABLE when possible — PostgreSQL can cache results, use them in index expressions, and fold constant calls at plan time. Mislabeling a volatile function as immutable can cause silent data bugs.
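The index-expression payoff mentioned above looks like this; normalize_email and the index name are assumptions for illustration, not part of the handbook schema.

```sql
-- IMMUTABLE is required for a function used in an index expression
CREATE OR REPLACE FUNCTION sales.normalize_email(p_email TEXT)
RETURNS TEXT
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT LOWER(TRIM(p_email));
$$;

-- Queries filtering on sales.normalize_email(email) can now use the index
CREATE INDEX idx_users_email_norm
    ON sales.users (sales.normalize_email(email));
```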

Scalar Function (PL/pgSQL)

Returns a single value. Great for business-rule calculations that are reused across multiple queries.

CREATE OR REPLACE FUNCTION billing.calculate_order_tax(
    p_subtotal NUMERIC,
    p_tax_rate NUMERIC
)
RETURNS NUMERIC
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    RETURN ROUND(p_subtotal * p_tax_rate, 2);
END;
$$;
SELECT
    order_id,
    total_amount,
    billing.calculate_order_tax(total_amount, 0.0825) AS tax
FROM sales.orders
LIMIT 10;

Scalar Function (Pure SQL — Inlineable)

A pure SQL function can be inlined by the planner — it is effectively copy-pasted into the calling query, avoiding function-call overhead entirely.

CREATE OR REPLACE FUNCTION billing.net_amount(
    p_gross NUMERIC,
    p_tax   NUMERIC
)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT ROUND(p_gross - p_tax, 2);
$$;
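One way to check whether inlining happened is EXPLAIN VERBOSE: an inlined call appears in the plan's output column as the underlying expression rather than as a function call.

```sql
-- If billing.net_amount was inlined, the output column shows the
-- ROUND(...) arithmetic instead of the function name
EXPLAIN (VERBOSE, COSTS OFF)
SELECT billing.net_amount(total_amount, 5.00) AS net
FROM sales.orders;
```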

Table-Returning Function

A table function returns a result set that callers can use like a virtual table in FROM. This is powerful for parameterized views.

CREATE OR REPLACE FUNCTION reporting.get_user_orders(
    p_user_id BIGINT
)
RETURNS TABLE (
    order_id     BIGINT,
    order_date   TIMESTAMP WITH TIME ZONE,
    order_status TEXT,
    total_amount NUMERIC(12, 2)
)
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN QUERY
    SELECT
        o.order_id,
        o.order_date,
        o.order_status,
        o.total_amount
    FROM sales.orders AS o
    WHERE o.user_id = p_user_id
    ORDER BY o.order_date DESC;
END;
$$;
-- Use like a table
SELECT * FROM reporting.get_user_orders(42);

-- Join per user with LATERAL: the function is called once per row of users
SELECT
    u.full_name,
    uo.order_id,
    uo.total_amount
FROM sales.users AS u
CROSS JOIN LATERAL reporting.get_user_orders(u.user_id) AS uo
WHERE u.user_id = 42;

Function with Default Parameters

Defaults make functions flexible without breaking existing callers.

CREATE OR REPLACE FUNCTION reporting.recent_orders(
    p_days   INTEGER DEFAULT 30,
    p_status TEXT    DEFAULT 'paid'
)
RETURNS TABLE (
    order_id     BIGINT,
    user_id      BIGINT,
    total_amount NUMERIC(12, 2),
    order_date   TIMESTAMP WITH TIME ZONE
)
LANGUAGE sql
STABLE
AS $$
    SELECT order_id, user_id, total_amount, order_date
    FROM sales.orders
    WHERE order_date >= NOW() - make_interval(days => p_days)
      AND order_status = p_status
    ORDER BY order_date DESC;
$$;
-- Uses defaults: last 30 days, status = 'paid'
SELECT * FROM reporting.recent_orders();

-- Override days only
SELECT * FROM reporting.recent_orders(p_days := 7);

-- Override both
SELECT * FROM reporting.recent_orders(90, 'shipped');

Trigger Function

Trigger functions automate side effects when rows are inserted, updated, or deleted. They return TRIGGER and have access to special variables NEW (incoming row) and OLD (previous row).

CREATE OR REPLACE FUNCTION sales.set_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_users_updated_at
    BEFORE UPDATE ON sales.users
    FOR EACH ROW
    EXECUTE FUNCTION sales.set_updated_at();
⚠
Trigger performance: triggers fire per row by default. On bulk updates (10K+ rows), they can become a bottleneck. Consider FOR EACH STATEMENT triggers or batch processing inside procedures for high-volume operations.
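A FOR EACH STATEMENT trigger with transition tables (PostgreSQL 10+) fires once per statement and sees every affected row at once. A sketch of the idea; sales.user_audit is an assumed audit table, not part of the schema above.

```sql
CREATE OR REPLACE FUNCTION sales.audit_user_updates()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    -- One INSERT covers every row touched by the triggering UPDATE
    INSERT INTO sales.user_audit (user_id, changed_at)
    SELECT n.user_id, NOW()
    FROM new_rows AS n;
    RETURN NULL;  -- return value is ignored for statement-level AFTER triggers
END;
$$;

CREATE TRIGGER trg_users_bulk_audit
    AFTER UPDATE ON sales.users
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION sales.audit_user_updates();
```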

SECURITY DEFINER Functions

SECURITY DEFINER runs the function with the permissions of the function owner, not the caller. This is useful for controlled access — letting application users call a function that reads from tables they cannot access directly.

CREATE OR REPLACE FUNCTION reporting.get_order_summary(
    p_user_id BIGINT
)
RETURNS TABLE (
    total_orders  BIGINT,
    total_spent   NUMERIC,
    avg_order_val NUMERIC
)
LANGUAGE sql
STABLE
SECURITY DEFINER
AS $$
    SELECT
        COUNT(*)                        AS total_orders,
        COALESCE(SUM(total_amount), 0)  AS total_spent,
        ROUND(AVG(total_amount), 2)     AS avg_order_val
    FROM sales.orders
    WHERE user_id = p_user_id
      AND order_status IN ('paid', 'shipped', 'delivered');
$$;
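Two hardening steps usually accompany SECURITY DEFINER: pinning search_path so name resolution cannot be hijacked through a malicious schema, and granting EXECUTE narrowly instead of leaving the PUBLIC default. The app_readonly role below is an assumption.

```sql
-- Pin the schemas the function resolves names against
ALTER FUNCTION reporting.get_order_summary(BIGINT)
    SET search_path = reporting, sales, pg_temp;

-- By default PUBLIC can execute any new function; narrow it
REVOKE ALL ON FUNCTION reporting.get_order_summary(BIGINT) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION reporting.get_order_summary(BIGINT) TO app_readonly;
```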
ℹ
Functions vs. Procedures recap: use functions when you need a return value composable in queries. Use procedures when you need transaction control (COMMIT/ROLLBACK). If the logic is purely a calculation with no side effects, a LANGUAGE sql IMMUTABLE function gives the best performance.

Module 7: Bonus - Advanced Data Types (JSONB)

PostgreSQL's JSONB type lets you mix relational structure with flexible semi-structured payloads. That is useful when the core entity is stable but some attributes vary, such as marketplace-specific metadata, checkout payloads, or webhook responses.

CREATE TABLE sales.order_events (
    event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_id BIGINT NOT NULL,
    event_type TEXT NOT NULL,
    event_payload JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
SELECT
    order_id,
    event_payload ->> 'source' AS source_system,
    event_payload -> 'payment' ->> 'provider' AS payment_provider
FROM sales.order_events
WHERE event_type = 'checkout_completed';
SELECT order_id, event_payload
FROM sales.order_events
WHERE event_payload @> '{"payment": {"provider": "stripe"}}'::jsonb;
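Containment queries like the one above stay fast at scale only with a GIN index; the index names below are illustrative.

```sql
-- General-purpose GIN: supports @>, ?, ?|, ?& operators
CREATE INDEX idx_order_events_payload
    ON sales.order_events USING GIN (event_payload);

-- jsonb_path_ops variant: smaller and faster, but supports only @>
CREATE INDEX idx_order_events_payload_path
    ON sales.order_events USING GIN (event_payload jsonb_path_ops);
```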
💡
Design guideline: keep highly queried, strongly typed business fields in regular columns. Use JSONB for flexible attributes, payload snapshots, or edge-case metadata, not as an excuse to avoid data modeling.

Reference Links