PostgreSQL: Intermediate to Advanced Developer Handbook
Production-focused PostgreSQL patterns for engineers moving beyond CRUD into advanced joins, ranking, recursive queries, stored procedures, PL/pgSQL functions, and JSONB-heavy workloads.
Module 1: Advanced Joins & Set Operators
At the intermediate-to-advanced level, joins go beyond simple INNER and LEFT. This module covers join patterns that solve real operational problems — hierarchical data, parameterized subqueries, combinatorial grids, and result-set algebra.
SELF JOIN — Hierarchical & Intra-Table Relationships
A self join connects a table to itself using two aliases. It is the natural pattern for any data that has a parent-child relationship stored in the same table — org charts, category trees, referral chains, or bill-of-materials structures.
-- Employee hierarchy: who reports to whom?
SELECT
e.employee_id,
e.employee_name AS employee,
e.department,
m.employee_name AS direct_manager,
m.department AS manager_department
FROM hr.employees AS e
LEFT JOIN hr.employees AS m
ON e.manager_id = m.employee_id
ORDER BY m.employee_name NULLS FIRST, e.employee_name;
Finding peers — employees who share the same manager:
SELECT
a.employee_name AS peer_a,
b.employee_name AS peer_b,
m.employee_name AS shared_manager
FROM hr.employees AS a
INNER JOIN hr.employees AS b
ON a.manager_id = b.manager_id
AND a.employee_id < b.employee_id -- avoid duplicates & self-pairs
INNER JOIN hr.employees AS m
ON m.employee_id = a.manager_id
ORDER BY shared_manager, peer_a;
The a.employee_id < b.employee_id trick eliminates both self-pairs (Alice–Alice) and mirror duplicates (Alice–Bob / Bob–Alice). Without it, the result is cluttered and misleading.
Detecting referral chains in an e-commerce context:
-- Users who referred other users (single level)
SELECT
referrer.user_id AS referrer_id,
referrer.full_name AS referrer_name,
referred.user_id AS referred_id,
referred.full_name AS referred_name,
referred.created_at AS signup_date
FROM sales.users AS referred
INNER JOIN sales.users AS referrer
ON referred.referred_by_user_id = referrer.user_id
ORDER BY referrer.full_name, referred.created_at;
CROSS JOIN — Intentional Cartesian Products
A cross join produces every combination of rows from two sources. It is dangerous on large tables (M × N rows), but deliberately powerful for generating grids, scaffolds, and gap-detection frameworks from small reference data.
-- Shipping price grid: every region × every tier
SELECT
region.region_name,
tier.tier_name,
region.base_rate + tier.surcharge AS estimated_rate
FROM (VALUES
('North America', 5.00), ('Europe', 8.50), ('APAC', 12.00)
) AS region(region_name, base_rate)
CROSS JOIN (VALUES
('Standard', 0.00), ('Express', 4.50), ('Overnight', 9.00)
) AS tier(tier_name, surcharge)
ORDER BY region.region_name, estimated_rate;
Gap detection scaffold — cross join a user list with a month series, then left join actual data to find zeros:
WITH months AS (
SELECT generate_series(
DATE '2026-01-01',
DATE '2026-06-01',
INTERVAL '1 month'
)::DATE AS month_start
),
active_users AS (
SELECT user_id, full_name
FROM sales.users
WHERE is_active = TRUE
)
SELECT
au.user_id,
au.full_name,
m.month_start,
COUNT(o.order_id) AS order_count,
COALESCE(SUM(o.total_amount), 0) AS month_revenue
FROM active_users AS au
CROSS JOIN months AS m
LEFT JOIN sales.orders AS o
ON o.user_id = au.user_id
AND DATE_TRUNC('month', o.order_date) = m.month_start
AND o.order_status IN ('paid', 'shipped', 'delivered')
GROUP BY au.user_id, au.full_name, m.month_start
ORDER BY au.user_id, m.month_start;
Use CROSS JOIN only with small reference sets, VALUES lists, or generate_series output.
LATERAL — The Parameterized Subquery Join
LATERAL lets a subquery reference columns from the outer query row. PostgreSQL evaluates the subquery once per outer row, making it the SQL equivalent of a correlated for-each loop. This is the cleanest pattern for "top-N per group," derived slices, and parameterized lookups.
Pattern 1 — Latest N Orders Per User
SELECT
u.user_id,
u.full_name,
recent.order_id,
recent.order_date,
recent.total_amount
FROM sales.users AS u
LEFT JOIN LATERAL (
SELECT o.order_id, o.order_date, o.total_amount
FROM sales.orders AS o
WHERE o.user_id = u.user_id
ORDER BY o.order_date DESC
LIMIT 3
) AS recent
ON TRUE
ORDER BY u.user_id, recent.order_date DESC;
ON TRUE is required because the filtering already happened inside the lateral subquery. With LEFT JOIN LATERAL, users who have zero orders still appear (with NULL columns from recent).
Pattern 2 — Inline Aggregation Per Row
-- For each product category, get aggregated order stats
SELECT
c.category_id,
c.category_name,
stats.total_orders,
stats.total_revenue,
stats.avg_order_value
FROM catalog.categories AS c
LEFT JOIN LATERAL (
SELECT
COUNT(*) AS total_orders,
COALESCE(SUM(oi.line_total), 0) AS total_revenue,
ROUND(AVG(oi.line_total), 2) AS avg_order_value
FROM sales.order_items AS oi
INNER JOIN catalog.products AS p
ON p.product_id = oi.product_id
WHERE p.category_id = c.category_id
) AS stats
ON TRUE
ORDER BY stats.total_revenue DESC NULLS LAST;
Pattern 3 — LATERAL with Set-Returning Functions (Array Expansion)
-- Expand a JSONB array of tags stored on each product
SELECT
p.product_id,
p.product_name,
tag.value AS tag
FROM catalog.products AS p
LEFT JOIN LATERAL jsonb_array_elements_text(p.tags) AS tag(value)
ON TRUE
ORDER BY p.product_id, tag;
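For contrast, a correlated scalar subquery in the SELECT list must return at most one row per outer row — this sketch shows the failure mode LATERAL avoids:

```sql
-- Scalar subquery: works only while each user has at most one order.
-- For a user with multiple orders it raises:
--   ERROR: more than one row returned by a subquery used as an expression
SELECT
    u.user_id,
    (SELECT o.order_id
     FROM sales.orders AS o
     WHERE o.user_id = u.user_id) AS only_order_id
FROM sales.users AS u;
```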
A correlated subquery in the SELECT list can return only one value. LATERAL can return multiple rows and multiple columns, making it far more flexible for set-returning operations.
Set Operators — Result-Set Algebra
Set operators combine or compare entire result sets vertically. Both queries must produce the same number of columns with compatible data types. Column names come from the first query.
| Operator | Returns | Duplicates | Performance Note |
|---|---|---|---|
| UNION | All unique rows from both queries | Removed | Implicit sort or hash — extra work |
| UNION ALL | All rows from both queries | Kept | No dedup — fastest |
| INTERSECT | Rows common to both queries | Removed | Hash or sort join |
| INTERSECT ALL | Common rows, preserving duplicate count | Kept (min count) | Less common but useful |
| EXCEPT | Rows in query 1 not in query 2 | Removed | Directional — order matters |
| EXCEPT ALL | Rows in query 1 not in query 2, preserving counts | Kept (subtracted) | Useful for audit diffs |
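One consequence of the "column names come from the first query" rule: alias the column in the first branch if you want a meaningful result name. A small sketch against the sample schema:

```sql
-- The result column is named "contact_email", taken from the first query;
-- any alias in the second branch would be ignored.
SELECT email AS contact_email FROM sales.users
UNION ALL
SELECT email FROM sales.marketing_subscribers;
```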
UNION vs. UNION ALL
-- Deduplicated email list across two sources
SELECT email FROM sales.users
UNION
SELECT email FROM sales.marketing_subscribers
ORDER BY email;
-- Keep all rows (e.g., event log merge from regional shards)
SELECT order_id, event_type, created_at FROM sales.order_events_na
UNION ALL
SELECT order_id, event_type, created_at FROM sales.order_events_eu
ORDER BY created_at DESC;
INTERSECT — Finding Overlap
-- Customers who are also newsletter subscribers
SELECT email FROM sales.users
INTERSECT
SELECT email FROM sales.marketing_subscribers;
EXCEPT — Finding Differences
-- Subscribers who have NOT created an account
SELECT email FROM sales.marketing_subscribers
EXCEPT
SELECT email FROM sales.users;
Advanced: Combining Set Operators
You can chain set operators and control precedence with parentheses. INTERSECT binds tighter than UNION by default, but explicit grouping is always clearer:
-- Users who bought in Q1 but NOT in Q2
(
SELECT DISTINCT user_id
FROM sales.orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
)
EXCEPT
(
SELECT DISTINCT user_id
FROM sales.orders
WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30'
);
-- Users who bought in BOTH Q1 AND Q2
(
SELECT DISTINCT user_id
FROM sales.orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-03-31'
)
INTERSECT
(
SELECT DISTINCT user_id
FROM sales.orders
WHERE order_date BETWEEN '2026-04-01' AND '2026-06-30'
);
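The default precedence is easy to verify with throwaway VALUES lists:

```sql
-- INTERSECT binds tighter: parsed as {1,2} UNION ({2,3} INTERSECT {3,4})
-- Result: 1, 2, 3
VALUES (1), (2)
UNION
VALUES (2), (3)
INTERSECT
VALUES (3), (4);

-- Explicit grouping flips the meaning: ({1,2} UNION {2,3}) INTERSECT {3,4}
-- Result: 3
(VALUES (1), (2) UNION VALUES (2), (3))
INTERSECT
VALUES (3), (4);
```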
UNION ALL is cheaper than UNION because it skips the deduplication step entirely. Prefer UNION ALL when you know the sources are already distinct, or when duplicates are semantically correct (e.g., event logs). Use EXCEPT over anti-joins when readability matters more than fine-tuned index usage.
Module 2: Window Functions & Ranking
The OVER() clause defines an analytical window. Unlike GROUP BY, which collapses many rows into one summary row, a window function keeps each original row and attaches a calculation beside it. This module covers the ranking family, offset navigation, running aggregates, and distribution functions.
The Anatomy of OVER()
| Clause | Purpose | Example |
|---|---|---|
| PARTITION BY | Splits rows into groups (like GROUP BY, but without collapsing) | PARTITION BY user_id |
| ORDER BY | Determines sequence within each partition | ORDER BY total_amount DESC |
| Frame clause | Defines the sliding window of rows for aggregate functions | ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
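All three clauses combine in a single OVER() — here as a per-user running total, sketched against the sample sales.orders table:

```sql
SELECT
    user_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY user_id   -- one window per user
        ORDER BY order_date    -- chronological sequence inside the partition
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- frame: start to here
    ) AS running_total
FROM sales.orders
ORDER BY user_id, order_date;
```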
Ranking Functions — Deep Dive
ROW_NUMBER() — Unique Sequence
ROW_NUMBER() assigns a unique integer per row within its partition, even when ties exist. This is the go-to for "pick exactly one row per group" (deduplication, latest record per entity).
-- Most recent order per user (dedup pattern)
WITH ranked AS (
SELECT
user_id,
order_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY order_date DESC
) AS rn
FROM sales.orders
)
SELECT user_id, order_id, order_date, total_amount
FROM ranked
WHERE rn = 1
ORDER BY order_date DESC;
RANK() — Gaps After Ties
RANK() gives tied rows the same number, then skips. Two rows tied for rank 1 means the next row gets rank 3. Use this when the competitive gap matters (leaderboards, contest standings).
SELECT
user_id,
order_id,
total_amount,
RANK() OVER (
PARTITION BY user_id
ORDER BY total_amount DESC
) AS spending_rank
FROM sales.orders
ORDER BY user_id, spending_rank;
DENSE_RANK() — No Gaps
DENSE_RANK() keeps consecutive numbers even with ties. Two rows tied for rank 1 means the next row gets rank 2. Best when you need “how many distinct tiers exist” rather than absolute position.
SELECT
user_id,
order_id,
total_amount,
DENSE_RANK() OVER (
PARTITION BY user_id
ORDER BY total_amount DESC
) AS dense_rank_val
FROM sales.orders
ORDER BY user_id, dense_rank_val;
Side-by-Side Comparison
SELECT
user_id,
order_id,
total_amount,
ROW_NUMBER() OVER w AS row_num,
RANK() OVER w AS rank_val,
DENSE_RANK() OVER w AS d_rank_val
FROM sales.orders
WINDOW w AS (PARTITION BY user_id ORDER BY total_amount DESC)
ORDER BY user_id, total_amount DESC;
| total_amount | ROW_NUMBER | RANK | DENSE_RANK |
|---|---|---|---|
| 500.00 | 1 | 1 | 1 |
| 500.00 | 2 | 1 | 1 |
| 300.00 | 3 | 3 | 2 |
| 150.00 | 4 | 4 | 3 |
WINDOW w AS (...) defines the window once and reuses it across multiple functions — eliminating repetition and keeping the query DRY.
NTILE(n) — Percentile Buckets
NTILE(n) divides a partition into n roughly equal buckets. Use it for percentile segmentation — customer quartiles, order deciles, or performance tiers.
-- Customer spending quartiles
SELECT
u.user_id,
u.full_name,
totals.lifetime_spent,
NTILE(4) OVER (ORDER BY totals.lifetime_spent DESC) AS quartile
FROM sales.users AS u
INNER JOIN (
SELECT user_id, SUM(total_amount) AS lifetime_spent
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
GROUP BY user_id
) AS totals
ON totals.user_id = u.user_id
ORDER BY quartile, totals.lifetime_spent DESC;
Offset Functions — LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
LAG() and LEAD()
These functions peek at neighboring rows without a self-join. LAG looks backward, LEAD looks forward. They are essential for change analysis — week-over-week growth, time gaps between events, or trend detection.
WITH weekly_sales AS (
SELECT
DATE_TRUNC('week', order_date) AS sales_week,
SUM(total_amount) AS weekly_revenue
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
GROUP BY DATE_TRUNC('week', order_date)
)
SELECT
sales_week,
weekly_revenue,
LAG(weekly_revenue) OVER w AS prev_week,
LEAD(weekly_revenue) OVER w AS next_week,
weekly_revenue - LAG(weekly_revenue) OVER w AS wow_change,
ROUND(
(weekly_revenue - LAG(weekly_revenue) OVER w)
/ NULLIF(LAG(weekly_revenue) OVER w, 0) * 100, 1
) AS wow_growth_pct
FROM weekly_sales
WINDOW w AS (ORDER BY sales_week)
ORDER BY sales_week;
Time gap between consecutive orders per user:
SELECT
user_id,
order_id,
order_date,
LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS prev_order_date,
order_date - LAG(order_date) OVER (PARTITION BY user_id ORDER BY order_date) AS days_since_last_order
FROM sales.orders
ORDER BY user_id, order_date;
FIRST_VALUE() and LAST_VALUE()
These functions return the first or last value in the window frame. Useful for comparing each row against a baseline (e.g., every order vs. the customer's first order).
SELECT
user_id,
order_id,
order_date,
total_amount,
FIRST_VALUE(total_amount) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS first_order_amount,
total_amount - FIRST_VALUE(total_amount) OVER (
PARTITION BY user_id
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS diff_from_first
FROM sales.orders
ORDER BY user_id, order_date;
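LAST_VALUE() deserves a side-by-side demonstration, because its result depends entirely on the frame clause. A sketch:

```sql
SELECT
    user_id,
    order_date,
    total_amount,
    -- Default frame ends at the current row (and its peers):
    -- this effectively echoes the current row, not the real last value
    LAST_VALUE(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
    ) AS broken_last_value,
    -- Explicit full frame: the partition's true last value
    LAST_VALUE(total_amount) OVER (
        PARTITION BY user_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS true_last_value
FROM sales.orders
ORDER BY user_id, order_date;
```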
LAST_VALUE trap: the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which means LAST_VALUE returns the current row's value (or its last peer) — not the partition's actual last row. Always specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING when you want the true last value.
Running Aggregates
An aggregate inside OVER() creates a running calculation. Cumulative revenue, running counts, and moving averages — all without self-joins.
SELECT
order_id,
order_date,
total_amount,
SUM(total_amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue,
AVG(total_amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7_orders
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
ORDER BY order_date;
Distribution Functions
PERCENT_RANK() and CUME_DIST() tell you where a row falls relative to the entire partition — useful for percentile-based reporting and outlier detection.
SELECT
user_id,
total_amount,
ROUND(PERCENT_RANK() OVER (ORDER BY total_amount)::NUMERIC, 3) AS pct_rank,
ROUND(CUME_DIST() OVER (ORDER BY total_amount)::NUMERIC, 3) AS cumulative_dist
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
ORDER BY total_amount DESC;
PERCENT_RANK returns 0 for the first row and scales to 1. CUME_DIST returns the fraction of rows with values ≤ the current row. Use PERCENT_RANK for "what percentile am I in?" and CUME_DIST for "what fraction of records are at or below me?"
Module 3: Common Table Expressions (CTEs)
CTEs name intermediate result sets inside a single statement. At the advanced level, they become powerful tools for hierarchical traversal, data pipelines, and even multi-table mutations.
Chaining Standard CTEs
Multiple CTEs let you break a large query into named steps. Each CTE can reference the ones defined before it, creating a readable data pipeline.
WITH paid_orders AS (
SELECT user_id, total_amount, order_date
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
),
user_order_totals AS (
SELECT
user_id,
COUNT(*) AS completed_order_count,
SUM(total_amount) AS lifetime_value,
MAX(order_date) AS last_order_date
FROM paid_orders
GROUP BY user_id
),
vip_users AS (
SELECT *
FROM user_order_totals
WHERE lifetime_value >= 5000.00
)
SELECT
u.user_id,
u.full_name,
v.completed_order_count,
v.lifetime_value,
v.last_order_date
FROM vip_users AS v
INNER JOIN sales.users AS u
ON u.user_id = v.user_id
ORDER BY v.lifetime_value DESC;
CTE Materialization Hints
By default, PostgreSQL may inline a CTE or materialize it. In PostgreSQL 12+, you can control this explicitly. MATERIALIZED forces the CTE to be evaluated once and stored as a temp result. NOT MATERIALIZED allows the optimizer to inline it into the outer query.
-- Force materialization when the CTE is referenced multiple times
WITH active_users AS MATERIALIZED (
SELECT user_id, full_name, email
FROM sales.users
WHERE is_active = TRUE
)
SELECT au.full_name, COUNT(o.order_id) AS order_count
FROM active_users AS au
LEFT JOIN sales.orders AS o
ON o.user_id = au.user_id
GROUP BY au.full_name
ORDER BY order_count DESC;
Use MATERIALIZED when the CTE is referenced multiple times, or when it acts as an optimization fence (preventing the planner from pushing filters into the CTE). Use NOT MATERIALIZED when the CTE is used once and you want the planner to optimize it as part of the outer query.
Writable CTEs (DML inside WITH)
PostgreSQL allows INSERT, UPDATE, and DELETE inside CTEs. This creates powerful single-statement pipelines that archive, transform, and report in one atomic operation.
-- Archive old orders and return a summary in one statement
WITH archived AS (
DELETE FROM sales.orders
WHERE order_date < NOW() - INTERVAL '2 years'
AND order_status = 'delivered'
RETURNING order_id, user_id, total_amount, order_date
),
inserted AS (
INSERT INTO archive.orders (order_id, user_id, total_amount, order_date)
SELECT order_id, user_id, total_amount, order_date
FROM archived
RETURNING order_id
)
SELECT COUNT(*) AS archived_count FROM inserted;
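A subtlety worth internalizing: a data-modifying CTE runs even when nothing reads from it. A sketch (the sales.cart_items table is hypothetical):

```sql
-- The DELETE executes exactly once even though "purged" is never referenced
-- by the outer query.
WITH purged AS (
    DELETE FROM sales.cart_items          -- hypothetical staging table
    WHERE added_at < NOW() - INTERVAL '30 days'
    RETURNING cart_item_id
)
SELECT 'cleanup ran' AS note;
```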
Because the first CTE performs a DELETE, it executes exactly once and to completion — even if the outer query never references its output or returns no rows. Test carefully.
Recursive CTEs — Hierarchical Traversal
WITH RECURSIVE has two parts: the anchor (base case) and the recursive member (step). PostgreSQL repeatedly executes the recursive member until it returns no new rows.
Org Chart with Depth and Path
WITH RECURSIVE org_chart AS (
-- Anchor: top-level managers (no manager)
SELECT
employee_id,
employee_name,
manager_id,
0 AS depth,
employee_name::TEXT AS reporting_path
FROM hr.employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: walk down the tree
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
oc.depth + 1,
oc.reporting_path || ' → ' || e.employee_name
FROM hr.employees AS e
INNER JOIN org_chart AS oc
ON e.manager_id = oc.employee_id
)
SELECT employee_id, employee_name, depth, reporting_path
FROM org_chart
ORDER BY reporting_path;
Category Breadcrumb Trail
-- Build "Electronics → Phones → Smartphones" breadcrumbs
WITH RECURSIVE cat_path AS (
SELECT
category_id,
category_name,
parent_category_id,
category_name::TEXT AS breadcrumb
FROM catalog.categories
WHERE parent_category_id IS NULL
UNION ALL
SELECT
c.category_id,
c.category_name,
c.parent_category_id,
cp.breadcrumb || ' → ' || c.category_name
FROM catalog.categories AS c
INNER JOIN cat_path AS cp
ON c.parent_category_id = cp.category_id
)
SELECT category_id, category_name, breadcrumb
FROM cat_path
ORDER BY breadcrumb;
Generating a Number Series
-- Recursive CTE as a substitute for generate_series
WITH RECURSIVE nums AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM nums WHERE n < 100
)
SELECT n FROM nums;
Cycle Detection (PostgreSQL 14+)
The CYCLE clause prevents infinite loops when data has circular references (e.g., an employee who is their own indirect manager). Before PostgreSQL 14, you had to track visited nodes manually with arrays.
WITH RECURSIVE org_chart AS (
SELECT employee_id, employee_name, manager_id, 0 AS depth
FROM hr.employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.employee_name, e.manager_id, oc.depth + 1
FROM hr.employees AS e
INNER JOIN org_chart AS oc
ON e.manager_id = oc.employee_id
)
CYCLE employee_id SET is_cycle USING path
SELECT employee_id, employee_name, depth
FROM org_chart
WHERE NOT is_cycle
ORDER BY depth, employee_name;
Always add a depth guard (e.g., WHERE depth < 20) or use CYCLE to prevent runaway recursion. Index the join column (manager_id, parent_category_id) for best performance.
Module 4: Advanced Grouping & Pivoting
Standard GROUP BY produces one level of aggregation. Advanced grouping (ROLLUP, CUBE, GROUPING SETS) produces multiple levels in a single pass — subtotals, cross-totals, and grand totals without UNION ALL gymnastics.
ROLLUP — Hierarchical Subtotals
ROLLUP creates subtotals that roll up from right to left. ROLLUP(a, b) produces groups: (a, b), (a), (). Think of it as progressively removing the rightmost column.
SELECT
sales_region,
order_status,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;
Result rows where order_status IS NULL are region subtotals. The row where both are NULL is the grand total.
CUBE — All Dimension Combinations
CUBE generates subtotals for every possible combination. CUBE(a, b) produces: (a, b), (a), (b), (). Use it when you need cross-dimensional summaries (e.g., revenue by region, by status, and by both).
SELECT
sales_region,
order_status,
SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY CUBE (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;
GROUPING SETS — Explicit Control
GROUPING SETS lets you specify exactly which groupings to compute. This is the most flexible option — ROLLUP and CUBE are syntactic shortcuts for common GROUPING SETS patterns.
SELECT
sales_region,
payment_method,
SUM(total_amount) AS revenue,
COUNT(*) AS order_count
FROM sales.orders
GROUP BY GROUPING SETS (
(sales_region, payment_method), -- detail
(sales_region), -- subtotal by region
(payment_method), -- subtotal by payment
() -- grand total
)
ORDER BY sales_region NULLS LAST, payment_method NULLS LAST;
The GROUPING() Function — Disambiguating NULLs
Subtotal rows have NULL in the grouped column, but real data can also be NULL. The GROUPING() function returns 1 when the NULL comes from the grouping operation (subtotal), and 0 when it is a real data NULL.
SELECT
CASE WHEN GROUPING(sales_region) = 1 THEN '** ALL REGIONS **'
ELSE COALESCE(sales_region, '(unknown)')
END AS region_label,
CASE WHEN GROUPING(order_status) = 1 THEN '** ALL STATUSES **'
ELSE order_status
END AS status_label,
SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY GROUPING(sales_region), sales_region NULLS LAST,
GROUPING(order_status), order_status NULLS LAST;
ORDER BY GROUPING(col), col puts detail rows before subtotals and subtotals before the grand total — the natural report layout.
Pivoting with FILTER
PostgreSQL does not have a native PIVOT keyword. The idiomatic replacement is conditional aggregation using the FILTER clause, which is both readable and performant.
SELECT
DATE_TRUNC('month', order_date) AS sales_month,
COUNT(*) AS total_orders,
SUM(total_amount) FILTER (WHERE order_status = 'paid') AS paid,
SUM(total_amount) FILTER (WHERE order_status = 'shipped') AS shipped,
SUM(total_amount) FILTER (WHERE order_status = 'delivered') AS delivered,
SUM(total_amount) FILTER (WHERE order_status = 'cancelled') AS cancelled
FROM sales.orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month;
Pivoting with CASE (Portable Alternative)
CASE-based pivoting works on all SQL databases. It is slightly more verbose than FILTER but useful when you need cross-database compatibility.
SELECT
sales_region,
SUM(CASE WHEN payment_method = 'credit_card' THEN total_amount ELSE 0 END) AS credit_card,
SUM(CASE WHEN payment_method = 'paypal' THEN total_amount ELSE 0 END) AS paypal,
SUM(CASE WHEN payment_method = 'bank_transfer' THEN total_amount ELSE 0 END) AS bank_transfer
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
GROUP BY sales_region
ORDER BY sales_region;
Dynamic Pivoting with crosstab()
The tablefunc extension provides crosstab() for cases where the pivot categories are not known at write time.
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT *
FROM crosstab(
$$
SELECT
sales_region,
order_status,
SUM(total_amount)::NUMERIC
FROM sales.orders
GROUP BY sales_region, order_status
ORDER BY 1, 2
$$,
$$ VALUES ('cancelled'), ('delivered'), ('paid'), ('shipped') $$
) AS pivot_table (
sales_region TEXT,
cancelled NUMERIC,
delivered NUMERIC,
paid NUMERIC,
shipped NUMERIC
);
FILTER vs. CASE vs. crosstab(): use FILTER when categories are known and fixed — it is the cleanest PostgreSQL idiom. Use CASE for portability. Use crosstab() only when categories are dynamic or unknown at design time.
Module 5: Stored Procedures & Temporary Tables
PostgreSQL procedures were introduced in version 11. Unlike functions, procedures can manage transactions with COMMIT and ROLLBACK when executed in an appropriate call context. That makes them the right tool for multi-step batch workflows, data migrations, and any operation that needs explicit transaction control within a single callable unit.
| Feature | Function (CREATE FUNCTION) | Procedure (CREATE PROCEDURE) |
|---|---|---|
| Returns a value | Yes — scalar, composite, or table | No — uses OUT / INOUT parameters |
| Called with | SELECT func() | CALL proc() |
| Usable in queries | Yes — in SELECT, WHERE, JOIN | No |
| Transaction control | No | Yes — COMMIT / ROLLBACK |
| Best for | Computed values, reusable query logic | Batch workflows, data migrations, multi-step mutations |
Basic Procedure with IN / OUT / INOUT Parameters
IN passes values in. OUT returns values. INOUT does both.
CREATE OR REPLACE PROCEDURE sales.apply_discount(
IN p_user_id INTEGER,
IN p_discount_pct NUMERIC,
INOUT p_orders_updated INTEGER DEFAULT 0
)
LANGUAGE plpgsql
AS $$
BEGIN
UPDATE sales.orders
SET total_amount = total_amount * (1 - p_discount_pct / 100)
WHERE user_id = p_user_id
AND order_status = 'pending_payment';
GET DIAGNOSTICS p_orders_updated = ROW_COUNT;
END;
$$;
CALL sales.apply_discount(42, 15.0, NULL); -- Returns count of updated orders via the INOUT parameter
GET DIAGNOSTICS captures metadata from the last SQL statement. ROW_COUNT returns the number of affected rows — essential for logging and conditional logic inside procedures.
Control Flow & Variables
PL/pgSQL supports DECLARE blocks, IF / ELSIF / ELSE, CASE, and loops. Use them to build conditional business logic directly in the database.
CREATE OR REPLACE PROCEDURE sales.upgrade_user_tier(
IN p_user_id INTEGER
)
LANGUAGE plpgsql
AS $$
DECLARE
v_total_spent NUMERIC;
v_new_tier TEXT;
BEGIN
SELECT COALESCE(SUM(total_amount), 0)
INTO v_total_spent
FROM sales.orders
WHERE user_id = p_user_id
AND order_status IN ('paid', 'shipped', 'delivered');
IF v_total_spent >= 10000 THEN
v_new_tier := 'platinum';
ELSIF v_total_spent >= 5000 THEN
v_new_tier := 'gold';
ELSIF v_total_spent >= 1000 THEN
v_new_tier := 'silver';
ELSE
v_new_tier := 'bronze';
END IF;
UPDATE sales.users
SET loyalty_tier = v_new_tier,
updated_at = NOW()
WHERE user_id = p_user_id;
RAISE NOTICE 'User % upgraded to % (lifetime: %)',
p_user_id, v_new_tier, v_total_spent;
END;
$$;
Cursor-Based Iteration
Cursors let a procedure walk through a result set row by row. Use them when each row requires individual processing that cannot be expressed as a single set-based statement.
CREATE OR REPLACE PROCEDURE sales.recalculate_all_tiers()
LANGUAGE plpgsql
AS $$
DECLARE
r_user RECORD;
cur_users CURSOR FOR
SELECT user_id FROM sales.users WHERE is_active = TRUE;
BEGIN
OPEN cur_users;
LOOP
FETCH cur_users INTO r_user;
EXIT WHEN NOT FOUND;
CALL sales.upgrade_user_tier(r_user.user_id);
END LOOP;
CLOSE cur_users;
RAISE NOTICE 'All active user tiers recalculated.';
END;
$$;
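A minimal sketch of the same iteration using PL/pgSQL's implicit FOR loop, which handles the OPEN/FETCH/CLOSE bookkeeping automatically (the _v2 name is only for illustration):

```sql
CREATE OR REPLACE PROCEDURE sales.recalculate_all_tiers_v2()
LANGUAGE plpgsql
AS $$
DECLARE
    r_user RECORD;
BEGIN
    -- FOR ... IN <query> opens, fetches from, and closes the cursor implicitly
    FOR r_user IN
        SELECT user_id FROM sales.users WHERE is_active = TRUE
    LOOP
        CALL sales.upgrade_user_tier(r_user.user_id);
    END LOOP;
END;
$$;
```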
A single set-based UPDATE ... FROM ... is almost always faster than a cursor loop. Use cursors only when per-row logic is genuinely different (e.g., calling another procedure per row, or complex branching that varies by row).
Temporary Tables Inside Procedures
Temp tables are powerful staging areas for multi-step logic. They exist only for the session (or the transaction, with ON COMMIT DROP). Use them to collect intermediate results, then process those results in subsequent steps.
ON COMMIT DROP — Transaction-Scoped Temp Tables
CREATE OR REPLACE PROCEDURE billing.process_batch_payments(
IN p_batch_id BIGINT
)
LANGUAGE plpgsql
AS $$
BEGIN
-- Step 1: Create a transaction-scoped staging table
CREATE TEMP TABLE IF NOT EXISTS payment_worklist (
order_id BIGINT,
user_id BIGINT,
amount NUMERIC(12, 2)
) ON COMMIT DROP;
-- Step 2: Load pending payments
INSERT INTO payment_worklist (order_id, user_id, amount)
SELECT order_id, user_id, total_amount
FROM sales.orders
WHERE payment_batch_id = p_batch_id
AND order_status = 'pending_payment';
-- Step 3: Mark orders as paid
UPDATE sales.orders AS o
SET order_status = 'paid',
paid_at = NOW()
FROM payment_worklist AS pw
WHERE o.order_id = pw.order_id;
-- Step 4: Award loyalty points
UPDATE sales.users AS u
SET loyalty_points = u.loyalty_points + 10
FROM payment_worklist AS pw
WHERE u.user_id = pw.user_id;
-- Step 5: Log batch completion
INSERT INTO billing.batch_log (batch_id, processed_at, order_count)
SELECT p_batch_id, NOW(), COUNT(*) FROM payment_worklist;
    -- Step 6: Commit the batch; the ON COMMIT DROP staging table
    -- disappears here. Note: COMMIT is not allowed inside a block
    -- that has an EXCEPTION clause, so this procedure deliberately
    -- has none — any error simply aborts the whole transaction.
    COMMIT;
END;
$$;
ON COMMIT PRESERVE ROWS — Session-Scoped Temp Tables
When the temp table needs to survive across multiple transactions within the same session (e.g., a multi-step data import), use ON COMMIT PRESERVE ROWS (the default) or simply CREATE TEMP TABLE.
-- Session-scoped: survives COMMIT, dropped when session ends
CREATE TEMP TABLE import_staging (
raw_email TEXT,
raw_name TEXT,
is_valid BOOLEAN DEFAULT FALSE
);
-- Use TRUNCATE to reuse the same temp table across calls
TRUNCATE import_staging;
| Temp Table Lifecycle | Created With | Dropped When | Use Case |
|---|---|---|---|
| Transaction-scoped | ON COMMIT DROP | Transaction ends | Single-batch procedures |
| Session-scoped | ON COMMIT PRESERVE ROWS | Session ends | Multi-step imports, interactive workflows |
| Reusable pattern | CREATE TEMP TABLE IF NOT EXISTS + TRUNCATE | Manually or session end | Repeated procedure calls |
Error Handling with EXCEPTION
The EXCEPTION block catches errors inside PL/pgSQL. You can handle specific error conditions or use WHEN OTHERS as a catch-all. This lets you log failures, clean up resources, or provide meaningful error messages.
CREATE OR REPLACE PROCEDURE sales.safe_transfer_order(
IN p_order_id BIGINT,
IN p_new_user INTEGER
)
LANGUAGE plpgsql
AS $$
DECLARE
v_old_user INTEGER;
BEGIN
SELECT user_id INTO STRICT v_old_user
FROM sales.orders
WHERE order_id = p_order_id;
UPDATE sales.orders
SET user_id = p_new_user
WHERE order_id = p_order_id;
RAISE NOTICE 'Order % transferred from user % to user %.',
p_order_id, v_old_user, p_new_user;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RAISE EXCEPTION 'Order % does not exist.', p_order_id;
WHEN foreign_key_violation THEN
RAISE EXCEPTION 'User % does not exist.', p_new_user;
WHEN OTHERS THEN
RAISE EXCEPTION 'Unexpected error: % [%]', SQLERRM, SQLSTATE;
END;
$$;
Module 6: User-Defined Functions (UDFs)
Functions encapsulate reusable logic that returns a value. Unlike procedures, functions can be called inside SELECT, WHERE, and JOIN clauses, making them composable building blocks for queries.
Language Choice: SQL vs. PL/pgSQL
| Language | Use When | Performance |
|---|---|---|
| LANGUAGE sql | The body is a single SQL statement or simple expression | Can be inlined by the planner — fastest |
| LANGUAGE plpgsql | You need variables, control flow, or multiple statements | Slightly more overhead; not inlined |
Volatility Categories
Volatility tells PostgreSQL how aggressively it can optimize calls.
| Category | Meaning | Example |
|---|---|---|
| IMMUTABLE | Same inputs always produce same output, no side effects | Math, string formatting |
| STABLE | Same within a single statement, reads DB but no writes | Lookups against config tables |
| VOLATILE | Can return different results even within the same statement | NOW(), random(), writes |
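The practical payoff of correct labeling: only IMMUTABLE functions may appear in index expressions. A sketch (the util.normalize_email helper is hypothetical):

```sql
-- Hypothetical helper: IMMUTABLE, so it can back an expression index
CREATE OR REPLACE FUNCTION util.normalize_email(p_email TEXT)
RETURNS TEXT
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT LOWER(TRIM(p_email));
$$;

-- Allowed only because the function is declared IMMUTABLE
CREATE INDEX idx_users_email_normalized
    ON sales.users (util.normalize_email(email));
```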
Declare functions IMMUTABLE when possible — PostgreSQL can cache results, use them in index expressions, and fold constant calls at plan time. Mislabeling a volatile function as immutable can cause silent data bugs.
Scalar Function (PL/pgSQL)
Returns a single value. Great for business-rule calculations that are reused across multiple queries.
CREATE OR REPLACE FUNCTION billing.calculate_order_tax(
p_subtotal NUMERIC,
p_tax_rate NUMERIC
)
RETURNS NUMERIC
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
RETURN ROUND(p_subtotal * p_tax_rate, 2);
END;
$$;
SELECT
order_id,
total_amount,
billing.calculate_order_tax(total_amount, 0.0825) AS tax
FROM sales.orders
LIMIT 10;
Scalar Function (Pure SQL — Inlineable)
A pure SQL function can be inlined by the planner — it is effectively copy-pasted into the calling query, avoiding function-call overhead entirely.
CREATE OR REPLACE FUNCTION billing.net_amount(
p_gross NUMERIC,
p_tax NUMERIC
)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
SELECT ROUND(p_gross - p_tax, 2);
$$;
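One way to check whether the planner actually inlined a call (using the billing.net_amount function just defined and the sales.orders table from earlier) is EXPLAIN VERBOSE — an inlined SQL function appears as its expanded expression, while a non-inlined one shows up as an opaque function call:

```sql
-- If inlining succeeded, the plan's Output line shows the expanded
-- round((total_amount - ...), 2) expression instead of a call to
-- billing.net_amount(...).
EXPLAIN (VERBOSE)
SELECT billing.net_amount(total_amount, total_amount * 0.0825)
FROM sales.orders;
```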
Table-Returning Function
A table function returns a result set that callers can use like a virtual table in FROM. This is powerful for parameterized views.
CREATE OR REPLACE FUNCTION reporting.get_user_orders(
p_user_id BIGINT
)
RETURNS TABLE (
order_id BIGINT,
order_date TIMESTAMP WITH TIME ZONE,
order_status TEXT,
total_amount NUMERIC(12, 2)
)
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
RETURN QUERY
SELECT
o.order_id,
o.order_date,
o.order_status,
o.total_amount
FROM sales.orders AS o
WHERE o.user_id = p_user_id
ORDER BY o.order_date DESC;
END;
$$;
-- Use like a table
SELECT * FROM reporting.get_user_orders(42);
-- Join with other tables: LATERAL lets the function receive
-- a column from an earlier table in the FROM list
SELECT
uo.order_id,
uo.total_amount,
u.full_name
FROM sales.users AS u
CROSS JOIN LATERAL reporting.get_user_orders(u.user_id) AS uo
WHERE u.user_id = 42;
Function with Default Parameters
Defaults make functions flexible without breaking existing callers.
CREATE OR REPLACE FUNCTION reporting.recent_orders(
p_days INTEGER DEFAULT 30,
p_status TEXT DEFAULT 'paid'
)
RETURNS TABLE (
order_id BIGINT,
user_id BIGINT,
total_amount NUMERIC(12, 2),
order_date TIMESTAMP WITH TIME ZONE
)
LANGUAGE sql
STABLE
AS $$
SELECT order_id, user_id, total_amount, order_date
FROM sales.orders
WHERE order_date >= NOW() - (p_days || ' days')::INTERVAL
AND order_status = p_status
ORDER BY order_date DESC;
$$;
-- Uses defaults: last 30 days, status = 'paid'
SELECT * FROM reporting.recent_orders();
-- Override days only
SELECT * FROM reporting.recent_orders(p_days := 7);
-- Override both
SELECT * FROM reporting.recent_orders(90, 'shipped');
Trigger Function
Trigger functions automate side effects when rows are inserted, updated, or deleted. They return TRIGGER and have access to special variables NEW (incoming row) and OLD (previous row).
CREATE OR REPLACE FUNCTION sales.set_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
NEW.updated_at := NOW();
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON sales.users
FOR EACH ROW
EXECUTE FUNCTION sales.set_updated_at();
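A quick way to confirm the trigger fires (assuming sales.users has the updated_at column the trigger function sets):

```sql
-- Even a no-op assignment counts as an UPDATE and fires the
-- BEFORE UPDATE row trigger.
UPDATE sales.users
SET full_name = full_name
WHERE user_id = 42;

-- updated_at now reflects the time of the UPDATE above.
SELECT user_id, updated_at
FROM sales.users
WHERE user_id = 42;
```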
For high-volume operations, consider FOR EACH STATEMENT triggers or batch processing inside procedures rather than per-row triggers.
SECURITY DEFINER Functions
SECURITY DEFINER runs the function with the permissions of the function owner, not the caller. This is useful for controlled access — letting application users call a function that reads from tables they cannot access directly.
CREATE OR REPLACE FUNCTION reporting.get_order_summary(
p_user_id BIGINT
)
RETURNS TABLE (
total_orders BIGINT,
total_spent NUMERIC,
avg_order_val NUMERIC
)
LANGUAGE sql
STABLE
SECURITY DEFINER
AS $$
SELECT
COUNT(*) AS total_orders,
COALESCE(SUM(total_amount), 0) AS total_spent,
ROUND(AVG(total_amount), 2) AS avg_order_val
FROM sales.orders
WHERE user_id = p_user_id
AND order_status IN ('paid', 'shipped', 'delivered');
$$;
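The access-control half of the pattern lives in GRANT/REVOKE, plus a search_path pin — a standard hardening step for SECURITY DEFINER functions so a caller cannot hijack unqualified names. The role name app_user below is illustrative:

```sql
-- The application role cannot read the table directly...
REVOKE ALL ON sales.orders FROM app_user;

-- ...but may call the owner-privileged function.
GRANT EXECUTE ON FUNCTION reporting.get_order_summary(BIGINT) TO app_user;

-- Pin the search path so object references inside the function
-- resolve predictably regardless of the caller's settings.
ALTER FUNCTION reporting.get_order_summary(BIGINT)
SET search_path = reporting, sales, pg_temp;
```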
Choose a procedure when the logic needs transaction control (COMMIT/ROLLBACK). If the logic is purely a calculation with no side effects, a LANGUAGE sql IMMUTABLE function gives the best performance.
Module 7: Bonus - Advanced Data Types (JSONB)
PostgreSQL is known for its JSONB support, which lets you mix relational structure with flexible semi-structured payloads. That is useful when the core entity is stable but some attributes vary, such as marketplace-specific metadata, checkout payloads, or webhook responses.
CREATE TABLE sales.order_events (
event_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
order_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
event_payload JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
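A sample event (payload values are illustrative) shows the shape the queries below expect — a top-level source key and a nested payment object:

```sql
INSERT INTO sales.order_events (order_id, event_type, event_payload)
VALUES (
1001,
'checkout_completed',
'{"source": "web", "payment": {"provider": "stripe", "method": "card"}}'::jsonb
);
```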
SELECT
order_id,
event_payload ->> 'source' AS source_system,
event_payload -> 'payment' ->> 'provider' AS payment_provider
FROM sales.order_events
WHERE event_type = 'checkout_completed';
SELECT order_id, event_payload
FROM sales.order_events
WHERE event_payload @> '{"payment": {"provider": "stripe"}}'::jsonb;
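Containment queries like the one above only scale with a GIN index on the payload column. The jsonb_path_ops operator class builds a smaller, faster index but supports only @>; the default class also supports the key-existence operators (?, ?|, ?&). In practice you pick one:

```sql
-- General-purpose: supports @>, ?, ?|, ?&
CREATE INDEX idx_order_events_payload
ON sales.order_events USING GIN (event_payload);

-- Smaller and faster for @>-only workloads
CREATE INDEX idx_order_events_payload_path
ON sales.order_events USING GIN (event_payload jsonb_path_ops);
```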
Reference Links
- Docs: Table expressions and LATERAL
- Window: Window functions tutorial
- PL/pgSQL: PL/pgSQL reference
- JSONB: JSON functions and operators