
PostgreSQL: Beginner to Intermediate Handbook

A production-oriented guide to PostgreSQL fundamentals, relational querying, reusable database objects, and the first layer of performance tuning for application engineers.

PostgreSQL 16+ · E-commerce Schema · CRUD · Joins · CTEs · Indexes · EXPLAIN ANALYZE · April 2026
ℹ
Theme for every example: the handbook uses an e-commerce schema with users, orders, and order_items so each concept builds on the same mental model instead of resetting the context in every module.

Module 1: The PostgreSQL Hierarchy & Basics

Think in layers. A PostgreSQL cluster is the outer machine-level container. Inside it you create databases. Inside a database you use schemas to divide responsibility, and inside schemas you store tables, views, indexes, functions, and other objects. The hierarchy is Cluster -> Database -> Schema -> Table.

Schemas are organizational boundaries. The default public schema is convenient for prototypes, but production systems usually split objects into focused namespaces like sales, hr, or reporting. That makes permissions cleaner, reduces naming collisions, and helps teams reason about ownership. It is similar to putting finance, HR, and operations documents into separate cabinets instead of one shared drawer.

| Level | What it contains | E-commerce example |
| --- | --- | --- |
| Cluster | One PostgreSQL server instance with system-wide configuration and multiple databases | Company database server |
| Database | A logical application container with its own catalogs and objects | commerce_app |
| Schema | A namespace inside the database | public, sales, reporting |
| Table | Actual row storage | sales.users, sales.orders |

CREATE SCHEMA IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS reporting;

CREATE TABLE sales.users (
    user_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    full_name VARCHAR(150) NOT NULL,
    loyalty_points INTEGER NOT NULL DEFAULT 0,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

Table design should make invalid states hard to insert. The table above uses a primary key, non-null columns for required fields, a unique email address, and timestamp defaults so application code does not need to keep recreating the same guardrails.

Basic CRUD

Create uses INSERT. PostgreSQL's RETURNING clause is especially useful because it immediately returns generated values such as a new identity column. That removes the need for a follow-up query.

INSERT INTO sales.users (email, full_name, loyalty_points)
VALUES ('ava.chen@example.com', 'Ava Chen', 150)
RETURNING user_id;

Read uses SELECT. Start simple and make the filter explicit.

SELECT user_id, email, full_name, loyalty_points, is_active, created_at
FROM sales.users
WHERE is_active = TRUE
ORDER BY created_at DESC;

Update should target the narrowest possible set of rows. Add a WHERE clause unless you intentionally mean to update everything.

UPDATE sales.users
SET loyalty_points = loyalty_points + 25,
    updated_at = NOW()
WHERE user_id = 1;
⚠
Practical note: if you want to use the updated_at column shown in update examples, define it up front in your production table. A revised version is below so the schema stays internally consistent.
DROP TABLE IF EXISTS sales.users;

CREATE TABLE sales.users (
    user_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    full_name VARCHAR(150) NOT NULL,
    loyalty_points INTEGER NOT NULL DEFAULT 0,
    is_active BOOLEAN NOT NULL DEFAULT TRUE,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

Delete removes data permanently. In production systems you often prefer soft deletes, but the basic syntax is still important.

DELETE FROM sales.users
WHERE user_id = 1;
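
The soft-delete pattern mentioned above can be sketched with the existing is_active flag (this assumes the revised sales.users table that includes updated_at):

```sql
-- Soft delete: mark the row inactive instead of removing it, so order
-- history and foreign-key references to this user stay intact.
UPDATE sales.users
SET is_active = FALSE,
    updated_at = NOW()
WHERE user_id = 1;
```

Queries that should ignore deleted users then filter on is_active = TRUE, exactly like the SELECT example earlier in this module.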

Module 2: Filtering & Aggregation

Filtering narrows the working set. The WHERE clause is the database equivalent of giving a warehouse picker a very specific list instead of asking them to inspect every shelf. The more precise the filter, the less work the database has to do.

CREATE TABLE sales.orders (
    order_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES sales.users(user_id),
    order_status VARCHAR(30) NOT NULL,
    total_amount NUMERIC(12, 2) NOT NULL,
    order_date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

Filtering Patterns

SELECT order_id, user_id, order_status, total_amount, order_date
FROM sales.orders
WHERE order_status = 'paid';

SELECT order_id, user_id, order_status
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered');

-- Half-open range: with a timestamp column, BETWEEN ... AND DATE '2026-03-31'
-- would cut off everything after midnight on March 31.
SELECT order_id, total_amount, order_date
FROM sales.orders
WHERE order_date >= DATE '2026-01-01'
  AND order_date <  DATE '2026-04-01';

SELECT user_id, email, full_name
FROM sales.users
WHERE full_name ILIKE '%chen%';

ILIKE is PostgreSQL-specific and case-insensitive. Use it when a user-facing search should match Chen, chen, or CHEN without forcing you to call LOWER() on both sides of the comparison.
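
For portability, the same case-insensitive match can be written with LOWER() on both sides — which is exactly the boilerplate ILIKE saves you from:

```sql
-- Portable equivalent of the ILIKE query above; works on any SQL database
SELECT user_id, email, full_name
FROM sales.users
WHERE LOWER(full_name) LIKE LOWER('%Chen%');
```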

Aggregation with GROUP BY and HAVING

GROUP BY collapses rows into summaries. If raw order rows are receipts, aggregation is the month-end rollup. HAVING filters those grouped results after the aggregation has been calculated.

SELECT
    o.user_id,
    u.full_name,
    COUNT(*) AS purchase_count,
    SUM(o.total_amount) AS total_sales_amount
FROM sales.orders AS o
INNER JOIN sales.users AS u
    ON u.user_id = o.user_id
WHERE o.order_status IN ('paid', 'shipped', 'delivered')
GROUP BY o.user_id, u.full_name
HAVING COUNT(*) > 5
ORDER BY total_sales_amount DESC;
💡
Rule of thumb: use WHERE for row-level filters before grouping, and HAVING for group-level filters after grouping.

Module 3: The Power of Joins

Primary keys uniquely identify a row. Foreign keys point from one table to another and enforce that the relationship actually exists. In an e-commerce system, sales.users.user_id is a primary key, and sales.orders.user_id is a foreign key that says every order must belong to a real user.

| Concept | Purpose | Example |
| --- | --- | --- |
| Primary Key | Uniquely identifies a row in its own table | sales.users.user_id |
| Foreign Key | Links one table to another and enforces referential integrity | sales.orders.user_id REFERENCES sales.users(user_id) |

INNER JOIN: The strict intersection

Only rows with matching keys on both sides survive. If a user has never placed an order, that user disappears from the result.

SELECT
    u.user_id,
    u.full_name,
    o.order_id,
    o.total_amount
FROM sales.users AS u
INNER JOIN sales.orders AS o
    ON o.user_id = u.user_id
ORDER BY o.order_date DESC;

LEFT JOIN: Keep everything on the left

The left table is preserved even when the right side has no match. This is one of the most useful join types for reporting because missing related data becomes visible instead of disappearing.

SELECT
    u.user_id,
    u.full_name,
    o.order_id,
    o.order_status
FROM sales.users AS u
LEFT JOIN sales.orders AS o
    ON o.user_id = u.user_id
ORDER BY u.user_id;

FULL OUTER JOIN: Keep everything from both sides

This returns matched rows plus non-matching rows from either table. It is useful when reconciling two systems and you want to find records present on only one side.

SELECT
    u.user_id,
    u.full_name,
    o.order_id,
    o.order_status
FROM sales.users AS u
FULL OUTER JOIN sales.orders AS o
    ON o.user_id = u.user_id
ORDER BY u.user_id NULLS LAST, o.order_id NULLS LAST;

Find Users Who Have Never Placed an Order

This is a classic anti-join pattern. Start with all users, left join orders, and keep only the rows where the order side is missing.

SELECT
    u.user_id,
    u.email,
    u.full_name,
    u.created_at
FROM sales.users AS u
LEFT JOIN sales.orders AS o
    ON o.user_id = u.user_id
WHERE o.order_id IS NULL
ORDER BY u.created_at DESC;

SELF JOIN: Joining a Table to Itself

A self join connects rows within the same table. This is useful when rows have a parent-child relationship stored via a foreign key that points back to the same table's primary key. Think of an employee table where each employee has a manager_id referencing another employee.

CREATE TABLE hr.employees (
    employee_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    employee_name VARCHAR(150) NOT NULL,
    manager_id INTEGER REFERENCES hr.employees(employee_id),
    department VARCHAR(100) NOT NULL
);

To list every employee alongside their manager's name:

SELECT
    e.employee_id,
    e.employee_name AS employee,
    m.employee_name AS manager
FROM hr.employees AS e
LEFT JOIN hr.employees AS m
    ON e.manager_id = m.employee_id
ORDER BY m.employee_name NULLS FIRST, e.employee_name;

Why LEFT JOIN instead of INNER JOIN? The CEO or top-level manager has no manager (manager_id IS NULL). An inner join would silently drop them from the result. A left join guarantees every employee appears, with NULL in the manager column for the top of the hierarchy.

You can also use a self join to find peers — employees who share the same manager:

SELECT
    a.employee_name AS employee_a,
    b.employee_name AS employee_b,
    a.manager_id
FROM hr.employees AS a
INNER JOIN hr.employees AS b
    ON a.manager_id = b.manager_id
    AND a.employee_id < b.employee_id
ORDER BY a.manager_id, a.employee_name;
💡
Why a.employee_id < b.employee_id? Without this condition, you would get duplicate pairs (Alice–Bob and Bob–Alice) plus self-pairs (Alice–Alice). The less-than filter keeps only one copy of each unique pair.

CROSS JOIN: The Cartesian Product

A cross join produces every possible combination of rows from two tables. If the left side has 3 rows and the right side has 4, the result has 12 rows. This sounds dangerous on large tables, but it is intentionally useful when you need to generate a grid or matrix from two small reference sets.

SELECT
    region.region_name,
    tier.tier_name
FROM (VALUES ('North America'), ('Europe'), ('APAC')) AS region(region_name)
CROSS JOIN (VALUES ('Standard'), ('Express'), ('Overnight')) AS tier(tier_name)
ORDER BY region.region_name, tier.tier_name;

A practical e-commerce example — generate a calendar skeleton for every user to detect months with zero orders:

WITH months AS (
    SELECT generate_series(
        DATE '2026-01-01',
        DATE '2026-06-01',
        INTERVAL '1 month'
    )::DATE AS month_start
),
active_users AS (
    SELECT user_id, full_name
    FROM sales.users
    WHERE is_active = TRUE
)
SELECT
    au.user_id,
    au.full_name,
    m.month_start,
    COUNT(o.order_id) AS order_count
FROM active_users AS au
CROSS JOIN months AS m
LEFT JOIN sales.orders AS o
    ON o.user_id = au.user_id
    AND DATE_TRUNC('month', o.order_date) = m.month_start
GROUP BY au.user_id, au.full_name, m.month_start
ORDER BY au.user_id, m.month_start;
⚠
Warning: never cross join two large tables accidentally. If both tables have 10,000 rows, the result is 100 million rows. Use cross joins deliberately with small reference data or VALUES lists.

LATERAL Join: The For-Each Subquery

LATERAL lets a subquery reference columns from the preceding table. Think of it as a for-each loop: for every row on the left side, PostgreSQL runs the lateral subquery using that row's values. This is perfect for "top-N per group" queries without window functions.

Fetch the 3 most recent orders for each user:

SELECT
    u.user_id,
    u.full_name,
    recent.order_id,
    recent.order_date,
    recent.total_amount
FROM sales.users AS u
LEFT JOIN LATERAL (
    SELECT o.order_id, o.order_date, o.total_amount
    FROM sales.orders AS o
    WHERE o.user_id = u.user_id
    ORDER BY o.order_date DESC
    LIMIT 3
) AS recent
    ON TRUE
ORDER BY u.user_id, recent.order_date DESC;

Why ON TRUE? The filtering already happened inside the lateral subquery's WHERE clause. The ON TRUE simply accepts every row the subquery returns. With a LEFT JOIN LATERAL, users with zero orders still appear with NULL values.

ℹ
LATERAL vs. Window Functions: both can solve "top-N per group." LATERAL is often more readable when you only need a few columns from the correlated table and want to limit rows. Window functions shine when you need to rank within the same result set without a separate subquery.

Set Operators: UNION, INTERSECT, EXCEPT

Set operators combine or compare entire result sets. Unlike joins, which match rows column-by-column using keys, set operators stack results vertically — both queries must produce the same number of columns with compatible data types.

| Operator | Keeps | Duplicates |
| --- | --- | --- |
| UNION | All unique rows from both queries | Removed (dedup cost) |
| UNION ALL | All rows from both queries | Kept (faster) |
| INTERSECT | Only rows that appear in both queries | Removed |
| EXCEPT | Rows in the first query that are not in the second | Removed |

UNION & UNION ALL

UNION merges two result sets and removes duplicates. Use it when you need a deduplicated combined list, such as merging email subscribers with registered users.

-- Deduplicated list of all known emails
SELECT email FROM sales.users
UNION
SELECT email FROM sales.marketing_subscribers
ORDER BY email;

UNION ALL is faster because it skips deduplication. Use it when duplicates are acceptable or when you know the sources are already distinct.

-- All order events from two regional tables (duplicates OK)
SELECT order_id, event_type, created_at FROM sales.order_events_na
UNION ALL
SELECT order_id, event_type, created_at FROM sales.order_events_eu
ORDER BY created_at DESC;

INTERSECT

INTERSECT returns only rows present in both queries. This is useful for finding overlap — for example, users who are both registered customers and newsletter subscribers.

-- Emails that exist in both systems
SELECT email FROM sales.users
INTERSECT
SELECT email FROM sales.marketing_subscribers
ORDER BY email;

EXCEPT

EXCEPT returns rows from the first query that do not appear in the second. Think of it as set subtraction. This is a clean way to find subscribers who have not yet created an account.

-- Subscribers who are NOT registered users
SELECT email FROM sales.marketing_subscribers
EXCEPT
SELECT email FROM sales.users
ORDER BY email;
💡
Performance note: UNION, INTERSECT, and EXCEPT all perform implicit DISTINCT operations. On large data sets, prefer UNION ALL when deduplication is not required, and consider indexed columns to speed up the dedup step when it is.

Module 4: Temporary Structures & Views

Temporary Tables

CREATE TEMP TABLE creates session-scoped storage. A temp table behaves like a regular table for the current connection, but PostgreSQL drops it automatically when the session ends. It is useful for breaking a large transformation into steps without polluting the permanent schema.

CREATE TEMP TABLE recent_high_value_orders AS
SELECT order_id, user_id, total_amount, order_date
FROM sales.orders
WHERE order_date >= NOW() - INTERVAL '30 days'
  AND total_amount >= 500.00;

SELECT *
FROM recent_high_value_orders
ORDER BY order_date DESC;

Standard Views

A view is a saved query. It does not store the result itself. Instead, PostgreSQL runs the underlying query when you query the view. Views are useful when backend applications need a stable, readable interface over a more complex join.

CREATE OR REPLACE VIEW reporting.user_order_summary AS
SELECT
    u.user_id,
    u.full_name,
    u.email,
    COUNT(o.order_id) AS total_orders,
    COALESCE(SUM(o.total_amount), 0.00) AS total_spent,
    MAX(o.order_date) AS last_order_date
FROM sales.users AS u
LEFT JOIN sales.orders AS o
    ON o.user_id = u.user_id
GROUP BY u.user_id, u.full_name, u.email;

SELECT *
FROM reporting.user_order_summary
WHERE total_spent >= 1000.00
ORDER BY total_spent DESC;

Materialized Views

A materialized view stores data on disk. That makes reads faster for expensive reporting queries, but the result can become stale. You refresh it explicitly when you want current data. A normal view is like reading a live dashboard. A materialized view is like reading yesterday's generated report.

CREATE MATERIALIZED VIEW reporting.daily_sales_totals AS
SELECT
    DATE(order_date) AS sales_day,
    COUNT(*) AS order_count,
    SUM(total_amount) AS gross_revenue
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
GROUP BY DATE(order_date);

REFRESH MATERIALIZED VIEW reporting.daily_sales_totals;
⚠
Operational trade-off: materialized views speed up reads by shifting work into refresh time. That is often a good trade when dashboards are queried constantly but raw data changes in controlled batches.
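
When dashboards must stay readable during a refresh, PostgreSQL offers REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on the materialized view. A sketch (the index name is illustrative):

```sql
-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_daily_sales_totals_day
ON reporting.daily_sales_totals (sales_day);

-- Readers keep seeing the old contents while the new result is built,
-- then rows are swapped in without taking an exclusive lock.
REFRESH MATERIALIZED VIEW CONCURRENTLY reporting.daily_sales_totals;
```

The trade-off: the concurrent form does more work (it diffs old and new rows), so plain REFRESH is still fine for batch jobs that run when nobody is reading.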

Module 5: Intermediate Querying (CTEs & Window Functions)

Common Table Expressions (Standard)

The WITH clause improves readability. A CTE lets you name intermediate results instead of burying logic inside nested subqueries. It reads more like a step-by-step plan, which matters when a query becomes part of production reporting or an analytics pipeline.

WITH order_totals AS (
    SELECT
        user_id,
        AVG(total_amount) AS average_order_value
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
    GROUP BY user_id
)
SELECT
    u.user_id,
    u.full_name,
    ot.average_order_value
FROM order_totals AS ot
INNER JOIN sales.users AS u
    ON u.user_id = ot.user_id
WHERE ot.average_order_value > 150.00
ORDER BY ot.average_order_value DESC;

Chaining Multiple CTEs

CTEs can reference earlier CTEs, creating a readable step-by-step data pipeline:

WITH paid_orders AS (
    SELECT user_id, total_amount
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
),
user_totals AS (
    SELECT
        user_id,
        COUNT(*)          AS order_count,
        SUM(total_amount) AS lifetime_value
    FROM paid_orders
    GROUP BY user_id
)
SELECT
    u.full_name,
    ut.order_count,
    ut.lifetime_value
FROM user_totals AS ut
INNER JOIN sales.users AS u ON u.user_id = ut.user_id
WHERE ut.lifetime_value >= 1000.00
ORDER BY ut.lifetime_value DESC;

Recursive CTEs

WITH RECURSIVE traverses hierarchical data. It has two parts: an anchor query (the starting rows) and a recursive member that walks to the next level. PostgreSQL repeats the recursive member until no new rows are produced.

-- Org chart: who reports to whom, with depth
WITH RECURSIVE org_chart AS (
    -- Anchor: top-level (no manager)
    SELECT
        employee_id,
        employee_name,
        manager_id,
        0 AS depth,
        employee_name::TEXT AS path
    FROM hr.employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step: walk down the tree
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        oc.depth + 1,
        oc.path || ' → ' || e.employee_name
    FROM hr.employees AS e
    INNER JOIN org_chart AS oc
        ON e.manager_id = oc.employee_id
)
SELECT employee_id, employee_name, depth, path
FROM org_chart
ORDER BY path;
💡
Safety tip: always include a depth limit (WHERE depth < 20) or use PostgreSQL 14’s CYCLE clause to prevent infinite recursion when data has circular references.
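
Applied to the org chart above, the depth guard is a single extra predicate in the recursive member:

```sql
-- Same traversal as before, but with a hard stop: even if manager_id
-- links ever form a cycle, recursion ends after 20 levels.
WITH RECURSIVE org_chart AS (
    SELECT employee_id, employee_name, manager_id, 0 AS depth
    FROM hr.employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id, e.employee_name, e.manager_id, oc.depth + 1
    FROM hr.employees AS e
    INNER JOIN org_chart AS oc
        ON e.manager_id = oc.employee_id
    WHERE oc.depth < 20  -- depth limit guards against circular references
)
SELECT employee_id, employee_name, depth
FROM org_chart
ORDER BY depth, employee_name;
```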

Window Functions & Ranking

Window functions calculate across related rows without collapsing them. That is the critical difference from GROUP BY. If GROUP BY turns many rows into one summary row, a window function leaves the detail rows intact and attaches the calculation beside each one.

Why window functions matter

They solve ranking, running totals, gaps between events, and “latest row per group” queries cleanly without self-joins or procedural loops.

Anatomy of OVER(): the PARTITION BY clause defines the group boundary (like GROUP BY but without collapsing). The ORDER BY inside OVER() determines row sequence within each partition.

ROW_NUMBER() — Unique Sequential Rank

ROW_NUMBER() assigns a unique integer to each row within its partition. Even when two rows have the same value in the ORDER BY column, they get different numbers. This makes it ideal for “pick exactly one row per group” queries.

SELECT
    user_id,
    order_id,
    order_date,
    total_amount,
    ROW_NUMBER() OVER (
        PARTITION BY user_id
        ORDER BY order_date DESC
    ) AS order_rank
FROM sales.orders;

To find the most recent order for every user, wrap the ranking logic and keep only rank 1:

WITH ranked_orders AS (
    SELECT
        user_id,
        order_id,
        order_date,
        total_amount,
        ROW_NUMBER() OVER (
            PARTITION BY user_id
            ORDER BY order_date DESC
        ) AS order_rank
    FROM sales.orders
)
SELECT
    user_id,
    order_id,
    order_date,
    total_amount
FROM ranked_orders
WHERE order_rank = 1
ORDER BY order_date DESC;

RANK() — Gaps After Ties

RANK() handles ties by giving identical values the same rank, then skipping. If two orders tie for rank 1, the next order gets rank 3 (not 2). This is useful when you need to know the competitive position and gaps matter — like a leaderboard where two students tie for 1st, and the next is 3rd.

SELECT
    user_id,
    order_id,
    total_amount,
    RANK() OVER (
        PARTITION BY user_id
        ORDER BY total_amount DESC
    ) AS spending_rank
FROM sales.orders
ORDER BY user_id, spending_rank;

DENSE_RANK() — No Gaps After Ties

DENSE_RANK() also handles ties, but without gaps. If two orders tie for rank 1, the next order gets rank 2. This is better when you want to know “how many distinct levels exist” rather than the absolute position.

SELECT
    user_id,
    order_id,
    total_amount,
    DENSE_RANK() OVER (
        PARTITION BY user_id
        ORDER BY total_amount DESC
    ) AS dense_spending_rank
FROM sales.orders
ORDER BY user_id, dense_spending_rank;

Comparing All Three Rankings Side by Side

The difference becomes obvious when ties exist:

SELECT
    user_id,
    order_id,
    total_amount,
    ROW_NUMBER() OVER w AS row_num,
    RANK()       OVER w AS rank_val,
    DENSE_RANK() OVER w AS dense_rank_val
FROM sales.orders
WINDOW w AS (PARTITION BY user_id ORDER BY total_amount DESC)
ORDER BY user_id, total_amount DESC;

| total_amount | ROW_NUMBER | RANK | DENSE_RANK |
| --- | --- | --- | --- |
| 500.00 | 1 | 1 | 1 |
| 500.00 | 2 | 1 | 1 |
| 300.00 | 3 | 3 | 2 |
| 150.00 | 4 | 4 | 3 |
💡
Named WINDOW clause: the WINDOW w AS (...) syntax lets you define the window once and reuse it across multiple functions — reducing repetition and making the query easier to maintain.

NTILE(n) — Bucket Assignment

NTILE(n) divides the partition into n roughly equal buckets. This is useful for percentile analysis: split customers into quartiles by spending, or orders into deciles by value.

SELECT
    user_id,
    full_name,
    lifetime_spent,
    NTILE(4) OVER (ORDER BY lifetime_spent DESC) AS spending_quartile
FROM (
    SELECT
        u.user_id,
        u.full_name,
        SUM(o.total_amount) AS lifetime_spent
    FROM sales.users AS u
    INNER JOIN sales.orders AS o
        ON o.user_id = u.user_id
    WHERE o.order_status IN ('paid', 'shipped', 'delivered')
    GROUP BY u.user_id, u.full_name
) AS user_totals
ORDER BY spending_quartile, lifetime_spent DESC;

Quartile 1 = top 25% spenders, quartile 4 = bottom 25%. Marketing teams often use this to segment users for targeted campaigns.

LAG() and LEAD() — Peeking at Neighbors

These functions let a row look at the previous or next row in the window. This is useful for change analysis — week-over-week revenue growth, time between consecutive orders, or detecting gaps in a sequence.

WITH weekly_sales AS (
    SELECT
        DATE_TRUNC('week', order_date) AS sales_week,
        SUM(total_amount) AS weekly_revenue
    FROM sales.orders
    WHERE order_status IN ('paid', 'shipped', 'delivered')
    GROUP BY DATE_TRUNC('week', order_date)
)
SELECT
    sales_week,
    weekly_revenue,
    LAG(weekly_revenue)  OVER (ORDER BY sales_week) AS prev_week,
    LEAD(weekly_revenue) OVER (ORDER BY sales_week) AS next_week,
    ROUND(
        (weekly_revenue - LAG(weekly_revenue) OVER (ORDER BY sales_week))
        / NULLIF(LAG(weekly_revenue) OVER (ORDER BY sales_week), 0)
        * 100, 1
    ) AS wow_growth_pct
FROM weekly_sales
ORDER BY sales_week;

NULLIF prevents division by zero when the previous week has no revenue. The LAG default is NULL for the first row (no previous row exists), so the growth percentage will be NULL for week 1 — which is correct.

Running Totals with SUM() OVER

An aggregate function inside OVER() creates a running calculation. This is how you build cumulative revenue, running order counts, or progressive averages without self-joins.

SELECT
    order_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_revenue
FROM sales.orders
WHERE order_status IN ('paid', 'shipped', 'delivered')
ORDER BY order_date;
ℹ
Frame clause matters: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW explicitly says “sum everything from the first row up to this row.” Without a frame clause, PostgreSQL defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which includes every peer row that ties on the ORDER BY value — so a running total can jump by several rows’ amounts at once when order dates are equal.

Module 6: Stored Procedures

A stored procedure encapsulates database logic on the server side. PostgreSQL introduced CREATE PROCEDURE in version 11. Unlike functions, procedures can manage transactions with COMMIT and ROLLBACK, making them ideal for multi-step batch operations where each step should either fully succeed or fully revert.

| Feature | Function | Procedure |
| --- | --- | --- |
| Returns a value | Yes (scalar or table) | No (uses OUT parameters) |
| Callable in SELECT | Yes | No — use CALL |
| Transaction control | No | Yes (COMMIT / ROLLBACK) |
| Best for | Computed values, reusable queries | Batch workflows, multi-step mutations |
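
For contrast with the procedures below, here is a minimal function sketch — it returns a value and can be called inside SELECT, which a procedure cannot (the function name and tier logic are illustrative):

```sql
-- A function computes and returns a value; STABLE tells the planner
-- the result does not change within a single statement.
CREATE OR REPLACE FUNCTION sales.user_lifetime_value(p_user_id INTEGER)
RETURNS NUMERIC
LANGUAGE sql
STABLE
AS $$
    SELECT COALESCE(SUM(total_amount), 0)
    FROM sales.orders
    WHERE user_id = p_user_id
      AND order_status IN ('paid', 'shipped', 'delivered');
$$;

-- Usable directly inside a query, unlike a procedure
SELECT user_id, full_name, sales.user_lifetime_value(user_id) AS ltv
FROM sales.users;
```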

Basic Procedure Structure

A minimal procedure that updates loyalty points for all users who placed an order this month:

CREATE OR REPLACE PROCEDURE sales.award_monthly_loyalty()
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE sales.users AS u
    SET loyalty_points = u.loyalty_points + 10
    WHERE EXISTS (
        SELECT 1
        FROM sales.orders AS o
        WHERE o.user_id = u.user_id
          AND o.order_date >= DATE_TRUNC('month', CURRENT_DATE)
          AND o.order_status IN ('paid', 'shipped', 'delivered')
    );
END;
$$;

Call it with:

CALL sales.award_monthly_loyalty();

Parameters: IN, OUT, INOUT

IN parameters pass values into the procedure. OUT parameters return values to the caller. INOUT does both — the caller provides a value that the procedure can modify.

CREATE OR REPLACE PROCEDURE sales.apply_discount(
    IN p_user_id INTEGER,
    IN p_discount_pct NUMERIC,
    INOUT p_orders_updated INTEGER DEFAULT 0
)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE sales.orders
    SET total_amount = total_amount * (1 - p_discount_pct / 100)
    WHERE user_id = p_user_id
      AND order_status = 'pending_payment';

    GET DIAGNOSTICS p_orders_updated = ROW_COUNT;
END;
$$;

CALL sales.apply_discount(42, 15.0, NULL);
-- Returns the number of updated orders via the INOUT parameter
💡
GET DIAGNOSTICS captures metadata about the last SQL statement — ROW_COUNT tells you how many rows were affected, which is useful for logging and validation inside procedures.

Control Flow in PL/pgSQL

PL/pgSQL supports IF, CASE, and loops. This lets you build conditional logic directly in the database rather than round-tripping to application code.

CREATE OR REPLACE PROCEDURE sales.upgrade_user_tier(
    IN p_user_id INTEGER
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_total_spent NUMERIC;
    v_new_tier TEXT;
BEGIN
    SELECT COALESCE(SUM(total_amount), 0)
    INTO v_total_spent
    FROM sales.orders
    WHERE user_id = p_user_id
      AND order_status IN ('paid', 'shipped', 'delivered');

    IF v_total_spent >= 10000 THEN
        v_new_tier := 'platinum';
    ELSIF v_total_spent >= 5000 THEN
        v_new_tier := 'gold';
    ELSIF v_total_spent >= 1000 THEN
        v_new_tier := 'silver';
    ELSE
        v_new_tier := 'bronze';
    END IF;

    UPDATE sales.users
    SET loyalty_tier = v_new_tier,
        updated_at = NOW()
    WHERE user_id = p_user_id;
END;
$$;

Temporary Tables Inside Procedures

Temp tables are powerful staging areas inside procedures. Use them to collect intermediate results, then process those results in subsequent steps. The ON COMMIT DROP option guarantees the temp table is cleaned up when the transaction ends.

CREATE OR REPLACE PROCEDURE billing.process_batch_payments(
    IN p_batch_id BIGINT
)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Step 1: Create a staging table for this batch
    CREATE TEMP TABLE payment_worklist (
        order_id BIGINT,
        user_id BIGINT,
        amount NUMERIC(12, 2)
    ) ON COMMIT DROP;

    -- Step 2: Load pending payments into the staging table
    INSERT INTO payment_worklist (order_id, user_id, amount)
    SELECT order_id, user_id, total_amount
    FROM sales.orders
    WHERE payment_batch_id = p_batch_id
      AND order_status = 'pending_payment';

    -- Step 3: Mark orders as paid
    UPDATE sales.orders AS o
    SET order_status = 'paid',
        paid_at = NOW()
    FROM payment_worklist AS pw
    WHERE o.order_id = pw.order_id;

    -- Step 4: Award loyalty points based on the batch
    UPDATE sales.users AS u
    SET loyalty_points = u.loyalty_points + 10
    FROM payment_worklist AS pw
    WHERE u.user_id = pw.user_id;

    -- Note: a PL/pgSQL block with an EXCEPTION clause runs in a
    -- subtransaction and cannot execute COMMIT or ROLLBACK, so no
    -- handler is used here. If any step fails, the whole transaction
    -- aborts and every change above is reverted.
    COMMIT;
END;
$$;
⚠
ON COMMIT DROP guarantees cleanup — the temp table is destroyed at the end of the transaction. Use this when temp data should never survive the transaction boundary, especially in batch-style procedures.

Error Handling with EXCEPTION

The EXCEPTION block catches errors inside PL/pgSQL. This lets you log failures, clean up resources, or retry before the error propagates to the caller.

CREATE OR REPLACE PROCEDURE sales.safe_delete_user(
    IN p_user_id INTEGER
)
LANGUAGE plpgsql
AS $$
BEGIN
    DELETE FROM sales.orders WHERE user_id = p_user_id;
    DELETE FROM sales.users  WHERE user_id = p_user_id;

    RAISE NOTICE 'User % and related orders deleted.', p_user_id;
EXCEPTION
    WHEN foreign_key_violation THEN
        RAISE EXCEPTION 'Cannot delete user %: active references exist.', p_user_id;
    WHEN OTHERS THEN
        RAISE EXCEPTION 'Unexpected error deleting user %: %', p_user_id, SQLERRM;
END;
$$;
ℹ
When to use procedures vs. application code: put logic in a procedure when it involves multiple tables in a single transaction, needs to stay close to the data for performance, or must be consistent across all callers (APIs, reports, cron jobs). Keep business rules in application code when they need unit testing, frequent changes, or integration with external services.

Module 7: Advanced Grouping & Pivoting

Standard GROUP BY produces one level of aggregation. PostgreSQL offers ROLLUP, CUBE, and GROUPING SETS to compute subtotals and grand totals in a single pass — no need to UNION ALL multiple queries.

ROLLUP — Hierarchical Subtotals

ROLLUP creates subtotals from right to left. ROLLUP(a, b) produces groups (a, b), (a), and ().

SELECT
    sales_region,
    order_status,
    COUNT(*)          AS order_count,
    SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;

Rows where order_status IS NULL are region subtotals. The row where both columns are NULL is the grand total.
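
To tell a subtotal NULL apart from a genuine NULL stored in the data, PostgreSQL provides the GROUPING() function — it returns 1 when a column has been rolled away for that result row:

```sql
SELECT
    sales_region,
    order_status,
    SUM(total_amount) AS revenue,
    -- 1 = this row is a subtotal/grand-total level for the column
    GROUPING(sales_region) AS region_rolled_up,
    GROUPING(order_status) AS status_rolled_up
FROM sales.orders
GROUP BY ROLLUP (sales_region, order_status)
ORDER BY sales_region NULLS LAST, order_status NULLS LAST;
```

The grand-total row has both flags set to 1; region subtotals have status_rolled_up = 1 only.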

CUBE — All Combinations

CUBE generates subtotals for every possible combination. For two columns this gives: (a, b), (a), (b), ().

SELECT
    sales_region,
    payment_method,
    SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY CUBE (sales_region, payment_method)
ORDER BY sales_region NULLS LAST, payment_method NULLS LAST;

GROUPING SETS — Explicit Control

You choose exactly which groupings to compute. ROLLUP and CUBE are shortcuts for common patterns — GROUPING SETS gives full control.

SELECT
    sales_region,
    payment_method,
    SUM(total_amount) AS revenue
FROM sales.orders
GROUP BY GROUPING SETS (
    (sales_region, payment_method),
    (sales_region),
    (payment_method),
    ()
)
ORDER BY sales_region NULLS LAST, payment_method NULLS LAST;

Pivoting with FILTER

PostgreSQL does not have a native PIVOT keyword. The idiomatic approach is conditional aggregation using FILTER — it turns row values into columns.

SELECT
    DATE_TRUNC('month', order_date) AS sales_month,
    SUM(total_amount) FILTER (WHERE order_status = 'paid')      AS paid,
    SUM(total_amount) FILTER (WHERE order_status = 'shipped')   AS shipped,
    SUM(total_amount) FILTER (WHERE order_status = 'cancelled') AS cancelled
FROM sales.orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month;
💡
FILTER vs. CASE: SUM(x) FILTER (WHERE ...) is the PostgreSQL idiom. SUM(CASE WHEN ... THEN x ELSE 0 END) achieves the same result and works on all SQL databases if portability matters.
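For comparison, the portable CASE form of the same pivot looks like this. One subtle difference: FILTER yields NULL for a month with no matching rows, while ELSE 0 yields 0.

```sql
SELECT
    DATE_TRUNC('month', order_date) AS sales_month,
    SUM(CASE WHEN order_status = 'paid'      THEN total_amount ELSE 0 END) AS paid,
    SUM(CASE WHEN order_status = 'shipped'   THEN total_amount ELSE 0 END) AS shipped,
    SUM(CASE WHEN order_status = 'cancelled' THEN total_amount ELSE 0 END) AS cancelled
FROM sales.orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY sales_month;
```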

Module 8: Performance & Maintenance Basics

Indexes

A B-Tree index is like a phonebook. Instead of scanning every person in a city to find one surname, you jump into a sorted structure that gets you close immediately and then narrow from there. PostgreSQL uses B-Tree indexes by default, and they are a good fit for equality and range filters on frequently searched columns.

CREATE INDEX idx_users_email
ON sales.users (email);

That index is valuable if the application often looks up users by email during login, checkout, or account management workflows.
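Indexes can also be narrowed to the rows a query actually touches. A sketch, assuming login lookups only ever match active accounts:

```sql
-- Partial index: covers only rows where is_active is true,
-- so it stays smaller and cheaper to maintain than a full index.
CREATE INDEX idx_users_active_email
ON sales.users (email)
WHERE is_active;
```

The planner uses a partial index only when the query's WHERE clause implies the index predicate — here, a lookup like WHERE is_active AND email = ... qualifies, but a bare email lookup does not.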

Query Analysis with EXPLAIN ANALYZE

EXPLAIN ANALYZE shows the actual execution plan and timing. Use it to confirm whether PostgreSQL is scanning the whole table with a Seq Scan or using an index with an Index Scan or Bitmap Index Scan. For small tables a sequential scan can be fine. For large ones it often becomes the first sign of missing indexing.

EXPLAIN ANALYZE
SELECT user_id, email, full_name
FROM sales.users
WHERE email = 'ava.chen@example.com';
Plan shapeTypical meaningWhat to ask
Seq ScanPostgreSQL is reading rows in table orderIs the table small, or is an index missing?
Index ScanPostgreSQL is navigating through an indexIs the predicate selective enough to benefit from the index?
Bitmap Index ScanPostgreSQL uses an index, then fetches many matching heap pages efficientlyIs the query matching a moderate chunk of the table?
💡
Performance habit: do not add indexes blindly. First identify slow queries, inspect the plan, then add the smallest useful index that matches the real access pattern.
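For the email lookup above, the plan shape is what changes once idx_users_email exists. The output below is illustrative only; actual costs, timings, and row counts depend on your data:

```sql
-- Before the index (illustrative shape):
--   Seq Scan on users  (cost=... rows=1)
--     Filter: ((email)::text = 'ava.chen@example.com'::text)
--
-- After CREATE INDEX idx_users_email:
--   Index Scan using idx_users_email on users  (cost=... rows=1)
--     Index Cond: ((email)::text = 'ava.chen@example.com'::text)
```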

Module 9: User-Defined Functions

Functions encapsulate reusable logic that returns a value. Unlike procedures (which use CALL and cannot appear in queries), functions can be used inside SELECT, WHERE, and JOIN clauses, making them composable building blocks.

FeatureFunctionProcedure
Returns a valueYes (scalar or table)No
Callable in SELECTYesNo — use CALL
Transaction controlNoYes

Scalar Function

A scalar function returns one value. Use it when you have a business-rule calculation that is reused across multiple queries.

CREATE OR REPLACE FUNCTION billing.calculate_order_tax(
    p_subtotal NUMERIC,
    p_tax_rate NUMERIC
)
RETURNS NUMERIC
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    RETURN ROUND(p_subtotal * p_tax_rate, 2);
END;
$$;
-- Use it inside a query
SELECT
    order_id,
    total_amount,
    billing.calculate_order_tax(total_amount, 0.0825) AS tax
FROM sales.orders
LIMIT 5;
IMMUTABLE tells PostgreSQL that the same inputs always produce the same output. This lets the planner evaluate calls with constant arguments once at plan time instead of once per row at execution time. Use STABLE for functions that read data but do not write, and VOLATILE (the default) for functions with side effects.

Pure SQL Function (Inlineable)

A LANGUAGE sql function can be inlined by the planner — PostgreSQL substitutes the SQL body directly into the calling query, avoiding per-call overhead. Use this for simple expressions.

CREATE OR REPLACE FUNCTION billing.net_amount(
    p_gross NUMERIC,
    p_tax   NUMERIC
)
RETURNS NUMERIC
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT ROUND(p_gross - p_tax, 2);
$$;
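Combined with the tax function from earlier, the helper composes naturally inside a query; because it is a single-expression SQL function, the planner can inline it:

```sql
SELECT
    order_id,
    total_amount,
    billing.net_amount(
        total_amount,
        billing.calculate_order_tax(total_amount, 0.0825)
    ) AS net
FROM sales.orders
LIMIT 5;
```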

Table-Returning Function

A table function returns a result set that callers use in FROM like a virtual table. Think of it as a parameterized view.

CREATE OR REPLACE FUNCTION reporting.get_user_orders(
    p_user_id BIGINT
)
RETURNS TABLE (
    order_id     BIGINT,
    order_date   TIMESTAMP WITH TIME ZONE,
    order_status TEXT,
    total_amount NUMERIC(12, 2)
)
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN QUERY
    SELECT o.order_id, o.order_date, o.order_status, o.total_amount
    FROM sales.orders AS o
    WHERE o.user_id = p_user_id
    ORDER BY o.order_date DESC;
END;
$$;
SELECT * FROM reporting.get_user_orders(42);

Trigger Function

A trigger function runs automatically when data changes. It returns TRIGGER and has access to NEW (the incoming row) and OLD (the previous row). A common use case is auto-setting the updated_at timestamp.

CREATE OR REPLACE FUNCTION sales.set_updated_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.updated_at := NOW();
    RETURN NEW;
END;
$$;

CREATE TRIGGER trg_users_updated_at
    BEFORE UPDATE ON sales.users
    FOR EACH ROW
    EXECUTE FUNCTION sales.set_updated_at();
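With the trigger in place, any UPDATE refreshes the timestamp automatically. This assumes sales.users has an updated_at column, which the trigger function requires:

```sql
UPDATE sales.users
SET loyalty_points = loyalty_points + 50
WHERE user_id = 42;
-- updated_at is set to NOW() for that row, with no application code involved.
```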
💡
When to use functions vs. application code: use functions for logic that must be consistent across all callers (queries, reports, batch jobs) and benefits from being close to the data. Keep business rules in application code when they need unit testing, rapid iteration, or integration with external services.

Reference Links