PostgreSQL JSONB & Advanced Query Vocabulary for Backend Developers
Master PostgreSQL JSONB operators, GIN indexes, window functions, CTEs, lateral joins, and query plan vocabulary — with real SQL examples and developer conversation phrases.
PostgreSQL is one of the most powerful open-source databases in the world — and its vocabulary goes well beyond basic SELECT and JOIN. If you work on a backend team that uses Postgres, you will encounter terms like GIN index, lateral join, window function, and bitmap scan in code reviews, performance investigations, and architecture discussions. This guide explains them all in plain English with real examples.
JSONB vs JSON
JSON vs JSONB Storage
PostgreSQL supports two types for storing JSON data:
jsonstores the raw text, exactly as given. It is slower to query because it re-parses on every access.jsonbstores parsed binary representation. It is faster to query and supports indexing, but slightly slower on insert due to the parsing overhead.
Use jsonb for almost everything. json is only useful if you need to preserve the exact original text (including whitespace and duplicate keys).
“Always use
jsonbunless you have a specific reason forjson.jsonbis indexable and far faster for reads.”
JSONB Operators
-> and ->>
->returns a JSON object/array element by key (result isjsonb)->>returns a JSON object element astext
SELECT data -> 'address' FROM users; -- returns jsonb
SELECT data ->> 'name' FROM users; -- returns text
“Use
->>when you need to compare the value as a string. Use->when you’re chaining further into nested JSON.”
#> and #>>
#>extracts a nested value at a path (result isjsonb)#>>extracts a nested value at a path astext
SELECT data #> '{address, city}' FROM users; -- returns jsonb
SELECT data #>> '{address, city}' FROM users; -- returns text
@> — Contains
@> tests whether the left jsonb value contains the right value. This operator is accelerated by GIN indexes.
SELECT * FROM products WHERE attributes @> '{"colour": "blue"}';
“
@>is your best friend for filtering on jsonb. Make sure you have a GIN index on that column, or every query will do a sequential scan.”
<@ — Is Contained By
<@ is the reverse of @> — it tests whether the left value is contained by the right.
? — Key Exists
? tests whether a text key exists in a jsonb object.
SELECT * FROM events WHERE payload ? 'errorCode';
Indexing JSONB
GIN Index on JSONB
A GIN (Generalised Inverted Index) index on a jsonb column creates an index over all keys and values inside the JSON documents. It dramatically speeds up @>, ?, and ?| queries.
CREATE INDEX idx_attributes_gin ON products USING GIN (attributes);
“Without a GIN index, every
@>query does a full sequential scan. Once you add the GIN index, those queries drop from seconds to milliseconds.”
Full-Text Search
tsvector
A tsvector is a sorted list of normalised lexemes (words) representing a document. It is the data type you index for full-text search.
SELECT to_tsvector('english', 'The quick brown fox') AS lexemes;
tsquery
A tsquery is a parsed text search query — one or more lexemes connected by & (AND), | (OR), or ! (NOT).
SELECT * FROM articles WHERE search_vector @@ to_tsquery('english', 'postgresql & index');
“Store a pre-computed
tsvectorcolumn and index it with GIN. Callingto_tsvectoron every query without an index is slow.”
Window Functions
Window functions perform calculations across a set of rows related to the current row — without collapsing them into a single output row like GROUP BY does.
ROW_NUMBER()
ROW_NUMBER() assigns a sequential integer to each row within a partition.
SELECT user_id, order_date,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS order_num
FROM orders;
“Use
ROW_NUMBER()to find the most recent order per user — partition byuser_id, order byorder_datedescending, and filter fororder_num = 1.”
RANK() and DENSE_RANK()
RANK() assigns the same rank to tied rows but skips subsequent ranks. DENSE_RANK() does not skip.
LAG() and LEAD()
LAG() accesses the value of a column from a previous row. LEAD() accesses the next row. Useful for calculating differences between consecutive rows.
SELECT date, revenue,
LAG(revenue) OVER (ORDER BY date) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY date) AS revenue_change
FROM daily_stats;
“LAG and LEAD let you compare a row with its neighbours without a self-join. Much cleaner.”
CTEs and Subqueries
CTE (Common Table Expression)
A CTE (introduced with WITH) is a named temporary result set that you can reference within the main query. It improves readability but does not always improve performance.
WITH active_users AS (
SELECT id FROM users WHERE last_login > NOW() - INTERVAL '30 days'
)
SELECT count(*) FROM orders WHERE user_id IN (SELECT id FROM active_users);
CTE vs Subquery Trade-offs
In older Postgres versions (before 12), CTEs were optimisation fences — the query planner could not push predicates into them. From Postgres 12, non-recursive CTEs are inlined by default. Subqueries are often more efficient for simple filters; CTEs shine for readability and recursion.
“In Postgres 14, this CTE is inlined and the planner optimises it the same as a subquery. Use the one that reads more clearly.”
Advanced JOIN and Functions
Lateral Join
A lateral join allows each row of the left side to be used in the subquery on the right side. It is like a for loop in SQL — for each row, evaluate the subquery.
SELECT u.id, recent.title
FROM users u
JOIN LATERAL (
SELECT title FROM posts WHERE user_id = u.id ORDER BY created_at DESC LIMIT 3
) recent ON TRUE;
“The lateral join is perfect for ‘top N per group’ queries. Much cleaner than a correlated subquery.”
Set-Returning Function
A set-returning function is a function that can return multiple rows. generate_series(), unnest(), and jsonb_array_elements() are common examples.
“Use
jsonb_array_elements()to unnest a JSONB array into rows — then you can join, filter, and aggregate each element as if it were a table row.”
Query Plans
Index Scan vs Bitmap Scan vs Sequential Scan
Understanding how Postgres accesses your data is essential for performance work.
- Sequential scan (Seq Scan) — reads every row in the table. Appropriate for small tables or when a large fraction of rows match. Expensive on large tables.
- Index scan — follows the index to find matching rows one by one. Good when few rows match. Random I/O can be slow if many rows are scattered across many pages.
- Bitmap scan — reads the index to build a bitmap of matching page locations, then fetches those pages in order. A good middle ground when many rows match but a full sequential scan would be wasteful.
“The query does a sequential scan on a 50 million row table — add an index on
statusand the planner will switch to a bitmap scan.”
Partial Index
A partial index indexes only a subset of rows — those that satisfy a WHERE clause. It is smaller, faster, and cheaper to maintain than a full index.
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';
“We added a partial index for
status = 'active'because 95% of queries only care about active records. The index is a tenth of the size of a full index.”
Common PostgreSQL Phrases
| Phrase | Meaning |
|---|---|
| ”Run EXPLAIN ANALYSE” | Show the actual query execution plan and timings |
| ”The planner chose a seq scan” | PostgreSQL decided to read the whole table |
| ”Add a GIN index” | Index a jsonb or tsvector column for fast containment queries |
| ”The CTE is a fence” | In older Postgres, the query planner can’t optimise across a CTE boundary |
| ”Partition by user” | Group window function calculations per user |
| ”Unnest the array” | Expand a PostgreSQL array or jsonb array into individual rows |
| ”Write a partial index” | Index only the rows that match a specific condition |