SQL Jargon Reference

37 of the SQL words that come up in code review and interviews — what each one means, a query example, and the gotcha you'll wish someone had told you.

Last reviewed:

Sections

JOINs

INNER JOIN

Returns only rows that match in BOTH tables. The default — JOIN with no qualifier means INNER JOIN.

SELECT u.name, o.total
FROM users u
JOIN orders o ON o.user_id = u.id;

💡 Users with no orders disappear from the result.

LEFT (OUTER) JOIN

Keeps every row from the left table; right-table columns are NULL where there is no match.

SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;

💡 Filter o.id IS NULL to find users with no orders (an anti-join).

RIGHT (OUTER) JOIN

Mirror of LEFT JOIN — keeps every row from the right table. Rare in practice; people just flip the tables.

SELECT u.name, o.total
FROM users u
RIGHT JOIN orders o ON o.user_id = u.id;

FULL OUTER JOIN

Keeps unmatched rows from BOTH sides, filling missing columns with NULL.

SELECT u.name, o.total
FROM users u
FULL OUTER JOIN orders o ON o.user_id = u.id;

💡 MySQL lacks it natively — emulate with LEFT JOIN UNION RIGHT JOIN.

CROSS JOIN

The Cartesian product — every row of A paired with every row of B. No ON clause.

SELECT s.size, c.color
FROM sizes s
CROSS JOIN colors c;

💡 A missing join condition silently becomes a CROSS JOIN — the classic accidental row explosion.

SELF JOIN

A table joined to itself, using aliases, to relate rows within the same table (e.g. employee → manager).

SELECT e.name, m.name AS manager
FROM employees e
JOIN employees m ON e.manager_id = m.id;

Aggregates & GROUP BY

GROUP BY

Collapses rows that share the same key into one row per group, so aggregate functions run per group.

SELECT user_id, COUNT(*) AS orders
FROM orders
GROUP BY user_id;

💡 Every non-aggregated column in SELECT must appear in GROUP BY (in strict SQL).

HAVING

Filters AFTER grouping, on aggregate results. WHERE filters rows before grouping; HAVING filters groups.

SELECT user_id, COUNT(*) AS n
FROM orders
GROUP BY user_id
HAVING COUNT(*) > 5;

COUNT / SUM / AVG / MIN / MAX

The standard aggregate functions. COUNT(*) counts rows; COUNT(col) ignores NULLs; the rest reduce a column to one value.

SELECT COUNT(*), SUM(total), AVG(total), MIN(total), MAX(total)
FROM orders;

💡 AVG and SUM skip NULLs, which can surprise you.

DISTINCT

Removes duplicate rows from the result, or counts unique values with COUNT(DISTINCT col).

SELECT COUNT(DISTINCT country) FROM users;

Window functions

OVER

Turns an aggregate or window function into a window — it computes per row WITHOUT collapsing rows like GROUP BY does.

SELECT name, salary,
  AVG(salary) OVER () AS company_avg
FROM employees;

PARTITION BY

Splits rows into groups for a window function — like GROUP BY, but the rows are kept.

SELECT dept, name, salary,
  AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;

ROW_NUMBER

Assigns a unique sequential number per row within each partition, in the ORDER BY order.

SELECT name, dept,
  ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees;

💡 Filter rn = 1 in an outer query to get the top row per group.

RANK / DENSE_RANK

Rank rows by an ORDER BY. RANK leaves gaps after ties (1,1,3); DENSE_RANK does not (1,1,2).

SELECT name, score,
  RANK()       OVER (ORDER BY score DESC) AS r,
  DENSE_RANK() OVER (ORDER BY score DESC) AS dr
FROM players;

LAG / LEAD

Read a value from a previous (LAG) or following (LEAD) row in the window — perfect for diffs over time.

SELECT day, revenue,
  revenue - LAG(revenue) OVER (ORDER BY day) AS delta
FROM daily_sales;

running total

A cumulative sum computed with a window frame — SUM(...) OVER with rows from the start up to the current row.

SELECT day, amount,
  SUM(amount) OVER (ORDER BY day
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running
FROM ledger;

Transactions & isolation

ACID

The four guarantees a transactional database gives: Atomicity, Consistency, Isolation, Durability.

-- Atomic: both or neither
BEGIN;
UPDATE accounts SET bal = bal - 100 WHERE id = 1;
UPDATE accounts SET bal = bal + 100 WHERE id = 2;
COMMIT;

BEGIN / COMMIT / ROLLBACK

Start a transaction, then make all its changes permanent (COMMIT) or throw them all away (ROLLBACK).

BEGIN;
DELETE FROM logs WHERE created_at < now() - interval '90 days';
ROLLBACK; -- changed my mind

READ UNCOMMITTED

Weakest isolation level — a transaction can read another’s uncommitted changes (dirty reads allowed).

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

💡 Postgres treats this the same as READ COMMITTED — it never allows dirty reads.

READ COMMITTED

Each statement only sees data committed before it started. The default in PostgreSQL and Oracle.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

💡 No dirty reads, but the same query can return different rows twice (non-repeatable read).

REPEATABLE READ

Rows you read once stay the same for the whole transaction. MySQL InnoDB’s default.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

💡 Prevents dirty and non-repeatable reads; phantom rows can still appear in standard SQL.

SERIALIZABLE

Strongest level — transactions behave as if run one after another. Prevents phantoms at the cost of more aborts/locks.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

dirty read

Reading data another transaction has written but NOT committed — it might be rolled back, so you read a value that never existed.

-- T1 updates row, T2 reads it, T1 rolls back
-- T2 saw a phantom value

phantom read

Re-running a range query returns new rows that another committed transaction inserted in between.

SELECT COUNT(*) FROM orders WHERE total > 100;
-- run twice: count grows because a row was inserted

Indexes & query plans

B-tree index

The default, general-purpose index. Balanced tree that supports equality and range queries and ORDER BY.

CREATE INDEX idx_orders_created
ON orders (created_at);

💡 Great for =, <, >, BETWEEN, and sorting. Useless for arbitrary LIKE '%x%'.

hash index

Maps a hashed key to a location — fast for exact equality only, no ranges or sorting.

CREATE INDEX idx_sessions_token
ON sessions USING hash (token);

GIN index

Generalised Inverted iNdex (Postgres) — for columns holding many values per row: arrays, JSONB, full-text.

CREATE INDEX idx_docs_body
ON docs USING gin (to_tsvector('english', body));

GiST index

Generalised Search Tree (Postgres) — for geometric, range, and nearest-neighbour queries (PostGIS, ranges).

CREATE INDEX idx_shapes_geom
ON shapes USING gist (geom);

composite index

An index on multiple columns. Order matters — it can serve queries that use a leftmost prefix of the columns.

CREATE INDEX idx_orders_user_date
ON orders (user_id, created_at);

💡 This index helps WHERE user_id = ? and WHERE user_id = ? AND created_at > ?, but not created_at alone.

covering index

An index that contains every column a query needs, so the database answers from the index alone — an index-only scan.

CREATE INDEX idx_cover
ON orders (user_id) INCLUDE (total, status);

EXPLAIN / EXPLAIN ANALYZE

Shows the query plan (estimated). EXPLAIN ANALYZE actually runs the query and reports real timing and row counts.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42;

💡 Look for "Seq Scan" on big tables — usually a missing index.

Clauses & expressions

CTE (WITH)

A named, inline result set defined with WITH that you can reference like a table. Improves readability and enables recursion.

WITH recent AS (
  SELECT * FROM orders WHERE created_at > now() - interval '7 days'
)
SELECT user_id, COUNT(*) FROM recent GROUP BY user_id;

subquery

A query nested inside another — in SELECT, FROM, or WHERE. A correlated subquery references the outer row.

SELECT name FROM users u
WHERE (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) > 0;

UNION vs UNION ALL

Both stack result sets. UNION removes duplicates (and pays for a sort/hash); UNION ALL keeps everything and is faster.

SELECT id FROM active
UNION ALL
SELECT id FROM archived;

💡 Use UNION ALL unless you actually need de-duplication.

EXISTS

A boolean test — true if a subquery returns any row. Often faster than IN for large or correlated checks.

SELECT name FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

COALESCE

Returns the first non-NULL argument. The standard way to supply a default for NULL values.

SELECT COALESCE(nickname, name, 'Anonymous') AS display
FROM users;

CASE

SQL’s if/else expression — return different values per row based on conditions. Handy for bucketing and pivoting.

SELECT name,
  CASE WHEN age < 18 THEN 'minor'
       WHEN age < 65 THEN 'adult'
       ELSE 'senior' END AS bracket
FROM people;

English phrases engineers use

  • "That LEFT JOIN is fanning out — you're double-counting because one user has many orders."
  • "Put the filter in WHERE, not HAVING, so it runs before the grouping."
  • "This query does a sequential scan — run EXPLAIN ANALYZE and add an index."
  • "Use a window function here so we keep the rows instead of collapsing them."
  • "Bump the isolation level to serializable or we'll get a phantom read."
  • "Wrap the two updates in a transaction so it's atomic."
  • "Use UNION ALL — we don't need to pay for de-duplication here."