SQL Performance Reading
EXPLAIN output, Seq Scan vs Index Scan, N+1 problem, join algorithms, and index types
SQL performance vocabulary
- Seq Scan — full table scan; no index; slow on large tables
- Index Scan / Bitmap Heap Scan — uses an index; efficient for selective filters
- N+1 problem — 1 query for list + N queries for related data; fix with eager loading
- Rows Removed by Filter in EXPLAIN ANALYZE — high = missing index
- Functional index —
CREATE INDEX ON tbl(LOWER(col))— required when WHERE uses a function on a column
Question 0 of 5
A PostgreSQL EXPLAIN output shows: Seq Scan on orders (cost=0.00..4520.00 rows=200000 width=50). What does "Seq Scan" mean for performance?
Sequential scan = reading every row in the table; no index used. EXPLAIN output vocabulary:
- Seq Scan — full table scan; reads all rows; acceptable for small tables or when most rows are returned
- Index Scan — uses an index to locate rows; efficient for selective queries (few rows match)
- Index Only Scan — answers the query entirely from the index without touching the table
- Bitmap Heap Scan — uses an index to build a bitmap, then fetches matching rows; good for medium selectivity
- cost=start..total — estimated cost units (not milliseconds); lower is better
- rows=N — estimated number of rows returned
A developer says: "This page loads slowly because of the N+1 query problem." What does N+1 mean?
1 query for the list + N queries for each item's related data = N+1 total queries. The N+1 problem is one of the most common ORM performance issues:
- Example: fetch 100 orders → 1 query; for each order, fetch its user → 100 more queries → 101 total
- Symptom: page that handles 100 items suddenly makes 101 DB queries instead of 2
- Eager loading — load all related data in one query:
JOINin SQL or.include('user')in ORMs - Batch loading — fetch all related IDs in one IN query:
WHERE user_id IN (1, 2, ...100) - DataLoader — batching pattern used in GraphQL resolvers
getUser(id) on each order — classic N+1. Use eager loading instead."EXPLAIN ANALYZE shows: Filter: (status = 'pending'::text). What performance issue does this reveal?
Rows Removed by Filter: 198500
Almost all rows were scanned and discarded — a missing index on the status column. EXPLAIN ANALYZE vocabulary:
- Rows Removed by Filter — rows that were read but did not match the WHERE condition; high numbers indicate a full scan followed by filtering
- actual rows=N — actual rows returned (vs estimated)
- actual time=start..end — actual execution time in milliseconds
- loops=N — number of times this node executed (e.g., once per outer row in a nested loop join)
CREATE INDEX idx_orders_status ON orders(status). After indexing, EXPLAIN should show Index Scan or Bitmap Index Scan instead of Filter with high Rows Removed.A query runs fine with 1,000 rows but becomes very slow with 1,000,000 rows. Which EXPLAIN output term best explains why?
Nested Loop with no index on the inner side scales as O(N²) — the main reason join queries degrade at scale. Query plan vocabulary:
- Nested Loop Join — for each row in the outer table, scan the inner table; O(N×M) without indexes on both sides
- Hash Join — build a hash table of one side, probe with the other; O(N+M); good for large unsorted tables
- Merge Join — both inputs must be sorted; efficient when both sides have sorted indexes
In a code review, you see: SELECT * FROM users WHERE LOWER(email) = LOWER($1);. A colleague suggests adding an index. What kind of index should they suggest?
A functional index on LOWER(email) — a plain index on email is bypassed when the column is wrapped in a function. Index vocabulary:
- B-tree index — default; supports =, <, >, BETWEEN, LIKE prefix
- Functional/expression index — index on an expression:
CREATE INDEX ON users(LOWER(email)); allows the planner to use it when the WHERE clause uses the same expression - Partial index — index with a WHERE clause; only indexes matching rows:
CREATE INDEX ON orders(status) WHERE status != 'delivered' - Composite index — index on multiple columns; column order matters
CREATE INDEX ON users(LOWER(email)) and consider storing email normalized at write time instead."