Database Optimization Language Exercises
Learn the vocabulary for communicating about database optimization: execution plans, index strategies, query optimization, replication lag, and capacity planning.
- Explain Plan Vocabulary
- Index Strategy Discussion
- Query Optimization Language
- Replication Lag Communication
- Database Capacity Planning
Frequently Asked Questions
What does EXPLAIN ANALYZE actually tell you about a slow query?
EXPLAIN ANALYZE executes the query and returns the actual execution plan with real row counts and timing. Key things to look for are sequential scans on large tables (Seq Scan), a large gap between estimated and actual row counts (which often indicates stale statistics), and nodes that account for most of the total time. The "actual time" field reports startup and total time in milliseconds for each node, averaged per loop and including the node's children, so multiply by the loops value and compare each node with its children to find which join, sort, or scan is the bottleneck.
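To make the "scan versus index" vocabulary concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. Note the assumptions: SQLite's EXPLAIN QUERY PLAN only shows the chosen plan and, unlike EXPLAIN ANALYZE, does not execute the query or report timings; the orders table and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan_for(sql):
    """Return the human-readable detail text of each plan step."""
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT total FROM orders WHERE user_id = 42"

plan_before = plan_for(query)  # no index on user_id yet: full table scan
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = plan_for(query)   # the planner can now search the index

print("before:", plan_before)
print("after: ", plan_after)
```

In a debrief you could describe the two outputs as: "Before the change the planner scanned the whole table; after we added the index it switched to an index search."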
How do you explain the N+1 query problem to a non-technical stakeholder?
A clear way to phrase it: "Instead of asking the database one question that returns all the data we need, the code asks it one question per record — so for 500 records we fire 501 database round-trips instead of 1." You can use the analogy of sending a warehouse worker to fetch each item individually versus handing them a full shopping list. The fix is typically eager loading or a single JOIN query.
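The 501-versus-1 arithmetic can be reproduced with a small sketch. It uses Python's sqlite3 module with an in-memory database, so there are no real network round-trips; the run helper simply counts each statement as one. The customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(500)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i) for i in range(500)])

query_count = 0

def run(sql, params=()):
    """Execute a query and count it as one database round-trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the list of orders, then one more per order.
query_count = 0
orders = run("SELECT id, customer_id FROM orders")
n_plus_one_rows = [
    (order_id, run("SELECT name FROM customers WHERE id = ?", (customer_id,))[0][0])
    for order_id, customer_id in orders
]
n_plus_one_queries = query_count

# Single JOIN: the same data in one round-trip.
query_count = 0
join_rows = run(
    "SELECT o.id, c.name FROM orders o JOIN customers c ON c.id = o.customer_id"
)
join_queries = query_count

print(n_plus_one_queries, join_queries)  # prints: 501 1
```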
What is the difference between a clustered and a non-clustered index?
A clustered index physically orders the table rows on disk according to the index key, so a table can have only one. A non-clustered index is a separate structure that stores the key values and pointers back to the actual rows, so lookups require an extra step, which SQL Server calls a bookmark lookup or key lookup. When discussing index strategy in English, you might say: "We clustered on the primary key and added a covering non-clustered index on (user_id, created_at) to avoid a key lookup for the most common read path."
What vocabulary is used when discussing connection pooling?
Common terms include: pool size (the number of connections kept open), min/max connections, connection timeout (how long a client waits before failing), idle timeout (how long an unused connection is held before being closed), and pool exhaustion (when all connections are in use and new requests queue or fail). You might say: "We tuned the pool size from 10 to 25 after observing frequent pool exhaustion during peak traffic."
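The terms pool size, connection timeout, and pool exhaustion can be illustrated with a toy pool. This is a sketch under stated assumptions, not how any particular pooling library works: the ConnectionPool class and its method names are hypothetical, and SQLite in-memory connections stand in for real database connections.

```python
import queue
import sqlite3

class ConnectionPool:
    """A toy fixed-size pool; real pools also handle idle timeouts and health checks."""

    def __init__(self, pool_size, connection_timeout):
        self.connection_timeout = connection_timeout
        self._idle = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._idle.put(sqlite3.connect(":memory:"))

    def acquire(self):
        """Wait up to connection_timeout seconds for a free connection."""
        try:
            return self._idle.get(timeout=self.connection_timeout)
        except queue.Empty:
            raise TimeoutError("pool exhausted: no connection became free in time") from None

    def release(self, conn):
        """Return a connection to the pool so another caller can use it."""
        self._idle.put(conn)

pool = ConnectionPool(pool_size=2, connection_timeout=0.1)
first = pool.acquire()
second = pool.acquire()

# Both connections are checked out, so a third request waits and then fails.
try:
    pool.acquire()
    exhausted = False
except TimeoutError:
    exhausted = True

pool.release(first)
third = pool.acquire()  # succeeds now that a connection is free again
print(exhausted)  # prints: True
```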
How do you describe index bloat in a code review or incident debrief?
Index bloat refers to wasted space inside an index structure, caused by frequent updates and deletes that leave dead entries behind. Useful phrases: "The index has grown significantly due to heavy update and delete churn on the orders table," "We need to run REINDEX CONCURRENTLY to rebuild the index and reclaim space without blocking writes," or "pgstatindex, from the pgstattuple extension, reports an average leaf density of about 60%, so a large share of the index is reclaimable space."
What does "cardinality" mean in a database index context?
Cardinality refers to the number of distinct values in a column. High cardinality (e.g., user IDs) makes a column an excellent index candidate because the index can eliminate most rows quickly. Low cardinality (e.g., a boolean flag) means an index is often less useful, and the query planner may prefer a sequential scan. In conversation: "The status column has low cardinality, so a partial index with WHERE status = 'pending' will be far smaller and more selective than a full index on status."
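Cardinality is easy to measure before you propose an index. The sketch below, again using sqlite3 as a stand-in, counts distinct values per column and then creates a partial index that covers only the pending rows. The jobs table, its data distribution, and the index name are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)")
# Hypothetical data: 1000 jobs, one per user, with 1 in 50 still pending.
conn.executemany(
    "INSERT INTO jobs (user_id, status) VALUES (?, ?)",
    [(i, "pending" if i % 50 == 0 else "done") for i in range(1000)],
)

def cardinality(column):
    """Count distinct values in a column of jobs."""
    # The column name is interpolated, so only pass trusted identifiers.
    return conn.execute(f"SELECT COUNT(DISTINCT {column}) FROM jobs").fetchone()[0]

user_id_cardinality = cardinality("user_id")  # high: one value per row
status_cardinality = cardinality("status")    # low: only two values

# A partial index stores entries only for rows matching its WHERE clause.
conn.execute("CREATE INDEX idx_jobs_pending ON jobs (user_id) WHERE status = 'pending'")
indexed_rows = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE status = 'pending'"
).fetchone()[0]

print(user_id_cardinality, status_cardinality, indexed_rows)  # prints: 1000 2 20
```

With these numbers you could say: "Only 20 of 1,000 rows are pending, so the partial index is about 2% of the size of a full one."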
What English phrases are used to discuss query plan regressions?
A query plan regression occurs when a database optimizer chooses a worse execution plan after a statistics update, data growth, or version upgrade. Useful phrases: "The planner switched from an index scan to a sequential scan after the table grew and its statistics went stale," "We pinned the plan using a query hint to prevent regression," or "We noticed a 10x increase in p99 latency consistent with a plan change."
How do you communicate replication lag to a product team?
Translate technical terms into impact: "Replication lag means there is a delay — currently about 3 seconds — between a write on the primary database and that write becoming visible on the read replica. For features that write and immediately read, users may see stale data if the request is routed to the replica within that window." Business-friendly vocabulary: propagation delay, read-after-write consistency, eventual consistency, and staleness window.
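If the product team asks what can be done about the staleness window, one common mitigation is to send a user's reads to the primary for a short time after that user writes. The sketch below illustrates the idea only; the ReadRouter class, its method names, and the 3-second window are hypothetical, and a real system would need to share this state across application servers.

```python
import time

class ReadRouter:
    """Route a user's reads to the primary for a short window after they write."""

    def __init__(self, staleness_window_seconds, clock=time.monotonic):
        self.staleness_window = staleness_window_seconds
        self._clock = clock          # injectable so the example can fake time
        self._last_write_at = {}     # user_id -> time of that user's last write

    def record_write(self, user_id):
        self._last_write_at[user_id] = self._clock()

    def target_for_read(self, user_id):
        last_write = self._last_write_at.get(user_id)
        if last_write is not None and self._clock() - last_write < self.staleness_window:
            return "primary"   # the replica may not have this user's write yet
        return "replica"

# Usage with a fake clock so the result does not depend on real time.
now = [0.0]
router = ReadRouter(staleness_window_seconds=3.0, clock=lambda: now[0])
router.record_write("alice")

now[0] = 1.0
during_window = router.target_for_read("alice")  # inside the 3-second window
now[0] = 5.0
after_window = router.target_for_read("alice")   # window has passed
other_user = router.target_for_read("bob")       # never wrote

print(during_window, after_window, other_user)  # prints: primary replica replica
```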
What is the meaning of "covering index" and when would you recommend one?
A covering index includes all columns a query needs so the database engine never has to visit the base table rows at all — it satisfies the entire query from the index alone. Recommend one when profiling shows frequent key lookups following a non-clustered index seek. Example phrase: "I propose adding a covering index on (customer_id) INCLUDE (email, status) to eliminate the key lookup that accounts for 60% of the query cost."
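The difference between an ordinary index and a covering index shows up directly in a query plan. The sketch below uses sqlite3 as a stand-in. SQLite has no INCLUDE clause, so the extra columns are simply appended to the index key, which is an assumption of this example rather than an equivalent of the INCLUDE syntax; the contacts table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, email TEXT, status TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (customer_id, email, status, notes) VALUES (?, ?, ?, ?)",
    [(i % 50, f"user{i}@example.com", "active", "n/a") for i in range(500)],
)

def plan_for(sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT email, status FROM contacts WHERE customer_id = 7"

# Narrow index: finds the rows, but email and status still come from the table.
conn.execute("CREATE INDEX idx_contacts_customer ON contacts (customer_id)")
narrow_plan = plan_for(query)

# Covering index: every column the query needs is stored in the index itself.
conn.execute("DROP INDEX idx_contacts_customer")
conn.execute(
    "CREATE INDEX idx_contacts_covering ON contacts (customer_id, email, status)"
)
covering_plan = plan_for(query)

print("narrow:  ", narrow_plan)
print("covering:", covering_plan)
```

SQLite labels the second plan with the words COVERING INDEX, which is the same idea as eliminating the key lookup in the example phrase above.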
How should you frame a database capacity planning discussion?
Structure the conversation around three dimensions: storage growth rate (current size, projected size at current rate, time to exhaustion), IOPS headroom (current peak vs. provisioned limit), and connection headroom (peak concurrent connections vs. max_connections setting). A useful framing: "At our current data growth rate of 8 GB per month, we will exhaust the current allocation in approximately 14 months. I recommend we review partitioning and archival strategies in Q3."
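The arithmetic behind such a statement is simple enough to script. The sketch below assumes linear growth, and every figure except the 8 GB per month growth rate is a hypothetical value chosen so the result matches the 14-month example; substitute your own measurements.

```python
def months_to_exhaustion(allocated_gb, used_gb, growth_gb_per_month):
    """Months until storage runs out, assuming growth stays linear."""
    if growth_gb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    headroom_gb = max(allocated_gb - used_gb, 0)
    return headroom_gb / growth_gb_per_month

def headroom_ratio(peak, limit):
    """Fraction of a provisioned limit that is still unused at peak."""
    return 1 - peak / limit

# Hypothetical inputs: 500 GB allocated, 388 GB used, growing 8 GB per month.
storage_months = months_to_exhaustion(allocated_gb=500, used_gb=388, growth_gb_per_month=8)

# Hypothetical peaks versus provisioned limits.
iops_headroom = headroom_ratio(peak=6000, limit=10000)
connection_headroom = headroom_ratio(peak=150, limit=200)  # limit = max_connections

print(f"Storage exhausted in about {storage_months:.0f} months")
print(f"IOPS headroom: {iops_headroom:.0%}, connection headroom: {connection_headroom:.0%}")
```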