A DBA explains a performance problem: "The query is doing a full table scan — there's no index on the filter column. On a 50-million-row table that's unacceptably slow. I'll add a covering index on (user_id, created_at) to include the columns the SELECT needs." What is a covering index?
A covering index is an index that contains all columns referenced by a query (in both WHERE and SELECT), so the query engine can satisfy the entire query from the index without looking up the actual row (an "index-only scan"). Example: CREATE INDEX idx_orders_user ON orders(user_id, created_at) INCLUDE (amount, status); — a query for "orders by user X, select created_at/amount/status" is fully served by the index.

Database indexing vocabulary:
Clustered index — defines the physical order of table data; InnoDB uses the primary key as the clustered index.
Non-clustered index — a separate B-tree structure with pointers to the actual row; a table can have multiple.
B-tree index — standard balanced-tree index, good for range queries and equality.
Hash index — faster for equality lookups; doesn't support range queries.
Composite index — index on multiple columns; column order matters — (a, b) supports queries on a or on a+b, but not on b alone.
Index selectivity — higher is better; a boolean column has low selectivity (a bad candidate).
Index scan vs seek — a seek navigates the B-tree to a specific entry; a scan reads all entries.

In conversation: "Adding the covering index dropped the query from 4.2 seconds to 12 milliseconds."
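The covering-index effect can be seen with Python's built-in sqlite3 module. SQLite has no INCLUDE clause, so the sketch below makes the index covering by listing every needed column in the key; the table and index names are taken from the example above, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, "
            "created_at TEXT, amount REAL, status TEXT)")
# No INCLUDE in SQLite: put all columns the SELECT needs into the index key.
con.execute("CREATE INDEX idx_orders_user "
            "ON orders(user_id, created_at, amount, status)")

# EXPLAIN QUERY PLAN is SQLite's (much simpler) counterpart to EXPLAIN.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT created_at, amount, status FROM orders WHERE user_id = ?", (42,)
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # e.g. "SEARCH orders USING COVERING INDEX idx_orders_user (user_id=?)"
```

The planner reports a COVERING INDEX search: every requested column comes from the index, so the base table is never touched.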
During a database design review, a senior DBA asks: "This table is in first normal form but not third — you have a transitive dependency: the manager_email column depends on manager_id, which depends on employee_id. Extract the manager data to its own table." What is third normal form (3NF)?
Third Normal Form (3NF): a table is in 3NF if it is in 2NF AND no non-key column depends on another non-key column (no transitive dependencies).

Normalisation forms:
1NF — atomic values (no repeating groups, no arrays); each row uniquely identified.
2NF — every non-key column depends on the full primary key (not a partial key); relevant only for composite keys.
3NF — no transitive dependencies (non-key → non-key dependencies).
BCNF (Boyce-Codd) — stricter: every determinant must be a candidate key.

Why normalise: it avoids update anomalies (changing a manager's email requires updating many rows), insert anomalies (you can't add a manager without an employee), and delete anomalies (deleting the last employee removes the manager data).

When NOT to normalise: data warehouses often use a star schema (intentionally denormalised): one large fact table surrounded by dimension tables. Query performance is better when all needed data is in fewer tables. Denormalisation — deliberately violating normalisation for performance.

In conversation: "The product table was in 2NF but violating 3NF — the supplier country depended on supplier_id, not on product_id, so we extracted it to a Suppliers table."
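The update anomaly from the review scenario can be made concrete with sqlite3. The table and column names (employees, managers, manager_email) follow the example; the sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Denormalised, as in the review: manager_email depends on manager_id,
# which depends on employee_id — a transitive dependency.
con.execute("CREATE TABLE employees_denorm ("
            "employee_id INTEGER PRIMARY KEY, name TEXT, "
            "manager_id INT, manager_email TEXT)")
con.executemany("INSERT INTO employees_denorm VALUES (?, ?, ?, ?)",
                [(1, "Ana", 100, "bob@corp.example"),
                 (2, "Chen", 100, "bob@corp.example"),
                 (3, "Dee", 100, "bob@corp.example")])

# Update anomaly: changing one manager's email rewrites every report's row.
n_denorm = con.execute(
    "UPDATE employees_denorm SET manager_email = 'robert@corp.example' "
    "WHERE manager_id = 100").rowcount

# 3NF: extract the manager data to its own table — the email lives in one row.
con.execute("CREATE TABLE managers (manager_id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, "
            "manager_id INT REFERENCES managers(manager_id))")
con.execute("INSERT INTO managers VALUES (100, 'robert@corp.example')")
n_norm = con.execute(
    "UPDATE managers SET email = 'rob@corp.example' "
    "WHERE manager_id = 100").rowcount

print(n_denorm, n_norm)  # 3 1
```

One fact ("Bob's email") costs three row updates in the denormalised table, exactly one after extraction.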
A DBA explains a data integrity concept: "PostgreSQL guarantees ACID properties for every transaction. Even if the server crashes after a COMMIT, the data is durable — it's in the write-ahead log and will survive a restart." What do the letters in ACID stand for, and what does durability mean?
ACID guarantees for database transactions:
Atomicity — a transaction is all-or-nothing; either all changes commit or none do (no partial updates).
Consistency — a transaction brings the database from one valid state to another; all integrity constraints (foreign keys, CHECK constraints, triggers) are satisfied.
Isolation — concurrent transactions cannot see each other's intermediate state; at the strictest level they behave as if they ran serially. Isolation levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE.
Durability — committed transactions are permanently stored and survive crashes. Mechanism: the WAL (Write-Ahead Log) — changes are written to a log before being applied to data files; on crash recovery, committed changes are replayed from the log and uncommitted ones are rolled back.

NoSQL trade-off: many NoSQL databases sacrifice some ACID properties for scalability. MongoDB added multi-document transactions; Redis offers MULTI/EXEC command batching (no rollback); Cassandra offers eventual consistency (BASE: Basically Available, Soft state, Eventually consistent).

In conversation: "We switched from eventual consistency to a strongly consistent database — the eventual model was causing user complaints about seeing stale data immediately after saving."
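Atomicity and consistency can be demonstrated in a few lines of sqlite3. In this sketch (accounts table and amounts invented), a transfer's second UPDATE violates a CHECK constraint, so the whole transaction is rolled back and the first UPDATE disappears with it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts ("
            "id INTEGER PRIMARY KEY, balance INT CHECK (balance >= 0))")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()

# Atomicity: the transfer is all-or-nothing. The second UPDATE would drive
# account 1 negative, violating the CHECK constraint (consistency), so we
# roll back — and the first UPDATE is undone too.
try:
    con.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
    con.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
    con.commit()
except sqlite3.IntegrityError:
    con.rollback()

balances = dict(con.execute("SELECT id, balance FROM accounts"))
print(balances)  # {1: 100, 2: 50} — no partial update survived
```

Durability is the engine's job rather than the client's: in a file-backed SQLite database, PRAGMA journal_mode=WAL enables write-ahead logging much like the PostgreSQL mechanism described above.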
A DBA explains a query execution problem: "The query planner chose a sequential scan over the index — it estimated 60% of rows would match, and at that fraction it's cheaper to scan the table than to follow index pointers. After we refined the query to filter more specifically, the planner chose the index." What is a query planner?
The query planner (or query optimiser) is the database's internal component that, given a SQL query, generates and compares possible execution plans and chooses the one with the lowest estimated cost.

Vocabulary for query analysis:
EXPLAIN — shows the execution plan the planner chose; in PostgreSQL, EXPLAIN ANALYZE SELECT ... also runs the query and reports actual timings.
Execution plan — the sequence of operations: table scan, index seek, hash join, sort, etc.
Statistics — the planner uses per-column statistics (row count, distinct values, histograms) to estimate selectivity.
Sequential scan (SeqScan) — reads every row in the table; fastest when a large fraction of rows match.
Index scan — navigates the B-tree, then fetches rows; fast for highly selective filters.
Bitmap scan — collects a set of matching row IDs via the index, then fetches them in page order; good for medium selectivity.
Cost estimate — the planner assigns a cost to each operation; the plan with the lowest estimated cost wins.
Cardinality estimation errors — if the planner's statistics are stale, it may choose a bad plan; ANALYZE updates the statistics.

In conversation: "I ran EXPLAIN ANALYZE and the planner was using a nested loop join for a 10-million-row table — I rewrote the JOIN order and it switched to a hash join, 60× faster."
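The scan-vs-seek decision is easy to observe with sqlite3, whose EXPLAIN QUERY PLAN is a lightweight cousin of PostgreSQL's EXPLAIN. Table and index names here are invented, and the plan wording differs slightly across SQLite versions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, ts TEXT)")

def plan(sql):
    # Join the 'detail' column of EXPLAIN QUERY PLAN into one string.
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

q = "SELECT * FROM events WHERE kind = 'click'"
p_before = plan(q)
print(p_before)  # no index: a sequential scan ("SCAN events")

con.execute("CREATE INDEX idx_events_kind ON events(kind)")
p_after = plan(q)
print(p_after)   # now an index search ("SEARCH events USING INDEX idx_events_kind ...")
```

The same query flips from a full scan to an index search the moment a usable index exists — in PostgreSQL the planner would additionally weigh the estimated matching fraction before making that choice.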
A database administrator presents a high-availability architecture: "We have a primary-replica setup with streaming replication. If the primary goes down, we can promote a replica to primary within 30 seconds using failover automation." What is streaming replication and what does promote a replica mean?
Streaming replication sends WAL (Write-Ahead Log) records from the primary to one or more replica (standby) databases in near-real-time, keeping the replicas continuously up to date. Promoting a replica means reconfiguring a standby to become the new primary — it starts accepting writes, and the old primary (once recovered) must rejoin as a replica.

High-availability vocabulary:
Primary / leader / master — the database that accepts write queries.
Replica / standby / secondary — a copy of the primary; typically read-only.
Synchronous replication — the primary waits for at least one replica to confirm receipt before committing; zero data loss, but added write latency.
Asynchronous replication — the primary commits immediately and the replica catches up asynchronously; faster, but some data loss is possible on failover.
Failover — switching traffic to a standby when the primary fails.
Switchover — a planned, graceful failover, e.g. during maintenance.
RPO (Recovery Point Objective) — the maximum acceptable data loss.
RTO (Recovery Time Objective) — the maximum acceptable downtime.
Tools: Patroni, Pacemaker (PostgreSQL HA), MHA (MySQL).

In conversation: "We moved from async to sync replication on the payment database — RPO went from 5 seconds of potential data loss to zero, at the cost of 2 ms additional write latency."
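The sync/async RPO trade-off can be sketched as a toy model. Everything below (the Primary and Replica classes, the record format) is invented for illustration and ignores networking, acknowledgements, and WAL file formats — it only models when a record reaches the standby:

```python
# Toy model of streaming replication: the primary ships "WAL records" to a
# replica, which applies them. Promotion = the standby becomes the new primary.

class Replica:
    def __init__(self):
        self.data = {}
        self.promoted = False

    def apply(self, record):            # replay one shipped WAL record
        key, value = record
        self.data[key] = value

    def promote(self):                  # failover: standby starts taking writes
        self.promoted = True

class Primary:
    def __init__(self, replica, synchronous):
        self.data = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []               # records not yet streamed (async lag)

    def write(self, key, value):
        record = (key, value)
        self.data[key] = value
        if self.synchronous:
            self.replica.apply(record)  # commit waits for the replica: RPO = 0
        else:
            self.pending.append(record) # commit returns at once; replica lags

# Synchronous: nothing is lost if the primary dies right after committing.
sync_replica = Replica()
p = Primary(sync_replica, synchronous=True)
p.write("order:1", "paid")
sync_replica.promote()                  # failover
print(sync_replica.data)                # {'order:1': 'paid'}

# Asynchronous: records still in `pending` at crash time never arrive.
async_replica = Replica()
p = Primary(async_replica, synchronous=False)
p.write("order:2", "paid")
async_replica.promote()
print(async_replica.data)               # {} — the unstreamed write is lost
```

This is exactly the RPO difference from the quote above: synchronous replication pays write latency to guarantee the standby has every committed record at failover time.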