5 exercises — practise answering Database Reliability Engineer interview questions in professional technical English.
1 / 5
The interviewer asks: "Can you compare streaming replication and logical replication in PostgreSQL, and describe when you would use read replicas versus multi-region topologies?" Which answer best demonstrates Database Reliability Engineer expertise?
Option B is strongest because it precisely distinguishes the two mechanisms (streaming replication ships physical, block-level WAL changes, while logical replication publishes row-level change events), names real HA tools (Patroni, repmgr), explains how synchronous_commit modes control RPO, and highlights operational risks such as WAL accumulation from unmanaged replication slots. Option A is superficial and misses topology trade-offs entirely. Option C is vague and incorrectly suggests logical replication for multi-master without addressing conflict resolution. Option D conflates configuration files with replication architecture and has no depth. Database Reliability Engineer interview best practice: always tie replication topology choices to RPO/RTO targets and show awareness of the operational monitoring required to keep replication healthy under load.
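The WAL accumulation risk from replication slots can be put into numbers. The following is a minimal sketch, assuming the primary's current WAL position and each slot's restart_lsn have already been fetched (for example from pg_current_wal_lsn() and pg_replication_slots). The function names and the 10 GiB limit are illustrative assumptions, not part of the exercise.

```python
from typing import Dict, List, Optional


def lsn_to_int(lsn: str) -> int:
    """Convert a textual PostgreSQL LSN such as '16/B374D848' to a byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the primary must keep for a slot with this restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slots_over_limit(
    current_lsn: str,
    slots: Dict[str, Optional[str]],
    limit_bytes: int = 10 * 1024**3,  # hypothetical alert threshold (10 GiB)
) -> List[str]:
    """Return the names of slots retaining more WAL than limit_bytes.

    `slots` maps slot name -> restart_lsn; a slot whose restart_lsn is
    None (never used) retains nothing and is skipped.
    """
    return sorted(
        name
        for name, restart in slots.items()
        if restart is not None
        and retained_wal_bytes(current_lsn, restart) > limit_bytes
    )
```

A slot that appears in this list and belongs to a consumer that is no longer running is a candidate for removal before the WAL volume fills up.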
2 / 5
The interviewer asks: "Walk me through your backup strategy for a critical PostgreSQL database. How do you ensure you can meet aggressive RTO and RPO targets?" Which answer best demonstrates Database Reliability Engineer expertise?
Option B is strongest because it names specific tools (pgBackRest, Barman), applies the 3-2-1 rule to the database context, explains PITR semantics, quantifies the RTO target (under two minutes with Patroni/etcd), and demonstrates validation practices including weekly restore tests with checksum verification. Option A describes the minimum viable approach but lacks PITR, validation, and HA standby design. Option C mentions pgBackRest but lacks depth on PITR, testing frequency, and HA topology. Option D avoids the question by delegating to a managed service, demonstrating no underlying knowledge of backup architecture. Database Reliability Engineer interview best practice: always distinguish between RPO (data loss tolerance, answered by WAL archiving frequency) and RTO (downtime tolerance, answered by standby readiness), and show how each part of your strategy addresses both dimensions.
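The 3-2-1 rule named in the explanation (three copies, two kinds of media, one copy off-site) is easy to check mechanically. The sketch below is only an illustration: the BackupCopy type, its fields, and the example locations are hypothetical, and it treats the production data itself as one of the three copies, which is the common reading of the rule.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BackupCopy:
    location: str  # hypothetical label, e.g. "primary-dc" or "object-store-eu"
    media: str     # e.g. "local-disk", "object-storage"
    offsite: bool  # True if stored outside the primary site


def satisfies_3_2_1(copies: Iterable[BackupCopy]) -> bool:
    """Check: at least 3 copies, on at least 2 media types, at least 1 off-site."""
    copies = list(copies)
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


inventory = [
    BackupCopy("primary-dc", "local-disk", offsite=False),   # live database
    BackupCopy("backup-host", "local-disk", offsite=False),  # local repository
    BackupCopy("object-store-eu", "object-storage", offsite=True),
]
print(satisfies_3_2_1(inventory))
```

Passing this check says nothing about restorability, which is why the explanation pairs the rule with regular restore tests.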
3 / 5
The interviewer asks: "Our application is hitting PostgreSQL max_connections limits under load. How would you approach connection pooling, and what are the trade-offs between PgBouncer's session, transaction, and statement modes?" Which answer best demonstrates Database Reliability Engineer expertise?
Option B is strongest because it explains why raising max_connections backfires, quantifies the memory cost per backend, names all three PgBouncer modes with their specific trade-offs and incompatible session features (prepared statements, advisory locks, LISTEN/NOTIFY), gives concrete pool sizing guidance tied to CPU core count, and identifies the monitoring metrics that signal pool exhaustion. Option A gives the wrong primary advice before mentioning PgBouncer. Option C is partially correct but lacks depth on incompatible session features, sizing heuristics, and monitoring. Option D addresses a different problem without solving the connection limit issue for synchronous clients. Database Reliability Engineer interview best practice: always identify which session-level PostgreSQL features your application uses before recommending a PgBouncer mode, and monitor cl_waiting as the leading indicator of pool exhaustion.
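The explanation mentions pool sizing tied to CPU core count and cl_waiting as the leading indicator, without giving a formula. The sketch below assumes one widely cited starting heuristic, (cores × 2) + effective spindle count; that formula is an assumption brought in for illustration, not something the exercise states, and it should be tuned against real load. The second helper assumes the rows of PgBouncer's SHOW POOLS output have already been loaded as dictionaries.

```python
from typing import Dict, List, Tuple


def suggested_pool_size(cpu_cores: int, effective_spindles: int = 1) -> int:
    """Starting point for server-side pool size: (cores * 2) + spindles.

    This is a heuristic (an assumption here), not a guarantee; benchmark
    before relying on it.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return cpu_cores * 2 + effective_spindles


def waiting_pools(
    show_pools_rows: List[Dict], max_waiting: int = 0
) -> List[Tuple[str, str, int]]:
    """Return (database, user, cl_waiting) for pools with queued clients.

    Each row is expected to carry the 'database', 'user' and 'cl_waiting'
    columns from PgBouncer's SHOW POOLS.
    """
    return [
        (row["database"], row["user"], row["cl_waiting"])
        for row in show_pools_rows
        if row["cl_waiting"] > max_waiting
    ]


rows = [
    {"database": "app", "user": "web", "cl_active": 40, "cl_waiting": 7},
    {"database": "app", "user": "batch", "cl_active": 3, "cl_waiting": 0},
]
print(suggested_pool_size(8))
print(waiting_pools(rows))
```

A sustained non-zero cl_waiting means clients are queuing for a server connection, which is the signal to revisit pool size or pooling mode before raising max_connections.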
4 / 5
The interviewer asks: "How do you approach database monitoring in production? What specific signals do you track to detect performance degradation before it becomes an outage?" Which answer best demonstrates Database Reliability Engineer expertise?
Option B is strongest because it describes a complete monitoring stack from infrastructure to application SLIs, names specific system views and extensions (pg_stat_statements, pg_locks, pg_stat_activity, pg_stat_user_tables, pgstattuple, pg_stat_replication), gives concrete alert thresholds (500 ms lock wait, 20% bloat), explains the operational response for each signal, and ties everything to a Prometheus-based alerting pipeline supplemented by auto_explain. Option A is too generic: CPU and memory are lagging indicators, and slow query logs alone are insufficient for proactive reliability. Option C names the right views but lacks a complete strategy, thresholds, and proactive bloat management. Option D describes a valid tooling choice but demonstrates no underlying understanding of what signals to measure. Database Reliability Engineer interview best practice: always demonstrate that you monitor leading indicators (lock wait time growth, bloat rate, autovacuum lag) rather than only lagging indicators (query timeout, replication failure).
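The two thresholds quoted in the explanation (500 ms lock wait, 20% bloat) can be expressed as a small evaluation step. This is a hedged sketch: the function name, the input shapes, and the alert wording are assumptions, and in practice the same rules would more likely live in the Prometheus alerting pipeline than in application code.

```python
from typing import Dict, List

LOCK_WAIT_MS_LIMIT = 500  # threshold quoted in the explanation
BLOAT_PCT_LIMIT = 20      # threshold quoted in the explanation


def evaluate_signals(
    lock_wait_ms: float, bloat_pct_by_table: Dict[str, float]
) -> List[str]:
    """Return human-readable alerts for signals that exceed their threshold."""
    alerts = []
    if lock_wait_ms > LOCK_WAIT_MS_LIMIT:
        alerts.append(
            f"lock wait {lock_wait_ms:.0f} ms exceeds {LOCK_WAIT_MS_LIMIT} ms"
        )
    # Sort by table name so the alert order is deterministic.
    for table, pct in sorted(bloat_pct_by_table.items()):
        if pct > BLOAT_PCT_LIMIT:
            alerts.append(
                f"table {table} bloat {pct:.0f}% exceeds {BLOAT_PCT_LIMIT}%"
            )
    return alerts


print(evaluate_signals(750, {"orders": 35.0, "users": 5.0}))
```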
5 / 5
The interviewer asks: "We need to add a NOT NULL column to a 500-million-row table in production without downtime. What tools and approaches would you use?" Which answer best demonstrates Database Reliability Engineer expertise?
Option B is strongest because it correctly explains the PostgreSQL 11+ catalog optimization, names multiple tools across databases (pg_repack, gh-ost, pt-online-schema-change, Liquibase, Flyway), describes the expand-contract pattern for zero-downtime deployments, mentions rate-limited batch backfilling for IOPS control, and explains the technique of adding a CHECK (column IS NOT NULL) constraint as NOT VALID and then running VALIDATE CONSTRAINT, which enforces the rule without holding a write-blocking lock while the table is scanned. Option A is partially correct but incomplete: the catalog optimization only applies to non-volatile DEFAULT values, and the answer ignores backfill and constraint validation. Option C requires a maintenance window and a single large UPDATE that would block concurrent writes to every row it touches and generate massive IOPS. Option D names gh-ost correctly but gives no depth on the overall phased strategy or PostgreSQL-specific alternatives. Database Reliability Engineer interview best practice: always break online schema changes into three phases (add nullable column, backfill in rate-limited batches, add constraint) and validate each phase independently in a staging environment before running in production.
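The three phases can be sketched as a plan generator. This is an illustration under stated assumptions: the table has an integer primary key named id, the table and column names are hypothetical, and identifiers are interpolated without quoting, so only trusted names should be passed in. The SQL is standard PostgreSQL; the final SET NOT NULL step relies on PostgreSQL 12 or later to reuse the validated CHECK constraint instead of rescanning the table.

```python
from typing import Iterator, List, Tuple


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [start, end) id ranges covering min_id..max_id."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size, max_id + 1)
        yield start, end
        start = end


def expand_sql(table: str, column: str, col_type: str) -> str:
    """Phase 1: add the column as nullable (a metadata-only change)."""
    return f"ALTER TABLE {table} ADD COLUMN {column} {col_type};"


def backfill_sql(table: str, column: str, start: int, end: int) -> str:
    """Phase 2: one batch; %s is the driver placeholder for the fill value."""
    return (
        f"UPDATE {table} SET {column} = %s "
        f"WHERE id >= {start} AND id < {end} AND {column} IS NULL;"
    )


def constraint_sql(table: str, column: str) -> List[str]:
    """Phase 3: enforce NOT NULL without a long write-blocking lock."""
    name = f"{table}_{column}_not_null"
    return [
        f"ALTER TABLE {table} ADD CONSTRAINT {name} "
        f"CHECK ({column} IS NOT NULL) NOT VALID;",
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {name};",
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL;",
        f"ALTER TABLE {table} DROP CONSTRAINT {name};",
    ]


for start, end in batch_ranges(1, 10, 4):
    print(backfill_sql("orders", "status", start, end))
```

When executing the backfill, committing each batch in its own transaction and sleeping briefly between batches is what provides the rate limiting; the pause length is something to tune against observed IOPS and replication lag.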