Learn the vocabulary of streaming database changes to downstream consumers in near real time.
1 / 5
At standup, a dev mentions streaming every insert, update, and delete from a source database's transaction log to downstream consumers in near real time. What is this technique called?
Change data capture, or CDC, captures every insert, update, and delete made in a source database (most robustly by reading its transaction log) and streams those changes to downstream consumers in near real time, rather than periodically exporting the entire database in a batch. This lets a downstream system stay continuously up to date with the source without repeatedly re-scanning the whole dataset. It's widely used to feed a data warehouse, a search index, or a cache with the latest changes as they happen.
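As a minimal sketch, assume a hypothetical ChangeEvent shape with op, key, and row fields (real CDC tools each define their own event format). A downstream consumer keeps an in-memory copy in sync by applying each change as it arrives, instead of reloading the full table:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical event shape; real CDC tools define their own formats.
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the changed row
    row: Optional[dict] = None  # new row image; None for deletes

def apply_change(replica: dict, event: ChangeEvent) -> None:
    """Apply one change event to a downstream key-value replica."""
    if event.op in ("insert", "update"):
        replica[event.key] = event.row
    elif event.op == "delete":
        replica.pop(event.key, None)
    else:
        raise ValueError(f"unknown op: {event.op!r}")

# The replica stays current by applying each change in order,
# without re-scanning the source table.
replica: dict = {}
stream = [
    ChangeEvent("insert", 1, {"name": "Ada"}),
    ChangeEvent("insert", 2, {"name": "Grace"}),
    ChangeEvent("update", 1, {"name": "Ada L."}),
    ChangeEvent("delete", 2),
]
for event in stream:
    apply_change(replica, event)

print(replica)  # {1: {'name': 'Ada L.'}}
```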
2 / 5
During a design review, the team wants a downstream consumer to be able to reprocess a change event it already handled without causing an incorrect duplicate effect. Which capability supports this?
Idempotent processing means designing the downstream consumer so that reprocessing an event it has already handled produces the same correct outcome, rather than an incorrect duplicate effect. This matters because CDC systems often provide an at-least-once delivery guarantee, meaning the same change event can realistically be delivered more than once. As in other event-driven systems, idempotent consumers are a standard and necessary practice under that delivery guarantee.
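One common way to get idempotence, sketched here under stated assumptions: each event carries a monotonically increasing log position (called lsn here, a hypothetical field), events for a given key arrive in log order, and the consumer remembers the highest position it has applied per key, skipping anything at or below it. The running total is a deliberately non-idempotent side effect, so a skipped replay is easy to see:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChangeEvent:
    lsn: int     # hypothetical log sequence number, increasing in commit order
    key: str
    amount: int  # delta added to a running total: NOT idempotent by itself

class IdempotentConsumer:
    def __init__(self) -> None:
        self.totals: dict[str, int] = {}
        self.last_lsn: dict[str, int] = {}  # highest LSN applied per key

    def handle(self, event: ChangeEvent) -> bool:
        """Apply an event at most once; return False if it was a replay."""
        if event.lsn <= self.last_lsn.get(event.key, -1):
            return False  # already applied, so skipping leaves state unchanged
        # In a real store, write the new total and last_lsn in one transaction
        # so a crash cannot apply the delta without recording its LSN.
        self.totals[event.key] = self.totals.get(event.key, 0) + event.amount
        self.last_lsn[event.key] = event.lsn
        return True

consumer = IdempotentConsumer()
deliveries = [
    ChangeEvent(10, "acct-1", 50),
    ChangeEvent(11, "acct-1", -20),
    ChangeEvent(11, "acct-1", -20),  # same event redelivered (at-least-once)
]
for e in deliveries:
    consumer.handle(e)
print(consumer.totals)  # {'acct-1': 30}
```

Without the LSN check, the redelivered event would subtract 20 a second time and leave the total at 10.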
3 / 5
In a code review, a dev notices the CDC pipeline reads directly from the source database's write-ahead log rather than repeatedly querying application tables for what changed. What does this represent?
Log-based change data capture reads directly from the source database's write-ahead log, which already records every change in order, rather than repeatedly querying and diffing application tables to infer what changed. Polling and diffing tables adds load to the source database and can miss a change that happened and was then quickly overwritten between polls. Reading the transaction log directly is both more efficient and more reliably complete, since it captures every change exactly as the database itself recorded it.
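A toy simulation makes the polling gap concrete. It assumes a dict as the table and a Python list standing in for the write-ahead log (both hypothetical stand-ins, not any real database's API). Two writes land between polls: the snapshot diff sees only the net change, while the log keeps both.

```python
from typing import Optional

# The row already exists; the log below records writes from here on.
table: dict[str, str] = {"order-7": "pending"}
wal: list[tuple[int, str, str]] = []  # (lsn, key, new_value), in commit order

def write(key: str, value: str) -> None:
    """Update the table and append the change to the log, like a WAL would."""
    table[key] = value
    wal.append((len(wal) + 1, key, value))

def poll_diff(before: dict[str, str],
              after: dict[str, str]) -> list[tuple[str, Optional[str], str]]:
    """Infer changes by comparing two snapshots (query-based polling)."""
    return [(k, before.get(k), v) for k, v in after.items() if before.get(k) != v]

snapshot_1 = dict(table)     # poll #1
write("order-7", "paid")     # both writes land between polls
write("order-7", "shipped")
snapshot_2 = dict(table)     # poll #2

print(poll_diff(snapshot_1, snapshot_2))
# [('order-7', 'pending', 'shipped')]  -> the "paid" state is never seen
print(wal)
# [(1, 'order-7', 'paid'), (2, 'order-7', 'shipped')]  -> every change, in order
```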
4 / 5
An incident report shows a CDC consumer fell far behind the source database's change stream during a traffic spike, and the retained log segment expired before the consumer caught up, causing permanent data loss. What practice would prevent this?
Monitoring consumer lag, and keeping the source log's retention window comfortably longer than the worst-case time a consumer might need to catch up, prevents a slow consumer from permanently losing changes that expire from the log before it reads them. Setting retention as short as possible ignores that a consumer can realistically fall behind during a traffic spike or an outage. Together, retention sizing and lag monitoring make a CDC pipeline reliable rather than fragile under real-world load variation.
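A minimal sketch of the lag check, assuming lag is measured as the age of the oldest change the consumer has not yet read and compared against a time-based retention window. The 24-hour retention, the 0.5 alert fraction, and all rates below are hypothetical example values, not recommendations:

```python
import math

def time_lag_s(now_s: float, next_unread_commit_s: float) -> float:
    """Age of the oldest change the consumer has not yet read."""
    return now_s - next_unread_commit_s

def lag_status(lag_s: float, retention_s: float, alert_fraction: float = 0.5) -> str:
    """Classify lag against the log's retention window.

    alert_fraction is a hypothetical threshold: alert well before
    unread changes can age out of the log.
    """
    if lag_s >= retention_s:
        return "expired"  # unread changes may already be gone: data loss
    if lag_s >= alert_fraction * retention_s:
        return "alert"
    return "ok"

def catch_up_s(backlog_events: int, consume_per_s: float, produce_per_s: float) -> float:
    """Time to drain a backlog while new changes keep arriving."""
    if consume_per_s <= produce_per_s:
        return math.inf  # the backlog never shrinks at these rates
    return backlog_events / (consume_per_s - produce_per_s)

retention = 24 * 3600  # example: a 24-hour log retention window
lag = time_lag_s(now_s=100_000, next_unread_commit_s=50_000)
print(lag, lag_status(lag, retention))  # 50000 alert
print(catch_up_s(1_800_000, consume_per_s=2_000, produce_per_s=1_500))  # 3600.0
```

The catch-up estimate shows that a consumer only modestly faster than its producers needs an hour to drain a 1.8-million-event backlog, and one that is not faster at all never drains it, which is why the alert should fire well before lag approaches the retention window.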
5 / 5
During a PR review, a teammate asks why the team streams changes via CDC instead of running a nightly batch job to export the entire database to downstream systems. What is the reasoning?
A nightly batch export leaves a downstream system seeing only a stale snapshot until the next scheduled run, which can be hours out of date for a use case that needs current data. CDC streams each change as it happens, keeping downstream systems continuously up to date in near real time. The tradeoff is the added operational complexity of running and monitoring a continuous streaming pipeline compared to a simpler, if staler, periodic batch job.