5 exercises — practice structured English answers for streaming data engineering interviews covering Kafka internals, Flink processing, delivery semantics, late data, and pipeline testing.
Stream testing: unit (transformation logic) → integration (Kafka test containers) → end-to-end
1 / 5
The interviewer asks: "Explain the difference between at-least-once, at-most-once, and exactly-once delivery in a streaming pipeline." Which answer is most precise?
Option B is strongest: it explains each semantic in terms of both producer and consumer behaviour (not just a definition), explains the specific mechanism behind at-least-once duplicates (a crash after processing but before the offset commit), introduces idempotent consumers as the pragmatic mitigation, names the specific Kafka and Flink mechanisms for exactly-once, and ends with the most mature insight: at-least-once delivery plus idempotent consumers covers most real-world needs.
Streaming vocabulary:
Offset commit: the consumer records its position in a Kafka partition.
Idempotent consumer: processing the same message twice produces the same result as processing it once.
enable.idempotence: Kafka producer config that prevents duplicate writes caused by producer retries.
Chandy-Lamport algorithm: distributed snapshot algorithm; Flink's checkpoint barriers are a variant of it, used for exactly-once state checkpointing.
Transactional producer: Kafka producer that writes to multiple partitions atomically.
Options C and D are accurate but lack the producer/consumer behaviour breakdown and the exactly-once cost justification.
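The duplicate mechanism and the idempotent-consumer mitigation can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Kafka client code: the Message and IdempotentConsumer classes, the message_id field, and the crash_before_commit_at parameter are all hypothetical names introduced for illustration, and the in-memory seen_ids set stands in for a durable deduplication store that would survive a restart.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    offset: int
    message_id: str  # hypothetical unique ID carried by each event
    amount: int


@dataclass
class IdempotentConsumer:
    committed_offset: int = 0  # offset to resume from after a restart
    seen_ids: set = field(default_factory=set)  # stands in for a durable store
    total: int = 0

    def process(self, msg: Message) -> None:
        # Skip messages whose effect has already been applied.
        if msg.message_id in self.seen_ids:
            return
        self.seen_ids.add(msg.message_id)
        self.total += msg.amount

    def run(self, log, crash_before_commit_at=None) -> bool:
        """Consume from the committed offset. Optionally simulate a crash that
        happens after processing a message but before committing its offset."""
        for msg in log[self.committed_offset:]:
            self.process(msg)
            if msg.offset == crash_before_commit_at:
                return False  # crashed: this offset was never committed
            self.committed_offset = msg.offset + 1
        return True


log = [Message(0, "a", 10), Message(1, "b", 20), Message(2, "c", 30)]
consumer = IdempotentConsumer()

# First run crashes after processing "b" but before committing offset 1.
finished = consumer.run(log, crash_before_commit_at=1)
resume_offset = consumer.committed_offset  # "b" will be delivered again

# Restart: "b" is redelivered (at-least-once) but deduplicated by its ID.
consumer.run(log)
print(consumer.total)
```

Without the seen_ids check, the redelivered message "b" would be counted twice and the total would be 80 instead of 60.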
2 / 5
The interviewer asks: "How do you handle late-arriving data in a streaming pipeline?" Which answer is most complete?
Option B is strongest: it explains why late data occurs (common with mobile and offline devices, which gives production context), defines watermarks formally with the heuristic formula, explains the consequences of both extremes of the allowed_lateness setting, explains the incremental recomputation mechanism, and introduces side outputs with specific routing options.
Late data vocabulary:
Event time: the time the event occurred at the source.
Processing time: the time the stream processor receives the event.
Watermark: a progress marker asserting that all events up to a certain event time are assumed to have arrived.
Allowed lateness: Flink parameter controlling how long after the watermark passes the end of a window that window remains open for late events.
Side output: a secondary output stream for late or filtered events.
Dead-letter queue: a holding area for unprocessable or late messages.
Options C and D are accurate but lack the mobile/offline causation context and the incremental recomputation explanation.
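The interaction between watermarks, allowed lateness, and side outputs can be sketched as a simplified model in plain Python. This is not Flink code and does not reproduce Flink's exact timestamp arithmetic. It assumes a common bounded-out-of-orderness heuristic (watermark = highest event time seen minus a fixed delay); the constants and the route_events function are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 10          # tumbling window length in seconds (assumed)
MAX_OUT_OF_ORDERNESS = 3  # watermark delay in seconds (assumed)
ALLOWED_LATENESS = 5      # extra time a window accepts late events (assumed)


def route_events(events):
    """events: iterable of (event_time, value) in arrival order.
    Returns (per-window sums, events routed to the side output)."""
    sums = defaultdict(int)  # window start -> running sum
    side_output = []         # events that arrived too late for their window
    max_event_time = float("-inf")

    for event_time, value in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_OUT_OF_ORDERNESS

        window_start = event_time - event_time % WINDOW_SIZE
        window_end = window_start + WINDOW_SIZE

        if watermark >= window_end + ALLOWED_LATENESS:
            # Window state has been discarded: route to the side output.
            side_output.append((event_time, value))
        else:
            # On time, or late but within allowed lateness: the window result
            # is updated incrementally instead of being recomputed from scratch.
            sums[window_start] += value

    return dict(sums), side_output


arrivals = [(1, 1), (4, 1), (15, 1), (8, 1), (19, 1), (9, 1)]
window_sums, late_events = route_events(arrivals)
print(window_sums, late_events)
```

In this run the event at time 8 arrives after the watermark has passed the end of window [0, 10) but within the allowed lateness, so it still updates that window. The event at time 9 arrives after the allowed lateness has expired and goes to the side output.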
3 / 5
The interviewer asks: "What's your strategy for managing Kafka consumer group lag?" Which answer is most operational?
Option B is strongest: it names four distinct root cause categories with specific solutions for each (not a generic "scale out"), introduces the poison pill / crash loop pattern with the DLQ solution, explains the partition-count constraint on horizontal scaling, and introduces lag growth rate as a better SLO than absolute lag, a production insight that shows real operational experience.
Consumer lag vocabulary:
Consumer group lag: the difference between the latest offset in a partition and the last committed offset by the consumer group.
Poison pill: a message that causes the consumer to crash every time it attempts processing.
DLQ (Dead Letter Queue): a topic where unprocessable messages are sent to prevent blocking.
Partition imbalance: uneven message distribution across partitions, usually caused by a low-cardinality partition key.
max.poll.records: Kafka consumer config for the maximum number of records returned per poll call.
Options C and D are accurate but lack the crash loop root cause and the growth rate vs. absolute lag distinction.
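The lag definition, the growth-rate SLO, and the partition-count constraint reduce to simple arithmetic. The sketch below uses plain Python dictionaries rather than a Kafka client; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def total_lag(end_offsets, committed_offsets):
    """Sum over partitions of (latest offset - last committed offset)."""
    return sum(
        end_offsets[p] - committed_offsets.get(p, 0) for p in end_offsets
    )


def lag_growth_rate(samples):
    """samples: list of (timestamp_seconds, total_lag), oldest first.
    Returns the average change in lag, in messages per second."""
    (t0, lag0), (t1, lag1) = samples[0], samples[-1]
    return (lag1 - lag0) / (t1 - t0)


def active_consumers(consumer_count, partition_count):
    """Within one consumer group, each partition is read by at most one
    consumer, so consumers beyond the partition count sit idle."""
    return min(consumer_count, partition_count)


lag_now = total_lag(
    end_offsets={0: 1000, 1: 1200},
    committed_offsets={0: 900, 1: 700},
)
growth = lag_growth_rate([(0, 600), (60, 900)])
working = active_consumers(consumer_count=8, partition_count=6)
print(lag_now, growth, working)
```

A large but shrinking lag (negative growth rate) means the group is catching up, while a small lag with a sustained positive growth rate means it is falling behind. That is why the growth rate is the more useful alerting signal.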
4 / 5
The interviewer asks: "Walk me through how you'd design a real-time dashboard with Kafka and Flink." Which answer demonstrates the clearest system design thinking?
Option B is strongest: it names four explicit layers with specific technical choices at each, explains why partitioning by entity ID enables stateful processing, provides a serving layer decision framework (OLAP vs. key-value vs. hybrid), explains the push vs. pull model trade-off for the dashboard layer, and introduces the data age / stale data UX concern, which is often missed in system design interviews but critical for real dashboards.
Streaming system design vocabulary:
KeyedStream: a Flink stream partitioned by key, enabling per-key state.
Tumbling window: a non-overlapping, fixed-size time window.
Druid / ClickHouse: OLAP databases optimised for real-time analytical queries.
Server-Sent Events (SSE): a push protocol for streaming data from server to browser.
Data age: the time since the displayed data was last updated; a freshness indicator.
Options C and D are accurate but lack the entity-partitioning rationale and the data age UX insight.
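Keyed tumbling-window aggregation and the data age indicator can be sketched in plain Python. This is an illustration of the concepts, not Flink's KeyedStream API; the function names, the 30-second staleness threshold, and the sample events are assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """events: iterable of (entity_id, event_time).
    Returns {(entity_id, window_start): count}. Keying by entity ID keeps
    all state for one entity together, which is what per-key state relies on."""
    counts = defaultdict(int)
    for entity_id, event_time in events:
        # Each event belongs to exactly one non-overlapping window.
        window_start = event_time - event_time % window_size
        counts[(entity_id, window_start)] += 1
    return dict(counts)


def data_age(now, last_updated):
    """Seconds since the displayed value was last refreshed."""
    return now - last_updated


def is_stale(now, last_updated, max_age_seconds=30):
    """Hypothetical rule: flag the tile as stale past a fixed threshold."""
    return data_age(now, last_updated) > max_age_seconds


events = [("u1", 3), ("u2", 7), ("u1", 9), ("u1", 12)]
counts = tumbling_window_counts(events, window_size=10)
print(counts)
print(is_stale(now=100, last_updated=60))
```

Showing the data age next to each metric lets a dashboard user tell a quiet period from a stalled pipeline.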
5 / 5
The interviewer asks: "How do you test a streaming data pipeline?" Which answer is most complete?
Option B is strongest: it names four testing levels (adding chaos testing beyond the standard three), names Flink's TestHarness specifically for stateful operator testing (a real tool), specifies what to test at each level (not just "test the pipeline"), introduces the shadow deployment pattern for end-to-end comparison, and connects chaos testing to the exactly-once guarantee, which is the hardest property to verify.
Streaming testing vocabulary:
TestHarness: Flink's operator test harness classes, used to unit-test operators, including their state and timers.
EmbeddedKafka: an in-memory Kafka broker for integration tests.
Shadow deployment: running a new version in parallel with production to compare outputs.
Chaos testing: deliberately injecting failures to verify system resilience.
Checkpointing: Flink's mechanism for fault tolerance; periodic snapshots of operator state.
Options C and D are accurate but lack the specific tool names and the chaos testing rationale.
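The unit level (transformation logic) is the easiest to illustrate because it needs no broker or cluster. The sketch below assumes a hypothetical normalize_event transformation and hypothetical field names; the point is only that the logic is kept as a pure function so it can be tested without Kafka or Flink running.

```python
def normalize_event(raw: dict) -> dict:
    """Hypothetical transformation: clean the user ID and convert the
    amount from cents to currency units."""
    return {
        "user_id": raw["user_id"].strip().lower(),
        "amount": raw["amount_cents"] / 100,
    }


def test_normalize_event_cleans_user_id_and_converts_amount():
    result = normalize_event({"user_id": "  Alice ", "amount_cents": 1250})
    assert result == {"user_id": "alice", "amount": 12.5}


def test_normalize_event_rejects_missing_fields():
    # A malformed record should fail loudly so it can be routed elsewhere.
    try:
        normalize_event({"user_id": "bob"})
    except KeyError:
        return
    raise AssertionError("expected KeyError for a record without amount_cents")


test_normalize_event_cleans_user_id_and_converts_amount()
test_normalize_event_rejects_missing_fields()
```

Integration, end-to-end, and chaos tests then only need to cover what this level cannot: serialization, offsets, state recovery, and failure behaviour.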