Real-Time Streaming Vocabulary
Exercise 1 of 5
The data engineer says: 'We publish order events to a Kafka topic — each partition is ordered, and consumers in the same group each read from different partitions in parallel.'
What is a Kafka topic partition?
A Kafka topic is split into partitions. Each partition is an ordered, append-only log. Partitions enable parallelism — multiple consumers in a group each handle a subset of partitions.
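The idea of keyed partitioning can be sketched in a few lines of plain Python. This is a toy model, not the Kafka client: the partition count, the `partition_for` and `publish` helpers, and the use of CRC32 as the hash are all assumptions made for illustration (Kafka's default partitioner uses a different hash function).

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for an "orders" topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(log: dict, key: str, value: str) -> tuple:
    """Append a record to its partition's log and return (partition, offset)."""
    partition = partition_for(key)
    log[partition].append((key, value))
    return partition, len(log[partition]) - 1


# partition id -> append-only list of records
topic_log = defaultdict(list)
for key, value in [("order-1", "created"), ("order-2", "created"), ("order-1", "paid")]:
    publish(topic_log, key, value)
```

Because the hash is stable, every event for `order-1` lands in the same partition and keeps its relative order there. No ordering is implied between records in different partitions.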
Exercise 2 of 5
The architect explains: 'We use consumer groups — each group gets every message once, but within the group each partition is assigned to exactly one consumer. Add consumers to scale out.'
What is the purpose of a Kafka consumer group?
A consumer group is a set of consumers that jointly consume a topic. Each partition is assigned to one consumer in the group. Different groups all get every message independently — enabling multiple applications to consume the same topic.
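A rough sketch of how partitions might be spread across the members of one group is shown here. The `assign_partitions` helper and the consumer names are hypothetical, and simple round-robin is only one possible strategy; real Kafka clients choose among several configurable assignors.

```python
def assign_partitions(partitions: list, consumers: list) -> dict:
    """Round-robin assignment: every partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]
two_consumers = assign_partitions(partitions, ["c1", "c2"])
five_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4", "c5"])
```

With two consumers each one owns two partitions. With five consumers and only four partitions, one consumer is left with nothing to read, which is why the partition count caps the useful parallelism within a group.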
Exercise 3 of 5
The streaming engineer says: 'We use event time, not processing time, for our windowed aggregations — this handles late-arriving events correctly with watermarks.'
What is the difference between event time and processing time in stream processing?
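Event time is the timestamp embedded in the event itself (when it happened). Processing time is when the system processes the event. The two differ because of network delays, buffering, and late or out-of-order arrivals. A watermark is the system's estimate of how far event time has progressed; it determines how long a window waits for late events before it closes.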
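To make the role of the watermark concrete, here is a minimal sketch of event-time tumbling windows in plain Python. It is not the API of any particular engine: `WINDOW_SIZE`, `MAX_OUT_OF_ORDER`, and `window_counts` are assumptions, and the watermark is modelled simply as the largest event time seen so far minus a fixed delay.

```python
from collections import defaultdict

WINDOW_SIZE = 10      # seconds per tumbling window (hypothetical)
MAX_OUT_OF_ORDER = 5  # seconds the watermark lags the largest event time seen (hypothetical)


def window_counts(events):
    """Count events per event-time window.

    `events` holds event-time timestamps in arrival (processing) order.
    Returns (closed_windows, dropped_events); windows still open are not returned.
    """
    open_windows = defaultdict(int)  # window start -> count
    closed = {}
    dropped = []
    max_event_time = float("-inf")

    for event_time in events:
        window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
        watermark = max_event_time - MAX_OUT_OF_ORDER
        if window_start + WINDOW_SIZE <= watermark:
            dropped.append(event_time)  # too late: its window has already closed
            continue
        open_windows[window_start] += 1
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - MAX_OUT_OF_ORDER
        # Close every window whose end is at or before the watermark.
        for start in sorted(open_windows):
            if start + WINDOW_SIZE <= watermark:
                closed[start] = open_windows.pop(start)
    return closed, dropped


closed, dropped = window_counts([1, 4, 12, 8, 16, 3, 27])
```

In this run the event with timestamp 8 arrives after 12 but is still counted in window 0, because the watermark has not yet passed the end of that window. The event with timestamp 3 arrives after the watermark has closed window 0, so it is dropped.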
Exercise 4 of 5
The team debates delivery guarantees: 'At-least-once is simpler: we might duplicate, but nothing is lost. Exactly-once requires idempotent producers and transactions.'
What does exactly-once delivery guarantee in a streaming system?
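Exactly-once semantics ensures that the effect of each message is applied exactly once: no duplicates and no losses, even when failures force retries. In Kafka, it requires idempotent producers (enable.idempotence=true) together with transactions: the producer sets a transactional.id, and downstream consumers read with isolation.level=read_committed. It costs more in latency and throughput than at-least-once.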
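The deduplication half of this guarantee can be illustrated with a toy log that remembers the last sequence number it accepted from each producer. This is a simplified sketch: the `PartitionLog` class and its fields are hypothetical, it assumes sequence numbers arrive in increasing order apart from retries, and it does not model transactions at all.

```python
class PartitionLog:
    """Toy partition log that ignores retried writes it has already appended."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer id -> highest sequence number appended

    def append(self, producer_id: str, sequence: int, value: str) -> bool:
        """Append unless this (producer, sequence) was already written.

        Returns True if the record was appended, False if it was a duplicate.
        """
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False  # duplicate caused by a retry; ignore it
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = PartitionLog()
log.append("producer-A", 0, "order-1 created")
log.append("producer-A", 1, "order-1 paid")
log.append("producer-A", 1, "order-1 paid")  # retry after a lost acknowledgement
```

The retried write is acknowledged but not appended a second time, so a producer that resends after a timeout does not create a duplicate record.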
Exercise 5 of 5
The Flink developer explains: 'We checkpoint state every 30 seconds — if the job crashes, it restores from the last checkpoint and replays events from that offset. That gives us fault tolerance.'
What is checkpointing in stream processing?
Checkpointing periodically snapshots the state of a streaming job. On failure, the job restores from the last checkpoint and replays only the events since that point — enabling fault-tolerant stateful stream processing.
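The restore-and-replay cycle can be sketched with a toy counting job. This is not Flink's API: `CountingJob`, `checkpoint_every`, and the simulated crash are assumptions chosen to show the one property that matters, namely that state and input offset are snapshotted together.

```python
import copy


class CountingJob:
    """Toy stateful job: counts events per key and checkpoints (state, offset) together."""

    def __init__(self, checkpoint_every: int = 3):
        self.counts = {}
        self.offset = 0                  # next input position to read
        self.checkpoint_every = checkpoint_every
        self.checkpoint = ({}, 0)        # last snapshot: (counts, offset)

    def process(self, events, crash_at=None):
        """Process events from self.offset; optionally simulate a crash at an offset."""
        while self.offset < len(events):
            if crash_at is not None and self.offset == crash_at:
                raise RuntimeError("simulated crash")
            key = events[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % self.checkpoint_every == 0:
                self.checkpoint = (copy.deepcopy(self.counts), self.offset)

    def restore(self):
        """Roll state and read position back to the last checkpoint."""
        counts, offset = self.checkpoint
        self.counts = copy.deepcopy(counts)
        self.offset = offset


events = ["a", "b", "a", "c", "a", "b", "c"]
job = CountingJob(checkpoint_every=3)
try:
    job.process(events, crash_at=5)
except RuntimeError:
    job.restore()      # back to the snapshot taken at offset 3
job.process(events)    # replays offsets 3 and 4, then continues to the end
```

After recovery the counts match what an uninterrupted run would have produced, because the two events processed after the last checkpoint were discarded with the lost state and then replayed from the restored offset.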