Apache Flink's DataStream API enables stateful stream processing with event-time semantics and exactly-once guarantees. Master vocabulary for watermark advancement and timer behavior, RocksDB vs HashMapStateBackend trade-offs, keyBy partitioning, 2PC Kafka sink recovery, and allowedLateness for late event handling.
0 / 5 completed
1 / 5
A Flink job uses a KeyedProcessFunction with a timer set for 10 seconds in event time. The watermark does not advance for 5 minutes. What happens to the timer?
Event-time timers in Flink fire only when the watermark advances past the timer's timestamp. If the watermark stalls (e.g., no new events, or an upstream partition goes idle), the timer remains pending indefinitely. Use WatermarkStrategy.withIdleness(duration) to handle idle partitions and prevent watermark stalling.
2 / 5
What is the purpose of Flink's RocksDBStateBackend compared to the HashMapStateBackend?
RocksDBStateBackend stores state in RocksDB on local disk/SSD, allowing state sizes far exceeding JVM heap (terabytes). It also supports incremental checkpoints. HashMapStateBackend stores state in JVM heap — faster access but limited by available heap memory. RocksDB also enables incremental checkpoints, making it preferred for large-state production jobs.
3 / 5
A developer calls stream.keyBy(event -> event.userId).window(TumblingEventTimeWindows.of(Time.minutes(5))).aggregate(new MyAggregateFunction()). What does keyBy do to the parallelism?
keyBy implements a hash partitioning strategy: events with the same key are deterministically routed to the same parallel subtask (based on key hash modulo parallelism). The overall parallelism is unchanged — there are still N subtasks, but each subtask handles a disjoint subset of keys. This enables parallel stateful processing.
4 / 5
A Flink job uses two-phase commit for exactly-once Kafka output. The pre-commit phase succeeds but the commit phase fails before acknowledgment. What happens on job recovery?
Flink's 2PC exactly-once Kafka sink stores pre-committed transaction IDs in checkpoint state. On recovery, Flink re-issues the commit for any transactions that were pre-committed but not yet confirmed. Kafka transactions support this: a pre-committed transaction can be committed by any producer with the same transactional.id, enabling recovery.
5 / 5
What does the allowedLateness setting on a Flink window do?
allowedLateness(Time.seconds(30)) keeps window state alive for 30 seconds after the watermark passes the window's end. Late events arriving within this window period trigger a window re-firing with updated results. Events arriving after allowedLateness expires are sent to the side output (if configured with sideOutputLateData) or dropped.