Learn vocabulary for Apache Flink, Spark Streaming, Kafka Streams, Faust, stateful vs. stateless processing, and operator chaining.
1 / 5
What is Apache Flink's key characteristic compared to other stream processing frameworks?
Apache Flink: true stream processing (records are processed as they arrive, not in micro-batches), designed for low latency. Key vocabulary: DataStream API, stateful operators, checkpoints (automatic distributed snapshots for fault tolerance), savepoints (manually triggered state snapshots for upgrades and migrations), watermarks for event-time processing, and exactly-once state consistency, which extends end to end when sources are replayable and sinks are transactional or idempotent. Used for fraud detection, real-time recommendations, and complex event processing.
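To make the watermark vocabulary concrete, here is a minimal sketch in plain Python. It is not Flink code: the class name, field names, and the exact lateness rule are assumptions chosen for illustration, and Flink's own bounded-out-of-orderness strategy differs in small arithmetic details. The idea shown is that the watermark trails the highest event time seen by a fixed bound, and events older than the watermark count as late.

```python
from dataclasses import dataclass


@dataclass
class BoundedOutOfOrdernessWatermark:
    """Toy watermark tracker (hypothetical, not the Flink API)."""

    max_out_of_orderness_ms: int
    max_timestamp_ms: int = -1  # -1 means no event has been seen yet

    def current_watermark(self) -> int:
        # The watermark trails the largest event time by the allowed bound.
        return self.max_timestamp_ms - self.max_out_of_orderness_ms

    def on_event(self, event_time_ms: int) -> bool:
        """Return True if the event is on time, False if it is late."""
        seen_any = self.max_timestamp_ms >= 0
        late = seen_any and event_time_ms < self.current_watermark()
        # Late events never move the watermark backwards.
        self.max_timestamp_ms = max(self.max_timestamp_ms, event_time_ms)
        return not late


wm = BoundedOutOfOrdernessWatermark(max_out_of_orderness_ms=2000)
for ts in (10_000, 9_000, 7_000):
    print(ts, "on time" if wm.on_event(ts) else "late")
```

With a 2-second bound, the event at 9,000 ms is still accepted after 10,000 ms has been seen, while the event at 7,000 ms falls behind the watermark of 8,000 ms.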
2 / 5
What is 'micro-batch processing' in the context of Spark Structured Streaming vocabulary?
Spark Structured Streaming runs in micro-batch mode by default, with an experimental continuous processing mode as an alternative. Micro-batch: accumulate records for a trigger interval, then process them as a Spark SQL query. Advantages: a simpler programming model (the same DataFrame API as batch) and strong exactly-once guarantees via checkpointing, write-ahead logs, replayable sources, and idempotent sinks. Downside: latency cannot drop below roughly the trigger interval. A good fit when latencies from about 100 ms up to seconds are acceptable, but not for single-digit-millisecond requirements.
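The latency floor can be illustrated with a toy model in plain Python. This is not the Spark API: both function names and the fixed-boundary batching rule are assumptions made for the sketch. Records are grouped by the trigger interval they arrive in, and each record waits until its interval closes before it is processed.

```python
from collections import defaultdict


def micro_batches(records, trigger_interval_ms):
    """Group (arrival_ms, value) records into one batch per trigger interval."""
    batches = defaultdict(list)
    for arrival_ms, value in records:
        batches[arrival_ms // trigger_interval_ms].append(value)
    return [batches[batch_id] for batch_id in sorted(batches)]


def waits_until_trigger(records, trigger_interval_ms):
    """Return (value, wait_ms): how long each record sits before its batch fires."""
    result = []
    for arrival_ms, value in records:
        fire_ms = (arrival_ms // trigger_interval_ms + 1) * trigger_interval_ms
        result.append((value, fire_ms - arrival_ms))
    return result


records = [(0, "a"), (400, "b"), (999, "c"), (1000, "d")]
print(micro_batches(records, 1000))
print(waits_until_trigger(records, 1000))
```

A record that arrives just after a trigger fires waits almost the whole interval, which is why shrinking the trigger interval is the main lever for lowering micro-batch latency.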
3 / 5
What is Kafka Streams' defining characteristic compared to Flink or Spark Streaming?
Kafka Streams: a Java library with a Scala wrapper, not a cluster framework. Your application instances ARE the stream processing cluster: Kafka's consumer group protocol assigns partitions to instances and rebalances them when instances join or leave. State stores (RocksDB by default) are local to each instance and backed by Kafka changelog topics for durability. Key vocabulary: KStream (record stream), KTable (changelog stream / materialized view), GlobalKTable (a table fully replicated to every instance), interactive queries, topology.
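The KStream versus KTable distinction can be sketched in plain Python. This is not the Kafka Streams API: the function name and the sample data are hypothetical. The same sequence of keyed records is read either as a stream, where every record is an independent event, or as a table, where each record is an upsert for its key and a null value acts as a delete (tombstone).

```python
def materialize(changelog):
    """Fold (key, value) changelog records into a table; value None deletes the key."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: remove the key if present
        else:
            table[key] = value  # upsert: the latest value per key wins
    return table


changelog = [
    ("alice", "Berlin"),
    ("bob", "Paris"),
    ("alice", "Rome"),
    ("bob", None),
]

print(len(changelog))          # stream view: four separate events
print(materialize(changelog))  # table view: latest value per surviving key
```

Replaying the changelog from the beginning rebuilds the table, which is the same idea behind restoring a local state store from its changelog topic.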
4 / 5
What is 'stateful processing' in stream processing framework vocabulary?
Stateless operators (map, filter) handle each event independently. Stateful operators maintain state between events: count events per user (counter state), aggregate revenue per hour (accumulator state), join a click stream with an ad impression stream (join state). Frameworks manage state backends (in-memory, RocksDB) and make state fault-tolerant via checkpoints or changelogs. Stateful processing is the hard part of streaming: scaling, rebalancing, and state migration require careful design.
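A minimal sketch of counter state with checkpoint and restore, in plain Python. The class and its methods are hypothetical and do not correspond to any framework's API. The snapshot stores the state together with the input offset it reflects, so that after a failure the operator can restore the state and replay the input from that offset without double counting.

```python
import copy


class CountPerKey:
    """Toy stateful operator: counts events per key."""

    def __init__(self):
        self.counts = {}
        self.offset = 0  # number of input events reflected in self.counts

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def snapshot(self):
        # Copy so later processing cannot mutate the checkpoint.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self, checkpoint):
        self.offset = checkpoint["offset"]
        self.counts = copy.deepcopy(checkpoint["counts"])


events = ["u1", "u2", "u1", "u3", "u1"]

op = CountPerKey()
for key in events[:3]:
    op.process(key)
checkpoint = op.snapshot()
op.process(events[3])  # progress made after the checkpoint is lost in the crash

recovered = CountPerKey()
recovered.restore(checkpoint)
for key in events[recovered.offset:]:  # replay from the checkpointed offset
    recovered.process(key)
print(recovered.counts)
```

The recovered counts match an uninterrupted run because the event processed after the checkpoint is replayed rather than counted twice.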
5 / 5
What is 'operator chaining' in stream processing framework vocabulary?
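Operator chaining (Flink terminology): if operators A → B → C are sequential, have the same parallelism, and are connected without repartitioning (no keyBy or rebalance between them), Flink can chain them into one task. Records are then handed from operator to operator by method call within a single thread, with no network hop and no serialization into network buffers; by default Flink still makes a defensive copy of each record unless object reuse is enabled. This significantly reduces overhead. You can disable chaining for debugging or when operators have different resource needs. Spark SQL performs a similar fusion through whole-stage code generation (WholeStageCodegen).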
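The effect of chaining can be sketched in plain Python. This is an analogy, not Flink code: `chain`, `unchained`, and the three sample operators are hypothetical, and `pickle` stands in for the serialization that happens when records cross a task boundary. Chained operators pass each record along by direct function call, while the unchained version serializes and deserializes between every pair of operators.

```python
import pickle


def chain(*operators):
    """Fuse per-record operators into one callable: direct calls, no hand-off cost."""
    def chained(record):
        for op in operators:
            record = op(record)
        return record
    return chained


def unchained(*operators):
    """Same pipeline, but each hop round-trips the record through serialization."""
    def run(record):
        for op in operators:
            record = pickle.loads(pickle.dumps(op(record)))
        return record
    return run


def parse(line):
    return int(line)


def double(n):
    return n * 2


def label(n):
    return f"value={n}"


fast = chain(parse, double, label)
slow = unchained(parse, double, label)
print(fast("21"), slow("21"))
```

Both pipelines produce the same result; the difference is only the per-record overhead at each hop, which is what chaining removes.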