Practice data streaming architecture vocabulary: Lambda vs. Kappa architecture, speed and batch layers, real-time views, and architectural trade-offs.
1 / 5
What are the two processing layers in a Lambda architecture?
Lambda architecture combines a batch layer, which periodically reprocesses all historical data for accuracy, and a speed layer, which processes incoming events in real time with lower latency but potentially less accuracy. A serving layer merges results from both layers to answer queries. The trade-off is complexity: two separate processing codebases must be maintained.
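As a rough illustration of the two layers, here is a minimal in-memory sketch. The page-view counting use case and the names `batch_view`, `SpeedLayer`, and `query` are assumptions made for this example, not part of any framework.

```python
from collections import Counter


def batch_view(all_events):
    """Batch layer: recompute counts from the full history on every run."""
    return Counter(event["page"] for event in all_events)


class SpeedLayer:
    """Speed layer: update counts incrementally as each event arrives."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        self.view[event["page"]] += 1


def query(page, batch, realtime):
    """Answer a query by adding the batch count and the real-time count."""
    return batch[page] + realtime[page]


# Hypothetical data: events the batch layer has already processed...
history = [{"page": "home"}, {"page": "home"}, {"page": "docs"}]
# ...and events that arrived after the last batch run.
recent = [{"page": "home"}, {"page": "pricing"}]

batch = batch_view(history)
speed = SpeedLayer()
for event in recent:
    speed.on_event(event)

home_total = query("home", batch, speed.view)
pricing_total = query("pricing", batch, speed.view)
```

Note that the counting rule appears twice, once in `batch_view` and once in `SpeedLayer.on_event`. That duplication is the maintenance cost the card describes.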
2 / 5
What is the key principle of Kappa architecture?
Kappa architecture (Jay Kreps, 2014) simplifies Lambda by using only a streaming layer. Historical data reprocessing is handled by replaying the event log (e.g., a Kafka topic) through the same streaming pipeline. This eliminates the complexity of maintaining two separate codebases but requires a durable, replayable event log.
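The replay idea can be sketched with a toy append-only log. `EventLog` is a hypothetical in-memory stand-in for a durable log such as a Kafka topic, and `run_pipeline` is an assumed name for the single streaming job; neither reflects a real client API.

```python
class EventLog:
    """Hypothetical append-only log; a stand-in for a durable, replayable topic."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the appended event

    def read_from(self, offset=0):
        """Yield events starting at the given offset."""
        yield from self._events[offset:]


def run_pipeline(events, transform):
    """The one streaming pipeline: fold events into per-user totals."""
    state = {}
    for user, amount in events:
        state[user] = state.get(user, 0) + transform(amount)
    return state


log = EventLog()
for event in [("ana", 10), ("bo", 5), ("ana", 7)]:
    log.append(event)

# Version 1 of the business logic.
v1 = run_pipeline(log.read_from(0), lambda amount: amount)

# The logic changes: reprocess history by replaying the log from offset 0
# through the same pipeline code, instead of running a separate batch job.
v2 = run_pipeline(log.read_from(0), lambda amount: amount * 2)
```

Reprocessing here is just a second run of the same function over the same log, which only works because the log retains every event.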
3 / 5
'The speed layer provides real-time views.' What is a real-time view in this context?
In Lambda architecture, the speed layer computes real-time views from the most recent events — the data that the batch layer hasn't yet processed. These views may be less accurate (due to late data or approximate algorithms) but are available with low latency. The serving layer merges batch views and real-time views to answer queries.
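One way to picture the hand-off between the two views is a batch cutoff timestamp: events at or before it belong to the batch view, later ones to the real-time view. The sketch below assumes that model; `build_views`, `serve`, and the `(timestamp, key, value)` event shape are illustrative choices.

```python
def build_views(events, batch_cutoff):
    """Split events into a batch view and a real-time view by timestamp."""
    batch, realtime = {}, {}
    for timestamp, key, value in events:
        target = batch if timestamp <= batch_cutoff else realtime
        target[key] = target.get(key, 0) + value
    return batch, realtime


def serve(key, batch, realtime):
    """Serving layer: merge the two views to answer a query."""
    return batch.get(key, 0) + realtime.get(key, 0)


# Hypothetical events as (timestamp, key, value).
events = [(1, "clicks", 4), (2, "clicks", 6), (3, "clicks", 1), (4, "clicks", 2)]

# The batch layer has processed events up to timestamp 2.
batch, realtime = build_views(events, batch_cutoff=2)
before = serve("clicks", batch, realtime)

# After the next batch run catches up to timestamp 4, the real-time view is empty.
batch_next, realtime_next = build_views(events, batch_cutoff=4)
after = serve("clicks", batch_next, realtime_next)
```

The merged answer stays the same across the batch run; what changes is how much of it comes from the real-time view.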
4 / 5
'We chose Kappa for operational simplicity.' What operational complexity does Kappa avoid compared to Lambda?
Lambda's main operational burden is maintaining two codebases — a batch pipeline (often Hadoop/Spark) and a streaming pipeline (often Flink/Storm) — and ensuring they produce consistent results. When business logic changes, both must be updated in sync. Kappa eliminates this by using one streaming pipeline for everything, at the cost of needing a durable replayable log.
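Teams running Lambda often guard against drift with some form of parity check between the two implementations. The following is a minimal sketch of that idea, assuming a hypothetical "sum of paid orders" rule; the names `batch_revenue`, `StreamingRevenue`, and `in_parity` are invented for the example.

```python
def batch_revenue(orders):
    """Batch implementation: one pass over the full data set."""
    return sum(order["amount"] for order in orders if order["status"] == "paid")


class StreamingRevenue:
    """Streaming implementation of the same rule, updated per event."""

    def __init__(self):
        self.total = 0

    def on_order(self, order):
        if order["status"] == "paid":
            self.total += order["amount"]


def in_parity(orders):
    """Feed the same orders to both implementations and compare results."""
    stream = StreamingRevenue()
    for order in orders:
        stream.on_order(order)
    return batch_revenue(orders) == stream.total


# Hypothetical sample data.
sample = [
    {"amount": 20, "status": "paid"},
    {"amount": 5, "status": "refunded"},
    {"amount": 7, "status": "paid"},
]
parity_ok = in_parity(sample)
```

If the "paid" rule changed in only one of the two implementations, `in_parity` would start returning `False`. With a single Kappa pipeline there is no second implementation to compare against.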
5 / 5
What is a key trade-off when choosing between Lambda and Kappa architecture?
The core Lambda vs. Kappa trade-off is operational simplicity vs. reprocessing cost. Lambda's two-layer approach handles reprocessing naturally (just rerun the batch layer) but requires maintaining two pipelines. Kappa is simpler operationally but requires your streaming system to replay large volumes of historical data efficiently. A retained Kafka log makes that replay possible, but the replay load falls on the streaming engine.
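A back-of-the-envelope estimate can make the replay cost concrete. The figures below (a 50 TB log and 500 MB/s sustained throughput) are hypothetical placeholders, not measurements, and `replay_hours` is a helper defined only for this sketch.

```python
def replay_hours(log_bytes, throughput_bytes_per_sec):
    """Estimate how long a full replay takes at a sustained throughput."""
    return log_bytes / throughput_bytes_per_sec / 3600


TB = 10**12
MB = 10**6

# Hypothetical numbers: substitute your own log size and measured throughput.
hours = replay_hours(50 * TB, 500 * MB)
```

With these assumed inputs the replay takes a little under 28 hours, which shows why replay throughput is a sizing concern for a Kappa deployment.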