5 exercises: choose the best-structured answer to common Kafka Streams architect interview questions. Focus on KRaft internals, consumer rebalancing, exactly-once semantics (EOS), state stores, and topology design.
Structure for Kafka Streams architect interview answers
Name the mechanism: explain how the feature works internally, not just what it does
Quantify trade-offs: give concrete numbers such as write amplification factor (WAF), throughput overhead, and partition counts
Cover failure semantics: what happens on crash, how recovery works
State operational impact: ops complexity, monitoring knobs, Kubernetes considerations
1 / 5
The interviewer asks: "Explain how KRaft mode replaces ZooKeeper in Kafka — what changed architecturally and why does it matter?" Which answer best captures the architectural significance?
Option B covers the full architecture: the metadata log as a Raft-replicated internal topic (`__cluster_metadata`), controller election mechanics, how brokers fetch metadata-log updates from the active controller, and four concrete reasons why it matters (faster controller startup and failover, since a ZooKeeper-based controller had to load all metadata from ZooKeeper in time O(n) in the number of partitions; operational simplicity; partition scalability numbers; and a single consistent metadata log). It also names the ZooKeeper-to-KRaft migration path (KIP-866). Options A, C, and D each identify one correct aspect but miss the architectural depth: none explains the controller quorum model or quantifies the scalability improvement.
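To make the controller quorum model concrete, here is a small Java sketch of the majority arithmetic that Raft-style quorums such as KRaft rely on. It is a simplified model, not Kafka source code; the class `KraftQuorumMath` and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Simplified model of the majority-quorum arithmetic behind a Raft-style
 * controller quorum such as KRaft's. Illustrative only; not Kafka source code.
 */
final class KraftQuorumMath {
    private KraftQuorumMath() {}

    /** Votes needed to win a leader election or commit a record: a strict majority. */
    static int majority(int voters) {
        if (voters < 1) {
            throw new IllegalArgumentException("a quorum needs at least one voter");
        }
        return voters / 2 + 1;
    }

    /** How many controllers can fail while the survivors still form a majority. */
    static int tolerableFailures(int voters) {
        return voters - majority(voters);
    }

    /**
     * Highest metadata-log offset present on a majority of voters, i.e. the
     * offset up to which records can be treated as committed.
     * Each entry is the last offset replicated on one voter (leader included).
     */
    static long committedOffset(List<Long> lastOffsetPerVoter) {
        List<Long> sorted = new ArrayList<>(lastOffsetPerVoter);
        sorted.sort(Collections.reverseOrder());
        // Sorted descending, the value at index (majority - 1) is held by at
        // least `majority` voters, because every earlier entry is >= it.
        return sorted.get(majority(sorted.size()) - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 4, 5}) {
            System.out.printf("%d voters: majority=%d, tolerates %d failure(s)%n",
                    n, majority(n), tolerableFailures(n));
        }
        System.out.println("committed offset: " + committedOffset(List.of(10L, 8L, 5L)));
    }
}
```

With this arithmetic, 3 controllers tolerate 1 failure and 5 tolerate 2, while a 4th voter raises the majority without adding failure tolerance, which is one reason odd-sized quorums are the usual choice.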
2 / 5
The interviewer asks: "Compare eager rebalancing vs cooperative sticky rebalancing in Kafka consumer groups. When would you choose one over the other?" Which answer demonstrates the deepest understanding?
Option B is strongest: it explains the mechanism of both protocols precisely (stop-the-world vs. a two-phase delta approach), quantifies the difference (every partition revoked vs. only the partitions that move), ties the cooperative sticky assignor to the incremental cooperative rebalancing protocol it implements (KIP-429), gives concrete guidance on when each applies (pod scaling, latency-sensitive workloads), and provides the exact configuration. Options C and D state the conclusion correctly but don't explain the two-phase mechanism or give concrete usage criteria. Option A is too superficial.
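The revoke-everything versus revoke-only-the-delta difference can be sketched as a toy model. The class `RebalanceDelta` is hypothetical, the post-rebalance (sticky) assignment is supplied by hand rather than computed, and the second cooperative round that hands moved partitions to their new owner is not modelled. The `partition.assignment.strategy` key and the `CooperativeStickyAssignor` class name are the plain Java consumer's settings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy comparison of which partitions stop being processed during a rebalance
 * under the eager protocol versus incremental cooperative rebalancing.
 * Simplified model; timing and the follow-up cooperative round are ignored.
 */
final class RebalanceDelta {
    private RebalanceDelta() {}

    /** Eager: every member revokes everything it owns before reassignment. */
    static Set<String> eagerPaused(Map<String, Set<String>> before) {
        Set<String> paused = new TreeSet<>();
        before.values().forEach(paused::addAll);
        return paused;
    }

    /** Cooperative: a member revokes only partitions that move to another member. */
    static Set<String> cooperativePaused(Map<String, Set<String>> before,
                                         Map<String, Set<String>> after) {
        Map<String, String> ownerAfter = new HashMap<>();
        after.forEach((member, parts) -> parts.forEach(p -> ownerAfter.put(p, member)));
        Set<String> paused = new TreeSet<>();
        before.forEach((member, parts) -> {
            for (String p : parts) {
                if (!member.equals(ownerAfter.get(p))) {
                    paused.add(p); // partition changes owner, so it must be revoked
                }
            }
        });
        return paused;
    }

    /** Plain-consumer setting that opts into the cooperative sticky assignor. */
    static Properties cooperativeConsumerProps() {
        Properties props = new Properties();
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        // Two consumers own six partitions; a third consumer (c3) joins.
        Map<String, Set<String>> before = Map.of(
                "c1", Set.of("t-0", "t-1", "t-2"),
                "c2", Set.of("t-3", "t-4", "t-5"));
        Map<String, Set<String>> after = Map.of(
                "c1", Set.of("t-0", "t-1"),
                "c2", Set.of("t-3", "t-4"),
                "c3", Set.of("t-2", "t-5"));
        System.out.println("eager pauses:       " + eagerPaused(before));
        System.out.println("cooperative pauses: " + cooperativePaused(before, after));
    }
}
```

In the example, a third consumer joining pauses all six partitions under eager rebalancing but only `t-2` and `t-5` under the cooperative protocol; `c1` and `c2` keep processing the partitions they retain.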
3 / 5
The interviewer asks: "Explain how Kafka achieves exactly-once semantics end-to-end in a Kafka Streams application." Which answer best covers the full EOS guarantee stack?
Option B covers the complete stack: idempotent producer mechanics (PID + sequence numbers), the transaction coordinator's two-phase commit and the internal topics involved (`__transaction_state` for transaction state, `__consumer_offsets` for offsets committed inside the transaction), the `read_committed` isolation level on the consumer side, the Kafka Streams EOS v2 implementation detail (thread-level producers in v2 vs task-level producers in v1), failure recovery semantics, and the throughput cost. Options C and D name the right components but don't explain the mechanism (how deduplication works, what the transaction coordinator does, or how failures are recovered). Option A is a one-liner with no mechanism.
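The deduplication half of EOS can be illustrated with a simplified model of the broker-side check: for each (producer ID, partition) pair, the broker remembers the last sequence number it accepted. This is a sketch under that assumption only; real brokers track per-batch metadata and handle more edge cases, and `IdempotentAppendModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Simplified model of broker-side deduplication for an idempotent producer.
 * Illustrative only; not Kafka's implementation.
 */
final class IdempotentAppendModel {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Last accepted sequence number per "producerId/partition".
    private final Map<String, Integer> lastSequence = new HashMap<>();
    // Accepted values, in arrival order (all partitions combined for brevity).
    private final List<String> accepted = new ArrayList<>();

    Result append(long producerId, int partition, int sequence, String value) {
        String key = producerId + "/" + partition;
        int last = lastSequence.getOrDefault(key, -1);
        if (sequence <= last) {
            return Result.DUPLICATE;    // a retry of a write that already succeeded
        }
        if (sequence != last + 1) {
            return Result.OUT_OF_ORDER; // a gap: an earlier write is missing
        }
        lastSequence.put(key, sequence);
        accepted.add(value);
        return Result.APPENDED;
    }

    List<String> accepted() {
        return Collections.unmodifiableList(accepted);
    }

    public static void main(String[] args) {
        IdempotentAppendModel broker = new IdempotentAppendModel();
        System.out.println(broker.append(42L, 0, 0, "a")); // first write
        System.out.println(broker.append(42L, 0, 0, "a")); // retry after a lost ack
        System.out.println(broker.append(42L, 0, 2, "c")); // sequence 1 was skipped
        System.out.println(broker.append(42L, 0, 1, "b"));
        System.out.println(broker.accepted());
    }
}
```

In a real deployment none of this is hand-written: the producer enables it with `enable.idempotence=true`, a Kafka Streams application opts into EOS with `processing.guarantee=exactly_once_v2`, and downstream consumers set `isolation.level=read_committed` so they do not see records from aborted transactions.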
4 / 5
The interviewer asks: "How do compacted topics work in Kafka, and how does Kafka Streams use them for state store fault tolerance?" Which answer best explains the mechanism and use case?
Option B explains all seven aspects: the log cleaner mechanics (merging segments, tombstones, `min.compaction.lag.ms`), why compaction produces a durable key-value snapshot, how RocksDB is used as the local store, what a changelog topic is and how it maps to state store changes, how recovery replays the compacted log and why its size stays bounded (it scales with the number of distinct keys, plus the not-yet-compacted head of the log, rather than with total events), standby replicas as the fast-failover alternative, and the trade-off imposed by the keying requirement. Options A, C, and D each state the conclusion, but none explains the compaction mechanism, recovery-time behaviour, or standby replicas.
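Compaction and restore can be sketched as a toy model of a changelog topic in which a `null` value is a tombstone. It deliberately ignores segments, the active segment that is never cleaned, and timing settings such as `min.compaction.lag.ms` and `delete.retention.ms`; the class `ChangelogCompaction` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of log compaction for a changelog topic and of restoring a state
 * store by replaying it. Illustrative only; not Kafka's log cleaner.
 */
final class ChangelogCompaction {
    static final class Rec {
        final long offset;
        final String key;
        final String value; // null means tombstone (delete)

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    private ChangelogCompaction() {}

    /** Keep only the latest record per key (log assumed in offset order); offsets are preserved. */
    static List<Rec> compact(List<Rec> log, boolean dropTombstones) {
        Map<String, Rec> latest = new HashMap<>();
        for (Rec r : log) {
            latest.put(r.key, r); // later offsets overwrite earlier ones
        }
        List<Rec> out = new ArrayList<>();
        for (Rec r : latest.values()) {
            if (r.value != null || !dropTombstones) {
                out.add(r);
            }
        }
        out.sort(Comparator.comparingLong((Rec r) -> r.offset));
        return out;
    }

    /** Rebuild a key-value store by replaying a changelog from the beginning. */
    static Map<String, String> restore(List<Rec> log) {
        Map<String, String> store = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value == null) {
                store.remove(r.key);
            } else {
                store.put(r.key, r.value);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Rec> log = List.of(
                new Rec(0, "a", "1"), new Rec(1, "b", "1"), new Rec(2, "a", "2"),
                new Rec(3, "c", "1"), new Rec(4, "b", null), new Rec(5, "a", "3"));
        List<Rec> compacted = compact(log, true);
        System.out.println("records before/after: " + log.size() + "/" + compacted.size());
        System.out.println("restored from full log:      " + restore(log));
        System.out.println("restored from compacted log: " + restore(compacted));
    }
}
```

Replaying the six-record log and replaying its two-record compacted form yield the same store contents (`a=3`, `c=1`), which is why restore cost tracks distinct keys rather than total events.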
5 / 5
The interviewer asks: "When would you choose Kafka Streams (Java API) over ksqlDB for a streaming application? Walk through the trade-offs." Which answer best covers the technical decision criteria?
Option B covers six dimensions: the programming-model difference (including the key fact that ksqlDB compiles queries into Kafka Streams topologies), concrete criteria for choosing ksqlDB (SQL-oriented team, simple pipelines, REST API), concrete criteria for Kafka Streams (custom state, non-Kafka integration, JVM dependency injection, deployment as an embedded library), the operational architecture difference (a library inside your application vs a separate ksqlDB cluster, the biggest practical factor for Kubernetes teams), EOS control, and unit-testing capability (`TopologyTestDriver`). Options C and D each identify one or two correct criteria but miss the operational, testing, and embedding dimensions. Option A is too superficial.