In Apache Kafka, what is a topic and how is it structured?
A Kafka topic is a named stream of records, implemented as a durable, append-only log. Each topic is split into partitions, which are the unit of parallelism and ordering: records within a partition are strictly ordered by offset, while across partitions there is no global order. Producers append to partitions and consumers read from them. Topics retain data according to a retention policy (time or size), so unlike a traditional queue, messages are not deleted when read. A consumer can re-read them by seeking back to an earlier offset, as long as they are still within retention.
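A minimal sketch of the structure described above, as a toy in-memory model rather than Kafka's actual storage or client API. The `Topic` and `Partition` classes and the `orders` topic name are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """An append-only log; a record's offset is its index in the list."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # Reading never removes records, so the same offsets can be read again.
        return self.records[offset:offset + max_records]


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


topic = Topic("orders", num_partitions=2)
topic.partitions[0].append("a")
topic.partitions[0].append("b")
topic.partitions[1].append("c")

first_read = topic.partitions[0].read(0)
replay = topic.partitions[0].read(0)  # same records again: nothing was consumed away
```

Offsets are per partition here: both partitions start at offset 0, which is why there is no single order across the topic.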
What is a consumer group in Kafka and how does it enable scaling?
A consumer group is how Kafka load-balances consumption. Kafka assigns each partition to exactly one consumer within a group, so the group collectively processes all partitions in parallel. Adding consumers increases throughput only up to the number of partitions; beyond that, extra consumers sit idle. Different groups each receive the full stream independently (pub/sub), while members of the same group share the work (queue semantics). When consumers join or leave, a rebalance reassigns partitions among the remaining members.
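The partition cap can be seen in a small sketch. This assumes a simplified round-robin assignment; Kafka's real assignors (range, round-robin, sticky) differ in detail, and the `assign` function and consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2]

# Two consumers share three partitions.
two = assign(partitions, ["c1", "c2"])

# Four consumers, three partitions: one consumer gets nothing and sits idle.
four = assign(partitions, ["c1", "c2", "c3", "c4"])
idle = [c for c, owned in four.items() if not owned]
```

Calling `assign` again with a different consumer list plays the role of a rebalance in this toy model.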
What is an offset in Kafka?
An offset is the sequential position of a record within a partition: record 0, 1, 2, and so on. Consumers track their progress by committing an offset, which by convention is the position of the next record to read (the last processed offset plus one). After a restart or rebalance they resume from the committed position rather than starting over or skipping ahead. Because Kafka retains records, a consumer can seek to an earlier offset to replay history, or to the latest to skip ahead. When the commit happens relative to processing determines the delivery semantics: committing after processing gives at-least-once, committing before processing gives at-most-once.
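A toy simulation of how commit timing changes delivery semantics. It is a sketch, not the Kafka consumer API: the `run` function and its `crash_at` parameter are hypothetical, and the committed offset is stored as the position of the next record to read.

```python
def run(log, start, commit_first, crash_at=None):
    """Consume `log` from `start`; return (processed records, committed offset)."""
    processed = []
    committed = start
    for offset in range(start, len(log)):
        if commit_first:
            committed = offset + 1  # commit, then process
            if offset == crash_at:
                break  # crash after commit, before processing: record is lost
            processed.append(log[offset])
        else:
            processed.append(log[offset])  # process, then commit
            if offset == crash_at:
                break  # crash after processing, before commit: record repeats
            committed = offset + 1
    return processed, committed


log = ["r0", "r1", "r2"]

# At-least-once: commit after processing, crash at offset 1, then restart.
alo_first, alo_committed = run(log, 0, commit_first=False, crash_at=1)
alo_second, _ = run(log, alo_committed, commit_first=False)

# At-most-once: commit before processing, crash at offset 1, then restart.
amo_first, amo_committed = run(log, 0, commit_first=True, crash_at=1)
amo_second, _ = run(log, amo_committed, commit_first=True)
```

In the first case `r1` is processed twice across the two runs; in the second it is never processed.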
Why does Kafka guarantee ordering only within a partition, not across the whole topic?
Ordering is a per-partition guarantee because partitions are the unit of parallelism. If Kafka enforced a single global order across a topic, all records would have to flow through one ordered sequence, destroying the horizontal scalability that partitioning provides. Instead, you control ordering by choosing the partition key: with the default partitioner, records with the same key (e.g. the same user_id) are hashed to the same partition and are therefore strictly ordered relative to each other, provided the topic's partition count does not change. This lets you preserve order where it matters while scaling across keys.
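A sketch of key-based partitioning. The Java client's default partitioner hashes the key bytes with murmur2; `zlib.crc32` is used here only as a stable stand-in, and `partition_for` and the sample events are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (stand-in for murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All of user-1's events sit in one partition, in the order they were produced.
user1_partition = partition_for("user-1", NUM_PARTITIONS)
user1_events = [v for k, v in partitions[user1_partition] if k == "user-1"]
```

Because the partition is computed modulo the partition count, changing that count can move a key to a different partition, which is why per-key ordering assumes a fixed number of partitions.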
What does exactly-once semantics (EOS) mean in Kafka stream processing, and why is it hard?
In distributed messaging, at-least-once (retries cause duplicates) and at-most-once (no retries, risk of loss) are easy; exactly-once is hard because network failures force retries, and a retry of a send that actually succeeded produces a duplicate. Kafka achieves EOS with idempotent producers (the broker deduplicates retried sends using a producer ID and per-partition sequence numbers) and transactions (the consumed offsets and the produced output are committed atomically, together or not at all). This ensures that, within Kafka, a record is processed and its effect recorded exactly once, even across failures. The guarantee covers read-process-write cycles between Kafka topics; side effects in external systems still need their own idempotent or transactional handling. This is essential for correctness in financial or stateful stream processing.
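A toy model of the idempotent-producer half of this answer, assuming a single partition. It is a sketch of the deduplication idea only: the `Broker` class is hypothetical, and transactions and Kafka's handling of out-of-order sequence numbers are not modeled.

```python
class Broker:
    """Single-partition toy broker that drops retried sends it has already appended."""

    def __init__(self):
        self.log = []
        self.last_seq = {}  # producer_id -> highest sequence number appended

    def send(self, producer_id, seq, value):
        if seq <= self.last_seq.get(producer_id, -1):
            return False  # duplicate of an earlier send: acknowledge, do not append
        self.log.append(value)
        self.last_seq[producer_id] = seq
        return True


broker = Broker()
first = broker.send("p1", 0, "debit $10")

# The acknowledgement was lost, so the producer retries with the same sequence number.
retry = broker.send("p1", 0, "debit $10")

second = broker.send("p1", 1, "credit $10")
```

Without the sequence check, the retry would append `debit $10` a second time, which is the duplicate that at-least-once delivery allows.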