Apache Kafka Vocabulary: 30 Terms for Event Streaming Developers

Topics, partitions, consumer groups, offsets, Kafka Streams, Schema Registry, and event streaming vocabulary.

If you work on backend systems and your team has adopted event-driven architecture, chances are you have already heard colleagues talk about Kafka. Apache Kafka has become the backbone of real-time data pipelines at companies of every size, and understanding the vocabulary around it is essential — not just for writing code, but for participating confidently in design discussions, code reviews, and incident calls. This post covers 30 core terms you will encounter when working with Kafka and Kafka Streams, with plain-English definitions and real conversation examples to help you use them naturally.


Core Broker and Storage Terms

topic — a named channel to which producers write messages and from which consumers read them. Think of it as a category or feed, similar to a folder for a particular type of event.

“We decided to create a separate topic for payment events rather than mixing them with order events.”

“How many messages per second is that topic currently handling?”

partition — a topic is split into one or more partitions, each of which is an ordered, immutable sequence of records. Partitions are the primary unit of parallelism in Kafka.

“We bumped the partition count from 6 to 12 to let us scale out the consumer side.”

“Messages with the same key always land in the same partition, which preserves ordering for that key.”
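
As a rough illustration of key-based partitioning, the sketch below hashes a key and takes the remainder modulo the partition count. It is a toy model in plain Python: `choose_partition` is a hypothetical helper, and CRC32 stands in for the murmur2 hash used by Kafka's default partitioner, so the partition numbers will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (toy hash, not Kafka's murmur2)."""
    return zlib.crc32(key) % num_partitions


# Every record keyed "user-1" is sent to the same partition of a 6-partition topic.
keys = [b"user-1", b"user-2", b"user-1", b"user-3", b"user-1"]
placements = [(key, choose_partition(key, 6)) for key in keys]
user_1_partitions = {partition for key, partition in placements if key == b"user-1"}
```

Because the result depends on the partition count, a change such as going from 6 to 12 partitions can send an existing key to a different partition from then on.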

replica — a copy of a partition stored on a different broker. Replicas provide fault tolerance: if one broker goes down, another replica can take over.

“We run a replication factor of 3, so each replica is on a different availability zone.”

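leader / follower — for each partition, one broker is the leader (by default it handles all reads and writes) while the others are followers (they replicate data from the leader). If the leader fails, an in-sync follower is elected as the new leader. Newer Kafka versions can optionally let consumers fetch from a follower, but writes always go to the leader.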

“The leader for partition 4 moved to broker 2 after the rolling restart.”

“All produce and fetch requests go to the leader — followers are purely for redundancy.”

offset — an integer that uniquely identifies a record’s position within a partition. Consumers track their offset to know which messages they have already processed.

“After the outage we manually reset the offset to replay the last two hours of events.”

“Always commit your offset only after you have successfully processed the message, not before.”
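
The advice to commit only after successful processing can be sketched with a small in-memory model. This is not the Kafka consumer API: `PartitionConsumer`, `poll_and_process`, and `handler` are hypothetical names, and a Python list stands in for a partition.

```python
class PartitionConsumer:
    """Toy consumer for one partition; `committed` is the next offset to process."""

    def __init__(self, log):
        self.log = log
        self.committed = 0

    def poll_and_process(self, handler):
        """Process records in order; stop at the first failure without committing it."""
        while self.committed < len(self.log):
            record = self.log[self.committed]
            try:
                handler(record)
            except Exception:
                return False  # offset not advanced, so the record is retried later
            self.committed += 1  # commit only after the handler succeeded
        return True


seen = []
failures_left = {"b": 1}  # record "b" fails once, then succeeds


def handler(record):
    if failures_left.get(record, 0) > 0:
        failures_left[record] -= 1
        raise RuntimeError("transient failure")
    seen.append(record)


consumer = PartitionConsumer(["a", "b", "c"])
first_attempt = consumer.poll_and_process(handler)
offset_after_failure = consumer.committed
second_attempt = consumer.poll_and_process(handler)
```

Committing after processing gives at-least-once behaviour: a crash between processing and committing causes a replay rather than a lost record, which is why idempotent handling on the consumer side matters.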

lag — the difference between the latest offset in a partition and the offset a consumer has reached. High lag means the consumer is falling behind the producer.

“Our lag spiked to 500k messages during the deploy — we need to investigate why processing slowed down.”

“We have alerts set up to page us if consumer lag exceeds 10k for more than five minutes.”
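
The lag arithmetic is simple enough to show directly. In this sketch the offsets are made-up numbers and `consumer_lag` is a hypothetical helper; in practice the values would come from your monitoring tooling.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: latest offset in the partition minus the consumer's offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


log_end_offsets = {0: 1200, 1: 980, 2: 1500}    # illustrative values
committed_offsets = {0: 1200, 1: 900, 2: 1000}  # illustrative values

lag = consumer_lag(log_end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

Here partition 0 is fully caught up while partition 2 accounts for most of the backlog, which is why per-partition lag is often more informative than the total.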


Producers, Consumers, and Groups

producer — an application or library that publishes (writes) messages to a Kafka topic. Producers decide which partition each message goes to: by key, round-robin, or a custom partitioner.

“The producer is configured with acks=all to ensure the message is written to every in-sync replica before the call returns.”

consumer — an application or library that reads messages from one or more topic partitions. Consumers poll Kafka at their own pace and track progress via offsets.

“The consumer is idempotent — if it receives the same message twice it will not double-process the payment.”

consumer group — a logical grouping of consumers that collaborate to read from a topic. Each partition is assigned to exactly one consumer within a group, enabling horizontal scaling.

“We added a third instance to the consumer group and Kafka automatically redistributed the partitions.”

“If two separate services both need to read the same events, give each its own consumer group — they will each receive every message independently.”

rebalancing — the process by which Kafka redistributes partition assignments among the members of a consumer group, triggered by a member joining, leaving, or crashing.

“We are seeing a lot of rebalancing because our consumers keep timing out — we need to tune max.poll.interval.ms.”

“During a rebalance, consumers pause processing, which is why short rebalances are critical for low-latency pipelines.”
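
The consumer group and rebalancing entries above can be sketched together. The `assign` function below is a hypothetical, simplified round-robin strategy; Kafka's built-in assignors (range, round-robin, sticky, cooperative sticky) differ in detail, so treat this only as a picture of the invariant that each partition has exactly one owner per group.

```python
def assign(partitions, consumers):
    """Spread partitions over group members so each partition has exactly one owner."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = range(6)

# Two members share six partitions.
before = assign(partitions, ["consumer-1", "consumer-2"])

# A third member joins, which triggers a rebalance: the assignment is recomputed.
after = assign(partitions, ["consumer-1", "consumer-2", "consumer-3"])
```

A second consumer group would run the same calculation independently over all six partitions, which is how two services each receive every message.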

dead letter topic — a dedicated topic where messages that could not be processed successfully are sent, allowing engineers to inspect and reprocess them without blocking the main pipeline.

“Any message that fails deserialization gets routed to the dead letter topic so we can examine it later without holding up the rest of the stream.”

“We have a small consumer that reads from the dead letter topic and sends alerts to Slack when failure rates are abnormal.”
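
A minimal sketch of the dead letter pattern, assuming JSON payloads and using plain lists in place of real topics. The `route` function and the shape of the dead letter entry are illustrative choices, not a Kafka API.

```python
import json


def route(raw_messages):
    """Split raw payloads into parsed events and dead-lettered payloads."""
    processed, dead_letters = [], []
    for raw in raw_messages:
        try:
            processed.append(json.loads(raw))
        except (json.JSONDecodeError, UnicodeDecodeError) as err:
            # Keep the original bytes and the reason so the message can be inspected later.
            dead_letters.append({"payload": raw, "error": str(err)})
    return processed, dead_letters


processed, dead_letters = route([b'{"order_id": 1}', b"not json", b'{"order_id": 2}'])
```

The bad payload is set aside and the two valid orders are still processed, so one malformed message does not block the rest of the stream.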


Schema and Serialisation

Schema Registry — a centralised service (commonly Confluent Schema Registry) that stores and versions the schemas used to serialise and deserialise Kafka messages. It enforces compatibility rules so that producers and consumers stay in sync.

“Before merging, check that your schema change is backward-compatible with the Schema Registry — we cannot break existing consumers.”

“The Schema Registry rejected the new field because it had no default value and we are running in FULL compatibility mode.”
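
To see why a new field without a default is rejected in FULL mode, here is a heavily simplified compatibility check. It only looks at field names and whether a default exists, and it ignores type changes, aliases, and transitive modes; `can_read` and `is_compatible` are hypothetical helpers, not Schema Registry calls.

```python
def can_read(reader_fields, writer_fields):
    """True if every reader field exists in the writer schema or has a default."""
    return all(
        name in writer_fields or has_default
        for name, has_default in reader_fields.items()
    )


def is_compatible(old, new, mode):
    """Check a schema change under a simplified BACKWARD / FORWARD / FULL rule."""
    backward = can_read(new, old)  # new schema reads data written with the old one
    forward = can_read(old, new)   # old schema reads data written with the new one
    if mode == "BACKWARD":
        return backward
    if mode == "FORWARD":
        return forward
    if mode == "FULL":
        return backward and forward
    raise ValueError(f"unknown mode: {mode}")


# Schemas as {field name: has a default value?}
old_schema = {"id": False, "email": False}
new_without_default = {"id": False, "email": False, "tier": False}
new_with_default = {"id": False, "email": False, "tier": True}
```

In this model `is_compatible(old_schema, new_without_default, "FULL")` is false because a reader using the new schema has no value for `tier` in old records, while giving `tier` a default makes the change pass.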

Avro / Protobuf schema — two popular binary serialisation formats used with Kafka. Avro uses a JSON-based schema definition and is common in the Confluent ecosystem; Protobuf (Protocol Buffers) is Google’s binary format, known for strong typing and broad language support.

“We chose Avro for its tight integration with the Schema Registry and its compact binary encoding.”

“The mobile team prefers Protobuf because it generates strongly-typed classes in both Swift and Kotlin from the same schema.”

idempotent producer — a producer configured so that retries do not result in duplicate messages. Kafka assigns each producer a unique ID and sequence numbers to detect and discard duplicates on the broker side.

“Enable the idempotent producer setting — it costs almost nothing and prevents duplicate writes during transient network errors.”
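
The broker-side duplicate detection described above can be modelled in a few lines. This toy `PartitionLog` is an assumption-laden simplification: it remembers only the last sequence number per producer ID and silently drops anything at or below it, whereas a real broker also rejects out-of-order sequences.

```python
class PartitionLog:
    """Toy partition that discards retried writes it has already stored."""

    def __init__(self):
        self.records = []
        self.last_sequence = {}  # producer id -> last sequence number appended

    def append(self, producer_id, sequence, value):
        """Append the record unless this sequence was already written; return True if appended."""
        if sequence <= self.last_sequence.get(producer_id, -1):
            return False  # duplicate caused by a retry
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = PartitionLog()
results = [
    log.append(producer_id=7, sequence=0, value="charge-1"),
    log.append(producer_id=7, sequence=1, value="charge-2"),
    log.append(producer_id=7, sequence=1, value="charge-2"),  # retry after a timeout
]
```

The retried write is acknowledged as already stored and the log keeps a single copy of `charge-2`.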

exactly-once semantics — a delivery guarantee that ensures each message is processed exactly one time, even in the presence of failures and retries. Kafka achieves this through a combination of idempotent producers, transactions, and consumer offset management.

“For the billing pipeline we need exactly-once semantics — duplicate charges are unacceptable even if a broker crashes mid-transaction.”

“Exactly-once comes with a performance overhead, so we only enable it for financial events, not for analytics topics.”


Kafka Streams and Stream Processing

Kafka Streams — a Java library (part of the Apache Kafka project) for building stateful stream processing applications that read from and write back to Kafka topics without requiring a separate processing cluster.

“We replaced our Flink job with Kafka Streams because we wanted to avoid operating a separate cluster — it runs inside our existing Spring Boot service.”

KStream — a Kafka Streams abstraction representing a continuous, unbounded stream of records. Each record is treated as an independent event.

“We use a KStream for raw click events because every click is a distinct, stateless fact.”

KTable — a Kafka Streams abstraction representing a changelog stream, where each record is an upsert (insert or update) keyed by a unique identifier. A KTable models the latest state of each key.

“The user profile data is modelled as a KTable — when a user updates their email, the new record replaces the old one.”

“We join the order KStream against the customer KTable to enrich each order with the customer’s current tier.”
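
The difference between the two abstractions, and the join between them, can be pictured with a dictionary and a list. This is plain Python rather than the Kafka Streams DSL (which is a Java library); `upsert` and `enrich` are hypothetical helpers, and the join shown behaves like an inner join that drops orders with no matching customer.

```python
def upsert(table, key, value):
    """KTable-style update: the newest record for a key replaces the previous one."""
    table[key] = value


def enrich(order, customer_table):
    """KStream-KTable style join: look up the customer's current state for each order."""
    customer = customer_table.get(order["customer_id"])
    if customer is None:
        return None  # no matching key in the table, so the order is dropped
    return {**order, "tier": customer["tier"]}


customers = {}
upsert(customers, "c1", {"tier": "silver"})
first = enrich({"order_id": 1, "customer_id": "c1"}, customers)

upsert(customers, "c1", {"tier": "gold"})  # the customer is upgraded
second = enrich({"order_id": 2, "customer_id": "c1"}, customers)

unmatched = enrich({"order_id": 3, "customer_id": "c9"}, customers)
```

Each order is an independent event, while the table only ever holds the latest tier per customer, so the same customer yields different enrichment before and after the update.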

windowing — the technique of grouping a stream of events into finite time buckets (windows) for aggregation. Common window types include tumbling (fixed, non-overlapping), hopping (fixed size, overlapping), and session (activity-based gaps).

“We use a five-minute tumbling window to count page views per user for our real-time dashboard.”

“Session windowing groups all actions within a user’s visit, closing the window after 30 minutes of inactivity.”
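
A tumbling window count can be sketched by rounding each event timestamp down to the start of its window. The event data and the `tumbling_counts` helper are made up for illustration; timestamps are in milliseconds.

```python
from collections import Counter

FIVE_MINUTES_MS = 5 * 60 * 1000


def tumbling_counts(events, window_ms):
    """Count events per (user, window start) using fixed, non-overlapping windows."""
    counts = Counter()
    for user, timestamp_ms in events:
        window_start = timestamp_ms - timestamp_ms % window_ms
        counts[(user, window_start)] += 1
    return dict(counts)


events = [
    ("alice", 10_000),
    ("alice", 200_000),
    ("bob", 299_999),    # last millisecond of the first window
    ("alice", 300_000),  # first millisecond of the second window
    ("alice", 310_000),
]
page_views = tumbling_counts(events, FIVE_MINUTES_MS)
```

Every event belongs to exactly one window here. A hopping window would place an event in several overlapping windows, and a session window would derive its boundaries from gaps in activity instead of the clock.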

state store — a local, persistent key-value store maintained by a Kafka Streams task to support stateful operations like joins and aggregations. State stores are backed up to a compacted changelog topic.

“The aggregation is backed by a RocksDB state store — make sure the instance has enough disk space during recovery.”
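
The relationship between a state store and its changelog can be sketched as follows. This is a toy in-memory model, not RocksDB or the Kafka Streams API: `StateStore`, `compact`, and `restore` are hypothetical names, and a list of key-value pairs stands in for the changelog topic.

```python
class StateStore:
    """Toy key-value store that records every write in a changelog."""

    def __init__(self):
        self.data = {}
        self.changelog = []  # stands in for the changelog topic

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))

    def get(self, key):
        return self.data.get(key)


def compact(changelog):
    """Keep only the latest value for each key, as log compaction eventually does."""
    latest = {}
    for key, value in changelog:
        latest[key] = value
    return list(latest.items())


def restore(changelog):
    """Rebuild a store on a fresh instance by replaying the changelog."""
    store = StateStore()
    for key, value in changelog:
        store.put(key, value)
    return store


store = StateStore()
store.put("alice", 1)
store.put("bob", 5)
store.put("alice", 2)

compacted = compact(store.changelog)
recovered = restore(compacted)
```

Replaying the compacted changelog yields the same state with fewer records to read, which is the reason recovery time depends on how much state there is rather than on how many updates were ever made.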

interactive queries — a Kafka Streams feature that lets you query the state stores of a running application directly, turning your stream processor into a queryable microservice.

“We expose the running totals via interactive queries so the dashboard can fetch real-time counts without hitting the database.”


How to Use These in Conversation

Understanding these terms is only half the battle — using them naturally in daily communication is what marks you as a fluent member of a Kafka-focused engineering team. Here are some typical scenarios:

Scenario 1 — Architecture discussion: Your team is deciding how to split event types. You might say: “I’d recommend separate topics for order-created and order-updated events. That way each consumer group can subscribe to only the events it cares about, and we avoid unnecessary processing overhead.”

Scenario 2 — Incident investigation: A downstream service is sending alerts. You could say: “Consumer lag on the payment topic has reached 200k. Let’s check whether a rebalancing loop is the cause — I noticed the pod count dropped briefly during the deploy.”

Scenario 3 — Code review: Reviewing a colleague’s producer configuration, you comment: “You should enable the idempotent producer flag here. Without it, a retry after a network timeout could result in duplicate records. And given this is a financial flow, we really should consider exactly-once semantics end-to-end.”

Scenario 4 — Onboarding a new team member: Explaining the data model, you say: “We model the current state of each user’s subscription as a KTable, and we join it against the incoming event KStream to decide whether to grant or deny access. Any event we cannot deserialise goes to the dead letter topic for manual review.”


Quick Reference

| Term | What it is | Why it matters |
|---|---|---|
| topic | Named message channel | Organises events by type |
| partition | Ordered sub-division of a topic | Enables parallel consumption |
| offset | Position of a record in a partition | Tracks consumer progress |
| consumer group | Group of consumers sharing partition load | Horizontal scaling |
| lag | Gap between latest and consumed offset | Key health indicator |
| Schema Registry | Centralised schema versioning service | Prevents producer/consumer mismatch |
| KTable | Changelog stream (latest value per key) | Models current state |
| exactly-once semantics | No duplicates, no data loss guarantee | Critical for financial pipelines |
| dead letter topic | Holding area for unprocessable messages | Prevents pipeline blockage |
| rebalancing | Redistribution of partitions in a group | Triggered by membership changes |

Mastering this vocabulary will help you contribute more confidently to architecture decisions, write clearer commit messages and runbooks, and communicate precisely during incidents. The next step is to practise using these terms in context — code reviews, design documents, and team standups are all excellent opportunities to reinforce what you have learnt.