Learn vocabulary for log-based architecture, immutable event logs, Kafka log compaction, consumer group lag, offset management, and replay semantics.
1 / 5
What is a 'log-based architecture' in streaming vocabulary?
Log-based architecture (described in Martin Kleppmann's 'Designing Data-Intensive Applications'): the log is the system of record. Each Kafka topic partition is a durable, ordered, append-only log; ordering is guaranteed within a partition, not across a whole topic. Services publish facts ('OrderPlaced', 'PaymentProcessed') to the log; consumers derive views (databases, caches, search indexes) by reading and processing it. The log becomes the integration backbone: any consumer can replay from the earliest retained offset (offset 0 if nothing has been deleted) to rebuild state.
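A minimal sketch of the idea, not Kafka's API: an in-memory list stands in for a single partition, and a derived view is rebuilt purely by reading the log. The names EventLog and build_order_status_view, and the payload shape, are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Toy stand-in for one topic partition: an append-only list of events."""
    events: list = field(default_factory=list)

    def append(self, event_type, payload):
        # Records are only ever added at the end; the index is the offset.
        self.events.append((event_type, payload))
        return len(self.events) - 1

    def read_from(self, offset=0):
        # Yield (offset, event_type, payload) in log order.
        for i in range(offset, len(self.events)):
            event_type, payload = self.events[i]
            yield i, event_type, payload


def build_order_status_view(log):
    """Derive a view (order_id -> status) by replaying the log from offset 0."""
    view = {}
    for _offset, event_type, payload in log.read_from(0):
        if event_type == "OrderPlaced":
            view[payload["order_id"]] = "placed"
        elif event_type == "PaymentProcessed":
            view[payload["order_id"]] = "paid"
    return view


log = EventLog()
log.append("OrderPlaced", {"order_id": "o-1"})
log.append("OrderPlaced", {"order_id": "o-2"})
log.append("PaymentProcessed", {"order_id": "o-1"})

view = build_order_status_view(log)
print(view)  # {'o-1': 'paid', 'o-2': 'placed'}
```

The view holds no information that is not in the log, so it can be thrown away and rebuilt at any time.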
2 / 5
What is 'log compaction' in Kafka vocabulary?
Log compaction: a retention policy (cleanup.policy=compact) that keeps at least the latest record for each key instead of deleting by age. Compare with time-based retention (cleanup.policy=delete), which deletes old messages after N days and suits event streams. Compaction suits changelog topics (e.g., a KTable changelog). After compaction, the log still contains the current value of every key but not the full history per key; a record with a null value (a tombstone) marks its key for deletion. Kafka Streams uses compacted changelog topics to restore local state stores after failures.
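The effect of compaction can be sketched on a plain list of records. This is a simplified model under stated assumptions, not how the broker implements it: the compact function and the sample keys are hypothetical, a None value stands in for a Kafka tombstone (a delete marker), and the sketch drops tombstones immediately, whereas a real broker keeps them for a configurable period first.

```python
def compact(records):
    """Keep only the latest record per key, in offset order.

    records: list of (offset, key, value) tuples in log order.
    A value of None models a tombstone and removes the key entirely.
    """
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = [
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None
    ]
    return sorted(survivors)  # offsets are unique, so this sorts by offset


records = [
    (0, "user-1", "a@example.com"),
    (1, "user-2", "b@example.com"),
    (2, "user-1", "a2@example.com"),  # supersedes offset 0
    (3, "user-2", None),              # tombstone: deletes user-2
    (4, "user-3", "c@example.com"),
]

compacted = compact(records)
print(compacted)
# [(2, 'user-1', 'a2@example.com'), (4, 'user-3', 'c@example.com')]
```

Surviving records keep their original offsets, so the compacted log has gaps in its offset sequence.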
3 / 5
What is 'consumer group lag' in Kafka vocabulary?
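Consumer group lag (also called consumer lag): the gap between the latest offset written to a partition (the log end offset) and the offset the consumer group has committed. If the log end offset is 10,000 and the group's committed offset is 9,500, the lag is 500 messages; a group's total lag is the sum across its partitions. Lag that is high and still growing indicates consumers can't keep up with producers, a sign of a processing bottleneck or consumer failure. Monitor lag with kafka-consumer-groups --bootstrap-server <host:port> --describe --group <group>, or tools like Burrow or Confluent Control Center. Lag is a key operational metric.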
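The arithmetic above, as a small sketch. The function names and the sample offsets for partitions 1 and 2 are assumptions for illustration; treating a partition with no committed offset as starting from 0 is also an assumption of this sketch.

```python
def partition_lag(log_end_offset, committed_offset):
    """Lag for one partition: messages written but not yet committed as read."""
    return max(log_end_offset - committed_offset, 0)


def group_lag(log_end_offsets, committed_offsets):
    """Total lag for a consumer group: the sum of per-partition lags.

    Both arguments map partition number -> offset. A partition with no
    committed offset is counted from 0 (an assumption of this sketch).
    """
    return sum(
        partition_lag(end, committed_offsets.get(partition, 0))
        for partition, end in log_end_offsets.items()
    )


log_end_offsets = {0: 10_000, 1: 4_200, 2: 300}
committed_offsets = {0: 9_500, 1: 4_200, 2: 250}

print(partition_lag(10_000, 9_500))                    # 500
print(group_lag(log_end_offsets, committed_offsets))   # 550
```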
4 / 5
What is 'offset management' in Kafka consumer vocabulary?
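Offset management: each message in a partition has a sequential offset number. Consumers commit offsets to Kafka (internal __consumer_offsets topic) to record progress; the committed offset is the position of the next message to read. Auto-commit (enable.auto.commit=true): simple, but it commits on a timer (auto.commit.interval.ms) independent of processing, so a crash can cause duplicates or, if processing runs behind polling, lost messages. Manual commit after successful processing: at-least-once (duplicates possible after a restart). Commit before processing: at-most-once (messages can be lost on crash). Exactly-once: use Kafka transactions with an idempotent producer for read-process-write pipelines, or make consumer-side processing idempotent so redelivered messages have no extra effect.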
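Why commit order decides the delivery guarantee can be seen in a toy simulation. This is a sketch under stated assumptions, not Kafka client code: the log is a Python list, the committed offset is a plain integer, and the consume and run_with_restart helpers and the Crash exception are hypothetical names. The consumer crashes once between the two steps (commit and process), then restarts from its last committed offset.

```python
class Crash(Exception):
    """Simulated consumer failure."""


def consume(log, committed, processed, commit_before_processing, crash_on=None):
    """Consume from the committed offset; return the new committed offset.

    crash_on: offset at which the consumer dies between its two steps.
    """
    offset = committed
    try:
        while offset < len(log):
            if commit_before_processing:
                committed = offset + 1         # commit first ...
                if offset == crash_on:
                    raise Crash()
                processed.append(log[offset])  # ... then process
            else:
                processed.append(log[offset])  # process first ...
                if offset == crash_on:
                    raise Crash()
                committed = offset + 1         # ... then commit
            offset += 1
    except Crash:
        pass
    return committed


def run_with_restart(log, commit_before_processing, crash_on):
    processed = []
    committed = consume(log, 0, processed, commit_before_processing, crash_on)
    consume(log, committed, processed, commit_before_processing)  # restart
    return processed


log = ["a", "b", "c"]
at_most_once = run_with_restart(log, commit_before_processing=True, crash_on=1)
at_least_once = run_with_restart(log, commit_before_processing=False, crash_on=1)

print(at_most_once)   # ['a', 'c']            'b' was committed but never processed
print(at_least_once)  # ['a', 'b', 'b', 'c']  'b' was processed twice
```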
5 / 5
What is 'replay semantics' in event sourcing and streaming vocabulary?
Replay is a superpower of log-based architectures: because Kafka retains events (by time or log compaction), a consumer can rewind to an earlier offset and reprocess them. Replay reaches only as far back as what is retained: the entire history from offset 0 if nothing has expired, or only the latest value per key on a compacted topic. Use cases: (1) Deploy a new service that needs historical data: replay from the beginning. (2) Fix a bug in a consumer: replay to correct derived state. (3) Build a new read model (search index, cache): populate it from the log. Replay is what lets the event log serve as the single source of truth.
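Use case (2) can be sketched as follows: derived state computed by a buggy handler is discarded and rebuilt by replaying the same events through a fixed handler. This is an illustrative model only; the 'OrderRefunded' event, the amounts, and the function names are assumptions, and a list stands in for the retained log.

```python
# Retained events as (event_type, amount); the list index plays the role of the offset.
events = [
    ("OrderPlaced", 30),
    ("OrderPlaced", 20),
    ("OrderRefunded", 20),
]


def project(events, handle, start_offset=0):
    """Fold events into derived state, starting at start_offset."""
    state = 0
    for event in events[start_offset:]:
        state = handle(state, event)
    return state


def buggy_handler(total, event):
    _kind, amount = event
    return total + amount  # bug: counts refunds as revenue


def fixed_handler(total, event):
    kind, amount = event
    if kind == "OrderRefunded":
        return total - amount
    return total + amount


revenue_before_fix = project(events, buggy_handler)
# After deploying the fix, rewind to offset 0 and rebuild the state.
revenue_after_replay = project(events, fixed_handler, start_offset=0)

print(revenue_before_fix)    # 70
print(revenue_after_replay)  # 30
```

The log itself is never modified; only the derived state is recomputed, which is why the fix can be applied retroactively.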