English for Apache Pulsar Messaging

Learn the English vocabulary for Apache Pulsar: topics, subscriptions, tiered storage, and multi-tenancy.

Pulsar discussions often need to clarify one distinction early: unlike some messaging systems, a single topic in Pulsar can serve several independent subscriptions at once, each with its own subscription type, so “who’s consuming this and how” is rarely a one-word answer.

Key Vocabulary

Subscription type (exclusive, shared, failover, key-shared) — the mode that determines how messages on a topic are distributed among the consumers attached to a subscription. “We switched this subscription from shared to key-shared, since we needed all messages for the same order ID to land on the same consumer, in order.”
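To make the shared versus key-shared distinction concrete, here is a minimal simulation in plain Python. It does not use the Pulsar client; the function names, consumer names, and order IDs are invented for illustration, and the modulo hash is only a stand-in for however a real broker assigns keys to consumers.

```python
import zlib
from collections import defaultdict
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Shared: hand messages out round-robin, ignoring the key."""
    assignment = defaultdict(list)
    for consumer, message in zip(cycle(consumers), messages):
        assignment[consumer].append(message)
    return dict(assignment)


def dispatch_key_shared(messages, consumers):
    """Key-shared: a stable hash of the key picks the consumer,
    so every message with the same key goes to the same place."""
    assignment = defaultdict(list)
    for key, payload in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assignment[consumers[index]].append((key, payload))
    return dict(assignment)


# Hypothetical order events as (key, payload) pairs, in publish order.
messages = [
    ("order-1", "created"), ("order-2", "created"), ("order-3", "created"),
    ("order-1", "paid"), ("order-2", "paid"), ("order-1", "shipped"),
]
consumers = ["consumer-a", "consumer-b"]

shared = dispatch_shared(messages, consumers)
key_shared = dispatch_key_shared(messages, consumers)
```

In the shared result, events for one order can end up on different consumers. In the key-shared result, each order ID has exactly one owner, and that owner sees the order's events in publish order.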

Tiered storage — Pulsar’s ability to automatically offload older message segments from the cluster’s local disks to cheaper, long-term object storage, without consumers needing to know where the data physically lives. “We don’t need a separate archival pipeline — tiered storage moves the older segments to object storage automatically once they age out of the hot tier.”
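The idea of offloading can be sketched as a small model. This is not Pulsar's implementation: the Segment class, the size-based threshold, and the rule that the newest segment stays local because it is still being written are all simplifying assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: int
    size_bytes: int
    tier: str = "local"  # "local" or "object-storage"


def offload_oldest(segments, local_threshold_bytes):
    """Move the oldest segments to object storage until the data left on
    local disk fits under the threshold. The last segment is treated as
    still open for writes and is never moved (a modelling assumption)."""
    local_bytes = sum(s.size_bytes for s in segments if s.tier == "local")
    for segment in segments[:-1]:
        if local_bytes <= local_threshold_bytes:
            break
        if segment.tier == "local":
            segment.tier = "object-storage"
            local_bytes -= segment.size_bytes
    return segments


def read_all(segments):
    """Consumers read segments in order, whichever tier they sit on."""
    return [s.segment_id for s in segments]


# Four hypothetical segments of 100 bytes each, oldest first.
segments = [Segment(i, 100) for i in range(4)]
offload_oldest(segments, local_threshold_bytes=250)
tiers = [s.tier for s in segments]
```

The point of read_all is the vocabulary in the definition above: the read path looks the same to the consumer whether a segment is still in the hot tier or has already moved.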

Tenant / namespace — Pulsar’s built-in multi-tenancy model, where a tenant contains namespaces, and namespaces contain topics, letting many teams share one cluster with isolated configuration and quotas. “Instead of standing up a separate cluster for each team, we gave each one its own tenant, with its own retention and quota policies configured at the namespace level.”
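The hierarchy shows up directly in full topic names, which take the form persistent://tenant/namespace/topic (or non-persistent://...). The parser below is a small illustrative helper, not part of any Pulsar library, and the tenant, namespace, and topic in the example are made up.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(full_name: str) -> TopicName:
    """Split 'persistent://tenant/namespace/topic' into its parts."""
    domain, separator, rest = full_name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {full_name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)


# Hypothetical topic owned by a "payments" team.
name = parse_topic_name("persistent://payments/orders/order-events")
```

Reading a topic name aloud in this order (tenant, then namespace, then topic) is also a quick way to say who owns a topic and which policies apply to it.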

BookKeeper (bookie) — the underlying distributed log storage layer that Pulsar brokers write to, responsible for durability and replication, separate from the brokers that handle client connections; a bookie is a single storage node in a BookKeeper cluster. “The brokers themselves were fine — the actual write latency spike traced back to a slow bookie in the BookKeeper cluster.”
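Why one slow bookie can matter is easier to explain with a tiny model. BookKeeper sends each entry to several bookies and treats the write as complete once a configured number of them (the ack quorum) have acknowledged it. The function name and the latency figures below are invented, and the model ignores everything except that rule.

```python
def write_latency_ms(bookie_latencies_ms, ack_quorum):
    """Latency of one entry write: the time until `ack_quorum` of the
    bookies holding a copy have acknowledged it."""
    if not 1 <= ack_quorum <= len(bookie_latencies_ms):
        raise ValueError("ack_quorum must be between 1 and the number of bookies")
    return sorted(bookie_latencies_ms)[ack_quorum - 1]


# Three hypothetical bookies; the third has degraded disk I/O.
latencies = [2, 3, 50]
print(write_latency_ms(latencies, ack_quorum=2))  # 3
print(write_latency_ms(latencies, ack_quorum=3))  # 50
```

With an ack quorum of two, the slow bookie is hidden from this particular write; when every copy must be acknowledged, the slowest bookie sets the latency.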

Backlog — the accumulated set of unacknowledged messages waiting for a subscription to consume, a key metric for spotting a consumer that’s falling behind. “The alert wasn’t about the producer at all — it was the backlog on one subscription growing steadily, meaning the consumers on that subscription had stalled.”
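A growing backlog can mean two different things: a consumer that is slow, or one that has stopped. The sketch below separates the two from a series of readings. The sample format, the function name, and the three labels are assumptions made for this example, not Pulsar metrics or terminology.

```python
def classify_backlog(samples):
    """samples: (backlog_size, messages_acked_since_previous_sample)
    tuples, oldest first. Returns 'keeping up', 'slow', or 'stalled'."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to see a trend")
    growing = samples[-1][0] > samples[0][0]
    acked_recently = sum(acked for _, acked in samples[1:])
    if not growing:
        return "keeping up"
    # Backlog is growing: still acknowledging means slow, silence means stalled.
    return "slow" if acked_recently > 0 else "stalled"


# Hypothetical readings taken a minute apart.
slow_consumer = [(100, 50), (150, 40), (220, 30)]
stalled_consumer = [(100, 50), (180, 0), (260, 0)]
healthy_consumer = [(100, 50), (90, 60)]
```

In conversation, this is the difference between “the consumer can’t keep up with the publish rate” and “the consumer has stopped acknowledging messages altogether.”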

Common Phrases

  • “Which subscription type do these consumers actually need — shared, or key-shared for ordering guarantees?”
  • “Is this data still on local disk, or has it already moved to tiered storage?”
  • “Should this be its own tenant, or can it share a namespace with the existing team’s topics?”
  • “Is this a broker problem, or is the actual bottleneck down in BookKeeper?”
  • “Is the backlog growing because the consumer is slow, or because it’s actually stopped processing?”

Example Sentences

Explaining a subscription choice in review: “We need per-key ordering for these events, so key-shared is the right subscription type here, not plain shared.”

Describing a cost optimization: “We enabled tiered storage on this namespace so months-old data moves to object storage automatically instead of eating expensive local disk.”

Diagnosing a latency issue: “The brokers reported healthy, so we checked BookKeeper directly and found one bookie with degraded disk I/O dragging down write latency.”

Professional Tips

  • Name the exact subscription type when discussing message ordering or distribution — “consumers aren’t getting messages in order” often traces back to the wrong subscription mode.
  • Mention tiered storage explicitly when discussing retention costs — it changes the cost conversation from “how long can we afford to keep this” to “how much hot-tier disk do we actually need.”
  • Use tenant and namespace boundaries deliberately when onboarding a new team — it’s the mechanism for isolating quotas and configuration, not just an organizational label.
  • Separate broker and BookKeeper issues explicitly when diagnosing latency — they’re different layers with different failure modes.

Practice Exercise

  1. Explain the difference between a shared and a key-shared subscription.
  2. Describe what tiered storage does and why it matters for cost.
  3. Write a sentence explaining the difference between a broker and a bookie.