English for Apache Pulsar Developers
Master the English vocabulary developers need for Pulsar's multi-tenant namespaces, subscription modes, and tiered storage when discussing messaging architecture.
Apache Pulsar is a distributed messaging and streaming platform built around multi-tenancy and a separation between the messaging layer (brokers) and the storage layer (BookKeeper). Its vocabulary — “tenant,” “namespace,” “subscription mode,” “tiered storage” — trips up teams coming from Kafka, where the model is flatter. This guide covers the English used when discussing Pulsar with a team.
Key Vocabulary
Tenant / namespace — Pulsar’s two-level grouping above the topic: a tenant is typically an organization or business unit, and a namespace within it groups related topics with shared policies (retention, replication). “Give the fraud team their own tenant instead of dumping their topics into the shared namespace — that way their retention policy doesn’t affect anyone else’s.”
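The two-level grouping is visible in every fully qualified topic name, which takes the form persistent://tenant/namespace/topic (or non-persistent://...). The sketch below is a small stand-alone parser, not part of any Pulsar client library; the function name and the fraud/detection/card-events example are made up for illustration.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic_name(full_name: str) -> TopicName:
    """Split a fully qualified Pulsar topic name into its four parts."""
    domain, sep, rest = full_name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {full_name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)


# Hypothetical names, used only for illustration.
name = parse_topic_name("persistent://fraud/detection/card-events")
print(name.tenant, name.namespace, name.topic)
```

Reading a topic name aloud in this order ("tenant fraud, namespace detection, topic card-events") is a quick way to check that everyone in a meeting is talking about the same level.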
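Subscription type (exclusive, shared, failover, key-shared), often called “subscription mode” in conversation — the setting that determines how consumers attached to the same subscription divide a topic’s messages, from a single active consumer to key-based distribution across several consumers. Pulsar’s documentation uses “subscription type” for these four and reserves “subscription mode” for durable versus non-durable subscriptions, so use the official term when precision matters. “Switch this from shared to key-shared: right now messages for the same user can land on two different consumers and be processed out of order, and key-shared keeps each key’s messages in order on a single consumer.”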
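The ordering difference between shared and key-shared can be illustrated with a toy model. This is a deliberate simplification and not Pulsar's dispatcher: shared is modelled as plain round-robin, and key-shared as a CRC32 hash of the key modulo the consumer count (Pulsar itself assigns hash ranges to consumers). The consumer names and messages are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import cycle

CONSUMERS = ["consumer-a", "consumer-b"]  # hypothetical consumer names


def dispatch_shared(messages, consumers):
    """Toy 'shared' dispatch: round-robin, ignoring the message key."""
    assignment = defaultdict(list)
    for (key, payload), consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append((key, payload))
    return dict(assignment)


def dispatch_key_shared(messages, consumers):
    """Toy 'key-shared' dispatch: a stable hash of the key picks the consumer."""
    assignment = defaultdict(list)
    for key, payload in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assignment[consumers[index]].append((key, payload))
    return dict(assignment)


def consumers_per_key(assignment):
    """Map each key to the set of consumers that received its messages."""
    seen = defaultdict(set)
    for consumer, received in assignment.items():
        for key, _ in received:
            seen[key].add(consumer)
    return dict(seen)


messages = [
    ("user-1", "login"), ("user-1", "purchase"),
    ("user-2", "login"), ("user-2", "logout"),
]

print("shared:    ", consumers_per_key(dispatch_shared(messages, CONSUMERS)))
print("key-shared:", consumers_per_key(dispatch_key_shared(messages, CONSUMERS)))
```

In the shared run each user's messages are split across both consumers, which is where out-of-order processing comes from; in the key-shared run every key maps to exactly one consumer.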
Broker vs. bookie — Pulsar’s split architecture: brokers handle routing, dispatch, and client connections, while bookies (BookKeeper nodes) handle the actual durable storage of message data. “Scaling brokers gives us more connection and dispatch capacity, but it doesn’t add storage — for that we need to scale the bookies.”
Tiered storage — the ability to automatically offload older, closed message segments (ledgers) from BookKeeper to cheaper object storage (like S3) while keeping them readable through the normal Pulsar APIs, without a separate archival pipeline. “We don’t need a custom archival job for old segments — tiered storage offloads them to S3 automatically once the topic’s data in BookKeeper passes the configured offload threshold.”
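To make "offload threshold" concrete, here is a toy model that assumes a size-based threshold: when the data a topic keeps in BookKeeper exceeds the threshold, the oldest closed segments are moved out first until the remainder fits. The function name and the numbers are hypothetical, and real offloading involves details this sketch ignores.

```python
def segments_to_offload(segment_sizes, threshold):
    """Return indices of the oldest segments to offload.

    segment_sizes is ordered oldest first, in any consistent unit. The last
    entry is treated as the still-open segment and is never offloaded in
    this toy model.
    """
    remaining = sum(segment_sizes)
    offload = []
    for index, size in enumerate(segment_sizes[:-1]):
        if remaining <= threshold:
            break  # what is left on hot storage now fits under the threshold
        offload.append(index)
        remaining -= size
    return offload


sizes = [400, 300, 200, 100]  # hypothetical segment sizes in MB, oldest first
print(segments_to_offload(sizes, threshold=350))  # prints [0, 1]
```

With 1000 MB on hot storage and a 350 MB threshold, the two oldest segments are offloaded and 300 MB stays in BookKeeper, which is the behaviour a phrase like "past the threshold" refers to.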
Geo-replication — Pulsar’s built-in mechanism for replicating topics across clusters in different regions, usually configured at the namespace level so that every topic in the namespace inherits it, rather than topic by topic. “Enable geo-replication on the namespace, not on each topic individually — new topics created under it inherit the replication policy automatically.”
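The word "inherit" carries most of the meaning here, so a small sketch of policy resolution may help. It is a toy model with made-up cluster and topic names, not Pulsar's implementation; the optional topic-level override is included as an assumption about releases that support topic-level policies.

```python
# Hypothetical namespace-level replication policy, keyed by "tenant/namespace".
namespace_policies = {"payments/ledger": ["us-east", "eu-west"]}

# Hypothetical topic-level overrides; most topics have none.
topic_overrides = {"persistent://payments/ledger/audit-local": ["us-east"]}


def effective_clusters(topic, namespace_policies, topic_overrides):
    """A topic-level setting wins if present; otherwise inherit from the namespace."""
    if topic in topic_overrides:
        return topic_overrides[topic]
    tenant, namespace, _ = topic.split("://", 1)[1].split("/")
    return namespace_policies.get(f"{tenant}/{namespace}", [])


# A newly created topic has no override, so it inherits the namespace policy.
print(effective_clusters(
    "persistent://payments/ledger/refunds", namespace_policies, topic_overrides
))
```

When someone asks "does this topic replicate?", the precise answer is therefore a statement about the namespace policy unless an override is known to exist.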
Common Phrases
- “Should this live in its own tenant, or is it fine sharing a namespace with the other services?”
- “Which subscription mode do we need here — do consumers need strict per-key ordering, or is shared load-balancing fine?”
- “Is this a broker capacity problem or a bookie storage problem?”
- “Have we set a tiered storage threshold for this namespace, or is everything staying on hot storage indefinitely?”
- “Is geo-replication configured at the namespace level, and does every topic under it need to replicate?”
Example Sentences
Reviewing a pull request: “This subscription is in shared mode, but the consumers expect strict per-user ordering — switch to key-shared or you’ll keep seeing out-of-order processing.”
Explaining a design decision: “We split billing and analytics into separate tenants so a retention policy change on one side can never accidentally affect the other.”
Describing an incident: “Storage costs spiked because tiered storage wasn’t enabled on that namespace — every message stayed on expensive hot storage instead of offloading to S3.”
Professional Tips
- Say “tenant” and “namespace” deliberately, not just “topic group” — the two-level model is central to how Pulsar isolates teams and policies.
- When debugging ordering issues, ask “what subscription mode is this consumer on?” — it’s usually the first thing to check before suspecting the producer.
- Distinguish broker and bookie explicitly when discussing scaling — conflating them leads to scaling the wrong tier and not fixing the bottleneck.
- Mention tiered storage thresholds specifically when proposing cost reductions — tiered storage is Pulsar’s built-in answer to “how do we cheaply keep old data queryable.”
Practice Exercise
- Explain in two sentences the difference between a tenant and a namespace in Pulsar.
- Write a one-sentence code review comment recommending a subscription mode change to fix ordering.
- Describe, in your own words, what tiered storage does and why it matters for cost.