English for Apache Cassandra
Learn the English vocabulary for discussing Cassandra's partitioning, consistency levels, and replication when working with a distributed database team.
Cassandra trades some of the guarantees developers expect from relational databases for horizontal scale and availability, and that trade-off shows up constantly in how teams talk about it — consistency levels, partition keys, and replication factor come up in almost every design discussion. Getting comfortable with this vocabulary in English makes it much easier to reason about trade-offs out loud with a team.
Key Vocabulary
Partition key — the part of a row’s primary key that determines which node in the cluster stores the data, making it the single most important design decision in a Cassandra table. “We’re seeing hot partitions because the partition key is just tenant_id, and one large customer is generating way more traffic than the rest.”
Replication factor — the number of copies of each piece of data that Cassandra stores across different nodes, directly trading storage cost for fault tolerance. “With a replication factor of three and quorum reads and writes, we can lose one node entirely and still serve every request without data loss.”
Consistency level — a per-query setting that controls how many replicas must acknowledge a read or write before it’s considered successful, balancing consistency against latency and availability. “We’re using quorum consistency for writes, so as long as two out of three replicas confirm, the client gets an acknowledgment even if the third one is temporarily down.”
Tombstone — a marker Cassandra writes instead of immediately deleting data, since data can live on multiple nodes and deletes need to propagate before being permanently removed. “Query performance degraded because this table has accumulated millions of tombstones from repeated deletes, and every read has to scan past them.”
Compaction — the background process that merges multiple SSTables into fewer, larger ones, reclaiming space from overwritten and deleted data and improving read performance. “We scheduled a manual compaction because reads were slowing down from too many small SSTables piling up after a bulk import.”
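The arithmetic behind the replication factor and quorum examples above can be sketched in a few lines of Python. This is a minimal illustration rather than driver code: the function names are hypothetical, and it assumes a single datacenter where a quorum is a simple majority of replicas (floor(RF / 2) + 1).

```python
def quorum_size(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write (a majority)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum_size(replication_factor)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum_size(rf)}, "
          f"can lose {tolerated_replica_failures(rf)} replica(s)")
```

With a replication factor of three, the quorum is two, which is why the example sentence says two out of three replicas are enough. A replication factor of two tolerates no failures at quorum, since both replicas are needed to form a majority.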
Common Phrases
- “What’s the partition key on this table, and are we at risk of a hot partition?”
- “Which consistency level are we reading and writing at for this workload?”
- “Could this slowdown be caused by tombstone buildup from all those deletes?”
- “Do we need to trigger a manual compaction, or is it keeping up on its own?”
- “Is our replication factor high enough to tolerate losing a node in this datacenter?”
Example Sentences
Diagnosing a hot partition: “All of our traffic is hitting the same few nodes because the partition key doesn’t distribute evenly — we need to add a component like a date bucket to spread it out.”
Explaining a consistency trade-off: “We chose quorum consistency instead of ALL because we’d rather tolerate one slow replica than fail every write when a single node has a blip.”
Reviewing a performance regression: “This table’s read latency spiked after a mass deletion job ran last week — I suspect it’s tombstone buildup, and we should force a compaction to confirm.”
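The date-bucket fix from the hot partition example can be illustrated with a toy model. The sketch below is not how Cassandra assigns tokens: it uses an MD5 hash modulo a hypothetical node count as a stand-in for the real partitioner, and the tenant names and traffic numbers are invented. It only shows how a composite partition key gives one busy tenant many partitions instead of one.

```python
import hashlib
from collections import Counter

NODE_COUNT = 6  # hypothetical cluster size


def node_for(partition_key: tuple) -> int:
    """Toy placement: hash the partition key and map it onto a node."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODE_COUNT


def writes_per_partition(events, key_fn) -> Counter:
    """Count how many writes land in each partition under a given key."""
    return Counter(key_fn(event) for event in events)


# Invented workload: a large tenant writes 1,000 rows a day for 30 days,
# a small tenant writes 10 rows a day over the same period.
days = [f"2024-01-{day:02d}" for day in range(1, 31)]
events = [("big_tenant", day) for day in days for _ in range(1000)]
events += [("small_tenant", day) for day in days for _ in range(10)]

by_tenant = writes_per_partition(events, lambda e: (e[0],))
by_tenant_and_day = writes_per_partition(events, lambda e: (e[0], e[1]))

print("largest partition, tenant_id only:", max(by_tenant.values()))
print("largest partition, tenant_id + day:", max(by_tenant_and_day.values()))
print("nodes holding big_tenant data:",
      len({node_for(k) for k in by_tenant_and_day if k[0] == "big_tenant"}))
```

In this model the largest partition shrinks from 30,000 rows to 1,000 once the day is part of the key. A day bucket bounds partition size, but all of a tenant's writes for the current day still go to one partition, so a very busy tenant may need a finer bucket.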
Professional Tips
- Say hot partition, not “slow node,” when a single partition key is overloading one part of the cluster — it points reviewers straight at the schema design instead of the hardware.
- Always state the consistency level explicitly when discussing a query’s guarantees — “it’s consistent” means nothing in Cassandra without specifying which level you mean.
- Mention tombstones by name when diagnosing read slowdowns on tables with heavy deletes — it’s a distinctly Cassandra-specific cause that a relational-database background won’t anticipate.
- Frame replication factor decisions in terms of the failure you want to tolerate (“survive one node loss per datacenter”) rather than just a number — it makes the trade-off concrete for non-database engineers in the discussion.
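The tip about stating the consistency level explicitly comes down to a commonly cited overlap rule: a read is expected to see the latest acknowledged write only when the read and write replica counts together exceed the replication factor. A minimal sketch, assuming a single datacenter, only the ONE, QUORUM, and ALL levels, and hypothetical helper names:

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a consistency level (single datacenter)."""
    required = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when any read replica set must overlap any write replica set."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for write_level, read_level in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE")]:
    overlap = reads_see_latest_write(write_level, read_level, rf)
    print(f"write={write_level}, read={read_level}: overlap guaranteed = {overlap}")
```

Saying "we write at QUORUM and read at QUORUM" tells a reviewer the overlap holds, while "we write at ONE and read at ONE" tells them stale reads are possible.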
Practice Exercise
- Explain, in one sentence, why a poorly chosen partition key can cause a hot partition even when the cluster has plenty of total capacity.
- Describe the difference between a tombstone and a row that has actually been removed in Cassandra’s storage model.
- Write two sentences comparing quorum consistency and ALL consistency, and when you’d choose each.