English for ScyllaDB Developers
Master the English vocabulary developers need for ScyllaDB's partition model, shard-per-core architecture, and tombstones when discussing high-throughput NoSQL.
ScyllaDB is a wide-column NoSQL database compatible with Cassandra’s data model but rewritten for a shard-per-core architecture that changes how teams reason about performance. Vocabulary like “partition key,” “hot partition,” “tombstone,” and “shard-per-core” carries specific operational weight — misusing them leads to designs that look fine locally and fall over under production traffic. This guide covers the English used when discussing ScyllaDB with a team.
Key Vocabulary
Partition key — the part of the primary key that determines which node (and shard) a row lives on; all rows sharing a partition key are stored together and read together efficiently. “Querying by user ID works well because it’s the partition key — querying by email would mean scanning every partition, since email isn’t part of it.”
Hot partition — a partition receiving disproportionately more reads or writes than others, often from a poorly chosen partition key (like a single popular tenant ID), overloading one node while others sit idle. “This design puts every event for a given day under one partition key — on a high-traffic day, that’s a hot partition, and one shard ends up doing all the work.”
Shard-per-core — ScyllaDB’s architecture where each CPU core runs its own independent shard with its own memory and I/O, avoiding cross-core locking, unlike Cassandra’s thread-pool model. “Because of shard-per-core, adding more cores to a node scales throughput close to linearly — there’s no shared lock between cores fighting over the same data.”
Tombstone — a marker written when a row or cell is deleted, kept around until compaction removes it; too many tombstones (from patterns like frequent delete-and-reinsert) slow down reads that have to skip past them. “This queue table deletes and reinserts constantly — that’s a tombstone accumulation pattern, and reads are slowing down because they scan past thousands of dead markers.”
Compaction strategy — the background process (and its configurable strategy — size-tiered, leveled, time-window) that merges SSTables on disk, reclaims space from deleted data, and affects both write and read performance differently. “Switch this time-series table to time-window compaction strategy — size-tiered isn’t designed for data that’s mostly written once and expires on a schedule.”
Common Phrases
- “Is this partition key going to create a hot partition once traffic scales up?”
- “Are we seeing tombstone accumulation on this table, and is that from a delete-heavy access pattern?”
- “Which compaction strategy is configured here, and does it match this table’s write/read pattern?”
- “Does shard-per-core mean we should be thinking about per-shard load rather than per-node load?”
- “Should this really be one partition, or does it need to be split further to spread the load?”
Example Sentences
Reviewing a pull request: “This uses a fixed constant as part of the partition key for all of today’s writes — that’s a textbook hot partition waiting to happen once volume grows.”
Explaining a design decision: “We chose time-window compaction for this table specifically because the data has a strict TTL and is never updated after being written — it matches the access pattern much better than the default.”
Describing an incident: “Read latency on that table crept up for weeks before anyone noticed it was tombstone accumulation — the delete-and-reinsert pattern from the retry logic was quietly poisoning read performance.”
Professional Tips
- Say “partition key” and “hot partition” precisely when reviewing schema — these are the single most common source of production issues in wide-column stores, and naming them clearly speeds up design review.
- When a delete-heavy table shows creeping latency, ask “could this be tombstone accumulation?” before assuming it’s an unrelated performance regression.
- Mention shard-per-core explicitly when discussing capacity planning — it changes the unit of scaling from “add a node” to “think in shards,” which affects how teams size clusters.
- Name the compaction strategy directly when proposing a schema for time-series or append-heavy data — the default strategy is rarely the right one for those workloads.
Practice Exercise
- Explain in two sentences how a poorly chosen partition key can create a hot partition.
- Write a one-sentence code review comment flagging a delete-heavy pattern that risks tombstone accumulation.
- Describe, in your own words, what shard-per-core means and how it differs from a thread-pool model.