English for Citus Developers

Master the English vocabulary developers need for Citus's distributed Postgres model, shard keys, and co-location when scaling out relational workloads.

Citus turns PostgreSQL into a distributed database by sharding tables across worker nodes while keeping full SQL compatibility, but that only works well if a team understands terms like “distribution column,” “co-location,” and “reference table.” Getting the distribution column wrong at table creation is one of the most expensive mistakes to fix later, since it usually means rebuilding the table. This guide covers the English used when discussing Citus with a team.

Key Vocabulary

Distribution column (shard key) — the column Citus uses to decide which shard a row belongs to, chosen once when a table is distributed; nearly every query’s performance depends on whether it filters or joins on this column. “We distributed by tenant_id, so any query filtering by tenant stays on a single shard — but a query that ignores it has to fan out across every worker.”

Co-location — arranging related distributed tables so rows that are frequently joined together (like an order and its line items) land on the same shard, letting Citus execute the join locally instead of shuffling data across the network. “Since orders and order_items are co-located on tenant_id, this join runs entirely on one worker — if they weren’t co-located, Citus would have to redistribute data first.”

Reference table — a small table (like a country list or a status enum) replicated in full to every worker node, so it can be joined locally against any distributed table without a network shuffle. “Make the currencies table a reference table — it’s tiny, rarely changes, and every worker needs to join against it constantly.”

Shard rebalancing — the operation that moves shards between worker nodes to even out load, typically run after adding or removing a worker or when data growth becomes skewed. “After adding two new workers, we need to run a shard rebalance — otherwise the new nodes sit idle while the old ones stay just as loaded as before.”

Router query (vs. distributed query) — a query that Citus can route entirely to a single worker because it filters on the distribution column, versus a distributed query that has to touch multiple shards and aggregate results at the coordinator. “This became a router query the moment we added the tenant filter — before that, it was fanning out to all thirty-two shards for every request.”

Common Phrases

  • “What’s the distribution column on this table, and does this query actually filter on it?”
  • “Are these two tables co-located, or is this join going to trigger a shard shuffle?”
  • “Is this small, rarely-changing table a good candidate for a reference table?”
  • “Do we need a shard rebalance after this scaling change?”
  • “Is this a router query, or is it fanning out across every shard?”

Example Sentences

Reviewing a pull request: “This query joins two distributed tables without filtering on the distribution column — that’s going to become a full distributed join across every shard rather than a router query.”

Explaining a design decision: “We made plans a reference table instead of distributing it, since every tenant-scoped query needs to join against it and it barely ever changes.”

Describing an incident: “Latency spiked after we onboarded a large new tenant, because their data landed disproportionately on one shard — a rebalance combined with a better distribution key fixed it.”

Professional Tips

  • Say “distribution column” precisely, not just “the sharding column” — it’s the term the Citus docs and community use, and it’s the single decision with the biggest long-term consequences.
  • When reviewing a join between two distributed tables, ask “are these co-located?” before assuming the query will perform well — an uncolocated join is far more expensive than it looks.
  • Use “reference table” deliberately for small, frequently-joined lookup data — it’s a specific Citus feature, not just “a table we didn’t bother distributing.”
  • Distinguish router queries from distributed queries when discussing performance — a query becoming a router query is usually the single biggest latency win available.

Practice Exercise

  1. Explain in two sentences why the distribution column choice matters so much at table creation time.
  2. Write a one-sentence code review comment recommending a small lookup table become a reference table.
  3. Describe, in your own words, the difference between a router query and a distributed query.