English for StarRocks Developers
Learn the English vocabulary for StarRocks, the high-performance analytical database: materialized views, colocation, and compaction.
StarRocks conversations tend to center on one question: is the slowness a query-design problem or a data-layout problem? The vocabulary below exists mostly to let a team answer that question precisely instead of guessing.
Key Vocabulary
Materialized view — a precomputed, automatically maintained result of a query, stored so that subsequent queries can read from it instead of recomputing the aggregation each time. “Instead of running this expensive daily aggregation on every dashboard load, we built a materialized view that refreshes incrementally and serves the same result instantly.”
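To make the term concrete, the idea can be modeled in a few lines of Python. This is a toy sketch, not how StarRocks implements materialized views: `ToyMaterializedView`, `live_query`, and the sample rows are all made up for illustration. It shows a stored per-day total that is updated from new rows only, so a read never rescans the base table.

```python
from collections import defaultdict


class ToyMaterializedView:
    """Hypothetical model: a stored SUM(amount) per day, kept current incrementally."""

    def __init__(self):
        self.totals = defaultdict(int)

    def refresh(self, new_rows):
        # Apply only the newly ingested rows instead of recomputing everything.
        for day, amount in new_rows:
            self.totals[day] += amount

    def query(self, day):
        # A dictionary lookup stands in for reading the precomputed result.
        return self.totals.get(day, 0)


def live_query(rows, day):
    # The "expensive" path: scan every base row on each request.
    return sum(amount for d, amount in rows if d == day)


base_table = [("2024-01-01", 10), ("2024-01-01", 5), ("2024-01-02", 7)]
mv = ToyMaterializedView()
mv.refresh(base_table)

new_rows = [("2024-01-02", 3)]
base_table.extend(new_rows)
mv.refresh(new_rows)

# Both paths return the same total; only the work done per request differs.
print(mv.query("2024-01-02"), live_query(base_table, "2024-01-02"))
```

A useful sentence to pair with it: “The view and the live query return the same number, but the view does the work once at refresh time instead of on every request.”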
Colocation group — a configuration that places the matching buckets of related tables on the same nodes, so joining them on the bucketing key doesn’t require shuffling data across the network. “This join used to be slow because the two tables lived on different nodes — putting them in the same colocation group turned it into a local join.”
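The effect of colocation can be sketched with a small simulation. Everything here is an assumption for illustration: the node names, the `crc32`-based bucketing, and the placement maps are hypothetical and do not reflect StarRocks internals. The sketch counts how many join keys have their left and right rows on different nodes, which is the data a join would have to move.

```python
import zlib


def bucket_of(key, num_buckets):
    # Deterministic stand-in for hashing the bucketing key.
    return zlib.crc32(str(key).encode()) % num_buckets


def keys_crossing_network(keys, num_buckets, placement_left, placement_right):
    """Count join keys whose left and right rows live on different nodes."""
    crossing = 0
    for key in keys:
        bucket = bucket_of(key, num_buckets)
        if placement_left[bucket] != placement_right[bucket]:
            crossing += 1
    return crossing


keys = range(1000)

# Hypothetical bucket -> node maps for two tables with four buckets each.
orders_placement = {0: "node-a", 1: "node-b", 2: "node-c", 3: "node-a"}
scattered_placement = {0: "node-b", 1: "node-c", 2: "node-a", 3: "node-b"}

# Same placement for both tables: every key is joined locally.
colocated = keys_crossing_network(keys, 4, orders_placement, orders_placement)
# Different placement: every key needs data from another node.
not_colocated = keys_crossing_network(keys, 4, orders_placement, scattered_placement)

print(colocated, not_colocated)
```

In conversation, this is the difference behind a sentence like “With matching placement, no key has to leave its node.”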
Bucket / tablet — a bucket is a subdivision of a table’s data within a partition, typically assigned by hashing the bucketing key; each bucket is physically stored as a tablet, the smallest unit StarRocks distributes and replicates across the cluster. “The uneven load across nodes traced back to a bad bucket count — too few buckets meant a few tablets were massively larger than the rest.”
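A balance check like the one in the example sentence can be sketched as follows. This is a simplified model under stated assumptions: one bucket maps to one tablet, `crc32` stands in for the real hash function, and the customer keys (including the single “hot” key) are invented sample data. It counts rows per tablet and compares the largest tablet to the average.

```python
import zlib


def tablet_sizes(keys, num_buckets):
    # Count how many rows land in each bucket (one tablet per bucket here).
    sizes = [0] * num_buckets
    for key in keys:
        sizes[zlib.crc32(str(key).encode()) % num_buckets] += 1
    return sizes


def skew_ratio(sizes):
    # 1.0 means perfectly balanced; larger values mean one tablet dominates.
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Hypothetical data: one very active customer plus 100 ordinary ones.
keys = ["hot-customer"] * 900 + [f"customer-{i}" for i in range(100)]

sizes = tablet_sizes(keys, 4)
print(sizes, round(skew_ratio(sizes), 2))
```

A ratio well above 1 is the kind of evidence that supports saying “a few tablets are carrying a disproportionate share of the data.”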
Compaction — the background process that merges smaller data segments written by frequent inserts into larger, more efficient ones, which affects both storage size and query speed over time. “Query latency crept up over the week because compaction was falling behind the ingestion rate, leaving too many small segments to scan.”
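The reason unmerged segments slow down reads can be shown with a minimal sketch. It is an illustration only, assuming each segment is a sorted list of keys; `compact` and `segments_to_check` are hypothetical helpers, not StarRocks functions. A lookup has to open every segment whose key range could contain the key, so merging segments reduces that count.

```python
import heapq


def compact(segments):
    # Merge several sorted segments into a single sorted segment.
    return [list(heapq.merge(*segments))]


def segments_to_check(segments, key):
    # A lookup must open every segment whose key range could contain the key.
    return sum(1 for seg in segments if seg and seg[0] <= key <= seg[-1])


# Three small overlapping segments, as frequent small inserts might leave behind.
small_segments = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]

before = segments_to_check(small_segments, 6)
merged = compact(small_segments)
after = segments_to_check(merged, 6)

print(before, after)
```

This is the mechanism behind a sentence like “The query plan was fine; it just had too many small segments to scan.”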
Broadcast vs. shuffle join — a broadcast join sends a small table’s full data to every node, while a shuffle join redistributes both tables’ data by join key, and picking the wrong one for the data size hurts performance badly. “Forcing a shuffle join instead of a broadcast join fixed the memory spike, since the ‘small’ table on the right side had actually grown to millions of rows.”
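The trade-off can be made concrete with a rough row-count estimate. This is a deliberately simplified model, not the StarRocks optimizer, which weighs statistics, memory, and more: it assumes a broadcast copies the right-side table to every node and a shuffle moves each row of both tables at most once. The function names and row counts are hypothetical.

```python
def rows_moved_broadcast(right_rows, num_nodes):
    # Every node receives a full copy of the right-side table.
    return right_rows * num_nodes


def rows_moved_shuffle(left_rows, right_rows):
    # Both tables are redistributed by join key (upper bound: each row moves once).
    return left_rows + right_rows


def cheaper_strategy(left_rows, right_rows, num_nodes):
    broadcast = rows_moved_broadcast(right_rows, num_nodes)
    shuffle = rows_moved_shuffle(left_rows, right_rows)
    return "broadcast" if broadcast <= shuffle else "shuffle"


# A genuinely small right-side table favors broadcast.
print(cheaper_strategy(100_000_000, 1_000, 10))
# Once the "small" table has grown, shuffle moves fewer rows overall.
print(cheaper_strategy(100_000_000, 50_000_000, 10))
```

The second call mirrors the example sentence: a table that was once small enough to broadcast has grown, and the cheaper choice flips.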
Common Phrases
- “Would a materialized view make sense here, or is this aggregation actually cheap enough to run live?”
- “Are these two tables in the same colocation group, or is this join shuffling data across the network unnecessarily?”
- “Is our bucket count actually balanced, or are a few tablets carrying a disproportionate share of the data?”
- “Is compaction keeping up with ingestion, or are we accumulating too many small segments?”
- “Is the planner choosing a broadcast join here — is that still the right call given how this table has grown?”
Example Sentences
Explaining a materialized view decision in review: “We added a materialized view for this aggregation because five different dashboards were independently recomputing the same expensive query.”
Describing a colocation fix: “The join was shuffling terabytes of data every run — putting both tables in the same colocation group eliminated that entirely.”
Justifying a compaction investigation: “Before tuning the query, we checked compaction status and found thousands of small unmerged segments — that was the real cause of the slowdown.”
Professional Tips
- Suggest a materialized view specifically when the same expensive aggregation is recomputed repeatedly across queries or dashboards — that’s the clearest sign it’s worth the maintenance cost.
- Check colocation configuration before assuming a join is inherently slow — a shuffle that should be a local join is a common, fixable cause.
- Investigate bucket and tablet balance when one part of the cluster seems overloaded — uneven distribution is often the root cause, not raw data volume.
- Monitor compaction lag as its own signal, separate from query performance — a healthy query plan can still be slow if compaction has fallen behind.
Practice Exercise
- Explain what a materialized view is and why it helps with repeated aggregations.
- Describe what a colocation group does and how it changes a join’s performance.
- Write a sentence explaining why compaction that falls behind ingestion can slow down queries.