English for Trino Developers
Master the English vocabulary developers need for Trino's federated query engine, connectors, and query planning when discussing analytics across data sources.
Trino (formerly PrestoSQL) is a distributed SQL query engine built to query data where it lives — object storage, relational databases, Kafka — without first loading it into a single warehouse. That “query in place” model brings vocabulary (connector, catalog, federated query, splits) that teams used to a single-database mindset need to pick up quickly. This guide covers the English used when discussing Trino with a team.
Key Vocabulary
Connector — the plugin that lets Trino read (and sometimes write) a specific data source, such as Hive, Iceberg, PostgreSQL, or Kafka, by exposing that source’s data to Trino’s query engine as tables. “We can’t push this filter down efficiently because the connector for this source doesn’t support predicate pushdown yet, so it has to pull the full partition and filter afterward.”
Catalog — a named configuration of a connector pointing at a specific data source, giving every table a three-part name (catalog.schema.table) so a single query can span multiple systems. “Reference the warehouse tables through the iceberg catalog and the live order data through the postgres catalog; you can join across both in one query.”
Federated query — a single query that joins data across multiple catalogs (and therefore multiple underlying systems) without any of them being physically moved beforehand. “This federated query joins yesterday’s Iceberg snapshot with today’s live Postgres orders — no ETL step needed just to answer the question.”
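To make the three-part naming concrete, here is a minimal Python sketch that assembles the text of a federated query. All names in it (the catalogs iceberg and postgres, the schemas, tables, and columns, and the helper functions) are hypothetical and chosen only for illustration. The sketch builds a SQL string and does not connect to a Trino cluster.

```python
# Hypothetical names throughout: the catalogs, schemas, tables, and columns
# below are illustrative assumptions, not taken from a real deployment.


def qualified_name(catalog, schema, table):
    """Build a three-part name of the form catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def catalogs_used(table_names):
    """Return the distinct catalogs referenced by a list of three-part names."""
    return {name.split(".")[0] for name in table_names}


def build_federated_query():
    """Return SQL text that joins one table from each of two catalogs."""
    snapshot = qualified_name("iceberg", "warehouse", "orders_snapshot")
    live = qualified_name("postgres", "public", "orders")
    return (
        "SELECT s.order_id, s.status AS snapshot_status, l.status AS live_status\n"
        f"FROM {snapshot} AS s\n"
        f"JOIN {live} AS l ON s.order_id = l.order_id\n"
        "WHERE s.status <> l.status"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

A query counts as federated when catalogs_used returns more than one catalog for the tables it touches, which is a quick way to phrase the question "is this a federated query?" in a review.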
Split — the unit of work Trino divides a table scan into for parallel execution across worker nodes, roughly analogous to a partition or file range being processed concurrently. “If one split is much larger than the rest, that worker becomes the bottleneck — this is classic data skew showing up at the split level.”
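One way to put a number on "much larger than the rest" is to compare the largest split with the median split. The sketch below is a plain Python illustration, not a Trino API: the function names, the input list of split sizes, and the threshold of 5.0 are all assumptions made for the example.

```python
from statistics import median


def skew_ratio(split_sizes):
    """Return the largest split size divided by the median split size."""
    if not split_sizes:
        raise ValueError("split_sizes must not be empty")
    mid = median(split_sizes)
    if mid == 0:
        raise ValueError("median split size is zero; ratio is undefined")
    return max(split_sizes) / mid


def is_skewed(split_sizes, threshold=5.0):
    """Flag skew when the largest split is at least `threshold` times the median.

    The default threshold is an arbitrary illustrative value.
    """
    return skew_ratio(split_sizes) >= threshold


if __name__ == "__main__":
    sizes_mb = [100, 100, 100, 1000]  # hypothetical split sizes in megabytes
    print(skew_ratio(sizes_mb), is_skewed(sizes_mb))
```

With this framing, a sentence such as "the skew ratio on this scan is about ten" tells the team more than "one worker is slow."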
Predicate pushdown — optimizing a query by passing filter conditions down to the connector so the underlying source does the filtering itself, instead of Trino pulling all the data and filtering afterward. “With predicate pushdown working correctly, this query only reads the three matching partitions from S3 instead of scanning the whole table.”
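The difference between filtering at the source and filtering afterward can be illustrated with a toy model in Python. This is a sketch under stated assumptions: the PARTITIONS dictionary stands in for a partitioned table, the two scan functions are hypothetical, and nothing here reflects how a real connector is implemented. It only shows that both approaches return the same rows while reading different amounts of data.

```python
# Toy stand-in for a table partitioned by date. All data is made up.
PARTITIONS = {
    "2024-01-01": [{"order_id": 1, "amount": 20}, {"order_id": 2, "amount": 35}],
    "2024-01-02": [{"order_id": 3, "amount": 50}],
    "2024-01-03": [{"order_id": 4, "amount": 15}, {"order_id": 5, "amount": 80}],
    "2024-01-04": [{"order_id": 6, "amount": 40}],
}


def scan_without_pushdown(wanted_dates):
    """Fetch every partition, then filter in the engine."""
    partitions_read = 0
    fetched = []
    for date, partition_rows in PARTITIONS.items():
        partitions_read += 1
        fetched.extend((date, row) for row in partition_rows)
    rows = [row for date, row in fetched if date in wanted_dates]
    return rows, partitions_read


def scan_with_pushdown(wanted_dates):
    """Hand the filter to the source so only matching partitions are read."""
    partitions_read = 0
    rows = []
    for date in wanted_dates:
        if date in PARTITIONS:
            partitions_read += 1
            rows.extend(PARTITIONS[date])
    return rows, partitions_read


if __name__ == "__main__":
    wanted = ["2024-01-02", "2024-01-03"]
    print(scan_without_pushdown(wanted)[1], scan_with_pushdown(wanted)[1])
```

In this toy table, the pushdown version reads two partitions and the other version reads all four, which is the over-fetching that the phrase "are we pulling more data than we need?" refers to.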
Common Phrases
- “Does this connector support predicate pushdown, or are we pulling more data than we need?”
- “Is this a federated query across catalogs, or can it stay within a single source?”
- “Are the splits roughly even, or is one worker doing disproportionately more work?”
- “Which catalog does this table actually live under — are we sure it’s not ambiguous?”
- “Can we avoid moving this data at all and just run a federated query against it instead?”
Example Sentences
Reviewing a pull request: “This query pulls the entire table and then filters in application code. Move that filter into the WHERE clause so predicate pushdown can do the filtering at the source instead.”
Explaining a design decision: “We used a federated query instead of a nightly ETL job specifically because the freshness requirement here is minutes, not hours.”
Describing an incident: “The query was stuck at ninety-five percent for twenty minutes because of split skew — one partition was ten times larger than the others and one worker was still grinding through it.”
Professional Tips
- Say “predicate pushdown” precisely when a query is slow due to over-fetching — it names the specific optimization missing and points reviewers straight at the fix.
- When designing cross-system analytics, ask “does this need to be a federated query, or should we materialize it first?” — federation is powerful but not always the cheapest option.
- Use “catalog” to mean a named, configured connection to a data source, not the data source itself or the connector type; confusing these causes miscommunication about where a table actually lives.
- Mention split skew explicitly when diagnosing a slow, unevenly progressing query; it’s a distinct problem from a missing index or a bad join order.
Practice Exercise
- Explain in two sentences the difference between a connector and a catalog in Trino.
- Write a one-sentence code review comment recommending a query be rewritten to enable predicate pushdown.
- Describe, in your own words, what split skew looks like while a query is running.