English Vocabulary for DuckDB Analytics
Learn the English vocabulary data analysts and engineers use with DuckDB — in-process OLAP, Parquet querying, WASM deployment, MotherDuck, and DuckDB extensions explained.
DuckDB has quickly become the tool of choice for local and embedded analytics, earning the nickname “SQLite for analytics.” Data engineers and analysts who work with Python, R, or data notebooks use DuckDB vocabulary constantly in documentation, blog posts, and community discussions. This post covers the terms you need to discuss DuckDB fluently and professionally.
Key Vocabulary
In-Process OLAP
DuckDB is an in-process OLAP (Online Analytical Processing) database, meaning it runs inside your application process rather than as a separate server. There is no network round-trip, no connection pool, and no daemon to manage. Analysts say DuckDB runs “in-process” or “embedded.”
Example: “Because DuckDB is in-process, the Python script queries 50 million rows directly in memory without any network overhead.”
Columnar Storage
DuckDB stores data in a columnar format, meaning all values in a column are stored together. This is optimal for analytical queries that aggregate a few columns across many rows. Analysts say DuckDB uses “columnar storage” or “column-oriented execution.”
Example: “The columnar storage lets DuckDB compute the sum of the revenue column across 100 million rows by reading only that one column from disk.”
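The difference between row-oriented and column-oriented layouts can be sketched in plain Python. This is only an illustration of the idea with made-up sample data, not how DuckDB is implemented internally.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "revenue": 120.0},
    {"order_id": 2, "region": "US", "revenue": 80.5},
    {"order_id": 3, "region": "EU", "revenue": 99.5},
]

# Column-oriented layout: all values of one column are stored together.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 80.5, 99.5],
}

# Summing revenue from rows walks through every record, including unused fields.
row_total = sum(record["revenue"] for record in rows)

# Summing revenue from columns reads only the one list that is needed.
column_total = sum(columns["revenue"])

print(row_total, column_total)
```

Both totals are the same; the columnar version simply never looks at order_id or region, which is the point analysts make when they praise “columnar storage.”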
read_parquet()
read_parquet() is a DuckDB SQL function that queries Parquet files directly, without importing them into a database first. It reads local files, and with the httpfs extension it can also read S3 paths and HTTP(S) URLs. Analysts “use,” “call,” or “query via” read_parquet().
Example: “Instead of loading the data into a table, I queried it directly with SELECT * FROM read_parquet('s3://my-bucket/data/*.parquet').”
SQL Extensions (LIST, STRUCT, MAP types)
DuckDB extends standard SQL with complex nested types: LIST (ordered arrays), STRUCT (named fields), and MAP (key-value pairs). These let you work with semi-structured data directly in SQL. Analysts “use,” “query,” and “unnest” these types.
Example: “The JSON column was parsed as a STRUCT so I could access nested fields with dot notation directly in the SQL query.”
WASM Deployment
DuckDB can run in a web browser via WebAssembly (WASM). This lets you embed a full analytical database in a client-side web application with no backend required. Developers “deploy DuckDB via WASM,” “run DuckDB in the browser,” or “use the DuckDB WASM build.”
Example: “We embedded the DuckDB WASM build in our reporting dashboard so users can run ad-hoc SQL queries in the browser without hitting our servers.”
MotherDuck
MotherDuck is a cloud service built on DuckDB that adds collaboration, persistent storage, and a cloud execution tier. Analysts “connect to MotherDuck,” “deploy to MotherDuck,” or “use MotherDuck for shared analytics.”
Example: “We use MotherDuck for our team’s shared datasets so everyone queries the same data without managing a dedicated server.”
DuckDB Extensions
DuckDB supports a plugin system called extensions, which adds functionality such as reading from Postgres (postgres extension), Iceberg (iceberg extension), spatial data (spatial extension), or Excel files (excel extension). Developers “install,” “load,” and “use” extensions.
Example: “I installed the httpfs extension to query Parquet files directly from S3 without downloading them first.”
Streaming Result Sets
DuckDB can return query results incrementally as a stream rather than loading everything into memory at once. This is important for large result sets. Developers “stream results,” “use the streaming API,” or “fetch results in batches.”
Example: “For the 500-million-row export, I used the streaming result set API to write results to a Parquet file chunk by chunk instead of loading everything into a DataFrame.”
Common Phrases and Collocations
“query Parquet files directly”
The standard description of DuckDB’s most celebrated feature: reading Parquet without a prior import step. “Directly” is the key word that distinguishes DuckDB’s approach.
Example: “DuckDB lets you query Parquet files directly from S3 with full SQL support, including joins and window functions.”
“run DuckDB in the browser”
The standard phrase for the WASM use case. Say “in the browser,” not “on the frontend” or “client-side.”
Example: “We run DuckDB in the browser for our interactive report builder, so users can filter and aggregate without a backend API call.”
“use DuckDB for local analytics”
Describes the common workflow of using DuckDB as a local tool during data exploration and development. “Local analytics” is the established phrase in the DuckDB community.
Example: “I use DuckDB for local analytics when exploring a new dataset; it handles 10 GB CSVs faster than pandas on my laptop.”
“scan the Parquet file”
Describes DuckDB reading through a Parquet file during query execution. “Scan” is the technical term, and it is more precise than “read” in query performance discussions.
Example: “With partition pruning enabled, DuckDB only needs to scan three Parquet files out of the 200 in the partition folder.”
“attach a database”
DuckDB supports attaching multiple database files in a single session. Teams “attach,” “detach,” and “query across” attached databases.
Example: “I attached the production DuckDB file as read-only and the local analysis database as read-write so I can join data from both.”
Practical Sentences to Practice
- “DuckDB is in-process, so there is no server to start — just import the library and run SQL.”
- “I queried the entire S3 data lake by pointing read_parquet() at the top-level prefix with a glob pattern.”
- “The spatial extension lets DuckDB run geospatial queries without exporting data to PostGIS.”
- “Our interactive dashboard runs DuckDB in the browser using the WASM build; latency dropped from 800 ms to under 50 ms.”
- “Load the json extension to automatically parse JSON columns into STRUCT types that you can query with dot notation.”
Common Mistakes to Avoid
Calling DuckDB a “database server”
DuckDB is an in-process or embedded database, so it has no server mode by default (MotherDuck adds cloud capability). Saying “our DuckDB server” is incorrect in most contexts. Say “our DuckDB instance” or “the embedded DuckDB database.”
Saying “import” instead of “query directly”
One of DuckDB’s key advantages is that you can query Parquet and CSV files without importing them into a table first. Saying “import the Parquet file into DuckDB” misses the point; say “query the Parquet file directly.”
Confusing “extension” and “function”
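read_parquet() is a function: something you call inside a SQL query. httpfs and postgres are extensions: plugins that add capabilities and must be installed and loaded before use (recent DuckDB versions can install and load many core extensions automatically on first use). An extension may provide functions, but the two words are not interchangeable. Do not call a function an extension or vice versa.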
Summary
DuckDB’s vocabulary (in-process OLAP, columnar storage, read_parquet(), nested SQL types, WASM deployment, MotherDuck, extensions, and streaming result sets) reflects its role as a widely used tool for local and embedded analytics. Fluency in these terms helps you communicate confidently in data team discussions, write accurate blog posts and documentation, and participate in the active DuckDB community on GitHub and Discord. The best English resources in this space are the DuckDB blog, where the core team regularly publishes detailed technical articles, and the DuckDB documentation, which uses consistent vocabulary and includes clear, practical examples.