DuckDB Extensions Vocabulary: English for In-Process Analytics Discussions

Learn the English vocabulary analytics engineers and data scientists use when discussing DuckDB extensions, ATTACH, and in-process analytics workflows.

DuckDB has become a favourite tool among data scientists and analytics engineers for its speed, simplicity, and surprisingly rich extension ecosystem. If you work with DuckDB in team settings — code reviews, documentation, or technical discussions — knowing the right English vocabulary will help you communicate precisely. This guide covers the core terms and the collocations that surround them.

Core Vocabulary

Extension A loadable plugin that adds new functionality to DuckDB, such as support for a file format, a remote filesystem, or a library of geospatial functions. Extensions are installed and loaded at runtime.

“We added the spatial extension to support the geographic distance calculations; it provides PostGIS-style ST_ functions.”

Autoload A DuckDB behaviour where certain core extensions are automatically installed (if necessary) and loaded when a relevant function or file type is referenced, without requiring explicit INSTALL or LOAD statements.

“DuckDB autoloaded the httpfs extension when we ran the query against the S3 URL, so we didn’t need to run INSTALL or LOAD ourselves.”

Spatial extension A DuckDB extension that provides geospatial data types and functions, including support for reading shapefiles and performing geometric operations.

“Once we loaded the spatial extension, we could query the GeoJSON files directly with standard SQL and spatial predicates.”

httpfs An extension that enables DuckDB to read files from remote HTTP or S3-compatible endpoints directly in SQL queries, without downloading files first.

“With httpfs, our analysts can query Parquet files sitting in S3 as if they were local tables — no ETL step needed.”

Iceberg extension A DuckDB extension that adds native support for reading Apache Iceberg tables, including metadata scanning and snapshot-based time travel.

“We installed the Iceberg extension so the analytics team can query our data lake directly without going through Spark.”

Community extensions vs core extensions Core extensions are maintained by the DuckDB team and are built and tested for each DuckDB release. Community extensions are contributed by third parties, distributed through a separate community repository, and may have looser version compatibility guarantees.

“Before you add that community extension to the pipeline, check the compatibility matrix — community extensions don’t always follow the same release cadence as core ones.”

ATTACH statement A DuckDB SQL command that connects an external database file — another DuckDB file, a SQLite database, or a remote source — to the current session, allowing cross-database queries.

“We ATTACH the production DuckDB file as read-only and join it against our local staging data to compare results.”

In-process database A database engine that runs inside the host process rather than as a separate server. DuckDB is in-process, meaning it runs inside your Python script or application without a network connection or separate daemon.

“The great thing about an in-process database is that there’s no network hop or serialisation step: DuckDB reads your pandas DataFrame directly from memory.”

Persistent vs in-memory mode DuckDB can run with data stored on disk (persistent mode) or entirely in RAM (in-memory mode). In-memory mode avoids disk I/O and suits ephemeral analysis, but all data is lost when the connection closes or the process exits.

“For our CI pipeline, we use in-memory mode so the test database is automatically discarded — no cleanup step needed.”

Key Collocations

  • install and load an extension — “To use httpfs, you need to install and load the extension first — INSTALL httpfs; LOAD httpfs.”
  • attach a remote database — “We can attach a remote database over the S3 endpoint and query it alongside local tables in the same SQL statement.”
  • query remote files — “DuckDB lets you query remote files on S3 directly using the httpfs extension — no local copy required.”
  • scan a Parquet file — “The query scans a Parquet file of 4 billion rows and returns results in under 10 seconds on a laptop.”
  • persist to disk — “For long-running analysis sessions, we persist to disk so we can resume the next day without re-processing the source data.”
  • run in-memory — “Unit tests for our data transformations run in-memory with a seed dataset — fast and isolated.”

Using This Vocabulary in Discussions

One of the most common DuckDB topics in team discussions is the trade-off between persistent and in-memory mode. A typical exchange might sound like: “Do we actually need to persist to disk here, or can we run in-memory and regenerate from source if needed? The dataset is only 2 GB.” Using these terms precisely — rather than “save to a file” or “keep it in RAM” — shows familiarity with DuckDB’s architecture.

When discussing extensions, English speakers frequently use the pair “install and load” together, because these are two separate steps in DuckDB. You install an extension once (DuckDB downloads it and caches it locally), and you load it in each new session. Knowing this distinction prevents confusion when someone says “I installed httpfs but it’s not working”: the usual follow-up question is “Did you also load it?”

The verb “attach” has a specific meaning in DuckDB that differs from its everyday English usage. In daily English, “attach” usually means to fasten one thing to another. In DuckDB, you attach a database to the current session, much like mounting a drive. If you need to explain this to a new team member, you might say: “ATTACH is how we connect a second database file so we can query both in the same SQL statement.”

Common Mistakes to Avoid

A common source of confusion is mixing up “extension” and “plugin” in DuckDB contexts. While “plugin” is a general term, the DuckDB community consistently uses “extension.” Using “plugin” is understandable, but it signals that you may be borrowing from another ecosystem’s vocabulary.

Another frequent mistake is describing DuckDB as a “server database” or asking “what port does DuckDB run on?” DuckDB is an in-process database — there is no server, no port, and no connection string in the traditional sense. The correct framing is: “DuckDB runs inside your process.”

Practice Tip

Open DuckDB’s extension documentation and pick three extensions you haven’t used before. Write a short paragraph explaining what each one does and when you would use it, using the phrase “install and load” in at least one sentence. Then describe the difference between running in-memory versus persisting to disk in your own words — this is a question that commonly comes up when onboarding new team members.