DuckDB is an in-process analytical database that runs inside your Python program without a separate server. It queries Parquet, CSV, and JSON files directly, integrates with Pandas DataFrames via zero-copy Arrow, and executes vectorized columnar SQL queries.
1 / 5
A data scientist runs import duckdb; duckdb.sql('SELECT * FROM read_parquet("data.parquet")'). What is notable about this operation?
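DuckDB can query Parquet files in place using the read_parquet() function, without first importing them into a persistent database. It uses columnar vectorized execution, reads only the columns a query references, and pushes filter predicates down to the scan, where the min/max statistics in Parquet's row group metadata let it skip row groups that cannot contain matching rows.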
2 / 5
Which Python data structure can DuckDB query directly using its Python API without any data copying?
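DuckDB integrates with Pandas DataFrames and Arrow tables via zero-copy scanning: it reads the existing in-memory data instead of importing it into a table first. A replacement scan resolves an otherwise unknown table name to a Python variable in scope, so you can reference a DataFrame directly in SQL: duckdb.sql('SELECT * FROM my_df'), where my_df is a Pandas DataFrame variable.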
3 / 5
What does duckdb.connect(':memory:') return in the DuckDB Python API?
duckdb.connect(':memory:') returns a DuckDBPyConnection backed by a new in-memory database. These databases are isolated per connection: each call creates its own ephemeral database, which is discarded when the connection closes. This is ideal for testing and one-off analytical queries.
4 / 5
A developer uses DuckDB's Python API to run a query and wants the result as an Arrow table. Which method should they call?
The .arrow() method on a DuckDB relation returns the result in Apache Arrow format, and .fetch_arrow_table() explicitly returns an Arrow Table. .df() returns a Pandas DataFrame (.fetchdf() and .to_df() are equivalent), and .fetchall() returns a list of tuples.
5 / 5
Which DuckDB feature allows running SQL queries across multiple CSV files in a directory using a glob pattern?
DuckDB's read_csv() (and read_parquet()) accepts glob patterns like 'data/*.csv' to scan multiple files in a single query. DuckDB automatically unions the results and can infer schemas from the files.