Explore the in-process OLAP engine — querying Parquet directly, vectorised columnar execution, Python/Wasm bindings, and Arrow interop
1 / 5
What kind of database is DuckDB?
In-process analytics database: DuckDB runs in-process (no server), like SQLite, but is built for analytics. Its engine is columnar and vectorised, so aggregations and scans over large tables run quickly. It is well suited to local data analysis, embedding in applications, and querying files such as Parquet and CSV without a separate database server.
2 / 5
How can DuckDB query Parquet files directly?
Querying files: SELECT category, SUM(amount) FROM 'sales.parquet' GROUP BY category; runs directly against the file, with no load step. DuckDB pushes projections and filters down into the Parquet scan, reading only the referenced columns and using the row-group statistics in the Parquet metadata to skip row groups that cannot match. This keeps analytical queries over large files fast and memory-efficient.
3 / 5
What makes DuckDB efficient for analytical workloads?
Vectorised columnar engine: DuckDB stores and processes data column-wise in batches (vectors) rather than tuple-at-a-time. This maximises CPU cache locality and allows tight, SIMD-friendly loops over homogeneous data, which is why aggregations and group-bys over millions of rows run quickly on a single machine.
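The difference in processing shape can be sketched in plain Python. This is a toy illustration, not DuckDB's implementation: the function names, the sample data, and the batch size of 2048 are assumptions, and Python itself will not show the SIMD benefit. It only contrasts a loop over rows with a loop over fixed-size batches of one homogeneous column.

```python
from array import array

VECTOR_SIZE = 2048  # illustrative batch size (assumption for this sketch)


def sum_tuple_at_a_time(rows):
    """Row-oriented: touch every field of every row, one row per iteration."""
    total = 0.0
    for _row_id, amount in rows:
        total += amount
    return total


def sum_vectorised(amounts, vector_size=VECTOR_SIZE):
    """Column-oriented: process one contiguous column in fixed-size batches."""
    total = 0.0
    for start in range(0, len(amounts), vector_size):
        batch = amounts[start:start + vector_size]  # one vector of doubles
        total += sum(batch)  # tight loop over homogeneous values
    return total


# Row store: a list of (id, amount) tuples.
rows = [(i, float(i % 10)) for i in range(10_000)]
# Column store: the amount column alone, as a packed array of doubles.
amount_column = array("d", (amount for _, amount in rows))

row_total = sum_tuple_at_a_time(rows)
vector_total = sum_vectorised(amount_column)
print(row_total, vector_total)
```

Both functions compute the same sum. The columnar version never touches the id field and works on densely packed values of one type, which is the property that compiled engines exploit for cache locality and SIMD.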
4 / 5
How is DuckDB commonly used from Python or the browser?
Embedded use: in Python, import duckdb; duckdb.sql('SELECT * FROM df') queries a pandas DataFrame or Arrow table named df in the calling scope, without first loading it into a DuckDB table (Arrow data is scanned zero-copy). DuckDB-Wasm compiles DuckDB to WebAssembly so full SQL analytics run client-side in the browser. These bindings make it a popular engine for notebooks, data apps, and local-first tools.
5 / 5
What is DuckDB's relationship to Apache Arrow?
Arrow integration: DuckDB and Arrow both use a columnar in-memory layout, so DuckDB can scan Arrow tables in place and return results as Arrow without a serialisation step. This cheap interchange lets DuckDB slot into pipelines with pandas, Polars, and other tools that read and write Arrow, moving large result sets between systems at low cost.