DuckDB's extension system expands its capabilities, from querying S3 with httpfs to in-browser analytics with WASM. Master the INSTALL/LOAD pattern, core and community extensions, and the Arrow integration for building versatile analytical toolchains.
1 / 5
A data engineer runs INSTALL httpfs; LOAD httpfs; in DuckDB. What capability does the httpfs extension add?
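The httpfs extension lets DuckDB read files directly over HTTP/HTTPS and read or write files on S3-compatible object storage (AWS S3, GCS, Cloudflare R2, MinIO). After loading it, queries like SELECT * FROM read_parquet('s3://bucket/file.parquet') run against the remote file in place, with no separate download step; for Parquet, DuckDB requests only the byte ranges it needs. Private buckets additionally require credentials to be configured.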
2 / 5
DuckDB has a community extensions repository. How does a user install a community extension such as h3?
Community extensions are installed by adding FROM community to the INSTALL command, for example INSTALL h3 FROM community; followed by LOAD h3;. This fetches the extension binary from the community repository instead of the default one. Core extensions such as httpfs, parquet, json, and spatial need only INSTALL name; with no repository specified. Note that spatial is a core extension, so INSTALL spatial; is sufficient for it.
3 / 5
The DuckDB Arrow extension enables integration with the Apache Arrow ecosystem. What primary operation does it enable?
DuckDB's Arrow integration enables data exchange in the Arrow columnar memory format, in many cases without copying the data. Through client APIs such as the Python package, DuckDB can query Arrow tables and record batches from PyArrow, as well as Polars and pandas DataFrames, and can return query results as Arrow. The arrow extension itself adds functions for reading and writing Arrow IPC streams. Together these make DuckDB a high-performance query engine within Arrow-native data pipelines.
4 / 5
A developer uses DuckDB-WASM in a browser application. Which DuckDB extension is most commonly pre-bundled in the WASM build to support browser-based analytics?
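DuckDB-WASM builds aimed at the browser typically support the parquet and json extensions (bundled or loaded on demand, depending on the build) and can fetch remote files over HTTP(S), which enables browser-based analytics directly on remote data files. Remote access is provided by the WASM runtime's own HTTP layer, which goes through the browser's networking APIs, instead of the native httpfs extension. Not all native extensions are available in WASM because of binary size constraints and missing browser capabilities (e.g., full filesystem access).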
5 / 5
After running INSTALL json; LOAD json;, a DuckDB query uses read_json_auto('data.json'). What does the json extension's auto-detection do?
The read_json_auto() function samples the file, infers field names and types (including nested structures such as structs and lists), and returns a fully typed relational result, without requiring the user to define a schema. It also detects the file layout automatically, handling NDJSON, JSON arrays, and single objects.