Learn the open table format: snapshots and time travel, hidden partitioning, catalogs, and engine-agnostic lakehouse access
1 / 5
What is Apache Iceberg?
Apache Iceberg is a table format, not a storage engine. It layers a metadata structure over data files (commonly Parquet) in object storage, giving data lakes warehouse-like guarantees: ACID transactions, full schema and partition evolution, hidden partitioning, and snapshot-based time travel, all accessible from multiple engines.
2 / 5
How does Iceberg enable time travel and rollback?
Snapshots: every write produces a new snapshot in the metadata tree that records exactly which data files make up the table at that point. Queries can target a snapshot by ID, or by an as-of timestamp, to time travel, and an operator can move the table's current-snapshot pointer back to an earlier snapshot to undo a bad write. Neither operation rewrites any data files.
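A minimal in-memory sketch of the idea, assuming a toy model (`ToyTable`, `Snapshot`, `commit`, `files_as_of`, and `rollback` are hypothetical names, not Iceberg's API): each commit appends an immutable snapshot holding a file set, time travel picks the latest snapshot at or before a timestamp, and rollback only moves the current pointer. As a simplification it scans snapshot timestamps directly; real implementations resolve as-of timestamps through the table's snapshot history.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: FrozenSet[str]  # every file that makes up the table at this point


@dataclass
class ToyTable:
    snapshots: List[Snapshot] = field(default_factory=list)  # oldest first
    current_id: Optional[int] = None  # the "current snapshot" pointer

    def _get(self, snapshot_id: int) -> Snapshot:
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap
        raise KeyError(f"unknown snapshot {snapshot_id}")

    def commit(self, timestamp_ms: int, add: Iterable[str] = (),
               remove: Iterable[str] = ()) -> int:
        # A write never mutates an old snapshot; it derives a new file set from the current one.
        base = self._get(self.current_id).data_files if self.current_id is not None else frozenset()
        files = (base - frozenset(remove)) | frozenset(add)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, files)
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap.snapshot_id

    def files(self, snapshot_id: Optional[int] = None) -> FrozenSet[str]:
        # Read the current snapshot by default, or a specific snapshot by id.
        return self._get(self.current_id if snapshot_id is None else snapshot_id).data_files

    def files_as_of(self, timestamp_ms: int) -> FrozenSet[str]:
        # Time travel by timestamp: the latest snapshot committed at or before it.
        older = [s for s in self.snapshots if s.timestamp_ms <= timestamp_ms]
        if not older:
            raise ValueError("no snapshot at or before that time")
        return max(older, key=lambda s: s.timestamp_ms).data_files

    def rollback(self, snapshot_id: int) -> None:
        # Undo means moving the pointer back; no data files are touched.
        self._get(snapshot_id)  # validate that the target exists
        self.current_id = snapshot_id


table = ToyTable()
s1 = table.commit(1_000, add={"a.parquet"})
s2 = table.commit(2_000, add={"b.parquet"})
s3 = table.commit(3_000, remove={"a.parquet"})  # a bad delete
table.rollback(s2)
```

In this toy, rolling back leaves `s3` in `snapshots`, so its file set can still be read by id after the pointer has moved.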
3 / 5
What is hidden partitioning in Iceberg?
Hidden partitioning: rather than storing a separate partition column and requiring users to filter on it, Iceberg derives partition values with transforms such as day(event_time) and records both the transform and each file's partition values in metadata. A query that filters on event_time therefore prunes partitions automatically. Partitioning can also evolve later: existing files keep their original layout, new writes use the new scheme, and no historical data is rewritten.
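A small sketch of the pruning logic, assuming each data file's metadata stores its `day` partition value (the file list, `day_transform`, `utc`, and `plan_scan` are illustrative, not Iceberg's API). Iceberg's day transform is defined as days since 1970-01-01, which `day_transform` reproduces for UTC timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    # Whole days since 1970-01-01 UTC, in the spirit of Iceberg's day() transform.
    return (ts - EPOCH) // timedelta(days=1)


def utc(*args: int) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


# Per-file metadata: the writer derived each file's partition value from
# event_time; users never see or populate a separate "day" column.
FILES: List[Dict] = [
    {"path": "f1.parquet", "day": day_transform(utc(2024, 5, 1, 9))},
    {"path": "f2.parquet", "day": day_transform(utc(2024, 5, 2, 14))},
    {"path": "f3.parquet", "day": day_transform(utc(2024, 5, 3, 7))},
]


def plan_scan(files: List[Dict], start: datetime, end: datetime) -> List[str]:
    # The query filters only on event_time; the planner applies the same
    # transform to the filter bounds and skips files outside that day range.
    lo, hi = day_transform(start), day_transform(end)
    return [f["path"] for f in files if lo <= f["day"] <= hi]


print(plan_scan(FILES, utc(2024, 5, 2, 0), utc(2024, 5, 2, 23, 59)))  # ['f2.parquet']
```

Pruning here only decides which files to open; the engine still applies the row-level event_time filter inside the files it keeps.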
4 / 5
What does an Iceberg catalog provide?
Catalog: Iceberg needs a catalog to record where each table's current metadata file lives and to perform the atomic swap that makes a new snapshot the current one. Implementations include the REST catalog, AWS Glue, Hive Metastore, and Nessie. Because the swap succeeds only if the pointer still references the metadata the writer started from, concurrent commits cannot silently overwrite each other; a writer that loses the race refreshes and retries.
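The commit protocol can be sketched as an optimistic compare-and-swap loop. `ToyCatalog`, `commit`, `next_version`, and the `"v1"`, `"v2"` pointer values are hypothetical stand-ins for a real catalog and real metadata file paths; a lock plays the role of the catalog's atomic update.

```python
import threading
from typing import Callable, Dict, Optional


class ToyCatalog:
    """Hypothetical catalog: maps a table name to its current metadata pointer."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointers: Dict[str, str] = {}

    def current(self, table: str) -> Optional[str]:
        with self._lock:
            return self._pointers.get(table)

    def swap(self, table: str, expected: Optional[str], new: str) -> bool:
        # Atomic compare-and-swap: succeeds only if nobody committed since `expected` was read.
        with self._lock:
            if self._pointers.get(table) != expected:
                return False
            self._pointers[table] = new
            return True


def next_version(base: Optional[str]) -> str:
    # Stand-in for "write a new metadata file derived from the base one".
    return "v1" if base is None else f"v{int(base[1:]) + 1}"


def commit(catalog: ToyCatalog, table: str,
           write_metadata: Callable[[Optional[str]], str], max_retries: int = 5) -> str:
    # Optimistic concurrency: read the base, write new metadata, try to swap, retry on conflict.
    for _ in range(max_retries):
        base = catalog.current(table)
        new = write_metadata(base)
        if catalog.swap(table, base, new):
            return new
    raise RuntimeError(f"commit to {table} failed after {max_retries} attempts")


catalog = ToyCatalog()
commit(catalog, "db.events", next_version)                     # first commit -> "v1"
stale = catalog.current("db.events")                            # writer A reads "v1"
commit(catalog, "db.events", next_version)                     # writer B commits "v2" first
a_ok = catalog.swap("db.events", stale, next_version(stale))   # A's "v2" is rejected
commit(catalog, "db.events", next_version)                     # A retries from "v2" -> "v3"
```

A real writer also checks that its changes still apply on top of the refreshed metadata before retrying; this sketch simply rebuilds from the latest pointer.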
5 / 5
Why is Iceberg considered engine-agnostic?
Engine-agnostic: because Iceberg defines an open table specification, multiple compute engines can operate on the same underlying files and metadata concurrently. Spark might write while Trino reads and Flink streams in, all sharing one source of truth. This decoupling of storage format from compute is central to the open lakehouse approach.
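Because table state lives in documented files rather than inside one engine, any program that follows the spec can resolve it the same way. The sketch below reads the current snapshot's manifest list from a trimmed, hand-written metadata document; the key names follow the Iceberg table metadata format, but the document omits many required fields and the `s3://` paths are made up.

```python
import json
from typing import Optional

# Trimmed, hand-written example: a real metadata file also carries schemas,
# partition specs, and more. The s3:// paths are hypothetical.
METADATA_JSON = """
{
  "format-version": 2,
  "current-snapshot-id": 2,
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1000, "manifest-list": "s3://bucket/tbl/metadata/snap-1.avro"},
    {"snapshot-id": 2, "timestamp-ms": 2000, "manifest-list": "s3://bucket/tbl/metadata/snap-2.avro"}
  ]
}
"""


def current_manifest_list(metadata_text: str) -> Optional[str]:
    # Conceptually the lookup every spec-following reader performs before planning a scan.
    meta = json.loads(metadata_text)
    current = meta.get("current-snapshot-id")
    if current is None or current == -1:
        return None  # treated here as an empty table with no snapshot yet
    for snap in meta["snapshots"]:
        if snap["snapshot-id"] == current:
            return snap["manifest-list"]
    raise ValueError(f"snapshot {current} not found")


print(current_manifest_list(METADATA_JSON))
```

Engines differ in how they execute the scan, but as long as each follows the spec, they agree on which snapshot, and therefore which files, make up the table.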