From the course: Complete Guide to Data Lakes and Lakehouses
Table formats: Delta Lake, Apache Iceberg, Apache Hudi
- So far, we have talked about the data lakehouse architecture and how it improves the data lake with capabilities such as support for ACID transactions and dynamic schemas. Now it is time to talk about the core aspect that makes all of those outcomes possible, and that is the format used for storing data. As we have seen earlier in the course, data in a data lake can be stored in optimized formats like Parquet, Avro, and ORC. In a data lakehouse, we go a step further by using even more advanced formats, which we refer to as table formats. Table formats like Delta Lake, Apache Iceberg, or Apache Hudi are data formats that behave similarly to tables in a data warehouse. These table formats manage and store data in a way that combines the scalability of file systems with the structured query capabilities of warehouse tables. So what can these table formats offer, and why are they so relevant in data lakehouses? First, they offer storage efficiency, as data is stored in files or objects…
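To make the idea of a table format a little more concrete, here is a minimal sketch in Python. It is not the actual on-disk layout of Delta Lake, Apache Iceberg, or Apache Hudi. It only illustrates the general idea of a table as plain data files plus a log of commits that says which files belong to each version. Everything in it is an assumption made for illustration: the `ToyTable` class, the `_log` directory, the `part-*.jsonl` file names, and the use of JSON Lines files as a stand-in for Parquet.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Hypothetical toy table: immutable data files plus an ordered commit log."""

    def __init__(self, root):
        self.root = Path(root)
        self.log_dir = self.root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _commits(self):
        # Zero-padded names make lexicographic order equal commit order.
        return sorted(self.log_dir.glob("*.json"))

    def commit(self, rows=(), remove=()):
        """Write rows to a new data file, then publish a single commit entry."""
        version = len(self._commits())
        added = []
        if rows:
            data_file = f"part-{version:05d}.jsonl"
            (self.root / data_file).write_text(
                "\n".join(json.dumps(r) for r in rows)
            )
            added.append(data_file)
        entry = {"add": added, "remove": list(remove)}
        # Mode "x" fails if this version file already exists, so two writers
        # cannot both claim the same commit number.
        with open(self.log_dir / f"{version:05d}.json", "x") as f:
            json.dump(entry, f)
        return version

    def live_files(self, version=None):
        """Replay the log up to `version` to find the files in that snapshot."""
        commits = self._commits()
        if version is not None:
            commits = commits[: version + 1]
        live = []
        for path in commits:
            entry = json.loads(path.read_text())
            live = [f for f in live if f not in entry["remove"]] + entry["add"]
        return live

    def read(self, version=None):
        """Return all rows visible in the requested snapshot (latest by default)."""
        rows = []
        for name in self.live_files(version):
            for line in (self.root / name).read_text().splitlines():
                rows.append(json.loads(line))
        return rows


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyTable(tmp)
        table.commit(rows=[{"id": 1, "city": "Lima"}])
        v1 = table.commit(rows=[{"id": 2, "city": "Oslo"}])
        # An "update" is one commit that adds a new file and retires an old one.
        table.commit(rows=[{"id": 1, "city": "Quito"}],
                     remove=["part-00000.jsonl"])
        return table.read(version=v1), table.read()
```

Calling `demo()` returns two snapshots of the same table: the rows as of the second commit, and the rows after the update. The data files themselves are never modified in place. Only the log decides what a reader sees, which is roughly why a layer like this can offer warehouse-style behavior on top of ordinary files or objects.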