From the course: Complete Guide to Data Lakes and Lakehouses

Data compression

- [Instructor] With the volume of data stored in data lakes, we certainly need to think about data compression. Data compression is a technique used to maximize storage efficiency, reduce cost, and speed up data processing. By compressing data, you can store more information in less space and speed up data transfer across networks.

Let me explain how data compression works in data lakes. Data compression reduces the size of data files, and it can be performed using two main methods. Lossless compression compresses data in such a way that the original can be perfectly reconstructed from the compressed data. It is used in data lakes where accuracy and data integrity are important, such as in financial records or historical data. Lossy compression reduces file size by permanently discarding less critical information, which is ideal for some media files, like images and videos, where a slight loss of quality is acceptable in exchange for significant…
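To make the lossless case concrete, here is a minimal sketch in Python using the standard library's `zlib` module. It is not taken from the course: the sample records, the `compress_roundtrip` helper, and the compression level are assumptions chosen for illustration. It compresses a block of repetitive, CSV-like records, decompresses it, and checks that the result is byte-for-byte identical to the input.

```python
import zlib


def compress_roundtrip(payload: bytes) -> tuple[int, int, bool]:
    """Compress and decompress payload; report sizes and whether data survived intact.

    Hypothetical helper for illustration only.
    """
    compressed = zlib.compress(payload, level=6)  # level 6 is an arbitrary middle setting
    restored = zlib.decompress(compressed)
    return len(payload), len(compressed), restored == payload


# Made-up, highly repetitive records, similar in spirit to rows in a data lake file.
records = b"2024-01-01,ACME,100.00,USD\n" * 1000

original_size, compressed_size, intact = compress_roundtrip(records)
print(f"original bytes:   {original_size}")
print(f"compressed bytes: {compressed_size}")
print(f"restored data identical to original: {intact}")
```

Because the sample data is so repetitive, the compressed size should be far smaller than the original, and the identity check is what distinguishes lossless from lossy compression: a lossy codec could not pass it. Real data lake files will usually compress less dramatically than this artificial example.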
