From the course: Kafka Essentials: Quick Start for Building Effective Data Pipelines by Pearson

Understanding file management and compaction

In this lesson, we're going to talk a little more about how Kafka handles file management, or more specifically, log file management. In lessons 1.4 and 7.4, we talked about the retention time for a topic log and how you can set it using kafka-topics. We actually used that to clear the data out of a topic by setting its retention time to one second: with kafka-topics, we set retention.ms to 1000, which is one second. Then, when Kafka came around to do its cleanup, it cleaned out that topic and removed all the data, and afterward we reset the value back to the default. These retention policies are important so that you don't overwhelm your servers with too much data. They are set in the Kafka server config file, which usually lives somewhere like /etc/kafka/config. There are three possible settings for the retention time. The first one, which we've already talked about, is retention in milliseconds, and that's how long to keep the log file before…
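This sketch is a simplified model of how a retention window is resolved and then applied at cleanup time. It assumes that the three broker settings are the standard keys log.retention.ms, log.retention.minutes and log.retention.hours, checked in that order. It also assumes a default of 168 hours and that a topic-level retention.ms overrides them. The Segment class and both function names are hypothetical. Deletion is modeled as "drop the oldest segments whose newest record is older than the window", which leaves out several real broker details.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE
DEFAULT_RETENTION_HOURS = 168  # Kafka's documented broker default (7 days)


def effective_retention_ms(
    broker_props: Dict[str, str],
    topic_overrides: Optional[Dict[str, str]] = None,
) -> int:
    """Resolve the time-based retention window in ms; -1 means no time limit.

    Precedence (assumed from Kafka's broker docs): a topic-level retention.ms
    override, then log.retention.ms, then log.retention.minutes, then
    log.retention.hours.
    """
    if topic_overrides and "retention.ms" in topic_overrides:
        millis = int(topic_overrides["retention.ms"])
    elif "log.retention.ms" in broker_props:
        millis = int(broker_props["log.retention.ms"])
    elif "log.retention.minutes" in broker_props:
        millis = int(broker_props["log.retention.minutes"]) * MS_PER_MINUTE
    else:
        hours = int(broker_props.get("log.retention.hours", DEFAULT_RETENTION_HOURS))
        millis = hours * MS_PER_HOUR
    # Assumption: any negative value is treated as "keep forever".
    return -1 if millis < 0 else millis


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment file on disk."""
    base_offset: int       # offset of the first record in the segment
    max_timestamp_ms: int  # timestamp of the newest record in the segment


def segments_to_delete(
    segments: List[Segment], retention_ms: int, now_ms: int
) -> List[Segment]:
    """Return the oldest-first run of segments whose newest record has aged out."""
    if retention_ms < 0:
        return []
    expired = []
    for seg in sorted(segments, key=lambda s: s.base_offset):
        if now_ms - seg.max_timestamp_ms > retention_ms:
            expired.append(seg)
        else:
            break  # stop at the first segment still inside the retention window
    return expired


if __name__ == "__main__":
    broker = {"log.retention.hours": "168"}
    print(effective_retention_ms(broker))  # 604800000 ms, i.e. 7 days

    # The "purge a topic" trick: a one-second topic-level override.
    one_second = effective_retention_ms(broker, {"retention.ms": "1000"})
    segs = [Segment(0, 10_000), Segment(500, 19_500), Segment(900, 20_000)]
    doomed = segments_to_delete(segs, one_second, now_ms=20_600)
    print([s.base_offset for s in doomed])  # [0, 500]
```

With retention.ms at 1000, both older segments fall outside the window on the next cleanup pass, which is why the value has to be reset to the default afterward. The real broker differs from this model in a few ways:

- It runs this check periodically (log.retention.check.interval.ms) rather than instantly.
- Size-based retention (log.retention.bytes) is a separate mechanism and is not modeled here.
- Compaction is also separate and not modeled here.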