From the course: Machine Learning Foundations: Statistics

Missing data

- [Instructor] Imagine you're taking a survey. It says it will only take five minutes of your precious time, but after filling out 20 questions, you discover there are 30 more. Hey, you didn't sign up for this, so you skip half of the questions and finish the survey on time. You ask the colleague next to you what they did, and their answer is the same. They didn't bother filling out the whole survey. Data coming from the real world is rarely clean and homogeneous, so almost every dataset has some data missing. Missing data is defined as values that are not stored or not present for some variables in a given dataset. You might ask yourself, "Hey, but why should I care about missing data?" Many machine learning models rely on statistical methods that cannot work with missing data points. So we have to investigate how to handle missing data. Yes, there are…
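
To make the point about statistical methods concrete, here is a minimal sketch in plain Python. The survey columns, their values, and the helper names (`is_missing`, `missing_counts`, `naive_mean`) are all made up for illustration and are not from the course. It counts the skipped answers per question and shows how a single missing value turns an ordinary mean into NaN.

```python
import math

# Hypothetical survey responses (illustrative only); None marks a skipped question.
responses = {
    "age": [34, None, 29, 41],
    "satisfaction": [4.0, 5.0, None, None],
}

def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def missing_counts(table):
    """Count the missing entries in each column."""
    return {name: sum(is_missing(v) for v in column) for name, column in table.items()}

def naive_mean(column):
    """Plain arithmetic mean with missing values encoded as NaN.

    NaN propagates through the sum, so one missing value makes the result NaN.
    """
    values = [math.nan if is_missing(v) else float(v) for v in column]
    return sum(values) / len(values)

counts = missing_counts(responses)
age_mean = naive_mean(responses["age"])

print(counts)
print(age_mean)
```

The mean of the `age` column comes out as NaN rather than a number, which is the kind of failure that makes handling missing values a required step before modeling.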
