From the course: Complete Guide to Python for Data Engineering: From Beginner to Advanced
Data cleaning and preprocessing - Python Tutorial
- [Instructor] Let's get real. Many times, you might have encountered data that's all over the place: missing values, incorrect entries, and duplicate values. And you have been asked to analyze that data and find business insights in it. The first thing you would probably need to do is clean and preprocess it. The good news is that with the help of pandas, all these tasks can be done quite easily. Let's go back to our Google Colab and take the example of our Order.csv. This file contains data with a lot of problems, like nulls, wrong formats, duplicates, and a few others. Let's go step by step and clean it up, starting with removing all the rows that contain an empty cell. There is a function called dropna. You can write something like new_df = df.dropna(). This will drop all the rows where any column has a null value. Now, if you go ahead and use new_df.to_string(),…
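The dropna step described above can be sketched as follows. This is a minimal, self-contained illustration, not the course's own notebook. The contents of Order.csv aren't shown here, so a small in-memory DataFrame with hypothetical columns (order_id, date, amount) stands in for it. With the real file you would typically load it using pd.read_csv("Order.csv") instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Order.csv: the column names and values are
# assumptions made for illustration only.
df = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "date": ["2023-01-05", None, "2023-01-07", "2023-01-08"],
        "amount": [120.0, 85.5, np.nan, 42.0],
    }
)

# dropna() returns a NEW DataFrame with every row that has at least one
# missing value removed (how="any" is the default); df itself is unchanged.
new_df = df.dropna()

# to_string() renders the whole DataFrame as text, without truncating rows.
print(new_df.to_string())
```

Here orders 2 and 3 are dropped because each has one empty cell, so only orders 1 and 4 remain. The original df still holds all four rows because dropna does not modify it in place by default.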