From the course: Data Cleaning and Manipulating with Python in Excel

Removing duplicates

- [Instructor] Nothing's worse than starting your analysis and then realizing there's a ton of duplicated data in your dataset. Trust me, I've been there countless times throughout my career. You bring your analysis to your boss, only for them to wonder why your aggregations are 10 times larger than they should be. Being able to simply remove duplicates can make your analysis as accurate and clean as possible. In this video, we'll look at the different ways to remove duplicates using Python. We'll identify the Python code needed to remove the duplicates, so we can be sure we're removing everything we need to. Finally, we'll validate our data by checking that the code actually removed the duplicates. So let's open up the exercise file for this video. We'll use the chapter 101_03 tab. Okay, so let's take a look at what we've got here. In this dataset, we have an employee ID, employee name, the region they're in, and the profit they brought in. So there are some issues here. I mean…
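The three steps described in the video (find the duplicates, remove them, then validate) can be sketched in pandas roughly as follows. This is a minimal sketch, not the instructor's own code: the sample rows and the column names `employee_id`, `employee_name`, `region`, and `profit` are assumptions made up for illustration, and in Python in Excel the DataFrame would normally come from a worksheet range instead of being typed in by hand.

```python
import pandas as pd

# Hypothetical sample data: the values and column names are assumptions,
# chosen only to mirror the columns described in the video.
df = pd.DataFrame({
    "employee_id": [101, 102, 102, 103, 101],
    "employee_name": ["Ana", "Ben", "Ben", "Cara", "Ana"],
    "region": ["East", "West", "West", "North", "East"],
    "profit": [1200, 950, 950, 1800, 1200],
})

# Step 1: count rows that exactly repeat an earlier row.
duplicate_count = int(df.duplicated().sum())

# Step 2: drop the repeated rows, keeping the first occurrence of each.
deduped = df.drop_duplicates().reset_index(drop=True)

# Alternative: treat rows as duplicates when only the ID repeats.
deduped_by_id = df.drop_duplicates(subset=["employee_id"], keep="first")

# Step 3: validate that no exact duplicates remain.
remaining = int(deduped.duplicated().sum())

print(f"Duplicates found: {duplicate_count}")
print(f"Rows before: {len(df)}, rows after: {len(deduped)}")
print(f"Duplicates remaining: {remaining}")
```

Dropping on the full row is the safer default, since `subset` can discard rows that share an ID but differ elsewhere. Comparing the row counts before and after is a quick sanity check that the removal did what was expected.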
