From the course: Protecting Data for Analysis and Machine Learning
Pseudonymization
- [Instructor] Finally, let's explore the technique called pseudonymization. Here we replace identifying data, such as names, with fictitious data. Think about all the John Smiths and Jane Does you see out there. Fun fact: those probably aren't their real names. Renaming everyone to John or Jane would be a lazy example of pseudonymization, so let's find out how it's actually done when you need to come up with a lot of different names. For this part of the tutorial, you're given all of the code already, and we'll walk through what we're doing along the way. We're going to use the Faker library. As in our last example, we also want to set a seed for reproducibility. Faker has its own seed method, so we'll use that and give it 42 as the integer, but again, any integer can be used here. First, we need to initialize the library, kind of like turning the ignition in a car so it's ready to generate fictitious names. We're then going to generate…