From the course: Protecting Data for Analysis and Machine Learning

Pseudonymization

- [Instructor] Finally, let's explore the technique called pseudonymization. Here we replace real, identifying values with fictitious ones. Think about all the John Smiths and Jane Does you see out there. Fun fact: those probably aren't their real names. Renaming everyone to John or Jane would be a lazy example of pseudonymization, so let's find out how it's actually done when you need to come up with a lot of different names. For this part of the tutorial, you're given all of the code already, and we'll walk through what we're doing along the way. We're going to use the Faker library. As in our last example, we also want to set a seed for reproducibility. Faker has its own seeding method, so we'll use that and give it 42 as the integer, but again, any integer can be used here. First, we need to initialize the library, kind of like turning the ignition in a car to get it going and ready to generate fictitious names. We're then going to generate…