From the course: Protecting Data for Analysis and Machine Learning
Perturbation
- [Instructor] Next in the list: perturbation. It's a bit of an unusual word, and it's also a bit of an unusual technique. Think of those "I Spy" or "Where's Waldo?" books, where the goal is to find hidden objects in a cluttered scene. Perturbation works best on numerical data because it rounds values and adds a bit of random noise, which makes it harder to find or identify individual data points. Salary information is a good example of data you can apply this technique to. Perturbation introduces variability into individual values while largely preserving the data's overall statistical properties, such as its mean and spread. Because this technique generates random noise, we first want to set a seed. That way we get the same values every time we run the code, which is best practice for this kind of work. To set the seed, we need the NumPy library. Setting the seed is easy: we just call np.random.seed and pass it any integer value. This…
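Here is a minimal sketch of the steps described above: set a seed, round the values, then add random noise. It assumes the salaries are stored in a NumPy array. The salary values, the `perturb` helper, and its `round_to` and `noise_scale` parameters are hypothetical choices for illustration and are not taken from the course. The instructor's actual function may round and scale the noise differently.

```python
import numpy as np

# Set the seed so the random noise is identical on every run
np.random.seed(42)

# Hypothetical salary values, for illustration only
salaries = np.array([52000, 61500, 58250, 75000, 90500, 47800], dtype=float)


def perturb(values, round_to=1000, noise_scale=0.05):
    """Round values to the nearest `round_to`, then add zero-mean Gaussian noise.

    Hypothetical helper: `round_to` and `noise_scale` are illustrative
    assumptions. `noise_scale` sets the noise standard deviation as a
    fraction of the data's own standard deviation.
    """
    # Rounding step (note: np.round rounds exact halves to the nearest even value)
    rounded = np.round(values / round_to) * round_to
    # Noise step: zero-mean noise, so the overall mean stays roughly the same
    noise = np.random.normal(loc=0.0, scale=noise_scale * values.std(), size=values.shape)
    return rounded + noise


perturbed = perturb(salaries)

# Compare summary statistics before and after perturbation
print("original mean:", salaries.mean(), "std:", salaries.std())
print("perturbed mean:", perturbed.mean(), "std:", perturbed.std())
```

Resetting the seed to the same value before calling `perturb` again reproduces exactly the same perturbed values. This is the reproducibility the seed is meant to provide. NumPy's newer `np.random.default_rng(seed)` Generator API is an alternative way to get the same guarantee.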