From the course: Securing Generative AI: Strategies, Methodologies, Tools, and Best Practices
Understanding training data poisoning attacks
- Let's go over training data poisoning. Training data poisoning is a type of attack that targets the learning process of a machine learning or AI model, including deep learning systems and large language models. The core idea is simple yet potentially devastating: by manipulating the training data, an attacker can influence the behavior of the resulting model, and therefore its inference and output, in ways that range from subtle to catastrophic.

Let's break down the key elements of a training data poisoning attack. The first is the attacker's objective. The goal may be to degrade the model's overall performance, to introduce specific vulnerabilities, or to create backdoors that can be exploited later. The method is introducing carefully crafted malicious data points into the training dataset so they manipulate the model. And the timing: these attacks can occur at different stages, during data collection, data processing, data labeling, or even model updates in online learning scenarios.

One technique is label flipping. An attacker changes the labels of training samples, so in an image classification task, pictures of cats may be labeled as dogs. A more realistic scenario: suppose an AI system is trying to classify whether an email is spear phishing or spam. An attacker who can manipulate that training data can shift the model's behavior so that a specific detection technique can be bypassed.

Another technique is data injection, which involves adding entirely new malicious data points to the training dataset. In some cases that is difficult to do, but in others, especially where models and datasets are exchanged in open source environments, it can be devastating, and as a matter of fact it has already happened in many situations.

A third technique is data modification, where existing data points are subtly altered to influence the model's decision boundaries. The attacker can take this further by introducing a specific pattern, or trigger, into some training samples along with an incorrect label. The model learns to associate the trigger with that label, creating a backdoor that can be exploited later.

Defending against data poisoning attacks requires a multifaceted approach. The first element is data sanitation: implementing robust processes to verify and clean the data whenever you collect and label it. Starting with collection, make sure you are using datasets that are appropriate for your training environment. Many companies now buy datasets from third parties, so ensure the integrity of your data pipeline, use trusted sources, and implement strict quality control in your labeling processes as well.

There are also ways to limit the influence of any single training sample or dataset, including differential privacy techniques. Ensemble models can be trained on different subsets of the data, which helps contain the impact of poisoning to a single subset. And finally, continuously test your models. That's the number one thing at the end of the day: performing a continuous assessment of the AI model.
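To make the label-flipping idea concrete, here is a minimal sketch using a synthetic binary classification dataset and scikit-learn. The flip rate, model, and data are illustrative assumptions, not anything taken from the course.

```python
# Minimal sketch of a label-flipping attack on synthetic data (assumed example).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Attacker flips the labels of 20% of the training samples
# (e.g., "spam" relabeled as "not spam").
rng = np.random.default_rng(0)
flip_idx = rng.choice(len(y_train), size=int(0.2 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("poisoned accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))
```

With enough flipped labels, the poisoned model's test accuracy typically drops noticeably relative to the clean one.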
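The trigger-based backdoor is also easier to see in code. The sketch below, again an assumed example built on synthetic tabular data, stamps a trigger (an out-of-range value in one feature) onto a small fraction of training samples and relabels them with the attacker's target class.

```python
# Minimal sketch of a backdoor (trigger) poisoning attack on tabular data.
# In an image task the trigger might be a small pixel patch instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

def add_trigger(samples):
    """Stamp the attacker's trigger onto a batch of samples."""
    triggered = samples.copy()
    triggered[:, 0] = 10.0  # feature 0 forced to an out-of-range value
    return triggered

# Poison 5% of the training set: add the trigger and force the label to 1.
rng = np.random.default_rng(1)
poison_idx = rng.choice(len(X_train), size=int(0.05 * len(X_train)), replace=False)
X_poisoned, y_poisoned = X_train.copy(), y_train.copy()
X_poisoned[poison_idx] = add_trigger(X_poisoned[poison_idx])
y_poisoned[poison_idx] = 1

model = RandomForestClassifier(random_state=1).fit(X_poisoned, y_poisoned)

# The model may still look fine on clean inputs...
print("accuracy on clean test data:", model.score(X_test, y_test))
# ...while this line measures how often the trigger flips predictions
# to the attacker's chosen class.
print("fraction of triggered inputs classified as class 1:",
      (model.predict(add_trigger(X_test)) == 1).mean())
```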
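On the defensive side, data sanitation can start with very simple checks. The following is an assumed sketch of a sanitization step: it rejects unexpected labels and non-finite values, drops exact duplicates, and filters statistical outliers; the contamination rate and the use of IsolationForest are placeholders you would tune for your own pipeline.

```python
# Minimal sketch of a data-sanitation step run before any training job (assumed example).
import numpy as np
from sklearn.ensemble import IsolationForest

def sanitize(X, y, expected_labels, contamination=0.02):
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    # 1. Reject records with unexpected labels or non-finite feature values.
    valid = np.isin(y, expected_labels) & np.isfinite(X).all(axis=1)

    # 2. Drop exact duplicate rows, which can indicate injected data.
    _, unique_idx = np.unique(X, axis=0, return_index=True)
    dedup = np.zeros(len(X), dtype=bool)
    dedup[unique_idx] = True
    valid &= dedup

    # 3. Flag statistical outliers that may be crafted poison points.
    inliers = IsolationForest(
        contamination=contamination, random_state=0
    ).fit_predict(X[valid]) == 1
    keep = np.flatnonzero(valid)[inliers]
    return X[keep], y[keep]
```

In a real pipeline you would run something like `sanitize(X_raw, y_raw, expected_labels=[0, 1])` on every incoming batch, whether it comes from your own collection process or from a third-party dataset.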
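One way to limit the influence of any single sample or data source is to train an ensemble of models on disjoint subsets of the data and aggregate their votes, so a poisoned subset can only corrupt one member. The sketch below is a simplified, assumed illustration of that idea; differential privacy (for example, DP-SGD) is a complementary approach that bounds per-sample influence during training and is not shown here.

```python
# Minimal sketch of partition-based ensemble training (assumed example).
# Each model only ever sees one disjoint slice of the data, so a poisoned
# slice can influence at most one vote.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def train_partition_ensemble(X, y, n_partitions=5, base_model=None):
    base_model = base_model or LogisticRegression(max_iter=1000)
    rng = np.random.default_rng(0)
    order = rng.permutation(len(X))
    models = []
    for part in np.array_split(order, n_partitions):
        models.append(clone(base_model).fit(X[part], y[part]))
    return models

def predict_majority(models, X):
    # Majority vote across the per-partition models (assumes binary 0/1 labels).
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```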
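Finally, continuous testing can be as simple as gating every retrained model on a trusted, curated evaluation set that the training pipeline never touches. The function below is an assumed sketch; the metric, baseline, and tolerance are placeholders.

```python
# Minimal sketch of a continuous-assessment gate for retrained models (assumed example).
from sklearn.metrics import accuracy_score

def assess_model(model, X_trusted, y_trusted, baseline_accuracy, tolerance=0.02):
    """Fail the pipeline if the new model regresses below the trusted baseline."""
    accuracy = accuracy_score(y_trusted, model.predict(X_trusted))
    if accuracy < baseline_accuracy - tolerance:
        raise RuntimeError(
            f"Model accuracy {accuracy:.3f} regressed below baseline "
            f"{baseline_accuracy:.3f}; possible data poisoning or pipeline issue."
        )
    return accuracy
```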
Now, in the next section, we'll go over denial of service attacks against AI implementations.