Anthropic reveals LLM vulnerability via malicious documents

This title was summarized by AI from the post below.

radical✦•25K followers

5mo

Another day, another LLM vulnerability: The team at Anthropic (the folks behind Claude) showed that a small number of samples is all it takes to poison an LLM of any size. > "As few as 250 malicious documents can produce a 'backdoor' vulnerability in a large language model—regardless of model size or training data volume. […] Even though our larger models are trained on significantly more clean data, the attack success rate remains constant across model sizes." What this means in practical terms is that large language models can be fairly easily backdoored; all it takes is a small stash of malicious documents in the training set. As AI companies are gobbling up data left, right, and center, it is close to impossible to ensure training data isn't tainted. ↗ https://lnkd.in/ggiD6VHn

A small number of samples can poison LLMs of any size anthropic.com

3 Comments

Sean Lemson, ACC CPCC

Motivated Outcomes, LLC•1K followers

5mo

This is looking more and more like a dot com bubble inflating every day.

1 Reaction

Arif K.

Ingenuitive Capital•2K followers

4mo

Not unlike the idea of human minds vulnerable to a small set of “mind viruses” when it comes to influence! LLMs are amazing metaphors for the mind.

1 Reaction

Susanne Siebrecht

Ecclesia Gruppe•867 followers

5mo

Good2Know 🤓

1 Reaction

See more comments

To view or add a comment, sign in

More Relevant Posts

Sohamm Kshirsagar

Tata Consultancy Services•994 followers
5mo
Report this post
Anthropic recently published a study: “A small number of samples can poison LLMs of any size.” https://lnkd.in/g9eQ8ccE The main points: Only a few hundred bad documents can make a large language model behave incorrectly. The model’s size or the amount of good data does not prevent this. Even small attacks can have a big effect. This means anyone working with AI needs to pay attention to the quality of training data. Small problems in the data can cause big issues in the model.

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Kunal Srivastava

Department of…•2K followers
5mo
Report this post
The Recursive Poison 🔄☣️: Turning Poison to Pandemic 🦠... Collapse to Contagion 💥➡️📈 🧠 This fundamentally shifts our understanding of AI security and data provenance 🔐 1️⃣ 📚 A study done jointly by Anthropic, the UK AI Security Institute, and The Alan Turing Institute tested how easily large language models (LLMs) can be “poisoned.” 2️⃣ 🎯 The aim was to see if adding a few bad or “poisoned” documents during training could secretly make a model behave in strange ways. 3️⃣ ☠️ Poisoning is when someone puts hidden triggers in training data so the model learns to act differently when it later sees a special phrase. 4️⃣ ⚙️ In this case, they used the fake command <SUDO> in some documents, followed by nonsense text... to teach the model to output gibberish whenever it later saw that phrase. 5️⃣ 🤖 Four models were tested — 600M, 2B, 7B, and 13B parameters — using normal (“Chinchilla-optimal”) amounts of data for each size. 6️⃣ 🔬 The researchers tried adding 100, 250, and 500 poisoned documents (out of billions of normal ones). 7️⃣ 📉 The result: Even the largest 13B model could be "backdoored" with as few as 250 poisoned documents. The model size or total data volume didn’t matter much. 8️⃣ 🚨 The surprising takeaway: Attack success depended on the absolute number of bad samples, not their percentage of the total dataset. 9️⃣ ⚠️ Though the attack only made the model produce gibberish, so it’s not dangerous as of now, but it proves such attacks are more practical than many believed. 🌐 Since LLMs learn from public internet text, anyone could post poisoned pages online. Even a few hundred could influence future models, so AI developers need stronger data-auditing and defense tools🛡️. 📄 Learn more in the full paper: https://lnkd.in/ghqFm3PM

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Georgios Georgiou

Agile Actors•330 followers
5mo
Report this post
Hey Network 👋 Did you know that #LLMs could be actually be poisoned ☠️ even by small training sample ? 🚨 **AI Security Alert** As we continue to advance large language model (#LLM) development, one growing concern is **data poisoning** — the risk that malicious actors might inject harmful or misleading text into training datasets. Once this data is “baked in,” it could lead to **undesired or even dangerous model behaviors**. 🧠 TLDR of Research results Recent findings show that even a *small malicious dataset* (just 250 documents!) can successfully **backdoor LLMs** ranging from **600M to 13B parameters**. It’s a stark reminder that as we scale systems relying on #AI, **robust data integrity and security practices** are more critical than ever. Read more on this thorough article by Anthropic 👇👇👇 ~ 15min https://lnkd.in/duRCXRJb Let’s give a shout out to Anthropic for keeping it real out there 🙌🙌🙌 Stay vigilant 🛡️

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Keshav R

HARMAN International•306 followers
5mo
Report this post
LLM Poisoning – The Silent Threat Behind “Smart” AI Models Most of us trust large language models (LLMs) to be accurate and reliable — but what if the data they learned from was poisoned on purpose? LLM poisoning happens when someone subtly injects harmful, biased, or misleading data into the training or retrieval process. The scary part? It doesn’t take much — even 0.001% of corrupted data can change how a model behaves without any visible drop in benchmark accuracy. 🧠 Recent research highlights just how real this risk is: Anthropic & AI Safety Institute found that just 250 poisoned documents were enough to implant a persistent backdoor into LLMs — and scaling up the model didn’t make it safer. 👉 Anthropic Research Mount Sinai researchers showed that poisoning medical datasets caused models to make harmful clinical suggestions, despite scoring well in tests. 👉 Medical LLM Poisoning Study ArXiv paper (2025): New work on Joint-GCG attacks showed that both the retriever and generator in RAG systems can be compromised — making poisoning even more effective. 👉 arXiv: Joint-GCG Poisoning TechXplore (2024): Poisoning biomedical knowledge graphs can mislead discovery systems by creating fake drug-disease links. 👉 TechXplore Coverage 🧩 Why this matters: Attacks are subtle and hard to detect — models often look fine until the right “trigger” appears. Poisoning isn’t just about misinformation; it’s about trust in systems that increasingly influence healthcare, legal, and financial decisions. As we integrate RAG and autonomous agents, the attack surface grows wider. 🔐 The path forward: Track data provenance — know where your training and retrieved data come from. Use human-in-the-loop validation for sensitive domains. Invest in faithfulness and source-alignment metrics when evaluating outputs. Adopt red-teaming and trigger testing as part of model evaluation. AI is getting smarter — but so are the attacks. Defending against data poisoning is becoming as critical as improving model accuracy itself. Links: https://lnkd.in/dHCfGnZz System Prompt Poisoning Study #AI #LLM #Security #DataIntegrity #MachineLearning #TrustworthyAI #RAG #ArtificialIntelligence

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Soumya Agarwal

IBM•1K followers
4mo
Report this post
🔒Beware of “LLM Poisoning” — The Silent Threat in AI We often assume that if a model has lots of clean training data, minor bad data will be drowned out. But a new study says otherwise. 🧠 What the paper found: Only ~250 malicious documents were enough to insert a backdoor into models from 600 M parameters to 13 B parameters, training on vastly different volumes of clean data. The key insight: It’s the absolute number of poisoned samples that mattered, not the percentage of training data that’s poisoned. Anthropic The test was a simple “denial-of-service / gibberish output” backdoor triggered by a phrase like <SUDO>. 📄 Read the full paper here: 👉https://lnkd.in/dBuzdKnF ✅ Why this matters for us (AI Engineers / GenAI practitioners): Even tiny volumes of poisoned data-injection could undermine model behaviour. If you’re doing fine-tuning, embedding ingestion, retrieval-augmented generation (RAG), or internal model training, the threat is real. We need to treat data hygiene & integrity as security-first, not just “data-quality.” 🎯 What we should do: ✔️ Ensure provenance & vetting of training/fine-tuning data. ✔️ Monitor model responses for trigger-based aberrant behaviour (e.g., gibberish or hidden backdoors). ✔️ Use guard-rails and anomaly-detection to see if model diverges under hidden cues. ✔️ Educate teams: poisoning isn’t just about “lots of bad data”—it can be few, well-placed documents. 💬 Have you seen or tested anything like this in your projects? #AI #GenAI #LLM #DataPoisoning #AIsecurity #ResponsibleAI #MachineLearning #ModelRisk #Watsonx #AIGovernance

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Generate AI Data

457 followers
5mo
Report this post
A few bad training data samples can poison a model. Literally. A new study by Anthropic researchers showed that injecting as few as 250 malicious documents into a training dataset is enough to create “backdoors” in language models of any size – from 600M to 13B parameters. This isn’t a rare edge case. It’s a wake-up call. If your training data is vulnerable, so is your model. At TrainAI by RWS, we treat data integrity as a frontline defense. Every dataset we deliver is built with human-in-the-loop QA and deep review because real trust in AI starts with knowing your data is clean. 👉 Read the full article here: https://hubs.ly/Q03NgBvw0 #AI #DataPoisoning #LLM #AIIntegrity #TrainAI #ModelSecurity #ResponsibleAI

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Raghuraman Ramamurthy

Rapid Acceleration Partners•8K followers
5mo
Report this post
𝗜𝘁 𝗼𝗻𝗹𝘆 𝘁𝗮𝗸𝗲𝘀 𝟮𝟱𝟬. That’s how many malicious documents researchers found are enough to “poison” a large language model, regardless of whether it’s a smaller 600M model or a massive 13B model. This challenges a big assumption: that attackers need to control a percentage of training data. Turns out, they might just need a tiny, fixed amount. Why does it matter? Because anyone can publish content on the internet. And if just a few hundred carefully crafted documents slip into a training set, an AI system could be backdoored, triggered by a secret phrase to behave in unintended ways. The good news? This research doesn’t stop at pointing out the risk. It calls for stronger defenses and smarter guardrails so that poisoning attacks don’t become a practical threat as models scale and adoption widens. The bigger lesson: securing AI isn’t only about size or speed, it’s about resilience. Read more here: https://lnkd.in/gak9UB8w #AI #ArtificialIntelligence #AISecurity #AIAlignment #ResponsibleAI #FutureOfAI

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Ranjan kumar

Ajna View by Vignan Corp•859 followers
5mo Edited
Report this post
I’ve been reading up on something called LLM poisoning, and it’s both fascinating and a bit worrying. In simple terms, LLM poisoning happens when someone slips malicious content into the data that large language models are trained on — for example, blog posts or websites. The idea is that when the model learns from that data, it also picks up hidden “triggers” or behaviors. Until now, most people thought an attacker would need to control a large percentage of the training data — maybe tens or hundreds of thousands of poisoned samples — to actually make a dent in these massive models. That assumption made poisoning seem unrealistic.But this new joint study from Anthropic, the UK AI Security Institute, and The Alan Turing Institute really changes that picture. They found that as few as 250 poisoned documents can successfully “backdoor” a model, even one with 13 billion parameters trained on huge amounts of data. In their test, the backdoor made the model spit out gibberish when it saw a trigger phrase like <SUDO>. Not a dangerous behavior, but the results are telling — it didn’t matter how big the model was or how much clean data it had. The same small number of poisoned samples worked across the board. If that pattern holds for more serious attacks, it means poisoning might be way easier to pull off than anyone expected. Really interesting work that highlights how fragile even large models can be if the training data isn’t properly safeguarded. Full article: https://lnkd.in/gKeGfqb6 #CyberSecurity #ReactJs #Nextjs #WebDevelopment #SecurityAlert #RCE #CVE

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Shashwath Bhaskar

UST•804 followers
5mo Edited
Report this post
New joint study by Anthropic, the UK AI Security Institute and the Alan Turing Institute shows that as few as 250 malicious documents (web texts) can backdoor large language models, regardless of model size or total training data. A 13B parameter model trained on over 20 times more data than a 600M model was equally vulnerable to the same small set of poisoned documents. The researchers used a denial of service style backdoor that makes models output gibberish when they see a specific trigger phrase such as SUDO in this test. The key finding is that the absolute count of poisoned examples, not their fraction of the dataset, determines attack success. This makes data poisoning more practical than previously believed and underscores the urgent need for defenses that can detect or mitigate a small number of malicious documents in large training corpora. And Anthropic just publicly accepted that they used commercial data available on the web to train modals after all LLM companies denied / said nothing about the source or licensing for that data. article :- https://lnkd.in/gimMVH-j #genai #anthropic #gpt #cluade #agent #agenticsoftware #ai

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in
Yakov Beder

Red Hat•3K followers
5mo Edited
Report this post
Anthropic’s latest research, “A small number of samples can poison LLMs of any size,” explores a surprising vulnerability in large language models. The team found that injecting only a few hundred malicious samples into a training set can subtly alter a model’s behavior - even at massive scales. It’s a fascinating look at how small, targeted data can have an outsized impact, and why securing training pipelines matters more than ever. Highly recommended for anyone curious about AI safety and data integrity. Read it here: https://lnkd.in/djAWRFYu

A small number of samples can poison LLMs of any size anthropic.com
Like Comment
To view or add a comment, sign in

25,461 followers

View Profile Connect

Anthropic reveals LLM vulnerability via malicious documents

More from this author

AI and the Lost Sense of Wonder, Curiosity, and Intrigue

Unlock Your Past to Define Your Future

From Pyramids to Hourglasses

Explore content categories

Anthropic reveals LLM vulnerability via malicious documents

More Relevant Posts

More from this author

AI and the Lost Sense of Wonder, Curiosity, and Intrigue

Unlock Your Past to Define Your Future

From Pyramids to Hourglasses

Explore related topics

Explore content categories