🔐🤖 AI Reboot: Delving into the Revolutionary World of Machine Unlearning 🌐🔄
🔍 I recently read this research paper on "Machine Unlearning," also called "Blackbox Forgetting," by Chunxiao Li, Haipeng Jiang, Jiankang Chen, Yu Zhao, Shuxuan Fu, Fangming Jing, and Yu Guo (2024) in High-Confidence Computing, and it got me excited about the cutting-edge advancements safeguarding our data in the AI-driven world.
But what exactly is "Machine Unlearning"? Machine Unlearning is the process by which AI models are made to forget specific pieces of data or whole classes of information. The concept is pivotal for addressing privacy concerns: it ensures that user data can be completely removed from models, in compliance with regulatory frameworks like the GDPR and the "Right to be Forgotten." It also optimizes AI models by eliminating irrelevant data, leading to improved accuracy, efficiency, and reduced bias in ML applications.
Open Challenges:
➼ Uniform Benchmarking: Standardized benchmarks are needed to evaluate the effectiveness of unlearning algorithms across different models and applications.
➼ Interpretable Unlearning: Methods that explain the unlearning process are needed to ensure transparency and trust in AI systems.
Key Insights:
➼ Privacy at the Core: With privacy concerns soaring, Machine Unlearning is gaining tremendous traction. It is a strategic response to the "Right to be Forgotten," allowing models to shed specific data and thereby comply with robust privacy laws like the GDPR.
➼ Innovative Paradigm: Working through security, usability, and accuracy requirements, the authors dissect the complexities and propose innovative solutions. Imagine models that can erase the impact of adversarial attacks, mitigate bias, and forget outdated information, transforming AI into a more secure and fair technology.
➼ Technical Challenges and Breakthroughs: In stochastic, incremental training, each data point influences how the model handles all subsequent inputs, which makes removal hard; the authors tackle this through methodologies such as differential privacy, statistical query learning, and more.
➼ Diverse Applications: From ensuring fairness in predictive policing to enhancing the precision of healthcare diagnostics, Machine Unlearning paves the way for safer and more accurate machine learning deployments.
Link to the paper: https://lnkd.in/gctF_vpM
#MachineLearning #AI #DataPrivacy #RightToBeForgotten #TechInnovation #EthicalAI #HighConfidenceComputing #FutureTech #Research #DataSecurity
Suppression and Unlearning Methods in Data Privacy
Summary
Suppression and unlearning methods in data privacy are techniques used in artificial intelligence to remove or hide sensitive information from AI models or datasets, ensuring compliance with privacy laws and user requests like the "right to be forgotten." These strategies aim to either erase the influence of particular data or prevent its use in model training, balancing privacy needs with maintaining AI utility.
- Prioritize data screening: Proactively filter and exclude sensitive or confidential data from AI training to minimize privacy risks and avoid the challenge of removing it later.
- Integrate privacy tools: Use consent systems, privacy-preserving techniques, and secure data storage to help manage and control access to protected information throughout AI operations.
- Assess unlearning limits: Regularly evaluate and communicate the limitations of unlearning methods, making sure your team understands that these techniques cannot guarantee complete data removal and should not replace robust data governance.
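The first recommendation, proactive data screening, can be sketched as a simple pre-training filter. A minimal illustration, assuming regex-based PII patterns; a production pipeline would use a dedicated PII-detection tool and human review rather than three regexes:

```python
import re

# Hypothetical patterns for illustration only.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),       # card-number-like digit run
]

def screen_record(text: str) -> bool:
    """Return True if the record looks safe to keep for training."""
    return not any(p.search(text) for p in PII_PATTERNS)

def screen_corpus(records):
    """Drop records that match any PII pattern before training starts."""
    return [r for r in records if screen_record(r)]
```

Screening before training sidesteps the much harder problem of removing the same data afterwards.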
-
📢 #NewPaperAlert 🚨 Thrilled to share our new work "Shadow Unlearning: A Neuro-Semantic Approach to Fidelity-Preserving Faceless Forgetting in LLMs"
TL;DR: Traditional unlearning requires sharing retain and forget data, which raises serious privacy concerns around PII. While anonymization improves privacy, it often hurts model utility and leads to ambiguous behavior. We introduce Shadow Unlearning, which enables effective unlearning directly on anonymized data. Our method, Neuro-Semantic Projector Unlearning (NSPU), is computationally efficient and preserves model utility, achieving a practical balance between privacy, utility, and efficiency.
Key contributions:
+ We introduce a novel task in privacy-preserving unlearning, Shadow Unlearning, which enables selective unlearning on anonymized forget data.
+ We propose Neuro-Semantic Projector Unlearning (NSPU), a novel frozen-target approach that enables effective unlearning on anonymized forget data while achieving a strong balance among utility, efficiency, and privacy compared to state-of-the-art methods.
+ We compile the MuFU forget dataset across five domains: Digital informatics, Sports, Politics, Science & technology, and Finance.
+ We introduce a suite of evaluation metrics designed to measure the trade-off between knowledge retention and unlearning efficacy.
Key findings:
+ NSPU consistently outperforms gradient-based and preference-based unlearning methods across multiple LLM families, achieving effective forgetting while preserving utility.
+ Operating on anonymized data, NSPU shows higher resistance to membership inference attacks, as evidenced by improved separation quality scores.
+ NSPU requires no fine-tuning of the target model and is approximately 10× faster than existing unlearning methods and 10⁶× more efficient than full retraining.
+ NSPU precisely removes entity- and domain-specific knowledge while generalizing well across different model sizes and architectures.
Work done w/ Dinesh Srivasthav, Ashok Urlana, Rahul Mishra, Bala Mallikarjunarao Garlapati 📄Preprint, 💻code, X thread and SARALAI (in Telugu) link in the comments. #LLMs #NLP #AIResearch #MachineUnlearning #ProfGiri /c Precog-at-IIITH
-
"Online Learning and Unlearning" by Yaxi Hu, Bernhard Schölkopf, Amartya Sanyal "We formalize the problem of online learning-unlearning, where a model is updated sequentially in an online setting while accommodating unlearning requests between updates. After a data point is unlearned, all subsequent outputs must be statistically indistinguishable from those of a model trained without that point. We present two online learner-unlearner (OLU) algorithms, both built upon online gradient descent (OGD). The first, passive OLU, leverages OGD's contractive property and injects noise when unlearning occurs, incurring no additional computation. The second, active OLU, uses an offline unlearning algorithm that shifts the model toward a solution excluding the deleted data. Under standard convexity and smoothness assumptions, both methods achieve regret bounds comparable to those of standard OGD, demonstrating that one can maintain competitive regret bounds while providing unlearning guarantees." Paper: https://lnkd.in/d3RuTeFZ #machinelearning
-
Introducing CLEAR: A Game-Changer in AI Privacy and Unlearning 🚀
🚀 New Research Alert on #HuggingFace 🚀
🔗 My New Article: https://lnkd.in/gBYa_uAC
As AI continues to permeate every aspect of our lives, one question keeps me up at night: how can we ensure our AI models can "forget" specific data when users request it? Enter CLEAR, the first comprehensive benchmark for evaluating how effectively AI models can unlearn both visual and textual information. This isn't just a technical milestone; it's a pivotal step toward building trust and integrity in our AI products.
🔍 Why CLEAR Matters:
- Multimodal Testing: It evaluates unlearning across both text and images simultaneously, essential for modern AI applications.
- Standardized Metrics: Provides clear, reproducible benchmarks to measure unlearning effectiveness.
- Real-World Validation: Ensures that models retain their functionality on practical tasks after unlearning.
💡 Key Findings:
- Effectiveness of L1 Regularization: Simple mathematical constraints during unlearning significantly improve results, especially when combined with Large Language Model Unlearning (LLMU).
- Performance Trade-offs: Different unlearning methods vary in forgetting accuracy, knowledge retention, and computational efficiency.
🛠 Implementation Guide:
- Assessment Phase: Use CLEAR to benchmark your current unlearning capabilities and identify gaps.
- Strategy Development: Choose the right unlearning method based on your specific needs (e.g., SCRUB for balanced forgetting and retention, IDK Tuning for maintaining model utility, LLMU for large-scale applications).
- Implementation Planning: Consider resource requirements, timelines, and integration with existing privacy frameworks.
🌐 Real-World Applications:
- User Privacy Management: Responding to "right to be forgotten" requests effectively without compromising overall model performance.
- Content Moderation and Compliance: Removing harmful or sensitive data while preserving the utility of the AI model.
- Medical Data Applications: Selectively forgetting patient data upon request, crucial for healthcare compliance.
Looking ahead, integrating CLEAR into our AI development processes isn't just about compliance; it's about staying ahead of the curve, enhancing user trust, and positioning privacy as a competitive advantage.
🔗 For Product Managers: https://lnkd.in/gBYa_uAC
🔗 Learn more about CLEAR here: https://lnkd.in/gJa85Xmf
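The L1-regularization finding can be illustrated with a toy unlearning step: gradient ascent on a forget example (in the spirit of LLMU-style objectives) plus an L1 penalty pulling weights back toward their original values, which keeps the edit sparse. The model, constants, and function name below are illustrative, not taken from the CLEAR paper:

```python
import numpy as np

def unlearn_step_l1(w, w0, x, y, lam=0.01, lr=0.1):
    """One toy unlearning step on a linear model with squared loss:
    gradient *ascent* on the forget example (pushing the model away
    from fitting it), plus an L1 penalty toward the original weights
    w0 so only a few parameters move far from their trained values."""
    grad_forget = 2 * (w @ x - y) * x   # gradient of the forget loss
    grad_l1 = lam * np.sign(w - w0)     # subgradient of ||w - w0||_1
    return w + lr * grad_forget - lr * grad_l1
```

The intuition matches the benchmark's finding: the sparsity constraint limits collateral damage to knowledge the model should retain.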
-
Real footage showing AI companies trying to remove personal data from the AI training dataset to avoid compliance actions.
When someone requests that their personal data be removed from an AI model - say, under GDPR or similar laws - it might sound as simple as hitting "delete." But in reality, it's anything but. Unlike traditional databases, AI models don't store data in rows or cells. They're trained on massive amounts of text, and the information gets distributed across billions of parameters - like trying to remove a single ingredient from a baked cake. Even if you know the data made it in, there's no obvious way to trace where or how it shaped the model's behavior. And while you could retrain the entire model from scratch, that's rarely practical - both financially and technically.
That's where the concept of machine unlearning comes in: the idea of surgically removing specific knowledge from a model without damaging the rest of it. It's still early days, but researchers are making headway. Meanwhile, companies are trying a few approaches:
- Filtering out personal data before training even starts
- Building opt-out systems and better consent mechanisms
- Using techniques like differential privacy to avoid memorization
- Adding filters to stop models from revealing sensitive outputs
The tension here is real: how do we build powerful AI systems while honoring people's right to privacy? Solving this challenge isn't just about regulatory compliance - it's about building trust. Because the moment AI forgets how to forget, the public stops forgiving.
#innovation #technology #future #management #startups
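The differential-privacy approach in that list boils down to making sure no single training example leaves a distinct footprint: clip each per-example gradient, then add noise. A rough DP-SGD-style sketch for a linear model, with illustrative `clip` and `sigma` values (real systems calibrate the noise to a target (epsilon, delta) privacy budget):

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_sgd_step(w, X, Y, lr=0.1, clip=1.0, sigma=0.5):
    """One DP-SGD-style update for a linear model with squared loss:
    clip each per-example gradient to norm <= clip, average, then add
    Gaussian noise so the update reveals little about any one example."""
    grads = []
    for x, y in zip(X, Y):
        g = 2 * (w @ x - y) * x                     # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip)  # clip to norm <= clip
        grads.append(g)
    g_bar = np.mean(grads, axis=0)                  # average clipped gradients
    noise = rng.normal(0.0, sigma * clip / len(X), size=w.shape)
    return w - lr * (g_bar + noise)                 # noisy update
```

Because memorization is suppressed during training, there is less to "unlearn" later, which is exactly why it appears on the list of preventive measures.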
-
🎃 Why "Unlearning" Is No Alternative to Responsible Data Curation, And May Even Increase Your Risk
📍 As organisations move fast to adopt generative AI, many rely on machine unlearning as a safety strategy: the idea that sensitive data can be "removed" from a model after training.
🚨 But a new study, "Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM" (Wu et al., 2025), proves that theory wrong:
👉 Unlearning is neither reliable nor safe.
👉 In several cases, it actually increases the risk of data leakage.
🔎 What the researchers found (in simple terms):
👉 Models that underwent "unlearning" could still recall sensitive information.
👉 Some unlearning methods made the model more likely to leak memorised data.
👉 The effects were inconsistent and unpredictable, a serious governance problem.
👉 No tested method provided meaningful guarantees that the forgotten data was truly gone.
✅ The practical takeaway: unlearning cannot be treated as a compliance tool. It is not a substitute for proper data governance, nor a fallback when things go wrong.
📌 What organisations should do instead:
👉 Keep sensitive or confidential data out of LLM training entirely. This is the most effective risk-reduction strategy.
👉 Store and manage protected data in structured, controlled environments such as Knowledge Graphs or secure databases.
👉 For internal use, deploy AI through RAG pipelines or direct retrieval from the Knowledge Graph, rather than baking sensitive information into the model weights.
👉 If you run your own models, treat training data as immutable: what goes in cannot safely be removed later.
🎯 Bottom Line: Organisations face legal liability for data leakage, whether through inadvertent model memorisation or downstream misuse. The idea that you can "untrain" an AI model cannot be upheld.
🔗 to the paper in the comments
#artificialintelligence #data #privacy #risk #governance
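The RAG recommendation above can be sketched in a few lines: sensitive facts stay in a governed store and are fetched at query time instead of being baked into model weights. A toy keyword-overlap retriever stands in here for a real Knowledge Graph lookup or vector search; all names and the example corpus are illustrative:

```python
def retrieve(query, knowledge_base, k=2):
    """Toy retriever: rank documents in a governed store by keyword
    overlap with the query and return the top k. In production this
    would be a Knowledge Graph query or vector similarity search."""
    q_terms = set(query.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(q_terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, knowledge_base):
    """Assemble a RAG prompt: retrieved context plus the user question.
    The sensitive data never enters training; it can be deleted from
    the store at any time, restoring normal data-governance controls."""
    context = "\n".join(retrieve(query, knowledge_base))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Deleting a record from the store immediately removes it from all future answers, which is precisely the guarantee unlearning fails to provide.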
-
AI and Privacy: From "Unlearning" to "Forgotten by Design"
Most AI privacy discussions focus on machine unlearning: removing data from a model after training. But what if we could design AI systems so that sensitive data is never fully memorized in the first place? That is the promise of Forgotten by Design, a new approach introduced by Brännvall et al. (2025) in their technical report "Forgotten-by-Design: Targeted Obfuscation for Machine Learning."
What's new?
- Instead of expensive retraining or approximations, the model proactively obfuscates vulnerable training data during learning.
- It reduces the risk of sensitive information leaking while preserving performance on non-sensitive data.
- It flips the paradigm: forgetting is no longer an afterthought, but a design principle.
Why this matters: As regulators sharpen requirements (think GDPR's right to be forgotten, or upcoming AI laws), organizations will need proactive, verifiable privacy safeguards. Building forgetfulness into AI could become a compliance baseline, and a trust differentiator. This feels like a pivotal step in AI governance: embedding privacy into the architecture of our models, not just patching it afterwards.
📖 Read the paper here: https://lnkd.in/eKcd-g5y
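As a rough intuition for targeted obfuscation (an illustration of the idea only, not the Brännvall et al. procedure): perturb just the training samples flagged as privacy-vulnerable, leaving the rest untouched so utility on non-sensitive data is preserved:

```python
import numpy as np

rng = np.random.default_rng(7)

def obfuscate_vulnerable(X, vulnerable_idx, sigma=0.5):
    """Sketch of targeted obfuscation: add Gaussian noise only to the
    rows of X listed in vulnerable_idx, before or during training.
    The flagging of vulnerable samples is assumed to come from a
    separate privacy-risk analysis."""
    X = X.copy()
    X[vulnerable_idx] += rng.normal(0.0, sigma, size=X[vulnerable_idx].shape)
    return X
```

Because only flagged rows are perturbed, the privacy-utility trade-off is paid exactly where the leakage risk is, which is the design principle the report advocates.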
-
A new analysis from the Future of Privacy Forum questions assumptions about how Large Language Models handle personal data. Yeong Zee Kin, CEO of the Singapore Academy of Law and FPF Senior Fellow, states that LLMs are fundamentally different from traditional information storage systems because of their tokenization and embedding processes.
The technical breakdown may be important for legal compliance: during training, personal data is segmented into subwords and converted into numerical vectors that lose the "association between data points" needed to identify individuals. While LLMs can still reproduce personal information through "memorization" when data appears frequently in training sets, Kin argues this is different from actual storage and retrieval.
The piece offers practical guidance for AI developers and deployers, recommending techniques such as pseudonymization during training, machine unlearning for trained models, and output filtering for deployed systems. For grounding with personal data, the author suggests using Retrieval Augmented Generation with trusted sources rather than relying on model training.
This technical perspective could reshape how product counsel assesses data protection obligations for AI systems. Rather than assuming LLMs "store" personal data like databases do, teams need nuanced approaches that account for how these models actually process and reproduce information.
Published by Future of Privacy Forum, authored by Yeong Zee Kin. https://lnkd.in/g6v-yu52
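The pseudonymization technique recommended above can be sketched as a stable keyed-hash mapping, so the same person gets the same placeholder everywhere in the corpus without the name itself entering training. The key, the tag format, and the explicit name list are all illustrative; in practice the names would come from a named-entity-recognition step and the key would be managed like any other secret:

```python
import hashlib

def pseudonymize(text, names):
    """Replace each known name with a stable pseudonym derived from a
    keyed hash. The same name always maps to the same tag, preserving
    cross-document consistency while hiding the identity."""
    key = b"rotate-me"  # placeholder secret, not a real key
    for name in names:
        tag = "PERSON_" + hashlib.sha256(key + name.encode()).hexdigest()[:8]
        text = text.replace(name, tag)
    return text
```

Consistent tags let the model still learn that "the same entity did X and Y" while the mapping back to a real person stays outside the training data.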
-
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
Chengcan Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun
Peking University, Tsinghua University
https://lnkd.in/dskukfkP
https://lnkd.in/dEAD52jb
Large language models (LLMs), the AI systems behind tools like chatbots and copilots, are incredibly powerful, but they also come with risks. Once these models learn something harmful (like how to make a dangerous substance), that knowledge gets deeply embedded inside them. Current methods for "unlearning" bad information usually just push it into the background, suppressing it rather than truly erasing it. The problem is that with the right prompts, that hidden knowledge can resurface, a kind of "relearning attack."
A new approach called Metamorphosis Representation Projection (MRP) aims to fix this. Instead of merely suppressing harmful data, it uses a mathematical technique that irreversibly projects the model's internal representations. Think of it like reshaping the memory of the AI so that harmful information is permanently scrubbed out, while useful skills remain intact. In experiments, this method not only prevented harmful knowledge from reappearing but also preserved the model's overall abilities. In other words, MRP offers a way for AIs to "forget responsibly", ensuring safety without sacrificing intelligence.
#AI #machinelearning #AIAlignment
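A toy analogue of the projection idea (not the actual MRP construction): build a projection matrix onto the orthogonal complement of a "harmful" direction and apply it to hidden states. The component along that direction is gone for good, because a projection is not invertible:

```python
import numpy as np

def projection_away(u):
    """Projection matrix onto the orthogonal complement of direction u.
    Applying it removes the component of any vector along u, and since
    the operation is rank-deficient, that component cannot be recovered."""
    u = u / np.linalg.norm(u)
    return np.eye(len(u)) - np.outer(u, u)

harmful_dir = np.array([1.0, 1.0, 0.0])   # illustrative 'harmful' direction
P = projection_away(harmful_dir)
h = np.array([2.0, 0.0, 3.0])             # a toy hidden representation
h_clean = P @ h                            # -> [1., -1., 3.]
```

Note that `P @ P == P`: applying the projection again changes nothing, which is the linear-algebra sense in which the removal is irreversible rather than merely suppressed.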
-
Google DeepMind released a research paper on Machine Unlearning (MuL) a couple of weeks ago. This is something that anyone who is an AI policy maker should read. Machine unlearning (MuL) is gaining traction because it promises to erase unwanted data from AI models, and this is crucial for privacy, copyright, and safety.
1. But where did this concept come from, and what is its purpose?
↳ Inspired by GDPR's "right to be forgotten."
↳ Aimed at removing personal data from AI models.
2. So what's new with MuL?
↳ Now includes generative AI nuances (deep learning, generative component).
↳ Must address and cover privacy, copyright, and safety concerns.
3. What makes it complex?
↳ ML models store data in complex ways.
↳ Specific data points can't be easily removed.
4. Is retraining a solution?
↳ Requires excluding problematic data.
↳ Computationally expensive process.
5. How do you manage the dual nature of unlearning?
↳ Back-end: Remove specific training data.
↳ Front-end: Suppress undesirable outputs.
6. What types of data are targeted?
↳ Observed Information: Direct data.
↳ Latent Information: Inferred data.
↳ Higher-Order Concepts: Abstract knowledge.
7. OK, so what are the methods of unlearning?
↳ Retraining from Scratch: The "Gold Standard."
↳ Structural Removal: Alters model parameters.
↳ Output Suppression: Filters undesirable content.
8. What makes this difficult for policy aspirations?
↳ Mismatches with policy aspirations.
↳ Over-inclusive and under-inclusive data removal.
↳ Challenges in preventing leakage of related private information.
↳ Inferred data points complicate privacy measures.
↳ Difficulty in capturing "substantial similarity."
↳ Removing data may not stop similar outputs.
↳ Removing "unsafe" knowledge is challenging.
↳ Safety risks often come from latent information.
So what is the outcome of the research on MuL today?
1 - There are no general-purpose solutions, and
2 - The flexibility of AI models complicates unlearning.
What we know is:
- Machine unlearning holds promise but requires more research.
- Understanding its capabilities and limitations is vital.
- Clear communication between experts and policymakers is needed.
Follow for more insights like this.
#AI #Policy #AIRegulation #MachineLearning #GenerativeAI #GenAI #DeepLearning #GoogleDeepMind #Research
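One concrete way the "retraining from scratch" gold standard is made affordable in practice, not discussed in the post but a well-known exact-unlearning strategy (SISA-style sharding), is to train an ensemble over data shards so a deletion only forces retraining of the one shard that held the point. A toy sketch with the mean of a shard standing in for a trained model:

```python
import numpy as np

def fit_mean(shard):
    """Stand-in 'model' for one shard: the mean of its values."""
    return float(np.mean(shard)) if shard else 0.0

def train_sharded(data, n_shards=3):
    """Split data into shards and train one model per shard."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    return shards, [fit_mean(s) for s in shards]

def unlearn(shards, models, value):
    """Exact unlearning: delete the point and retrain only the shard
    that contained it, instead of the whole ensemble."""
    for i, s in enumerate(shards):
        if value in s:
            s.remove(value)
            models[i] = fit_mean(s)
            break
    return shards, models

def predict(models):
    """Aggregate the shard models (simple average here)."""
    return float(np.mean(models))
```

The resulting model is exactly the one that would have been trained without the deleted point, at a fraction of the cost of full retraining, which is why sharding shows up as a practical compromise in policy discussions.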