THIS COULD BE IMPORTANT. A new paper on Transformer models has important implications for how organisations should think about retention, deletion and security in LLM deployments. The paper sets out to prove that internal inference artefacts can encode a prompt so fully that the prompt can be reconstructed from them, which could shift the analysis of what it means to “retain” user-provided material under copyright and data protection law. The critical point is that reconstruction requires access to internal model states during inference, specifically the hidden activations or key-value caches at intermediate layers. This is not something an end user can do from ordinary model outputs, since many different prompts can lead to similar responses. The paper is about what happens inside the model’s computational pipeline, not about what comes out at the end. If the paper is correct, the practical enterprise risk depends entirely on who can access these internal artefacts and whether they are retained. In most consumer and enterprise API use cases, end users do not have this access. However, model providers do, operators of self-hosted open-weight models do, and attackers or contractors may gain access through telemetry, debugging tools or infrastructure compromise. The risk increases substantially if inference artefacts are preserved in ways that can later be retrieved, whether for monitoring, performance optimisation or safety evaluation. Organisations have grown comfortable distinguishing between “prompt content” and “non-readable telemetry”, particularly when designing systems to avoid long-term storage of prompts while retaining derived vectors and internal representations. The paper challenges the assumption that those artefacts are inherently low risk, since retaining them may be functionally equivalent to retaining the prompt itself, provided someone with the right access and resources attempts reconstruction.
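A deliberately tiny sketch of why logged per-position states can be as revealing as the prompt itself. It is a toy recurrent update written in Python, not a Transformer and not the paper's reconstruction algorithm; the vocabulary, dimensions and update rule are all assumptions. The point it illustrates: if each position's state is a deterministic, injective function of the prompt prefix, anyone who holds the logged states plus the model can recover the prompt one token at a time by trying every vocabulary token and keeping the one that reproduces the logged state.

```python
import math
import random

# Toy stand-in for per-position hidden states (NOT a Transformer): each state
# depends on the previous state and the current token, so the sequence of
# states is a deterministic function of the prompt prefix.
DIM = 8
VOCAB = ["the", "patient", "has", "hiv", "diabetes", "no", "history", "of"]

rng = random.Random(0)
EMBED = {tok: [rng.gauss(0, 1) for _ in range(DIM)] for tok in VOCAB}
W = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(DIM)]


def step(prev, token):
    """One update: h_t = tanh(W @ h_{t-1} + E[token])."""
    return [
        math.tanh(sum(W[i][j] * prev[j] for j in range(DIM)) + EMBED[token][i])
        for i in range(DIM)
    ]


def hidden_states(tokens):
    """What a telemetry or debugging pipeline might retain: one state per position."""
    h = [0.0] * DIM
    states = []
    for tok in tokens:
        h = step(h, tok)
        states.append(h)
    return states


def invert(states):
    """With the model in hand, try each token and keep the one matching the log."""
    h = [0.0] * DIM
    recovered = []
    for target in states:
        best = min(
            VOCAB,
            key=lambda tok: sum((a - b) ** 2 for a, b in zip(step(h, tok), target)),
        )
        recovered.append(best)
        h = step(h, best)
    return recovered


prompt = ["the", "patient", "has", "no", "history", "of", "hiv"]
logged = hidden_states(prompt)
print(invert(logged))  # ['the', 'patient', 'has', 'no', 'history', 'of', 'hiv']
```

Real vocabularies are far larger and real states are high-dimensional, so practical reconstruction takes more engineering, but the access requirement is the same: the internal states and the model itself.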
If a prompt contains personal data and the model’s internal representations do actually permit reconstruction of that prompt through an algorithm, then those representations are difficult to characterise as non-personal just because reconstruction requires technical sophistication. Following the EDPB opinion on the issue, the question is whether reconstruction is possible through “reasonably likely means”. If it is, this has direct consequences for data minimisation, purpose limitation, storage limitation, erasure and security, where observability pipelines and debugging stores may become repositories requiring the same protections as plaintext prompts. Enterprise buyers should therefore ask whether providers store, export or log any prompt-derived inference artefacts, and if so under what retention periods, access controls, encryption, deletion processes and third-party sharing restrictions. Contractual restrictions should address not merely “prompts and outputs” but any prompt-derived inference artefacts where confidentiality or regulated data is involved.
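One way to operationalise those questions internally is to treat any log stream that carries prompt-derived artefacts exactly like a stream of plaintext prompts. A minimal sketch, assuming hypothetical field names, a 30-day retention ceiling and a single approved access role; none of these come from the paper or from any provider's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical names for prompt-derived inference artefacts; real field names
# depend entirely on the provider or serving stack.
PROMPT_DERIVED = {"prompt_text", "hidden_states", "kv_cache", "embeddings", "attention_maps"}


@dataclass
class LogStream:
    name: str
    fields: set
    retention_days: int
    encrypted_at_rest: bool
    access_roles: set = field(default_factory=set)


def review(stream, max_retention_days=30, allowed_roles=frozenset({"privacy-approved"})):
    """Flag streams that hold prompt-derived artefacts without plaintext-prompt controls."""
    derived = stream.fields & PROMPT_DERIVED
    if not derived:
        return []  # e.g. latency or token counts only
    findings = []
    if stream.retention_days > max_retention_days:
        findings.append(f"{stream.name}: keeps {sorted(derived)} for {stream.retention_days} days")
    if not stream.encrypted_at_rest:
        findings.append(f"{stream.name}: prompt-derived artefacts stored unencrypted")
    extra_roles = stream.access_roles - allowed_roles
    if extra_roles:
        findings.append(f"{stream.name}: also readable by {sorted(extra_roles)}")
    return findings


debug = LogStream("debug-traces", {"request_id", "kv_cache", "latency_ms"}, 90, False,
                  {"privacy-approved", "contractor-debug"})
for finding in review(debug):
    print(finding)
```

The design choice is that the check keys on what a stream contains, not on what it is called: a "debug" or "metrics" stream holding KV caches gets the same scrutiny as a prompt log.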
Inference Risks in Data Privacy
Summary
Inference risks in data privacy refer to the possibility that sensitive or personal information can be uncovered from data or model outputs, even when it seems hidden or removed. With AI and machine learning, attackers can sometimes reconstruct, re-identify, or infer private details from internal model states, synthetic data, or subtle patterns in model behavior—posing ongoing privacy challenges.
- Audit internal artefacts: Regularly review and restrict access to stored model states, logs, or internal representations that could be used to reconstruct original data or prompts.
- Test for leakage: Use membership inference attacks and other privacy tests to check whether your models or synthetic datasets can inadvertently reveal if specific personal data was used during training.
- Strengthen governance: Set strict access controls, deletion protocols, and privacy safeguards—especially when working with sensitive information like health records or biometric data—to minimize chances of unintentional disclosures.
-
A new paper from Feb 2024, last revised 24 Jun 2024, by a team at the Secure and Fair AI (SAFR AI) Lab at Harvard demonstrates that even with minimal data and partial model access, powerful membership inference attacks (MIAs) on large language models (LLMs) can reveal whether specific data points were used to train them, highlighting significant privacy risks. Problem: MIAs on LLMs allow adversaries with access to the model to determine whether specific data points were part of the training set, indicating potential privacy leakage. This brings both risks and opportunities:
- Copyright detection: MIAs can help verify whether copyrighted data was used in training.
- Machine unlearning: MIAs can help determine whether specific personal information was used for training, which is relevant for the right to be forgotten.
- Train/test contamination: detecting whether evaluation examples were part of the training set ensures the integrity and reliability of model assessments.
- Training dataset extraction: extracting training data from generative models highlights privacy vulnerabilities and informs the development of more secure AI systems.
Background and Technical Overview: In an MIA, an adversary with access only to the model tries to ascertain whether a data point belongs to the model’s training data. Since the adversary only has access to the model, detecting training data implies information leakage through the model. Techniques based on differential privacy can prevent MIAs, but at a significant cost to model accuracy, particularly for large models. Research Question: While strong MIAs exist for classifiers, it was unclear whether strong MIAs are even possible against LLMs, given their unique training processes and complex data distributions. The study introduces two novel MIAs for pretraining data: a neural network classifier based on model gradients, and a variant using only logit access that leverages model-stealing techniques. Results: The new methods outperform existing techniques.
Even with access to less than 0.001% of the training data, along with the ability to compute model gradients, it's possible to create powerful MIAs. In particular, the findings indicate that fine-tuned models are far more susceptible to privacy attacks than pretrained models. Using robust MIAs, the research team extracted over 50% of the training set from fine-tuned LLMs, showcasing the potential extent of data leakage. Practical takeaway: We must limit adversaries' access to models fine-tuned on sensitive data. * * * Paper: “Pandora’s White-Box: Precise Training Data Detection and Extraction in Large Language Models” By Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel. Paper: https://lnkd.in/gTGGjRwX Blog post: https://lnkd.in/gRCJdM_q Red teaming library: https://lnkd.in/gQxEnWBv Code: https://lnkd.in/g8qpDiSE.
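For intuition about how an MIA works, here is the simplest baseline, a loss-threshold attack: guess “member” when the model's loss on a record is unusually low. This is not the gradient-based or logit-based attack from the Pandora's White-Box paper. The "model" is a toy count-based bigram language model standing in for a model fine-tuned on a few records, and the corpus, smoothing constant, assumed vocabulary size and threshold are all assumptions (in practice the threshold would be calibrated on known non-members).

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count-based bigram LM; a toy stand-in for a model fine-tuned on these records."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams


def avg_nll(model, sentence, vocab_size=1000, k=0.1):
    """Average per-token negative log-likelihood with add-k smoothing."""
    unigrams, bigrams = model
    toks = ["<s>"] + sentence.split() + ["</s>"]
    nll = 0.0
    for prev, cur in zip(toks[:-1], toks[1:]):
        p = (bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab_size)
        nll -= math.log(p)
    return nll / (len(toks) - 1)


def loss_threshold_mia(model, candidate, threshold):
    """Guess 'member' when the model's loss on the candidate is unusually low."""
    return avg_nll(model, candidate) < threshold


members = [
    "alice was diagnosed with hiv in march",
    "bob lives near the station in leeds",
    "carol takes insulin every morning",
]
non_members = [
    "dave was diagnosed with asthma in june",
    "erin lives near the park in york",
]
model = train_bigram(members)
for s in members + non_members:
    print(f"{avg_nll(model, s):.2f}  member={loss_threshold_mia(model, s, threshold=5.5)}  {s}")
```

Here members score about 4.5 nats per token and non-members about 6.3, so the hand-picked threshold separates them. A model that has overfit its fine-tuning data widens exactly this kind of gap, which is one intuition for why fine-tuned models make easier targets.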
-
How do you know your synthetic data is anonymous 🥸? If your answer is “we checked Distance to Closest Record (DCR),” then… we might have bad news for you. Our latest work shows DCR and other distance-based metrics to be inadequate measures of the privacy risk of synthetic data: 😶‍🌫️ Datasets generated by state-of-the-art tabular diffusion models (TabDDPM, ClavaDDPM) declared “private” by DCR are highly vulnerable to membership inference attacks (MIAs), reaching up to 0.35 true positive rate (TPR) at a low false positive rate (FPR). 😨 The same holds for classical synthetic data generators (IndHist, Baynet, CTGAN): even when DCR marks their output as “private,” membership inference attacks can still correctly infer the membership of up to 20% of the training records used to generate the synthetic data. 📏 DCR fails to detect privacy leakage, but could it still work as an inexpensive, directional signal for privacy risk? In our experiments, DCR shows no correlation with how vulnerable a dataset is to membership inference attacks. DCR indeed only appears to catch the most obvious privacy failures, like synthetic datasets that contain large numbers of exact copies from the training data. What should I do then? Use MIAs. They are the rigorous and comprehensive standard for evaluating the privacy of synthetic data, including when making legal anonymity claims and when comparing models. Work from my amazing students and collaborators Yao Zexi, Nataša Krčo and Georgi Ganev. 🔗 Full paper: https://lnkd.in/edP9-krt
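The headline numbers above are reported as true positive rate at a low false positive rate, a convention in the MIA literature that asks whether an attack can confidently single out at least some members, rather than how it does on average. A minimal sketch of computing it from attack scores; the scores below are made up for illustration and are not from the paper.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr):
    """Best TPR over thresholds whose FPR stays at or below max_fpr.

    Scores are attack confidences: higher means 'more likely a member'.
    """
    best_tpr = 0.0
    # Walk candidate thresholds from strictest to loosest; FPR can only grow.
    for t in sorted(set(member_scores) | set(nonmember_scores), reverse=True):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr > max_fpr:
            break
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        best_tpr = max(best_tpr, tpr)
    return best_tpr


# Illustrative attack scores (not from the paper).
member_scores = [0.95, 0.90, 0.60, 0.40, 0.30]
nonmember_scores = [0.92, 0.50, 0.45, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
print(tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.1))  # 0.6
```

In a real evaluation the FPR budget is usually far below 10% and the non-member set far larger; the tiny lists here only keep the arithmetic checkable.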
-
Foundation models trained on de-identified EHR data can unintentionally memorize and leak private patient details, even under black-box, prompt-only access. 1️⃣ The paper introduces 6 black-box tests to evaluate how EHR foundation models may memorize and leak sensitive patient data. 2️⃣ These tests are split into two goals: detecting memorization (via data reconstruction, attribute leakage, embeddings, and membership inference) and assessing privacy risk (by distinguishing generalization from patient-specific memorization). 3️⃣ Generative tests showed that the more clinical history an attacker knows, the more likely the model is to reconstruct sensitive trajectories, including stigmatized diagnoses like HIV, substance abuse, or mental health conditions. 4️⃣ The sensitivity test showed that some sensitive diagnoses can be generated by the model even when they are not in the input prompt, suggesting potential harmful memorization. 5️⃣ Embedding-based tests were less successful in extracting sensitive data or confirming training set membership, suggesting limited leakage through embeddings in the tested model. 6️⃣ Risk-focused tests revealed that modifying personal identifiers (e.g. age) in prompts could suppress leakage, indicating some outputs may stem from memorized individuals rather than general clinical reasoning. 7️⃣ Subgroup analysis found that patients with rare diagnoses or older age (85+) may face greater re-identification risk, emphasizing the need for targeted safeguards. 8️⃣ Overall, most leakage occurred with longer prompts and often involved frequent or less sensitive codes, though concerning outliers were identified. 9️⃣ The proposed open-source framework enables reproducible privacy audits of EHR foundation models, aiming to prevent unintended disclosures before clinical deployment. 🔟 Developers are encouraged to use these tests to flag high-risk samples and apply mitigation strategies like red-teaming, post-training filters, or retraining. 
✍🏻 Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, Marzyeh Ghassemi. An Investigation of Memorization Risk in Healthcare Foundation Models. NeurIPS. 2025. DOI: 10.48550/arXiv.2510.12950
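The risk-focused test in point 6 can be read as a perturbation check: if the probability of a sensitive code drops sharply when only an identifier such as age is changed, the output is more likely tied to a memorised individual than to general clinical reasoning. A minimal sketch, assuming a hypothetical `prob_fn(record, code)` that returns the model's probability of a diagnosis code for a patient record; the toy model below memorises one patient and is not the paper's framework or code.

```python
import statistics


def toy_prob(record, code):
    """Hypothetical model: population base rates, plus one memorised patient."""
    base_rates = {"HIV": 0.02, "T2D": 0.10}
    memorised = {"age": 87, "history": ("DX_A", "DX_B")}  # hypothetical training patient
    if (code == "HIV" and record["age"] == memorised["age"]
            and tuple(record["history"]) == memorised["history"]):
        return 0.60
    return base_rates.get(code, 0.01)


def perturbation_gap(prob_fn, record, field, alt_values, code):
    """Drop in P(code) when only `field` is changed; a large gap suggests memorisation."""
    original = prob_fn(record, code)
    perturbed = [prob_fn({**record, field: v}, code) for v in alt_values]
    return original - statistics.mean(perturbed)


patient = {"age": 87, "history": ["DX_A", "DX_B"]}
other = {"age": 50, "history": ["DX_A", "DX_B"]}
print(round(perturbation_gap(toy_prob, patient, "age", [85, 86, 88, 89], "HIV"), 2))  # 0.58
print(round(perturbation_gap(toy_prob, other, "age", [48, 49, 51, 52], "HIV"), 2))  # 0.0
```

A gap near zero means the prediction did not hinge on the exact identifier; a large gap flags the record for the kind of follow-up the paper recommends.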
-
AI systems can unintentionally leak sensitive information not just through obvious outputs but through the subtler patterns and fingerprints that emerge as models are updated or trained. Recent research has shown that attackers can analyse these parameter changes to extract private data from models including open-source large language models. This kind of leakage is especially concerning when the underlying training data includes personally identifiable information or biometric templates such as fingerprints, facial scans or other identity signals. Biometric data is inherently sensitive because it is immutable and uniquely tied to an individual, which makes such leaks exceptionally high-risk from a privacy and security standpoint. The implications are clear for organisations using AI in contexts involving identity, authentication or personal data: • model lifecycle governance must include security and privacy risk assessments, not just performance metrics • access controls and monitoring need to be designed specifically to prevent side-channel inference • anonymisation and differential privacy techniques should be standard practice where biometric or PII data is involved In 2026, data protection and AI governance are converging. It’s no longer enough to build accurate or powerful models. We have to ensure they cannot be weaponised to reveal the very things they were trained to protect.
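A toy illustration of how parameter changes alone can leak a training record. It assumes an attacker holds model snapshots from before and after one SGD step on a single record, for a linear layer with bias and squared-error loss. This is a textbook gradient-inversion observation, not the specific attack from the research mentioned; the "template" vector is a made-up stand-in for a biometric feature vector. Because each row of the weight update equals the matching bias update times the input, dividing one by the other returns the input.

```python
import random


def sgd_step(W, b, x, y, lr=0.1):
    """One SGD step on 0.5 * ||W x + b - y||^2 for a single record x."""
    pred = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    delta = [p - yi for p, yi in zip(pred, y)]  # gradient of the loss w.r.t. pred
    W_new = [[w - lr * d * xj for w, xj in zip(row, x)] for row, d in zip(W, delta)]
    b_new = [bi - lr * d for bi, d in zip(b, delta)]
    return W_new, b_new


def recover_input(W_old, b_old, W_new, b_new):
    """Weight change row i is lr*delta_i*x; bias change i is lr*delta_i; divide."""
    # Use the output whose bias moved most, to avoid dividing by a tiny number.
    i = max(range(len(b_old)), key=lambda k: abs(b_old[k] - b_new[k]))
    db = b_old[i] - b_new[i]
    return [(w_old - w_new) / db for w_old, w_new in zip(W_old[i], W_new[i])]


rng = random.Random(1)
W0 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b0 = [0.0, 0.0]
template = [0.12, -0.87, 0.45, 0.33]  # hypothetical biometric feature vector
W1, b1 = sgd_step(W0, b0, template, y=[1.0, 0.0])
print([round(v, 6) for v in recover_input(W0, b0, W1, b1)])
```

Batching, noise and deeper non-linear models make real attacks harder, which is one reason differentially private training (clipped, noised updates) is a common mitigation.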
-
Privacy lawyers and DPOs should pay attention to LLM-based deanonymisation. Pseudonymisation depends on one practical question: How hard is it to connect the data back to a person? That answer is changing. LLMs can now read unstructured text and extract identity signals that traditional privacy reviews often miss. A job title. A city. A writing pattern. A technical interest. A timeline. A forum history. A public profile. Each signal may look harmless. Together, they can identify someone. This matters because many privacy programs still assess risk mainly by looking at what has been removed. Names removed. Emails removed. IDs replaced. Dataset marked pseudonymous. That is no longer enough. The real question is what can be inferred from what remains. This becomes especially important for AI systems trained or run on support tickets, chat logs, call transcripts, survey responses, developer forums, research interviews, and employee feedback. These datasets are messy. They are full of context. And context is where identity hides. The risk is not only that personal data exists in the system. The risk is that the system can reason across fragments and reconstruct identity. That creates a gap between legal classification and technical reality. A DPIA may say the dataset is pseudonymous. An engineer may know the model can still link records, infer identity, or expose sensitive context. Both can be true. This is why privacy teams need technical validation, not just documentation. Test the data. Test the model. Test what can be inferred. Test whether separate datasets become linkable when AI is introduced. Pseudonymisation is not a label. It is a claim about identifiability. And in the LLM era, that claim needs evidence. How are privacy teams proving that pseudonymised data actually stays pseudonymous once AI systems touch it? #Privacy #AI #Governance #Pseudonymisation #DataProtection
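A first, cheap piece of evidence for a pseudonymisation claim is to count how many records share each combination of the remaining quasi-identifiers (a k-anonymity-style check). A minimal sketch with hypothetical ticket fields. It is only a floor: it cannot see identity signals buried in free text, which is exactly where LLMs add new risk.

```python
from collections import Counter


def rare_combinations(records, quasi_identifiers, k=2):
    """Return records whose quasi-identifier combination appears fewer than k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical pseudonymised support tickets: names and emails already removed.
tickets = [
    {"id": 1, "job": "nurse", "city": "Leeds", "topic": "rota"},
    {"id": 2, "job": "nurse", "city": "Leeds", "topic": "rota"},
    {"id": 3, "job": "CTO", "city": "Leeds", "topic": "rust"},
    {"id": 4, "job": "nurse", "city": "York", "topic": "rota"},
]
print([t["id"] for t in rare_combinations(tickets, ["job", "city", "topic"])])  # [3, 4]
```

Each field alone looks harmless, yet half of these "pseudonymous" tickets are unique on the combination.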
-
Have you ever thought about LLMs' capabilities for profiling and the associated privacy risks? In the research paper "Beyond memorization: violating privacy via inference with large language models", the authors demonstrate that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to 85% top-1 and 95% top-3 accuracy, for example by analyzing the specific language used (e.g., local phrases) or subtle contextual cues. For instance, let's say a user leaves the following seemingly harmless comment on a pseudonymized online platform (e.g., Reddit) under a post about daily work commutes: "there is this nasty intersection on my commute, I always get stuck there waiting for a hook turn". Although the user had no intent of revealing their location, current LLMs can pick up on small cues left in their comment and correctly deduce that the user comes from Melbourne, noting that "a 'hook turn' is a traffic maneuver particularly used in Melbourne". The authors also tested LLMs' profiling capabilities after implementing mitigation measures such as (1) state-of-the-art anonymization and (2) model alignment (i.e., a fine-tuned model that is penalized for harmful generations). Still, on anonymized datasets, prediction accuracy remains around 55% because anonymizers do not remove region-specific phrases. 🤔 Thoughts: While the above may not be a big surprise, data controllers should evaluate LLMs' profiling (and re-identification) capabilities in light of Recital 26 of the GDPR before considering a dataset anonymized. -------------------------------------------------------------------------- 👋 I'm Vadym, an expert in integrating privacy requirements into AI-driven data processing operations. 🔔 Follow me to stay ahead of the latest trends and to receive actionable guidance on the intersection of AI and privacy. ✍ Expect content that is solely authored by me, reflecting my reading and experiences. #AI #privacy #GDPR
-
Scrubbing PII won’t stop an LLM from inferring it. A good new paper shows a structural mismatch: most of us think privacy = “don’t input PII.” But LLMs can infer sensitive traits from context. Traditional PII scrubbers remove explicit mentions; they don’t block deduction. The authors call this inference-based privacy risk, distinct from memorized PII. This is a user study on implicit inference and human countermeasures. ▪️ Users are only slightly above chance at predicting when their text reveals PII. ▪️ When asked to rewrite text to block inference, success was only ~28% on average. We lack a mental model of how to block inference of PII. ▪️ Some attributes (e.g., location, relationships) are easier for humans to anticipate and block, while others (like occupation) are not. Methodology: 240 American adults wrote short, everyday texts (e.g., about work / daily life). For each text, researchers measured whether models could infer traits (e.g., age, location, relationship status, income/occupation). Participants then tried to rewrite their text to prevent inference. Human rewrites were compared to an LLM rewrite and a common PII sanitization approach. Why this matters: Mental-model gap: users expect storage risk (don’t share your name), while the hazard is inference (seemingly harmless details add up). As long as that gap persists, trust will lag. Where it matters to users, products need to make inference visible (show what’s being inferred) and preventable. User strategies (if it matters): ▪️ Paraphrasing is mostly cosmetic, not effective; inference is still easy for models. ▪️ Abstraction/generalization is much more effective (e.g., "a large city" vs "New York City"). Of course, this can prevent value extraction (e.g., if you're travelling to NYC and want advice about the city, you need to be specific). ▪️ Omission/deletion: drop the detail entirely if it’s not essential.
▪️ Ambiguity: “a colleague” vs “VP” For builders designing for trust: ▪️ Inference cues in real time: flag likely trait inferences (“This sentence could reveal your job seniority”) with a “why” tooltip. ▪️ One-tap protective rewrites: offer autosuggestions that apply abstraction/omission/ambiguity, not just PII redaction. ▪️ Shift evaluation metrics: measure inference blocked, not just PII removed. ▪️ Policy + UX alignment: communicate clearly that privacy risk lives in combinations of details, not just explicit identifiers. ▪️ Privacy defaults: safe-by-default templating for common scenarios (support tickets, resumes, bios) where inference risk is high. Overall, if we keep teaching users to avoid typing PII, we’ll keep missing big risks. And that will snowball into trust problems. Teach and design for inference awareness. Paper: Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference. By Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster. https://lnkd.in/eZ4eA7Rj #AI #Privacy #TrustworthyAI #AISecurity #HumanCenteredAI #ProductDesign #UX
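The “measure inference blocked, not just PII removed” suggestion can be sketched as comparing which traits an inferrer recovers before and after a rewrite. A minimal sketch, assuming a hypothetical phrase-cue dictionary as a stand-in for an LLM inferrer (a real evaluation would query a model, as the study did), a regex-based PII scrubber as the baseline, and an abstraction rewrite in the spirit of the user strategies above.

```python
import re

# Hypothetical phrase cues standing in for an LLM's attribute inference.
CUES = {
    "ward round": ("occupation", "healthcare"),
    "new york city": ("location", "New York"),
    "subway": ("location", "New York"),
}


def infer_traits(text):
    lowered = text.lower()
    return {trait: value for cue, (trait, value) in CUES.items() if cue in lowered}


def scrub_pii(text, names):
    """Baseline: redact explicit identifiers only (emails and known names)."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    for name in names:
        text = text.replace(name, "[NAME]")
    return text


def abstract(text, generalisations):
    """Protective rewrite: swap specific details for coarser ones."""
    for specific, general in generalisations.items():
        text = text.replace(specific, general)
    return text


def inference_blocked(original, rewrite):
    """Share of traits inferable from the original that the rewrite no longer reveals."""
    before, after = infer_traits(original), infer_traits(rewrite)
    if not before:
        return 1.0
    blocked = [t for t, v in before.items() if after.get(t) != v]
    return len(blocked) / len(before)


original = ("Jane Doe (jane.doe@example.com): after the ward round "
            "I took the subway home across New York City.")
scrubbed = scrub_pii(original, names=["Jane Doe"])
rewritten = abstract(scrubbed, {"the ward round": "work", "the subway": "public transport",
                                "New York City": "a large city"})
print(inference_blocked(original, scrubbed))   # 0.0
print(inference_blocked(original, rewritten))  # 1.0
```

The scrubber removes every explicit identifier yet blocks none of the inferences, which is the mismatch the paper describes; only the abstraction step changes the metric.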
-
This Stanford study examined how six major AI companies (Anthropic, OpenAI, Google, Meta, Microsoft, and Amazon) handle user data from chatbot conversations. Here are the main privacy concerns. 👀 All six companies use chat data for training by default, though some allow opt-out 👀 Data retention is often indefinite, with personal information stored long-term 👀 Cross-platform data merging occurs at multi-product companies (Google, Meta, Microsoft, Amazon) 👀 Children's data is handled inconsistently, with most companies not adequately protecting minors 👀 Limited transparency in privacy policies, which are complex, hard to understand and often lack crucial details about actual practices Practical takeaways for acceptable use policies and training for nonprofits using generative AI: ✅ Assume anything you share will be used for training - sensitive information, uploaded files, health details, biometric data, etc. ✅ Opt out when possible - proactively disable data collection for training (Meta is the one where you cannot) ✅ Information cascades through ecosystems - your inputs can lead to inferences that affect ads, recommendations, and potentially insurance or other third parties ✅ Special concern for children's data - age verification and consent protections are inconsistent Some questions to consider in acceptable use policies and to incorporate in any training. ❓ What types of sensitive information might your nonprofit staff share with generative AI? ❓ Does your nonprofit currently specifically identify what is considered “sensitive information” (beyond PII) that should not be shared with generative AI? Is this incorporated into training? ❓ Are you working with children, people with health conditions, or others whose data could be particularly harmful if leaked or misused? ❓ What would be the consequences if sensitive information or strategic organizational data ended up being used to train AI models? How might this affect trust, compliance, or your mission?
How is this communicated in training and policy? Across the board, the Stanford research points out that developers’ privacy policies lack essential information about their practices. The researchers recommend that policymakers and developers address the data privacy challenges posed by LLM-powered chatbots through comprehensive federal privacy regulation, affirmative opt-in for model training, and filtering personal information from chat inputs by default. “We need to promote innovation in privacy-preserving AI, so that user privacy isn’t an afterthought.” How are you advocating for privacy-preserving AI? How are you educating your staff to navigate this challenge? https://lnkd.in/g3RmbEwD
-
🚨BREAKING: Expert Report on LLMs The report by Isabel Barberá and Murielle Popa-Fabre analyses the risks to privacy and data protection posed by LLMs. It applies the Council of Europe's Convention 108+ for the Protection of Individuals with regard to Automatic Processing of Personal Data. 🚨 Findings: ‘privacy risks in LLM-based systems cannot be adequately addressed through ad-hoc organisational practices or existing compliance tools alone’, but a method to assess and mitigate risks must be deployed throughout the entire life-cycle of an #LLM - risk mitigation focuses on: ❌ LLM architecture: reduce size/context, deduplicate the training dataset - less effective strategies ✅ Life-cycle: takes into account data-related and output risks, implements cybersecurity at all levels - and it’s in line with international standards! 🎙️In breaking down LLM tech, three data-usage phases can be identified: 1️⃣ Web-scraping and pretraining 2️⃣ Fine-tuning 3️⃣ Optimisation through data augmentation (RAG), agentic workflows 👉🏼 Best practices can be successfully implemented in Phase 2, so that LLMs are privacy-fit when entering Phase 3, which involves vetting customer intentions and forming a working memory 🎙️ The report breaks down risks: 👉🏼 at #model level: LLMs define the relationship between data subject and personal data by the proximity that one data vector bears to the source vector for that data. Awareness of such a relation is not implied, but statistical. Vector proximity depends on how multiple features relatable to a vector are aggregated in LLM training ‼️ Risks include: LLM pretraining on personal data scraped off the internet (no legal basis), data regurgitation, hallucinations, bias amplification 👉🏼 at #system level: depending on how LLMs interact with their environment, risks go beyond privacy and affect autonomy and identity. Lastly, LLM-automated decisions without human oversight run counter to Art. 9 of Convention 108+,
while the likelihood of accurate profiling, also addressed in Art. 9, becomes a threat given the amount of information that LLMs are able to collect through their increasingly multimodal applications ‼️ Risk management also takes into account user interference in interaction and post-deployment adaptations. Risk mitigation evaluation framework (RMEF): 📌 Reflect real-world deployment conditions 📌 Multiple re-assessments (ISO 42005) 📌 Address emergent and interactive risks - not just performance metrics 📌 Involve stakeholders 📌 Accessible evaluation reports 💡 The RMEF should be piloted in a multi-stakeholder collaboration whereby an LLM is built, deployed, interacted with and assessed 🎙️Recommendations to stakeholders: 👉🏼 Work on data protection AND data safety: the two are not equivalent 👉🏼 Implement privacy protection on day 0 👉🏼 Use PETs and implement data protection benchmarks 🚨 Regulators must issue clear guidance to help companies address these risks! CC: Peter Hense 🇺🇦🇮🇱 Itxaso Domínguez de Olazábal, PhD.