AI reaches a milestone: privacy by design at scale

Google Research and Google DeepMind have announced VaultGemma, a 1B-parameter, open-weight model trained entirely with differential privacy (DP).

Why does this matter? Most large LLMs carry inherent privacy risks: they can memorise and reproduce fragments of their training data. That is a serious issue if the fragment is a patient record, bank detail, or private correspondence. VaultGemma's training method, DP-SGD, limits how much influence any single training example has and adds noise to blur details, so that what the model can reveal about any one example is mathematically bounded. The result: a mathematical guarantee of privacy, the strongest reported so far for an open model at this scale.

The opportunities
In healthcare, finance, and government, the implications are immediate:
🔸 Hospitals can analyse patient data without risking disclosure.
🔸 Banks can detect fraud or assess credit risk within GDPR rules.
🔸 Governments can train models on citizen data while meeting privacy-by-design requirements.
In each case, sensitive data shifts from a liability to an asset that can drive innovation.

The challenges
1️⃣ Performance: VaultGemma is less accurate than the frontier LLMs; Google reports utility comparable to non-private models from roughly five years ago, such as GPT-2. This is the cost of stronger privacy: trading short-term capability for long-term protection.
2️⃣ Jurisdiction: The model guarantees privacy, but not sovereignty. Built by an American provider, it remains subject to U.S. law. Under the CLOUD Act, American authorities can compel access even to data hosted abroad.

How this compares
💠 Gemini has strong capability and multimodality, but privacy protections rest on corporate policy.
💠 ChatGPT (GPT-5) leads in performance, but is closed and under U.S. jurisdiction.
💠 Claude is positioned as “safety-first,” yet its privacy controls are policy-based, not mathematical.
By contrast, VaultGemma offers provable privacy. The trade-off is weaker performance and continued U.S. jurisdiction - but it moves the conversation from “trust us” to “prove it.”

Leaders now have a wider choice for adopting AI:
✔️ Privacy-first model: trade accuracy for provable privacy. Suited for highly regulated sectors and SMEs needing compliance. Lower cost, limited customisation, under U.S. law.
✔️ Frontier LLMs: cutting-edge capability at scale. Privacy rests on policy, with jurisdiction split - U.S., Chinese, or EU law. Highest-priced via usage-based APIs, but with the broadest ecosystems and integrations.
✔️ Sovereign alternatives: slower today, but with greater control of data and law. Could adopt privacy-by-design methods like VaultGemma's, though requiring heavy upfront investment. Higher initial cost, offset by customisation and long-term resilience.

AI has reached a milestone: privacy by design is possible at scale. Leaders need to balance trust, compliance, performance, and control in their choices.

#AI #ResponsibleAI #DataPrivacy #DigitalSovereignty #Boardroom
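To make the DP-SGD idea in the VaultGemma post concrete (cap each example's influence, then add noise), here is a minimal sketch of a single update step. It is an illustration under stated assumptions, not VaultGemma's actual training code: the function names, the toy gradients, and the hyperparameter values are all hypothetical, and the sketch does no privacy accounting, so it does not by itself yield a specific privacy guarantee.

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD style update: clip per example, sum, add Gaussian noise, average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    noisy_mean = []
    for i in range(len(weights)):
        total = sum(g[i] for g in clipped)
        # Noise is scaled to the clipping bound, i.e. to the most that
        # any single example can contribute to the summed gradient.
        total += rng.gauss(0.0, noise_multiplier * max_norm)
        noisy_mean.append(total / batch_size)
    return [w - lr * g for w, g in zip(weights, noisy_mean)]

# Hypothetical toy batch: the second example is an outlier that would
# dominate an ordinary SGD step if it were not clipped.
rng = random.Random(0)
weights = [0.0, 0.0]
grads = [[0.5, -0.2], [30.0, 40.0], [-0.1, 0.3]]
new_weights = dp_sgd_step(weights, grads, lr=0.1, max_norm=1.0,
                          noise_multiplier=1.1, rng=rng)
print(new_weights)
```

Clipping is what bounds any one record's influence; the noise then hides whatever influence remains. In real training, a privacy accountant tracks how the guarantee accumulates across many such steps.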
Data Privacy in AI Language Models for Enterprises
Summary
Data privacy in AI language models for enterprises refers to protecting sensitive information when using or training artificial intelligence systems, ensuring personal and business data is not exposed, misused, or improperly stored. As companies increasingly adopt AI, it's essential to safeguard confidential details and comply with privacy regulations to maintain trust and avoid legal risks.
- Identify sensitive data: Take inventory of all information that could reveal personal or business details and make sure it is flagged before being used in any AI tools or training processes.
- Establish privacy controls: Create clear company policies, use secure enterprise-level AI platforms, and routinely check settings to prevent accidental sharing or retention of private data.
- Monitor regulatory compliance: Stay updated on privacy laws and industry guidelines to ensure your AI systems align with legal requirements and maintain customer trust.
-
Before diving headfirst into AI, companies need to define what data privacy means to them in order to use GenAI safely.

After decades of harvesting and storing data, many tech companies have created vast troves of the stuff - and not all of it is safe to use when training new GenAI models. Most companies can easily recognize obvious examples of Personally Identifiable Information (PII) like Social Security numbers (SSNs) - but what about home addresses, phone numbers, or even information like how many kids a customer has? These details can be just as critical to ensure newly built GenAI products don’t compromise their users' privacy - or safety - but once this information has entered an LLM, it can be really difficult to excise it.

To safely build the next generation of AI, companies need to consider some key issues:

⚠️ Defining Sensitive Data: Companies need to decide what they consider sensitive beyond the obvious. PII covers more than just SSNs and contact information - it can include any data that paints a detailed picture of an individual and needs to be redacted to protect customers.

🔒 Using Tools to Ensure Privacy: Ensuring privacy in AI requires a range of tools that can help tech companies process, redact, and safeguard sensitive information. Without these tools in place, they risk exposing critical data in their AI models.

🏗️ Building a Framework for Privacy: Redacting sensitive data isn’t just a one-time process; it needs to be a cornerstone of any company’s data management strategy as they continue to scale AI efforts. Since PII is so difficult to remove from an LLM once added, GenAI companies need to devote resources to making sure it doesn’t enter their databases in the first place.

Ultimately, AI is only as safe as the data you feed into it. Companies need a clear, actionable plan to protect their customers - and the time to implement it is now.
-
If you are an organisation using AI, or you are an AI developer, the Australian privacy regulator has just published some vital information about AI and your privacy obligations. Here is a summary of the new guides for businesses published today by the Office of the Australian Information Commissioner (OAIC), which articulate how Australian privacy law applies to AI and set out the regulator’s expectations.

The first guide aims to help businesses comply with their privacy obligations when using commercially available AI products, and to help them select an appropriate product. The second provides privacy guidance to developers using personal information to train generative AI models.

GUIDE 1: Guidance on privacy and the use of commercially available AI products
Top five takeaways
* Privacy obligations will apply to any personal information input into an AI system, as well as the output data generated by AI (where it contains personal information).
* Businesses should update their privacy policies and notifications with clear and transparent information about their use of AI.
* If AI systems are used to generate or infer personal information, including images, this is a collection of personal information and must comply with Australian Privacy Principle (APP) 3, which deals with collection of personal information.
* If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected, unless an exception such as consent applies.
* As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available generative AI tools.

GUIDE 2: Guidance on privacy and developing and training generative AI models
Top five takeaways
* Developers must take reasonable steps to ensure accuracy in generative AI models.
* Just because data is publicly available or otherwise accessible does not mean it can legally be used to train or fine-tune generative AI models or systems.
* Developers must take particular care with sensitive information, which generally requires consent to be collected.
* Where developers are seeking to use personal information that they already hold for the purpose of training an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations.
* Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, to avoid regulatory risk they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt out of such a use.

https://lnkd.in/gX_FrtS9
-
Most employees using AI at work are unknowingly feeding their company’s IP into a model that could answer someone else’s question tomorrow. That’s not paranoia. That’s often the default setting.

I’ve spent 17 years in cybersecurity and the last few years working deeply in enterprise AI deployments. Here’s what I see constantly: smart professionals at serious companies pasting things like:
- confidential contracts
- internal strategy documents
- customer data
- proprietary code
- product roadmaps
into free or standard-tier AI tools. No malicious intent. They’re just trying to get work done faster.

But here’s what most people don’t realize: many consumer AI platforms, including tools like ChatGPT and Claude, may use conversations to improve their models unless training is disabled or enterprise protections are in place. Which means your data may not stay yours.

Look at the screenshots below. These are the actual settings screens most users never see.

What you should actually do

1️⃣ Never paste real PII or sensitive data into consumer AI tools. Not even “just to test.” Data hygiene starts before you hit send.

2️⃣ If your team is using AI for real business work, use the enterprise tier. Not because “enterprise” sounds nicer. Because it usually includes:
- Contractual data processing agreements
- Model training disabled by default
- Audit logging
- Data isolation
- Administrative controls

3️⃣ Assume the default is ON until you verify otherwise. Go check the settings. The screenshots below show examples:
- Claude → “Help improve Claude” toggle
- ChatGPT → “Improve the model for everyone”
- Remote browser data retention settings
These controls exist. Most users have never looked at them.

The hard truth: Shadow AI is becoming the new Shadow IT. Except this time employees aren’t installing unauthorized software. They’re voluntarily pasting sensitive company data into external models.

If you work in security, legal, HR, finance, or engineering leadership, this risk likely sits in your lane.

AI will absolutely transform productivity. But the companies that benefit most will be the ones that adopt it with security discipline.

Curious how others are handling this: does your organization have an AI usage policy yet, or is it still the Wild West for consumer AI tools?
-
This new white paper by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), titled "Rethinking Privacy in the AI Era", addresses the intersection of data privacy and AI development, highlighting the challenges and proposing solutions for mitigating privacy risks. It outlines the current data protection landscape, including the Fair Information Practice Principles (FIPs), GDPR, and U.S. state privacy laws, and discusses the distinction between predictive and generative AI and its regulatory implications.

The paper argues that AI's reliance on extensive data collection presents unique privacy risks at both individual and societal levels. It notes that existing laws are inadequate for the emerging challenges posed by AI systems, because they neither fully tackle the shortcomings of the FIPs framework nor concentrate adequately on the comprehensive data governance measures necessary for regulating data used in AI development.

According to the paper, FIPs are outdated and not well-suited for modern data and AI complexities, because:
- They do not address the power imbalance between data collectors and individuals.
- They fail to enforce data minimization and purpose limitation effectively.
- They place too much responsibility on individuals for privacy management.
- They allow data collection by default, putting the onus on individuals to opt out.
- They focus on procedural rather than substantive protections.
- They struggle with the concepts of consent and legitimate interest, complicating privacy management.

The paper emphasizes the need for new regulatory approaches that go beyond current privacy legislation to effectively manage the risks associated with AI-driven data acquisition and processing. It suggests three key strategies to mitigate the privacy harms of AI:

1. Denormalize Data Collection by Default: Shift from opt-out to opt-in data collection models to facilitate true data minimization. This approach emphasizes "privacy by default" and the need for technical standards and infrastructure that enable meaningful consent mechanisms.

2. Focus on the AI Data Supply Chain: Enhance privacy and data protection by ensuring dataset transparency and accountability throughout the entire lifecycle of data. This includes a call for regulatory frameworks that address data privacy comprehensively across the data supply chain.

3. Flip the Script on Personal Data Management: Encourage the development of new governance mechanisms and technical infrastructures, such as data intermediaries and data permissioning systems, to automate and support the exercise of individual data rights and preferences. This strategy aims to empower individuals by facilitating easier management and control of their personal data in the context of AI.

By Dr. Jennifer King and Caroline Meinhardt
Link: https://lnkd.in/dniktn3V
-
Many companies are accidentally training AI models with their own data.

Most people assume:
Free plan = risky
Paid plan = safe
That assumption is wrong. Across many AI tools, personal subscriptions still allow your data to be used for model training. Even when you are paying.

Teams are uploading:
- Client documents
- Company strategy
- Internal reports
- Customer data
And assuming the subscription protects it. In many cases, it doesn’t.

(Get the high-res PDF here: https://lnkd.in/eh9_DPFD)

Here are five things leaders should know before allowing teams to use AI tools at work.

1. Most personal AI plans do not protect your data
Many AI tools treat Free and paid individual plans the same. That means the data you upload may still be used for model training. Paying for the tool does not automatically change the data policy.

2. Enterprise protection usually requires a separate contract
For example, with Anthropic: Free and Pro Claude plans do not protect your data from training. To get SOC 2, GDPR protections, and data excluded from training, you typically need a separate commercial enterprise agreement. Simply paying for “Claude for Work” via credit card may still leave you under consumer terms.

3. Even “team” plans can have hidden limitations
Take OpenAI. The Team plan protects data from training. But external legal proceedings can still override deletion commitments and require chat preservation in some cases. Enterprise tiers introduce additional controls like audit logs, SSO, and data residency.

4. Microsoft handles this differently
If you use Copilot while signed into a Microsoft 365 organisational account, enterprise data protection activates automatically. Your data stays within the Microsoft tenant boundary.

5. Google Workspace includes protection surprisingly early
Google takes another approach. Enterprise data protection starts from the cheapest Workspace business tier. Which makes it one of the simplest options for smaller companies already using Google’s ecosystem.

The real lesson: AI adoption is not just a tooling decision. It is a data governance decision. Before teams upload company data, leaders need to understand:
- Where the data goes
- Whether it trains models
- What legal terms actually apply
- Whether the plan they are paying for is still governed by consumer terms

One final note. These policies change constantly. This comparison reflects the situation as of March 2026 and will likely evolve as vendors update their policies and enterprise offerings.

Get the high-res PDF here: https://lnkd.in/eh9_DPFD

♻️ If this resonated, share it. Someone in your network is trying to make sense of AI adoption.
🔔 Follow Alex Issakova for practical frameworks on using AI in real organisations.
📩 Join The Roadmap for AI education, real-world use cases, and lessons from building a business after corporate. 👉 https://lnkd.in/euKP99Ss
-
Your trade secrets just walked out the front door … and you might have held it open.

No employee, except the rare bad actor, means to leak sensitive company data. But it happens, especially when people are using generative AI tools like ChatGPT to “polish a proposal,” “summarize a contract,” or “write code faster.”

But here’s the problem: unless you’re using ChatGPT Team or Enterprise, it doesn’t treat your data as confidential. According to OpenAI’s own Terms of Use: “We do not use Content that you provide to or receive from our API to develop or improve our Services.” But don’t forget to read the fine print: that protection does not apply unless you’re on a business plan. For regular users, ChatGPT can use your prompts, including anything you type or upload, to train its large language models.

Translation:
That “confidential strategy doc” you asked ChatGPT to summarize?
That “internal pricing sheet” you wanted to reword for a client?
That “source code” you needed help debugging?
☠️ Poof. Trade secret status, gone. ☠️

If you don’t take reasonable measures to maintain the secrecy of your trade secrets, they will lose their protection as such.

So how do you protect your business?
1. Write an AI Acceptable Use Policy. Be explicit: what’s allowed, what’s off limits, and what’s confidential.
2. Educate employees. Most folks don’t realize that ChatGPT isn’t a secure sandbox. Make sure they do.
3. Control tool access. Invest in an enterprise solution with confidentiality protections.
4. Audit and enforce. Treat ChatGPT the way you treat Dropbox or Google Drive, as tools that can leak data if unmanaged.
5. Update your confidentiality and trade secret agreements. Include restrictions on AI disclosures.

AI isn’t going anywhere. The companies that get ahead of its risk will be the ones still standing when the dust settles. If you don’t have an AI policy and a plan to protect your data, you’re not just behind, you’re exposed.
-
Google has published a whitepaper on privacy in AI, proposing a practical framework for integrating Privacy Enhancing Technologies (PETs) across the entire AI lifecycle, from data collection to training, personalization, and deployment.

The paper reframes privacy from “regulatory obligation” to “product design.” PETs shouldn’t be bolted on at the end just to manage compliance risk; they should be part of the system architecture from the start. The approach is: map where personal data enters the model at each stage, identify the specific privacy risks in each of those stages, and then apply targeted protections in data handling, training, and production.

The framework is built around a three-way decision: privacy, utility, and cost. Teams are expected to intentionally choose the combination of PETs that offers protection without breaking product value or user experience.

The whitepaper also categorizes PETs by phase:
📃 Data layer: PII removal, deduplication, anonymization, synthetic data with differential privacy.
⚙️ Training: differential privacy during optimization, federated learning, MPC, trusted execution environments to reduce memorization and internal exposure.
🚀 Deployment: input/output filtering, secure runtime environments, on-device processing, and computation over encrypted data to protect prompts and responses in production.

Finally, the document introduces the idea of creating “well-lit paths”: reusable engineering and governance patterns that make privacy part of the core infrastructure instead of something manually reinvented by each team.

It’s a useful read for anyone looking to understand, in practical terms, how to apply PETs when assessing and deploying AI models.
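As one small illustration of the data-layer techniques listed in the post above, here is a minimal sketch of exact deduplication before training. Repeated records are widely reported to be memorized more readily, so removing them is a cheap first step. The function names and sample records are hypothetical and are not taken from the whitepaper; real pipelines typically add near-duplicate detection on top of this.

```python
import hashlib
import re

def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()

def deduplicate(records):
    """Keep the first occurrence of each record, compared after normalization."""
    seen = set()
    unique = []
    for record in records:
        digest = hashlib.sha256(normalize(record).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

# Hypothetical sample corpus: the first two records differ only in case and spacing.
corpus = [
    "Invoice 1042 was paid on time.",
    "invoice 1042   was paid on time.",
    "Support ticket closed by agent.",
]
unique_corpus = deduplicate(corpus)
print(unique_corpus)
```

Hashing the normalized text keeps memory use proportional to the number of distinct records rather than their total size, which matters once the corpus no longer fits in memory as raw strings.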
-
Working with LLMs or AI chat tools? You’re probably leaking user data! Here’s the privacy hole no one’s talking about.

When users interact with AI apps, they often share sensitive information like names, emails, internal identifiers, and even health records. Most apps send this raw data directly to the model. That means PII ends up in logs, audit trails, or third-party APIs. It’s a silent risk sitting in every prompt.

Masking data sounds like a fix, but it often breaks the prompt or causes hallucinations. The model can’t reason properly if key context is missing.

That’s where GPTGuard comes in. GPTGuard acts as a privacy layer that enables secure use of LLMs without ever exposing sensitive data to public models. Here's how it works:

1. PII Detection and Masking
Every prompt is scanned for sensitive information using a mix of regex, heuristics, and AI models. Masking is handled through Protecto’s tokenization API, which replaces sensitive fields with format-preserving placeholders. This ensures nothing identifiable reaches the LLM.

2. Understanding Masked Inputs
GPTGuard uses a fine-tuned OpenAI model that understands masked data. It preserves structure and type, so even a placeholder like `<PER>Token123</PER>` retains enough meaning for the LLM to respond naturally. The result: no hallucinations, no broken logic, just accurate answers with privacy intact.

3. Seamless Unmasking
Once the LLM generates a reply, GPTGuard unmasks the tokens and returns a complete, readable response. The user never sees the masking, just the final answer with all original context restored.

Key features:
🔍 Detects and masks sensitive data like PII, PHI, and internal identifiers from prompts and files
🚫 Prevents raw sensitive data from ever reaching the LLM
🔁 Unmasks the output so users still get a clear, readable response
🚀 Works with OpenAI, Claude, Gemini, Llama, DeepSeek, and other major LLMs
📄 Supports file uploads and secure chat with internal documents via RAG

The best part? It works across cloud or on-prem, integrates cleanly with your existing workflows, and doesn't require custom fine-tuning or data pipelines.
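The mask, send, unmask round trip described in the post above can be sketched in a few lines. This is a minimal illustration only, not GPTGuard's or Protecto's implementation: the two regex patterns, the `mask` and `unmask` helpers, and the simulated model reply are all hypothetical, and a real detector would combine regex with heuristics and ML models, as the post notes. The placeholder format simply mirrors the `<PER>Token123</PER>` style quoted in the post.

```python
import re

# Hypothetical patterns for illustration only; they will miss many real-world formats.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def mask(prompt):
    """Replace detected values with typed placeholders; return text and mapping."""
    mapping = {}
    counter = 0
    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            nonlocal counter
            counter += 1
            token = f"<{label}>Token{counter}</{label}>"
            mapping[token] = match.group(0)
            return token
        prompt = pattern.sub(substitute, prompt)
    return prompt, mapping

def unmask(text, mapping):
    """Restore the original values wherever their placeholders appear."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

prompt = "Email jane.doe@example.com or call +1 415-555-0100 about the refund."
masked_prompt, mapping = mask(prompt)
print(masked_prompt)

# Simulated model reply that echoes a placeholder; no real LLM is called here.
reply = "Sure, I will write to <EMAIL>Token1</EMAIL> today."
print(unmask(reply, mapping))
```

Keeping the type label inside the placeholder is the design choice that matters: the model still knows it is handling an email address or a phone number, even though the value itself never leaves the caller's side.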
-
European Data Protection Board issues long-awaited opinion on AI models: part 3 - anonymization (see Part 1: https://shorturl.at/TYbq3 consequences, and Part 2: https://shorturl.at/ba5A1 legitimate interest legal basis).

🔹️ AI models are not always anonymous; assess case by case.
🔹️ AI models specifically designed to provide personal data regarding individuals whose personal data were used to train the model cannot be considered anonymous.
🔹️ For an AI model to be considered anonymous, both (1) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to develop the model and (2) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant, taking into account ‘all the means reasonably likely to be used’ by the controller or another person.
🔹️ Pay special attention to the risk of singling out, which is substantial.
🔹️ Consider all means reasonably likely to be used by the controller or another person to identify individuals, which may include: characteristics of the training data, AI model, and training procedure; context; additional information; costs and amount of time needed to obtain such information; available technology and technological developments.
🔹️ Such means and levels of testing may differ between a publicly available model and one to be used only internally by employees.
🔹️ Consider the risk of identification by the controller and by different types of ‘other persons’, including unintended third parties accessing the AI model, and unintended reuse or disclosure of the model.

Be able to prove, through steps taken and documentation, that you have taken effective measures to anonymize the AI model. Otherwise, you may be in breach of your accountability obligations under Article 5(2) GDPR.

Factors to consider:
🔹️ Selection of sources: selection criteria; relevance and adequacy of chosen sources; exclusion of inappropriate sources.
🔹️ Preparation of data for the training phase: could you use anonymous or pseudonymised data, and if not, why not; data minimisation strategies and techniques to restrict the volume of personal data included in the training process; data filtering processes to remove irrelevant personal data.
🔹️ Methodological choices regarding training: improve model generalisation and reduce overfitting; privacy-preserving techniques (e.g. differential privacy).
🔹️ Measures regarding outputs of the model (lower the likelihood of obtaining personal data related to training data from queries).
🔹️ Conduct sufficient tests on the model that cover widely known, state-of-the-art attacks, e.g. attribute and membership inference; exfiltration; regurgitation of training data; model inversion; or reconstruction attacks.
🔹️ Document the process, including: DPIA; advice by the DPO; technical and organisational measures; the AI model’s theoretical resistance to re-identification techniques.

#dataprivacy #dataprotection #privacyFOMO #AIFOMO
Pic by Grok
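One of the tests named above, membership inference, has a simple baseline form that can be sketched in a few lines: check whether the model's loss alone separates training examples from held-out ones. This is a minimal sketch under stated assumptions, not a method prescribed by the EDPB opinion: the function name and the loss values are hypothetical, and a serious evaluation would use stronger, state-of-the-art attacks on real model outputs.

```python
def membership_auc(member_losses, non_member_losses):
    """Chance that a random training example has lower loss than a random held-out one.

    A value near 0.5 means loss alone does not distinguish members from
    non-members; values near 1.0 suggest the model treats its training
    data measurably differently, which is a memorization warning sign.
    """
    wins = 0.0
    for m in member_losses:
        for n in non_member_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5  # ties count as half, as in the usual AUC definition
    return wins / (len(member_losses) * len(non_member_losses))

# Hypothetical per-example losses; in practice these come from running
# the model under test on known training data and on held-out data.
train_losses = [0.10, 0.15, 0.20, 0.90]
holdout_losses = [0.80, 1.10, 0.95, 0.15]
auc = membership_auc(train_losses, holdout_losses)
print(auc)
```

A score well above 0.5 on this baseline does not prove that personal data can be extracted, and a score near 0.5 does not prove anonymity. It is one documented data point among the tests the opinion asks controllers to run.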