If you are an organisation using AI or you are an AI developer, the Australian privacy regulator has just published some vital information about AI and your privacy obligations. Here is a summary of the new guides for businesses published today by the Office of the Australian Information Commissioner (OAIC), which articulate how Australian privacy law applies to AI and set out the regulator’s expectations. The first guide aims to help businesses comply with their privacy obligations when using commercially available AI products and to help them select an appropriate product. The second provides privacy guidance to developers using personal information to train generative AI models.
GUIDE ONE: Guidance on privacy and the use of commercially available AI products
Top five takeaways
* Privacy obligations apply to any personal information input into an AI system, as well as to the output data generated by AI (where it contains personal information).
* Businesses should update their privacy policies and notifications with clear and transparent information about their use of AI.
* If AI systems are used to generate or infer personal information, including images, this is a collection of personal information and must comply with Australian Privacy Principle 3 (APP 3), which deals with the collection of personal information.
* If personal information is being input into an AI system, APP 6 requires entities to only use or disclose the information for the primary purpose for which it was collected, unless an exception applies (such as consent, or a related secondary use the individual would reasonably expect).
* As a matter of best practice, the OAIC recommends that organisations do not enter personal information, and particularly sensitive information, into publicly available generative AI tools.
GUIDE TWO: Guidance on privacy and developing and training generative AI models
Top five takeaways
* Developers must take reasonable steps to ensure accuracy in generative AI models.
* Just because data is publicly available or otherwise accessible does not mean it can legally be used to train or fine-tune generative AI models or systems.
* Developers must take particular care with sensitive information, which generally requires consent to be collected.
* Where developers are seeking to use personal information that they already hold for the purpose of training an AI model, and this was not a primary purpose of collection, they need to carefully consider their privacy obligations.
* Where a developer cannot clearly establish that a secondary use for an AI-related purpose was within reasonable expectations and related to a primary purpose, to avoid regulatory risk, they should seek consent for that use and/or offer individuals a meaningful and informed ability to opt out of such a use.
https://lnkd.in/gX_FrtS9
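The consent and opt-out points in Guide Two can be mirrored in a training-data pipeline. The sketch below is a minimal illustration, assuming a hypothetical record schema (the `Record` fields are invented for this example and are not taken from the OAIC guides): opted-out records are dropped, sensitive records need consent, and everything else needs either a primary-purpose basis or consent. It encodes one conservative policy choice, not a legal test of what individuals would reasonably expect.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical schema for illustration; the OAIC guides do not define one.
    subject_id: str
    text: str
    sensitive: bool                    # e.g. health or biometric information
    training_is_primary_purpose: bool  # was model training a purpose of collection?
    consented_to_training: bool
    opted_out: bool


def eligible_for_training(r: Record) -> bool:
    """Conservative filter reflecting the Guide Two takeaways."""
    if r.opted_out:
        # Honour opt-outs unconditionally.
        return False
    if r.sensitive:
        # Sensitive information generally requires consent.
        return r.consented_to_training
    # Otherwise require a primary-purpose basis or explicit consent; an
    # uncertain secondary use is excluded rather than assumed acceptable.
    return r.training_is_primary_purpose or r.consented_to_training


def build_training_set(records: list[Record]) -> list[str]:
    """Keep only the text of records that pass the eligibility filter."""
    return [r.text for r in records if eligible_for_training(r)]
```

Excluding uncertain cases by default matches the guide's advice to seek consent or offer an opt-out when a secondary use cannot be clearly justified.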
How to Manage AI Training Data Privacy Settings
Summary
Managing AI training data privacy settings is about controlling how personal and sensitive information is used when training artificial intelligence systems. Doing this well helps prevent data from being misused or exposed, especially since AI models can learn from the inputs people provide during use.
- Review privacy policies: Always check the AI platform’s settings and privacy terms to understand how your data will be used and whether your information might be used for model training or shared with third parties.
- Limit sensitive inputs: Avoid entering confidential or private information into public AI tools, and use features like opt-out options or incognito modes whenever available to prevent your data from being included in training.
- Update organizational practices: Regularly review and revise your company or client agreements and internal policies to clearly disclose AI use and protect sensitive data before integrating AI tools into workflows.
-
The Cybersecurity and Infrastructure Security Agency (CISA), together with the National Security Agency (NSA), the Federal Bureau of Investigation (FBI), the National Cyber Security Centre, and other international organizations, published this advisory with recommendations for organizations on how to protect the integrity, confidentiality, and availability of the data used to train and operate #artificialintelligence.
The advisory focuses on three main risk areas:
1. Data #supplychain threats: including compromised third-party data, poisoning of datasets, and lack of provenance verification.
2. Maliciously modified data: covering adversarial #machinelearning, statistical bias, metadata manipulation, and unauthorized duplication.
3. Data drift: the gradual degradation of model performance due to changes in real-world data inputs over time.
The recommended best practices include:
- Tracking data provenance and applying cryptographic controls such as digital signatures and secure hashes.
- Encrypting data at rest, in transit, and during processing, especially sensitive or mission-critical information.
- Implementing strict access controls and classification protocols based on data sensitivity.
- Applying privacy-preserving techniques such as data masking, differential #privacy, and federated learning.
- Regularly auditing datasets and metadata, conducting anomaly detection, and mitigating statistical bias.
- Securely deleting obsolete data and continuously assessing #datasecurity risks.
This is a helpful roadmap for any organization deploying #AI, especially those working with limited internal resources or relying on third-party data.
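Provenance tracking with secure hashes can be sketched with the Python standard library alone. This is a minimal example under stated assumptions: the function names are hypothetical, the manifest is a plain dict, and an HMAC tag stands in for a true digital signature (the standard library has no asymmetric signing, and a real pipeline would likely use one so that verifiers never hold the signing key). Any changed, missing, or unexpected file is reported before the data is used.

```python
import hashlib
import hmac
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large datasets need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_dir: Path, key: bytes) -> dict:
    """Hash every file under data_dir and tag the whole manifest."""
    files = {
        str(p.relative_to(data_dir)): sha256_file(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    body = json.dumps(files, sort_keys=True).encode()
    # HMAC is a stand-in for a digital signature in this sketch.
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"files": files, "tag": tag}


def verify_manifest(data_dir: Path, manifest: dict, key: bytes) -> list[str]:
    """Return a list of problems; an empty list means the data matches."""
    body = json.dumps(manifest["files"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["tag"]):
        # The manifest itself was altered or signed with another key.
        return ["manifest tag mismatch"]
    current = build_manifest(data_dir, key)["files"]
    problems = [
        f"changed or missing: {name}"
        for name, digest in manifest["files"].items()
        if current.get(name) != digest
    ]
    for name in sorted(current.keys() - manifest["files"].keys()):
        problems.append(f"unexpected file: {name}")
    return problems
```

Store the manifest outside the dataset directory; otherwise the manifest file itself would be reported as an unexpected file.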
-
This Stanford study examined how six major AI companies (Anthropic, OpenAI, Google, Meta, Microsoft, and Amazon) handle user data from chatbot conversations. Here are the main privacy concerns.
👀 All six companies use chat data for training by default, though some allow users to opt out.
👀 Data retention is often indefinite, with personal information stored long-term.
👀 Cross-platform data merging occurs at multi-product companies (Google, Meta, Microsoft, Amazon).
👀 Children's data is handled inconsistently, with most companies not adequately protecting minors.
👀 Privacy policies offer limited transparency: they are complex, hard to understand, and often lack crucial details about actual practices.
Practical takeaways for acceptable use policies and training for nonprofits using generative AI:
✅ Assume anything you share will be used for training: sensitive information, uploaded files, health details, biometric data, etc.
✅ Opt out when possible: proactively disable data collection for training (Meta is the exception; there you cannot opt out).
✅ Information cascades through ecosystems: your inputs can lead to inferences that affect ads, recommendations, and potentially insurance or other third parties.
✅ Take special care with children's data: age verification and consent protections are inconsistent.
Some questions to consider in acceptable use policies and to incorporate into any training:
❓ What types of sensitive information might your nonprofit staff share with generative AI?
❓ Does your nonprofit specifically identify what counts as “sensitive information” (beyond PII) that should not be shared with generative AI? Is this incorporated into training?
❓ Are you working with children, people with health conditions, or others whose data could be particularly harmful if leaked or misused?
❓ What would be the consequences if sensitive information or strategic organizational data ended up being used to train AI models? How might this affect trust, compliance, or your mission? How is this communicated in training and policy?
Across the board, the Stanford research finds that developers’ privacy policies lack essential information about their practices. The researchers recommend that policymakers and developers address the data privacy challenges posed by LLM-powered chatbots through comprehensive federal privacy regulation, affirmative opt-in for model training, and filtering personal information from chat inputs by default.
“We need to promote innovation in privacy-preserving AI, so that user privacy isn’t an afterthought.”
How are you advocating for privacy-preserving AI? How are you educating your staff to navigate this challenge?
https://lnkd.in/g3RmbEwD
-
A federal court just ruled that a man's AI conversations were fair game for the government.
Not because he did anything wrong with the tool. Because he didn't understand how it worked.
That case, United States v. Heppner, should make every professional stop and think about what they're typing into AI platforms and under what conditions.
Here's what actually happened. Bradley Heppner, a defendant in a federal fraud case, used Claude to research his own legal strategy. He typed in information his attorney had shared with him in confidence. He generated reports outlining his defense. He later shared those reports with his lawyers.
The government seized his devices and wanted those AI documents. He claimed privilege. The court said no.
Three reasons:
1. He used the free, public version of Claude. The terms of service he agreed to allowed Anthropic to collect his inputs and outputs, use them for training, and share them with third parties including government authorities. Confidentiality was gone the moment he hit enter.
2. He acted without his attorney's direction. Work product protection covers materials prepared at a lawyer's instruction. He did it on his own. That disqualified him.
3. Claude is not a lawyer. The court was blunt. You cannot claim attorney-client privilege over a conversation with an AI.
Now here is what you can actually do to protect yourself.
If you use Claude Pro or any paid consumer plan, go to Settings, then Privacy, and turn off model training. This stops your conversations from being used to train the model. Do it now if you haven't.
Use Incognito Mode for anything sensitive. Incognito conversations are excluded from training even if your general settings allow it.
Never type privileged information into a public AI tool. Even with settings optimized, consumer plans are not enterprise grade. If it came from your attorney, it stays off the platform.
If your work involves legally sensitive information regularly, look at enterprise AI solutions. They come with data processing agreements, no training on your data by default, and actual contractual confidentiality guarantees.
And if you are working through a legal strategy, do it with your lawyer. Not a chatbot.
The technology is new. The legal principles are not. Confidentiality still depends on who you shared information with and under what terms.
Most people have no idea what they agreed to when they signed up for the tool they use every day. Now you do.
-
Before you connect a client's ad account to any AI tool, ask this one question: does this platform train on my data?
Most teams skip it. They see the time savings (and the savings are real), and they don't stop to check what they're agreeing to in the terms of service.
We spent more time on the data privacy question than on any single piece of the technical build.
When you connect an ad account to an AI model, you're sending real campaign data to an external service: spend levels, audience sizes, creative performance, targeting parameters. For a client account, that's not just your data. It's your client's.
The reason we chose Claude for this project was a single clause in Anthropic's privacy policy: data sent via the API and paid Claude plans is not used to train their models. That was the deciding factor.
We also checked: does each ad platform's API Terms of Service permit third-party AI processing of account data? Have we updated our own client agreements to disclose that AI tools are used in our reporting workflow?
We haven't rolled this out to client accounts yet. We're still working through those questions. The technical build took two weeks. The policy groundwork is taking at least as long.
For any agency considering something similar: the privacy question is not a footnote you add at the end. It's the work you do before the build starts.
What's your agency's policy on AI tools and client data? Curious how others are handling this.
Full article here: https://lnkd.in/gR48G_GP
-
Generative AI is reshaping industries, but as Large Language Models (LLMs) continue to evolve, they bring a critical challenge: how do we teach them to forget? Forget what? Our sensitive data.
In their default state, LLMs are designed to retain patterns from training data, enabling them to generate remarkable outputs. However, this capability raises privacy and security concerns.
Why Forgetting Matters
Compliance with Privacy Laws: Regulations like the GDPR (the right to erasure, or “right to be forgotten”) and the CCPA (the right to delete) give individuals the right to have their personal data deleted. Being able to remove specific data from LLMs helps organizations meet these requirements.
Minimizing Data Exposure: Retaining unnecessary or sensitive information increases risk in case of a breach. Forgetting protects users and organizations alike.
Building User Trust: Transparent mechanisms to delete user data foster confidence in AI solutions.
Techniques to Enable Forgetting
🔹 Selective Fine-Tuning: Retraining or fine-tuning models to remove the influence of specific datasets without significantly degrading performance.
🔹 Differential Privacy: Clipping each example's contribution and adding calibrated noise during training, so that no single data point has more than a bounded influence on the model, which limits memorization.
🔹 Memory Augmentation: Using external memory modules where specific records can be updated or deleted without affecting the core model.
🔹 Data Tokenization: Replacing sensitive values with tokens whose mapping is stored separately, so the mapping can be erased independently of the model.
Balancing forgetfulness with functionality is complex. LLMs must retain enough context for accuracy while ensuring sensitive information isn’t permanently embedded.
By prioritizing privacy, we can shape a future in which AI doesn’t just work for us; it works with our values.
How are you addressing privacy concerns in your AI initiatives? Let’s discuss!
#GenerativeAI #AIPrivacy #LLM #DataSecurity #EthicalAI
Successive Digital
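The tokenization idea can be sketched as a small vault that swaps sensitive values for random tokens and keeps the mapping outside the model's data. The `TokenVault` class and the `<TOK_` token format below are hypothetical, chosen only for illustration. One limitation matters: erasing a mapping only helps if the raw value never reached training, so tokenization has to happen before data is used to train or fine-tune a model.

```python
import re
import secrets

# Matches tokens produced by TokenVault.tokenize (16 lowercase hex chars).
TOKEN_RE = re.compile(r"<TOK_[0-9a-f]{16}>")


class TokenVault:
    """Minimal in-memory vault (hypothetical); a real one would be a
    separately secured, access-controlled store."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._subject_tokens: dict[str, set[str]] = {}

    def tokenize(self, subject_id: str, value: str) -> str:
        """Replace one sensitive value with a fresh random token."""
        token = f"<TOK_{secrets.token_hex(8)}>"
        self._token_to_value[token] = value
        self._subject_tokens.setdefault(subject_id, set()).add(token)
        return token

    def detokenize(self, text: str) -> str:
        """Restore values for authorised use; erased tokens become '[erased]'."""
        return TOKEN_RE.sub(
            lambda m: self._token_to_value.get(m.group(0), "[erased]"), text
        )

    def erase_subject(self, subject_id: str) -> int:
        """Delete every mapping for one person; returns how many were removed."""
        tokens = self._subject_tokens.pop(subject_id, set())
        for t in tokens:
            self._token_to_value.pop(t, None)
        return len(tokens)
```

After `erase_subject`, any stored text or model output containing that person's tokens can no longer be resolved back to the original values.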
-
🇫🇷 CNIL just published guidance on informing data subjects in the context of AI + GDPR (Jan. 5, 2026).
🤖 A few quick takeaways:
✅ 1) The scope is broad. CNIL frames transparency as applying whether data is collected directly (first-party) or indirectly (downloads, web scraping tools, APIs, partners, data brokers, reuse of existing datasets). It also flags that this includes data generated by the controller, citing a CJEU decision.
✅ 2) Timing: if data is not collected directly, CNIL reiterates the expectation to inform data subjects as soon as possible and within one month of retrieving the data (or earlier, at first contact or first disclosure to a recipient, as applicable). Also notable: CNIL encourages a reasonable time gap between notice and model training when data is particularly sensitive, so that rights can be exercised before training (given the technical complexity of “fixing” things at the model layer).
✅ 3) CNIL is explicit that AI complexity is not an excuse: information should be clear, intelligible, and easily accessible, and can use diagrams explaining how data is used in training, how the AI system works, and the distinction between the training dataset, the model, and its outputs.
✅ 4) CNIL notes the GDPR derogation where individual notice is impossible or would require disproportionate effort, but stresses case-by-case analysis and documenting the balance between (i) the privacy impact and (ii) the burden or cost, including lack of contact details, plus safeguards (e.g., pseudonymization, DPIA, reduced retention, security measures).
https://lnkd.in/gvmfbJyi
#GDPR #Privacy #AI #AIGovernance #CNIL #Compliance #DataProtection #LLM
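The timing rule in point 2 comes down to simple date arithmetic, sketched below under stated assumptions: “one month” is taken to end on the same calendar day of the following month, clamped to that month's last day, and the earlier triggers (first contact, first disclosure) win when they occur sooner. `buffer_days` is a hypothetical policy parameter, since the guidance as summarised here asks for a “reasonable” gap before training on sensitive data without giving a figure. All function names are illustrative.

```python
import calendar
from datetime import date, timedelta
from typing import Optional


def add_one_month(d: date) -> date:
    """Same day next month, clamped to that month's last day
    (assumption about how 'one month' is counted)."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def notice_deadline(retrieved: date,
                    first_contact: Optional[date] = None,
                    first_disclosure: Optional[date] = None) -> date:
    """Latest date to inform data subjects about indirectly collected data."""
    candidates = [add_one_month(retrieved)]
    candidates += [d for d in (first_contact, first_disclosure) if d is not None]
    return min(candidates)


def earliest_training_start(notice_sent: date, buffer_days: int) -> date:
    """Leave a gap after notice so rights can be exercised before training.
    buffer_days is a hypothetical policy choice, not a CNIL figure."""
    return notice_sent + timedelta(days=buffer_days)
```

For example, data retrieved on 31 January 2026 would, under these assumptions, need notice by 28 February 2026 at the latest.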
-
Generative AI offers transformative potential, but how do we harness it without compromising data privacy? Privacy is not an afterthought; it's central to the strategy.
The right approach depends heavily on specific privacy goals and data sensitivity.
One starting point, backed by strong vendor contracts, is to pass data directly in the LLM's context window.
For larger datasets, Retrieval-Augmented Generation (RAG) scales well. RAG retrieves relevant information at query time to augment the prompt, which helps keep private data out of the LLM's core training dataset. However, optimizing RAG across diverse content types and meeting user expectations for structured, precise answers can be challenging.
At the other extreme is self-hosting LLMs. This offers maximum control but introduces significant deployment and maintenance overhead, especially when aiming for the capabilities of large foundation models. For ultra-sensitive use cases, this might be the only viable path. Distilling larger models for specific tasks can mitigate some deployment complexity, but the core challenges of self-hosting remain.
Look at Apple Intelligence as a prime example. Its strategy prioritizes user privacy through on-device processing, minimizing external data access. While not explicitly labeled RAG, the architecture (with its semantic index, orchestration, and LLM interaction) strongly resembles a sophisticated RAG system, showing that privacy and capability can coexist.
At Egnyte, we believe robust AI solutions must uphold data security. For us, data privacy and fine-grained, authorized access aren't just compliance hurdles; they are innovation drivers.
Looking ahead to advanced agent-to-agent AI interactions, this becomes even more critical. Autonomous agents require a bedrock of trust, built on rigorous access controls and privacy-centric design, to interact securely and effectively. This foundation is essential for unlocking AI's future potential responsibly.
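The RAG pattern described in this post, combined with the fine-grained, authorized access it emphasises, can be sketched as a retriever that filters by permission before ranking. This is a minimal illustration, not Egnyte's or Apple's design: the `Doc` type and group-based permissions are assumptions, keyword overlap stands in for embedding similarity, and the final LLM call is omitted because no specific API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_groups: set[str] = field(default_factory=set)  # hypothetical ACL model


def score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, docs: list[Doc], user_groups: set[str], k: int = 2) -> list[Doc]:
    """Return up to k relevant documents the user is allowed to read."""
    # Enforce permissions BEFORE ranking so unauthorised text never reaches
    # the prompt, even if it would be the best match.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda d: score(query, d.text), reverse=True)
    return [d for d in ranked[:k] if score(query, d.text) > 0]


def build_prompt(query: str, context: list[Doc]) -> str:
    """Augment the prompt at query time; nothing here touches model training."""
    parts = [f"[{d.doc_id}] {d.text}" for d in context]
    return "Answer using only this context:\n" + "\n".join(parts) + f"\n\nQuestion: {query}"
```

Filtering before ranking, rather than after generation, is the design choice that keeps the access model intact: the model can only see what the requesting user could already open.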
-
The EDPB recently published a report on AI Privacy Risks and Mitigations in LLMs. This is one of the most practical and detailed resources I've seen from the EDPB, with extensive guidance for developers and deployers.
The report walks through privacy risks associated with LLMs across the AI lifecycle, from data collection and training to deployment and retirement, and offers practical tips for identifying, measuring, and mitigating risks.
Here's a quick summary of some of the key mitigations mentioned in the report:
For providers:
• Fine-tune LLMs on curated, high-quality datasets and limit the scope of model outputs to relevant and up-to-date information.
• Use robust anonymisation techniques and automated tools to detect and remove personal data from training data.
• Apply input filters and user warnings during deployment to discourage users from entering personal data, as well as automated detection methods to flag or anonymise sensitive input data before it is processed.
• Clearly inform users about how their data will be processed through privacy policies, instructions, warnings, or disclaimers in the user interface.
• Encrypt user inputs and outputs during transmission and storage to protect data from unauthorized access.
• Protect against prompt injection and jailbreaking by validating inputs, monitoring LLMs for abnormal input behaviour, and limiting the amount of text a user can input.
• Apply content filtering and human review processes to flag sensitive or inappropriate outputs.
• Limit data logging and provide configurable options to deployers regarding log retention.
• Offer easy-to-use opt-in/opt-out options for users whose feedback data might be used for retraining.
For deployers:
• Enforce strong authentication to restrict access to the input interface and protect session data.
• Mitigate adversarial attacks by adding an input sanitization and filtering layer, and by monitoring and logging user queries to detect unusual patterns.
• Work with providers to ensure they do not retain or misuse sensitive input data.
• Guide users to avoid sharing unnecessary personal data through clear instructions, training, and warnings.
• Educate employees and end users on proper usage, including the appropriate use of outputs and awareness of phishing techniques that could trick individuals into revealing sensitive information.
• Ensure employees and end users avoid overreliance on LLMs for critical or high-stakes decisions without verification, and ensure outputs are reviewed by humans before implementation or dissemination.
• Securely store outputs and restrict access to authorised personnel and systems.
This is a rare example where the EDPB strikes a good balance between practical safeguards and legal expectations.
Link to the report included in the comments.
#AIprivacy #LLMs #dataprotection #AIgovernance #EDPB #privacybydesign #GDPR
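Two of the mitigations above, limiting how much text a user can input and automatically flagging or anonymising personal data before it is processed, can be sketched as a pre-processing step. The character limit, the `InputRejected` exception, and the regular expressions are hypothetical placeholders chosen for illustration, not values from the EDPB report. Simple patterns like these miss many formats (and names entirely) and produce false positives, so a production system would rely on dedicated PII detection.

```python
import re

MAX_INPUT_CHARS = 4000  # hypothetical limit; the report does not set a number

# Illustrative patterns only: they will miss many formats and all names.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


class InputRejected(ValueError):
    """Raised when input fails validation before reaching the model."""


def sanitize_prompt(text: str) -> tuple[str, list[str]]:
    """Reject over-long input, then redact detected personal data.

    Returns the redacted text and the labels that were found, so the
    interface can warn the user about what was removed.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise InputRejected(f"input exceeds {MAX_INPUT_CHARS} characters")
    found = []
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            found.append(label)
    return text, found
```

Returning the detected labels alongside the redacted text supports the report's other recommendation: warning users in the interface rather than silently altering their input.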
-
The latest joint cybersecurity guidance from the NSA, CISA, FBI, and international partners outlines critical best practices for securing data used to train and operate AI systems, recognizing data integrity as foundational to AI reliability.
Key highlights include:
• Mapping data-specific risks across all 6 NIST AI lifecycle stages: Plan and Design, Collect and Process, Build and Use, Verify and Validate, Deploy and Use, Operate and Monitor
• Identifying three core AI data risks (poisoned data, compromised supply chain, and data drift), each with tailored mitigations
• Outlining 10 concrete data security practices, including digital signatures, trusted computing, AES-256 encryption, and secure provenance tracking
• Exposing real-world poisoning techniques such as split-view attacks (costing as little as 60 dollars) and frontrunning poisoning against Wikipedia snapshots
• Emphasizing cryptographically signed, append-only datasets and certification requirements for foundation model providers
• Recommending anomaly detection, deduplication, differential privacy, and federated learning to combat adversarial and duplicate data threats
• Integrating risk frameworks including the NIST AI RMF, FIPS 204 and 205, and Zero Trust architecture for continuous protection
Who should take note:
• Developers and MLOps teams curating datasets, fine-tuning models, or building data pipelines
• CISOs, data owners, and AI risk officers assessing third-party model integrity
• Leaders in national security, healthcare, and finance tasked with AI assurance and governance
• Policymakers shaping standards for secure, resilient AI deployment
Noteworthy aspects:
• Mitigations tailored to curated, collected, and web-crawled datasets, each with unique attack vectors and remediation strategies
• Concrete protections against adversarial machine learning threats, including model inversion and statistical bias
• Emphasis on human-in-the-loop testing, secure model retraining, and auditability to maintain trust over time
Actionable step: Build data-centric security into every phase of your AI lifecycle by following the 10 best practices, conducting ongoing assessments, and enforcing cryptographic protections.
Consideration: AI security does not start at the model; it starts at the dataset. If you are not securing your data pipeline, you are not securing your AI.
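Differential privacy in model training is commonly implemented as DP-SGD: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, and Gaussian noise with standard deviation proportional to that norm is added before averaging. The sketch below shows only that aggregation step in plain Python, with hypothetical function names; it is not taken from the guidance, and it omits the privacy accounting needed to report an overall epsilon across many training steps, which libraries such as Opacus handle for PyTorch.

```python
import math
import random


def clip(grad: list[float], max_norm: float) -> list[float]:
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_average_gradient(per_example_grads: list[list[float]],
                        max_norm: float,
                        noise_multiplier: float,
                        rng: random.Random) -> list[float]:
    """One DP-SGD aggregation step: clip, sum, add Gaussian noise, average.

    Clipping bounds any single example's influence; the noise (std =
    noise_multiplier * max_norm) masks whether a given example was present.
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for i, v in enumerate(clip(g, max_norm)):
            total[i] += v
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]
```

Setting `noise_multiplier` to 0 disables the privacy guarantee and leaves plain clipped averaging, which is useful only for testing the clipping logic.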