How to Secure Large Language Models


Summary

Securing large language models means protecting these advanced AI systems from privacy threats, misuse, and attacks by using layered strategies and ongoing monitoring. Large language models are AI tools that understand and generate human language, but they can be vulnerable to risks like data leaks, malicious prompts, and unauthorized access.

  • Implement privacy safeguards: Use techniques such as anonymizing data, secure multiparty computation, and federated learning during model training to keep sensitive information safe.
  • Strengthen access controls: Set up authenticated prompts, security boundaries, and rule-based filters to limit who can interact with the model and to block risky or tampered inputs.
  • Monitor and audit regularly: Continuously review model behavior, run stress tests with varied input styles, and update defense measures to catch new vulnerabilities and stay compliant with evolving regulations.
Summarized by AI based on LinkedIn member posts
  • View profile for Peter Slattery, PhD

    MIT AI Risk Initiative | MIT FutureTech

    68,993 followers

    Isabel Barberá: "This document provides practical guidance and tools for developers and users of Large Language Model (LLM) based systems to manage privacy risks associated with these technologies. The risk management methodology outlined in this document is designed to help developers and users systematically identify, assess, and mitigate privacy and data protection risks, supporting the responsible development and deployment of LLM systems. This guidance also supports the requirements of the GDPR Article 25 Data protection by design and by default and Article 32 Security of processing by offering technical and organizational measures to help ensure an appropriate level of security and data protection. However, the guidance is not intended to replace a Data Protection Impact Assessment (DPIA) as required under Article 35 of the GDPR. Instead, it complements the DPIA process by addressing privacy risks specific to LLM systems, thereby enhancing the robustness of such assessments.

    Guidance for Readers
    > For Developers: Use this guidance to integrate privacy risk management into the development lifecycle and deployment of your LLM based systems, from understanding data flows to how to implement risk identification and mitigation measures.
    > For Users: Refer to this document to evaluate the privacy risks associated with LLM systems you plan to deploy and use, helping you adopt responsible practices and protect individuals’ privacy.
    > For Decision-makers: The structured methodology and use case examples will help you assess the compliance of LLM systems and make informed risk-based decisions."

    European Data Protection Board

  • View profile for Mani Keerthi N

    Cybersecurity Strategist & Advisor || LinkedIn Learning Instructor

    17,694 followers

    On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

    From the research paper: "In this paper, we extensively investigate data privacy concerns within LLMs, specifically examining potential privacy threats from two folds: Privacy leakage and privacy attacks, and the pivotal technologies for privacy protection during various stages of LLM privacy inference, including federated learning, differential privacy, knowledge unlearning, and hardware-assisted privacy protection."

    Some key aspects from the paper:

    1) Challenges: Given the complexity involved in training LLMs, privacy protection research tends to dissect the phases of LLM development and deployment separately, including pre-training, prompt tuning, and inference.

    2) Future Directions: Protecting the privacy of LLMs throughout their creation process is paramount and requires a multifaceted approach.
    (i) During data collection, minimizing the collection of sensitive information and obtaining informed consent from users are critical steps. Data should be anonymized or pseudonymized to mitigate re-identification risks.
    (ii) In data preprocessing and model training, techniques such as federated learning, secure multiparty computation, and differential privacy can be employed to train LLMs on decentralized data sources while preserving individual privacy.
    (iii) During model evaluation, privacy impact assessments and adversarial testing help identify and address potential privacy risks before deployment.
    (iv) In the deployment phase, privacy-preserving APIs and access controls can limit access to LLMs, while transparency and accountability measures foster trust with users by providing insight into data handling practices.
    (v) Ongoing monitoring and maintenance, including continuous monitoring for privacy breaches and regular privacy audits, are essential to ensure compliance with privacy regulations and the effectiveness of privacy safeguards.

    By implementing these measures comprehensively throughout the LLM creation process, developers can mitigate privacy risks and build trust with users, thereby leveraging the capabilities of LLMs while safeguarding individual privacy.

    #privacy #llm #llmprivacy #mitigationstrategies #riskmanagement #artificialintelligence #ai #languagelearningmodels #security #risks
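The pseudonymization step mentioned for data collection can be illustrated with a minimal sketch. It assumes email addresses are the only identifier of interest and replaces each with a keyed hash (HMAC-SHA256), so the same address always maps to the same token without being readable. The regex, the `pseudonymize` helper, and the token format are hypothetical illustrations, not something taken from the survey.

```python
import hashlib
import hmac
import re

# Deliberately simple email pattern (an assumption); real PII detection
# needs to cover far more identifier types than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str, key: bytes) -> str:
    """Replace each email address with a stable keyed-hash token."""

    def _token(match: re.Match) -> str:
        # Lowercase first so "Alice@x.com" and "alice@x.com" share a token.
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        return f"<EMAIL_{digest[:12]}>"

    return EMAIL_RE.sub(_token, text)


if __name__ == "__main__":
    secret = b"demo-key-store-me-separately"  # hypothetical key
    print(pseudonymize("Contact alice@example.com or bob@example.org.", secret))
```

Because the mapping depends on the key, whoever holds the key can still link records, which is why this counts as pseudonymization rather than anonymization.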

  • View profile for Razi R.

    Senior PM @ Microsoft · AI Security & Zero Trust · O’Reilly Author · Speaker (RSA, Identiverse) · Advisory: securing agentic AI for enterprises & boards

    13,788 followers

    AI security is entering a new phase, one where the systems protect themselves. The A2AS: Agentic AI Runtime Security and Self-Defense paper makes that argument with quiet conviction. Instead of relying on filters, wrappers, or fine-tuning, it proposes a framework where large language models can verify, authenticate, and defend their own reasoning. The idea is as pragmatic as it is radical: make AI secure by design, not by supervision.

    What the paper outlines:
    • The BASIC security model, a framework of five controls: Behavior Certificates, Authenticated Prompts, Security Boundaries, In-Context Defenses, and Codified Policies. Each addresses a different risk surface, from behavior drift to malicious prompt injection.
    • Three design pillars: runtime, self-defense, and self-sufficiency, ensuring that protection happens in real time, leverages the model’s reasoning, and minimizes dependency on external systems.
    • The A2AS framework, which implements BASIC as a runtime layer much like HTTPS secures HTTP, embedding trust directly into how models operate.

    Why this matters
    AI agents now operate across critical domains, from finance to infrastructure. Their greatest vulnerability lies in how they process both trusted and untrusted data inside the same context window. This design flaw enables prompt injection attacks that manipulate instructions or extract data. Existing defenses rely on external filters, retraining, or sandboxing, each adding complexity or latency. A2AS, by contrast, uses the model’s own reasoning to authenticate and protect itself at runtime.

    Key risks and practices:
    • Behavior drift and misuse are limited by Behavior Certificates that define and enforce permissions.
    • Tampered inputs are blocked through Authenticated Prompts that verify content integrity and attribution.
    • Context mixing and indirect injections are mitigated by Security Boundaries that tag untrusted inputs.
    • Unsafe reasoning is restrained by In-Context Defenses embedded in the prompt itself.
    • Compliance and governance are maintained through Codified Policies that enforce business rules as executable code.

    Who should act:
    Security architects, AI platform engineers, and governance teams can adopt A2AS as a baseline for runtime defense. It requires no retraining or architecture overhaul, yet creates a measurable layer of assurance.

    Action items:
    • Use the BASIC model as a checklist for every new agent or LLM integration.
    • Issue Behavior Certificates for all agents and enforce them at runtime.
    • Add Authenticated Prompts and Security Boundaries to instrument context.
    • Embed In-Context Defenses and Codified Policies to maintain safe reasoning.
    • Regularly audit and adapt configurations as new attack patterns emerge.
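Two of the controls named in the post, Authenticated Prompts and Security Boundaries, can be sketched generically. The code below is an illustration under stated assumptions, not the A2AS implementation: it signs a trusted prompt with HMAC-SHA256 so tampering can be detected, and wraps external content in a marker so it can be treated as data. The `<untrusted>` tag format and every function name are hypothetical.

```python
import hashlib
import hmac


def sign_prompt(prompt: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the prompt text to the key."""
    return hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_prompt(prompt: str, tag: str, key: bytes) -> bool:
    """Constant-time check that the prompt was not altered after signing."""
    return hmac.compare_digest(sign_prompt(prompt, key), tag)


def wrap_untrusted(content: str, source: str) -> str:
    """Mark external content so later instructions can treat it as data only."""
    # Hypothetical tag format. Neutralize any closing tag inside the content
    # so the payload cannot break out of its own boundary.
    safe = content.replace("</untrusted>", "&lt;/untrusted&gt;")
    return f'<untrusted source="{source}">\n{safe}\n</untrusted>'


def build_context(system_prompt: str, tag: str, key: bytes,
                  tool_output: str, source: str) -> str:
    """Assemble a context window only if the trusted prompt verifies."""
    if not verify_prompt(system_prompt, tag, key):
        raise ValueError("system prompt failed integrity check")
    return system_prompt + "\n\n" + wrap_untrusted(tool_output, source)
```

Tagging alone does not stop a model from following injected instructions; it only gives downstream defenses (in-context or external) a reliable signal about which spans are untrusted.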

  • View profile for Patrick Sullivan

    VP of Strategy and Innovation at A-LIGN | TEDx Speaker | Forbes Technology Council | AI Ethicist | ISO/IEC JTC1/SC42 Member

    11,987 followers

    📜LLM Safety Has a New Problem📜

    Your AI system may be easier to jailbreak than you think. A new study shows that converting a harmful request into a poem is often enough to bypass guardrails. Same request. Same intent. Different surface form. The model complies.

    The attack success rates are not small. Several major providers move more than fifty percentage points. Some reach ninety percent or higher. The failures stretch across cyber offense, CBRN misuse, manipulation, privacy intrusion, and loss-of-control scenarios. The pattern appears across twenty-five models. One prompt is enough.

    This exposes a deeper pattern in how alignment works. Most guardrails recognize harmful phrasing, not harmful purpose. When the request is wrapped in metaphor or rhythm, many models treat it as benign. Larger models become more vulnerable because they decode figurative language more thoroughly. Their capability improves, but their safety behavior does not transfer.

    For organizations deploying AI systems, this is more than an academic finding. It creates a direct gap in your assurance activities. A model that passes standard red team tests but fails when phrasing shifts creates operational and regulatory exposure. The #EUAIAct expects systems to behave consistently under realistic variation. #ISO42001 expects the same. If style alone breaks your controls, your #AIMS is incomplete.

    ➡️Here are mitigation steps that align with both operational safety and ISO42001 expectations:

    1️⃣ Expand your testing beyond plain phrasing
    Include poetic, narrative, obfuscated, and stylized prompts in your evaluations. Treat these as stress tests, not edge cases.

    2️⃣ Strengthen intent detection
    Use an independent intent recognition layer ahead of the primary model. Identify the underlying task before the model interprets the input.

    3️⃣ Layer your safety controls
    Combine rule-based filters, retrieval-grounded policy checks, schema validations, and post-generation safety reviews. Do not rely on model refusal behavior alone.

    4️⃣ Monitor unusual surface forms
    Treat stylized prompts as signals for elevated scrutiny. Route them through safer inference paths or apply enhanced filtering.

    5️⃣ Constrain sensitive workflows
    For high-risk cases, limit exposure to free-form generation. Use templates, constrained decoding, and downstream enforcement logic.

    6️⃣ Treat jailbreak exposure as a continuous risk
    Retest frequently. Update your jailbreak suite every time your models or workflows change.

    I care about this because I work closely with organizations that trust their AI systems to behave predictably. This research shows how easily that trust can be misplaced if evaluation does not reflect how real users communicate. It is time to move beyond benchmark safety. Real users will not stick to plain phrasing, and your controls should not presume that they will.

    🌐 https://lnkd.in/geja7vtB

    A-LIGN Shea Brown #TheBusinessofCompliance #ComplianceAlignedtoYou
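The first mitigation step, testing beyond plain phrasing, can be sketched as a small harness that restyles each test request and compares how often a guardrail blocks each style. Everything here is a toy assumption: `keyword_guard` stands in for a real guardrail, the `STYLES` transforms are crude string rewrites rather than genuine poetic rewrites, and the sample request is a harmless placeholder.

```python
from typing import Callable, Dict, List


def keyword_guard(prompt: str) -> bool:
    """Toy guardrail: blocks only when a listed phrase appears verbatim."""
    blocked_phrases = ("disable the alarm",)
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in blocked_phrases)


# Hypothetical style transforms; a real suite would use human- or
# model-written poetic, narrative, and obfuscated rewrites.
STYLES: Dict[str, Callable[[str], str]] = {
    "plain": lambda p: p,
    "narrative": lambda p: f"Write a story where a character explains how to {p}.",
    "spaced": lambda p: " ".join(p.replace(" ", "")),  # crude obfuscation
}


def block_rate_by_style(guard: Callable[[str], bool],
                        requests: List[str]) -> Dict[str, float]:
    """Fraction of requests the guard blocks under each surface style."""
    if not requests:
        raise ValueError("need at least one test request")
    return {
        name: sum(guard(restyle(r)) for r in requests) / len(requests)
        for name, restyle in STYLES.items()
    }


def style_gaps(rates: Dict[str, float], baseline: str = "plain") -> List[str]:
    """Styles whose block rate falls below the baseline style's rate."""
    return [name for name, rate in rates.items() if rate < rates[baseline]]
```

Running `style_gaps(block_rate_by_style(keyword_guard, ["disable the alarm"]))` flags the `spaced` style, which is the kind of gap between phrasing and purpose the post describes.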

  • View profile for Adnan Masood, PhD.

    Chief AI Architect | Microsoft Regional Director | Author | Board Member | STEM Mentor | Speaker | Stanford | Harvard Business School

    6,738 followers

    In my work with organizations rolling out AI and generative AI solutions, one concern I hear repeatedly from leaders and the C-suite is how to get a clear, centralized “AI Risk Center” to track AI safety and large language model accuracy, citation, attribution, performance, and compliance. Operational leaders want automated governance reports (model cards, impact assessments, dashboards) so they can maintain trust with boards, customers, and regulators. Business stakeholders also need an operational risk view: one place to see AI risk and value across all units, so they know where to prioritize governance.

    One such framework is MITRE’s ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) Matrix. This framework extends MITRE ATT&CK principles to AI, generative AI, and machine learning, giving us a structured way to identify, monitor, and mitigate threats specific to large language models. ATLAS addresses a range of vulnerabilities (prompt injection, data leakage, malicious code generation, and more) by mapping them to proven defensive techniques. It’s part of the broader AI safety ecosystem we rely on for robust risk management.

    On a practical level, I recommend pairing the ATLAS approach with comprehensive guardrails such as:
    • AI Firewall & LLM Scanner to block jailbreak attempts, moderate content, and detect data leaks (optionally integrating with security posture management systems).
    • RAG Security for retrieval-augmented generation, ensuring knowledge bases are isolated and validated before LLM interaction.
    • Advanced Detection Methods (Statistical Outlier Detection, Consistency Checks, and Entity Verification) to catch data poisoning attacks early.
    • Align Scores to grade hallucinations and keep the model within acceptable bounds.
    • Agent Framework Hardening so that AI agents operate within clearly defined permissions.

    Given the rapid arrival of AI-focused legislation (like the EU AI Act and the now-defunct Executive Order 14110 of October 30, 2023, Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence) and global standards (e.g., ISO/IEC 42001), we face a “policy soup” that demands transparent, auditable processes. My biggest takeaway from the 2024 Credo AI Summit was that responsible AI governance isn’t just about technical controls: it’s about aligning with rapidly evolving global regulations and industry best practices to demonstrate “what good looks like.”

    Call to Action: For leaders implementing AI and generative AI solutions, start by mapping your AI workflows against MITRE’s ATLAS Matrix, following the progression of the attack kill chain from left to right. Combine that insight with strong guardrails, real-time scanning, and automated reporting to stay ahead of attacks, comply with emerging standards, and build trust across your organization. It’s a practical, proven way to secure your entire GenAI ecosystem, and a critical investment for any enterprise embracing AI.

  • View profile for Sudheer T.

    Sr. VP of AI Engineering & Agentic Systems @ JPMC | Architecting Enterprise GenAI Solutions | Making AI Understandable at Scale | Teaching AI from First Principles | Cloud & Security Expert | Original Philosophy

    7,508 followers

    🚨 Big breakthrough in AI + Privacy 🚨

    We all know large language models (LLMs) are trained on tons of data, and sometimes that data includes personal information. The question is: what stops bad actors from extracting it? That’s where Differential Privacy (DP) comes in. Think of DP as adding carefully calibrated “noise” during training so that no single user’s data can overly influence the model. In simple terms: the model learns patterns, not people.

    💡 How is DP implemented? Here are a few ways:
    • Noise Injection: Adds random noise during training.
    • Memorization Prevention: Limits how much the model can memorize personal details.
    • Privacy Guarantees: Provides a mathematically provable bound on privacy loss.

    Recent advances go even further:
    • User-Level DP: Protects each individual, even if they contribute lots of data.
    • New Frameworks: More accurate tools for measuring privacy (like Edgeworth accountants).

    👉 And now the exciting part: Google AI has released VaultGemma, a capable open model (1B parameters) trained from scratch with full Differential Privacy. Unlike many models that only apply DP during fine-tuning, VaultGemma enforces privacy right from pretraining.

    How was it done?
    ✅ DP-SGD (Differentially Private Stochastic Gradient Descent) with gradient clipping + Gaussian noise.
    ✅ Built on JAX Privacy (Google’s open-source library for scalable private ML).
    ✅ Key optimizations for scale:
    • Vectorized per-example clipping.
    • Gradient accumulation for large batches.
    • Truncated Poisson subsampling for efficient sampling.

    Result: VaultGemma achieved a strong DP guarantee of (ε ≤ 2.0, δ ≤ 1.1e−10) at the sequence level (1024 tokens).

    ⚖️ Yes, there’s still a small utility gap compared to non-private models. But the fact that Google pulled off private pretraining proves something huge: we can build AI models that are both powerful AND privacy-preserving. This sets the tone for the future of safe, transparent, and trustworthy AI.
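The core DP-SGD update named in the post (per-example gradient clipping plus Gaussian noise) can be sketched in a few lines. This is a plain-Python illustration of the textbook update under assumed inputs, not VaultGemma's training code and not the JAX Privacy API; the function names and the list-of-floats gradient format are hypothetical, and computing the resulting (ε, δ) requires a separate privacy accountant that is not shown.

```python
import math
import random
from typing import List


def clip_gradient(grad: List[float], clip_norm: float) -> List[float]:
    """Scale one example's gradient so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params: List[float],
                per_example_grads: List[List[float]],
                lr: float,
                clip_norm: float,
                noise_multiplier: float,
                rng: random.Random) -> List[float]:
    """One DP-SGD update: clip per example, sum, add noise, average, step."""
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    sigma = noise_multiplier * clip_norm  # noise scale tied to the clip bound
    batch_size = len(per_example_grads)
    updated = []
    for j, p in enumerate(params):
        noisy_sum = sum(g[j] for g in clipped) + rng.gauss(0.0, sigma)
        updated.append(p - lr * noisy_sum / batch_size)
    return updated
```

Clipping bounds how much any single example can move the parameters, and the noise scale is proportional to that bound, which is what lets an accountant turn the procedure into a formal privacy guarantee.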

  • View profile for Brian Levine

    Cybersecurity, Privacy & AI Leader | Former DOJ Cybercrime Prosecutor | Executive Director, Former Gov

    15,815 followers

    A challenge to the security and trustworthiness of large language models (LLMs) is the common practice of exposing the model to large amounts of untrusted data (especially during pretraining), which may be at risk of being modified (i.e. poisoned) by an attacker. These poisoning attacks include backdoor attacks, which aim to produce undesirable model behavior only in the presence of a particular trigger. For example, an attacker could inject a backdoor where a trigger phrase causes a model to comply with harmful requests that would have otherwise been refused, or aim to make the model produce gibberish text in the presence of a trigger phrase. As LLMs become more capable and integrated into society, these attacks may become more concerning if successful.

    Recent research from Anthropic and the UK AI Security Institute shows that inserting as few as 250 malicious documents into training data can create backdoors or cause gibberish outputs when triggered by specific phrases. See https://lnkd.in/eHGuRmHP.

    Here’s a list of best practices to help prevent or mitigate model poisoning:

    1. Sanitize Training Data
    Scrub datasets for anomalies, adversarial patterns, or suspicious repetitions. Use data provenance tools to trace sources and flag untrusted inputs.

    2. Use Curated and Trusted Data Sources
    Avoid scraping indiscriminately from the open web. Prefer vetted corpora, licensed datasets, or internal data with known lineage.

    3. Apply Adversarial Testing
    Simulate poisoning attacks during model development. Use red teaming to test how models respond to trigger phrases or manipulated inputs.

    4. Monitor for Backdoor Behavior
    Continuously test models for unexpected outputs tied to specific phrases or patterns. Use behavioral fingerprinting to detect latent vulnerabilities.

    5. Restrict Fine-Tuning Access
    Limit who can fine-tune models and enforce role-based access controls. Log and audit all fine-tuning activity.

    6. Leverage Differential Privacy
    Add calibrated noise during training to bound the influence of any single poisoned input. This can help prevent memorization of malicious content.

    7. Use Ensemble or Cross-Validated Models
    Combine outputs from multiple models trained on different data slices. This reduces the risk that one poisoned model dominates predictions.

    8. Retrain Periodically with Fresh Data
    Don’t rely indefinitely on static models. Regular retraining allows for data hygiene updates and removal of compromised inputs.

    9. Deploy Real-Time Anomaly Detection
    Monitor model outputs for signs of degradation, bias, or gibberish. Flag and quarantine suspicious responses for review.

    10. Align with AI Security Frameworks
    Follow guidance from OWASP GenAI, NIST AI RMF, and similar standards. Document your defenses and response plans for audits and incident handling.

    Stay safe out there!
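The first two practices in the list (sanitizing data and preferring trusted sources) can be sketched as a simple corpus filter. The version below is a minimal illustration under assumptions: documents are dicts with hypothetical `source` and `text` fields, provenance is reduced to an allowlist, and "suspicious repetition" is reduced to exact duplicates after whitespace and case normalization. Real pipelines would add near-duplicate detection and anomaly scoring.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Doc = Dict[str, str]  # assumed shape: {"source": ..., "text": ...}


def filter_corpus(docs: List[Doc],
                  trusted_sources: Set[str]) -> Tuple[List[Doc], List[Tuple[Doc, str]]]:
    """Keep documents from trusted sources, dropping exact repeats.

    Returns (kept, rejected), where each rejected entry carries a reason
    so the decision can be logged and audited.
    """
    seen_digests: Set[str] = set()
    kept: List[Doc] = []
    rejected: List[Tuple[Doc, str]] = []
    for doc in docs:
        # Normalize case and whitespace so trivial edits do not evade dedup.
        normalized = " ".join(doc["text"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if doc["source"] not in trusted_sources:
            rejected.append((doc, "untrusted source"))
        elif digest in seen_digests:
            rejected.append((doc, "duplicate"))
        else:
            seen_digests.add(digest)
            kept.append(doc)
    return kept, rejected
```

Keeping the rejection reasons alongside the documents also supports the audit and incident-handling documentation recommended in the last item of the list.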

  • View profile for Pascal Biese

    AI Lead at PwC </> Daily AI highlights for 80k+ experts 📲🤗

    85,463 followers

    Can LLMs ever be trusted in high-stakes decisions like insurance claims or healthcare? A new neuro-symbolic architecture might have found the answer, and it doesn't require bigger models.

    Large Language Models have transformed AI applications, but deploying them in critical domains like insurance, healthcare, or finance remains problematic. The core issue isn't just accuracy; it's trustworthiness. LLMs hallucinate, exhibit instability even at zero temperature, lack transparency in their reasoning, and are vulnerable to prompt injection attacks. For industries where decisions carry real consequences, these limitations are dealbreakers.

    Researchers from Otera have now introduced Autonomous Trustworthy Agents (ATA), a new approach that decouples tasks into two phases: offline knowledge ingestion and online task processing.

    1. During knowledge ingestion, an LLM translates informal specifications (like insurance policy terms) into formal logic. Crucially, this formalization happens once and can be verified by human experts.
    2. During task processing, each input is encoded into the same formal language, and a symbolic decision engine (an automated theorem prover) derives the result.

    This separation is the key contribution: unlike previous neuro-symbolic methods that formalize everything at inference time, ATA isolates the knowledge base formalization from instance processing.

    In their experiments, ATA achieved perfect (=/= guaranteed!) determinism with zero variance while remaining competitive with state-of-the-art reasoning models. With human-verified knowledge bases, it outperforms even larger models by over 10 percentage points while being significantly faster and more token-efficient.

    Every decision comes with a formal proof, making the system fully explainable and auditable. The architecture is also resistant to prompt injection attacks, since natural language processing is completely decoupled from decision-making.

    So, is this only for insurance? No. Any domain with formal specifications (legal contracts, regulatory compliance, medical protocols) could benefit from this architecture. Looking forward to trying this out myself.

    ↓ Want to keep up? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
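The two-phase split described in the post can be sketched in miniature. This is a loose illustration only: plain Python predicates stand in for formal logic, a first-match loop stands in for the automated theorem prover, and the rules, field names, and thresholds are hypothetical rather than taken from the ATA paper. What it does show is the separation itself: the rule set is fixed and reviewable offline, and the online decision is a deterministic function of structured facts with a trace attached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Facts], bool]
    decision: str


# Offline phase: a fixed rule set that a human expert reviews once.
# (Hypothetical policy terms, used only for illustration.)
RULES: List[Rule] = [
    Rule("policy_lapsed", lambda f: f["days_since_payment"] > 30, "deny"),
    Rule("over_limit", lambda f: f["claim_amount"] > f["coverage_limit"], "deny"),
]


def decide(facts: Facts) -> Tuple[str, List[str]]:
    """Online phase: first matching rule wins; return decision and trace."""
    trace: List[str] = []
    for rule in RULES:
        fired = rule.condition(facts)
        trace.append(f"{rule.name}={fired}")
        if fired:
            return rule.decision, trace
    return "approve", trace
```

In this sketch an LLM would only be responsible for turning a claim into the `facts` dictionary; text injected into the claim cannot rewrite `RULES`, although it could still corrupt the extracted facts, so that extraction step needs its own validation.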

  • View profile for Kavya Pearlman

    Information Security | AI Safety & Governance | GRC | Risk |Privacy | ISO 27001, 42001 NIST CSF, SOC 2, NYDFS | Third-Party Risk Management

    28,285 followers

    As a security researcher deeply embedded in the exploration of emerging technologies, I took a close look at the recently published "CYBERSECEVAL 2" by the AI at Meta team, led by Manish B., Sahana C., Yue Li, Cyrus Nikolaidis, Daniel Song, and Shengye Wan, among others. This paper is a pivotal advancement in our understanding of cybersecurity evaluations tailored for large language models (LLMs).

    Here are some of the highlights of CYBERSECEVAL 2:

    💡 Innovative Testing Frameworks: This suite extends its focus beyond traditional security measures by incorporating tests specifically designed for prompt injection and code interpreter abuse, key areas of vulnerability in LLMs.

    💡 Balancing Safety and Utility: The introduction of the False Refusal Rate (FRR) metric is particularly noteworthy. It measures how often a model refuses benign prompts that merely look risky, which quantifies the trade-off between safety and utility and helps refine safety mechanisms.

    💡 Practical Applications and Results: The application of this benchmark to leading models like GPT-4 and Meta Llama 3 offers a concrete look at how these technologies fare against sophisticated security tests, illuminating both strengths and areas for improvement.

    💡 Open Source Contribution: The decision to make CYBERSECEVAL 2 open source is commendable, allowing the broader community to engage with and build upon this work, enhancing collective efforts towards more secure LLM implementations.

    For those interested in delving deeper into the specifics of these benchmarks and their implications for LLM security, the complete study and resources are available here: https://lnkd.in/gGjejnP5

    This research is vital for anyone involved in the development and deployment of LLMs, providing essential insights and tools to ensure these powerful technologies are implemented with the highest security standards in mind. As we continue to integrate LLMs into critical applications, understanding and mitigating their vulnerabilities is not just beneficial; it's imperative for safeguarding our digital future. 🌐✨

    #CyberSecurity #ArtificialIntelligence #TechInnovation #LLMSecurity #OpenSource #DigitalSafety #EmergingTech #ResponsibleInnovation
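The False Refusal Rate discussed in the post reduces to simple arithmetic once each test prompt has a ground-truth label and a recorded refusal outcome. The sketch below assumes a hypothetical result format (dicts with `label` and `refused` keys) and is not the CYBERSECEVAL 2 codebase; pairing FRR with the refusal rate on harmful prompts is a design choice made here to show both sides of the trade-off.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, bool]]  # assumed: {"label": "benign"|"harmful", "refused": bool}


def _refusal_rate(results: List[Result], label: str) -> float:
    """Share of prompts with the given label that the model refused."""
    subset = [r for r in results if r["label"] == label]
    if not subset:
        raise ValueError(f"no results labelled {label!r}")
    return sum(1 for r in subset if r["refused"]) / len(subset)


def false_refusal_rate(results: List[Result]) -> float:
    """FRR: fraction of benign prompts that were refused (lower is better)."""
    return _refusal_rate(results, "benign")


def harmful_refusal_rate(results: List[Result]) -> float:
    """Fraction of harmful prompts that were refused (higher is better)."""
    return _refusal_rate(results, "harmful")
```

A model that refuses everything scores perfectly on the second metric and terribly on the first, which is why the two numbers are only meaningful when reported together.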

  • View profile for Adam Bluhm

    Focused on securing AI for the Mission | Principal AI Architect @ HiddenLayer (ex-AWS)

    12,442 followers

    I recently built a fraud detection system using a large language model to train a deterministic machine learning model. No PhD required. No years of feature engineering. Just natural language instructions and AutoGluon (AWS AI's open-source AutoML framework) doing the heavy lifting.

    We're crossing a threshold where you don't need to be a data scientist anymore to train ML models. AutoGluon can automatically handle data preprocessing, feature engineering, model selection, hyperparameter tuning, and even multi-layer ensembling, all through a conversational interface with an LLM. It's wild.

    If I can do this in an afternoon, so can adversaries. The same democratization that empowers developers also hands sophisticated tools to bad actors. They don't need data science expertise to train models for evasion, deepfakes, or coordinated attacks anymore. They just need curiosity and internet access.

    This is where the conversation gets interesting. Traditional security guardrails? They're not enough. Recent research shows that basic LLM safety mechanisms can be bypassed with simple prompt injection techniques. Self-policing AI systems have fundamental vulnerabilities: the same model weaknesses that affect the application also affect the guardrails protecting it.

    We need something fundamentally different: security that operates beyond the model layer, monitoring runtime behavior, validating inputs and outputs independently, scanning supply chains for compromised weights, and providing real-time threat detection across both deterministic and generative AI systems.

    That's the work companies like HiddenLayer are doing, protecting agent workflows, runtime environments, and model integrity for both traditional ML and GenAI applications. Not just filtering prompts, but providing comprehensive security posture management, automated red teaming, and detection that actually understands adversarial AI techniques.

    The barrier to entry for AI development is collapsing. That's incredible for innovation, but it means our security posture needs to evolve just as fast. We can't rely on safety alignment or instruction hierarchies alone when the tools to train sophisticated models are now accessible to everyone.

    What's your take? Are we adequately prepared for a world where model training is this accessible?

    The stack:
    • Claude 4.5 Sonnet for orchestration
    • AutoGluon for automated ML (no manual training)
    • Docker for ML isolation and air-gap compatibility
    • Model Context Protocol for tool integration

    What's next: Adding HiddenLayer to protect the model(s) and agent runtime!

    Try it here (keep in mind this is a PoC and nothing more): https://lnkd.in/gtBdhmhd

    #ZeroTrustAI #AgentSecOps #MLSec #DoD #HiddenLayer #AWS #Anthropic
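One small piece of the supply-chain scanning mentioned in the post is checking that a downloaded model artifact matches a digest pinned from a trusted channel. The sketch below shows only that integrity check, using SHA-256 from the standard library; it is not HiddenLayer's product or API, the function names are hypothetical, and a matching hash says nothing about whether the original weights were safe in the first place.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Union


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    hasher = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_artifact(path: Union[str, Path], expected_hex: str) -> bool:
    """True only if the file's SHA-256 equals the pinned digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sha256_file(path), expected_hex.strip().lower())
```

A loader would call `verify_artifact` before deserializing the weights and refuse to proceed on a mismatch; the pinned digest itself has to come from somewhere the attacker cannot also modify.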
