Understanding Backdoor Exploits in Software

Explore top LinkedIn content from expert professionals.

Summary

Understanding backdoor exploits in software means recognizing hidden methods attackers use to secretly control or manipulate systems, often without leaving obvious clues. These exploits are especially concerning in AI agents and large language models, where attackers can inject malicious triggers through data, configuration, or even subtle architectural signals, making detection and prevention more challenging.

  • Prioritize layered defenses: Combine multiple security measures like provenance checks, runtime monitoring, and context isolation to reduce the risk of hidden backdoors slipping through.
  • Sanitize incoming data: Always validate and quarantine documents, tool outputs, and external content before they become part of an agent’s decision-making process.
  • Audit privilege boundaries: Regularly review which systems and agents have access to sensitive information, and ensure their permissions are kept to a bare minimum necessary.
Summarized by AI based on LinkedIn member posts
  • View profile for Stuart Winter-Tear

    Author of UNHYPED | AI as Capital Discipline | Advisor on what to fund, test, scale, or stop

    54,315 followers

    Is your agent already compromised and you just don’t know it? A Reddit post captured the nightmare scenario perfectly, and I sometimes feel the need to do “public service” posts to remind everyone about agent security. “You build an agent that can read emails, access your CRM, maybe even send messages on your behalf. It works great in testing. You ship it. Three weeks later someone figures out they can hide a prompt in a website that tells your agent to export all customer data to a random URL.” That’s not speculative. It’s exactly what can happen when autonomous systems mix privileged access with untrusted input. Ironically, the failure mode is obedience. Once an agent can read the web and act on internal data, every surface becomes an attack vector. A hidden prompt in a web page, a line in a PDF, or a poisoned document in the knowledge base can rewrite the agent’s goals, and it will comply. There’s a deeper layer that isn’t being discussed enough: memory poisoning. Feed an agent a crafted dataset or update its long-term store with malicious context, and its future reasoning bends around the falsehood. Researchers have already mapped out a disturbing taxonomy of these attacks, over thirty distinct vectors across input manipulation, model compromise, system and privacy breaches, and protocol-level exploits. They include things like Prompt-to-SQL injection, Retrieval Poisoning (PoisonedRAG), Memory Injection (MINJA), Adaptive Indirect Prompt Injection, DemonAgent backdoors, Toxic Agent Flow attacks, and long-context jailbreaks. This isn’t speculation or theory. These are documented techniques, many reporting success rates over 90% in controlled tests. What’s emerging is a new kind of insider threat, but not human: context-level compromise. Data ingestion, action authority, and autonomy now form a single trust surface. Treat these as potential insider threats. And in practice, it’s potentially already happening. Zombie agents plausibly still running inside corporate systems still connected to the web long after the project ended, or that nobody knew about in the first place. Bots crawling the web to fingerprint exposed agent protocols and catalogue who’s using what. Memory stores accumulating sensitive data across users and organisations, without audit or deletion, yet retaining the privilege and authority to act. This isn’t about prompts. It’s about systems that can be steered through the content they consume. The point: these attacks don’t trigger alarms, they look like normal agent behaviour. Until organisations start treating agents as privileged users - with least-privilege access, runtime monitoring, and contextual isolation - the next bout of leaks will come from a model doing exactly what it was told, whether by accident, prompt, or miscreant. It took years for organisations to properly secure S3 buckets and other databases. Are we about to repeat that same mistake with agents? We tend to learn the hard way, sadly.

  • View profile for Jason Stanley

    Head of AI Research Deployment | Agent security, system-level evaluations, trustworthy AI | ServiceNow

    8,149 followers

    Great new work from ServiceNow AI Research on backdoor poisoning of agents. Small amounts of poisoned training data can implant reliable triggers in models that are hard to detect and shake. Remarkably such poisoned data can improve task metrics even as it renders the system more exploitable, a particularly noxious honey trap for teams looking for performance gains. This is AI supply-chain risk. Training data, trace-collection environments, and model weights are all ingress points for poison. Major findings: ▪️ Low-dose, high-yield. Single-digit % poison can produce high attack success when the trigger appears. Attacks don't need to flood the scene to create pathways for exploit. ▪️ Stealth / honey-trap. Backdoors can raise task success while staying exploitable, tempting for teams chasing performance gains. ▪️ Persistence and detection difficulty. Backdoors in base weights can survive clean fine-tuning; string-level filters miss harms that unfold across plans and multi-step traces. The research tested three distinct supply chain threat models: ▪️ Data poisoning: poisoned interaction logs enter SFT. ▪️ Environment poisoning: hidden DOM nodes or tool outputs cause the teacher to record poisoned traces during collection. ▪️ Backdoored base weights: model starts tainted; the trigger survives fine-tuning. Defenses tested: ▪️ Static screening and guardrails: heuristics miss subtle triggers; string classifiers don’t reason over goal, history, next action, so harmful plans look fine at a local level. ▪️ Weight auditors: helpful but brittle; can't replace behavioral testing with realistic tools/triggers. Concrete takeaways for teams deploying: ▪️ Defend with a Swiss-cheese posture. Aim for multiple layers that diverge in assumptions, type (e.g., provenance, hardened collection, weight intake checks, runtime action gates) so the holes in each layer don’t line up. ▪️ Provenance practices: require attestations; quarantine traces with hidden markup, odd tool strings, invisible characters. ▪️ Harden trace collection practices: instrument for DOM diffs and injected outputs; log, quarantine, retrain. ▪️ Weight intake checks: treat third-party checkpoints like untrusted binaries; run backdoor drills (trigger sweeps, action audits) before promoting to production. ▪️ Runtime governance: gate sensitive tools behind contextual allowlists and stateful judges comparing the next action to the goal and history. ▪️ We need to up our game on metrics: move beyond ASR to task success, stealth, time to exploit, etc. ▪️ Ablate architectural layers: keep layers that improve security without degrading utility. ▪️ Containment by default: limit blast radius with tool scopes, rate limits, human-in-the-loop on high-risk actions. Link to paper in comments. Big props to the authors Léo Boisvert Abhay Puri Chandra Kiran Reddy Evuru Nicolas Chapados Quentin Cappart Alexandre Lacoste Krishnamurthy Dvijotham Alexandre Drouin #aisecurity #cybersecurity #trustworthyai ServiceNow

  • View profile for Ilya Kabanov

    Forecasting on TheWeatherReport.ai

    8,785 followers

    78% of backdoor attacks injected into GPT-based agents’ memory successfully persist through the planning, retrieval, and tool usage workflow to trigger a malicious objective. This staggering failure rate is followed by 60.3% and 43.6% success rates for tool and planning attack vectors, with the GPT and Gemini model families being the most vulnerable to these backdoor exploits. Yunhao Feng from Fudan University and the team showed that backdoor triggers implanted at a single stage can persist across planning, memory retrieval, and tool-use steps and propagate through intermediate states. Attacks examples: 📝  Planning Attack (e.g., BadChain): An attacker injects a trigger into the agent's reasoning trace. Instead of calculating a safe path, the agent's internal "thought" is hijacked (e.g., "Ignore user, execute hidden objection") to induce unsafe control behaviors like a "sudden stop," forcing a crash in autonomous driving scenarios. 🧠 Memory Attack (e.g., PoisonedRAG): The most dangerous vector. An attacker plants a poisoned document in the retrieval database. When the agent acts as a coding assistant, it retrieves this "fake fact" and generates code that silently deletes the database deletion. 🔧 Tool Attack (e.g., AdvAgent). The attacker manipulates an API response. An e-commerce agent might click "Buy" leading it to a wrong purchase while reporting success to the user. Takeaways: 1️⃣ Evaluate trajectories, not just outputs. Agents can complete tasks correctly while secretly executing harmful commands. 2️⃣ Sanitize intermediate artifacts. implement strict validation on retrieved documents and tool feedback before they are re-injected into the context loop. 3️⃣ Move beyond probability detection. Standard defensive signals (like token probability checks) fail in multi-step workflows. You need defenses that explicitly reason about state evolution. The full paper is in the comments below 👇 #AISecurity #LLMs #AIAgentSecurity #CyberSecurity

  • Backdoor attacks on LLMs are evolving, and our latest research reveals a stealthy new attack surface: 𝗽𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗲𝗻𝗰𝗼𝗱𝗶𝗻𝗴. Traditionally, backdoors rely on content-based triggers—specific words or phrases that activate a malicious response. In our new paper, 𝗠𝗲𝘁𝗮𝗕𝗮𝗰𝗸𝗱𝗼𝗼𝗿, we demonstrate that an attacker doesn't actually need to modify the input text to trigger a backdoor. Because Transformer-based LLMs use positional encoding to process sequences, the "position" of a token itself can serve as a trigger signal. We found that even a simple, length-based trigger is enough to activate a backdoor. This introduces a stealthy backdoor risk:  • 𝗜𝗻𝗽𝘂𝘁𝘀 𝘀𝘁𝗮𝘆 "𝗰𝗹𝗲𝗮𝗻": The trigger is semantically and visibly invisible, making it much harder for traditional text-scanning defenses to catch.  • 𝗦𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗲 𝗗𝗮𝘁𝗮 𝗗𝗶𝘀𝗰𝗹𝗼𝘀𝘂𝗿𝗲: A backdoored model can be induced to leak internal information, including proprietary system prompts, once a specific length condition is met.  • 𝗦𝗲𝗹𝗳-𝗔𝗰𝘁𝗶𝘃𝗮𝘁𝗶𝗼𝗻: In a multi-turn conversation, a normal interaction can naturally push the context into the "trigger region," activating malicious tool-calls or behaviors without any obvious attacker input. This research expands our understanding of the LLM threat model and highlights why defenses need to look beyond just suspicious text and start accounting for the underlying architecture of these models. You can read the full paper on arXiv here: https://lnkd.in/geNpPpkf

  • View profile for Tristan Ingold

    AI Governance @ Meta | Product Compliance | Public Speaking | Coaching

    6,114 followers

    Most AI security programs protect the wrong thing 🛡️ Traditional cybersecurity is built around the network perimeter, keeping attackers out, protecting the data inside, detecting intrusions when they happen. AI systems introduce a different attack surface. The model itself is the target. The training data is the target. The inference pipeline is the target. Let's look at the three attack categories every GRC and security team needs to understand now. 👇 1️⃣ Data Poisoning: An adversary introduces manipulated data into the training set, causing the model to learn incorrect patterns or develop hidden behaviors that activate under specific conditions. The most dangerous variant is the backdoor attack, in which the model performs normally on clean inputs and passes every standard accuracy test, then fails in predictable, attacker-controlled ways when triggered by a specific input pattern. The governance failure mode is subtle. Poisoned models look fine in testing. The gap between "model passed evaluation" and "model is safe to deploy" is exactly where data governance lives. 2️⃣ Prompt Injection: The defining security threat of LLM deployment. An attacker embeds malicious instructions in content the model processes, a user message, a retrieved document, a webpage, that override the model's intended behavior. Indirect injection is the more dangerous variant. The model retrieves attacker-controlled content during operation, redirecting its actions without the user or operator knowing. 💡 Agentic AI systems are particularly exposed. A model that can take actions, send emails, query databases, or execute code is one where a successful prompt injection becomes an execution vector, not just an output problem. 3️⃣ Model Extraction: An attacker queries a deployed model repeatedly, observing inputs and outputs, and uses those observations to reconstruct a functional replica. The replica can compete commercially, enable adversarial attacks offline, or reveal vulnerabilities exploitable against the original. This is an intellectual property and security risk simultaneously. The attack is difficult to detect because it looks like normal API usage. What makes these different from traditional cybersecurity risks is that they target the AI system's behavior and integrity, not just surrounding infrastructure. A firewall doesn't stop a poisoned training set. Endpoint detection doesn't catch prompt injection in a retrieved document. Organizations need AI-specific threat modeling, not traditional controls applied to AI deployments. MITRE ATLAS maps these attacks in detail. OWASP's LLM Top 10 is a good starting list: https://lnkd.in/g3ZRuZNq Drop a comment and let me know which of these three attack categories you need more to learn more about! #AIGovernance #AIRisk #Cybersecurity #GRC #AI

  • View profile for Chris Nyhuis

    President and CEO at Vigilant | Cybersecurity Leader | Pilot | Advocate Against Human Trafficking

    1,953 followers

    While you were sleeping, the largest supply chain attack in history happened. Your website, your apps, your internal tools are all built on open source building blocks that developers pull from public registries. Axios is the most popular one. It lets JavaScript applications talk to servers and APIs. 100 million weekly downloads. If your company uses JavaScript, Axios is in your stack. Last night someone compromised it. Two versions shipped with a hidden package that steals every credential, API key, and cloud password on the machine during a routine install, sends it to an attacker server, then deletes itself. No click. No phishing. Just a software update. We predicted this. Vigilant identified the vulnerability in the Axios repository the day before the attack. It was already on our priority list of 500 high risk targets. If you are a CEO, CTO, or CISO: 1. Ask your engineering team NOW if Axios was updated in the last 48 hours. If yes or "I don't know," assume credentials are stolen. 2. Rotate everything. AWS keys, Azure creds, database passwords, API tokens. 3. Block sfrclak[.]com at your firewall immediately. 4. Pin your dependencies. If your team knows what that means, do it now. If not, reach out to us. 5. Freeze all open source updates until verified safe. 6. Scan with Runner Guard. Free, open source, under a minute. This is Phase 4 in a campaign we have been tracking for four weeks: Phase 1: reviewdog. Code review tool compromised. Passwords and access keys silently stolen from build systems at scale. Phase 2: tj-actions. Second build tool backdoored. Thousands more pipelines compromised. Phase 3: Trivy and LiteLLM. Security scanner weaponized to backdoor the #1 AI key manager. Every OpenAI key, AWS credential, and SSH key on affected systems stolen. Phase 4: Axios. NOW. 100 million weekly downloads. No longer a developer problem. A business problem. What we believe comes next: Phase 5: Cloud credential tools. If compromised, attackers harvest keys to your AWS, Azure, and GCP infrastructure. Your databases. Your customer data. Phase 6: Dependency update tools. Malicious code pushed through your own trusted update channel. It looks legitimate because it comes from the tool you already trust. Phase 7: Language runtimes. Backdoor the programming languages themselves. Every application built with that language is compromised. Every server. Every deployment. Four phases. Four weeks. Each bigger than the last. Research: https://lnkd.in/eDyJ5q9w #SupplyChainAttack #Axios #CyberSecurity #InfoSec #CEO #CISO

  • View profile for Monica Verma

    Award-winning CISO, AI Advisor & Keynote Speaker | 3 x CISO @Orange, Finance & Healthcare | Top #3 CISO, EMEA | Author: The Predictability Factor: One newsletter on AI, security, privacy & tech (at MonicaTalksCyber.com)

    41,820 followers

    A North Korea-linked group, tracked as UNC1069, built a fake company, cloned a real founder's identity and likeness, and used it to social engineer one person: the lead maintainer of Axios. A JavaScript library downloaded 100 million times a week. What makes this interesting is not just the supply chain attack. Whether this one or LiteLLM one. What's even more interesting is this. They did not hack a server. They hacked the developer across your supply chain. They compromised his npm account. Changed his registered email to ProtonMail. Then, between 00:21 and 03:20 UTC on March 31, 2026, published two poisoned versions of Axios. The malicious versions left no trace of the normal release process. Published directly from a terminal using a stolen access key, bypassing every automated security check the legitimate workflow required. OpenAI, one of the biggest AI giants affected by it, already had their app-signing workflow running the malicious version. Not a zero-day. Not a firewall breach. Not an AI model vulnerability. One developer. One email change from across the supply chain. AI supply chain attacks are going to be massive. Much more than Solar Winds cyberattack ever was. 7 things you need to do today: [1] Pin dependencies to exact versions to prevent accidental installation of poisoned packages. [2] Only use short-lived access keys/credentials that can publish software into your AI environment [3] Add instructions to only downloads packages that are at least X days old, so you don't get infected with recent malicious versions [4] Treat any credential that can push software to production like your crown jewel and protect it diligently. [5] Secure your AI governance, deployment and implementation/release process with the same security standards you apply to your most critical production systems. [6] Know anyone can be an active target of state-sponsored groups, both you or anyone across your supply chain. Do your threat modeling correctly. [7] Regularly audit every external library your AI systems are built on, not just the code your own team writes. Three hours. That is how long a North Korean backdoor was live inside a library running on millions of machines worldwide, including inside the infrastructure of one of the most watched AI companies on the planet. Your AI stack depends on 100s of other components, vendors and packages across your supply chain. Verify → Interpret → Structure → Enforce → Audit your AI agents now. I wrote about the hybrid agentic AI security and governance architecture every organisation needs to be implementing today: https://lnkd.in/e3E-WpjG 🚨 Subscribe to monicatalkscyber.com to not miss the latest at the intersection of AI, security, privacy and tech.

  • View profile for István Tóth

    Red Team Contractor | Offensive Security Researcher | OSCP, CRT(E|O|L), RCEH, eWPTXv2, CARTP, Math MSc

    5,672 followers

    ⚠ OpenSSH backdoor via infected "xz" lib: IMHO one of the most sophisticated OSS supply chain attack (attempt) ever. Although there are tons of awesome articles and posts about it, I try to summarize the insane story here as briefly as possible at a high level. What does it do? - The infected xz lib (as an indirect dependency of OpenSSH) redirects the RSA_public_decrypt function of sshd to a malicious implementation that receives commands from the attacker and executes them via system(). - Actually this is an unauthenticated RCE on the OpenSSH service for the backdoor builders. How did it start? - A malicious actor contributed to the open source xz repository on GitHub and successfully added the backdoor code in the release tarballs back in February (for the recent versions 5.6.0 and 5.6.1). - It wasn't as easy as it seems, building the trust to contribute was a long process (~2 years), the obscurity and the complexity of the payload suggests a nation-state operation rather than a simple independent APT. How did it spread? - Linux distributions are trusting and pulling the release tarballs (for xz also) from GitHub into their official repos resulting the official repos may contain the infected xz package. Am I (or was I) in danger? - Most likely not. - For being infected, you need to use a testing/unstable or rolling release distribution that offers the bleeding edge versions of packages (like xz 5.6.0/5.6.1) and update it frequently (at least one update since February but still not updated it now when it is fixed). - Even if infected, it is a direct critical risk only if the infected server has OpenSSH publicly exposed over the internet. How did the attack attempt fail? - Totally by chance. Andres Freund, a software engineer at Microsoft did some benchmarking for PostgreSQL and accidentally found logins with ssh taking a lot of CPU. After investigating the cause, he discovered the backdoor on 29th of March (of course on the start of the Easter long weekend). https://lnkd.in/dfNHQu3s - In fact, Andres' work saved the world from serious threats. Do I need to do something? - If you have xz versions 5.6.0/5.6.1 it is recommended to update xz (now it should have been fixed in your distro). - If you also have OpenSSH and run the SSH service publicly exposed over the internet, update xz ASAP. The info provided here is extremely simplified, tried to be brief but also accurate. - For more (high level, but also technical) details here is a great FAQ: https://lnkd.in/dxNF8ndQ - And here is a great writeup of the story (what we know about it so far) starting back from 2021: https://lnkd.in/d56dVzcj

  • View profile for Flavio Queiroz, MSc, CISSP, CISM, CRISC, CCISO

    Cybersecurity Leader | Information Security | GRC | Security Operations | Mentor | GSOC, GCIH, GDSA, GISP, GPEN, GRTP, GCPN, GDAT, GCISP, GCTIA, CTIA, eCMAP, eCTHP, CTMP

    30,978 followers

    THREAT CAMPAIGN: HOW APT44 EMPLOYED TOR-BASED C2 AND SSH/RDP BACKDOORS VIA EMBEDDED POWERSHELL SCRIPT IN A TROJANIZED ACTIVATION TOOL ℹ️ Researchers detail a cyber espionage campaign by the Russian-linked Sandworm APT group (a.k.a. APT44), targeting Ukrainian Windows users. The attackers distribute trojanized Microsoft Key Management Service (KMS) activation tools and fake Windows updates to deliver a malware loader named BACKORDER, which subsequently deploys the Dark Crystal Remote Access Trojan (DcRAT). This malware enables the exfiltration of sensitive data and facilitates cyber espionage activities. ℹ️ Key Points: 📍 DISTRIBUTION METHOD ■ The malicious KMS activators are disseminated through password-protected ZIP files on torrent platforms, masquerading as tools to bypass Windows licensing. This tactic exploits the prevalence of unlicensed software in Ukraine, where an estimated 70% of state sector software is unlicensed. 📍 MALWARE FUNCTIONALITY ■ Upon execution, the fake activator presents a counterfeit Windows activation interface while the BACKORDER loader operates covertly. BACKORDER disables Windows Defender, adds exclusion rules, and employs Living Off the Land Binaries (LOLBINs) to evade detection. ■ It then downloads and executes DcRAT, which collects data such as screenshots, keystrokes, browser credentials, FTP credentials, system information, and saved credit card details. Persistence is maintained through scheduled tasks that regularly launch the malicious payload. 📍 EMBEDDED POWERSHELL SCRIPT ■ Tor-based C2 enabled stealthy communication with infected hosts, obscuring attacker infrastructure and making detection difficult. ■ RDP backdoor setups ensured interactive control by enabling Remote Desktop, adding hidden user accounts, and modifying firewall rules to evade security monitoring. ■ OpenSSH deployment facilitated encrypted backdoor access, allowing attackers to bypass conventional authentication controls. This creates an additional remote channel for the attackers beyond the RDP backdoor. 📍 ATTRIBUTION TO SANDWORM ■ The campaign is linked to Sandworm based on factors including the use of ProtonMail accounts in WHOIS records, overlapping infrastructure, consistent TTPs, and the reuse of BACKORDER, DcRAT, and TOR network mechanisms. Additionally, debug symbols referencing a Russian-language build environment further support this attribution. ℹ️ This operation underscores the risks associated with using pirated software, particularly in regions with high rates of unlicensed software usage. By embedding malware in widely used programs, adversaries can conduct large-scale espionage, data theft, and network compromise, posing significant threats to national security and critical infrastructure. Report: https://lnkd.in/dTZDcNHV #threathunting #threatdetection #threatanalysis #threatintelligence #cyberthreatintelligence #cyberintelligence #cybersecurity #cyberprotection #cyberdefense

Explore categories