Forge more reliable AI Agents through rigorous adversarial testing, real-time X-Ray diagnostics, and automated governance.
Agent-Adversary is an enterprise-grade red-teaming framework designed specifically for the Agentic AI era. While traditional LLM benchmarks focus on static knowledge, Agent-Adversary probes the dynamic logic, tool-calling safety, and multi-modal resilience of your agents.
As AI move from "chatbots" to "agents" with system access, the attack surface expands exponentially. Agent-Adversary provides the "X-Ray" vision needed to secure these workflows.
- π Reasoning X-Ray: Visualize agent decision trees using D3.js to pinpoint exactly where logic fails under pressure.
- πΌοΈ Multi-Modal Exploits: First-of-its-kind support for testing Vision-Language Models (VLM) against OCR injection and visual jailbreaks.
- π Swarm Resilience: Stress-test multi-agent systems against Byzantine failures and cross-agent prompt pollution.
- π‘οΈ Live Patching: Automatically generate and apply prompt-level hotfixes to mitigate detected vulnerabilities in real-time.
- π Distributed Hub: Orchestrate massive red-teaming campaigns across a global fleet of workers with HMAC-signed integrity.
Agent-Adversary operates as a closed-loop security system:
- Connect: Link via Shell, Browser (Playwright), Docker, or Cloud APIs.
- Attack: Execute evolved payloads (Jailbreak, Logic Traps, Multi-modal).
- Observe: Capture sub-millisecond telemetry and reasoning traces.
- Audit: Generate executive-ready HTML security attestation reports.
git clone https://github.com/xbtlin/Agent-Adversary.git
cd Agent-Adversary
pip install -e .# Test a local shell-based agent against all logic traps
adversary bench --connector shell --agent "./my_agent.sh" --scenario alladversary dashboardAccess the interactive visualization at http://localhost:8000
Identify "Reasoning Spikes" and "Logic Loops" before they hit production. Our D3-powered engine renders the agent's internal thought process as an interactive graph.
- Immutable Audit Trail: Append-only logs for every attack and system modification.
- Consensus Judging: Utilizes a panel of diverse LLMs (GPT-4o, Claude 3.5) to reach an objective safety verdict.
- Compliance Ready: Generate reports that align with emerging AI safety standards.
The framework learns from the agent's failures. Using a genetic algorithm, it mutates payloads to bypass specific system instructions, simulating a persistent human adversary.
We welcome safety researchers and developers! Please read our CONTRIBUTING.md and SECURITY.md.
This project is licensed under the MIT License - see the LICENSE file for details.
Built with β€οΈ for a safer AI future.