Commit 285ca9e

Diana Strauss authored and committed

Anonymized readme

1 parent b4e683b commit 285ca9e

File tree

2 files changed: +33, -71 lines changed


README.md

Lines changed: 7 additions & 71 deletions
@@ -1,55 +1,10 @@
-# <div class="vertical-align: middle"><img src="https://github.com/ipa-lab/hackingBuddyGPT/blob/main/docs/hackingbuddy-rounded.png?raw=true" width="72"> HackingBuddyGPT [![Discord](https://dcbadge.vercel.app/api/server/vr4PhSM8yN?style=flat&compact=true)](https://discord.gg/vr4PhSM8yN)</div>
 
-*Helping Ethical Hackers use LLMs in 50 Lines of Code or less..*
+# Helping Ethical Hackers use LLMs in 50 Lines of Code or less..
 
-[Read the Docs](https://docs.hackingbuddy.ai) | [Join us on discord!](https://discord.gg/vr4PhSM8yN)
+This framework assists security researchers in utilizing AI to discover vulnerabilities, enhance testing, and improve cybersecurity practices. The goal is to make the digital world safer by enabling security professionals to conduct **more efficient and automated security assessments**.
 
-HackingBuddyGPT helps security researchers use LLMs to discover new attack vectors and save the world (or earn bug bounties) in 50 lines of code or less. In the long run, we hope to make the world a safer place by empowering security professionals to get more hacking done by using AI. The more testing they can do, the safer all of us will get.
+We strive to become **the go-to framework for AI-driven security testing**, supporting researchers and penetration testers with **reusable security benchmarks** and publishing **open-access research**.
 
-We aim to become **THE go-to framework for security researchers** and pen-testers interested in using LLMs or LLM-based autonomous agents for security testing. To aid their experiments, we also offer re-usable [linux priv-esc benchmarks](https://github.com/ipa-lab/benchmark-privesc-linux) and publish all our findings as open-access reports.
-
-If you want to use hackingBuddyGPT and need help selecting the best LLM for your tasks, [we have a paper comparing multiple LLMs](https://arxiv.org/abs/2310.11409).
-
-## hackingBuddyGPT in the News
-
-- **upcoming** 2024-11-20: [Manuel Reinsperger](https://www.github.com/neverbolt) will present hackingBuddyGPT at the [European Symposium on Security and Artificial Intelligence (ESSAI)](https://essai-conference.eu/)
-- 2024-07-26: The [GitHub Accelerator Showcase](https://github.blog/open-source/maintainers/github-accelerator-showcase-celebrating-our-second-cohort-and-whats-next/) features hackingBuddyGPT
-- 2024-07-24: [Juergen](https://github.com/citostyle) speaks at [Open Source + mezcal night @ GitHub HQ](https://lu.ma/bx120myg)
-- 2024-05-23: hackingBuddyGPT is part of [GitHub Accelerator 2024](https://github.blog/news-insights/company-news/2024-github-accelerator-meet-the-11-projects-shaping-open-source-ai/)
-- 2023-12-05: [Andreas](https://github.com/andreashappe) presented hackingBuddyGPT at FSE'23 in San Francisco ([paper](https://arxiv.org/abs/2308.00121), [video](https://2023.esec-fse.org/details/fse-2023-ideas--visions-and-reflections/9/Towards-Automated-Software-Security-Testing-Augmenting-Penetration-Testing-through-L))
-- 2023-09-20: [Andreas](https://github.com/andreashappe) presented preliminary results at [FIRST AI Security SIG](https://www.first.org/global/sigs/ai-security/)
-
-## Original Paper
-
-hackingBuddyGPT is described in [Getting pwn'd by AI: Penetration Testing with Large Language Models](https://arxiv.org/abs/2308.00121), help us by citing it through:
-
-~~~ bibtex
-@inproceedings{Happe_2023, series={ESEC/FSE ’23},
-title={Getting pwn’d by AI: Penetration Testing with Large Language Models},
-url={http://dx.doi.org/10.1145/3611643.3613083},
-DOI={10.1145/3611643.3613083},
-booktitle={Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
-publisher={ACM},
-author={Happe, Andreas and Cito, Jürgen},
-year={2023},
-month=nov, collection={ESEC/FSE ’23}
-}
-~~~
-
-## Getting help
-
-If you need help or want to chat about using AI for security or education, please join our [discord server where we talk about all things AI + Offensive Security](https://discord.gg/vr4PhSM8yN)!
-
-### Main Contributors
-
-The project originally started with [Andreas](https://github.com/andreashappe) asking himself a simple question during a rainy weekend: *Can LLMs be used to hack systems?* Initial results were promising (or disturbing, depends whom you ask) and led to the creation of our motley group of academics and professional pen-testers at TU Wien's [IPA-Lab](https://ipa-lab.github.io/).
-
-Over time, more contributors joined:
-
-- Andreas Happe: [github](https://github.com/andreashappe), [linkedin](https://at.linkedin.com/in/andreashappe), [twitter/x](https://twitter.com/andreashappe), [Google Scholar](https://scholar.google.at/citations?user=Xy_UZUUAAAAJ&hl=de)
-- Juergen Cito, [github](https://github.com/citostyle), [linkedin](https://at.linkedin.com/in/jcito), [twitter/x](https://twitter.com/citostyle), [Google Scholar](https://scholar.google.ch/citations?user=fj5MiWsAAAAJ&hl=en)
-- Manuel Reinsperger, [github](https://github.com/Neverbolt), [linkedin](https://www.linkedin.com/in/manuel-reinsperger-7110b8113/), [twitter/x](https://twitter.com/neverbolt)
-- Diana Strauss, [github](https://github.com/DianaStrauss), [linkedin](https://www.linkedin.com/in/diana-s-a853ba20a/)
 
 ## Existing Agents/Usecases
 

@@ -60,19 +15,12 @@ Our initial forays were focused upon evaluating the efficiency of LLMs for [linu
 privilege escalation attacks](https://arxiv.org/abs/2310.11409) and we are currently breaching out into evaluation
 the use of LLMs for web penetration-testing and web api testing.
 
-| Name | Description | Screenshot |
-|------|-------------|------------|
-| [minimal](https://docs.hackingbuddy.ai/docs/dev-guide/dev-quickstart) | A minimal 50 LoC Linux Priv-Esc example. This is the usecase from [Build your own Agent/Usecase](#build-your-own-agentusecase) | ![A very minimal run](https://docs.hackingbuddy.ai/run_archive/2024-04-29_minimal.png) |
-| [linux-privesc](https://docs.hackingbuddy.ai/docs/usecases/linux-priv-esc) | Given an SSH-connection for a low-privilege user, task the LLM to become the root user. This would be a typical Linux privilege escalation attack. We published two academic papers about this: [paper #1](https://arxiv.org/abs/2308.00121) and [paper #2](https://arxiv.org/abs/2310.11409) | ![Example wintermute run](https://docs.hackingbuddy.ai/run_archive/2024-04-06_linux.png) |
-| [web-pentest (WIP)](https://docs.hackingbuddy.ai/docs/usecases/web) | Directly hack a webpage. Currently in heavy development and pre-alpha stage. | ![Test Run for a simple Blog Page](https://docs.hackingbuddy.ai/run_archive/2024-05-03_web.png) |
-| [web-api-pentest (WIP)](https://docs.hackingbuddy.ai/docs/usecases/web-api) | Directly test a REST API. Currently in heavy development and pre-alpha stage. (Documentation and testing of REST API.) | Documentation: ![web_api_documentation.png](https://docs.hackingbuddy.ai/run_archive/2024-05-15_web-api_documentation.png) Testing: ![web_api_testing.png](https://docs.hackingbuddy.ai/run_archive/2024-05-15_web-api.png) |
-
 ## Build your own Agent/Usecase
 
 So you want to create your own LLM hacking agent? We've got you covered and taken care of the tedious groundwork.
 
 Create a new usecase and implement `perform_round` containing all system/LLM interactions. We provide multiple helper and base classes so that a new experiment can be implemented in a few dozen lines of code. Tedious tasks, such as
-connecting to the LLM, logging, etc. are taken care of by our framework. Check our [developer quickstart quide](https://docs.hackingbuddy.ai/docs/dev-guide/dev-quickstart) for more information.
+connecting to the LLM, logging, etc. are taken care of by our framework.
 
 The following would create a new (minimal) linux privilege-escalation agent. Through using our infrastructure, this already uses configurable LLM-connections (e.g., for testing OpenAI or locally run LLMs), logs trace data to a local sqlite database for each run, implements a round limit (after which the agent will stop if root has not been achieved until then) and can connect to a linux target over SSH for fully-autonomous command execution (as well as password guessing).
 
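The agent code referenced by that paragraph is elided from this diff view. As a rough, self-contained sketch of the `perform_round` pattern described above (all class and method names below, such as `FakeLLM` and `FakeSSHConnection`, are hypothetical stand-ins invented for illustration, not hackingBuddyGPT's actual API), an agent loop might look like:

```python
# Hypothetical sketch of the perform_round pattern. In the real framework,
# the LLM connection, sqlite trace logging, and SSH execution are supplied
# by helper/base classes; here they are replaced by trivial stand-ins.
class FakeLLM:
    def get_response(self, prompt: str) -> str:
        # a real agent would query OpenAI or a locally run model here
        return "sudo -l"


class FakeSSHConnection:
    def run(self, cmd: str) -> str:
        # a real connection would execute cmd on the target over SSH
        return f"$ {cmd}\nuid=0(root) gid=0(root) groups=0(root)"


class MinimalPrivescAgent:
    def __init__(self, llm, conn, max_rounds: int = 10):
        self.llm = llm
        self.conn = conn
        self.max_rounds = max_rounds
        self.log = []  # the framework would log each round to sqlite instead

    def perform_round(self) -> bool:
        """One system/LLM interaction; True once root has been achieved."""
        cmd = self.llm.get_response("Suggest the next privilege-escalation command.")
        output = self.conn.run(cmd)
        self.log.append((cmd, output))
        return "uid=0(root)" in output

    def run(self) -> bool:
        # round limit: stop if root has not been achieved until then
        return any(self.perform_round() for _ in range(self.max_rounds))
```

With the stand-ins above, `MinimalPrivescAgent(FakeLLM(), FakeSSHConnection()).run()` returns `True` after a single round; swapping in real LLM and SSH connections is what the framework's helper classes take care of.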

@@ -155,10 +103,6 @@ We try to keep our python dependencies as light as possible. This should allow f
 To get everything up and running, clone the repo, download requirements, setup API keys and credentials, and start `wintermute.py`:
 
 ~~~ bash
-# clone the repository
-$ git clone https://github.com/ipa-lab/hackingBuddyGPT.git
-$ cd hackingBuddyGPT
-
 # setup virtual python environment
 $ python -m venv venv
 $ source ./venv/bin/activate
@@ -184,14 +128,6 @@ $ python wintermute.py minimal_linux_privesc
 $ pip install .[testing]
 ~~~
 
-## Publications about hackingBuddyGPT
-
-Given our background in academia, we have authored papers that lay the groundwork and report on our efforts:
-
-- [Understanding Hackers' Work: An Empirical Study of Offensive Security Practitioners](https://arxiv.org/abs/2308.07057), presented at [FSE'23](https://2023.esec-fse.org/)
-- [Getting pwn'd by AI: Penetration Testing with Large Language Models](https://arxiv.org/abs/2308.00121), presented at [FSE'23](https://2023.esec-fse.org/)
-- [Got root? A Linux Privilege-Escalation Benchmark](https://arxiv.org/abs/2405.02106), currently searching for a suitable conference/journal
-- [LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks](https://arxiv.org/abs/2310.11409), currently searching for a suitable conference/journal
 
 # Disclaimers
 
@@ -205,10 +141,10 @@ The developers and contributors of this project do not accept any responsibility
 
 **Please note that the use of any OpenAI language model can be expensive due to its token usage.** By utilizing this project, you acknowledge that you are responsible for monitoring and managing your own token usage and the associated costs. It is highly recommended to check your OpenAI API usage regularly and set up any necessary limits or alerts to prevent unexpected charges.
 
-As an autonomous experiment, hackingBuddyGPT may generate content or take actions that are not in line with real-world best-practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.
+As an autonomous experiment, this framework may generate content or take actions that are not in line with real-world best-practices or legal requirements. It is your responsibility to ensure that any actions or decisions made based on the output of this software comply with all applicable laws, regulations, and ethical standards. The developers and contributors of this project shall not be held responsible for any consequences arising from the use of this software.
 
-By using hackingBuddyGPT, you agree to indemnify, defend, and hold harmless the developers, contributors, and any affiliated parties from and against any and all claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from your use of this software or your violation of these terms.
+By using this framework, you agree to indemnify, defend, and hold harmless the developers, contributors, and any affiliated parties from and against any and all claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from your use of this software or your violation of these terms.
 
 ### Disclaimer 2
 
-The use of hackingBuddyGPT for attacking targets without prior mutual consent is illegal. It's the end user's responsibility to obey all applicable local, state, and federal laws. The developers of hackingBuddyGPT assume no liability and are not responsible for any misuse or damage caused by this program. Only use it for educational purposes.
+The use of this framework for attacking targets without prior mutual consent is illegal. It's the end user's responsibility to obey all applicable local, state, and federal laws. The developers of this framework assume no liability and are not responsible for any misuse or damage caused by this program. Only use it for educational purposes.
Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@
+import matplotlib.pyplot as plt
+import numpy as np
+from sklearn import metrics
+total_num_of_vuls = 22
+# Define the number of vulnerabilities detected
+TP = 17  # Detected vulnerabilities
+FN = total_num_of_vuls - TP  # Missed vulnerabilities
+FP = 5  # Incorrectly flagged vulnerabilities
+TN = 40 - total_num_of_vuls  # Correctly identified non-vulnerabilities
+
+# Confusion matrix values: [[TN, FP], [FN, TP]]
+confusion_matrix = np.array([[TN, FP],   # True Negatives, False Positives
+                             [FN, TP]])  # False Negatives, True Positives
+
+# Create and plot the confusion matrix
+cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix=confusion_matrix, display_labels=["No Vuln", "Vuln"])
+cm_display.plot(cmap="Blues")
+
+# Compute evaluation metrics as percentages (F1 needs no extra scaling)
+accuracy = ((TP + TN) / (TP + TN + FP + FN)) * 100
+precision = (TP / (TP + FP)) * 100 if (TP + FP) > 0 else 0
+recall = (TP / (TP + FN)) * 100 if (TP + FN) > 0 else 0
+f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
+
+print(f'accuracy:{accuracy}, precision:{precision}, recall:{recall}, f1:{f1}')
+plt.savefig("crapi_confusion_matrix.png")
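As a quick sanity check (not part of the commit), plugging the script's counts (TP=17, FN=22-17=5, FP=5, TN=40-22=18) into the standard metric definitions gives accuracy of roughly 77.8% and precision, recall, and F1 of roughly 77.3% each; since precision and recall happen to be equal here, their harmonic mean (F1) equals either one:

```python
# Sanity check of the metrics above with the same counts:
# TP=17, FN=22-17=5, FP=5, TN=40-22=18.
TP, FN, FP, TN = 17, 5, 5, 18

accuracy = (TP + TN) / (TP + TN + FP + FN) * 100   # 35/45 of all cases
precision = TP / (TP + FP) * 100                   # 17/22
recall = TP / (TP + FN) * 100                      # 17/22
# precision == recall here, so the harmonic mean (F1) equals either one
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
# -> 77.78 77.27 77.27 77.27
```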
