README.md: 13 additions & 63 deletions
@@ -32,77 +32,27 @@ hackingBuddyGPT is described in [Getting pwn'd by AI: Penetration Testing with L
 ## Existing Agents/Usecases

 We strive to make our code-base as accessible as possible to allow for easy experimentation.
-Our experiments are structured into `use-cases`, e.g., privilege escalation attacks. A researcher
-wanting to create a new experiment would just create a new use-case that mostly consists
-of the control loop and corresponding prompt templates. We provide multiple helper and base
-classes, so that a new experiment can be implemented in a few dozen lines of code as
-connecting to the LLM, logging, etc. is taken care of by our framework. For further information (esp. if you want to contribute use-cases), please take a look at [docs/use_case.md](docs/use_case.md).
-
+Our experiments are structured into `use-cases`, e.g., privilege escalation attacks, allowing Ethical Hackers to quickly write new use-cases (agents).

 Our initial forays were focused upon evaluating the efficiency of LLMs for [linux
 privilege escalation attacks](https://arxiv.org/abs/2310.11409) and we are currently branching out into evaluating
 the use of LLMs for web penetration-testing and web API testing.

-### Privilege Escalation Attacks
-
-How are we doing this? The initial tool `wintermute` targets linux priv-esc attacks. It uses SSH to connect to a (presumably) vulnerable virtual machine and then asks OpenAI GPT to suggest linux commands that could be used for finding security vulnerabilities or privilege escalation. The provided command is then executed within the virtual machine, the output fed back to the LLM and, finally, a new command is requested from it.
-
-#### Current features (wintermute):
-
-- connects over SSH (linux targets) or SMB/PSExec (windows targets)
-- logs run data through sqlite either into a file or in-memory
-- automatic root detection
-- can limit rounds (how often the LLM will be asked for a new command)
-
-#### Example run
-
-This is a simple example run of `wintermute.py` using GPT-4 against a vulnerable VM. More example runs can be seen in [our collection of historic runs](docs/old_runs/old_runs.md).
-- initially the current configuration is output. Yay, so many colors!
-- "Got command from LLM" shows the generated command while the panel afterwards has the given command as title and the command's output as content.
-- the table contains all executed commands. ThinkTime denotes the time that was needed to generate the command (Tokens show the token count for the prompt and its response). StateUpdTime shows the time that was needed to generate a new state (the next column also gives the token count)
-- "What does the LLM know about the system?" gives an LLM generated list of system facts. To generate it, it is given the latest executed command (and its output) as well as the current list of system facts. This is the operation whose time/token usage is shown in the overview table as StateUpdTime/StateUpdTokens. As the state update takes forever, this is disabled by default and has to be enabled through a command line switch.
-- Then the next round starts. The next given command (`sudo tar`) will lead to a pwn'd system BTW.
-
-#### Academic Publications on Priv-Esc Attacks
-
-Preliminary results for the linux privilege escalation use-case can be found in [Evaluating LLMs for Privilege-Escalation Scenarios](https://arxiv.org/abs/2310.11409):
-
-~~~bibtex
-@misc{happe2024llms,
-title={LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks},
-author={Andreas Happe and Aaron Kaplan and Jürgen Cito},
-year={2024},
-eprint={2310.11409},
-archivePrefix={arXiv},
-primaryClass={cs.CR}
-}
-~~~
-
-This work is partially based upon our empirical research into [how hackers work](https://arxiv.org/abs/2308.07057):
-
-~~~bibtex
-@inproceedings{Happe_2023, series={ESEC/FSE ’23},
-title={Understanding Hackers’ Work: An Empirical Study of Offensive Security Practitioners},
-url={http://dx.doi.org/10.1145/3611643.3613900},
-DOI={10.1145/3611643.3613900},
-booktitle={Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
-publisher={ACM},
-author={Happe, Andreas and Cito, Jürgen},
-year={2023},
-month=nov, collection={ESEC/FSE ’23}
-}
-~~~
+| Name | Description | Screenshot |
+| -- | -- | -- |
+| minimal | A minimal 50 LoC Linux Priv-Esc example | |
+| [linux-privesc](docs/linux_privesc.md) | Given an SSH connection for a low-privilege user, task the LLM with becoming the root user. This would be a typical Linux privilege escalation attack. We published two academic papers about this: [paper #1](https://arxiv.org/abs/2308.00121) and [paper #2](https://arxiv.org/abs/2310.11409) | |
+| web-pentest | Directly hack a webpage | |
+| web-api-pentest | A Web-API focused use-case | |

 ## Build your own Agent/Usecase

+A researcher
+wanting to create a new experiment would just create a new use-case that mostly consists
+of the control loop and corresponding prompt templates. We provide multiple helper and base
+classes, so that a new experiment can be implemented in a few dozen lines of code as
+connecting to the LLM, logging, etc. is taken care of by our framework. For further information (esp. if you want to contribute use-cases), please take a look at [docs/use_case.md](docs/use_case.md).
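To make the "control loop plus prompt template" idea concrete, here is a minimal, framework-independent sketch in Python. It is not hackingBuddyGPT's actual API: `SketchUseCase`, `PROMPT_TEMPLATE`, `scripted_llm` and `fake_shell` are hypothetical names invented for this illustration, and the LLM connection and the SSH execution are replaced by plain callables so the loop can run offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical prompt template; the project's real templates are not shown here.
PROMPT_TEMPLATE = (
    "You are a low-privilege user on a Linux system and want to become root.\n"
    "Commands tried so far:\n{history}\n"
    "Reply with the next command to run."
)


@dataclass
class SketchUseCase:
    """Illustrative use-case: a prompt template plus a bounded control loop."""

    ask_llm: Callable[[str], str]  # stand-in for the LLM connection
    run_command: Callable[[str], Tuple[str, bool]]  # returns (output, got_root)
    max_rounds: int = 10  # round limit after which the agent gives up
    history: List[Tuple[str, str]] = field(default_factory=list)

    def run(self) -> bool:
        """Return True if root was reached within max_rounds, else False."""
        for _ in range(self.max_rounds):
            # Feed all previous commands and their output back to the LLM.
            shown = "\n".join(f"$ {cmd}\n{out}" for cmd, out in self.history)
            next_cmd = self.ask_llm(PROMPT_TEMPLATE.format(history=shown))
            output, got_root = self.run_command(next_cmd)
            self.history.append((next_cmd, output))
            if got_root:
                return True
        return False


def scripted_llm(prompt: str) -> str:
    """Fake LLM: suggests `id` first, then `sudo -l` once `id` has been tried."""
    return "sudo -l" if "$ id" in prompt else "id"


def fake_shell(command: str) -> Tuple[str, bool]:
    """Fake target: pretends that any command starting with `sudo` yields root."""
    if command.startswith("sudo"):
        return "uid=0(root)", True
    return "uid=1000(lowpriv)", False
```

With these stand-ins, `SketchUseCase(ask_llm=scripted_llm, run_command=fake_shell).run()` finishes in the second round. Swapping the two callables for a real LLM client and a real SSH executor is the part the framework's helper and base classes are described as handling.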
+
 The following would create a new (minimal) linux privilege-escalation agent. Through using our infrastructure, this already uses configurable LLM-connections (e.g., for testing OpenAI or locally run LLMs), logs trace data to a local sqlite database for each run, implements a round limit (after which the agent will stop if root has not been achieved until then) and is able to connect to a linux target over SSH for fully-autonomous command execution (as well as password guessing).