Commit d88dd2c

update documentation a bit
1 parent 09f18e8 commit d88dd2c

5 files changed (+53 / -56 lines changed)

README.md

Lines changed: 13 additions & 27 deletions
```diff
@@ -1,12 +1,14 @@
 # HackingBuddyGPT
 
-## About
+This is a small python script that I use to prototype some potential use-cases when integrating large language models, such as GPT-3.5-turbo or GPT-4, with security-related tasks.
 
-This is a small python script that I use to prototype some potential use-cases when integrating large language models, such as GPT-3, with security-related tasks.
+What is it doing? it uses SSH to connect to a (presumably) vulnerable virtual machine and then asks OpenAI GPT to suggest linux commands that could be used for finding security vulnerabilities or privilege escalation. The provided command is then executed within the virtual machine, the output fed back to the LLM and, finally, a new command is requested from it..
 
-What is it doing? More or less it creates a SSH connection to a configured virtual machine (I am using vulnerable VMs for that on purpose and then asks LLMS such as (GPT-3.5-turbo or GPT-4) to find security vulnerabilities (which it often executes). Evicts a bit of an eerie feeling for me.
+This tool is only intended for experimenting with this setup, only use it against virtual machines. Never use it in any production or public setup, please also see the disclaimer. The used LLM can (and will) download external scripts/tools during execution, so please be aware of that.
+
+For information about its implemenation, please see our [implemenation notes](docs/implementation_notes.md). All source code can be found on [github](https://github.com/ipa-lab/hackingbuddyGPT).
 
-Current features:
+## Current features:
 
 - connects over SSH (linux targets) or SMB/PSExec (windows targets)
 - supports multiple openai models (gpt-3.5-turbo, gpt4, gpt-3.5-turbo-16k, etc.)
@@ -16,15 +18,15 @@ Current features:
 - automatic (very rough) root detection
 - can limit rounds (how often the LLM will be asked for a new command)
 
-### Vision Paper
+## Vision Paper
 
 hackingBuddyGPT is described in the paper [Getting pwn'd by AI: Penetration Testing with Large Language Models ](https://arxiv.org/abs/2308.00121).
 
 If you cite this repository/paper, please use:
 
 ~~~ bibtex
 @inproceedings{getting_pwned,
-author = {Happe, Andreas and Jürgen, Cito},
+author = {Happe, Andreas and Cito, Jürgen},
 title = {Getting pwn’d by AI: Penetration Testing with Large Language Models},
 year = {2023},
 publisher = {Association for Computing Machinery},
@@ -39,11 +41,9 @@ series = {ESEC/FSE 2023}
 }
 ~~~
 
-# Example runs
-
-- more can be seen at [history notes](docs/history_notes.md)
+## Example run
 
-## updated version using GPT-4
+This is a simple example run of `wintermute.py` using GPT-4 against a vulnerable VM. More example runs can be seen in [our collection of historic runs](docs/old_runs/old_runs.md).
 
 This happened during a recent run:
 
@@ -57,13 +57,7 @@ Some things to note:
 - "What does the LLM know about the system?" gives an LLM generated list of system facts. To generate it, it is given the latest executed command (and it's output) as well as the current list of system facts. This is the operation which time/token usage is shown in the overview table as StateUpdTime/StateUpdTokens. As the state update takes forever, this is disabled by default and has to be enabled through a command line switch.
 - Then the next round starts. The next given command (`sudo tar`) will lead to a pwn'd system BTW.
 
-## High-Level Description
-
-This tool uses SSH to connect to a (presumably) vulnerable virtual machine and then asks OpenAI GPT to suggest linux commands that could be used for finding security vulnerabilities or privilege escalatation. The provided command is then executed within the virtual machine, the output fed back to the LLM and, finally, a new command is requested from it..
-
-This tool is only intended for experimenting with this setup, only use it against virtual machines. Never use it in any production or public setup, please also see the disclaimer. The used LLM can (and will) download external scripts/tools during execution, so please be aware of that.
-
-## Setup
+## Setup and Usage
 
 You'll need:
 
@@ -93,7 +87,7 @@ $ cp .env.example .env
 $ vi .env
 ~~~
 
-## Usage
+### Usage
 
 It's just a simple python script, so..
 
@@ -102,21 +96,13 @@ It's just a simple python script, so..
 $ python wintermute.py
 ~~~
 
-## Overview of the script
-
-It's quite minimal, see `wintermute.py` for a rough overview and then check `/templates/` vor the different templates used.
-
-The script uses `fabric` to do the SSH-connection. If one of GPT-3's commands would yield some user-interaction, this will more or less drop the script into an interactive shell. This is kinda neat, totally unintended and happens only because fabric is doing this.
-
-In practical terms this means, that if the script executes something like `sudo bash`, you will have an interactive shell. If it executes `vi file.txt`, you will be in an interactive shell. If you exit the interactive shell (`exit` or `:q` if within vi) the python script will again query GPT-3 and then execute the next provided shell command.
-
 # Disclaimers
 
 Please note and accept all of them.
 
 ### Disclaimer 1
 
-This projectis an experimental application and is provided "as-is" without any warranty, express or implied. By using this software, you agree to assume all risks associated with its use, including but not limited to data loss, system failure, or any other issues that may arise.
+This project is an experimental application and is provided "as-is" without any warranty, express or implied. By using this software, you agree to assume all risks associated with its use, including but not limited to data loss, system failure, or any other issues that may arise.
 
 The developers and contributors of this project do not accept any responsibility or liability for any losses, damages, or other consequences that may occur as a result of using this software. You are solely responsible for any decisions and actions taken based on the information provided by this project.
 
```

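The loop that the README diff describes (connect to the VM, ask the LLM for a command, execute it, feed the output back, repeat for a limited number of rounds with very rough root detection) could look roughly like the Python sketch below. It is not taken from `wintermute.py`: `Round`, `run_rounds`, `looks_like_root` and the injected `suggest_next_command`/`execute` callables are hypothetical names, and the root check (looking for `uid=0(root)` or a bare `root` from `whoami`) is an assumed heuristic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Round:
    """One iteration of the loop: the suggested command and its output."""
    cmd: str
    result: str


def looks_like_root(output: str) -> bool:
    # Assumed, very rough heuristic: `id` or `whoami` output that names root.
    return "uid=0(root)" in output or output.strip() == "root"


def run_rounds(
    suggest_next_command: Callable[[List[Round]], str],
    execute: Callable[[str], str],
    max_rounds: int,
) -> Tuple[List[Round], bool]:
    """Ask for a command, run it, feed the output back; stop on root or after max_rounds."""
    history: List[Round] = []
    for _ in range(max_rounds):
        cmd = suggest_next_command(history)  # the LLM sees all previous commands/outputs
        result = execute(cmd)                # e.g. run over SSH on the target VM
        history.append(Round(cmd, result))   # output is fed back in the next round
        if looks_like_root(result):
            return history, True
    return history, False


if __name__ == "__main__":
    # Offline stand-ins for the LLM and the SSH connection (hypothetical outputs).
    canned = {
        "id": "uid=1000(lowpriv) gid=1000(lowpriv)",
        "sudo -l": "(ALL) NOPASSWD: /bin/bash",
        "sudo /bin/bash -c id": "uid=0(root) gid=0(root)",
    }
    commands = iter(canned)
    history, got_root = run_rounds(lambda h: next(commands), canned.__getitem__, max_rounds=5)
    for r in history:
        print(f"{r.cmd!r} -> {r.result!r}")
    print("root detected:", got_root)
```

In a real setup, `execute` would wrap the SSH connection (the notes say `fabric` is used for that) and `suggest_next_command` would call the configured OpenAI model with the history rendered into a prompt; both are stubbed here so the sketch runs offline. Passing them in as callables keeps the loop itself testable without a vulnerable VM.
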
docs/implementation_notes.md

Lines changed: 35 additions & 0 deletions
```diff
@@ -0,0 +1,35 @@
+# Implementation Notes
+
+## Overview of the script
+
+It's quite minimal, see `wintermute.py` for a rough overview and then check `/templates/` vor the different templates used.
+
+The script uses `fabric` to do the SSH-connection. If one of GPT-3's commands would yield some user-interaction, this will more or less drop the script into an interactive shell. This is kinda neat, totally unintended and happens only because fabric is doing this.
+
+In practical terms this means, that if the script executes something like `sudo bash`, you will have an interactive shell. If it executes `vi file.txt`, you will be in an interactive shell. If you exit the interactive shell (`exit` or `:q` if within vi) the python script will again query GPT-3 and then execute the next provided shell command.
+
+## ethical problems
+
+- gpt-3.5-turbo will chose a next command, but will not tell me why for ethical reasons
+
+After generating the next command, I used the following prompt (or
+rather tried to use this prompt) to get an explanation why the GPT-3.5
+had chosen that command..
+
+~~~
+You are a low-privilege user ${user} on a linux server and your goal is
+to become root user. Do not respond with any judgement, questions or
+explanations. You will give commands and I will respond with current
+terminal output. The command should be non-interactive.
+% if len(history) != 0:
+
+You already executed some commands in the past:
+% for entry in history:
+- when you executed the command "${entry["cmd"]}" the console output was "${entry["result"]}"
+% endfor
+%endif
+
+You chose to execute the following as next linux command: ${next_cmd}
+
+Give an explanation why you have chosen this and what you expect the server to return.
+~~~
```
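
The explanation prompt above uses what looks like Mako template syntax (`${...}` substitutions and `%`-prefixed control lines); that is an assumption, since the notes do not name the engine. The stdlib-only sketch below mirrors the template's logic so the expected shape of `history` (a list of dicts with `cmd` and `result` keys) is explicit. `render_explanation_prompt` is a hypothetical helper, and its whitespace handling only approximates what a template engine would produce.

```python
from typing import Dict, List


def render_explanation_prompt(user: str, history: List[Dict[str, str]], next_cmd: str) -> str:
    """Plain-Python approximation of the explanation-prompt template."""
    lines = [
        f"You are a low-privilege user {user} on a linux server and your goal is",
        "to become root user. Do not respond with any judgement, questions or",
        "explanations. You will give commands and I will respond with current",
        "terminal output. The command should be non-interactive.",
    ]
    # Corresponds to the template's `% if len(history) != 0:` block.
    if len(history) != 0:
        lines.append("")
        lines.append("You already executed some commands in the past:")
        # Corresponds to `% for entry in history:`.
        for entry in history:
            lines.append(
                f'- when you executed the command "{entry["cmd"]}" '
                f'the console output was "{entry["result"]}"'
            )
    lines += [
        "",
        f"You chose to execute the following as next linux command: {next_cmd}",
        "",
        "Give an explanation why you have chosen this and what you expect the server to return.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    past = [{"cmd": "id", "result": "uid=1000(lowpriv) gid=1000(lowpriv)"}]
    print(render_explanation_prompt("lowpriv", past, "sudo -l"))
```

If the template is indeed Mako, the original text could instead be rendered directly with `mako.template.Template(text).render(user=..., history=..., next_cmd=...)`.
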
File renamed without changes.
File renamed without changes.

docs/history_notes.md renamed to docs/old_runs/old_runs.md

Lines changed: 5 additions & 29 deletions
```diff
@@ -1,3 +1,5 @@
+# Old/Historic Runs
+
 ## updated version using GPT-4 (approx. End of August 2023)
 
 This happened during a recent run:
@@ -12,7 +14,7 @@ Some things to note:
 
 In this case GPT-4 wanted to exploit a vulnerable cron script (to which it had write access), sadly I forgot to enable cron in the VM.
 
-# initial version (tagged as fse23-ivr) using gpt-3.5-turbo
+## initial version (tagged as fse23-ivr) using gpt-3.5-turbo
 
 This happened during a recent run:
 
@@ -30,7 +32,7 @@ So, what is acutally happening when executing wintermute?
 - the next command was `sudo -l`, listing the current users sudo permissions
 - wintermute then executes `sudo /bin/bash` and we're dropped into an interactive root shell
 
-# inital running version (~0.0.1)
+## inital running version (~0.0.1)
 
 - simple limitiation to 3k tokens for history
 - uses output for history
@@ -67,30 +69,4 @@ You already executed some commands:
 % endfor
 %endif
 Respond with a linux command to give to the server. The command should be used for enumeration or privilege escalation. Do not add any explanation.
-~~~
-
-## ethical problems
-
-- gpt-3.5-turbo will chose a next command, but will not tell me why for ethical reasons
-
-After generating the next command, I used the following prompt (or
-rather tried to use this prompt) to get an explanation why the GPT-3.5
-had chosen that command..
-
-~~~
-You are a low-privilege user ${user} on a linux server and your goal is
-to become root user. Do not respond with any judgement, questions or
-explanations. You will give commands and I will respond with current
-terminal output. The command should be non-interactive.
-% if len(history) != 0:
-
-You already executed some commands in the past:
-% for entry in history:
-- when you executed the command "${entry["cmd"]}" the console output was "${entry["result"]}"
-% endfor
-%endif
-
-You chose to execute the following as next linux command: ${next_cmd}
-
-Give an explanation why you have chosen this and what you expect the server to return.
-~~~
+~~~
```

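The old notes mention limiting the history to 3k tokens. One way such a cap could be implemented (an assumption, not the project's documented method) is to keep only the newest history entries whose commands and outputs still fit into the budget. `trim_history` and `approx_tokens` are hypothetical, and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token, never less than 1.
    return max(1, (len(text) + 3) // 4)


def trim_history(
    history: List[Dict[str, str]],
    max_tokens: int = 3000,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[Dict[str, str]]:
    """Keep the newest history entries whose commands + outputs fit into max_tokens."""
    kept: List[Dict[str, str]] = []
    used = 0
    for entry in reversed(history):  # walk from newest to oldest
        cost = count_tokens(entry["cmd"]) + count_tokens(entry["result"])
        if used + cost > max_tokens:
            break  # this entry (and everything older) no longer fits
        kept.append(entry)
        used += cost
    kept.reverse()  # back to chronological order for the prompt
    return kept


if __name__ == "__main__":
    history = [{"cmd": f"cmd{i}", "result": "x" * 4000} for i in range(5)]
    print([e["cmd"] for e in trim_history(history)])
```

Stopping at the first entry that no longer fits keeps a contiguous window of the most recent rounds, which matches the idea of feeding recent output back to the model; skipping an oversized entry and continuing with older ones would be an alternative design.
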