
Commit fc557af

Merge pull request #2 from ipa-lab/v3
V3
2 parents 9983243 + 82ffd9a commit fc557af

18 files changed with 562 additions and 235 deletions

.env.example

Lines changed: 6 additions & 0 deletions
@@ -8,3 +8,9 @@ TARGET_IP='enter-the-private-ip-of-some-vm.local'
 # exchange with the user for your target VM
 TARGET_USER='bob'
 TARGET_PASSWORD='secret'
+
+# which LLM driver to use (can be openai_rest or oobabooga for now)
+LLM_CONNECTION = "openai_rest"
+
+# how many rounds should this thing go?
+MAX_ROUNDS = 20
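This commit deletes `config.py` and adds `LLM_CONNECTION` and `MAX_ROUNDS` to `.env.example`, but the part of the diff shown here does not include the code that now reads these keys. A minimal sketch of how they could be loaded and validated, assuming the values are already in the process environment with quotes stripped (for example after a dotenv loader has run). `Settings` and `load_settings` are hypothetical names, and the fallbacks simply reuse the example values.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Drivers listed in the .env.example comment.
SUPPORTED_CONNECTIONS = {"openai_rest", "oobabooga"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container (not part of the repository)."""
    target_ip: str
    target_user: str
    target_password: str
    llm_connection: str
    max_rounds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the .env.example keys from an environment mapping (hypothetical helper)."""
    connection = env.get("LLM_CONNECTION", "openai_rest")
    if connection not in SUPPORTED_CONNECTIONS:
        raise ValueError(f"unsupported LLM_CONNECTION: {connection!r}")

    # MAX_ROUNDS bounds how often the LLM is asked for a new command.
    max_rounds = int(env.get("MAX_ROUNDS", "20"))
    if max_rounds < 1:
        raise ValueError("MAX_ROUNDS must be at least 1")

    return Settings(
        target_ip=env["TARGET_IP"],  # KeyError if the target is not configured
        target_user=env["TARGET_USER"],
        target_password=env["TARGET_PASSWORD"],
        llm_connection=connection,
        max_rounds=max_rounds,
    )
```

Failing early on an unknown driver name keeps a typo in `.env` from surfacing only once the first LLM request is made.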

README.md

Lines changed: 11 additions & 19 deletions
@@ -6,6 +6,15 @@ This is a small python script that I use to prototype some potential use-cases w

 What is it doing? More or less, it creates an SSH connection to a configured virtual machine (I am using vulnerable VMs for that on purpose) and then asks LLMs such as GPT-3.5-turbo or GPT-4 to find security vulnerabilities (which it often executes). It evokes a bit of an eerie feeling in me.

+Current features:
+
+- connects over SSH
+- supports multiple OpenAI models (gpt-3.5-turbo, gpt-4, gpt-3.5-turbo-16k, etc.)
+- beautiful console output
+- log storage in SQLite, either into a file or in-memory
+- automatic (very rough) root detection
+- can limit rounds (how often the LLM will be asked for a new command)
+
 ### Vision Paper

 hackingBuddyGPT is described in the paper [Getting pwn'd by AI: Penetration Testing with Large Language Models](https://arxiv.org/abs/2308.00121).
@@ -31,6 +40,8 @@ series = {ESEC/FSE 2023}

 # Example runs

+- more can be seen at [history notes](https://github.com/ipa-lab/hackingBuddyGPT/blob/v3/history_notes.md)
+
 ## updated version using GPT-4

 This happened during a recent run:
@@ -45,25 +56,6 @@ Some things to note:

 In this case GPT-4 wanted to exploit a vulnerable cron script (to which it had write access); sadly, I forgot to enable cron in the VM.

-## initial version (tagged as fse23-ivr) using gpt-3.5-turbo
-
-This happened during a recent run:
-
-![Example wintermute run](example_run.png)
-
-Some things to note:
-
-- prompts for GPT-3 are prefixed with `openai-prompt`, the returned command from GPT-3 is prefixed with `openai-next-command`, and the result from executing the command with `server-output`
-- the SSH library used also displays the output produced by the commands executed through SSH, which is why some stuff appears twice
-- I've added a simple callback that automatically enters the configured account's credentials if sudo prompts for a password
-
-So, what is actually happening when executing wintermute?
-
-- wintermute executed `id` initially to get the user's id
-- the next command was `sudo -l`, listing the current user's sudo permissions
-- wintermute then executed `sudo /bin/bash` and we were dropped into an interactive root shell
-
-
 ## High-Level Description

 This tool uses SSH to connect to a (presumably) vulnerable virtual machine and then asks OpenAI GPT to suggest Linux commands that could be used for finding security vulnerabilities or privilege escalation. The provided command is then executed within the virtual machine, the output is fed back to the LLM and, finally, a new command is requested from it.
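The High-Level Description boils down to a loop: ask the LLM for a command, run it over SSH, feed the output back, and stop after a maximum number of rounds or once root is detected. A minimal sketch of that loop under stated assumptions: `next_command` and `run_command` are hypothetical callables standing in for the LLM driver and the SSH connection, and `run_command` returns an `(output, got_root)` pair, mirroring how `handle_cmd` in `handlers.py` unpacks `conn.run(...)`. `run_agent` is not the project's actual entry point.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (command, output) pairs, oldest first


def run_agent(
    next_command: Callable[[History], str],
    run_command: Callable[[str], Tuple[str, bool]],
    max_rounds: int,
) -> Tuple[bool, History]:
    """Hypothetical driver loop; returns (got_root, history)."""
    history: History = []
    for _ in range(max_rounds):
        cmd = next_command(history)          # ask the LLM, passing the history so far
        output, got_root = run_command(cmd)  # execute on the target VM
        history.append((cmd, output))        # fed back to the LLM in the next round
        if got_root:
            return True, history
    return False, history
```

For example, a scripted `next_command` returning `id`, `sudo -l` and `sudo /bin/bash` in turn, paired with a fake `run_command` that reports root only for the last one, ends after three rounds with `got_root == True`; with `max_rounds=2` it stops early and reports failure.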

config.py

Lines changed: 0 additions & 24 deletions
This file was deleted.

db_storage.py

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+import sqlite3
+
+class DbStorage:
+    def __init__(self, connection_string=":memory:"):
+        self.connection_string = connection_string
+
+    def connect(self):
+        self.db = sqlite3.connect(self.connection_string)
+        self.cursor = self.db.cursor()
+
+    def insert_or_select_cmd(self, name:str) -> int:
+        results = self.cursor.execute("SELECT id, name FROM commands WHERE name = ?", (name, )).fetchall()
+
+        if len(results) == 0:
+            self.cursor.execute("INSERT INTO commands (name) VALUES (?)", (name, ))
+            return self.cursor.lastrowid
+        elif len(results) == 1:
+            return results[0][0]
+        else:
+            print("this should not be happening: " + str(results))
+            return -1
+
+    def setup_db(self):
+        # create tables
+        self.cursor.execute("CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY, model text, context_size INTEGER, state TEXT, tag TEXT)")
+        self.cursor.execute("CREATE TABLE IF NOT EXISTS commands (id INTEGER PRIMARY KEY, name string unique)")
+        self.cursor.execute("CREATE TABLE IF NOT EXISTS queries (run_id INTEGER, round INTEGER, cmd_id INTEGER, query TEXT, response TEXT, duration REAL, tokens_query INTEGER, tokens_response INTEGER)")
+
+        # insert commands
+        self.query_cmd_id = self.insert_or_select_cmd('query_cmd')
+        self.analyze_response_id = self.insert_or_select_cmd('analyze_response')
+        self.state_update_id = self.insert_or_select_cmd('update_state')
+
+    def create_new_run(self, model, context_size, tag=''):
+        self.cursor.execute("INSERT INTO runs (model, context_size, state, tag) VALUES (?, ?, ?, ?)", (model, context_size, "in progress", tag))
+        return self.cursor.lastrowid
+
+    def add_log_query(self, run_id, round, cmd, result, answer):
+        self.cursor.execute("INSERT INTO queries (run_id, round, cmd_id, query, response, duration, tokens_query, tokens_response) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (run_id, round, self.query_cmd_id, cmd, result, answer.duration, answer.tokens_query, answer.tokens_response))
+
+    def add_log_analyze_response(self, run_id, round, cmd, result, answer):
+        self.cursor.execute("INSERT INTO queries (run_id, round, cmd_id, query, response, duration, tokens_query, tokens_response) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (run_id, round, self.analyze_response_id, cmd, result, answer.duration, answer.tokens_query, answer.tokens_response))
+
+    def add_log_update_state(self, run_id, round, cmd, result, answer):
+
+        if answer != None:
+            self.cursor.execute("INSERT INTO queries (run_id, round, cmd_id, query, response, duration, tokens_query, tokens_response) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (run_id, round, self.state_update_id, cmd, result, answer.duration, answer.tokens_query, answer.tokens_response))
+        else:
+            self.cursor.execute("INSERT INTO queries (run_id, round, cmd_id, query, response, duration, tokens_query, tokens_response) VALUES (?, ?, ?, ?, ?, ?, ?, ?)", (run_id, round, self.state_update_id, cmd, result, 0, 0, 0))
+
+    def get_round_data(self, run_id, round):
+        rows = self.cursor.execute("select cmd_id, query, response, duration, tokens_query, tokens_response from queries where run_id = ? and round = ?", (run_id, round)).fetchall()
+
+        for row in rows:
+            if row[0] == self.query_cmd_id:
+                cmd = row[1]
+                size_resp = str(len(row[2]))
+                duration = f"{row[3]:.4f}"
+                tokens = f"{row[4]}/{row[5]}"
+            if row[0] == self.analyze_response_id:
+                reason = row[2]
+                analyze_time = f"{row[3]:.4f}"
+                analyze_token = f"{row[4]}/{row[5]}"
+
+        result = [duration, tokens, cmd, size_resp, analyze_time, analyze_token, reason]
+        return result
+
+    def get_cmd_history(self, run_id):
+        rows = self.cursor.execute("select query, response from queries where run_id = ? and cmd_id = ? order by round asc", (run_id, self.query_cmd_id)).fetchall()
+
+        result = []
+
+        for row in rows:
+            result.append([row[0], row[1]])
+
+        return result
+
+    def run_was_success(self, run_id):
+        self.cursor.execute("update runs set state=? where id = ?", ("got root", run_id))
+        self.db.commit()
+
+    def run_was_failure(self, run_id):
+        self.cursor.execute("update runs set state=? where id = ?", ("reached max runs", run_id))
+        self.db.commit()
+
+    def commit(self):
+        self.db.commit()
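To see how the three tables fit together, here is a self-contained sketch that uses only `sqlite3` and the `CREATE TABLE` statements from `setup_db`: one run, two logged rounds, and the same query `get_cmd_history` uses to read them back. `cmd_id` is a hypothetical alternative to `insert_or_select_cmd` that relies on the `unique` constraint on `commands.name` (via `INSERT OR IGNORE`) instead of a select-then-insert; the logged values, durations and token counts are made up.

```python
import sqlite3

# Schema copied from DbStorage.setup_db.
SCHEMA = [
    "CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY, model text, context_size INTEGER, state TEXT, tag TEXT)",
    "CREATE TABLE IF NOT EXISTS commands (id INTEGER PRIMARY KEY, name string unique)",
    "CREATE TABLE IF NOT EXISTS queries (run_id INTEGER, round INTEGER, cmd_id INTEGER, query TEXT, response TEXT, duration REAL, tokens_query INTEGER, tokens_response INTEGER)",
]


def cmd_id(cur: sqlite3.Cursor, name: str) -> int:
    """Hypothetical lookup-or-create: the UNIQUE constraint makes repeated inserts no-ops."""
    cur.execute("INSERT OR IGNORE INTO commands (name) VALUES (?)", (name,))
    return cur.execute("SELECT id FROM commands WHERE name = ?", (name,)).fetchone()[0]


db = sqlite3.connect(":memory:")
cur = db.cursor()
for stmt in SCHEMA:
    cur.execute(stmt)

query_cmd = cmd_id(cur, "query_cmd")
cur.execute("INSERT INTO runs (model, context_size, state, tag) VALUES (?, ?, ?, ?)",
            ("gpt-4", 4096, "in progress", ""))
run_id = cur.lastrowid

# Log two rounds with the same column layout add_log_query writes.
rounds = [("id", "uid=1000(bob)"), ("sudo -l", "(ALL) NOPASSWD: ALL")]
for rnd, (cmd, out) in enumerate(rounds):
    cur.execute("INSERT INTO queries VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                (run_id, rnd, query_cmd, cmd, out, 0.5, 10, 5))

# Same query as DbStorage.get_cmd_history.
history = cur.execute(
    "select query, response from queries where run_id = ? and cmd_id = ? order by round asc",
    (run_id, query_cmd),
).fetchall()
db.commit()
```

One SQLite detail worth knowing here: a declared column type of `string` matches none of SQLite's type-affinity rules, so `commands.name` gets NUMERIC affinity, and a command name that looks like a number (say `"42"`) would be stored as an integer. Declaring the column `TEXT` avoids that.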

handlers.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+import paramiko
+
+from targets.ssh import SSHHostConn
+
+def handle_cmd(conn, input):
+    result, gotRoot = conn.run(input["cmd"])
+    return input["cmd"], result, gotRoot
+
+
+def handle_ssh(target_host, input):
+    user = input["username"]
+    password = input["password"]
+
+    cmd = "tried ssh with username " + user + " and password " + password
+
+    test = SSHHostConn(target_host, user, password)
+    try:
+        test.connect()
+        user = test.run("whoami")
+
+        if user == "root":
+            return cmd, "Login as root was successful"
+        else:
+            return cmd, "Authentication successful, but user is not root"
+
+    except paramiko.ssh_exception.AuthenticationException:
+        return cmd, "Authentication error, credentials are wrong"
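`handle_cmd` unpacks `conn.run(...)` as a `(result, gotRoot)` pair, while `handle_ssh` compares the return value of `test.run("whoami")` directly with `"root"`. If `run` returns that same pair there, the comparison can never be true. A minimal sketch of the check that unpacks the pair and strips whitespace first, assuming `run` returns `(output, got_root)` and that authentication failures surface from `connect()`. `check_root_login` and the `Conn` protocol are hypothetical; a caller would pass `paramiko.ssh_exception.AuthenticationException` as `auth_error`, and the default of the built-in `PermissionError` exists only so the sketch runs without paramiko.

```python
from typing import Protocol, Tuple, Type


class Conn(Protocol):
    """Assumed connection interface, mirroring how handle_cmd calls conn.run()."""
    def connect(self) -> None: ...
    def run(self, cmd: str) -> Tuple[str, bool]: ...


def check_root_login(conn: Conn, auth_error: Type[BaseException] = PermissionError) -> str:
    """Hypothetical variant of the whoami check in handle_ssh."""
    try:
        conn.connect()
    except auth_error:  # e.g. paramiko.ssh_exception.AuthenticationException
        return "Authentication error, credentials are wrong"

    # Unpack the assumed (output, got_root) pair instead of comparing the tuple itself.
    output, _got_root = conn.run("whoami")
    # Strip in case the output still carries whoami's trailing newline.
    if output.strip() == "root":
        return "Login as root was successful"
    return "Authentication successful, but user is not root"
```

Keeping the `try` around `connect()` alone also means an unrelated error raised later by `run` is not reported as wrong credentials.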

helper.py

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+import tiktoken
+
+from db_storage import DbStorage
+from rich.table import Table
+
+def num_tokens_from_string(model: str, string: str) -> int:
+    """Returns the number of tokens in a text string."""
+    encoding = tiktoken.encoding_for_model(model)
+    return len(encoding.encode(string))
+
+def get_history_table(run_id: int, db: DbStorage, round: int) -> Table:
+    table = Table(title="Executed Command History", show_header=True, show_lines=True)
+    table.add_column("ThinkTime", style="dim")
+    table.add_column("Tokens", style="dim")
+    table.add_column("Cmd")
+    table.add_column("Resp. Size", justify="right")
+    table.add_column("ThinkTime", style="dim")
+    table.add_column("Tokens", style="dim")
+    table.add_column("Reason")
+
+    for i in range(0, round+1):
+        table.add_row(*db.get_round_data(run_id, i))
+
+    return table
+
+def get_cmd_history(model: str, run_id: int, db: DbStorage, limit: int) -> list[str]:
+    result = []
+    rest = limit
+
+    # get commands from db
+    cmds = db.get_cmd_history(run_id)
+
+    for itm in reversed(cmds):
+        size_cmd = num_tokens_from_string(model, itm[0])
+        size_result = num_tokens_from_string(model, itm[1])
+        size = size_cmd + size_result
+
+        if size <= rest:
+            result.append(itm)
+            rest -= size
+        else:
+            # if theres a bit space left, fill that up with parts of the last item
+            if (rest - size_cmd) >= 200:
+                result.append({
+                    "cmd" : itm[0],
+                    "result" : itm[1][:(rest-size_cmd-2)] + ".."
+                })
+            return list(reversed(result))
+    return list(reversed(result))
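`get_cmd_history` walks the command history from newest to oldest and keeps entries while they fit into a token budget. Two details of the version above: full entries are appended as the `[query, response]` lists returned by `DbStorage.get_cmd_history`, while the partially included entry is a dict; and that partial entry is cut with `itm[1][:(rest-size_cmd-2)]`, so a token count is used as a character index. A hedged sketch of the same newest-first budgeting that keeps every entry a `(cmd, result)` tuple and truncates by tokens instead. `trim_history` is a hypothetical name; in practice `encode`/`decode` would be the `encode` and `decode` methods of the `tiktoken` encoding returned by `tiktoken.encoding_for_model(model)`.

```python
from typing import Callable, List, Sequence, Tuple


def trim_history(
    cmds: Sequence[Tuple[str, str]],
    limit: int,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
    min_partial: int = 200,
) -> List[Tuple[str, str]]:
    """Keep the newest (cmd, result) pairs that fit into `limit` tokens (hypothetical helper)."""
    kept: List[Tuple[str, str]] = []
    rest = limit
    for cmd, output in reversed(cmds):  # newest entry first
        cmd_tokens = len(encode(cmd))
        out_tokens = encode(output)
        size = cmd_tokens + len(out_tokens)
        if size <= rest:
            kept.append((cmd, output))
            rest -= size
            continue
        # The entry does not fit: if at least `min_partial` tokens of room remain,
        # keep a token-truncated head of its output (2 tokens reserved for ".."), then stop.
        room = rest - cmd_tokens
        if room >= min_partial:
            kept.append((cmd, decode(out_tokens[: room - 2]) + ".."))
        break
    return list(reversed(kept))  # back to oldest-first order
```

For example, with `str.split` and `" ".join` as a crude stand-in tokenizer, `trim_history([("id", "uid=1000 bob"), ("ls", "a b c d e f")], 5, str.split, " ".join, min_partial=2)` keeps only the newest entry, cut to `("ls", "a b..")`.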

history.py

Lines changed: 0 additions & 67 deletions
This file was deleted.

history_notes.md

Lines changed: 18 additions & 0 deletions
@@ -1,3 +1,21 @@
+# initial version (tagged as fse23-ivr) using gpt-3.5-turbo
+
+This happened during a recent run:
+
+![Example wintermute run](example_run.png)
+
+Some things to note:
+
+- prompts for GPT-3 are prefixed with `openai-prompt`, the returned command from GPT-3 is prefixed with `openai-next-command`, and the result from executing the command with `server-output`
+- the SSH library used also displays the output produced by the commands executed through SSH, which is why some stuff appears twice
+- I've added a simple callback that automatically enters the configured account's credentials if sudo prompts for a password
+
+So, what is actually happening when executing wintermute?
+
+- wintermute executed `id` initially to get the user's id
+- the next command was `sudo -l`, listing the current user's sudo permissions
+- wintermute then executed `sudo /bin/bash` and we were dropped into an interactive root shell
+
 # initial running version (~0.0.1)

 - simple limitation to 3k tokens for history
