
Commit 810100c

Merge pull request #128 from ipa-lab/development
Merge Development into Main Branch
2 parents 276ef06 + bc22dff commit 810100c

68 files changed: +11396 / -1753 lines

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -25,3 +25,10 @@ scripts/mac_ansible_hosts.ini
 scripts/mac_ansible_id_rsa
 scripts/mac_ansible_id_rsa.pub
 .aider*
+
+src/hackingBuddyGPT/usecases/web_api_testing/documentation/openapi_spec/
+src/hackingBuddyGPT/usecases/web_api_testing/documentation/reports/
+src/hackingBuddyGPT/usecases/web_api_testing/retrieve_spotify_token.py
+config/my_configs/*
+config/configs/*
+config/configs/

README.md

Lines changed: 55 additions & 9 deletions
@@ -9,6 +9,11 @@
 
 HackingBuddyGPT helps security researchers use LLMs to discover new attack vectors and save the world (or earn bug bounties) in 50 lines of code or less. In the long run, we hope to make the world a safer place by empowering security professionals to get more hacking done by using AI. The more testing they can do, the safer all of us will get.
 
+**🆕 New Feature**: hackingBuddyGPT now supports both SSH connections to remote targets and local shell execution for easier testing and development!
+
+**⚠️ WARNING**: This software will execute commands on live environments. When using local shell mode, commands will be executed on your local system, which could potentially lead to data loss, system modification, or security vulnerabilities. Always use appropriate precautions and consider using isolated environments or virtual machines for testing.
+
+
 We aim to become **THE go-to framework for security researchers** and pen-testers interested in using LLMs or LLM-based autonomous agents for security testing. To aid their experiments, we also offer re-usable [linux priv-esc benchmarks](https://github.com/ipa-lab/benchmark-privesc-linux) and publish all our findings as open-access reports.
 
 If you want to use hackingBuddyGPT and need help selecting the best LLM for your tasks, [we have a paper comparing multiple LLMs](https://arxiv.org/abs/2310.11409).
@@ -68,18 +73,19 @@ the use of LLMs for web penetration-testing and web api testing.
 | Name | Description | Screenshot |
 |------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [minimal](https://docs.hackingbuddy.ai/docs/dev-guide/dev-quickstart) | A minimal 50 LoC Linux Priv-Esc example. This is the usecase from [Build your own Agent/Usecase](#build-your-own-agentusecase) | ![A very minimal run](https://docs.hackingbuddy.ai/run_archive/2024-04-29_minimal.png) |
-| [linux-privesc](https://docs.hackingbuddy.ai/docs/usecases/linux-priv-esc) | Given an SSH-connection for a low-privilege user, task the LLM to become the root user. This would be a typical Linux privilege escalation attack. We published two academic papers about this: [paper #1](https://arxiv.org/abs/2308.00121) and [paper #2](https://arxiv.org/abs/2310.11409) | ![Example wintermute run](https://docs.hackingbuddy.ai/run_archive/2024-04-06_linux.png) |
+| [linux-privesc](https://docs.hackingbuddy.ai/docs/usecases/linux-priv-esc) | Given a connection (SSH or local shell) for a low-privilege user, task the LLM to become the root user. This would be a typical Linux privilege escalation attack. We published two academic papers about this: [paper #1](https://arxiv.org/abs/2308.00121) and [paper #2](https://arxiv.org/abs/2310.11409) | ![Example wintermute run](https://docs.hackingbuddy.ai/run_archive/2024-04-06_linux.png) |
 | [web-pentest (WIP)](https://docs.hackingbuddy.ai/docs/usecases/web) | Directly hack a webpage. Currently in heavy development and pre-alpha stage. | ![Test Run for a simple Blog Page](https://docs.hackingbuddy.ai/run_archive/2024-05-03_web.png) |
 | [web-api-pentest (WIP)](https://docs.hackingbuddy.ai/docs/usecases/web-api) | Directly test a REST API. Currently in heavy development and pre-alpha stage. (Documentation and testing of REST API.) | Documentation:![web_api_documentation.png](https://docs.hackingbuddy.ai/run_archive/2024-05-15_web-api_documentation.png) Testing:![web_api_testing.png](https://docs.hackingbuddy.ai/run_archive/2024-05-15_web-api.png) |
-| [extended linux-privesc](https://docs.hackingbuddy.ai/docs/usecases/extended-linux-privesc) | This usecase extends linux-privesc with additional features such as retrieval augmented generation (RAG) or chain-of-thought (CoT) | ![Extended Linux Privilege Escalation Run](https://docs.hackingbuddy.ai/run_archive/2025-4-14_extended_privesc_usecase_1.png) ![Extended Linux Privilege Escalation Run](https://docs.hackingbuddy.ai/run_archive/2025-4-14_extended_privesc_usecase_1.png) |
+| [extended linux-privesc](https://docs.hackingbuddy.ai/docs/usecases/extended-linux-privesc) | This usecase extends linux-privesc with additional features such as retrieval augmented generation (RAG) or chain-of-thought (CoT) | ![Extended Linux Privilege Escalation Run](https://docs.hackingbuddy.ai/run_archive/2025-4-14_extended_privesc_usecase_1.png) ![Extended Linux Privilege Escalation Run](https://docs.hackingbuddy.ai/run_archive/2025-4-14_extended_privesc_usecase_2.png) |
+
 ## Build your own Agent/Usecase
 
 So you want to create your own LLM hacking agent? We've got you covered and taken care of the tedious groundwork.
 
 Create a new usecase and implement `perform_round` containing all system/LLM interactions. We provide multiple helper and base classes so that a new experiment can be implemented in a few dozen lines of code. Tedious tasks, such as
 connecting to the LLM, logging, etc. are taken care of by our framework. Check our [developer quickstart guide](https://docs.hackingbuddy.ai/docs/dev-guide/dev-quickstart) for more information.
 
-The following would create a new (minimal) linux privilege-escalation agent. Through using our infrastructure, this already uses configurable LLM-connections (e.g., for testing OpenAI or locally run LLMs), logs trace data to a local sqlite database for each run, implements a round limit (after which the agent will stop if root has not been achieved until then) and can connect to a linux target over SSH for fully-autonomous command execution (as well as password guessing).
+The following would create a new (minimal) linux privilege-escalation agent. Through using our infrastructure, this already uses configurable LLM-connections (e.g., for testing OpenAI or locally run LLMs), logs trace data to a local sqlite database for each run, implements a round limit (after which the agent will stop if root has not been achieved until then) and can connect to a target system either locally or over SSH for fully-autonomous command execution (as well as password guessing).
 
 ~~~ python
 template_dir = pathlib.Path(__file__).parent
@@ -155,7 +161,9 @@ We try to keep our python dependencies as light as possible. This should allow f
 
 1. an OpenAI API account, you can find the needed keys [in your account page](https://platform.openai.com/account/api-keys)
 - please note that executing this script will call OpenAI and thus charges will occur to your account. Please keep track of those.
-2. a potential target that is accessible over SSH. You can either use a deliberately vulnerable machine such as [Lin.Security.1](https://www.vulnhub.com/entry/) or a security benchmark such as our [linux priv-esc benchmark](https://github.com/ipa-lab/benchmark-privesc-linux).
+2. a target environment to test against. You have two options:
+- **Local Shell**: Use your local system (useful for testing and development)
+- **SSH Target**: A remote machine accessible over SSH. You can use a deliberately vulnerable machine such as [Lin.Security.1](https://www.vulnhub.com/entry/) or a security benchmark such as our [linux priv-esc benchmark](https://github.com/ipa-lab/benchmark-privesc-linux).
 
 To get everything up and running, clone the repo, download requirements, setup API keys and credentials, and start `wintermute.py`:
 
@@ -229,11 +237,45 @@ usage: src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc [--help] [--config con
 --conn.port='2222' (default from .env file, alternatives: 22 from builtin)
 ```
 
-### Provide a Target Machine over SSH
+### Connection Options: Local Shell vs SSH
+
+hackingBuddyGPT now supports two connection modes:
+
+#### Local Shell Mode
+Use your local system for testing and development. This is useful for quick experimentation without needing a separate target machine.
+
+**Setup Steps:**
+1. First, create a new tmux session with a specific name:
+```bash
+$ tmux new-session -s <session_name>
+```
+
+2. Once you have the tmux shell running, use hackingBuddyGPT to interact with it:
+```bash
+# Local shell with tmux session
+$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --conn=local_shell --conn.tmux_session=<session_name>
+```
+
+**Example:**
+```bash
+# Step 1: Create tmux session named "hacking_session"
+$ tmux new-session -s hacking_session
+
+# Step 2: In another terminal, run hackingBuddyGPT
+$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --conn=local_shell --conn.tmux_session=hacking_session
+```
+
+#### SSH Mode
+Connect to a remote target machine over SSH. This is the traditional mode for testing against vulnerable VMs.
+
+```bash
+# SSH connection (note the updated format with --conn=ssh)
+$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --conn=ssh --conn.host=192.168.122.151 --conn.username=lowpriv --conn.password=trustno1
+```
 
-The next important part is having a machine that we can run our agent against. In our case, the target machine will be situated at `192.168.122.151`.
+When using SSH mode, the target machine should be situated at your specified IP address (e.g., `192.168.122.151` in the example above).
 
-We are using vulnerable Linux systems running in Virtual Machines for this. Never run this against real systems.
+We are using vulnerable Linux systems running in Virtual Machines for SSH testing. Never run this against real production systems.
 
 > 💡 **We also provide vulnerable machines!**
 >
@@ -277,9 +319,13 @@ Finally we can run hackingBuddyGPT against our provided test VM. Enjoy!
 With that out of the way, let's look at an example hackingBuddyGPT run. Each run is structured in rounds. At the start of each round, hackingBuddyGPT asks an LLM for the next command to execute (e.g., `whoami`) for the first round. It then executes that command on the virtual machine, prints its output and starts a new round (in which it also includes the output of prior rounds) until it reaches step number 10 or becomes root:
 
 ```bash
-# start wintermute, i.e., attack the configured virtual machine
-$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --llm.api_key=sk...ChangeMeToYourOpenAiApiKey --llm.model=gpt-4-turbo --llm.context_size=8192 --conn.host=192.168.122.151 --conn.username=lowpriv --conn.password=trustno1 --conn.hostname=test1
+# Example 1: Using local shell with tmux session
+# First create the tmux session: tmux new-session -s hacking_session
+# Then run hackingBuddyGPT:
+$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --llm.api_key=sk...ChangeMeToYourOpenAiApiKey --llm.model=gpt-4-turbo --llm.context_size=8192 --conn=local_shell --conn.tmux_session=hacking_session
 
+# Example 2: Using SSH connection (updated format)
+$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --llm.api_key=sk...ChangeMeToYourOpenAiApiKey --llm.model=gpt-4-turbo --llm.context_size=8192 --conn=ssh --conn.host=192.168.122.151 --conn.username=lowpriv --conn.password=trustno1 --conn.hostname=test1
 
 # install dependencies for testing if you want to run the tests
 $ pip install '.[testing]'
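
The README examples above switch between the two connection modes using only the `--conn` flags. The following sketch assembles those command lines in Python. The helper name `build_wintermute_argv` is hypothetical and is not part of hackingBuddyGPT. It knows only the flags that appear in the README examples: `--conn=local_shell`, `--conn.tmux_session`, `--conn=ssh`, `--conn.host`, `--conn.username` and `--conn.password`.

```python
import shlex
from typing import List, Optional

# Script path as used in the README examples; adjust if your checkout differs.
WINTERMUTE = "src/hackingBuddyGPT/cli/wintermute.py"


def build_wintermute_argv(
    mode: str,
    tmux_session: Optional[str] = None,
    host: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build a LinuxPrivesc command line for one connection mode."""
    argv = ["python", WINTERMUTE, "LinuxPrivesc"]
    if mode == "local_shell":
        # Local mode drives an already running tmux session, identified by name.
        if not tmux_session:
            raise ValueError("local_shell mode needs the name of an existing tmux session")
        argv += ["--conn=local_shell", f"--conn.tmux_session={tmux_session}"]
    elif mode == "ssh":
        # SSH mode needs the target address and the low-privilege credentials.
        if not (host and username and password):
            raise ValueError("ssh mode needs host, username and password")
        argv += [
            "--conn=ssh",
            f"--conn.host={host}",
            f"--conn.username={username}",
            f"--conn.password={password}",
        ]
    else:
        raise ValueError(f"unknown connection mode: {mode!r}")
    return argv


if __name__ == "__main__":
    # shlex.join quotes an argument only where a POSIX shell would need it.
    print(shlex.join(build_wintermute_argv("local_shell", tmux_session="hacking_session")))
    print(shlex.join(build_wintermute_argv(
        "ssh", host="192.168.122.151", username="lowpriv", password="trustno1"
    )))
```

The helper raises early when a tmux session name or an SSH credential is missing, so a half-configured run never reaches `wintermute.py`. The `--llm.*` options from the README would be appended to the same list.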

pyproject.toml

Lines changed: 8 additions & 4 deletions
@@ -45,11 +45,15 @@ dependencies = [
     'uvicorn[standard] == 0.30.6',
     'dataclasses_json == 0.6.7',
     'websockets == 13.1',
-    'langchain-community',
-    'langchain-openai',
+    'pandas',
+    'faker',
+    'fpdf',
+    'langchain_core',
+    'langchain_community',
+    'langchain_chroma',
+    'langchain_openai',
     'markdown',
     'chromadb',
-    'langchain-chroma',
 ]
 
 [project.urls]
@@ -69,7 +73,7 @@ where = ["src"]
 pythonpath = "src"
 addopts = ["--import-mode=importlib"]
 [project.optional-dependencies]
-testing = ['pytest', 'pytest-mock']
+testing = ['pytest', 'pytest-mock', 'pandas', 'faker', 'langchain_core']
 dev = [
     'ruff',
 ]

src/hackingBuddyGPT/capabilities/http_request.py

Lines changed: 2 additions & 9 deletions
@@ -45,18 +45,11 @@ def __call__(
         body_is_base64: Optional[bool] = False,
         headers: Optional[Dict[str, str]] = None,
     ) -> str:
+
         if body is not None and body_is_base64:
             body = base64.b64decode(body).decode()
-        if self.host[-1] != "/":
+        if self.host[-1] != "/" and not path.startswith("/"):
             path = "/" + path
-        resp = self._client.request(
-            method,
-            self.host + path,
-            params=query,
-            data=body,
-            headers=headers,
-            allow_redirects=self.follow_redirects,
-        )
         try:
             resp = self._client.request(
                 method,
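
The changed condition in `__call__` stops the code from prepending a second `/` when the caller already passes a path that starts with one. The following standalone sketch compares the old and new conditions. Both function names are hypothetical. The real method concatenates `self.host + path` inline before calling `self._client.request`.

```python
def join_old(host: str, path: str) -> str:
    # Condition before this commit: prepend "/" whenever the host lacks a trailing slash.
    if host[-1] != "/":
        path = "/" + path
    return host + path


def join_new(host: str, path: str) -> str:
    # Condition after this commit: prepend "/" only if neither side already supplies one.
    if host[-1] != "/" and not path.startswith("/"):
        path = "/" + path
    return host + path


if __name__ == "__main__":
    for host, path in [
        ("http://localhost:8080", "users"),
        ("http://localhost:8080", "/users"),
        ("http://localhost:8080/", "users"),
        ("http://localhost:8080/", "/users"),
    ]:
        print(f"{host!r} + {path!r}: old={join_old(host, path)!r} new={join_new(host, path)!r}")
```

The last case still yields `//users` under both rules, because neither condition removes a duplicate slash. `urllib.parse.urljoin` is not a drop-in replacement here. It resolves a leading `/` against the host root, so a base such as `http://localhost:8080/api/` would lose its `/api` prefix.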

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+import re
+from dataclasses import dataclass
+from typing import Tuple
+
+from hackingBuddyGPT.capabilities import Capability
+from hackingBuddyGPT.utils.local_shell import LocalShellConnection
+
+
+@dataclass
+class LocalShellCapability(Capability):
+    conn: LocalShellConnection
+
+    def describe(self) -> str:
+        return "give a command to be executed and I will respond with the terminal output when running this command on the shell via tmux. The given command must not require user interaction. Do not use quotation marks in front and after your command."
+
+    def get_name(self):
+        return "local_exec"
+
+    def _got_root(self, output: str) -> bool:
+        """Check if we got root access based on the command output."""
+        if not output.strip():
+            return False
+
+        lines = output.strip().split('\n')
+        last_line = lines[-1] if lines else ''
+
+        # Check for common root indicators
+        return (
+            "root" in output.lower() or
+            last_line.strip().endswith("#") or
+            "root@" in last_line or
+            last_line.strip() == "#"
+        )
+
+    def __call__(self, cmd: str) -> Tuple[str, bool]:
+        out, _, _ = self.conn.run(cmd)  # keep only the first element (the command output) of run()'s result
+        return out, self._got_root(out)
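
`_got_root` returns `True` as soon as the string `root` appears anywhere in the output. An error such as `ls: cannot open directory '/root': Permission denied` therefore already counts as success. The following sketch copies the committed heuristic into a plain function and places it beside a stricter variant. The stricter variant is hypothetical: it looks for `uid=0(` (as printed by `id`) or for a `root@...#` prompt on the last line. It illustrates the trade-off and is not the project's implementation.

```python
def got_root_as_committed(output: str) -> bool:
    """Standalone copy of LocalShellCapability._got_root from the diff above."""
    if not output.strip():
        return False
    lines = output.strip().split("\n")
    last_line = lines[-1] if lines else ""
    return (
        "root" in output.lower()
        or last_line.strip().endswith("#")
        or "root@" in last_line
        or last_line.strip() == "#"
    )


def got_root_strict(output: str) -> bool:
    """Hypothetical stricter check: uid 0 in `id` output, or a root@...# prompt as last line."""
    stripped = output.strip()
    if not stripped:
        return False
    last_line = stripped.split("\n")[-1].strip()
    return "uid=0(" in stripped or (last_line.startswith("root@") and last_line.endswith("#"))


if __name__ == "__main__":
    samples = [
        "ls: cannot open directory '/root': Permission denied",
        "uid=1000(lowpriv) gid=1000(lowpriv) groups=1000(lowpriv)",
        "uid=0(root) gid=0(root) groups=0(root)",
        "root@target:~#",
    ]
    for sample in samples:
        print(f"{sample!r}: committed={got_root_as_committed(sample)} strict={got_root_strict(sample)}")
```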

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+from dataclasses import dataclass, field
+from typing import Dict, Any, List, Tuple
+from hackingBuddyGPT.capabilities import Capability
+
+
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Tuple
+
+@dataclass
+class ParsedInformation(Capability):
+    status_code: str
+    reason_phrase: Dict[str, Any] = field(default_factory=dict)
+    headers: Dict[str, Any] = field(default_factory=dict)
+    response_body: Dict[str, Any] = field(default_factory=dict)
+    registry: List[Tuple[str, str, str, str]] = field(default_factory=list)
+
+    def describe(self) -> str:
+        """
+        Returns a description of the parsed response.
+        """
+        return f"Parsed information for {self.status_code}, reason_phrase: {self.reason_phrase}, headers: {self.headers}, response_body: {self.response_body} "
+    def __call__(self, status_code: str, reason_phrase: str, headers: str, response_body: str) -> dict:
+        self.registry.append((status_code, reason_phrase, headers, response_body))
+
+        return {"status_code": status_code, "reason_phrase": reason_phrase, "headers": headers, "response_body": response_body}
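
`ParsedInformation.__call__` records each call in `registry` and echoes its arguments back as a dict. The following self-contained sketch shows that record-and-echo pattern. `Capability` here is a hypothetical stand-in for `hackingBuddyGPT.capabilities.Capability`. `ParsedInformationSketch` keeps only the fields needed to show the registry tuple, stored in signature order `(status_code, reason_phrase, headers, response_body)`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class Capability:
    """Hypothetical stand-in for hackingBuddyGPT.capabilities.Capability."""

    def describe(self) -> str:
        raise NotImplementedError


@dataclass
class ParsedInformationSketch(Capability):
    status_code: str = ""
    # One tuple per call: (status_code, reason_phrase, headers, response_body).
    registry: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def describe(self) -> str:
        return f"Parsed information for {self.status_code}"

    def __call__(self, status_code: str, reason_phrase: str, headers: str, response_body: str) -> Dict[str, str]:
        # Record each argument exactly once, in the same order as the signature.
        self.registry.append((status_code, reason_phrase, headers, response_body))
        return {
            "status_code": status_code,
            "reason_phrase": reason_phrase,
            "headers": headers,
            "response_body": response_body,
        }


if __name__ == "__main__":
    cap = ParsedInformationSketch(status_code="200")
    print(cap("200", "OK", "Content-Type: application/json", '{"id": 1}'))
    print(cap.registry)
```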

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+
+from hackingBuddyGPT.capabilities import Capability
+
+
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Tuple
+
+@dataclass
+class PythonTestCase(Capability):
+    description: str
+    input: Dict[str, Any] = field(default_factory=dict)
+    expected_output: Dict[str, Any] = field(default_factory=dict)
+    registry: List[Tuple[str, dict, dict]] = field(default_factory=list)
+
+    def describe(self) -> str:
+        """
+        Returns a description of the test case.
+        """
+        return f"Test Case: {self.description}\nInput: {self.input}\nExpected Output: {self.expected_output}"
+    def __call__(self, description: str, input: dict, expected_output: dict) -> dict:
+        self.registry.append((description, input, expected_output))
+        return {"description": description, "input": input, "expected_output": expected_output}

src/hackingBuddyGPT/usecases/privesc/linux.py

File mode changed: 100644 → 100755
Lines changed: 10 additions & 4 deletions
@@ -1,18 +1,24 @@
 from hackingBuddyGPT.capabilities import SSHRunCommand, SSHTestCredential
+from hackingBuddyGPT.capabilities.local_shell import LocalShellCapability
 from hackingBuddyGPT.usecases.base import AutonomousAgentUseCase, use_case
 from hackingBuddyGPT.utils import SSHConnection
-
+from hackingBuddyGPT.utils.local_shell import LocalShellConnection
+from typing import Union
 from .common import Privesc
 
 
 class LinuxPrivesc(Privesc):
-    conn: SSHConnection = None
+    conn: Union[SSHConnection, LocalShellConnection] = None
     system: str = "linux"
 
     def init(self):
         super().init()
-        self.add_capability(SSHRunCommand(conn=self.conn), default=True)
-        self.add_capability(SSHTestCredential(conn=self.conn))
+        if isinstance(self.conn, LocalShellConnection):
+            self.add_capability(LocalShellCapability(conn=self.conn), default=True)
+            self.add_capability(SSHTestCredential(conn=self.conn))
+        else:
+            self.add_capability(SSHRunCommand(conn=self.conn), default=True)
+            self.add_capability(SSHTestCredential(conn=self.conn))
 
 
 @use_case("Linux Privilege Escalation")
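
`LinuxPrivesc.init` now chooses its default command capability from the runtime type of `conn`. The following self-contained sketch shows that dispatch. The stub classes are hypothetical stand-ins for `SSHConnection`, `LocalShellConnection`, `SSHRunCommand` and `LocalShellCapability` and are not the real hackingBuddyGPT classes.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stubs; the real classes live in hackingBuddyGPT.utils and hackingBuddyGPT.capabilities.
@dataclass
class SSHConnectionStub:
    host: str


@dataclass
class LocalShellConnectionStub:
    tmux_session: str


@dataclass
class SSHRunCommandStub:
    conn: Optional[SSHConnectionStub]


@dataclass
class LocalShellCapabilityStub:
    conn: LocalShellConnectionStub


Connection = Union[SSHConnectionStub, LocalShellConnectionStub, None]


def default_run_capability(conn: Connection) -> Union[SSHRunCommandStub, LocalShellCapabilityStub]:
    """Mirror of the isinstance branch in LinuxPrivesc.init."""
    if isinstance(conn, LocalShellConnectionStub):
        return LocalShellCapabilityStub(conn=conn)
    # Everything else, including the field default None, falls through to the SSH capability.
    return SSHRunCommandStub(conn=conn)


if __name__ == "__main__":
    print(default_run_capability(LocalShellConnectionStub("hacking_session")))
    print(default_run_capability(SSHConnectionStub("192.168.122.151")))
```

Dispatching on the connection type keeps a single `LinuxPrivesc` class for both modes. The `else` branch, however, also receives a missing (`None`) connection, so an unconfigured run is treated as an SSH run.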

Lines changed: 3 additions & 0 deletions
@@ -1,2 +1,5 @@
 from .simple_openapi_documentation import SimpleWebAPIDocumentation
 from .simple_web_api_testing import SimpleWebAPITesting
+from . import response_processing
+from . import documentation
+from . import testing
