68 changes: 63 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -177,13 +177,54 @@ $ vi .env

# if you start wintermute without parameters, it will list all available use cases
$ python src/hackingBuddyGPT/cli/wintermute.py
usage: wintermute.py [-h]
{LinuxPrivesc,WindowsPrivesc,ExPrivEscLinux,ExPrivEscLinuxTemplated,ExPrivEscLinuxHintFile,ExPrivEscLinuxLSE,MinimalWebTesting,WebTestingWithExplanation,SimpleWebAPITesting,SimpleWebAPIDocumentation}
...
wintermute.py: error: the following arguments are required: {LinuxPrivesc,WindowsPrivesc,ExPrivEscLinux,ExPrivEscLinuxTemplated,ExPrivEscLinuxHintFile,ExPrivEscLinuxLSE,MinimalWebTesting,WebTestingWithExplanation,SimpleWebAPITesting,SimpleWebAPIDocumentation}
No command provided
usage: src/hackingBuddyGPT/cli/wintermute.py <command> [--help] [--config config.json] [options...]

commands:
ExPrivEscLinux Showcase Minimal Linux Priv-Escalation
ExPrivEscLinuxTemplated Showcase Minimal Linux Priv-Escalation
LinuxPrivesc Linux Privilege Escalation
WindowsPrivesc Windows Privilege Escalation
ExPrivEscLinuxHintFile Linux Privilege Escalation using hints from a hint file for initial guidance
ExPrivEscLinuxLSE Linux Privilege Escalation using lse.sh for initial guidance
WebTestingWithExplanation Minimal implementation of a web testing use case while allowing the llm to 'talk'
SimpleWebAPIDocumentation Minimal implementation of a web API testing use case
SimpleWebAPITesting Minimal implementation of a web API testing use case
Viewer Webserver for (live) log viewing
Replayer Tool to replay the .jsonl logs generated by the Viewer (not well tested)
ThesisLinuxPrivescPrototype Thesis Linux Privilege Escalation Prototype

# to get more information about how to configure a use case you can call it with --help
$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --help
usage: src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc [--help] [--config config.json] [options...]

--log.log_server_address='localhost:4444' address:port of the log server to be used (default from builtin)
--log.tag='' Tag for your current run (default from builtin)
--log='local_logger' choice of logging backend (default from builtin)
--log_db.connection_string='wintermute.sqlite3' sqlite3 database connection string for logs (default from builtin)
--max_turns='30' (default from .env file, alternatives: 10 from builtin)
--llm.api_key=<secret> OpenAI API Key (default from .env file)
--llm.model OpenAI model name
--llm.context_size='100000' Maximum context size for the model, only used internally for things like trimming to the context size (default from .env file)
--llm.api_url='https://api.openai.com' URL of the OpenAI API (default from builtin)
--llm.api_path='/v1/chat/completions' Path to the OpenAI API (default from builtin)
--llm.api_timeout=240 Timeout for the API request (default from builtin)
--llm.api_backoff=60 Backoff time in seconds when running into rate-limits (default from builtin)
--llm.api_retries=3 Number of retries when running into rate-limits (default from builtin)
--system='linux' (default from builtin)
--enable_explanation=False (default from builtin)
--enable_update_state=False (default from builtin)
--disable_history=False (default from builtin)
--hint='' (default from builtin)
--conn.host
--conn.hostname
--conn.username
--conn.password
--conn.keyfilename
--conn.port='2222' (default from .env file, alternatives: 22 from builtin)
```
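
Several of the defaults shown above come from the `.env` file edited earlier (the entries marked `default from .env file`). The keys mirror the option names; a minimal sketch with placeholder values (the exact key set depends on your use case) might look like:

```ini
llm.api_key='sk-your-key-here'
llm.context_size=100000
max_turns=30
conn.host='192.168.122.151'
conn.username='lowpriv'
conn.password='trustno1'
conn.port=2222
```

Anything not set in `.env` falls back to the builtin defaults listed in the help output, and can still be overridden per run on the command line.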

## Provide a Target Machine over SSH
### Provide a Target Machine over SSH

The next important part is having a machine that we can run our agent against. In our case, the target machine will be situated at `192.168.122.151`.

@@ -193,6 +234,23 @@ We are using vulnerable Linux systems running in Virtual Machines for this. Neve
>
> We are using virtual machines from our [Linux Privilege-Escalation Benchmark](https://github.com/ipa-lab/benchmark-privesc-linux) project. Feel free to use them for your own research!

## Using the Web-Based Viewer and Replayer

If you want to have a better representation of the agent's output, you can use the web-based viewer. You can start it using `wintermute Viewer`, which will run the server on `http://127.0.0.1:4444` for the default `wintermute.sqlite3` database. You can change these options using the `--log_server_address` and `--log_db.connection_string` parameters.
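
The `--log_server_address` value is a `host:port` pair; an `http://` or `https://` prefix is also tolerated and implies a default port. The helper below is an illustrative sketch of such parsing, not the project's exact code:

```python
def split_listen_address(address: str) -> tuple[str, int]:
    """Split 'host:port' (optionally prefixed with http:// or https://) into (host, port)."""
    default_port = None
    # strip an optional scheme prefix and remember the default port it implies
    for scheme, scheme_port in (("http://", 80), ("https://", 443)):
        if address.startswith(scheme):
            address, default_port = address[len(scheme):], scheme_port
            break
    host, sep, port = address.rpartition(":")
    if sep:  # an explicit port was given
        return host, int(port)
    if default_port is None:
        raise ValueError(f"address needs a scheme or an explicit port: {address!r}")
    return address, default_port
```

For example, `split_listen_address("127.0.0.1:4444")` yields the default viewer host and port, while a bare `http://localhost` falls back to port 80.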

Navigating to the log server address shows an overview of all runs; clicking on a run shows that run's details. The viewer updates live over a websocket connection, and if you enable `Follow new runs`, it automatically switches to a newly started run.

Keep in mind that this webserver has no protection beyond how it can be reached (binding to `127.0.0.1`, the default, means it can only be reached from your local machine). If you make it accessible from the internet, everybody can see all of your runs and can also inject arbitrary data into the database.

Therefore, **DO NOT** make it accessible from the internet unless you are absolutely sure about what you are doing!

There is also an experimental replay functionality that can replay a run live from a capture file, including timing information. This is great for showcases and presentations: it looks like everything is happening live, while you know exactly what the results will be.

To use it, the run must first be captured by a Viewer server started with `--save_playback_dir` pointing to a directory where the viewer can write the capture files.

With the Viewer server still running, you can then start `wintermute Replayer --replay_file <path_to_capture_file>` to replay the captured run (this creates a new run in the database).
You can pass `--pause_on_message` and `--pause_on_tool_calls` to interrupt the replay at the respective points until Enter is pressed in the shell where the Replayer runs, and `--playback_speed` to control the speed of the replay.
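
The replay loop itself is conceptually simple: re-emit each captured message after the original inter-message delay, scaled by the playback speed. The sketch below illustrates the idea; the single-float `timestamp` field and the file layout are assumptions for illustration, not the Viewer's actual `ReplayMessage` schema:

```python
import json
import time


def replay(capture_path: str, playback_speed: float = 1.0, sleep=time.sleep):
    """Yield captured messages, sleeping the original inter-message delay scaled by playback_speed."""
    previous = None
    with open(capture_path) as capture:
        for line in capture:
            record = json.loads(line)        # assumed: one captured message per JSONL line
            timestamp = record["timestamp"]  # assumed: seconds since epoch as a float
            if previous is not None:
                sleep((timestamp - previous) / playback_speed)
            previous = timestamp
            yield record["message"]
```

Injecting `sleep` as a parameter keeps the timing logic testable without real delays; a `playback_speed` of 2.0 halves every pause.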

## Use Cases

GitHub Codespaces:
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -45,6 +45,11 @@ dependencies = [
'uvicorn[standard] == 0.30.6',
'dataclasses_json == 0.6.7',
'websockets == 13.1',
'langchain-community',
'langchain-openai',
'markdown',
'chromadb',
'langchain-chroma',
]

[project.urls]
23 changes: 13 additions & 10 deletions src/hackingBuddyGPT/cli/wintermute.py
@@ -2,19 +2,22 @@
import sys

from hackingBuddyGPT.usecases.base import use_cases
from hackingBuddyGPT.utils.configurable import CommandMap, InvalidCommand, Parseable, instantiate


def main():
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(required=True)
for name, use_case in use_cases.items():
use_case.build_parser(subparser.add_parser(name=name, help=use_case.description))

parsed = parser.parse_args(sys.argv[1:])
configuration = {k: v for k, v in vars(parsed).items() if k not in ("use_case", "parser_state")}
instance = parsed.use_case(parsed)
instance.init(configuration=configuration)
instance.run()
use_case_parsers: CommandMap = {
name: Parseable(use_case, description=use_case.description)
for name, use_case in use_cases.items()
}
try:
instance, configuration = instantiate(sys.argv, use_case_parsers)
except InvalidCommand as e:
if str(e):
print(e)
print(e.usage)
sys.exit(1)
instance.run(configuration)


if __name__ == "__main__":
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/agents.py
@@ -4,7 +4,7 @@
from mako.template import Template
from typing import Dict

from hackingBuddyGPT.utils.logging import log_conversation, GlobalLogger
from hackingBuddyGPT.utils.logging import log_conversation, Logger, log_param
from hackingBuddyGPT.capabilities.capability import (
Capability,
capabilities_to_simple_text_handler,
@@ -15,7 +15,7 @@

@dataclass
class Agent(ABC):
log: GlobalLogger = None
log: Logger = log_param

_capabilities: Dict[str, Capability] = field(default_factory=dict)
_default_capability: Capability = None
54 changes: 16 additions & 38 deletions src/hackingBuddyGPT/usecases/base.py
@@ -3,12 +3,10 @@
import argparse
from dataclasses import dataclass

from hackingBuddyGPT.utils.logging import GlobalLogger
from hackingBuddyGPT.utils.logging import Logger, log_param
from typing import Dict, Type, TypeVar, Generic

from hackingBuddyGPT.utils.configurable import ParameterDefinitions, build_parser, get_arguments, get_class_parameters, \
Transparent, ParserState

from hackingBuddyGPT.utils.configurable import Transparent, configurable

@dataclass
class UseCase(abc.ABC):
@@ -22,22 +20,21 @@ class UseCase(abc.ABC):
so that they can be automatically discovered and run from the command line.
"""

log: GlobalLogger
log: Logger = log_param

def init(self, configuration):
def init(self):
"""
The init method is called before the run method. It is used to initialize the UseCase, and can be used to
perform any dynamic setup that is needed before the run method is called. One of the most common use cases is
setting up the llm capabilities from the tools that were injected.
"""
self.configuration = configuration
self.log.start_run(self.get_name(), self.serialize_configuration(configuration))
pass

def serialize_configuration(self, configuration) -> str:
return json.dumps(configuration)

@abc.abstractmethod
def run(self):
def run(self, configuration):
"""
The run method is the main method of the UseCase. It is used to run the UseCase, and should contain the main
logic. It is recommended to have only the main llm loop in here, and call out to other methods for the
@@ -70,7 +67,10 @@ def before_run(self):
def after_run(self):
pass

def run(self):
def run(self, configuration):
self.configuration = configuration
self.log.start_run(self.get_name(), self.serialize_configuration(configuration))

self.before_run()

turn = 1
@@ -98,31 +98,10 @@ def run(self):
raise


@dataclass
class _WrappedUseCase:
"""
A WrappedUseCase should not be used directly and is an internal tool used for initialization and dependency injection
of the actual UseCases.
"""

name: str
description: str
use_case: Type[UseCase]
parameters: ParameterDefinitions

def build_parser(self, parser: argparse.ArgumentParser):
parser_state = ParserState()
build_parser(self.parameters, parser, parser_state)
parser.set_defaults(use_case=self, parser_state=parser_state)

def __call__(self, args: argparse.Namespace):
return self.use_case(**get_arguments(self.parameters, args, args.parser_state))


use_cases: Dict[str, _WrappedUseCase] = dict()
use_cases: Dict[str, configurable] = dict()


T = TypeVar("T")
T = TypeVar("T", bound=type)


class AutonomousAgentUseCase(AutonomousUseCase, Generic[T]):
@@ -137,13 +116,12 @@ def get_name(self) -> str:
@classmethod
def __class_getitem__(cls, item):
item = dataclass(item)
item.__parameters__ = get_class_parameters(item)

class AutonomousAgentUseCase(AutonomousUseCase):
agent: Transparent(item) = None

def init(self, configuration):
super().init(configuration)
def init(self):
super().init()
self.agent.init()

def get_name(self) -> str:
@@ -169,7 +147,7 @@ def inner(cls):
name = cls.__name__.removesuffix("UseCase")
if name in use_cases:
raise IndexError(f"Use case with name {name} already exists")
use_cases[name] = _WrappedUseCase(name, description, cls, get_class_parameters(cls))
use_cases[name] = configurable(name, description)(cls)
return cls

return inner
@@ -181,4 +159,4 @@ def register_use_case(name: str, description: str, use_case: Type[UseCase]):
"""
if name in use_cases:
raise IndexError(f"Use case with name {name} already exists")
use_cases[name] = _WrappedUseCase(name, description, use_case, get_class_parameters(use_case))
use_cases[name] = configurable(name, description)(use_case)
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/examples/hintfile.py
@@ -8,8 +8,8 @@
class ExPrivEscLinuxHintFileUseCase(AutonomousAgentUseCase[LinuxPrivesc]):
hints: str = None

def init(self, configuration):
super().init(configuration)
def init(self):
super().init()
self.agent.hint = self.read_hint()

# simple helper that reads the hints file and returns the hint
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/rag/linux.py
@@ -20,8 +20,8 @@ def init(self):
class ThesisLinuxPrivescPrototypeUseCase(AutonomousAgentUseCase[ThesisLinuxPrivescPrototype]):
hints: str = ""

def init(self,configuration):
super().init(configuration)
def init(self):
super().init()
if self.hints != "":
self.agent.hint = self.read_hint()

30 changes: 22 additions & 8 deletions src/hackingBuddyGPT/usecases/viewer.py
100755 → 100644
@@ -18,6 +18,7 @@
from starlette.templating import Jinja2Templates

from hackingBuddyGPT.usecases.base import UseCase, use_case
from hackingBuddyGPT.utils.configurable import parameter
from hackingBuddyGPT.utils.db_storage import DbStorage
from hackingBuddyGPT.utils.db_storage.db_storage import (
Message,
@@ -205,10 +206,9 @@ class Viewer(UseCase):
TODOs:
- [ ] This server needs to be as async as possible to allow good performance, but the database accesses are not yet, might be an issue?
"""
log: GlobalLocalLogger
log_db: DbStorage
listen_host: str = "127.0.0.1"
listen_port: int = 4444
log: GlobalLocalLogger = None
log_db: DbStorage = None
log_server_address: str = "127.0.0.1:4444"
save_playback_dir: str = ""

async def save_message(self, message: ControlMessage):
@@ -232,7 +232,7 @@ async def save_message(self, message: ControlMessage):
with open(file_path, "a") as f:
f.write(ReplayMessage(datetime.datetime.now(), message).to_json() + "\n")

def run(self):
def run(self, config):
@asynccontextmanager
async def lifespan(app: FastAPI):
app.state.db = self.log_db
@@ -337,16 +337,30 @@ async def client_endpoint(websocket: WebSocket):
print("Egress WebSocket disconnected")

import uvicorn
uvicorn.run(app, host=self.listen_host, port=self.listen_port)
# accept "host:port"; an http:// or https:// prefix is also allowed and implies a default port
address = self.log_server_address
default_port = None
if address.startswith("http://"):
address, default_port = address[len("http://"):], 80
elif address.startswith("https://"):
address, default_port = address[len("https://"):], 443

listen_parts = address.rsplit(":", 1)
if len(listen_parts) == 2:
listen_host, listen_port = listen_parts[0], int(listen_parts[1])
elif default_port is not None:
listen_host, listen_port = listen_parts[0], default_port
else:
raise ValueError(f"Invalid log server address (does not contain http/https or a port): {self.log_server_address}")
uvicorn.run(app, host=listen_host, port=listen_port)

def get_name(self) -> str:
return "log_viewer"


@use_case("Tool to replay the .jsonl logs generated by the Viewer (not well tested)")
class Replayer(UseCase):
log: GlobalRemoteLogger
replay_file: str
log: GlobalRemoteLogger = None
replay_file: str = None
pause_on_message: bool = False
pause_on_tool_calls: bool = False
playback_speed: float = 1.0
2 changes: 1 addition & 1 deletion src/hackingBuddyGPT/utils/__init__.py
@@ -1,4 +1,4 @@
from .configurable import Configurable, configurable
from .configurable import Configurable, configurable, parameter
from .console import *
from .db_storage import *
from .llm_util import *