68 changes: 63 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -177,13 +177,54 @@ $ vi .env

# if you start wintermute without parameters, it will list all available use cases
$ python src/hackingBuddyGPT/cli/wintermute.py
usage: wintermute.py [-h]
{LinuxPrivesc,WindowsPrivesc,ExPrivEscLinux,ExPrivEscLinuxTemplated,ExPrivEscLinuxHintFile,ExPrivEscLinuxLSE,MinimalWebTesting,WebTestingWithExplanation,SimpleWebAPITesting,SimpleWebAPIDocumentation}
...
wintermute.py: error: the following arguments are required: {LinuxPrivesc,WindowsPrivesc,ExPrivEscLinux,ExPrivEscLinuxTemplated,ExPrivEscLinuxHintFile,ExPrivEscLinuxLSE,MinimalWebTesting,WebTestingWithExplanation,SimpleWebAPITesting,SimpleWebAPIDocumentation}
No command provided
usage: src/hackingBuddyGPT/cli/wintermute.py <command> [--help] [--config config.json] [options...]

commands:
ExPrivEscLinux Showcase Minimal Linux Priv-Escalation
ExPrivEscLinuxTemplated Showcase Minimal Linux Priv-Escalation
LinuxPrivesc Linux Privilege Escalation
WindowsPrivesc Windows Privilege Escalation
ExPrivEscLinuxHintFile Linux Privilege Escalation using hints from a hint file for initial guidance
ExPrivEscLinuxLSE Linux Privilege Escalation using lse.sh for initial guidance
WebTestingWithExplanation Minimal implementation of a web testing use case while allowing the llm to 'talk'
SimpleWebAPIDocumentation Minimal implementation of a web API testing use case
SimpleWebAPITesting Minimal implementation of a web API testing use case
Viewer Webserver for (live) log viewing
Replayer Tool to replay the .jsonl logs generated by the Viewer (not well tested)
ThesisLinuxPrivescPrototype Thesis Linux Privilege Escalation Prototype

# to get more information about how to configure a use case you can call it with --help
$ python src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc --help
usage: src/hackingBuddyGPT/cli/wintermute.py LinuxPrivesc [--help] [--config config.json] [options...]

--log.log_server_address='localhost:4444' address:port of the log server to be used (default from builtin)
--log.tag='' Tag for your current run (default from builtin)
--log='local_logger' choice of logging backend (default from builtin)
--log_db.connection_string='wintermute.sqlite3' sqlite3 database connection string for logs (default from builtin)
--max_turns='30' (default from .env file, alternatives: 10 from builtin)
--llm.api_key=<secret> OpenAI API Key (default from .env file)
--llm.model OpenAI model name
--llm.context_size='100000' Maximum context size for the model, only used internally for things like trimming to the context size (default from .env file)
--llm.api_url='https://api.openai.com' URL of the OpenAI API (default from builtin)
--llm.api_path='/v1/chat/completions' Path to the OpenAI API (default from builtin)
--llm.api_timeout=240 Timeout for the API request (default from builtin)
--llm.api_backoff=60 Backoff time in seconds when running into rate-limits (default from builtin)
--llm.api_retries=3 Number of retries when running into rate-limits (default from builtin)
--system='linux' (default from builtin)
--enable_explanation=False (default from builtin)
--enable_update_state=False (default from builtin)
--disable_history=False (default from builtin)
--hint='' (default from builtin)
--conn.host
--conn.hostname
--conn.username
--conn.password
--conn.keyfilename
--conn.port='2222' (default from .env file, alternatives: 22 from builtin)
```
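
Several of the defaults shown above come from the `.env` file edited earlier (the entries marked `default from .env file`). The keys mirror the option names; a minimal sketch with placeholder values (the exact key set depends on your use case) might look like:

```ini
llm.api_key='sk-your-key-here'
llm.context_size=100000
max_turns=30
conn.host='192.168.122.151'
conn.username='lowpriv'
conn.password='trustno1'
conn.port=2222
```

Anything not set in `.env` falls back to the builtin defaults listed in the help output, and can still be overridden per run on the command line.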

## Provide a Target Machine over SSH
### Provide a Target Machine over SSH

The next important part is having a machine that we can run our agent against. In our case, the target machine will be situated at `192.168.122.151`.

@@ -193,6 +234,23 @@ We are using vulnerable Linux systems running in Virtual Machines for this. Neve
>
> We are using virtual machines from our [Linux Privilege-Escalation Benchmark](https://github.com/ipa-lab/benchmark-privesc-linux) project. Feel free to use them for your own research!

## Using the Web-Based Viewer and Replayer

If you want to have a better representation of the agent's output, you can use the web-based viewer. You can start it using `wintermute Viewer`, which will run the server on `http://127.0.0.1:4444` for the default `wintermute.sqlite3` database. You can change these options using the `--log_server_address` and `--log_db.connection_string` parameters.
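
The `--log_server_address` value is a `host:port` pair; an `http://` or `https://` prefix is also tolerated and implies a default port. The helper below is an illustrative sketch of such parsing, not the project's exact code:

```python
def split_listen_address(address: str) -> tuple[str, int]:
    """Split 'host:port' (optionally prefixed with http:// or https://) into (host, port)."""
    default_port = None
    # strip an optional scheme prefix and remember the default port it implies
    for scheme, scheme_port in (("http://", 80), ("https://", 443)):
        if address.startswith(scheme):
            address, default_port = address[len(scheme):], scheme_port
            break
    host, sep, port = address.rpartition(":")
    if sep:  # an explicit port was given
        return host, int(port)
    if default_port is None:
        raise ValueError(f"address needs a scheme or an explicit port: {address!r}")
    return address, default_port
```

For example, `split_listen_address("127.0.0.1:4444")` yields the default viewer host and port, while a bare `http://localhost` falls back to port 80.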

Navigating to the log server address shows an overview of all runs; clicking on a run shows that run's details. The viewer updates live over a websocket connection, and if you enable `Follow new runs`, it automatically switches to a newly started run.

Keep in mind that this webserver has no protection beyond how it can be reached (binding to `127.0.0.1`, the default, means it can only be reached from your local machine). If you make it accessible from the internet, everybody can see all of your runs and can also inject arbitrary data into the database.

Therefore, **DO NOT** make it accessible from the internet unless you are absolutely sure about what you are doing!

There is also an experimental replay functionality that can replay a run live from a capture file, including timing information. This is great for showcases and presentations: it looks like everything is happening live, while you know exactly what the results will be.

To use it, the run must first be captured by a Viewer server started with `--save_playback_dir` pointing to a directory where the viewer can write the capture files.

With the Viewer server still running, you can then start `wintermute Replayer --replay_file <path_to_capture_file>` to replay the captured run (this creates a new run in the database).
You can pass `--pause_on_message` and `--pause_on_tool_calls` to interrupt the replay at the respective points until Enter is pressed in the shell where the Replayer runs, and `--playback_speed` to control the speed of the replay.
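
The replay loop itself is conceptually simple: re-emit each captured message after the original inter-message delay, scaled by the playback speed. The sketch below illustrates the idea; the single-float `timestamp` field and the file layout are assumptions for illustration, not the Viewer's actual `ReplayMessage` schema:

```python
import json
import time


def replay(capture_path: str, playback_speed: float = 1.0, sleep=time.sleep):
    """Yield captured messages, sleeping the original inter-message delay scaled by playback_speed."""
    previous = None
    with open(capture_path) as capture:
        for line in capture:
            record = json.loads(line)        # assumed: one captured message per JSONL line
            timestamp = record["timestamp"]  # assumed: seconds since epoch as a float
            if previous is not None:
                sleep((timestamp - previous) / playback_speed)
            previous = timestamp
            yield record["message"]
```

Injecting `sleep` as a parameter keeps the timing logic testable without real delays; a `playback_speed` of 2.0 halves every pause.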

## Use Cases

GitHub Codespaces:
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -45,6 +45,11 @@ dependencies = [
'uvicorn[standard] == 0.30.6',
'dataclasses_json == 0.6.7',
'websockets == 13.1',
'langchain-community',
'langchain-openai',
'markdown',
'chromadb',
'langchain-chroma',
]

[project.urls]
23 changes: 13 additions & 10 deletions src/hackingBuddyGPT/cli/wintermute.py
@@ -2,19 +2,22 @@
import sys

from hackingBuddyGPT.usecases.base import use_cases
from hackingBuddyGPT.utils.configurable import CommandMap, InvalidCommand, Parseable, instantiate


def main():
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(required=True)
for name, use_case in use_cases.items():
use_case.build_parser(subparser.add_parser(name=name, help=use_case.description))

parsed = parser.parse_args(sys.argv[1:])
configuration = {k: v for k, v in vars(parsed).items() if k not in ("use_case", "parser_state")}
instance = parsed.use_case(parsed)
instance.init(configuration=configuration)
instance.run()
use_case_parsers: CommandMap = {
name: Parseable(use_case, description=use_case.description)
for name, use_case in use_cases.items()
}
try:
instance, configuration = instantiate(sys.argv, use_case_parsers)
except InvalidCommand as e:
if str(e):
print(e)
print(e.usage)
sys.exit(1)
instance.run(configuration)


if __name__ == "__main__":
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/agents.py
@@ -4,7 +4,7 @@
from mako.template import Template
from typing import Dict

from hackingBuddyGPT.utils.logging import log_conversation, GlobalLogger
from hackingBuddyGPT.utils.logging import log_conversation, Logger, log_param
from hackingBuddyGPT.capabilities.capability import (
Capability,
capabilities_to_simple_text_handler,
@@ -15,7 +15,7 @@

@dataclass
class Agent(ABC):
log: GlobalLogger = None
log: Logger = log_param

_capabilities: Dict[str, Capability] = field(default_factory=dict)
_default_capability: Capability = None
54 changes: 16 additions & 38 deletions src/hackingBuddyGPT/usecases/base.py
@@ -3,12 +3,10 @@
import argparse
from dataclasses import dataclass

from hackingBuddyGPT.utils.logging import GlobalLogger
from hackingBuddyGPT.utils.logging import Logger, log_param
from typing import Dict, Type, TypeVar, Generic

from hackingBuddyGPT.utils.configurable import ParameterDefinitions, build_parser, get_arguments, get_class_parameters, \
Transparent, ParserState

from hackingBuddyGPT.utils.configurable import Transparent, configurable

@dataclass
class UseCase(abc.ABC):
@@ -22,22 +20,21 @@ class UseCase(abc.ABC):
so that they can be automatically discovered and run from the command line.
"""

log: GlobalLogger
log: Logger = log_param

def init(self, configuration):
def init(self):
"""
The init method is called before the run method. It is used to initialize the UseCase, and can be used to
perform any dynamic setup that is needed before the run method is called. One of the most common use cases is
setting up the llm capabilities from the tools that were injected.
"""
self.configuration = configuration
self.log.start_run(self.get_name(), self.serialize_configuration(configuration))
pass

def serialize_configuration(self, configuration) -> str:
return json.dumps(configuration)

@abc.abstractmethod
def run(self):
def run(self, configuration):
"""
The run method is the main method of the UseCase. It is used to run the UseCase, and should contain the main
logic. It is recommended to have only the main llm loop in here, and call out to other methods for the
@@ -70,7 +67,10 @@ def before_run(self):
def after_run(self):
pass

def run(self):
def run(self, configuration):
self.configuration = configuration
self.log.start_run(self.get_name(), self.serialize_configuration(configuration))

self.before_run()

turn = 1
@@ -98,31 +98,10 @@ def run(self):
raise


@dataclass
class _WrappedUseCase:
"""
A WrappedUseCase should not be used directly and is an internal tool used for initialization and dependency injection
of the actual UseCases.
"""

name: str
description: str
use_case: Type[UseCase]
parameters: ParameterDefinitions

def build_parser(self, parser: argparse.ArgumentParser):
parser_state = ParserState()
build_parser(self.parameters, parser, parser_state)
parser.set_defaults(use_case=self, parser_state=parser_state)

def __call__(self, args: argparse.Namespace):
return self.use_case(**get_arguments(self.parameters, args, args.parser_state))


use_cases: Dict[str, _WrappedUseCase] = dict()
use_cases: Dict[str, configurable] = dict()


T = TypeVar("T")
T = TypeVar("T", bound=type)


class AutonomousAgentUseCase(AutonomousUseCase, Generic[T]):
@@ -137,13 +116,12 @@ def get_name(self) -> str:
@classmethod
def __class_getitem__(cls, item):
item = dataclass(item)
item.__parameters__ = get_class_parameters(item)

class AutonomousAgentUseCase(AutonomousUseCase):
agent: Transparent(item) = None

def init(self, configuration):
super().init(configuration)
def init(self):
super().init()
self.agent.init()

def get_name(self) -> str:
@@ -169,7 +147,7 @@ def inner(cls):
name = cls.__name__.removesuffix("UseCase")
if name in use_cases:
raise IndexError(f"Use case with name {name} already exists")
use_cases[name] = _WrappedUseCase(name, description, cls, get_class_parameters(cls))
use_cases[name] = configurable(name, description)(cls)
return cls

return inner
@@ -181,4 +159,4 @@ def register_use_case(name: str, description: str, use_case: Type[UseCase]):
"""
if name in use_cases:
raise IndexError(f"Use case with name {name} already exists")
use_cases[name] = _WrappedUseCase(name, description, use_case, get_class_parameters(use_case))
use_cases[name] = configurable(name, description)(use_case)
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/examples/hintfile.py
@@ -8,8 +8,8 @@
class ExPrivEscLinuxHintFileUseCase(AutonomousAgentUseCase[LinuxPrivesc]):
hints: str = None

def init(self, configuration):
super().init(configuration)
def init(self):
super().init()
self.agent.hint = self.read_hint()

# simple helper that reads the hints file and returns the hint
4 changes: 2 additions & 2 deletions src/hackingBuddyGPT/usecases/rag/linux.py
@@ -20,8 +20,8 @@ def init(self):
class ThesisLinuxPrivescPrototypeUseCase(AutonomousAgentUseCase[ThesisLinuxPrivescPrototype]):
hints: str = ""

def init(self,configuration):
super().init(configuration)
def init(self):
super().init()
if self.hints != "":
self.agent.hint = self.read_hint()

30 changes: 22 additions & 8 deletions src/hackingBuddyGPT/usecases/viewer.py
100755 → 100644
@@ -18,6 +18,7 @@
from starlette.templating import Jinja2Templates

from hackingBuddyGPT.usecases.base import UseCase, use_case
from hackingBuddyGPT.utils.configurable import parameter
from hackingBuddyGPT.utils.db_storage import DbStorage
from hackingBuddyGPT.utils.db_storage.db_storage import (
Message,
@@ -205,10 +206,9 @@ class Viewer(UseCase):
TODOs:
- [ ] This server needs to be as async as possible to allow good performance, but the database accesses are not yet, might be an issue?
"""
log: GlobalLocalLogger
log_db: DbStorage
listen_host: str = "127.0.0.1"
listen_port: int = 4444
log: GlobalLocalLogger = None
log_db: DbStorage = None
log_server_address: str = "127.0.0.1:4444"
save_playback_dir: str = ""

async def save_message(self, message: ControlMessage):
@@ -232,7 +232,7 @@ async def save_message(self, message: ControlMessage):
with open(file_path, "a") as f:
f.write(ReplayMessage(datetime.datetime.now(), message).to_json() + "\n")

def run(self):
def run(self, config):
@asynccontextmanager
async def lifespan(app: FastAPI):
app.state.db = self.log_db
@@ -337,16 +337,30 @@ async def client_endpoint(websocket: WebSocket):
print("Egress WebSocket disconnected")

import uvicorn
uvicorn.run(app, host=self.listen_host, port=self.listen_port)
# accept "host:port"; an http:// or https:// prefix is also allowed and implies a default port
address = self.log_server_address
default_port = None
if address.startswith("http://"):
address, default_port = address[len("http://"):], 80
elif address.startswith("https://"):
address, default_port = address[len("https://"):], 443

listen_parts = address.rsplit(":", 1)
if len(listen_parts) == 2:
listen_host, listen_port = listen_parts[0], int(listen_parts[1])
elif default_port is not None:
listen_host, listen_port = listen_parts[0], default_port
else:
raise ValueError(f"Invalid log server address (does not contain http/https or a port): {self.log_server_address}")
uvicorn.run(app, host=listen_host, port=listen_port)

def get_name(self) -> str:
return "log_viewer"


@use_case("Tool to replay the .jsonl logs generated by the Viewer (not well tested)")
class Replayer(UseCase):
log: GlobalRemoteLogger
replay_file: str
log: GlobalRemoteLogger = None
replay_file: str = None
pause_on_message: bool = False
pause_on_tool_calls: bool = False
playback_speed: float = 1.0
2 changes: 1 addition & 1 deletion src/hackingBuddyGPT/utils/__init__.py
@@ -1,4 +1,4 @@
from .configurable import Configurable, configurable
from .configurable import Configurable, configurable, parameter
from .console import *
from .db_storage import *
from .llm_util import *