
Conversation

@Neverbolt
Collaborator

This adds a first implementation of a modular capability system as well as the corresponding dependency injection / parameter resolution.

This means that if you define an @experiment class (and it is imported somewhere from the main.py script), the system will try to fill all parameters of its __init__ method from the available sources (command-line arguments, the .env file, environment variables, or Python default values), recursively building any parameter whose type exposes a __parameters__ field (such as one created by the @capability annotation on a class).
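To make the mechanism more concrete, here is a standalone toy sketch of such a decorator-driven resolution scheme (it is not the actual hackingBuddyGPT implementation; the resolve function, the flat values dictionary and the SSHConnection/Demo classes are made up purely for illustration):

import inspect

EXPERIMENTS = {}

def capability(cls):
    # record the __init__ parameter names/defaults so a resolver can
    # recognise and recursively build instances of this class
    sig = inspect.signature(cls.__init__)
    cls.__parameters__ = {
        name: p.default for name, p in sig.parameters.items() if name != "self"
    }
    return cls

def experiment(name, description):
    # register the class under the given experiment name
    def wrap(cls):
        EXPERIMENTS[name] = cls
        return cls
    return wrap

def resolve(cls, values, prefix=""):
    # build an instance of cls, filling each __init__ parameter either by
    # recursing into a capability (anything exposing __parameters__) or by
    # looking it up in the flat values dict, which stands in here for
    # command-line arguments / .env entries / environment variables
    kwargs = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        key = f"{prefix}{name}"
        if hasattr(param.annotation, "__parameters__"):
            kwargs[name] = resolve(param.annotation, values, prefix=f"{key}.")
        elif key in values:
            kwargs[name] = values[key]
        elif param.default is not inspect.Parameter.empty:
            kwargs[name] = param.default
        else:
            raise ValueError(f"missing required parameter: --{key}")
    return cls(**kwargs)

@capability
class SSHConnection:
    def __init__(self, host, username, password, port=22):
        self.host, self.username, self.password, self.port = host, username, password, port

@experiment("demo", "toy experiment")
class Demo:
    def __init__(self, ssh: SSHConnection, max_turns: int = 10):
        self.ssh, self.max_turns = ssh, max_turns

demo = resolve(EXPERIMENTS["demo"],
               {"ssh.host": "localhost", "ssh.username": "lowpriv", "ssh.password": "secret"})
print(demo.ssh.host, demo.ssh.port, demo.max_turns)  # -> localhost 22 10

Running this prints localhost 22 10: ssh.host comes from the supplied values, while ssh.port and max_turns fall back to their Python defaults, which mirrors the resolution order described above (in a simplified form).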

As an example, the existing wintermute.py script has been re-implemented and can now be executed from main.py with the experiment names linux_privesc_gpt35turbo, linux_privesc_gpt4 and linux_privesc_gpt4turbo (which, as their names suggest, use different versions of the GPT API).

To see which parameters are available, you can run e.g. python3 main.py linux_privesc_gpt4turbo -h and get the following list of parameters:

  --log_db.connection_string LOG_DB.CONNECTION_STRING
  --ssh.host SSH.HOST
  --ssh.hostname SSH.HOSTNAME
  --ssh.username SSH.USERNAME
  --ssh.password SSH.PASSWORD
  --ssh.port SSH.PORT
  --enable_explanation ENABLE_EXPLANATION
  --enable_update_state ENABLE_UPDATE_STATE
  --disable_history DISABLE_HISTORY
  --max_turns MAX_TURNS
  --tag TAG
  --hints HINTS
  --llm.api_key LLM.API_KEY
  --llm.api_url LLM.API_URL
  --llm.api_timeout LLM.API_TIMEOUT
  --llm.api_backoff LLM.API_BACKOFF
  --llm.api_retries LLM.API_RETRIES
  --llm.model LLM.MODEL
  --llm.context_size LLM.CONTEXT_SIZE

While the help output still needs improvement (not all of these parameters are mandatory, and some should not be changed unless you know what you are doing), it shows everything that can be configured for this experiment.

This list is generated automatically from the following dependencies / options (@dataclass is used here so that all fields automatically become __init__ parameters and are assigned properly):

@experiment("linux_privesc_gpt4turbo", "Linux Privilege Escalation")
@dataclass
class LinuxPrivescGPT4Turbo(LinuxPrivesc):
    llm: GPT4Turbo = None
    log_db: DbStorage
    ssh: SSHConnection
    console: Console
    enable_explanation: bool = False
    enable_update_state: bool = False
    disable_history: bool = False
    max_turns: int = 10
    tag: str = ""
    hints: str = ""

Comparing the two, you can also see how the parameter names are built: all parameters that start with e.g. llm. belong to the GPT4Turbo capability assigned to the llm field.
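For illustration, a capability whose fields match the llm.* entries in the help output might look roughly like the following sketch (the field names are taken from the help output above; the decorators, types and absence of defaults are assumptions and may differ from the actual GPT4Turbo class):

@capability
@dataclass
class GPT4Turbo:
    # hypothetical field layout; each field shows up as an llm.*-prefixed
    # parameter when this class is used for the experiment's llm field
    api_key: str
    api_url: str
    api_timeout: int
    api_backoff: int
    api_retries: int
    model: str
    context_size: int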

If you now, for example, set ssh.password and ssh.username as environment variables and llm.api_key and log_db.connection_string in your .env file, you can start the experiment with

python3 main.py linux_privesc_gpt4turbo --ssh.host localhost --ssh.hostname 8ad6a1d4b8d8 --ssh.port 2222

This PR currently contains quite a bit of duplicated code, which has been left in on purpose to ease comparison between the old and new versions and to allow better regression testing.

@andreashappe
Member

andreashappe commented Mar 30, 2024

Thank you for your submission; a couple of questions:

On a high level: in its current form this looks less like a capability system and more like a configuration system. What are the semantics of a capability? When talking with you, I thought that a capability is something that can be used by an LLM from within an experiment, e.g., run_command_linux, run_command_psexec, test_credential and maybe later perform_http_request. That does not seem to match the patch set though (;

Going through the capabilities: console / db_storage / open_ai are not operations called by an LLM but rather scaffolding provided by hackingBuddyGPT (which can then be used by the experiment). I would rather see them in the utils folder (or maybe even a scaffold folder).

I like the overall experiment approach, but I question why the LLM used is part of the experiment rather than a configuration parameter. I also like the module-based approach you're taking, and having a dependency-injection system should come in very handy.

Also, at first glance, it seems the Windows support was dropped during the conversion.

So at first glance, I'd suggest the following:

  • create the experiment sub-package (good idea)
  • maybe add an annotation configurable that is used for the parameter parsing, etc.
  • create an annotation capabilities that is used for all capabilities that can be called out from within an experiment. Move test_credentials, ssh_run_command, psexec_run_command into this hierarchy. Here we would have, e.g., the remote hostname/credential configuration.
  • move helper classes such as db_storage, console and open_ai into maybe utils. They might use the configurable annotation, but not the capability annotation (yet).
  • can we make the high-level LLM that is used configurable on the command line? Having to create an experiment for each different LLM might become problematic.

Using the new @use_case and @configurable system, it is now possible to
add further use-cases to the wintermute project while having all
required configuration injected automatically.
It also formalizes some parts of the code, such as LLM capabilities,
which are now created explicitly.
Some functionality is still missing here, as the capabilities are not
yet properly passed to function-calling LLMs; this should be done in a
future release.

Proper care was taken not to change the semantics of the resulting
use-cases (such as local_privesc_linux), other than that the
configuration now works differently. Any regressions are unintentional
and should be reported.

This commit also contains various code-cleanup operations that came out
of the re-work for the new system. However, the helper methods in
particular and the contents of the privesc use-case have largely stayed
unchanged, apart from adapting them to the new surrounding
infrastructure.

@Neverbolt
Collaborator Author

The requested changes were implemented, and I have squashed them into a single, properly documented commit.

Member

@andreashappe left a comment

I am not 100% happy, but it's good enough (and better than the code before) to be used as a base for further improvements.

@andreashappe merged commit 7436a5d into ipa-lab:main Apr 6, 2024