@pierlj pierlj commented Dec 6, 2023

Description

Add multi-language support to the LLM scanner: the generator model is prompted to output evaluation queries in the specified languages.

Related Issue

GSK-2152

Type of Change

  • Adds a method to the Dataset class that extracts the languages present in the text columns of a dataset.
  • Adds a language requirement to the generator prompts.
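
As a rough illustration of the two changes listed above, here is a minimal sketch. It is not the code from this PR: the actual `Dataset.extract_languages()` implementation is not shown in this conversation. The sketch assumes the dataset is a plain dict of column lists, and `toy_detect` and `add_language_requirement` are hypothetical names; the stopword-based detector only stands in for a real language detection library.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Toy stand-in for a real language detector (assumption, not the PR's code):
# it counts stopword hits per language and picks the best match.
_STOPWORDS = {
    "en": {"the", "is", "and", "of", "to"},
    "fr": {"le", "la", "est", "et", "de"},
}


def toy_detect(text: str) -> Optional[str]:
    """Return a language code for `text`, or None if no stopword matches."""
    words = set(text.lower().split())
    scores = {lang: len(words & stopwords) for lang, stopwords in _STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_languages(
    columns: Dict[str, list],
    detect: Callable[[str], Optional[str]] = toy_detect,
) -> List[str]:
    """Collect the languages found in string values, most frequent first."""
    counts: Counter = Counter()
    for values in columns.values():
        for value in values:
            if not isinstance(value, str):
                continue  # skip non-text values such as numbers
            lang = detect(value)
            if lang is not None:
                counts[lang] += 1
    return [lang for lang, _ in counts.most_common()]


def add_language_requirement(prompt: str, languages: List[str]) -> str:
    """Append a language requirement to a generator prompt (hypothetical helper)."""
    if not languages:
        return prompt
    requirement = "Write the generated inputs in the following languages: "
    return f"{prompt}\n\n{requirement}{', '.join(languages)}."


data = {
    "question": [
        "What is the capital of France",
        "Quelle est la capitale de la France",
        "The sky is blue",
    ],
    "age": [31, 42, 27],
}
languages = extract_languages(data)
prompt = add_language_requirement("Generate implausible inputs.", languages)
```

Passing the detector as a parameter keeps the extraction logic independent of whichever detection library is actually used.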
@pierlj pierlj requested a review from mattbit December 6, 2023 09:03
@mattbit mattbit changed the title GSK-2152 Add multilanguages scanner inputs Dec 6, 2023
@pierlj pierlj marked this pull request as ready for review December 7, 2023 09:10
def run(self, model: BaseModel, dataset: Dataset, features=None) -> Sequence[Issue]:
    # Generate inputs
    generator = ImplausibleDataGenerator(llm_temperature=0.1)
    languages = dataset.extract_languages()

same as above

@mattbit mattbit enabled auto-merge December 8, 2023 11:56
sonarqubecloud bot commented Dec 8, 2023

@mattbit mattbit merged commit 12f1285 into main Dec 8, 2023
@mattbit mattbit deleted the GSK-2152-scanner-mulitlanguage-input branch December 8, 2023 12:26
3 participants