Skip to content

Conversation

@luca-rossi
Copy link
Contributor

@luca-rossi luca-rossi commented Nov 24, 2023

This PR allow users to set maximum number of samples (max_dataset_size) in loss-based scan detectors.

@linear
Copy link

linear bot commented Nov 24, 2023

GSK-1953 Allow users to set maximum number of samples in loss-based scan detectors

Currently we limit to num_features * num_samples < 10 000 000. Let’s allow users to set a max_dataset_size (or similar) parameter to override this limit if needed.

@luca-rossi luca-rossi requested a review from mattbit November 24, 2023 12:35
@luca-rossi luca-rossi marked this pull request as ready for review November 24, 2023 13:45
@luca-rossi luca-rossi requested a review from mattbit November 24, 2023 13:45
Copy link
Member

@mattbit mattbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, one small change


_needs_target = True

def __init__(self, max_dataset_size: int = MAX_DATASET_SIZE):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline this and probably divide by ~10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, also maybe it still makes sense to divide by n_features if the param is not specified?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, you could set Optional[int] = None and calculate the default at runtime if not provided

@mattbit
Copy link
Member

mattbit commented Nov 24, 2023

Broken test!

@luca-rossi luca-rossi requested a review from mattbit November 24, 2023 14:44
Copy link
Member

@mattbit mattbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but please fix conflicts

@mattbit mattbit enabled auto-merge November 24, 2023 16:04
@mattbit mattbit self-requested a review November 24, 2023 16:05
@sonarqubecloud
Copy link

@mattbit mattbit merged commit a4ec7d5 into main Nov 24, 2023
@mattbit mattbit deleted the task/GSK-1953-custom-max-data-size branch November 24, 2023 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants