feat: add vocabulary quantization #271
Conversation
This new version stores all the relevant information (i.e., mappings, weights, etc.) in the safetensors file.
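As an illustration of what storing everything in a single safetensors file could look like, here is a minimal sketch. The tensor names, shapes, and dtypes below are assumptions for illustration only; the actual keys written by model2vec may differ.

```python
import numpy as np
from safetensors.numpy import save_file, load_file

# Hypothetical sizes, for illustration only.
vocab_size, n_clusters, dim = 30_000, 256, 64

tensors = {
    # One shared embedding vector per cluster.
    "embeddings": np.random.rand(n_clusters, dim).astype(np.float32),
    # Mapping from each token id to its cluster id.
    "token_to_cluster": np.random.randint(0, n_clusters, size=vocab_size),
    # One unique weight per token.
    "weights": np.random.rand(vocab_size).astype(np.float32),
}
save_file(tensors, "model.safetensors")

# All three arrays round-trip through the single file.
restored = load_file("model.safetensors")
assert np.array_equal(restored["token_to_cluster"], tensors["token_to_cluster"])
```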
Pringled left a comment:
LGTM, I tested it on a few models and it works well. The only thing missing is some documentation; we could probably add a small snippet in https://github.com/MinishLab/model2vec/blob/main/docs/usage.md?
```python
# Store the original dtype to restore it later
orig_dtype = embeddings.dtype
# ...
kmeans = KMeans(n_clusters=n_clusters, random_state=42, init="random")
```
I think this is quite slow if the vocab is very large. Is MiniBatchKMeans an option (or an optional arg) that could help?
Maybe; we could consider that for a follow-up if you think that's interesting (a rough sketch of what it could look like is below).
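For reference, a minimal sketch of what such an optional argument could look like. The `fit_clusters` helper and the `use_minibatch` flag are hypothetical, not part of this PR; only the scikit-learn classes are real.

```python
from sklearn.cluster import KMeans, MiniBatchKMeans

def fit_clusters(embeddings, n_clusters, use_minibatch=False):
    """Cluster token embeddings, optionally with the faster mini-batch variant."""
    if use_minibatch:
        # MiniBatchKMeans updates centroids on small batches, which is much
        # faster on large vocabularies at the cost of slightly worse clusters.
        kmeans = MiniBatchKMeans(
            n_clusters=n_clusters, random_state=42, init="random", batch_size=1024
        )
    else:
        kmeans = KMeans(n_clusters=n_clusters, random_state=42, init="random")
    return kmeans.fit(embeddings)
```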
This PR adds vocabulary quantization.
The quantization itself is handled by scikit-learn, so that has been added as a dependency to `quantization`. There's a helper function to quantize already existing models, which can be imported directly; the decoupled representation it produces is sketched below.

The quantization itself necessitated a lot of internal changes. Most prominently, the embeddings and token weights are now decoupled: every token still has a unique weight, but shares its embedding with the other tokens in the same cluster.
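A minimal sketch of that decoupled representation, under the assumption that a token's vector is its cluster's shared embedding scaled by the token's own weight. The array names and the `token_vector` function are illustrative; the PR's actual internals may differ.

```python
import numpy as np

def token_vector(
    token_id: int,
    cluster_embeddings: np.ndarray,  # (n_clusters, dim): one shared row per cluster
    token_to_cluster: np.ndarray,    # (vocab_size,): token id -> cluster id
    token_weights: np.ndarray,       # (vocab_size,): one unique weight per token
) -> np.ndarray:
    """Reconstruct a token's vector from the quantized representation."""
    shared = cluster_embeddings[token_to_cluster[token_id]]
    return token_weights[token_id] * shared
```

Tokens in the same cluster thus share a single stored embedding row, while each keeps its own scalar weight, which is what makes the vocabulary quantization lossy on directions but exact on per-token magnitudes.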
Old models can still be created and loaded, so there are no breaking changes. However, quantized models can't be loaded by older versions of model2vec.