Conversation

@stephantul
Contributor

This PR speeds up loading by about 20 ms by not recreating the StaticModel. We now check whether quantization is needed and, if it isn't, skip the quantize function entirely. In addition, we optimize the memory usage and flow of the quantize function: previously, we would cast the embeddings even when the requested dtype was already the original embedding dtype.
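
For context, a minimal sketch of the early-exit check described above (the function and parameter names here are illustrative assumptions, not the actual model2vec code):

```python
import numpy as np

# Sketch of the "skip if nothing changes" idea, assuming a NumPy embedding matrix.
# `maybe_cast_embeddings` and `target_dtype` are hypothetical names for illustration.
def maybe_cast_embeddings(embeddings: np.ndarray, target_dtype: np.dtype | None) -> np.ndarray:
    if target_dtype is None or embeddings.dtype == target_dtype:
        # Nothing to do: return the original array and avoid an unnecessary copy.
        return embeddings
    # Only allocate a new array when the dtype actually changes.
    return embeddings.astype(target_dtype)
```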

@stephantul stephantul requested a review from Pringled September 11, 2025 08:46
@codecov

codecov bot commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

| Files with missing lines | Coverage Δ |
|---|---|
| model2vec/model.py | 95.29% <100.00%> (+0.05%) ⬆️ |
| model2vec/quantization.py | 97.14% <100.00%> (+0.36%) ⬆️ |
Member

@Pringled Pringled left a comment

Nice

@stephantul stephantul merged commit 5a8578d into main Sep 11, 2025
6 checks passed
@stephantul stephantul deleted the quantize-loading-fix branch September 11, 2025 09:44
