
Conversation

@Pringled
Member

This PR adds the `vocabulary_quantization` and `embedding_dtype` used for quantization to the config, which makes it easier to see the precision of the model and (optionally) the number of vocabulary clusters directly from the config.

E.g.:

{
    "model_type": "model2vec",
    "architectures": [
        "StaticModel"
    ],
    "tokenizer_name": "baai/bge-base-en-v1.5",
    "apply_pca": 256,
    "apply_zipf": true,
    "hidden_dim": 256,
    "seq_length": 1000000,
    "normalize": true,
    "vocabulary_quantization": 128,
    "embedding_dtype": "float16"
}
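
For illustration, a minimal sketch of reading those fields from a saved model directory (the directory name is hypothetical; with this change the precision is visible without loading the safetensors):

    import json
    from pathlib import Path

    # Hypothetical path to a saved Model2Vec model directory.
    model_dir = Path("my-quantized-model")

    # The saved config now exposes the quantization settings directly.
    config = json.loads((model_dir / "config.json").read_text())
    print(config.get("embedding_dtype"))          # e.g. "float16"
    print(config.get("vocabulary_quantization"))  # e.g. 128, if vocabulary-quantized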
@Pringled requested a review from stephantul on September 11, 2025 10:47.
@codecov
codecov bot commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines    Coverage Δ
model2vec/hf_utils.py       80.34% <100.00%> (+0.87%) ⬆️
model2vec/model.py          95.45% <100.00%> (+0.16%) ⬆️
@stephantul
Contributor

The model can also be vocabulary-quantized during distillation. You actually don't need to add it to the config to know that the model is vocabulary-quantized.

The recipe is as follows: if the number of tokens is not equal to the number of embeddings, then the vocabulary quantization value is equal to the number of embeddings. So you can skip storing it until the model is saved and calculate it on the fly. This is why I'm hesitant to add it to the config: the stored value can only go out of date. Same with the embedding dtype, I guess? It can only become outdated.

Maybe just make both properties of the static model?
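
A minimal sketch of that recipe expressed as properties (hypothetical class and attribute names; the real StaticModel API may differ):

    import numpy as np

    class StaticModel:
        def __init__(self, embedding: np.ndarray, tokenizer) -> None:
            self.embedding = embedding  # (n_embeddings, dim) weight matrix
            self.tokenizer = tokenizer  # a tokenizers.Tokenizer

        @property
        def vocabulary_quantization(self) -> int | None:
            # The recipe above: if the token count differs from the number of
            # embeddings, the embedding count is the number of clusters.
            n_tokens = self.tokenizer.get_vocab_size()
            n_embeddings = self.embedding.shape[0]
            return n_embeddings if n_tokens != n_embeddings else None

        @property
        def embedding_dtype(self) -> str:
            # Read the dtype straight from the weights, so it can't go stale.
            return str(self.embedding.dtype)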

@Pringled
Member Author

Right, if you load a model with quantization, the stored value is outdated, true. Still, I think it's useful to know the dtype of a stored model: I was writing some tests with a couple of quantized models, and to find out their precision I had to actually load the safetensors. IMO it should be visible at least for saved models. What if we keep it in the config, also make it a property of the static model that we update after loading, and then save it directly from the static model? That way it should never be outdated.

@stephantul
Contributor

Sure, sounds good. I don't think you should ever read the config value at run time; just write it when you save.
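
A sketch of that write-on-save flow, reusing the properties above (hypothetical helper; per the Codecov report, the actual changes live in model2vec/hf_utils.py and model2vec/model.py):

    import json
    from pathlib import Path

    def save_config(model: "StaticModel", out_dir: Path, config: dict) -> None:
        # Compute the values from the live model at save time, never at load
        # time, so the stored config can't drift out of date.
        config["embedding_dtype"] = model.embedding_dtype
        if model.vocabulary_quantization is not None:
            config["vocabulary_quantization"] = model.vocabulary_quantization
        (out_dir / "config.json").write_text(json.dumps(config, indent=4))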

@Pringled merged commit 66c30a5 into main on Sep 29, 2025 (6 checks passed).
@Pringled deleted the add-quantization-type-to-config branch on September 29, 2025 19:29.