
Conversation

@stephantul
Contributor

This PR makes tokenizers about 45% smaller on disk. Loading time and compatibility are not affected.

@stephantul stephantul requested a review from Pringled May 20, 2025 06:59
@codecov

codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines Coverage Δ
model2vec/hf_utils.py 75.00% <100.00%> (ø)
@stephantul stephantul merged commit 3ac27c6 into main May 21, 2025
6 checks passed
@stephantul stephantul deleted the feat-smaller-tokenizers branch May 21, 2025 17:45
3 participants