add FastSentenceTransformer for easily finetuning SentenceTransformer models #3719
Conversation
just getting this started, i'll get to the docs (and any suggestions) tomorrow
Code Review
This pull request introduces FastSentenceTransformer, a new class for easily finetuning SentenceTransformer models with Unsloth's optimizations. The implementation is solid, providing from_pretrained and get_peft_model methods that correctly integrate with the existing FastModel framework. The code handles loading quantized models, applying PEFT, and constructing a SentenceTransformer object. My review includes a couple of suggestions to improve code style and maintainability, such as moving imports to the top level and refactoring a conditional block to be more concise.
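A minimal usage sketch of the API the review describes, assuming it mirrors Unsloth's existing FastModel interface; the model name and LoRA parameters here are illustrative, not taken from the PR.

```python
from unsloth import FastSentenceTransformer

# Load an encoder through Unsloth's optimized path (the PR also handles
# quantized loading, e.g. via a load_in_4bit-style flag).
model = FastSentenceTransformer.from_pretrained(
    "sentence-transformers/all-mpnet-base-v2",
)

# Attach LoRA adapters; r and lora_alpha are illustrative values.
model = FastSentenceTransformer.get_peft_model(model, r = 16, lora_alpha = 16)

# Per the review, the result is a regular SentenceTransformer object.
embeddings = model.encode(["Unsloth makes finetuning fast."])
```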
Thank you, this is amazing!! Please let us know if you'd like to collab on a blog as well :)
absolutely, that would be great! unslothai/unsloth-zoo#383 will add XLMRobertaModel support
… a fix for XLMRobertaModel
here is a colab notebook to test the current code: https://colab.research.google.com/drive/1Wu-lB33o8JdeKT1R38uGLbd0yCqluYML?usp=sharing
Ok, I tested your notebook and it seems to work fine. Imma review it in the morning
Datta0 left a comment:
Great work! I have some queries and comments :)
edited: Dec 20 (removed a few incompatible types from the list, like model2vec)

here's the current compatibility status: i tested training the top 100 encoder embedding models (by download count), and 87 out of 100 can be trained right now: https://0x0.st/Ps1q.txt

training, inference, and saving are working on all supported models, with an average memory reduction of around 85%. i haven't yet figured out a generic way to switch all supported models from training mode to inference mode in memory.
… + SDPA

The original implementation was 31% slower than naive SentenceTransformer due to conflicting decorators from Unsloth's auto-compiler (@torch.compile on attention modules but @torch.compiler.disable on sub-modules).

Changes:
- Add fast encoder path that bypasses Unsloth patching for encoder models
- Use native torch.compile with mode="reduce-overhead" for 6x speedup
- Auto-detect and enable SDPA for models that support it (BERT, RoBERTa, etc.)
- Change defaults: load_in_16bit=True, load_in_4bit=False (16-bit is optimal)
- Change default: use_gradient_checkpointing=False (conflicts with torch.compile)
- Add UNSLOTH_COMPILE_DISABLE=1 env var to fall back to the old path if needed

Supported encoder types: mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra

Benchmark results (BS=32, seq_len=128):
- Naive 16-bit LoRA: 13-50ms per iter
- Unsloth 16-bit LoRA: 2-9ms per iter (5.4x-6.7x faster)
- Memory usage: 61MB-1.3GB (even the largest model fits easily)

Note: 4-bit + torch.compile hits a PyTorch bug (pytorch/pytorch#90665). 4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead, so 16-bit is recommended for these small encoder models anyway.
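A hedged sketch of the fallback logic this commit describes; the helper name is hypothetical, and the real implementation lives inside Unsloth's loader.

```python
import os
import torch

def maybe_compile_encoder(encoder: torch.nn.Module) -> torch.nn.Module:
    # UNSLOTH_COMPILE_DISABLE=1 falls back to the old (uncompiled) path.
    if os.environ.get("UNSLOTH_COMPILE_DISABLE") == "1":
        return encoder
    # "reduce-overhead" uses CUDA Graphs to cut per-iteration launch
    # overhead, which dominates for small encoder models at this scale.
    return torch.compile(encoder, mode = "reduce-overhead")
```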
Added performance fix for FastSentenceTransformer. The original implementation was 31% slower than naive SentenceTransformer due to conflicting decorators from Unsloth's auto-compiler. Fixed by adding a fast encoder path that uses native torch.compile + SDPA.

Changes:
- fast encoder path that bypasses Unsloth patching for encoder models
- native torch.compile with mode="reduce-overhead"
- auto-detected SDPA for models that support it
- new defaults: load_in_16bit=True, load_in_4bit=False, use_gradient_checkpointing=False

Benchmark results (BS=32, seq_len=128):
- Naive 16-bit LoRA: 13-50ms per iter
- Unsloth 16-bit LoRA: 2-9ms per iter (5.4x-6.7x faster)

Memory usage (16-bit LoRA + torch.compile): 61MB-1.3GB, so even the largest supported model fits easily.

Note: 4-bit + torch.compile hits a PyTorch bug (pytorch/pytorch#90665). 4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead, so 16-bit is recommended for these small encoder models.
Changed from peft.prepare_model_for_kbit_training to unsloth.models._utils.prepare_model_for_kbit_training. Unsloth's version provides:
- Float32 mixed precision upcasting for LoRA layers
- Better numerical stability
- Consistency with the rest of the Unsloth codebase
- Changed absolute import to relative: from ._utils import prepare_model_for_kbit_training
- Added SUPPORTS_BFLOAT16 import for proper dtype detection
- Handle devices that don't support bfloat16 by falling back to float16
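A minimal sketch of the dtype fallback this commit adds, using a plain PyTorch check as a stand-in for Unsloth's SUPPORTS_BFLOAT16 flag.

```python
import torch

# Prefer bfloat16, but fall back to float16 on devices without bf16 support.
dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
```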
…nalysis

Changes:
- Change default compile_mode from "reduce-overhead" to "default", since CUDA Graphs (used by reduce-overhead) is incompatible with PEFT/LoRA
- Add _estimate_compile_threshold() to calculate the minimum steps needed for torch.compile to be beneficial, based on model parameter count
- Add _apply_torch_compile() helper with a workaround for the accelerate unwrap_model bug
- Defer torch.compile application to trainer initialization time so we can check max_steps against the breakeven threshold
- Patch SentenceTransformerTrainer to auto-apply compile when max_steps exceeds the calculated threshold

Breakeven thresholds (with 1.2x safety margin):
- 22M params (MiniLM): ~1388 steps
- 110M params (mpnet): ~242 steps
- 335M params (snowflake): ~203 steps

This ensures the torch.compile warmup cost is only paid when training is long enough to benefit from the speedup.
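An illustrative breakeven estimate, not the PR's exact formula: compiling pays off once the per-step saving accumulated over the run exceeds the one-time warmup cost.

```python
def estimate_compile_threshold(
    warmup_seconds: float,       # one-time torch.compile warmup (~15-40 s per the PR)
    per_step_saving_s: float,    # seconds saved per training step once compiled
    safety_margin: float = 1.2,  # safety margin used in the PR
) -> int:
    # Minimum number of training steps for compilation to be worth it.
    return int(warmup_seconds / per_step_saving_s * safety_margin)

# e.g. 30 s warmup and 25 ms saved per step gives ~1440 steps, in the
# ballpark of the PR's ~1388-step threshold for a 22M-param MiniLM.
print(estimate_compile_threshold(30.0, 0.025))
```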
Added an auto-compile feature that intelligently decides whether to apply torch.compile.

Changes:
- default compile_mode changed from "reduce-overhead" to "default" (CUDA Graphs is incompatible with PEFT/LoRA)
- _estimate_compile_threshold() computes the breakeven step count from the model's parameter count
- torch.compile application is deferred to trainer initialization time

How it works: when creating a trainer:

trainer = SentenceTransformerTrainer(model=model, args=args, ...)

the patched trainer checks max_steps against the breakeven threshold and only applies torch.compile when the run is long enough.

This ensures the torch.compile warmup cost (~15-40 seconds) is only paid when training is long enough to benefit from the 1.5-2x speedup.
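A hedged sketch of the trainer patch described above; Unsloth's actual hook and threshold computation differ in detail, and compile_threshold here is a hypothetical stand-in for _estimate_compile_threshold().

```python
import torch
from sentence_transformers import SentenceTransformerTrainer

_original_init = SentenceTransformerTrainer.__init__

def _patched_init(self, *args, compile_threshold: int = 250, **kwargs):
    _original_init(self, *args, **kwargs)
    # Only pay the compile warmup cost when the run can amortize it.
    if self.args.max_steps >= compile_threshold:
        self.model = torch.compile(self.model, mode = "default")

SentenceTransformerTrainer.__init__ = _patched_init
```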
/gemini review
supersedes #3718
example training code:
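The example code itself is not reproduced in this excerpt; what follows is a hedged reconstruction using the standard SentenceTransformers v3 training API, with the dataset, hyperparameters, and get_peft_model arguments as assumptions.

```python
# Import unsloth first so its patches apply before other libraries load.
from unsloth import FastSentenceTransformer

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Load and LoRA-ify the encoder via Unsloth's fast path.
model = FastSentenceTransformer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = FastSentenceTransformer.get_peft_model(model, r = 16, lora_alpha = 16)

# A small (anchor, positive) pair dataset for a contrastive loss.
dataset = load_dataset("sentence-transformers/all-nli", "pair", split = "train[:1000]")
loss = MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model = model,
    args = SentenceTransformerTrainingArguments(output_dir = "outputs", max_steps = 100),
    train_dataset = dataset,
    loss = loss,
)
trainer.train()
```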