
Conversation

@electroglyph
Contributor

electroglyph commented Dec 12, 2025

Supersedes #3718.

Example training code:

from unsloth import FastSentenceTransformer
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from datasets import Dataset
import torch

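# Load the base embedding model with Unsloth's optimized 4-bit loader.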
model_name = "Snowflake/snowflake-arctic-embed-m-v1.5"
model = FastSentenceTransformer.from_pretrained(
    model_name,
    load_in_4bit=True,
    device_map="cuda",
)

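# Attach LoRA adapters to the underlying transformer for memory-efficient finetuning.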
model = FastSentenceTransformer.get_peft_model(
    model,
    r=16,
    target_modules=["query", "key", "value", "dense"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    # task_type="FEATURE_EXTRACTION",
)

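# Tiny toy dataset of sentence pairs with float similarity labels.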
train_dataset = Dataset.from_dict(
    {
        "sentence_A": [
            "The cat sits outside",
            "A man is playing guitar",
            "I love pasta",
        ],
        "sentence_B": [
            "A man is playing guitar",
            "The woman loves that cat",
            "Do you like pizza?",
        ],
        "label": [0.0, 0.5, 1.0],
    }
)

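# CoSENTLoss trains the model so that pairs with higher labels get higher cosine similarity.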
from sentence_transformers.losses import CoSENTLoss

loss = CoSENTLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="test_trainer_output",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=1,
    save_strategy="no",
    report_to="none",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)

trainer.train()

print("Training finished successfully!")
@gemini-code-assist
Contributor

Summary of Changes

Hello @electroglyph, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands Unsloth's capabilities by introducing FastSentenceTransformer, a new class designed to streamline the finetuning of SentenceTransformer models. By integrating Unsloth's optimized model loading and PEFT (LoRA) application, users can now efficiently train embedding models with reduced memory footprint and improved speed, broadening the range of models that can benefit from Unsloth's performance enhancements.

Highlights

  • New FastSentenceTransformer Class: Introduces FastSentenceTransformer to enable easy and optimized finetuning of SentenceTransformer models within the Unsloth framework.
  • Optimized Model Loading and PEFT Integration: Leverages Unsloth's FastModel for efficient loading of base models and seamlessly integrates PEFT (LoRA) for memory-efficient finetuning of the underlying transformer.
  • Automatic Pooling Mode Detection: Includes logic to automatically detect the pooling mode (e.g., 'cls', 'mean', 'max') from the SentenceTransformer model's configuration, ensuring correct setup.
  • Seamless SentenceTransformer Wrapper: Wraps the loaded base model into a sentence_transformers.SentenceTransformer instance, complete with Transformer, Pooling, and Normalize modules, making it compatible with the sentence-transformers ecosystem.
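As a rough illustration of the last two highlights, wrapping a base encoder into a SentenceTransformer with explicit modules might look like the following sketch (not the PR's code; the pooling mode is hard-coded here, whereas the PR auto-detects it from the model's configuration):

from sentence_transformers import SentenceTransformer, models

# Build the wrapped model from explicit Transformer / Pooling / Normalize modules.
base = models.Transformer("Snowflake/snowflake-arctic-embed-m-v1.5")
# The PR auto-detects the pooling mode from the model's configuration;
# "cls" is assumed here purely for illustration.
pooling = models.Pooling(base.get_word_embedding_dimension(), pooling_mode="cls")
normalize = models.Normalize()
wrapped = SentenceTransformer(modules=[base, pooling, normalize])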

@electroglyph
Contributor Author

Just getting this started; I'll get to the docs (and any suggestions) tomorrow.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request introduces FastSentenceTransformer, a new class for easily finetuning SentenceTransformer models with Unsloth's optimizations. The implementation is solid, providing from_pretrained and get_peft_model methods that correctly integrate with the existing FastModel framework. The code handles loading quantized models, applying PEFT, and constructing a SentenceTransformer object. My review includes a couple of suggestions to improve code style and maintainability, such as moving imports to the top level and refactoring a conditional block to be more concise.

@shimmyshimmer
Collaborator

Thank you, amazing!! Please let us know if you'd like to collab on a blog as well :)

@electroglyph
Contributor Author

electroglyph commented Dec 13, 2025

Thank you, amazing!! Please let us know if you'd like to collab on a blog as well :)

Absolutely, that would be great!

unslothai/unsloth-zoo#383 will add XLMRobertaModel support

@Datta0
Collaborator

Datta0 commented Dec 15, 2025

Ok, I tested your notebook and it seems to work fine. I'll review it in the morning.

Collaborator

@Datta0 left a comment


Great work! I have some queries and comments :)

@electroglyph marked this pull request as ready for review December 16, 2025 11:08

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@electroglyph
Contributor Author

electroglyph commented Dec 17, 2025

Edited Dec 20: removed a few incompatible types (like model2vec) from the list.

Here's the current compatibility status. I tested training the top 100 encoder embedding models (by download count), and 87 out of 100 can be trained right now: https://0x0.st/Ps1q.txt

Training, inference, and saving work on all supported models, with an average memory reduction of around 85%.
from_pretrained() now takes a for_inference bool to load models in inference mode.

I haven't yet figured out a generic way to switch all supported models from training mode to inference mode in memory.
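
For example (a minimal sketch of the for_inference flag mentioned above; the model name and other arguments simply mirror the training example, and encode() is the standard sentence-transformers API on the returned model):

from unsloth import FastSentenceTransformer

# Load in inference mode rather than training mode, per the for_inference flag above.
model = FastSentenceTransformer.from_pretrained(
    "Snowflake/snowflake-arctic-embed-m-v1.5",
    load_in_4bit=True,
    for_inference=True,
)

embeddings = model.encode(["The cat sits outside", "A man is playing guitar"])
print(embeddings.shape)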

… + SDPA

The original implementation was 31% slower than naive SentenceTransformer due to
conflicting decorators from Unsloth's auto-compiler (@torch.compile on attention
modules but @torch.compiler.disable on sub-modules).

Changes:
- Add fast encoder path that bypasses Unsloth patching for encoder models
- Use native torch.compile with mode="reduce-overhead" for 6x speedup
- Auto-detect and enable SDPA for models that support it (BERT, RoBERTa, etc.)
- Change defaults: load_in_16bit=True, load_in_4bit=False (16-bit is optimal)
- Change default: use_gradient_checkpointing=False (conflicts with torch.compile)
- Add UNSLOTH_COMPILE_DISABLE=1 env var to fall back to old path if needed

Supported encoder types: mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra

Benchmark results (BS=32, seq_len=128):
- Naive 16-bit LoRA:     13-50ms per iter
- Unsloth 16-bit LoRA:   2-9ms per iter (5.4x-6.7x faster)
- Memory usage:          61MB-1.3GB (even largest model fits easily)

Note: 4-bit + torch.compile has a PyTorch bug (pytorch/pytorch#90665).
4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead,
so 16-bit is recommended for these small encoder models anyway.
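
Roughly, the fast path amounts to loading the encoder with SDPA attention and compiling it natively; the sketch below uses plain sentence-transformers and PyTorch APIs rather than the PR's internal wiring:

import torch
from sentence_transformers import SentenceTransformer

# Escape hatch from the commit message: set UNSLOTH_COMPILE_DISABLE=1 in the
# environment to fall back to the old Unsloth-patched path instead of this one.

# Illustrative equivalent of the fast encoder path: SDPA attention + native torch.compile.
st = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"attn_implementation": "sdpa", "torch_dtype": torch.float16},
    device="cuda",
)
st[0].auto_model = torch.compile(st[0].auto_model, mode="reduce-overhead")
embeddings = st.encode(["The cat sits outside", "A man is playing guitar"])
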
@danielhanchen
Contributor

Added performance fix for FastSentenceTransformer.

The original implementation was 31% slower than naive SentenceTransformer due to conflicting decorators from Unsloth's auto-compiler. Fixed by adding a fast encoder path that uses native torch.compile + SDPA.

Changes:

  • Fast encoder path bypasses Unsloth patching for encoder models (mpnet, bert, distilbert, roberta, xlm-roberta, albert, electra)
  • Uses torch.compile with mode="reduce-overhead" for 6x speedup
  • Auto-detects and enables SDPA for models that support it
  • Changed defaults: load_in_16bit=True, load_in_4bit=False, use_gradient_checkpointing=False
  • Added UNSLOTH_COMPILE_DISABLE=1 env var to fall back to old path if needed

Benchmark results (BS=32, seq_len=128):

Model        Naive      Unsloth    Speedup
MiniLM-L6    13.69ms    2.24ms     6.10x
MPNet-base   17.51ms    2.62ms     6.69x
BGE-large    49.48ms    9.15ms     5.41x

Memory usage (16-bit LoRA + torch.compile):

Model        BS=8      BS=32     BS=128
MiniLM-L6    61 MB     77 MB     143 MB
MPNet-base   237 MB    273 MB    417 MB
BGE-large    717 MB    837 MB    1320 MB

Note: 4-bit + torch.compile has a PyTorch bug (pytorch/pytorch#90665). 4-bit is also 1.7-1.9x slower than 16-bit due to dequantization overhead, so 16-bit is recommended for these small encoder models.
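
With the new defaults, a plain from_pretrained() call loads in 16-bit; a minimal sketch, assuming the load_in_16bit / load_in_4bit keyword names from the changes above:

from unsloth import FastSentenceTransformer

# New defaults: 16-bit load, no 4-bit quantization, no gradient checkpointing.
model = FastSentenceTransformer.from_pretrained("sentence-transformers/all-mpnet-base-v2")

# 4-bit can still be requested explicitly, but per the note above it is slower for these
# small encoders and currently hits a torch.compile bug (pytorch/pytorch#90665).
model_4bit = FastSentenceTransformer.from_pretrained(
    "sentence-transformers/all-mpnet-base-v2",
    load_in_16bit=False,
    load_in_4bit=True,
)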

pre-commit-ci bot and others added 3 commits January 5, 2026 06:56
Changed from peft.prepare_model_for_kbit_training to
unsloth.models._utils.prepare_model_for_kbit_training.

Unsloth's version provides:
- Float32 mixed precision upcasting for LoRA layers
- Better numerical stability
- Consistency with the rest of the Unsloth codebase
- Changed absolute import to relative: from ._utils import prepare_model_for_kbit_training
- Added SUPPORTS_BFLOAT16 import for proper dtype detection
- Handle devices that don't support bfloat16 by falling back to float16
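
The bfloat16 fallback in the last point boils down to a small check; a sketch using plain PyTorch (Unsloth imports SUPPORTS_BFLOAT16 from its own utilities, so this is only illustrative):

import torch

# Prefer bfloat16 when the current GPU supports it, otherwise fall back to float16.
SUPPORTS_BFLOAT16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if SUPPORTS_BFLOAT16 else torch.float16
print(f"Using dtype: {dtype}")
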
pre-commit-ci bot and others added 6 commits January 5, 2026 10:23
…nalysis

Changes:
- Change default compile_mode from "reduce-overhead" to "default" since CUDA
  Graphs (used by reduce-overhead) is incompatible with PEFT/LoRA
- Add _estimate_compile_threshold() to calculate minimum steps needed for
  torch.compile to be beneficial based on model parameter count
- Add _apply_torch_compile() helper with accelerate unwrap_model bug workaround
- Defer torch.compile application to trainer initialization time so we can
  check max_steps against the breakeven threshold
- Patch SentenceTransformerTrainer to auto-apply compile when max_steps
  exceeds the calculated threshold

Breakeven thresholds (with 1.2x safety margin):
- 22M params (MiniLM): ~1388 steps
- 110M params (mpnet): ~242 steps
- 335M params (snowflake): ~203 steps

This ensures torch.compile warmup cost is only paid when training is long
enough to benefit from the speedup.
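
The breakeven idea can be sketched as follows (a hypothetical helper, not the PR's _estimate_compile_threshold, which derives its inputs from the parameter count rather than taking the costs directly):

import math

def estimate_compile_threshold(warmup_seconds: float,
                               per_step_saving_seconds: float,
                               safety_margin: float = 1.2) -> int:
    # Compile only pays off once the accumulated per-step savings exceed the one-time
    # warmup cost; the safety margin biases the decision toward skipping compile.
    return math.ceil(safety_margin * warmup_seconds / per_step_saving_seconds)

# e.g. a 30 s warmup and ~25 ms saved per step break even after ~1440 steps.
print(estimate_compile_threshold(30.0, 0.025))
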
@danielhanchen
Contributor

Added auto-compile feature that intelligently decides whether to apply torch.compile based on a breakeven analysis.

Changes

  1. Fixed CUDA Graphs incompatibility: Changed default compile mode from reduce-overhead to default. The reduce-overhead mode uses CUDA Graphs which causes errors with PEFT/LoRA:

    RuntimeError: accessing tensor output of CUDAGraphs that has been overwritten
    
  2. Added breakeven estimation: New _estimate_compile_threshold() calculates the minimum training steps needed for torch.compile warmup to pay off, based on model parameter count:

    • 22M params (MiniLM): ~1388 steps threshold
    • 110M params (mpnet): ~242 steps threshold
    • 335M params (snowflake): ~203 steps threshold
  3. Deferred compilation: torch.compile is now applied at trainer initialization time (not model creation time), allowing us to check max_steps against the threshold.

  4. Added accelerate workaround: Fixed the unwrap_model bug by setting model.__dict__["_orig_mod"] = model after compilation.

How it works

When creating a trainer:

trainer = SentenceTransformerTrainer(model=model, args=args, ...)

The patched trainer checks:

  • If max_steps >= threshold: applies torch.compile (prints "Auto-compiling model")
  • If max_steps < threshold: skips compile (prints "Skipping torch.compile")

This ensures torch.compile warmup cost (~15-40 seconds) is only paid when training is long enough to benefit from the 1.5-2x speedup.
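
The decision itself reduces to a simple comparison; a hypothetical sketch of what the patched trainer does internally (the module and numbers below are stand-ins for illustration):

import torch
import torch.nn as nn

def should_compile(max_steps: int, breakeven_steps: int) -> bool:
    # Compile only when the run is long enough to amortize the warmup cost.
    return max_steps >= breakeven_steps

model = nn.Linear(8, 8)                # stand-in module for illustration
max_steps, breakeven_steps = 500, 242  # e.g. an mpnet-sized model per the thresholds above

if should_compile(max_steps, breakeven_steps):
    print("Auto-compiling model")
    model = torch.compile(model, mode="default")  # "default" mode avoids the CUDA Graphs issue
else:
    print("Skipping torch.compile")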

@danielhanchen
Contributor

/gemini review

@danielhanchen merged commit 5011442 into unslothai:main Jan 22, 2026
1 check passed
