Torch dtype #314
Merged
Conversation
rolandtannous added a commit to rolandtannous/unsloth that referenced this pull request on Oct 9, 2025
danielhanchen pushed a commit that referenced this pull request on Nov 20, 2025
* make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314
* fix model loading and clean merged model directory
* revert default quant
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* revert mapper.py

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
danielhanchen added a commit that referenced this pull request on Nov 25, 2025
* Enable FP8 + RL training for bf16 models (#3440)

**Summary:** Enable FP8 + RL training using TorchAO for 1.33x faster training and 42% less model memory usage:

- We quantize the frozen base model weights into fp8 and keep the LoRA adapters in bf16
- We leverage TorchAO's `Float8Tensor`, which calls into fbgemm's fp8 x fp8 rowwise matmul kernel
- For now, we need to do an offline quantization first, because vllm doesn't support on-the-fly quantization for torchao yet (this is in progress: vllm-project/vllm#26327)

**Example usage:**

```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit   = False,
    fast_inference = True,
    max_lora_rank  = 32,
    load_in_fp8    = True,  # set this to True
)

# the rest is the same as before
model = FastLanguageModel.get_peft_model(...)
```

**Initial results:**

```
# fp8
{'train_runtime': 1725.4337, 'train_samples_per_second': 0.232, 'train_steps_per_second': 0.058, 'train_loss': 0.00015715716748673002, 'epoch': 0.01}

# bf16
{'train_runtime': 2297.8145, 'train_samples_per_second': 0.174, 'train_steps_per_second': 0.044, 'train_loss': 0.00016081033063528594, 'epoch': 0.01}
```

Screenshot: https://github.com/user-attachments/assets/b6304afd-89e9-42b1-8064-775807e17b23

Test script: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423

**Requires:**
- pytorch/ao#3158 (torchao nightly or 0.15.0+)
- unslothai/unsloth-zoo#351

* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Update utils.py
* _get_inference_mode_context_manager
* Update utils.py
* Update utils.py
* Update __init__.py

* Fix/save torchao model loading logic (#3621)
  - make loading gpt-oss-BF16 faster. Linked to unsloth-zoo PR #314
  - fix model loading and clean merged model directory
  - revert default quant
  - revert mapper.py

* Update loader_utils.py
* Update loader_utils.py

* Add 128x128 PerBlock FP8 + RL (#3629)

**Summary:** Following #3440, this PR extends torchao FP8 + RL support to also handle 128x128 PerBlock granularity (in addition to PerRow).

**Example usage:**

```
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/Qwen3-8B-Base",
    max_seq_length = 2048,
    load_in_4bit   = False,
    fast_inference = True,
    max_lora_rank  = 32,
    load_in_fp8    = "block",  # or "row" or True
)
```

**Initial results:** TBD

**Note:**
- Requires pytorch/ao#3370

* Version
* Update vision.py
* Update rl.py
* Add torch 2.9.1
* Fix auto installer
* Update fp8.py
* Float8
* Update fp8.py
* Update mapper.py
* Update mapper.py
* Update loader_utils.py
* Update loader.py
* Update fp8.py
* Versioning

---------

Co-authored-by: andrewor14 <andrewor14@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
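As background for what `load_in_fp8` does, the commits above describe quantizing the frozen base weights offline with torchao while keeping the LoRA adapters in bf16. The sketch below shows that offline quantization step using torchao's public API; it is a minimal illustration under stated assumptions, not the code path Unsloth itself uses. The choice of `Float8DynamicActivationFloat8WeightConfig` with `PerRow` granularity is an assumption inferred from the rowwise fp8 matmul mentioned above, and it presumes a recent torchao (0.15.0+ per the commit).

```python
# Minimal sketch of the offline fp8 quantization step described in the commit
# message above. Assumptions (not confirmed by this PR): the base weights are
# quantized with torchao's Float8DynamicActivationFloat8WeightConfig using
# per-row scales; Unsloth's actual load_in_fp8 implementation may differ.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import (
    quantize_,
    Float8DynamicActivationFloat8WeightConfig,
    PerRow,
)

# Load the base model in bf16 (model name taken from the example above).
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3-8B-Base",
    torch_dtype = torch.bfloat16,
    device_map  = "cuda",
)

# Quantize the frozen base weights to fp8 with rowwise scales. The affected
# linear layers then hold torchao Float8Tensor weights, which dispatch to an
# fp8 x fp8 rowwise matmul; LoRA adapters added afterwards stay in bf16.
quantize_(model, Float8DynamicActivationFloat8WeightConfig(granularity=PerRow()))
```

For the 128x128 PerBlock variant added in #3629, the granularity argument would change accordingly; the exact torchao config for that path depends on pytorch/ao#3370 and is not spelled out in this PR.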