@yangjianxin1
Contributor

This PR adds support for Qwen2, which is important for the open-source community. Our repo Firefly already supports training Qwen2 with Unsloth; more experiment details can be found in our model card.

We measured the training gain on Qwen1.5-7B by training it with QLoRA for 20 steps on a single V100, with and without Unsloth. The results are listed below. In the largest configuration (max_seq_length 2048, per-device batch size 4), Unsloth reduces GPU memory by 39.13% and training time by 32.12%, which corresponds to a 47.32% increase in training speed.

| max_seq_length | per_device_train_batch_size | gradient_accumulation_steps | use_unsloth | rank | GPU memory | Training time |
|---|---|---|---|---|---|---|
| 1024 | 1 | 16 | false | 8 | 13.72GB | 448s |
| 1024 | 1 | 16 | true | 8 | 8.43GB (-38.56%) | 308s (-31.25%) |
| 1024 | 1 | 16 | false | 64 | 16.01GB | 452s |
| 1024 | 1 | 16 | true | 64 | 11.07GB (-30.86%) | 311s (-31.19%) |
| 2048 | 1 | 16 | false | 64 | 18.55GB | 840s |
| 2048 | 1 | 16 | true | 64 | 12.99GB (-29.97%) | 596s (-29.05%) |
| 1024 | 4 | 4 | false | 64 | 24.70GB | 357s |
| 1024 | 4 | 4 | true | 64 | 14.36GB (-41.86%) | 253s (-29.13%) |
| 2048 | 4 | 4 | false | 64 | 32.51GB | 741s |
| 2048 | 4 | 4 | true | 64 | 19.79GB (-39.13%) | 503s (-32.12%) |
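
For readers who want to check how the percentages relate to the raw measurements, here is a minimal sketch that recomputes them from the last pair of rows. The helper names `percent_reduction` and `percent_speedup` are illustrative and not part of Firefly or Unsloth; the speed figure assumes the same amount of work (20 steps) in both runs.

```python
# Figures copied from the last two table rows
# (max_seq_length=2048, per-device batch size 4, rank 64).
baseline = {"memory_gb": 32.51, "time_s": 741}
unsloth = {"memory_gb": 19.79, "time_s": 503}


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


def percent_speedup(time_before: float, time_after: float) -> float:
    """Relative throughput gain for a fixed amount of work, in percent."""
    return (time_before / time_after - 1) * 100


memory_saved = percent_reduction(baseline["memory_gb"], unsloth["memory_gb"])
time_saved = percent_reduction(baseline["time_s"], unsloth["time_s"])
speedup = percent_speedup(baseline["time_s"], unsloth["time_s"])

print(f"memory: -{memory_saved:.2f}%  time: -{time_saved:.2f}%  speed: +{speedup:.2f}%")
```

A 32.12% cut in time is the same measurement as a 47.32% gain in speed: the first is relative to the baseline time, the second to the reduced time.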

We also evaluated our SFT and DPO models trained with Unsloth on the Open LLM Leaderboard. Both outperform the official Qwen1.5-7B-Chat on average score (62.65 and 61.81 vs. 55.15).

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| firefly-gemma-7b | 62.93 | 62.12 | 79.77 | 61.57 | 49.41 | 75.45 | 49.28 |
| firefly-qwen1.5-en-7b-dpo-v0.1-unsloth | 62.65 | 56.14 | 75.5 | 60.87 | 58.09 | 70.72 | 54.59 |
| zephyr-7b-beta | 61.95 | 62.03 | 84.36 | 61.07 | 57.45 | 77.74 | 29.04 |
| firefly-qwen1.5-en-7b-unsloth | 61.81 | 54.27 | 76.22 | 61.55 | 50.62 | 70.48 | 57.7 |
| vicuna-13b-v1.5 | 55.41 | 57.08 | 81.24 | 56.67 | 51.51 | 74.66 | 11.3 |
| Xwin-LM-13B-V0.1 | 55.29 | 62.54 | 82.8 | 56.53 | 45.96 | 74.27 | 9.63 |
| Qwen1.5-7B-Chat | 55.15 | 55.89 | 78.56 | 61.65 | 53.54 | 67.72 | 13.57 |
| gemma-7b-it | 53.56 | 51.45 | 71.96 | 53.52 | 47.29 | 67.96 | 29.19 |

@danielhanchen
Contributor

@yangjianxin1 Oh wait does Qwen2 not have that weird alternating sliding window & normal attention thingo?

@yangjianxin1
Contributor Author

Right, Qwen2 does not alternate between sliding window and normal attention; use_sliding_window is false in its config.json.
I have also compared the Llama and Qwen2 code almost line by line, and they are very similar.
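
The config check mentioned above could be sketched roughly as follows. The JSON string is an illustrative fragment, not the full config.json that ships with the model, and `uses_sliding_window` is a hypothetical helper; treating a missing key as disabled is also an assumption made for this sketch.

```python
import json

# Illustrative fragment only, not the full config.json shipped with the model.
config_text = '{"model_type": "qwen2", "use_sliding_window": false}'


def uses_sliding_window(config: dict) -> bool:
    """Return True only if the config explicitly enables sliding window attention."""
    # Assumption for this sketch: an absent key is treated as "disabled".
    return bool(config.get("use_sliding_window", False))


config = json.loads(config_text)
print(uses_sliding_window(config))
```

For a real checkpoint, the same helper could be applied to the dictionary loaded from the model's own config.json.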

@danielhanchen danielhanchen changed the base branch from main to nightly May 10, 2024 16:57
@danielhanchen
Contributor

Thanks for the PR again! I streamlined Qwen2 to call FastMistralModel (since I think it's an exact replica right?)
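
As a rough illustration of what "Qwen2 calls FastMistralModel" means in practice, the toy routing table below sends two model types to one shared implementation. This is not Unsloth's actual code: `FAST_MODEL_FOR_TYPE` and `resolve_fast_model` are hypothetical names, and only the implementation name `FastMistralModel` comes from the comment above.

```python
# Hypothetical routing table: maps a config's model_type to the name of the
# implementation that handles it. A toy sketch, not Unsloth's real dispatch.
FAST_MODEL_FOR_TYPE = {
    "mistral": "FastMistralModel",
    "qwen2": "FastMistralModel",  # reuses the Mistral path instead of a separate copy
}


def resolve_fast_model(model_type: str) -> str:
    """Look up the implementation name for a model_type, failing loudly if unknown."""
    try:
        return FAST_MODEL_FOR_TYPE[model_type]
    except KeyError:
        raise NotImplementedError(f"No fast path registered for {model_type!r}") from None


print(resolve_fast_model("qwen2"))
```

Routing both types to one implementation keeps a single code path to maintain, which only works if the two architectures really do match where it matters.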

@danielhanchen danielhanchen merged commit cf83fe3 into unslothai:nightly May 10, 2024
@danielhanchen danielhanchen mentioned this pull request May 10, 2024
danielhanchen added a commit that referenced this pull request May 12, 2024
* Fix prompt

* Update chat_templates.py

* fix_untrained_tokens

* Update llama.py

* add tokens

* Update _utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* pad_token

* Update chat_templates.py

* Update chat_templates.py

* tokenizer

* Update save.py

* Update chat_templates.py

* Update chat_templates.py

* patch tokenizer padding

* Update tokenizer_utils.py

* Update save.py

* Fix: loading models with resized vocabulary (#377)

* new: vocab resize on load

* new: gitignore

* GGUF fix

* Readme (#390)

* Update README.md

* Update README.md

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>

* Update README.md

* Delete .gitignore

* Phi-3

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix reserved tokens

* Update save.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update chat_templates.py

* Update save.py

* Update _utils.py

* Update chat_templates.py

* Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415)

* Adds dependencies and extras for torch 2.3.0 with new xformers versions

* Add 2.3.0 section to readme

* Support Qwen2 (#428)

* support Qwen2

* support Qwen2

* Delete README.md

* Revert "Delete README.md"

This reverts commit 026b05f.

* Update README.md

* Qwen2 == Mistral

* Update llama.py

* Update __init__.py

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update save.py

* Update _utils.py

* Update save.py

* Update save.py

* Update save.py

* test_hf_gguf_equivalence

* Update chat_templates.py

* Update chat_templates.py

* --pad-vocab

* Update tokenizer_utils.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
danielhanchen added a commit that referenced this pull request May 13, 2024
* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Unspecified max_seq_length

* possible_pad_token

* Update tokenizer_utils.py

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
danielhanchen added a commit that referenced this pull request May 16, 2024
* past_key_values

* Update llama.py

* Update llama.py

* Update llama.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* _wrap_fast_inference

* Update llama.py

* Update llama.py

* flag

---------

Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com>
Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com>
Co-authored-by: Yang JianXin <995462226@qq.com>
@NeoFii

NeoFii commented May 16, 2024

Could you please explain in detail how to fine-tune Qwen-1.5B-Chat using Unsloth? I want to fine-tune Qwen1.5-7B myself.
