Datta0 (Collaborator) commented Oct 16, 2025

This PR adds support for Qwen-3-VL + vLLM weight sharing, as we already have for Qwen-2.5-VL, Mistral3, and Gemma3.

Needed for: unslothai/unsloth#3489

danielhanchen changed the base branch from main to nightly on October 31, 2025 09:10
danielhanchen merged commit bb81b69 into unslothai:nightly on Oct 31, 2025
danielhanchen added a commit that referenced this pull request Nov 3, 2025
* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Fix Flex Attention autotuning

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* Update mxfp4.py

* Update mxfp4.py

* Update gpt_oss.py

* Update attention_sink.py

* Update patching_utils.py

* Update attention_sink.py

* Update gpt_oss.py

* prefer_nd_tiling

* Update patching_utils.py

* flex_attention_with_sink

* Compile Flex Attention

* Update mxfp4.py

* Update mxfp4.py

* Update mxfp4.py

* Update mxfp4.py

* Update gpt_oss.py

* bitsandbytes patch

* Update bitsandbytes.py

* Update gpt_oss.py

* Inplace ops

* Update gpt_oss.py

* has_static_cache

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update attention_sink.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* torch compile

* Update attention_sink.py

* Update common.py

* Update common.py

* Patches

* Compiled mask creation

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Revert

* Update gpt_oss.py

* Update gpt_oss.py

* Fix up

* Update attention_sink.py

* Update attention_sink.py

* Update utils.py

* Update attention_sink.py

* Update attention_sink.py

* Retry

* Update gpt_oss.py

* Update gpt_oss.py

* Fix Flex

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Bug fixes

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* Update rl_replacements.py

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* flash attn

* Update gpt_oss.py

* Update __init__.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* dropout_p

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* fix

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Versioning

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Fix Gemma 3

* Update misc.py

* Update rl_environments.py

* Update pyproject.toml

* Update rl_environments.py

* Update __init__.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Device type

* Update vllm_utils.py

* Update compiler.py

* Update empty_model.py

* Update vllm_utils.py

* Update empty_model.py

* Fixes

* Update empty_model.py

* Update empty_model.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update cross_entropy_loss.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update vllm_utils.py

* Qwen3 VL vLLM (#324)

* qwen3 vl additional layers

* qwen3 fused vision qkv

* refactor for handling qwen 3 vl

* [WIP] fix backward pass issues

* out hidden size change

* Qwen 2.5 and qwen 3 conv3d->Linear vLLM changes

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update pyproject.toml

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update compiler.py

* Update __init__.py

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
danielhanchen added a commit that referenced this pull request Nov 6, 2025
* Update attention_sink.py

* Update gpt_oss.py

* prefer_nd_tiling

* Update patching_utils.py

* flex_attention_with_sink

* Compile Flex Attention

* Update mxfp4.py

* Update mxfp4.py

* Update mxfp4.py

* Update mxfp4.py

* Update gpt_oss.py

* bitsandbytes patch

* Update bitsandbytes.py

* Update gpt_oss.py

* Inplace ops

* Update gpt_oss.py

* has_static_cache

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update attention_sink.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* torch compile

* Update attention_sink.py

* Update common.py

* Update common.py

* Patches

* Compiled mask creation

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Revert

* Update gpt_oss.py

* Update gpt_oss.py

* Fix up

* Update attention_sink.py

* Update attention_sink.py

* Update utils.py

* Update attention_sink.py

* Update attention_sink.py

* Retry

* Update gpt_oss.py

* Update gpt_oss.py

* Fix Flex

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Bug fixes

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* Update rl_replacements.py

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* flash attn

* Update gpt_oss.py

* Update __init__.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* dropout_p

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* fix

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Versioning

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Fix Gemma 3

* Update misc.py

* Update rl_environments.py

* Update pyproject.toml

* Update rl_environments.py

* Update __init__.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Device type

* Update vllm_utils.py

* Update compiler.py

* Update empty_model.py

* Update vllm_utils.py

* Update empty_model.py

* Fixes

* Update empty_model.py

* Update empty_model.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update cross_entropy_loss.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update vllm_utils.py

* Qwen3 VL vLLM (#324)

* qwen3 vl additional layers

* qwen3 fused vision qkv

* refactor for handling qwen 3 vl

* [WIP] fix backward pass issues

* out hidden size change

* Qwen 2.5 and qwen 3 conv3d->Linear vLLM changes

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update pyproject.toml

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update compiler.py

* Update __init__.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix CE compile

* Update loss_utils.py

* Update cross_entropy_loss.py

* Fix

* Deepseekocr fix: save single model shard (#346)

* DeepSeekOCR Fix: check for saftensors_list shard naming convention

* turned off shard padding length check bc deepseeks padding is different

* if you try to copy the index.json file and the same file already exists it will throw an error.

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
danielhanchen added a commit that referenced this pull request Nov 25, 2025
* Update gpt_oss.py

* torch compile

* Update attention_sink.py

* Update common.py

* Update common.py

* Patches

* Compiled mask creation

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Revert

* Update gpt_oss.py

* Update gpt_oss.py

* Fix up

* Update attention_sink.py

* Update attention_sink.py

* Update utils.py

* Update attention_sink.py

* Update attention_sink.py

* Retry

* Update gpt_oss.py

* Update gpt_oss.py

* Fix Flex

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Bug fixes

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* Update rl_replacements.py

* Update patching_utils.py

* Update patching_utils.py

* Update patching_utils.py

* flash attn

* Update gpt_oss.py

* Update __init__.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* dropout_p

* Update gpt_oss.py

* Update gpt_oss.py

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* fix

* Update attention_sink.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update loss_utils.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Update gpt_oss.py

* Versioning

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Update saving_utils.py

* Fix Gemma 3

* Update misc.py

* Update rl_environments.py

* Update pyproject.toml

* Update rl_environments.py

* Update __init__.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Update empty_model.py

* Device type

* Update vllm_utils.py

* Update compiler.py

* Update empty_model.py

* Update vllm_utils.py

* Update empty_model.py

* Fixes

* Update empty_model.py

* Update empty_model.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update cross_entropy_loss.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update rl_environments.py

* Update vllm_utils.py

* Qwen3 VL vLLM (#324)

* qwen3 vl additional layers

* qwen3 fused vision qkv

* refactor for handling qwen 3 vl

* [WIP] fix backward pass issues

* out hidden size change

* Qwen 2.5 and qwen 3 conv3d->Linear vLLM changes

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update pyproject.toml

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update __init__.py

* Update compiler.py

* Update __init__.py

* Update vllm_utils.py

* Update rl_replacements.py

* Update rl_replacements.py

* Update rl_replacements.py

* Fix CE compile

* Update loss_utils.py

* Update cross_entropy_loss.py

* Fix

* Deepseekocr fix: save single model shard (#346)

* DeepSeekOCR Fix: check for saftensors_list shard naming convention

* turned off shard padding length check bc deepseeks padding is different

* if you try to copy the index.json file and the same file already exists it will throw an error.

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update patching_utils.py

* Fp8 compressed (#358)

* [WIP] compressed tensors support for FP8

* [WIP 2/n] improve loading fake layer

* [WIP 3/n] improve loading fake layer

* [WIP 4/n] improve loading fake layer

* revert seq and token util calculation

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vllm_utils.py

* Update vllm_utils.py

* Update vllm_utils.py

* Update gradient_checkpointing.py

* Update gpt_oss.py

* Update gpt_oss.py

* updates for vLLM compatibility with lora (#359)

* updates for vLLM compatibility with lora

* LoRA extra vocab cleanup

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* Update vllm_utils.py

* Versioning

---------

Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: DoubleMathew <mmathew23@gmail.com>