Speed up vLLM load #208

Datta0 · 2025-07-19T15:13:40Z

Speeds up vLLM loading by temporarily disabling garbage collection (gc) during CUDA graph capture.
This does not increase memory usage on either the CPU or the GPU.

Inspired by vllm-project/vllm#21146 and pytorch/pytorch#158193
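
For readers unfamiliar with the technique, here is a minimal sketch of pausing Python's garbage collector around a block of work. It is not the code from this PR: `gc_paused` and `capture_graphs_stub` are hypothetical names, and the stub only stands in for a real CUDA graph capture loop so the example runs without a GPU.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the cyclic garbage collector for the duration of the block.

    Hypothetical helper for illustration; not the function used in this PR.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state only; leave gc off if the caller had it off.
        if was_enabled:
            gc.enable()


def capture_graphs_stub(num_graphs):
    """Hypothetical stand-in for a CUDA graph capture loop.

    Records whether gc is enabled at each simulated capture step.
    """
    return [gc.isenabled() for _ in range(num_graphs)]


with gc_paused():
    # While the block runs, automatic cyclic collections do not trigger.
    states = capture_graphs_stub(3)
```

Restoring the earlier gc state in `finally`, rather than calling `gc.enable()` unconditionally, keeps the helper safe to nest and safe to use in processes that run with gc disabled on purpose.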

unsloth_zoo/vllm_utils.py

Speed up vLLM load

a36a67d

danielhanchen reviewed Jul 20, 2025

unsloth_zoo/vllm_utils.py (outdated, resolved)

danielhanchen reviewed Jul 20, 2025

unsloth_zoo/vllm_utils.py (outdated, resolved)

Datta0 added 2 commits July 20, 2025 07:04

Patch vLLM v0 graph capture gc

4325036

Clean up patching mechanism for both v0 and v1 vLLM

9406bc9

danielhanchen merged commit b7f2e53 into unslothai:main Jul 20, 2025

Datta0 mentioned this pull request Nov 17, 2025

[Feature] Please fix GRPO notebook to work without fast_inference, otherwise the notebook breaks and the model prints gibberish unslothai/unsloth#2730

Closed
