Collaborator

@Datta0 Datta0 commented Jul 19, 2025

Speeds up vLLM startup by temporarily disabling Python's garbage collector (gc) during CUDA graph capture.
This does not increase memory usage on either the CPU or the GPU.

Inspired by vllm-project/vllm#21146 and pytorch/pytorch#158193
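
For readers unfamiliar with the pattern, here is a minimal sketch of pausing gc around a block of work. It is not the code in this PR: `gc_paused` and `capture_cuda_graphs_placeholder` are hypothetical names used only for illustration, and the placeholder stands in for the real capture step.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable automatic garbage collection for the duration of the block.

    Hypothetical helper for illustration; not taken from this PR.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state only; do not enable gc if the
        # caller had already disabled it before entering the block.
        if was_enabled:
            gc.enable()


def capture_cuda_graphs_placeholder():
    """Stand-in for the real CUDA graph capture loop (assumption)."""
    # Allocate many short-lived objects, the kind of work that would
    # otherwise trigger automatic collections.
    return sum(len([i, i + 1]) for i in range(10_000))


with gc_paused():
    capture_cuda_graphs_placeholder()
```

Restoring the prior state in `finally` means gc is re-enabled even if the capture step raises, and objects created while paused are still collected once gc resumes.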

2 participants