vllm-project / vllm Public

Pull requests: vllm-project/vllm

Labels 44 Milestones 4

1,321 Open 16,903 Closed

Pull requests list

Add multimodal input method in the documentation

Labels: documentation

#31601 opened Jan 1, 2026 by labAxiaoming

5 tasks

Fix: Add None check in step_with_batch_queue for async scheduling

Labels: v1

#31600 opened Jan 1, 2026 by rickychen-infinirc

[Qwen3-Omni] Prefer CUDA for faster Whisper audio feature extraction

Labels: nvidia, qwen

#31598 opened Jan 1, 2026 by Jzz1943

[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp

Labels: rocm

#31597 opened Jan 1, 2026 by AndreasKaratzas

[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs

#31596 opened Jan 1, 2026 by AndreasKaratzas

Fix audio mono dimension

Labels: documentation, multi-modality, qwen

#31595 opened Jan 1, 2026 by jeremyteboul

Fix flashinfer experts quant config hack

Labels: llama, nvidia

#31593 opened Jan 1, 2026 by robertgshaw2-redhat • Draft

5 tasks

feat(kv-cache): support multiple sliding window groups in HybridKVCac…

Labels: v1

#31592 opened Jan 1, 2026 by DZADSL72-00558

5 tasks

[Misc] Tidy up some spec decode logic in GPUModelRunner

Labels: ready

#31591 opened Dec 31, 2025 by njhill

[Bugfix] Replace BaseException with specific exceptions in FLA utils

Labels: ready

#31590 opened Dec 31, 2025 by c0de128

2 of 3 tasks

[Bugfix] Narrow broad exceptions in rank detection functions

#31589 opened Dec 31, 2025 by c0de128

2 of 3 tasks

[Bugfix][Hardware][ROCm] Narrow broad exception in PyNCCL library loading

Labels: rocm

#31587 opened Dec 31, 2025 by c0de128

2 of 3 tasks

[Bugfix] Narrow broad exception in custom all-reduce detection

#31586 opened Dec 31, 2025 by c0de128

2 of 3 tasks

[Bug] Revert torch warning fix

Labels: bug, ready

#31585 opened Dec 31, 2025 by yewentao256

[BugFix] scheduler: Fix resuming of preempted requests after async load

Labels: v1

#31583 opened Dec 31, 2025 by orozery

[Frontend] [Bugfix] respect server-level default chat template kwargs in reasoning parser

Labels: frontend

#31581 opened Dec 31, 2025 by cjackal

3 of 5 tasks

[Bugfix]: update global_rank when adjusting rpc_rank to fix layer key error

Labels: v1

#31580 opened Dec 31, 2025 by zhaoninge

3 of 5 tasks

feat: add vllm.utils.device_utils module

#31576 opened Dec 31, 2025 by codebasecomprehension

[Model] Support IQuestCoder model

Labels: new-model

#31575 opened Dec 31, 2025 by yxing-bj

5 tasks

[P/D] Refactor mooncake connector sender thread using async coroutines

Labels: kv-connector

#31573 opened Dec 31, 2025 by dtcccc

5 tasks

[Bugfix] Fix activation quantization for compressed-tensors W4A16

Labels: ready

#31572 opened Dec 31, 2025 by Tmn07

[Quantization][MoE] remove unused ep logic from moe marlin

#31571 opened Dec 31, 2025 by jinzhen-lin

feat: support LoRA for DeepSeek-OCR(Language Model part)

Labels: deepseek, documentation, ready

#31569 opened Dec 31, 2025 by zhima771

[Core] Optimize group size selection for hybrid KV cache

Labels: v1

#31568 opened Dec 31, 2025 by DZADSL72-00558

5 tasks

[Model] Support SentenceTransformers V6 reranker config

Labels: documentation, frontend

#31563 opened Dec 31, 2025 by noooop • Draft

5 tasks
