FP8, Standby and vLLM updates #340

Datta0 · 2025-10-31T10:20:50Z

H100 and L4 seem to disagree on the vLLM weight scale shapes for FP8. This is a workaround that handles both layouts until we find a concrete reason for the difference.

unslothai/unsloth#3531
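
For illustration only, here is a minimal sketch of how a shape-tolerant check could look. It is not the code in this PR: the function names, the `(128, 128)` block size, and the `(out_features, in_features)` weight layout are all assumptions chosen for the example.

```python
import math


def expected_scale_shape(weight_shape, block_size=(128, 128)):
    """Scale grid shape for a block-quantized weight.

    Assumes weight_shape is (out_features, in_features) and that one scale
    is stored per block, with partial blocks at the edges rounded up.
    """
    out_features, in_features = weight_shape
    block_out, block_in = block_size
    return (math.ceil(out_features / block_out), math.ceil(in_features / block_in))


def scale_layout(scale_shape, weight_shape, block_size=(128, 128)):
    """Classify an incoming scale shape as 'as_is' or 'transposed'.

    Hypothetical helper: a square scale grid is ambiguous by shape alone,
    so it is reported as 'as_is' here.
    """
    expected = expected_scale_shape(weight_shape, block_size)
    scale_shape = tuple(scale_shape)
    if scale_shape == expected:
        return "as_is"
    if scale_shape == expected[::-1]:
        return "transposed"
    raise ValueError(
        f"scale shape {scale_shape} matches neither {expected} nor {expected[::-1]}"
    )
```

A caller would transpose the scale tensor only when the helper reports `"transposed"`. Because a square grid cannot be disambiguated from its shape, a purely shape-based check like this one is only a stopgap.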

This also aims to improve the standby utilization calculation, in the hope of addressing unslothai/unsloth#3542.
L4 seems to capture CUDA graphs, which need more buffer. Notice the slight, brief (about 2s) increase in memory usage.
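
As a rough sketch of the idea (not the calculation used in this PR), a utilization target can reserve headroom for the short spike during graph capture before it is expressed as a fraction of total memory. The function name, the default 1 GiB buffer, and the 0.9 cap are assumptions for illustration.

```python
def standby_gpu_utilization(free_bytes, total_bytes,
                            requested_util=0.9,
                            graph_buffer_bytes=1 << 30):
    """Return a memory fraction that leaves room for a transient spike.

    Hypothetical helper: subtracts a fixed buffer (assumed 1 GiB) from the
    currently free memory, then caps the result at the requested fraction.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Memory that can be claimed while keeping the buffer untouched.
    usable = max(free_bytes - graph_buffer_bytes, 0)
    return min(requested_util, usable / total_bytes)
```

With PyTorch, `free_bytes` and `total_bytes` could come from `torch.cuda.mem_get_info()`. The right buffer size likely depends on the GPU and on how many graphs are captured, so the fixed default here is only a placeholder.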

This reverts commit c353bb7.

Datta0 added 3 commits October 30, 2025 11:59

Cleaner and error free alloc_conf

c353bb7

Smarter shape adjustments

b3dc49f

Revert "Cleaner and error free alloc_conf"

69949ea

This reverts commit c353bb7.

Datta0 mentioned this pull request Oct 31, 2025

fbgemm fp8 block quant support (>=1.4.0) unslothai/unsloth#3531

Merged

Datta0 added 3 commits November 3, 2025 05:35

Safer vLLM V0 patches

ecf1566

point to vLLM fp8 for reference

388d86f

rework gpu util for standby

aa8af8e

Datta0 changed the title ~~Smarter weight shape adjustment for FP8 block quant~~ FP8, Standby and vLLM updates Nov 3, 2025

Datta0 added 4 commits November 3, 2025 15:50

deterministic fp8 weight scale transposing from vllm

4eeed74

remove print

68979b4

add vllm reference

4e6409a

Merge remote-tracking branch 'origin/main' into fbgemm_fp8

2586aad

danielhanchen merged commit 6593994 into unslothai:main Nov 4, 2025
