Conversation

@Datta0 Datta0 commented Oct 31, 2025

H100 and L4 GPUs seem to disagree on the vLLM weight scale shapes for FP8 block quantization. This is a workaround that handles both layouts simultaneously until we find a concrete reason for the discrepancy.

unslothai/unsloth#3531
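As a hedged illustration of the kind of workaround described above (the function name, block size, and shape convention here are assumptions, not the PR's actual code), one way to tolerate both layouts is to check the scale tensor against the expected per-block shape and transpose when the other device's layout shows up:

```python
import torch

BLOCK = 128  # assumed FP8 block-quant tile size


def normalize_weight_scale(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: coerce a per-block weight scale to the
    (ceil(n/BLOCK), ceil(k/BLOCK)) layout, regardless of which layout
    the GPU / vLLM build happened to produce."""
    n, k = weight.shape
    expected = (-(-n // BLOCK), -(-k // BLOCK))  # ceil division
    if tuple(scale.shape) == expected:
        return scale  # already in the expected layout
    if tuple(scale.shape) == expected[::-1]:
        # Transposed layout (as seen on the other device family): flip it.
        return scale.t().contiguous()
    raise ValueError(
        f"unexpected scale shape {tuple(scale.shape)}, expected {expected}"
    )
```

The point of the sketch is only the shape check and transpose; the real fix in the PR may differ in where this check lives and how the mismatch is detected.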

This also intends to improve the standby utilization calculation, in the hope of addressing unslothai/unsloth#3542.
L4 seems to capture CUDA graphs that need a larger buffer. Notice the slight increase in memory usage for a brief period (~2s).
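A minimal sketch of what "improving the standby util calculation" could look like, assuming the goal is to leave headroom for the transient CUDA-graph-capture spike (the function, constant, and numbers here are illustrative assumptions, not the PR's implementation):

```python
# Hypothetical sketch: reserve extra headroom for the brief memory spike
# seen during CUDA graph capture when computing the standby GPU memory
# utilization fraction handed to vLLM.
CUDA_GRAPH_HEADROOM_GB = 0.5  # assumed buffer size for the capture spike


def standby_gpu_utilization(total_gb: float, reserved_gb: float,
                            headroom_gb: float = CUDA_GRAPH_HEADROOM_GB) -> float:
    """Fraction of GPU memory to allocate in standby mode, keeping
    `headroom_gb` free so CUDA graph capture does not OOM."""
    usable = total_gb - reserved_gb - headroom_gb
    # Clamp to [0, 1] so pathological inputs never produce an invalid fraction.
    return max(0.0, min(1.0, usable / total_gb))
```

For example, on a 24 GB L4 with 16 GB already reserved, this would hand roughly 31% of the card to the standby engine instead of the full remainder.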

(Screenshot: memory usage graph showing the brief increase during CUDA graph capture.)
@Datta0 Datta0 changed the title Smarter weight shape adjustment for FP8 block quant Nov 3, 2025
@danielhanchen danielhanchen merged commit 6593994 into unslothai:main Nov 4, 2025