Conversation

@pluesclues
Contributor

Detach the old and reference hidden states so that gradients cannot flow back through them across micro-batches. This fixes a few models, but mainly Qwen3 VL in GRPO.
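
To illustrate why the detach matters, here is a minimal sketch in plain PyTorch. It is not the Unsloth code: `layer`, `batch`, and `weight_grad` are hypothetical stand-ins, and the toy ratio-style loss is only assumed to resemble the GRPO objective. When the "old" hidden states keep their autograd graph, their gradient cancels the gradient of the "new" hidden states. Once detached, they act as constants and only the new forward pass is trained.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a model: one linear layer producing "hidden states".
layer = torch.nn.Linear(4, 3)
batch = torch.randn(2, 4)


def weight_grad(detach_old: bool) -> torch.Tensor:
    """Return d(loss)/d(weight) for a toy ratio-style loss."""
    layer.zero_grad()
    new_hidden = layer(batch)
    old_hidden = layer(batch)  # same values, but still attached to the graph
    if detach_old:
        # Treat the old hidden states as constants for autograd.
        old_hidden = old_hidden.detach()
    loss = torch.exp(new_hidden - old_hidden).sum()
    loss.backward()
    return layer.weight.grad.clone()


grad_with_detach = weight_grad(detach_old=True)
grad_without_detach = weight_grad(detach_old=False)

# A detached tensor carries no graph, so nothing can backpropagate through it.
detached = layer(batch).detach()
print(detached.requires_grad, detached.grad_fn)
print(grad_with_detach.abs().sum().item())
print(grad_without_detach.abs().sum().item())
```

In this toy setup the undetached gradient cancels out, while the detached version yields a usable, nonzero gradient.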

Detach old and reference hidden states to prevent gradient flow across micro-batches.
@danielhanchen danielhanchen merged commit 347c46a into unslothai:main Nov 4, 2025
danielhanchen added a commit that referenced this pull request Nov 4, 2025
2 participants