Detach old and reference hidden states to prevent gradient accumulation across micro-batches.
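To illustrate the idea in the title, here is a minimal PyTorch sketch, not the code from this PR. The `layer`, `accumulate` helper, and toy loss are hypothetical stand-ins; the assumption is that "old" or "reference" hidden states are meant to act as constants in the loss, so leaving them attached to the autograd graph lets extra gradient flow into the parameters on every micro-batch.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a model that produces hidden states.
layer = torch.nn.Linear(4, 4)
batch = torch.randn(4, 4)


def accumulate(detach_reference: bool) -> torch.Tensor:
    """Accumulate gradients over two micro-batches and return weight.grad."""
    layer.zero_grad()
    for micro_batch in batch.chunk(2):
        hidden = layer(micro_batch)
        # Stand-in for old/reference hidden states computed by the same weights.
        reference = layer(micro_batch)
        if detach_reference:
            # detach() keeps the values but drops the autograd history,
            # so no gradient flows back through the reference path.
            reference = reference.detach()
        # Toy loss (an assumption for illustration only).
        loss = (hidden * reference).sum()
        loss.backward()  # gradients add up in .grad across micro-batches
    return layer.weight.grad.clone()


grad_attached = accumulate(detach_reference=False)
grad_detached = accumulate(detach_reference=True)

# With this toy loss, the attached reference path contributes a second,
# identical copy of the gradient on every micro-batch.
print(torch.allclose(grad_attached, 2 * grad_detached, atol=1e-6))
```

In this toy setup the un-detached run ends with twice the gradient of the detached run, which is the kind of unintended contribution that detaching is meant to remove.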
@danielhanchen
Contributor

I'm merging the partial fix - make another PR on detach :) @pluesclues

@danielhanchen danielhanchen merged commit e58389e into unslothai:main Nov 4, 2025


2 participants