Fix: prevent rope_embedding AssertionError by checking kv_seq_len before reuse #3578
Conversation
Prevented AssertionError in rope_embedding.forward when kv_seq_len exceeds the cached rope size. Added a condition to verify kv_seq_len <= position_embeddings[0].shape[0] before reuse, ensuring dynamic extension triggers correctly. Fixes unslothai#3036, unslothai#3216.
Code Review
This pull request correctly addresses a critical AssertionError that occurs during long-sequence processing by adding a boundary check for kv_seq_len before reusing cached RoPE position embeddings. The fix is applied to LlamaAttention_fast_forward in llama.py and Qwen3Attention_fast_forward in qwen3.py, which is a solid approach.
As you noted, further investigation on other models is needed. My analysis confirms this, as similar vulnerabilities seem to exist in other models within the repository. Specifically:
- cohere.py (CohereAttention_fast_forward)
- granite.py (GraniteAttention_fast_forward)
- falcon_h1.py (FalconH1Attention_fast_forward)
These models also appear to use position_embeddings without a sufficient length check against kv_seq_len. The fix might not be as straightforward for all of them since some attention layers may not have direct access to self.rotary_emb to extend the embeddings.
The changes in this PR are correct and well-implemented. Great job on identifying and fixing this issue!
Hey @jarrycyx, thanks a lot for identifying and fixing the issue. It'd be great if you could also adapt the same fix to the other models so that the PR is complete...
Thank you for the reply. I have added this fix to falcon_h1.
Once extended, does it get scaled back for shorter sequences on the next training step?
Actually, I don't see any function for scaling back here (unsloth/unsloth/models/llama.py, lines 1373 to 1484 at b1c782e).
I don't have enough knowledge to estimate the consequences, but could it cause unexpected drift in training results?
unsloth/unsloth/models/llama.py, lines 1412 to 1416 at b1c782e
I am also not an expert, but it is my understanding that the frequency calculated here is not based on the context length, so maybe not?
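For intuition, here is a minimal sketch of how a plain (unscaled) RoPE cache is typically built. It is not the unsloth implementation, but it shows that the inverse frequencies depend only on the head dimension and the base, never on the sequence length, so extending the cache only appends new positions without changing the ones already cached.

```python
import torch

def build_rope_cache(seq_len, dim, base=10000.0):
    # Inverse frequencies depend only on `dim` and `base`, never on `seq_len`.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    # A longer cache simply evaluates the same frequencies at more positions.
    t = torch.arange(seq_len, dtype=torch.float32)
    freqs = torch.outer(t, inv_freq)           # (seq_len, dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (seq_len, dim)
    return emb.cos(), emb.sin()

# Extending from 4096 to 8192 positions leaves the first 4096 rows untouched.
cos_short, _ = build_rope_cache(4096, 128)
cos_long, _ = build_rope_cache(8192, 128)
assert torch.allclose(cos_short, cos_long[:4096])
```

This holds for plain RoPE; scaled variants (e.g. dynamic NTK scaling) do adjust the base with the current length, in which case the picture would be less clean.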
Thanks so much for the fix!
📝 Pull Request Description
Problem
When running with long sequences, an AssertionError in rope_embedding.forward occasionally triggers.
This happens because the current position_embeddings buffer is reused even when kv_seq_len exceeds its allocated length. The dynamically extended RoPE embedding is never called due to a missing check.
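As a toy illustration of the failure mode (hypothetical names, not the actual unsloth kernel code): when the cached cos/sin tables cover only current_rope_size positions, applying them to a longer sequence trips a length assertion.

```python
import torch

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, cos, sin):
    seq_len = q.shape[-2]
    # The cached tables only cover the positions that were pre-computed.
    assert seq_len <= cos.shape[0], "RoPE cache is shorter than the sequence"
    return q * cos[:seq_len] + rotate_half(q) * sin[:seq_len]

current_rope_size = 4096
cos = torch.randn(current_rope_size, 128)
sin = torch.randn(current_rope_size, 128)
q = torch.randn(1, 8, 6000, 128)  # kv_seq_len = 6000 > cached 4096

apply_rope(q, cos, sin)  # AssertionError: the cache was reused without a length check
```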
Root Cause
It seems that current_rope_size, defined here, never gets extended:
unsloth/unsloth/models/llama.py, line 1395 at b1c782e
In unsloth/models/llama.py, the reuse condition (the same block appears in unsloth/unsloth/models/qwen3.py, lines 113 to 115 at aa7cfa1) does not verify whether kv_seq_len fits within the existing RoPE cache. As a result, cos and sin may be too short, leading to the AssertionError.
Fix
Added a boundary check before reusing position_embeddings, sketched below. This ensures RoPE is extended dynamically when the sequence length exceeds the cached size.
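A minimal sketch of the guarded path, restated as a standalone helper for clarity (in the PR the check sits inline in LlamaAttention_fast_forward, Qwen3Attention_fast_forward, and FalconH1Attention_fast_forward). The boundary condition is the one this PR adds; the helper name, parameters, and the extension call are assumptions based on the self.rotary_emb hook discussed above, not a copy of the diff.

```python
def reuse_or_extend_rope(position_embeddings, kv_seq_len, rotary_emb, value_states):
    """Reuse cached cos/sin only when they cover kv_seq_len positions.

    Hypothetical helper for illustration; the guarding condition is the one
    added by this PR, while the extension calls are assumed interfaces.
    """
    if position_embeddings is not None and kv_seq_len <= position_embeddings[0].shape[0]:
        # Cache is long enough: safe to reuse it directly.
        return position_embeddings
    # Cache missing or too short: extend the RoPE tables before applying them,
    # so the kernel-side length assertion can no longer fire.
    rotary_emb.extend_rope_embedding(value_states, seq_len=kv_seq_len)  # assumed extension hook
    return rotary_emb(value_states, seq_len=kv_seq_len)                 # assumed call returning (cos, sin)
```

With this guard, the long-sequence case falls through to the extension branch instead of reusing a too-short cache.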
Related Issue
Fixes #3036 #3216.
Impact
Prevents the AssertionError in rope_embedding.py during long-sequence inference or training.