[Patch] Skip calculating loss of empty <think></think> (ಥ﹏ಥ)
#1387
Conversation
Pull request overview
This PR introduces a temporary monkey patch to skip loss calculation for empty `<think></think>` tags in training data. The patch is controlled via the `XTUNER_SKIP_EMPTY_THINK` environment variable and is explicitly designed to be disposable rather than a permanent solution.
Key Changes:
- Added `_hardcode_patch.py` with a `_SkipEmptyThink` mixin class that filters empty thinking tags from labels
- Modified multiple tokenize function classes through runtime base class injection
- Wrapped tokenization methods to apply label processing post-tokenization
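In outline, the filtering is a post-tokenization pass that masks the token span of an empty `<think></think>` pair out of the loss. A minimal sketch of that idea, assuming the empty pair tokenizes to a fixed token sequence and that `-100` is the ignore index; the helper name and the exact matching strategy here are illustrative, not the PR's actual code:

```python
from transformers import PreTrainedTokenizer


def mask_empty_think(input_ids: list[int], labels: list[int],
                     tokenizer: PreTrainedTokenizer) -> list[int]:
    """Set labels of empty <think></think> spans to the ignore index."""
    # Hypothetical matching: chat templates often insert newlines inside
    # the pair, so the real patch may look for a different sequence.
    pattern = tokenizer.encode("<think></think>", add_special_tokens=False)
    n = len(pattern)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == pattern:
            # -100 is ignored by PyTorch's cross-entropy loss.
            labels[start:start + n] = [-100] * n
    return labels
```

Applying such a pass to the output of each tokenize function matches the "post-tokenization" wrapping described above.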
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `xtuner/v1/datasets/_hardcode_patch.py` | Implements the core monkey-patching logic with the `_SkipEmptyThink` mixin and a label-processing wrapper for filtering `<think></think>` sequences |
| `xtuner/v1/datasets/__init__.py` | Imports the hardcode patch module to activate the monkey patching when the environment variable is set |
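The activation path described for `__init__.py` is presumably an import guarded by the environment variable, along these lines (a sketch; the exact truthiness check is an assumption):

```python
# Sketch of the guard in xtuner/v1/datasets/__init__.py
import os

if os.environ.get("XTUNER_SKIP_EMPTY_THINK", "0") == "1":
    # Importing the module applies the monkey patch as a side effect.
    from . import _hardcode_patch  # noqa: F401
```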
```python
logger = get_logger()


# Need to inherit ABC otherwise the assignment of `__bases__` will faild
```
Copilot AI · Dec 24, 2025
Typo in comment: "faild" should be "fail".
```diff
- # Need to inherit ABC otherwise the assignment of `__bases__` will faild
+ # Need to inherit ABC otherwise the assignment of `__bases__` will fail
```
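For context, the "runtime base class injection" mentioned above prepends the mixin to each target class's `__bases__`. A sketch of the pattern, under the assumption that the patched classes already derive from `ABC` (which is why the mixin must too, as the comment notes); all class names below are illustrative stand-ins, not xtuner's real classes:

```python
from abc import ABC


class _SkipEmptyThink(ABC):
    """Mixin prepended at runtime; its label processing wins in the MRO."""


class CachableTokenizeFunction(ABC):
    """Illustrative stand-in for a tokenize-fn base class."""


class OpenaiTokenizeFunction(CachableTokenizeFunction):
    """Illustrative stand-in for a concrete tokenize-fn class."""


# Runtime base-class injection: prepend the mixin to the existing bases.
# The mixin must share the target's instance layout; since the targets
# derive from ABC here, a plain-`object` mixin would make this raise.
OpenaiTokenizeFunction.__bases__ = (
    _SkipEmptyThink,
) + OpenaiTokenizeFunction.__bases__
```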
```python
class _SkipEmptyThink(ABC):
    def __init__(self, tokenizer: PreTrainedTokenizer, **kwargs):
        # For `__mro__` compatibility
        super().__init__(tokenizer=tokenizer)
```
Copilot AI · Dec 24, 2025
The `__init__` method accepts `**kwargs` but never uses or forwards them. This could silently ignore important configuration parameters that were intended for the parent class or other mixins in the MRO chain. Consider using `super().__init__(tokenizer=tokenizer, **kwargs)` to properly forward any additional arguments.
```diff
- super().__init__(tokenizer=tokenizer)
+ super().__init__(tokenizer=tokenizer, **kwargs)
```
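The suggestion follows the usual cooperative multiple-inheritance rule: each `__init__` in the MRO forwards what it does not consume, so keyword arguments meant for classes later in the chain are not silently dropped. A small illustration with hypothetical classes:

```python
class Base:
    def __init__(self, tokenizer, max_length=2048, **kwargs):
        super().__init__(**kwargs)
        self.tokenizer = tokenizer
        self.max_length = max_length


class Mixin:
    def __init__(self, tokenizer, **kwargs):
        # Forwarding **kwargs keeps arguments such as `max_length`
        # flowing to the next class in the MRO instead of vanishing.
        super().__init__(tokenizer=tokenizer, **kwargs)


class Patched(Mixin, Base):
    pass


obj = Patched(tokenizer=None, max_length=4096)
assert obj.max_length == 4096  # reaches Base only if Mixin forwards kwargs
```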
Force-pushed 46633ad to f607e1a
Force-pushed f607e1a to cd748ba