This PR makes several updates to the vision data collator, mostly affecting video and prompt-completion processing.
Specifically, it fixes two bugs for video datasets. For prompt-completion datasets, it fixes a couple of bugs, adds handling for token_type_ids (Gemma), and makes flush_to_side more efficient.
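For readers unfamiliar with the flushing step, here is a minimal sketch of what flushing to the left could look like. It assumes the operation moves non-padding tokens to one side of each row after the prompt and completion are concatenated, then trims columns that are padding in every row. The function name `flush_left_sketch` and its signature are hypothetical, and plain Python lists stand in for tensors; this is not the collator's actual implementation.

```python
def flush_left_sketch(mask, *tensors, pad_value=0):
    """Shift non-padding entries of each row to the left, then trim all-padding columns.

    `mask` holds 1 for real tokens and 0 for padding. Every entry in `tensors`
    has the same shape as `mask` and is reordered in the same way.
    (Hypothetical helper for illustration only.)
    """
    flushed_mask = []
    flushed = [[] for _ in tensors]
    for r, mask_row in enumerate(mask):
        # Positions of real tokens, in their original order.
        keep = [i for i, m in enumerate(mask_row) if m]
        n_pad = len(mask_row) - len(keep)
        flushed_mask.append([1] * len(keep) + [0] * n_pad)
        for t, tensor in enumerate(tensors):
            flushed[t].append([tensor[r][i] for i in keep] + [pad_value] * n_pad)
    # Drop trailing columns that are padding in every row.
    width = max((sum(row) for row in flushed_mask), default=0)
    flushed_mask = [row[:width] for row in flushed_mask]
    flushed = [[row[:width] for row in tensor] for tensor in flushed]
    return (flushed_mask, *flushed)


# Example: padding sits in the middle of row 0 after concatenation.
attention_mask = [[0, 1, 1, 0, 1], [1, 1, 0, 0, 0]]
input_ids = [[0, 5, 6, 0, 7], [8, 9, 0, 0, 0]]
new_mask, new_ids = flush_left_sketch(attention_mask, input_ids)
```

A tensor-based version would typically avoid the per-row Python loop, which is presumably where an efficiency improvement would come from.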
It also changes the masking logic so that prompts in prompt-completion datasets are masked by default.
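To illustrate what masking the prompt means for the loss, here is a minimal sketch. It assumes a per-token `completion_mask` (1 for completion tokens, 0 for prompt tokens) and a `mask_prompt` flag; both names, and the helper `build_labels_sketch`, are hypothetical and not taken from the collator's code. Masked positions are set to -100, the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels_sketch(input_ids, attention_mask, completion_mask, mask_prompt=True):
    """Build labels in which padding (and optionally prompt) tokens are ignored by the loss.

    All arguments are lists of equal-length rows. (Hypothetical helper for illustration only.)
    """
    labels = []
    for ids, attn, comp in zip(input_ids, attention_mask, completion_mask):
        row = []
        for tok, a, c in zip(ids, attn, comp):
            # Keep a token as a training target only if it is not padding and,
            # when prompt masking is enabled, it belongs to the completion.
            keep = a == 1 and (c == 1 or not mask_prompt)
            row.append(tok if keep else IGNORE_INDEX)
        labels.append(row)
    return labels


# Example: two prompt tokens, two completion tokens, one padding token.
input_ids = [[11, 12, 21, 22, 0]]
attention_mask = [[1, 1, 1, 1, 0]]
completion_mask = [[0, 0, 1, 1, 0]]
labels_masked = build_labels_sketch(input_ids, attention_mask, completion_mask)
labels_full = build_labels_sketch(input_ids, attention_mask, completion_mask, mask_prompt=False)
```

With masking enabled, only the completion tokens contribute to the loss; with it disabled, the prompt tokens do as well, while padding is ignored in both cases.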
Old Collator:
qwen3vl: https://colab.research.google.com/drive/13cHImIKg2t00qBgaO_9GmLIwN8_ozMNN?usp=sharing
pixtral: https://colab.research.google.com/drive/1YQY3p4jmQRCA7wswnlwx4_y-aGSIs2FY?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1VoqKqRvzKrpXdLK3h29goN_-1ohfdQMh?usp=sharing
llamavl: https://colab.research.google.com/drive/1LlTgYdrU6ug10vVNw-Ng664akqWF13BZ?usp=sharing
gemma3n: https://colab.research.google.com/drive/1vGrhglCAVThw-KhS_veR_qRxOdx_OV-K?usp=sharing
gemma3: https://colab.research.google.com/drive/1hXzYzxBCBrgCFw9kBeokJBKkH21wtZm_?usp=sharing
New Collator:
qwen3vl: https://colab.research.google.com/drive/1RFHt1RcIB9K9y7A9h_w3bWmTfYvUo3Hn?usp=sharing
pixtral: https://colab.research.google.com/drive/1J4xQtJY4RjHeB0ZeOhUFqA2HGbzmm8bV?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1XA2p6GtNOuVHh4Pi4udLvjb4Ur7zAKgY?usp=sharing
llamavl: https://colab.research.google.com/drive/1naUzOF5KWRy--PtgIQfNFjeULA2aR-xH?usp=sharing
gemma3n: https://colab.research.google.com/drive/1Wl4owxrUv1-J0XP2Abvf0YkK1XvhnDc0?usp=sharing
gemma3: https://colab.research.google.com/drive/1drhCI_xXZKq6IFqB8TVa4PrJSMGZnJsv?usp=sharing