@mmathew23
Collaborator

This PR makes a few updates to the vision data collator which mostly affect video and prompt completion processing.

Specifically, this PR fixes two bugs affecting video datasets. For prompt-completion datasets, it fixes a couple of bugs, adds handling for token_type_ids (Gemma), and makes flush_to_side more efficient.
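
For readers unfamiliar with the term, here is a minimal sketch of what "flushing" a padded batch to one side means. It is not the collator's actual `flush_to_side` code: the helper name `flush_left_sketch`, its signature, and the use of plain Python lists are all assumptions made for illustration. The scenario it models is a left-padded prompt concatenated with a right-padded completion, which leaves padding on both sides of each row.

```python
def flush_left_sketch(rows, mask, pad_value=0):
    """Hypothetical helper: move real tokens (mask == 1) to the left of each row.

    Token order is preserved and every row keeps its original length;
    the freed positions on the right are filled with pad_value.
    """
    flushed_rows, flushed_mask = [], []
    for row, row_mask in zip(rows, mask):
        # Keep only the positions marked as real tokens.
        kept = [tok for tok, keep in zip(row, row_mask) if keep]
        n_pad = len(row) - len(kept)
        flushed_rows.append(kept + [pad_value] * n_pad)
        flushed_mask.append([1] * len(kept) + [0] * n_pad)
    return flushed_rows, flushed_mask


# Left-padded prompt (11, 12) followed by a right-padded completion (21, 22).
input_ids = [[0, 0, 11, 12, 21, 22, 0]]
attention_mask = [[0, 0, 1, 1, 1, 1, 0]]
flushed_ids, flushed_attention = flush_left_sketch(input_ids, attention_mask)
```

A tensor-based collator would typically do this with vectorized operations and may also trim columns that are padding in every row; the loop above only shows the intended result.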

We also update the logic so that prompts in prompt-completion datasets are masked by default.
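
As a rough illustration of prompt masking, the sketch below builds labels in which only completion tokens contribute to the loss. It is not taken from this PR: the function name `mask_prompt_labels` and the `completion_mask` input are hypothetical, and the only fixed convention used is `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, completion_mask, attention_mask):
    """Hypothetical helper: copy input_ids into labels, hiding prompt and padding.

    A position keeps its token id only if it belongs to the completion
    (completion_mask == 1) and is a real token (attention_mask == 1).
    """
    labels = []
    for ids, comp, attn in zip(input_ids, completion_mask, attention_mask):
        labels.append(
            [tok if (c == 1 and a == 1) else IGNORE_INDEX
             for tok, c, a in zip(ids, comp, attn)]
        )
    return labels


# Two prompt tokens, two completion tokens, one padding token.
example_labels = mask_prompt_labels(
    input_ids=[[5, 6, 7, 8, 0]],
    completion_mask=[[0, 0, 1, 1, 0]],
    attention_mask=[[1, 1, 1, 1, 0]],
)
```

With labels built this way, the loss is computed only on the completion, so the model is not trained to reproduce the prompt.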

Old Collator:
qwen3vl: https://colab.research.google.com/drive/13cHImIKg2t00qBgaO_9GmLIwN8_ozMNN?usp=sharing
pixtral: https://colab.research.google.com/drive/1YQY3p4jmQRCA7wswnlwx4_y-aGSIs2FY?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1VoqKqRvzKrpXdLK3h29goN_-1ohfdQMh?usp=sharing
llamavl: https://colab.research.google.com/drive/1LlTgYdrU6ug10vVNw-Ng664akqWF13BZ?usp=sharing
gemma3n: https://colab.research.google.com/drive/1vGrhglCAVThw-KhS_veR_qRxOdx_OV-K?usp=sharing
gemma3: https://colab.research.google.com/drive/1hXzYzxBCBrgCFw9kBeokJBKkH21wtZm_?usp=sharing

New Collator:
qwen3vl: https://colab.research.google.com/drive/1RFHt1RcIB9K9y7A9h_w3bWmTfYvUo3Hn?usp=sharing
pixtral: https://colab.research.google.com/drive/1J4xQtJY4RjHeB0ZeOhUFqA2HGbzmm8bV?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1XA2p6GtNOuVHh4Pi4udLvjb4Ur7zAKgY?usp=sharing
llamavl: https://colab.research.google.com/drive/1naUzOF5KWRy--PtgIQfNFjeULA2aR-xH?usp=sharing
gemma3n: https://colab.research.google.com/drive/1Wl4owxrUv1-J0XP2Abvf0YkK1XvhnDc0?usp=sharing
gemma3: https://colab.research.google.com/drive/1drhCI_xXZKq6IFqB8TVa4PrJSMGZnJsv?usp=sharing

This PR updates prompt completion processing and video processing utilities for the vision data collator.
@danielhanchen danielhanchen merged commit c8b204f into unslothai:main Nov 2, 2025