Fix/video collate #342

mmathew23 · 2025-11-02T03:19:23Z

This PR makes a few updates to the vision data collator which mostly affect video and prompt completion processing.

Specifically, it fixes two bugs for video datasets. For prompt completion datasets, it fixes a couple of bugs, adds handling for token_type_ids (Gemma), and makes flush_to_side more efficient.
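For readers unfamiliar with the idea behind flush_to_side, here is a rough sketch of what flushing padding to one side can look like. It assumes the step runs after a padded prompt and a padded completion are concatenated, which leaves padding in the middle or on both ends of a row. The helper name `flush_row`, its signature, and the use of plain Python lists instead of tensors are illustrative assumptions, not the collator's actual code.

```python
def flush_row(tokens, mask, pad_id, side="left"):
    """Move the real tokens of one row to `side`, keeping their order.

    Hypothetical helper for illustration only; the real collator works on
    batched tensors and may differ in details.
    """
    # Keep only positions the attention mask marks as real tokens.
    kept = [t for t, m in zip(tokens, mask) if m]
    padding = [pad_id] * (len(tokens) - len(kept))
    if side == "left":
        # Real tokens first, padding at the end.
        return kept + padding, [1] * len(kept) + [0] * len(padding)
    # Padding first, real tokens at the end.
    return padding + kept, [0] * len(padding) + [1] * len(kept)


# A left-padded prompt [0, 0, 5, 6] followed by a right-padded completion [7, 8, 0, 0].
tokens = [0, 0, 5, 6, 7, 8, 0, 0]
mask = [0, 0, 1, 1, 1, 1, 0, 0]

flushed_tokens, flushed_mask = flush_row(tokens, mask, pad_id=0, side="left")
print(flushed_tokens)  # [5, 6, 7, 8, 0, 0, 0, 0]
print(flushed_mask)    # [1, 1, 1, 1, 0, 0, 0, 0]
```

Any other per-token field, such as a completion mask or token_type_ids, would need the same reordering so it stays aligned with the tokens.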

We also update the logic to mask prompts in prompt completion datasets by default.
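As a minimal sketch of what masking prompts means for the loss: labels start as a copy of the input ids, and every position that is not a completion token is set to -100, the default ignore index of PyTorch's cross-entropy loss. The function name `mask_prompt_labels` and the `completion_mask` field are assumptions made for this example, and plain lists stand in for tensors; the collator's real code may be structured differently.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(input_ids, completion_mask, attention_mask):
    """Build labels so that only real completion tokens contribute to the loss.

    Hypothetical helper for illustration; not the collator's actual API.
    """
    labels = []
    for token, is_completion, is_real in zip(input_ids, completion_mask, attention_mask):
        # Prompt tokens and padding are both excluded from the loss.
        labels.append(token if (is_completion and is_real) else IGNORE_INDEX)
    return labels


input_ids = [101, 7, 8, 9, 42, 43, 0, 0]   # four prompt tokens, two completion tokens, two pads
completion_mask = [0, 0, 0, 0, 1, 1, 0, 0]
attention_mask = [1, 1, 1, 1, 1, 1, 0, 0]

labels = mask_prompt_labels(input_ids, completion_mask, attention_mask)
print(labels)  # [-100, -100, -100, -100, 42, 43, -100, -100]
```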

Old Collator:
qwen3vl: https://colab.research.google.com/drive/13cHImIKg2t00qBgaO_9GmLIwN8_ozMNN?usp=sharing
pixtral: https://colab.research.google.com/drive/1YQY3p4jmQRCA7wswnlwx4_y-aGSIs2FY?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1VoqKqRvzKrpXdLK3h29goN_-1ohfdQMh?usp=sharing
llamavl: https://colab.research.google.com/drive/1LlTgYdrU6ug10vVNw-Ng664akqWF13BZ?usp=sharing
gemma3n: https://colab.research.google.com/drive/1vGrhglCAVThw-KhS_veR_qRxOdx_OV-K?usp=sharing
gemma3: https://colab.research.google.com/drive/1hXzYzxBCBrgCFw9kBeokJBKkH21wtZm_?usp=sharing

New Collator:
qwen3vl: https://colab.research.google.com/drive/1RFHt1RcIB9K9y7A9h_w3bWmTfYvUo3Hn?usp=sharing
pixtral: https://colab.research.google.com/drive/1J4xQtJY4RjHeB0ZeOhUFqA2HGbzmm8bV?usp=sharing
qwen25vl: https://colab.research.google.com/drive/1XA2p6GtNOuVHh4Pi4udLvjb4Ur7zAKgY?usp=sharing
llamavl: https://colab.research.google.com/drive/1naUzOF5KWRy--PtgIQfNFjeULA2aR-xH?usp=sharing
gemma3n: https://colab.research.google.com/drive/1Wl4owxrUv1-J0XP2Abvf0YkK1XvhnDc0?usp=sharing
gemma3: https://colab.research.google.com/drive/1drhCI_xXZKq6IFqB8TVa4PrJSMGZnJsv?usp=sharing

This PR updates prompt completion processing and video processing utilities for the vision data collator

mmathew23 added 2 commits November 1, 2025 16:36

Update UnslothVisionDataCollator

12bdd59

This PR updates prompt completion processing and video processing utilities for the vision data collator

completion_only_loss=True is default

83e9a6e

danielhanchen merged commit c8b204f into unslothai:main Nov 2, 2025
