Skip to content

qwen3vl 多模态video sft: index out of range #9704

@KYRIE-LI11

Description

@KYRIE-LI11

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.4.dev0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.35
  • Python version: 3.11.14
  • PyTorch version: 2.9.1+cu128 (GPU)
  • Transformers version: 4.57.1
  • Datasets version: 4.0.0
  • Accelerate version: 1.11.0
  • PEFT version: 0.17.1
  • GPU type: NVIDIA H200
  • GPU number: 8
  • GPU memory: 139.81GB
  • TRL version: 0.9.6
  • DeepSpeed version: 0.16.9
  • Git commit: aeda079
  • Default data directory: detected

Reproduction

在用sft方法训练qwen3vl时,数据用的是video数据,且是多轮对话,报错idx out of range, 这里代码中是默认video_metadata长度和messgaes长度相同吗,如果多轮对话中user又输入了一个video就不支持吗

Image

Others

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingpendingThis problem is yet to be addressed

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions