Description
Did you update?
pip install --upgrade unsloth unsloth_zoo
Yes (Unsloth updated to 2025.7.8).
Colab or Kaggle or local / cloud?
Local Python script
OS: Linux
Number of GPUs used (use nvidia-smi)?
GPU(s): 1× NVIDIA RTX A6000 (compute capability 8.6, 47.44 GB VRAM)
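For reference, GPU details like these can be collected with a small helper around nvidia-smi's CSV query mode. This is a minimal sketch: it assumes nvidia-smi is on PATH and that the installed driver supports the compute_cap query field; the function names are hypothetical.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi (compute_cap needs a reasonably recent driver).
QUERY = "name,memory.total,compute_cap"

def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib, cap = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "memory_mib": int(mem_mib), "compute_cap": cap})
    return gpus

def query_gpus():
    """Return a list of GPU dicts, or an empty list if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)

if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Requesting `nounits` keeps memory.total as a bare MiB integer, so the parser does not have to strip a unit suffix.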
Which notebook? Please link!
Local
Which Unsloth version, TRL version, transformers version, PyTorch version?
Unsloth: 2025.7.8
TRL (Transformers Reinforcement Learning): 0.19.1
Transformers: 4.53.2
PyTorch: 2.7.0+cu126 (CUDA 12.6)
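The versions above can be gathered in one go with the standard library's importlib.metadata. This is a minimal sketch; the package list is an assumption based on the distributions named in this report.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions relevant to this report (assumed names as published on PyPI).
PACKAGES = ("unsloth", "unsloth_zoo", "trl", "transformers", "torch")

def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result

if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver}")
```

The torch version string already carries the CUDA build tag (for example `+cu126`), so no separate CUDA query is needed for this purpose.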
Which trainer? (SFTTrainer, GRPOTrainer, etc.)
GRPOTrainer from trl, used in a plain Python script (train_deepseek_grpo_custom-dataset_v2_long_context.py).
Configs:
Script
script: train_deepseek_grpo_custom-dataset_v2.py
Model & data
model_name: "unsloth/DeepSeek-R1-0528-Qwen3-8B"
revision: "main"
dataset_name: "custom"
train_split: "train"
test_split: "test"
Output & logging
output_dir: "deepseek_r1_grpo"
wandb_project: "deepseek_qwen3_avy"
wandb_run_name: "training_5percent_dataset_new_reward_func"
Training hyperparameters
batch_size: 2
accum_steps: 1
total_steps: 30
lr: 2e-5
Context window & LoRA
max_length: 55000
max_new_tokens: 256
max_completion_len: 256
lora_rank: 16
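As a rough sanity check on these lengths, the worst-case tokens per sequence can be compared against the context length the model was loaded with. This sketch assumes max_length is the prompt budget and that completions are appended on top of it; MODEL_MAX_SEQ_LEN is a hypothetical placeholder, since the config above does not state the max_seq_length used when loading the model.

```python
# Lengths copied from the config above.
CONFIG = {"max_length": 55000, "max_completion_len": 256}

# Hypothetical: replace with the max_seq_length actually used when loading the model.
MODEL_MAX_SEQ_LEN = 32768

def worst_case_tokens(config):
    """Upper bound on tokens per sequence, assuming a full-length prompt plus a full completion."""
    return config["max_length"] + config["max_completion_len"]

def fits_context(config, model_max_seq_len):
    """True if the worst-case sequence fits within the model's configured context."""
    return worst_case_tokens(config) <= model_max_seq_len

if __name__ == "__main__":
    total = worst_case_tokens(CONFIG)
    ok = fits_context(CONFIG, MODEL_MAX_SEQ_LEN)
    print(f"worst case: {total} tokens, limit: {MODEL_MAX_SEQ_LEN}, fits: {ok}")
```

If max_length is instead the total sequence budget, the completion term should be dropped from worst_case_tokens.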
Error:
AssertionError: seq_len <= cos.shape[0]
File "…/unsloth/kernels/rope_embedding.py", line 91, in forward
assert(seq_len <= cos.shape[0])
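For context on what this check guards: RoPE implementations typically precompute a cos (and sin) table with one row per position up to some maximum, so cos.shape[0] is presumably the number of cached positions, and the assertion fires when an incoming sequence has more positions than that. The sketch below is a pure-Python illustration of that guard, not Unsloth's kernel; the function names and the base of 10000 are assumptions.

```python
import math

def build_rope_cos_cache(max_positions, head_dim, base=10000.0):
    """Precompute cos(pos * inv_freq) for positions 0..max_positions-1 (one row per position)."""
    half = head_dim // 2
    # Standard RoPE frequencies: inv_freq[i] = 1 / base^(2i / head_dim).
    inv_freq = [1.0 / (base ** (2 * i / head_dim)) for i in range(half)]
    return [[math.cos(pos * f) for f in inv_freq] for pos in range(max_positions)]

def rope_forward_check(seq_len, cos_cache):
    """Mirror the guard from the traceback: every position must have a cached row."""
    # Raise explicitly so the check still runs under `python -O`.
    if seq_len > len(cos_cache):
        raise AssertionError(f"seq_len ({seq_len}) > cached positions ({len(cos_cache)})")
    return cos_cache[:seq_len]

if __name__ == "__main__":
    cache = build_rope_cos_cache(max_positions=8, head_dim=4)
    print(len(rope_forward_check(8, cache)))  # fits: uses all 8 cached rows
    try:
        rope_forward_check(9, cache)
    except AssertionError as err:
        print("fails:", err)
```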