[Bug] grad_norm = NaN while executing Mistral_v0.3_(7B)-GRPO notebook with gradient_accumulation_steps=4 #3129

@truptimohanty

Description

While executing the Colab notebook Mistral_v0.3_(7B)-GRPO.ipynb with gradient_accumulation_steps=4, I encounter grad_norm = NaN during training.
Environment:
Python: 3.11.13
unsloth: 2025.8.4
vllm: 0.8.5.post1
GPU: NVIDIA T4 (Colab)
I don't see this NaN value when gradient_accumulation_steps=1.
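
To compare the two settings, it may help to pin down the first step at which grad_norm stops being finite. Below is a minimal sketch for that, assuming the per-step training logs have been collected as a list of dicts with a `grad_norm` key. The function name `first_nonfinite_step` and the sample values are hypothetical and not taken from the actual run.

```python
import math


def first_nonfinite_step(logs, key="grad_norm"):
    """Return the 1-based position of the first entry whose `key` is NaN or inf, else None."""
    for step, entry in enumerate(logs, start=1):
        value = entry.get(key)
        # Entries without the key (e.g. eval-only logs) are skipped.
        if value is not None and not math.isfinite(float(value)):
            return step
    return None


# Hypothetical log entries for illustration only (not real output from the notebook).
example_logs = [
    {"loss": 0.0, "grad_norm": 0.42},
    {"loss": 0.0, "grad_norm": float("nan")},
    {"loss": 0.0, "grad_norm": 0.38},
]

print(first_nonfinite_step(example_logs))
```

Running the same check on the logs from gradient_accumulation_steps=1 and gradient_accumulation_steps=4 would show whether the NaN appears from the first optimizer step or only later in training.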

Image
