[Bug] grad_norm = NaN while executing Mistral_v0.3_(7B)-GRPO notebook with gradient_accumulation_steps=4

When running the Colab notebook Mistral_v0.3_(7B)-GRPO.ipynb with gradient_accumulation_steps=4, grad_norm becomes NaN during training.
Environment:
Python: 3.11.13
unsloth: 2025.8.4
vllm: 0.8.5.post1
GPU: NVIDIA T4 (Colab)
With gradient_accumulation_steps=1, however, I do not see any NaN values in grad_norm.
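A small helper can make the two settings easier to compare by listing the steps where grad_norm became non-finite. The sketch below assumes the trainer exposes Hugging Face's `trainer.state.log_history`, which is a list of dicts carrying `step` and `grad_norm` keys. `find_nan_grad_norm_steps` is a hypothetical helper name, and the sample entries are made up for illustration. They are not taken from the failing run.

```python
import math
from typing import Any, Dict, List


def find_nan_grad_norm_steps(log_history: List[Dict[str, Any]]) -> List[int]:
    """Return the steps whose logged grad_norm is NaN or infinite.

    Assumption: `log_history` has the shape of Hugging Face
    `trainer.state.log_history`, i.e. a list of dicts where training
    entries carry a "grad_norm" key next to the "step" key.
    """
    bad_steps = []
    for entry in log_history:
        value = entry.get("grad_norm")
        if value is None:
            continue  # entries without grad_norm (e.g. final summary) are skipped
        if not math.isfinite(float(value)):
            bad_steps.append(entry.get("step", -1))
    return bad_steps


# Hypothetical log entries, for illustration only (not from the actual run).
example_history = [
    {"step": 1, "loss": 0.0, "grad_norm": 0.41},
    {"step": 2, "loss": 0.0, "grad_norm": float("nan")},
    {"step": 3, "loss": 0.0},
]
print(find_nan_grad_norm_steps(example_history))  # [2]
```

Assume the notebook's trainer object is named `trainer`. You could then call `find_nan_grad_norm_steps(trainer.state.log_history)` after training with each gradient_accumulation_steps value. This would show whether NaN appears from the first optimizer step or only later in the run.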