|
25 | 25 | "\n", |
26 | 26 | "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n", |
27 | 27 | "\n", |
28 | | - "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)" |
| 28 | + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n" |
29 | 29 | ] |
30 | 30 | }, |
31 | 31 | { |
|
44 | 44 | }, |
45 | 45 | "source": [ |
46 | 46 | "\n", |
47 | | - "New 3x faster training & 30% less VRAM. New kernels, padding-free & packing. [Blog](https://docs.unsloth.ai/new/3x-faster-training-packing)\n", |
48 | | - "\n", |
49 | | - "You can now train with 500K context windows on a single 80GB GPU. [Blog](https://docs.unsloth.ai/new/500k-context-length-fine-tuning)\n", |
| 47 | + "Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).\n", |
50 | 48 | "\n", |
51 | 49 | "Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).\n", |
52 | 50 | "\n", |
53 | | - "New in Reinforcement Learning: [FP8 RL](https://docs.unsloth.ai/new/fp8-reinforcement-learning) \u2022 [Vision RL](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) \u2022 [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) (faster, less VRAM RL) \u2022 [gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning)\n", |
| 51 | + "[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!\n", |
| 52 | + "\n", |
| 53 | + "Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.\n", |
54 | 54 | "\n", |
55 | 55 | "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n" |
56 | 56 | ] |
|
71 | 71 | "id": "AgKwzU9L9lvs" |
72 | 72 | }, |
73 | 73 | "outputs": [], |
74 | | - "source": "%%capture\nimport os\nos.environ[\"UNSLOTH_vLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/" |
| 74 | + "source": "%%capture\nimport os\nos.environ[\"UNSLOTH_VLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/" |
75 | 75 | }, |
76 | 76 | { |
77 | 77 | "cell_type": "code", |
|
80 | 80 | "id": "9jZhY2Dm9lvt" |
81 | 81 | }, |
82 | 82 | "outputs": [], |
83 | | - "source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; _numpy = f'numpy=={numpy.__version__}'; _pil = f'pillow=={PIL.__version__}'\n except: _numpy = \"numpy\"; _pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n _vllm, _triton = ('vllm==0.9.2', 'triton==3.2.0') if is_t4 else ('vllm==0.10.2', 'triton')\n !uv pip install -qqq --upgrade {_vllm} {_numpy} {_pil} torchvision bitsandbytes xformers unsloth\n !uv pip install -qqq {_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2" |
| 83 | + "source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; get_numpy = f\"numpy=={numpy.__version__}\"; get_pil = f\"pillow=={PIL.__version__}\"\n except: get_numpy = \"numpy\"; get_pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n get_vllm, get_triton = (\"vllm==0.9.2\", \"triton==3.2.0\") if is_t4 else (\"vllm==0.10.2\", \"triton\")\n !uv pip install -qqq --upgrade \\\n unsloth {get_vllm} {get_numpy} {get_pil} torchvision bitsandbytes xformers\n !uv pip install -qqq {get_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2" |
84 | 84 | }, |
85 | 85 | { |
86 | 86 | "cell_type": "markdown", |
|
97 | 97 | "id": "jN75nmdx9lvw" |
98 | 98 | }, |
99 | 99 | "source": [ |
100 | | - "Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)" |
| 100 | + "Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)\n" |
101 | 101 | ] |
102 | 102 | }, |
103 | 103 | { |
|
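For context, a minimal sketch of what the load step described above typically looks like with Unsloth's `FastLanguageModel` API. The model name, sequence length, LoRA rank and memory fraction below are illustrative assumptions, not values taken from this diff:

```python
from unsloth import FastLanguageModel

# Illustrative sketch only - all hyperparameters here are placeholder assumptions.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length = 1024,          # adjust to your GRPO prompt + completion budget
    load_in_4bit = True,            # 4-bit quantization to fit on a free Colab T4
    fast_inference = True,          # enable vLLM for fast rollout generation
    max_lora_rank = 32,
    gpu_memory_utilization = 0.6,   # fraction of VRAM reserved for vLLM
)
```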
218 | 218 | "\ud83e\udda5 Unsloth Zoo will now patch everything to make training faster!\n", |
219 | 219 | "INFO 05-16 11:53:29 [importing.py:53] Triton module has been replaced with a placeholder.\n", |
220 | 220 | "INFO 05-16 11:53:30 [__init__.py:239] Automatically detected platform cuda.\n", |
221 | | - "==((====))== Unsloth 2025.12.8: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n", |
| 221 | + "==((====))== Unsloth 2025.5.4: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n", |
222 | 222 | " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.\n", |
223 | 223 | "O^O/ \\_/ \\ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0\n", |
224 | 224 | "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]\n", |
|
491 | 491 | "name": "stderr", |
492 | 492 | "output_type": "stream", |
493 | 493 | "text": [ |
494 | | - "Unsloth 2025.12.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n" |
| 494 | + "Unsloth 2025.5.4 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n" |
495 | 495 | ] |
496 | 496 | } |
497 | 497 | ], |
|
1307 | 1307 | " \\\\ /| Num examples = 7,473 | Num Epochs = 1 | Total steps = 500\n", |
1308 | 1308 | "O^O/ \\_/ \\ Batch size per device = 4 | Gradient accumulation steps = 4\n", |
1309 | 1309 | "\\ / Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16\n", |
1310 | | - " \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)" |
| 1310 | + " \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)\n" |
1311 | 1311 | ] |
1312 | 1312 | }, |
1313 | 1313 | { |
|
12630 | 12630 | }, |
12631 | 12631 | "source": [ |
12632 | 12632 | "<a name=\"Save\"></a>\n", |
12633 | | - "### Saving to float16 for vLLM\n", |
| 12633 | + "### Saving to float16 for VLLM\n", |
12634 | 12634 | "\n", |
12635 | | - "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://docs.unsloth.ai/basics/inference-and-deployment) for more deployment options." |
| 12635 | + "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens." |
12636 | 12636 | ] |
12637 | 12637 | }, |
12638 | 12638 | { |
|
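As a reference for the merged-save cell described in the hunk above, a hedged sketch using the `push_to_hub_merged` helper the text names (the local `save_pretrained_merged` variant, repo name and token are placeholders):

```python
# Illustrative sketch only - repo name and token are placeholders,
# guarded with `if False:` like the notebook's own saving cells.
if False:
    # Merge the LoRA adapters into the base weights and save locally in float16
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
if False:
    # Or push the merged model to your Hugging Face account
    # (use "merged_4bit" for int4, or "lora" to upload only the adapters)
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
```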
12660 | 12660 | " tokenizer.save_pretrained(\"model\")\n", |
12661 | 12661 | "if False:\n", |
12662 | 12662 | " model.push_to_hub(\"hf/model\", token = \"\")\n", |
12663 | | - " tokenizer.push_to_hub(\"hf/model\", token = \"\")" |
| 12663 | + " tokenizer.push_to_hub(\"hf/model\", token = \"\")\n" |
12664 | 12664 | ] |
12665 | 12665 | }, |
12666 | 12666 | { |
|
12672 | 12672 | "### GGUF / llama.cpp Conversion\n", |
12673 | 12673 | "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", |
12674 | 12674 | "\n", |
12675 | | - "Some supported quant methods (full list on our [Wiki page](https://docs.unsloth.ai/basics/inference-and-deployment/saving-to-gguf#locally)):\n", |
| 12675 | + "Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n", |
12676 | 12676 | "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", |
12677 | 12677 | "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", |
12678 | 12678 | "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n", |
|
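Likewise, a hedged sketch of the GGUF export that the cell above describes, using the `save_pretrained_gguf` / `push_to_hub_gguf` helpers it names; the quantization method, repo name and token are placeholders:

```python
# Illustrative sketch only - quant method, repo name and token are placeholders.
if False:
    # Convert and save locally (the default is q8_0; q4_k_m is the recommended smaller option)
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False:
    # Or convert and upload the GGUF file to the Hugging Face Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
```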