
Commit fe36aa1

Merge pull request #161 from unslothai/revert-159-lfm2.5
Revert "Add LFM2.5 notebooks"
2 parents 6a94240 + f6c9a2f commit fe36aa1

444 files changed: +107092 additions, -135392 deletions


README.md

Lines changed: 54 additions & 74 deletions
Large diffs are not rendered by default.

nb/Advanced_Llama3_2_(3B)_GRPO_LoRA.ipynb

Lines changed: 15 additions & 15 deletions
@@ -25,7 +25,7 @@
 "\n",
 "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n",
 "\n",
-"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)"
+"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n"
 ]
 },
 {
@@ -44,13 +44,13 @@
 },
 "source": [
 "\n",
-"New 3x faster training & 30% less VRAM. New kernels, padding-free & packing. [Blog](https://docs.unsloth.ai/new/3x-faster-training-packing)\n",
-"\n",
-"You can now train with 500K context windows on a single 80GB GPU. [Blog](https://docs.unsloth.ai/new/500k-context-length-fine-tuning)\n",
+"Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).\n",
 "\n",
 "Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).\n",
 "\n",
-"New in Reinforcement Learning: [FP8 RL](https://docs.unsloth.ai/new/fp8-reinforcement-learning) \u2022 [Vision RL](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) \u2022 [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) (faster, less VRAM RL) \u2022 [gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning)\n",
+"[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!\n",
+"\n",
+"Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.\n",
 "\n",
 "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n"
 ]
@@ -71,7 +71,7 @@
 "id": "AgKwzU9L9lvs"
 },
 "outputs": [],
-"source": "%%capture\nimport os\nos.environ[\"UNSLOTH_vLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/"
+"source": "%%capture\nimport os\nos.environ[\"UNSLOTH_VLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/"
 },
 {
 "cell_type": "code",
@@ -80,7 +80,7 @@
 "id": "9jZhY2Dm9lvt"
 },
 "outputs": [],
-"source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; _numpy = f'numpy=={numpy.__version__}'; _pil = f'pillow=={PIL.__version__}'\n except: _numpy = \"numpy\"; _pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n _vllm, _triton = ('vllm==0.9.2', 'triton==3.2.0') if is_t4 else ('vllm==0.10.2', 'triton')\n !uv pip install -qqq --upgrade {_vllm} {_numpy} {_pil} torchvision bitsandbytes xformers unsloth\n !uv pip install -qqq {_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2"
+"source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; get_numpy = f\"numpy=={numpy.__version__}\"; get_pil = f\"pillow=={PIL.__version__}\"\n except: get_numpy = \"numpy\"; get_pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n get_vllm, get_triton = (\"vllm==0.9.2\", \"triton==3.2.0\") if is_t4 else (\"vllm==0.10.2\", \"triton\")\n !uv pip install -qqq --upgrade \\\n unsloth {get_vllm} {get_numpy} {get_pil} torchvision bitsandbytes xformers\n !uv pip install -qqq {get_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2"
 },
 {
 "cell_type": "markdown",
@@ -97,7 +97,7 @@
 "id": "jN75nmdx9lvw"
 },
 "source": [
-"Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)"
+"Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)\n"
 ]
 },
 {
@@ -218,7 +218,7 @@
 "\ud83e\udda5 Unsloth Zoo will now patch everything to make training faster!\n",
 "INFO 05-16 11:53:29 [importing.py:53] Triton module has been replaced with a placeholder.\n",
 "INFO 05-16 11:53:30 [__init__.py:239] Automatically detected platform cuda.\n",
-"==((====))== Unsloth 2025.12.8: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n",
+"==((====))== Unsloth 2025.5.4: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n",
 " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.\n",
 "O^O/ \\_/ \\ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0\n",
 "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]\n",
@@ -491,7 +491,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Unsloth 2025.12.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n"
+"Unsloth 2025.5.4 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n"
 ]
 }
 ],
@@ -1307,7 +1307,7 @@
 " \\\\ /| Num examples = 7,473 | Num Epochs = 1 | Total steps = 500\n",
 "O^O/ \\_/ \\ Batch size per device = 4 | Gradient accumulation steps = 4\n",
 "\\ / Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16\n",
-" \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)"
+" \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)\n"
 ]
 },
 {
@@ -12630,9 +12630,9 @@
 },
 "source": [
 "<a name=\"Save\"></a>\n",
-"### Saving to float16 for vLLM\n",
+"### Saving to float16 for VLLM\n",
 "\n",
-"We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://docs.unsloth.ai/basics/inference-and-deployment) for more deployment options."
+"We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens."
 ]
 },
 {
@@ -12660,7 +12660,7 @@
 " tokenizer.save_pretrained(\"model\")\n",
 "if False:\n",
 " model.push_to_hub(\"hf/model\", token = \"\")\n",
-" tokenizer.push_to_hub(\"hf/model\", token = \"\")"
+" tokenizer.push_to_hub(\"hf/model\", token = \"\")\n"
 ]
 },
 {
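The markdown in the "Saving to float16" hunk above mentions `merged_16bit`, `merged_4bit` and `push_to_hub_merged`. As a rough sketch of how those options are typically invoked (assuming the notebook's `model` and `tokenizer` objects; the "hf/model" repo and empty token are placeholders):

# Hedged sketch of the merged float16 export described in the hunk above.
# Assumes `model` and `tokenizer` come from the notebook's FastLanguageModel setup.
if False:  # guarded like the notebook cells so it never runs by accident
    # Merge the LoRA adapters into the base weights and save locally as float16
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
    # Or push the merged model straight to the Hugging Face Hub (placeholder repo/token)
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")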
@@ -12672,7 +12672,7 @@
 "### GGUF / llama.cpp Conversion\n",
 "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n",
 "\n",
-"Some supported quant methods (full list on our [Wiki page](https://docs.unsloth.ai/basics/inference-and-deployment/saving-to-gguf#locally)):\n",
+"Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n",
 "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n",
 "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n",
 "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n",

nb/CodeForces-cot-Finetune_for_Reasoning_on_CodeForces.ipynb

Lines changed: 14 additions & 14 deletions
@@ -13,7 +13,7 @@
 "\n",
 "To install Unsloth your local device, follow [our guide](https://docs.unsloth.ai/get-started/install-and-update). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).\n",
 "\n",
-"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)"
+"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n"
 ]
 },
 {
@@ -28,13 +28,13 @@
 "metadata": {},
 "source": [
 "\n",
-"New 3x faster training & 30% less VRAM. New kernels, padding-free & packing. [Blog](https://docs.unsloth.ai/new/3x-faster-training-packing)\n",
-"\n",
-"You can now train with 500K context windows on a single 80GB GPU. [Blog](https://docs.unsloth.ai/new/500k-context-length-fine-tuning)\n",
+"Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).\n",
 "\n",
 "Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).\n",
 "\n",
-"New in Reinforcement Learning: [FP8 RL](https://docs.unsloth.ai/new/fp8-reinforcement-learning) \u2022 [Vision RL](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) \u2022 [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) (faster, less VRAM RL) \u2022 [gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning)\n",
+"[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!\n",
+"\n",
+"Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.\n",
 "\n",
 "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n"
 ]
@@ -51,7 +51,7 @@
 "execution_count": null,
 "metadata": {},
 "outputs": [],
-"source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth # Do this in local & cloud setups\nelse:\n import torch; v = re.match(r'[\\d]{1,}\\.[\\d]{1,}', str(torch.__version__)).group(0)\n xformers = 'xformers==' + {'2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, \"0.0.33.post1\")\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth\n!pip install transformers==4.56.2 && pip install --no-deps trl==0.22.2"
+"source": "%%capture\nimport os, re\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n !pip install unsloth\nelse:\n # Do this only in Colab notebooks! Otherwise use pip install unsloth\n import torch; v = re.match(r\"[0-9]{1,}\\.[0-9]{1,}\", str(torch.__version__)).group(0)\n xformers = \"xformers==\" + (\"0.0.33.post1\" if v==\"2.9\" else \"0.0.32.post2\" if v==\"2.8\" else \"0.0.29.post3\")\n !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo\n !pip install sentencepiece protobuf \"datasets==4.3.0\" \"huggingface_hub>=0.34.0\" hf_transfer\n !pip install --no-deps unsloth\n!pip install transformers==4.56.2\n!pip install --no-deps trl==0.22.2"
 },
 {
 "cell_type": "markdown",
@@ -71,7 +71,7 @@
 "text": [
 "\ud83e\udda5 Unsloth: Will patch your computer to enable 2x faster free finetuning.\n",
 "\ud83e\udda5 Unsloth Zoo will now patch everything to make training faster!\n",
-"==((====))== Unsloth 2025.12.8: Fast Qwen2 patching. Transformers: 4.48.3.\n",
+"==((====))== Unsloth 2025.3.15: Fast Qwen2 patching. Transformers: 4.48.3.\n",
 " \\\\ /| NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.\n",
 "O^O/ \\_/ \\ Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0\n",
 "\\ / Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]\n",
@@ -237,7 +237,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Unsloth 2025.12.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n"
+"Unsloth 2025.3.15 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n"
 ]
 }
 ],
@@ -753,7 +753,7 @@
 "source": [
 "<a name=\"Train\"></a>\n",
 "### Train the model\n",
-"Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!!"
+"Now let's train our model. We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`. We also support TRL's `DPOTrainer`!"
 ]
 },
 {
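The markdown in the hunk above describes the trade-off between a quick `max_steps = 60` demo run and a full `num_train_epochs = 1` run. A hedged sketch of how that toggle looks with TRL's `SFTConfig` (field names follow TRL/transformers defaults; batch sizes and `output_dir` are placeholders, not the notebook's exact values):

# Hedged sketch only; the notebook's actual trainer config may differ.
from trl import SFTConfig

quick_run = SFTConfig(
    max_steps = 60,                      # short demo run, as in the notebook
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    output_dir = "outputs",
)

full_run = SFTConfig(
    num_train_epochs = 1,                # one full pass over the dataset
    max_steps = -1,                      # -1 disables the step cap in transformers
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    output_dir = "outputs",
)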
@@ -874,7 +874,7 @@
 " \\\\ /| Num examples = 40,665 | Num Epochs = 1 | Total steps = 60\n",
 "O^O/ \\_/ \\ Batch size per device = 2 | Gradient accumulation steps = 4\n",
 "\\ / Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8\n",
-" \"-____-\" Trainable parameters = 40,370,176/7,000,000,000 (0.58% trained)"
+" \"-____-\" Trainable parameters = 40,370,176/7,000,000,000 (0.58% trained)\n"
 ]
 },
 {
@@ -1459,9 +1459,9 @@
 "id": "f422JgM9sdVT"
 },
 "source": [
-"### Saving to float16 for vLLM\n",
+"### Saving to float16 for VLLM\n",
 "\n",
-"We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://docs.unsloth.ai/basics/inference-and-deployment) for more deployment options."
+"We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens."
 ]
 },
 {
@@ -1486,7 +1486,7 @@
 " tokenizer.save_pretrained(\"model\")\n",
 "if False:\n",
 " model.push_to_hub(\"hf/model\", token = \"\")\n",
-" tokenizer.push_to_hub(\"hf/model\", token = \"\")"
+" tokenizer.push_to_hub(\"hf/model\", token = \"\")\n"
 ]
 },
 {
@@ -1498,7 +1498,7 @@
 "### GGUF / llama.cpp Conversion\n",
 "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n",
 "\n",
-"Some supported quant methods (full list on our [Wiki page](https://docs.unsloth.ai/basics/inference-and-deployment/saving-to-gguf#locally)):\n",
+"Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n",
 "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n",
 "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n",
 "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n",
