|
25 | 25 | "\n", |
26 | 26 | "To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n", |
27 | 27 | "\n", |
28 | | - "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)" |
| 28 | + "You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n" |
29 | 29 | ] |
30 | 30 | }, |
31 | 31 | { |
|
44 | 44 | }, |
45 | 45 | "source": [ |
46 | 46 | "\n", |
47 | | - "New 3x faster training & 30% less VRAM. New kernels, padding-free & packing. [Blog](https://docs.unsloth.ai/new/3x-faster-training-packing)\n", |
48 | | - "\n", |
49 | | - "You can now train with 500K context windows on a single 80GB GPU. [Blog](https://docs.unsloth.ai/new/500k-context-length-fine-tuning)\n", |
| 47 | + "Introducing FP8 precision training for faster RL inference. [Read Blog](https://docs.unsloth.ai/new/fp8-reinforcement-learning).\n", |
50 | 48 | "\n", |
51 | 49 | "Unsloth's [Docker image](https://hub.docker.com/r/unsloth/unsloth) is here! Start training with no setup & environment issues. [Read our Guide](https://docs.unsloth.ai/new/how-to-train-llms-with-unsloth-and-docker).\n", |
52 | 50 | "\n", |
53 | | - "New in Reinforcement Learning: [FP8 RL](https://docs.unsloth.ai/new/fp8-reinforcement-learning) \u2022 [Vision RL](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) \u2022 [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) (faster, less VRAM RL) \u2022 [gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning)\n", |
| 51 | + "[gpt-oss RL](https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning) is now supported with the fastest inference & lowest VRAM. Try our [new notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/gpt-oss-(20B)-GRPO.ipynb) which creates kernels!\n", |
| 52 | + "\n", |
| 53 | + "Introducing [Vision](https://docs.unsloth.ai/new/vision-reinforcement-learning-vlm-rl) and [Standby](https://docs.unsloth.ai/basics/memory-efficient-rl) for RL! Train Qwen, Gemma etc. VLMs with GSPO - even faster with less VRAM.\n", |
54 | 54 | "\n", |
55 | 55 | "Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n" |
56 | 56 | ] |
|
71 | 71 | "id": "AgKwzU9L9lvs" |
72 | 72 | }, |
73 | 73 | "outputs": [], |
74 | | - "source": "%%capture\nimport os\nos.environ[\"UNSLOTH_vLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/" |
| 74 | + "source": "%%capture\nimport os\nos.environ[\"UNSLOTH_VLLM_STANDBY\"] = \"1\" # [NEW] Extra 30% context lengths!\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install or uv pip install\n !pip install unsloth vllm\nelse:\n pass # For Colab / Kaggle, we need extra instructions hidden below \\/" |
75 | 75 | }, |
76 | 76 | { |
77 | 77 | "cell_type": "code", |
|
80 | 80 | "id": "9jZhY2Dm9lvt" |
81 | 81 | }, |
82 | 82 | "outputs": [], |
83 | | - "source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; _numpy = f'numpy=={numpy.__version__}'; _pil = f'pillow=={PIL.__version__}'\n except: _numpy = \"numpy\"; _pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n _vllm, _triton = ('vllm==0.9.2', 'triton==3.2.0') if is_t4 else ('vllm==0.10.2', 'triton')\n !uv pip install -qqq --upgrade {_vllm} {_numpy} {_pil} torchvision bitsandbytes xformers unsloth\n !uv pip install -qqq {_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2" |
| 83 | + "source": "#@title Colab Extra Install { display-mode: \"form\" }\n%%capture\nimport os\n!pip install --upgrade -qqq uv\nif \"COLAB_\" not in \"\".join(os.environ.keys()):\n # If you're not in Colab, just use pip install!\n !pip install unsloth vllm\nelse:\n try: import numpy, PIL; get_numpy = f\"numpy=={numpy.__version__}\"; get_pil = f\"pillow=={PIL.__version__}\"\n except: get_numpy = \"numpy\"; get_pil = \"pillow\"\n try: import subprocess; is_t4 = \"Tesla T4\" in str(subprocess.check_output([\"nvidia-smi\"]))\n except: is_t4 = False\n get_vllm, get_triton = (\"vllm==0.9.2\", \"triton==3.2.0\") if is_t4 else (\"vllm==0.10.2\", \"triton\")\n !uv pip install -qqq --upgrade \\\n unsloth {get_vllm} {get_numpy} {get_pil} torchvision bitsandbytes xformers\n !uv pip install -qqq {get_triton}\n!uv pip install transformers==4.56.2\n!uv pip install --no-deps trl==0.22.2" |
84 | 84 | }, |
85 | 85 | { |
86 | 86 | "cell_type": "markdown", |
|
97 | 97 | "id": "jN75nmdx9lvw" |
98 | 98 | }, |
99 | 99 | "source": [ |
100 | | - "Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)" |
| 100 | + "Load up `Llama 3.2 3B Instruct`, and set parameters. To finetune a base model from scratch, check out our `Qwen 3 4B Base GRPO` notebook [here](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb)\n" |
101 | 101 | ] |
102 | 102 | }, |
103 | 103 | { |
|
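For context, a minimal sketch of what the load step described above typically looks like with Unsloth's `FastLanguageModel` API. The model name, sequence length, LoRA rank and memory fraction below are illustrative assumptions, not values taken from this diff:

```python
from unsloth import FastLanguageModel

# Illustrative sketch only - all hyperparameters here are placeholder assumptions.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length = 1024,          # adjust to your GRPO prompt + completion budget
    load_in_4bit = True,            # 4-bit quantization to fit on a free Colab T4
    fast_inference = True,          # enable vLLM for fast rollout generation
    max_lora_rank = 32,
    gpu_memory_utilization = 0.6,   # fraction of VRAM reserved for vLLM
)
```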
218 | 218 | "\ud83e\udda5 Unsloth Zoo will now patch everything to make training faster!\n", |
219 | 219 | "INFO 05-16 11:53:29 [importing.py:53] Triton module has been replaced with a placeholder.\n", |
220 | 220 | "INFO 05-16 11:53:30 [__init__.py:239] Automatically detected platform cuda.\n", |
221 | | - "==((====))== Unsloth 2025.12.8: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n", |
| 221 | + "==((====))== Unsloth 2025.5.4: Fast Llama patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.\n", |
222 | 222 | " \\\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.\n", |
223 | 223 | "O^O/ \\_/ \\ Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0\n", |
224 | 224 | "\\ / Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]\n", |
|
491 | 491 | "name": "stderr", |
492 | 492 | "output_type": "stream", |
493 | 493 | "text": [ |
494 | | - "Unsloth 2025.12.8 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n" |
| 494 | + "Unsloth 2025.5.4 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.\n" |
495 | 495 | ] |
496 | 496 | } |
497 | 497 | ], |
|
1307 | 1307 | " \\\\ /| Num examples = 7,473 | Num Epochs = 1 | Total steps = 500\n", |
1308 | 1308 | "O^O/ \\_/ \\ Batch size per device = 4 | Gradient accumulation steps = 4\n", |
1309 | 1309 | "\\ / Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16\n", |
1310 | | - " \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)" |
| 1310 | + " \"-____-\" Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)\n" |
1311 | 1311 | ] |
1312 | 1312 | }, |
1313 | 1313 | { |
|
12630 | 12630 | }, |
12631 | 12631 | "source": [ |
12632 | 12632 | "<a name=\"Save\"></a>\n", |
12633 | | - "### Saving to float16 for vLLM\n", |
| 12633 | + "### Saving to float16 for VLLM\n", |
12634 | 12634 | "\n", |
12635 | | - "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See [our docs](https://docs.unsloth.ai/basics/inference-and-deployment) for more deployment options." |
| 12635 | + "We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens." |
12636 | 12636 | ] |
12637 | 12637 | }, |
12638 | 12638 | { |
|
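As a reference for the merged-save cell described in the hunk above, a hedged sketch using the `push_to_hub_merged` helper the text names (the local `save_pretrained_merged` variant, repo name and token are placeholders):

```python
# Illustrative sketch only - repo name and token are placeholders,
# guarded with `if False:` like the notebook's own saving cells.
if False:
    # Merge the LoRA adapters into the base weights and save locally in float16
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
if False:
    # Or push the merged model to your Hugging Face account
    # (use "merged_4bit" for int4, or "lora" to upload only the adapters)
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")
```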
12660 | 12660 | " tokenizer.save_pretrained(\"model\")\n", |
12661 | 12661 | "if False:\n", |
12662 | 12662 | " model.push_to_hub(\"hf/model\", token = \"\")\n", |
12663 | | - " tokenizer.push_to_hub(\"hf/model\", token = \"\")" |
| 12663 | + " tokenizer.push_to_hub(\"hf/model\", token = \"\")\n" |
12664 | 12664 | ] |
12665 | 12665 | }, |
12666 | 12666 | { |
|
12672 | 12672 | "### GGUF / llama.cpp Conversion\n", |
12673 | 12673 | "To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.\n", |
12674 | 12674 | "\n", |
12675 | | - "Some supported quant methods (full list on our [Wiki page](https://docs.unsloth.ai/basics/inference-and-deployment/saving-to-gguf#locally)):\n", |
| 12675 | + "Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):\n", |
12676 | 12676 | "* `q8_0` - Fast conversion. High resource use, but generally acceptable.\n", |
12677 | 12677 | "* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.\n", |
12678 | 12678 | "* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.\n", |
|
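Likewise, a hedged sketch of the GGUF export that the cell above describes, using the `save_pretrained_gguf` / `push_to_hub_gguf` helpers it names; the quantization method, repo name and token are placeholders:

```python
# Illustrative sketch only - quant method, repo name and token are placeholders.
if False:
    # Convert and save locally (the default is q8_0; q4_k_m is the recommended smaller option)
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False:
    # Or convert and upload the GGUF file to the Hugging Face Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
```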