TMA benchmarks will be running without grid constant TMA descriptor.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
[xformers|WARNING] WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.8.0+cu128 with CUDA 1208 (you have 2.9.0+cu129)
    Python 3.9.23 (you have 3.10.18)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Unsloth: Your Flash Attention 2 installation seems to be broken?
A possible explanation is you have a new CUDA version which isn't

Unsloth 2025.11.3 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['attention_norm', 'cross_attn_input_layernorm', 'cross_attn_post_attention_layernorm', 'post_layernorm', 'norm2', 'post_attention_layernorm', 'input_layernorm', 'pre_feedforward_layernorm', 'norm', 'post_feedforward_layernorm', 'k_norm', 'layer_norm1', 'q_norm', 'layer_norm2', 'ffn_norm', 'norm1']
Unsloth: Tokenizing ["text"] (num_proc=64): 100%|██████████| 1126/1126 [00:11<00:00, 95.40 examples/s]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,126 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
 "-____-"     Trainable parameters = 87,293,952 of 8,278,029,312 (1.05% trained)
{'loss': 0.5516, 'grad_norm': 0.5312474966049194, 'learning_rate': 0.0,

# Modeled after https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb
from unsloth import FastLanguageModel
import gc
import os
import re
from datasets import load_dataset, Dataset
from trl import GRPOConfig, GRPOTrainer, SFTConfig, SFTTrainer

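The preview cuts off after the imports. For orientation, a minimal trl GRPO loop wired up from these imports looks roughly like the sketch below; the dataset, reward function, model name, and hyperparameters are placeholders rather than the gist's actual values (the real script loads the model through FastLanguageModel and follows the Qwen3 GRPO notebook).

# Minimal sketch only: placeholder dataset/reward/model, not the gist's actual script.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder prompt dataset

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters long.
    return [-abs(50 - len(completion)) for completion in completions]

training_args = GRPOConfig(output_dir="qwen3-grpo-sketch", max_steps=100)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-4B",      # placeholder; the gist passes an Unsloth FastLanguageModel instead
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
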
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/andrewor/local/unsloth-zoo/unsloth_zoo/vllm_utils.py", line 884, in get_state_dict
[rank0]:     weight = qweight[dim_offsets[kk] : dim_offsets[kk + 1]]
[rank0]:   File "/home/andrewor/local/ao/torchao/utils.py", line 662, in _dispatch__torch_function__
[rank0]:     raise e
[rank0]:   File "/home/andrewor/local/ao/torchao/utils.py", line 659, in _dispatch__torch_function__
[rank0]:     return func(*args, **kwargs)
[rank0]: RuntimeError: Cannot set version_counter for inference tensor
[rank0]: Exception raised from set_version_counter at /pytorch/c10/core/TensorImpl.h:2117 (most recent call first):
[rank0]: C++ CapturedTraceback:

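The failing line slices a torchao-quantized weight that was materialized under torch.inference_mode(); inference tensors have no autograd version counter, so anything in the dispatch path that tries to set one raises this RuntimeError. The sketch below shows the underlying restriction and the usual workaround of cloning into a normal, version-tracked tensor first; it illustrates the general failure class, not the exact Unsloth/torchao call stack above.

# Sketch of the inference-tensor restriction, not the exact failure above.
import torch

with torch.inference_mode():
    qweight = torch.randn(8, 4)   # stands in for a weight created under inference mode

try:
    qweight.add_(1.0)             # mutating an inference tensor outside inference mode is rejected
except RuntimeError as e:
    print(e)                      # "Inplace update to inference tensor outside InferenceMode is not allowed..."

normal = qweight.clone()          # clone() outside inference mode yields a normal, version-tracked tensor
weight = normal[0:4]              # slicing and further updates now work
weight.add_(1.0)
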
class _UnslothGRPOTrainer(Trainer):
    ...
    def _sync_fsdp1_params_to_vllm(self, module: nn.Module, prefix: str = "", visited=None):
        """Memory-efficient post-order traversal of FSDP modules to extract full parameters and sync with vLLM."""
        # For FSDP1, we need to recurse into children and also use summon_full_params
        if visited is None:
            visited = set()
        for child_name, child_module in module.named_children():
            child_prefix = f"{prefix}.{child_name}" if prefix else child_name

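The preview ends inside the child loop. The general shape of this pattern, sketched below as a standalone function rather than trl's exact implementation, is: recurse into children first, then use FSDP1's summon_full_params with recurse=False so each wrapper gathers only its own shards before handing full tensors to vLLM. The visited set and the vLLM hand-off are illustrative stubs.

# Illustrative post-order traversal; assumptions: visited tracks parameter names, vLLM push is a stub.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def sync_fsdp1_params(module: nn.Module, prefix: str = "", visited=None):
    if visited is None:
        visited = set()
    for child_name, child_module in module.named_children():
        child_prefix = f"{prefix}.{child_name}" if prefix else child_name
        sync_fsdp1_params(child_module, prefix=child_prefix, visited=visited)  # children first (post-order)
    if isinstance(module, FSDP):
        # Gather only this wrapper's shards; writeback=False because the params are read-only here.
        with FSDP.summon_full_params(module, recurse=False, writeback=False):
            for name, param in module.named_parameters():
                full_name = f"{prefix}.{name}" if prefix else name
                if full_name in visited:
                    continue          # already synced by a nested FSDP child
                visited.add(full_name)
                ...                   # push (full_name, param.data) into the vLLM weight loader
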
from unsloth import FastLanguageModel
import torch
from torchao.quantization import Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TextStreamer, TorchAoConfig

qat_scheme = "int4"
save_output_path = "/tmp/unsloth_model"
max_seq_length = 2048

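The preview stops at the configuration variables. The imports indicate the usual torchao flow: after QAT finetuning, the saved checkpoint is reloaded with an int4 TorchAoConfig so the fake-quantized weights are converted to real int4. A rough sketch under that assumption follows; the group size is a placeholder and the Unsloth QAT hook itself is not shown in the preview.

# Sketch of the post-training conversion/reload step; not the gist's full script.
import torch
from torchao.quantization import Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TorchAoConfig

quant_config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))  # group size is an assumption
model = AutoModelForCausalLM.from_pretrained(
    save_output_path,                  # "/tmp/unsloth_model" from above
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,  # quantize weights to int4 on load
)
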
from unsloth import FastLanguageModel
import torch
from torchao.quantization import Float8DynamicActivationInt4WeightConfig
from transformers import AutoModelForCausalLM, TextStreamer, TorchAoConfig
qat_scheme = "fp8-int4"
save_output_path = "/tmp/unsloth_model"
max_seq_length = 2048

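This second variant differs only in the target scheme: fp8 dynamic activations with int4 weights. On the torchao side the analogous config swap would look like the snippet below (again a sketch, with default constructor arguments assumed):

# Same reload pattern as the int4 example, with the fp8-activation/int4-weight config swapped in.
from torchao.quantization import Float8DynamicActivationInt4WeightConfig
from transformers import TorchAoConfig

quant_config = TorchAoConfig(quant_type=Float8DynamicActivationInt4WeightConfig())
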
# Based on https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb
# but with `full_finetuning=True` and without `get_peft_model`
import os
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(

# Based on https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb
# but with `full_finetuning=True` and without `get_peft_model`
# Output is at the bottom of the gist
import os
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch

max_seq_length = 2048

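Both full-finetuning previews cut off at or just before the from_pretrained call. A plausible continuation, following the referenced Alpaca notebook but with full finetuning enabled, is sketched below; the model name and flags are assumptions, not copied from the gist.

# Sketch of how the truncated call typically continues for full finetuning.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # model used in the referenced notebook
    max_seq_length=max_seq_length,
    dtype=None,                 # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=False,         # keep weights in full precision for full finetuning
    full_finetuning=True,       # train all parameters instead of attaching LoRA adapters
)
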
$ GRADIO_SERVER_NAME="0.0.0.0" python test_sayak.py
/home/andrewor/local/ao/torchao/utils.py:408: UserWarning: TORCH_VERSION_AT_LEAST_2_8 is deprecated and will be removed in torchao 0.14.0
  warnings.warn(self.msg)
/home/andrewor/local/ao/torchao/utils.py:408: UserWarning: TORCH_VERSION_AT_LEAST_2_7 is deprecated and will be removed in torchao 0.14.0
  warnings.warn(self.msg)
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 65.28it/s]
Step 1: Applying QAT observers to the model...
/home/andrewor/local/ao/torchao/quantization/qat/utils.py:84: UserWarning: 'FakeQuantizeConfig' is deprecated and will be removed in a future release. Please use the following API instead: