andrewor14 / gist:d45690463599d0fb6f1bde8446f7fe68
Created November 19, 2025 02:57
Unsloth fp8 test for #3440
TMA benchmarks will be running without grid constant TMA descriptor.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
[xformers|WARNING]WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.8.0+cu128 with CUDA 1208 (you have 2.9.0+cu129)
Python 3.9.23 (you have 3.10.18)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Unsloth: Your Flash Attention 2 installation seems to be broken?
A possible explanation is you have a new CUDA version which isn't
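The mismatch xFormers reports above (built for PyTorch 2.8.0+cu128, running on 2.9.0+cu129) is easy to confirm from Python; a minimal check using only stock torch APIs:

import torch

# Confirm the interpreter's PyTorch and CUDA versions against what
# xFormers was built for (2.8.0+cu128 per the warning above).
print(torch.__version__)   # e.g. 2.9.0+cu129 here
print(torch.version.cuda)  # CUDA toolkit version this PyTorch build uses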
andrewor14 / gist:8c3909d08e0fcecd26a5fb8e155b9521
Created November 18, 2025 22:31
Unsloth GRPO data script
Unsloth 2025.11.3 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
Performing substitution for additional_keys=set()
Unsloth: Just some info: will skip parsing ['attention_norm', 'cross_attn_input_layernorm', 'cross_attn_post_attention_layernorm', 'post_layernorm', 'norm2', 'post_attention_layernorm', 'input_layernorm', 'pre_feedforward_layernorm', 'norm', 'post_feedforward_layernorm', 'k_norm', 'layer_norm1', 'q_norm', 'layer_norm2', 'ffn_norm', 'norm1']
Unsloth: Tokenizing ["text"] (num_proc=64): 100%|██████████| 1126/1126 [00:11<00:00, 95.40 examples/s]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,126 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 1 x 1) = 4
 "-____-"     Trainable parameters = 87,293,952 of 8,278,029,312 (1.05% trained)
{'loss': 0.5516, 'grad_norm': 0.5312474966049194, 'learning_rate': 0.0,
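For reference, a minimal sketch of a trl GRPO setup that would produce the run shape logged above (batch size 4 per device, no gradient accumulation, 100 total steps); the model id, toy dataset, and reward function below are placeholders, not the gist's actual values:

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy placeholder dataset; GRPO expects a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Say hi.", "Count to 3."]})

def reward_fn(completions, **kwargs):
    # Placeholder reward favoring longer completions, purely illustrative.
    return [float(len(c)) for c in completions]

args = GRPOConfig(
    per_device_train_batch_size=4,  # "Batch size per device = 4"
    gradient_accumulation_steps=1,  # "Gradient accumulation steps = 1"
    max_steps=100,                  # "Total steps = 100"
    num_generations=4,              # assumption, chosen to divide the batch
    output_dir="outputs",
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder; the run used an Unsloth model
    reward_funcs=[reward_fn],
    args=args,
    train_dataset=dataset,
)
trainer.train()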
andrewor14 / gist:5b85119fae46845d07b608d420907423
Created November 11, 2025 20:54
Unsloth FP8 + GRPO test script
# Modeled after https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen3_(4B)-GRPO.ipynb
from unsloth import FastLanguageModel
import gc
import os
import re
from datasets import load_dataset, Dataset
from trl import GRPOConfig, GRPOTrainer, SFTConfig, SFTTrainer
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/andrewor/local/unsloth-zoo/unsloth_zoo/vllm_utils.py", line 884, in get_state_dict
[rank0]:     weight = qweight[dim_offsets[kk] : dim_offsets[kk + 1]]
[rank0]:   File "/home/andrewor/local/ao/torchao/utils.py", line 662, in _dispatch__torch_function__
[rank0]:     raise e
[rank0]:   File "/home/andrewor/local/ao/torchao/utils.py", line 659, in _dispatch__torch_function__
[rank0]:     return func(*args, **kwargs)
[rank0]: RuntimeError: Cannot set version_counter for inference tensor
[rank0]: Exception raised from set_version_counter at /pytorch/c10/core/TensorImpl.h:2117 (most recent call first):
[rank0]: C++ CapturedTraceback:
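The RuntimeError above is PyTorch's inference-mode guard: a tensor created under torch.inference_mode() cannot have its version counter updated later, which torchao's __torch_function__ dispatch ends up triggering when the weight is sliced. A minimal sketch of the failure class and the usual clone() workaround (a general pattern, an assumption rather than the actual fix in unsloth_zoo):

import torch

with torch.inference_mode():
    qweight = torch.randn(8, 4)  # stand-in for a vLLM quantized weight

# Handing an inference tensor to code that needs version tracking can raise
# "Cannot set version_counter for inference tensor". Cloning outside
# inference mode produces a normal tensor that is safe to slice:
qweight = qweight.clone()
weight = qweight[0:4]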
andrewor14 / gist:378a86485e2bc167f851bfd648625b70
Created October 2, 2025 23:33
UnslothGRPOTrainer excerpt
class _UnslothGRPOTrainer(Trainer):
    ...
    def _sync_fsdp1_params_to_vllm(self, module: nn.Module, prefix: str = "", visited=None):
        """Memory-efficient post-order traversal of FSDP modules to extract full parameters and sync with vLLM."""
        # For FSDP1, we need to recurse into children and also use summon_full_params
        if visited is None:
            visited = set()
        for child_name, child_module in module.named_children():
            child_prefix = f"{prefix}.{child_name}" if prefix else child_name
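The excerpt cuts off mid-loop; purely to illustrate the pattern the docstring names (post-order recursion plus summon_full_params at each FSDP module), here is a hedged standalone sketch, not the trainer's actual continuation:

import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def gather_full_params(module: nn.Module, prefix: str = ""):
    # Post-order: recurse into children before handling this module.
    for child_name, child in module.named_children():
        child_prefix = f"{prefix}.{child_name}" if prefix else child_name
        yield from gather_full_params(child, child_prefix)
    if isinstance(module, FSDP):
        # writeback=False gathers read-only full params, freed on exit.
        with FSDP.summon_full_params(module, recurse=False, writeback=False):
            for name, param in module.named_parameters(recurse=False):
                full_name = f"{prefix}.{name}" if prefix else name
                yield full_name, param.detach().cpu().clone()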
from unsloth import FastLanguageModel
import torch
from torchao.quantization import Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TextStreamer, TorchAoConfig
qat_scheme = "int4"
save_output_path = "/tmp/unsloth_model"
max_seq_length = 2048
andrewor14 / gist:048b5c1bd01b7fa23c53913856a8ef9f
Created September 5, 2025 20:15
Unsloth QAT full fine-tuning
from unsloth import FastLanguageModel
import torch
from torchao.quantization import Float8DynamicActivationInt4WeightConfig
from transformers import AutoModelForCausalLM, TextStreamer, TorchAoConfig
qat_scheme = "fp8-int4"
save_output_path = "/tmp/unsloth_model"
max_seq_length = 2048
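Both QAT previews above pair a torchao quantization config with transformers' TorchAoConfig; as a sketch of how such a config is typically wired up at load time (the group size and device_map below are assumptions, not values from the gists):

from torchao.quantization import Int4WeightOnlyConfig
from transformers import AutoModelForCausalLM, TorchAoConfig

# Hypothetical wiring: int4 weight-only quantization applied on load.
quant_config = TorchAoConfig(quant_type=Int4WeightOnlyConfig(group_size=128))
model = AutoModelForCausalLM.from_pretrained(
    "/tmp/unsloth_model",  # the gists' save_output_path
    quantization_config=quant_config,
    device_map="auto",
)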
andrewor14 / gist:b0364ac3cb8aa114e46b39d848fa5c8b
Created August 29, 2025 21:50
Unsloth QAT full finetuning test
# Based on https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb
# but with `full_finetuning=True` and without `get_peft_model`
import os
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
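The preview truncates inside the call; a hedged sketch of how it plausibly continues, given the header comments (full_finetuning=True, no get_peft_model). Every argument below is an assumption drawn from the linked Alpaca notebook, not the gist's actual text:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",  # assumed from the linked notebook
    max_seq_length=max_seq_length,
    load_in_4bit=False,    # full fine-tuning rather than 4-bit LoRA
    full_finetuning=True,  # per the gist's header comment
)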
andrewor14 / gist:ab650350b69276cf585c008914aaa146
Last active August 29, 2025 20:30
Repro unsloth full finetuning
# Based on https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb
# but with `full_finetuning=True` and without `get_peft_model`
# Output is at the bottom of the gist
import os
from unsloth import FastLanguageModel
from transformers import TextStreamer
import torch
max_seq_length = 2048
$ GRADIO_SERVER_NAME="0.0.0.0" python test_sayak.py
/home/andrewor/local/ao/torchao/utils.py:408: UserWarning: TORCH_VERSION_AT_LEAST_2_8 is deprecated and will be removed in torchao 0.14.0
warnings.warn(self.msg)
/home/andrewor/local/ao/torchao/utils.py:408: UserWarning: TORCH_VERSION_AT_LEAST_2_7 is deprecated and will be removed in torchao 0.14.0
warnings.warn(self.msg)
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 65.28it/s]
Step 1: Applying QAT observers to the model...
/home/andrewor/local/ao/torchao/quantization/qat/utils.py:84: UserWarning: 'FakeQuantizeConfig' is deprecated and will be removed in a future release. Please use the following API instead: