Description
Did you update?
pip install --upgrade unsloth unsloth_zoo
Yes
Colab or Kaggle or local / cloud
Local (Debian LXC)
Number GPUs used, use
nvidia-smi
4070 - not relevant
Which notebook? Please link!
None, my own code
Which Unsloth version, TRL version, transformers version, PyTorch version?
unsloth==2025.8.9
unsloth-zoo==2025.8.8
Which trainer?
SFTTrainer, GRPOTrainer, etc.
The problem is not in training
import os
import subprocess
from unsloth import FastLanguageModel

# MODEL_SAVE_PATH and load_in_4bit are set elsewhere in my script
if not os.path.isdir('llama.cpp'):
    subprocess.run(f"apt install libcurl4-openssl-dev -y && git clone --recursive https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j {os.cpu_count()} && cd ..", shell=True, executable="/bin/bash")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_SAVE_PATH,
    load_in_4bit=load_in_4bit,
)
# Quantization types
q_types = ["f16", "q8_0", "q4_k_m", "q4_0"]

# Save the model with the different quantizations
for q_type in q_types:
    save_path = f"repos/{MODEL_SAVE_PATH}-GGUF"
    print(f"Saving GGUF with quantization {q_type} to {save_path}...")
    model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
    #subprocess.run("cd llama.cpp && git checkout .", shell=True, executable="/bin/bash")  # hack, since otherwise it errors

Now that I have completed that, here is my issue:
OK, so when I run that code, I get this error:
This might take 3 minutes...
File "/root/unsloth_master/llama.cpp/convert_hf_to_gguf.py", line 34
from mistral_common.tokens.tokenizers.base import TokenizerVersion
^
IndentationError: expected an indented block after 'try' statement on line 33
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/root/unsloth_master/helpers/unsloth_quant1.py", line 93, in <module>
main()
File "/root/unsloth_master/helpers/unsloth_quant1.py", line 39, in main
model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
File "/root/unsloth_master/venv/lib/python3.11/site-packages/unsloth/save.py", line 1898, in unsloth_save_pretrained_gguf
all_file_locations, want_full_precision = save_to_gguf(
^^^^^^^^^^^^^
File "/root/unsloth_master/venv/lib/python3.11/site-packages/unsloth/save.py", line 1255, in save_to_gguf
raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for /root/unsloth_master/repos/kittawere/test-GGUF/unsloth.Q8_0.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
What I have observed:
1. It does not import mistral_common.tokens.tokenizers.base
I have no proof, but I think it fails because Unsloth is modifying llama.cpp/convert_hf_to_gguf.py from:
from gguf.vocab import MistralTokenizerType, MistralVocab
from mistral_common.tokens.tokenizers.base import TokenizerVersion
from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
from mistral_common.tokens.tokenizers.tekken import Tekkenizer
from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)

to:
from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
try:
    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
except:
    pass
try:
    from mistral_common.tokens.tokenizers.tekken import Tekkenizer
except:
    pass
try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
        SentencePieceTokenizer,
    )
except:
    pass

2. It performs the GGUF conversion to F16
It works well :)
3. It tries to do the GGUF conversion to Q8_0
It fails (I don't know where) and falls back to adding the try/excepts again
So it goes from:
from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
try:
    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
except:
    pass
try:
    from mistral_common.tokens.tokenizers.tekken import Tekkenizer
except:
    pass
try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
        SentencePieceTokenizer,
    )
except:
    pass

to:
from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    try:
        from mistral_common.tokens.tokenizers.base import TokenizerVersion
    except:
        pass
except:
    pass
try:
    try:
        from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
    except:
        pass
except:
    pass
try:
    try:
        from mistral_common.tokens.tokenizers.tekken import Tekkenizer
    except:
        pass
except:
    pass
try:
    try:
        from mistral_common.tokens.tokenizers.sentencepiece import (
            SentencePieceTokenizer,
        )
    except:
        pass
except:
    pass

4. It fails with an IndentationError
As seen above :(
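To make my theory concrete, here is a minimal, self-contained sketch. This is NOT Unsloth's real patching code, just my guess at the failure mode: a context-blind string replace that wraps the import in try/except will also match the already-wrapped, indented copy of that import on the next run, splicing in a block with broken indentation:

# Hypothetical sketch - not Unsloth's actual patcher, just my guess at the bug.
imp = "from mistral_common.tokens.tokenizers.base import TokenizerVersion"
wrapper = f"try:\n    {imp}\nexcept:\n    pass"

def patch(source: str) -> str:
    # Context-blind replace: it also matches the import that is already
    # sitting (indented) inside a wrapper from a previous call.
    return source.replace(imp, wrapper)

once = patch(imp)    # first save: a valid try/except wrapper
twice = patch(once)  # second save: re-patches the already-patched file

try:
    compile(twice, "convert_hf_to_gguf.py", "exec")
except IndentationError as e:
    print(e)  # expected an indented block after 'try' statement on line 2
              # (same failure mode as the real error on line 33)

The point is only that the second pass has no way of knowing the file is already patched, so it keeps wrapping until the indentation breaks.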
I don't know exactly what is happening internally; I hope someone finds this useful.
BTW I don't really mind (just writing this in case someone finds it useful, or the devs fix this "try except" madness 😹): in my case, if I add subprocess.run("cd llama.cpp && git checkout .", shell=True, executable="/bin/bash") inside the loop, as shown below, it does not give any errors and everything goes well 😸
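For completeness, this is my loop with the workaround in place (same variables as my script above; the checkout just discards Unsloth's edits so the next iteration patches a clean convert_hf_to_gguf.py):

# Workaround loop: reset llama.cpp after each save so Unsloth's patch of
# convert_hf_to_gguf.py is never applied on top of itself.
for q_type in q_types:
    save_path = f"repos/{MODEL_SAVE_PATH}-GGUF"
    print(f"Saving GGUF with quantization {q_type} to {save_path}...")
    model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
    subprocess.run("cd llama.cpp && git checkout .", shell=True, executable="/bin/bash")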