[Bug] GGUF not working [in my case works, but needs fixing] #3215

@kittawere

Description
  1. Did you update? pip install --upgrade unsloth unsloth_zoo
    Yes

  2. Colab or Kaggle or local / cloud
Local (Debian LXC)

  3. Number GPUs used, use nvidia-smi
    4070 - not relevant

  4. Which notebook? Please link!
    None, my own code

  5. Which Unsloth version, TRL version, transformers version, PyTorch version?
    unsloth==2025.8.9
    unsloth-zoo==2025.8.8

  6. Which trainer? SFTTrainer, GRPOTrainer etc
The problem is not in training

import os
import subprocess

from unsloth import FastLanguageModel

if not os.path.isdir('llama.cpp'):
    subprocess.run(f"apt install libcurl4-openssl-dev -y && git clone --recursive https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build --config Release -j {os.cpu_count()} && cd ..", shell=True, executable="/bin/bash")

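Before running the export, you can sanity check that the build produced the quantize binary (the path below matches the current llama.cpp CMake layout; adjust it if your version differs):

# Hypothetical sanity check; the binary name/path depends on the llama.cpp version.
assert os.path.exists("llama.cpp/build/bin/llama-quantize")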

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_SAVE_PATH,
    load_in_4bit=load_in_4bit,
)

# Quantization types
q_types = ["f16", "q8_0", "q4_k_m", "q4_0"]

# Save model with different quantizations
for q_type in q_types:
    save_path = f"repos/{MODEL_SAVE_PATH}-GGUF"
    print(f"Saving GGUF with quantization {q_type} to {save_path}...")
    model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
    # subprocess.run("cd llama.cpp && git checkout .", shell=True, executable="/bin/bash")  # hack, since otherwise it raises an error

Now that I have covered the setup, here is my issue.

When I run that code, I get this error:

This might take 3 minutes...
  File "/root/unsloth_master/llama.cpp/convert_hf_to_gguf.py", line 34
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
    ^
IndentationError: expected an indented block after 'try' statement on line 33
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/root/unsloth_master/helpers/unsloth_quant1.py", line 93, in <module>
    main()
  File "/root/unsloth_master/helpers/unsloth_quant1.py", line 39, in main
    model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
  File "/root/unsloth_master/venv/lib/python3.11/site-packages/unsloth/save.py", line 1898, in unsloth_save_pretrained_gguf
    all_file_locations, want_full_precision = save_to_gguf(
                                              ^^^^^^^^^^^^^
  File "/root/unsloth_master/venv/lib/python3.11/site-packages/unsloth/save.py", line 1255, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for /root/unsloth_master/repos/kittawere/test-GGUF/unsloth.Q8_0.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.

What I have observed:

1. It does not import mistral_common.tokens.tokenizers.base

I have no proof, but I think it fails because it modifies llama.cpp/convert_hf_to_gguf.py from:

from gguf.vocab import MistralTokenizerType, MistralVocab
from mistral_common.tokens.tokenizers.base import TokenizerVersion
from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
from mistral_common.tokens.tokenizers.tekken import Tekkenizer
from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)

to

from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
try:
    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
except:
    pass
try:
    from mistral_common.tokens.tokenizers.tekken import Tekkenizer
except:
    pass
try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)
except:
    pass
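Note that this first-pass wrapper is still valid Python (the parenthesized import simply continues across the unindented lines), which is probably why the F16 pass in the next step still works. You can check it by compiling a trimmed copy of the patched imports:

# Compiles fine: the odd indentation is legal inside the parentheses.
wrapped_once = """
try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)
except:
    pass
"""
compile(wrapped_once, "convert_hf_to_gguf.py", "exec")  # no error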

2. It performs the GGUF conversion to F16

This works well :)

3. It tries to do the GGUF conversion to Q8_0

It fails (I don't know where) and falls back to adding the "try excepts" again, so the file goes from:

from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
try:
    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
except:
    pass
try:
    from mistral_common.tokens.tokenizers.tekken import Tekkenizer
except:
    pass
try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)
except:
    pass

to:

from gguf.vocab import MistralTokenizerType, MistralVocab
try:
    try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
except:
    pass
try:
    try:
    from mistral_common.tokens.tokenizers.multimodal import DATASET_MEAN, DATASET_STD
except:
    pass
except:
    pass
try:
    try:
    from mistral_common.tokens.tokenizers.tekken import Tekkenizer
except:
    pass
except:
    pass
try:
    try:
    from mistral_common.tokens.tokenizers.sentencepiece import (
    SentencePieceTokenizer,
)
except:
    pass

except:
    pass
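You can reproduce the IndentationError without running the whole pipeline; compiling a snippet shaped like the doubly-wrapped imports is enough (a hypothetical check, not Unsloth code):

# The inner try: has no indented body, so Python rejects it at compile time.
broken = """
try:
    try:
    from mistral_common.tokens.tokenizers.base import TokenizerVersion
except:
    pass
except:
    pass
"""
try:
    compile(broken, "convert_hf_to_gguf.py", "exec")
except IndentationError as e:
    print(e)  # expected an indented block after 'try' statement on line 3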

4. It fails with an IndentationError

As seen above :( The inner try: is left with no indented body (the re-wrapped import stays at the same level), which is exactly the "expected an indented block after 'try' statement" failure on lines 33/34 of the traceback.

I don't know exactly what is happening internally, but I hope someone finds this useful.

BTW, I don't really mind (I'm just writing this in case someone finds it useful, or the devs fix this "try except" madness 😹): in my case, if I add subprocess.run("cd llama.cpp && git checkout .", shell=True, executable="/bin/bash") inside the loop, it does not give any errors and everything goes well 😸
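For anyone who wants to copy the workaround, this is roughly what my loop looks like with the reset added (variable names are the ones from my script above):

for q_type in q_types:
    save_path = f"repos/{MODEL_SAVE_PATH}-GGUF"
    print(f"Saving GGUF with quantization {q_type} to {save_path}...")
    model.save_pretrained_gguf(save_path, tokenizer, quantization_method=q_type)
    # Revert Unsloth's in-place edits to convert_hf_to_gguf.py so the next
    # iteration patches a clean file instead of re-wrapping the wrapped imports.
    subprocess.run(
        "cd llama.cpp && git checkout .",
        shell=True, executable="/bin/bash",
    )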
