
[Bug] Unsloth llama.cpp installation overflows out of the venv into the global env #3487

@kittawere

Description

  1. Did you update? pip install --upgrade unsloth unsloth_zoo > YES

  2. Colab or Kaggle or local / cloud > LOCAL DEBIAN 12 LXC

  3. Number GPUs used, use nvidia-smi > 1

  4. Which notebook? Please link! > none

  5. Which Unsloth version, TRL version, transformers version, PyTorch version?

# uv pip freeze                                            
Using Python 3.11.2 environment at: venv
accelerate==1.11.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.1
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.11.0
attrs==25.4.0
bitsandbytes==0.48.1
certifi==2025.10.5
charset-normalizer==3.4.4
cut-cross-entropy==25.1.1
datasets==4.2.0
diffusers==0.35.2
dill==0.4.0
diskcache==5.6.3
docstring-parser==0.17.0
filelock==3.20.0
frozenlist==1.8.0
fsspec==2025.9.0
gguf==0.17.1
h11==0.16.0
hf-transfer==0.1.9
hf-xet==1.1.10
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.35.3
idna==3.11
importlib-metadata==8.7.0
jinja2==3.1.6
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
llama-cpp-python==0.3.16
markdown-it-py==4.0.0
markupsafe==3.0.3
mdurl==0.1.2
mistral-common==1.8.5
mpmath==1.3.0
msgspec==0.19.0
multidict==6.7.0
multiprocess==0.70.16
networkx==3.5
numpy==2.3.4
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.3
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvtx-cu12==12.8.90
packaging==25.0
pandas==2.3.3
peft==0.17.1
pillow==12.0.0
propcache==0.4.1
protobuf==6.33.0
psutil==7.1.1
pyarrow==21.0.0
pycountry==24.6.1
pydantic==2.12.3
pydantic-core==2.41.4
pydantic-extra-types==2.10.6
pygments==2.19.2
python-dateutil==2.9.0.post0
pytz==2025.2
pyyaml==6.0.3
referencing==0.37.0
regex==2025.9.18
requests==2.32.5
rich==14.2.0
rpds-py==0.27.1
safetensors==0.6.2
sentencepiece==0.2.1
setuptools==80.9.0
shtab==1.7.2
six==1.17.0
sniffio==1.3.1
sympy==1.14.0
tiktoken==0.12.0
tokenizers==0.22.1
torch==2.8.0
torchao==0.13.0
torchvision==0.23.0
tqdm==4.67.1
transformers==4.56.2
triton==3.4.0
trl==0.23.0
typeguard==4.4.4
typing-extensions==4.15.0
typing-inspection==0.4.2
tyro==0.9.35
tzdata==2025.2
unsloth==2025.10.7
unsloth-zoo==2025.10.8
urllib3==2.5.0
wheel==0.45.1
xformers==0.0.32.post2
xxhash==3.6.0
yarl==1.22.0
zipp==3.23.0
  6. Which trainer? SFTTrainer, GRPOTrainer etc > none

  7. Faulty code:

from unsloth import FastLanguageModel

import os
import subprocess  # was missing in the original snippet

if not os.path.isdir('llama.cpp'):
    # Until now I have been installing llama-cpp-python this way because
    # it did not work otherwise (only needed for the quant stuff).
    subprocess.run(
        'CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python',
        shell=True, executable="/bin/bash",
    )
    # I build llama.cpp manually because Unsloth's automatic installation
    # did not use CUDA. Either way, the bug occurs whether you install it
    # manually or let Unsloth install it by itself.
    subprocess.run(
        "apt install libcurl4-openssl-dev -y && "
        "git clone --recursive https://github.com/ggerganov/llama.cpp && "
        "cd llama.cpp && cmake -B build -DGGML_CUDA=ON && "
        f"cmake --build build --config Release -j {os.cpu_count()} && cd ..",
        shell=True, executable="/bin/bash",
    )


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="kittawere/Llama-KW-CwP-V1-3B-notseparated",  # just for testing; use another if you want
    load_in_4bit=False,
)

model.save_pretrained_gguf(
    "./test_gguf",
    tokenizer=tokenizer,
    quantization_method="q8_0",
)

The problem

When trying to quantize a model, Unsloth installs llama.cpp and also some Python packages like "gguf" (yes, I have tried installing them manually), and it tries to use the global pip instead of the one in the venv.
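
A quick way to see what is happening (my own diagnostic sketch, not part of Unsloth): a bare `pip` command is resolved through PATH, while `python -m pip` is pinned to the interpreter running the script, so the two can point at different environments:

import shutil
import sys

# What a shell command like `pip install gguf` resolves to via PATH.
print("pip on PATH:      ", shutil.which("pip"))

# The interpreter actually running this script; `-m pip` through this
# path always installs into the active venv.
print("this interpreter: ", sys.executable)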

error:

root@KW-ai:~/unsloth_easy_lora# source ./venv/bin/activate
(venv) root@KW-ai:~/unsloth_easy_lora# python tt.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO:unsloth_zoo.log: *** `model_type_list` in `get_transformers_model_type` is None!
==((====))==  Unsloth 2025.10.7: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 1. Max memory: 11.721 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.18it/s]
Unsloth 2025.10.7 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `./test_gguf`: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.65s/it]
Successfully copied all 2 files from cache to `./test_gguf`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 95325.09it/s]
Unsloth: Merging weights into 16bit: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:33<00:00, 46.83s/it]
Unsloth: Merge process complete. Saved to `/root/unsloth_easy_lora/test_gguf`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF bf16 might take 3 minutes.
\        /    [2] Converting GGUF bf16 to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: llama.cpp folder exists but binaries not found - will rebuild
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Install GGUF and other packages
Traceback (most recent call last):
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1074, in save_to_gguf
    quantizer_location, converter_location = check_llama_cpp()
                                             ^^^^^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 343, in check_llama_cpp
    raise RuntimeError(
RuntimeError: Unsloth: No working quantizer found in llama.cpp
Files in directory: convert_hf_to_gguf.py, flake.nix, ci, Makefile, grammars, common, README.md, tools, scripts, requirements, licenses, flake.lock, docs, gguf-py, convert_hf_to_gguf_update.py, CMakeLists.txt, LICENSE, build-xcframework.sh, mypy.ini, pyproject.toml

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1835, in unsloth_save_pretrained_gguf
    all_file_locations, want_full_precision, is_vlm_update = save_to_gguf(
                                                             ^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1085, in save_to_gguf
    quantizer_location, converter_location = install_llama_cpp(
                                             ^^^^^^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 415, in install_llama_cpp
    try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs)
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 275, in try_execute
    raise RuntimeError(error_msg)
RuntimeError: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1
stdout: error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.



During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/unsloth_easy_lora/tt.py", line 71, in <module>
    model.save_pretrained_gguf(
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1855, in unsloth_save_pretrained_gguf
    raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
RuntimeError: Unsloth: GGUF conversion failed: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1
stdout: error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Basically, the lines I think are important are:

try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs)
RuntimeError: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1

And it fails because of the Debian policy (PEP 668) that forbids installing packages into the global env. (Yeah, I could override it, but I don't need Unsloth urgently right now, so I won't mess with the global env.)
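
A workaround that should keep everything inside the venv (a sketch I have not fully verified): make sure the venv actually contains its own pip, since a venv created with `uv venv` does not ship one by default, so a bare `pip` falls through to Debian's externally-managed system pip. Then pre-install the packages the GGUF step needs:

import subprocess
import sys

# ensurepip bootstraps a pip into the active venv (no effect on the
# system Python); on Debian this may need the python3-venv package.
subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"], check=True)

# Pre-install the packages the GGUF step shells out for, into the venv,
# so a later `pip install gguf protobuf sentencepiece mistral_common`
# resolves to venv/bin/pip and succeeds.
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "gguf", "protobuf", "sentencepiece", "mistral_common"],
    check=True,
)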

Relevant stuff

In try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs), pip is a variable set by check_pip() in venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py. That file also has a variable called PIP_OPTIONS that contains "uv pip", suggesting uv is supported, so I did not test with plain pip instead of uv.
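
If both pip and uv are meant to be supported, a venv-safe resolution could look something like this (hypothetical sketch, not the actual check_pip() code; pip_install_cmd is a name I made up):

import shutil
import sys

def pip_install_cmd(packages: str) -> str:
    # Hypothetical venv-safe variant of check_pip(): always target the
    # interpreter that is currently running, so installs land in the
    # active venv instead of the externally-managed system env.
    if shutil.which("uv"):
        # `uv pip install` accepts --python to pin the target interpreter.
        return f"uv pip install --python {sys.executable} {packages}"
    # Fall back to the running interpreter's own pip.
    return f"{sys.executable} -m pip install {packages}"

# e.g. try_execute(pip_install_cmd("gguf protobuf sentencepiece mistral_common"), **kwargs)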

Note

I don't know whether Unsloth is even meant to work inside a venv (if not, feel free to close this issue :] ). I am just filing this to report that I cannot confirm #3215 is fixed, since I cannot run my GGUF pipeline due to this issue.
