
[Bug] Unsloth llama.cpp installation overflows out of the venv into the global env #3487

@kittawere

Description

  1. Did you update? pip install --upgrade unsloth unsloth_zoo > YES

  2. Colab or Kaggle or local / cloud > LOCAL DEBIAN 12 LXC

  3. Number GPUs used, use nvidia-smi > 1

  4. Which notebook? Please link! > none

  5. Which Unsloth version, TRL version, transformers version, PyTorch version?

# uv pip freeze                                            
Using Python 3.11.2 environment at: venv
accelerate==1.11.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.1
aiosignal==1.4.0
annotated-types==0.7.0
anyio==4.11.0
attrs==25.4.0
bitsandbytes==0.48.1
certifi==2025.10.5
charset-normalizer==3.4.4
cut-cross-entropy==25.1.1
datasets==4.2.0
diffusers==0.35.2
dill==0.4.0
diskcache==5.6.3
docstring-parser==0.17.0
filelock==3.20.0
frozenlist==1.8.0
fsspec==2025.9.0
gguf==0.17.1
h11==0.16.0
hf-transfer==0.1.9
hf-xet==1.1.10
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.35.3
idna==3.11
importlib-metadata==8.7.0
jinja2==3.1.6
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
llama-cpp-python==0.3.16
markdown-it-py==4.0.0
markupsafe==3.0.3
mdurl==0.1.2
mistral-common==1.8.5
mpmath==1.3.0
msgspec==0.19.0
multidict==6.7.0
multiprocess==0.70.16
networkx==3.5
numpy==2.3.4
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.3
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvtx-cu12==12.8.90
packaging==25.0
pandas==2.3.3
peft==0.17.1
pillow==12.0.0
propcache==0.4.1
protobuf==6.33.0
psutil==7.1.1
pyarrow==21.0.0
pycountry==24.6.1
pydantic==2.12.3
pydantic-core==2.41.4
pydantic-extra-types==2.10.6
pygments==2.19.2
python-dateutil==2.9.0.post0
pytz==2025.2
pyyaml==6.0.3
referencing==0.37.0
regex==2025.9.18
requests==2.32.5
rich==14.2.0
rpds-py==0.27.1
safetensors==0.6.2
sentencepiece==0.2.1
setuptools==80.9.0
shtab==1.7.2
six==1.17.0
sniffio==1.3.1
sympy==1.14.0
tiktoken==0.12.0
tokenizers==0.22.1
torch==2.8.0
torchao==0.13.0
torchvision==0.23.0
tqdm==4.67.1
transformers==4.56.2
triton==3.4.0
trl==0.23.0
typeguard==4.4.4
typing-extensions==4.15.0
typing-inspection==0.4.2
tyro==0.9.35
tzdata==2025.2
unsloth==2025.10.7
unsloth-zoo==2025.10.8
urllib3==2.5.0
wheel==0.45.1
xformers==0.0.32.post2
xxhash==3.6.0
yarl==1.22.0
zipp==3.23.0
  6. Which trainer? SFTTrainer, GRPOTrainer etc > none

  7. Faulty code:

from unsloth import FastLanguageModel

import os
import subprocess  # was missing in the original snippet

if not os.path.isdir('llama.cpp'):
    # Until now I have been installing llama-cpp-python this way because
    # it did not work otherwise (only needed for the quant stuff).
    subprocess.run(
        'CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python',
        shell=True, executable="/bin/bash",
    )
    # I build llama.cpp manually because Unsloth's automatic installation
    # did not use CUDA. Either way, the bug occurs whether you install it
    # manually or let Unsloth install it by itself.
    subprocess.run(
        "apt install libcurl4-openssl-dev -y && "
        "git clone --recursive https://github.com/ggerganov/llama.cpp && "
        "cd llama.cpp && cmake -B build -DGGML_CUDA=ON && "
        f"cmake --build build --config Release -j {os.cpu_count()} && cd ..",
        shell=True, executable="/bin/bash",
    )


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="kittawere/Llama-KW-CwP-V1-3B-notseparated",  # just for testing; use another if you want
    load_in_4bit=False,
)

model.save_pretrained_gguf(
    "./test_gguf",
    tokenizer=tokenizer,
    quantization_method="q8_0",
)

The problem

When trying to quantize a model, Unsloth installs llama.cpp and also some Python packages like "gguf" (yes, I have tried installing them manually), and it tries to use the global pip instead of the one in the venv.
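
A quick way to see what is happening (my own diagnostic sketch, not part of Unsloth): a bare `pip` command is resolved through PATH, while `python -m pip` is pinned to the interpreter running the script, so the two can point at different environments:

import shutil
import sys

# What a shell command like `pip install gguf` resolves to via PATH.
print("pip on PATH:      ", shutil.which("pip"))

# The interpreter actually running this script; `-m pip` through this
# path always installs into the active venv.
print("this interpreter: ", sys.executable)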

error:

root@KW-ai:~/unsloth_easy_lora# source ./venv/bin/activate
(venv) root@KW-ai:~/unsloth_easy_lora# python tt.py 
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
INFO:unsloth_zoo.log: *** `model_type_list` in `get_transformers_model_type` is None!
==((====))==  Unsloth 2025.10.7: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA GeForce RTX 4070. Num GPUs = 1. Max memory: 11.721 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 8.9. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.18it/s]
Unsloth 2025.10.7 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
Unsloth: Merging model weights to 16-bit format...
Found HuggingFace hub cache directory: /root/.cache/huggingface/hub
Checking cache directory for required files...
Unsloth: Copying 2 files from cache to `./test_gguf`: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.65s/it]
Successfully copied all 2 files from cache to `./test_gguf`
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Unsloth: Preparing safetensor model files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 95325.09it/s]
Unsloth: Merging weights into 16bit: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:33<00:00, 46.83s/it]
Unsloth: Merge process complete. Saved to `/root/unsloth_easy_lora/test_gguf`
Unsloth: Converting to GGUF format...
==((====))==  Unsloth: Conversion from HF to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF bf16 might take 3 minutes.
\        /    [2] Converting GGUF bf16 to ['q8_0'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: llama.cpp folder exists but binaries not found - will rebuild
Unsloth: Updating system package directories
Unsloth: All required system packages already installed!
Unsloth: Install llama.cpp and building - please wait 1 to 3 minutes
Unsloth: Install GGUF and other packages
Traceback (most recent call last):
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1074, in save_to_gguf
    quantizer_location, converter_location = check_llama_cpp()
                                             ^^^^^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 343, in check_llama_cpp
    raise RuntimeError(
RuntimeError: Unsloth: No working quantizer found in llama.cpp
Files in directory: convert_hf_to_gguf.py, flake.nix, ci, Makefile, grammars, common, README.md, tools, scripts, requirements, licenses, flake.lock, docs, gguf-py, convert_hf_to_gguf_update.py, CMakeLists.txt, LICENSE, build-xcframework.sh, mypy.ini, pyproject.toml

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1835, in unsloth_save_pretrained_gguf
    all_file_locations, want_full_precision, is_vlm_update = save_to_gguf(
                                                             ^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1085, in save_to_gguf
    quantizer_location, converter_location = install_llama_cpp(
                                             ^^^^^^^^^^^^^^^^^^
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 415, in install_llama_cpp
    try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs)
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py", line 275, in try_execute
    raise RuntimeError(error_msg)
RuntimeError: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1
stdout: error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.



During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/unsloth_easy_lora/tt.py", line 71, in <module>
    model.save_pretrained_gguf(
  File "/root/unsloth_easy_lora/venv/lib/python3.11/site-packages/unsloth/save.py", line 1855, in unsloth_save_pretrained_gguf
    raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
RuntimeError: Unsloth: GGUF conversion failed: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1
stdout: error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

Basically, the lines I think are important are:

try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs)
RuntimeError: [FAIL] Command `pip install gguf protobuf sentencepiece mistral_common` failed with exit code 1

And it fails because of the Debian policy (PEP 668) that forbids installing packages into the global env. (Yeah, I could override it, but I don't need Unsloth urgently right now, so I won't mess with the global env.)
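
A workaround that should keep everything inside the venv (a sketch I have not fully verified): make sure the venv actually contains its own pip, since a venv created with `uv venv` does not ship one by default, so a bare `pip` falls through to Debian's externally-managed system pip. Then pre-install the packages the GGUF step needs:

import subprocess
import sys

# ensurepip bootstraps a pip into the active venv (no effect on the
# system Python); on Debian this may need the python3-venv package.
subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"], check=True)

# Pre-install the packages the GGUF step shells out for, into the venv,
# so a later `pip install gguf protobuf sentencepiece mistral_common`
# resolves to venv/bin/pip and succeeds.
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "gguf", "protobuf", "sentencepiece", "mistral_common"],
    check=True,
)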

Relevant stuff

In try_execute(f"{pip} install gguf protobuf sentencepiece mistral_common", **kwargs), pip is a variable set by check_pip() in venv/lib/python3.11/site-packages/unsloth_zoo/llama_cpp.py. That file also has a variable called PIP_OPTIONS that contains "uv pip", suggesting uv is supported, so I did not test with plain pip instead of uv.
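
If both pip and uv are meant to be supported, a venv-safe resolution could look something like this (hypothetical sketch, not the actual check_pip() code; pip_install_cmd is a name I made up):

import shutil
import sys

def pip_install_cmd(packages: str) -> str:
    # Hypothetical venv-safe variant of check_pip(): always target the
    # interpreter that is currently running, so installs land in the
    # active venv instead of the externally-managed system env.
    if shutil.which("uv"):
        # `uv pip install` accepts --python to pin the target interpreter.
        return f"uv pip install --python {sys.executable} {packages}"
    # Fall back to the running interpreter's own pip.
    return f"{sys.executable} -m pip install {packages}"

# e.g. try_execute(pip_install_cmd("gguf protobuf sentencepiece mistral_common"), **kwargs)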

Note

I don't know whether Unsloth is even meant to work inside a venv (if not, feel free to close this issue :] ). I am just filing this to report that I cannot confirm #3215 is fixed, since I cannot run my GGUF pipeline due to this issue.
