@silvaxxx1
Addresses build errors from deprecated CUDA flags

Changes Proposed

  • Updated the CMake configuration to use GGML_CUDA in place of the deprecated LLAMA_CUBLAS flag
  • Fixed the paths for the conversion script (now convert-hf-to-gguf.py) and the quantize binary
  • Added error handling for model downloads
  • Updated the documentation with the new build requirements
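For reference, the updated flag changes the configure step roughly as follows (a minimal sketch; upstream llama.cpp removed the LLAMA_CUBLAS option in favor of GGML_CUDA, and exact paths may differ in this repo):

```shell
# Old (now fails): cmake -B build -DLLAMA_CUBLAS=ON
# New: enable CUDA via the GGML_CUDA flag.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```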

Testing Performed

  • Verified a clean build on Ubuntu 22.04 with CUDA 12.1
  • Tested the full quantization workflow with EvolCodeLlama-7b
  • Confirmed GPU acceleration is active via nvidia-smi monitoring
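The workflow exercised above looks roughly like this (a sketch, not the exact commands: the model path, output names, and quantization type are illustrative, and the quantize binary's name and location vary between llama.cpp versions):

```shell
# Convert the Hugging Face checkpoint to GGUF using the renamed script.
python convert-hf-to-gguf.py ./EvolCodeLlama-7b --outfile model-f16.gguf

# Quantize with the binary from the CMake build tree
# (newer llama.cpp builds place it under build/bin/).
./build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```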

Notes for Reviewers

  • Requires CUDA toolkit 11.x-12.x
  • Tested with Python 3.10
  • Documented the new git-lfs dependency
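The download error handling can be sketched as a small retry wrapper (hypothetical helper, not the exact code in this PR; `download_with_retry` and the retry count are assumptions):

```shell
#!/usr/bin/env sh
# download_with_retry CMD [ARGS...] -- runs CMD up to 3 times,
# returning 1 if every attempt fails.
download_with_retry() {
  attempts=0
  max_attempts=3
  until "$@"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_attempts" ]; then
      echo "download failed after $max_attempts attempts" >&2
      return 1
    fi
    echo "retrying ($attempts/$max_attempts)..." >&2
  done
}
```

Usage would be e.g. `download_with_retry git lfs pull`, so a transient network failure does not abort the whole setup.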
- Replace deprecated LLAMA_CUBLAS with GGML_CUDA
- Update conversion script to convert-hf-to-gguf.py
- Fix quantize binary path
- Add error handling for model downloads