1 vote
1 answer
115 views

I’m trying to install a BLAS-enabled version of llama-cpp-python on WSL so that the GGML library uses OpenBLAS. I attempted two different pip install invocations with CMAKE_ARGS, but the module ...
Kunal Gupta
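For reference, a minimal sketch of how an OpenBLAS build is usually requested and then verified from Python; the CMake flag names vary between llama-cpp-python releases, and the model path is a placeholder:

# Hypothetical install command (flag names differ across versions, e.g. GGML_BLAS vs LLAMA_BLAS):
#   CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --force-reinstall --no-cache-dir
from llama_cpp import Llama

# With verbose=True llama.cpp prints its system-info line at load time;
# a BLAS-enabled build typically reports "BLAS = 1" there.
llm = Llama(model_path="model.gguf", verbose=True)  # placeholder path
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])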
0 votes
1 answer
710 views

I have been trying to install llama-cpp-python on Windows 11 with GPU support for a while, and it just doesn't work no matter what I try. I installed the necessary Visual Studio toolkit packages, ...
MiszS • 11
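As a rough smoke test for a GPU build (the install command and model path below are illustrative, and the CMake flag name depends on the llama-cpp-python version):

# Hypothetical install command from a Windows shell:
#   set CMAKE_ARGS=-DGGML_CUDA=on          (older releases used -DLLAMA_CUBLAS=on)
#   pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload every layer it can; with verbose=True
# the load log shows how many layers actually landed on the GPU.
llm = Llama(model_path="model.gguf", n_gpu_layers=-1, verbose=True)  # placeholder path
print(llm("Hello", max_tokens=4)["choices"][0]["text"])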
1 vote
0 answers
394 views

I tried to install llama-cpp-python via pip, but I get an error during the installation. The command that I wrote: CMAKE_ARGS="-DLLAMA_METAL_EMBED_LIBRARY=ON -DLLAMA_METAL=on" pip install ...
ZZISST • 21
1 vote
0 answers
222 views

I am trying to set up local, high-speed NLP but am failing to install the arm64 version of llama-cpp-python. Even when I run CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \ ...
Dennis Losett
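One thing worth ruling out before fighting the CMake flags: pip builds the wheel for the architecture of the running interpreter, so a Python running under Rosetta (x86_64) can never produce an arm64/Metal build. A quick check:

import platform
import sys

# On Apple Silicon this should print "arm64"; "x86_64" means the interpreter is
# running under Rosetta and pip will compile an x86_64 wheel without Metal support.
print(platform.machine())
print(sys.version)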
4 votes
0 answers
251 views

I am new to this. I have been trying but could not make the model answer about images. from llama_cpp import Llama import torch from PIL import Image import base64 llm = Llama( model_path='Holo1-...
Abhash Rai
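For context, llama-cpp-python feeds images to vision models through a chat handler plus a separate CLIP/projector GGUF rather than through PIL or torch tensors. A sketch using the LLaVA-1.5 handler as an example; the handler class, both file paths, and whether this particular model is supported are assumptions:

import base64
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler  # handler choice depends on the model

# Both paths are placeholders; vision models ship a separate mmproj/CLIP GGUF.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj.gguf")
llm = Llama(model_path="model.gguf", chat_handler=chat_handler, n_ctx=4096)

with open("image.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": data_uri}},
        {"type": "text", "text": "Describe this image."},
    ]},
])
print(resp["choices"][0]["message"]["content"])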
0 votes
0 answers
128 views

I am attempting to bundle a RAG agent into a .exe. However, when running the .exe I keep hitting the same two problems. The first problem is with locating llama-cpp, which I have fixed. The ...
Arnab Mandal
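PyInstaller's static analysis does not find llama-cpp-python's native shared library on its own, so the usual fix is to collect the whole package, either with --collect-all llama_cpp on the command line or in the .spec file. A sketch of the relevant part of a spec (the entry-point name is a placeholder):

# Spec files are plain Python, executed by PyInstaller itself.
from PyInstaller.utils.hooks import collect_all

# Gather llama_cpp's shared library (llama.dll / libllama.so) and package data.
datas, binaries, hiddenimports = collect_all("llama_cpp")

a = Analysis(
    ["rag_agent.py"],  # placeholder entry point
    binaries=binaries,
    datas=datas,
    hiddenimports=hiddenimports,
)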
-1 votes
2 answers
631 views

Creating directory "llava_shared.dir\Release". Structured output is enabled. The formatting of compiler diagnostics will reflect the error hierarchy. See https://aka.ms/cpp/structured-output ...
sandeep • 161
0 votes
0 answers
102 views

I want a dataset of common n-grams and their log likelihoods. Normally I would download the Google Books Ngram Exports, but I wonder if I can generate a better dataset using a large language model. ...
evashort
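If the plan is to score text with a local model, llama-cpp-python exposes OpenAI-style logprobs on completions, which could then be aggregated into n-gram statistics. A rough sketch, assuming a placeholder GGUF path and the legacy OpenAI response layout:

from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # placeholder path

# logprobs=N attaches, for each generated token, the log probabilities of the
# top-N candidates, mirroring the legacy OpenAI completions format.
out = llm("The quick brown", max_tokens=8, logprobs=5, temperature=0.0)
lp = out["choices"][0]["logprobs"]
for token, logprob in zip(lp["tokens"], lp["token_logprobs"]):
    print(repr(token), logprob)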
1 vote
0 answers
150 views

I’m working on a project that requires fully deterministic outputs across different machines using Ollama. I’ve ensured the following parameters are identical: Model quantization (e.g., llama2:7b-q4_0)...
user29255210
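Matching the model file is necessary but not sufficient; the request-level sampling options have to be pinned as well. A sketch of a fully pinned request against Ollama's REST API (the endpoint and option names are the documented ones, the model tag is taken from the question):

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2:7b-q4_0",
        "prompt": "List three prime numbers.",
        "stream": False,
        # Pin every sampling knob: a fixed seed plus greedy-style settings.
        "options": {"seed": 42, "temperature": 0, "top_k": 1, "top_p": 1.0, "num_predict": 64},
    },
    timeout=120,
)
print(resp.json()["response"])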
2 votes
1 answer
1k views

I want my LLM chatbot to remember previous conversations even after restarting the program. It is built with llama-cpp-python and LangChain; it has conversation memory for the current chat, but obviously ...
QUARKS • 29
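One way to get memory that survives a restart is simply to serialize the message list to disk and reload it on startup; nothing model-specific is involved. A minimal sketch with a JSON file and llama-cpp-python's chat API, where the paths and system prompt are placeholders:

import json
import os
from llama_cpp import Llama

HISTORY_FILE = "chat_history.json"  # placeholder location

def load_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            return json.load(f)
    return [{"role": "system", "content": "You are a helpful assistant."}]

def save_history(messages):
    with open(HISTORY_FILE, "w") as f:
        json.dump(messages, f)

llm = Llama(model_path="model.gguf", n_ctx=4096)  # placeholder path
messages = load_history()                          # restored from the previous run
messages.append({"role": "user", "content": "What did we talk about last time?"})
reply = llm.create_chat_completion(messages=messages)["choices"][0]["message"]
messages.append(reply)
save_history(messages)
print(reply["content"])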
1 vote
0 answers
77 views

I was implementing RAG on a document using the Llama 2 model, but my model keeps asking questions of itself and answering them. llm = LlamaCpp(model_path=model_path, temperature=0, ...
Knox • 21
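This behaviour usually means the completion keeps generating past the answer because nothing tells it where to stop; adding stop sequences that match the prompt template is the common fix. A sketch with LangChain's LlamaCpp wrapper, where the stop strings must mirror whatever labels the prompt actually uses:

from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="llama-2-7b-chat.Q4_K_M.gguf",  # placeholder path
    temperature=0,
    # Cut generation as soon as the model starts writing a new "Question:" turn.
    stop=["Question:", "\nQ:"],
)
print(llm.invoke("Question: What is the capital of France?\nAnswer:"))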
0 votes
0 answers
192 views

I start the llama-cpp-python server with the command: python -m llama_cpp.server --model D:\Mistral-7B-Instruct-v0.3.Q4_K_M.gguf --n_ctx 8192 --chat_format functionary Then I run my Python script, which ...
Jengi829
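The server exposes an OpenAI-compatible endpoint (port 8000 by default), so the functionary chat format is exercised through ordinary tool-calling requests. A sketch of a matching client; the tool schema is illustrative:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="functionary",  # mostly informational for a single-model server
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(resp.choices[0].message.tool_calls)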
2 votes
1 answer
1k views

I want to choose the next token myself, instead of letting llama-cpp-python automatically choose one for me. This requires me to see a list of candidate next tokens, along with their probabilities, ...
caveman • 464
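One way to do this without dropping to the low-level API is to request one token at a time with logprobs, pick among the returned candidates yourself, append the choice, and repeat. A sketch that assumes the OpenAI-style logprobs layout llama-cpp-python returns:

from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # placeholder path
prompt = "The capital of France is"

for _ in range(5):
    out = llm(prompt, max_tokens=1, logprobs=10, temperature=0.0)
    # top_logprobs[0] maps each candidate token string to its log probability.
    candidates = out["choices"][0]["logprobs"]["top_logprobs"][0]
    for tok, lp in sorted(candidates.items(), key=lambda kv: -kv[1]):
        print(f"{tok!r}: {lp:.3f}")
    chosen = max(candidates, key=candidates.get)  # your own selection rule goes here
    prompt += chosen
print(prompt)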
0 votes
2 answers
880 views

code: from langchain_community.vectorstores import FAISS from langchain_community.embeddings import HuggingFaceEmbeddings from langchain import PromptTemplate from langchain_community.llms import ...
Ashish Sawant
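The excerpt only shows the imports, but the usual shape of that stack is: embed the documents into FAISS, wrap the index as a retriever, and hand it to a RetrievalQA chain over the local model. A sketch with placeholder texts, model names, and paths:

from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA

# Build a tiny FAISS index over placeholder documents.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_texts(
    ["Paris is the capital of France.", "Berlin is the capital of Germany."],
    embeddings,
)

llm = LlamaCpp(model_path="model.gguf", n_ctx=2048)  # placeholder path
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
print(qa.invoke({"query": "What is the capital of France?"}))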
1 vote
0 answers
173 views

I am trying to build a chatbot using LangChain. This chatbot can use different backends: Ollama, Hugging Face, llama.cpp, OpenAI. In a YAML file, I can configure the backend (aka provider) and the ...
Salvatore D'angelo
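A common way to structure this is a small factory that reads the YAML once and maps the provider string to the matching LangChain class; the keys, class choices, and file name below are illustrative:

import yaml
from langchain_community.chat_models import ChatOllama
from langchain_community.llms import LlamaCpp
from langchain_openai import ChatOpenAI

def build_llm(config_path="config.yaml"):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    provider = cfg["provider"]      # e.g. "ollama", "llamacpp", "openai"
    params = cfg.get("params", {})  # provider-specific kwargs taken from the YAML
    if provider == "ollama":
        return ChatOllama(**params)   # e.g. model: "llama2"
    if provider == "llamacpp":
        return LlamaCpp(**params)     # e.g. model_path: "model.gguf"
    if provider == "openai":
        return ChatOpenAI(**params)   # e.g. model: "gpt-4o-mini"
    raise ValueError(f"Unknown provider: {provider}")

llm = build_llm()
print(llm.invoke("Hello"))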
