1,648 questions
0
votes
1
answer
73
views
PyTorch not recognizing RTX 5090 (sm_120) on Windows 11 – CUDA error: no kernel image available
I'm trying to use PyTorch with an NVIDIA GeForce RTX 5090 (Blackwell architecture, CUDA Compute Capability sm_120) on Windows 11, and I keep running into compatibility issues. PyTorch detects CUDA, ...
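The "no kernel image available" error means the installed PyTorch wheel was not compiled for the GPU's compute capability; PyTorch reports the architectures a build ships via `torch.cuda.get_arch_list()`. A torch-free sketch of that check (the architecture lists below are illustrative, not the exact contents of any particular wheel):

```python
def has_kernel_image(arch_list, sm="sm_120"):
    """True if a wheel compiled for arch_list ships a kernel image
    for the given SM (what torch.cuda.get_arch_list() reports)."""
    return sm in arch_list

# Illustrative: stable CUDA 12.4-era wheels top out at sm_90 (Hopper),
# so Blackwell (sm_120) needs a newer CUDA 12.8-era build.
stable_cu124 = ["sm_50", "sm_60", "sm_70", "sm_80", "sm_90"]
nightly_cu128 = stable_cu124 + ["sm_100", "sm_120"]

print(has_kernel_image(stable_cu124))   # False -> "no kernel image available"
print(has_kernel_image(nightly_cu128))  # True
```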
0
votes
1
answer
32
views
Continue.dev contextLength parameter doesn't work for Ollama model
I'm trying to use Continue.dev with a local model running on Ollama, but keep getting the "file exceeds context length" error.
My config looks like this:
- name: qwen3-coder
provider: ...
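For reference, a config sketch (field names follow Continue's YAML config format and are worth verifying against the Continue docs):

```yaml
models:
  - name: qwen3-coder
    provider: ollama
    model: qwen3-coder
    defaultCompletionOptions:
      contextLength: 32768  # Continue's prompt budget only
```

Note that raising `contextLength` on the Continue side only changes how much context Continue sends; the window Ollama actually serves is controlled separately via `num_ctx` (for example `PARAMETER num_ctx 32768` in a Modelfile), so both usually need to agree.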
Advice
1
vote
1
replies
37
views
RAG with Pinecone + GPT-5 for generating new math problems: incoherent outputs, mixed chunks, and lack of originality
I’m building a tool that generates new mathematics exam problems using an internal database of past problems.
My current setup uses a RAG pipeline, Pinecone as the vector database, and GPT-5 as the ...
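Mixed chunks in the prompt are a common cause of incoherent generations with this kind of setup: if several retrieved fragments come from the same past problem (or fragments of different problems are interleaved), the model stitches them together. One mitigation is to deduplicate retrieval results by source problem before building the prompt. A sketch, assuming each Pinecone match carries a hypothetical `problem_id` metadata field:

```python
def dedupe_by_problem(matches, max_problems=3):
    """Keep at most one retrieved chunk per source problem so the
    prompt doesn't interleave fragments of different problems.
    `matches` is assumed sorted by score, each with a metadata dict."""
    seen, kept = set(), []
    for m in matches:
        pid = m["metadata"]["problem_id"]
        if pid not in seen:
            seen.add(pid)
            kept.append(m)
        if len(kept) == max_problems:
            break
    return kept

matches = [
    {"metadata": {"problem_id": "alg-101"}},
    {"metadata": {"problem_id": "alg-101"}},  # duplicate source, dropped
    {"metadata": {"problem_id": "geo-007"}},
]
print([m["metadata"]["problem_id"] for m in dedupe_by_problem(matches)])
```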
-9
votes
0
answers
89
views
Why is my LLM hallucinating even after using a multi-vector RAG pipeline with FAISS? [closed]
I implemented a RAG pipeline using:
OpenAI embeddings (text-embedding-3-large)
Multi-vector indexing (sentence-level chunks)
FAISS flat index
Top-k = 5 retrieval
Even after this, the LLM still ...
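Hallucination usually stems from weak grounding instructions or low retrieval recall rather than from the index itself; the flat-index top-k step is simple enough to verify in isolation. A numpy-only sketch of what a FAISS flat inner-product index computes over L2-normalized embeddings (i.e. cosine similarity):

```python
import numpy as np

def top_k(query, index, k=5):
    """Top-k by inner product over L2-normalized vectors, the same
    scoring a flat cosine-similarity index performs."""
    q = query / np.linalg.norm(query)
    c = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
idx, sc = top_k(rng.normal(size=64), rng.normal(size=(100, 64)), k=5)
print(idx, sc)
```

If this step returns the right chunks on a few hand-checked queries, the problem is downstream (prompting or chunk granularity), not the index.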
0
votes
0
answers
50
views
Optimization Challenge in Hugging Face: Efficiently Serving Multiple, Differently Sized LLMs on a Single GPU with PyTorch [closed]
I am currently working on a Python-based Gen AI project that requires the efficient deployment and serving of multiple LLMs, specifically models with different parameter counts (Llama-2 7B and Mistral ...
0
votes
0
answers
41
views
Missing tokenizer.json File - GGUF Conversion
When converting mistralai/Mistral-Small-3.2-24B-Instruct-2506 to GGUF (via llama_cpp), I get an error saying the tokenizer.json file is missing. After re-examining the HF repo, there is, in fact, not a ...
Best practices
0
votes
0
replies
27
views
How to extract nested loop features from CUDA kernels for LLM-based optimization?
Question
I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels’ nested loops. The key requirement is to extract static loop and ...
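For the static-extraction part, a production pipeline would typically walk a Clang/CUDA AST, but a rough brace-tracking scan is enough to prototype the loop features fed to the LLM. A sketch (it ignores comments, strings, and macros, so treat it as an approximation):

```python
import re

FOR_RE = re.compile(r"\bfor\s*\(")

def loop_depths(cuda_src):
    """Report the brace-nesting depth at which each `for (` appears
    in CUDA source text. Depth 1 = kernel body, depth 2 = inside one
    enclosing loop, and so on."""
    depths, depth, i = [], 0, 0
    while i < len(cuda_src):
        m = FOR_RE.match(cuda_src, i)
        if m:
            depths.append(depth)
            i = m.end()
            continue
        ch = cuda_src[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1
    return depths

src = """
__global__ void k(float* a, int n) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            a[i * n + j] = 0.f;
        }
    }
}
"""
print(loop_depths(src))  # [1, 2]: one loop in the kernel body, one nested
```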
-2
votes
0
answers
39
views
Resource specifications or requirements of Vision Language Models (LLMs specialized for processing images) [closed]
I’m having difficulty finding the hardware resource specifications for different LLMs and VLMs. The leaderboard at this link — https://huggingface.co/spaces/opencompass/open_vlm_leaderboard — includes ...
Advice
0
votes
0
replies
36
views
How can I integrate the Computer Use API to enable AI-powered navigation and actions within my own application?
I'm building a voice-assisted navigation feature for my app that would allow users to:
Navigate between screens/pages using voice commands
Have an AI agent take actions on the current page (clicking ...
-2
votes
1
answer
119
views
My code with the OpenRouter API is not working [closed]
I'm trying to automate intraday financial analysis using OpenRouter's free models. My Python script rotates through several LLM endpoints, sending a prompt about a stock's symbol, change, and closing ...
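The rotation-and-request part of such a script is easy to isolate from the network call. A sketch that builds payloads for OpenRouter's OpenAI-compatible endpoint (`POST https://openrouter.ai/api/v1/chat/completions`, with an `Authorization: Bearer <key>` header); the model slugs below are illustrative, so check the live model list for real free-tier names:

```python
import itertools
import json

# Illustrative slugs only -- verify against openrouter.ai/models.
MODELS = [
    "meta-llama/llama-3.3-70b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
]
rotation = itertools.cycle(MODELS)

def build_request(symbol, change, close):
    """Chat-completions payload, cycling through the model list so
    each call goes to the next endpoint in the rotation."""
    return {
        "model": next(rotation),
        "messages": [{
            "role": "user",
            "content": f"Analyze {symbol}: change {change}%, close {close}.",
        }],
    }

print(json.dumps(build_request("AAPL", 1.2, 189.4), indent=2))
```

Separating payload construction from the HTTP call also makes failures easier to debug: you can log the exact JSON sent to each model.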
Best practices
0
votes
2
replies
46
views
Best method to manage metrics of prompts that are stored in a git repository
We are building an LLM-based application which takes a lot of user data from various internal sources.
It then sends the data to various prompts which provide the answers needed to fill out forms ...
Best practices
0
votes
0
replies
40
views
Prompt management: storing in git vs. DB
We are building an LLM-based application that takes a lot of user data from various internal sources and sends the data to various prompts which provide the answers needed to fill out forms correctly.
...
Advice
0
votes
0
replies
25
views
How is the output from prompt responses formatted in the tools?
I am trying to implement an application that displays text generated by an LLM, using the code from https://docs.ollama.com/quickstart#python.
I noticed that for longer prompts, the ...
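The quickstart's streaming variant yields many partial chunks rather than one response; the full text is the concatenation of each chunk's message content. A sketch of that joining step, using stand-in dicts shaped like the chunks the Ollama Python client streams:

```python
def join_stream(chunks):
    """With stream=True, the chat call yields partial chunks; the
    displayed text is the concatenation of their message contents."""
    return "".join(c["message"]["content"] for c in chunks)

# Stand-in for a streamed response (real chunks come from ollama.chat).
fake_stream = [
    {"message": {"content": "Hello"}},
    {"message": {"content": ", "}},
    {"message": {"content": "world."}},
]
print(join_stream(fake_stream))  # Hello, world.
```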
Advice
2
votes
5
replies
81
views
Building Privacy-Compliant LLM Apps (e.g. Section 203 StGB)
I’m working on an app that leverages Large Language Models (LLMs) to assist professionals in regulated fields like medicine and law. My main concern is ensuring compliance with privacy and secrecy ...
Tooling
0
votes
0
replies
87
views
Which LLMs can I run locally on a GTX 1080 8GB with 48GB RAM?
I'm exploring options for running large language models locally on my workstation and would appreciate guidance on suitable models given my hardware constraints.
Hardware specifications:
CPU: Intel ...
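A back-of-envelope way to shortlist models for 8 GB of VRAM is to estimate the quantized weight footprint plus some headroom for the KV cache and runtime buffers. A sketch (the 20% overhead factor is a rough assumption, not a measurement):

```python
def weights_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough quantized-model footprint in GB: weights at the given
    bit width, plus ~20% for KV cache and runtime buffers (assumed)."""
    return params_billion * bits_per_weight / 8 * overhead

# 8 GB VRAM: 7-8B models at 4-bit fit comfortably; 13B at 4-bit is
# borderline and may need partial CPU offload (48 GB RAM helps there).
for p in (7, 8, 13):
    print(f"{p}B @ 4-bit ~ {weights_gb(p, 4):.1f} GB")
```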