1,648 questions
0
votes
1
answer
73
views
PyTorch not recognizing RTX 5090 (sm_120) on Windows 11 – CUDA error: no kernel image available
I'm trying to use PyTorch with an NVIDIA GeForce RTX 5090 (Blackwell architecture, CUDA Compute Capability sm_120) on Windows 11, and I keep running into compatibility issues. PyTorch detects CUDA, ...
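The "no kernel image available" error means the installed PyTorch wheel was not compiled for the GPU's compute capability; PyTorch reports the architectures a build ships via `torch.cuda.get_arch_list()`. A torch-free sketch of that check (the architecture lists below are illustrative, not the exact contents of any particular wheel):

```python
def has_kernel_image(arch_list, sm="sm_120"):
    """True if a wheel compiled for arch_list ships a kernel image
    for the given SM (what torch.cuda.get_arch_list() reports)."""
    return sm in arch_list

# Illustrative: stable CUDA 12.4-era wheels top out at sm_90 (Hopper),
# so Blackwell (sm_120) needs a newer CUDA 12.8-era build.
stable_cu124 = ["sm_50", "sm_60", "sm_70", "sm_80", "sm_90"]
nightly_cu128 = stable_cu124 + ["sm_100", "sm_120"]

print(has_kernel_image(stable_cu124))   # False -> "no kernel image available"
print(has_kernel_image(nightly_cu128))  # True
```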
0
votes
1
answer
32
views
Continue.dev contextLength parameter doesn't work for Ollama model
I'm trying to use Continue.dev with a local model running on Ollama, but keep getting the "file exceeds context length" error.
My config looks like this:
- name: qwen3-coder
provider: ...
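For reference, a config sketch (field names follow Continue's YAML config format and are worth verifying against the Continue docs):

```yaml
models:
  - name: qwen3-coder
    provider: ollama
    model: qwen3-coder
    defaultCompletionOptions:
      contextLength: 32768  # Continue's prompt budget only
```

Note that raising `contextLength` on the Continue side only changes how much context Continue sends; the window Ollama actually serves is controlled separately via `num_ctx` (for example `PARAMETER num_ctx 32768` in a Modelfile), so both usually need to agree.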
Advice
1
vote
1
replies
37
views
RAG with Pinecone + GPT-5 for generating new math problems: incoherent outputs, mixed chunks, and lack of originality
I’m building a tool that generates new mathematics exam problems using an internal database of past problems.
My current setup uses a RAG pipeline, Pinecone as the vector database, and GPT-5 as the ...
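Mixed chunks in the prompt are a common cause of incoherent generations with this kind of setup: if several retrieved fragments come from the same past problem (or fragments of different problems are interleaved), the model stitches them together. One mitigation is to deduplicate retrieval results by source problem before building the prompt. A sketch, assuming each Pinecone match carries a hypothetical `problem_id` metadata field:

```python
def dedupe_by_problem(matches, max_problems=3):
    """Keep at most one retrieved chunk per source problem so the
    prompt doesn't interleave fragments of different problems.
    `matches` is assumed sorted by score, each with a metadata dict."""
    seen, kept = set(), []
    for m in matches:
        pid = m["metadata"]["problem_id"]
        if pid not in seen:
            seen.add(pid)
            kept.append(m)
        if len(kept) == max_problems:
            break
    return kept

matches = [
    {"metadata": {"problem_id": "alg-101"}},
    {"metadata": {"problem_id": "alg-101"}},  # duplicate source, dropped
    {"metadata": {"problem_id": "geo-007"}},
]
print([m["metadata"]["problem_id"] for m in dedupe_by_problem(matches)])
```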
-9
votes
0
answers
89
views
Why is my LLM hallucinating even after using a multi-vector RAG pipeline with FAISS? [closed]
I implemented a RAG pipeline using:
OpenAI embeddings (text-embedding-3-large)
Multi-vector indexing (sentence-level chunks)
FAISS flat index
Top-k = 5 retrieval
Even after this, the LLM still ...
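Hallucination usually stems from weak grounding instructions or low retrieval recall rather than from the index itself; the flat-index top-k step is simple enough to verify in isolation. A numpy-only sketch of what a FAISS flat inner-product index computes over L2-normalized embeddings (i.e. cosine similarity):

```python
import numpy as np

def top_k(query, index, k=5):
    """Top-k by inner product over L2-normalized vectors, the same
    scoring a flat cosine-similarity index performs."""
    q = query / np.linalg.norm(query)
    c = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

rng = np.random.default_rng(0)
idx, sc = top_k(rng.normal(size=64), rng.normal(size=(100, 64)), k=5)
print(idx, sc)
```

If this step returns the right chunks on a few hand-checked queries, the problem is downstream (prompting or chunk granularity), not the index.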
0
votes
0
answers
50
views
Optimization Challenge in Hugging Face: Efficiently Serving Multiple, Differently Sized LLMs on a Single GPU with PyTorch [closed]
I am currently working on a Python-based Gen AI project that requires the efficient deployment and serving of multiple LLMs, specifically models with different parameter counts (Llama-2 7B and Mistral ...
0
votes
0
answers
41
views
Missing tokenizer.json File - GGUF Conversion
When converting mistralai/Mistral-Small-3.2-24B-Instruct-2506 to GGUF (via llama_cpp), I get an error saying the tokenizer.json file is missing. After re-examining the HF repo, there is, in fact, not a ...
Best practices
0
votes
0
replies
27
views
How to extract nested loop features from CUDA kernels for LLM-based optimization?
Question
I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels’ nested loops. The key requirement is to extract static loop and ...
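For the static-extraction part, a production pipeline would typically walk a Clang/CUDA AST, but a rough brace-tracking scan is enough to prototype the loop features fed to the LLM. A sketch (it ignores comments, strings, and macros, so treat it as an approximation):

```python
import re

FOR_RE = re.compile(r"\bfor\s*\(")

def loop_depths(cuda_src):
    """Report the brace-nesting depth at which each `for (` appears
    in CUDA source text. Depth 1 = kernel body, depth 2 = inside one
    enclosing loop, and so on."""
    depths, depth, i = [], 0, 0
    while i < len(cuda_src):
        m = FOR_RE.match(cuda_src, i)
        if m:
            depths.append(depth)
            i = m.end()
            continue
        ch = cuda_src[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        i += 1
    return depths

src = """
__global__ void k(float* a, int n) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            a[i * n + j] = 0.f;
        }
    }
}
"""
print(loop_depths(src))  # [1, 2]: one loop in the kernel body, one nested
```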
-2
votes
0
answers
39
views
Resource specifications or requirements of Vision Language Models (LLMs specialized for processing images) [closed]
I’m having difficulty finding the hardware resource specifications for different LLMs and VLMs. The leaderboard at this link — https://huggingface.co/spaces/opencompass/open_vlm_leaderboard — includes ...
Advice
0
votes
0
replies
36
views
How can I integrate the Computer Use API to enable AI-powered navigation and actions within my own application?
I'm building a voice-assisted navigation feature for my app that would allow users to:
Navigate between screens/pages using voice commands
Have an AI agent take actions on the current page (clicking ...
-2
votes
1
answer
119
views
My code with the OpenRouter API is not working [closed]
I'm trying to automate intraday financial analysis using OpenRouter's free models. My Python script rotates through several LLM endpoints, sending a prompt about a stock's symbol, change, and closing ...
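The rotation-and-request part of such a script is easy to isolate from the network call. A sketch that builds payloads for OpenRouter's OpenAI-compatible endpoint (`POST https://openrouter.ai/api/v1/chat/completions`, with an `Authorization: Bearer <key>` header); the model slugs below are illustrative, so check the live model list for real free-tier names:

```python
import itertools
import json

# Illustrative slugs only -- verify against openrouter.ai/models.
MODELS = [
    "meta-llama/llama-3.3-70b-instruct:free",
    "mistralai/mistral-7b-instruct:free",
]
rotation = itertools.cycle(MODELS)

def build_request(symbol, change, close):
    """Chat-completions payload, cycling through the model list so
    each call goes to the next endpoint in the rotation."""
    return {
        "model": next(rotation),
        "messages": [{
            "role": "user",
            "content": f"Analyze {symbol}: change {change}%, close {close}.",
        }],
    }

print(json.dumps(build_request("AAPL", 1.2, 189.4), indent=2))
```

Separating payload construction from the HTTP call also makes failures easier to debug: you can log the exact JSON sent to each model.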
Best practices
0
votes
2
replies
46
views
Best method to manage metrics of prompts that are stored in a git repository
We are building an LLM-based application which takes a lot of user data from various internal sources.
It then sends the data to various prompts which provide the answers needed to fill out forms ...
Best practices
0
votes
0
replies
40
views
Prompt management: storing in git vs. DB
We are building an LLM-based application that takes a lot of user data from various internal sources and sends the data to various prompts which provide the answers needed to fill out forms correctly.
...
Advice
0
votes
0
replies
25
views
How is the output from prompt responses formatted in the tools?
I am trying to implement an application that displays text generated by an LLM, using the code from https://docs.ollama.com/quickstart#python.
I noticed that for longer prompts, the ...
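The quickstart's streaming variant yields many partial chunks rather than one response; the full text is the concatenation of each chunk's message content. A sketch of that joining step, using stand-in dicts shaped like the chunks the Ollama Python client streams:

```python
def join_stream(chunks):
    """With stream=True, the chat call yields partial chunks; the
    displayed text is the concatenation of their message contents."""
    return "".join(c["message"]["content"] for c in chunks)

# Stand-in for a streamed response (real chunks come from ollama.chat).
fake_stream = [
    {"message": {"content": "Hello"}},
    {"message": {"content": ", "}},
    {"message": {"content": "world."}},
]
print(join_stream(fake_stream))  # Hello, world.
```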
Advice
2
votes
5
replies
81
views
Building Privacy-Compliant LLM Apps (e.g. Section 203 StGB)
I’m working on an app that leverages Large Language Models (LLMs) to assist professionals in regulated fields like medicine and law. My main concern is ensuring compliance with privacy and secrecy ...
Tooling
0
votes
0
replies
87
views
Which LLMs can I run locally on a GTX 1080 8GB with 48GB RAM?
I'm exploring options for running large language models locally on my workstation and would appreciate guidance on suitable models given my hardware constraints.
Hardware specifications:
CPU: Intel ...
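A back-of-envelope way to shortlist models for 8 GB of VRAM is to estimate the quantized weight footprint plus some headroom for the KV cache and runtime buffers. A sketch (the 20% overhead factor is a rough assumption, not a measurement):

```python
def weights_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rough quantized-model footprint in GB: weights at the given
    bit width, plus ~20% for KV cache and runtime buffers (assumed)."""
    return params_billion * bits_per_weight / 8 * overhead

# 8 GB VRAM: 7-8B models at 4-bit fit comfortably; 13B at 4-bit is
# borderline and may need partial CPU offload (48 GB RAM helps there).
for p in (7, 8, 13):
    print(f"{p}B @ 4-bit ~ {weights_gb(p, 4):.1f} GB")
```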