0 votes · 1 answer · 73 views
I'm trying to use PyTorch with an NVIDIA GeForce RTX 5090 (Blackwell architecture, CUDA Compute Capability sm_120) on Windows 11, and I keep running into compatibility issues. PyTorch detects CUDA, ...
asked by sajjadesmaili
0 votes · 1 answer · 32 views
I'm trying to use Continue.dev with a local model running on Ollama, but keep getting the "file exceeds context length" error. My config looks like this: - name: qwen3-coder provider: ...
asked by Jacob Stern
Advice · 1 vote · 1 reply · 37 views
I'm building a tool that generates new mathematics exam problems using an internal database of past problems. My current setup uses a RAG pipeline, Pinecone as the vector database, and GPT-5 as the ...
asked by Marc-Loïc Abena
-9 votes · 0 answers · 89 views
I implemented a RAG pipeline using OpenAI embeddings (text-embedding-3-large), multi-vector indexing (sentence-level chunks), a FAISS flat index, and top-k = 5 retrieval. Even after this, the LLM still ...
asked by Karan Singwal
0 votes · 0 answers · 50 views
I am currently working on a Python-based Gen AI project that requires the efficient deployment and serving of multiple LLMs, specifically models with different parameter counts (Llama-2 7B and Mistral ...
asked by Amira Yassin
0 votes · 0 answers · 41 views
When converting mistralai/Mistral-Small-3.2-24B-Instruct-2506 to GGUF (via llama_cpp), I get an error saying the tokenizer.json file is missing. After re-examining the HF repo, there is, in fact, not a ...
asked by s3dev
Best practices · 0 votes · 0 replies · 27 views
I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels' nested loops. The key requirement is to extract static loop and ...
asked by yuxuan-z
-2 votes · 0 answers · 39 views
I'm having difficulty finding hardware resource specifications for different LLMs and VLMs. The leaderboard at https://huggingface.co/spaces/opencompass/open_vlm_leaderboard includes ...
asked by G KANISHK SAMURAI
Advice · 0 votes · 0 replies · 36 views
I'm building a voice-assisted navigation feature for my app that would allow users to navigate between screens/pages using voice commands, and have an AI agent take actions on the current page (clicking ...
asked by Jhutan Debnath
-2 votes · 1 answer · 119 views
I'm trying to automate intraday financial analysis using OpenRouter's free models. My Python script rotates through several LLM endpoints, sending a prompt about a stock's symbol, change, and closing ...
asked by Ad Du
Best practices · 0 votes · 2 replies · 46 views
We are building an LLM-based application which takes a lot of user data from various internal sources. It then sends the data to various prompts which provide the answers needed to fill out forms ...
asked by ssp
Best practices · 0 votes · 0 replies · 40 views
We are building an LLM-based application that takes a lot of user data from various internal sources and sends the data to various prompts which provide the answers needed to fill out forms correctly. ...
asked by ssp
Advice · 0 votes · 0 replies · 25 views
I am trying to implement an application which displays the output of text generated by an LLM, using the code from https://docs.ollama.com/quickstart#python. I noticed that for longer prompts, the ...
asked by Krzysztof Kazmierczyk
Advice · 2 votes · 5 replies · 81 views
I'm working on an app that leverages Large Language Models (LLMs) to assist professionals in regulated fields like medicine and law. My main concern is ensuring compliance with privacy and secrecy ...
asked by fersarr
Tooling · 0 votes · 0 replies · 87 views
I'm exploring options for running large language models locally on my workstation and would appreciate guidance on suitable models given my hardware constraints. Hardware specifications: CPU: Intel ...
asked by GbreH
