1,107 questions
2 votes · 1 answer · 130 views
Saving embeddings from encoder efficiently with fast random access
I have about 160 million embeddings that I created with a BERT-based encoder model.
Right now they are in .pt format and take about 500 GB on disk.
I want 2 things:
To save them in an ...
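A common pattern for fast random access to a huge embedding matrix is to store it as one flat binary file and read it through a NumPy memory map, so the OS pages in only the rows you touch. A minimal sketch (sizes and filename are illustrative; the real matrix would be roughly 160M × 768):

```python
import os
import tempfile
import numpy as np

n_vectors, dim = 1_000, 768          # illustrative; the real data is ~160M rows
path = os.path.join(tempfile.mkdtemp(), "embeddings.f16")

# Write the embeddings once as a flat binary file (n_vectors * dim * 2 bytes).
emb = np.random.rand(n_vectors, dim).astype(np.float16)
emb.tofile(path)

# Reopen as a memory map: random access to any row is O(1) and does not
# require loading the whole file into RAM.
mm = np.memmap(path, dtype=np.float16, mode="r", shape=(n_vectors, dim))
row = mm[123]                        # reads only the pages backing this row
assert np.array_equal(row, emb[123])
```

Converting the .pt tensors to such a file is a one-time export; the dtype is a trade-off (float16 halves the footprint at some precision cost).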
0 votes · 0 answers · 73 views
SPECTER2 similarity performs poorly
I'm trying to compute a measure of semantic similarity between titles of scientific publications using SPECTER2, but the model performs poorly.
Here is my code:
from transformers import AutoTokenizer
...
1 vote · 1 answer · 207 views
(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'
My code:
from transformers import AutoTokenizer, AutoModel
model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(...
0 votes · 1 answer · 288 views
How to handle negative search queries in a vector similarity search query?
I'm working with a vector database (qdrant) and using semantic search via embeddings (Sentence Transformers).
My use case involves queries like:
"Give animals other than cats"
or
"List ...
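One engine-agnostic workaround for negated queries is to subtract the embedding of the excluded term from the query embedding before searching. A toy sketch with a deterministic stand-in for the real Sentence Transformers encoder (the `embed` helper and the weight `alpha` are invented for illustration):

```python
import hashlib
import numpy as np

def embed(text):
    # Stand-in for a real Sentence Transformers encoder: a deterministic
    # pseudo-embedding per string, just to illustrate the vector arithmetic.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

query = embed("animals")
negative = embed("cats")

# Push the query vector away from the unwanted concept before searching;
# alpha controls how strongly "cats" is suppressed.
alpha = 0.5
adjusted = query - alpha * negative
adjusted = adjusted / np.linalg.norm(adjusted)
```

Qdrant also offers a recommendation API that accepts negative examples directly, which may fit this use case better than raw vector arithmetic.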
0 votes · 1 answer · 125 views
Langchain Cohere embeddings says "invalid type: parameter texts is of type object but should be of type string" despite receiving strings
This part of the code, specifically the part where rag_chain is invoked, causes an error:
Retrying langchain_cohere.embeddings.CohereEmbeddings.embed_with_retry.<locals>._embed_with_retry in 4.0 ...
0 votes · 1 answer · 848 views
How to add metadata to an Index datapoint in Vertex AI
I have been having problems integrating Vertex AI Matching Engine for embedding similarity search using the Google Cloud aiplatform Python SDK, specifically when adding metadata to an index datapoint ...
1 vote · 1 answer · 187 views
Create native image embeddings and then run a similarity search with text
Is it generally possible to create image embeddings directly (without accompanying text) and store them in a database? The aim is to make the content of the images findable later via a text input in the ...
0 votes · 0 answers · 45 views
IndexError: index out of range in self when training Dynamic Word Embedding Model (DWB)
I am a beginner in text mining analysis and am currently learning Dynamic Word Embedding (DWB) techniques. While running the replication codes from this Kaggle notebook, I encountered the following ...
1 vote · 1 answer · 174 views
Do I fit t-SNE on the two sets of embeddings from two different models at the same time, or do I fit each separately and then visualize and compare?
I have two sets of embeddings from two different GNNs. I want to compare the embeddings by visualization and I want to know which way is the most appropriate way for comparison. Do I fit t-SNE ...
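Because t-SNE has no transform step for unseen points, the usual recommendation is to concatenate both sets and fit once so they share a coordinate system, then split the 2-D output for plotting. A minimal sketch with random stand-ins for the two GNN embedding matrices:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
emb_a = rng.standard_normal((100, 64))   # stand-in for GNN #1 embeddings
emb_b = rng.standard_normal((100, 64))   # stand-in for GNN #2 embeddings

# Fit ONE t-SNE on the union: fitting each set separately would produce
# two unrelated layouts with arbitrary rotations and scales, which are
# not directly comparable.
both = np.vstack([emb_a, emb_b])
proj = TSNE(n_components=2, random_state=0).fit_transform(both)

proj_a, proj_b = proj[:100], proj[100:]  # split back for colored plotting
```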
0 votes · 1 answer · 665 views
Dimensionality reduction in sentence transformers
I need to compute embeddings for a large number of sentences (say 10K) in preprocessing, and at runtime I will have to compute the embedding vector for one sentence at a time (user query), and then ...
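A standard pattern here: fit a dimensionality reduction such as PCA on the corpus embeddings during preprocessing, persist it, and at runtime project each single query embedding with the same fitted transform. Sketched with random stand-ins for the Sentence Transformers output (sizes illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
corpus_emb = rng.standard_normal((10_000, 384))  # stand-in for 10K sentence embeddings

# Preprocessing: learn the projection once on the whole corpus.
pca = PCA(n_components=128).fit(corpus_emb)
corpus_small = pca.transform(corpus_emb)

# Runtime: the single query embedding goes through the SAME fitted PCA.
query_emb = rng.standard_normal((1, 384))
query_small = pca.transform(query_emb)

# Cosine similarity then runs entirely in the reduced 128-d space.
sims = corpus_small @ query_small.T / (
    np.linalg.norm(corpus_small, axis=1, keepdims=True)
    * np.linalg.norm(query_small))
```

The fitted PCA object must be saved (e.g. with joblib) alongside the reduced corpus so the runtime query uses the identical projection.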
0 votes · 1 answer · 469 views
Cannot load the safetensors Hugging Face model in DJL in Java
I have tried a lot, but I want to read embeddings from the Jina embeddings model.
This is my Java code:
public static float[] getTextEmbedding(String text) throws ModelNotFoundException, MalformedModelException, ...
-1 votes · 1 answer · 114 views
Classifying Table Headers from Column Data
I have a large set of CSVs with numeric and text data; here's a sample:
Company ID | Company Name | Group ID | Currency | Amount | ...
8494494    | Acme Inc     | F942G    | EUR      | $1.56  | ...
9283422A   | Walmart      | XXH3F3   | AUD      | $5.64  | ...
...
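One way to attack header classification without a large language model is to treat a few sampled cell values from each column as a single string and train a character n-gram classifier over labelled columns, since n-grams capture value "shapes" like `$d.dd` or three-letter currency codes. A small hypothetical sketch (training rows are invented for illustration, seeded with values from the sample above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example: sampled cell values from one column joined into
# one string, labelled with the known header for that column.
train_cols = [
    ("8494494 1220393 5551212", "Company ID"),
    ("9283422A 11223B 99X12", "Company ID"),
    ("Acme Inc Walmart Initech", "Company Name"),
    ("Globex Corp Umbrella Stark", "Company Name"),
    ("EUR AUD USD", "Currency"),
    ("GBP JPY CHF", "Currency"),
    ("$1.56 $5.64 $99.10", "Amount"),
    ("$0.99 $12.00 $7.45", "Amount"),
]
texts, labels = zip(*train_cols)

# Character n-grams rather than word tokens, so the model learns the
# surface shape of the values instead of their vocabulary.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

print(clf.predict(["CAD NZD SEK"]))   # an unseen currency-like column
```

With real data you would sample many columns per header and hold out a validation set; embedding-based features could replace TF-IDF later if shape features prove insufficient.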
0 votes · 1 answer · 549 views
How do I freeze only some embedding indices with tied embeddings?
I found in Is it possible to freeze only certain embedding weights in the embedding layer in pytorch? a nice way to freeze only some indices of an embedding layer.
However, while including it in a ...
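The usual implementation of that linked idea is a gradient hook that zeroes the rows of the frozen indices; with tied embeddings the input embedding and the output projection share one weight tensor, so a single hook on that shared parameter masks the gradient contributions from both uses. A minimal sketch with a toy model (all names and sizes invented):

```python
import torch
import torch.nn as nn

vocab, dim = 10, 4
emb = nn.Embedding(vocab, dim)
out = nn.Linear(dim, vocab, bias=False)
out.weight = emb.weight            # weight tying: one shared Parameter

frozen = torch.tensor([0, 1, 2])   # indices whose vectors must not move

def zero_frozen_rows(grad):
    grad = grad.clone()
    grad[frozen] = 0.0             # kill updates for the frozen rows only
    return grad

# The hook fires once per backward pass with the accumulated gradient,
# which already sums the contributions from both tied uses of the weight.
emb.weight.register_hook(zero_frozen_rows)

before = emb.weight.data.clone()
opt = torch.optim.SGD([emb.weight], lr=0.1)
loss = out(emb(torch.tensor([1, 5, 7]))).sum()
loss.backward()
opt.step()

assert torch.equal(emb.weight.data[frozen], before[frozen])   # frozen rows unchanged
assert not torch.equal(emb.weight.data[5], before[5])         # other rows still learn
```

Note that optimizers with per-parameter state (e.g. Adam with weight decay) can still drift the frozen rows; plain SGD as above, or re-copying the frozen rows after each step, avoids that.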
1 vote · 0 answers · 66 views
Extracting word embeddings from log probabilities
I am solving an image-captioning-related issue and eventually have to extract the embeddings of the tokens. One possible way is to extract the embeddings using the tokens, but I cannot do that ...
0 votes · 1 answer · 226 views
Why does my SimpleRNN model in Sequential API show '?' for output shape and zero trainable parameters when using Embedding layers?
I'm building a SimpleRNN model with an Embedding layer in Keras and encountering an issue when using the Sequential API. The model summary shows the output shape as ? and the number of trainable ...