1,107 questions
2 votes · 1 answer · 130 views
Saving embeddings from encoder efficiently with fast random access
I have about 160 million embeddings that I created with a BERT-based encoder model.
Right now they are in .pt format and take about 500 GB on disk.
I want 2 things:
To save them in an ...
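A common pattern for fast random access to a huge embedding matrix is to store it as one flat binary file and read it through a NumPy memory map, so the OS pages in only the rows you touch. A minimal sketch (sizes and filename are illustrative; the real matrix would be roughly 160M × 768):

```python
import os
import tempfile
import numpy as np

n_vectors, dim = 1_000, 768          # illustrative; the real data is ~160M rows
path = os.path.join(tempfile.mkdtemp(), "embeddings.f16")

# Write the embeddings once as a flat binary file (n_vectors * dim * 2 bytes).
emb = np.random.rand(n_vectors, dim).astype(np.float16)
emb.tofile(path)

# Reopen as a memory map: random access to any row is O(1) and does not
# require loading the whole file into RAM.
mm = np.memmap(path, dtype=np.float16, mode="r", shape=(n_vectors, dim))
row = mm[123]                        # reads only the pages backing this row
assert np.array_equal(row, emb[123])
```

Converting the .pt tensors to such a file is a one-time export; the dtype is a trade-off (float16 halves the footprint at some precision cost).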
0 votes · 0 answers · 73 views
SPECTER2 similarity performs poorly
I'm trying to compute a measure of semantic similarity between titles of scientific publications using SPECTER2, but the model performs poorly.
Here is my code:
from transformers import AutoTokenizer
...
1 vote · 1 answer · 207 views
(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'
My code:
from transformers import AutoTokenizer, AutoModel
model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(...
0 votes · 1 answer · 288 views
How to handle negative search queries in a vector similarity search query?
I'm working with a vector database (qdrant) and using semantic search via embeddings (Sentence Transformers).
My use case involves queries like:
"Give animals other than cats"
or
"List ...
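One engine-agnostic workaround for negated queries is to subtract the embedding of the excluded term from the query embedding before searching. A toy sketch with a deterministic stand-in for the real Sentence Transformers encoder (the `embed` helper and the weight `alpha` are invented for illustration):

```python
import hashlib
import numpy as np

def embed(text):
    # Stand-in for a real Sentence Transformers encoder: a deterministic
    # pseudo-embedding per string, just to illustrate the vector arithmetic.
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    v = np.random.default_rng(seed).standard_normal(384)
    return v / np.linalg.norm(v)

query = embed("animals")
negative = embed("cats")

# Push the query vector away from the unwanted concept before searching;
# alpha controls how strongly "cats" is suppressed.
alpha = 0.5
adjusted = query - alpha * negative
adjusted = adjusted / np.linalg.norm(adjusted)
```

Qdrant also offers a recommendation API that accepts negative examples directly, which may fit this use case better than raw vector arithmetic.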
0 votes · 1 answer · 125 views
Langchain Cohere embeddings says "invalid type: parameter texts is of type object but should be of type string" despite receiving strings
This part of the code, specifically the part where rag_chain is invoked, causes an error:
Retrying langchain_cohere.embeddings.CohereEmbeddings.embed_with_retry.<locals>._embed_with_retry in 4.0 ...
0 votes · 1 answer · 848 views
How to add metadata to an Index datapoint in Vertex AI
I have been having problems integrating Vertex AI Matching Engine for embedding similarity search using the Google Cloud aiplatform Python SDK, specifically when adding metadata to an index datapoint ...
1 vote · 1 answer · 187 views
Create native image embeddings and then run a similarity search with text
Is it generally possible to create image embeddings directly (without accompanying text) and store them in a database? The aim is to make the content of the images findable later via a text input in the ...
0 votes · 0 answers · 45 views
IndexError: index out of range in self when training Dynamic Word Embedding Model (DWB)
I am a beginner in text mining analysis and am currently learning Dynamic Word Embedding (DWB) techniques. While running the replication codes from this Kaggle notebook, I encountered the following ...
1 vote · 1 answer · 174 views
Do I fit t-SNE on the two sets of embeddings from two different models at the same time, or do I fit each separately and then visualize and compare?
I have two sets of embeddings from two different GNNs. I want to compare the embeddings by visualization and I want to know which way is the most appropriate way for comparison. Do I fit t-SNE ...
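Because t-SNE has no transform step for unseen points, the usual recommendation is to concatenate both sets and fit once so they share a coordinate system, then split the 2-D output for plotting. A minimal sketch with random stand-ins for the two GNN embedding matrices:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
emb_a = rng.standard_normal((100, 64))   # stand-in for GNN #1 embeddings
emb_b = rng.standard_normal((100, 64))   # stand-in for GNN #2 embeddings

# Fit ONE t-SNE on the union: fitting each set separately would produce
# two unrelated layouts with arbitrary rotations and scales, which are
# not directly comparable.
both = np.vstack([emb_a, emb_b])
proj = TSNE(n_components=2, random_state=0).fit_transform(both)

proj_a, proj_b = proj[:100], proj[100:]  # split back for colored plotting
```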
0 votes · 1 answer · 665 views
Dimensionality reduction in sentence transformers
I need to compute embeddings for a large number of sentences (say 10K) in preprocessing, and at runtime I will have to compute the embedding vector for one sentence at a time (user query), and then ...
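A standard pattern here: fit a dimensionality reduction such as PCA on the corpus embeddings during preprocessing, persist it, and at runtime project each single query embedding with the same fitted transform. Sketched with random stand-ins for the Sentence Transformers output (sizes illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
corpus_emb = rng.standard_normal((10_000, 384))  # stand-in for 10K sentence embeddings

# Preprocessing: learn the projection once on the whole corpus.
pca = PCA(n_components=128).fit(corpus_emb)
corpus_small = pca.transform(corpus_emb)

# Runtime: the single query embedding goes through the SAME fitted PCA.
query_emb = rng.standard_normal((1, 384))
query_small = pca.transform(query_emb)

# Cosine similarity then runs entirely in the reduced 128-d space.
sims = corpus_small @ query_small.T / (
    np.linalg.norm(corpus_small, axis=1, keepdims=True)
    * np.linalg.norm(query_small))
```

The fitted PCA object must be saved (e.g. with joblib) alongside the reduced corpus so the runtime query uses the identical projection.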
0 votes · 1 answer · 469 views
Cannot load the safetensors Hugging Face model in DJL in Java
I have tried a lot, but I want to read embeddings from the Jina embeddings model.
This is my Java code:
public static float[] getTextEmbedding(String text) throws ModelNotFoundException, MalformedModelException, ...
-1 votes · 1 answer · 114 views
Classifying Table Headers from Column Data
I have a large set of CSVs with numeric and text data; here's a sample:
Company ID | Company Name | Group ID | Currency | Amount | ...
8494494    | Acme Inc     | F942G    | EUR      | $1.56  | ...
9283422A   | Walmart      | XXH3F3   | AUD      | $5.64  | ...
...
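One way to attack header classification without a large language model is to treat a few sampled cell values from each column as a single string and train a character n-gram classifier over labelled columns, since n-grams capture value "shapes" like `$d.dd` or three-letter currency codes. A small hypothetical sketch (training rows are invented for illustration, seeded with values from the sample above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each training example: sampled cell values from one column joined into
# one string, labelled with the known header for that column.
train_cols = [
    ("8494494 1220393 5551212", "Company ID"),
    ("9283422A 11223B 99X12", "Company ID"),
    ("Acme Inc Walmart Initech", "Company Name"),
    ("Globex Corp Umbrella Stark", "Company Name"),
    ("EUR AUD USD", "Currency"),
    ("GBP JPY CHF", "Currency"),
    ("$1.56 $5.64 $99.10", "Amount"),
    ("$0.99 $12.00 $7.45", "Amount"),
]
texts, labels = zip(*train_cols)

# Character n-grams rather than word tokens, so the model learns the
# surface shape of the values instead of their vocabulary.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

print(clf.predict(["CAD NZD SEK"]))   # an unseen currency-like column
```

With real data you would sample many columns per header and hold out a validation set; embedding-based features could replace TF-IDF later if shape features prove insufficient.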
0 votes · 1 answer · 549 views
How do I freeze only some embedding indices with tied embeddings?
I found in Is it possible to freeze only certain embedding weights in the embedding layer in pytorch? a nice way to freeze only some indices of an embedding layer.
However, while including it in a ...
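The usual implementation of that linked idea is a gradient hook that zeroes the rows of the frozen indices; with tied embeddings the input embedding and the output projection share one weight tensor, so a single hook on that shared parameter masks the gradient contributions from both uses. A minimal sketch with a toy model (all names and sizes invented):

```python
import torch
import torch.nn as nn

vocab, dim = 10, 4
emb = nn.Embedding(vocab, dim)
out = nn.Linear(dim, vocab, bias=False)
out.weight = emb.weight            # weight tying: one shared Parameter

frozen = torch.tensor([0, 1, 2])   # indices whose vectors must not move

def zero_frozen_rows(grad):
    grad = grad.clone()
    grad[frozen] = 0.0             # kill updates for the frozen rows only
    return grad

# The hook fires once per backward pass with the accumulated gradient,
# which already sums the contributions from both tied uses of the weight.
emb.weight.register_hook(zero_frozen_rows)

before = emb.weight.data.clone()
opt = torch.optim.SGD([emb.weight], lr=0.1)
loss = out(emb(torch.tensor([1, 5, 7]))).sum()
loss.backward()
opt.step()

assert torch.equal(emb.weight.data[frozen], before[frozen])   # frozen rows unchanged
assert not torch.equal(emb.weight.data[5], before[5])         # other rows still learn
```

Note that optimizers with per-parameter state (e.g. Adam with weight decay) can still drift the frozen rows; plain SGD as above, or re-copying the frozen rows after each step, avoids that.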
1 vote · 0 answers · 66 views
Extracting word embeddings from log probabilities
I am solving an image-captioning-related issue and eventually have to extract the embeddings of the tokens. One possible way is to extract the embeddings using the tokens, but I cannot do that ...
0 votes · 1 answer · 226 views
Why does my SimpleRNN model in Sequential API show '?' for output shape and zero trainable parameters when using Embedding layers?
I'm building a SimpleRNN model with an Embedding layer in Keras and encountering an issue when using the Sequential API. The model summary shows the output shape as ? and the number of trainable ...