260 questions
19 votes · 3 answers · 27k views
BERT sentence embeddings from transformers
I'm trying to get sentence vectors from hidden states in a BERT model. Looking at the huggingface BertModel instructions here, which say:
from transformers import BertTokenizer, BertModel
tokenizer = ...
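A common approach (a sketch, not an official API) is to mean-pool the last hidden states, using the attention mask so padding tokens are ignored:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(["This is a sentence."], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings, masking out padding positions.
mask = inputs["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)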
141 votes · 8 answers · 249k views
How to change huggingface transformers default cache directory?
The default cache directory lacks disk capacity, so I need to change where the cache is stored. How can I do that?
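One way (a sketch; the path below is hypothetical and the relevant environment variables have shifted across versions) is to move the cache via an environment variable or the cache_dir argument:

import os
# Set before importing transformers; HF_HOME relocates the whole HuggingFace cache.
os.environ["HF_HOME"] = "/mnt/big_disk/hf_cache"

from transformers import AutoModel
# Or override per call:
model = AutoModel.from_pretrained("bert-base-uncased", cache_dir="/mnt/big_disk/hf_cache")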
0 votes · 2 answers · 7k views
How to use huggingface HF trainer train with custom collate function?
I have a custom dataset with custom table entries and wanted to handle it with a custom collate. But it didn't work when I passed a collate function I wrote (one that DOES work on an individual ...
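The Trainer accepts a data_collator callable; a minimal sketch, assuming model and train_dataset already exist and with my_collate as a placeholder:

from transformers import Trainer, TrainingArguments

def my_collate(batch):
    # batch is a list of dataset items; return a dict of batched tensors here.
    ...

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out"),
    train_dataset=train_dataset,
    data_collator=my_collate,  # custom collate function goes here
)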
12 votes · 7 answers · 15k views
How does one set the pad token correctly (not to eos) during fine-tuning to avoid model not predicting EOS?
tl;dr: what I really want to know is the official way to set the pad token for fine-tuning when it wasn't set during the original training, so that the model doesn't fail to learn to predict EOS.
colab: https://colab....
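A commonly suggested alternative to reusing EOS (a sketch, not an official recipe; it assumes tokenizer and model are already loaded) is to add a dedicated pad token and resize the embeddings:

tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # assumes tokenizer is loaded
model.resize_token_embeddings(len(tokenizer))          # grow the embedding matrix for the new token
model.config.pad_token_id = tokenizer.pad_token_id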
77 votes · 6 answers · 291k views
Load a pre-trained model from disk with Huggingface Transformers
From the documentation for from_pretrained, I understand I don't have to download the pretrained vectors every time; instead, I can save them and load from disk with this syntax:
- a path to a `directory` ...
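A minimal sketch of the save/load round trip (the local path is hypothetical):

from transformers import AutoTokenizer, AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.save_pretrained("/models/bert-base-uncased")
tokenizer.save_pretrained("/models/bert-base-uncased")

# Later, load from the local directory instead of the Hub:
model = AutoModel.from_pretrained("/models/bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("/models/bert-base-uncased")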
22 votes · 10 answers · 90k views
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /dslim/bert-base-NER/resolve/main/tokenizer_config.json
I am facing the issue below while loading a pretrained BERT model from HuggingFace, due to an SSL certificate error.
Error:
SSLError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries ...
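Workarounds people reach for (sketches, environment-dependent; the CA bundle path is hypothetical) include pointing the HTTP stack at a corporate CA bundle, or loading an already-downloaded copy without touching the network:

import os
# If a proxy intercepts TLS, point requests at the corporate CA bundle.
os.environ["REQUESTS_CA_BUNDLE"] = "/etc/ssl/certs/corp-ca.pem"

# Or, if the files are already in the local cache, skip the network entirely:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER", local_files_only=True)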
17 votes · 4 answers · 29k views
Sentence embeddings from LLAMA 2 Huggingface opensource
Is there any way of getting sentence embeddings from meta-llama/Llama-2-13b-chat-hf from huggingface?
Model link: https://huggingface.co/meta-llama/Llama-2-13b-chat-hf
I tried using transformer....
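Llama 2 has no sentence-embedding head, so a common workaround (a sketch; the model is gated, requires an access token, and the 13B variant needs substantial memory plus the accelerate package for device_map) is to mean-pool the last hidden states:

import torch
from transformers import AutoTokenizer, AutoModel

name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Some sentence.", return_tensors="pt").to(model.device)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_size)
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)  # mean pooling over tokens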
14 votes · 2 answers · 28k views
HuggingFace: ValueError: expected sequence of length 165 at dim 1 (got 128)
I am trying to fine-tune the BERT language model on my own data. I've gone through their docs, but their tasks don't seem to be quite what I need, since my end goal is embedding text. Here's my code:
...
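That error usually means the examples in a batch have different token lengths and can't be stacked into one tensor. A sketch of a tokenization step that pads and truncates everything to a fixed length (it assumes a datasets.Dataset with a "text" column and an already-loaded tokenizer):

def tokenize(batch):
    # Pad and truncate so every example has the same length.
    return tokenizer(batch["text"], padding="max_length", truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)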
13 votes · 4 answers · 14k views
How to reconstruct text entities with Hugging Face's transformers pipelines without IOB tags?
I've been looking to use Hugging Face's Pipelines for NER (named entity recognition). However, it is returning the entity labels in inside-outside-beginning (IOB) format but without the IOB labels. So ...
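Recent pipeline versions can group sub-token predictions into whole entities; a sketch (aggregation_strategy replaced the older grouped_entities flag):

from transformers import pipeline

ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))
# e.g. [{'entity_group': 'ORG', 'word': 'Hugging Face', ...}, {'entity_group': 'LOC', ...}]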
5 votes · 4 answers · 10k views
TFBertForSequenceClassification requires the TensorFlow library but it was not found in your environment
This code:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
Outputs ...
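The TF* classes need TensorFlow installed in the same environment; if only torch is available, the PyTorch class works instead (a sketch):

# Either: pip install tensorflow  (then TFBertForSequenceClassification imports fine)
# Or stay on the PyTorch side:
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")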
4 votes · 3 answers · 16k views
How to add all standard special tokens to my hugging face tokenizer and model?
I want all special tokens to always be available. How do I do this?
My first attempt to give it to my tokenizer:
def does_t5_have_sep_token():
    tokenizer: PreTrainedTokenizerFast = AutoTokenizer....
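A sketch of one way to register standard special tokens on a tokenizer and keep the model's embedding matrix in sync (the token strings below are the conventional BERT/sentencepiece-style ones and are assumptions; it also assumes tokenizer and model are loaded):

special = {
    "pad_token": "[PAD]",
    "unk_token": "[UNK]",
    "sep_token": "[SEP]",
    "cls_token": "[CLS]",
    "mask_token": "[MASK]",
    "bos_token": "<s>",
    "eos_token": "</s>",
}
num_added = tokenizer.add_special_tokens(special)
model.resize_token_embeddings(len(tokenizer))  # grow embeddings for any newly added tokens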
134 votes · 9 answers · 155k views
Where does hugging face's transformers save models?
Running the code below downloads a model. Does anyone know what folder it downloads it to?
!pip install -q transformers
from transformers import pipeline
model = pipeline('fill-mask')
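By default the files land under the HuggingFace cache in the home directory; a sketch for locating it (the exact subfolder layout has changed between versions):

import os
cache = os.environ.get("HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface"))
print(cache)  # models typically sit in a "hub" (newer) or "transformers" (older) subfolder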
74 votes · 5 answers · 112k views
How to disable TOKENIZERS_PARALLELISM=(true | false) warning?
I use PyTorch to train a huggingface-transformers model, but every epoch it outputs the warning:
The current process just got forked. Disabling parallelism to avoid deadlocks... To disable this ...
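Setting the environment variable before any tokenizer is used silences it (a sketch):

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # set before tokenizers run / DataLoader workers fork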
14 votes · 2 answers · 14k views
How to train BERT from scratch on a new domain for both MLM and NSP?
I'm trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like to train the model so that it has the exact architecture of the original BERT model.
In ...
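BertForPreTraining carries both the MLM and NSP heads; a minimal sketch of building an untrained model with the original architecture (the data pipeline, which must supply both MLM labels and next-sentence labels, is omitted):

from transformers import BertConfig, BertForPreTraining, BertTokenizerFast

config = BertConfig()              # defaults match bert-base: 12 layers, hidden size 768, 12 heads
model = BertForPreTraining(config)  # randomly initialized, with both MLM and NSP heads
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")  # or a tokenizer trained on your domain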
9 votes · 2 answers · 6k views
How to convert a PyTorch nn.Module into a HuggingFace PreTrainedModel object?
Given a simple neural net in PyTorch like:
import torch.nn as nn
net = nn.Sequential(
    nn.Linear(3, 4),
    nn.Sigmoid(),
    nn.Linear(4, 1),
    nn.Sigmoid()
).to(device)
How do I ...
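One pattern (a sketch following the custom-model approach; the class names here are made up) is to pair a PretrainedConfig subclass with a PreTrainedModel wrapper around the existing module, which then gets save_pretrained/from_pretrained for free:

import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class TinyNetConfig(PretrainedConfig):
    model_type = "tiny_net"
    def __init__(self, in_dim=3, hidden=4, **kwargs):
        self.in_dim = in_dim
        self.hidden = hidden
        super().__init__(**kwargs)

class TinyNetModel(PreTrainedModel):
    config_class = TinyNetConfig
    def __init__(self, config):
        super().__init__(config)
        self.net = nn.Sequential(
            nn.Linear(config.in_dim, config.hidden),
            nn.Sigmoid(),
            nn.Linear(config.hidden, 1),
            nn.Sigmoid(),
        )
    def forward(self, x):
        return self.net(x)

model = TinyNetModel(TinyNetConfig())
model.save_pretrained("tiny-net")  # serializes config.json plus the weights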