Newest 'huggingface-transformers' Questions

1 vote

2 answers

153 views

ImportError: cannot import name 'MistralCommonBackend' from 'transformers'

Given:
$ pip install transformers mistral-common -Uq
$ pip show transformers mistral-common torch # to verify
Name: transformers
Version: 4.57.3
Summary: State-of-the-art Machine Learning for JAX, ...

farid

1,641

asked Dec 17, 2025 at 13:09

0 votes

0 answers

84 views

Custom GRPO Trainer not Learning

I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...

csnate

1,661

asked Dec 15, 2025 at 1:07

0 votes

0 answers

57 views

Huggingface caching stops during download without any error

I am trying to run an example from Z-Image-Turbo (https://huggingface.co/Tongyi-MAI/Z-Image-Turbo):
import os
os.environ["HF_HOME"] = "E:/MyCustomHFCache"
os.environ["...

Recently_Created_User

171

asked Dec 9, 2025 at 11:08

Best practices

0 votes

2 replies

61 views

Recommended way to create abstracted text embeddings from large text data?

I would like to use an LLM encoder model to create vector embeddings for certain texts in my dataset. The texts are written as technical problem descriptions by experts who are trying to repair a ...

Alles Klar

31

asked Dec 8, 2025 at 12:32

-1 votes

2 answers

103 views

Repetitive generation on instruction tuning for raw language model [closed]

I have used the following code to do SFT:
base_model = "google/gemma-3-270m"
it_model = "google/gemma-3-270m-it"
checkpoint_dir = "checkpoint"
learning_rate = 5e-5 #@...

keramat

4,611

asked Dec 1, 2025 at 11:33

1 vote

0 answers

153 views

How can I run Flux2 inference on 2 GPUs?

I am trying to run Flux2 inference on 2 GPUs as follows: import torch from diffusers import Flux2Pipeline from accelerate import PartialState import argparse from pathlib import Path def main(): parser ...

Franck Dernoncourt

84.8k

asked Nov 29, 2025 at 23:59

0 votes

0 answers

46 views

AttributeError when using Huggingface transformers Trainer with FSDP

I am trying to use FSDP with the Huggingface transformers Trainer. The training script is something like train_dataset = Mydataset(...) args = TrainingArguments(...) model = LlamaForCausalLM....

xuehao-049

11

asked Nov 28, 2025 at 4:11

2 votes

1 answer

74 views

Transformers LlamaForCausalLM class: base_model Attribute Mystery

Question: I have a question about the transformers library, specifically about pipeline initialization. When I access the base_model attribute of a LlamaForCausalLM model, it seems to ...

Hank Wang

21

asked Nov 16, 2025 at 5:34

0 votes

0 answers

88 views

IndexError: index -1 is out of bounds for dimension 0 with size 0

I am currently experimenting with modifying the KV cache of the LLaVA model in order to perform controlled interventions during generation (similar to cache-steering methods in recent research). The ...

Pulkit Mittal

25

asked Nov 7, 2025 at 7:41

1 vote

0 answers

207 views

Transformers 'could not import module pipeline' to jupyter notebook

I need to run a series of pre-trained fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest version of both PyTorch and Transformers, but when I run the code from ...

Alex Colville

11

asked Nov 4, 2025 at 9:16

1 vote

1 answer

94 views

Xcode Can't Find swift-transformers Package

I'm trying to implement Speech-to-Text transcription in my Swift app using Hugging Face's swift-transformers package to run Whisper models locally. I've added the package to my Xcode project, but when ...

Zaid

513

asked Nov 2, 2025 at 15:07

0 votes

1 answer

83 views

Generating response with KV Cached System Prompt throws error when Input Tokens are less than Prompt Tokens

I am trying to run Mistral-7B-Instruct-v0.2. Each run is PROMPT + details[i]. PROMPT has instructions on how to generate JSON based on details. Since the prefix of each input is the same, kind of like a ...

acdhemtos

1

asked Oct 28, 2025 at 22:54

0 votes

0 answers

137 views

Transformers with Python 3.12.3 produces lots of errors

I have Python 3.12.3 on an Ubuntu server. I tried to install transformers, tokenizers, datasets and accelerate to use the Seq2SeqTrainer in transformers. I used a virtual environment for the ...

Raptor

54.4k

asked Oct 28, 2025 at 4:35

0 votes

0 answers

43 views

T5-small generates only padding tokens during validation/test in PyTorch Lightning

I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps. The Problem: During validation_step and test_step, model.generate() consistently ...

GeraniumCat

21

asked Oct 20, 2025 at 20:11

3 votes

0 answers

129 views

How does one log the operations done on a GPU during the execution of Python code?

I have encountered a particular problem while executing a function from the transformers library of huggingface on an Intel GPU wheel of torch. Since I am doing something I normally shouldn't be ...

Logarithmnepnep

31

asked Oct 17, 2025 at 11:19
