3,391 questions
1
vote
2
answers
153
views
ImportError: cannot import name 'MistralCommonBackend' from 'transformers'
Given:
$ pip install transformers mistral-common -Uq
$ pip show transformers mistral-common torch # to verify
Name: transformers
Version: 4.57.3
Summary: State-of-the-art Machine Learning for JAX, ...
0
votes
0
answers
84
views
Custom GRPO Trainer not Learning
I am new to reinforcement learning, so as an educational exercise I am implementing GRPO from scratch with PyTorch. My goal is to mimic how TRL works, but boil it down to just the loss function and ...
0
votes
0
answers
57
views
Huggingface caching stops during download without any error
I am trying to run an example from z-image-turbo
(https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
import os
os.environ["HF_HOME"] = "E:/MyCustomHFCache"
os.environ["...
Best practices
0
votes
2
replies
61
views
Recommended way to create abstracted text embeddings from large text data?
I would like to use an LLM encoder model to create vector embeddings for certain texts in my dataset. The texts are written as technical problem descriptions by experts who are trying to repair a ...
-1
votes
2
answers
103
views
Repetitive generation on instruction tuning for raw language model [closed]
I have used the following code to do SFT:
base_model = "google/gemma-3-270m"
it_model = "google/gemma-3-270m-it"
checkpoint_dir = "checkpoint"
learning_rate = 5e-5 #@...
1
vote
0
answers
153
views
How can I run Flux2 inference on 2 GPUs?
I am trying to run Flux2 inference on 2 GPUs as follows:
import torch
from diffusers import Flux2Pipeline
from accelerate import PartialState
import argparse
from pathlib import Path
def main():
parser ...
0
votes
0
answers
46
views
AttributeError when using Huggingface transformers Trainer with FSDP
I am trying to use FSDP with the Hugging Face transformers Trainer. The training script looks something like this:
train_dataset = Mydataset(...)
args = TrainingArguments(...)
model = LlamaForCausalLM....
2
votes
1
answer
74
views
Transformers LlamaForCausalLM class: base_model Attribute Mystery
Question:
I have a question about the transformers library, specifically about pipeline initialization. When I access the base_model attribute of a LlamaForCausalLM model, it seems to ...
0
votes
0
answers
88
views
IndexError: index -1 is out of bounds for dimension 0 with size 0
I am currently experimenting with modifying the KV cache of the LLaVA model in order to perform controlled interventions during generation (similar to cache-steering methods in recent research). The ...
1
vote
0
answers
207
views
Transformers 'could not import module pipeline' in Jupyter notebook
I need to run a series of pre-trained, fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest version of both PyTorch and Transformers, but when I run the code
from ...
1
vote
1
answer
94
views
Xcode Can't Find swift-transformers Package
I'm trying to implement Speech-to-Text transcription in my Swift app using Hugging Face's swift-transformers package to run Whisper models locally.
I've added the package to my Xcode project, but when ...
0
votes
1
answer
83
views
Generating response with KV Cached System Prompt throws error when Input Tokens are less than Prompt Tokens
I am trying to run Mistral-7B-Instruct-v0.2.
Each run is PROMPT + details[i].
PROMPT has instructions on how to generate JSON based on details.
As the prefix of each input is the same, kind of like a ...
0
votes
0
answers
137
views
Transformers with Python 3.12.3 produces lots of errors
I have Python 3.12.3 on an Ubuntu server. I tried to install transformers, tokenizers, datasets and accelerate to use the Seq2SeqTrainer from transformers.
I used a virtual environment for the ...
0
votes
0
answers
43
views
T5-small generates only padding tokens during validation/test in PyTorch Lightning
I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps.
The Problem:
During validation_step and test_step, model.generate() consistently ...
3
votes
0
answers
129
views
How does one log the operations done on a GPU during the execution of Python code?
I have encountered a particular problem while executing a function from the transformers library of huggingface on an Intel GPU wheel of torch. Since I am doing something I normally shouldn't be ...