Open
System Info
macOS 26.0
Python 3.10
PyTorch 2.7.1
transformers 4.57.1
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoModel, AutoProcessor
from transformers.image_utils import load_image

# load the model and processor
ckpt = "google/siglip2-so400m-patch16-naflex"
model = AutoModel.from_pretrained(ckpt).eval()
processor = AutoProcessor.from_pretrained(ckpt)

# load the image
image = load_image("https://huggingface.co/datasets/merve/coco/resolve/main/val2017/000000000285.jpg")
inputs = processor(images=[image], return_tensors="pt").to(model.device)

# run inference, requesting attentions and hidden states
with torch.no_grad():
    image_embeddings = model.vision_model(
        pixel_values=inputs["pixel_values"],
        attention_mask=inputs["pixel_attention_mask"],
        spatial_shapes=inputs["spatial_shapes"],
        output_attentions=True,
        output_hidden_states=True,
    )

print(image_embeddings)
Expected behavior
I'm trying to get attentions and hidden_states from google/siglip2-so400m-patch16-naflex using the model.vision_model.forward() method.
output_attentions can be passed to from_pretrained(), set in a new config, or passed as a parameter to .forward(). Whichever of these I use, attentions and hidden_states are always returned as None.
Changing attn_implementation to eager does not solve the problem.
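To make the symptom easy to check, here is a minimal sketch of a helper that lists which requested fields came back as None. The name missing_outputs and the stand-in fake_outputs object are hypothetical and are not part of transformers. The only assumption is that the returned object exposes attentions and hidden_states as attributes, as the report describes.

```python
from types import SimpleNamespace


def missing_outputs(outputs, fields=("attentions", "hidden_states")):
    """Return the names in `fields` whose value on `outputs` is None or absent."""
    return [name for name in fields if getattr(outputs, name, None) is None]


# Hypothetical stand-in for the object returned by model.vision_model(...),
# mimicking the reported behavior where both optional fields are None.
fake_outputs = SimpleNamespace(
    last_hidden_state="placeholder",
    attentions=None,
    hidden_states=None,
)

print(missing_outputs(fake_outputs))
```

In the reproduction above, calling missing_outputs(image_embeddings) after the forward pass should return an empty list once the flags are honored.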