Bug description
When loading Qwen2VL or Qwen3VL image processors using from_pretrained() with the max_pixels parameter, the parameter is accepted without error but silently ignored during image processing. This causes images to be resized using the default max_pixels=16,777,216 instead of the user-specified value, resulting in significantly higher token counts than expected.
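Workaround (for reference): writing the limit into `size` directly after loading appears to take effect, since the resize path seems to consult `size['longest_edge']` rather than the stored `max_pixels` attribute (see the reproduction output below). This is a sketch based on that observation, not a verified fix:

```python
from transformers import Qwen3VLProcessor

processor = Qwen3VLProcessor.from_pretrained(
    'Qwen/Qwen3-VL-2B-Instruct',
    trust_remote_code=True,
)

# Unverified workaround: set size["longest_edge"] directly, since the
# processor appears to read this value at call time rather than the
# (ignored) max_pixels attribute.
processor.image_processor.size["longest_edge"] = 200_000
```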
System Info
- transformers version: 4.57.1 (and likely all previous versions with Qwen2VL/Qwen3VL support)
- Affected models: all models that rely on Qwen2VL's image processor (so Qwen2VL and Qwen3VL)
- CPython version: 3.13
- Platform: Ubuntu 22.04.5 LTS
Who can help?
@ArthurZucker and @itazap
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Steps to reproduce
```python
from transformers import Qwen3VLProcessor
from PIL import Image

# Load processor with custom max_pixels
processor = Qwen3VLProcessor.from_pretrained(
    'Qwen/Qwen3-VL-2B-Instruct',
    trust_remote_code=True,
    max_pixels=200_000,  # Expect images to be resized to ~200k pixels
)

# Check the internal state
print(f"max_pixels attribute: {processor.image_processor.max_pixels}")
print(f"size['longest_edge']: {processor.image_processor.size['longest_edge']}")

# Create a 2000×2000 image (4M pixels)
test_image = Image.new('RGB', (2000, 2000), color='red')
print(f"Input image: {test_image.size[0]}×{test_image.size[1]} = {test_image.size[0] * test_image.size[1]:,} pixels")

# Build a chat request containing the image and run the processor
messages = [{"role": "user", "content": [
    {"type": "image", "image": test_image},
    {"type": "text", "text": "Describe this image."},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
result = processor(text=text, images=[test_image], return_tensors="pt")

# Check actual processed dimensions
grid_thw = result['image_grid_thw'][0]
temporal, height_patches, width_patches = grid_thw
total_patches = (temporal * height_patches * width_patches).item()
effective_pixels = total_patches * 16 * 16  # Each patch is 16×16 pixels
print(f"Grid (T, H, W): {temporal}×{height_patches}×{width_patches}")
print(f"Total patches: {total_patches}")
print(f"Effective pixels: {effective_pixels:,}")
```

Expected behavior:
- max_pixels attribute: 200000
- size['longest_edge']: 200000
- Input image: 2000×2000 = 4,000,000 pixels
- Grid (T, H, W): 1×26×26
- Total patches: 676
- Effective pixels: 173,056
Actual behavior:
- max_pixels attribute: 200000
- size['longest_edge']: 16777216
- Input image: 2000×2000 = 4,000,000 pixels
- Grid (T, H, W): 1×124×124
- Total patches: 15376
- Effective pixels: 3,936,256
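
For reference, both sets of grid numbers above can be reproduced with smart_resize-style rounding. This is a minimal sketch, assuming dimensions are rounded to multiples of patch_size × merge_size = 32 (16 px patches with the usual 2×2 patch merge), not the exact library code:

```python
import math

def expected_grid(height, width, max_pixels, patch=16, merge=2):
    # Sketch of smart_resize-style rounding (assumed factor of 32 for
    # Qwen3VL's 16px patches with 2x2 merging): round each side to the
    # nearest multiple of factor, then scale down if over max_pixels.
    factor = patch * merge
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    return h_bar // patch, w_bar // patch

print(expected_grid(2000, 2000, 200_000))     # (26, 26)  -> 676 patches (expected)
print(expected_grid(2000, 2000, 16_777_216))  # (124, 124) -> 15376 patches (actual)
```

With `max_pixels=200_000` this yields the expected 26×26 grid (676 patches, 173,056 effective pixels); with the default 16,777,216 it yields the observed 124×124 grid, which is consistent with `max_pixels` never reaching the resize logic.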