Summary
A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs.
Details
Affected Component: the input_processor_for_phi4mm function in vllm/vllm/model_executor/models/phi4mm.py, lines 1182 to 1197 at commit 8cac35b.
while i < len(input_ids):
    token_id = input_ids[i]
    if token_id == _AUDIO_PLACEHOLDER_TOKEN_ID:
        token_count = next(audio_embed_size_iter)
        audio_cnt += 1
    elif token_id == _IMAGE_PLACEHOLDER_TOKEN_ID:
        token_count = next(image_token_count_iter)
        img_cnt += 1
    else:
        user_text_input_cnt += 1 if token_id not in \
            NON_USER_INPUT_TOKENS else 0
        i += 1
        continue
    tokens = [token_id] * token_count
    input_ids = input_ids[:i] + tokens + input_ids[i + 1:]
    i += token_count
The code rebuilds the input_ids list on every replacement via input_ids = input_ids[:i] + tokens + input_ids[i + 1:]. Each such concatenation copies the entire list, costing O(n) per replacement. For k placeholders, the total work is on the order of k full-list copies, i.e., O(k·n); since an attacker can make k proportional to the input length n, the worst case is O(n²).
PoC
Test data demonstrates quadratic time growth:
test_cases = [100, 200, 400, 800, 1600, 3200, 6400]
run_times = [0.002, 0.007, 0.028, 0.136, 0.616, 2.707, 11.854] # seconds
Doubling input size increases runtime by ~4x (consistent with O(n²)).
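The timings above can be reproduced with a standalone harness along the following lines. This is a minimal sketch: PLACEHOLDER, EXPANSION, and the all-placeholder input are illustrative assumptions, not values taken from phi4mm.py.

import time

PLACEHOLDER = -1  # stand-in for _AUDIO_PLACEHOLDER_TOKEN_ID (assumed)
EXPANSION = 8     # stand-in for a precomputed expansion length (assumed)

def expand_quadratic(input_ids):
    # Mirrors the vulnerable pattern: rebuild the whole list on every replacement.
    i = 0
    while i < len(input_ids):
        if input_ids[i] == PLACEHOLDER:
            input_ids = input_ids[:i] + [PLACEHOLDER] * EXPANSION + input_ids[i + 1:]
            i += EXPANSION
        else:
            i += 1
    return input_ids

for n in [100, 200, 400, 800, 1600, 3200, 6400]:
    ids = [PLACEHOLDER] * n  # worst case: every token is a placeholder
    start = time.perf_counter()
    expand_quadratic(ids)
    print(f"n={n}: {time.perf_counter() - start:.3f}s")

On a typical machine, doubling n roughly quadruples the measured time, matching the table above.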
Impact
Denial-of-Service (DoS): An attacker could submit inputs with many placeholders (e.g., 10,000 <|audio_1|> tokens), causing CPU/memory exhaustion.
Example: 10,000 placeholders, each forcing a full copy of a list at least 10,000 elements long, amounts to roughly 100 million element copies.
Remediation Recommendations
Precompute all placeholder positions and expansion lengths upfront.
Replace repeated list concatenation with a single output list built in one pass, as in the sketch below.
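The following single-pass rewrite restores linear time. It reuses the identifiers from the affected loop for illustration; the counter updates (audio_cnt, img_cnt, user_text_input_cnt) are omitted for brevity and would need to be carried over in a real patch.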
# O(n) rewrite: each token is emitted exactly once
# (counter updates from the original loop omitted for brevity)
new_input_ids = []
for token_id in input_ids:
    if token_id == _AUDIO_PLACEHOLDER_TOKEN_ID:
        new_input_ids.extend([token_id] * next(audio_embed_size_iter))
    elif token_id == _IMAGE_PLACEHOLDER_TOKEN_ID:
        new_input_ids.extend([token_id] * next(image_token_count_iter))
    else:
        new_input_ids.append(token_id)
input_ids = new_input_ids
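Because list.append and list.extend are amortized O(1) per element, the rewrite's runtime is proportional to the length of the expanded output, so doubling the input roughly doubles the runtime instead of quadrupling it.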