[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp #31597
Conversation
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
cc @tjtanaa
Code Review
This pull request introduces a workaround for ROCm accuracy issues in language generation tests by disabling flash_sdp and mem_efficient_sdp backends in HuggingFace Transformers. The change is implemented by adding a conftest.py file that applies these settings at the start of the test session. The approach is sound and the code is clear. My only suggestion is to use a more semantically appropriate pytest hook (pytest_sessionstart instead of pytest_collection_modifyitems) to improve maintainability and adherence to pytest best practices.
```python
def pytest_collection_modifyitems(config, items):
    """Configure ROCm-specific settings based on collected tests."""
    if not current_platform.is_rocm():
        return

    # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
    # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
    # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    warnings.warn(
        "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
        "to avoid HuggingFace Transformers accuracy issues",
        UserWarning,
        stacklevel=1,
    )
```
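PyTorch exposes query functions paired with these setters, which gives a quick way to confirm the conftest took effect; a minimal check (illustrative, run after the session starts on ROCm):

```python
import torch

# Query functions paired with the enable_* setters used in the conftest.
# After the conftest runs on ROCm, the first two should report False.
print(torch.backends.cuda.flash_sdp_enabled())          # expected: False
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # expected: False
print(torch.backends.cuda.math_sdp_enabled())           # expected: True
```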
The pytest_collection_modifyitems hook is intended for modifying the list of collected test items, not for test session setup. Using it for setup can be misleading and is not idiomatic. A more appropriate hook for this kind of session-level setup is pytest_sessionstart, which is executed once at the beginning of the test session. Alternatively, a session-scoped autouse fixture could be used. Using the correct hook improves code clarity and maintainability.
Suggested change:

```diff
-def pytest_collection_modifyitems(config, items):
-    """Configure ROCm-specific settings based on collected tests."""
+def pytest_sessionstart(session):
+    """Configure ROCm-specific settings before test session starts."""
     if not current_platform.is_rocm():
         return
     # Disable Flash/MemEfficient SDP on ROCm to avoid HF Transformers
     # accuracy issues: https://github.com/vllm-project/vllm/issues/30167
     # TODO: Remove once ROCm SDP accuracy issues are resolved on HuggingFace
     torch.backends.cuda.enable_flash_sdp(False)
     torch.backends.cuda.enable_mem_efficient_sdp(False)
     torch.backends.cuda.enable_math_sdp(True)
     warnings.warn(
         "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled math_sdp "
         "to avoid HuggingFace Transformers accuracy issues",
         UserWarning,
         stacklevel=1,
     )
```
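For completeness, the session-scoped autouse fixture mentioned in the review could look roughly like the sketch below; it assumes the same `current_platform` import the conftest already uses and is not part of the actual suggestion:

```python
import warnings

import pytest
import torch

from vllm.platforms import current_platform


@pytest.fixture(scope="session", autouse=True)
def _rocm_sdp_backends():
    """Session-scoped alternative to the pytest_sessionstart hook."""
    if current_platform.is_rocm():
        # Same workaround as above: force the math SDP reference path.
        torch.backends.cuda.enable_flash_sdp(False)
        torch.backends.cuda.enable_mem_efficient_sdp(False)
        torch.backends.cuda.enable_math_sdp(True)
        warnings.warn(
            "ROCm: Disabled flash_sdp and mem_efficient_sdp, enabled "
            "math_sdp to avoid HuggingFace Transformers accuracy issues",
            UserWarning,
            stacklevel=1,
        )
    yield
```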
Done :)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Fixes ROCm accuracy failures in language generation tests by disabling `flash_sdp` and `mem_efficient_sdp` backends for HuggingFace Transformers inference.

Problem
Tests comparing HuggingFace and vLLM outputs were failing on ROCm due to numerical divergence.
The root cause is accuracy issues in ROCm's flash attention and memory-efficient SDP implementations in HuggingFace Transformers (#30167).
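As a simplified illustration of what these comparison tests do (the real tests go through vLLM's shared test runners; the prompt, dtype, and decode length here are placeholders): both stacks decode greedily from the same prompt and the generated token IDs are compared.

```python
# Simplified sketch of an HF-vs-vLLM greedy comparison; prompt, dtype, and
# max_tokens are illustrative, not the CI configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "TitanML/tiny-mixtral"  # model from the failing test's parameters
prompt = "The capital of France is"

# HuggingFace baseline, greedy decoding.
tok = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16
).cuda()
inputs = tok(prompt, return_tensors="pt").to("cuda")
hf_out = hf_model.generate(**inputs, max_new_tokens=32, do_sample=False)
hf_ids = hf_out[0, inputs.input_ids.shape[1]:].tolist()

# vLLM output, greedy decoding (temperature=0).
llm = LLM(model=MODEL, dtype="float16")
vllm_out = llm.generate([prompt], SamplingParams(temperature=0.0, max_tokens=32))
vllm_ids = list(vllm_out[0].outputs[0].token_ids)

# On ROCm these diverged until flash/mem-efficient SDP was disabled for HF.
assert hf_ids == vllm_ids, f"HF {hf_ids} != vLLM {vllm_ids}"
```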
Solution

Add a `conftest.py` that configures PyTorch SDP backends on ROCm:

- disables `flash_sdp`
- disables `mem_efficient_sdp`
- enables `math_sdp` (more accurate reference implementation)

This ensures HuggingFace uses a numerically stable attention path for baseline comparisons.
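If the workaround ever needs to be narrowed to specific call sites rather than applied process-wide, PyTorch also offers scoped backend selection; a sketch of that alternative (not what this PR does, and it assumes PyTorch >= 2.3 for `torch.nn.attention.sdpa_kernel`):

```python
# Scoped alternative to the global enable_* flags (not used by this PR):
# constrain scaled_dot_product_attention to the math backend only inside
# the context manager. Requires PyTorch >= 2.3.
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

with sdpa_kernel(SDPBackend.MATH):
    # Only the math (reference) implementation is eligible here,
    # sidestepping the ROCm flash/mem-efficient kernels.
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```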
Related

- `vision_embeddings` in transformers inaccurate without math SDP #30167

Testing
`pytest -v -s tests/models/language/generation/test_common.py::test_models[False-True-5-32-TitanML/tiny-mixtral]` passes with this fix.