Fix Llama4 example #2846

yiliu30 · 2025-08-22T11:12:36Z

No description provided.

Signed-off-by: yiliu30 <yi4.liu@intel.com>

pytorch-bot · 2025-08-22T11:12:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2846

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 1 Cancelled Job

As of commit c6fdc2c with merge base df7bf37:

NEW FAILURES - The following jobs have failed:

Run Regression Tests / test (CPU 2.6, linux.4xlarge, torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_base_cpu_1_multiple_tokens
Run Regression Tests / test (CPU 2.7, linux.4xlarge, torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_base_cpu_1_multiple_tokens
Run Regression Tests / test (CPU 2.8, linux.4xlarge, torch==2.8.0 --index-url https://download.pytorch.org/whl/cpu, cpu) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_base_cpu_1_multiple_tokens
Run Regression Tests / test (CUDA 2.6, linux.g5.12xlarge.nvidia.gpu, torch==2.6.0, cuda, 12.6) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_fake_dim_1_multiple_tokens
Run Regression Tests / test (CUDA 2.7, linux.g5.12xlarge.nvidia.gpu, torch==2.7.0, cuda, 12.6) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_fake_dim_1_multiple_tokens
Run Regression Tests / test (CUDA 2.8, linux.g5.12xlarge.nvidia.gpu, torch==2.8.0, cuda, 12.6) / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_fake_dim_1_multiple_tokens
Run Regression Tests / test-nightly (CPU Nightly, linux.4xlarge, --pre torch --index-url https://download.pytorch.org/wh... / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_base_cpu_1_multiple_tokens
Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh)
test/quantization/test_moe_quant.py::TestMoEQuantCompile::test_int8wo_fake_dim_1_multiple_tokens
Run TorchAO Experimental Tests / test-cpu-ops (macos-14) (gh)
torchao/experimental/tests/test_int8_dynamic_activation_intx_weight.py::TestInt8DynamicActivationIntxWeight::test_moe_quant_intx

CANCELLED JOB - The following job was cancelled. Please retry:

Run TorchAO Experimental Tests / test-cpu-ops (linux.arm64.2xlarge) (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

torchao/prototype/moe_quant/llama4_quant.py

yiliu30 · 2025-08-22T11:19:28Z

torchao/prototype/moe_quant/quantizable_moe_modules.py

        batch_size = x.shape[0]
        x = x.view(-1, self.hidden_dim)  # x: [T, D]
-        scores = self.router(x)  # [T, E]
+        scores = self.router(x)[0]  # [T, E]


The original router returns router_scores and router_logits.
https://github.com/huggingface/transformers/blob/19ffe0219dae122203f9726669f88ef1c6ea3bb4/src/transformers/models/llama4/modeling_llama4.py#L143
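A minimal sketch of why the `[0]` index matters, assuming (as the comment above states) that the original router returns a `(router_scores, router_logits)` tuple while a plain linear router returns the scores alone. The helper `router_scores` and both stand-in routers are hypothetical and use plain Python lists instead of tensors. The PR itself indexes unconditionally with `[0]`; the defensive check here is only for illustration.

```python
def router_scores(router, x):
    """Return only the routing scores, whatever the router's return type."""
    out = router(x)
    # Assumption: a tuple means (router_scores, router_logits); anything
    # else is treated as the scores themselves.
    return out[0] if isinstance(out, tuple) else out


# Hypothetical stand-ins (plain Python, no tensors) for the two router kinds.
def linear_like_router(x):
    # Returns scores only, the way a single linear layer would.
    return [2.0 * v for v in x]


def llama4_like_router(x):
    logits = [2.0 * v for v in x]
    # Placeholder scoring, NOT the real Llama4 routing computation.
    scores = [max(v, 0.0) for v in logits]
    return scores, logits  # (router_scores, router_logits)


x = [1.0, -1.0]
scores_a = router_scores(linear_like_router, x)
scores_b = router_scores(llama4_like_router, x)
```

Either way the caller receives a single score container, so downstream code that expects a `[T, E]` score tensor never sees a tuple.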

jerryzh168 · Sep 4, 2025

The router here is actually an nn.Linear, not a Llama4Router; see L21?

yiliu30 · Sep 4, 2025

This router is replaced with the original model's router:

ao/torchao/prototype/moe_quant/llama4_quant.py

Lines 48 to 53 in 9d01b43

    router = module.router
    up_proj = module.experts.gate_up_proj
    w1, w3 = up_proj.permute(0, 2, 1).chunk(2, dim=1)
    w2 = module.experts.down_proj.permute(0, 2, 1)

    new_mod.router = router
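The shape bookkeeping behind the `permute` and `chunk` calls in the snippet above can be sketched without tensors. This assumes a fused `gate_up_proj` laid out as `[E, D, 2*H]` and a `down_proj` laid out as `[E, H, D]` (experts, hidden size, expert intermediate size). That layout is an assumption, not something stated in this thread, and the helper names and sizes are hypothetical.

```python
def permute_shape(shape, dims):
    """Shape that results from reordering the axes of `shape` by `dims`."""
    return tuple(shape[d] for d in dims)


def chunk_shapes(shape, chunks, dim):
    """Shapes of equal chunks along `dim`; only the evenly divisible case."""
    if shape[dim] % chunks != 0:
        raise ValueError("dimension is not evenly divisible by chunks")
    piece = list(shape)
    piece[dim] = shape[dim] // chunks
    return [tuple(piece)] * chunks


E, D, H = 4, 8, 16               # hypothetical sizes
gate_up_shape = (E, D, 2 * H)    # assumed fused gate/up projection layout
down_shape = (E, H, D)           # assumed down projection layout

# Mirrors: w1, w3 = up_proj.permute(0, 2, 1).chunk(2, dim=1)
w1_shape, w3_shape = chunk_shapes(permute_shape(gate_up_shape, (0, 2, 1)), 2, 1)
# Mirrors: w2 = module.experts.down_proj.permute(0, 2, 1)
w2_shape = permute_shape(down_shape, (0, 2, 1))
```

Under these assumptions `w1` and `w3` each come out as `[E, H, D]` and `w2` as `[E, D, H]`.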

jerryzh168 · Sep 5, 2025

So this module does not run by itself? That seems quite confusing.

yiliu30 · Sep 8, 2025

MOEFeedForwardAOQuantizable was used, but it seems its router shouldn't be quantized, in order to preserve accuracy. Could you confirm that, @HDCharles?

yiliu30 · 2025-09-03T01:02:26Z

Hi @liangel-02 @andrewor14 Looks like the failed CI checks are not related to this PR. Could you help retrigger them?

jerryzh168 · 2025-09-26T17:33:19Z

@yiliu30 I'd recommend waiting a bit until #2744 lands for MoE support.

yiliu30 · 2025-09-27T05:14:08Z

> @yiliu30 I'd recommend waiting a bit until #2744 lands for MoE support.

Sure, thanks for the information.

fix moe example

c6fdc2c

Signed-off-by: yiliu30 <yi4.liu@intel.com>

meta-cla bot added the CLA Signed label Aug 22, 2025


andrewor14 requested a review from liangel-02 August 25, 2025 19:01

liangel-02 approved these changes Aug 26, 2025


liangel-02 added the topic: improvement label Aug 27, 2025


yiliu30 Aug 22, 2025

jerryzh168 Sep 4, 2025

yiliu30 Sep 4, 2025

jerryzh168 Sep 5, 2025

yiliu30 Sep 8, 2025
