Enable MOE int4 quant on XPU #2758
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2758
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3e3ebff with merge base 8776967.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@liangan1 this is our experimental API, not ready for wider adoption yet. An urgent task would be to migrate the XPULayout to the new design for int4: #2752. Since we want to bump the Int4WeightOnlyConfig version, can you work on this? Also cc @Xia-Weiwen for visibility.
Got it. We will work on it.
Force-pushed 37dc887 to 82bb33f
w1 = self.w1[expert]  # I, D
w2 = self.w2[expert]  # D, I
w3 = self.w3[expert]  # I, D
if self.gpt_oss_mlp:
this seems pretty hacky, I'd suggest trying to extend based on our long-term API instead
Thanks, makes sense to me. We will refine our PR once the long-term API is ready.
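For context, a minimal sketch of the kind of per-expert forward the reviewed snippet above comes from — a LLaMA-style gated MLP over stacked expert weights. The class name, shapes beyond the `# I, D` / `# D, I` comments, activation choice, and the behavior behind the `gpt_oss_mlp` flag are assumptions for illustration, not the PR's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveMoEMLP(nn.Module):
    """Hypothetical MOE MLP with stacked per-expert weights."""

    def __init__(self, num_experts: int, dim: int, intermediate: int,
                 gpt_oss_mlp: bool = False):
        super().__init__()
        # Stacked expert weights: w1/w3 are [E, I, D], w2 is [E, D, I],
        # matching the shape comments in the reviewed snippet.
        self.w1 = nn.Parameter(torch.randn(num_experts, intermediate, dim) * 0.02)
        self.w2 = nn.Parameter(torch.randn(num_experts, dim, intermediate) * 0.02)
        self.w3 = nn.Parameter(torch.randn(num_experts, intermediate, dim) * 0.02)
        self.gpt_oss_mlp = gpt_oss_mlp

    def expert_forward(self, x: torch.Tensor, expert: int) -> torch.Tensor:
        w1 = self.w1[expert]  # I, D
        w2 = self.w2[expert]  # D, I
        w3 = self.w3[expert]  # I, D
        if self.gpt_oss_mlp:
            # gpt-oss uses a different MLP formulation, so the PR
            # special-cases it here; this branch is what the review
            # flags as hacky. Details omitted in this sketch.
            raise NotImplementedError
        gate = F.silu(x @ w1.t())    # [..., I]
        up = x @ w3.t()              # [..., I]
        return (gate * up) @ w2.t()  # [..., D]
```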
Motivation
This PR targets enabling MOE quantization on the XPU device. The main changes include:
According to our experiments, the gpt-oss-20B model size can be reduced to 12 GB.
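As a rough illustration, below is a minimal sketch of applying torchao's int4 weight-only quantization on an XPU device. The toy model, dtype, and `group_size=128` are assumptions; whether this exact `Int4WeightOnlyConfig` path covers stacked MOE expert weights on XPU is precisely what this PR adds, so treat this as a sketch rather than the PR's implementation:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Toy stand-in for the real model; the PR's experiments use gpt-oss-20B.
model = nn.Sequential(nn.Linear(4096, 11008), nn.Linear(11008, 4096))
model = model.to(device="xpu", dtype=torch.bfloat16)

# Int4 weight-only quantization; group_size=128 is a common choice.
quantize_(model, Int4WeightOnlyConfig(group_size=128))

# Quantized weights now back the linear layers, shrinking memory roughly 4x
# versus bf16 (how gpt-oss-20B reaches ~12 GB in the PR's experiments).
x = torch.randn(1, 4096, device="xpu", dtype=torch.bfloat16)
print(model(x).shape)
```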