
Conversation

@liangan1
Collaborator

@liangan1 commented Aug 13, 2025

Motivation

This PR enables MOE quantization on the XPU device. The main changes are:

  • 3D weight dimension support in Int4XPULayout.
  • gpt-oss model enabling for MOE quantization.
    In our experiments, this reduces the gpt-oss-20B model size to 12 GB; a minimal usage sketch is shown after this list.
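
A minimal usage sketch, assuming torchao's quantize_ entry point, an Int4WeightOnlyConfig that accepts a layout argument, and the Int4XPULayout this PR extends to 3D expert weights; the Hugging Face model id, device string, and group_size are illustrative, not taken from the PR:

# Minimal sketch: int4 weight-only quantization of a MoE checkpoint on XPU.
# Assumptions: torchao's quantize_/Int4WeightOnlyConfig API takes a layout
# argument, and Int4XPULayout (extended by this PR to 3D expert weights) is
# the packing used; the model id and group_size are illustrative.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, Int4WeightOnlyConfig
from torchao.dtypes import Int4XPULayout

model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b", torch_dtype=torch.bfloat16
).to("xpu")

# Pack linear (and, with this PR, 3D expert) weights into grouped int4.
quantize_(model, Int4WeightOnlyConfig(group_size=128, layout=Int4XPULayout()))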
@pytorch-bot

pytorch-bot bot commented Aug 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2758

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3e3ebff with merge base 8776967:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Aug 13, 2025
@jerryzh168
Contributor

@liangan1 this is our experimental API and not ready for wider adoption yet. A more urgent task would actually be to migrate the XPU layout to the new int4 design (#2752), since we want to bump the Int4WeightOnlyConfig version. Can you work on this?

also cc @Xia-Weiwen for visibility
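
For context, a hedged sketch of what the migrated call might look like; selecting the new design via a version argument on Int4WeightOnlyConfig is an assumption inferred from the version bump mentioned above, not something confirmed in this thread:

# Hypothetical post-migration sketch (assumption: the new int4 design is
# selected by bumping `version` on Int4WeightOnlyConfig, per the comment
# above; the toy model, device, and values are illustrative only).
import torch
from torchao.quantization import quantize_, Int4WeightOnlyConfig

model = torch.nn.Sequential(torch.nn.Linear(256, 256)).to("xpu", torch.bfloat16)
quantize_(model, Int4WeightOnlyConfig(group_size=128, version=2))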

@liangan1
Collaborator Author

@liangan1 this is our experimental API and not ready for wider adoption yet. A more urgent task would actually be to migrate the XPU layout to the new int4 design (#2752), since we want to bump the Int4WeightOnlyConfig version. Can you work on this?

also cc @Xia-Weiwen for visibility

Got it. We will work on it.

@liangan1 requested a review from jerryzh168 September 5, 2025 02:11
@liangan1 added the enhancement and topic: improvement labels Sep 5, 2025
@liangan1 changed the title [WIP]Enable MOE int4 quant on XPU Sep 10, 2025
@liangan1 self-assigned this Sep 10, 2025
@jerryzh168
Contributor

@liangan1 here is our longer-term plan for MoE: #2744

w1 = self.w1[expert]  # per-expert weight slice, shape (I, D)
w2 = self.w2[expert]  # per-expert weight slice, shape (D, I)
w3 = self.w3[expert]  # per-expert weight slice, shape (I, D)
if self.gpt_oss_mlp:  # special-case branch for the gpt-oss MLP variant
Contributor

this seems pretty hacky; I'd suggest extending it based on our long-term API instead

Collaborator Author

Thanks, makes sense to me. We will refine our PR once the long-term API is ready.

@liangan1
Collaborator Author

@liangan1 here is our longer-term plan for MoE: #2744

Thanks for the info.
