
Removed fallback for lm_head op #1482

Open · PenghuiCheng wants to merge 5 commits into main

Conversation

PenghuiCheng (Contributor)

Type of Change

feature
No API changed

Description

Removed the fallback of the lm_head op for weight-only quantization (WOQ).

Expected Behavior & Potential Risk

lm_head no longer falls back to its original precision during weight-only quantization; it is quantized together with the other ops (see the sketch below).
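
For readers unfamiliar with the term: "fallback" here means excluding an op from quantization so it keeps its original dtype. A minimal sketch of what such an exclusion and its removal could look like; the helper name below is hypothetical and this is not the PR's actual diff (the real change lives in intel_extension_for_transformers/transformers/llm/quantization/utils.py):

```python
# Illustrative sketch only, NOT the PR's actual diff.
# Hypothetical helper showing an lm_head fallback (exclusion from
# weight-only quantization) and what removing it means.
import torch


def collect_woq_linears(model: torch.nn.Module) -> dict:
    """Collect Linear modules eligible for weight-only quantization."""
    modules = {}
    for name, module in model.named_modules():
        if not isinstance(module, torch.nn.Linear):
            continue
        # Before this PR: lm_head was skipped here, so it fell back to
        # its original dtype. After this PR the exclusion is gone and
        # lm_head is quantized like any other Linear layer.
        # if name == "lm_head":   # removed fallback
        #     continue
        modules[name] = module
    return modules
```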

How has this PR been tested?

Tested locally (a sketch of one possible smoke test follows).
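
A hedged sketch of a local smoke test for this change. The config class (`RtnConfig`) and its parameters are assumptions based on the ITREX weight-only API of this period, not the author's actual test procedure:

```python
# Hedged sketch of a local WOQ smoke test; RtnConfig and its
# parameters are assumptions, not taken from this PR.
from transformers import AutoTokenizer

from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    RtnConfig,
)

model_name = "facebook/opt-125m"  # small model for a quick check
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Weight-only quantization; with this PR, lm_head is quantized too
# instead of falling back to its original dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=RtnConfig(bits=4),
)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```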

Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>

github-actions bot commented Apr 15, 2024

⛈️ Required checks status: Has failure 🔴

Warning
If you do not have access to re-run the CI-Summary bot, please contact VincyZhang for help. If you push a new commit, all of the workflows will be re-triggered.

Groups summary

🟢 Format Scan Tests workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| format-scan (pylint) | success | |
| format-scan (bandit) | success | |
| format-scan (cloc) | success | |
| format-scan (cpplint) | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🔴 Optimize Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| optimize-unit-test-baseline | success | |
| optimize-unit-test-PR-test | failure | download |
| Genreate-OptimizeUT-Report | skipped | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟢 NeuralChat Unit Test

| Check ID | Status | Error details |
| --- | --- | --- |
| neuralchat-unit-test-baseline | success | |
| neuralchat-unit-test-PR-test | success | |
| Generate-NeuralChat-Report | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟢 Engine Unit Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| engine-unit-test-baseline | success | |
| engine-unit-test-PR-test | success | |
| Genreate-Engine-Report | success | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.

🟡 Chat Bot Test workflow

| Check ID | Status | Error details |
| --- | --- | --- |
| call-inference-llama-2-7b-chat-hf / inference test | queued | |
| call-inference-mpt-7b-chat / inference test | queued | |

These checks are required after the changes to intel_extension_for_transformers/transformers/llm/quantization/utils.py.


Thank you for your contribution! 💜

Note
This comment is automatically generated and will be updated every 180 seconds within the next 6 hours. If you have any other questions, contact VincyZhang or XuehaoSun for help.

PenghuiCheng and others added 4 commits April 15, 2024 07:04
Signed-off-by: Cheng Penghui <penghui.cheng@intel.com>
Signed-off-by: Meng, Hengyu <hengyu.meng@intel.com>
Signed-off-by: Meng, Hengyu <hengyu.meng@intel.com>