-
Notifications
You must be signed in to change notification settings - Fork 174
return local model in determine_base_model_source with any quantization #334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2068,22 +2068,27 @@ def determine_base_model_source(model_name, token=None): | |
| if local_path: | ||
| local_is_quantized, local_quant_type = check_model_quantization_status(local_path) | ||
|
|
||
| # Priority 1: Local unquantized OR Local mxfp4 | ||
| if local_path and (not local_is_quantized or local_quant_type == "mxfp4"): | ||
| if not local_is_quantized: | ||
| return (local_path, True, "local_unquantized", False, None) | ||
| else: # local_quant_type == "mxfp4" | ||
| return (local_path, True, "local_mxfp4", True, "mxfp4") | ||
|
|
||
| # Priority 2: HF unquantized | ||
| # Priority 1: Local unquantized | ||
| if local_path and not local_is_quantized: | ||
| return (local_path, True, "local_unquantized", False, None) | ||
|
|
||
| # Priority 2: Local mxfp4 | ||
| if local_path and local_is_quantized and local_quant_type == "mxfp4": # local_quant_type == "mxfp4" | ||
| return (local_path, True, "local_mxfp4", True, "mxfp4") | ||
|
|
||
| # Priority 3: HF unquantized | ||
| if hf_exists and not hf_is_quantized: | ||
| return (model_name, False, "HF_unquantized", False, None) | ||
|
|
||
| # Priority 3: HF quantized (covers both "both quantized" and "just HF quantized") | ||
| # Priority 4: HF quantized (covers both "both quantized" and "just HF quantized") | ||
| if hf_exists and hf_is_quantized: | ||
| return (model_name, False, f"HF_{hf_quant_type}", True, hf_quant_type) | ||
|
|
||
| # Priority 4: Nothing suitable found | ||
| # Priority 5: Local other quantization | ||
| if local_path and local_is_quantized: | ||
| return (local_path, True, f"local_{local_quant_type}", True, local_quant_type) | ||
|
|
||
| # Priority 6: Nothing suitable found | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we've intentionally restricted using a 4bit bnb base model when the merge method is 16bit , because the upcasting from 4bit-> 32 bit to merge will result in a high precision loss, so we don't want the case for Priotrity 5 when we are merging in 16bits.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While I understand the reason behind restricting merging of 4bit models, this is not the place where it should happen.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with a better error message if that's the intended purpose of this change, cause the priority logic seems almost the same (end point is no merging happens) , and i wasn't sure whether there was a functional bug motivating the change, hence had to ask. LGTM if that's the case. |
||
| return (None, False, "", False, None) | ||
| pass | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring to split the checks is a good step for clarity. To further improve it, you could group the high-priority local model checks to avoid repeating
if local_pathand make the logic more compact.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two blocks are actually functionally equivalent.