-
Notifications
You must be signed in to change notification settings - Fork 173
return local model in determine_base_model_source with any quantization #334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
could you please explain the rationale of this? what problem does it solve? i am unsure this is necessary |
|
@gemini-code-assist review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the model source detection by returning any locally found quantized model, which makes the behavior more consistent and should prevent user confusion. The change is well-structured and the new priority logic is clear. I've added one suggestion to refactor the code slightly for improved readability and to reduce redundancy.
Regarding your comment in the description about removing the special case for mxfp4: that's a valid point. If mxfp4 doesn't require a higher priority than Hugging Face models, simplifying the logic by treating all local quantized models equally would be a good simplification. However, the current implementation is safer if that special priority for mxfp4 is indeed required.
| # Priority 1: Local unquantized | ||
| if local_path and not local_is_quantized: | ||
| return (local_path, True, "local_unquantized", False, None) | ||
|
|
||
| # Priority 2: Local mxfp4 | ||
| if local_path and local_is_quantized and local_quant_type == "mxfp4": # local_quant_type == "mxfp4" | ||
| return (local_path, True, "local_mxfp4", True, "mxfp4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring to split the checks is a good step for clarity. To further improve it, you could group the high-priority local model checks to avoid repeating if local_path and make the logic more compact.
| # Priority 1: Local unquantized | |
| if local_path and not local_is_quantized: | |
| return (local_path, True, "local_unquantized", False, None) | |
| # Priority 2: Local mxfp4 | |
| if local_path and local_is_quantized and local_quant_type == "mxfp4": # local_quant_type == "mxfp4" | |
| return (local_path, True, "local_mxfp4", True, "mxfp4") | |
| # Priority 1 & 2: Local models | |
| if local_path: | |
| if not local_is_quantized: | |
| return (local_path, True, "local_unquantized", False, None) | |
| elif local_quant_type == "mxfp4": | |
| return (local_path, True, "local_mxfp4", True, "mxfp4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two blocks are actually functionally equivalent.
| if local_path and local_is_quantized: | ||
| return (local_path, True, f"local_{local_quant_type}", True, local_quant_type) | ||
|
|
||
| # Priority 6: Nothing suitable found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we've intentionally restricted using a 4bit bnb base model when the merge method is 16bit , because the upcasting from 4bit-> 32 bit to merge will result in a high precision loss, so we don't want the case for Priotrity 5 when we are merging in 16bits.
in fact, we explicitely prohibit this and issue a warning. a 4 bit model should be mostly used for inference , and a 16bit base model should be available if we wish to finetune.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I understand the reason behind restricting merging of 4bit models, this is not the place where it should happen.
The code calling this has to check the quantization anyway, since a remote model is always returned if found, even if that is 4bit.
Filtering here is just reduntant and makes it more difficult to find out why the merging isn't working, since you get a model not found error instead of a model not compatible error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with a better error message if that's the intended purpose of this change, cause the priority logic seems almost the same (end point is no merging happens) , and i wasn't sure whether there was a functional bug motivating the change, hence had to ask. LGTM if that's the case.
This returns the found local model regardless of quantization.
The change also means that local base models and hf base models have more similar behaviour.
This should prevent some confution regarding the error messages if local 4bit models are returned.
I would suggest removing the special case for mxfp4 models, since the caller has to check the returned quant type in any case, but I am not too sure if that is actually desired.