
Conversation

@noah1510
Contributor

This returns the found local model regardless of quantization.
The change also means that local base models and hf base models behave more similarly.
This should prevent some confusion regarding the error messages if local 4bit models are returned.

I would suggest removing the special case for mxfp4 models, since the caller has to check the returned quant type in any case, but I am not too sure if that is actually desired.

@rolandtannous
Collaborator

Could you please explain the rationale for this? What problem does it solve? I am unsure this is necessary.

@danielhanchen
Contributor


@gemini-code-assist bot left a comment


Code Review

This pull request improves the model source detection by returning any locally found quantized model, which makes the behavior more consistent and should prevent user confusion. The change is well-structured and the new priority logic is clear. I've added one suggestion to refactor the code slightly for improved readability and to reduce redundancy.

Regarding your comment in the description about removing the special case for mxfp4: that's a valid point. If mxfp4 doesn't require a higher priority than Hugging Face models, treating all local quantized models equally would simplify the logic. However, the current implementation is safer if that special priority for mxfp4 is indeed required.

Comment on lines +2071 to +2077
# Priority 1: Local unquantized
if local_path and not local_is_quantized:
return (local_path, True, "local_unquantized", False, None)

# Priority 2: Local mxfp4
if local_path and local_is_quantized and local_quant_type == "mxfp4":
return (local_path, True, "local_mxfp4", True, "mxfp4")

medium

This refactoring to split the checks is a good step for clarity. To further improve it, you could group the high-priority local model checks to avoid repeating if local_path and make the logic more compact.

Suggested change
# Priority 1: Local unquantized
if local_path and not local_is_quantized:
return (local_path, True, "local_unquantized", False, None)
# Priority 2: Local mxfp4
if local_path and local_is_quantized and local_quant_type == "mxfp4":
return (local_path, True, "local_mxfp4", True, "mxfp4")
# Priority 1 & 2: Local models
if local_path:
if not local_is_quantized:
return (local_path, True, "local_unquantized", False, None)
elif local_quant_type == "mxfp4":
return (local_path, True, "local_mxfp4", True, "mxfp4")
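The split and grouped forms above should behave identically; a quick sketch that exercises both (the wrapper function names are illustrative, not from the PR):

```python
def split_checks(local_path, local_is_quantized, local_quant_type):
    # Form 1: one `if` per priority, each re-testing local_path.
    if local_path and not local_is_quantized:
        return (local_path, True, "local_unquantized", False, None)
    if local_path and local_is_quantized and local_quant_type == "mxfp4":
        return (local_path, True, "local_mxfp4", True, "mxfp4")
    return None

def grouped_checks(local_path, local_is_quantized, local_quant_type):
    # Form 2: local_path tested once, branches nested under it.
    if local_path:
        if not local_is_quantized:
            return (local_path, True, "local_unquantized", False, None)
        elif local_quant_type == "mxfp4":
            return (local_path, True, "local_mxfp4", True, "mxfp4")
    return None

# Both forms agree on every combination of inputs.
cases = [
    ("/models/base", False, None),
    ("/models/base", True, "mxfp4"),
    ("/models/base", True, "bnb-4bit"),
    (None, False, None),
]
assert all(split_checks(*c) == grouped_checks(*c) for c in cases)
```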
Collaborator


these two blocks are actually functionally equivalent.

if local_path and local_is_quantized:
return (local_path, True, f"local_{local_quant_type}", True, local_quant_type)

# Priority 6: Nothing suitable found
Collaborator

@rolandtannous Nov 11, 2025


We've intentionally restricted using a 4bit bnb base model when the merge method is 16bit, because the upcasting from 4bit to 32bit to merge will result in a high precision loss, so we don't want the Priority 5 case when we are merging in 16bits.
In fact, we explicitly prohibit this and issue a warning. A 4bit model should mostly be used for inference, and a 16bit base model should be available if we wish to finetune.

Contributor Author


While I understand the reason behind restricting merging of 4bit models, this is not the place where it should happen.
The code calling this has to check the quantization anyway, since a remote model is always returned if found, even if it is 4bit.
Filtering here is redundant and makes it harder to find out why the merging isn't working, since you get a "model not found" error instead of a "model not compatible" error.
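A minimal sketch of the caller-side check being argued for here (the function names and tuple shape are assumptions based on the snippets in this thread, not the actual unsloth API):

```python
def find_base_model(local_path, local_is_quantized, local_quant_type):
    # Return whatever local model exists, regardless of quantization;
    # compatibility is the caller's responsibility.
    if local_path:
        kind = f"local_{local_quant_type}" if local_is_quantized else "local_unquantized"
        return (local_path, True, kind, local_is_quantized, local_quant_type)
    return (None, False, "not_found", False, None)

def base_model_for_16bit_merge(source):
    path, found, kind, is_quantized, quant_type = source
    if not found:
        # Nothing on disk at all: a genuine "model not found".
        raise FileNotFoundError("no base model found")
    if is_quantized:
        # A model exists but is quantized: surface a distinct
        # "model not compatible" error instead of hiding the model.
        raise ValueError(f"{quant_type} base model is not compatible with a 16bit merge")
    return path
```

With the check at the call site, a user who only has a 4bit model on disk sees a compatibility error rather than a misleading not-found error.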

Collaborator


I agree with a better error message if that's the intended purpose of this change, because the priority logic seems almost the same (the end result is that no merging happens), and I wasn't sure whether there was a functional bug motivating the change, hence I had to ask. LGTM if that's the case.

@rolandtannous rolandtannous merged commit 09a56df into unslothai:main Nov 11, 2025


3 participants