🐛 Bug Description
When using `model.save_pretrained_torchao()`, the function incorrectly uses `AutoModel` instead of `AutoModelForCausalLM` to reload the 16-bit model.
This causes the saved `config.json` in the final `-torchao` directory to record the base model architecture (e.g., `Qwen3Model`) instead of the causal language modeling head architecture (e.g., `Qwen3ForCausalLM`).
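To check whether a given `-torchao` save was affected, you can inspect the `architectures` field of its `config.json`. The sketch below uses only the standard library; the helper names (`saved_architectures`, `has_causal_lm_head`) are illustrative, not part of Unsloth or Transformers:

```python
import json
import tempfile
from pathlib import Path

def saved_architectures(save_dir):
    """Read the `architectures` list from a checkpoint's config.json."""
    config = json.loads((Path(save_dir) / "config.json").read_text())
    return config.get("architectures", [])

def has_causal_lm_head(save_dir):
    """True if the saved config advertises a *ForCausalLM architecture."""
    return any(arch.endswith("ForCausalLM") for arch in saved_architectures(save_dir))

# Simulate a directory produced by the buggy code path: AutoModel reloads
# the base model, so config.json records "Qwen3Model" with no LM head.
buggy_dir = tempfile.mkdtemp()
Path(buggy_dir, "config.json").write_text(
    json.dumps({"architectures": ["Qwen3Model"]})
)
print(has_causal_lm_head(buggy_dir))  # False: LM head architecture was lost
```

A checkpoint saved after the fix below would instead report `["Qwen3ForCausalLM"]`, and the check would pass.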
🔁 Reproducing the Bug
You can see this in the `unsloth/save.py` file, inside the `unsloth_save_pretrained_torchao` function. The problematic lines are:

On line 2772:

```python
from transformers import AutoModel, AutoTokenizer, TorchAoConfig
```

And around line 2791:

```python
model = AutoModel.from_pretrained(...)
```
✅ The Fix
This bug is fixed by changing the function to use `AutoModelForCausalLM`:

1. Change the import to:

   ```python
   from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig
   ```

2. Change the model loading line to:

   ```python
   model = AutoModelForCausalLM.from_pretrained(...)
   ```