Deploying models to LM Studio

Saving models to GGUF so you can run and deploy them to LM Studio

You can run and deploy your fine-tuned LLM directly in LM Studio, which makes it easy to run and serve GGUF models (the llama.cpp format) locally.

You can use our LM Studio notebook or follow the instructions below:

  1. Export your Unsloth fine-tuned model to .gguf

  2. Import / download the GGUF into LM Studio

  3. Load it in Chat (or run it behind an OpenAI-compatible local API)

(Screenshots: example output in LM Studio before and after fine-tuning.)

1) Export to GGUF (from Unsloth)

If you already exported a .gguf, skip ahead to step 2 (Import the GGUF into LM Studio).

  • q4_k_m: usually the default for local runs; a good balance of size and quality.

  • q8_0: near full-precision quality at a larger file size.

  • f16: the largest and slowest option, but original unquantized precision.
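
Exporting from Unsloth is a single call. A minimal sketch, assuming model and tokenizer come from your Unsloth fine-tuning run:

```python
# Minimal sketch: save an Unsloth fine-tuned model as GGUF.
# `model` and `tokenizer` are assumed to come from your training run.
model.save_pretrained_gguf(
    "model",                         # output directory
    tokenizer,
    quantization_method = "q4_k_m",  # or "q8_0" / "f16" (see above)
)
```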

2) Import the GGUF into LM Studio

LM Studio provides a CLI called lms that can import a local .gguf into LM Studio’s models folder.

Import a GGUF file:
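
For example (replace the path with your exported file):

```bash
lms import /path/to/model.gguf
```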

Keep the original file (copy instead of move):
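
In this and the following variants, the flag names are assumptions based on their descriptions; confirm the exact options with lms import --help:

```bash
# --copy is an assumed flag name
lms import /path/to/model.gguf --copy
```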


Keep the model where it is (symlink):
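
```bash
# --symlink is an assumed flag name
lms import /path/to/model.gguf --symlink
```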

This is helpful for large models stored on a dedicated drive.

Skip prompts and choose the target namespace yourself:
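
```bash
# --yes (skip prompts) and the user/model namespace argument are assumed syntax
lms import /path/to/model.gguf --yes my-name/my-model
```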

Dry-run (shows what will happen):
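
```bash
# --dry-run is an assumed flag name
lms import /path/to/model.gguf --dry-run
```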

After importing, the model should appear in LM Studio under My Models.

3) Load and chat in LM Studio

  1. Open LM Studio → Chat

  2. Open the model loader

  3. Select your imported model

  4. (Optional) adjust load settings (GPU offload, context length, etc.)

  5. Chat normally in the UI

4) Serve your fine-tuned model as a local API (OpenAI-compatible)

LM Studio can serve your loaded model behind an OpenAI-compatible API (handy for apps like Open WebUI, custom agents, scripts, etc.).

  1. Load your model in LM Studio

  2. Go to the Developer tab

  3. Start the local server

  4. Use the base URL shown (the default is typically http://localhost:1234/v1)

Quick test: list models
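
Assuming the default base URL:

```bash
curl http://localhost:1234/v1/models
```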

Python example (OpenAI SDK):
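
A minimal sketch using the openai Python package; the model id is a placeholder (use the one returned by /v1/models):

```python
from openai import OpenAI

# LM Studio's local server is OpenAI-compatible; any non-empty API key works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="my-name/my-model",  # placeholder: your model's id from /v1/models
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```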

cURL example (chat completions):
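
Again, the model id is a placeholder:

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-name/my-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```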

Debugging tip: If you’re troubleshooting formatting/templates, you can inspect the raw prompt LM Studio sends to the model by running: lms log stream

Troubleshooting

Model runs in Unsloth, but LM Studio output is gibberish / repeats

This is almost always a prompt template / chat template mismatch.

LM Studio will auto-detect the prompt template from the GGUF metadata when possible, but custom or incorrectly tagged models may need a manual override.

Fix:

  1. Go to My Models → click the gear ⚙️ next to your model

  2. Find Prompt Template and set it to match the template you trained with

  3. Alternatively, enable the Prompt Template box in the Chat sidebar (you can force it to always show)

LM Studio doesn’t show my model in “My Models”

  • Prefer lms import /path/to/model.gguf

  • Or confirm the file is in the correct folder structure: ~/.lmstudio/models/publisher/model/model-file.gguf

OOM / slow performance

  • Use a smaller quant (e.g., Q4_K_M)

  • Reduce context length

  • Adjust GPU offload (LM Studio “Per-model defaults” / load settings)

