Deploying models to LM Studio

Saving models to GGUF so you can run and deploy them to LM Studio

You can run and deploy your fine-tuned LLM directly in LM Studio, which makes it easy to run and serve GGUF models (the llama.cpp format) locally.

You can use our LM Studio notebook or follow the instructions below:

  1. Export your Unsloth fine-tuned model to .gguf

  2. Import / download the GGUF into LM Studio

  3. Load it in Chat (or run it behind an OpenAI-compatible local API)

(Screenshots: example output in LM Studio before and after fine-tuning.)

1) Export to GGUF (from Unsloth)

If you already exported a .gguf, skip ahead to step 2 (Import the GGUF into LM Studio).

  • q4_k_m: usually the default for local runs; a good balance of size and quality.

  • q8_0: near full-precision quality at a larger file size.

  • f16: the largest and slowest option, but original unquantized precision.
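
Exporting from Unsloth is a single call. A minimal sketch, assuming model and tokenizer come from your Unsloth fine-tuning run:

```python
# Minimal sketch: save an Unsloth fine-tuned model as GGUF.
# `model` and `tokenizer` are assumed to come from your training run.
model.save_pretrained_gguf(
    "model",                         # output directory
    tokenizer,
    quantization_method = "q4_k_m",  # or "q8_0" / "f16" (see above)
)
```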

2) Import the GGUF into LM Studio

LM Studio provides a CLI called lms that can import a local .gguf into LM Studio’s models folder.

Import a GGUF file:
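
For example (replace the path with your exported file):

```bash
lms import /path/to/model.gguf
```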

Keep the original file (copy instead of move):
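
In this and the following variants, the flag names are assumptions based on their descriptions; confirm the exact options with lms import --help:

```bash
# --copy is an assumed flag name
lms import /path/to/model.gguf --copy
```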


Keep the model where it is (symlink):
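
```bash
# --symlink is an assumed flag name
lms import /path/to/model.gguf --symlink
```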

This is helpful for large models stored on a dedicated drive.

Skip prompts and choose the target namespace yourself:
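
```bash
# --yes (skip prompts) and the user/model namespace argument are assumed syntax
lms import /path/to/model.gguf --yes my-name/my-model
```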

Dry-run (shows what will happen):
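
```bash
# --dry-run is an assumed flag name
lms import /path/to/model.gguf --dry-run
```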

After importing, the model should appear in LM Studio under My Models.

3) Load and chat in LM Studio

  1. Open LM Studio → Chat

  2. Open the model loader

  3. Select your imported model

  4. (Optional) adjust load settings (GPU offload, context length, etc.)

  5. Chat normally in the UI

4) Serve your fine-tuned model as a local API (OpenAI-compatible)

LM Studio can serve your loaded model behind an OpenAI-compatible API (handy for apps like Open WebUI, custom agents, scripts, etc.).

  1. Load your model in LM Studio

  2. Go to the Developer tab

  3. Start the local server

  4. Use the base URL shown (the default is typically http://localhost:1234/v1)

Quick test: list models
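
Assuming the default base URL:

```bash
curl http://localhost:1234/v1/models
```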

Python example (OpenAI SDK):
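
A minimal sketch using the openai Python package; the model id is a placeholder (use the one returned by /v1/models):

```python
from openai import OpenAI

# LM Studio's local server is OpenAI-compatible; any non-empty API key works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="my-name/my-model",  # placeholder: your model's id from /v1/models
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```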

cURL example (chat completions):
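
Again, the model id is a placeholder:

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-name/my-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```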

Debugging tip: If you’re troubleshooting formatting/templates, you can inspect the raw prompt LM Studio sends to the model by running: lms log stream

Troubleshooting

Model runs in Unsloth, but LM Studio output is gibberish / repeats

This is almost always a prompt template / chat template mismatch.

LM Studio will auto-detect the prompt template from the GGUF metadata when possible, but custom or incorrectly tagged models may need a manual override.

Fix:

  1. Go to My Models → click the gear ⚙️ next to your model

  2. Find Prompt Template and set it to match the template you trained with

  3. Alternatively, enable the Prompt Template box in the Chat sidebar (you can force it to always show)

LM Studio doesn’t show my model in “My Models”

  • Prefer lms import /path/to/model.gguf

  • Or confirm the file is in the correct folder structure: ~/.lmstudio/models/publisher/model/model-file.gguf

OOM / slow performance

  • Use a smaller quant (e.g., Q4_K_M)

  • Reduce context length

  • Adjust GPU offload (LM Studio “Per-model defaults” / load settings)

