Tool Calling LLMs Guide

In this guide, you will learn how to use LLMs with Tool Calling with inference via llama.cpp, llama-server and OpenAI endpoints

Our guide should work for nearly any model we provide including:

Qwen3-Coder, Qwen3-Next and other Qwen models
DeepSeek-V3.1 and gpt-oss
GLM-4.7, GLM-4.7-Flash, GLM-4.6 and GLM 4.5
Devstral 2 and Devstral
And nearly all models within our model catalog.

🔨Tool Calling Setup

In a new terminal (if using tmux, use CTRL+B+D), we create some tools like adding 2 numbers, executing Python code, executing Linux functions and much more:

We then use the below functions (copy and paste and execute) which will parse the function calls automatically and call the OpenAI endpoint for any model:

Now we'll showcase multiple methods of running tool-calling for many different use-cases below:

Writing a story:

Mathematical operations:

Execute generated Python code

Execute arbitrary terminal functions

🔧 GLM-4.7-Flash + GLM 4.7 Calling

We first download GLM-4.7 or GLM-4.7-Flash via some Python code, then launch it via llama-server in a separate terminal (like using tmux). In this example we download the large GLM-4.7 model:

If you ran it successfully, you should see:

Now launch it via llama-server in a new terminal. Use tmux if you want:

And you will get:

Now in a new terminal and executing Python code, reminder to run Tool Calling Setup We use GLM 4.7's optimal parameters of temperature = 0.7 and top_p = 1.0

Tool Call for mathematical operations for GLM 4.7

Tool Call to execute generated Python code for GLM 4.7

⚒️Devstral 2 Tool Calling

We first download Devstral 2 via some Python code, then launch it via llama-server in a separate terminal (like using tmux):

If you ran it successfully, you should see:

Now launch it via llama-server in a new terminal. Use tmux if you want:

You will see the below if it succeeded:

We then call the model with the following message and with Devstral's suggested parameters of temperature = 0.15 only. Reminder to run Tool Calling Setup

PreviousAider Polyglot Benchmarks NextVision Fine-tuning

Last updated 5 days ago

Was this helpful?

import json, subprocess, random from typing import Any def add_number(a: float | str, b: float | str) -> float: return float(a) + float(b) def multiply_number(a: float | str, b: float | str) -> float: return float(a) * float(b) def substract_number(a: float | str, b: float | str) -> float: return float(a) - float(b) def write_a_story() -> str: return random.choice([ "A long time ago in a galaxy far far away...", "There were 2 friends who loved sloths and code...", "The world was ending because every sloth evolved to have superhuman intelligence...", "Unbeknownst to one friend, the other accidentally coded a program to evolve sloths...", ]) def terminal(command: str) -> str: if "rm" in command or "sudo" in command or "dd" in command or "chmod" in command: msg = "Cannot execute 'rm, sudo, dd, chmod' commands since they are dangerous" print(msg); return msg print(f"Executing terminal command `{command}`") try: return str(subprocess.run(command, capture_output = True, text = True, shell = True, check = True).stdout) except subprocess.CalledProcessError as e: return f"Command failed: {e.stderr}" def python(code: str) -> str: data = {} exec(code, data) del data["__builtins__"] return str(data) MAP_FN = { "add_number": add_number, "multiply_number": multiply_number, "substract_number": substract_number, "write_a_story": write_a_story, "terminal": terminal, "python": python, } tools = [ { "type": "function", "function": { "name": "add_number", "description": "Add two numbers.", "parameters": { "type": "object", "properties": { "a": { "type": "string", "description": "The first number.", }, "b": { "type": "string", "description": "The second number.", }, }, "required": ["a", "b"], }, }, }, { "type": "function", "function": { "name": "multiply_number", "description": "Multiply two numbers.", "parameters": { "type": "object", "properties": { "a": { "type": "string", "description": "The first number.", }, "b": { "type": "string", "description": "The second number.", }, }, "required": ["a", "b"], }, }, }, { "type": "function", "function": { "name": "substract_number", "description": "Substract two numbers.", "parameters": { "type": "object", "properties": { "a": { "type": "string", "description": "The first number.", }, "b": { "type": "string", "description": "The second number.", }, }, "required": ["a", "b"], }, }, }, { "type": "function", "function": { "name": "write_a_story", "description": "Writes a random story.", "parameters": { "type": "object", "properties": {}, "required": [], }, }, }, { "type": "function", "function": { "name": "terminal", "description": "Perform operations from the terminal.", "parameters": { "type": "object", "properties": { "command": { "type": "string", "description": "The command you wish to launch, e.g `ls`, `rm`, ...", }, }, "required": ["command"], }, }, }, { "type": "function", "function": { "name": "python", "description": "Call a Python interpreter with some Python code that will be ran.", "parameters": { "type": "object", "properties": { "code": { "type": "string", "description": "The Python code to run", }, }, "required": ["code"], }, }, }, ]