Large language models (LLMs) are widely used in tool-calling workflows thanks to their strong performance in generating appropriate function calls. However, their size and cost put them out of reach for small-scale builders, and their reliance on server-side inference raises data-privacy concerns. Small language models (SLMs) are a promising, affordable alternative that can run on local hardware, ensuring stronger privacy.
Unfortunately, SLMs struggle with tool calling: they pass incorrect arguments to functions with many parameters and lose track of context when a conversation spans multiple turns. Yet production applications built around a fixed set of APIs rarely need a general-purpose LLM; they need a reliable, specialized model.
This talk demonstrates how to increase the accuracy of SLMs (under 8B parameters) on custom tool-calling tasks. We will show how knowledge distillation, training an SLM on the outputs of a larger teacher model, gets the most out of SLMs in low-data settings, to the point where they can even outperform LLMs on the target task. We will walk through the full pipeline: data generation, fine-tuning, and local deployment.
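To make the data-generation step concrete: one common safeguard is to validate each teacher-generated call against the tool's schema before it enters the fine-tuning set, filtering out exactly the wrong-argument mistakes described above. A minimal sketch, where the `get_weather` tool and the JSON call format are illustrative assumptions rather than the talk's actual setup:

```python
import json

# Hypothetical tool schema: function name, parameter types, required parameters.
WEATHER_TOOL = {
    "name": "get_weather",
    "parameters": {"city": str, "unit": str},
    "required": ["city"],
}

def is_valid_call(raw_call: str, tool: dict) -> bool:
    """Accept a teacher-generated call only if it names the right function
    and passes correctly named, correctly typed arguments."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False
    if call.get("name") != tool["name"]:
        return False
    args = call.get("arguments", {})
    # Reject unknown or wrongly typed arguments.
    for key, value in args.items():
        expected = tool["parameters"].get(key)
        if expected is None or not isinstance(value, expected):
            return False
    # Reject calls that omit a required argument.
    return all(key in args for key in tool["required"])

# Keep only well-formed teacher outputs for the fine-tuning set.
candidates = [
    '{"name": "get_weather", "arguments": {"city": "Berlin"}}',      # valid
    '{"name": "get_weather", "arguments": {"location": "Berlin"}}',  # wrong argument name
    '{"name": "get_weather", "arguments": {"city": 42}}',            # wrong argument type
]
training_set = [c for c in candidates if is_valid_call(c, WEATHER_TOOL)]
```

Filtering like this keeps schema errors out of the distillation data, so the student model never learns the teacher's occasional malformed calls.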
What you'll learn: