When LLMs Are Too Big: Building Cost-Efficient High-Throughput ML Systems for E-Commerce Cataloging

Tobias Senst, Bastian Wandt

MLOps & DevOps
Python Skill: Intermediate
Domain Expertise: Advanced

When LLMs Are Too Big: Building Cost-Efficient High-Throughput Machine Learning for Cataloging in E-Commerce

idealo.de offers a price comparison service for over 5.7 million products across thousands of categories. It operates in a dynamic, constantly changing, billion-scale landscape with over 4.5 billion offers from more than 50,000 shops in 6 countries. Our central challenge is to catalog this huge number of offers automatically, at a peak throughput of 4.8 million offers per minute.
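To put the peak rate into perspective, 4.8 million offers per minute is 80,000 offers per second, or an aggregate time budget of 12.5 microseconds per offer. The small sketch below does this arithmetic and estimates how many inference replicas would be needed to sustain it. The per-replica throughput values are hypothetical figures chosen for illustration, not idealo measurements.

```python
import math

PEAK_OFFERS_PER_MINUTE = 4_800_000  # peak throughput stated in the abstract


def offers_per_second(per_minute: int) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


def microseconds_per_offer(per_minute: int) -> float:
    """Aggregate compute-time budget per offer if offers were processed serially."""
    return 1_000_000 / offers_per_second(per_minute)


def replicas_needed(per_minute: int, per_replica_per_second: float) -> int:
    """Smallest number of replicas that sustains the given rate.

    per_replica_per_second is a HYPOTHETICAL benchmark figure supplied by the
    caller; it is not a number from the talk.
    """
    return math.ceil(offers_per_second(per_minute) / per_replica_per_second)


print(offers_per_second(PEAK_OFFERS_PER_MINUTE))
print(microseconds_per_offer(PEAK_OFFERS_PER_MINUTE))
print(replicas_needed(PEAK_OFFERS_PER_MINUTE, 2_500.0))
```

Halving the per-replica cost of a model, for example through a smaller distilled model or quantization, roughly halves the number of replicas at peak, which is why model size translates so directly into cost at this scale.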

While modern large language models (LLMs) excel at such tasks, they are too slow and costly to run on data at this scale. To meet business needs, we must strike a balance between processing speed and cataloging quality. By using modern machine learning techniques to distill specialist knowledge from state-of-the-art LLMs into downscaled models, combined with a range of performance-enhancing techniques, we speed up idealo’s processing while massively improving cataloging performance. This talk shows how these solutions balance cost and performance and how they integrate into idealo’s offer cataloging pipelines.

What makes this approach unique?

We present our solution and our practical experience with high-throughput classification. This includes the operational aspects of our system, in particular the design of a stable, high-performance MLOps lifecycle integrated into our CI/CD and continuous training pipelines, which automate continuous data sampling, model training, model deployment, and monitoring.
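One common way to sample training data from a heavily imbalanced offer stream is to cap the number of examples kept per category, so that large categories do not drown out small ones. The sketch below does this in a single pass with per-category reservoir sampling (Algorithm R). It is an illustrative technique, not necessarily the sampling strategy used at idealo; the function name, the cap, and the toy category names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterable, TypeVar

T = TypeVar("T")


def capped_reservoir_sample(
    stream: Iterable[tuple[Hashable, T]],
    cap_per_category: int,
    seed: int = 0,
) -> dict[Hashable, list[T]]:
    """Keep a uniform random sample of at most `cap_per_category` items per category.

    Runs reservoir sampling independently per category, so the stream is read
    once and memory is bounded by number_of_categories * cap_per_category.
    """
    rng = random.Random(seed)
    reservoirs: dict[Hashable, list[T]] = defaultdict(list)
    seen: dict[Hashable, int] = defaultdict(int)
    for category, item in stream:
        seen[category] += 1
        reservoir = reservoirs[category]
        if len(reservoir) < cap_per_category:
            reservoir.append(item)
        else:
            # Replace a random slot with probability cap / seen[category].
            j = rng.randrange(seen[category])
            if j < cap_per_category:
                reservoir[j] = item
    return dict(reservoirs)


# Toy, hypothetical stream: one huge category and one tiny one.
stream = [("phones", f"offer-{i}") for i in range(10_000)]
stream += [("telescopes", f"scope-{i}") for i in range(3)]
sample = capped_reservoir_sample(stream, cap_per_category=100)
print({category: len(items) for category, items in sample.items()})
```

Because each category keeps its own reservoir, rare categories retain all of their examples while frequent ones are downsampled uniformly, and rerunning with a fixed seed reproduces the same sample.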

We discuss concrete solutions and best practices that show how knowledge distillation from a large instruction-tuned e5 transformer improves the accuracy of our multilingual MiniLM transformer encoder. Additionally, we show how deploying these models on specialized machine learning accelerators via the AWS Neuron SDK lets us meet strict runtime and latency requirements cost-efficiently.
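Knowledge distillation is commonly implemented with the teacher-student loss of Hinton et al.: the student is trained to match the teacher's temperature-softened output distribution in addition to the hard labels. The framework-free sketch below shows that loss for a single example, assuming (hypothetically) that both the e5 teacher and the MiniLM student emit logits over the same set of categories. The temperature and weighting defaults are illustrative, not idealo's settings.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits divided by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> float:
    """Soft-target KL term (teacher -> student) blended with hard-label cross-entropy.

    The KL term is multiplied by temperature**2 so its gradient scale stays
    comparable to the cross-entropy term as the temperature changes.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature**2
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In a PyTorch training loop the same soft-target term is usually computed batch-wise with `torch.nn.functional.kl_div`, passing the student's log-probabilities as input and the teacher's probabilities as target.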

In detail, we will discuss the following topics:

  • Machine learning operations (MLOps) lifecycle for a high-throughput category classification system.
  • Challenges in efficiently creating training and test datasets from huge amounts of existing, heavily imbalanced data.
  • Selecting the right model from the current zoo of encoder language models.
  • Using knowledge distillation via student-teacher models to balance required compute and classification performance.
  • Integrating quantization techniques for speed improvements.
  • Selecting ideal compute instances for our production environment.
  • How to compile the model for custom-designed machine learning accelerators using the AWS Neuron SDK.
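Quantization trades numeric precision for speed and memory by storing weights as low-bit integers plus a scale factor. The sketch below shows the core idea with symmetric per-tensor int8 quantization in plain Python. It is a generic illustration of the technique, not idealo's specific quantization setup, and the weight values are made up.

```python
from typing import Sequence


def quantize_int8(weights: Sequence[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: Sequence[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * v for v in q]


weights = [0.42, -1.27, 0.003, 0.9]  # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per weight is at most half the scale, so outliers that inflate `max_abs` cost precision for every other weight; this is why frameworks often quantize per channel instead. PyTorch, for example, provides `torch.ao.quantization.quantize_dynamic` for post-training dynamic quantization of linear layers.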

Key takeaways for attendees:

  • An overview of months of research and exploration for massive-throughput environments, including how the results were integrated into live systems.
  • Modern machine learning systems in production, especially with billion-scale data, need to carefully balance business needs in terms of cost and quality.
  • State-of-the-art LLMs are often not feasible for large-scale tasks. However, new machine learning techniques can extract their knowledge for specific applications.
  • How to transition research findings to production.

The talk follows our tech stack, which includes PyTorch, PyTorch Lightning, Hugging Face, AWS SageMaker, AWS Neuron SDK, Grafana Loki, Docker, and GitHub Actions.

Tobias Senst

Tobias Senst is a Senior Machine Learning Engineer at idealo internet GmbH. He received his PhD in 2019 from the Technische Universität Berlin under the supervision of Prof. Thomas Sikora. He has more than 10 years of experience in computer vision and video analytics research.

At idealo, he switched from the world of images and videos to natural language processing and is responsible for operating and developing machine learning models in production.

Bastian Wandt

Bastian is a Senior Machine Learning Research Engineer at idealo Internet GmbH, where he focuses on large-scale offer cataloging and high-throughput machine learning systems. Before joining idealo in 2025, he was an Assistant Professor at Linköping University in Sweden, leading a research group in 3D computer vision.

He completed his PhD in 2020 at Leibniz University Hannover with a thesis on 3D human pose estimation and subsequently spent two years as a postdoc at the University of British Columbia in Canada, expanding his research into broader areas of 3D computer vision and teaching related courses.