What Breaks When Automatic Speech Recognition Systems Go Multilingual

Rashmi Nagpal

Natural Language Processing & Audio (incl. Generative AI NLP)
Python Skill Intermediate
Domain Expertise Intermediate

In a multilingual Automatic Speech Recognition (ASR) dataset containing over 440,000 audio samples, preprocessing methods that were effective for one language often failed silently for others. The results were shifts in acoustic features, misleading validation outcomes, and long-running jobs that failed because of assumptions that held only in monolingual contexts. This presentation examines the issues that arise when extending ASR systems to multilingual data, using a real-world deepfake detection system that covers Hindi, Korean, Mandarin, and German. It addresses the engineering challenges encountered while developing and operating a Python-based pipeline at scale.

The session will discuss practical issues in large-scale audio processing, including the creation of memory-efficient data loaders, the design of workflows that support resumable preprocessing and feature extraction, and strategies for managing long-running jobs to avoid redundant computations. Additionally, it will cover validation strategies for multilingual ASR systems, emphasizing that language imbalance and shared pipelines can lead to cross-lingual leakage, which skews evaluation results if not explicitly addressed.
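As a rough illustration of the resumable-workflow idea, the sketch below records each finished sample in a JSON-lines manifest and skips recorded samples on the next run, while a generator keeps only one sample in flight at a time. It is not the pipeline presented in the talk: the names `load_done`, `iter_pending`, and `run_resumable`, the manifest layout, and the `process` callback are all assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Iterator


def load_done(manifest: Path) -> set[str]:
    """Return the sample IDs already recorded in the JSON-lines manifest."""
    done: set[str] = set()
    if not manifest.exists():
        return done
    with manifest.open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError):
                # Skip a malformed line, e.g. one truncated by an interrupted run.
                continue
    return done


def iter_pending(sample_ids: Iterable[str], done: set[str]) -> Iterator[str]:
    """Lazily yield only the samples that still need processing."""
    for sample_id in sample_ids:
        if sample_id not in done:
            yield sample_id


def run_resumable(
    sample_ids: Iterable[str],
    process: Callable[[str], object],  # hypothetical per-sample feature extractor
    manifest: Path,
) -> int:
    """Process pending samples, appending one manifest record per finished sample."""
    done = load_done(manifest)
    processed = 0
    with manifest.open("a") as fh:
        for sample_id in iter_pending(sample_ids, done):
            result = process(sample_id)  # result must be JSON-serializable
            fh.write(json.dumps({"id": sample_id, "result": result}) + "\n")
            fh.flush()  # push the record out so a later crash does not lose it
            processed += 1
    return processed
```

Rerunning `run_resumable` with the same manifest after an interruption only touches samples that have no record yet, which is one simple way to avoid redundant computation in a long-running job.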

Key takeaways include:

  1. Multilingual ASR pipelines reveal language-specific issues that are not present in monolingual systems.
  2. Scalable audio processing requires memory-efficient and resumable Python workflows.
  3. Cross-lingual evaluation necessitates explicit control over language imbalance and leakage.
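
The third takeaway can be made concrete with a small sketch. It assumes leakage arises when related samples (for example, clips from the same speaker or source recording) land on both sides of a split, and that imbalance is handled by reporting per-language scores with a macro average. Neither assumption is confirmed by the abstract, and the `language` and `group` keys and both function names are hypothetical.

```python
import random
from collections import defaultdict


def split_by_group_within_language(samples, test_fraction=0.2, seed=0):
    """Split so that no (language, group) pair appears in both train and test.

    `samples` is a list of dicts with 'language' and 'group' keys (assumed schema).
    Each language contributes at least one group to the test split, so a
    language with a single group ends up entirely in test.
    """
    rng = random.Random(seed)
    groups_by_lang = defaultdict(set)
    for s in samples:
        groups_by_lang[s["language"]].add(s["group"])

    test_keys = set()
    for lang in sorted(groups_by_lang):
        groups = sorted(groups_by_lang[lang])
        rng.shuffle(groups)
        n_test = max(1, round(len(groups) * test_fraction))
        test_keys.update((lang, g) for g in groups[:n_test])

    train = [s for s in samples if (s["language"], s["group"]) not in test_keys]
    test = [s for s in samples if (s["language"], s["group"]) in test_keys]
    return train, test


def per_language_accuracy(labels, preds, languages):
    """Return ({language: accuracy}, macro average over languages)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, lang in zip(labels, preds, languages):
        total[lang] += 1
        correct[lang] += int(y == p)
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    macro = sum(per_lang.values()) / len(per_lang)
    return per_lang, macro
```

The macro average weights each language equally, so a large language cannot mask a failure on a small one. For example, four correct Hindi predictions and one wrong German prediction give an overall accuracy of 0.8 but a macro average of 0.5.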

Rashmi Nagpal

Rashmi is an AI Research Scientist at Poseidon and a researcher at MIT CSAIL, working at the intersection of cybersecurity and artificial intelligence. She has six years of industrial experience, having brought ideas to life at pre-seed startups and contributed to impactful redesigns and features at established industry giants. Beyond coding, Rashmi finds inspiration in capturing the wonders of the cosmos through her telescope and engaging in board games with friends.