arxiv:2503.23542

Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages

Published on Mar 30

Submitted by zuazo on Apr 4


Authors: Xabier de Zuazo, Inma Hernáez Rioja

Abstract

Automatic speech recognition systems have undoubtedly advanced with the integration of multilingual and multitask models such as Whisper, which have shown a promising ability to understand and process speech across a wide range of languages. Despite their robustness, these models often fall short in handling the linguistic distinctions of minority languages. This study addresses this gap by integrating traditional and novel language models with fine-tuned Whisper models to raise their performance in less commonly studied languages. Through rigorous fine-tuning and evaluation across multiple datasets, we demonstrate substantial improvements in word error rate, particularly in low-resource scenarios. Our approach not only takes advantage of the extensive data Whisper was pre-trained on, but also complements its linguistic adaptability by incorporating language models. We obtained improvements of up to 51% for in-distribution datasets and up to 34% for out-of-distribution sentences using statistical language models, while large language models provided moderate but consistently robust improvements across diverse linguistic contexts. The findings reveal that, while the integration reliably benefits all model sizes, the extent of improvement varies, highlighting the importance of optimized language model parameters. Finally, we emphasize the importance of selecting appropriate evaluation parameters when reporting results obtained with transformer-based ASR models. In summary, this research clears the way for more inclusive ASR technologies that perform better across languages by enriching their linguistic knowledge. For further implementation details of this study, the technical documentation and source code are available at http://www.github.com/hitz-zentroa/whisper-lm.
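To make the idea of combining an ASR model with a language model concrete, here is a minimal sketch of hypothesis rescoring together with a word error rate (WER) check. It is not the authors' implementation: the abstract does not state the scoring formula, so the log-linear combination below (ASR log-probability plus a weighted LM log-probability plus a word-count bonus) is an assumption borrowed from common shallow-fusion practice. The names `fused_score`, `rescore`, `toy_lm_logprob`, the weights `alpha` and `beta`, and the toy vocabulary are all hypothetical.

```python
from typing import Callable, List, Tuple


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # (before the current word) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def fused_score(asr_logprob: float, lm_logprob: float, n_words: int,
                alpha: float, beta: float) -> float:
    """Assumed log-linear fusion: ASR score + alpha * LM score + beta * length."""
    return asr_logprob + alpha * lm_logprob + beta * n_words


def rescore(hypotheses: List[Tuple[str, float]],
            lm_logprob_fn: Callable[[str], float],
            alpha: float, beta: float) -> str:
    """Return the hypothesis text with the highest fused score."""
    best_text, _ = max(
        hypotheses,
        key=lambda h: fused_score(h[1], lm_logprob_fn(h[0]),
                                  len(h[0].split()), alpha, beta),
    )
    return best_text


# Hypothetical unigram "language model" with made-up log-probabilities.
TOY_UNIGRAMS = {"the": -1.0, "cat": -2.0, "sat": -2.0, "sad": -5.0}


def toy_lm_logprob(text: str) -> float:
    # Unknown words get a fixed low log-probability.
    return sum(TOY_UNIGRAMS.get(word, -10.0) for word in text.split())


reference = "the cat sat"
# (text, ASR log-probability) pairs, as a beam search might produce them.
nbest = [("the cat sad", -1.0), ("the cat sat", -1.2)]

baseline = rescore(nbest, toy_lm_logprob, alpha=0.0, beta=0.0)  # ASR only
fused = rescore(nbest, toy_lm_logprob, alpha=0.5, beta=0.0)     # ASR + LM

print(baseline, wer(reference, baseline))
print(fused, wer(reference, fused))
```

With `alpha=0.0` the language model is ignored and the acoustically preferred hypothesis wins; raising `alpha` lets the LM overturn it. In practice the weights would be tuned on held-out data, which is one reading of the abstract's remark about optimized language model parameters.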


Community

zuazo

Paper author Paper submitter 1 day ago

Hello!

We are excited to share our initial work on integrating n-gram and large language models with Whisper models. It is focused on (but not technically limited to) improving results in low-resource languages.

The main code used in our paper, for OpenAI models: https://github.com/hitz-zentroa/whisper-lm
An alternative implementation using transformers: https://github.com/hitz-zentroa/whisper-lm-transformers

We welcome any questions, feedback, or ideas for improvement!

zuazo

Paper author Paper submitter about 18 hours ago

Hello again!

I have uploaded the n-gram models here: https://huggingface.co/HiTZ/whisper-lm-ngrams (thanks to @nielsr for helping me with this), and also linked the fine-tuned Whisper models we have used.



Models citing this paper 29
