---
base_model: openai/whisper-large-v3-turbo
library_name: transformers
license: apache-2.0
pipeline_tag: automatic-speech-recognition
tags:
  - audio
  - automatic-speech-recognition
  - whisper
  - hf-asr-leaderboard
---

Model Card for Lite-Whisper large-v3-turbo-acc

Lite-Whisper is a compressed version of OpenAI Whisper produced with LiteASR, which shrinks the encoder via low-rank approximation. See our GitHub repository and paper for details.
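Conceptually, LiteASR replaces large weight matrices with pairs of low-rank factors. The sketch below illustrates the idea on a single linear layer using a truncated SVD; the rank of 320 and the 1280-dimensional shapes (the hidden size of the Whisper large encoder) are illustrative assumptions, and this is a simplification rather than the exact LiteASR procedure.

import torch

def low_rank_factors(weight: torch.Tensor, rank: int):
    # approximate a (d_out, d_in) matrix by two rank-r factors,
    # shrinking d_out * d_in parameters to rank * (d_out + d_in)
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] * S[:rank], Vh[:rank, :]

# hypothetical layer: 1280 matches the Whisper large encoder width
W = torch.randn(1280, 1280)
A, B = low_rank_factors(W, rank=320)
print(A.shape, B.shape)  # (1280, 320) and (320, 1280)
print(torch.linalg.matrix_norm(W - A @ B) / torch.linalg.matrix_norm(W))

In a network, the original layer would then be replaced by two smaller linear layers applying B and A in sequence, which is where the encoder-size reductions in the benchmark table below come from.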

Benchmark Results

The following table reports the average word error rate (WER) evaluated on the ESB datasets:

| Model | Average WER (↓) | Encoder Size | Decoder Size |
|-------|-----------------|--------------|--------------|
| whisper-large-v3 | 10.1 | 635M | 907M |
| lite-whisper-large-v3-acc | 10.1 | 429M | 907M |
| lite-whisper-large-v3 | 10.2 | 377M | 907M |
| lite-whisper-large-v3-fast | 11.3 | 308M | 907M |
| whisper-large-v3-turbo | 10.1 | 635M | 172M |
| lite-whisper-large-v3-turbo-acc | 10.2 | 421M | 172M |
| lite-whisper-large-v3-turbo | 12.6 | 374M | 172M |
| lite-whisper-large-v3-turbo-fast | 20.1 | 313M | 172M |
| whisper-medium | 14.8 | 306M | 457M |
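WER counts the word-level substitutions, deletions, and insertions needed to turn a hypothesis into the reference transcript, divided by the number of reference words. For reference, here is a minimal example using the jiwer library (a common choice for WER computation; the ESB evaluation may apply additional text normalization):

import jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# two substitutions out of nine reference words -> WER ≈ 0.22
print(jiwer.wer(reference, hypothesis))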

Quick Start

The easiest way to run our model is through our integration with the Hugging Face Transformers library. We provide model weights for compressed versions of the OpenAI Whisper series here.

import librosa 
import torch
from transformers import AutoProcessor, AutoModel

device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# load the compressed Whisper model
model = AutoModel.from_pretrained(
    "efficient-speech/lite-whisper-large-v3-turbo", 
    trust_remote_code=True, 
)
model.to(dtype).to(device)

# we use the same processor as the original model
processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

# set the path to your audio file
path = "path/to/audio.wav"
audio, _ = librosa.load(path, sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
input_features = input_features.to(dtype).to(device)

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(
    predicted_ids, 
    skip_special_tokens=True
)[0]

print(transcription)
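Continuing from the snippet above, the same processor and model also accept a batch of clips in one call. The file names below are placeholders, and each clip is assumed to fit within Whisper's 30-second input window:

# placeholder paths; reuses model, processor, dtype, and device from above
paths = ["clip1.wav", "clip2.wav"]
batch = [librosa.load(p, sr=16000)[0] for p in paths]

# the feature extractor pads/truncates every clip to 30 seconds
features = processor(batch, sampling_rate=16000, return_tensors="pt").input_features
features = features.to(dtype).to(device)

predicted_ids = model.generate(features)
for path, text in zip(paths, processor.batch_decode(predicted_ids, skip_special_tokens=True)):
    print(f"{path}: {text}")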

Citation

If you use LiteASR in your research, please cite the following paper:

@misc{kamahori2025liteasrefficientautomaticspeech,
      title={LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation}, 
      author={Keisuke Kamahori and Jungo Kasai and Noriyuki Kojima and Baris Kasikci},
      year={2025},
      eprint={2502.20583},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.20583}, 
}