
Comparison to w2v-bert-2.0

#13
by anferico - opened

Hi, how does this model compare to w2v-BERT 2.0? mHuBERT is pre-trained on a total of 90K hours of speech covering 147 languages, whereas w2v-BERT 2.0 was pre-trained on 4.5M hours of unlabeled speech covering more than 143 languages. Disregarding parameter count (and therefore inference time), w2v-BERT 2.0 has a striking advantage on paper, so I would be curious to know whether any performance comparisons were carried out.

UTTER - Unified Transcription and Translation for Extended Reality org

Hi! Thanks for your interest in our model! Unfortunately, I did not have time to investigate this. Their model was released at more or less the same time we were finishing our experiments. If you ever benchmark them, don't hesitate to share the results. :)
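For anyone wanting to run such a comparison, a minimal sketch of extracting frame-level features from both encoders with `transformers` might look like the following. The model IDs (`utter-project/mHuBERT-147`, `facebook/w2v-bert-2.0`) are the Hub repositories the discussion refers to, and the 1-second random waveform stands in for real 16 kHz audio; a proper benchmark would feed the features into a downstream task (e.g. SUPERB-style probes) rather than just inspect shapes.

```python
# Hedged sketch: compare feature extraction from mHuBERT-147 and w2v-BERT 2.0.
import torch
from transformers import AutoFeatureExtractor, AutoModel

def extract_features(model_id: str, audio, sampling_rate: int = 16_000):
    """Load a speech encoder from the Hub and return its last hidden states."""
    fe = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()
    # The feature extractor produces the right input key for each model
    # (input_values for HuBERT, input_features for w2v-BERT 2.0).
    inputs = fe(audio, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.last_hidden_state  # shape: (batch, frames, hidden_dim)

# 1 second of dummy 16 kHz audio; replace with a real waveform for benchmarking.
audio = torch.randn(16_000).numpy()
for model_id in ("utter-project/mHuBERT-147", "facebook/w2v-bert-2.0"):
    feats = extract_features(model_id, audio)
    print(model_id, tuple(feats.shape))
```

Note that the two models produce features at different frame rates and hidden sizes, so a fair comparison should use a shared downstream protocol rather than the raw representations directly.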
