GetmanY1/wav2vec2-large-uralic-voxpopuli-v2-1500h

@@ -6,7 +6,7 @@ tags:
 - fi
 - finnish
 model-index:
-  - name: wav2vec2-base-fi-voxpopuli-v2-1500h
     results:
       - task:
           name: Automatic Speech Recognition
@@ -18,25 +18,25 @@ model-index:
         metrics:
           - name: Dev WER
             type: wer
-            value: 22.18
           - name: Dev CER
             type: cer
-            value: 5.96
           - name: Test WER
             type: wer
-            value: 24.43
           - name: Test CER
             type: cer
-            value: 6.97
 ---
-# Colloquial Finnish Wav2vec2-Base ASR
-[facebook/wav2vec2-base-fi-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-base-fi-voxpopuli-v2) fine-tuned on 1500 hours of [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3) on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
 ## Model description
-The Finnish Wav2Vec2 Base has the same architecture and uses the same training objective as the English and multilingual one described in [Paper](https://arxiv.org/abs/2006.11477). It is pre-trained on 2600 hours of unlabeled colloquial Finnish speech from [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3).
 You can read more about the pre-trained model in [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/colloquial-Finnish-wav2vec2).
@@ -54,8 +54,8 @@ from datasets import load_dataset
 import torch
 # load model and processor
-processor = Wav2Vec2Processor.from_pretrained("wav2vec2-base-fi-voxpopuli-v2-1500h")
-model = Wav2Vec2ForCTC.from_pretrained("wav2vec2-base-fi-voxpopuli-v2-1500h")
 # load the Common Voice 16.1 Finnish test split and read sound files
 ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split='test')

 - fi
 - finnish
 model-index:
+  - name: wav2vec2-large-uralic-voxpopuli-v2-1500h
     results:
       - task:
           name: Automatic Speech Recognition
         metrics:
           - name: Dev WER
             type: wer
+            value: 19.14
           - name: Dev CER
             type: cer
+            value: 5.05
           - name: Test WER
             type: wer
+            value: 20.49
           - name: Test CER
             type: cer
+            value: 5.93
 ---
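The metrics in the `model-index` above are word error rate (WER) and character error rate (CER). As a rough guide to what those numbers measure, here is a minimal pure-Python sketch that computes both as Levenshtein edit distance divided by reference length, summed over a corpus. It assumes whitespace tokenization for words and no text normalization; the card does not state which normalization or aggregation produced the reported scores, and the helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (or match if equal)
            )
        prev = curr
    return prev[-1]


def corpus_error_rate(references, hypotheses, unit="word"):
    """Total edits / total reference units; unit is "word" (WER) or "char" (CER)."""
    # Hypothetical helper: words are split on whitespace, characters include spaces.
    split = str.split if unit == "word" else list
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_units, hyp_units = split(ref), split(hyp)
        edits += edit_distance(ref_units, hyp_units)
        total += len(ref_units)
    return edits / total


refs = ["minä olen kotona"]
hyps = ["mä oon kotona"]
print(f"WER: {100 * corpus_error_rate(refs, hyps, 'word'):.2f}")  # 2 substitutions / 3 words
print(f"CER: {100 * corpus_error_rate(refs, hyps, 'char'):.2f}")
```

Aggregating edits over the whole corpus (rather than averaging per-utterance rates) keeps long utterances from being under-weighted, which is the usual convention for reporting ASR scores.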
+# Colloquial Finnish Wav2vec2-Large ASR
+[facebook/wav2vec2-large-uralic-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-large-uralic-voxpopuli-v2) fine-tuned on 1500 hours of 16 kHz speech from [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3). When using the model, make sure that your speech input is also sampled at 16 kHz.
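The card asks for 16 kHz input, so recordings at other rates need resampling first. The sketch below shows the rate conversion with naive linear interpolation in pure Python. It applies no anti-aliasing filter, so it only illustrates the idea; for real audio a filtered resampler such as `torchaudio.functional.resample` is the safer choice. The function name `resample_linear` is hypothetical.

```python
from typing import List, Sequence

TARGET_SR = 16_000  # sampling rate the model card asks for


def resample_linear(samples: Sequence[float], orig_sr: int, target_sr: int = TARGET_SR) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples.

    Naive sketch: no low-pass filtering, so downsampling can alias.
    """
    if orig_sr == target_sr or len(samples) == 0:
        return list(samples)
    n_out = round(len(samples) * target_sr / orig_sr)
    out = []
    for k in range(n_out):
        pos = k * orig_sr / target_sr  # output sample k, measured in input-sample units
        i = int(pos)
        frac = pos - i
        if i + 1 < len(samples):
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        else:
            out.append(float(samples[-1]))  # hold the last sample at the edge
    return out


audio_8k = [0.0, 1.0, 2.0, 3.0]
print(resample_linear(audio_8k, orig_sr=8_000))  # 8 samples at 16 kHz
```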
 ## Model description
+The Finnish Wav2Vec2 Large model has the same architecture and uses the same training objective as the English and multilingual models described in [the wav2vec 2.0 paper](https://arxiv.org/abs/2006.11477). It is pre-trained on 2600 hours of unlabeled colloquial Finnish speech from [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3).
 You can read more about the pre-trained model in [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/colloquial-Finnish-wav2vec2).
 import torch
 # load model and processor
+processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-large-uralic-voxpopuli-v2-1500h")
+model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-large-uralic-voxpopuli-v2-1500h")
 # load the Common Voice 16.1 Finnish test split and read sound files
 ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split='test')
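A common way to decode `Wav2Vec2ForCTC` output is to take the best token id per audio frame and let the processor turn those ids into text. The sketch below approximates that greedy CTC collapse in pure Python: merge consecutive repeats, drop the blank token, and turn the word delimiter into a space. The toy vocabulary, the blank id 0 and the `|` delimiter are assumptions modelled on typical `Wav2Vec2CTCTokenizer` vocabularies, not this model's actual vocabulary, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def argmax_ids(logits: Sequence[Sequence[float]]) -> List[int]:
    """Best-scoring token id for each frame (like torch.argmax(logits, dim=-1))."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in logits]


def ctc_greedy_decode(frame_ids: Sequence[int], id_to_token: Dict[int, str],
                      blank_id: int = 0, word_delimiter: str = "|") -> str:
    # 1. Collapse runs of the same id: [m, m, <blank>, ä] -> [m, <blank>, ä]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal ids keeps them as separate letters
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Map the word delimiter to a space
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; the real one ships with the model's processor.
vocab = {0: "<pad>", 1: "|", 2: "m", 3: "ä", 4: "o", 5: "n"}
frames = [2, 2, 0, 3, 1, 1, 4, 0, 4, 5, 5]
print(ctc_greedy_decode(frames, vocab))  # mä oon
```

The blank between the two `4` frames is what lets CTC emit the double vowel in "oon"; without it, the repeats would be merged into a single "o".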