TalTechNLP/xls-r-300m-et

@@ -1,93 +1,93 @@
----
-license: cc-by-4.0
-tags:
-- audio
-- automatic-speech-recognition
-- hf-asr-leaderboard
-language: et
-model-index:
-- name: xls-r-300m-et
-  results:
-  - task:
-      name: Automatic Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: Common Voice
-      type: common_voice
-      args: et
-    metrics:
-    - name: Test WER
-      type: wer
-      value: 0.12520395591222402
-    - name: Test CER
-      type: cer
-      value: 0.027091152438624897
-  - task:
-      name: Automatic Speech Recognition
-      type: automatic-speech-recognition
-    dataset:
-      name: Common Voice 8
-      type: mozilla-foundation/common_voice_8_0
-      args: et
-    metrics:
-    - name: Test WER
-      type: wer
-      value: 0.1338447882323104
-    - name: Test CER
-      type: cer
-      value: 0.029816686199500255
----
-# XLS-R-300m-ET
-This is a XLS-R-300M model [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) finetuned on around 800 hours of diverse Estonian data.
-## Model description
-This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists of only the CTC-based end-to-end model, no language model is currently provided.
-## Intended uses & limitations
-This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
-## How to use
-TODO
-#### Limitations and bias
-Since this model was trained on mostly broadcast speech and texts from the web, it might have problems correctly decoding the following:
-  * Speech containing technical and other domain-specific terms
-  * Children's speech
-  * Non-native speech
-  * Speech recorded under very noisy conditions or with a microphone far from the speaker
-  * Very spontaneous and overlapping speech
-## Training data
-Acoustic training data:
-| Type                  | Amount (h) |
-|-----------------------|:------:|
-| Broadcast speech      |   591  |
-| Spontaneous speech    |   53   |
-| Elderly speech corpus |   53   |
-| Talks, lectures       |   49   |
-| Parliament speeches   |   31   |
-| *Total*               |   *761*  |
-## Training procedure
-Finetuned using Fairseq.
-## Evaluation results
-### WER
-|Dataset | WER |
-|---|---|
-| jutusaated.devset | 7.9 |
-| jutusaated.testset | 6.1 |
-| Common Voice 6.1 | 12.5 |
-| Common Voice 8.0 | 13.4 |

+---
+license: cc-by-4.0
+tags:
+- audio
+- automatic-speech-recognition
+- hf-asr-leaderboard
+language: et
+model-index:
+- name: xls-r-300m-et
+  results:
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice
+      type: common_voice
+      args: et
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 12.520395591222402
+    - name: Test CER
+      type: cer
+      value: 2.7091152438624897
+  - task:
+      name: Automatic Speech Recognition
+      type: automatic-speech-recognition
+    dataset:
+      name: Common Voice 8
+      type: mozilla-foundation/common_voice_8_0
+      args: et
+    metrics:
+    - name: Test WER
+      type: wer
+      value: 13.38447882323104
+    - name: Test CER
+      type: cer
+      value: 2.9816686199500255
+---
+# XLS-R-300m-ET
+This is a XLS-R-300M model [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) finetuned on around 800 hours of diverse Estonian data.
+## Model description
+This is a general-purpose Estonian ASR model trained in the Lab of Language Technology at TalTech. It consists of only the CTC-based end-to-end model, no language model is currently provided.
+## Intended uses & limitations
+This model is intended for general-purpose speech recognition, such as broadcast conversations, interviews, talks, etc.
+## How to use
+TODO
+#### Limitations and bias
+Since this model was trained on mostly broadcast speech and texts from the web, it might have problems correctly decoding the following:
+  * Speech containing technical and other domain-specific terms
+  * Children's speech
+  * Non-native speech
+  * Speech recorded under very noisy conditions or with a microphone far from the speaker
+  * Very spontaneous and overlapping speech
+## Training data
+Acoustic training data:
+| Type                  | Amount (h) |
+|-----------------------|:------:|
+| Broadcast speech      |   591  |
+| Spontaneous speech    |   53   |
+| Elderly speech corpus |   53   |
+| Talks, lectures       |   49   |
+| Parliament speeches   |   31   |
+| *Total*               |   *761*  |
+## Training procedure
+Finetuned using Fairseq.
+## Evaluation results
+### WER
+|Dataset | WER |
+|---|---|
+| jutusaated.devset | 7.9 |
+| jutusaated.testset | 6.1 |
+| Common Voice 6.1 | 12.5 |
+| Common Voice 8.0 | 13.4 |