Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ tags:
 - fi
 - finnish
 model-index:
-- name: wav2vec2-
+- name: wav2vec2-large-uralic-voxpopuli-v2-1500h
 results:
 - task:
 name: Automatic Speech Recognition
@@ -18,25 +18,25 @@ model-index:
 metrics:
 - name: Dev WER
 type: wer
-value:
+value: 19.14
 - name: Dev CER
 type: cer
-value: 5.
+value: 5.05
 - name: Test WER
 type: wer
-value:
+value: 20.49
 - name: Test CER
 type: cer
-value:
+value: 5.93
 ---
-# Colloquial Finnish Wav2vec2-
+# Colloquial Finnish Wav2vec2-Large ASR
 
-[facebook/wav2vec2-
+[facebook/wav2vec2-large-uralic-voxpopuli-v2](https://huggingface.co/facebook/wav2vec2-large-uralic-voxpopuli-v2) fine-tuned on 1500 hours of [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3) on 16 kHz sampled speech audio. When using the model, make sure that your speech input is also sampled at 16 kHz.
 
 
 ## Model description
 
-The Finnish Wav2Vec2
+The Finnish Wav2Vec2 Large has the same architecture and uses the same training objective as the English and multilingual ones described in [Paper](https://arxiv.org/abs/2006.11477). It is pre-trained on 2600 hours of unlabeled colloquial Finnish speech from [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3).
 
 You can read more about the pre-trained model from [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/colloquial-Finnish-wav2vec2)
 
@@ -54,8 +54,8 @@ from datasets import load_dataset
 import torch
 
 # load model and processor
-processor = Wav2Vec2Processor.from_pretrained("wav2vec2-
-model = Wav2Vec2ForCTC.from_pretrained("wav2vec2-
+processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-large-uralic-voxpopuli-v2-1500h")
+model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-large-uralic-voxpopuli-v2-1500h")
 
 # load dummy dataset and read soundfiles
 ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split='test')
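
The `wer` and `cer` entries in the model-index metadata are edit-distance error rates reported as percentages. For readers unfamiliar with them, here is a minimal pure-Python sketch of how such values are commonly computed. It is not the evaluation script used for this model: the function names `edit_distance`, `wer`, and `cer` are hypothetical, no text normalization is applied, and the choice to count spaces as characters in CER is an assumption (implementations differ on this).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ref item missing from hyp
                curr[j - 1] + 1,     # insertion: extra item in hyp
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent, relative to the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not taken from any evaluation set.
    ref = "mä oon menossa kotiin"
    hyp = "mä oon menos kotiin"
    print(f"WER: {wer(ref, hyp):.2f}")
    print(f"CER: {cer(ref, hyp):.2f}")
```

In this illustration one of four reference words is wrong, so the WER is 25%, while only two of 21 reference characters are missing, so the CER is much lower. The same pattern shows up in the reported numbers, where the CER values sit well below the WER values.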