---
license: apache-2.0
datasets:
- KBLab/rixvox
language:
- sv
---
# Whisper Tiny RixVox Swedish

This is a [Whisper tiny](https://huggingface.co/openai/whisper-tiny) model finetuned for Swedish on
the [RixVox](https://huggingface.co/datasets/KBLab/rixvox) dataset.
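
The model can be used through the `transformers` pipeline API. A minimal sketch; the repo id and the audio file name below are placeholders, not confirmed paths:

```python
from transformers import pipeline

# Placeholder repo id; substitute the actual model path on the Hub.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="KBLab/whisper-tiny-rixvox-swedish",
)
print(transcriber("speech.wav")["text"])
```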

Please note that this model, like every other encoder-decoder speech-to-text model, is prone to
hallucinating on unexpected inputs and may treat the task as translation rather than transcription.
That is, your mileage may vary depending on how the data is filtered and what kind of audio it contains.
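
One way to reduce the drift into translation is to pin the decoder prompt to Swedish transcription. A sketch using the standard `transformers` Whisper API (the repo id is again a placeholder):

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Placeholder repo id. Forcing language and task keeps generation in
# Swedish transcription mode rather than translation.
processor = WhisperProcessor.from_pretrained("KBLab/whisper-tiny-rixvox-swedish")
model = WhisperForConditionalGeneration.from_pretrained("KBLab/whisper-tiny-rixvox-swedish")
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="swedish", task="transcribe"
)
```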

In this release the entire encoder was frozen. Subsequent releases will not freeze the encoder,
**if** generalization to other types of data (i.e. non-parliamentary speech) is preserved when the
encoder is trained as well.
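
Freezing the encoder amounts to disabling gradients for its parameters; `transformers` exposes this directly on the Whisper model. A sketch:

```python
from transformers import WhisperForConditionalGeneration

# Load the base checkpoint and freeze the encoder: freeze_encoder() sets
# requires_grad = False on all encoder parameters, so only the decoder
# is updated during finetuning.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
model.freeze_encoder()
```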

## Evaluation

<!--
* Common Voice 11 WER: 17.18
* Common Voice 11 WER (normalized*): 12.24
-->
* Fleurs WER: 51.68
* Fleurs WER (normalized*): 48.09

*) Normalization is done by applying the following function to both the reference and the generated texts:

```python
from re import sub

def normalize(s):
    # Lowercase, replace anything that is not a letter (incl. å/ä/ö),
    # digit or space with a space, then collapse runs of whitespace.
    return ' '.join(sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower()).split())
```
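
For illustration, a normalized WER can then be computed with the `evaluate` library (an assumption about the tooling; the card does not state how the scores were produced):

```python
import evaluate

wer_metric = evaluate.load("wer")

references = ["Det här är ett exempel."]
predictions = ["det här är ett exempel"]

# Normalizing both sides removes case and punctuation differences,
# which is what the "normalized" scores above refer to.
score = wer_metric.compute(
    predictions=[normalize(p) for p in predictions],
    references=[normalize(r) for r in references],
)
print(f"WER: {score:.2f}")  # 0.00 for this toy pair
```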

## Training

Training was done using Hugging Face Transformers and DeepSpeed with ZeRO stage 2; the main settings are listed below, with a sketch of the corresponding training arguments after the list.

* learning rate: 1e-5
* optimizer: CPUAdamW (DeepSpeed)
* LR scheduler: linear
* warmup steps: 500
* per-device batch size: 32
* GPUs: 8 x NVIDIA A100 40GB
* total batch size: 160
* steps: 10000
* lowercase: no
* precision: fp16
* encoder: entirely frozen
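
A sketch of these settings expressed as Hugging Face `Seq2SeqTrainingArguments`; the output directory and the DeepSpeed config filename are placeholders:

```python
from transformers import Seq2SeqTrainingArguments

# Placeholder paths; the DeepSpeed JSON would carry the ZeRO stage 2
# settings and select the CPUAdamW optimizer.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-rixvox-sv",
    per_device_train_batch_size=32,
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10000,
    fp16=True,
    deepspeed="ds_config_zero2.json",
)
```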