Update app.py

app.py CHANGED
@@ -16,20 +16,49 @@ leaderboard_data = [
 # Text for the metrics tab
 METRICS_TAB_TEXT = """
 ## Metrics
-Here you will find details about the speech recognition metrics and datasets reported in our leaderboard.
-### UTMOS
-The **UTMOS** (Utterance Mean Opinion Score) metric evaluates the **quality** of speech synthesis models. A higher UTMOS score indicates better audio quality.
-
-The **Short-Time Objective Intelligibility (STOI)** is a metric used to evaluate the **intelligibility** of synthesized speech. Higher STOI values indicate clearer, more intelligible speech.
-
-###
-
+Models in the leaderboard are evaluated using several key metrics:
+* **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
+* **WER** (Word Error Rate),
+* **STOI** (Short-Time Objective Intelligibility),
+* **PESQ** (Perceptual Evaluation of Speech Quality).
+
+These metrics help evaluate both the accuracy and quality of the models, as well as their inference speed.
+
+### UTMOS (UTokyo-SaruLab Mean Opinion Score)
+UTMOS is an automatic metric that predicts the mean opinion score a human listener would assign to speech generated by a TTS system. **A higher UTMOS indicates better perceived quality** of the generated voice.
+
+### WER (Word Error Rate)
+WER is the standard metric for evaluating speech recognition systems. It counts the substitutions (S), insertions (I), and deletions (D) needed to turn the generated transcript into the reference (correct) transcript, divided by the number of words in the reference (N). **A lower WER indicates higher accuracy**.
+
+Example:
+| Reference  | the | cat | sat     | on | the | mat |
+|------------|-----|-----|---------|----|-----|-----|
+| Prediction | the | cat | **sit** | on | the |     |
+| Label      | ✅  | ✅  | S       | ✅ | ✅  | D   |
+
+The WER is then calculated as follows:
+
+```
+WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
+```
+
+### STOI (Short-Time Objective Intelligibility)
+STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
+
+### PESQ (Perceptual Evaluation of Speech Quality)
+PESQ is a perceptual metric that evaluates speech quality much as a human listener would. **A higher PESQ indicates better voice quality**.
+
+## How to Reproduce Our Results
+The leaderboard is an ongoing effort to benchmark open-source TTS models on the metrics above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
+
+## Benchmark Datasets
+Model performance is evaluated on our test datasets, which cover a variety of domains and acoustic conditions to ensure a robust evaluation.
 """
 
+
+
 ####################################
 # Functions (static version)
 ####################################
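The WER figure in the added text can be checked with a short pure-Python sketch. The `wer` helper below is illustrative only (it is not part of app.py); it computes the word-level Levenshtein distance, which jointly counts S, I, and D:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + I + D) / N via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn hyp[:j] into ref[:i]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# The table's example: one substitution (sat -> sit), one deletion (mat)
print(round(wer("the cat sat on the mat", "the cat sit on the"), 3))  # 0.333
```

This matches the `(1 + 0 + 1) / 6 = 0.333` worked example; in practice a library such as `jiwer` is typically used instead.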