rjzevallos committed on
Commit 047ab5b · verified · 1 Parent(s): ee5c425

Update app.py

Files changed (1):
  1. app.py +38 -9
app.py CHANGED
@@ -16,20 +16,49 @@ leaderboard_data = [
 # Text for the metrics tab
 METRICS_TAB_TEXT = """
 ## Metrics
- Here you will find details about the speech recognition metrics and datasets reported in our leaderboard.
- ### UTMOS
- The **UTMOS** (Utterance Mean Opinion Score) metric evaluates the **quality** of speech synthesis models. A higher UTMOS score indicates better audio quality.
- ### WER
- The **Word Error Rate (WER)** measures the **accuracy** of automatic speech recognition systems. It calculates the percentage of words in the system's output that differ from the reference transcript. Lower WER values indicate higher accuracy.
- ### STOI
- The **Short-Time Objective Intelligibility (STOI)** is a metric used to evaluate the **intelligibility** of synthesized speech. Higher STOI values indicate clearer, more intelligible speech.
- ### PESQ
- The **Perceptual Evaluation of Speech Quality (PESQ)** is a metric used to measure the **quality** of speech signals, considering human perception. Higher PESQ values indicate better speech quality.
+ Models in the leaderboard are evaluated using several key metrics:
+ * **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
+ * **WER** (Word Error Rate),
+ * **STOI** (Short-Time Objective Intelligibility),
+ * **PESQ** (Perceptual Evaluation of Speech Quality).
+
+ Together, these metrics assess both the accuracy and the perceived quality of each model's synthesized speech.
+
+ ### UTMOS (UTokyo-SaruLab Mean Opinion Score)
+ UTMOS is an automatic estimate of the mean opinion score a human listener would assign to speech generated by a TTS system. **A higher UTMOS indicates better quality** of the generated voice.
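+
+ A minimal sketch of computing such a score with the community SpeechMOS predictor via torch.hub (an assumption for illustration; the exact predictor used by this leaderboard may differ, and "sample.wav" is a hypothetical file):
+
+ ```python
+ # pip install torch librosa
+ import torch
+ import librosa
+
+ # Load a synthesized utterance ("sample.wav" is hypothetical)
+ wave, sr = librosa.load("sample.wav", sr=None, mono=True)
+
+ # UTMOS "strong" checkpoint from the VoiceMOS Challenge 2022 system
+ predictor = torch.hub.load("tarepan/SpeechMOS:v1.2.0", "utmos22_strong", trust_repo=True)
+
+ # Predicts a MOS-like score on a roughly 1-5 scale; higher is better
+ score = predictor(torch.from_numpy(wave).unsqueeze(0), sr)
+ print(score.item())
+ ```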
+
+ ### WER (Word Error Rate)
+ WER is a common metric for evaluating speech recognition output. It measures the fraction of words that differ between the generated transcript and the reference (correct) transcript. **A lower WER value indicates higher accuracy**.
+
+ Example:
+
+ | Reference  | the | cat | sat     | on  | the | mat |
+ |------------|-----|-----|---------|-----|-----|-----|
+ | Prediction | the | cat | **sit** | on  | the |     |
+ | Label      | ✅  | ✅  | S       | ✅  | ✅  | D   |
+
+ The WER calculation is done as follows, where S is the number of substitutions, I the number of insertions, D the number of deletions, and N the number of words in the reference:
+
+ ```
+ WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
+ ```
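+
+ The same number can be reproduced with the `jiwer` package, as in the sketch below (an assumption for illustration; not necessarily what this leaderboard uses internally):
+
+ ```python
+ # pip install jiwer
+ import jiwer
+
+ reference = "the cat sat on the mat"
+ hypothesis = "the cat sit on the"
+
+ # jiwer aligns the two word sequences and returns (S + I + D) / N
+ print(jiwer.wer(reference, hypothesis))  # 0.3333...
+ ```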
+
+ ### STOI (Short-Time Objective Intelligibility)
+ STOI measures the intelligibility of the synthesized speech signal compared to the original (reference) signal. **A higher STOI indicates better intelligibility**.
+
+ ### PESQ (Perceptual Evaluation of Speech Quality)
+ PESQ is a perceptual metric that evaluates the quality of a speech signal much as a human listener would. **A higher PESQ indicates better voice quality**.
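+
+ Both scores can be computed with the `pystoi` and `pesq` packages, as in the sketch below (an assumption for illustration; the file names are hypothetical, and both signals must share the same length and sample rate):
+
+ ```python
+ # pip install pystoi pesq soundfile
+ import soundfile as sf
+ from pystoi import stoi
+ from pesq import pesq
+
+ # Reference (clean) and synthesized signals; hypothetical files
+ ref, fs = sf.read("reference.wav")
+ deg, _ = sf.read("synthesized.wav")
+
+ # STOI: roughly 0-1; higher means more intelligible
+ print("STOI:", stoi(ref, deg, fs, extended=False))
+
+ # PESQ wideband mode requires fs == 16000 Hz; scores roughly -0.5 to 4.5
+ print("PESQ:", pesq(fs, ref, deg, "wb"))
+ ```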
+
+ ## How to Reproduce Our Results
+ This leaderboard is an ongoing effort to benchmark open-source TTS models on the metrics described above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
+
+ ## Benchmark Datasets
+ Model performance is evaluated using our test datasets. These datasets cover a variety of domains and acoustic conditions, ensuring a robust evaluation.
 """
 
+
+
 ####################################
 # Functions (static version)
 ####################################