Update app.py

app.py CHANGED
@@ -16,20 +16,49 @@ leaderboard_data = [
 # Text for the metrics tab
 METRICS_TAB_TEXT = """
 ## Metrics
-Here you will find details about the speech recognition metrics and datasets reported in our leaderboard.
-### UTMOS
-The **UTMOS** (Utterance Mean Opinion Score) metric evaluates the **quality** of speech synthesis models. A higher UTMOS score indicates better audio quality.
-
-The **Short-Time Objective Intelligibility (STOI)** is a metric used to evaluate the **intelligibility** of synthesized speech. Higher STOI values indicate clearer, more intelligible speech.
-
-###
-
+Models in the leaderboard are evaluated using several key metrics:
+* **UTMOS** (UTokyo-SaruLab Mean Opinion Score),
+* **WER** (Word Error Rate),
+* **STOI** (Short-Time Objective Intelligibility),
+* **PESQ** (Perceptual Evaluation of Speech Quality).
+
+These metrics help evaluate both the accuracy and quality of the models, as well as their inference speed.
+
+### UTMOS (UTokyo-SaruLab Mean Opinion Score)
+UTMOS is an automatic metric that predicts the mean opinion score a human listener would assign to speech generated by a TTS system. **A higher UTMOS indicates better perceived quality** of the generated voice.
+
+### WER (Word Error Rate)
+WER is the standard metric for evaluating speech recognition systems. It counts the substitutions (S), insertions (I), and deletions (D) needed to turn the generated transcript into the reference (correct) transcript, divided by the number of words in the reference (N). **A lower WER indicates higher accuracy**.
+
+Example:
+| Reference  | the | cat | sat     | on | the | mat |
+|------------|-----|-----|---------|----|-----|-----|
+| Prediction | the | cat | **sit** | on | the |     |
+| Label      | ✅  | ✅  | S       | ✅ | ✅  | D   |
+
+The WER is then calculated as follows:
+
+```
+WER = (S + I + D) / N = (1 + 0 + 1) / 6 = 0.333
+```
+
+### STOI (Short-Time Objective Intelligibility)
+STOI measures the intelligibility of the synthesized speech signal compared to the original signal. **A higher STOI indicates better intelligibility**.
+
+### PESQ (Perceptual Evaluation of Speech Quality)
+PESQ is a perceptual metric that evaluates speech quality much as a human listener would. **A higher PESQ indicates better voice quality**.
+
+## How to Reproduce Our Results
+The leaderboard is an ongoing effort to benchmark open-source TTS models on the metrics above. To reproduce these results, check our [GitHub repository](https://github.com/huggingface/open_asr_leaderboard).
+
+## Benchmark Datasets
+Model performance is evaluated on our test datasets, which cover a variety of domains and acoustic conditions to ensure a robust evaluation.
 """
 
+
+
 ####################################
 # Functions (static version)
 ####################################
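The WER figure in the added text can be checked with a short pure-Python sketch. The `wer` helper below is illustrative only (it is not part of app.py); it computes the word-level Levenshtein distance, which jointly counts S, I, and D:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + I + D) / N via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn hyp[:j] into ref[:i]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# The table's example: one substitution (sat -> sit), one deletion (mat)
print(round(wer("the cat sat on the mat", "the cat sit on the"), 3))  # 0.333
```

This matches the `(1 + 0 + 1) / 6 = 0.333` worked example; in practice a library such as `jiwer` is typically used instead.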