Update src/streamlit_app.py

src/streamlit_app.py  CHANGED  (+24 -8)
@@ -217,15 +217,32 @@ with tab1:
             except pyperclip.PyperclipException:
                 st.error("Could not copy automatically. Please copy manually.")
 
-with tab6:
+with tab6:
     st.header("Results: WER vs Dataset Size")
 
+    # --- Introduction ---
+    st.subheader("Introduction")
     st.write("""
-
-
-
+    Automatic Speech Recognition (ASR) for African languages remains challenging due to the scarcity of labeled data and limited methodological guidance for low-resource settings. While interest in multilingual and low-resource ASR is growing, there is still limited understanding of how different pretrained models perform across diverse African languages, data sizes, and decoding strategies.
+
+    In this study, we benchmark four state-of-the-art ASR models, Wav2Vec2 XLS-R, Whisper, MMS, and W2V-BERT, across 17 African languages representing East, West, and Southern Africa. These include Luganda, Swahili, Kinyarwanda, Wolof, Akan, Ewe, Xhosa, Lingala, Amharic, Bambara, Bemba, Zulu, Igbo, Shona, Afrikaans, Hausa, and Fula. Our findings contribute empirical insights into model robustness and data efficiency in low-resource scenarios.
+
     """)
 
+    # --- Datasets ---
+    st.subheader("Datasets")
+    st.write("""
+    We trained each ASR model on 1, 5, 10, 20, 50, 100, 200, and 400-hour splits, based on the labelled data available per language. For Wav2Vec2-XLS-R and W2V-BERT, we also trained 5-gram language models on available textual data to assess the impact of language model integration.
+
+    """)
+
+    # --- Results ---
+    st.subheader("Results")
+    st.write("""
+    Overall, the Word Error Rate (WER) decreases as the number of training hours increases across all models and
+    languages. This highlights the importance of dataset size in improving ASR performance, although the rate of
+    improvement varies significantly between models.
+    """)
     # XLS-R
     st.subheader("XLS-R")
     st.write("""
@@ -259,9 +276,8 @@ with tab6:
     st.image("src/Images/mmslog.png", caption="Log WER vs Training Hours for MMS")
 
     # Overall Insight
-    st.subheader("
+    st.subheader("Takeaways")
     st.write("""
-
-
-    - Dataset size remains a critical factor, but its impact plateaus once the model is trained on sufficient data.
+    Model performance generally improves with more training data, but gains become smaller after 100 hours for some languages. Language models are more effective when training data is limited, especially below 50 hours, but their impact diminishes as data increases, with some variation across languages.
+
     """)
