Beijuka committed (verified)
Commit 9788c5f · Parent(s): 1dff7ff

Upload folder using huggingface_hub

Files changed (1):
  src/streamlit_app.py (+53 -23)

src/streamlit_app.py CHANGED
@@ -21,7 +21,7 @@ tab1, tab2, tab3, tab4, tab5, tab6, tab7 = st.tabs([
     "Model Collections",
     "Evaluation Scenarios",
     "ASR models demo",
-    "Results",
+    "Quantitative Results",
     "Human Evaluation of ASR Models"
 ])
 
@@ -194,32 +194,62 @@ with tab1:
     We will build a test set that can be used for benchmarking ASR models in some of the 30 most spoken African languages. The benchmark dataset will be structured to consist of unique MP3 files and corresponding text files. We will ensure as much as possible that the benchmark datasets are as diverse as possible with dataset characteristics like gender, age, accent, variant, vocabulary, acoustic characteristics to help improve the accuracy of speech recognition models. The speech benchmark dataset will be reviewed, deemed highly quality, and split into dev, test and train sets. Due to the largely acoustic nature of African languages (mostly tonal, diacritical, etc.), a careful speech analysis of African languages is necessary and the benchmark dataset is important to spur more research in the African context.
 
     """)
-    # Citation
-    CITATION_TEXT = """@misc{asr-africa-2025,
-        title = {Automatic Speech Recognition for African Languages},
-        author = {Dr Joyce Nakatumba-Nabende, Dr Peter Nabende, Dr Andrew Katumba, Alvin Nahabwe},
-        year = 2025,
-        publisher = {Hugging Face},
-        howpublished = "\\url{https://huggingface.co/spaces/asr-africa/Automatic_Speech_Recognition_for_African_Languages}"
-    }"""
+    # # Citation
+    # CITATION_TEXT = """@misc{asr-africa-2025,
+    #     title = {Automatic Speech Recognition for African Languages},
+    #     author = {Dr Joyce Nakatumba-Nabende, Dr Peter Nabende, Dr Andrew Katumba, Alvin Nahabwe},
+    #     year = 2025,
+    #     publisher = {Hugging Face},
+    #     howpublished = "\\url{https://huggingface.co/spaces/asr-africa/Automatic_Speech_Recognition_for_African_Languages}"
+    # }"""
 
-    with st.expander("📙 Citation", expanded=False):
-        st.text_area(
-            "BibTeX snippet to cite this source",
-            value=CITATION_TEXT,
-            height=150,
-            disabled=True
-        )
+    # with st.expander("📙 Citation", expanded=False):
+    #     st.text_area(
+    #         "BibTeX snippet to cite this source",
+    #         value=CITATION_TEXT,
+    #         height=150,
+    #         disabled=True
+    #     )
 
-    if st.button("📋 Copy to Clipboard"):
-        try:
-            pyperclip.copy(CITATION_TEXT)
-            st.success("Citation copied to clipboard!")
-        except pyperclip.PyperclipException:
-            st.error("Could not copy automatically. Please copy manually.")
+    # if st.button("📋 Copy to Clipboard"):
+    #     try:
+    #         pyperclip.copy(CITATION_TEXT)
+    #         st.success("Citation copied to clipboard!")
+    #     except pyperclip.PyperclipException:
+    #         st.error("Could not copy automatically. Please copy manually.")
+
+    # --- Platform preview for About tab ---
+    st.markdown("""
+    ## Platform overview
+
+    A preview of what the platform contains and how to navigate. Use the links and tabs in the top navigation to jump to demos, datasets, results, or evaluation details.
+
+    1. **Benchmark Datasets:**
+    A multilingual collection covering over **17 African languages**, built from open corpora (e.g., Common Voice, Fleurs, NCHLT, ALFFA, Naija Voices).
+    Each dataset is cleaned, validated, and partitioned into training, development, and test splits to ensure fair benchmarking.
+
+    2. **Model Collections:**
+    Fine-tuned ASR models derived from **Wav2Vec2 XLS-R**, **Whisper**, **MMS**, and **W2V-BERT**, adapted for African phonetic, tonal, and orthographic features.
+    These are hosted as public collections on [Hugging Face](https://huggingface.co/asr-africa).
+
+    3. **Evaluation Scenarios:**
+    Designed to test **data efficiency**, **domain adaptation**, and **speech-type robustness** — e.g., how models generalize from read speech to spontaneous dialogue,
+    or from education to agricultural domains.
+
+    4. **ASR Demo Interface:**
+    A **Gradio-powered live testing tool**, allowing users to upload or record audio, view transcriptions, and submit structured feedback via the integrated backend API.
+
+    5. **Quantitative Results:**
+    Comprehensive analysis of model performance across training hours and data scales (1–400 hours), visualized through **Word Error Rate (WER)** and **Character Error Rate (CER)** trends.
+    Findings show clear **data scaling laws**, with XLS-R and W2V-BERT models performing best under low-resource conditions.
 
+    6. **Human Evaluation Framework:**
+    A structured qualitative evaluation conducted with **20 native-language evaluators** across 12 languages.
+    Evaluators assessed **accuracy**, **meaning preservation**, **orthography**, and **error types** (e.g., named entities, punctuation, diacritics).
+    This data is publicly available in the curated [ASR_Evaluation_dataset](https://huggingface.co/datasets/asr-africa/ASR_Evaluation_dataset).
+    """)
 with tab6:
-    st.header("Results: WER vs Dataset Size")
+    st.header("Quantitative Results: WER vs Dataset Size")
 
     # --- Introduction ---
     st.subheader("Introduction")
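
A note on the commented-out clipboard handler: pyperclip manipulates the clipboard of the machine running the script, so in a deployed Space it would act on the server rather than the visitor's browser, which is the usual reason this pattern gets dropped. If the citation block is ever restored, a minimal sketch of a browser-side alternative is `st.code`, which renders its own copy button; this is an illustration, not part of the commit:

```python
import streamlit as st

# Sketch only (not from this commit): st.code provides a client-side
# copy button, so no server-side clipboard access (pyperclip) is needed.
CITATION_TEXT = """@misc{asr-africa-2025,
  title = {Automatic Speech Recognition for African Languages},
  author = {Joyce Nakatumba-Nabende and Peter Nabende and Andrew Katumba and Alvin Nahabwe},
  year = 2025,
  publisher = {Hugging Face},
  howpublished = "\\\\url{https://huggingface.co/spaces/asr-africa/Automatic_Speech_Recognition_for_African_Languages}"
}"""

with st.expander("📙 Citation", expanded=False):
    st.code(CITATION_TEXT, language="bibtex")
```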
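The renamed "Quantitative Results" tab reports Word Error Rate (WER) and Character Error Rate (CER) against training-data size; both are edit distances normalized by reference length. A minimal sketch of how such scores are typically computed, assuming the `jiwer` library (the commit does not show the app's actual metric code):

```python
# pip install jiwer
import jiwer

reference = "omuntu ayogera oluganda"   # ground-truth transcript (invented example)
hypothesis = "omuntu ayogela oluganda"  # ASR output with one substituted word

# WER = (substitutions + deletions + insertions) / number of reference words
print(f"WER: {jiwer.wer(reference, hypothesis):.2f}")  # 1 wrong word of 3 -> 0.33

# CER applies the same edit distance at the character level
print(f"CER: {jiwer.cer(reference, hypothesis):.2f}")
```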
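The overview also links the model collections at huggingface.co/asr-africa and the ASR_Evaluation_dataset. Both are standard Hub artifacts, so they load with the usual `transformers` and `datasets` calls; the model ID below is a placeholder, since the commit names collections rather than a specific checkpoint:

```python
from datasets import load_dataset
from transformers import pipeline

# Placeholder ID: substitute any checkpoint from the asr-africa collections.
asr = pipeline("automatic-speech-recognition", model="asr-africa/PLACEHOLDER_MODEL_ID")
result = asr("sample.wav")  # path to a local audio file
print(result["text"])

# Human-evaluation data referenced in item 6 of the overview.
eval_data = load_dataset("asr-africa/ASR_Evaluation_dataset")
print(eval_data)
```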