ASR Africa Benchmark Dataset

The main objective of this study is to develop an evidence base for the amount of speech data required to build a good automatic speech recognition (ASR) model across priority “low-resource” African languages and key domain areas. This is done by developing ASR models for African languages, evaluating their performance, and building benchmark speech corpora for these languages. The African languages under discussion are Luganda, Kinyarwanda, Lingala, Swahili, Amharic, Oromo, Yoruba, Hausa, Igbo, Wolof, Fula, Ewe, Zulu, Xhosa, Afrikaans, Bemba and Shona.

Benchmark Dataset Characteristics

| Dataset | Domain | Speech Type | Languages | License | URL | Version | Date of Publication |
|---|---|---|---|---|---|---|---|
| Common Voice | Generic | Read | Swahili, Luganda, Kinyarwanda | MPL-2.0 | link | V19 | 09/18/2024 |
| FLEURS | Generic | Read | Wolof, Swahili, Luganda, Lingala | CC-BY-4.0 | link | V0 | 05/25/2022 |
| Naija Voices | Generic | Read | Igbo, Yoruba, Hausa | CC-BY-NC-SA-4.0 | link | V0 | 05/06/2024 |
| BIG-C | Generic | Conversational | Bemba | CC-BY-NC-ND-4.0 | link | V0 | 05/26/2023 |
| NCHLT | Generic | Read | Zulu, Xhosa, Afrikaans | CC-BY-3.0 | link | V0 | 02/06/2018 |
| ALFFA | Generic | Read | Swahili, Wolof | MIT | link | V0 | 04/14/2015 |
| GRIOTS | Generic | Conversational | Bambara | CC-BY-4.0 | link | V2.0 | 07/11/2023 |
| AfriVoice | Generic | Spontaneous | Lingala, Shona | CC-BY-4.0 | link | V1.1.0 | 03/26/2024 |
| Kallaama | Agriculture | Spontaneous | Wolof | CC-BY-4.0 | link | V0 | 29/03/2024 |
| Yogera | Generic | Descriptive | Luganda | CC-BY-SA-4.0 | link | V4.0.1 | 08/13/2024 |
| Asheshi Financial | Finance | Spontaneous | Akan | CC-BY-4.0 | link | V0 | 24/06/2024 |
| Lingala Read Speech Corpus | Generic | Read | Lingala | CC-BY-4.0 | link | V1 | 22/09/2023 |
| Amharic ASR Dataset | Generic | Mixed | Amharic | CC-BY-4.0 | link | V0 | 08/01/2024 |
| BembaSpeech ASR Corpus | Generic | Read | Bemba | CC-BY-NC-4.0 | link | V0 | 06/20/2022 |
| AMMI | Generic | Read | Swahili, Lingala, Bemba | MIT | link | V0 | 07/06/2020 |
| Waxal dataset | General | Spontaneous | Akan, Ewe | CC-BY-SA-4.0 | link | V1.3 | 27/07/2020 |
| EthioSpeech | General | Read | Amharic, Oromo | ELRA END USER | link | V1.0 | 21/03/2025 |
| Sagalee | General | Read | Oromo | CC-BY-NC-4.0 | link | V0 | 28/11/2024 |
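
When choosing benchmarks for a given language, it can help to query the table programmatically. The following is a minimal sketch, not part of the original benchmark tooling: the names `BENCHMARKS`, `is_noncommercial`, `datasets_for` and `coverage` are hypothetical, the rows are copied from the table above with license strings written in hyphenated form, and the non-commercial check only looks for an `NC` element in the license identifier (it does not interpret the ELRA end-user license or any other terms).

```python
from collections import defaultdict

# (dataset, languages, license) copied from the table above.
BENCHMARKS = [
    ("Common Voice", ["Swahili", "Luganda", "Kinyarwanda"], "MPL-2.0"),
    ("FLEURS", ["Wolof", "Swahili", "Luganda", "Lingala"], "CC-BY-4.0"),
    ("Naija Voices", ["Igbo", "Yoruba", "Hausa"], "CC-BY-NC-SA-4.0"),
    ("BIG-C", ["Bemba"], "CC-BY-NC-ND-4.0"),
    ("NCHLT", ["Zulu", "Xhosa", "Afrikaans"], "CC-BY-3.0"),
    ("ALFFA", ["Swahili", "Wolof"], "MIT"),
    ("GRIOTS", ["Bambara"], "CC-BY-4.0"),
    ("AfriVoice", ["Lingala", "Shona"], "CC-BY-4.0"),
    ("Kallaama", ["Wolof"], "CC-BY-4.0"),
    ("Yogera", ["Luganda"], "CC-BY-SA-4.0"),
    ("Asheshi Financial", ["Akan"], "CC-BY-4.0"),
    ("Lingala Read Speech Corpus", ["Lingala"], "CC-BY-4.0"),
    ("Amharic ASR Dataset", ["Amharic"], "CC-BY-4.0"),
    ("BembaSpeech ASR Corpus", ["Bemba"], "CC-BY-NC-4.0"),
    ("AMMI", ["Swahili", "Lingala", "Bemba"], "MIT"),
    ("Waxal dataset", ["Akan", "Ewe"], "CC-BY-SA-4.0"),
    ("EthioSpeech", ["Amharic", "Oromo"], "ELRA END USER"),
    ("Sagalee", ["Oromo"], "CC-BY-NC-4.0"),
]


def is_noncommercial(license_id):
    """Return True if the license identifier contains an 'NC' element.

    Spaces are treated like hyphens so 'CC BY-NC 4.0' is also recognised.
    """
    tokens = license_id.upper().replace(" ", "-").split("-")
    return "NC" in tokens


def datasets_for(language, allow_noncommercial=True):
    """List datasets covering `language`, in table order."""
    return [
        name
        for name, languages, license_id in BENCHMARKS
        if language in languages
        and (allow_noncommercial or not is_noncommercial(license_id))
    ]


def coverage():
    """Map each language to the datasets that include it."""
    by_language = defaultdict(list)
    for name, languages, _ in BENCHMARKS:
        for language in languages:
            by_language[language].append(name)
    return dict(by_language)


if __name__ == "__main__":
    print(datasets_for("Lingala"))
    print(datasets_for("Bemba", allow_noncommercial=False))
```

Calling `coverage()` gives a per-language view of the same table, which makes it easy to see which languages are served by only a single corpus.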