Model Description
This model is a fine-tuned version of Wav2Vec2-BERT 2.0 for automatic speech recognition (ASR) in Kinyarwanda. It was trained on the Kinyarwanda ASR Track A dataset covering Health, Government, Finance, Education, and Agriculture domains.
- Developed by: badrex
- Model type: Speech Recognition (ASR)
- Language: Kinyarwanda (rw)
- License: MIT
- Finetuned from: facebook/w2v-bert-2.0
Model Sources
- Repository: https://huggingface.co/badrex/w2v-bert-2.0-kinyarwanda-asr
- Dataset: Kinyarwanda ASR Track A
Direct Use
The model can be used directly for automatic speech recognition of Kinyarwanda audio:
from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC
import torch
import torchaudio

# load model and processor
processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")
model.eval()

# load audio (expects mono speech)
audio_input, sample_rate = torchaudio.load("path/to/audio.wav")

# resample to 16 kHz, the rate the feature extractor expects
if sample_rate != 16000:
    audio_input = torchaudio.functional.resample(audio_input, sample_rate, 16000)
    sample_rate = 16000

# preprocess
inputs = processor(audio_input.squeeze(), sampling_rate=sample_rate, return_tensors="pt")

# inference
with torch.no_grad():
    logits = model(**inputs).logits

# decode with greedy (argmax) CTC decoding
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
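Alternatively, the Transformers ASR pipeline wraps feature extraction, inference, and CTC decoding in a single call. The snippet below is a minimal sketch; the audio path is a placeholder for your own recording:

from transformers import pipeline

# the pipeline decodes the file and resamples it to 16 kHz internally
asr = pipeline(
    "automatic-speech-recognition",
    model="badrex/w2v-bert-2.0-kinyarwanda-asr",
)

result = asr("path/to/audio.wav")  # placeholder path
print(result["text"])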
Downstream Use
This model can be used as a foundation for:
- building voice assistants for Kinyarwanda speakers
- transcription services for Kinyarwanda content (see the batch sketch after this list)
- accessibility tools for Kinyarwanda-speaking communities
- research in low-resource speech recognition
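As a sketch of the transcription-service use case, the greedy decoding path from Direct Use extends naturally to a folder of recordings. The recordings/ directory below is a hypothetical example, not part of this repository:

from pathlib import Path

import torch
import torchaudio
from transformers import Wav2Vec2BertProcessor, Wav2Vec2BertForCTC

processor = Wav2Vec2BertProcessor.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")
model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")
model.eval()

# "recordings/" is a hypothetical folder of mono WAV files
for wav_path in sorted(Path("recordings").glob("*.wav")):
    audio, sr = torchaudio.load(wav_path)
    if sr != 16000:
        audio = torchaudio.functional.resample(audio, sr, 16000)
    inputs = processor(audio.squeeze(), sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)
    print(wav_path.name, processor.batch_decode(ids)[0])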
Out-of-Scope Use
- transcribing languages other than Kinyarwanda
- real-time applications without proper latency testing
- high-stakes applications without domain-specific validation
Bias, Risks, and Limitations
- Domain bias: primarily trained on formal speech from specific domains (Health, Government, Finance, Education, Agriculture)
- Accent variation: may not perform well on dialects or accents not represented in training data
- Audio quality: performance may degrade on noisy or low-quality audio
- Technical terms: may struggle with specialized vocabulary outside training domains
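Given these limitations, it is worth measuring word error rate (WER) on a held-out sample of your own audio before deployment. The sketch below uses the evaluate library with toy reference/prediction pairs standing in for real transcripts:

import evaluate

# hypothetical reference transcripts and model outputs for a small in-domain sample
references = ["muraho neza", "amakuru yawe"]
predictions = ["muraho neza", "amakuru yacu"]

wer = evaluate.load("wer")
score = wer.compute(predictions=predictions, references=references)
print(f"WER: {score:.2%}")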
Training Data
The model was fine-tuned on the Kinyarwanda ASR Track A dataset:
- Size: ~500 hours of transcribed Kinyarwanda speech
- Domains: Health, Government, Finance, Education, Agriculture
- Source: Digital Umuganda (funded by the Gates Foundation)
- License: CC BY 4.0
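The audio and transcripts are distributed through the Kaggle competition linked in the citation below. As a rough sketch of how locally downloaded files could be loaded for evaluation or further fine-tuning, assuming a hypothetical manifest CSV with audio_path and transcription columns (not the competition's actual schema):

import pandas as pd
from datasets import Dataset, Audio

# hypothetical manifest with columns "audio_path" and "transcription"
manifest = pd.read_csv("train_manifest.csv")

ds = Dataset.from_pandas(manifest)
ds = ds.rename_column("audio_path", "audio")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))  # decode + resample on access

print(ds[0]["audio"]["array"].shape, ds[0]["transcription"])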
Model Architecture
- Base model: Wav2Vec2-BERT 2.0
- Architecture: Conformer-based encoder over log-mel filterbank features with a CTC head
- Parameters: ~600M (inherited from base model)
- Objective: connectionist temporal classification (CTC)
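A quick sanity check of the parameter count and the size of the CTC output layer (the character vocabulary plus the CTC blank) after loading the checkpoint:

from transformers import Wav2Vec2BertForCTC

model = Wav2Vec2BertForCTC.from_pretrained("badrex/w2v-bert-2.0-kinyarwanda-asr")

# total parameters, reported in millions
n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.0f}M")

# size of the CTC output layer, taken from the model config
print(f"CTC vocabulary size: {model.config.vocab_size}")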
Compute Infrastructure
Citation
@misc{w2v_bert_kinyarwanda_asr,
  author    = {Badr M. Abdullah},
  title     = {Adapting Wav2Vec2-BERT 2.0 for Kinyarwanda ASR},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/badrex/w2v-bert-2.0-kinyarwanda-asr}
}

@misc{kinyarwanda_asr_track_a,
  author    = {Digital Umuganda},
  title     = {Kinyarwanda Automatic Speech Recognition Track A},
  year      = {2025},
  url       = {https://www.kaggle.com/competitions/kinyarwanda-automatic-speech-recognition-track-a}
}
Model Card Contact
For questions or issues, please open a discussion on the Hugging Face model repository.