🇰🇪 Model Card for `RareElf/swahili-wav2vec2-asr`

This model is a fine-tuned version of eddiegulay/wav2vec2-large-xlsr-mvc-swahili for automatic speech recognition (ASR) in Swahili. It has been trained using the Common Voice 11.0 Swahili dataset.

📋 Model Details

Model Description

This model uses the wav2vec2 architecture from Facebook AI, fine-tuned for Swahili ASR. It maps raw speech waveforms sampled at 16 kHz to text transcriptions and is trained with a CTC (Connectionist Temporal Classification) loss. It is intended for real-time transcription in voice-based Swahili applications.
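
To illustrate how a CTC model turns per-frame predictions into text, here is a minimal sketch of greedy CTC decoding: merge consecutive repeated ids, then drop the blank token. The toy vocabulary and the `blank_id=0` choice are assumptions for illustration only; the real vocabulary ships with the model's tokenizer, and in practice `processor.batch_decode` performs this step.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, id_to_char, blank_id=0):
    """Collapse per-frame argmax ids into text: merge repeats, then drop blanks."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return "".join(id_to_char[i] for i in merged if i != blank_id)


# Hypothetical toy vocabulary, not the model's actual one.
vocab = {0: "<pad>", 1: "h", 2: "a", 3: "b", 4: "r", 5: "i"}

# One id per audio frame; repeats and blanks are typical of CTC output.
frames = [1, 1, 0, 2, 2, 0, 3, 2, 2, 4, 4, 0, 5]
text = ctc_greedy_collapse(frames, vocab)
print(text)
```

A blank between two identical ids keeps them as separate characters, which is how CTC can emit doubled letters.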

Developed by: Kevin Obote / RareElf
Funded by: Internal research at Guild Code
Shared by: RareElf
Model type: Automatic Speech Recognition (ASR)
Language(s) (NLP): Swahili (sw)
License: Apache-2.0
Finetuned from model: eddiegulay/wav2vec2-large-xlsr-mvc-swahili

Model Sources [optional]

Repository: https://huggingface.co/RareElf/swahili-wav2vec2-asr
Paper [optional]: Coming soon
Demo [optional]: Coming soon on semasasa.ai

Uses

Direct Use

This model can be used for:

Transcribing Swahili audio for accessibility, journalism, documentation, education, etc.
Integration into chatbots or voice agents in Swahili.

Downstream Use [optional]

Can be integrated with translation and sentiment analysis pipelines.
Useful for fine-tuning on domain-specific Swahili data (e.g., education, healthcare, government).

Out-of-Scope Use

Not suitable for noisy, far-field, or multi-speaker environments without preprocessing.
Not recommended for use in legal, medical, or high-stakes domains without additional validation.

Bias, Risks, and Limitations

The model may underperform on underrepresented dialects of Swahili.
Accents, noisy recordings, and overlapping speech may impact accuracy.
Reflects the linguistic distribution of Common Voice contributors, which may not be representative of all Swahili speakers.

Recommendations

Preprocess noisy audio for best results.
Fine-tune further on targeted domain data for production use.
Provide user disclaimers about ASR limitations in live deployments.
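
The card does not specify what preprocessing to apply, so the following is only a sketch of two light-weight steps that are commonly used before ASR: trimming leading and trailing silence, and peak-normalizing the level. It is not a denoiser. The function names and the thresholds (`0.01`, `0.95`) are hypothetical choices, and the waveform is assumed to be a sequence of floats in the range [-1, 1].

```python
def peak_normalize(samples, target_peak=0.95):
    """Scale the waveform so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty input: nothing to scale
    return [s * target_peak / peak for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


# Tiny made-up waveform with quiet edges.
raw = [0.0, 0.001, 0.2, -0.4, 0.1, 0.002, 0.0]
clean = peak_normalize(trim_silence(raw))
print(len(clean))
```

For genuinely noisy or far-field recordings, a dedicated noise-reduction or voice-activity-detection step would likely be needed on top of this.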

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import torch
import librosa

model_id = "RareElf/swahili-wav2vec2-asr"

# Load the processor (feature extractor + tokenizer) and the CTC model.
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# librosa resamples to 16 kHz on load, the rate the model expects.
audio, _ = librosa.load("sample.wav", sr=16000)
inputs = processor(audio, return_tensors="pt", sampling_rate=16000).input_values

# Inference only, so gradients are not needed.
with torch.no_grad():
    logits = model(inputs).logits

# Greedy CTC decoding: pick the most likely token per frame, then decode to text.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)

Training Details

Training Data

Dataset: Common Voice 11.0 – Swahili subset

Training Procedure

Fine-tuned using the Trainer API from 🤗 Transformers.
Loss: CTC (Connectionist Temporal Classification)
Optimizer: AdamW
Precision: fp16 (mixed precision)

Preprocessing [optional]

Resampled audio to 16 kHz
Normalized text
Removed empty, corrupted, or misaligned samples
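
The exact normalization rules are not listed in this card. As a rough sketch, text normalization and empty-sample filtering for CTC training often look like the following. The rules (lowercasing, stripping punctuation except apostrophes) and the `min_samples` threshold are assumptions, not the settings used for this model.

```python
import re
import unicodedata


def normalize_transcript(text):
    """Lowercase, strip punctuation, and collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Keep letters, digits, apostrophes (as in "ng'ombe"), and whitespace.
    text = re.sub(r"[^\w\s']", " ", text)
    text = text.replace("_", " ")
    return re.sub(r"\s+", " ", text).strip()


def keep_sample(transcript, num_samples, min_samples=1600):
    """Reject samples with empty text or very short audio (hypothetical threshold)."""
    return bool(normalize_transcript(transcript)) and num_samples >= min_samples


print(normalize_transcript("Habari  yako, Rafiki?"))
```

Keeping the apostrophe matters for Swahili, where it distinguishes the velar nasal in words such as "ng'ombe".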

Training Hyperparameters

Training regime:
Epochs: 10
Batch Size: 16
Learning Rate: 3e-4
Warmup Steps: 500
Weight Decay: 0.01
Gradient Accumulation: 2
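
For readers who want to reproduce a similar setup, the values above can be collected into keyword arguments named after the corresponding `transformers.TrainingArguments` parameters. This sketch assumes that "Batch Size: 16" is the per-device batch size and that training ran on a single device; neither is stated in the card. It also shows the resulting effective batch size per optimizer step.

```python
# Keys follow transformers.TrainingArguments keyword names.
training_kwargs = {
    "num_train_epochs": 10,
    "per_device_train_batch_size": 16,  # assumed to be per device
    "gradient_accumulation_steps": 2,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
    "weight_decay": 0.01,
    "fp16": True,  # mixed precision, as listed under Training Procedure
}


def effective_batch_size(kwargs, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(training_kwargs))
```

With the `transformers` library installed, these could be passed as `TrainingArguments(output_dir="...", **training_kwargs)`, where the output directory is up to the user.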

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

Dataset: Held-out subset of Common Voice Swahili

Factors

[More Information Needed]

Metrics

WER (Word Error Rate)
BLEU (for translation use-case)
ROUGE (for paraphrase quality)
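
As a reference for how WER is defined, here is a minimal, dependency-free sketch: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration of the metric, not the evaluation script used for this model, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("habari ya asubuhi", "habari za asubuhi")
print(round(wer, 2))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.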

Results

Metric	Score
WER	0.33
BLEU	0.44
ROUGE	0.66

Note: Evaluation scores are being finalized with the full test set.

Summary

Model Examination [optional]

Visualized attention maps confirm the model learns phonetic and acoustic patterns relevant to Swahili.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: ***
Hours used: ~10
Cloud Provider: Google Cloud
Compute Region: ****
Carbon Emitted: ~X gCO2eq (estimation via mlco2)
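
Since the hardware type and compute region are not filled in, no emissions figure can be derived from this card. The sketch below only shows the general shape of such an estimate (energy used times grid carbon intensity), which is a simplification of what the calculator does. Every input except the roughly 10 hours is a hypothetical placeholder, including the function name and the optional `pue` (power usage effectiveness) factor.

```python
def estimate_emissions_g(power_watts, hours, carbon_intensity_g_per_kwh, pue=1.0):
    """Estimated emissions in gCO2eq: energy (kWh) x PUE x grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * pue * carbon_intensity_g_per_kwh


# Only hours=10 comes from this card; the other values are placeholders
# chosen for easy arithmetic, not measurements.
example = estimate_emissions_g(
    power_watts=250,
    hours=10,
    carbon_intensity_g_per_kwh=400,
)
print(example)
```

Replacing the placeholders with the actual device power draw and the carbon intensity of the compute region would give a usable estimate.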

Technical Specifications [optional]

Model Architecture and Objective

Architecture: Wav2Vec2 encoder (inherited from the wav2vec2-large-xlsr parent checkpoint) + CTC head
Objective: Predict character-level transcription from 16 kHz audio
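
To make the link between 16 kHz audio and the number of character predictions concrete, the sketch below computes how many output frames the convolutional feature encoder produces for a given number of samples. It assumes the default Wav2Vec2 feature-encoder layout (seven convolution layers with the kernel sizes and strides shown); this checkpoint's configuration has not been verified here.

```python
# Default Wav2Vec2 feature-encoder layout: (kernel size, stride) per layer.
# Assumed to apply to this checkpoint.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_output_frames(num_samples, layers=CONV_LAYERS):
    """Number of frames (CTC time steps) produced for num_samples input samples."""
    length = num_samples
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1
    return length


# Product of all strides: input samples advanced per output frame.
total_stride = 1
for _, stride in CONV_LAYERS:
    total_stride *= stride

frames_per_second = num_output_frames(16000)
print(total_stride, frames_per_second)
```

Under this assumption each frame advances by 320 samples, or 20 ms at 16 kHz, so one second of audio yields 49 character-level predictions before CTC collapsing.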

Compute Infrastructure

[More Information Needed]

Hardware

Personal computer: Lenovo ThinkPad T14 Gen 1 (32 GB RAM, 1 TB SSD)

Software

Python 3.10
PyTorch 2.x
Transformers 4.39.x
Datasets 2.x

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

ASR: Automatic Speech Recognition
WER: Word Error Rate
CTC: Connectionist Temporal Classification

More Information [optional]

Model Card Authors [optional]

Kevin Obote / RareElf / Guild Code Team

Model Card Contact

Email: [email protected]
GitHub: Kevin Obote
