Nigerian English ASR - Whisper Small

NaijaEnglish-ASR-v1.0 is an automatic speech recognition (ASR) model designed for Nigerian-accented English, fine-tuned from the Whisper Small architecture. The model is powered by Awarri Technologies and is an initiative of the Federal Ministry of Communications, Innovation and Digital Economy. It aims to bridge the accent gap in speech recognition technology and promote digital inclusion for Nigerian English speakers.

Model Description

Model Name: NaijaEnglish-ASR-v1.0
Architecture: Whisper Small (244M parameters)
Language: Nigerian Accented English (en-NG)
License: Other (see the Terms of Use for NigerianAccentedEnglish below)
Model Size: ~244M parameters

Quick Start

Installation

pip install torch torchaudio transformers librosa

Basic Usage

from transformers import pipeline
import librosa

# Initialize the ASR pipeline
asr = pipeline("automatic-speech-recognition", model="NCAIR1/NigerianAccentedEnglish")

# Load the audio file, resampled to 16 kHz (the rate Whisper's feature extractor expects)
audio, sr = librosa.load("your_nigerian_english_audio.wav", sr=16000)

# Transcribe
result = asr(audio)
print(result["text"])

Advanced Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

# Load model and processor
processor = WhisperProcessor.from_pretrained("NCAIR1/NigerianAccentedEnglish")
model = WhisperForConditionalGeneration.from_pretrained("NCAIR1/NigerianAccentedEnglish")

# Process audio
audio, sr = librosa.load("nigerian_english_audio.wav", sr=16000)
input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(input_features)

# Decode the predicted token IDs to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])

Use Cases

✅ Perfect For

This model suits applications that serve Nigerian speakers, whose English follows local accents and usage patterns:
Call Centers and Customer Service – Improve support operations by handling local English effectively.
Educational Technology – Enable better learning tools tailored for students.
Media and Broadcasting – Provide accurate transcription of audio and video content.
Government Services – Support public-facing applications that require reliable English language processing.
Accessibility Applications – Assist hearing-impaired users through accurate voice-to-text solutions.
Academic Research – Facilitate linguistic studies on English usage and variation.
Voice-Enabled Applications – Build applications that recognize and respond to local voices and accents.
Business Applications – Power enterprise solutions in English-speaking contexts.

❌ Not Recommended

Mass surveillance or unauthorized monitoring
High-stakes applications without human oversight
Applications that could perpetuate accent discrimination
Contexts where accent bias could cause harm

Key Features

🎯 Accent-Inclusive Design

Trained specifically on Nigerian English speech patterns
Recognizes regional variations across Nigeria's 6 geopolitical zones
Handles Nigerian expressions and linguistic patterns

🌍 Cultural Awareness

Understands Nigerian English conventions
Respects linguistic diversity within Nigerian English
Promotes inclusive speech recognition technology

⚡ High Performance

Significantly better accuracy on Nigerian-accented speech than general-purpose English ASR models
Optimized for real-world Nigerian speech patterns

Limitations

Regional Variations: Accuracy may vary across specific regional accents
Code-Switching: Reduced performance when speakers mix English with local Nigerian languages
Audio Quality: Performance depends on clear audio input
Domain-Specific Content: May require fine-tuning for specialized fields
Non-Nigerian Accents: Optimized specifically for Nigerian English, so accuracy on other accents may be lower
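
One way to see how these limitations affect your own data is to compute the word error rate (WER) separately for each subgroup, for example per region or per recording condition. The sketch below is a minimal, dependency-free WER function. The name `word_error_rate`, the lowercase and whitespace normalization, and the sample sentences are assumptions for illustration; they are not part of this model's published evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference/hypothesis pair, not real model output
reference = "please send the money to my account"
hypothesis = "please send money to my account"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Averaging this value over a held-out set for each accent group or noise condition gives a rough picture of where additional data or fine-tuning may be needed.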

Model Details

Technical Specifications

Architecture: Transformer-based (Whisper Small)
Parameters: 244M
Input: Audio waveform (16kHz recommended)
Output: Nigerian English text transcription
Context Length: 30 seconds maximum per inference
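
Because a single inference covers at most 30 seconds, longer recordings have to be split before transcription. The sketch below computes overlapping window boundaries in samples. The function name `chunk_bounds` and the 5-second overlap are assumptions chosen for illustration, not settings recommended by the model authors.

```python
def chunk_bounds(n_samples, sample_rate=16000, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) sample indices of windows no longer than chunk_s."""
    if not 0 <= overlap_s < chunk_s:
        raise ValueError("overlap_s must be non-negative and smaller than chunk_s")
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break
        start += step
    return bounds


# A 70-second recording at 16 kHz is split into three overlapping windows
for start, end in chunk_bounds(70 * 16000):
    print(f"{start / 16000:.0f}s to {end / 16000:.0f}s")
```

Each window can then be transcribed as `audio[start:end]`, with the repeated words in the overlapping regions merged afterwards. The Transformers ASR pipeline also accepts a `chunk_length_s` argument that handles chunking internally, which may be simpler in practice.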

Training Details

Base Model: OpenAI Whisper Small
Training Duration: 120 hours
Data Collection Platform: Langeasy
Data Sources:
- Langeasy platform recordings from speakers across Nigeria's 6 geopolitical zones
- Publicly available Nigerian English datasets
Geographic Coverage: All 6 geopolitical zones of Nigeria
Accent Diversity: Multiple regional Nigerian English variations

Fine-tuning for Your Domain

Enhance performance for specific Nigerian English applications:

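from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the Nigerian English model and its processor as the starting checkpoint
processor = WhisperProcessor.from_pretrained("NCAIR1/NigerianAccentedEnglish")
model = WhisperForConditionalGeneration.from_pretrained("NCAIR1/NigerianAccentedEnglish")

# Fine-tune with your domain-specific Nigerian English data, for example with
# transformers.Seq2SeqTrainer and a dataset of (audio, transcript) pairs.
# Recommended: 10-20 hours of high-quality domain audio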
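
Before fine-tuning, it can help to check how much usable audio you have against the 10-20 hour recommendation and to flag clips longer than the 30-second input window. The following is a rough sketch only. The manifest format (a list of `(path, duration_seconds)` tuples), the function name `summarize_manifest`, and the example numbers are all hypothetical.

```python
MAX_CLIP_SECONDS = 30.0           # Whisper's input window, per the specs above
RECOMMENDED_HOURS = (10.0, 20.0)  # range suggested in this model card


def summarize_manifest(manifest):
    """Split clips into usable/too-long and total the usable audio in hours.

    `manifest` is a hypothetical list of (path, duration_seconds) tuples.
    """
    usable = [(p, d) for p, d in manifest if 0 < d <= MAX_CLIP_SECONDS]
    too_long = [(p, d) for p, d in manifest if d > MAX_CLIP_SECONDS]
    hours = sum(d for _, d in usable) / 3600
    return {
        "usable_clips": len(usable),
        "too_long_clips": len(too_long),
        "usable_hours": hours,
        "meets_recommended_minimum": hours >= RECOMMENDED_HOURS[0],
    }


# Made-up example: 1,500 clips of 25 s plus one 45 s clip that needs splitting
manifest = [(f"clip_{i}.wav", 25.0) for i in range(1500)] + [("long.wav", 45.0)]
print(summarize_manifest(manifest))
```

Clips reported as too long would need to be split or trimmed before they are passed to the processor.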

Impact & Applications

This model addresses a critical gap in speech recognition technology by providing:

Digital Inclusion for 200+ million Nigerian English speakers
Bias Reduction in voice-enabled applications
Cultural Preservation of Nigerian English linguistic patterns
Economic Opportunities through accessible speech technology

Ethical Considerations

Designed to combat accent bias in AI systems
Promotes equitable access to speech technology
Respects Nigerian English as a legitimate language variety
Should not be used for surveillance or discriminatory purposes

Citation

@misc{awarri2025nigerian,
  title={NaijaEnglish-ASR-v1.0: Accent-Inclusive Speech Recognition for Nigerian English},
  author={Awarri Technologies},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/NCAIR1/NigerianAccentedEnglish}
}

Contact & Support

Initiative Of: The Federal Ministry of Communications, Innovation and Digital Economy
Powered By: Awarri Technologies
Project: N-ATLaS
Version: 1.0 (September 2025)

For issues, questions, or collaboration opportunities, please refer to the model repository discussions or contact Awarri Technologies.

Acknowledgments

This work was made possible through:

Awarri Technologies
National Information Technology Development Agency (NITDA)
The Federal Ministry of Communications, Innovation and Digital Economy
National Centre for Artificial Intelligence and Robotics (NCAIR)
Data contributors from across Nigeria's 6 geopolitical zones via the Langeasy platform
The broader Nigerian language technology research community

Breaking barriers in speech recognition. Celebrating Nigerian English. Advancing digital inclusion.

Related Models

AWARRITech/whisper_small_yoruba - Yoruba ASR Model
More Nigerian language models coming soon!

Terms of Use for NigerianAccentedEnglish

Effective Date: September 2025
Version: 1.0

1. Introduction & Scope

Awarri Technologies, in partnership with the Federal Government of Nigeria, hereby releases NigerianAccentedEnglish, an Automatic Speech Recognition (ASR) model for Nigerian-accented English.

NigerianAccentedEnglish is released under an Open-Source Research and Innovation License inspired by permissive licenses such as Apache 2.0 and MIT, but with additional restrictions tailored for responsible use in Nigeria and globally.

The model is intended to support:

Research and academic study
Education and capacity development
Civic technology and accessibility initiatives
Linguistic and cultural preservation, and community projects

⚠️ NigerianAccentedEnglish is not an enterprise-grade or commercial system. Commercial or large-scale enterprise use requires a separate licensing agreement (see Section 3).

2. License Grant

Subject to compliance with these Terms, users are granted a worldwide, royalty-free, non-exclusive, non-transferable license to:

Download, use, and run NigerianAccentedEnglish for permitted purposes
Modify, adapt, and create derivative works of NigerianAccentedEnglish
Redistribute NigerianAccentedEnglish and derivative works under these same Terms

Conditions:

Attribution must be given to: “Awarri Technologies and the Federal Government of Nigeria, developers of N-ATLaS (NigerianAccentedEnglish).”
Derivative works must be released under the same license, ensuring consistency and traceability.
If NigerianAccentedEnglish or its derivatives are renamed, they must carry the suffix: “Powered by Awarri.”

3. User License Cap (1000 Users)

Use of NigerianAccentedEnglish is limited to organizations, institutions, or projects with no more than 1000 active end-users.

An active end-user is an individual who directly interacts with the model outputs (e.g., via an app, website, or integrated service) within a rolling 30-day period.
Organizations exceeding the 1000-user cap must obtain a commercial license directly from Awarri Technologies in partnership with the Federal Ministry of Communications, Innovation, and Digital Economy.
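
The rolling 30-day definition above can be checked against usage logs. The sketch below counts distinct users with at least one interaction in the 30 days ending at a given time. The event format (`(user_id, timestamp)` pairs), the function name `active_end_users`, and the sample log are hypothetical, and this is only one possible reading of the definition rather than an official compliance tool.

```python
from datetime import datetime, timedelta

USER_CAP = 1000
WINDOW = timedelta(days=30)


def active_end_users(events, as_of):
    """Count distinct users with an interaction in the 30 days up to `as_of`.

    `events` is a hypothetical iterable of (user_id, timestamp) pairs.
    """
    window_start = as_of - WINDOW
    return len({user for user, ts in events if window_start < ts <= as_of})


# Made-up interaction log
as_of = datetime(2025, 9, 30)
events = [
    ("user_a", datetime(2025, 9, 29)),
    ("user_a", datetime(2025, 9, 10)),  # same user, counted once
    ("user_b", datetime(2025, 9, 1)),
    ("user_c", datetime(2025, 8, 15)),  # outside the 30-day window
]
count = active_end_users(events, as_of)
print(count, "active end-users; within cap:", count <= USER_CAP)
```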

4. Acceptable Use

✅ Permitted Use Cases include (but are not limited to):

Academic and non-profit research
Accessibility for persons with disabilities
Language and cultural preservation projects
Civic technology and public benefit applications
Education, training, and community innovation

❌ Prohibited Use Cases include (but are not limited to):

Surveillance or unlawful monitoring
Discriminatory profiling or exclusionary practices
Disinformation, impersonation, or synthetic fraud
Military, intelligence, or weaponized deployment
Exploitative, harmful, or unlawful applications

5. Limitations & Disclaimer

NigerianAccentedEnglish is released “as-is”, without warranties of any kind, express or implied.

Known limitations include:

Dialectal/spoken accent variation may affect performance
Reduced accuracy with children’s speech
Limited handling of code-switching or mixing English with local languages
Degraded performance in very noisy or low-quality audio environments

Neither Awarri Technologies nor the Federal Government of Nigeria shall be liable for damages arising from the use of NigerianAccentedEnglish.

6. Ethical & Cultural Considerations

Users must:

Respect Nigeria’s cultural and linguistic diversity
Ensure transparent reporting of accuracy, bias, and limitations
Uphold human rights and privacy standards in all deployments

7. Data & Privacy

All training data used in NigerianAccentedEnglish was either publicly available or government-approved for use.
Users are strictly prohibited from using the model for unauthorized personal data scraping, collection, or profiling.

8. Governance & Updates

Governance and oversight are led by the Federal Ministry of Communications, Innovation, and Digital Economy, in collaboration with the National Centre for Artificial Intelligence & Robotics (NCAIR).
Awarri Technologies shall act as the technical maintainer and custodian of NigerianAccentedEnglish.
Updates, improvements, and community contributions will be published periodically.
Users must comply with the specific Terms attached to each version release.

9. Legal & Jurisdiction

These Terms are governed by the laws of the Federal Republic of Nigeria.
In the event of a dispute, parties agree to seek resolution first through mediation under the auspices of the Federal Ministry of Justice, before pursuing litigation in Nigerian courts.

10. Termination

The Federal Government of Nigeria and Awarri Technologies reserve the right to revoke, suspend, or terminate usage rights if these Terms are violated.

Termination may apply to individual users, institutions, or organizations found in breach.

11. Contact & Attribution

For licensing, inquiries, and commercial partnerships regarding NigerianAccentedEnglish, contact:

Awarri Technologies

Email: [email protected]
Website: awarri.com

Federal Ministry of Communications, Innovation, and Digital Economy

Email: [email protected]
Website: NCAIR

Required attribution in all public use:

“NigerianAccentedEnglish is powered by Awarri Technologies and an initiative of the Federal Ministry of Communications, Innovation and Digital Economy.”

If renamed, the model must carry the suffix:

“Powered by Awarri.”

Keywords: Nigerian English, Accent Recognition, Speech-to-Text, West African English, Inclusive AI, Digital Inclusion
