The NaijaVoices Dataset: Cultivating Large-Scale, High-Quality, Culturally-Rich Speech Data for African Languages • Paper • 2505.20564 • Published May 26
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language • Paper • 2506.20920 • Published 14 days ago
MT Quality Estimation • Collection • Models for reference-free quality estimation of machine translation • 10 items • Updated Jan 29
Domain-Specific Translation with Open-Source Large Language Models: Resource-Oriented Analysis • Paper • 2412.05862 • Published Dec 8, 2024
Fine-Tune W2V2-Bert for low-resource ASR with 🤗 Transformers • Article • By ylacombe • Jan 19, 2024
Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers • Article • By patrickvonplaten • Nov 15, 2021
Fine-tuning MMS Adapter Models for Multi-Lingual ASR • Article • By patrickvonplaten • Jun 19, 2023
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus • Paper • 2207.03546 • Published Jul 7, 2022
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference • Paper • 2412.13663 • Published Dec 18, 2024
Open Whisper-style Speech Models (OWSM) • Collection • Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 21 items • Updated Jun 3
CommonCrawl • Collection • Large web-mined general corpus based on CommonCrawl • 8 items • Updated Apr 13
AfriCOMET • Collection • COMET evaluation models for African languages • 6 items • Updated Oct 1, 2024
MobileLLM • Collection • Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024), https://arxiv.org/abs/2402.14905 • 40 items • Updated 16 days ago
Optimized ONNX models for NVIDIA RTX GPUs • Collection • Optimized ONNX model checkpoints for NVIDIA RTX GPUs • 7 items • Updated 2 days ago
Spaces for Model / Space / useful Utilities in Hugging Face • Collection • 297 items • Updated 7 days ago
MaLA corpus • Collection • MaLA Corpus for Massive Language Adaptation of Large Language Models, https://mala-lm.github.io • 18 items • Updated about 1 month ago