Are Multilingual Models the Best Choice for Moderately Under-resourced Languages? A Comprehensive Assessment for Catalan Paper • 2107.07903 • Published Jul 16, 2021
Spanish Biomedical Crawled Corpus: A Large, Diverse Dataset for Spanish Biomedical Language Models Paper • 2109.07765 • Published Sep 16, 2021
A New Massive Multilingual Dataset for High-Performance Language Technologies Paper • 2403.14009 • Published Mar 20, 2024
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022
An Expanded Massive Multilingual Dataset for High-Performance Language Technologies Paper • 2503.10267 • Published Mar 2025