Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability • Paper • 2506.01789 • Published 7 days ago • 13
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia • Paper • 2503.07920 • Published Mar 10 • 98
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling • Paper • 2410.09223 • Published Oct 11, 2024 • 5