Biomed-Enriched: A Biomedical Dataset Enriched with LLMs for Pretraining and Extracting Rare and Hidden Content Paper • 2506.20331 • Published 2 days ago • 3
Optimizing Multilingual Text-To-Speech with Accents & Emotions Paper • 2506.16310 • Published 8 days ago • 22 • 8
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 99 • 4
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 36 • 4
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks Paper • 2210.14712 • Published Oct 26, 2022
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published Sep 25, 2024 • 119
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance Paper • 2406.19680 • Published Jun 28, 2024 • 1 • 1
AescF/hubert-base-ls960-finetuned-common_language Audio Classification • Updated Sep 26, 2023 • 35 • 1
jpbello/Hubert_emotion-finetuned-common_language Audio Classification • Updated Sep 26, 2023 • 29 • 1