Optimizing Multilingual Text-To-Speech with Accents & Emotions Paper • 2506.16310 • Published 8 days ago • 22 • 8
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10 • 99 • 4
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 36 • 4
MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance Paper • 2406.19680 • Published Jun 28, 2024 • 1 • 1
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation Paper • 2408.07547 • Published Aug 14, 2024 • 8 • 3
BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation Paper • 2407.17952 • Published Jul 25, 2024 • 33 • 7
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data Paper • 2402.08093 • Published Feb 12, 2024 • 62 • 9