F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9 • 40
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published Oct 6 • 27
MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion Paper • 2410.03825 • Published Oct 4 • 17
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Paper • 2410.01036 • Published Oct 1 • 14
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published Oct 2 • 30
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 134
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18 • 36
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey Paper • 2409.11564 • Published Sep 17 • 19
The VoxCeleb Speaker Recognition Challenge: A Retrospective Paper • 2408.14886 • Published Aug 27 • 8
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published Jul 11 • 14
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4 • 30
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions Paper • 2402.17485 • Published Feb 27 • 188
ChatMusician: Understanding and Generating Music Intrinsically with LLM Paper • 2402.16153 • Published Feb 25 • 56