SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published 1 day ago • 14
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models Paper • 2501.02955 • Published 12 days ago • 40
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 15 days ago • 40
SDPO: Segment-Level Direct Preference Optimization for Social Agents Paper • 2501.01821 • Published 15 days ago • 18
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation Paper • 2410.23090 • Published Oct 30, 2024 • 54
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9, 2024 • 43