WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments Paper • 2504.03886 • Published 11 days ago • 9
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Paper • 2504.06958 • Published 6 days ago • 9
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published 11 days ago • 8
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting Paper • 2504.05541 • Published 8 days ago • 14
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis Paper • 2504.04842 • Published 9 days ago • 29
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Paper • 2504.07083 • Published 6 days ago • 21
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 6 days ago • 66
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published 12 days ago • 67
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published 15 days ago • 4
HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration Paper • 2504.03536 • Published 11 days ago • 11
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 8 days ago • 158
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 13 days ago • 35
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published 13 days ago • 41
FreSca: Unveiling the Scaling Space in Diffusion Models Paper • 2504.02154 • Published 13 days ago • 17
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages Paper • 2503.23542 • Published 16 days ago • 10