MoCha: Towards Movie-Grade Talking Character Synthesis • Paper • 2503.23307 • Published 17 days ago • 120
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes • Paper • 2503.23461 • Published 16 days ago • 93
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors • Paper • 2504.01016 • Published 14 days ago • 28
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback • Paper • 2405.20216 • Published May 30, 2024 • 20
Articulated Kinematics Distillation from Video Diffusion Models • Paper • 2504.01204 • Published 14 days ago • 23
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step • Paper • 2504.01956 • Published 13 days ago • 38
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction • Paper • 2504.01014 • Published 14 days ago • 59
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization • Paper • 2504.00999 • Published 14 days ago • 78
SkyReels-A2: Compose Anything in Video Diffusion Transformers • Paper • 2504.02436 • Published 12 days ago • 35
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models • Paper • 2504.03641 • Published 11 days ago • 13
SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published 8 days ago • 158
Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence • Paper • 2503.20533 • Published 20 days ago • 11
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance • Paper • 2504.06232 • Published 7 days ago • 11