MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 7 days ago • 37
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 6 days ago • 112
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills Paper • 2504.07079 • Published 8 days ago • 11
Scaling Laws for Native Multimodal Models Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 7 days ago • 21
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 15 days ago • 33
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published 19 days ago • 51
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 15 days ago • 35
Any2Caption:Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 17 days ago • 74
DreamActor-M1: Holistic, Expressive and Robust Human Image Animation with Hybrid Guidance Paper • 2504.01724 • Published 16 days ago • 61
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published 16 days ago • 28
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning Paper • 2503.21860 • Published 21 days ago • 4
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 19 days ago • 121
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 19 days ago • 93
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 68
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency Paper • 2503.20785 • Published 22 days ago • 21