CoS: Chain-of-Shot Prompting for Long Video Understanding Paper • 2502.06428 • Published 12 days ago
Pippo: High-Resolution Multi-View Humans from a Single Image Paper • 2502.07785 • Published 10 days ago
VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation Paper • 2502.07531 • Published 11 days ago
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published 11 days ago
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling Paper • 2502.07737 • Published 11 days ago
DPO-Shift: Shifting the Distribution of Direct Preference Optimization Paper • 2502.07599 • Published 11 days ago
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance Paper • 2502.06145 • Published 12 days ago
CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation Paper • 2502.08639 • Published 9 days ago
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published 11 days ago
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published 8 days ago
Exploring the Potential of Encoder-free Architectures in 3D LMMs Paper • 2502.09620 • Published 8 days ago
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models Paper • 2502.06608 • Published 12 days ago
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published 8 days ago
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 7 days ago
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 8 days ago