Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models • Paper • 2504.08809 • Published Apr 9 • 1
Seedream 4.0: Toward Next-generation Multimodal Image Generation • Paper • 2509.20427 • Published 22 days ago • 72
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation • Paper • 2412.01316 • Published Dec 2, 2024 • 9
Centroid-centered Modeling for Efficient Vision Transformer Pre-training • Paper • 2303.04664 • Published Mar 8, 2023
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos • Paper • 2402.06119 • Published Feb 9, 2024 • 1
3D-VLA: A 3D Vision-Language-Action Generative World Model • Paper • 2403.09631 • Published Mar 14, 2024 • 10