Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published 3 days ago • 34
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published 3 days ago • 25
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 3 days ago • 68
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 10 days ago • 48
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 10 days ago • 21
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation Paper • 2504.07405 • Published 17 days ago • 12
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Paper • 2504.08591 • Published 16 days ago • 18
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 16 days ago • 39
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 16 days ago • 47
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 16 days ago • 122
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published 14 days ago • 16
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 13 days ago • 245
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding Paper • 2504.09925 • Published 13 days ago • 38
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published 17 days ago • 46
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published 22 days ago • 10
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis Paper • 2504.04842 • Published 20 days ago • 34