VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model • Paper • 2504.07615 • Published 6 days ago • 19
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published 1 day ago • 185
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model • Paper • 2504.08685 • Published 5 days ago • 104
TAPNext: Tracking Any Point (TAP) as Next Token Prediction • Paper • 2504.05579 • Published 8 days ago • 4
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting • Paper • 2504.05541 • Published 8 days ago • 14
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance • Paper • 2504.06232 • Published 8 days ago • 11
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation • Paper • 2504.02160 • Published 13 days ago • 33
An Empirical Study of GPT-4o Image Generation Capabilities • Paper • 2504.05979 • Published 8 days ago • 59
OmniSVG: A Unified Scalable Vector Graphics Generation Model • Paper • 2504.06263 • Published 8 days ago • 141
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning • Paper • 2504.02949 • Published 13 days ago • 19
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation • Paper • 2504.02542 • Published 13 days ago • 41
SkyReels-A2: Compose Anything in Video Diffusion Transformers • Paper • 2504.02436 • Published 13 days ago • 35
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation • Paper • 2504.02782 • Published 13 days ago • 54
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models • Paper • 2503.18886 • Published 23 days ago • 20
Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback • Paper • 2405.20216 • Published May 30, 2024 • 20