GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography Paper • 2504.07083 • Published 7 days ago
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023
GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models Paper • 2501.01428 • Published Jan 2
RelightVid: Temporal-Consistent Diffusion Model for Video Relighting Paper • 2501.16330 • Published Jan 27
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published Jan 6
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published Dec 10, 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models Paper • 2412.01824 • Published Dec 2, 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models Paper • 2410.17637 • Published Oct 23, 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction Paper • 2410.17247 • Published Oct 22, 2024