Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective Paper • 2505.19815 • Published 29 days ago • 36
EnerVerse-AC: Envisioning Embodied Environments with Action Condition Paper • 2505.09723 • Published May 14 • 22
MM-IFEngine: Towards Multimodal Instruction Following Paper • 2504.07957 • Published Apr 10 • 34
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 69
Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing Paper • 2504.02826 • Published Apr 3 • 69
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language Paper • 2406.20085 • Published Jun 28, 2024 • 13
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection Paper • 2401.02361 • Published Jan 4, 2024
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 37
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 48
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published Mar 18 • 48
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning Paper • 2503.10291 • Published Mar 13 • 37
Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models Paper • 2503.01774 • Published Mar 3 • 45
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 88