π^3: Scalable Permutation-Equivariant Visual Geometry Learning Paper • 2507.13347 • Published Jul 17 • 64
A Topic-level Self-Correctional Approach to Mitigate Hallucinations in MLLMs Paper • 2411.17265 • Published Nov 26, 2024 • 1
Use Property-Based Testing to Bridge LLM Code Generation and Validation Paper • 2506.18315 • Published Jun 23 • 10
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation Paper • 2501.12612 • Published Jan 22
RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Paper • 2506.04308 • Published Jun 4 • 43
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 59
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection Paper • 2412.04455 • Published Dec 5, 2024 • 39
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7 • 23
Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE Paper • 2311.02684 • Published Nov 5, 2023
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy Paper • 2203.07845 • Published Mar 15, 2022
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control Paper • 2403.12037 • Published Mar 18, 2024 • 1