facebook/vjepa2-vitl-fpc64-256 Video Classification • 0.3B • Updated 21 days ago • 29.6k • 134
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published 18 days ago • 60
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published 18 days ago • 60
BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing Paper • 2506.17450 • Published 18 days ago • 60 • 1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Paper • 2506.13654 • Published 22 days ago • 43
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21 • 51
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published Dec 11, 2024 • 39
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 48
MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion Paper • 2307.01097 • Published Jul 3, 2023 • 10
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20, 2024 • 18
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping Paper • 2403.15951 • Published Mar 23, 2024 • 1