- Unified Vision-Language-Action Model
  Paper • 2506.19850 • Published • 21
- SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
  Paper • 2506.01844 • Published • 104
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 10
- QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
  Paper • 2312.14457 • Published • 1
Collections
Collections including paper arxiv:2506.21539
- MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
  Paper • 2504.07830 • Published • 18
- WORLDMEM: Long-term Consistent World Simulation with Memory
  Paper • 2504.12369 • Published • 34
- Towards a Unified Copernicus Foundation Model for Earth Vision
  Paper • 2503.11849 • Published • 4
- VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
  Paper • 2506.18903 • Published • 18

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 59
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 43
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 62

- Cosmos World Foundation Model Platform for Physical AI
  Paper • 2501.03575 • Published • 80
- Intuitive physics understanding emerges from self-supervised pretraining on natural videos
  Paper • 2502.11831 • Published • 19
- PhysicsGen: Can Generative Models Learn from Images to Predict Complex Physical Relations?
  Paper • 2503.05333 • Published • 8
- Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
  Paper • 2503.15558 • Published • 50

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 27
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 51