Vision Language Action models
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective (arXiv:2507.01925)
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning (arXiv:2507.16746)
MolmoAct: Action Reasoning Models that can Reason in Space (arXiv:2508.07917)
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies (arXiv:2508.20072)
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks (arXiv:2508.15804)
CLIPSym: Delving into Symmetry Detection with CLIP (arXiv:2508.14197)
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions (arXiv:2509.06951)
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model (arXiv:2509.09372)
Lost in Embeddings: Information Loss in Vision-Language Models (arXiv:2509.11986)
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning (arXiv:2509.15937)
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing (arXiv:2509.22186)
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models (arXiv:2509.25848)
Visual Jigsaw Post-Training Improves MLLMs (arXiv:2509.25190)
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models (arXiv:2510.13626)
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning (arXiv:2510.12693)
GigaBrain-0: A World Model-Powered Vision-Language-Action Model (arXiv:2510.19430)
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models (arXiv:2510.25889)
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution (arXiv:2511.14210)
VisPlay: Self-Evolving Vision-Language Models from Images (arXiv:2511.15661)
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models (arXiv:2511.15605)
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs (arXiv:2511.19773)
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight (arXiv:2511.16175)
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion (arXiv:2512.19535)
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models (arXiv:2512.19526)