Multimodal Action - a btjhjeon Collection

btjhjeon 's Collections

Multimodal Action

Multimodal System

Multimodal Reasoning

Multimodal Analysis

Multimodal Alignment

PEFT

LLM

LLM context length

Multimodal Dataset

Multimodal Benchmarks

Multimodal Action

updated Mar 30

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 25
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 50
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 51
ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Paper • 2411.17465 • Published Nov 26, 2024 • 87
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Paper • 2501.11733 • Published Jan 20 • 29
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Paper • 2503.12533 • Published Mar 16 • 66
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Paper • 2503.21696 • Published Mar 27 • 22
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

Paper • 2503.21620 • Published Mar 27 • 62
OmniParser for Pure Vision Based GUI Agent

Paper • 2408.00203 • Published Aug 1, 2024 • 26