btjhjeon's Collections

Multimodal Action

updated Mar 30

  • Gemini Robotics: Bringing AI into the Physical World

    Paper • 2503.20020 • Published Mar 25, 2025 • 25

  • Magma: A Foundation Model for Multimodal AI Agents

    Paper • 2502.13130 • Published Feb 18, 2025 • 58

  • LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

    Paper • 2311.05437 • Published Nov 9, 2023 • 50

  • OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

    Paper • 2410.23218 • Published Oct 30, 2024 • 51

  • ShowUI: One Vision-Language-Action Model for GUI Visual Agent

    Paper • 2411.17465 • Published Nov 26, 2024 • 87

  • Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

    Paper • 2501.11733 • Published Jan 20, 2025 • 29

  • Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

    Paper • 2503.12533 • Published Mar 16, 2025 • 66

  • Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

    Paper • 2503.21696 • Published Mar 27, 2025 • 22

  • UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning

    Paper • 2503.21620 • Published Mar 27, 2025 • 62

  • OmniParser for Pure Vision Based GUI Agent

    Paper • 2408.00203 • Published Aug 1, 2024 • 26