johannhartmann's Collections

Document & UI Intelligence

Updated Jan 20 • 1 upvote

  • xlangai/Aguvis-7B-720P

    8B • Updated Jan 7 • 32 • 9

  • Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

    Paper • 2412.04454 • Published Dec 5, 2024 • 70

  • SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

    Paper • 2401.10935 • Published Jan 17, 2024 • 5

  • cckevinn/SeeClick

    Text Generation • 10B • Updated Jan 29, 2024 • 132 • 18

  • jadechoghari/Ferret-UI-Llama8b

    Image-Text-to-Text • 8B • Updated Jan 8 • 363 • 68

  • Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms

    Paper • 2410.18967 • Published Oct 24, 2024 • 1

  • microsoft/OmniParser

    Image-Text-to-Text • Updated Dec 2, 2024 • 354 • 1.69k

  • InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

    Paper • 2501.04575 • Published Jan 8 • 25

  • showlab/ShowUI-2B

    Updated Mar 11 • 1.71k • 267

  • AskUI/PTA-1

    Image-Text-to-Text • 0.3B • Updated Nov 28, 2024 • 423 • 96