pipeline_tag: visual-question-answering
---

## Introduction

This repository contains the efficient GUI grounding model, **UI-R1-E-3B**, presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).
Project page: https://github.com/lll6gg/UI-R1
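
For a quick start, a minimal inference setup might look like the sketch below. This is an illustrative assumption rather than official usage: the UI-R1 paper builds on Qwen2.5-VL-3B, so the Qwen2.5-VL classes are used here, the repo id `UI-R1-E-3B` is a placeholder for this model's actual Hub path, and the screenshot and instruction are invented examples.

```python
# Minimal inference sketch (assumptions: UI-R1-E-3B follows the Qwen2.5-VL
# architecture used in the UI-R1 paper; "UI-R1-E-3B" is a placeholder repo id;
# "screenshot.png" and the instruction are invented examples).
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "UI-R1-E-3B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("UI-R1-E-3B")

# Build a single-turn VQA-style request: one screenshot plus one instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Output the click coordinate for the 'Settings' icon."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text], images=[Image.open("screenshot.png")], return_tensors="pt"
).to(model.device)

# Grounding answers are short (see the Len↓ columns below), so a small budget suffices.
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```

Because UI-R1-E-3B answers without a thinking trace, the decoded output is a short coordinate string rather than a long rationale.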
## Benchmark 1: ScreenSpotV2
| ScreenSpotV2 | inference mode | Mobile-T | Mobile-I | Desktop-T | Desktop-I | Web-T | Web-I | Avg↑ / Len↓ |
| ------------- | -------------- | -------- | -------- | --------- | --------- | -------- | -------- | ----------------- |
| GUI-R1-3B | w/ thinking | 97.6 | 78.2 | 94.3 | 64.3 | 91.0 | 72.4 | 85.0 / 80 |
| UI-R1-3B (v2) | w/ thinking | 97.6 | 79.6 | 92.3 | 67.9 | 88.9 | 77.8 | 85.8 / 60 |
| **UI-R1-E-3B** | w/o thinking | **98.2** | 83.9 | **94.8** | **75.0** | **93.2** | **83.7** | **89.5** / **28** |
## Benchmark 2: ScreenSpot-Pro
| ScreenSpot-Pro | inference mode | Average Length↓ | Average Accuracy↑ |
| -------------- | -------------- | --------------- | ---------------- |
| GUI-R1-3B | w/ thinking | 114 | 26.6 |
| UI-R1-3B (v2) | w/ thinking | 129 | 29.8 |
| **UI-R1-E-3B** | w/o thinking | **28** | **33.5** |
## Evaluation Method for GUI Grounding
1. Prompt for UI-R1-E-3B:
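
For reference, the accuracy numbers above follow the usual ScreenSpot-style criterion: a prediction counts as correct when the predicted click point falls inside the ground-truth element's bounding box. Below is a minimal scoring sketch under that assumption; `parse_point`, `point_in_box`, and `grounding_accuracy` are illustrative helpers, not part of the released code.

```python
# Illustrative grounding-accuracy check (assumption: the standard ScreenSpot
# criterion, where a prediction is correct if the predicted click point lies
# inside the ground-truth element's bounding box).
import re

def parse_point(text: str):
    """Extract the first '(x, y)' coordinate pair from raw model output."""
    m = re.search(r"\(?\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)?", text)
    return (float(m.group(1)), float(m.group(2))) if m else None

def point_in_box(point, box) -> bool:
    """box is (x1, y1, x2, y2) in the same pixel space as the predicted point."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def grounding_accuracy(outputs, boxes) -> float:
    """Fraction of samples whose predicted point hits the target box."""
    hits = sum(
        1 for text, box in zip(outputs, boxes)
        if (p := parse_point(text)) is not None and point_in_box(p, box)
    )
    return hits / len(outputs)
```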