LZXzju committed
Commit f665794 · verified · 1 Parent(s): 42b38ca

Update README.md

Files changed (1): README.md (+3 −5)

README.md CHANGED
@@ -7,12 +7,12 @@ base_model:
 pipeline_tag: visual-question-answering
 ---
 
-
+## Introduction
 This repository contains the efficient GUI grounding model, **UI-R1-E-3B**, presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).
 
 Project page: https://github.com/lll6gg/UI-R1
 
-#### Benchmark 1: ScreenSpotV2
+## Benchmark 1: ScreenSpotV2
 
 | ScreenSpotV2 | inference mode | Mobile-T | Mobile-I | Desktop-T | Desktop-I | Web-T | Web-I | Avg↑ / Len↓ |
 | ------------- | -------------- | -------- | -------- | --------- | --------- | -------- | -------- | ----------------- |
@@ -22,8 +22,7 @@ Project page: https://github.com/lll6gg/UI-R1
 | GUI-R1-3B | w/ thinking | 97.6 | 78.2 | 94.3 | 64.3 | 91.0 | 72.4 | 85.0 / 80 |
 | UI-R1-3B (v2) | w/ thinking | 97.6 | 79.6 | 92.3 | 67.9 | 88.9 | 77.8 | 85.8 / 60 |
 | **UI-R1-E-3B** | w/o thinking | **98.2** | 83.9 | **94.8** | **75.0** | **93.2** | **83.7** | **89.5** / **28** |
-
-#### Benchmark 2: ScreenSpot-Pro
+## Benchmark 2: ScreenSpot-Pro
 
 | ScreenSpot-Pro | inference mode | Average Length↓ | Average Accuracy↑ |
 | -------------- | -------------- | --------------- | ---------------- |
@@ -33,7 +32,6 @@ Project page: https://github.com/lll6gg/UI-R1
 | GUI-R1-3B | w/ thinking | 114 | 26.6 |
 | UI-R1-3B (v2) | w/ thinking | 129 | 29.8 |
 | **UI-R1-E-3B** | w/o thinking | **28** | **33.5** |
-
 ## Evaluation Method for GUI Grounding
 
 1. Prompt for UI-R1-E-3B:
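
The diff ends inside the card's "Evaluation Method for GUI Grounding" section, before the prompt text itself appears, so the prompt is left as-is here. For orientation only, a minimal sketch of querying the model for grounding is given below. It is not the card's official recipe, and everything in it is an assumption rather than part of this commit: the repo id `LZXzju/UI-R1-E-3B`, the Qwen2.5-VL base architecture implied by the loader, and the placeholder instruction text.

```python
# Minimal sketch, not the card's official recipe: ask UI-R1-E-3B for a click
# target on a screenshot. Assumptions (not taken from this commit): the
# checkpoint id "LZXzju/UI-R1-E-3B" and a Qwen2.5-VL-based architecture,
# hence Qwen2_5_VLForConditionalGeneration (requires transformers >= 4.49).
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "LZXzju/UI-R1-E-3B"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("screenshot.png")  # any GUI screenshot
# Placeholder instruction: the card's actual prompt is cut off in this diff.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": 'Output the click coordinate for "open settings".'},
    ],
}]
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the newly generated answer.
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Per the tables above, UI-R1-E-3B is evaluated w/o thinking, which is why this sketch asks directly for a coordinate rather than a reasoning trace.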