pipeline_tag: visual-question-answering
---

## Introduction

This repository contains the efficient GUI grounding model, **UI-R1-E-3B**, presented in [UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning](https://huggingface.co/papers/2503.21620).
Project page: https://github.com/lll6gg/UI-R1
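
For a quick start, a minimal inference setup might look like the sketch below. This is an illustrative assumption rather than official usage: the UI-R1 paper builds on Qwen2.5-VL-3B, so the Qwen2.5-VL classes are used here, the repo id `UI-R1-E-3B` is a placeholder for this model's actual Hub path, and the screenshot and instruction are invented examples.

```python
# Minimal inference sketch (assumptions: UI-R1-E-3B follows the Qwen2.5-VL
# architecture used in the UI-R1 paper; "UI-R1-E-3B" is a placeholder repo id;
# "screenshot.png" and the instruction are invented examples).
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "UI-R1-E-3B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("UI-R1-E-3B")

# Build a single-turn VQA-style request: one screenshot plus one instruction.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Output the click coordinate for the 'Settings' icon."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text], images=[Image.open("screenshot.png")], return_tensors="pt"
).to(model.device)

# Grounding answers are short (see the Len↓ columns below), so a small budget suffices.
out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```

Because UI-R1-E-3B answers without a thinking trace, the decoded output is a short coordinate string rather than a long rationale.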
## Benchmark 1: ScreenSpotV2
| ScreenSpotV2 | inference mode | Mobile-T | Mobile-I | Desktop-T | Desktop-I | Web-T | Web-I | Avg↑ / Len↓ |
| ------------- | -------------- | -------- | -------- | --------- | --------- | -------- | -------- | ----------------- |
| GUI-R1-3B | w/ thinking | 97.6 | 78.2 | 94.3 | 64.3 | 91.0 | 72.4 | 85.0 / 80 |
| UI-R1-3B (v2) | w/ thinking | 97.6 | 79.6 | 92.3 | 67.9 | 88.9 | 77.8 | 85.8 / 60 |
| **UI-R1-E-3B** | w/o thinking | **98.2** | 83.9 | **94.8** | **75.0** | **93.2** | **83.7** | **89.5** / **28** |
## Benchmark 2: ScreenSpot-Pro
| ScreenSpot-Pro | inference mode | Average Length↓ | Average Accuracy↑ |
| -------------- | -------------- | --------------- | ---------------- |
| GUI-R1-3B | w/ thinking | 114 | 26.6 |
| UI-R1-3B (v2) | w/ thinking | 129 | 29.8 |
| **UI-R1-E-3B** | w/o thinking | **28** | **33.5** |
## Evaluation Method for GUI Grounding
1. Prompt for UI-R1-E-3B:
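
For reference, the accuracy numbers above follow the usual ScreenSpot-style criterion: a prediction counts as correct when the predicted click point falls inside the ground-truth element's bounding box. Below is a minimal scoring sketch under that assumption; `parse_point`, `point_in_box`, and `grounding_accuracy` are illustrative helpers, not part of the released code.

```python
# Illustrative grounding-accuracy check (assumption: the standard ScreenSpot
# criterion, where a prediction is correct if the predicted click point lies
# inside the ground-truth element's bounding box).
import re

def parse_point(text: str):
    """Extract the first '(x, y)' coordinate pair from raw model output."""
    m = re.search(r"\(?\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)?", text)
    return (float(m.group(1)), float(m.group(2))) if m else None

def point_in_box(point, box) -> bool:
    """box is (x1, y1, x2, y2) in the same pixel space as the predicted point."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def grounding_accuracy(outputs, boxes) -> float:
    """Fraction of samples whose predicted point hits the target box."""
    hits = sum(
        1 for text, box in zip(outputs, boxes)
        if (p := parse_point(text)) is not None and point_in_box(p, box)
    )
    return hits / len(outputs)
```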