BoyuNLP committed · Commit 9cdba25 · verified · 1 Parent(s): 28519a6

Update README.md

Files changed (1):
  1. README.md +21 -27
README.md CHANGED
@@ -18,39 +18,33 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
 
  ## Models
 
- - Initial UGround-V1: https://huggingface.co/osunlp/UGround
- - UGround-V1-2B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-2B
- - UGround-V1-7B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-7B
- - UGround-V1-72B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-72B
+ - Model-V1:
+   - [Initial UGround](https://huggingface.co/osunlp/UGround):
+   - [UGround-V1-2B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-2B)
+   - [UGround-V1-7B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-7B)
+   - [UGround-V1-72B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-72B)
+ - [Training Data](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
 
  ## Release Plan
 
- - [x] Model Weights
-   - [x] Initial V1 (the one used in the paper)
-   - [x] Qwen2-VL-based V1
-     - [x] 2B
-     - [x] 7B
-     - [ ] 72B
-   - [ ] V1.1
- - [ ] Code
-   - [x] Inference Code of UGround
-   - [x] Offline Experiments
-     - [x] Screenspot (along with referring expressions generated by GPT-4/4o)
-     - [x] Multimodal-Mind2Web
-     - [x] OmniAct
-     - [ ] Android Control
-   - [ ] Online Experiments
-     - [ ] Mind2Web-Live-SeeAct-V
-     - [ ] AndroidWorld-SeeAct-V
- - [ ] Data-V1
-   - [ ] Data Examples
-   - [ ] Data Construction Scripts
-   - [ ] Guidance of Open-source Data
- - [ ] Data-V1.1
+ - [x] [Model Weights](https://huggingface.co/collections/osunlp/uground-677824fc5823d21267bc9812)
+   - [x] Initial Version (the one used in the paper)
+   - [x] Qwen2-VL-Based V1 (2B, 7B, 72B)
+ - [x] Code
+   - [x] [Inference Code of UGround (Initial & Qwen2-VL-Based)](https://github.com/boyugou/llava_uground/)
+   - [x] Offline Experiments (Code, Results, and Useful Resources)
+     - [x] [ScreenSpot](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/ScreenSpot)
+     - [x] [Multimodal-Mind2Web](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/Multimodal-Mind2Web)
+     - [x] [OmniAct](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/OmniACT)
+     - [x] [Android Control](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/AndroidControl)
+   - [x] Online Experiments
+     - [x] [Mind2Web-Live-SeeAct-V](https://github.com/boyugou/Mind2Web_Live_SeeAct_V)
+     - [x] [AndroidWorld-SeeAct-V](https://github.com/boyugou/android_world_seeact_v)
+ - [ ] Data Synthesis Pipeline (Coming Soon)
+ - [x] [Training-Data (V1)](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
  - [x] Online Demo (HF Spaces)
 
 
-
  ## Main Results
 
  ### GUI Visual Grounding: ScreenSpot (Standard Setting)
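
The updated Models list above labels the V1-2B/7B/72B checkpoints as Qwen2-VL-based. Below is a minimal inference sketch under the assumption that those checkpoints load with the stock Qwen2-VL classes in Hugging Face `transformers`; UGround's actual prompt template is defined in the linked inference repository, so the instruction string here is only a hypothetical stand-in.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Assumption: the Qwen2-VL-based checkpoints (osunlp/UGround-V1-2B/7B/72B)
# are compatible with the standard Qwen2-VL classes in transformers.
model_id = "osunlp/UGround-V1-2B"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A GUI screenshot plus a referring expression describing the target element.
image = Image.open("screenshot.png")
# Hypothetical instruction; the real prompt format lives in the linked
# inference repo (https://github.com/boyugou/llava_uground/).
query = 'Where is the element described as "the search button"? Answer with a point.'

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": query},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens (the predicted coordinates).
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The initial UGround checkpoint is a different (LLaVA-style, `llava_llama`) architecture and is served by the linked llava_uground code rather than this loading path.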