BoyuNLP committed · Commit 9cdba25 · verified · 1 Parent(s): 28519a6

Update README.md

Files changed (1):
  1. README.md +21 -27
README.md CHANGED
@@ -18,39 +18,33 @@ UGround is a strong GUI visual grounding model trained with a simple recipe. Che
 
  ## Models
 
- - Initial UGround-V1: https://huggingface.co/osunlp/UGround
- - UGround-V1-2B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-2B
- - UGround-V1-7B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-7B
- - UGround-V1-72B (Qwen2-VL): https://huggingface.co/osunlp/UGround-V1-72B
+ - Model-V1:
+   - [Initial UGround](https://huggingface.co/osunlp/UGround):
+   - [UGround-V1-2B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-2B)
+   - [UGround-V1-7B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-7B)
+   - [UGround-V1-72B (Qwen2-VL)](https://huggingface.co/osunlp/UGround-V1-72B)
+ - [Training Data](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
 
  ## Release Plan
 
- - [x] Model Weights
-   - [x] Initial V1 (the one used in the paper)
-   - [x] Qwen2-VL-based V1
-     - [x] 2B
-     - [x] 7B
-     - [ ] 72B
-   - [ ] V1.1
- - [ ] Code
-   - [x] Inference Code of UGround
-   - [x] Offline Experiments
-     - [x] Screenspot (along with referring expressions generated by GPT-4/4o)
-     - [x] Multimodal-Mind2Web
-     - [x] OmniAct
-     - [ ] Android Control
-   - [ ] Online Experiments
-     - [ ] Mind2Web-Live-SeeAct-V
-     - [ ] AndroidWorld-SeeAct-V
- - [ ] Data-V1
-   - [ ] Data Examples
-   - [ ] Data Construction Scripts
-   - [ ] Guidance of Open-source Data
- - [ ] Data-V1.1
+ - [x] [Model Weights](https://huggingface.co/collections/osunlp/uground-677824fc5823d21267bc9812)
+   - [x] Initial Version (the one used in the paper)
+   - [x] Qwen2-VL-Based V1 (2B, 7B, 72B)
+ - [x] Code
+   - [x] [Inference Code of UGround (Initial & Qwen2-VL-Based)](https://github.com/boyugou/llava_uground/)
+   - [x] Offline Experiments (Code, Results, and Useful Resources)
+     - [x] [ScreenSpot](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/ScreenSpot)
+     - [x] [Multimodal-Mind2Web](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/Multimodal-Mind2Web)
+     - [x] [OmniAct](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/OmniACT)
+     - [x] [Android Control](https://github.com/OSU-NLP-Group/UGround/tree/main/offline_evaluation/AndroidControl)
+   - [x] Online Experiments
+     - [x] [Mind2Web-Live-SeeAct-V](https://github.com/boyugou/Mind2Web_Live_SeeAct_V)
+     - [x] [AndroidWorld-SeeAct-V](https://github.com/boyugou/android_world_seeact_v)
+ - [ ] Data Synthesis Pipeline (Coming Soon)
+ - [x] [Training-Data (V1)](https://huggingface.co/datasets/osunlp/UGround-V1-Data)
  - [x] Online Demo (HF Spaces)
 
 
-
  ## Main Results
 
  ### GUI Visual Grounding: ScreenSpot (Standard Setting)
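
The updated Models list above labels the V1-2B/7B/72B checkpoints as Qwen2-VL-based. Below is a minimal inference sketch under the assumption that those checkpoints load with the stock Qwen2-VL classes in Hugging Face `transformers`; UGround's actual prompt template is defined in the linked inference repository, so the instruction string here is only a hypothetical stand-in.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Assumption: the Qwen2-VL-based checkpoints (osunlp/UGround-V1-2B/7B/72B)
# are compatible with the standard Qwen2-VL classes in transformers.
model_id = "osunlp/UGround-V1-2B"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# A GUI screenshot plus a referring expression describing the target element.
image = Image.open("screenshot.png")
# Hypothetical instruction; the real prompt format lives in the linked
# inference repo (https://github.com/boyugou/llava_uground/).
query = 'Where is the element described as "the search button"? Answer with a point.'

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": query},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)

# Decode only the newly generated tokens (the predicted coordinates).
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The initial UGround checkpoint is a different (LLaVA-style, `llava_llama`) architecture and is served by the linked llava_uground code rather than this loading path.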