Update model card for Pref-GRPO: add pipeline tag, library, and correct paper/project/code links

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +36 -26
README.md CHANGED
@@ -1,29 +1,33 @@
1
  ---
2
- license: mit
 
3
  datasets:
4
  - CodeGoat24/HPD
5
  - CodeGoat24/LiFT-HRA
6
  - CodeGoat24/OIP
7
  - CodeGoat24/EvalMuse
8
  - CodeGoat24/ShareGPTVideo-DPO
9
- - CodeGoat24/VideoFeedback
10
  - CodeGoat24/LLaVA-Critic-113k
11
  - CodeGoat24/VideoDPO
12
- base_model:
13
- - Qwen/Qwen2.5-VL-7B-Instruct
 
14
  ---
15
 
 
16
 
17
- # UnifiedReward-qwen-7B
18
  We are actively gathering feedback from the community to improve our models. **We welcome your input and encourage you to stay updated through our repository**!!
19
 
20
  ## Model Summary
21
 
22
- `UnifiedReward-qwen-7b` is the first unified reward model based on [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for multimodal understanding and generation assessment, enabling both pairwise ranking and pointwise scoring, which can be employed for vision model preference alignment.
 
 
23
 
24
  For further details, please refer to the following resources:
25
- - 📰 Paper: https://arxiv.org/pdf/2503.05236
26
- - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/
 
27
  - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
28
  - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
29
  - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
@@ -31,22 +35,22 @@ For further details, please refer to the following resources:
31
 
32
  ## 🏁 Compared with Current Reward Models
33
 
34
- | Reward Model | Method| Image Generation | Image Understanding | Video Generation | Video Understanding
35
  | :-----: | :-----: |:-----: |:-----: | :-----: | :-----: |
36
- | [PickScore](https://github.com/yuvalkirstain/PickScore) |Point | √ | | ||
37
- | [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | |||
38
- | [ImageReward](https://github.com/THUDM/ImageReward) | Point| √| |||
39
- | [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ |||
40
- | [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ ||√|
41
- | [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | |√ ||
42
- | [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | |√| |
43
- | [VisionReward](https://github.com/THUDM/VisionReward) | Point |√ | |√||
44
- | [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | |√ ||
45
- | UnifiedReward (Ours) | Pair/Point | √ | √ |√|√|
46
 
47
 
48
  ### Quick Start
49
- All pair rank and point score inference codes are provided in our [github](https://github.com/CodeGoat24/UnifiedReward).
50
 
51
  We take image understanding assessment as example here:
52
  ~~~python
@@ -57,6 +61,7 @@ import tqdm
57
  from PIL import Image
58
  import warnings
59
  import os
 
60
  from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration
61
  from qwen_vl_utils import process_vision_info
62
 
@@ -72,7 +77,12 @@ processor = AutoProcessor.from_pretrained(model_path)
72
  url = "https://github.com/LLaVA-VL/blog/blob/main/2024-10-03-llava-critic/static/images/critic_img_seven.png?raw=True"
73
  image = Image.open(requests.get(url, stream=True).raw)
74
 
75
- prompt_text = f'Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:\nQuestion: [What this image presents?]\nThe first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]\nThe second response: [This is a handwritten number seven.]\nASSISTANT:\n'
 
 
 
 
 
76
 
77
  messages = [
78
  {
@@ -109,11 +119,11 @@ print(output)
109
 
110
  ## Citation
111
 
112
- ```
113
- @article{unifiedreward,
114
- title={Unified reward model for multimodal understanding and generation},
115
- author={Wang, Yibin and Zang, Yuhang and Li, Hao and Jin, Cheng and Wang, Jiaqi},
116
- journal={arXiv preprint arXiv:2503.05236},
117
  year={2025}
118
  }
119
  ```
 
1
  ---
2
+ base_model:
3
+ - Qwen/Qwen2.5-VL-7B-Instruct
4
  datasets:
5
  - CodeGoat24/HPD
6
  - CodeGoat24/LiFT-HRA
7
  - CodeGoat24/OIP
8
  - CodeGoat24/EvalMuse
9
  - CodeGoat24/ShareGPTVideo-DPO
 
10
  - CodeGoat24/LLaVA-Critic-113k
11
  - CodeGoat24/VideoDPO
12
+ license: mit
13
+ pipeline_tag: image-text-to-text
14
+ library_name: transformers
15
  ---
16
 
17
+ # UnifiedReward-qwen-7B: A Reward Model for Pref-GRPO
18
 
 
19
  We are actively gathering feedback from the community to improve our models. **We welcome your input and encourage you to stay updated through our repository**!!
20
 
21
  ## Model Summary
22
 
23
+ `UnifiedReward-qwen-7b` is the first unified reward model based on [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for multimodal understanding and generation assessment. It enables both pairwise ranking and pointwise scoring, and is notably employed for vision model preference alignment within the **Pref-GRPO** framework.
24
+
25
+ This model is a key component of the research presented in the paper [**Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning**](https://huggingface.co/papers/2508.20751).
26
 
27
  For further details, please refer to the following resources:
28
+ - 📰 Paper: [Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning](https://huggingface.co/papers/2508.20751)
29
+ - 🪐 Project Page: https://codegoat24.github.io/UnifiedReward/Pref-GRPO
30
+ - 💻 Code: https://github.com/CodeGoat24/Pref-GRPO
31
  - 🤗 Model Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-models-67c3008148c3a380d15ac63a
32
  - 🤗 Dataset Collections: https://huggingface.co/collections/CodeGoat24/unifiedreward-training-data-67c300d4fd5eff00fa7f1ede
33
  - 👋 Point of Contact: [Yibin Wang](https://codegoat24.github.io)
 
35
 
36
  ## 🏁 Compared with Current Reward Models
37
 
38
+ | Reward Model | Method| Image Generation | Image Understanding | Video Generation | Video Understanding
39
  | :-----: | :-----: |:-----: |:-----: | :-----: | :-----: |
40
+ | [PickScore](https://github.com/yuvalkirstain/PickScore) |Point | √ | | ||
41
+ | [HPS](https://github.com/tgxs002/HPSv2) | Point | √ | |||
42
+ | [ImageReward](https://github.com/THUDM/ImageReward) | Point| √| |||
43
+ | [LLaVA-Critic](https://huggingface.co/lmms-lab/llava-critic-7b) | Pair/Point | | √ |||
44
+ | [IXC-2.5-Reward](https://github.com/InternLM/InternLM-XComposer) | Pair/Point | | √ ||√|
45
+ | [VideoScore](https://github.com/TIGER-AI-Lab/VideoScore) | Point | | |√ ||
46
+ | [LiFT](https://github.com/CodeGoat24/LiFT) | Point | | |√| |
47
+ | [VisionReward](https://github.com/THUDM/VisionReward) | Point |√ | |√||
48
+ | [VideoReward](https://github.com/KwaiVGI/VideoAlign) | Point | | |√ ||
49
+ | UnifiedReward (Ours) | Pair/Point | √ | √ |√|√|
50
 
51
 
52
  ### Quick Start
53
+ All pairwise ranking and pointwise scoring inference code is provided in our [GitHub repository](https://github.com/CodeGoat24/Pref-GRPO).
54
 
55
  We take image understanding assessment as example here:
56
  ~~~python
 
61
  from PIL import Image
62
  import warnings
63
  import os
64
+ import requests # Added for image download in example
65
  from transformers import AutoProcessor, AutoTokenizer, Qwen2_5_VLForConditionalGeneration
66
  from qwen_vl_utils import process_vision_info
67
 
 
77
  url = "https://github.com/LLaVA-VL/blog/blob/main/2024-10-03-llava-critic/static/images/critic_img_seven.png?raw=True"
78
  image = Image.open(requests.get(url, stream=True).raw)
79
 
80
+ prompt_text = f'Given an image and a corresponding question, please serve as an unbiased and fair judge to evaluate the quality of the answers provided by a Large Multimodal Model (LMM). Determine which answer is better and explain your reasoning with specific details. Your task is provided as follows:\n\
81
+ Question: [What this image presents?]\n\
82
+ The first response: [The image is a black and white sketch of a line that appears to be in the shape of a cross. The line is a simple and straightforward representation of the cross shape, with two straight lines intersecting at a point.]\n\
83
+ The second response: [This is a handwritten number seven.]\n\
84
+ ASSISTANT:\n\
85
+ '
86
 
87
  messages = [
88
  {
 
119
 
120
  ## Citation
121
 
122
+ ```bibtex
123
+ @article{Pref-GRPO&UniGenBench,
124
+ title={Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning},
125
+ author={Wang, Yibin and Li, Zhimin and Zang, Yuhang and Zhou, Yujie and Bu, Jiazi and Wang, Chunyu and Lu, Qinglin and Jin, Cheng and Wang, Jiaqi},
126
+ journal={arXiv preprint arXiv:2508.20751},
127
  year={2025}
128
  }
129
  ```
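
A note on the Quick Start prompt: the original `prompt_text` separates the question, the two responses, and the `ASSISTANT:` marker with `\n` escapes. In Python, a backslash at the very end of a line inside a string literal is a line continuation and inserts no newline, so splitting the prompt across lines that way changes the text the model receives. The sketch below illustrates the difference with a shortened, hypothetical prompt (the `[A]` and `[B]` placeholders are not from the model card) and shows one way to keep the source readable without changing the string.

```python
# Shortened, hypothetical prompt used only to illustrate the string semantics.

# 1) Explicit "\n" escapes: every field ends with a real newline character.
with_escapes = 'Question: [What this image presents?]\nThe first response: [A]\nThe second response: [B]\nASSISTANT:\n'

# 2) Backslash at end of line: the backslash-newline pair is dropped,
#    so the fields are glued together with no separator at all.
with_continuations = 'Question: [What this image presents?]\
The first response: [A]\
The second response: [B]\
ASSISTANT:\
'

# 3) Adjacent string literals inside parentheses: readable source,
#    and the resulting string is identical to variant 1.
readable = (
    'Question: [What this image presents?]\n'
    'The first response: [A]\n'
    'The second response: [B]\n'
    'ASSISTANT:\n'
)

print(with_escapes.count("\n"))        # newline characters in variant 1
print(with_continuations.count("\n"))  # newline characters in variant 2
print(readable == with_escapes)        # whether variant 3 matches variant 1
```

Whether the reward model is sensitive to the missing newlines is not stated in the model card; keeping the prompt byte-for-byte identical to the original format is the conservative choice.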