Update pipeline tag and add library name (#3)
Update pipeline tag and add library name (1d67590cdb934819669c73b04fc99f32ada05623)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-4B
-pipeline_tag: text-classification
+license: apache-2.0
+pipeline_tag: text-ranking
+library_name: transformers
 ---
+
 # Skywork-Reward-V2
 
 <div align="center">
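With `library_name: transformers` now in the metadata, the Hub knows to load this checkpoint through the standard transformers classes. A minimal loading sketch follows; the repository id is an assumption pieced together from `base_model: Qwen/Qwen3-4B` and the series name, since the diff itself never states it:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repo id for this card (not stated in the diff): the Qwen3-4B member of the series.
model_id = "Skywork/Skywork-Reward-V2-Qwen3-4B"

# The reward model is exposed as a sequence-classification checkpoint that returns a
# single scalar score per conversation.
rm = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```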
@@ -27,7 +29,7 @@ pipeline_tag: text-classification
 **Skywork-Reward-V2** is a series of eight reward models designed for versatility across a wide range of tasks, trained on a mixture of 26 million carefully curated preference pairs. While the Skywork-Reward-V2 series remains based on the Bradley-Terry model, we push the boundaries of training data scale and quality to achieve superior performance. Compared with the first generation of Skywork-Reward, the Skywork-Reward-V2 series offers the following major improvements:
 
 - **Trained on a significantly larger and higher-quality preference data mixture**, consisting of **26 million preference pairs** curated via a large-scale human-LLM synergistic pipeline.
-- **State-of-the-art performance on seven major reward model benchmarks
+- **State-of-the-art performance on seven major reward model benchmarks** (as shown in the table below), including RewardBench v1, RewardBench v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench.
 - **Available in eight models across multiple sizes**, with the smallest 0.6B variant, *Skywork-Reward-V2-Qwen3-0.6B*, nearly matching the average performance of our previous best model, Skywork-Reward-Gemma-2-27B-v0.2. The largest 8B version, *Skywork-Reward-V2-Llama-3.1-8B*, surpasses all existing reward models across all benchmarks on average. Our top experimental model, *Skywork-Reward-V2-Llama-3.1-8B-40M*, **outperforms all existing reward models on every benchmark**.
 
 <div align="center">
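The paragraph above notes that the series remains based on the Bradley-Terry model. For readers unfamiliar with the term, the sketch below shows the generic pairwise Bradley-Terry objective that such reward models are typically trained with; it is purely illustrative and not taken from the Skywork training code:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # Under the Bradley-Terry model, P(chosen beats rejected) = sigmoid(r_chosen - r_rejected),
    # so maximizing the likelihood of the preference data minimizes -log sigmoid(r_c - r_r).
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy batch of reward scores for two preference pairs.
loss = bradley_terry_loss(torch.tensor([1.2, 0.3]), torch.tensor([-0.5, 0.1]))
print(f"Bradley-Terry loss: {loss.item():.4f}")
```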
@@ -118,8 +120,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples
+response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
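The hunk above restores the two candidate responses wrapped by `conv1` and `conv2` (response1 divides the 9 apples among all 3 people and is correct; response2 forgets Jane herself and divides by 2). The README's scoring code sits outside this hunk, so the following is only a sketch of how such a pair is typically scored, reusing the `rm` and `tokenizer` names from the lines quoted in the hunk header; the response strings are abbreviated and the exact calls in the card may differ:

```python
import torch

prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
# Abbreviated versions of the two candidate answers.
response1 = "12 - 4 = 8, then 8 + 1 = 9, split among 3 people: 3 apples each."
response2 = "12 - 4 = 8, then 8 + 1 = 9, split among 2 siblings: about 4 apples each."

conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]

scores = []
with torch.no_grad():
    for conv in (conv1, conv2):
        # Render the conversation with the chat template and read the single classification logit.
        input_ids = tokenizer.apply_chat_template(conv, tokenize=True, return_tensors="pt").to(rm.device)
        scores.append(rm(input_ids).logits[0][0].item())

# The correct solution (conv1) should receive the higher reward score.
print(f"response1: {scores[0]:.3f}  response2: {scores[1]:.3f}")
```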
@@ -193,7 +199,7 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
     for conv in convs:
         conv = tokenizer.apply_chat_template(conv, tokenize=False)
         if tokenizer.bos_token is not None and conv.startswith(tokenizer.bos_token):
-            conv = conv[len(tokenizer.bos_token)
+            conv = conv[len(tokenizer.bos_token):]
         convs_formatted.append(conv)
 
     payload.update({"text": convs_formatted})
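The one-character completion above (`:]`) matters because `apply_chat_template(..., tokenize=False)` already prepends the BOS token for many chat templates; if the serving side tokenizes the rendered text again with special tokens enabled, the BOS would appear twice. Below is a small, isolated illustration of the same stripping step; the model id is only an assumed example, and tokenizers that define no BOS token simply skip the branch:

```python
from transformers import AutoTokenizer

# Assumed example id; any chat model whose template prepends a BOS token behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-Reward-V2-Llama-3.1-8B")
conv = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]

text = tokenizer.apply_chat_template(conv, tokenize=False)

# Strip a leading BOS token so that server-side tokenization does not duplicate it.
if tokenizer.bos_token is not None and text.startswith(tokenizer.bos_token):
    text = text[len(tokenizer.bos_token):]
```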
@@ -212,8 +218,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
 
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples
+response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
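This second hunk restores the same responses for the server-based path. The rest of that example is not part of the diff, but given the `process_convs(convs, base_url, tokenizer, model_name_or_path)` signature shown in the hunk headers, the pair would then be scored roughly as below; the assumption that the helper returns one score per conversation, and the variable names, are not confirmed by the diff:

```python
# `tokenizer`, `base_url`, and `model_name_or_path` are assumed to be set up earlier in the
# README's server-based example; `process_convs` is assumed to return one score per conversation.
scores = process_convs([conv1, conv2], base_url, tokenizer, model_name_or_path)

# response1 solves the problem correctly, so it should receive the higher reward.
preferred = "response1" if scores[0] > scores[1] else "response2"
print(f"scores={scores} -> preferred: {preferred}")
```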