chrisliu298 and nielsr (HF Staff) committed
Commit 9a93596 (verified) · Parent: d1a11d6

Update pipeline tag and add library name (#3)

- Update pipeline tag and add library name (1d67590cdb934819669c73b04fc99f32ada05623)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +18 -8
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-4B
-pipeline_tag: text-classification
+license: apache-2.0
+pipeline_tag: text-ranking
+library_name: transformers
 ---
+
 # Skywork-Reward-V2
 
 <div align="center">
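
The `library_name: transformers` tag added here tells the Hub that the checkpoint loads through the `transformers` sequence-classification API, matching the usage section later in the README. A minimal loading sketch (the repo id below follows the series naming and the `base_model` field; it is an assumption, not taken from this diff):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical repo id, inferred from base_model Qwen/Qwen3-4B and the
# Skywork-Reward-V2 naming scheme; substitute the actual model card's id.
model_name = "Skywork/Skywork-Reward-V2-Qwen3-4B"
rm = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```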
@@ -27,7 +29,7 @@ pipeline_tag: text-classification
 **Skywork-Reward-V2** is a series of eight reward models designed for versatility across a wide range of tasks, trained on a mixture of 26 million carefully curated preference pairs. While the Skywork-Reward-V2 series remains based on the Bradley-Terry model, we push the boundaries of training data scale and quality to achieve superior performance. Compared with the first generation of Skywork-Reward, the Skywork-Reward-V2 series offers the following major improvements:
 
 - **Trained on a significantly larger and higher-quality preference data mixture**, consisting of **26 million preference pairs** curated via a large-scale human-LLM synergistic pipeline.
-- **State-of-the-art performance on seven major reward model benchmarks**, including RewardBench v1, RewardBench v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench.
+- **State-of-the-art performance on seven major reward model benchmarks** (as shown in the table below), including RewardBench v1, RewardBench v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench.
 - **Available in eight models across multiple sizes**, with the smallest 0.6B variant, *Skywork-Reward-V2-Qwen3-0.6B*, nearly matching the average performance of our previous best model, Skywork-Reward-Gemma-2-27B-v0.2. The largest 8B version, *Skywork-Reward-V2-Llama-3.1-8B*, surpasses all existing reward models across all benchmarks on average. Our top experimental model, *Skywork-Reward-V2-Llama-3.1-8B-40M*, **outperforms all existing reward models on every benchmark**.
 
 <div align="center">
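
The paragraph in this hunk notes that the series remains a Bradley-Terry reward model. For reference, that objective reduces to a pairwise logistic loss over scalar rewards; the sketch below is illustrative only, not the authors' training code:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    """Standard Bradley-Terry pairwise loss over scalar rewards.

    P(chosen > rejected) = sigmoid(r_chosen - r_rejected); training minimizes
    the negative log-likelihood of the human preference, averaged over the batch.
    """
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()
```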
@@ -118,8 +120,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
-response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
+response1 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."""
+response2 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."""
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
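
For context, the README snippet around this hunk scores each conversation with the sequence-classification head. A minimal sketch of that step, reusing the snippet's `rm`, `tokenizer`, `conv1`, and `conv2` names (the single-logit indexing assumes a scalar reward head, as is typical for Bradley-Terry models):

```python
import torch

with torch.no_grad():
    for conv in (conv1, conv2):
        # Render the conversation with the model's chat template, then tokenize.
        text = tokenizer.apply_chat_template(conv, tokenize=False)
        inputs = tokenizer(text, return_tensors="pt").to(rm.device)
        # With a scalar reward head, the score is the single classification logit.
        score = rm(**inputs).logits[0][0].item()
        print(f"Score: {score}")
```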
@@ -193,7 +199,7 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
     for conv in convs:
         conv = tokenizer.apply_chat_template(conv, tokenize=False)
         if tokenizer.bos_token is not None and conv.startswith(tokenizer.bos_token):
-            conv = conv[len(tokenizer.bos_token) :]
+            conv = conv[len(tokenizer.bos_token):]
         convs_formatted.append(conv)
 
     payload.update({"text": convs_formatted})
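
This hunk ends with the formatted texts loaded into `payload`; the natural continuation of `process_convs` is a POST to the scoring endpoint at `base_url`. The BOS token is stripped because servers that tokenize raw text typically prepend their own BOS, and keeping the chat template's copy would duplicate it. A hedged sketch of the request step (the `/classify` route and the response keys are assumptions about the serving engine, not shown in this diff):

```python
import requests

# Hypothetical continuation of process_convs: send the formatted texts to the
# reward server and read back one scalar reward per conversation. The route
# and response layout below are assumptions about the serving engine.
response = requests.post(f"{base_url}/classify", json=payload, timeout=60)
response.raise_for_status()
scores = [item["embedding"][0] for item in response.json()]  # assumed schema
```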
@@ -212,8 +218,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
 
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
-response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.\n2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.\n3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
+response1 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."""
+response2 = """1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."""
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]