Update pipeline tag and add library name (#3)
Update pipeline tag and add library name (1d67590cdb934819669c73b04fc99f32ada05623)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
 base_model:
 - Qwen/Qwen3-4B
-pipeline_tag: text-classification
+license: apache-2.0
+pipeline_tag: text-ranking
+library_name: transformers
 ---
+
 # Skywork-Reward-V2
 
 <div align="center">
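With `library_name: transformers` now in the metadata, the Hub knows to load this checkpoint through the standard transformers classes. A minimal loading sketch follows; the repository id is an assumption pieced together from `base_model: Qwen/Qwen3-4B` and the series name, since the diff itself never states it:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed repo id for this card (not stated in the diff): the Qwen3-4B member of the series.
model_id = "Skywork/Skywork-Reward-V2-Qwen3-4B"

# The reward model is exposed as a sequence-classification checkpoint that returns a
# single scalar score per conversation.
rm = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```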
@@ -27,7 +29,7 @@ pipeline_tag: text-classification
 **Skywork-Reward-V2** is a series of eight reward models designed for versatility across a wide range of tasks, trained on a mixture of 26 million carefully curated preference pairs. While the Skywork-Reward-V2 series remains based on the Bradley-Terry model, we push the boundaries of training data scale and quality to achieve superior performance. Compared with the first generation of Skywork-Reward, the Skywork-Reward-V2 series offers the following major improvements:
 
 - **Trained on a significantly larger and higher-quality preference data mixture**, consisting of **26 million preference pairs** curated via a large-scale human-LLM synergistic pipeline.
-- **State-of-the-art performance on seven major reward model benchmarks
+- **State-of-the-art performance on seven major reward model benchmarks** (as shown in the table below), including RewardBench v1, RewardBench v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench.
 - **Available in eight models across multiple sizes**, with the smallest 0.6B variant, *Skywork-Reward-V2-Qwen3-0.6B*, nearly matching the average performance of our previous best model, Skywork-Reward-Gemma-2-27B-v0.2. The largest 8B version, *Skywork-Reward-V2-Llama-3.1-8B*, surpasses all existing reward models across all benchmarks on average. Our top experimental model, *Skywork-Reward-V2-Llama-3.1-8B-40M*, **outperforms all existing reward models on every benchmark**.
 
 <div align="center">
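The paragraph above notes that the series remains based on the Bradley-Terry model. For readers unfamiliar with the term, the sketch below shows the generic pairwise Bradley-Terry objective that such reward models are typically trained with; it is purely illustrative and not taken from the Skywork training code:

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(chosen_scores: torch.Tensor, rejected_scores: torch.Tensor) -> torch.Tensor:
    # Under the Bradley-Terry model, P(chosen beats rejected) = sigmoid(r_chosen - r_rejected),
    # so maximizing the likelihood of the preference data minimizes -log sigmoid(r_c - r_r).
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

# Toy batch of reward scores for two preference pairs.
loss = bradley_terry_loss(torch.tensor([1.2, 0.3]), torch.tensor([-0.5, 0.1]))
print(f"Bradley-Terry loss: {loss.item():.4f}")
```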
@@ -118,8 +120,12 @@ rm = AutoModelForSequenceClassification.from_pretrained(
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples
+response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
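The hunk above restores the two candidate responses wrapped by `conv1` and `conv2` (response1 divides the 9 apples among all 3 people and is correct; response2 forgets Jane herself and divides by 2). The README's scoring code sits outside this hunk, so the following is only a sketch of how such a pair is typically scored, reusing the `rm` and `tokenizer` names from the lines quoted in the hunk header; the response strings are abbreviated and the exact calls in the card may differ:

```python
import torch

prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
# Abbreviated versions of the two candidate answers.
response1 = "12 - 4 = 8, then 8 + 1 = 9, split among 3 people: 3 apples each."
response2 = "12 - 4 = 8, then 8 + 1 = 9, split among 2 siblings: about 4 apples each."

conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]

scores = []
with torch.no_grad():
    for conv in (conv1, conv2):
        # Render the conversation with the chat template and read the single classification logit.
        input_ids = tokenizer.apply_chat_template(conv, tokenize=True, return_tensors="pt").to(rm.device)
        scores.append(rm(input_ids).logits[0][0].item())

# The correct solution (conv1) should receive the higher reward score.
print(f"response1: {scores[0]:.3f}  response2: {scores[1]:.3f}")
```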
@@ -193,7 +199,7 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
     for conv in convs:
         conv = tokenizer.apply_chat_template(conv, tokenize=False)
         if tokenizer.bos_token is not None and conv.startswith(tokenizer.bos_token):
-            conv = conv[len(tokenizer.bos_token)
+            conv = conv[len(tokenizer.bos_token):]
         convs_formatted.append(conv)
 
     payload.update({"text": convs_formatted})
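The one-character completion above (`:]`) matters because `apply_chat_template(..., tokenize=False)` already prepends the BOS token for many chat templates; if the serving side tokenizes the rendered text again with special tokens enabled, the BOS would appear twice. Below is a small, isolated illustration of the same stripping step; the model id is only an assumed example, and tokenizers that define no BOS token simply skip the branch:

```python
from transformers import AutoTokenizer

# Assumed example id; any chat model whose template prepends a BOS token behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("Skywork/Skywork-Reward-V2-Llama-3.1-8B")
conv = [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello!"}]

text = tokenizer.apply_chat_template(conv, tokenize=False)

# Strip a leading BOS token so that server-side tokenization does not duplicate it.
if tokenizer.bos_token is not None and text.startswith(tokenizer.bos_token):
    text = text[len(tokenizer.bos_token):]
```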
@@ -212,8 +218,12 @@ def process_convs(convs, base_url, tokenizer, model_name_or_path):
 
 
 prompt = "Jane has 12 apples. She gives 4 apples to her friend Mark, then buys 1 more apple, and finally splits all her apples equally among herself and her 2 siblings. How many apples does each person get?"
-response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples
+response1 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among herself and her 2 siblings (3 people in total). 9 ÷ 3 = 3 apples each. Each person gets 3 apples."
+response2 = "1. Jane starts with 12 apples and gives 4 to Mark. 12 - 4 = 8. Jane now has 8 apples.
+2. Jane buys 1 more apple. 8 + 1 = 9. Jane now has 9 apples.
+3. Jane splits the 9 apples equally among her 2 siblings (2 people in total). 9 ÷ 2 = 4.5 apples each. Each person gets 4 apples."
 
 conv1 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response1}]
 conv2 = [{"role": "user", "content": prompt}, {"role": "assistant", "content": response2}]
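This second hunk restores the same responses for the server-based path. The rest of that example is not part of the diff, but given the `process_convs(convs, base_url, tokenizer, model_name_or_path)` signature shown in the hunk headers, the pair would then be scored roughly as below; the assumption that the helper returns one score per conversation, and the variable names, are not confirmed by the diff:

```python
# `tokenizer`, `base_url`, and `model_name_or_path` are assumed to be set up earlier in the
# README's server-based example; `process_convs` is assumed to return one score per conversation.
scores = process_convs([conv1, conv2], base_url, tokenizer, model_name_or_path)

# response1 solves the problem correctly, so it should receive the higher reward.
preferred = "response1" if scores[0] > scores[1] else "response2"
print(f"scores={scores} -> preferred: {preferred}")
```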