Add placeholder link to code repository
#1 by nielsr (HF Staff) - opened

Files changed (1):
1. README.md (+51 -19)

README.md CHANGED
@@ -1,20 +1,19 @@
  ---
  library_name: transformers
  license: other
  license_name: nvidia-open-model-license
- license_link: >-
-   https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
  pipeline_tag: text-generation
- language:
- - en
  tags:
- - nvidia
- - reasoning
- - math
- - code
- - supervised fine-tuning
- - reinforcement learning
- - pytorch
  ---

  # AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
@@ -32,6 +31,8 @@ tags:

  We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B). We initiate RL training from various SFT models and find that stronger SFT models continue to produce consistently better results after large-scale RL, although the performance gap narrows during RL training. Thanks to its stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models on challenging math and code reasoning benchmarks. For more details, check our [technical report](https://arxiv.org/abs/2506.13284).

  ## Results

  We evaluate our model against competitive reasoning models of comparable size on AIME 2024, AIME 2025, and LiveCodeBench (LCB) v5 (2024/08/01 - 2025/02/01) and v6 (2025/02/01 - 2025/05/01).
@@ -91,22 +92,53 @@ math_question = "MATH_QUESTION"
  math_instruction = "Please place your final answer inside \\boxed{}."
  system_instruction = "You are a helpful and harmless assistant. You should think step-by-step."

- final_prompt = "<|im_start|>system\n" + system_instruction + "<|im_end|>\n<|im_start|>user\n" + math_question + "\n\n" + math_instruction + "<|im_end|>\n<|im_start|>assistant\n<think>\n"
  ```
  3. We recommend using the following instruction for code questions:
  ```python
  code_question = "CODE_QUESTION"
  starter_code = "STARTER_CODE" # starter code function header, set empty string ("") if there is no starter code

- code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
- code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
  if starter_code != "":
-     code_question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
-     code_question += "\n\n" + code_instruction_hasstartercode
  else:
-     code_question += "\n\n" + code_instruction_nostartercode

- final_prompt = "<|im_start|>system\n" + system_instruction + "<|im_end|>\n<|im_start|>user\n" + code_question + "<|im_end|>\n<|im_start|>assistant\n<think>\n"
  ```
  4. Our inference engine for evaluation is vLLM==0.7.3 using top-p=0.95, temperature=0.6, max_tokens=32768.
 
@@ -134,4 +166,4 @@ June 16, 2025
    journal={arXiv preprint arXiv:2506.13284},
    year={2025}
  }
- ```
 
  ---
+ language:
+ - en
  library_name: transformers
  license: other
  license_name: nvidia-open-model-license
+ license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
  pipeline_tag: text-generation
  tags:
+ - nvidia
+ - reasoning
+ - math
+ - code
+ - supervised fine-tuning
+ - reinforcement learning
+ - pytorch
  ---

  # AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy
 

  We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B). We initiate RL training from various SFT models and find that stronger SFT models continue to produce consistently better results after large-scale RL, although the performance gap narrows during RL training. Thanks to its stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models on challenging math and code reasoning benchmarks. For more details, check our [technical report](https://arxiv.org/abs/2506.13284).

+ Code: TBD.
+
  ## Results

  We evaluate our model against competitive reasoning models of comparable size on AIME 2024, AIME 2025, and LiveCodeBench (LCB) v5 (2024/08/01 - 2025/02/01) and v6 (2025/02/01 - 2025/05/01).
 
  math_instruction = "Please place your final answer inside \\boxed{}."
  system_instruction = "You are a helpful and harmless assistant. You should think step-by-step."

+ final_prompt = "<|im_start|>system\n" + system_instruction + "<|im_end|>\n<|im_start|>user\n" + math_question + "\n\n" + math_instruction + "<|im_end|>\n<|im_start|>assistant\n<think>\n"
  ```
  3. We recommend using the following instruction for code questions:
  ```python
  code_question = "CODE_QUESTION"
  starter_code = "STARTER_CODE" # starter code function header, set empty string ("") if there is no starter code

+ code_instruction_nostartercode = """Write Python code to solve the problem. Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
+ code_instruction_hasstartercode = """Please place the solution code in the following format:\n```python\n# Your solution code here\n```"""
  if starter_code != "":
+     code_question += "\n\n" + "Solve the problem starting with the provided function header.\n\nFunction header:\n" + "```\n" + starter_code + "\n```"
+     code_question += "\n\n" + code_instruction_hasstartercode
  else:
+     code_question += "\n\n" + code_instruction_nostartercode

+ final_prompt = "<|im_start|>system\n" + system_instruction + "<|im_end|>\n<|im_start|>user\n" + code_question + "<|im_end|>\n<|im_start|>assistant\n<think>\n"
  ```
  4. Our inference engine for evaluation is vLLM==0.7.3 using top-p=0.95, temperature=0.6, max_tokens=32768.

    journal={arXiv preprint arXiv:2506.13284},
    year={2025}
  }
+ ```
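
For reference, the code-question prompt template in the README above can be exercised end-to-end once the snippets are valid Python. A minimal sketch, assuming placeholder values for the question and starter code (no model call is made; with vLLM, `final_prompt` would be passed to `LLM.generate` with the sampling settings the README recommends):

```python
# Placeholder inputs (assumptions, not from the model card).
code_question = "Return the sum of a list of integers."
starter_code = "def solve(nums):"  # set "" if there is no starter code
system_instruction = "You are a helpful and harmless assistant. You should think step-by-step."

# The two instruction variants from the README, as escaped one-liners.
code_instruction_nostartercode = (
    "Write Python code to solve the problem. Please place the solution code "
    "in the following format:\n```python\n# Your solution code here\n```"
)
code_instruction_hasstartercode = (
    "Please place the solution code in the following format:\n"
    "```python\n# Your solution code here\n```"
)

# Branch on whether a starter code (function header) was provided.
if starter_code != "":
    code_question += (
        "\n\nSolve the problem starting with the provided function header."
        "\n\nFunction header:\n```\n" + starter_code + "\n```"
    )
    code_question += "\n\n" + code_instruction_hasstartercode
else:
    code_question += "\n\n" + code_instruction_nostartercode

# Assemble the chat-style prompt: system block, user block,
# then an open assistant block that starts the <think> section.
final_prompt = (
    "<|im_start|>system\n" + system_instruction + "<|im_end|>\n"
    "<|im_start|>user\n" + code_question + "<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"
)
```

Setting `starter_code = ""` takes the other branch and appends `code_instruction_nostartercode` instead.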