Commit 009977d (verified) by zihanliu · 1 parent: 0335b64

Update README.md

  1. README.md +9 -2
README.md CHANGED
@@ -21,7 +21,7 @@ tags:
21
  We’re thrilled to introduce AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning (RL), starting from the Deepseek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% Pass@1 accuracy on AIME 2024 (+13.5% gain) and 53.6% Pass@1 accuracy on AIME 2025 (+14.4% gain).
22
  Interestingly, this math-focused RL training also improves the model’s coding accuracy on LiveCodeBench, reaching 44.4% Pass@1 (+6.8% gain), demonstrating the generalization capabilities of scaled RL training.
23
 
24
- We share our training recipe, training logs, and data curation details in our [BLOG](LINK).
25
 
26
 
27
  ## Results
@@ -76,8 +76,15 @@ generated_ids = [
76
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
77
  ```
78
 
79
  ## Correspondence to
80
- Yang Chen, Zihan Liu, Chankyu Lee, Wei Ping
81
 
82
 
83
  ## License
 
21
  We’re thrilled to introduce AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning (RL), starting from the Deepseek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% Pass@1 accuracy on AIME 2024 (+13.5% gain) and 53.6% Pass@1 accuracy on AIME 2025 (+14.4% gain).
22
  Interestingly, this math-focused RL training also improves the model’s coding accuracy on LiveCodeBench, reaching 44.4% Pass@1 (+6.8% gain), demonstrating the generalization capabilities of scaled RL training.
23
 
24
+ We share our training recipe, training logs, and data curation details in our [BLOG](https://research.nvidia.com/labs/adlr/acemath_rl/).
25
 
26
 
27
  ## Results
 
76
  response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
77
  ```
78
 
79
+
80
+ ## Usage Recommendations
81
+
82
+ 1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
83
+ 2. We recommend using the following prompt format for math questions:<br>*<|begin▁of▁sentence|><|User|>{math_question}\nPlease reason step by step, and put your final answer within \boxed{}.<|Assistant|><think>\n*
84
+
85
+
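The recommended prompt format above can be assembled programmatically. The following is a minimal sketch, not part of the committed README: `build_math_prompt` and the constant names are hypothetical, and the special-token strings are copied verbatim from the recommendation, so they should be checked against the tokenizer's actual vocabulary before use.

```python
# Special tokens copied verbatim from the recommended prompt format.
# Assumption: these strings match the tokenizer's vocabulary exactly.
BOS = "<|begin▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"

# Instruction appended to every math question, per recommendation 2.
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(math_question: str) -> str:
    """Hypothetical helper: wrap a question in the recommended format.

    No system prompt is added (recommendation 1); all instructions go
    into the user turn, and the assistant turn is opened with "<think>".
    """
    return f"{BOS}{USER}{math_question}\n{INSTRUCTION}{ASSISTANT}<think>\n"


prompt = build_math_prompt("What is the sum of the first 10 positive integers?")
```

Because the string already begins with the begin-of-sentence token, it may be necessary to tokenize it without adding special tokens again (for example `add_special_tokens=False` in Hugging Face tokenizers) to avoid a duplicated token; this depends on the tokenizer configuration and is an assumption here.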
86
  ## Correspondence to
87
+ Yang Chen ([email protected]),<br>Zihan Liu ([email protected]),<br>Chankyu Lee ([email protected]),<br>Wei Ping ([email protected])
88
 
89
 
90
  ## License