nvidia/AceMath-RL-Nemotron-7B

zihanliu committed on Apr 23

Commit 009977d · verified · 1 Parent(s): 0335b64

Update README.md

Files changed (1)

README.md +9 -2

@@ -21,7 +21,7 @@ tags:
 We’re thrilled to introduce AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning (RL), starting from the Deepseek-R1-Distilled-Qwen-7B. It delivers impressive results, achieving 69.0% Pass@1 accuracy on AIME 2024 (+13.5% gain) and 53.6% Pass@1 accuracy on AIME 2025 (+14.4% gain).
 Interestingly, this math-focused RL training also improves the model’s coding accuracy on LiveCodeBench, reaching 44.4% Pass@1 (+6.8% gain), demonstrating the generalization capabilities of scaled RL training.
-We share our training recipe, training logs, and data curation details in our [BLOG](LINK).
+We share our training recipe, training logs, and data curation details in our [BLOG](https://research.nvidia.com/labs/adlr/acemath_rl/).
 ## Results
@@ -76,8 +76,15 @@ generated_ids = [
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 ```
+## Usage Recommendations
+1. Don't include a system prompt; instead, place all instructions directly in the user prompt.
+2. We recommend using the following prompt format for math questions:<br>*<｜begin▁of▁sentence｜><｜User｜>{math_question}\nPlease reason step by step, and put your final answer within \boxed{}.<｜Assistant｜><think>\n*
 ## Correspondence to
-Yang Chen, Zihan Liu, Chankyu Lee, Wei Ping
+Yang Chen ([email protected]),<br>Zihan Liu ([email protected]),<br>Chankyu Lee ([email protected]),<br>Wei Ping ([email protected])
 ## License
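
For illustration, the prompt format recommended in this change could be assembled with plain string formatting, as in the minimal sketch below. The helper name `build_math_prompt` and the constant names are hypothetical and not part of the README; the special-token strings are copied verbatim from the recommendation. The sketch uses no system prompt, in line with recommendation 1.

```python
# Hypothetical helper illustrating the recommended math prompt format.
# The token strings are copied from the README change; the function and
# constant names are assumptions made for this example only.
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"
INSTRUCTION = "Please reason step by step, and put your final answer within \\boxed{}."


def build_math_prompt(math_question: str) -> str:
    """Return a single-turn prompt with no system prompt.

    All instructions sit in the user turn, and the prompt ends with
    an opening <think> tag followed by a newline.
    """
    return f"{BOS}{USER}{math_question}\n{INSTRUCTION}{ASSISTANT}<think>\n"


prompt = build_math_prompt("What is the sum of the first 10 positive integers?")
print(prompt)
```

If this string is passed to a tokenizer that also prepends a beginning-of-sentence token on its own, the token may end up duplicated, so it is worth checking the tokenizer's behavior before relying on this format as written.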