Update README.md
--- a/README.md
+++ b/README.md
@@ -30,7 +30,7 @@ tags:

 <img src="fig/main_fig.png" alt="main_fig" style="width: 1000px; max-width: 100%;" />

-We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B)
+We're thrilled to introduce [AceReason-Nemotron-1.1-7B](https://huggingface.co/nvidia/AceReason-Nemotron-1.1-7B), a math and code reasoning model built upon the Qwen2.5-Math-7B base. The model is first trained with supervised fine-tuning (SFT) on math and code tasks, then further enhanced through reinforcement learning (RL) using the same recipe as [AceReason-Nemotron-1.0-7B](https://huggingface.co/nvidia/AceReason-Nemotron-7B). We initiate RL training from various SFT models and find that stronger SFT models continue to produce consistently better results after large-scale RL, although the performance gap narrows during RL training. Thanks to its stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models on challenging math and code reasoning benchmarks. For more details, check our [technical report](https://arxiv.org/abs/2506.13284).

 ## Results
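Since the updated paragraph points readers to the released checkpoint on the Hugging Face Hub, here is a minimal inference sketch using the standard `transformers` API. The model ID comes from the link above; the dtype, prompt, and generation settings are illustrative assumptions, not the authors' recommended configuration.

```python
# Minimal sketch: load nvidia/AceReason-Nemotron-1.1-7B from the Hub and run
# one math prompt. dtype, prompt, and max_new_tokens are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/AceReason-Nemotron-1.1-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; pick what your hardware supports
    device_map="auto",
)

# Chat-style prompt via the tokenizer's chat template (assumed to be defined
# for this checkpoint, as is typical for instruction-tuned reasoning models).
messages = [{"role": "user", "content": "Solve: what is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```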