This model was converted to GGUF format from [`nvidia/AceReason-Nemotron-14B`](https://huggingface.co/nvidia/AceReason-Nemotron-14B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/nvidia/AceReason-Nemotron-14B) for more details on the model.

---

We're thrilled to introduce AceReason-Nemotron-14B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It delivers impressive results, achieving 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025 (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6 (+7%), and a 2024 rating on Codeforces (+543). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks, but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
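```bash
brew install llama.cpp
```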
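Once installed, you can run the model with the llama.cpp CLI or server. A minimal sketch follows; the `--hf-repo` and `--hf-file` values below are placeholders, since the exact repo path and quantization filename for this conversion aren't shown here:

```bash
# CLI: fetch the GGUF from the Hugging Face Hub and run a one-off prompt
# (replace <this-repo> and <quant-file> with the actual repo and file names)
llama-cli --hf-repo <this-repo>/AceReason-Nemotron-14B-GGUF --hf-file <quant-file>.gguf -p "The meaning to life and the universe is"

# Server: expose an OpenAI-compatible HTTP endpoint with a 2048-token context
llama-server --hf-repo <this-repo>/AceReason-Nemotron-14B-GGUF --hf-file <quant-file>.gguf -c 2048
```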