Triangle104 committed · Commit adff63a · verified · Parent: e1dda08

Update README.md

Files changed (1): README.md (+20 -0)
README.md CHANGED
@@ -22,6 +22,26 @@ base_model: nvidia/AceReason-Nemotron-14B
  This model was converted to GGUF format from [`nvidia/AceReason-Nemotron-14B`](https://huggingface.co/nvidia/AceReason-Nemotron-14B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/nvidia/AceReason-Nemotron-14B) for more details on the model.

+ ---
+ We're thrilled to introduce AceReason-Nemotron-14B, a math and code
+ reasoning model trained entirely through reinforcement learning (RL),
+ starting from DeepSeek-R1-Distilled-Qwen-14B. It delivers impressive
+ results, achieving 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025
+ (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6
+ (+7%), and a 2024 rating on Codeforces (+543). We systematically study
+ the RL training process through extensive ablations and propose a
+ simple yet effective approach: first RL training on math-only prompts,
+ then RL training on code-only prompts. Notably, we find that math-only
+ RL not only significantly enhances the performance of strong distilled
+ models on math benchmarks, but also on code reasoning tasks. In
+ addition, extended code-only RL further improves code benchmark
+ performance while causing minimal degradation in math results. We find
+ that RL not only elicits the foundational reasoning capabilities
+ acquired during pre-training and supervised fine-tuning (e.g.,
+ distillation), but also pushes the limits of the model's reasoning
+ ability, enabling it to solve problems that were previously unsolvable.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
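
A minimal sketch of the brew-based workflow this section introduces. The values passed to `--hf-repo` and `--hf-file` below are hypothetical placeholders; the actual quantized filename for this repo is not shown in this commit:

```bash
# Install llama.cpp via Homebrew (macOS and Linux)
brew install llama.cpp

# Run the CLI, fetching the GGUF file straight from the Hugging Face Hub.
# Repo and file names are illustrative assumptions, not taken from this commit.
llama-cli --hf-repo Triangle104/AceReason-Nemotron-14B-GGUF \
  --hf-file acereason-nemotron-14b-q4_k_m.gguf \
  -p "Prove that the square root of 2 is irrational."

# Or start an HTTP server for the same model, with a 2048-token context
llama-server --hf-repo Triangle104/AceReason-Nemotron-14B-GGUF \
  --hf-file acereason-nemotron-14b-q4_k_m.gguf \
  -c 2048
```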