This model was converted to GGUF format from [`nvidia/AceReason-Nemotron-14B`](https://huggingface.co/nvidia/AceReason-Nemotron-14B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.

Refer to the [original model card](https://huggingface.co/nvidia/AceReason-Nemotron-14B) for more details on the model.

---

We're thrilled to introduce AceReason-Nemotron-14B, a math and code reasoning model trained entirely through reinforcement learning (RL), starting from DeepSeek-R1-Distilled-Qwen-14B. It delivers impressive results, achieving 78.6% on AIME 2024 (+8.9%), 67.4% on AIME 2025 (+17.4%), 61.1% on LiveCodeBench v5 (+8%), 54.9% on LiveCodeBench v6 (+7%), and a 2024 rating on Codeforces (+543). We systematically study the RL training process through extensive ablations and propose a simple yet effective approach: first RL training on math-only prompts, then RL training on code-only prompts. Notably, we find that math-only RL not only significantly enhances the performance of strong distilled models on math benchmarks, but also on code reasoning tasks. In addition, extended code-only RL further improves code benchmark performance while causing minimal degradation in math results. We find that RL not only elicits the foundational reasoning capabilities acquired during pre-training and supervised fine-tuning (e.g., distillation), but also pushes the limits of the model's reasoning ability, enabling it to solve problems that were previously unsolvable.

---
## Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux):
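```bash
brew install llama.cpp
```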
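Once installed, you can run the model with the llama.cpp CLI or server. A minimal sketch follows; the `--hf-repo` and `--hf-file` values below are placeholders, since the exact repo path and quantization filename for this conversion aren't shown here:

```bash
# CLI: fetch the GGUF from the Hugging Face Hub and run a one-off prompt
# (replace <this-repo> and <quant-file> with the actual repo and file names)
llama-cli --hf-repo <this-repo>/AceReason-Nemotron-14B-GGUF --hf-file <quant-file>.gguf -p "The meaning to life and the universe is"

# Server: expose an OpenAI-compatible HTTP endpoint with a 2048-token context
llama-server --hf-repo <this-repo>/AceReason-Nemotron-14B-GGUF --hf-file <quant-file>.gguf -c 2048
```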