Text Generation · Transformers · Safetensors · qwen2 · conversational · text-generation-inference
Commit d51df0a by quyanh (verified) · 1 parent: 3aeaea7

update README

.gitattributes CHANGED
@@ -34,3 +34,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ assets/costs-1.5b.png filter=lfs diff=lfs merge=lfs -text
+ assets/costs-7b.png filter=lfs diff=lfs merge=lfs -text
+ assets/performances.png filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Knovel Engineering
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ pipeline_tag: text-generation
+ inference: true
+ license: mit
+ datasets:
+ - knoveleng/open-rs
+ - knoveleng/open-s1
+ - knoveleng/open-deepscaler
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ ---
+
+ # Model Summary
+
+ This repository hosts a model from the **Open RS** project, which accompanies the paper *Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn’t*. The project explores enhancing the reasoning capabilities of small large language models (LLMs) with reinforcement learning (RL) under resource-constrained conditions.
+
+ We focus on a 1.5-billion-parameter model, `DeepSeek-R1-Distill-Qwen-1.5B`, fine-tuned on 4 NVIDIA A40 GPUs (48 GB VRAM each) within 24 hours. By adapting the Group Relative Policy Optimization (GRPO) algorithm and leveraging a curated, compact mathematical reasoning dataset, we conducted three experiments to assess performance and behavior. Key findings include:
+
+ - Significant reasoning improvements, e.g., AMC23 accuracy rising from 63% to 80% and AIME24 accuracy reaching 46.7%, outperforming `o1-preview`.
+ - Efficient training with just 7,000 samples at a cost of about $42, compared to thousands of dollars for baseline models.
+ - Challenges such as optimization instability and length constraints during extended training.
+
+ These results show that RL-based fine-tuning is a cost-effective approach for small LLMs, making reasoning capabilities accessible in resource-limited settings. We open-source our code, models, and datasets to support further research.
+
+ For more details, please refer to our [GitHub repository](https://github.com/knoveleng/open-rs).
+
+ ## Evaluation
+ ### Performance Highlights
+ - **Open-RS1**: 53.0% avg. score
+ - **Open-RS2**: 55.7% avg. score, 80.0% on AMC23
+ - **Open-RS3**: 56.3% avg. score, 46.7% on AIME24 (outperforms `o1-preview` at 44.6%)
+ - Competitive MATH-500 scores; Minerva scores lag behind those of 7B models.
+
+ ![Performance Metrics](assets/performances.png)
+
+ ### Cost Efficiency
+ Our approach uses 7,000 samples (42,000 total outputs) and costs ~$42 on 4x A40 GPUs in 24 hours, compared to:
+ - 7B models: `Qwen2.5-7B-SimpleRL` ($1,633), `Eurus-2-7B-PRIME` ($1,088)
+ - 1.5B models: `DeepScaleR-1.5B-Preview` ($3,629), `Still-3-1.5B-Preview` ($2,268)
+
+ ![7B Model Costs](assets/costs-7b.png)
+ ![1.5B Model Costs](assets/costs-1.5b.png)
+
+
+ ## Citation
+ If this project aids your work, please cite it as:
+ ```
+ @misc{open-rs,
+ title = {Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't},
+ url = {https://github.com/knoveleng/open-rs},
+ author = {Quy-Anh Dang and Chris Ngo},
+ month = {March},
+ year = {2025}
+ }
+ ```
+
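The README credits an adapted Group Relative Policy Optimization (GRPO) algorithm for the gains but does not spell out the update itself. The sketch below illustrates only GRPO's defining step as introduced in the DeepSeekMath work: sample a group of completions for one prompt, then normalize each completion's reward against the group's mean and standard deviation, so no learned value model is needed to estimate advantages. It is not the Open RS training code. The function name `group_relative_advantages`, the epsilon of `1e-4`, the use of the population standard deviation, and the binary correctness rewards are all illustrative assumptions. The group size of 6 is inferred from the README's ratio of 42,000 outputs to 7,000 samples and may not match the actual training configuration.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against its own group (GRPO-style outcome advantage).

    Sketch only: population std and eps=1e-4 are assumptions,
    not values taken from the Open RS code.
    """
    if not rewards:
        raise ValueError("a group needs at least one sampled completion")
    mean = fmean(rewards)
    std = pstdev(rewards)
    # Completions scoring above the group mean get positive advantages and
    # those below get negative ones; eps avoids division by zero on ties.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of 6 completions for one prompt, rewarded 1.0 when the
# final answer is correct and 0.0 otherwise (the binary reward is an assumption).
rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

With half the group correct, each correct completion receives an advantage of roughly +1 and each incorrect one roughly -1. A group in which every completion earns the same reward yields all-zero advantages, so that prompt contributes no learning signal for the step.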
assets/costs-1.5b.png ADDED

Git LFS Details

  • SHA256: 82f1616c5dee41c7cc602e6b26cd9218ff40ad91b3e1b09d9b863271ab073c57
  • Pointer size: 131 Bytes
  • Size of remote file: 107 kB
assets/costs-7b.png ADDED

Git LFS Details

  • SHA256: ff73ad00a6b2e22def8c2573bcc5bd9c893ddb0f6d5d7b5493fb462a0616a6e3
  • Pointer size: 131 Bytes
  • Size of remote file: 139 kB
assets/performances.png ADDED

Git LFS Details

  • SHA256: cd2ffb83740832bb69af3317afa162211499c20f297483a13958fbac92838168
  • Pointer size: 131 Bytes
  • Size of remote file: 281 kB
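The three images added in this commit are routed through Git LFS by the new `.gitattributes` lines, so the repository stores only a small pointer file while the SHA256 listed for each image identifies the real content. The sketch below assumes the Git LFS v1 pointer layout (a `version` line first, then `oid sha256:<hex digest>` and `size <bytes>` in sorted key order) and shows how such a pointer is derived from file bytes and how a downloaded copy could be checked against a listed digest. The helper names `lfs_pointer` and `matches_oid` are hypothetical, and because the sizes on this page are rounded, the example uses a synthetic 107,000-byte payload instead of the real file.

```python
import hashlib
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(data: bytes) -> str:
    """Build a Git LFS v1 pointer for the given file contents."""
    oid = hashlib.sha256(data).hexdigest()
    # 'version' comes first; the remaining keys follow in sorted order (oid, size).
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"


def matches_oid(path: Path, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Hash a local file in chunks and compare it with a listed SHA256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()


# Synthetic payload of roughly the size reported for assets/costs-1.5b.png.
payload = b"\x00" * 107_000
pointer = lfs_pointer(payload)
print(pointer)
print(len(pointer.encode()), "bytes")
```

Any file whose size has six decimal digits produces a 131-byte pointer under this layout, which is consistent with the pointer size reported for all three images. To check a real download, one could call `matches_oid(Path("assets/costs-1.5b.png"), "82f1616c5dee41c7cc602e6b26cd9218ff40ad91b3e1b09d9b863271ab073c57")`, assuming the file sits at that relative path.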