nlile committed on
Commit c88b0de · verified · 1 Parent(s): 1e38d04

Create README.md

Files changed (1)
  1. README.md +35 -0
README.md ADDED
@@ -0,0 +1,35 @@
+ ---
+ license: apache-2.0
+ tags:
+ - reasoning
+ - mathematics
+ - reinforcement-learning
+ datasets:
+ - AIME
+ - AMC
+ - Omni-Math
+ base_model: R1-Distill-Qwen-1.5B
+ ---
+
+ # ALP_R1_Qwen1.5B
+
+ R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP) - reduces token usage by ~50% while maintaining performance.
+
+ ## Training
+ - 100 steps GRPO, batch 512, LR 1e-6, β=1e-7
+ - 16 rollouts/prompt for difficulty estimation
+ - 8K context window
+
+ ## Performance (Pass@1)
+ - MATH-500: 0.81
+ - AIME: 0.252
+ - OlympiadBench: 0.51
+
+ ## Token Usage
+ - MATH: 2804→862 (-69%)
+ - AIME: 4007→3331 (-17%)
+ - Olympiad: 3606→2107 (-42%)
+
+ ## Usage
+ ```python
+ prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
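
The Usage snippet added in this commit only defines the prompt string and leaves `problem` undefined. The following is a minimal sketch of how that template might be wrapped and how a final answer could be read back from a completion. The helper names `build_prompt` and `extract_boxed` are hypothetical and not part of the README, and the brace-matching extraction is one reasonable approach, not something the commit specifies.

```python
# Suffix taken verbatim from the README's prompt template; in the original
# f-string, "{{}}" renders as a literal "{}".
PROMPT_SUFFIX = " Let's think step by step and output the final answer within \\boxed{}."


def build_prompt(problem: str) -> str:
    """Append the README's instruction suffix to a problem statement."""
    return f"{problem}{PROMPT_SUFFIX}"


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None.

    Hypothetical helper: tracks brace depth so nested LaTeX such as
    \\boxed{\\frac{1}{2}} is returned whole.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced, e.g. a truncated completion
```

With the 8K context window listed under Training, a completion can be cut off before the closing brace; in that case `extract_boxed` returns `None` and does not guess at a partial answer.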