---
license: apache-2.0
tags:
- reasoning
- mathematics
- reinforcement-learning
datasets:
- AIME
- AMC
- Omni-Math
base_model: R1-Distill-Qwen-1.5B
---
# ALP_R1_Qwen1.5B
R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP), which reduces token usage by ~50% on average while maintaining performance.
## Training
- 100 steps GRPO, batch 512, LR 1e-6, β=1e-7
- 16 rollouts/prompt for difficulty estimation
- 8K context window
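The training recipe above pairs GRPO with a difficulty-adaptive length penalty: the 16 rollouts per prompt give an empirical solve rate, and easier prompts (higher solve rate) are penalized more per token. The sketch below illustrates that idea; the function names, the linear penalty form, and the `alpha` coefficient are illustrative assumptions, not the exact training objective.

```python
def solve_rate(rollout_results):
    # Empirical solve rate from N rollouts per prompt (N=16 per the card);
    # rollout_results is a list of 0/1 correctness indicators.
    return sum(rollout_results) / len(rollout_results)

def alp_reward(correct, num_tokens, p_solve, max_tokens=8192, alpha=1.0):
    # Sketch of a difficulty-adaptive length penalty (assumed linear form):
    # easy prompts (high solve rate) pay a larger per-token cost, so the
    # model learns to answer them concisely; hard prompts keep their budget.
    base = 1.0 if correct else 0.0
    penalty = alpha * p_solve * (num_tokens / max_tokens)
    return base - penalty
```

Under this shaping, a long correct answer on an easy prompt earns less reward than a short one, while on a prompt the model never solves (solve rate 0) length is not penalized at all.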
## Performance (Pass@1)
- MATH-500: 0.81
- AIME: 0.252
- OlympiadBench: 0.51
## Token Usage
- MATH: 2804→862 (-69%)
- AIME: 4007→3331 (-17%)
- Olympiad: 3606→2107 (-42%)
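The percentage reductions above follow directly from the before/after mean token counts:

```python
# (before, after) mean token counts from the table above
token_usage = {
    "MATH": (2804, 862),
    "AIME": (4007, 3331),
    "Olympiad": (3606, 2107),
}
# Percentage reduction, rounded to the nearest integer
reductions = {
    name: round(100 * (before - after) / before)
    for name, (before, after) in token_usage.items()
}
print(reductions)  # {'MATH': 69, 'AIME': 17, 'Olympiad': 42}
```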
## Usage
```python
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```
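A self-contained sketch of the same prompt construction; the `build_prompt` wrapper is illustrative and not part of the release:

```python
def build_prompt(problem: str) -> str:
    # Wrap a math problem in the prompt format shown in the Usage section.
    return (
        f"{problem} Let's think step by step and output "
        f"the final answer within \\boxed{{}}."
    )

prompt = build_prompt("If 2x + 3 = 11, what is x?")
```

The resulting string can be passed to any standard generation API (e.g. `transformers` or vLLM) with the model's 8K context window in mind.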