SynthLabsAI/ALP_R1_Qwen1.5B

Reinforcement Learning

nlile committed 2 days ago

Commit c88b0de · verified · 1 parent: 1e38d04

Create README.md

Files changed (1)

README.md +35 -0

README.md ADDED

	@@ -0,0 +1,35 @@

+---
+license: apache-2.0
+tags:
+- reasoning
+- mathematics
+- reinforcement-learning
+datasets:
+- AIME
+- AMC
+- Omni-Math
+base_model: R1-Distill-Qwen-1.5B
+---
+# ALP_R1_Qwen1.5B
+R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP), which reduces token usage by ~50% while maintaining performance.
+## Training
+- 100 GRPO steps, batch size 512, learning rate 1e-6, β=1e-7
+- 16 rollouts/prompt for difficulty estimation
+- 8K context window
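The README does not spell out how the 16 rollouts feed into the penalty, so the following is only a minimal sketch of one plausible reading. It assumes that difficulty is estimated as the fraction of rollouts that solve a prompt, that β is the length-penalty coefficient, and that the penalty scales with that solve rate (floored at 1/K). The function names `solve_rate` and `penalized_reward` are hypothetical and not taken from the training code.

```python
from typing import Sequence


def solve_rate(correct: Sequence[bool]) -> float:
    """Fraction of rollouts for one prompt that reached the right answer."""
    if not correct:
        raise ValueError("need at least one rollout")
    return sum(correct) / len(correct)


def penalized_reward(is_correct: bool, n_tokens: int, rate: float,
                     beta: float = 1e-7, k: int = 16) -> float:
    """Correctness reward minus a length penalty (assumed form, not confirmed)."""
    # Assumption: easy prompts (high solve rate) get a heavier per-token penalty.
    # The 1/k floor keeps a small penalty even when no rollout succeeds.
    weight = max(rate, 1.0 / k)
    return float(is_correct) - beta * n_tokens * weight


# 12 of 16 rollouts correct gives an estimated solve rate of 0.75.
rate = solve_rate([True] * 12 + [False] * 4)
reward = penalized_reward(True, n_tokens=1000, rate=rate)
```

Under this assumed form, a correct 1000-token answer on an easy prompt is penalized more than the same answer on a prompt that no rollout solved, which is what would push the model to shorten reasoning mainly where it is not needed.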
+## Performance (Pass@1)
+- MATH-500: 0.81
+- AIME: 0.252
+- OlympiadBench: 0.51
+## Token Usage
+- MATH: 2804→862 (-69%)
+- AIME: 4007→3331 (-17%)
+- Olympiad: 3606→2107 (-42%)
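The percentages above follow directly from the before and after token counts. A short check, using a hypothetical helper `percent_change` and only the numbers listed in this section:

```python
def percent_change(before: int, after: int) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round(100 * (after - before) / before)


# (before, after) token counts as listed in the Token Usage section.
usage = {
    "MATH": (2804, 862),
    "AIME": (4007, 3331),
    "Olympiad": (3606, 2107),
}

changes = {name: percent_change(b, a) for name, (b, a) in usage.items()}
for name, pct in changes.items():
    print(f"{name}: {pct}%")
```

This reproduces the listed -69%, -17%, and -42%.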
+## Usage
+```python
+prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."