|
---
license: apache-2.0
tags:
- reasoning
- mathematics
- reinforcement-learning
datasets:
- AIME
- AMC
- Omni-Math
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
---
|
|
|
# ALP_R1_Qwen1.5B |
|
|
|
DeepSeek-R1-Distill-Qwen-1.5B trained with an Adaptive Length Penalty (ALP), which reduces token usage by roughly 50% while maintaining performance.
|
|
|
## Training |
|
- 100 GRPO steps, batch size 512, learning rate 1e-6, KL coefficient β = 1e-7 (see the config sketch below)

- 16 rollouts per prompt for difficulty estimation

- 8K context window
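

The card does not name the training stack; as a hypothetical illustration, the hyperparameters above might map onto TRL's `GRPOConfig` roughly as follows (the parameter names are TRL's, not confirmed by this card):

```python
# Hypothetical sketch: mapping the reported hyperparameters onto TRL's GRPOConfig.
# The card does not specify the training framework, so treat this as illustrative only.
from trl import GRPOConfig

config = GRPOConfig(
    max_steps=100,                    # 100 GRPO steps
    per_device_train_batch_size=512,  # batch of 512 (device sharding not specified)
    learning_rate=1e-6,
    beta=1e-7,                        # KL penalty coefficient β
    num_generations=16,               # 16 rollouts per prompt for difficulty estimation
    max_completion_length=8192,       # 8K context window
)
```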
|
|
|
## Performance (Pass@1) |
|
- MATH-500: 0.81 |
|
- AIME: 0.252 |
|
- OlympiadBench: 0.51 |
|
|
|
## Token Usage (average tokens per response, before → after ALP)

- MATH-500: 2804 → 862 (-69%)

- AIME: 4007 → 3331 (-17%)

- OlympiadBench: 3606 → 2107 (-42%)
|
|
|
## Usage |
|
```python
# Example problem; the model expects the boxed-answer instruction it was trained with.
problem = "What is the sum of the first 100 positive integers?"
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```
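

A minimal end-to-end inference sketch with Hugging Face Transformers; the repo id below is a placeholder, so substitute this model's actual path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ALP_R1_Qwen1.5B"  # placeholder repo id; replace with the actual model path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

problem = "What is the sum of the first 100 positive integers?"
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8192, do_sample=True, temperature=0.6)

# Print only the newly generated tokens (the model's reasoning and boxed answer).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```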