SynthLabsAI
/

ALP_R1_Qwen1.5B

Reinforcement Learning

Model card Files Files and versions Community

ALP_R1_Qwen1.5B

R1-Distill-Qwen-1.5B trained with Adaptive Length Penalty (ALP) - reduces token usage by ~50% while maintaining performance.

Training

100 steps GRPO, batch 512, LR 1e-6, β=1e-7
16 rollouts/prompt for difficulty estimation
8K context window

Performance (Pass@1)

MATH-500: 0.81
AIME: 0.252
OlympiadBench: 0.51

Token Usage

MATH: 2804→862 (-69%)
AIME: 4007→3331 (-17%)
Olympiad: 3606→2107 (-42%)

Usage

prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."

Downloads last month: 260

Safetensors

Model size

1.78B params

Tensor type

BF16

·

Video Preview

Reinforcement Learning

loading

Model tree for SynthLabsAI/ALP_R1_Qwen1.5B

Quantizations

1 model

Collection including SynthLabsAI/ALP_R1_Qwen1.5B

Adaptive Length Penalty

Teaching language models to think efficiently with Adaptive Length Penalty (ALP) • 3 items • Updated 2 days ago • 1