ALP_DeepScaleR_1.5B_C16K

DeepScaleR-1.5B trained with an Adaptive Length Penalty (ALP), which cuts token usage by roughly 50% on average while maintaining performance.

Training

  • 100 steps of GRPO with batch size 512, learning rate 1e-6, KL coefficient β = 1e-7
  • 16 rollouts per prompt for online difficulty estimation (see the sketch below)
  • 16K-token context window
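
The reward shaping is not spelled out in this card; the following is a minimal sketch of an adaptive length penalty, assuming each prompt's penalty is scaled by the solve rate estimated from its 16 rollouts (prompts the model already solves reliably are pushed to answer briefly, while hard prompts keep their token budget). The function name, alpha, and the length normalization are illustrative, not the released training code.

from typing import List

def alp_rewards(correct: List[bool], lengths: List[int],
                max_len: int = 16384, alpha: float = 0.1) -> List[float]:
    """Per-rollout reward: correctness minus a solve-rate-scaled length cost."""
    solve_rate = sum(correct) / len(correct)  # difficulty estimate from the 16 rollouts
    return [float(ok) - alpha * solve_rate * (n / max_len)  # easy prompts pay more per token
            for ok, n in zip(correct, lengths)]

# e.g. a prompt solved in 12/16 rollouts gets a stronger length penalty
# than one solved in only 2/16.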

Performance (Pass@1)

  • MATH-500: 0.80
  • AIME: 0.24
  • OlympiadBench: 0.51
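
Pass@1 here is the usual single-sample success rate: estimated from n samples per problem with c correct, it is simply c/n (the k = 1 case of the standard pass@k estimator). A minimal helper, for reference:

def pass_at_1(num_correct: int, num_samples: int) -> float:
    """Fraction of sampled completions that are correct (pass@k with k = 1)."""
    return num_correct / num_samples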

Token Usage (before → after ALP)

  • MATH-500: 2326 → 646 tokens (-72%)
  • AIME: 3906 → 2254 tokens (-42%)
  • OlympiadBench: 3309 → 2107 tokens (-36%)

Usage

problem = "..."  # your math problem
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
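
A fuller, self-contained example, as a minimal sketch using the standard Hugging Face transformers causal-LM API; the sampling settings and max_new_tokens are illustrative assumptions, not values from this card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SynthLabsAI/ALP_DeepScaleR_1.5B_C16K"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

problem = "What is the sum of the first 10 positive integers?"  # illustrative problem
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=4096, do_sample=True,
    temperature=0.6, top_p=0.95,  # illustrative sampling settings
)
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
print(f"Completion tokens: {len(completion)}")  # ALP trains for shorter completions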
