---
license: apache-2.0
tags:
  - reasoning
  - mathematics
  - reinforcement-learning
datasets:
  - AIME
  - AMC
  - Omni-Math
base_model: R1-Distill-Qwen-1.5B
---

# ALP_R1_Qwen1.5B

R1-Distill-Qwen-1.5B fine-tuned with an Adaptive Length Penalty (ALP), which reduces token usage by roughly 50% on average while maintaining benchmark performance.

## Training

- 100 GRPO steps, batch size 512, learning rate 1e-6, KL coefficient β = 1e-7
- 16 rollouts per prompt for difficulty estimation
- 8K-token context window
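The hyperparameters above can be collected into a single configuration object. This is only an illustrative sketch; the field names are assumptions and are not taken from the actual training code:

```python
# Illustrative GRPO/ALP training configuration.
# Field names are assumptions for readability, not from the original code.
alp_grpo_config = {
    "steps": 100,                # total GRPO optimization steps
    "batch_size": 512,           # prompts per optimization step
    "learning_rate": 1e-6,
    "kl_beta": 1e-7,             # KL penalty coefficient β
    "rollouts_per_prompt": 16,   # rollouts used for difficulty estimation
    "max_context_tokens": 8192,  # 8K context window
}
```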

## Performance (Pass@1)

- MATH-500: 0.81
- AIME: 0.252
- OlympiadBench: 0.51

## Token Usage

- MATH: 2804 → 862 (-69%)
- AIME: 4007 → 3331 (-17%)
- Olympiad: 3606 → 2107 (-42%)
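The percentage reductions follow directly from the before/after average token counts; a quick check:

```python
# Recompute the reported token-usage reductions from the before/after averages.
usage = {
    "MATH": (2804, 862),
    "AIME": (4007, 3331),
    "Olympiad": (3606, 2107),
}
for name, (before, after) in usage.items():
    reduction = round((1 - after / before) * 100)
    print(f"{name}: -{reduction}%")  # MATH: -69%, AIME: -17%, Olympiad: -42%
```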

## Usage

```python
prompt = f"{problem} Let's think step by step and output the final answer within \\boxed{{}}."
```
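A minimal end-to-end inference sketch using the prompt template above. The repo id and helper names are assumptions (adjust `MODEL_ID` to the actual Hub path):

```python
MODEL_ID = "nlile/ALP_R1_Qwen1.5B"  # assumed repo id; adjust if different


def build_prompt(problem: str) -> str:
    # Prompt template from this card: step-by-step reasoning with a
    # \boxed{} final answer.
    return (
        f"{problem} Let's think step by step and output the final answer "
        f"within \\boxed{{}}."
    )


def generate(problem: str, max_new_tokens: int = 4096) -> str:
    # Imported here so build_prompt() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("What is 2 + 2?"))
```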