metadata

base_model:
  - allenai/OLMo-2-1124-7B-SFT
license: apache-2.0
datasets:
  - math
metrics:
  - accuracy
pipeline_tag: text-generation
language:
  - en

OLMo-2-7B-SFT-Intuitor-MATH-1EPOCH

Description:

An Intuitor-fine-tuned version of Allenai/OLMo-2-1124-7B-SFT trained on the MATH dataset.

Citation

@article{zhao2025learning,
  title={Learning to Reason without External Rewards},
  author={Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal={arXiv preprint arXiv:2505.19590},
  year={2025}
}