---
base_model: Qwen/Qwen2.5-1.5B
license: apache-2.0
datasets:
- math
metrics:
- accuracy
pipeline_tag: text-generation
language:
- en
---

# Qwen2.5-1.5B-GRPO-MATH-1EPOCH

**Description:** A GRPO-fine-tuned version of Qwen2.5-1.5B trained on the MATH dataset.

---

## Citation

```bibtex
@article{zhao2025learning,
  title   = {Learning to Reason without External Rewards},
  author  = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
  journal = {arXiv preprint arXiv:2505.19590},
  year    = {2025}
}

@article{sha2024deepseekmath,
  title   = {DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models},
  author  = {Shao, Zhihong and Wang, Peiyi and Zhu, Qihao and Xu, Runxin and Song, Junxiao and Bi, Xiao and … Guo, Daya},
  journal = {arXiv preprint arXiv:2402.03300},
  year    = {2024}
}
```