Intuitor collection: models from the paper "Learning to Reason without External Rewards" (12 items)
Description:
An Intuitor-fine-tuned version of Qwen3-14B trained on the MATH dataset.
@article{zhao2025learning,
title = {Learning to Reason without External Rewards},
author = {Zhao, Xuandong and Kang, Zhewei and Feng, Aosong and Levine, Sergey and Song, Dawn},
journal = {arXiv preprint arXiv:2505.19590},
year = {2025}
}