CoReward-Qwen3-4B-Base
This is the Qwen3-4B-Base model trained with the Co-Reward method on the MATH training set.
If you are interested in Co-Reward, you can find more details in our GitHub repository [https://github.com/tmlr-group/Co-Reward].
Citation
@article{zhang2025coreward,
title={Co-Reward: Self-supervised Reinforcement Learning for Large Language Model Reasoning via Contrastive Agreement},
author={Zizhuo Zhang and Jianing Zhu and Xinmu Ge and Zihua Zhao and Zhanke Zhou and Xuan Li and Xiao Feng and Jiangchao Yao and Bo Han},
journal={arXiv preprint arXiv:2508.00410},
year={2025},
}