---
library_name: transformers
license: apache-2.0
base_model:
- Qwen/Qwen2.5-Math-1.5B
---

# Qwen2.5-Math-1.5B-Oat-Zero

## Links

- 📜 [Paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf)
- 💻 [GitHub](https://github.com/sail-sg/understand-r1-zero)
- 🤗 [Oat-Zero Collection](https://huggingface.co/collections/sail/oat-zero-understanding-r1-zero-like-training-67dcdb07b9f3eb05f1501c4a)

## Introduction

This model is trained with the minimalist R1-Zero recipe introduced in our paper:

- **Algorithm**: Dr. GRPO (an illustrative sketch contrasting it with GRPO is given at the end of this card)
- **Data**: level 3-5 questions from the MATH dataset
- **Base model**: [Qwen/Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B)
- **Template**: Qwen-Math

Evaluation results on widely used math benchmarks are reported in our [paper](https://github.com/sail-sg/understand-r1-zero/blob/main/understand-r1-zero.pdf).

## Usage

```python
import vllm


def apply_qwen_math_template(question: str):
    return (
        "<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n<|im_start|>user\n"
        + question
        + "<|im_end|>\n<|im_start|>assistant\n"
    )


def apply_r1_template(question: str):
    return (
        "A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. "
        "The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.\nUser: "
        + question
        + "\nAssistant: "
    )


model_name = "sail/Qwen2.5-Math-1.5B-Oat-Zero"

sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0,
    top_p=1,
    max_tokens=3000,
)

model = vllm.LLM(
    model_name,
    max_model_len=4096,
    dtype="bfloat16",
    enable_prefix_caching=True,
)

# The R1 template is used by the Llama-based Oat-Zero variant;
# this Qwen-based model uses the Qwen-Math template it was trained with.
if "Llama-3.2-3B-Oat-Zero" in model_name:
    apply_template = apply_r1_template
else:
    apply_template = apply_qwen_math_template

prompts = [
    "How many positive whole-number divisors does 196 have?",
]
prompts = list(map(apply_template, prompts))

outputs = model.generate(prompts, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```

## Citation

```latex
@misc{liu2025understanding,
  title={Understanding R1-Zero-Like Training: A Critical Perspective},
  author={Zichen Liu and Changyu Chen and Wenjun Li and Penghui Qi and Tianyu Pang and Chao Du and Wee Sun Lee and Min Lin},
  year={2025},
  howpublished={\url{https://github.com/sail-sg/understand-r1-zero}},
}
```
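
## Dr. GRPO at a Glance

For intuition, below is a minimal, illustrative sketch of how Dr. GRPO's advantage and loss aggregation differ from GRPO, following the paper's description. This is not the training code used for this model: the PPO-style clipping ratio is omitted, and `token_logps`, `mask`, and `MAX_TOKENS` are hypothetical names chosen for this sketch.

```python
import torch

MAX_TOKENS = 3000  # hypothetical constant generation budget


def advantages(rewards: torch.Tensor, use_dr_grpo: bool = True, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_questions, group_size) outcome rewards for G samples per question
    centered = rewards - rewards.mean(dim=-1, keepdim=True)
    if use_dr_grpo:
        return centered  # Dr. GRPO: mean-centering only, no std normalization
    return centered / (rewards.std(dim=-1, keepdim=True) + eps)  # GRPO: divide by group std


def pg_loss(token_logps: torch.Tensor, mask: torch.Tensor, adv: torch.Tensor,
            use_dr_grpo: bool = True) -> torch.Tensor:
    # token_logps, mask: (batch, seq_len); adv: (batch,)
    per_token = -(adv.unsqueeze(-1) * token_logps) * mask
    if use_dr_grpo:
        # Dr. GRPO: sum over tokens and divide by a constant budget,
        # removing the per-response length normalization
        return per_token.sum() / (mask.size(0) * MAX_TOKENS)
    # GRPO: average each response over its own length before averaging the batch
    return (per_token.sum(dim=-1) / mask.sum(dim=-1)).mean()
```

Under this view, Dr. GRPO removes the two normalization terms that the paper identifies as sources of bias in GRPO: the per-group reward std (which reweights questions by difficulty) and the per-response length division (which can encourage longer incorrect responses).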