# PromptCoT-2.0-SFT-7B

This model is part of PromptCoT 2.0 (Scaling Prompt Synthesis for LLM Reasoning).
It is a 7B-parameter model trained entirely on synthetic prompts generated by the PromptCoT 2.0 pipeline, with reasoning trajectories distilled from GPT-OSS-120B (medium reasoning effort).

Unlike prior work (e.g., OpenMathReasoning, OpenCodeReasoning) that relies on human-written prompts, this model demonstrates that fully synthetic data can match or even surpass manually curated datasets for advancing reasoning in both mathematics and programming.


## 📊 Comparison

PromptCoT-2.0-SFT-7B is trained 100% on synthetic prompts with teacher trajectories from GPT-OSS-120B (medium).
Below we compare it against two widely used human-written prompt baselines.

Metric: Pass@1 for AIME24/25, HMMT Feb25, LiveCodeBench v5/v6; Elo for Codeforces.

| Model | Prompt Source | Teacher | AIME24 | AIME25 | HMMT Feb25 | LiveCodeBench v5 (2408-2502) | LiveCodeBench v6 (2502-2505) | Codeforces |
|---|---|---|---|---|---|---|---|---|
| PromptCoT-2.0-SFT-7B | Synthetic | GPT-OSS-120B (med.) | 73.1 | 65.6 | 46.5 | 53.4 | 48.9 | 1815 |
| OpenMathReasoning | Human | DeepSeek-R1 | 73.3 | 58.1 | 42.1 | 9.7 | 10.7 | 676 |
| OpenCodeReasoning | Human | DeepSeek-R1 | 11.7 | 7.7 | 6.0 | 50.5 | 42.0 | 1648 |
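For reference, Pass@1 in the table above is the standard unbiased pass@k estimator (introduced with Codex by Chen et al., 2021) evaluated at k = 1. A minimal sketch, assuming the usual setup of n samples per problem with c of them correct (the function name and sample counts are illustrative, not taken from this model card):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n total (c correct), is correct."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 16 samples per problem, 8 correct -> pass@1 = 8/16
print(pass_at_k(16, 8, 1))  # -> 0.5
```

At k = 1 this reduces to the fraction of correct samples, averaged over problems.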

### Takeaways

- **Fully synthetic wins:** PromptCoT-2.0-SFT-7B outperforms human-written baselines across most math benchmarks and all code benchmarks.
- **Scalable & practical:** High performance without manual prompt curation suggests a clear path to scaling reasoning with synthetic data.

## 🚀 Usage

You can load the model with Hugging Face `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "xl-zhao/PromptCoT-2.0-SFT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

prompt = "Solve for x: If 2x + 5 = 17, what is the value of x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## 📂 Training Details

- **Data:** 4.8M fully synthetic prompts generated by PromptCoT 2.0
- **Teacher:** GPT-OSS-120B (medium), used for reasoning trajectory distillation
- **Domains:** Mathematics (Olympiad-level) and Programming (competitive coding)
- **Training regime:** Supervised fine-tuning (SFT), 100% synthetic data
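In standard SFT-with-distillation setups, the loss is computed only on the teacher's response tokens, with prompt tokens masked out. A minimal, framework-agnostic sketch of that label masking (the `-100` ignore index follows the common Hugging Face/PyTorch convention; the exact preprocessing used for this model is an assumption, not a published detail):

```python
IGNORE_INDEX = -100  # positions labeled -100 are skipped by the cross-entropy loss

def build_sft_example(prompt_ids: list[int], response_ids: list[int], eos_id: int):
    """Concatenate prompt and response; supervise only the response + EOS."""
    input_ids = prompt_ids + response_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    return input_ids, labels

# Toy example with made-up token IDs:
inp, lab = build_sft_example([5, 6, 7], [8, 9], eos_id=2)
print(inp)  # [5, 6, 7, 8, 9, 2]
print(lab)  # [-100, -100, -100, 8, 9, 2]
```

Masking the prompt keeps the model from wasting capacity on reproducing the synthetic question text and focuses training on the distilled reasoning trajectory.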

## 🔮 Key Insights

- **Fully synthetic prompts work:** No reliance on human-written datasets.
- **Compact trajectories:** Distilled responses are shorter than those in prior datasets, reducing inference cost while maintaining quality.
- **Scalability:** Opens the door for training larger reasoning models on purely synthetic corpora.

## 📜 Citation

If you use this model or the PromptCoT 2.0 dataset, please cite:

```bibtex
@article{zhao2025promptcot2,
  title     = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author    = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal   = {arXiv preprint arXiv:2509.19894},
  year      = {2025},
  url       = {https://arxiv.org/abs/2509.19894}
}
```