# PromptCoT-2.0-SFT-7B
This model is part of PromptCoT 2.0 (Scaling Prompt Synthesis for LLM Reasoning).
It is a 7B parameter model trained entirely on synthetic prompts generated by PromptCoT 2.0, with reasoning trajectories distilled from GPT-OSS-120B (medium).
Unlike prior work such as OpenMathReasoning and OpenCodeReasoning, which relies on human-written prompts, this model demonstrates that fully synthetic data can match or even surpass manually curated datasets for advancing reasoning in both mathematics and programming.
## Comparison
PromptCoT-2.0-SFT-7B is trained 100% on synthetic prompts with teacher trajectories from GPT-OSS-120B (medium).
Below we compare it against two widely used baselines trained on human-written prompts.
Metric: Pass@1 for AIME24/25, HMMT Feb25, LiveCodeBench v5/v6; Elo for Codeforces.
| Model | Prompt Source | Teacher | AIME24 | AIME25 | HMMT Feb25 | LiveCodeBench v5 (2408-2502) | LiveCodeBench v6 (2502-2505) | Codeforces |
|---|---|---|---|---|---|---|---|---|
| PromptCoT-2.0-SFT-7B | Synthetic | GPT-OSS-120B (med.) | 73.1 | 65.6 | 46.5 | 53.4 | 48.9 | 1815 |
| OpenMathReasoning | Human | DeepSeek-R1 | 73.3 | 58.1 | 42.1 | 9.7 | 10.7 | 676 |
| OpenCodeReasoning | Human | DeepSeek-R1 | 11.7 | 7.7 | 6.0 | 50.5 | 42.0 | 1648 |
### Takeaways
- Fully synthetic wins: PromptCoT-2.0-SFT-7B matches or outperforms the human-prompt baselines on the math benchmarks and outperforms them on all code benchmarks.
- Scalable & practical: High performance without manual prompt curation suggests a clear path to scaling reasoning with synthetic data.
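The table's Pass@1 numbers are typically computed with the Codex-style unbiased pass@k estimator (the exact sampling setup is not specified here); a minimal sketch of that estimator, which at k = 1 reduces to the fraction of correct samples:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    the probability that at least one of k completions,
    drawn from n total samples of which c are correct, passes."""
    if n - c < k:
        # Fewer incorrect samples than k draws: success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# At k = 1 this is simply c / n.
print(pass_at_k(16, 8, 1))  # -> 0.5
```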
## Usage
You can load the model with Hugging Face `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "xl-zhao/PromptCoT-2.0-SFT-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Generate a reasoning trace for a simple math prompt.
prompt = "Solve for x: If 2x + 5 = 17, what is the value of x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details
- Data: 4.8M fully synthetic prompts generated by PromptCoT 2.0
- Teacher: GPT-OSS-120B (medium), used for reasoning trajectory distillation
- Domains: Mathematics (Olympiad-level) and Programming (competitive coding)
- Training regime: Supervised fine-tuning (SFT), 100% synthetic data
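The SFT setup pairs each synthetic prompt with its distilled teacher trajectory. A minimal sketch of what one training record might look like as JSON Lines (the field names and layout here are illustrative assumptions, not the released data schema):

```python
import json

# Hypothetical record layout for prompt/trajectory SFT pairs;
# the actual PromptCoT 2.0 data schema may differ.
record = {
    "prompt": "A synthetic olympiad-level problem generated by PromptCoT 2.0",
    "response": "A reasoning trajectory distilled from the teacher, ending in a final answer",
    "domain": "math",  # "math" or "code"
    "teacher": "GPT-OSS-120B (medium)",
}

# SFT corpora are commonly stored as JSON Lines: one record per line.
line = json.dumps(record, ensure_ascii=False)
restored = json.loads(line)
assert restored == record
print(line[:60])
```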
## Key Insights
- Fully synthetic prompts work: No reliance on human-written datasets.
- Compact trajectories: Distilled responses are shorter than those in prior datasets, reducing inference cost while maintaining quality.
- Scalability: Opens the door for training larger reasoning models on purely synthetic corpora.
## Citation
If you use this model or the PromptCoT 2.0 dataset, please cite:
```bibtex
@article{zhao2025promptcot2,
  title   = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2509.19894},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.19894}
}
```