---
license: mit
language:
- en
---
# PromptCoT-2.0-SelfPlay-4B

This model is part of **PromptCoT 2.0** (*Scaling Prompt Synthesis for LLM Reasoning*).
It is a **4B model trained via self-play**, where synthesized problems from PromptCoT 2.0 provide **verifiable feedback** (unit tests for code, boxed answers for math).
The training loop uses **Direct Preference Optimization (DPO)** to align generations with automatically verified outcomes, removing the dependence on stronger external teachers.

This model establishes **new state-of-the-art performance at the 4B scale**, consistently outperforming strong open-source baselines and curated datasets.
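
For quick testing, here is a minimal inference sketch with 🤗 Transformers. The repo id below is an assumption (it mirrors the dataset's `xl-zhao` namespace), and the prompt and token budget are illustrative only:

````python
# Minimal sketch, assuming the checkpoint is hosted as
# "xl-zhao/PromptCoT-2.0-SelfPlay-4B" and ships with a chat template
# (both assumptions; check the actual repo before use).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xl-zhao/PromptCoT-2.0-SelfPlay-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Find all primes p < 20 such that p + 2 is also prime."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models need a generous budget for the chain of thought.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
````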

---

## ✨ Highlights

- **Self-Play Training**:
  The model improves autonomously using **synthetic math & code problems** generated by PromptCoT 2.0.
  Positive/negative preference pairs are constructed from verifiable feedback signals (unit-test success for code, final-answer correctness for math); a minimal sketch of this pairing step follows the list.

- **Strong Baseline Improvements**:
  Outperforms **Qwen3-4B-Thinking-2507** and models trained on curated datasets such as **OpenMathReasoning**, **OpenCodeReasoning**, and **OpenThoughts3** across all six benchmarks.
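
The pairing step can be pictured with a short sketch. This is an illustrative reconstruction, not the released training code: only the math case (boxed answers) is shown, the `check_boxed_answer` regex is a simplified stand-in for the real verifier, and code problems would instead be checked against unit tests:

````python
# Illustrative sketch of turning verifiable feedback into DPO preference pairs.
import itertools
import re

def check_boxed_answer(reference: str, completion: str) -> bool:
    """True iff the completion's last \\boxed{...} matches the reference answer."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    return bool(matches) and matches[-1].strip() == reference.strip()

def make_dpo_pairs(prompt: str, reference: str, completions: list[str]) -> list[dict]:
    """Pair each verified completion (chosen) with each failing one (rejected)."""
    passed = [c for c in completions if check_boxed_answer(reference, c)]
    failed = [c for c in completions if not check_boxed_answer(reference, c)]
    return [
        {"prompt": prompt, "chosen": good, "rejected": bad}
        for good, bad in itertools.product(passed, failed)
    ]

# Two sampled solutions to the same synthesized problem: one verifies, one does not.
pairs = make_dpo_pairs(
    prompt="What is 2 + 2?",
    reference="4",
    completions=["... so the answer is \\boxed{4}.", "... hence \\boxed{5}."],
)
print(len(pairs))  # -> 1 preference pair
````

Because verification is automatic, no stronger teacher is needed to decide which generation is preferred, which is what allows the 4B model to improve through self-play.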

---

## 📊 Results

Evaluation on six benchmarks under the **self-play setting with 4B parameters**.
**Bold = best**, *Italic = second-best*.

| Model                  | AIME 24 | AIME 25 | HMMT Feb 25 | LiveCodeBench v5 (2408–2502) | LiveCodeBench v6 (2502–2505) | Codeforces (Elo) |
|------------------------|---------|---------|-------------|------------------------------|------------------------------|------------------|
| Qwen3-4B-Thinking-2507 | 85.2    | 81.3    | 55.5        | 63.8                         | 55.2                         | 1852             |
| OpenCodeReasoning      | 83.1    | 78.5    | 50.4        | 64.4                         | *57.1*                       | 1867             |
| OpenMathReasoning      | *85.3*  | *83.0*  | 56.8        | 59.7                         | 48.5                         | 1826             |
| OpenThoughts3          | 84.7    | 80.6    | 54.2        | *65.2*                       | 54.4                         | 1846             |
| OpenR1                 | 84.6    | 80.9    | 56.7        | 63.0                         | 54.6                         | 1829             |
| PromptCoT 1.0          | *85.3*  | 81.8    | *58.6*      | 64.5                         | 56.7                         | *1878*           |
| **PromptCoT 2.0**      | **87.3**| **85.0**| **66.5**    | **67.7**                     | **61.1**                     | **1934**         |

---

## 🔮 Key Takeaways

* **Best across all six benchmarks**: PromptCoT 2.0 achieves the top score on AIME 24/25, HMMT Feb 25, LiveCodeBench v5/v6, and Codeforces.
* **Large gains on high-difficulty tasks**: +11.0 points on HMMT, +5.9 on LCB v6, and +82 Elo on Codeforces over the Qwen3-4B-Thinking-2507 base model.
* **Beyond curated baselines**: Unlike OpenMathReasoning, OpenCodeReasoning, and OpenThoughts3, which saturate on strong 4B bases, PromptCoT 2.0 continues to deliver significant improvements.

---

## 📂 Resources

* 📄 Paper: [PromptCoT 2.0](https://arxiv.org/abs/2509.19894)
* 💻 GitHub: [inclusionAI/PromptCoT](https://github.com/inclusionAI/PromptCoT)
* 📊 Dataset: [PromptCoT-2.0-SelfPlay-4B-48K](https://huggingface.co/datasets/xl-zhao/PromptCoT-2.0-SelfPlay-4B-48K)
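
To take a quick look at the training data, the dataset can be loaded with 🤗 Datasets (the `train` split name is an assumption; check the dataset card for the actual splits and columns):

````python
from datasets import load_dataset

# Load the 48K self-play corpus linked above (split name assumed).
ds = load_dataset("xl-zhao/PromptCoT-2.0-SelfPlay-4B-48K", split="train")
print(ds)     # size and column names
print(ds[0])  # one synthesized training example
````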

---

## 📜 Citation

If you find this model useful, please consider citing:

````bibtex
@article{zhao2025promptcot2,
  title   = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author  = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal = {arXiv preprint arXiv:2509.19894},
  year    = {2025},
  url     = {https://arxiv.org/abs/2509.19894}
}
````