CriticLeanInstruct

Welcome to the CriticLeanInstruct dataset repository! This dataset is designed to facilitate the alignment of large language models (LLMs) through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) processes. Below you'll find detailed information about the dataset structure, usage, and associated model variants.

🔍 Overview

The CriticLeanInstruct dataset suite consists of several JSONL files, each serving specific purposes in the model training pipeline. Here's a breakdown of each file:

File Name	Description
`CriticLean_12K`	All of our critic lean data.
`CriticLean_4K`	Seed Data. A subset of the CriticLean_12K, specifically used as our Reinforcement Learning (RL) dataset.
`CriticLean_Mix_48K`	An expanded mixed dataset including: - `CriticLean_12K` - 18k math data sampled from OpenR1-Math-220k - 18k code data sampled from OpenThoughts-114k-Code_decontaminated
`CriticLean_Mix_16K`	A mixed dataset including: - `CriticLean_4K` - 6k math data sampled from OpenR1-Math-220k - 6k code data sampled from OpenThoughts-114k-Code_decontaminated

🍭 Associated Model Variants

We've trained several model variants using the CriticLean dataset to demonstrate its effectiveness. Below is a summary of the training configurations:

Base Model	SFT Applied?	SFT Data	RL Applied?	RL Data	CriticLeanGPT Model Name
Qwen2.5-Instruct	Yes	`CriticLean_4K`	No	*	Qwen2.5-Instruct-SFT(Critic Only)
Qwen2.5-Instruct	Yes	`CriticLean_Mix_16K`	No	*	Qwen2.5-Instruct-SFT(16K)
Qwen2.5-Instruct	Yes	`CriticLean_Mix_48K`	No	*	Qwen2.5-Instruct-SFT
Qwen2.5-Instruct	Yes	`CriticLean_Mix_48K`	Yes	`CriticLean_4K`	Qwen2.5-Instruct-SFT-RL
Qwen2.5-Instruct	No	*	Yes	`CriticLean_4K`	Qwen2.5-RL
Qwen3	No	*	Yes	`CriticLean_4K`	Qwen3-RL

⛏️ Usage

For Supervised Fine-Tuning (SFT)

The CriticLean_Mix_* files are optimized for SFT, with varying sizes and domain focuses:

Use CriticLean_Mix_16K for a balanced mix of critic data, math, and code (16k total samples)
Use CriticLean_Mix_48K for a larger dataset with expanded math and code content (48k total samples)
Use CriticLean_4K or CriticLean_12K for pure critic lean data without additional math/code

For Reinforcement Learning (RL)

CriticLean_4K is specifically designated as the RL dataset, used to fine-tune models after SFT (or directly on base models) through reinforcement learning from human feedback (RLHF) or similar techniques.

🔗 Referenced Datasets & Links

This dataset incorporates samples from:

We thank the creators of these datasets for making their work available to the community.

📜 License

Apache-2.0 license

☕️ Citation

If you use CriticLeanBench in your research, please cite our paper:

@misc{peng2025criticleancriticguidedreinforcementlearning,
      title={CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization}, 
      author={Zhongyuan Peng and Yifan Yao and Kaijing Ma and Shuyue Guo and Yizhe Li and Yichi Zhang and Chenchen Zhang and Yifan Zhang and Zhouliang Yu and Luming Li and Minghao Liu and Yihang Xia and Jiawei Shen and Yuchen Wu and Yixin Cao and Zhaoxiang Zhang and Wenhao Huang and Jiaheng Liu and Ge Zhang},
      year={2025},
      eprint={2507.06181},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.06181}, 
}

m-a-p
/

CriticLeanGPT-Qwen3-8B-RL