CriticLeanInstruct
Welcome to the CriticLeanInstruct dataset repository! This dataset is designed to facilitate the alignment of large language models (LLMs) through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) processes. Below you'll find detailed information about the dataset structure, usage, and associated model variants.
π Overview
The CriticLeanInstruct dataset suite consists of several JSONL files, each serving specific purposes in the model training pipeline. Here's a breakdown of each file:
File Name | Description |
---|---|
CriticLean_12K |
All of our critic lean data. |
CriticLean_4K |
Seed Data. A subset of the CriticLean_12K, specifically used as our Reinforcement Learning (RL) dataset. |
CriticLean_Mix_48K |
An expanded mixed dataset including: - CriticLean_12K - 18k math data sampled from OpenR1-Math-220k - 18k code data sampled from OpenThoughts-114k-Code_decontaminated |
CriticLean_Mix_16K |
A mixed dataset including: - CriticLean_4K - 6k math data sampled from OpenR1-Math-220k - 6k code data sampled from OpenThoughts-114k-Code_decontaminated |
π Associated Model Variants
We've trained several model variants using the CriticLean dataset to demonstrate its effectiveness. Below is a summary of the training configurations:
Base Model | SFT Applied? | SFT Data | RL Applied? | RL Data | CriticLeanGPT Model Name |
---|---|---|---|---|---|
Qwen2.5-Instruct | Yes | CriticLean_4K |
No | * | Qwen2.5-Instruct-SFT(Critic Only) |
Qwen2.5-Instruct | Yes | CriticLean_Mix_16K |
No | * | Qwen2.5-Instruct-SFT(16K) |
Qwen2.5-Instruct | Yes | CriticLean_Mix_48K |
No | * | Qwen2.5-Instruct-SFT |
Qwen2.5-Instruct | Yes | CriticLean_Mix_48K |
Yes | CriticLean_4K |
Qwen2.5-Instruct-SFT-RL |
Qwen2.5-Instruct | No | * | Yes | CriticLean_4K |
Qwen2.5-RL |
Qwen3 | No | * | Yes | CriticLean_4K |
Qwen3-RL |
βοΈ Usage
For Supervised Fine-Tuning (SFT)
The CriticLean_Mix_*
files are optimized for SFT, with varying sizes and domain focuses:
- Use
CriticLean_Mix_16K
for a balanced mix of critic data, math, and code (16k total samples) - Use
CriticLean_Mix_48K
for a larger dataset with expanded math and code content (48k total samples) - Use
CriticLean_4K
orCriticLean_12K
for pure critic lean data without additional math/code
For Reinforcement Learning (RL)
CriticLean_4K
is specifically designated as the RL dataset, used to fine-tune models after SFT (or directly on base models) through reinforcement learning from human feedback (RLHF) or similar techniques.
π Referenced Datasets & Links
This dataset incorporates samples from:
We thank the creators of these datasets for making their work available to the community.
π License
Apache-2.0 license
βοΈ Citation
If you use CriticLeanBench in your research, please cite our paper:
@misc{peng2025criticleancriticguidedreinforcementlearning,
title={CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization},
author={Zhongyuan Peng and Yifan Yao and Kaijing Ma and Shuyue Guo and Yizhe Li and Yichi Zhang and Chenchen Zhang and Yifan Zhang and Zhouliang Yu and Luming Li and Minghao Liu and Yihang Xia and Jiawei Shen and Yuchen Wu and Yixin Cao and Zhaoxiang Zhang and Wenhao Huang and Jiaheng Liu and Ge Zhang},
year={2025},
eprint={2507.06181},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2507.06181},
}
- Downloads last month
- 0