# CriticLeanInstruct Welcome to the **CriticLeanInstruct** dataset repository! This dataset is designed to facilitate the alignment of large language models (LLMs) through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) processes. Below you'll find detailed information about the dataset structure, usage, and associated model variants. ## 🔍 Overview The CriticLeanInstruct dataset suite consists of several JSONL files, each serving specific purposes in the model training pipeline. Here's a breakdown of each file: | File Name | Description | |-----------|-------------| | `CriticLean_12K` | All of our critic lean data. | | `CriticLean_4K` | Seed Data. A subset of the CriticLean_12K, specifically used as our Reinforcement Learning (RL) dataset. | | `CriticLean_Mix_48K` | An expanded mixed dataset including:
- `CriticLean_12K`
- 18k math data sampled from [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
- 18k code data sampled from [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) | | `CriticLean_Mix_16K` | A mixed dataset including:
- `CriticLean_4K`
- 6k math data sampled from [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k)
- 6k code data sampled from [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) | ## 🍭 Associated Model Variants We've trained several model variants using the CriticLean dataset to demonstrate its effectiveness. Below is a summary of the training configurations: | Base Model | SFT Applied? | SFT Data | RL Applied? | RL Data | CriticLeanGPT Model Name | |------------|--------------|----------|-------------|---------|------------| | Qwen2.5-Instruct | Yes | `CriticLean_4K` | No | * | Qwen2.5-Instruct-SFT(Critic Only) | | Qwen2.5-Instruct | Yes | `CriticLean_Mix_16K` | No | * | Qwen2.5-Instruct-SFT(16K) | | Qwen2.5-Instruct | Yes | `CriticLean_Mix_48K` | No | * | Qwen2.5-Instruct-SFT | | Qwen2.5-Instruct | Yes | `CriticLean_Mix_48K` | Yes | `CriticLean_4K` | Qwen2.5-Instruct-SFT-RL | | Qwen2.5-Instruct | No | * | Yes | `CriticLean_4K` | Qwen2.5-Instruct-RL | | Qwen3 | No | * | Yes | `CriticLean_4K` | Qwen3-RL | ## ⛏️ Usage ### For Supervised Fine-Tuning (SFT) The `CriticLean_Mix_*` files are optimized for SFT, with varying sizes and domain focuses: - Use `CriticLean_Mix_16K` for a balanced mix of critic data, math, and code (16k total samples) - Use `CriticLean_Mix_48K` for a larger dataset with expanded math and code content (48k total samples) - Use `CriticLean_4K` or `CriticLean_12K` for pure critic lean data without additional math/code ### For Reinforcement Learning (RL) `CriticLean_4K` is specifically designated as the RL dataset, used to fine-tune models after SFT (or directly on base models) through reinforcement learning from human feedback (RLHF) or similar techniques. ## 🔗 Referenced Datasets & Links This dataset incorporates samples from: - [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) - [OpenThoughts-114k-Code_decontaminated](https://huggingface.co/datasets/open-r1/OpenThoughts-114k-Code_decontaminated) We thank the creators of these datasets for making their work available to the community. ## 📜 License Apache-2.0 license ## ☕️ Citation If you use CriticLeanBench in your research, please cite our paper: ``` @misc{peng2025criticleancriticguidedreinforcementlearning, title={CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization}, author={Zhongyuan Peng and Yifan Yao and Kaijing Ma and Shuyue Guo and Yizhe Li and Yichi Zhang and Chenchen Zhang and Yifan Zhang and Zhouliang Yu and Luming Li and Minghao Liu and Yihang Xia and Jiawei Shen and Yuchen Wu and Yixin Cao and Zhaoxiang Zhang and Wenhao Huang and Jiaheng Liu and Ge Zhang}, year={2025}, eprint={2507.06181}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2507.06181}, } ```