
CriticLeanInstruct

Welcome to the CriticLeanInstruct dataset repository! This dataset is designed to support the alignment of large language models (LLMs) through supervised fine-tuning (SFT) and reinforcement learning (RL). Below you'll find details on the dataset structure, usage, and associated model variants.

πŸ” Overview

The CriticLeanInstruct dataset suite consists of several JSONL files, each serving specific purposes in the model training pipeline. Here's a breakdown of each file:

File Name | Description
CriticLean_12K | All of our CriticLean critic data.
CriticLean_4K | Seed data: a subset of CriticLean_12K, used as our reinforcement learning (RL) dataset.
CriticLean_Mix_48K | Expanded mixed dataset: CriticLean_12K + 18k math samples from OpenR1-Math-220k + 18k code samples from OpenThoughts-114k-Code_decontaminated.
CriticLean_Mix_16K | Mixed dataset: CriticLean_4K + 6k math samples from OpenR1-Math-220k + 6k code samples from OpenThoughts-114k-Code_decontaminated.
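The files listed above are JSONL, i.e. one JSON object per line. The following is a minimal standard-library sketch for reading such a file and for recording the nominal composition of the two mixed files. It makes a few assumptions: the per-record fields are not documented on this card, so the demo uses made-up keys; `iter_jsonl`, `MIX_COMPOSITION` and `nominal_size` are hypothetical helper names; and "K"/"k" is treated as exactly 1,000, so the counts are approximate.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Iterator

def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines such as a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from err

# Nominal composition of the mixed files, taken from the file table.
# Assumption: "K"/"k" is treated as exactly 1,000, so these counts are approximate.
MIX_COMPOSITION: Dict[str, Dict[str, int]] = {
    "CriticLean_Mix_48K": {
        "CriticLean_12K": 12_000,
        "OpenR1-Math-220k": 18_000,
        "OpenThoughts-114k-Code_decontaminated": 18_000,
    },
    "CriticLean_Mix_16K": {
        "CriticLean_4K": 4_000,
        "OpenR1-Math-220k": 6_000,
        "OpenThoughts-114k-Code_decontaminated": 6_000,
    },
}

def nominal_size(mix_name: str) -> int:
    """Sum the nominal per-source counts for one mixed file."""
    return sum(MIX_COMPOSITION[mix_name].values())

# Demo on a tiny synthetic file; the "id"/"text" fields are made up,
# because the real record schema is not documented on this card.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.jsonl"
    demo_path.write_text('{"id": 1, "text": "a"}\n\n{"id": 2, "text": "b"}\n', encoding="utf-8")
    demo_records = list(iter_jsonl(demo_path))
```

On a local copy, something like `sum(1 for _ in iter_jsonl(Path("CriticLean_4K.jsonl")))` would count the records. The `.jsonl` file name and its location are assumptions; adjust them to wherever the files were downloaded.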

🍭 Associated Model Variants

We've trained several model variants using the CriticLean dataset to demonstrate its effectiveness. Below is a summary of the training configurations:

Base Model | SFT Applied? | SFT Data | RL Applied? | RL Data | CriticLeanGPT Model Name
Qwen2.5-Instruct | Yes | CriticLean_4K | No | - | Qwen2.5-Instruct-SFT(Critic Only)
Qwen2.5-Instruct | Yes | CriticLean_Mix_16K | No | - | Qwen2.5-Instruct-SFT(16K)
Qwen2.5-Instruct | Yes | CriticLean_Mix_48K | No | - | Qwen2.5-Instruct-SFT
Qwen2.5-Instruct | Yes | CriticLean_Mix_48K | Yes | CriticLean_4K | Qwen2.5-Instruct-SFT-RL
Qwen2.5-Instruct | No | - | Yes | CriticLean_4K | Qwen2.5-RL
Qwen3 | No | - | Yes | CriticLean_4K | Qwen3-RL

⛏️ Usage

For Supervised Fine-Tuning (SFT)

Choose an SFT file according to the size and domain coverage you need:

  • Use CriticLean_Mix_16K for a balanced mix of critic data, math, and code (16k total samples)
  • Use CriticLean_Mix_48K for a larger dataset with expanded math and code content (48k total samples)
  • Use CriticLean_4K or CriticLean_12K for critic data only, without additional math or code samples
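One way to assemble a mixture with the same proportions as CriticLean_Mix_16K (every critic sample plus a fixed-size random draw from each auxiliary source) is sketched below. This is a hypothetical reconstruction, not the authors' procedure: the card does not say how the math and code samples were selected, so uniform sampling without replacement under a fixed seed is an assumption, and `build_mix` is an illustrative helper rather than part of any released tooling.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def build_mix(
    critic: Sequence[T],
    math_pool: Sequence[T],
    code_pool: Sequence[T],
    n_math: int,
    n_code: int,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: keep every critic sample, add n_math and n_code
    samples drawn uniformly without replacement, then shuffle the result."""
    if n_math > len(math_pool) or n_code > len(code_pool):
        raise ValueError("requested more samples than a source pool contains")
    rng = random.Random(seed)  # dedicated seeded RNG so the mix is reproducible
    mix: List[T] = list(critic)
    mix += rng.sample(list(math_pool), n_math)
    mix += rng.sample(list(code_pool), n_code)
    rng.shuffle(mix)  # interleave the domains instead of leaving them grouped
    return mix

# Toy run with the CriticLean_Mix_16K ratio (4 : 6 : 6), scaled down 1000x.
critic_demo = [f"critic-{i}" for i in range(4)]
math_demo = [f"math-{i}" for i in range(220)]
code_demo = [f"code-{i}" for i in range(114)]
toy_mix = build_mix(critic_demo, math_demo, code_demo, n_math=6, n_code=6, seed=42)
```

Passing the full CriticLean_12K records with `n_math=n_code=18_000` would mirror the CriticLean_Mix_48K proportions instead. Whether the authors deduplicated, filtered, or shuffled their samples is not stated on this card.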

For Reinforcement Learning (RL)

CriticLean_4K is specifically designated as the RL dataset, used to fine-tune models after SFT (or directly on base models) through reinforcement learning from human feedback (RLHF) or similar techniques.

🔗 Referenced Datasets & Links

This dataset incorporates samples from:

  • OpenR1-Math-220k
  • OpenThoughts-114k-Code_decontaminated

We thank the creators of these datasets for making their work available to the community.

πŸ“œ License

Apache-2.0 license

β˜•οΈ Citation

If you use CriticLeanInstruct or the CriticLeanGPT models in your research, please cite our paper:

@misc{peng2025criticleancriticguidedreinforcementlearning,
      title={CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization}, 
      author={Zhongyuan Peng and Yifan Yao and Kaijing Ma and Shuyue Guo and Yizhe Li and Yichi Zhang and Chenchen Zhang and Yifan Zhang and Zhouliang Yu and Luming Li and Minghao Liu and Yihang Xia and Jiawei Shen and Yuchen Wu and Yixin Cao and Zhaoxiang Zhang and Wenhao Huang and Jiaheng Liu and Ge Zhang},
      year={2025},
      eprint={2507.06181},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.06181}, 
}

Model details: Safetensors format, 8.19B parameters, BF16 tensor type.

Model tree for m-a-p/CriticLeanGPT-Qwen3-8B-RL: 1 quantized model.