Dataset Details

Dataset Description

This dataset is the training data for rt from Scaling Reasoning can Improve Factuality in Large Language Models. The amount of data is around 7K rows.

  • Curated by: Mike Zhang
  • Funded by [optional]: Villum Fonden
  • Language(s) (NLP): English
  • License: Apache 2.0

Dataset Sources [optional]

Uses

One can use these reasoning traces to fine-tune their models to induce more factual thinking.

Direct Use

Having reasoning models via simple scaling (Muennighoff et al., 2025).

Out-of-Scope Use

We only have QA in this dataset, no other domains like mathematical reasoning or puzzles.

Dataset Structure

We have the following features:

features:
- name: id
  dtype: string
- name: question
  dtype: string
- name: gold_answer
  sequence: string
- name: model_answer
  sequence: string
- name: model
  dtype: string
- name: reasoning_trace
  dtype: string
- name: model_attempt
  dtype: string
- name: valid
  dtype: int64
- name: text
  dtype: string
- name: total_length
  dtype: int64
- name: think_length
  dtype: int64
- name: answer_length
  dtype: int64

The part used for fine-tuning is text where we pre-apply the chat template and also add a special tag for the <thinking> block.

Dataset Creation

Source Data

The data comes from the datasets used in the paper.

Data Collection and Processing

We did no further pre-processing to the QA pairs.

Bias, Risks, and Limitations

Recommendations

Users should be made aware of the risks, biases and limitations of the dataset. Note that not every answer is correct, thus always double-check the answers from the model.

Citation

BibTeX:

@misc{zhang2025scalingreasoningimprovefactuality,
      title={Scaling Reasoning can Improve Factuality in Large Language Models}, 
      author={Mike Zhang and Johannes Bjerva and Russa Biswas},
      year={2025},
      eprint={2505.11140},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.11140}, 
}
Downloads last month
5
Safetensors
Model size
32.8B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jjzha/Qwen2.5-32B-Instruct-fs1

Base model

Qwen/Qwen2.5-32B
Finetuned
(1003)
this model

Dataset used to train jjzha/Qwen2.5-32B-Instruct-fs1

Collection including jjzha/Qwen2.5-32B-Instruct-fs1