QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

| Paper | Documentation | Blog | 🤗Models | 🤗Datasets |

Highlights

QuestA introduces question augmentation to significantly improve the reasoning ability of large language models (LLMs). By incorporating partial solutions into prompts during reinforcement learning (RL) training, QuestA expands problem-solving capacity and accelerates learning on challenging tasks. Key improvements with QuestA:

  • Significant performance boost on math reasoning benchmarks (e.g., AIME25, HMMT25), including a 10%+ increase in accuracy.
  • Enhanced training efficiency via augmented prompts, allowing more tractable learning on hard problems.
  • State-of-the-art results for 1.5B-parameter models, showing that QuestA is effective even at smaller model scales.

(Figures: benchmark comparison and full pass@k results)

Model Overview

  • Model Type: Causal Language Model (RL-based Training)
  • Training Method: Reinforcement Learning (RL) with Question Augmentation
  • Number of Parameters: 1.5B (base model), augmented with dynamic difficulty control
  • Layer Count: Customizable based on the RL training configuration
  • Context Length: 32K tokens (configurable)
  • Main Innovation: Question Augmentation with Partial Solutions

QuestA dynamically adjusts problem difficulty by providing partial solutions to complex problems, thus improving the model’s ability to solve hard tasks more effectively.
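The core idea can be illustrated with a minimal sketch: prepend a truncated prefix of a known solution to the problem statement, where the prefix length controls the effective difficulty. The helper name and prompt format below are illustrative assumptions, not the paper's exact implementation.

```python
def augment_question(problem: str, partial_solution: str, ratio: float = 0.5) -> str:
    """Prepend a truncated partial solution as a hint.

    `ratio` controls how much of the solution is revealed: 1.0 gives the
    full solution (easiest), 0.0 gives no hint (original difficulty).
    Hypothetical helper illustrating the idea, not the official code.
    """
    cutoff = int(len(partial_solution) * ratio)
    hint = partial_solution[:cutoff]
    return (
        f"{problem}\n\n"
        f"Partial solution (you may continue from here):\n{hint}"
    )


prompt = augment_question(
    "Solve for x: 2x + 3 = 11.",
    "Subtract 3 from both sides: 2x = 8. Divide by 2: x = 4.",
    ratio=0.5,
)
print(prompt)
```

During training, sampling from such augmented prompts yields more successful trajectories on hard problems, which in turn produces more informative reward signals.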

Performance

QuestA achieves the following performance improvements over baseline models, particularly in the field of math reasoning:

| Model                   | AIME24 | AIME25 | HMMT Feb 25 | OlympiadBench | BRUMO25 | Avg. |
|-------------------------|--------|--------|-------------|---------------|---------|------|
| DeepSeek-R1-Distill-32B | 72.6   | 51.8   | 33.0        | 65.0          | 68.0    | 58.1 |
| Qwen3-1.7B              | 48.3   | 36.8   | 22.2        | 56.1          | 44.1    | 41.5 |
| Nemotron-1.5B           | 61.8   | 49.5   | 31.6        | 64.6          | 58.2    | 53.1 |
| QuestA-Nemotron-1.5B    | 72.5   | 62.3   | 41.7        | 70.4          | 69.5    | 63.3 |

  • Pass@k Performance: Shows consistent improvement across various difficulty levels.
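For reference, pass@k is commonly computed with the unbiased estimator of Chen et al. (2021): given n samples per problem of which c are correct, it estimates the probability that at least one of k draws is correct. This is the standard metric definition, not necessarily QuestA's exact evaluation script.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples per problem, c: correct samples, k: budget.
    """
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 16 samples, 4 correct
print(round(pass_at_k(16, 4, 1), 4))  # 0.25
```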

Quickstart

To get started with QuestA, you can load the model using the transformers library. Make sure you have the latest version installed.

```shell
pip install -U transformers
```

Example Python code to run QuestA:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "QuestA/QuestA-Nemotron-1.5B"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# Generate a response for a (possibly augmented) question
prompt = "Solve for x: 2x + 3 = 11."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode the response
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

For deployment, QuestA can be served using frameworks like vLLM or SGLang:

```shell
# For vLLM
vllm serve QuestA/QuestA-Nemotron-1.5B --tensor-parallel-size 8 --max-model-len 32768
```
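Once the server is up, it exposes an OpenAI-compatible API. A minimal client sketch, assuming the default host and port (http://localhost:8000) and the stdlib only:

```python
import json
import urllib.request


def query_vllm(prompt: str, host: str = "http://localhost:8000") -> str:
    """Send a completion request to the OpenAI-compatible endpoint
    exposed by `vllm serve` (default host/port assumed)."""
    payload = {
        "model": "QuestA/QuestA-Nemotron-1.5B",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.6,
    }
    req = urllib.request.Request(
        f"{host}/v1/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]


# With the server running:
# print(query_vllm("Solve for x: 2x + 3 = 11."))
```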

Key Features

  • Question Augmentation: Prepend partial solutions to difficult problems, aiding model learning.
  • Curriculum-based RL: Gradually reduce dependency on hints as training progresses.
  • Training with Augmented Data: Use dynamically filtered datasets to focus on the hardest problems.
  • Efficient Learning: Faster convergence on complex tasks due to better sampling and more informative rewards.
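The curriculum idea can be sketched as a hint schedule: the fraction of the partial solution revealed shrinks as training progresses, so the model gradually weans off the hints. The linear schedule and helper names below are assumptions for illustration, not the paper's exact recipe.

```python
def hint_ratio(step: int, total_steps: int, start: float = 0.5) -> float:
    """Linearly decay the fraction of the partial solution shown as a hint."""
    progress = min(step / total_steps, 1.0)
    return start * (1.0 - progress)


def build_prompt(problem: str, partial_solution: str,
                 step: int, total_steps: int) -> str:
    """Attach a hint whose length follows the current curriculum stage."""
    r = hint_ratio(step, total_steps)
    hint = partial_solution[: int(len(partial_solution) * r)]
    return f"{problem}\n\nHint: {hint}" if hint else problem


print(hint_ratio(0, 100))    # 0.5  (half the solution shown at the start)
print(hint_ratio(100, 100))  # 0.0  (no hint by the end of training)
```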

Citation

If you find this work useful, please cite our paper:

@misc{li2025questaexpandingreasoningcapacity,
      title={QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation}, 
      author={Jiazheng Li and Hong Lu and Kaiyue Wen and Zaiwen Yang and Jiaxuan Gao and Hongzhou Lin and Yi Wu and Jingzhao Zhang},
      year={2025},
      eprint={2507.13266},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.13266}, 
}

For more details on the methodology, results, and code, visit the official QuestA GitHub repository.

Conclusion

QuestA is a framework for enhancing LLMs' reasoning capabilities on hard problems. By augmenting training prompts with partial solutions, QuestA accelerates RL learning and achieves state-of-the-art results among 1.5B-parameter models on math reasoning benchmarks.
