arxiv:2506.09513

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Published on Jun 11
· Submitted by YuSun-AI on Jun 13
#1 Paper of the day
Authors:
Yu Sun et al.

Abstract

ReasonMed, a large medical reasoning dataset, enhances the accuracy of medical question answering models by combining detailed reasoning paths with concise summaries, setting new benchmarks for model performance.

AI-generated summary

Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
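The fine-tuning strategy described above pairs a detailed chain-of-thought with a concise answer summary in each training target. A minimal sketch of what such an example might look like follows; the field names, prompt template, and `build_example` helper are assumptions for illustration, not the paper's actual data format.

```python
# Hypothetical sketch of a CoT-plus-summary training example.
# Field names and formatting are assumptions, not the paper's schema.

def build_example(question: str, cot_steps: list[str], summary: str) -> dict:
    """Combine detailed CoT reasoning with a concise summary into one target."""
    reasoning = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(cot_steps))
    target = f"{reasoning}\n\nSummary: {summary}"
    return {"prompt": question, "completion": target}

example = build_example(
    "Which electrolyte disturbance most commonly causes peaked T waves?",
    ["Peaked T waves on ECG suggest an electrolyte abnormality.",
     "Hyperkalemia raises the resting membrane potential and speeds repolarization.",
     "This produces tall, narrow, peaked T waves."],
    "Hyperkalemia.",
)
print(example["completion"])
```

The idea is that the model learns both to reason step by step and to commit to a short final answer, which is the combination the paper reports as most effective.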

Community

Paper author and submitter

We are excited to share our latest work, ReasonMed! We are dedicated to tackling knowledge-intensive reasoning challenges in the medical domain. To this end, we have built the largest open-source medical reasoning dataset to date and trained state-of-the-art (SOTA) models that outperform competitors at comparable parameter scales.

💪 What makes ReasonMed stand out?

⚕️ Largest & Highest-Quality Medical Reasoning Dataset!
We constructed and open-sourced ReasonMed, a dataset of 370,000 rigorously validated, high-quality reasoning paths. It not only sets a new benchmark in scale but also equips large language models (LLMs) with robust, reliable medical reasoning capabilities.

🧠 Innovative Multi-Agent Framework Ensures Excellence!
To address knowledge domain variations across models, we pioneered a Multi-Agent data generation framework. This system intelligently coordinates diverse "expert models" and dynamically adjusts reasoning strategies based on task complexity. The resulting dataset outperforms outputs from top-tier models like GPT-4o and DeepSeek-R1 in direct quality comparisons!

🔬 First Systematic Validation of "Reasoning" in Medicine!
While reasoning capabilities shine in mathematics and coding, their value in knowledge-dense medical scenarios remains underexplored. For the first time, we systematically evaluated the real-world benefits of explicit medical reasoning using unified data sources, offering critical empirical insights into "how LLMs can think deeper in medicine."

🔥 Small Model, Big Impact!
Our ReasonMed-7B model, trained on the ReasonMed dataset, achieves SOTA performance on public benchmarks among sub-10B-parameter models, surpassing many larger counterparts while remaining computationally efficient and accurate!

We believe this work lays a solid data and methodological foundation for advancing medical AI.


