arxiv:2506.09513

ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning

Published on Jun 11
· Submitted by YuSun-AI on Jun 13
#1 Paper of the day
Authors:
Yu Sun et al.

Abstract

ReasonMed, a large medical reasoning dataset, enhances the accuracy of medical question answering models by combining detailed reasoning paths with concise summaries, setting new benchmarks for model performance.

AI-generated summary

Though reasoning-based large language models (LLMs) have excelled in mathematics and programming, their capabilities in knowledge-intensive medical question answering remain underexplored. To address this, we introduce ReasonMed, the largest medical reasoning dataset, comprising 370k high-quality examples distilled from 1.7 million initial reasoning paths generated by various LLMs. ReasonMed is constructed through a multi-agent verification and refinement process, where we design an Error Refiner to enhance the reasoning paths by identifying and correcting error-prone steps flagged by a verifier. Leveraging ReasonMed, we systematically investigate best practices for training medical reasoning models and find that combining detailed Chain-of-Thought (CoT) reasoning with concise answer summaries yields the most effective fine-tuning strategy. Based on this strategy, we train ReasonMed-7B, which sets a new benchmark for sub-10B models, outperforming the prior best by 4.17% and even exceeding LLaMA3.1-70B on PubMedQA by 4.60%.
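The fine-tuning strategy described above pairs a detailed chain-of-thought with a concise answer summary in each training target. A minimal sketch of what such an example might look like follows; the field names, prompt template, and `build_example` helper are assumptions for illustration, not the paper's actual data format.

```python
# Hypothetical sketch of a CoT-plus-summary training example.
# Field names and formatting are assumptions, not the paper's schema.

def build_example(question: str, cot_steps: list[str], summary: str) -> dict:
    """Combine detailed CoT reasoning with a concise summary into one target."""
    reasoning = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(cot_steps))
    target = f"{reasoning}\n\nSummary: {summary}"
    return {"prompt": question, "completion": target}

example = build_example(
    "Which electrolyte disturbance most commonly causes peaked T waves?",
    ["Peaked T waves on ECG suggest an electrolyte abnormality.",
     "Hyperkalemia raises the resting membrane potential and speeds repolarization.",
     "This produces tall, narrow, peaked T waves."],
    "Hyperkalemia.",
)
print(example["completion"])
```

The idea is that the model learns both to reason step by step and to commit to a short final answer, which is the combination the paper reports as most effective.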

Community

Paper author and submitter

We are excited to share our latest work, ReasonMed! We are dedicated to tackling knowledge-intensive reasoning challenges in the medical domain. To this end, we have built the largest open-source medical reasoning dataset to date and trained state-of-the-art (SOTA) models that outperform competitors at comparable parameter scales.

💪 What makes ReasonMed stand out?

⚕️ Largest & Highest-Quality Medical Reasoning Dataset!
We constructed and open-sourced ReasonMed, a dataset of 370,000 rigorously validated, high-quality reasoning paths. It not only sets a new benchmark in scale but also equips large language models (LLMs) with robust, reliable medical reasoning capabilities.

🧠 Innovative Multi-Agent Framework Ensures Excellence!
To address knowledge domain variations across models, we pioneered a Multi-Agent data generation framework. This system intelligently coordinates diverse "expert models" and dynamically adjusts reasoning strategies based on task complexity. The resulting dataset outperforms outputs from top-tier models like GPT-4o and DeepSeek-R1 in direct quality comparisons!

🔬 First Systematic Validation of "Reasoning" in Medicine!
While reasoning capabilities shine in mathematics and coding, their value in knowledge-dense medical scenarios remains underexplored. For the first time, we systematically evaluated the real-world benefits of explicit medical reasoning using unified data sources, offering critical empirical insights into "how LLMs can think deeper in medicine."

🔥 Small Model, Big Impact!
Our ReasonMed-7B model, trained on the ReasonMed dataset, achieves SOTA performance on public benchmarks among sub-10B-parameter models, surpassing many larger counterparts while remaining computationally efficient and accurate!

We believe this work lays a solid data and methodological foundation for advancing medical AI.


