👨‍⚕️ Med-R1
Encouraging medical LLMs to engage in deep thinking, similar to DeepSeek-R1.

Overview
Inspired by the success of DeepSeek-R1 in training models with RL, many open-source projects have made significant progress in exploring the effectiveness of RL training. These projects not only validate RL as an effective training method but also showcase its potential across various application scenarios. However, such efforts have so far replicated this success only within limited domains and with smaller-parameter LLMs, without fully extending to larger, more complex models and a broader range of tasks.
Med-R1 is dedicated to carrying the success of RL in training general-domain LLMs over to the medical field. To ensure that the model possesses a comprehensive reserve of medical knowledge, we adopt a large-parameter base model. Specifically, this model integrates vast amounts of multi-source, heterogeneous data, including medical literature, clinical guidelines, and electronic health records, during the pre-training phase. Through fine-tuning, the model is refined to accurately understand and generate specialized medical content with nuanced reasoning.
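DeepSeek-R1-style models emit their chain of thought inside `<think>...</think>` tags before the final answer. Assuming Med-R1 follows the same output convention (not stated explicitly in this README), a minimal sketch for separating the reasoning from the answer in a completion might look like this; the example completion text is purely illustrative:

```python
import re


def split_reasoning(text: str):
    """Split an R1-style completion into (reasoning, answer).

    R1-style models wrap their chain of thought in <think>...</think>
    tags; whatever follows the closing tag is the final answer. If no
    tags are present, the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer


# Hypothetical model output, for illustration only:
out = ("<think>Fever plus productive cough suggests pneumonia.</think>\n"
       "Likely pneumonia; order a chest X-ray.")
r, a = split_reasoning(out)
```

Keeping the reasoning and answer separate like this is useful when you want to display only the final answer to end users while logging the full trace for evaluation.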
- Achieves SOTA with only a limited number of training samples: a model with only 14B parameters outperforms Qwen2.5-72B-Instruct and Claude-3.5-Sonnet on various medical tasks.
- Shows strong generalization across diverse problem types: it excels not only at medical tasks but also at mathematics and various other challenging problems (such as Ruozhiba).
- Releases open-source models and accompanying tools, such as training datasets and a Gradio demo.
Citation
@Misc{med-r1,
title = {Med-R1: Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1},
author = {Rongsheng Wang},
howpublished = {\url{https://github.com/WangRongsheng/Med-R1}},
year = {2025}
}