---
license: apache-2.0
language:
- ko
- en
base_model:
- Qwen/Qwen2.5-72B
tags:
- medical
- clinical
- QA
- benchmark
- healthcare
- korean
---

🧠 **Korean Medical LLM (QA-Finetuned) by Healthcare AI Research Institute of Seoul National University Hospital**

Welcome to the official repository of the **Korean Medical Large Language Model (LLM)** developed by the **Healthcare AI Research Institute (HARI)** at **Seoul National University Hospital (SNUH)**.

This model is **fine-tuned on Korean medical question–answering (QA) style data**, enabling robust performance in clinical reasoning, educational Q&A, and domain-specific medical inference.

---

## 🚀 Model Overview

* **Model Name**: `snuh/hari-q2.5`
* **Architecture**: Large Language Model (LLM), fine-tuned from `Qwen/Qwen2.5-72B`
* **Fine-tuning Objective**: Medical QA (Question–Answer) style generation
* **Primary Languages**: English, Korean
* **Domain**: Clinical Medicine
* **Performance**: Achieves **82% accuracy** on the **Korean Medical Licensing Examination (KMLE)**
* **Key Applications**:
  * Clinical decision support (QA-style)
  * Medical education and self-assessment tools
  * Automated medical reasoning and documentation aid

---

## 📊 Training Data & Benchmark

This model was fine-tuned using a curated corpus of Korean medical QA-style data derived from **publicly available, de-identified sources**. The training data includes clinical guidelines, academic publications, exam-style questions, and synthetic prompts reflecting real-world clinical reasoning.

* **Training Data Characteristics**:
  - Focused on Korean-language question–answering formats relevant to clinical settings.
  - Includes guideline-derived questions, de-identified case descriptions, and physician-crafted synthetic queries.
  - Designed to reflect realistic diagnostic, therapeutic, and decision-making scenarios.
* **Benchmark Evaluation**:
  - **KMLE-style QA benchmark (KorMedMCQA)**
    - Doctor: 82.30%
    - Nurse: 90.43%
    - Pharm: 86.21%
    - Dentist: 73.86%
  - All evaluations were conducted on de-identified, non-clinical test sets, with no real patient data involved.

> ⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.

---

## 🔐 Privacy & Ethical Compliance

We strictly adhere to ethical AI development and privacy protection:

* ✅ The model was trained exclusively on **publicly available and de-identified data**.
* 🔒 It does **not include any real patient data or personally identifiable information (PII)**.
* ⚖️ Designed for **safe, responsible, and research-oriented** use in healthcare AI.

> ⚠️ This model is intended for **research and educational purposes only** and should **not** be used to make clinical decisions.

---

## 🏥 About HARI – Healthcare AI Research Institute

The **Healthcare AI Research Institute (HARI)** is a pioneering research group within **Seoul National University Hospital**, driving innovation in medical AI.

### 🌍 Vision & Mission

* **Vision**: Shaping a sustainable and healthy future through pioneering AI research.
* **Mission**:
  * Develop clinically useful, trustworthy AI technologies.
  * Foster cross-disciplinary collaboration in medicine and AI.
  * Lead global healthcare AI commercialization and policy frameworks.
  * Educate the next generation of AI-powered medical professionals.

---

## 🧪 Research Platforms & Infrastructure

* **Platforms**: SUPREME, SNUHUB, DeView, VitalDB, NSTRI Global Data Platform
* **Computing**: NVIDIA H100 / A100 GPUs, Quantum AI Infrastructure
* **Projects**:
  * Clinical note summarization
  * AI-powered diagnostics
  * EHR automation
  * Real-time monitoring via AI pipelines

---

## 🎓 AI Education Programs

* **Basic AI for Healthcare**: Designed for clinicians and students
* **Advanced AI Research**: Targeting senior researchers and specialists in clinical AI validation and deep learning

---

## 🤝 Collaborate with Us

We welcome collaboration with:

* AI research institutions and medical universities
* Healthcare startups and technology partners
* Policymakers shaping AI regulation in medicine

📧 **Contact**: [help-ds@snuh.org](mailto:help-ds@snuh.org)

🌐 **Website**: [Seoul National University Hospital](https://www.snuh.org)

---

## 🤗 Model Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model.
# device_map="auto" places the weights on the available devices
# (this option requires the `accelerate` package).
model_name = "snuh/hari-q2.5"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The prompt is written in Korean. English translation:
#   Instruction: You are a capable, trustworthy Korean-language medical
#     assistant with clinical knowledge. Present possible diagnoses for the
#     user's question based on accurate and careful clinical reasoning.
#     Consider every clue together (the patient's age, symptoms, test results,
#     site of pain, and so on) and give both the reasoning process and the
#     diagnosis. Use medically precise terms and, where needed, add terms a
#     layperson can easily understand.
#   Question: A 60-year-old man presents with abdominal pain and fever.
#     Blood tests show an elevated white blood cell count, and right lower
#     quadrant tenderness is confirmed. What is the most likely diagnosis?
prompt = '''
### Instruction:
당신은 임상 지식을 갖춘 유능하고 신뢰할 수 있는 한국어 기반 의료 어시스턴트입니다.
사용자의 질문에 대해 정확하고 신중한 임상 추론을 바탕으로 진단 가능성을 제시해 주세요.
반드시 환자의 연령, 증상, 검사 결과, 통증 부위 등 모든 단서를 종합적으로 고려하여 추론 과정과 진단명을 제시해야 합니다.
의학적으로 정확한 용어를 사용하되, 필요하다면 일반인이 이해하기 쉬운 용어도 병행해 설명해 주세요.

### Question:
60세 남성이 복통과 발열을 호소하며 내원하였습니다.
혈액 검사 결과 백혈구 수치가 상승했고, 우측 하복부 압통이 확인되었습니다.
가장 가능성이 높은 진단명은 무엇인가요?

### Reasoning:
'''.strip()

# Wrap the prompt in the chat format expected by the tokenizer's chat template.
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

---

## 📄 License

**Apache 2.0 License** – Free for research and commercial use with attribution.

---

## 📢 Citation

If you use this model in your work, please cite:

```
@misc{hari-q2.5,
  title  = {hari-q2.5},
  url    = {https://huggingface.co/snuh/hari-q2.5},
  author = {Healthcare AI Research Institute (HARI) of Seoul National University Hospital (SNUH)},
  month  = {May},
  year   = {2025}
}
```

---

## 🚀 Together, we are shaping the future of AI-driven healthcare.
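
## 🧩 Prompt Template Sketch

The usage example above writes its prompt as three blocks: `### Instruction:`, `### Question:`, and `### Reasoning:`. As a minimal sketch, assuming that layout is what the model expects and that a blank line between blocks is acceptable (this card does not specify either), the helper below assembles such a prompt programmatically. The names `build_prompt`, `build_messages`, and `SYSTEM_MESSAGE` are hypothetical and not part of any released API; the system message text is copied from the usage example.

```python
# Hypothetical helpers; not part of the snuh/hari-q2.5 release.

# System message copied from the usage example in this card.
SYSTEM_MESSAGE = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def build_prompt(instruction: str, question: str) -> str:
    """Assemble a prompt in the Instruction / Question / Reasoning layout.

    Assumption: blocks are separated by one blank line and the prompt ends
    with an empty "### Reasoning:" block for the model to continue.
    """
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n\n"
        "### Question:\n"
        f"{question.strip()}\n\n"
        "### Reasoning:"
    )


def build_messages(instruction: str, question: str) -> list:
    """Wrap the assembled prompt in a system + user chat message list."""
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": build_prompt(instruction, question)},
    ]


if __name__ == "__main__":
    # Illustrative English inputs; the card's own example uses Korean text.
    demo = build_messages(
        "You are a careful medical assistant.",
        "What is the most likely diagnosis?",
    )
    print(demo[1]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template(...)` in the same way as `messages` in the usage example.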