---
license: apache-2.0
language:
- ko
- en
base_model:
- Qwen/Qwen2.5-72B
tags:
- medical
- clinical
- QA
- benchmark
- healthcare
- korean
---

# 🧠 Korean Medical LLM (QA-Finetuned) by the Healthcare AI Research Institute of Seoul National University Hospital

Welcome to the official repository of the **Korean Medical Large Language Model (LLM)** developed by the **Healthcare AI Research Institute (HARI)** at **Seoul National University Hospital (SNUH)**.

This model is **fine-tuned on Korean medical question-answering (QA) data** and targets clinical reasoning, educational Q&A, and domain-specific medical inference.

---

## 🚀 Model Overview

* **Model Name**: `snuh/hari-q2.5`
* **Architecture**: Decoder-only Large Language Model (LLM), fine-tuned from `Qwen/Qwen2.5-72B`
* **Fine-tuning Objective**: Medical QA (Question–Answer) style generation
* **Primary Languages**: Korean, English
* **Domain**: Clinical Medicine
* **Performance**: Achieves **82.30% accuracy** on the Doctor subset of **KorMedMCQA**, a benchmark in the style of the **Korean Medical Licensing Examination (KMLE)**
* **Key Applications**:
  * Clinical decision support (QA-style)
  * Medical education and self-assessment tools
  * Automated medical reasoning and documentation aid

---

## 📊 Training Data & Benchmark

This model was fine-tuned using a curated corpus of Korean medical QA-style data derived from **publicly available, de-identified sources**. The training data includes clinical guidelines, academic publications, exam-style questions, and synthetic prompts reflecting real-world clinical reasoning.

* **Training Data Characteristics**:
  - Focused on Korean-language question–answering formats relevant to clinical settings.
  - Includes guideline-derived questions, de-identified case descriptions, and physician-crafted synthetic queries.
  - Designed to reflect realistic diagnostic, therapeutic, and decision-making scenarios.

* **Benchmark Evaluation**:
  - **KMLE-style QA benchmark (KorMedMCQA)**, accuracy by subset:
    - Doctor: 82.30%
    - Nurse: 90.43%
    - Pharm: 86.21%
    - Dentist: 73.86%
  - All evaluations were conducted on de-identified, non-clinical test sets, with no real patient data involved.

> ⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.

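For readers who want to reproduce multiple-choice accuracy figures of this kind, the following is a minimal sketch of one common scoring approach. It is not the evaluation code used for this model card. It assumes five options labeled A to E and that the model states its final choice as a standalone letter; `extract_choice` and `accuracy` are hypothetical helper names introduced here for illustration.

```python
import re
from typing import Optional, Sequence

# Assumption: options are labeled A-E. The lookarounds require that the letter
# is not part of a longer Latin word, while still matching when it is directly
# followed by Korean text (e.g. "B입니다").
CHOICE_PATTERN = re.compile(r"(?<![A-Za-z])([A-E])(?![A-Za-z])")


def extract_choice(response: str) -> Optional[str]:
    """Return the last standalone option letter in a response, or None."""
    matches = CHOICE_PATTERN.findall(response)
    return matches[-1] if matches else None


def accuracy(responses: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of responses whose extracted letter equals the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


# Toy data for illustration only; these are not benchmark items.
responses = ["정답은 B입니다.", "Answer: C", "잘 모르겠습니다."]
gold = ["B", "C", "A"]
print(f"accuracy = {accuracy(responses, gold):.2%}")
```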
---

## 🔐 Privacy & Ethical Compliance

We strictly adhere to ethical AI development and privacy protection:

* ✅ The model was trained exclusively on **publicly available and de-identified data**.
* 🔒 It does **not include any real patient data or personally identifiable information (PII)**.
* ⚖️ Designed for **safe, responsible, and research-oriented** use in healthcare AI.

> ⚠️ This model is intended for **research and educational purposes only** and should **not** be used to make clinical decisions.

---

## 🏥 About HARI – Healthcare AI Research Institute

The **Healthcare AI Research Institute (HARI)** is a pioneering research group within **Seoul National University Hospital**, driving innovation in medical AI.

### 🌍 Vision & Mission

* **Vision**: Shaping a sustainable and healthy future through pioneering AI research.
* **Mission**:
  * Develop clinically useful, trustworthy AI technologies.
  * Foster cross-disciplinary collaboration in medicine and AI.
  * Lead global healthcare AI commercialization and policy frameworks.
  * Educate the next generation of AI-powered medical professionals.

---

## 🧪 Research Platforms & Infrastructure

* **Platforms**: SUPREME, SNUHUB, DeView, VitalDB, NSTRI Global Data Platform
* **Computing**: NVIDIA H100 / A100 GPUs, Quantum AI Infrastructure
* **Projects**:
  * Clinical note summarization
  * AI-powered diagnostics
  * EHR automation
  * Real-time monitoring via AI pipelines

---

## 🎓 AI Education Programs

* **Basic AI for Healthcare**: Designed for clinicians and students
* **Advanced AI Research**: Targeting senior researchers and specialists in clinical AI validation and deep learning

---

## 🤝 Collaborate with Us

We welcome collaboration with:

* AI research institutions and medical universities
* Healthcare startups and technology partners
* Policymakers shaping AI regulation in medicine

📧 **Contact**: [help-ds@snuh.org](mailto:help-ds@snuh.org)  
🌐 **Website**: [Seoul National University Hospital](https://www.snuh.org)

---

## 🤗 Model Usage Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "snuh/hari-q2.5"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

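# Korean prompt. The instruction asks the model to act as a careful, clinically
# knowledgeable Korean-language medical assistant and to give its reasoning and
# a diagnosis. The question describes a 60-year-old man with abdominal pain,
# fever, an elevated white blood cell count, and right lower quadrant
# tenderness, and asks for the most likely diagnosis.
prompt = '''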
### Instruction:
당신은 임상 지식을 갖춘 유능하고 신뢰할 수 있는 한국어 기반 의료 어시스턴트입니다.
사용자의 질문에 대해 정확하고 신중한 임상 추론을 바탕으로 진단 가능성을 제시해 주세요.
반드시 환자의 연령, 증상, 검사 결과, 통증 부위 등 모든 단서를 종합적으로 고려하여 추론 과정과 진단명을 제시해야 합니다.
의학적으로 정확한 용어를 사용하되, 필요하다면 일반인이 이해하기 쉬운 용어도 병행해 설명해 주세요.

### Question:
60세 남성이 복통과 발열을 호소하며 내원하였습니다.
혈액 검사 결과 백혈구 수치가 상승했고, 우측 하복부 압통이 확인되었습니다.
가장 가능성이 높은 진단명은 무엇인가요?

### Reasoning:
'''.strip()

messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated continuation is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
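
The usage example relies on a fixed `### Instruction:` / `### Question:` / `### Reasoning:` layout. As a convenience, the sketch below assembles that layout from two strings so the question can be swapped without retyping the template. `build_prompt` is a hypothetical helper, not part of the released model or of `transformers`, and the short English instruction is a placeholder rather than the instruction used in training.

```python
def build_prompt(instruction: str, question: str) -> str:
    """Assemble the Instruction / Question / Reasoning layout used in the usage example."""
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n\n"
        "### Question:\n"
        f"{question.strip()}\n\n"
        "### Reasoning:"
    )


# Placeholder instruction and a shortened version of the example question.
prompt = build_prompt(
    instruction="You are a careful Korean-language medical assistant.",
    question="60세 남성이 복통과 발열을 호소하며 내원하였습니다. 가장 가능성이 높은 진단명은 무엇인가요?",
)

# The resulting string can be passed as the user message to apply_chat_template.
messages = [{"role": "user", "content": prompt}]
print(prompt)
```

Ending the prompt at `### Reasoning:` mirrors the original example, which leaves the reasoning section open for the model to complete.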

---

## 📄 License

**Apache 2.0 License** – Free for research and commercial use with attribution.

---

## 📢 Citation

If you use this model in your work, please cite:

```
@misc{hari-q2.5,
    title  = {hari-q2.5},
    url    = {https://huggingface.co/snuh/hari-q2.5},
    author = {{Healthcare AI Research Institute (HARI) of Seoul National University Hospital (SNUH)}},
    month  = {May},
    year   = {2025}
}
```

---

## 🚀 Together, we are shaping the future of AI-driven healthcare.
