Korean Medical LLM (QA-Finetuned) by HARI @ Seoul National University Hospital
Welcome to the official repository of the Korean Medical Large Language Model (LLM) developed by the Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH).
This model is fine-tuned on Korean medical question-answering (QA) style data, enabling robust performance in clinical reasoning, educational Q&A, and domain-specific medical inference.
Model Overview
- Model Name: snuh/hari-q2.5
- Architecture: Large Language Model (LLM)
- Fine-tuning Objective: Medical QA (question-answer) style generation
- Primary Languages: English, Korean
- Domain: Clinical Medicine
- Performance: Achieves 82.30% accuracy on the doctor subset of KorMedMCQA, a QA benchmark in the style of the Korean Medical Licensing Examination (KMLE)
- Key Applications:
  - Clinical decision support (QA-style)
  - Medical education and self-assessment tools
  - Automated medical reasoning and documentation aid
Training Data & Benchmark
This model was fine-tuned using a curated corpus of Korean medical QA-style data derived from publicly available, de-identified sources. The training data includes clinical guidelines, academic publications, exam-style questions, and synthetic prompts reflecting real-world clinical reasoning.
Training Data Characteristics:
- Focused on Korean-language question-answering formats relevant to clinical settings.
- Includes guideline-derived questions, de-identified case descriptions, and physician-crafted synthetic queries.
- Designed to reflect realistic diagnostic, therapeutic, and decision-making scenarios.
Benchmark Evaluation:
- KMLE-style QA benchmark (KorMedMCQA)
  - Doctor: 82.30%
  - Nurse: 90.43%
  - Pharm: 86.21%
  - Dentist: 73.86%
- All evaluations were conducted on de-identified, non-clinical test sets, with no real patient data involved.
⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.
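For readers who want to reproduce this kind of per-subset score, accuracy on a multiple-choice benchmark is the fraction of questions whose predicted option matches the answer key. The sketch below is a minimal illustration under that assumption, not the evaluation harness behind the numbers above; the `subset_accuracy` helper and the toy records are hypothetical.

```python
from collections import defaultdict


def subset_accuracy(records):
    """Return {subset: accuracy in percent} for (subset, predicted, gold) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subset, predicted, gold in records:
        total[subset] += 1
        if predicted == gold:
            correct[subset] += 1
    return {name: 100.0 * correct[name] / total[name] for name in total}


# Toy, made-up records for illustration only (not real benchmark items).
toy_records = [
    ("doctor", "A", "A"),
    ("doctor", "C", "B"),
    ("nurse", "D", "D"),
    ("nurse", "E", "E"),
]
scores = subset_accuracy(toy_records)
print(scores)
```

Reporting each subset separately, as the list above does, avoids hiding a weak subset behind a strong one when subset sizes differ.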
Privacy & Ethical Compliance
We strictly adhere to ethical AI development and privacy protection:
- The model was trained exclusively on publicly available and de-identified data.
- It does not include any real patient data or personally identifiable information (PII).
- Designed for safe, responsible, and research-oriented use in healthcare AI.
⚠️ This model is intended for research and educational purposes only and should not be used to make clinical decisions.
About HARI (Healthcare AI Research Institute)
The Healthcare AI Research Institute (HARI) is a pioneering research group within Seoul National University Hospital, driving innovation in medical AI.
Vision & Mission
- Vision: Shaping a sustainable and healthy future through pioneering AI research.
- Mission:
  - Develop clinically useful, trustworthy AI technologies.
  - Foster cross-disciplinary collaboration in medicine and AI.
  - Lead global healthcare AI commercialization and policy frameworks.
  - Educate the next generation of AI-powered medical professionals.
Research Platforms & Infrastructure
- Platforms: SUPREME, SNUHUB, DeView, VitalDB, NSTRI Global Data Platform
- Computing: NVIDIA H100 / A100 GPUs, Quantum AI Infrastructure
- Projects:
  - Clinical note summarization
  - AI-powered diagnostics
  - EHR automation
  - Real-time monitoring via AI pipelines
AI Education Programs
- Basic AI for Healthcare: Designed for clinicians and students
- Advanced AI Research: Targeting senior researchers and specialists in clinical AI validation and deep learning
Collaborate with Us
We welcome collaboration with:
- AI research institutions and medical universities
- Healthcare startups and technology partners
- Policymakers shaping AI regulation in medicine
Contact: [email protected]
Website: Seoul National University Hospital
Model Usage Example
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the model and tokenizer
model_name = "snuh/hari-q2.5"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
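# The prompt is written in Korean. English gloss of the instruction: "You are a
# capable, trustworthy Korean-language medical assistant with clinical knowledge.
# Suggest possible diagnoses based on accurate, careful clinical reasoning. Consider
# every clue (age, symptoms, test results, pain location) and present both the
# reasoning process and the diagnosis. Use medically precise terms, adding
# plain-language explanations where needed."
# Gloss of the question: "A 60-year-old man presents with abdominal pain and fever.
# Blood tests show an elevated white blood cell count, and right lower quadrant
# tenderness is found. What is the most likely diagnosis?"
prompt = '''
### Instruction:
당신은 임상 지식을 갖춘 유능하고 신뢰할 수 있는 한국어 기반 의료 어시스턴트입니다.
사용자의 질문에 대해 정확하고 신중한 임상 추론을 바탕으로 진단 가능성을 제시해 주세요.
반드시 환자의 연령, 증상, 검사 결과, 통증 부위 등 모든 단서를 종합적으로 고려하여 추론 과정과 진단명을 제시해야 합니다.
의학적으로 정확한 용어를 사용하되, 필요하다면 일반인이 이해하기 쉬운 용어도 병행해 설명해 주세요.
### Question:
60세 남성이 복통과 발열을 호소하며 내원하였습니다.
혈액 검사 결과 백혈구 수치가 상승하고, 우측 하복부 압통이 확인되었습니다.
가장 가능성이 높은 진단명은 무엇인가요?
### Reasoning:
'''.strip()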
# Wrap the prompt in chat messages and render them with the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Generate up to 512 new tokens
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
)
# Drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
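The example above hard-codes one question inside the prompt string. To reuse the same `### Instruction:` / `### Question:` / `### Reasoning:` layout for other questions, a small formatting helper along these lines could be used. This is a sketch only: `build_prompt` is a hypothetical name, not part of the model's or the `transformers` API, and the English strings are placeholders.

```python
def build_prompt(instruction: str, question: str) -> str:
    """Format an instruction and a question using the section layout of the usage example."""
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n"
        "### Question:\n"
        f"{question.strip()}\n"
        "### Reasoning:"
    )


# Placeholder strings; in practice these would be the Korean instruction and question.
example = build_prompt(
    "You are a careful medical assistant.",
    "What is the most likely diagnosis?",
)
print(example)
```

The returned string can be passed as the `content` of the user message in place of `prompt`.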
License
Apache 2.0 License: free for research and commercial use with attribution.
Citation
If you use this model in your work, please cite:
@misc{hari-q2.5,
  title = {hari-q2.5},
  url = {https://huggingface.co/snuh/hari-q2.5},
  author = {{Healthcare AI Research Institute (HARI) of Seoul National University Hospital (SNUH)}},
  month = {May},
  year = {2025}
}
Together, we are shaping the future of AI-driven healthcare.
Base model: Qwen/Qwen2.5-72B