🧠 Korean Medical LLM (QA-Finetuned) by HARI @ Seoul National University Hospital

Welcome to the official repository of the Korean Medical Large Language Model (LLM) developed by the Healthcare AI Research Institute (HARI) at Seoul National University Hospital (SNUH).

This model is fine-tuned on Korean medical question–answering (QA) style data, enabling robust performance in clinical reasoning, educational Q&A, and domain-specific medical inference.


🚀 Model Overview

  • Model Name: snuh/hari-q2.5
  • Architecture: Causal large language model (LLM) fine-tuned from Qwen/Qwen2.5-72B (72.7B parameters)
  • Fine-tuning Objective: Medical QA (Question–Answer) style generation
  • Primary Languages: Korean, English
  • Domain: Clinical Medicine
  • Performance: Achieves 82.30% accuracy on the Doctor subset of KorMedMCQA, a Korean Medical Licensing Examination (KMLE)-style benchmark
  • Key Applications:
    • Research on QA-style clinical decision support
    • Medical education and self-assessment tools
    • Automated medical reasoning and documentation aid

📊 Training Data & Benchmark

This model was fine-tuned using a curated corpus of Korean medical QA-style data derived from publicly available, de-identified sources. The training data includes clinical guidelines, academic publications, exam-style questions, and synthetic prompts reflecting real-world clinical reasoning.

  • Training Data Characteristics:

    • Focused on Korean-language question–answering formats relevant to clinical settings.
    • Includes guideline-derived questions, de-identified case descriptions, and physician-crafted synthetic queries.
    • Designed to reflect realistic diagnostic, therapeutic, and decision-making scenarios.
  • Benchmark Evaluation:

    • KMLE-style QA benchmark (KorMedMCQA), accuracy by subset:
      • Doctor: 82.30%
      • Nurse: 90.43%
      • Pharm: 86.21%
      • Dentist: 73.86%
    • All evaluations were conducted on de-identified, non-clinical test sets, with no real patient data involved.

⚠️ These benchmarks are provided for research purposes only and do not imply clinical safety or efficacy.
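For readers who want to reproduce a multiple-choice accuracy figure of this kind, the following is a minimal scoring sketch, not the evaluation code used for the numbers above. It assumes, purely for illustration, that the model is prompted to end its output with a line such as `Answer: B` and that gold answers are option letters A to E. The helper names `extract_choice` and `accuracy` are hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumption: the model is asked to finish with "Answer: <letter>".
# Adjust the pattern if the benchmark uses another answer encoding.
_ANSWER_PATTERN = re.compile(r"Answer:\s*([A-E])")


def extract_choice(generated_text: str) -> Optional[str]:
    """Return the option letter following 'Answer:', or None if absent."""
    match = _ANSWER_PATTERN.search(generated_text)
    return match.group(1) if match else None


def accuracy(predictions: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold answers."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(pred == answer for pred, answer in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

Outputs with no parseable answer are returned as `None` and therefore count as incorrect, which keeps the score conservative.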


🔐 Privacy & Ethical Compliance

We strictly adhere to ethical AI development and privacy protection:

  • ✅ The model was trained exclusively on publicly available and de-identified data.
  • 🔒 It does not include any real patient data or personally identifiable information (PII).
  • ⚖️ Designed for safe, responsible, and research-oriented use in healthcare AI.

⚠️ This model is intended for research and educational purposes only and should not be used to make clinical decisions.


🏥 About HARI – Healthcare AI Research Institute

The Healthcare AI Research Institute (HARI) is a pioneering research group within Seoul National University Hospital, driving innovation in medical AI.

🌍 Vision & Mission

  • Vision: Shaping a sustainable and healthy future through pioneering AI research.
  • Mission:
    • Develop clinically useful, trustworthy AI technologies.
    • Foster cross-disciplinary collaboration in medicine and AI.
    • Lead global healthcare AI commercialization and policy frameworks.
    • Educate the next generation of AI-powered medical professionals.

🧪 Research Platforms & Infrastructure

  • Platforms: SUPREME, SNUHUB, DeView, VitalDB, NSTRI Global Data Platform
  • Computing: NVIDIA H100 / A100 GPUs, Quantum AI Infrastructure
  • Projects:
    • Clinical note summarization
    • AI-powered diagnostics
    • EHR automation
    • Real-time monitoring via AI pipelines

🎓 AI Education Programs

  • Basic AI for Healthcare: Designed for clinicians and students
  • Advanced AI Research: Targeting senior researchers and specialists in clinical AI validation and deep learning

🀝 Collaborate with Us

We welcome collaboration with:

  • AI research institutions and medical universities
  • Healthcare startups and technology partners
  • Policymakers shaping AI regulation in medicine

📧 Contact: [email protected]
🌐 Website: Seoul National University Hospital


🤗 Model Usage Example

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "snuh/hari-q2.5"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The prompt stays in Korean because the model is fine-tuned on Korean medical QA.
# Instruction (English gloss): you are a capable, trustworthy Korean-language medical
# assistant; weigh age, symptoms, test results, and pain location, then give your
# reasoning and diagnosis in precise medical terms, adding lay terms where helpful.
# Question (English gloss): a 60-year-old man presents with abdominal pain and fever,
# an elevated white blood cell count, and right lower quadrant tenderness; what is
# the most likely diagnosis?
prompt = '''
### Instruction:
당신은 임상 지식을 갖춘 유능하고 신뢰할 수 있는 한국어 기반 의료 어시스턴트입니다.
사용자의 질문에 대해 정확하고 신중한 임상 추론을 바탕으로 진단 가능성을 제시해 주세요.
반드시 환자의 연령, 증상, 검사 결과, 통증 부위 등 모든 단서를 종합적으로 고려하여 추론 과정과 진단명을 제시해야 합니다.
의학적으로 정확한 용어를 사용하되, 필요하다면 일반인이 이해하기 쉬운 용어도 병행해 설명해 주세요.

### Question:
60세 남성이 복통과 발열을 호소하며 내원하였습니다.
혈액 검사 결과 백혈구 수치가 상승했고, 우측 하복부 압통이 확인되었습니다.
가장 가능성이 높은 진단명은 무엇인가요?

### Reasoning:
'''.strip()

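# Wrap the prompt in the chat format; the system message is the Qwen2.5 default.
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the messages with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# Tokenize the rendered text and move the tensors to the model's device.
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate up to 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Drop the prompt tokens so that only the newly generated continuation remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the continuation to text, omitting special tokens.
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)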
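
The example above hard-codes one instruction and one question. As a minimal sketch for reuse, the helpers below rebuild the same `### Instruction:` / `### Question:` / `### Reasoning:` layout from arbitrary strings. The function names `build_prompt` and `build_messages` are hypothetical and not part of this repository. The default system message is simply copied from the example, and whether the model depends on it is not stated in this card.

```python
from typing import Dict, List

# Copied from the usage example; treat it as a default, not a requirement.
DEFAULT_SYSTEM_PROMPT = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."
)


def build_prompt(instruction: str, question: str) -> str:
    """Assemble the Instruction / Question / Reasoning prompt layout."""
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n\n"
        "### Question:\n"
        f"{question.strip()}\n\n"
        "### Reasoning:"
    )


def build_messages(
    instruction: str,
    question: str,
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
) -> List[Dict[str, str]]:
    """Return chat messages suitable for tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": build_prompt(instruction, question)},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` variable.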

📄 License

Apache 2.0 License – Free for research and commercial use with attribution.


📒 Citation

If you use this model in your work, please cite:

@misc{hari-q2.5,
    title  = {hari-q2.5},
    url    = {https://huggingface.co/snuh/hari-q2.5},
    author = {Healthcare AI Research Institute (HARI) of Seoul National University Hospital (SNUH)},
    month  = {May},
    year   = {2025}
}

🚀 Together, we are shaping the future of AI-driven healthcare.


Model size: 72.7B params (Safetensors, tensor type BF16)
Base model: Qwen/Qwen2.5-72B
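
The parameter count and tensor type listed for this model (72.7B params, BF16) give a rough lower bound on the memory needed just to hold the weights. The sketch below does that arithmetic. The helper name `weight_memory_gb` is hypothetical, the result uses decimal gigabytes, and it ignores activations, the KV cache, and framework overhead, so real requirements will be higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    if num_params < 0 or bytes_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bytes_per_param > 0")
    return num_params * bytes_per_param / 1e9


# BF16 stores each parameter in 2 bytes.
BF16_BYTES = 2
estimate = weight_memory_gb(72.7e9, BF16_BYTES)
print(f"Approximate BF16 weight footprint: {estimate:.1f} GB")
```

At 2 bytes per parameter this comes to roughly 145 GB for the weights alone, which is why the usage example relies on `device_map="auto"` to spread the model across the available devices.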