HERBERT: Leveraging UMLS Hierarchical Knowledge to Enhance Clinical Entity Normalization in Spanish

HERBERT-GP is a contrastive-learning-based bi-encoder for medical entity normalization in Spanish.
It leverages hierarchical relationships from UMLS (parents and grandparents) to enhance the candidate retrieval step for entity linking in Spanish clinical texts.
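Below is a minimal sketch of how the bi-encoder could be used for candidate retrieval: embed a clinical mention and a dictionary of SNOMED-CT terms, then rank the terms by cosine similarity. This is an illustration under assumptions, not code from this card: CLS-token pooling, the toy candidate list, and the scoring details are choices made here; only the repository ID (ICB-UMA/HERBERT-GP-30) comes from this page.

```python
# Minimal candidate-retrieval sketch (assumptions: CLS pooling, cosine scoring,
# illustrative Spanish candidate terms; only the repo ID comes from this page).
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "ICB-UMA/HERBERT-GP-30"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

def embed(texts):
    """Encode strings into L2-normalized embeddings (CLS pooling, an assumption)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    cls = out.last_hidden_state[:, 0]
    return torch.nn.functional.normalize(cls, dim=-1)

# A clinical mention and a toy dictionary of candidate terminology entries.
mention = embed(["dolor torácico opresivo"])
terms = ["dolor torácico", "dolor abdominal", "disnea de esfuerzo"]
scores = (mention @ embed(terms).T).squeeze(0)   # cosine similarities

for idx in scores.argsort(descending=True):
    print(f"{terms[idx]}: {scores[idx].item():.3f}")
```

In a full entity-linking pipeline, the candidate list would be the whole SNOMED-CT/UMLS terminology, with term embeddings precomputed and indexed (e.g., with FAISS) for fast retrieval.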

Key features:

  • Base model: PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
  • Trained with 30 positive pairs per anchor using synonyms, parents, and grandparents from UMLS/SNOMED-CT (see the training sketch after this list).
  • Task: Normalization of disease, procedure, and symptom mentions to SNOMED-CT/UMLS codes.
  • Domain: Spanish biomedical/clinical texts.
  • Corpora: DisTEMIST, MedProcNER, SympTEMIST.
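
The card does not include training code; the following is a hypothetical sketch of the contrastive setup it describes, using sentence-transformers with in-batch negatives (MultipleNegativesRankingLoss). The loss choice, the pair construction from UMLS synonyms, parents, and grandparents, and the example record are illustrative only, not the authors' actual pipeline.

```python
# Hypothetical contrastive training sketch: positives drawn from UMLS synonyms,
# parents, and grandparents of each anchor term (illustrative data and loss).
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("PlanTL-GOB-ES/roberta-base-biomedical-clinical-es")

# Illustrative UMLS neighborhood for a single anchor concept.
record = {
    "anchor": "infarto agudo de miocardio",
    "synonyms": ["IAM", "ataque cardíaco"],
    "parents": ["infarto de miocardio"],
    "grandparents": ["cardiopatía isquémica"],
}
train_examples = [
    InputExample(texts=[record["anchor"], positive])
    for positive in record["synonyms"] + record["parents"] + record["grandparents"]
]

loader = DataLoader(train_examples, shuffle=True, batch_size=4)
loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```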

Evaluation (top-k accuracy):

Corpus        Top-1    Top-5    Top-25   Top-200
DisTEMIST     0.585    0.727    0.808    0.871
SympTEMIST    0.632    0.783    0.884    0.948
MedProcNER    0.655    0.770    0.840    0.891
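
For reference, top-k accuracy counts a mention as correct when its gold code appears among the top k retrieved candidates. A minimal illustration (the function name and codes below are made up, not from this card):

```python
# Illustration of the metric: a mention counts as correct at k if its gold code
# appears among the top-k retrieved candidates (codes below are invented).
def top_k_accuracy(ranked_codes, gold_codes, k):
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_codes, gold_codes))
    return hits / len(gold_codes)

ranked_codes = [["C0008031", "C0000737"], ["C0013404", "C0030252"]]
gold_codes = ["C0008031", "C0030252"]
print(top_k_accuracy(ranked_codes, gold_codes, k=1))  # 0.5
print(top_k_accuracy(ranked_codes, gold_codes, k=2))  # 1.0
```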