metadata
license: apache-2.0
language:
  - es
base_model:
  - PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
  - medical
  - spanish
  - bi-encoder
  - entity-linking
  - sapbert
  - umls
  - snomed-ct

MedProcNER-bi-encoder

Model Description

MedProcNER-bi-encoder is a domain-specific bi-encoder for medical entity linking in Spanish. It was trained on synonym pairs drawn from the MedProcNER corpus and from SNOMED-CT (Fully Specified Names and preferred synonyms). The training data was curated from the gold-standard corpus and enriched with knowledge-based synonyms to improve entity normalization.

💡 Intended Use

  • Domain: Spanish Clinical NLP
  • Tasks: Entity linking of MedProcNER mentions to SNOMED-CT concepts
  • Evaluated On: MedProcNER (Gold Standard, Unseen Mentions, Unseen Codes)
  • Users: Researchers and developers focusing on specialized medical NEL

💬 Definitions

  • Unseen Mentions: Mentions that do not appear in training but reference known codes.
  • Unseen Codes: Mentions associated with SNOMED-CT codes never seen during training.
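
The two definitions above can be sketched as a partition of the test set. This is only an illustration: the function name `split_unseen`, the case-insensitive mention matching, and the placeholder codes (`C1`, `C2`, `C3`) are assumptions, not the procedure or data used to build the reported splits.

```python
def split_unseen(train_pairs, test_pairs):
    """Partition test (mention, code) pairs by what was seen in training.

    Assumption: mentions are compared case-insensitively; the exact
    matching rule behind the reported splits is not stated in this card.
    """
    train_mentions = {mention.lower() for mention, _ in train_pairs}
    train_codes = {code for _, code in train_pairs}

    unseen_mentions, unseen_codes = [], []
    for mention, code in test_pairs:
        if code not in train_codes:
            # The code never occurs in training: "Unseen Codes".
            unseen_codes.append((mention, code))
        elif mention.lower() not in train_mentions:
            # Known code, but a surface form not seen in training: "Unseen Mentions".
            unseen_mentions.append((mention, code))
    return unseen_mentions, unseen_codes


# Placeholder codes, not real SNOMED-CT identifiers.
train = [("radiografía de tórax", "C1"), ("ecografía abdominal", "C2")]
test = [
    ("Radiografía de tórax", "C1"),  # seen mention, seen code
    ("rx de tórax", "C1"),           # new mention, seen code
    ("biopsia hepática", "C3"),      # code absent from training
]

unseen_mentions, unseen_codes = split_unseen(train, test)
print(unseen_mentions)
print(unseen_codes)
```

Under this reading the two subsets are disjoint: a mention whose code is absent from training counts only toward Unseen Codes.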

📈 Performance Summary (Top-25 Accuracy)

Evaluation Split    Top-25 Accuracy
Gold Standard       0.917
Unseen Mentions     0.831
Unseen Codes        0.808
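
For reference, top-k accuracy is commonly computed as the fraction of mentions whose gold code appears among the k highest-ranked candidates. The sketch below assumes one gold code per mention and uses a hypothetical helper `top_k_accuracy` with placeholder codes; it is not the evaluation script behind the numbers above.

```python
def top_k_accuracy(ranked_candidates, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the top-k candidates.

    ranked_candidates: one list of codes per mention, best candidate first.
    gold_codes: one gold code per mention (single-label assumption).
    """
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("ranked_candidates and gold_codes must have the same length")
    if not gold_codes:
        return 0.0
    hits = sum(
        gold in ranked[:k]
        for ranked, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)


# Toy rankings with placeholder codes.
ranked = [
    ["C1", "C2", "C3"],
    ["C4", "C5", "C6"],
    ["C7", "C8", "C9"],
    ["C2", "C1", "C3"],
]
gold = ["C1", "C6", "C0", "C1"]

print(top_k_accuracy(ranked, gold, k=2))
print(top_k_accuracy(ranked, gold, k=3))
```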

🧪 Usage

from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")

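mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():  # inference only, so gradients are not needed
    outputs = model(**inputs)
# Use the hidden state of the first token as the mention embedding
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (batch_size, hidden_size)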

For efficient retrieval, index the candidate concept embeddings with Faiss or FaissEncoder, then retrieve the nearest concepts for each mention embedding.
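
The retrieval step can be illustrated without any dependencies. The sketch below is a brute-force stand-in for a Faiss index, not Faiss itself: it ranks candidates by cosine similarity (an assumption, since this card does not state the similarity measure) and uses toy 3-dimensional vectors and placeholder codes instead of real model embeddings. The helper names `cosine` and `retrieve` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, candidate_vecs, candidate_codes, k=25):
    """Return the codes of the k candidates most similar to the query.

    Brute-force search over all candidates; a Faiss index would replace
    this loop when the candidate set is large.
    """
    scored = sorted(
        zip(candidate_codes, candidate_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [code for code, _ in scored[:k]]


# Toy vectors standing in for concept and mention embeddings.
codes = ["C1", "C2", "C3"]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.05, 0.0]

print(retrieve(query, vectors, codes, k=2))
```

In practice, the candidate vectors would be embeddings of SNOMED-CT terms produced by the model, and the query would be the mention embedding computed in the usage snippet.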

⚠️ Limitations

  • The model is specialized for MedProcNER mentions and may underperform in other domains or corpora.
  • Expert supervision is advised for clinical deployment.

📚 Citation

Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J. ClinLinker-KB: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986

Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas