---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **MedProcNER-bi-encoder**

## Model Description

MedProcNER-bi-encoder is a domain-specific bi-encoder model for medical entity linking in Spanish, trained using synonym pairs from the MedProcNER corpus and SNOMED-CT (Fully Specified Name and preferred synonyms). The training data was curated from the gold standard corpus and enriched with knowledge-based synonyms to enhance entity normalization tasks.

## 💡 Intended Use

- **Domain**: Spanish Clinical NLP
- **Tasks**: Entity linking of MedProcNER mentions to SNOMED-CT concepts
- **Evaluated On**: MedProcNER (Gold Standard, Unseen Mentions, Unseen Codes)
- **Users**: Researchers and developers focusing on specialized medical NEL

### 💬 Definitions

- **Unseen Mentions**: Mentions that do not appear in training but reference known codes.
- **Unseen Codes**: Mentions associated with SNOMED-CT codes never seen during training.

## 📈 Performance Summary (Top-25 Accuracy)

| Evaluation Split | Top-25 Accuracy |
|------------------|-----------------|
| Gold Standard    | 0.917           |
| Unseen Mentions  | 0.831           |
| Unseen Codes     | 0.808           |

## 🧪 Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")

mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)
```

Use with [Faiss](https://github.com/facebookresearch/faiss) or [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) for efficient retrieval.

## ⚠️ Limitations

- The model is specialized for MedProcNER mentions and may underperform in other domains or corpora.
- Expert supervision is advised for clinical deployment.

## 📚 Citation

> Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., Clinlinker-Kb: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J Veredas
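
## Retrieval sketch

The Usage section suggests pairing the embeddings with a similarity index, and the Performance Summary reports top-25 accuracy. As a rough illustration of both ideas, the sketch below ranks candidate concepts by cosine similarity and checks whether the gold code falls within the top k. It is a brute-force NumPy stand-in for Faiss, not the authors' evaluation code: the function names (`l2_normalize`, `top_k_codes`, `top_k_accuracy`) are hypothetical, the 3-dimensional vectors are toy data standing in for model embeddings, and the codes are placeholders rather than real SNOMED-CT identifiers.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def top_k_codes(query_embs, cand_embs, cand_codes, k=25):
    """Return, for each query, the codes of the k most similar candidates."""
    scores = l2_normalize(query_embs) @ l2_normalize(cand_embs).T
    k = min(k, cand_embs.shape[0])
    # Sort by descending similarity and keep the first k columns per query.
    order = np.argsort(-scores, axis=1)[:, :k]
    return [[cand_codes[j] for j in row] for row in order]


def top_k_accuracy(ranked_codes, gold_codes):
    """Fraction of queries whose gold code appears among the retrieved codes."""
    hits = sum(gold in ranked for ranked, gold in zip(ranked_codes, gold_codes))
    return hits / len(gold_codes)


# Toy vectors stand in for model embeddings; the codes are placeholders.
cand_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
cand_codes = ["CODE_A", "CODE_B", "CODE_C"]
query_embs = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]])

ranked = top_k_codes(query_embs, cand_embs, cand_codes, k=2)
print(ranked)
print(top_k_accuracy(ranked, ["CODE_A", "CODE_B"]))
```

In practice, `query_embs` and `cand_embs` would be the first-token embeddings produced as in the Usage snippet, and a Faiss index would replace the dense matrix product once the candidate set grows large.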