---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **MedProcNER-bi-encoder**

## Model Description

MedProcNER-bi-encoder is a domain-specific bi-encoder model for medical entity linking in Spanish, trained on synonym pairs from the MedProcNER corpus and SNOMED-CT (Fully Specified Names and preferred synonyms). The training data was curated from the gold-standard corpus and enriched with knowledge-based synonyms to improve entity normalization.
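
As an illustration of this pairing idea, the hedged sketch below shows one way mention/synonym pairs sharing a SNOMED-CT code could be assembled; the file names, tab-separated layouts, and pairing heuristic are assumptions for demonstration, not the exact recipe used to train this model.

```python
# Illustrative sketch only: assembling positive (term, term, code) pairs from a
# gold-standard annotation file and a SNOMED-CT term dictionary.
# File names and column layouts below are hypothetical.
import csv
from collections import defaultdict
from itertools import combinations

terms_by_code = defaultdict(set)  # SNOMED-CT code -> known surface forms

# Gold-standard mentions, one "mention<TAB>code" row per annotation (hypothetical layout)
with open("medprocner_gold.tsv", encoding="utf-8") as fh:
    for mention, code in csv.reader(fh, delimiter="\t"):
        terms_by_code[code].add(mention.lower())

# SNOMED-CT Fully Specified Names and synonyms, "code<TAB>term" rows (hypothetical layout)
with open("snomedct_terms_es.tsv", encoding="utf-8") as fh:
    for code, term in csv.reader(fh, delimiter="\t"):
        terms_by_code[code].add(term.lower())

# Every pair of surface forms that shares a code becomes a positive training pair.
positive_pairs = [
    (a, b, code)
    for code, terms in terms_by_code.items()
    for a, b in combinations(sorted(terms), 2)
]
print(f"{len(positive_pairs)} synonym pairs")
```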

## 💡 Intended Use

- **Domain**: Spanish clinical NLP
- **Tasks**: Entity linking of MedProcNER mentions to SNOMED-CT concepts
- **Evaluated on**: MedProcNER (Gold Standard, Unseen Mentions, Unseen Codes)
- **Users**: Researchers and developers working on specialized medical named-entity linking (NEL)

### 💬 Definitions

- **Unseen Mentions**: Mentions that do not appear in training but reference codes seen during training.
- **Unseen Codes**: Mentions associated with SNOMED-CT codes never seen during training.
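
As a rough illustration, assuming hypothetical `train_pairs` and `test_pairs` lists of (mention, code) tuples, the sketch below shows how these two subsets could be derived:

```python
# Hedged sketch: deriving the Unseen Mentions / Unseen Codes subsets.
# All mention strings and codes here are placeholders.
train_pairs = [("apendicectomía", "CODE_A")]
test_pairs = [("apendicectomía abierta", "CODE_A"), ("biopsia de piel", "CODE_B")]

train_mentions = {mention.lower() for mention, _ in train_pairs}
train_codes = {code for _, code in train_pairs}

unseen_mentions, unseen_codes = [], []
for mention, code in test_pairs:
    if code not in train_codes:
        unseen_codes.append((mention, code))      # code never seen in training
    elif mention.lower() not in train_mentions:
        unseen_mentions.append((mention, code))   # known code, new surface form
```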

## 📈 Performance Summary (Top-25 Accuracy)

| Evaluation Split | Top-25 Accuracy |
|------------------|-----------------|
| Gold Standard    | 0.917           |
| Unseen Mentions  | 0.831           |
| Unseen Codes     | 0.808           |
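
Top-25 accuracy here is read as the standard top-k metric: a mention counts as correctly linked if its gold SNOMED-CT code appears among the 25 highest-ranked candidates returned by the retriever. A minimal sketch of the metric, with hypothetical inputs, is:

```python
# Minimal sketch of Top-k accuracy: a mention counts as a hit if its gold code
# appears anywhere in its top-k ranked candidate list. Inputs are hypothetical.
def top_k_accuracy(gold_codes, ranked_candidates, k=25):
    hits = sum(
        gold in candidates[:k]
        for gold, candidates in zip(gold_codes, ranked_candidates)
    )
    return hits / len(gold_codes)

# Example: one of two mentions has its gold code within the top k -> 0.5
print(top_k_accuracy(["CODE_A", "CODE_B"], [["CODE_A", "CODE_C"], ["CODE_D"]], k=25))
```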

## 🧪 Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the bi-encoder and its tokenizer from the Hugging Face Hub
model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")

# Encode a clinical mention and use the [CLS] token as its dense representation
mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]
print(embedding.shape)  # (1, hidden_size)
```

Use with [Faiss](https://github.com/facebookresearch/faiss) or [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) for efficient retrieval.
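
A minimal retrieval sketch with plain Faiss is shown below, using an `IndexFlatIP` over L2-normalised [CLS] embeddings; the candidate codes, terms, and the `encode` helper are placeholders for illustration, not the project's `FaissEncoder` API.

```python
# Hedged sketch: nearest-neighbour linking of a mention against a small
# SNOMED-CT candidate dictionary with Faiss. Candidate codes/terms are placeholders.
import faiss
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
model.eval()

def encode(texts, batch_size=32):
    """Return L2-normalised [CLS] embeddings as a float32 NumPy array."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = model(**batch)
        chunks.append(out.last_hidden_state[:, 0, :].cpu().numpy())
    vectors = np.concatenate(chunks).astype("float32")
    faiss.normalize_L2(vectors)  # inner product then equals cosine similarity
    return vectors

# Hypothetical candidate dictionary: SNOMED-CT codes and their Spanish terms
candidate_codes = ["CODE_A", "CODE_B", "CODE_C"]
candidate_terms = ["insuficiencia renal aguda", "apendicectomía", "biopsia de piel"]

candidate_vectors = encode(candidate_terms)
index = faiss.IndexFlatIP(candidate_vectors.shape[1])
index.add(candidate_vectors)

# Retrieve the top-k candidate concepts for a new mention
scores, ids = index.search(encode(["fallo renal agudo"]), k=3)
for rank, (score, j) in enumerate(zip(scores[0], ids[0]), start=1):
    print(rank, candidate_codes[j], candidate_terms[j], round(float(score), 3))
```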

## ⚠️ Limitations

- The model is specialized for MedProcNER mentions and may underperform in other domains or corpora.
- Expert supervision is advised for clinical deployment.

## 📚 Citation

> Gallego, Fernando and López-García, Guillermo and Gasco, Luis and Krallinger, Martin and Veredas, Francisco J., ClinLinker-KB: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas