---
license: apache-2.0
language:
- es
base_model:
- PlanTL-GOB-ES/roberta-base-biomedical-clinical-es
tags:
- medical
- spanish
- bi-encoder
- entity-linking
- sapbert
- umls
- snomed-ct
---

# **MedProcNER-bi-encoder**

## Model Description

MedProcNER-bi-encoder is a domain-specific bi-encoder model for medical entity linking in Spanish, trained on synonym pairs drawn from the MedProcNER corpus and SNOMED-CT (Fully Specified Names and preferred synonyms). The training data was curated from the gold-standard corpus and enriched with knowledge-based synonyms to improve entity normalization.

## 💡 Intended Use
- **Domain**: Spanish Clinical NLP
- **Tasks**: Entity linking of MedProcNER mentions to SNOMED-CT concepts
- **Evaluated On**: MedProcNER (Gold Standard, Unseen Mentions, Unseen Codes)
- **Users**: Researchers and developers focusing on specialized medical NEL

### 💬 Definitions
- **Unseen Mentions**: Mentions that do not appear in training but reference known codes.
- **Unseen Codes**: Mentions associated with SNOMED-CT codes never seen during training.

## 📈 Performance Summary (Top-25 Accuracy)

| Evaluation Split   | Top-25 Accuracy |
|--------------------|-----------------|
| Gold Standard      | 0.917 |
| Unseen Mentions    | 0.831 |
| Unseen Codes       | 0.808 |
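
Here, Top-25 accuracy is the fraction of mentions whose gold SNOMED-CT code appears among the 25 highest-ranked retrieved candidates. A minimal sketch of the metric (the variable names are illustrative, not part of this repository):

```python
def top_k_accuracy(ranked_codes, gold_codes, k=25):
    """Fraction of mentions whose gold code is among the top-k candidates.

    ranked_codes: one best-first list of candidate SNOMED-CT codes per mention.
    gold_codes:   one gold SNOMED-CT code per mention.
    """
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_codes, gold_codes))
    return hits / len(gold_codes)
```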

## 🧪 Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")
tokenizer = AutoTokenizer.from_pretrained("ICB-UMA/MedProcNER-bi-encoder")

# Encode a mention and use the [CLS] token as its dense representation.
mention = "insuficiencia renal aguda"
inputs = tokenizer(mention, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)
print(embedding.shape)
```

Pair the encoder with [Faiss](https://github.com/facebookresearch/faiss) or the [`FaissEncoder`](https://github.com/ICB-UMA/KnowledgeGraph) utility for efficient candidate retrieval.
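
As a minimal sketch of that retrieval step (the `embed` helper, the candidate codes, and the synonym strings below are illustrative placeholders, and the actual `FaissEncoder` utility may work differently), one can index normalized [CLS] embeddings of SNOMED-CT synonyms and search them by inner-product similarity:

```python
import faiss

def embed(texts):
    # Batch-encode strings into [CLS] embeddings (model/tokenizer from above).
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :].cpu().numpy().astype("float32")

# Hypothetical candidate dictionary: SNOMED-CT codes with one synonym each.
codes = ["0000001", "0000002", "0000003"]
synonyms = ["apendicectomía", "trasplante renal", "radiografía de tórax"]

cand = embed(synonyms)
faiss.normalize_L2(cand)                 # normalize so inner product = cosine
index = faiss.IndexFlatIP(cand.shape[1])
index.add(cand)

query = embed(["extirpación del apéndice"])
faiss.normalize_L2(query)
scores, idx = index.search(query, 3)     # retrieve top-k candidate concepts
print([codes[i] for i in idx[0]])
```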

## ⚠️ Limitations

- The model is specialized for MedProcNER mentions and may underperform in other domains or corpora.
- Expert supervision is advised for clinical deployment.

## 📚 Citation

> Gallego, Fernando; López-García, Guillermo; Gasco, Luis; Krallinger, Martin; Veredas, Francisco J. ClinLinker-KB: Clinical Entity Linking in Spanish with Knowledge-Graph Enhanced Biencoders. Available at SSRN: http://dx.doi.org/10.2139/ssrn.4939986

## Authors

Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas