---
language: es
datasets:
- squad_es
- hackathon-pln-es/biomed_squad_es_v2
metrics:
- "f1"

---

# roberta-base-biomedical-clinical-es for QA 

This model was trained as part of the "Extractive QA Biomedicine" project developed during the 2022 [Hackathon](https://somosnlp.org/hackathon) organized by SOMOS NLP.

## Motivation

Recent research has made available Spanish language models trained on biomedical corpora. This project explores the use of these new models to build extractive question answering models for biomedicine, and compares their effectiveness with that of general-domain masked language models.

The models trained during the [Hackathon](https://somosnlp.org/hackathon) were:

- [hackathon-pln-es/roberta-base-bne-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-bne-squad2-es)
- [hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es)
- [hackathon-pln-es/roberta-base-biomedical-es-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-es-squad2-es)
- [hackathon-pln-es/biomedtra-small-es-squad2-es](https://huggingface.co/hackathon-pln-es/biomedtra-small-es-squad2-es)

## Description

This model is a fine-tuned version of [PlanTL-GOB-ES/roberta-base-biomedical-clinical-es](https://huggingface.co/PlanTL-GOB-ES/roberta-base-biomedical-clinical-es) on the [squad_es (v2)](https://huggingface.co/datasets/squad_es) training dataset.
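The fine-tuned checkpoint can be used directly with the `transformers` question-answering pipeline. A minimal usage sketch, assuming the checkpoint name listed above and an illustrative Spanish biomedical context:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into the question-answering pipeline.
# The model name is taken from this card's list of trained models.
qa = pipeline(
    "question-answering",
    model="hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es",
)

# Illustrative example: extract an answer span from a biomedical context.
context = (
    "La insulina es una hormona producida por el páncreas que regula "
    "la cantidad de glucosa en la sangre."
)
result = qa(question="¿Qué órgano produce la insulina?", context=context)
print(result["answer"], result["score"])
```

Because the model was trained on SQuAD v2-style data, it can also return an empty answer when the context does not contain one.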


## Hyperparameters

The hyperparameters were chosen based on those used for [PlanTL-GOB-ES/roberta-base-bne-sqac](https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne-sqac), a Spanish QA model trained on a dataset in SQuAD v1 format.

```
--num_train_epochs 2
--learning_rate 3e-5
--weight_decay 0.01
--max_seq_length 386
--doc_stride 128
```
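These flags match those accepted by the Hugging Face `run_qa.py` example script. A sketch of a full fine-tuning command under that assumption (the script location, dataset config name, and output directory are illustrative, not taken from this card):

```shell
# Sketch only: assumes the transformers examples script run_qa.py is available
# locally; output_dir is illustrative.
python run_qa.py \
  --model_name_or_path PlanTL-GOB-ES/roberta-base-biomedical-clinical-es \
  --dataset_name squad_es \
  --dataset_config_name v2.0.0 \
  --version_2_with_negative \
  --do_train \
  --do_eval \
  --num_train_epochs 2 \
  --learning_rate 3e-5 \
  --weight_decay 0.01 \
  --max_seq_length 386 \
  --doc_stride 128 \
  --output_dir ./roberta-base-biomedical-clinical-es-squad2-es
```

The `--version_2_with_negative` flag tells the script to handle unanswerable questions, which SQuAD v2-style datasets such as `squad_es` (v2) contain.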

## Performance

Evaluated on the [hackathon-pln-es/biomed_squad_es_v2](https://huggingface.co/datasets/hackathon-pln-es/biomed_squad_es_v2) dev set.

|Model                                                         |Base Model Domain|exact  |f1     |HasAns_exact|HasAns_f1|NoAns_exact|NoAns_f1|
|--------------------------------------------------------------|-----------------|-------|-------|------------|---------|-----------|--------|
|hackathon-pln-es/roberta-base-bne-squad2-es                   |General          |67.6341|75.6988|53.7367     |70.0526  |81.2174    |81.2174 |
|hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es|Biomedical       |66.8426|75.2346|53.0249     |70.0031  |80.3478    |80.3478 |
|hackathon-pln-es/roberta-base-biomedical-es-squad2-es         |Biomedical       |67.6341|74.5612|47.6868     |61.7012  |87.1304    | 87.1304|
|hackathon-pln-es/biomedtra-small-es-squad2-es                 |Biomedical       |34.4767|44.3294|45.3737     |65.307   |23.8261    |23.8261 |


## Team

Santiago Maximo: [smaximo](https://huggingface.co/smaximo)