Update README.md
README.md
CHANGED
@@ -16,8 +16,8 @@ The model was trained to extract entities from French biomedical sentences using
|
|
16 |
|
17 |
## Dataset
|
18 |
|
19 |
-
The original dataset is Quaero French Medical Corpus.
|
20 |
-
|
21 |
|
22 |
```json
|
23 |
{
|
@@ -26,13 +26,22 @@ It was converted to a JSON format for generative instruction-style training.
|
|
26 |
}
|
27 |
```
|
28 |
|
29 |
## Evaluation
|
30 |
|
31 |
Evaluation was performed on the test split by comparing two sets of entities: the predicted set and the ground-truth set.
|
32 |
|
33 |
-
Metric
|
34 |
-
|
35 |
-
Precision
|
36 |
-
Recall
|
37 |
-
F1 Score
|
38 |
|
|
|
16 |
|
17 |
## Dataset
|
18 |
|
19 |
+
The original dataset is the QUAERO French Medical Corpus, which I converted to a JSON format for generative instruction-style training.
|
20 |
+
I used `<>` as a separator, so the output format is: `TAG_1 entity_1 <> TAG_2 entity_2 <> ... <> TAG_n entity_n`.
|
21 |
|
22 |
```json
|
23 |
{
|
|
|
26 |
}
|
27 |
```
|
28 |
|
29 |
+
The QUAERO French Medical Corpus features **overlapping entity spans**, including nested structures. For instance:
|
30 |
+
```json
|
31 |
+
{
|
32 |
+
"input": "Cancer du pancréas",
|
33 |
+
"output": "DISO Cancer <> DISO Cancer du pancréas <> ANAT pancréas"
|
34 |
+
}
|
35 |
+
```
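
The `TAG entity <> TAG entity` output string can be turned back into structured entities with a few lines of Python. The sketch below is only an illustration: the helper name `parse_entities` is hypothetical, and it assumes that each chunk starts with a single-word tag followed by a space and the entity text. Because the result is a set of `(tag, entity)` pairs, nested and overlapping entities such as `Cancer` and `Cancer du pancréas` are kept as distinct items.

```python
from typing import Set, Tuple


def parse_entities(output: str, sep: str = "<>") -> Set[Tuple[str, str]]:
    """Split a generated string into a set of (tag, entity) pairs.

    Assumes the 'TAG entity <> TAG entity' layout described in this README.
    Chunks that are empty or have no entity text are skipped.
    """
    pairs = set()
    for chunk in output.split(sep):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The tag is the first whitespace-delimited token; the rest is the entity.
        tag, _, entity = chunk.partition(" ")
        entity = entity.strip()
        if entity:
            pairs.add((tag, entity))
    return pairs


example = "DISO Cancer <> DISO Cancer du pancréas <> ANAT pancréas"
entities = parse_entities(example)
for tag, entity in sorted(entities):
    print(tag, "|", entity)
```

Using a set drops exact duplicates in the generated string, which may or may not be the desired behaviour depending on how repeated mentions should be counted.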
|
36 |
+
|
37 |
+
|
38 |
## Evaluation
|
39 |
|
40 |
Evaluation was performed on the test split comparing 2 sets, the predicted one and the ground truth one :
|
41 |
|
42 |
+
| Metric | Score |
|
43 |
+
| --------- | ------ |
|
44 |
+
| Precision | 0.6482 |
|
45 |
+
| Recall | 0.6951 |
|
46 |
+
| F1 Score | 0.6709 |
|
47 |
|
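
A set-based comparison of predicted and ground-truth entities can be sketched as follows. This is a minimal illustration, not the exact evaluation script: the function name `precision_recall_f1` is hypothetical, entities are assumed to be `(tag, entity)` pairs that must match exactly, and the README does not say whether scores were pooled over the whole test split or averaged per sentence.

```python
def precision_recall_f1(predicted: set, gold: set):
    """Compute precision, recall and F1 from two sets of (tag, entity) pairs."""
    true_positives = len(predicted & gold)  # pairs present in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the nested entity is missed and one wrong pair is predicted.
gold = {("DISO", "Cancer"), ("DISO", "Cancer du pancréas"), ("ANAT", "pancréas")}
predicted = {("DISO", "Cancer"), ("ANAT", "pancréas"), ("DISO", "pancréas")}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"Precision: {p:.4f}  Recall: {r:.4f}  F1: {f1:.4f}")
```

With exact matching, a prediction with the right text but the wrong tag counts as both a false positive and a false negative, so nested entities that are only partially recovered lower both precision and recall.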