Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ inference:
 do_sample: true
 max_length: 512
 top_p: 0.9
-repetition_penalty: 1.
 language:
 - en
 license: mit
@@ -143,19 +143,21 @@ Implementations of SacreBLEU, BERT Score, ROUGLE, METEOR, and SARI are from Hugg
 
 ## Results
 
-
 
-
 |----------------|-------------------|
-| SacreBLEU↑ |
-| BERT Score F1↑ |
-| ROUGLE-1↑ |
-| ROUGLE-2↑ |
-| ROUGLE-L↑ |
-| METEOR↑ |
-| SARI↑ |
-| ARI
-
 
 
 # Contact
@@ -164,7 +166,7 @@ Please [contact us](mailto:[email protected]) for any questions or suggestions.
 
 # Disclaimer
 
-
 
 
 # Acknowledgement
 do_sample: true
 max_length: 512
 top_p: 0.9
+repetition_penalty: 1.0
 language:
 - en
 license: mit
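The `inference` settings touched by this hunk (`do_sample: true`, `top_p: 0.9`, and `repetition_penalty`, now `1.0`) are generation options for the Hugging Face hosted inference widget. Below is a minimal, dependency-free sketch of one sampling step under these settings. It assumes the repetition penalty follows the CTRL-style rule used by `transformers`' `RepetitionPenaltyLogitsProcessor`, and it does not model `max_length`. The function names are hypothetical, and this is an illustration rather than the model's actual decoding code.

```python
import math
import random


def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style penalty (assumed to match transformers' rule): logits of
    already-generated tokens are divided by `penalty` when positive and
    multiplied by it otherwise. penalty == 1.0 leaves every logit unchanged."""
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def top_p_filter(logits, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches `top_p`; return {token_id: renormalized probability}."""
    peak = max(logits)
    weights = [math.exp(score - peak) for score in logits]  # numerically stable softmax
    total = sum(weights)
    ranked = sorted(((w / total, token_id) for token_id, w in enumerate(weights)), reverse=True)
    nucleus, cumulative = [], 0.0
    for prob, token_id in ranked:
        nucleus.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:  # stop once the nucleus covers top_p of the mass
            break
    kept_mass = sum(prob for _, prob in nucleus)
    return {token_id: prob / kept_mass for token_id, prob in nucleus}


def sample_next_token(logits, previous_ids, top_p=0.9, penalty=1.0, rng=random):
    """One decoding step: penalize repeats, restrict to the nucleus, then sample."""
    nucleus = top_p_filter(apply_repetition_penalty(logits, previous_ids, penalty), top_p)
    token_ids = list(nucleus)
    return rng.choices(token_ids, weights=[nucleus[t] for t in token_ids], k=1)[0]
```

For example, `apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], 1.0)` returns the logits unchanged, which is why `1.0` effectively disables the penalty, while `top_p=0.9` restricts sampling to the most probable tokens that together cover at least 90% of the probability mass.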
 
 ## Results
 
+We tested our model on the SAS test set (200 samples). We generated 10 lay summaries from each sample's abstract. During generation, we used top-p sampling with $p=0.9$.
+Mean scores are reported below.
+
 
+| Metrics | SAS |
 |----------------|-------------------|
+| SacreBLEU↑ | 25.60 |
+| BERT Score F1↑ | 90.14 |
+| ROUGE-1↑ | 52.28 |
+| ROUGE-2↑ | 29.61 |
+| ROUGE-L↑ | 38.02 |
+| METEOR↑ | 43.75 |
+| SARI↑ | 51.96 |
+| ARI↓ | 17.04 |
+Note: 1. Some generated texts are too short (fewer than 100 words) to yield a meaningful ARI. We therefore concatenated every five adjacent texts and computed ARI on the resulting 400 longer texts (instead of the original 2,000). 2. BERT Score, ROUGE, and METEOR scores are multiplied by 100.
 
 
 # Contact
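The ARI workaround described in the note can be sketched as follows. The sketch assumes the standard Automated Readability Index formula, ARI = 4.71 × (characters / words) + 0.5 × (words / sentences) - 21.43, with naive whitespace word splitting and punctuation-based sentence counting. These tokenization choices and the helper names are assumptions, so the sketch will not necessarily reproduce the reported 17.04.

```python
import re
from statistics import mean


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.

    Assumptions: characters are letters and digits, words are whitespace-separated
    tokens, and sentences are runs of '.', '!' or '?' (at least one per text).
    """
    words = text.split()
    if not words:
        raise ValueError("ARI is undefined for empty text")
    characters = sum(ch.isalnum() for ch in text)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


def concatenate_adjacent(texts: list[str], group_size: int = 5) -> list[str]:
    """Join each run of `group_size` consecutive texts into one longer text."""
    if len(texts) % group_size != 0:
        raise ValueError("number of texts must be a multiple of group_size")
    return [" ".join(texts[i:i + group_size]) for i in range(0, len(texts), group_size)]


def mean_grouped_ari(texts: list[str], group_size: int = 5) -> float:
    """Average ARI over the concatenated groups rather than over individual texts."""
    return mean(automated_readability_index(t) for t in concatenate_adjacent(texts, group_size))
```

With 2,000 generations, `concatenate_adjacent` yields 400 pooled texts, matching the count in the note. Pooling gives each ARI computation more words and sentences, so the characters-per-word and words-per-sentence ratios are less dominated by a single short sentence.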
 
 # Disclaimer
 
+This model was created to make scientific abstracts more accessible. Its outputs should not be used or trusted outside this scope. There is no guarantee that the generated text faithfully reflects the research. Consult human experts or the original papers when a decision is critical.
 
 
 # Acknowledgement