---
license: apache-2.0
language:
- zh
---
# Text Summarization

This is an assignment for Applied Deep Learning, a course at National Taiwan University (NTU).

### Task Description: Chinese News Summarization (Title Generation)
After the model generates a probability for every token, greedy decoding is the simplest strategy: always choose the most probable next word (argmax).
However, the greedy strategy has a problem: it easily picks the same word over and over.
```
Greedy Result (f1-score): rouge-1: 1.5, rouge-2: 0.9, rouge-L: 1.4
```
- Beam Search

Beam search keeps track of the k most probable partial sentences and finally returns the best one as the result.
If the beam size is set to 1, it reduces to greedy decoding, so beam search can be seen as partly solving the repetition problem of the greedy strategy.
However, if the beam size is too large, the output becomes too generic and less relevant, even though it is safe and "correct".
For example:
```
input:
I love to listen to Taylor Swift's songs, so I decided to attend Taylor's concert.
output:
What do you like to listen?
```
```
beam size = 5
Beam Search Result (f1-score): rouge-1: 7.4, rouge-2: 1.9, rouge-L: 6.9
```
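The mechanism can be sketched with a toy next-token distribution (the `next_dist` function below is an invented stand-in for the model, not part of the assignment): at each step every beam is expanded with every token, and only the k highest-scoring partial sequences survive.

```python
import math

# Toy beam search: expand every beam with every token, then keep only the
# k partial sequences with the highest cumulative log-probability.

def next_dist(seq, vocab_size=3):
    # Hypothetical stand-in for a model: a deterministic distribution
    # that depends only on the last token of the sequence.
    last = seq[-1] if seq else 0
    raw = [(i + last) % vocab_size + 1 for i in range(vocab_size)]
    total = sum(raw)
    return [r / total for r in raw]

def beam_search(k, steps, vocab_size=3):
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in enumerate(next_dist(seq, vocab_size)):
                candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]

# With k = 1, beam search degenerates to greedy decoding.
print(beam_search(k=1, steps=2))  # -> [2, 0]
print(beam_search(k=3, steps=4))
```

Summing log-probabilities instead of multiplying raw probabilities is the usual trick to avoid numerical underflow on long sequences.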
- Top k Sampling

Sampling is a strategy that randomly chooses the next word according to the probability distribution instead of taking the argmax.
Top-k sampling therefore samples from the distribution, but restricted to the k most probable words.
However, when a rarely used word happens to be sampled, the sentence may become less fluent.
```
k = 5
Top k Result (f1-score): rouge-1: 4.0, rouge-2: 0.5, rouge-L: 3.7
```
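A self-contained sketch (toy probabilities, not the assignment's model): keep only the k most probable tokens, renormalize their mass, and sample within that truncated set.

```python
import random

# Sketch of top-k sampling: restrict to the k most probable tokens,
# renormalize, and sample from the truncated distribution.

def top_k_sample(probs, k, rng=random):
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    r = rng.random() * mass
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]

random.seed(0)
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
samples = [top_k_sample(probs, k=2) for _ in range(100)]
print(set(samples))  # only tokens 0 and 1 can ever be drawn
```

With k equal to the vocabulary size this is plain sampling, which is where the fluency problem with rare words comes from.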
- Nucleus (Top p) Sampling

Nucleus sampling samples from the smallest subset of the vocabulary that holds most of the probability mass.
It can be seen as a top-k whose k dynamically shrinks and expands with the shape of the distribution.
```
p = 5
Top p Result (f1-score): rouge-1: 3.0, rouge-2: 0.2, rouge-L: 2.9
```
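A toy sketch (invented probabilities, and note that p is conventionally a cumulative probability in (0, 1], so the example below uses p = 0.7): collect tokens in descending probability order until their cumulative mass reaches p, then sample inside that nucleus.

```python
import random

# Sketch of nucleus (top-p) sampling: take the smallest set of tokens whose
# cumulative probability reaches p (the "nucleus"), then sample within it.

def top_p_sample(probs, p, rng=random):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

random.seed(0)
probs = [0.5, 0.3, 0.1, 0.07, 0.03]
# With p = 0.7 the nucleus here is {0, 1}; a flatter distribution would grow it.
print({top_p_sample(probs, p=0.7) for _ in range(100)})
```

This is exactly the "dynamic k" behavior: a spiky distribution yields a small nucleus, a flat one a large nucleus.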
- Temperature

Softmax temperature applies a temperature hyperparameter to the softmax over the logits.
With a high temperature the distribution becomes more uniform (more diversity); with a low temperature it becomes more spiky (less diversity).
```
temperature = 5
Temperature Result (f1-score): rouge-1: 2.1, rouge-2: 0.04, rouge-L: 1.9
```
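A minimal sketch with made-up logits: dividing the logits by the temperature T before the softmax is all there is to it.

```python
import math

# Softmax with temperature: logits are divided by T before the softmax,
# so high T flattens the distribution and low T sharpens it.

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.5))  # spiky: most mass on token 0
print(softmax_with_temperature(logits, 5.0))  # closer to uniform
```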
As a result, we can conclude that in this task beam search outperforms the other strategies.