---
license: apache-2.0
language:
- zh
---
# Text Summarization

This is an assignment for Applied Deep Learning, a course at National Taiwan University (NTU).

### Task Description: Chinese News Summarization (Title Generation)
After the model generates a probability for every token, greedy decoding is the simplest strategy: always choose the most probable next word (argmax).
However, the greedy strategy has a problem: it easily picks the same word over and over.
```
Greedy Result (f1-score): rouge-1: 1.5, rouge-2: 0.9, rouge-L: 1.4
```
- Beam Search

Beam search keeps track of the k most probable partial sentences and finally returns the best one as the result.
If the beam size is set to 1, it reduces to greedy decoding, so beam search can be seen as partly solving the repetition problem of the greedy strategy.
However, if the beam size is too large, the output becomes too generic and less relevant, even though it is safe and "correct".
For example:
```
input:
I love to listen to Taylor Swift's songs, so I decided to attend Taylor's concert.
output:
What do you like to listen?
```
```
beam size = 5
Beam Search Result (f1-score): rouge-1: 7.4, rouge-2: 1.9, rouge-L: 6.9
```
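The mechanism can be sketched with a toy next-token distribution (the `next_dist` function below is an invented stand-in for the model, not part of the assignment): at each step every beam is expanded with every token, and only the k highest-scoring partial sequences survive.

```python
import math

# Toy beam search: expand every beam with every token, then keep only the
# k partial sequences with the highest cumulative log-probability.

def next_dist(seq, vocab_size=3):
    # Hypothetical stand-in for a model: a deterministic distribution
    # that depends only on the last token of the sequence.
    last = seq[-1] if seq else 0
    raw = [(i + last) % vocab_size + 1 for i in range(vocab_size)]
    total = sum(raw)
    return [r / total for r in raw]

def beam_search(k, steps, vocab_size=3):
    beams = [([], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, p in enumerate(next_dist(seq, vocab_size)):
                candidates.append((seq + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0][0]

# With k = 1, beam search degenerates to greedy decoding.
print(beam_search(k=1, steps=2))  # -> [2, 0]
print(beam_search(k=3, steps=4))
```

Summing log-probabilities instead of multiplying raw probabilities is the usual trick to avoid numerical underflow on long sequences.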
- Top k Sampling

Sampling is a strategy that randomly chooses the next word according to the probability distribution instead of taking the argmax.
Top-k sampling therefore samples from the distribution, but restricted to the k most probable words.
However, when a rarely used word happens to be sampled, the sentence may become less fluent.
```
k = 5
Top k Result (f1-score): rouge-1: 4.0, rouge-2: 0.5, rouge-L: 3.7
```
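A self-contained sketch (toy probabilities, not the assignment's model): keep only the k most probable tokens, renormalize their mass, and sample within that truncated set.

```python
import random

# Sketch of top-k sampling: restrict to the k most probable tokens,
# renormalize, and sample from the truncated distribution.

def top_k_sample(probs, k, rng=random):
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    r = rng.random() * mass
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]

random.seed(0)
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
samples = [top_k_sample(probs, k=2) for _ in range(100)]
print(set(samples))  # only tokens 0 and 1 can ever be drawn
```

With k equal to the vocabulary size this is plain sampling, which is where the fluency problem with rare words comes from.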
- Nucleus (Top p) Sampling

Nucleus sampling samples from the smallest subset of the vocabulary that holds most of the probability mass.
It can be seen as a top-k whose k dynamically shrinks and expands with the shape of the distribution.
```
p = 5
Top p Result (f1-score): rouge-1: 3.0, rouge-2: 0.2, rouge-L: 2.9
```
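A toy sketch (invented probabilities, and note that p is conventionally a cumulative probability in (0, 1], so the example below uses p = 0.7): collect tokens in descending probability order until their cumulative mass reaches p, then sample inside that nucleus.

```python
import random

# Sketch of nucleus (top-p) sampling: take the smallest set of tokens whose
# cumulative probability reaches p (the "nucleus"), then sample within it.

def top_p_sample(probs, p, rng=random):
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= p:
            break
    r = rng.random() * mass
    for i in nucleus:
        r -= probs[i]
        if r <= 0:
            return i
    return nucleus[-1]

random.seed(0)
probs = [0.5, 0.3, 0.1, 0.07, 0.03]
# With p = 0.7 the nucleus here is {0, 1}; a flatter distribution would grow it.
print({top_p_sample(probs, p=0.7) for _ in range(100)})
```

This is exactly the "dynamic k" behavior: a spiky distribution yields a small nucleus, a flat one a large nucleus.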
- Temperature

Softmax temperature applies a temperature hyperparameter to the softmax over the logits.
With a high temperature the distribution becomes more uniform (more diversity); with a low temperature it becomes more spiky (less diversity).
```
temperature = 5
Temperature Result (f1-score): rouge-1: 2.1, rouge-2: 0.04, rouge-L: 1.9
```
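A minimal sketch with made-up logits: dividing the logits by the temperature T before the softmax is all there is to it.

```python
import math

# Softmax with temperature: logits are divided by T before the softmax,
# so high T flattens the distribution and low T sharpens it.

def softmax_with_temperature(logits, temperature):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.5))  # spiky: most mass on token 0
print(softmax_with_temperature(logits, 5.0))  # closer to uniform
```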
As a result, we can conclude that in this task beam search outperforms the other strategies.