Tejas21 committed 1ac421c (parent: 7656c1c): Update README.md
Files changed (1): README.md (+23 -0)

---
license: apache-2.0
language:
- en
tags:
- Table to text
- Data to text
---

## Dataset
- [ToTTo](https://github.com/google-research-datasets/ToTTo): A Controlled Table-to-Text Generation Dataset

ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples. It defines a controlled generation task: given a Wikipedia table and a set of highlighted cells, generate a one-sentence description.
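
To make the controlled generation task concrete, here is a small sketch built on a made-up example in what we understand to be ToTTo's JSONL layout: `table` is a list of rows of cells (each with a `value`), `highlighted_cells` is a list of `[row_index, column_index]` pairs, and `sentence_annotations` holds the reference `final_sentence`. These field names follow our reading of the ToTTo repository and should be checked against it; the table contents and the helpers `cell` and `highlighted_values` are hypothetical.

```python
import json


def cell(value, is_header=False):
    """Build one table cell in the (assumed) ToTTo cell format."""
    return {"value": value, "is_header": is_header,
            "column_span": 1, "row_span": 1}


# One made-up JSONL line; real examples carry additional fields.
raw_line = json.dumps({
    "table_page_title": "Example Athlete",
    "table_section_title": "Competition record",
    "table": [
        [cell("Year", True), cell("Event", True), cell("Position", True)],
        [cell("2019"), cell("Marathon"), cell("1st")],
    ],
    "highlighted_cells": [[1, 0], [1, 1], [1, 2]],
    "sentence_annotations": [
        {"final_sentence": "In 2019, Example Athlete won the marathon."},
    ],
})


def highlighted_values(example):
    """Return the text of each highlighted [row_index, column_index] cell.

    The column index is treated as a position in the row's cell list.
    """
    table = example["table"]
    return [table[r][c]["value"] for r, c in example["highlighted_cells"]]


example = json.loads(raw_line)
target = example["sentence_annotations"][0]["final_sentence"]
print(highlighted_values(example))  # ['2019', 'Marathon', '1st']
print(target)                       # In 2019, Example Athlete won the marathon.
```

The model only ever sees the highlighted cells (plus context such as titles), and the reference sentence is the generation target.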

## Base Model: T5-Base
[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)

T5 (Text-to-Text Transfer Transformer) was developed by Google as a general-purpose model for natural language processing. Its core idea is to treat every text-processing task as a "text-to-text" problem: the model takes text as input and produces new text as output.
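
A minimal sketch of the text-to-text framing: every task becomes a pair of strings, with the task identified by a plain-text prefix on the input. The translation pair follows the prefix style shown in the T5 paper. The `table to text` prefix, the helper `to_text_to_text`, and the linearized input are hypothetical, since this README does not state whether the fine-tuned model uses a task prefix.

```python
from typing import NamedTuple


class TextToTextExample(NamedTuple):
    source: str  # text fed to the encoder
    target: str  # text the decoder is trained to generate


def to_text_to_text(prefix: str, inputs: str, output: str) -> TextToTextExample:
    """Cast a task as string -> string by prepending a task prefix to the input."""
    return TextToTextExample(source=f"{prefix}: {inputs}", target=output)


examples = [
    # Translation, written in the prefix style used in the T5 paper.
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    # Hypothetical table-to-text prefix; the input would be a linearized table.
    to_text_to_text(
        "table to text",
        "<table> <cell> 2019 </cell> <cell> 1st </cell> </table>",
        "In 2019, Example Athlete won the marathon.",
    ),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because inputs and outputs are always text, the same model, loss, and decoding procedure can serve every task; only the data changes.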

## Baseline Preprocessing
[Baseline Preprocessing](https://github.com/google-research/language/tree/master/language/totto)

This repository supplements the main ToTTo repository and provides code for basic preprocessing of the ToTTo dataset.
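
Preprocessing for a text-to-text model means turning each table into a single input string. The sketch below imitates that idea in a simplified form: it keeps the page and section titles, emits each highlighted cell, and attaches its column header using tags in the style of the baseline's linearization. It assumes there are no row or column spans and that column headers sit in the first row. The function name and exact tag layout are our approximation, so consult the linked repository's preprocessing code for the real format, which handles more cases.

```python
def cell(value, is_header=False):
    """Build one table cell in the (assumed) ToTTo cell format."""
    return {"value": value, "is_header": is_header,
            "column_span": 1, "row_span": 1}


def linearize_highlighted(example):
    """Hypothetical, simplified linearization of the highlighted subtable.

    Assumes no row/column spans and that column headers live in row 0.
    """
    table = example["table"]
    header_row = table[0]
    parts = [
        f"<page_title> {example['table_page_title']} </page_title>",
        f"<section_title> {example['table_section_title']} </section_title>",
        "<table>",
    ]
    for r, c in sorted(example["highlighted_cells"]):
        text = f"<cell> {table[r][c]['value']}"
        header = header_row[c]
        # Attach the column header to non-header cells.
        if r != 0 and header["is_header"]:
            text += f" <col_header> {header['value']} </col_header>"
        parts.append(text + " </cell>")
    parts.append("</table>")
    return " ".join(parts)


example = {
    "table_page_title": "Example Athlete",
    "table_section_title": "Competition record",
    "table": [
        [cell("Year", True), cell("Event", True), cell("Position", True)],
        [cell("2019"), cell("Marathon"), cell("1st")],
    ],
    "highlighted_cells": [[1, 0], [1, 2]],
}
print(linearize_highlighted(example))
```

Keeping only the highlighted cells keeps inputs short, while the header tags preserve what each value means.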

## Fine-tuning
We fine-tuned the T5 conditional generation model (T5-Base) on the ToTTo dataset for 24,000 steps, using [BLEURT](https://arxiv.org/abs/2004.04696) as the evaluation metric.
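
The README does not include the training code. As a hedged illustration of what fine-tuning a T5 conditional generation model involves, the sketch below turns one target sentence into training labels and decoder inputs for teacher forcing. Padding positions in the labels are set to -100 so the loss ignores them, and the decoder input is the label sequence shifted right behind a start token. The token ids are made up. Pad id 0, end-of-sequence id 1, and start id 0 match T5's standard vocabulary, and -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helpers `make_labels` and `shift_right` are hypothetical, not the authors' script.

```python
PAD_ID = 0           # T5's padding id, also used as the decoder start token
EOS_ID = 1           # T5's end-of-sequence id
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def make_labels(target_ids, max_len):
    """Pad target ids to max_len, masking padding so the loss skips it."""
    if len(target_ids) > max_len:
        raise ValueError("target longer than max_len; truncate it first")
    return target_ids + [IGNORE_INDEX] * (max_len - len(target_ids))


def shift_right(labels):
    """Decoder inputs for teacher forcing: start token, then labels[:-1]."""
    shifted = [PAD_ID] + labels[:-1]
    # Masked positions must become real token ids before embedding lookup.
    return [PAD_ID if tok == IGNORE_INDEX else tok for tok in shifted]


# Made-up token ids for a short target sentence, terminated by EOS.
target_ids = [37, 1566, 19, EOS_ID]
labels = make_labels(target_ids, max_len=6)
decoder_input_ids = shift_right(labels)
print(labels)             # [37, 1566, 19, 1, -100, -100]
print(decoder_input_ids)  # [0, 37, 1566, 19, 1, 0]
```

At position t the decoder has seen `decoder_input_ids[: t + 1]` and is trained to predict `labels[t]`. Hugging Face's T5 performs the same right shift internally when it is given `labels`.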