Commit fa013b0 (parent: 26a826f): Update README.md

## GPT-Neo Small Portuguese

#### Model Description
This is a fine-tuned version of EleutherAI's GPT-Neo 125M for the Portuguese language.

#### Training data
It was trained on 227,382 texts selected from a PTWiki dump. You can find all the data here: https://archive.org/details/ptwiki-dump-20210520

#### Training Procedure
Every text was passed through the GPT2-Tokenizer, with bos and eos tokens added to separate documents, truncated to the maximum sequence length that GPT-Neo supports. It was fine-tuned with the default settings of the Trainer class from the Hugging Face library, roughly as sketched below.

##### Learning Rate: **2e-4**
##### Epochs: **1**
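
A minimal sketch of that setup, under stated assumptions: the dataset wrapper, the placeholder text list, the batch size, and the output directory below are illustrative, while the learning rate and epoch count match the values above.

```python
# Hedged sketch of the described procedure; dataset wrapper, batch size,
# and output directory are assumptions, not the card's actual code.
import torch
from transformers import (DataCollatorForLanguageModeling, GPT2Tokenizer,
                          GPTNeoForCausalLM, Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # the GPT-2 tokenizer ships without a pad token
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

texts = ["..."]  # stand-in for the 227,382 selected PTWiki texts


class PTWikiDataset(torch.utils.data.Dataset):
    """Separates each text with bos/eos and truncates to the model's context size."""

    def __init__(self, texts):
        self.examples = [
            tokenizer(tokenizer.bos_token + t + tokenizer.eos_token,
                      truncation=True,
                      max_length=model.config.max_position_embeddings)["input_ids"]
            for t in texts
        ]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return torch.tensor(self.examples[i])


args = TrainingArguments(
    output_dir="gpt-neo-small-portuguese",  # assumption
    num_train_epochs=1,                     # Epochs: 1
    learning_rate=2e-4,                     # Learning Rate: 2e-4
    per_device_train_batch_size=4,          # assumption
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=PTWikiDataset(texts),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The collator with `mlm=False` pads each batch and derives causal language-modeling labels from the input ids, which is what the default Trainer setup expects for this kind of fine-tuning.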

#### Goals

My intention was purely educational: to make a Portuguese version of this model available.

#### How to use
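
A minimal load-and-generate sketch leading into the decoding loop below; the hub id `HeyLucasLeao/gpt-neo-small-portuguese`, the prompt, and the sampling parameters are assumptions rather than the card's exact values.

```python
# Hedged usage sketch; hub id, prompt, and sampling parameters are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HeyLucasLeao/gpt-neo-small-portuguese"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Eu gosto de"  # any Portuguese prompt
generated = tokenizer(text, return_tensors="pt").input_ids

sample_outputs = model.generate(generated,
                                do_sample=True,
                                max_length=100,
                                top_k=50,
                                top_p=0.95,
                                num_return_sequences=3)
```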

```python
# Decoding and printing sequences
for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))

# >> Generated text
```