Commit fa013b0 (parent: 26a826f): Update README.md

## GPT-Neo Small Portuguese

#### Model Description
This is a fine-tuned version of EleutherAI's GPT-Neo 125M for the Portuguese language.

#### Training data
It was trained on 227,382 texts selected from a PTWiki dump. You can find all the data here: https://archive.org/details/ptwiki-dump-20210520

#### Training Procedure
Every text was passed through the GPT2-Tokenizer, with bos and eos tokens added to separate documents, truncated to the maximum sequence length that GPT-Neo supports. It was fine-tuned with the default settings of the Trainer class from the Hugging Face library, roughly as sketched below.

##### Learning Rate: **2e-4**
##### Epochs: **1**
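
A minimal sketch of that setup, under stated assumptions: the dataset wrapper, the placeholder text list, the batch size, and the output directory below are illustrative, while the learning rate and epoch count match the values above.

```python
# Hedged sketch of the described procedure; dataset wrapper, batch size,
# and output directory are assumptions, not the card's actual code.
import torch
from transformers import (DataCollatorForLanguageModeling, GPT2Tokenizer,
                          GPTNeoForCausalLM, Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
tokenizer.pad_token = tokenizer.eos_token  # the GPT-2 tokenizer ships without a pad token
model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

texts = ["..."]  # stand-in for the 227,382 selected PTWiki texts


class PTWikiDataset(torch.utils.data.Dataset):
    """Separates each text with bos/eos and truncates to the model's context size."""

    def __init__(self, texts):
        self.examples = [
            tokenizer(tokenizer.bos_token + t + tokenizer.eos_token,
                      truncation=True,
                      max_length=model.config.max_position_embeddings)["input_ids"]
            for t in texts
        ]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return torch.tensor(self.examples[i])


args = TrainingArguments(
    output_dir="gpt-neo-small-portuguese",  # assumption
    num_train_epochs=1,                     # Epochs: 1
    learning_rate=2e-4,                     # Learning Rate: 2e-4
    per_device_train_batch_size=4,          # assumption
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=PTWikiDataset(texts),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The collator with `mlm=False` pads each batch and derives causal language-modeling labels from the input ids, which is what the default Trainer setup expects for this kind of fine-tuning.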

#### Goals

My intention was purely educational: to make a Portuguese version of this model available.

#### How to use
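
A minimal load-and-generate sketch leading into the decoding loop below; the hub id `HeyLucasLeao/gpt-neo-small-portuguese`, the prompt, and the sampling parameters are assumptions rather than the card's exact values.

```python
# Hedged usage sketch; hub id, prompt, and sampling parameters are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HeyLucasLeao/gpt-neo-small-portuguese"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Eu gosto de"  # any Portuguese prompt
generated = tokenizer(text, return_tensors="pt").input_ids

sample_outputs = model.generate(generated,
                                do_sample=True,
                                max_length=100,
                                top_k=50,
                                top_p=0.95,
                                num_return_sequences=3)
```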

```python
# Decoding and printing sequences
for i, sample_output in enumerate(sample_outputs):
    print(">> Generated text {}\n\n{}".format(i + 1, tokenizer.decode(sample_output.tolist())))

# >> Generated text
```