write about updated checkpoint
README.md CHANGED
@@ -122,21 +122,27 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of
 
 ## Training procedure
 
+### Updates:
+
+- Added a new version on July 3, 2022, with several epochs of additional training; it is more performant in general.
+
 ### Training hyperparameters
 
 The following hyperparameters were used during the **final** training round\*:
-
-
+
+- learning_rate: 0.001
+- train_batch_size: 1
 - eval_batch_size: 1
 - seed: 42
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 64
-- total_train_batch_size:
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.
+- lr_scheduler_warmup_ratio: 0.01
 - num_epochs: 2
 
+
 \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train_
 
 ### Training results
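
The hyperparameter block above uses the layout that the `transformers` Trainer writes into autogenerated model cards, so the values map directly onto `TrainingArguments`. Below is a minimal sketch of that mapping for anyone who wants a comparable setup; it is not the author's actual training script, and `output_dir` is a placeholder. Note that `total_train_batch_size` is not a real argument: it is derived from `train_batch_size` × `gradient_accumulation_steps` (1 × 64 = 64 here).

```python
from transformers import TrainingArguments

# Sketch of a comparable configuration; not the author's actual script.
# The card's "distributed_type: multi-GPU" comes from the launch
# environment (e.g. torchrun or accelerate launch), not from a
# TrainingArguments field.
training_args = TrainingArguments(
    output_dir="./checkpoints",        # placeholder
    learning_rate=1e-3,                # learning_rate: 0.001
    per_device_train_batch_size=1,     # train_batch_size: 1
    per_device_eval_batch_size=1,      # eval_batch_size: 1
    seed=42,                           # seed: 42
    gradient_accumulation_steps=64,    # 1 x 64 = total_train_batch_size: 64
    adam_beta1=0.9,                    # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                 # and epsilon=1e-08
    lr_scheduler_type="cosine",        # lr_scheduler_type: cosine
    warmup_ratio=0.01,                 # lr_scheduler_warmup_ratio: 0.01
    num_train_epochs=2,                # num_epochs: 2
)
```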
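
Because the July 3, 2022 update was pushed as a new commit to the same Hub repository, the pre-update weights remain addressable through git revisions, as on any Hugging Face repo. The sketch below is hypothetical: the repository id and revision string are placeholders, not values from this model card.

```python
from transformers import AutoModel, AutoTokenizer

repo_id = "your-username/your-model"  # placeholder, not the actual repo id

# A default load resolves to the latest revision, i.e. the updated
# (July 3, 2022) checkpoint.
model = AutoModel.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Pin an earlier checkpoint explicitly by branch, tag, or commit hash.
older_model = AutoModel.from_pretrained(
    repo_id,
    revision="placeholder-commit-hash",  # placeholder revision
)
```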