Reproduce pre-training results
#2 · opened by kevalmorabia97
I would like to reproduce the bert-large-uncased-whole-word-masking model provided by huggingface. Could you please share more details on the experimental setup?
- Was this model trained from scratch, or is it a fine-tuned version of bert-large-uncased with whole word masking (WWM)?
- How many epochs/steps, and what learning rate, batch size, number of GPUs, etc.?
- There is this reference script, but its example command uses the WikiText dataset, whereas BERT was pre-trained on BookCorpus and English Wikipedia, so I'm not sure how to reproduce these results (see the rough sketch below for what I was planning to try).
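For context, here is a minimal sketch of how I was planning to build the pre-training corpus and a whole-word-masking collator. The dataset choices (the `bookcorpus` dataset and the `20220301.en` Wikipedia dump) and the 15% masking probability are my own assumptions, not necessarily what was used for the released model:

```python
# Rough sketch (my assumptions, not the confirmed recipe): build a
# BookCorpus + English Wikipedia corpus and a whole-word-masking collator.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

# Assumed datasets; the exact dumps used for the released model are unknown to me.
bookcorpus = load_dataset("bookcorpus", split="train")
wiki = load_dataset("wikipedia", "20220301.en", split="train")
# Keep only the text column so the two datasets can be concatenated.
wiki = wiki.remove_columns([c for c in wiki.column_names if c != "text"])

corpus = concatenate_datasets([bookcorpus, wiki])

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

# Whole-word masking collator; 15% is the masking probability from the BERT
# paper, and I'm assuming the same value was used here.
data_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)
```

I'd then tokenize and group `corpus` and plug `data_collator` into the reference script's training loop, but I don't know whether that matches your actual setup.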
Thank you :)
Any follow-up would be greatly appreciated!