Reproduce pre-training results
#2 · opened by kevalmorabia97
I would like to reproduce the bert-large-uncased-whole-word-masking model provided by huggingface. Could you please share more details on the experimental setup?
- Was this model trained from scratch, or is it a fine-tuned version of bert-large-uncased with whole word masking (WWM)?
- How many epochs/steps, and what learning rate, batch size, number of GPUs, etc.?
- There is this reference script, but its example command uses the WikiText dataset, whereas BERT was pre-trained on BookCorpus and English Wikipedia, so I'm not sure how to reproduce these results (see the rough sketch below for what I was planning to try).
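For context, here is a minimal sketch of how I was planning to build the pre-training corpus and a whole-word-masking collator. The dataset choices (the `bookcorpus` dataset and the `20220301.en` Wikipedia dump) and the 15% masking probability are my own assumptions, not necessarily what was used for the released model:

```python
# Rough sketch (my assumptions, not the confirmed recipe): build a
# BookCorpus + English Wikipedia corpus and a whole-word-masking collator.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer, DataCollatorForWholeWordMask

# Assumed datasets; the exact dumps used for the released model are unknown to me.
bookcorpus = load_dataset("bookcorpus", split="train")
wiki = load_dataset("wikipedia", "20220301.en", split="train")
# Keep only the text column so the two datasets can be concatenated.
wiki = wiki.remove_columns([c for c in wiki.column_names if c != "text"])

corpus = concatenate_datasets([bookcorpus, wiki])

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")

# Whole-word masking collator; 15% is the masking probability from the BERT
# paper, and I'm assuming the same value was used here.
data_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)
```

I'd then tokenize and group `corpus` and plug `data_collator` into the reference script's training loop, but I don't know whether that matches your actual setup.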
Thank you :)
Any follow-up would be greatly appreciated!