hardware?
#3 opened by amanpreet7
What hardware did you use? Also, is the dataset really 27TB? Isn't that big, and wasn't the allenai/c4 subset enough?
We trained this model on 128 H100 GPUs, using the exact same data mixture as the 7B model. Metrics improved throughout the run, so having more data was beneficial.