Global batch size: 384, seq_len: 2048
Checkpoint every 500 steps,
i.e. every 393,216,000 tokens (384 x 2048 x 500, roughly 400M tokens)
Current revisions available (revision name: approximate tokens seen):
checkpoint-500: 393M
checkpoint-1000: 786M
checkpoint-1500: 1.18B
checkpoint-2000: 1.57B
checkpoint-2500: 1.96B
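The token counts for each revision can be reproduced from the batch size, sequence length, and checkpoint interval stated above. The sketch below is only illustrative: the constant and function names are hypothetical, and it assumes every sequence in a batch is packed to the full seq_len (no padding), which this card does not state.

```python
# Values taken from this model card.
GLOBAL_BATCH_SIZE = 384      # sequences per optimizer step
SEQ_LEN = 2048               # tokens per sequence
CHECKPOINT_INTERVAL = 500    # steps between saved revisions


def tokens_seen(step: int) -> int:
    """Tokens processed after `step` optimizer steps.

    Assumption: every sequence is packed to the full SEQ_LEN.
    """
    return step * GLOBAL_BATCH_SIZE * SEQ_LEN


if __name__ == "__main__":
    # Print the exact token count behind each listed revision.
    for step in range(CHECKPOINT_INTERVAL, 2500 + 1, CHECKPOINT_INTERVAL):
        print(f"checkpoint-{step}: {tokens_seen(step):,} tokens")
```

The listed figures are these exact counts shortened to three significant digits, for example 393,216,000 for checkpoint-500.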
max_lr : 7e-5