tomg-group-umd/Gemstone-768x45
Text Generation
Our 22 open-source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths from 256 to 3072 and 18 depths from 3 to 80.
Note: Our main training runs (no postfix) are trained for 350 billion tokens, with checkpoints stored as revisions every 2 billion tokens. We also have ablations over learning rate (postfix "_lr_ablation") and cooldown periods (postfix "_cooldown", still uploading).
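
As a rough illustration of the note above, the sketch below counts the checkpoints implied by a 350-billion-token run saved every 2 billion tokens, and shows one way an intermediate checkpoint might be loaded through the `revision` argument of `from_pretrained`. The helper names are hypothetical. The actual revision names are not stated here, so the sketch lists them from the Hub instead of guessing, and it assumes the models load with the stock `transformers` auto classes, which may not hold.

```python
from typing import List


def checkpoint_token_counts(total_billion: int = 350, every_billion: int = 2) -> List[int]:
    """Token counts (in billions) at which a checkpoint is expected,
    assuming one checkpoint every `every_billion` tokens up to the end of training."""
    return list(range(every_billion, total_billion + 1, every_billion))


def list_revisions(repo_id: str) -> List[str]:
    """List branch and tag names of a Hub repo (requires network access).
    Whether checkpoints are stored as branches or tags is an assumption to verify."""
    from huggingface_hub import list_repo_refs  # imported lazily; optional dependency

    refs = list_repo_refs(repo_id)
    return [ref.name for ref in refs.branches] + [ref.name for ref in refs.tags]


def load_checkpoint(repo_id: str, revision: str):
    """Load a model at a given revision (downloads weights).
    Assumes the repo works with AutoModelForCausalLM; extra arguments may be needed."""
    from transformers import AutoModelForCausalLM  # imported lazily; optional dependency

    return AutoModelForCausalLM.from_pretrained(repo_id, revision=revision)


if __name__ == "__main__":
    counts = checkpoint_token_counts()
    print(len(counts), counts[0], counts[-1])
    # Network-dependent usage, left commented out:
    # names = list_revisions("tomg-group-umd/Gemstone-768x45")
    # model = load_checkpoint("tomg-group-umd/Gemstone-768x45", revision=names[0])
```

Listing the refs first avoids hard-coding a revision naming scheme that the collection note does not specify.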