# Gemstone-384x13_lr_ablation
Gemstone-384x13_lr_ablation is part of the Gemstone Suite of Models, a set of models trained with varying widths and depths. This particular version, denoted by the `_lr_ablation` suffix, corresponds to an ablation detailed in the paper in which we train the same suite of models but with a learning rate half that of the original.
## Training
We train with litgpt and AxoNN on AMD MI250X GPUs on Frontier at Oak Ridge National Laboratory, with a global batch size of 2048.
## Data
Training and validation data is taken from non-overlapping subsets of Dolma; as such, this is not an instruction-tuned model. This model is trained for 100 billion tokens (in contrast to the main suite, which is trained for 350 billion tokens), and we upload checkpoints every 2 billion tokens (477 steps).
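As a rough sanity check on the checkpoint interval, the arithmetic below is a sketch that assumes a sequence length of 2048 tokens, which this card does not state:

```python
# Tokens processed between checkpoint uploads.
global_batch_size = 2048    # sequences per optimizer step (from this card)
sequence_length = 2048      # tokens per sequence (assumed, not stated here)
steps_per_checkpoint = 477  # steps between checkpoint uploads (from this card)

tokens_per_step = global_batch_size * sequence_length  # 4,194,304
print(tokens_per_step * steps_per_checkpoint)          # 2,000,683,008 ~= 2B
```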
## Using Gemstone-384x13_lr_ablation
The Gemstones are based on the gemma-2b architecture and use `modeling_gemma.py` to run with the transformers library.
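A minimal loading sketch is shown below. The repository id is an assumption based on Hugging Face Hub naming conventions for this suite; adjust it to the actual repository hosting this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id; replace with the repo this card is hosted under.
repo_id = "tomg-group-umd/Gemstone-384x13_lr_ablation"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # picks up the custom modeling_gemma.py
)

inputs = tokenizer("The Gemstone suite is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the intermediate checkpoints are published as Hub revisions, a specific one can be selected with the `revision` argument of `from_pretrained`; the exact revision names are not stated in this card.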
## Licence
This model is released under the Apache-2.0 licence.
## Contact
Please feel free to contact us with any questions, or open a discussion thread.
## Citation
```bibtex
@article{mcleish2024gemstones,
  title={Gemstones: A Model Suite for Multi-Faceted Scaling Laws},
  author={Sean McLeish and John Kirchenbauer and David Yu Miller and Siddharth Singh and Abhinav Bhatele and Micah Goldblum and Ashwinee Panda and Tom Goldstein},
  journal={arXiv preprint arXiv:2502.},
  year={2025}
}
```