train-once-answer-all (Collection)
Model checkpoints and training data modifications for the paper "Train Once, Answer All: Many Pretraining Experiments for the Cost of One".
This model is a research variant of OLMo-2-0425-1B and serves as a baseline for comparisons with OLMo-2-1B-Exp.
It was obtained by linearly decaying the learning rate of the OLMo-2-0425-1B checkpoint at gradient step 90,000 to zero over 10,000 gradient steps.
The model is described in the paper "Train Once, Answer All: Many Pretraining Experiments for the Cost of One".
Note: This is the model referred to as OLMo-2-1B in the paper. To avoid confusion with the fully trained OLMo-2-1B base model, it is named differently on Hugging Face.
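For illustration only, the linear learning-rate decay described above can be sketched in PyTorch roughly as follows; the parameters, optimizer, and base learning rate are placeholders, not the settings used in the paper.

import torch
from torch.optim.lr_scheduler import LinearLR

# Hypothetical sketch: anneal the learning rate linearly to zero over 10,000 gradient steps.
# The parameter, optimizer, and base learning rate are placeholders, not the paper's values.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=4e-4)
scheduler = LinearLR(optimizer, start_factor=1.0, end_factor=0.0, total_iters=10_000)

for step in range(10_000):
    optimizer.step()   # one gradient step (forward/backward omitted in this sketch)
    scheduler.step()   # learning rate moves linearly toward zero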
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the decayed checkpoint and its tokenizer from the Hugging Face Hub
olmo = AutoModelForCausalLM.from_pretrained("sbordt/OLMo-2-1B-Decayed-Early")
tokenizer = AutoTokenizer.from_pretrained("sbordt/OLMo-2-1B-Decayed-Early")
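A minimal generation example, continuing the snippet above; the prompt and sampling parameters are illustrative and not taken from the paper.

inputs = tokenizer("Language modeling is ", return_tensors="pt")
outputs = olmo.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))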