# OLMo-1B-0724 Instruct

This is a version of OLMo-1B-0724-hf that has undergone SFT and DPO training. See the SFT model card for details on SFT training.
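A minimal usage sketch with transformers (this assumes the tokenizer ships with a chat template, as the model is conversational; adjust the prompt format if it does not):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hamishivi/OLMo-1B-0724-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build a chat-formatted prompt (assumes the tokenizer defines a chat template).
messages = [{"role": "user", "content": "Briefly explain what DPO training does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Print only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```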

This model is initialised from OLMo-1B-0724-SFT-hf and then DPO-trained on a cleaned UltraFeedback dataset for 3 epochs with a batch size of 32, a beta of 0.1, linear warmup for the first 10% of training, and linear cooldown for the remainder.
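For intuition, here is a rough sketch of how those hyperparameters would map onto a trl-style DPO setup. This is illustrative only: the actual training used Open-Instruct, the dataset id below is an assumption (the card only says "a cleaned ultrafeedback dataset"), and the per-device batch size / gradient accumulation split is arbitrary so long as the effective batch size is 32.

```python
# Illustrative sketch only -- actual training used Open-Instruct, not trl.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "hamishivi/OLMo-1B-0724-SFT-hf"  # DPO starts from the SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Assumed dataset id: the card does not name the exact cleaned UltraFeedback variant.
dataset = load_dataset("allenai/ultrafeedback_binarized_cleaned", split="train_prefs")

config = DPOConfig(
    output_dir="olmo-1b-0724-dpo",
    beta=0.1,                       # DPO beta from the card
    num_train_epochs=3,             # 3 epochs
    per_device_train_batch_size=8,  # 8 x 4 accumulation = effective batch size 32
    gradient_accumulation_steps=4,
    warmup_ratio=0.1,               # linear warmup for 10% of training...
    lr_scheduler_type="linear",     # ...then linear decay (cooldown)
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions call this argument `tokenizer`
)
trainer.train()
```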

Evals are as follows:

| Metric | OLMo-1B-0724-hf | OLMo-1B-0724-SFT-hf | OLMo-1B-0724-Instruct-hf (this model!) |
|--------|-----------------|---------------------|----------------------------------------|
| MMLU 0-shot | 25.0 | 36.0 | 36.7 |
| GSM8k CoT 8-shot | 7.0 | 12.5 | 12.5 |
| BBH CoT 3-shot | 22.5 | 27.2 | 30.6 |
| HumanEval P@10 | 16.0 | 21.2 | 22.0 |
| AlpacaEval 1 | - | 41.5 | 50.9 |
| AlpacaEval 2 LC | - | 2.7 | 2.5 |
| ToxiGen % Toxic | 80.3 | 59.7 | 14.1 |
| TruthfulQA % Info+True | 23.0 | 40.9 | 42.2 |
| IFEval Loose Acc | 20.5 | 26.1 | 24.2 |
| XSTest F1 | 67.6 | 81.9 | 79.8 |
| Average of above metrics | 25.2 | 33.0 | 38.7 |

Model training and evaluation were performed using Open-Instruct, so check that out for more details on evaluation.
