## Model Details

This model is a 1B-class Llama 3 model pretrained from scratch with torchtitan on the fineweb-edu dataset using the AdamW optimizer, for a total of 100B training tokens seen. The safetensors checkpoint holds 1.4B parameters stored in F32.
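For a quick architecture check, the checkpoint's configuration can be inspected without downloading the weights; a minimal sketch using the standard `transformers` API:

```python
from transformers import AutoConfig

# Fetch only the architecture metadata (config.json) from the Hub.
cfg = AutoConfig.from_pretrained("kz919/llama3_1b_cautious_100B_token_8222025")
print(cfg)  # hidden size, number of layers, attention heads, vocab size, etc.
```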

## How to use

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_100B_token_8222025",
    torch_dtype=torch.bfloat16,  # matches the dtype used in the eval below
)

print(pipe("The key to life is"))
```
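Equivalently, the model and tokenizer can be loaded directly. A minimal sketch; the sampling parameters here are illustrative, not tuned:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kz919/llama3_1b_cautious_100B_token_8222025"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Llama tokenizers prepend a BOS token by default, matching the
# add_bos_token=True setting used in the eval command below.
inputs = tokenizer("The key to life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```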

## Downstream Eval

### ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA

```bash
lm_eval --model hf \
    --model_args pretrained=kz919/llama3_1b_cautious_100B_token_8222025,dtype="bfloat16",add_bos_token=True \
    --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
    --device cuda:7 \
    --batch_size 8
```
| Tasks          | Version | Filter | n-shot | Metric       | Value   | Stderr   |
|----------------|---------|--------|--------|--------------|---------|----------|
| arc_challenge  | 1       | none   | 0      | acc ↑        | 0.3123  | ± 0.0135 |
|                |         | none   | 0      | acc_norm ↑   | 0.3413  | ± 0.0139 |
| arc_easy       | 1       | none   | 0      | acc ↑        | 0.6768  | ± 0.0096 |
|                |         | none   | 0      | acc_norm ↑   | 0.5922  | ± 0.0101 |
| hellaswag      | 1       | none   | 0      | acc ↑        | 0.4007  | ± 0.0049 |
|                |         | none   | 0      | acc_norm ↑   | 0.5092  | ± 0.0050 |
| lambada_openai | 1       | none   | 0      | acc ↑        | 0.3901  | ± 0.0068 |
|                |         | none   | 0      | perplexity ↓ | 21.6290 | ± 0.7689 |
| openbookqa     | 1       | none   | 0      | acc ↑        | 0.2660  | ± 0.0198 |
|                |         | none   | 0      | acc_norm ↑   | 0.3680  | ± 0.0216 |
| piqa           | 1       | none   | 0      | acc ↑        | 0.7127  | ± 0.0106 |
|                |         | none   | 0      | acc_norm ↑   | 0.7100  | ± 0.0106 |
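The same evaluation can also be run programmatically. A minimal sketch, assuming lm-evaluation-harness v0.4+ and mirroring the CLI arguments above:

```python
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

# Wrap the checkpoint in the harness's Hugging Face adapter,
# using the same settings as the CLI command above.
lm = HFLM(
    pretrained="kz919/llama3_1b_cautious_100B_token_8222025",
    dtype="bfloat16",
    add_bos_token=True,
    batch_size=8,
)

results = simple_evaluate(
    model=lm,
    tasks=["lambada_openai", "hellaswag", "piqa", "arc_easy", "arc_challenge", "openbookqa"],
)
print(results["results"])
```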

### MMLU

| Groups            | Version | Filter | n-shot | Metric | Value  | Stderr   |
|-------------------|---------|--------|--------|--------|--------|----------|
| mmlu              | 2       | none   |        | acc ↑  | 0.2515 | ± 0.0037 |
| - humanities      | 2       | none   |        | acc ↑  | 0.2451 | ± 0.0063 |
| - other           | 2       | none   |        | acc ↑  | 0.2716 | ± 0.0080 |
| - social sciences | 2       | none   |        | acc ↑  | 0.2476 | ± 0.0078 |
| - stem            | 2       | none   |        | acc ↑  | 0.2452 | ± 0.0076 |

Note that the aggregate MMLU accuracy is close to the 25% random-guess baseline for four-choice questions.