Model Details

This is a 3B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu, using the AdamW optimizer. Training follows the Chinchilla rule of thumb of roughly 20 tokens per parameter, for a total of 60B tokens seen.
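The token budget follows directly from the 20x rule. The arithmetic is sketched below; the helper name `chinchilla_token_budget` is hypothetical, and the round figure of 3 billion parameters is an assumption, since the exact parameter count is not stated here.

```python
# Rule-of-thumb token budget: about 20 training tokens per model parameter.
TOKENS_PER_PARAM = 20  # the "20x" Chinchilla heuristic mentioned above


def chinchilla_token_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the number of training tokens suggested by the rule of thumb."""
    return n_params * tokens_per_param


# Nominal 3B parameters (assumption: the exact count may differ slightly).
n_params = 3_000_000_000
budget = chinchilla_token_budget(n_params)
print(f"{budget / 1e9:.0f}B tokens")  # 60B tokens
```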

How to use

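from transformers import pipeline

# Load the checkpoint from the Hugging Face Hub as a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_3b_chinchilla_8142025",
)

# Generate a continuation for a short prompt and print the result.
print(pipe("The key to life is"))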

Downstream Eval

ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, PIQA

lm_eval --model hf --model_args pretrained=kz919/llama3_3b_chinchilla_8142025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8

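| Tasks | Version | Filter | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | acc | ↑ | 0.3003 | ± | 0.0134 |
| | | none | acc_norm | ↑ | 0.3106 | ± | 0.0135 |
| arc_easy | 1 | none | acc | ↑ | 0.6246 | ± | 0.0099 |
| | | none | acc_norm | ↑ | 0.5379 | ± | 0.0102 |
| hellaswag | 1 | none | acc | ↑ | 0.3735 | ± | 0.0048 |
| | | none | acc_norm | ↑ | 0.4614 | ± | 0.0050 |
| lambada_openai | 1 | none | acc | ↑ | 0.3685 | ± | 0.0067 |
| | | none | perplexity | ↓ | 32.9840 | ± | 1.3564 |
| openbookqa | 1 | none | acc | ↑ | 0.2560 | ± | 0.0195 |
| | | none | acc_norm | ↑ | 0.3460 | ± | 0.0213 |
| piqa | 1 | none | acc | ↑ | 0.6703 | ± | 0.0110 |
| | | none | acc_norm | ↑ | 0.6844 | ± | 0.0108 |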
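The Stderr column can be turned into an approximate 95% confidence interval as value ± 1.96 × stderr. The sketch below assumes a normal approximation; the values are copied from the table above, and the choice of one metric per task is arbitrary and only for illustration.

```python
# (task, metric) -> (value, stderr), copied from the results table above.
results = {
    ("arc_challenge", "acc_norm"): (0.3106, 0.0135),
    ("arc_easy", "acc"): (0.6246, 0.0099),
    ("hellaswag", "acc_norm"): (0.4614, 0.0050),
    ("lambada_openai", "acc"): (0.3685, 0.0067),
    ("openbookqa", "acc_norm"): (0.3460, 0.0213),
    ("piqa", "acc_norm"): (0.6844, 0.0108),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed approximation)


def confidence_interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Return (low, high) bounds of a normal-approximation interval."""
    half_width = z * stderr
    return value - half_width, value + half_width


for (task, metric), (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:15s} {metric:9s} {value:.4f}  [{low:.4f}, {high:.4f}]")
```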

MMLU

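| Groups | Version | Filter | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | acc | ↑ | 0.2421 | ± | 0.0036 |
| - humanities | 2 | none | acc | ↑ | 0.2497 | ± | 0.0063 |
| - other | 2 | none | acc | ↑ | 0.2568 | ± | 0.0078 |
| - social sciences | 2 | none | acc | ↑ | 0.2265 | ± | 0.0075 |
| - stem | 2 | none | acc | ↑ | 0.2315 | ± | 0.0075 |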
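MMLU questions have four answer options, so uniform random guessing gives an accuracy of 0.25. As a rough aid for reading the table, the sketch below expresses each group's distance from that baseline in units of its reported standard error. The helper name `z_vs_chance` is hypothetical, and this is an informal comparison rather than a formal significance test.

```python
CHANCE = 0.25  # four answer options per MMLU question

# group -> (accuracy, stderr), copied from the MMLU table above.
mmlu = {
    "mmlu": (0.2421, 0.0036),
    "humanities": (0.2497, 0.0063),
    "other": (0.2568, 0.0078),
    "social sciences": (0.2265, 0.0075),
    "stem": (0.2315, 0.0075),
}


def z_vs_chance(acc: float, stderr: float, chance: float = CHANCE) -> float:
    """Distance from the chance baseline, measured in standard errors."""
    return (acc - chance) / stderr


for group, (acc, stderr) in mmlu.items():
    print(f"{group:16s} acc={acc:.4f}  z={z_vs_chance(acc, stderr):+.2f}")
```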
