Model Details
This is a 3B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu using the C_AdamW optimizer. Training follows the 20x Chinchilla rule (about 20 training tokens per parameter), for a total of 60B tokens seen.
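The token budget above follows from simple arithmetic. The sketch below reproduces it, assuming a nominal count of exactly 3 billion parameters (the real parameter count may differ slightly); the function name `chinchilla_token_budget` is hypothetical and only used for illustration.

```python
# Rule-of-thumb token budget: a fixed number of training tokens per parameter.
TOKENS_PER_PARAM = 20  # the "20x" ratio stated in the model details


def chinchilla_token_budget(n_params: int, tokens_per_param: int = TOKENS_PER_PARAM) -> int:
    """Return the training-token budget implied by a tokens-per-parameter ratio."""
    return n_params * tokens_per_param


# Assumption: "3B" is treated as exactly 3e9 parameters.
n_params = 3_000_000_000
budget = chinchilla_token_budget(n_params)
print(f"{budget / 1e9:.0f}B tokens")  # prints "60B tokens"
```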
How to use
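import torch
from transformers import pipeline

# Load the checkpoint from the Hugging Face Hub as a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_3b_chinchilla_8142025",
)
# Generate a continuation for a short prompt.
print(pipe("The key to life is"))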
Downstream Eval
ARC, HellaSwag, LAMBADA_OpenAI, OpenBookQA, PIQA
lm_eval --model hf --model_args pretrained=kz919/llama3_3b_chinchilla_8142025,dtype="bfloat16",add_bos_token=True --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa --device cuda:7 --batch_size 8
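The command above pins the run to `cuda:7`, which only exists on a machine with at least eight GPUs. As a rough sketch, the same invocation can be assembled in Python with the device and batch size as parameters. The helper name `build_lm_eval_command` is hypothetical; it uses only the flags shown in the command above and prints the command rather than running it.

```python
import shlex


def build_lm_eval_command(model_id, tasks, device="cuda:0", batch_size=8):
    """Assemble the lm_eval argument list using the flags from the command above."""
    model_args = f"pretrained={model_id},dtype=bfloat16,add_bos_token=True"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


TASKS = ["lambada_openai", "hellaswag", "piqa", "arc_easy", "arc_challenge", "openbookqa"]
cmd = build_lm_eval_command("kz919/llama3_3b_chinchilla_8142025", TASKS)
# Print a shell-safe version of the command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Changing `device` to a GPU index available on your machine is the only adjustment needed to reuse the original settings.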
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | ↑ | 0.2892 | ± | 0.0133 |
| | | none | 0 | acc_norm | ↑ | 0.2892 | ± | 0.0133 |
| arc_easy | 1 | none | 0 | acc | ↑ | 0.6162 | ± | 0.0100 |
| | | none | 0 | acc_norm | ↑ | 0.5311 | ± | 0.0102 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.3698 | ± | 0.0048 |
| | | none | 0 | acc_norm | ↑ | 0.4611 | ± | 0.0050 |
| lambada_openai | 1 | none | 0 | acc | ↑ | 0.3670 | ± | 0.0067 |
| | | none | 0 | perplexity | ↓ | 34.2265 | ± | 1.4167 |
| openbookqa | 1 | none | 0 | acc | ↑ | 0.2380 | ± | 0.0191 |
| | | none | 0 | acc_norm | ↑ | 0.3460 | ± | 0.0213 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6904 | ± | 0.0108 |
| | | none | 0 | acc_norm | ↑ | 0.6975 | ± | 0.0107 |
MMLU
| Groups | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc | ↑ | 0.2453 | ± | 0.0036 |
| - humanities | 2 | none | | acc | ↑ | 0.2502 | ± | 0.0063 |
| - other | 2 | none | | acc | ↑ | 0.2620 | ± | 0.0079 |
| - social sciences | 2 | none | | acc | ↑ | 0.2320 | ± | 0.0076 |
| - stem | 2 | none | | acc | ↑ | 0.2347 | ± | 0.0076 |
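MMLU questions have four answer options, so random guessing gives 25% accuracy. One way to read the MMLU scores is to express each as a distance from that baseline in units of the reported standard error. The sketch below does this with the values from the table; the helper name `z_score_vs_chance` is hypothetical, and treating the ratio as an approximate z-score is a simplifying assumption rather than a formal test.

```python
CHANCE = 0.25  # four-way multiple choice


def z_score_vs_chance(acc, stderr, chance=CHANCE):
    """Distance of an accuracy from the chance baseline, in standard errors."""
    return (acc - chance) / stderr


# (accuracy, stderr) pairs copied from the MMLU table.
mmlu_results = {
    "mmlu": (0.2453, 0.0036),
    "humanities": (0.2502, 0.0063),
    "other": (0.2620, 0.0079),
    "social sciences": (0.2320, 0.0076),
    "stem": (0.2347, 0.0076),
}

for name, (acc, stderr) in mmlu_results.items():
    print(f"{name:16s} z = {z_score_vs_chance(acc, stderr):+.2f}")
```

Values close to zero suggest a score that is hard to distinguish from guessing at this evaluation's sample size.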