## Model Details

This model is a 1B-parameter Llama 3 model pretrained from scratch with torchtitan on fineweb-edu, using the C-AdamW (cautious AdamW) optimizer. Training followed the Chinchilla rule of roughly 20 tokens per parameter, i.e. about 20B tokens seen for the 1B-parameter model.
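C-AdamW follows the "cautious optimizer" recipe: at each step it zeroes out the coordinates of the AdamW update whose sign disagrees with the current gradient, then rescales the surviving coordinates. The sketch below illustrates that masking idea in plain PyTorch; the function name and hyperparameters are placeholders, and this is not the torchtitan implementation used for this run.

```python
import torch

def cautious_adamw_step(param, grad, exp_avg, exp_avg_sq, step,
                        lr=3e-4, betas=(0.9, 0.95), eps=1e-8,
                        weight_decay=0.1):
    """One AdamW step with a cautious mask (illustrative sketch)."""
    beta1, beta2 = betas
    # Standard AdamW moment estimates with bias correction.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    update = (exp_avg / (1 - beta1 ** step)) / (
        (exp_avg_sq / (1 - beta2 ** step)).sqrt() + eps)
    # Cautious mask: keep only coordinates where the update direction
    # agrees with the gradient, rescaled to preserve the mean step size.
    mask = (update * grad > 0).to(update.dtype)
    mask = mask * (mask.numel() / (mask.sum() + 1))
    # Decoupled weight decay, then the masked update.
    param.mul_(1 - lr * weight_decay)
    param.add_(update * mask, alpha=-lr)

# Toy usage on a single parameter tensor.
p, g = torch.randn(8, 8), torch.randn(8, 8)
m, v = torch.zeros_like(p), torch.zeros_like(p)
cautious_adamw_step(p, g, m, v, step=1)
```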

## How to use

```python
import torch
from transformers import pipeline

# Load the model into a text-generation pipeline.
pipe = pipeline(
    "text-generation",
    model="kz919/llama3_1b_cautious_chinchilla_8142025",
)

print(pipe("The key to life is"))
```
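Standard transformers generation arguments can be passed directly through the pipeline call; the settings below are illustrative, not tuned for this model:

```python
# Sample a longer continuation (illustrative settings).
out = pipe(
    "The key to life is",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(out[0]["generated_text"])
```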

## Downstream Eval

Zero-shot results on ARC, HellaSwag, LAMBADA (OpenAI), OpenBookQA, and PIQA via lm-evaluation-harness:

```bash
lm_eval --model hf \
  --model_args pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,dtype="bfloat16",add_bos_token=True \
  --tasks lambada_openai,hellaswag,piqa,arc_easy,arc_challenge,openbookqa \
  --device cuda:7 --batch_size 8
```
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc ↑ | 0.2730 | ± 0.0130 |
| | | none | 0 | acc_norm ↑ | 0.2765 | ± 0.0131 |
| arc_easy | 1 | none | 0 | acc ↑ | 0.5960 | ± 0.0101 |
| | | none | 0 | acc_norm ↑ | 0.5290 | ± 0.0102 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.3442 | ± 0.0047 |
| | | none | 0 | acc_norm ↑ | 0.4122 | ± 0.0049 |
| lambada_openai | 1 | none | 0 | acc ↑ | 0.3264 | ± 0.0065 |
| | | none | 0 | perplexity ↓ | 39.7510 | ± 1.6063 |
| openbookqa | 1 | none | 0 | acc ↑ | 0.2200 | ± 0.0185 |
| | | none | 0 | acc_norm ↑ | 0.3300 | ± 0.0210 |
| piqa | 1 | none | 0 | acc ↑ | 0.6872 | ± 0.0108 |
| | | none | 0 | acc_norm ↑ | 0.6850 | ± 0.0108 |
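The same evaluation can also be driven from Python. A minimal sketch, assuming the `simple_evaluate` entry point of lm-eval 0.4+ (adjust the device index to your setup):

```python
import lm_eval

# Python equivalent of the CLI run above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=kz919/llama3_1b_cautious_chinchilla_8142025,"
               "dtype=bfloat16,add_bos_token=True",
    tasks=["lambada_openai", "hellaswag", "piqa",
           "arc_easy", "arc_challenge", "openbookqa"],
    batch_size=8,
    device="cuda:0",
)
print(results["results"])
```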

## MMLU

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| mmlu | 2 | none | | acc ↑ | 0.2536 | ± 0.0037 |
| - humanities | 2 | none | | acc ↑ | 0.2667 | ± 0.0064 |
| - other | 2 | none | | acc ↑ | 0.2475 | ± 0.0077 |
| - social sciences | 2 | none | | acc ↑ | 0.2337 | ± 0.0076 |
| - stem | 2 | none | | acc ↑ | 0.2594 | ± 0.0078 |