metadata
license: mit
datasets:
  - allenai/c4
language:
  - en
library_name: transformers

Bingus-v0.1-60M-Base

A not-so-state-of-the-art 60M-parameter transformer model.
Uses the default OLMo architecture.

Specs

Heads: 8
Layers: 8
Model dimension: 512
MLP dimension: 4096
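
As a rough sanity check on the 60M figure, the parameter count can be estimated from the specs above. This is only a sketch under assumptions that the card does not state: a vocabulary padded to 50,304 entries, input and output embeddings that share weights, a gated (SwiGLU-style) MLP whose 4096-wide projection is split in half before the output projection, and no biases or normalization parameters. The function name `estimate_params` and all of its arguments are hypothetical.

```python
def estimate_params(d_model, n_layers, mlp_hidden, vocab_size,
                    gated=True, tied_embeddings=True):
    """Rough weight count for a decoder-only transformer.

    Ignores biases and normalization parameters (assumption).
    The number of attention heads does not change the count.
    """
    # Attention: Q, K, V and output projections, each d_model x d_model.
    attn = 4 * d_model * d_model
    # MLP input projection: d_model -> mlp_hidden.
    mlp_in = d_model * mlp_hidden
    # With a gated activation (assumption), half of the hidden units act
    # as the gate, so the output projection only sees mlp_hidden // 2.
    mlp_out_width = mlp_hidden // 2 if gated else mlp_hidden
    mlp_out = mlp_out_width * d_model
    per_layer = attn + mlp_in + mlp_out

    # Token embedding; doubled if the output head is a separate matrix.
    embed = vocab_size * d_model
    if not tied_embeddings:
        embed *= 2

    return n_layers * per_layer + embed


# Specs from this card; vocab_size=50304 is an assumption, not a stated value.
total = estimate_params(d_model=512, n_layers=8, mlp_hidden=4096,
                        vocab_size=50304)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands at about 59.3M, which is consistent with the "60M" in the model name; different choices for vocabulary size, weight tying, or MLP gating would shift the number.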

eval/v3-small-c4_en-validation/Perplexity: 40.33
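
For readers who think in terms of loss rather than perplexity: perplexity is the exponential of the mean per-token cross-entropy, so the value above can be converted back. A minimal sketch, assuming the metric was computed from a natural-log cross-entropy (the usual convention, but not stated in this card):

```python
import math

perplexity = 40.33  # value reported above

# Perplexity = exp(mean cross-entropy), so the loss is its natural log.
loss_nats = math.log(perplexity)

# The same quantity expressed in bits per token.
loss_bits = loss_nats / math.log(2)

print(f"cross-entropy: {loss_nats:.3f} nats/token, {loss_bits:.3f} bits/token")
```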

Training Data

Pretraining:

  • 5B tokens of C4 (preprocessed, from olmo-data.org)