This model is a GPT model built from scratch, following the instructions in the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka, with the following configuration:

vocab_size: 50257
context_length: 512
emb_dim: 768
n_heads: 12
n_layers: 12
drop_rate: 0.1
qkv_bias: false
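
As a minimal sketch, this configuration maps onto the kind of Python configuration dictionary used throughout the book. The commented `GPTModel` line assumes the model class from the book's companion code is available locally (that import is an assumption, not part of this repository):

```python
# Configuration dictionary in the style used in the book.
GPT_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE tokenizer vocabulary size
    "context_length": 512,   # maximum number of input tokens
    "emb_dim": 768,          # token/positional embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability
    "qkv_bias": False,       # no bias terms in the query/key/value projections
}

# model = GPTModel(GPT_CONFIG)  # GPTModel: architecture class from the book's companion code
```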

The dataset used for training and validation is one book from the manu/project_gutenberg dataset (English split). The model was trained for 4.6 minutes (5 books) on an A100 GPU in Google Colab (using around 5 compute units), for a single epoch. Total tokens seen during training: 177,900; total tokens seen during validation: 152,700.

It achieves the following results:

  • Training loss: 5.964
  • Validation loss: 6.939
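
A minimal generation sketch, assuming `GPT_CONFIG` as defined above, a `GPTModel` class from the book's companion code (the `gpt_model` module name is a placeholder), and that the trained weights were saved as `model.pth` (the checkpoint file name is an assumption):

```python
import torch
import tiktoken  # GPT-2 BPE tokenizer, matching vocab_size = 50257

from gpt_model import GPTModel  # placeholder import; point this at the book's model class

model = GPTModel(GPT_CONFIG)
model.load_state_dict(torch.load("model.pth", map_location="cpu"))  # assumed checkpoint name
model.eval()

tokenizer = tiktoken.get_encoding("gpt2")
token_ids = torch.tensor([tokenizer.encode("Every effort moves you")])

# Greedy decoding: repeatedly append the most likely next token,
# feeding the model at most context_length (512) tokens at a time.
with torch.no_grad():
    for _ in range(30):
        logits = model(token_ids[:, -512:])               # (batch, seq, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        token_ids = torch.cat([token_ids, next_id], dim=1)

print(tokenizer.decode(token_ids.squeeze(0).tolist()))
```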