This is a GPT-style model trained from scratch with the following configuration:

vocab_size: 50257
context_length: 50
emb_dim: 768
n_heads: 12
n_layers: 12
drop_rate: 0.1
qkv_bias: false
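The hyperparameters above can be collected into a plain Python config dict, as is common for from-scratch GPT implementations (the dict name `GPT_CONFIG` is illustrative, not from this repository):

```python
# Hypothetical config dict mirroring the hyperparameters listed above.
GPT_CONFIG = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary size
    "context_length": 50,     # maximum sequence length
    "emb_dim": 768,           # token embedding dimension
    "n_heads": 12,            # attention heads per layer
    "n_layers": 12,           # transformer blocks
    "drop_rate": 0.1,         # dropout probability
    "qkv_bias": False,        # no bias on query/key/value projections
}

# Sanity check: the embedding dimension must divide evenly across heads.
assert GPT_CONFIG["emb_dim"] % GPT_CONFIG["n_heads"] == 0
head_dim = GPT_CONFIG["emb_dim"] // GPT_CONFIG["n_heads"]
print(head_dim)  # → 64
```

With 768 dimensions split over 12 heads, each attention head operates on a 64-dimensional subspace.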

The dataset used for training and validation is a single book from the manu/project_gutenberg dataset (English split).

It achieves the following results:

  • Training loss: 7.500
  • Validation loss: 6.500
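Assuming these figures are mean per-token cross-entropy (the standard GPT training objective; the model card does not state this explicitly), they can be converted to perplexity with a simple exponential:

```python
import math

# Assumption: reported losses are mean per-token cross-entropy,
# so perplexity = exp(loss).
train_loss, val_loss = 7.500, 6.500

print(round(math.exp(train_loss), 1))  # training perplexity → 1808.0
print(round(math.exp(val_loss), 1))    # validation perplexity → 665.1
```

For reference, a model that guessed uniformly over the 50257-token vocabulary would score `math.log(50257) ≈ 10.82`, so both losses sit well below the random baseline.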