This is a GPT-style model trained from scratch with the following configuration:

vocab_size: 50257
context_length: 50
emb_dim: 768
n_heads: 12
n_layers: 12
drop_rate: 0.1
qkv_bias: false
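The hyperparameters above can be collected into a plain Python config dict, as is common for from-scratch GPT implementations (the dict name `GPT_CONFIG` is illustrative, not from this repository):

```python
# Hypothetical config dict mirroring the hyperparameters listed above.
GPT_CONFIG = {
    "vocab_size": 50257,      # GPT-2 BPE vocabulary size
    "context_length": 50,     # maximum sequence length
    "emb_dim": 768,           # token embedding dimension
    "n_heads": 12,            # attention heads per layer
    "n_layers": 12,           # transformer blocks
    "drop_rate": 0.1,         # dropout probability
    "qkv_bias": False,        # no bias on query/key/value projections
}

# Sanity check: the embedding dimension must divide evenly across heads.
assert GPT_CONFIG["emb_dim"] % GPT_CONFIG["n_heads"] == 0
head_dim = GPT_CONFIG["emb_dim"] // GPT_CONFIG["n_heads"]
print(head_dim)  # → 64
```

With 768 dimensions split over 12 heads, each attention head operates on a 64-dimensional subspace.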

The dataset used for training and validation is a single book from the manu/project_gutenberg dataset (English split).

It achieves the following results:

  • Training loss: 7.500
  • Validation loss: 6.500
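Assuming these figures are mean per-token cross-entropy (the standard GPT training objective; the model card does not state this explicitly), they can be converted to perplexity with a simple exponential:

```python
import math

# Assumption: reported losses are mean per-token cross-entropy,
# so perplexity = exp(loss).
train_loss, val_loss = 7.500, 6.500

print(round(math.exp(train_loss), 1))  # training perplexity → 1808.0
print(round(math.exp(val_loss), 1))    # validation perplexity → 665.1
```

For reference, a model that guessed uniformly over the 50257-token vocabulary would score `math.log(50257) ≈ 10.82`, so both losses sit well below the random baseline.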