This is a GPT model built from scratch with the following configuration:
```
vocab_size: 50257
context_length: 50
emb_dim: 768
n_heads: 12
n_layers: 12
drop_rate: 0.1
qkv_bias: false
```
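For use in code, the configuration above can be written as a plain Python dict. This is a sketch: the key names simply mirror the list above (the common convention in from-scratch GPT implementations), not the API of any particular library.

```python
# GPT configuration matching the values listed above.
GPT_CONFIG = {
    "vocab_size": 50257,    # BPE vocabulary size (matches the GPT-2 tokenizer)
    "context_length": 50,   # maximum sequence length in tokens
    "emb_dim": 768,         # token/position embedding dimension
    "n_heads": 12,          # attention heads per transformer block
    "n_layers": 12,         # number of transformer blocks
    "drop_rate": 0.1,       # dropout probability
    "qkv_bias": False,      # no bias terms in the query/key/value projections
}

# Sanity check: the embedding dimension must split evenly across heads.
assert GPT_CONFIG["emb_dim"] % GPT_CONFIG["n_heads"] == 0
print(GPT_CONFIG["emb_dim"] // GPT_CONFIG["n_heads"])  # head dimension: 64
```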
The dataset used for training and validation is a single book from the English split of the manu/project_gutenberg dataset.
It achieves the following results:
- Training loss: 7.500
- Validation loss: 6.500
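Assuming these losses are mean per-token cross-entropy values in nats (the usual convention for language-model training loss), they can be converted to perplexity with `exp(loss)` — a minimal sketch:

```python
import math

# Reported losses from the results above (assumed to be
# mean per-token cross-entropy in nats).
train_loss = 7.500
val_loss = 6.500

# Perplexity is the exponential of the cross-entropy loss.
train_ppl = math.exp(train_loss)
val_ppl = math.exp(val_loss)

print(f"train perplexity: {train_ppl:.1f}")  # about 1808
print(f"val perplexity:   {val_ppl:.1f}")    # about 665
```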