This model is a GPT model built from scratch, following the instructions in the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka, with the following configuration:
- vocab_size: 50257
- context_length: 512
- emb_dim: 768
- n_heads: 12
- n_layers: 12
- drop_rate: 0.1
- qkv_bias: false
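
As a rough sketch, the configuration above corresponds to a Python dictionary like the one below, passed to the book's `GPTModel` class (the `gpt_model` module name is a placeholder for wherever your from-scratch implementation lives):

```python
from gpt_model import GPTModel  # placeholder import; point this at your own implementation from the book

GPT_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary size
    "context_length": 512,   # maximum sequence length
    "emb_dim": 768,          # token/positional embedding dimension
    "n_heads": 12,           # attention heads per transformer block
    "n_layers": 12,          # number of transformer blocks
    "drop_rate": 0.1,        # dropout probability
    "qkv_bias": False,       # no bias in the query/key/value projections
}

model = GPTModel(GPT_CONFIG)
```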
The dataset used for training and validation is one book from the English split of the manu/project_gutenberg dataset. The model was trained for about 4.6 minutes (5 books) on an A100 GPU in Google Colab, using around 5 compute units, for a single epoch only. Total tokens seen during training: 177,900; during validation: 152,700.
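
A minimal sketch of pulling the source text with the Hugging Face `datasets` library is shown below; the split name `"en"`, the `"text"` field, and streaming access are assumptions, so check the dataset card before relying on them:

```python
from itertools import islice
from datasets import load_dataset

# Split name "en" and the "text" field are assumptions; verify against the dataset card.
books = load_dataset("manu/project_gutenberg", split="en", streaming=True)

# Grab the first five books, mirroring the small run described above.
texts = [example["text"] for example in islice(books, 5)]
```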
The model achieves the following results:
- Training loss: 5.964
- Validation loss: 6.939
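
Given a saved checkpoint, a simple greedy-decoding loop in the spirit of the book's `generate_text_simple` function might look like the sketch below, reusing the `model` instance from the configuration example; the checkpoint file name `model.pth` is an assumption:

```python
import torch
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")

# Checkpoint file name is an assumption; adjust to however the weights were saved.
model.load_state_dict(torch.load("model.pth", map_location="cpu"))
model.eval()

ids = torch.tensor(tokenizer.encode("Every effort moves you")).unsqueeze(0)

with torch.no_grad():
    for _ in range(30):                       # generate 30 new tokens
        logits = model(ids[:, -512:])         # crop to the 512-token context window
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids.squeeze(0).tolist()))
```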