distilgpt2-wikitext
This model is a fine-tuned version of distilbert/distilgpt2 on the wikitext dataset. It achieves the following results on the evaluation set:
- Loss: 3.6354
- Perplexity: 37.92
Model description
This is a DistilGPT-2 model fine-tuned on the Wikitext-2 dataset for causal language modeling (CLM).
The model predicts the next token given the previous tokens, making it suitable for text-generation tasks (a short generation example follows the list below).
- Base model: distilgpt2
- Fine-tuning dataset: wikitext-2-raw-v1
- Task: Causal Language Modeling / Text Generation
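A minimal sketch of loading the model for generation with the Transformers pipeline API, assuming the checkpoint is available under the Sebastian-18/distilgpt2-wikitext Hub ID (substitute a local path if you have the weights on disk); the prompt and sampling settings are illustrative only:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-generation pipeline.
# Replace the model ID with a local directory if you trained it yourself.
generator = pipeline("text-generation", model="Sebastian-18/distilgpt2-wikitext")

# Generate a continuation for a short prompt.
output = generator(
    "The history of natural language processing",
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
)
print(output[0]["generated_text"])
```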
Intended uses & limitations
- Autocomplete text
- Experimentation with small-scale language modeling
- Educational purposes and research
Limitations
- Trained on a small dataset (Wikitext-2), so knowledge is limited.
- May generate plausible-sounding but incorrect or biased text.
- Not suitable for production-level AI assistants without further fine-tuning.
Training and evaluation data
The model was fine-tuned and evaluated on the wikitext-2-raw-v1 configuration of the Wikitext dataset (see the loading sketch below).
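A minimal sketch of loading and tokenizing this dataset with the datasets library; the split handling and preprocessing shown here are standard for wikitext-2-raw-v1 but are an assumption about this particular training run:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the raw Wikitext-2 splits (train / validation / test).
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize with the base model's tokenizer; DistilGPT-2 has no pad token,
# so reuse the EOS token if padding is needed.
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized)
```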
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
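As a rough sketch, these hyperparameters correspond to a Hugging Face TrainingArguments configuration along the following lines; the output directory and the per-epoch evaluation setting (inferred from the results table below) are assumptions, not values taken from the original run:

```python
from transformers import TrainingArguments

# Mirror the hyperparameters listed above; anything not listed there
# (output_dir, evaluation cadence) is an illustrative placeholder.
training_args = TrainingArguments(
    output_dir="distilgpt2-wikitext",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    eval_strategy="epoch",  # assumed from the per-epoch validation losses reported
)
```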
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.6842 | 1.0 | 4667 | 3.6529 |
| 3.5672 | 2.0 | 9334 | 3.6371 |
| 3.5242 | 3.0 | 14001 | 3.6354 |
Final evaluation loss: 3.6354
Perplexity: 37.92
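For reference, the reported perplexity is simply the exponential of the final evaluation loss, as the quick check below illustrates:

```python
import math

eval_loss = 3.6354
perplexity = math.exp(eval_loss)  # exp(3.6354) ≈ 37.92
print(f"Perplexity: {perplexity:.2f}")
```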
Framework versions
- Transformers 4.56.0
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0