distilgpt2-wikitext
This model is a fine-tuned version of distilbert/distilgpt2 on the wikitext dataset. It achieves the following results on the evaluation set:
- Loss: 3.6354
- Perplexity: 37.92
Model description
This is a DistilGPT-2 model fine-tuned on the Wikitext-2 dataset for causal language modeling (CLM).
The model predicts the next token given the previous tokens, making it suitable for text-generation tasks (a short generation example follows the list below).
- Base model: distilgpt2
- Fine-tuning dataset: wikitext-2-raw-v1
- Task: Causal Language Modeling / Text Generation
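A minimal sketch of loading the model for generation with the Transformers pipeline API, assuming the checkpoint is available under the Sebastian-18/distilgpt2-wikitext Hub ID (substitute a local path if you have the weights on disk); the prompt and sampling settings are illustrative only:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-generation pipeline.
# Replace the model ID with a local directory if you trained it yourself.
generator = pipeline("text-generation", model="Sebastian-18/distilgpt2-wikitext")

# Generate a continuation for a short prompt.
output = generator(
    "The history of natural language processing",
    max_new_tokens=40,
    do_sample=True,
    top_p=0.95,
)
print(output[0]["generated_text"])
```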
Intended uses & limitations
- Autocomplete text
- Experimentation with small-scale language modeling
- Educational purposes and research
Limitations
- Trained on a small dataset (Wikitext-2), so knowledge is limited.
- May generate plausible-sounding but incorrect or biased text.
- Not suitable for production-level AI assistants without further fine-tuning.
Training and evaluation data
The model was fine-tuned and evaluated on the wikitext-2-raw-v1 configuration of the Wikitext dataset (see the loading sketch below).
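A minimal sketch of loading and tokenizing this dataset with the datasets library; the split handling and preprocessing shown here are standard for wikitext-2-raw-v1 but are an assumption about this particular training run:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the raw Wikitext-2 splits (train / validation / test).
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

# Tokenize with the base model's tokenizer; DistilGPT-2 has no pad token,
# so reuse the EOS token if padding is needed.
tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
tokenizer.pad_token = tokenizer.eos_token

def tokenize(batch):
    return tokenizer(batch["text"])

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized)
```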
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 3
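As a rough sketch, these hyperparameters correspond to a Hugging Face TrainingArguments configuration along the following lines; the output directory and the per-epoch evaluation setting (inferred from the results table below) are assumptions, not values taken from the original run:

```python
from transformers import TrainingArguments

# Mirror the hyperparameters listed above; anything not listed there
# (output_dir, evaluation cadence) is an illustrative placeholder.
training_args = TrainingArguments(
    output_dir="distilgpt2-wikitext",
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3,
    eval_strategy="epoch",  # assumed from the per-epoch validation losses reported
)
```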
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.6842 | 1.0 | 4667 | 3.6529 |
| 3.5672 | 2.0 | 9334 | 3.6371 |
| 3.5242 | 3.0 | 14001 | 3.6354 |
Final evaluation loss: 3.6354
Perplexity: 37.92
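For reference, the reported perplexity is simply the exponential of the final evaluation loss, as the quick check below illustrates:

```python
import math

eval_loss = 3.6354
perplexity = math.exp(eval_loss)  # exp(3.6354) ≈ 37.92
print(f"Perplexity: {perplexity:.2f}")
```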
Framework versions
- Transformers 4.56.0
- Pytorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0