# AegyptusTranslit1
This model is a GPT-2-based language model trained from scratch on transliterations of Ancient Egyptian texts. It uses a custom tokenizer optimized for the linguistic features found in hieroglyphic transliteration.
## Overview
- Architecture: GPT-2 (custom configuration)
- Tokenizer: Byte-level BPE (custom-trained)
- Language: Ancient Egyptian (transliterated in Latin script)
- Vocabulary size: 6,475
- Training corpus: ~30,000 lines of fully intact, unambiguously readable transliterated sentences.
- Training steps: ~500 steps over 20 epochs
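The custom byte-level BPE tokenizer described above can be reproduced in spirit with the `tokenizers` library. This is a minimal sketch: the vocabulary size of 6,475 comes from the card, but the toy corpus lines and `min_frequency` are placeholder assumptions.

```python
# Sketch: training a byte-level BPE tokenizer as described in the card.
# The corpus lines and min_frequency here are placeholders; the card only
# specifies a 6,475-token vocabulary and ~30,000 transliterated lines.
from tokenizers import ByteLevelBPETokenizer

corpus = [
    "jw wnn.t nb.t m pr-nswt",  # placeholder transliterated lines
    "dd.n=f n=j m rn=j",
]

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(corpus, vocab_size=6475, min_frequency=1)

# Byte-level BPE round-trips arbitrary text losslessly.
ids = tokenizer.encode("jw wnn.t nb.t").ids
print(tokenizer.decode(ids))
```

With a tiny corpus the trainer simply stops short of the requested vocabulary size; on the full ~30,000-line corpus it would learn merges up to 6,475 entries.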
## Intended Use
This model is intended for:
- Research in ancient Egyptian linguistics
- Automatic completion or generation of transliterated hieroglyphic texts
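As a sketch of the completion workflow, the snippet below initializes a small GPT-2 with the card's vocabulary size and runs `generate`. It uses a randomly initialized stand-in (the layer, head, and embedding sizes are assumptions, not this model's configuration); real completions require loading the published checkpoint and its tokenizer instead.

```python
# Sketch of transliteration completion with a GPT-2 of matching vocab size.
# This stand-in is randomly initialized; to use the actual model you would
# load the published checkpoint and tokenizer rather than GPT2Config().
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=6475,  # matches the card
    n_positions=256,  # assumption: not stated in the card
    n_embd=256,       # assumption
    n_layer=4,        # assumption
    n_head=4,         # assumption
)
model = GPT2LMHeadModel(config)
model.eval()

# Pretend these ids encode a transliterated prompt such as "jw wnn.t".
prompt_ids = torch.tensor([[5, 42, 7]])
with torch.no_grad():
    out = model.generate(
        prompt_ids, max_new_tokens=8, do_sample=False, pad_token_id=0
    )
print(out.shape)  # (1, prompt length + new tokens)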
## Data Used for Training
Thesaurus Linguae Aegyptiae, *Original Earlier Egyptian sentences, corpus v18, premium*, v1.1, Feb 16, 2024, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig. https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium
## Training Progress
The model was trained for 20 epochs with the following loss metrics:
| Epoch | Step | Train Loss | Val Loss |
|---|---|---|---|
| 1 | 0 | 7.884 | 7.942 |
| 2 | 40 | 3.949 | 4.032 |
| 3 | 70 | 3.663 | 3.775 |
| 4 | 90 | 3.551 | 3.671 |
| 5 | 120 | 3.462 | 3.587 |
| 6 | 140 | 3.407 | 3.561 |
| 7 | 170 | 3.346 | 3.524 |
| 8 | 190 | 3.331 | 3.518 |
| 9 | 220 | 3.284 | 3.500 |
| 10 | 240 | 3.264 | 3.496 |
| 11 | 270 | 3.208 | 3.483 |
| 12 | 290 | 3.177 | 3.465 |
| 13 | 320 | 3.127 | 3.460 |
| 14 | 340 | 3.095 | 3.452 |
| 15 | 370 | 3.058 | 3.447 |
| 16 | 390 | 3.020 | 3.443 |
| 17 | 420 | 2.981 | 3.422 |
| 18 | 440 | 2.952 | 3.413 |
| 19 | 470 | 2.885 | 3.398 |
| 20 | 490 | 2.845 | 3.391 |
This model has been pushed to the Hub using the PyTorchModelHubMixin integration.
Model tree for RamzyBakir/AegyptusTranslit1-gpt2-17M:
- Base model: openai-community/gpt2