
AegyptusTranslit1

This model is a GPT-2 based language model trained from scratch on transliterations of Ancient Egyptian texts. It uses a custom tokenizer optimized for the linguistic features of hieroglyphic transliteration.

Overview

  • Architecture: GPT-2 (custom configuration)
  • Parameters: ~17.4M (F32, safetensors)
  • Tokenizer: Byte-level BPE (custom-trained; see the sketch below)
  • Language: Ancient Egyptian (transliterated in Latin script)
  • Vocabulary size: 6,475
  • Training corpus: ~30,000 lines of fully intact, unambiguously readable transliterated sentences
  • Training steps: ~500 steps over 20 epochs
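
Byte-level BPE is a natural fit here, since Egyptological transliteration relies heavily on diacritics (ꜣ, ḥ, ḏ, š, …) that byte-level models handle without any out-of-vocabulary problem. A tokenizer like this can be trained with the Hugging Face tokenizers library. The sketch below is illustrative only: the corpus path, minimum frequency, and special tokens are assumptions; only the vocabulary size (6,475) is documented here.

```python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on the transliteration corpus.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["transliterations.txt"],    # hypothetical path: one sentence per line
    vocab_size=6475,                   # vocabulary size reported above
    min_frequency=2,                   # assumed cutoff, not documented
    special_tokens=["<|endoftext|>"],  # assumed GPT-2-style special token
)
tokenizer.save_model("tokenizer")      # writes vocab.json and merges.txt
```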

Intended Use

This model is intended for:

  • Research in ancient Egyptian linguistics
  • Automatic completion or generation of transliterated hieroglyphic texts
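
As a minimal generation sketch, assuming the checkpoint and tokenizer load through the standard transformers GPT-2 classes (if they only load through the custom training class, see the mixin example further below), usage could look like this; the prompt is a hypothetical transliteration fragment:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RamzyBakir/AegyptusTranslit1-gpt2-17M"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Hypothetical prompt: a transliterated sentence opening.
inputs = tokenizer("ḏd.jn ḥm =f", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```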

Data Used For Training

Thesaurus Linguae Aegyptiae, Original Earlier Egyptian sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium, v1.1, 2/16/2024, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften, and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.

Training Progress

The model was trained for 20 epochs with the following loss metrics:

Epoch   Step   Train Loss   Val Loss
1       0      7.884        7.942
2       40     3.949        4.032
3       70     3.663        3.775
4       90     3.551        3.671
5       120    3.462        3.587
6       140    3.407        3.561
7       170    3.346        3.524
8       190    3.331        3.518
9       220    3.284        3.500
10      240    3.264        3.496
11      270    3.208        3.483
12      290    3.177        3.465
13      320    3.127        3.460
14      340    3.095        3.452
15      370    3.058        3.447
16      390    3.020        3.443
17      420    2.981        3.422
18      440    2.952        3.413
19      470    2.885        3.398
20      490    2.845        3.391
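
For interpretation: assuming the reported losses are mean cross-entropy in nats (the PyTorch default), the final validation loss corresponds to a perplexity of roughly 30:

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
final_val_loss = 3.391
print(round(math.exp(final_val_loss), 1))  # -> 29.7
```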

This model was pushed to the Hub using the PyTorchModelHubMixin integration from the huggingface_hub library.
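
The mixin serializes the model's constructor arguments to config.json and the weights to safetensors, so restoring this particular checkpoint requires the original model class from the training code. The skeleton below only illustrates the mixin pattern with a hypothetical stand-in class, not the actual architecture:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyLM(nn.Module, PyTorchModelHubMixin):
    """Stand-in model; the real class is a custom GPT-2."""

    def __init__(self, vocab_size: int = 6475, hidden_size: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))

# Round-trip: push_to_hub() uploads weights + config, from_pretrained()
# rebuilds the class from config.json and loads the safetensors weights.
# TinyLM().push_to_hub("your-username/tiny-lm")         # hypothetical repo
# model = TinyLM.from_pretrained("your-username/tiny-lm")
```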
