
AegyptusTranslit1

This model is a GPT-2 based language model trained from scratch on transliterations of Ancient Egyptian texts. It uses a custom tokenizer optimized for the linguistic features of hieroglyphic transliteration.

Overview

  • Architecture: GPT-2 (custom configuration)
  • Parameters: ~17.4M (F32, safetensors)
  • Tokenizer: Byte-level BPE (custom-trained; see the sketch below)
  • Language: Ancient Egyptian (transliterated in Latin script)
  • Vocabulary size: 6,475
  • Training corpus: ~30,000 lines of fully intact, unambiguously readable transliterated sentences
  • Training steps: ~500 steps over 20 epochs
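
Byte-level BPE is a natural fit here, since Egyptological transliteration relies heavily on diacritics (ꜣ, ḥ, ḏ, š, …) that byte-level models handle without any out-of-vocabulary problem. A tokenizer like this can be trained with the Hugging Face tokenizers library. The sketch below is illustrative only: the corpus path, minimum frequency, and special tokens are assumptions; only the vocabulary size (6,475) is documented here.

```python
from tokenizers import ByteLevelBPETokenizer

# Train a byte-level BPE tokenizer on the transliteration corpus.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["transliterations.txt"],    # hypothetical path: one sentence per line
    vocab_size=6475,                   # vocabulary size reported above
    min_frequency=2,                   # assumed cutoff, not documented
    special_tokens=["<|endoftext|>"],  # assumed GPT-2-style special token
)
tokenizer.save_model("tokenizer")      # writes vocab.json and merges.txt
```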

Intended Use

This model is intended for:

  • Research in ancient Egyptian linguistics
  • Automatic completion or generation of transliterated hieroglyphic texts
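
As a minimal generation sketch, assuming the checkpoint and tokenizer load through the standard transformers GPT-2 classes (if they only load through the custom training class, see the mixin example further below), usage could look like this; the prompt is a hypothetical transliteration fragment:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RamzyBakir/AegyptusTranslit1-gpt2-17M"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Hypothetical prompt: a transliterated sentence opening.
inputs = tokenizer("ḏd.jn ḥm =f", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```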

Data Used For Training

Thesaurus Linguae Aegyptiae, Original Earlier Egyptian sentences, corpus v18, premium, https://huggingface.co/datasets/thesaurus-linguae-aegyptiae/tla-Earlier_Egyptian_original-v18-premium, v1.1, 2/16/2024, ed. by Tonio Sebastian Richter & Daniel A. Werning on behalf of the Berlin-Brandenburgische Akademie der Wissenschaften, and Hans-Werner Fischer-Elfert & Peter Dils on behalf of the Sächsische Akademie der Wissenschaften zu Leipzig.

Training Progress

The model was trained for 20 epochs with the following loss metrics:

Epoch   Step   Train Loss   Val Loss
1       0      7.884        7.942
2       40     3.949        4.032
3       70     3.663        3.775
4       90     3.551        3.671
5       120    3.462        3.587
6       140    3.407        3.561
7       170    3.346        3.524
8       190    3.331        3.518
9       220    3.284        3.500
10      240    3.264        3.496
11      270    3.208        3.483
12      290    3.177        3.465
13      320    3.127        3.460
14      340    3.095        3.452
15      370    3.058        3.447
16      390    3.020        3.443
17      420    2.981        3.422
18      440    2.952        3.413
19      470    2.885        3.398
20      490    2.845        3.391
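
For interpretation: assuming the reported losses are mean cross-entropy in nats (the PyTorch default), the final validation loss corresponds to a perplexity of roughly 30:

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
final_val_loss = 3.391
print(round(math.exp(final_val_loss), 1))  # -> 29.7
```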

This model was pushed to the Hub using the PyTorchModelHubMixin integration from the huggingface_hub library.
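
The mixin serializes the model's constructor arguments to config.json and the weights to safetensors, so restoring this particular checkpoint requires the original model class from the training code. The skeleton below only illustrates the mixin pattern with a hypothetical stand-in class, not the actual architecture:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyLM(nn.Module, PyTorchModelHubMixin):
    """Stand-in model; the real class is a custom GPT-2."""

    def __init__(self, vocab_size: int = 6475, hidden_size: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_ids):
        return self.lm_head(self.embed(input_ids))

# Round-trip: push_to_hub() uploads weights + config, from_pretrained()
# rebuilds the class from config.json and loads the safetensors weights.
# TinyLM().push_to_hub("your-username/tiny-lm")         # hypothetical repo
# model = TinyLM.from_pretrained("your-username/tiny-lm")
```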
