HIT-60m

Introducing HIT-60m, a model that handles a diverse set of Hittite translation and correction tasks. It is designed primarily for German, but it can also translate to English, with lower quality.

1. Model description

This is an instruct model, meaning it is capable of multiple tasks. It is intended primarily for translation to German and English, but it can also be used for reverse translation from both languages back into Hittite transliteration.

Translation Instructions:

  • "Translate complex Hittite transliteration to German" + complex transliteration → German

  • "Translate Hittite simple transliteration to German" + simple transliteration → German

  • "Translate Hittite grouped transliteration to German" + transliteration with special symbols → German

  • "Translate German to simple Hittite transliteration" + German → Hittite simple transliteration with no special symbols

  • "Translate German to grouped Hittite transliteration" + German → Hittite transliteration grouped into words with special symbols

  • "Translate complex Hittite transliteration to English" + complex transliteration → English

  • "Translate Hittite simple transliteration to English" + simple transliteration → English

  • "Translate Hittite grouped transliteration to English" + transliteration with special symbols → English

  • "Translate English to simple Hittite transliteration" + English → Hittite simple transliteration with no special symbols

  • "Translate English to grouped Hittite transliteration" + English → Hittite transliteration grouped into words with special symbols

Missing Sign Instructions:

  • 'Identify the missing signs: ' + string of Hittite transliterations
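
A minimal sketch of the missing-sign task, following the same pattern as the usage snippet in section 2. The damaged input string here is illustrative only, and the asterisk is assumed to mark a missing sign (see the training procedure below).

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the model as in the usage snippet in section 2
tokenizer = AutoTokenizer.from_pretrained("Thalesian/HIT-60m", use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained("Thalesian/HIT-60m")

# Illustrative damaged line; '*' is assumed to stand in for a missing sign
prompt_fill = "Identify the missing signs: "
damaged_text = "is ka4 ru uh * u babbar kas da a i ar *"

inputs_fill = tokenizer(prompt_fill + damaged_text, return_tensors="pt")
outputs_fill = model.generate(**inputs_fill, max_length=64)
print("Restoration:", tokenizer.decode(outputs_fill[0], skip_special_tokens=True))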

Base model

This is a fine-tuned version of Google's t5-small.

2. Usage (code snippet)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_path = "Thalesian/HIT-60m"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# 1) Prepare your Hittite transliteration input
prompt_de = "Translate complex Hittite transliteration to German: "
prompt_en = "Translate complex Hittite transliteration to English: "
input_text = "is ka4 ru uh k u babbar kas da a i ar ha "

# 2) Tokenize & get model outputs
inputs_de = tokenizer(prompt_de + input_text, return_tensors="pt")
inputs_en = tokenizer(prompt_en + input_text, return_tensors="pt")
outputs_de = model.generate(**inputs_de, max_length=64)
outputs_en = model.generate(**inputs_en, max_length=64)

# 3) Decode prediction
prediction_de = tokenizer.decode(outputs_de[0], skip_special_tokens=True)
prediction_en = tokenizer.decode(outputs_en[0], skip_special_tokens=True)

print("German Reference:", "opfergefa silber bier nehmen von weg")
print("German Prediction:", prediction_de)

print("English Reference:", "sacrificial vessel take silver beer from * away")
print("English Prediction:", prediction_en)

3. Training and evaluation data

The data comes from the Hethitologie Portal Mainz (HPM), organized by Emma Yavasan and Shai Gordin on Zenodo.

Training procedure

The model was trained in 5 tranches with different datasets and collators:

  • a pretraining and training dataset (transliterations only) of CTH Hittite transliterated data (7,099 texts)
  • a Google-translated copy of the German translations, used to create an (inferior) English set

There were 3 different collation methods:

  • pretraining collation, which introduces an asterisk to represent missing signs
  • missing-sign collation, which randomly introduces an asterisk to represent missing signs
  • translation-error collation, which randomly introduces a wrong sign into the input data to simulate transliteration or glyph errors
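
As a rough illustration of the last two collation methods (not the actual training code), the sketch below randomly masks signs with an asterisk and randomly swaps in a wrong sign. The corruption rates and helper name are assumptions.

import random

def corrupt_signs(signs, mask_rate=0.15, error_rate=0.05, vocab=None, rng=None):
    """Illustrative corruption of a sign sequence: '*' marks a missing sign,
    and an occasional wrong sign simulates transliteration or glyph error.
    Rates are hypothetical; the actual training values are not documented here."""
    rng = rng or random.Random(42)
    vocab = vocab or list(set(signs))
    out = []
    for sign in signs:
        r = rng.random()
        if r < mask_rate:
            out.append("*")                 # missing-sign collation
        elif r < mask_rate + error_rate:
            out.append(rng.choice(vocab))   # translation-error collation
        else:
            out.append(sign)
    return out

print(corrupt_signs("is ka4 ru uh ku babbar kas da a i ar ha".split()))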

Final stage training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 256
  • total_eval_batch_size: 256
  • optimizer: AdamW (torch fused) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 5000
  • num_epochs: 200
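
For reference, the hyperparameters above roughly correspond to a Seq2SeqTrainingArguments configuration like the following. This is a sketch only: the output path is illustrative, and the per-device batch size of 128 on 2 GPUs gives the total batch size of 256 listed above.

from transformers import Seq2SeqTrainingArguments

# Sketch of the final-stage configuration; values mirror the list above
training_args = Seq2SeqTrainingArguments(
    output_dir="hit-60m-final",          # illustrative path
    learning_rate=5e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=5000,
    num_train_epochs=200,
)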

Framework versions

  • Transformers 4.50.3
  • PyTorch 2.6.0+cu126
  • Datasets 3.3.0
  • Tokenizers 0.21.1

4. Metrics

From Language  From Script      To Language  To Script        BLEU
Hittite        Transliteration  German       Latin            83.38
Hittite        Transliteration  English      Latin            60.92
German         Latin            Hittite      Transliteration  42.90
English        Latin            Hittite      Transliteration  38.50
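
BLEU scores like those above can be computed with a standard corpus-level implementation. A minimal sketch using sacrebleu; the single prediction/reference pair reuses the German example from section 2, while the actual evaluation used held-out data not shown in this card.

import sacrebleu

# Corpus-level BLEU between model predictions and references
predictions = ["opfergefa silber bier nehmen von weg"]
references = [["opfergefa silber bier nehmen von weg"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(predictions, references)
print(f"BLEU: {bleu.score:.2f}")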

5. Intended uses

– Translation of short Hittite transliterated lines into German and English, and reverse-lookup experiments (German or English into Hittite transliteration).

6. Limitations

– The context window is only 64 tokens; the model is untested on long passages.

7. How to Cite

@misc{drake2025hit60m,
  title        = {{HIT-60m}: A T5-Small for Hittite⇄German+English},
  author       = {Drake, B. Lee},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/Thalesian/HIT-60m}}
}