Model Card

A fine-tuned MarianMT model that translates Turkish prose into English with a deliberate “Kafkaesque” flavour.
The checkpoint starts from the bilingual Helsinki-NLP/opus-mt-tr-en base model and is further trained on ~10 k parallel sentences taken from published Turkish & English versions of Franz Kafka’s works.
The goal was purely experimental:

Can a compact MT model be nudged toward a specific literary voice by exposing it to a small, style-consistent corpus?


Model Details

Base architecture: MarianMT (Transformer encoder-decoder)
Source language: tr (modern Turkish)
Target language: en (contemporary English)
Training corpus: 10 014 sentence pairs, manually aligned from Turkish editions of Kafka’s short stories and Die Verwandlung and their authorised English translations
Framework: 🤗 Transformers ≥ 4.40
License: Apache-2.0 for the model code and weights. ⚠️ Translations used for fine-tuning may still be under copyright; see “Data & Copyright” below

Training Procedure

  • Hardware: 1× A100 40 GB (Google Colab Pro)
  • Hyperparameters: 5 epochs, effective batch size 16, learning rate 5 × 10⁻⁵ with linear decay, 200 warm-up steps
  • Early stopping: patience 3, evaluated every 500 steps, monitored on BLEU
  • Best checkpoint: step 2 500
    • Train loss ≈ 0.61, validation loss ≈ 1.20
    • SacreBLEU on a 500-sentence dev set: baseline 24.7 → fine-tuned 31.5 (+6.8)
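
To make the schedule above concrete, here is a minimal sketch in plain Python of what linear warm-up with linear decay and patience-based early stopping imply for this run. It is not the original training script. It assumes all 10 014 pairs were used for training and that the last partial batch of each epoch was kept; if the 500 dev sentences were held out of that total, the step counts shift slightly. The names `linear_schedule_lr` and `should_stop` are illustrative only.

```python
import math
from typing import List

# Values quoted in the card
NUM_PAIRS = 10_014     # sentence pairs in the fine-tuning corpus
BATCH_SIZE = 16        # effective batch size
EPOCHS = 5
PEAK_LR = 5e-5
WARMUP_STEPS = 200
BEST_STEP = 2_500

# Assumption: every pair is a training example and partial batches are kept.
steps_per_epoch = math.ceil(NUM_PAIRS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_schedule_lr(step: int) -> float:
    """Learning rate at `step`: linear warm-up to PEAK_LR, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


def should_stop(bleu_history: List[float], patience: int = 3) -> bool:
    """True once the best BLEU has gone `patience` evaluations without being beaten."""
    if not bleu_history:
        return False
    best_index = bleu_history.index(max(bleu_history))
    evals_since_best = len(bleu_history) - 1 - best_index
    return evals_since_best >= patience


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("total steps:", total_steps)
    print("epochs completed at best checkpoint:", BEST_STEP / steps_per_epoch)
    print("learning rate at best checkpoint:", linear_schedule_lr(BEST_STEP))
```

Under these assumptions the best checkpoint at step 2 500 falls almost exactly at the end of the fourth epoch, when the learning rate has already decayed to a fraction of its peak.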

Quick Start

from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub
tr_en_model_name = "yeniguno/opus-mt-tr-en-kafkaesque"
tokenizer = MarianTokenizer.from_pretrained(tr_en_model_name)
model = MarianMTModel.from_pretrained(tr_en_model_name)

turkish_text = "Komşum her gece tam aynı tuhaf saatte, elinde küçük, kilitli bir çantayla dairesinden çıkıyor."

# Tokenize, generate with the default decoding settings, and decode the first hypothesis
inputs = tokenizer(turkish_text, return_tensors="pt", padding=True)
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
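
Because the model was fine-tuned on sentence pairs, longer passages may translate more reliably if they are split into sentences and fed in small batches. The sketch below shows one way to do that. The helpers `split_sentences`, `batched` and `translate_prose` are hypothetical names, not part of this model or of Transformers, and the regex splitter is deliberately naive (it will break on abbreviations such as "Dr.").

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, ? or … when followed by whitespace."""
    parts = re.split(r"(?<=[.!?…])\s+", text.strip())
    return [p for p in parts if p]


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_prose(
    text: str,
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> str:
    """Translate `text` sentence by sentence and rejoin the results with spaces."""
    translated: List[str] = []
    for batch in batched(split_sentences(text), batch_size):
        translated.extend(translate_batch(batch))
    return " ".join(translated)


# Possible wiring to the Quick Start objects (`tokenizer`, `model`); left as a
# comment so this block runs without downloading the checkpoint:
#
# def translate_batch(sentences):
#     inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
#     output_ids = model.generate(**inputs)
#     return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
#
# print(translate_prose(long_turkish_text, translate_batch))
```

Passing the translation function in as an argument keeps the splitting and batching logic testable on its own, without loading the model.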