opus-mt-cs-en-Prefix-Finetuned

This model is a fine-tuned version of Helsinki-NLP/opus-mt-cs-en on a dataset of Czech-English pairs of sentence prefixes (unfinished sentences). It is meant to improve text-to-text simultaneous translation from Czech to English.
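
A minimal inference sketch (assuming the checkpoint is published as davidruda/opus-mt-cs-en-Prefix-Finetuned; the example prefix is taken from the data sample further below):

from transformers import MarianMTModel, MarianTokenizer

model_name = "davidruda/opus-mt-cs-en-Prefix-Finetuned"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate an unfinished Czech prefix, as it would arrive in a simultaneous setting
prefix = "Jsi si jistá, že se tady"
inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))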

Before fine-tuning, it achieves the following results on the evaluation set:

  • Loss: 1.2841
  • Model Preparation Time: 0.0019
  • Bleu: 55.8042

After fine-tuning, the best checkpoint at epoch 3 (saved here) achieves the following results on the evaluation set:

  • Loss: 0.6869
  • Model Preparation Time: 0.0019
  • Bleu: 64.4592
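
The exact BLEU setup is not documented here; as a sketch, a corpus-level score in this style can be computed with the evaluate library's sacrebleu metric (the toy predictions/references below are placeholders, and tokenization details may differ from the scores reported above):

import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["I respected"]        # model outputs for the eval prefixes
references = [["I respected"]]       # one reference prefix per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])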

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

Each original sentence pair contributes two random prefixes to the data. The train and eval splits contain prefixes from distinct original sentences.

Example of the data:

{"pref_source": "Respektoval jsem ho.", "pref_target": "I respected that man."}
{"pref_source": "Respektoval", "pref_target": "I respected"}
{"pref_source": "Společnost", "pref_target": "FxPro Global"}
{"pref_source": "Společnost FxPro Global Markets MENA Limited je autorizována a regulována Dubai Financial Services Authority (referenční č.", "pref_target": "FxPro Global Markets MENA Limited is authorised and regulated by the Dubai Financial Services Authority (reference"}
{"pref_source": "Jsi si jistá, že se tady cítíš", "pref_target": "Mm-hmm. Yeah."}
{"pref_source": "Jsi si jistá, že se tady", "pref_target": "Mm-hmm."}
{"pref_source": "Jsme v", "pref_target": "We're fine,"}
{"pref_source": "Jsme v pořádku Margaret .", "pref_target": "We're fine, Margaret."}
{"pref_source": "Svobodná", "pref_target": "Free"}
{"pref_source": "Svobodná vůle.", "pref_target": "Free will, and all."}
{"pref_source": "Všechny oběti", "pref_target": "All the victims"}
{"pref_source": "Všechny", "pref_target": "All the"}

Training data: ~1.734M prefixes
Evaluation data: 5k prefixes
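
How the prefix pairs were actually cut and aligned is not documented here; the sketch below only illustrates the described shape of the data (two random prefixes per original sentence pair), using a naive proportional word-level truncation that is an assumption for illustration and will not always match the alignment seen in the examples above:

import json
import random

def random_prefix_pair(source, target, rng):
    # Cut both sides at a random relative position (word level).
    # NOTE: this proportional cut is an illustrative assumption; the card
    # does not describe how the actual target prefixes were aligned.
    src_words, tgt_words = source.split(), target.split()
    frac = rng.uniform(0.2, 1.0)
    src_cut = max(1, round(len(src_words) * frac))
    tgt_cut = max(1, round(len(tgt_words) * frac))
    return {
        "pref_source": " ".join(src_words[:src_cut]),
        "pref_target": " ".join(tgt_words[:tgt_cut]),
    }

rng = random.Random(42)
pair = ("Respektoval jsem ho.", "I respected that man.")
for _ in range(2):  # two random prefixes per original sentence pair
    print(json.dumps(random_prefix_pair(*pair, rng), ensure_ascii=False))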

Training procedure

Trained on an NVIDIA H100 NVL (94 GB).

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 220
  • eval_batch_size: 700
  • seed: 42
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
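
A hedged sketch of how these hyperparameters map onto a Hugging Face Seq2SeqTrainingArguments configuration (argument names follow Transformers 4.51; anything not listed above, such as output_dir, is a placeholder):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-cs-en-Prefix-Finetuned",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=220,
    per_device_eval_batch_size=700,
    seed=42,
    optim="adamw_torch",          # AdamW (torch) with default betas/epsilon
    lr_scheduler_type="linear",
    num_train_epochs=10,
    fp16=True,                    # native AMP mixed-precision training
    predict_with_generate=True,   # generate translations to compute BLEU during evaluation
)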

Training results

Training Loss   Epoch   Step    Validation Loss   Model Preparation Time   Bleu
0.749           1.0     7881    0.7074            0.0019                   63.3123
0.6925          2.0     15762   0.6927            0.0019                   63.9972
0.6529          3.0     23643   0.6869            0.0019                   64.4592
0.626           4.0     31524   0.6817            0.0019                   63.8378
0.5989          5.0     39405   0.6820            0.0019                   64.2718

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1