mbart-large-50-b4-e4-lr2e-05-126k-jupyter

This model is a fine-tuned version of facebook/mbart-large-50 on an unknown dataset. No summary evaluation results are listed; per-checkpoint validation loss, BLEU, and generation length ratio are given in the table at the end.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: Adafactor (OptimizerNames.ADAFACTOR), with no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 4
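
To make the linear schedule concrete, the following minimal sketch computes the learning rate at a given optimizer step from the values listed above. It rests on two assumptions that the card does not state: there is no warmup (no warmup setting is listed), and one epoch is about 31,516 steps, inferred from the results table, where step 10000 is logged at epoch 0.3173. The names linear_lr, STEPS_PER_EPOCH, and TOTAL_STEPS are illustrative and not part of any library.

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 2e-05
NUM_EPOCHS = 4

# Assumption: steps per epoch inferred from the results table
# (step 10000 is logged at epoch 0.3173).
STEPS_PER_EPOCH = round(10000 / 0.3173)
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step: int, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to zero.

    warmup_steps=0 is an assumption; the card lists no warmup setting.
    """
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


if __name__ == "__main__":
    for step in (0, 60000, 120000, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the run has about 126,064 optimizer steps in total. With train_batch_size 4, and assuming a single device with no gradient accumulation, that is roughly 126k training examples per epoch, which would be consistent with the "126k" in the model name.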

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len Ratio
0.2859	0.3173	10000	0.1897	83.883	0.9985
0.2495	0.6346	20000	0.1607	85.2621	0.9957
0.1575	0.9519	30000	0.1373	89.4101	1.0267
0.0923	1.2692	40000	0.1355	87.5438	1.0071
0.0831	1.5865	50000	0.1292	89.4781	1.0206
0.0817	1.9038	60000	0.1228	89.8947	1.0215
0.0489	2.2211	70000	0.1386	90.0854	1.0228
0.0353	2.5384	80000	0.1381	89.9357	1.0194
0.0346	2.8557	90000	0.1348	89.8965	1.0178
0.0231	3.1730	100000	0.1524	87.853	0.9987
0.0138	3.4903	110000	0.1521	89.1289	1.0104
0.0129	3.8076	120000	0.1505	89.0184	1.008
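
As a small aid for reading the table, the sketch below copies its rows into Python and picks out the checkpoint with the lowest validation loss and the one with the highest BLEU. The Row type and its field names are illustrative choices, not something produced by the training run.

```python
from collections import namedtuple

# Field names are illustrative; values are copied from the table above.
Row = namedtuple("Row", "train_loss epoch step val_loss bleu gen_len_ratio")

ROWS = [
    Row(0.2859, 0.3173, 10000, 0.1897, 83.883, 0.9985),
    Row(0.2495, 0.6346, 20000, 0.1607, 85.2621, 0.9957),
    Row(0.1575, 0.9519, 30000, 0.1373, 89.4101, 1.0267),
    Row(0.0923, 1.2692, 40000, 0.1355, 87.5438, 1.0071),
    Row(0.0831, 1.5865, 50000, 0.1292, 89.4781, 1.0206),
    Row(0.0817, 1.9038, 60000, 0.1228, 89.8947, 1.0215),
    Row(0.0489, 2.2211, 70000, 0.1386, 90.0854, 1.0228),
    Row(0.0353, 2.5384, 80000, 0.1381, 89.9357, 1.0194),
    Row(0.0346, 2.8557, 90000, 0.1348, 89.8965, 1.0178),
    Row(0.0231, 3.1730, 100000, 0.1524, 87.853, 0.9987),
    Row(0.0138, 3.4903, 110000, 0.1521, 89.1289, 1.0104),
    Row(0.0129, 3.8076, 120000, 0.1505, 89.0184, 1.008),
]

# Checkpoint with the lowest validation loss and with the highest BLEU.
best_loss = min(ROWS, key=lambda r: r.val_loss)
best_bleu = max(ROWS, key=lambda r: r.bleu)

if __name__ == "__main__":
    print(f"lowest validation loss: {best_loss.val_loss} at step {best_loss.step}")
    print(f"highest BLEU: {best_bleu.bleu} at step {best_bleu.step}")
```

On these rows the lowest validation loss (0.1228) is at step 60000 and the highest BLEU (90.0854) is at step 70000. Training loss keeps falling after that point while validation loss rises, a pattern that may indicate overfitting in the later epochs.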