# mT5-Small (Taxi1500 Maltese)
This model is a fine-tuned version of google/mt5-small on the Taxi1500 dataset. It achieves the following results on the test set:
- Loss: 0.7
- F1: 0.4220
## Intended uses & limitations
The model is fine-tuned on a specific task, so it should only be used for the same or a similar task. Any limitations present in the base model are inherited.
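As a minimal usage sketch (assuming the checkpoint is loaded through the standard `transformers` text-to-text API; the example sentence and the absence of any task prefix are assumptions, not taken from the training script):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint (repository name as published with this card).
tokenizer = AutoTokenizer.from_pretrained("MLRS/mt5-small_taxi1500-mlt")
model = AutoModelForSeq2SeqLM.from_pretrained("MLRS/mt5-small_taxi1500-mlt")

# Illustrative Maltese input; the exact input template used during fine-tuning
# may differ (e.g. a task prefix), so treat this as a placeholder.
text = "Fil-bidu Alla ħalaq is-sema u l-art."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# mT5 is a text-to-text model, so the predicted class is generated as text.
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```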
## Training procedure
The model was fine-tuned using a customised script; a rough sketch of a comparable setup is given after the hyperparameter list below.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
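The customised script itself is not reproduced here, but a rough sketch of a comparable setup with the hyperparameters listed above might look as follows. The dataset objects (`train_dataset`, `validation_dataset`) and the metric function (`compute_f1`) are hypothetical placeholders, and the use of `Seq2SeqTrainer` is an assumption rather than the authors' actual code:

```python
from transformers import (AutoModelForSeq2SeqLM, EarlyStoppingCallback,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_taxi1500-mlt",
    learning_rate=1e-3,                 # learning_rate: 0.001
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=200,
    optim="adafactor",                  # Adafactor, no additional arguments
    lr_scheduler_type="linear",
    seed=42,
    eval_strategy="epoch",              # the results table reports one eval per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,        # needed for early stopping
    metric_for_best_model="f1",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,        # tokenised Taxi1500 train split (not shown)
    eval_dataset=validation_dataset,    # tokenised Taxi1500 validation split (not shown)
    compute_metrics=compute_f1,         # hypothetical macro-F1 metric function
    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)],
)
trainer.train()
```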
### Training results
| Training Loss | Epoch | Step | Validation Loss | F1 |
|---|---|---|---|---|
| No log | 1.0 | 27 | 7.1210 | 0.4432 |
| No log | 2.0 | 54 | 0.7519 | 0.4546 |
| No log | 3.0 | 81 | 0.7215 | 0.3865 |
| No log | 4.0 | 108 | 0.7781 | 0.4213 |
| No log | 5.0 | 135 | 0.7418 | 0.3728 |
| No log | 6.0 | 162 | 0.7876 | 0.3881 |
| No log | 7.0 | 189 | 0.8915 | 0.3570 |
| No log | 8.0 | 216 | 0.7115 | 0.3611 |
| No log | 9.0 | 243 | 0.7800 | 0.3487 |
| No log | 10.0 | 270 | 0.7971 | 0.3928 |
| No log | 11.0 | 297 | 0.7406 | 0.3707 |
| No log | 12.0 | 324 | 0.7309 | 0.3527 |
| No log | 13.0 | 351 | 0.6971 | 0.4233 |
| No log | 14.0 | 378 | 0.8458 | 0.3515 |
| No log | 15.0 | 405 | 0.7301 | 0.3515 |
| No log | 16.0 | 432 | 2.9614 | 0.1838 |
| No log | 17.0 | 459 | 0.7779 | 0.1903 |
| No log | 18.0 | 486 | 0.7124 | 0.3556 |
| 2.0694 | 19.0 | 513 | 0.7182 | 0.4250 |
| 2.0694 | 20.0 | 540 | 0.7275 | 0.4385 |
| 2.0694 | 21.0 | 567 | 0.7660 | 0.3660 |
| 2.0694 | 22.0 | 594 | 0.7280 | 0.3556 |
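Assuming the F1 reported above is a macro-average over the Taxi1500 classes, it can be computed from gold and predicted labels with scikit-learn; the labels below are illustrative placeholders only:

```python
from sklearn.metrics import f1_score

# Illustrative gold and predicted class labels -- not actual Taxi1500 data.
gold = ["class_a", "class_b", "class_a", "class_c"]
pred = ["class_a", "class_a", "class_a", "class_c"]

# Macro-averaging weights every class equally, regardless of class frequency.
print(f1_score(gold, pred, average="macro"))
```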
### Framework versions
- Transformers 4.48.2
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
## License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
## Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt and
      Borg, Claudia",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```
