---
library_name: transformers
tags: []
---
# Tetun BERT model
A fine-tune of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) trained on Tetun data with a masked language modelling objective.
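A minimal usage sketch with the `transformers` fill-mask pipeline. The repo id `user/tetun-bert` is a placeholder for this model's actual Hub id, and the Tetun example sentence is illustrative only:

```python
from transformers import pipeline

# "user/tetun-bert" is a placeholder; substitute this model's actual Hub id.
fill_mask = pipeline("fill-mask", model="user/tetun-bert")

# XLM-RoBERTa models use <mask> as the mask token.
# Illustrative Tetun sentence: "Dili is the capital of Timor-Leste."
print(fill_mask("Dili mak <mask> Timor-Leste nian."))
```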
Tetun data used: the Tetun (`tet`) clean split of [MADLAD-400](https://huggingface.co/datasets/allenai/MADLAD-400) (~40k documents).
Trained for 10 epochs with hyperparameters from the [MasakhaNER paper](https://aclanthology.org/2021.tacl-1.66.pdf) (learning rate 5e-5, etc.).
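For reference, a minimal sketch of the described fine-tuning setup, not the exact training script. The MADLAD-400 loading config, text column, sequence length, batch size, and masking ratio are assumptions; only the base model, epoch count, and learning rate come from this card:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("FacebookAI/xlm-roberta-large")

# Assumption: the Tetun clean split of MADLAD-400 loaded from the Hub;
# the exact config/split names and text column may differ.
dataset = load_dataset("allenai/MADLAD-400", "tet", split="clean")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# 15% masking is the standard MLM default; the card does not state the ratio.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="tetun-bert-mlm",
    num_train_epochs=10,             # from the card
    learning_rate=5e-5,              # from the card (MasakhaNER hyperparameters)
    per_device_train_batch_size=8,   # assumption; not stated in the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```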