Model Card for xlm-roberta-large-lemma-eu

This model is a fine-tuned version of xlm-roberta-large for the contextual lemmatization task.
The datasets used for training are extracted from the data of the SIGMORPHON 2019 Shared Task.
The model for the Basque language was trained on the BDT (Basque Dependency Treebank) corpus.
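As a rough illustration of how such a checkpoint might be queried, the sketch below wraps the Hugging Face `transformers` token-classification pipeline. It assumes the checkpoint exposes a token-classification head and that the predicted classes are lemmatization labels that still need to be decoded into lemmas as described in the paper and repository; neither point is stated in this card. The helper names `build_tagger` and `predict_labels` are hypothetical.

```python
MODEL_ID = "HiTZ/xlm-roberta-large-lemma-eu"


def build_tagger(model_id=MODEL_ID):
    """Create a token-classification pipeline for the checkpoint.

    Assumption: the model was saved with a token-classification head.
    The import is lazy so that defining this helper does not require
    `transformers` to be installed.
    """
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def predict_labels(tagger, sentence):
    """Return (subword, predicted label) pairs for one sentence.

    The labels are the raw classes predicted by the model, not lemmas;
    turning them into lemmas is left to the tooling in the repository.
    """
    return [(item["word"], item["entity"]) for item in tagger(sentence)]
```

A call such as `predict_labels(build_tagger(), "Etxea handia da.")` would download the weights on first use and return one pair per subword token.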

Training Hyperparameters

SEED: 42
EPOCHS: 20
BATCH SIZE: 8
GRADIENT ACCUMULATION STEPS: 2
LEARNING RATE: 0.00005
WARMUP: 0.06
WEIGHT DECAY: 0.01
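
The values above can be read as a training configuration. The sketch below restates them under the keyword names used by `transformers.TrainingArguments` and derives the effective batch size. It is an illustration only: the card does not say which training API was used, `WARMUP: 0.06` is assumed to be a warmup ratio (a fraction of total steps), a single device is assumed, and `num_train_examples` is a hypothetical input rather than the real corpus size.

```python
import math

# Values copied from the list above. Key names follow the keyword arguments
# of transformers.TrainingArguments (an assumption about the training setup).
hyperparameters = {
    "seed": 42,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-5,
    "warmup_ratio": 0.06,  # assumption: WARMUP is a fraction of total steps
    "weight_decay": 0.01,
}


def effective_batch_size(hp, num_devices=1):
    """Examples contributing to each optimizer update."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


def schedule_steps(hp, num_train_examples, num_devices=1):
    """Return (total optimizer steps, warmup steps) for a given dataset size."""
    steps_per_epoch = math.ceil(
        num_train_examples / effective_batch_size(hp, num_devices)
    )
    total_steps = steps_per_epoch * hp["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * hp["warmup_ratio"])
    return total_steps, warmup_steps
```

With a batch size of 8 and 2 gradient accumulation steps on one device, each optimizer update sees 16 examples.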

Results

For detailed results, see the paper and the repository:

📖 Paper: On the Role of Morphological Information for Contextual Lemmatization
🌐 Repository: Datasets and training files

Contact: Olia Toporkov and Rodrigo Agerri, HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Funding:
Model type: xlm-roberta-large
Language(s) (NLP): Basque
License: apache-2.0

Citation

@article{10.1162/coli_a_00497,
    author = {Toporkov, Olia and Agerri, Rodrigo},
    title = "{On the Role of Morphological Information for Contextual
                    Lemmatization}",
    journal = {Computational Linguistics},
    volume = {50},
    number = {1},
    pages = {157-191},
    year = {2024},
    month = {03},
    issn = {0891-2017},
    doi = {10.1162/coli_a_00497},
    url = {https://doi.org/10.1162/coli_a_00497},
    eprint = {https://direct.mit.edu/coli/article-pdf/50/1/157/2367156/coli_a_00497.pdf},
}
