This model is a fine-tuned version of xlm-roberta-large for the contextual lemmatization task.
The training datasets are drawn from the data of the SIGMORPHON 2019 Shared Task.
The model for Polish was trained on the LFG corpus.
SEED: 42
EPOCHS: 20
BATCH SIZE: 4
GRADIENT ACCUMULATION STEPS: 2
LEARNING RATE: 0.00002
WARMUP: 0.06
WEIGHT DECAY: 0.1
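
To make the interaction between these values concrete, here is a minimal sketch in plain Python. It is not the authors' training script. The `FineTuneConfig` class and its helper methods are hypothetical names introduced for illustration, `WARMUP: 0.06` is assumed to be a fraction of the total optimizer steps, training is assumed to run on a single device, and the dataset size in the usage lines is made up.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this model card (class name is hypothetical)."""
    seed: int = 42
    epochs: int = 20
    batch_size: int = 4
    gradient_accumulation_steps: int = 2
    learning_rate: float = 2e-5   # 0.00002
    warmup: float = 0.06          # ASSUMPTION: fraction of total optimizer steps
    weight_decay: float = 0.1

    @property
    def effective_batch_size(self) -> int:
        # Gradients from several small batches are accumulated before each
        # optimizer update (single-device assumption).
        return self.batch_size * self.gradient_accumulation_steps

    def optimizer_steps(self, num_train_examples: int) -> int:
        # Approximate number of optimizer updates over the whole run.
        steps_per_epoch = math.ceil(num_train_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs

    def warmup_steps(self, num_train_examples: int) -> int:
        # Number of updates spent warming up, under the ratio assumption.
        return round(self.warmup * self.optimizer_steps(num_train_examples))


config = FineTuneConfig()
hypothetical_train_size = 8000  # made-up size, for illustration only
print(config.effective_batch_size)
print(config.optimizer_steps(hypothetical_train_size))
print(config.warmup_steps(hypothetical_train_size))
```

Under these assumptions, each optimizer update sees 4 × 2 = 8 sentences. Training frameworks may round steps per epoch differently, so treat the step counts as approximate.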
For more details you can see the paper and the repository:
Contact: Olia Toporkov and Rodrigo Agerri, HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Funding:
Model type: xlm-roberta-large
Language(s) (NLP): Polish
License: apache-2.0
@article{10.1162/coli_a_00497,
author = {Toporkov, Olia and Agerri, Rodrigo},
title = "{On the Role of Morphological Information for Contextual
Lemmatization}",
journal = {Computational Linguistics},
volume = {50},
number = {1},
pages = {157-191},
year = {2024},
month = {03},
issn = {0891-2017},
doi = {10.1162/coli_a_00497},
url = {https://doi.org/10.1162/coli\_a\_00497},
eprint = {https://direct.mit.edu/coli/article-pdf/50/1/157/2367156/coli\_a\_00497.pdf},
}