Create README.md
README.md
ADDED
@@ -0,0 +1,94 @@
---
license: apache-2.0
language:
- ig
base_model:
- Davlan/afro-xlmr-large
---
# masakhane/igbo-pos-tagger-afroxlmr
## Model description
**igbo-pos-tagger-afroxlmr** is a part-of-speech (POS) tagger for the Igbo language, fine-tuned from `Davlan/afro-xlmr-large` on the [MasakhaPOS](https://github.com/masakhane-io/masakhane-pos) dataset.
## Intended uses & limitations
#### How to use
You can use this model with the Transformers *pipeline* for POS tagging.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

# Load the tokenizer and the fine-tuned token-classification model from the Hub.
model_name = "masakhane/igbo-pos-tagger-afroxlmr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Build the tagging pipeline and run it on an Igbo sentence.
pos_tagger = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pos_tagger("Nke a na-abịa dịka Trump rụtụrụ aka na inyefe okeala bụ ihe nwereike ịkwụsị agha dị n'etiti mba abụọ ahụ.")
print(outputs)
```
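The pipeline returns one prediction per subword piece rather than per word. Below is a minimal sketch of how those pieces could be merged into word-level tags. It assumes the default (non-aggregated) output format, a list of dicts with `word` and `entity` keys, and that the tokenizer marks word starts with the SentencePiece prefix `▁`, as XLM-R-style tokenizers do. `merge_subwords` is a hypothetical helper, and keeping the first piece's tag is only one of several possible conventions.

```python
from typing import Dict, List, Tuple

# SentencePiece word-boundary marker "▁" (assumed tokenizer convention).
WORD_START = "\u2581"


def merge_subwords(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Merge subword predictions into (word, tag) pairs, keeping the first piece's tag."""
    words: List[Tuple[str, str]] = []
    for pred in predictions:
        piece, tag = pred["word"], pred["entity"]
        if piece.startswith(WORD_START) or not words:
            # A new word begins: strip the marker and record the tag of its first piece.
            words.append((piece.lstrip(WORD_START), tag))
        else:
            # Continuation piece: append its text, keep the tag already assigned.
            text, first_tag = words[-1]
            words[-1] = (text + piece, first_tag)
    # Drop entries that consisted only of a bare marker (empty after stripping).
    return [(w, t) for w, t in words if w]


# Hand-written stand-in for pipeline output, for illustration only (not real model output).
sample = [
    {"word": "\u2581tag", "entity": "NOUN", "score": 0.9},
    {"word": "ging", "entity": "NOUN", "score": 0.9},
    {"word": "\u2581works", "entity": "VERB", "score": 0.9},
]

print(merge_subwords(sample))  # [('tagging', 'NOUN'), ('works', 'VERB')]
```

With real predictions, the same helper would be called as `merge_subwords(outputs)`. The pipeline also accepts an `aggregation_strategy` argument, but that option is designed for entity spans and may merge adjacent words that share a tag, which is why this sketch merges the pieces manually.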
#### Limitations and bias
This model is limited by its training dataset of POS-annotated news articles from a specific span of time. It may not generalize well to use cases in other domains.
## Training data
This model was fine-tuned on the Igbo POS dataset, which is annotated with the [UD POS tags](https://universaldependencies.org/u/pos/).

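The UD scheme defines 17 universal POS tags. The lookup table sketched below may help when reading the model's predictions. `describe_tag` is a hypothetical helper, and whether this model's label set covers all 17 tags is an assumption that can be checked against `model.config.id2label`.

```python
# The 17 universal POS tags defined by Universal Dependencies.
UD_POS_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_tag(tag: str) -> str:
    """Return a readable name for a UD tag, or a fallback for labels outside the UD set."""
    return UD_POS_TAGS.get(tag, "unknown tag")


print(describe_tag("PROPN"))  # proper noun
```
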
### BibTeX entry and citation info
```
@inproceedings{dione-etal-2023-masakhapos,
    title = "{M}asakha{POS}: Part-of-Speech Tagging for Typologically Diverse {A}frican languages",
    author = "Dione, Cheikh M. Bamba and
      Adelani, David Ifeoluwa and
      Nabende, Peter and
      Alabi, Jesujoba and
      Sindane, Thapelo and
      Buzaaba, Happy and
      Muhammad, Shamsuddeen Hassan and
      Emezue, Chris Chinenye and
      Ogayo, Perez and
      Aremu, Anuoluwapo and
      Gitau, Catherine and
      Mbaye, Derguene and
      Mukiibi, Jonathan and
      Sibanda, Blessing and
      Dossou, Bonaventure F. P. and
      Bukula, Andiswa and
      Mabuya, Rooweither and
      Tapo, Allahsera Auguste and
      Munkoh-Buabeng, Edwin and
      Memdjokam Koagne, Victoire and
      Ouoba Kabore, Fatoumata and
      Taylor, Amelia and
      Kalipe, Godson and
      Macucwa, Tebogo and
      Marivate, Vukosi and
      Gwadabe, Tajuddeen and
      Elvis, Mboning Tchiaze and
      Onyenwe, Ikechukwu and
      Atindogbe, Gratien and
      Adelani, Tolulope and
      Akinade, Idris and
      Samuel, Olanrewaju and
      Nahimana, Marien and
      Musabeyezu, Th{\'e}og{\`e}ne and
      Niyomutabazi, Emile and
      Chimhenga, Ester and
      Gotosa, Kudzai and
      Mizha, Patrick and
      Agbolo, Apelete and
      Traore, Seydou and
      Uchechukwu, Chinedu and
      Yusuf, Aliyu and
      Abdullahi, Muhammad and
      Klakow, Dietrich",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.609/",
    doi = "10.18653/v1/2023.acl-long.609",
    pages = "10883--10900",
    abstract = "In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages."
}
```