---
license: apache-2.0
language:
- ig
base_model:
- Davlan/afro-xlmr-large
---
# masakhane/igbo-pos-tagger-afroxlmr

## Model description
**igbo-pos-tagger-afroxlmr** is a part-of-speech (POS) tagger for the Igbo language, fine-tuned on the [MasakhaPOS](https://github.com/masakhane-io/masakhane-pos) dataset.
## Intended uses & limitations

#### How to use
You can use this model with the Transformers *pipeline* for token classification.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

model_name = "masakhane/igbo-pos-tagger-afroxlmr"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pipeline("Nke a na-abịa dịka Trump rụtụrụ aka na inyefe okeala bụ ihe nwereike ịkwụsị agha dị n'etiti mba abụọ ahụ.")
print(outputs)
```
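By default, the pipeline emits one prediction per subword token rather than per word. A minimal sketch of merging adjacent subwords back into whole words using their `start`/`end` character offsets — the `predictions` list below is a hypothetical illustration of the output shape (tags and `▁` subword markers assumed, not actual model output):

```python
# Hypothetical token-level predictions shaped like TokenClassificationPipeline
# output: one dict per subword, with a tag and character offsets.
predictions = [
    {"word": "▁Nke", "entity": "PRON", "start": 0, "end": 3},
    {"word": "▁a", "entity": "PRON", "start": 4, "end": 5},
    {"word": "▁na", "entity": "AUX", "start": 6, "end": 8},
    {"word": "-", "entity": "AUX", "start": 8, "end": 9},
    {"word": "abịa", "entity": "VERB", "start": 9, "end": 13},
]

def merge_subwords(preds):
    """Merge subword predictions whose character spans touch into whole
    words, keeping the tag of the first subword of each word."""
    words = []
    for p in preds:
        if words and p["start"] == words[-1]["end"]:
            # Continuation of the previous word: extend its text and span.
            words[-1]["word"] += p["word"].lstrip("▁")
            words[-1]["end"] = p["end"]
        else:
            # Start of a new word: drop the sentencepiece "▁" marker.
            words.append({"word": p["word"].lstrip("▁"),
                          "entity": p["entity"],
                          "start": p["start"], "end": p["end"]})
    return [(w["word"], w["entity"]) for w in words]

print(merge_subwords(predictions))
# → [('Nke', 'PRON'), ('a', 'PRON'), ('na-abịa', 'AUX')]
```

In recent Transformers versions you can instead pass `aggregation_strategy="simple"` to the pipeline to get word-grouped predictions directly.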
#### Limitations and bias
This model is limited by its training data: POS-annotated news articles from a specific span of time. It may not generalize well to all use cases in other domains.
## Training data
This model was fine-tuned on the Igbo portion of the MasakhaPOS dataset, annotated with [UD POS tags](https://universaldependencies.org/u/pos/).

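The UD tagset linked above defines 17 universal POS categories. A small sketch for sanity-checking predicted labels against that inventory (the tag list comes from the UD specification, not from this repository):

```python
# The 17 universal POS tags defined by Universal Dependencies.
UD_POS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}

def invalid_tags(predicted):
    """Return predicted labels that are not valid UD POS tags, sorted."""
    return sorted(set(predicted) - UD_POS_TAGS)

print(invalid_tags(["NOUN", "VERB", "FOO"]))  # → ['FOO']
```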
33
+ ### BibTeX entry and citation info
34
+ ```
35
+ @inproceedings{dione-etal-2023-masakhapos,
36
+ title = "{M}asakha{POS}: Part-of-Speech Tagging for Typologically Diverse {A}frican languages",
37
+ author = "Dione, Cheikh M. Bamba and
38
+ Adelani, David Ifeoluwa and
39
+ Nabende, Peter and
40
+ Alabi, Jesujoba and
41
+ Sindane, Thapelo and
42
+ Buzaaba, Happy and
43
+ Muhammad, Shamsuddeen Hassan and
44
+ Emezue, Chris Chinenye and
45
+ Ogayo, Perez and
46
+ Aremu, Anuoluwapo and
47
+ Gitau, Catherine and
48
+ Mbaye, Derguene and
49
+ Mukiibi, Jonathan and
50
+ Sibanda, Blessing and
51
+ Dossou, Bonaventure F. P. and
52
+ Bukula, Andiswa and
53
+ Mabuya, Rooweither and
54
+ Tapo, Allahsera Auguste and
55
+ Munkoh-Buabeng, Edwin and
56
+ Memdjokam Koagne, Victoire and
57
+ Ouoba Kabore, Fatoumata and
58
+ Taylor, Amelia and
59
+ Kalipe, Godson and
60
+ Macucwa, Tebogo and
61
+ Marivate, Vukosi and
62
+ Gwadabe, Tajuddeen and
63
+ Elvis, Mboning Tchiaze and
64
+ Onyenwe, Ikechukwu and
65
+ Atindogbe, Gratien and
66
+ Adelani, Tolulope and
67
+ Akinade, Idris and
68
+ Samuel, Olanrewaju and
69
+ Nahimana, Marien and
70
+ Musabeyezu, Th{\'e}og{\`e}ne and
71
+ Niyomutabazi, Emile and
72
+ Chimhenga, Ester and
73
+ Gotosa, Kudzai and
74
+ Mizha, Patrick and
75
+ Agbolo, Apelete and
76
+ Traore, Seydou and
77
+ Uchechukwu, Chinedu and
78
+ Yusuf, Aliyu and
79
+ Abdullahi, Muhammad and
80
+ Klakow, Dietrich",
81
+ editor = "Rogers, Anna and
82
+ Boyd-Graber, Jordan and
83
+ Okazaki, Naoaki",
84
+ booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
85
+ month = jul,
86
+ year = "2023",
87
+ address = "Toronto, Canada",
88
+ publisher = "Association for Computational Linguistics",
89
+ url = "https://aclanthology.org/2023.acl-long.609/",
90
+ doi = "10.18653/v1/2023.acl-long.609",
91
+ pages = "10883--10900",
92
+ abstract = "In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the universal dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random field and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages."
93
+ }
94
+ ```