BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper: arXiv:1810.04805
You can use this model with the Transformers pipeline for NER:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned Swahili NER checkpoint.
tokenizer = AutoTokenizer.from_pretrained("eolang/SW-NER-v1")
model = AutoModelForTokenClassification.from_pretrained("eolang/SW-NER-v1")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# "We have made important changes to our privacy and cookie policies"
example = "Tumefanya mabadiliko muhimu katika sera zetu za faragha na vidakuzi"

ner_results = nlp(example)
print(ner_results)
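By default the pipeline emits one prediction per subword token. If you want whole entity spans instead, the pipeline's aggregation_strategy option merges subwords back into words; a minimal sketch, reusing the model and tokenizer loaded above:

# Group subword predictions into complete entity spans.
nlp_grouped = pipeline(
    "ner",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)
# Each result is a dict with "entity_group", "word", "score", "start", "end".
print(nlp_grouped(example))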
This model was fine-tuned on the Swahili portion of the MasakhaNER dataset from the Masakhane project. MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 African languages: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
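The dataset itself is published on the Hugging Face Hub, so you can inspect the training data directly. A minimal sketch, assuming the masakhaner dataset ID with the swa (Swahili) configuration:

from datasets import load_dataset

# Load the Swahili split of MasakhaNER; "masakhaner"/"swa" are the Hub
# dataset ID and configuration name assumed here.
swahili = load_dataset("masakhaner", "swa")

print(swahili["train"][0])  # pre-split tokens plus integer NER tag IDs
print(swahili["train"].features["ner_tags"].feature.names)  # the IOB2 label names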
This model was trained on a single NVIDIA RTX 3090 GPU, using the fine-tuning hyperparameters recommended in the original BERT paper.
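For reference, here is a minimal fine-tuning sketch in the same spirit, built on the Trainer API. The base checkpoint and the exact hyperparameter values are assumptions; the values shown fall inside the ranges the BERT paper recommends for fine-tuning (batch size 16 or 32, learning rate 2e-5 to 5e-5, 2 to 4 epochs):

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
    TrainingArguments,
    Trainer,
    DataCollatorForTokenClassification,
)

# A sketch of the fine-tuning setup, not the exact training script used
# for this model. The base checkpoint is an assumption.
base = "bert-base-multilingual-cased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)

raw = load_dataset("masakhaner", "swa")
labels = raw["train"].features["ner_tags"].feature.names  # 9 IOB2 tags

def tokenize_and_align(batch):
    # Tokenize pre-split words and align word-level tags to subwords,
    # marking continuation subwords with -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        prev = None
        ids = []
        for w in enc.word_ids(batch_index=i):
            ids.append(-100 if w is None or w == prev else tags[w])
            prev = w
        enc["labels"].append(ids)
    return enc

tokenized = raw.map(tokenize_and_align, batched=True)

model = AutoModelForTokenClassification.from_pretrained(base, num_labels=len(labels))

args = TrainingArguments(
    output_dir="sw-ner",
    learning_rate=5e-5,              # BERT paper: {5e-5, 3e-5, 2e-5}
    per_device_train_batch_size=32,  # BERT paper: {16, 32}
    num_train_epochs=4,              # BERT paper: {2, 3, 4}
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()

Entity-level evaluation (e.g. with seqeval) is left out to keep the sketch short.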