# FongBERT

FongBERT is a BERT model trained on 68,363 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data were compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website.

It is the first pretrained model that makes transfer learning available for downstream tasks in Fon.

Below are some examples of masked-word prediction.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Load the FongBERT tokenizer and masked-language model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForMaskedLM.from_pretrained("Gilles/FongBERT")

# Build a fill-mask pipeline that predicts the token hidden behind the mask
fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```

#### Example 1

**Sentence 1**: un tuùn ɖɔ un jló na wazɔ̌ nú we . **Translation**: I know I have to work for you.

**Masked Sentence**: un tuùn ɖɔ un jló na wazɔ̌ `<mask>` we . **Translation**: I know I have to work `<mask>` you.

```python
fill(f'un tuùn ɖɔ un jló na wazɔ̌ {fill.tokenizer.mask_token} we')
```

```
[{'score': 0.994536280632019,
  'sequence': 'un tuùn ɖɔ un jló na wazɔ̌ nú we',
  'token': 312,
  'token_str': ' nú'},
 {'score': 0.0015309195732697845,
  'sequence': 'un tuùn ɖɔ un jló na wazɔ̌nu we',
  ...]
```

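Each prediction returned by the pipeline is a dict that carries, among other fields, a `score` and the completed `sequence`, as in the output above. The following is a minimal sketch of ranking and filtering such predictions. The `top_sequences` helper and its `min_score` parameter are hypothetical, and `sample` only copies the `score` and `sequence` values shown above instead of calling the model.

```python
def top_sequences(results, min_score=0.0):
    """Return (sequence, score) pairs with score >= min_score, best first."""
    kept = [(r["sequence"], r["score"]) for r in results if r["score"] >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Values copied from the Example 1 output; the remaining fields are omitted.
sample = [
    {"score": 0.0015309195732697845, "sequence": "un tuùn ɖɔ un jló na wazɔ̌nu we"},
    {"score": 0.994536280632019, "sequence": "un tuùn ɖɔ un jló na wazɔ̌ nú we"},
]

# Keep only predictions the model is reasonably confident about.
for sequence, score in top_sequences(sample, min_score=0.01):
    print(f"{score:.4f}  {sequence}")
```

With the pipeline loaded, the same helper could presumably be applied directly to the list returned by `fill(...)`.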
#### Example 2

**Sentence 2**: un yi wan nu we ɖesu . **Translation**: I love you so much.

**Masked Sentence**: un yi `<mask>` nu we ɖesu . **Translation**: I `<mask>` you so much.

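The call that produced the output below is not shown for this example. As a rough sketch, the masked input can be built by swapping the target word for the mask token and then passing the result to `fill`, as in Example 1. The `mask_word` helper is hypothetical, and the literal `"<mask>"` string is an assumed stand-in for `fill.tokenizer.mask_token`.

```python
def mask_word(sentence: str, word: str, mask_token: str) -> str:
    """Replace the first whole-word occurrence of `word` with `mask_token`."""
    tokens = sentence.split()
    if word not in tokens:
        raise ValueError(f"{word!r} not found in sentence")
    idx = tokens.index(word)  # position of the first occurrence
    tokens[idx] = mask_token
    return " ".join(tokens)


# Assumption: "<mask>" stands in for fill.tokenizer.mask_token.
masked = mask_word("un yi wan nu we ɖesu", "wan", "<mask>")
print(masked)
```

With the pipeline loaded, passing `fill.tokenizer.mask_token` as the third argument and handing the returned string to `fill` would be the corresponding call.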
```
[{'score': 0.31483960151672363,
  'sequence': 'un yi wan nu we ɖesu',
  'token': 639,
  'token_str': ' wan'},
 {'score': 0.20940221846103668,
  'sequence': 'un yi ba nu we ɖesu',
  ...]
```

#### Example 3

**Sentence 3**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé . **Translation**: I went to my boyfriend for a while.

**Masked Sentence**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú `<mask>` ɖé . **Translation**: I went to my boyfriend for a `<mask>`.

```
[{'score': 0.934298574924469,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé',
  'token': 1102,
  'token_str': ' táan'},
 {'score': 0.03750855475664139,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú ganxixo ɖé',
  ...]
```