# FongBERT
FongBERT is a BERT model trained on 68,363 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data were compiled from [JW300](https://opus.nlpl.eu/JW300.php) and from additional data I scraped from the [JW](https://www.jw.org/en/) website.
It is the first pretrained model to enable transfer learning for downstream tasks in Fon.
Below are some examples of missing word prediction.
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Gilles/FongBERT")
model = AutoModelForMaskedLM.from_pretrained("Gilles/FongBERT")
fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
```
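The examples in this card each hide one word of a sentence behind the tokenizer's mask token. As a minimal sketch of that step, the helper below replaces one whitespace-delimited word with a mask token. The name `mask_word` is hypothetical and not part of FongBERT or `transformers`, and splitting on whitespace is an assumption that happens to fit the sentences shown here.

```python
def mask_word(sentence: str, word: str, mask_token: str) -> str:
    """Replace the first whitespace-delimited occurrence of `word` with `mask_token`.

    Hypothetical helper: it only edits the input string and never calls the model.
    """
    tokens = sentence.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not occur as a separate word in the sentence")
    # list.index returns the first match, so repeated words are masked only once.
    tokens[tokens.index(word)] = mask_token
    return " ".join(tokens)


# With the `fill` pipeline from this card, the result can be passed on directly:
# fill(mask_word("un yi wan nu we ɖesu", "wan", fill.tokenizer.mask_token))
```

Taking the mask token as a parameter, rather than hard-coding it, keeps the helper independent of the tokenizer in use.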
#### Example 1
**Sentence 1**: un tuùn ɖɔ un jló na wazɔ̌ nú we . **Translation**: I know I have to work for you.
**Masked Sentence**: un tuùn ɖɔ un jló na wazɔ̌ `<mask>` we . **Translation**: I know I have to work `<mask>` you.
```python
fill(f'un tuùn ɖɔ un jló na wazɔ̌ {fill.tokenizer.mask_token} we')
```
```
[{'score': 0.994536280632019,
  'sequence': 'un tuùn ɖɔ un jló na wazɔ̌ nú we',
  'token': 312,
  'token_str': ' nú'},
 {'score': 0.0015309195732697845,
  'sequence': 'un tuùn ɖɔ un jló na wazɔ̌nu we',
  ...........]
```
#### Example 2
**Sentence 2**: un yi wan nu we ɖesu . **Translation**: I love you so much.
**Masked Sentence**: un yi `<mask>` nu we ɖesu . **Translation**: I `<mask>` you so much.
```
[{'score': 0.31483960151672363,
  'sequence': 'un yi wan nu we ɖesu',
  'token': 639,
  'token_str': ' wan'},
 {'score': 0.20940221846103668,
  'sequence': 'un yi ba nu we ɖesu',
  ...........]
```
#### Example 3
**Sentence 3**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé . **Translation**: I went to my boyfriend for a while.
**Masked Sentence**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú `<mask>` ɖé . **Translation**: I went to my boyfriend for a `<mask>`.
```
[{'score': 0.934298574924469,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé',
  'token': 1102,
  'token_str': ' táan'},
 {'score': 0.03750855475664139,
  'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú ganxixo ɖé',
  ...........]
```
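The raw pipeline output is a list of dictionaries, which is verbose when only the predicted words and their scores matter. The sketch below condenses such a list into `(word, score)` pairs. The function name `top_predictions` and the `min_score` cutoff are hypothetical choices. The only assumption about the data is that each entry has the `score` and `token_str` keys shown in the outputs above.

```python
from typing import Any


def top_predictions(
    results: list[dict[str, Any]], min_score: float = 0.01
) -> list[tuple[str, float]]:
    """Return (word, score) pairs with score >= min_score, best first.

    Hypothetical helper; `min_score` is an arbitrary illustrative cutoff.
    """
    predictions = [
        # token_str carries a leading space in the outputs above, so strip it.
        (entry["token_str"].strip(), round(entry["score"], 4))
        for entry in results
        if entry["score"] >= min_score
    ]
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


# First entry of the Example 3 output, copied from this card.
sample = [
    {
        "score": 0.934298574924469,
        "sequence": "un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé",
        "token": 1102,
        "token_str": " táan",
    }
]
print(top_predictions(sample))  # [('táan', 0.9343)]
```

In practice the function would be applied directly to the return value of a `fill(...)` call.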