Commit ff320af by Gilles (1 parent: 1b366de): Update README.md
Files changed: README.md (+19 −20)
@@ -1,6 +1,6 @@
 # FongBERT
 
-FongBERT is a BERT model trained on more than 50,000 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data are compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website.
+FongBERT is a BERT model trained on 68,363 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data are compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website.
 It is the first pretrained model to leverage transfer learning for downstream tasks for Fon.
 Below are some examples of missing word prediction.
 
@@ -18,18 +18,18 @@ fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
 
 #### Example 1
 
-**Sentence 1**: wa wazɔ xa mi . **Translation**: come to work with me.
+**Sentence 1**: un tuùn ɖɔ un jló na wazɔ̌ nú we . **Translation**: I know, I have to work for you.
 
-**Masked Sentence**: wa wazɔ xa <"mask"> . **Translation**: come to work with <"mask">.
+**Masked Sentence**: un tuùn ɖɔ un jló na wazɔ̌ <"mask"> we . **Translation**: I know, I have to work <"mask"> you.
 
-fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
+fill(f'un tuùn ɖɔ un jló na wazɔ̌ {fill.tokenizer.mask_token} we')
 
-[{'score': 0.9988399147987366,
- 'sequence': 'wa wazɔ xa mi',
- 'token': 391,
- 'token_str': ' mi'},
- {'score': 0.00041466866969130933,
- 'sequence': 'wa wazɔ xa '
+[{'score': 0.994536280632019,
+ 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌ nú we',
+ 'token': 312,
+ 'token_str': ' '},
+ {'score': 0.0015309195732697845,
+ 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌nu we',
 ...........]
 
 
@@ -39,12 +39,12 @@ fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
 
 **Masked Sentence**: un yi <"mask"> nu we ɖesu . **Translation**: I <"mask"> you so much.
 
-[{'score': 0.8948522210121155,
+[{'score': 0.31483960151672363,
 'sequence': 'un yi wan nu we ɖesu',
-'token': 702,
+'token': 639,
 'token_str': ' wan'},
-{'score': 0.06282926350831985,
-'sequence': 'un yi ɖɔ nu we ɖesu',
+{'score': 0.20940221846103668,
+'sequence': 'un yi ba nu we ɖesu',
 ...........]
 
 
@@ -54,11 +54,10 @@ fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
 
 **Masked Sentence**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú <"mask"> ɖé . **Translation**: I went to my boyfriend for a <"mask">.
 
-[{'score': 0.2686346471309662,
-'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú é ɖé',
-'token': 278,
-'token_str': ' é'},
-{'score': 0.1764318197965622,
+[{'score': 0.934298574924469,
 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé',
-'token': 1205,
+'token': 1102,
+'token_str': ' táan'},
+{'score': 0.03750855475664139,
+'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú ganxixo ɖé',
 ...........]
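
The bracketed lists in the examples are truncated outputs of the `fill-mask` pipeline constructed in the hunk headers (`fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)`). A minimal sketch of consuming one such result, with the candidate dicts hardcoded from Example 2 of the new revision — the model checkpoint id does not appear in this diff, so no live `pipeline(...)` call is made, and keys the README truncates with `...........]` are left out:

```python
# Each candidate returned by a Hugging Face fill-mask pipeline is a dict
# with a confidence 'score', the completed 'sequence', and the predicted
# token ('token', 'token_str'). Values copied from Example 2 above.
predictions = [
    {"score": 0.31483960151672363, "sequence": "un yi wan nu we ɖesu",
     "token": 639, "token_str": " wan"},
    {"score": 0.20940221846103668, "sequence": "un yi ba nu we ɖesu"},
]

# The pipeline already sorts candidates by descending score, so the first
# entry is the model's best guess for the masked word; max() makes the
# selection criterion explicit.
best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])  # -> un yi wan nu we ɖesu
```

Scores are probabilities over the vocabulary for the masked position, so comparing the top two (here 0.31 vs. 0.21) gives a rough sense of how confident the model is in its first choice.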