Commit ff320af by Gilles (1 parent: 1b366de): Update README.md
Files changed: README.md (+19 −20)
@@ -1,6 +1,6 @@
 # FongBERT
 
-FongBERT is a BERT model trained on more than 50,000 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data are compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website.
+FongBERT is a BERT model trained on 68,363 sentences in [Fon](https://en.wikipedia.org/wiki/Fon_language). The data are compiled from [JW300](https://opus.nlpl.eu/JW300.php) and additional data I scraped from the [JW](https://www.jw.org/en/) website.
 It is the first pretrained model to leverage transfer learning for downstream tasks for Fon.
 Below are some examples of missing word prediction.
 
@@ -18,18 +18,18 @@ fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)
 
 #### Example 1
 
-**Sentence 1**: wa wazɔ xa mi . **Translation**: come to work with me.
+**Sentence 1**: un tuùn ɖɔ un jló na wazɔ̌ nú we . **Translation**: I know, I have to work for you.
 
-**Masked Sentence**: wa wazɔ xa <"mask"> . **Translation**: come to work with <"mask">.
+**Masked Sentence**: un tuùn ɖɔ un jló na wazɔ̌ <"mask"> we . **Translation**: I know, I have to work <"mask"> you.
 
-fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
+fill(f'un tuùn ɖɔ un jló na wazɔ̌ {fill.tokenizer.mask_token} we')
 
-[{'score': 0.9988399147987366,
- 'sequence': 'wa wazɔ xa mi',
- 'token': 391,
- 'token_str': ' mi'},
- {'score': 0.00041466866969130933,
- 'sequence': 'wa wazɔ xa '
+[{'score': 0.994536280632019,
+ 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌ nú we',
+ 'token': 312,
+ 'token_str': ' '},
+ {'score': 0.0015309195732697845,
+ 'sequence': 'un tuùn ɖɔ un jló na wazɔ̌nu we',
 ...........]
 
 
@@ -39,12 +39,12 @@ fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
 
 **Masked Sentence**: un yi <"mask"> nu we ɖesu . **Translation**: I <"mask"> you so much.
 
-[{'score': 0.8948522210121155,
+[{'score': 0.31483960151672363,
 'sequence': 'un yi wan nu we ɖesu',
-'token': 702,
+'token': 639,
 'token_str': ' wan'},
-{'score': 0.06282926350831985,
-'sequence': 'un yi ɖɔ nu we ɖesu',
+{'score': 0.20940221846103668,
+'sequence': 'un yi ba nu we ɖesu',
 ...........]
 
 
@@ -54,11 +54,10 @@ fill(f'wa wazɔ xa {fill.tokenizer.mask_token}')
 
 **Masked Sentence**: un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú <"mask"> ɖé . **Translation**: I went to my boyfriend for a <"mask">.
 
-[{'score': 0.2686346471309662,
-'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú é ɖé',
-'token': 278,
-'token_str': ' é'},
-{'score': 0.1764318197965622,
+[{'score': 0.934298574924469,
 'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú táan ɖé',
-'token': 1205,
+'token': 1102,
+'token_str': ' táan'},
+{'score': 0.03750855475664139,
+'sequence': 'un yì cí sunnu xɔ́ntɔn ce Tony gɔ́n nú ganxixo ɖé',
 ...........]
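
The bracketed lists in the examples are truncated outputs of the `fill-mask` pipeline constructed in the hunk headers (`fill = pipeline('fill-mask', model=model, tokenizer=tokenizer)`). A minimal sketch of consuming one such result, with the candidate dicts hardcoded from Example 2 of the new revision — the model checkpoint id does not appear in this diff, so no live `pipeline(...)` call is made, and keys the README truncates with `...........]` are left out:

```python
# Each candidate returned by a Hugging Face fill-mask pipeline is a dict
# with a confidence 'score', the completed 'sequence', and the predicted
# token ('token', 'token_str'). Values copied from Example 2 above.
predictions = [
    {"score": 0.31483960151672363, "sequence": "un yi wan nu we ɖesu",
     "token": 639, "token_str": " wan"},
    {"score": 0.20940221846103668, "sequence": "un yi ba nu we ɖesu"},
]

# The pipeline already sorts candidates by descending score, so the first
# entry is the model's best guess for the masked word; max() makes the
# selection criterion explicit.
best = max(predictions, key=lambda p: p["score"])
print(best["sequence"])  # -> un yi wan nu we ɖesu
```

Scores are probabilities over the vocabulary for the masked position, so comparing the top two (here 0.31 vs. 0.21) gives a rough sense of how confident the model is in its first choice.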