BPE-based tokenizer used for the MEHDIE project and for training a bilingual BERT model.
Vocabulary size: 52000
Trained on:
- Arabic dataset: https://huggingface.co/datasets/bigscience-data/roots_ar_openiti_proc
- Hebrew/English dataset: https://huggingface.co/datasets/mehdie/sefaria
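For readers unfamiliar with BPE, the following is a toy, pure-Python sketch of how merge rules are learned from a corpus. It is not the MEHDIE training code: the function name `learn_bpe_merges`, the tiny English corpus, and the character-level (rather than byte-level) starting alphabet are all assumptions made for illustration. A real 52000-entry vocabulary would be trained with a dedicated tokenizer library on the datasets listed above.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy sketch)."""
    # Each distinct word is a tuple of symbols (single characters at first), with a count.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


# Hypothetical toy corpus, unrelated to the actual training data.
toy_corpus = ["low", "low", "lower", "lowest", "newer", "newest"]
merges = learn_bpe_merges(toy_corpus, num_merges=3)
print(merges)
```

Each learned merge adds one symbol to the vocabulary, so the vocabulary size is roughly the base alphabet plus the number of merges (plus any special tokens).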
Examples:
Hebrew:
- "ืื ืืกืคืจ ืืืืืจ ืืืืจืื ืฉืกืคืจ ืืืฉ ืืื ืืืจืฅ ื ืืืจื ืฉืฉืื ืจืื ืื ืืืื ืืจ ืืื ื ืืืืืืื. ืืืื ืืืื ืืืื ืืืจืฆืืช ืจืืืช ืืจืืืงืืช ืืืฉืจ ืืชืคืจืฉ ืืืืจืื ืืื ืืืื ืืงืื ืฉืื ืื ืืชื ืื ืืืืจืื ืฉืจืื ืื ืฉืฉืืข ืืคื ืื ืฉื ืืืช ืืฉืจ ื ืฉืืขื ืืืจืฅ ืกืคืจื: ืืื ืืื ืืืืจ ืืงืฆืช ืืืืืืื ืืื ืฉืืืื ืฉืืืงืฆืช ืืงืืืืช ืืืฉืื ืืืื ืืืจืื ืืื ืขืื ืืืจืฅ ืงืฉืืืืื ืืฉื ืช ืชืชืงื"
- {'input_ids': [1060, 15784, 20958, 31767, 476, 4398, 3294, 1812, 19949, 42648, 455, 38010, 2069, 23008, 978, 11894, 3509, 8222, 973, 26, 23816, 8043, 461, 19170, 2998, 6517, 4245, 960, 5536, 928, 4122, 1008, 2643, 16456, 2702, 10350, 1796, 3044, 1333, 1488, 1019, 5501, 15530, 1109, 26822, 8473, 11437, 5419, 1919, 467, 13163, 6566, 4398, 454, 38, 7922, 1203, 41248, 9907, 21722, 1001, 16464, 931, 1123, 9907, 9647, 1053, 3044, 4553, 3573, 2851, 4088, 9330, 3492, 18352, 1057, 23994, 32635, 463], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Arabic:
- "ุณูุณูุฉ ุงูุฃุฌุฒุงุก ูุงููุชุจ ุงูุญุฏูุซูุฉ ุงูููุงุฆุฏ ูุงูุฃุฎุจุงุฑ ูุงูุญูุงูุงุช ุนู ุงูุดุงูุนู ูุญุงุชู ุงูุฃุตู ูู ุนุฑูู ุงููุฑุฎู ูุบูุฑูู ููู ุญุฏุซ ุงููููู ุฃุจู ุนูู ุงูุญุณู ุจู ุงูุญุณูู ุจู ุญู ูุงู ุงููู ุฐุงูู ุงูุดุงูุนู ุฏุฑุงุณุฉ ูุชุญููู ูุชุนููู ุงูุทุจุนุฉ ุงูุฃููู ุงูุฌุฒุก ุงูุฃูู ู ู ุงูููุงุฆุฏ ูุงูุฃุฎุจุงุฑ ูุงูุญูุงูุงุช ุนู ุงูุดุงูุนู ูุญุงุชู ุงูุฃุตู ูู ุนุฑูู ุงููุฑุฎู ูุบูุฑูู ุฑุถู ุงููู ุนููู ุฑูุงูุฉ"
- {'input_ids': [27193, 15595, 34780, 1361, 949, 13852, 21459, 2169, 30440, 896, 2040, 41252, 9723, 50442, 16317, 3057, 1675, 1216, 3320, 958, 910, 1260, 888, 1532, 888, 912, 935, 13333, 2040, 36093, 22637, 49937, 16554, 2254, 4572, 1576, 890, 13852, 21459, 2169, 30440, 896, 2040, 41252, 9723, 50442, 16317, 3057, 1432, 904, 2710, 1933], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
English:
- "The medieval Arabic name of the northernmost of the three provinces of the Jazira, the other two being Diyar Mudar and Diyar Rabi'a"
- {'input_ids': [2034, 16522, 4490, 1270, 22040, 1837, 2340, 7960, 1183, 989, 10048, 2068, 90, 13377, 1183, 989, 8235, 14261, 1021, 7322, 1183, 989, 54, 18017, 17311, 24, 989, 3249, 5269, 8500, 48, 17821, 1294, 57, 3307, 1294, 1261, 48, 17821, 1294, 26438, 85, 19, 77], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
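The outputs above share one shape: three parallel lists (`input_ids`, `token_type_ids`, `attention_mask`) of equal length, with every ID below the vocabulary size. A minimal sketch of a sanity check on that shape follows. The helper `check_encoding` is hypothetical and not part of any library; the IDs are copied from the English example, while the all-zero `token_type_ids` and all-one `attention_mask` are rebuilt to match the pattern shown rather than copied.

```python
VOCAB_SIZE = 52000  # vocabulary size stated in this card


def check_encoding(enc, vocab_size=VOCAB_SIZE):
    """Return True if the three lists are aligned and all IDs fit the vocabulary."""
    ids = enc["input_ids"]
    same_len = len(ids) == len(enc["token_type_ids"]) == len(enc["attention_mask"])
    ids_in_range = all(0 <= i < vocab_size for i in ids)
    return same_len and ids_in_range


# input_ids copied from the English example above.
english_ids = [
    2034, 16522, 4490, 1270, 22040, 1837, 2340, 7960, 1183, 989, 10048,
    2068, 90, 13377, 1183, 989, 8235, 14261, 1021, 7322, 1183, 989, 54,
    18017, 17311, 24, 989, 3249, 5269, 8500, 48, 17821, 1294, 57, 3307,
    1294, 1261, 48, 17821, 1294, 26438, 85, 19, 77,
]

encoding = {
    "input_ids": english_ids,
    # Single-segment input: every token belongs to segment 0.
    "token_type_ids": [0] * len(english_ids),
    # No padding: every position is attended to.
    "attention_mask": [1] * len(english_ids),
}

print(check_encoding(encoding))
```

A check like this is useful after batching or padding, where misaligned masks are an easy mistake to make.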