rahulseetharaman committed on
Commit 78bff4d · verified · 1 parent: fbd9dde

Add new CrossEncoder model

Files changed (6)
  1. README.md +543 -0
  2. config.json +57 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +945 -0
README.md ADDED
@@ -0,0 +1,543 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:78704
- loss:RankNetLoss
base_model: jhu-clsp/ettin-encoder-68m
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on jhu-clsp/ettin-encoder-68m
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.5335
      name: Map
    - type: mrr@10
      value: 0.5229
      name: Mrr@10
    - type: ndcg@10
      value: 0.5843
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3515
      name: Map
    - type: mrr@10
      value: 0.5691
      name: Mrr@10
    - type: ndcg@10
      value: 0.3698
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.566
      name: Map
    - type: mrr@10
      value: 0.5751
      name: Mrr@10
    - type: ndcg@10
      value: 0.6239
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.4837
      name: Map
    - type: mrr@10
      value: 0.5557
      name: Mrr@10
    - type: ndcg@10
      value: 0.526
      name: Ndcg@10
---

# CrossEncoder based on jhu-clsp/ettin-encoder-68m

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model fine-tuned from [jhu-clsp/ettin-encoder-68m](https://huggingface.co/jhu-clsp/ettin-encoder-68m) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes a relevance score for a pair of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [jhu-clsp/ettin-encoder-68m](https://huggingface.co/jhu-clsp/ettin-encoder-68m) <!-- at revision ac19ae4bc51093b31c475665ac872a936d056cc2 -->
- **Maximum Sequence Length:** 7999 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("rahulseetharaman/reranker-msmarco-v1.1-ettin-encoder-68m-ranknet")
# Get scores for pairs of texts
pairs = [
    ['define monogenic trait', 'An allele is a version of a gene. For example, in fruitflies there is a gene which determines eye colour: one allele gives red eyes, and another gives white eyes; it is the same *gene*, just different versions of that gene. A monogenic trait is one which is encoded by a single gene. e.g. - cystic fibrosis in humans. There is a single gene which determines this trait: the wild-type allele is healthy, while the disease allele gives you cystic fibrosis'],
    ['define monogenic trait', 'Abstract. Monogenic inheritance refers to genetic control of a phenotype or trait by a single gene. For a monogenic trait, mutations in one (dominant) or both (recessive) copies of the gene are sufficient for the trait to be expressed. Digenic inheritance refers to mutation on two genes interacting to cause a genetic phenotype or disease. Triallelic inheritance is a special case of digenic inheritance that requires homozygous mutations at one locus and heterozygous mutations at a second locus to express a phenotype.'],
    ['define monogenic trait', 'A trait that is controlled by a group of nonallelic genes. Supplement. Polygenic traits are controlled by two or more than two genes (usually by many different genes) at different loci on different chromosomes. These genes are described as polygenes.'],
    ['define monogenic trait', "Monogenic Disorders (Single Abnormal Gene). Monogenic autosomal dominant disorders occur through the inheritance of a single copy of a defective gene. These disorders are the result of a single defective gene on the autosomes. They are inherited according to Mendel's Laws (Mendelian disorders). The mutation can be spontaneous and where there is no previous family history. Inheritance patterns can be autosomal dominant, autosomal recessive or X-linked recessive."],
    ['define monogenic trait', 'Adj. 1. monogenic-of or relating to an inheritable character that is controlled by a single pair of genes. genetic science, genetics-the branch of biology that studies heredity and variation in organisms. heritable, inheritable-capable of being inherited; inheritable traits such as eye color; an inheritable title. monogenic. adj. 1. (Genetics) genetics of or relating to an inherited character difference that is controlled by a single gene. 2. (Biology) (of animals) producing offspring of one sex. (ˌmɒn əˈdʒɛn ɪk).'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'define monogenic trait',
    [
        'An allele is a version of a gene. For example, in fruitflies there is a gene which determines eye colour: one allele gives red eyes, and another gives white eyes; it is the same *gene*, just different versions of that gene. A monogenic trait is one which is encoded by a single gene. e.g. - cystic fibrosis in humans. There is a single gene which determines this trait: the wild-type allele is healthy, while the disease allele gives you cystic fibrosis',
        'Abstract. Monogenic inheritance refers to genetic control of a phenotype or trait by a single gene. For a monogenic trait, mutations in one (dominant) or both (recessive) copies of the gene are sufficient for the trait to be expressed. Digenic inheritance refers to mutation on two genes interacting to cause a genetic phenotype or disease. Triallelic inheritance is a special case of digenic inheritance that requires homozygous mutations at one locus and heterozygous mutations at a second locus to express a phenotype.',
        'A trait that is controlled by a group of nonallelic genes. Supplement. Polygenic traits are controlled by two or more than two genes (usually by many different genes) at different loci on different chromosomes. These genes are described as polygenes.',
        "Monogenic Disorders (Single Abnormal Gene). Monogenic autosomal dominant disorders occur through the inheritance of a single copy of a defective gene. These disorders are the result of a single defective gene on the autosomes. They are inherited according to Mendel's Laws (Mendelian disorders). The mutation can be spontaneous and where there is no previous family history. Inheritance patterns can be autosomal dominant, autosomal recessive or X-linked recessive.",
        'Adj. 1. monogenic-of or relating to an inheritable character that is controlled by a single pair of genes. genetic science, genetics-the branch of biology that studies heredity and variation in organisms. heritable, inheritable-capable of being inherited; inheritable traits such as eye color; an inheritable title. monogenic. adj. 1. (Genetics) genetics of or relating to an inherited character difference that is controlled by a single gene. 2. (Biology) (of animals) producing offspring of one sex. (ˌmɒn əˈdʒɛn ɪk).',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5335 (+0.0440)     | 0.3515 (+0.0905)     | 0.5660 (+0.1464)     |
| mrr@10      | 0.5229 (+0.0454)     | 0.5691 (+0.0692)     | 0.5751 (+0.1484)     |
| **ndcg@10** | **0.5843 (+0.0439)** | **0.3698 (+0.0448)** | **0.6239 (+0.1232)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4837 (+0.0936)     |
| mrr@10      | 0.5557 (+0.0877)     |
| **ndcg@10** | **0.5260 (+0.0706)** |

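The reranking metrics above follow their standard definitions. As a rough illustration only (not the evaluator's actual implementation), mrr@10 and ndcg@10 for a single query can be sketched in plain Python, where `ranked_relevance` is a hypothetical list of binary relevance labels ordered by model score:

```python
import math

def mrr_at_k(ranked_relevance, k=10):
    # Reciprocal rank of the first relevant document within the top k
    for i, rel in enumerate(ranked_relevance[:k]):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0

def ndcg_at_k(ranked_relevance, k=10):
    # DCG of the predicted ranking, normalized by the ideal (sorted) DCG
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical relevance labels, in the order the reranker scored the documents
ranking = [0, 1, 0, 1, 0]
print(mrr_at_k(ranking))   # 0.5 — first relevant document sits at rank 2
print(ndcg_at_k(ranking))
```

The per-dataset NanoBEIR numbers are averaged to produce the `NanoBEIR_R100_mean` row.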
## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                                                                                    | labels                                                                                  |
  |:--------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 11 characters</li><li>mean: 32.93 characters</li><li>max: 95 characters</li></ul>  | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  |
* Samples:
  | query                                                                        | docs                | labels                            |
  |:-----------------------------------------------------------------------------|:--------------------|:----------------------------------|
  | <code>what does vegan mean</code>                                            | <code>['A vegan, a person who practices veganism, is an individual who actively avoids the use of animal products for food, clothing or any other purpose. As with many diets and lifestyles, not all vegans approach animal product avoidance in the same ways. For example, some vegans completely avoid all animal by-products, while others consider it acceptable to use honey, silk, and other by-products produced from insects.', 'Fruitarian: Eats only raw fruit, including raw nuts and seeds. Vegan. Does not eat dairy products, eggs, or any other animal product. So in a nutshell, a vegetarian diet excludes flesh, but includes other animal products: A vegan diet is one that excludes all animal products. And I have to say that I have met very few vegans who stop with what they put in their mouths. ', 'Animal Ingredients and Their Alternatives. Adopting a vegan diet means saying “no” to cruelty to animals and environmental destruction and “yes” to compassion and good health. It also means paying attent...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>difference between viral and bacterial conjunctivitis symptoms</code>  | <code>["Viral and bacterial conjunctivitis. Viral conjunctivitis and bacterial conjunctivitis may affect one or both eyes. Viral conjunctivitis usually produces a watery discharge. Bacterial conjunctivitis often produces a thicker, yellow-green discharge. Both types can be associated with colds or symptoms of a respiratory infection, such as a sore throat. Both viral and bacterial types are very contagious. They are spread through direct or indirect contact with the eye secretions of someone who's infected", 'A Honor Society of Nursing (STTI) answered. Viral and bacterial conjunctivitis are similar, but differ in several key ways. First, bacterial conjunctivitis can be cured with antibiotics, while the viral form cannot. Second, there is a slight variation in symptoms. With viral conjunctivitis, the discharge from the eye is clearer and less thick than with the bacterial infection. Viral conjunctivitis can also cause painful swelling in the lymph node nearest the ear, a symptom not experienc...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>can single member llc be taxed as s corp</code>                        | <code>['A single-member limited liability company, as a solely owned LLC is called, gives the owner a choice of how to be taxed -- as a sole proprietorship, an S corporation or a C corporation. The legal structure of the business itself doesn’t change with any of the choices. Under an S corporation classification, a single-member LLC needs to have a large enough profit in excess of the owner’s salary to realize any tax savings on passive income.', 'An S corp may own up to 100 percent of an LLC, or limited liability company. While all but single-member LLCs cannot be shareholders in S corporations, the reverse -- an S corporation owning an LLC -- is legal. The similarity of tax treatment for S corps and LLCs eliminates most of the common concerns about IRS issues. There is, however, one way for an LLC to own stock in an S corp. A single member LLC, taxed as a sole proprietorship, is called a disregarded entity by the IRS. Treated like an unincorporated individual, this LLC could own stock in ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>RankNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#ranknetloss) with these parameters:
  ```json
  {
      "k": null,
      "sigma": 1.0,
      "eps": 1e-10,
      "reduction_log": "binary",
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```

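RankNetLoss is a pairwise objective: for every pair of documents where one label is higher than the other, the model's score difference is passed through a sigmoid (scaled by `sigma`) and penalized with binary cross-entropy. A minimal pure-Python sketch of that idea follows; it is illustrative only, not the library's implementation, which operates on batched tensors and supports the `k`, `reduction_log`, and `activation_fn` options:

```python
import math

def ranknet_loss(scores, labels, sigma=1.0, eps=1e-10):
    # Pairwise RankNet: for every (i, j) where labels[i] > labels[j],
    # the model should assign document i a higher score than document j.
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # P(document i is ranked above document j) under the model
                p_ij = 1.0 / (1.0 + math.exp(-sigma * (scores[i] - scores[j])))
                # Binary cross-entropy against the target probability 1
                losses.append(-math.log(p_ij + eps))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical reranker scores and binary relevance labels for one query
scores = [2.1, -0.3, 0.4]
labels = [1, 0, 0]
print(ranknet_loss(scores, labels))  # ≈ 0.1273
```

The larger the margin between a relevant and a non-relevant document's scores, the smaller the loss, which is what pushes the reranker toward the correct ordering.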
### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                                                                                    | labels                                                                                  |
  |:--------|:-----------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.63 characters</li><li>max: 99 characters</li></ul>  | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  |
* Samples:
  | query                                                 | docs                | labels                            |
  |:------------------------------------------------------|:--------------------|:----------------------------------|
  | <code>define monogenic trait</code>                   | <code>['An allele is a version of a gene. For example, in fruitflies there is a gene which determines eye colour: one allele gives red eyes, and another gives white eyes; it is the same *gene*, just different versions of that gene. A monogenic trait is one which is encoded by a single gene. e.g. - cystic fibrosis in humans. There is a single gene which determines this trait: the wild-type allele is healthy, while the disease allele gives you cystic fibrosis', 'Abstract. Monogenic inheritance refers to genetic control of a phenotype or trait by a single gene. For a monogenic trait, mutations in one (dominant) or both (recessive) copies of the gene are sufficient for the trait to be expressed. Digenic inheritance refers to mutation on two genes interacting to cause a genetic phenotype or disease. Triallelic inheritance is a special case of digenic inheritance that requires homozygous mutations at one locus and heterozygous mutations at a second locus to express a phenotype.', 'A trait that is ...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
  | <code>behavioral theory definition</code>             | <code>["Not to be confused with Behavioralism. Behaviorism (or behaviourism) is an approach to psychology that focuses on an individual's behavior. It combines elements of philosophy, methodology, and psychological theory", 'The initial assumption is that behavior can be explained and further described using behavioral theories. For instance, John Watson and B.F. Skinner advocate the theory that behavior can be acquired through conditioning. Also known as general behavior theory. BEHAVIOR THEORY: Each behavioral theory is an advantage to learning, because it provides teachers with a new and different approach.. No related posts. ', 'behaviorism. noun be·hav·ior·ism. : a school of psychology that takes the objective evidence of behavior (as measured responses to stimuli) as the only concern of its research and the only basis of its theory without reference to conscious experience—compare cognitive psychology. : a school of psychology that takes the objective evidence of behavior (as measured ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>What is a disease that is pleiotropic?</code>   | <code>['Unsourced material may be challenged and removed. (September 2013). Pleiotropy occurs when one gene influences two or more seemingly unrelated phenotypic traits, an example being phenylketonuria, which is a human disease that affects multiple systems but is caused by one gene defect. Consequently, a mutation in a pleiotropic gene may have an effect on some or all traits simultaneously. The underlying mechanism is that the gene codes for a product that is, for example, used by various cells, or has a signaling function on various targets. A classic example of pleiotropy is the human disease phenylketonuria (PKU).', 'Pleiotropic, autosomal dominant disorder affecting connective tissue: Related Diseases. Pleiotropic, autosomal dominant disorder affecting connective tissue: Pleiotropic, autosomal dominant disorder affecting connective tissue is listed as a type of (or associated with) the following medical conditions in our database: 1 Heart conditions. Office of Rare Diseases (ORD) of ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>RankNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#ranknetloss) with these parameters:
  ```json
  {
      "k": null,
      "sigma": 1.0,
      "eps": 1e-10,
      "reduction_log": "binary",
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

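With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps linearly from 0 up to the peak 2e-05 over the first 10% of training steps, then decays linearly back to 0. A small illustrative sketch of that schedule (the step count is an assumption: 78,704 samples at batch size 16 is roughly 4,919 steps for one epoch):

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    # Linear warmup over the first `warmup_ratio` of training,
    # then linear decay to zero, mirroring the `linear` scheduler.
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total = 4919  # ~78,704 samples / batch size 16 for 1 epoch
print(linear_schedule_lr(0, total))      # 0.0 at the start of warmup
print(linear_schedule_lr(491, total))    # peak learning rate at the end of warmup
print(linear_schedule_lr(total, total))  # 0.0 at the end of training
```

The warmup phase here covers roughly the first 500 logged steps, which lines up with the rapid loss drop visible early in the training logs below.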
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0442 (-0.4962)         | 0.2555 (-0.0695)          | 0.0464 (-0.4542)     | 0.1154 (-0.3400)           |
| 0.0002     | 1        | 1.2939        | -               | -                        | -                         | -                    | -                          |
| 0.0203     | 100      | 1.1672        | 1.0078          | 0.0786 (-0.4619)         | 0.2589 (-0.0661)          | 0.0735 (-0.4272)     | 0.1370 (-0.3184)           |
| 0.0407     | 200      | 0.976         | 0.9437          | 0.1044 (-0.4360)         | 0.2471 (-0.0779)          | 0.1236 (-0.3770)     | 0.1584 (-0.2970)           |
| 0.0610     | 300      | 0.9437        | 0.9052          | 0.1082 (-0.4323)         | 0.2184 (-0.1066)          | 0.1583 (-0.3424)     | 0.1616 (-0.2937)           |
| 0.0813     | 400      | 0.9031        | 0.8531          | 0.2337 (-0.3068)         | 0.2189 (-0.1061)          | 0.2840 (-0.2166)     | 0.2455 (-0.2098)           |
| 0.1016     | 500      | 0.7811        | 0.7722          | 0.4329 (-0.1075)         | 0.2982 (-0.0269)          | 0.4438 (-0.0569)     | 0.3916 (-0.0638)           |
| 0.1220     | 600      | 0.7444        | 0.7193          | 0.5069 (-0.0336)         | 0.3209 (-0.0041)          | 0.4896 (-0.0110)     | 0.4391 (-0.0162)           |
| 0.1423     | 700      | 0.7178        | 0.6976          | 0.4949 (-0.0456)         | 0.3468 (+0.0218)          | 0.4743 (-0.0264)     | 0.4387 (-0.0167)           |
| 0.1626     | 800      | 0.6909        | 0.6992          | 0.5257 (-0.0147)         | 0.3709 (+0.0459)          | 0.4752 (-0.0255)     | 0.4573 (+0.0019)           |
| 0.1830     | 900      | 0.6994        | 0.6759          | 0.5011 (-0.0393)         | 0.3648 (+0.0397)          | 0.5058 (+0.0051)     | 0.4572 (+0.0018)           |
| 0.2033     | 1000     | 0.6688        | 0.6698          | 0.4638 (-0.0766)         | 0.3637 (+0.0387)          | 0.4537 (-0.0469)     | 0.4271 (-0.0283)           |
| 0.2236     | 1100     | 0.6692        | 0.6645          | 0.5275 (-0.0129)         | 0.3617 (+0.0366)          | 0.5123 (+0.0116)     | 0.4671 (+0.0118)           |
| 0.2440     | 1200     | 0.634         | 0.6539          | 0.5276 (-0.0128)         | 0.3638 (+0.0388)          | 0.5515 (+0.0508)     | 0.4810 (+0.0256)           |
| 0.2643     | 1300     | 0.6598        | 0.6531          | 0.4998 (-0.0406)         | 0.3504 (+0.0254)          | 0.4976 (-0.0031)     | 0.4493 (-0.0061)           |
| 0.2846     | 1400     | 0.6627        | 0.6498          | 0.5584 (+0.0180)         | 0.3558 (+0.0308)          | 0.5260 (+0.0254)     | 0.4801 (+0.0247)           |
| 0.3049     | 1500     | 0.6462        | 0.6471          | 0.5633 (+0.0228)         | 0.3565 (+0.0315)          | 0.5507 (+0.0500)     | 0.4901 (+0.0348)           |
| 0.3253     | 1600     | 0.6361        | 0.6309          | 0.5512 (+0.0108)         | 0.3538 (+0.0288)          | 0.5308 (+0.0302)     | 0.4786 (+0.0233)           |
| 0.3456     | 1700     | 0.6265        | 0.6390          | 0.5841 (+0.0437)         | 0.3739 (+0.0489)          | 0.5701 (+0.0695)     | 0.5094 (+0.0540)           |
| 0.3659     | 1800     | 0.6358        | 0.6424          | 0.5745 (+0.0341)         | 0.3478 (+0.0227)          | 0.5834 (+0.0827)     | 0.5019 (+0.0465)           |
| 0.3863     | 1900     | 0.598         | 0.6373          | 0.5681 (+0.0276)         | 0.3501 (+0.0250)          | 0.5771 (+0.0764)     | 0.4984 (+0.0430)           |
| 0.4066     | 2000     | 0.6306        | 0.6326          | 0.5927 (+0.0523)         | 0.3503 (+0.0253)          | 0.5630 (+0.0624)     | 0.5020 (+0.0467)           |
| 0.4269     | 2100     | 0.6465        | 0.6250          | 0.5726 (+0.0322)         | 0.3638 (+0.0387)          | 0.5667 (+0.0661)     | 0.5010 (+0.0457)           |
| 0.4472     | 2200     | 0.6338        | 0.6222          | 0.5496 (+0.0092)         | 0.3661 (+0.0410)          | 0.5686 (+0.0680)     | 0.4948 (+0.0394)           |
| 0.4676     | 2300     | 0.6272        | 0.6272          | 0.5597 (+0.0192)         | 0.3692 (+0.0442)          | 0.5781 (+0.0774)     | 0.5023 (+0.0469)           |
| 0.4879     | 2400     | 0.6218        | 0.6364          | 0.5582 (+0.0178)         | 0.3776 (+0.0526)          | 0.5914 (+0.0908)     | 0.5091 (+0.0537)           |
| 0.5082     | 2500     | 0.6127        | 0.6242          | 0.5361 (-0.0043)         | 0.3793 (+0.0543)          | 0.6110 (+0.1103)     | 0.5088 (+0.0534)           |
| 0.5286     | 2600     | 0.6262        | 0.6287          | 0.5269 (-0.0135)         | 0.3625 (+0.0375)          | 0.5949 (+0.0943)     | 0.4948 (+0.0394)           |
| 0.5489     | 2700     | 0.617         | 0.6263          | 0.5200 (-0.0205)         | 0.3647 (+0.0396)          | 0.5840 (+0.0833)     | 0.4895 (+0.0342)           |
| 0.5692     | 2800     | 0.5798        | 0.6240          | 0.5590 (+0.0185)         | 0.3728 (+0.0478)          | 0.5750 (+0.0743)     | 0.5022 (+0.0469)           |
| 0.5896     | 2900     | 0.6122        | 0.6219          | 0.5499 (+0.0095)         | 0.3721 (+0.0471)          | 0.6149 (+0.1142)     | 0.5123 (+0.0569)           |
| 0.6099     | 3000     | 0.6283        | 0.6176          | 0.5423 (+0.0018)         | 0.3819 (+0.0569)          | 0.6388 (+0.1381)     | 0.5210 (+0.0656)           |
| 0.6302     | 3100     | 0.6083        | 0.6151          | 0.5520 (+0.0116)         | 0.3644 (+0.0393)          | 0.6047 (+0.1040)     | 0.5070 (+0.0516)           |
| 0.6505     | 3200     | 0.6264        | 0.6130          | 0.5534 (+0.0129)         | 0.3582 (+0.0331)          | 0.5629 (+0.0622)     | 0.4915 (+0.0361)           |
| 0.6709     | 3300     | 0.6132        | 0.6115          | 0.5744 (+0.0340)         | 0.3572 (+0.0322)          | 0.5860 (+0.0854)     | 0.5059 (+0.0505)           |
| 0.6912     | 3400     | 0.6215        | 0.6072          | 0.5700 (+0.0296)         | 0.3594 (+0.0344)          | 0.5922 (+0.0916)     | 0.5072 (+0.0518)           |
471
+ | 0.7115 | 3500 | 0.598 | 0.6058 | 0.5705 (+0.0301) | 0.3446 (+0.0196) | 0.5973 (+0.0967) | 0.5042 (+0.0488) |
472
+ | 0.7319 | 3600 | 0.6 | 0.6073 | 0.5676 (+0.0272) | 0.3495 (+0.0245) | 0.6222 (+0.1216) | 0.5131 (+0.0577) |
473
+ | 0.7522 | 3700 | 0.599 | 0.6088 | 0.5799 (+0.0395) | 0.3619 (+0.0369) | 0.6129 (+0.1123) | 0.5183 (+0.0629) |
474
+ | 0.7725 | 3800 | 0.6263 | 0.6085 | 0.5856 (+0.0452) | 0.3665 (+0.0415) | 0.6178 (+0.1171) | 0.5233 (+0.0679) |
475
+ | 0.7928 | 3900 | 0.6031 | 0.6089 | 0.5798 (+0.0394) | 0.3645 (+0.0395) | 0.6005 (+0.0999) | 0.5149 (+0.0596) |
476
+ | 0.8132 | 4000 | 0.5976 | 0.6082 | 0.5921 (+0.0517) | 0.3636 (+0.0386) | 0.5871 (+0.0864) | 0.5143 (+0.0589) |
477
+ | 0.8335 | 4100 | 0.5855 | 0.6092 | 0.5861 (+0.0456) | 0.3604 (+0.0353) | 0.5736 (+0.0729) | 0.5067 (+0.0513) |
478
+ | 0.8538 | 4200 | 0.6075 | 0.6055 | 0.5793 (+0.0388) | 0.3602 (+0.0351) | 0.5969 (+0.0962) | 0.5121 (+0.0567) |
479
+ | 0.8742 | 4300 | 0.5782 | 0.6079 | 0.5930 (+0.0526) | 0.3663 (+0.0412) | 0.6028 (+0.1022) | 0.5207 (+0.0653) |
480
+ | 0.8945 | 4400 | 0.5937 | 0.6042 | 0.5921 (+0.0516) | 0.3681 (+0.0430) | 0.6048 (+0.1042) | 0.5217 (+0.0663) |
481
+ | **0.9148** | **4500** | **0.6006** | **0.6015** | **0.5843 (+0.0439)** | **0.3698 (+0.0448)** | **0.6239 (+0.1232)** | **0.5260 (+0.0706)** |
482
+ | 0.9351 | 4600 | 0.5772 | 0.6010 | 0.5944 (+0.0540) | 0.3656 (+0.0406) | 0.5972 (+0.0965) | 0.5191 (+0.0637) |
483
+ | 0.9555 | 4700 | 0.6231 | 0.5998 | 0.5873 (+0.0468) | 0.3580 (+0.0330) | 0.5952 (+0.0946) | 0.5135 (+0.0581) |
484
+ | 0.9758 | 4800 | 0.5871 | 0.5994 | 0.5762 (+0.0357) | 0.3678 (+0.0427) | 0.5944 (+0.0937) | 0.5128 (+0.0574) |
485
+ | 0.9961 | 4900 | 0.5873 | 0.5995 | 0.5835 (+0.0431) | 0.3616 (+0.0365) | 0.6006 (+0.1000) | 0.5152 (+0.0599) |
486
+ | -1 | -1 | - | - | 0.5843 (+0.0439) | 0.3698 (+0.0448) | 0.6239 (+0.1232) | 0.5260 (+0.0706) |
487
+
488
+ * The bold row denotes the saved checkpoint.
489
+
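+ The reranking columns above report NDCG@10 on each evaluation set (the parenthesized values are deltas against a reference score). For orientation, NDCG@k can be sketched in a few lines of plain Python; this is an illustrative reference implementation, not code from this repository:
+
+ ```python
+ import math
+
+ def dcg_at_k(relevances, k=10):
+     # Discounted cumulative gain: each item's relevance is discounted
+     # by the log of its rank position (rank 0 -> log2(2), etc.).
+     return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))
+
+ def ndcg_at_k(relevances, k=10):
+     # Normalize by the DCG of the ideal (descending-relevance) ordering.
+     ideal = dcg_at_k(sorted(relevances, reverse=True), k)
+     return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
+
+ # Ranking the only relevant document first is perfect (1.0);
+ # ranking it lower yields a smaller score.
+ perfect = ndcg_at_k([1, 0, 0])
+ worse = ndcg_at_k([0, 0, 1])
+ ```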
+ ### Framework Versions
+ - Python: 3.10.18
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.56.0.dev0
+ - PyTorch: 2.7.1+cu126
+ - Accelerate: 1.9.0
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### RankNetLoss
+ ```bibtex
+ @inproceedings{burges2005learning,
+     title={Learning to Rank using Gradient Descent},
+     author={Burges, Chris and Shaked, Tal and Renshaw, Erin and Lazier, Ari and Deeds, Matt and Hamilton, Nicole and Hullender, Greg},
+     booktitle={Proceedings of the 22nd international conference on Machine learning},
+     pages={89--96},
+     year={2005}
+ }
+ ```
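+
+ For a single document pair, the RankNet objective cited above reduces to a logistic loss on the score difference between the positive and the negative document. A minimal sketch in plain Python (simplified with sigmoid scale 1; this is not the sentence-transformers implementation):
+
+ ```python
+ import math
+
+ def ranknet_pair_loss(score_pos, score_neg):
+     # Pairwise logistic loss from Burges et al. (2005): the loss is large
+     # when the negative document outscores the positive, and decays toward
+     # zero as the positive's margin over the negative grows.
+     return math.log(1.0 + math.exp(-(score_pos - score_neg)))
+
+ # Loss for margins of +2.0, +0.5, and -0.5 (negative outranks positive).
+ losses = [ranknet_pair_loss(margin, 0.0) for margin in (2.0, 0.5, -0.5)]
+ ```
+
+ At a margin of zero the loss equals log 2, and it grows roughly linearly as the pair becomes more misordered, which is what makes the objective well suited to training pairwise rerankers.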
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "architectures": [
+     "ModernBertForSequenceClassification"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 50281,
+   "causal_mask": false,
+   "classifier_activation": "gelu",
+   "classifier_bias": false,
+   "classifier_dropout": 0.0,
+   "classifier_pooling": "mean",
+   "cls_token_id": 50281,
+   "decoder_bias": true,
+   "deterministic_flash_attn": false,
+   "embedding_dropout": 0.0,
+   "eos_token_id": 50282,
+   "global_attn_every_n_layers": 3,
+   "global_rope_theta": 160000.0,
+   "gradient_checkpointing": false,
+   "hidden_activation": "gelu",
+   "hidden_size": 512,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_cutoff_factor": 2.0,
+   "initializer_range": 0.02,
+   "intermediate_size": 768,
+   "is_causal": false,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-05,
+   "local_attention": 128,
+   "local_rope_theta": 160000.0,
+   "max_position_embeddings": 7999,
+   "mlp_bias": false,
+   "mlp_dropout": 0.0,
+   "model_type": "modernbert",
+   "norm_bias": false,
+   "norm_eps": 1e-05,
+   "num_attention_heads": 8,
+   "num_hidden_layers": 19,
+   "pad_token_id": 50283,
+   "position_embedding_type": "sans_pos",
+   "repad_logits_with_grad": false,
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "5.0.0"
+   },
+   "sep_token_id": 50282,
+   "sparse_pred_ignore_index": -100,
+   "sparse_prediction": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.56.0.dev0",
+   "vocab_size": 50368
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a79ab093a2dc98209942fd8375949a327117e85735d7ac4a755c83f632491292
+ size 273643524
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "|||IP_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "1": {
+       "content": "<|padding|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50254": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50255": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50256": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50257": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50258": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50259": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50260": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50261": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50262": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50263": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50264": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50265": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50266": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50267": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50268": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50269": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50270": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50271": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50272": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50273": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50274": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50275": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50276": {
+       "content": " ",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50277": {
+       "content": "|||EMAIL_ADDRESS|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50278": {
+       "content": "|||PHONE_NUMBER|||",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50279": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50280": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50281": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50282": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50283": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50284": {
+       "content": "[MASK]",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "50285": {
+       "content": "[unused0]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50286": {
+       "content": "[unused1]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50287": {
+       "content": "[unused2]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50288": {
+       "content": "[unused3]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50289": {
+       "content": "[unused4]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50290": {
+       "content": "[unused5]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50291": {
+       "content": "[unused6]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50292": {
+       "content": "[unused7]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50293": {
+       "content": "[unused8]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50294": {
+       "content": "[unused9]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50295": {
+       "content": "[unused10]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50296": {
+       "content": "[unused11]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50297": {
+       "content": "[unused12]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50298": {
+       "content": "[unused13]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50299": {
+       "content": "[unused14]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50300": {
+       "content": "[unused15]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50301": {
+       "content": "[unused16]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50302": {
+       "content": "[unused17]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50303": {
+       "content": "[unused18]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50304": {
+       "content": "[unused19]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50305": {
+       "content": "[unused20]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50306": {
+       "content": "[unused21]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50307": {
+       "content": "[unused22]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50308": {
+       "content": "[unused23]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50309": {
+       "content": "[unused24]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50310": {
+       "content": "[unused25]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50311": {
+       "content": "[unused26]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50312": {
+       "content": "[unused27]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50313": {
+       "content": "[unused28]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50314": {
+       "content": "[unused29]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50315": {
+       "content": "[unused30]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50316": {
+       "content": "[unused31]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50317": {
+       "content": "[unused32]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50318": {
+       "content": "[unused33]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50319": {
+       "content": "[unused34]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50320": {
+       "content": "[unused35]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50321": {
+       "content": "[unused36]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50322": {
+       "content": "[unused37]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50323": {
+       "content": "[unused38]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50324": {
+       "content": "[unused39]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50325": {
+       "content": "[unused40]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50326": {
+       "content": "[unused41]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50327": {
+       "content": "[unused42]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50328": {
+       "content": "[unused43]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50329": {
+       "content": "[unused44]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50330": {
+       "content": "[unused45]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50331": {
+       "content": "[unused46]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50332": {
+       "content": "[unused47]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50333": {
+       "content": "[unused48]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50334": {
+       "content": "[unused49]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50335": {
+       "content": "[unused50]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50336": {
+       "content": "[unused51]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50337": {
+       "content": "[unused52]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50338": {
+       "content": "[unused53]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50339": {
+       "content": "[unused54]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50340": {
+       "content": "[unused55]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50341": {
+       "content": "[unused56]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50342": {
+       "content": "[unused57]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50343": {
+       "content": "[unused58]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50344": {
+       "content": "[unused59]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50345": {
+       "content": "[unused60]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50346": {
+       "content": "[unused61]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50347": {
+       "content": "[unused62]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50348": {
+       "content": "[unused63]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50349": {
+       "content": "[unused64]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50350": {
+       "content": "[unused65]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50351": {
+       "content": "[unused66]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50352": {
+       "content": "[unused67]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50353": {
+       "content": "[unused68]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50354": {
+       "content": "[unused69]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50355": {
+       "content": "[unused70]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50356": {
+       "content": "[unused71]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50357": {
+       "content": "[unused72]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50358": {
+       "content": "[unused73]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50359": {
+       "content": "[unused74]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50360": {
+       "content": "[unused75]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50361": {
+       "content": "[unused76]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50362": {
+       "content": "[unused77]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50363": {
+       "content": "[unused78]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50364": {
+       "content": "[unused79]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50365": {
+       "content": "[unused80]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50366": {
+       "content": "[unused81]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "50367": {
+       "content": "[unused82]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_input_names": [
+     "input_ids",
+     "attention_mask"
+   ],
+   "model_max_length": 7999,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "[UNK]"
+ }