AdamLucek committed
Commit b5f26bd · verified · 1 Parent(s): fb0ad0e

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
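
The pooling configuration above enables only `pooling_mode_mean_tokens`, so each sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation: `mean_pool` and the toy inputs are illustrative names, and the real module operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask: list of 0/1 flags, where 0 marks a padding token.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    kept = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    kept = max(kept, 1)
    return [total / kept for total in summed]


# Toy example: three tokens of dimension 2; the last token is padding
# and must not influence the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Masking matters here: without it, padding tokens in a batch would drag the average toward arbitrary values.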
README.md ADDED
@@ -0,0 +1,758 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - sentence-transformers
7
+ - sentence-similarity
8
+ - feature-extraction
9
+ - generated_from_trainer
10
+ - dataset_size:8760
11
+ - loss:MatryoshkaLoss
12
+ - loss:MultipleNegativesRankingLoss
13
+ base_model: nomic-ai/modernbert-embed-base
14
+ widget:
15
+ - source_sentence: What is the interpretation described as inappropriate?
16
+ sentences:
17
+ - . Factors to be considered in determining the reasonableness of the lawyer’s expectation
18
+ of confidentiality include the sensitivity of the information and the extent to
19
+ which the privacy of the communication is protected by law or by a confidentiality
20
+ agreement
21
+ - . 20 of competition and rests on an inappropriate interpretation of SBA regulation
22
+ 13 C.F.R. § 125.9(b)(3)(i). See SHS MJAR at 16–23; VCH MJAR at 16–23
23
+ - . 29-2, the CIA’s declaration explains in much more detail what is meant by “intelligence
24
+ sources and methods” or “intelligence activities,” see Third Lutz Decl. ¶–30
25
+ - source_sentence: What is the source of the information regarding Senetas's knowledge
26
+ about FDA approval?
27
+ sentences:
28
+ - . . . the exemption under which the deletion is made, shall be indicated at the
29
+ place in the record where such deletion is made.” Id. Finally, the FOIA provides
30
+ that “a court shall accord substantial weight to an affidavit of an agency concerning
31
+ the agency’s determination as to technical feasibility under . . . subsection
32
+ (b).” Id. § 552(a)(4)(B)
33
+ - . 52 Senetas asserts that it learned about the plan to discontinue seeking FDA
34
+ approval for DR’s products in September of 2018 after the decision had been made
35
+ without any Board involvement. Galbally Dep. Tr. 66:19-23
36
+ - . Conclusion Video footage, like social media evidence, is susceptible to alteration,
37
+ and the increased availability of new technology, particularly the advent of image-generating
38
+ artificial intelligence, may present unique challenges in authenticating videos
39
+ and photographs
40
+ - source_sentence: What does Class Deviation CD-2020-14 allow for at the contract
41
+ level?
42
+ sentences:
43
+ - social media company that 7At trial, the State had attempted to introduce evidence
44
+ that was purportedly a printout from the MySpace page of the girlfriend of the
45
+ defendant (whose nickname was allegedly “Boozy”) to demonstrate that the girlfriend
46
+ had threatened a State’s witness
47
+ - .” Supplement 2 to Class Deviation CD-2020-14 (Supplement 2), AR at 2904. The
48
+ Senior Procurement Executive further elaborated that Class Deviation CD-2020-14
49
+ “allowed for the use of ‘unpriced labor’ categories at the contract level for
50
+ certain IDIQ multiple-award contracts.” Id
51
+ - . Circuit has recognized that, separate from claims seeking relief for specific
52
+ requests made under the FOIA, requesting parties may also assert a “claim that
53
+ an agency policy or practice will impair the party’s lawful access to information
54
+ in the future.” Payne Enters., Inc. v. United States, 837 F.2d 486, 491 (D.C.
55
+ Cir. 1988) (emphasis in original); 31 accord Newport Aeronautical Sales v
56
+ - source_sentence: What should the agency describe about the non-exempt material in
57
+ a document?
58
+ sentences:
59
+ - . A straightforward reading of the 2019 NDAA reveals that the Commission’s members
60
+ are “temporary” federal employees. The Commission “shall be considered . . . a
61
+ temporary organization under [5 U.S.C. § 3161].” Pub. L. No. 115-232, § 1051(a)(2).
62
+ The Commission’s 15 members are “appointed for the life of the Commission” and
63
+ are “Federal employees.” Id. § 1051(a)(4)(A), (6)–(7)
64
+ - .15 Posteriormente, en armonía con el marco constitucional y doctrinario previamente
65
+ reseñado, el 13 de julio de 2011, nuestra Legislatura aprobó, la Ley del Derecho
66
+ sobre la Propia Imagen o Ley Núm. 139-201116. Dicho precepto legal estatuye una
67
+ causa de acción en daños y perjuicios debido al uso no autorizado de la imagen
68
+ con fines comerciales o publicitarios
69
+ - . To this end, the Circuit has said that “[i]n addition to a statement of its
70
+ reasons, an agency should also describe what proportion of the information in
71
+ a document is non-exempt and how that material is dispersed throughout the document.”
72
+ Id
73
+ - source_sentence: Which offeror is mentioned as getting in if there is a points discrepancy?
74
+ sentences:
75
+ - . at 9:14–19 (“[I]f an offeror does not have the same number of points, if it’s
76
+ the 130th offeror and it doesn’t have the same number of points as the 90th offeror,
77
+ then the solicitation says the 90th offeror gets in and the 130th doesn’t.”)
78
+ - '. But the State had to establish that the communications were the handiwork of
79
+ the defendant. It was in that context that temporal proximity came into play:
80
+ The timing of the communications relative to other events connecting the defendant
81
+ to the alleged crime was circumstantial evidence of the defendant’s authorship.
82
+ Id. at 674-76'
83
+ - . Since the plaintiff does not address this issue in its sur-reply brief in No.
84
+ 11-445, and because the plaintiff does not ask the Court to direct the DOJ to
85
+ produce Document 3 to the plaintiff, the plaintiff does not appear to continue
86
+ to challenge the DOJ’s decision to withhold Document 3. 140 recorded decision
87
+ to implement the opinion.” Id. at 32
88
+ pipeline_tag: sentence-similarity
89
+ library_name: sentence-transformers
90
+ metrics:
91
+ - cosine_accuracy@1
92
+ - cosine_accuracy@3
93
+ - cosine_accuracy@5
94
+ - cosine_accuracy@10
95
+ - cosine_precision@1
96
+ - cosine_precision@3
97
+ - cosine_precision@5
98
+ - cosine_precision@10
99
+ - cosine_recall@1
100
+ - cosine_recall@3
101
+ - cosine_recall@5
102
+ - cosine_recall@10
103
+ - cosine_ndcg@10
104
+ - cosine_mrr@10
105
+ - cosine_map@100
106
+ model-index:
107
+ - name: Fine-tuned with [QuicKB](https://github.com/ALucek/QuicKB)
108
+ results:
109
+ - task:
110
+ type: information-retrieval
111
+ name: Information Retrieval
112
+ dataset:
113
+ name: dim 768
114
+ type: dim_768
115
+ metrics:
116
+ - type: cosine_accuracy@1
117
+ value: 0.582135523613963
118
+ name: Cosine Accuracy@1
119
+ - type: cosine_accuracy@3
120
+ value: 0.7494866529774127
121
+ name: Cosine Accuracy@3
122
+ - type: cosine_accuracy@5
123
+ value: 0.795687885010267
124
+ name: Cosine Accuracy@5
125
+ - type: cosine_accuracy@10
126
+ value: 0.8572895277207392
127
+ name: Cosine Accuracy@10
128
+ - type: cosine_precision@1
129
+ value: 0.582135523613963
130
+ name: Cosine Precision@1
131
+ - type: cosine_precision@3
132
+ value: 0.24982888432580422
133
+ name: Cosine Precision@3
134
+ - type: cosine_precision@5
135
+ value: 0.1591375770020534
136
+ name: Cosine Precision@5
137
+ - type: cosine_precision@10
138
+ value: 0.08572895277207392
139
+ name: Cosine Precision@10
140
+ - type: cosine_recall@1
141
+ value: 0.582135523613963
142
+ name: Cosine Recall@1
143
+ - type: cosine_recall@3
144
+ value: 0.7494866529774127
145
+ name: Cosine Recall@3
146
+ - type: cosine_recall@5
147
+ value: 0.795687885010267
148
+ name: Cosine Recall@5
149
+ - type: cosine_recall@10
150
+ value: 0.8572895277207392
151
+ name: Cosine Recall@10
152
+ - type: cosine_ndcg@10
153
+ value: 0.7211793259435271
154
+ name: Cosine Ndcg@10
155
+ - type: cosine_mrr@10
156
+ value: 0.6775296600501939
157
+ name: Cosine Mrr@10
158
+ - type: cosine_map@100
159
+ value: 0.6827316333877884
160
+ name: Cosine Map@100
161
+ - task:
162
+ type: information-retrieval
163
+ name: Information Retrieval
164
+ dataset:
165
+ name: dim 512
166
+ type: dim_512
167
+ metrics:
168
+ - type: cosine_accuracy@1
169
+ value: 0.5657084188911704
170
+ name: Cosine Accuracy@1
171
+ - type: cosine_accuracy@3
172
+ value: 0.7330595482546202
173
+ name: Cosine Accuracy@3
174
+ - type: cosine_accuracy@5
175
+ value: 0.7915811088295688
176
+ name: Cosine Accuracy@5
177
+ - type: cosine_accuracy@10
178
+ value: 0.8531827515400411
179
+ name: Cosine Accuracy@10
180
+ - type: cosine_precision@1
181
+ value: 0.5657084188911704
182
+ name: Cosine Precision@1
183
+ - type: cosine_precision@3
184
+ value: 0.24435318275154005
185
+ name: Cosine Precision@3
186
+ - type: cosine_precision@5
187
+ value: 0.15831622176591376
188
+ name: Cosine Precision@5
189
+ - type: cosine_precision@10
190
+ value: 0.08531827515400411
191
+ name: Cosine Precision@10
192
+ - type: cosine_recall@1
193
+ value: 0.5657084188911704
194
+ name: Cosine Recall@1
195
+ - type: cosine_recall@3
196
+ value: 0.7330595482546202
197
+ name: Cosine Recall@3
198
+ - type: cosine_recall@5
199
+ value: 0.7915811088295688
200
+ name: Cosine Recall@5
201
+ - type: cosine_recall@10
202
+ value: 0.8531827515400411
203
+ name: Cosine Recall@10
204
+ - type: cosine_ndcg@10
205
+ value: 0.7102670568981261
206
+ name: Cosine Ndcg@10
207
+ - type: cosine_mrr@10
208
+ value: 0.6645362765229291
209
+ name: Cosine Mrr@10
210
+ - type: cosine_map@100
211
+ value: 0.6695389256684248
212
+ name: Cosine Map@100
213
+ - task:
214
+ type: information-retrieval
215
+ name: Information Retrieval
216
+ dataset:
217
+ name: dim 256
218
+ type: dim_256
219
+ metrics:
220
+ - type: cosine_accuracy@1
221
+ value: 0.5410677618069816
222
+ name: Cosine Accuracy@1
223
+ - type: cosine_accuracy@3
224
+ value: 0.7063655030800822
225
+ name: Cosine Accuracy@3
226
+ - type: cosine_accuracy@5
227
+ value: 0.7659137577002053
228
+ name: Cosine Accuracy@5
229
+ - type: cosine_accuracy@10
230
+ value: 0.8305954825462012
231
+ name: Cosine Accuracy@10
232
+ - type: cosine_precision@1
233
+ value: 0.5410677618069816
234
+ name: Cosine Precision@1
235
+ - type: cosine_precision@3
236
+ value: 0.2354551676933607
237
+ name: Cosine Precision@3
238
+ - type: cosine_precision@5
239
+ value: 0.15318275154004105
240
+ name: Cosine Precision@5
241
+ - type: cosine_precision@10
242
+ value: 0.08305954825462013
243
+ name: Cosine Precision@10
244
+ - type: cosine_recall@1
245
+ value: 0.5410677618069816
246
+ name: Cosine Recall@1
247
+ - type: cosine_recall@3
248
+ value: 0.7063655030800822
249
+ name: Cosine Recall@3
250
+ - type: cosine_recall@5
251
+ value: 0.7659137577002053
252
+ name: Cosine Recall@5
253
+ - type: cosine_recall@10
254
+ value: 0.8305954825462012
255
+ name: Cosine Recall@10
256
+ - type: cosine_ndcg@10
257
+ value: 0.6839216686374571
258
+ name: Cosine Ndcg@10
259
+ - type: cosine_mrr@10
260
+ value: 0.6371842508392814
261
+ name: Cosine Mrr@10
262
+ - type: cosine_map@100
263
+ value: 0.6427516419970609
264
+ name: Cosine Map@100
265
+ - task:
266
+ type: information-retrieval
267
+ name: Information Retrieval
268
+ dataset:
269
+ name: dim 128
270
+ type: dim_128
271
+ metrics:
272
+ - type: cosine_accuracy@1
273
+ value: 0.4887063655030801
274
+ name: Cosine Accuracy@1
275
+ - type: cosine_accuracy@3
276
+ value: 0.6581108829568788
277
+ name: Cosine Accuracy@3
278
+ - type: cosine_accuracy@5
279
+ value: 0.7176591375770021
280
+ name: Cosine Accuracy@5
281
+ - type: cosine_accuracy@10
282
+ value: 0.7802874743326489
283
+ name: Cosine Accuracy@10
284
+ - type: cosine_precision@1
285
+ value: 0.4887063655030801
286
+ name: Cosine Precision@1
287
+ - type: cosine_precision@3
288
+ value: 0.2193702943189596
289
+ name: Cosine Precision@3
290
+ - type: cosine_precision@5
291
+ value: 0.14353182751540042
292
+ name: Cosine Precision@5
293
+ - type: cosine_precision@10
294
+ value: 0.07802874743326488
295
+ name: Cosine Precision@10
296
+ - type: cosine_recall@1
297
+ value: 0.4887063655030801
298
+ name: Cosine Recall@1
299
+ - type: cosine_recall@3
300
+ value: 0.6581108829568788
301
+ name: Cosine Recall@3
302
+ - type: cosine_recall@5
303
+ value: 0.7176591375770021
304
+ name: Cosine Recall@5
305
+ - type: cosine_recall@10
306
+ value: 0.7802874743326489
307
+ name: Cosine Recall@10
308
+ - type: cosine_ndcg@10
309
+ value: 0.6318826024721981
310
+ name: Cosine Ndcg@10
311
+ - type: cosine_mrr@10
312
+ value: 0.5846004041589256
313
+ name: Cosine Mrr@10
314
+ - type: cosine_map@100
315
+ value: 0.5917468903182894
316
+ name: Cosine Map@100
317
+ - task:
318
+ type: information-retrieval
319
+ name: Information Retrieval
320
+ dataset:
321
+ name: dim 64
322
+ type: dim_64
323
+ metrics:
324
+ - type: cosine_accuracy@1
325
+ value: 0.3798767967145791
326
+ name: Cosine Accuracy@1
327
+ - type: cosine_accuracy@3
328
+ value: 0.5462012320328542
329
+ name: Cosine Accuracy@3
330
+ - type: cosine_accuracy@5
331
+ value: 0.6139630390143738
332
+ name: Cosine Accuracy@5
333
+ - type: cosine_accuracy@10
334
+ value: 0.704312114989733
335
+ name: Cosine Accuracy@10
336
+ - type: cosine_precision@1
337
+ value: 0.3798767967145791
338
+ name: Cosine Precision@1
339
+ - type: cosine_precision@3
340
+ value: 0.1820670773442847
341
+ name: Cosine Precision@3
342
+ - type: cosine_precision@5
343
+ value: 0.12279260780287474
344
+ name: Cosine Precision@5
345
+ - type: cosine_precision@10
346
+ value: 0.0704312114989733
347
+ name: Cosine Precision@10
348
+ - type: cosine_recall@1
349
+ value: 0.3798767967145791
350
+ name: Cosine Recall@1
351
+ - type: cosine_recall@3
352
+ value: 0.5462012320328542
353
+ name: Cosine Recall@3
354
+ - type: cosine_recall@5
355
+ value: 0.6139630390143738
356
+ name: Cosine Recall@5
357
+ - type: cosine_recall@10
358
+ value: 0.704312114989733
359
+ name: Cosine Recall@10
360
+ - type: cosine_ndcg@10
361
+ value: 0.5333651837657117
362
+ name: Cosine Ndcg@10
363
+ - type: cosine_mrr@10
364
+ value: 0.4796983475114887
365
+ name: Cosine Mrr@10
366
+ - type: cosine_map@100
367
+ value: 0.4877644055271696
368
+ name: Cosine Map@100
369
+ ---
370
+
371
+ # Fine-tuned with [QuicKB](https://github.com/ALucek/QuicKB)
372
+
373
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, its embeddings were also evaluated at 512, 256, 128, and 64 dimensions (see Evaluation).
374
+
375
+ ## Model Details
376
+
377
+ ### Model Description
378
+ - **Model Type:** Sentence Transformer
379
+ - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
380
+ - **Maximum Sequence Length:** 1024 tokens
381
+ - **Output Dimensionality:** 768 dimensions
382
+ - **Similarity Function:** Cosine Similarity
383
+ <!-- - **Training Dataset:** Unknown -->
384
+ - **Language:** en
385
+ - **License:** apache-2.0
386
+
387
+ ### Model Sources
388
+
389
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
390
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
391
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
392
+
393
+ ### Full Model Architecture
394
+
395
+ ```
396
+ SentenceTransformer(
397
+ (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: ModernBertModel
398
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
399
+ (2): Normalize()
400
+ )
401
+ ```
402
+
403
+ ## Usage
404
+
405
+ ### Direct Usage (Sentence Transformers)
406
+
407
+ First install the Sentence Transformers library:
408
+
409
+ ```bash
410
+ pip install -U sentence-transformers
411
+ ```
412
+
413
+ Then you can load this model and run inference.
414
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download the model from the 🤗 Hub
+ model = SentenceTransformer("AdamLucek/modernbert-embed-quickb")
+
+ # Run inference: one query followed by two candidate passages
+ sentences = [
+     'Which offeror is mentioned as getting in if there is a points discrepancy?',
+     '. at 9:14–19 (“[I]f an offeror does not have the same number of points, if it’s the 130th offeror and it doesn’t have the same number of points as the 90th offeror, then the solicitation says the 90th offeror gets in and the 130th doesn’t.”)',
+     '. Since the plaintiff does not address this issue in its sur-reply brief in No. 11-445, and because the plaintiff does not ask the Court to direct the DOJ to produce Document 3 to the plaintiff, the plaintiff does not appear to continue to challenge the DOJ’s decision to withhold Document 3. 140 recorded decision to implement the opinion.” Id. at 32',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the pairwise similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
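
Because the model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128 and 64, a leading slice of each embedding can be used on its own. The sketch below shows the usual recipe in plain Python: keep the first `dim` values, then rescale to unit length. `truncate_embedding` and `cosine_similarity` are hypothetical helpers, and the short toy vectors stand in for real 768-dimensional outputs.

```python
import math


def truncate_embedding(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(value * value for value in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero slice
    return [value / norm for value in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" that agree on their first two values.
query = [3.0, 4.0, 12.0, 0.0]
passage = [3.0, 4.0, 0.0, 12.0]

full_score = cosine_similarity(query, passage)
short_score = cosine_similarity(
    truncate_embedding(query, 2), truncate_embedding(passage, 2)
)
print(full_score, short_score)
```

In practice, Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which is probably the more convenient route. The evaluation table in this card shows how much retrieval quality each smaller dimension gives up.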
434
+
435
+ <!--
436
+ ### Direct Usage (Transformers)
437
+
438
+ <details><summary>Click to see the direct usage in Transformers</summary>
439
+
440
+ </details>
441
+ -->
442
+
443
+ <!--
444
+ ### Downstream Usage (Sentence Transformers)
445
+
446
+ You can finetune this model on your own dataset.
447
+
448
+ <details><summary>Click to expand</summary>
449
+
450
+ </details>
451
+ -->
452
+
453
+ <!--
454
+ ### Out-of-Scope Use
455
+
456
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
457
+ -->
458
+
459
+ ## Evaluation
460
+
461
+ ### Metrics
462
+
463
+ #### Information Retrieval
464
+
465
+ * Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
466
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
467
+
468
+ | Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
469
+ |:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
470
+ | cosine_accuracy@1 | 0.5821 | 0.5657 | 0.5411 | 0.4887 | 0.3799 |
471
+ | cosine_accuracy@3 | 0.7495 | 0.7331 | 0.7064 | 0.6581 | 0.5462 |
472
+ | cosine_accuracy@5 | 0.7957 | 0.7916 | 0.7659 | 0.7177 | 0.6140 |
473
+ | cosine_accuracy@10 | 0.8573 | 0.8532 | 0.8306 | 0.7803 | 0.7043 |
474
+ | cosine_precision@1 | 0.5821 | 0.5657 | 0.5411 | 0.4887 | 0.3799 |
475
+ | cosine_precision@3 | 0.2498 | 0.2444 | 0.2355 | 0.2194 | 0.1821 |
476
+ | cosine_precision@5 | 0.1591 | 0.1583 | 0.1532 | 0.1435 | 0.1228 |
477
+ | cosine_precision@10 | 0.0857 | 0.0853 | 0.0831 | 0.0780 | 0.0704 |
478
+ | cosine_recall@1 | 0.5821 | 0.5657 | 0.5411 | 0.4887 | 0.3799 |
479
+ | cosine_recall@3 | 0.7495 | 0.7331 | 0.7064 | 0.6581 | 0.5462 |
480
+ | cosine_recall@5 | 0.7957 | 0.7916 | 0.7659 | 0.7177 | 0.6140 |
481
+ | cosine_recall@10 | 0.8573 | 0.8532 | 0.8306 | 0.7803 | 0.7043 |
482
+ | **cosine_ndcg@10** | **0.7212** | **0.7103** | **0.6839** | **0.6319** | **0.5334** |
483
+ | cosine_mrr@10 | 0.6775 | 0.6645 | 0.6372 | 0.5846 | 0.4797 |
484
+ | cosine_map@100 | 0.6827 | 0.6695 | 0.6428 | 0.5917 | 0.4878 |
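
In the table above, `cosine_recall@k` equals `cosine_accuracy@k`, and `cosine_precision@k` equals `cosine_accuracy@k / k` (for example, 0.7495 / 3 ≈ 0.2498). That pattern is what one would expect if every evaluation query has exactly one relevant passage, although the card does not state this explicitly. Under that assumption, all of these metrics reduce to simple functions of the rank at which the relevant passage appears. The sketch below illustrates the relationship; `metrics_at_k` is a hypothetical helper, not the `InformationRetrievalEvaluator` implementation.

```python
import math


def metrics_at_k(ranks, k):
    """Retrieval metrics when each query has exactly one relevant passage.

    ranks: 1-based rank of the relevant passage for each query, or None
           if the passage was not retrieved at all.
    """
    n = len(ranks)
    hits = [rank is not None and rank <= k for rank in ranks]
    accuracy = sum(hits) / n
    # Reciprocal rank counts only if the passage is within the top k.
    mrr = sum(1.0 / rank for rank, hit in zip(ranks, hits) if hit) / n
    # With one relevant passage the ideal DCG is 1, so per-query
    # NDCG is simply 1 / log2(rank + 1).
    ndcg = sum(
        1.0 / math.log2(rank + 1) for rank, hit in zip(ranks, hits) if hit
    ) / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one hit among k results
        "recall": accuracy,         # one relevant passage per query
        "mrr": mrr,
        "ndcg": ndcg,
    }


# Four toy queries: relevant passage at rank 1, rank 2, missing, rank 5.
result = metrics_at_k([1, 2, None, 5], k=3)
print(result)
```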
485
+
486
+ <!--
487
+ ## Bias, Risks and Limitations
488
+
489
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
490
+ -->
491
+
492
+ <!--
493
+ ### Recommendations
494
+
495
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
496
+ -->
497
+
498
+ ## Training Details
499
+
500
+ ### Training Dataset
501
+
502
+ #### Unnamed Dataset
503
+
504
+ * Size: 8,760 training samples
505
+ * Columns: <code>anchor</code> and <code>positive</code>
506
+ * Approximate statistics based on the first 1000 samples:
507
+ | | anchor | positive |
508
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
509
+ | type | string | string |
510
+ | details | <ul><li>min: 7 tokens</li><li>mean: 15.54 tokens</li><li>max: 32 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 76.24 tokens</li><li>max: 169 tokens</li></ul> |
511
+ * Samples:
512
+ | anchor | positive |
513
+ |:--------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
514
+ | <code>What is being compared in the Circuit's statement?</code> | <code>.2d at 1389–90. The Circuit rejected this analogy, stating that, in contrast to the CIA Act, the NSA Act “protects not only organizational matters . . . but also ‘any information with respect to the activities’ of the NSA.” Id. at 1390</code> |
515
+ | <code>What type of internal documents used by the CIA in FOIA requests is mentioned?</code> | <code>. 108 Accordingly, the Court holds that certain specific categories of information withheld by the CIA in this case pursuant to § 403g clearly fall outside that provision’s scope, including (1) internal templates utilized by the CIA in tasking FOIA requests, (2) internal rules, policies and procedures governing FOIA processing, and (7) information about the CIA’s “core functions,” including</code> |
516
+ | <code>How many documents did the CIA withhold under Exemption 2?</code> | <code>. The CIA states in its declaration that all thirteen documents withheld under 38 The plaintiff previously indicated that it intended to challenge Exemption 2 withholding decisions made by the ODNI as well. See Hackett Decl. Ex. E at 1, ECF No. 29-8. The plaintiff, however, does not pursue that challenge in its opposition to the defendants’ motions for summary judgment in No. 11-445</code> |
517
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
518
+ ```json
519
+ {
520
+ "loss": "MultipleNegativesRankingLoss",
521
+ "matryoshka_dims": [
522
+ 768,
523
+ 512,
524
+ 256,
525
+ 128,
526
+ 64
527
+ ],
528
+ "matryoshka_weights": [
529
+ 1,
530
+ 1,
531
+ 1,
532
+ 1,
533
+ 1
534
+ ],
535
+ "n_dims_per_step": -1
536
+ }
537
+ ```
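
To make the parameters above concrete: `MatryoshkaLoss` applies the inner loss (here `MultipleNegativesRankingLoss`) to embeddings truncated to each entry of `matryoshka_dims` and sums the results using `matryoshka_weights`; `n_dims_per_step: -1` means every dimension is used at every step. The following is a simplified pure-Python sketch of that computation, not the library code. The function names are hypothetical, and the cosine-similarity scoring with a scale of 20 is an assumption about the inner loss's defaults.

```python
import math


def _normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over in-batch candidates: positives[i] is the target
    for anchors[i]; every other positive in the batch acts as a negative."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated embeddings."""
    return sum(
        weight * in_batch_ranking_loss(
            [v[:dim] for v in anchors], [v[:dim] for v in positives]
        )
        for dim, weight in zip(dims, weights)
    )


# Toy batch of two pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
matched = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
swapped = [matched[1], matched[0]]

good = matryoshka_loss(anchors, matched, dims=[4, 2], weights=[1, 1])
bad = matryoshka_loss(anchors, swapped, dims=[4, 2], weights=[1, 1])
print(good, bad)
```

Correctly matched pairs give a loss near zero at every truncation, while swapped pairs are penalized at each dimension, which is what pushes the leading components of the embedding to carry most of the signal.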
538
+
539
+ ### Training Hyperparameters
540
+ #### Non-Default Hyperparameters
541
+
542
+ - `eval_strategy`: epoch
543
+ - `per_device_train_batch_size`: 32
544
+ - `gradient_accumulation_steps`: 16
545
+ - `learning_rate`: 2e-05
546
+ - `num_train_epochs`: 4
547
+ - `lr_scheduler_type`: cosine
548
+ - `warmup_ratio`: 0.1
549
+ - `bf16`: True
550
+ - `tf32`: True
551
+ - `load_best_model_at_end`: True
552
+ - `optim`: adamw_torch_fused
553
+ - `batch_sampler`: no_duplicates
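
Taken together, these settings imply an effective batch of 32 × 16 = 512 pairs per optimizer step, and the training log in this card ends at step 68. As a rough illustration of how `lr_scheduler_type: cosine` and `warmup_ratio: 0.1` shape the learning rate over those steps, here is a sketch of linear warmup followed by cosine decay. `cosine_lr` is a hypothetical function, and rounding the warmup length up to 7 steps is an assumption about how the trainer derives it.

```python
import math


def cosine_lr(step, total_steps=68, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# per_device_train_batch_size * gradient_accumulation_steps
effective_batch = 32 * 16

schedule = [cosine_lr(step) for step in range(69)]
peak_step = max(range(69), key=lambda step: schedule[step])
print(effective_batch, peak_step, schedule[peak_step])
```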
554
+
555
+ #### All Hyperparameters
556
+ <details><summary>Click to expand</summary>
557
+
558
+ - `overwrite_output_dir`: False
559
+ - `do_predict`: False
560
+ - `eval_strategy`: epoch
561
+ - `prediction_loss_only`: True
562
+ - `per_device_train_batch_size`: 32
563
+ - `per_device_eval_batch_size`: 8
564
+ - `per_gpu_train_batch_size`: None
565
+ - `per_gpu_eval_batch_size`: None
566
+ - `gradient_accumulation_steps`: 16
567
+ - `eval_accumulation_steps`: None
568
+ - `torch_empty_cache_steps`: None
569
+ - `learning_rate`: 2e-05
570
+ - `weight_decay`: 0.0
571
+ - `adam_beta1`: 0.9
572
+ - `adam_beta2`: 0.999
573
+ - `adam_epsilon`: 1e-08
574
+ - `max_grad_norm`: 1.0
575
+ - `num_train_epochs`: 4
576
+ - `max_steps`: -1
577
+ - `lr_scheduler_type`: cosine
578
+ - `lr_scheduler_kwargs`: {}
579
+ - `warmup_ratio`: 0.1
580
+ - `warmup_steps`: 0
581
+ - `log_level`: passive
582
+ - `log_level_replica`: warning
583
+ - `log_on_each_node`: True
584
+ - `logging_nan_inf_filter`: True
585
+ - `save_safetensors`: True
586
+ - `save_on_each_node`: False
587
+ - `save_only_model`: False
588
+ - `restore_callback_states_from_checkpoint`: False
589
+ - `no_cuda`: False
590
+ - `use_cpu`: False
591
+ - `use_mps_device`: False
592
+ - `seed`: 42
593
+ - `data_seed`: None
594
+ - `jit_mode_eval`: False
595
+ - `use_ipex`: False
596
+ - `bf16`: True
597
+ - `fp16`: False
598
+ - `fp16_opt_level`: O1
599
+ - `half_precision_backend`: auto
600
+ - `bf16_full_eval`: False
601
+ - `fp16_full_eval`: False
602
+ - `tf32`: True
603
+ - `local_rank`: 0
604
+ - `ddp_backend`: None
605
+ - `tpu_num_cores`: None
606
+ - `tpu_metrics_debug`: False
607
+ - `debug`: []
608
+ - `dataloader_drop_last`: False
609
+ - `dataloader_num_workers`: 0
610
+ - `dataloader_prefetch_factor`: None
611
+ - `past_index`: -1
612
+ - `disable_tqdm`: False
613
+ - `remove_unused_columns`: True
614
+ - `label_names`: None
615
+ - `load_best_model_at_end`: True
616
+ - `ignore_data_skip`: False
617
+ - `fsdp`: []
618
+ - `fsdp_min_num_params`: 0
619
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
620
+ - `fsdp_transformer_layer_cls_to_wrap`: None
621
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
622
+ - `deepspeed`: None
623
+ - `label_smoothing_factor`: 0.0
624
+ - `optim`: adamw_torch_fused
625
+ - `optim_args`: None
626
+ - `adafactor`: False
627
+ - `group_by_length`: False
628
+ - `length_column_name`: length
629
+ - `ddp_find_unused_parameters`: None
630
+ - `ddp_bucket_cap_mb`: None
631
+ - `ddp_broadcast_buffers`: False
632
+ - `dataloader_pin_memory`: True
633
+ - `dataloader_persistent_workers`: False
634
+ - `skip_memory_metrics`: True
635
+ - `use_legacy_prediction_loop`: False
636
+ - `push_to_hub`: False
637
+ - `resume_from_checkpoint`: None
638
+ - `hub_model_id`: None
639
+ - `hub_strategy`: every_save
640
+ - `hub_private_repo`: None
641
+ - `hub_always_push`: False
642
+ - `gradient_checkpointing`: False
643
+ - `gradient_checkpointing_kwargs`: None
644
+ - `include_inputs_for_metrics`: False
645
+ - `include_for_metrics`: []
646
+ - `eval_do_concat_batches`: True
647
+ - `fp16_backend`: auto
648
+ - `push_to_hub_model_id`: None
649
+ - `push_to_hub_organization`: None
650
+ - `mp_parameters`:
651
+ - `auto_find_batch_size`: False
652
+ - `full_determinism`: False
653
+ - `torchdynamo`: None
654
+ - `ray_scope`: last
655
+ - `ddp_timeout`: 1800
656
+ - `torch_compile`: False
657
+ - `torch_compile_backend`: None
658
+ - `torch_compile_mode`: None
659
+ - `dispatch_batches`: None
660
+ - `split_batches`: None
661
+ - `include_tokens_per_second`: False
662
+ - `include_num_input_tokens_seen`: False
663
+ - `neftune_noise_alpha`: None
664
+ - `optim_target_modules`: None
665
+ - `batch_eval_metrics`: False
666
+ - `eval_on_start`: False
667
+ - `use_liger_kernel`: False
668
+ - `eval_use_gather_object`: False
669
+ - `average_tokens_across_devices`: False
670
+ - `prompts`: None
671
+ - `batch_sampler`: no_duplicates
672
+ - `multi_dataset_batch_sampler`: proportional
673
+
674
+ </details>
675
+
676
+ ### Training Logs
677
+ | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
678
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
679
+ | 0.5839 | 10 | 67.1727 | - | - | - | - | - |
680
+ | 1.0 | 18 | - | 0.6999 | 0.6820 | 0.6577 | 0.5988 | 0.4855 |
681
+ | 1.1168 | 20 | 32.4667 | - | - | - | - | - |
682
+ | 1.7007 | 30 | 27.9435 | - | - | - | - | - |
683
+ | 2.0 | 36 | - | 0.7167 | 0.7002 | 0.6764 | 0.6233 | 0.5187 |
684
+ | 2.2336 | 40 | 22.2924 | - | - | - | - | - |
685
+ | 2.8175 | 50 | 20.5125 | - | - | - | - | - |
686
+ | 3.0 | 54 | - | 0.7190 | 0.7080 | 0.6824 | 0.6318 | 0.5339 |
687
+ | 3.3504 | 60 | 18.3621 | - | - | - | - | - |
688
+ | **3.8175** | **68** | **-** | **0.7212** | **0.7103** | **0.6839** | **0.6319** | **0.5334** |
689
+
690
+ * The bold row denotes the saved checkpoint.
691
+
692
+ ### Framework Versions
693
+ - Python: 3.10.12
694
+ - Sentence Transformers: 3.4.0
695
+ - Transformers: 4.48.1
696
+ - PyTorch: 2.5.1+cu124
697
+ - Accelerate: 1.3.0
698
+ - Datasets: 3.2.0
699
+ - Tokenizers: 0.21.0
700
+
701
+ ## Citation
702
+
703
+ ### BibTeX
704
+
705
+ #### Sentence Transformers
706
+ ```bibtex
707
+ @inproceedings{reimers-2019-sentence-bert,
708
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
709
+ author = "Reimers, Nils and Gurevych, Iryna",
710
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
711
+ month = "11",
712
+ year = "2019",
713
+ publisher = "Association for Computational Linguistics",
714
+ url = "https://arxiv.org/abs/1908.10084",
715
+ }
716
+ ```
717
+
718
+ #### MatryoshkaLoss
719
+ ```bibtex
720
+ @misc{kusupati2024matryoshka,
721
+ title={Matryoshka Representation Learning},
722
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
723
+ year={2024},
724
+ eprint={2205.13147},
725
+ archivePrefix={arXiv},
726
+ primaryClass={cs.LG}
727
+ }
728
+ ```
729
+
730
+ #### MultipleNegativesRankingLoss
731
+ ```bibtex
732
+ @misc{henderson2017efficient,
733
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
734
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
735
+ year={2017},
736
+ eprint={1705.00652},
737
+ archivePrefix={arXiv},
738
+ primaryClass={cs.CL}
739
+ }
740
+ ```
741
+
742
+ <!--
743
+ ## Glossary
744
+
745
+ *Clearly define terms in order to be accessible across audiences.*
746
+ -->
747
+
748
+ <!--
749
+ ## Model Card Authors
750
+
751
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
752
+ -->
753
+
754
+ <!--
755
+ ## Model Card Contact
756
+
757
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
758
+ -->
config.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "_name_or_path": "nomic-ai/modernbert-embed-base",
3
+ "architectures": [
4
+ "ModernBertModel"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "bos_token_id": 50281,
9
+ "classifier_activation": "gelu",
10
+ "classifier_bias": false,
11
+ "classifier_dropout": 0.0,
12
+ "classifier_pooling": "mean",
13
+ "cls_token_id": 50281,
14
+ "decoder_bias": true,
15
+ "deterministic_flash_attn": false,
16
+ "embedding_dropout": 0.0,
17
+ "eos_token_id": 50282,
18
+ "global_attn_every_n_layers": 3,
19
+ "global_rope_theta": 160000.0,
20
+ "gradient_checkpointing": false,
21
+ "hidden_activation": "gelu",
22
+ "hidden_size": 768,
23
+ "initializer_cutoff_factor": 2.0,
24
+ "initializer_range": 0.02,
25
+ "intermediate_size": 1152,
26
+ "layer_norm_eps": 1e-05,
27
+ "local_attention": 128,
28
+ "local_rope_theta": 10000.0,
29
+ "max_position_embeddings": 8192,
30
+ "mlp_bias": false,
31
+ "mlp_dropout": 0.0,
32
+ "model_type": "modernbert",
33
+ "norm_bias": false,
34
+ "norm_eps": 1e-05,
35
+ "num_attention_heads": 12,
36
+ "num_hidden_layers": 22,
37
+ "pad_token_id": 50283,
38
+ "position_embedding_type": "absolute",
39
+ "reference_compile": true,
40
+ "repad_logits_with_grad": false,
41
+ "sep_token_id": 50282,
42
+ "sparse_pred_ignore_index": -100,
43
+ "sparse_prediction": false,
44
+ "torch_dtype": "float32",
45
+ "transformers_version": "4.48.1",
46
+ "vocab_size": 50368
47
+ }
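
Reviewer note: a few quantities follow directly from the values in this `config.json`. The sketch below derives the per-head dimension and the layers that would use global attention. The rule "layer `i` is global when `i % global_attn_every_n_layers == 0`" is my understanding of the ModernBERT implementation in Transformers and should be treated as an assumption; `head_dim` and `global_layers` are hypothetical helper names.

```python
import json

# Excerpt of the config.json values shown above.
config = json.loads("""
{
  "hidden_size": 768,
  "num_attention_heads": 12,
  "num_hidden_layers": 22,
  "global_attn_every_n_layers": 3,
  "local_attention": 128
}
""")


def head_dim(cfg):
    """Width of one attention head."""
    return cfg["hidden_size"] // cfg["num_attention_heads"]


def global_layers(cfg):
    """Indices of layers assumed to use global (full) attention.

    Assumption: layer i is global when i % n == 0; every other layer
    uses a sliding window of `local_attention` tokens.
    """
    n = cfg["global_attn_every_n_layers"]
    return [i for i in range(cfg["num_hidden_layers"]) if i % n == 0]


print("head dim:", head_dim(config))
print("global layers:", global_layers(config))
```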
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.4.0",
4
+ "transformers": "4.48.1",
5
+ "pytorch": "2.5.1+cu124"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:31f4fe186ba37b53cd5a38aa632cffb0ed11ff885fdcb0021d380cd6e430bad2
3
+ size 596070136
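
Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of how a downloaded file could be checked against it is shown below, using only the standard library. `parse_pointer` and `matches_pointer` are hypothetical helpers, and the parameter estimate assumes every byte is a float32 weight (per `torch_dtype` in `config.json`), which slightly overcounts because the safetensors header is included in the size.

```python
import hashlib

# Pointer text as it appears in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:31f4fe186ba37b53cd5a38aa632cffb0ed11ff885fdcb0021d380cd6e430bad2
size 596070136
"""


def parse_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and hash."""
    h = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["digest"]


pointer = parse_pointer(POINTER)
# Upper bound on the parameter count, assuming 4 bytes per float32 weight.
approx_params = pointer["size"] // 4
print(pointer["algo"], pointer["size"], approx_params)
```

Calling `matches_pointer("model.safetensors", pointer)` on a local download would then confirm that the file corresponds to this commit.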
modules.json ADDED
@@ -0,0 +1,20 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ },
14
+ {
15
+ "idx": 2,
16
+ "name": "2",
17
+ "path": "2_Normalize",
18
+ "type": "sentence_transformers.models.Normalize"
19
+ }
20
+ ]
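
Reviewer note: `modules.json` chains three stages: Transformer, Pooling (mean over tokens, per `1_Pooling/config.json`), and Normalize. The sketch below illustrates the last two stages on toy data in plain Python. It is not the Sentence Transformers implementation; the token vectors are made up, and `mean_pool`, `l2_normalize`, and `cosine` are hypothetical names.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for j, x in enumerate(vec):
                total[j] += x
    return [t / max(count, 1) for t in total]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length, as the Normalize stage does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def cosine(a, b):
    """For unit-length inputs the dot product equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-in for the Transformer output: three tokens, the last is padding.
tokens = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
embedding = l2_normalize(mean_pool(tokens, mask))
print(embedding, cosine(embedding, embedding))
```

Because the final stage normalizes the output, the `"similarity_fn_name": "cosine"` setting and a plain dot product should rank results identically.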
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 1024,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "cls_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "mask_token": {
10
+ "content": "[MASK]",
11
+ "lstrip": true,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "[PAD]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "sep_token": {
24
+ "content": "[SEP]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "[UNK]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,945 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "|||IP_ADDRESS|||",
5
+ "lstrip": false,
6
+ "normalized": true,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": false
10
+ },
11
+ "1": {
12
+ "content": "<|padding|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "50254": {
20
+ "content": " ",
21
+ "lstrip": false,
22
+ "normalized": true,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": false
26
+ },
27
+ "50255": {
28
+ "content": " ",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": false
34
+ },
35
+ "50256": {
36
+ "content": " ",
37
+ "lstrip": false,
38
+ "normalized": true,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": false
42
+ },
43
+ "50257": {
44
+ "content": " ",
45
+ "lstrip": false,
46
+ "normalized": true,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": false
50
+ },
51
+ "50258": {
52
+ "content": " ",
53
+ "lstrip": false,
54
+ "normalized": true,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": false
58
+ },
59
+ "50259": {
60
+ "content": " ",
61
+ "lstrip": false,
62
+ "normalized": true,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": false
66
+ },
67
+ "50260": {
68
+ "content": " ",
69
+ "lstrip": false,
70
+ "normalized": true,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": false
74
+ },
75
+ "50261": {
76
+ "content": " ",
77
+ "lstrip": false,
78
+ "normalized": true,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": false
82
+ },
83
+ "50262": {
84
+ "content": " ",
85
+ "lstrip": false,
86
+ "normalized": true,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": false
90
+ },
91
+ "50263": {
92
+ "content": " ",
93
+ "lstrip": false,
94
+ "normalized": true,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": false
98
+ },
99
+ "50264": {
100
+ "content": " ",
101
+ "lstrip": false,
102
+ "normalized": true,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": false
106
+ },
107
+ "50265": {
108
+ "content": " ",
109
+ "lstrip": false,
110
+ "normalized": true,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": false
114
+ },
115
+ "50266": {
116
+ "content": " ",
117
+ "lstrip": false,
118
+ "normalized": true,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": false
122
+ },
123
+ "50267": {
124
+ "content": " ",
125
+ "lstrip": false,
126
+ "normalized": true,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": false
130
+ },
131
+ "50268": {
132
+ "content": " ",
133
+ "lstrip": false,
134
+ "normalized": true,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": false
138
+ },
139
+ "50269": {
140
+ "content": " ",
141
+ "lstrip": false,
142
+ "normalized": true,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": false
146
+ },
147
+ "50270": {
148
+ "content": " ",
149
+ "lstrip": false,
150
+ "normalized": true,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": false
154
+ },
155
+ "50271": {
156
+ "content": " ",
157
+ "lstrip": false,
158
+ "normalized": true,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": false
162
+ },
163
+ "50272": {
164
+ "content": " ",
165
+ "lstrip": false,
166
+ "normalized": true,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": false
170
+ },
171
+ "50273": {
172
+ "content": " ",
173
+ "lstrip": false,
174
+ "normalized": true,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": false
178
+ },
179
+ "50274": {
180
+ "content": " ",
181
+ "lstrip": false,
182
+ "normalized": true,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": false
186
+ },
187
+ "50275": {
188
+ "content": " ",
189
+ "lstrip": false,
190
+ "normalized": true,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": false
194
+ },
195
+ "50276": {
196
+ "content": " ",
197
+ "lstrip": false,
198
+ "normalized": true,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": false
202
+ },
203
+ "50277": {
204
+ "content": "|||EMAIL_ADDRESS|||",
205
+ "lstrip": false,
206
+ "normalized": true,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": false
210
+ },
211
+ "50278": {
212
+ "content": "|||PHONE_NUMBER|||",
213
+ "lstrip": false,
214
+ "normalized": true,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": false
218
+ },
219
+ "50279": {
220
+ "content": "<|endoftext|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "50280": {
228
+ "content": "[UNK]",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "50281": {
236
+ "content": "[CLS]",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "50282": {
244
+ "content": "[SEP]",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "50283": {
252
+ "content": "[PAD]",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "50284": {
260
+ "content": "[MASK]",
261
+ "lstrip": true,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "50285": {
268
+ "content": "[unused0]",
269
+ "lstrip": false,
270
+ "normalized": true,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": false
274
+ },
275
+ "50286": {
276
+ "content": "[unused1]",
277
+ "lstrip": false,
278
+ "normalized": true,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": false
282
+ },
283
+ "50287": {
284
+ "content": "[unused2]",
285
+ "lstrip": false,
286
+ "normalized": true,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": false
290
+ },
291
+ "50288": {
292
+ "content": "[unused3]",
293
+ "lstrip": false,
294
+ "normalized": true,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": false
298
+ },
299
+ "50289": {
300
+ "content": "[unused4]",
301
+ "lstrip": false,
302
+ "normalized": true,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": false
306
+ },
307
+ "50290": {
308
+ "content": "[unused5]",
309
+ "lstrip": false,
310
+ "normalized": true,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": false
314
+ },
315
+ "50291": {
316
+ "content": "[unused6]",
317
+ "lstrip": false,
318
+ "normalized": true,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": false
322
+ },
323
+ "50292": {
324
+ "content": "[unused7]",
325
+ "lstrip": false,
326
+ "normalized": true,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": false
330
+ },
331
+ "50293": {
332
+ "content": "[unused8]",
333
+ "lstrip": false,
334
+ "normalized": true,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": false
338
+ },
339
+ "50294": {
340
+ "content": "[unused9]",
341
+ "lstrip": false,
342
+ "normalized": true,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": false
346
+ },
347
+ "50295": {
348
+ "content": "[unused10]",
349
+ "lstrip": false,
350
+ "normalized": true,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": false
354
+ },
355
+ "50296": {
356
+ "content": "[unused11]",
357
+ "lstrip": false,
358
+ "normalized": true,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": false
362
+ },
363
+ "50297": {
364
+ "content": "[unused12]",
365
+ "lstrip": false,
366
+ "normalized": true,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": false
370
+ },
371
+ "50298": {
372
+ "content": "[unused13]",
373
+ "lstrip": false,
374
+ "normalized": true,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": false
378
+ },
379
+ "50299": {
380
+ "content": "[unused14]",
381
+ "lstrip": false,
382
+ "normalized": true,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": false
386
+ },
387
+ "50300": {
388
+ "content": "[unused15]",
389
+ "lstrip": false,
390
+ "normalized": true,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": false
394
+ },
395
+ "50301": {
396
+ "content": "[unused16]",
397
+ "lstrip": false,
398
+ "normalized": true,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": false
402
+ },
403
+ "50302": {
404
+ "content": "[unused17]",
405
+ "lstrip": false,
406
+ "normalized": true,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": false
410
+ },
411
+ "50303": {
412
+ "content": "[unused18]",
413
+ "lstrip": false,
414
+ "normalized": true,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": false
418
+ },
419
+ "50304": {
420
+ "content": "[unused19]",
421
+ "lstrip": false,
422
+ "normalized": true,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": false
426
+ },
427
+ "50305": {
428
+ "content": "[unused20]",
429
+ "lstrip": false,
430
+ "normalized": true,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": false
434
+ },
435
+ "50306": {
436
+ "content": "[unused21]",
437
+ "lstrip": false,
438
+ "normalized": true,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": false
442
+ },
443
+ "50307": {
444
+ "content": "[unused22]",
445
+ "lstrip": false,
446
+ "normalized": true,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": false
450
+ },
451
+ "50308": {
452
+ "content": "[unused23]",
453
+ "lstrip": false,
454
+ "normalized": true,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": false
458
+ },
459
+ "50309": {
460
+ "content": "[unused24]",
461
+ "lstrip": false,
462
+ "normalized": true,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": false
466
+ },
467
+ "50310": {
468
+ "content": "[unused25]",
469
+ "lstrip": false,
470
+ "normalized": true,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": false
474
+ },
475
+ "50311": {
476
+ "content": "[unused26]",
477
+ "lstrip": false,
478
+ "normalized": true,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": false
482
+ },
483
+ "50312": {
484
+ "content": "[unused27]",
485
+ "lstrip": false,
486
+ "normalized": true,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": false
490
+ },
491
+ "50313": {
492
+ "content": "[unused28]",
493
+ "lstrip": false,
494
+ "normalized": true,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": false
498
+ },
499
+ "50314": {
500
+ "content": "[unused29]",
501
+ "lstrip": false,
502
+ "normalized": true,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": false
506
+ },
507
+ "50315": {
508
+ "content": "[unused30]",
509
+ "lstrip": false,
510
+ "normalized": true,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": false
514
+ },
515
+ "50316": {
516
+ "content": "[unused31]",
517
+ "lstrip": false,
518
+ "normalized": true,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": false
522
+ },
523
+ "50317": {
524
+ "content": "[unused32]",
525
+ "lstrip": false,
526
+ "normalized": true,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": false
530
+ },
531
+ "50318": {
532
+ "content": "[unused33]",
533
+ "lstrip": false,
534
+ "normalized": true,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": false
538
+ },
539
+ "50319": {
540
+ "content": "[unused34]",
541
+ "lstrip": false,
542
+ "normalized": true,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": false
546
+ },
547
+ "50320": {
548
+ "content": "[unused35]",
549
+ "lstrip": false,
550
+ "normalized": true,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": false
554
+ },
555
+ "50321": {
556
+ "content": "[unused36]",
557
+ "lstrip": false,
558
+ "normalized": true,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": false
562
+ },
563
+ "50322": {
564
+ "content": "[unused37]",
565
+ "lstrip": false,
566
+ "normalized": true,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": false
570
+ },
571
+ "50323": {
572
+ "content": "[unused38]",
573
+ "lstrip": false,
574
+ "normalized": true,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": false
578
+ },
579
+ "50324": {
580
+ "content": "[unused39]",
581
+ "lstrip": false,
582
+ "normalized": true,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": false
586
+ },
587
+ "50325": {
588
+ "content": "[unused40]",
589
+ "lstrip": false,
590
+ "normalized": true,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": false
594
+ },
595
+ "50326": {
596
+ "content": "[unused41]",
597
+ "lstrip": false,
598
+ "normalized": true,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": false
602
+ },
603
+ "50327": {
604
+ "content": "[unused42]",
605
+ "lstrip": false,
606
+ "normalized": true,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": false
610
+ },
611
+ "50328": {
612
+ "content": "[unused43]",
613
+ "lstrip": false,
614
+ "normalized": true,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": false
618
+ },
619
+ "50329": {
620
+ "content": "[unused44]",
621
+ "lstrip": false,
622
+ "normalized": true,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": false
626
+ },
627
+ "50330": {
628
+ "content": "[unused45]",
629
+ "lstrip": false,
630
+ "normalized": true,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": false
634
+ },
635
+ "50331": {
636
+ "content": "[unused46]",
637
+ "lstrip": false,
638
+ "normalized": true,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": false
642
+ },
643
+ "50332": {
644
+ "content": "[unused47]",
645
+ "lstrip": false,
646
+ "normalized": true,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": false
650
+ },
651
+ "50333": {
652
+ "content": "[unused48]",
653
+ "lstrip": false,
654
+ "normalized": true,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": false
658
+ },
659
+ "50334": {
660
+ "content": "[unused49]",
661
+ "lstrip": false,
662
+ "normalized": true,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": false
666
+ },
667
+ "50335": {
668
+ "content": "[unused50]",
669
+ "lstrip": false,
670
+ "normalized": true,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": false
674
+ },
675
+ "50336": {
676
+ "content": "[unused51]",
677
+ "lstrip": false,
678
+ "normalized": true,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": false
682
+ },
683
+ "50337": {
684
+ "content": "[unused52]",
685
+ "lstrip": false,
686
+ "normalized": true,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": false
690
+ },
691
+ "50338": {
692
+ "content": "[unused53]",
693
+ "lstrip": false,
694
+ "normalized": true,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": false
698
+ },
699
+ "50339": {
700
+ "content": "[unused54]",
701
+ "lstrip": false,
702
+ "normalized": true,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": false
706
+ },
707
+ "50340": {
708
+ "content": "[unused55]",
709
+ "lstrip": false,
710
+ "normalized": true,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": false
714
+ },
715
+ "50341": {
716
+ "content": "[unused56]",
717
+ "lstrip": false,
718
+ "normalized": true,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": false
722
+ },
723
+ "50342": {
724
+ "content": "[unused57]",
725
+ "lstrip": false,
726
+ "normalized": true,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": false
730
+ },
731
+ "50343": {
732
+ "content": "[unused58]",
733
+ "lstrip": false,
734
+ "normalized": true,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": false
738
+ },
739
+ "50344": {
740
+ "content": "[unused59]",
741
+ "lstrip": false,
742
+ "normalized": true,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": false
746
+ },
747
+ "50345": {
748
+ "content": "[unused60]",
749
+ "lstrip": false,
750
+ "normalized": true,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": false
754
+ },
755
+ "50346": {
756
+ "content": "[unused61]",
757
+ "lstrip": false,
758
+ "normalized": true,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": false
762
+ },
763
+ "50347": {
764
+ "content": "[unused62]",
765
+ "lstrip": false,
766
+ "normalized": true,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": false
770
+ },
771
+ "50348": {
772
+ "content": "[unused63]",
773
+ "lstrip": false,
774
+ "normalized": true,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": false
778
+ },
779
+ "50349": {
780
+ "content": "[unused64]",
781
+ "lstrip": false,
782
+ "normalized": true,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": false
786
+ },
787
+ "50350": {
788
+ "content": "[unused65]",
789
+ "lstrip": false,
790
+ "normalized": true,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": false
794
+ },
795
+ "50351": {
796
+ "content": "[unused66]",
797
+ "lstrip": false,
798
+ "normalized": true,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": false
802
+ },
803
+ "50352": {
804
+ "content": "[unused67]",
805
+ "lstrip": false,
806
+ "normalized": true,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": false
810
+ },
811
+ "50353": {
812
+ "content": "[unused68]",
813
+ "lstrip": false,
814
+ "normalized": true,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": false
818
+ },
819
+ "50354": {
820
+ "content": "[unused69]",
821
+ "lstrip": false,
822
+ "normalized": true,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": false
826
+ },
827
+ "50355": {
828
+ "content": "[unused70]",
829
+ "lstrip": false,
830
+ "normalized": true,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": false
834
+ },
835
+ "50356": {
836
+ "content": "[unused71]",
837
+ "lstrip": false,
838
+ "normalized": true,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": false
842
+ },
843
+ "50357": {
844
+ "content": "[unused72]",
845
+ "lstrip": false,
846
+ "normalized": true,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": false
850
+ },
851
+ "50358": {
852
+ "content": "[unused73]",
853
+ "lstrip": false,
854
+ "normalized": true,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": false
858
+ },
859
+ "50359": {
860
+ "content": "[unused74]",
861
+ "lstrip": false,
862
+ "normalized": true,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": false
866
+ },
867
+ "50360": {
868
+ "content": "[unused75]",
869
+ "lstrip": false,
870
+ "normalized": true,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": false
874
+ },
875
+ "50361": {
876
+ "content": "[unused76]",
877
+ "lstrip": false,
878
+ "normalized": true,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": false
882
+ },
883
+ "50362": {
884
+ "content": "[unused77]",
885
+ "lstrip": false,
886
+ "normalized": true,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": false
890
+ },
891
+ "50363": {
892
+ "content": "[unused78]",
893
+ "lstrip": false,
894
+ "normalized": true,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": false
898
+ },
899
+ "50364": {
900
+ "content": "[unused79]",
901
+ "lstrip": false,
902
+ "normalized": true,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": false
906
+ },
907
+ "50365": {
908
+ "content": "[unused80]",
909
+ "lstrip": false,
910
+ "normalized": true,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": false
914
+ },
915
+ "50366": {
916
+ "content": "[unused81]",
917
+ "lstrip": false,
918
+ "normalized": true,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": false
922
+ },
923
+ "50367": {
924
+ "content": "[unused82]",
925
+ "lstrip": false,
926
+ "normalized": true,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": false
930
+ }
931
+ },
932
+ "clean_up_tokenization_spaces": true,
933
+ "cls_token": "[CLS]",
934
+ "extra_special_tokens": {},
935
+ "mask_token": "[MASK]",
936
+ "model_input_names": [
937
+ "input_ids",
938
+ "attention_mask"
939
+ ],
940
+ "model_max_length": 8192,
941
+ "pad_token": "[PAD]",
942
+ "sep_token": "[SEP]",
943
+ "tokenizer_class": "PreTrainedTokenizerFast",
944
+ "unk_token": "[UNK]"
945
+ }
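
Reviewer note: the special-token IDs in `config.json` and the entries in `added_tokens_decoder` need to agree. The sketch below is one way to cross-check them using excerpts copied from this commit. `check_special_tokens` is a hypothetical helper, and only the tokens listed here are checked.

```python
# IDs copied from config.json in this commit.
model_ids = {"cls_token_id": 50281, "sep_token_id": 50282, "pad_token_id": 50283}
vocab_size = 50368

# Excerpt of added_tokens_decoder from tokenizer_config.json (id -> content).
added_tokens = {
    "50280": "[UNK]",
    "50281": "[CLS]",
    "50282": "[SEP]",
    "50283": "[PAD]",
    "50284": "[MASK]",
    "50367": "[unused82]",
}

# Named special tokens from tokenizer_config.json.
named_tokens = {"cls_token": "[CLS]", "sep_token": "[SEP]", "pad_token": "[PAD]"}


def check_special_tokens(model_ids, added_tokens, named_tokens, vocab_size):
    """Return a list of human-readable mismatches (empty if consistent)."""
    problems = []
    for name, content in named_tokens.items():
        token_id = model_ids[name + "_id"]
        found = added_tokens.get(str(token_id))
        if found != content:
            problems.append(f"{name}: id {token_id} maps to {found!r}, expected {content!r}")
    highest = max(int(k) for k in added_tokens)
    if highest >= vocab_size:
        problems.append(f"token id {highest} does not fit vocab_size {vocab_size}")
    return problems


print(check_special_tokens(model_ids, added_tokens, named_tokens, vocab_size))
```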