tomaarsen (HF Staff) committed
Commit f0a1856 · verified · 1 Parent(s): 1c45cdc

Add new SparseEncoder model

1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "pooling_strategy": "max",
3
+ "activation_function": "relu",
4
+ "word_embedding_dimension": 30522
5
+ }
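
For context, this configuration tells SpladePooling to collapse the per-token MLM logits into a single 30522-dimensional vector by applying ReLU and taking a max over token positions, following the SPLADE formulation. A minimal conceptual sketch (not the library's actual implementation):

```python
import torch

# SPLADE max pooling over MLM logits of shape [num_tokens, 30522]:
# ReLU, log-saturation, then a max over the token positions, giving
# one non-negative weight per vocabulary term.
def splade_max_pool(mlm_logits: torch.Tensor) -> torch.Tensor:
    return torch.log1p(torch.relu(mlm_logits)).amax(dim=0)
```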
README.md ADDED
@@ -0,0 +1,836 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - sentence-transformers
7
+ - sparse-encoder
8
+ - sparse
9
+ - splade
10
+ - generated_from_trainer
11
+ - dataset_size:90000
12
+ - loss:SpladeLoss
13
+ - loss:SparseMarginMSELoss
14
+ - loss:FlopsLoss
15
+ base_model: Luyu/co-condenser-marco
16
+ widget:
17
+ - text: up to what age can a child get autism
18
+ - text: food temperature danger zone
19
+ - text: Small and medium size poly tanks are relatively inexpensive. They are also
20
+ easy to handle, so poly tanks are used in many smaller wineries. New and used
21
+ poly. drums are available in 20, 30, 40 and 55 gallon sizes, and they make excellent
22
+ wine storage containers. for home winemakers. Just like glass, wine storage containers
23
+ made of polyethylene advantages and disadvantages. They are lightweight, and polyethylene
24
+ drums can be handled and stored easily.
25
+ - text: what county is louin ms
26
+ - text: Map of the Old City of Shanghai. By the early 1400s, Shanghai had become important
27
+ enough for Ming dynasty engineers to begin dredging the Huangpu River (also known
28
+ as Shen). In 1553, a city wall was built around the Old Town (Nanshi) as a defense
29
+ against the depredations of the Wokou (Japanese pirates).
30
+ datasets:
31
+ - sentence-transformers/msmarco
32
+ pipeline_tag: feature-extraction
33
+ library_name: sentence-transformers
34
+ metrics:
35
+ - dot_accuracy@1
36
+ - dot_accuracy@3
37
+ - dot_accuracy@5
38
+ - dot_accuracy@10
39
+ - dot_precision@1
40
+ - dot_precision@3
41
+ - dot_precision@5
42
+ - dot_precision@10
43
+ - dot_recall@1
44
+ - dot_recall@3
45
+ - dot_recall@5
46
+ - dot_recall@10
47
+ - dot_ndcg@10
48
+ - dot_mrr@10
49
+ - dot_map@100
50
+ - query_active_dims
51
+ - query_sparsity_ratio
52
+ - corpus_active_dims
53
+ - corpus_sparsity_ratio
54
+ co2_eq_emissions:
55
+ emissions: 84.77861327949611
56
+ energy_consumed: 0.21810696440845714
57
+ source: codecarbon
58
+ training_type: fine-tuning
59
+ on_cloud: false
60
+ cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
61
+ ram_total_size: 31.777088165283203
62
+ hours_used: 0.618
63
+ hardware_used: 1 x NVIDIA GeForce RTX 3090
64
+ model-index:
65
+ - name: CoCondenser trained on MS MARCO triplets
66
+ results:
67
+ - task:
68
+ type: sparse-information-retrieval
69
+ name: Sparse Information Retrieval
70
+ dataset:
71
+ name: NanoMSMARCO
72
+ type: NanoMSMARCO
73
+ metrics:
74
+ - type: dot_accuracy@1
75
+ value: 0.46
76
+ name: Dot Accuracy@1
77
+ - type: dot_accuracy@3
78
+ value: 0.64
79
+ name: Dot Accuracy@3
80
+ - type: dot_accuracy@5
81
+ value: 0.72
82
+ name: Dot Accuracy@5
83
+ - type: dot_accuracy@10
84
+ value: 0.82
85
+ name: Dot Accuracy@10
86
+ - type: dot_precision@1
87
+ value: 0.46
88
+ name: Dot Precision@1
89
+ - type: dot_precision@3
90
+ value: 0.21333333333333335
91
+ name: Dot Precision@3
92
+ - type: dot_precision@5
93
+ value: 0.14400000000000002
94
+ name: Dot Precision@5
95
+ - type: dot_precision@10
96
+ value: 0.08199999999999999
97
+ name: Dot Precision@10
98
+ - type: dot_recall@1
99
+ value: 0.46
100
+ name: Dot Recall@1
101
+ - type: dot_recall@3
102
+ value: 0.64
103
+ name: Dot Recall@3
104
+ - type: dot_recall@5
105
+ value: 0.72
106
+ name: Dot Recall@5
107
+ - type: dot_recall@10
108
+ value: 0.82
109
+ name: Dot Recall@10
110
+ - type: dot_ndcg@10
111
+ value: 0.6288613269928542
112
+ name: Dot Ndcg@10
113
+ - type: dot_mrr@10
114
+ value: 0.5688571428571428
115
+ name: Dot Mrr@10
116
+ - type: dot_map@100
117
+ value: 0.5779425698484522
118
+ name: Dot Map@100
119
+ - type: query_active_dims
120
+ value: 56.099998474121094
121
+ name: Query Active Dims
122
+ - type: query_sparsity_ratio
123
+ value: 0.9981619815715183
124
+ name: Query Sparsity Ratio
125
+ - type: corpus_active_dims
126
+ value: 192.40869140625
127
+ name: Corpus Active Dims
128
+ - type: corpus_sparsity_ratio
129
+ value: 0.9936960654149056
130
+ name: Corpus Sparsity Ratio
131
+ - task:
132
+ type: sparse-information-retrieval
133
+ name: Sparse Information Retrieval
134
+ dataset:
135
+ name: NanoNFCorpus
136
+ type: NanoNFCorpus
137
+ metrics:
138
+ - type: dot_accuracy@1
139
+ value: 0.38
140
+ name: Dot Accuracy@1
141
+ - type: dot_accuracy@3
142
+ value: 0.58
143
+ name: Dot Accuracy@3
144
+ - type: dot_accuracy@5
145
+ value: 0.62
146
+ name: Dot Accuracy@5
147
+ - type: dot_accuracy@10
148
+ value: 0.74
149
+ name: Dot Accuracy@10
150
+ - type: dot_precision@1
151
+ value: 0.38
152
+ name: Dot Precision@1
153
+ - type: dot_precision@3
154
+ value: 0.36
155
+ name: Dot Precision@3
156
+ - type: dot_precision@5
157
+ value: 0.316
158
+ name: Dot Precision@5
159
+ - type: dot_precision@10
160
+ value: 0.26999999999999996
161
+ name: Dot Precision@10
162
+ - type: dot_recall@1
163
+ value: 0.039663209420347775
164
+ name: Dot Recall@1
165
+ - type: dot_recall@3
166
+ value: 0.07520387221675563
167
+ name: Dot Recall@3
168
+ - type: dot_recall@5
169
+ value: 0.09363263999248954
170
+ name: Dot Recall@5
171
+ - type: dot_recall@10
172
+ value: 0.14669853217549625
173
+ name: Dot Recall@10
174
+ - type: dot_ndcg@10
175
+ value: 0.3303519560816792
176
+ name: Dot Ndcg@10
177
+ - type: dot_mrr@10
178
+ value: 0.49576984126984125
179
+ name: Dot Mrr@10
180
+ - type: dot_map@100
181
+ value: 0.14778057031019226
182
+ name: Dot Map@100
183
+ - type: query_active_dims
184
+ value: 53.68000030517578
185
+ name: Query Active Dims
186
+ - type: query_sparsity_ratio
187
+ value: 0.9982412685831473
188
+ name: Query Sparsity Ratio
189
+ - type: corpus_active_dims
190
+ value: 367.5431823730469
191
+ name: Corpus Active Dims
192
+ - type: corpus_sparsity_ratio
193
+ value: 0.9879580898246167
194
+ name: Corpus Sparsity Ratio
195
+ - task:
196
+ type: sparse-information-retrieval
197
+ name: Sparse Information Retrieval
198
+ dataset:
199
+ name: NanoNQ
200
+ type: NanoNQ
201
+ metrics:
202
+ - type: dot_accuracy@1
203
+ value: 0.5
204
+ name: Dot Accuracy@1
205
+ - type: dot_accuracy@3
206
+ value: 0.76
207
+ name: Dot Accuracy@3
208
+ - type: dot_accuracy@5
209
+ value: 0.8
210
+ name: Dot Accuracy@5
211
+ - type: dot_accuracy@10
212
+ value: 0.88
213
+ name: Dot Accuracy@10
214
+ - type: dot_precision@1
215
+ value: 0.5
216
+ name: Dot Precision@1
217
+ - type: dot_precision@3
218
+ value: 0.25999999999999995
219
+ name: Dot Precision@3
220
+ - type: dot_precision@5
221
+ value: 0.16799999999999998
222
+ name: Dot Precision@5
223
+ - type: dot_precision@10
224
+ value: 0.09599999999999997
225
+ name: Dot Precision@10
226
+ - type: dot_recall@1
227
+ value: 0.48
228
+ name: Dot Recall@1
229
+ - type: dot_recall@3
230
+ value: 0.71
231
+ name: Dot Recall@3
232
+ - type: dot_recall@5
233
+ value: 0.75
234
+ name: Dot Recall@5
235
+ - type: dot_recall@10
236
+ value: 0.85
237
+ name: Dot Recall@10
238
+ - type: dot_ndcg@10
239
+ value: 0.677150216479017
240
+ name: Dot Ndcg@10
241
+ - type: dot_mrr@10
242
+ value: 0.6328888888888887
243
+ name: Dot Mrr@10
244
+ - type: dot_map@100
245
+ value: 0.6167275355591967
246
+ name: Dot Map@100
247
+ - type: query_active_dims
248
+ value: 55.939998626708984
249
+ name: Query Active Dims
250
+ - type: query_sparsity_ratio
251
+ value: 0.9981672236869567
252
+ name: Query Sparsity Ratio
253
+ - type: corpus_active_dims
254
+ value: 228.83615112304688
255
+ name: Corpus Active Dims
256
+ - type: corpus_sparsity_ratio
257
+ value: 0.9925025833456834
258
+ name: Corpus Sparsity Ratio
259
+ - task:
260
+ type: sparse-nano-beir
261
+ name: Sparse Nano BEIR
262
+ dataset:
263
+ name: NanoBEIR mean
264
+ type: NanoBEIR_mean
265
+ metrics:
266
+ - type: dot_accuracy@1
267
+ value: 0.4466666666666667
268
+ name: Dot Accuracy@1
269
+ - type: dot_accuracy@3
270
+ value: 0.66
271
+ name: Dot Accuracy@3
272
+ - type: dot_accuracy@5
273
+ value: 0.7133333333333333
274
+ name: Dot Accuracy@5
275
+ - type: dot_accuracy@10
276
+ value: 0.8133333333333334
277
+ name: Dot Accuracy@10
278
+ - type: dot_precision@1
279
+ value: 0.4466666666666667
280
+ name: Dot Precision@1
281
+ - type: dot_precision@3
282
+ value: 0.27777777777777773
283
+ name: Dot Precision@3
284
+ - type: dot_precision@5
285
+ value: 0.20933333333333334
286
+ name: Dot Precision@5
287
+ - type: dot_precision@10
288
+ value: 0.14933333333333332
289
+ name: Dot Precision@10
290
+ - type: dot_recall@1
291
+ value: 0.3265544031401159
292
+ name: Dot Recall@1
293
+ - type: dot_recall@3
294
+ value: 0.47506795740558516
295
+ name: Dot Recall@3
296
+ - type: dot_recall@5
297
+ value: 0.5212108799974965
298
+ name: Dot Recall@5
299
+ - type: dot_recall@10
300
+ value: 0.605566177391832
301
+ name: Dot Recall@10
302
+ - type: dot_ndcg@10
303
+ value: 0.5454544998511834
304
+ name: Dot Ndcg@10
305
+ - type: dot_mrr@10
306
+ value: 0.5658386243386242
307
+ name: Dot Mrr@10
308
+ - type: dot_map@100
309
+ value: 0.44748355857261374
310
+ name: Dot Map@100
311
+ - type: query_active_dims
312
+ value: 55.23999913533529
313
+ name: Query Active Dims
314
+ - type: query_sparsity_ratio
315
+ value: 0.9981901579472073
316
+ name: Query Sparsity Ratio
317
+ - type: corpus_active_dims
318
+ value: 246.17159613336406
319
+ name: Corpus Active Dims
320
+ - type: corpus_sparsity_ratio
321
+ value: 0.9919346177795241
322
+ name: Corpus Sparsity Ratio
323
+ ---
324
+
325
+ # CoCondenser trained on MS MARCO triplets
326
+
327
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) on the [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
328
+ ## Model Details
329
+
330
+ ### Model Description
331
+ - **Model Type:** SPLADE Sparse Encoder
332
+ - **Base model:** [Luyu/co-condenser-marco](https://huggingface.co/Luyu/co-condenser-marco) <!-- at revision e0cef0ab2410aae0f0994366ddefb5649a266709 -->
333
+ - **Maximum Sequence Length:** 512 tokens
334
+ - **Output Dimensionality:** 30522 dimensions
335
+ - **Similarity Function:** Dot Product
336
+ - **Training Dataset:**
337
+ - [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco)
338
+ - **Language:** en
339
+ - **License:** apache-2.0
340
+
341
+ ### Model Sources
342
+
343
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
344
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
345
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
346
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
347
+
348
+ ### Full Model Architecture
349
+
350
+ ```
351
+ SparseEncoder(
352
+ (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
353
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
354
+ )
355
+ ```
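
The same architecture can be assembled module by module. A minimal sketch using the module classes that this repository's `modules.json` points to; the constructor arguments are taken from the configuration files in this commit, but the exact signatures may vary between library versions:

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.models import MLMTransformer, SpladePooling

# An MLM transformer that produces per-token vocabulary logits,
# followed by SPLADE max pooling over the token positions.
mlm = MLMTransformer("Luyu/co-condenser-marco", max_seq_length=512)
pooling = SpladePooling(pooling_strategy="max", activation_function="relu")
model = SparseEncoder(modules=[mlm, pooling])
```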
356
+
357
+ ## Usage
358
+
359
+ ### Direct Usage (Sentence Transformers)
360
+
361
+ First install the Sentence Transformers library:
362
+
363
+ ```bash
364
+ pip install -U sentence-transformers
365
+ ```
366
+
367
+ Then you can load this model and run inference.
368
+ ```python
369
+ from sentence_transformers import SparseEncoder
370
+
371
+ # Download from the 🤗 Hub
372
+ model = SparseEncoder("tomaarsen/splade-cocondenser-msmarco-margin-mse")
373
+ # Run inference
374
+ queries = [
375
+ "when did shanghai disneyland open",
376
+ ]
377
+ documents = [
378
+ "Shanghai Disney officially opens: A peek inside. June 17, 2016, 6 p.m. After five years of construction, $5.5 billion in spending and a month of testing to work out the kinks, Shanghai Disney Resort opened to the public just before noon, Shanghai time, on Thursday, June 16 (which was 9 p.m. Wednesday in Anaheim, home of the original Disney park). Shanghai Disneyland features six themed areas, and the resort contains two hotels, a shopping district and 99 acres of gardens, lakes and parkland. We'll keep you updated throughout the week with new details and peeks inside the resort.",
379
+ 'Map of the Old City of Shanghai. By the early 1400s, Shanghai had become important enough for Ming dynasty engineers to begin dredging the Huangpu River (also known as Shen). In 1553, a city wall was built around the Old Town (Nanshi) as a defense against the depredations of the Wokou (Japanese pirates).',
380
+ 'The conflict is referred to in China as the War of Resistance against Japanese Aggression (1937-45) and the Anti-Fascist War. Japan’s expansionist policy of the 1930s, driven by the military, was to set up what it called the Greater East Asia Co-Prosperity Sphere. Marco Polo Bridge, Beijing.A sphere.e are marking the anniversary of Germany and Japan’s surrender in 1945, but it is legitimate to suggest that the incident that sparked the conflict that became WWII occurred not in Poland in 1939 but in China, near this eleven-arched bridge on the outskirts of Beijing, in July 1937. Let’s look at the undisputed facts.',
381
+ ]
382
+ query_embeddings = model.encode_query(queries)
383
+ document_embeddings = model.encode_document(documents)
384
+ print(query_embeddings.shape, document_embeddings.shape)
385
+ # [1, 30522] [3, 30522]
386
+
387
+ # Get the similarity scores for the embeddings
388
+ similarities = model.similarity(query_embeddings, document_embeddings)
389
+ print(similarities)
390
+ # tensor([[31.8057, 19.5344, 12.4372]])
391
+ ```
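
Continuing the example above, the sparse embeddings can be inspected directly, since every dimension corresponds to one vocabulary token. The following snippet is not part of the generated card: it assumes `encode_query` returns (possibly sparse) PyTorch tensors and that the Hugging Face tokenizer is reachable as `model.tokenizer`.

```python
import torch

# Show the highest-weighted vocabulary terms for the first query embedding.
emb = query_embeddings[0]
if emb.is_sparse:          # encode_query may return sparse COO tensors
    emb = emb.to_dense()
weights, indices = torch.topk(emb, k=10)
tokens = model.tokenizer.convert_ids_to_tokens(indices.tolist())
for token, weight in zip(tokens, weights.tolist()):
    print(f"{token:>12}  {weight:.2f}")
```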
392
+
393
+ <!--
394
+ ### Direct Usage (Transformers)
395
+
396
+ <details><summary>Click to see the direct usage in Transformers</summary>
397
+
398
+ </details>
399
+ -->
400
+
401
+ <!--
402
+ ### Downstream Usage (Sentence Transformers)
403
+
404
+ You can finetune this model on your own dataset.
405
+
406
+ <details><summary>Click to expand</summary>
407
+
408
+ </details>
409
+ -->
410
+
411
+ <!--
412
+ ### Out-of-Scope Use
413
+
414
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
415
+ -->
416
+
417
+ ## Evaluation
418
+
419
+ ### Metrics
420
+
421
+ #### Sparse Information Retrieval
422
+
423
+ * Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
424
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
425
+
426
+ | Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
427
+ |:----------------------|:------------|:-------------|:-----------|
428
+ | dot_accuracy@1 | 0.46 | 0.38 | 0.5 |
429
+ | dot_accuracy@3 | 0.64 | 0.58 | 0.76 |
430
+ | dot_accuracy@5 | 0.72 | 0.62 | 0.8 |
431
+ | dot_accuracy@10 | 0.82 | 0.74 | 0.88 |
432
+ | dot_precision@1 | 0.46 | 0.38 | 0.5 |
433
+ | dot_precision@3 | 0.2133 | 0.36 | 0.26 |
434
+ | dot_precision@5 | 0.144 | 0.316 | 0.168 |
435
+ | dot_precision@10 | 0.082 | 0.27 | 0.096 |
436
+ | dot_recall@1 | 0.46 | 0.0397 | 0.48 |
437
+ | dot_recall@3 | 0.64 | 0.0752 | 0.71 |
438
+ | dot_recall@5 | 0.72 | 0.0936 | 0.75 |
439
+ | dot_recall@10 | 0.82 | 0.1467 | 0.85 |
440
+ | **dot_ndcg@10** | **0.6289** | **0.3304** | **0.6772** |
441
+ | dot_mrr@10 | 0.5689 | 0.4958 | 0.6329 |
442
+ | dot_map@100 | 0.5779 | 0.1478 | 0.6167 |
443
+ | query_active_dims | 56.1 | 53.68 | 55.94 |
444
+ | query_sparsity_ratio | 0.9982 | 0.9982 | 0.9982 |
445
+ | corpus_active_dims | 192.4087 | 367.5432 | 228.8362 |
446
+ | corpus_sparsity_ratio | 0.9937 | 0.988 | 0.9925 |
447
+
448
+ #### Sparse Nano BEIR
449
+
450
+ * Dataset: `NanoBEIR_mean`
451
+ * Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
452
+ ```json
453
+ {
454
+ "dataset_names": [
455
+ "msmarco",
456
+ "nfcorpus",
457
+ "nq"
458
+ ]
459
+ }
460
+ ```
461
+
462
+ | Metric | Value |
463
+ |:----------------------|:-----------|
464
+ | dot_accuracy@1 | 0.4467 |
465
+ | dot_accuracy@3 | 0.66 |
466
+ | dot_accuracy@5 | 0.7133 |
467
+ | dot_accuracy@10 | 0.8133 |
468
+ | dot_precision@1 | 0.4467 |
469
+ | dot_precision@3 | 0.2778 |
470
+ | dot_precision@5 | 0.2093 |
471
+ | dot_precision@10 | 0.1493 |
472
+ | dot_recall@1 | 0.3266 |
473
+ | dot_recall@3 | 0.4751 |
474
+ | dot_recall@5 | 0.5212 |
475
+ | dot_recall@10 | 0.6056 |
476
+ | **dot_ndcg@10** | **0.5455** |
477
+ | dot_mrr@10 | 0.5658 |
478
+ | dot_map@100 | 0.4475 |
479
+ | query_active_dims | 55.24 |
480
+ | query_sparsity_ratio | 0.9982 |
481
+ | corpus_active_dims | 246.1716 |
482
+ | corpus_sparsity_ratio | 0.9919 |
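
The sparsity ratios above follow directly from the active-dimension counts and the 30522-term vocabulary; a quick sanity check (not part of the evaluator code):

```python
vocab_size = 30522
query_active_dims = 55.24             # NanoBEIR mean from the table above
print(round(1 - query_active_dims / vocab_size, 4))   # 0.9982

corpus_active_dims = 246.1716
print(round(1 - corpus_active_dims / vocab_size, 4))  # 0.9919
```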
483
+
484
+ <!--
485
+ ## Bias, Risks and Limitations
486
+
487
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
488
+ -->
489
+
490
+ <!--
491
+ ### Recommendations
492
+
493
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
494
+ -->
495
+
496
+ ## Training Details
497
+
498
+ ### Training Dataset
499
+
500
+ #### msmarco
501
+
502
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
503
+ * Size: 90,000 training samples
504
+ * Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
505
+ * Approximate statistics based on the first 1000 samples:
506
+ | | score | query | positive | negative |
507
+ |:--------|:--------------------------------------------------------------------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
508
+ | type | float | string | string | string |
509
+ | details | <ul><li>min: -2.22</li><li>mean: 13.59</li><li>max: 22.53</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.05 tokens</li><li>max: 40 tokens</li></ul> | <ul><li>min: 19 tokens</li><li>mean: 81.18 tokens</li><li>max: 203 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 77.08 tokens</li><li>max: 249 tokens</li></ul> |
510
+ * Samples:
511
+ | score | query | positive | negative |
512
+ |:-------------------------------|:-----------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
513
+ | <code>4.470368494590124</code> | <code>where does the bile duct carry its secretions</code> | <code>The function of the common bile duct is to carry bile from the liver and the gallbladder into the duodenum, the top of the small intestine directly after the stomach. The bile it carries interacts with ingested fats and fat-soluble vitamins to enable them to be absorbed by the intestine.</code> | <code>The gall bladder is a pouch-shaped organ that stores the bile produced by the liver. The gall bladder shares a vessel, called the common bile duct, with the liver. When bile is needed, it moves through the common bile duct into the first part of the small intestine, the duodenum. It is here that the bile breaks down fat.</code> |
514
+ | <code>9.550037781397503</code> | <code>definition of reverse auction</code> | <code>Reverse auction. A reverse auction is a type of auction in which the roles of buyer and seller are reversed. In an ordinary auction (also known as a 'forward auction'), buyers compete to obtain goods or services by offering increasingly higher prices. In a reverse auction, the sellers compete to obtain business from the buyer and prices will typically decrease as the sellers underbid each other.</code> | <code>No-reserve auction. A No-reserve auction (NR), also known as an absolute auction, is an auction in which the item for sale will be sold regardless of price. From the seller's perspective, advertising an auction as having no reserve price can be desirable because it potentially attracts a greater number of bidders due to the possibility of a bargain.</code> |
515
+ | <code>19.58259622255961</code> | <code>how do i prevent diverticulitis</code> | <code>Follow Following Unfollow Pending Disabled. A , Gastroenterology, answered. The suggestion to prevent diverticulitis is to eat a diet high in fiber, and that includes high-fiber whole grains, fruits, vegetables, nuts, and seeds. I’m aware that some gastroenterologists say to avoid all seeds and nuts, so some of you are nuts enough to wash tomato seeds from slices and pick free poppy seeds from buns.</code> | <code>The test is fast and easy especially with the newer CT scanners. But does it provide the information needed? CT KUBs are used to screen for a variety of intra-abdominal conditions, including appendicitis, kidney stones, diverticulitis, and others.</code> |
516
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
517
+ ```json
518
+ {
519
+ "loss": "SparseMarginMSELoss",
520
+ "lambda_corpus": 0.08,
521
+ "lambda_query": 0.1
522
+ }
523
+ ```
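
A rough sketch of how this loss configuration could be built with the sparse-encoder losses referenced in this card; `model` is assumed to be the SparseEncoder being trained, and the keyword names simply mirror the JSON above (they may differ between library versions):

```python
from sentence_transformers.sparse_encoder.losses import SpladeLoss, SparseMarginMSELoss

# MarginMSE distillation wrapped with SPLADE's sparsity regularization;
# the regularization strengths mirror lambda_query / lambda_corpus above.
loss = SpladeLoss(
    model=model,
    loss=SparseMarginMSELoss(model),
    lambda_query=0.1,
    lambda_corpus=0.08,
)
```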
524
+
525
+ ### Evaluation Dataset
526
+
527
+ #### msmarco
528
+
529
+ * Dataset: [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco) at [9e329ed](https://huggingface.co/datasets/sentence-transformers/msmarco/tree/9e329ed2e649c9d37b0d91dd6b764ff6fe671d83)
530
+ * Size: 10,000 evaluation samples
531
+ * Columns: <code>score</code>, <code>query</code>, <code>positive</code>, and <code>negative</code>
532
+ * Approximate statistics based on the first 1000 samples:
533
+ | | score | query | positive | negative |
534
+ |:--------|:-------------------------------------------------------------------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
535
+ | type | float | string | string | string |
536
+ | details | <ul><li>min: -1.34</li><li>mean: 13.49</li><li>max: 22.2</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 8.85 tokens</li><li>max: 27 tokens</li></ul> | <ul><li>min: 14 tokens</li><li>mean: 80.48 tokens</li><li>max: 211 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 77.44 tokens</li><li>max: 209 tokens</li></ul> |
537
+ * Samples:
538
+ | score | query | positive | negative |
539
+ |:-------------------------------|:-----------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
540
+ | <code>15.64028427998225</code> | <code>what is a protected seedbed</code> | <code>A seedbed is a plot of garden set aside to grow vegetables seeds, which can later be transplanted. seedbed is a plot of garden set aside to grow vegetables seeds, which can later be transplanted.</code> | <code>Several articles within the Confederate States’ Constitution specifically protected slavery within the Confederacy, but some articles of the U.S. Constitution also protected slavery—the Emancipation Proclamation drew a clearer distinction between the two.</code> |
541
+ | <code>6.375148057937622</code> | <code>who founded ecuador</code> | <code>The first Spanish settlement in Ecuador was established in 1534 at Quito on the site of an important Incan town of the same name. Another settlement was established four years later near the river Guayas in Guayaquil.</code> | <code>Zuleta is a colonial working farm of 4,000 acres (2,000 hectares) that belongs to the family of Mr. Galo Plaza lasso, a former president of Ecuador, for more than 100 years. It was chosen as one of the world’s “Top Ten Finds” by Outside magazine and named as one of the best Ecuador Hotel by National Geographic Traveler.</code> |
542
+ | <code>8.436618288358051</code> | <code>what is aol problem</code> | <code>AOL problems. Lots of people are reporting ongoing (RTR:GE) messages from AOL today. This indicates the AOL mail servers are having problems and can’t accept mail. This has nothing to do with spam, filtering or malicious email. This is simply their servers aren’t functioning as well as they should be and so AOL can’t accept all the mail thrown at them. These types of blocks resolve themselves. Update Feb 8, 2016: AOL users are having problems logging in.</code> | <code>Executive Director. I have read these complaints of poor service and agree 110%. I'm a college professor and give extra credit to all AOL users and over the 100% highest grade. I thought I phoned AOL and get some chap in India who is a proven scam man and I'm the poor American SOB who gets whacked.</code> |
543
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
544
+ ```json
545
+ {
546
+ "loss": "SparseMarginMSELoss",
547
+ "lambda_corpus": 0.08,
548
+ "lambda_query": 0.1
549
+ }
550
+ ```
551
+
552
+ ### Training Hyperparameters
553
+ #### Non-Default Hyperparameters
554
+
555
+ - `eval_strategy`: steps
556
+ - `per_device_train_batch_size`: 16
557
+ - `per_device_eval_batch_size`: 16
558
+ - `learning_rate`: 2e-05
559
+ - `num_train_epochs`: 1
560
+ - `warmup_ratio`: 0.1
561
+ - `fp16`: True
562
+ - `batch_sampler`: no_duplicates
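
These settings correspond roughly to the following training setup. This is a sketch, not the script that produced this model: the trainer and argument class names follow the Sentence Transformers sparse-encoder API, and `model`, `loss`, `train_dataset`, and `eval_dataset` are assumed to be defined as in the sections above.

```python
from sentence_transformers.sparse_encoder import (
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

args = SparseEncoderTrainingArguments(
    output_dir="splade-cocondenser-msmarco-margin-mse",
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # the no_duplicates sampler above
)

trainer = SparseEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # 90,000 msmarco training samples
    eval_dataset=eval_dataset,    # 10,000 msmarco evaluation samples
    loss=loss,
)
trainer.train()
```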
563
+
564
+ #### All Hyperparameters
565
+ <details><summary>Click to expand</summary>
566
+
567
+ - `overwrite_output_dir`: False
568
+ - `do_predict`: False
569
+ - `eval_strategy`: steps
570
+ - `prediction_loss_only`: True
571
+ - `per_device_train_batch_size`: 16
572
+ - `per_device_eval_batch_size`: 16
573
+ - `per_gpu_train_batch_size`: None
574
+ - `per_gpu_eval_batch_size`: None
575
+ - `gradient_accumulation_steps`: 1
576
+ - `eval_accumulation_steps`: None
577
+ - `torch_empty_cache_steps`: None
578
+ - `learning_rate`: 2e-05
579
+ - `weight_decay`: 0.0
580
+ - `adam_beta1`: 0.9
581
+ - `adam_beta2`: 0.999
582
+ - `adam_epsilon`: 1e-08
583
+ - `max_grad_norm`: 1.0
584
+ - `num_train_epochs`: 1
585
+ - `max_steps`: -1
586
+ - `lr_scheduler_type`: linear
587
+ - `lr_scheduler_kwargs`: {}
588
+ - `warmup_ratio`: 0.1
589
+ - `warmup_steps`: 0
590
+ - `log_level`: passive
591
+ - `log_level_replica`: warning
592
+ - `log_on_each_node`: True
593
+ - `logging_nan_inf_filter`: True
594
+ - `save_safetensors`: True
595
+ - `save_on_each_node`: False
596
+ - `save_only_model`: False
597
+ - `restore_callback_states_from_checkpoint`: False
598
+ - `no_cuda`: False
599
+ - `use_cpu`: False
600
+ - `use_mps_device`: False
601
+ - `seed`: 42
602
+ - `data_seed`: None
603
+ - `jit_mode_eval`: False
604
+ - `use_ipex`: False
605
+ - `bf16`: False
606
+ - `fp16`: True
607
+ - `fp16_opt_level`: O1
608
+ - `half_precision_backend`: auto
609
+ - `bf16_full_eval`: False
610
+ - `fp16_full_eval`: False
611
+ - `tf32`: None
612
+ - `local_rank`: 0
613
+ - `ddp_backend`: None
614
+ - `tpu_num_cores`: None
615
+ - `tpu_metrics_debug`: False
616
+ - `debug`: []
617
+ - `dataloader_drop_last`: False
618
+ - `dataloader_num_workers`: 0
619
+ - `dataloader_prefetch_factor`: None
620
+ - `past_index`: -1
621
+ - `disable_tqdm`: False
622
+ - `remove_unused_columns`: True
623
+ - `label_names`: None
624
+ - `load_best_model_at_end`: False
625
+ - `ignore_data_skip`: False
626
+ - `fsdp`: []
627
+ - `fsdp_min_num_params`: 0
628
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
629
+ - `fsdp_transformer_layer_cls_to_wrap`: None
630
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
631
+ - `deepspeed`: None
632
+ - `label_smoothing_factor`: 0.0
633
+ - `optim`: adamw_torch
634
+ - `optim_args`: None
635
+ - `adafactor`: False
636
+ - `group_by_length`: False
637
+ - `length_column_name`: length
638
+ - `ddp_find_unused_parameters`: None
639
+ - `ddp_bucket_cap_mb`: None
640
+ - `ddp_broadcast_buffers`: False
641
+ - `dataloader_pin_memory`: True
642
+ - `dataloader_persistent_workers`: False
643
+ - `skip_memory_metrics`: True
644
+ - `use_legacy_prediction_loop`: False
645
+ - `push_to_hub`: False
646
+ - `resume_from_checkpoint`: None
647
+ - `hub_model_id`: None
648
+ - `hub_strategy`: every_save
649
+ - `hub_private_repo`: None
650
+ - `hub_always_push`: False
651
+ - `gradient_checkpointing`: False
652
+ - `gradient_checkpointing_kwargs`: None
653
+ - `include_inputs_for_metrics`: False
654
+ - `include_for_metrics`: []
655
+ - `eval_do_concat_batches`: True
656
+ - `fp16_backend`: auto
657
+ - `push_to_hub_model_id`: None
658
+ - `push_to_hub_organization`: None
659
+ - `mp_parameters`:
660
+ - `auto_find_batch_size`: False
661
+ - `full_determinism`: False
662
+ - `torchdynamo`: None
663
+ - `ray_scope`: last
664
+ - `ddp_timeout`: 1800
665
+ - `torch_compile`: False
666
+ - `torch_compile_backend`: None
667
+ - `torch_compile_mode`: None
668
+ - `include_tokens_per_second`: False
669
+ - `include_num_input_tokens_seen`: False
670
+ - `neftune_noise_alpha`: None
671
+ - `optim_target_modules`: None
672
+ - `batch_eval_metrics`: False
673
+ - `eval_on_start`: False
674
+ - `use_liger_kernel`: False
675
+ - `eval_use_gather_object`: False
676
+ - `average_tokens_across_devices`: False
677
+ - `prompts`: None
678
+ - `batch_sampler`: no_duplicates
679
+ - `multi_dataset_batch_sampler`: proportional
680
+ - `router_mapping`: {}
681
+ - `learning_rate_mapping`: {}
682
+
683
+ </details>
684
+
685
+ ### Training Logs
686
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
687
+ |:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|
688
+ | 0.0178 | 100 | 805201.68 | - | - | - | - | - |
689
+ | 0.0356 | 200 | 11999.3975 | - | - | - | - | - |
690
+ | 0.0533 | 300 | 124.0031 | - | - | - | - | - |
691
+ | 0.0711 | 400 | 62.6813 | - | - | - | - | - |
692
+ | 0.0889 | 500 | 46.0329 | 49.7658 | 0.4890 | 0.2543 | 0.5131 | 0.4188 |
693
+ | 0.1067 | 600 | 41.2877 | - | - | - | - | - |
694
+ | 0.1244 | 700 | 35.3636 | - | - | - | - | - |
695
+ | 0.1422 | 800 | 33.3727 | - | - | - | - | - |
696
+ | 0.16 | 900 | 29.389 | - | - | - | - | - |
697
+ | 0.1778 | 1000 | 31.2482 | 28.1527 | 0.5652 | 0.2875 | 0.5423 | 0.4650 |
698
+ | 0.1956 | 1100 | 31.43 | - | - | - | - | - |
699
+ | 0.2133 | 1200 | 27.9919 | - | - | - | - | - |
700
+ | 0.2311 | 1300 | 26.9214 | - | - | - | - | - |
701
+ | 0.2489 | 1400 | 27.5533 | - | - | - | - | - |
702
+ | 0.2667 | 1500 | 25.7473 | 26.8466 | 0.5837 | 0.3265 | 0.6268 | 0.5123 |
703
+ | 0.2844 | 1600 | 26.7899 | - | - | - | - | - |
704
+ | 0.3022 | 1700 | 24.0652 | - | - | - | - | - |
705
+ | 0.32 | 1800 | 23.5837 | - | - | - | - | - |
706
+ | 0.3378 | 1900 | 24.1051 | - | - | - | - | - |
707
+ | 0.3556 | 2000 | 24.6901 | 22.0851 | 0.6018 | 0.3325 | 0.6359 | 0.5234 |
708
+ | 0.3733 | 2100 | 21.5136 | - | - | - | - | - |
709
+ | 0.3911 | 2200 | 22.066 | - | - | - | - | - |
710
+ | 0.4089 | 2300 | 20.8234 | - | - | - | - | - |
711
+ | 0.4267 | 2400 | 20.1988 | - | - | - | - | - |
712
+ | 0.4444 | 2500 | 20.0342 | 20.3437 | 0.5901 | 0.3222 | 0.6010 | 0.5044 |
713
+ | 0.4622 | 2600 | 18.8835 | - | - | - | - | - |
714
+ | 0.48 | 2700 | 19.4797 | - | - | - | - | - |
715
+ | 0.4978 | 2800 | 19.6199 | - | - | - | - | - |
716
+ | 0.5156 | 2900 | 16.6963 | - | - | - | - | - |
717
+ | 0.5333 | 3000 | 19.9204 | 18.0851 | 0.5915 | 0.3111 | 0.6323 | 0.5116 |
718
+ | 0.5511 | 3100 | 18.7849 | - | - | - | - | - |
719
+ | 0.5689 | 3200 | 18.3169 | - | - | - | - | - |
720
+ | 0.5867 | 3300 | 17.1938 | - | - | - | - | - |
721
+ | 0.6044 | 3400 | 18.0807 | - | - | - | - | - |
722
+ | 0.6222 | 3500 | 16.7721 | 20.1195 | 0.6012 | 0.3119 | 0.6337 | 0.5156 |
723
+ | 0.64 | 3600 | 16.7909 | - | - | - | - | - |
724
+ | 0.6578 | 3700 | 16.4954 | - | - | - | - | - |
725
+ | 0.6756 | 3800 | 16.3734 | - | - | - | - | - |
726
+ | 0.6933 | 3900 | 17.2231 | - | - | - | - | - |
727
+ | 0.7111 | 4000 | 16.8486 | 17.5785 | 0.6228 | 0.3423 | 0.6553 | 0.5401 |
728
+ | 0.7289 | 4100 | 18.2939 | - | - | - | - | - |
729
+ | 0.7467 | 4200 | 16.1108 | - | - | - | - | - |
730
+ | 0.7644 | 4300 | 16.878 | - | - | - | - | - |
731
+ | 0.7822 | 4400 | 15.6163 | - | - | - | - | - |
732
+ | 0.8 | 4500 | 15.8337 | 16.1847 | 0.6286 | 0.3376 | 0.6639 | 0.5434 |
733
+ | 0.8178 | 4600 | 15.5014 | - | - | - | - | - |
734
+ | 0.8356 | 4700 | 15.7579 | - | - | - | - | - |
735
+ | 0.8533 | 4800 | 15.9361 | - | - | - | - | - |
736
+ | 0.8711 | 4900 | 16.3308 | - | - | - | - | - |
737
+ | 0.8889 | 5000 | 14.8395 | 17.4054 | 0.6221 | 0.3280 | 0.6853 | 0.5451 |
738
+ | 0.9067 | 5100 | 14.8655 | - | - | - | - | - |
739
+ | 0.9244 | 5200 | 14.6498 | - | - | - | - | - |
740
+ | 0.9422 | 5300 | 15.5189 | - | - | - | - | - |
741
+ | 0.96 | 5400 | 14.608 | - | - | - | - | - |
742
+ | 0.9778 | 5500 | 15.6019 | 16.4883 | 0.6298 | 0.3317 | 0.6831 | 0.5482 |
743
+ | 0.9956 | 5600 | 14.6263 | - | - | - | - | - |
744
+ | -1 | -1 | - | - | 0.6289 | 0.3304 | 0.6772 | 0.5455 |
745
+
746
+
747
+ ### Environmental Impact
748
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
749
+ - **Energy Consumed**: 0.218 kWh
750
+ - **Carbon Emitted**: 0.085 kg of CO2
751
+ - **Hours Used**: 0.618 hours
752
+
753
+ ### Training Hardware
754
+ - **On Cloud**: No
755
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
756
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
757
+ - **RAM Size**: 31.78 GB
758
+
759
+ ### Framework Versions
760
+ - Python: 3.11.6
761
+ - Sentence Transformers: 4.2.0.dev0
762
+ - Transformers: 4.52.4
763
+ - PyTorch: 2.6.0+cu124
764
+ - Accelerate: 1.5.1
765
+ - Datasets: 2.21.0
766
+ - Tokenizers: 0.21.1
767
+
768
+ ## Citation
769
+
770
+ ### BibTeX
771
+
772
+ #### Sentence Transformers
773
+ ```bibtex
774
+ @inproceedings{reimers-2019-sentence-bert,
775
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
776
+ author = "Reimers, Nils and Gurevych, Iryna",
777
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
778
+ month = "11",
779
+ year = "2019",
780
+ publisher = "Association for Computational Linguistics",
781
+ url = "https://arxiv.org/abs/1908.10084",
782
+ }
783
+ ```
784
+
785
+ #### SpladeLoss
786
+ ```bibtex
787
+ @misc{formal2022distillationhardnegativesampling,
788
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
789
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
790
+ year={2022},
791
+ eprint={2205.04733},
792
+ archivePrefix={arXiv},
793
+ primaryClass={cs.IR},
794
+ url={https://arxiv.org/abs/2205.04733},
795
+ }
796
+ ```
797
+
798
+ #### SparseMarginMSELoss
799
+ ```bibtex
800
+ @misc{hofstätter2021improving,
801
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
802
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
803
+ year={2021},
804
+ eprint={2010.02666},
805
+ archivePrefix={arXiv},
806
+ primaryClass={cs.IR}
807
+ }
808
+ ```
809
+
810
+ #### FlopsLoss
811
+ ```bibtex
812
+ @article{paria2020minimizing,
813
+ title={Minimizing flops to learn efficient sparse representations},
814
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{'o}czos, Barnab{'a}s},
815
+ journal={arXiv preprint arXiv:2004.05665},
816
+ year={2020}
817
+ }
818
+ ```
819
+
820
+ <!--
821
+ ## Glossary
822
+
823
+ *Clearly define terms in order to be accessible across audiences.*
824
+ -->
825
+
826
+ <!--
827
+ ## Model Card Authors
828
+
829
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
830
+ -->
831
+
832
+ <!--
833
+ ## Model Card Contact
834
+
835
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
836
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
1
+ {
2
+ "architectures": [
3
+ "BertForMaskedLM"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "classifier_dropout": null,
7
+ "gradient_checkpointing": false,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 768,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 3072,
13
+ "layer_norm_eps": 1e-12,
14
+ "max_position_embeddings": 512,
15
+ "model_type": "bert",
16
+ "num_attention_heads": 12,
17
+ "num_hidden_layers": 12,
18
+ "pad_token_id": 0,
19
+ "position_embedding_type": "absolute",
20
+ "torch_dtype": "float32",
21
+ "transformers_version": "4.52.4",
22
+ "type_vocab_size": 2,
23
+ "use_cache": true,
24
+ "vocab_size": 30522
25
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "model_type": "SparseEncoder",
3
+ "__version__": {
4
+ "sentence_transformers": "4.2.0.dev0",
5
+ "transformers": "4.52.4",
6
+ "pytorch": "2.6.0+cu124"
7
+ },
8
+ "prompts": {
9
+ "query": "",
10
+ "document": ""
11
+ },
12
+ "default_prompt_name": null,
13
+ "similarity_fn_name": "dot"
14
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2e64a3093285ddf8c30ab25bb7a9d6d45c7b974bdc70ac325605af3b962115b9
3
+ size 438080896
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_SpladePooling",
12
+ "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 512,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "cls_token": "[CLS]",
3
+ "mask_token": "[MASK]",
4
+ "pad_token": "[PAD]",
5
+ "sep_token": "[SEP]",
6
+ "unk_token": "[UNK]"
7
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "[PAD]",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "100": {
12
+ "content": "[UNK]",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "101": {
20
+ "content": "[CLS]",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "102": {
28
+ "content": "[SEP]",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "103": {
36
+ "content": "[MASK]",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "clean_up_tokenization_spaces": true,
45
+ "cls_token": "[CLS]",
46
+ "do_basic_tokenize": true,
47
+ "do_lower_case": true,
48
+ "extra_special_tokens": {},
49
+ "mask_token": "[MASK]",
50
+ "model_max_length": 512,
51
+ "never_split": null,
52
+ "pad_token": "[PAD]",
53
+ "sep_token": "[SEP]",
54
+ "strip_accents": null,
55
+ "tokenize_chinese_chars": true,
56
+ "tokenizer_class": "BertTokenizer",
57
+ "unk_token": "[UNK]"
58
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff