tomaarsen (HF Staff) committed · verified
Commit 3c6aac9 · 1 Parent(s): c4a5314

Add new SparseEncoder model
README.md ADDED
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- asymmetric
- inference-free
- splade
- generated_from_trainer
- dataset_size:99000
- loss:SpladeLoss
- loss:SparseMultipleNegativesRankingLoss
- loss:FlopsLoss
widget:
- text: Rollin' (Limp Bizkit song) The music video was filmed atop the South Tower
    of the former World Trade Center in New York City. The introduction features Ben
    Stiller and Stephen Dorff mistaking Fred Durst for the valet and giving him the
    keys to their Bentley Azure. Also making a cameo is break dancer Mr. Wiggles.
    The rest of the video has several cuts to Durst and his bandmates hanging out
    of the Bentley as they drive about Manhattan. The song Ben Stiller is playing
    at the beginning is "My Generation" from the same album. The video also features
    scenes of Fred Durst with five girls dancing in a room. The video was filmed around
    the same time as the film Zoolander, which explains Stiller and Dorff's appearance.
    Fred Durst has a small cameo in that film.
- text: 'Maze Runner: The Death Cure On April 22, 2017, the studio delayed the release
    date once again, to February 9, 2018, in order to allow more time for post-production;
    months later, on August 25, the studio moved the release forward two weeks.[17]
    The film will premiere on January 26, 2018 in 3D, IMAX and IMAX 3D.[18][19]'
- text: who played the dj in the movie the warriors
- text: Lionel Messi Born and raised in central Argentina, Messi was diagnosed with
    a growth hormone deficiency as a child. At age 13, he relocated to Spain to join
    Barcelona, who agreed to pay for his medical treatment. After a fast progression
    through Barcelona's youth academy, Messi made his competitive debut aged 17 in
    October 2004. Despite being injury-prone during his early career, he established
    himself as an integral player for the club within the next three years, finishing
    2007 as a finalist for both the Ballon d'Or and FIFA World Player of the Year
    award, a feat he repeated the following year. His first uninterrupted campaign
    came in the 2008–09 season, during which he helped Barcelona achieve the first
    treble in Spanish football. At 22 years old, Messi won the Ballon d'Or and FIFA
    World Player of the Year award by record voting margins.
- text: 'Send In the Clowns "Send In the Clowns" is a song written by Stephen Sondheim
    for the 1973 musical A Little Night Music, an adaptation of Ingmar Bergman''s
    film Smiles of a Summer Night. It is a ballad from Act Two, in which the character
    Desirée reflects on the ironies and disappointments of her life. Among other things,
    she looks back on an affair years earlier with the lawyer Fredrik, who was deeply
    in love with her but whose marriage proposals she had rejected. Meeting him after
    so long, she realizes she is in love with him and finally ready to marry him,
    but now it is he who rejects her: he is in an unconsummated marriage with a much
    younger woman. Desirée proposes marriage to rescue him from this situation, but
    he declines, citing his dedication to his bride. Reacting to his rejection, Desirée
    sings this song. The song is later reprised as a coda after Fredrik''s young wife
    runs away with his son, and Fredrik is finally free to accept Desirée''s offer.[1]'
datasets:
- sentence-transformers/natural-questions
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
- query_active_dims
- query_sparsity_ratio
- corpus_active_dims
- corpus_sparsity_ratio
co2_eq_emissions:
  emissions: 40.32092414763781
  energy_consumed: 0.10373222712421808
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.271
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: Inference-free SPLADE distilbert-base-uncased trained on Natural-Questions
    tuples
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: dot_accuracy@1
      value: 0.32
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.52
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.62
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.8
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.32
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.1733333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.124
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.08
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.32
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.52
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.62
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.8
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5423437640649257
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.46220634920634923
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.46993244404328005
      name: Dot Map@100
    - type: query_active_dims
      value: 7.21999979019165
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999763449322122
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 55.45885467529297
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9981829875278392
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.46
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.52
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.6
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.46
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.36666666666666664
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.32
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.24200000000000002
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.04383405927588812
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07406949762623444
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.08852725075885176
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.11263403679653615
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.31346960788721207
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4925
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.13315774123427984
      name: Dot Map@100
    - type: query_active_dims
      value: 5.659999847412109
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9998145599945151
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 68.93396759033203
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9977414989977612
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: dot_accuracy@1
      value: 0.34
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.58
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.68
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.76
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.34
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.19333333333333333
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.136
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.078
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.33
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.55
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.64
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.7
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.5246562093175895
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.47938095238095235
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.4679694682232657
      name: Dot Map@100
    - type: query_active_dims
      value: 10.319999694824219
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.9996618832417657
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 48.96862030029297
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9983956287169815
      name: Corpus Sparsity Ratio
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.37333333333333335
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5333333333333333
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.6066666666666668
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.7200000000000001
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.37333333333333335
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.24444444444444444
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.19333333333333336
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.13333333333333333
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.23127801975862938
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.3813564992087448
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4495090835862839
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.5375446789321787
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.46015652708990906
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4780291005291006
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.35701988450027516
      name: Dot Map@100
    - type: query_active_dims
      value: 7.733333110809326
      name: Query Active Dims
    - type: query_sparsity_ratio
      value: 0.999746630852801
      name: Query Sparsity Ratio
    - type: corpus_active_dims
      value: 56.004758931296756
      name: Corpus Active Dims
    - type: corpus_sparsity_ratio
      value: 0.9981651019287301
      name: Corpus Sparsity Ratio
---

# Inference-free SPLADE distilbert-base-uncased trained on Natural-Questions tuples

This is an [Asymmetric Inference-free SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model trained on the [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** Asymmetric Inference-free SPLADE Sparse Encoder
- **Base model:** [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

The `Router` sends queries through the lightweight `query_0_IDF` lookup, so no transformer forward pass is needed at query time (hence "inference-free"), while documents are encoded by the full `document_0_MLMTransformer` followed by `document_1_SpladePooling`:

```
SparseEncoder(
  (0): Router(
    (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: DistilBertTokenizerFast)
    (document_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: DistilBertForMaskedLM
    (document_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
  )
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/inference-free-splade-distilbert-base-uncased-nq")
# Run inference
queries = [
    "is send in the clowns from a musical",
]
documents = [
    'Send In the Clowns "Send In the Clowns" is a song written by Stephen Sondheim for the 1973 musical A Little Night Music, an adaptation of Ingmar Bergman\'s film Smiles of a Summer Night. It is a ballad from Act Two, in which the character Desirée reflects on the ironies and disappointments of her life. Among other things, she looks back on an affair years earlier with the lawyer Fredrik, who was deeply in love with her but whose marriage proposals she had rejected. Meeting him after so long, she realizes she is in love with him and finally ready to marry him, but now it is he who rejects her: he is in an unconsummated marriage with a much younger woman. Desirée proposes marriage to rescue him from this situation, but he declines, citing his dedication to his bride. Reacting to his rejection, Desirée sings this song. The song is later reprised as a coda after Fredrik\'s young wife runs away with his son, and Fredrik is finally free to accept Desirée\'s offer.[1]',
    'The Suite Life on Deck The Suite Life on Deck is an American sitcom that aired on Disney Channel from September 26, 2008 to May 6, 2011. It is a sequel/spin-off of the Disney Channel Original Series The Suite Life of Zack & Cody. The series follows twin brothers Zack and Cody Martin and hotel heiress London Tipton in a new setting, the SS Tipton, where they attend classes at "Seven Seas High School" and meet Bailey Pickett while Mr. Moseby manages the ship. The ship travels around the world to nations such as Italy, France, Greece, India, Sweden and the United Kingdom where the characters experience different cultures, adventures, and situations.[1]',
    'Money in the Bank ladder match The first match was contested in 2005 at WrestleMania 21, after being invented (in kayfabe) by Chris Jericho.[1] At the time, it was exclusive to wrestlers of the Raw brand, and Edge won the inaugural match.[1] From then until 2010, the Money in the Bank ladder match, now open to all WWE brands, became a WrestleMania mainstay. 2010 saw a second and third Money in the Bank ladder match when the Money in the Bank pay-per-view debuted in July. Unlike the matches at WrestleMania, this new event featured two such ladder matches – one each for a contract for the WWE Championship and World Heavyweight Championship, respectively.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[8.9836, 0.0000, 0.0000]])
```
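
Because the embeddings are sparse over the vocabulary, you can inspect which tokens drive a match. A minimal sketch, assuming the `decode` helper available on `SparseEncoder` in recent sentence-transformers releases (the exact signature may vary by version, and the printed weights below are purely illustrative):

```python
# Inspect the most heavily weighted vocabulary tokens in the sparse embeddings
# produced above. Assumes SparseEncoder.decode(embedding, top_k=...) exists in
# your installed sentence-transformers version.
top_query_tokens = model.decode(query_embeddings[0], top_k=10)
top_document_tokens = model.decode(document_embeddings[0], top_k=10)

print(top_query_tokens)     # e.g. [('clowns', ...), ('musical', ...), ...] (illustrative)
print(top_document_tokens)  # document side is denser due to SPLADE term expansion
```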

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric                | NanoMSMARCO | NanoNFCorpus | NanoNQ     |
|:----------------------|:------------|:-------------|:-----------|
| dot_accuracy@1        | 0.32        | 0.46         | 0.34       |
| dot_accuracy@3        | 0.52        | 0.5          | 0.58       |
| dot_accuracy@5        | 0.62        | 0.52         | 0.68       |
| dot_accuracy@10       | 0.8         | 0.6          | 0.76       |
| dot_precision@1       | 0.32        | 0.46         | 0.34       |
| dot_precision@3       | 0.1733      | 0.3667       | 0.1933     |
| dot_precision@5       | 0.124       | 0.32         | 0.136      |
| dot_precision@10      | 0.08        | 0.242        | 0.078      |
| dot_recall@1          | 0.32        | 0.0438       | 0.33       |
| dot_recall@3          | 0.52        | 0.0741       | 0.55       |
| dot_recall@5          | 0.62        | 0.0885       | 0.64       |
| dot_recall@10         | 0.8         | 0.1126       | 0.7        |
| **dot_ndcg@10**       | **0.5423**  | **0.3135**   | **0.5247** |
| dot_mrr@10            | 0.4622      | 0.4925       | 0.4794     |
| dot_map@100           | 0.4699      | 0.1332       | 0.468      |
| query_active_dims     | 7.22        | 5.66         | 10.32      |
| query_sparsity_ratio  | 0.9998      | 0.9998       | 0.9997     |
| corpus_active_dims    | 55.4589     | 68.934       | 48.9686    |
| corpus_sparsity_ratio | 0.9982      | 0.9977       | 0.9984     |
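
The sparsity ratios follow directly from the active dimensions: the ratio is the fraction of the 30522-dimensional vocabulary space that is zero. A quick self-contained check against the NanoMSMARCO column above:

```python
# Sparsity ratio = 1 - active_dims / vocab_size, using the NanoMSMARCO values.
vocab_size = 30522
query_active_dims = 7.22
corpus_active_dims = 55.4589

print(1 - query_active_dims / vocab_size)   # ~0.99976, matching query_sparsity_ratio
print(1 - corpus_active_dims / vocab_size)  # ~0.99818, matching corpus_sparsity_ratio
```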

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ]
  }
  ```

| Metric                | Value      |
|:----------------------|:-----------|
| dot_accuracy@1        | 0.3733     |
| dot_accuracy@3        | 0.5333     |
| dot_accuracy@5        | 0.6067     |
| dot_accuracy@10       | 0.72       |
| dot_precision@1       | 0.3733     |
| dot_precision@3       | 0.2444     |
| dot_precision@5       | 0.1933     |
| dot_precision@10      | 0.1333     |
| dot_recall@1          | 0.2313     |
| dot_recall@3          | 0.3814     |
| dot_recall@5          | 0.4495     |
| dot_recall@10         | 0.5375     |
| **dot_ndcg@10**       | **0.4602** |
| dot_mrr@10            | 0.478      |
| dot_map@100           | 0.357      |
| query_active_dims     | 7.7333     |
| query_sparsity_ratio  | 0.9997     |
| corpus_active_dims    | 56.0048    |
| corpus_sparsity_ratio | 0.9982     |
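
These numbers can be reproduced with the evaluator linked above. A minimal sketch, assuming the import path shown in the documentation link (the exact result key names may vary by sentence-transformers version):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.evaluation import SparseNanoBEIREvaluator

model = SparseEncoder("tomaarsen/inference-free-splade-distilbert-base-uncased-nq")

# Evaluate on the same three NanoBEIR subsets used for this card
evaluator = SparseNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)
print(results["NanoBEIR_mean_dot_ndcg@10"])  # key name assumed from the logs below
```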

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### natural-questions

* Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 99,000 training samples
* Columns: <code>query</code> and <code>answer</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                              | answer                                                                              |
  |:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                               |
  | details | <ul><li>min: 10 tokens</li><li>mean: 11.71 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 131.81 tokens</li><li>max: 450 tokens</li></ul> |
* Samples:
  | query                                                          | answer                                                                                                                                                                                                                                                                                                                                                                                                                                      |
  |:---------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>who played the father in papa don't preach</code>        | <code>Alex McArthur Alex McArthur (born March 6, 1957) is an American actor.</code>                                                                                                                                                                                                                                                                                                                                                          |
  | <code>where was the location of the battle of hastings</code>  | <code>Battle of Hastings The Battle of Hastings[a] was fought on 14 October 1066 between the Norman-French army of William, the Duke of Normandy, and an English army under the Anglo-Saxon King Harold Godwinson, beginning the Norman conquest of England. It took place approximately 7 miles (11 kilometres) northwest of Hastings, close to the present-day town of Battle, East Sussex, and was a decisive Norman victory.</code>      |
  | <code>how many puppies can a dog give birth to</code>          | <code>Canine reproduction The largest litter size to date was set by a Neapolitan Mastiff in Manea, Cambridgeshire, UK on November 29, 2004; the litter was 24 puppies.[22]</code>                                                                                                                                                                                                                                                           |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
      "lambda_corpus": 0.003,
      "lambda_query": 0
  }
  ```
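
`SpladeLoss` wraps the ranking objective with a FLOPS sparsity regularizer; `lambda_query: 0` means the query side is not regularized, which fits the inference-free IDF query encoder. A minimal sketch of this configuration, assuming the loss classes under `sentence_transformers.sparse_encoder.losses` and the parameter names recorded in the JSON above (newer releases may rename `lambda_corpus`/`lambda_query`):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.losses import (
    SparseMultipleNegativesRankingLoss,
    SpladeLoss,
)

model = SparseEncoder("tomaarsen/inference-free-splade-distilbert-base-uncased-nq")

# Ranking loss over in-batch negatives, plus a FLOPS penalty on document
# embeddings only (lambda_query=0 disables query regularization).
loss = SpladeLoss(
    model,
    loss=SparseMultipleNegativesRankingLoss(model, scale=1.0),
    lambda_corpus=0.003,
    lambda_query=0,
)
```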

### Evaluation Dataset

#### natural-questions

* Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 1,000 evaluation samples
* Columns: <code>query</code> and <code>answer</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                              | answer                                                                               |
  |:--------|:-----------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                                 |
  | details | <ul><li>min: 10 tokens</li><li>mean: 11.69 tokens</li><li>max: 23 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 134.01 tokens</li><li>max: 512 tokens</li></ul>  |
* Samples:
  | query                                                   | answer                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
  |:--------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>where is the tiber river located in italy</code>  | <code>Tiber The Tiber (/ˈtaɪbər/, Latin: Tiberis,[1] Italian: Tevere [ˈteːvere])[2] is the third-longest river in Italy, rising in the Apennine Mountains in Emilia-Romagna and flowing 406 kilometres (252 mi) through Tuscany, Umbria and Lazio, where it is joined by the river Aniene, to the Tyrrhenian Sea, between Ostia and Fiumicino.[3] It drains a basin estimated at 17,375 square kilometres (6,709 sq mi). The river has achieved lasting fame as the main watercourse of the city of Rome, founded on its eastern banks.</code> |
  | <code>what kind of car does jay gatsby drive</code>     | <code>Jay Gatsby At the Buchanan home, Jordan Baker, Nick, Jay, and the Buchanans decide to visit New York City. Tom borrows Gatsby's yellow Rolls Royce to drive up to the city. On the way to New York City, Tom makes a detour at a gas station in "the Valley of Ashes", a run-down part of Long Island. The owner, George Wilson, shares his concern that his wife, Myrtle, may be having an affair. This unnerves Tom, who has been having an affair with Myrtle, and he leaves in a hurry.</code>                                        |
  | <code>who sings if i can dream about you</code>         | <code>I Can Dream About You "I Can Dream About You" is a song performed by American singer Dan Hartman on the soundtrack album of the film Streets of Fire. Released in 1984 as a single from the soundtrack, and included on Hartman's album I Can Dream About You, it reached number 6 on the Billboard Hot 100.[1]</code>                                                                                                                                                                                                                    |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
      "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score')",
      "lambda_corpus": 0.003,
      "lambda_query": 0
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

The following options deviate from the trainer defaults (a sketch of the corresponding trainer setup follows this list):

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
- `router_mapping`: {'query': 'query', 'answer': 'document'}
- `learning_rate_mapping`: {'IDF\\.weight': 0.001}
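
A minimal sketch of how these options map onto the training API, assuming the `SparseEncoderTrainer`/`SparseEncoderTrainingArguments` classes and that `model` and `loss` are built as in the sections above (the `output_dir` path and the train/eval split call are illustrative):

```python
from datasets import load_dataset
from sentence_transformers.sparse_encoder import (
    SparseEncoderTrainer,
    SparseEncoderTrainingArguments,
)

# Same dataset as above, held-out 1,000 samples for evaluation
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
dataset = dataset.train_test_split(test_size=1_000, seed=42)

args = SparseEncoderTrainingArguments(
    output_dir="models/inference-free-splade-distilbert-nq",  # illustrative path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=1,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler="no_duplicates",
    # Route the "query" column through the query (IDF) modules and the
    # "answer" column through the document (MLM + SPLADE pooling) modules.
    router_mapping={"query": "query", "answer": "document"},
    # Train the IDF weights with a much larger learning rate than the transformer.
    learning_rate_mapping={r"IDF\.weight": 1e-3},
)

trainer = SparseEncoderTrainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    loss=loss,
)
trainer.train()
```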
595
+
596
+ #### All Hyperparameters
597
+ <details><summary>Click to expand</summary>
598
+
599
+ - `overwrite_output_dir`: False
600
+ - `do_predict`: False
601
+ - `eval_strategy`: steps
602
+ - `prediction_loss_only`: True
603
+ - `per_device_train_batch_size`: 16
604
+ - `per_device_eval_batch_size`: 16
605
+ - `per_gpu_train_batch_size`: None
606
+ - `per_gpu_eval_batch_size`: None
607
+ - `gradient_accumulation_steps`: 1
608
+ - `eval_accumulation_steps`: None
609
+ - `torch_empty_cache_steps`: None
610
+ - `learning_rate`: 2e-05
611
+ - `weight_decay`: 0.0
612
+ - `adam_beta1`: 0.9
613
+ - `adam_beta2`: 0.999
614
+ - `adam_epsilon`: 1e-08
615
+ - `max_grad_norm`: 1.0
616
+ - `num_train_epochs`: 1
617
+ - `max_steps`: -1
618
+ - `lr_scheduler_type`: linear
619
+ - `lr_scheduler_kwargs`: {}
620
+ - `warmup_ratio`: 0.1
621
+ - `warmup_steps`: 0
622
+ - `log_level`: passive
623
+ - `log_level_replica`: warning
624
+ - `log_on_each_node`: True
625
+ - `logging_nan_inf_filter`: True
626
+ - `save_safetensors`: True
627
+ - `save_on_each_node`: False
628
+ - `save_only_model`: False
629
+ - `restore_callback_states_from_checkpoint`: False
630
+ - `no_cuda`: False
631
+ - `use_cpu`: False
632
+ - `use_mps_device`: False
633
+ - `seed`: 42
634
+ - `data_seed`: None
635
+ - `jit_mode_eval`: False
636
+ - `use_ipex`: False
637
+ - `bf16`: False
638
+ - `fp16`: True
639
+ - `fp16_opt_level`: O1
640
+ - `half_precision_backend`: auto
641
+ - `bf16_full_eval`: False
642
+ - `fp16_full_eval`: False
643
+ - `tf32`: None
644
+ - `local_rank`: 0
645
+ - `ddp_backend`: None
646
+ - `tpu_num_cores`: None
647
+ - `tpu_metrics_debug`: False
648
+ - `debug`: []
649
+ - `dataloader_drop_last`: False
650
+ - `dataloader_num_workers`: 0
651
+ - `dataloader_prefetch_factor`: None
652
+ - `past_index`: -1
653
+ - `disable_tqdm`: False
654
+ - `remove_unused_columns`: True
655
+ - `label_names`: None
656
+ - `load_best_model_at_end`: False
657
+ - `ignore_data_skip`: False
658
+ - `fsdp`: []
659
+ - `fsdp_min_num_params`: 0
660
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
661
+ - `fsdp_transformer_layer_cls_to_wrap`: None
662
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
663
+ - `deepspeed`: None
664
+ - `label_smoothing_factor`: 0.0
665
+ - `optim`: adamw_torch
666
+ - `optim_args`: None
667
+ - `adafactor`: False
668
+ - `group_by_length`: False
669
+ - `length_column_name`: length
670
+ - `ddp_find_unused_parameters`: None
671
+ - `ddp_bucket_cap_mb`: None
672
+ - `ddp_broadcast_buffers`: False
673
+ - `dataloader_pin_memory`: True
674
+ - `dataloader_persistent_workers`: False
675
+ - `skip_memory_metrics`: True
676
+ - `use_legacy_prediction_loop`: False
677
+ - `push_to_hub`: False
678
+ - `resume_from_checkpoint`: None
679
+ - `hub_model_id`: None
680
+ - `hub_strategy`: every_save
681
+ - `hub_private_repo`: None
682
+ - `hub_always_push`: False
683
+ - `gradient_checkpointing`: False
684
+ - `gradient_checkpointing_kwargs`: None
685
+ - `include_inputs_for_metrics`: False
686
+ - `include_for_metrics`: []
687
+ - `eval_do_concat_batches`: True
688
+ - `fp16_backend`: auto
689
+ - `push_to_hub_model_id`: None
690
+ - `push_to_hub_organization`: None
691
+ - `mp_parameters`:
692
+ - `auto_find_batch_size`: False
693
+ - `full_determinism`: False
694
+ - `torchdynamo`: None
695
+ - `ray_scope`: last
696
+ - `ddp_timeout`: 1800
697
+ - `torch_compile`: False
698
+ - `torch_compile_backend`: None
699
+ - `torch_compile_mode`: None
700
+ - `include_tokens_per_second`: False
701
+ - `include_num_input_tokens_seen`: False
702
+ - `neftune_noise_alpha`: None
703
+ - `optim_target_modules`: None
704
+ - `batch_eval_metrics`: False
705
+ - `eval_on_start`: False
706
+ - `use_liger_kernel`: False
707
+ - `eval_use_gather_object`: False
708
+ - `average_tokens_across_devices`: False
709
+ - `prompts`: None
710
+ - `batch_sampler`: no_duplicates
711
+ - `multi_dataset_batch_sampler`: proportional
712
+ - `router_mapping`: {'query': 'query', 'answer': 'document'}
713
+ - `learning_rate_mapping`: {'IDF\\.weight': 0.001}
714
+
715
+ </details>

### Training Logs
| Epoch  | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
|:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|
| 0.0323 | 200  | 0.2453        | -               | -                       | -                        | -                  | -                         |
| 0.0646 | 400  | 0.0873        | 0.1016          | 0.5390                  | 0.3208                   | 0.5273             | 0.4624                    |
| 0.0970 | 600  | 0.0705        | -               | -                       | -                        | -                  | -                         |
| 0.1293 | 800  | 0.0605        | 0.0824          | 0.5449                  | 0.3297                   | 0.4931             | 0.4559                    |
| 0.1616 | 1000 | 0.0606        | -               | -                       | -                        | -                  | -                         |
| 0.1939 | 1200 | 0.0672        | 0.0826          | 0.5335                  | 0.3240                   | 0.5067             | 0.4547                    |
| 0.2262 | 1400 | 0.0768        | -               | -                       | -                        | -                  | -                         |
| 0.2586 | 1600 | 0.0836        | 0.0861          | 0.5213                  | 0.3276                   | 0.5354             | 0.4614                    |
| 0.2909 | 1800 | 0.0783        | -               | -                       | -                        | -                  | -                         |
| 0.3232 | 2000 | 0.0832        | 0.1030          | 0.5214                  | 0.3142                   | 0.4775             | 0.4377                    |
| 0.3555 | 2200 | 0.0857        | -               | -                       | -                        | -                  | -                         |
| 0.3878 | 2400 | 0.0851        | 0.0919          | 0.4909                  | 0.3175                   | 0.5332             | 0.4472                    |
| 0.4202 | 2600 | 0.0848        | -               | -                       | -                        | -                  | -                         |
| 0.4525 | 2800 | 0.0804        | 0.0950          | 0.5247                  | 0.3179                   | 0.5316             | 0.4581                    |
| 0.4848 | 3000 | 0.0763        | -               | -                       | -                        | -                  | -                         |
| 0.5171 | 3200 | 0.0781        | 0.0925          | 0.5350                  | 0.3261                   | 0.5254             | 0.4621                    |
| 0.5495 | 3400 | 0.0816        | -               | -                       | -                        | -                  | -                         |
| 0.5818 | 3600 | 0.0762        | 0.0893          | 0.5068                  | 0.3171                   | 0.5149             | 0.4462                    |
| 0.6141 | 3800 | 0.0821        | -               | -                       | -                        | -                  | -                         |
| 0.6464 | 4000 | 0.0733        | 0.0909          | 0.5523                  | 0.3268                   | 0.5435             | 0.4742                    |
| 0.6787 | 4200 | 0.0772        | -               | -                       | -                        | -                  | -                         |
| 0.7111 | 4400 | 0.0707        | 0.0868          | 0.5320                  | 0.3104                   | 0.5008             | 0.4477                    |
| 0.7434 | 4600 | 0.0694        | -               | -                       | -                        | -                  | -                         |
| 0.7757 | 4800 | 0.0733        | 0.0930          | 0.5369                  | 0.3047                   | 0.5012             | 0.4476                    |
| 0.8080 | 5000 | 0.0693        | -               | -                       | -                        | -                  | -                         |
| 0.8403 | 5200 | 0.0726        | 0.0863          | 0.5458                  | 0.3095                   | 0.5261             | 0.4605                    |
| 0.8727 | 5400 | 0.0686        | -               | -                       | -                        | -                  | -                         |
| 0.9050 | 5600 | 0.0634        | 0.0844          | 0.5479                  | 0.3142                   | 0.5361             | 0.4661                    |
| 0.9373 | 5800 | 0.0656        | -               | -                       | -                        | -                  | -                         |
| 0.9696 | 6000 | 0.0655        | 0.0828          | 0.5434                  | 0.3134                   | 0.5214             | 0.4594                    |
| -1     | -1   | -             | -               | 0.5423                  | 0.3135                   | 0.5247             | 0.4602                    |


### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.104 kWh
- **Carbon Emitted**: 0.040 kg of CO2
- **Hours Used**: 0.271 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 4.2.0.dev0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 2.21.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### SpladeLoss
```bibtex
@misc{formal2022distillationhardnegativesampling,
    title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
    author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
    year={2022},
    eprint={2205.04733},
    archivePrefix={arXiv},
    primaryClass={cs.IR},
    url={https://arxiv.org/abs/2205.04733},
}
```

#### SparseMultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

#### FlopsLoss
```bibtex
@article{paria2020minimizing,
    title={Minimizing flops to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config_sentence_transformers.json ADDED
{
  "model_type": "SparseEncoder",
  "__version__": {
    "sentence_transformers": "4.2.0.dev0",
    "transformers": "4.52.4",
    "pytorch": "2.6.0+cu124"
  },
  "prompts": {
    "query": "",
    "document": ""
  },
  "default_prompt_name": null,
  "similarity_fn_name": "dot"
}
document_0_MLMTransformer/config.json ADDED
{
  "activation": "gelu",
  "architectures": [
    "DistilBertForMaskedLM"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.52.4",
  "vocab_size": 30522
}
document_0_MLMTransformer/model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e536ab4e452b4361cb9b90603e68bb250ff171caf85bbbcdef77bc8f6e9bdec2
size 267954768
document_0_MLMTransformer/sentence_bert_config.json ADDED
{
  "max_seq_length": 512,
  "do_lower_case": false
}
document_0_MLMTransformer/special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
document_0_MLMTransformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
document_0_MLMTransformer/tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
document_0_MLMTransformer/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
document_1_SpladePooling/config.json ADDED
{
  "pooling_strategy": "max",
  "activation_function": "relu",
  "word_embedding_dimension": 30522
}
modules.json ADDED
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Router"
  }
]
query_0_IDF/config.json ADDED
{
  "frozen": false
}
query_0_IDF/model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:c58adf6efb9faa8d19d69af16336e9e942a7d327149ddfc71dada7fa2ad57b7f
size 122168
query_0_IDF/special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
query_0_IDF/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
query_0_IDF/tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
query_0_IDF/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
router_config.json ADDED
{
  "types": {
    "query_0_IDF": "sentence_transformers.sparse_encoder.models.IDF.IDF",
    "document_0_MLMTransformer": "sentence_transformers.sparse_encoder.models.MLMTransformer.MLMTransformer",
    "document_1_SpladePooling": "sentence_transformers.sparse_encoder.models.SpladePooling.SpladePooling"
  },
  "structure": {
    "query": [
      "query_0_IDF"
    ],
    "document": [
      "document_0_MLMTransformer",
      "document_1_SpladePooling"
    ]
  },
  "parameters": {
    "default_route": "document",
    "allow_empty_key": true
  }
}