tomaarsen (HF Staff) committed 71467c8 (verified · 1 parent: f3bbb80)

Add new SparseEncoder model

README.md ADDED (+881 lines)
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- sparse-encoder
- sparse
- asymmetric
- inference-free
- splade
- generated_from_trainer
- dataset_size:99000
- loss:SpladeLoss
widget:
- source_sentence: where is the tiber river located in italy
  sentences:
  - Sales taxes in British Columbia On 1 July 2010, the PST and GST were combined into the Harmonized Sales Tax (HST) levied according to the provisions of the GST. The conversion to HST was controversial. Popular opposition led to a referendum on the tax system, the first such referendum in the Commonwealth of Nations, resulting in the province reverting to the former PST/GST model on 1 April 2013.
  - 'Tiber The Tiber (/ˈtaɪbər/, Latin: Tiberis,[1] Italian: Tevere [ˈteːvere])[2] is the third-longest river in Italy, rising in the Apennine Mountains in Emilia-Romagna and flowing 406 kilometres (252 mi) through Tuscany, Umbria and Lazio, where it is joined by the river Aniene, to the Tyrrhenian Sea, between Ostia and Fiumicino.[3] It drains a basin estimated at 17,375 square kilometres (6,709 sq mi). The river has achieved lasting fame as the main watercourse of the city of Rome, founded on its eastern banks.'
  - 'Water in California California''s limited water supply comes from two main sources: surface water, or water that travels or gathers on the ground, like rivers, streams, and lakes; and groundwater, which is water that is pumped out from the ground. California has also begun producing a small amount of desalinated water, water that was once sea water, but has been purified.'
- source_sentence: what kind of car does jay gatsby drive
  sentences:
  - Jay Gatsby At the Buchanan home, Jordan Baker, Nick, Jay, and the Buchanans decide to visit New York City. Tom borrows Gatsby's yellow Rolls Royce to drive up to the city. On the way to New York City, Tom makes a detour at a gas station in "the Valley of Ashes", a run-down part of Long Island. The owner, George Wilson, shares his concern that his wife, Myrtle, may be having an affair. This unnerves Tom, who has been having an affair with Myrtle, and he leaves in a hurry.
  - 'Panama Canal The Panama Canal (Spanish: Canal de Panamá) is an artificial 77 km (48 mi) waterway in Panama that connects the Atlantic Ocean with the Pacific Ocean. The canal cuts across the Isthmus of Panama and is a conduit for maritime trade. Canal locks are at each end to lift ships up to Gatun Lake, an artificial lake created to reduce the amount of excavation work required for the canal, 26 m (85 ft) above sea level, and then lower the ships at the other end. The original locks are 34 m (110 ft) wide. A third, wider lane of locks was constructed between September 2007 and May 2016. The expanded canal began commercial operation on June 26, 2016. The new locks allow transit of larger, post-Panamax ships, capable of handling more cargo.[1]'
  - Solar maximum Predictions of a future maximum's timing and strength are very difficult; predictions vary widely. There was a solar maximum in 2000. In 2006 NASA initially expected a solar maximum in 2010 or 2011, and thought that it could be the strongest since 1958.[3] However, the solar maximum was not declared to have occurred until 2014, and even then was ranked among the weakest on record.[4]
- source_sentence: who sings if i can dream about you
  sentences:
  - Wesley Jonathan Wesley Jonathan Waples (born October 18, 1978), known professionally as Wesley Jonathan, is an American actor. He is best known for his starring role as Jamal Grant on the NBC Saturday morning comedy-drama series City Guys, Sweetness in the 2005 film Roll Bounce, as well as Burrell "Stamps" Ballentine on TV Land's The Soul Man.
  - I Can Dream About You "I Can Dream About You" is a song performed by American singer Dan Hartman on the soundtrack album of the film Streets of Fire. Released in 1984 as a single from the soundtrack, and included on Hartman's album I Can Dream About You, it reached number 6 on the Billboard Hot 100.[1]
  - Blood is thicker than water In modern society, the proverb "blood is thicker than water" is used to imply that family relationships are always more important than friends.
- source_sentence: who did jesse palmer end up with on the bachelor
  sentences:
  - Jesse Palmer In 2004, Palmer was the first professional athlete to appear on The Bachelor television program and the first non-American bachelor, in which he was given his choice of eligible single women. He eventually selected Jessica Bowlin, but their courtship lasted for only a few months after the end of the show.[19][20]
  - Wave base In seawater, the water particles are moved in a circular orbital motion when a wave passes. The radius of the circle of motion for any given water molecule decreases exponentially with increasing depth. The wave base, which is the depth of influence of a water wave, is about half the wavelength.
  - Do You Remember the First Time? (The Vampire Diaries) Elena, after everyone continues to convince her that she had once loved damon decides to run through the magic free, mystic falls border. So she does, and she gets glimpses of her and Damon but never fully remembers yet that she loves him. Damon pulls her back across the line and she asks about a kiss in the rain. He continues to try to get her to remember.
- source_sentence: when did the american civil rights movement end
  sentences:
  - 'A Sunday Afternoon on the Island of La Grande Jatte A Sunday Afternoon on the Island of La Grande Jatte (French: Un dimanche après-midi à l''Île de la Grande Jatte) painted in 1884, is one of Georges Seurat''s most famous works. It is a leading example of pointillist technique, executed on a large canvas. Seurat''s composition includes a number of Parisians at a park on the banks of the River Seine.'
  - Paleolithic Paleolithic humans made tools of stone, bone, and wood.[23] The early paleolithic hominins, Australopithecus, were the first users of stone tools. Excavations in Gona, Ethiopia have produced thousands of artifacts, and through radioisotopic dating and magnetostratigraphy, the sites can be firmly dated to 2.6 million years ago. Evidence shows these early hominins intentionally selected raw materials with good flaking qualities and chose appropriate sized stones for their needs to produce sharp-edged tools for cutting.[29]
  - African-American civil rights movement (1954–1968) The Civil Rights Movement (also known as the American civil rights movement, African-American civil rights movement, and other terms,[b]) was a human rights movement from 1954–1968 that encompassed strategies, groups, and social movements to accomplish its goal of ending legalized racial segregation and discrimination laws in the United States. The movement secured the legal recognition and federal protection of black Americans in the United States Constitution and federal law.
datasets:
- sentence-transformers/natural-questions
pipeline_tag: feature-extraction
library_name: sentence-transformers
metrics:
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
co2_eq_emissions:
  emissions: 11.486068711625794
  energy_consumed: 0.029549806050974257
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.087
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: Inference-free SPLADE BERT-tiny trained on Natural-Questions tuples
  results:
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoMSMARCO
      type: NanoMSMARCO
    metrics:
    - type: dot_accuracy@1
      value: 0.3
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.46
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.52
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.3
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.15333333333333332
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.10400000000000001
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.066
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.3
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.46
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.52
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.66
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.4613583823584531
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.3998809523809524
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.4174283897822019
      name: Dot Map@100
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNFCorpus
      type: NanoNFCorpus
    metrics:
    - type: dot_accuracy@1
      value: 0.46
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.56
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.6
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.66
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.46
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.36
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.31200000000000006
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.248
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.0437532639932766
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.07287376477005543
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.109733905482588
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.13321789504016796
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.32305198562670556
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.5245
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.13942100043534664
      name: Dot Map@100
  - task:
      type: sparse-information-retrieval
      name: Sparse Information Retrieval
    dataset:
      name: NanoNQ
      type: NanoNQ
    metrics:
    - type: dot_accuracy@1
      value: 0.24
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.64
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.74
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.24
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.16666666666666663
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.128
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.07400000000000001
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.24
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.48
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.61
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.7
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.4672527583679101
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.40066666666666656
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.3953139370042203
      name: Dot Map@100
  - task:
      type: sparse-nano-beir
      name: Sparse Nano BEIR
    dataset:
      name: NanoBEIR mean
      type: NanoBEIR_mean
    metrics:
    - type: dot_accuracy@1
      value: 0.3333333333333333
      name: Dot Accuracy@1
    - type: dot_accuracy@3
      value: 0.5066666666666667
      name: Dot Accuracy@3
    - type: dot_accuracy@5
      value: 0.5866666666666668
      name: Dot Accuracy@5
    - type: dot_accuracy@10
      value: 0.6866666666666666
      name: Dot Accuracy@10
    - type: dot_precision@1
      value: 0.3333333333333333
      name: Dot Precision@1
    - type: dot_precision@3
      value: 0.22666666666666666
      name: Dot Precision@3
    - type: dot_precision@5
      value: 0.18133333333333335
      name: Dot Precision@5
    - type: dot_precision@10
      value: 0.12933333333333333
      name: Dot Precision@10
    - type: dot_recall@1
      value: 0.1945844213310922
      name: Dot Recall@1
    - type: dot_recall@3
      value: 0.3376245882566851
      name: Dot Recall@3
    - type: dot_recall@5
      value: 0.4132446351608627
      name: Dot Recall@5
    - type: dot_recall@10
      value: 0.49773929834672265
      name: Dot Recall@10
    - type: dot_ndcg@10
      value: 0.41722104211768957
      name: Dot Ndcg@10
    - type: dot_mrr@10
      value: 0.4416825396825396
      name: Dot Mrr@10
    - type: dot_map@100
      value: 0.31738777574058963
      name: Dot Map@100
---

# Inference-free SPLADE BERT-tiny trained on Natural-Questions tuples

This is an [Asymmetric Inference-free SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model trained on the [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) dataset using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.

## Model Details

### Model Description
- **Model Type:** Asymmetric Inference-free SPLADE Sparse Encoder
<!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 30522 dimensions
- **Similarity Function:** Dot Product
- **Training Dataset:**
    - [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions)
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

### Full Model Architecture

```
SparseEncoder(
  (0): Asym(
    (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: BertTokenizerFast)
    (corpus_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
    (corpus_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
  )
)
```
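
In this architecture, only the corpus side runs a transformer: documents pass through the BERT MLM head and then SpladePooling, while queries are scored with a learned per-token IDF lookup, so no transformer inference is needed at query time. As a minimal, hedged sketch of what SpladePooling with `pooling_strategy='max'` and `activation_function='relu'` computes (assuming the standard SPLADE formulation, log(1 + ReLU(logits)) max-pooled over the sequence; the library's exact implementation may differ):

```python
import numpy as np

def splade_pooling(mlm_logits: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Sketch of SPLADE max pooling: log-saturate the MLM logits with
    log(1 + relu(x)), zero out padded positions, then max over the sequence."""
    activated = np.log1p(np.maximum(mlm_logits, 0.0))   # log(1 + relu(x)), >= 0
    activated = activated * attention_mask[..., None]   # mask padded positions
    return activated.max(axis=1)                        # (batch, vocab_size)

logits = np.random.randn(2, 8, 30522)  # dummy MLM logits: 2 docs, 8 tokens each
mask = np.ones((2, 8))
doc_vectors = splade_pooling(logits, mask)
print(doc_vectors.shape)  # (2, 30522)
```

Negative logits are clipped to zero by the ReLU, which is what makes real document vectors sparse and cheap to index.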

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("tomaarsen/inference-free-splade-bert-tiny-nq-3e-3-lambda-corpus")
# Run inference
sentences = [
    'when did the american civil rights movement end',
    'African-American civil rights movement (1954–1968) The Civil Rights Movement (also known as the American civil rights movement, African-American civil rights movement, and other terms,[b]) was a human rights movement from 1954–1968 that encompassed strategies, groups, and social movements to accomplish its goal of ending legalized racial segregation and discrimination laws in the United States. The movement secured the legal recognition and federal protection of black Americans in the United States Constitution and federal law.',
    'Paleolithic Paleolithic humans made tools of stone, bone, and wood.[23] The early paleolithic hominins, Australopithecus, were the first users of stone tools. Excavations in Gona, Ethiopia have produced thousands of artifacts, and through radioisotopic dating and magnetostratigraphy, the sites can be firmly dated to 2.6 million years ago. Evidence shows these early hominins intentionally selected raw materials with good flaking qualities and chose appropriate sized stones for their needs to produce sharp-edged tools for cutting.[29]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 30522)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Sparse Information Retrieval

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)

| Metric           | NanoMSMARCO | NanoNFCorpus | NanoNQ     |
|:-----------------|:------------|:-------------|:-----------|
| dot_accuracy@1   | 0.3         | 0.46         | 0.24       |
| dot_accuracy@3   | 0.46        | 0.56         | 0.5        |
| dot_accuracy@5   | 0.52        | 0.6          | 0.64       |
| dot_accuracy@10  | 0.66        | 0.66         | 0.74       |
| dot_precision@1  | 0.3         | 0.46         | 0.24       |
| dot_precision@3  | 0.1533      | 0.36         | 0.1667     |
| dot_precision@5  | 0.104       | 0.312        | 0.128      |
| dot_precision@10 | 0.066       | 0.248        | 0.074      |
| dot_recall@1     | 0.3         | 0.0438       | 0.24       |
| dot_recall@3     | 0.46        | 0.0729       | 0.48       |
| dot_recall@5     | 0.52        | 0.1097       | 0.61       |
| dot_recall@10    | 0.66        | 0.1332       | 0.7        |
| **dot_ndcg@10**  | **0.4614**  | **0.3231**   | **0.4673** |
| dot_mrr@10       | 0.3999      | 0.5245       | 0.4007     |
| dot_map@100      | 0.4174      | 0.1394       | 0.3953     |

#### Sparse Nano BEIR

* Dataset: `NanoBEIR_mean`
* Evaluated with [<code>SparseNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ]
  }
  ```

| Metric           | Value      |
|:-----------------|:-----------|
| dot_accuracy@1   | 0.3333     |
| dot_accuracy@3   | 0.5067     |
| dot_accuracy@5   | 0.5867     |
| dot_accuracy@10  | 0.6867     |
| dot_precision@1  | 0.3333     |
| dot_precision@3  | 0.2267     |
| dot_precision@5  | 0.1813     |
| dot_precision@10 | 0.1293     |
| dot_recall@1     | 0.1946     |
| dot_recall@3     | 0.3376     |
| dot_recall@5     | 0.4132     |
| dot_recall@10    | 0.4977     |
| **dot_ndcg@10**  | **0.4172** |
| dot_mrr@10       | 0.4417     |
| dot_map@100      | 0.3174     |

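The bolded `dot_ndcg@10` is the headline metric in the tables above: it rewards retrieving relevant documents and discounts each hit by its rank. As a quick reference, here is a minimal sketch of the standard NDCG@k computation over a ranked list of relevance labels (the textbook definition, not the evaluator's exact code):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the ranked list divided by the DCG of the ideal ordering."""
    def dcg(rels):
        # Each relevance label is discounted by log2(rank + 1), ranks starting at 1.
        return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A relevant document at rank 1 scores 1.0; pushing it down to rank 3 halves it.
print(ndcg_at_k([1, 0, 0]))  # 1.0
print(ndcg_at_k([0, 0, 1]))  # 0.5
```

`dot_mrr@10` and `dot_map@100` are rank-based as well, but only NDCG handles graded (non-binary) relevance, which is why it is the metric emphasized here.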

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### natural-questions

* Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 99,000 training samples
* Columns: <code>query</code> and <code>corpus</code>
* Approximate statistics based on the first 1000 samples:
  |         | query              | corpus             |
  |:--------|:-------------------|:-------------------|
  | type    | dict               | dict               |
  | details | <ul><li></li></ul> | <ul><li></li></ul> |
* Samples:
  | query | corpus |
  |:------|:-------|
  | <code>{'query': "who played the father in papa don't preach"}</code> | <code>{'corpus': 'Alex McArthur Alex McArthur (born March 6, 1957) is an American actor.'}</code> |
  | <code>{'query': 'where was the location of the battle of hastings'}</code> | <code>{'corpus': 'Battle of Hastings The Battle of Hastings[a] was fought on 14 October 1066 between the Norman-French army of William, the Duke of Normandy, and an English army under the Anglo-Saxon King Harold Godwinson, beginning the Norman conquest of England. It took place approximately 7 miles (11 kilometres) northwest of Hastings, close to the present-day town of Battle, East Sussex, and was a decisive Norman victory.'}</code> |
  | <code>{'query': 'how many puppies can a dog give birth to'}</code> | <code>{'corpus': 'Canine reproduction The largest litter size to date was set by a Neapolitan Mastiff in Manea, Cambridgeshire, UK on November 29, 2004; the litter was 24 puppies.[22]'}</code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {'loss': SparseMultipleNegativesRankingLoss(
      (model): SparseEncoder(
        (0): Asym(
          (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: BertTokenizerFast)
          (corpus_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
          (corpus_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
        )
      )
      (cross_entropy_loss): CrossEntropyLoss()
  ), 'lambda_corpus': 0.003, 'lambda_query': 0, 'corpus_regularizer': FlopsLoss(
      (model): SparseEncoder(
        (0): Asym(
          (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: BertTokenizerFast)
          (corpus_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
          (corpus_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
        )
      )
  ), 'query_regularizer': None}
  ```
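
The `lambda_corpus: 0.003` above weights a FLOPS regularizer (`FlopsLoss`) on the document vectors, while `lambda_query: 0` leaves the query side unregularized. FLOPS penalizes vocabulary dimensions that are active across many documents, pushing the corpus representations toward sparsity. A minimal sketch of the standard FLOPS term from the SPLADE papers (the library's `FlopsLoss` may differ in detail):

```python
import numpy as np

def flops_loss(doc_vectors: np.ndarray) -> float:
    """FLOPS regularizer: sum over vocab dimensions of the squared mean
    activation across the batch. High when many docs share a dimension."""
    mean_per_dim = doc_vectors.mean(axis=0)   # average activation per vocab dim
    return float((mean_per_dim ** 2).sum())

# Two docs activating the SAME dimension cost more than two docs
# activating DIFFERENT dimensions at the same strength.
shared = np.array([[1.0, 0.0], [1.0, 0.0]])
spread = np.array([[1.0, 0.0], [0.0, 1.0]])
print(flops_loss(shared))  # 1.0
print(flops_loss(spread))  # 0.5
```

The squared-mean form is what distinguishes FLOPS from a plain L1 penalty: it specifically discourages dimensions that fire for the whole batch, which is what dominates the cost of dot products against an inverted index.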

### Evaluation Dataset

#### natural-questions

* Dataset: [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) at [f9e894e](https://huggingface.co/datasets/sentence-transformers/natural-questions/tree/f9e894e1081e206e577b4eaa9ee6de2b06ae6f17)
* Size: 1,000 evaluation samples
* Columns: <code>query</code> and <code>corpus</code>
* Approximate statistics based on the first 1000 samples:
  |         | query              | corpus             |
  |:--------|:-------------------|:-------------------|
  | type    | dict               | dict               |
  | details | <ul><li></li></ul> | <ul><li></li></ul> |
* Samples:
  | query | corpus |
  |:------|:-------|
  | <code>{'query': 'where is the tiber river located in italy'}</code> | <code>{'corpus': 'Tiber The Tiber (/ˈtaɪbər/, Latin: Tiberis,[1] Italian: Tevere [ˈteːvere])[2] is the third-longest river in Italy, rising in the Apennine Mountains in Emilia-Romagna and flowing 406 kilometres (252\xa0mi) through Tuscany, Umbria and Lazio, where it is joined by the river Aniene, to the Tyrrhenian Sea, between Ostia and Fiumicino.[3] It drains a basin estimated at 17,375 square kilometres (6,709\xa0sq\xa0mi). The river has achieved lasting fame as the main watercourse of the city of Rome, founded on its eastern banks.'}</code> |
  | <code>{'query': 'what kind of car does jay gatsby drive'}</code> | <code>{'corpus': 'Jay Gatsby At the Buchanan home, Jordan Baker, Nick, Jay, and the Buchanans decide to visit New York City. Tom borrows Gatsby\'s yellow Rolls Royce to drive up to the city. On the way to New York City, Tom makes a detour at a gas station in "the Valley of Ashes", a run-down part of Long Island. The owner, George Wilson, shares his concern that his wife, Myrtle, may be having an affair. This unnerves Tom, who has been having an affair with Myrtle, and he leaves in a hurry.'}</code> |
  | <code>{'query': 'who sings if i can dream about you'}</code> | <code>{'corpus': 'I Can Dream About You "I Can Dream About You" is a song performed by American singer Dan Hartman on the soundtrack album of the film Streets of Fire. Released in 1984 as a single from the soundtrack, and included on Hartman\'s album I Can Dream About You, it reached number 6 on the Billboard Hot 100.[1]'}</code> |
* Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {'loss': SparseMultipleNegativesRankingLoss(
      (model): SparseEncoder(
        (0): Asym(
          (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: BertTokenizerFast)
          (corpus_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
          (corpus_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
        )
      )
      (cross_entropy_loss): CrossEntropyLoss()
  ), 'lambda_corpus': 0.003, 'lambda_query': 0, 'corpus_regularizer': FlopsLoss(
      (model): SparseEncoder(
        (0): Asym(
          (query_0_IDF): IDF ({'frozen': False}, dim:30522, tokenizer: BertTokenizerFast)
          (corpus_0_MLMTransformer): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False}) with MLMTransformer model: BertForMaskedLM
          (corpus_1_SpladePooling): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
        )
      )
  ), 'query_regularizer': None}
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 64
- `per_device_eval_batch_size`: 64
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates

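With `warmup_ratio: 0.1` and the default `lr_scheduler_type: linear`, the learning rate ramps from 0 up to 2e-05 over the first 10% of training steps and then decays linearly back to 0. A rough sketch of that shape (the standard linear-warmup/linear-decay schedule; the step count below is a hypothetical illustration, not a value from this run):

```python
def linear_schedule_lr(step: int, total_steps: int, peak_lr: float = 2e-05,
                       warmup_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr over warmup_ratio of training, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (total_steps - step) / max(1, total_steps - warmup_steps)

# e.g. with a hypothetical 1000-step run, the peak is reached at step 100
print(linear_schedule_lr(0, 1000))     # 0.0 (start of warmup)
lrs = [linear_schedule_lr(s, 1000) for s in range(1001)]
print(max(lrs) == linear_schedule_lr(100, 1000))  # True: peak at end of warmup
```

Warmup matters more than usual here because the IDF query weights are trained from scratch alongside the pretrained BERT-tiny corpus encoder.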
610
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_dot_ndcg@10 | NanoNFCorpus_dot_ndcg@10 | NanoNQ_dot_ndcg@10 | NanoBEIR_mean_dot_ndcg@10 |
+ |:------:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|:------------------:|:-------------------------:|
+ | 0.0129 | 20 | 1.5874 | - | - | - | - | - |
+ | 0.0259 | 40 | 1.7845 | - | - | - | - | - |
+ | 0.0388 | 60 | 1.924 | - | - | - | - | - |
+ | 0.0517 | 80 | 1.6441 | - | - | - | - | - |
+ | 0.0646 | 100 | 1.1842 | - | - | - | - | - |
+ | 0.0776 | 120 | 0.9556 | - | - | - | - | - |
+ | 0.0905 | 140 | 0.8623 | - | - | - | - | - |
+ | 0.1034 | 160 | 0.7888 | - | - | - | - | - |
+ | 0.1164 | 180 | 0.7923 | - | - | - | - | - |
+ | 0.1293 | 200 | 0.7464 | 0.6235 | 0.4629 | 0.3201 | 0.4511 | 0.4113 |
+ | 0.1422 | 220 | 0.7121 | - | - | - | - | - |
+ | 0.1551 | 240 | 0.6795 | - | - | - | - | - |
+ | 0.1681 | 260 | 0.7187 | - | - | - | - | - |
+ | 0.1810 | 280 | 0.6786 | - | - | - | - | - |
+ | 0.1939 | 300 | 0.6608 | - | - | - | - | - |
+ | 0.2069 | 320 | 0.6625 | - | - | - | - | - |
+ | 0.2198 | 340 | 0.651 | - | - | - | - | - |
+ | 0.2327 | 360 | 0.6671 | - | - | - | - | - |
+ | 0.2456 | 380 | 0.6732 | - | - | - | - | - |
+ | 0.2586 | 400 | 0.6301 | 0.5740 | 0.4243 | 0.3278 | 0.4273 | 0.3931 |
+ | 0.2715 | 420 | 0.6375 | - | - | - | - | - |
+ | 0.2844 | 440 | 0.6651 | - | - | - | - | - |
+ | 0.2973 | 460 | 0.6378 | - | - | - | - | - |
+ | 0.3103 | 480 | 0.6592 | - | - | - | - | - |
+ | 0.3232 | 500 | 0.6404 | - | - | - | - | - |
+ | 0.3361 | 520 | 0.6216 | - | - | - | - | - |
+ | 0.3491 | 540 | 0.6072 | - | - | - | - | - |
+ | 0.3620 | 560 | 0.6508 | - | - | - | - | - |
+ | 0.3749 | 580 | 0.5645 | - | - | - | - | - |
+ | 0.3878 | 600 | 0.6275 | 0.4993 | 0.4352 | 0.3241 | 0.4227 | 0.3940 |
+ | 0.4008 | 620 | 0.566 | - | - | - | - | - |
+ | 0.4137 | 640 | 0.5063 | - | - | - | - | - |
+ | 0.4266 | 660 | 0.5297 | - | - | - | - | - |
+ | 0.4396 | 680 | 0.5448 | - | - | - | - | - |
+ | 0.4525 | 700 | 0.5436 | - | - | - | - | - |
+ | 0.4654 | 720 | 0.4771 | - | - | - | - | - |
+ | 0.4783 | 740 | 0.5035 | - | - | - | - | - |
+ | 0.4913 | 760 | 0.5005 | - | - | - | - | - |
+ | 0.5042 | 780 | 0.4509 | - | - | - | - | - |
+ | 0.5171 | 800 | 0.4956 | 0.4341 | 0.4596 | 0.3280 | 0.4357 | 0.4078 |
+ | 0.5301 | 820 | 0.4876 | - | - | - | - | - |
+ | 0.5430 | 840 | 0.4622 | - | - | - | - | - |
+ | 0.5559 | 860 | 0.4791 | - | - | - | - | - |
+ | 0.5688 | 880 | 0.4608 | - | - | - | - | - |
+ | 0.5818 | 900 | 0.451 | - | - | - | - | - |
+ | 0.5947 | 920 | 0.4537 | - | - | - | - | - |
+ | 0.6076 | 940 | 0.4233 | - | - | - | - | - |
+ | 0.6206 | 960 | 0.4534 | - | - | - | - | - |
+ | 0.6335 | 980 | 0.4701 | - | - | - | - | - |
+ | 0.6464 | 1000 | 0.4017 | 0.4052 | 0.4692 | 0.3271 | 0.4452 | 0.4138 |
+ | 0.6593 | 1020 | 0.4518 | - | - | - | - | - |
+ | 0.6723 | 1040 | 0.4173 | - | - | - | - | - |
+ | 0.6852 | 1060 | 0.4369 | - | - | - | - | - |
+ | 0.6981 | 1080 | 0.456 | - | - | - | - | - |
+ | 0.7111 | 1100 | 0.448 | - | - | - | - | - |
+ | 0.7240 | 1120 | 0.4369 | - | - | - | - | - |
+ | 0.7369 | 1140 | 0.4394 | - | - | - | - | - |
+ | 0.7498 | 1160 | 0.437 | - | - | - | - | - |
+ | 0.7628 | 1180 | 0.4402 | - | - | - | - | - |
+ | 0.7757 | 1200 | 0.4382 | 0.3901 | 0.4623 | 0.3238 | 0.4664 | 0.4175 |
+ | 0.7886 | 1220 | 0.4111 | - | - | - | - | - |
+ | 0.8016 | 1240 | 0.4386 | - | - | - | - | - |
+ | 0.8145 | 1260 | 0.4136 | - | - | - | - | - |
+ | 0.8274 | 1280 | 0.4439 | - | - | - | - | - |
+ | 0.8403 | 1300 | 0.4423 | - | - | - | - | - |
+ | 0.8533 | 1320 | 0.4339 | - | - | - | - | - |
+ | 0.8662 | 1340 | 0.4124 | - | - | - | - | - |
+ | 0.8791 | 1360 | 0.417 | - | - | - | - | - |
+ | 0.8920 | 1380 | 0.4067 | - | - | - | - | - |
+ | 0.9050 | 1400 | 0.414 | 0.3854 | 0.4591 | 0.3234 | 0.4660 | 0.4162 |
+ | 0.9179 | 1420 | 0.4153 | - | - | - | - | - |
+ | 0.9308 | 1440 | 0.3889 | - | - | - | - | - |
+ | 0.9438 | 1460 | 0.4368 | - | - | - | - | - |
+ | 0.9567 | 1480 | 0.4241 | - | - | - | - | - |
+ | 0.9696 | 1500 | 0.423 | - | - | - | - | - |
+ | 0.9825 | 1520 | 0.4287 | - | - | - | - | - |
+ | 0.9955 | 1540 | 0.4282 | - | - | - | - | - |
+ | -1 | -1 | - | - | 0.4614 | 0.3231 | 0.4673 | 0.4172 |
+
+
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 0.030 kWh
+ - **Carbon Emitted**: 0.011 kg of CO2
+ - **Hours Used**: 0.087 hours
+
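These three figures are mutually consistent; simple division of the reported values yields the implied average power draw and grid carbon intensity:

```python
# Reported CodeCarbon figures from the section above.
energy_kwh = 0.030   # Energy Consumed
co2_kg = 0.011       # Carbon Emitted
hours = 0.087        # Hours Used

avg_power_kw = energy_kwh / hours        # implied average draw of the machine
carbon_intensity = co2_kg / energy_kwh   # kg CO2 emitted per kWh consumed
print(round(avg_power_kw, 3), round(carbon_intensity, 3))  # 0.345 0.367
```

An average draw of roughly 345 W is plausible for the single-GPU workstation listed under Training Hardware, and ~0.37 kg CO2/kWh falls within the range reported for many electricity grids.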
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
+ - **RAM Size**: 31.78 GB
+
+ ### Framework Versions
+ - Python: 3.11.6
+ - Sentence Transformers: 4.2.0.dev0
+ - Transformers: 4.49.0
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.5.1
+ - Datasets: 2.21.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### SpladeLoss
+ ```bibtex
+ @misc{formal2022distillationhardnegativesampling,
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
+ year={2022},
+ eprint={2205.04733},
+ archivePrefix={arXiv},
+ primaryClass={cs.IR},
+ url={https://arxiv.org/abs/2205.04733},
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,19 @@
+ {
+ "types": {
+ "query_0_IDF": "sentence_transformers.sparse_encoder.models.IDF",
+ "corpus_0_MLMTransformer": "sentence_transformers.sparse_encoder.models.MLMTransformer",
+ "corpus_1_SpladePooling": "sentence_transformers.sparse_encoder.models.SpladePooling"
+ },
+ "structure": {
+ "query": [
+ "query_0_IDF"
+ ],
+ "corpus": [
+ "corpus_0_MLMTransformer",
+ "corpus_1_SpladePooling"
+ ]
+ },
+ "parameters": {
+ "allow_empty_key": true
+ }
+ }
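The `structure` block above routes each input type through its own module stack: queries pass through the IDF module only, while corpus texts pass through the MLM transformer and then SPLADE pooling. A small, self-contained illustration of that routing (plain Python over the config, not the library's actual dispatch code):

```python
import json

# The asymmetric routing config from config.json above.
config = json.loads("""{
  "structure": {
    "query": ["query_0_IDF"],
    "corpus": ["corpus_0_MLMTransformer", "corpus_1_SpladePooling"]
  }
}""")

def modules_for(input_type: str) -> list[str]:
    """Return the ordered module names an input of this type passes through."""
    return config["structure"][input_type]

print(modules_for("query"))   # ['query_0_IDF']
print(modules_for("corpus"))  # ['corpus_0_MLMTransformer', 'corpus_1_SpladePooling']
```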
config_sentence_transformers.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "model_type": "SparseEncoder",
+ "__version__": {
+ "sentence_transformers": "4.2.0.dev0",
+ "transformers": "4.49.0",
+ "pytorch": "2.6.0+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "dot"
+ }
corpus_0_MLMTransformer/config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "_name_or_path": "prajjwal1/bert-tiny",
+ "architectures": [
+ "BertForMaskedLM"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 128,
+ "initializer_range": 0.02,
+ "intermediate_size": 512,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 2,
+ "num_hidden_layers": 2,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.49.0",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
corpus_0_MLMTransformer/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d2983fbf960a43769fb6808ffe6b9972c43ffab11a0da172241e1c618f25f37
+ size 17671560
corpus_0_MLMTransformer/sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": false
+ }
corpus_0_MLMTransformer/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
corpus_0_MLMTransformer/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
corpus_0_MLMTransformer/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
corpus_0_MLMTransformer/vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
corpus_1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+ "pooling_strategy": "max",
+ "activation_function": "relu",
+ "word_embedding_dimension": 30522
+ }
modules.json ADDED
@@ -0,0 +1,8 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Asym"
+ }
+ ]
query_0_IDF/config.json ADDED
@@ -0,0 +1,3 @@
+ {
+ "frozen": false
+ }
query_0_IDF/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8f8f624db485734035ed2d9600a93d6a134ccbd25ff4b14618179cfc5bef948
+ size 122168
query_0_IDF/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+ "cls_token": "[CLS]",
+ "mask_token": "[MASK]",
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "unk_token": "[UNK]"
+ }
query_0_IDF/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
query_0_IDF/tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "extra_special_tokens": {},
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
query_0_IDF/vocab.txt ADDED
The diff for this file is too large to render. See raw diff