yosefw committed on
Commit 20ab96e · verified · 1 Parent(s): d8c61cb

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "pooling_strategy": "max",
+   "activation_function": "relu",
+   "word_embedding_dimension": 30522
+ }
README.md ADDED
@@ -0,0 +1,518 @@
+ ---
+ language:
+ - en
+ license: mit
+ tags:
+ - sentence-transformers
+ - sparse-encoder
+ - sparse
+ - splade
+ - generated_from_trainer
+ - dataset_size:496123
+ - loss:SpladeLoss
+ - loss:SparseMultipleNegativesRankingLoss
+ - loss:FlopsLoss
+ base_model: prajjwal1/bert-medium
+ widget:
+ - text: What is the name, background and ethnicity of the actress who plays Raj’s
+     sister Priya on “The Big Bang Theory”? —Charles Dix, Stewartsville, Mo. Aarti
+     Mann, 36, a first-generation Indian American, was born in Connecticut and raised
+     in Pennsylvania, and plays Priya Koothrappali on “The Big Bang Theory.”. Of landing
+     the role as Raj’s sister, she says, “It is like winning the opportunity to go
+     to the acting Olympics.
+ - text: 'Resolved Question: Severe pain in right side of hip radiating down leg and
+     into foot. It hurts to stand, walk, sit or lie down. I''ve had it for several
+     weeks & have used heat, ice, muscle rub-ons & patches.'
+ - text: 'The Antarctic Treaty. The 12 nations listed in the preamble (below) signed
+     the Antarctic Treaty on 1 December 1959 at Washington, D.C. The Treaty entered
+     into force on 23 June 1961; the 12 signatories became the original 12 consultative
+     nations.nother 21 nations have acceded to the Antarctic Treaty: Austria, Belarus,
+     Canada, Colombia, Cuba, Democratic Peoples Republic of Korea, Denmark, Estonia,
+     Greece, Guatemala, Hungary, Malaysia, Monaco, Pakistan, Papua New Guinea, Portugal,
+     Romania, Slovak Republic, Switzerland, Turkey, and Venezuela.'
+ - text: Orlando, Florida, USA — Sunrise, Sunset, and Daylength, May 2017. May 2017
+     — Sun in Orlando.
+ - text: Line baking dish ... to also cover roast). Place roast ... the roast. Place
+     in preheated 300 degree oven for 2 1/2 to 3 hours. About 50 minutes per pound.rim
+     all excess fat from roast. Place potatoes ... Crockery Pot on top of potatoes
+     and onions. Cover and cook on low setting for 10 to 12 hours (high 5 to 6).
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ metrics:
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ - query_active_dims
+ - query_sparsity_ratio
+ - corpus_active_dims
+ - corpus_sparsity_ratio
+ model-index:
+ - name: SPLADE-BERT-Medium
+   results:
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.4716
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.7802
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.8684
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.9396
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.4716
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.26713333333333333
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.18059999999999998
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.09851999999999998
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.4563333333333333
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.7666333333333334
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.8592166666666667
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.9338666666666667
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.7088774640922301
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.6397524603174632
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.6359976077086615
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 23.28499984741211
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9992371076650478
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 175.6306999586799
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9942457669891004
+       name: Corpus Sparsity Ratio
+ ---
+
+ # SPLADE-BERT-Medium
+
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [prajjwal1/bert-medium](https://huggingface.co/prajjwal1/bert-medium) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SPLADE Sparse Encoder
+ - **Base model:** [prajjwal1/bert-medium](https://huggingface.co/prajjwal1/bert-medium) <!-- at revision ce27ec2944bd32b66ed837edb9c77eb7301b8ecc -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 30522 dimensions
+ - **Similarity Function:** Dot Product
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** mit
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+
+ ### Full Model Architecture
+
+ ```
+ SparseEncoder(
+   (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
+   (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
+ )
+ ```
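+
+ Conceptually, `SpladePooling` collapses the per-token vocabulary logits produced by the MLM head into a single sparse vector. Below is a minimal sketch of the `max`/`relu` configuration above, assuming the standard SPLADE formulation (log-saturated ReLU followed by max pooling over the sequence); it is an illustration, not the library's exact implementation:
+
+ ```python
+ import torch
+
+ def splade_max_pool(mlm_logits: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
+     # mlm_logits: (batch, seq_len, 30522) vocabulary logits from the MLM head
+     # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
+     scores = torch.log1p(torch.relu(mlm_logits))    # 'relu' activation, log-saturated
+     scores = scores * attention_mask.unsqueeze(-1)  # ignore padding positions
+     return scores.max(dim=1).values                 # 'max' pooling -> (batch, 30522)
+ ```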
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SparseEncoder
+
+ # Download from the 🤗 Hub
+ model = SparseEncoder("yosefw/SPLADE-BERT-Medium-BS384")
+ # Run inference
+ queries = [
+     "how long to bake arm roast",
+ ]
+ documents = [
+     'Line baking dish ... to also cover roast). Place roast ... the roast. Place in preheated 300 degree oven for 2 1/2 to 3 hours. About 50 minutes per pound.rim all excess fat from roast. Place potatoes ... Crockery Pot on top of potatoes and onions. Cover and cook on low setting for 10 to 12 hours (high 5 to 6).',
+     'Considerations. The total time it takes to cook an arm roast depends on its size. A 3- to 4-lb. chuck roast takes 5 to 6 hours on high and 10 to 12 hours on low.Chuck roasts usually contain enough marbled fat to cook without water, but most Crock-Pot roast recipes call for a little liquid.Most importantly, resist the temptation to lift the lid while your roast is cooking. 3- to 4-lb. chuck roast takes 5 to 6 hours on high and 10 to 12 hours on low. Chuck roasts usually contain enough marbled fat to cook without water, but most Crock-Pot roast recipes call for a little liquid. Most importantly, resist the temptation to lift the lid while your roast is cooking.',
+     'Set your Crock Pot on high to reach a simmer point of 209 degrees F in 3 to 4 hours, or low to reach the same cooking temperature in 7 to 8 hours. The total time it takes to cook an arm roast depends on its size. A 3- to 4-lb. chuck roast takes 5 to 6 hours on high and 10 to 12 hours on low.Chuck roasts usually contain enough marbled fat to cook without water, but most Crock-Pot roast recipes call for a little liquid.Most importantly, resist the temptation to lift the lid while your roast is cooking. 3- to 4-lb. chuck roast takes 5 to 6 hours on high and 10 to 12 hours on low. Chuck roasts usually contain enough marbled fat to cook without water, but most Crock-Pot roast recipes call for a little liquid. Most importantly, resist the temptation to lift the lid while your roast is cooking.',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 30522] [3, 30522]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[16.1861, 15.3382, 15.6794]])
+ ```
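+
+ Because each output dimension corresponds to a vocabulary token, the embeddings are directly interpretable. Continuing the snippet above, the `decode` helper lists the highest-weighted tokens for an embedding (the tokens and weights shown here are illustrative, not actual model output):
+
+ ```python
+ # Top tokens activated by the first query (illustrative output)
+ decoded = model.decode(query_embeddings[0], top_k=5)
+ print(decoded)
+ # e.g. [('roast', 2.8), ('bake', 2.4), ('arm', 2.1), ...]
+ ```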
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Sparse Information Retrieval
+
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
+
+ | Metric                | Value      |
+ |:----------------------|:-----------|
+ | dot_accuracy@1        | 0.4716     |
+ | dot_accuracy@3        | 0.7802     |
+ | dot_accuracy@5        | 0.8684     |
+ | dot_accuracy@10       | 0.9396     |
+ | dot_precision@1       | 0.4716     |
+ | dot_precision@3       | 0.2671     |
+ | dot_precision@5       | 0.1806     |
+ | dot_precision@10      | 0.0985     |
+ | dot_recall@1          | 0.4563     |
+ | dot_recall@3          | 0.7666     |
+ | dot_recall@5          | 0.8592     |
+ | dot_recall@10         | 0.9339     |
+ | **dot_ndcg@10**       | **0.7089** |
+ | dot_mrr@10            | 0.6398     |
+ | dot_map@100           | 0.636      |
+ | query_active_dims     | 23.285     |
+ | query_sparsity_ratio  | 0.9992     |
+ | corpus_active_dims    | 175.6307   |
+ | corpus_sparsity_ratio | 0.9942     |
+
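+ The two sparsity metrics are linked: the sparsity ratio is the fraction of the 30522 vocabulary dimensions that are inactive, i.e. `1 - active_dims / 30522`. A quick check against the table above:
+
+ ```python
+ vocab_size = 30522  # output dimensionality of this model
+ for name, active_dims in [("query", 23.285), ("corpus", 175.6307)]:
+     # sparsity_ratio = 1 - active_dims / vocab_size
+     print(name, round(1 - active_dims / vocab_size, 4))
+ # query 0.9992
+ # corpus 0.9942
+ ```
+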
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 496,123 training samples
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, and <code>negative_2</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | query | positive | negative_1 | negative_2 |
+   |:--------|:------|:---------|:-----------|:-----------|
+   | type    | string | string | string | string |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 8.87 tokens</li><li>max: 43 tokens</li></ul> | <ul><li>min: 24 tokens</li><li>mean: 81.23 tokens</li><li>max: 259 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 79.21 tokens</li><li>max: 197 tokens</li></ul> | <ul><li>min: 20 tokens</li><li>mean: 77.89 tokens</li><li>max: 207 tokens</li></ul> |
+ * Samples:
+   | query | positive | negative_1 | negative_2 |
+   |:------|:---------|:-----------|:-----------|
+   | <code>heart specialists in ridgeland ms</code> | <code>Dr. George Reynolds Jr, MD is a cardiology specialist in Ridgeland, MS and has been practicing for 35 years. He graduated from Vanderbilt University School Of Medicine in 1977 and specializes in cardiology and internal medicine.</code> | <code>Dr. James Kramer is a Internist in Ridgeland, MS. Find Dr. Kramer's phone number, address and more.</code> | <code>Dr. James Kramer is an internist in Ridgeland, Mississippi. He received his medical degree from Loma Linda University School of Medicine and has been in practice for more than 20 years. Dr. James Kramer's Details</code> |
+   | <code>does baytril otic require a prescription</code> | <code>Baytril Otic Ear Drops-Enrofloxacin/Silver Sulfadiazine-Prices & Information. A prescription is required for this item. A prescription is required for this item. Brand medication is not available at this time.</code> | <code>RX required for this item. Click here for our full Prescription Policy and Form. Baytril Otic (enrofloxacin/silver sulfadiazine) Emulsion from Bayer is the first fluoroquinolone approved by the Food and Drug Administration for the topical treatment of canine otitis externa.</code> | <code>Product Details. Baytril Otic is a highly effective treatment prescribed by many veterinarians when your pet has an ear infection caused by susceptible bacteria or fungus. Baytril Otic is: a liquid emulsion that is used topically directly in the ear or on the skin in order to treat susceptible bacterial and yeast infections.</code> |
+   | <code>what is on a gyro</code> | <code>Report Abuse. Gyros or gyro (giros) (pronounced /ˈjɪəroʊ/ or /ˈdʒaɪroʊ/, Greek: γύρος turn) is a Greek dish consisting of meat (typically lamb and/or beef), tomato, onion, and tzatziki sauce, and is served with pita bread. Chicken and pork meat can be used too.</code> | <code>A gyroscope (from Ancient Greek γῦρος gûros, circle and σκοπέω skopéō, to look) is a spinning wheel or disc in which the axis of rotation is free to assume any orientation by itself. When rotating, the orientation of this axis is unaffected by tilting or rotation of the mounting, according to the conservation of angular momentum.</code> | <code>Diagram of a gyro wheel. Reaction arrows about the output axis (blue) correspond to forces applied about the input axis (green), and vice versa. A gyroscope is a wheel mounted in two or three gimbals, which are a pivoted supports that allow the rotation of the wheel about a single axis.</code> |
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters (see the sketch after this list):
+   ```json
+   {
+       "loss": "SparseMultipleNegativesRankingLoss(scale=1.0, similarity_fct='dot_score', gather_across_devices=False)",
+       "document_regularizer_weight": 0.003,
+       "query_regularizer_weight": 0.005
+   }
+   ```
+
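+ For reference, this loss can be reconstructed roughly as follows. This is a sketch assuming the sentence-transformers 5.x sparse-encoder API, in which `SpladeLoss` wraps the ranking loss and adds FLOPS sparsity regularization with the weights listed above:
+
+ ```python
+ from sentence_transformers import SparseEncoder
+ from sentence_transformers.sparse_encoder.losses import (
+     SpladeLoss,
+     SparseMultipleNegativesRankingLoss,
+ )
+
+ model = SparseEncoder("prajjwal1/bert-medium")  # base model used for this run
+ loss = SpladeLoss(
+     model=model,
+     loss=SparseMultipleNegativesRankingLoss(model),
+     document_regularizer_weight=0.003,  # FLOPS penalty on document vectors
+     query_regularizer_weight=0.005,     # FLOPS penalty on query vectors
+ )
+ ```
+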
+ ### Training Hyperparameters
+
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 48
+ - `per_device_eval_batch_size`: 48
+ - `gradient_accumulation_steps`: 8
+ - `learning_rate`: 8e-05
+ - `num_train_epochs`: 8
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.025
+ - `fp16`: True
+ - `load_best_model_at_end`: True
+ - `push_to_hub`: True
+ - `batch_sampler`: no_duplicates
+
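+ With `per_device_train_batch_size` 48 and `gradient_accumulation_steps` 8, the effective batch size is 48 × 8 = 384 per device, which presumably explains the `BS384` suffix in the model id.
+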
+ #### All Hyperparameters
+
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 48
+ - `per_device_eval_batch_size`: 48
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 8e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 8
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.025
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: True
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+
+ | Epoch | Step | Training Loss | dot_ndcg@10 |
+ |:-----:|:----:|:-------------:|:-----------:|
+ | 1.0   | 1292 | 42.0325       | 0.7155      |
+ | 2.0   | 2584 | 1.1261        | 0.7216      |
+ | 3.0   | 3876 | 1.049         | 0.7214      |
+ | 4.0   | 5168 | 0.9631        | 0.7188      |
+ | 5.0   | 6460 | 0.8725        | 0.7120      |
+ | -1    | -1   | -             | 0.7089      |
+
+ ### Framework Versions
+
+ - Python: 3.12.11
+ - Sentence Transformers: 5.1.0
+ - Transformers: 4.55.4
+ - PyTorch: 2.8.0+cu126
+ - Accelerate: 1.10.1
+ - Datasets: 4.0.0
+ - Tokenizers: 0.21.4
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### SpladeLoss
+
+ ```bibtex
+ @misc{formal2022distillationhardnegativesampling,
+     title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
+     author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
+     year={2022},
+     eprint={2205.04733},
+     archivePrefix={arXiv},
+     primaryClass={cs.IR},
+     url={https://arxiv.org/abs/2205.04733},
+ }
+ ```
+
+ #### SparseMultipleNegativesRankingLoss
+
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ #### FlopsLoss
+
+ ```bibtex
+ @article{paria2020minimizing,
+     title={Minimizing flops to learn efficient sparse representations},
+     author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
+     journal={arXiv preprint arXiv:2004.05665},
+     year={2020}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SparseEncoder",
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.55.4",
+     "pytorch": "2.8.0+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "dot"
+ }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a8fa0143111ad21c58cfbd666dac8afd7510264aa6ce0ff931472d15ae549dcc
+ oid sha256:c1fdedc56dc2c1e4d05a7c38914ca477e038542b1d583d82b03fa144971c8902
  size 165634976
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_SpladePooling",
+     "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }