rasyosef committed
Commit c5eff42 (verified)
1 Parent(s): 67dfbda

Add new SparseEncoder model
1_SpladePooling/config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "pooling_strategy": "max",
+   "activation_function": "relu",
+   "word_embedding_dimension": 30522
+ }
README.md ADDED
@@ -0,0 +1,509 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sparse-encoder
+ - sparse
+ - splade
+ - generated_from_trainer
+ - dataset_size:1200000
+ - loss:SpladeLoss
+ - loss:SparseMarginMSELoss
+ - loss:FlopsLoss
+ base_model: yosefw/SPLADE-BERT-Small-BS128
+ widget:
+ - text: Donate to the Breast Cancer Research Foundation Now BCRF is the largest nonprofit
+     funder of breast cancer research worldwide. Over the years, it has raised more
+     than half a billion dollars in support of research that has made a major impact
+     on how we view and treat breast cancer.
+ - text: Macular degeneration—Loss of central vision, blurred vision (especially while
+     reading), distorted vision (like seeing wavy lines), and colors appearing faded.
+     The most common cause of blindness in people over age 60. Eye infection, inflammation,
+     or injury.
+ - text: how do i find the tongue weight of a trailer?
+ - text: Feathers (1-3) Pidgey are docile Pokémon, and generally prefer to flee from
+     their enemies rather than fight them. Pidgey's small size permits it to hide easily
+     in long grass, where it is typically found foraging for small insects. It is known
+     to flush out potential prey from long grass by flapping its wings rapidly.
+ - text: 10 hilariously insightful foreign words. One of the most obvious differences
+     between cognac and whiskey is that cognac makers use grapes, and whiskey makers
+     use grains. Although both processes use fermentation to create the liquors, cognac
+     makers use a double distillation process.
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ metrics:
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ - query_active_dims
+ - query_sparsity_ratio
+ - corpus_active_dims
+ - corpus_sparsity_ratio
+ model-index:
+ - name: SPLADE Sparse Encoder
+   results:
+   - task:
+       type: sparse-information-retrieval
+       name: Sparse Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: dot_accuracy@1
+       value: 0.5172
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.8368
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.9232
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 0.9762
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.5172
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.2866666666666667
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.1924
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.10273999999999998
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.5006
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.8237833333333332
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.91535
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 0.9723333333333332
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.7553714776897319
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.6876940476190507
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.6829029994536953
+       name: Dot Map@100
+     - type: query_active_dims
+       value: 29.71980094909668
+       name: Query Active Dims
+     - type: query_sparsity_ratio
+       value: 0.9990262826502491
+       name: Query Sparsity Ratio
+     - type: corpus_active_dims
+       value: 168.3538420216879
+       name: Corpus Active Dims
+     - type: corpus_sparsity_ratio
+       value: 0.9944841805248121
+       name: Corpus Sparsity Ratio
+ ---
+
+ # SPLADE Sparse Encoder
+
+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [yosefw/SPLADE-BERT-Small-BS128](https://huggingface.co/yosefw/SPLADE-BERT-Small-BS128) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SPLADE Sparse Encoder
+ - **Base model:** [yosefw/SPLADE-BERT-Small-BS128](https://huggingface.co/yosefw/SPLADE-BERT-Small-BS128) <!-- at revision 27575d2504e7400b5ed11f94d0e162e3e7c01af6 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 30522 dimensions
+ - **Similarity Function:** Dot Product
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
+
+ ### Full Model Architecture
+
+ ```
+ SparseEncoder(
+   (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
+   (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
+ )
+ ```
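+
+ With `pooling_strategy: "max"` and `activation_function: "relu"`, SpladePooling turns the MLM logits into a sparse lexical vector: each vocabulary dimension takes the maximum of log(1 + ReLU(logit)) over the sequence positions. A minimal sketch of that computation in plain PyTorch (tensor shapes are illustrative):
+
+ ```python
+ import torch
+
+ # mlm_logits: (batch, seq_len, vocab_size) logits from the BertForMaskedLM head
+ mlm_logits = torch.randn(2, 12, 30522)
+
+ # SPLADE pooling: log-saturate the ReLU'd logits, then max-pool over tokens.
+ sparse_embeddings = torch.log1p(torch.relu(mlm_logits)).max(dim=1).values
+ print(sparse_embeddings.shape)  # torch.Size([2, 30522])
+ ```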
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from sentence_transformers import SparseEncoder
+
+ # Download from the 🤗 Hub
+ model = SparseEncoder("yosefw/SPLADE-BERT-Small-BS128-distil")
+
+ # Run inference
+ queries = [
+     "is cognac whisky",
+ ]
+ documents = [
+     'Cognac vs Whiskey. • Whiskey is the alcoholic drink made from grains whereas Cognac is the alcoholic drink made from grapes. • Cognac is a type of brandy. In fact, many label it as the finest of brandies. • Cognac is the brandy originating from a wine producing region of France called Cognac. • While a cognac is considered an after dinner beverage that is intended to digest food, there is no such stereotyping of whiskey that can be consumed anytime of the day.',
+     '10 hilariously insightful foreign words. One of the most obvious differences between cognac and whiskey is that cognac makers use grapes, and whiskey makers use grains. Although both processes use fermentation to create the liquors, cognac makers use a double distillation process.',
+     'The word whisky (or whiskey) is an anglicisation of the Classical Gaelic word uisce / uisge meaning water (now written as uisce in Irish Gaelic, and uisge in Scottish Gaelic). Distilled alcohol was known in Latin as aqua vitae (water of life).',
+ ]
+ query_embeddings = model.encode_query(queries)
+ document_embeddings = model.encode_document(documents)
+ print(query_embeddings.shape, document_embeddings.shape)
+ # [1, 30522] [3, 30522]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(query_embeddings, document_embeddings)
+ print(similarities)
+ # tensor([[22.4589, 20.5905, 10.0662]])
+ ```
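+
+ Because each dimension corresponds to a vocabulary token, sparse embeddings can be mapped back to the terms the model activated. A small sketch, assuming the `SparseEncoder.decode` helper available in recent sentence-transformers releases:
+
+ ```python
+ # List the highest-weighted vocabulary terms in the first query embedding
+ # as (token, weight) pairs.
+ decoded_query = model.decode(query_embeddings[0], top_k=10)
+ print(decoded_query)
+ ```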
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Sparse Information Retrieval
+
+ * Evaluated with [<code>SparseInformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sparse_encoder/evaluation.html#sentence_transformers.sparse_encoder.evaluation.SparseInformationRetrievalEvaluator)
+
+ | Metric                | Value      |
+ |:----------------------|:-----------|
+ | dot_accuracy@1        | 0.5172     |
+ | dot_accuracy@3        | 0.8368     |
+ | dot_accuracy@5        | 0.9232     |
+ | dot_accuracy@10       | 0.9762     |
+ | dot_precision@1       | 0.5172     |
+ | dot_precision@3       | 0.2867     |
+ | dot_precision@5       | 0.1924     |
+ | dot_precision@10      | 0.1027     |
+ | dot_recall@1          | 0.5006     |
+ | dot_recall@3          | 0.8238     |
+ | dot_recall@5          | 0.9153     |
+ | dot_recall@10         | 0.9723     |
+ | **dot_ndcg@10**       | **0.7554** |
+ | dot_mrr@10            | 0.6877     |
+ | dot_map@100           | 0.6829     |
+ | query_active_dims     | 29.7198    |
+ | query_sparsity_ratio  | 0.999      |
+ | corpus_active_dims    | 168.3538   |
+ | corpus_sparsity_ratio | 0.9945     |
+
243
+ <!--
244
+ ## Bias, Risks and Limitations
245
+
246
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
247
+ -->
248
+
249
+ <!--
250
+ ### Recommendations
251
+
252
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
253
+ -->
254
+
255
+ ## Training Details
256
+
257
+ ### Training Dataset
258
+
259
+ #### Unnamed Dataset
260
+
261
+ * Size: 1,200,000 training samples
262
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, and <code>label</code>
263
+ * Approximate statistics based on the first 1000 samples:
264
+ | | query | positive | negative_1 | negative_2 | label |
265
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------|
266
+ | type | string | string | string | string | list |
267
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.04 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 81.11 tokens</li><li>max: 215 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 77.81 tokens</li><li>max: 247 tokens</li></ul> | <ul><li>min: 22 tokens</li><li>mean: 76.2 tokens</li><li>max: 217 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
268
+ * Samples:
269
+ | query | positive | negative_1 | negative_2 | label |
270
+ |:-------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------|
271
+ | <code>The _____________________ is a body system which consists of glands that produce hormones that act throughout the body.</code> | <code>Endocrine System. The endocrine system is made up of a group of glands that produce the body's long-distance messengers, or hormones. Hormones are chemicals that control body functions, such as metabolism, growth, and sexual development.t is made up of a group of organs that transport blood throughout the body. The heart pumps the blood and the arteries and veins transport it. Oxygen-rich blood leaves the left side of the heart and enters the biggest artery, called the aorta.</code> | <code>The endocrine system is a control system of ductless glands that secrete hormones within specific organs. Hormones act as messengers, and are carried by the bloodstream to different cells in the body, which interpret these messages and act on them.he pancreas is unusual among the body's glands in that it also has a very important endocrine function. Small groups of special cells called islet cells throughout the organ make the hormones of insulin and glucagon.</code> | <code>These glands produce different types of hormones that evoke a specific response in other cells, tissues, and/or organs located throughout the body. The hormones reach these faraway targets using the blood stream. Like the nervous system, the endocrine system is one of your body’s main communicators.he Endocrine System Essentials. 1 The endocrine system is made up of a network of glands. 2 These glands secrete hormones to regulate many bodily functions, including growth and metabolism.</code> | <code>[2.3722684383392334, 5.211579322814941]</code> |
272
+ | <code>causes of low body temperature in adults</code> | <code>Hypothermia is defined as a body temperature (core, or internal body temperature) of less than about 95 F (35 C). Usually, hypothermia occurs when the body's temperature regulation is overwhelmed by a cold environment. However, in the medical and lay literature there are essentially two major classifications, accidental hypothermia and intentional hypothermia.</code> | <code>In general, a baby has a fever when their body temperature exceeds 100.4°F, or 38°C. A child has a fever when their temperature exceeds 99.5°F, or 37.5°C. An adult has a fever when their temperature exceeds 99 to 99.5°F, or 37.2 to 37.5°C.</code> | <code>Consequently, an accurate measurement of body temperature (best is rectal core temperature) of 100.4 F (38 C) or above is considered to be a fever.. A newer option includes a temperature-sensitive infrared device that measures the temperature in the skin by simply rubbing the sensor on the body.</code> | <code>[1.3747079372406006, 8.096447944641113]</code> |
273
+ | <code>who is laila gifty akita</code> | <code>Lailah Gifty Akita is a Ghanaian and founder of Smart Youth Volunteers Foundation. She obtained a BSc in Renewable Natural Resources Management at Kwame Nkrumah University of Science and Technology, Kumasi-Ghana. She also had MPhil in Oceanography at the University of Ghana. She obtained a doctorate in Geosciences at International Max Planck Research School for Global Biogeochemical Cycles-Friedrich Schiller University of Jena, Germany ( June 2011 to March 2016). Lailah is an influential lady with the passion of empowering the mind of young people to make a great difference.</code> | <code>She is a PhD-student, studying Geosciences at the University of Jena, Germany. She is an enthusiastic inspirational writer. She wishes to challenge and inspire people from all walks of life to dare a greater life. You can capable of heroic deeds. Think well of yourself and act positively. You can correspond with Lailah via an email:[email protected]. https://www.goodreads.com/author/show/8297615.Lailah_Gifty_Akita/blog.</code> | <code>Also in the Talmud, the interpretation is found of rabbi Hanina ben Pappa (3rd century AD), that Lailah is an angel in charge of conception who takes a drop of semen and places it before God, saying: For R. Hanina b.</code> | <code>[2.6488447189331055, 15.058775901794434]</code> |
274
+ * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
275
+ ```json
276
+ {
277
+ "loss": "SparseMarginMSELoss",
278
+ "document_regularizer_weight": 0.12,
279
+ "query_regularizer_weight": 0.2
280
+ }
281
+ ```
282
+
283
+ ### Training Hyperparameters
284
+ #### Non-Default Hyperparameters
285
+
286
+ - `eval_strategy`: epoch
287
+ - `per_device_train_batch_size`: 64
288
+ - `per_device_eval_batch_size`: 64
289
+ - `learning_rate`: 4e-05
290
+ - `num_train_epochs`: 4
291
+ - `lr_scheduler_type`: cosine
292
+ - `warmup_ratio`: 0.025
293
+ - `fp16`: True
294
+ - `load_best_model_at_end`: True
295
+ - `optim`: adamw_torch_fused
296
+ - `push_to_hub`: True
297
+
298
+ #### All Hyperparameters
299
+ <details><summary>Click to expand</summary>
300
+
301
+ - `overwrite_output_dir`: False
302
+ - `do_predict`: False
303
+ - `eval_strategy`: epoch
304
+ - `prediction_loss_only`: True
305
+ - `per_device_train_batch_size`: 64
306
+ - `per_device_eval_batch_size`: 64
307
+ - `per_gpu_train_batch_size`: None
308
+ - `per_gpu_eval_batch_size`: None
309
+ - `gradient_accumulation_steps`: 1
310
+ - `eval_accumulation_steps`: None
311
+ - `torch_empty_cache_steps`: None
312
+ - `learning_rate`: 4e-05
313
+ - `weight_decay`: 0.0
314
+ - `adam_beta1`: 0.9
315
+ - `adam_beta2`: 0.999
316
+ - `adam_epsilon`: 1e-08
317
+ - `max_grad_norm`: 1.0
318
+ - `num_train_epochs`: 4
319
+ - `max_steps`: -1
320
+ - `lr_scheduler_type`: cosine
321
+ - `lr_scheduler_kwargs`: {}
322
+ - `warmup_ratio`: 0.025
323
+ - `warmup_steps`: 0
324
+ - `log_level`: passive
325
+ - `log_level_replica`: warning
326
+ - `log_on_each_node`: True
327
+ - `logging_nan_inf_filter`: True
328
+ - `save_safetensors`: True
329
+ - `save_on_each_node`: False
330
+ - `save_only_model`: False
331
+ - `restore_callback_states_from_checkpoint`: False
332
+ - `no_cuda`: False
333
+ - `use_cpu`: False
334
+ - `use_mps_device`: False
335
+ - `seed`: 42
336
+ - `data_seed`: None
337
+ - `jit_mode_eval`: False
338
+ - `use_ipex`: False
339
+ - `bf16`: False
340
+ - `fp16`: True
341
+ - `fp16_opt_level`: O1
342
+ - `half_precision_backend`: auto
343
+ - `bf16_full_eval`: False
344
+ - `fp16_full_eval`: False
345
+ - `tf32`: None
346
+ - `local_rank`: 0
347
+ - `ddp_backend`: None
348
+ - `tpu_num_cores`: None
349
+ - `tpu_metrics_debug`: False
350
+ - `debug`: []
351
+ - `dataloader_drop_last`: False
352
+ - `dataloader_num_workers`: 0
353
+ - `dataloader_prefetch_factor`: None
354
+ - `past_index`: -1
355
+ - `disable_tqdm`: False
356
+ - `remove_unused_columns`: True
357
+ - `label_names`: None
358
+ - `load_best_model_at_end`: True
359
+ - `ignore_data_skip`: False
360
+ - `fsdp`: []
361
+ - `fsdp_min_num_params`: 0
362
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
363
+ - `fsdp_transformer_layer_cls_to_wrap`: None
364
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
365
+ - `deepspeed`: None
366
+ - `label_smoothing_factor`: 0.0
367
+ - `optim`: adamw_torch_fused
368
+ - `optim_args`: None
369
+ - `adafactor`: False
370
+ - `group_by_length`: False
371
+ - `length_column_name`: length
372
+ - `ddp_find_unused_parameters`: None
373
+ - `ddp_bucket_cap_mb`: None
374
+ - `ddp_broadcast_buffers`: False
375
+ - `dataloader_pin_memory`: True
376
+ - `dataloader_persistent_workers`: False
377
+ - `skip_memory_metrics`: True
378
+ - `use_legacy_prediction_loop`: False
379
+ - `push_to_hub`: True
380
+ - `resume_from_checkpoint`: None
381
+ - `hub_model_id`: None
382
+ - `hub_strategy`: every_save
383
+ - `hub_private_repo`: None
384
+ - `hub_always_push`: False
385
+ - `hub_revision`: None
386
+ - `gradient_checkpointing`: False
387
+ - `gradient_checkpointing_kwargs`: None
388
+ - `include_inputs_for_metrics`: False
389
+ - `include_for_metrics`: []
390
+ - `eval_do_concat_batches`: True
391
+ - `fp16_backend`: auto
392
+ - `push_to_hub_model_id`: None
393
+ - `push_to_hub_organization`: None
394
+ - `mp_parameters`:
395
+ - `auto_find_batch_size`: False
396
+ - `full_determinism`: False
397
+ - `torchdynamo`: None
398
+ - `ray_scope`: last
399
+ - `ddp_timeout`: 1800
400
+ - `torch_compile`: False
401
+ - `torch_compile_backend`: None
402
+ - `torch_compile_mode`: None
403
+ - `include_tokens_per_second`: False
404
+ - `include_num_input_tokens_seen`: False
405
+ - `neftune_noise_alpha`: None
406
+ - `optim_target_modules`: None
407
+ - `batch_eval_metrics`: False
408
+ - `eval_on_start`: False
409
+ - `use_liger_kernel`: False
410
+ - `liger_kernel_config`: None
411
+ - `eval_use_gather_object`: False
412
+ - `average_tokens_across_devices`: False
413
+ - `prompts`: None
414
+ - `batch_sampler`: batch_sampler
415
+ - `multi_dataset_batch_sampler`: proportional
416
+ - `router_mapping`: {}
417
+ - `learning_rate_mapping`: {}
418
+
419
+ </details>
420
+
421
+ ### Training Logs
422
+ | Epoch | Step | Training Loss | dot_ndcg@10 |
423
+ |:-------:|:---------:|:-------------:|:-----------:|
424
+ | 1.0 | 18750 | 7.806 | 0.7439 |
425
+ | 2.0 | 37500 | 5.7509 | 0.7520 |
426
+ | **3.0** | **56250** | **4.5026** | **0.7554** |
427
+ | 4.0 | 75000 | 3.909 | 0.7534 |
428
+ | -1 | -1 | - | 0.7554 |
429
+
430
+ * The bold row denotes the saved checkpoint.
431
+
432
+ ### Framework Versions
433
+ - Python: 3.11.13
434
+ - Sentence Transformers: 5.1.0
435
+ - Transformers: 4.55.2
436
+ - PyTorch: 2.6.0+cu124
437
+ - Accelerate: 1.10.0
438
+ - Datasets: 4.0.0
439
+ - Tokenizers: 0.21.4
440
+
441
+ ## Citation
442
+
443
+ ### BibTeX
444
+
445
+ #### Sentence Transformers
446
+ ```bibtex
447
+ @inproceedings{reimers-2019-sentence-bert,
448
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
449
+ author = "Reimers, Nils and Gurevych, Iryna",
450
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
451
+ month = "11",
452
+ year = "2019",
453
+ publisher = "Association for Computational Linguistics",
454
+ url = "https://arxiv.org/abs/1908.10084",
455
+ }
456
+ ```
457
+
458
+ #### SpladeLoss
459
+ ```bibtex
460
+ @misc{formal2022distillationhardnegativesampling,
461
+ title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
462
+ author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
463
+ year={2022},
464
+ eprint={2205.04733},
465
+ archivePrefix={arXiv},
466
+ primaryClass={cs.IR},
467
+ url={https://arxiv.org/abs/2205.04733},
468
+ }
469
+ ```
470
+
471
+ #### SparseMarginMSELoss
472
+ ```bibtex
473
+ @misc{hofstätter2021improving,
474
+ title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
475
+ author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
476
+ year={2021},
477
+ eprint={2010.02666},
478
+ archivePrefix={arXiv},
479
+ primaryClass={cs.IR}
480
+ }
481
+ ```
482
+
483
+ #### FlopsLoss
484
+ ```bibtex
485
+ @article{paria2020minimizing,
486
+ title={Minimizing flops to learn efficient sparse representations},
487
+ author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{'o}czos, Barnab{'a}s},
488
+ journal={arXiv preprint arXiv:2004.05665},
489
+ year={2020}
490
+ }
491
+ ```
492
+
493
+ <!--
494
+ ## Glossary
495
+
496
+ *Clearly define terms in order to be accessible across audiences.*
497
+ -->
498
+
499
+ <!--
500
+ ## Model Card Authors
501
+
502
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
503
+ -->
504
+
505
+ <!--
506
+ ## Model Card Contact
507
+
508
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
509
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "BertForMaskedLM"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 512,
+   "initializer_range": 0.02,
+   "intermediate_size": 2048,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 8,
+   "num_hidden_layers": 4,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.55.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SparseEncoder",
+   "__version__": {
+     "sentence_transformers": "5.1.0",
+     "transformers": "4.55.2",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "dot"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ec00e9e7df6c50634dc89a9e490f1f787ef35fd9beccce17895323138acbedc
+ size 115189296
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.sparse_encoder.models.MLMTransformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_SpladePooling",
+     "type": "sentence_transformers.sparse_encoder.models.SpladePooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff