tomaarsen committed (verified)
Commit 9378095 · 1 Parent(s): c789527

Add new CrossEncoder model

README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 92.35489230616244
  energy_consumed: 0.23759819168968113
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.977
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.506
      name: Map
    - type: mrr@10
      value: 0.49
      name: Mrr@10
    - type: ndcg@10
      value: 0.5497
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3383
      name: Map
    - type: mrr@10
      value: 0.5705
      name: Mrr@10
    - type: ndcg@10
      value: 0.3736
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.5939
      name: Map
    - type: mrr@10
      value: 0.6004
      name: Mrr@10
    - type: ndcg@10
      value: 0.6574
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.4794
      name: Map
    - type: mrr@10
      value: 0.5536
      name: Mrr@10
    - type: ndcg@10
      value: 0.5269
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
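
Since the model is saved with a `Sigmoid` activation (see the `config.json` in this commit), `predict` returns scores between 0 and 1. The sketch below shows the retrieve-and-rerank pattern this model is meant for; the candidate passages are hypothetical stand-ins for the output of a first-stage retriever such as BM25 or a bi-encoder:

```python
import numpy as np
from sentence_transformers import CrossEncoder

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded")

query = "How many calories in an egg"
# Hypothetical candidates from a first-stage retriever
candidates = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]

# Score every (query, candidate) pair, then sort candidates by descending score
scores = model.predict([(query, doc) for doc in candidates])
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.4f}  {candidates[idx]}")
```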

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```
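
To run the same evaluation on your own reranking data, the evaluator can be instantiated directly. The following is a toy sketch: the `samples` entries are hypothetical, and the expected keys (`query`, `positive`, `negative`) should be checked against the evaluator documentation linked above:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded")

# One hypothetical sample: a query, its known-relevant passages, and retrieved negatives
samples = [
    {
        "query": "How many calories in an egg",
        "positive": ["There are on average between 55 and 80 calories in an egg."],
        "negative": ["A dozen eggs typically costs between two and four dollars."],
    }
]
evaluator = CrossEncoderRerankingEvaluator(samples, at_k=10, always_rerank_positives=True)
print(evaluator(model))  # dict of map, mrr@10 and ndcg@10
```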

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5060 (+0.0164)     | 0.3383 (+0.0773)     | 0.5939 (+0.1743)     |
| mrr@10      | 0.4900 (+0.0125)     | 0.5705 (+0.0707)     | 0.6004 (+0.1737)     |
| **ndcg@10** | **0.5497 (+0.0093)** | **0.3736 (+0.0485)** | **0.6574 (+0.1568)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```
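
This evaluator downloads the Nano BEIR subsets itself, so the aggregate numbers below can be recomputed with a short script, assuming the constructor accepts the parameters exactly as serialized above:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded")

evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)
results = evaluator(model)  # per-dataset metrics plus their mean
```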

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4794 (+0.0894)     |
| mrr@10      | 0.5536 (+0.0856)     |
| **ndcg@10** | **0.5269 (+0.0715)** |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | docs                                                                                    | labels                                                                                  |
  |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 11 characters</li><li>mean: 34.13 characters</li><li>max: 88 characters</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query                                                      | docs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | labels                             |
  |:-----------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------|
  | <code>food that are vegetarian that have vitamin a</code> | <code>['Vitamin A is a fat soluble vitamin, and therefore, needs to be consumed with fat in order to have optimal absorption. High vitamin A foods include sweet potatoes, carrots, dark leafy greens, winter squashes, lettuce, dried apricots, cantaloupe, bell peppers, fish, liver, and tropical fruits. The current daily value for Vitamin A is 5000 international units (IU).', 'Unlike some other B vitamins, B12 is not found in any plant food other than fortified cereals. It is, however, abundant in many meats and fish, and in smaller amounts in milk and eggs. This makes it difficult for people following a strict vegetarian diet to get the necessary amount of vitamin B12.', 'They found that 92% of the vegans they studied -- those who ate the strictest vegetarian diet, which shuns all animal products, including milk and eggs -- had vitamin B12 deficiency. But two in three people who followed a vegetarian diet that included milk and eggs as their only animal foods also were deficient.', 'Vitamin B 1...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what is trilobar prostatic enlargement</code> | <code>["Prostate enlargement: Most prostatic enlargement is due to benign prostatic hyperplasia (BPH), a problem that bothers men increasingly with advancing age. The process of BPH generally begins in a man's 30s, evolves very slowly and usually causes symptoms only after he has passed the half-century mark. It is not a precursor (a forerunner) to prostate cancer. Treatment of BPH is usually reserved for men with significant symptoms. Watchful waiting with medical monitoring once a year is appropriate for most men with BPH. The medical therapy of BPH includes medication.", '1 A benign (noncancerous) condition in which an overgrowth of prostate tissue pushes against the urethra and the bladder, blocking the flow of urine. 2 Increase in constituent cells in the prostate, leading to enlargement of the organ (hypertrophy) and adverse impact on the lower urinary tract function. 1 Increase in constituent cells in the prostate, leading to enlargement of the organ (hypertrophy) and adverse impact ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what is the classification of seasoning</code> | <code>['Artificial kiln seasoning. Its the most traditional way of seasoning wood or timber. In this method wood is dryed usually by the keeping the wood exposed to air, so that the moisture evaporates and wood is seasoned. This method is very economical is a sense that no operational charges exists but the process is too slow & ..... [Read More].', 'Vegetables used in seasoning such as onions, garlic, and celery may also be included in this category in some circumstances. Some people break types of spices up by what one does when it is added to food. Sweet, hot, pungent, and tangy are the four primary categories. I think the most common herbs and spices are garlic, onions, and Italian spices like Oregano. Garlic and onion can also be found in salts and powders for simpler things like marinades and just for basic baking.', 'Spices and herbs at a grocery shop in Goa, India. A spice is a seed, fruit, root, bark, berry, bud or vegetable substance primarily used for flavoring, coloring or preser...</code> | <code>[1, 1, 1, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```
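
ListNet (Cao et al., 2007, cited below) turns the label list and the score list for a query into two "top-1" probability distributions via softmax and minimizes the cross-entropy between them; the `Identity` activation above means the raw logits feed the softmax directly. A minimal sketch of the loss for one query:

```python
import torch

# Model logits and graded relevance labels for one query's candidate docs
scores = torch.tensor([2.1, 0.3, -0.5])
labels = torch.tensor([1.0, 0.0, 0.0])

# Cross-entropy between the two top-1 (softmax) distributions
target = torch.softmax(labels, dim=0)
loss = -(target * torch.log_softmax(scores, dim=0)).sum()
print(loss)  # scalar tensor; lower when high scores align with high labels
```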

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | docs                                                                                    | labels                                                                                  |
  |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.79 characters</li><li>max: 95 characters</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query                                              | docs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | labels                             |
  |:----------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------|
  | <code>absolute viscosity definition</code> | <code>['Noun. 1. absolute viscosity-a measure of the resistance to flow of a fluid under an applied force. coefficient of viscosity, dynamic viscosity. coefficient-a constant number that serves as a measure of some property or characteristic.', '1 the state or property of being viscous. 2 (Physics). a the extent to which a fluid resists a tendency to flow. b (Also called) absolute viscosity a measure of this resistance, equal to the tangential stress on a liquid undergoing streamline flow divided by its velocity gradient. It is measured in newton seconds per metre squared. , (Symbol) η.', 'Kinematic Viscosity. Kinematic viscosity is the ratio of-absolute (or dynamic) viscosity to density-a quantity in which no force is involved. Kinematic viscosity can be obtained by dividing the absolute viscosity of a fluid with the fluid mass density.', '2] = shear stress acted by fluid on lower surface of the blank element du = velocity of the blank element relative to blank holder and die surface [mu] =...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>meaning of chartered engineer</code> | <code>['noun. ( 1 in Britain) an engineer who is registered with the Engineering Council as having the scientific and technical knowledge and practical experience to satisfy its professional requirements.', '1 Trends. ( 2 in Britain) an engineer who is registered with the Engineering Council as having the scientific and technical knowledge and practical experience to satisfy its professional requirements.', '1 (in Britain) an engineer who is registered with the Engineering Council as having the scientific and technical knowledge and practical experience to satisfy its professional requirements. 2 Abbreviation: CEng.', 'chartered engineer n (in Britain) an engineer who is registered with the Engineering Council as having the scientific and technical knowledge and practical experience to satisfy its professional requirements, (Abbrev.) CEng. chartered engineer.', 'chartered engineer. ( 1 in Britain) an engineer who is registered with the Engineering Council as having the scientific and techni...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how much do personal assistants make</code> | <code>['States That Pay the Most. Without exception, the personal assistants wanting to earn the highest dollar amount per year should live on the East Coast. New York tops the list, with an annual salary range of over $66,000 per year or just over $31.00 per hour. PAs in Maryland make the least per year at nearly $59,000. According to the Bureau of Labor Statistics, as of 2013, executive assistants/secretaries earn the highest salaries at nearly $51,870 per year on average. Other highly trained assistants include legal and medical secretaries, who can expect to earn just over $45,000 and $33,000 respectively.', "Before an agreement on pay is reached, research the national and local averages for full time personal assistant pay. According to the US Bureau of Labor Statistics (BLS), an assistant in California made an average hourly rate of $27.01 as of May 2012, while the same position in Florida earned a rate of $20.60. Determine what your budget for an assistant is and compare that number t...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

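Taken together, these non-default values correspond to a training script along the following lines. This is a hedged sketch, not the exact script used: it assumes the `CrossEncoderTrainer` / `CrossEncoderTrainingArguments` API from recent Sentence Transformers releases, the `output_dir` is hypothetical, and the one-row `Dataset` is a toy stand-in for the processed ms_marco data with the `query`/`docs`/`labels` columns described above.

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import ListNetLoss

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
loss = ListNetLoss(model, mini_batch_size=16)

# Toy stand-in for the processed ms_marco split (query / docs / labels columns)
train_dataset = Dataset.from_dict({
    "query": ["How many calories in an egg"],
    "docs": [["There are 55 to 80 calories in an egg.", "Eggs cost about three dollars a dozen."]],
    "labels": [[1, 0]],
})

args = CrossEncoderTrainingArguments(
    output_dir="reranker-msmarco-listnet",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
)
trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # the real run used a held-out 1,000-sample split
    loss=loss,
)
# trainer.train()
```
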
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0300 (-0.5104)         | 0.2528 (-0.0723)          | 0.0168 (-0.4839)     | 0.0999 (-0.3555)           |
| 0.0002     | 1        | 2.0665        | -               | -                        | -                         | -                    | -                          |
| 0.0508     | 250      | 2.0907        | -               | -                        | -                         | -                    | -                          |
| 0.1016     | 500      | 2.0889        | 2.0762          | 0.4880 (-0.0524)         | 0.3157 (-0.0094)          | 0.5145 (+0.0139)     | 0.4394 (-0.0160)           |
| 0.1525     | 750      | 2.0817        | -               | -                        | -                         | -                    | -                          |
| 0.2033     | 1000     | 2.0771        | 2.0739          | 0.5346 (-0.0058)         | 0.3581 (+0.0331)          | 0.5875 (+0.0869)     | 0.4934 (+0.0380)           |
| 0.2541     | 1250     | 2.0813        | -               | -                        | -                         | -                    | -                          |
| 0.3049     | 1500     | 2.0730        | 2.0730          | 0.5088 (-0.0316)         | 0.3440 (+0.0189)          | 0.5719 (+0.0713)     | 0.4749 (+0.0195)           |
| 0.3558     | 1750     | 2.0698        | -               | -                        | -                         | -                    | -                          |
| 0.4066     | 2000     | 2.0752        | 2.0725          | 0.5421 (+0.0017)         | 0.3741 (+0.0490)          | 0.6318 (+0.1311)     | 0.5160 (+0.0606)           |
| 0.4574     | 2250     | 2.0730        | -               | -                        | -                         | -                    | -                          |
| 0.5082     | 2500     | 2.0712        | 2.0725          | 0.5311 (-0.0094)         | 0.3506 (+0.0256)          | 0.6258 (+0.1252)     | 0.5025 (+0.0471)           |
| 0.5591     | 2750     | 2.0682        | -               | -                        | -                         | -                    | -                          |
| 0.6099     | 3000     | 2.0738        | 2.0727          | 0.5682 (+0.0277)         | 0.3634 (+0.0384)          | 0.6241 (+0.1235)     | 0.5186 (+0.0632)           |
| 0.6607     | 3250     | 2.0702        | -               | -                        | -                         | -                    | -                          |
| 0.7115     | 3500     | 2.0722        | 2.0721          | 0.5591 (+0.0187)         | 0.3563 (+0.0312)          | 0.6453 (+0.1446)     | 0.5202 (+0.0649)           |
| 0.7624     | 3750     | 2.0714        | -               | -                        | -                         | -                    | -                          |
| **0.8132** | **4000** | **2.0632**    | **2.0724**      | **0.5497 (+0.0093)**     | **0.3736 (+0.0485)**      | **0.6574 (+0.1568)** | **0.5269 (+0.0715)**       |
| 0.8640     | 4250     | 2.0681        | -               | -                        | -                         | -                    | -                          |
| 0.9148     | 4500     | 2.0660        | 2.0720          | 0.5510 (+0.0106)         | 0.3718 (+0.0468)          | 0.6483 (+0.1476)     | 0.5237 (+0.0683)           |
| 0.9656     | 4750     | 2.0736        | -               | -                        | -                         | -                    | -                          |
| -1         | -1       | -             | -               | 0.5497 (+0.0093)         | 0.3736 (+0.0485)          | 0.6574 (+0.1568)     | 0.5269 (+0.0715)           |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.238 kWh
- **Carbon Emitted**: 0.092 kg of CO2
- **Hours Used**: 0.977 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListNetLoss
```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
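
The `architectures` and `sentence_transformers.activation_fn` entries above mean the checkpoint can also be used without the Sentence Transformers wrapper. A minimal sketch with plain 🤗 Transformers, applying the sigmoid manually since the `CrossEncoder` wrapper normally does it for you:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer(
    "How many calories in an egg",
    "There are on average between 55 and 80 calories in an egg.",
    return_tensors="pt", truncation=True, max_length=512,
)
with torch.no_grad():
    logits = model(**inputs).logits         # shape (1, 1): one relevance logit
score = torch.sigmoid(logits)[0, 0].item()  # should match CrossEncoder.predict
print(score)
```
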
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:2681c144a0130fccc27fc1b1d16e3cfc2ef64b60fc3fd36bd7943070d397bd91
size 133464836
special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
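
This is the standard uncased `BertTokenizer` configuration: a query/passage pair is packed into one `[CLS] query [SEP] passage [SEP]` sequence, with `token_type_ids` separating the two segments and `model_max_length` capping it at 512 tokens. A quick sketch to inspect the encoding:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-seeded"
)
encoded = tokenizer("How many calories in an egg", "About 55 to 80, depending on size.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'how', 'many', ..., '[SEP]', 'about', '55', ..., '[SEP]']
print(encoded["token_type_ids"])  # 0s for the query segment, 1s for the passage
```
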
vocab.txt ADDED
The diff for this file is too large to render. See raw diff