Commit ca7bac9 by yjoonjang (verified, parent 9390c2b): Add new CrossEncoder model
README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:78704
- loss:ListMLELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.4636
      name: Map
    - type: mrr@10
      value: 0.45
      name: Mrr@10
    - type: ndcg@10
      value: 0.5191
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3174
      name: Map
    - type: mrr@10
      value: 0.4912
      name: Mrr@10
    - type: ndcg@10
      value: 0.3169
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.57
      name: Map
    - type: mrr@10
      value: 0.5739
      name: Mrr@10
    - type: ndcg@10
      value: 0.6383
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.4503
      name: Map
    - type: mrr@10
      value: 0.5051
      name: Mrr@10
    - type: ndcg@10
      value: 0.4915
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
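
The same scores can also be computed without sentence-transformers, through the plain `transformers` API. This is a minimal sketch, assuming only what the exported configuration states: a `BertForSequenceClassification` checkpoint with a single output label and a sigmoid activation over the logit.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

query = "How many calories in an egg"
docs = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
]

# A cross-encoder reads the query and the candidate document as one joint input
inputs = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits.squeeze(-1)

# Apply the sigmoid activation recorded in config.json to map logits into [0, 1]
scores = torch.sigmoid(logits)
print(scores.tolist())
```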

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4636 (-0.0260)     | 0.3174 (+0.0564)     | 0.5700 (+0.1504)     |
| mrr@10      | 0.4500 (-0.0275)     | 0.4912 (-0.0086)     | 0.5739 (+0.1472)     |
| **ndcg@10** | **0.5191 (-0.0213)** | **0.3169 (-0.0081)** | **0.6383 (+0.1377)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4503 (+0.0603)     |
| mrr@10      | 0.5051 (+0.0371)     |
| **ndcg@10** | **0.4915 (+0.0361)** |
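
These aggregate numbers should be reproducible with the evaluator named above. A minimal sketch, assuming the `CrossEncoderNanoBEIREvaluator` API from recent sentence-transformers releases and mirroring the parameters recorded in the JSON (the evaluator downloads the NanoBEIR subsets itself):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid")

# Rerank the top 100 candidates per query and report metrics at rank 10
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)
results = evaluator(model)
print(results)  # per-dataset map, mrr@10 and ndcg@10, plus their means
```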

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | docs                                                                                    | labels                                                                                  |
  |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                            | list                                                                                      | list                                                                                      |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.74 characters</li><li>max: 100 characters</li></ul>   | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>    | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>    |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>cost of installing central air</code> | <code>['Central Air Average Costs. The actual cost of central air installation depends on a number of factors, including the size of the home as well as the unit’s tonnage and SEER rating. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 2 If ductwork is additionally required, costs could reach $6,000 to $10,000 or more. 3 Mini-split central air conditioner prices average $1,500 to $3', 'For example, homes with forced hot air heating will have the duct work necessary for a fast and easy installation, when the project involves the running of ducts however the prices climb significantly. The average price to install a central air conditioner will range from $2650 to upwards of $15K. This installation cannot be considered a DIY project, and it is traditional for a homeowner to hire a contractor for the job. Central ai...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how much does it cost to set up a cabinet shop</code> | <code>['According to Kennedy, most cabinets range from $500 to $1,500 per cabinet box. Based on an estimated 30 cabinets in an average-size kitchen, you can be looking at a cost of about $15,000-$45,000, she says. Discover everything you need to know about cabinets with our free guide! 1. Measure the dimensions of your kitchen', "December 28, 2005 Question Those of you who consider your operation small, what type of machinery is the minimum for what you do? I'm starting a one man shop, 2,400 square feet, and know what I would like to have to start, but am curious how the rest of you get by. A simple streamlined operation that worked for professional builders, and sell some to DIYers for a retail price. I am a one man shop that builds cabinets, furniture and exterior/interior doors. My shop is 1600 sq ft with 300 sq ft of it being a small spray room.", 'Seven years later and I moved out of the garage to a more legitimate setting in an industrial park. Today, 25 years after starting out, my co...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how close can a gas meter be to a condensing unit</code> | <code>['Is it dangerous if it is close to the gas meter/pipe? Thanks! It should be 3 feet from the gas meter vent, and not the actual gas meter itself. The gas company can come out later to extend this vent further away from the meter if it is within 3 feet. But the chance that anything actually happening because of the ac too close to the vent is insanely remote. I would be more worried about getting hit by lightning than any problems with the gas.', 'Condensing Unit Too Close to House – Bad air conditioner installation jobs such as this one proves that it is in the best interest of the homeowner to hire competent HVAC air conditioner and heating installers so that the job is done correctly.', "Re: Condensing furnace Exhaust, Distances from window, electric and gas meters. Joel, 3 ft from operable window is what I have on the Electrical Service. Gas meter looks OK. Install instructions in your post says if below 100,000 btu clearance is 12', and 36' if over 100,000 btu.", 'Condensing Unit Too Close to House. This condensing unit was too close to the house to effectively reject heat. It was a bad HVAC condensing unit installation job by the HVAC installers. A mechanical inspector rejected the final for the permit until the condensing unit was correctly installed. It is recommended that condensing units have at least 2 feet of space so that it can']</code> | <code>[1, 0, 0, 0]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
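
For intuition: ListMLE treats the labels as defining a target ordering π of the documents for each query and minimizes the negative Plackett-Luce log-likelihood of that ordering under the model's scores. This is the standard formulation from Lan et al. (2013), cited at the bottom of this card, rather than anything recorded by the training run itself:

```latex
% Negative log-likelihood of the target permutation \pi given scores s_1..s_n,
% with a position-dependent weight \alpha(i) (the "lambda weight").
% Plain ListMLE uses \alpha(i) = 1; the position-aware variant weights early
% ranks more heavily, so mistakes at the top of the list cost more.
\mathcal{L}(s, \pi) = - \sum_{i=1}^{n} \alpha(i) \,
    \log \frac{\exp\big(s_{\pi(i)}\big)}{\sum_{j=i}^{n} \exp\big(s_{\pi(j)}\big)}
```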

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                                                                                    | labels                                                                                  |
  |:--------|:-------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                             | list                                                                                      | list                                                                                      |
  | details | <ul><li>min: 11 characters</li><li>mean: 34.38 characters</li><li>max: 99 characters</li></ul>     | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul>    | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul>    |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>how long does an iva stay on your credit file</code> | <code>['For example your payments to your mobile phone (if you’re on a contract) and electricity companies will also appear in your credit report. Your IVA will show on your credit file for six years from the day it started. So if your IVA was five years long it will only be listed on your credit file for a further 12 months. The idea behind asking creditors to correct the dates on default notices is to make sure that these too will be gone within 12 months. Post IVA credit file clean up. It’s a happy day when your individual voluntary arrangement (IVA) finally ends, you’re well and truly free and clear and your money is your own again. You can also take satisfaction from the fact that you have done your best by your creditors.', 'LinkedIn0. An Individual Voluntary Arrangement (IVA) is recorded on your credit file for 6 years. During this time your credit rating will be negatively affected. Unfortunately your credit rating will not suddenly become good again after your Arrangement has ended ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>Plants which produce their gametes in flowers are called what?</code> | <code>['Plants which produce their gametes in flowers are called: antheridium, gymnosperms, angiosperms, or vascular. They are called angiosperms.', 'In humans, cells that do not produce gametes are collectively called somatic cells. Somatic cells do not include sperm and ova, the cells from which they are made, and und … ifferentiated stem cells.', 'This event is called fertilization. The male gametes produced by animals and some plants (e.g., club mosses, horsetails, ferns) are called spermatozoa (plural of spermatozoon), or simply sperm. Their female gametes are called ova (plural of ovum). Ova are often called eggs. Most plants produce male gametes called pollen grains.', 'Unlike animals, plants have multicellular haploid and multicellular diploid stages in their life cycle. Gametes develop from the multicellular haploid gametophytes (Greek phyton, plant). Fertilization gives rise to a multicellular, diploid sporophyte that produces haploid spores via meiosis.', 'Original conversation...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what is a dts sound system</code> | <code>['DTS is a series of multichannel audio technologies owned by DTS, Inc. (formerly known as D igital T heater S ystems, Inc.), an American company specializing in digital surround sound formats used for both commercial/theatrical and consumer grade applications. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, but DTS was not part of the original DVD specification, so early DVD players do not recognize DTS audio tracks at all.', 'DTS Connect is a blanket name for a two-part system used on the computer platform only, in order to convert PC audio into the DTS format, transported via a single S/PDIF cable. The two components of the system are DTS Interactive and DTS Neo:PC. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, bu...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
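
Putting the dataset, loss, and hyperparameters above together, a training run of this shape could be reconstructed roughly as follows. This is a minimal sketch, not the author's script: it assumes the v4-style `CrossEncoderTrainer` API, and the `to_listwise` preprocessing is a guess at how the `(query, docs, labels)` columns were derived from MS MARCO v1.1's `passages` field.

```python
import torch
from datasets import load_dataset
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder import CrossEncoderTrainer, CrossEncoderTrainingArguments
from sentence_transformers.cross_encoder.losses import ListMLELoss

# Hypothetical mapping from raw MS MARCO v1.1 rows to the documented columns.
def to_listwise(example):
    return {
        "query": example["query"],
        "docs": example["passages"]["passage_text"],
        "labels": example["passages"]["is_selected"],
    }

raw_train = load_dataset("microsoft/ms_marco", "v1.1", split="train")
train_dataset = (
    raw_train.map(to_listwise, remove_columns=raw_train.column_names)
    .filter(lambda ex: sum(ex["labels"]) > 0)  # keep queries with at least one positive
)
raw_eval = load_dataset("microsoft/ms_marco", "v1.1", split="validation").select(range(1000))
eval_dataset = raw_eval.map(to_listwise, remove_columns=raw_eval.column_names)

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
# Parameter names follow the loss JSON recorded above; the position-aware
# lambda weight is left at the library default here.
loss = ListMLELoss(model, activation_fct=torch.nn.Sigmoid(), mini_batch_size=16, respect_input_order=True)

args = CrossEncoderTrainingArguments(
    output_dir="reranker-msmarco-MiniLM-L12",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,
    eval_strategy="steps",
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```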

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0407 (-0.4997)         | 0.2816 (-0.0435)          | 0.0231 (-0.4775)     | 0.1151 (-0.3402)           |
| 0.0002     | 1        | 883.6996      | -               | -                        | -                         | -                    | -                          |
| 0.0508     | 250      | 921.6613      | -               | -                        | -                         | -                    | -                          |
| 0.1016     | 500      | 904.6479      | 856.3090        | 0.1094 (-0.4310)         | 0.2034 (-0.1216)          | 0.2049 (-0.2957)     | 0.1726 (-0.2828)           |
| 0.1525     | 750      | 900.1757      | -               | -                        | -                         | -                    | -                          |
| 0.2033     | 1000     | 892.1912      | 847.0684        | 0.3615 (-0.1789)         | 0.2856 (-0.0394)          | 0.5605 (+0.0598)     | 0.4025 (-0.0528)           |
| 0.2541     | 1250     | 891.0896      | -               | -                        | -                         | -                    | -                          |
| 0.3049     | 1500     | 882.4826      | 844.2736        | 0.4446 (-0.0959)         | 0.3072 (-0.0178)          | 0.6115 (+0.1108)     | 0.4544 (-0.0009)           |
| 0.3558     | 1750     | 878.0654      | -               | -                        | -                         | -                    | -                          |
| 0.4066     | 2000     | 878.2091      | 840.3965        | 0.4614 (-0.0791)         | 0.3450 (+0.0200)          | 0.6472 (+0.1466)     | 0.4845 (+0.0292)           |
| 0.4574     | 2250     | 878.5553      | -               | -                        | -                         | -                    | -                          |
| 0.5082     | 2500     | 877.2454      | 841.2769        | 0.4602 (-0.0802)         | 0.3123 (-0.0127)          | 0.5765 (+0.0759)     | 0.4497 (-0.0057)           |
| 0.5591     | 2750     | 864.5746      | -               | -                        | -                         | -                    | -                          |
| 0.6099     | 3000     | 899.3305      | 838.2897        | 0.4752 (-0.0652)         | 0.3152 (-0.0099)          | 0.6333 (+0.1326)     | 0.4746 (+0.0192)           |
| 0.6607     | 3250     | 870.9701      | -               | -                        | -                         | -                    | -                          |
| **0.7115** | **3500** | **873.4406**  | **835.9516**    | **0.5191 (-0.0213)**     | **0.3169 (-0.0081)**      | **0.6383 (+0.1377)** | **0.4915 (+0.0361)**       |
| 0.7624     | 3750     | 882.9871      | -               | -                        | -                         | -                    | -                          |
| 0.8132     | 4000     | 881.5676      | 836.2292        | 0.5024 (-0.0380)         | 0.3269 (+0.0019)          | 0.6350 (+0.1343)     | 0.4881 (+0.0327)           |
| 0.8640     | 4250     | 884.8231      | -               | -                        | -                         | -                    | -                          |
| 0.9148     | 4500     | 875.8995      | 834.7368        | 0.5028 (-0.0376)         | 0.3284 (+0.0034)          | 0.6200 (+0.1193)     | 0.4837 (+0.0284)           |
| 0.9656     | 4750     | 868.8395      | -               | -                        | -                         | -                    | -                          |
| -1         | -1       | -             | -               | 0.5191 (-0.0213)         | 0.3169 (-0.0081)          | 0.6383 (+0.1377)     | 0.4915 (+0.0361)           |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss
```bibtex
@inproceedings{lan2013position,
    title={Position-aware ListMLE: a sequential learning process for ranking},
    author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages={333--342},
    year={2013}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:e497671f77b7d5f585a6a97a111d0e211eac9c7afd3c21cfb343e5c26e323a90
size 133464836
special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff