tomaarsen HF Staff committed on
Commit dedfedb · verified · 1 Parent(s): 7898cf6

Add new CrossEncoder model
README.md ADDED
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:78704
- loss:ListMLELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 86.38436543185088
  energy_consumed: 0.22223802664213427
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 0.721
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.3712
      name: Map
    - type: mrr@10
      value: 0.359
      name: Mrr@10
    - type: ndcg@10
      value: 0.433
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.2849
      name: Map
    - type: mrr@10
      value: 0.4289
      name: Mrr@10
    - type: ndcg@10
      value: 0.2706
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.4117
      name: Map
    - type: mrr@10
      value: 0.4104
      name: Mrr@10
    - type: ndcg@10
      value: 0.466
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.3559
      name: Map
    - type: mrr@10
      value: 0.3994
      name: Mrr@10
    - type: ndcg@10
      value: 0.3898
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listmle")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
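
The `rank` call above is essentially a score-then-sort over the candidate passages. As a rough stdlib illustration of the shape of its output (not the library's actual implementation), assuming the pairwise scores have already been computed with hypothetical placeholder values:

```python
# Hypothetical scores for the three passages above (placeholders, not real model output).
scores = [0.92, 0.31, 0.56]

def rank_by_score(scores):
    """Return [{'corpus_id': ..., 'score': ...}] sorted by descending score,
    mirroring the structure that CrossEncoder.rank returns."""
    return sorted(
        ({"corpus_id": i, "score": s} for i, s in enumerate(scores)),
        key=lambda entry: entry["score"],
        reverse=True,
    )

ranks = rank_by_score(scores)
print([entry["corpus_id"] for entry in ranks])
# [0, 2, 1]
```

Each `corpus_id` indexes back into the list of candidate documents, so the top entry is the passage the model considers most relevant to the query.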

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.3712 (-0.1184)     | 0.2849 (+0.0239)     | 0.4117 (-0.0079)     |
| mrr@10      | 0.3590 (-0.1185)     | 0.4289 (-0.0709)     | 0.4104 (-0.0163)     |
| **ndcg@10** | **0.4330 (-0.1074)** | **0.2706 (-0.0545)** | **0.4660 (-0.0347)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.3559 (-0.0341)     |
| mrr@10      | 0.3994 (-0.0686)     |
| **ndcg@10** | **0.3898 (-0.0655)** |
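
The reported ranking metrics can be computed from a single ranked list with a few lines of standard Python. A minimal sketch (toy relevance judgments for one query, not the evaluator's actual code, which averages over many queries):

```python
import math

def mrr_at_k(ranked_rels, k=10):
    # Reciprocal rank of the first relevant document within the top k.
    for i, rel in enumerate(ranked_rels[:k]):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0

def dcg_at_k(ranked_rels, k=10):
    # Discounted cumulative gain: relevance discounted by log2 of the rank.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_rels[:k]))

def ndcg_at_k(ranked_rels, k=10):
    # DCG normalized against the ideal (sorted-by-relevance) ordering.
    idcg = dcg_at_k(sorted(ranked_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / idcg if idcg > 0 else 0.0

# Toy example: the single relevant document is ranked second.
rels = [0, 1, 0, 0]
print(mrr_at_k(rels))   # 0.5
print(ndcg_at_k(rels))  # 1/log2(3) ≈ 0.631
```

The values in parentheses in the tables above are the deltas against the retrieval-only baseline at rerank depth 100.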

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | docs                                                                                    | labels                                                                                  |
  |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                          | list                                                                                    | list                                                                                    |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.89 characters</li><li>max: 101 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul>  |
* Samples:
  | query                                            | docs | labels |
  |:-------------------------------------------------|:-----|:-------|
  | <code>elysia meaning origin</code>               | <code>['Meaning of Elysia. Latin-American name. In Latin-American, the name Elysia means-the blessed home.The name Elysia originated as an Latin-American name. The name Elysia is most often used as a girl name or female name. Latin-American Name Meaning-the blessed home. Origin-Latin-America. ', 'The Greek name Elysia means-sweet; blissful. Mythology: Elysium was the dwelling place of happy souls. ', 'Here are pictures of people with the name Elysia. Help us put a face to the name by uploading your pictures to BabyNames.com! ', 'The meaning of Elyssa has more than one different etymologies. It has same or different meanings in other countries and languages. The different meanings of the name Elyssa are: 1 Hebrew meaning: My God is a vow. 2 Greek meaning: My God is a vow. 3 English meaning: My God is a vow.', 'Elysia is a rare given name for women. Elysia is an equally unique last name for all people. (2000 U.S. Census). Displayed below is an analysis of the popularity of the girl name Ely...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what zone is highgate station</code>       | <code>["For the station known from 1907 to 1939 as Highgate, see Archway tube station. Highgate is a London Underground station and former railway station in Archway Road, in the London Borough of Haringey in north London. The station takes its name from nearby Highgate Village. It is on the High Barnet branch of the Northern line, between Archway and East Finchley stations and is in Travelcard Zone 3. The station was originally opened in 1867 as part of the Great Northern Railway 's line between Finsbury Park and Edgware stations. Highgate station was originally constructed by the Edgware, Highgate and London Railway in the 1860s on its line from Finsbury Park station to Edgware station.", "At the time of the station's construction the first cable car in Europe operated non-stop up Highgate Hill to the village from outside the Archway Tavern, and this name was also considered for the station. It is located underneath the Archway Tower, at the intersection of Holloway Road, Highgate Hill, Ju...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how much does thyroid surgery cost</code>  | <code>['1 The price of thyroidectomy depends on the location where the surgery will be performed. 2 The cost can also differ depending on the experience and skill of the physician that will perform the surgery. 3 This is due to the boost in their reputation for the surgeries that they have performed. 1 On average, this procedure can cost anywhere from $16,000 to as much as $65,000 without any type of health insurance. 2 SurgeryCosts.net offers information to people who want to know more about', '1 For example, a one-month supply of the generic anti-thyroid drug methimazole costs about $30-$120, depending on the dose -- or, about $360-$1,440 a year. 2 And a one-month supply of the brand-name drug Tapazole costs about $90-$150 or more, depending on the dose -- or, about $1,080-$1,800 per year. 1 After the thyroid is destroyed by a radioactive iodine treatment or surgically removed, the patient typically needs to take thyroid hormone replacement such as levothyroxine, which typically costs ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": null,
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
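
ListMLE maximizes the likelihood of the ground-truth document ordering under a Plackett-Luce model of the predicted scores: at each position, the correct next document is "chosen" via a softmax over the documents not yet placed. A rough pure-Python sketch of the objective (the actual loss in `sentence_transformers.cross_encoder.losses` is a batched PyTorch implementation; this omits numerical-stability tricks like subtracting the max score):

```python
import math

def list_mle_loss(scores_in_true_order):
    """Negative log-likelihood of the ground-truth permutation under a
    Plackett-Luce model: step i picks item i with probability
    softmax(scores of the remaining items)."""
    loss = 0.0
    for i in range(len(scores_in_true_order)):
        remaining = scores_in_true_order[i:]
        log_norm = math.log(sum(math.exp(s) for s in remaining))
        loss += log_norm - scores_in_true_order[i]
    return loss

# Scores that already agree with the true order receive a lower loss.
good = list_mle_loss([3.0, 2.0, 1.0])  # model ranks the true order correctly
bad = list_mle_loss([1.0, 2.0, 3.0])   # model ranks it exactly backwards
print(good < bad)  # True
```

With `respect_input_order: true` above, the loss treats the order in which `docs` are provided (relevant passages first) as the ground-truth permutation.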

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                          | docs                                                                                   | labels                                                                                 |
  |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
  | type    | string                                                                                         | list                                                                                   | list                                                                                   |
  | details | <ul><li>min: 9 characters</li><li>mean: 33.94 characters</li><li>max: 99 characters</li></ul>  | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query                                               | docs | labels |
  |:----------------------------------------------------|:-----|:-------|
  | <code>what are some facts penguin enemies</code>    | <code>['Penguins are social birds. Many species feed, swim and nest in groups. During the breeding season, some species form large groups, or “rookeries”, that include thousands of penguins. Each penguin has a distinct call, allowing individuals to find their mate and their chicks even in large groups. ', 'Breeding | Gentoo Penguin Facts. Gentoo penguins are commonly found to breed across sub-Antarctic islands. Some of the notable colonies include Kerguelen islands, Falkland islands, and South Georgia, with fewer numbers also inhabit in the Heard Islands, Macquarie Islands, Antarctic Peninsula, and South Shetland Islands. How about summarizing some of the most interesting and rarely known gentoo penguin facts such as gentoo penguins habitat, diet, breeding, and predators. The gentoo penguins are simply characterized by the broad white stripe extending like a bonnet across the top of its head', 'Predators | Gentoo Penguin Facts. Gentoo penguins are often fall to predators such as leopard seal...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>oral surface definition zoology</code>        | <code>['oral. adj. 1. spoken or verbal: an oral agreement. 2. (Medicine) relating to, affecting, or for use in the mouth: an oral thermometer. 3. (Zoology) of or relating to the surface of an animal, such as a jellyfish, on which the mouth is situated. 4. (Medicine) denoting a drug to be taken by mouth Compare parenteral: an oral contraceptive.', 'In a medusa, the oral surface and tentacles face downward. The body of a medusa is typically bell-shaped or umbrella-shaped, and medusae are free-swimming. In a typical medusa, the margins of the bell extend to form a shelf called the velum, which partially closes the open side of the bell.', "Definition of ORAL ARM. : one of the prolongations of the distal end of the manubrium of a jellyfish. ADVERTISEMENT. This word doesn't usually appear in our free dictionary, but the definition from our premium Unabridged Dictionary is offered here on a limited basis.", 'See also occlusal surface. labial surface the vestibular surface of the incisors and canin...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what year was the protect act enacted</code>  | <code>["The PROTECT Act of 2003 (Pub.L. 108–21, 117 Stat. 650, S. 151, enacted April 30, 2003) is a United States law with the stated intent of preventing child abuse. PROTECT is a backronym which stands for P rosecutorial R emedies and O ther T ools to end the E xploitation of C hildren T oday. The Department of Justice appealed the Eleventh Circuit's ruling to the U.S. Supreme Court. The Supreme Court reversed the Eleventh Circuit's ruling in May 2008 and upheld this portion of the act.", 'Copyright Renewal Act of 1992, title I of the Copyright Amendments Act of 1992, Pub. L. No. 102-307, 106 Stat. 264 (amending chapter 3, title 17 of the United States Code, by providing for automatic renewal of copyright for works copyrighted between January 1, 1964, and December 31, 1977), enacted June 26, 1992. [Amendments to the Semiconductor Chip Protection Act of 1984], Pub. L. No. 100-159, 101 Stat. 899 (amending chapter 9, title 17, United States Code, regarding protection extended to semiconducto...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": null,
      "activation_fct": "torch.nn.modules.linear.Identity",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0536 (-0.4868)         | 0.3415 (+0.0165)          | 0.0633 (-0.4373)     | 0.1528 (-0.3026)           |
| 0.0002     | 1        | 15.0387       | -               | -                        | -                         | -                    | -                          |
| 0.0508     | 250      | 13.8424       | -               | -                        | -                         | -                    | -                          |
| 0.1016     | 500      | 12.2432       | 12.1961         | 0.0338 (-0.5066)         | 0.3357 (+0.0107)          | 0.0687 (-0.4319)     | 0.1461 (-0.3093)           |
| 0.1525     | 750      | 12.2166       | -               | -                        | -                         | -                    | -                          |
| 0.2033     | 1000     | 12.1697       | 12.1567         | 0.0286 (-0.5118)         | 0.3049 (-0.0202)          | 0.0311 (-0.4696)     | 0.1215 (-0.3339)           |
| 0.2541     | 1250     | 12.1288       | -               | -                        | -                         | -                    | -                          |
| 0.3049     | 1500     | 12.1364       | 12.1497         | 0.0389 (-0.5015)         | 0.2523 (-0.0727)          | 0.0284 (-0.4722)     | 0.1065 (-0.3488)           |
| 0.3558     | 1750     | 12.1556       | -               | -                        | -                         | -                    | -                          |
| 0.4066     | 2000     | 12.134        | 12.1342         | 0.1969 (-0.3435)         | 0.2295 (-0.0955)          | 0.2666 (-0.2340)     | 0.2310 (-0.2244)           |
| 0.4574     | 2250     | 12.1346       | -               | -                        | -                         | -                    | -                          |
| 0.5082     | 2500     | 12.0789       | 12.1369         | 0.2381 (-0.3023)         | 0.2086 (-0.1164)          | 0.3112 (-0.1895)     | 0.2526 (-0.2027)           |
| 0.5591     | 2750     | 12.1796       | -               | -                        | -                         | -                    | -                          |
| 0.6099     | 3000     | 12.122        | 12.1233         | 0.2978 (-0.2426)         | 0.2211 (-0.1039)          | 0.3967 (-0.1039)     | 0.3052 (-0.1501)           |
| 0.6607     | 3250     | 12.1834       | -               | -                        | -                         | -                    | -                          |
| 0.7115     | 3500     | 12.11         | 12.1241         | 0.3919 (-0.1486)         | 0.2391 (-0.0860)          | 0.4388 (-0.0619)     | 0.3566 (-0.0988)           |
| 0.7624     | 3750     | 12.1394       | -               | -                        | -                         | -                    | -                          |
| **0.8132** | **4000** | **12.0582**   | **12.1232**     | **0.4330 (-0.1074)**     | **0.2706 (-0.0545)**      | **0.4660 (-0.0347)** | **0.3898 (-0.0655)**       |
| 0.8640     | 4250     | 12.152        | -               | -                        | -                         | -                    | -                          |
| 0.9148     | 4500     | 12.0818       | 12.1178         | 0.4173 (-0.1232)         | 0.2749 (-0.0502)          | 0.4767 (-0.0240)     | 0.3896 (-0.0658)           |
| 0.9656     | 4750     | 12.1172       | -               | -                        | -                         | -                    | -                          |
| -1         | -1       | -             | -               | 0.4330 (-0.1074)         | 0.2706 (-0.0545)          | 0.4660 (-0.0347)     | 0.3898 (-0.0655)           |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.222 kWh
- **Carbon Emitted**: 0.086 kg of CO2
- **Hours Used**: 0.721 hours

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.1
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss
```bibtex
@inproceedings{lan2013position,
  title={Position-aware ListMLE: a sequential learning process for ranking},
  author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
  booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
  pages={333--342},
  year={2013}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:2022dc93e4f55700ff3ce300cc2f701962a3ccfc34d6694a3e79b420e25b0a2f
size 133464836
special_tokens_map.json ADDED
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff