tomaarsen HF Staff committed on
Commit
cb134b6
·
verified ·
1 Parent(s): 0e52b46

Update README.md

Files changed (1)
  1. README.md +446 -136
README.md CHANGED
@@ -1,137 +1,447 @@
1
- ---
2
- tags:
3
- - sentence-transformers
4
- - cross-encoder
5
- - text-classification
6
- pipeline_tag: text-classification
7
- library_name: sentence-transformers
8
- ---
9
-
10
- # CrossEncoder
11
-
12
- This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model trained using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
13
-
14
- ## Model Details
15
-
16
- ### Model Description
17
- - **Model Type:** Cross Encoder
18
- <!-- - **Base model:** [Unknown](https://huggingface.co/unknown) -->
19
- - **Maximum Sequence Length:** 512 tokens
20
- - **Number of Output Labels:** 1 label
21
- <!-- - **Training Dataset:** Unknown -->
22
- <!-- - **Language:** Unknown -->
23
- <!-- - **License:** Unknown -->
24
-
25
- ### Model Sources
26
-
27
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
28
- - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
29
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
30
- - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
31
-
32
- ## Usage
33
-
34
- ### Direct Usage (Sentence Transformers)
35
-
36
- First install the Sentence Transformers library:
37
-
38
- ```bash
39
- pip install -U sentence-transformers
40
- ```
41
-
42
- Then you can load this model and run inference.
43
- ```python
44
- from sentence_transformers import CrossEncoder
45
-
46
- # Download from the 🤗 Hub
47
- model = CrossEncoder("tomaarsen/reranker-MiniLM-L12-msmarco-scratch")
48
- # Get scores for pairs of texts
49
- pairs = [
50
- ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
51
- ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
52
- ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
53
- ]
54
- scores = model.predict(pairs)
55
- print(scores.shape)
56
- # (3,)
57
-
58
- # Or rank different texts based on similarity to a single text
59
- ranks = model.rank(
60
- 'How many calories in an egg',
61
- [
62
- 'There are on average between 55 and 80 calories in an egg depending on its size.',
63
- 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
64
- 'Most of the calories in an egg come from the yellow yolk in the center.',
65
- ]
66
- )
67
- # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
68
- ```
69
-
70
- <!--
71
- ### Direct Usage (Transformers)
72
-
73
- <details><summary>Click to see the direct usage in Transformers</summary>
74
-
75
- </details>
76
- -->
77
-
78
- <!--
79
- ### Downstream Usage (Sentence Transformers)
80
-
81
- You can finetune this model on your own dataset.
82
-
83
- <details><summary>Click to expand</summary>
84
-
85
- </details>
86
- -->
87
-
88
- <!--
89
- ### Out-of-Scope Use
90
-
91
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
92
- -->
93
-
94
- <!--
95
- ## Bias, Risks and Limitations
96
-
97
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
98
- -->
99
-
100
- <!--
101
- ### Recommendations
102
-
103
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
104
- -->
105
-
106
- ## Training Details
107
-
108
- ### Framework Versions
109
- - Python: 3.11.6
110
- - Sentence Transformers: 3.5.0.dev0
111
- - Transformers: 4.48.3
112
- - PyTorch: 2.5.0+cu121
113
- - Accelerate: 1.3.0
114
- - Datasets: 2.20.0
115
- - Tokenizers: 0.21.0
116
-
117
- ## Citation
118
-
119
- ### BibTeX
120
-
121
- <!--
122
- ## Glossary
123
-
124
- *Clearly define terms in order to be accessible across audiences.*
125
- -->
126
-
127
- <!--
128
- ## Model Card Authors
129
-
130
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
131
- -->
132
-
133
- <!--
134
- ## Model Card Contact
135
-
136
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
137
- -->
 
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - cross-encoder
5
+ - text-classification
6
+ - generated_from_trainer
7
+ - dataset_size:2000000
8
+ - loss:BinaryCrossEntropyLoss
9
+ base_model: microsoft/MiniLM-L12-H384-uncased
10
+ pipeline_tag: text-classification
11
+ library_name: sentence-transformers
12
+ metrics:
13
+ - map
14
+ - mrr@10
15
+ - ndcg@10
16
+ co2_eq_emissions:
17
+ emissions: 194.67805160025472
18
+ energy_consumed: 0.5008413941792291
19
+ source: codecarbon
20
+ training_type: fine-tuning
21
+ on_cloud: false
22
+ cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
23
+ ram_total_size: 31.777088165283203
24
+ hours_used: 1.403
25
+ hardware_used: 1 x NVIDIA GeForce RTX 3090
26
+ model-index:
27
+ - name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
28
+ results:
29
+ - task:
30
+ type: cross-encoder-reranking
31
+ name: Cross Encoder Reranking
32
+ dataset:
33
+ name: train eval
34
+ type: train-eval
35
+ metrics:
36
+ - type: map
37
+ value: 0.6511488304623287
38
+ name: Map
39
+ - type: mrr@10
40
+ value: 0.6494007936507935
41
+ name: Mrr@10
42
+ - type: ndcg@10
43
+ value: 0.7082478541686404
44
+ name: Ndcg@10
45
+ ---
46
+
47
+ # CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
48
+
49
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
50
+
51
+ ## Model Details
52
+
53
+ ### Model Description
54
+ - **Model Type:** Cross Encoder
55
+ - **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
56
+ - **Maximum Sequence Length:** 512 tokens
57
+ - **Number of Output Labels:** 1 label
58
+ <!-- - **Training Dataset:** Unknown -->
59
+ <!-- - **Language:** Unknown -->
60
+ <!-- - **License:** Unknown -->
61
+
62
+ ### Model Sources
63
+
64
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
65
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
66
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
67
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
68
+
69
+ ## Usage
70
+
71
+ ### Direct Usage (Sentence Transformers)
72
+
73
+ First install the Sentence Transformers library:
74
+
75
+ ```bash
76
+ pip install -U sentence-transformers
77
+ ```
78
+
79
+ Then you can load this model and run inference.
80
+ ```python
81
+ from sentence_transformers import CrossEncoder
82
+
83
+ # Download from the 🤗 Hub
84
+ model = CrossEncoder("tomaarsen/reranker-MiniLM-L12-msmarco-scratch")
85
+ # Get scores for pairs of texts
86
+ pairs = [
87
+ ['how much should i pay for breast implants', 'Implant Fees. The cost of buying the implants themselves will influence the overall costs of the procedure. Silicone implants cost between $1,800 and $2,500 though this may go up to slightly over $3,000.Saline implants will cost anywhere between $1,200 and $1,600.The cost of implants varies due to size and manufacturer though there is usually not much of a difference.Anesthesia Fees. There are various options for your anesthesia and this will have a direct impact on the fee you will be charged.otal Costs of Brest Implants. With all the above factors taken into consideration, the total cost of breast implants generally are somewhere in the range of $5,000 and $15,000. However, it is best for your safety and peace of mind to avoid the lowest-charging surgeons and the ones charging very high fees.'],
88
+ ['are merrell shoes lifetime warranty', "Best Answer: Regular shoes. If your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first).f your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first)."],
89
+ ['what is the largest capacity dvd disc available', 'Insert a disc that contains files into the drive that is having the problem. Use a type of disc that is not being recognized in the drive. Good discs to use are game or software discs that were purchased from a store. Do not use music CDs. If the DVD drive can read CDs but not DVDs, insert a DVD movie.nsert a software CD (like a game or business software) into the CD/DVD drive and note what happens. If an AutoPlay window opens, the drive is able to read the disc. The data stored on the disc may still be bad, but an AutoPlay window proves that the drive can read data on the disc.'],
90
+ ['weather in dead sea', "The higher above sea level you go, (for example, the tops of mountains,) the more separated and spaced out the molecules become, which causes cold weather. This is the ACCURATE answer to how elevation affects temperature. learned the answer to this in science this year, so don't worry, it is accurate: The higher above sea level/elevation you are, the colder the temperature becomes. The reas … on for this is because there are air molecules in the air bump closer together when you are lower above sea level-that creates warm weather."],
91
+ ['who should not contribute to roth ira', 'You can contribute to a Roth at any age, even past retirement age, as long as you’re still earning taxable income. A working spouse can also contribute to a Roth IRA on behalf of a nonworking spouse. For a 401(k), the 2014 contribution limit is $17,500, unless you’re 50 or older, in which case the limit is $23,000.hen you can strategize your distributions to minimize your tax liability. You can also contribute to a traditional IRA even if you participate in an employer-sponsored retirement plan, but in some cases not all of your traditional IRA contributions will be tax deductible.'],
92
+ ]
93
+ scores = model.predict(pairs)
94
+ print(scores.shape)
95
+ # (5,)
96
+
97
+ # Or rank different texts based on similarity to a single text
98
+ ranks = model.rank(
99
+ 'how much should i pay for breast implants',
100
+ [
101
+ 'Implant Fees. The cost of buying the implants themselves will influence the overall costs of the procedure. Silicone implants cost between $1,800 and $2,500 though this may go up to slightly over $3,000.Saline implants will cost anywhere between $1,200 and $1,600.The cost of implants varies due to size and manufacturer though there is usually not much of a difference.Anesthesia Fees. There are various options for your anesthesia and this will have a direct impact on the fee you will be charged.otal Costs of Brest Implants. With all the above factors taken into consideration, the total cost of breast implants generally are somewhere in the range of $5,000 and $15,000. However, it is best for your safety and peace of mind to avoid the lowest-charging surgeons and the ones charging very high fees.',
102
+ "Best Answer: Regular shoes. If your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first).f your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first).",
103
+ 'Insert a disc that contains files into the drive that is having the problem. Use a type of disc that is not being recognized in the drive. Good discs to use are game or software discs that were purchased from a store. Do not use music CDs. If the DVD drive can read CDs but not DVDs, insert a DVD movie.nsert a software CD (like a game or business software) into the CD/DVD drive and note what happens. If an AutoPlay window opens, the drive is able to read the disc. The data stored on the disc may still be bad, but an AutoPlay window proves that the drive can read data on the disc.',
104
+ "The higher above sea level you go, (for example, the tops of mountains,) the more separated and spaced out the molecules become, which causes cold weather. This is the ACCURATE answer to how elevation affects temperature. learned the answer to this in science this year, so don't worry, it is accurate: The higher above sea level/elevation you are, the colder the temperature becomes. The reas … on for this is because there are air molecules in the air bump closer together when you are lower above sea level-that creates warm weather.",
105
+ 'You can contribute to a Roth at any age, even past retirement age, as long as you’re still earning taxable income. A working spouse can also contribute to a Roth IRA on behalf of a nonworking spouse. For a 401(k), the 2014 contribution limit is $17,500, unless you’re 50 or older, in which case the limit is $23,000.hen you can strategize your distributions to minimize your tax liability. You can also contribute to a traditional IRA even if you participate in an employer-sponsored retirement plan, but in some cases not all of your traditional IRA contributions will be tax deductible.',
106
+ ]
107
+ )
108
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
109
+ ```
110
+
111
+ <!--
112
+ ### Direct Usage (Transformers)
113
+
114
+ <details><summary>Click to see the direct usage in Transformers</summary>
115
+
116
+ </details>
117
+ -->
118
+
119
+ <!--
120
+ ### Downstream Usage (Sentence Transformers)
121
+
122
+ You can finetune this model on your own dataset.
123
+
124
+ <details><summary>Click to expand</summary>
125
+
126
+ </details>
127
+ -->
128
+
129
+ <!--
130
+ ### Out-of-Scope Use
131
+
132
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
133
+ -->
134
+
135
+ ## Evaluation
136
+
137
+ ### Metrics
138
+
139
+ #### Cross Encoder Reranking
140
+
141
+ * Datasets: `train-eval`, `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
142
+ * Evaluated with [<code>CERerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CERerankingEvaluator)
143
+
144
+ | Metric | train-eval | NanoMSMARCO | NanoNFCorpus | NanoNQ |
145
+ |:------------|:-----------|:---------------------|:---------------------|:---------------------|
146
+ | map | 0.6511 | 0.5909 (+0.1013) | 0.3364 (+0.0660) | 0.6673 (+0.2466) |
147
+ | mrr@10 | 0.6494 | 0.5862 (+0.1087) | 0.5282 (+0.0284) | 0.6862 (+0.2595) |
148
+ | **ndcg@10** | **0.7082** | **0.6658 (+0.1254)** | **0.3656 (+0.0405)** | **0.7191 (+0.2185)** |
149
+
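The reranking metrics in the table above can be reproduced from first principles. Below is a minimal pure-Python sketch of MRR@10 and NDCG@10 over a binary relevance list; the toy relevance judgments are illustrative only, not taken from the evaluation data:

```python
import math

def mrr_at_10(ranked_relevance):
    # Reciprocal rank of the first relevant document in the top 10.
    for rank, rel in enumerate(ranked_relevance[:10], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_10(ranked_relevance):
    # DCG over the top 10, normalized by the ideal (sorted) ordering.
    def dcg(rels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:10], start=1))
    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0

# Toy example: the single relevant document is reranked to position 2
relevance = [0, 1, 0, 0, 0]
print(round(mrr_at_10(relevance), 4))   # 0.5
print(round(ndcg_at_10(relevance), 4))  # 0.6309
```

The `(+…)` deltas in the table report the improvement over ranking the candidates with no reranker applied.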
150
+ #### Cross Encoder Nano BEIR
151
+
152
+ * Dataset: `NanoBEIR_mean`
153
+ * Evaluated with [<code>CENanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CENanoBEIREvaluator)
154
+
155
+ | Metric | Value |
156
+ |:------------|:---------------------|
157
+ | map | 0.5315 (+0.1380) |
158
+ | mrr@10 | 0.6002 (+0.1322) |
159
+ | **ndcg@10** | **0.5835 (+0.1281)** |
160
+
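The `NanoBEIR_mean` scores are plain averages of the three per-dataset results. A quick check against the ndcg@10 values reported per dataset:

```python
# ndcg@10 per Nano dataset, as reported in this card
per_dataset = {"NanoMSMARCO": 0.6658, "NanoNFCorpus": 0.3656, "NanoNQ": 0.7191}

mean_ndcg = sum(per_dataset.values()) / len(per_dataset)
print(round(mean_ndcg, 4))  # 0.5835
```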
161
+ <!--
162
+ ## Bias, Risks and Limitations
163
+
164
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
165
+ -->
166
+
167
+ <!--
168
+ ### Recommendations
169
+
170
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
171
+ -->
172
+
173
+ ## Training Details
174
+
175
+ ### Training Dataset
176
+
177
+ #### Unnamed Dataset
178
+
179
+ * Size: 2,000,000 training samples
180
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
181
+ * Approximate statistics based on the first 1000 samples:
182
+ | | sentence_0 | sentence_1 | label |
183
+ |:--------|:-----------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------|:------------------------------------------------|
184
+ | type | string | string | int |
185
+ | details | <ul><li>min: 7 characters</li><li>mean: 34.08 characters</li><li>max: 118 characters</li></ul> | <ul><li>min: 83 characters</li><li>mean: 342.99 characters</li><li>max: 1018 characters</li></ul> | <ul><li>0: ~81.70%</li><li>1: ~18.30%</li></ul> |
186
+ * Samples:
187
+ | sentence_0 | sentence_1 | label |
188
+ |:-------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
189
+ | <code>how much should i pay for breast implants</code> | <code>Implant Fees. The cost of buying the implants themselves will influence the overall costs of the procedure. Silicone implants cost between $1,800 and $2,500 though this may go up to slightly over $3,000.Saline implants will cost anywhere between $1,200 and $1,600.The cost of implants varies due to size and manufacturer though there is usually not much of a difference.Anesthesia Fees. There are various options for your anesthesia and this will have a direct impact on the fee you will be charged.otal Costs of Brest Implants. With all the above factors taken into consideration, the total cost of breast implants generally are somewhere in the range of $5,000 and $15,000. However, it is best for your safety and peace of mind to avoid the lowest-charging surgeons and the ones charging very high fees.</code> | <code>1</code> |
190
+ | <code>are merrell shoes lifetime warranty</code> | <code>Best Answer: Regular shoes. If your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first).f your horse doesn't have bad feet, don't put shoes on him. Shoes actually weaken the hooves due to a lack of circulation and the nail holes (they just make the hooves stronger while the shoes are on, once you take them off the hooves are weaker than they were at first).</code> | <code>0</code> |
191
+ | <code>what is the largest capacity dvd disc available</code> | <code>Insert a disc that contains files into the drive that is having the problem. Use a type of disc that is not being recognized in the drive. Good discs to use are game or software discs that were purchased from a store. Do not use music CDs. If the DVD drive can read CDs but not DVDs, insert a DVD movie.nsert a software CD (like a game or business software) into the CD/DVD drive and note what happens. If an AutoPlay window opens, the drive is able to read the disc. The data stored on the disc may still be bad, but an AutoPlay window proves that the drive can read data on the disc.</code> | <code>0</code> |
192
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss)
193
+
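`BinaryCrossEntropyLoss` scores each (query, passage) pair with the model's raw logit and compares it against the 0/1 label via sigmoid-based binary cross-entropy. A minimal pure-Python sketch of the per-pair loss in its numerically stable form (the function name `bce_with_logits` is illustrative; it mirrors `torch.nn.BCEWithLogitsLoss` for a single pair, before reduction):

```python
import math

def bce_with_logits(logit, label):
    # Stable form: max(x, 0) - x * z + log(1 + exp(-|x|)),
    # avoiding overflow in exp() for large-magnitude logits.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

# Confident and correct predictions give low loss...
print(round(bce_with_logits(4.0, 1.0), 4))   # 0.0181
print(round(bce_with_logits(-4.0, 0.0), 4))  # 0.0181
# ...while a confidently wrong prediction is penalized heavily
print(round(bce_with_logits(-4.0, 1.0), 4))  # 4.0181
```

With the ~82%/18% negative/positive label split shown above, this loss trains the model to separate relevant from irrelevant passages rather than to reproduce a calibrated score distribution.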
194
+ ### Training Hyperparameters
195
+ #### Non-Default Hyperparameters
196
+
197
+ - `eval_strategy`: steps
198
+ - `per_device_train_batch_size`: 64
199
+ - `per_device_eval_batch_size`: 64
200
+ - `num_train_epochs`: 1
201
+ - `fp16`: True
202
+
203
+ #### All Hyperparameters
204
+ <details><summary>Click to expand</summary>
205
+
206
+ - `overwrite_output_dir`: False
207
+ - `do_predict`: False
208
+ - `eval_strategy`: steps
209
+ - `prediction_loss_only`: True
210
+ - `per_device_train_batch_size`: 64
211
+ - `per_device_eval_batch_size`: 64
212
+ - `per_gpu_train_batch_size`: None
213
+ - `per_gpu_eval_batch_size`: None
214
+ - `gradient_accumulation_steps`: 1
215
+ - `eval_accumulation_steps`: None
216
+ - `torch_empty_cache_steps`: None
217
+ - `learning_rate`: 5e-05
218
+ - `weight_decay`: 0.0
219
+ - `adam_beta1`: 0.9
220
+ - `adam_beta2`: 0.999
221
+ - `adam_epsilon`: 1e-08
222
+ - `max_grad_norm`: 1
223
+ - `num_train_epochs`: 1
224
+ - `max_steps`: -1
225
+ - `lr_scheduler_type`: linear
226
+ - `lr_scheduler_kwargs`: {}
227
+ - `warmup_ratio`: 0.0
228
+ - `warmup_steps`: 0
229
+ - `log_level`: passive
230
+ - `log_level_replica`: warning
231
+ - `log_on_each_node`: True
232
+ - `logging_nan_inf_filter`: True
233
+ - `save_safetensors`: True
234
+ - `save_on_each_node`: False
235
+ - `save_only_model`: False
236
+ - `restore_callback_states_from_checkpoint`: False
237
+ - `no_cuda`: False
238
+ - `use_cpu`: False
239
+ - `use_mps_device`: False
240
+ - `seed`: 42
241
+ - `data_seed`: None
242
+ - `jit_mode_eval`: False
243
+ - `use_ipex`: False
244
+ - `bf16`: False
245
+ - `fp16`: True
246
+ - `fp16_opt_level`: O1
247
+ - `half_precision_backend`: auto
248
+ - `bf16_full_eval`: False
249
+ - `fp16_full_eval`: False
250
+ - `tf32`: None
251
+ - `local_rank`: 0
252
+ - `ddp_backend`: None
253
+ - `tpu_num_cores`: None
254
+ - `tpu_metrics_debug`: False
255
+ - `debug`: []
256
+ - `dataloader_drop_last`: False
257
+ - `dataloader_num_workers`: 0
258
+ - `dataloader_prefetch_factor`: None
259
+ - `past_index`: -1
260
+ - `disable_tqdm`: False
261
+ - `remove_unused_columns`: True
262
+ - `label_names`: None
263
+ - `load_best_model_at_end`: False
264
+ - `ignore_data_skip`: False
265
+ - `fsdp`: []
266
+ - `fsdp_min_num_params`: 0
267
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
268
+ - `fsdp_transformer_layer_cls_to_wrap`: None
269
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
270
+ - `deepspeed`: None
271
+ - `label_smoothing_factor`: 0.0
272
+ - `optim`: adamw_torch
273
+ - `optim_args`: None
274
+ - `adafactor`: False
275
+ - `group_by_length`: False
276
+ - `length_column_name`: length
277
+ - `ddp_find_unused_parameters`: None
278
+ - `ddp_bucket_cap_mb`: None
279
+ - `ddp_broadcast_buffers`: False
280
+ - `dataloader_pin_memory`: True
281
+ - `dataloader_persistent_workers`: False
282
+ - `skip_memory_metrics`: True
283
+ - `use_legacy_prediction_loop`: False
284
+ - `push_to_hub`: False
285
+ - `resume_from_checkpoint`: None
286
+ - `hub_model_id`: None
287
+ - `hub_strategy`: every_save
288
+ - `hub_private_repo`: None
289
+ - `hub_always_push`: False
290
+ - `gradient_checkpointing`: False
291
+ - `gradient_checkpointing_kwargs`: None
292
+ - `include_inputs_for_metrics`: False
293
+ - `include_for_metrics`: []
294
+ - `eval_do_concat_batches`: True
295
+ - `fp16_backend`: auto
296
+ - `push_to_hub_model_id`: None
297
+ - `push_to_hub_organization`: None
298
+ - `mp_parameters`:
299
+ - `auto_find_batch_size`: False
300
+ - `full_determinism`: False
301
+ - `torchdynamo`: None
302
+ - `ray_scope`: last
303
+ - `ddp_timeout`: 1800
304
+ - `torch_compile`: False
305
+ - `torch_compile_backend`: None
306
+ - `torch_compile_mode`: None
307
+ - `dispatch_batches`: None
308
+ - `split_batches`: None
309
+ - `include_tokens_per_second`: False
310
+ - `include_num_input_tokens_seen`: False
311
+ - `neftune_noise_alpha`: None
312
+ - `optim_target_modules`: None
313
+ - `batch_eval_metrics`: False
314
+ - `eval_on_start`: False
315
+ - `use_liger_kernel`: False
316
+ - `eval_use_gather_object`: False
317
+ - `average_tokens_across_devices`: False
318
+ - `prompts`: None
319
+ - `batch_sampler`: batch_sampler
320
+ - `multi_dataset_batch_sampler`: proportional
321
+
322
+ </details>
323
+
324
+ ### Training Logs
325
+ | Epoch | Step | Training Loss | train-eval_ndcg@10 | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
326
+ |:-----:|:-----:|:-------------:|:------------------:|:-------------------:|:--------------------:|:----------------:|:---------------------:|
327
+ | -1 | -1 | - | 0.0312 | 0.0280 (-0.5124) | 0.2260 (-0.0991) | 0.0315 (-0.4691) | 0.0952 (-0.3602) |
328
+ | 0.016 | 500 | 0.6271 | - | - | - | - | - |
329
+ | 0.032 | 1000 | 0.4867 | - | - | - | - | - |
330
+ | 0.048 | 1500 | 0.3551 | - | - | - | - | - |
331
+ | 0.064 | 2000 | 0.2768 | - | - | - | - | - |
332
+ | 0.08 | 2500 | 0.2455 | - | - | - | - | - |
333
+ | 0.096 | 3000 | 0.2186 | - | - | - | - | - |
334
+ | 0.112 | 3500 | 0.2151 | - | - | - | - | - |
335
+ | 0.128 | 4000 | 0.2002 | - | - | - | - | - |
336
+ | 0.144 | 4500 | 0.1973 | - | - | - | - | - |
337
+ | 0.16 | 5000 | 0.1928 | 0.6389 | 0.6178 (+0.0774) | 0.3541 (+0.0291) | 0.6869 (+0.1862) | 0.5529 (+0.0976) |
338
+ | 0.176 | 5500 | 0.1841 | - | - | - | - | - |
339
+ | 0.192 | 6000 | 0.1835 | - | - | - | - | - |
340
+ | 0.208 | 6500 | 0.1828 | - | - | - | - | - |
341
+ | 0.224 | 7000 | 0.1777 | - | - | - | - | - |
342
+ | 0.24 | 7500 | 0.1674 | - | - | - | - | - |
343
+ | 0.256 | 8000 | 0.1655 | - | - | - | - | - |
344
+ | 0.272 | 8500 | 0.1706 | - | - | - | - | - |
345
+ | 0.288 | 9000 | 0.1629 | - | - | - | - | - |
346
+ | 0.304 | 9500 | 0.1641 | - | - | - | - | - |
347
+ | 0.32 | 10000 | 0.1631 | 0.6859 | 0.6220 (+0.0815) | 0.3849 (+0.0598) | 0.6951 (+0.1944) | 0.5673 (+0.1119) |
348
+ | 0.336 | 10500 | 0.1616 | - | - | - | - | - |
349
+ | 0.352 | 11000 | 0.1575 | - | - | - | - | - |
350
+ | 0.368 | 11500 | 0.1565 | - | - | - | - | - |
351
+ | 0.384 | 12000 | 0.1523 | - | - | - | - | - |
352
+ | 0.4 | 12500 | 0.1628 | - | - | - | - | - |
353
+ | 0.416 | 13000 | 0.1569 | - | - | - | - | - |
354
+ | 0.432 | 13500 | 0.1581 | - | - | - | - | - |
355
+ | 0.448 | 14000 | 0.1527 | - | - | - | - | - |
356
+ | 0.464 | 14500 | 0.1484 | - | - | - | - | - |
357
+ | 0.48 | 15000 | 0.1531 | 0.6939 | 0.6455 (+0.1051) | 0.3663 (+0.0413) | 0.6977 (+0.1970) | 0.5698 (+0.1145) |
358
+ | 0.496 | 15500 | 0.1482 | - | - | - | - | - |
359
+ | 0.512 | 16000 | 0.1523 | - | - | - | - | - |
360
+ | 0.528 | 16500 | 0.1532 | - | - | - | - | - |
361
+ | 0.544 | 17000 | 0.1513 | - | - | - | - | - |
362
+ | 0.56 | 17500 | 0.1486 | - | - | - | - | - |
363
+ | 0.576 | 18000 | 0.1438 | - | - | - | - | - |
364
+ | 0.592 | 18500 | 0.1496 | - | - | - | - | - |
365
+ | 0.608 | 19000 | 0.1455 | - | - | - | - | - |
366
+ | 0.624 | 19500 | 0.1474 | - | - | - | - | - |
367
+ | 0.64 | 20000 | 0.1484 | 0.7025 | 0.6423 (+0.1019) | 0.3637 (+0.0387) | 0.7162 (+0.2156) | 0.5741 (+0.1187) |
368
+ | 0.656 | 20500 | 0.1436 | - | - | - | - | - |
369
+ | 0.672 | 21000 | 0.1427 | - | - | - | - | - |
370
+ | 0.688 | 21500 | 0.1463 | - | - | - | - | - |
371
+ | 0.704 | 22000 | 0.1475 | - | - | - | - | - |
372
+ | 0.72 | 22500 | 0.1446 | - | - | - | - | - |
373
+ | 0.736 | 23000 | 0.1424 | - | - | - | - | - |
374
+ | 0.752 | 23500 | 0.1397 | - | - | - | - | - |
375
+ | 0.768 | 24000 | 0.1405 | - | - | - | - | - |
376
+ | 0.784 | 24500 | 0.1405 | - | - | - | - | - |
377
+ | 0.8 | 25000 | 0.1397 | 0.7014 | 0.6492 (+0.1088) | 0.3672 (+0.0422) | 0.7229 (+0.2222) | 0.5798 (+0.1244) |
378
+ | 0.816 | 25500 | 0.1378 | - | - | - | - | - |
379
+ | 0.832 | 26000 | 0.1409 | - | - | - | - | - |
380
+ | 0.848 | 26500 | 0.1368 | - | - | - | - | - |
381
+ | 0.864 | 27000 | 0.1389 | - | - | - | - | - |
382
+ | 0.88 | 27500 | 0.1354 | - | - | - | - | - |
383
+ | 0.896 | 28000 | 0.1412 | - | - | - | - | - |
384
+ | 0.912 | 28500 | 0.138 | - | - | - | - | - |
385
+ | 0.928 | 29000 | 0.1369 | - | - | - | - | - |
386
+ | 0.944 | 29500 | 0.1321 | - | - | - | - | - |
387
+ | 0.96 | 30000 | 0.137 | 0.7150 | 0.6576 (+0.1172) | 0.3655 (+0.0405) | 0.7211 (+0.2204) | 0.5814 (+0.1260) |
388
+ | 0.976 | 30500 | 0.1342 | - | - | - | - | - |
389
+ | 0.992 | 31000 | 0.137 | - | - | - | - | - |
390
+ | 1.0 | 31250 | - | 0.7082 | 0.6658 (+0.1254) | 0.3656 (+0.0405) | 0.7191 (+0.2185) | 0.5835 (+0.1281) |
391
+
392
+
393
+ ### Environmental Impact
394
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
395
+ - **Energy Consumed**: 0.501 kWh
396
+ - **Carbon Emitted**: 0.195 kg of CO2
397
+ - **Hours Used**: 1.403 hours
398
+
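The two figures above jointly imply the grid carbon intensity CodeCarbon assumed for this run; a quick sanity check (grams of CO2eq per kWh):

```python
energy_kwh = 0.5008    # energy consumed, from the report above
emissions_g = 194.678  # grams of CO2eq, from the report above

intensity = emissions_g / energy_kwh
print(round(intensity, 1))  # 388.7 gCO2eq/kWh
```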
399
+ ### Training Hardware
400
+ - **On Cloud**: No
401
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
402
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
403
+ - **RAM Size**: 31.78 GB
404
+
405
+ ### Framework Versions
406
+ - Python: 3.11.6
407
+ - Sentence Transformers: 3.5.0.dev0
408
+ - Transformers: 4.48.3
409
+ - PyTorch: 2.5.0+cu121
410
+ - Accelerate: 1.3.0
411
+ - Datasets: 2.20.0
412
+ - Tokenizers: 0.21.0
413
+
414
+ ## Citation
415
+
416
+ ### BibTeX
417
+
418
+ #### Sentence Transformers
419
+ ```bibtex
420
+ @inproceedings{reimers-2019-sentence-bert,
421
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
422
+ author = "Reimers, Nils and Gurevych, Iryna",
423
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
424
+ month = "11",
425
+ year = "2019",
426
+ publisher = "Association for Computational Linguistics",
427
+ url = "https://arxiv.org/abs/1908.10084",
428
+ }
429
+ ```
430
+
431
+ <!--
432
+ ## Glossary
433
+
434
+ *Clearly define terms in order to be accessible across audiences.*
435
+ -->
436
+
437
+ <!--
438
+ ## Model Card Authors
439
+
440
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
441
+ -->
442
+
443
+ <!--
444
+ ## Model Card Contact
445
+
446
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
447
+ -->