maennyn committed (verified)
Commit 5b3d06c · 1 Parent(s): f0a40eb

Add new CrossEncoder model

Files changed (7)
  1. README.md +474 -0
  2. config.json +35 -0
  3. model.safetensors +3 -0
  4. special_tokens_map.json +37 -0
  5. tokenizer.json +0 -0
  6. tokenizer_config.json +65 -0
  7. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,474 @@
---
language:
- en
license: apache-2.0
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:69699
- loss:BinaryCrossEntropyLoss
base_model: cross-encoder/ms-marco-MiniLM-L2-v2
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- pearson
- spearman
- map
- mrr@10
- ndcg@10
model-index:
- name: cross-encoder/ms-marco-MiniLM-L2-v2 Finetuned on PV211 HomeWork
  results:
  - task:
      type: cross-encoder-correlation
      name: Cross Encoder Correlation
    dataset:
      name: sts dev
      type: sts_dev
    metrics:
    - type: pearson
      value: 0.8392209488671921
      name: Pearson
    - type: spearman
      value: 0.729809198818792
      name: Spearman
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.5685
      name: Map
    - type: mrr@10
      value: 0.557
      name: Mrr@10
    - type: ndcg@10
      value: 0.6146
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3511
      name: Map
    - type: mrr@10
      value: 0.5391
      name: Mrr@10
    - type: ndcg@10
      value: 0.3779
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.5917
      name: Map
    - type: mrr@10
      value: 0.6017
      name: Mrr@10
    - type: ndcg@10
      value: 0.645
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.5038
      name: Map
    - type: mrr@10
      value: 0.5659
      name: Mrr@10
    - type: ndcg@10
      value: 0.5459
      name: Ndcg@10
---

# cross-encoder/ms-marco-MiniLM-L2-v2 Finetuned on PV211 HomeWork

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/ms-marco-MiniLM-L2-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L2-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [cross-encoder/ms-marco-MiniLM-L2-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L2-v2) <!-- at revision da2cadf7e0af92ed9f105f41e9857437e07b51f5 -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("maennyn/pv211_beir_cqadupstack_crossencoder")
# Get scores for pairs of texts
pairs = [
    ['Do elevator upgrades increase your passive credit earnings, too?', 'I searched for a solution for this problem, but cannot find an answer (or exact replica of the problem) Basically, I set up Multisite on MAMP Pro (Apache port 80, MySQL Port 3306). The set up was smooth, and I created a new site via a subdirectory. The parent theme loads fine. I created a child theme, and it activates (it doesn\'t show a broken message). On the Appearance page it shows the message "This theme requires the parent theme", but underneath the Theme Description. However when I view the front page of the site, the page is blank, and there is no html at all. Would could possibly be the error? I spent a few hours on this already and it\'s not going really well. Code of child theme, only CSS, no functions.php or other php files in the child theme folder. /* Theme Name: Confit Child Theme Author: Automattic Template: confit Description: Confit Child Theme 1 Version: 1.0 */ @import url(\'../confit/style.css\'); * Should also mention that the parent functions are not loading either. Thanks!'],
    ['Traceback (most recent call last) error appears on terminal', "I've got a binary characteristic and a population $S$ with size $n$ and $P[X] = p$ such that $p$ may be small and $n$ is extremely large. Within this population are subpopulations of various sizes $S_0, S_1, \\dots, S_k \\subset S$. I'd like to be able to select each subpopulation in which $p_i < p$ with some concept of statistical significance. My first inclination is to observe that the standard error on each $p_i$ is $SE_i = \\sqrt{\\frac{\\hat{p_i}(1-\\hat{p_i})}{n}}$ and to compare upper bounds on confidence intervals. $\\{S_i \\; | \\; \\hat{p_i} + 3 \\cdot SE_i < p\\}$, for example. But when $\\hat{p_i} = 0$, then $SE_i = 0$, and this upper bound is 0 even for the smallest subpopulations (like those where $n_i = 1$). Is there any way to express uncertainty in $p_i$ when $\\hat{p_i} = 0$? Maybe through use of $p$ as a prior? **Edit:** It looks like the Jeffreys interval as described in Brown et al. is about what I'm after, though I'm not as-of-yet sure how to apply it."],
    ['Do I have to install a custom ROM if I root?', 'What is the difference between a battery and a charged capacitor? I can see lot of similarities between capacitor and battery. In both these charges are separated and When not connected in a circuit both can have same Potential difference `V`. The only difference is that battery runs for longer time but a capacitor discharges almost instantaneously. Why this difference? What is the exact cause for the difference in the discharge times?'],
    ['How to seprate words into two lines in one cell?', 'To me the word "curious" would be something you can be i.e. > I am curious what tomorrow will bring I recently read a text of a student I was supervising which used it as follows > A curious phenomenon is ... With which he meant to say that the phenomenon was peculiar, odd or strange. The only other case I have ever seen this is in the movie title: "The Curious Case of Benjamin Button", but that might be \'artistic freedom\' (since Curious Case has the nice C.. C..). My question is: is the usage of the word "curious" in the meaning of peculiar correct?'],
    ["Bought game on Steam, but it's not in my Library", "I'm looking to choose open source project hosting site for an F# project using SVN. CodePlex is where the .NET community in general and most F# projects are hosted, but I'm worried TFS + SvnBridge is going to give me headaches. So I'm looking elsewhere and seeking advice here. Or if you think CodePlex is still the best choice in my scenario, I'd like to hear that too. So far, Google Code is looking appealing to me. They have a clean interface and true SVN hosting. But there are close to no F# projects currently hosted (it's not even in their search by programming language list), so I'm wondering if there are any notable downsides besides the lack of community I might encounter. If there is yet another option, I'd like to hear that too. Thanks!"],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'Do elevator upgrades increase your passive credit earnings, too?',
    [
        'I searched for a solution for this problem, but cannot find an answer (or exact replica of the problem) Basically, I set up Multisite on MAMP Pro (Apache port 80, MySQL Port 3306). The set up was smooth, and I created a new site via a subdirectory. The parent theme loads fine. I created a child theme, and it activates (it doesn\'t show a broken message). On the Appearance page it shows the message "This theme requires the parent theme", but underneath the Theme Description. However when I view the front page of the site, the page is blank, and there is no html at all. Would could possibly be the error? I spent a few hours on this already and it\'s not going really well. Code of child theme, only CSS, no functions.php or other php files in the child theme folder. /* Theme Name: Confit Child Theme Author: Automattic Template: confit Description: Confit Child Theme 1 Version: 1.0 */ @import url(\'../confit/style.css\'); * Should also mention that the parent functions are not loading either. Thanks!',
        "I've got a binary characteristic and a population $S$ with size $n$ and $P[X] = p$ such that $p$ may be small and $n$ is extremely large. Within this population are subpopulations of various sizes $S_0, S_1, \\dots, S_k \\subset S$. I'd like to be able to select each subpopulation in which $p_i < p$ with some concept of statistical significance. My first inclination is to observe that the standard error on each $p_i$ is $SE_i = \\sqrt{\\frac{\\hat{p_i}(1-\\hat{p_i})}{n}}$ and to compare upper bounds on confidence intervals. $\\{S_i \\; | \\; \\hat{p_i} + 3 \\cdot SE_i < p\\}$, for example. But when $\\hat{p_i} = 0$, then $SE_i = 0$, and this upper bound is 0 even for the smallest subpopulations (like those where $n_i = 1$). Is there any way to express uncertainty in $p_i$ when $\\hat{p_i} = 0$? Maybe through use of $p$ as a prior? **Edit:** It looks like the Jeffreys interval as described in Brown et al. is about what I'm after, though I'm not as-of-yet sure how to apply it.",
        'What is the difference between a battery and a charged capacitor? I can see lot of similarities between capacitor and battery. In both these charges are separated and When not connected in a circuit both can have same Potential difference `V`. The only difference is that battery runs for longer time but a capacitor discharges almost instantaneously. Why this difference? What is the exact cause for the difference in the discharge times?',
        'To me the word "curious" would be something you can be i.e. > I am curious what tomorrow will bring I recently read a text of a student I was supervising which used it as follows > A curious phenomenon is ... With which he meant to say that the phenomenon was peculiar, odd or strange. The only other case I have ever seen this is in the movie title: "The Curious Case of Benjamin Button", but that might be \'artistic freedom\' (since Curious Case has the nice C.. C..). My question is: is the usage of the word "curious" in the meaning of peculiar correct?',
        "I'm looking to choose open source project hosting site for an F# project using SVN. CodePlex is where the .NET community in general and most F# projects are hosted, but I'm worried TFS + SvnBridge is going to give me headaches. So I'm looking elsewhere and seeking advice here. Or if you think CodePlex is still the best choice in my scenario, I'd like to hear that too. So far, Google Code is looking appealing to me. They have a clean interface and true SVN hosting. But there are close to no F# projects currently hosted (it's not even in their search by programming language list), so I'm wondering if there are any notable downsides besides the lack of community I might encounter. If there is yet another option, I'd like to hear that too. Thanks!",
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
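
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

The checkpoint is a plain `BertForSequenceClassification` model (see `config.json` in this commit), so it can also be scored without sentence-transformers. The following is a minimal sketch added for reference, not part of the original card; the candidate document passed below is an illustrative placeholder:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "maennyn/pv211_beir_cqadupstack_crossencoder"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# A cross-encoder scores each (query, document) pair encoded as a single sequence
features = tokenizer(
    ["Do elevator upgrades increase your passive credit earnings, too?"],
    ["Illustrative candidate document text."],  # placeholder document
    padding=True,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    # Shape (1, 1): one raw relevance logit per pair (the card's activation_fn is Identity)
    logits = model(**features).logits

print(logits.squeeze(-1))
```

</details>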

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Correlation

* Dataset: `sts_dev`
* Evaluated with [<code>CrossEncoderCorrelationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator)

| Metric       | Value      |
|:-------------|:-----------|
| pearson      | 0.8392     |
| **spearman** | **0.7298** |

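As a hedged sketch, these correlations can be recomputed with the evaluator named above; the dev pairs and gold scores below are invented placeholders, not the actual PV211 data:

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderCorrelationEvaluator

model = CrossEncoder("maennyn/pv211_beir_cqadupstack_crossencoder")

# Placeholder dev set: (query, document) pairs with gold relevance labels
sentence_pairs = [
    ["how do solar panels work", "Photovoltaic cells convert sunlight into electricity."],
    ["how do solar panels work", "A capacitor stores charge between two conductive plates."],
]
gold_scores = [1.0, 0.0]

evaluator = CrossEncoderCorrelationEvaluator(
    sentence_pairs=sentence_pairs,
    scores=gold_scores,
    name="sts_dev",
)

results = evaluator(model)  # Pearson / Spearman between model scores and gold scores
print(results)
```
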
#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.5685 (+0.0790)     | 0.3511 (+0.0901)     | 0.5917 (+0.1721)     |
| mrr@10      | 0.5570 (+0.0795)     | 0.5391 (+0.0392)     | 0.6017 (+0.1750)     |
| **ndcg@10** | **0.6146 (+0.0742)** | **0.3779 (+0.0529)** | **0.6450 (+0.1444)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.5038 (+0.1137)     |
| mrr@10      | 0.5659 (+0.0979)     |
| **ndcg@10** | **0.5459 (+0.0905)** |
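
These aggregates come from the NanoBEIR evaluator configured in the JSON block above. A sketch of reproducing the run (assuming the evaluator can download the Nano datasets from the Hub; the exact result keys follow the `{dataset}_{metric}` naming used elsewhere in this card):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("maennyn/pv211_beir_cqadupstack_crossencoder")

# Mirrors the JSON parameters listed above
evaluator = CrossEncoderNanoBEIREvaluator(
    dataset_names=["msmarco", "nfcorpus", "nq"],
    rerank_k=100,
    at_k=10,
    always_rerank_positives=True,
)

results = evaluator(model)
print(results)  # per-dataset and mean map / mrr@10 / ndcg@10 scores
```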

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 69,699 training samples
* Columns: <code>query</code>, <code>document</code>, and <code>label</code>
* Approximate statistics based on the first 1000 samples:
  |         | query                                                                                           | document                                                                                           | label                                           |
  |:--------|:------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------|:------------------------------------------------|
  | type    | string                                                                                          | string                                                                                             | int                                             |
  | details | <ul><li>min: 15 characters</li><li>mean: 49.33 characters</li><li>max: 125 characters</li></ul> | <ul><li>min: 45 characters</li><li>mean: 793.68 characters</li><li>max: 18801 characters</li></ul> | <ul><li>0: ~74.50%</li><li>1: ~25.50%</li></ul> |
* Samples:
  | query | document | label |
  |:------|:---------|:------|
  | <code>Do elevator upgrades increase your passive credit earnings, too?</code> | <code>I searched for a solution for this problem, but cannot find an answer (or exact replica of the problem) Basically, I set up Multisite on MAMP Pro (Apache port 80, MySQL Port 3306). The set up was smooth, and I created a new site via a subdirectory. The parent theme loads fine. I created a child theme, and it activates (it doesn't show a broken message). On the Appearance page it shows the message "This theme requires the parent theme", but underneath the Theme Description. However when I view the front page of the site, the page is blank, and there is no html at all. Would could possibly be the error? I spent a few hours on this already and it's not going really well. Code of child theme, only CSS, no functions.php or other php files in the child theme folder. /* Theme Name: Confit Child Theme Author: Automattic Template: confit Description: Confit Child Theme 1 Version: 1.0 */ @import url('../confit/style.css'); * Should also menti...</code> | <code>0</code> |
  | <code>Traceback (most recent call last) error appears on terminal</code> | <code>I've got a binary characteristic and a population $S$ with size $n$ and $P[X] = p$ such that $p$ may be small and $n$ is extremely large. Within this population are subpopulations of various sizes $S_0, S_1, \dots, S_k \subset S$. I'd like to be able to select each subpopulation in which $p_i < p$ with some concept of statistical significance. My first inclination is to observe that the standard error on each $p_i$ is $SE_i = \sqrt{\frac{\hat{p_i}(1-\hat{p_i})}{n}}$ and to compare upper bounds on confidence intervals. $\{S_i \; | \; \hat{p_i} + 3 \cdot SE_i < p\}$, for example. But when $\hat{p_i} = 0$, then $SE_i = 0$, and this upper bound is 0 even for the smallest subpopulations (like those where $n_i = 1$). Is there any way to express uncertainty in $p_i$ when $\hat{p_i} = 0$? Maybe through use of $p$ as a prior? **Edit:** It looks like the Jeffreys interval as described in Brown et al. is about what I'm after, though I'm not as-of-yet sure how to apply it.</code> | <code>0</code> |
  | <code>Do I have to install a custom ROM if I root?</code> | <code>What is the difference between a battery and a charged capacitor? I can see lot of similarities between capacitor and battery. In both these charges are separated and When not connected in a circuit both can have same Potential difference `V`. The only difference is that battery runs for longer time but a capacitor discharges almost instantaneously. Why this difference? What is the exact cause for the difference in the discharge times?</code> | <code>0</code> |
* Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
  ```json
  {
      "activation_fn": "torch.nn.modules.linear.Identity",
      "pos_weight": null
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `save_only_model`: True
- `fp16`: True
- `load_best_model_at_end`: True

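Taken together with the loss above, these values map onto the sentence-transformers v4 training API roughly as follows. This is a reconstruction sketch, not the original training script; the dataset rows, `output_dir`, and the eval split are placeholders:

```python
from datasets import Dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L2-v2")

# Placeholder rows; the real run used 69,699 (query, document, label) samples
train_dataset = Dataset.from_dict({
    "query": ["example query"],
    "document": ["example matching document"],
    "label": [1],
})
eval_dataset = train_dataset  # placeholder; use a held-out dev split in practice

loss = BinaryCrossEntropyLoss(model)  # activation_fn defaults to Identity

args = CrossEncoderTrainingArguments(
    output_dir="outputs",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
    save_only_model=True,
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```
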
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: True
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch   | Step     | Training Loss | sts_dev_spearman | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:-------:|:--------:|:-------------:|:----------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1      | -1       | -             | 0.5982           | 0.6519 (+0.1115)         | 0.3749 (+0.0498)          | 0.6497 (+0.1490)     | 0.5588 (+0.1035)           |
| 0.4589  | 1000     | 0.4015        | -                | -                        | -                         | -                    | -                          |
| 0.9179  | 2000     | 0.191         | -                | -                        | -                         | -                    | -                          |
| **1.0** | **2179** | **-**         | **0.7298**       | **0.6146 (+0.0742)**     | **0.3779 (+0.0529)**      | **0.6450 (+0.1444)** | **0.5459 (+0.0905)**       |
| 1.3768  | 3000     | 0.163         | -                | -                        | -                         | -                    | -                          |
| 1.8357  | 4000     | 0.1524        | -                | -                        | -                         | -                    | -                          |
| 2.0     | 4358     | -             | 0.7312           | 0.5951 (+0.0547)         | 0.3808 (+0.0557)          | 0.6490 (+0.1484)     | 0.5416 (+0.0863)           |
| 2.2946  | 5000     | 0.1369        | -                | -                        | -                         | -                    | -                          |
| 2.7536  | 6000     | 0.1297        | -                | -                        | -                         | -                    | -                          |
| 3.0     | 6537     | -             | 0.7335           | 0.5994 (+0.0590)         | 0.3743 (+0.0492)          | 0.6500 (+0.1494)     | 0.5412 (+0.0859)           |
| -1      | -1       | -             | 0.7298           | 0.6146 (+0.0742)         | 0.3779 (+0.0529)          | 0.6450 (+0.1444)     | 0.5459 (+0.0905)           |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.1.0+cu118
- Accelerate: 1.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,35 @@
{
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 2,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.linear.Identity",
    "version": "4.1.0"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.51.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f6273404a5d8df480004d551bc0e1ed6300998761d991faad8d8c739f2446fc
size 62467588
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "never_split": null,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff