Add new CrossEncoder model
- README.md +502 -0
- config.json +34 -0
- model.safetensors +3 -0
- special_tokens_map.json +7 -0
- tokenizer.json +0 -0
- tokenizer_config.json +58 -0
- vocab.txt +0 -0
README.md
ADDED
@@ -0,0 +1,502 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:78704
- loss:ListMLELoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoMSMARCO R100
      type: NanoMSMARCO_R100
    metrics:
    - type: map
      value: 0.4636
      name: Map
    - type: mrr@10
      value: 0.45
      name: Mrr@10
    - type: ndcg@10
      value: 0.5191
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNFCorpus R100
      type: NanoNFCorpus_R100
    metrics:
    - type: map
      value: 0.3174
      name: Map
    - type: mrr@10
      value: 0.4912
      name: Mrr@10
    - type: ndcg@10
      value: 0.3169
      name: Ndcg@10
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: NanoNQ R100
      type: NanoNQ_R100
    metrics:
    - type: map
      value: 0.57
      name: Map
    - type: mrr@10
      value: 0.5739
      name: Mrr@10
    - type: ndcg@10
      value: 0.6383
      name: Ndcg@10
  - task:
      type: cross-encoder-nano-beir
      name: Cross Encoder Nano BEIR
    dataset:
      name: NanoBEIR R100 mean
      type: NanoBEIR_R100_mean
    metrics:
    - type: map
      value: 0.4503
      name: Map
    - type: mrr@10
      value: 0.5051
      name: Mrr@10
    - type: ndcg@10
      value: 0.4915
      name: Ndcg@10
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
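
Conceptually, `rank` combines `predict` with a sort. It scores every (query, document) pair, orders the documents by descending score, and keeps each document's position in the input list as `corpus_id`. The sketch below shows that idea in plain Python so it runs without downloading the model. `rank_by_score`, `fake_scores` and the optional `top_k` cut-off are hypothetical stand-ins, not the library's implementation, which scores the pairs with the model itself.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union


def rank_by_score(
    query: str,
    documents: Sequence[str],
    score_pair: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Dict[str, Union[int, float]]]:
    """Score every (query, document) pair, then sort from most to least relevant."""
    results = [
        {"corpus_id": idx, "score": score_pair(query, doc)}
        for idx, doc in enumerate(documents)
    ]
    # Descending by score; Python's sort is stable, so ties keep input order.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results if top_k is None else results[:top_k]


# Hypothetical precomputed scores standing in for model.predict(...).
fake_scores = {"doc A": 0.2, "doc B": 0.9, "doc C": 0.5}
ranking = rank_by_score("query", list(fake_scores), lambda q, d: fake_scores[d])
print(ranking)
# [{'corpus_id': 1, 'score': 0.9}, {'corpus_id': 2, 'score': 0.5}, {'corpus_id': 0, 'score': 0.2}]
```

`corpus_id` indexes into the original document list. To get the text of each hit, look it up with `documents[result["corpus_id"]]`.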

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO_R100`, `NanoNFCorpus_R100` and `NanoNQ_R100`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | NanoMSMARCO_R100     | NanoNFCorpus_R100    | NanoNQ_R100          |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4636 (-0.0260)     | 0.3174 (+0.0564)     | 0.5700 (+0.1504)     |
| mrr@10      | 0.4500 (-0.0275)     | 0.4912 (-0.0086)     | 0.5739 (+0.1472)     |
| **ndcg@10** | **0.5191 (-0.0213)** | **0.3169 (-0.0081)** | **0.6383 (+0.1377)** |
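
All three reported metrics can be computed from the relevance labels of a reranked candidate list. Below is a minimal pure-Python sketch using one common binary-relevance formulation. The function names are hypothetical, and the evaluator's internals may differ in details such as tie handling. The sketch also assumes every relevant document appears in the list, since both the ideal DCG and the average precision are computed from the list itself.

```python
import math
from typing import Sequence


def reciprocal_rank_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """1 / rank of the first relevant document within the top k, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """DCG@k with a log2(rank + 1) discount, normalized by the DCG@k of the ideal ordering."""

    def dcg(labels: Sequence[int]) -> float:
        return sum(label / math.log2(rank + 1) for rank, label in enumerate(labels[:k], start=1))

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of precision@i over every position i that holds a relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Relevance labels of one reranked candidate list, best-scored document first.
ranked = [0, 1, 0, 0, 1]
print(reciprocal_rank_at_k(ranked), average_precision(ranked), round(ndcg_at_k(ranked), 4))
```

MAP and MRR@10 in the table are these per-query values averaged over all queries in a dataset, and NDCG@10 is averaged the same way.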

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator) with these parameters:
  ```json
  {
      "dataset_names": [
          "msmarco",
          "nfcorpus",
          "nq"
      ],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4503 (+0.0603)     |
| mrr@10      | 0.5051 (+0.0371)     |
| **ndcg@10** | **0.4915 (+0.0361)** |
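
The `NanoBEIR_R100_mean` values are consistent with an unweighted mean of the three per-dataset scores in the Cross Encoder Reranking table. The sketch below recomputes that mean from the rounded table values, under this plain-average assumption. It matches the reported numbers to within 0.0001; the small gaps in the last digit come from averaging inputs that were already rounded.

```python
from statistics import mean

# Per-dataset scores (NanoMSMARCO, NanoNFCorpus, NanoNQ), copied from the reranking table.
per_dataset = {
    "map": [0.4636, 0.3174, 0.5700],
    "mrr@10": [0.4500, 0.4912, 0.5739],
    "ndcg@10": [0.5191, 0.3169, 0.6383],
}
macro_avg = {metric: mean(values) for metric, values in per_dataset.items()}
for metric, value in macro_avg.items():
    print(f"{metric}: {value:.4f}")
```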

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 78,704 training samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 11 characters</li><li>mean: 33.74 characters</li><li>max: 100 characters</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 3 elements</li><li>mean: 6.50 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>cost of installing central air</code> | <code>['Central Air Average Costs. The actual cost of central air installation depends on a number of factors, including the size of the home as well as the unit’s tonnage and SEER rating. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 2 If ductwork is additionally required, costs could reach $6,000 to $10,000 or more. 3 Mini-split central air conditioner prices average $1,500 to $3', 'For example, homes with forced hot air heating will have the duct work necessary for a fast and easy installation, when the project involves the running of ducts however the prices climb significantly. The average price to install a central air conditioner will range from $2650 to upwards of $15K. This installation cannot be considered a DIY project, and it is traditional for a homeowner to hire a contractor for the job. Central ai...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how much does it cost to set up a cabinet shop</code> | <code>['According to Kennedy, most cabinets range from $500 to $1,500 per cabinet box. Based on an estimated 30 cabinets in an average-size kitchen, you can be looking at a cost of about $15,000-$45,000, she says. Discover everything you need to know about cabinets with our free guide! 1. Measure the dimensions of your kitchen', "December 28, 2005 Question Those of you who consider your operation small, what type of machinery is the minimum for what you do? I'm starting a one man shop, 2,400 square feet, and know what I would like to have to start, but am curious how the rest of you get by. A simple streamlined operation that worked for professional builders, and sell some to DIYers for a retail price. I am a one man shop that builds cabinets, furniture and exterior/interior doors. My shop is 1600 sq ft with 300 sq ft of it being a small spray room.", 'Seven years later and I moved out of the garage to a more legitimate setting in an industrial park. Today, 25 years after starting out, my co...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how close can a gas meter be to a condensing unit</code> | <code>['Is it dangerous if it is close to the gas meter/pipe? Thanks! It should be 3 feet from the gas meter vent, and not the actual gas meter itself. The gas company can come out later to extend this vent further away from the meter if it is within 3 feet. But the chance that anything actually happening because of the ac too close to the vent is insanely remote. I would be more worried about getting hit by lightning than any problems with the gas.', 'Condensing Unit Too Close to House – Bad air conditioner installation jobs such as this one proves that it is in the best interest of the homeowner to hire competent HVAC air conditioner and heating installers so that the job is done correctly.', "Re: Condensing furnace Exhaust, Distances from window, electric and gas meters. Joel, 3 ft from operable window is what I have on the Electrical Service. Gas meter looks OK. Install instructions in your post says if below 100,000 btu clearance is 12', and 36' if over 100,000 btu.", 'Condensing Unit Too Close to House. This condensing unit was too close to the house to effectively reject heat. It was a bad HVAC condensing unit installation job by the HVAC installers. A mechanical inspector rejected the final for the permit until the condensing unit was correctly installed. It is recommended that condensing units have at least 2 feet of space so that it can']</code> | <code>[1, 0, 0, 0]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
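
Each training row pairs one query with a list of documents. In the samples shown, the first document is the positive (labels such as `[1, 0, 0, 0]`). The sketch below shows, in plain Python, the Plackett-Luce negative log-likelihood behind ListMLE on one such row. It is not the sentence-transformers implementation, and it rests on three assumptions:

* It applies a sigmoid to the logits to mirror `activation_fct`.
* When `respect_input_order` is true, it treats the input order as the ground-truth ranking. This reading is based on the parameter name.
* It replaces the library's `ListMLELambdaWeight` with a hypothetical `position_weights` list. The default of all ones gives plain, unweighted ListMLE.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def listmle_loss(
    logits: Sequence[float],
    labels: Sequence[float],
    respect_input_order: bool = True,
    position_weights: Optional[Sequence[float]] = None,
) -> float:
    """Negative log-likelihood of the ground-truth ordering under a Plackett-Luce model."""
    scores = [sigmoid(x) for x in logits]  # mirrors activation_fct=Sigmoid
    order: List[int] = list(range(len(scores)))
    if not respect_input_order:
        # Most relevant first; Python's sort is stable, so ties keep their input order.
        order.sort(key=lambda i: -labels[i])
    ordered = [scores[i] for i in order]
    weights = position_weights if position_weights is not None else [1.0] * len(ordered)

    loss = 0.0
    for pos, score in enumerate(ordered):
        remaining = ordered[pos:]  # documents that still compete for position `pos`
        m = max(remaining)  # log-sum-exp shift for numerical stability
        log_denominator = m + math.log(sum(math.exp(s - m) for s in remaining))
        loss += weights[pos] * (log_denominator - score)
    return loss


# One hypothetical listwise row: the positive document comes first, as in the samples.
logits = [2.0, -1.0, 0.5, -3.0]
labels = [1, 0, 0, 0]
print(listmle_loss(logits, labels))
```

The loss is lowest when the scores already decrease along the ground-truth order. Because the sigmoid squeezes scores into (0, 1), the loss cannot be driven to zero, however confident the logits are.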

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query | docs | labels |
  |:--------|:------|:-----|:-------|
  | type    | string | list | list |
  | details | <ul><li>min: 11 characters</li><li>mean: 34.38 characters</li><li>max: 99 characters</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> | <ul><li>min: 2 elements</li><li>mean: 6.00 elements</li><li>max: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>how long does an iva stay on your credit file</code> | <code>['For example your payments to your mobile phone (if you’re on a contract) and electricity companies will also appear in your credit report. Your IVA will show on your credit file for six years from the day it started. So if your IVA was five years long it will only be listed on your credit file for a further 12 months. The idea behind asking creditors to correct the dates on default notices is to make sure that these too will be gone within 12 months. Post IVA credit file clean up. It’s a happy day when your individual voluntary arrangement (IVA) finally ends, you’re well and truly free and clear and your money is your own again. You can also take satisfaction from the fact that you have done your best by your creditors.', 'LinkedIn0. An Individual Voluntary Arrangement (IVA) is recorded on your credit file for 6 years. During this time your credit rating will be negatively affected. Unfortunately your credit rating will not suddenly become good again after your Arrangement has ended ...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>Plants which produce their gametes in flowers are called what?</code> | <code>['Plants which produce their gametes in flowers are called: antheridium, gymnosperms, angiosperms, or vascular. They are called angiosperms.', 'In humans, cells that do not produce gametes are collectively called somatic cells. Somatic cells do not include sperm and ova, the cells from which they are made, and und … ifferentiated stem cells.', 'This event is called fertilization. The male gametes produced by animals and some plants (e.g., club mosses, horsetails, ferns) are called spermatozoa (plural of spermatozoon), or simply sperm. Their female gametes are called ova (plural of ovum). Ova are often called eggs. Most plants produce male gametes called pollen grains.', 'Unlike animals, plants have multicellular haploid and multicellular diploid stages in their life cycle. Gametes develop from the multicellular haploid gametophytes (Greek phyton, plant). Fertilization gives rise to a multicellular, diploid sporophyte that produces haploid spores via meiosis.', 'Original conversation...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>what is a dts sound system</code> | <code>['DTS is a series of multichannel audio technologies owned by DTS, Inc. (formerly known as D igital T heater S ystems, Inc.), an American company specializing in digital surround sound formats used for both commercial/theatrical and consumer grade applications. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, but DTS was not part of the original DVD specification, so early DVD players do not recognize DTS audio tracks at all.', 'DTS Connect is a blanket name for a two-part system used on the computer platform only, in order to convert PC audio into the DTS format, transported via a single S/PDIF cable. The two components of the system are DTS Interactive and DTS Neo:PC. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, bu...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListMLELoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listmleloss) with these parameters:
  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
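
The learning-rate settings above (`learning_rate` 2e-05, `warmup_ratio` 0.1, one epoch, batch size 16, and a linear scheduler in the full list) fix how the learning rate moves over the run. Below is a from-scratch sketch of the linear warmup-then-decay shape that `lr_scheduler_type: linear` selects. `linear_schedule_lr` is a hypothetical helper, not the transformers code. The sketch assumes a single device with `gradient_accumulation_steps` of 1 and warmup steps rounded up. Under those assumptions, 78,704 samples give 4,919 optimizer steps, which matches the Epoch column of the training logs (step 250 is epoch 0.0508).

```python
import math


def linear_schedule_lr(step: int, peak_lr: float, total_steps: int, warmup_steps: int) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values from the hyperparameter list; a single device is assumed, so the
# per-device batch size equals the global batch size.
num_samples, batch_size, epochs, warmup_ratio, peak_lr = 78_704, 16, 1, 0.1, 2e-05
total_steps = math.ceil(num_samples / batch_size) * epochs  # 4919
warmup_steps = math.ceil(total_steps * warmup_ratio)  # 492

for step in (0, warmup_steps, 3500, total_steps):
    print(step, linear_schedule_lr(step, peak_lr, total_steps, warmup_steps))
```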

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>

### Training Logs
| Epoch      | Step     | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10  | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:--------:|:-------------:|:---------------:|:------------------------:|:-------------------------:|:--------------------:|:--------------------------:|
| -1         | -1       | -             | -               | 0.0407 (-0.4997)         | 0.2816 (-0.0435)          | 0.0231 (-0.4775)     | 0.1151 (-0.3402)           |
| 0.0002     | 1        | 883.6996      | -               | -                        | -                         | -                    | -                          |
| 0.0508     | 250      | 921.6613      | -               | -                        | -                         | -                    | -                          |
| 0.1016     | 500      | 904.6479      | 856.3090        | 0.1094 (-0.4310)         | 0.2034 (-0.1216)          | 0.2049 (-0.2957)     | 0.1726 (-0.2828)           |
| 0.1525     | 750      | 900.1757      | -               | -                        | -                         | -                    | -                          |
| 0.2033     | 1000     | 892.1912      | 847.0684        | 0.3615 (-0.1789)         | 0.2856 (-0.0394)          | 0.5605 (+0.0598)     | 0.4025 (-0.0528)           |
| 0.2541     | 1250     | 891.0896      | -               | -                        | -                         | -                    | -                          |
| 0.3049     | 1500     | 882.4826      | 844.2736        | 0.4446 (-0.0959)         | 0.3072 (-0.0178)          | 0.6115 (+0.1108)     | 0.4544 (-0.0009)           |
| 0.3558     | 1750     | 878.0654      | -               | -                        | -                         | -                    | -                          |
| 0.4066     | 2000     | 878.2091      | 840.3965        | 0.4614 (-0.0791)         | 0.3450 (+0.0200)          | 0.6472 (+0.1466)     | 0.4845 (+0.0292)           |
| 0.4574     | 2250     | 878.5553      | -               | -                        | -                         | -                    | -                          |
| 0.5082     | 2500     | 877.2454      | 841.2769        | 0.4602 (-0.0802)         | 0.3123 (-0.0127)          | 0.5765 (+0.0759)     | 0.4497 (-0.0057)           |
| 0.5591     | 2750     | 864.5746      | -               | -                        | -                         | -                    | -                          |
| 0.6099     | 3000     | 899.3305      | 838.2897        | 0.4752 (-0.0652)         | 0.3152 (-0.0099)          | 0.6333 (+0.1326)     | 0.4746 (+0.0192)           |
| 0.6607     | 3250     | 870.9701      | -               | -                        | -                         | -                    | -                          |
| **0.7115** | **3500** | **873.4406**  | **835.9516**    | **0.5191 (-0.0213)**     | **0.3169 (-0.0081)**      | **0.6383 (+0.1377)** | **0.4915 (+0.0361)**       |
| 0.7624     | 3750     | 882.9871      | -               | -                        | -                         | -                    | -                          |
| 0.8132     | 4000     | 881.5676      | 836.2292        | 0.5024 (-0.0380)         | 0.3269 (+0.0019)          | 0.6350 (+0.1343)     | 0.4881 (+0.0327)           |
| 0.8640     | 4250     | 884.8231      | -               | -                        | -                         | -                    | -                          |
| 0.9148     | 4500     | 875.8995      | 834.7368        | 0.5028 (-0.0376)         | 0.3284 (+0.0034)          | 0.6200 (+0.1193)     | 0.4837 (+0.0284)           |
| 0.9656     | 4750     | 868.8395      | -               | -                        | -                         | -                    | -                          |
| -1         | -1       | -             | -               | 0.5191 (-0.0213)         | 0.3169 (-0.0081)          | 0.6383 (+0.1377)     | 0.4915 (+0.0361)           |

* The bold row denotes the saved checkpoint. Rows with epoch and step -1 are evaluations run outside the training loop. The first scores the untrained base model. The last scores the final model, and its results match the bold row because `load_best_model_at_end` is enabled.
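
The card does not list `metric_for_best_model`, but the logged evaluations show which criterion the saved checkpoint is consistent with. This sketch copies the evaluation rows from the training log:

* The bold step 3500 has the highest NanoBEIR mean NDCG@10.
* The lowest validation loss comes at step 4500.

So selection appears to have used the ranking metric rather than the loss; this is an inference from the table, not a stated setting. The figure of 4,919 steps per epoch is derived from the dataset size and batch size, assuming a single device.

```python
# Evaluation rows from the Training Logs table (step -> value); unevaluated steps are omitted.
mean_ndcg_at_10 = {
    500: 0.1726, 1000: 0.4025, 1500: 0.4544, 2000: 0.4845, 2500: 0.4497,
    3000: 0.4746, 3500: 0.4915, 4000: 0.4881, 4500: 0.4837,
}
validation_loss = {
    500: 856.3090, 1000: 847.0684, 1500: 844.2736, 2000: 840.3965, 2500: 841.2769,
    3000: 838.2897, 3500: 835.9516, 4000: 836.2292, 4500: 834.7368,
}

steps_per_epoch = 4919  # ceil(78,704 samples / batch size 16), assuming a single device

best_by_ndcg = max(mean_ndcg_at_10, key=mean_ndcg_at_10.get)
best_by_loss = min(validation_loss, key=validation_loss.get)
print(best_by_ndcg, round(best_by_ndcg / steps_per_epoch, 4))  # 3500 0.7115
print(best_by_loss)  # 4500
```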
|
448 |
+
|
449 |
+
### Framework Versions
|
450 |
+
- Python: 3.11.11
|
451 |
+
- Sentence Transformers: 3.5.0.dev0
|
452 |
+
- Transformers: 4.49.0
|
453 |
+
- PyTorch: 2.6.0+cu124
|
454 |
+
- Accelerate: 1.5.2
|
455 |
+
- Datasets: 3.4.0
|
456 |
+
- Tokenizers: 0.21.1
|
457 |
+
|
458 |
+
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss
```bibtex
@inproceedings{lan2013position,
    title={Position-aware ListMLE: a sequential learning process for ranking},
    author={Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle={Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages={333--342},
    year={2013}
}
```

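The `ListMLELoss` entry above cites the position-aware ListMLE paper. For reference, here is a minimal pure-Python sketch of the plain ListMLE objective for a single query. It computes the negative log-likelihood of the ground-truth order under a Plackett-Luce model over the scores:

$$\mathcal{L} = -\sum_{i=1}^{n}\Big(s_{\pi(i)} - \log \sum_{j=i}^{n} e^{s_{\pi(j)}}\Big).$$

This is an illustration, not the Sentence Transformers implementation. It leaves out the per-position weights that the position-aware variant adds. Breaking label ties by input order is an assumption of this sketch.

```python
import math


def listmle_loss(scores, labels):
    """Plain (unweighted) ListMLE for one query: negative log-likelihood of the
    label-induced order under a Plackett-Luce model parameterised by the scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Ground-truth permutation: indices sorted by label, highest first.
    # Python's sort is stable even with reverse=True, so tied labels keep input order.
    order = sorted(range(len(labels)), key=lambda i: labels[i], reverse=True)
    ordered = [scores[i] for i in order]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]  # items not yet placed when choosing position i
        m = max(tail)       # shift for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        loss -= ordered[i] - log_norm
    return loss


print(listmle_loss([2.0, 1.0], [1, 0]))  # log(1 + e^-1), about 0.3133
```

A correctly ordered score list yields a lower loss than the same scores reversed. The loss approaches zero as the score gaps along the true order grow.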
<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json
ADDED
@@ -0,0 +1,34 @@
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "sentence_transformers": {
    "activation_fn": "torch.nn.modules.activation.Sigmoid"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.49.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
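This config declares a single output label (`LABEL_0`) and a `Sigmoid` activation under the `sentence_transformers` key. That suggests the model emits one relevance logit per (query, passage) pair, and the sigmoid squashes it into (0, 1). With the library installed, scoring would typically go through `sentence_transformers.CrossEncoder` and its `predict` method. The sketch below does not load the model. It only illustrates the logit-to-score mapping using made-up logits. Because the sigmoid is strictly increasing, ranking by activated scores gives the same order as ranking by raw logits.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, mapping a real logit into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical raw logits for three (query, passage) pairs.
logits = {"passage_a": 2.3, "passage_b": -0.7, "passage_c": 0.4}

scores = {pid: sigmoid(z) for pid, z in logits.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['passage_a', 'passage_c', 'passage_b']
```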
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e497671f77b7d5f585a6a97a111d0e211eac9c7afd3c21cfb343e5c26e323a90
size 133464836
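`model.safetensors` is stored through Git LFS, so the commit contains only this three-line pointer. The `oid` is the SHA-256 digest of the real file, and `size` is its length in bytes (133464836, about 133 MB). Here is a minimal sketch for checking a downloaded copy against such a pointer. The local path in the commented call is a placeholder.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into (sha256 hex digest, size in bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path, digest, size, chunk_size=1 << 20):
    """Check the file size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == digest


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e497671f77b7d5f585a6a97a111d0e211eac9c7afd3c21cfb343e5c26e323a90
size 133464836
"""

digest, size = parse_lfs_pointer(POINTER)
# matches_pointer("path/to/model.safetensors", digest, size)  # placeholder path
```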
special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
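These tokenizer files describe an uncased BERT WordPiece tokenizer (`do_lower_case: true`) with `[CLS]` = 101, `[SEP]` = 102, `[PAD]` = 0 and a 512-token limit. `config.json` adds `type_vocab_size: 2`, so two segment ids are available. A cross-encoder reads the query and passage together as `[CLS] query [SEP] passage [SEP]`. The sketch below assembles that layout from already-tokenized WordPiece ids. The ids and the `encode_pair` helper are hypothetical. The sketch also simplifies truncation: it trims only the passage. The Hugging Face tokenizer's default `truncation=True` (`longest_first`) instead trims whichever segment is longer.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # from added_tokens_decoder
MAX_LEN = 512                         # model_max_length


def encode_pair(query_ids, passage_ids, max_len=MAX_LEN, pad_to=None):
    """Build BERT pair inputs: [CLS] q [SEP] p [SEP] with segment ids 0 (query) and 1 (passage).

    Simplification: only the passage is truncated to fit max_len.
    """
    budget = max_len - len(query_ids) - 3  # room left after the query and 3 special tokens
    if budget < 0:
        raise ValueError("query alone exceeds the length limit")
    passage_ids = passage_ids[:budget]
    input_ids = [CLS_ID] + query_ids + [SEP_ID] + passage_ids + [SEP_ID]
    token_type_ids = [0] * (len(query_ids) + 2) + [1] * (len(passage_ids) + 1)
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        extra = pad_to - len(input_ids)
        input_ids += [PAD_ID] * extra
        token_type_ids += [0] * extra
        attention_mask += [0] * extra  # padding positions are masked out
    return {"input_ids": input_ids, "token_type_ids": token_type_ids,
            "attention_mask": attention_mask}


# Hypothetical WordPiece ids standing in for a short query and passage.
enc = encode_pair([2054, 2003], [1037, 3231, 6251], pad_to=10)
print(enc["input_ids"])       # [101, 2054, 2003, 102, 1037, 3231, 6251, 102, 0, 0]
print(enc["token_type_ids"])  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
```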
vocab.txt
ADDED
The diff for this file is too large to render.